System Weakness - Medium

The Authorization Gap, Closed: A Practitioner’s Blueprint for TMF 672, OpenFGA, and Claude as an…

Soumit Saha — Fri, 05 Jun 2026 13:24:38 GMT

The Authorization Gap, Closed: A Practitioner’s Blueprint for TMF 672, OpenFGA, and Claude as an AI-Native Authorization Layer

Fourth and final in a four-part series: ReBAC Meets BSS, A Practitioner’s Blueprint for AI-Native Role-Based Access in Telco

Four weeks ago I described a failure mode taking shape quietly inside enterprise AI deployments. Not a dramatic system outage. Not a security breach with a clear perimeter. A structural gap between when a party’s permission changes and when the authorization systems acting on behalf of that party reflect that change.

I argued that this gap is not new. It has always existed in enterprise authorization architectures. What is new is that AI agents inherit it and operate on it at speed and at scale, without the contextual judgment a human operator would apply to bridge it.

Over the past three weeks I have walked through the technical foundation of a response to that gap. This fourth and final article draws the architecture together, reflects on the design decisions that shaped it, is honest about where the build currently stands, and provides a working manual proof of concept that anyone can run today.

The Series in Four Sentences

Article 1 named three anti-patterns worth designing against: the over-permissioned agent, the under-permissioned agent, and the stale-permission agent. Article 2 introduced the two technical primitives that belong at the foundation of a solution: TMF 672 as a PermissionSet notification event source and OpenFGA as a relationship-based authorization graph. Article 3 showed what belongs between them: a hybrid orchestration layer with a deterministic fast path for known transitions and a Claude reasoning slow path for novel ones, running inside your AWS security boundary with party identity data never crossing to external infrastructure. This article closes the series with the implementation blueprint, the key design decisions, a manual proof of concept, and an honest account of where the build is.

Why TMF 672 and OpenFGA Belong at the Foundation

Before walking through the blueprint, I want to restate why these two specific primitives anchor the architecture rather than alternatives, and be precise about what TMF 672 actually is and what events it publishes.

TMF 672 as the Single Authoritative Source

TMF 672 is the User Role Permission Management API. Its primary function is managing the lifecycle of PermissionSets granted to Security Principals across a BSS landscape.

A few definitions from the v5.0.1 specification are worth stating precisely because they shape how the architecture reasons:

A PermissionSet is a set of permissions granted to a Security Principal. It may be granted explicitly by an authorized user or acquired implicitly through Party Role assignment. It has a validity period.

A PermissionSpecification is a definition of a permissible action on a function. Action is Read, Write, ReadWrite, or a domain-specific string such as Resell or Manage. Function is the entity class name, such as CustomerAccount, EnterpriseBroadband, or FaultManagement.

A SecurityPrincipal is either a human Individual or an autonomous software process defined as a Resource. AI agents are Security Principals in TMF 672 terms. This is not an extension of the standard. It is the standard’s own definition.

TMF 672 publishes three notification event types that this architecture acts on: PermissionSetCreateEvent when a new PermissionSet is granted, PermissionSetChangeEvent when an existing PermissionSet is modified, and PermissionSetDeleteEvent when a PermissionSet is terminated.
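As a minimal Go sketch of how a consumer might recognise these three notifications: the event type names are the ones listed above, while the JSON field names (`eventType`, `principalId`, `permissions`) and the flat payload shape are illustrative assumptions, not the TMF 672 v5.0.1 schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event type names as listed in the article.
const (
	EventCreate = "PermissionSetCreateEvent"
	EventChange = "PermissionSetChangeEvent"
	EventDelete = "PermissionSetDeleteEvent"
)

// PermissionSpec carries the function and action pair described above.
type PermissionSpec struct {
	Function string `json:"function"`
	Action   string `json:"action"`
}

// PermissionSetEvent is a simplified, hypothetical payload shape.
type PermissionSetEvent struct {
	EventType   string           `json:"eventType"`
	PrincipalID string           `json:"principalId"`
	Permissions []PermissionSpec `json:"permissions"`
}

// ParseEvent decodes a notification and rejects unknown event types.
func ParseEvent(raw []byte) (PermissionSetEvent, error) {
	var ev PermissionSetEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return ev, fmt.Errorf("decode event: %w", err)
	}
	switch ev.EventType {
	case EventCreate, EventChange, EventDelete:
		return ev, nil
	default:
		return ev, fmt.Errorf("unsupported event type %q", ev.EventType)
	}
}

func main() {
	raw := []byte(`{"eventType":"PermissionSetDeleteEvent","principalId":"agent-42",
		"permissions":[{"function":"CustomerAccount","action":"Read"}]}`)
	ev, err := ParseEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev.EventType, ev.PrincipalID, ev.Permissions[0].Function)
}
```

Rejecting unknown event types at the boundary keeps everything downstream limited to the three cases the architecture is designed for.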

Manageable Assets as Event Sources

The critical architectural point is that TMF 672 PermissionSet events are not limited to direct administrative assignments. They are also triggered by state changes in Manageable Assets across the BSS landscape.

A Manageable Asset is the realisation of something that can be used and managed by users. The TMF 672 v5.0.1 specification defines these explicitly: resources created as part of a purchased product, service instances provisioned under a product, blocks of personal data, eCare system registrations, digital service platform accounts, IoT devices, and home gateways.

Readers of Article 2 may recall the hair dryer that won a weatherman’s bet. It did not need to understand meteorology. It just needed to be plugged in at the right moment. The IoT authorization problem is the same pattern made serious: a building controller whose operational state changed, a 5G device transitioning between zones with different regulatory constraints, or a smart meter reporting a tampered state. Each is a Manageable Asset. Each state change carries authorization intent. And when that state change affects the permissions of a Security Principal associated with that asset, TMF 672 publishes the resulting PermissionSet notification event.

The translation layer in this architecture does not need to understand what triggered the PermissionSet event upstream. Whether the event originated from a direct Party Role assignment, a consent asset being withdrawn, an account asset reaching a credit limit, or a contract asset expiring, the translation layer sees one thing: a TMF 672 PermissionSet notification event. The complexity of what triggered it is upstream and out of scope.

This single-source design has three direct architectural benefits.

First, it is cleaner. The translation layer has one event source, one schema, and three notification event types to handle. There is no aggregation layer required because TMF 672 has already normalised the permission implications of upstream asset state changes into standardised PermissionSet events.

Second, it is more auditable. Every PermissionSet event entering the translation layer carries the PermissionSpecification function and action fields that describe precisely what capability is being granted or revoked and on which Manageable Asset. The audit trail from asset state change to authorization graph update is contained within the TMF 672 event payload itself.

Third, it is more precisely aligned with how TM Forum ODA actually works in production. Operators who have adopted ODA already have TMF 672 in their BSS landscape, managing PermissionSets for parties across their estate. This architecture is not asking them to add new infrastructure. It is asking them to act on what TMF 672 is already publishing.

Why OpenFGA Is the Right Authorization Primitive

OpenFGA earns its place for a different reason. Traditional RBAC models authorization as a flat assignment: a user has a role, and the role has permissions. Telco authorization is not flat.

A reseller organisation manages enterprise customers who own product Manageable Assets available in markets governed by regulatory frameworks. An AI agent SecurityPrincipal holds a PermissionSet granted by an individual over specific Manageable Assets with a defined validity period. A service Manageable Asset in a restricted state carries different permission implications than the same asset in an active state. These are all graphs of relationships, not flat role assignments.

OpenFGA models that graph natively, expresses it with precision, and answers authorization queries by traversing the full relationship chain in real time. It is also writable via API, which is the property that makes a programmatic reasoning layer possible.

Together, TMF 672 and OpenFGA form a foundation that most Tier-1 operators already have in production on the input side and can adopt with minimal infrastructure overhead on the output side. The architecture is not asking operators to replace their BSS landscape. It is asking them to extend the output of the TMF 672 PermissionSet event stream into a dynamic, real-time authorization graph that AI agents can query before every action.

The Blueprint

The complete architecture in plain language before the diagram.

A TMF 672 PermissionSet notification event arrives. It may be a PermissionSetCreateEvent granting new permissions, a PermissionSetChangeEvent modifying existing permissions, or a PermissionSetDeleteEvent terminating a PermissionSet. It may have originated from a direct administrative assignment or from a Manageable Asset state change. From the translation layer’s perspective, the origin is irrelevant. TMF 672 is the single authoritative input.

The event is published to an event stream, either Apache Kafka or AWS EventBridge, depending on the operator’s existing infrastructure. The orchestration service consumes the event, validates it against the TMF 672 schema, and performs two enrichment steps before making any routing decision.

Enrichment step one: the orchestration service reads the current OpenFGA tuples held by the affected Security Principal. This gives the reasoning layer the context to understand what needs to change, not just what the new state should look like.

Enrichment step two: the orchestration service retrieves the market and regulatory flags applicable to this PermissionSet event.

Only after both enrichment steps are complete does the routing decision occur. The orchestration service constructs a TransitionKey from four fields derived from the enriched event: previous PermissionSet state, new PermissionSet state, market, and regulatory tier. It checks this key against a rules cache.

In a fresh deployment, the cache is empty, and every event goes through the slow path. This is correct and expected. The cache populates through operational learning over time.

If the TransitionKey exists in the cache with a promoted confidence value of 0.99, the fast path fires. Pre-validated tuple mutations are served directly to the OpenFGA Write endpoint with the Security Principal ID substituted at runtime from the current event. The transaction completes in milliseconds. Claude is never invoked. The audit trail records the fast path execution.
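The lookup described above might be sketched in Go as follows. The four key fields and the 0.99 promoted confidence come from the text; the `{{principal}}` placeholder, the type names, and the in-memory map are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// TransitionKey holds the four routing fields named in the text.
type TransitionKey struct {
	PreviousState  string
	NewState       string
	Market         string
	RegulatoryTier string
}

// Tuple mirrors an OpenFGA relationship tuple.
type Tuple struct{ User, Relation, Object string }

// CachedRule is a hypothetical cache entry: a promoted mutation template.
type CachedRule struct {
	Confidence float64
	Writes     []Tuple
	Deletes    []Tuple
}

// The placeholder syntax is an assumption of this sketch.
const (
	principalPlaceholder = "{{principal}}"
	promotedConfidence   = 0.99
)

// RulesCache is an in-memory stand-in for the rules cache.
type RulesCache map[TransitionKey]CachedRule

// FastPath returns concrete mutations for a promoted key, or ok=false to
// signal that the event must take the slow path.
func (c RulesCache) FastPath(key TransitionKey, principalID string) (writes, deletes []Tuple, ok bool) {
	rule, found := c[key]
	if !found || rule.Confidence < promotedConfidence {
		return nil, nil, false
	}
	return substitute(rule.Writes, principalID), substitute(rule.Deletes, principalID), true
}

// substitute copies the templates and fills in the principal from the event.
func substitute(in []Tuple, principalID string) []Tuple {
	out := make([]Tuple, len(in))
	for i, t := range in {
		t.User = strings.ReplaceAll(t.User, principalPlaceholder, principalID)
		out[i] = t
	}
	return out
}

func main() {
	key := TransitionKey{"Active", "Revoked", "UK", "NIS2"}
	cache := RulesCache{key: {
		Confidence: promotedConfidence,
		Deletes:    []Tuple{{"agent:{{principal}}", "can_read", "customer_account:acme"}},
	}}
	_, deletes, ok := cache.FastPath(key, "bot-7")
	fmt.Println(ok, deletes)
}
```

Because the key is a plain comparable struct, an empty cache in a fresh deployment misses on every lookup and falls through to the slow path with no special-case code.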

If the TransitionKey is novel or not yet in the cache, the slow path fires. The enriched event is sent to Claude on AWS Bedrock via a VPC private endpoint. Claude receives the PermissionSet notification event, including the PermissionSpecification function and action fields, the current OpenFGA relationship schema injected at runtime, and the market and regulatory flags. It reasons about which tuple mutations are required and returns structured output: the mutations, a plain language justification for each decision referencing the specific PermissionSpecification, a confidence score, and any regulatory flags raised.

A structural validation step confirms that every proposed tuple type exists in the OpenFGA schema before the output reaches the human review gate. The gate routes on two signals: Claude’s runtime confidence score and the structural validation result. High confidence with no flags proceeds to a lightweight review. Lower confidence or regulatory flags route to deeper review. Low confidence or high severity flags block pending senior assessment.
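One way to read the three-tier gate is the Go sketch below. The 0.70 lower threshold is the proof-of-concept heuristic mentioned later in this article; the 0.90 upper threshold, the two-level severity model, and the tier names are hypothetical values chosen only to make the routing concrete.

```go
package main

import "fmt"

type Severity int

const (
	SeverityLow Severity = iota
	SeverityHigh
)

// Flag is a hypothetical shape for a regulatory flag raised by the reasoning layer.
type Flag struct {
	Code     string
	Severity Severity
}

type Tier string

const (
	TierLight Tier = "lightweight-review"
	TierDeep  Tier = "deep-review"
	TierBlock Tier = "blocked-pending-senior-assessment"
)

// Assumed thresholds; a real deployment would calibrate both empirically.
const (
	lowConfidence  = 0.70
	highConfidence = 0.90
)

// Route combines the structural validation result with the runtime
// confidence score and any flags to pick a review tier.
func Route(confidence float64, structurallyValid bool, flags []Flag) Tier {
	if !structurallyValid {
		return TierBlock // blocked regardless of confidence
	}
	for _, f := range flags {
		if f.Severity == SeverityHigh {
			return TierBlock
		}
	}
	switch {
	case confidence < lowConfidence:
		return TierBlock
	case confidence < highConfidence || len(flags) > 0:
		return TierDeep
	default:
		return TierLight
	}
}

func main() {
	fmt.Println(Route(0.95, true, nil))
	fmt.Println(Route(0.95, true, []Flag{{"NIS2", SeverityLow}}))
	fmt.Println(Route(0.99, false, nil))
}
```

Checking structural validity first means a confident but malformed mutation set can never reach a lightweight review.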

Approved outputs write to OpenFGA atomically. The complete audit record (the original TMF 672 PermissionSet event, Claude’s reasoning, the mutations applied, the confidence score, and the reviewer’s approval) is logged to CloudWatch alongside the Bedrock inference record.

Approved slow path outputs for novel events increment an approval count for their TransitionKey. At the promotion threshold, the pattern moves to the fast path rules cache with a promoted confidence of 0.99. The system learns without retraining. The proportion of events requiring Claude inference decreases as the deployment matures.

AI agents query the OpenFGA Check endpoint before every action they take on behalf of a Security Principal. Not at session initialisation. Not once per day. Before every action. This is what closes the stale-permission anti-pattern. The agent operates on the live authorization graph, not a cached snapshot.

The Key Design Decisions

Every architecture involves decisions that could have gone differently. Here are the four that shaped this one most significantly.

Decision 1: TMF 672 as a single source rather than direct Manageable Asset event consumption

The temptation when designing an event-driven authorization architecture is to consume directly from the upstream Manageable Asset state change events: the billing system, the consent platform, the contract management layer, each feeding the translation layer directly.

The single source approach through TMF 672 PermissionSet events is architecturally correct for three reasons. TMF 672 has already normalised the permission implications of upstream asset state changes into standardised PermissionSet events with PermissionSpecification function and action fields. Consuming directly from upstream systems would require the translation layer to replicate that normalisation logic. And TMF 672 as the single authoritative publication point means the audit chain is complete: the PermissionSpecification fields in the event describe precisely what capability changed and on which Manageable Asset, without requiring the translation layer to reach back to the upstream source.

Decision 2: Enrichment before routing, not routing on raw event reception

The routing decision cannot be made on the raw TMF 672 event alone. The TransitionKey requires four fields: previous PermissionSet state, new PermissionSet state, market, and regulatory tier. The market and regulatory tier fields come from the enrichment step, not the raw event. Routing before enrichment would produce an incomplete TransitionKey and unreliable cache lookups.

The enrichment step also populates Claude’s context on the slow path. Without knowing the current OpenFGA tuples held by the affected Security Principal, Claude cannot reason about what needs to change. It can only reason about what the new state should look like, which is insufficient for producing correct atomic delete and create mutation sets.
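To illustrate why the current tuples matter, here is a small Go sketch that derives delete and write sets by comparing the principal's current tuples with a desired end state. The `Diff` helper and the example tuples are hypothetical; in the architecture described here the desired state comes from a cached rule or from the reasoning layer.

```go
package main

import "fmt"

// Tuple mirrors an OpenFGA relationship tuple.
type Tuple struct{ User, Relation, Object string }

// Diff returns the tuples to delete (held now but no longer wanted) and the
// tuples to write (wanted but not yet held), preserving input order.
func Diff(current, desired []Tuple) (deletes, writes []Tuple) {
	want := make(map[Tuple]bool, len(desired))
	for _, t := range desired {
		want[t] = true
	}
	have := make(map[Tuple]bool, len(current))
	for _, t := range current {
		have[t] = true
		if !want[t] {
			deletes = append(deletes, t)
		}
	}
	for _, t := range desired {
		if !have[t] {
			writes = append(writes, t)
		}
	}
	return deletes, writes
}

func main() {
	current := []Tuple{
		{"agent:bot-7", "can_read", "customer_account:acme"},
		{"agent:bot-7", "can_write", "customer_account:acme"},
	}
	desired := []Tuple{
		{"agent:bot-7", "can_read", "customer_account:acme"},
	}
	deletes, writes := Diff(current, desired)
	fmt.Println("deletes:", deletes)
	fmt.Println("writes:", writes)
}
```

Without the `current` slice there is nothing to diff against, so only the write half of the mutation set could be produced.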

Decision 3: Hybrid fast path and slow path rather than pure AI reasoning

The temptation when designing an AI-native architecture is to route everything through the reasoning layer. It is simpler to build and sidesteps the complexity of maintaining a rules cache.

The hybrid approach is harder to build but architecturally correct for three reasons. It is significantly cheaper at scale: the majority of PermissionSet events in a mature deployment will be served from the fast path at negligible inference cost. It is faster: the fast path completes in milliseconds, whereas the slow path involves inference latency. And it is more trustworthy: the fast path serves human-verified mutations rather than AI-generated ones, providing a stronger basis for authorization decisions that carry the highest compliance stakes.

Decision 4: Schema injection at runtime rather than fine-tuning

The most architecturally significant prompt engineering decision in this architecture is injecting the OpenFGA relationship schema into the Claude system prompt at runtime rather than attempting to fine-tune a model on a specific deployment’s authorization model.

Schema injection means the reasoning layer is immediately portable across different OpenFGA deployments with different relationship models. A different operator with a different schema gets the same architecture with a different system prompt. No retraining. No fine-tuning. No model management overhead. The reasoning layer adapts to the deployment it is operating in rather than requiring the deployment to conform to a pre-trained model. It also means Claude is constrained to produce only tuple types that exist in the schema, which is the primary defence against hallucinated tuple mutations.

The Manual Proof of Concept

Rather than wait for the full code implementation to be production-quality before sharing anything, I have published a manual proof of concept as a GitHub Gist that anyone can run in approximately thirty-five minutes with no code, no infrastructure, and no cloud spend.

The Gist contains eight files: a README with TMF 672 v5.0.1 terminology reference and setup instructions, a Claude system prompt to copy directly into Claude.ai, four example TMF 672 PermissionSet notification event files, an OpenFGA authorization model for the Playground, and a step-by-step demo walkthrough with expected outputs.

The four event files use the correct TMF 672 v5.0.1 event types and payload structure: PermissionSetCreateEvent for the reseller grant scenario, PermissionSetDeleteEvent for the AI agent consent revocation scenario, PermissionSetChangeEvent for the credit limit restriction scenario, and PermissionSetDeleteEvent for the partner contract expiry scenario. Each event carries PermissionSpecification function and action fields. The AI agent scenario correctly models the agent as a SecurityPrincipal with referredType Resource, which is the precise TMF 672 v5.0.1 framing for autonomous software process actors.

The workflow is straightforward. Paste the system prompt into Claude.ai. Paste an event file as your next message. Claude returns the tuple mutations with reasoning that references the specific PermissionSpecification and a confidence score. Paste those mutations into the OpenFGA Playground at play.fga.dev. Run authorization checks before and after to observe how the authorization graph changes.

GitHub Gist

The Gist link [Readme.md] provides all manual demo resources.

This manual proof of concept demonstrates the slow path reasoning. What it defers to the full code implementation is the automated event ingestion, the Go orchestration layer with enrichment and TransitionKey construction, the fast path rules cache with Security Principal ID substitution at runtime, the AWS Bedrock integration, the Step Functions review gate, and the persistent OpenFGA store. The reasoning pattern, the system prompt design, and the tuple mutation outputs are identical between the manual proof of concept and the production architecture.

What the Full Production Implementation Requires

For practitioners considering this architecture in an enterprise context, here is an honest account of what the proof of concept defers and what a production implementation would need to address.

TMF 672 event stream integration: The proof of concept uses manual JSON inputs. A production implementation connects to the live TMF 672 PermissionSet event stream. The orchestration service subscribes to PermissionSetCreateEvent, PermissionSetChangeEvent, and PermissionSetDeleteEvent notifications. The event stream is fed by whatever upstream Manageable Asset state changes trigger PermissionSet mutations in the operator’s BSS landscape. The translation layer does not need to know or care about those upstream sources.

Event infrastructure: A production implementation replaces manual JSON inputs with a Kafka or AWS EventBridge consumer. The orchestration service and reasoning layer components are identical. The event ingestion mechanism changes.

Enrichment layer: The production orchestration service enriches each event before routing by reading the current OpenFGA tuples for the affected Security Principal and retrieving market and regulatory flags. The enrichment step is a precondition for TransitionKey construction and for meaningful Claude reasoning.

AWS Bedrock migration: The proof of concept uses Claude.ai directly. A production implementation uses AWS Bedrock with a VPC private endpoint so that Security Principal identity data never leaves your cloud security boundary. The system prompt and reasoning output structures are identical. Only the transport and authentication mechanism changes.

Persistent OpenFGA store: The proof of concept uses an in-memory OpenFGA instance that resets on container restart. A production implementation uses a PostgreSQL or MySQL backed store with appropriate backup and recovery procedures.

Confidence threshold calibration: The 0.70 confidence threshold in the proof of concept is a starting heuristic. A production implementation calibrates this empirically against a sample of real PermissionSet events, observing where Claude’s self-assessed confidence correlates with human reviewer decisions.

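Batching for large mutation sets: OpenFGA caps the number of tuple operations in a single write request. The open-source server defaults to 100, configurable through its max-tuples-per-write setting, and hosted deployments may enforce a lower limit. Complex PermissionSet deletions that generate more mutations than the limit require batching, with compensating transaction logic to handle partial failures.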
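The batching half of that requirement is straightforward, as the Go sketch below suggests. The per-request limit is passed in as a parameter because it depends on the deployment; the value 10 in `main` is only an example. The sketch deliberately does not implement the compensating logic: each batch would be a separate write request, so atomicity across batches is lost and has to be handled separately.

```go
package main

import "fmt"

// Tuple mirrors an OpenFGA relationship tuple.
type Tuple struct{ User, Relation, Object string }

// Batches splits ops into consecutive slices of at most maxPerWrite items.
// It returns nil when there is nothing to split or the limit is not positive.
func Batches(ops []Tuple, maxPerWrite int) [][]Tuple {
	if maxPerWrite <= 0 {
		return nil
	}
	var out [][]Tuple
	for start := 0; start < len(ops); start += maxPerWrite {
		end := start + maxPerWrite
		if end > len(ops) {
			end = len(ops)
		}
		out = append(out, ops[start:end])
	}
	return out
}

func main() {
	ops := make([]Tuple, 23)
	for i := range ops {
		ops[i] = Tuple{"agent:bot-7", "can_read", fmt.Sprintf("customer_account:%d", i)}
	}
	for i, b := range Batches(ops, 10) {
		fmt.Printf("batch %d: %d operations\n", i, len(b))
	}
}
```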

Step Functions review gate: The proof of concept implements a simple confidence threshold gate. A production implementation replaces this with an AWS Step Functions state machine providing a proper approval workflow, timeout escalation, and audit logging.

What This Architecture Is and Is Not

I want to close the series with a precise statement of scope because the architecture has been described across four articles and precision matters.

This architecture is an authorization infrastructure pattern that uses AI reasoning for a specific, bounded function: translating TMF 672 PermissionSet notification events into OpenFGA tuple mutations when deterministic rules are insufficient. It is not a general-purpose AI agent deployment framework. It is not a replacement for existing IAM or identity management systems. It is not a claim that AI reasoning should be introduced into every authorization decision.

The AI layer earns its place in this architecture because it is doing something the system cannot do without it: reasoning about the authorization implications of novel PermissionSet events that no rules engine has anticipated, whether those events originated from a commercial contract expiry, a consent asset withdrawal, a billing threshold breach, or an IoT Manageable Asset changing operational state. Remove it and the system falls back to the manual processes described in Article 1. The same trouble tickets. The same propagation lag. The same authorization drift that AI agents inherit and amplify as SecurityPrincipals acting on stale PermissionSet state.

That is the precise scope. No more and no less.

The Broader Observation

The authorization gap described in this series is not a telco-specific problem. It is a telco-specific manifestation of a problem that exists in every enterprise deploying AI agents into complex, multi-role environments where SecurityPrincipal permissions change constantly and authorization state struggles to keep pace.

TMF 672’s PermissionSet model, a clean separation of what is permitted (PermissionSpecification), how permissions are grouped (PermissionSpecificationSet), and what has been granted to whom (PermissionSet), is an industry-curated answer to something genuinely complex. It is also a transferable pattern. Healthcare systems managing patient consent and clinician PermissionSets. Financial services platforms managing customer tier changes and advisor access grants. Public sector environments managing citizen service entitlements and case worker permission assignments.

Every industry has Manageable Assets whose state changes carry authorization intent. Every industry has Security Principals, both human and autonomous, whose PermissionSets need to reflect those changes in real time. Every industry has the same gap between when a permission changes and when the systems acting on that permission reflect the change.

The architecture described in this series closes that gap for telco. The pattern is transferable everywhere the gap exists.

The authorization gap is everywhere. The pattern for closing it is here.

A Final Question for the Series

Over the past four weeks the closing question in each article has generated the most substantive conversations in the comments. This is the last one.

If you are designing or evaluating authorization architectures for AI agent deployments in your own environment, what is the specific constraint making it hardest to solve? The data boundary question, the explainability requirement, the manual PermissionSet authoring bottleneck, Manageable Asset event integration complexity, or something else entirely?

I am building the full code implementation and the answers to that question will directly inform what I prioritise and what I document when I share it.

The comment section is open. So is my inbox.

Soumit Saha is a Digital Platform and Technical Architect with 25 years of experience in telco, cloud, and enterprise integration. He has led the adoption of TM Forum Open APIs across multiple markets and holds TOGAF 9, ODA Practitioner, and AWS Cloud Architect certifications. This series represents his personal architectural thinking and does not reflect the views or systems of any employer.

The build continues. The manual proof of concept is live now. The full code implementation follows. Follow along for updates.

Originally published at https://www.linkedin.com.

The Authorization Gap, Closed: A Practitioner’s Blueprint for TMF 672, OpenFGA, and Claude as an… was originally published in System Weakness on Medium, where people are continuing the conversation by highlighting and responding to this story.

Why the Translation Layer Needs to Reason, Not Just Route: Claude, AWS Bedrock, and the Security…

Soumit Saha — Fri, 05 Jun 2026 13:24:36 GMT

Why the Translation Layer Needs to Reason, Not Just Route: Claude, AWS Bedrock, and the Security Boundary Question

Third in a four-part series: ReBAC Meets BSS, A Practitioner’s Blueprint for AI-Native Role-Based Access in Telco

If you have followed this series from the beginning, you will have arrived at this article with a reasonable objection forming. I know this because it is the same objection every architect I respect would raise.

Why not just build a rules engine?

Map the known TMF 672 role transitions to OpenFGA tuple mutations deterministically. Encode the UK market constraints as rules. Add the NIS2 flags as conditions. Test it thoroughly. Deploy it. Done. No AI inference cost. No latency overhead. No explainability concerns. No security boundary questions.

It is a fair objection. And for a significant proportion of role transitions it is the right answer. I want to address it directly before proposing anything else.

The Rules Engine Is Right for the 80 Percent

A deterministic orchestration service handles the majority of TMF 672 role transitions perfectly well. The transitions that are well understood, repeatable, and market-agnostic do not need a reasoning layer. They need a fast, reliable, cheap translation service.

A note on implementation language. The orchestration service in this architecture is described using Go. Go reduces the entry barrier for experimentation: its concurrency model suits event-driven workloads, its binary compilation simplifies deployment, and its standard library handles the HTTP and JSON requirements without additional dependencies. That said, the orchestration layer is not language-prescriptive. A Java-based implementation using an open-source rules engine such as Drools would handle the fast path routing logic with equal capability and may be the natural choice in organisations where Java is the established platform language. An inner-sourced rules engine built on existing enterprise frameworks is equally valid. The pattern is what matters. The implementation language is an operational decision.

A consumer upgrades from a standard plan to a premium plan. The tuple mutations are known. The product scope is fixed. The regulatory implications are unchanged. A rules engine handles this in milliseconds at negligible cost. There is no justification for sending this transition to an AI reasoning layer.

The rules engine is not the wrong answer. It is the right answer for the cases it can handle reliably. The architecture I am proposing uses it as the fast path for precisely this reason.

The problem is the other 20 percent.

Where the Rules Engine Breaks Down

A Tier-1 telco operating across multiple markets, regulatory frameworks, product scopes, and partner relationship models generates a continuous stream of role transitions that no rules engine can fully anticipate. Not because the rules engine is poorly built. Because the problem space exceeds what deterministic rules can reliably express.

Consider three examples from Article 2.

A consumer in the UK transitions to a ResellerPartner role scoped to Enterprise Broadband and VPN products under NIS2 constraints. The rules engine handles this if the transition has been explicitly modelled. But what if this is the first time this specific combination of product scope, market, and partner tier has occurred in this deployment? The rules engine has no rule for it. It fails, routes to a human, or worse, silently produces an incomplete mutation set.

A B2B2C reseller partner in mid-contract renegotiation has their tier temporarily elevated while commercial terms are agreed. The elevation is conditional on milestone completion and subject to a thirty-day review. The rules engine cannot express conditional, time-bounded, milestone-dependent tuple mutations. It was not designed to. The problem requires contextual reasoning about the business state, not pattern matching against pre-encoded transitions.

An enterprise account mid-migration between two connectivity products holds a complex authorization state that exists nowhere in the rules catalogue because it was assembled from a sequence of transitions that individually were routine but in combination produced a novel permission configuration. Resolving it requires understanding the history of how the account arrived at its current state, not just the current event.

In a Tier-1 operator processing thousands of role transitions daily, 20 percent is not a small number. And the transitions in that 20 percent are disproportionately the ones with the highest commercial stakes, the most complex regulatory implications, and the greatest potential for authorization drift if handled incorrectly.

This is where the reasoning layer becomes necessary. Not as a replacement for the rules engine. As the complement to it.

The Hybrid Architecture: Fast Path, Slow Path

The architecture that fills the broken arrow from Article 2 is a hybrid. Not AI instead of rules. AI alongside rules, each handling the subset of the problem it is best suited for.

Here is how it works end to end.

A TMF 672 role change event fires. The orchestration service receives it, validates it against the TMF 672 schema, and enriches it with two pieces of context retrieved from internal sources: the current OpenFGA tuples held by the party, and the market and regulatory flags applicable to this transition.

The enriched event reaches a routing decision. The orchestration service checks a rules cache keyed on the combination of previous role, new role, market, and regulatory tier. In a fresh system the cache is empty and every transition goes through the slow path. This is correct and expected. The cache populates over time through operational learning, which I will describe shortly.

If the transition pattern exists in the cache with a promoted confidence value, the fast path fires. Pre-validated tuple mutations are served directly. The OpenFGA Write endpoint is called immediately. Claude is never invoked. No review gate approval is required. The audit trail records the fast path execution.

If the transition is novel or not yet in the cache, the slow path fires. The enriched event is sent to Claude on AWS Bedrock. Claude receives three inputs: the enriched role transition event in structured form, the current OpenFGA relationship schema for this deployment, and the relevant market and regulatory flags. It reasons about which tuple mutations are required and returns structured output containing the mutations, a plain language justification, a confidence score between zero and one, and any regulatory flags raised.
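The structured output lends itself to strict decoding before anything downstream trusts it. The Go sketch below assumes a JSON shape with `mutations`, `confidence`, and `regulatory_flags` fields and a per-mutation `justification`; these field names are illustrative, not a contract defined by the series.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Mutation is a hypothetical shape for one proposed tuple change.
type Mutation struct {
	Op            string `json:"op"` // "write" or "delete"
	User          string `json:"user"`
	Relation      string `json:"relation"`
	Object        string `json:"object"`
	Justification string `json:"justification"`
}

// ReasoningOutput is the assumed envelope returned by the slow path.
type ReasoningOutput struct {
	Mutations       []Mutation `json:"mutations"`
	Confidence      float64    `json:"confidence"`
	RegulatoryFlags []string   `json:"regulatory_flags"`
}

// ParseOutput decodes strictly and rejects out-of-range or incomplete output.
func ParseOutput(raw []byte) (ReasoningOutput, error) {
	var out ReasoningOutput
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&out); err != nil {
		return out, fmt.Errorf("decode reasoning output: %w", err)
	}
	if out.Confidence < 0 || out.Confidence > 1 {
		return out, fmt.Errorf("confidence %v outside [0,1]", out.Confidence)
	}
	for i, m := range out.Mutations {
		if m.Op != "write" && m.Op != "delete" {
			return out, fmt.Errorf("mutation %d: unknown op %q", i, m.Op)
		}
		if m.Justification == "" {
			return out, fmt.Errorf("mutation %d: missing justification", i)
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`{"mutations":[{"op":"delete","user":"agent:bot-7","relation":"can_read",
		"object":"customer_account:acme","justification":"consent withdrawn"}],
		"confidence":0.82,"regulatory_flags":[]}`)
	out, err := ParseOutput(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(out.Mutations), out.Confidence)
}
```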

The Two Confidence Scores and What They Each Do

This is worth being precise about because they are different values doing different jobs.

Claude’s runtime confidence score is returned by Claude on every slow path inference call. It reflects Claude’s self-assessed certainty about its proposed mutations given the context it received. Factors that push it higher include clear, unambiguous role semantics, obvious schema mappings, and simple single-layer transitions. Factors that push it lower include novel market and regulatory combinations, complex multi-layer B2B2C transitions, and ambiguous current tuple states. This score routes the output through the human review gate tiers and is then logged to the audit trail. It is consumed by the slow path only: apart from that audit entry, it is not stored or reused for any later routing decision.

The promoted cache confidence is a separate value assigned by the orchestration service when a slow path output is promoted to the fast path rules cache. It is a fixed high value, typically 0.99, assigned after a transition pattern has accumulated a defined number of consistent human approvals without modification. It signals that this mutation set has been operationally verified and is trusted for deterministic execution without Claude or a review gate.

The fast path never uses Claude’s runtime confidence. The slow path never uses the promoted cache confidence. They operate in separate parts of the architecture and serve separate purposes.

A structural validation step in the orchestration service adds a second signal alongside Claude’s runtime confidence. Before routing through the review gate, the service checks that every proposed tuple type exists in the OpenFGA schema and that every user type is valid for its assigned relation. A mutation set that fails structural validation is blocked regardless of how high Claude’s confidence score is. The combined signal gives the review gate a more robust routing basis than confidence alone.
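A structural check of this kind could be sketched as follows. The schema summary here is a hand-written dictionary with hypothetical types and relations. A real deployment would derive it from the OpenFGA authorization model rather than hard-code it.

```python
# Hypothetical schema summary: object type -> relation -> user types allowed on it.
SCHEMA = {
    "account": {"account_holder": {"party"}, "delegate": {"party", "agent"}},
    "product": {"reseller": {"party"}},
}


def validate_mutation(user: str, relation: str, obj: str, schema=SCHEMA) -> list:
    """Return the structural errors for one proposed tuple (empty list if valid)."""
    user_type = user.split(":", 1)[0]
    object_type = obj.split(":", 1)[0]
    relations = schema.get(object_type)
    if relations is None:
        return [f"unknown object type: {object_type}"]
    allowed_user_types = relations.get(relation)
    if allowed_user_types is None:
        return [f"unknown relation {relation!r} on type {object_type}"]
    if user_type not in allowed_user_types:
        return [f"user type {user_type!r} is not valid for {object_type}#{relation}"]
    return []


def structurally_valid(mutations, schema=SCHEMA) -> bool:
    """A mutation set passes only if every tuple in it passes."""
    return all(not validate_mutation(u, r, o, schema) for (u, r, o) in mutations)


ok = structurally_valid([("party:123", "account_holder", "account:9")])
wrong_user_type = structurally_valid([("agent:a1", "account_holder", "account:9")])
unknown_relation = validate_mutation("party:123", "owner", "account:9")
```

The check is deliberately independent of the confidence score, so a confidently wrong proposal is still stopped.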

The routing logic with both signals applied:

Structural validation FAILED
→ Block regardless of confidence score

Structural validation PASSED + confidence >= 0.90
→ Lightweight review, fast approval expected

Structural validation PASSED + confidence >= 0.70 and < 0.90
→ Standard review queue

Structural validation PASSED + confidence < 0.70
→ Deep review or block pending expert assessment
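Expressed as code, the routing above is a short function. This sketch treats the bands as half-open intervals so that a score such as 0.895 lands in the standard queue. The tier labels are my own shorthand.

```python
def review_tier(structurally_valid: bool, confidence: float) -> str:
    """Combine the structural validation result and runtime confidence into a review tier."""
    if not structurally_valid:
        return "blocked"          # blocked regardless of confidence
    if confidence >= 0.90:
        return "lightweight"      # fast approval expected
    if confidence >= 0.70:
        return "standard"         # standard review queue
    return "deep"                 # deep review or block pending expert assessment
```

Checking structural validity first is what guarantees that a high confidence score can never override a failed validation.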

The Learning Mechanism: How the Cache Populates

The system improves over time through operational learning, not model retraining.

When a slow path output for a specific transition pattern has been approved by a human reviewer a defined number of times without modification, the orchestration service promotes that pattern to the fast path rules cache. The TransitionKey, a combination of previous role, new role, market, and regulatory tier, is added to the cache with the pre-validated mutations and a promoted confidence of 0.99.

The next time that exact transition pattern arrives, the fast path serves it directly. Claude is not invoked. The review gate is bypassed. The audit trail records the fast path execution.

This has two practical consequences. First, the system gets faster and cheaper as the cache grows. The proportion of transitions requiring Claude inference decreases as the deployment matures. Second, the rules engine eventually reflects the real-world authorization semantics of this specific deployment rather than a theoretically authored set of rules that may not match operational reality.

The cache starts empty in every fresh deployment. Every transition begins on the slow path. This is not a limitation. It is the mechanism by which the system learns from its own operational history.
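The promotion step can be sketched as a small counter. Two details below are my assumptions, since the text only specifies "a defined number" of unmodified approvals: a rejected or modified review resets the count, and a proposal that differs from the previous one restarts it.

```python
PROMOTED_CONFIDENCE = 0.99


class PromotionTracker:
    """Counts consecutive unmodified approvals per transition key."""

    def __init__(self, threshold: int):
        self.threshold = threshold   # the "defined number" of approvals; deployment-specific
        self.streaks = {}            # key -> (mutations, consecutive approval count)
        self.fast_path_cache = {}    # key -> promoted rule

    def record_review(self, key, mutations, approved_unmodified: bool) -> bool:
        """Record one review outcome; return True once the key is promoted."""
        if key in self.fast_path_cache:
            return True
        if not approved_unmodified:
            self.streaks.pop(key, None)          # assumption: any change resets the streak
            return False
        previous, count = self.streaks.get(key, (mutations, 0))
        count = count + 1 if previous == mutations else 1   # differing output restarts the count
        self.streaks[key] = (mutations, count)
        if count >= self.threshold:
            self.fast_path_cache[key] = {
                "mutations": mutations,
                "promoted_confidence": PROMOTED_CONFIDENCE,
            }
            return True
        return False


tracker = PromotionTracker(threshold=3)
key = ("consumer", "business_account_holder", "UK", "standard")
mutations = (("write", "party:123", "account_holder", "account:9"),)
outcomes = [tracker.record_review(key, mutations, True) for _ in range(3)]
```

With a threshold of three, the third consecutive unmodified approval is the one that promotes the pattern.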

The Human Review Gate

The review gate is the enterprise trust mechanism that makes this architecture deployable today rather than in a future state where AI reasoning is trusted to operate without oversight.

It is not an admission that Claude cannot be trusted. It is a configurable risk control that applies the same tiered governance logic enterprises already use for financial approvals, change management, and access control requests. The pattern is familiar. The application to AI reasoning output is new.

Three tiers in the review gate:

High confidence transitions with no regulatory flags proceed to lightweight review. A reviewer sees Claude’s plain language reasoning, the proposed mutations in human-readable form, and approves or dismisses within a short window. The expectation is that the vast majority of slow path transitions fall here.

Medium confidence or medium severity regulatory flags route to a standard review queue with a defined response window. The reviewer examines the proposed mutations in detail before anything is written.

Low confidence or high severity flags are blocked pending senior review with a shorter response window. Timeout escalation prevents indefinite stalling on a difficult case.
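One way to read these three tiers is that the stricter of the two signals wins. The sketch below reuses the 0.90 and 0.70 confidence thresholds from the routing logic. It assumes that any regulatory flag at all rules out lightweight review, which is my interpretation of "no regulatory flags", and the severity labels are placeholders.

```python
TIERS = ["lightweight", "standard", "senior"]   # ordered from least to most strict


def tier_from_confidence(confidence: float) -> int:
    if confidence >= 0.90:
        return 0
    if confidence >= 0.70:
        return 1
    return 2


def tier_from_flags(severities) -> int:
    severities = list(severities)
    if "high" in severities:
        return 2
    return 1 if severities else 0   # assumption: any flag means at least standard review


def review_gate_tier(confidence: float, severities=()) -> str:
    """Pick the stricter of the confidence-based and flag-based tiers."""
    return TIERS[max(tier_from_confidence(confidence), tier_from_flags(severities))]
```

Taking the maximum means a single high severity flag sends even a very confident proposal to senior review.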

The reviewer does not need to understand OpenFGA internals. Claude’s plain language justification does that translation. The reviewer is making a business and compliance judgement, not a technical one.

Approved outputs increment the approval count for their TransitionKey. At the promotion threshold, the pattern moves to the fast path. The gate that governs slow path outputs today is building the fast path that makes them unnecessary tomorrow.

Why This Is AI-Native, Not AI-Bolted-On

The distinction matters and is worth being precise about.

Most enterprise AI proposals add a conversational interface on top of an existing system. The underlying system is unchanged. The AI is a layer of convenience. Remove it and the system still functions, perhaps less pleasantly, but functionally.

What this architecture proposes is different. The AI reasoning layer is performing a function the system cannot perform without it. The translation from TMF 672 role semantics to OpenFGA tuple mutations across novel, multi-market, regulatory edge cases requires contextual reasoning that cannot be fully encoded as deterministic rules. Remove Claude from this architecture and the system does not degrade gracefully. It falls back to the manual processes described in Article 2. The same trouble tickets. The same propagation lag. The same operational risk the architecture was designed to eliminate.

The test for AI-native architecture is precise: is the AI doing something the system genuinely cannot do without it? In this architecture, for the subset of transitions that reach the slow path, yes.

The Security Boundary Question

Before an enterprise security team approves any architecture involving an external AI API, they will ask one question with predictable precision.

Does using Claude mean our party identity data leaves our cloud boundary?

It is the right question. TMF 672 role transition events contain party identifiers, market data, product scope, and regulatory context. This is commercially sensitive identity information subject to GDPR, NIS2, and telecoms-specific regulatory obligations. Sending it to any external API is a data boundary crossing that requires explicit risk assessment and approval.

The answer depends entirely on which deployment pattern you choose. There are four options and they are not equivalent.

Pattern 1: Direct Claude API

The TMF 672 event is sent directly to api.anthropic.com. The raw event including party identifiers crosses your cloud boundary to Anthropic-operated infrastructure. This is unacceptable for production. Your data protection officer will correctly block it. Do not use this pattern for PII-containing payloads.

Pattern 2: Anonymised Payload to Direct Claude API

Party identifiers are stripped before the boundary is crossed. Claude receives role transition semantics, product scope categories, and regulatory tier information only. A re-hydration function inside your boundary maps Claude’s output back to the actual party before writing to OpenFGA.

This is viable with trade-offs. Claude’s reasoning quality is reduced because it lacks the full party context that sometimes informs authorization decisions. The re-hydration logic adds complexity. And the anonymisation step must be audited carefully: partial anonymisation that leaves inferrable identifiers is not anonymisation.
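A minimal sketch of the strip-and-re-hydrate step, assuming random tokens replace party identifiers and the mapping is held only inside the boundary. The event field names are hypothetical. Note that token substitution on its own is pseudonymisation, so whether the remaining fields leave the party inferrable still needs the audit described above.

```python
import secrets


class Pseudonymiser:
    """Swaps party identifiers for random tokens; the mapping never leaves the boundary."""

    def __init__(self):
        self._forward = {}
        self._reverse = {}

    def tokenise(self, party_id: str) -> str:
        if party_id not in self._forward:
            token = f"party:tok-{secrets.token_hex(8)}"
            self._forward[party_id] = token
            self._reverse[token] = party_id
        return self._forward[party_id]

    def rehydrate(self, value: str) -> str:
        return self._reverse.get(value, value)


def anonymise_event(event: dict, pseudonymiser: Pseudonymiser) -> dict:
    """Allow-list the fields that may cross the boundary; everything else is dropped."""
    return {
        "party": pseudonymiser.tokenise(event["party_id"]),
        "previous_role": event["previous_role"],
        "new_role": event["new_role"],
        "product_scope": event["product_scope"],
        "regulatory_tier": event["regulatory_tier"],
    }


def rehydrate_mutations(mutations: list, pseudonymiser: Pseudonymiser) -> list:
    """Map tokens in the proposed tuples back to real party identifiers."""
    return [{**m, "user": pseudonymiser.rehydrate(m["user"])} for m in mutations]


p = Pseudonymiser()
event = {"party_id": "party:GB-0001", "name": "A. Customer", "previous_role": "consumer",
         "new_role": "business_account_holder", "product_scope": "mobile",
         "regulatory_tier": "standard"}
outbound = anonymise_event(event, p)
proposed = [{"user": outbound["party"], "relation": "account_holder", "object": "account:9"}]
restored = rehydrate_mutations(proposed, p)
```

Using an allow-list rather than deleting known sensitive fields means a new field added to the event later is excluded by default.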

Pattern 3: AWS Bedrock Standard

Claude inference runs within your AWS environment. The TMF 672 event never leaves your AWS boundary. AWS is the data processor under your existing AWS Data Processing Agreement. Anthropic never receives the data. Full CloudTrail audit logging of every inference call is available natively. Your existing AWS security controls wrap the entire flow.

This is the production-ready pattern for most enterprise environments. The data boundary question has a structural answer, not a configuration-dependent one. It resolves entirely because Anthropic-operated infrastructure is simply not in the data path.

Pattern 4: AWS Bedrock with VPC Private Endpoint

The same as Pattern 3 but inference traffic is routed through a VPC private endpoint, meaning it never traverses the public internet even within AWS. The TMF 672 event travels from your application over AWS private network infrastructure to Bedrock and back. Zero public internet exposure at any point.

This is the maximum security posture and the pattern I recommend for production deployments in regulated telco environments. It is defensible to a CISO, a Data Protection Officer, and an external auditor under GDPR and NIS2. The additional infrastructure overhead is a VPC endpoint configuration, which is a one-time setup within your existing AWS environment.

Add Bedrock Guardrails to either Bedrock pattern and you gain a native PII detection and redaction layer as an additional safety net. Bedrock Guardrails can detect and flag PII in inference payloads before they reach the model, adding a second layer of protection alongside your orchestration layer’s enrichment and validation logic.

A Note on Claude Platform on AWS

A relevant development worth addressing directly. Claude Platform on AWS reached general availability in May 2025, making it a current option for architects evaluating Claude deployment patterns.

The fundamental distinction is the operating model. AWS Bedrock is AWS-operated. Data stays within the AWS boundary. Anthropic never touches it. Claude Platform on AWS is Anthropic-operated. Data crosses to Anthropic-managed infrastructure. AWS handles authentication and billing only.

For TMF 672 role transition events containing party identity data, AWS Bedrock remains the correct recommendation. The AWS DPA is already in place as part of your existing AWS commercial relationship. The data boundary is structural. No separate Anthropic DPA review is required from your legal and procurement teams.

Claude Platform on AWS becomes worth evaluating in two specific scenarios. First, if you implement full anonymisation before the reasoning call, in which case no PII crosses the boundary and Claude Platform on AWS gives you same-day access to the latest Claude features without Bedrock’s feature lag. Second, as your architecture matures and the reasoning layer expands beyond tuple mutations into more complex agentic workflows where Claude’s Agent Skills and MCP connectors become relevant.

For the architecture described in this series, AWS Bedrock with VPC private endpoint is the right choice. Claude Platform on AWS is worth keeping on your evaluation list for future phases.

The Audit Trail as a First-Class Output

The second most consistent enterprise objection to AI reasoning layers is explainability. Not security. Explainability.

How do you audit what the AI decided and why? How do you respond to a regulator who asks why a specific authorization change was made? How does your compliance team demonstrate that authorization decisions were made with appropriate governance?

In this architecture the audit trail is not a secondary consideration added to satisfy compliance requirements. It is a designed output of the reasoning layer itself.

Every Claude inference call returns a plain language reasoning field alongside the tuple mutations. This field explains, in language a compliance officer can read without understanding OpenFGA internals, exactly why each tuple was created or revoked. It is logged alongside the mutations, the original TMF 672 event, the confidence score, the structural validation result, and the CloudTrail inference record.

The complete audit record for any authorization change answers four questions: what role transition triggered the change, what authorization mutations were made, why those specific mutations were made, and which human reviewer approved them before they were written.
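As an illustration, the audit record could be modelled as a single immutable structure. The field names and the reference format for the inference record are assumptions of mine. Only the list of contents comes from the description above.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass(frozen=True)
class AuditRecord:
    tmf672_event: dict                  # the original role transition event
    mutations: List[dict]               # tuple writes and deletes that were applied
    reasoning: str                      # plain language justification
    confidence: float
    structural_validation_passed: bool
    approved_by: Optional[str]          # reviewer identifier
    inference_record_ref: str           # hypothetical pointer to the CloudTrail entry

    def four_answers(self) -> dict:
        """The four questions a complete audit record is meant to answer."""
        return {
            "what_triggered_it": self.tmf672_event,
            "what_changed": self.mutations,
            "why": self.reasoning,
            "who_approved": self.approved_by,
        }

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


record = AuditRecord(
    tmf672_event={"party": "party:123", "previous_role": "consumer",
                  "new_role": "business_account_holder"},
    mutations=[{"op": "write", "user": "party:123",
                "relation": "account_holder", "object": "account:9"}],
    reasoning="Party upgraded to business account holder; consumer access replaced.",
    confidence=0.93,
    structural_validation_passed=True,
    approved_by="reviewer:jdoe",
    inference_record_ref="cloudtrail:example-event-id",
)
serialised = record.to_json()
```

Keeping the record frozen and serialising it in one piece makes it harder for the reasoning and the mutations to drift apart in storage.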

This directly addresses the explainability requirements under GDPR Article 22, NIS2 audit obligations, and the telecoms-specific regulatory frameworks that govern authorization decisions in Tier-1 operator environments.

The Investment Case

This is not an AI experiment. It is authorization infrastructure that happens to use AI reasoning for the subset of transitions that deterministic rules cannot handle reliably.

It keeps all party identity data inside your existing AWS security boundary. It produces an audit trail that satisfies regulatory explainability requirements. It has a human review gate that preserves governance without sacrificing automation. It starts on the slow path and learns its way to the fast path through operational approval cycles. And it directly unblocks the enterprise AI agent deployments your organisation has already committed to but cannot yet deploy safely without a dynamic, context-aware authorization layer underneath them.

The AI agents you are building are only as trustworthy as the authorization layer they operate on. That layer cannot be static. It cannot be manually maintained. And it cannot be built from deterministic rules alone when the problem space exceeds what rules can reliably express.

Next week I close the series with the implementation blueprint and what I learned designing it. The build continues beyond the series.

Soumit Saha is a Digital Platform and Technical Architect with 25 years of experience in telco, cloud, and enterprise integration. He has led the adoption of TM Forum Open APIs across multiple markets and holds TOGAF 9, ODA Practitioner, and AWS Cloud Architect certifications. This series represents his personal architectural thinking and does not reflect the views or systems of any employer.

Next: Article 4, The Authorization Gap, Closed: A Blueprint for TMF 672, OpenFGA, and Claude on AWS Bedrock

Originally published at https://www.linkedin.com.

Why the Translation Layer Needs to Reason, Not Just Route: Claude, AWS Bedrock, and the Security… was originally published in System Weakness on Medium.

ReBAC Meets BSS: Why TMF 672 and OpenFGA Belong Together

Soumit Saha — Fri, 05 Jun 2026 13:24:34 GMT

Part 2 of 4: The gap nobody is designing for.

Second in a four-part series: ReBAC Meets BSS, A Practitioner’s Blueprint for AI-Native Role-Based Access in Telco

Last week (Part 1) I described three anti-patterns that I believe are taking shape quietly inside enterprise AI deployments: the over-permissioned agent, the under-permissioned agent, and the stale-permission agent. I argued that all three share a common structural cause: the gap between when a party’s role changes and when the authorization systems acting on behalf of that party reflect that change.

Before introducing the two technical primitives I believe belong at the foundation of a proper solution, I want to ground that abstract gap in something concrete. Because the authorization gap is not a future problem waiting to emerge when AI agents mature. It is a present operational reality that AI agents will inherit and amplify.

The question worth sitting with is this: how many ways can the authorization state of a single party legitimately change in one day inside a Tier-1 telco operation?

The answer is more than most authorization architectures are designed to handle in real time.

The Many Faces of Authorization Change

Authorization state does not change through a single well-defined mechanism. It changes through a continuous stream of business events, each originating in a different system, each carrying different urgency, and each requiring a different set of downstream permission updates. Here are five of the most consequential.

The AI personal assistant whose mandate just ended

A consumer grants an AI personal assistant permission to manage their account. View bills, raise fault reports, modify add-on services. The consent is explicit, scoped, and recorded in a consent management system. The assistant begins operating on their behalf.

Three weeks later the customer withdraws consent. Perhaps they are uncomfortable with a recent interaction. Perhaps they simply changed their mind. The withdrawal is recorded.

How quickly does that withdrawal propagate to every system the assistant has been operating in? The billing system. The fault management platform. The product catalogue. The CRM. The network management layer. Each of these systems granted the assistant access when consent was established. Each needs to revoke that access when consent is withdrawn.

In most current implementations the answer is: not immediately. The withdrawal triggers a propagation process that touches each downstream system in sequence. The assistant may continue operating with full account permissions for hours or days after the customer believed they had withdrawn access. This is not a hypothetical future risk. Consent management APIs exist today. AI personal assistants are being deployed today. The propagation lag is a present architectural gap.

The credit limit that changed everything at midnight

A consumer hits their credit limit at 11:47pm. Their service is automatically restricted by the billing system. The restriction is a business event with immediate authorization implications: every system acting on behalf of this customer needs to operate within the restricted permission set from this moment forward.

The AI customer service agent handling their next interaction at 8am the following morning was initialized with the pre-restriction permission context. It offers the customer options they are no longer entitled to. Not because the AI reasoned incorrectly. Because the authorization context it was given reflects a financial state that no longer exists.

The partner that was offboarded on Friday afternoon

A reseller partner relationship is terminated. The offboarding is processed in the partner management system on a Friday afternoon. Across the B2B2C landscape that partner has been operating in, they hold permissions across a billing platform, a service management portal, a product catalogue, a customer data system, and a network configuration tool.

Revoking those permissions requires coordinated action across five teams managing five systems. By Monday morning, three of the five have been updated. Two have not. The partner’s access to customer data persists through the weekend. An AI agent operating in the partner portal context continues to honour permissions that should no longer exist.

The customer who exercised their right to erasure

A customer submits a GDPR right to erasure request. The request is acknowledged and logged. The erasure process begins. It touches the CRM, the billing history, the fault management records, the marketing preferences system, and the analytics platform.

While the erasure process is running, an AI personalisation agent is still operating on that customer’s profile. It has not been told the profile is in the process of being erased. Its authorization context has not been updated to reflect the regulatory event that is actively changing the customer’s data status. It continues to make personalisation decisions based on data that is legally in the process of ceasing to exist.

The trouble ticket that never closed properly

A field engineer raises a trouble ticket to diagnose a connectivity fault at a customer premises. The ticket triggers a temporary permission elevation scoped to the customer’s network segment. Standard procedure. The fault is resolved. The ticket is closed. The permission revocation is a separate manual step in a separate system. In the volume of daily operations, the step is missed.

Three weeks later the engineer still holds access they no longer need. The authorization system reflects a reality that ceased to exist when the ticket closed. Nobody knows. Nothing alerts. The gap is invisible until an access review surfaces it, which may be months away.

These five scenarios share the same structural characteristic. A legitimate business event changes the authorization reality of a party. The authorization layer does not reflect that change in a timely or reliable way. The gap is filled, imperfectly, by manual processes that depend on human memory, correctly configured workflows, and downstream systems being updated in the right sequence.

For human-operated systems, this lag has always been a compliance risk managed through periodic access reviews. It is tolerable, just about, because human operators carry contextual awareness that partially compensates for the authorization layer’s staleness. A human support agent knows their temporary elevation is scoped to a specific incident. They apply judgment. They do not use elevated access for tasks outside that scope.

An AI agent operating with the same temporarily elevated permission carries no such contextual awareness unless it is explicitly encoded in the authorization layer. The agent does not know the permission is temporary. It does not know it is scoped to a specific incident. It will use whatever permissions it has been given for whatever tasks it is asked to perform, because that is precisely what it was designed to do.

This is not a failure of the AI model. It is an architectural gap. The permission context the agent receives does not carry the metadata that a human operator would naturally apply. Closing that gap requires the authorization layer to encode not just what is permitted, but the conditions and constraints under which it is permitted, and to keep that encoding current as the operational reality around it evolves.
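To make "conditions and constraints" concrete, here is a small sketch of a grant that carries its own scope and expiry, using the field engineer scenario. The structure and names are hypothetical. The point is only that the check evaluates the conditions at call time instead of trusting that a revocation step was run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ScopedGrant:
    user: str
    relation: str
    obj: str
    incident_id: str       # the ticket the elevation was raised for
    expires_at: datetime   # hard expiry, independent of any manual revocation step


def is_permitted(grant, user, relation, obj, incident_id, now) -> bool:
    """Allow only the same user, relation, object and incident, and only before expiry."""
    return (
        (grant.user, grant.relation, grant.obj) == (user, relation, obj)
        and grant.incident_id == incident_id
        and now < grant.expires_at
    )


opened = datetime(2026, 1, 5, 9, 0, tzinfo=timezone.utc)
grant = ScopedGrant("engineer:42", "diagnose", "segment:cust-17", "ticket:881",
                    expires_at=opened + timedelta(hours=8))

during = is_permitted(grant, "engineer:42", "diagnose", "segment:cust-17",
                      "ticket:881", now=opened + timedelta(hours=1))
other_task = is_permitted(grant, "engineer:42", "diagnose", "segment:cust-17",
                          "ticket:999", now=opened + timedelta(hours=1))
three_weeks_on = is_permitted(grant, "engineer:42", "diagnose", "segment:cust-17",
                              "ticket:881", now=opened + timedelta(weeks=3))
```

An agent holding this grant is refused both for a different incident and after the expiry, even if nobody remembered to revoke anything.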

A Brief and Entirely Plausible Horror Story

Consider, purely as a thought experiment, the following sequence of events.

A field engineer raises a trouble ticket to diagnose a connectivity fault at a customer premises. The ticket triggers a temporary permission elevation scoped to the customer’s network segment. Standard procedure. The ticket is resolved. The permission is not revoked. Also standard procedure, regrettably.

Three months later a firmware update is pushed to the customer’s estate. The update touches a smart building management controller. The controller, now operating with a service account that inherited a fragment of the unrevoked elevated permission through a poorly scoped role propagation, begins reporting environmental telemetry to a network management endpoint it has no business accessing.

The network management AI agent, doing exactly what it was designed to do, notices the telemetry, infers a device is requesting a configuration update, and helpfully provisions the controller with expanded network visibility.

The controller is now, functionally, a network administrator. It did not ask to be. Nobody authorized it to be. A trouble ticket from three months ago and a firmware update from last week conspired to promote it.

This is not science fiction. Every step in that sequence is a documented pattern in operational technology environments. The IoT device did not attack anything. It simply existed in an environment where the authorization layer had not kept pace with the operational reality around it. The AI agent did not malfunction. It acted entirely rationally on the permissions it was given.

The hair dryer that wins a weatherman’s bet does not need to understand meteorology. It just needs to be plugged in at the right moment.

A word on the choice of TMF 672. The telecommunications industry has spent decades curating a deceptively simple model for something genuinely complex: how a party acquires, holds, and loses roles and the permissions those roles carry. That model, expressed in TMF 672, is not just a telco artefact. Party, role, and permission are universal authorization concepts. TMF 672 gives them a clean industry-validated shape that makes the real problems in authorization easier to see, easier to name, and easier to design against. That is why it anchors this series.

TMF 672: The Authorization Event Stream Nobody Is Using as One

Most architects who work with TM Forum Open APIs treat TMF 672, the User Roles and Permissions API, as exactly what its name suggests: an API for managing the lifecycle of the roles and permissions assigned to parties across a BSS landscape. Create a role. Update a role. Query a role. Delete a role. Standard operations on a standard resource.

I want to propose a different way of reading it.

Every TMF 672 role transition event is an authorization intent signal. When a consumer becomes a business account holder, that is not just a record update in a CRM. It is a statement that the downstream authorization state of every system acting on behalf of that party needs to change. When a reseller partner is downgraded, that is not just a tier change in a billing system. It is a cascading instruction to revoke, modify, and constrain permissions across every touchpoint that party interacts with. When an AI personal assistant’s consent is withdrawn, the consent event is an authorization revocation instruction directed at every system that assistant has been operating in.

TMF 672 is already in production in most Tier-1 telco operators. The authoritative source of party role truth exists. The event stream that signals when authorization state needs to change exists. What does not exist, in most implementations, is the intelligence to act on those signals in a timely, accurate, and auditable way.

Instead, what exists is the operational reality described above. Manual processes. Trouble tickets. Permission propagation that depends on human memory and correctly sequenced workflows across disconnected systems.

The gap is not a data gap. The data is there. It is a translation gap. Between the business event that TMF 672 records and the authorization state that downstream systems, and increasingly AI agents, act upon, there is a translation process that is currently slow, manual, and error-prone.

That translation process is what this series is ultimately about. But before introducing the translation layer, I want to introduce the authorization primitive that sits on the receiving end of it.

OpenFGA: Authorization as a Relationship Graph

Traditional Role-Based Access Control works well for straightforward permission models. A user has a role. The role has permissions. The mapping is static and relatively flat.

Telco authorization is not straightforward. It is not flat. And as the scenarios in this article illustrate, it is not static.

Consider what it means to express, in an authorization system, the following real-world relationship: a reseller partner is permitted to manage service requests on behalf of enterprise customers within their portfolio, but only for products within their contracted tier, only in markets where they hold an active agreement, and only for customers who have explicitly consented to partner-managed support.

A traditional RBAC model struggles with this. The relationship between the reseller, the enterprise customer, the product, the market, and the consent status is not a flat role assignment. It is a graph of relationships, each carrying its own conditions and constraints, each of which can change independently.

OpenFGA, an open-source authorization system inspired by Google’s Zanzibar, models exactly this kind of relationship complexity natively.

The core concept to understand is the relationship tuple. Think of a tuple as the simplest possible statement of a relationship between two things: “this entity has this relationship to that entity.” For example: reseller X manages customer Y. Customer Y owns product Z. Product Z is available in market M. Permissions are not assigned directly in OpenFGA. They are derived by traversing the chain of these relationship statements. A permission check asks: given everything I know about how these entities relate to each other, is this action permitted? OpenFGA traverses the graph and returns a precise, real-time answer. If any relationship in the chain changes, the answer to the permission check changes with it.
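A toy version of that traversal, in plain Python, shows the idea. This is not how OpenFGA is implemented. It is a hand-rolled check over a set of (user, relation, object) tuples with one hard-coded derivation rule, using the reseller example above.

```python
# Each tuple is (user, relation, object).
TUPLES = {
    ("reseller:x", "manages", "customer:y"),
    ("customer:y", "owns", "product:z"),
}


def can_manage_product(tuples, reseller: str, product: str) -> bool:
    """Derived permission: the reseller manages some customer who owns the product."""
    managed_customers = {
        obj for (user, relation, obj) in tuples
        if user == reseller and relation == "manages"
    }
    return any((customer, "owns", product) in tuples for customer in managed_customers)


before = can_manage_product(TUPLES, "reseller:x", "product:z")

# Remove one relationship in the chain and the derived answer changes with it.
after_offboarding = can_manage_product(
    TUPLES - {("reseller:x", "manages", "customer:y")}, "reseller:x", "product:z"
)
```

No permission was ever stored for the reseller on the product. Deleting a single relationship is enough to withdraw it.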

This matters for three specific reasons in the context of this series.

First, OpenFGA’s relationship model can express the multi-layered authorization complexity of B2C, B2B, and B2B2C telco contexts with a precision that RBAC cannot. The reseller relationship, the consumer relationship, the consent relationship, and the conditions that govern all three can be encoded in the same graph.

Second, OpenFGA’s tuple store is writable via API. This means an external system, or an AI reasoning layer, can propose and commit authorization changes programmatically. The authorization graph is not a static configuration. It is a dynamic, updateable representation of the current authorization state.

Third, OpenFGA provides real-time relationship checking at low latency. An AI agent can query OpenFGA before taking any action and receive a precise, current answer about what it is permitted to do. Not what it was permitted to do when its session was initialized. What it is permitted to do right now.

These three characteristics, expressive relationship modelling, programmatic writability, and real-time query capability, make OpenFGA the right authorization primitive for the problem this series is addressing.
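For the third point, the pre-action check could look roughly like this. The sketch builds the path and JSON body for OpenFGA's HTTP Check endpoint and interprets the response, without making a network call. The store identifier, relation, and object names are placeholders, and failing closed on anything but an explicit allow is my own design choice.

```python
from typing import Optional, Tuple


def build_check_request(store_id: str, user: str, relation: str, obj: str,
                        model_id: Optional[str] = None) -> Tuple[str, dict]:
    """Return the (path, JSON body) for an OpenFGA Check call."""
    body = {"tuple_key": {"user": user, "relation": relation, "object": obj}}
    if model_id:
        body["authorization_model_id"] = model_id
    return f"/stores/{store_id}/check", body


def agent_may_act(check_response: dict) -> bool:
    """Fail closed: anything other than an explicit allowed=true is treated as a denial."""
    return check_response.get("allowed") is True


path, body = build_check_request(
    "example-store-id", "agent:assistant-7", "can_view_bills", "account:9"
)
```

The agent would issue this check immediately before each action, not once at session start.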

The Natural Fit, and the Gap That Remains

TMF 672 and OpenFGA belong together because they are addressing the same problem from opposite ends.

TMF 672 knows when authorization state needs to change. It holds the authoritative record of party role transitions across the BSS landscape. Every time a consumer upgrades, a reseller is onboarded, a consent is withdrawn, a credit limit is breached, or a partner agreement expires, TMF 672 records that event.

OpenFGA knows how to represent authorization state with the precision and dynamism that complex telco relationships require. It can model the multi-layered permission structures of B2C, B2B, and B2B2C contexts. It can be updated programmatically. It can answer real-time authorization queries.

What sits between them is the translation problem. A TMF 672 role transition event needs to become a set of OpenFGA tuple mutations. The party that was a consumer needs to have their consumer relationship tuples revoked and their business account holder relationship tuples created. The reseller that was downgraded needs to have their premium product access tuples modified to reflect their new tier. The AI personal assistant whose consent was withdrawn needs to have every tuple that granted it access to the customer’s account systematically revoked.

This translation is not a simple data mapping exercise. The relationship between a TMF 672 role transition and the resulting OpenFGA tuple mutations is context-dependent. It varies by market, by product scope, by regulatory framework, and by the specific history of the party’s role transitions. It requires understanding not just what changed, but what that change means for every downstream relationship in the authorization graph. It is not the kind of translation that a deterministic rules engine handles reliably at the scale and complexity of a Tier-1 telco operation.
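The most mechanical of the cases above, revoking everything an AI assistant holds once consent is withdrawn, can be sketched as a function that turns the assistant's current tuples into the body of an OpenFGA Write request. The tuple values are invented for illustration. The harder, context-dependent transitions are exactly the ones this kind of function cannot cover.

```python
def revoke_all_for(user: str, current_tuples: list) -> dict:
    """Build an OpenFGA Write body that deletes every tuple held by `user`."""
    deletes = [t for t in current_tuples if t["user"] == user]
    body = {}
    if deletes:                       # omit the key entirely when there is nothing to delete
        body["deletes"] = {"tuple_keys": deletes}
    return body


current = [
    {"user": "agent:assistant-7", "relation": "can_view_bills", "object": "account:9"},
    {"user": "agent:assistant-7", "relation": "can_raise_fault", "object": "account:9"},
    {"user": "party:123", "relation": "account_holder", "object": "account:9"},
]
write_body = revoke_all_for("agent:assistant-7", current)
```

The customer's own account_holder tuple is untouched. Only the assistant's access is removed.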

Here is where the gap currently sits:

[Diagram: Authorization Translation, The Broken Arrow]

The broken arrow in that diagram is not a design choice. It is an accurate representation of how most enterprise authorization architectures currently handle the translation between business role events and authorization state. Manually. Slowly. With the operational risks this article has described.

Next week I will show what belongs in that gap, and why it needs to reason rather than simply route.

A Question for Practitioners

Before next week’s article I want to ask a direct question to anyone working on authorization architecture in enterprise or telco environments.

Which of the five authorization change triggers described here is causing the most operational friction in your current environment? And if you have designed or evaluated solutions to any of these patterns, I would genuinely like to understand what you found.

The comment section is open. So is my inbox.

Next: Article 3, Why the Translation Layer Needs to Reason, Not Just Route: Claude, AWS Bedrock and the Security Boundary Question

Originally published at https://www.linkedin.com.

ReBAC Meets BSS: Why TMF 672 and OpenFGA Belong Together was originally published in System Weakness on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Authorization Problem Nobody Is Solving Before Deploying AI Agents

Soumit Saha — Fri, 05 Jun 2026 13:24:32 GMT

First in a four-part series: ReBAC Meets BSS, A Practitioner’s Blueprint for AI-Native Role-Based Access in Telco

Originally published on LinkedIn as part of a four-part series on AI-native authorisation architecture in telco.

There is a failure mode that I believe is quietly taking shape across enterprise AI deployments right now.

It has not fully announced itself yet. There is no widespread pattern of system outages, security alerts, or failed deployment pipelines attributed to it. AI agents are starting up, connecting to enterprise systems, and beginning to operate. On the surface, everything looks fine.

But the conditions for a specific class of authorization failure are being built into these deployments today, and I think the industry is not paying sufficient attention to them before they become production incidents.

After twenty-five years of designing authorization and integration systems for telco and enterprise platforms, I have learned that authorization is the layer every programme underestimates. It is unglamorous, complex, and deeply contextual. And with enterprise AI deployments accelerating faster than authorization thinking can keep pace, that underestimation has the potential to become a structural risk.

This is not a model quality problem. An AI agent can reason correctly given what it knows about permissions. The question worth asking before deployment is: what happens when what it knows about permissions is no longer current?

Over the next four weeks I want to walk through a specific architectural response to this problem, one that connects telco industry standards, modern open-source authorization primitives, and AI reasoning in a way that I believe closes the gap properly. This first article is purely diagnostic. I want to name the potential failure pattern precisely before proposing any solution.

Why Telco Authorization Is Uniquely Complex

In most industries, a user is a user. They have a role, the role has permissions, and the mapping is relatively stable.

In telecommunications, a party is rarely just one thing. The same legal entity can simultaneously be a consumer customer, a business account holder, a reseller partner, and a wholesale connectivity buyer. Their permissions are not just a function of who they are. They are a function of which role they are acting in, which product they hold, which market they operate in, and which regulatory framework governs that interaction.

This complexity is not incidental. It is the business model. And it manifests differently across the three commercial contexts that define modern telco operations.

How the Problem Could Manifest: Three Scenarios

B2C: The Consumer Whose Status Changed Yesterday

In a Business-to-Consumer context, authorization appears straightforward on the surface. A consumer has a subscription. The subscription grants access to certain services.

Except consumer status changes constantly and at volume. Subscriptions upgrade and downgrade. Trial periods expire. Credit limits are breached. Loyalty tiers shift. Premium add-ons are added and removed. In a large consumer operation, these transitions number in the thousands daily.

When an AI agent (a customer service agent, a self-service portal agent, or a personalisation engine) is deployed into this environment, it needs to know not just what a customer’s current entitlements are, but that its view of those entitlements reflects the state as of this moment, not as of whenever its authorization context was last refreshed.

The potential failure mode here is subtle but commercially significant. An AI agent offering a premium service to a customer whose subscription lapsed yesterday is not a security incident in the traditional sense. It is a revenue leakage event. Multiplied across thousands of interactions, it becomes a measurable commercial problem with an authorization root cause that could be genuinely difficult to trace after the fact.

There is also a trust dimension worth considering. A consumer who is offered something they cannot have, or blocked from something they should have access to, does not conclude that the authorization system is stale. They conclude that the company’s AI does not know what it is doing. That perception risks damaging the very customer experience the AI was deployed to improve.

B2B: The Enterprise Account in Transition

Business-to-Business authorization is where the complexity multiplies significantly.

An enterprise account is not a single party. It is an organisation with multiple contacts, administrators, and users, each with different levels of access to different products and services across potentially multiple sites, cost centres, and geographies. The account itself may be in the middle of a contract renegotiation, a product migration, or a service transition that changes what the account and its users are permitted to do.

Consider a scenario where an enterprise customer is mid-migration from one connectivity product to another. For a period of several weeks, their authorization state is genuinely ambiguous. They have partial access to the old product and conditional access to the new one, governed by milestones in a migration plan that exists in a project management system no authorization layer has ever been connected to.

An AI agent deployed into this account (a network management assistant, an order management agent, or a billing enquiry handler) must navigate this ambiguous state. Without a dynamic, context-aware authorization layer, it risks either over-serving the customer based on an old entitlement, or under-serving them by blocking access they legitimately hold under the new arrangement.

The B2B failure mode carries higher commercial stakes than B2C. Enterprise customers have account managers, SLAs, and escalation paths. A mis-authorization event in a B2B context has the potential to generate a service complaint rather than a frustrated click. And if the AI agent is operating with any degree of autonomy, making configuration changes, raising orders, or processing requests without human review at each step, the downstream consequences of an authorization error could compound quickly.

B2B2C: The Reseller Relationship Nobody Modelled

The Business-to-Business-to-Consumer scenario is where authorization complexity reaches its highest point, and where I would anticipate the most significant gap between current deployment practices and what the architecture actually requires.

In a B2B2C model, a telco sells through an intermediary. The intermediary, a reseller, a retail partner, or a white-label operator, then serves end consumers. The authorization question is no longer simply what is this party permitted to do. It becomes: what is this party permitted to do, on behalf of which end consumer, within the boundaries that the intermediary relationship permits, subject to the constraints that the originating telco’s regulatory obligations impose?

This is a multi-layered authorization problem. The reseller has permissions. The end consumer has permissions. The relationship between them has permissions. And all three layers can change independently.

A reseller tier upgrade changes what products the intermediary can offer. An end consumer status change affects what the reseller can do on their behalf. A regulatory change in a specific market constrains what either party can access regardless of their commercial arrangement.
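As a rough illustration of why the layers have to be evaluated together, the sketch below allows an action only when the reseller layer, the end consumer layer, and the relationship between them all grant it, and no market rule blocks it. The function name, the set-based representation, and the action names are assumptions made for this example.

```python
def is_permitted(action, reseller_grants, consumer_grants,
                 relationship_grants, market_restrictions):
    """Allow an action only if every layer grants it and no market rule blocks it."""
    granted_by_all_layers = (
        action in reseller_grants
        and action in consumer_grants
        and action in relationship_grants
    )
    return granted_by_all_layers and action not in market_restrictions


# Hypothetical state: the reseller was downgraded, so its layer no longer
# grants the premium offer even though the other two layers still do.
after_downgrade = is_permitted(
    "offer_premium",
    reseller_grants={"offer_basic"},
    consumer_grants={"offer_basic", "offer_premium"},
    relationship_grants={"offer_basic", "offer_premium"},
    market_restrictions=set(),
)
```

Because each input can change independently, a decision that was correct yesterday can be wrong today without any change to the agent itself.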

When an AI agent (a partner portal agent, a white-label customer service agent, or a commission management assistant) operates in a B2B2C context, it would need to reason across all three authorization layers simultaneously. Without a foundation that models these relationships explicitly and keeps them current, the agent would be operating on an incomplete and potentially incorrect view of what is permitted.

The potential failure modes here range from commercially problematic to genuinely serious. An agent that allows a downgraded reseller to continue offering premium product tiers creates a revenue and compliance exposure. An agent that surfaces one reseller’s customer data within another reseller’s administrative context creates a data protection risk. Both have authorization misconfiguration as a plausible root cause.

Three Anti-Patterns Worth Anticipating

Based on my experience designing authorization systems for complex enterprise and telco environments, I want to characterise three anti-patterns that could plausibly emerge as AI agent deployments mature and scale. Enterprise AI deployment at the level of complexity described here is still in relatively early stages across the industry. These are not patterns I have seen fully play out in production. They are patterns that the current architectural trajectory makes possible, and that I believe are worth designing against before they become the incident that prompts a retrospective.

The over-permissioned agent operates with more access than the party it represents currently holds. This would typically arise when role transitions, downgrades, expirations, or relationship terminations are not reflected in the agent’s authorization context in a timely way. The commercial and compliance risk is real and could remain invisible until audited or until a downstream consequence surfaces.

The under-permissioned agent is blocked from actions the party it represents legitimately holds the right to perform. This would typically arise when new entitlements, upgrades, or relationship extensions have not propagated to the authorization layer. The immediate consequence is a degraded experience. The less visible consequence is eroded trust in AI-assisted processes that were supposed to make interactions faster and more capable.

The stale-permission agent is the most structurally interesting anti-pattern. The agent’s authorization context was accurate at the time it was established. It has simply not been updated to reflect subsequent changes. This is not a design flaw in the traditional sense. The authorization system worked correctly when it was populated. The failure is the absence of a dynamic update mechanism that keeps authorization state current as the business state beneath it evolves.

All three anti-patterns share a common structural cause: the gap between when a party’s role changes and when the authorization systems acting on behalf of that party reflect that change.
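The three anti-patterns can be expressed as a small diagnostic, sketched here under simplifying assumptions: permissions are plain sets of strings, and both the time the agent's context was last refreshed and the time of the party's last role change are known. The function and parameter names are hypothetical.

```python
from datetime import datetime


def diagnose(agent_permissions, current_permissions,
             context_refreshed_at, last_role_change_at):
    """Return the anti-patterns that apply to an agent's authorization context."""
    findings = []
    # Stale: the context predates the most recent role change.
    if context_refreshed_at < last_role_change_at:
        findings.append("stale")
    # Over-permissioned: the agent holds something the party no longer has.
    if agent_permissions - current_permissions:
        findings.append("over-permissioned")
    # Under-permissioned: the party holds something the agent lacks.
    if current_permissions - agent_permissions:
        findings.append("under-permissioned")
    return findings


example = diagnose(
    agent_permissions={"view_bill", "premium_support"},
    current_permissions={"view_bill"},
    context_refreshed_at=datetime(2026, 6, 1, 9, 0),
    last_role_change_at=datetime(2026, 6, 2, 9, 0),
)
```

In this example the agent's context is both stale and over-permissioned, which matches the lapsed-subscription scenario described earlier.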

Why This Gap Exists, and Why It Deserves Attention Now

The gap exists because authorization policy has historically been written and maintained by humans. A role changes in the BSS. A policy analyst or architect reviews the implications. A change request is raised. An authorization update is deployed. Days or weeks pass between the business event and the policy change.

For human-operated systems, this lag was manageable. A human customer service agent could use judgment to bridge the gap. A human partner manager could apply context that the system lacked.

AI agents cannot do this in the same way. They operate on what they are given. If the authorization context is stale, they act on stale context, at speed, at scale, and without the contextual judgment that a human operator might apply to an edge case.

The deployment of AI agents does not create this authorization gap. It has the potential to expose a gap that already exists and make the consequences of that gap significantly more visible, more frequent, and harder to attribute after the fact.

This is why I believe the right moment to address the authorization layer is before AI agents are deployed at scale, not after the first incident prompts a retrospective.

What Belongs in the Gap

I am deliberately not proposing a solution in this article. I want practitioners reading this to sit with the problem for a moment, because in my experience the instinct is to reach immediately for tooling (a new IAM product, a policy engine, a synchronisation job) without fully appreciating why those approaches may fall short at the scale and complexity that telco authorization demands.

Next week I will introduce two specific technical primitives that I believe belong at the foundation of a proper solution. One from the telco standards world that most operators already have in production. One from modern authorization engineering that models the relationship complexity described here with the precision it requires.

The week after, I will show where AI reasoning enters the architecture, and why it needs to reason rather than simply route.

For now I want to ask a direct question to practitioners reading this: do you recognise these three anti-patterns as risks in your own AI deployment planning? And if you are already designing authorization layers for AI agents in complex multi-role environments, I would genuinely like to compare notes on what you are finding.

Next: Article 2, ReBAC Meets BSS: Why TMF 672 and OpenFGA Belong Together

Originally published at https://www.linkedin.com.

The Authorization Problem Nobody Is Solving Before Deploying AI Agents was originally published in System Weakness on Medium, where people are continuing the conversation by highlighting and responding to this story.

The Tool I Installed for Fun and Ended Up Using Every Day

Fateyaly — Thu, 04 Jun 2026 14:10:34 GMT

Unexpected utility in cybersecurity.

Splunk 101: Hands-On Introduction to SIEM, Log Ingestion, and Basic Threat Hunting

Aditya Bhatt — Thu, 04 Jun 2026 14:10:29 GMT

Splunk, SIEM, Log Analysis: Hands-on walkthrough of log ingestion, SPL querying, and basic security investigation using Splunk.

Keywords: Splunk, SIEM, Log Analysis, SOC, Threat Hunting, TryHackMe

Security monitoring is not just about collecting logs — it’s about turning raw machine data into actionable security insights.

In this hands-on walkthrough, I explored the fundamentals of Splunk, one of the most widely used SIEM platforms, through a practical lab exercise involving log ingestion and basic investigation queries. Instead of only covering theory, this walkthrough focuses on what we actually did, why we did it, and how Splunk helps security analysts investigate events efficiently.

Lab Link: https://tryhackme.com/room/splunk101

What is Splunk?

Splunk is a SIEM (Security Information and Event Management) platform that helps security teams:

Collect logs from multiple sources
Normalize and index machine data
Search events efficiently
Correlate security events
Visualize trends and anomalies
Accelerate incident detection and investigation

Think of it as a centralized visibility platform for your infrastructure.

Without SIEM tools, reviewing thousands of logs manually would be operationally painful.

Core Splunk Architecture

Splunk mainly operates using three core components.

1. Forwarder

The Forwarder acts as the log collector.

It is a lightweight agent installed on monitored systems that gathers logs and forwards them to Splunk.

Typical data sources include:

Windows Event Logs
Syslogs
Web server logs
Firewall logs
Database logs
Endpoint telemetry

Its lightweight nature ensures minimal performance impact on the host.

Answer from lab: Forwarder

2. Indexer

The Indexer is where the real processing happens.

Once data reaches the indexer:

Raw logs are parsed
Data is normalized
Fields are extracted
Events are indexed for fast searchability

Without indexing, searching through millions of events would be painfully slow.

3. Search Head

The Search Head is the analyst’s workspace.

This is where:

Queries are written
Investigations are performed
Dashboards are built
Visualizations are generated

Splunk uses SPL (Search Processing Language) for querying indexed data.

Connecting to the Lab

The lab environment provides a live Splunk instance.

After launching the machine, the dashboard becomes accessible via the provided IP.

At this point, the goal is not hunting threats yet — it’s simply getting familiar with the platform.

Navigating the Splunk Interface

When opening Splunk for the first time, several interface sections appear.

Top Navigation Bar

This provides administrative and operational controls:

Messages
Settings
Activity
Help
Search
App Switching

This is where analysts usually monitor running jobs and manage platform settings.

Apps Panel

Splunk is modular.

Different apps provide specialized functionality.

The default app is:

Search & Reporting

This is the primary workspace for analysts.

Explore Section

This section gives quick access to:

Add Data
Install Apps
Documentation

This becomes especially useful during data onboarding.

Adding Data into Splunk

A SIEM without data is just an expensive dashboard.

Splunk supports ingestion from:

Event logs
Syslogs
Web logs
Network telemetry
Custom files
API-based sources

For this lab, we worked with VPN log data.

Collecting Data from Files

Inside Add Data, Splunk provides multiple ingestion methods.

The correct option for file-based collection is:

Monitor

Why?

Because this mode continuously watches files or ports and ingests incoming data automatically.

Practical Log Ingestion

We uploaded the provided VPN log dataset.

Splunk ingestion follows a structured workflow:

Step 1 — Select Source

Choose the log file.

This tells Splunk what raw data needs processing.

Step 2 — Select Source Type

This defines how Splunk interprets the incoming data.

Examples:

Syslog
JSON
CSV
Windows logs

Choosing the correct type matters because field extraction depends on it.

Step 3 — Input Settings

Here we configured:

Index Name → vpn_logs
Host metadata

The index acts like a searchable storage bucket.

Step 4 — Review

Always validate configurations before ingestion.

Incorrect source parsing can break field extraction later.

Step 5 — Done

Splunk processes the file and indexes the events.

Now the investigation phase begins.

Investigation Queries

Now comes the fun part.

Instead of manually opening raw logs, we query structured event data.

1. Total Number of Events

Objective:

Determine how many events exist in the uploaded dataset.

PAYLOAD

source="VPN_logs.json" host="ip-10-10-40-195" sourcetype="_json"

Why this works

This tells Splunk:

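Return every event from the VPN_logs.json source on host ip-10-10-40-195 that was parsed with the _json sourcetype.

Since no further field filters are applied, Splunk returns every event ingested from that file.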

Result

2862 events

2. Events Generated by User “Maleena”

Objective:

Identify all VPN activity associated with a specific user.

PAYLOAD

source="VPN_logs.json" host="ip-10-10-40-195" sourcetype="_json" UserName="Maleena"

Why this works

This query:

Searches the events ingested from VPN_logs.json
Filters only events where UserName equals Maleena

This is useful during user-centric investigations.

Examples:

Insider threat reviews
Suspicious account monitoring
Login activity auditing

Result

60 events

3. Identify Username Behind an IP Address

Objective:

Map an IP address back to a user identity.

PAYLOAD

source="VPN_logs.json" host="ip-10-10-40-195" sourcetype="_json" Source_ip="107.14.182.38"

Why this works

This filters events originating from the specified source IP. The UserName field on the returned events shows which user the address belongs to.

This is common during incident response when an IP is flagged externally and analysts need attribution.

Result

Smith
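For readers who want to see the same attribution logic outside Splunk, here is a minimal Python sketch over a list of parsed JSON events. The field names UserName and Source_ip come from the queries above. The sample events are made up for illustration, apart from the Smith and 107.14.182.38 pairing reported by the lab.

```python
def usernames_for_ip(events, ip):
    """Return the distinct usernames seen on events from one source IP."""
    return sorted({
        event["UserName"]
        for event in events
        if event.get("Source_ip") == ip and "UserName" in event
    })


# Illustrative events only; the real dataset holds 2862 of them.
SAMPLE_EVENTS = [
    {"UserName": "Smith", "Source_ip": "107.14.182.38"},
    {"UserName": "Smith", "Source_ip": "107.14.182.38"},
    {"UserName": "Maleena", "Source_ip": "10.0.0.5"},
]

owners = usernames_for_ip(SAMPLE_EVENTS, "107.14.182.38")
```

Returning a list rather than a single name matters in practice, because a shared or NAT-ed address can map to more than one user.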

4. Events from All Countries Except France

Objective:

Exclude one geography from the search.

PAYLOAD

source="VPN_logs.json" host="ip-10-10-40-195" sourcetype="_json" NOT Source_Country="France"

Why this works

The NOT operator removes all events matching the specified condition.

This is useful when:

Filtering known benign traffic
Narrowing investigations
Removing noise

Result

2814 events
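One subtlety is worth knowing here: in Splunk, NOT Source_Country="France" also returns events that have no Source_Country field at all, whereas Source_Country!="France" returns only events where the field exists and holds a different value. The Python sketch below mimics both behaviours on a list of dictionaries. The function names and sample events are illustrative assumptions.

```python
def not_equal_or_missing(events, field, value):
    """Mimic NOT field="value": keeps events where the field is absent too."""
    return [e for e in events if e.get(field) != value]


def present_and_not_equal(events, field, value):
    """Mimic field!="value": the field must exist and differ."""
    return [e for e in events if field in e and e[field] != value]


# Illustrative events: one has no Source_Country field.
SAMPLE = [
    {"UserName": "A", "Source_Country": "France"},
    {"UserName": "B", "Source_Country": "Canada"},
    {"UserName": "C"},
]

with_not = not_equal_or_missing(SAMPLE, "Source_Country", "France")
with_neq = present_and_not_equal(SAMPLE, "Source_Country", "France")
```

If every event in a dataset carries the field, the two forms return the same count. When some events lack it, the counts differ.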

5. Activity from a Specific IP

Objective:

Check all events tied to a suspicious IP.

PAYLOAD

source="VPN_logs.json" host="ip-10-10-40-195" sourcetype="_json" Source_ip="107.3.206.58"

Why this works

This isolates activity from a single source IP address.

This helps in:

IOC investigations
VPN misuse analysis
Threat actor tracking

Result

14 events

Key Splunk Concepts Learned

This exercise reinforces several foundational SIEM concepts:

Log Ingestion

Security visibility starts with proper onboarding.

No data = no detection.

Indexing

Raw logs become searchable events after parsing and indexing.

SPL Querying

Analysts use SPL to rapidly hunt across large datasets.

Filtering and Investigation

Instead of reading logs manually, we ask focused questions:

Who logged in?
From where?
How often?
Any anomalies?

Why This Matters in Real SOC Work

Even though this is a beginner lab, the workflow mirrors real-world analyst activity.

A SOC analyst frequently:

Investigates suspicious IPs
Reviews user login activity
Filters noise
Hunts indicators of compromise
Correlates events across datasets

Splunk dramatically reduces investigation time compared to manual log review.

Final Thoughts

Splunk remains one of the most practical SIEM platforms for defenders because it combines:

scalability
fast search
flexible parsing
rich dashboards
investigation speed

This lab was beginner-friendly, but it introduces the exact mindset needed for operational security monitoring.

Start with simple searches.

Then move toward:

correlation rules
alerting
dashboards
anomaly detection
incident investigations

That’s where Splunk becomes truly powerful.

Splunk 101: Hands-On Introduction to SIEM, Log Ingestion, and Basic Threat Hunting was originally published in System Weakness on Medium, where people are continuing the conversation by highlighting and responding to this story.

7 Security Tools I Use Constantly, But Rarely Talk About

Fateyaly — Thu, 04 Jun 2026 14:10:28 GMT

The quiet workhorses behind real investigations.

Writeup for CyLab/picoCTF challenge “Irish-Name-Repo-1”

Walter Moar — Thu, 04 Jun 2026 14:09:29 GMT

Learn how CyLab’s “Irish-Name-Repo-1” challenge shows how SQL injection can bypass a login form when user input is passed directly into…

OverTheWire Bandit Walkthrough — Level 10 → 11 | 30-Day Cybersecurity Learning Journey (Day 10)

William | Cybersecurity & SOC Analyst — Thu, 04 Jun 2026 14:09:27 GMT

Decoding base64 encoded data from the command line and why recognizing and reversing encoding schemes is a skill every SOC analyst needs to have ready immediately.

Introduction

Day 10. Bandit Level 10 to Level 11. The file in this level is only 69 bytes. It is readable, it opens cleanly with cat and it contains a single line of text. But the text makes no immediate sense. It is a long string of uppercase and lowercase letters, numbers and symbols with no spaces and no obvious structure. It is not random. It is base64 encoded data and decoding it requires one command.

This level introduces encoding as a concept that is completely separate from encryption. Base64 is not a security mechanism. It is a way of representing binary data as printable text so it can be safely transmitted or stored in systems that only handle plain text. Attackers use it constantly to obfuscate payloads, commands and credentials because it looks unfamiliar to anyone who does not recognise the format on sight.

By the end of this article you will know how to identify base64 encoded content, decode it in one command and understand why this skill appears in phishing analysis, malware triage and incident response regularly.

Level Objective

The password for the next level is stored in the file data.txt, which contains base64 encoded data. The file contains a single encoded string. The objective is to decode it and read the password it contains. The commands suggested by OverTheWire for this level include grep, sort, uniq, strings, base64, tr and others.

Approach

I logged in using the password retrieved from the previous level:

ssh bandit10@bandit.labs.overthewire.org -p 2220

The banner loaded and ended with “Enjoy your stay!” and the prompt changed to bandit10@bandit:~$.

Logged into bandit10 via SSH on port 2220.

I ran ls -la and confirmed data.txt was present, owned by bandit11 with group bandit10, permissions -rw-r----- and a size of just 69 bytes. That small size was a signal. A 69-byte file is not storing binary data or thousands of lines. It is storing a single short encoded string.

I decoded it immediately using the built-in base64 tool with the decode flag:

base64 -d data.txt

The output printed as a complete plain English sentence: The password is dtR173fZKb0RRsDFSGsg2RWnpNVj3qRr. One command. Complete result.

Password for Level 11 retrieved.

Commands Used

# Connect to the Bandit server as bandit10 using the Level 10 password
ssh bandit10@bandit.labs.overthewire.org -p 2220

# Check the file and confirm its size before approaching it
ls -la

# Decode the base64 encoded content and print the result
base64 -d data.txt

Command Breakdown

base64 -d data.txt Reads the base64 encoded content of data.txt and decodes it back to its original form. The -d flag tells the tool to decode rather than encode. Without this flag base64 would encode the file content rather than reverse it, producing a longer encoded string instead of the original message.

base64 A command-line tool that handles base64 encoding and decoding. It is available by default on Linux and macOS systems. It can read from a file directly or from piped input, making it easy to combine with other commands in a pipeline.

-d The decode flag. It is the only flag needed here. It reverses the base64 encoding and outputs the original data as readable text.

Base64 format recognition Base64 strings use only uppercase letters, lowercase letters, numbers and the characters + and /. They frequently end with one or two = padding characters. A long string with no spaces that ends in = is almost always base64 encoded. Recognising that pattern on sight is a useful quick-identification skill.
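The recognition rule above can be turned into a quick check. This is a minimal sketch using Python's standard library. The function name looks_like_base64 is an assumption for this example, and it covers only the standard alphabet with padding, not the URL-safe variant.

```python
import base64
import binascii
import re

# Standard base64 alphabet followed by at most two padding characters.
_BASE64_PATTERN = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64(text):
    """Heuristic check: right alphabet, length a multiple of 4, decodes cleanly."""
    candidate = text.strip()
    if not candidate or len(candidate) % 4 != 0:
        return False
    if not _BASE64_PATTERN.fullmatch(candidate):
        return False
    try:
        base64.b64decode(candidate, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True
```

This is only a heuristic. A short ordinary word whose length happens to be a multiple of four, such as "ping", also passes, so the check is best treated as a prompt to try decoding rather than as proof.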

Lesson Learned

The main technical takeaway is that base64 is encoding, not encryption. Encoding transforms data into a different format for compatibility or transport reasons. It is fully reversible by anyone with the right tool and no key or password is required to decode it. This distinction matters enormously in security work because base64 encoded content is sometimes mistaken for encrypted content, which completely changes how an analyst approaches it.

What made this level particularly clean was the output. Unlike previous levels where the answer appeared in a sea of other data, this one decoded directly into a complete readable sentence. The structure The password is [password] confirmed immediately that the decode was successful and the result was correct. That clarity comes from base64 encoding preserving the original content exactly.

The file size was also informative. Seeing 69 bytes in the ls -la output before even opening the file told me the content was short and likely a single encoded string. Reading metadata before opening a file is a habit that keeps paying off.
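The 69-byte figure can be checked with a little arithmetic, since base64 turns every 3 input bytes into 4 output characters, padding the final group. The sketch below assumes the decoded sentence ends with a newline and that the encoded file also ends with a single newline. Both are assumptions about the file, not something the level states.

```python
import base64
import math


def base64_length(byte_count):
    """Length of padded base64 output for a given number of input bytes."""
    return 4 * math.ceil(byte_count / 3)


# The decoded sentence from this level, with an assumed trailing newline.
plaintext = b"The password is dtR173fZKb0RRsDFSGsg2RWnpNVj3qRr\n"

encoded = base64.b64encode(plaintext)
# Encoded text plus one assumed trailing newline in data.txt.
file_size = len(encoded) + 1
```

Under those assumptions the 49-byte sentence encodes to 68 characters, and the trailing newline brings the file to the 69 bytes reported by ls -la.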

base64 -d filename — decode a base64 encoded file
echo "encodedstring" | base64 -d — decode a base64 string directly from the terminal
base64 filename — encode a file into base64 format
cat filename | base64 — pipe file content into base64 for encoding
base64 -d filename | file - — decode and immediately check the file type of the result

🔴 SOC Analyst Insight

Base64 encoding is one of the most commonly used obfuscation techniques in malicious scripts, phishing emails and malware payloads. PowerShell commands delivered through phishing attacks are frequently base64 encoded to slip past email content filters and keyword detection in security tools. When an analyst examines a suspicious email attachment, a flagged script or an unusual network request, encoded content is one of the first things to look for and one of the first things to decode.

# Decode a base64 encoded PowerShell command extracted from a suspicious email attachment
echo "cGluZyAxOTIuMTY4LjEuMQ==" | base64 -d

The command above decodes a base64 string that might appear inside a malicious macro or obfuscated dropper. The decoded output reveals what the attacker actually intended to execute. In a real investigation that information drives the next steps: scoping the impact, identifying the target system and determining whether the command was successfully run. Decoding it takes seconds. Not knowing how to decode it can cost minutes of confusion during an active incident.
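One practical detail: the sample string above decodes to plain ASCII text, but PowerShell's -EncodedCommand parameter expects base64 of a UTF-16LE string, so a real encoded PowerShell payload shows a null byte after each character if it is decoded as plain text. The Python sketch below handles both cases. The function names are assumptions for this example, and the UTF-16LE sample is generated here rather than taken from a real incident.

```python
import base64


def decode_plain(encoded):
    """Decode base64 whose underlying bytes are UTF-8 or ASCII text."""
    return base64.b64decode(encoded).decode("utf-8")


def decode_powershell_encoded_command(encoded):
    """Decode base64 produced for PowerShell -EncodedCommand (UTF-16LE text)."""
    return base64.b64decode(encoded).decode("utf-16-le")


# The string used in the shell example above.
article_sample = "cGluZyAxOTIuMTY4LjEuMQ=="

# A PowerShell-style sample built here for illustration.
powershell_sample = base64.b64encode(
    "ping 192.168.1.1".encode("utf-16-le")
).decode("ascii")

plain_result = decode_plain(article_sample)
powershell_result = decode_powershell_encoded_command(powershell_sample)
```

If decoded output looks like text with gaps or stray control characters between letters, trying UTF-16LE is a reasonable next step.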

Key Takeaway

Base64 is not encryption and it is not security. It is a reversible encoding scheme that attackers use to make malicious content less immediately readable to automated filters and human analysts who do not recognise the format. Recognising base64 by its character set and padding, and decoding it with a single command, is a skill that applies directly to phishing analysis, malware triage and log investigation. The faster an analyst can identify and reverse encoding schemes the faster they can read attacker intent and act on it.

30-Day Cybersecurity Learning Journey — Progress

🟢 Open Day — Setup & Series Introduction  | OverTheWire Bandit
✅ Day 0.   — Bandit Level 0               | First Login
✅ Day 1.   — Bandit Level 1 → 2           | Special Characters
✅ Day 2.   — Bandit Level 2 → 3           | Spaces in Filenames
✅ Day 3.   — Bandit Level 3 → 4           | Hidden Files
✅ Day 4.   — Bandit Level 4 → 5           | File Types
✅ Day 5.   — Bandit Level 5 → 6           | find with Properties
✅ Day 6.   — Bandit Level 6 → 7           | find across Filesystem
✅ Day 7.   — Bandit Level 7 → 8           | grep
✅ Day 8.   — Bandit Level 8 → 9           | sort and uniq
✅ Day 9.   — Bandit Level 9 → 10          | strings and grep
✅ Day 10.  — Bandit Level 10 → 11         | base64  ← today
⬜ Day 11.  — Bandit Level 11 → 12         | coming next

Follow along with the series as I document each level, command and lesson learned.

Encoding hides intent from the casual observer. One command is all it takes to read exactly what was written.

OverTheWire Bandit Walkthrough — Level 10 → 11 | 30-Day Cybersecurity Learning Journey (Day 10) was originally published in System Weakness on Medium, where people are continuing the conversation by highlighting and responding to this story.

How I cleared ISC2 CC (beginner-friendly certification) exactly what I did

Manubhav Sharma — Thu, 04 Jun 2026 14:09:25 GMT

No coaching. No bootcamp. Here’s the exact strategy.

This is specifically for students in second or third year CS/IT wanting their first security credential, non-technical backgrounds wanting to enter cybersecurity, and anyone who wants to test whether security is right for them before committing to Security+ or CEH.

The ISC2 CC is genuinely beginner-friendly. It’s an entry point, not a gatekeeping exam.


What ISC2 CC actually tests

What it does NOT test

Firewall configuration. Script writing. Packet capture analysis. Deep technical skills.

What it DOES test

Security concepts (CIA triad). Terminology fluency. Conceptual judgment in scenarios.

Think of it as a security literacy exam: do you think in the right framework, or are you still thinking like a general IT user?

My preparation strategy

1. Understand the exam before studying for it

Spend days 1–3 reading the official ISC2 CC exam outline (free on their site).
Five domains: Security Principles (heaviest at ~30%), Business Continuity, Access Controls, Network Security, Security Operations. Questions are scenario-based: not “define X” but “in this situation, what is correct?” That distinction changes how you study.

2. Focus on concepts, not memorisation

After every concept, ask “why”: not just what the principle of least privilege is, but why it exists and what breaks without it. Understanding the reasoning made unfamiliar question wording easy to navigate.

3. Use only 3 resources deliberately

01. ISC2 official self-paced training (free with an account)
02. Your own notes
03. Practice questions (about 50 sample questions a day)

Resource overload is one of the biggest reasons people delay exams they’re already ready for.

4. Use practice questions as a diagnostic, not a test

Every wrong answer got a written note: what did I misunderstand, and what reasoning was the correct answer using that I wasn’t?
After 50 questions, patterns became obvious: I was consistently confusing preventive and deterrent controls. These were conceptual gaps, not knowledge gaps, and they were fixed in minutes, not weeks.

Mistakes I made

Mistake 1

Overstudied Network Security when Security Principles needed 40% of my time.

Mistake 2

Memorised RPO, RTO, MTD as acronyms without being able to apply them to scenarios.

Mistake 3

Waited too long to take the exam. 3–4 weeks of consistent study is enough. Waiting builds anxiety, not readiness.

What I would do differently

Start with official ISC2 training on day 1. Do practice questions from week 1, not week 3.

Start with understanding; tools come later.
The CC is free for students through ISC2’s One Million Certified initiative.
There’s no reason to delay.

— Manubhav Sharma · Threat Analyst at Sophos · Cybersecurity Mentor

How I cleared ISC2 CC (beginner-friendly certification) exactly what I did was originally published in System Weakness on Medium, where people are continuing the conversation by highlighting and responding to this story.