How to Pass Credentials Between LangGraph Agents Without Leaving Them in Your Message Queue
Here's what a standard LangGraph credential handoff looks like: agent A fetches an OAuth token, serializes it into a state update, and the orchestrator writes it to whatever backing store you're using. Redis, Postgres, a cloud queue, a checkpoint. The token now lives in at least one place it was never meant to be permanent. No expiry tied to the token itself. No record of whether the downstream agent consumed it. No way to revoke it if that step crashes or never runs.
This isn't a hypothetical. It's the default behavior of every workflow that treats credentials as just another piece of state. The orchestration layer doesn't know that a string is a token. It just passes it along.
Traditional secrets managers don't solve this. They were built on three assumptions that multi-agent workflows break: secrets are managed statically, workloads don't switch context frequently, and workloads operate at human scale. A LangGraph pipeline that spins up agents dynamically and passes tokens between them violates all three.[^1]
The Exact Failure Mode
Walk through what actually happens. Your auth agent fetches a short-lived API token. It puts the token in the LangGraph state dict under a key like api_token. The graph serializes that state and checkpoints it. Now the token exists in your checkpoint store. If you're using LangGraph's SQLite checkpointer for local dev, it's sitting in a file on disk. If you're using Redis or Postgres in production, it's in a row or a hash that will outlive the workflow run unless you write cleanup code.
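To make the exposure concrete, here is a minimal sketch, in plain Python with a made-up token value, of what a checkpointer effectively does when a credential lives in state:

```python
import json
import os
import tempfile

# The token is treated as just another piece of workflow state.
state = {"messages": [], "api_token": "tok-example-12345"}

# A checkpointer does roughly this: serialize the whole state and persist it.
fd, checkpoint_path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "w") as f:
    json.dump(state, f)

# The workflow run ends, but the token is still recoverable from disk,
# unencrypted, until something explicitly deletes the file.
with open(checkpoint_path) as f:
    persisted = f.read()

print("tok-example-12345" in persisted)  # the credential survived the run
```

The serialization layer has no idea the string is sensitive; it round-trips it like any other value.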
The consuming agent picks it up and uses it. Happy path. But what if the consuming step fails halfway through? What if the workflow gets cancelled? The token is still in the checkpoint store, unencrypted, with no mechanism to expire it. And there's no audit trail: you can't answer which agent consumed the token, when, or whether it was consumed at all.
What you actually need is a credential that's issued dynamically, scoped to the task, short-lived, and auditable. The queue pattern delivers none of those properties.[^2]
The Pattern: Claim URLs Instead of Credential Payloads
The fix is a two-step pattern. The producer creates a temporary, encrypted container holding the credential and gets back a claim URL. The workflow carries that URL as state. The consumer presents the URL and gets the credential back exactly once. After that, the credential is gone from the container, whether or not it was ever claimed.
Kubbi implements this as two calls: kubbi.create() and kubbi.claim(). Neither requires new infrastructure. They fit into your existing LangGraph nodes as tool calls.
Step 1: The Producer Node
In the node that fetches or generates the credential, replace the state write with a kubbi.create() call. Pass the credential payload and a TTL in seconds. You get back a claim URL. Write that URL into your state dict instead of the credential itself.
A minimal Python example: your auth node fetches an OAuth token from your identity provider. Instead of state['api_token'] = token, you call claim_url = kubbi.create(payload={'token': token}, ttl=900) and write state['api_token_url'] = claim_url. The token never touches LangGraph state. The checkpoint store holds a URL string. That's all.
The TTL of 900 seconds is a choice, not a default. Set it to match the expected time between your producer node and your consumer node, with a buffer. If your workflow typically completes in under five minutes, 900 seconds gives you headroom without leaving a long exposure window.
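A sketch of the producer node follows. The KubbiStub class is an in-memory stand-in for the real Kubbi client, written only so the snippet is self-contained; the actual client API may differ, but the shape of the call matches what's described above.

```python
import time
import uuid

class KubbiStub:
    """In-memory stand-in for the Kubbi client, for illustration only.
    Mimics create() as described: store a payload, return a claim URL.
    (The real service encrypts the payload at rest; the stub does not.)"""
    def __init__(self):
        self._store = {}

    def create(self, payload, ttl):
        token_id = uuid.uuid4().hex
        self._store[token_id] = (payload, time.monotonic() + ttl)
        return f"https://kubbi.example/claim/{token_id}"

kubbi = KubbiStub()

def auth_node(state):
    """Producer node: fetch a credential, hand the payload to Kubbi,
    and write only the claim URL into workflow state."""
    token = "oauth-token-from-idp"  # placeholder for the real IdP call
    claim_url = kubbi.create(payload={"token": token}, ttl=900)
    return {"api_token_url": claim_url}

update = auth_node({})
```

The state update carries a URL string; the token itself never enters the dict that gets checkpointed.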
Step 2: The Consumer Node
In the node that needs the credential, read the URL from state and call kubbi.claim(url=state['api_token_url']). You get back the decrypted payload. Use the credential. It's gone from Kubbi's side after that single claim.
The consumer node doesn't need to know anything about the producer. It doesn't hold a shared secret or a vault token. It holds a URL that's only useful for one claim, within the TTL window, from the right caller. That's the entire access control model.
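The consumer side, sketched with the same kind of in-memory stand-in (repeated here so the snippet stands alone; the real Kubbi client's API may differ), shows the single-claim semantics:

```python
import time
import uuid

class KubbiStub:
    """In-memory stand-in for the Kubbi client, for illustration only."""
    def __init__(self):
        self._store = {}

    def create(self, payload, ttl):
        token_id = uuid.uuid4().hex
        self._store[token_id] = (payload, time.monotonic() + ttl)
        return f"https://kubbi.example/claim/{token_id}"

    def claim(self, url):
        token_id = url.rsplit("/", 1)[-1]
        entry = self._store.pop(token_id, None)  # pop: gone after one claim
        if entry is None:
            return None  # already claimed, expired, or never existed
        payload, expires_at = entry
        if time.monotonic() > expires_at:
            return None
        return payload

kubbi = KubbiStub()
url = kubbi.create(payload={"token": "tok-abc"}, ttl=900)

def consumer_node(state):
    """Consumer node: trade the claim URL for the credential, exactly once."""
    payload = kubbi.claim(url=state["api_token_url"])
    if payload is None:
        raise RuntimeError("claim URL expired, revoked, or already claimed")
    return {"api_result": f"authenticated call with {payload['token']}"}

first = consumer_node({"api_token_url": url})
second_claim = kubbi.claim(url=url)
print(second_claim)  # None: the single claim consumed the payload
```

Note the failure branch: a consumer that claims too late gets nothing back, which is the behavior you want when a workflow stalls past its TTL.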
What Happens When the Consumer Never Runs
This is the question that exposes the weakness of the queue pattern. If the consuming step crashes, gets cancelled, or is never scheduled, the credential sits in your checkpoint store indefinitely. You need a cleanup job. The cleanup job needs to know about TTLs. That TTL logic becomes part of your codebase, your monitoring, your runbooks.
With a claim URL, the payload expires automatically when the TTL elapses. You don't write cleanup code. If the consuming node never runs because the workflow was cancelled, the credential is gone after 900 seconds or whatever you set. The checkpoint store still holds the URL string, but the URL is dead. There's nothing to clean up on your side.
If you want to be explicit about failed workflows, the producer can call a revocation endpoint before the TTL elapses. That's useful in human-in-the-loop flows where an operator rejects a step: revoke immediately instead of waiting for expiry.
Multi-Step Workflows: When More Than One Agent Needs the Same Credential
The single-claim model breaks if two downstream agents both need the same OAuth token. The first claimant gets the credential and it's gone. The second call to kubbi.claim() returns nothing.
There are two clean approaches. First, the producer creates one kubbi (one encrypted container) per consumer: one claim URL for agent B, a separate claim URL for agent C, each with its own TTL. The state dict carries two URLs. Each agent claims its own. This is the preferred pattern because it gives you per-agent audit granularity.
Second, if the workflow genuinely requires a shared credential with multiple claims, set claim_limit to the number of expected consumers when calling kubbi.create(). The payload stays claimable up to that limit, then expires. Use this when a credential is consumed by a dynamic fan-out where you don't know the exact agent count at creation time.
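A sketch of the fan-out case, again with an in-memory stand-in whose claim_limit handling mimics the behavior described above (the real client's parameter semantics may differ):

```python
import time
import uuid

class KubbiStub:
    """In-memory stand-in for the Kubbi client, for illustration only.
    claim_limit allows a fixed number of claims before the container expires."""
    def __init__(self):
        self._store = {}

    def create(self, payload, ttl, claim_limit=1):
        token_id = uuid.uuid4().hex
        self._store[token_id] = {
            "payload": payload,
            "expires_at": time.monotonic() + ttl,
            "claims_left": claim_limit,
        }
        return f"https://kubbi.example/claim/{token_id}"

    def claim(self, url):
        token_id = url.rsplit("/", 1)[-1]
        entry = self._store.get(token_id)
        if entry is None or time.monotonic() > entry["expires_at"]:
            return None
        entry["claims_left"] -= 1
        if entry["claims_left"] <= 0:
            del self._store[token_id]  # last claim destroys the container
        return entry["payload"]

kubbi = KubbiStub()
url = kubbi.create(payload={"token": "tok-shared"}, ttl=900, claim_limit=3)

# Three fan-out consumers claim successfully; a fourth gets nothing.
results = [kubbi.claim(url) for _ in range(4)]
print([r is not None for r in results])  # [True, True, True, False]
```

Setting claim_limit above 1 trades some audit granularity for flexibility, which is why the per-consumer URL pattern is preferred when the consumer set is known.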
| Scenario | Pattern | Why |
|---|---|---|
| One producer, one consumer | Single claim URL, default claim limit of 1 | Cleanest audit trail. Credential destroyed immediately after use. |
| One producer, fixed set of consumers | One claim URL per consumer node | Per-agent audit logging, independent TTLs, no cross-consumer dependency. |
| One producer, dynamic fan-out | Single claim URL with claim_limit set to expected consumer count | Avoids creating N URLs when N is unknown at creation time. |
*Choosing the right claim structure for your workflow topology*
Encryption
Every payload in Kubbi is encrypted at rest with AES-256 before it touches storage. The claim URL itself contains no credential material. An attacker who intercepts the URL can't recover the payload without the decryption key. This matters because claim URLs will end up in logs. That's expected. A URL in a log is not a credential leak.
Compare that to the baseline: an OAuth token serialized into a LangGraph state checkpoint, stored unencrypted in a Postgres row, with no TTL and no indication in your logs that it's there. That token in that row is the exposure. The URL in a log is not.
Audit Logging and the SOC 2 Question
When you pass credentials through a queue or state dict, the handoff is invisible to every audit tool in your stack. Your secrets manager knows the credential was issued. Your SIEM knows about network calls. Nobody knows that agent A passed a token to agent B at 14:32:07 UTC, that agent B claimed it 40 seconds later, or that the token expired unused in the one run where agent B crashed.
A per-claim audit log answers the question SOC 2 auditors will ask: which workload accessed which credential and when. Each kubbi.claim() call is a logged event: the claim URL identifier, the timestamp, and optionally the caller identity you pass as metadata when you call kubbi.create(). You get a trail that maps directly to your workflow run IDs.
Wire this up during implementation. Add a metadata field to your kubbi.create() call that includes the LangGraph run ID and the producing agent's node name. When the auditor asks for all credential accesses by the payment processing workflow in Q3, you have a query, not a reconstruction project.
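The wiring can be sketched like this. The stand-in client below is illustrative only; the metadata parameter and the shape of the logged event follow the description above, not a confirmed API:

```python
import time
import uuid
from datetime import datetime, timezone

class KubbiStub:
    """In-memory stand-in for the Kubbi client, for illustration only.
    Records one audit event per claim, as described above."""
    def __init__(self):
        self._store = {}
        self.audit_log = []

    def create(self, payload, ttl, metadata=None):
        token_id = uuid.uuid4().hex
        self._store[token_id] = {
            "payload": payload,
            "expires_at": time.monotonic() + ttl,
            "metadata": metadata or {},
        }
        return f"https://kubbi.example/claim/{token_id}"

    def claim(self, url):
        token_id = url.rsplit("/", 1)[-1]
        entry = self._store.pop(token_id, None)
        if entry is None or time.monotonic() > entry["expires_at"]:
            return None
        self.audit_log.append({
            "claim_id": token_id,
            "claimed_at": datetime.now(timezone.utc).isoformat(),
            "metadata": entry["metadata"],  # run ID, node name, etc.
        })
        return entry["payload"]

kubbi = KubbiStub()
url = kubbi.create(
    payload={"token": "tok-xyz"},
    ttl=900,
    metadata={"run_id": "run-2024-0042", "producer_node": "auth_node"},
)
kubbi.claim(url)
print(kubbi.audit_log[0]["metadata"]["run_id"])  # run-2024-0042
```

With the run ID attached at creation time, every claim event maps straight back to a workflow run without any log correlation work.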
Common Pitfalls
Setting the TTL too short is the most common mistake. If your workflow has a human-in-the-loop review step between the producer and consumer, a 15-minute TTL will expire before the reviewer gets to it. Set the TTL to match the realistic worst-case latency for that step. For human review steps, hours or overnight are reasonable.
Putting the claim URL in an environment variable defeats the purpose. The URL should travel through your workflow state, not be injected into the agent's environment at startup. An agent with the URL in its environment has roughly the same exposure profile as an agent with the credential in its environment. The TTL helps, but you've given up the structural guarantee.
Don't skip the revocation call on workflow cancellation. If your orchestrator cancels a run mid-flight, add a cleanup hook that revokes any outstanding claim URLs. LangGraph supports interrupt and cleanup patterns. Use them. Waiting for TTL expiry is acceptable for crashes. For intentional cancellations, it's lazy.
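A cancellation hook can be as small as this sketch. The revoke() method here is hypothetical, standing in for the revocation endpoint mentioned above, and the stand-in client exists only to make the snippet self-contained:

```python
import time
import uuid

class KubbiStub:
    """In-memory stand-in for the Kubbi client; revoke() is hypothetical,
    mirroring the revocation endpoint described above."""
    def __init__(self):
        self._store = {}

    def create(self, payload, ttl):
        token_id = uuid.uuid4().hex
        self._store[token_id] = (payload, time.monotonic() + ttl)
        return f"https://kubbi.example/claim/{token_id}"

    def claim(self, url):
        entry = self._store.pop(url.rsplit("/", 1)[-1], None)
        if entry is None or time.monotonic() > entry[1]:
            return None
        return entry[0]

    def revoke(self, url):
        # Destroy the container immediately instead of waiting for the TTL.
        self._store.pop(url.rsplit("/", 1)[-1], None)

kubbi = KubbiStub()

def on_workflow_cancelled(state):
    """Cleanup hook: revoke any outstanding claim URLs found in state.
    The `_url` key suffix is a naming convention assumed for this sketch."""
    for key, value in state.items():
        if key.endswith("_url"):
            kubbi.revoke(value)

url = kubbi.create(payload={"token": "tok-abc"}, ttl=900)
on_workflow_cancelled({"api_token_url": url})
result = kubbi.claim(url)
print(result)  # None: revoked before any claim
```

Hooking this into your orchestrator's cancellation path closes the window between an operator's rejection and the TTL expiry.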
Where to Go From Here
The implementation described here takes an afternoon. You replace state dict credential writes with kubbi.create() calls and state dict credential reads with kubbi.claim() calls. The workflow structure doesn't change. The orchestration layer doesn't change. You're not adopting new infrastructure. You're changing what travels through the infrastructure you already have.
Once claim URLs are flowing through your graph instead of raw credentials, the next step is structured metadata on every kubbi.create() call: run ID, node name, credential type, intended consumer. That metadata is what turns Kubbi's event log into a compliance artifact you can hand to an auditor without additional work.
Start with your highest-risk handoff: the one where a long-lived token or a user's OAuth credential moves between two nodes. Replace that single state write with kubbi.create() and the corresponding read with kubbi.claim(). Run it in staging. The workflow behavior is identical. What changes is what survives a checkpoint dump.
Key takeaways
- Passing OAuth tokens or API keys through LangGraph state checkpoints leaves them stored unencrypted in backing stores like Redis or Postgres with no TTL and no revocation path.
- Traditional secrets managers were built on assumptions that multi-agent workflows break: static management, infrequent context switching, and human-scale workloads. That makes them inadequate for dynamic LangGraph pipelines.
- The claim URL pattern fixes credential exposure by having the producer call kubbi.create() with a TTL (for example, 900 seconds) and passing only the resulting URL through workflow state, so the raw credential never touches the orchestration layer.
- When two downstream agents need the same credential, creating one claim URL per consumer is the preferred approach because it provides per-agent audit granularity and independent TTLs.
- Each kubbi.claim() call produces a logged event tying the credential access to a timestamp and caller identity, giving SOC 2 auditors a direct query into which workload accessed which credential and when.
Footnotes

[^1]: https://aembit.io/blog/future-of-secrets-management-in-the-era-of-agentic-ai/ — Analysis of how agentic AI breaks the three core assumptions of traditional secrets managers, and what dynamic, context-aware credential systems require.

[^2]: https://www.akeyless.io/blog/secure-enterprise-ai-with-unified-secrets-non-human-identity-management/ — Unified secrets management approach with JIT short-lived token retrieval and the properties required for secure agentic credential delivery.