LLAMAINDEX INTEGRATION

LlamaIndex payment integration.

Add USDC payments to any LlamaIndex agent. Drop into a ReActAgent, gate a QueryEngineTool, charge per RAG response.

SHORT ANSWER

Blockchain0x provides drop-in payment_request_tool and paid_query_engine_tool helpers for LlamaIndex agents. Install blockchain0x-llamaindex, add the tool to your ReActAgent or FunctionAgent, or gate an existing QueryEngineTool behind payment. Your agent can now charge USDC before serving paid RAG responses or expensive inference. Payments settle on Base; webhooks confirm.

WHY LLAMAINDEX FITS PAID RAG

RAG queries are expensive. Charging per query makes the economics work.

LlamaIndex's strength is retrieval-augmented generation: you have a vector index over proprietary documents (your company's docs, a research dataset, a legal corpus), and the agent answers questions by retrieving and summarizing the relevant chunks. Each query costs real money in embeddings, LLM tokens, and (often) third-party API calls. Free unlimited access burns budget; per-query billing fits the cost shape.

The paid_query_engine_tool() helper is built for this pattern. You wrap your existing QueryEngineTool with a payment gate; the agent must charge the client before each query (or per session, with caching). Most LlamaIndex agents we see in production are this shape: a paid index serving high-value answers with usage-aligned billing.

INSTALLATION

One pip install. Three environment variables. LlamaIndex 0.11+ ready.

Targets Python 3.10+ and LlamaIndex 0.11+ (with the modern ReActAgent / FunctionAgent / Workflows APIs). Works alongside any vector store (Pinecone, Qdrant, Chroma, Weaviate, LlamaIndex Cloud).

INSTALL

pip install blockchain0x-llamaindex

ENVIRONMENT VARIABLES

export BLOCKCHAIN0X_API_KEY=sk_live_...
export BLOCKCHAIN0X_AGENT_ID=agt_abc123
export BLOCKCHAIN0X_SIGNING_SECRET=whsec_...

BLOCKCHAIN0X_API_KEY and BLOCKCHAIN0X_AGENT_ID come from the agent's settings page after creation in the Blockchain0x dashboard. BLOCKCHAIN0X_SIGNING_SECRET is needed only in the FastAPI process handling webhooks.
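
A small startup check can catch a missing or misplaced variable before the agent runs. The sketch below is illustrative and not part of the SDK: load_settings and Settings are hypothetical names, and the "agt_" prefix check is an assumption based only on the example value shown above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    agent_id: str
    signing_secret: Optional[str]  # only needed by the webhook process


def load_settings(env: Mapping[str, str] = os.environ, *, webhook: bool = False) -> Settings:
    """Hypothetical helper: read the variables and fail fast if one is missing."""
    required = ["BLOCKCHAIN0X_API_KEY", "BLOCKCHAIN0X_AGENT_ID"]
    if webhook:
        required.append("BLOCKCHAIN0X_SIGNING_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Assumption: agent IDs start with "agt_", as in the example value above.
    if not env["BLOCKCHAIN0X_AGENT_ID"].startswith("agt_"):
        raise RuntimeError("BLOCKCHAIN0X_AGENT_ID does not look like an agent ID")
    return Settings(
        api_key=env["BLOCKCHAIN0X_API_KEY"],
        agent_id=env["BLOCKCHAIN0X_AGENT_ID"],
        signing_secret=env.get("BLOCKCHAIN0X_SIGNING_SECRET"),
    )
```

Call load_settings() in the agent process and load_settings(webhook=True) in the webhook process, so that only the webhook server requires the signing secret.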

FULL AGENT EXAMPLE

A ReActAgent with payment and refund tools.

Below is a complete LlamaIndex ReActAgent that requests USDC payment before producing a research report. The agent has both payment_request_tool (charge the client) and refund_payment_tool (refund a payment when something goes wrong). Add your own QueryEngineTool to the tools list and the agent can also query a RAG index.

AGENT.PY

import asyncio

from llama_index.core.agent.workflow import ReActAgent
from llama_index.llms.openai import OpenAI
from blockchain0x.llamaindex import payment_request_tool, refund_payment_tool

llm = OpenAI(model="gpt-4o")

agent = ReActAgent(
    name="research-bot",
    tools=[
        payment_request_tool,
        refund_payment_tool,
        # ... your other tools, e.g. a QueryEngineTool over your docs index
    ],
    llm=llm,
    system_prompt=(
        "You produce paid research reports for clients. "
        "Before doing any research, use payment_request_tool to request "
        "USDC payment. Hand the hosted_url to the client and wait for "
        "the webhook to confirm payment before delivering the report."
    ),
)


async def main() -> None:
    # agent.run() must be awaited, so call it from inside a coroutine
    response = await agent.run(
        "Write me a Q4 LLM market analysis. Charge me $5 USDC for it."
    )
    print(response)


if __name__ == "__main__":
    asyncio.run(main())

When the agent reasons through the prompt, it identifies the dollar amount, calls payment_request_tool, gets back a hosted_url, and returns the URL to the user. The user clicks the link, pays $5 USDC on Base, and your webhook fires payment.confirmed within seconds. From the agent's perspective: one tool call, end the run, wait for a fresh resume when payment lands.

WEBHOOK HANDLING

Resuming the agent after payment.

Your webhook URL receives a signed POST when the chain confirms the payment. Verify the signature, then trigger a fresh agent.run() (or enqueue one) to deliver the paid work. FastAPI example below; same pattern in any async Python framework.

WEBHOOK.PY

from fastapi import FastAPI, Request, HTTPException
from blockchain0x.llamaindex import verify_webhook
import os

app = FastAPI()
SIGNING_SECRET = os.environ["BLOCKCHAIN0X_SIGNING_SECRET"]

@app.post("/webhooks/payment")
async def receive(request: Request):
    signature = request.headers.get("X-Blockchain0x-Signature", "")
    body = await request.body()
    if not verify_webhook(body, signature, SIGNING_SECRET):
        raise HTTPException(status_code=401)
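    event = await request.json()
    if event["type"] == "payment.confirmed":
        # Resume the agent with the confirmed payment context.
        # trigger_research_for is your own function, not an SDK import;
        # in production it should enqueue a job rather than run the agent inline.
        await trigger_research_for(event["data"]["payment_request_id"])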
    return {"ok": True}

The verify_webhook helper uses constant-time HMAC-SHA256. Read the raw body via await request.body(); do not call request.json() and then re-serialize, because the re-serialized bytes may differ from the bytes that were signed. The recommended architecture is for the webhook to enqueue an agent-resume job via Celery or arq rather than call agent.run() inline: LlamaIndex runs over expensive indexes can take 30+ seconds, long enough to time out the HTTP request.
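
For readers unfamiliar with the technique, constant-time HMAC-SHA256 verification can be sketched with the standard library alone. This is not the SDK's verify_webhook implementation: the function name verify_signature, the hex encoding of the signature, and the "whsec_test" secret are assumptions made for illustration.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature_hex: str, secret: str) -> bool:
    """Illustrative check: recompute the HMAC over the raw bytes and compare."""
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    # compare_digest does not short-circuit on the first differing byte,
    # so response timing does not reveal how much of the signature matched.
    return hmac.compare_digest(expected.encode("utf-8"), signature_hex.encode("utf-8"))


body = b'{"type":"payment.confirmed"}'
good = hmac.new(b"whsec_test", body, hashlib.sha256).hexdigest()
print(verify_signature(body, good, "whsec_test"))         # True
print(verify_signature(body + b" ", good, "whsec_test"))  # False
```

The second call fails after a single trailing space is appended, which is the kind of byte-level change a parse-and-re-serialize round trip can introduce.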

STARTER REPOSITORY

Working example with ReActAgent + Workflow + paid QueryEngine.

A complete LlamaIndex repository at the GitHub link below. Includes a ReActAgent example, a LlamaIndex Workflow variant, a paid_query_engine_tool example over a Chroma index, a FastAPI webhook handler, and a docker-compose orchestrating the agent, the webhook server, the vector store, and a Redis queue.

github.com/blockchain0x/agent-wallet-llamaindex

Repository structure: agent.py (ReActAgent), workflow.py (Workflow variant), paid_query.py (paid QueryEngineTool), webhook.py (FastAPI), docker-compose.yml, README with deployment notes for Modal, Fly.io, and Replit.

COMMON PITFALLS

Five LlamaIndex-specific traps to avoid.

These come from our support inbox. Each saves at least an hour of debugging once you know about it.

PITFALL 1

ReActAgent vs FunctionAgent

LlamaIndex 0.11+ has two agent flavors: ReActAgent (prompt-driven reasoning and tool calls in a loop) and FunctionAgent (built on the LLM provider's native function-calling API). The payment_request_tool works in both, but the prompting differs. ReActAgent needs explicit instructions to call the tool before answering; FunctionAgent picks it up automatically when the user message implies payment. The starter repo has working examples of both shapes.

PITFALL 2

Mixing payment with QueryEngineTool

A common LlamaIndex pattern is wrapping a vector index as a QueryEngineTool and exposing it as a paid resource. The wrong way: query the index, then ask for payment afterward. The right way: gate the QueryEngineTool behind a payment-required check. The SDK ships a paid_query_engine_tool() helper that combines a QueryEngineTool with payment_request_tool so the agent must charge before querying. See the starter repo for the wiring.
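
The charge-before-query ordering can be illustrated without the SDK. The sketch below is not how paid_query_engine_tool() is implemented; PaymentGate, PaymentRequired, fake_index_query, and the in-memory set of paid session IDs are hypothetical stand-ins.

```python
from typing import Callable, Set


class PaymentRequired(Exception):
    """Raised when a query arrives for a session that has not paid."""


class PaymentGate:
    """Hypothetical wrapper: refuses to run the query until payment is recorded."""

    def __init__(self, query_fn: Callable[[str], str]) -> None:
        self._query_fn = query_fn
        self._paid_sessions: Set[str] = set()

    def mark_paid(self, session_id: str) -> None:
        # In a real deployment the webhook handler would call this
        # after a confirmed payment event.
        self._paid_sessions.add(session_id)

    def query(self, session_id: str, question: str) -> str:
        if session_id not in self._paid_sessions:
            # The expensive retrieval never runs for unpaid sessions.
            raise PaymentRequired(f"session {session_id} has not paid")
        return self._query_fn(question)


def fake_index_query(question: str) -> str:
    # Stand-in for a real query_engine.query(question) call.
    return f"answer to: {question}"


gate = PaymentGate(fake_index_query)
```

Because the check sits in front of the query function, an unpaid request costs nothing in embeddings or LLM tokens. A production version would keep the paid-session record in durable storage rather than in memory.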

PITFALL 3

Async vs sync execution

LlamaIndex 0.11+ is async-first for agents. The payment_request_tool returns an awaitable, and agent.run() must itself be awaited: if you call agent.run() without await, the run's result and any tool errors never reach your code. If your agent says it succeeded but no payment request appeared in the Blockchain0x dashboard, this is almost certainly the issue.
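
The general Python behavior behind this pitfall can be shown with a stand-in coroutine. fake_run below is hypothetical and only mimics an async entry point; it makes no claim about how agent.run() is implemented internally.

```python
import asyncio


async def fake_run(prompt: str) -> str:
    """Stand-in for an async agent entry point."""
    await asyncio.sleep(0)  # pretend to do tool calls
    return f"done: {prompt}"


async def main() -> str:
    pending = fake_run("charge $5")  # no await: the body has not executed yet
    assert asyncio.iscoroutine(pending)
    return await pending  # awaiting is what actually runs the body


result = asyncio.run(main())
print(result)  # done: charge $5
```

In a script, wrap the call in an async function and start it with asyncio.run(). Inside a FastAPI handler or a notebook an event loop is already running, so await the call directly.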

PITFALL 4

Decimal precision in tool args

LlamaIndex passes tool arguments through pydantic models. Pass amount_usdc as a string ("5.00"), not a float (5.0). A float that is converted to a string along the way can carry binary rounding error and come out as "4.999999999". The SDK validates the final value before the HTTP call, but a confusing 422 on a sub-cent amount usually traces back to this.
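
One way to keep floats out of the pipeline is to normalize amounts with Decimal before they reach the tool. The helper below is a sketch, not part of the SDK: format_usdc is a hypothetical name, and rounding to two decimal places is an assumption based on the "5.00" example above.

```python
from decimal import Decimal, InvalidOperation, ROUND_HALF_UP
from typing import Union

CENT = Decimal("0.01")


def format_usdc(amount: Union[str, Decimal]) -> str:
    """Hypothetical helper: return a two-decimal string such as "5.00"."""
    if isinstance(amount, float):
        raise TypeError("pass amounts as str or Decimal, not float")
    try:
        value = Decimal(amount)
    except InvalidOperation as exc:
        raise ValueError(f"not a decimal amount: {amount!r}") from exc
    if not value.is_finite() or value <= 0:
        raise ValueError("amount must be a positive, finite number")
    # Round to whole cents so no sub-cent value is ever sent.
    return str(value.quantize(CENT, rounding=ROUND_HALF_UP))


print(format_usdc("5"))      # 5.00
print(format_usdc("4.999"))  # 5.00
print(repr(0.1 + 0.2))       # 0.30000000000000004
```

The last line shows the binary rounding error that Decimal avoids: two floats that look exact in source code do not sum to an exact decimal value.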

PITFALL 5

Persistence across agent runs

Each agent.run() call is stateless by default; the agent does not remember that it requested a payment in a previous run. To resume work after a webhook fires, persist the payment_request_id somewhere durable (Redis, SQLite, Postgres) and use it as the agent's context in the resume call. Without this, your agent re-requests payment for the same job in the resume run.
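
A minimal sketch of that persistence step, using SQLite from the standard library. The table layout, the function names, and the "pr_123" ID format are assumptions for illustration; the same idea applies to Redis or Postgres.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the job table; use a file path in real deployments."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending_jobs ("
        " payment_request_id TEXT PRIMARY KEY,"
        " prompt TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'awaiting_payment')"
    )
    return conn


def save_pending(conn: sqlite3.Connection, payment_request_id: str, prompt: str) -> None:
    """Record the job right after the agent requests payment."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO pending_jobs (payment_request_id, prompt) VALUES (?, ?)",
            (payment_request_id, prompt),
        )


def claim_paid_job(conn: sqlite3.Connection, payment_request_id: str) -> Optional[str]:
    """Return the stored prompt once; repeat or unknown IDs return None."""
    with conn:
        cur = conn.execute(
            "UPDATE pending_jobs SET status = 'paid' "
            "WHERE payment_request_id = ? AND status = 'awaiting_payment'",
            (payment_request_id,),
        )
        if cur.rowcount == 0:
            return None
        row = conn.execute(
            "SELECT prompt FROM pending_jobs WHERE payment_request_id = ?",
            (payment_request_id,),
        ).fetchone()
    return row[0]


store = open_store()
save_pending(store, "pr_123", "Write a Q4 LLM market analysis.")
```

The webhook handler would call claim_paid_job() with the ID from the event and pass the returned prompt to the resume run, along with a note that payment is already confirmed. Returning None on a second call also guards against the same event being processed twice.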

FREQUENTLY ASKED

Three LlamaIndex-specific questions.

Does this work with LlamaIndex Workflows (the newer event-driven primitive)?

Yes. Workflows are LlamaIndex's newer step-based execution model where steps emit and consume events. The payment_request_tool can be called from any step; the pattern is to emit a PaymentRequested event, end that step, and have a separate webhook-driven step listen for PaymentConfirmed events that resume the work. The starter repo includes both a classic ReActAgent example and a Workflow example for direct comparison.

Can the same agent handle both paid and free queries?

Yes. Wrap only the tools you want gated with the paid_query_engine_tool() helper; leave free tools (general knowledge questions, status checks) as plain QueryEngineTool or FunctionTool. The agent decides per query which path to take based on the user's intent. This is the recommended shape for hybrid free/paid LlamaIndex agents: the free tools drive discovery while the expensive ones are monetized.

How does this integrate with LlamaIndex Cloud (managed vector indexes)?

Same way as self-hosted LlamaIndex: the payment_request_tool is independent of where your index lives. Whether you query a managed LlamaIndex Cloud index or a local Chroma/Pinecone/Qdrant collection, the payment flow is identical. The only LlamaIndex Cloud-specific note is to keep the agent's LlamaIndex Cloud API key separate from the Blockchain0x API key; we have seen first-time integrators paste the wrong one into the wrong env var.

Charge for your RAG responses.

Wrap a QueryEngineTool in a payment gate. Pro at $9/agent/month.