AI agents are getting real tools. Database access, API calls, the ability to take actions on behalf of users. That makes them powerful, and it gives them a real attack surface. agent-inject is an open-source training range for exploring what happens when agentic AI systems get misconfigured. Deploy a realistic AI agent on AWS, toggle specific misconfigurations through Terraform, and run attack scenarios that demonstrate prompt injection, RAG poisoning, data exfiltration, and more. If you want the full build narrative, there is a three-part series covering the entire construction. This post is the showcase.

The Concept

agent-inject follows a deploy-secure-first philosophy. You start with a hardened baseline, not a broken one. A fictional SaaS company called NovaCrest Solutions runs an AI-powered customer support agent. It can look up customers, check refund eligibility, process refunds, and search a knowledge base. Everything works correctly out of the box.

The interesting part is the toggle pattern. Six boolean Terraform variables control six deliberate misconfigurations. Flip one, run terraform apply, and you have a specific vulnerability to explore.

enable_overpermissive_iam  = false  # wildcard IAM permissions
guardrail_sensitivity      = "HIGH" # content filter sensitivity (HIGH/LOW/NONE)
kb_include_internal_docs   = false  # index confidential docs in RAG
enable_refund_confirmation = true   # require confirmation before processing
use_weak_system_prompt     = false  # swap secure prompt for a 4-line stub
enable_excessive_tools     = false  # expose tools beyond least-privilege

Each toggle maps to a different vulnerability class. Combine them and the failures compound. This is the same dynamic that creates real breaches - individual controls fail independently, and the combination is what produces impact.

Architecture

The stack runs entirely on AWS with Terraform managing all resources across 9 modules.

The agent itself is an Amazon Bedrock Agent using a ReAct reasoning loop. It connects to a RAG pipeline built from S3 source documents, an OpenSearch Serverless vector store, and Amazon Titan Text Embeddings V2 for vectorisation. Four Lambda-backed tools handle customer lookups, refund eligibility checks, refund processing, and knowledge base search. Bedrock Guardrails provide content filtering, prompt attack detection, PII redaction, and denied topic policies.

The frontend is a Streamlit application running on EC2 with a chat interface and a trace panel that exposes agent reasoning steps, tool calls with parameters, knowledge base retrieval sources, and guardrail interventions in real time. CloudWatch captures everything for post-session analysis. The trace panel is where the learning happens - you can watch the agent’s decision-making process as it encounters poisoned content or follows a social engineering path.

The Attack Scenarios

Five scenarios, each demonstrating a different vulnerability class. Every scenario includes documented walkthroughs you can follow step by step.

Prompt Leakage. Extract the agent’s system prompt, tool definitions, and internal configuration through conversational techniques. The secure prompt with HIGH guardrails blocks these attempts. The weak prompt with reduced guardrails exposes everything.

RAG Poisoning. A 3-turn social engineering attack that exploits poisoned knowledge base content. Turn 1 establishes identity with a real customer lookup. Turn 2 triggers retrieval of a poisoned document that primes the agent with fabricated refund policies. Turn 3 requests a $750 refund. The agent complies because its context now contains “authoritative” documentation saying this is standard procedure. Content poisoning, not instruction injection.

Tool Manipulation. Exploit missing confirmation steps and weak system prompts to invoke tools with attacker-controlled parameters. Process refunds for arbitrary customers, modify records, call tools in sequences the system was never designed to permit.

Data Exfiltration. Extract customer records, internal documents, and sensitive data through conversational requests. With the right misconfigurations active, the agent will retrieve confidential onboarding guides, PTO policies, and incident response playbooks from the knowledge base and relay them directly.

Full Kill Chain. Nine steps, 12 tool calls, 2 knowledge base lookups, zero guardrail blocks. Reconnaissance, knowledge base probing, RAG poisoning, data breach, internal system access, financial fraud, data exfiltration, persistence through record modification, and cover track queries. Every step uses natural language. No exploit code, no special characters. Just a conversation.

The scenarios map to the OWASP LLM Top 10 and the OWASP Agentic AI Top 10, covering prompt injection, insecure output handling, excessive agency, and more.

What We Found

Building and testing agent-inject surfaced several findings that apply beyond this specific lab.

Multi-turn social engineering defeats all prompt-level defences. The secure baseline with HIGH guardrails stopped 13 out of 14 automated attacks. The one that got through was a 3-turn social engineering pattern where no single message triggers any filter. Gradual authority escalation is the real threat, not clever injection syntax.

Models hallucinate completed actions without calling tools. When primed with poisoned RAG content, the agent sometimes told users “your refund has been processed” without ever invoking the refund tool. If you are monitoring tool calls as your detection signal, this attack is invisible.

Guardrails only filter input and output, not intermediate context. Knowledge base documents and tool outputs enter the model’s context window unfiltered. A poisoned document retrieved via RAG bypasses every guardrail because the guardrail never sees it. This is the fundamental architectural gap.

Generic policy language gets interpreted loosely. An anti-enumeration rule telling the agent not to “call the same tool repeatedly” was ignored when a user politely asked to look up five customers in a table. The fix required explicit, specific rules. Vague security instructions do not survive contact with friendly requests.

Identity verification must be architectural, not conversational. The agent cannot distinguish between a customer and someone claiming to be an internal employee. No system prompt fixes this. You need external authentication and authorisation systems that exist outside the conversation.

NovaCrest agent UI showing a data exfiltration attack with full trace output

Try It

The environment costs roughly $6/day to run, primarily from OpenSearch Serverless. Deploy it, run the scenarios, tear it down when you are done.

The practical flow:

  1. Clone the repo and deploy with the secure baseline configuration
  2. Open the Streamlit UI and explore the agent as a normal user - ask about products, look up a customer, check refund policies
  3. Toggle one or more misconfigurations and run terraform apply
  4. Follow the documented walkthroughs for each attack scenario
  5. Watch the trace panel to understand what the agent is doing at each step
  6. Go off-script - try your own attacks, combine toggles, see what breaks

The best use of this lab is not following the walkthroughs. It is going off-script and discovering what works that nobody planned for.

If you find something interesting, open an issue. If you build a new scenario, submit a PR. Feedback on the walkthroughs, the documentation, or the infrastructure is welcome. The project is licensed under Apache 2.0.

The Build Series

For the full narrative of how agent-inject was designed, built, and tested:


Clone it, break it, fix it: github.com/keirendev/agent-inject