Grafana Labs blog

How to monitor AI agent applications on Amazon Bedrock AgentCore with Grafana Cloud


Summary

Today’s AI agents have grown increasingly sophisticated, moving into production environments and becoming integral parts of engineering workflows. But these agents can also be black boxes for engineers, which makes observability more critical than ever. Without proper monitoring, you’re often left flying blind as you try to debug agent failures, understand performance bottlenecks, and track costs. We want to put our users back in control, so in this tutorial you’ll learn how to deploy an AI agent on Amazon Bedrock AgentCore with full observability powered by OpenTelemetry and Grafana Cloud.

More specifically, you’ll learn how to:

1. Deploy AI agents on AWS Bedrock AgentCore for a managed, scalable production runtime
2. Instrument agents with OpenTelemetry using OpenLit for automatic, zero-code observability
3. Monitor agent performance in Grafana Cloud with AI Observability dashboards
4. Debug production issues using distributed tracing
5. Optimize costs by tracking token usage and model performance

Note: This post focuses on AI application observability. The second part of this guide will focus on AI observability at the infrastructure layer.

What is Amazon Bedrock AgentCore?

Amazon Bedrock AgentCore is a managed service that simplifies deploying and running AI agents in production. Think of it as a serverless runtime for your AI agents: you provide the agent code, and AWS handles the infrastructure, scaling, and execution environment.
Key benefits include:

- Managed infrastructure: No need to provision servers or manage Kubernetes clusters
- Amazon Bedrock integration: Native access to foundation models like Llama 3, Claude, and others
- Container-based deployment: Package your agent with all dependencies using Docker
- Enterprise-ready: Built-in security, IAM integration, and compliance features

AgentCore is particularly powerful for orchestration frameworks like CrewAI, LangGraph, or Strands, where coordinating multiple agents or complex workflows is necessary.

Why use OpenTelemetry for AI agents?

AI agents can be notoriously difficult to debug. A single user query might trigger:

- Multiple LLM API calls
- Tool invocations and external API requests
- Multi-step reasoning chains
- Retry logic and error handling

When something goes wrong (or worse, when performance silently degrades), you need visibility into every step. To address this, we recommend using OpenTelemetry (OTel), the industry-standard observability framework, which provides unified instrumentation for distributed applications and infrastructure. For AI agents specifically, OpenTelemetry helps you answer critical questions:

- Which LLM calls are slowest?
- How many tokens am I consuming per request?
- Where are errors occurring in my agent workflow?
- What’s the end-to-end latency for user requests?

And while OpenTelemetry is powerful, manually instrumenting every LLM call and agent step is tedious and error-prone. This is where OpenLit shines. OpenLit provides automatic instrumentation for AI frameworks:

- Zero code changes required; wrap your Python command with openlit-instrument
- Automatic capture of LLM calls (OpenAI, Anthropic, Bedrock, etc.)
- Support for agent frameworks (CrewAI, LangChain, LlamaIndex)
- Export of OpenTelemetry-compatible data to any OTLP backend

Tutorial: deploy and monitor a CrewAI agent

To illustrate how this works, let’s build a complete example: a research assistant agent powered by CrewAI and Meta’s Llama 3, deployed on AWS Bedrock AgentCore, with full observability in Grafana Cloud.

Prerequisites

Before starting, ensure you have:

1. Python 3.12+ installed
2. AWS CLI configured with credentials:

```bash
aws configure
```

3. Permissions for:
   - Bedrock AgentCore
   - Amazon ECR (Elastic Container Registry)
   - Bedrock model access (specifically meta.llama3-8b-instruct-v1:0)
4. A Grafana Cloud account (If you don’t have one, you can sign up for our forever-free tier now.)
5. The AgentCore CLI installed:

```bash
python -m venv .venv && source .venv/bin/activate
pip install bedrock-agentcore-starter-toolkit
```

Step 1: Create a CrewAI agent

Let’s create an example AI agent using CrewAI:

```python
import os
from bedrock_agentcore import BedrockAgentCoreApp
from crewai import Agent, Task, Crew, Process

# Initialize AgentCore runtime
app = BedrockAgentCoreApp()

# Define a simple research assistant agent
researcher = Agent(
    role="Research Assistant",
    goal="Provide helpful, accurate answers, with concise summaries.",
    backstory=(
        "You are a knowledgeable research assistant who answers clearly "
        "and cites facts when relevant."
    ),
    # Use Llama 3 8B via AWS Bedrock
    llm="bedrock/meta.llama3-8b-instruct-v1:0",
    verbose=False,
    max_iter=2,
)

@app.entrypoint
def invoke(payload: dict):
    """AgentCore entrypoint. Expects {'prompt': '...'}."""
    user_message = payload.get("prompt", "Hello!")
    task = Task(
        description=user_message,
        agent=researcher,
        expected_output="A helpful, well-structured response.",
    )
    crew = Crew(
        agents=[researcher],
        tasks=[task],
        process=Process.sequential,
        verbose=False,
    )
    result = crew.kickoff()
    return {"result": result.raw}

if __name__ == "__main__":
    app.run()
```

Key components:

- BedrockAgentCoreApp: Integrates CrewAI with the Amazon Bedrock AgentCore runtime
- Agent definition: Single agent with a research assistant role using Llama 3
- @app.entrypoint: Decorator that marks the function as the agent’s entry point
- Crew orchestration: CrewAI manages task execution and agent coordination

The agent accepts JSON input like {"prompt": "your question"} and returns a JSON response.

Step 2: Configure dependencies

Create a requirements.txt that includes:

```
crewai>=1.0.0
openlit>=1.35
litellm
bedrock-agentcore
```

Step 3: Configure AgentCore deployment

Run the AgentCore configuration command:

```bash
agentcore configure \
--deployment-type container \
--entrypoint crewai_agent.py \
--name crewai_agent \
--non-interactive
```

This generates a .bedrock_agentcore/crewai_agent/ directory with:

- Dockerfile: Container build configuration
- agent_config.json: Metadata for AgentCore

Step 4: Add OpenTelemetry configuration

Now comes the observability magic. Edit the generated Dockerfile at .bedrock_agentcore/crewai_agent/Dockerfile and add these environment variables:

```dockerfile
# Disable AWS ADOT observability to use OpenLit exclusively
ENV DISABLE_ADOT_OBSERVABILITY="true"

# OpenTelemetry configuration for Grafana Cloud
ENV OTEL_SERVICE_NAME="my_service"
ENV OTEL_DEPLOYMENT_ENVIRONMENT="my_environment"
ENV OTEL_EXPORTER_OTLP_ENDPOINT="<your_grafana_cloud_otlp_endpoint>"
ENV OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic%20<your_base64_encoded_token>"
```

Important: Replace the OTLP endpoint and headers with your Grafana Cloud credentials:

1. Sign in to the Grafana Cloud portal and select your Grafana Cloud stack.
2. Click Configure in the OpenTelemetry section.
3. In the Password / API Token section, click Generate to create a new API token.
4. Give the API token a name.
5. Click Create token.
6. Click Close without copying the token.
7. Copy and replace the values for OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS in the Dockerfile ENVs.

For more information, refer to our guide on manually setting up OpenTelemetry for Grafana Cloud.

Next, ensure the CMD line in the Dockerfile uses OpenLit’s instrumentation wrapper:

```dockerfile
# Use OpenLit to automatically instrument the agent
CMD ["openlit-instrument", "python", "-m", "crewai_agent"]
```

What’s happening here?

- openlit-instrument wraps your Python command
- At runtime, OpenLit automatically monitors the CrewAI agent operations
- Every LLM request and agent task is traced and exported to Grafana via OTLP

Step 5: Build and deploy

Build the Docker image and deploy to AgentCore:

```bash
agentcore launch --local-build
```

This command will:

1. Build the Docker image locally with all dependencies
2. Push the image to Amazon ECR
3. Deploy the agent to Bedrock AgentCore
4. Set up IAM execution roles
5. Configure the runtime environment

The deployment process takes two to five minutes. You’ll see output like:

```
Pushing to ECR...
Deploying to AgentCore...
Agent deployed successfully!
Agent ID: agt_abc123xyz
```

Step 6: Invoke the agent

Test your deployed agent:

```bash
# Simple test
agentcore invoke '{"prompt": "hi"}'

# Research query
agentcore invoke '{"prompt": "Explain AI Observability"}'

# Complex request
agentcore invoke '{"prompt": "Compare supervised and unsupervised learning with examples"}'
```

Response:

```json
{
  "result": "Quantum computing is a revolutionary approach to computation that..."
}
```
Step 7: Explore Grafana Cloud AI Observability

Once you have telemetry flowing from the CrewAI agent on AgentCore to Grafana Cloud, you can use the pre-built dashboards from Grafana Cloud AI Observability. Navigate to Connections, search for AI Observability and click on it, go to GenAI Observability, scroll down, and install the dashboards.

Here’s a breakdown of what you can see in the dashboards:

- End-to-end latency: Total time from request to response
- LLM call details: Which model, how many tokens, latency, cost
- Agent workflow: Task creation, execution, and response formatting
- Error traces: If something fails, you’ll see the exact step and error message

Next steps

The combination of AWS Bedrock AgentCore, OpenTelemetry, and Grafana Cloud provides a production-ready stack for AI agents with enterprise-grade observability. Explore our Grafana Cloud AI Observability documentation to learn more.

Grafana Cloud is the easiest way to get started with metrics, logs, traces, dashboards, and more. We have a generous forever-free tier and plans for every use case. Sign up for free now!
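To act on the token counts those dashboards surface, you can turn them into a rough per-request dollar figure. This is a minimal sketch: the per-1K-token prices below are placeholders, not actual Bedrock pricing for meta.llama3-8b-instruct-v1:0, so substitute the current rates for your model and region.

```python
# Placeholder prices in USD per 1K tokens -- check current Bedrock pricing.
PRICE_PER_1K_INPUT = 0.0003
PRICE_PER_1K_OUTPUT = 0.0006

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost of one request from the token usage a trace reports."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

# A request that used 1,200 prompt tokens and 400 completion tokens:
print(f"${estimate_cost(1200, 400):.6f}")  # → $0.000600
```

Summing this over the request rate shown in the dashboards gives a quick daily spend estimate per model.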
This article originally appeared on the Grafana Labs blog.
