This is Part 2 of our series on building a production-grade AI development workflow. Read Part 1: Why Your AI Coding Setup Will Eventually Hurt You
Last week we talked about why most AI coding setups quietly fall apart — context limits, scattered tooling, no memory between sessions. This week we’re building the foundation that fixes all of it.
We’re starting with AWS Bedrock and Claude Code. Not because they’re trendy, but because this specific combination solves problems you’ll hit the moment you try to use AI on anything beyond a single-file script.
Why Bedrock Specifically
You can use Claude through Anthropic’s API directly. It works fine. But Bedrock gives you three things that matter enormously once you’re doing real work:
Prompt caching. This is the big one. Bedrock’s prompt caching can reduce costs by up to 90% and latency by up to 85%. If you’ve ever waited 15 seconds for Claude to re-read your entire codebase context on every single message, you understand why this matters. With caching, that context gets stored and reused. Your second, third, and fiftieth message in a session all reference the same cached context instead of re-processing it from scratch.
In practical terms: a session where you’re iterating on a complex feature — sending 30-40 messages against the same codebase context — goes from costing several dollars to costing cents. The latency improvement means you’re getting responses in 2-3 seconds instead of 10-15. That’s the difference between a flow state and a frustrating one.
Infrastructure you already manage. If your team is on AWS (and statistically, you probably are), Bedrock slots into your existing IAM roles, VPC configurations, CloudWatch logging, and billing. No separate vendor relationship. No separate API keys floating around. No separate billing reconciliation. It’s just another AWS service.
Model flexibility without rewiring. Bedrock gives you access to Claude Sonnet, Claude Haiku, and other models through the same interface. When you need a cheaper model for simple tasks (linting, formatting, boilerplate) and a powerful model for architecture decisions, you’re switching a parameter — not switching providers.
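To make the "switching a parameter" point concrete, here's what that looks like as a raw Converse API call from the CLI. This is a sketch, not something Claude Code requires you to do, and the model IDs are examples that vary by region and release (check Model access in your console):
# Same call shape, different model -- only the model ID changes
aws bedrock-runtime converse \
  --model-id us.anthropic.claude-3-5-haiku-20241022-v1:0 \
  --messages '[{"role": "user", "content": [{"text": "Write a commit message for a typo fix"}]}]'

aws bedrock-runtime converse \
  --model-id us.anthropic.claude-sonnet-4-20250514-v1:0 \
  --messages '[{"role": "user", "content": [{"text": "Evaluate this architecture decision"}]}]'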
The Cost and Context Trade-Offs That Actually Matter
Before we set anything up, let’s be honest about the math.
Without caching: Claude Sonnet on Bedrock costs roughly $3 per million input tokens and $15 per million output tokens. A heavy development session where you’re feeding in project context repeatedly can run $5-15/day per developer. That adds up fast across a team.
With caching: Those same sessions drop to roughly $0.50-2.00/day, because cache reads are billed at roughly a tenth of the standard input rate. Over a month, for a team of five engineers, you’re looking at the difference between a $1,500 bill and a $200 bill. That’s not optimization — that’s the difference between “we can afford to use this” and “we can’t.”
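Here’s the back-of-envelope behind those numbers, using the list prices above plus Bedrock’s cache pricing, where cache writes bill at roughly 1.25x the input rate and cache reads at roughly 0.1x (treat these multipliers as assumptions to verify against current pricing for your region):
# One session: 35 messages, each carrying ~150K tokens of project context
# Uncached: 35 x 150K = 5.25M input tokens x $3.00/M  = ~$15.75
# Cached:   1 cache write:   150K x $3.75/M           = ~$0.56
#           34 cache reads:  5.1M x $0.30/M           = ~$1.53
#                                               total = ~$2.09
# (Output tokens cost the same either way.)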
The context window trade-off is equally important. Claude’s 200K token context window is massive, but it’s not infinite. You need a strategy for what goes into that window and what doesn’t. We’ll cover that in depth in the CLAUDE.md and memory layer posts (Weeks 3 and 4), but the infrastructure decision matters now: Bedrock’s caching makes it economically viable to use large context windows aggressively, which means your AI actually understands enough of your codebase to give useful answers.
Setting It Up: Technical Steps
Here’s the actual setup. This assumes you have an AWS account and basic CLI familiarity.
Step 1: Enable Bedrock Model Access
Bedrock doesn’t give you model access by default. You need to request it.
- Open the AWS Console and navigate to Amazon Bedrock
- In the left sidebar, click Model access
- Click Manage model access
- Find and enable the Anthropic models you need:
- Claude Sonnet 4 (your primary workhorse — strong reasoning, good speed)
- Claude Haiku (fast and cheap — use for simple tasks, commit messages, quick lookups)
- Submit the request. Approval is usually instant for most regions, but can take up to a few minutes.
Region matters. Not all regions support all models or prompt caching. As of this writing, us-east-1 (N. Virginia) and us-west-2 (Oregon) have the broadest support. Pick one of these unless you have a specific compliance reason not to.
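Once access shows as granted, it’s worth confirming from the CLI exactly which Claude models your chosen region exposes:
# List the Anthropic model IDs available in your region
aws bedrock list-foundation-models \
  --by-provider anthropic \
  --region us-east-1 \
  --query 'modelSummaries[].modelId'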
Step 2: Set Up IAM Identity Center (SSO)
Don’t create long-lived IAM access keys. Don’t use your root credentials. Use IAM Identity Center with SSO — it’s what AWS recommends, and for good reason.
Long-lived access keys are a liability. They sit in dotfiles, get committed to repos accidentally, and never expire unless someone remembers to rotate them. SSO gives you temporary credentials that auto-expire, centralized access management, and an audit trail of who’s using what.
Enable IAM Identity Center (if you haven’t already):
- Open the AWS Console and navigate to IAM Identity Center
- Click Enable (this needs to be done in your management account if you’re using AWS Organizations)
- Choose your identity source — if you’re already using Google Workspace, Okta, Azure AD, or similar, connect it here. Otherwise the built-in directory works fine for smaller teams.
Create a permission set for Bedrock access:
# Create a permission set with scoped Bedrock access
# You can do this in the console under IAM Identity Center → Permission sets → Create
# The inline policy should look like this:
# Note: newer Claude models (Sonnet 4 and later) are invoked through
# cross-region inference profiles, so the policy allows both ARN forms.
cat > bedrock-permission-set-policy.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*:*:inference-profile/*.anthropic.claude-*"
      ]
    }
  ]
}
EOF
- In IAM Identity Center, go to Permission sets → Create permission set
- Choose Custom permission set
- Add the inline policy above
- Name it something clear like BedrockClaudeCodeAccess
- Assign it to the relevant users or groups for the AWS account where Bedrock is enabled
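If you’d rather script this than click through the console, the sso-admin CLI does the same thing. The ARNs below are placeholders for your Identity Center instance and the permission set it returns:
# Create the permission set, then attach the inline policy from above
aws sso-admin create-permission-set \
  --instance-arn <your-identity-center-instance-arn> \
  --name BedrockClaudeCodeAccess

aws sso-admin put-inline-policy-to-permission-set \
  --instance-arn <your-identity-center-instance-arn> \
  --permission-set-arn <permission-set-arn> \
  --inline-policy file://bedrock-permission-set-policy.json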
Configure the AWS CLI for SSO:
# Configure your SSO profile
aws configure sso
# Follow the prompts:
# SSO session name: baur-sso (or whatever makes sense)
# SSO start URL: https://your-org.awsapps.com/start
# SSO region: us-east-1
# SSO registration scopes: sso:account:access
# This creates a named profile. Log in:
aws sso login --profile your-profile-name
This gives you temporary credentials that refresh automatically. No keys to leak, no rotation to forget, and if someone leaves the team you revoke access in one place.
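A quick sanity check that the profile actually resolves to working temporary credentials:
# Should print an assumed-role identity, not an error
aws sts get-caller-identity --profile your-profile-name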
Why this matters beyond security: When you’re onboarding new developers (or new client engineers in our case), SSO means “log in with your company account” instead of “here’s a CSV with access keys, put them in your shell profile, and don’t lose them.” The former scales. The latter turns into a credential management nightmare by developer number five.
Step 3: Install and Configure Claude Code
# Install Claude Code
npm install -g @anthropic-ai/claude-code
# Configure it to use Bedrock with your SSO profile
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_PROFILE=your-profile-name
export AWS_REGION=us-east-1
For persistence, add these to your shell profile (~/.zshrc, ~/.bashrc, etc.) or use direnv to scope them per project:
# In your project root, create .envrc
echo 'export CLAUDE_CODE_USE_BEDROCK=1' >> .envrc
echo 'export AWS_PROFILE=your-profile-name' >> .envrc
echo 'export AWS_REGION=us-east-1' >> .envrc
direnv allow
When your SSO session expires, just run aws sso login --profile your-profile-name again. No keys to manage.
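One optional refinement: Claude Code picks a default Bedrock model, but you can pin one explicitly with the ANTHROPIC_MODEL environment variable. The ID below is illustrative; use whatever list-foundation-models showed for your region:
# Optional: pin the exact Bedrock model Claude Code uses
export ANTHROPIC_MODEL='us.anthropic.claude-sonnet-4-20250514-v1:0'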
Step 4: Verify the Connection
# Start Claude Code
claude
# In the Claude Code session, ask something simple:
> What model are you running on and through what provider?
Claude should confirm it’s running through Bedrock (treat the self-report as a smoke test; the real signal is that the request authenticates and returns at all). If you get authentication errors, make sure your SSO session is active (aws sso login --profile your-profile-name) and double-check your permission set and region settings.
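If that fails and you want to isolate whether the problem is Bedrock access or Claude Code itself, call Bedrock directly with the same credentials (model ID illustrative):
# A one-off request straight to Bedrock, bypassing Claude Code
aws bedrock-runtime converse \
  --model-id us.anthropic.claude-sonnet-4-20250514-v1:0 \
  --messages '[{"role": "user", "content": [{"text": "Say hello"}]}]' \
  --profile your-profile-name \
  --region us-east-1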
Step 5: Enable Prompt Caching
Here’s where it gets good. Prompt caching on Bedrock works automatically for qualifying requests — you don’t need to flip a switch. But you do need to structure your usage to benefit from it.
Caching kicks in when:
- You’re sending repeated context (system prompts, project files, documentation) across messages in a session
- The cached content meets the minimum token threshold (roughly 1,024 tokens for Sonnet-class models, 2,048 for Haiku-class models)
- You’re using the same model in the same region
What this means in practice with Claude Code: When you start a session and Claude Code loads your project context (your CLAUDE.md file, relevant source files, documentation), that context gets cached on the first message. Every subsequent message in that session reuses the cache. This is why longer, focused sessions are dramatically cheaper than many short ones.
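Claude Code handles cache breakpoints for you, but seeing the mechanism helps. In Bedrock’s Converse API, you mark the end of the cacheable prefix with an explicit cachePoint block. A minimal sketch of what happens under the hood (model ID illustrative):
# Stable context goes first, followed by a cache checkpoint; everything
# before the checkpoint is written to cache once and read cheaply after.
aws bedrock-runtime converse \
  --model-id us.anthropic.claude-sonnet-4-20250514-v1:0 \
  --system '[{"text": "<project context: CLAUDE.md, key source files>"}, {"cachePoint": {"type": "default"}}]' \
  --messages '[{"role": "user", "content": [{"text": "Refactor the auth module"}]}]'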
To monitor your cache hit rates:
# Enable CloudWatch metrics for Bedrock
# In the AWS Console: Bedrock → Settings → Enable model invocation logging
# Or via CLI:
# Note: the log group, IAM role, and S3 bucket referenced below must
# already exist; create them first or the call will fail.
aws bedrock put-model-invocation-logging-configuration \
  --logging-config '{
    "cloudWatchConfig": {
      "logGroupName": "/aws/bedrock/claude-code",
      "roleArn": "arn:aws:iam::<account-id>:role/BedrockLoggingRole",
      "largeDataDeliveryS3Config": {
        "bucketName": "your-bedrock-logs-bucket",
        "keyPrefix": "claude-code/"
      }
    }
  }'
This gives you visibility into which requests are hitting cache and which aren’t — critical for understanding your actual costs.
Step 6: Set Up Cost Alerts
This is non-negotiable. AI costs can spike unexpectedly, especially during heavy development sprints.
# Create a Bedrock-specific cost budget
aws budgets create-budget \
  --account-id <your-account-id> \
  --budget '{
    "BudgetName": "bedrock-claude-monthly",
    "BudgetLimit": {"Amount": "500", "Unit": "USD"},
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST",
    "CostFilters": {
      "Service": ["Amazon Bedrock"]
    }
  }' \
  --notifications-with-subscribers '[
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {"SubscriptionType": "EMAIL", "Address": "your-email@company.com"}
      ]
    }
  ]'
Set the threshold to whatever makes sense for your team. The point is that you find out about cost overruns from a budget alert — not from your monthly AWS bill.
What This Gets You
At the end of these steps, you have:
- Claude Code running through Bedrock with SSO and properly scoped permissions
- Prompt caching automatically reducing your costs and latency
- CloudWatch logging so you can see what’s happening
- Cost alerts so you don’t get surprised
This is the infrastructure layer. It’s not exciting by itself. But every single thing we build in the coming weeks — the CLAUDE.md configuration, the memory systems, the multi-agent workflows — all of it runs on top of this foundation.
If the foundation is slow, expensive, or poorly secured, everything above it inherits those problems. Get this right now, and the rest of the series is about building capabilities. Get it wrong, and the rest of the series is about fighting your own infrastructure.
Next Week
Week 3: CLAUDE.md — Teaching AI How Your Team Actually Works. The configuration file that turns a generic AI assistant into one that understands your architecture decisions, coding standards, and project context. This is where things start to get genuinely powerful.
At Baur Software, we combine AI tooling like this with experienced engineers to help teams ship faster without sacrificing quality. If you’re building out your AI development workflow and want to skip the trial-and-error phase, let’s talk.
