Hey everyone, Leo here from AGNTDEV.com. Hope you’re all having a productive week. As you know, I’ve been deep in the trenches with agent development, specifically focused on getting these things to actually *do* stuff, not just sound smart in a demo. Today, I want to talk about something that’s been on my mind for a while, and it’s something I think many of us building agents are struggling with: the “last mile” problem in agent actions. Or, as I like to call it, getting your agent to actually build a damn thing.
We’ve all seen the impressive demos. Agents planning intricate trips, summarizing documents, writing marketing copy. They’re amazing at information processing and decision-making. But when it comes to interacting with the real world, beyond making an API call to fetch data, things get… fuzzy. How do you get an agent to, say, provision a new server, deploy a microservice, or even just build a simple static website? That’s where the “build” part of agent development really gets interesting, and often, frustrating.
The Illusion of Agent Builders
For a long time, I was caught up in the hype. The idea that an agent could just “build a website” was tantalizing. Give it a prompt, and poof, a fully functional site appears. The reality, as many of you have probably discovered, is far more complex. Most agents, even the most sophisticated ones, are excellent at *generating instructions* or *writing code snippets*. They can tell you exactly what command to run, or even write a full Python script to achieve a task. But the actual execution, the environment setup, the dependency management, the error handling – that often falls back to us, the human developers.
My first attempt at having an agent build something substantial was about six months ago. I wanted a simple CRUD application for tracking my blog post ideas. My agent, let’s call him “Architect,” was brilliant at designing the database schema, even recommending a frontend framework. It generated all the necessary SQL migrations and a skeleton React component. But when it came to actually spinning up a local database, installing Node.js, running `npm install`, and then actually serving the application? Architect just gave me a list of commands. And guess what? Half of them failed because my environment wasn’t set up exactly as it expected. It was like having a brilliant architect design a skyscraper, but then hand you a toolbox and say, “Good luck building it yourself!”
Why Building Is Hard for Agents (Right Now)
The core issue, as I see it, boils down to a few things:
- Environment Context: Agents lack a persistent, dynamic understanding of their execution environment. Is Node.js installed? What version? Are the correct environment variables set? Do I have permissions to write to this directory?
- State Management: Building is an inherently stateful process. Each step depends on the success of the previous one. An agent might propose a series of commands, but it often doesn’t have the internal mechanism to track the outcome of each command and adapt accordingly.
- Error Handling & Recovery: Real-world builds fail. Network issues, missing dependencies, syntax errors. Agents are generally poor at diagnosing the root cause of an error in a build process and then intelligently attempting recovery.
- Tooling & Abstraction: We humans use a vast array of tools and abstractions (Docker, Kubernetes, CI/CD pipelines, package managers) to simplify building. Agents often interact at a lower level, executing individual commands, which increases the surface area for failure.
This isn’t to say agents are useless for building. Far from it. They’re incredible accelerators. But we need to shift our perspective from “agent builds X” to “agent *assists in building* X, within a carefully constructed framework.”
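To make the environment-context and state-management gaps concrete, here’s a minimal sketch of the bookkeeping an agent usually lacks: check that the required executables exist, run steps in order, and record exactly where things stopped. The names `BuildState`, `missing_tools`, and `run_steps` are hypothetical and not from any agent framework.

```python
import shutil
import subprocess
import sys
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BuildState:
    """Records what has run so far, so the caller (or agent) can adapt."""
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    error_output: str = ""


def missing_tools(required):
    """Return the required executables that are not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]


def run_steps(steps):
    """Run (name, argv) steps in order and stop at the first failure."""
    state = BuildState()
    for name, argv in steps:
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
        except OSError as exc:  # e.g. the executable does not exist
            state.failed_step = name
            state.error_output = str(exc)
            break
        if result.returncode != 0:
            state.failed_step = name
            state.error_output = result.stderr.strip()
            break
        state.completed.append(name)
    return state
```

Calling `missing_tools(["node", "npm"])` before planning anything, and handing the returned `BuildState` back to the agent after a failure, gives it something concrete to reason about instead of a wall of terminal output.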
My Approach: Constraining the Build Environment
After my initial frustrations, I started thinking about how I could bridge this gap. My solution has been to constrain the agent’s build environment as much as possible, giving it a powerful but limited sandbox to play in. This often means leveraging existing, mature build tools that are designed for robustness and reproducibility.
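As a rough illustration of what “powerful but limited” can mean in practice, the sketch below builds a `docker run` command line that confines a build to one mounted project directory and, by default, cuts off network access. The function name `sandboxed_docker_argv` and the `/workspace` mount point are my own placeholders; the Docker flags themselves (`--rm`, `--volume`, `--workdir`, `--network none`) are standard.

```python
from pathlib import Path


def sandboxed_docker_argv(image, project_dir, command, allow_network=False):
    """Build a `docker run` argv that confines a build to one mounted directory."""
    project = Path(project_dir).resolve()
    argv = [
        "docker", "run",
        "--rm",                                # remove the container when it exits
        "--volume", f"{project}:/workspace",   # the only host path the build can see
        "--workdir", "/workspace",
    ]
    if not allow_network:
        argv += ["--network", "none"]          # no network unless explicitly allowed
    argv.append(image)
    argv.extend(command)
    return argv
```

The function only constructs the argument list; passing it to `subprocess.run` is left to whatever tool wrapper the agent uses. For example, `sandboxed_docker_argv("node:20-alpine", ".", ["npm", "run", "build"])` would describe an offline build inside an illustrative Node image.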
Here’s a practical example. Let’s say I want an agent to build and deploy a simple static website to an S3 bucket. Instead of having the agent figure out the right AWS CLI commands, install dependencies, and manage local files, I containerize the build and wrap the rest in a single script.
Example 1: Static Site Deployment with Docker and an Agent
The agent’s job here isn’t to *build* the Docker image or *write* the Dockerfile from scratch (though it could help refine it). Its job is to *orchestrate* the build and deployment process using pre-defined tools.
First, I set up a project structure like this:
my-static-site/
├── src/
│ └── index.html
├── Dockerfile
├── deploy.sh
└── agent_instructions.md
My `Dockerfile` is simple, using Nginx to serve the site:
# Dockerfile
FROM nginx:alpine
COPY src /usr/share/nginx/html
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
The `deploy.sh` script is where the real work happens. This script is designed to be idempotent and handle the actual AWS CLI commands. The agent’s role is to *invoke* this script with the correct parameters.
#!/bin/bash
# deploy.sh (simplified for brevity)
# Assumes static website hosting is already enabled on the bucket and that
# its Block Public Access settings permit a public bucket policy.
set -euo pipefail  # stop at the first failed command or unset variable
SITE_BUCKET="${1:-}"
AWS_REGION="${2:-}"
DOCKER_IMAGE_NAME="my-static-site"
DOCKER_IMAGE_TAG="latest"
if [ -z "$SITE_BUCKET" ] || [ -z "$AWS_REGION" ]; then
  echo "Usage: $0 <s3-bucket-name> <aws-region>"
  exit 1
fi
echo "Building Docker image..."
docker build -t "$DOCKER_IMAGE_NAME:$DOCKER_IMAGE_TAG" .
echo "Creating temporary container to copy files..."
# A leftover dist/ from a failed run would make docker cp nest the files in dist/html
rm -rf ./dist
CONTAINER_ID=$(docker create "$DOCKER_IMAGE_NAME:$DOCKER_IMAGE_TAG")
docker cp "$CONTAINER_ID":/usr/share/nginx/html ./dist
docker rm "$CONTAINER_ID"
echo "Syncing 'dist' directory to S3 bucket: s3://$SITE_BUCKET in region $AWS_REGION"
aws s3 sync ./dist "s3://$SITE_BUCKET" --region "$AWS_REGION" --delete
echo "Setting public read permissions for the bucket content..."
aws s3api put-bucket-policy --bucket "$SITE_BUCKET" --region "$AWS_REGION" --policy '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::'"$SITE_BUCKET"'/*"
    }
  ]
}'
echo "Deployment complete. Site should be accessible at http://$SITE_BUCKET.s3-website-$AWS_REGION.amazonaws.com"
rm -rf ./dist
Now, the agent’s task, described in `agent_instructions.md`, might look something like this:
# agent_instructions.md
## Task: Deploy Static Website
You are an agent responsible for deploying a static website.
The website source is located in the `src/` directory.
The deployment process involves building a Docker image, extracting the static files, and syncing them to an S3 bucket.
### Available Tools:
- `docker`: For building and managing Docker containers.
- `deploy.sh`: A shell script that handles the Docker build, the S3 synchronization, and the bucket policy setup.
### Required Parameters:
- `S3_BUCKET_NAME`: The name of the S3 bucket where the website will be deployed.
- `AWS_REGION`: The AWS region where the S3 bucket is located.
### Steps:
1. Ensure the `Dockerfile` and `deploy.sh` script are present in the current directory.
2. Execute the `deploy.sh` script with the provided `S3_BUCKET_NAME` and `AWS_REGION` as arguments.
Example: `./deploy.sh my-awesome-website-bucket us-east-1`
3. Report the success or failure of the deployment, and provide the S3 website URL if successful.
### My action:
The user wants to deploy the static site to an S3 bucket named "leo-agntdev-site-2026" in "us-west-2".
I should execute the `./deploy.sh` script with these parameters.
The agent then, using its internal tool-calling mechanism (e.g., via a `shell_exec` tool), simply runs: `./deploy.sh leo-agntdev-site-2026 us-west-2`.
This approach gives the agent a clear, defined boundary. It doesn’t need to know the intricacies of `aws s3 sync` or `aws s3api`. It just needs to know *how to call the pre-built tool* (`deploy.sh`) with the right parameters. The `deploy.sh` script itself handles the low-level details, validates its arguments, and is safe to re-run because `aws s3 sync` and `put-bucket-policy` converge on the same end state.
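The boundary is only as good as the tool that enforces it. Here’s a minimal sketch of what a `shell_exec` tool could look like if it only runs allowlisted scripts and rejects suspicious arguments. The `ALLOWED_SCRIPTS` table and the argument pattern are assumptions of mine (the pattern is loosely modeled on bucket and region names), not part of any framework.

```python
import re
import subprocess

# Hypothetical allowlist: script path -> pattern every argument must fully match.
ALLOWED_SCRIPTS = {
    "./deploy.sh": re.compile(r"[a-z0-9][a-z0-9.-]*"),
}


def validate_call(script, args):
    """Return the argv to run, or raise ValueError if the call is not permitted."""
    pattern = ALLOWED_SCRIPTS.get(script)
    if pattern is None:
        raise ValueError(f"script not allowed: {script}")
    for arg in args:
        if not pattern.fullmatch(arg):
            raise ValueError(f"argument rejected: {arg!r}")
    return [script, *args]


def shell_exec(script, args, timeout=600):
    """Run an allowlisted script without a shell; return (exit_code, output)."""
    argv = validate_call(script, args)
    # Passing a list (no shell=True) means arguments are never shell-interpreted.
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    return result.returncode, result.stdout + result.stderr
```

With this in place, `shell_exec("./deploy.sh", ["leo-agntdev-site-2026", "us-west-2"])` goes through, while a call to any other script, or an argument containing spaces or shell metacharacters, is refused before anything runs.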
Beyond Shell Scripts: Leveraging SDKs and Frameworks
While shell scripts are great for simple tasks, for more complex builds, we can give agents access to SDKs and frameworks. This means equipping the agent with a Python environment, for instance, and then providing it with a set of pre-written Python functions that encapsulate complex operations.
Example 2: Provisioning a Cloud Resource with a Python SDK
Let’s say I want an agent to provision a new EC2 instance. Instead of having it directly run `aws ec2 run-instances` (which has a gazillion parameters), I give it a Python script with a function that takes a few high-level parameters.
Here’s a simplified Python script (`aws_provisioner.py`):
# aws_provisioner.py
import time  # only used by the commented-out example at the bottom
from typing import Optional
import boto3
def provision_ec2_instance(instance_type: str, ami_id: str, key_name: str, region: str, tags: Optional[dict] = None) -> Optional[str]:
    """
    Provisions a new EC2 instance with basic configuration.
    Returns the instance ID if successful, or None on failure.
    """
    ec2 = boto3.client('ec2', region_name=region)
    params = {
        'ImageId': ami_id,
        'MinCount': 1,
        'MaxCount': 1,
        'InstanceType': instance_type,
        'KeyName': key_name,
    }
    # Only send TagSpecifications when tags exist; EC2 rejects a tag specification with no tags
    if tags:
        params['TagSpecifications'] = [
            {
                'ResourceType': 'instance',
                'Tags': [{'Key': k, 'Value': v} for k, v in tags.items()]
            },
        ]
    try:
        print(f"Attempting to provision EC2 instance in {region}...")
        response = ec2.run_instances(**params)
        instance_id = response['Instances'][0]['InstanceId']
        print(f"Instance {instance_id} launched. Waiting for it to enter 'running' state...")
        # Wait for the instance to be running
        waiter = ec2.get_waiter('instance_running')
        waiter.wait(InstanceIds=[instance_id])
        print(f"Instance {instance_id} is now running.")
        return instance_id
    except Exception as e:
        print(f"Error provisioning EC2 instance: {e}")
        return None
def terminate_ec2_instance(instance_id: str, region: str) -> bool:
    """
    Terminates a given EC2 instance.
    Returns True if the termination request was accepted.
    """
    ec2 = boto3.client('ec2', region_name=region)
    try:
        print(f"Attempting to terminate EC2 instance {instance_id} in {region}...")
        ec2.terminate_instances(InstanceIds=[instance_id])
        print(f"Instance {instance_id} termination initiated.")
        return True
    except Exception as e:
        print(f"Error terminating EC2 instance {instance_id}: {e}")
        return False
if __name__ == '__main__':
    # Example usage for testing purposes
    # This part can be removed or protected from agent execution
    # instance_id = provision_ec2_instance("t2.micro", "ami-0abcdef1234567890", "my-key-pair", "us-east-1", {"Project": "AgentTest"})
    # if instance_id:
    #     print(f"Successfully provisioned instance: {instance_id}")
    #     # time.sleep(60)  # Simulate some work
    #     # terminate_ec2_instance(instance_id, "us-east-1")
    pass
The agent is then given a tool definition that points to this script. For example, in a LangChain-like setup, you’d wrap `provision_ec2_instance` and `terminate_ec2_instance` as tools (in LangChain itself, `StructuredTool.from_function` or the `@tool` decorator does this). The agent’s prompt can then instruct it to use these tools:
# Agent Prompt Snippet
## Tools Available:
- `provision_ec2_instance(instance_type: str, ami_id: str, key_name: str, region: str, tags: dict)`: Provisions an EC2 instance.
- `terminate_ec2_instance(instance_id: str, region: str)`: Terminates an EC2 instance.
## Task:
A user wants to quickly spin up a 't2.micro' instance in 'us-west-2' using AMI 'ami-0abcdef1234567890' with key pair 'my-dev-key'. The instance should be tagged with 'Purpose: Testing'.
I need to use the `provision_ec2_instance` tool for this.
The agent would then generate a call like: `provision_ec2_instance(instance_type="t2.micro", ami_id="ami-0abcdef1234567890", key_name="my-dev-key", region="us-west-2", tags={"Purpose": "Testing"})`.
This is a powerful pattern. The agent doesn’t need to write the `boto3` code itself. It just needs to understand the function signature and intent. This dramatically reduces the complexity for the agent and increases the reliability of the build process. We’re essentially giving our agents well-defined, robust “building blocks” instead of expecting them to craft every brick from raw materials.
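If you’re not using a framework, the same idea can be sketched with the standard library alone: register functions as tools, then check an agent-generated call against the function signature before running it. Everything here is illustrative. The `tool` decorator, the `dispatch` function, and the JSON call shape (`name` plus `arguments`) are assumptions, and the registered function is a stub that makes no AWS calls.

```python
import inspect
import json

TOOLS = {}


def tool(func):
    """Register a function so the dispatcher can find it by name."""
    TOOLS[func.__name__] = func
    return func


@tool
def provision_ec2_instance(instance_type, ami_id, key_name, region, tags=None):
    """Stub standing in for the boto3-backed function; it makes no AWS calls."""
    return f"stub:{instance_type}:{region}"


def dispatch(call_json):
    """Validate an agent-generated call against the tool's signature, then run it."""
    call = json.loads(call_json)
    name = call.get("name")
    func = TOOLS.get(name)
    if func is None:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        # bind() raises TypeError on missing or unexpected arguments
        bound = inspect.signature(func).bind(**call.get("arguments", {}))
    except TypeError as exc:
        return {"ok": False, "error": f"bad arguments: {exc}"}
    return {"ok": True, "result": func(*bound.args, **bound.kwargs)}
```

Returning a structured error instead of raising lets the agent see why a call was rejected and try again with corrected arguments.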
Actionable Takeaways for Your Agent Builds
- Define Clear Boundaries: Don’t expect your agent to be a full-stack DevOps engineer, especially not initially. Clearly define what parts of the build process the agent is responsible for.
- Pre-Build Your Tools: For complex or sensitive operations, write robust, idempotent scripts or functions (in Bash, Python, Go, etc.) that handle the low-level details. Agents are fantastic at orchestrating these tools.
- Containerize Environments: Use Docker or similar technologies to provide agents with consistent, isolated build environments. This solves a huge chunk of the environment context problem.
- Abstract Complexity with SDKs: For programmatic interactions (like cloud APIs), encapsulate complex API calls into simpler, higher-level functions that your agent can easily call via an SDK.
- Embrace Idempotence: Design your build tools to be idempotent. This means running them multiple times with the same input should produce the same result without unintended side effects. This makes agent recovery and re-attempts much safer.
- Focus on Orchestration: Agents excel at planning and sequencing. Frame your build tasks as a series of steps that can be executed by well-defined tools. The agent’s intelligence then lies in determining the correct sequence and parameters for these tools.
- Start Small and Iterate: Don’t try to build a fully autonomous CI/CD agent on day one. Start with a single, well-defined build task and gradually expand its capabilities.
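The idempotence point is the one I’d sketch first, because it’s what makes agent retries safe. Below is a minimal “check, then apply, then check again” wrapper. The name `ensure` and its return values are hypothetical; the pattern itself is the same one configuration-management tools rely on.

```python
import time


def ensure(check, apply, retries=3, delay_seconds=0.0):
    """Apply a change only if it is needed, retrying until the check passes."""
    if check():
        return "already-done"          # nothing to do, so re-running is harmless
    last_error = None
    for _ in range(retries):
        try:
            apply()
        except Exception as exc:       # remember the failure and try again
            last_error = exc
        if check():
            return "applied"
        time.sleep(delay_seconds)
    raise RuntimeError(f"step did not converge after {retries} attempts") from last_error
```

An agent that calls `ensure` twice with the same inputs gets `"applied"` the first time and `"already-done"` the second, which is exactly the property that lets it re-attempt a failed build without fear of doubling up side effects.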
The dream of agents building complex systems from scratch is still a ways off, but the reality of agents *assisting* in builds, making our lives easier by automating tedious orchestration, is here today. By providing our agents with the right tools and a constrained, predictable environment, we can unlock immense productivity gains and finally get these intelligent systems to build some damn useful things.
What are your experiences with agents building things? Hit me up in the comments or on X (agntdev) – always keen to hear what you’re working on!