Building an AI Agent Workflow Builder: Notes on Artificial Intelligence
The initial wave of applied AI has been dominated by single-shot interactions: prompt in, response out. This is powerful but limiting. The compelling problems—automating complex business processes, conducting multi-step research, managing dynamic systems—require more than a simple request/response cycle. They require state, branching logic, and interaction with external tools. They require agents.
An AI agent is, at its core, a system that perceives its environment and takes actions to achieve goals. Building one-off agents with hardcoded logic is brittle. The technically interesting problem is creating a generalized system for *composing* agents. This moves the task from writing imperative code to defining a declarative workflow: a graph of possibilities that an AI can traverse. This is a problem of orchestration and system design, and a well-designed orchestration layer outlives any particular model it calls.
Architecting the Workflow: A Directed Graph
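The natural abstraction for a workflow is a directed graph, and the simplest useful form is a Directed Acyclic Graph (DAG). Each node represents an action (e.g., call an LLM, fetch data from an API, run a Python script), and each edge represents a transition. The "acyclic" constraint simplifies execution: traversal is guaranteed to terminate, and the nodes can be put in a valid order ahead of time. More advanced agents need cycles (loops), for example to repeat a step until a check passes, and that is a necessary complexity to add later.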
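One way to enforce the acyclic constraint when a workflow is saved is a topological sort. The following is a minimal sketch using Kahn's algorithm; the `GraphNode` and `GraphEdge` shapes and the function name are illustrative assumptions, not part of any library.

```typescript
// Hypothetical minimal shapes for illustration only.
interface GraphNode { id: string }
interface GraphEdge { source: string; target: string }

// Returns node ids in an order where every edge points forward,
// or null if the graph contains a cycle (Kahn's algorithm).
function topologicalOrder(nodes: GraphNode[], edges: GraphEdge[]): string[] | null {
  const inDegree = new Map<string, number>();
  const outgoing = new Map<string, string[]>();
  for (const n of nodes) {
    inDegree.set(n.id, 0);
    outgoing.set(n.id, []);
  }
  for (const e of edges) {
    const targets = outgoing.get(e.source);
    if (targets === undefined || !inDegree.has(e.target)) {
      throw new Error(`edge references unknown node: ${e.source} -> ${e.target}`);
    }
    targets.push(e.target);
    inDegree.set(e.target, (inDegree.get(e.target) ?? 0) + 1);
  }

  // Start with nodes that have no incoming edges.
  const ready = nodes.filter((n) => inDegree.get(n.id) === 0).map((n) => n.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const next of outgoing.get(id) ?? []) {
      const remaining = (inDegree.get(next) ?? 0) - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  // If some nodes were never released, they sit on a cycle.
  return order.length === nodes.length ? order : null;
}
```

A builder could run this check on save and reject the graph when the result is `null`, at least until the engine deliberately supports loops.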
Let's consider how to model and execute this with a practical stack: React/TypeScript for the front-end builder, Python for the execution engine, PostgreSQL for state, and AWS for hosting.
Data Modeling in PostgreSQL
A relational schema is perfectly capable of representing a graph. The key is to separate the definition of the workflow from its execution. This is critical for auditability and debugging.
- `workflows`: Stores the high-level definition (`id`, `name`, `description`).
- `nodes`: The core of the graph. Each row is a node (`id`, `workflow_id`, `type`, `config`). `type` is an enum like `'llm_call'`, `'api_request'`, `'conditional_branch'`, or `'human_in_the_loop'`. The `config` column, a `JSONB` type, holds type-specific settings like the prompt for an LLM or the URL for an API call.
- `edges`: Defines the connections. (`id`, `workflow_id`, `source_node_id`, `target_node_id`, `condition`). The `condition` allows for branching logic, e.g., "Proceed if the output of the source node contains 'SUCCESS'".
- `executions`: A log of every time a workflow is run (`id`, `workflow_id`, `status`, `inputs`, `final_outputs`, `created_at`).
- `execution_steps`: A granular, append-only log of each node's execution within a run (`id`, `execution_id`, `node_id`, `inputs`, `outputs`, `status`, `started_at`, `finished_at`). This table is the source of truth for debugging and provides the data for real-time UI updates.
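To make the `nodes` and `edges` tables concrete, here is a sketch of row types and a helper that evaluates edge conditions. The column types and the encoding of `condition` (a "contains" check, or `null` for an unconditional edge) are assumptions chosen for illustration; the schema above does not prescribe them.

```typescript
// Row shapes mirroring the columns listed above. Field types are assumptions.
type NodeType = 'llm_call' | 'api_request' | 'conditional_branch' | 'human_in_the_loop';

interface NodeRow {
  id: string;
  workflow_id: string;
  type: NodeType;
  config: Record<string, unknown>; // the JSONB column
}

// Hypothetical condition encoding: null means "always follow this edge".
type EdgeCondition = { outputContains: string } | null;

interface EdgeRow {
  id: string;
  workflow_id: string;
  source_node_id: string;
  target_node_id: string;
  condition: EdgeCondition;
}

// Returns the ids of the nodes to visit next, given the source node's output.
function nextNodeIds(edges: EdgeRow[], sourceNodeId: string, output: string): string[] {
  return edges
    .filter((e) => e.source_node_id === sourceNodeId)
    .filter((e) => e.condition === null || output.includes(e.condition.outputContains))
    .map((e) => e.target_node_id);
}
```

Storing the condition as structured data rather than free-form code keeps evaluation safe and easy to audit, at the cost of expressiveness.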
Using `JSONB` for node configuration is a pragmatic tradeoff. It sacrifices some relational purity for immense flexibility. You can introduce new node types with complex configurations without requiring a database migration. The cost is a reliance on application-level validation to ensure the JSON schema is correct for a given node type.
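The application-level validation mentioned above might look like the following sketch. The validator registry and function names are hypothetical, and only two node types are covered; a real system would more likely use a schema library, but the hand-written version shows the idea without dependencies.

```typescript
// A validator returns a list of problems; an empty list means the config is valid.
type ConfigValidator = (config: Record<string, unknown>) => string[];

// Hypothetical registry keyed by node type. Adding a node type means adding
// an entry here instead of running a database migration.
const validators = new Map<string, ConfigValidator>([
  ['llm_call', (c) => {
    const errors: string[] = [];
    if (typeof c['promptTemplate'] !== 'string') errors.push('promptTemplate must be a string');
    if (typeof c['temperature'] !== 'number') errors.push('temperature must be a number');
    return errors;
  }],
  ['api_request', (c) => {
    const errors: string[] = [];
    if (typeof c['url'] !== 'string') errors.push('url must be a string');
    if (c['method'] !== 'GET' && c['method'] !== 'POST') errors.push("method must be 'GET' or 'POST'");
    return errors;
  }],
]);

// Validates an untrusted JSONB value before it is written to the nodes table.
function validateNodeConfig(type: string, config: unknown): string[] {
  const validate = validators.get(type);
  if (validate === undefined) return [`unknown node type: ${type}`];
  if (typeof config !== 'object' || config === null || Array.isArray(config)) {
    return ['config must be a JSON object'];
  }
  return validate(config as Record<string, unknown>);
}
```

Running the same checks in the API layer and in the worker guards against rows that were written before a validator was tightened.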
The Builder UI: React and TypeScript
The user interface is a visual graph editor. A library like React Flow is an excellent starting point, providing the canvas, nodes, and edge-drawing mechanics. The real work is in the application logic built on top.
This is where TypeScript becomes invaluable. We can define strict types for each node's configuration object:
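```typescript
// Settings for an 'llm_call' node.
interface LLMCallNodeConfig {
  model: 'gpt-4o' | 'claude-3-opus';
  promptTemplate: string;
  temperature: number;
}

// Settings for an 'api_request' node.
interface ApiRequestNodeConfig {
  url: string;
  method: 'GET' | 'POST';
  headers: Record<string, string>;
}

// Union of every supported configuration shape.
type NodeConfig = LLMCallNodeConfig | ApiRequestNodeConfig;
```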
These types ensure that the configuration panel for a selected node only shows relevant fields and that the data sent to the backend API matches what the execution engine expects. This prevents a whole class of runtime errors.
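For the compiler to pick the right configuration shape, the union needs a discriminant. A minimal sketch, assuming each config is paired with the node's `type` value; the `TypedNode` wrapper and `describeNode` function are hypothetical names, and the two interfaces are repeated so the block stands alone.

```typescript
interface LLMCallNodeConfig {
  model: 'gpt-4o' | 'claude-3-opus';
  promptTemplate: string;
  temperature: number;
}

interface ApiRequestNodeConfig {
  url: string;
  method: 'GET' | 'POST';
  headers: Record<string, string>;
}

// Hypothetical wrapper: the `type` tag lets the compiler narrow `config`.
type TypedNode =
  | { type: 'llm_call'; config: LLMCallNodeConfig }
  | { type: 'api_request'; config: ApiRequestNodeConfig };

// Produces a short label for the selected node, e.g. for a panel header.
function describeNode(node: TypedNode): string {
  switch (node.type) {
    case 'llm_call':
      // Here node.config is known to be LLMCallNodeConfig.
      return `LLM ${node.config.model} (temperature ${node.config.temperature})`;
    case 'api_request':
      // Here node.config is known to be ApiRequestNodeConfig.
      return `${node.config.method} ${node.config.url}`;
    default: {
      // Compile-time exhaustiveness check: adding a new variant to TypedNode
      // without handling it above makes this assignment a type error.
      const unreachable: never = node;
      throw new Error(`unhandled node: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Note that these types are erased at compile time, so data arriving from the network still needs a runtime check before it is trusted.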
Execution Engine: Python and AWS
A Python service acts as the workflow executor. When a run is triggered, it receives a `workflow_id` and initial inputs. It then fetches the graph structure from PostgreSQL and begins traversing it, starting from the designated entry node.
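The traversal loop can be sketched as follows. The engine described here is written in Python; this sketch uses TypeScript only to stay consistent with the types shown earlier. It works on in-memory data, follows at most one outgoing edge per node, and halts on the first failure. All type and function names are assumptions for illustration.

```typescript
// Hypothetical in-memory shapes; a real engine would load these from PostgreSQL.
interface WfNode { id: string; type: string }
interface WfEdge { source: string; target: string }
interface StepRecord {
  nodeId: string;
  input: unknown;
  output: unknown;
  status: 'succeeded' | 'failed';
}
type Handler = (input: unknown) => Promise<unknown>;
interface RunResult { status: 'succeeded' | 'failed'; steps: StepRecord[] }

async function runWorkflow(
  entryId: string,
  nodes: WfNode[],
  edges: WfEdge[],
  handlers: Map<string, Handler>,
  initialInput: unknown,
): Promise<RunResult> {
  const byId = new Map<string, WfNode>();
  for (const n of nodes) byId.set(n.id, n);
  // Simplification: remember only the first outgoing edge of each node.
  const nextOf = new Map<string, string>();
  for (const e of edges) {
    if (!nextOf.has(e.source)) nextOf.set(e.source, e.target);
  }

  const steps: StepRecord[] = [];
  const visited = new Set<string>();
  let currentId: string | undefined = entryId;
  let input: unknown = initialInput;

  while (currentId !== undefined) {
    const node: WfNode | undefined = byId.get(currentId);
    const handler: Handler | undefined =
      node === undefined ? undefined : handlers.get(node.type);
    // Fail on unknown nodes, unknown node types, or a revisit (cycle).
    if (node === undefined || handler === undefined || visited.has(currentId)) {
      steps.push({ nodeId: currentId, input, output: null, status: 'failed' });
      return { status: 'failed', steps };
    }
    visited.add(currentId);
    try {
      const output: unknown = await handler(input);
      steps.push({ nodeId: node.id, input, output, status: 'succeeded' });
      input = output; // each node's output feeds the next node
    } catch (err) {
      steps.push({ nodeId: node.id, input, output: String(err), status: 'failed' });
      return { status: 'failed', steps };
    }
    currentId = nextOf.get(node.id);
  }
  return { status: 'succeeded', steps };
}
```

The `steps` array plays the role of the `execution_steps` table: one record per node visit, written whether the node succeeded or failed.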
For simple, short-lived workflows, a single synchronous process might suffice. But for anything non-trivial, especially tasks involving external API calls or potential delays, a task queue is essential. Using something like Celery with RabbitMQ or Redis allows the execution to happen asynchronously. The web server can immediately return a `202 Accepted` with an `execution_id`, and the client can subscribe to updates.
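The accept-then-process pattern can be illustrated with an in-memory stand-in for the queue. This sketch is not Celery and does not talk to a broker; the `ExecutionQueue` class, its method names, and the `exec-N` id format are all assumptions made to show the control flow.

```typescript
interface Job { executionId: string; workflowId: string; inputs: unknown }
interface AcceptedResponse { status: 202; body: { execution_id: string } }
type JobStatus = 'queued' | 'running' | 'succeeded' | 'failed';

// Hypothetical in-memory queue standing in for a task queue plus broker.
class ExecutionQueue {
  private pending: Job[] = [];
  private counter = 0;
  readonly statuses = new Map<string, JobStatus>();

  // Called by the web handler: enqueue the job and return immediately.
  submit(workflowId: string, inputs: unknown): AcceptedResponse {
    this.counter += 1;
    const executionId = `exec-${this.counter}`;
    this.pending.push({ executionId, workflowId, inputs });
    this.statuses.set(executionId, 'queued');
    return { status: 202, body: { execution_id: executionId } };
  }

  // Called by a worker: take one job and run it to completion.
  // Resolves to false when there was nothing to do.
  async workOnce(run: (job: Job) => Promise<void>): Promise<boolean> {
    const job = this.pending.shift();
    if (job === undefined) return false;
    this.statuses.set(job.executionId, 'running');
    try {
      await run(job);
      this.statuses.set(job.executionId, 'succeeded');
    } catch {
      this.statuses.set(job.executionId, 'failed');
    }
    return true;
  }
}
```

The key property is that `submit` never waits on the workflow itself, so the HTTP request finishes quickly no matter how long the run takes.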
At scale, this architecture involves a fleet of workers running on AWS ECS or EC2. The database becomes the central point of coordination and state. A key scaling challenge becomes managing contention on the `execution_steps` table, which will see heavy write traffic. This is a classic pattern: the system that works for 100 executions a day may need rethinking for 100,000. At that point, you might consider offloading logs to a dedicated system or using database partitioning.
Pragmatism, Correctness, and the Human in the Loop
A senior engineer's most important contribution is often deciding what *not* to build. It's tempting to design a system that can handle every conceivable edge case from day one. This is a trap. Start with a simple DAG traversal engine. Handle errors by halting the execution and flagging it for review. Add retries and dead-letter queues only when you have evidence of transient failures.
The most critical feature for building trust in an AI system is observability, followed closely by human oversight. The `execution_steps` table is the foundation for this. For the UI, we need a way to visualize the execution path in real time. As the Python worker completes each node, it writes to the log table. An insert trigger on that table (or the worker itself) can issue a PostgreSQL `NOTIFY`, which a backend service receives via `LISTEN` and forwards to the connected client via Server-Sent Events (SSE) or WebSockets. Because `NOTIFY` payloads are size-limited (just under 8,000 bytes by default), send identifiers and let the listener fetch the full row. SSE is often the simpler and better fit here, as it's a one-way stream from server to client, which matches the flow of execution updates.
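On the wire, an SSE message is a group of `field: value` lines terminated by a blank line. A minimal sketch of serialising one step update follows; the `StepUpdate` shape and the `step` event name are assumptions.

```typescript
// Hypothetical payload for one execution_steps update.
interface StepUpdate {
  executionId: string;
  nodeId: string;
  status: string;
}

// Serialises one update as a Server-Sent Events message.
// JSON.stringify without an indent argument emits a single line,
// so the payload fits in one "data:" field.
function toSseMessage(eventId: number, update: StepUpdate): string {
  return [
    `id: ${eventId}`,
    'event: step',
    `data: ${JSON.stringify(update)}`,
    '', // the blank line ends the message
    '',
  ].join('\n');
}
```

In the browser, an `EventSource` would receive these through `addEventListener('step', ...)`, and the `id` field lets it resume after a reconnect via the `Last-Event-ID` header.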
Furthermore, we must design for intervention. A `human_in_the_loop` node type is not an edge case; it's a core requirement for any process involving sensitive actions or high-stakes decisions. When the execution engine encounters this node, it pauses the workflow, sets its status to `'awaiting_approval'`, and sends a notification (e.g., an email or Slack message). The workflow only resumes when a user provides approval via an authenticated API call, which updates the execution status and allows the worker to proceed.
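The pause-and-resume behaviour amounts to a small state machine. The sketch below models it with pure functions; the `Execution` shape, the function names, and the choice to mark a rejected approval as `'failed'` are assumptions, since the text above only specifies the `'awaiting_approval'` state.

```typescript
type ExecutionStatus = 'running' | 'awaiting_approval' | 'succeeded' | 'failed';

interface Execution {
  id: string;
  status: ExecutionStatus;
  pausedAtNodeId: string | null;
}

// Called by the engine when it reaches a human_in_the_loop node.
function pauseForApproval(exec: Execution, nodeId: string): Execution {
  if (exec.status !== 'running') {
    throw new Error(`cannot pause execution in status ${exec.status}`);
  }
  return { ...exec, status: 'awaiting_approval', pausedAtNodeId: nodeId };
}

// Called by the authenticated approval endpoint.
// Assumption: a rejection ends the run as 'failed' and keeps the node id for auditing.
function resolveApproval(exec: Execution, approved: boolean): Execution {
  if (exec.status !== 'awaiting_approval') {
    throw new Error(`execution ${exec.id} is not awaiting approval`);
  }
  return approved
    ? { ...exec, status: 'running', pausedAtNodeId: null }
    : { ...exec, status: 'failed' };
}
```

Rejecting transitions from any other state means a duplicate or late approval request cannot restart a run that has already finished.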
A Closing Reflection
Building tools to orchestrate AI agents feels like a fundamental shift. We are moving from "prompt engineering" to "flow engineering." The value is not just in the output of a single LLM call but in the reliable, repeatable, and debuggable execution of a complex graph of operations. The challenge is not just to build powerful systems, but to build intelligible ones. The graph visualization, the detailed execution logs, and the explicit points for human intervention are not just features—they are the foundation for building systems that we can actually trust.