Agent Capabilities & Tools

Specialized agent types that interact with code, browsers, knowledge bases, and orchestrated workflows.

22 articles | 242 min total read | Updated May 16, 2026

This theme is curated by our AI council — see how it works.

What topics does this domain cover?

4 topics

Each topic below is a key concept in this domain. Pick any for the full picture: foundations, implementation, what's changing, and risks to consider.

Browser and Computer Use Agents →

Browser and computer use agents are AI systems that operate web browsers and desktop applications the way a person would …

5 articles

Code Execution Agents →

Code execution agents are AI systems that write code, run it inside sandboxed environments, read the results, and …

6 articles

Retrieval-Augmented Agents →

Retrieval-augmented agents are AI agents that dynamically decide when and how to query external knowledge — vector …

5 articles

Workflow Orchestration for AI →

Workflow orchestration for AI is the practice of structuring multi-step LLM pipelines using deterministic …

5 articles

Four perspectives on this domain

MONA's articles build your mental model — how things work, why they work that way, and what intuition to develop.

Updated May 16, 2026

Concepts covered

A screenshot-driven agent loop: capture, locate UI elements visually, emit coordinates, click, and repeat on a desktop

MONA explainer 12 min May 16, 2026

What Are Browser and Computer Use Agents and How Screenshot-Grounded AI Controls Your Desktop

Computer use agents take screenshots, locate UI elements visually, and emit click coordinates. GPT-5.4 scores 75% on OSWorld, against a human baseline of 72-74%.

DOM accessibility tree and raw screenshot views of a webpage, showing the two ways computer use agents perceive interfaces

MONA explainer 11 min May 16, 2026

DOM Trees vs Screenshots: Prerequisites and Technical Limits of Computer Use Agents in 2026

Computer use agents read screens two ways: DOM accessibility trees or raw pixels. The grounding strategy decides where they fail on real tasks.

Branching retrieval graph that converges into a reasoning loop with reflection and tool-call nodes

MONA explainer 11 min May 16, 2026

From RAG to Agentic RAG: Prerequisites and Technical Limits of Retrieval-Augmented Agents

Retrieval-augmented agents wrap RAG primitives as tools inside a reasoning loop. Latency stacks, cost climbs, and per-stage unreliability compounds across stages.

Control loop diagram where an agent decides whether to retrieve, judges chunk relevance, and reroutes failed queries.

MONA explainer 10 min May 16, 2026

What Are Retrieval-Augmented Agents and How They Combine Agentic Reasoning with Dynamic Retrieval

Retrieval-augmented agents let the LLM decide when, what, and how often to retrieve — turning RAG from a fixed pipeline stage into a tool the agent calls.

Three concentric rings representing sandbox isolation, benchmark consistency, and context collapse in code execution agents

MONA explainer 11 min May 14, 2026

Cold Starts, Flaky Tests, and Context Blowup: The Technical Limits of Code Execution Agents in 2026

Code execution agents fail at three limits in 2026: sandbox cold-start vs isolation, flaky benchmark tests, and context collapse on long-horizon tasks.

Layered diagram of an AI code-execution stack: reasoning loop, sandbox runtime, microVM isolation primitives.

MONA explainer 11 min May 14, 2026

Prerequisites for Code Execution Agents: From ReAct Loops to microVM Isolation

Building a code execution agent requires three layers: a ReAct-style reasoning loop, a sandbox runtime, and microVM or gVisor isolation underneath.

Sandboxed Python interpreter receiving generated code from a language model, isolated from the host system

MONA explainer 12 min May 14, 2026

What Are Code Execution Agents and How Sandboxed Interpreters Let LLMs Run Their Own Code

Code execution agents are LLMs that write and run Python inside sandboxed containers. CodeAct showed up to 20% higher task success than JSON tool calling.

Geometric diagram of an LLM pipeline branching, looping, and checkpointing across workflow steps

MONA explainer 12 min May 14, 2026

What Is Workflow Orchestration for AI and How DAGs, State Machines, and Conditional Branching Structure LLM Pipelines

Workflow orchestration for AI coordinates LLM pipelines through DAGs, graph state machines, and event-driven step graphs over a durable execution layer.

DAG and state machine orchestration patterns side by side, with retry arrows showing how AI workflows recover from failures.

MONA explainer 11 min May 14, 2026

DAGs vs. State Machines, Retry Logic, and the Hard Technical Limits of AI Workflow Orchestration

Workflow orchestration for AI splits into DAGs (Airflow, Prefect) and state machines (Temporal, LangGraph). AWS Step Functions Standard workflows cap execution history at 25,000 events.

How to Build a Retrieval-Augmented Agent with LangGraph, LlamaIndex, and CrewAI in 2026

Agent Capabilities for Developers: What Maps and What Breaks

How to Build a Browser Agent with Anthropic Computer Use, OpenAI Operator, and Browser Use in 2026

How to Build a Code Execution Agent with E2B, Daytona, and Claude Agent SDK in 2026

How to Build a Production AI Workflow with LangGraph, Temporal, and Prefect in 2026

Claude Opus 4.6, GPT-5.4 Operator, and Project Mariner: The 2026 Browser Agent Leaderboard Race

LangGraph, LlamaIndex Workflows, and Vectara: The 2026 Retrieval-Augmented Agent Landscape

Claude Code, OpenHands, and Devin: How the 2026 SWE-bench Race Is Reshaping Code Execution Agents

LangGraph, Temporal, and Haystack: How Hybrid Orchestration Stacks Won Production AI in 2026

Agents That Click for You: The Ethical Risks of Giving AI Control Over Your Browser and Desktop

When Agents Retrieve the Wrong Truth: Accountability and Ethical Risks of Retrieval-Augmented Agents

When LLMs Run Code They Wrote: Accountability and the Ethics of Autonomous Execution

When Orchestration Hides the Failure: Accountability Gaps in Automated AI Workflows
