[iris]
Guides

AI Agents

Build AI agents with safe code execution

Why Iris for AI Agents?

AI agents that execute code face a fundamental problem: code can fail, corrupt state, or cause unintended side effects.

Iris solves this with instant forks:

  1. Fork before risky operations
  2. Execute AI-generated code in the branch
  3. Discard the branch on failure — original is untouched

Basic Pattern

import { Sandbox } from '@iris/sdk'
import { generateCode } from './your-llm'

async function safeExecute(sandbox: Sandbox, task: string) {
  const branch = await sandbox.fork()

  try {
    const code = await generateCode(task)
    const result = await branch.exec.run(code)

    if (!result.ok) {
      throw new Error(result.stderr)
    }

    return { success: true, result }
  } catch (error) {
    await branch.kill()
    return { success: false, error }
  }
}

ReAct Agent Example

import { Sandbox } from '@iris/sdk'
import Anthropic from '@anthropic-ai/sdk'

const anthropic = new Anthropic()

async function reactAgent(task: string) {
  const sandbox = await Sandbox.create()
  const messages: Anthropic.MessageParam[] = []

  while (true) {
    const response = await anthropic.messages.create({
      model: 'claude-opus-4-7',
      max_tokens: 4096,
      messages: [
        { role: 'user', content: task },
        ...messages,
      ],
      tools: [{
        name: 'execute_code',
        description: 'Execute a shell command in the sandbox',
        input_schema: {
          type: 'object' as const,
          properties: {
            code: { type: 'string', description: 'Shell command to run' },
          },
          required: ['code'],
        },
      }],
    })

    if (response.stop_reason === 'end_turn') break

    const toolResults: Anthropic.ToolResultBlockParam[] = []

    for (const block of response.content) {
      if (block.type === 'tool_use') {
        const input = block.input as { code: string }

        // Fork before execution — original sandbox is preserved on failure
        const branch = await sandbox.fork()
        const result = await branch.exec.run(input.code)

        if (result.ok) {
          toolResults.push({
            type: 'tool_result',
            tool_use_id: block.id,
            content: result.stdout,
          })
        } else {
          await branch.kill()
          toolResults.push({
            type: 'tool_result',
            tool_use_id: block.id,
            content: `Error (exit ${result.exit_code}): ${result.stderr}`,
            is_error: true,
          })
        }
      }
    }

    messages.push({ role: 'assistant', content: response.content })
    if (toolResults.length > 0) {
      messages.push({ role: 'user', content: toolResults })
    }
  }

  await sandbox.kill()
}

Multi-Step Workflows

Checkpoint at each successful step so you can fork back to the last good state:

const checkpoints: string[] = []

for (const step of workflow) {
  const cp = await sandbox.checkpoint.create({ name: step.name })
  checkpoints.push(cp.checkpoint_id)

  const result = await sandbox.exec.run(step.code)

  if (!result.ok) {
    // Roll back to the last good checkpoint and retry or bail
    const lastGood = checkpoints[checkpoints.length - 2]
    if (lastGood) await sandbox.checkpoint.restore(lastGood)
    console.error(`Step ${step.name} failed:`, result.stderr)
    break
  }
}

Parallel Exploration

Fork from a decision point to explore multiple approaches simultaneously:

await sandbox.exec.run('python3 setup.py')

const approaches = ['approach_a.py', 'approach_b.py', 'approach_c.py']

const results = await Promise.all(
  approaches.map(async (approach) => {
    const branch = await sandbox.fork()
    const result = await branch.exec.run(`python3 ${approach}`)
    await branch.kill()
    return { approach, stdout: result.stdout, ok: result.ok }
  }),
)

const best = results.find((r) => r.ok)

Best Practices

Fork, don't mutate

Fork before any action the agent might want to roll back. Keep the base sandbox clean.

Use timeouts

Set timeout_ms on exec.run() to prevent runaway processes from blocking your agent.

Check result.ok

result.ok is true only when exit_code === 0. Always check before treating output as valid.

Clean up branches

Call branch.kill() on completed forks. Suspended sandboxes still consume quota.

On this page