
Agent System Examples

Collaborative Development Systems

Examples of agent systems working together to develop software and solutions.

TheAgentCompany

TheAgentCompany is a multi-agent environment for collaborative problem-solving and development that serves as a comprehensive benchmark for evaluating AI agents in realistic workplace scenarios. It is built using the OpenHands agent framework.


Environment Architecture

  1. Local Workspace

    • Docker-based sandboxed environment for safe execution
    • Pre-installed software tools and development environment
    • Isolated from the evaluation machine for security
    • Browser (Playwright), code editor, and Linux terminal access
  2. Intranet Services

    • GitLab: Code repositories and tech-oriented wiki pages
    • OwnCloud: Document storage and collaborative editing
    • Plane: Issue tracking, sprint cycles, product roadmaps
    • RocketChat: Internal real-time messaging and collaboration
    • All services are reproducible and can be reset to a state preloaded with mock data
  3. Simulated Colleagues

    • Built on Sotopia platform for human-like interactions
    • Detailed profiles including name, role, responsibilities, project affiliations
    • Backed by Claude-3.5-Sonnet for consistent behavior
    • Support for direct messages and channel communications
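A simulated colleague can be pictured as a structured profile that is rendered into a system prompt for the backing model. The sketch below is a hypothetical illustration: the `ColleagueProfile` class, its fields, and the `to_system_prompt` helper are assumptions, not Sotopia's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class ColleagueProfile:
    """Hypothetical profile for a simulated colleague (not Sotopia's real schema)."""
    name: str
    role: str
    responsibilities: list[str] = field(default_factory=list)
    projects: list[str] = field(default_factory=list)

    def to_system_prompt(self) -> str:
        # Render the profile into instructions for the LLM that plays this colleague.
        duties = "; ".join(self.responsibilities) or "none listed"
        projects = ", ".join(self.projects) or "none"
        return (
            f"You are {self.name}, a {self.role}. "
            f"Your responsibilities: {duties}. "
            f"You work on: {projects}. "
            "Reply in character to direct messages and channel posts."
        )


if __name__ == "__main__":
    pm = ColleagueProfile(
        name="Alex",
        role="project manager",
        responsibilities=["sprint planning", "approving budgets"],
        projects=["website-redesign"],
    )
    print(pm.to_system_prompt())
```

Keeping the profile as data rather than a hand-written prompt makes it easy to generate many consistent colleagues from the same template.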

Task Implementation

  1. Task Components

    • Detailed task intent in natural language
    • Multiple checkpoints representing milestones
    • Programmatic evaluators for verification
    • Environment initialization and cleanup code
  2. Checkpoint System

    • Action Completion: Tool usage, navigation, data collection
    • Data Accuracy: Output correctness and completeness
    • Collaboration: Quality of colleague interactions
    • Point-based scoring for partial completion
  3. Evaluation Methods

    • Deterministic Evaluators

      • Python functions for objective checks
      • Environment state verification
      • File system change monitoring
      • Browser history tracking
      • Action sequence validation
    • LLM-based Evaluators

      • Complex deliverable assessment
      • Predefined evaluation rubrics
      • Reference output comparison
      • Subjective quality measurement
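The task components, point-based checkpoints, and a deterministic evaluator can be sketched as plain Python data. Everything here is hypothetical: the `Task` and `Checkpoint` classes, the `file_contains` evaluator, and the file names illustrate a file-system state check and are not TheAgentCompany's actual task format.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Callable


@dataclass
class Checkpoint:
    description: str
    points: int
    evaluator: Callable[[], bool]  # deterministic check of environment state


@dataclass
class Task:
    intent: str  # natural-language task description given to the agent
    checkpoints: list[Checkpoint]

    def grade(self) -> tuple[int, int]:
        """Return (points_achieved, total_points); each passed checkpoint earns its points."""
        achieved = sum(cp.points for cp in self.checkpoints if cp.evaluator())
        total = sum(cp.points for cp in self.checkpoints)
        return achieved, total


def file_contains(path: str, text: str) -> bool:
    # Deterministic evaluator: inspect the file system rather than the agent's transcript.
    if not os.path.exists(path):
        return False
    with open(path, encoding="utf-8") as f:
        return text in f.read()


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    report = os.path.join(workdir, "report.csv")  # hypothetical deliverable
    task = Task(
        intent="Export the open issues to report.csv and include the total count.",
        checkpoints=[
            Checkpoint("report.csv exists", 1, lambda: os.path.exists(report)),
            Checkpoint("total count present", 2, lambda: file_contains(report, "total")),
        ],
    )
    print(task.grade())  # (0, 3) before the agent acts
    with open(report, "w", encoding="utf-8") as f:
        f.write("issue,status\ntotal,12\n")
    print(task.grade())  # (3, 3) after the file is written
```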

Scoring Implementation

  1. Full Completion Score

    $$S_{\text{full}} = \begin{cases} 1 & \text{if all checkpoints are passed} \\ 0 & \text{otherwise} \end{cases}$$
    

  2. Partial Completion Score

    $$S_{\text{partial}} = 0.5 \cdot \frac{\text{points achieved}}{\text{total points}} + 0.5 \cdot S_{\text{full}}$$
    

  3. Efficiency Metrics

    • Number of LLM calls per task
    • Token usage and associated costs
    • Step count tracking
    • Execution time monitoring
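Both scores follow directly from checkpoint results. A minimal sketch, assuming each checkpoint reports the points it earned alongside its maximum; the function names are hypothetical.

```python
def full_score(earned: list[int], maximum: list[int]) -> int:
    """S_full: 1 only if every checkpoint earned all of its points, else 0."""
    return int(all(e == m for e, m in zip(earned, maximum)))


def partial_score(earned: list[int], maximum: list[int]) -> float:
    """S_partial = 0.5 * (points_achieved / total_points) + 0.5 * S_full."""
    points_achieved = sum(earned)
    total_points = sum(maximum)
    return 0.5 * points_achieved / total_points + 0.5 * full_score(earned, maximum)


if __name__ == "__main__":
    print(partial_score([1, 2, 0], [1, 2, 2]))  # 0.5 * 3/5 + 0.5 * 0 = 0.3
    print(partial_score([1, 2, 2], [1, 2, 2]))  # 0.5 * 5/5 + 0.5 * 1 = 1.0
```

Because the full-completion term is only awarded when every checkpoint passes, any incomplete attempt scores below 0.5, so finishing a task always outranks partial progress.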

Common Failure Categories

  1. Common Sense Deficits

    • Missing implicit assumptions
    • File type inference failures
    • Context understanding issues
    • Basic workflow comprehension gaps
  2. Social Interaction Issues

    • Incomplete communication flows
    • Missed social cues
    • Follow-up failures
    • Context switching problems
  3. Technical Challenges

    • Complex UI navigation
    • Popup handling difficulties
    • Multi-step process management
    • Tool integration issues
  4. Task Execution Problems

    • Invalid shortcut creation
    • Critical step omission
    • Incorrect assumption chains
    • Resource management issues

Performance Metrics

  • Success rate across different task types
  • Platform-specific performance analysis
  • Cost-efficiency measurements
  • Step count optimization
  • Token usage efficiency
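Per-category success rates and cost efficiency are simple aggregations over per-task results. A minimal sketch; the `TaskResult` record and its fields (`category`, `success`, `cost_usd`, `steps`) are hypothetical, and the sample values are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TaskResult:
    category: str    # task type or platform, e.g. "engineering"
    success: bool    # True when S_full == 1
    cost_usd: float  # total LLM cost for the run
    steps: int       # number of agent steps taken


def summarize(results: list[TaskResult]) -> dict[str, dict[str, float]]:
    """Success rate, mean cost, and mean step count per category."""
    groups: dict[str, list[TaskResult]] = defaultdict(list)
    for r in results:
        groups[r.category].append(r)
    summary = {}
    for category, rs in groups.items():
        n = len(rs)
        summary[category] = {
            "success_rate": sum(r.success for r in rs) / n,
            "avg_cost_usd": sum(r.cost_usd for r in rs) / n,
            "avg_steps": sum(r.steps for r in rs) / n,
        }
    return summary


if __name__ == "__main__":
    runs = [  # illustrative values, not reported results
        TaskResult("engineering", True, 1.20, 30),
        TaskResult("engineering", False, 2.00, 50),
        TaskResult("finance", False, 3.10, 70),
    ]
    for category, stats in summarize(runs).items():
        print(category, stats)
```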


Agent Laboratory

A research assistant framework that transforms human research ideas into complete research reports and code repositories. Designed to complement human researchers rather than replace them.

Research Team Structure

  • PhD Agent: Research planning and literature review lead
  • Postdoc Agent: Expert guidance and methodology refinement
  • ML Engineer: Code implementation and technical development
  • Professor Agent: Quality evaluation and research direction

How It Works

  1. Literature Review

    • Semantic search across research papers
    • Contextual understanding of related work
    • Synthesis of key findings and gaps
    • Automatic citation management
  2. Experimentation

    • Collaborative experimental design
    • Iterative code development and testing
    • Results analysis and validation
    • Documentation of findings
  3. Report Writing

    • Academic paper structure
    • Integration of results and literature
    • LaTeX formatting and figure generation
    • Citation and reference management
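The three phases run in sequence, each handing its output to the next. A minimal sketch, assuming a hypothetical `llm(role, prompt)` callable in place of a real model call; the role assigned to each phase is an illustrative simplification, not Agent Laboratory's exact configuration.

```python
from typing import Callable

# Hypothetical stand-in for an LLM call: (role, prompt) -> response text.
LLM = Callable[[str, str], str]

# (phase, agent role leading it); illustrative assignment only
PHASES = [
    ("literature_review", "PhD Agent"),
    ("experimentation", "ML Engineer"),
    ("report_writing", "PhD Agent"),
]


def run_pipeline(idea: str, llm: LLM) -> dict[str, str]:
    """Run each phase in order, feeding the accumulated context forward."""
    context = f"Research idea: {idea}"
    outputs: dict[str, str] = {}
    for phase, role in PHASES:
        result = llm(role, f"[{phase}] {context}")
        outputs[phase] = result
        context += f"\n{phase}: {result}"  # later phases see earlier results
    return outputs


if __name__ == "__main__":
    # Echo stub: reports which role handled which phase.
    echo = lambda role, prompt: f"{role} handled {prompt.split(']')[0][1:]}"
    for phase, out in run_pipeline("sparse attention for long documents", echo).items():
        print(phase, "->", out)
```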

Operation Modes

  1. Autonomous Mode

    • Self-directed research workflow
    • Internal peer review process
    • Continuous quality monitoring
    • Independent decision-making
  2. Co-Pilot Mode

    • Human-AI collaboration
    • Regular feedback checkpoints
    • Adjustable interaction levels
    • Responsive to researcher guidance
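One way to picture the difference between the two modes is as a question of who approves each phase's output. A sketch under that assumption: in autonomous mode an internal reviewer decides, in co-pilot mode a human callback does. The `Mode` enum, `checkpoint` function, and both callbacks are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional


class Mode(Enum):
    AUTONOMOUS = "autonomous"
    CO_PILOT = "co-pilot"


def checkpoint(
    output: str,
    mode: Mode,
    internal_review: Callable[[str], bool],
    ask_human: Callable[[str], Optional[str]],
) -> tuple[bool, Optional[str]]:
    """Return (approved, feedback) for one phase's output.

    Autonomous mode relies on the internal reviewer only; co-pilot mode
    asks the researcher, who approves (returns None) or returns feedback text.
    """
    if mode is Mode.AUTONOMOUS:
        return internal_review(output), None
    feedback = ask_human(output)
    return feedback is None, feedback


if __name__ == "__main__":
    draft = "Plan: fine-tune on dataset X"
    print(checkpoint(draft, Mode.AUTONOMOUS, lambda o: "Plan" in o, lambda o: None))
    print(checkpoint(draft, Mode.CO_PILOT, lambda o: True, lambda o: "Add a baseline"))
```

Returning feedback text instead of a bare rejection lets the next iteration of the phase incorporate the researcher's guidance.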

Key Tools

  1. MLE-Solver

    • Machine learning code generation
    • Self-improving algorithms
    • Iterative refinement process
    • Error detection and correction
  2. Paper-Solver

    • Research synthesis
    • Academic writing
    • Results visualization
    • Format compliance
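The iterative refinement and error correction described for MLE-Solver can be sketched as a loop that executes candidate code, feeds any error back to the model, and keeps the best-scoring program. A minimal sketch, assuming a hypothetical `propose(feedback)` callable that returns source code which sets a `result` score; this is not the tool's real interface.

```python
import traceback
from typing import Callable, Optional


def run_candidate(source: str) -> tuple[Optional[float], str]:
    """Execute candidate code that must set a variable named `result`.

    Returns (score, error_text); error_text is empty on success.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)  # only run model-written code inside a sandbox
        return float(namespace["result"]), ""
    except Exception:
        return None, traceback.format_exc(limit=1)


def refine(propose: Callable[[str], str], steps: int = 5) -> tuple[Optional[str], float]:
    """Keep the best-scoring program; feed errors or the current best score back."""
    best_src, best_score = None, float("-inf")
    feedback = "Write code that sets `result` to a validation accuracy."
    for _ in range(steps):
        source = propose(feedback)
        score, error = run_candidate(source)
        if error:
            feedback = f"The code failed:\n{error}\nFix it."
            continue
        if score > best_score:
            best_src, best_score = source, score
        feedback = f"Current best score is {best_score}. Improve it."
    return best_src, best_score


if __name__ == "__main__":
    attempts = iter([
        "result = 1 / 0",  # fails; the traceback is fed back
        "result = 0.71",
        "result = 0.64",   # worse, so the previous program is kept
    ])
    src, score = refine(lambda feedback: next(attempts), steps=3)
    print(src, score)  # result = 0.71 0.71
```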

External Integrations

  • arXiv for literature access
  • Hugging Face for ML models
  • Python environment for experiments
  • LaTeX for document preparation
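Literature access through arXiv can use arXiv's public Atom API at `http://export.arxiv.org/api/query`. The sketch below builds the request URL and parses titles from the returned feed with the standard library; how Agent Laboratory itself queries arXiv is not shown here and may differ, and the helper names are hypothetical.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # namespace of the Atom feed


def build_query(terms: str, max_results: int = 5) -> str:
    """Build an arXiv API URL that searches all fields for the given terms."""
    params = {"search_query": f"all:{terms}", "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urllib.parse.urlencode(params)}"


def parse_titles(atom_xml: str) -> list[str]:
    """Extract entry titles, collapsing the line breaks arXiv puts inside them."""
    root = ET.fromstring(atom_xml)
    return [
        " ".join(entry.findtext(f"{ATOM}title", "").split())
        for entry in root.findall(f"{ATOM}entry")
    ]


def search(terms: str, max_results: int = 5) -> list[str]:
    # Network call; arXiv asks API clients to space out repeated requests.
    with urllib.request.urlopen(build_query(terms, max_results), timeout=30) as resp:
        return parse_titles(resp.read().decode("utf-8"))
```

Calling `search("multi-agent debate")` requires network access; `build_query` and `parse_titles` work offline.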



ChatDev - Collaborative Software Development

ChatDev is a communicative-agent approach to developing software with large language models. It builds on CAMEL and provides a framework for composing systems of agents that produce software-enabled products.
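The communicative-agent idea can be sketched as a chain of phases, each a short dialogue between an instructing role and an assisting role that updates a shared artifact. A simplified sketch: the phase list, role pairs, `<DONE>` termination token, and `chat` stub are illustrative assumptions rather than ChatDev's actual configuration.

```python
from typing import Callable

# (phase, instructor role, assistant role); illustrative, not ChatDev's real config
CHAT_CHAIN = [
    ("design", "CEO", "CTO"),
    ("coding", "CTO", "Programmer"),
    ("code_review", "Reviewer", "Programmer"),
    ("testing", "Tester", "Programmer"),
]

Chat = Callable[[str, str], str]  # (role, message) -> reply


def run_phase(phase: str, instructor: str, assistant: str,
              artifact: str, chat: Chat, max_turns: int = 3) -> str:
    """Alternate instructor/assistant turns until the instructor replies <DONE>."""
    message = f"[{phase}] Current artifact:\n{artifact}"
    for _ in range(max_turns):
        instruction = chat(instructor, message)
        if "<DONE>" in instruction:  # hypothetical termination token
            break
        artifact = chat(assistant, instruction)
        message = f"[{phase}] Updated artifact:\n{artifact}"
    return artifact


def run_chain(requirement: str, chat: Chat) -> str:
    """Pass the artifact through every phase of the chain in order."""
    artifact = requirement
    for phase, instructor, assistant in CHAT_CHAIN:
        artifact = run_phase(phase, instructor, assistant, artifact, chat)
    return artifact
```

Splitting work into two-role dialogues keeps each conversation short and focused, which is the main design motivation for chaining phases instead of running one long group chat.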

Experiential Co-Learning of Software-Developing Agents

This system introduces a multi-agent paradigm with three key modules:

  • Co-tracking: Promotes interactive rehearsals between agents
  • Co-memorizing: Finds shortcuts based on past experiences
  • Co-reasoning: Enhances instructions using collective experience pools
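Co-memorizing and co-reasoning both depend on a pool of past experience that can be queried for the situation at hand. A deliberately simple sketch, assuming experiences are stored as (situation, shortcut) text pairs and retrieved by word-overlap (Jaccard) similarity; the `ExperiencePool` class and thresholds are hypothetical, and the paper's actual shortcut extraction and retrieval may differ.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


class ExperiencePool:
    """Stores (situation, shortcut) pairs from past successful trajectories."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, str]] = []

    def memorize(self, situation: str, shortcut: str) -> None:
        self.entries.append((situation, shortcut))

    def recall(self, situation: str, k: int = 2, min_sim: float = 0.2) -> list[str]:
        """Return up to k shortcuts whose situations best match the query."""
        query = tokens(situation)
        scored = [(jaccard(query, tokens(s)), shortcut) for s, shortcut in self.entries]
        scored = [item for item in scored if item[0] >= min_sim]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [shortcut for _, shortcut in scored[:k]]

    def augment(self, instruction: str) -> str:
        """Co-reasoning style: prepend recalled shortcuts to an instruction."""
        hints = self.recall(instruction)
        if not hints:
            return instruction
        return "Past experience:\n- " + "\n- ".join(hints) + "\n\n" + instruction
```

A production system would likely use embeddings instead of word overlap; the structure (memorize, recall, augment) stays the same.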

Task-Specific Agent Teams

Polaris - Healthcare Safety System

Polaris is a safety-focused LLM constellation architecture for healthcare, in which multiple agents collaborate to keep AI chatbot responses safe and compliant.
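A constellation can be pictured as a primary conversational agent whose draft reply is checked by specialist agents before it reaches the patient. A hypothetical sketch: the two specialist checks below are toy keyword rules standing in for specialist models, and none of the names come from Polaris itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A specialist inspects (patient_message, draft) and returns a note if it objects.
Specialist = Callable[[str, str], Optional[str]]


@dataclass
class Reply:
    text: str
    notes: list[str]


def medication_check(message: str, draft: str) -> Optional[str]:
    # Toy rule standing in for a medication-safety specialist model.
    if "double your dose" in draft.lower():
        return "Do not advise dose changes; refer the patient to their clinician."
    return None


def escalation_check(message: str, draft: str) -> Optional[str]:
    # Toy rule standing in for a specialist that detects urgent symptoms.
    if "chest pain" in message.lower() and "emergency" not in draft.lower():
        return "Urgent symptom: the reply must direct the patient to emergency care."
    return None


def respond(message: str, draft: str, specialists: list[Specialist], fallback: str) -> Reply:
    """Release the primary agent's draft only if every specialist approves it."""
    notes = []
    for check in specialists:
        note = check(message, draft)
        if note is not None:
            notes.append(note)
    return Reply(fallback if notes else draft, notes)
```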

Showrunner Agents - Content Generation

Showrunner Agents use LLMs to generate episodic content through a creative and multi-faceted process.

MAgICoRe - Reasoning Framework

MAgICoRe implements a multi-agent system in which solver, reviewer, and refiner roles collaborate to iteratively improve solutions.
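The three roles form a loop: the solver drafts a solution, the reviewer critiques it, and the refiner revises it until the reviewer accepts or a round limit is reached. A minimal sketch with the roles as hypothetical callables; MAgICoRe's additional components are not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Review:
    accept: bool
    feedback: str


def solve_review_refine(
    problem: str,
    solver: Callable[[str], str],
    reviewer: Callable[[str, str], Review],
    refiner: Callable[[str, str, str], str],
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Return (final solution, number of refinement rounds used)."""
    solution = solver(problem)
    for round_no in range(max_rounds):
        review = reviewer(problem, solution)
        if review.accept:
            return solution, round_no
        # The refiner sees the problem, the current solution, and the critique.
        solution = refiner(problem, solution, review.feedback)
    return solution, max_rounds
```

For example, a solver answering `"2+2=5"` and a refiner that fixes the arithmetic would return `("2+2=4", 1)` once the reviewer accepts the corrected answer.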

Learning and Teaching Systems

Theory of Mind Teaching

This research explores how language models can teach weaker agents, using Theory of Mind to model what the student knows in order to improve student performance.
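One way to read the Theory of Mind idea is that the teacher estimates how likely the student is to succeed with and without an explanation, then spends a limited explanation budget where the predicted gain is largest. A sketch under that assumption; the `Example` fields are hypothetical inputs, whereas in practice the estimates would come from the teacher's model of the student.

```python
from dataclasses import dataclass


@dataclass
class Example:
    question: str
    p_correct_alone: float      # teacher's estimate of the student without help
    p_correct_explained: float  # teacher's estimate of the student with an explanation


def choose_interventions(examples: list[Example], budget: int) -> list[str]:
    """Pick up to `budget` questions with the largest predicted gain from explaining."""
    gains = [(ex.p_correct_explained - ex.p_correct_alone, ex.question) for ex in examples]
    helpful = [g for g in gains if g[0] > 0]  # skip explanations predicted to hurt
    helpful.sort(key=lambda g: g[0], reverse=True)
    return [question for _, question in helpful[:budget]]
```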

Multi-Agent Debate for Improvement

This approach uses multiple language model instances to debate and refine responses, improving factuality and reasoning through collaborative critique.
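The debate procedure can be sketched as several agents answering independently, then revising over a few rounds after reading each other's latest answers, with a majority vote at the end. A minimal sketch; each agent is a hypothetical callable standing in for a separate model instance.

```python
from collections import Counter
from typing import Callable

# agent(question, other_answers) -> answer; other_answers is empty in the first round
Agent = Callable[[str, list[str]], str]


def debate(question: str, agents: list[Agent], rounds: int = 2) -> str:
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent sees every other agent's latest answer, not its own.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    # Final answer by majority vote; ties go to the answer seen first.
    return Counter(answers).most_common(1)[0][0]
```

With zero rounds this reduces to plain majority voting, which makes it easy to measure how much the critique rounds actually change the outcome.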

Production Systems

Agency Swarm - Production Framework

Agency Swarm provides a framework for creating interacting systems of agents in production environments.

Council - Team Orchestration

Council enables the creation of networks of agents to form full-fledged teams for production outputs.

OpenAI Assistants

OpenAI's assistants can be brought into a single chat by mentioning each one with the @ symbol, enabling collaborative problem-solving.
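Routing by `@` mention can be sketched as parsing the message for known assistant names and dispatching the remaining text to each one. The registry, names, and handler signature below are hypothetical; this is not how OpenAI implements the feature.

```python
import re
from typing import Callable

Handler = Callable[[str], str]
MENTION = re.compile(r"@([A-Za-z][\w-]*)")


def route(message: str, assistants: dict[str, Handler], default: str) -> dict[str, str]:
    """Dispatch the message to every mentioned assistant, or to the default one."""
    mentioned = [name for name in MENTION.findall(message) if name in assistants]
    targets = list(dict.fromkeys(mentioned)) or [default]  # dedupe, keep order
    text = " ".join(MENTION.sub("", message).split())      # strip mentions, tidy spaces
    return {name: assistants[name](text) for name in targets}
```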

Research Implementations

Generative Agents Simulation

This research implements a simulated town where agents with different personalities interact and evolve. Key features include:

  • Observation and reflection memory systems
  • Recursive planning capabilities
  • Dynamic environment interactions
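The memory system decides which stored observations and reflections to retrieve. The original work scores memories by a combination of recency, importance, and relevance; the sketch below follows that idea with equal weights, min-max normalization, exponential recency decay, and cosine similarity over precomputed embeddings. The decay constant, the `Memory` fields, and the vectors are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    last_access_hour: float
    importance: float        # e.g. 1-10, rated by the model when stored
    embedding: list[float]   # precomputed; how it is produced is out of scope


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(values: list[float]) -> list[float]:
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]


def retrieve(memories: list[Memory], query_emb: list[float], now_hour: float,
             k: int = 3, decay: float = 0.995) -> list[str]:
    """Rank memories by normalized recency + importance + relevance."""
    if not memories:
        return []
    recency = min_max([decay ** (now_hour - m.last_access_hour) for m in memories])
    importance = min_max([m.importance for m in memories])
    relevance = min_max([cosine(query_emb, m.embedding) for m in memories])
    scores = [r + i + v for r, i, v in zip(recency, importance, relevance)]
    ranked = sorted(range(len(memories)), key=lambda j: scores[j], reverse=True)
    return [memories[j].text for j in ranked[:k]]
```

Normalizing each component before summing keeps one signal, such as a large importance scale, from drowning out the others.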

SocraticAI - Conversational Problem Solving

SocraticAI solves complex problems through structured dialogue between agents.

Society of Minds

Based on Marvin Minsky's Society of Mind theory, this research implements a multi-agent debate approach in which agents collectively review and refine answers through structured interaction.

Emerging Architectures

Hierarchical Autonomous Agent Swarm (HAAS)

HAAS implements self-directing, self-correcting, and self-improving agent systems through hierarchical organization.

Swarm Intelligence Systems

Swarms explores large-scale agent coordination, focusing on emergent behaviors and collective intelligence in multi-agent systems.