Agent System Examples¶

Collaborative Development Systems¶

Examples of agent systems working together to develop software and solutions.

TheAgentCompany is a benchmark for evaluating AI agents in realistic workplace scenarios. It provides a multi-agent environment for collaborative problem-solving and development, and is built using the OpenHands agent framework.

Environment Architecture¶

Local Workspace
- Docker-based sandboxed environment for safe execution
- Pre-installed software tools and development environment
- Isolated from evaluation machine for security
- Browser (Playwright), code editor, and Linux terminal access
Intranet Services
- GitLab: Code repositories and tech-oriented wiki pages
- OwnCloud: Document storage and collaborative editing
- Plane: Issue tracking, sprint cycles, product roadmaps
- RocketChat: Internal real-time messaging and collaboration
- All services are reproducible and reset-able with mock data
Simulated Colleagues
- Built on Sotopia platform for human-like interactions
- Detailed profiles including name, role, responsibilities, project affiliations
- Backed by Claude-3.5-Sonnet for consistent behavior
- Support for direct messages and channel communications

Task Implementation¶

Task Components
- Detailed task intent in natural language
- Multiple checkpoints representing milestones
- Programmatic evaluators for verification
- Environment initialization and cleanup code
Checkpoint System
- Action Completion: Tool usage, navigation, data collection
- Data Accuracy: Output correctness and completeness
- Collaboration: Quality of colleague interactions
- Point-based scoring for partial completion
Evaluation Methods
- Deterministic Evaluators
  - Python functions for objective checks
  - Environment state verification
  - File system change monitoring
  - Browser history tracking
  - Action sequence validation
- LLM-based Evaluators
  - Complex deliverable assessment
  - Predefined evaluation rubrics
  - Reference output comparison
  - Subjective quality measurement
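
The checkpoint and deterministic-evaluator ideas above can be sketched in a few lines of Python. This is a minimal illustration, not TheAgentCompany's actual evaluator code: the `Checkpoint` class, the `report.csv` file, its expected header, and the point values are all hypothetical.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Checkpoint:
    """One task milestone worth a fixed number of points (hypothetical structure)."""
    description: str
    points: int
    check: Callable[[Path], bool]  # deterministic check against the workspace


def report_exists(workspace: Path) -> bool:
    # Action completion: did the agent produce the file at all?
    return (workspace / "report.csv").is_file()


def report_has_header(workspace: Path) -> bool:
    # Data accuracy: does the file start with the expected header row?
    path = workspace / "report.csv"
    if not path.is_file():
        return False
    lines = path.read_text().splitlines()
    return bool(lines) and lines[0] == "name,hours"


CHECKPOINTS = [
    Checkpoint("report.csv was created", 1, report_exists),
    Checkpoint("report.csv has the expected header", 2, report_has_header),
]


def grade(checkpoints: list, workspace: Path) -> tuple:
    """Return (points_achieved, total_points) for one workspace."""
    achieved = sum(cp.points for cp in checkpoints if cp.check(workspace))
    total = sum(cp.points for cp in checkpoints)
    return achieved, total


def demo() -> tuple:
    # Simulate an agent that created the file but wrote the wrong header.
    with tempfile.TemporaryDirectory() as tmp:
        workspace = Path(tmp)
        (workspace / "report.csv").write_text("wrong,header\n")
        return grade(CHECKPOINTS, workspace)
```

In this sketch `demo()` awards the first checkpoint only, which is how point-based scoring gives credit for partial completion.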

Scoring Implementation¶

Full Completion Score

S_full = 1 if all checkpoints are passed, else 0

Partial Completion Score

S_partial = 0.5 * (points_achieved / total_points) + 0.5 * S_full
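
As a worked example, the two scores above can be computed as follows. The function and parameter names are illustrative assumptions; only the formulas come from the text.

```python
def full_completion_score(checkpoints_passed: list) -> int:
    """Return 1 only when every checkpoint passed, otherwise 0."""
    if not checkpoints_passed:
        # Assumption: a task with no checkpoints is treated as an error.
        raise ValueError("a task needs at least one checkpoint")
    return 1 if all(checkpoints_passed) else 0


def partial_completion_score(points_achieved: int, total_points: int, s_full: int) -> float:
    """Half the credit is proportional to points, half is reserved for full completion."""
    if total_points <= 0:
        raise ValueError("total_points must be positive")
    return 0.5 * (points_achieved / total_points) + 0.5 * s_full


# An agent that earns 3 of 4 points but misses one checkpoint:
s_full = full_completion_score([True, True, False])
s_partial = partial_completion_score(3, 4, s_full)  # 0.5 * 0.75 + 0.5 * 0 = 0.375
```

Because half of the partial score is tied to full completion, an agent that finishes almost everything still scores well below one that finishes the task.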

Efficiency Metrics
- Number of LLM calls per task
- Token usage and associated costs
- Step count tracking
- Execution time monitoring

Common Failure Categories¶

Common Sense Deficits
- Missing implicit assumptions
- File type inference failures
- Context understanding issues
- Basic workflow comprehension gaps
Social Interaction Issues
- Incomplete communication flows
- Missed social cues
- Follow-up failures
- Context switching problems
Technical Challenges
- Complex UI navigation
- Popup handling difficulties
- Multi-step process management
- Tool integration issues
Task Execution Problems
- Invalid shortcut creation
- Critical step omission
- Incorrect assumption chains
- Resource management issues

Performance Metrics¶

Success rate across different task types
Platform-specific performance analysis
Cost-efficiency measurements
Step count optimization
Token usage efficiency

Resources¶

CAMEL: Communicative Agents for "Mind" Exploration of Large Scale Language Model Society (King Abdullah University of Science and Technology, March 2023)

Paper: https://arxiv.org/abs/2303.17760

Abstract: "The rapid advancement of conversational and chat-based language models has led to remarkable progress in complex task-solving. However, their success heavily relies on human input to guide the conversation, which can be challenging and time-consuming. This paper explores the potential of building scalable techniques to facilitate autonomous cooperation among communicative agents and provide insight into their "cognitive" processes. To address the challenges of achieving autonomous cooperation, we propose a novel communicative agent framework named role-playing. Our approach involves using inception prompting to guide chat agents toward task completion while maintaining consistency with human intentions. We showcase how role-playing can be used to generate conversational data for studying the behaviors and capabilities of chat agents, providing a valuable resource for investigating conversational language models. Our contributions include introducing a novel communicative agent framework, offering a scalable approach for studying the cooperative behaviors and capabilities of multi-agent systems, and open-sourcing our library to support research on communicative agents and beyond."

GitHub: https://github.com/camel-ai/camel

Article: https://blog.devgenius.io/coded-example-of-langchain-enabled-cooperative-agents-4859d294b197


Agent Laboratory

A research assistant framework that transforms human research ideas into complete research reports and code repositories. Designed to complement human researchers rather than replace them.

Research Team Structure¶

PhD Agent: Research planning and literature review lead
Postdoc Agent: Expert guidance and methodology refinement
ML Engineer: Code implementation and technical development
Professor Agent: Quality evaluation and research direction

How It Works¶

Literature Review
- Semantic search across research papers
- Contextual understanding of related work
- Synthesis of key findings and gaps
- Automatic citation management
Experimentation
- Collaborative experimental design
- Iterative code development and testing
- Results analysis and validation
- Documentation of findings
Report Writing
- Academic paper structure
- Integration of results and literature
- LaTeX formatting and figure generation
- Citation and reference management

Operation Modes¶

Autonomous Mode
- Self-directed research workflow
- Internal peer review process
- Continuous quality monitoring
- Independent decision-making
Co-Pilot Mode
- Human-AI collaboration
- Regular feedback checkpoints
- Adjustable interaction levels
- Responsive to researcher guidance
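
One way to picture the difference between the two modes is a phase pipeline with an optional feedback hook. The sketch below is a hypothetical illustration, not Agent Laboratory's actual API: the phase functions are stubs, and `run_workflow` and the `feedback` callback are assumed names.

```python
from typing import Callable, Optional

# Stub phases: each takes the shared research state and returns an updated copy.
def literature_review(state: dict) -> dict:
    return {**state, "related_work": f"summary of papers on {state['idea']}"}


def experimentation(state: dict) -> dict:
    return {**state, "results": "placeholder experiment results"}


def report_writing(state: dict) -> dict:
    return {**state, "report": "placeholder report draft"}


PHASES = [
    ("literature_review", literature_review),
    ("experimentation", experimentation),
    ("report_writing", report_writing),
]

# A feedback hook receives the phase name and current state and may return a note.
Feedback = Callable[[str, dict], Optional[str]]


def run_workflow(idea: str, feedback: Optional[Feedback] = None) -> dict:
    """Run all phases; in co-pilot mode, collect human notes after each phase."""
    state = {"idea": idea, "notes": []}
    for name, phase in PHASES:
        state = phase(state)
        if feedback is not None:  # co-pilot mode: feedback checkpoint
            note = feedback(name, state)
            if note:
                state["notes"].append((name, note))
    return state


def researcher(phase_name: str, state: dict) -> Optional[str]:
    # Hypothetical human reviewer who only comments on the experiments.
    return "try a larger batch size" if phase_name == "experimentation" else None


autonomous = run_workflow("sparse attention")             # no human in the loop
copilot = run_workflow("sparse attention", researcher)    # feedback after each phase
```

Passing no callback corresponds to autonomous mode; passing one adds a checkpoint after every phase, and a real system would feed the notes back into the next phase rather than just recording them.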

Key Tools¶

MLE-Solver
- Machine learning code generation
- Self-improving algorithms
- Iterative refinement process
- Error detection and correction
Paper-Solver
- Research synthesis
- Academic writing
- Results visualization
- Format compliance

External Integrations¶

arXiv for literature access
Hugging Face for ML models
Python environment for experiments
LaTeX for document preparation

Resources¶

ChatDev - Collaborative Software Development

ChatDev is a communicative agent approach for developing solutions using ML models. It works with CAMEL to create agentic systems and provides a framework for building systems of agents that produce software-enabled products.

Experiential Co-Learning of Software-Developing Agents

This system introduces a multi-agent paradigm with three key modules:
- Co-tracking: Promotes interactive rehearsals between agents
- Co-memorizing: Finds shortcuts based on past experiences
- Co-reasoning: Enhances instructions using collective experience pools

Task-Specific Agent Teams¶

Polaris - Healthcare Safety System

Polaris is a safety-focused LLM constellation architecture for healthcare, ensuring safe and compliant AI chatbots through multi-agent collaboration.

Showrunner Agents - Content Generation

Showrunner Agents use LLMs to generate episodic content through a creative and multi-faceted process.

MAgICoRe - Reasoning Framework

MAgICoRe implements a multi-agent system with solver, reviewer, and refiner roles to enable improved solutions through collaborative refinement.
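
The solver, reviewer, and refiner roles can be sketched as a simple loop. This is a toy illustration under stated assumptions, not MAgICoRe's actual method: the three functions are deterministic stand-ins for LLM calls, and the keyword-based review is invented for the example.

```python
def solver(problem: str) -> str:
    # Stand-in for an LLM call that produces a first attempt.
    return "draft answer"


def reviewer(solution: str, required: list) -> list:
    # Stand-in review: report every required element missing from the solution.
    return [item for item in required if item not in solution]


def refiner(solution: str, issues: list) -> str:
    # Stand-in refinement: address one reported issue per round.
    return f"{solution} {issues[0]}"


def solve_with_refinement(problem: str, required: list, max_rounds: int = 3) -> tuple:
    """Alternate review and refinement until the reviewer has no issues left.

    Returns the final solution and the number of refinement rounds used.
    """
    solution = solver(problem)
    for rounds_used in range(max_rounds):
        issues = reviewer(solution, required)
        if not issues:
            return solution, rounds_used
        solution = refiner(solution, issues)
    return solution, max_rounds


final, rounds = solve_with_refinement("estimate the cost", ["units", "assumptions"])
```

Capping the loop with `max_rounds` matters because a reviewer that never becomes satisfied would otherwise refine forever.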

Learning and Teaching Systems¶

Theory of Mind Teaching

This research explores how language models can teach weaker agents using Theory of Mind concepts to improve student performance.

Multi-Agent Debate for Improvement

This approach uses multiple language model instances to debate and refine responses, improving factuality and reasoning through collaborative critique.
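
A debate round can be sketched as each agent revising its answer after seeing its peers' answers. The code below is a toy model, not the published method: `make_agent` builds deterministic stand-ins for language model instances, and the "adopt a strict peer majority" rule is an assumption made for illustration.

```python
from collections import Counter
from typing import Callable

Agent = Callable[[str, list], str]


def make_agent(initial_answer: str) -> Agent:
    """Build a toy agent that defers to its peers only when most of them agree."""
    def agent(question: str, peer_answers: list) -> str:
        if not peer_answers:
            return initial_answer
        answer, count = Counter(peer_answers).most_common(1)[0]
        # Switch only on a strict peer majority; otherwise fall back to the initial answer.
        return answer if count > len(peer_answers) / 2 else initial_answer
    return agent


def debate(question: str, agents: list, rounds: int = 2) -> tuple:
    """Run several critique rounds, then return the majority answer and all final answers."""
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Each agent sees every answer from the previous round except its own.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    final = Counter(answers).most_common(1)[0][0]
    return final, answers


agents = [make_agent("42"), make_agent("42"), make_agent("41")]
final_answer, final_answers = debate("What is 6 * 7?", agents)
```

Here the dissenting third agent converges on the majority answer after one round. With real models, each agent would instead be prompted with its peers' reasoning and asked to critique and update its own response.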

Production Systems¶

Agency Swarm - Production Framework

Agency Swarm provides a language for creating interacting systems of agents in production environments.

Council - Team Orchestration

Council enables the creation of networks of agents to form full-fledged teams for production outputs.

OpenAI Assistants

OpenAI's assistants system lets users bring different assistants into a single chat using the @ symbol, enabling collaborative problem-solving.

Research Implementations¶

Generative Agents Simulation

This research implements a simulated town where agents with different personalities interact and evolve. Key features include:
- Observation and reflection memory systems
- Recursive planning capabilities
- Dynamic environment interactions

SocraticAI - Conversational Problem Solving

SocraticAI leverages the power of conversation between agents to solve complex problems through structured dialogue.

Society of Minds

Based on Minsky's theory, this research implements a multi-agent debate approach where agents collectively review and refine answers through structured interaction.

Emerging Architectures¶

Hierarchical Autonomous Agent Swarm (HAAS)

HAAS implements self-directing, self-correcting, and self-improving agent systems through hierarchical organization.

Swarm Intelligence Systems

Swarms explores large-scale agent coordination, focusing on emergent behaviors and collective intelligence in multi-agent systems.