Background and context

Multi‑agent research systems (MAS) combine multiple AI agents that collaborate to tackle complex tasks. Large language models (LLMs) enable agents to reason in natural language, call external tools and interact with digital environments. 2025 has so far seen an explosion of new frameworks, evaluation methods and domain‑specific applications for MAS. Key developments include commercial deployments (e.g., Anthropic’s Research system), open‑source orchestration frameworks (OpenAI Swarm and the Agents SDK), improved multi‑agent debate strategies, new reinforcement fine‑tuning paradigms, and domain‑specific agent systems for finance and supply‑chain management.

1. Commercial research systems and practical guidance

  • Anthropic – Research multi‑agent system (Jun 13 2025). Key contribution: Anthropic disclosed how its Research feature uses a lead agent to plan a research task and spin up specialized sub‑agents that search in parallel (anthropic.com). The lead agent decomposes queries, spawns subagents to explore different aspects and synthesizes the results; this orchestration improved performance on breadth‑first queries by about 90% compared with a single Claude agent (anthropic.com). The post explains the benefits (flexibility to pivot during open‑ended research; parallel sub‑agents compress information and reduce path dependence) and the challenges (token usage grows 4× to 15× compared with chats) (anthropic.com). Anthropic also shares prompt‑engineering lessons: delegation guidelines, scaling effort to query complexity, tool selection, and parallel tool calling to cut research time by up to 90% (anthropic.com). Evidence: the article shows that multi‑agent systems excel when sub‑tasks are parallelizable, but they require careful prompt design and tool selection to avoid excessive token consumption (anthropic.com). A minimal sketch of this orchestrator‑worker pattern follows this list.
  • Communications of the ACM – Multi‑agent systems in finance (Jul 24 2025). Key contribution: a CACM blog summarises how MAS can enhance fundamental financial analysis. It defines a MAS as a network of autonomous agents that interact to solve complex problems (cacm.acm.org). The article notes that decentralised intelligence lets specialized agents monitor stock prices, macro‑economic indicators and sentiment simultaneously, and that agents share insights to produce cohesive analyses (cacm.acm.org). Use cases include automated data extraction, real‑time sentiment monitoring, trend analysis, scenario simulation and fraud detection (cacm.acm.org). MAS benefits include faster processing, wider coverage, reduced human bias and improved forecasts (cacm.acm.org), but the article warns that decisions must be transparent and that sensitive data and regulatory issues require careful management (cacm.acm.org). Evidence: the post illustrates the practical adoption of MAS in finance and highlights benefits and challenges for real‑world deployment.
  • MAS 2025 Workshop call, ICML 2025 (Jul 18 2025). Key contribution: the ICML 2025 workshop “Multi‑Agent Systems in the Era of Foundation Models” argues that scaling model parameters and scaling the number of agents both amplify collective intelligence. By progressively integrating more agents, MAS can activate diverse functionalities in foundation‑model‑powered agents and coordinate complementary skills, yielding improved problem solving and adaptability (mas-2025.github.io). The workshop points out that foundation‑model‑powered MAS are now used for social simulation, mathematics, software development, web/mobile agents and embodied robotics, and it emphasises research topics such as multi‑agent simulation, orchestration efficiency, human‑agent collaboration and reinforcement‑learning methods (mas-2025.github.io). Evidence: this call underscores that the research community views MAS as a route toward artificial general intelligence (AGI) and seeks to understand scaling laws and cross‑domain applications (mas-2025.github.io).
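
To make the orchestrator‑worker pattern concrete, the sketch below shows a minimal lead‑agent loop in Python that decomposes a query into sub‑tasks, runs sub‑agents in parallel with asyncio and synthesizes their findings. It is an illustrative sketch, not Anthropic’s implementation: the call_llm helper, the prompts and the JSON‑list task format are placeholder assumptions to be replaced with a real LLM client.

    import asyncio
    import json

    async def call_llm(prompt: str) -> str:
        """Placeholder for a real LLM call (e.g., an Anthropic or OpenAI client)."""
        raise NotImplementedError

    async def run_subagent(subtask: str) -> str:
        # Each sub-agent receives a narrow brief and works in its own context window.
        return await call_llm(
            "You are a research sub-agent. Investigate the following and "
            f"return a concise, sourced summary:\n{subtask}")

    async def research(query: str) -> str:
        # 1. The lead agent decomposes the query into independent sub-questions.
        plan = await call_llm(
            "Split this research question into 3-5 independent sub-questions. "
            f"Answer with a JSON list of strings only.\nQuestion: {query}")
        subtasks = json.loads(plan)

        # 2. Sub-agents search in parallel - the step that compresses wall-clock
        #    time on breadth-first queries.
        findings = await asyncio.gather(*(run_subagent(t) for t in subtasks))

        # 3. The lead agent synthesizes the parallel findings into one report.
        return await call_llm(
            "Synthesize these findings into a single research report:\n\n"
            + "\n\n".join(findings))

    # asyncio.run(research("How are multi-agent LLM systems used in finance?"))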

Takeaways

  • Multi‑agent research systems are moving from prototypes to commercial products. Anthropic’s Research system demonstrates that orchestrator‑worker architectures with multiple Claude agents can dramatically improve breadth‑first information retrieval, though they consume far more tokens and require careful prompts and tool selection (anthropic.com).
  • Practical applications are emerging in domains such as finance, where specialized agents can collectively perform complex workflows and reduce bias (cacm.acm.org).
  • The research community is positioning MAS as a path toward AGI; workshops in 2025 focus on collaboration structures, communication protocols and human‑agent teaming (mas-2025.github.io; multiagents.org).

2. Open‑source orchestration frameworks

Open‑source frameworks released in 2024–2025 have simplified the construction of multi‑agent workflows and inspired a wave of experimentation. Two of the most notable are OpenAI’s Swarm and the Agents SDK.

  • OpenAI Swarm (beta release 2024; widely adopted 2024–25). Release & features: Swarm is an experimental framework that simplifies orchestration of multi‑agent systems. An introductory article describes Swarm as a tool that helps manage multiple AI agents working together (analyticsvidhya.com). Each agent has its own role and list of functions, and the system automatically organizes these functions into JSON structures (analyticsvidhya.com). Agents can hand off tasks dynamically, passing work to the appropriate agent based on the conversation flow (analyticsvidhya.com). Context variables allow agents to remember and share information across turns (analyticsvidhya.com). Swarm emphasises multi‑agent coordination, customizable roles, dynamic handoffs, context sharing and an open‑source, scalable design (analyticsvidhya.com). Evidence: Swarm makes MAS development accessible by defining agent roles and enabling seamless handoffs, and its open‑source nature encourages experimentation and community contributions (analyticsvidhya.com). A minimal sketch of the Swarm pattern follows this list.
  • OpenAI Agents SDK (March 2025). Release & features: the Agents SDK (successor to Swarm) is a lightweight Python framework for building multi‑agent workflows. It exposes an agent reasoning loop, tool definitions and a memory abstraction. Although OpenAI’s official documentation could not be consulted directly here, multiple third‑party tutorials (DataCamp, Langfuse, etc.) describe the SDK as provider‑agnostic and more flexible than Swarm. The SDK lets developers specify agent roles, tools and handoffs while retaining control over context and step execution; it integrates with OpenAI’s Responses API and is designed for production use. Evidence: while direct citations from OpenAI are unavailable here, the consensus across the community is that the Agents SDK supersedes Swarm by providing more configurable control over agent orchestration, and it appears in many 2025 multi‑agent tutorials and prototypes.
  • Other frameworks. Release & features: several open‑source frameworks enable MAS experimentation: CrewAI, AutoGen, LangGraph and MetaGPT assign roles to agents and provide message‑exchange mechanisms; AgentScope offers distributed deployment and automatic parallel optimization; and OpenAI’s Swarm and Agents SDK emphasise ergonomic orchestration (arxiv.org). These tools are accompanied by public datasets and benchmarks such as SRDD, SMART and ToolBench for reasoning and communication optimization (arxiv.org). Evidence: the LLM‑MAS survey lists these frameworks and emphasises that open‑source communities are critical to advancing MAS research (arxiv.org).
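
The Swarm pattern summarised above can be shown in a few lines. The sketch below defines two agents with distinct roles, hands the conversation off by returning the next agent from a function, and shares a context variable across the exchange. It assumes the experimental swarm package from OpenAI’s repository and an OPENAI_API_KEY in the environment; treat it as a sketch of the pattern rather than canonical usage, since the beta API may change.

    from swarm import Swarm, Agent

    client = Swarm()

    def transfer_to_analyst():
        """Hand the conversation off to the analysis agent."""
        return analyst

    researcher = Agent(
        name="Researcher",
        instructions="Gather the raw facts needed for the user's question, then hand off.",
        functions=[transfer_to_analyst],   # functions are auto-described to the model as JSON
    )

    analyst = Agent(
        name="Analyst",
        instructions="Turn the gathered facts into a short, clearly sourced answer.",
    )

    response = client.run(
        agent=researcher,
        messages=[{"role": "user", "content": "Summarize today's market-moving news."}],
        context_variables={"user_region": "EU"},  # shared state agents can read across turns
    )
    print(response.messages[-1]["content"])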

Takeaways

  • Swarm illustrates how a well‑designed orchestration layer can abstract away the complexity of coordinating multiple agents, providing roles, memory and handoffs (analyticsvidhya.com). Developers can use Swarm to build domain‑specific agents or prototypes without deep expertise in distributed systems.
  • The Agents SDK (released March 2025) builds on Swarm, offering more control and integration with OpenAI’s Responses API. Although details are still emerging, the community views it as the next step toward production‑ready MAS frameworks.
  • A growing ecosystem of frameworks—CrewAI, AutoGen, LangGraph, MetaGPT and AgentScope—supports experimentation and allows researchers to compare designs (arxiv.org).

3. Research surveys and theoretical foundations

3.1 Survey of LLM‑based Multi‑Agent Systems (LLM‑MAS)

The January 2025 survey by Chen et al. (arXiv 2412.17481v2) presents a comprehensive overview of LLM‑based multi‑agent systems and identifies their core components, applications and challenges. Key points include:

  • Definition: LLM‑MAS consist of generative agents that interact and collaborate within a shared environment (arxiv.org). Generative agents perceive the environment, make decisions and execute complex actions (arxiv.org).
  • Agent characteristics: Generative agents require profiling (roles described in natural language), memory (long‑term, short‑term and sensory memory) and planning to formulate long‑term behaviors (arxiv.org); a schematic sketch of these components follows this list. Communication can use natural language (interpretability and flexibility) or custom signals optimized for cooperation (arxiv.org).
  • Environment: The environment defines tools, rules and interventions. Tools translate agent instructions into actions; rules define communication and behavior; interventions allow external control or supervision (arxiv.org).
  • Reasoning frameworks: The survey summarises three MAS reasoning frameworks: multi‑stage pipelines, collective decision‑making (agents vote or debate) and self‑refine frameworks. It notes that research focuses on reasoning and communication optimization (arxiv.org). Communication optimization aims to speed up communication (e.g., non‑verbal communication) or to enable distributed discussion when agents lack complete information (arxiv.org).
  • Open‑source resources: The paper catalogues code and benchmarks (e.g., SRDD, ToolBench, Overcooked‑AI) and highlights open‑source frameworks such as MetaGPT, AgentScope and Swarm (arxiv.org).
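
The survey’s component view of a generative agent (a natural‑language profile, layered memory and a planning step that draws on both) maps naturally onto a small schema. The sketch below is an illustrative reconstruction with placeholder names, not code from the survey; the plan method is a stub where a real agent would prompt an LLM.

    from dataclasses import dataclass, field

    @dataclass
    class Memory:
        # The survey distinguishes sensory, short-term and long-term memory.
        sensory: list = field(default_factory=list)     # raw, very recent observations
        short_term: list = field(default_factory=list)  # working context for the current task
        long_term: list = field(default_factory=list)   # persisted knowledge and experience

    @dataclass
    class GenerativeAgent:
        profile: str                                    # role described in natural language
        memory: Memory = field(default_factory=Memory)
        tools: list = field(default_factory=list)       # tool names exposed by the environment

        def plan(self, goal: str) -> list:
            # Planning: decompose a long-term goal into steps. A real agent would
            # prompt an LLM with its profile, relevant memory and the goal here.
            return [f"gather data for: {goal}", f"analyse and report on: {goal}"]

    analyst = GenerativeAgent(
        profile="You are a cautious financial analyst who cites sources.",
        tools=["web_search", "price_feed"],
    )
    print(analyst.plan("assess Q2 earnings risk"))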

3.2 Multi‑agent reinforcement fine‑tuning (MARFT)

A long technical report (Apr 2025; updated Jun 2025) introduces Multi‑Agent Reinforcement Fine‑Tuning (MARFT) as a new paradigm for improving LLM‑based MAS (arxiv.org). The authors argue that existing multi‑agent reinforcement learning (MARL) methods cannot be directly applied to LLM‑based agents due to asynchronous interactions, profile‑aware designs and heterogeneous architectures (arxiv.org). MARFT proposes a universal algorithmic framework that extends RL techniques to fine‑tune LLM‑based MAS. The framework provides modular components, including action‑level and token‑level fine‑tuning, and presents open‑source implementations (arxiv.org). This work signals a growing interest in combining RL with MAS to optimize complex, interactive tasks.
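
MARFT itself fine‑tunes LLM policies, which is beyond a short excerpt, but the flavor of action‑level credit assignment, in which each agent’s policy is nudged by a shared episode reward, can be illustrated with a toy REINFORCE loop. The example below is a plain‑Python analogy with made‑up actions and rewards, not the MARFT algorithm or its released code.

    import math
    import random

    def softmax(logits):
        m = max(logits)
        exps = [math.exp(l - m) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def sample(probs):
        r, acc = random.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if r <= acc:
                return i
        return len(probs) - 1

    # Two "agents", each a categorical policy over three candidate actions.
    # The joint task succeeds (reward 1.0) only for the combination (0, 2).
    planner, executor = [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]
    learning_rate = 0.5

    for _ in range(2000):
        p_probs, e_probs = softmax(planner), softmax(executor)
        a_p, a_e = sample(p_probs), sample(e_probs)
        reward = 1.0 if (a_p == 0 and a_e == 2) else 0.0

        # REINFORCE at the action level: each agent increases the log-probability
        # of its own action in proportion to the shared reward.
        for logits, probs, action in ((planner, p_probs, a_p), (executor, e_probs, a_e)):
            for i in range(len(logits)):
                grad = (1.0 if i == action else 0.0) - probs[i]
                logits[i] += learning_rate * reward * grad

    print(softmax(planner), softmax(executor))  # mass concentrates on actions 0 and 2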

3.3 Multi‑agent debate research

LLM‑driven debates have become a popular approach for improving reasoning, but recent work questions their efficacy and seeks better decision mechanisms. Significant papers published in 2025 include:

  • Stop Overvaluing Multi‑Agent Debate – We Must Rethink Evaluation and Embrace Model Heterogeneity (Feb 2025; v3 Jun 21 2025). The authors systematically evaluate five representative multi‑agent debate (MAD) methods across nine benchmarks and four foundation models. They find that MAD often fails to outperform simple single‑agent baselines while consuming considerably more computation (arxiv.org). They call for improved evaluation practices and advocate model heterogeneity as a design principle (arxiv.org).
  • Voting or Consensus? Decision‑Making in Multi‑Agent Debate (submitted Feb 26 2025; v3 Jul 15 2025). This paper evaluates seven decision protocols (majority voting, unanimity consensus, etc.). Experiments show that voting protocols improve performance on reasoning tasks by 13.2%, while consensus improves knowledge tasks by 2.8%. Increasing the number of agents helps, but more discussion rounds can hurt performance. The authors propose All‑Agents Drafting and Collective Improvement, which improve performance by up to 7.4% (arxiv.org). A minimal sketch of the two core decision protocols follows this list.
  • Debate4MATH: Multi‑Agent Debate for Fine‑Grained Reasoning in Math (ACL Findings 2025). The authors propose a Fine‑grained Multi‑Agent Debate (FMAD) framework and create MMATH‑Data (46k reasoning steps). They train a multi‑agent debate reward model on this dataset and fine‑tune a specialised MMATH‑LLM, achieving 83.4% accuracy on GSM8K and 45.1% on the MATH dataset, surpassing state‑of‑the‑art methods (aclanthology.org).
  • Diverse Multi‑Agent Debate (DMAD) (ICLR 2025). This method argues that existing MAD suffers from a “fixed mental set”, in which agents use similar reasoning patterns. DMAD encourages agents to adopt distinct reasoning approaches, enabling them to gain insights from different perspectives. Experiments show DMAD outperforms self‑reflection and traditional MAD across benchmarks, often in fewer rounds (openreview.net).
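
The decision protocols compared in the voting‑versus‑consensus study can be made concrete with a short sketch. Given each agent’s final answer after a debate round, majority voting simply picks the most common answer, while a unanimity protocol accepts an answer only when all agents agree and otherwise signals that another round is needed. The protocol names follow the paper; the code is an illustrative reconstruction, not the authors’ implementation.

    from collections import Counter
    from typing import List, Optional

    def majority_vote(answers: List[str]) -> str:
        """Return the most common answer across agents (ties broken arbitrarily)."""
        return Counter(answers).most_common(1)[0][0]

    def unanimity_consensus(answers: List[str]) -> Optional[str]:
        """Accept only if every agent agrees; None means 'debate another round'."""
        return answers[0] if len(set(answers)) == 1 else None

    round_one = ["42", "42", "41"]          # final answers from three debating agents
    print(majority_vote(round_one))         # -> "42"
    print(unanimity_consensus(round_one))   # -> None, so the agents would debate again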

3.4 Domain‑specific multi‑agent applications

  • Inventory management – InvAgent (arXiv 2407.11384v2, Jan 31 2025). Contribution: presents a large‑language‑model‑based multi‑agent system for managing inventory in supply chains. The system leverages LLMs’ zero‑shot capabilities to perform adaptive decision‑making without prior training, provides explainable chain‑of‑thought reasoning and dynamically adapts to variable demand scenarios (arxiv.org). Evaluations across scenarios show improved efficiency and reduced costs (arxiv.org). Evidence: shows that LLM agents can coordinate to make supply‑chain decisions, demonstrating MAS beyond text‑only tasks.
  • Financial analysis – MAS for fundamental analysis (CACM blog, Jul 2025). Contribution: multi‑agent systems break financial analysis into specialised tasks (data extraction, sentiment monitoring, trend analysis, simulation and fraud detection) and then combine the findings into a comprehensive view (cacm.acm.org). The blog describes the benefits (speed, coverage, bias reduction, better forecasting) and notes the need for transparent decisions and careful data management (cacm.acm.org). Evidence: illustrates a practical use of MAS in an industry context.
  • Complex research – Anthropic’s Research system. Contribution: a real‑world example of a MAS that uses a lead agent and sub‑agents to conduct open‑ended web‑based research. It outperforms single‑agent approaches by decomposing tasks and running searches in parallel (anthropic.com). Evidence: highlights commercial adoption of MAS in knowledge work.

3.5 Workshops and conferences

The proliferation of workshops signals growing interest in MAS. The AAAI 2025 Workshop on Advancing LLM‑Based Multi‑Agent Collaboration (March 4 2025) solicits research on multi‑agent collaboration structures, cross‑agent knowledge sharing, inter‑agent communication protocols, distributed agents, group behavior learning, strategic planning and responsible behaviors (multiagents.org). The workshop features papers on sparse communication topologies, cooperative driving via natural language, system‑level direct preference optimization (DPO), reflection mechanisms and evaluation frameworks, illustrating the breadth of MAS research.

4. Challenges and open problems

  • Scalability and token efficiency: Anthropic reports that multi‑agent research systems consume far more tokens than single‑agent chats (about 4× to 15×), making them costly for routine tasks (anthropic.com). Prompt engineering and explicit guidelines for scaling effort to task complexity help mitigate over‑spawning of sub‑agents (anthropic.com).
  • Evaluation difficulties: Multi‑agent systems can follow many valid paths to an answer, making step‑level evaluation hard. Anthropic emphasises flexible evaluation that judges outcomes rather than exact steps (anthropic.com). Research on MAD also reveals inconsistent benchmarks and the need for model heterogeneity (arxiv.org).
  • Communication overhead: Fully connected communication leads to combinatorial explosion and potential privacy issues. LLM‑MAS research aims to optimise communication speed and explore distributed discussion where agents have incomplete information (arxiv.org).
  • Agent coordination and decision protocols: Decision protocols (voting vs consensus) significantly affect performance (arxiv.org). Designing protocols that balance discussion depth, diversity and efficiency remains open.
  • Responsible behavior and guardrails: Workshops emphasise the need for guardrails and responsible behaviors to prevent misuse (multiagents.org). In finance and other domains, ensuring transparency and handling sensitive data are critical (cacm.acm.org).
  • Integration with reinforcement learning: MARFT highlights the need for fine‑tuning LLM‑based MAS using RL, yet challenges such as asynchronous interactions and sample inefficiency persist (arxiv.org).

5. Outlook and future directions

  1. Scaling agents and foundation models: The MAS 2025 workshop suggests that scaling the number of agents could unlock new capabilities analogous to scaling model parameters (mas-2025.github.io). Research will investigate how many agents are needed for a task and how to manage their interactions effectively.
  2. Better frameworks and evaluation: The OpenAI Agents SDK and other frameworks will likely mature, offering more modular and secure ways to orchestrate MAS. Evaluation techniques that account for diverse agent behaviors, heterogeneity and real‑world constraints (e.g., cost, latency) will be crucial (arxiv.org).
  3. Cross‑domain applications: MAS are expanding beyond text‑based tasks into domains like robotics, programming, supply‑chain management, finance and healthcare. Domain‑specific benchmarks (e.g., MMATH‑Data, aclanthology.org) will help quantify progress.
  4. Human‑agent collaboration: Workshops call for research on human‑agent interaction and trust (mas-2025.github.io). Future systems may allow human experts to supervise and interact with multi‑agent teams seamlessly.
  5. Ethics and safety: As MAS grow more autonomous, ensuring responsible behavior, preventing emergent harmful coordination and safeguarding sensitive data will become central to research agendas (cacm.acm.org).

Conclusion

The past six months have witnessed rapid progress in multi‑agent research systems. Commercial deployments like Anthropic’s Research feature show the feasibility of orchestrated agent teams for complex tasks (anthropic.com), while open‑source frameworks such as Swarm and the Agents SDK democratise MAS experimentation (analyticsvidhya.com). Surveys highlight the core components of LLM‑based MAS and catalogue the growing ecosystem of frameworks and benchmarks (arxiv.org). At the same time, researchers are developing new paradigms (e.g., MARFT), debate strategies and domain‑specific systems, even as they identify challenges in evaluation, communication and safety. The synergy between foundation models and multi‑agent collaboration points toward a future where teams of intelligent agents could tackle tasks that exceed individual capabilities. Continued progress will depend on scalable orchestration frameworks, robust evaluation methods, ethical safeguards and integration with reinforcement learning and other AI disciplines.
