Boosting LLM Performance with the "More Agents" Method

Large language models (LLMs) like GPT-3 and ChatGPT have shown remarkable capabilities across applications such as language generation, understanding, reasoning, and coding. However, they can still struggle with complex tasks. Researchers have explored various methods to enhance LLM performance, such as:

  • Ensembling: Combining the outputs from multiple LLMs

  • Multiple Agent Collaboration: Having different LLM "agents" interact and work together

The paper "More Agents Is All You Need" proposes a much simpler approach the authors call the "More Agents" method. The key idea is that you can boost an LLM's performance on difficult tasks just by instantiating more agents (copies) of the same LLM and taking a majority vote over their outputs. Surprisingly, the authors show this simple method can match or outperform far more complicated techniques across a wide range of tasks.

The "More Agents" Method

The approach has two basic phases:

  1. Sampling: Generate N samples (outputs) by querying the same LLM N times with the task input. This could just be N copies of the raw LLM, or N runs of the LLM integrated with another method like chain-of-thought prompting.

  2. Voting: Calculate the similarity between each pair of samples, such as string matching on the extracted answer for classification tasks or BLEU score for generation tasks. The sample with the highest cumulative similarity to the others is chosen as the final output (a minimal code sketch of both phases follows this list).
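Here is a minimal sketch of the two phases in Python. The `query_llm` callable is a placeholder for however you invoke your model (it is not part of the paper), and exact string matching stands in for the similarity measure; a generation task would swap in something like BLEU.

```python
from collections import defaultdict


def similarity(a: str, b: str) -> float:
    """Exact-match similarity; a generation task might use BLEU instead."""
    return 1.0 if a.strip() == b.strip() else 0.0


def sample_and_vote(query_llm, prompt: str, n: int = 10) -> str:
    # Phase 1: sampling -- query the same model n times with the same input.
    samples = [query_llm(prompt) for _ in range(n)]

    # Phase 2: voting -- score each sample by its cumulative similarity
    # to every other sample.
    scores = defaultdict(float)
    for i, a in enumerate(samples):
        for j, b in enumerate(samples):
            if i != j:
                scores[i] += similarity(a, b)

    # Return the sample that agrees most with the rest.
    return samples[max(scores, key=scores.get)]
```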

For example, let's say we want an LLM to solve the arithmetic word problem: "If John had 5 apples and Mary gave him 3 more, how many apples does John have now?"

In the sampling phase, we query the LLM N times and extract potential answers like:

  • Sample 1: "John has 8 apples now."

  • Sample 2: "The total number of apples John has is 7."

  • Sample 3: "John now has 8 apples."

In the voting phase, we calculate similarities like:

  • sim(Sample 1, Sample 2) = 0

  • sim(Sample 1, Sample 3) = 1

  • sim(Sample 2, Sample 3) = 0

Sample 1 and Sample 3 each have a cumulative similarity of 1, while Sample 2 has 0, so we choose the sample with the highest cumulative similarity, "John has 8 apples now.", as the final answer. (The tie between Samples 1 and 3 is harmless, since both carry the same answer of 8 apples.)
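To make the voting arithmetic concrete, here is how the worked example plays out in code. Note that Samples 1 and 3 count as similar because they share the same extracted answer, even though the raw strings differ; the regex-based answer extraction below is an illustrative assumption, and real parsing depends on the task format.

```python
import re

samples = [
    "John has 8 apples now.",
    "The total number of apples John has is 7.",
    "John now has 8 apples.",
]


def extract_answer(text: str) -> str:
    """Pull the last number out of a free-form response."""
    numbers = re.findall(r"\d+", text)
    return numbers[-1] if numbers else ""


answers = [extract_answer(s) for s in samples]  # ['8', '7', '8']

# Cumulative similarity of each sample to all the others.
scores = [
    sum(a == b for j, b in enumerate(answers) if j != i)
    for i, a in enumerate(answers)
]
print(scores)                               # [1, 0, 1]
print(samples[scores.index(max(scores))])   # "John has 8 apples now."
```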

Key Findings

The researchers conducted extensive experiments evaluating the "More Agents" approach across a wide range of tasks and LLM models. Some key findings:

  1. Superior Performance: Simply scaling up the number of agents can outperform much more complicated state-of-the-art methods aimed at boosting LLM capabilities.

  2. Combines With Other Methods: The "More Agents" approach is orthogonal and complementary to other techniques like chain-of-thought prompting and multi-agent collaboration frameworks. Combining it with these other methods can lead to further performance gains.

  3. Smaller LLMs Can Beat Larger Ones: With enough agents, a smaller model like Llama2-13B can match or exceed the performance of much larger models like Llama2-70B or GPT-3.5 on many tasks.

  4. Works Better on Harder Tasks: The performance boost from using more agents is more pronounced on inherently harder problems or those requiring longer chains of reasoning.

  5. Robust: The approach consistently improves performance across different prompting techniques, tasks, model scales, and hyperparameter settings.

The researchers also analyzed what properties of a task influence the effectiveness of the "More Agents" approach. They found several key factors:

  • Inherent task difficulty: Gains first increase then decrease as inherent difficulty rises

  • Number of reasoning steps: Gains increase with more steps required

  • Prior probability of correct answer: Gains increase as the prior probability gets lower

Based on these findings, they propose optimizations such as hierarchical sampling-and-voting and step-wise sampling-and-voting to further boost performance.

Implications

The simplicity and strong empirical results of the "More Agents" method have several useful implications:

  1. It provides a simple baseline to compare against more sophisticated LLM boosting techniques. If a complicated method can't outperform just instantiating more raw agents, it may not be worth the added complexity.

  2. It enables boosting LLM performance in a very straightforward way without costly model retraining or designing complex prompts or architectures.

  3. Combined with other techniques, it could help push the boundaries of what current LLMs can achieve on very challenging tasks.

  4. Smaller LLMs could potentially match expensive larger models by just instantiating enough agents, providing a compelling cost/performance tradeoff.

Of course, the obvious downside is cost: querying N agents multiplies inference compute by N. The researchers note this mirrors the costs seen in many multi-agent and ensembling methods, and they leave improved efficiency as future work.

Overall, this "More Agents is All You Need" paper makes a compelling case that one of the most powerful ways to enhance LLM capabilities may simply be instantiating more agents of the same model and doing majority voting. For technical teams looking to boost performance on difficult LLM tasks, incorporating this straightforward method could be a simple yet effective starting point, especially when combined with other techniques. The approach and analysis also provide valuable insights into general properties influencing the behavior of large language models.
