The Debate on Emergent Abilities in Large Language Models
Breakthrough Behavior or Gradual Improvement?
As large language models (LLMs) like GPT-3 and LaMDA scale up, their performance on many tasks improves. Some researchers have observed “breakthrough” behavior, where a model’s abilities seem to jump suddenly once it crosses certain parameter thresholds. The phenomenon has been likened to phase transitions in physics, such as water freezing into ice. However, a group of researchers at Stanford University argues that these apparent jumps in ability may be an artifact of the chosen performance metrics rather than of the models’ inner workings.
The Case of Three-Digit Addition
In a 2022 study, researchers reported that GPT-3 and LaMDA failed to correctly complete three-digit addition problems until they reached 13 billion and 68 billion parameters, respectively, suggesting that the ability to add emerges at a certain threshold. However, the Stanford researchers point out that the LLMs were judged only on exact accuracy: a model had to get the answer perfectly right to receive any credit. They argue that this metric ignores partial correctness. For example, if you’re calculating 100 plus 278, an answer of 376 seems far more accurate than −9.34, yet both count as equally wrong.
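To make that concrete, here is a minimal sketch of exact-match scoring (the function name and example values are illustrative, not taken from the 2022 study): under this metric, a near-miss and a nonsense answer receive the same score of zero.

```python
def exact_match(prediction: str, target: str) -> float:
    """Score 1.0 only if the model's answer matches the target exactly."""
    return 1.0 if prediction.strip() == target.strip() else 0.0

# 100 + 278 = 378: a near-miss and a wildly wrong answer both score zero.
print(exact_match("376", "378"))    # 0.0 -- off by two, no credit
print(exact_match("-9.34", "378"))  # 0.0 -- nonsense, same score
print(exact_match("378", "378"))    # 1.0 -- only a perfect answer passes
```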
A New Perspective on Measuring LLM Performance
The Stanford team, led by graduate students Rylan Schaeffer and Brando Miranda, along with professor Sanmi Koyejo, proposed using metrics that award partial credit for each correctly predicted digit in the addition problems. With this approach, they found that as parameters increased, the LLMs predicted an increasingly correct sequence of digits, suggesting that the ability to add develops gradually and predictably rather than emerging suddenly.
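The sketch below illustrates the idea of per-digit partial credit in simplified form (the paper’s actual metrics, such as token edit distance, differ in detail; `digit_accuracy` is a hypothetical helper, not the team’s implementation). A metric like this rises gradually as more digits of the answer come out right, rather than flipping from zero to one.

```python
def digit_accuracy(prediction: str, target: str) -> float:
    """Fraction of answer positions where the predicted character matches the target."""
    pred, tgt = prediction.strip(), target.strip()
    if not tgt:
        return 0.0
    correct = sum(1 for p, t in zip(pred, tgt) if p == t)
    return correct / len(tgt)

# Under this smoother metric, the near-miss earns credit for its two correct
# digits, while the nonsense answer still scores zero.
print(digit_accuracy("376", "378"))    # ~0.67 -- two of three digits correct
print(digit_accuracy("-9.34", "378"))  # 0.0
print(digit_accuracy("378", "378"))    # 1.0
```

Plotted against model size, a smooth metric like this tends to show steady improvement where exact-match accuracy shows an apparent jump.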
The Ongoing Debate and Future Implications
While the Stanford team’s work offers a new perspective on emergent abilities in LLMs, some researchers argue that it doesn’t fully dispel the notion of emergence. Tianshi Li, a computer scientist at Northeastern University, notes that the paper doesn’t explain how to predict which metrics will show abrupt improvement in an LLM. Others, like Jason Wei from OpenAI, maintain that the earlier reports of emergence were sound because, for abilities like arithmetic, the correct answer is what matters most.
As LLMs continue to grow in size and complexity, it’s likely that emergence will become more difficult to explain away. Alex Tamkin from the AI startup Anthropic suggests that the community should use this debate as a jumping-off point to emphasize the importance of building a science of prediction for these models. Understanding how LLMs behave and develop new abilities is crucial as these technologies become more widely applicable.
When we grow LLMs to the next level, inevitably they will borrow knowledge from other tasks and other models.
– Xia “Ben” Hu, computer scientist at Rice University
6 Comments
Since when did we start equating fancy algorithms to AI suddenly gaining a mind of its own?
So, suddenly these AI models are just stumbling into sentience, or what?
Oh, so now we’re pretending AI just magically becomes super intelligent overnight?
Emergent abilities or just fancy coding? The line’s getting blurrier by the day, huh?
Are we truly witnessing AI breakthroughs, or just dressing up old tech in new metaphors?
Emergent abilities in AI, or are we just getting bamboozled by complex programming tricks?