The era of not-so-large language models
In a previous post titled “It wasn’t Magic. How NLP went from Text Mining to the Era of Large Language Models”, I talked about how the Natural Language Processing (NLP) field moved to what we today call Large Language Models, or LLMs. Today, I want to challenge the word “large” and explain why, even though size does matter, it’s not the only thing that does.
In the rapidly evolving domain of NLP, the race towards higher model performance often translates into scaling up model parameters. For instance, when GPT-3 was released, it was an order of magnitude bigger than any previous language model. Its release marked a discontinuity in model size, one that was sustained by the incentive to outperform GPT-3 by building even bigger models.
However, this scaling comes at a cost: it increases computational expense and inference latency, thereby raising barriers to deployment in practical, real-world scenarios.
Those deployment barriers are not trivial. Patterson et al. find that around 90% of total ML compute is spent on inference, so the search for balanced models that deliver both high-level performance and efficiency has become essential and the incentive behind more…