The era of not-so-large language models
In a previous post titled “It wasn’t Magic. How NLP went from Text Mining to the Era of Large Language Models”, I talked about how the Natural Language Processing (NLP) field moved to what we today call Large Language Models, or LLMs. Today, I want to challenge the word “large” and explain why, even though size does matter, it’s not the only thing that does.
In the rapidly evolving domain of NLP, the race towards higher model performance often translates into scaling up model parameters. For instance, when GPT-3 was released, it was an order of magnitude bigger than any previous language model. Its release marked a discontinuity in model size, one that was sustained by the incentive to outperform GPT-3 by building even bigger models.
However, this scaling comes at a cost: it increases computational expense and inference latency, thereby raising barriers to deployment in practical, real-world scenarios.
Those deployment barriers are not trivial. Patterson et al. find that around 90% of total ML compute is spent on inference, so the search for balanced models that deliver both high-level performance and efficiency has become essential and the incentive behind more…