The era of not-so-large language models

Facundo Santiago
4 min read · Nov 29, 2023

In a previous post titled “It wasn’t Magic. How NLP went from Text Mining to the Era of Large Language Models,” I talked about how the Natural Language Processing (NLP) field moved to what we today call Large Language Models, or LLMs. Today, I want to challenge the word “large” and explain why, even when size does matter, it’s not the only thing that counts.

“Like in David vs Goliath, the underdog often surprises everyone with unexpected strength, resilience, and determination against all odds.”

In the rapidly evolving domain of NLP, the race towards higher model performance often translates into scaling up model parameters. For instance, when GPT-3 was released, it was an order of magnitude bigger than any previous language model. Its release caused a discontinuity in model size that has been sustained by the incentive to outperform GPT-3 by making models even bigger.

Number of parameters and year of release of different models. The number of parameters is shown on a logarithmic scale. Source: own.

However, this comes at a cost: scaling increases computational cost and inference latency, raising barriers to deployment in practical, real-world scenarios.

Those deployment barriers are not trivial. Patterson et al. find that around 90% of total ML compute is spent on inference, so the search for balanced models that deliver both high performance and efficiency has become essential and the incentive behind more…


Written by Facundo Santiago

Product Manager @ Microsoft AI. Graduate adjunct professor at University of Buenos Aires. Frustrated sociologist.