Size certainly matters when it comes to large language models (LLMs) as it impacts where a model can run.
Stability AI, the vendor that is perhaps best known for its stable diffusion text to image generative AI technology, today released one of its smallest models yet, with the debut of Stable LM 2 1.6B. Stable LM is a text content generation LLM that Stability AI first launched in April 2023 with both 3 billion and 7 billion parameter models. The new StableLM model is actually the second model released in 2024 by Stability AI, following the company’s Stable Code 3B launched earlier this week.
The new compact yet powerful Stable LM model aims to lower barriers and enable more developers to participate in the generative AI ecosystem incorporating multilingual data in seven languages – English, Spanish, German, Italian, French, Portuguese, and Dutch. The model utilizes recent algorithmic advancements in language modeling to strike what Stability AI hopes is an optimal balance between speed and performance.
“In general, larger models trained on similar data with a similar training recipe tend to do better than smaller ones,” Carlos Riquelme, Head of the Language Team at Stability AI told VentureBeat. ” However, over time, as new models get to implement better algorithms and are trained on more and higher quality data, we sometimes witness recent smaller models outperforming older larger ones.”
Why smaller is better (this time) with Stable LM
According to Stability AI, the model outperforms other small language models with under 2 billion parameters on most benchmarks, including Microsoft’s Phi-2 (2.7B), TinyLlama 1.1B,and Falcon 1B.
The new smaller Stable LM is even able to surpass some larger models, including Stability AI’s own earlier Stable LM 3B model.
“Stable LM 2 1.6B performs better than some larger models that were trained a few months ago,” Riquelme said. “If you think about computers, televisions or microchips, we could roughly see a similar trend, they got smaller, thinner and better over time.”
To be clear, the smaller Stable LM 2 1.6B does have some drawbacks due to its size. Stability AI in its release for the new model cautions that,”… due to the nature of small, low-capacity language models, Stable LM 2 1.6B may similarly exhibit common issues such as high hallucination rates or potential toxic language.”
Transparency and more data are core to the new model release
The more toward smaller more powerful LLM options is one that Stability AI has been on for the last few months.
In December 2023, the StableLM Zephyr 3B model was released, providing more performance to StableLM with a smaller size than the initial iteration back in April.
Riquelme explained that the new Stable LM 2 models are trained on more data, including multilingual documents in 6 languages in addition to English (Spanish, German, Italian, French, Portuguese and Dutch). Another interesting aspect highlighted by Riquelme is the order in which data is shown to the model during training. He noted that it may pay off to focus on different types of data during different training stages.
Going a step further, Stability AI is making the new models available in with pre-trained and fine-tuned options as well as a format that the researchers describe as , “…the last model checkpoint before the pre-training cooldown.”
“Our goal here is to provide more tools and artifacts for individual developers to innovate, transform and build on top of our current model,” Riquelme said. “Here we are providing a specific half-cooked model for people to play with.”
Riquelme explained that during training, the model gets sequentially updated and its performance increases. In that scenario, the very first model knows nothing, while the last one has consumed and hopefully learned most aspects of the data. At the same time, Riquelme said that models may become less malleable towards the end of their training as they are forced to wrap up learning.
“We decided to provide the model in its current form right before we started the last stage of training, so that –hopefully– it’s easier to specialize it to other tasks or datasets people may want to use,” he said. “We are not sure if this will work well, but we really believe in people’s ability to leverage new tools and models in awesome and surprising ways.”