Microsoft launched the next version of its lightweight AI model Phi-3 Mini, the first of three small models the company plans to release.
Phi-3 Mini measures 3.8 billion parameters and is trained on a data set that is small relative to large language models like GPT-4. It is now available on Azure, Hugging Face, and Ollama. Microsoft plans to follow it with Phi-3 Small (7B parameters) and Phi-3 Medium (14B parameters). Parameters are the values a model learns during training; the count is a rough measure of how complex the patterns a model can capture.
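For readers who want to try the model themselves, a minimal sketch of loading it through the Hugging Face `transformers` library might look like the following. The `microsoft/Phi-3-mini-4k-instruct` model ID and the generation settings are assumptions for illustration, not details confirmed by this article.

```python
# Minimal sketch: running Phi-3 Mini locally via Hugging Face transformers.
# The model ID below is an assumed Hugging Face identifier, not one cited in the article.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain in one sentence what a language model parameter is."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```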
The company released Phi-2 in December, which performed just as well as bigger models like Llama 2. Microsoft says Phi-3 performs better than the previous version and can provide responses close to those of a model 10 times its size.
Eric Boyd, corporate vice president of Microsoft Azure AI Platform, tells The Verge Phi-3 Mini is as capable as LLMs like GPT-3.5 “just in a smaller form factor.”
Compared to their larger counterparts, small AI models are often cheaper to run and perform better on personal devices like phones and laptops. The Information reported earlier this year that Microsoft was building a team focused specifically on lighter-weight AI models. Along with Phi, the company has also built Orca-Math, a model focused on solving math problems.
Microsoft’s competitors have their own small AI models as well, most of which target simpler tasks like document summarization or coding assistance. Google’s Gemma 2B and 7B are good for simple chatbots and language-related work. Anthropic’s Claude 3 Haiku can read dense research papers with graphs and summarize them quickly, while the recently released Llama 3 8B from Meta may be used for some chatbots and for coding assistance.
Boyd says developers trained Phi-3 with a “curriculum.” They were inspired by how children learn from bedtime stories: books that use simpler words and sentence structures to talk about larger topics.
“There aren’t enough children’s books out there, so we took a list of more than 3,000 words and asked an LLM to make ‘children’s books’ to teach Phi,” Boyd says.
He added that Phi-3 simply built on what previous iterations learned. While Phi-1 focused on coding and Phi-2 began to learn to reason, Phi-3 is better at both coding and reasoning. The Phi-3 family of models has some general knowledge, but it cannot beat GPT-4 or another LLM in breadth; there is a big difference in the kind of answers you can get from an LLM trained on the entirety of the internet versus a smaller model like Phi-3.
Boyd says that companies often find that smaller models like Phi-3 work better for their custom applications since, for a lot of companies, their internal data sets are going to be on the smaller side anyway. And because these models use less computing power, they are often far more affordable.