Meta introduces first multimodal Llama models
Meta has released Llama 3.2. The family includes 11 and 90 billion-parameter multimodal vision LLMs and two smaller text-only models of 1 and 3 billion parameters, aimed mainly at on-device and edge applications.
Meta says the two largest Llama 3.2 models, at 11 and 90 billion parameters, are suited to advanced image interpretation. They are also the social media and tech giant’s first multimodal LLMs. They offer document-level understanding of maps and graphs, can convert images to text, and can handle vision tasks such as pinpointing objects in an image in response to plain-language questions (for example, “How far is the pot from this kettle?”).
According to Meta, these LLMs can also bridge the gap between image and language: they extract details from an image, understand the scene, and then generate a sentence that can serve as a caption telling the image’s story.
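To make this concrete, the sketch below asks the 11 billion-parameter vision model for a one-sentence caption. It is a minimal sketch, assuming the Hugging Face transformers Mllama integration and access to the gated meta-llama/Llama-3.2-11B-Vision-Instruct checkpoint; the image URL is a placeholder, and Meta’s own tooling may expose this differently.

```python
# Minimal sketch: captioning an image with Llama 3.2 11B Vision Instruct.
# Assumes the Hugging Face transformers Mllama integration and access to the
# gated meta-llama checkpoint; the image URL is a placeholder.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open(requests.get("https://example.com/kitchen.jpg", stream=True).raw)

# One user turn containing both the image and a plain-language request.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Write a one-sentence caption that tells the story of this image."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(output[0], skip_special_tokens=True))
```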
LLM versions for on-device applications
The small 1 and 3 billion-parameter LLMs offer solid multilingual text generation and ‘tool calling’ functionality, allowing developers to build on-device and edge apps with strong privacy guarantees in which data never leaves the device.
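As a hedged illustration of what tool calling can look like with one of these small models, the sketch below assumes the 3B Instruct model is served behind a local, OpenAI-compatible endpoint; the URL, model name, and the get_calendar_events tool are placeholders, not part of Meta’s announcement.

```python
# Sketch of tool calling against a locally served Llama 3.2 3B Instruct model.
# Assumes an OpenAI-compatible server at localhost:8000; the endpoint, model
# name, and the get_calendar_events tool are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_calendar_events",
        "description": "List the user's calendar events for a given date.",
        "parameters": {
            "type": "object",
            "properties": {"date": {"type": "string", "description": "ISO date, e.g. 2024-09-26"}},
            "required": ["date"],
        },
    },
}]

response = client.chat.completions.create(
    model="llama-3.2-3b-instruct",
    messages=[{"role": "user", "content": "What is on my calendar tomorrow?"}],
    tools=tools,
)

# If the model decides to use the tool, it returns the function name and
# arguments instead of a plain text answer; the app executes the call locally.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```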
Meta sees two benefits to this on-device approach. First, users experience responses to their prompts as more ‘immediate’, because processing takes place locally on the device. Second, running the processes locally better protects privacy: actions for messages or calendar activities, for example, are not sent to the cloud, keeping the operation of the app in question private.
Such an app can keep track of which queries should be handled locally on the device and which, if any, should be forwarded to the cloud for processing by a larger LLM; a simple routing sketch follows below. The 1 and 3 billion-parameter LLMs are optimized for Qualcomm and MediaTek hardware as well as Arm processors, according to Meta.
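The routing logic itself can be very simple. The sketch below is purely illustrative: run_on_device and send_to_cloud are hypothetical helpers, and the heuristic (keep private or short prompts local, escalate long analytical ones) is just one possible policy, not something Meta prescribes.

```python
# Illustrative sketch of routing prompts between an on-device model and a
# larger cloud model. run_on_device() and send_to_cloud() are hypothetical
# helpers standing in for a local 1B/3B runtime and a hosted 90B endpoint.

PRIVATE_KEYWORDS = ("message", "calendar", "contact", "reminder")

def is_private(prompt: str) -> bool:
    """Treat anything touching personal data as local-only."""
    return any(word in prompt.lower() for word in PRIVATE_KEYWORDS)

def route(prompt: str) -> str:
    # Keep private or short requests on the device; escalate long,
    # analytical prompts to the larger cloud-hosted model.
    if is_private(prompt) or len(prompt) < 200:
        return run_on_device(prompt)   # e.g. Llama 3.2 1B/3B running locally
    return send_to_cloud(prompt)       # e.g. Llama 3.2 90B behind a cloud API

def run_on_device(prompt: str) -> str:
    raise NotImplementedError("wrap your local runtime here")

def send_to_cloud(prompt: str) -> str:
    raise NotImplementedError("wrap your cloud API call here")
```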
Llama Stack distributions
In addition to the models, Meta introduced the first Llama Stack distributions. These should simplify and improve developers’ access to the Llama LLMs in different environments, including single-node, on-premises, cloud, and on-device environments.
Components of Llama Stack include the Llama CLI for building, configuring, and running Llama Stack distributions; client code in multiple programming languages, such as Python and Node.js; and Docker containers for the Llama Stack Distribution Server and the Agents API Provider.
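For a feel of the client side, the sketch below shows roughly how the Python client might be used against a locally running Llama Stack Distribution Server. It is an assumption-laden sketch: the llama-stack-client package, the port, the model name, and the exact method and field names reflect early Llama Stack releases and may have changed.

```python
# Rough sketch of calling a locally running Llama Stack Distribution Server
# with the Python client. Package name, port, model name, and method
# signatures are assumptions based on early llama-stack-client releases.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model="Llama3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Summarize Llama Stack in one sentence."}],
)
print(response.completion_message.content)
```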
Multiple distributions have also been released, including a single-node Llama Stack Distribution via internal Meta deployment and Ollama; cloud-based Llama Stack distributions from AWS, Databricks, Fireworks, and Together; an on-device distribution on iOS via PyTorch ExecuTorch; and a Dell-supported on-premises Llama Stack Distribution.
Azure and Google Cloud availability
Furthermore, Meta’s various Llama 3.2 versions are now available via Microsoft Azure and Google Cloud. The Llama offerings on Azure include Llama 3.2 1B, Llama 3.2 3B, Llama 3.2 1B Instruct, Llama 3.2 3B Instruct, Llama Guard 3 1B, Llama 3.2 11B Vision Instruct, Llama 3.2 90B Vision Instruct, and Llama Guard 3 11B Vision.
The Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct LLMs are also now available in the Azure AI Model Catalog.
Google Cloud offers all four Llama 3.2 LLMs in Vertex AI Model Garden. Only the Llama 3.2 90B LLM is currently available in preview through Google’s Model-as-a-Service (MaaS) product.