Meta Releases Llama 3.1 405B Open AI Model
Meta has released Llama 3.1 405B, its newest and most powerful open-source AI model, with 405 billion parameters.
That makes it the largest AI model Meta has released in recent years.
Trained on 16,000 Nvidia H100 GPUs, Llama 3.1 405B uses newer training methods that improve its efficiency and, according to Meta, make it competitive with models such as OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet.
Llama 3.1 can be downloaded or run on cloud platforms such as AWS, Azure, and Google Cloud, and the company describes it as a “carefully balance[d]” model.
It is also being integrated into Meta’s products, including WhatsApp and Meta.ai, where it powers new AI features.
It can write code, solve math problems, and summarize documents in eight languages; for now, however, the model handles text only.
To train Llama 3.1, Meta used a dataset of 15 trillion tokens, reported as equivalent to 750 billion words, and improved its data selection and data quality mechanisms.
Synthetic data generated by other AI models was also used to further train Llama 3.1, a practice now common among prominent AI vendors despite the risk of bias it carries.
“The training data, in many ways, is sort of like the secret recipe and the sauce that goes into building these models, and so from our perspective, we’ve invested a lot in this. And it is going to be one of these things where we will continue to refine it,” Ragavan Srinivasan, VP of AI program management at Meta, told TechCrunch.
One notable feature of Llama 3.1 is its larger context window of 128,000 tokens, which lets it take in longer text inputs and summarize them more effectively. That is a sixteenfold increase over previous models, whose context window topped out at 8,000 tokens.
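To make the two window sizes concrete, the sketch below estimates whether a document of a given word count would fit in an 8,000-token or a 128,000-token context window. The ratio of 1.3 tokens per word is a rough rule of thumb assumed here for illustration, not a figure from Meta; real counts depend on the tokenizer and the text. All function and constant names are hypothetical.

```python
# Assumption: about 130 tokens per 100 English words (1.3 tokens per word).
# This is a rule of thumb, not a property of Llama's tokenizer.
TOKENS_PER_100_WORDS = 130

OLD_WINDOW = 8_000      # context window of earlier models, per the article
NEW_WINDOW = 128_000    # Llama 3.1 context window, per the article


def estimated_tokens(word_count: int) -> int:
    """Rough token estimate for a text of `word_count` words, rounded up."""
    return (word_count * TOKENS_PER_100_WORDS + 99) // 100


def fits_in_context(word_count: int, context_window: int,
                    reserved_for_output: int = 0) -> bool:
    """True if the estimated prompt plus reserved output tokens fit the window."""
    return estimated_tokens(word_count) + reserved_for_output <= context_window


if __name__ == "__main__":
    print("Window growth factor:", NEW_WINDOW // OLD_WINDOW)
    for words in (5_000, 50_000, 120_000):
        print(words, "words:",
              "fits 8K" if fits_in_context(words, OLD_WINDOW) else "exceeds 8K",
              "|",
              "fits 128K" if fits_in_context(words, NEW_WINDOW) else "exceeds 128K")
```

Under this assumed ratio, a 50,000-word report overflows the older window but fits comfortably in the new one, which is the practical difference for long-document summarization.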
Llama 3.1 also comes in smaller versions, Llama 3.1 8B and Llama 3.1 70B, which are suitable for most use cases, including chatbots and code generation.
These models are trained to use third-party tools and APIs to complete tasks, which makes them more flexible.
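Tool use generally means the model emits a structured request and the surrounding application runs the matching function and returns the result. The sketch below shows only that application-side dispatch step. The JSON shape ("name" plus "arguments"), the tool registry, and both example tools are assumptions made for illustration; they are not Llama 3.1's actual tool-call format or any Meta API.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical tools the application chooses to expose to the model.
def add(a: float, b: float) -> float:
    return a + b


def word_count(text: str) -> int:
    return len(text.split())


TOOLS: Dict[str, Callable[..., Any]] = {"add": add, "word_count": word_count}


def dispatch_tool_call(raw_call: str) -> str:
    """Parse a JSON tool request and return a JSON reply to pass back to the model."""
    try:
        call = json.loads(raw_call)
        name = call["name"]
        arguments = call.get("arguments", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as exc:
        return json.dumps({"error": f"malformed tool call: {exc}"})

    tool = TOOLS.get(name)
    if tool is None:
        return json.dumps({"error": f"unknown tool: {name}"})

    try:
        result = tool(**arguments)
    except TypeError as exc:  # wrong or missing arguments
        return json.dumps({"error": f"bad arguments: {exc}"})
    return json.dumps({"name": name, "result": result})


if __name__ == "__main__":
    # Stand-in for text a model might emit when it decides to call a tool.
    print(dispatch_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}'))
    print(dispatch_tool_call('{"name": "weather", "arguments": {}}'))
```

Returning errors as data rather than raising lets the application feed the failure back to the model so it can retry with corrected arguments.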
Meta has also updated Llama’s license to allow developers to use the model’s outputs to build third-party generative AI models.
The move is intended to encourage an open environment for innovation and collaboration with others in the AI field.
Despite its advancements, Llama 3.1 405B does have limitations. It excels at coding and generating plots but lags behind GPT-4o and Claude 3.5 Sonnet in multilingual capabilities and general reasoning.
Additionally, the large size of the model requires substantial computational resources and power, making it more suitable for model distillation and generating synthetic data than for general deployment or personal use.
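A back-of-the-envelope calculation shows why the 405B model is hard to run on ordinary hardware. The sketch below computes the memory needed just to hold the weights at different numeric precisions and the minimum number of GPUs that implies. It assumes the nominal parameter counts from the model names, an 80 GB accelerator (the common H100 configuration), and decimal gigabytes; it ignores activations, the KV cache, and framework overhead, so real requirements are higher. The function names are illustrative.

```python
# Nominal parameter counts taken from the model names (actual counts differ slightly).
PARAMS = {
    "Llama 3.1 8B": 8_000_000_000,
    "Llama 3.1 70B": 70_000_000_000,
    "Llama 3.1 405B": 405_000_000_000,
}

GPU_MEMORY_BYTES = 80 * 10**9  # assumption: one 80 GB accelerator, decimal GB


def weight_bytes(num_params: int, bits_per_param: int) -> int:
    """Bytes needed to store the weights alone at the given precision."""
    return num_params * bits_per_param // 8


def min_gpus_for_weights(num_params: int, bits_per_param: int,
                         gpu_memory_bytes: int = GPU_MEMORY_BYTES) -> int:
    """Lower bound on GPU count: weights only, rounded up."""
    needed = weight_bytes(num_params, bits_per_param)
    return -(-needed // gpu_memory_bytes)  # ceiling division


if __name__ == "__main__":
    for name, n in PARAMS.items():
        for bits in (16, 8, 4):
            gb = weight_bytes(n, bits) / 10**9
            gpus = min_gpus_for_weights(n, bits)
            print(f"{name} at {bits}-bit: {gb:.1f} GB of weights, at least {gpus} GPU(s)")
```

At 16-bit precision the 405B weights alone come to 810 GB, more than ten 80 GB GPUs before any working memory is counted, while the 8B model's weights fit on a single card.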
Meta's release of Llama 3.1 405B fits its broader goal of becoming a leading player in the market for generative AI and large language models (LLMs).
By making powerful AI models and tools freely available to the general public and developers, Meta aims to build a robust ecosystem and incorporate community-driven improvements into future models.