Llama 3
Introduction
Meta recently announced the release of the Llama 3 family of large language models (LLMs), marking a major leap forward in open-source AI technology. The Llama 3 collection includes pretrained and instruction-tuned models with 8B and 70B parameters, demonstrating state-of-the-art performance on a wide range of industry benchmarks.
Performance Numbers: The Benchmark Beater
Meta's Llama 3 70B beats current state-of-the-art models on benchmarks such as MMLU, MATH, and HumanEval by significant margins.
Key highlights of the Llama 3 release include:
- Improved model architecture: Llama 3 uses a tokenizer with a 128K-token vocabulary and incorporates grouped query attention (GQA) for enhanced inference efficiency (see the GQA sketch after this list).
- Extensive pretraining data: The models were pretrained on over 15T tokens from publicly available sources, including a significant amount of high-quality non-English data covering more than 30 languages.
- Scaled-up pretraining: Meta developed detailed scaling laws for downstream benchmark evaluations, enabling optimal data mix selection and informed training compute allocation (a toy scaling-law fit follows this list).
- Advanced instruction fine-tuning: The post-training approach combines supervised fine-tuning, rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO) to fully unlock the potential of the pretrained models in chat use cases (a minimal DPO loss sketch appears below).
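To make the GQA point concrete, here is a minimal PyTorch sketch. This is not Meta's implementation; the function name and tensor shapes are our own, and it only illustrates the core idea: several query heads share a single key/value head, shrinking the KV cache and speeding up inference.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, n_kv_heads):
    """q: (batch, n_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim).
    Each group of n_heads // n_kv_heads query heads shares one K/V head."""
    batch, n_heads, seq, head_dim = q.shape
    group = n_heads // n_kv_heads
    # Repeat the K/V heads so each query head lines up with its shared K/V head.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    return F.softmax(scores, dim=-1) @ v

# Example: 32 query heads sharing 8 K/V heads (reportedly the Llama 3 8B layout),
# so the KV cache is a quarter the size of standard multi-head attention.
q = torch.randn(1, 32, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)
out = grouped_query_attention(q, k, v, n_kv_heads=8)
```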
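Meta has not published the exact functional form of its scaling laws, but the general technique is to fit a power law to (compute, benchmark score) pairs from smaller training runs and extrapolate before committing GPUs to a large one. A toy sketch with entirely hypothetical numbers:

```python
import numpy as np

# Hypothetical (training FLOPs, downstream benchmark error) pairs,
# purely for illustration; Meta's raw measurements are not public.
compute = np.array([1e21, 3e21, 1e22, 3e22, 1e23])
error = np.array([0.52, 0.45, 0.38, 0.33, 0.28])

# A power law err = a * C^(-b) is a straight line in log-log space:
# log(err) = log(a) - b * log(C)
slope, log_a = np.polyfit(np.log(compute), np.log(error), 1)
a, b = np.exp(log_a), -slope

# Extrapolate to a larger compute budget to guide allocation decisions.
predicted_error = a * (1e24) ** -b
print(f"predicted benchmark error at 1e24 FLOPs: {predicted_error:.3f}")
```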
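The DPO objective from Rafailov et al. (2023) fits in a few lines. The sketch below assumes you have already summed per-token log-probabilities for chosen and rejected completions under both the policy and a frozen reference model; Meta's exact post-training recipe is not public, so treat this as the textbook loss, not Meta's code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct preference optimization loss (Rafailov et al., 2023).
    Each argument is a tensor of summed log-probs for a batch of
    chosen/rejected completions under the policy or the frozen reference."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Push the policy to prefer chosen completions over rejected ones,
    # with beta controlling how far it may drift from the reference model.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```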
Meta’s commitment to responsible AI development is evident in the release of new trust and safety tools, such as Llama Guard 2, Code Shield, and CyberSec Eval 2. These tools help developers implement model and system-level safety measures tailored to their specific use cases and audiences.
Above is a brief look at the performance numbers of Llama 3 400B. Meta plans to release even larger models with capabilities such as multimodality, multilingual conversation, and longer context windows. The company's largest model, currently in training at over 400B parameters, is expected to set new benchmarks in AI performance, alongside the next versions of Gemini, Claude, GPT, and Grok.