Chinchilla
DeepMind's 70B-parameter LLM: compute-optimal scaling that outperforms much larger models such as GPT-3 and Gopher by training on 4x more data.
Chinchilla is DeepMind's 70-billion-parameter large language model (LLM), introduced in 2022 to redefine LLM scaling laws (Source: *Training Compute-Optimal Large Language Models*). It challenges the 'bigger is better' trend: using the same compute budget as the 280B-parameter Gopher, Chinchilla achieves superior performance by training a smaller model on 1.4 trillion tokens, four times more data. The compute-optimal model reaches an average accuracy of 67.5% on the MMLU benchmark, and because it is a quarter of Gopher's size, it also drastically cuts inference and fine-tuning costs: a clear win for efficiency.
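The scaling recipe behind this result can be made concrete with a small worked sketch. It assumes the paper's approximation that training compute is roughly C ≈ 6·N·D FLOPs for a model with N parameters trained on D tokens, together with the paper's finding that parameters and tokens should scale in roughly equal proportion, which works out to about 20 training tokens per parameter. The function names below are illustrative, not from any released code.

```python
# Sketch of the Chinchilla compute-optimal scaling heuristic.
# Assumes C ~= 6 * N * D (training FLOPs for N parameters, D tokens)
# and the ~20-tokens-per-parameter rule of thumb implied by the
# paper's roughly equal scaling exponents for N and D.

def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: C ~= 6 * N * D."""
    return 6.0 * params * tokens

def compute_optimal_split(compute_budget: float, tokens_per_param: float = 20.0):
    """Given a FLOP budget C, return (N_opt, D_opt) with D = k * N.

    Solving C = 6 * N * (k * N) for N gives N = sqrt(C / (6 * k)).
    """
    n_opt = (compute_budget / (6.0 * tokens_per_param)) ** 0.5
    return n_opt, tokens_per_param * n_opt

# Chinchilla itself: 70B parameters trained on 1.4T tokens.
c = training_flops(70e9, 1.4e12)  # ~5.9e23 FLOPs
n, d = compute_optimal_split(c)
print(f"budget ~ {c:.2e} FLOPs -> N_opt ~ {n:.2e} params, D_opt ~ {d:.2e} tokens")
```

Running the sketch recovers Chinchilla's own configuration: a budget of about 5.9e23 FLOPs splits into roughly 7e10 parameters and 1.4e12 tokens, whereas spending the same budget on a 280B-parameter model (as with Gopher) would leave far fewer tokens for training.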