LLaMA-65B
Meta AI’s 65-billion-parameter foundation model, the largest of a family whose smaller variants outperform GPT-3 while running on a single GPU.
LLaMA-65B is the flagship variant of Meta’s Large Language Model Meta AI (LLaMA) family, trained on roughly 1.4 trillion tokens drawn from publicly available sources such as CommonCrawl and GitHub. It uses a standard transformer architecture with a few targeted modifications, including rotary positional embeddings (RoPE), SwiGLU activation functions, and RMSNorm pre-normalization, and is competitive with DeepMind’s Chinchilla-70B and Google’s PaLM-540B on most benchmarks. By releasing the weights under a research license, Meta enabled the open-source community to build high-performance inference tools such as llama.cpp, which run the model on a single node (e.g. 8x A100 GPUs) or, after quantization, on consumer hardware: at 4-bit precision the 65B weights shrink from roughly 130 GB in fp16 to about 33 GB.
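To make the two named modifications concrete, below is a minimal, self-contained PyTorch sketch of a SwiGLU feed-forward step and rotary positional embeddings. It is illustrative rather than a reproduction of Meta’s reference code: the function names and tensor shapes here are assumptions, and this RoPE uses the “rotate-half” formulation, whereas LLaMA’s implementation rotates interleaved coordinate pairs (equivalent up to a fixed permutation of dimensions).

import torch
import torch.nn.functional as F

def swiglu_ffn(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward block: down( silu(x @ w_gate) * (x @ w_up) ).
    # The SiLU-activated path gates the "up" projection element-wise.
    return (F.silu(x @ w_gate) * (x @ w_up)) @ w_down

def apply_rope(x, base=10000.0):
    # x: (seq_len, n_heads, head_dim). Rotates pairs of feature dimensions
    # by an angle proportional to token position, so relative position is
    # encoded directly in the query/key vectors ("rotate-half" variant).
    seq_len, _, head_dim = x.shape
    half = head_dim // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)      # (half,)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * inv_freq   # (seq_len, half)
    cos = angles.cos()[:, None, :]  # broadcast over the heads dimension
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Tiny illustrative shapes; LLaMA-65B itself uses hidden size 8192,
# FFN size 22016, and 64 attention heads of dimension 128.
q = torch.randn(16, 4, 32)   # (seq_len, n_heads, head_dim)
q_rot = apply_rope(q)        # same shape, with position baked in
h = torch.randn(16, 64)      # hidden states entering the FFN
out = swiglu_ffn(h, torch.randn(64, 172), torch.randn(64, 172), torch.randn(172, 64))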