TinyLlama
A compact 1.1B-parameter language model pre-trained on 3 trillion tokens, built on the Llama-2 architecture for deployment on edge devices.
TinyLlama packs the Llama-2 architecture into a slim 1.1B-parameter footprint. Developed by researchers at the Singapore University of Technology and Design (SUTD), it uses FlashAttention-2 and Triton to sustain high computational throughput across its 3-trillion-token training run. This efficiency allows the model to run on consumer hardware with less than 3 GB of VRAM (GPU memory), making it a strong baseline for mobile deployment, local inference, and specialized fine-tuning tasks where latency and resource constraints are critical.
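As a sketch of what local inference can look like, the snippet below loads a TinyLlama checkpoint with Hugging Face Transformers in half precision and generates a short completion. The checkpoint name, prompt, and generation settings are illustrative assumptions, not details taken from this page.

```python
# Minimal sketch of local TinyLlama inference with Hugging Face Transformers.
# The checkpoint name and generation settings below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name

# Load in half precision so the 1.1B-parameter model fits in roughly 2-3 GB of VRAM.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires `accelerate`; falls back to CPU if no GPU is available
)

prompt = "Explain why small language models matter for edge devices."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep the generation budget small so latency stays low on modest hardware.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With 4-bit quantization (for example via bitsandbytes or llama.cpp), the weight footprint of a 1.1B-parameter model drops to roughly half a gigabyte, which is what makes on-device deployment practical.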