TinyLlama
A compact 1.1B-parameter language model pre-trained on 3 trillion tokens, built on the Llama-2 architecture for deployment on edge devices.
TinyLlama packs the Llama-2 architecture into a slim 1.1B-parameter footprint. Developed by researchers at the Singapore University of Technology and Design (SUTD), it uses FlashAttention-2 and Triton to sustain high computational throughput across its 3-trillion-token training run. This efficiency allows the model to run on consumer hardware with less than 3 GB of VRAM (GPU memory), making it a strong baseline for mobile deployment, local inference, and specialized fine-tuning tasks where latency and resource constraints are critical.
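As a sketch of what local inference can look like, the snippet below loads a TinyLlama checkpoint with Hugging Face Transformers in half precision and generates a short completion. The checkpoint name, prompt, and generation settings are illustrative assumptions, not details taken from this page.

```python
# Minimal sketch of local TinyLlama inference with Hugging Face Transformers.
# The checkpoint name and generation settings below are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed checkpoint name

# Load in half precision so the 1.1B-parameter model fits in roughly 2-3 GB of VRAM.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires `accelerate`; falls back to CPU if no GPU is available
)

prompt = "Explain why small language models matter for edge devices."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Keep the generation budget small so latency stays low on modest hardware.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With 4-bit quantization (for example via bitsandbytes or llama.cpp), the weight footprint of a 1.1B-parameter model drops to roughly half a gigabyte, which is what makes on-device deployment practical.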