The results of an early attempt at creating a capable small model.
March 24th, 2026
Today, we're releasing TFM-1.6, the latest model from Astar Labs.
TFM-1.6 is a real step up from TFM-1.5, our first experimental model. Pretrained on the full SlimPajama-6B dataset and fine-tuned on OpenOrca, it brings noticeably better knowledge and instruction following, whilst staying small and efficient.
TFM-1.6 was trained end to end in 10 days on a single NVIDIA RTX 5090.
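Both corpora are openly available, so the data side of this recipe is easy to inspect. Below is a minimal sketch using the Hugging Face `datasets` library; the hub IDs are the well-known public copies of each corpus, and it's an assumption (not confirmed here) that these exact copies were used for TFM-1.6:

```python
# Sketch: pulling the two training corpora with Hugging Face `datasets`
# (pip install datasets). The hub IDs are the public copies of each corpus;
# whether these exact copies were used for TFM-1.6 is an assumption.
from datasets import load_dataset

# SlimPajama-6B: a ~6B-token sample of the deduplicated SlimPajama corpus,
# used here for pretraining.
pretrain = load_dataset("DKYoon/SlimPajama-6B", split="train")

# OpenOrca: FLAN-style prompts paired with GPT-4/GPT-3.5 completions,
# used here for instruction fine-tuning.
finetune = load_dataset("Open-Orca/OpenOrca", split="train")

print(pretrain.column_names)   # e.g. ['text', ...]
print(finetune.column_names)   # e.g. ['id', 'system_prompt', 'question', 'response']
```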
Two versions of the model are available: the pretrained base model, and the instruction-tuned model fine-tuned on OpenOrca.
TFM-1.6 was evaluated on three standard benchmarks: GSM8k, MMLU, and HumanEval.

| Benchmark | TFM-1.6 | Llama 2 70B | Grok-0 (33B) | GPT-3.5 | Grok-1 | Mistral Large | Claude 2 | Grok-1.5 |
|---|---|---|---|---|---|---|---|---|
| GSM8k (8-shot) | 1.6% | 56.8% | 57.1% | 57.1% | 62.9% | 81.0% | 88.0% | 90.0% |
| MMLU (5-shot) | 26.4% | 68.9% | 65.7% | 70.0% | 73.0% | 81.2% | 75.0%\* | 81.3% |
| HumanEval (0-shot) | 0.0% | 29.9% | 39.7% | 48.1% | 63.2% | 45.1% | 70.0% | 74.1% |

\* Claude 2's MMLU score is 5-shot with chain-of-thought.
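One way to reproduce scores like these is EleutherAI's lm-evaluation-harness, which ships tasks for all three benchmarks. Here is a minimal sketch, assuming the weights end up on the Hugging Face Hub; the `astar-labs/tfm-1.6` checkpoint ID is a placeholder, not a published location:

```python
# Minimal sketch: scoring a causal LM checkpoint with EleutherAI's
# lm-evaluation-harness (pip install lm-eval). The checkpoint ID below
# is a placeholder, not a published location for TFM-1.6.
import lm_eval

# GSM8k and MMLU use different shot counts, so they run separately.
# HumanEval is omitted from this sketch: it executes model-generated
# code, which the harness requires explicitly opting in to.
for task, shots in [("gsm8k", 8), ("mmlu", 5)]:
    results = lm_eval.simple_evaluate(
        model="hf",
        model_args="pretrained=astar-labs/tfm-1.6",  # placeholder ID
        tasks=[task],
        num_fewshot=shots,
    )
    print(task, results["results"][task])
```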
TFM-1.6 is still a small model, with a lot of room for improvement, but it's the strongest one I've built so far, and a solid foundation for what comes next. You can chat with TFM-1.6, as well as the older TFM-1.5 model, right here on the Astar Labs website.
TFM is free to use. It is not affiliated with any company, research institution, or commercial entity. It's just a project. A very personal one.