Astar Labs is one person, a GPU rental bill, and a lot of curiosity.
When AI started taking over the world, I did the opposite of what most people did: I stepped back. Not because I wasn't interested, but because I was interested in the wrong thing for most people's tastes: not what AI could do for me, but how it actually worked. The tools got shinier, the benchmarks got bigger, and I kept thinking the same thing: I want to build one myself.
So I did.
TFM, short for Transformer Foundation Model, is my attempt to understand AI from the ground up. It won't beat GPT-4. It won't beat anything, frankly. It's a small model, trained on a single rented GPU, by someone doing this entirely in their spare time. But every weight in it, every line of code behind it, is something I built and understand. That matters more to me than benchmarks.
If you're here to use it, welcome. If you're here to poke around the GitHub and see how the sausage is made, even better.
The name is a hand-me-down. Before TFM, there was Astar Technologies, a project where I built self-landing model rockets. When I pivoted to AI, I kept "Astar" and appended "Labs", because, honestly, it sounds appropriately AI-ey. The full internal joke is "AI Research Lab Astar" -> Astar Labs. I wish there were a deeper story, but sometimes there isn't.
TFM-1.5 | last-gen creative specialist
Parameter Count: 152M | Vocabulary Size: 16k | Context Window: 4k

TFM-1.5 is the original. It was pretrained on TinyStories, a dataset of short, simple narratives, and fine-tuned on UltraChat conversations. The result is a model that isn't particularly smart in the traditional sense, but has a genuine flair for writing. It understands narrative structure, it has a feel for language, and it can surprise you. Think of it less as an assistant, and more as a creative collaborator.
TFM-1.6 | knowledgeable generalist
Parameter Count: 167M | Vocabulary Size: 24k | Context Window: 4k

TFM-1.6 is where things get more serious. Pretrained on the full SlimPajama-6B dataset and fine-tuned on OpenOrca, it has significantly broader knowledge and better instruction-following than its predecessor. The larger vocabulary (24k vs 1.5's 16k) accounts for most of the parameter difference, and gives it a more expressive token space to work with. It's still a small model, but it's the best one I've built so far.
Both models share the same core architecture: a decoder-only Transformer, built in PyTorch with custom implementations of key components, including a replacement for torch.nn.MultiheadAttention that integrates more cleanly with the rest of the custom stack.

The tokenizer is a BPE tokenizer built with the tokenizers library, using NFD normalization, Metaspace and Punctuation pre-tokenizers, and a Metaspace decoder.
It's trained per-model, which is why vocabulary sizes differ between 1.5 and 1.6.
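For the curious, the tokenizer pipeline described above can be sketched with the tokenizers library roughly like this. The toy corpus, trainer settings, and the `<unk>` special token are my assumptions for illustration, not TFM's actual training setup; only the pipeline components (BPE, NFD, Metaspace + Punctuation pre-tokenizers, Metaspace decoder) and the 16k vocab size come from the description above.

```python
from tokenizers import Tokenizer, decoders, models, normalizers, pre_tokenizers, trainers

# Assemble the pipeline described above: BPE model, NFD normalization,
# Metaspace + Punctuation pre-tokenization, Metaspace decoding.
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.normalizer = normalizers.NFD()
tokenizer.pre_tokenizer = pre_tokenizers.Sequence(
    [pre_tokenizers.Metaspace(), pre_tokenizers.Punctuation()]
)
tokenizer.decoder = decoders.Metaspace()

# Hypothetical toy corpus; the real tokenizer trains per-model on that
# model's pretraining text, which is why 1.5 and 1.6 differ.
corpus = ["Once upon a time, a small model learned to write."] * 100

trainer = trainers.BpeTrainer(vocab_size=16_000, special_tokens=["<unk>"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

ids = tokenizer.encode("Once upon a time").ids
print(tokenizer.decode(ids))
```

Because the tokenizer is trained on each model's own pretraining corpus, the learned merges (and therefore the vocabulary) differ between 1.5 and 1.6 even though the pipeline is identical.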
Training runs on a single rented RTX 5090, chosen for its balance of compute power, VRAM, and rental cost. Local inference runs on MPS, Apple's Metal backend, where TFM reaches a comfortable ~80 tokens per second in around 1.5 GB of memory, lean enough to run on consumer hardware without breaking a sweat.
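To give a concrete picture of the custom attention swap mentioned earlier, here is a minimal causal multi-head self-attention module in PyTorch. This is my sketch of the standard decoder-only attention pattern, not TFM's actual implementation; the class name, fused QKV projection, and dimensions are all illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Minimal stand-in for torch.nn.MultiheadAttention (decoder-only, causal).

    Illustrative sketch only; TFM's real layer may differ.
    """
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)   # fused Q, K, V projection
        self.proj = nn.Linear(d_model, d_model)      # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # (B, T, C) -> (B, n_heads, T, d_head)
        q, k, v = (t.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
                   for t in (q, k, v))
        # Scaled dot-product scores with an upper-triangular causal mask,
        # so each position attends only to itself and earlier tokens.
        scores = (q @ k.transpose(-2, -1)) / math.sqrt(self.d_head)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device),
                          diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
        out = F.softmax(scores, dim=-1) @ v
        return self.proj(out.transpose(1, 2).reshape(B, T, C))
```

Owning this layer end to end (rather than wrapping the built-in) is what makes it easy to thread other custom components, such as positional schemes or caching, through the attention path.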
TFM-1.6 is finishing pretraining and will move into fine-tuning soon. Once it's out, the focus shifts to a few things I'm genuinely excited about.
The goal is the same as it's always been: build it myself, understand it properly, and share it with whoever's curious enough to show up.
TFM is free to use. It is not affiliated with any company, research institution, or commercial entity. It's just a project. A very personal one.