Hi, I'm Shehab Ashraf. This is where I write about what I'm working on.
posts
-
December 30, 2025
nanogpt
I trained a 124M-parameter language model from scratch on a single A100 GPU in about an hour. That’s it....
projects
-
nanogpt
A 124M-parameter LLM trained from scratch in ~1 hour on a single A100 GPU.
favorite reads
-
How to Optimize a CUDA Matmul Kernel for cuBLAS-like Performance: a Worklog
Simon's legendary deep-dive blog post on CUDA optimizations.