Authors: Ondřej Čertík, Brian Beckman
In this blog post I am announcing fastGPT, fast GPT-2 inference written in Fortran. In it, I show:

- Fortran has speed at least as good as default PyTorch on Apple M1 Max.
- Fortran code has statically typed arrays, making maintenance of the code easier than with Python.
- It seems that the bottleneck algorithm in GPT-2 inference is matrix-matrix multiplication. For physicists like us, matrix-matrix multiplication is very familiar, unlike other aspects of AI and ML (see the sketch below).
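To make the last point concrete, here is a minimal, illustrative sketch (not fastGPT's actual code) of a GPT-2-style feed-forward layer written with Fortran's intrinsic `matmul`. The array names and shapes (`n_seq`, `n_embd`, `w1`, `w2`) are assumptions chosen for illustration; the point is simply that the heavy lifting reduces to two matrix-matrix products with an elementwise nonlinearity between them.

```fortran
! Illustrative sketch: a GPT-2-style MLP block expressed with matmul.
! Names and shapes are assumptions for this example, not fastGPT's code.
program matmul_sketch
implicit none
integer, parameter :: n_seq = 64, n_embd = 768
real :: x(n_seq, n_embd)        ! token embeddings for a short context
real :: w1(n_embd, 4*n_embd)    ! first linear layer weights
real :: w2(4*n_embd, n_embd)    ! second linear layer weights
real :: h(n_seq, 4*n_embd), y(n_seq, n_embd)

call random_number(x)
call random_number(w1)
call random_number(w2)

! The two matmul calls are where essentially all the time goes.
h = matmul(x, w1)
! GELU (tanh approximation), applied elementwise between the two products
h = 0.5*h*(1.0 + tanh(sqrt(2.0/acos(-1.0))*(h + 0.044715*h**3)))
y = matmul(h, w2)

print *, "output sample:", y(1, 1)
end program matmul_sketch
```

In practice the intrinsic `matmul` can be mapped by the compiler (or replaced by hand) to an optimized BLAS `gemm`, which is why the Fortran code can match or beat default PyTorch for this workload.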