“The talent level required to train a massive model with high FLOPS utilization on a GPU grows increasingly higher because of all the tricks needed to extract maximum performance.”
How Nvidia’s CUDA Monopoly In ML Is Breaking – OpenAI Triton & PyTorch 2.0 https://t.co/zrOtrOBKVX