Training LLMs at Scale with AMD MI250 GPUs
Figure 1: Training performance of LLM Foundry and MPT-7B on a multi-node AMD MI250 cluster. As we increase the number of GPUs from 4 x MI250 to 128 x MI250, we see near-linear scaling of training performance (TFLOP/s) and throughput (tokens/sec).

Introduction

Four months ago, we shared how AMD had emerged as a capable platform […]
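The two metrics in Figure 1 are two views of the same measurement: throughput in tokens/sec can be converted to an estimated TFLOP/s per GPU using the common approximation that one training step costs roughly 6 FLOPs per model parameter per token (forward plus backward pass). The sketch below illustrates that conversion; it is a simplified estimate, not LLM Foundry's exact MFU calculation (which also accounts for attention FLOPs), and the throughput number in the example is illustrative rather than a measurement from this post.

```python
# Rough conversion between the two metrics in Figure 1, using the standard
# ~6 FLOPs per parameter per token approximation for one training step
# (forward + backward), ignoring attention FLOPs. Illustrative only.

MPT_7B_PARAMS = 6.7e9  # approximate parameter count of MPT-7B


def tflops_per_gpu(tokens_per_sec: float, n_gpus: int,
                   n_params: float = MPT_7B_PARAMS) -> float:
    """Estimate per-GPU training performance (TFLOP/s) from cluster throughput."""
    total_flops_per_sec = 6 * n_params * tokens_per_sec
    return total_flops_per_sec / n_gpus / 1e12


# Hypothetical example: if a 4 x MI250 node sustained 10,000 tokens/sec,
# each MI250 would be delivering about 6 * 6.7e9 * 1e4 / 4 / 1e12 ≈ 100 TFLOP/s.
print(f"{tflops_per_gpu(10_000, 4):.1f} TFLOP/s per GPU")
```

Under this approximation, near-linear scaling of tokens/sec with GPU count (as in Figure 1) directly implies near-constant TFLOP/s per GPU.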