Modern computing architectures feature low-precision matrix multiplication units that achieve substantially higher throughput than their high-precision counterparts. Motivated by this architectural ...