Pipeline parallelism, for training models too large to fit on a single GPU
Useful metrics logged to TensorBoard
Metrics computed on a held-out eval set, for measuring generalization
Training state ...