Large-scale model training on Chameleon
In this tutorial, we will practice fine-tuning a large language model. We will use a selection of techniques to allow us to train models that would not otherwise fit in GPU memory (see the sketch after this list):
- gradient accumulation
- reduced precision
- parameter efficient fine tuning
- distributed training across multiple GPUs and with CPU offload
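As a preview, here is a minimal PyTorch sketch, not the tutorial's actual code, that combines the first two ideas: gradient accumulation and reduced-precision (bf16, supported on A100) computation. The model, data shapes, and hyperparameters are placeholders chosen only for illustration.

```python
import torch

# Stand-in model and optimizer; the tutorial fine-tunes a large language
# model, but the same accumulation pattern applies.
model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()
accumulation_steps = 8  # one optimizer step per 8 micro-batches

optimizer.zero_grad()
for step in range(64):
    # Small "micro-batch" of synthetic data that fits in GPU memory
    x = torch.randn(4, 1024, device="cuda")
    y = torch.randint(0, 10, (4,), device="cuda")

    # Reduced precision: run the forward pass in bfloat16 via autocast
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        loss = loss_fn(model(x), y) / accumulation_steps

    # Gradient accumulation: gradients add up across micro-batches,
    # and the optimizer steps only once per effective (large) batch
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Dividing the loss by `accumulation_steps` keeps the accumulated gradient equivalent to one large batch, so the effective batch size grows without increasing peak memory use.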
Follow along at Large-scale model training on Chameleon as you use this artifact.
Note: this tutorial requires advance reservation of specific hardware! Either:
- one node with 4x A100 80GB (`gpu_a100_pcie`) at CHI@UC for a 3-hour block, or
- a 2-hour block on a node with 4x A100 80GB or 4x V100 (`gpu_a100_pcie` or `gpu_v100`) AND a 2-hour block on a node with 1x A100 80GB (`compute_gigaio`) at CHI@UC
This material is based upon work supported by the National Science Foundation under Grant No. 2230079.
Launching this artifact will open it within Chameleon’s shared Jupyter experiment environment, which is accessible to all Chameleon users with an active allocation.
Download archive
Download an archive containing the files of this artifact.
Download with git
Clone the git repository for this artifact, and check out this version's commit:
git clone https://github.com/teaching-on-testbeds/llm-chi
cd llm-chi
git checkout 5bb926bb0ac9d95ac6aa7901762b4a9825d2cd4e
Submit feedback through GitHub issues