Large-scale model training on Chameleon

In this tutorial, we will practice fine-tuning a large language model. We will use a selection of techniques that allow us to train models that would not otherwise fit in GPU memory (a short sketch of the first two appears after the list):

  • gradient accumulation
  • reduced precision
  • parameter efficient fine tuning
  • distributed training across multiple GPUs and with CPU offload
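As a preview, here is a minimal sketch of the first two techniques in PyTorch: gradient accumulation and reduced-precision (mixed-precision) training. The model, data, and hyperparameters below are toy placeholders, not the tutorial's actual fine-tuning code; the tutorial applies the same pattern to a large language model.

import torch
import torch.nn as nn

# Toy stand-ins so this sketch runs on its own; the tutorial uses a real LLM and dataset.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(512, 512).to(device)
batches = [(torch.randn(8, 512, device=device), torch.randn(8, 512, device=device))
           for _ in range(32)]

accum_steps = 8  # apply one optimizer update per 8 micro-batches
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # loss scaling for fp16
loss_fn = nn.MSELoss()

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    # Reduced precision: run the forward pass in fp16 to cut activation memory.
    with torch.autocast(device_type=device, dtype=torch.float16, enabled=(device == "cuda")):
        loss = loss_fn(model(x), y) / accum_steps  # scale so accumulated gradients average
    scaler.scale(loss).backward()                  # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)  # one update for the whole effective batch
        scaler.update()
        optimizer.zero_grad()

The remaining techniques (parameter efficient fine tuning, distributed training across multiple GPUs, and CPU offload) rely on additional libraries and are better illustrated by the tutorial notebooks themselves.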

Follow along with the tutorial, Large-scale model training on Chameleon, as you use this artifact.

Note: this tutorial requires advance reservation of specific hardware! Either:

  • a 3-hour block on one node with 4x A100 80GB (gpu_a100_pcie) at CHI@UC, or
  • a 2-hour block on a node with 4x A100 80GB or 4x V100 (gpu_a100_pcie or gpu_v100) AND a 2-hour block on a node with 1x A100 80GB (compute_gigaio) at CHI@UC

This material is based upon work supported by the National Science Foundation under Grant No. 2230079.



Launch on Chameleon

Launching this artifact will open it within Chameleon’s shared Jupyter experiment environment, which is accessible to all Chameleon users with an active allocation.

Download Archive

Download an archive containing the files of this artifact.

Download with git

Clone the git repository for this artifact, and check out this version's commit:

git clone https://github.com/teaching-on-testbeds/llm-chi
cd llm-chi
git checkout 5bb926bb0ac9d95ac6aa7901762b4a9825d2cd4e
Feedback

Submit feedback through GitHub issues
