Using cloud servers for GPU-based inference
Machine learning models are most often trained in the "cloud", on powerful centralized servers with specialized resources (like GPU acceleration) for model training. These servers are also well resourced for inference, i.e. making predictions on new data.
In this experiment, we will use a cloud server equipped with GPU acceleration for fast inference in an image classification context.
This notebook assumes you already have a "lease" available for an RTX6000 GPU server on the CHI@UC testbed. Then, it will show you how to:
- launch a server using that lease (see the first code sketch after this list)
- attach an IP address to the server, so that you can access it over SSH
- install some fundamental machine learning libraries on the server
- use a pre-trained image classification model to do inference on the server
- optimize the model for fast inference on NVIDIA GPUs, and measure reduced inference times (see the second code sketch below)
- delete the server
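To give a rough sense of the server management steps, here is a minimal sketch using the python-chi library that Chameleon notebooks typically use. The project name, lease name, and image name below are placeholder assumptions, and the exact calls in the notebook may differ slightly.

```python
# Minimal sketch (not the exact notebook code): launch, access, and delete a server
# with python-chi. Project name, lease name, and image name are placeholders.
import chi
import chi.lease
import chi.server

chi.use_site("CHI@UC")                 # select the Chameleon site
chi.set("project_name", "CHI-XXXXXX")  # your Chameleon project ID (placeholder)

# Find the existing RTX6000 lease and its node reservation
lease = chi.lease.get_lease("my_rtx6000_lease")
reservation_id = chi.lease.get_node_reservation(lease["id"])

# Launch a server on the reserved node, using a CUDA-enabled disk image
server = chi.server.create_server(
    "gpu-inference",
    reservation_id=reservation_id,
    image_name="CC-Ubuntu20.04-CUDA",  # image name is an assumption
)
chi.server.wait_for_active(server.id)

# Attach a floating IP so the server is reachable over SSH
floating_ip = chi.server.associate_floating_ip(server.id)
print(f"ssh cc@{floating_ip}")

# ... install libraries, run the inference experiment ...

# When finished, delete the server to release the node
chi.server.delete_server(server.id)
```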
Consider running this together with "Using edge devices for CPU-based inference"!
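And for a rough preview of the inference and timing steps, the kind of per-image latency measurement you would compare against that CPU-based experiment looks something like the sketch below. It assumes a recent torchvision; the model choice (ResNet-50) and the half-precision step are illustrative assumptions, and the notebook's own NVIDIA-specific optimization (for example, with TensorRT) is not shown.

```python
# Minimal sketch (not the notebook's exact code): GPU inference with a pre-trained
# torchvision classifier, plus a simple latency measurement with CUDA synchronization.
import time
import torch
import torchvision.models as models

device = torch.device("cuda")

# Pre-trained image classification model (ResNet-50 is just an example choice)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).to(device).eval()
x = torch.randn(1, 3, 224, 224, device=device)  # stand-in for one preprocessed image

def mean_latency_ms(m, inp, n_warmup=10, n_runs=100):
    """Mean time for one forward pass, in milliseconds."""
    with torch.no_grad():
        for _ in range(n_warmup):      # warm-up runs (CUDA init, caching)
            m(inp)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_runs):
            m(inp)
        torch.cuda.synchronize()       # wait for all queued GPU work to finish
    return (time.time() - start) / n_runs * 1e3

print(f"FP32: {mean_latency_ms(model, x):.2f} ms per image")

# One simple GPU-side optimization is half precision; the notebook goes further
# to reduce inference time on NVIDIA GPUs.
print(f"FP16: {mean_latency_ms(model.half(), x.half()):.2f} ms per image")
```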
Materials are also available at: https://github.com/teaching-on-testbeds/cloud-gpu-inference
Launching this artifact will open it within Chameleon’s shared Jupyter experiment environment, which is accessible to all Chameleon users with an active allocation.
Download archive
Download an archive containing the files of this artifact.
Download with git
Clone the git repository for this artifact, and check out the commit for this version:
git clone https://github.com/teaching-on-testbeds/cloud-gpu-inference
cd cloud-gpu-inference
git checkout 011c89df414e617aa3f6e04ebbe8e95007c912c0
Submit feedback through GitHub issues