Using cloud servers for GPU-based inference
Machine learning models are most often trained in the "cloud", on powerful centralized servers with specialized resources (like GPU acceleration) for model training. These servers are also well resourced for inference, i.e. making predictions on new data.
In this experiment, we will use a cloud server equipped with GPU acceleration for fast inference in an image classification context.
This notebook assumes you already have a "lease" available for an RTX6000 GPU server on the CHI@UC testbed. Then, it will show you how to:
- launch a server using that lease (see the first code sketch after this list)
- attach an IP address to the server, so that you can access it over SSH
- install some fundamental machine learning libraries on the server
- use a pre-trained image classification model to do inference on the server
- optimize the model for fast inference on NVIDIA GPUs, and measure reduced inference times (see the second code sketch below)
- delete the server
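To give a rough sense of the server management steps, here is a minimal sketch using the python-chi library that Chameleon notebooks typically use. The project name, lease name, and image name below are placeholder assumptions, and the exact calls in the notebook may differ slightly.

```python
# Minimal sketch (not the exact notebook code): launch, access, and delete a server
# with python-chi. Project name, lease name, and image name are placeholders.
import chi
import chi.lease
import chi.server

chi.use_site("CHI@UC")                 # select the Chameleon site
chi.set("project_name", "CHI-XXXXXX")  # your Chameleon project ID (placeholder)

# Find the existing RTX6000 lease and its node reservation
lease = chi.lease.get_lease("my_rtx6000_lease")
reservation_id = chi.lease.get_node_reservation(lease["id"])

# Launch a server on the reserved node, using a CUDA-enabled disk image
server = chi.server.create_server(
    "gpu-inference",
    reservation_id=reservation_id,
    image_name="CC-Ubuntu20.04-CUDA",  # image name is an assumption
)
chi.server.wait_for_active(server.id)

# Attach a floating IP so the server is reachable over SSH
floating_ip = chi.server.associate_floating_ip(server.id)
print(f"ssh cc@{floating_ip}")

# ... install libraries, run the inference experiment ...

# When finished, delete the server to release the node
chi.server.delete_server(server.id)
```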
Consider running this together with "Using edge devices for CPU-based inference"!
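And for a rough preview of the inference and timing steps, the kind of per-image latency measurement you would compare against that CPU-based experiment looks something like the sketch below. It assumes a recent torchvision; the model choice (ResNet-50) and the half-precision step are illustrative assumptions, and the notebook's own NVIDIA-specific optimization (for example, with TensorRT) is not shown.

```python
# Minimal sketch (not the notebook's exact code): GPU inference with a pre-trained
# torchvision classifier, plus a simple latency measurement with CUDA synchronization.
import time
import torch
import torchvision.models as models

device = torch.device("cuda")

# Pre-trained image classification model (ResNet-50 is just an example choice)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).to(device).eval()
x = torch.randn(1, 3, 224, 224, device=device)  # stand-in for one preprocessed image

def mean_latency_ms(m, inp, n_warmup=10, n_runs=100):
    """Mean time for one forward pass, in milliseconds."""
    with torch.no_grad():
        for _ in range(n_warmup):      # warm-up runs (CUDA init, caching)
            m(inp)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_runs):
            m(inp)
        torch.cuda.synchronize()       # wait for all queued GPU work to finish
    return (time.time() - start) / n_runs * 1e3

print(f"FP32: {mean_latency_ms(model, x):.2f} ms per image")

# One simple GPU-side optimization is half precision; the notebook goes further
# to reduce inference time on NVIDIA GPUs.
print(f"FP16: {mean_latency_ms(model.half(), x.half()):.2f} ms per image")
```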
Materials are also available at: https://github.com/teaching-on-testbeds/cloud-gpu-inference
Launching this artifact will open it within Chameleon’s shared Jupyter experiment environment, which is accessible to all Chameleon users with an active allocation.
Download archive
Download an archive containing the files of this artifact.
Download with git
Clone the git repository for this artifact, and check out the commit for this version:
git clone https://github.com/teaching-on-testbeds/cloud-gpu-inference
cd cloud-gpu-inference
git checkout 011c89df414e617aa3f6e04ebbe8e95007c912c0
Submit feedback through GitHub issues