
FSA-benchmark

This project aims to explore and benchmark various machine learning models for detecting disks at high risk of experiencing fail-slow anomalies.

Implemented Algorithms

Cost-Sensitive Ranking Model
Inspired by the paper "Improving Service Availability of Cloud Systems by Predicting Disk Error" presented at the USENIX ATC '18 conference, this model ranks disks based on fail-slow risk.
Multi-Prediction Models
Drawing from "Improving Storage System Reliability with Proactive Error Prediction" presented at the USENIX ATC '17 conference, this approach uses multiple traditional machine learning models to evaluate disk health using diverse features. Various models were tested, with the Random Forest classifier proving most effective.
LSTM Model
This model employs Long Short-Term Memory (LSTM) networks, trained on the first day's data for each cluster and evaluated on data spanning all days. It captures temporal dependencies to accurately predict fail-slow anomalies over time.
PatchTST Model
A transformer-based time series model that splits each series into patches (short subsequences used as input tokens), applied here to time series prediction and fail-slow detection.
GPT-4o-mini
A large language model used to analyze disk metrics and detect fail-slow conditions.
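The LSTM description above implies a specific data split: train on each cluster's first day, then evaluate on all days. A minimal sketch of that split follows, assuming a hypothetical record layout with `cluster`, `day`, `disk`, and `latency_ms` fields; the repository's actual data format and field names may differ.

```python
def split_first_day(records):
    """Return (train, evaluation) for a train-on-first-day setup.

    `records` is a list of dicts with hypothetical keys "cluster" and "day".
    Training data is each cluster's earliest day; evaluation covers all days.
    """
    # Find the earliest day observed in each cluster.
    first_day = {}
    for rec in records:
        cluster, day = rec["cluster"], rec["day"]
        if cluster not in first_day or day < first_day[cluster]:
            first_day[cluster] = day

    # Keep only first-day rows for training; evaluate on everything.
    train = [rec for rec in records if rec["day"] == first_day[rec["cluster"]]]
    evaluation = list(records)
    return train, evaluation


# Illustrative records only; the values are made up for the example.
records = [
    {"cluster": "A", "day": 1, "disk": "d1", "latency_ms": 4.0},
    {"cluster": "A", "day": 2, "disk": "d1", "latency_ms": 9.5},
    {"cluster": "B", "day": 3, "disk": "d7", "latency_ms": 3.1},
    {"cluster": "B", "day": 4, "disk": "d7", "latency_ms": 3.3},
]
train, evaluation = split_first_day(records)
```

Note that the "first day" is computed per cluster, so clusters whose traces start on different dates are each trained on their own earliest day.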

Performance Analysis

To evaluate model performance, we generate heatmaps of precision and recall for each algorithm across the clusters. These visualizations show how effective each algorithm is and how its prediction accuracy varies between clusters.
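The numbers behind such a heatmap can be sketched as a grid of (precision, recall) pairs, one per algorithm and cluster. The example below is a rough illustration, not the project's evaluation code: the function names, the set-of-disk-IDs input format, and the sample data are all assumptions.

```python
def precision_recall(predicted, actual):
    """Precision and recall for one cluster, given sets of disk IDs."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Define both metrics as 0.0 when their denominator is empty.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


def score_grid(predictions, ground_truth):
    """Build {algorithm: {cluster: (precision, recall)}}.

    predictions:  {algorithm: {cluster: disk IDs flagged as fail-slow}}
    ground_truth: {cluster: disk IDs that were actually fail-slow}
    """
    grid = {}
    for algorithm, per_cluster in predictions.items():
        grid[algorithm] = {
            cluster: precision_recall(per_cluster.get(cluster, ()), truth)
            for cluster, truth in ground_truth.items()
        }
    return grid


# Made-up example data (hypothetical cluster and disk names).
ground_truth = {"cluster_A": {"d1", "d2"}, "cluster_B": {"d7"}}
predictions = {
    "random_forest": {"cluster_A": {"d1", "d3"}, "cluster_B": {"d7"}},
    "lstm": {"cluster_A": {"d1", "d2"}, "cluster_B": set()},
}
grid = score_grid(predictions, ground_truth)
```

Each row of `grid` corresponds to one algorithm and each column to one cluster, so the precision values and the recall values can each be passed to a plotting library as a 2-D matrix.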

Requirements

A Chameleon account with an active project allocation.

Feb. 27, 2025, 3:33 PM

reproducible research storage example

Authors

Xikang Song, University of Chicago (xikang@uchicago.edu)
Ruidan Li, University of Chicago (ruidanli@uchicago.edu)

Launch on Chameleon

Launching this artifact will open it within Chameleon’s shared Jupyter experiment environment, which is accessible to all Chameleon users with an active allocation.

Request daypass

If you do not have an active Chameleon allocation, or would prefer not to use your allocation, you can request a temporary one from the PI of the project this artifact belongs to.

Download Archive

Download an archive containing the files of this artifact.

Download with git

Clone the git repository for this artifact, then check out the version's commit:

git clone https://github.com/songxikang/FSA-benchmarking.git
# change into the directory created by the clone
cd FSA-benchmarking
git checkout 0ccf56992888cb12f6098d44383737f22f384694

Feedback

Submit feedback through GitHub issues
