Chameleon Changelog for August 2022
Sept. 1, 2022 by Mark Powers
Dear Chameleon users,
Welcome back! We hope everybody has had a great summer and is coming back with fresh energy to tackle important research problems. We are so looking forward to seeing what cool discoveries you make this year! Don’t forget to share them with us: your tests and triumphs are what make us roll out of bed every day. If you have an exciting experiment or result you’d like to share via our blog, please contact us; we will be happy to help you share your experiences with others.
A quick reminder to all that we have a long-planned authentication outage scheduled for September 6 at 11:00 AM CT to deploy necessary upgrades. We estimate that the outage will take only an hour, but during that time you won’t be able to log into Chameleon, though you will still be able to work on already deployed instances. Please plan accordingly and renew any leases ahead of time. On a more cheerful note, we also bring you the following:
Composable Hardware at CHI@TACC! The big news this month is the composable Liqid hardware. What makes this composable hardware special is that it disaggregates memory, GPU, and storage, making it possible to detach and attach these resources between Liqid nodes via software. At this time, Liqid nodes are available as a set of static configurations:
Machine | # CPU | # GPU | # NVMe SSD
liqid01 | 1 | 2 | 2
liqid02 | 1 | 2 | 2
liqid03 | 1 | 1 | 3
liqid04 | 1 | 1 | 3
liqid05 | 1 | 1 | 2
liqid06 | 1 | 1 | 2
liqid07 | 1 | 1 | 1
liqid08 | 1 | 1 | 1
You can cross-reference these names with the CPU information on our hardware discovery page under the node type “compute_liqid.” Due to the reconfigurable nature of this hardware, we are still working through some issues with displaying it correctly on the hardware page, but for reference, the NVMe SSDs are 3841GB SAMSUNG MZ1LB3T8HMLA-00007 and the GPUs are 40GB A100s. If you wish to explore different Liqid configurations, please reserve the nodes and submit a help ticket so that we can do the reconfiguration for you, as it is not yet supported programmatically. Another way you can experiment with disaggregated hardware on the testbed is with the Haswell InfiniBand nodes at TACC.
New Project Categorization. We are always trying to better understand what our users are working on so that we may make better prioritization and purchasing decisions. To help us develop that understanding, when you make a new project you will be asked to select a project “tag” from a list of Computer Science research fields. You can also update the tag for your existing projects on your project’s page. If you don’t do this by 09/16 we will propose a tag for you, but if you don’t like it, you are free to update it yourself, or contact us at help@chameleoncloud.org.
Appliance news. We have several changes on the appliance front this month. First, we have a JupyterHub Trovi artifact that allows you to create your own standalone JupyterHub on Chameleon resources. We also used to have a JupyterHub appliance in the Appliance Catalog, but the users voted with their feet, and we are now officially ending support for the appliance in favor of the Trovi artifact. Second, if you ever had trouble ssh-ing to Ubuntu instances with two networks attached, you will be glad to know that we’ve fixed this issue and released new versions of the Chameleon-supported Ubuntu images, including the ARM and CUDA images. In addition, we have dropped the Ubuntu16.04 and CentOS8 images because those systems are no longer supported; please use Ubuntu18.04, Ubuntu20.04, and CentOS8-stream instead. Last but not least, Chameleon has started using customized Ironic Python Agent images to better serve all node types at both core sites and associate sites. Please see the CHI-in-a-Box documentation for details.
CHI@NU Xena upgrade. The CHI@NU site has been upgraded to a newer version of OpenStack, Xena. CHI@NU has nodes with Mellanox ConnectX-5 100GE cards, making it the ideal site for 100G networking experiments.
Heads up on A100 hardware problem. Last month we announced nodes with multiple A100 GPUs at CHI@UC. We are happy to see that our users are excited to use this new hardware; however, we’ve discovered that it comes with a problem: when it is used with Ubuntu 20.04, an operating system issue prevents some of the GPUs from appearing. We are actively pursuing a resolution of this problem with Dell, but for the time being we’ve found a workaround for you in this support article. Alternatively, you may use a CentOS CUDA image, which does not have this problem.
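If you are unsure whether your instance is affected, one quick check is to compare the number of GPUs the NVIDIA driver reports with the number you expect on the node. The following is a minimal sketch, not an official Chameleon tool: it assumes the NVIDIA driver and the `nvidia-smi` utility are installed on the instance (as we would expect on a CUDA image), and the function names and the `expected` parameter are purely illustrative.

```python
import shutil
import subprocess


def count_gpus(listing: str) -> int:
    """Count the GPU entries in the text printed by `nvidia-smi -L`.

    Each detected GPU is listed on its own line starting with "GPU <index>:".
    """
    return sum(
        1 for line in listing.splitlines() if line.strip().startswith("GPU ")
    )


def visible_gpu_count() -> int:
    """Run `nvidia-smi -L` and return how many GPUs it lists."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found; is the NVIDIA driver installed?")
    result = subprocess.run(
        ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
    )
    return count_gpus(result.stdout)


def missing_gpus(expected: int, found: int) -> int:
    """Return how many GPUs are missing (0 if all expected GPUs are visible).

    `expected` is an assumption you supply yourself: the number of GPUs
    your reserved node is supposed to have.
    """
    return max(expected - found, 0)
```

On an instance, calling `missing_gpus(expected, visible_gpu_count())` with the GPU count of your node as `expected` returns a nonzero value when some GPUs have not appeared, in which case the workaround or the CentOS CUDA image mentioned above is worth trying.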
Enjoy the last days of summer, and don’t forget to enjoy the new hardware too!