Tutorials


1 - Setting up Environment from Scratch

A procedure to build an optimized Python from source and set up a development environment to run benchmarks.

Introduction

Most modern Linux systems come prepackaged with a version of Python 3. However, this version is typically deeply integrated into the operating system's ecosystem of tools: it may be a significantly older version of Python, and it may have been built without performance optimizations in order to maximize compatibility.

For benchmarking, it is desirable to have control over your Python build, so that program runs are both consistent and repeatable. Below are the steps to build Python 3.10.2 on a variety of hosts.

Setup

Configurations

This procedure assumes the following:

  1. You are building using bash.
  2. You have curl, make, and gcc installed, along with the C header files for openssl, bzip2, libffi, zlib, readline, sqlite3, llvm, ncurses, and xz.
  3. You have set the following environment variables (a sketch follows this list):
    1. BASE - Specifies the working directory for all operations. This procedure assumes ~/.local.
    2. PREFIX - Where you want the final Python instance to be placed. This procedure assumes ${BASE}/python/3.10.2.
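
The following sketch shows one way to define these variables and create the working directories. The paths are the defaults this procedure assumes; adjust them to taste.

# Define the working directories assumed by this procedure
export BASE=~/.local
export PREFIX=${BASE}/python/3.10.2

# Create the directory layout used by the build steps below
mkdir -p ${BASE}/src ${PREFIX}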

Build OpenSSL

# Fetch source code
curl -OL https://www.openssl.org/source/openssl-1.1.1m.tar.gz
tar -zxvf openssl-1.1.1m.tar.gz -C ${BASE}/src/
cd ${BASE}/src/openssl-1.1.1m/
./config --prefix=${BASE}/ssl --openssldir=${BASE}/ssl shared zlib
make
#make test
make install
make clean
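
To sanity-check the build before moving on, you can ask the freshly installed binary for its version; the expected output shown in the comment assumes the 1.1.1m build above succeeded.

# Verify the freshly built OpenSSL
${BASE}/ssl/bin/openssl version
# Expected output similar to: OpenSSL 1.1.1m  14 Dec 2021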

Build Python

curl -OL https://www.python.org/ftp/python/3.10.2/Python-3.10.2.tar.xz
tar Jxvf Python-3.10.2.tar.xz -C ${BASE}/src/
cd ${BASE}/src/Python-3.10.2
export CPPFLAGS=" -I${BASE}/ssl/include "
export LDFLAGS=" -L${BASE}/ssl/lib "
export LD_LIBRARY_PATH=${BASE}/ssl/lib:$LD_LIBRARY_PATH
./configure --prefix=${PREFIX} --enable-optimizations --with-lto --with-computed-gotos --with-system-ffi

make -j "$(nproc)"
make test
make altinstall
make clean

mkdir -p ${PREFIX}/bin
(cd ${PREFIX}/bin ; ln -s python3.10 python)

cat <<EOF > ${BASE}/setup.source
#!/bin/bash

BASE=$BASE
PREFIX=$PREFIX

export LD_LIBRARY_PATH=\$BASE/ssl/lib:\$PREFIX/lib:\$LD_LIBRARY_PATH
export PATH=\$PREFIX/bin:\$PATH
EOF
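
To use the new interpreter in a fresh shell, source the generated file and confirm that both Python and its OpenSSL linkage are the ones you just built; this is a minimal check, assuming the steps above completed without errors.

# Activate the custom build in the current shell
source ${BASE}/setup.source

# Confirm the interpreter version and its linked OpenSSL
python -V                                           # expect: Python 3.10.2
python -c "import ssl; print(ssl.OPENSSL_VERSION)"  # expect: OpenSSL 1.1.1m ...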

Archive Build

# Create a compressed archive of the complete build
tar -Jcvf python-3.10.2.tar.xz ${BASE}

Common Setup Procedures

To bootstrap your new environment with the tools frequently used during development, follow the procedures below.

Assumption: the variable BASE is set to your home directory, and python3.10 is on your PATH.

mkdir -p ${BASE}/ENV3
python3.10 -m venv --prompt ENV3 ${BASE}/ENV3

source ${BASE}/ENV3/bin/activate
pip install -U pip
pip install cloudmesh-installer

mkdir -p ~/git/cm
(cd ~/git/cm && cloudmesh-installer get cms)

echo "alias ENV3=\"source $BASE/ENV3/bin/activate\"" >> ~/.bash_profile
echo "alias EQ=\"cd $BASE/git\"" >> ~/.bash_profile
source ~/.bash_profile

EQ

git clone git@github.com:laszewsk/mlcommons.git
git clone git@github.com:laszewsk/mlcommons-data-earthquake.git

pip install -r mlcommons/examples/mnist-tensorflow/requirements.txt
pip install -r mlcommons/benchmarks/earthquake/new/requirements.txt
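
After the installs complete, a quick sanity check (assuming the cloudmesh install above succeeded) is to confirm that the virtual environment's interpreter and the cms command are the ones on your PATH:

# Confirm the virtual environment's interpreter is active
which python
python --version

# cloudmesh-installer places the cms command on the PATH
cms help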

2 - Running MLCube on Rivanna

A gentle introduction to running MLCube on Rivanna

In this guide, we introduce MLCube and demonstrate how to run workloads on Rivanna using the Singularity backend.

Running models consistently across platforms requires users to have a commanding knowledge of the configuration of not only the source code, but also of the hardware ecosystem. It is not uncommon to encounter a project where configuring your system to get reproducible results is error prone and time consuming, and ultimately unproductive for the analyst.

MLCube(tm) is a contract-driven approach to addressing system configuration details. It establishes a standard for generating consistent models and a mechanism for delivering these models to others, allowing them to benefit from a solved environment.

Getting Started

First you need to install a runner for MLCube. MLCube supports many backend runners, and workloads should run on each of them equally well.

For this walkthrough, we target the Rivanna HPC ecosystem, so we'll leverage its lmod and singularity tooling.

Python install

We have two choices for installing Python: one is pyenv, the other is conda.

If you decide to install it with pyenv, use the following steps

pyenv install 3.9.7
pyenv global 3.9.7
python -m venv --prompt mlcube venv
source venv/bin/activate
python -m pip install mlcube-singularity

If you decide to install it with conda, use the following steps

conda create -n mlcube -c conda-forge python=3.9.7
conda activate mlcube
# We use pip as conda does not have an mlcube repository
python -m pip install mlcube-singularity

Note that the mlcube-singularity package can and should be installed within your target environment.

Using MLCube

Once you have run the above commands, the mlcube command is available on your PATH, and you can list the runners MLCube has registered:

$ mlcube config --get runners
# System settings file path = /home/<username>/mlcube.yaml
# singularity:
#   pkg: mlcube_singularity

At this point you can run through any of the example projects that the mlcube project hosts at https://github.com/mlcommons/mlcube_examples.git.

Below is a set of procedures to run their hello world project.

git clone https://github.com/mlcommons/mlcube_examples.git
cd ./mlcube_examples/hello_world

mlcube run --mlcube=. --task=hello --platform=singularity
# No output expected.

mlcube run --mlcube=. --task=bye --platform=singularity
# No output expected.

cat ./workspace/chats/chat_with_alice.txt
# You should see some log lines in this file.

Nontrivial example - Earthquake Data

3 - Singularity Collection

A collection of information about Singularity

User Guides

Add Gregor's info

Presentations

Organize

Containers

4 - Installing Singularity on Windows Workstations

A procedure to get singularity running on WSL2

Singularity is a container-based runtime engine designed to run in permission-constrained environments. Singularity provides similar functions to systems like Docker, Containerd, and Podman: it shares the host computer's kernel and drivers and provides a filesystem based on overlaying files. These overlays partition the software so that it executes in isolation on the host as a type of "container".

However, Singularity differs from typical container runtime engines, most notably:

  1. Singularity was designed to be run as a normal, non-root user and does not depend on a daemon.
  2. Singularity does not natively support OCI images (the typical container image format target), and uses its own SIF format; but OCI images can be imported (a short example follows this list).
  3. Singularity container images are distributed as files.
  4. Singularity was designed to create a container platform that works from laptops to HPC clusters.
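
For example, to illustrate item 2: an OCI image from Docker Hub can be pulled and converted to a local SIF file in one step.

# Pull an OCI image from Docker Hub and convert it to a local SIF file
singularity pull docker://ubuntu:20.04

# The result is an ordinary file that can be copied or shared
ls -lh ubuntu_20.04.sif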

(Windows Only) Setup on Window Subsystem for Linux

While not the usual place to install Singularity, it is useful to be able to run commands on a local machine to validate command structure and workflows. Singularity does not run natively on Windows, but with Windows 10 Professional you can build Singularity inside a WSL2 distribution and run its commands on your workstation.

Enabling WSL2

To enable WSL2, follow Microsoft's instructions.

Any version of Linux will work with Singularity, but we recommend using Ubuntu.

Building Singularity

This process has been automated in ./tools/install-singularity-wsl2.bash if you're running Ubuntu. However, the general flow of the instructions is as follows (a condensed sketch appears after this list):

  1. Install the singularity code dependencies (gcc, libssl, gpgme, squashfs, seccomp, wget, pkg-config, git, and cryptsetup)
  2. Install a modern version of golang.
  3. Download the Singularity source code from https://github.com/apptainer/singularity.git
  4. Run ./mconfig from the singularity codebase
  5. Run make && make install from the ./builddir directory.
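
A condensed sketch of those steps on Ubuntu follows; the Go version and the Ubuntu package names are assumptions to adjust for your distribution, not pinned requirements.

# 1. Install the build dependencies (Ubuntu package names)
sudo apt-get update
sudo apt-get install -y build-essential libssl-dev uuid-dev libgpgme-dev \
    squashfs-tools libseccomp-dev wget pkg-config git cryptsetup-bin

# 2. Install a modern Go toolchain (the version is an assumption; adjust as needed)
wget https://go.dev/dl/go1.17.6.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.17.6.linux-amd64.tar.gz
export PATH=/usr/local/go/bin:$PATH

# 3. Download the Singularity source code
git clone https://github.com/apptainer/singularity.git
cd singularity

# 4. Configure the build
./mconfig

# 5. Build and install from the generated build directory
cd builddir
make
sudo make install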

These procedures are covered more thoroughly on the Apptainer website at: https://apptainer.org/docs/user/main/quick_start.html#quick-installation-steps

Run your first singularity container

Once the build has completed, you should be able to run the singularity command. Try running

$ singularity run docker://godlovedc/lolcow

If this command was successful you should see something similar to the following:

 _____________________________________
/ You recoil from the crude; you tend \
\ naturally toward the exquisite.     /
 -------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

5 - Running GPU Batch jobs on Rivanna

A short introduction on how to run GPU Jobs on Rivanna

We explain how to run GPU batch jobs using different GPU cards on Rivanna. Rivanna is a supercomputer at the University of Virginia. This tutorial is only useful if you can get an account on it. The official documentation is available at

However, it contains some issues and does not explain certain important aspects of using GPUs on the cluster. Therefore, this guide has been created.

PLEASE HELP US IMPROVE THIS GUIDE

Introduction

Rivanna is the High-Performance Computing (HPC) cluster managed by the University of Virginia's Research Computing group. Rivanna is composed of 575 nodes with a total of 20,476 cores and 8PB of storage of various types. Table 1 shows an overview of the compute nodes. Some of the compute nodes also include GPUs:

Table 1: GPUs on Rivanna

| Cores/Node | Memory/Node | Specialty Hardware | GPU Memory/Device | GPU Devices/Node | # of Nodes |
|------------|-------------|--------------------|-------------------|------------------|------------|
| 40         | 354GB       | -                  | -                 | -                | 1          |
| 20         | 127GB       | -                  | -                 | -                | 115        |
| 28         | 255GB       | -                  | -                 | -                | 25         |
| 40         | 768GB       | -                  | -                 | -                | 34         |
| 40         | 384GB       | -                  | -                 | -                | 348        |
| 24         | 550GB       | -                  | -                 | -                | 4          |
| 16         | 1000GB      | -                  | -                 | -                | 5          |
| 48         | 1500GB      | -                  | -                 | -                | 6          |
| 64         | 180GB       | KNL                | -                 | -                | 8          |
| 128        | 1000GB      | GPU: A100          | 40GB              | 8                | 2          |
| 28         | 255GB       | GPU: K80           | 11GB              | 8                | 9          |
| 28         | 255GB       | GPU: P100          | 12GB              | 4                | 3          |
| 40         | 383GB       | GPU: RTX 2080 Ti   | 11GB              | 10               | 2          |
| 28         | 188GB       | GPU: V100          | 16GB              | 4                | 1          |
| 40         | 384GB       | GPU: V100          | 32GB              | 4                | 12         |

*) This information may be outdated

Access to Rivanna

Access to Rivanna is secured by the University of Virginia's VPN. UVA offers two different VPNs; we recommend that you install the UVA Anywhere VPN, which is available for Linux, macOS, and Windows.

After installation, you have to start the VPN. Then you can use a terminal to access Rivanna via ssh. If you have not used ssh before, we encourage you to read about it and explore commands such as ssh, ssh-keygen, ssh-copy-id, ssh-agent, and ssh-add.

We will not provide an extensive tutorial on how to use ssh (although you are welcome to contribute one). Instead, we summarize the most important steps:

  1. Create an ssh key if you have not done that before

    $ ssh-keygen
    

    It is VERY important that you create the key with a strong passphrase.

  2. Add an abbreviation for Rivanna to your ~/.ssh/config file

    Use your favorite editor. Mine is emacs

    emacs ~/.ssh/config

Copy and paste the following into that file, where abc1de is to be substituted with your UVA computing ID.

Host rivanna
  User abc1de
  HostName rivanna.hpc.virginia.edu
  IdentityFile ~/.ssh/id_rsa
    

This will allow you to use rivanna instead of abc1de@rivanna.hpc.virginia.edu. The next steps assume you have done this and can use just rivanna.

  3. Copy your public key to rivanna

    $ ssh-copy-id rivanna
    

    This will copy your public key into the rivanna:~/.ssh/authorized_keys file.

  4. After this step, you can use your keys to authenticate. You still need to be using the VPN, though.

The most convenient systems for this are macOS and Ubuntu, which already ship with ssh-agent and keychain. On Windows under Git Bash you need to start the agent yourself with

    $ eval `ssh-agent`
    

First, you add the key to your session, so you do not have to constantly type in the passphrase. Use the command

    $ ssh-add
    

To test whether it works, just say

$ ssh rivanna hostname
    

    which will print the hostname of Rivanna

    In case your machine does not run ssh-agent, you can start it before you type in the ssh-add command with

    $ eval `ssh-agent`
    

    If everything is set up correctly, the hostname test will return a string similar to

    udc-ba35-36
    
    
  5. To log in to Rivanna, simply say

    $ ssh rivanna
    

    If this does not work, you have made a mistake. Please review the previous steps carefully.

Running Jobs on Rivanna

Jobs on Rivanna can be scheduled through Slurm either as batch jobs or as interactive jobs. To achieve this, one needs to load the software first and create special scripts that submit the jobs to nodes containing the GPUs you specify.

The user documentation about this is provided here:

However, at the time we reviewed it, it had some mistakes and limitations that we hope to overcome here.

Modules

Rivanna's default mechanism for software configuration management is modules. The UVA modules documentation is provided through this link.

Modules provide the ability to load a particular software stack and configuration into your shell as well as into your batch jobs. Multiple modules can be loaded into your environment; they are loaded in order.
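
For example, to load a compiler and MPI stack in order and then inspect what is active in the current shell (the versions are taken from the spider output shown later in this guide):

# Load modules in dependency order, then show what is active
module load gcc/11.2.0
module load openmpi/3.1.6
module list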

To list the available modules, log into Rivanna and use the command

$ module avail

To list the Python-related modules, use

$ module avail py

It will return all modules that have py in their name. Please choose those that look like Python modules.

To probe for deep learning modules, use something similar to

$ module avail cuda tensorflow pytorch mxnet nvidia cudnn

Python

Different versions of Python are available.

To load Python 3.8 we can say

$ module load anaconda/2020.11-py3.8

To load Python 3.10.0 we can say

$ module load anaconda
$ conda create -n py3.10 python=3.10
$ source activate py3.10
$ python -V
Python 3.10.0

Please note that at the time of writing, Anaconda did not support Python 3.10.2, which I personally run on my computer from a python.org build.

Adding Modules with Spider

Details about modules can be identified with the module spider command. If you type it in, you get a list of many available configurations. Spider can take a keyword and lists all available versions the keyword matches. Let us demonstrate it on python:

$ module spider python
----------------------------------------------------------------------------
  python:
----------------------------------------------------------------------------
    Description:
      Python is a programming language that lets you work more effectively.

     Versions:
        python/2.7.16
        python/3.6.6
        python/3.6.8
        python/3.7.7
        python/3.8.8
     Other possible modules matches:
        biopython  openslide-python  wxpython
----------------------------------------------------------------------------
...

For detailed information about a specific "python" package, use the module's full name:

$ module spider python/3.8.8

This will return a page with lots of information. The most important one for us is

 You will need to load all module(s) on any one of the lines below before the
 "python/3.8.8" module is available to load.

      gcc/11.2.0  openmpi/3.1.6
      gcc/9.2.0  cuda/11.0.228  openmpi/3.1.6
      gcc/9.2.0  mvapich2/2.3.3
      gcc/9.2.0  openmpi/3.1.6
      gcccuda/9.2.0_11.0.228  openmpi/3.1.6
      goolfc/9.2.0_3.1.6_11.0.228

Here you see the various combinations of modules that need to be loaded BEFORE you load python.

Thus, to properly load Python 3.8.8, you need to say (if this is the combination you chose):

module load gcc/11.2.0
module load openmpi/3.1.6
module load python/3.8.8

Modules for tensorflow

module load singularity/3.7.1
module load tensorflow/2.7.0

Modules for pytorch

module load singularity/3.7.1
module load pytorch/1.10.0

Containers

Rivanna uses Singularity as its container technology. The documentation specific to Singularity on Rivanna is available at this link.

Singularity also needs to be loaded as a module before it can be used.

Singularity containers can access GPUs via a passthrough using the NVIDIA drivers. Once you load singularity, you can use it as follows:

singularity <cmd> --nv <imagefile> <args>

The container will then be used inside a job; a concrete example follows.
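
For example, a quick GPU visibility check inside a TensorFlow container might look as follows; the image file name is an assumption, so adjust it to wherever the container image lives on your system.

# Ask TensorFlow inside the container which GPUs it can see
# (the image path is an assumption; adjust it for your system)
singularity exec --nv tensorflow-2.7.0.sif \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"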

Jobs

More detail specific to jobs on Rivanna is provided here.

Before we start an example, we explain how to create a job, first as a job description file, and then how to submit it to Rivanna. We use a simple MNIST example that showcases the aspects of successfully running a job on the machine. We will therefore focus on creating jobs that use GPUs.

Rivanna uses the SLURM job scheduler for allocating submitted jobs. Jobs are charged SUs (Service Units) from a Rivanna compute allocation. Please contact your supervisor for the name of the allocation. Gregor's allocation is named

  • bii_dsc

and it currently contains 100k SUs. Students from the UVA capstone class will have the following allocation:

  • ds6011-sp22-002

To see the available SUs for your project, please use the commands

  • allocations
  • allocations -a <allocation_name>

SUs can be requested via the Standard Allocation Renewal form. Due to this limitation, we encourage you to plan your runs and avoid unnecessary ones. General instructions for submitting SLURM jobs are located at

To request that the job be submitted to the GPU partition, you use the option

-p gpu

The A100 GPUs are a requestable resource. To request them, you add the gres option with the number of A100 GPUs requested (1 through 8). For example, to request 2 A100 GPUs:

--gres=gpu:a100:2

If you are using a SLURM script to submit the job, the options would appear as follows. Your script will need to specify other options, such as the allocation to charge, as seen in the sample scripts at the above URL:

#SBATCH -p gpu
#SBATCH --gres=gpu:a100:2
#SBATCH -A bii_dsc
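
Putting the pieces together, a minimal complete job script might look like the following sketch; the job name, time limit, module versions, image path, and script name are assumptions to adapt to your own run.

#!/bin/bash
#SBATCH --job-name=mnist-gpu    # job name (an assumption; pick your own)
#SBATCH -p gpu                  # submit to the GPU partition
#SBATCH --gres=gpu:a100:2       # request 2 A100 GPUs
#SBATCH -A bii_dsc              # allocation to charge
#SBATCH -t 01:00:00             # wall-clock limit (an assumption)
#SBATCH -o %x-%j.out            # stdout file named after job name and id

# Load the software stack for the job
module load singularity/3.7.1
module load tensorflow/2.7.0

# Run the workload inside the container with GPU passthrough
# (image path and script name are assumptions)
singularity exec --nv tensorflow-2.7.0.sif python mnist.py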

Interactive Jobs

Please avoid running interactive jobs, as they may waste SUs; the allocation is charged even while you keep an A100 idle.

Although Research Computing also offers interactive apps such as JupyterLab, RStudio, CodeServer, Blender, and Mathematica via the Open OnDemand portal at:

we ask you to avoid using them for benchmarks.

To request the use of the A100s via Open OnDemand, first log in to the Open OnDemand portal and select the desired interactive app. You will be presented with a form to complete. Currently, you would

  • select gpu for the Rivanna partition,
  • select NVIDIA A100 from the Optional: GPU type for GPU partition pulldown menu, and enter the number of desired GPUs in Optional: Number of GPUs.

Once you've completed the form, click the Launch button and your session will be launched. The session will start once the resources are available.

Using the MNIST example

For now, the code is located at:

A sample slurm job specification is included at

To run it, use the command

$ sbatch mnist-rivanna-a100.slurm

NOTE: We want to improve the script to make sure it is running on a GPU and add GPU placement commands into the code.
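
As a starting point for that improvement, a one-liner that fails fast when no GPU is visible could be added to the script before the training run; this is a sketch and assumes a TensorFlow workload.

# Abort the job early if TensorFlow cannot see any GPU
python -c "import sys, tensorflow as tf; sys.exit(0 if tf.config.list_physical_devices('GPU') else 1)" \
    || { echo "No GPU visible to TensorFlow; aborting."; exit 1; }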

Custom Version of TensorFlow

https://www.rc.virginia.edu/userinfo/rivanna/software/tensorflow/

Keras on Rivanna

Building a Python version from Source

Why do you want to do this?

How is it done?

We have developed the following script to create the environment on Rivanna: http://example.com

You can download the script from git with wget

wget ....

and place it in a directory. Running it with

$ python-install.py --version="3.10.2" --host=rivanna

will create an optimized version for Rivanna. Other options can be found with python-install.py help.

Where do you want to place it?

scratch vs home dir

How do you access it?

deployment into your own environment

What is the performance gain?

Benchmarks vs. the various versions of Python available here. This needs to be reproducible whenever there is a new version of Python.

How to cite if you use this

This work was conducted as part of the MLCommons Science Benchmark earthquake project. If you would like to reuse it, we would appreciate it if you cite the following paper:

@TechReport{mlcommons-earthquake,
  author      = {Thomas Butler and Robert Knuuti and
                 Jake Kolessar and Geoffrey C. Fox and
                 Gregor von Laszewski and Judy Fox},
  title       = {MLCommons Earthquake Science Benchmark},
  institution = {MLCommons Science Working Group},
  year        = 2022,
  type        = {Report by University of Virginia},
  address     = {Charlottesville, VA},
  month       = may,
  note        = {The order of the authors and url location may change},
  annote      = {Version: draft},
  url         = {https://github.com/cyberaide/paper-capstone-mlcommons}
}

6 - Installing nvcc on Ubuntu 20.04

A description of how to install nvcc as part of the CUDA toolkit on Ubuntu 20.04.

Installation

$ sudo wget -O /etc/apt/preferences.d/cuda-repository-pin-600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub
$ sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /"
$ sudo apt update
$ sudo apt install cuda

Add it to your path

$ echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc

Check CUDA version:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

7 - Installing TensorFlow on Windows 10

A description of how to install TensorFlow on Windows 10.

Installation

$ TBD

Add it to your path

$ TBD

Check CUDA version: