# BigPurple Enroot User Guide and Setup
## Quick Container Setup with Enroot

### 1.A Load the required modules on BigPurple
``` 
module add enroot parallel jq 
```
### 1.B Choose an image store, personal or lab [see storing and sharing](#share-a-container-across-a-lab):
```
# Personal image store
export XDG_RUNTIME_DIR=$HOME
export XDG_CONFIG_HOME=$HOME/.config
export XDG_CACHE_HOME=$HOME/.cache
export XDG_DATA_HOME=$HOME/.local/share
```
```
# Images in a shared lab repo
export XDG_RUNTIME_DIR=/gpfs/data/oermannlab/public_data/enroot/
export XDG_CONFIG_HOME=/gpfs/data/oermannlab/public_data/enroot/.config
export XDG_CACHE_HOME=/gpfs/data/oermannlab/public_data/enroot/.cache
export XDG_DATA_HOME=/gpfs/data/oermannlab/public_data/enroot/.local/share
```
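Enroot resolves its storage locations from these XDG variables, falling back to built-in defaults when they are unset. A quick sanity check, sketched here with enroot's documented fallback paths:

```shell
# Print where enroot will look for image data and its cache, given the
# current XDG variables (defaults shown match enroot's documented fallbacks)
echo "data:  ${XDG_DATA_HOME:-$HOME/.local/share}/enroot"
echo "cache: ${XDG_CACHE_HOME:-$HOME/.cache}/enroot"
```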
### 2. Request a node with srun or sbatch (a GPU node is required for nvidia-smi)
``` 
srun -p a100_short --nodes=1 -t 4:00:00 --tasks=1 --cpus-per-task=4 --tasks-per-node=1 --gres=gpu:a100:1 --mem=32G --pty bash
```
``` 
sbatch my_slurm_batch_file.sbatch 
```

### 3. Import a .sqsh from Docker (Ubuntu, etc.) into the local directory
```
enroot import --output name_of_sqsh_import.sqsh docker://ubuntu:20.04
```
or a base CUDA container:
```
enroot import --output name_of_sqsh_import.sqsh docker://nvidia/cuda
```
### 4. Create an image from a .sqsh
``` 
enroot create --name my_enroot_image ./name_of_sqsh_import.sqsh 
```

### 5. Confirm image is created
``` 
enroot list | grep my_enroot_image 
```
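Note that a plain `grep` will also match longer image names that merely contain yours; `grep -x` (whole-line match) avoids that, as this small sketch shows:

```shell
# -x requires the whole line to match, so a similarly named image
# ("my_enroot_image_v2") is not counted as a hit; prints 1
printf 'my_enroot_image\nmy_enroot_image_v2\n' | grep -cx my_enroot_image
```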

### 6. Launch a simple container
```
enroot start -r -w -m .:/mnt my_enroot_image
```
- **-r**: remap the current user to root inside the container (sudo-like access)
- **-w**: make the container root filesystem writable
- **-m**: bind mount, `source/path/outside/container:/destination/in/container`

## Advanced Enroot Setup with Slurm and Launch Configs
In this section we will build an enroot container with a conda environment and PyTorch, then run a simple torch operation on BigPurple GPUs.

### 1. Create the torch test file
```
cat > test_torch.py <<'EOF'
import torch

# Fall back to CPU when no GPU is visible
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'using device: {device}')

a = torch.rand(10).to(device)
b = torch.rand(10).to(device)
c = torch.matmul(a, b)
print(a)
print(b)
print(c)
EOF
```

### 2. Create the enroot config file my_enroot_conf.batch [read more](https://github.com/NVIDIA/enroot/blob/master/doc/cmd/start.md)
A quoted heredoc is used here so the nested quotes survive and `${HOME}`/`${PWD}` are expanded inside the container, not at write time:
```
cat > my_enroot_conf.batch <<'EOF'
#ENROOT_ROOTFS=ubuntu
#ENROOT_REMAP_ROOT=y
#ENROOT_ROOTFS_WRITABLE=y

echo "hello in home ${HOME}"
# If NVIDIA_VISIBLE_DEVICES=all, you must have a GPU in your environment -> test by running: nvidia-smi
export NVIDIA_VISIBLE_DEVICES=all
export THIS_IS_A_TEST=test_variable

environ() {
    env
}

mounts() {
    echo "${PWD} /mnt none bind"
    # Mount BigPurple binaries into /mnt/import_execs (if a binary needs
    # libraries, they must be installed or imported separately)
    echo "/usr/bin/ /mnt/import_execs/ none bind"
}
EOF
```
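Each line emitted by `mounts()` uses fstab-style fields (`source target fstype flags`). As a hypothetical example (the dataset path below is made up), a shared directory could be exposed read-only:

```shell
mounts() {
    echo "${PWD} /mnt none bind"
    # Hypothetical extra mount: expose a shared dataset read-only
    echo "/gpfs/data/shared_datasets /mnt/datasets none bind,ro"
}
```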
### 3. Utilize a GPU
```
srun -p a100_short --nodes=1 -t 4:00:00 --tasks=1 --cpus-per-task=4 --tasks-per-node=1 --gres=gpu:a100:1 --mem=32G --pty bash
```
### 4. Launch the container
``` 
enroot start --conf my_enroot_conf.batch my_enroot_image 
```

### 5. Install Conda *[see permanent state](#permanent-state-of-container)*
```
apt-get update
apt-get install -y wget 

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh --no-check-certificate -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
export PATH="$PATH:$HOME/miniconda3/bin"   # note: ~ is not expanded inside quotes, so use $HOME
conda init
source ~/.bashrc 

pip3 install torch numpy
```
### 6. Run the python file
```
python /mnt/test_torch.py
```
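The device-selection line in `test_torch.py` follows the standard PyTorch fallback pattern (the device string is `'cuda'`, not `'gpu'`). A minimal sketch of the pattern, wrapped so it also degrades gracefully on machines where torch is not installed:

```python
# Standard PyTorch device-fallback pattern; the try/except is only so this
# sketch runs even where torch is absent (an assumption, not part of the guide)
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"
print(f"using device: {device}")
```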

--- 

## SLURM for fully isolated runtimes
Combine the enroot config file `my_enroot_conf.batch` (via `--conf`) with a command passed after `--` so the container runs non-interactively inside a batch job:
```
#!/bin/bash
# Example SLURM script: torch_run.sbatch (the shebang must be the first line)
#SBATCH -p a100_short
#SBATCH --nodes=1
#SBATCH -t 4:00:00
#SBATCH --tasks=1
#SBATCH --cpus-per-task=4
#SBATCH --tasks-per-node=1
#SBATCH --gres=gpu:a100:1
#SBATCH --mem=32G

cd /gpfs/data/oermannlab/users/jjs815/ENROOT_EXAMPLES
module add enroot
module add jq
module add parallel

echo 'launching enroot'
enroot start --conf my_enroot_conf.batch my_enroot_image -- bash -c 'python /mnt/test_torch.py'
```

```
sbatch torch_run.sbatch
```


## Share a container across a lab
### Creating a lab enroot repository
Make sure your XDG environment variables point to the shared lab path (see step 1.B above).
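For a shared store the directory also needs group-friendly permissions. A toy sketch of the usual setgid setup, using a temp directory as a stand-in for the real lab path:

```shell
# Temp directory stands in for /gpfs/data/<lab>/public_data/enroot
repo=$(mktemp -d)
# 2775: group-writable, with the setgid bit so new files inherit the group
chmod 2775 "$repo"
ls -ld "$repo"
```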
### Export and send to a lab member 
``` 
enroot export --output ready_to_go_test1.sqsh new_test1 
```
```
cp ready_to_go_test1.sqsh /gpfs/home/kerberos_id/ready_to_go_test1.sqsh
```
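After copying a large `.sqsh`, it is worth confirming it arrived intact. A toy sketch of the checksum check, using temp files in place of the real image paths:

```shell
# Temp files stand in for the source and copied .sqsh images
src=$(mktemp); printf 'image-bytes' > "$src"
dst=$(mktemp); cp "$src" "$dst"
# Compare checksums; identical output means the copy is intact
[ "$(md5sum < "$src")" = "$(md5sum < "$dst")" ] && echo "copy verified"
```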

### Download to your local machine
```
rsync -avz --progress jjs815@bigpurple.hpc.nyumc.org:/gpfs/data/oermannlab/users/jjs815/RLBench/docs_ubuntu_test_image.sqsh ./ready_to_go_test1.sqsh
```
### Export to AWS?



---
# Notes
## Permanent state of container 
Once an image is created, any changes you make while shelled into the container persist. This means you only have to install conda during the initial setup; when you later share the image, log back in, or export it, your changes are preserved. This makes an enroot image an efficient way to build an environment for a project with specific library requirements (CUDA, Torch, etc.).
## To utilize a GPU in an enroot container
To utilize a GPU you must set **NVIDIA_VISIBLE_DEVICES=all** **within** the container environment. This can be done in the `.batch` config file or with `--env NVIDIA_VISIBLE_DEVICES=all`.
If you set **NVIDIA_VISIBLE_DEVICES=all** and do not have an available GPU runtime, you will get this error:
```
[ERROR] Command not found: nvidia-container-cli, see https://github.com/NVIDIA/libnvidia-container
[ERROR] /gpfs/share/apps/enroot/3.4.1/etc/enroot/hooks.d/98-nvidia.sh exited with return code 1
```
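One defensive pattern (a sketch, not part of the stock setup) is to export the variable only when a GPU driver is actually reachable:

```shell
# Export NVIDIA_VISIBLE_DEVICES=all only if nvidia-smi is on PATH;
# otherwise leave it unset so enroot's GPU hook is skipped entirely
if command -v nvidia-smi >/dev/null 2>&1; then
    export NVIDIA_VISIBLE_DEVICES=all
else
    unset NVIDIA_VISIBLE_DEVICES
fi
```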

You can always test this by running **nvidia-smi**
If you have a GPU it will look like this:

```
[jjs815@a100-4030 jjs815]$ nvidia-smi
Tue Jul 16 14:47:19 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100 80G...  On   | 00000000:E3:00.0 Off |                    0 |
| N/A   29C    P0    51W / 300W |      0MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```