BIMSA Compute Cluster Usage Guide
Introduction
The BIMSA Compute Cluster (BCC) is a mid-powered, shared, scheduled cluster that serves BIMSA's academic and research activities. In November 2023, the IT department at BIMSA surveyed faculty members on their compute needs. IT then consolidated those needs, explored possible solutions, obtained a budget allocation from management, prepared and ran a fair tender, evaluated the bids to select the best solution, submitted it for approval, and completed the procurement process. The cluster was then acquired, configured, and deployed.
Basic Server Configuration
1. Compute Servers (6 sets)
Hardware | |
CPU | Dual Intel Xeon Gold 6444Y (3.6GHz, 16C/32T) |
Memory | 768GB (24 x 32GB RDIMM 4800MT/s) |
Storage | 480GB SSD mounted on / |
Network | 10Gbps uplink (to login nodes) |
GPU | Dual NVIDIA L40 PCIe GPUs |
Software | |
OS | Ubuntu 22.04 LTS |
System Packages | build-essential |
GPU Driver | NVIDIA 550.90.07 |
CUDA | 11.8 |
cuDNN | 9.2.1.18 |
Anaconda | 2024.06-1 |
PyTorch | 2.4.0 |
2. Login Servers (2 sets)
Hardware | |
CPU | Dual Intel Xeon Gold 6442Y (2.6GHz, 24C/48T) |
Memory | 128GB (4 x 32GB RDIMM 4800MT/s) |
Storage | 480GB SSD mounted on / |
Network | 10Gbps uplink (to user network) |
Software | |
OS | Ubuntu 22.04 LTS |
System Packages | build-essential |
Anaconda | 2024.06-1 |
Basic Cluster Usage Guide
Prerequisites
Users of this cluster should be familiar with the following:
1. The Linux operating system (especially Ubuntu) and command-line operations
2. The SLURM job-scheduling system (sample tutorial)
3. Building your own compute environment in your home directory without root privileges
Access BCC
Once your account application has been approved, you may use any SSH client to log in to sls.bimsa.net on port 22. If you are connecting from outside BIMSA, make sure you connect to the BIMSA VPN service first.
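As a minimal sketch of the login step with the OpenSSH command-line client: the host and port come from this guide, while the account name `your_username` is a placeholder assumed for illustration. The snippet only prints the command so that it can be checked before use.

```shell
# Placeholder account name (an assumption); replace it with the account issued to you.
BCC_USER="your_username"
BCC_HOST="sls.bimsa.net"   # login address given in this guide
BCC_PORT=22                # SSH port given in this guide

# Assemble the login command and print it for inspection.
BCC_SSH_CMD="ssh -p ${BCC_PORT} ${BCC_USER}@${BCC_HOST}"
echo "${BCC_SSH_CMD}"

# To connect, run the printed command from inside BIMSA or over the BIMSA VPN:
#   ssh -p 22 your_username@sls.bimsa.net
```

Any other SSH client works the same way, as long as it is pointed at the same host and port.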
Build Your Compute Environment
All BCC nodes (login and compute) share the same user home directories. Anaconda is already installed on the cluster, so you may build your own virtual environment in your home directory to run your computations.
Alternatively, you may compile your own programs and place them in your home directory.
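One possible way to create such an environment is sketched below using conda's `--prefix` option, which places the environment at an explicit path under the home directory. The environment name `myenv`, the `~/envs` location, and the Python version are illustrative assumptions, not cluster requirements.

```shell
# Hypothetical environment name and location (assumptions); adjust to your project.
ENV_NAME="myenv"
ENV_PREFIX="${HOME}/envs/${ENV_NAME}"

if command -v conda >/dev/null 2>&1; then
    # Create the environment under the home directory; no root privileges needed.
    # The Python version here is only an example.
    conda create --yes --prefix "${ENV_PREFIX}" python=3.11

    # Make "conda activate" available in a non-interactive shell, then activate.
    source "$(conda info --base)/etc/profile.d/conda.sh"
    conda activate "${ENV_PREFIX}"
else
    echo "conda not found on PATH; run this on a BCC node" >&2
fi
```

Because the home directory is shared, an environment created on a login node should be visible at the same path on the compute nodes.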
Use SLURM
The Simple Linux Utility for Resource Management (SLURM) is the resource management and job scheduling system of BCC. All compute jobs must be submitted through SLURM.
Here is a sample script (test.sh):
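#!/bin/bash
# Write job output to job.<jobid>.out (%j expands to the job ID)
#SBATCH -o job.%j.out
# Job name shown in the queue
#SBATCH -J myFirstGPUJob
# Run on a single node with 6 tasks
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=6
# Request one GPU
#SBATCH --gres=gpu:1
# Print the GPU(s) visible to the job
nvidia-smi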
Then submit the job with:
$ sbatch test.sh
After the job finishes, read its output with the following command, where [nn] is the job ID reported by sbatch:
$ cat job.[nn].out
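The submit-and-read cycle can also be scripted. The sketch below assumes the sample `test.sh` is in the current directory; the helper `job_output_file` is a hypothetical name introduced here for illustration. It relies on the standard SLURM commands `sbatch --parsable`, `squeue`, and `scancel`.

```shell
# Hypothetical helper: build the file name that "#SBATCH -o job.%j.out"
# produces for a given job ID.
job_output_file() {
    printf 'job.%s.out\n' "$1"
}

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints only the job ID (optionally followed by ";cluster").
    JOB_ID="$(sbatch --parsable test.sh)"
    JOB_ID="${JOB_ID%%;*}"
    echo "Submitted job ${JOB_ID}"

    # Show the job's state in the queue (PD = pending, R = running).
    squeue -j "${JOB_ID}"

    # Once the job has left the queue, read its output:
    #   cat "$(job_output_file "${JOB_ID}")"
    # To cancel a job that is no longer needed:
    #   scancel "${JOB_ID}"
else
    echo "sbatch not found; run this on a BCC login node" >&2
fi
```

Running `squeue -u "$USER"` lists all of your own pending and running jobs rather than a single one.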