- Workflow: GitHub, Collaboration, Testing, Reproducibility
- Software tools: IDE, CLI, Environment Management (Docker/Singularity)
- System Knowledge: HPC, GPUs
- CLI
- IDE (VSCode), Debugging, Testing, Notebooks
- Python (Conda), Programming with GPUs
- Resource covering a lot of the general software engineering material in more detail
- Practical examples given by Wenda Zhou
- Linux command reference: https://wangchujiang.com/linux-command/
- Programmable / automatable
- Translate well across platforms (laptop, desktop, HPC cluster, cloud)
- Shell is a programming language
- Main programming aspects we make use of: variables, redirection, globbing
- All variables in Shell are "strings"
- Define a variable by assignment: MY_VAR="my value"
- Access a variable's value by expansion (string replacement): echo "$HOME/$MY_VAR"
- Double quotes (") enable expansions, but single quotes (') don't: echo '$MY_VAR' prints the literal text $MY_VAR
- Pass variables to programs with export; otherwise the value is only set in your shell (see the sketch below)
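- A minimal sketch of the above, assuming a bash-like shell (MY_VAR is an illustrative name):
```bash
MY_VAR="my value"            # assignment: no spaces around '='
echo "$HOME/$MY_VAR"         # double quotes: expansions are performed
echo '$MY_VAR'               # single quotes: prints the literal text $MY_VAR
export MY_VAR                # export so that child processes see the variable
python -c 'import os; print(os.environ["MY_VAR"])'   # a child process can now read it
```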
- *: match any number of characters: echo *.txt lists all text files
- ?: match single character
- [1-9]: match any single digit
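- A short globbing sketch (the file names are hypothetical); the shell expands the pattern before the program runs:
```bash
echo *.txt          # expands to all files ending in .txt in the current directory
ls report_?.csv     # ? matches exactly one character, e.g. report_1.csv
ls log[1-9].txt     # [1-9] matches a single digit, e.g. log3.txt
```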
- Programs can communicate with each other through standard streams, identified by file descriptors: standard output = 1, standard error = 2 (standard input = 0)
- Redirect output to files: echo "Test" 1>output.txt (redirects standard output to output.txt); python script.py 2>&1 (redirects standard error to standard output)
- Be careful about buffering: python -u script.py runs without output buffering, so prints appear immediately
- Pipe output between programs: ls -l | wc -l
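- A sketch of redirection and pipes (script.py and the file names are placeholders):
```bash
echo "Test" 1> output.txt          # redirect stdout (fd 1) to a file
python script.py 2> errors.log     # redirect stderr (fd 2) to a file
python script.py > all.log 2>&1    # send stderr to wherever stdout currently goes
python -u script.py | tee run.log  # -u disables buffering; tee shows and saves output
ls -l | wc -l                      # pipe: count the lines printed by ls -l
```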
- ls: list files
- ls -lah: list all files, long listing format, human readable sizes
- Environment Variables: set of key-value pairs like HOME=/Users/Wendy
- Examples: $PATH: list of folders where executables are searched for; $CC: default C compiler; $LD_LIBRARY_PATH: search path for the dynamic linker; $TMPDIR: temporary directory
- Environment variables are not saved; use dotfiles to configure the shell on startup
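- A small sketch for inspecting and setting environment variables (the TMPDIR value is illustrative):
```bash
env | sort                   # list all environment variables of the current shell
echo "$PATH"                 # print a single variable
export TMPDIR="/tmp/$USER"   # set for this shell and its children; not persistent across sessions
```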
- All programs have a notion of working directory, the command pwd prints the current working directory that you are "in"
- Relative paths (resolved with respect to this directory) & absolute paths (do not depend on the working directory)
- Current directory ('.') & parent directory ('..')
- The PATH variable configures where the shell looks for commands
- The command which can be used to resolve a command to its executable: which ls → /bin/ls; which python → /Users/Wendy/miniconda3/bin/python
- Package managers (e.g. conda) use the PATH to manage environments
- You can also set custom paths if you install additional software; this can be very useful in shared environments (e.g. HPC)
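- A sketch of how command resolution via PATH works (the custom install directory is an assumption):
```bash
echo "$PATH"                           # colon-separated list of directories, searched in order
which ls                               # e.g. /bin/ls
which python                           # e.g. ~/miniconda3/envs/myenv/bin/python in a conda env
export PATH="$HOME/.local/bin:$PATH"   # prepend a custom install location
```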
- Dotfiles are hidden by default from ls and from glob expansion
- ~/.bashrc or ~/.zshrc: this file is run at the startup of every shell; add commands here to make "persistent" changes to the shell environment
- .condarc: This file is used to configure conda(e.g. channels)
- .gitignore: This file is used to exclude files from tracking by git
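- Hypothetical dotfile contents as a sketch; the alias, path, and ignore patterns are illustrative:
```bash
# ~/.bashrc (or ~/.zshrc): run at shell startup, so changes made here persist
export PATH="$HOME/.local/bin:$PATH"   # make custom installs visible
alias ll='ls -lah'                     # convenience alias

# a typical .gitignore fragment (one pattern per line):
#   __pycache__/
#   *.pyc
#   data/
```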
- Distributed source control system; keeps track of different versions of code; not so good for data
- Used through the command-line tool git
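- A minimal git workflow sketch; the repository URL, branch, and file names are placeholders:
```bash
git clone git@github.com:user/project.git   # get a copy of the repository
cd project
git checkout -b my-feature                  # work on a separate branch
git add mypackage/model.py                  # stage changes
git commit -m "Add model"                   # record a new version
git push -u origin my-feature               # share it (e.g. open a pull request on GitHub)
git pull                                    # fetch and merge collaborators' changes
```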
- Manage separate installations of python: Separate packages for different projects; On shared systems: Different installations for different users
- Reproducible environments: Share between collaborators; Ensure that results are stable through time
- Containers: Encapsulate entire system
- Be careful about system-specific packages (especially GPU builds!), e.g. you may install GPU-enabled packages on the cluster machine, but CPU-only versions on your laptop
- General purpose package manager: Manage the Python install; Manage native tool installs (e.g. compilers, CUDA)
- Prefer conda over pip when available: pip has trouble with native dependencies (e.g. CUDA)
- Avoid installing packages in the base environment
- Try to keep track of packages you have installed. Write an environment.yml file
- For faster installation, use mamba
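- A sketch of a reproducible conda workflow; the environment name and package list are illustrative:
```bash
# write down your dependencies in environment.yml
cat > environment.yml <<'EOF'
name: myproject
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - pytest
EOF

conda env create -f environment.yml   # or: mamba env create -f environment.yml (faster)
conda activate myproject              # keeps the base environment clean
```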
- Code Completion; Debugger; Test Explorer; Notebooks
- VSCode (lightweight multi-platform IDE); may also want to consider PyCharm; remote development capabilities: we use these extensively
- Getting started with VSCode
- Python Project Structure
- setup.py: used to install the package; to install it locally in editable mode, run: pip install -e .
- mypackage/: main folder for your package
- tests/: main folder for your tests
- requirements.txt: explicit listing of your dependencies (pip); can also use environment.yml (conda)
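- An illustrative layout and editable install; the project and package names are hypothetical, and pytest is assumed to be installed:
```bash
# myproject/
# ├── setup.py            # install metadata (or pyproject.toml)
# ├── requirements.txt    # pip dependencies (or environment.yml for conda)
# ├── mypackage/          # main folder for your package
# │   └── __init__.py
# └── tests/              # main folder for your tests
#     └── test_basic.py
cd myproject
pip install -e .    # editable install: imports resolve to your working copy
pytest tests/       # run the test suite
```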
- Leverage powerful hardware: GPUs/accelerators + CPU clusters + Storage for large datasets
- Remote shells (SSH): connect a text-based interface to a remote computer; encrypted + authenticated connection; authentication via username + password, key-based authentication (key file on the local computer), or hardware key (common in big companies); can also be used to tunnel other applications (e.g. file copy, git, Jupyter, VSCode)
- SSH configuration: SSH is configured through files found in the ~/.ssh folder. The ~/.ssh/config file allows for configuration (how SSH connects to various hosts; the basic configuration specifies username and host); public/private key pairs are stored in the ~/.ssh folder; ~/.ssh/id_rsa contains the private key
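- A sketch of key-based authentication and a basic config entry; the host name and user are placeholders:
```bash
ssh-keygen                                 # create a key pair under ~/.ssh (e.g. id_rsa / id_rsa.pub)
ssh-copy-id myuser@cluster.example.edu     # install the public key on the remote

cat >> ~/.ssh/config <<'EOF'
Host cluster
    HostName cluster.example.edu
    User myuser
EOF

ssh cluster                                # now this is enough to connect
```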
- SSH connection multiplexing: to avoid the need to reconnect every time, we can multiplex (reuse) one connection; edit the configuration in ~/.ssh/config
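- A multiplexing sketch for ~/.ssh/config; applying it to all hosts via `Host *` is a convenience choice, not a requirement:
```bash
cat >> ~/.ssh/config <<'EOF'
Host *
    ControlMaster auto                    # the first connection becomes the master
    ControlPath ~/.ssh/control-%r@%h:%p   # socket file used to share the connection
    ControlPersist 10m                    # keep the master alive 10 minutes after last use
EOF
```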
- SSH agent forwarding: we often need to authenticate from the remote machine to further services, e.g. laptop->cluster->GitHub or laptop->gateway->cluster; we can ask SSH to forward the authentication through agent forwarding: make sure to add the key to the agent
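- An agent forwarding sketch (the host alias and key path are assumptions):
```bash
eval "$(ssh-agent -s)"     # start an agent if one is not already running
ssh-add ~/.ssh/id_rsa      # add the key to the agent (easy to forget!)
ssh -A cluster             # -A forwards the agent; or put 'ForwardAgent yes' in ~/.ssh/config
# on the remote, the forwarded key can now authenticate to e.g. GitHub:
#   git clone git@github.com:user/project.git
```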
- SSH ProxyJump: in many large systems, we cannot connect to the remote directly, e.g. when there is a gateway / bastion host; on NYU HPC, you must connect through gw.hpc.nyu.edu when outside the VPN; we can ask SSH to proxy the connection through another host using the ProxyJump option
- nyugateway (ProxyJump) --> greene (large HPC cluster) --> greeneburst --> burstinstance (GCP on Greene)
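- A ProxyJump sketch for this chain; the user name is a placeholder and the host names should be checked against the current NYU HPC documentation:
```bash
cat >> ~/.ssh/config <<'EOF'
Host nyugateway
    HostName gw.hpc.nyu.edu
    User mynetid
Host greene
    HostName greene.hpc.nyu.edu
    User mynetid
    ProxyJump nyugateway       # hop through the gateway when outside the VPN
EOF

ssh greene                     # connects via nyugateway automatically
```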
- Greene cluster (large HPC cluster): ~550 CPU machines (48 cores each); ~60 GPU machines
- Slurm: System to specify jobs for processing on the cluster; Best suited for batch processing; Used to control allocations of GCP instances
- GCP on Greene: NYU HPC provides a service to access Google Cloud resources through Slurm; we use this to provision a remote development environment
- Allocate a new GCP instance: srun --account=... (authorization to allocate the machine) --partition=interactive --time=8:00:00 (max 24 hours) --pty /bin/bash (allocates an interactive shell); for GPUs, additionally specify --gres=gpu:v100:1 (see the sketch below)
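- The full allocation command as a sketch; the account name is a placeholder for whatever allocation you have been granted:
```bash
# CPU-only interactive shell on a burst instance
srun --account=my_account --partition=interactive --time=8:00:00 --pty /bin/bash

# with one V100 GPU
srun --account=my_account --partition=interactive --time=8:00:00 \
     --gres=gpu:v100:1 --pty /bin/bash
```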
- Check the allocated node with squeue -u $USER (run on the log-burst server); edit the ~/.ssh/config file on your laptop so that a Host burstinstance entry points to the allocated node (reached via log-burst); connect with ssh burstinstance (see the sketch below)
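- A sketch of the config entry; the node name is a placeholder to replace with the value reported by squeue, and log-burst is assumed to have its own Host entry (e.g. reached via greene):
```bash
cat >> ~/.ssh/config <<'EOF'
Host burstinstance
    HostName b-3-1           # placeholder: the node name shown by 'squeue -u $USER'
    User mynetid
    ProxyJump log-burst      # reach the node through the burst login server
EOF

ssh burstinstance
```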
- Use tmux (terminal multiplexer) to keep the session alive (install tmux through conda); allows multiple panes through a single connection; see a tmux cheatsheet for the key bindings (basic usage sketched below)
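- Basic tmux usage as a sketch; the session name is arbitrary:
```bash
conda install -c conda-forge tmux    # if tmux is not already available on the node
tmux new -s work                     # start a named session
# ... run long jobs; Ctrl-b d detaches, Ctrl-b % / Ctrl-b " split panes ...
tmux attach -t work                  # reattach after reconnecting over SSH
tmux ls                              # list running sessions
```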
- Use htop to monitor CPU/memory usage
- Greene cluster storage: Home: /home/$USER (50GB, 30k inodes); Scratch: /scratch/$USER (5TB, 1M inodes); Archive: /archive/$USER (2TB, 20k inodes); for home/archive the inode limit is severe, so don't install conda in your home directory!
- Use nvidia-smi to monitor GPU usage: watch -n1 nvidia-smi (to monitor continuously)
- GCP machines see a different filesystem: there are no "hard" quotas on GCP; they also have /home and /scratch, but with different content
- To transfer data between GCP and Greene, log onto a GCP machine, then use scp via greene-dtn (the data transfer node): scp greene-dtn:/scratch/wz2247/data/places365.squashfs .
- Container technology (Singularity): built for HPC environments; can be run without root privileges (unlike Docker); GPU integration
- Create fully encapsulated portable environments: reproduce your environment between GCP and Greene; share exact environment between team members
- Reuse optimized Docker containers, e.g. NVIDIA NGC images for TensorFlow/PyTorch: download the NVIDIA PyTorch Docker image, convert it to a Singularity image on GCP, and run a shell in the image directly: singularity exec /scratch/wz2247/singularity/images/pytorch_21.08-py3.sif /bin/bash
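- A sketch of pulling the NGC image as a Singularity image and running it with GPU access; the image tag follows the file name above and may need checking against the NGC catalog:
```bash
# convert the Docker image to a .sif file
singularity pull pytorch_21.08-py3.sif docker://nvcr.io/nvidia/pytorch:21.08-py3

# run a shell inside it; --nv exposes the host GPUs to the container
singularity exec --nv pytorch_21.08-py3.sif /bin/bash
```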
- Bind paths: allow us to control which host paths are exposed inside the container (so the container can read/modify files on the main system)
- Overlays: used if we want to modify the container, e.g. install a new package in the container environment; .sif files are immutable, so instead we can use an overlay and mount it when starting the container: singularity exec --overlay my_overlay.ext3 /scratch/wz2247/singularity/images/pytorch_21.08-py3.sif /bin/bash
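- A combined sketch of overlays and bind paths; the overlay size and bound path are illustrative, and `singularity overlay create` requires a reasonably recent Singularity (otherwise copy a pre-made empty ext3 overlay file):
```bash
# create a writable 5 GB ext3 overlay image
singularity overlay create --size 5120 my_overlay.ext3

# start the container with the overlay mounted and /scratch bound into it
singularity exec --nv \
    --overlay my_overlay.ext3 \
    --bind /scratch/$USER:/scratch/$USER \
    /scratch/wz2247/singularity/images/pytorch_21.08-py3.sif /bin/bash

# inside the container, installed packages (e.g. via pip) persist in the overlay
```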