https://www.linkedin.com/in/shahzebmsiddiqui/
Seasoned HPC Engineer with 14+ years of experience designing, deploying, and optimizing high-performance computing systems across academia, national labs, and industry. Seeking to contribute deep expertise in HPC infrastructure, scientific software ecosystems, and performance engineering to drive innovation and efficiency at scale.
Shahzeb is the creator of buildtest, an HPC testing framework that automates the build and execution of tests. He also created lmodule, a Python API for the module system. lmodule is a spin-off of the buildtest project and a standalone API that can be used for testing modules.
Shahzeb created a Slurm utility called jobstats, a wrapper around sacct and sreport that shows Slurm job details.
Please refer to my Resume if you are interested in reaching out to me for new job opportunities.
- Extensive HPC Expertise: Delivered robust HPC infrastructure and user support at national labs, corporations, and academia.
- Software Ecosystem Leadership: Led deployment of the E4S stack with Spack and CI; contributed to 1,000+ package builds.
- Monitoring & Observability: Built monitoring stack at CZ Biohub with Prometheus, Grafana, and alerting systems.
- Training & Open Source: Created and maintained Buildtest, an open-source framework for HPC acceptance testing.
- DevOps & Containerization: Architected container workflows with Singularity, Lmod, Ansible, and GitLab CI.
- Performance Engineering: Expertise in SLURM, MPI/OpenMP/CUDA, and job-level analytics with XDMoD.
Company | Position | Date |
---|---|---|
Chan Zuckerberg BioHub | HPC Principal Engineer | 01/2025 - 05/2025 |
Lawrence Berkeley National Laboratory | HPC Programming Environment Engineer | 11/2023 - 01/2025 |
Lawrence Berkeley National Laboratory | HPC Consultant/Software Integration Specialist | 05/2020 - 10/2023 |
Dassault Systemes | HPC Systems Administrator | 09/2019 - 05/2020 |
Pfizer | HPC Linux Engineer | 09/2016 - 09/2019 |
Penn State University | R&D Software Systems Engineer | Oct 2014 - Sep 2016 |
IBM T.J. Watson Center | HPC Intern | Jun 2013 - Aug 2013 |
King Abdullah University of Science & Technology | Graduate Researcher | Jan 2013 - Dec 2013 |
Global Science and Technology | Systems Analyst | Feb 2012 - Aug 2012 |
Northrop Grumman Corporation | Cyber Software Engineer | Jun 2011 - Dec 2011 |
Applied Research Laboratory | Database Programmer Intern | May 2010 - Dec 2010 |
Shahzeb has experience installing and managing large software stacks, cluster managers (Bright Cluster Manager, Cobbler), configuration management (Ansible), GPFS, Slurm, and LSF. Shahzeb is an experienced developer, DevOps engineer, and system administrator who is often involved in open-source projects.
Shahzeb Siddiqui started his career in High Performance Computing (HPC) in 2012 at King Abdullah University of Science and Technology (KAUST) while pursuing his master's degree. His focus in HPC includes Parallel Programming, Performance Tuning, Containers (Singularity, Docker), Linux System Administration, Scientific Software Installation and Testing, Scheduler Optimization, and Job Metrics.
- M.S. Computer Science at KAUST
- B.S. Computer Engineering at Penn State University
- Red Hat Certified System Administrator (RHCSA) - Credential ID: 200-019-677
- Negotiation and Influence Program – UC Berkeley Executive Education
For a list of publications, please refer to my ORCID: https://orcid.org/0000-0002-2342-6974
Topics | Tools |
---|---|
Cluster Manager | Bright Cluster Manager, Cobbler |
Scheduler | SLURM, LSF |
Containers | Singularity, Docker, Docker Swarm, Kubernetes |
Configuration Management | Ansible |
Build Framework | EasyBuild, Spack, OpenHPC |
Programming | Distributed Computing, GPU Computing, Parallel Computing, C, C++, Python, Java, PHP, CSS, HTML, JavaScript, ColdFusion, Tcl, Lua |
Database | MySQL, MariaDB |
DevOps Tools | Jenkins, Git, GitLab, Artifactory |
Module Environment | Lmod, Environment Modules |
Ticketing System | JIRA, JIRA Service Desk, ServiceNow |
Misc | reStructuredText, Markdown, Shell Scripting, Cybersecurity, Computer Architecture |
- Automated Software Testing of Spack/E4S with Buildtest at SC23 in BoF Software Testing for Scientific Computing in HPC, Nov 15th 2023
- Testing your HPC System with Buildtest at PEARC23, July 24th 2023
- Buildtest: A Framework for testing HPC systems at Improving Scientific Software Conference 2023, April 17-19th 2023. Video
- Lmodule - Python API for testing modules at Improving Scientific Software Conference 2023, April 17-19th 2023. Video
- Facility Testing of E4S at NERSC at ECP Community BoF 2023 on Enhancing Confidence in a Software Ecosystem through Complimentary Layers of Software Testing, Feb 15th 2023
- E4S at OLCF, ALCF, and NERSC at ECP Community BoF 2023, Feb 14th 2023
- Automated Acceptance Testing in HPC with buildtest at ECP Project Tutorial 2023, Feb 7th 2023, Video
- NERSC Spack Infrastructure Project - Leverage Gitlab for automating Software Stack Deployment at SC22 DOE booth, Nov 15th 2022
- An Automated Approach to Continuous Acceptance Testing of HPC Systems at NERSC at HPCSYSPROS22, Nov 14th 2022
- New User Training, Sep 28th 2022
- E4S at NERSC 2022, Aug 25th 2022
- MVAPICH2 at NERSC, Aug 24th 2022 at MVAPICH User Group Meeting 2022, Video
- Testing your HPC System with Buildtest at PEARC22, July 11th 2022. See Talk
- Facility Deployment of E4S at ALCF, OLCF, and NERSC, May 3rd 2022 at ECP Annual Meeting 2022
- Spack Infrastructure at NERSC, April 5th 2022 at SEA Improving Scientific Software Conference 2022
- Building a Spack Pipeline in Gitlab at SC21 DOE booth, Nov 16th 2021
- Facility Testing of E4S via E4S Testsuite, Spack Test, and buildtest, Sep 14th 2021. See Talk
- E4S at DOE Facilities with Deep Dive at NERSC, Oct 4th 2021
- Lmod User Training, June 22 2021
- HPC System Test: Building a cross-center collaboration for system testing, May 6th 2021 at Cray User Group 2021 BOF
- HPC System and Software testing via buildtest at ECP Annual Meeting 2021, April 15th 2021
- Spack E4S Facility Pipeline Update at ECP Annual Meeting 2021, April 14th 2021
- Acceptance Test with Buildtest for Cori System at High Performance Computing Benchmarking and Optimization (HPBench20), Mar 27th 2021
- Acceptance Test with buildtest and Cori Testsuite at SEA's Improving Scientific Software Conference and Tutorials 2021, Mar 23rd 2021
- Panel moderator for Benchmarking in the Data Center: Expanding to the Cloud at Principles and Practice of Parallel Programming (PPoPP) 2021, Feb 28th 2021
- buildtest: HPC Testing Framework for Acceptance Testing at FOSDEM21 HPC, Big Data and Data Science devroom, Feb 7th 2021
- buildtest: Testing Framework for HPC systems at EasyBuild User Meeting 2021, Jan 29th, 2021
- Automate Module Testing with Lmodule at EasyBuild User Meeting 2021 on Jan 29th, 2021
- Spack Community BoF at SC20, Nov 18, 2020.
- Buildtest: HPC Software Stack Testing Framework at FOSDEM'20 HPC Big Data and Data Science devroom, Feb 1-2, 2020
- Building an Easybuild Container Library in Sylabs Cloud at 5th Easybuild User Meeting, Jan 29-31, 2020
- buildtest: HPC Software Stack Testing Framework at 5th Easybuild User Meeting, Jan 29-31, 2020
- Buildtest: A Software Testing Framework with Module Operations for HPC systems at SC'19 in HPC User Support Tools Workshop, Nov 18, 2019
- Software Stack Testing with buildtest at HPCKP'18, June 21-22, 2018
- HPC Application Testing Framework - buildtest at HPCKP'17, June 15-16, 2017
- Our Journey in Automated Testing of E4S Software Stack via Buildtest, NERSC Seminar, Dec 20th 2023
- Siddiqui, Shahzeb, Palmer, Erik, Shende, Sameer, Spear, Wyatt, Sambrekar, Prathmesh, & Xiang, Sijie. (2022, November 14). An Automated Approach to Continuous Acceptance Testing of HPC Systems at NERSC. SC22 (HPCSYSPROS22), Dallas, TX. https://doi.org/10.5281/zenodo.7320179
- Siddiqui, Shahzeb, & Shende, Sameer. Software Deployment Process at NERSC: Deploying the Extreme-scale Scientific Software Stack (E4S) Using Spack at the National Energy Research Scientific Computing Center (NERSC), 2022-05-17, https://doi.org/10.2172/1868332
- Shahzeb Siddiqui, Buildtest: A Software Testing Framework with Module Operations for HPC Systems, HUST, Springer, March 25, 2020, https://doi.org/10.1007/978-3-030-44728-1_1
- Shahzeb Siddiqui, Fatemah AlZayer, Saber Feki, Historic Learning Approach for Auto-tuning OpenACC Accelerated Scientific Applications, VECPAR, Springer, December 7, 2019, https://doi.org/10.1007/978-3-319-17353-5_19
- Shahzeb Siddiqui, Automatic Performance Tuning of Parallel and Accelerated Seismic Imaging Kernels, EAGE Workshop on High Performance Computing for Upstream, European Association of Geoscientists & Engineers, September 1, 2014, https://doi.org/10.3997/2214-4609.20141941