Getting Started with High Performance Computing
Mines has a number of distinct High Performance Computing (HPC) platforms. They share many similarities but differ in some respects. This page provides an overview of High Performance Computing, describes Mines’ platforms, and points to additional information.
The term ‘high performance computer’ has evolved over the years and continues to evolve. A calculation that required a high performance computer 10 years ago might today be done on a laptop. Cell phones have the computing power of early generations of HPC platforms.
Today’s definition of an HPC platform often involves parallelism. That is, most HPC platforms gang many processing cores to work on a single problem.
A processing core is what most people think of as the central part of a computer: a chip or set of chips with the capability to access memory and perform calculations. Most modern computers contain more than one computing core, and often more than one core on a single chip. For example, the Intel Core i5 chip found in many low-end laptops contains two computing cores.
A computing node encapsulates the cores, memory, networking and related technologies. A node may or may not have video output capability.
A high performance computer is one that can effectively use multiple cores on a single node or in a collection of nodes to perform a calculation, by distributing the work to the various cores.
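The idea of distributing one calculation over several cores can be sketched in a few lines. The example below is only an illustration: it assumes Python's standard `concurrent.futures` module and uses a thread pool as a stand-in for cores, and the function names are hypothetical. On an HPC platform the same split-then-combine pattern would more typically be expressed with message passing or threading libraries.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_bounds(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, stop) pairs."""
    base, extra = divmod(n_items, n_workers)
    bounds = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional item each.
        stop = start + base + (1 if worker < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partial_sum_of_squares(bounds):
    """Work done by one worker: sum i*i over its own chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n_items, n_workers=4):
    """Hand one chunk to each worker, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = pool.map(partial_sum_of_squares,
                            chunk_bounds(n_items, n_workers))
        return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000, n_workers=4))
```

Each worker touches only its own chunk, so no coordination is needed until the final sum. That independence is what lets the work scale across many cores.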
The HPC platforms at Mines are augmented with a high speed network connecting the nodes, so that cores on different nodes can exchange data efficiently. The nodes are housed in a collection of racks with tens of nodes per rack.
The individual cores of Mines’ HPC platforms may not be any more powerful than the cores of a recent generation laptop. There are, however, thousands of them available.
Description of Mines’ HPC Platforms
AuN.mines.edu is based on the IBM iDataplex platform. The nodes employ the x86 Sandy Bridge generation architecture. Each of the 144 nodes has 64 gigabytes of memory and 16 cores, for a total of 2,304 cores and 9.216 terabytes of memory.
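The totals quoted for AuN follow directly from the per-node figures. A minimal check, assuming decimal units (1 terabyte = 1000 gigabytes) and a hypothetical helper name:

```python
def cluster_totals(nodes, cores_per_node, mem_gb_per_node):
    """Return (total cores, total memory in TB) for a set of identical nodes."""
    total_cores = nodes * cores_per_node
    total_mem_tb = nodes * mem_gb_per_node / 1000  # decimal terabytes
    return total_cores, total_mem_tb


if __name__ == "__main__":
    cores, mem_tb = cluster_totals(nodes=144, cores_per_node=16,
                                   mem_gb_per_node=64)
    print(cores, "cores,", mem_tb, "TB")
```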
AuN uses the same compiler suite as Mines’ Mio supercomputer. Many applications that run on one of these machines could run on the other machine without a recompile. However, because of the updated processor instruction sets available on the newer Mio nodes, you would expect improved performance with a recompile.
AuN and Mc2 share a common 480 terabyte parallel file system.
Mc2.mines.edu, an IBM Blue Gene/Q, is designed to handle programs that can take advantage of large numbers of compute cores. Also, the BGQ is designed to run applications that use multiple levels of parallelism, such as combining threading and message passing. Multilevel parallelism is expected to be the dominant paradigm in the future of HPC.
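One way to picture multilevel parallelism is as a two-level split of the work: first across message-passing processes (often called ranks), then across the threads inside each process. The sketch below only computes which slice of a loop each (rank, thread) pair would own. It does not use a message passing or threading library itself, and the function names are hypothetical.

```python
def block_range(n_items, n_parts, part):
    """Return the (start, stop) slice of range(n_items) owned by `part`."""
    base, extra = divmod(n_items, n_parts)
    # Parts numbered below `extra` each hold one additional item.
    start = part * base + min(part, extra)
    stop = start + base + (1 if part < extra else 0)
    return start, stop


def two_level_range(n_items, n_ranks, n_threads, rank, thread):
    """Slice owned by one thread of one rank.

    Level 1: split the items across ranks (message passing level).
    Level 2: split each rank's slice across its threads (threading level).
    """
    rank_start, rank_stop = block_range(n_items, n_ranks, rank)
    t_start, t_stop = block_range(rank_stop - rank_start, n_threads, thread)
    return rank_start + t_start, rank_start + t_stop


if __name__ == "__main__":
    # Example: 100 items, 2 ranks, 4 threads per rank.
    for rank in range(2):
        for thread in range(4):
            print(rank, thread, two_level_range(100, 2, 4, rank, thread))
```

In a real hybrid code the outer split would decide what data each process holds, and the inner split would decide how that process's threads share its loop iterations.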
Our BGQ contains 512 nodes with each node having 16 cores. It has 8.192 terabytes of memory and a peak computational rate of 104.9 teraflops. The BGQ rack is currently half-populated. That is, there is room for an additional 512 nodes within the same cabinet.
The machine Mio.mines.edu represents a new concept in computing at Mines. Mio is a shared resource funded in part by Mines’ Administration and in part by money from individual researchers and the student Tech Fee. Mio came on line in March 2010. Initially it was a relatively small cluster dedicated to a single group of research projects. Mio has grown into a supercomputing class machine.
The Mio concept is simple. Mines funds the infrastructure, and individuals or research groups purchase compute nodes to add to the cluster. This infrastructure consists of racks, networking components (Ethernet and high speed Infiniband), a file system (240 terabytes), management nodes, software and support. The professors own their nodes and have exclusive access when they need them. When the owners are not using them, the nodes are available for use by other cluster members.
What’s in a Name?
The name “Mio” is a play on words. It is a Spanish translation of the word “mine” as in “belongs to me”, not the hole in the ground. The phrase “The computer is mine.” can be translated as “El ordenador es mío.”
The latest generation of nodes on Mio are Superserver 6018TR-TF servers, each with two Intel E5-2680 v4 “Broadwell” processors (28 cores per node, running at 2.40 GHz) and 256 Gbytes of memory. The cost is about $7,000 each. However, purchasing of nodes on Mio has been deprecated in favor of purchasing nodes on Wendian. A complete list of the nodes on Mio can be found at http://geco.mines.edu/prototype/Who_owns_nodes_on_Mio/index.shtml.
Mio Configuration, August 2018
| Processor | Cores | Memory (GB) | Node Count | Cores Total | Memory Total (GB) |
Wendian is Mines’ newest HPC platform, scheduled to come on line for the 2018 fall semester.
Wendian – HPC@MINES 2018 Description
- Nodes:
  - 82 compute nodes plus 5 nodes with GPUs @ over 200 TFLOPs (theoretical);
  - 3 administration nodes;
  - 6 file system nodes heading up 1152 Tbytes (raw) storage @ over 10 Gbytes/sec;
- File system:
  - BeeGFS supporting on-the-fly configuration of parallel file systems;
- Primary network:
  - EDR Infiniband fabric, 2:1 oversubscribed;
- Cooling:
  - Hybrid water and air;
What’s in a Name?
Wendian is an Old English word not in common use today. It means:
- to turn or change direction;
- to change or alter;
78 Relion XO1132g Server – Skylake Nodes
- 1OU (1/3rd Width) w/ 2x 2.5″ Fixed 12Gb SATA Bay
- Dual Intel Xeon Gold 6154 CPU (18C, 3.0GHz, 200W)
- 39 nodes with 192GB RAM, DDR4-2666MHz REG, ECC, 1R (12 x 16GB)
- 39 nodes with 384GB RAM, DDR4-2666MHz REG, ECC, 1R (12 x 32GB)
- 256 Gbyte SSD
- Integrated AHCI, Intel C621, 6Gb SATA: Linux RAID 0/1/5/6/10/50/60
- Integrated NIC, Intel I350, 2x RJ-45/GbE (1-Port Shared with BMC for IPMI)
- HCA, Mellanox ConnectX-4, 1x QSFP28/EDR
- Preload, CentOS, Version 7
- Processors water cooled
- 3-Year Standard Warranty
5 Relion XO1114GTS Server GPU nodes
- 1OU (Full Width) w/ 4x 2.5″ Hot Swap 12Gb SAS Bay
- Dual Intel Xeon Gold 5118 CPU (12C, 2.30GHz, 105W)
- 192GB RAM, DDR4-2666MHz REG, ECC, 2R (12 x 16GB)
- Integrated AHCI, Intel C621, 6Gb SATA: Linux RAID 0/1/5/6/10/50/60
- 256GB SSD, 2.5″, SATA, 6Gbps, 0.2 DWPD, 3D TLC (Micron 1100)
- Integrated NIC, Intel I350, 2x RJ-45/GbE (1-Port Shared with BMC for IPMI)
- PBB, 96 Lanes, 1x PCIE Gen3 x16 to 5x PCIE Gen3 x16 (4x GPU + 1x PCIE)
- HCA, Mellanox ConnectX-5, 1x QSFP28/100Gb VPI
- 4 x Accelerator, NVIDIA Tesla V100-SXM2, 32GB HBM2, 5120 CUDA, 640 Tensor, 300W
- Preload, CentOS, Version 7
- Standard 3-Year Warranty
- 3-Year On-Site Service, 8×5 Next Business Day
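The per-node figures in the Skylake and GPU lists above can be rolled up into partition totals. The sketch below is a small illustration using a hypothetical `NodeGroup` record. It covers only the Skylake and GPU nodes, and it counts CPU cores and host memory, not GPU memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    name: str
    count: int
    cores_per_node: int
    mem_gb_per_node: int
    gpus_per_node: int = 0


# Figures copied from the Skylake and GPU node lists above.
GROUPS = [
    NodeGroup("Skylake 192GB", count=39, cores_per_node=2 * 18,
              mem_gb_per_node=192),
    NodeGroup("Skylake 384GB", count=39, cores_per_node=2 * 18,
              mem_gb_per_node=384),
    NodeGroup("GPU", count=5, cores_per_node=2 * 12,
              mem_gb_per_node=192, gpus_per_node=4),
]


def totals(groups):
    """Sum nodes, CPU cores, host memory (GB) and GPUs over all groups."""
    return {
        "nodes": sum(g.count for g in groups),
        "cores": sum(g.count * g.cores_per_node for g in groups),
        "mem_gb": sum(g.count * g.mem_gb_per_node for g in groups),
        "gpus": sum(g.count * g.gpus_per_node for g in groups),
    }


if __name__ == "__main__":
    print(totals(GROUPS))
```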
2 Magna 2002S Server – OpenPower8 Nodes
- 2U, 2x 2.5″ Hot Swap 6Gb SATA Bay w/ 2x 1300W Hot Swap PSU
- Dual IBM POWER8 Murano 00UL670 CPU (8C/64T, 3.2GHz, 190W)
- 8 x Memory Module, 4 x DDR4 Slot
- 256GB RAM, DDR4-2400, REG, ECC, (32 x 8GB)
- Integrated AHCI, Marvell 88SE9235 6Gb SATA: Linux RAID 0/1/5/6/10/50/60
- Integrated NIC, 2x RJ-45/GbE (1-Port Shared with BMC for IPMI)
- HCA, Mellanox ConnectX-4, 1x QSFP28/EDR
- Preload, Ubuntu 16.04
- Standard 3-Year Warranty
- 3-Year On-Site Service, 8×5 Next Business Day
2 Magna 2xxx Server – OpenPower9 Nodes
- To be installed when they become available
- Details to follow
The XO1132g servers have on-board water cooling for the CPUs. They are all fed water from a coolant distribution unit (CDU). This removes about 60% of the total heat generated. The water to the compute resources circulates in a closed loop. A heat exchanger in the CDU transfers the heat from the closed loop to chilled water supplied by central facilities. The remaining heat from these servers, and the heat generated by the other nodes, is removed via two in-row coolers. The equipment list is given below.
- (2) APC ACRC301S In-Row Coolers
- MOTIVAIR Coolant Distribution Unit MCDU25
- Head and Skylake nodes to run CentOS 7.0
- Power nodes to run Ubuntu 16.04
- Scheduler: Slurm (same as AuN and Mio)
- Serial compilers:
  - Intel 18.x
  - Portland Group
  - IBM XL
- MPI (parallel) compilers:
  - Intel (supports both Intel and GNU backend compilers)
  - OpenMPI (supports Intel, GNU and Portland Group backend compilers)
  - mpxl__ (Power nodes)
- Unique file system capability:
  - Users can create parallel file systems out of SSDs on the compute nodes.
Getting An Account
For more detail on gaining access to HPC resources see the “How do I get an account?” tab on the HPC@Mines FREQUENTLY ASKED QUESTIONS page https://www.mines.edu/hpc/faq/.
Mio
For professors and students working with professors, access to Mio is restricted to research groups who have purchased nodes. The PI for the research group can request additions of new users (and new nodes!) by emailing the HPC Group: email@example.com.
For students not working with a professor, access can be obtained by emailing the HPC group at firstname.lastname@example.org. For such students, access is restricted to educational use and projects that can make efficient use of the HPC resources; it is expected that they are developing or running parallel applications. Students can publish based on results from Mio; however, unless a professor owns nodes on Mio they cannot share authorship with the student. The reason for this restriction is to prevent professors from improperly gaining proxy access to Mio.
AuN and Mc2
Access to AuN and Mc2 is through a proposal process. See the “How do I get an account?” tab on the HPC@Mines FREQUENTLY ASKED QUESTIONS page: https://www.mines.edu/hpc/faq/.
Wendian
Access to Wendian is granted in two ways. It will be possible for research groups to purchase nodes on Wendian. Research groups who own nodes on Wendian will have access privileges similar in form to Mio, including priority access to their nodes. If they would like access to all nodes of the machine they will need to submit a proposal. Researchers who do not own nodes must submit a proposal as is done for AuN and Mc2.
Accounts on Wendian granted via proposals will be given an allocation of hours. After the allocation is consumed, jobs run against that account will experience a drop in priority. Jobs run on owned nodes will not count against the owner’s allocation, and any reduction in priority for jobs run on owner allocations will not affect those running on owned nodes.
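The allocation rules above can be summarized as a small decision function. This is a toy model, not the scheduler's actual configuration: the function name, the two numeric priority values, and the idea of representing priority as a single number are all assumptions made for illustration.

```python
NORMAL_PRIORITY = 100   # hypothetical value
REDUCED_PRIORITY = 10   # hypothetical value


def charge_and_priority(hours_used, allocation_hours, job_hours,
                        on_owned_nodes):
    """Return (hours_used after the job, priority the job is given).

    - Jobs on owned nodes are not charged and keep normal priority,
      whatever the state of the allocation.
    - Other jobs are charged against the allocation; once the allocation
      has been consumed, further jobs get a reduced priority.
    """
    if on_owned_nodes:
        return hours_used, NORMAL_PRIORITY
    if hours_used < allocation_hours:
        priority = NORMAL_PRIORITY
    else:
        priority = REDUCED_PRIORITY
    return hours_used + job_hours, priority


if __name__ == "__main__":
    # Allocation of 100 hours, already fully consumed.
    print(charge_and_priority(100, 100, 10, on_owned_nodes=False))
    print(charge_and_priority(100, 100, 10, on_owned_nodes=True))
```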
About the Environment
Mines’ HPC platforms run versions of the Linux operating system, with a command line interface. Users need to be familiar with Linux to make effective use of Mines’ machines. A list of Linux tutorials may be found under “I am new to Linux; HELP!” on the FAQ page.
All access to Mines’ HPC clusters is via SSH. An SSH client is included with macOS- and Linux-based machines. Windows machines require additional software to connect to Mines’ HPC resources; options include Bash on Windows or Cygwin, either of which is recommended.
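As a rough illustration of what connecting looks like, the sketch below builds the `ssh` command line for a given machine. The host names are the machine names used on this page, written in lower case; whether each one is the actual login address is an assumption, and `jdoe` is a placeholder user name.

```python
import subprocess

# Machine names as they appear on this page (assumed to be login hosts).
HOSTS = {
    "mio": "mio.mines.edu",
    "aun": "aun.mines.edu",
    "mc2": "mc2.mines.edu",
}


def ssh_command(machine, username):
    """Build the argument list for an interactive ssh login."""
    return ["ssh", f"{username}@{HOSTS[machine]}"]


def connect(machine, username):
    """Open the ssh session and return its exit status."""
    return subprocess.run(ssh_command(machine, username)).returncode


if __name__ == "__main__":
    # Print the command only; call connect() to actually log in.
    print(" ".join(ssh_command("mio", "jdoe")))
```

From a terminal, the equivalent is simply typing the printed command.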
Much of the runtime environment is managed via a module system. Module systems are in common use in many HPC environments.
Scheduling and Running in Parallel
Running parallel applications on Mines’ HPC resources is managed via a scheduler. The same scheduling software is used on all machines; an assortment of links to information and tutorials can be accessed by clicking on Further Resources, selecting User Guides, then Slurm Guides.
Running a parallel application requires first creating a script. The script contains a request for resources and commands to run the job on these resources. The script is submitted to the scheduler and will run when resources become available. Information about writing scripts and examples of scripts for Mio and AuN are found by clicking on Further Resources, selecting User Guides, then Scripting Guides.
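To make the two parts of a script concrete (the resource request and the commands), the sketch below generates a minimal Slurm batch script and submits it with `sbatch`. It is an illustration only: the helper names, the resource values, and the executable `./my_mpi_app` are placeholders, and real scripts on a given machine may need additional options described in the Scripting Guides.

```python
import subprocess
from pathlib import Path


def make_batch_script(job_name, nodes, tasks_per_node, walltime, command):
    """Return the text of a Slurm batch script.

    The #SBATCH lines are the resource request; the last line runs the job.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
        "#SBATCH --output=%x-%j.out",  # %x = job name, %j = job id
        "",
        # srun launches the command on the allocated resources.
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


def submit(script_text, path="job.sh"):
    """Write the script to disk and hand it to the scheduler with sbatch."""
    Path(path).write_text(script_text)
    return subprocess.run(["sbatch", path], capture_output=True, text=True)


if __name__ == "__main__":
    # Print the script only; call submit() on a machine that has Slurm.
    print(make_batch_script("demo", nodes=2, tasks_per_node=16,
                            walltime="00:10:00", command="./my_mpi_app"))
```

Once submitted, the job waits in the queue until the requested nodes are free, and its output is written to the file named by the `--output` line.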
The topics listed above are discussed on the HPC@Mines FREQUENTLY ASKED QUESTIONS (FAQ) page https://www.mines.edu/hpc/faq/.
If you are new to HPC@Mines, we strongly suggest working through the example “How do I do a simple build and run?” on the FAQ page.
Questions should be sent to: email@example.com. However, please visit the HPC@Mines FREQUENTLY ASKED QUESTIONS (FAQ) page https://www.mines.edu/hpc/faq/ first.