High Performance & Research Computing

CCIT’s High-Performance Computing group maintains various HPC (“supercomputer”) resources and offers support for Mines faculty and students using HPC systems in their research. The goal of the service is to help scientists do their science through the application of HPC.

Mio.Mines.Edu

Your Supercomputer

You have access to a 120-plus Tflop HPC cluster for student and faculty research use.

Want to get started with supercomputing? Supercomputing is an increasingly important part of engineering and scientific research. Mines provides an advanced supercomputing cluster called “Mio” for the use of students and faculty who wish to take advantage of this extraordinary high-performance computing resource.

For students

Students have already purchased some access to Mio with Tech Fee funds; this access can be used for general research, class projects, and learning HPC techniques. Students may also, at times, use Mio nodes purchased by their academic advisor or other professors. The HPC Group offers assistance to students (and faculty) getting up and running on Mio, including individual consultations and workshops.

For faculty

Mio holds many advantages for professors:

  • Professors do not need to manage their own HPC resources
  • Professors can access other professors’ resources when allowed
  • Mines supplies high-quality InfiniBand network infrastructure, which greatly improves the scalability of multi-node applications
  • The cost is a reasonable $7,000 per node

Hardware description

  • 8–28 compute cores per node
  • 2.4–3.06 GHz processors
  • 24–256 GB of memory per node
  • InfiniBand interconnect
  • 2 GPU nodes – 7.23 Tflops
  • 240 TB parallel file system
  • 2 Power8 nodes with GPUs

What’s in a name?

The name “Mio” is a play on words. It is a Spanish translation of the word “mine,” as in “belongs to me.” The phrase “The computer is mine” can be translated as “El ordenador es mío.”


BlueM.Mines.Edu

Mines’ Big Iron Supercomputer

154 Tflops | 17.4 TB memory | 10,496 cores | 85 kW

BlueM is a unique machine, composed of two distinct compute platforms or partitions that share a common file system. Both platforms, as well as the file system, were purchased from IBM as a package. The common file system shared between the partitions can hold 480 TB. It has efficient support for parallel operation (that is, multiple cores accessing it at the same time). The two compute platforms are optimized for different purposes.
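
To make the benefit of the shared parallel file system concrete, the sketch below shows the kind of access pattern it supports: many MPI ranks writing disjoint pieces of one shared file at the same time, using standard MPI-IO calls. It is a minimal, generic illustration rather than a BlueM-specific recipe; the file name and buffer size are invented for the example.

    #include <mpi.h>

    /* Each rank writes its own block of doubles into one shared file.
       On a parallel file system these writes can proceed concurrently. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        enum { N = 1024 };               /* doubles per rank (example size) */
        double buf[N];
        for (int i = 0; i < N; i++)
            buf[i] = rank + i * 1e-6;    /* fill with rank-specific data */

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "shared_output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Offset each rank by its own block so writes do not overlap. */
        MPI_Offset offset = (MPI_Offset)rank * N * sizeof(double);
        MPI_File_write_at(fh, offset, buf, N, MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }

Because each rank writes to a disjoint offset, no coordination between ranks is required; this is exactly the access pattern a parallel file system is built to serve efficiently.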

AuN

The smaller compute platform, in terms of capability, is AuN (“Golden”). It is a traditional HPC platform using standard Intel processors. It contains 144 compute nodes connected by a high-speed network. Each node contains 16 Intel SandyBridge compute cores and 64 GB of memory, for a total of 2,304 cores and 9,216 GB of memory. AuN is rated at 50 Tflops. It is housed in two double-wide racks with 72 nodes in each rack.

AuN is designed to run jobs that require more memory per core: with 64 GB shared among 16 cores, each AuN node provides 4 GB per core, compared with 1 GB per core on Mc2.

Mc2

Mc2 (“Energy”) is an IBM BlueGene. Mc2 is housed in a single large 4’ x 4’ rack, currently half full with room for expansion. The BlueGene computer is designed from the ground up as an HPC platform. It has a very-high-speed network connecting the nodes so applications can scale well. Each node has a processor dedicated to systems operations in addition to the 16 cores that are available for users.

The processors on Mc2 are IBM “Power” processors. Mc2 has 512 compute nodes, each with 16 GB of memory, for a total of 8,192 user cores and 8,192 GB of memory. Mc2 is rated at 104 Tflops.

The total power consumption of the system is about 85 kW with only 35 kW used by Mc2. Mc2 is water cooled. AuN currently runs with rear door heat exchangers but could run with air cooling only. BlueM is housed at the National Renewable Energy Laboratory, in Golden, CO.

Mc2 is designed for jobs that can make use of a large number of cores.


Supercomputing is an increasingly important part of engineering and scientific research. Mines has a number of distinct high-performance computing platforms; Wendian is the newest.

Wendian came online in the fall of 2018. It contains the latest generation of Intel processors, NVIDIA GPUs, and OpenPower nodes: 82 compute nodes plus 5 GPU nodes combine for over 350 TFLOPS. It also has 3 administration nodes and 6 file-system nodes serving 1,152 TB of raw storage at over 10 GB/s.

Wendian runs CentOS 7 Linux. Parallel jobs are managed via the Slurm scheduler. The programming languages and parallel programming models of choice include C, C++, Fortran, OpenMP, OpenACC, CUDA, and MPI.
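
As a minimal illustration of the hybrid MPI + OpenMP style these tools support, the sketch below prints a greeting from every OpenMP thread of every MPI rank. It is a generic example, not Wendian-specific documentation; compiler wrappers, module names, and scheduler settings on the actual system may differ.

    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    /* Hybrid MPI + OpenMP hello world: a few MPI ranks per node,
       several OpenMP threads inside each rank. */
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, nranks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nranks);

        #pragma omp parallel
        {
            printf("Hello from thread %d of rank %d (total ranks: %d)\n",
                   omp_get_thread_num(), rank, nranks);
        }

        MPI_Finalize();
        return 0;
    }

A program like this would typically be built with an MPI compiler wrapper with OpenMP enabled (for example, mpicc -fopenmp) and launched across nodes from a Slurm batch script with srun; the exact modules and job-script options to use on Wendian are assumptions here and are best taken from the HPC Group’s documentation.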

Processor | Cores | Memory (GB) | Nodes / Cards | Cores Total | Memory Total (GB)
Skylake 6154 | 36 | 192 | 39 | 1,404 | 7,488
Skylake 6154 | 36 | 384 | 39 | 1,404 | 14,976
Skylake 5118 | 24 | 192 | 5 | 120 | 960
GPU cards for 5118s (Volta) | – | 32 | 20 | – | 640
OpenPower 8 | 16 | 256 | 2 | 32 | 512
OpenPower 9 | 16 | 256 | 2 | 32 | 512
Totals | | | 107 | 2,992 | 25,088

Wendian Details

78 Relion XO1132g Server – Skylake Nodes

  • 1OU (1/3rd Width) w/ 2x 2.5″ Fixed 12Gb SATA Bay
  • Dual Intel Xeon 6154 (18C, 3.0GHz, 200W)
  • 39 nodes with 192GB RAM, DDR4-2666MHz REG, ECC, 1R (12 x 16GB)
  • 39 nodes with 384GB RAM, DDR4-2666MHz REG, ECC, 1R (12 x 32GB)
  • 256 Gbyte SSD
  • Integrated AHCI, Intel C621, 6Gb SATA: Linux RAID 0/1/5/6/10/50/60
  • Integrated NIC, Intel I350, 2x RJ-45/GbE (1-Port Shared with BMC for IPMI)
  • HCA, Mellanox ConnectX-4, 1x QSFP28/EDR
  • Preload, CentOS, Version 7
  • Processors water cooled
  • 3-Year Standard Warranty

5 Relion XO1114GTS Server GPU nodes

  • 1OU (Full Width) w/ 4x 2.5″ Hot Swap 12Gb SAS Bay
  • Dual Intel Xeon Gold 5118 CPU (12C, 2.30GHz, 105W)
  • 192GB RAM, DDR4-2666MHz REG, ECC, 2R (12 x 16GB)
  • Integrated AHCI, Intel C621, 6Gb SATA: Linux RAID 0/1/5/6/10/50/60
  • 256GB SSD, 2.5″, SATA, 6Gbps, 0.2 DWPD, 3D TLC (Micron 1100)
  • Integrated NIC, Intel I350, 2x RJ-45/GbE (1-Port Shared with BMC for IPMI)
  • PBB, 96 Lanes, 1x PCIE Gen3 x16 to 5x PCIE Gen3 x16 (4x GPU + 1x PCIE)
  • HCA, Mellanox ConnectX-5, 1x QSFP28/100Gb VPI
  • 4 x Accelerator, NVIDIA Tesla V100-SXM2, 32GB HBM2, 5120 CUDA, 640 Tensor, 300W
  • Preload, CentOS, Version 7
  • Standard 3-Year Warranty
  • 3-Year On-Site Service, 8×5 Next Business Day

2 Magna 2002S Server – OpenPower8 Nodes

  • 2U, 2x 2.5″ Hot Swap 6Gb SATA Bay w/ 2x 1300W Hot Swap PSU
  • Dual IBM POWER8 Murano 00UL670 CPU (8C/64T, 3.2GHz, 190W)
  • 8 x Memory Module, 4 x DDR4 Slot
  • 256GB RAM, DDR4-2400, REG, ECC, (32 x 8GB)
  • Integrated AHCI, Marvell 88SE9235 6Gb SATA: Linux RAID 0/1/5/6/10/50/60
  • Integrated NIC, 2x RJ-45/GbE (1-Port Shared with BMC for IPMI)
  • HCA, Mellanox ConnectX-4, 1x QSFP28/EDR
  • Preload, Ubuntu 16.04
  • Standard 3-Year Warranty
  • 3-Year On-Site Service, 8×5 Next Business Day

2 Magna 2xxx Server – OpenPower9 Nodes

  • To be installed when they become available
  • Details to follow

File System

  • 960 TB usable capacity @ 10 GB/s
  • Relion 1900 Server – running the BeeGFS MDS (metadata server)
      • 2 x 150GB SSD, 2.5″, SATA, 6Gbps, 1 DWPD, 3D MLC
      • 4 x 400GB SSD, 2.5″, SATA, 6Gbps, 3 DWPD, MLC
  • IceBreaker 4936 Server – running the BeeGFS OSS (object storage server)
      • 2 x 150GB SSD, 2.5″, SATA, 6Gbps, 1 DWPD, 3D MLC
      • 4 x 36 x 8TB HDD, 3.5″, SAS, 12Gbps – 1,152 TB raw
  • Ability to create parallel file systems from local disk on the fly

Cooling

The XO1132g and XO1114GTS servers have on-board water cooling for the CPUs. They are fed water from a coolant distribution unit (CDU), which removes about 60% of the total heat generated. The water loop serving the compute resources is closed; the CDU contains a heat exchanger in which heat from the closed loop is transferred to chilled water supplied by central facilities. The remaining heat from these servers, along with the heat generated by the other nodes, is removed by two in-row coolers. The equipment list includes two APC ACRC301S in-row coolers and a MOTIVAIR MCDU25 coolant distribution unit.