Greetings GECO and other Mines HPC users,

The html version of this message can be found at http://hpc.mines.edu/announce.html

As you may have heard, plans have been in the works for some time to purchase a new HPC platform for Mines. I am pleased to announce that an order has been placed for BlueM, Mines' next HPC platform.

This message should be considered a preliminary courtesy message for HPC users on campus. An official announcement and press release are in the works and will follow soon. This message is intended to let researchers know what will be happening and to help us get ready for the machine.

As has been the case with RA, BlueM will be available to Mines faculty and their students for research needing high performance computing. Faculty will again be required to submit requests for allocations on the machine as was done for RA.

The current planned acceptance date is June 25. We hope to hit the ground running on that day with real science runs, so below we discuss what will need to happen between now and then to make that a reality. Before that discussion, a description of the new machine is in order.

CSM has ordered a unique high performance computing system from IBM. The overall specifications are:

Feature            Value
Teraflop rating    154 teraflops (roughly 7x RA)
Memory             17.4 terabytes
Nodes              656
Cores              10,496
Disk               480 terabytes

One of the unique characteristics of this machine is its small footprint, both in physical size and in energy usage: it will require only 85 kW. The new machine occupies a total of five racks: three compute racks, a management rack, and a file system rack.

The new IBM is also unique in configuration. It contains two independent compute partitions that share a common file system. The combined compute partitions and their file system will be known collectively as BlueM.

The two partitions are built on different architectures. The first partition is an IBM BlueGene Q (BGQ). The second partition uses the iDataplex architecture.

Each of the architectures is optimized for a particular type of parallel application. The IBM BlueGene Q is designed to handle programs that can take advantage of large numbers of compute cores. Also, the BGQ was designed to run applications that use multiple levels of parallelism, such as combining threading and message passing. Multilevel parallelism is expected to be the dominant paradigm in the future of HPC.
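For those new to this programming model, here is a minimal, generic hybrid MPI + OpenMP sketch in C (illustrative only, not a Mines- or BGQ-specific code): MPI provides the coarse level of parallelism across nodes, while OpenMP threads provide a second level within each node.

/* Minimal hybrid MPI + OpenMP example: each MPI rank spawns a team of
 * OpenMP threads, illustrating the two levels of parallelism described
 * above. Illustrative sketch only. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Request thread support so OpenMP regions can coexist with MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    /* Second level of parallelism: threads within each rank. */
    #pragma omp parallel
    {
        printf("rank %d of %d, thread %d of %d\n",
               rank, nranks, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

On most systems a code like this would be built with an MPI compiler wrapper plus the compiler's OpenMP flag (for example, mpicc -fopenmp); the exact wrappers and flags on BlueM will be covered in the documentation and transition meetings.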

Our BGQ will contain 512 nodes with each node having 16 cores. It will have 8.192 terabytes of memory and a peak computational rate of 104.9 teraflops. The BGQ rack will be half populated. That is, there will be room for an additional 512 nodes within the same cabinet.

The BGQ will be given the name MC2, pronounced “Energy.”

The second partition, based on the IBM iDataplex platform, is designed to handle applications that may require more memory per core. The nodes employ the x86 Sandy Bridge generation architecture. Each of the 144 nodes will have 64 gigabytes of memory and 16 cores, for a total of 2,304 cores and 9.216 terabytes of memory.

The iDataplex will use the same compiler suite as CSM’s current HPC platforms. Many applications that are run on these machines today could run on the new machine without a recompile. However, because of the updated processor instruction set available on the new machine, we expect improved performance with a recompile.

The iDataplex will be given the name AuN, “Golden”.

Initially, users will log in to either AuN or MC2 to compile and run their applications. We will soon deploy a common login node, which will be given the name BlueM. Thus the overall machine will be known as BlueM.

We have put together a short video describing the history of HPC at Mines and the new machine, available at http://hpc.mines.edu/Resources. There is a poster with a description of the new machine at http://hpc.mines.edu/sc12pdf/bluem.pdf.

As discussed above, applications built on RA and Mio should be relatively straightforward to move to AuN. Some will work without even recompiling. Some will require recompiling because libraries have changed. However, almost all applications will benefit from a rebuild because additional optimizations are possible for the updated processors.

For MC2, all applications will need to be rebuilt. The processors on the BGQ are in a different family, so the binaries and compilers are different. Again, many applications will be easy to rebuild, but getting optimum performance will require more than just a recompile. Fortunately, many of the applications used by Mines researchers are also used at institutions that already have a BGQ, so much of the porting work has been, or is being, done.

We have started collecting documentation for the BGQ. The link is: http://hpc.mines.edu/bgq/.

What we need from you

Please send an email to hpcinfo@mines.edu with the subject line “BlueM applications to port” listing the applications that you would like to see running on the new machine. Also, please include the typical number of cores that you use for a run and the typical runtime. Finally, include an example run script. We may ask for data sets in the future. Our goal, as always, is to help scientists do their science. To do this we need your input, without which you may be left behind.

Additional Plans

The default scheduling package on the BGQ is called LoadLeveler. Initially we will use LoadLeveler. However, we will quickly migrate to a scheduler called slurm, which is used by many sites that have BGQ platforms (LLNL and RPI, for example). The advantage of slurm is that it can be used on other platforms. We will be moving all Mines HPC platforms to the common slurm scheduler.

Also, we are deploying a new file system for Mio. This file system will be visible on the head node of BlueM and BlueM’s file system will be visible on the head node of Mio. You will not be able to see the “remote” file system from compute nodes. This will facilitate transfer of data between the platforms but prevent people from saturating the network by running parallel applications on one platform while addressing the file system on the other. We will be sending out a memo about this new file system shortly.

There will also be an update of the operating system on Mio. This will enable some compiler optimizations that are not available under the current OS.

Tech Fee has funded the purchase of Intel Phi nodes for Mio. These nodes are designed to run massively threaded and hybrid MPI/threaded applications. They will be installed shortly.

Finally, we will be hosting a number of discussions and meetings to facilitate the transition to the new machine and the changes on Mio. Stay tuned!

CSCI 580

I will be teaching CSCI 580, Advanced High Performance Computing, again this fall. We will spend a great deal of time in this class on programming for new machines, especially BlueM. I would like to encourage all groups to have students sign up for this class. It will also be useful for new grad students.

SC13

This year’s annual supercomputing conference, SC13, will be held in Denver, Nov. 18-21 (http://sc13.supercomputing.org). Mines will again have a booth and will participate in the Front Range Consortium for Research Computing (http://frcrc.org) booth. Our booth will be used to highlight the research done at CSM that uses high performance computing, primarily through in-booth poster presentations. All groups using BlueM will be expected to contribute by sending a representative to present their poster in the booth. This is independent of the conference poster session.

Timothy H. Kaiser, Ph.D.
Director of Research and High Performance Computing
Director Golden Energy Computing Organization
tkaiser@mines.edu

The National Center for Atmospheric Research is sponsored by the National Science Foundation. Any opinions, findings and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Infrastructure for BlueM is provided via support from The National Center for Atmospheric Research and by the National Science Foundation.