How do I do a simple build and run?

The files discussed here can also be obtained from:
http://hpc.mines.edu/bluem/quickfiles/example.tgz

This page shows how you can build and run a simple example on AuN, Mc2, or Mio.

While these examples were completed on Mio, the procedure is the same on BlueM and AuN, with a minor exception noted below.

Note that the "makefile" and run scripts discussed here can be used as templates for other applications.

To run the quick start example, create a directory for your example and go to it.

[joeuser@mio001 bins]$  mkdir guide
[joeuser@mio001 bins]$  cd guide

Next, set up your environment to run parallel applications. The following two commands will give you a clean, tested environment.

[joeuser@mio001 guide]$  module purge
[joeuser@mio001 guide]$  module load reset
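To confirm the result, you can list the modules that are now loaded; the exact set will differ between machines.

[joeuser@mio001 guide]$  module list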

Copy the file that contains our example code to your directory and unpack it.

[joeuser@mio001 guide]$  cp /opt/utility/quickstart/example.tgz  .
[joeuser@mio001 guide]$  tar -xzf *

If you like, do an ls to see what you have.

[joeuser@mio001 guide]$  ls
add.f90  color.f90  complex_slurm  docol.f90  example.tgz  
helloc.c  input  makefile  phostname.c  simple_slurm  threads_slurm

Make the program:

[joeuser@mio001 guide]$ make
echo mio001
mio001
mpif90 -c color.f90
mpicc -DNODE_COLOR=node_color_  helloc.c color.o -lifcore -o helloc
rm -rf *.o
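As noted above, the makefile can be used as a template for your own applications. The pattern is the same as the commands echoed above: compile each source file with the MPI wrapper compilers, then link the objects into an executable. A minimal sketch, using the hypothetical file names mycode.c and myprog rather than anything from this example:

mpicc -c mycode.c
mpicc mycode.o -o myprog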

On AuN and Mc2 you need to supply an account number to run parallel applications. Mio does not require account numbers. Next, find out which accounts you are authorized to use on each machine.

[joeuser@aun002 auto]$  /opt/utility/accounts
Account
--------------------
science
test

If you run this command on Mio you will get:

[joeuser@mio001 guide]$  /opt/utility/accounts 
Accounts strings are not required on Mio

Now you are ready to run a parallel application. On Mio you would do the following:

[joeuser@mio001 guide]$  sbatch  simple_slurm
Submitted batch job 1993

On AuN and Mc2 you add a -A option to the command line, followed by an account string from the command given above.

[joeuser@aun001 guide]$  sbatch  -A test  simple_slurm
Submitted batch job 1993

If you receive the message shown below, the account you specified has run out of time. Try another account.

sbatch: error: Batch job submission failed: Job violates accounting/QOS policy 
(job submit limit, user's size and/or time limits)
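The file simple_slurm itself is an ordinary Slurm batch script. Its exact contents are not reproduced here, but a minimal script of this kind looks roughly like the sketch below; the directives match the 2-node, 16-task run described later, while the srun launch line is an assumption (the provided script may launch the program differently).

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=16
#SBATCH --time=00:10:00

# launch the MPI program built by make on the allocated nodes
srun ./helloc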

If you enter the command below soon after submitting, you may see your job waiting to run or running. A value of "PD" in the "ST" (state) column means the job is pending; "R" means it is running.

[joeuser@mio001 guide]$  squeue -u $USER
  JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
   1993   compute   hybrid  joeuser PD       0:00      2 (Priority)

If this command lists no jobs, your job has finished. If the machine is very busy, it could take some time for the job to run.

When the job is complete there will be an output file in your directory whose name starts with the word "slurm", followed by the job ID from the sbatch command and the extension ".out".

For example:

[joeuser@mio001 guide]$  ls slurm*
slurm-722122.out

The simple test program is a glorified parallel "hello world" program. You will see 16 lines, each starting with the name of the node on which the task ran, followed by the MPI task ID (which should be in the range 0-15), the number 16 (the total number of tasks you are running), and finally a number that is either 0 or 8, which is the MPI task number of the lowest-numbered task on that node.

You will also see two additional lines, one per node, containing basically the same output described above but prefixed with "First task".

The command cat slurm*.out will show you the output of the job. To see the output in a nicer order, you can use the sort command:

[joeuser@mio001 guide]$  sort slurm*.out  -k1,1 -k2,2n | grep 16
compute028 0 16 0
compute028 1 16 0
compute028 2 16 0
compute028 3 16 0
compute028 4 16 0
compute028 5 16 0
compute028 6 16 0
compute028 7 16 0
compute029 8 16 8
compute029 9 16 8
compute029 10 16 8
compute029 11 16 8
compute029 12 16 8
compute029 13 16 8
compute029 14 16 8
compute029 15 16 8
First task on node compute028 is 0 16 0
First task on node compute029 is 8 16 8

Just to note: the sort option -k1,1 sorts on the first word of each line (the node name), and -k2,2n sorts on the second column numerically. The grep command filters out every line that does not contain "16", giving us only the lines of interest.

Congratulations, you have run your first supercomputing program.

The script complex_slurm runs the same program but adds a number of features to the run. It first creates a new directory for your run, then goes to it and runs your program there.
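The exact contents of complex_slurm are not shown here, but the "make a directory, go to it, run there" pattern looks roughly like the following sketch; the directory-naming convention and the srun line are assumptions.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks=16
#SBATCH --time=00:10:00

# create a separate directory for this run, named after the job ID (assumed convention)
mkdir run_$SLURM_JOB_ID
cd run_$SLURM_JOB_ID

# run the program that was built in the submission directory
srun $SLURM_SUBMIT_DIR/helloc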

The script threads_slurm shows how to run a hybrid MPI/OpenMP program. The program it runs is /opt/utility/phostname. This is again a glorified "hello world" program that also prints the thread ID. Note that the source for this program is included in the directory and it can be built using the command make phostname.
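A hybrid script of the kind threads_slurm represents typically runs fewer MPI tasks per node and gives each task several OpenMP threads. The sketch below illustrates the idea; the node, task, and thread counts are assumptions, not the actual contents of threads_slurm.

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00

# give each MPI task one OpenMP thread per core it was allocated
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

# launch the hybrid hello world program
srun /opt/utility/phostname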