Instructions

These examples exercise the GPUs on the Power nodes in various ways. To build/run these examples:

  1. In a new directory, download the examples.
    wget http://geco.mines.edu/prototype/Show_me_Power8_and_GPU_examples/gpu/gpu.tgz
  2. Uncompress it.
    tar -xzf gpu.tgz
  3. Get an interactive session on ppc001 or ppc002.
    srun -N 1 --tasks-per-node=1 -p ppc-build --share --time=1:00:00 --gres=gpu:kepler:4 --pty bash
  4. Run the buildit script, which sets up the environment and runs make.
    ./buildit
  5. Exit the interactive session.
    exit
  6. Run the batch script.
    sbatch -p ppc power_script

buildit

A script that sets up the environment and then runs make.

Makefile

Makefile for the examples.

gpucount.c

This program returns the number of GPUs detected on a node. It should be 4 on ppc001 and ppc002. Any other value indicates a problem with your environment.

simple.f90

A very simple OpenACC program.

laplace2d.c
laplace2d.f90

Jacobi relaxation calculation in OpenACC and OpenMP. This is from the NVIDIA workshop.
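
For readers new to the method, here is a minimal sketch of Jacobi relaxation in C. It is not the workshop code: the grid size N, the function names jacobi_sweep and jacobi_solve, and the choice of OpenACC clauses are all assumptions made for illustration. A compiler without OpenACC support ignores the pragma and runs the loops serially on the CPU.

```c
#include <assert.h>

/* Hypothetical grid size for this sketch; laplace2d.c may use another. */
#define N 64

/*
 * One Jacobi sweep: every interior point of anew becomes the average of
 * its four neighbours in a. Boundary points are left untouched.
 * Returns the largest absolute change seen in this sweep.
 */
double jacobi_sweep(double a[N][N], double anew[N][N])
{
    double err = 0.0;
#pragma acc parallel loop collapse(2) reduction(max:err) copyin(a[0:N][0:N]) copy(anew[0:N][0:N])
    for (int j = 1; j < N - 1; j++) {
        for (int i = 1; i < N - 1; i++) {
            anew[j][i] = 0.25 * (a[j][i + 1] + a[j][i - 1] +
                                 a[j - 1][i] + a[j + 1][i]);
            double diff = anew[j][i] - a[j][i];
            if (diff < 0.0) diff = -diff;
            if (diff > err) err = diff;
        }
    }
    return err;
}

/*
 * Repeat sweeps until the largest change drops to tol or below, or until
 * iter_max sweeps have run. Returns the number of sweeps performed.
 */
int jacobi_solve(double a[N][N], double anew[N][N], double tol, int iter_max)
{
    assert(tol > 0.0 && iter_max > 0);
    int iter = 0;
    double err = tol + 1.0;
    while (err > tol && iter < iter_max) {
        err = jacobi_sweep(a, anew);
        /* Copy the interior back so the next sweep reads the new values. */
        for (int j = 1; j < N - 1; j++)
            for (int i = 1; i < N - 1; i++)
                a[j][i] = anew[j][i];
        iter++;
    }
    return iter;
}
```

To try it, set the boundary values in both arrays (for example, 1.0 along one edge and 0.0 elsewhere) and call jacobi_solve(a, anew, 1.0e-3, 1000). This sketch moves the arrays to and from the device on every sweep, which is simple but slow; keeping the data resident on the GPU across sweeps would be the usual next optimization.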

timer.h

Timer code for laplace2d.c.

multi.cu
multShare.h
multi_cuda.cu

A matrix multiply in CUDA, from:
https://www.shodor.org/media/content/petascale/materials/UPModules/matrixMultiplication/moduleDocument.pdf

simpleCUFFT.cu
MakeFFT

cuFFT library example.
See: https://developer.nvidia.com/gpu-accelerated-libraries

testinput.cu
testinput.f90

CUDA programs in C and Fortran. The CPU code reads the grid and block dimensions and then calls the kernel. The number of threads the kernel runs with is the product of the grid and block dimensions. The kernel fills in an array of length 6*(# threads), six values per thread: a thread number, followed by blockIdx.x, blockIdx.y, threadIdx.x, threadIdx.y, and threadIdx.z. Finally, the CPU prints this array. The file "input" supplies the input for these programs.
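
The array layout described above can be sketched on the CPU in plain C, with no GPU required. This is an emulation, not the code in testinput.cu: the dim3_t struct, the function names, and in particular the way the thread number is assigned (blocks in order, then threads within a block with x varying fastest) are assumptions. The real kernel may number its threads differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for CUDA's dim3 type. */
typedef struct { unsigned x, y, z; } dim3_t;

/* Values stored per thread: thread number plus five indices. */
#define FIELDS 6

/* Total threads in a launch: product of all grid and block dimensions. */
size_t total_threads(dim3_t grid, dim3_t block)
{
    return (size_t)grid.x * grid.y * grid.z * block.x * block.y * block.z;
}

/*
 * CPU emulation of the fill. out must hold FIELDS * total_threads ints.
 * Each record is: thread number, blockIdx.x, blockIdx.y,
 * threadIdx.x, threadIdx.y, threadIdx.z.
 */
void fill_layout(dim3_t grid, dim3_t block, int *out)
{
    /* The record has no slot for blockIdx.z, so assume a 2D grid. */
    assert(grid.z == 1);
    int tid = 0; /* assumed numbering: x fastest, blocks outermost */
    for (unsigned by = 0; by < grid.y; by++)
    for (unsigned bx = 0; bx < grid.x; bx++)
    for (unsigned tz = 0; tz < block.z; tz++)
    for (unsigned ty = 0; ty < block.y; ty++)
    for (unsigned tx = 0; tx < block.x; tx++) {
        int *rec = out + (size_t)FIELDS * tid;
        rec[0] = tid;
        rec[1] = (int)bx;
        rec[2] = (int)by;
        rec[3] = (int)tx;
        rec[4] = (int)ty;
        rec[5] = (int)tz;
        tid++;
    }
}

/* Print one line per thread, six values per line. */
void print_layout(const int *out, size_t nthreads)
{
    for (size_t t = 0; t < nthreads; t++) {
        const int *rec = out + FIELDS * t;
        printf("%d %d %d %d %d %d\n",
               rec[0], rec[1], rec[2], rec[3], rec[4], rec[5]);
    }
}
```

For example, a grid of 2 x 1 blocks with 2 x 2 x 1 threads per block gives 8 threads, so the array holds 48 values.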

input

Input for testinput.cu and testinput.f90.

power_script

A script for running the examples.