TRAINING

Introduction

To get started, read the HPC Wiki on the Marquette HPC website, which explains how to get an account and how to use Père, the primary resource on MUGrid. Below is a short walkthrough of how one might use distributed resources to solve big computational tasks. Additional tutorials are available as well.

A Short Example

A grid can be used to solve big computational tasks. "Big" could mean running an independent simulation millions of times with changing parameters (sometimes called distributed or embarrassingly parallel computing). "Big" could also mean a massively parallel finite element simulation of a very large geometry with billions of nodes. Père and Pario are suited to the latter type; the former can be run on all of MUGrid.

Suppose one has a simulation program that is called from the command line as follows:

$ runsimulation 27

Here runsimulation is the executable and 27 is the input parameter, which can change ($ just represents the prompt). Suppose also that runsimulation writes its results to an output file named out27.dat. To run that simulation on MUGrid, a submission script is needed; call it submit27.condor, shown here:

universe = vanilla
executable = runsimulation
transfer_output_files=out27.dat
output = 27.out
error = 27.err
log = 27.log
Arguments = 27
requirements = Arch == "INTEL" &&  OpSys == "LINUX"
should_transfer_files = true
when_to_transfer_output = on_exit
queue

In the file, the executable is runsimulation, the argument is 27 (a list of arguments is also allowed), and the output file is out27.dat. The rest can be viewed as overhead in this simple example.
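Because only the parameter changes from run to run, the submit file can also be generated rather than typed by hand. The following is a minimal sketch, not part of the MUGrid tooling: the function name render_submit is hypothetical, and it simply reproduces the settings of submit27.condor for an arbitrary parameter.

```python
def render_submit(param):
    """Return submit-description text for one run of runsimulation.

    render_submit is a hypothetical helper for illustration; the
    settings mirror the submit27.condor example in the text.
    """
    lines = [
        "universe = vanilla",
        "executable = runsimulation",
        # File written by runsimulation that should come back to the submit host.
        f"transfer_output_files = out{param}.dat",
        # Files for the job's standard output, standard error, and event log.
        f"output = {param}.out",
        f"error = {param}.err",
        f"log = {param}.log",
        # Command-line argument passed to runsimulation.
        f"Arguments = {param}",
        'requirements = Arch == "INTEL" && OpSys == "LINUX"',
        "should_transfer_files = true",
        "when_to_transfer_output = on_exit",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the text that would go into submit27.condor.
    print(render_submit(27), end="")
```

Redirecting the printed text into submit27.condor gives a file equivalent to the one shown above.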

To submit it, log into a submit host, make sure a working version of runsimulation and the submit27.condor file are both there, and then type:

$ condor_submit submit27.condor

Then watch the progress of your job by typing:

$ condor_q
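The log file named in the submit file (27.log here) also records the job's progress as a series of events. The sketch below is a rough illustration under one assumption: that the log uses the plain-text user log layout in which each event starts with a three-digit code followed by the job identifier in parentheses (000 for submitted, 001 for executing, 005 for terminated). The function name count_events is hypothetical.

```python
import re
from collections import Counter

# A few event codes from the HTCondor user log; other codes are
# reported by their number.
EVENT_NAMES = {"000": "submitted", "001": "executing", "005": "terminated"}

# Assumed event header layout, for example:
#   005 (8135.000.000) 05/25 19:13:06 Job terminated.
EVENT_HEADER = re.compile(r"^(\d{3}) \(\d+\.\d+\.\d+\)")


def count_events(log_text):
    """Count events by type in the text of a job log file."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EVENT_HEADER.match(line)
        if match:
            code = match.group(1)
            counts[EVENT_NAMES.get(code, code)] += 1
    return counts
```

For a single job, a "terminated" count of 1 in the contents of 27.log would indicate that the job has finished.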

To submit runsimulation with parameters ranging from 0 to 999, a new script is needed, shown here in a file called submitall.condor:

universe = vanilla
executable = runsimulation
transfer_output_files=out$(Process).dat
output = $(Process).out
error = $(Process).err
log = $(Process).log
Arguments = $(Process)
requirements = Arch == "INTEL" &&  OpSys == "LINUX"
should_transfer_files = true
when_to_transfer_output = on_exit
queue 1000

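Here queue 1000 creates 1000 jobs, and $(Process) takes the values 0 through 999, one per job. Submitting this file runs all 1000 jobs, and out0.dat through out999.dat will be waiting for you on the submit host when they are done.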
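With this many jobs it is useful to confirm that every output file actually came back. A minimal sketch, assuming the files land in a single directory and follow the out0.dat through out999.dat naming used above; the function name missing_outputs is hypothetical.

```python
from pathlib import Path


def missing_outputs(directory, count=1000):
    """Return the parameters N for which outN.dat is absent from directory."""
    folder = Path(directory)
    return [n for n in range(count) if not (folder / f"out{n}.dat").is_file()]


if __name__ == "__main__":
    # Check the current directory for the 1000 expected result files.
    missing = missing_outputs(".")
    print(f"{len(missing)} of 1000 output files are missing")
```

Any parameters it reports can be checked against the matching .err and .log files, or resubmitted.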

Campus Grids

Campus grids link together computing resources at an institution to support research and collaboration. A goal of campus grids is to provide a seamless workflow from data collection to analysis and dissemination of results. The campus grid is an essential component of discovery in the 21st Century. Read the Cyberinfrastructure Vision for 21st Century Discovery report for more details.