Marquette.edu // High Performance Computing //

Tips and Tricks for the Novice Bash User

Linux is one of those disciplines you can spend your whole life getting better at and still not discover even a fraction of its capabilities. If you have already read through our Intro to the Linux CLI or already have a firm grasp of the basics, here are a few additional tips and tricks.

Wildcards

Wildcard characters are probably the easiest way to dramatically increase your productivity on a Linux machine. In card games a wild card can substitute for any other card in the deck. In the same way, a wildcard character in bash can stand in for other characters in a file name, including letters, numerals and symbols. Before running a command, bash replaces a pattern containing wildcards with the list of matching file names. The technical term for this substitution is an expansion (specifically, filename or pathname expansion). The three most common wildcards are the asterisk (*), the question mark (?) and brackets ([]). The asterisk matches zero or more of any character. The question mark matches exactly one character. Brackets match exactly one character from the set or range of characters listed inside them. The best way to explain the power of wildcards is through an example.

Imagine you have twelve files in a directory: six figures (figure1.fig, figure2.fig, figure3.fig, figureA.fig, figureB.fig and figureC.fig) and six pictures (picture1.jpg, picture2.jpg, picture3.jpg, pictureA.jpg, pictureB.jpg and pictureC.jpg). Suppose you want to organize these files into two separate directories, one for figures and one for pictures. You could move each file into its directory individually, or you could use the asterisk wildcard like so:

mkdir figures
mkdir pictures
mv *.fig figures
mv *.jpg pictures

Here the command

mv *.fig figures

expands to

mv figure1.fig figure2.fig figure3.fig figureA.fig figureB.fig figureC.fig figures

The command to move the jpgs into the pictures directory expands in a similar fashion.
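You can see the expansion for yourself by recreating the twelve example files as empty placeholders in a scratch directory. The directory is created here with mktemp -d, so no real files are touched. If you put echo in front of the mv command, it prints the expanded command instead of running it, and the output should match the line shown above:

```bash
#!/bin/bash
# Work in a throwaway directory so no real files are touched
cd "$(mktemp -d)" || exit 1

# Recreate the twelve example files from the text as empty placeholders
touch figure1.fig figure2.fig figure3.fig figureA.fig figureB.fig figureC.fig
touch picture1.jpg picture2.jpg picture3.jpg pictureA.jpg pictureB.jpg pictureC.jpg

# Preview: echo prints the command after bash has expanded *.fig
echo mv *.fig figures

# Now sort the files for real
mkdir figures pictures
mv *.fig figures
mv *.jpg pictures
ls figures pictures
```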

Suppose that, instead of organizing the files by type, you wanted to separate them based on whether they are labeled with a letter or a number. In that case you could use bracket wildcards like so:

mkdir letters
mkdir numbers
mv *[A-C].* letters
mv *[1-3].* numbers

Notes on Brackets

The - symbol denotes a range, so [A-C] is equivalent to [ABC] but not to [AC], which matches only A or C, not B
Bracketed wildcards are case sensitive so [A-C] is not equivalent to [a-c]
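You can check the bracket rules in a scratch directory with echo, which prints whatever the pattern expands to. This sketch sets LC_ALL=C so that ranges follow plain ASCII order, because range behavior has varied in some other locales. It also relies on bash's default behavior of leaving a pattern unchanged when nothing matches it:

```bash
#!/bin/bash
# Scratch directory with figures labeled by letter and by number
cd "$(mktemp -d)" || exit 1
touch figureA.fig figureB.fig figureC.fig figure1.fig figure2.fig figure3.fig

# Use the C locale so bracket ranges follow plain ASCII order
export LC_ALL=C

echo figure[A-C].fig   # range: A, B or C
echo figure[AC].fig    # set: only A or C
echo figure[a-c].fig   # lowercase range: nothing matches, so bash prints the pattern as-is
echo figure[1-3].fig   # digit range
```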

Let's look at a slightly more complicated scenario. Assume you are in the letters directory and have just finished a secondary analysis. In it, you combined the results of figures A and B into figureAB.fig, and likewise for figures A and C (figureAC.fig) and figures B and C (figureBC.fig). You now want to move figures A, B and C into a directory called individual, and figures AB, AC and BC into a directory called combined. The individual results are denoted by a single letter and the combined results by two letters, so you can easily separate them using the question mark wildcard.

mkdir individual
mkdir combined
mv figure?.fig individual
mv figure??.fig combined
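You can reproduce the same sorting with empty placeholder files in a scratch directory. Afterward, echo with a wildcard lists what ended up in each directory:

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1
# Individual results (one letter) and combined results (two letters)
touch figureA.fig figureB.fig figureC.fig figureAB.fig figureAC.fig figureBC.fig

mkdir individual combined
mv figure?.fig individual    # ? matches exactly one character
mv figure??.fig combined     # ?? matches exactly two characters

echo individual/*
echo combined/*
```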

Curly Brace Expansions

Although not technically a wildcard, curly brace expansions ({}) serve a similar function. Curly braces expand to each item in the comma-separated list of characters, strings or numerals inside them. For example, consider the earlier scenario in which you sorted a group of files by various criteria. To sort the files by letter or number, we used the commands:

mkdir letters
mkdir numbers
mv *[A-C].* letters
mv *[1-3].* numbers

However, we could use curly brace expansions to create the directories letters and numbers in a single command like so:

mkdir {letters,numbers}

Which expands to:

mkdir letters numbers
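Brace expansion happens before a command runs. That means putting echo in front of a command is a harmless way to preview what bash will actually execute. Here is a short sketch (the sim_ names are purely illustrative):

```bash
#!/bin/bash
# echo prints the words produced by brace expansion instead of running mkdir
echo mkdir {letters,numbers}

# Text next to the braces is attached to every item in the list
echo sim_{fast,slow}.log
```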

Although it takes more typing, we could also use curly brace expansion in the move commands:

mv {picture,figure}{A,B,C}.* letters
mv {picture,figure}{1,2,3}.* numbers

When two or more sets of curly braces are used in a single command, the expansion occurs in phases. So the command:

mv {picture,figure}{1,2,3}.* numbers

would first expand to:

mv picture{1,2,3}.* figure{1,2,3}.* numbers

Then expand to

mv picture1.* picture2.* picture3.* figure1.* figure2.* figure3.* numbers

and then lastly use the wildcard to fill in the file extension. We could make this command shorter using sequence notation, a shortcut for writing a numeric or alphabetic sequence. A sequence has the form {x..y}, where x is the number or letter that starts the sequence and y is the number or letter that ends it. Using sequence notation, our commands to move the pictures and figures could be rewritten as:

mv {picture,figure}{1..3}.* numbers
mv {picture,figure}{A..C}.* letters
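You can check the phases described above directly with echo. None of these words contain wildcards, so nothing is looked up on disk:

```bash
#!/bin/bash
# Two adjacent brace lists: every item of the first is paired with every item of the second
echo {picture,figure}{1,2,3}

# Sequence notation works for numbers and for letters
echo {1..3}
echo {A..C}

# Same result as the explicit list above
echo {picture,figure}{1..3}
```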

Important Differences Between Brace Expansions and Wildcard Characters

Wildcard characters are explicitly designed to look through the listing of the current working directory and find files whose names match a particular pattern (e.g., *.fig matches any file name ending in .fig). Curly brace expansions, by contrast, simply expand to the elements of the list, whether or not any matching files exist. They are not designed around file patterns. So in the above example, you would not want to issue the command:

mv {picture,figure}{1..3}.{fig,jpg} numbers

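Bash would pass mv every combination, including both picture1.jpg and picture1.fig. Because there is no file named picture1.fig, mv would report an error for it and for every other name that does not exist, although it would still move the files that do exist.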
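You can see the difference with two placeholder files, picture1.jpg and figure1.fig, in a scratch directory. This sketch prints what each form expands to. It then runs the problematic mv to show that the command fails for the missing names but still moves the real files:

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1
touch picture1.jpg figure1.fig

# Brace expansion writes out every combination, whether or not the files exist
echo {picture,figure}1.{fig,jpg}

# Braces first, then the * wildcard matches only names that really exist
echo {picture,figure}1.*

# mv complains about the two missing names but still moves the two real files
mkdir numbers
mv {picture,figure}1.{fig,jpg} numbers
status=$?
echo "mv exit status: $status"
ls numbers
```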

Environment Variables

A computing environment is the collection of settings in which processes run. It contains information about a user's current login session, which the operating system (OS) and programs use when executing a command. Environment variables are the named values that make up this environment and control how the shell and other programs behave. A variable can be set in the local shell by simply setting it equal to some value, with no spaces around the =:

VAR=value

To reference the variable use a $ like so:

$VAR

To export the variable, use the export command with the variable's name (without a $). This makes the variable part of the environment that is passed to every program started from this shell:

export VAR

Assigning and exporting a value to an environment variable can also be done in a single command:

export VAR=value
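The difference between a plain assignment and an exported variable is easiest to see by starting a child process. In this sketch, GREETING is a hypothetical variable name, and bash -c starts a new bash process that prints whatever value it inherited:

```bash
#!/bin/bash
# A plain assignment creates a shell variable (no spaces around =)
GREETING=hello
echo "in this shell: $GREETING"

# A child process (here a new bash) does not see it yet
bash -c 'echo "in a child process: $GREETING"'

# After export, child processes started from this shell inherit it
export GREETING
bash -c 'echo "in a child process: $GREETING"'
```

The first child should print an empty value and the second should print hello. Exporting only affects programs started afterward from this shell. It never changes processes that are already running.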

Common Environment Variables

Most of these common environment variables are set upon login. However, this is not an exhaustive list of all environment variables necessary for a functioning environment, nor does it include the countless optional environment variables used by individual programs.

HOME

The location of the user's home directory.

USER

The user's username.

SHELL

The path to the user's login shell, the program that interprets commands typed into the terminal (e.g., /bin/bash).

PWD

Points to the current working directory.

PATH

A list of colon (:) separated fully qualified paths which tell the shell where to look for executables.

LD_LIBRARY_PATH

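A list of colon-separated, fully qualified paths that tell the dynamic linker (not the shell) where to look for shared objects (libraries) when a program is run, in addition to the system's default library locations.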
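To inspect these variables on your own system, you can print them with echo. Splitting PATH on its colons makes the search order easier to read. The sketch below uses bash's read builtin for the split and the command -v builtin to show which file the shell would actually run for ls. The array name path_dirs is just an illustrative choice:

```bash
#!/bin/bash
# Print a few of the variables described above
echo "HOME=$HOME"
echo "USER=$USER"
echo "SHELL=$SHELL"
echo "PWD=$PWD"

# Split PATH on colons and print one directory per line, in search order
IFS=: read -r -a path_dirs <<< "$PATH"
for d in "${path_dirs[@]}"; do
    echo "$d"
done

# command -v reports which file the shell would run for a command name
command -v ls
```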

Example

Assume you have just written and compiled some C code. You now wish to use this new executable on several different simulations. You could copy the binary into each simulation folder and execute it from there, or you could create a bin subdirectory in your home directory and add it to your PATH.

First we can examine our current path using the echo command:

echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

Now let's make the bin directory in our home directory (here /mmfs1/home/user), copy our executable, in this example called my_executable, into it, and add the directory to our PATH:

mkdir bin
cp my_executable bin
export PATH=/mmfs1/home/user/bin

The issue with the above commands is that you have now erased your current PATH and replaced it with a new PATH, as we can see by reexamining our PATH:

echo $PATH
/mmfs1/home/user/bin

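This can be a major problem. Basic commands like ls, cp, mv and mkdir are located in /bin (or /usr/bin), and once those directories are removed from your PATH, the shell can no longer find them. Shell built-ins such as cd and pwd still work, because bash does not look them up in PATH. To properly add a directory to your PATH, you should use a command like this: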

export PATH=$PATH:/mmfs1/home/user/bin

This appends the new directory to the end of the current path. You could also use the HOME variable to make this command more succinct:

export PATH=$PATH:$HOME/bin

Now we can see that the new folder was added to the PATH instead of replacing it.

echo $PATH
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/mmfs1/home/user/bin
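You can try the append pattern without touching your real $HOME/bin. The sketch below uses a scratch directory from mktemp as a stand-in for it, and a hypothetical two-line script named hello_hpc in place of my_executable. Because the change is made with export inside the running shell, it lasts only for that shell session:

```bash
#!/bin/bash
# Stand-in for $HOME/bin: a scratch directory, so nothing real changes
mybin="$(mktemp -d)"

# A tiny hypothetical executable called hello_hpc
cat > "$mybin/hello_hpc" <<'EOF'
#!/bin/bash
echo "hello from my bin directory"
EOF
chmod +x "$mybin/hello_hpc"

# Append the directory, keeping every existing entry
export PATH="$PATH:$mybin"

# The command can now be run by name from any directory, and ls still works
cd / && hello_hpc
command -v ls
```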

Bash Scripting Basics

A bash script is a text file containing a list of bash commands to be executed when the script is invoked. A bash script should start with a hashbang line (also called a shebang line), which looks like this:

#!/bin/bash

Quick Tip

Don't forget to check whether your script is executable using 'ls -l' (look for an x in the permission string, e.g. -rwxr-xr-x)
If not, you can add execute permission using the 'chmod' command, e.g. 'chmod +x compile.sh'

This tells the OS to run the script with /bin/bash, which then interprets the following lines as bash commands. After the hashbang line, any command that could be typed into a terminal can be added to the script. The pound sign (#) is used to add comments. When a # begins a word, bash ignores it and everything after it on that line. For example, a simple script to compile and run a C program would look like this:

#!/bin/bash
gcc -o myprog myprog.c #compile program using gcc
./myprog #run program

Assuming the name of the script is compile.sh, it would then be run using the following command:

./compile.sh
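You can try the whole process, from the shebang line to running the script, in a scratch directory. This sketch does the following:
- writes a tiny hypothetical script, hello.sh, with a here-document (cat > file <<'EOF' ... EOF)
- checks its permissions with ls -l
- makes it executable with chmod +x
- runs it

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# Write a small script; quoting 'EOF' stops the shell expanding anything inside
cat > hello.sh <<'EOF'
#!/bin/bash
echo "Hello from a script"  # everything after the # is a comment
EOF

ls -l hello.sh      # no x in the permissions yet
chmod +x hello.sh   # add execute permission
ls -l hello.sh      # now shows x, e.g. -rwxr-xr-x
./hello.sh
```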

Variables and Arguments

Variables can be assigned and referenced in the same manner as environment variables. To assign the variable var the value hello, write the following. There must be no spaces around the =; otherwise bash tries to run var as a command:

var=hello

It can then be referenced using the $ symbol like so:

echo $var

To pass arguments into a bash script from the terminal, add them after the script name when calling the script. Inside the script, each argument can be referenced using the $ symbol followed by its numerical position ($1, $2, and so on). For example, a bash script that looks like this:

#!/bin/bash

echo "argument 1 is $1"
echo "argument 2 is $2"
echo "argument 3 is $3"

Could be called like this:

./my_script argument1 a2 arg3

and would produce the following output:

argument 1 is argument1 
argument 2 is a2 
argument 3 is arg3
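You can reproduce this output by saving the script above as my_script. The sketch below also shows two behaviors worth knowing. First, quotes group several words into a single argument. Second, an argument that was not supplied expands to an empty string rather than causing an error:

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# The example script from above, saved as my_script
cat > my_script <<'EOF'
#!/bin/bash
echo "argument 1 is $1"
echo "argument 2 is $2"
echo "argument 3 is $3"
EOF
chmod +x my_script

./my_script argument1 a2 arg3
# Quoting groups words into one argument; the missing third argument is empty
./my_script "two words" only2
```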

A more practical example would be to modify the earlier compile.sh script to accept an argument instead of hard-coding the name of the C program to be compiled. Modifying it to use arguments yields the following script:

#!/bin/bash

gcc -o "$1" "$1.c"   # compile <name>.c into an executable named <name>
./"$1"               # run the newly built program

Then to compile the code myprog.c, you would invoke the compile.sh script using the command:

./compile.sh myprog

This script can now be used to compile and run multiple different programs inside the same folder.

For Loops and If Statements

For Loops

For loops are a useful tool for issuing identical commands to multiple targets. For example, say you have just finished running four iterations of a simulation, and the results are located in the directories sim1, sim2, sim3 and sim4, respectively. You now want to analyze the data inside those directories using a Python script called analyze.py. Note that python analyze.py looks for analyze.py in the current working directory rather than searching your PATH. The examples below therefore assume each simulation directory contains a copy of the script; alternatively, give python the script's full path. Doing this without a for loop would result in a script that looks something like this:

#!/bin/bash

cd sim1
python analyze.py
cd ../sim2
python analyze.py
cd ../sim3
python analyze.py
cd ../sim4
python analyze.py

Utilizing a for loop we can define the variable dir as our target directory then iterate through the four simulation directories like so:

#!/bin/bash

for dir in sim1 sim2 sim3 sim4
do
  cd "$dir"
  python analyze.py
  cd ..
done

The for loop is introduced by the for command, dir is the name of the variable to be defined, and in separates the variable from the list of space separated items to be iterated through. The do command begins the list of commands to be executed on each iteration of the for loop and the loop is terminated with the done command. If we then add in curly brace expansions, we can further simplify the script like so:

#!/bin/bash

for dir in sim{1..4}
do
  cd "$dir"
  python analyze.py
  cd ..
done
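Below is a self-contained sketch of the same loop that can be run anywhere. It creates empty sim1 through sim4 directories in a scratch location. It replaces python analyze.py with an echo placeholder, since analyze.py is not available here. It also adds || continue after cd. This optional safeguard is not part of the original script; it ensures that a missing directory does not cause the following cd .. to climb out of the wrong place:

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1
mkdir sim{1..4}   # empty stand-ins for the four simulation directories

for dir in sim{1..4}
do
  cd "$dir" || continue             # if cd fails, skip this directory (and skip cd ..)
  echo "analyzing in ${PWD##*/}"    # placeholder for: python analyze.py
  cd ..
done
```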

If Statements

An if statement allows you to execute a set of commands if and only if a specific condition is met. A simple if statement looks like this. The spaces just inside the brackets are required, because [ is itself a command:

if [ condition ]
then
   commands
fi

To add additional conditions, you can add an elif clause like so:

if [ condition ]
then
   commands
elif [ condition ]
then
   more commands
fi

To have a default set of commands that runs if none of the conditions are met, you can add an else clause:

if [ condition ]
then
   commands
elif [ condition ]
then
   more commands
else
   more commands
fi

Some basic conditional expressions for strings, integers and files are listed in the table below:

Conditional	Meaning
! EXPRESSION	The EXPRESSION is false.
-n STRING	The length of STRING is greater than zero.
-z STRING	The length of STRING is zero (i.e., it is empty).
STRING1 = STRING2	STRING1 is equal to STRING2.
STRING1 != STRING2	STRING1 is not equal to STRING2.
INTEGER1 -eq INTEGER2	INTEGER1 is numerically equal to INTEGER2.
INTEGER1 -gt INTEGER2	INTEGER1 is numerically greater than INTEGER2.
INTEGER1 -lt INTEGER2	INTEGER1 is numerically less than INTEGER2.
-d FILE	FILE exists and is a directory.
-e FILE	FILE exists.
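Here is a sketch of a few of these tests in action, using illustrative variable names (name, count, scratch). Note the quotes around variables such as "$name": without them, an empty variable disappears from the command entirely and can confuse [. The last two tests show why the numeric and string operators are separate:

```bash
#!/bin/bash
name=""
count=7
scratch="$(mktemp -d)"   # a directory that is guaranteed to exist

if [ -z "$name" ]; then echo "name is empty"; fi
if [ "$count" -gt 5 ]; then echo "count is greater than 5"; fi
if [ -d "$scratch" ]; then echo "$scratch is a directory"; fi
if [ ! -e "$scratch/missing.txt" ]; then echo "missing.txt does not exist"; fi

# Numeric and string tests can disagree: 07 and 7 are the same number but different text
if [ 07 -eq 7 ]; then echo "07 -eq 7 is true"; fi
if [ 07 != 7 ]; then echo "07 != 7 is also true"; fi
```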

For example, the following script reports whether its first argument is above, below, or equal to 10:

#!/bin/bash

if [ "$1" -gt 10 ]
then
   echo "Argument is above 10"
elif [ "$1" -lt 10 ]
then
   echo "Argument is below 10"
else
   echo "Argument is not above or below 10"
fi
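To try the script, save it under a name of your choosing. The sketch below saves a version of the example, with "$1" quoted, under the hypothetical name compare.sh. It then calls the script with a value above, below and equal to 10:

```bash
#!/bin/bash
cd "$(mktemp -d)" || exit 1

# A version of the example script, saved as compare.sh
cat > compare.sh <<'EOF'
#!/bin/bash

if [ "$1" -gt 10 ]
then
   echo "Argument is above 10"
elif [ "$1" -lt 10 ]
then
   echo "Argument is below 10"
else
   echo "Argument is not above or below 10"
fi
EOF
chmod +x compare.sh

./compare.sh 15
./compare.sh 3
./compare.sh 10
```

Calling the script with a non-numeric argument, or with no argument at all, makes [ report an error, so a more robust script would check its input first.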

Next Steps

For more in-depth tutorials, check out these resources:

Ryan's Tutorials
Linux Config

Marquette Resources

The Basics

More Advanced