Author: Riccardo Di Meo (International Centre for Theoretical Physics, EUIndiaGRID Project)

Exercise 1: Learning bash scripting incrementally

EXERCISE DESCRIPTION:

The aim of this exercise is to show the power of bash scripting: we start with a very simple task (that can be accomplished with a single command) and then incrementally add features, ending up with a bash script that solves the original task in a more useful and complete way.

In this process, we will introduce and discuss, step by step, some features, tools, and tricks of bash scripting.

The task

Our goal is to periodically check a file on our machine: it may be the output file of a scientific code, a system log file, or something else.

In this specific case, we want to periodically check how much free memory we have at our disposal. We will use the /proc/meminfo file: it contains information about the memory usage of the system (if you want to know more, the command "man proc" tells you everything about that file, and gives many more interesting details about the files in the /proc directory).

The 2nd line of this file tells us the free memory on the system. Since this file is dynamically generated by the operating system (all files in the /proc directory are), it changes in time as the memory use changes. Thus, we can keep an eye on how much memory is available on our computer just by checking it periodically.

 The steps:

   1. The naive way
   2. First step: print only the line that matters
   3. Second step: check the output every second
   4. Third step: save the line of interest to a file
   5. Fourth step: use a shorter period and more flexibility
   6. Fifth step: create your first script
   7. Sixth step: specifying the file for the output
   8. Seventh step: handling invalid input
   9. Eighth step: save the data in a format suitable for plotting
  10. Ninth step: preventing files from being overwritten

The simplest and most naive way

The most straightforward approach is simply to print the whole file and read it by eye:

$ cat /proc/meminfo

and the command will print something like:

 MemTotal:       906716 kB
 MemFree:        431908 kB
 Buffers:         30496 kB
 Cached:         207928 kB
 SwapCached:          0 kB
 Active:         289864 kB
 Inactive:       112168 kB
 SwapTotal:           0 kB
 SwapFree:            0 kB
 Dirty:              12 kB
 Writeback:           0 kB
 AnonPages:      163628 kB
 Mapped:          78932 kB
 Slab:            23216 kB
 SReclaimable:    16036 kB
 SUnreclaim:       7180 kB
 PageTables:       1776 kB
 NFS_Unstable:        0 kB
 Bounce:              0 kB
 CommitLimit:    453356 kB
 Committed_AS:   612016 kB
 VmallocTotal:   122844 kB
 VmallocUsed:      4096 kB
 VmallocChunk:   117976 kB

Then we need to look at the 2nd line, and repeat the whole procedure every time we want an update.

Done this way, this simple task quickly becomes boring and quite inefficient. Let us try to improve and automate it.

First step: print only the line that matters

 $ cat /proc/meminfo | grep 'MemFree'
 MemFree:         28024 kB

The | (pipe) symbol should already be familiar: it connects the output (or more precisely, the standard output only) of a command to the input of another command, in this case grep.

 grep

grep is one of the most powerful commands in a Linux/Unix environment: it effectively filters the lines passed to it, printing only the ones that match a search criterion specified by the user (technically speaking, it prints the lines matching a pattern).

In our case, the pattern is MemFree: only lines containing that sequence are printed.

The ' quotes used around MemFree prevent bash from interpreting the content in any way: they are not strictly necessary in this case, but they become quite handy in other situations (for instance when the pattern contains spaces or special characters).

If invoked without a pipe, grep can be tested interactively against different patterns: just type the line you want to test and, if that line is printed twice, it was matched (running a command that reads from standard input without a pipe is a common way to test its behavior).

E.g.

 $ grep 'bar'
  bar
  bar
  foobar
  foobar
  BAR               <--- not matched: wrong case!

As you see, the first line written by the user (bar) matches the pattern, as does the second (foobar), while the third (BAR) doesn't. The test is case sensitive, which means that the case of the letters matters (unless you specifically ask for a case-insensitive search with the -i flag).

To exit the program, just press Ctrl+d or Ctrl+c.
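If you need a case-insensitive match, the -i flag mentioned above does the trick. Here is a quick interactive sketch (the test lines are made up, as before):

 $ grep -i 'bar'
  bar
  bar
  BAR
  BAR               <--- matched this time, thanks to -i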

For more information and advanced uses of the grep utility, see its man page (man grep).


Second step: check the output every second

The command watch executes its argument at specific intervals and thus allows us to repeat the test automatically:

 $ watch -n 1 "cat /proc/meminfo|grep MemFree" 

In this case, the free memory is checked every second (the argument 1 of the -n option).

Here we also see another kind of quotes, the " quotes: they are used to group an expression, in this case to tell watch that the whole string "cat /proc/meminfo|grep MemFree" should be periodically executed.
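To get a feeling of the difference between the two kinds of quotes, remember that bash still expands variables inside " quotes, but not inside ' quotes. A minimal sketch (the printed path is just an example and will differ on your machine):

 $ echo "my home is $HOME"
 my home is /home/student
 $ echo 'my home is $HOME'
 my home is $HOME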


Third step: save the line of interest to a file

The easiest way to accomplish this is to use output redirection with the > or >> operators (assuming you know them already); however, this has the drawback of hiding the output from the user (nothing is printed on the terminal).

To get around this limitation, we can use the command tee. Our command is now:

$ watch -n 1 "cat /proc/meminfo|grep 'MemFree'|tee -a free_memory.txt"  

which will append (due to the -a option) everything received on its standard input (through the pipe) to the file free_memory.txt, while still printing it on the terminal.
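As a quick refresher on the redirection operators mentioned above (free_memory.txt is just the file name used in this step): > overwrites the target file, while >> appends to it; in both cases, unlike with tee, nothing is printed on the terminal:

$ cat /proc/meminfo | grep 'MemFree' >  free_memory.txt
$ cat /proc/meminfo | grep 'MemFree' >> free_memory.txt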


Fourth step: use a shorter period and more flexibility

Since watch doesn't support periods shorter than a second, we can write the instructions needed to endlessly repeat the command ourselves. To do this we will use a bash while loop.

$ while /bin/true
> do
> cat /proc/meminfo | grep 'MemFree' | tee -a free_memory.txt
> sleep 0.2
> done

Please note how the prompt changes after the while line. This is the bash way of telling you that you are writing a block of code which is not yet closed.

The same loop could have been written on a single line as:

$ while /bin/true; do cat /proc/meminfo | grep 'MemFree' | tee -a free_memory.txt; sleep 0.2; done

which is how bash will recall it when you consult the bash history (e.g. by repeatedly pressing the "UP" arrow).

In Bash, the ";" symbol can be used to separate different commands, just like hitting the "enter" key.

The while-do block is one of bash's many kinds of loops; a couple of other common ones are shown below.
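Just to give an idea, here is a minimal sketch of two other common loop constructs (the numbers and variable names are arbitrary):

for i in 1 2 3
do
  # the "for" loop iterates over a list of words
  echo "iteration number $i"
done

n=0
until test $n -ge 3
do
  # the "until" loop repeats until its condition becomes true
  echo "n is $n"
  n=$((n+1))
done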


Fifth step: create your first script

At this point, the chain of commands is becoming more and more complicated. In order to reuse it, or modify it to improve its functionality, we put it in a script.

Open your favorite editor, and then save a file named test_script.sh containing the following text:

  
#!/bin/bash
while /bin/true
do
  cat /proc/meminfo        |    
           grep 'MemFree'  |    
           tee -a free_memory.txt
  sleep 0.2
done

and then change the permissions of the file with the command:

$ chmod +x test_script.sh

The .sh extension is the standard convention for bash/sh scripts and should always be used for them. Keep in mind, however, that in Linux (and Unix in general) file extensions have no special meaning to the system: they are only used to indicate the format of the file to the user or to some applications (some editors, for example, turn on syntax highlighting for shell scripts if they are saved with the .sh extension).

Now your script can be executed, if you are in the directory where it was saved:

$ ./test_script.sh
MemFree:        210252 kB
MemFree:        209880 kB
MemFree:        209880 kB
MemFree:        209880 kB
MemFree:        209880 kB
MemFree:        209880 kB
   (...)

or by specifying its full path or, if its directory is in the PATH, simply as test_script.sh.

The chmod command sets the execute bit of the file test_script.sh for its owner (you, since you created it) and thus allows it to be executed.

Unix has a complex and flexible system of permissions which lets you allow or prevent a file or directory from being read, written to or executed (entered, in the case of a directory), both for yourself and for other users.

This is extensively used by the system administrator to prevent ordinary users from changing system-critical files or reading sensitive information. Try to read the file /etc/shadow as a normal user to get an idea.

The point is: after the chmod command, your script can be executed. Do read the chmod manual page (man chmod) to learn more about Unix permissions.
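For instance, you can inspect the permission bits of the script with ls -l before and after the chmod (the owner, group, size and date shown here are just placeholders, and the resulting bits assume a typical umask):

$ ls -l test_script.sh
-rw-r--r-- 1 student student 130 Jan  1 12:00 test_script.sh
$ chmod +x test_script.sh
$ ls -l test_script.sh
-rwxr-xr-x 1 student student 130 Jan  1 12:00 test_script.sh

The x characters in the first column show that the file is now executable.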

The only differences between the interactive execution and this script are the first line and the formatting (the new lines after the | continuation markers, which don't bother the bash interpreter). The first line, #!/bin/bash, tells the system which interpreter should execute the script. The formatting has no impact on how the code runs, but it improves readability and can thus be extremely useful as soon as your scripts become longer.


Sixth step: specifying the file for the output

It can be convenient to be able to save the output of our command in a user-specified file. In this way we can save the results of different runs without appending to (and thus cluttering) the results of older executions by mistake.

#!/bin/bash

DELAY=0.2

output_file=$1

while /bin/true
do
  cat /proc/meminfo        |  
           grep 'MemFree'  |  
           tee -a ${output_file}
  sleep ${DELAY}
done

Now you execute your script like this:

$ ./test_script.sh free_memory1.txt

and the data will be written to free_memory1.txt.

This script makes a first use of bash variables (DELAY and output_file) and also uses a special kind of variable, $1, which holds the first argument passed to the script (these are called positional parameters).
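A minimal sketch of how bash variables and positional parameters behave (the script name show_args.sh and the variable names are made up for illustration):

#!/bin/bash
# save as show_args.sh, make it executable and run: ./show_args.sh one two
greeting="hello"               # a normal variable: note, no spaces around '='
echo "$greeting, $1"           # $1 is the first argument passed to the script
echo "second argument: $2"     # $2 is the second one
echo "number of arguments: $#" # $# is how many arguments were passed

Running ./show_args.sh one two would print "hello, one", "second argument: two" and "number of arguments: 2".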


Seventh step: handling invalid input

Since we are planning to use our newly created script multiple times, we need to make sure that it will handle wrong input from the user in some reasonable way.

In this simple example, if the user forgets to pass the name of the output file to the command, the script will still run, but it will not do what is expected (you can try it).

A command that reacts to incorrect input by behaving in a strange way is the worst kind of "error handling" (or rather, absence of it), since it will do unexpected things when executed.

The next example will check if the user provided an argument, and exit with a meaningful error otherwise:

#!/bin/bash

# The delay between 2 subsequent checks, in seconds
DELAY=0.2

if test $# -ne 1
then
   # Not enough arguments
   echo -e "Usage:\n$0 output_file" 1>&2
   exit 1
fi

output_file=$1

while /bin/true
do
  cat /proc/meminfo        |  
           grep 'MemFree'  |  
           tee -a ${output_file}
  sleep ${DELAY}
done

What we did was simply to add a check on the number of parameters passed to the script ($#): if it is not 1, we print an error message on the standard error stream (1>&2 redirects the standard output of the echo command to the standard error) telling the user what was wrong (see the echo man page for the meaning of the -e option and of the \n sequence!) and exit with a non-zero exit code.

The conditional execution of the code is handled by the if-then construct, shown in more detail below.
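To give an idea of the kind of conditions you can check, here are a few more (purely illustrative) uses of test inside an if-then construct:

if test -f /etc/passwd
then
   # -f is true if the argument exists and is a regular file
   echo "/etc/passwd exists"
fi

if test "$USER" = "root"
then
   # = compares two strings
   echo "you are root"
else
   echo "you are a normal user"
fi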

We also introduce comments: everything following a # sign until the end of the line is ignored by the BASH interpreter, therefore the # symbol can be used to introduce information for people reading the script.

Commenting the content of a script (or program) is a very good idea and a sign of respect for the people who will read it (if you find yourself writing a lot of long comments, however, it may be that your script/program is too convoluted and you should think about simplifying it: comments are not a substitute for good programming practice).


Eighth step: save the data in a format suitable for plotting

#!/bin/bash

# The delay between 2 subsequent checks, in seconds
DELAY=1

if test $# -ne 1
then
   # Not enough arguments
   echo -e "Usage:\n$0 output_file" 1>&2
   exit 1
fi

output_file=$1

while /bin/true
do
  cat /proc/meminfo        |  
           awk '/MemFree/ {print systime(),$2}' | 
           tee -a ${output_file}
  sleep ${DELAY}
done

We used the awk program (info gawk, if the documentation for GNU awk is properly installed) both to replace grep (thus discarding all lines but the one matching MemFree) and to better format the output (a column with the system time and another with the free memory is printed).

Keep in mind that the $2 in the awk line is not a shell variable like the ones explained before (it is inside ' quotes which, as said, prevent the shell from interpreting it!) but is awk's own way of referring to the second field of the line.
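To see how awk splits a line into fields, you can try something like the following sketch (the input text is made up):

$ echo "MemFree: 431908 kB" | awk '{print $1, $2}'
MemFree: 431908
$ echo "MemFree: 431908 kB" | awk '/MemFree/ {print $2}'
431908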

More about gawk can be found in its info page (info gawk).

Ninth step: preventing files from being overwritten

#!/bin/bash

# The delay between 2 subsequent checks, in seconds
DELAY=1

if test $# -ne 1
then
   # Not enough arguments
   echo -e "Usage:\n$0 output_file" 1>&2
   exit 1
fi

output_file=$1


if test -f ${output_file}
then
   tmp=`mktemp -p . ${output_file}_XXXXX`
   echo "${output_file} already present: moving it to ${tmp}" 1>&2
   mv ${output_file} ${tmp}
fi

while /bin/true
do
  cat /proc/meminfo        |  
           awk '/MemFree/ {print systime(),$2}' | 
           tee -a ${output_file}
  sleep ${DELAY}
done

We used the if-then construct again to check whether the file selected by the user for the output is already present.

If so, we use the command mktemp to obtain the name of a file in the local directory which starts like the filename provided by the user but ends with a random sequence of characters chosen by mktemp, so that no other file with that name is present in the directory. We then move the existing file to that name, preserving it.

In this way we can safely direct the output to the file chosen by the user, without the risk of losing any data.
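To get a feeling of what mktemp does, you can run it by hand (the file name results.txt is just an example, and the random suffix will be different every time):

$ mktemp -p . results.txt_XXXXX
./results.txt_k3FqZ

Note that mktemp also creates the (empty) file whose name it prints; in our script that empty file is immediately replaced by the mv.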

References

  • The man pages

Installed by default in most Linux distributions, they are an invaluable source of information for most commands and should be the first place to check for references and examples.

To access them, type man followed by the name of the command whose man page you want to read.

Some notable man pages:

 man bash
 man echo
 man chmod
 man grep
  • The info pages

Not always installed by default, they are a very useful source of information about the more complex programs, and they can be navigated through a system of links.

Just type info to access the main index of all the info pages.

Some notable links (not necessarily related to this bash script tutorial):

 info gawk
 info coreutils
 info gcc

though all the documents there are worth a glance.

  • The GNU site

Since most of the commands in a GNU/Linux distribution come from the GNU project, it is worth mentioning its site:

http://www.gnu.org

where the manuals for many commands, in many formats (even ready to be printed) can be found.

The most relevant manual is probably this one, since it covers the most important commands commonly used in scripting:

http://www.gnu.org/software/coreutils/manual/

  • Last but not least: Google (or your favorite search engine).

Use Google to find out what a command does (when you cannot find its man page), to find people who have run into your same problems and see how they solved them, and to search for documentation.

Keep in mind that if your problem has already been solved (and for trivial problems this is true 99% of the time), the answer can be found this way.