Latin American School On Computational Materials Science

Regular expressions

For the criteria specified above (the matching patterns), grep uses regular expressions.

Regular expressions are a concise and flexible means for identifying strings of text of interest (from "Regular expressions" on Wikipedia). As long as a pattern contains only letters and numbers, it is searched for in the same way as a normal string; other characters (like ".", "$" and many others) take on special meanings.

A regular expression is a sequence of characters that encodes a set of rules that a string may or may not satisfy. A string that satisfies the rules in a regular expression is said to be "matched" by it.

Regular expressions may be very simple, like a, which matches any string containing at least one a, or ar, which matches, for example, radar and garage.
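As a minimal sketch of the ar example, the following feeds a few sample words to grep on standard input. The words tree and banana are added here only as non-matching illustrations; they are not part of the original text.

```shell
# Each word is one input line; grep prints only the lines that contain "ar".
matches=$(printf '%s\n' radar garage tree banana | grep 'ar')
echo "$matches"
# prints:
# radar
# garage
```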

Almost all characters that are not letters or numbers have special meanings in the regular expressions that we are used to. For example, the regular expression . does not match only strings containing a dot: it matches every string containing at least one character.

This is because the . symbol matches (finds a correspondence with) any single character. The regular expression fo. will therefore match not just fo. but also foa, foO, fo1, fo+ and any other string containing the characters fo followed by at least one more character.
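The behaviour of the dot can be checked with a short sketch like the one below. The sample strings fo, food and bar are assumptions added for contrast. The second command escapes the dot with a backslash, so that it matches only a literal dot.

```shell
# "fo." needs fo plus one more character: "fo" alone and "bar" do not match.
any_char=$(printf '%s\n' 'fo.' foa fo1 'fo+' fo food bar | grep 'fo.')
echo "$any_char"
# prints:
# fo.
# foa
# fo1
# fo+
# food

# "fo\." matches only fo followed by a real dot.
literal_dot=$(printf '%s\n' 'fo.' foa fo1 'fo+' fo food bar | grep 'fo\.')
echo "$literal_dot"
# prints:
# fo.
```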

Just to give you an idea of what regular expressions can do, this regular expression:

^(test|sim)data_[[:digit:]]+\.dat$

will match all strings starting (^) with either test or sim ((test|sim)), followed by data_ (data_), then one or more (+) digits ([[:digit:]]), and terminating ($) with a literal .dat (\.dat, where the backslash makes the dot match only a dot).

You can test it using grep, which here reads the lines you type on standard input and prints back each line that matches:

$ grep -E '^(test|sim)data_[[:digit:]]+\.dat$'
testdata_1.dat
testdata_1.dat
testdata_.dat         <-- not matched: _ not followed by 1 or more digits
simdata_3200.dat
simdata_3200.dat
foo testdata_1.dat    <-- not matched: not starting with test/sim
testdata_1.dat2       <-- not matched: not terminated by .dat

Regular expressions also allow very complex rules to be built gradually from simple ones, and they are used everywhere in Unix systems: knowing them at least at a basic level is required.
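One way to see this gradual construction is to count how many lines survive as each rule is added. The sketch below rebuilds the pattern ^(test|sim)data_[[:digit:]]+\.dat$ in four steps. The list of names is hypothetical (data_7.dat in particular is added only for illustration), and grep -c prints the number of matching lines instead of the lines themselves.

```shell
# Hypothetical file names, one per line, used only for this illustration.
names='testdata_1.dat
simdata_3200.dat
testdata_.dat
foo testdata_1.dat
testdata_1.dat2
data_7.dat'

# Step 1: the literal text data_ appears in every name.
step1=$(printf '%s\n' "$names" | grep -E -c 'data_')
# Step 2: require one or more digits right after data_.
step2=$(printf '%s\n' "$names" | grep -E -c 'data_[[:digit:]]+')
# Step 3: require the line to start with test or sim.
step3=$(printf '%s\n' "$names" | grep -E -c '^(test|sim)data_[[:digit:]]+')
# Step 4: require the line to end with a literal .dat.
step4=$(printf '%s\n' "$names" | grep -E -c '^(test|sim)data_[[:digit:]]+\.dat$')

echo "$step1 $step2 $step3 $step4"   # prints: 6 5 3 2
```

Each added rule can only narrow the set of matching lines, which makes it easy to spot the step at which an expected match disappears.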

The manual page for grep is a good starting point for learning the basics of both regular expressions and grep (which is also a must-know): read it and experiment with them.
