Unix Shell Scripts: Regular Expressions

What Are Regular Expressions?

A regular expression is a pattern template you define that a Linux utility uses to filter text. A Linux utility (such as the sed editor or the gawk program) matches the regular expression pattern against data as that data flows into the utility. If the data matches the pattern, it’s accepted for processing.

If the data doesn’t match the pattern, it’s rejected. The regular expression pattern makes use of wildcard characters to represent one or more characters in the data stream.
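As a minimal sketch of this accept/reject behavior, the following feeds several made-up lines through sed. The -n option stops sed from printing every line automatically, and the p command prints only the lines that match the pattern, so any line without a match is effectively rejected.

```shell
#!/bin/sh
# Three sample lines (made up for illustration) flow into sed.
# -n suppresses automatic printing; /test/p prints only matching lines,
# so the middle line ("This is a trial") is rejected.
accepted=$(printf '%s\n' "This is a test" "This is a trial" "Another test line" |
  sed -n '/test/p')
echo "$accepted"
```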

Types of regular expressions:

There are two popular regular expression engines:

The POSIX Basic Regular Expression (BRE) engine
The POSIX Extended Regular Expression (ERE) engine

Defining BRE Patterns:

The most basic BRE pattern matches plain text characters in a data stream.

Eg 1: Plain text

$ echo "This is a test" | sed -n '/test/p'

This is a test

$ echo "This is a test" | sed -n '/trial/p'

$ echo "This is a test" | gawk '/test/{print $0}'

This is a test

$ echo "This is a test" | gawk '/trial/{print $0}'
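A plain-text pattern matches wherever those characters appear in a line, not only as a whole word, and the match is case-sensitive. This sketch uses made-up sample words to show both points with sed:

```shell
#!/bin/sh
# /test/ matches the characters t-e-s-t anywhere in the line,
# so "testing" and "contest" are accepted.
# "trial" has no match, and "Test" is rejected because matching is case-sensitive.
matches=$(printf '%s\n' testing contest trial Test | sed -n '/test/p')
echo "$matches"
```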

Eg 2: Special characters

The special characters recognized by regular expressions are:

.*[]^${}\+?|()

For example, if you want to search for a dollar sign in your text, just precede it with a backslash character:

$ cat data2

The cost is $4.00

$ sed -n '/\$/p' data2

The cost is $4.00
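The same rule applies to the other characters in the list. For instance, an unescaped period matches any single character, so a pattern meant to find the literal text 4.00 can also accept unintended lines. A sketch with made-up input:

```shell
#!/bin/sh
# Unescaped '.' matches any one character, so "Code 4x00" is accepted too.
loose=$(printf '%s\n' "The cost is 4.00" "Code 4x00" | sed -n '/4.00/p')
# Escaped '\.' matches only a literal period.
strict=$(printf '%s\n' "The cost is 4.00" "Code 4x00" | sed -n '/4\.00/p')
echo "loose: $loose"
echo "strict: $strict"
```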

Eg 3: Looking for the ending

The dollar sign ($) special character defines the end anchor: the pattern matches only when it appears at the end of a line.

$ echo "This is a good book" | sed -n '/book$/p'

This is a good book

$ echo "This book is good" | sed -n '/book$/p'
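The caret (^) from the special-character list is the matching start anchor: it requires the pattern to appear at the beginning of a line. Using both anchors together accepts only a line that consists of exactly the pattern. A sketch with made-up sample lines:

```shell
#!/bin/sh
# /^This/ accepts only lines that begin with "This".
start=$(printf '%s\n' "This is a good book" "Is this a good book" | sed -n '/^This/p')
# /^book$/ accepts only a line containing exactly "book" and nothing else.
exact=$(printf '%s\n' "book" "a good book" | sed -n '/^book$/p')
echo "start: $start"
echo "exact: $exact"
```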

Eg 4: Using ranges

You can use a range of characters within a character class by using the dash symbol.

For example, to match lines that contain only a five-digit zip code, specify a range of digits for each of the five positions:

$ sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' data8

60633

46201

45902
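Because the contents of data8 aren't shown here, the following sketch feeds made-up lines through the same pattern. Each [0-9] matches exactly one digit, and the ^ and $ anchors reject lines with extra or missing characters:

```shell
#!/bin/sh
# Made-up sample lines: two valid five-digit codes, one with six digits,
# one with four digits, and one with the letter O in place of a zero.
zips=$(printf '%s\n' 60633 606331 4620 6O633 45902 |
  sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p')
echo "$zips"
```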

Extended Regular Expressions:

The POSIX ERE patterns include a few additional symbols that are used by some Linux applications and utilities. The gawk program recognizes ERE patterns, but the sed editor treats patterns as BREs by default (GNU sed and BSD sed switch to ERE syntax with the -E option).
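To see the difference between the two engines, this sketch runs the same pattern through sed twice. In a BRE the plus sign is an ordinary character, so sed's default mode looks for a literal "+". The -E option, supported by GNU sed and BSD sed but possibly missing from older implementations, switches sed to ERE syntax, where + means "one or more" of the preceding character.

```shell
#!/bin/sh
# Default (BRE) mode: '+' is literal, so only the line "be+t" matches.
bre=$(printf '%s\n' 'be+t' 'beet' | sed -n '/be+t/p')
# ERE mode (-E): '+' means one or more 'e', so only "beet" matches.
ere=$(printf '%s\n' 'be+t' 'beet' | sed -E -n '/be+t/p')
echo "BRE: $bre"
echo "ERE: $ere"
```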

Eg 1: The question mark

The question mark indicates that the preceding character can appear zero or one time, but that’s all. It doesn’t match repeating occurrences of the character:

$ echo "bt" | gawk '/be?t/{print $0}'

bt

$ echo "bet" | gawk '/be?t/{print $0}'

bet

$ echo "beet" | gawk '/be?t/{print $0}'

$ echo "beeet" | gawk '/be?t/{print $0}'
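A compact way to check all four words at once is to send them through a single awk command, one word per line. This sketch uses plain awk rather than gawk (POSIX awk also uses ERE syntax), on the assumption that gawk behaves the same for this pattern. A pattern with no action prints each matching line.

```shell
#!/bin/sh
# Each word is one input line; awk prints the lines where 'e'
# appears zero or one time between 'b' and 't'.
qmark=$(printf '%s\n' bt bet beet beeet | awk '/be?t/')
echo "$qmark"
```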

Eg 2: The plus sign

The plus sign indicates that the preceding character can appear one or more times, but must be present at least once. The pattern doesn’t match if the character is not present:

$ echo "beeet" | gawk '/be+t/{print $0}'

beeet

$ echo "beet" | gawk '/be+t/{print $0}'

beet

$ echo "bet" | gawk '/be+t/{print $0}'

bet

$ echo "bt" | gawk '/be+t/{print $0}'
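For comparison, the asterisk (*) from the special-character list allows the preceding character to appear zero or more times. Running both quantifiers over the same words (plain awk again, assumed equivalent to gawk here) shows that the only difference is whether the character may be absent:

```shell
#!/bin/sh
words='bt
bet
beeet'
# e+ : at least one 'e' is required, so "bt" is rejected.
plus=$(printf '%s\n' "$words" | awk '/be+t/')
# e* : zero 'e' characters are allowed too, so every word matches.
star=$(printf '%s\n' "$words" | awk '/be*t/')
echo "plus: $plus"
echo "star: $star"
```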

Eg 3: The pipe symbol

The pipe symbol allows you to specify two or more patterns that the regular expression engine uses in a logical OR formula when examining the data stream. If any of the patterns match the data stream text, the text passes. If none of the patterns match, the data stream text fails.

The format for using the pipe symbol is:

expr1|expr2|...

Here’s an example of this:

$ echo "The cat is asleep" | gawk '/cat|dog/{print $0}'

The cat is asleep

$ echo "The dog is asleep" | gawk '/cat|dog/{print $0}'

The dog is asleep

$ echo "The sheep is asleep" | gawk '/cat|dog/{print $0}'
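The expr1|expr2|... format allows any number of alternatives. In this sketch (sample sentences made up, plain awk assumed equivalent to gawk), awk prints the second field ($2) of each line that contains cat, dog, or bird:

```shell
#!/bin/sh
# Lines containing any of the three alternatives pass; "sheep" fails.
animals=$(printf '%s\n' "The cat is asleep" "The dog is asleep" \
  "The sheep is asleep" "The bird is asleep" |
  awk '/cat|dog|bird/ { print $2 }')
echo "$animals"
```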

Eg 4: Grouping expressions

When you group a regular expression pattern, the group is treated like a standard character. You can apply a special character to the group just as you would to a regular character.

For example:

$ echo "Sat" | gawk '/Sat(urday)?/{print $0}'

Sat

$ echo "Saturday" | gawk '/Sat(urday)?/{print $0}'

Saturday
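Grouping is also how you limit the scope of the pipe symbol, which has the lowest precedence in an ERE. Without parentheses, each alternative extends to the ends of the whole pattern; with them, only the grouped part alternates. A sketch with made-up words, using plain awk:

```shell
#!/bin/sh
# ^(c|b)at$ : the group alternates only the first letter.
grouped=$(printf '%s\n' cat bat hat | awk '/^(c|b)at$/')
# ^c|bat$ : without a group this means "starts with c" OR "ends with bat",
# so "cow" is accepted as well.
ungrouped=$(printf '%s\n' cat bat hat cow | awk '/^c|bat$/')
echo "grouped: $grouped"
echo "ungrouped: $ungrouped"
```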
