egrep sed vi awk perl Pattern Matching and Regular Expressions sed & awk

Author: Lynn Porter

• All these Unix commands support using regular expressions to describe patterns.
• There are some minor differences between the regular expressions supported by these programs; see the book for details.
• We will cover the general matching operators.


Not Filename Expansion!

• Although there are similarities to the metacharacters used in filename expansion, we are talking about something different!
• Filename expansion is done by the shell.
• Regular expressions are used by commands (programs).

Search Patterns

• Any character (except a metacharacter!) matches itself.
• The "." character matches any character except newline.

"F." matches an 'F' followed by any character.
"a.b" matches 'a' followed by any one character followed by 'b'.
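The "." examples can be tried directly from a shell. This is a minimal sketch: the sample words are made up, and `grep -E` is used as the current spelling of egrep.

```shell
# Made-up sample words, one per line.
words=$(printf '%s\n' 'Fa' 'F' 'axb' 'ab' 'a.b')

# "F." needs an F plus one more character, so the bare "F" line is skipped.
f_dot=$(printf '%s\n' "$words" | grep -E 'F.')

# "a.b" needs exactly one character between a and b, so "ab" is skipped.
a_dot_b=$(printf '%s\n' "$words" | grep -E 'a.b')

printf '%s\n' "$f_dot" "$a_dot_b"
```

The patterns are single-quoted so the shell passes them to grep untouched.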

Searching for Metacharacters

If you really want to match '.', you can use "\."

Regexp   Matches   Does not match
a.b      axb       abc
a\.b     a.b       axb

Character Class

[abc]      matches a single a, b or c
[a-z]      matches any of abcdef…xyz
[^A-Za-z]  matches a single character as long as it is not a letter.
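A quick way to check the escaped dot and a negated character class is to filter a few test lines. The sample lines below are invented for illustration, and `grep -E` stands in for egrep.

```shell
# Made-up sample lines.
lines=$(printf '%s\n' 'axb' 'a.b' 'abc' 'a1b')

# \. matches only a literal dot.
literal_dot=$(printf '%s\n' "$lines" | grep -E 'a\.b')

# [^A-Za-z] matches one character that is not a letter.
non_letter=$(printf '%s\n' "$lines" | grep -E 'a[^A-Za-z]b')

printf '%s\n' "$literal_dot" "$non_letter"
```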

[Dd][Aa][Vv][Ee]

• Matches "Dave" or "dave" or "dAVE"
• Does not match "ave" or "da"

Repetition using *

* means 0 or more of the previous single character pattern.

[abc]*     matches "aaaaa" or "acbca"
Hi Dave.*  matches "Hi Dave" or "Hi Daveisgoofy"
0*10       matches "010" or "0000010" or "10"
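The sketch below runs the case-insensitive "Dave" class and the * examples against a few invented lines. It assumes `grep -E` in place of egrep. The `-x` option makes grep test the whole line rather than any substring.

```shell
# One character class per letter accepts either case.
dave=$(printf '%s\n' 'Dave' 'dAVE' 'ave' 'da' | grep -E '[Dd][Aa][Vv][Ee]')

# 0* allows zero or more leading zeros before "10".
zeros=$(printf '%s\n' '010' '0000010' '10' '11' | grep -E -x '0*10')

# .* allows anything (including nothing) after "Hi Dave".
greeting=$(printf '%s\n' 'Hi Dave' 'Hi Daveisgoofy' 'Hi Dan' | grep -E -x 'Hi Dave.*')

printf '%s\n' "$dave" "$zeros" "$greeting"
```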

Repetition using +

+ means 1 or more of the previous single character pattern.

[abc]+     matches "aaaaa" or "acbca"
Hi Dave.+  matches "Hi Dave." or "Hi Dave…."
0+10       matches "010" or "0000010"
           does not match "10"

? Repetition Operator

? means 0 or 1 of the previous single character pattern.

x[abc]?x   matches "xax" or "xx"
A[0-9]?B   matches "A1B" or "AB"
           does not match "a1b" or "A123B"
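To see how + and ? differ from *, the same style of check can be repeated. The input lines are made up, `grep -E` stands in for egrep, and `-x` again forces a whole-line match.

```shell
# 0+ needs at least one zero, so "10" no longer matches.
plus=$(printf '%s\n' '010' '0000010' '10' | grep -E -x '0+10')

# .+ needs at least one character after "Hi Dave".
dave_plus=$(printf '%s\n' 'Hi Dave' 'Hi Dave.' | grep -E -x 'Hi Dave.+')

# [0-9]? allows at most one digit between A and B.
optional=$(printf '%s\n' 'A1B' 'AB' 'a1b' 'A123B' | grep -E -x 'A[0-9]?B')

printf '%s\n' "$plus" "$dave_plus" "$optional"
```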

Using regexps with grep

grep regexp files…

egrep [a-z][0-9] file1      (unquoted: the shell may first treat the brackets as a filename pattern)
egrep "[a-z][0-9]" file1    (quoted: egrep receives the regexp unchanged)

Grouping with parens

• If you put a subpattern inside parens you can apply + * and ? to the entire subpattern.

a(bc)*d  matches "ad" and "abcbcd"
         does not match "abcxd" or "bcbcd"
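The effect of the parentheses is easiest to see by comparing the grouped pattern with an ungrouped one. This is a small sketch with invented test strings, using `grep -E -x` (whole-line match) in place of egrep.

```shell
# Made-up test strings.
samples=$(printf '%s\n' 'ad' 'abcbcd' 'abcxd' 'bcbcd' 'abccd')

# With parens, * repeats the whole pair "bc".
grouped=$(printf '%s\n' "$samples" | grep -E -x 'a(bc)*d')

# Without parens, * repeats only the "c".
ungrouped=$(printf '%s\n' "$samples" | grep -E -x 'abc*d')

printf '%s\n' "$grouped" "$ungrouped"
```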

Grouping and Memory

• The string that matches the stuff in parentheses can be used later in the regular expression:

([a-z]+)[ \t]+\1
matches "n n" or "book book"

More than one memory

• \1 is the substring that matches the first regexp in parentheses.
• \2 matches the second substring …
• Up to \9

What does this match?

http:\/\/([a-z])(\1)\2\.*\.com

Sed – stream editor

• Supports many types of editing operations, we look at one very useful one: substitution.

s/regexp/replacement/modifier

sed 's/[aeiou]/\$/g'
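Both the \1 memory and the vowel substitution can be checked on short made-up inputs. Two assumptions in this sketch: back-references inside `grep -E` are an extension offered by GNU and BSD grep rather than something POSIX guarantees, and the whitespace class is simplified to a plain space.

```shell
# \1 must repeat exactly what the first group matched.
doubled=$(printf '%s\n' 'n n' 'book book' 'book shelf' | grep -E '([a-z]+) +\1')

# Every vowel becomes a literal dollar sign; g applies it to all matches.
masked=$(printf '%s\n' 'hello world' | sed 's/[aeiou]/\$/g')

printf '%s\n' "$doubled" "$masked"
```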

sed Command Line

sed [-n] -f scriptfile file(s)
-or-
sed [-n] [-e] 'command' file(s)

-n  means suppress default output (doesn't print anything unless you tell it to).
-f  commands are in the file scriptfile
-e  used when specifying multiple commands from the command line.

sed Commands

[address[,address]][!]command[arguments]

• Each command is applied to any input lines that match the address(es).
• The commands are editing commands:
  – append, replace, insert, delete, substitute, translate, print, copy, paste,…
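A short sketch of the -n and -e options, using three invented input lines:

```shell
# Made-up input.
input=$(printf '%s\n' 'one' 'two' 'three')

# With -n nothing is printed by default, so only the explicit p produces output.
only_two=$(printf '%s\n' "$input" | sed -n '/two/p')

# Each -e adds one command; both are applied to every input line.
edited=$(printf '%s\n' "$input" | sed -e 's/one/1/' -e '/three/d')

printf '%s\n' "$only_two" "$edited"
```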

sed addresses

• The address for a command can be:
  – No address: matches every line of input
  – 1 address: any line that matches the address.
  – 2 addresses: lines that are between a line that matches the first address and one that matches the second address (inclusive).
  – Address followed by !: any line that would not be matched by the address.

sed addresses (cont.)

• Each address can be:
  – line number
  – the '$' character (means the last line).
  – a regular expression inside /s

/foo/
/[hH][eE][aA][dD]/
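The address forms can be compared side by side on one small input. The five lines below are made up for illustration.

```shell
# Made-up input.
input=$(printf '%s\n' 'alpha' 'START' 'beta' 'STOP' 'gamma')

first=$(printf '%s\n' "$input" | sed -n '1p')               # line number
last=$(printf '%s\n' "$input" | sed -n '$p')                # $ is the last line
range=$(printf '%s\n' "$input" | sed -n '/START/,/STOP/p')  # two regexp addresses, inclusive
negated=$(printf '%s\n' "$input" | sed -n '/a/!p')          # lines NOT containing "a"

printf '%s\n' "$first" "$last" "$range" "$negated"
```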

Some Commands

d  delete line
p  print line
h  copy to temp buffer
g  paste (replace with temp buffer).

Some examples

/Windows/d    deletes lines that contain the word Windows.
/[Uu]nix/!d   deletes lines that do not contain the word unix.
/[0-9]/p      prints lines that contain any digits.
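These commands can be exercised on a few invented lines. In this sketch the print example is run with -n so that matching lines are not shown twice, and the h/g pair is illustrated by copying line 1 over line 3.

```shell
# Made-up input.
lines=$(printf '%s\n' 'Windows 95' 'Unix v7' 'unix rocks' 'plain text')

no_windows=$(printf '%s\n' "$lines" | sed '/Windows/d')
only_unix=$(printf '%s\n' "$lines" | sed '/[Uu]nix/!d')
with_digits=$(printf '%s\n' "$lines" | sed -n '/[0-9]/p')

# h saves line 1 in the temp (hold) buffer; g replaces line 3 with that copy.
held=$(printf '%s\n' 'first' 'second' 'third' | sed -e '1h' -e '3g')

printf '%s\n' "$no_windows" "$only_unix" "$with_digits" "$held"
```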

Entire command lines

> sed '1,5d' somefile
prints everything but the first 5 lines of the file somefile (same as tail +6 somefile, spelled tail -n +6 on current systems).

> sed -n '/foo/p' /usr/dict/words
just like grep foo /usr/dict/words

> sed -n '/START/,/STOP/p'
prints everything from STDIN between a line that contains START and one that contains STOP

substitutions

s/regexp/replacement/[flags]
replaces anything that matches the regular expression with the replacement text.

sed 's/[0-9]/#/'
matches every line (no addresses!) and replaces the first digit on the line with "#"
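As a sanity check, the line-range delete can be compared with tail, and the flag-less substitution tried on a line with several digits. The inputs are made up, and `tail -n +6` is used as the portable spelling of the tail command.

```shell
# Seven made-up lines.
nums=$(printf '%s\n' 1 2 3 4 5 6 7)

# Deleting lines 1-5 leaves the same lines that tail prints from line 6 on.
rest=$(printf '%s\n' "$nums" | sed '1,5d')
tail_rest=$(printf '%s\n' "$nums" | tail -n +6)

# Without the g flag only the first digit on the line is replaced.
first_digit=$(printf '%s\n' 'room 101' | sed 's/[0-9]/#/')

printf '%s\n' "$rest" "$first_digit"
```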

Substitution Flags

g  global (replace all matches on the line).
p  print the line if a successful match
w  write the line to a file if a match was found.
n  replace the nth instance (match) only.

substitution examples

sed 's/[wW]indows/Unix/g'

sed -n 's/f/F/gp'
print every line containing 'f' after replacing all 'f's by 'F's.
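The g, n and p flags can be compared on one made-up line. The w flag is left out of this sketch because it writes to a file.

```shell
line='foo fifo off'

# g: every f on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/f/F/g')

# A number n: only the nth match (here the 2nd f) is replaced.
second=$(printf '%s\n' "$line" | sed 's/f/F/2')

# -n with p: only lines where a substitution happened are printed.
printed=$(printf '%s\n' 'foo' 'bar' | sed -n 's/f/F/gp')

printf '%s\n' "$all" "$second" "$printed"
```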

sed scripts

• A series of commands can be put in a file and run with the '-f' option.
• Can also create a sed script:

#!/bin/sed -nf
s/vi/emacs/g
/[Ww]indows/d
p

Possibly Interesting Example

• sed script to remove all HTML tags from a file:

#!/bin/sed -nf
# sed script to remove tags
s/<[^>]*>//g
p
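The tag-stripping script can be tried without creating a script file by passing its two commands with -e. This sketch assumes the tag pattern is `<[^>]*>` (a "<", any run of characters other than ">", then ">"). The HTML line is made up.

```shell
html='<p>Hello <b>world</b></p>'

# Same two commands as the script: strip every tag, then print the line.
text=$(printf '%s\n' "$html" | sed -n -e 's/<[^>]*>//g' -e 'p')

printf '%s\n' "$text"
```

A tag that spans more than one line would not be removed, since sed applies the pattern to one line at a time.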

Even more interesting

• Replace all HTML tags with the tag name (something like <B> would become B)

#!/bin/sed -nf
# \1 is whatever matched the regexp inside the parens!
# (sed's basic regexps write the grouping parens as \( and \))
s/<\([^>]*\)>/\1/g
p

awk

• Pattern matching program, useful for automating complex text-handling chores.
• You create a script that is a series of patterns and corresponding actions (sounds like sed, but is really quite different).
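Before going further into awk, the sed tag-name substitution can be checked on a single made-up line. The sketch uses sed's basic regular expression syntax, where the remembered group is written with \( and \).

```shell
html='<B>bold</B>'

# \1 in the replacement is whatever the parenthesized part matched,
# i.e. the text between "<" and ">".
names=$(printf '%s\n' "$html" | sed 's/<\([^>]*\)>/\1/g')

printf '%s\n' "$names"
```

The closing tag keeps its slash, because the slash is part of what sits between the angle brackets.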

Awk script format

/pattern1/ { actions… }
/pattern2/ { actions… }
BEGIN { actions… }    these actions happen before input is read!
END { actions… }      these actions happen after all the input is read!

Awk and input

• Awk automatically splits each line of input on the field separator FS (by default whitespace).
• Awk creates the variables $1, $2, $3… that correspond to the resulting fields (the same notation a shell script uses for its arguments). $0 is the entire line.
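The pattern/action layout and the field variables can be seen together in a tiny script. The two input lines and the variable name `sum` are invented for this sketch.

```shell
# Made-up input: a name and a number per line.
scores=$(printf '%s\n' 'alice 3' 'bob 4')

report=$(printf '%s\n' "$scores" | awk '
BEGIN { print "start" }        # runs before any input is read
/bob/ { print "found", $1 }    # runs only for lines matching the pattern
      { sum += $2 }            # no pattern: runs for every line
END   { print "total", sum }   # runs after all the input is read
')

printf '%s\n' "$report"
```

Each line is split on whitespace, so $1 is the name and $2 is the number.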

Lots of built-in variables

Example awk '{ for (i=1; i