Extended pattern matching in BASH

BASH has many features that are not widely known or used but can be really useful. One example is extglob (extended pattern matching): with it enabled, a glob pattern can be almost as powerful as a regular expression.

Use “shopt -s extglob” to enable this feature. After that, in addition to the standard asterisks, question marks and square brackets, we can also use the following five sub-patterns:

?(pattern-list): Matches zero or one occurrence of the patterns
*(pattern-list): Matches zero or more occurrences of the patterns
+(pattern-list): Matches one or more occurrences of the patterns
@(pattern-list): Matches exactly one of the patterns
!(pattern-list): Matches anything EXCEPT any of the patterns
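
To make the five sub-patterns concrete, here is a small sketch that tests a few strings against each of them. The helper name matches is hypothetical (it is not part of bash); it only wraps a [[ ]] comparison so the results are easy to read.

```bash
#!/bin/bash
shopt -s extglob

# Hypothetical helper: print "yes" if string $1 matches pattern $2.
# $2 is left unquoted inside [[ ]] so it is treated as a pattern.
matches() {
    if [[ $1 == $2 ]]; then echo yes; else echo no; fi
}

matches color   'colo?(u)r'   # yes: zero occurrences of "u"
matches colour  'colo?(u)r'   # yes: one occurrence
matches colouur 'colo?(u)r'   # no:  two occurrences
matches ab      'a*(x|y)b'    # yes: zero occurrences
matches axyxb   'a*(x|y)b'    # yes: three occurrences
matches ab      'a+(x|y)b'    # no:  at least one is required
matches axb     'a+(x|y)b'    # yes
matches axb     'a@(x|y)b'    # yes: exactly one of the patterns
matches axyb    'a@(x|y)b'    # no:  two occurrences
matches azb     'a!(x|y)b'    # yes: "z" is neither x nor y
matches axb     'a!(x|y)b'    # no
```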

The pattern-list is a list of one or more patterns separated by pipe signs (|); each pattern can itself contain these extended sub-patterns. Two simple examples:

  • rm -rf !(lost+found)
    Removes everything in the current directory except lost+found
  • for x in *.@(jp?(e)g|gif|png)
    Loops through all files having extension jpg, jpeg, gif, or png
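
Since rm -rf is not something to experiment with, the two examples above can be tried safely with a sketch like the following. It assumes mktemp is available, works in a throwaway directory, and only prints what !(lost+found) would select; the file names are made up for illustration.

```bash
#!/bin/bash
shopt -s extglob

# Work in a throwaway directory so nothing real is touched.
sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1
touch a.jpg b.jpeg c.gif d.png e.txt f.jpg.bak
mkdir lost+found

# Files selected by the image pattern from the text.
images=( *.@(jp?(e)g|gif|png) )
printf 'image: %s\n' "${images[@]}"

# Preview of what "rm -rf !(lost+found)" would remove; nothing is deleted here.
doomed=( !(lost+found) )
printf 'would remove: %s\n' "${doomed[@]}"

# Clean up the sandbox itself.
cd / && rm -rf -- "$sandbox"
```

Expanding the pattern into an array (or passing it to echo) first is a cheap way to check a destructive glob before running the real command.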

The following example is a little more complicated. It prints a list of default/GNU/Intel C/C++ compilers present in directories specified by $PATH:

#!/bin/bash
shopt -s extglob nullglob
x="${PATH//:/,}"
eval "printf '%s\n' {$x}/@(?([ig])cc|[cg]++|icpc)?(-+([0-9])+(\.+([0-9])))"

(NOTE: nullglob makes a pattern that matches no file expand to nothing instead of being left unchanged.)
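
The effect of nullglob is easy to see in an empty directory. The following is a minimal sketch, assuming mktemp is available and failglob is not set; the variable names are only illustrative.

```bash
#!/bin/bash
shopt -s extglob

sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1       # empty directory: the pattern below cannot match

shopt -u nullglob
without=( *.@(c|h) )          # left unchanged: one word, the pattern itself
shopt -s nullglob
with=( *.@(c|h) )             # expands to nothing: zero words

echo "without nullglob: ${#without[@]} word(s): ${without[*]}"
echo "with nullglob:    ${#with[@]} word(s)"

cd / && rm -rf -- "$sandbox"
```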

Doesn’t it look like a regular expression? On my system, the output looks like this:

/usr/bin/c++
/usr/bin/c++-4.2.4
/usr/bin/c++-4.3.3
/usr/bin/cc
/usr/bin/g++
/usr/bin/g++-4.2.4
/usr/bin/g++-4.3.3
/usr/bin/gcc
/usr/bin/gcc-4.2.4
/usr/bin/gcc-4.3.3
/usr/x86_64-pc-linux-gnu/gcc-bin/4.2.4/c++
/usr/x86_64-pc-linux-gnu/gcc-bin/4.2.4/g++
/usr/x86_64-pc-linux-gnu/gcc-bin/4.2.4/gcc
/opt/intel/cce/10.1.018/bin/icc
/opt/intel/cce/10.1.018/bin/icpc
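
Because eval re-parses the contents of $PATH, a directory name containing spaces or shell metacharacters could be misinterpreted. One possible alternative, sketched here under the assumption that looping over the directories is acceptable, splits $PATH on colons and applies the same pattern to each directory. The function name find_compilers is hypothetical, and empty $PATH entries (which stand for the current directory) are simply skipped in this sketch.

```bash
#!/bin/bash
shopt -s extglob nullglob

# Hypothetical helper: list compilers found in a colon-separated directory list.
find_compilers() {
    local dirs dir found=()
    IFS=: read -r -a dirs <<< "$1"      # split on colons into an array
    for dir in "${dirs[@]}"; do
        [[ -n $dir ]] || continue       # skip empty entries
        # Quoting "$dir" protects spaces; the unquoted part is the same pattern as above.
        found+=( "$dir"/@(?([ig])cc|[cg]++|icpc)?(-+([0-9])+(\.+([0-9]))) )
    done
    if (( ${#found[@]} > 0 )); then
        printf '%s\n' "${found[@]}"
    fi
}

find_compilers "$PATH"
```

Unlike the brace-expansion version, this prints nothing at all (rather than an empty line) when no compiler is found.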

Unfortunately, we cannot use the following code:

x="${PATH//:/|}"
printf '%s\n' $x/@(?([ig])cc|[cg]++|icpc)?(-+([0-9])+(\.+([0-9])))

This is not surprising, however. In pathname expansion, no sub-pattern is allowed to match a string containing a forward slash (the path delimiter)[1]. (For the same reason, a single asterisk won’t match a file in a subdirectory, which is usually what we want. Bash 4 introduced **, enabled with “shopt -s globstar”, which matches slashes as well.)

[1] In a case statement or a [[ ]] conditional expression (using the == operator), sub-patterns do match slashes.
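
The difference between pathname expansion and string matching can be checked with a small sketch like this one. It assumes mktemp is available and bash 4 or later (for globstar); the directory layout and variable names are made up for the demonstration.

```bash
#!/bin/bash
shopt -s extglob nullglob globstar

sandbox=$(mktemp -d) || exit 1
cd "$sandbox" || exit 1
mkdir -p usr/bin
touch usr/bin/gcc

# Pathname expansion: the sub-pattern would have to match "usr/bin",
# which contains a slash, so nothing is found.
as_glob=( @(usr/bin)/gcc )

# With the slashes outside the sub-patterns, the file is found.
split_glob=( @(usr)/@(bin)/gcc )

# With globstar, ** may span several directory levels.
deep_glob=( **/gcc )

# String matching in [[ ]]: here a sub-pattern may match across slashes.
if [[ usr/bin/gcc == @(usr/bin|opt/bin)/gcc ]]; then
    as_string=yes
else
    as_string=no
fi

echo "as_glob: ${#as_glob[@]} match(es)"
echo "split_glob: ${split_glob[*]}"
echo "deep_glob: ${deep_glob[*]}"
echo "as_string: $as_string"

cd / && rm -rf -- "$sandbox"
```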

