Posts Tagged ‘shell’

globstar in bash 4 follows directory symlinks

Globstar is a new feature in bash 4 that lets us traverse a directory tree more easily.

Unfortunately, it follows directory symlinks and thus can easily cause problems.

(bleeding) desktop t # echo ${BASH_VERSINFO[@]}
4 0 17 2 release x86_64-pc-linux-gnu
(bleeding) desktop t # shopt -s globstar
(bleeding) desktop t # ls -l
total 0
lrwxrwxrwx 1 root root 1 2009-04-16 18:58 t -> .
(bleeding) desktop t # find
.
./t
(bleeding) desktop t # echo **
t t/t t/t/t t/t/t/t t/t/t/t/t t/t/t/t/t/t t/t/t/t/t/t/t t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t/t
(bleeding) desktop t #

Oh no…

If you were unlucky enough to try something like echo /proc/**/meminfo, it would probably make you wait for minutes before dying with “Insufficient memory.” (Each process directory under /proc contains a root symlink that points back to the root directory.)

If you use GRUB to boot your Linux system, you are likely to find a symlink in /boot also named boot pointing to the directory itself. Yes, this is going to confuse bash, too. And there surely are many more cases.

So let’s continue writing find ... | xargs ...
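As a minimal sketch of the find-based alternative: find does not follow symlinks by default, so a self-referencing link like the one in the session above is listed once and never descended into. The helper name list_tree and the scratch directory are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# list_tree is a hypothetical helper: print a directory and everything
# below it, one entry per line, without descending into symlinked
# directories (find's default behaviour).
list_tree() {
    # -print0 | xargs -0 passes filenames NUL-delimited, so names with
    # spaces or newlines reach printf as single arguments.
    find "$1" -print0 | xargs -0 printf '%s\n'
}

# Recreate the situation from the session above in a scratch directory.
demo=$(mktemp -d)
ln -s . "$demo/t"      # t -> . (a symlink pointing at its own directory)
touch "$demo/file"
list_tree "$demo"      # lists the directory, t and file; nothing deeper
rm -rf "$demo"
```

The order in which find prints the entries is not guaranteed, so pipe the result through sort if a stable order matters.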

Extended pattern matching in BASH

Many features provided by BASH are not widely known or used, but they really can be useful. One example is extglob (extended pattern matching) – with this, a pattern can be almost as powerful as a regular expression.

Use “shopt -s extglob” to enable this feature. After that, in addition to the standard asterisks, question marks and square brackets, we can also use the following five sub-patterns:

?(pattern-list): Matches zero or one occurrence of the given patterns
*(pattern-list): Matches zero or more occurrences of the given patterns
+(pattern-list): Matches one or more occurrences of the given patterns
@(pattern-list): Matches exactly one of the given patterns
!(pattern-list): Matches anything except one of the given patterns

The pattern-list is one or more patterns separated by pipe signs (|); each pattern can itself contain these extended sub-patterns. Two simple examples:

  • rm -rf !(lost+found)
    Removes everything except lost+found
  • for x in *.@(jp?(e)g|gif|png)
    Loops through all files having extension jpg, jpeg, gif, or png
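The same sub-patterns can be tried out safely on plain strings with [[ ]], without touching any files. This is only a small sketch; the helper names is_image and is_version are made up for the example.

```shell
#!/bin/bash
shopt -s extglob   # must run before bash parses any line using the patterns

# is_image is a hypothetical helper: succeed if the name ends in
# .jpg, .jpeg, .gif or .png (the same pattern as in the loop above).
is_image() {
    [[ $1 == *.@(jp?(e)g|gif|png) ]]
}

# is_version is a hypothetical helper: succeed for dotted numbers such as
# 4, 4.3 or 4.3.3. +([0-9]) is one or more digits; *(...) repeats the
# "dot followed by digits" group zero or more times.
is_version() {
    [[ $1 == +([0-9])*(.+([0-9])) ]]
}

for name in photo.jpeg icon.png notes.txt; do
    if is_image "$name"; then
        echo "$name: image"
    else
        echo "$name: other"
    fi
done
```

Note that a dot is an ordinary character in a shell pattern, so it needs no escaping here.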

The following example is a little more complicated. It prints a list of default/GNU/Intel C/C++ compilers present in directories specified by $PATH:

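#!/bin/bash
shopt -s extglob nullglob
# Turn /a:/b:/c into /a,/b,/c so it can be used inside a brace expansion.
x="${PATH//:/,}"
# eval is needed because brace expansion happens before variable expansion.
eval "printf '%s\n' {$x}/@(?([ig])cc|[cg]++|icpc)?(-+([0-9])+(\.+([0-9])))"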

(NOTE: with nullglob set, a pattern that matches no file expands to nothing instead of being left unchanged.)
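The effect of nullglob is easy to see by counting the words a non-matching pattern expands to. A small sketch, in which count_args and the scratch directory are hypothetical names used only here:

```shell
#!/bin/bash
# count_args is a hypothetical helper that prints how many arguments it got.
count_args() { echo "$#"; }

empty=$(mktemp -d)          # scratch directory with no files in it

shopt -u nullglob
count_args "$empty"/*.c     # the pattern is kept as one literal word

shopt -s nullglob
count_args "$empty"/*.c     # the pattern expands to nothing at all

shopt -u nullglob
rm -rf "$empty"
```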

Doesn’t it look like a regular expression? On my system the output looks like this:

/usr/bin/c++
/usr/bin/c++-4.2.4
/usr/bin/c++-4.3.3
/usr/bin/cc
/usr/bin/g++
/usr/bin/g++-4.2.4
/usr/bin/g++-4.3.3
/usr/bin/gcc
/usr/bin/gcc-4.2.4
/usr/bin/gcc-4.3.3
/usr/x86_64-pc-linux-gnu/gcc-bin/4.2.4/c++
/usr/x86_64-pc-linux-gnu/gcc-bin/4.2.4/g++
/usr/x86_64-pc-linux-gnu/gcc-bin/4.2.4/gcc
/opt/intel/cce/10.1.018/bin/icc
/opt/intel/cce/10.1.018/bin/icpc

Unfortunately, the following code does not work:

x="${PATH//:/|}"
printf '%s\n' @($x)/@(?([ig])cc|[cg]++|icpc)?(-+([0-9])+(\.+([0-9])))

This is not surprising, however: during filename expansion, no sub-pattern is allowed to match a string containing a forward slash (the path delimiter)[1]. (For the same reason, a single asterisk won’t match a file in a subdirectory, which is usually what we want. Bash 4 introduced **, which matches slashes as well.)

[1] In a case statement or a [[ ]] conditional (using the == operator), sub-patterns do match slashes.
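The footnote can be illustrated with a short sketch. Here the pattern is matched against a plain string and not against directory entries, so alternatives containing slashes are fine. The directory names and the helpers is_known_gcc and classify are assumptions made for the example.

```shell
#!/bin/bash
shopt -s extglob

# is_known_gcc is a hypothetical helper: succeed if the full path names
# gcc in one of two fixed directories. Both alternatives inside @(...)
# contain slashes.
is_known_gcc() {
    [[ $1 == @(/usr/bin|/usr/local/bin)/gcc ]]
}

# classify is the same match written as a case statement.
classify() {
    case $1 in
        @(/usr/bin|/usr/local/bin)/gcc) echo known ;;
        *) echo unknown ;;
    esac
}

classify /usr/bin/gcc     # prints "known"
classify /opt/bin/gcc     # prints "unknown"
```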

BASH’s ‘read’ built-in supports ‘\0’ as delimiter

I thought it was impossible to use ‘\0’ as a delimiter in bash, but noticed yesterday that Gentoo’s ebuild.sh had pipelines like this:

find ….. -print0 |
while IFS= read -r -d $'\0' x; do
    # Do something with file "$x"
done

This makes it possible to handle any strange filename correctly, even if it contains newline ('\n') or carriage return ('\r') characters. (Some other commands, including sort and xargs, have options that make the null character the delimiter for the same reason.)

Because BASH internally uses C-style strings, in which '\0' is the terminator, read -d $'\0' is essentially equivalent to read -d ''. This is why I believed read could not handle null-delimited input. However, it turns out that BASH actually handles this correctly.

I checked BASH’s source code and found the delimiter was simply determined by delim = *list_optarg; (bash-3.2/builtins/read.def, line 296) where list_optarg points to the argument following -d. With either spelling that argument is an empty C string, so its first byte is the terminating null character. Therefore, it makes no difference to the value of delim whether $'\0' or '' is used.
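A quick way to see this in action is to feed read some null-delimited records by hand. This is only a sketch; read_null_delimited is a hypothetical helper name, and printf stands in for find -print0.

```shell
#!/bin/bash
# read_null_delimited is a hypothetical helper: read NUL-delimited records
# from standard input and print each one in shell-quoted form (%q), so
# that embedded newlines stay visible on a single output line.
read_null_delimited() {
    local record
    # IFS= keeps leading and trailing whitespace; -r keeps backslashes
    # literal; -d '' makes read stop at the first NUL byte.
    while IFS= read -r -d '' record; do
        printf '%q\n' "$record"
    done
}

# Three records: the second contains a newline, the third leading spaces.
printf 'plain\0two\nlines\0  padded\0' | read_null_delimited
```

Each record comes out on its own line, including the one with the embedded newline, which shows that read stopped only at the null bytes.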

Jumbled Characters after Catting a Binary File

When this happens, simply press Ctrl-V Ctrl-O Ctrl-M. Alternatively, type “reset” and press Return (Enter).

A terminal interprets the byte 0x0e as “activate the G1 character set”, and 0x0f as “activate the G0 character set”. The characters we normally read are in the G0 set. So, if there is no 0x0f byte after the last 0x0e in a binary file, everything will be shown in the unreadable G1 set, including the next shell prompt.

How does Ctrl-V Ctrl-O Ctrl-M work?
Ctrl-V is an ‘escape character’: the next keystroke will always be interpreted as a literal character. Ctrl-O is 0x0f, and Ctrl-M is carriage return. So the shell gets the command “\x0f” and outputs the error message “bash: \x0f: command not found”. The byte 0x0f in this message switches the active character set back to the readable G0.
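The same byte can also be sent directly, without provoking an error message. A minimal sketch, assuming a terminal that implements G0/G1 switching; restore_g0 is a hypothetical helper name.

```shell
#!/bin/bash
# restore_g0 is a hypothetical helper: write the single byte 0x0f
# (octal 017) to standard output, which selects the G0 character set
# on terminals that implement G0/G1 switching.
restore_g0() {
    printf '\017'
}

# Only emit the byte when standard output really is a terminal.
if [ -t 1 ]; then
    restore_g0
fi
```

If the prompt is already unreadable, typing the function name blind works just as well as the key sequence, since the terminal still sends normal characters to the shell.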

The G1 character set is rarely used these days. Konsole chooses not to implement it at all, so this problem never occurs in Konsole.
