Some of my directories have text files (like shell scripts and documentation) as well as non-text files (executable binary files, compressed files, archives, etc.). If I'm trying to find a certain file, the non-text files can print garbage on my screen. I want some way to say "only look at the files that have text in them."
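So, for example, instead of typing:
egrep something *
I can have a script, findtext, pick out the text files and search only those:
egrep something `findtext *`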
Here's the script, then some explanation of how to set it up on your system:
#!/bin/sh
# PIPE OUTPUT OF file THROUGH sed TO PRINT FILENAMES FROM LINES
# WE LIKE.  NOTE: DIFFERENT VERSIONS OF file RETURN DIFFERENT
# MESSAGES.  CHECK YOUR SYSTEM WITH strings /usr/bin/file OR
# cat /etc/magic AND ADAPT THIS.
/usr/bin/file "$@" |
sed -n '
/MMDF mailbox/b print
/Interleaf ASCII document/b print
/PostScript document/b print
/Frame Maker MIF file/b print
/c program text/b print
/fortran program text/b print
/assembler program text/b print
/shell script/b print
/c-shell script/b print
/shell commands/b print
/c-shell commands/b print
/English text/b print
/ascii text/b print
/\[nt\]roff, tbl, or eqn input text/b print
/executable .* script/b print
b
:print
s/:[TAB].*//p'
The script is simple: It runs file on the command-line arguments. The output of file looks like this:
COPY2PC:        directory
Ex24348:        empty
FROM_consult.tar.Z: compressed data block compressed 16 bits
GET_THIS:       ascii text
hmo:            English text
msg:            English text
1991.ok:        [nt]roff, tbl, or eqn input text
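The output is piped to a sed script that selects the lines that describe text files, strips off everything from the colon onward, and prints the filename that is left.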
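To see the filter at work without depending on what your local file prints, here is a minimal sketch that feeds canned file output through a cut-down copy of the sed script. The function name filter_text_names is hypothetical, only three of the patterns are kept, and [[:space:]] is an assumption that stands in for the literal tab the full script matches after the colon.

```shell
#!/bin/sh
# Sketch only: filter_text_names is a hypothetical helper, not part of
# the original script. It reads file-style output on standard input.
filter_text_names() {
    # Lines matching a "text" description branch to :print, where the
    # colon and everything after it are stripped and the name is printed.
    # All other lines hit the bare "b" and are dropped (sed -n).
    # ASSUMPTION: [[:space:]] replaces the literal tab in the original.
    sed -n '
/ascii text/b print
/English text/b print
/\[nt\]roff, tbl, or eqn input text/b print
b
:print
s/:[[:space:]].*//p'
}

# Canned sample, taken from the example output shown above.
filter_text_names <<'EOF'
COPY2PC:        directory
Ex24348:        empty
FROM_consult.tar.Z: compressed data block compressed 16 bits
GET_THIS:       ascii text
hmo:            English text
msg:            English text
1991.ok:        [nt]roff, tbl, or eqn input text
EOF
```

With that sample, only GET_THIS, hmo, msg and 1991.ok should come out, one per line.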
Different versions of file produce different output. Some versions also read an /etc/magic file. To find the kinds of names your file calls text files, use commands like:
strings /usr/bin/file > possible
cat /etc/magic >> possible
The possible file will have a list of descriptions that strings found in the file binary; some of them are for text files. If your system has an /etc/magic file, it will have lines like these:
0       long    0x1010101       MMDF mailbox
0       string  <!OPS           Interleaf ASCII document
0       string  %!              PostScript document
0       string  <MIFFile        Frame Maker MIF file
Save the descriptions of text-type files from the right-hand column.
Then, turn each line of your edited possible file into a sed command. For example, the description MMDF mailbox becomes:
/MMDF mailbox/b print
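If the list of descriptions is long, the conversion can be done mechanically. The following is a rough sketch, assuming the edited file holds exactly one description per line and nothing else; the function name make_sed_patterns is hypothetical. It escapes the characters that sed would otherwise treat specially and wraps each line in slashes followed by b print. Placeholders such as %s are not handled and still need editing by hand.

```shell
#!/bin/sh
# Sketch only: make_sed_patterns is a hypothetical helper name.
# Reads one description per line on standard input and writes one
# "/description/b print" sed command per line.
make_sed_patterns() {
    # 1st expression: put a backslash in front of ] [ \ / . * ^ $
    #                 ("|" is the delimiter so "/" can sit in the set).
    # 2nd expression: wrap the whole line as /.../b print
    sed -e 's|[][\/.*^$]|\\&|g' -e 's|.*|/&/b print|'
}

# Example: two descriptions taken from the text above.
printf '%s\n' 'MMDF mailbox' '[nt]roff, tbl, or eqn input text' |
    make_sed_patterns
```

The second description should come out with its brackets escaped, matching the line used in the script.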
Watch for special characters in the file descriptions. I had to handle two special cases in the last two patterns of the script above:
First, I had to change the string
executable %s script
from our file command to
/executable .* script/b print
in the sed script. That's because our file command replaces %s with a name, so the pattern has to match whatever name appears there.
Second, characters that sed will treat as part of a regular expression, such as the brackets in [nt]roff, need to be escaped with backslashes. I used \[nt\]roff in the script.
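Both special cases are easy to check from the command line. This is a small sketch; the two sample lines are made up for illustration (the filename runme and the interpreter name /bin/sh are assumptions, not output from any particular file command).

```shell
#!/bin/sh
# Sample lines in the style of file output (hypothetical).
roff_line='1991.ok: [nt]roff, tbl, or eqn input text'
script_line='runme: executable /bin/sh script'

# Unescaped, [nt] is a bracket expression meaning "one n or one t",
# so it cannot match the literal text "[nt]roff". No output here.
printf '%s\n' "$roff_line" | sed -n '/[nt]roff, tbl/p'

# Escaped, the brackets are matched literally and the line is printed.
printf '%s\n' "$roff_line" | sed -n '/\[nt\]roff, tbl/p'

# The literal string "%s" never appears in real output. No output here.
printf '%s\n' "$script_line" | sed -n '/executable %s script/p'

# ".*" matches whatever name was substituted, so the line is printed.
printf '%s\n' "$script_line" | sed -n '/executable .* script/p'
```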
If you have Perl, you can make a simpler version of this script, since Perl has a built-in test (-T) for whether or not a file is a text file. Perl picks a "text file" by checking the first block or so for strange control codes or metacharacters (characters with the high bit set). If there are too many (more than 10% according to the Perl documentation of the time; current documentation puts the threshold at about a third), it's not a text file. You can't tune the Perl script to, for example, skip a certain kind of file by type. But the Perl version is simpler! It looks like this:
perl -le '-T && print while $_ = shift' *
If you want to put that into an alias, the C shell's quoting rules make it tough to do. Here's an alias that does the job, though:
alias findtext 'perl -le '\''-T && print while $_ = shift'\'' *'