Glimpse is an indexing and query system that lets you search huge
amounts of text (for example, all of your files) very quickly.
For example, if you're looking for the word something, just type:
glimpse something
Before you use glimpse, you need to index your files by running glimpseindex. You'll probably want to run it every night from a scheduler such as cron. Even so, your searches will miss files that have been added since the last glimpseindex run. Other than that problem (which can't be avoided in an indexed system like this), glimpse is fantastic, especially because it's (usually) so fast.
The speed depends on the size of the index file you build: a bigger index makes the searches faster. But even with the smallest index file, I can search my entire 70-megabyte email archive, on a fairly slow workstation, in less than 30 seconds. With faster CPUs and disks, the search could be much quicker. One weakness is search patterns that could match many files, which can take a long time: glimpse will print a warning and ask if you want to continue the search. (After glimpse checks its index for possible matches, it runs agrep on the possibly matching files to find the records that actually match.)
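The two-stage idea described above (consult an index first, then scan only the candidate files) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the in-memory `files` dictionary, the word-to-files index, and the function names `build_index` and `search` are all made up for the example and say nothing about glimpse's real index format. The last lines also show why a file added after the index was built is missed until the index is rebuilt.

```python
import re
from collections import defaultdict


def build_index(files):
    """Map each lowercase word to the set of file names that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index, files, word):
    """Use the index to pick candidate files, then scan only those files."""
    hits = []
    for name in sorted(index.get(word.lower(), ())):
        for line in files[name].splitlines():
            if word.lower() in line.lower():
                hits.append((name, line))
    return hits


# Hypothetical file contents, standing in for files on disk.
files = {
    "notes.txt": "buy pizza tonight\ncall the bank",
    "todo.txt": "fix the index\nrun backups",
    "mail.txt": "Pizza party on Friday",
}
index = build_index(files)
results = search(index, files, "pizza")

# A file added after indexing is invisible until the index is rebuilt.
files["new.txt"] = "pizza leftovers"
stale_results = search(index, files, "pizza")
fresh_results = search(build_index(files), files, "pizza")
```

Here `stale_results` is the same as `results`, while `fresh_results` also includes the line from `new.txt`.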
agrep is one of the nicer additions to the grep family. It's not only one of the faster greps around; it also has the unique feature that it will look for approximate matches. It's also record-oriented rather than line-oriented. Glimpse calls agrep, but you can also use agrep without using glimpse. The three most significant features of agrep that are not supported by the grep family are:
The ability to search for approximate patterns, with a user-definable level of accuracy. For example,
agrep -2 homogenos foo
will find "homogeneous" as well as any other word that can be obtained from "homogenos" with at most 2 substitutions, insertions, or deletions.
agrep -B homogenos foo
will search for the best match and generate a message of the form:
best match has 2 errors, there are 5 matches, output them? (y/n)
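To make "at most 2 substitutions, insertions, or deletions" concrete, here is a small Python sketch of the standard Levenshtein edit distance, plus a best-match helper in the spirit of the -B example. It is not agrep's algorithm: it compares whole words from a list rather than searching inside records, and the function names and word list are invented for the illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest substitutions, insertions, or deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = cur
    return prev[-1]


def approx_matches(pattern, words, max_errors):
    """Keep the words within max_errors edits of the pattern."""
    return [w for w in words if edit_distance(pattern, w) <= max_errors]


def best_matches(pattern, words):
    """Return the smallest error count and the words at that distance."""
    scored = [(edit_distance(pattern, w), w) for w in words]
    best = min(d for d, _ in scored)
    return best, [w for d, w in scored if d == best]


# Hypothetical word list.
words = ["homogeneous", "homogenous", "heterogeneous", "hydrogen"]
within_two = approx_matches("homogenos", words, 2)
best = best_matches("homogenos", ["homogeneous", "hydrogen"])
```

With this word list, `within_two` holds "homogeneous" (two insertions) and "homogenous" (one insertion), and `best` reports 2 errors for "homogeneous".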
agrep is record-oriented rather than just line-oriented; a record is by default a line, but it can be user-defined with the -d option specifying a pattern that will be used as a record delimiter. For example,
agrep -d '^From ' 'pizza' mbox
outputs all mail messages (delimited by a line beginning with From and a space) in the file mbox that contain the keyword pizza. Another example:
agrep -d '$$' pattern foo
will output all paragraphs (separated by an empty line) in the file foo that contain pattern.
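The record-oriented idea can be sketched in Python: split the text into records at each line that matches a delimiter pattern, then report whole records that contain the keyword. This is a rough illustration, not agrep's implementation; `split_records`, `search_records`, and the sample text are assumptions made for the example.

```python
import re


def split_records(text, delimiter):
    """Split text into records, each starting where the delimiter matches."""
    starts = [m.start() for m in re.finditer(delimiter, text, flags=re.MULTILINE)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first delimiter
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends) if text[s:e].strip()]


def search_records(text, delimiter, keyword):
    """Return the whole records that contain the keyword."""
    return [r for r in split_records(text, delimiter) if keyword in r]


# Hypothetical two-message mailbox.
mbox = (
    "From alice Mon Jan  1\nSubject: lunch\n\nLet's get pizza.\n"
    "From bob Tue Jan  2\nSubject: report\n\nThe report is done.\n"
)
hits = search_records(mbox, r"^From ", "pizza")

# Paragraph records: split on blank lines instead.
notes = "pizza dough recipe\nneeds yeast\n\nshopping list\nmilk"
paragraph_hits = [p for p in re.split(r"\n\s*\n", notes) if "pizza" in p]
```

The match is reported as a complete message or paragraph, not just the line that contains the keyword.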
agrep allows multiple patterns with AND (or OR) logic queries. For example,
agrep -d '^From ' 'burger,pizza' mbox
outputs all mail messages containing at least one of the two keywords (, stands for OR).
agrep -d '^From ' 'good;pizza' mbox
outputs all mail messages containing both keywords (; stands for AND).
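The OR and AND queries above amount to a simple test applied to each record. A minimal Python sketch, assuming plain case-sensitive substring matching and a query that uses only one kind of separator (the function name `matches_query` and the sample messages are made up for the example):

```python
def matches_query(record, query):
    """Treat 'a,b' as OR and 'a;b' as AND; mixing the two is not handled."""
    if ";" in query:
        return all(term in record for term in query.split(";"))
    return any(term in record for term in query.split(","))


# Hypothetical records, one per mail message.
messages = [
    "From alice\nthe pizza was good.",
    "From bob\nburger night on Friday.",
    "From carol\ngood morning, no food news.",
]
either = [m for m in messages if matches_query(m, "burger,pizza")]
both = [m for m in messages if matches_query(m, "good;pizza")]
```

Here `either` contains the messages from alice and bob, while `both` contains only the message from alice.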
Putting these options together one can write queries like:
agrep -d '$$' -2 '<CACM>;
which outputs all paragraphs referencing articles in CACM between 1985 and 1989 by TheAuthor dealing with Curriculum. Two errors are allowed, but they cannot be in either CACM or the year. (The <> brackets forbid errors in the pattern between them.)
Other agrep features include searching for regular expressions (with or without errors), unlimited wildcards, limiting the errors to only insertions or only substitutions or any combination, allowing each deletion, for example, to be counted as, say, 2 substitutions or 3 insertions, restricting parts of the query to be exact and parts to be approximate, and many more.
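The idea of counting one kind of error as more expensive than another can be illustrated with a weighted edit distance. The sketch below is a generic dynamic-programming version in Python, not agrep's code; the function name and the `sub`, `ins`, and `dele` cost parameters are assumptions for the example. Giving an operation a very large cost effectively forbids it, which is one way to think about limiting errors to only substitutions.

```python
def weighted_distance(a, b, sub=1, ins=1, dele=1):
    """Cheapest way to turn a into b, given a cost for each kind of edit."""
    prev = [j * ins for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [i * dele]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + dele,                          # delete ca
                cur[j - 1] + ins,                        # insert cb
                prev[j - 1] + (0 if ca == cb else sub),  # substitute or match
            ))
        prev = cur
    return prev[-1]


plain = weighted_distance("pizza", "piza")             # one deletion
costly = weighted_distance("pizza", "piza", dele=2)    # the same deletion, counted double
subs_only = weighted_distance("pizza", "pasta", ins=99, dele=99)
```

With an error limit of 1, "piza" would match under the default costs but not when each deletion counts as 2.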
Email firstname.lastname@example.org to be added to the glimpse mailing list. Email email@example.com to report bugs, ask questions, discuss tricks for using glimpse, etc. (This is a moderated mailing list with very little traffic, mostly announcements.)