The UNIX formatter nroff produces output for line printers and CRT displays. To achieve such special effects as emboldening, it outputs the character followed by a backspace and then outputs the same character again. A sample of it viewed with a text editor might look like:

    N^HN^HN^HNA^HA^HA^HAM^HM^HM^HME^HE^HE^HE

which emboldens the word "NAME." There are three overstrikes for each character output. Similarly, underlining is achieved by outputting an underscore, a backspace, and then the character to be underlined.

Some pagers take advantage of overstruck text. But there are many times when it's necessary to strip these special effects; for example, if you want to grep through formatted man pages (as we do in article 50.3).

There are a number of ways to get rid of these decorations. The easiest way to do it is to use a utility like col, colcrt, or ul:
col -b <
The -b option tells col to strip all backspaces (and the character preceding the backspace) from the file. col doesn't read from files; you need to redirect input from a pipe or, as above, with the shell's < file-redirection character. col is available on System V and BSD UNIX. Under System V, add the -x option to avoid changing spaces to TABs.
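To experiment without a real nroff file, the overstrike sequences described above can be built by hand. The following is only a sketch: embolden and underline are hypothetical helper names, not standard commands, and the col step is skipped if col is not installed.

```sh
#!/bin/sh
# A literal backspace (CTRL-h), built with printf so the script stays readable.
bs=$(printf '\b')

# embolden WORD (hypothetical helper): emit each character followed by
# three backspace-and-repeat overstrikes, as in the "NAME" sample.
embolden() {
    word=$1
    out=
    while [ -n "$word" ]; do
        rest=${word#?}          # everything after the first character
        c=${word%"$rest"}       # the first character itself
        out="$out$c$bs$c$bs$c$bs$c"
        word=$rest
    done
    printf '%s\n' "$out"
}

# underline WORD (hypothetical helper): emit underscore, backspace,
# then the character, for each character of WORD.
underline() {
    word=$1
    out=
    while [ -n "$word" ]; do
        rest=${word#?}
        c=${word%"$rest"}
        out="${out}_${bs}${c}"
        word=$rest
    done
    printf '%s\n' "$out"
}

# cat -v makes the backspaces visible as ^H.
embolden NAME | cat -v
# prints N^HN^HN^HNA^HA^HA^HAM^HM^HM^HME^HE^HE^HE

# Strip the overstrikes again, if col is available on this system.
if command -v col >/dev/null 2>&1; then
    embolden NAME | col -bx
fi
```

Piping the helper output through cat -v is a quick way to check what a stripping command actually removed.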
With colcrt, the - (dash) option (yes, that's an option) says "ignore underlining." If you omit it, colcrt tries to save underlining by putting the underscores on a separate line. For example:
    Refer to Installing System V for information about
             ---------- ------ -
    installing optional software.
colcrt is only available under BSD; in any case, col is probably preferable.
With ul, the -t term option lets you specify a terminal type; it overrides the TERM environment variable.
I think that ul is probably the least useful of these commands;
it tries to be too intelligent, and doesn't always do what you want.
Both col and colcrt attempt to handle "half linefeeds" (used to print superscripts and subscripts) reasonably. Many printers handle half linefeeds correctly, but most terminals can't deal with them.
Here's one other solution to the problem: a simple sed script. The virtue of this solution is that you can elaborate on it, adding other features that you'd like, or integrating it into larger sed scripts. The following sed command removes the sequences for emboldening and underscoring:

    sed 's/.^H//g'

It removes any character preceding the backspace along with the backspace itself.
In the case of underlining, "." matches the underscore; for emboldening,
it matches the overstrike character.
Because it is applied repeatedly, multiple occurrences of the overstrike
character are removed, leaving a single character for each sequence.
^H is the single character CTRL-h.
If you're a vi user, enter this character by typing CTRL-v followed by CTRL-h. If you're an emacs user, type CTRL-q followed by CTRL-h.
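
If typing a literal CTRL-h into a script is inconvenient, the same substitution can be built with printf. This is a minimal sketch of that idea; strip_overstrikes is a hypothetical name chosen for illustration, and the sample input is made up.

```sh
#!/bin/sh
# A literal backspace (CTRL-h), generated instead of typed.
bs=$(printf '\b')

# strip_overstrikes (hypothetical helper): read stdin, delete every
# "any character + backspace" pair, write the result to stdout.
# The g flag repeats the substitution across the whole line, so only
# the last character of each overstrike sequence survives.
strip_overstrikes() {
    sed "s/.$bs//g"
}

# Emboldened "NAME": each letter overstruck three times.
printf 'N\bN\bN\bNA\bA\bA\bAM\bM\bM\bME\bE\bE\bE\n' | strip_overstrikes
# prints NAME

# Underlined "Refer": underscore, backspace, letter.
printf '_\bR_\be_\bf_\be_\br\n' | strip_overstrikes
# prints Refer
```

Because the helper is an ordinary filter, it can sit in a pipeline ahead of grep or be folded into a longer sed script.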