On a typewriter-like device (including a CRT), an em-dash
is typed as a pair of hyphens (--).
In typesetting, it is
printed as a single, long dash.
troff provides a
special character name for the em-dash, but it is
inconvenient to type
\(em, and the escape
sequence is also inappropriate for use with nroff.
Typists often use three hyphens (---) for an em-dash,
and two (--) for the shorter en-dash.
Similarly, a typesetter provides "curly" quotation marks (“ and ”)
as opposed to a typewriter's straight quotes (").
In standard troff, you can substitute two backquote characters
(``) for open quote and two frontquote characters
('') for closed quote; these characters would appear
as “ and ”. But it would be much better
if we could just continue to type in
" and have the computer
do the dirty work.
A peculiarity of
troff is that it generates the space before each word in the
font used at the beginning of that word. This means that when we
mix a constant-width font such as Courier within text, we get a
noticeably large space before each word, which can be distracting
for readers. For example:
    ... text is in Courier; note the ...
The fix for this is to force
troff to generate the space in the previous font by inserting
a no-space character (
\&) before each constant-width
font change. As you can imagine, this can turn into a large
amount of extra typing.
The solution for each of these problems is to preprocess troff input with sed. This is an application that shows sed in its role as a true stream editor, making edits in a pipeline that are never written back into a file.
We almost never invoke troff directly. Instead, we invoke it with a script that strings together a pipeline including the standard preprocessors (when appropriate) as well as doing this special preprocessing with sed.
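A wrapper of the kind described might look like the sketch below. Everything specific in it is an assumption made for illustration: the function name format, the variable names CLEANUP, TBL, EQN and TROFF, the choice of tbl and eqn as the preprocessors, and the -ms macro package. Only the overall shape, sed first and then the rest of the pipeline, comes from the text.

```shell
#!/bin/sh
# Hypothetical wrapper: run the sed cleanup first, then the usual
# preprocessors, then the formatter. Each stage can be overridden
# through a shell variable (all of these names are assumptions).
format() {
    sed -f "${CLEANUP:-cleanup.sed}" "$@" |
        ${TBL:-tbl} |
        ${EQN:-eqn} |
        ${TROFF:-troff -ms}
}
```

Because sed only sits in the pipeline, the source files themselves are never modified; the cleaned-up text exists only on its way to the formatter.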
The sed commands themselves are fairly simple.
The following command changes two consecutive dashes into an em-dash:
    s/--/\\(em/g
We double the backslash in the replacement string (\\(em rather than \(em), since the backslash has a special meaning to sed.
However, there may be cases in which we don't want this substitution command to be applied. What if someone is using hyphens to draw a horizontal line? We can refine the script to exclude lines containing three or more consecutive hyphens. To do this, we use the ! address modifier:
    /---/!s/--/\\(em/g
It may take a moment to penetrate this syntax. What's different is that we use a pattern address to restrict the lines that are affected by the substitute command, and we use ! to reverse the sense of the pattern match. It says, simply, "If you find a line containing three consecutive hyphens, don't apply the edit." On all other lines, the substitute command will be applied.
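To see the effect of the negated address on actual input, the substitution can be run from the shell. This is a minimal sketch: the function name emdash and the two sample lines are invented for illustration.

```shell
# Convert "--" to troff's \(em, but leave alone any line that
# contains "---" (for example, a rule drawn with hyphens).
emdash() {
    sed '/---/!s/--/\\(em/g'
}

# Sample input: one line of prose, one line used as a rule.
printf '%s\n' 'Wait--what?' '----------' | emdash
```

The first line comes out as Wait\(emwhat? and the rule is passed through unchanged. Note that the address works on whole lines, so a pair of hyphens that happens to share a line with a longer run of hyphens is also left unconverted.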
Similarly, to deal with the font change problem, we can use sed
to search for all strings matching
\f(CB, and insert
\& before them. This can be
written as follows:
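One way to write that substitution is sketched here; it is not necessarily the original listing. The function name fontfix and the sample line are assumptions. In the replacement, \\ produces a backslash, \& produces a literal ampersand, and the bare & stands for the matched text.

```shell
# Insert troff's \& in front of every \f(CB font change so that
# the preceding space is generated in the previous font.
fontfix() {
    sed 's/\\f(CB/\\\&&/g'
}

# Sample input with one constant-width font change.
printf '%s\n' 'type \f(CBls\fP here' | fontfix
```

For the sample line this prints type \&\f(CBls\fP here. The same pattern could be widened to other font names with a bracket expression, at the cost of a slightly less readable command.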
To deal with the open and closed quote problem, the script needs to be more involved because there are many separate cases that must be accounted for. You need to make sed smart enough to change double quotes to open quotes only at the beginning of words and to change them to closed quotes only at the end of words. Such a script might look like the one below, which obviously could be shortened by judicious application of regular expression syntax, but it is shown in its long form for effect.
    s/^"/``/
    s/"$/''/
    s/"? /''? /g
    s/"?$/''?/
    s/ "/ ``/g
    s/" /'' /g
    s/[TAB]"/[TAB]``/g
    s/"[TAB]/''[TAB]/g
    s/")/'')/g
    s/"]/'']/g
    s/("/(``/g
    s/\["/\[``/g
    s/";/'';/g
    s/":/'':/g
    s/,"/,''/g
    s/",/'',/g
    s/\."/.\\\&''/g
    s/"\./''.\\\&/g
    s/"\\(em/''\\(em/g
    s/\\(em"/\\(em``/g
The preceding code shows the kind of contortions you need to go through to capture all the possible situations in which quotation marks appear. The solution to the other problems mentioned earlier in the article is left for your imagination. If you prefer, a more complete "typesetting preprocessor" script written in sed (cleanup.sed), and suitable for integration into a troff environment (perhaps with a bit of tweaking), can be found on the disc.
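As an illustration of how the long form could be shortened, the sketch below folds several of the one-character cases into bracket expressions. It is a partial condensation, not an equivalent of the full list: the question mark, period and \(em rules are omitted, and [[:space:]] matches any whitespace rather than only space and tab. The names QUOTES_SED and quotes are hypothetical.

```shell
# Write a condensed subset of the quote rules to a temporary file.
QUOTES_SED=$(mktemp)
cat > "$QUOTES_SED" <<'EOF'
# Quote at the start or end of a line.
s/^"/``/
s/"$/''/
# Open quote after whitespace, "(" or "[".
s/\([[:space:]([]\)"/\1``/g
# Close quote before whitespace, ")", "]", ";", ":" or ",".
s/"\([][:space:]);:,]\)/''\1/g
# Close quote after a comma.
s/,"/,''/g
EOF

# Apply the rules to the named files, or to standard input.
quotes() {
    sed -f "$QUOTES_SED" "$@"
}
```

Given the input line "Hello," she said ("quietly"); then "bye", this version produces ``Hello,'' she said (``quietly''); then ``bye''. The order of the rules still matters, just as in the long form: the opening-quote rule runs before the closing-quote rule.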