UNIX Unleashed, System Administrator's Edition
- 25 -
By Jeff Smith and James Edwards
The history of the USENET news service can be traced back to the original ARPANET. The original ARPA-Internet community used a series of mailing lists to distribute information, bulletins, and updates to community members. As this community expanded, management of these mailing lists became more and more difficult. The lists became exceptionally long, and carrying out the necessary moves, adds, and changes became more onerous.
The USENET provides a viable alternative for relaying this news. The idea is that the "news" information is posted on a central server and made available for users to retrieve whenever they want. The USENET system provides similar functionality to the old mailing list operation; the information is arranged as individual articles divided into different groups and classifications. (Such a server is also referred to as an electronic bulletin board system, or BBS.) To make client access as efficient as possible, these central stores of "news" information are distributed to a number of local servers.
USENET has developed into what is certainly the world's largest electronic BBS. It's a loose conglomeration of computers that run operating systems ranging from MS-DOS to UNIX and VM/CMS, and that exchange articles through UUCP, the Internet, and other networks. USENET is also probably the largest experiment to date with creative anarchy--there is little central authority or control--and anyone can join who runs the appropriate software and who can find a host already on the network with which to exchange news.
The lenient requirements for membership, the wide variety of computers able to run USENET software, and the tremendous growth of the Internet have combined to make USENET big. How big? No one really knows how many hosts and users participate, but the volume of news will give you some idea. Estimates in the latest "How to Become a USENET site" Frequently Asked Questions (FAQ) document suggest that 5,400 MB of news is posted per month. This works out to an average of more than 150 MB per day. Downloading this much information over a standard analog modem could take as much as 15 hours a day!
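The arithmetic behind these figures can be checked with a short sketch. The 30-day month and the 28.8 kbit/s modem speed are assumptions made for illustration (the text does not name a modem speed), and the calculation ignores protocol overhead and compression.

```python
# Back-of-the-envelope check of the news volume quoted in the text.
MONTHLY_VOLUME_MB = 5400         # figure quoted from the FAQ
DAYS_PER_MONTH = 30              # assumption: a 30-day month
MODEM_BITS_PER_SECOND = 28_800   # assumption: a 28.8 kbit/s analog modem


def daily_volume_mb(monthly_mb=MONTHLY_VOLUME_MB, days=DAYS_PER_MONTH):
    """Average volume of news per day, in megabytes."""
    return monthly_mb / days


def transfer_hours(megabytes, bits_per_second=MODEM_BITS_PER_SECOND):
    """Hours needed to move the given volume, ignoring all overhead."""
    total_bits = megabytes * 1_000_000 * 8
    return total_bits / bits_per_second / 3600


if __name__ == "__main__":
    per_day = daily_volume_mb()
    print(f"{per_day:.0f} MB per day")
    print(f"{transfer_hours(per_day):.1f} hours per day on the modem")
```

Under these assumptions the daily feed occupies the phone line for well over half of every day, which is the point of the warning above.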
This huge volume can cause problems for the system administrator, because the amount of disk space used for news may vary a lot, and quickly. You might think you've got plenty of space in your news system when you leave on Friday night, but then you get a call in the wee hours of Sunday morning telling you that the news file system is full. If you've planned poorly, it might take more important things with it--such as e-mail, system logging, or accounting (see "Isolating the News Spool" later in this chapter to avoid that problem). This chapter (and good planning) will help you avoid some (but not all) of the late-night calls.
The chapter begins with some pointers on finding additional sources of information. Some information is included on the UNIX Unleashed CD-ROM, some is available on the Internet, and some (from the technical newsgroups) you'll be able to apply only after you get your news system running.
The examples in this chapter assume you have an Internet site running the Network News Transfer Protocol (NNTP). If your networking capabilities are limited to the UNIX-to-UNIX Copy Program (UUCP), you're mostly on your own. Although some of the general information given here still applies, UUCP is a pain, and the economics of a full newsfeed make Internet access more and more attractive every day. If your site isn't on the Internet but you want to receive news, it might be time to talk to your local Internet service provider. You might find it cheaper to pay Internet access fees than 15-hour-per-day phone bills. If your site's news needs aren't too great, it might be even more economical to buy USENET access from an Internet service provider. (See the section "Do You Really Want to Be a USENET Site?" later in this chapter.)
Additional Sources of Information
News software is inherently complex. This chapter can only begin to give you the information you need to successfully maintain a USENET site. The following sources of additional information will help you fill in the gaps.
Frequently Asked Questions (FAQ) Documents
In many USENET newsgroups, especially the technical ones, similar questions are repeated as new participants join the group. To avoid answering the same questions over and over, volunteers collect these prototypical questions (and the answers) into FAQs. The FAQs are posted periodically to that newsgroup and to the newsgroup news.answers. Many FAQs are also available through the Internet file transfer protocol (ftp), through e-mail servers, or through other information services such as Gopher, Wide Area Information Service (WAIS) and, of course, the World Wide Web (WWW).
You should read the FAQs in the following list after you've read this chapter and before you install your news system. All of them are available on the host rtfm.mit.edu in subdirectories of the directory pub/usenet/news.answers/index and referenced through the site URL--http://www.rtfm.mit.edu.
Another excellent source is the USENET Information Center. This provides a hypertext-based index that covers the vast majority of available newsgroups, providing a FAQ for each one. The USENET Information Center can be found at the following URL--http://sunsite.uuc.edu/usenet-i/.
News Transport Software Documentation
There are a number of available news transport systems; some of the most commonly recommended include C-news, InterNetNews (INN), and Netscape's News Server. All these packages come with extensive documentation to help you install and maintain them. Whichever you choose, read the documentation and then read it again. This chapter is no substitute for the software author's documentation, which is updated to match each release of the software and which contains details that a chapter of this size can't cover.
Request for Comments (RFC) Documents
RFCs are issued by working groups of the Internet Engineering Task Force (IETF). They were known initially as requests for comments, but as they become adopted as Internet standards, you should think of them as requirements for compliance--if you want to exchange news with another Internet NNTP site, you must comply with the provisions of both RFC 977 and RFC 1036. RFCs are available by anonymous ftp from ftp://ftp.internic.net and other hosts. The RFCs mentioned here are also included on the UNIX Unleashed CD-ROM.
Once you get your news system running, there are several technical and policy newsgroups you'll want to read. These newsgroups will keep you abreast of new releases of your news transport software, bug fixes, and security problems. You'll also see postings of common problems experienced at other sites, so if you encounter the same problems, you'll have the solutions. Many knowledgeable people contribute to these newsgroups, including the authors of C-news and INN.
Remember that the people answering your questions are volunteers, doing so in their spare time, so be polite. The first step toward politeness is to read the newsgroup's FAQ (if there is one) and so avoid being the 1,001st lucky person to ask how to make a round wheel. You should also read the "Emily Postnews" guide to USENET etiquette and other introductory articles in the newsgroup news.announce.newusers. Listed below are a few of the newsgroups you may want to read. You may want to subscribe to all of the news.* groups for a few weeks and then cancel the subscriptions for the ones you don't need.
News Systems and Software
This section focuses on the major components of the USENET. Clearly these can be divided into two very broad groups: those relating to the content, including such things as articles, data stores, and file formats; and those relating to the transport of articles between news servers and between news clients and servers.
A news article is like an e-mail message--it has a message body, which is accompanied by one or more headers that provide supplemental information relating to the message. A standard format for both message body and headers has been outlined in RFC 1036.
This RFC indicates that the message body follows a set of required headers that must accompany each posted news article. In addition, any message may optionally include one or more additional headers; however, these optional headers might be ignored by the receiving news server or client. Table 25.1 provides a useful summary of these header values. The table outlines which headers are optional and which are mandatory.
Table 25.1. USENET news article message format.
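As a rough illustration of the header-plus-body layout, the following sketch splits an article at the first blank line and reports which required headers are absent. The list of required headers is the one given in RFC 1036; the function names and the sample article are hypothetical and are not taken from any news package.

```python
# Required headers according to RFC 1036.
REQUIRED_HEADERS = ("From", "Date", "Newsgroups", "Subject", "Message-ID", "Path")


def split_article(raw):
    """Split raw article text into (headers dict, body string)."""
    head, _, body = raw.partition("\n\n")
    headers = {}
    last = None
    for line in head.splitlines():
        # A line starting with white space continues the previous header.
        if line[:1] in (" ", "\t") and last:
            headers[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line; skip it in this sketch
        last = name.strip()
        headers[last] = value.strip()
    return headers, body


def missing_required(headers):
    """Return the required header names that the article lacks."""
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]


# Hypothetical sample article, deliberately lacking a Message-ID header.
SAMPLE = (
    "Path: host-b.shark.com!not-for-mail\n"
    "From: user@host-b.shark.com\n"
    "Newsgroups: comp.protocols.tcp-ip\n"
    "Subject: HTTP Request Formats\n"
    "Date: 8 March 1997 20:21:32 EST\n"
    "\n"
    "message body appears here\n"
)

if __name__ == "__main__":
    headers, body = split_article(SAMPLE)
    print("missing:", missing_required(headers))
```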
Articles are posted to one or more newsgroups. Newsgroup names are made up of components separated by periods, which categorize the groups into hierarchies. For instance, the newsgroups comp.unix.solaris and comp.risks are both in the comp hierarchy, which contains articles having to do with computers. The comp.unix.solaris newsgroup is further categorized by inclusion in the unix subhierarchy, which has to do with various vendors' versions of UNIX.
Some of the current USENET newsgroup hierarchies are shown in the following list. There are others--this is by no means a definitive list. Some Internet mailing lists are fed into newsgroups in their own hierarchies. For instance, the GNU (GNU is a self-referential acronym for "GNU's Not UNIX") project's mailing lists are fed to the gnu newsgroup hierarchy.
Where News Articles Live
News articles arranged within the USENET hierarchy are commonly stored in a separate file system in a site's news server. This file system is often named /var/spool/news or /usr/spool/news. As articles are received, each is stored in its own file named with a serial number, in a directory whose path is formed by replacing the periods in the newsgroup name with the slash character (/). For instance, article number 1047 of the newsgroup comp.unix.solaris would be stored in the file /var/spool/news/comp/unix/solaris/1047.
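The mapping from newsgroup name and article number to a file name is mechanical, as this minimal sketch shows. The function name is made up for the example, and the default spool directory is simply one of the two common locations mentioned above.

```python
import posixpath


def article_path(newsgroup, number, spool="/var/spool/news"):
    """Return the spool file for an article: periods become directory levels."""
    return posixpath.join(spool, *newsgroup.split("."), str(number))


if __name__ == "__main__":
    print(article_path("comp.unix.solaris", 1047))
```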
The News Overview Database (NOV)
Newsreaders (and users) have a difficult job. Remember that more than 150 MB of news is posted to USENET every day, every day of the year, without any holidays. Most people want to have their favorite newsreader sift the wheat from the chaff and present them with only the articles they want to see, in some rational order.
To do this, newsreaders must keep a database of information about the articles in the news spool; for instance, an index of subject headers and article cross-references. These are commonly known as threads databases. The authors of newsreaders have independently developed different threads databases for their newsreaders, and naturally, they're all incompatible with each other. For instance, if you install trn, nn, and tin, you must install each of their threads database maintenance programs and databases, which can take a lot of CPU cycles to generate and may become quite large.
Geoff Collyer, one of the authors of C-news, saw that this was not good and created the News Overview Database (NOV), a standard database of information for fancy newsreaders. The main advantage of NOV is that just one database must be created and maintained for all newsreaders. The main disadvantage is that it hasn't yet caught on with all the authors of news software.
If you're interested in NOV support, you must install news transport software that has the NOV NNTP extensions (INN does) and newsreaders that can take advantage of it. According to the NOV FAQ, trn3.3 and tin-1.21 have built-in NOV support, and there is an unofficial version (not supported by the author) of nn for anonymous ftp on the host agate.berkeley.edu in the directory ~ftp/pub/usenet/NN-6.4P18+xover.tar.Z.
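To give a feel for what a shared overview database holds, here is a sketch that parses one record. It assumes the commonly described overview layout of one line per article with tab-separated fields (article number, subject, author, date, message identifier, references, byte count, line count); check the NOV documentation for the authoritative format. The sample record and its message identifier are invented.

```python
# Assumed field order of one overview record (see the lead-in above).
OVERVIEW_FIELDS = (
    "number", "subject", "from", "date",
    "message_id", "references", "bytes", "lines",
)


def parse_overview_line(line):
    """Turn one well-formed, tab-separated overview record into a dict."""
    parts = line.rstrip("\n").split("\t")
    record = dict(zip(OVERVIEW_FIELDS, parts))
    for key in ("number", "bytes", "lines"):
        record[key] = int(record[key])
    return record


# Hypothetical record; the empty field means the article has no references.
SAMPLE_LINE = (
    "1002\tHTTP Request Formats\tuser@example.com\t"
    "8 Mar 1997 20:21:32 EST\t<hypothetical-id@example.com>\t\t1234\t27\n"
)

if __name__ == "__main__":
    print(parse_overview_line(SAMPLE_LINE))
```

Because every newsreader can read the same records, the subject and reference information needed for threading has to be generated only once.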
Distributing the News
The Network News Transfer Protocol (NNTP) is the application protocol used to distribute news articles between news servers and clients. NNTP is an application-level protocol--similar in operation and functionality to HTTP. As with HTTP, NNTP makes use of the reliable communication services provided by TCP.
The following section examines the operation of NNTP and highlights how it provides a mechanism both for distributing news throughout the USENET and for giving users access to these "news" servers. Figure 25.1 below provides an overview of the architecture of the USENET.
As Figure 25.1 indicates, the USENET network relies on the operation of NNTP servers acting as central data stores of news information. Users are granted access to this information through client programs known as newsreaders. Information is conveyed throughout the USENET through a process of server replication--known as a newsfeed. Access for both clients and servers occurs over established TCP connections via the well known port 119.
Like the HTTP application protocol, NNTP uses a system of request and response messages to exchange information with both clients and servers. These messages are formatted using standard ASCII characters. Table 25.1 provides a summary of the standard request message commands.
Table 25.1. Summary of USENET Request Message Commands.
The contacted news server responds to any message request with a response that consists of two parts: a three-digit status number and a text-based message body. The returned status number provides an indication of the success or failure of the particular request--following a similar format to that used within the ftp application. Table 25.2 provides a summary of the possible values.
Table 25.2. NNTP status line response codes.
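The first digit of the status number tells a client most of what it needs to know. The sketch below separates the status number from the text and classifies it using the first-digit categories defined in RFC 977; the function names are illustrative only.

```python
# Meaning of the first digit of an NNTP status number (RFC 977).
STATUS_CLASSES = {
    "1": "informative message",
    "2": "command ok",
    "3": "command ok so far, send the rest of it",
    "4": "command was correct, but could not be performed",
    "5": "command unimplemented, incorrect, or a serious program error occurred",
}


def parse_status_line(line):
    """Split a status line into (numeric code, explanatory text)."""
    code, _, text = line.strip().partition(" ")
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"not an NNTP status line: {line!r}")
    return int(code), text


def status_class(code):
    """Describe the general category of a three-digit status number."""
    return STATUS_CLASSES.get(str(code)[0], "unknown")


if __name__ == "__main__":
    code, text = parse_status_line("215 list of newsgroups follows")
    print(code, "->", status_class(code))
```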
The news server signals the end of any message or command with a line consisting of a single dot (.). If any line of text actually starts with a dot, the server adds another one to indicate that it is not the end of message marker.
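The dot convention just described can be sketched in a few lines. This is a simplified illustration that works on lists of text lines without line terminators; the function names are made up for the example.

```python
def dot_stuff(lines):
    """Prepare text lines for sending: double any leading dot, add the end marker."""
    wire = ["." + line if line.startswith(".") else line for line in lines]
    wire.append(".")  # a line holding a single dot ends the text
    return wire


def dot_unstuff(wire_lines):
    """Recover the original lines, stopping at the end-of-text marker."""
    body = []
    for line in wire_lines:
        if line == ".":
            break
        # Any other line that starts with a dot had one added by the sender.
        body.append(line[1:] if line.startswith(".") else line)
    return body


if __name__ == "__main__":
    text = ["An ordinary line", ".a line that starts with a dot"]
    print(dot_stuff(text))
```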
Listing 25.1 provides an example of the operation of NNTP between a newsreader and USENET server. In this example, the client requests to read a single news article that is contained within a particular newsgroup--notice how the server responds to the client NNTP requests with a status line and one or more lines of text.
Listing 25.1. Example operation of the NNTP application.
client attaches to selected newserver
200 usenetserver news server ready - posting ok
client requests a list of available newsgroups
LIST
215 list of newsgroups follows
alt.2600
alt.2600.aol
...
...
comp.protocols.snmp
comp.protocols.frame-relay
comp.protocols.tcp-ip
...
...
...
select a particular group
GROUP comp.protocols.tcp-ip
211 86 1001 1087 comp.protocols.tcp-ip group selected
ARTICLE 1002
220 1002 <email@example.com> Article retrieved, text follows
Path:
From:
Newsgroups: comp.protocols.tcp-ip
Subject: HTTP Request Formats
Date: 8 March 1997 20:21:32 EST
Organization: Deloitte Touche Consulting Group

message body appears here
.
message response is terminated with a single period
client ends session using the quit command.
QUIT
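The reply to the GROUP command in the listing packs several numbers into one line. Per RFC 977, the fields after the 211 status are an estimated article count, the first and last article numbers, and the group name. A newsreader might pull them apart roughly as follows; the class and function names are hypothetical.

```python
from typing import NamedTuple


class GroupStatus(NamedTuple):
    count: int   # estimated number of articles in the group
    first: int   # lowest article number
    last: int    # highest article number
    name: str    # newsgroup name


def parse_group_response(line):
    """Parse a '211 n f l s ...' reply to the GROUP command."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "211":
        raise ValueError(f"not a successful GROUP reply: {line!r}")
    return GroupStatus(int(fields[1]), int(fields[2]), int(fields[3]), fields[4])


if __name__ == "__main__":
    status = parse_group_response(
        "211 86 1001 1087 comp.protocols.tcp-ip group selected"
    )
    print(status)
```

In the listing, the client's later request for article 1002 falls inside the range 1001 to 1087 that this reply advertises.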
Sharing News Over the Network
If you have several hosts on a local area network (LAN), you'll want to share news among them to conserve disk space. As mentioned previously, if you carry all possible newsgroups, your news spool needs about a gigabyte of disk space, more or less, depending on how long you keep articles online. A year from now, who knows how much you'll need? It makes more sense to add disk capacity to a single host than to add it to all your hosts.
There are two ways to share news over a LAN. If all of your hosts run a network file system such as Sun Microsystems' NFS or Transarc's AFS (Andrew File System), you can export the news host's spool directory to them. An alternative approach is to use NNTP to transfer news from a single server host to client newsreaders and news posting programs. The only requirements for the client hosts are that they be able to open up a TCP/IP connection over the network and have client software that understands NNTP. Most common UNIX-based newsreaders and news posting programs have built-in NNTP support, and there are many NNTP clients for non-UNIX operating systems such as DOS, VMS, VM/CMS, and others.
An NNTP daemon runs continuously on the news server host, listening on a well known port, just as the Simple Mail Transfer Protocol (SMTP) server listens on a well known port for incoming e-mail connections. NNTP client programs connect to the NNTP server and issue commands for reading and posting news articles. For instance, there are commands to ask for all the articles that have arrived since a certain date and time. A client newsreader can ask for those articles and display them to the user as the NNTP server ships them over the network. Hosts with which you exchange news connect to the NNTP server's port and transfer articles to your host.
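The command for asking about articles that have arrived since a given date and time is NEWNEWS. RFC 977 gives its form as NEWNEWS newsgroups date time [GMT], with the date written as YYMMDD and the time as HHMMSS. The following sketch builds such a command; the helper name is made up, and the caller is assumed to pass a time already expressed in GMT when the GMT flag is set.

```python
from datetime import datetime


def newnews_command(newsgroups, since, gmt=True):
    """Build a NEWNEWS request for articles that arrived after 'since'."""
    stamp = since.strftime("%y%m%d %H%M%S")  # YYMMDD HHMMSS, as in RFC 977
    suffix = " GMT" if gmt else ""
    return f"NEWNEWS {newsgroups} {stamp}{suffix}"


if __name__ == "__main__":
    print(newnews_command("comp.protocols.tcp-ip", datetime(1997, 3, 8, 20, 21, 32)))
```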
NNTP servers usually have some form of built-in access control so that only authorized hosts can connect to them--after all, you don't want all the hosts on the Internet to be able to connect to your news server.
Transferring News to Other Hosts
When a posting program hands an article to the news system, it expects a copy of the article to be deposited in the local news spool (or the news spool of the local NNTP server), sent to other hosts, and eventually sent to the rest of USENET. Similarly, articles posted on other USENET hosts should eventually find their way into the local (or NNTP server's) spool directory.
Figure 25.2 illustrates a simple set of connections between hosts transferring news. The incoming and outgoing lines emphasize that news is both sent and received between each set of hosts.
USENET news is transferred by a flooding algorithm, which means that when a host receives an article, it sends it to all other hosts with which it exchanges news, and those hosts do the same. Now suppose that someone on host-b in Figure 25.2 posts a news article.
Because of the flooding algorithm, host-b sends the article to host-a, host-c, and any other hosts with which it exchanges news. Host-c gets the article and does the same, which means it gives the same article to host-a, which may try to give it back to host-b, which already has a copy of the article in its news spool. Further, since host-b gave host-a the article, it will try to give it to host-c, which already got it from host-b. It's also possible that host-a got a copy of the article from host-b before host-c offered it and will want to give it to host-c. Just to keep the news administrator's life interesting, no one can say whether any other hosts will ship the same article back to host-b or host-c. (Well-behaved hosts should avoid transferring articles back to the hosts from which they originally received them, but on USENET, it's best to plan for worst-case behavior from another site's software.) How do these hosts know when articles are duplicates and should be rejected? Obviously, they can't compare a new article with every article currently in the spool directory.
The news system software uses two different methods to avoid duplicate articles. The first is the Path header, which is a record of all the hosts through which a news article has passed. The Path header is just a list of hosts separated by punctuation marks other than periods, which are considered part of a hostname. A Path such as hst.gonzo.com,host-c.big.org!host-b.shark.com means that an article has been processed by each of the sites hst.gonzo.com, host-c.big.org and host-b.shark.com. Any of those hosts can reject the article because their names are already in the path.
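The Path check can be sketched as follows, using the example Path above. One assumption is built in: letters, digits, periods, hyphens, and underscores are treated as part of a host name, and every other character is treated as a separator. The function names are illustrative.

```python
import re

# Assumption: anything other than these characters separates two host names.
_SEPARATORS = re.compile(r"[^A-Za-z0-9._-]+")


def path_hosts(path_value):
    """Return the hosts named in a Path header value, in order."""
    return [host for host in _SEPARATORS.split(path_value) if host]


def already_seen_by(path_value, hostname):
    """True if the named host already appears in the Path."""
    return hostname.lower() in (host.lower() for host in path_hosts(path_value))


if __name__ == "__main__":
    path = "hst.gonzo.com,host-c.big.org!host-b.shark.com"
    print(path_hosts(path))
    print(already_seen_by(path, "host-c.big.org"))
```

A host that finds its own name in the list has already handled the article and can refuse it without looking any further.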
RFC 1036 says that the Path header should not be used to generate e-mail reply addresses. However, some obsolete software might try to use it for that. INN discourages this use by inserting the pseudo-host not-for-mail into the Path.
The second way in which news systems avoid duplicate articles is the message identifier header, Message-ID. Here is a sample Message-ID header:
When a news article is created, the posting program, or some other part of the news system, generates this unique header. Because no two articles have the same Message-ID header, the news system can keep track of the message identifiers of all recent articles and reject those that it has already seen. The news history file keeps this record, and news transport programs consult the history file when they're offered news articles. Because the volume of news is so large, history files get big pretty fast and are usually kept in some database format that allows quick access.
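The following toy simulation shows how a history of message identifiers stops the flooding described earlier. It assumes three hosts that all exchange news with one another, and it keeps each history as an in-memory set, whereas a real news system keeps it in a database file. The class, the function, and the message identifier are all hypothetical.

```python
from collections import deque


class Host:
    def __init__(self, name):
        self.name = name
        self.peers = []        # hosts with which this host exchanges news
        self.history = set()   # message identifiers already seen
        self.rejected = 0      # duplicate offers refused

    def offer(self, message_id):
        """Accept the article if it is new; refuse it if it is a duplicate."""
        if message_id in self.history:
            self.rejected += 1
            return False
        self.history.add(message_id)
        return True


def flood(origin, message_id):
    """Post an article at origin, flood it, and return the number of offers made."""
    origin.offer(message_id)
    queue = deque([origin])
    offers = 0
    while queue:
        sender = queue.popleft()
        for peer in sender.peers:
            offers += 1
            if peer.offer(message_id):
                queue.append(peer)  # a host passes on only what was new to it
    return offers


if __name__ == "__main__":
    a, b, c = Host("host-a"), Host("host-b"), Host("host-c")
    a.peers, b.peers, c.peers = [b, c], [a, c], [a, b]
    offers = flood(b, "<hypothetical-1@host-b.shark.com>")
    print(offers, "offers made;", a.rejected + b.rejected + c.rejected, "refused")
```

Every host ends up with exactly one copy, and the flood dies out once every offer is refused as a duplicate.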
The history mechanism is not perfect. If you configure your news system to remember the message identifiers of all articles received in the past month, your history files may become inconveniently large. On the other hand, if a news system somewhere malfunctions and injects two-month-old articles into USENET, you won't have enough of a history to reject those articles. Inevitably, no matter how long a history you keep, it won't be long enough, and you'll get a batch of old, bogus articles. Your users will complain. Such is life.
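The trade-off can be made concrete with a small sketch of a history that forgets entries after a fixed number of days. Days are plain integers here, and the class is a hypothetical stand-in for a real history file.

```python
class History:
    def __init__(self, retention_days):
        self.retention_days = retention_days
        self.seen = {}  # message identifier -> day it was first seen

    def expire(self, today):
        """Forget identifiers older than the retention period."""
        cutoff = today - self.retention_days
        self.seen = {mid: day for mid, day in self.seen.items() if day >= cutoff}

    def accept(self, message_id, today):
        """Return True if the article looks new and should be accepted."""
        self.expire(today)
        if message_id in self.seen:
            return False
        self.seen[message_id] = today
        return True


if __name__ == "__main__":
    history = History(retention_days=30)
    print(history.accept("<old-article@example.com>", today=0))   # new article
    print(history.accept("<old-article@example.com>", today=10))  # duplicate, refused
    print(history.accept("<old-article@example.com>", today=60))  # forgotten, accepted again
```

The third call is the two-month-old article from the scenario above: the identifier has already been expired, so the duplicate gets through.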
Host-to-Host News Transport Protocols
As with electronic mail, in order to transfer news from host to host, both hosts must speak the same language. Most USENET news is transferred either with UUCP (the UNIX-to-UNIX Copy Program) or NNTP. UUCP is used by hosts that connect with modems over ordinary phone lines, and NNTP is the method of choice for hosts on the Internet. As mentioned above, you should avoid UUCP if you can.
News Transport System Configuration Files
The news transport system needs a lot of information about your site. Minimally, it must know with which hosts you exchange news, at what times you do so, and what transport protocol you use for each site. It has to know which newsgroups and distributions your site should accept and which it should reject. NNTP sites must know which hosts are authorized to connect with them to read, post, and transfer news.
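To make the kind of information involved more concrete, here is a purely hypothetical configuration held in a Python dictionary, with a function that decides which neighboring hosts should be sent an article. The layout, the host names, and the pattern rules (shell-style wildcards, a leading "!" to exclude, last match wins) are assumptions for illustration; they are not the file format of C-news, INN, or any other package.

```python
from fnmatch import fnmatchcase

# Hypothetical feed table: transport protocol and newsgroup patterns per host.
FEEDS = {
    "host-a": {"transport": "nntp", "groups": ["comp.*", "news.*", "!comp.binaries.*"]},
    "host-c": {"transport": "uucp", "groups": ["*", "!alt.*"]},
}


def group_matches(newsgroup, patterns):
    """Apply patterns in order; the last one that matches decides."""
    wanted = False
    for pattern in patterns:
        negate = pattern.startswith("!")
        if fnmatchcase(newsgroup, pattern.lstrip("!")):
            wanted = not negate
    return wanted


def feeds_for(newsgroups_header):
    """Return the hosts that should receive an article posted to these groups."""
    groups = [g.strip() for g in newsgroups_header.split(",") if g.strip()]
    return sorted(
        host for host, cfg in FEEDS.items()
        if any(group_matches(g, cfg["groups"]) for g in groups)
    )


if __name__ == "__main__":
    for header in ("comp.unix.solaris", "alt.2600", "comp.binaries.misc"):
        print(header, "->", feeds_for(header))
```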
The news transport system's configuration files provide this information. The news administrator must set up these files when installing the news system and must modify them in response to changes, such as a new newsfeed. The format of news transport system control files varies, but all current systems provide detailed configuration documentation. Read it.
The User Interface--Newsreaders and Posting Programs
Newsreaders are the user interface to reading news. Because news articles are stored as ordinary files, you could use a program such as cat or more for your news reading, but most users want something more sophisticated. Many newsgroups receive more than a hundred articles a day, and most users don't have time to read them all. They want a program that helps them quickly reject the junk so they can read only articles of interest to them. A good newsreader enables users to select and reject articles based on their subject header; several provide even more sophisticated filtering capabilities. Some of the more popular newsreaders are rn (and its variant trn), nn, and tin. The GNU Emacs editor also has several packages (GNUS and Gnews) available for news reading from within Emacs. These newsreaders are available for anonymous ftp from the host ftp.uu.net and others.
Newsreaders usually have built-in news posting programs or the capability to call a posting program from within the newsreader. Most of them also let you respond to articles by e-mail.
Newsreaders are like religions and text editors--there are lots of them and no one agrees on which is best. Your users will probably want you to install them all, as well as whatever wonderful new one was posted to comp.sources.unix last week. If you don't have much time for news administration, you may want to resist or suggest the users get their own sources and install private copies. Otherwise, you can spend a lot of time maintaining newsreaders.
News posting programs enable you to post your own articles. A news posting program prepares an article template with properly formatted headers, and then calls the text editor of your choice (usually whatever is named in the EDITOR environment variable) so you can type in your article. When you exit the editor, you're usually given choices to post the article, edit it again, or quit without posting anything. If you choose to post the article, the news posting program hands it to another news system program, which injects it into the news transport system and puts a copy in the news spool directory.
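The first two steps of that sequence might look like the sketch below. It is not modeled on Pnews or any other real posting program; the function names are invented, the template carries only a few headers, and falling back to vi when EDITOR is unset is an assumption.

```python
import os


def article_template(newsgroups, subject, author):
    """Return a skeleton article: headers, a blank line, then room for the body."""
    return (
        f"Newsgroups: {newsgroups}\n"
        f"Subject: {subject}\n"
        f"From: {author}\n"
        "\n"
    )


def editor_command(filename, environ=None):
    """Return the command that would open the draft in the user's editor."""
    env = os.environ if environ is None else environ
    # Assumption: fall back to vi when EDITOR is not set.
    return [env.get("EDITOR", "vi"), filename]


if __name__ == "__main__":
    print(article_template("comp.unix.solaris", "Test posting", "user@example.com"))
    print(editor_command("draft.txt"))
```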
Newsreaders and news posting programs are usually both included in the same package of software. For instance, if you install the rn package you will also install Pnews, its news posting program.
Listing 25.1 provides an example of the operation of a text-based newsreader. Increasingly, newsreaders are also being incorporated within web browser applications, providing a graphical view of newsgroups and articles. Figure 25.3 provides a screen shot of the newsreader that is incorporated within a standard Netscape browser.
Notice that the Netscape Newsreader provides a three-way split screen. In the left window, the name of the news server is displayed along with the newsgroups available. For the highlighted newsgroup (comp.protocols.tcp-ip in the example) the right screen details the existing articles. These articles are arranged into separate threads--each thread relating to a particular conversation or related topic. The user can select a particular news article and view it in the bottom window of the newsreader screen.
The benefits of using a graphical newsreader are clearly demonstrated in Figure 25.3. Each of the separate windows is related to the execution of a particular NNTP request message. The user can use the mouse to navigate the information returned by the news server without having to remember the somewhat cryptic command requests listed in Table 25.1.
Planning a News System
You can see from the preceding discussion that there are many different strategies you can use to set up a news system. Because sites' needs vary, there is no single right way to do it. You must evaluate your site's needs and choose a strategy that fits. The questions in this section are intended to make you think about some of the issues you should consider.
Do You Really Want To Be a USENET Site?
As pointed out in the "how to join USENET" FAQ, you may not want to join at all. A newsfeed consumes significant CPU cycles, disk space, network (or modem) bandwidth, and staff time. Many Internet service providers will give your site access to USENET news over the network through NNTP client newsreaders. If your site is small this may be more economical than a newsfeed. Do yourself a favor and do the math before you jump in. You can always join USENET at a later date if you find that your site's needs require a real feed.
Shared News Versus One News Spool Per Host
A basic decision is whether you will maintain separate news spools and news systems on all of your hosts, or designate a single host to act as a news server and let other hosts access news through the network. If you have more than one host to administer, there are definite advantages to the latter approach.
If you have a single news host, your job as news administrator is much easier. Most news problems are confined to that host, and you only have to maintain the (fairly complex) news transport software on that host. Software on client hosts is limited to newsreaders and news posting software--no news transport software is necessary. If there are problems, you know where to go to solve them, and once you solve them on the news host, they are solved for all the hosts in your domain.
USENET volume helps make a single-host strategy attractive. As mentioned previously, a full newsfeed can easily require a gigabyte of disk space, and the volume of USENET news continues to grow, seemingly without bound. It's a lot easier to convince your boss to buy a bigger disk drive for a single host than for twenty. Because many users don't read news every day, the longer you can retain articles the happier they are, and you can retain articles longer on a single, dedicated news host than you can on multiple hosts.
Economics points to using a single news host both to minimize expensive staff time and to conserve disk space. The only reason you might want to store news on multiple hosts is if your network isn't up to par--if your only network connections are through UUCP, you can't use NNTP or a network file system to share news.
Isolating the News Spool
Most UNIX systems use the file system /var to contain files that grow unpredictably. For instance, /var/mail contains user mailboxes and /var/log contains system log files. Since the news spool is usually located in /var/spool/news, news articles may compete for space with potentially more important data such as e-mail. Having your e-mail system grind to a halt because someone posts his 10 MB collection of Madonna erotica will not endear you to your users or your boss.
The best way around this problem is to isolate the news spool in its own disk partition. If /var/spool/news is mounted on its own disk partition and it fills up, only the news system is affected.
The disadvantage of this approach is that it forces you to pre-allocate disk space. If you allocate too little to the news spool, you'll have to either expire articles sooner than you'd like or spend a lot of time fixing things by hand when the spool directory fills. If you allocate too much, it can't be used by other file systems, so you waste space. (However, it's better to guess too big than too little. Remember that the volume of USENET news constantly increases.)
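A rough sizing rule is daily volume multiplied by the number of days you keep articles, plus some headroom. The sketch below applies it; the 25 percent default headroom and the function name are arbitrary assumptions, and the daily volume should come from your own feed statistics.

```python
def spool_size_mb(daily_mb, retention_days, headroom=0.25):
    """Estimate the news spool partition size in megabytes."""
    return daily_mb * retention_days * (1 + headroom)


if __name__ == "__main__":
    # 150 MB per day kept for one week, with no headroom and with 25 percent.
    print(spool_size_mb(150, 7, headroom=0.0))
    print(spool_size_mb(150, 7))
```

With the daily volume quoted earlier in this chapter, a single week of retention already needs about a gigabyte.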
Depending on how flexible your UNIX is, if you guess wrong and have to resize your partitions, it may be painful. You must resize at least two adjoining disk partitions to shrink or enlarge the news spool, which means dumping all the data in the partitions, creating new ones, and restoring the data. (A safer approach is to dump all the data on the disk and verify that you can read the backup tapes before you resize the partitions.) During this operation, the news system (and probably the computer) is unavailable.
Configuring Your News Spool's File System
Before you can use a disk partition, you must create a UNIX file system on it, using newfs, a front-end to the harder-to-use mkfs program. (Some versions of UNIX use mkdev fs to create file systems. Consult your system's administration manual.) Unless you tell it otherwise, newfs uses its built-in default for the ratio of inodes (index nodes) to disk blocks. Inodes are pre-allocated, and when you run out of them, no new files can be created, even if you have disk space available in the file system. The newfs default for inodes is usually about right for most file systems but may not be for the news spool. News articles tend to be small, so you may run out of inodes in your news spool before you run out of disk space. On the other hand, since each pre-allocated inode takes some disk space, if you allocate too many you'll waste disk space.
Most likely you'll want to tell newfs to create additional inodes when you create your news spool. The hard question is how many additional inodes to allocate. If your news system is already running, you can use the df command to find out. Simply compare the percentage of inodes in use to the percentage of disk blocks in use. If they are about the same, you're doing okay. If the percentage of disk blocks in use is a lot greater than the percentage of inodes in use, you've allocated too many inodes. What is more likely is that the percentage of inodes in use is much greater than the percentage of disk blocks in use, which means you've allocated too few. The solution is to shut down your news system, dump the news spool to tape, run newfs to make a file system with more inodes, and restore the news spool from tape.
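The same comparison can be scripted. This sketch reads the block and inode counts for a mounted file system with os.statvfs (available on UNIX systems) and applies the rule of thumb above. The 10-point tolerance and the function names are arbitrary assumptions, and some file systems report no inode counts at all.

```python
import os


def usage_percentages(path):
    """Return (percent of blocks in use, percent of inodes in use) for a file system."""
    st = os.statvfs(path)
    blocks_used = st.f_blocks - st.f_bfree
    inodes_used = st.f_files - st.f_ffree
    block_pct = 100.0 * blocks_used / st.f_blocks if st.f_blocks else 0.0
    inode_pct = 100.0 * inodes_used / st.f_files if st.f_files else 0.0
    return block_pct, inode_pct


def diagnose(block_pct, inode_pct, tolerance=10.0):
    """Apply the rule of thumb: the two percentages should be about the same."""
    if inode_pct - block_pct > tolerance:
        return "too few inodes"
    if block_pct - inode_pct > tolerance:
        return "too many inodes"
    return "balanced"


if __name__ == "__main__":
    # Substitute the mount point of your news spool, e.g. /var/spool/news.
    blocks, inodes = usage_percentages("/")
    print(f"blocks {blocks:.0f}% inodes {inodes:.0f}% -> {diagnose(blocks, inodes)}")
```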
Where Will You Get Your News?
Some organizations use USENET for internal communications--for instance, a corporate BBS--and don't need or want to connect to USENET. However, if you want a USENET connection, you'll have to find one or more hosts willing to exchange news with you. Note that they are doing you a big favor--a full newsfeed consumes a lot of CPU cycles, network bandwidth, and staff time. The spirit of USENET, however, is altruistic, and you may find a host willing to supply you with a newsfeed for free. In turn, you may someday be asked to supply a feed to someone else.
Finding a host willing to give you a newsfeed is easier if you're already on USENET, but if you were, you wouldn't need one. Your Internet service provider might be able to give you contact information, and many service providers supply newsfeeds, either as part of their basic service or at additional cost. Personal contacts with other system administrators who are already connected to USENET may help, even if they can't supply you a feed themselves. The "how to join USENET" FAQ mentioned previously contains other good ideas for finding a newsfeed.
It's a good idea to try to find a newsfeed that is topologically close on your network. If your site is in Indiana, you don't want a transatlantic feed from Finland, even if you manage to find a host there willing to do it.
Your users' USENET articles reflect on your site, and new users often make mistakes. Unfortunately, the kinds of mistakes you can make on a worldwide network are the really bad ones. You should develop organizational USENET access policies and educate your users on proper USENET etiquette.
Policy questions tend toward the ethical and legal. For instance, if you carry the alt hierarchy, what will be your site's response when someone creates the newsgroup alt.child-molesting.advocacy? This is not beyond the pale of what you may expect in the alt hierarchy. Even within the traditional hierarchies, where newsgroups are voted into existence, you might find newsgroups your site won't want to carry. What will you do when you receive a letter from firstname.lastname@example.org, whining that one of your users is polluting his favorite newsgroup with "inappropriate" (in his opinion) postings? Do you want to get involved in USENET squabbles like that?
What will you do when you get 2,843 letters complaining that one of your users posted a pyramid-scheme come-on to 300 different newsgroups? Shoot him? Or maybe wish you'd done a more careful job of setting policy in the first place?
And what will you do when someone complains that the postings in alt.binaries.pictures.erotica.blondes are a form of sexual harassment and demands that the newsgroup be removed? Will you put yourself in the position of censor and drop that newsgroup, or drop the entire alt hierarchy to avoid having to judge the worth of a single newsgroup?
If you put yourself in the position of picking and choosing newsgroups, you will find that while it may be completely obvious to you that comp.risks has merit and alt.spam doesn't, your users may vehemently disagree. If you propose to locally delete alt.spam to conserve computing resources, some users will refer to their right to free speech and accuse you of censorship and fascism. (Are you sure you wanted this job?)
Most news administrators don't want to be censors or arbiters of taste. Therefore, answers to policy questions should be worked out in advance, codified as site policy, and approved by management. You need to hammer away at your boss until you get a written policy telling you what you should and should not do with respect to news administration, and you need to do this before you join USENET. Such a policy should also provide for user education and set bounds for proper user behavior.
Whatever the merits of alt.spam, USENET access is not one of the fundamental rights enumerated in the United States Constitution. It's more like a driver's license--if you're willing to follow your site's rules, you can drive, and if you're not, you can't. It's management's job to provide those rules, with guidance from you.
News system software is flexible enough to selectively purge old articles. In other words, if your site doesn't care much about the alt hierarchy but considers the comp hierarchy to be important, it can retain comp articles longer than alt articles. From the preceding discussion, you can see that this might be contentious. If Joe thinks that alt.spam is the greatest thing since indoor plumbing, he will cry foul if you expire spam articles in one day but retain comp articles for seven. You can see that article expiration is not just a technical issue but a policy issue and should be covered in the same written policies mentioned previously.
Automatic Response to newgroup/rmgroup Control Messages
Newsgroups are created and removed by special news articles called control messages. Anyone bright enough to understand RFCs 1036 and 977 can easily forge control messages to create and remove newsgroups. (That is, just about anyone.) This is a particular problem in the alt hierarchy, which for some reason attracts people with too much time on their hands, who enjoy creating newsgroups such as alt.swedish-chef.bork.bork.bork. The alt hierarchy also is used by people who don't want to go to the trouble of creating a new newsgroup through a USENET-wide vote, or who (usually correctly) guess that their hare-brained proposal wouldn't pass even the fairly easy USENET newsgroup creation process.
Another problem, somewhat less frequent, occurs when a novice news administrator posts newgroup control messages with incorrect distributions and floods the net with requests to create his local groups.
You can configure your news system software to create and delete groups automatically upon receiving control messages, or to send e-mail to the news administrator saying that the group should be created or removed. If you like living dangerously, you can enable automatic creation and deletion, but most people don't. You don't want someone to delete all your newsgroups just to see if he can, and you don't want two or three hundred created because a news system administrator made a distribution mistake. Many sites allow automatic creation but do deletions manually. More cautious sites create and delete all groups by hand, and only if they have reason to believe the control message is valid. I recommend the latter approach. The only disadvantage is that you may miss the first few articles posted to a new newsgroup if you don't stay on top of things.
The ABCs of News Transport Software
USENET began with A-news, a prototype news transport system that was killed by its own success and was supplanted by B-news. B-news sufficed for quite a while but became another victim of USENET growth and was supplanted by C-news, a much more efficient system written by Henry Spencer and Geoff Collyer of the University of Toronto. C-news was followed by INN (InterNetNews), which was originally written by Rich Salz of the Open Software Foundation, who apparently hadn't heard of the letter "D." Rich has since passed responsibility for INN over to the Internet Software Consortium (ISC), which now is the official source for all INN releases. The Consortium can be found at the URL http://www.isc.org/isc.
Depending on your site's requirements, C-news, INN, or even Netscape's News Server can make a good news transport system, but this chapter has space for only one, INN. If you install C-news and your site plans to use NNTP, you should also obtain and install the NNTP "reference implementation," which is available by anonymous ftp from the URL ftp://ftp.uu.net/~ftp/networking/news/nntp. This isn't necessary for INN, which has a slightly modified version of NNTP built in.
INN is the news transport system of choice for Internet sites that use NNTP to exchange news and provide newsreaders and news posting services. It was designed specifically for efficiency in an Internet/NNTP environment, for hosts with many newsfeeds and lots of NNTP client newsreaders. Although its installation isn't as automated as C-news, it's not all that difficult, and it's well-documented. The following sections give an overview of how to build and install INN.
Getting Your Hands on the Sources
The latest version of INN available as this book goes to press is INN 1.5.1. It is available from ftp://ftp.vix.com/pub/inn or from one of the mirror sites set up by the ISC. Refer to the ISC web site at http://www.isc.org/inn.HTML for more information. Note that a patch has been released to fix a security bug in INN 1.5.1. The patch is also available from the ISC ftp site at ftp://ftp.isc.org/isc/inn/unoff-patches.
An INN Distribution Roadmap
Most of the important directories and programs in the INN distribution are summarized in the following list. Some are covered in more detail in the sections "Configuring INN--the config.data File," "Building INN," and "Site Configuration."
Learning About INN
The first step in setting up INN is to format and read its documentation. cd into the top of the INN source tree and type the following to create a formatted copy of the INN documentation named Install.txt:
$ make Install.ms
cat Install.ms.1 Install.ms.2 >Install.ms
chmod 444 Install.ms
$ nroff -ms Install.ms > Install.txt
If the make command doesn't work for you (and if it doesn't, your make is defective and will cause you problems later), type cat Install.ms.? > Install.ms and then the preceding nroff command. These two commands create a file named Install.txt, which you can view with your preferred editor or pager. Read it. Print it. Highlight it with your favorite color of fluorescent marker. Sleep with it under your pillow. Take it into the shower. Share it with your friends. Read it again. You won't be sorry.
The Install.ms document tells you just about everything you need to know to set up a news system based on INN. The only problem with it is that many people fail to read it carefully and think that there's something missing. There isn't. If you think there is, read it again. Buy a new fluorescent marker, print off a copy of the file, and sit down with a nice glass of your favorite tea. Put it back under your pillow. Discuss it at dinner parties until your hosts ask you to leave, and ask your spouse what she or he thinks about it. You may destroy your social life, but in the process you'll discover that you missed a few crucial bits of information the first time around. (Don't feel bad, nearly everyone does.)
Configuring INN--the config.data File
Once you've absorbed the INN documentation, you're ready to configure INN's compilation environment. Like C-news, INN can run on many different versions of UNIX. The programs that build INN need information about your version of UNIX so they can build INN correctly. This configuration is one of the most difficult parts of installing INN, and you must make sure that you get it right. The Install.ms documentation is essential because it contains sample configurations for many different versions of UNIX.
The directory config holds the INN master configuration file, config.data. INN uses the C-news subst program to modify its source files before they are compiled, and config.data provides the definitions subst needs to do its job.
INN supplies a prototype version of config.data named config.dist. Config.dist is almost certainly wrong for your UNIX. You must create your own version of config.data:
$ cd config
$ cp config.dist config.data
Now edit config.data to match your site's version of UNIX. As mentioned earlier, this is one of the hardest parts of installing INN. Config.data is about 700 lines long, and there's nothing for it but to go through it line by line and make the appropriate changes. Depending on how experienced you are, you may have to set aside several hours for this task. Install.ms devotes about 18 pages to config.data, and you should refer to it as you edit.
Unless you know off the top of your head the answers to questions such as, "How does your UNIX set non-blocking I/O?", you'll need to keep your programmer's manuals handy. If you have a workstation, you can edit config.data in one window and use another to inspect your system's online documentation. Install.ms gives sample configurations for many popular versions of UNIX. If your version is listed, use its values. (That doesn't, however, relieve you of the chore of inspecting the entire file.)
Once you've edited config.data, you're ready to let subst configure the INN sources. From within the config directory, type the following:
$ make quiet
Now that INN is configured, you're ready to build the system. Install.ms gives several ways to do this, depending on how trusting you are and your general philosophy of life. If you're the kind of person who likes cars with automatic transmission, you can cd to the top of the INN source tree, type ./BUILD, and answer its questions. The BUILD shell script compiles and installs INN without much input from you.
If you prefer to shift gears yourself, from the same directory you can type the following:
$ make world
$ cat */lint | more
Carefully inspect the lint output for errors. (See the following Tip.)
You'll learn the most about INN if you compile it bit by bit with Install.ms by your side. You may think that if INN is so simple to install you should take the easy road and use BUILD. But news systems are complex, and no matter how good they are, you will inevitably have some problems to solve. When you do, you'll need all the clues you can muster, and building INN step-by-step helps you learn more about it. Someday, when the weasels are at the door, you'll be glad you did.
The step-by-step compilation procedure is fairly simple. First build the INN library:
$ cd lib
$ make libinn.a lint 2>&1 | tee errs
$ cd ..
The tee command prints the output of the make command to your terminal and saves it to the file errs. If you use an ugly shell such as csh or one of its variants, type sh or ksh before executing the preceding command, or read your shell's manual page for the correct syntax to save the standard output and standard error of a command into a file.
The make command creates a library of C language functions used by the other INN programs and a lint library to help detect possible problems with it. Since the other INN programs depend on the INN library, it's crucial that you compile it correctly. Check the output in the file errs and assure yourself that any errors detected by your C compiler or lint are innocuous. If you find errors (especially compiler warnings), they're probably due to a mistake you've made in config.data. The only solution is to correct config.data, run subst again, and recompile libinn.a.
After you've successfully built the INN library, you can build the rest of INN. cd into each of the following directories in turn: frontends, innd, nnrpd, backends, and expire. In each directory, type the following:
$ make all 2>&1 | tee errs
Check the output in the file errs. If there are compiler warnings or lint errors, do not pass go and do not collect $200. Consult your system's online documentation, edit config.data to correct the problems, rerun subst, and recompile the system beginning with libinn.a.
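The per-directory build and the inspection of errs can be sketched as two shell functions. The function names are hypothetical, and the grep filter is deliberately crude: it only counts lines that mention a warning or an error, so treat it as a prompt to read errs, not as a substitute for reading it.

```shell
# count_problems FILE
# Print the number of lines in FILE that mention "warning" or "error"
# (case-insensitive). A rough filter only; harmless lines may match too.
count_problems() {
    grep -i -c -e 'warning' -e 'error' "$1"
}

# build_inn_dirs
# Run from the top of the INN source tree after libinn.a is built.
# Builds each directory in the order given in the text and reports how
# many suspicious lines each errs file contains.
build_inn_dirs() {
    for dir in frontends innd nnrpd backends expire; do
        ( cd "$dir" && make all 2>&1 | tee errs ) || return 1
        echo "$dir: $(count_problems "$dir/errs") suspicious lines"
    done
}
```

Any directory that reports a nonzero count deserves a careful look before you move on.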
Now you're ready to install INN. Assuming that everything has gone well so far, cd to the root of the INN source tree, type su to become the superuser, and type this:
$ sh makedirs.sh 2>&1 | tee errs
$ make update 2>&1 | tee -a errs
This runs the commands to install INN and saves the output in the file errs, which you should carefully inspect for errors. Note the -a argument to tee in the second command line, which makes tee append to the file errs.
The makedirs.sh shell script creates the directories for the INN system and must be run before you type make update. The latter command installs INN in the directories created by makedirs.sh. Now you've installed the INN programs and are ready to configure your news system.
cd into the site directory and type make all 2>&1 | tee errs. This command copies files from the samples and backends directories and runs subst over them. Some of these files must be edited before you install INN. They give INN information it can't figure out on its own; for instance, with which hosts you exchange news.
The site directory also contains some utility shell scripts. You probably won't have to change these, but you should look at them to see what they do and ensure that paths to programs in them are correct.
Modifying the files in the site directory is the second most difficult part of configuring INN, especially if you haven't configured a news system before. However, INN won't work if these files aren't configured correctly, so you'll want to spend some time here. The files you must edit are shown below, each with a brief explanation of its function. There are manual pages for each of these files in the doc directory, and you'll need to read them carefully in order to understand their function and syntax.
expire.ctl controls article expiration policy. In it, you list a series of patterns to match newsgroup names and what actions expire should take for groups that match. This means that you can expire newsgroups selectively. The expire.ctl file is also where you tell expire how long you want it to remember Message-IDs. You can't keep a record of Message-IDs forever because your history file would grow without bound. Expire not only removes articles from the news spool but controls how long their Message-IDs are kept in the history file.
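As a rough illustration of such a file, the function below prints three sample lines. It assumes the layout described in the expire.ctl manual page, a /remember/ line followed by pattern:modflag:keep:default:purge entries where a later matching entry overrides an earlier one; check the manual page shipped with your release before relying on it. The retention values are invented for the example, and the function name is hypothetical.

```shell
# sample_expire_ctl
# Print an illustrative expire.ctl (values are examples only):
#   line 1: remember Message-IDs in the history file for 14 days
#   line 2: default for all groups, keep articles about 7 days
#   line 3: alt groups, keep articles about 1 day, purge after 7
sample_expire_ctl() {
    cat <<'EOF'
/remember/:14
*:A:1:7:never
alt.*:A:1:1:7
EOF
}
```

You could redirect the output to a scratch file and compare it with the expire.ctl that make all placed in the site directory.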
hosts.nntp lists the hosts that feed you news through NNTP. The main news daemon innd reads this file when it starts. If a host not listed in this file connects to innd, it assumes it's a newsreader and creates an nnrpd process to service it. If the host is in the file, innd accepts incoming news articles from it.
inn.conf contains some site configuration defaults, such as the names put in an article's Organization and From headers. For instance, your organization might want all From headers to appear as From: email@example.com, regardless of which host posted the article. Some of these defaults may be overridden by environment variables. For instance, if the user sets the ORGANIZATION environment variable, it overrides the default in inn.conf.
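The precedence between the environment and the site default can be pictured with a one-line shell function. This is a sketch of the idea only, not INN code: the function name is hypothetical, and it treats an empty ORGANIZATION as unset, which may differ from what INN does.

```shell
# effective_organization SITE_DEFAULT
# Print the Organization value that would apply: the user's ORGANIZATION
# environment variable if set and non-empty, otherwise the site default.
effective_organization() {
    site_default=$1
    echo "${ORGANIZATION:-$site_default}"
}
```

With ORGANIZATION unset, effective_organization "Example Corp" prints the site default; after ORGANIZATION="Physics Lab" it prints the user's value instead.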
Articles posted to a moderated newsgroup are first mailed to the newsgroup's moderator, who approves (or disapproves) the article. If it's approved, the moderator posts it with an Approved header containing his e-mail address. The moderators file tells INN where to mail these articles.
The newsfeeds file describes the sites to which you feed news, and how you feed them. This is something you will already have arranged with the administrator of the sites which you feed. The important thing is for both sites to agree. For instance, if you feed the alt.binaries groups to a site that doesn't want them, it discards the articles, and you both waste a lot of CPU time and network bandwidth in the process. The newsfeeds file enables you to construct specific lists of newsgroups for each site you feed. For instance, one site might not want to receive any of the alt groups, and another might want all of the alt newsgroups except for the alt.binaries newsgroups. The newsfeeds file is also where you specify INN's behavior with respect to an article's Distribution headers. There are other parameters you can set here to determine whether articles are transmitted, such as maximum message size.
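To make the idea of per-site newsgroup lists concrete, here is a small shell function that decides whether a group would be sent, given a comma-separated pattern list in which a leading ! excludes groups and the last matching pattern wins. It uses the shell's own glob matching as a stand-in for INN's pattern matching, so it is an approximation for illustration; the function name is hypothetical, and the real syntax is in the newsfeeds manual page.

```shell
# feed_wants GROUP PATTERNS
# PATTERNS is a comma-separated list such as 'alt.*,!alt.binaries.*'.
# Prints "yes" if the last pattern matching GROUP is a plain pattern,
# "no" if it is negated with ! or if nothing matches.
feed_wants() {
    group=$1
    patterns=$2
    verdict=no
    old_ifs=$IFS
    IFS=,
    set -f                      # keep patterns from expanding to file names
    for pat in $patterns; do
        case $pat in
            \!*) negate=yes; pat=${pat#\!} ;;
            *)   negate=no ;;
        esac
        case $group in
            $pat)
                if [ "$negate" = yes ]; then verdict=no; else verdict=yes; fi
                ;;
        esac
    done
    set +f
    IFS=$old_ifs
    echo "$verdict"
}
```

For the site that wants all of alt except the binaries groups, feed_wants alt.binaries.pictures 'alt.*,!alt.binaries.*' prints no, while the same pattern list with alt.folklore.computers prints yes.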
nnrp.access controls which hosts (and optionally, users) can access your NNTP server. When a newsreader connects to the NNTP port, innd hooks it up with an nnrpd process so it can read and post news. The nnrpd program reads the nnrp.access file to see whether that host is allowed to read or post. The hosts may be specified as patterns, so it's easy to allow access to all the hosts in your organization. Reading and posting can also be controlled on a per-user basis if your newsreader knows how to use the authinfo command, a common extension to NNTP.
passwd.nntp contains hostname:user:password triplets for an NNTP client (for example, a newsreader) to use in authenticating itself to an NNTP server.
Once you've edited the files in site, install them:
$ make install 2>&1 | tee errs
As usual, carefully inspect the make command's output for any problems.
System Startup Scripts and news cron Jobs
A news system doesn't run on its own. You must modify your system's boot sequence to start parts of it and create cron jobs for the news user to perform other tasks.
INN supplies the file rc.news to start the news system when your computer boots. For most SVR4 hosts, you should install it as /etc/init.d/news and make a hard link to it named /etc/rc2.d/S99news. (See the section "Modifying sendmail's Boot-time Startup" in Chapter 41 for more information on how SVR4 systems boot.)
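On an SVR4 host, the installation just described might look like the following function. It is a sketch under assumptions: the function name is hypothetical, the location of rc.news is passed in because it depends on where you built INN, and the optional second argument (a directory prefix) exists only so the steps can be tried in a scratch directory before being run as root against the real /etc.

```shell
# install_news_rc RC_NEWS_PATH [ROOT_PREFIX]
# Copy rc.news to /etc/init.d/news, make it executable, and create the
# hard link /etc/rc2.d/S99news. ROOT_PREFIX defaults to empty (real /etc).
install_news_rc() {
    src=$1
    root=${2:-}
    cp "$src" "$root/etc/init.d/news" &&
    chmod 755 "$root/etc/init.d/news" &&
    ln "$root/etc/init.d/news" "$root/etc/rc2.d/S99news"
}
```

Because the two names are hard links to one file, a later edit to /etc/init.d/news is automatically seen through /etc/rc2.d/S99news.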
The shell script news.daily should be run as a cron job from the news user's crontab. News.daily handles article expiration and calls the scanlogs shell script to process news log files. You should probably schedule this for a time when most people aren't using the news system, such as after midnight.
You'll also need to add a news user cron entry to transmit news to your USENET neighbors. INN supplies sample shell scripts that show several different ways to do this for both NNTP and UUCP neighbors. The scripts are copied into the site directory. The shell scripts nntpsend (and its control file nntpsend.ctl), send-ihave, and send-nntp are various ways to transfer news through NNTP. The scripts send-uucp and sendbatch are for sites using UUCP. Pick the one that most closely suits your site's needs, and add its invocation to the news user's crontab.
If you use sendbatch, edit it to ensure that the output of the df command on your system matches what the script expects. Unfortunately, the output of df varies a lot between vendors, and if sendbatch misinterprets it, you may have problems with your news spool filling up.
How often you should run the shell script depends on the needs of the site you're feeding. If it's an NNTP site and it wants to receive your articles as soon as they are posted, you could run one of the NNTP submission scripts every five minutes. If it's a UUCP site or an NNTP site on the end of a slow link, it might want news much less often. You must work this out with the remote site and make sure that your setup matches what it wants.
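The function below prints what the news user's crontab might contain for a site that expires articles shortly after midnight and pushes news to an NNTP neighbor every five minutes. The directory /usr/lib/news/bin is an assumption, so substitute the path where your installation put the scripts; the minutes are listed explicitly because not every cron understands step syntax. The function name is hypothetical.

```shell
# sample_news_crontab
# Print two illustrative crontab entries for the news user:
#   00:15 daily      run news.daily (expiration and log processing)
#   every 5 minutes  run nntpsend to transmit queued articles
sample_news_crontab() {
    cat <<'EOF'
15 0 * * * /usr/lib/news/bin/news.daily
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/lib/news/bin/nntpsend
EOF
}
```

Review the output and merge it by hand into the news user's existing crontab instead of installing it blindly, since loading a new crontab replaces the old one.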
Miscellaneous Final Tasks
The active file shows what newsgroups are valid on your system. If you're converting to INN from another news system, you can convert your existing active file. Otherwise, you may want to get a copy of your feed site's active file and edit it to remove newsgroups you don't want and add local groups.
You must also create a history file or convert your existing one. Appendix II of Install.ms gives information for converting an existing news installation to INN.
Even if you didn't run the BUILD shell script to build and install INN, you can save the last 71 lines of it into a file and run that file to build a minimal active file and history database. You can then add whatever lines you want to the active file.
Some vendors' versions of sed, awk, and grep are deficient and may need to be replaced with better versions before INN can function correctly. The GNU project's versions of these commands work well with INN. They are available for anonymous ftp from the host ftp://prep.ai.mit.edu in the directory ~ftp/pub/gnu.
You may also have to modify your syslog.conf file to match the logging levels used by INN. These logging levels are defined in config/config.data, and the file syslog/syslog.conf shows sample changes you may need to make to your syslog.conf.
Checking Your Installation and Problem Solving
If you have perl installed on your system, you can run the inncheck program to check your installation. You should also try posting articles, first to the local group test and then to groups with wider distributions. Make sure that articles are being transmitted to your USENET neighbors.
If you have problems, many of the INN programs are shell scripts and you can see what they're doing by typing sh -x scriptname. You might also temporarily modify a script to invoke its programs with their verbose options turned on. For instance, the nntpsend article submission shell script calls the innxmit program to do the work. If nntpsend wasn't working for you, you could edit it to turn on innxmit's verbose option (-v), run it by hand as sh -x nntpsend, and save the results to a file.
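Saving a trace to a file takes one redirection, shown here wrapped in a small function (the function and file names are hypothetical). The trace that sh -x produces goes to standard error, so both output streams are sent to the log.

```shell
# trace_script SCRIPT LOGFILE
# Run SCRIPT under "sh -x" and save its standard output, standard error,
# and the command trace together in LOGFILE.
trace_script() {
    script=$1
    log=$2
    sh -x "$script" > "$log" 2>&1
}
```

For example, trace_script nntpsend /tmp/nntpsend.trace leaves a file you can read at leisure or quote in a request for help.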
Some simple NNTP server problems can be checked with the telnet command. If you know the NNTP protocol, you can simply telnet to a host's NNTP port and type commands to the NNTP server. For instance:
$ telnet some.host.edu nntp
Trying 220.127.116.11 ...
Connected to some.host.edu.
Escape character is '^]'.
200 somehost NNTP server version 1.5.11 (10 February 1991) ready at Sun Jul 17 19:32:15 1994 (posting ok).
quit
(If your telnet command doesn't support the mnemonic name for the port, substitute 119 for nntp in the command above.) In this example, no NNTP commands were given other than quit, but at least you can see that the NNTP server on some.host.edu is willing to let you read and post news.
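The first three digits of the greeting line carry the answer. RFC 977 defines 200 as "server ready, posting allowed" and 201 as "server ready, no posting allowed"; the function below, whose name is hypothetical, simply decodes a greeting you paste in and lumps every other response together.

```shell
# nntp_greeting LINE
# Interpret the first line an NNTP server sends after you connect.
nntp_greeting() {
    case $1 in
        200*) echo "reading and posting allowed" ;;
        201*) echo "reading allowed, no posting" ;;
        *)    echo "unexpected greeting" ;;
    esac
}
```

Given the greeting from the session above, nntp_greeting "200 somehost NNTP server version 1.5.11 ready (posting ok)." reports that reading and posting are allowed.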
If your news system develops problems you can't solve on your own, news.software.b and news.software.nntp are good resources. You'll get much better advice if you do two things: First, read the INN FAQ and other INN documentation and see if the problem is listed there. Imagine your embarrassment when you ask your burning question and the collective answer is, "It's in the FAQ. Read it." Second, make sure you include enough information for people to help you. A surprising number of problem posts don't even say what version of UNIX the person uses. Your article should include the following:
If you do a good job of researching your posting, you may even figure out the problem on your own. If you don't, you'll get much better advice for having done the work to include the necessary details.
This chapter gives you a good start on becoming a news administrator, but installing the software is only the beginning of what you'll need to know to keep your news system running. Most of your additional learning will probably be in the form of on-the-job training, solving the little (and big) crises your news system creates. Your best defense against this mid-crisis style of training is to read the INN manual pages, the INN and news.software.b FAQs, and the news.software.* newsgroups. The more information you pick up before something goes wrong, the better prepared you are to handle it.