by Mike Morgan
If you've drawn the task of designing your organization's Web site, your first question should be, "Why do we want a Web site?" Some organizations are putting up Web sites just because everyone else is doing it. The most successful organizations, however, are following a three-step process:
This approach is called OMR for the three key components: Objectives, Methods, and Resources. The key to OMR's success is to deal with these three components in the order given.
Here's an example of OMR: suppose you are designing a Web site for a residential real estate broker. As you talk with the broker and her agents, you find that they have plenty of buyers. In their market, the limiting factor in their company's growth is the number of homes available for sale. You determine that a reasonable objective is to locate people who want to sell their homes and persuade them to consider listing their homes with the broker.
Objectives and Goals Before you begin to design the site, you need to make the objective more specific. Once the site is up, you'll need to determine whether the objective has been met. You'll want to continually place fresh material on the site--choose material that continues to support the objective, so that the performance of the site gets better and better. A goal is a measurable objective. It should include a time and a level of performance. For the real estate broker's site, for example, the objective may be "locate people who want to sell their homes." One goal might be, "Within six months of being published, the site will generate four leads a month."
Most businesses define their objectives and goals in terms that translate into sales. Their objectives and goals may take one of several forms. Here are some ideas:
NOTE: The HTML and graphics for the Nikka Galleria site are on the CD-ROM that accompanies this book, in chap44/nikka/.
Nonbusiness organizations do not measure their success in sales, per se. For example, a political organization might define its goals in terms of funds raised or votes cast. A nonprofit organization might use its site to communicate with its volunteers and to encourage other people to volunteer their time.
Methods Once the site owner has identified an objective and one or more goals, it's time to design the site. The design depends upon the goals, the audience, and the industry. The best designs are developed by a team whose members understand all three aspects of site design. Here's one approach to developing a site design:
Use each of these four steps to design a different part of the Web site and its associated business processes. While you should follow these four steps in the order given, you should use them as the basis for design in reverse order:
Using the real estate broker as an example, here's how you might answer those four questions:
Use the answers to these questions to develop several different designs. Often you'll get the most variation in the second question: What information and motivation does the prospect need? It's easy to add multimedia "sizzle" to the site. You're more likely to reach your goals, however, if you develop solid content first and then use the presentation (such as multimedia) to make the content more effective.
NOTE: Many companies choose a design that is loaded with content. They then place their offer and their form deep within the site and provide "ads" at strategic places on the content pages. For example, if you are a mortgage broker, consider using your site to educate the visitors about the range of mortgages available and letting them see for themselves which product might be best for their needs. Then provide forms on which they can request additional information, or even prequalify for a mortgage.
Most Web experts agree that a content-rich site can be quite effective even if the presentation is limited to text and simple graphics, while a sensational multimedia site with little content is more likely to generate awards than leads.
For each design, use your own experience on other sites and your company and industry's experience to estimate how effective the design will be. This information may be difficult to come by, particularly if your company or industry is new to the Web. You may want to start with a simple site that's strong in content and then add dynamic HTML and multimedia a little at a time and see how the change in the presentation affects the response. For more information about measuring the effectiveness of the site, see the comments in "Measuring the Results," later in this section.
For each design, you should also develop a budget. The budget should include:
Resources Suppose you've followed the recommendations given and have developed the design alternatives shown in Table 44.1. Here the "high-speed" design might represent a high-speed connection to the Internet with lots of graphics and multimedia.
ON THE WEB: http://cnn.com/ This site, shown in Figure 44.1, represents a typical "high-speed" site. No expense was spared--the site includes dynamic content, Java applets, and, of course, QuickTime video.
CNN spared no expense in developing its Web site.
"Low-budget" represents an attractive site with text and graphics, hosted on a virtual host by an Internet service provider.
ON THE WEB: http://www.dse.com/gsh/ This Realtor's site, shown in Figure 44.2, has none of the frills of the high-speed site but has paid for itself many times over.
This site offers information for both home buyers and sellers. It paid for itself within 30 days of going online.
"Middle-of-the-road" is a compromise design--less bandwidth than the high-speed connection, and with a tighter graphics and multimedia budget.
ON THE WEB: http://www.familychannel.com/ The Family Channel's site, shown in Figure 44.3, is far less elaborate than CNN's but still offers a nice blend of graphics and information.
|Leads per Costs|
|Low-budget||10||$ 4,000||$ 500||$50|
The family graphic in the upper-right corner is continually refreshed by a Java applet.
Selecting the Best Alternative Your sales and accounting staff can help you turn this information into a cost-benefit analysis. For example, the sales staff may tell you that for every 10 qualified leads you give them they'll average $10,000 in net revenue. The accounting staff may tell you that the company computes the value of an investment by using its Internal Rate of Return (IRR). Between the sales staff and the accounting staff, you should be able to prepare a cash flow projection such as the one shown in Table 44.2.
|Month||Site Costs||Leads||Lead Costs||Revenue||Net|
|1||$(40,000)||0||$ -||$ -||$(40,000)|
|2||$ (4,000)||0||$ -||$ -||$(4,000)|
|3||$ (4,000)||10||$ (500)||$10,000||$ 5,500|
|4||$ (4,000)||15||$ (750)||$15,000||$ 10,250|
|5||$ (4,000)||20||$(1,000)||$20,000||$ 15,000|
|6 thru 24||$ (4,000)||40||$(2,000)||$40,000||$ 34,000|
The Net column in Table 44.2 represents an Internal Rate of Return of 34 percent over the first 24 months of the project.
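If you would like to check a figure like this without a spreadsheet, a short script can do the arithmetic. The following is a minimal sketch, not the method your accounting staff would necessarily use: it assumes each row of Table 44.2 is one monthly period, that the "6 thru 24" row repeats the same net amount in each of those months, and that the first month is not discounted (the convention most spreadsheet IRR functions follow). The function names are illustrative. Under those assumptions the per-period rate comes out near the 34 percent quoted above.

```python
# Net column of Table 44.2, one entry per month.
# Assumption: "6 thru 24" means $34,000 net in each of months 6 through 24.
net_cash_flows = [-40_000, -4_000, 5_500, 10_250, 15_000] + [34_000] * 19


def npv(rate, flows):
    """Net present value; the flow in period k is divided by (1 + rate) ** k."""
    return sum(flow / (1 + rate) ** k for k, flow in enumerate(flows))


def irr(flows, low=0.0, high=1.0, tolerance=1e-9):
    """Find the rate at which NPV crosses zero, by bisection.

    Assumes NPV is positive at `low` and negative at `high`, which holds
    for these flows because the costs come first and the income later.
    """
    while high - low > tolerance:
        mid = (low + high) / 2
        if npv(mid, flows) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2


rate_per_period = irr(net_cash_flows)
print(f"IRR per period: {rate_per_period:.1%}")
```

Bisection is used here only because it is easy to follow; a spreadsheet's built-in function remains the simpler choice for day-to-day work.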
TIP: Be sure to consult your accounting staff before developing your data. Each company uses different formats and standards--you'll save yourself some work if you develop the data using your company's forms.
Following the same logic, the IRR of the middle-of-the-road alternative is 52 percent, and the IRR of the low-budget alternative is 70 percent. Based on these figures alone, you would recommend the low-budget alternative.
NOTE: If you're not a financial expert, use the built-in formulas in your spreadsheet to assess the relative values of your designs. For example, in Microsoft Excel you can get an Internal Rate of Return for a block of cells by using the IRR() function. Be sure you check with your accounting staff to see that you are using the correct function.
Note, too, that many organizations have a hurdle rate--a minimum rate of return the project must offer if it is to be considered. If your company's hurdle rate is 40 percent, you could drop the high-speed alternative from consideration, since its IRR is 34 percent.
The figures in this example are illustrative only. While many organizations get an excellent rate of return from relatively simple Web sites, others easily justify the extra expense of high-speed connections, high-end servers, and multimedia Web pages. Do your own calculations and select the alternative that's right for your organization.
Most of the expense of developing and maintaining a Web site has to do with the cost of development. Once you've spent the company's money to bring a visitor to the site, give that visitor many different ways to satisfy your goals. For example, consider allowing the visitor to make any of the following decisions:
Many marketing experts call this technique "selling into all levels of the organization." You can also use this technique to convert more visitors into customers by having a wide range of price points. Suppose your product is fine art selling for $1,000 and up. Consider offering limited edition prints for a few hundred dollars, and posters for a few dollars. Once you have converted a visitor to a customer, you can learn more about the customer and offer additional products that meet that customer's needs.
TIP: Consider placing a cookie on your visitor's machine. Use it to record the date and time of the visit. When the visitor comes back, use dynamic HTML to read the cookie and show which pages have changed since the last visit. For example, you might place an attractive "New" graphic next to pages that have been updated since the last visit.
The Greek philosopher Socrates is credited with saying, "The unexamined life is not worth living." Modern quality experts have restated this position: "You cannot improve something you do not measure." With that thought in mind, let's look at ways to add "instrumentation" to the site to determine how well you're doing at meeting your goals.
Interpreting the Log All industrial-strength servers maintain a variety of logs that show how many requests your server handled for each page. The raw numbers from these logs are often called hits. For your purposes as a Webmaster, hits are a meaningless number. Suppose you have two pages, one with three graphics and one with seven. The first page will generate four hits every time someone requests it--one hit for the page itself and one for each graphic. The second page will generate twice the number of hits even if its popularity is exactly the same as the first page. When you designed your site and estimated the number of people who would fill out the site's form each month, you probably used two figures to estimate the site's effectiveness: the number of visitors per month and the number of leads generated per hundred visitors.
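To see why hits mislead, it can help to count them side by side with page requests. The sketch below uses made-up request paths for the two pages just described (one with three graphics, one with seven) and assumes that anything ending in a common image suffix is a graphic; the file names and the suffix list are illustrative only.

```python
# Assumption: these suffixes identify graphics; adjust for your own site.
GRAPHIC_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def count_hits_and_page_requests(paths):
    """Return (hits, page_requests) for a list of requested paths."""
    hits = len(paths)
    page_requests = sum(
        1 for path in paths if not path.lower().endswith(GRAPHIC_SUFFIXES)
    )
    return hits, page_requests


# One visit to each page (hypothetical file names).
first_page = ["/first.html"] + [f"/images/first{i}.gif" for i in range(3)]
second_page = ["/second.html"] + [f"/images/second{i}.gif" for i in range(7)]

for name, paths in (("first", first_page), ("second", second_page)):
    hits, pages = count_hits_and_page_requests(paths)
    print(f"{name} page: {hits} hits, {pages} page request")
```

Both pages were requested once, yet the second shows twice as many hits, which is why the page-request count is the more useful figure.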
Here's a simple procedure for computing the effectiveness of a Web site:
Suppose you find that 1 visitor in 100 fills out the form and becomes a qualified lead, and your goal is 30 qualified leads per month. You find that your site's home page, http://www.xyz.com/index.html, is being requested 1,000 times per month (leading to 10 leads per month). You may decide that, considering the size of your audience, 1,000 visitors per month is low. By promoting your site more effectively (both online and by traditional means), you may be able to increase the number of visitors. If you can generate 3,000 visitors per month, you'll reach your goal of 30 leads per month.
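The arithmetic in this example is simple enough to keep in a small script, which makes it easy to rerun as the log figures change. This sketch expresses the conversion rate as "one lead per N visitors" so that everything stays in whole numbers; the function names are illustrative.

```python
def leads_per_month(visitors, visitors_per_lead):
    """Leads expected when one visitor in `visitors_per_lead` fills out the form."""
    return visitors // visitors_per_lead


def visitors_needed(goal_leads, visitors_per_lead):
    """Monthly visitors required to reach `goal_leads` at the same rate."""
    return goal_leads * visitors_per_lead


print(leads_per_month(1000, 100))   # current traffic, 1 lead per 100 visitors
print(visitors_needed(30, 100))     # traffic needed for 30 leads per month
```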
For more information on increasing the total number of visitors to your site, see "Making Your Site Findable," later in this chapter.
The rest of this section describes how to increase the likelihood that a visitor will become a qualified lead.
Computing Page Loss Ratios Depending upon the design of your site, you'll often have a typical path a visitor might take through the site. Consider, for example, the simple six-page site shown in Figure 44.4. This site has been designed with content about the company (on index.html), the products (on products.html), and the staff qualifications (on staff.html).
Use a graphical view of the site to identify natural paths.
A typical visitor might start at the home page (index.html) and go down the product information path (products.html) or the people path (staff.html). They might also explore one path and then follow a link to the other path. Once the questions about the products and staff are answered, the visitor is encouraged to go to a page that helps them select the right product for their needs (choose.html). Finally, they fill out a form (form.html) by which they identify themselves to the sales staff. When they've successfully filled out the form, a Thank You page (thanks.html) appears.
NOTE: You can use this model for much larger sites if the site is arranged into sections. For example, you could have 12 pages of product information and 4 pages about the staff, but still use this general model to track a visitor's progress through the site.
Use the server's log to identify how many visitors came to each page. A simple examination might reveal the information in Table 44.3.
TIP: If your server cannot give you the kind of figures shown in Table 44.3, use your operating system's command-line utilities. For example, in UNIX you can use grep and wc -l to quickly count the requests for your home page (a rough proxy for the number of visitors) in April 1998:
grep "Apr/1998" access.log | grep "index.html" | wc -l
Copy this information back onto your site schematic to get a better feel for these numbers. Figure 44.5 shows these figures on our example site.
If you assume that the number of people who start their visit at an interior page (that is, not the home page) is small, then the number of visitors to this site is about 1,000. (You can examine the access and referrer logs to verify that the home page is most visitors' starting point.) Of those 1,000 people, only 10, or one in 100, made it all the way to the Thank You page that comes after the form. To improve that ratio you need to understand where these folks are dropping off.
Annotate the site schematic with figures from the log.
In general, a site loses people at three points:
If you see large numbers of people who come to the home page and leave without further examining the site, one of two factors may be at work. First, these visitors may have gotten to your site by mistake. Perhaps they entered keywords in a search engine, and the search engine recommended your site inappropriately. For more information about search engines, see "Making Your Site Findable," later in this chapter.
Use your server's referrer log to see how people came to your site. If many of the people who left after reading the home page came from a single search engine, check that search engine to see how your home page is categorized. You may be able to change the terms that match your site in order to decrease the number of disappointed visitors.
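A quick tally of referring hosts makes a dominant search engine easy to spot. The sketch below assumes you have already pulled the referrer URLs out of your log (the log format varies by server); the sample URLs use placeholder example.com hosts, and a "-" stands for a request that carried no referrer.

```python
from collections import Counter
from urllib.parse import urlsplit


def referrer_hosts(referrers):
    """Count referrals by host name; entries without a host are grouped together."""
    counts = Counter()
    for url in referrers:
        host = urlsplit(url).hostname
        counts[host or "(none)"] += 1
    return counts


# Hypothetical referrer values taken from a log.
sample_referrers = [
    "http://search.example.com/find?q=mortgages",
    "http://search.example.com/find?q=virginia+beach",
    "http://directory.example.org/realestate/",
    "-",
]

for host, count in referrer_hosts(sample_referrers).most_common():
    print(f"{count:4d}  {host}")
```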
Sometimes you'll find that visitors come to your site but are turned off by your home page and don't explore the site further. Examine your home page to see if you can find elements that are offensive or aesthetically displeasing. Consider testing your home page with focus groups or other human testers to see whether they see a problem. Examine the server's user-agent log to see if the visitors who leave your site early tend to be using a particular browser. If you're not careful to validate your site's HTML, some browsers will render your site in unpleasant colors or jumbled type. A few browsers may even crash if your HTML is particularly offensive. See Chapter 39, "Verifying and Testing HTML Documents," to learn how to write high-quality HTML.
In this example, half of the site's visitors never get beyond the home page. The Webmaster needs to take the steps recommended here to find out why the internal pages are not being visited more frequently.
If a visitor moves beyond your home page, it's likely that he came to the site he wanted to visit and the home page gave him enough information that he was willing to look inside. It's quite possible that this person represents a qualified lead--you would like to see this person fill out your form so you can follow up and help meet his needs. Look at Figure 44.5 again. Of the 500 visitors who came past the home page and went into the interior of the site, only 100 made it to the form. Four hundred people visited products.html--300 from the home page and another 100 from staff.html. Only 70 of those, or 17.5 percent, got to choose.html and then the form. Similarly, 300 people visited staff.html, but only 30 of those, or 10 percent, eventually visited the form. Perhaps there is something on these interior pages that convinces the visitor that they do not want the product. Examine the pages carefully. Use focus groups or other human evaluators to help you determine why these pages do not motivate visitors to move further.
Finally, of the 100 visitors who actually arrived at the form, 90 percent left without filling it out. Make sure the form is visually attractive. Consider adding a free offer, or reassuring the visitor that filling out the form does not place them under any obligation to buy the product. If your form is an order form, make sure you offer a guarantee. Again, consider using human evaluators to examine your form and determine how you might improve your offer.
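The loss ratios discussed in this section can be recomputed whenever you pull fresh numbers from the log. This sketch uses only the visitor counts quoted in the text for the example site; the step labels and the function name are illustrative.

```python
def loss_ratio(arrived, continued):
    """Fraction of visitors who reached a step but went no further."""
    return 1 - continued / arrived


# (step, visitors who arrived, visitors who continued), from the example site.
steps = [
    ("home page -> interior pages", 1000, 500),
    ("products.html -> form.html", 400, 70),
    ("staff.html -> form.html", 300, 30),
    ("form.html -> thanks.html", 100, 10),
]

for label, arrived, continued in steps:
    print(
        f"{label}: {continued / arrived:.1%} continue, "
        f"{loss_ratio(arrived, continued):.1%} lost"
    )
```

The step with the largest loss is usually the best place to begin your examination.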
If you attract large numbers of qualified visitors to your site and present them with a clear, compelling description of your offer, many of them will choose to fill out your form and request follow-up.
ON THE WEB: http://www.tlc-systems.com/dir.html Web-Scope is one of the few analyzers that reports visitors' paths through your site. Web-Scope can generate a summary report for 16 days previous to its report. It also reports an interesting statistic: "pages per visitor." See the sample report in Figure 44.6.
Manual examination of Web-Scope's output shows paths, depth, and dwell-time for each visitor.
Tracking Repeat Visitors Often visitors will come to your site, visit for a while, and then leave. Some of these visitors will return later, possibly to an interior page, and will go on to complete the form and become qualified leads. Store a cookie on the visitor's browser when he or she first comes to your site. Log information such as the referrer and the date and time of the visit. Check for that cookie on every page. If it doesn't exist, you have a first-time visitor coming directly to an interior page, possibly as a referral from a search engine. If the cookie does exist, store a record of this visit in the cookie and make a note of the visit on the server. Repeat visitors have often seen something on your site that they like and are coming back for more information. If you have a content-rich site, they may be using you as a resource. Keep offering them fresh, high-quality content. If your content is meeting their needs, they are likely to become customers one day.
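Here is one way the cookie check might look in a server script. It is a sketch only: the cookie name first_visit, the one-year lifetime, and the timestamp format are assumptions made for illustration, and a real script would also write the referrer and visit time to a server-side log as described above.

```python
from datetime import datetime, timezone
from http.cookies import SimpleCookie

COOKIE_NAME = "first_visit"  # hypothetical cookie name


def handle_visit(cookie_header, now):
    """Return (is_repeat_visitor, Set-Cookie value or None).

    `cookie_header` is the raw Cookie header sent by the browser, or "".
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        # The browser already carries our cookie: a repeat visitor.
        return True, None

    # First visit: record when it happened (assumed format and lifetime).
    fresh = SimpleCookie()
    fresh[COOKIE_NAME] = now.strftime("%Y%m%dT%H%M%SZ")
    fresh[COOKIE_NAME]["path"] = "/"
    fresh[COOKIE_NAME]["max-age"] = 365 * 24 * 60 * 60
    return False, fresh[COOKIE_NAME].OutputString()


repeat, set_cookie = handle_visit("", datetime.now(timezone.utc))
print("repeat visitor:", repeat)
print("Set-Cookie:", set_cookie)
```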
Refining Referrers When a Web browser contacts your server, it sends a series of header lines defined by the HyperText Transfer Protocol (HTTP). One of those headers is Referer (the protocol's own spelling). Your Web server can be configured to write these headers into a log (commonly called the referrer log). It will also make the contents of this header available to server scripts. (For example, in CGI the variable name is HTTP_REFERER.) Use your server's referrer log or a server script to identify which page contains the link that the visitor followed to get to your site. You can use this information to:
Much of the previous section, "Objectives, Methods, and Resources," describes ways to increase the rate at which visitors to your site become qualified leads and, eventually, sales. The foundation for converting visitors to leads, of course, is to have plenty of visitors. This section describes how to make your site findable by people who want to know more about your products, services, and technology.
Suppose an Internet user wants to learn about mortgages in southeast Virginia (a region known as Hampton Roads). The user might go to any of a dozen or more search engines and enter a search phrase such as "mortgages AND Hampton Roads Virginia." The Hampton Roads area consists of several cities--the user might also enter "mortgages AND Virginia Beach" or "mortgages AND Norfolk." If your company offers mortgages in that region, your site should be listed in response to any one of these queries.
You can explicitly submit your site to many search engines. Indeed, at least one search engine--Yahoo!--lists only sites that have been explicitly submitted. Most search engines, however, operate robots that explore the Web. When the robot encounters a new page, it examines the page and attempts to classify it by keywords it finds on the page. Robots are the most important way that your interior pages get indexed on the Web.
TIP: Each search engine looks at slightly different elements of the page in order to find keywords. For best results, make sure the following HTML elements accurately reflect your page's contents:
- Headers (H1 through H6)
- <META> tags with keywords
- The first few sentences of text in the <BODY> tag
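Before you submit a page, you may want to look at it the way a robot might. The sketch below pulls out the headers and the <META> keywords listed above, plus the page title (which later sections of this chapter also mention). It makes no claim about how any particular search engine works; the class name and the sample page are invented for illustration.

```python
from html.parser import HTMLParser


class KeywordSourceParser(HTMLParser):
    """Collect the title, H1-H6 text, and META keywords from a page."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.meta_keywords = []
        self._capture = None  # tag whose text is currently being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "title" or tag in self.HEADINGS:
            self._capture = tag
            self._text = []
        elif tag == "meta":
            attributes = dict(attrs)
            if (attributes.get("name") or "").lower() == "keywords":
                content = attributes.get("content") or ""
                self.meta_keywords = [
                    word.strip() for word in content.split(",") if word.strip()
                ]

    def handle_data(self, data):
        if self._capture:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._capture:
            text = " ".join("".join(self._text).split())
            if tag == "title":
                self.title = text
            else:
                self.headings.append(text)
            self._capture = None


def keyword_sources(markup):
    parser = KeywordSourceParser()
    parser.feed(markup)
    parser.close()
    return parser


# Hypothetical page for a mortgage company.
sample_page = (
    '<HTML><HEAD><TITLE>XYZ Mortgages</TITLE>'
    '<META NAME="keywords" CONTENT="mortgages, Hampton Roads, Virginia Beach">'
    '</HEAD><BODY><H1>Mortgages in Hampton Roads</H1>'
    '<P>XYZ offers mortgages throughout southeast Virginia.</P></BODY></HTML>'
)

found = keyword_sources(sample_page)
print("Title:   ", found.title)
print("Headers: ", found.headings)
print("Keywords:", found.meta_keywords)
```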
You should examine the server's referrer log regularly. If you find that one or more of the popular search engines is not contributing many referrals, use that search engine and try to look up your site. If the page is not correctly indexed, resubmit the page to that search engine.
You may find that your site is correctly indexed, but appears on a list with hundreds of similar sites. Look for ways to make your site stand out from the others. Some search engines copy the first few characters from the BODY into the page's entry. Make sure the first sentence or two of each page contains an accurate summary of the page that will make sense to someone reading the page out of context.
NOTE: Some search engines do not look inside FRAMESETs to index the contents of the FRAMEs. If you have important information on an interior page, consider offering it on a nonframe page as well, and make sure that page gets indexed. Place a link on the nonframe page offering the frame version of the information.
Getting into Yahoo! Yahoo! (www.yahoo.com) is quite possibly the most popular search engine on the Web. They maintain their quality by being different from the robot-driven search engines. Yahoo! itself is a searchable catalog. Human indexers (called Yahoo! Surfers) visit each submitted site and classify it. To get your page into Yahoo!, follow these three steps:
TIP: If your site is commercial--it sells something or describes a company that offers products or services--you should look for your category inside the Business and Economy:Companies hierarchy.
TIP: If your company offers its products or services in a specific geographic region--say, one city--submit your page in the appropriate Business and Economy:Companies category, but make sure the regional nature of your business is obvious on your home page. The Yahoo! Surfer will add a cross-reference from the Regional hierarchy.
While you're getting into Yahoo!, visit Magellan (www.mckinley.com). Like Yahoo! (and unlike nearly all other search engines) Magellan is oriented around categories. In general, you get into Magellan by getting listed in Excite. Follow the Add Site link from Magellan's home page to get their latest policy.
TIP: While you're exploring Yahoo! categories, take note of the Directories link that appears at the top of most category pages. This link lists category-specific or industry-specific pages of links. In many industries, it's as important to get listed in the right directory as it is to get placed in the search engines. Visit the relevant directory sites to learn how to add your company to the directory.
The Other Search Engines Except for Yahoo! and Magellan, most search engines rely on automatic classifiers to decide how to index your pages. These search engines look at the title, header tags, and <META...> tags. If you follow the guidelines given later in this chapter, in the section "Making Your Site Searchable," most robots will properly index you.
NOTE: Some site developers have observed that many requests are made to the search engines for pornographic sites. They then add a large number of sexually explicit keywords to their site in order to build a large number of hits.
Not only is this strategy (called spamdexing) of dubious logic, but the search engines themselves will block it. If they determine that your page is full of repeated keywords, all major search engines will simply ignore your page.
While there are hundreds of search engines, only a few are generally considered to be the "major" search engines. These few include:
While you may choose to be listed in other search engines, you should start with these, and then move on to the less popular (but possibly more specific) search engines.
TIP: Once you've decided on the keywords that should represent your site, go to an integrated search tool such as SavvySearch (http://guaraldi.cs.colostate.edu:2000/) and submit a search based on those keywords.
SavvySearch develops a search plan based on its knowledge of more than two dozen different search engines. You can navigate through all of those search engines to see who else has sites similar to yours. This information may help you decide which search engines you want to be in, and how you want to tailor your site to distinguish it from the competition.
ON THE WEB: http://www.kcpl.lib.mo.us/search/srchengines.htm This site, developed by the Kansas City Public Library, describes seven major search engines.
It's easy to get into the search engines--the real trick is to make sure your page is so well-indexed that users who are looking for a product or service that you offer will find your site easily. When possible, you should be at the top of the list. You should certainly strive to be among the top five or ten sites listed.
Most search engines rank the sites they report by relevance, but each search engine has a slightly different idea of what aspects of a Web page make it "relevant." The more you learn about the different search engines, the easier it is to ensure that your pages score well.
ON THE WEB: http://www.searchenginewatch.com/features.htm Learn about the search engines from the point of view of a site developer.
TIP: For some of the most specific recommendations about scoring well, visit http://www.deadlock.com/promote/. This site, authored by Jim Rhodes of Deadlock Promotions, covers every detail of getting into the search engines and scoring well. Rhodes also offers an e-zine that describes his latest products and findings.
The process of getting well-placed in the search engines is not perfect. After you submit your page, wait a week or two, then check to be sure you're properly indexed. If your page doesn't appear, or if it appears but is not indexed as you would like it to be, make any changes you need to make to your site and resubmit it to the search engine.
Many Web experts are convinced that users find more Web sites through traditional media than they find online. It's important to get your site and its pages listed in the major search engines. You should also work with your company's public relations and advertising specialists to make sure that your site is part of the company's overall presence.
Make Your URL Memorable If a radio announcer is to read your URL on the air or a driver is to note it on a billboard, the URL must be short and memorable. If your company's name is XYZ, Inc., clearly you should favor an URL of http://www.xyz.com/ over http://www.myServiceProvider.com/~xyz/. Of course, if your company is new or if you're just beginning to establish a Web presence, you may find that the "obvious" domain name is already taken. Look for other obvious names for your company, such as xyzonline.com or xyznet.com. If your organization's audience is primarily associated with one geographical area, consider using a geographical domain name such as xyz.va.us. Many companies are dropping the http:// portion of the URL. Many users know enough to put it in, and modern browsers such as Netscape Communicator will default to the HTTP protocol if a user should leave it out entirely. Other companies are making the www optional. For example, CNN gives out its URL as http://cnn.com. You should make sure your name server is set up so that your URL works whether the user includes www or not.
Put Your URL Everywhere Chances are, your company already has a successful program to get the company's name and products known in the marketplace. Make sure your public relations and advertising staff are aware of your Web site. Every piece of paper that leaves your offices--business cards, stationery, purchase orders, invoices--should have your URL.
TIP: For years advertisers have included a "key" in their address, such as a department code, so they could tell which advertising channels were effective. Consider adding a key to your URL. For example, your TV ads could reference www.xyz.com/tv/ while your ads in a magazine for personal computer owners might reference www.xyz.com/pc/. When a user requests one of these pages, set a cookie recording the key and then redirect his or her browser to your home page. Now you can collect statistics on your visitors and sort them by advertising channel.
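The keyed-URL technique in the preceding tip might be sketched as follows. The cookie name ad_channel and the function are hypothetical, and the headers are shown as plain (name, value) pairs so that you can adapt them to whatever server or script environment you use.

```python
# Keyed paths from the tip above, mapped to the channel each one identifies.
CHANNEL_PATHS = {"/tv/": "tv", "/pc/": "pc"}


def channel_response(path):
    """Return (status, headers) for a keyed URL, or None for any other path."""
    key = CHANNEL_PATHS.get(path)
    if key is None:
        return None
    headers = [
        ("Set-Cookie", f"ad_channel={key}; Path=/"),  # hypothetical cookie name
        ("Location", "/"),                            # send the visitor to the home page
    ]
    return "302 Found", headers


print(channel_response("/tv/"))
print(channel_response("/products.html"))
```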
Get People Talking Your public relations (PR) staff is familiar with press releases, and probably works with editors from your industry's magazines on a first-name basis. They know that these editors are deluged with press releases about new corporate Web sites. Help your PR staff find something unique about your site to promote. Do you have an online magazine or a clever demo? That information is likely to be of more interest to the trade magazine's readers than the simple fact that you're online.
ON THE WEB: http://www.compare.net/ If your product is sold to consumers and end-users, it may be appropriate for it to be compared on one of the consumer product guide sites. Compare.net and its counterpart, www.productreviewnet.com, are good starting points.
If your site is truly unique, you may be able to get it promoted on television. CNN and the major networks often feature Web sites, particularly on their weekend or mid-day topical programs. MSNBC, the joint venture between Microsoft and NBC, is constantly looking for good material for its show, The Site. If you get picked up by MSNBC, you may also find your site featured on CNBC or even an NBC News broadcast. Find out from your PR staff whether getting your site featured on any of these programs is possible.
TIP: Appoint someone in your organization to read the articles in UseNet newsgroups that are relevant to your company. Whenever possible they should answer questions posted to the newsgroup, or provide help. At the bottom of their response they should include a .sig file that mentions your company's products or services and gives the URL.
You should never spam a UseNet newsgroup, but by posting appropriate and useful information you'll get a good reputation and build name awareness. You'll also be listed in the UseNet search engine at www.dejanews.com, which increases the likelihood of people finding your site.
If you've adopted a content-rich design, your site may actually look like many different sites. For example, the Family Channel's home page reveals six different areas of information:
The Family Channel's site contains streaming video in RealPlayer format.
Any site with more than about 20 pages should include an index page, such as the one shown in Figure 44.8.
Larger sites cannot index every page--they should take a two-pronged approach. First, build an index of sections, such as the site index provided by Netscape and shown in Figure 44.9. Second, make the site searchable, so that a user can go directly to any relevant page, as shown in Figure 44.10.
You can make your site searchable by adding software to automatically index all pages. You can also use a catalog server (such as Netscape's Catalog Server, a component of SuiteSpot) to maintain a taxonomy of your site. The rest of this section describes these two techniques.
Even a small site can benefit from an index of pages.
Large sites should index their sections.
Apple Computer's site contains 114 references to "Mac OS 8 multitasking."
One of the best ways to add value to your site is to make it searchable. In order for your site to be searchable, you need three components: an index, a search engine, and a presentation manager that displays the results as HTML. As shown in Table 44.4, search software can be classified by how it maintains these components.
|site-idx.pl||From <META> keywords||via ALIWEB||via ALIWEB|
NOTE: Some HTML tools such as Microsoft FrontPage and Backstage have searching features built-in. These tools are covered in Chapter 42, "Using HTML and Site Tools."
HTGREP and site-idx.pl are relatively simple indexers but are quite useful for small sites. Larger sites will benefit from the Wide Area Information Server, or WAIS. WAIS servers and their HTTP gateways can be challenging to set up--many corporate Webmasters prefer to use servers such as the Netscape Enterprise server, which includes a built-in search engine.
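To make the three components concrete, here is a minimal sketch of an indexer, a search engine, and a presentation manager working together. It is an illustration only and does not reproduce HTGREP, site-idx.pl, or WAIS; the function names and the sample pages are invented for the example.

```python
import html
import re
from collections import defaultdict

def build_index(pages):
    """Indexer: map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Search engine: return the sorted URLs that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

def present(query, urls):
    """Presentation manager: format the hits as a small HTML page."""
    items = "\n".join(
        f'<LI><A HREF="{html.escape(u)}">{html.escape(u)}</A>' for u in urls
    )
    return (
        f"<HTML><HEAD><TITLE>Results for {html.escape(query)}</TITLE></HEAD>\n"
        f"<BODY><UL>\n{items}\n</UL></BODY></HTML>"
    )

# Hypothetical pages, invented for illustration.
PAGES = {
    "/gallery/oils.html": "Original oil paintings of mountain landscapes",
    "/gallery/prints.html": "Limited edition prints of mountain and ocean scenes",
    "/about.html": "About the gallery and the artist",
}
INDEX = build_index(PAGES)
print(present("mountain prints", search(INDEX, "mountain prints")))
```

The index is built once, ahead of time, so each query only has to intersect a few small sets. Real packages differ mainly in where each of the three pieces runs and who maintains it.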
Using HTGREP
HTGREP was developed by Oscar Nierstrasz at the University of Berne, Switzerland.
ON THE WEB: http://iamwww.unibe.ch/~scg/Src/Doc/htgrep.html You can download HTGREP and its documentation from this Web site.
This section shows an example of HTGREP at work on the Nikka Galleria site (http://www.dse.com/nikka/). At any given time, several works of art are available for purchase. Visitors to the site can find something close to what they're looking for and then use HTGREP to search the site for similar works.
Working with site-idx
As its name suggests, site-idx.pl is an indexer, but it bases its work on keywords supplied by the Webmaster on each page. The result of running site-idx.pl is an index file that can be submitted to search engines such as ALIWEB. This section introduces a simple ALIWEB-like search engine that can read an index file and serve up pages based on the index file's contents.
ON THE WEB: http://www.ai.mit.edu/tools/site-index.html site-idx.pl is the work of Robert S. Thau at the Massachusetts Institute of Technology. This program was written to address the indexing needs of ALIWEB (http://www.nexor.co.uk/aliweb/doc/aliweb.html).
Unlike most search engines, ALIWEB relies neither on human classifiers (like Yahoo! does) nor on robots. ALIWEB looks for an index file on each Web site and uses that file as the basis for its classifications.
The indexing is done by the site developer at the time the page is produced, the search is done by ALIWEB (or a local ALIWEB-like CGI script), and the results are presented by that CGI script.
The index file must be named site.idx and must contain records in the format used by IAFA-compliant FTP sites. For example, the events-list document on the server at the MIT Artificial Intelligence Laboratory produces the following entry in http://www.ai.mit.edu/site.idx:
Template-Type: DOCUMENT
Title: Events at the MIT AI Lab
URI: /events/events-list.html
Description: MIT AI Lab events, including seminars, conferences, and tours
Keywords: MIT, Artificial Intelligence, seminar, conference
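A local ALIWEB-like script has to read records such as this one and match them against the user's query. The following sketch shows one way to do that. It assumes that each field sits on its own line and that records are separated by blank lines; the function names are invented for the example and are not part of ALIWEB or site-idx.pl.

```python
def parse_site_idx(text):
    """Split index text into records (blank-line separated) of field -> value."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            if ":" in line:
                # Split on the first colon only, so values may contain colons.
                field, _, value = line.partition(":")
                record[field.strip()] = value.strip()
        if record:
            records.append(record)
    return records

def find_by_keyword(records, term):
    """Return URIs of records whose Keywords field lists the term (case-insensitive)."""
    term = term.lower()
    hits = []
    for rec in records:
        keywords = [k.strip().lower() for k in rec.get("Keywords", "").split(",")]
        if term in keywords:
            hits.append(rec.get("URI", ""))
    return hits

# The record shown above, one field per line (assumed layout).
SITE_IDX = """\
Template-Type: DOCUMENT
Title: Events at the MIT AI Lab
URI: /events/events-list.html
Description: MIT AI Lab events, including seminars, conferences, and tours
Keywords: MIT, Artificial Intelligence, seminar, conference
"""

RECORDS = parse_site_idx(SITE_IDX)
print(find_by_keyword(RECORDS, "seminar"))
```

Because the Webmaster chooses the keywords, a search of this kind matches only the terms the author considered important, which is both the strength and the limit of the approach.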
The process of producing site.idx would be tedious if done by hand. Thau's program automates the process by scanning each file on the site, looking for keywords. The recommended way to supply these keywords is with <META> tags in the header. <META> tags have the following general syntax:
<META NAME="..." CONTENT="...">
Valid names include:
Remember that the descriptions ultimately appear in a set of search results. Each description should stand alone so that it makes sense in that context. Thau's program uses the HTML <TITLE> tag to generate the document title. Thus, a document at MIT might include the following lines in the <HEAD> section:
<TITLE>MIT AI lab publications index</TITLE>
<META NAME="description" CONTENT="Search the index of online and hardcopy-only publications at the MIT Artificial Intelligence Laboratory">
<META NAME="keywords" CONTENT="Artificial Intelligence, publications">
<META NAME="resource-type" CONTENT="service">
By default, site-idx.pl looks for the description, keywords, and resource type in <META> tags. This behavior can be overridden so that any document with a title gets indexed, but the override undoes most of the benefits of using site-idx.pl.
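The scanning step can be sketched in a few lines. The example below is not Thau's Perl program; it is a Python illustration of the same idea, using the standard html.parser module. The class and function names are invented, the URI passed in is a hypothetical path, and the mapping from resource-type to an uppercase Template-Type is an assumption made for the example.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the <TITLE> text and named <META> tags from one HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # tag and attribute names arrive lowercased
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def make_record(uri, html_text):
    """Return an index record, or None if the page lacks a description or keywords."""
    parser = MetaExtractor()
    parser.feed(html_text)
    if "description" not in parser.meta or "keywords" not in parser.meta:
        return None  # simplified version of skipping pages without <META> tags
    return "\n".join([
        # Assumption: the resource type becomes the uppercase Template-Type.
        "Template-Type: " + parser.meta.get("resource-type", "document").upper(),
        "Title: " + parser.title.strip(),
        "URI: " + uri,
        "Description: " + parser.meta["description"],
        "Keywords: " + parser.meta["keywords"],
    ])

PAGE = """<HEAD>
<TITLE>MIT AI lab publications index</TITLE>
<META NAME="description" CONTENT="Search the index of online and hardcopy-only publications at the MIT Artificial Intelligence Laboratory">
<META NAME="keywords" CONTENT="Artificial Intelligence, publications">
<META NAME="resource-type" CONTENT="service">
</HEAD>"""

# The URI below is a hypothetical path chosen for the example.
print(make_record("/publications/index.html", PAGE))
```

Returning None for untagged pages mirrors the default behavior described above: only pages whose authors supplied a description and keywords reach the index.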
Taking Advantage of WAIS
For large Web sites, the best search engines are full-index systems. The archetype of this family is the Wide Area Information Server, or WAIS. This section describes WAIS and its numerous cousins, all of which are characterized by automated indexing, powerful search tools, and a gateway between the database and the Web.
ON THE WEB: http://www.cis.ohio-state.edu/hypertext/faq/usenet/wais-faq/ Check out this Frequently Asked Questions list for an overview of WAIS. User-level information on queries is available at http://town.hall.org/util/wais_help.html.
WAIS is a different service than the Web. WAIS is based on ANSI Standard Z39.50, version 1 (also known as Z39.50 88). Clients exist for most platforms, but the most interesting work lies in integrating WAIS databases with the Web.
The document author(s) and you, as the installer, must agree on a document format, so that the format file can prepare a meaningful index. If the document authors routinely use <META> tag keywords in a standardized way, you can build a format file to extract the information from those lines.
Because the Web and WAIS use two different protocols (HTTP and Z39.50, respectively) there must be some program or programs between the Web user and the database to format the query and present the responses. One approach is to use a CGI front end to a WAIS server.
ON THE WEB: http://ls6-www.informatik.uni-dortmund.de/SFgate/SFgate.html To access a freeWAIS-sf database from the Web, use SFgate, a CGI program that uses WAISPERL, an adaptation of Perl linked with the freeWAIS-sf libraries. You can download SFgate from this Web site.
The Enterprise Server's Built-In Search Engine
Netscape took note of the fact that many Webmasters wanted to make their sites searchable but didn't want to deal with the complexities of WAIS. Netscape's solution was to build the Verity full-text search engine into its Enterprise Web server, beginning with version 3.0. Many sites benefit from full-text search, but if you wait until a user makes a request to search your files, your server can get bogged down reading files from the hard drive. To avoid that delay, Verity, the company that supplies the full-text search engine for Netscape's Enterprise server, has designed its search engine to work from sets of indexes called collections.
You build a collection over all of the files in a single directory or, optionally, a directory tree. Figure 44.11 illustrates the two kinds of collection possible. Figure 44.11a shows a collection based on a single directory (which may have many documents). Subdirectories are not included in this collection. Figure 44.11b shows a collection that includes all of the documents in a directory and all of its subdirectories.
Collections may be based on a single directory or on a directory with its subdirectories.
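The difference between the two kinds of collection comes down to which files are gathered for indexing. The sketch below illustrates that difference with the Python standard library. It is not the Enterprise server's own collection builder; the function name, the parameter, and the sample files are invented for the example.

```python
import os
import tempfile

def collection_files(root, include_subdirectories=False):
    """List the files a collection over `root` would cover, as sorted relative paths."""
    if not include_subdirectories:
        # Single-directory collection: files directly in root only.
        return sorted(
            name for name in os.listdir(root)
            if os.path.isfile(os.path.join(root, name))
        )
    # Directory-tree collection: walk root and every subdirectory.
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)

# Build a throwaway directory with one page at the top and one in a subdirectory.
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "reports"))
    for rel in ("index.html", os.path.join("reports", "q1.html")):
        with open(os.path.join(root, rel), "w") as f:
            f.write("<HTML></HTML>")
    single = collection_files(root)
    tree = collection_files(root, include_subdirectories=True)

print(single)
print(tree)
```

The first list contains only the top-level page; the second also picks up the page in the reports subdirectory.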
A user searches for a document in a three-step process:
In general, you design an HTML page for each of these steps. (As you gain experience in setting up text search systems, you may decide to combine any two of these steps into a single page.)
If you were writing a conventional HTML page, you could specify everything in your search, results, and documents pages with static HTML. Because the contents of these pages are based on the contents of your site, however, you need to write pattern files instead of static documents. In your pattern files you include pattern variables--identified by two leading dollar signs (for example, $$background).
Netscape has already defined many useful pattern variables. You can add additional pattern variables if you like.
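Conceptually, the server fills in a pattern file by replacing each pattern variable with its current value. The sketch below imitates that substitution so you can see the idea at work. It is an illustration, not Netscape's implementation; the function name, the rule that unknown variables are left untouched, and the sample values are all assumptions made for the example.

```python
import re

# Matches "$$" followed by a name made of letters, digits, and inner hyphens.
PATTERN_VARIABLE = re.compile(r"\$\$([A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*)")

def expand_pattern(pattern_text, variables):
    """Replace each $$name with its value; leave unknown names untouched."""
    def replace(match):
        return variables.get(match.group(1), match.group(0))
    return PATTERN_VARIABLE.sub(replace, pattern_text)

# Hypothetical pattern fragment and values, invented for illustration.
PATTERN = '<BODY BGCOLOR="$$background"><H1>$$sitename</H1>$$NS-query</BODY>'
VALUES = {"background": "#FFFFFF", "sitename": "Example Site", "NS-query": "Model 1000"}
PAGE = expand_pattern(PATTERN, VALUES)
print(PAGE)
```

Keeping the values in one place means that a change to, say, the background color shows up on the search, results, and document pages at once.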
Once you have your collections designed and built and you have selected the pattern variables you want to use, you're ready to call the search function /ns-search. You can pass pattern variables to /ns-search using GET or POST. To search the finance and manufacturing collections for the string Model 1000, you could include the following HTML on your query page:
<A HREF="/ns-search?NS-collection=finance&NS-collection=manufacturing&NS-query=Model+1000">Search for finance and manufacturing information about the Model 1000</A>
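If you generate query links from a script rather than typing them by hand, let a library do the encoding so that spaces and repeated parameters come out correctly. The following sketch builds the same GET request with Python's urllib.parse; the helper name ns_search_url is invented for the example.

```python
from urllib.parse import urlencode

def ns_search_url(collections, query):
    """Build an /ns-search GET URL with one NS-collection parameter per collection."""
    params = [("NS-collection", name) for name in collections]
    params.append(("NS-query", query))
    # urlencode turns the space in the query into "+", as in the link above.
    return "/ns-search?" + urlencode(params)

URL = ns_search_url(["finance", "manufacturing"], "Model 1000")
print(URL)
```

Passing the parameters as a list of pairs, rather than as a dictionary, is what allows NS-collection to appear more than once.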
After you've called /ns-search, that function sets some reserved pattern variables of its own. You can use these variables in subsequent queries:
Listing 44.4 shows how these variables can be used in a pattern file.
<HTML>
<HEAD>
<TITLE>ES3.0 TEST: $$banner</TITLE>
</HEAD>
<BODY BGCOLOR="$$background">
<!-- Banner: site logo and site name -->
<TABLE WIDTH=100%>
<TR>
<TD ALIGN=LEFT><IMG $$logo></TD>
<TD ALIGN=RIGHT><H1>$$sitename</H1></TD>
</TR>
</TABLE>
<HR>
To search for an article, choose a subject, then enter a single word, several words, or a phrase. You can get <A HREF="$$help">search tips, </A> or perform an <A HREF="/ns-search?NS-query-pat=/text/NS-advquery.pat">advanced search.</A><P>
<!-- Query form: posts the collection and query string to /ns-search -->
<TABLE CELLPADDING=5>
<TR><TD ALIGN=RIGHT><B>Subject: </B></TD>
<FORM METHOD="POST" ACTION="/ns-search?NS-search-page=results">
<INPUT TYPE="HIDDEN" NAME="NS-search-type" VALUE="NS-boolean-query">
<INPUT TYPE="HIDDEN" NAME="NS-max-records" VALUE="$$NS-max-records">
<TD>$$NS-collection-list-dropdown</TD></TR>
<TR><TD ALIGN=RIGHT><B>Search for: </B></TD>
<TD ALIGN=LEFT><INPUT NAME="NS-query" SIZE=40 VALUE="$$NS-query">
&nbsp;&nbsp;&nbsp;&nbsp;<INPUT TYPE="SUBMIT" VALUE="Search"></TD></TR>
</FORM>
</TABLE>
<HR>
<B><FONT SIZE=-1>$$copyright</FONT></B>
</BODY>
</HTML>
Figure 44.12 shows how this page looks from the browser.
Most Internet users have visited Yahoo! or one of the other catalogs of the Internet. By using the Netscape Catalog Server, you can build your own catalog--of your site, or of the Internet as a whole. This section describes the Netscape catalog server.
The Catalog server system includes a Resource Description Server, or RDS, which is responsible for exploring a designated portion of the Web and returning results to the Catalog server itself. Each RDS launches one or more robots, which summarize the Web resources (typically HTML pages) that they find based on HTML tags and <META...> tags. Figure 44.13 shows how these components work together.
Use Netscape's default query page as a starting point for building your own queries.
A typical Catalog server configuration includes several RDSs, many robots, and of course, a Catalog server.
The Catalog server uses the Resource Description Format, or RDF, to store its summary objects.
Browsing
Sometimes a user doesn't know enough about a topic or the catalog to build a good query. She may want to explore the catalog first to find out what sort of information is available. You facilitate her exploration by building a taxonomy of the resources in the catalog. Figure 44.14 shows the top level of the sample taxonomy supplied by Netscape.
Netscape supplies a typical corporate taxonomy.
If a subcategory has further categories under it, the Catalog server makes that subcategory a link. The user can click such a link to display the sub-subcategories. If the Retrieve Documents During Navigation check box is selected, the taxonomy shows documents as well as subcategories.
NOTE: Retrieving documents during navigation significantly slows down navigation. Encourage your users to use the browser to navigate to the correct category and then use Search to retrieve relevant documents.
If the user wants to see the entire taxonomy in one window, they can click the View Contents Tree button.
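The linking rule described above (a category becomes a link only when it has subcategories) is easy to picture as a walk over a tree. The sketch below renders one level of a taxonomy that way. The taxonomy contents, the function name, and the /browse URL scheme are all hypothetical; the Catalog server builds its own pages and links.

```python
import html
from urllib.parse import quote

# Hypothetical taxonomy: each category maps to a dict of its subcategories.
TAXONOMY = {
    "Products": {"Hardware": {"Printers": {}, "Monitors": {}}, "Software": {}},
    "Human Resources": {},
}

def render_level(categories, path=""):
    """Render one level: categories with children become links, leaves plain text."""
    lines = ["<UL>"]
    for name in sorted(categories):
        label = html.escape(name)
        if categories[name]:
            # Hypothetical URL scheme, invented for this example.
            href = "/browse?category=" + quote(path + name)
            lines.append(f'<LI><A HREF="{href}">{label}</A>')
        else:
            lines.append(f"<LI>{label}")
    lines.append("</UL>")
    return "\n".join(lines)

TOP_LEVEL = render_level(TAXONOMY)
PRODUCTS = render_level(TAXONOMY["Products"], "Products/")
print(TOP_LEVEL)
print(PRODUCTS)
```

Rendering a single level at a time corresponds to navigating by clicking; rendering every level recursively would correspond to the View Contents Tree button.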
What's New and What's Popular
The user can get the equivalent of a What's New or a What's Cool button by choosing New or Popular from the pop-up View menu. If you're building customized versions of these pages, consider setting the currSearchType parameter to New or Popular to provide such a link directly.
If you've determined that the objectives and goals of your site justify the use of sophisticated packaging such as dynamic HTML and multimedia, use the techniques described in this section to enhance your site.
One advantage of HTML 3.2 and, now, HTML 4.0, is that the HTML standard closely matches the actual practice found in advanced browsers such as Netscape Communicator and Microsoft Internet Explorer 4.0. For example, you can use Cascading Style Sheets to isolate style issues in your page, allowing you to concentrate first on content and then add stylistic embellishments. You can also use the <FRAMESET> tag to divide the browser window into frames; this practice is supported in both Netscape and Microsoft products. Learn more about Cascading Style Sheets in Chapter 17, "Applying Cascading Style Sheets."
Many libraries of Java applets and ActiveX objects are available on the Web, and both Netscape Communicator and Microsoft Internet Explorer support these objects. HTML 4.0 provides a standard interface for including such objects in your pages. If you need a fast way to present certain content in a highly graphical way, consider using objects from one of these libraries.
One of the most impressive ways of delivering content is to use streaming audio or even streaming video. Unlike simple graphics such as GIF and JPEG files, streaming audio and video are not native formats for Web browsers. Users need to install browser plug-ins in order to use streaming audio or video.
ON THE WEB: http://home.netscape.com/comprod/products/navigator/version_2.0/plugins/audio-video.html You can get a current list of audio and video plug-ins for Navigator online from the Netscape site.
Much of the early work in streaming audio was done by Progressive Networks (http://www.real.com/). Its product, RealAudio, supports audio at the quality of a strong AM broadcast station over a 14.4Kbps connection. Over a 28.8Kbps connection, the quality improves to that of FM stereo. ISDN and faster connections support near-CD-quality audio.
Progressive's RealVideo streaming video plug-in delivers newscast-quality video over a 28.8Kbps connection and full-motion video over ISDN or faster connections. Their latest client product, RealPlayer, can deliver both RealAudio and RealVideo in one application.
Another early entry into the streaming video market was VDONet (http://www.vdo.net). VDONet's VDOLive product delivers compressed video even over low-bandwidth connections. A 28.8Kbps dial-up connection can support 10 to 15 frames per second.
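A little arithmetic shows how aggressive the compression must be to reach those frame rates. The sketch below computes the average number of bytes available per frame on a 28.8Kbps link. It assumes that 28.8Kbps means 28,800 bits per second and that the entire link carries video, ignoring audio and protocol overhead, so the real budget is smaller; the function name is invented for the example.

```python
def bytes_per_frame(link_bits_per_second, frames_per_second):
    """Average bytes available per frame if the whole link carried video."""
    return link_bits_per_second / frames_per_second / 8

LINK = 28_800  # assumed: 28.8Kbps = 28,800 bits per second

budget_at_10 = bytes_per_frame(LINK, 10)
budget_at_15 = bytes_per_frame(LINK, 15)
print(budget_at_10, budget_at_15)  # 360.0 240.0
```

At 10 to 15 frames per second, each frame has only a few hundred bytes to work with, which is why streaming video at dial-up speeds relies on small windows and heavy compression.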
Learn more about multimedia in Part IV, "Serving Multimedia Content." In particular, Chapter 27 addresses streaming audio, and Chapter 28 addresses streaming video.
Netscape LiveAudio
Streaming audio is available for Netscape Navigator (a component of Netscape Communicator) through the Netscape Media Player Audio Streaming Plug-in. This plug-in handles media type "audio/x-liveaudio," which is served by the Netscape Media Server.
Apple QuickTime
For years the "gold standard" for video on both Macintosh and Windows computers has been Apple Computer's QuickTime technology. You can use Apple's QuickTime Plug-in to display QuickTime animation, music, MIDI, audio, video, and Virtual Reality inside a Web page. QuickTime's fast-start feature allows the content developer to send streaming content (including video).
ON THE WEB: http://quickTime.apple.com/sam/ Start on this Apple site to visit a variety of commercial sites using QuickTime technology. Visit http://www.digigami.com/moviescreamer/demo.html to see a side-by-side comparison of streaming and nonstreaming video.
Figure 44.15 shows QuickTime in action. MovieScreamer, featured here, is an accelerator that applies fast-start to a QuickTime movie.
See for yourself the dramatic difference fast-start makes.
© Copyright, Macmillan Computer Publishing. All rights reserved.