
Exploring Java

Chapter 12: Working with URLs

12.3 Writing a Content Handler

The URL class's getContent() method invokes a content handler whenever it's called to retrieve an object from a URL. The content handler must read the flat stream of data produced by the URL's protocol handler (the data read from the remote source), and construct a well-defined Java object from it. By "flat," I mean that the data stream the content handler receives has no artifacts left over from retrieving the data and processing the protocol. It's the protocol handler's job to fetch and decode the data before passing it along. The protocol handler's output is your data, pure and simple.

The roles of content and protocol handlers do not overlap. The content handler doesn't care how the data arrives, or what form it takes. It's concerned only with what kind of object it's supposed to create. For example, if a particular protocol involves sending an object over the network in a compressed format, the protocol handler should do whatever is necessary to unpack it before passing the data on to the content handler. The same content handler can then be used again with a completely different protocol handler to construct the same type of object received via a different transport mechanism.

Let's look at an example. The following lines construct a URL that points to a GIF file on an FTP archive and attempt to retrieve its contents:

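try {
    URL url = new URL("ftp://ftp.wustl.edu/graphics/gif/a/apple.gif");
    Image img = (Image)url.getContent();
    // ...
} catch (IOException e) {
    // MalformedURLException is a subclass of IOException
    System.err.println("Couldn't retrieve the image: " + e);
}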

When we construct the URL object, Java looks at the first part of the URL string (i.e., everything prior to the colon) to determine the protocol and locate a protocol handler. In this case, it locates the FTP protocol handler, which is used to open a connection to the host and transfer data for the specified file.

After making the connection, the URL object asks the protocol handler to identify the resource's MIME type. The handler can try to resolve the MIME type through a variety of means, but in this case, it might just look at the filename extension (.gif) and determine that the MIME type of the data is image/gif. Here, "image/gif" is a string that denotes that the content falls into the category of images and is, more specifically, a GIF image. The protocol handler then looks for the content handler responsible for the image/gif type and uses it to construct the right kind of object from the data. The content handler returns an Image object, which getContent() returns to us as an Object. As we've seen before, we cast this Object back to its real type, an Image, so we can work with it.
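Both steps can be observed with standard java.net calls. The following small sketch uses a hypothetical class name, GuessType. It asks a URL for its protocol, then guesses a MIME type from the file name with URLConnection.guessContentTypeFromName(), which consults the JDK's built-in extension table. It assumes a current JDK and opens no network connection. A real protocol handler may also rely on server-supplied headers or on the data itself.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

public class GuessType {
    // Parses a URL string, turning the checked exception into an unchecked one.
    static URL toUrl(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    // The scheme before the colon is what selects the protocol handler.
    static String protocolOf(String spec) {
        return toUrl(spec).getProtocol();
    }

    // Guesses a MIME type from the filename extension only; nothing is fetched.
    static String typeOf(String spec) {
        return URLConnection.guessContentTypeFromName(toUrl(spec).getPath());
    }

    public static void main(String[] args) {
        String spec = "ftp://ftp.wustl.edu/graphics/gif/a/apple.gif";
        System.out.println(protocolOf(spec) + " -> " + typeOf(spec));
    }
}
```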

In an upcoming section, we'll build a simple content handler. To keep things as simple as possible, our example will produce text as output; the URL's getContent() method will return this as a String object.

12.3.1 Locating Content Handlers

When Java searches for a class, it translates package names into filesystem pathnames. This applies to locating content-handler classes as well as other kinds of classes. For example, on a UNIX- or DOS-based system, a class in a package named foo.bar.handlers would live in a directory with foo/bar/handlers/ as part of its pathname. To allow Java to find handler classes for arbitrary new MIME types, content handlers are organized into packages corresponding to the basic MIME type categories. The handler classes themselves are then named after the specific subtype. This allows Java to map MIME types directly to class names. The only remaining piece of information Java needs is a list of packages in which the handlers might reside. To supply this information, set the system property java.content.handler.pkgs (protocol handlers use the corresponding property java.protocol.handler.pkgs). Each property holds a list of package names separated by vertical bars (|).[2]

[2] This method for locating handlers is completely different from the method I described in the first edition. Evidently, my educated guesses about how HotJava would develop weren't good enough. The method I described in the first edition will still work for your own applications.
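To make the search concrete, here is a sketch that turns a list of package prefixes into candidate handler class names and probes each one with Class.forName(). It is a simplified model, not the JDK's actual lookup code, which also falls back to its own built-in handlers. The class HandlerSearch and the second package com.example.handlers are hypothetical. This version assumes a MIME type that needs no character conversion.

```java
import java.util.ArrayList;
import java.util.List;

public class HandlerSearch {
    // Builds candidate class names from a '|'-separated list of package prefixes.
    // Simplified: only replaces '/' with '.', so it assumes a type like "image/gif".
    static List<String> candidates(String pkgList, String mimeType) {
        List<String> names = new ArrayList<>();
        String suffix = mimeType.replace('/', '.');
        for (String pkg : pkgList.split("\\|")) {
            pkg = pkg.trim();
            if (!pkg.isEmpty())
                names.add(pkg + "." + suffix);
        }
        return names;
    }

    // Returns the first candidate class that can be loaded, or null if none exists.
    static Class<?> find(String pkgList, String mimeType) {
        for (String name : candidates(pkgList, mimeType)) {
            try {
                return Class.forName(name);
            } catch (ClassNotFoundException e) {
                // not in this package; try the next prefix
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String pkgs = System.getProperty("java.content.handler.pkgs",
                "exploringjava.contenthandlers|com.example.handlers");
        System.out.println(candidates(pkgs, "image/gif"));
        System.out.println(find(pkgs, "image/gif"));
    }
}
```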

We'll put our content handlers in the exploringjava.contenthandlers package. According to the scheme for naming content handlers, a handler for the image/gif MIME type is called gif and placed in a package called exploringjava.contenthandlers.image. The fully qualified name of the class would then be exploringjava.contenthandlers.image.gif, and it would be located in the file exploringjava/contenthandlers/image/gif.class, somewhere in the local class path or, perhaps someday, on a server. Likewise, a content handler for the video/mpeg MIME type would be called mpeg, and an mpeg.class file would be located in an exploringjava/contenthandlers/video/ directory somewhere in the class path.

Many MIME type names include a dash (-), which is illegal in a class name. You should convert dashes and other illegal characters into underscores (_) when building Java class and package names. Also note that there are no capital letters in the class names. This violates the coding convention used in most Java source files, in which class names start with capital letters. However, capitalization is not significant in MIME type names, so it's simpler to name the handler classes accordingly.
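The conversion rules above are easy to capture in a helper. This sketch uses a hypothetical class, MimeToClass. It lowercases the type, turns the slash into a package separator, and replaces every character other than an ASCII letter or digit with an underscore. It models the rule as described here, and the JDK's internal conversion may differ in details. It does not handle a subtype that begins with a digit, which would still be an illegal class name.

```java
import java.util.Locale;

public class MimeToClass {
    // Lowercase the type, map '/' to '.', and map any character other than
    // an ASCII letter or digit (such as '-' or '+') to '_'.
    static String toClassSuffix(String mimeType) {
        StringBuilder sb = new StringBuilder();
        for (char c : mimeType.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '/')
                sb.append('.');
            else if ((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9'))
                sb.append(c);
            else
                sb.append('_');
        }
        return sb.toString();
    }

    // Fully qualified handler class name under a given package prefix.
    static String handlerClassName(String pkgPrefix, String mimeType) {
        return pkgPrefix + "." + toClassSuffix(mimeType);
    }

    public static void main(String[] args) {
        for (String type : new String[] { "image/gif", "video/mpeg", "application/x-tar" })
            System.out.println(handlerClassName("exploringjava.contenthandlers", type));
    }
}
```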

12.3.2 The application/x-tar Handler

In this section, we'll build a simple content handler that reads and interprets tar (tape archive) files. tar is an archival format widely used in the UNIX world to hold collections of files, along with their basic type and attribute information.[3] A tar file is similar to a ZIP file, except that it's not compressed. Files in the archive are stored sequentially, as plain text or binary data with no special encoding. In practice, tar files are usually compressed for storage using an application like UNIX compress or GNU gzip and then given a filename extension like .tar.gz or .tgz.

[3] There are several slightly different versions of the tar format. This content handler understands the most widely used variant.

Most Web browsers, upon retrieving a tar file, prompt the user with a File Save dialog. The assumption is that if you are retrieving an archive, you probably want to save it for later unpacking and use. We would like to implement a tar content handler that allows an application to read the contents of the archive and give us a listing of the files that it contains. In itself, this would not be the most useful thing in the world, because we would be left with the dilemma of how to get at the archive's contents. However, a more complete implementation of our content handler, used in conjunction with an application like a Web browser, could generate HTML output or pop up a dialog that lets us select and save individual files within the archive.

Some code that fetches a .tar file and lists its contents might look like this:

try {
    URL listing =
        new URL("http://somewhere.an.edu/lynx/lynx2html.tar");
    String s = (String)listing.getContent();
    System.out.println( s );
    // ...
} catch (IOException e) {
    System.err.println("Couldn't retrieve the listing: " + e);
}

Our handler will produce a listing similar to the UNIX tar application's output:

Tape Archive Listing: 
      
0     Tue Sep 28 18:12:47 CDT 1993 lynx2html/ 
14773 Tue Sep 28 18:01:55 CDT 1993 lynx2html/lynx2html.c 
470   Tue Sep 28 18:13:24 CDT 1993 lynx2html/Makefile 
172   Thu Apr 01 15:05:43 CST 1993 lynx2html/lynxgate 
3656  Wed Mar 03 15:40:20 CST 1993 lynx2html/install.csh 
490   Thu Apr 01 14:55:04 CST 1993 lynx2html/new_globals.c 
... 

Our handler will dissect the file to read the contents and generate the listing. The URL's getContent() method will return that information to an application as a String object.

First we must decide what to call our content handler and where to put it. The tar format falls under the MIME application type, and because its subtype is not a registered standard, its name carries the x- prefix used for nonstandard ("extension") types. Its MIME type is therefore application/x-tar. Our handler belongs in the exploringjava.contenthandlers.application package and goes into the class file exploringjava/contenthandlers/application/x_tar.class. Note that the name of our class is x_tar, rather than x-tar. As mentioned earlier, the dash is illegal in a class name, so by convention we convert it to an underscore.

Here's the code for the content handler; compile it and put it in exploringjava/contenthandlers/application/, somewhere in your class path:

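package exploringjava.contenthandlers.application;

import java.net.*;
import java.io.*;
import java.util.Date;

public class x_tar extends ContentHandler {
    // Layout of a tar header record: field offsets and lengths, in bytes
    static final int
        RECORDLEN = 512,
        NAMEOFF = 0, NAMELEN = 100,
        SIZEOFF = 124, SIZELEN = 12,
        MTIMEOFF = 136, MTIMELEN = 12;

    public Object getContent(URLConnection uc) throws IOException {
        InputStream is = uc.getInputStream();
        StringBuffer output = new StringBuffer( "Tape Archive Listing:\n\n" );
        byte [] header = new byte[RECORDLEN];
        long count = 0;   // offset into the archive

        // Stop at end of stream or at an empty (end-of-archive) header record
        while ( (readRecord(is, header) == RECORDLEN) && (header[NAMEOFF] != 0) ) {

            // Fields are ASCII padded with spaces or NULs; numbers are octal
            String name = new String(header, NAMEOFF, NAMELEN, "8859_1").trim();
            String s = new String(header, SIZEOFF, SIZELEN, "8859_1").trim();
            long size = Long.parseLong(s, 8);
            s = new String(header, MTIMEOFF, MTIMELEN, "8859_1").trim();
            long l = Long.parseLong(s, 8);       // seconds since the epoch
            Date mtime = new Date( l*1000 );

            output.append( size + " " + mtime + " " + name + "\n" );

            // Skip the file's data, then the padding up to the next record boundary
            count += skipFully( is, size ) + RECORDLEN;
            if ( count % RECORDLEN != 0 )
                count += skipFully( is, RECORDLEN - count % RECORDLEN );
        }

        if ( count == 0 )
            output.append("Not a valid TAR file\n");

        return( output.toString() );
    }

    // read() may return fewer bytes than requested on a network stream,
    // so keep reading until the record is full or the stream ends
    static int readRecord( InputStream is, byte [] buf ) throws IOException {
        int n = 0;
        while ( n < buf.length ) {
            int r = is.read( buf, n, buf.length - n );
            if ( r < 0 )
                break;
            n += r;
        }
        return n;
    }

    // skip() may also stop early; fall back to read() to detect end of stream
    static long skipFully( InputStream is, long n ) throws IOException {
        long skipped = 0;
        while ( skipped < n ) {
            long r = is.skip( n - skipped );
            if ( r <= 0 ) {
                if ( is.read() < 0 )
                    break;
                r = 1;
            }
            skipped += r;
        }
        return skipped;
    }
}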

The ContentHandler class

Our x_tar handler is a subclass of the abstract class java.net.ContentHandler. Its job is to implement one method: getContent(), which takes as an argument a special "protocol connection" object and returns a constructed Java Object. The getContent() method of the URL class ultimately uses this getContent() method when we ask for the contents of the URL.

The code looks formidable, but most of it deals with the details of the tar format. If we remove these details, there isn't much left:

import java.net.*;
import java.io.*;

public class x_tar extends ContentHandler {

    public Object getContent( URLConnection uc ) throws IOException {
        // get input stream
        InputStream is = uc.getInputStream();

        // read stream and construct object
        StringBuffer output = new StringBuffer();
        // ...

        // return the constructed object
        return( output.toString() );
    }
}

That's really all there is to a content handler; it's relatively simple.
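To see that shape in a complete, runnable form, here is a hypothetical handler that simply returns its data as a String. It is exercised through a stand-in URLConnection that serves bytes from memory. The names PlainDemo and PlainTextHandler, the example.invalid URL, and the in-memory connection are all assumptions for illustration. A handler installed for real would be found through the naming scheme and would receive its connection from a protocol handler. The sketch assumes Java 9 or later for InputStream.readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.ContentHandler;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class PlainDemo {
    // Hypothetical text/plain handler. To be found by the lookup it would need
    // to be named "plain" in a ".text" subpackage; here it is called directly.
    static class PlainTextHandler extends ContentHandler {
        @Override
        public Object getContent(URLConnection uc) throws IOException {
            try (InputStream is = uc.getInputStream()) {
                return new String(is.readAllBytes(), StandardCharsets.ISO_8859_1);
            }
        }
    }

    // A stand-in connection that serves bytes from memory instead of the network.
    static URLConnection fakeConnection(URL url, byte[] data) {
        return new URLConnection(url) {
            @Override
            public void connect() { connected = true; }
            @Override
            public InputStream getInputStream() { return new ByteArrayInputStream(data); }
        };
    }

    // Runs the handler over the given text and returns the object it builds.
    static String run(String text) {
        try {
            URL url = URI.create("http://example.invalid/readme.txt").toURL();
            byte[] data = text.getBytes(StandardCharsets.ISO_8859_1);
            return (String) new PlainTextHandler().getContent(fakeConnection(url, data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("That's really all there is to it."));
    }
}
```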

The URLConnection

The java.net.URLConnection object that getContent() receives represents the protocol handler's connection to the remote resource. It provides a number of methods for examining information about the URL resource, such as header and type fields, and for determining the kinds of operations the protocol supports. However, its most important method is getInputStream(), which returns an InputStream from the protocol handler. Reading this InputStream gives you the raw data for the object the URL addresses. In our case, reading the InputStream feeds x_tar the bytes of the tar file it's to process.
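The following sketch opens a URLConnection on a local temporary file and reads it the way a content handler would. The class name ReadConnection and the scratch file are hypothetical, and the sketch assumes Java 9 or later. The content type and length it prints come from the JDK's file: protocol handler, so their exact values may vary by platform.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadConnection {
    // Reads a resource the way a content handler does: ask the connection for
    // its metadata, then take the raw bytes from getInputStream().
    static byte[] readAll(URLConnection uc) throws IOException {
        System.out.println("content type:   " + uc.getContentType());
        System.out.println("content length: " + uc.getContentLength());
        try (InputStream is = uc.getInputStream()) {
            return is.readAllBytes();
        }
    }

    // Writes data to a scratch file, reads it back through a file: URL, then deletes it.
    static byte[] roundTrip(byte[] data) {
        try {
            Path tmp = Files.createTempFile("demo", ".tar");
            try {
                Files.write(tmp, data);
                return readAll(tmp.toUri().toURL().openConnection());
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two zero-filled 512-byte records: an empty tar archive
        byte[] data = roundTrip(new byte[1024]);
        System.out.println("read " + data.length + " bytes");
    }
}
```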

Constructing the object

The majority of our getContent() method is devoted to interpreting the stream of bytes of the tar file and building our output object: the String that lists the contents of the tar file. Much of this work involves the particulars of the tar format, so don't fret too much about the details.

After requesting an InputStream from the URLConnection, x_tar loops, gathering information about each file. Each archived item is preceded by a header that contains attribute and length fields. x_tar interprets each header and then skips over the remaining portion of the item. To parse the header, we use the String constructor to read a fixed number of characters from the byte array header[]. To convert these bytes into a Java String properly, we specify an explicit character encoding: 8859_1 (ISO Latin-1, the default character set for text on the Web). This encoding maps each byte to exactly one character and is identical to ASCII for the first 128 values. Once we have a file's name, size, and time stamp, we accumulate the results (the file listings) in a StringBuffer, one line per file. When the listing is complete, getContent() returns the contents of the StringBuffer as a String object.
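The field extraction can be tried on its own. This sketch builds an otherwise empty 512-byte header in memory and decodes the name field the same way x_tar does. The helper headerWithName() is hypothetical and exists only for the demonstration. StandardCharsets.ISO_8859_1 is the same encoding x_tar names with the string "8859_1". trim() removes the trailing NUL padding because it strips every character at or below U+0020.

```java
import java.nio.charset.StandardCharsets;

public class HeaderField {
    static final int RECORDLEN = 512, NAMEOFF = 0, NAMELEN = 100;

    // Decodes a fixed-width text field and drops its NUL or space padding.
    static String field(byte[] header, int off, int len) {
        return new String(header, off, len, StandardCharsets.ISO_8859_1).trim();
    }

    // Builds a header record that holds only a file name (demo helper).
    static byte[] headerWithName(String name) {
        byte[] header = new byte[RECORDLEN];              // starts out all NULs
        byte[] b = name.getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(b, 0, header, NAMEOFF, Math.min(b.length, NAMELEN));
        return header;
    }

    public static void main(String[] args) {
        byte[] header = headerWithName("lynx2html/Makefile");
        System.out.println("[" + field(header, NAMEOFF, NAMELEN) + "]");
        // prints [lynx2html/Makefile]
    }
}
```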

The main while loop continues as long as it's able to read another complete header record, and as long as the first byte of the record's "name" field isn't an ASCII null. (The tar format marks the end of an archive with zero-filled records, although some implementations omit them, so the loop also stops when the stream runs out.) The while loop retrieves the name, size, and modification time as character strings from fields in the header. The most common tar format stores its numeric values in octal, as fixed-length ASCII strings, so we extract the strings and parse them as base-8 numbers.
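Here is a small sketch of the octal fields in isolation. The helper putOctal() is hypothetical and exists only to build a test header. It writes 11 octal digits and a terminating NUL, which is one common layout for these 12-byte fields. For instance, the 14773-byte file in the sample listing would have its size stored as the string 00000034665.

```java
import java.nio.charset.StandardCharsets;
import java.util.Date;

public class OctalFields {
    static final int RECORDLEN = 512, SIZEOFF = 124, SIZELEN = 12,
                     MTIMEOFF = 136, MTIMELEN = 12;

    // Parses a fixed-length octal ASCII field, ignoring NUL or space padding.
    static long octalField(byte[] header, int off, int len) {
        String s = new String(header, off, len, StandardCharsets.ISO_8859_1).trim();
        return Long.parseLong(s, 8);
    }

    // Stores value as 11 zero-padded octal digits plus a NUL (demo helper).
    static void putOctal(byte[] header, int off, long value) {
        byte[] digits = String.format("%011o", value).getBytes(StandardCharsets.ISO_8859_1);
        System.arraycopy(digits, 0, header, off, digits.length);
        header[off + digits.length] = 0;
    }

    public static void main(String[] args) {
        byte[] header = new byte[RECORDLEN];
        putOctal(header, SIZEOFF, 14773);        // stored as "00000034665"
        putOctal(header, MTIMEOFF, 749171567);   // an arbitrary time stamp, in seconds
        long size = octalField(header, SIZEOFF, SIZELEN);
        Date mtime = new Date(octalField(header, MTIMEOFF, MTIMELEN) * 1000);
        System.out.println(size + " " + mtime);
    }
}
```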

After reading and parsing the header, x_tar skips over the data portion of the file and updates the variable count, which keeps track of the offset into the archive. The two lines following the initial skip account for tar's "blocking" of the data records. In other words, if the data portion of a file doesn't fit precisely into an integral number of blocks of RECORDLEN bytes, tar adds padding to make it fit.
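The padding arithmetic amounts to rounding the data size up to the next multiple of RECORDLEN. This sketch uses a hypothetical class name, TarPadding. It computes the padded size directly instead of tracking a running offset as x_tar does. For the 14773-byte file in the sample listing, the data occupies 29 records, or 14848 bytes, of which 75 are padding.

```java
public class TarPadding {
    static final int RECORDLEN = 512;

    // Bytes a file's data occupies once padded to a whole number of records.
    static long paddedLength(long size) {
        return (size + RECORDLEN - 1) / RECORDLEN * RECORDLEN;
    }

    // Offset of the next header, given the current header's offset and the file size.
    static long nextHeader(long headerOffset, long size) {
        return headerOffset + RECORDLEN + paddedLength(size);
    }

    public static void main(String[] args) {
        long size = 14773;   // from the sample listing
        System.out.println(paddedLength(size) + " bytes, " + (paddedLength(size) - size) + " of padding");
        // prints 14848 bytes, 75 of padding
    }
}
```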

Whew. Well, as I said, the details of parsing tar files are not really our main concern here. But x_tar does illustrate a few tricks of data manipulation in Java.

It may surprise you that we didn't have to provide a constructor; our content handler relies on its default constructor. We don't need to provide a constructor because there isn't anything for it to do. Java doesn't pass the class any argument information when it creates an instance of it. You might suspect that the URLConnection object would be a natural thing to provide at that point. However, when you are calling the constructor of a class that is loaded at run-time, you can't easily pass it any arguments.
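The constraint is easy to demonstrate. When all a loader has is a class name, the usual way to create an instance is through reflection, and the simplest form of that calls the no-argument constructor. This sketch uses a hypothetical nested class, DemoHandler. The JDK's actual handler-loading code differs in detail, but it likewise creates the handler without passing it any arguments.

```java
public class LoadByName {
    // Stand-in for a dynamically discovered handler; it has only the implicit no-arg constructor.
    public static class DemoHandler {
        public String describe() { return "created with no arguments"; }
    }

    // Instantiates a class knowing only its name, as a handler loader must.
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Object h = instantiate(LoadByName.class.getName() + "$DemoHandler");
        System.out.println(((DemoHandler) h).describe());
        // prints created with no arguments
    }
}
```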

Using our new handler

When we began this discussion of content handlers, we showed a brief example of how our x_tar content handler would work for us. You can try that code snippet now with your favorite tar file. Set the java.content.handler.pkgs system property to exploringjava.contenthandlers (for example, with -Djava.content.handler.pkgs=exploringjava.contenthandlers on the java command line) and make sure that package is in your class path.

To make things more exciting, try setting the property in your HotJava properties file. (The HotJava properties file usually resides in a .hotjava directory in your home directory or in the HotJava installation directory on a Windows machine.) Make sure that the class path is set before you start HotJava. Once HotJava is running, go to the Preferences menu, and select Viewer Applications. Find the type TAR archive, and set its Action to View in HotJava. This tells HotJava to try to use a content handler to display the data in the browser. Now, drive HotJava to a URL that contains a tar file. The result should look something like that shown in Figure 12.3.

Figure 12.3: Using a content handler to display data in a browser


We've just extended our copy of HotJava to understand tar files! In the next section, we'll turn the tables and look at protocol handlers. There we'll be building URLConnection objects; someone else will have the pleasure of reconstituting the data.

