UNIX Unleashed, Internet Edition
- 16 -
MIME--Multipurpose Internet Mail Extensions
by Robin Burk
HTML underlies the World Wide Web, but it is only one of a number of standard data types whose definition makes the Web possible. In this chapter, we'll look at the broader set of data formats used by Web and Internet programs to bridge the gaps between diverse operating systems and hardware platforms.
The topics covered in this chapter include:
How MIME Became an Internet Standard
MIME (Multipurpose Internet Mail Extensions) is one of the Internet protocol standards defined by the Internet Engineering Task Force (IETF). Once associated primarily with electronic mail, MIME has evolved to become an important element supporting multimedia applications on the Net. In order to understand MIME and how it operates, it's helpful to step back and see how it got to where it is today.
How Internet Standards Are Adopted
The IETF is the official body that proposes and adopts communications protocols, data formats, and similar conventions to be supported by the public Internet. For instance, all of the familiar Internet communications protocols, such as TCP, IP, PPP, and SLIP, are formally defined by IETF documents called Requests for Comments (RFCs). The IETF also defines the Simple Mail Transfer Protocol (SMTP), the Network Time Protocol (NTP), and newer multimedia protocols such as the Resource ReSerVation Protocol (RSVP) and the Real-time Transport Protocol (RTP) that support interactive conferencing over the Net.
Not all RFCs adopted by the IETF become Internet standards. Those that are proposed for the standards track often begin as Internet Drafts submitted by one or more people from industry or academia. Internet Drafts must advance to RFC status within six months of publication or they are removed from consideration.
Once advanced to RFC status, a proposed protocol is open for comment and can be superseded by a revised version based on feedback from the technical community. Any interested party can participate in the discussion, either online or at face-to-face meetings. Each RFC is shepherded and debated within a specific Working Group of the IETF. The Working Groups meet from time to time to hammer out the details of proposed protocols.
Some RFCs are not intended for adoption as Internet standards. A few contain comments or information about a given technical scenario or about the standards process itself. Other informational RFCs do define protocols in detail, but are not proposed for adoption as standards because they were developed by a single company that chooses to retain control of their evolution. The RFCs that describe successive versions of Sun's Network File System fall into this category. By publishing the definition of the NFS protocol, Sun allows and encourages other vendors to support NFS in their own operating systems. In this way NFS has become a de facto, but not official, Internet standard.
Finally, some RFCs are designated as experimental (available for limited implementation to evaluate their effectiveness) or historical (once in use, now effectively replaced by an alternative protocol).
Official standards are not necessarily required to be adopted by all Internet server or client systems. A standard may fall into any of several categories:
The InterNIC Web site contains links to online copies of the RFCs, Internet Drafts, and other Internet-related information. Point your browser to http://www.internic.net/ds/ for the main site and to http://ds.internic.net/ds/dspg0intdoc.html to search for specific topics in the RFC database.
History of MIME
As its name suggests, MIME originally was associated with electronic mail transmission over the Internet.
The core standards for Internet e-mail are defined in RFC 821 "Simple Mail Transfer Protocol" and RFC 822 "Standard for the Format of ARPA Internet Text Messages". Together, these documents define a common format for e-mail encoded as U.S. ASCII characters.
Within the original ARPANET, a single, text-oriented e-mail standard was practical and appropriate. Over time, however, the ARPANET underwent several significant changes, among them a transition from its original home in the Department of Defense to become the public Internet, which in turn now supports the World Wide Web and attracts truly global use.
As the scope of the public internetwork expanded, it became useful to define ways for e-mail to be exchanged across the Net without requiring non-ASCII systems to convert all message character sets. Non-U.S. ASCII e-mail traveling over the Internet is analogous to letters written in French or Chinese being sent through the U.S. Postal Service. All that is required is that the letter be enclosed within an envelope that carries the standard addressing information in a form readable to the Postal Service's employees and scanning machines.
In addition, users often wanted to attach files of various formats and origins to their e-mail messages, much as the writer of a letter might include a newspaper clipping, photograph, or check in the letter's envelope. Potential e-mail attachments might be the output of standard applications such as word processors and spreadsheets, or might consist of binary executable files, graphical images, or even data files from custom applications.
MIME was intended to support both of these scenarios. At its most fundamental, MIME encodes e-mail messages into standard formats beyond the ASCII text format defined in the original ARPANET protocols.
By extending these formats to include multi-part messages, MIME allows e-mail messages to have attached files in a variety of formats. Prior to the adoption of the MIME protocols, users on diverse systems (and often on similar systems) could not easily pass non-text information along with their e-mail.
The MIME protocol provides both a list of currently defined message types and a mechanism for adding new formats over time. This means that MIME can evolve to support new multimedia formats, application file types, languages, character sets, and other data types as they become widespread or otherwise useful within the Internet's technical environment. It is this breadth of scope, and its open-ended nature, that places MIME in the category of "elective" rather than "recommended" or "required" Internet protocols.
MIME data type definitions soon found uses beyond e-mail. When the founders of the World Wide Web created a hypertext capability, they found it easy to use the MIME framework to define a new hypertext data type to specify HTML scripts. And when the language rules for HTML were written, the authors found it easy to allow graphics to be embedded in Web pages because MIME had already centralized the definition of graphical image formats.
Today there are MIME formats for audio, video, ZIPped, and vendor-specific data types. MIME even provides a way to name a data type for which no official IANA recognition has yet occurred. This allows software vendors to create optimized or specialized formats that, if they achieve widespread adoption, are then likely to be added to the official list. Developers of browser clients and browser plug-ins have made extensive use of this capability. In this way, MIME plays a critical role in the rapid evolution of both the World Wide Web and of the wider use of multimedia in computing. All this from what started as humble extensions to ASCII e-mail messages!
The MIME Data Type Scheme
For many years, the core MIME documents were RFCs 1521 and 1522. In November 1996, however, a new series of MIME standards was proposed in RFCs 2045 through 2049. These documents reflect the great variety of data types that had evolved, especially for multimedia applications, since the original MIME definitions were established.
RFC 2046 outlines the media types that are supported by MIME. More accurately, this RFC outlines the categories into which such data types can be placed.
The first distinction to be made is between discrete media and composite media. Discrete media contain a single entity or data object. An entity consists of a MIME header and either the contents of a message or one of the parts of a multi-part message. MIME treats discrete media as opaque objects that are passed on to the receiving application without interpretation or other processing.
Composite media contain multiple entities, which can be of the same or different types. Composite media require MIME processing to correctly handle the various entities being transmitted together.
MIME defines top-level media types, which are used to specify the general type of data, and subtypes, which typically specify a particular format for that type of data. New top-level media types and lower-level subtypes may be added as needed. The definition of a top-level media type includes the following:
There are five discrete top-level media types initially defined in the new MIME scheme. These are text, image, audio, video, and application.
The two top-level composite media types are multipart and message.
MIME types that are not recognized by IANA are given names that start with "x-". For instance, the MPEG layer-2 format for audio information, which is associated with the file extension .mp2, is mapped to the MIME type "audio/x-mpeg". Officially recognized MIME types are generally supported by the relevant server and client software, but private or experimental types may require explicit configuration at both the Internet server and the client workstation in order to be processed correctly.
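The naming scheme described above can be illustrated with a short Python sketch. The function name `describe_media_type` and the dictionary it returns are hypothetical and exist only for this illustration; the sets of top-level names follow the discrete and composite categories of RFC 2046.

```python
# Top-level media types, grouped as RFC 2046 groups them.
DISCRETE = {"text", "image", "audio", "video", "application"}
COMPOSITE = {"multipart", "message"}


def describe_media_type(media_type):
    """Split a 'type/subtype' name and report its category.

    Hypothetical helper for illustration; any parameters after a
    semicolon (such as charset=US-ASCII) are ignored.
    """
    name = media_type.split(";", 1)[0].strip().lower()
    top, _, subtype = name.partition("/")
    if not top or not subtype:
        raise ValueError(f"not a type/subtype pair: {media_type!r}")

    if top in DISCRETE:
        category = "discrete"
    elif top in COMPOSITE:
        category = "composite"
    else:
        category = "unknown"

    # Names beginning with "x-" mark types not registered with IANA.
    private = top.startswith("x-") or subtype.startswith("x-")
    return {"type": top, "subtype": subtype,
            "category": category, "private": private}


print(describe_media_type("audio/x-mpeg"))
print(describe_media_type("multipart/mixed"))
```

Under these assumptions, "audio/x-mpeg" is reported as a discrete, private type, while "multipart/mixed" is reported as a composite, registered one.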
Common MIME Data Types
Although the top-level MIME media types correspond to basic concepts that all users would understand, not all subtypes fall under the obvious media category. Those that are associated with specific application software, for instance, may be classified as application types rather than text, image, or audio, despite being widely available over the Internet. Often these data types require a browser plug-in before their contents will be correctly processed when visiting a Web site, or the client browser might ask you to specify which application is associated with that subtype or file extension.
Because the official status of data types is changing rapidly, especially with the rapid expansion of multimedia applications, I've grouped these descriptions by the intuitive categories to which they belong rather than their official status. Each data format description that follows includes the common format name and current MIME name, the file extension(s) associated with the media, and a brief description.
Table 16.1 lists the most common MIME text types.
Table 16.1. Text types commonly found on the Internet.
Table 16.2 lists the most common MIME image types.
Table 16.2. Image types commonly found on the Internet.
Table 16.3 lists the most common MIME audio types.
Table 16.3. Audio types commonly found on the Internet.
Table 16.4 lists the most common MIME video types.
Table 16.4. Video types commonly found on the Internet.
Table 16.5 lists the most common MIME application types.
Table 16.5. Application types commonly found on the Internet.
Note that there are many other application types that can be sent over the Internet as file attachments to e-mail. Spreadsheet and word processor files are the most common, along with the output of presentation software. E-mail clients, browsers, and similar software that receives such formats will simply store the data in a disk file unless configured to map the file extension or private MIME type to a specific executable for processing.
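The fallback behavior described here, in which data is stored on disk unless a handler has been configured, might be sketched in Python as follows. The `HELPERS` table, the handler names, and the function `choose_action` are assumptions made for illustration; only `mimetypes.guess_type` comes from the Python standard library.

```python
import mimetypes

# Hypothetical helper table: MIME type -> name of the program to launch.
HELPERS = {
    "text/html": "browser",
    "image/gif": "image-viewer",
}


def choose_action(filename, helpers=HELPERS):
    """Return the configured helper for a file, or 'save-to-disk'.

    The MIME type is guessed from the file extension alone, much as
    a mail client or browser would do when no type is supplied.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    # An unknown extension yields None, which is never in the table.
    return helpers.get(mime_type, "save-to-disk")


for name in ("index.html", "logo.gif", "budget.zzz_unknown"):
    print(name, "->", choose_action(name))
```

A real client would also consult the Content-Type supplied with the data; the sketch relies on the extension only to keep the lookup short.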
One important subcategory of the application media type is the variety of compression schemes applied to general files. (Note that many audio, image, and video formats include standard compression/decompression that is automatically applied when the data is processed.) Table 16.6 lists the most common compression types and their public or private MIME names.
Table 16.6. Compression types commonly found on the Internet.
Multipart and Message Types
These MIME formats are primarily used for e-mail messages with multiple parts and are manipulated by e-mail server and client software.
Listing 16.1 shows a compound e-mail message, which includes the text of a message received earlier, the sender's response, and an attached file. Each element of this message has its own MIME format and is a separate entity within the compound message.
Listing 16.1. MIME supports compound e-mail messages.
X-POP3-Rcpt: email@example.com
Return-Path: firstname.lastname@example.org
From: email@example.com
Date: Mon, 2 Jun 1997 10:29:20 -0500
Subject: example of forwarding a compound email message
To: firstname.lastname@example.org
Content-Description: cc:Mail note part
Here's my reply, which quotes the original message in full.
-----Original Message-----
From: email@example.com
Sent: Friday, 30 May 1997 11:40:00
To: firstname.lastname@example.org
Subject: here's an original message with attachments
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Description: cc:Mail note part
Attached are two files in different formats.
<<File: unx.ini>>S<<File: global95.dot>>u<<File: global97.dot>>b
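To see how a compound message like the one in Listing 16.1 is assembled from separate entities, here is a minimal Python sketch using the standard `email` package. The addresses, the body text, and the attachment contents are made up for illustration, and `build_compound_message` is a hypothetical name.

```python
from email.message import EmailMessage


def build_compound_message():
    """Build a message with a text body and one attached file."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.org"
    msg["Subject"] = "example of a compound email message"

    # The body becomes a text/plain entity.
    msg.set_content("Attached is one file.")

    # Adding an attachment turns the message into multipart/mixed,
    # with the body and the file as separate entities inside it.
    msg.add_attachment(b"[settings]\nmode=demo\n",
                       maintype="application",
                       subtype="octet-stream",
                       filename="unx.ini")
    return msg


message = build_compound_message()
for part in message.walk():
    # Each entity carries its own MIME type.
    print(part.get_content_type())
```

Walking the message visits the multipart container first and then each entity inside it, so the loop prints one MIME type per entity.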
Web Pages, Web Servers, and MIME
A Web server is the software, running on a server system, that provides file, Internet, and World Wide Web access to client workstations.
A variety of commercial and shareware Web servers are available. In most cases, the operating system of choice for server systems is one or more flavors of UNIX.
The primary job of a Web server is to transmit the HTML scripts that make up a World Wide Web page. The client's browser software then interprets the HTML script and displays the Web page contents on the client system's monitor.
Along with the text whose presentation is specified by the HTML script, a Web page may contain images or other multimedia content stored in separate files on the server machine. The client browser will issue requests to the server each time it finds a tag referring to such a file. The Web server software must find the file, encode it appropriately (using standard MIME schemes) so that the integrity of the transmitted information can be verified, and send the file off to the client machine. At the client, the browser then decodes the information and displays it, plays it over the speaker, or otherwise presents it as part of the Web page.
Web pages may also make use of Common Gateway Interface calls. CGI provides a way for HTML scripts to exchange information with other applications running on the server system. These most commonly are database applications accessed by HTML forms; however, the Web page may contain server-side HTML logic, which causes the server itself to take different actions depending on what has come before. A Web page form may ask the user to specify whether or not his browser can support frames, for instance. If the user says it does not, the server will then present a non-framed version of the Web page to the user at his workstation. The Web server software is responsible for processing server-side HTML logic.
In each of these cases, MIME data types are at work. HTML itself is a MIME text type, and the common image, audio, and video formats for Web page multimedia content each have MIME types of their own. Even application and private data types must be encoded properly to protect against transmission errors, and MIME defines appropriate encoding schemes for this purpose.
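One of the encoding schemes that MIME defines for carrying binary data over text-only channels such as e-mail is base64. The Python sketch below shows the round trip; the function names are hypothetical, and the line wrapping that MIME applies to base64 output is omitted for brevity.

```python
import base64


def encode_for_transport(data: bytes) -> str:
    """Represent arbitrary bytes using only ASCII characters."""
    return base64.b64encode(data).decode("ascii")


def decode_from_transport(text: str) -> bytes:
    """Recover the original bytes from their base64 form."""
    return base64.b64decode(text)


raw = bytes(range(256))           # every possible byte value
encoded = encode_for_transport(raw)
assert encoded.isascii()          # safe for a 7-bit text channel
assert decode_from_transport(encoded) == raw
print(encode_for_transport(b"MIME"))
```

The cost of this safety is size: every three bytes of input become four ASCII characters of output.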
Many servers and browsers come pre-configured to recognize the standard MIME data types. Some standard types, and all private types, must be defined to the server and browser software before they can be correctly processed.
To configure a MIME data type in the Netscape Navigator browser (version 3.01), for instance, select Options, then General Preferences, and then Helpers from the menu tree. Figure 16.1 shows how the Helpers screen allows you to create associations between MIME types, file extensions, and the actions to be taken when such a data object is received.
Each Web server has its own way of configuring MIME types. Typically, this is done by means of a configuration file read when the server process is created. The Apache Web server, included on the CD-ROM for this book, looks for its configuration files in /usr/local/httpd/conf unless told that the configuration files are located elsewhere. The basic server configuration file httpd.conf and the server resource map srm.conf tell the server which MIME data types are legal and how to process the various data contents. For more information, see the Apache documentation on the CD-ROM or online at http://www.apache.org/docs/.
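As an illustration of how a server might read such a configuration file, the Python sketch below parses lines in the style of Apache's `AddType` directive, which takes a MIME type followed by one or more file extensions. The sample text and the function `parse_addtype_lines` are assumptions made for this example, not an excerpt from any real configuration or from Apache itself.

```python
SAMPLE_CONFIG = """\
# Private audio type for MPEG layer-2 files
AddType audio/x-mpeg .mp2
AddType application/x-tar .tgz .tar
"""


def parse_addtype_lines(config_text):
    """Return a dict mapping file extensions to MIME types.

    Hypothetical parser: only AddType lines are recognized; blank
    lines, comment lines, and other directives are skipped.
    """
    mapping = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0].lower() == "addtype":
            mime_type = fields[1]
            for ext in fields[2:]:
                mapping[ext.lstrip(".").lower()] = mime_type
    return mapping


print(parse_addtype_lines(SAMPLE_CONFIG))
```

Storing extensions without the leading dot and in lower case keeps later lookups simple, whatever form the configuration author used.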
In this chapter we've taken a brief look at the data format standards that allow diverse hardware and software platforms to exchange data across the Internet and the Web. Understanding how the MIME standard was established, what data formats it covers, and how it is used by Web pages and Web servers can help you correctly configure e-mail and browser software for yourself and your system's users.
An extensible Internet standard, MIME is a fundamental enabling technology for Internet e-mail, the World Wide Web, and most networked multimedia applications.