Contributed by John Lind <john@starfire.MN.ORG>.
Certain Ethernet adapters for ISA PC systems have limitations which can lead to serious network problems, particularly with NFS. This difficulty is not specific to FreeBSD, but FreeBSD systems are affected by it.
The problem nearly always occurs when (FreeBSD) PC systems are networked with high-performance workstations, such as those made by Silicon Graphics, Inc., and Sun Microsystems, Inc. The NFS mount will work fine, and some operations may succeed, but suddenly the server will seem to become unresponsive to the client, even though requests to and from other systems continue to be processed. This happens to the client system, whether the client is the FreeBSD system or the workstation. On many systems, there is no way to shut down the client gracefully once this problem has manifested itself. The only solution is often to reset the client, because the NFS situation cannot be resolved.
Though the ``correct'' solution is to get a higher performance and capacity Ethernet adapter for the FreeBSD system, there is a simple workaround that will allow satisfactory operation. If the FreeBSD system is the server, include the option -w=1024 on the mount from the client. If the FreeBSD system is the client, then mount the NFS file system with the option -r=1024. These options may be specified using the fourth field of the fstab entry on the client for automatic mounts, or by using the -o parameter of the mount command for manual mounts.
It should be noted that there is a different problem, sometimes mistaken for this one, when the NFS servers and clients are on different networks. If that is the case, make certain that your routers are routing the necessary UDP information, or you will not get anywhere, no matter what else you are doing.
In the following examples, fastws is the host (interface) name of a high-performance workstation, and freebox is the host (interface) name of a FreeBSD system with a lower-performance Ethernet adapter. Also, /sharedfs will be the exported NFS filesystem (see man exports), and /project will be the mount point on the client for the exported file system. In all cases, note that additional options, such as hard or soft and bg may be desirable in your application.
Examples for the FreeBSD system (freebox) as the client: in /etc/fstab on freebox:
fastws:/sharedfs /project nfs rw,-r=1024 0 0
As a manual mount command on freebox:
# mount -t nfs -o -r=1024 fastws:/sharedfs /project
Examples for the FreeBSD system as the server: in /etc/fstab on fastws:
freebox:/sharedfs /project nfs rw,-w=1024 0 0
As a manual mount command on fastws:
# mount -t nfs -o -w=1024 freebox:/sharedfs /project
Nearly any 16-bit Ethernet adapter will allow operation without the above restrictions on the read or write size.
For anyone who cares, here is what happens when the failure occurs, which also explains why it is unrecoverable. NFS typically works with a ``block'' size of 8k (though it may do fragments of smaller sizes). Since the maximum Ethernet packet is around 1500 bytes, the NFS ``block'' gets split into multiple Ethernet packets, even though it is still a single unit to the upper-level code, and must be received, assembled, and acknowledged as a unit. The high-performance workstations can pump out the packets which comprise the NFS unit one right after the other, just as close together as the standard allows. On the smaller, lower capacity cards, the later packets overrun the earlier packets of the same unit before they can be transferred to the host and the unit as a whole cannot be reconstructed or acknowledged. As a result, the workstation will time out and try again, but it will try again with the entire 8K unit, and the process will be repeated, ad infinitum.
By keeping the unit size below the Ethernet packet size limitation, we ensure that any complete Ethernet packet received can be acknowledged individually, avoiding the deadlock situation.
Overruns may still occur when a high-performance workstations is slamming data out to a PC system, but with the better cards, such overruns are not guaranteed on NFS ``units''. When an overrun occurs, the units affected will be retransmitted, and there will be a fair chance that they will be received, assembled, and acknowledged.