Network Programming with Perl

Preface




The network is everywhere. At the office, machines are wired together into local area networks and the local networks are interconnected via the Internet. At home, personal computers are intermittently connected to the Internet via dial-up links or via ``always on'' cable and DSL modems. New wireless technologies such as Bluetooth promise to vastly expand the network realm, embracing everything from cell phones to kitchen appliances.

Such an environment creates tremendous opportunities for innovation. Whole new classes of application are now predicated on the availability of high-bandwidth, always-on connectivity. Interactive games allow players from across the globe to compete on virtual playing fields and the instant messaging protocols let them broadcast news of their triumphs to their friends. New peer-to-peer systems such as Napster and Gnutella allow people to directly exchange MP3 audio files and other types of digital content. The SETI@Home project takes advantage of idle time on the millions of personal computers around the world to search for signs of extraterrestrial life in a vast collection of cosmic noise.

The ubiquity of the network allows for more earthbound applications as well: with the right knowledge you can write a robot that will fetch and summarize prices from competitors' web sites, a script to page you when a certain stock drops below a specified level, a program to generate daily management reports and send them off via e-mail, or a server that centralizes some number-crunching task on a single high-powered machine or, alternatively, distributes that task among the multiple nodes of a compute cluster.

Whether you are searching for the best price on a futon or for life in a distant galaxy, you'll need to understand how network applications work in order to take full advantage of these opportunities. You'll need a working understanding of the TCP/IP protocol, the common denominator for all Internet-based communications and the most common protocol in use in local area networks. You'll need to know how to connect to a remote program, how to exchange data with it, and what to do when something goes wrong. To work with existing applications, such as web servers, you'll have to understand how the application-level protocols are built on top of TCP/IP, and how to deal with common data exchange formats such as XML and MIME.

This book uses the Perl programming language to illustrate how to design and implement practical network applications. Perl is an ideal language for network programming for a number of reasons. First, like the rest of the language, Perl's networking facilities were designed to make the easy things easy. It takes just two lines of code to open up a network connection to a server somewhere on the Internet and send it a message. A fully-capable Web server can be written in well under a hundred lines of code.
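
As a rough illustration of the "two lines" claim, here is a minimal sketch using the standard IO::Socket::INET module. So that it runs without any outside server, the script first creates its own listener on the loopback address; that listener, the address 127.0.0.1 and the message text are assumptions made for this example and stand in for a real server somewhere on the Internet.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;

# Stand-in for a remote server (an assumption of this sketch): a listener
# on the loopback interface. LocalPort => 0 asks the OS for any free port.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 1,
    Proto     => 'tcp',
) or die "can't listen: $!";
my $port = $listener->sockport;

# The client side: one line to connect, one line to send a message.
my $sock = IO::Socket::INET->new("127.0.0.1:$port") or die "can't connect: $!";
print $sock "Hello, server!\n";
close $sock;

# Server side: accept the queued connection and read the line that arrived.
my $conn     = $listener->accept or die "can't accept: $!";
my $received = <$conn>;
print "server received: $received";
```

Against a real service you would replace the loopback address and port with the host and port of that server; the two client lines stay the same.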

Second, Perl's open architecture has encouraged many talented programmers to contribute to an ever-expanding library of useful third-party modules. Many of these modules provide powerful interfaces to common network applications. For example, after loading the LWP::Simple module, a single function call allows you to fetch the contents of a remote web page and store it in a variable. Other third-party modules provide intuitive interfaces to e-mail, FTP, net news, and a variety of network databases.
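
The single function call mentioned above is LWP::Simple's get(). The following is only a sketch: it assumes the third-party LWP modules are installed and that the machine can reach the network, and the URL is simply one that appears elsewhere in this preface.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# URL chosen for illustration only; pass another one on the command line.
my $url = shift(@ARGV) || 'http://www.cpan.org/';

# get() returns the document body, or undef if the fetch failed.
my $page = get($url);

if (defined $page) {
    printf "fetched %d characters from %s\n", length($page), $url;
} else {
    warn "could not fetch $url\n";
}
```

Because get() signals failure only by returning undef, the sketch checks the result before using it.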

Perl also provides impressive portability. Most of the applications developed in this book will run without modification on UNIX machines, Windows boxes, Macintoshes, VMS systems, and OS/2.

However, the most compelling reason to choose Perl for network application development is that it allows you to fully exploit the power of TCP/IP. Perl provides you with full access to the same low-level networking calls that are available to C programs and other natively compiled languages. You can create multicast applications, implement multiplexed servers, and design peer-to-peer systems. Using Perl, you can rapidly prototype new networking applications and develop interfaces to existing ones. Should you ever need to write a networking application in C or Java, you'll be delighted to discover how much of the Perl API carries over into these languages.


Who this book is for

This book is written for the novice to intermediate Perl programmer. I assume you know the basics of Perl programming, including how to write loops, how to construct if-else statements, how to write regular expression pattern matches, the concept of the automatic $_ variable, and the basics of arrays and hashes.

You should have access to a Perl interpreter and have some experience writing, running, and debugging scripts. Just as importantly, you should have access to a computer that is connected to a local area network, and, hopefully, the Internet as well! Although the recipes in Chapter 10 on setting Perl-based network servers to start automatically when a machine is booted do require superuser (administrative) access, none of the other examples require privileged access to a machine.

Although this book does take advantage of the object-oriented features in Perl version 5 and higher (which in my opinion are one of the main attractions of network programming in Perl), most chapters do not assume a deep knowledge of how this system works. Chapter 1 goes over all the details you will need as a casual user of Perl objects.

This book is not intended to be a thorough review of the TCP/IP protocol at the lowest level, or a guide to installing and configuring network hubs, routers and name servers. Many good books on the mechanics of the TCP/IP protocol and on network administration are listed in the references at the end of this book.


Roadmap

This book is organized into four main sections: Basics, Developing Clients for Common Services, Developing TCP Client/Server Systems, and Advanced Topics.

Basics introduces the fundamentals of TCP/IP network communications.

Chapters 1 and 2, Networking Basics and Processes, Pipes and Signals, review Perl's functions and variables for input and output, discuss the exceptions that can occur during I/O operations, and use the piped filehandle as the basis for introducing sockets. These chapters also review Perl's process model, including signals and forking, and introduce Perl's object-oriented extensions.

Chapter 3, Introduction to Berkeley Sockets, covers the basics of Internet networking, including IP addresses, network ports, and the principles of client/server applications. It then turns to the Berkeley Socket API, which provides the programmer's interface to TCP/IP.

Chapters 4 and 5, The TCP Protocol and The IO::Socket API and Simple TCP Applications, show the basics of TCP, the networking protocol that provides reliable stream-oriented communications. These chapters demonstrate how to create client and server applications and introduce examples which show the power of these techniques as well as some of the common roadblocks.

The next part, Developing Clients for Common Services, looks at a collection of the best third-party modules that developers have contributed to the Comprehensive Perl Archive Network.

Chapter 6, FTP and Telnet, introduces modules that provide access to the FTP file-sharing service, as well as the flexible Net::Telnet module, which allows you to create clients to access all sorts of network services.

E-mail is still the dominant application on the Internet, and Chapter 7, SMTP: Sending Mail, introduces one half of the equation. This chapter shows you how to create e-mail messages on the fly, including binary attachments, and send them off to their destination.

Chapter 8, POP, IMAP and NNTP: Processing Mail and Netnews, covers the other half of e-mail, explaining modules that make it possible to receive mail from mail drop systems and process their contents, including binary attachments.

Chapter 9, HTTP: Talking to the Web, discusses the LWP module, which provides everything you need to talk to web servers, download and process HTML documents, and parse XML.

Part III, Developing TCP Client/Server Systems, is the longest section of the book. It discusses the alternatives for designing TCP-based client/server systems. The major example used in these chapters is an interactive psychotherapist server, based on Joseph Weizenbaum's classic Eliza program.

Chapter 10, Forking Servers and the Inetd Daemon, covers the common type of TCP server which forks a new process to handle each incoming connection. It also covers the Unix and Windows inetd daemons, which allow programs not specifically designed for networking to act as servers.

Chapter 11, Multithreaded Applications, explains Perl's experimental multithreaded API, and shows how this can be used to greatly simplify the design of TCP clients and servers.

Chapters 12 and 13, Multiplexed Operations and Non-blocking I/O, discuss the select() call, which enables an application to process multiple I/O streams concurrently without using multiprocessing or multithreading.

Chapter 14, Pre-Forking and Pre-Threading, covers enhancements of the forking and threading models discussed in earlier chapters. These enhancements increase a server's ability to perform well under heavy loads.

Chapter 15, The IO::Poll Module, discusses an alternative to select() available on Unix platforms. This module allows applications to multiplex multiple I/O streams using an API that some people find more natural than select()'s.

Chapter 16, Bulletproofing Servers, discusses techniques for enhancing the reliability and maintainability of network servers. Among the topics discussed are logging, signal handling, and exceptions, as well as the important topic of network security.

Part IV, Advanced Topics, covers advanced techniques that are useful for specialized applications.

Chapter 17, TCP Urgent Data, is devoted to TCP urgent or ``out of band'' data. This technique is often used in highly interactive applications in which the user needs to signal the remote server in an urgent manner.

Chapters 18 and 19, The UDP Protocol and UDP Servers, introduce UDP (the User Datagram Protocol), which provides an unreliable message-oriented communications service. Chapter 18 introduces the protocol, and Chapter 19 shows how to design UDP servers. The major example in these and the next two chapters is a live online chat and messaging system written entirely in Perl.

Chapters 20 and 21, Broadcasting and Multicasting, extend the UDP discussion by showing how to build one-to-all and one-to-many message broadcasting systems. In these chapters we extend the chat system to take advantage of automatic server discovery and multicasting.

Chapter 22, Unix Domain Sockets, shows how to create lightweight communications channels between processes on the same machine. This can be useful for specialized applications such as loggers.


The many versions of Perl

All good things evolve to meet changing conditions, and Perl has gone through several major changes in the course of its short lifetime. This book was written for versions of Perl in the 5.X series (5.003 and higher recommended). At the time I wrote this preface (August 2000), the most recent version of Perl was 5.6, with the release of 5.7 expected imminently. I expect that Perl versions 5.8 and 5.9 (assuming there will be such versions) will be compatible with the code examples given here as well.

Over the horizon, however, is Perl version 6. Version 6, which is expected to be in early alpha form by the summer, will fix many of the idiosyncrasies and misfeatures of earlier versions of Perl. In so doing, however, it is expected to break most existing scripts, and will probably break the examples in this book as well. Fortunately, the Perl language developers are committed to developing tools to automatically port existing scripts to version 6. With an eye to this, I have tried to make the examples in this book generic, avoiding the more obscure Perl constructions whenever possible. Only time will tell how successful this strategy has been.

Cross-Platform Compatibility

More serious are the differences between implementations of Perl on various operating systems. Perl started out on Unix (and Linux) systems, but has been ported to many different operating systems, including Microsoft Windows, the Macintosh, VMS, OS/2, Plan9 and others. In most cases, a script written for the Windows platform will run on Unix or Macintosh without modifications.

The problem is that the I/O subsystem (the part of the system that manages input and output operations) is the part that differs the most dramatically from operating system to operating system. This restricts the ability of Perl to make its I/O system completely portable. While Perl's basic I/O functionality is identical from port to port, some of the more sophisticated operations are either missing or behave significantly differently on non-Unix platforms. This affects network programming, of course, because networking is fundamentally about input and output.

The early chapters of this book, Chapters 1-9, use generic networking calls which will run on all platforms. The exceptions to this rule are the last example in Chapter 5, which calls fork(), a function that isn't implemented on the Macintosh, and some of the introductory discussion in Chapter 2 of process management on Unix systems. The techniques discussed in these chapters are all you need for the vast majority of client programs, and are sufficient to get a simple server up and running.

For the remaining half of the book, which deals with more advanced topics in server design, the table below shows whether the features used in each chapter are supported by the Unix, Windows or Macintosh ports of Perl.

  Chapter   Subject                     Unix/Linux   Windows   Macintosh
   1-9      Basic network programming       +           +          +
    10      Forking servers                 +           P          -
    11      Multithreaded servers           +           +          -
    12      Multiplexing                    +           +          +
    13      Nonblocking I/O                 +           +          +
    14      Preforking                      +           -          -
    15      IO::Poll                        +           -          -
    16      Server Bulletproofing           +           P          -
    17      TCP Urgent data                 +           -          -
    18      UDP                             +           +          +
    19      UDP Servers                     +           +          +
    20      Broadcasting                    +           +          +
    21      Multicasting                    +           -          -
    22      Unix domain sockets             +           -          -
    Key: +  features supported
         -  features unsupported
         P  partial support

The nice thing is that the non-Unix ports of Perl are rapidly improving, and there is a good chance that features which weren't available when this book was written will be available at the time you read this.


Getting the code examples

I believe that the best way to learn programming techniques is to read lots of code, test it, and adapt it for your own use. This book is full of working examples that perform ``real life'' tasks. Among the programs we develop here are a real-time chat and messaging system, a program for fetching and processing e-mail containing MIME attachments, a script for mirroring data from an FTP site, an interactive chatbot, and a program for uploading and analyzing the word frequencies in large text files.

All the example scripts and modules discussed in this book are available on the web in ZIP and TAR/GZIP formats. The URL for downloading the source is http://www.modperl.com/perl_networking. This page also includes instructions for unpacking and installing the source code.


Installing modules

Many of Perl's networking modules are preinstalled in the standard distribution. Others are third-party modules that you must download and install from the Web. Most third-party modules are written in pure Perl, but some, including several that are mentioned in this book, are partly written in C and must be compiled before they can be used.

CPAN, the Comprehensive Perl Archive Network, is a large web-based collection of contributed Perl modules. You can get access to it via a web or FTP browser, or by using a command-line application built into Perl itself.

Installing from the web

To find a CPAN site near you, point your web browser at http://www.cpan.org/. This will present you with a page that allows you to search for specific modules, or to browse the entire list of contributed modules sorted in various ways. When you find the module you want, download it to disk.

Perl modules are distributed as gzipped tar archives. You can unpack them like this:

 % gunzip -c Digest-MD5-2.00.tar.gz  | tar xvf -
 Digest-MD5-2.00/
 Digest-MD5-2.00/typemap
 Digest-MD5-2.00/MD2/
 Digest-MD5-2.00/MD2/MD2.pm
 ...

Once unpacked, you'll enter the newly-created directory and give the perl Makefile.PL, make, make test, and make install commands. Together these will build, test and install the module.

 % cd Digest-MD5-2.00
 % perl Makefile.PL 
 Testing alignment requirements for U32...
 Checking if your kit is complete...
 Looks good
 Writing Makefile for Digest::MD2
 Writing Makefile for Digest::MD5
 % make
 mkdir ./blib
 mkdir ./blib/lib
 mkdir ./blib/lib/Digest
 ...
 % make test
 make[1]: Entering directory `/home/lstein/Digest-MD5-2.00/MD2'
 make[1]: Leaving directory `/home/lstein/Digest-MD5-2.00/MD2'
 PERL_DL_NONLAZY=1 /usr/local/bin/perl -I./blib/arch -I./blib/lib...
 t/digest............ok
 t/files.............ok
 t/md5-aaa...........ok
 t/md5...............ok
 t/rfc2202...........ok
 t/sha1..............skipping test on this platform
 All tests successful.
 Files=6,  Tests=291,  1 secs ( 1.37 cusr  0.08 csys =  1.45 cpu)
 % make install
 make[1]: Entering directory `/home/lstein/Digest-MD5-2.00/MD2'
 make[1]: Leaving directory `/home/lstein/Digest-MD5-2.00/MD2'
 Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.so
 Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.bs
 ...

On Unix systems, you may need to have superuser privileges to perform the final step. If you don't have such privileges, you can install the modules in your home directory. At the perl Makefile.PL step, provide a PREFIX= argument with the path of your home directory. For example, assuming your home directory can be found at /home/jdoe, you would type:

  % perl Makefile.PL PREFIX=/home/jdoe

The rest of the install procedure is identical to what was shown earlier.

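If you are using a custom install directory, you must tell Perl to look in this directory for installed modules. One way to do this is to add the name of the directory to the environment variable PERL5LIB. (The path to add is the library directory in which the install step actually placed the modules. Depending on how Perl was configured, this may be a subdirectory of your prefix rather than the prefix itself, so check the ``Installing'' lines printed by make install.) For example:

 setenv PERL5LIB /home/jdoe            # C shell
 PERL5LIB=/home/jdoe; export PERL5LIB  # Bourne shell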

Another way is to place this line at the top of each script that uses an installed module.

 use lib '/home/jdoe';
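
To check that Perl is really searching your custom directory, you can print the module search path, @INC. The short sketch below reuses the example directory /home/jdoe from above; substitute whatever directory your modules were actually installed into.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use lib '/home/jdoe';   # example directory from the text; adjust as needed

# "use lib" adds its arguments to the front of @INC at compile time, so
# the custom directory is searched before the standard library locations.
print "$_\n" for @INC;
```

Directories named in PERL5LIB show up in the same list, so this script is a quick way to verify either approach.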

Installing from the Command Line

A simpler way to do the same thing is to use Andreas Koenig's wonderful CPAN shell. With it, you can search, download, build and install Perl modules from a simple command-line shell. The ``install'' command does it all:

 % perl -MCPAN -e shell
 cpan shell -- CPAN exploration and modules installation (v1.40)
 ReadLine support enabled
 cpan> install MD5
 Running make for GAAS/Digest-MD5-2.00.tar.gz
 Fetching with LWP:
   ftp://ftp.cis.ufl.edu/pub/perl/CPAN/authors/id/GAAS/Digest-MD5-2.00.tar.gz
 CPAN: MD5 loaded ok
 Fetching with LWP:
   ftp://ftp.cis.ufl.edu/pub/perl/CPAN/authors/id/GAAS/CHECKSUMS
 ...
 Checksum for /home/lstein/.cpan/sources/authors/id/GAAS/Digest-MD5-2.00.tar.gz ok
 Digest-MD5-2.00/
 Digest-MD5-2.00/typemap
 ...
 Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.so
 Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/Digest/MD5/MD5.bs
 Installing /usr/local/lib/perl5/site_perl/i586-linux/./auto/MD5/MD5.so
 ...
 Writing /usr/local/lib/perl5/site_perl/i586-linux/auto/MD5/.packlist
 Appending installation info to /usr/local/lib/perl5/i586-linux/5.00404/perllocal.pod
 cpan> exit

Installing modules with the Perl Package Manager

These examples all assume that you have Unix-compatible versions of the gzip, tar and make commands. Virgin Windows systems do not have these utilities. The Cygwin package, available from http://www.cygnus.com/cygwin/, provides these utilities as part of a complete set of Unix compatibility tools.

It's easier, however, to use the ActiveState Perl Package Manager (PPM). This Perl script is installed by default in the ActiveState distribution of Perl, available at http://www.activestate.com. Its interface is similar to the command-line CPAN interface shown in the previous section, except that it can install precompiled binaries as well as pure-Perl scripts.

Example:

 % ppm
 ppm> install MD5
 Connecting to repository...
 Retrieving MD5.ppd...
 MD5 installed
 ppm> exit

Installing modules from MacPerl

The MacPerl Module Porters site, http://pudge.net/cgi-bin/mmp.plx, contains a series of modules that have been ported for use in MacPerl. A variety of helper programs have been developed to make module installation easier on the Macintosh. The packages are described at http://pudge.net/macperl/macperlmodinstall.html, which also gives instructions on downloading and installing them.


Online Documentation

In addition to books and web sites, this book refers to two major sources of online information: Internet RFCs and Perl POD documentation.

Internet RFCs

The specifications of all the fundamental protocols of the Internet are described in a series of Requests for Comment (RFCs) submitted to the Internet Engineering Task Force. These documents are numbered sequentially. For example RFC 1927 ``Suggested Additional MIME Types for Associating Documents'' was the 1927th RFC submitted. Some of these RFCs eventually go on to become Internet Standards, in which case they are given sequentially-numbered STD names. However, most of them remain as RFCs. Even though the RFCs are technically unofficial, they are the references that people use to learn the details of networking protocols and to validate that a particular implementation is correct.

The RFC archives are mirrored at many locations on the Internet, and maintained in searchable form by several organizations. One of the best archives is maintained at http://www.faqs.org/rfcs/. To retrieve an RFC from this site, go to the indicated page and type the number of the desired RFC in the text field labeled ``Display the document by number''. The document will be delivered in a minimally HTMLized form. This page also allows you to search for standards documents, and to search the archive by keywords and phrases.

If you prefer a text-only form, the www.faqs.org site contains a link to their FTP site, where you can find and download the RFCs in their original form.

POD

Much of Perl's internal documentation comes in POD (Plain Old Documentation) format. POD files are mostly plain text, with a few markup elements inserted to indicate headings, subheadings, and itemized lists.

When you installed Perl, the POD documentation was installed as well. The POD files are located in the pod subdirectory of the Perl library directory. You can either read them directly, or use the perldoc script to format and display them in a text pager such as more.

To use perldoc, type the command followed by the name of the POD file you wish to view. The best place to start is the Perl table of contents, perltoc:

  % perldoc perltoc

This will give you a list of other POD pages that you can display.

For a quick summary of a particular Perl function, perldoc accepts the -f flag. For example, to see a summary of the socket() function, type:

 % perldoc -f socket

For Macintosh users, the MacPerl distribution comes with a ``helper'' application called shuck. This adds POD viewing facilities to the MacPerl Help menu.


Acknowledgements

They say that the first skill an editor learns on the job is patience, but I think that Karen Gettman was born with an excess of it. She must have caught on after the second or third time that when I said ``it should be done in just another week'', I really was talking about months. Yet she never betrayed any sign of dismay, even though I'm sure she was fighting an increasingly restive production and marketing staff. To Karen, all I can say is ``thank you!''

Thanks also to Mary Hart, the assistant editor responsible for my book. I have worked with Mary on other projects, and I know that it is her tireless efforts that make publishing with Addison-Wesley seem so frictionless.

I am extremely grateful to the technical reviewers who worked so diligently to keep me honest: Jon Orwant, Harry Hochheiser, Robert Kolstad, Sander Wahls, and Megan Conklin. The book is very much better because of your efforts.

I owe a debt of gratitude to the long-suffering members of my laboratory, Ravi, David, Marco, Hong, Guanming, Nathalie and Peter, who somehow managed to keep things moving forward even during the last months of manuscript preparation, when my morning absences became increasingly extended.

And of course I wish to thank my wife, Jean, who has stuck with me through several of these projects already, and has never, ever, asked for the dining room table back.

Stony Brook NY August 20, 2000

