Building An Asynchronous FTP Client

October 17, 2002 | Fredrik Lundh

This article describes how to use Python’s standard asynchat and asyncore modules to implement an asynchronous FTP client. In the first part, we’ll look at the FTP protocol itself, and how to use the asynchat library to talk to an FTP server.

The scripts and modules used in this article are available from the effbot.org subversion repository:

$ svn co http://svn.effbot.python-hosting.com/stuff/zone/asyncore-ftp

Part #1: Reading Directory Listings

The File Transfer Protocol

The File Transfer Protocol (FTP) has been around for ages; it’s even older than the Internet. Despite its age, FTP is still commonly used to download data from remote servers, and it’s by far the most common protocol for uploading data to servers.

Unlike HTTP, FTP is a “chat-style” protocol. The client sends a command, waits for a response, sends another command, reads the response, etc. A typical interchange might look something like this (C=client, S=server):

C: connects
S: 220 FTP server ready.
C: USER mulder
S: 331 Password required for mulder
C: PASS trustno1
S: 230 User mulder logged in.
C: PASV
S: 227 Entering Passive Mode (195,100,36,198,219,28)
C: RETR sculley.zip
S: 150 Opening BINARY mode data connection for sculley.zip (271165 bytes).
S: 226 Transfer complete.
C: PASV
S: 227 Entering Passive Mode (195,100,36,198,219,29)
C: LIST
S: 150 Opening ASCII mode data connection for directory listing.
S: 226 Transfer complete.
C: QUIT
S: 221-You have transferred 271165 bytes in 1 files.
S: 221-Total traffic for this session was 271859 bytes in 1 transfers.
S: 221 Thank you for using the FTP service on server.example.com.

The client lines all consist of a command name (e.g. USER) followed by an optional argument. The server response lines consist of a 3-digit code, followed by either a space or a dash (-), followed by a text message. The lines using a dash belong to a multi-line response; the client should keep reading response lines until it gets a line without the dash.

Lines are separated by CR and LF (chr(13)+chr(10)), but some clients and servers use only LF (chr(10)).
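The line format described above can be sketched as a small parsing function. This is only an illustration: the name parse_reply_line and the shape of its return value are made up for this example, and the pattern assumes that the code is always followed by a space or a dash.

```python
import re

# three digits, then "-" (more lines follow) or " " (last line), then text
reply_pattern = re.compile(r"(\d\d\d)([- ])(.*)")

def parse_reply_line(line):
    # strip the line terminator; tolerate both CR LF and bare LF
    line = line.rstrip("\r\n")
    match = reply_pattern.match(line)
    if not match:
        return None # not a well-formed response line
    code, separator, text = match.groups()
    # the second item is true if this line ends the response
    return code, separator == " ", text
```

For a line from the middle of a multi-line response, the second item is false, which tells the caller to keep reading.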

Common FTP Commands

The above example uses the following FTP commands:

USER. Provide user name. The server should respond with 230 if the user is accepted as is, 530 if the login attempt was rejected, or 331 or 332 if the client must provide a password (using the PASS command).

PASS. Provide password. The server should respond with 230 if the user is accepted, 530 if the login failed, or 332 if further login information is required (the details of which are outside the scope of this article).

PASV. Tell the server to prepare a data transfer channel. The server will return 227 and the response message will also contain six integers, separated by commas. The numbers specify an IP address and a port number to which the client should connect to transfer the data. The client should ignore the first four integers, and use the server address instead. To get the port number, multiply the fifth integer by 256 and add the sixth integer.

RETR. Initialize a data transfer from the server to the client, using the port number specified by the PASV command. The client should connect to the data port before issuing this command. When the transfer is initialized, the server will return a 150 response and start sending data over the transfer port. When the transfer is completed (whether all data was sent or not), the server follows up with a 226 response.

LIST. This is similar to RETR, but it returns a directory listing for the current directory. As with RETR, you must use PASV to prepare the data channel before issuing this command.

QUIT. Shut down the connection. The server usually returns a multiline summary message. If you’re not interested in the message, you can just shut down the socket connection.
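To make the PASV arithmetic concrete, here’s a minimal sketch that turns a 227 response into an address for the data connection. The function name pasv_data_address is made up for this example, and the code assumes that the server wraps the six integers in parentheses, as in the transcript above.

```python
def pasv_data_address(server_host, reply):
    # pick out the comma-separated integers between the parentheses
    start = reply.index("(") + 1
    end = reply.index(")", start)
    numbers = [int(n) for n in reply[start:end].split(",")]
    if len(numbers) != 6:
        raise ValueError("expected six integers in PASV response")
    # ignore the first four integers and reuse the address of the
    # server we're already talking to
    port = numbers[4] * 256 + numbers[5]
    return server_host, port
```

For the first PASV response in the transcript, the fifth and sixth integers are 219 and 28, so the port is 219*256+28=56092.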

For more information on the FTP protocol, see Dan Bernstein’s extensive FTP protocol reference, which is written with an emphasis on how FTP works in practice.

Introducing the asynchat Module

The asyncore library comes with a support module for chat-style protocols, called asynchat. This module provides an asyncore.dispatcher subclass called async_chat, which adds an input parser and output buffering to the basic dispatcher.

The input parser feeds data to the collect_incoming_data method. When the parser sees a predefined terminator string, it calls the found_terminator method. The following example prints incoming lines to standard output, one line at a time:

import asynchat

class channel(asynchat.async_chat):

    def __init__(self):
        asynchat.async_chat.__init__(self)
        self.buffer = ""
        self.set_terminator("\r\n")

    def collect_incoming_data(self, data):
        self.buffer = self.buffer + data

    def found_terminator(self):
        print "got", self.buffer
        self.buffer = ""
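To see why the two callbacks are kept separate, it can help to play with a socket-free stand-in. The line_splitter class below is a made-up illustration and not how asynchat is implemented; it only mimics the calling pattern, where data arrives in arbitrary chunks and found_terminator is called once per complete line.

```python
class line_splitter:
    # simplified stand-in for the async_chat input parser

    def __init__(self, terminator):
        self.terminator = terminator
        self.pending = "" # data not yet passed to the callbacks
        self.buffer = ""
        self.lines = []

    def feed(self, chunk):
        self.pending = self.pending + chunk
        while self.terminator in self.pending:
            data, self.pending = self.pending.split(self.terminator, 1)
            if data:
                self.collect_incoming_data(data)
            self.found_terminator()
        # hold back a possible partial terminator, pass on the rest
        keep = len(self.terminator) - 1
        if len(self.pending) > keep:
            cut = len(self.pending) - keep
            self.collect_incoming_data(self.pending[:cut])
            self.pending = self.pending[cut:]

    def collect_incoming_data(self, data):
        self.buffer = self.buffer + data

    def found_terminator(self):
        self.lines.append(self.buffer)
        self.buffer = ""
```

If you feed it "220 FTP ser", "ver ready.\r", and "\n331 Pass", the lines attribute contains a single complete line, and the start of the second line is kept until its terminator shows up.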

The async_chat class also provides output buffering, via the push method:

class channel(asynchat.async_chat):

    def found_terminator(self):
        # echo string back to sender
        self.push("echo %s\n" % self.buffer)
        self.buffer = ""

There’s also a push_with_producer method that takes a producer object, which can be used to generate data on the fly. Producer objects are outside the scope of this article.

The push and push_with_producer methods add data to an output queue, and the framework automatically sends data whenever the receiving end is ready.

Using asynchat for FTP

But let’s get back to the topic for this article: doing asynchronous FTP.

The FTP server expects the client to read a response, send a command, read the next response, etc. The found_terminator method is where you end up after each response, so it makes a certain sense to put the protocol logic in that method. Here’s a first attempt:

Example: talking to an FTP server
import asyncore, asynchat
import re, socket

class anon_ftp(asynchat.async_chat):

    def __init__(self, host):
        asynchat.async_chat.__init__(self)

        self.commands = [
            "USER anonymous",
            "PASS anonymous@",
            "PWD",
            "QUIT"
            ]

        self.set_terminator("\n")

        self.data = ""

        # connect to ftp server
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, 21))

    def handle_connect(self):
        # connection succeeded
        pass

    def handle_expt(self):
        # connection failed
        self.close()

    def collect_incoming_data(self, data):
        # received a chunk of incoming data
        self.data = self.data + data

    def found_terminator(self):
        # got a response line
        data = self.data
        if data.endswith("\r"):
            data = data[:-1]
        self.data = ""

        print "S:", data

        if re.match("\d\d\d ", data):
            # this was the last line in this response
            # send the next command to the server
            try:
                command = self.commands.pop(0)
            except IndexError:
                pass # no more commands
            else:
                print "C:", command
                self.push(command + "\r\n")

anon_ftp("ftp.python.org")

asyncore.loop()

This class uses a predefined command list (in the commands attribute), which logs in to an FTP server as an anonymous user, fetches the name of the current directory using the PWD command, and finally logs off.

The re.match function uses a regular expression to look for a string that starts with three digits followed by a space; as we saw earlier, the server may send multiline responses, but only the last line in such a response may use a space as the fourth character.
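Here’s a small sketch of how the same pattern can be used to group lines into complete responses. The group_responses helper is made up for this example; the sample lines are taken from the transcript at the top of this article.

```python
import re

# three digits followed by a space marks the last line of a response
final_line = re.compile(r"\d\d\d ")

def group_responses(lines):
    responses = []
    current = []
    for line in lines:
        current.append(line)
        if final_line.match(line):
            # last line seen; start collecting the next response
            responses.append(current)
            current = []
    return responses

lines = [
    "226 Transfer complete.",
    "221-You have transferred 271165 bytes in 1 files.",
    "221-Total traffic for this session was 271859 bytes in 1 transfers.",
    "221 Thank you for using the FTP service on server.example.com.",
]

responses = group_responses(lines)
```

Here, responses ends up with two items: a one-line 226 response, followed by a three-line 221 response.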

If you run this script, it should print something like this:

S: 220 ProFTPD 1.2.4 Server (ftp.python.org)
C: USER anonymous
S: 331 Anonymous login ok, send your complete email address as your password.
C: PASS anonymous@
S: 230 Anonymous access granted, restrictions apply.
C: PWD
S: 257 "/" is current directory.
C: QUIT
S: 221 Goodbye.

A problem here is of course that the client doesn’t really look at the server response; we’ll keep sending commands even if the server doesn’t allow us to log in. And even if it’s not very common, an FTP server does not have to require a password. If the USER command results in a 230 response code, the client shouldn’t send a PASS command.

In other words, you need to look at each response before you decide what to do next. One way to do this is to add explicit tests to the found_terminator code; something like this could work:

    last_command = None

    def found_terminator(self):
        # got a response line
        data = self.data
        if data.endswith("\r"):
            data = data[:-1]
        self.data = ""

        if not re.match("\d\d\d ", data):
            return

        # this was the last line in this response
        # check if last command needs special treatment

        if self.last_command == None:
            # handle connection
            if data.startswith("220"):
                self.last_command = "USER"
                self.push("USER anonymous\r\n")
                return
            else:
                raise Exception("ftp login failed")

        elif self.last_command == "USER":
            # handle user response
            if data.startswith("230"):
                pass # user accepted
            elif data.startswith("331") or data.startswith("332"):
                self.last_command = "PASS"
                self.push("PASS " + self.password + "\r\n")
                return
            else:
                raise Exception("ftp login failed")

        elif self.last_command == "PASS":
            if data.startswith("230"):
                pass # user and password accepted
            else:
                raise Exception("ftp login failed")

        # send the next command to the server
        try:
            self.push(self.commands.pop(0) + "\r\n")
        except IndexError:
            pass # no more commands
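The login decisions made by these tests can also be written as a small function that doesn’t touch the socket, which makes the logic easy to check on its own. This is only a sketch; the name next_login_step and its return convention are made up for this example.

```python
def next_login_step(last_command, code):
    # returns the login command to send next, or None if the login
    # sequence is complete; raises an exception if it failed
    if last_command is None:
        if code == "220":
            return "USER"
    elif last_command == "USER":
        if code == "230":
            return None # user accepted, no password needed
        if code in ("331", "332"):
            return "PASS"
    elif last_command == "PASS":
        if code == "230":
            return None # user and password accepted
    raise Exception("ftp login failed")
```

Note that a 230 response to USER skips the PASS step altogether.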

A more flexible (and scalable) approach is to use pluggable response handlers. The following version adds a handler attribute which, if not None, points to a piece of code that’s prepared to look at the response from the previous command.

The ftp_handle_connect, ftp_handle_user_response, and ftp_handle_pass_response handlers take care of the login sequence.

Example: using response handlers to check FTP responses
import asyncore, asynchat
import re, socket

class anon_ftp(asynchat.async_chat):

    def __init__(self, host):
        asynchat.async_chat.__init__(self)

        self.host = host

        self.user = "anonymous"
        self.password = "anonymous@"

        self.set_terminator("\n")

        self.data = ""

        self.response = []

        self.commands = ["PWD", "QUIT"]

        self.handler = self.ftp_handle_connect

        # connect to ftp server
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, 21))

    def handle_connect(self):
        # connection succeeded
        pass

    def handle_expt(self):
        # connection failed
        self.close()

    def collect_incoming_data(self, data):
        self.data = self.data + data

    def found_terminator(self):

        # collect response
        data = self.data
        if data.endswith("\r"):
            data = data[:-1]
        self.data = ""
        self.response.append(data)
        if not re.match("\d\d\d ", data):
            return

        response = self.response
        self.response = []

        for line in response:
            print "S:", line

        # process response
        if self.handler:
            # call the response handler
            handler = self.handler
            self.handler = None

            handler(response)

            if self.handler:
                return # follow-up command in progress

        # send next command from queue
        try:
            print "C:", self.commands[0]
            self.push(self.commands.pop(0) + "\r\n")
        except IndexError:
            pass

    def ftp_handle_connect(self, response):
        code = response[-1][:3] # get response code
        if code == "220":
            self.push("USER " + self.user + "\r\n")
            self.handler = self.ftp_handle_user_response
        else:
            raise Exception("ftp login failed")

    def ftp_handle_user_response(self, response):
        code = response[-1][:3]
        if code == "230":
            return # user accepted
        elif code == "331" or code == "332":
            self.push("PASS " + self.password + "\r\n")
            self.handler = self.ftp_handle_pass_response
        else:
            raise Exception("ftp login failed: user name not accepted")

    def ftp_handle_pass_response(self, response):
        code = response[-1][:3]
        if code == "230":
            return # user and password accepted
        else:
            raise Exception("ftp login failed: user/password not accepted")

anon_ftp("ftp.python.org")

asyncore.loop()

Running this, you’ll get output similar to this (note that commands sent by the response handlers are not logged):

S: 220 ProFTPD 1.2.4 Server (ftp.python.org)
S: 331 Anonymous login ok, send your complete email address as your password.
S: 230 Anonymous access granted, restrictions apply.
C: PWD
S: 257 "/" is current directory.
C: QUIT
S: 221 Goodbye.

Downloading Directory Listings

As mentioned earlier, the FTP server uses separate data channels to transfer data. The main channel is only used to issue commands, and to return responses from the server.

Let’s use the LIST command as an example. Before you can send this command, you must use PASV to set up a data channel. The server will respond with the port number to connect to, and wait for the LIST command (or any other data transfer command).

The command/response exchange might look something like:

C: PASV
S: 227 Entering Passive Mode (194,109,137,227,8,11).
C: LIST
S: 150 Opening ASCII mode data connection for file list
...download listing from port 8*256+11=2059...
S: 226 Transfer complete.

To parse the PASV response, you can use a response handler looking something like:

import re

# get port number from pasv response
pasv_pattern = re.compile("[-\d]+,[-\d]+,[-\d]+,[-\d]+,([-\d]+),([-\d]+)")

class anon_ftp(asynchat.async_chat):

    ...

    def ftp_handle_pasv_response(self, response):
        code = response[-1][:3]
        if code != "227":
            return # pasv failed
        match = pasv_pattern.search(response[-1])
        if not match:
            return # bad port
        p1, p2 = match.groups()
        try:
            port = (int(p1) & 255) * 256 + (int(p2) & 255)
        except ValueError:
            return # bad port
        # establish data connection
        async_ftp_download(self.host, port)

Note that to be on the safe side, the regular expression accepts negative integers, and the port number calculation only uses eight bits from each integer.
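Here’s the same calculation as a stand-alone function, so you can try it on a few response lines without connecting to a server. The pasv_port name is made up for this example, and the response with a negative integer is a made-up input that shows what the masking does; it’s not taken from a real server.

```python
import re

# get port number from pasv response
pasv_pattern = re.compile(r"[-\d]+,[-\d]+,[-\d]+,[-\d]+,([-\d]+),([-\d]+)")

def pasv_port(line):
    match = pasv_pattern.search(line)
    if not match:
        return None # no port information found
    p1, p2 = match.groups()
    try:
        # only use the low eight bits from each integer
        return (int(p1) & 255) * 256 + (int(p2) & 255)
    except ValueError:
        return None # bad port

normal = pasv_port("227 Entering Passive Mode (194,109,137,227,8,11).")
signed = pasv_port("227 Entering Passive Mode (194,109,137,227,-37,28)")
```

The first call gives 8*256+11=2059. In the second call, -37 & 255 is 219, so the result is 219*256+28=56092.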

The async_ftp_download class is another asynchronous socket class. Here’s a simple implementation that simply prints all incoming data to standard output:

import asyncore, socket, sys

class async_ftp_download(asyncore.dispatcher):

    def __init__(self, host, port):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))

    def writable(self):
        return 0

    def handle_connect(self):
        pass

    def handle_expt(self):
        self.close()

    def handle_read(self):
        sys.stdout.write(self.recv(8192))

    def handle_close(self):
        self.close()

The last piece of the puzzle is to make sure that the ftp_handle_pasv_response method is called at the right time. The first step is to change the command list, to make sure we send PASV followed by a LIST command:

        self.commands = ["PASV", "LIST", "QUIT"]

If you run this, the client will hang after the LIST command. Or rather, it’s the server that hangs, waiting for the client to connect to the given port.

To fix this, let’s add an optional handler to the command list, and change the send code to look for an optional response handler:

class anon_ftp(asynchat.async_chat):
   
    def __init__(self, host):

        ...

        self.commands = [
            "PASV", self.ftp_handle_pasv_response,
            "LIST",
            "QUIT"
        ]

        ...

    def found_terminator(self):

        ...

        # send next command from queue
        try:
            command = self.commands.pop(0)
            if self.commands and callable(self.commands[0]):
                self.handler = self.commands.pop(0)
            print "C:", command
            self.push(command + "\r\n")
        except IndexError:
            pass
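The way commands and optional handlers share a single list can be tried out in isolation. The following sketch mirrors the send code above; the next_command function and the dummy handler are made up for this example.

```python
def next_command(commands):
    # pop the next command; if the item after it is callable, pop
    # that too and use it as the response handler
    command = commands.pop(0)
    handler = None
    if commands and callable(commands[0]):
        handler = commands.pop(0)
    return command, handler

def dummy_handler(response):
    pass # stand-in for ftp_handle_pasv_response

commands = ["PASV", dummy_handler, "LIST", "QUIT"]

first = next_command(commands)
second = next_command(commands)
```

The first call returns PASV together with its handler, and the second call returns LIST with no handler. Since a command is always a string, and a handler is always a callable, the two kinds of items cannot be confused.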

If you put all the pieces together and run the script, you’ll get something like:

S: 220 ProFTPD 1.2.4 Server (ftp.python.org)
S: 331 Anonymous login ok, send your complete email address as your password.
S: 230 Anonymous access granted, restrictions apply.
C: PASV
S: 227 Entering Passive Mode (194,109,137,227,8,20).
C: LIST
S: 150 Opening ASCII mode data connection for file list
C: QUIT
drwxrwxr-x   4 webmaster webmaster      512 Oct 12  2001 pub
S: 226 Transfer complete.
S: 221 Goodbye.

In this case, the directory listing contains a single directory, called pub.

Note that this directory listing looks like the output from the Unix ls command. Unfortunately, the FTP standard doesn’t specify what format to use; the servers can use any format they want, hoping that a human reader will be able to figure something out. But in practice, most contemporary servers use the Unix format.

The following snippet can be used to “parse” the output line. It’s far from bulletproof (e.g. what happens if a filename contains a space?), but it’s better than nothing:

parts = line.split()
if len(parts) >= 9:
    # unix-style columns: mode, links, owner, group, size,
    # month, day, year or time, filename
    directory = parts[0].startswith("d")
    size = int(parts[4])
    filename = parts[-1]
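One way to deal with spaces in filenames is to limit the number of splits, so that everything after the eighth column ends up in the filename. Here’s a minimal sketch, assuming the nine-column Unix format shown above; the parse_list_line name is made up for this example, and the code makes no attempt to handle other formats, or symbolic links (which are usually listed as "name -> target").

```python
def parse_list_line(line):
    line = line.rstrip("\r\n")
    # split on whitespace at most eight times; the ninth item is
    # the rest of the line, including any spaces
    parts = line.split(None, 8)
    if len(parts) < 9:
        return None # not a unix-style listing line
    try:
        size = int(parts[4])
    except ValueError:
        return None # unexpected column layout
    directory = parts[0].startswith("d")
    return directory, size, parts[8]
```

For the pub line in the listing above, this returns a true directory flag, the size 512, and the name "pub".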

To be continued…


In the next article, we’ll look at how to move around between directories on the server, and how to download data from the server. Stay tuned.

Send questions and comments to fredrik@pythonware.com.