Building An Asynchronous FTP Client
October 17, 2002 | Fredrik Lundh
This article describes how to use Python’s standard asynchat and asyncore modules to implement an asynchronous FTP client. In the first part, we’ll look at the FTP protocol itself, and how to use the asynchat library to talk to an FTP server.
Contents:
- Part #1: Reading Directory Listings
- Part #2: Transferring Files
The scripts and modules used in this article are available from the effbot.org subversion repository:
$ svn co http://svn.effbot.python-hosting.com/stuff/zone/asyncore-ftp
Part #1: Reading Directory Listings
The File Transfer Protocol
The File Transfer Protocol (FTP) has been around for ages; it’s even older than the Internet. Despite its age, FTP is still commonly used to download data from remote servers, and it’s by far the most common protocol for uploading data to servers.
Unlike HTTP, FTP is a “chat-style” protocol: the client sends a command, waits for a response, sends another command, reads the response, and so on. A typical exchange might look something like this (C = client, S = server):
```
C: connects
S: 220 FTP server ready.
C: USER mulder
S: 331 Password required for mulder
C: PASS trustno1
S: 230 User mulder logged in.
C: PASV
S: 227 Entering Passive Mode (195,100,36,198,219,28)
C: RETR sculley.zip
S: 150 Opening BINARY mode data connection for sculley.zip (271165 bytes).
S: 226 Transfer complete.
C: PASV
S: 227 Entering Passive Mode (195,100,36,198,219,29)
C: LIST
S: 150 Opening ASCII mode data connection for directory listing.
S: 226 Transfer complete.
C: QUIT
S: 221-You have transferred 271165 bytes in 1 files.
S: 221-Total traffic for this session was 271859 bytes in 1 transfers.
S: 221 Thank you for using the FTP service on server.example.com.
```
Each client line consists of a command name (e.g. USER) followed by an optional argument. Each server response line consists of a 3-digit code, followed by either a space or a dash (-), followed by a text message. Lines using a dash belong to a multi-line response; the client should keep reading response lines until it gets a line without the dash.
Lines are separated by CR and LF (chr(13)+chr(10), that is, "\r\n"), but some clients and servers use only LF (chr(10)).
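A client that wants to accept both conventions can split on LF alone and then drop a trailing CR if there is one. Here's a minimal sketch in plain Python. It assumes the data has already been decoded to a string, and split_reply_lines is a hypothetical name, not part of any library:

```python
def split_reply_lines(buffer):
    """Split received text into complete lines (hypothetical helper).

    Accepts both CR LF and bare LF line endings. Returns (lines, rest),
    where rest is an incomplete trailing line to keep until more data
    arrives.
    """
    lines = []
    while "\n" in buffer:
        line, buffer = buffer.split("\n", 1)
        # drop the CR of a CR LF pair; bare-LF lines need no change
        if line.endswith("\r"):
            line = line[:-1]
        lines.append(line)
    return lines, buffer
```

For example, split_reply_lines("220 ready\r\n331 need password\n230 ok") returns (["220 ready", "331 need password"], "230 ok"). The last line has no terminator yet, so it is handed back instead of being treated as complete.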
Common FTP Commands
The above example uses the following FTP commands:
USER. Provide user name. The server should respond with 230 if the user is accepted as is, 530 if the login attempt was rejected, or 331 or 332 if the client must provide a password (using the PASS command).
PASS. Provide password. The server should respond with 230 if the user is accepted, 530 if the login failed, or 332 if further login information is required (the details of which are outside the scope of this article).
PASV. Tell the server to prepare a data transfer channel. The server will return 227, and the response message will also contain six integers, separated by commas. The numbers specify an IP address and a port number to which the client should connect to transfer the data. The client should ignore the first four integers and use the server address instead. To get the port number, multiply the fifth integer by 256 and add the sixth integer.
RETR. Initialize a data transfer from the server to the client, using the port number specified by the PASV command. The client should connect to the data port before issuing this command. When the transfer is initialized, the server will return a 150 response and start sending data over the transfer port. When the transfer is completed (whether all data was sent or not), the server follows up with a 226 response.
LIST. This is similar to RETR, but it returns a directory listing for the current directory. As with RETR, you must use PASV to prepare the data channel before issuing this command.
QUIT. Shut down the connection. The server usually returns a multi-line summary message. If you’re not interested in the message, you can just close the socket connection.
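Of these commands, PASV is the one whose response needs some arithmetic. The following sketch shows the calculation described above on the replies from the example session. pasv_endpoint is a hypothetical helper: it looks for six comma-separated numbers, ignores the first four, and pairs the computed port with the address of the control connection.

```python
import re

# six comma-separated integers, as in "(195,100,36,198,219,28)"
PASV_NUMBERS = re.compile(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")

def pasv_endpoint(reply, server_host):
    """Return (host, port) for the data connection announced by a 227 reply.

    Hypothetical helper: the first four numbers (the address) are ignored,
    and server_host, the address of the control connection, is used instead.
    """
    match = PASV_NUMBERS.search(reply)
    if match is None:
        raise ValueError("no address in PASV reply: %r" % reply)
    p5, p6 = int(match.group(5)), int(match.group(6))
    return server_host, p5 * 256 + p6
```

For the first PASV reply in the example session, pasv_endpoint("227 Entering Passive Mode (195,100,36,198,219,28)", "server.example.com") gives ("server.example.com", 56092), since 219*256 + 28 = 56092.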
For more information on the FTP protocol, see Dan Bernstein’s extensive FTP protocol reference, which is written with an emphasis on how FTP works in practice.
Introducing the asynchat Module
The asyncore library comes with a support module for chat-style protocols, called asynchat. This module provides an asyncore.dispatcher subclass called async_chat, which adds an input parser and output buffering to the basic dispatcher.
The input parser feeds data to the collect_incoming_data method. When the parser sees a predefined terminator string, it calls the found_terminator method. The following example prints incoming lines to standard output, one line at a time:
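```python
import asynchat

class channel(asynchat.async_chat):

    def __init__(self):
        asynchat.async_chat.__init__(self)
        self.buffer = ""
        self.set_terminator("\r\n")

    def collect_incoming_data(self, data):
        # called with each chunk of data that precedes the terminator
        self.buffer = self.buffer + data

    def found_terminator(self):
        # called when the terminator is seen; the buffer holds a full line
        print "got", self.buffer
        self.buffer = ""
```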
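The parser has to cope with the fact that TCP delivers a stream, not lines. A line may arrive split across several reads, and one read may contain several lines, which is why collect_incoming_data above accumulates into a buffer. The class below is a hypothetical, network-free imitation of that string-terminator behaviour, not asynchat's actual code. It lets you see how incoming chunks map onto the two callbacks, including a terminator that is itself split between two chunks:

```python
class TerminatorParser(object):
    """Hypothetical stand-in for async_chat's string-terminator parsing."""

    def __init__(self, terminator, collect, found):
        self.terminator = terminator
        self.collect = collect   # plays the role of collect_incoming_data
        self.found = found       # plays the role of found_terminator
        self.pending = ""        # possible start of a split terminator

    def feed(self, chunk):
        data = self.pending + chunk
        self.pending = ""
        term = self.terminator
        while data:
            index = data.find(term)
            if index >= 0:
                # complete line: hand over the text, then signal the end
                if index > 0:
                    self.collect(data[:index])
                self.found()
                data = data[index + len(term):]
            else:
                # no terminator yet; hold back a tail that could be the
                # first part of a terminator split across two chunks
                keep = 0
                for n in range(len(term) - 1, 0, -1):
                    if data.endswith(term[:n]):
                        keep = n
                        break
                if len(data) > keep:
                    self.collect(data[:len(data) - keep])
                self.pending = data[len(data) - keep:]
                break
```

If you feed it the chunks "220 FTP ser", "ver ready.\r", "\n331 Pass" and "word required\r\n", with callbacks that buffer and print like the channel class above, it reports two lines: "220 FTP server ready." and "331 Password required".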
The async_chat class also provides output buffering, via the push method:
```python
class channel(asynchat.async_chat):

    def found_terminator(self):
        # echo string back to sender
        self.push("echo %s\n" % self.buffer)
        self.buffer = ""
```
There’s also a push_with_producer method that takes a producer object, which can be used to generate data on the fly. Producer objects are outside the scope of this article.
The push and push_with_producer methods add data to an output queue, and the framework automatically sends data whenever the receiving end is ready.
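To see what sending data “whenever the receiving end is ready” involves, here's a hypothetical sketch of a push-style output queue. The class and method names are made up for illustration; this is not how asynchat is implemented internally. push only queues data, and an event loop would call drain whenever the socket is writable. Because a non-blocking send may accept only part of the data, the unsent remainder stays at the head of the queue:

```python
class OutputQueue(object):
    """Hypothetical sketch of push-style output buffering."""

    def __init__(self):
        self.queue = []

    def push(self, data):
        # just queue the data; nothing is sent here
        self.queue.append(data)

    def writable(self):
        # an event loop would only watch for writability when this is true
        return bool(self.queue)

    def drain(self, send):
        # send(data) returns how much was accepted, like socket.send
        while self.queue:
            data = self.queue[0]
            sent = send(data)
            if sent < len(data):
                # partial send: keep the rest for the next writable event
                self.queue[0] = data[sent:]
                return
            self.queue.pop(0)
```

Calling drain repeatedly with a send function that accepts only a few characters at a time still delivers the queued commands intact and in order.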
Using asynchat for FTP
But let’s get back to the topic for this article: doing asynchronous FTP.
The FTP server expects the client to read a response, send a command, read the next response, and so on. The found_terminator method is called for every line the server sends, including the last line of each response, so it makes sense to put the protocol logic in that method. Here’s a first attempt:
```python
import asyncore, asynchat
import re, socket

class anon_ftp(asynchat.async_chat):

    def __init__(self, host):
        asynchat.async_chat.__init__(self)
        self.commands = [
            "USER anonymous",
            "PASS anonymous@",
            "PWD",
            "QUIT"
            ]
        self.set_terminator("\n")
        self.data = ""
        # connect to ftp server
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, 21))

    def handle_connect(self):
        # connection succeeded
        pass

    def handle_expt(self):
        # connection failed
        self.close()

    def collect_incoming_data(self, data):
        # received a chunk of incoming data
        self.data = self.data + data

    def found_terminator(self):
        # got a response line
        data = self.data
        if data.endswith("\r"):
            data = data[:-1]
        self.data = ""
        print "S:", data
        if re.match(r"\d\d\d ", data):
            # this was the last line in this response
            # send the next command to the server
            try:
                command = self.commands.pop(0)
            except IndexError:
                pass # no more commands
            else:
                print "C:", command
                self.push(command + "\r\n")

anon_ftp("ftp.python.org")
asyncore.loop()
```
This class uses a predefined command list (in the commands attribute), which logs in to an FTP server as an anonymous user, fetches the name of the current directory using the PWD command, and finally logs off.
The re.match function uses a regular expression to look for a string that starts with three digits followed by a space; as we saw earlier, the server may send multiline responses, but only the last line in such a response may use a space as the fourth character.
If you run this script, it should print something like this:
```
S: 220 ProFTPD 1.2.4 Server (ftp.python.org)
C: USER anonymous
S: 331 Anonymous login ok, send your complete email address as your password.
C: PASS anonymous@
S: 230 Anonymous access granted, restrictions apply.
C: PWD
S: 257 "/" is current directory.
C: QUIT
S: 221 Goodbye.
```
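The same regular expression can also be used on its own to split a sequence of server lines into complete responses. Here's a small standalone sketch, where group_replies is a hypothetical name. It treats a line as the last line of a response when it starts with three digits followed by a space, and holds back any unfinished response:

```python
import re

# last line of a response: three digits followed by a space
LAST_LINE = re.compile(r"\d\d\d ")

def group_replies(lines):
    """Group server lines into complete responses (hypothetical helper).

    Returns (replies, current): a list of finished responses, each a list
    of lines, plus the lines of a response that hasn't ended yet.
    """
    replies, current = [], []
    for line in lines:
        current.append(line)
        if LAST_LINE.match(line):
            replies.append(current)
            current = []
    return replies, current
```

Fed the end of the session from the start of this article, it returns the "226 Transfer complete." line as one response and the three 221 lines of the QUIT summary as a second response.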
The problem with the anon_ftp class above is, of course, that the client doesn’t really look at the server responses; it keeps sending commands even if the server doesn’t allow it to log in. And even if it’s not very common, an FTP server does not have to require a password. If the USER command results in a 230 response code, the client shouldn’t send a PASS command.
In other words, you need to look at each response before you decide what to do next. One way to do this is to add explicit tests to the found_terminator code; something like this could work:
```python
last_command = None
password = "anonymous@"

def found_terminator(self):
    # got a response line
    data = self.data
    if data.endswith("\r"):
        data = data[:-1]
    self.data = ""
    if not re.match(r"\d\d\d ", data):
        return
    # this was the last line in this response
    code = data[:3]
    # check if last command needs special treatment
    if self.last_command is None:
        # handle connection
        if code == "220":
            self.last_command = "USER"
            self.push("USER anonymous\r\n")
            return
        else:
            raise Exception("ftp login failed")
    elif self.last_command == "USER":
        # handle user response
        if code == "230":
            pass # user accepted
        elif code == "331" or code == "332":
            self.last_command = "PASS"
            self.push("PASS " + self.password + "\r\n")
            return
        else:
            raise Exception("ftp login failed")
    elif self.last_command == "PASS":
        if code == "230":
            pass # user and password accepted
        else:
            raise Exception("ftp login failed")
    # send the next command to the server
    try:
        command = self.commands.pop(0)
    except IndexError:
        pass # no more commands
    else:
        # remember the command name, so its response isn't
        # mistaken for a reply to USER or PASS
        self.last_command = command.split()[0]
        self.push(command + "\r\n")
```
A more flexible (and scalable) approach is to use pluggable response handlers. The following version adds a handler attribute which, if not None, points to a method that’s prepared to look at the response to the previous command.
The ftp_handle_connect, ftp_handle_user_response, and ftp_handle_pass_response handlers take care of the login sequence.
```python
import asyncore, asynchat
import re, socket

class anon_ftp(asynchat.async_chat):

    def __init__(self, host):
        asynchat.async_chat.__init__(self)
        self.host = host
        self.user = "anonymous"
        self.password = "anonymous@"
        self.set_terminator("\n")
        self.data = ""
        self.response = []
        self.commands = ["PWD", "QUIT"]
        self.handler = self.ftp_handle_connect
        # connect to ftp server
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, 21))

    def handle_connect(self):
        # connection succeeded
        pass

    def handle_expt(self):
        # connection failed
        self.close()

    def collect_incoming_data(self, data):
        self.data = self.data + data

    def found_terminator(self):
        # collect response
        data = self.data
        if data.endswith("\r"):
            data = data[:-1]
        self.data = ""
        self.response.append(data)
        if not re.match(r"\d\d\d ", data):
            return
        response = self.response
        self.response = []
        for line in response:
            print "S:", line
        # process response
        if self.handler:
            # call the response handler
            handler = self.handler
            self.handler = None
            handler(response)
            if self.handler:
                return # follow-up command in progress
        # send next command from queue
        try:
            print "C:", self.commands[0]
            self.push(self.commands.pop(0) + "\r\n")
        except IndexError:
            pass

    def ftp_handle_connect(self, response):
        code = response[-1][:3] # get response code
        if code == "220":
            self.push("USER " + self.user + "\r\n")
            self.handler = self.ftp_handle_user_response
        else:
            raise Exception("ftp login failed")

    def ftp_handle_user_response(self, response):
        code = response[-1][:3]
        if code == "230":
            return # user accepted
        elif code == "331" or code == "332":
            self.push("PASS " + self.password + "\r\n")
            self.handler = self.ftp_handle_pass_response
        else:
            raise Exception("ftp login failed: user name not accepted")

    def ftp_handle_pass_response(self, response):
        code = response[-1][:3]
        if code == "230":
            return # user and password accepted
        else:
            raise Exception("ftp login failed: user/password not accepted")

anon_ftp("ftp.python.org")
asyncore.loop()
```
Running this, you’ll get output similar to this (note that commands sent by the response handlers are not logged):
```
S: 220 ProFTPD 1.2.4 Server (ftp.python.org)
S: 331 Anonymous login ok, send your complete email address as your password.
S: 230 Anonymous access granted, restrictions apply.
C: PWD
S: 257 "/" is current directory.
C: QUIT
S: 221 Goodbye.
```
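Because the handlers only look at the response code, the login logic can also be exercised without a server. The class below is a hypothetical, network-free re-creation of the same handler pattern, not part of the anon_ftp client. Final response lines are fed in by hand, and pushed commands are recorded in a list instead of being sent. This makes it easy to check, for instance, that a server answering USER with 230 never gets a PASS command:

```python
class ScriptedLogin(object):
    """Hypothetical, network-free version of the pluggable-handler idea."""

    def __init__(self, user, password, commands):
        self.user = user
        self.password = password
        self.commands = list(commands)
        self.sent = []                     # commands "sent" to the server
        self.handler = self.handle_connect

    def push(self, line):
        self.sent.append(line)

    def reply(self, line):
        # called once per response, with its final line
        if self.handler:
            handler, self.handler = self.handler, None
            handler(line[:3])
            if self.handler:
                return  # follow-up command in progress
        if self.commands:
            self.push(self.commands.pop(0))

    def handle_connect(self, code):
        if code != "220":
            raise Exception("ftp login failed")
        self.push("USER " + self.user)
        self.handler = self.handle_user

    def handle_user(self, code):
        if code in ("331", "332"):
            self.push("PASS " + self.password)
            self.handler = self.handle_pass
        elif code != "230":
            raise Exception("ftp login failed: user name not accepted")

    def handle_pass(self, code):
        if code != "230":
            raise Exception("ftp login failed: user/password not accepted")
```

Feeding it "220 ready", "331 send password" and "230 ok" records USER, PASS and then PWD. Feeding it only "220 ready" and "230 no password needed" records USER followed directly by PWD.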
Downloading Directory Listings
As mentioned earlier, the FTP server uses separate data channels to transfer data. The main channel is only used to issue commands, and to return responses from the server.
Let’s use the LIST command as an example. Before you can send this command, you must use PASV to set up a data channel. The server will respond with the port number to connect to, and wait for the LIST command (or any other data transfer command).
The command/response exchange might look something like:
```
C: PASV
S: 227 Entering Passive Mode (194,109,137,227,8,11).
C: LIST
S: 150 Opening ASCII mode data connection for file list
...download listing from port 8*256+11=2059...
S: 226 Transfer complete.
```
To parse the PASV response, you can use a response handler looking something like:
```python
import re

# get port number from pasv response
pasv_pattern = re.compile(r"[-\d]+,[-\d]+,[-\d]+,[-\d]+,([-\d]+),([-\d]+)")

class anon_ftp(asynchat.async_chat):

    ...

    def ftp_handle_pasv_response(self, response):
        code = response[-1][:3]
        if code != "227":
            return # pasv failed
        match = pasv_pattern.search(response[-1])
        if not match:
            return # bad port
        p1, p2 = match.groups()
        try:
            port = (int(p1) & 255) * 256 + (int(p2) & 255)
        except ValueError:
            return # bad port
        # establish data connection
        async_ftp_download(self.host, port)
```
Note that to be on the safe side, the regular expression accepts negative integers, and the port number calculation only uses eight bits from each integer.
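The effect of these precautions is easiest to see on a few inputs. This sketch wraps the pattern and arithmetic from ftp_handle_pasv_response in a standalone function (pasv_port is a hypothetical name). It then applies the function to a normal reply and to some malformed ones:

```python
import re

# pattern and arithmetic taken from ftp_handle_pasv_response above
pasv_pattern = re.compile(r"[-\d]+,[-\d]+,[-\d]+,[-\d]+,([-\d]+),([-\d]+)")

def pasv_port(line):
    """Hypothetical standalone version of the PASV port calculation."""
    match = pasv_pattern.search(line)
    if not match:
        return None  # bad port
    p1, p2 = match.groups()
    try:
        # keep only the low eight bits of each number
        return (int(p1) & 255) * 256 + (int(p2) & 255)
    except ValueError:
        return None  # e.g. a lone "-", which [-\d]+ also matches
```

The reply from the exchange above gives 2059. A negative fifth number such as -8 is masked to 248, giving 248*256 + 11 = 63499. An out-of-range 264 is masked to 8, so the result is again 2059. Either way, the port stays within 0 to 65535.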
The async_ftp_download class is another asynchronous socket class. Here’s a simple implementation that simply prints all incoming data to standard output:
```python
import asyncore, socket, sys

class async_ftp_download(asyncore.dispatcher):

    def __init__(self, host, port):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))

    def writable(self):
        # this channel only receives data
        return 0

    def handle_connect(self):
        pass

    def handle_expt(self):
        self.close()

    def handle_read(self):
        # copy whatever has arrived to standard output
        sys.stdout.write(self.recv(8192))

    def handle_close(self):
        # the server closes the data connection when the transfer is done
        self.close()
```
The last piece of the puzzle is to make sure that the ftp_handle_pasv_response method is called at the right time. The first step is to change the command list, to make sure we send PASV followed by a LIST command:
```python
self.commands = ["PASV", "LIST", "QUIT"]
```
If you run this, the client will hang after the LIST command. Or rather, it’s the server that hangs, waiting for the client to connect to the given port.
To fix this, let’s add an optional handler to the command list, and change the send code to look for an optional response handler:
```python
class anon_ftp(asynchat.async_chat):

    def __init__(self, host):
        ...
        self.commands = [
            "PASV", self.ftp_handle_pasv_response,
            "LIST",
            "QUIT"
            ]
        ...

    def found_terminator(self):
        ...
        # send next command from queue
        try:
            command = self.commands.pop(0)
            if self.commands and callable(self.commands[0]):
                self.handler = self.commands.pop(0)
            print "C:", command
            self.push(command + "\r\n")
        except IndexError:
            pass
```
If you put all the pieces together and run the script, you’ll get something like:
```
S: 220 ProFTPD 1.2.4 Server (ftp.python.org)
S: 331 Anonymous login ok, send your complete email address as your password.
S: 230 Anonymous access granted, restrictions apply.
C: PASV
S: 227 Entering Passive Mode (194,109,137,227,8,20).
C: LIST
S: 150 Opening ASCII mode data connection for file list
C: QUIT
drwxrwxr-x 4 webmaster webmaster 512 Oct 12 2001 pub
S: 226 Transfer complete.
S: 221 Goodbye.
```
In this case, the directory listing contains a single directory, called pub.
Note that this directory listing looks like the output from the Unix ls command. Unfortunately, the FTP standard doesn’t specify what format to use; servers can use any format they want, hoping that a human reader will be able to figure something out. In practice, though, most contemporary servers use the Unix format.
The following snippet can be used to “parse” the output line. It’s far from bulletproof (e.g. what happens if a filename contains a space?), but it’s better than nothing:
```python
parts = line.split()
if len(parts) >= 9:
    # e.g. "drwxrwxr-x 4 webmaster webmaster 512 Oct 12 2001 pub"
    directory = parts[0].startswith("d")
    size = int(parts[4])
    filename = parts[-1]
```
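One way to deal with spaces in filenames is to split the line at most eight times, so that the name comes out in one piece. This assumes the server uses the common nine-field Unix layout: permissions, link count, owner, group, size, month, day, time or year, and name. Here's a sketch under that assumption (parse_unix_listing_line is a hypothetical name):

```python
def parse_unix_listing_line(line):
    """Parse one line of a Unix-style LIST response (hypothetical helper).

    Assumes nine fields: permissions, links, owner, group, size, month,
    day, time-or-year, name. Splitting at most eight times keeps spaces
    inside the name. Returns None for lines that don't fit this layout.
    """
    parts = line.split(None, 8)
    if len(parts) < 9:
        return None  # e.g. a "total 8" header line
    try:
        size = int(parts[4])
    except ValueError:
        return None
    return {
        "directory": parts[0].startswith("d"),
        "size": size,
        "filename": parts[8],
    }
```

With the listing line from the session above, this returns a directory entry named pub with size 512. Servers that omit the group column, and symbolic links (which ls shows as "name -> target"), would still need extra handling.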
To be continued…
In the next article, we’ll look at how to move around between directories on the server, and how to download data from the server. Stay tuned.