SMTP Overview

Simple Mail Transfer Protocol
SMTP is the Internet protocol used to transfer electronic mail between computers, much as HTTP is the
Internet protocol used to transfer web pages between computers. As with HTTP, there has been more than one
generation of SMTP; the second generation is called ESMTP (for Extended SMTP), but the differences are not
important for this introduction.

This is a quick overview of SMTP and related concepts, explaining enough of how it works that the reader
can follow reasonable technical discussions.

In SMTP (and in the rest of this discussion), the client is the computer that is sending email, and the server is
the computer that is receiving it. Thus we say SMTP clients (or SMTP senders) send email to SMTP servers,
although the machines involved may both be servers in the general sense; for example an ISP's mail server
sending email to the SMTP servers for

The envelope versus the letter
Just like physical letters, SMTP email has two different sets of address information. The envelope headers,
like the addresses on the outside of an envelope, are used by mail transport software to route and deliver the
email. The normal headers are part of the mail message and are read and interpreted only by the user and the
user's software, just like the address attached to a salutation at the start of a physical letter. Unlike the
post office, SMTP usually throws away most of the envelope before it hands the message to the user, so many
users are not aware of the envelope headers.

In fact, SMTP never looks at the message headers at all; as far as SMTP is concerned, the email message
(headers and all) is just one big blob that it shuttles around. Many SMTP clients are perfectly happy to deliver
email with badly broken or entirely nonexistent message headers.

The SMTP protocol
Like many Internet protocols, SMTP operates by sending lines of text back and forth between the client and the
server. The client sends commands and eventually the email message, and the server sends back responses to
tell the client if the server accepted the command or if something went wrong.

Server responses always come in a special format: three digits, a space (or a dash), and then some free-format
text (in error messages, this is usually intended for users to read; otherwise it is generally just noise). If there is
a dash after the third digit instead of a space, further lines of response follow; otherwise, this is the last line. The
only really important thing about the response is the first digit, like so:

Code   Meaning
2xx    everything is fine, go on
4xx    temporary problem, try again later
5xx    permanent error, give up

Errors can happen at any time, so at any response a server can send a temporary or a permanent error instead
of the go-ahead indication the client was expecting. A proper client must be able to cope with this, retrying
temporary failures (but not too soon or too often) and giving up gracefully on permanent failures. (Tragically,
there are improper clients out there in the world.)
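
The reply format and the first-digit rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than a full client: the function names parse_reply and classify are made up for this example, and the 3xx case is included only because the 354 reply to DATA in the sample conversation needs it.

```python
def parse_reply(lines):
    """Parse one (possibly multi-line) SMTP reply into (code, text_lines).

    `lines` is any iterable of reply lines; reading stops at the first line
    whose three digits are NOT followed by a dash.
    """
    code = None
    texts = []
    for line in lines:
        line = line.rstrip("\r\n")
        digits = line[:3]
        if len(digits) < 3 or not (digits.isascii() and digits.isdigit()):
            raise ValueError(f"malformed reply line: {line!r}")
        if code is None:
            code = int(digits)
        separator = line[3:4]   # "" when the line is only three digits
        texts.append(line[4:])  # free-format text, mostly for humans
        if separator != "-":
            break               # a space (or nothing) marks the last line
    else:
        raise ValueError("reply ended without a final line")
    return code, texts


def classify(code):
    """Reduce a reply code to the only part that really matters: the first digit."""
    meanings = {2: "ok", 3: "continue", 4: "retry later", 5: "give up"}
    return meanings.get(code // 100, "unknown")


if __name__ == "__main__":
    code, text = parse_reply(["250-mail.example.com", "250 OK"])
    print(code, classify(code), text)
```

A client built on something like this would branch only on the result of classify and pass the text along to a human when something fails.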

An SMTP conversation between the client and the server goes in stages, each one initiated by the client doing
something. A typical conversation looks like:

                     Client does:                          Server normally responds with:

          Connects to the server                220 Helo there

          HELO client-hostname                  250 Pleased to meet you

          MAIL FROM:<Sender address>            250 OK

          RCPT TO:<Recipient address>           250 OK
          (May be repeated)

          DATA                                  354 Start mail input; end with <CRLF>.<CRLF>

          Sends the actual email message        (Nothing, it's waiting for the . that ends the message)

          .                                     250 OK, accepted for delivery

At this point the email message has been sent. The client can now disconnect with a QUIT command, or it can
send another email message by starting with the MAIL FROM step again (optionally sending a RSET command

The sender address is the email address that will receive email about delivery problems (mailing lists change
this but not the From: email header so that they, and not the people sending to them, get messages about
delivery problems). A special null sender address (MAIL FROM:<>) is used to signal that no one cares and no
bounce notifications should be sent. Null senders are used when sending bounce messages themselves, and
occasionally in other situations.
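
The null sender rule can be sketched as follows. This is only an illustration under simple assumptions: the envelope sender is held as a plain string with the empty string standing for the null sender, and the function names and the lists.example address are hypothetical.

```python
def bounce_envelope(original_sender):
    """Return (sender, recipients) for a bounce message, or None.

    `original_sender` is the envelope sender (MAIL FROM address) of the
    message that could not be delivered; "" stands for the null sender.
    """
    if original_sender == "":
        # Null sender: no one wants to hear about delivery problems.
        return None
    # The bounce goes to the original envelope sender and is itself sent
    # with the null sender, so a failed bounce does not cause another one.
    return ("", [original_sender])


def mail_from_line(sender):
    """Format the MAIL FROM command; the null sender becomes MAIL FROM:<>."""
    return f"MAIL FROM:<{sender}>"


if __name__ == "__main__":
    envelope = bounce_envelope("list-bounces@lists.example")
    print(envelope)
    print(mail_from_line(envelope[0]))
```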

There can be multiple recipients of the same message on the same computer. So that the actual email message
only has to be transferred once (saving bandwidth), there can be several RCPT TOs for a message. (There has to
be at least one, just like there has to be a MAIL FROM.) The client has to keep track of which recipient addresses
have problems, if any, and retry them later if necessary.
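
One way a client might keep track of per-recipient results is sketched below. The function name, the data shape (a mapping from recipient address to the reply code for its RCPT TO), and the example addresses are all assumptions made for this illustration; treating any unexpected code as a permanent failure is also a choice of this sketch.

```python
def sort_recipients(replies):
    """Split recipients by the first digit of the server's RCPT TO reply.

    `replies` maps each recipient address to the three-digit reply code the
    server gave for that recipient.
    Returns (accepted, retry_later, failed) as three lists.
    """
    accepted, retry_later, failed = [], [], []
    for recipient, code in replies.items():
        first_digit = code // 100
        if first_digit == 2:
            accepted.append(recipient)
        elif first_digit == 4:
            retry_later.append(recipient)   # temporary problem
        else:
            failed.append(recipient)        # permanent error, or unexpected
    return accepted, retry_later, failed


if __name__ == "__main__":
    results = {
        "alice@example.org": 250,
        "bob@example.org": 450,
        "carol@example.org": 550,
    }
    print(sort_recipients(results))
```

Only the retry_later list needs to be queued for another attempt; the message is sent once for everyone in accepted.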

The envelope headers are the MAIL FROM and RCPT TO parts of the SMTP conversation. The envelope sender is
the MAIL FROM address, and the envelope recipients are the RCPT TO addresses.
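
To make the envelope and message distinction concrete, here is a small sketch that produces the lines a client would send for one message. It is illustrative only: the function name and all addresses are hypothetical, server replies are ignored, and the one detail not covered in the text above is that a message line starting with a dot gets an extra dot, so that it cannot be mistaken for the lone "." that ends the message.

```python
def client_lines(helo_name, envelope_sender, envelope_recipients, message):
    """Return the lines a client sends for one message, in order.

    The envelope comes only from the function arguments; the message text,
    headers included, is passed through without being inspected.
    """
    if not envelope_recipients:
        raise ValueError("at least one RCPT TO is required")
    lines = [f"HELO {helo_name}", f"MAIL FROM:<{envelope_sender}>"]
    lines += [f"RCPT TO:<{rcpt}>" for rcpt in envelope_recipients]
    lines.append("DATA")
    for line in message.splitlines():
        # Escape leading dots so the body cannot end the message early.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")
    lines.append("QUIT")
    return lines


if __name__ == "__main__":
    message = (
        "From: someone@else.example\n"
        "To: list@lists.example\n"
        "\n"
        "Hello, list.\n"
    )
    for line in client_lines(
        "mail.lists.example",
        "list-bounces@lists.example",            # envelope sender
        ["alice@example.org", "bob@example.org"],  # envelope recipients
        message,
    ):
        print(line)
```

Note that the From: and To: lines inside the message never influence the MAIL FROM and RCPT TO commands.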

The client-hostname, the sender address, and the recipient addresses should all be fully qualified. A fully
qualified host or domain name is one that anyone on the Internet could use to look up information, not a
shortened name useful only on machines inside an organization; for example, instead of
just server. A fully qualified email address is an email address with a fully qualified host or domain name, not
just an email login; for example, MAIL FROM:<> instead of MAIL
FROM:<postmaster> or MAIL FROM:<postmaster@server>. If the host or domain name is left off an email
address, the SMTP server usually has no choice but to interpret it as an address on itself.
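
A rough check for fully qualified names might look like the sketch below. It is a heuristic only: it tests the shape of the name (at least two dot-separated labels) and cannot tell whether the name can really be looked up on the Internet. The function names and the example.com names are hypothetical.

```python
def is_fully_qualified_host(name):
    """Heuristic: a fully qualified name has at least two non-empty labels."""
    labels = name.rstrip(".").split(".")
    return len(labels) >= 2 and all(labels)


def is_fully_qualified_address(address):
    """Heuristic: a login, an "@", and a fully qualified host or domain name."""
    local, at, domain = address.rpartition("@")
    return bool(local) and at == "@" and is_fully_qualified_host(domain)


if __name__ == "__main__":
    for candidate in ("postmaster", "postmaster@server",
                      "postmaster@server.example.com"):
        print(candidate, is_fully_qualified_address(candidate))
```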

Email routing, or welcome to DNS
All of this is very well and good, but it doesn't tell us how a client machine with email to send to decides which SMTP server to deliver it to. That is decided by looking various pieces of
information up in the Domain Name System, DNS, which is another Internet protocol and system.

DNS exists to give out various sorts of information about names; you give it a name and what type of
information you want, and it tries to give you back an answer. For our purposes (and simplifying a bit), there
are three interesting types of information, conventionally called record types:

       NS records for a domain, which tell you what hostnames can give you further information about that
       domain and names inside that domain, such as MX records or A records.
       MX records for a name, which tell you what hostnames should accept SMTP email for user@name, and
       which order you should try them in.
       A records for a hostname, which give you the IP addresses associated with the host.

Reduced to its simplest form, an SMTP client with email to send to looks up NS records
until it finds the nameservers for, then asks them for the MX record for, and finally
asks for A records to determine the IP addresses of the names in the MX record. If a name has no MX record
but does have an A record, email is delivered straight to the IP addresses listed.
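
The MX and A record steps can be sketched without touching a real DNS server by using small in-memory tables. Everything here is an assumption made for illustration: the function name, the table layout, the example names, and the addresses. The NS step is left out because a DNS resolver normally handles it, and the sketch relies on the usual convention that MX hosts with lower preference numbers are tried first.

```python
def delivery_addresses(domain, mx_records, a_records):
    """Return the IP addresses to try for mail to user@domain, in order.

    mx_records: {name: [(preference, hostname), ...]}
    a_records:  {hostname: [ip_address, ...]}
    """
    mx = mx_records.get(domain)
    if mx:
        # Lower preference numbers are tried first.
        hosts = [host for _preference, host in sorted(mx)]
    elif domain in a_records:
        # No MX record but an A record: deliver straight to the name itself.
        hosts = [domain]
    else:
        return []
    addresses = []
    for host in hosts:
        addresses.extend(a_records.get(host, []))
    return addresses


# Hypothetical records, using reserved documentation names and addresses.
MX = {"example.com": [(20, "backup.example.com"), (10, "mail.example.com")]}
A = {
    "mail.example.com": ["192.0.2.10"],
    "backup.example.com": ["192.0.2.20"],
    "solo.example.net": ["198.51.100.5"],
}

if __name__ == "__main__":
    print(delivery_addresses("example.com", MX, A))
    print(delivery_addresses("solo.example.net", MX, A))
```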
