
Email marketing and encoding

Friday 31 October 2008

Yesterday I received a commercial e-mail on my Gmail account with the following subject line:

What a great way to get my attention. The attention of someone interested in encodings, that is; I doubt it will make a good impression on the general audience.

This sender, like many other commercial e-mail senders, has trouble getting encoding right. It is simple, yet not easy to do. I have written about encoding before, and Joel Spolsky has a great article about it. Encoding e-mail is even harder than encoding web pages and most other documents, because e-mail delivery is only reliable for 7-bit characters and because an e-mail is a combination of an envelope (which you do not see), a header (which you see parts of) and the contents. Each part has its own rules for encoding. Having only 7-bit channels means you can only send 128 different characters. Characters with higher code values, such as é, ø and ü, which are used in most European countries, must therefore be encoded.
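To make the 7-bit limit concrete, here is a minimal Python sketch. The helper name needs_encoding is made up for this illustration; it simply checks whether any character has a code value above 127.

```python
def needs_encoding(text):
    """Return True if any character falls outside the 7-bit range 0-127."""
    return any(ord(ch) > 127 for ch in text)


# Compare a plain ASCII letter with the three characters mentioned above.
for ch in "eéøü":
    # Show the code value and the single byte used by Windows-1252.
    print(ch, ord(ch), ch.encode("windows-1252"), needs_encoding(ch))
```

Only the plain "e" fits in 7 bits; the other three have code values above 127 and cannot be sent as-is over a 7-bit channel.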

Going back to the subject line above: the appearance of =?Windows-1252?Q? in it clearly indicates that something went wrong with the Windows-1252 encoding. Subject lines are part of the e-mail header, in which non-ASCII text must be encoded using the so-called MIME encoded-word syntax (described in RFC 2047). The format of this encoding is “=?charset?encoding?encoded text?=”, where encoding is either B for base64 or Q for a variant of quoted-printable, and the encoded text is written using the specified encoding.
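As an illustration of the format, the following Python sketch splits a single encoded-word into its three fields and decodes it. The regular expression and the function name decode_encoded_word are my own simplifications, not taken from any library; a real mail parser handles many more cases (multiple encoded-words, folding, language tags).

```python
import base64
import re

# Simplified pattern for one encoded-word: =?charset?encoding?encoded text?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")


def decode_encoded_word(word):
    """Decode one well-formed encoded-word into a Python string."""
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError("not a well-formed encoded-word: %r" % word)
    charset, encoding, payload = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # Q: "_" stands for a space, "=XX" for the byte with hex value XX.
        data = payload.replace("_", " ").encode("ascii")
        raw = re.sub(
            rb"=([0-9A-Fa-f]{2})",
            lambda m: bytes([int(m.group(1), 16)]),
            data,
        )
    return raw.decode(charset)


print(decode_encoded_word("=?windows-1252?Q?caf=E9?="))    # Q form
print(decode_encoded_word("=?windows-1252?B?Y2Fm6Q==?="))  # B form
```

Both calls decode to the same text; only the transfer encoding differs.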

At first glance there seems to be nothing wrong with the subject line “=?windows-1252?Q?Kerst_of_Oudjaar_buitenshuis_vieren?_Onze_tips_voor_de_feestdagen.?=”. A closer look reveals that the question mark in the subject text itself is the problem. Since the question mark is used as a delimiter in the encoded-word syntax, it should itself be encoded as =3F in the Q encoding. Most likely the developer of the library used to compose and encode the e-mail overlooked this part of the RFC 2047 specification.
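A rough Python sketch of what a correct Q encoder has to do is shown here. The function q_encode_word and its conservative set of literal characters are assumptions made for this example, not the code of the library that produced the e-mail. It also ignores the 75-character limit that RFC 2047 puts on a single encoded-word, so a long subject like the one above would still need to be split over several encoded-words.

```python
# Bytes that this sketch passes through unchanged; everything else,
# including "?", "=" and "_", is written as =XX.
SAFE = set(b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def q_encode_word(text, charset="windows-1252"):
    """Encode text as one Q encoded-word (no length limit handling)."""
    out = []
    for byte in text.encode(charset):
        if byte == 0x20:
            out.append("_")            # space becomes an underscore
        elif byte in SAFE:
            out.append(chr(byte))      # safe ASCII is copied literally
        else:
            out.append("=%02X" % byte)  # e.g. "?" becomes =3F
    return "=?%s?Q?%s?=" % (charset, "".join(out))


print(q_encode_word("vieren? Ja"))  # =?windows-1252?Q?vieren=3F_Ja?=
```

With the question mark written as =3F, the only question marks left in the result are the four delimiters of the encoded-word itself.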
