Here are some links about this issue and how to resolve it:

http://help.outlook.com/en-us/140/gg263346.aspx

Manage TNEF Message Formatting with Remote Domains


Do some of your users report that e-mail recipients in external domains can't open their messages that contain a
Winmail.dat attachment? If so, the recipients in the external domain are probably using an e-mail client that doesn't
support the Transport Neutral Encapsulation Format (TNEF). Microsoft Outlook is one of the few e-mail clients that
support TNEF-encoded messages, although some third-party utilities can help convert Winmail.dat attachments.

You can configure remote domains to prevent your users from sending messages that contain the Winmail.dat
attachment to all external domains or to specific external domains. The resulting messages are delivered as HTML
or plain text.

The white paper about the message formatting:

http://technet.microsoft.com/en-us/library/hh547013(v=exchg.141).aspx

Message Formatting
Contents
Introduction
Structure of a message
Message Content
Message Header
Header Folding
Message Body
Originator fields
MIME (Multipurpose Internet Mail Extensions)
Mime Headers
Mime-Version
Content-Type
Content-Transfer-Encoding
Character Encoding (charset)
References
Credits
Tech Bulletin Archive and Subscription Information

The information in this Tech bulletin is customer ready.

Introduction
When a message is sent through any messaging system, it carries certain information and structure
that allow the user agent (also referred to as the client) to display the message as it was sent.

This document discusses these headers and the structure of these messages. The structures and
formats are defined by the Internet message format and MIME specifications, in RFCs such as
RFC 2821, 2822, 2045, 2046, 2047 and 2049.

In this header discussion we used EML files because this file type gives the most complete example of a
message when accessed from its source.
Figure 1a: A way of reaching the Message Source for an EML file
(File -> Properties -> Details -> Message Source)

Figure 1b: Message Source of the same eml file

Structure of a message
The first step is to understand the format of a message.

A message consists of two parts:


a. Message envelope: This portion consists of information about transmission and delivery of the
message. It is generated by the transmission process and is not a part of the message content. The
message envelope is created by the client that submits the message and has the information
needed for successful transmission of the message. The message envelope is defined in
RFC 2821.

More information about RFC 2821 found here:

b. Message content: This is the portion that is delivered to the recipient. It has
two elements, as defined in RFC 2822. Any e-mail client uses this information to display a
message.

 • Message Header: The message header is a collection of header fields.

 • Message Body: The message body is a collection of lines of US-ASCII text. It follows the
message headers.

Before we discuss the details of the Message Header and Message Body, let us first look at a
diagram.

Figure 2: A Simple Internet Message

(Figure 2 reference found here)


Message Content

Having seen the basic format of a message, let us now look at the message content and
RFC 2822 in a bit more detail.

Message Header
 • A header field consists of a field name, followed by a colon (:) character, followed by a field body,
and terminated by a carriage return/line feed (CR/LF) character combination.

 • A field name is composed of printable US-ASCII text characters except the
colon (:) character. US-ASCII characters that have values from 33 through 57 and 59 through 126
are permitted in a header field name.

 • A field body can be composed of any US-ASCII characters except CR and LF.

Taken from: http://technet.microsoft.com/en-us/library/bb232174.aspx
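
The field-name and field-body rules above can be checked mechanically. The following is a minimal sketch, not a full RFC 2822 parser: the function name `parse_header_field` is hypothetical, and the sketch assumes the header line has already been unfolded.

```python
import re

# Field names: printable US-ASCII 33-126, excluding the colon (58),
# i.e. 33-57 (0x21-0x39) and 59-126 (0x3B-0x7E).
FIELD_NAME = re.compile(r"[\x21-\x39\x3B-\x7E]+")


def parse_header_field(line: str) -> tuple[str, str]:
    """Split one unfolded header line into (field name, field body)."""
    if not line.endswith("\r\n"):
        raise ValueError("a header field must end with CRLF")
    name, colon, body = line[:-2].partition(":")
    if not colon or not FIELD_NAME.fullmatch(name):
        raise ValueError(f"invalid field name: {name!r}")
    if "\r" in body or "\n" in body:
        raise ValueError("an unfolded field body must not contain CR or LF")
    return name, body.strip()


print(parse_header_field("Subject: This is a test\r\n"))
```

A name such as `Bad Name` is rejected because the space character (value 32) falls outside the permitted range.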

You will need a tool such as Notepad2 or Notepad++ to view the CRLF in a message header.

Figure 3: Example of a header field

Note: Notepad++ can be found here & Notepad2 can be found here

Header Folding

In header folding, a field body is split across multiple lines by inserting CR and LF. This is
known as folding, and it is done to deal with the 998/78 character limits per line (a line must not
exceed 998 characters and should not exceed 78 characters, excluding the CRLF).

A field body can contain a carriage return (CR) and a line feed (LF) only when they are used for header folding.

The general rule is that wherever this standard allows for folding white space (not simply white space
characters, hereafter referred to as WSP), a CRLF may be inserted before any WSP.

For example, the header field:

SUBJECT: THIS IS A TEST

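can be represented as:


SUBJECT: THIS
 IS A TEST

The continuation line begins with white space (WSP), which marks it as part of the folded field
rather than the start of a new header field.

More information is available in section 2.2.3 of RFC 2822. For more detailed syntax information, please refer to
sections 3 and 4.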
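
As an illustration of the folding rule, here is a minimal sketch in Python. The helper names `fold` and `unfold` are hypothetical, and the folding strategy (break only at single spaces) is a simplifying assumption, not the behaviour of any particular mail server.

```python
import re


def unfold(header: str) -> str:
    """Remove every CRLF that is immediately followed by WSP (space or tab)."""
    return re.sub(r"\r\n(?=[ \t])", "", header)


def fold(header: str, limit: int = 78) -> str:
    """Insert CRLF before a space whenever the current line would exceed `limit`."""
    words = header.split(" ")
    lines = [words[0]]
    for word in words[1:]:
        if len(lines[-1]) + 1 + len(word) > limit:
            lines.append(" " + word)   # continuation line starts with WSP
        else:
            lines[-1] += " " + word
    return "\r\n".join(lines)


folded = fold("SUBJECT: THIS IS A TEST", limit=13)
print(repr(folded))          # 'SUBJECT: THIS\r\n IS A TEST'
print(repr(unfold(folded)))  # 'SUBJECT: THIS IS A TEST'
```

Because folding only ever inserts a CRLF in front of an existing space, unfolding restores the original field exactly.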

Message Body

The message body is a collection of lines of US-ASCII text characters that appear after the message
header. The message header and the message body are separated by a blank line, that is, a line that
contains only the CR/LF character combination. The message body is optional. A line of text in the
message body must not exceed 998 characters, excluding the CR/LF. The CR and LF characters can only
appear together to indicate the end of a line.
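
The separation of header and body can be sketched as follows. This is an illustrative helper under stated assumptions: the names `split_message` and `overlong_lines` are hypothetical, and the message is assumed to use CRLF line endings throughout.

```python
def split_message(raw: str) -> tuple[str, str]:
    """Split a raw message at the first blank line into (header block, body)."""
    header, _, body = raw.partition("\r\n\r\n")
    return header, body   # body is "" when the optional body is absent


def overlong_lines(body: str, limit: int = 998) -> list[int]:
    """Return the 1-based numbers of body lines longer than `limit` characters."""
    return [n for n, line in enumerate(body.split("\r\n"), 1) if len(line) > limit]


raw = "From: <author@domain.com>\r\nSubject: Test\r\n\r\nHello.\r\nBye.\r\n"
header, body = split_message(raw)
print(header.split("\r\n"))   # ['From: <author@domain.com>', 'Subject: Test']
print(repr(body))             # 'Hello.\r\nBye.\r\n'
print(overlong_lines(body))   # []
```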

Figure 4: A message with headers and body


Originator fields

The originator fields of a message consist of the From field, the Sender field (when applicable) and
an optional Reply-To field. If the From field lists more than one mailbox, then a Sender field must be
present. The “From” field identifies the author or authors of the message, while the “Sender” field
identifies the mailbox that actually sends the message. If the author and the transmitter of the
message are the same, the “Sender” field should not be used.

In Exchange, sender and from fields are seen in envelope journaled messages and also in Send on Behalf
of messages.

An example of these fields as seen in one message:

FROM: <AUTHOR@DOMAIN.COM>
SUBJECT: TESTING SENDER-FIELD
SENDER: <SENDER@DOMAIN.COM>
TO: <XYZ@DOMAIN.COM>
CC: <ABC@DOMAIN.COM>
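
The originator-field rules can be checked with Python's standard `email` package. This is a rough sketch: the function `check_originator` and its return strings are hypothetical, and the sample addresses are placeholders.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def check_originator(msg) -> str:
    """Apply the From/Sender rules described above to a parsed message."""
    authors = [addr for _, addr in getaddresses(msg.get_all("From", []))]
    sender = parseaddr(msg.get("Sender", ""))[1]
    if len(authors) > 1 and not sender:
        return "Sender is required when From lists more than one mailbox"
    if len(authors) == 1 and sender == authors[0]:
        return "Sender is redundant when it equals the single From mailbox"
    return "ok"


raw = (
    "From: <author@domain.com>\r\n"
    "Sender: <sender@domain.com>\r\n"
    "To: <xyz@domain.com>\r\n"
    "Subject: Testing Sender-Field\r\n"
    "\r\n"
)
print(check_originator(message_from_string(raw)))   # ok
```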

Another example from journaling:

MIME-VERSION: 1.0
FROM: TESTE2K31 <TESTE2K31@SFDOMAIN.LCOAL>
SENDER: <MICROSOFTEXCHANGE329E71EC88AE4615BBC36AB6CE41109E@TESTINGDOMAIN.LOCAL>
TO: UKP <UKP@SFDOMAIN.LCOAL>
SUBJECT: E2K3 TO E2K7
MESSAGE-ID: <81DEE15F-8A91-4660-BDB5-B16A8547F067@JOURNAL.REPORT.GENERATOR>
DATE: FRI, 5 FEB 2010 06:28:49 +0530
CONTENT-TRANSFER-ENCODING: BINARY
X-MS-JOURNAL-REPORT:
RETURN-PATH: <>
X-ORIGINALARRIVALTIME: 05 FEB 2010 00:58:49.0482 (UTC) FILETIME=[609F76A0:01CAA5FE]

MIME (Multipurpose Internet Mail Extensions)


In today's world, messages are not just sent as US-ASCII text. Multipurpose Internet Mail Extensions
(MIME) defines standards for encoding other kinds of content in these messages.

MIME defines a message format that allows for:

 • Textual message bodies in character sets other than US-ASCII.

 • Non-textual message bodies.

 • Multipart message bodies.

 • Textual header information in character sets other than US-ASCII.

Figure 5: A Simple MIME Message


(Figure 5 reference found here)
This example shows the use of a MIME message to send a text message and an attached text file. Both
are body parts of this message.

The MIME-Version header informs the receiving client to treat this as a MIME message.

Because the content type is multipart, a boundary is present. It tells the receiving client that the
message has multiple parts, separated by the string defined in boundary=. A MIME-compliant client
will only display or otherwise process content that follows the specified boundary= text
string. Boundary lines are constructed from the boundary= string, prepended by -- (double dash). The
final body part is followed by the boundary= string with the -- (double dash) both prepended and
appended.

Defining a boundary:

Content-Type: multipart/mixed; boundary="XXXXboundary text"

Defining a body part:

--XXXXboundary text

Body part of the message:

--XXXXboundary text

Ending a boundary:

--XXXXboundary text--

A MIME-aware client will not display “This is a multipart message in MIME format” because that text
appears before the first boundary line and is therefore not part of any body part.
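
The boundary mechanics can be observed with Python's standard `email` parser. The sketch below builds a small multipart/mixed message by hand; the body texts and the file name `test.txt` are made-up sample values.

```python
import email

raw = (
    "MIME-Version: 1.0\r\n"
    'Content-Type: multipart/mixed; boundary="XXXXboundary text"\r\n'
    "\r\n"
    "This is a multipart message in MIME format.\r\n"
    "\r\n"
    "--XXXXboundary text\r\n"
    "Content-Type: text/plain\r\n"
    "\r\n"
    "this is the body text\r\n"
    "\r\n"
    "--XXXXboundary text\r\n"
    "Content-Type: text/plain\r\n"
    'Content-Disposition: attachment; filename="test.txt"\r\n'
    "\r\n"
    "this is the attachment text\r\n"
    "\r\n"
    "--XXXXboundary text--\r\n"
)

msg = email.message_from_string(raw)
print(msg.get_boundary())     # XXXXboundary text
print(repr(msg.preamble))     # text before the first boundary line
for part in msg.get_payload():
    # Each body part has its own headers and payload.
    print(part.get_content_type(), part.get_filename(), repr(part.get_payload().strip()))
```

The parser exposes the text before the first boundary as `msg.preamble`, separately from the two body parts, which is why a MIME-aware client does not show it.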

Mime Headers
These headers appear at the beginning of a MIME message as well as within the separate body parts.
Some of them can be used both as message headers and in MIME body parts. Some headers are
defined for use only in body parts.

The following headers are defined in MIME:

 • MIME-Version

 • Content-Type

 • Content-Transfer-Encoding

 • Content-ID

 • Content-Disposition

Note: Headers that begin with "Content-" are the only headers that have defined meaning in body
parts.

Mime-Version
This header normally appears once, among the top-level headers of the message. The exception is
message/rfc822, where the encapsulated message also has its own MIME-Version header. MIME-aware
e-mail clients use this header field to identify a MIME-encoded message. When this header field is
absent, MIME-aware e-mail clients identify the message as plain text.
"MIME-Version: 1.0" is the only value presently accepted.

Content-Type
This header is what gives a MIME-encoded message its power. It specifies the media type and
subtype of the data in the body of a message. The header field identifies the media type of the message
content as described in RFC 2046. A media type consists of a type, a subtype, and one or more optional
parameters, such as a charset= parameter that defines the MIME character encoding. Subtypes can also
have values such as X-Something or x-something, although these are not standard. Vendor-specific media
types begin with vnd. The Internet Assigned Numbers Authority (IANA) maintains a list of registered
media types.
Examples of standard content types:
multipart/mixed
multipart/alternative

Examples of x-something content types:


application/x-dvi: TeX device-independent (DVI) files
application/x-rar-compressed: RAR archive files

Examples of vendor specified content-type


application/vnd.openxmlformats-officedocument.presentationml.presentation for pptx files

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet for xlsx files

Note: Both are Office 2007 content-types.
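
To see how a Content-Type value breaks down into type, subtype and parameters, the header can be parsed with Python's standard `email` package. This is a small sketch; the helper `classify_subtype` and its labels are hypothetical and only mirror the x- and vnd. naming conventions described above.

```python
from email import message_from_string

part = message_from_string(
    'Content-Type: text/plain; charset="iso-8859-1"\r\n'
    "\r\n"
    "body\r\n"
)
print(part.get_content_maintype())   # text
print(part.get_content_subtype())    # plain
print(part.get_param("charset"))     # iso-8859-1


def classify_subtype(content_type: str) -> str:
    """Label a media type by the prefix of its subtype."""
    subtype = content_type.partition("/")[2].lower()
    if subtype.startswith("vnd."):
        return "vendor"
    if subtype.startswith("x-"):
        return "x-something"
    return "other"


print(classify_subtype("application/x-rar-compressed"))
print(classify_subtype("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"))
```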


A content type can be either composite or discrete.

The two composite top-level media types are:

 • Message

 • Multipart

The five discrete top-level media types are:

 • Text

 • Image

 • Audio

 • Video

 • Application

Let us discuss a few of the above with examples.

Composite media types

Message
The message content-type allows messages to contain other messages or pointers to other messages.

message/delivery-status [RFC1894]:

The message/delivery-status content type is defined for use in message delivery status notifications (DSNs),
allowing automated information transmission.

The example shows a DSN from a gateway to a foreign system: a delivery report generated by Message Router (MAILBUS) and
gatewayed by PMDF_MR to a DSN. In this case the gateway did not have sufficient information to supply
an original-recipient address.

Figure 6a: An Example of message/delivery-status

message/external-body [RFC2046]:
The message/external-body content type allows the contents of a message to be external to the
message and only referenced in the message. The only required parameter of this content type is
access-type, which can have values such as "FTP" and "LOCAL-ACCESS." If values are used that have not
been registered with IANA, then they begin with "x-". Message/external-body parts must include a
Content-ID header field with a unique identifier to reference the external data.

Figure 6b: An Example of message/external-body

The Exchange Protocol Document [MS-OXCMAIL]: RFC2822 and MIME to E-Mail Object Conversion
Protocol Specification specifies the following:

4.5 Considerations for Message/External-Body

The original MIME RFC [RFC1521] allowed the body of an entity to be referenced externally rather than
requiring it to be inline. The current MIME RFC [RFC2046] specifies the form of this construct; the
security implications are as follows:

1. The blind retrieval of the content by the client can disclose information about the recipient.

2. The authentication mechanism tied to the retrieval (access-type parameter) can result in a pop-
up dialog box, leading the user to expose credential information.

3. The server (Policy or delivery application) that is attempting to check the content opens up a
denial of service vector for the remote host to tie up server resources.

message/partial [RFC2046]:
The message/partial content type allows for large messages to be broken up into smaller messages. The
full message can then be put back together by the client or User Agent(UA). Only 7bit content-transfer-
encoding is allowed for this content type.
Three parameters are required:
1. ID: a unique identifier used to match up the pieces.
2. Number: an integer identifying which piece of the message this is.
3. Total: an integer indicating the total number of parts the message has. This parameter is
required only on the final fragment of the message, but should be used on all parts.
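
How a client might use the three parameters to put the pieces back together can be sketched as below. This is an illustration only: the `Fragment` class and `reassemble` function are hypothetical, the fragments are assumed to have been parsed already, and the header-merging rules that RFC 2046 defines for the reassembled message are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fragment:
    id: str                 # the ID parameter shared by all pieces
    number: int             # 1-based position of this piece
    total: Optional[int]    # may be absent on all but the final piece
    body: str               # this piece's share of the original message


def reassemble(fragments: list[Fragment]) -> str:
    """Concatenate fragment bodies in order after checking id, number and total."""
    if len({f.id for f in fragments}) != 1:
        raise ValueError("fragments must all carry the same id")
    totals = {f.total for f in fragments if f.total is not None}
    if len(totals) != 1:
        raise ValueError("total is missing or inconsistent")
    total = totals.pop()
    by_number = {f.number: f for f in fragments}
    if set(by_number) != set(range(1, total + 1)):
        raise ValueError("a fragment is missing or out of range")
    return "".join(by_number[n].body for n in range(1, total + 1))


pieces = [
    Fragment("ABC@host.com", 2, 2, "second half"),
    Fragment("ABC@host.com", 1, None, "first half, "),
]
print(reassemble(pieces))   # first half, second half
```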

Figure 7a: First part of message/partial


Figure 7b: Second part of message/partial

Figure 7c: Client or User Agent (UA) reassembling the message

Exchange 2007 does not support message/partial. When such messages are sent, an NDR “5.6.1 Messages
of type message/partial are not supported” is generated.

The Exchange Protocol Document [MS-OXCMAIL]: RFC2822 and MIME to E-Mail Object Conversion
Protocol Specification gives the following reasons:

2.3.2 Message/Partial

The message/partial content type is not supported. <198>MIME readers MUST reject messages
that contain MIME entities with a message/partial Content-Type header field. This is to prevent
virus scanning from being defeated by splitting up attachment content.

4.4 Do Not Support Message/Partial

A Content-Type of message/partial allows large messages to be sent in pieces and re-assembled
by the client. It was originally designed to work around transmission failures during slow
delivery causing the complete message to be resent from scratch, and to work around message
size restrictions of implementations of protocols like SMTP. With increased bandwidth speeds,
and greater connectivity, the long transmission times are more a thing of the past. Continued
support for this Content-Type allows an avenue for content that is inappropriate to reach (or
leave) the e-mail client's computer. This could include things such as "Information disclosure" of
proprietary information, unsolicited commercial e-mail (spam), and computer virus
attachments.

E-mail servers attempt to protect their users from inappropriate content by implementing Policy
applications that run as part of the protocol. For them to work efficiently, the complete content
is incorporated into one message. For this reason, servers need to prohibit sending or receiving
messages with a Content-Type of "message/partial".

message/rfc822:
The message/rfc822 content type is used to enclose a complete message within a message. It is
different from other MIME body parts in that it must be a fully formed RFC822 message, complete with
headers.
Figure 8: Example of message/rfc822

Message/rfc822 content type is also used by envelope journal messages.

There are a few limitations with message/rfc822 messages:

Exchange 2007 Journal Reports lose header information in Outlook client when you configure
Exchange Server 2007 to deliver journal reports to an Exchange 2003 mailbox
http://support.microsoft.com/kb/972524

Also, if an EML file is attached to a MAPI message, the content type of the attachment changes from
MESSAGE/RFC822 to APPLICATION/OCTET-STREAM and the entire attached message is base64 encoded.
A few clients, such as Pine, Simeon and Netscape, cannot open the message but are able to save
attachments. Outlook Express is able to open this message successfully. This was done to
distinguish whether the sent message has an EML or an MSG attachment.

Multipart
Multi-part Content-Type headers identify multipart messages. They require that a subtype and other
elements be included in the header.

multipart/alternative:

This content type is used to specify the same content in different body parts in different forms. The
parts are placed in increasing order of complexity.

Figure 9: Example of Multipart/Alternative

In the above example there are three body parts:

1. Text/plain

2. Text/enriched

3. Application/x-whatever

In multipart/alternative the same content is given in all of these body parts, each in its own format. The
message builds from the least complex to the most complex body part. This is done so that non-MIME clients
can fall back on text/plain while MIME-aware clients can display the most complex body part they
support.
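
The least-to-most-complex ordering can be reproduced with Python's standard `email.message.EmailMessage`. This is a sketch with made-up subject and body text; it only shows the generic MIME structure, not how Exchange builds the message.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Alternative example"
msg.set_content("Hello in plain text.")                        # least complex part first
msg.add_alternative(
    "<html><body><p>Hello in <b>HTML</b>.</p></body></html>",  # richer part last
    subtype="html",
)

print(msg.get_content_type())                          # multipart/alternative
print([p.get_content_type() for p in msg.iter_parts()])

# A client picks the richest part it supports.
html_capable = msg.get_body(preferencelist=("html", "plain"))
text_only = msg.get_body(preferencelist=("plain",))
print(html_capable.get_content_type(), text_only.get_content_type())
```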

In Exchange 2003 multipart/alternative messages can be sent by selecting the following settings:

Note: When Multipart/Alternative is sent using Exchange it includes Text/plain and Text/Html
body parts.
Figure 10: Sending multipart/alternative messages in Exchange 2003

In Exchange 2007, the Exchange Management Shell is used to enable sending messages in
Multipart/Alternative format.

Figure 11a: Sending multipart/alternative messages in Exchange 2007


Multipart/Mixed:

This is the most commonly used content-type when sending emails with body and attachments.

The multipart/mixed content type is used when the body parts are independent and need to be bundled
in a particular order. When a client does not recognize a multipart subtype, it will treat the message as
multipart/mixed.

Multipart/mixed specifies that the order of the body parts is important.

Figure 11b: Example of Multipart/Mixed

Multipart/mixed content-type scenarios:

1. When sending a message with attachments: The body part that holds the message body should be
presented first, followed by the attachment. This is exactly as given in the above example.

If the order is reversed, that is, the attachment body part is specified before the message body part,
the body of the message will be shown as an attachment to the message.

The body of a message is shown incorrectly as an attachment if you try to use an application in
an Exchange Server environment to send a message that includes attachments

http://support.microsoft.com/kb/969854

2. When sending a message with an inline attachment and a regular attachment: If the message has an
inline attachment body part and an attachment body part, then the inline body part should appear before the
attachment body part.

Figure 12: Example of multipart/mixed with inline attachment

3. When the message contains two or more attachments: If the message contains multiple
attachments but does not have an inline attachment then the order of attachments body part is
not important.
Figure 13: Example of multipart/mixed with two attachments
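
The "body first, attachment second" ordering from scenario 1 can be sketched with Python's standard `email.message.EmailMessage`. The addresses, file name and attachment bytes below are placeholder values.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "author@example.com"
msg["To"] = "xyz@example.com"
msg["Subject"] = "Report attached"

# The message body is added first, so it becomes the first body part.
msg.set_content("Please find the report attached.")

# The attachment follows as the second body part.
msg.add_attachment(
    b"col1,col2\r\n1,2\r\n",
    maintype="application",
    subtype="octet-stream",
    filename="report.csv",
)

print(msg.get_content_type())                            # multipart/mixed
print([p.get_content_type() for p in msg.iter_parts()])
print([a.get_filename() for a in msg.iter_attachments()])
```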

multipart/digest:

The multipart/digest content type is used to send collections of plain-text messages. It is constructed in
the same way as the multipart/mixed content type, but each body part is expected to be of
content type message/rfc822.

Figure 14: Example of Multipart/Digest

Figure 15: Details of the same message

multipart/parallel:

The purpose of the multipart/parallel content type is to display all of the parts simultaneously on
hardware and software that can do so. For instance, an image file can be displayed while a sound file is
playing.
Figure 16: Example of multipart/parallel message

Syntactically, multipart/mixed and multipart/parallel are identical. For comparison, the example above
shows both multipart/mixed and multipart/parallel in the same message.

multipart/related:

This is the most commonly used content-type after multipart/mixed. This is used mostly with HTML
data.

The multipart/related content type is used for compound documents, those messages in which the
separate body parts are intended to work together to provide the full meaning of the message.

Additionally, multipart/related can be used to provide links to content not contained within the message
or to reference an object that is present in the message by using the Content-ID header. Multipart/related
can be used for compound documents where the object is built progressively from pieces, starting with
the "root" body part as specified in the start parameter. If the start parameter is not specified, then the
first body part is considered the starting point or "root" body part. Multipart/related requires a type
parameter. The type parameter specifies the content type of the first or "root" part. Multipart/related
processing takes precedence over Content-Disposition.

Many MIME user agents do not recognize multipart/related and treat these messages as
multipart/mixed. To allow for this, messages will include the technically unnecessary Content-
Disposition header in multipart/related body parts.

The Content-Location and Content-Base headers are used to reference links that are external to the
body of the message.

Figure 17: Example of multipart/related message when Content-Base and Content-Location headers are used


Figure 18: Example of multipart/related message
Figure 19: Details of the same message which is shown in figure 18

Note: The Content-ID header is specified in the attachment body part. The body part that uses the
attachment references it with a cid: URL.
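
The Content-ID/cid relationship can be sketched with Python's standard `email` package. The image bytes are a placeholder, not a real PNG, and the domain `example.com` is an assumption made only so that `make_msgid` generates an identifier without a host-name lookup.

```python
from email.message import EmailMessage
from email.utils import make_msgid

cid = make_msgid(domain="example.com")   # e.g. '<...@example.com>'

msg = EmailMessage()
msg["Subject"] = "Inline image example"
# The HTML root part refers to the image by a cid: URL (angle brackets removed).
msg.set_content(
    f'<html><body><img src="cid:{cid[1:-1]}"></body></html>',
    subtype="html",
)
# The related part carries the matching Content-ID header.
msg.add_related(b"placeholder image bytes", maintype="image", subtype="png", cid=cid)

root, image = msg.iter_parts()
print(msg.get_content_type())       # multipart/related
print(image["Content-ID"] == cid)   # True
print(image["Content-Disposition"]) # inline
```

Note that the library adds Content-Disposition: inline to the related part, which matches the practice described above of including that header for clients that treat the message as multipart/mixed.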

The most frequently referenced article for multipart/related messages is:


You cannot use an Outlook 2007 client to display or download an attachment when you access a
message that includes an inline attachment from Exchange Server 2007

http://support.microsoft.com/default.aspx?scid=kb;EN-US;954684

multipart/report:

The multipart/report content type was defined for returning delivery status reports, with optional
included messages. It is finding wider use in machine-to-machine communication. The multipart/report
content type is also used for Message Disposition Notifications.

An example of this is given in Figure 6a: An Example of message/delivery-status.

Discrete media types

Text:
The text content type is used for message content that is primarily in human-readable text character
format. The more complex text content types are defined and identified so that an appropriate tool can
be used to display that body part.

text/enriched:

The text/enriched content type is intended to be simple enough to make multi-font, formatted e-mail
widely readable. It uses a very limited set of formatting commands that all begin with <commandname>
and end with </commandname>, affecting the formatting of the text between those two tokens.

Figure 20: Example of text/enriched

text/html:

The text/html content type is an Internet Media Type as well as a MIME content type. Using HTML in
MIME messages allows the full richness of Web pages to be available in e-mail.
Figure 21: An example of HTML email

Figure 22: Details of the same email

The global structure of an HTML document

An HTML document consists of three parts:

1. A line containing HTML version information

2. A declarative header section

3. A body, which contains the document's actual content. The body may be implemented by the
BODY element or the FRAMESET element.

Figure 23: An example of HTML data

Reference:

The global structure of an HTML document

http://www.w3.org/TR/html401/struct/global.html

text/plain: The text/plain content type is the generic subtype for plain text. It is the default
specified by RFC 822.
Figure 24: An example of text/plain message

text/rfc822-headers: The text/RFC822-headers content type provides a mechanism for an MTA to label
and return only the RFC 822 headers of a failed message. Only the headers are returned, not the
complete message. The returned headers are useful for identifying the failed message and for
diagnosing delivery problems. All headers are to be returned, up to the blank line following the headers.

Figure 25: An example of text/rfc822-headers

UUENCODE Attachment Format: The Unix-to-Unix encode (UUENCODE) format provided one of the
earliest ways to add attachments to messages. In the UUENCODE format, attachments are appended to
the message body after being encoded using the UUENCODE algorithm. Each attachment is preceded by
a begin line that carries the file name and is followed by the encoding end string. Multiple attachments
are individually appended in sequence and separated by a blank line. In the UUENCODE attachment format,
the message body consists of only two basic parts: the message text and the message attachments.

Figure 26: Message with uuencode attachment

Figure 27: Details of the uuencode message


The format of the uuencode message is as follows:

1. The uuencoded data starts with

begin <mode> <file>

<mode> is the file's Unix read/write/execute permissions as three octal digits

<file> is the name to be used when recreating the binary data

2. The data ends with two trailer lines:

`
end

3. Each data line except the last begins with M, indicating that the line carries 45 bytes of encoded data.

4. The grave accent character “`” is used in place of space characters.
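
The format described above can be reproduced with Python's standard `binascii` module, which encodes one line of up to 45 bytes at a time. This is a sketch: the wrapper function `uuencode`, the default mode and the sample file name are hypothetical, and the `backtick` option requires Python 3.7 or later.

```python
import binascii


def uuencode(data: bytes, filename: str, mode: int = 0o644) -> str:
    """Wrap `data` in begin/end lines, 45 input bytes per encoded line."""
    lines = [f"begin {mode:o} {filename}"]
    for start in range(0, len(data), 45):
        chunk = data[start:start + 45]
        # backtick=True writes "`" instead of a space for zero-valued groups.
        encoded = binascii.b2a_uu(chunk, backtick=True)
        lines.append(encoded.decode("ascii").rstrip("\n"))
    lines.append("`")     # first trailer line: a zero-length data line
    lines.append("end")   # second trailer line
    return "\n".join(lines) + "\n"


text = uuencode(bytes(range(100)), "sample.bin")
print(text)
```

Full 45-byte lines start with M because the first character encodes the line length (32 + 45 = 77, which is "M" in US-ASCII).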

MIME clients do not recognize the above format and expect the message to be in the correct MIME
format.

MIME clients expect the message to be multipart/mixed. It should have a text/plain body, and the
attachment body part should be uuencoded.

Limitations:

Although the UUENCODE format provides a way to add attachments to messages, it does not define
ways to:

 • Indicate the type of the attachment, except through the file's extension.

 • Specify alternate character encoding for the message text to support international languages.

 • Relate groups of attachments.

 • Indicate that the message text is a form of rich text, such as HTML or Rich Text Format (RTF)
formatted text.

 • Provide future enhancements to a structure of complex message bodies. The UUENCODE
attachment format is neither flexible nor descriptive.

Figure 28: Mime formatted uuencoded message

References:

2.3.1 Analysis of Non-MIME Content


http://msdn.microsoft.com/en-us/library/ee202146(EXCHG.80).aspx

UUENCODE Attachment Format


http://msdn.microsoft.com/en-us/library/aa579638.aspx

Uuencoding
http://en.wikipedia.org/wiki/Uuencode

Transport Neutral Encapsulation Format (TNEF):


A TNEF message contains a plain text version of the message and an attachment that packages the
original formatted version of the message. The attachment is named Winmail.dat.

The Winmail.dat attachment includes the following information:

 • The original formatted version of the message, including, for example, fonts, text sizes, and
text colors

 • OLE objects, including, for example, embedded pictures or embedded Microsoft Office
documents

 • Special Outlook features, including, for example, custom forms, voting buttons or meeting
requests

 • Regular message attachments that were in the original message

The resulting plain text message can be represented in the following formats:

 • An RFC 2822-compliant message composed of only US-ASCII text

 • A multipart MIME-encoded message that has a Winmail.dat attachment

Encoding options for Winmail.dat:

Figure 29: Winmail.dat encoded in Mime and Uuencode

Winmail.dat can be MIME encoded or uuencoded. A TNEF-aware client will be able to decode either form
successfully.

Reference:

Description of Transport Neutral Encapsulation Format (TNEF) in Outlook 2000


http://support.microsoft.com/kb/241538

Figure 30: Example of MS-Tnef message
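
A receiving application that wants to flag TNEF content could look for the Winmail.dat attachment in a MIME message. The following is a minimal sketch using Python's standard `email` package. It assumes the attachment is labelled with the media type application/ms-tnef or the file name Winmail.dat; the function name `has_tnef_attachment` and the sample payload are hypothetical.

```python
from email.message import EmailMessage, Message


def has_tnef_attachment(msg: Message) -> bool:
    """Return True if any body part looks like a TNEF (Winmail.dat) attachment."""
    for part in msg.walk():
        filename = (part.get_filename() or "").lower()
        if part.get_content_type() == "application/ms-tnef" or filename == "winmail.dat":
            return True
    return False


sample = EmailMessage()
sample["Subject"] = "TNEF example"
sample.set_content("Plain text version of the message.")
sample.add_attachment(
    b"placeholder, not real TNEF data",
    maintype="application",
    subtype="ms-tnef",
    filename="winmail.dat",
)
print(has_tnef_attachment(sample))   # True
```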

Summary Transport Neutral Encapsulation Format (STNEF):

STNEF messages are encoded differently than TNEF messages.

 • They are always MIME-encoded messages

 • They have Content-Transfer-Encoding: binary

 • They have no plain-text body and no distinct Winmail.dat attachment

 • They travel using the BDAT command instead of the DATA command

 • They can only be transferred between SMTP messaging servers that support and advertise the
BINARYMIME and CHUNKING SMTP extensions as defined in RFC 3030

Figure 31: Example of STNEF message

STNEF is understood by Exchange 2000 and later versions. STNEF is automatically used by Exchange if
the following conditions are true:

 • Exchange 2000: STNEF is used for messages that are transferred between Exchange servers that
are in the same routing group. An unsupported hotfix also enables Exchange 2000 to use STNEF
for messages that are transferred between Exchange servers in different routing groups.

 • Exchange 2003: If the Exchange organization is in native mode, STNEF is used for all messages
that are transferred between Exchange servers in the organization.

 • Exchange 2007: STNEF is used for all messages that are transferred between Exchange servers
in the organization.

Exchange never sends STNEF messages to external recipients. Only TNEF messages can be sent to
recipients outside the Exchange organization.

Reference:

Understanding Content Conversion


http://technet.microsoft.com/en-us/library/bb232174(EXCHG.80).aspx

Content-Transfer-Encoding

This header field can describe the following information about a message:

 • The encoding algorithm that was used to transform any non-US-ASCII text or binary data that
exists in the message body.

 • An indicator that describes the current condition of the message body.

There can be multiple values of the Content-Transfer-Encoding header field in a MIME message. When
the Content-Transfer-Encoding header field appears in the message header, it applies to the whole body
of the message. When the Content-Transfer-Encoding header field appears in one of the parts of a
multipart message, it applies only to that part of the message.

The purpose of encoding is to convert the data into US-ASCII. This is required so that the data can pass through
SMTP hosts successfully. Many old SMTP messaging servers only support US-ASCII messages.

The content-transfer-encoding values that can be used in Internet SMTP messages fall into three groups:

1. 7bit: the default value. The content is already 7-bit US-ASCII text and no encoding has been applied.

2. base64 and quoted-printable: encoding schemes that ensure that the content will
properly pass through all messaging servers.

3. 8bit and binary: values defined to explicitly identify content
which may require processing or encoding before being packaged for Internet transfer.

Base64:

The Base64 Content-Transfer-Encoding is designed to represent arbitrary sequences of octets in a form
that need not be humanly readable. The encoding and decoding algorithms are simple, but the encoded
data are consistently only about 33 percent larger than the un-encoded data. This encoding is virtually
identical to the one used in Privacy Enhanced Mail (PEM) applications, as defined in RFC 1421.

Note: The base64 encoding is adapted from RFC 1421, with one change: base64 eliminates the
"*" mechanism for embedded clear text.
Base64 processes data as 24-bit groups, mapping this data to four encoded characters. It is sometimes
referred to as 3-to-4 encoding. Each 6 bits of the 24-bit group is used as an index into a mapping table
(the base64 alphabet) to obtain a character for the encoded data. The encoded data has line lengths
limited to 76 characters.

An Example of Base64 encoding:

Figure 32: US-ASCII code chart

Each US-ASCII character is represented in binary by seven bits, written as b7 b6 b5 b4 b3 b2 b1.

Reference:

ASCII
http://en.wikipedia.org/wiki/ASCII

Process of conversion for three US-ASCII characters:


XCO

Step 1: Find the binary equivalent (the table is seven bit but it has to be represented in eight bits)
X is 01011000
C is 01000011
O is 01001111

Step 2: Represent the binary data from left to right


XCO will be represented as 010110000100001101001111

Step 3: Divide the binary data into group of six


010110 000100 001101 001111

Step 4: Convert the six bit to a decimal value.


22 4 13 15

Step 5: Look up in the base 64 alphabet table


WENP

Final Outcome: The base 64 encoding data for XCO is WENP.
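
The five steps above can be reproduced in a few lines of Python and checked against the standard `base64` module. This is a sketch that only handles input whose length is a multiple of three characters, so no padding is involved; the function name `base64_steps` is hypothetical.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_steps(text: str):
    """Return the intermediate results of steps 2 to 5 for `text`."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("ascii"))  # steps 1-2
    groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]        # step 3
    indexes = [int(group, 2) for group in groups]                   # step 4
    encoded = "".join(ALPHABET[i] for i in indexes)                 # step 5
    return bits, groups, indexes, encoded


bits, groups, indexes, encoded = base64_steps("XCO")
print(bits)      # 010110000100001101001111
print(groups)    # ['010110', '000100', '001101', '001111']
print(indexes)   # [22, 4, 13, 15]
print(encoded)   # WENP
print(base64.b64encode(b"XCO").decode("ascii"))   # WENP
```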

The base64 character table is shown below:


Figure 33: Base64 character table

Reference:

Base64
http://en.wikipedia.org/wiki/Base64

Padding:

Special processing is performed if fewer than 24 bits are available at the end of the data being encoded.
A full encoding quantum is always completed at the end of a body. When fewer than 24 input bits are
available in an input group, zero bits are added (on the right) to form an integral number of 6-bit groups.
Padding at the end of the data is performed using the '=' character.

Since all base64 input is an integral number of octets, only the following cases can arise:

1. The final quantum of encoding input is an integral multiple of 24 bits; here, the final unit of
encoded output will be an integral multiple of 4 characters with no '=' padding.

2. The final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output
will be two characters followed by two '=' padding characters.

3. The final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output
will be three characters followed by one '=' padding character.
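
The three padding cases can be checked with Python's standard `base64` module. The sketch below is illustrative only; the helper `padding_count` is a hypothetical name that derives the number of '=' characters from the input length.

```python
import base64


def padding_count(data: bytes) -> int:
    """Number of '=' characters base64 appends for an input of this length."""
    leftover = len(data) % 3      # octets in the final, incomplete 24-bit group
    return (3 - leftover) % 3     # 0 octets -> 0, 1 octet -> 2, 2 octets -> 1


results = {}
for sample in (b"XCO", b"X", b"XC"):
    encoded = base64.b64encode(sample).decode("ascii")
    results[sample] = encoded
    print(sample, encoded, padding_count(sample))
```

A 24-bit input (`XCO`) needs no padding, an 8-bit input (`X`) ends in two '=' characters, and a 16-bit input (`XC`) ends in one.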

Quoted-printable:

The Quoted-Printable encoding is intended to represent data that largely consists of octets that
correspond to printable characters in the ASCII character set. It encodes the data in such a way that the
resulting octets are unlikely to be modified by mail transport. If the data being encoded is mostly ASCII
text, the encoded form of the data remains largely recognizable by humans. A body which is entirely
ASCII may also be encoded in Quoted-Printable to ensure the integrity of the data should the message
pass through a character-translating and/or line-wrapping gateway. All printable US-ASCII text
characters except the equal sign character “=” can be represented without encoding.

An Example of Quoted-Printable:

PLEASE CONSIDER THE ENVIRONMENT BEFORE PRINTING THIS E-MAIL. PLEASE CONSIDER THE
ENVIRONMENT BEFORE PRINTING THIS E-MAIL. PLEASE CONSIDER THE ENVIRONMENT BEFORE PRINTING
THIS E-MAIL.

This can be represented, in the Quoted-Printable encoding, as

PLEASE CONSIDER THE ENVIRONMENT BEFORE PRINTING THIS E-MAIL. PLEASE CONSIDE=
R THE ENVIRONMENT BEFORE PRINTING THIS E-MAIL. PLEASE CONSIDER THE ENVIRONM=
ENT BEFORE PRINTING THIS E-MAIL.

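The Quoted-Printable encoding requires that encoded lines be no more than 76 characters long. Longer
lines are split with soft line breaks: an equal sign "=" as the last character of an encoded line marks a
line break that is not part of the original data.

Note: The 76 character limit does not count the trailing CRLF, but counts all other characters,
including any equal signs.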
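
The line-length rule and the handling of the equal sign can be observed with Python's standard `quopri` module. This is a sketch for illustration; the sample text is a long single line built from the sentence used in the example above, and the exact positions of the soft line breaks are chosen by the library.

```python
import quopri

sentence = b"PLEASE CONSIDER THE ENVIRONMENT BEFORE PRINTING THIS E-MAIL. "
original = (sentence * 3).rstrip()      # one long line, well over 76 characters

encoded = quopri.encodestring(original)
encoded_lines = encoded.splitlines()

# Each encoded line fits in 76 characters; a trailing '=' is a soft line break.
for line in encoded_lines:
    print(len(line), line.decode("ascii"))

# Decoding removes the soft line breaks and restores the original line.
decoded = quopri.decodestring(encoded)

# A literal equal sign in the data is itself encoded as '=3D'.
equation = quopri.encodestring(b"1+1=2")
print(equation)
```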

When no encoding algorithm has been used on the message body, the Content-Transfer-Encoding
header field merely identifies the current condition of the message body data. The values 7bit, 8bit,
and Binary are used for this purpose.

7bit:

This value indicates that the message body data is already in the RFC 2822 format. Specifically, this
means that the following conditions must be true:

• All lines of text must be no more than 998 characters long, not counting the trailing CRLF
• All characters must be US-ASCII text that have character values 1 through 127, inclusive
• The CR and LF characters can only be used together to indicate the end of a line of text

The whole message body may be 7bit, or part of the message body in a multipart message may be 7bit.
If the multipart message contains other parts that have any binary data or non-US-ASCII text, those parts
of the message must be encoded using the Quoted-Printable or Base64 encoding algorithms.

Note: Messages that have 7bit bodies can travel between SMTP messaging servers by using the
standard DATA command.

Figure 34: Message submitted using Data command


8bit:

This value indicates that the message body data contains non-US-ASCII characters. Specifically, this
means that the following conditions must be true:

• All lines of text must be no more than 998 characters long, not counting the trailing CRLF.
• One or more characters in the message body have values larger than 127.
• The CR and LF characters can only be used together to indicate the end of a line of text.

The whole message body may be 8bit, or part of the message body in a multipart message may be 8bit.
If the multipart message contains other parts that have binary data, those parts of the message must be
encoded using the Quoted-Printable or Base64 encoding algorithms.

Note: Messages that have 8bit bodies can only travel between SMTP messaging servers that
support the 8BITMIME SMTP extension as defined in RFC 1652, such as Exchange 2000 Server or
later versions.

Specifically, this means that the following conditions must be true:

• The 8BITMIME keyword must be advertised in the server's EHLO response.
• Messages are still transferred by using the SMTP standard DATA command. However, the
BODY=8BITMIME parameter must be added to the end of the MAIL FROM command.

Figure 35: A message delivery of 8-bit mime message

Binary:

This value indicates that the message body contains non-US-ASCII text or binary data. Specifically, this
means that the following conditions are true:

• Any sequence of characters is allowed.
• There is no line length limitation.
• Binary message elements don't require encoding.

Messages that have Binary bodies can only travel between SMTP messaging servers that support the
BINARYMIME SMTP extension as defined in RFC 3030, such as Exchange 2000 or later versions.

Specifically, this means that the following conditions must be true:

• The BINARYMIME keyword must be advertised in the server's EHLO response.
• The BINARYMIME SMTP extension can only be used with the CHUNKING SMTP extension.
Chunking enables large message bodies to be sent in multiple, smaller chunks. Chunking is also
defined in RFC 3030. The CHUNKING keyword must also be advertised in the server's EHLO
response.
• Messages are transferred using the BDAT command instead of the standard DATA command.
• The BODY=BINARYMIME parameter must be added to the end of the MAIL FROM command
when the message has a message body.
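
The SMTP requirements for 8bit and Binary bodies can be summarized as a small decision routine. The sketch below is illustrative and does not talk to a server: `mail_from_command` and `transfer_command` are hypothetical helpers, and the EHLO keywords are passed in as plain strings rather than parsed from a real EHLO response.

```python
from typing import Iterable


def mail_from_command(sender: str, body_type: str, ehlo_keywords: Iterable[str]) -> str:
    """Build the MAIL FROM line for a 7bit, 8bit, or binary body (hypothetical helper)."""
    advertised = {keyword.upper() for keyword in ehlo_keywords}
    command = f"MAIL FROM:<{sender}>"
    if body_type == "7bit":
        return command                      # plain SMTP, no extension needed
    if body_type == "8bit":
        required = {"8BITMIME"}
        parameter = "BODY=8BITMIME"
    elif body_type == "binary":
        required = {"BINARYMIME", "CHUNKING"}
        parameter = "BODY=BINARYMIME"
    else:
        raise ValueError(f"unknown body type: {body_type!r}")
    missing = required - advertised
    if missing:
        raise ValueError("server did not advertise: " + ", ".join(sorted(missing)))
    return f"{command} {parameter}"


def transfer_command(body_type: str) -> str:
    """Binary bodies are sent with BDAT; 7bit and 8bit bodies use DATA."""
    return "BDAT" if body_type == "binary" else "DATA"


keywords = ["8BITMIME", "BINARYMIME", "CHUNKING", "PIPELINING"]
print(mail_from_command("user@example.com", "8bit", keywords), transfer_command("8bit"))
print(mail_from_command("user@example.com", "binary", keywords), transfer_command("binary"))
```

With Python's `smtplib`, the equivalent check is `SMTP.has_extn("8bitmime")` after EHLO, and the parameter can be passed through the `mail_options` argument of `SMTP.sendmail`.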

Note: Binary encoded messages are not valid Internet messages.

Figure 36: Example of BDAT with chunking


Figure 37: Example of pipelining Binary Mime

Content-Transfer-Encoding key points:

• The values 7bit, 8bit, and Binary never exist together in the same multipart message. The values
are mutually exclusive.
• The Quoted-Printable or Base64 values may appear in a 7bit or 8bit multipart message body, but
never in a Binary message body.
• If a multipart message body contains different parts that are composed of 7bit and 8bit content,
the whole message is classified as 8bit.
• If a multipart message body contains different parts composed of 7bit, 8bit, and Binary content,
the whole message is classified as Binary.
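
The classification rules can be sketched as follows. This is an illustration under stated assumptions, not an Exchange algorithm: the function names are hypothetical, a part is treated as 8bit when it contains octets above 127, and the line limit is taken as 998 octets not counting CRLF, following RFC 2045. A part that has already been encoded with Quoted-Printable or Base64 consists of short US-ASCII lines and therefore classifies as 7bit here.

```python
from typing import Iterable

# Ordered from most to least restrictive.
ORDER = ["7bit", "8bit", "binary"]


def classify_part(body: bytes) -> str:
    """Classify raw body octets as 7bit, 8bit, or binary (hypothetical helper)."""
    # Splitting on CRLF removes well-formed line endings; any CR or LF that
    # remains inside a line is a bare one, which only binary data may contain.
    lines = body.split(b"\r\n")
    bare_cr_or_lf = any(b"\r" in line or b"\n" in line for line in lines)
    too_long = any(len(line) > 998 for line in lines)
    if bare_cr_or_lf or too_long or b"\x00" in body:
        return "binary"
    if any(octet > 127 for octet in body):
        return "8bit"
    return "7bit"


def classify_message(parts: Iterable[bytes]) -> str:
    """The whole message takes the least restrictive class found in any part."""
    return max((classify_part(part) for part in parts), key=ORDER.index, default="7bit")


text_part = b"Hello, world\r\n"
accented_part = "caf\u00e9\r\n".encode("utf-8")
raw_part = b"\x00\x01\x02\n\xff"

print(classify_message([text_part]))
print(classify_message([text_part, accented_part]))
print(classify_message([text_part, accented_part, raw_part]))
```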

Content-Disposition:

This header field instructs a MIME-enabled email client on how it should display an attached file. The
values of this field may be Inline or Attachment.

When the value of this field is Inline, the attachment is displayed in the message body.

When the value of this field is Attachment, the attached file appears as a regular attachment that is
separate from the message body. Other parameters are available when the value is Attachment, such
as Filename, Creation-date, and Size.

This header is ignored when it appears within multipart/related body parts.
This header cannot contain any comments.
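
As an illustration, Python's standard `email` package sets this header when a file is attached. The sketch below is not taken from the bulletin; the file names and contents are placeholders.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("See the attached file.")

# A regular attachment, shown separately from the message body.
msg.add_attachment(b"placeholder bytes", maintype="application",
                   subtype="octet-stream", filename="report.bin")

# An inline part, meant to be displayed within the message body.
msg.add_attachment(b"placeholder image bytes", maintype="image",
                   subtype="png", disposition="inline", filename="logo.png")

# Collect (file name, disposition) for every part that carries the header.
dispositions = [
    (part.get_filename(), part.get_content_disposition())
    for part in msg.iter_parts()
    if part.get_content_disposition() is not None
]
print(dispositions)
```

The text body part carries no Content-Disposition header, so only the two attached parts are listed.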

Figure 38: An example of message with attachment

Figure 39: Content-Disposition header highlighted in the message source

Character Encoding (charset):

A character set is a collection of letters and symbols. The charset parameter tells the receiving client
which character set was used to encode the text. SMTP data is sent only in US-ASCII format so that it can
pass safely through SMTP hosts, but the message may actually contain text in another language or other
encoded information. Adding charset information to the data helps the receiver decode it correctly.

Charset can be specified in the content-type header of the body parts.

For example, in Figure 39:

CONTENT-TYPE: TEXT/PLAIN; CHARSET="ISO-8859-1"
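
The effect of the charset parameter can be seen with Python's standard `email` package. This is a minimal sketch with a made-up sample text: the body is stored as US-ASCII using Quoted-Printable, and the receiver needs the declared charset to turn the decoded octets back into the right characters.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("caf\u00e9 au lait", charset="iso-8859-1", cte="quoted-printable")

declared = msg.get_content_charset()      # charset taken from Content-Type
print(msg["Content-Type"])
print(msg.get_payload())                  # US-ASCII only; the accented letter is =E9

# Undo the transfer encoding, then decode the octets with the declared charset.
raw = msg.get_payload(decode=True)
right = raw.decode(declared)

# Decoding with a different charset than the one declared garbles the text.
wrong = raw.decode("utf-8", errors="replace")
print(right, wrong)
```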

Charset can also be specified inside the message body itself. This is mostly used in HTML message
bodies, where it is specified in the metadata (a META tag) of the HTML portion.

For example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 TRANSITIONAL//EN">
<HTML><HEAD><TITLE>MESSAGE</TITLE>
<META HTTP-EQUIV=3DCONTENT-TYPE CONTENT=3D"TEXT/HTML; =
CHARSET=3DUS-ASCII">
<META CONTENT=3D"MSHTML 6.00.2800.1498" NAME=3DGENERATOR></HEAD>
<BODY><!-- CONVERTED FROM TEXT/PLAIN FORMAT -->
<P><FONT SIZE=3D2>THE RAIN IN SPAIN FALLS MAINLY ON THE=20
PLAIN.<BR><BR></FONT></P></BODY></HTML>

In the above example charset=us-ascii is specified in the body.

Note: The charset specified in the Content-Type header and the charset specified in the message body
for the same body part should match.

Figure 40: Charset matching

References
Character encodings in HTML and CSS
http://www.w3.org/International/tutorials/tutorial-char-enc/

Introducing Character Sets and Encodings
http://www.w3.org/International/getting-started/characters

Definitions on MSDN
http://msdn.microsoft.com/en-us/library/ee200565(EXCHG.80).aspx
