1 Introduction
1.1 Background
This section is non-normative.
The World Wide Web's markup language has always been HTML. HTML
was primarily designed as a language for semantically describing
scientific documents, although its general design and adaptations over
the years have enabled it to be used to describe a number of other types
of documents.
The main area that has not been adequately addressed by HTML is a
vague subject referred to as Web Applications. This specification attempts
to rectify this, while at the same time updating the HTML specifications to
address issues raised in the past few years.
1.2 Audience
This section is non-normative.
This specification is intended for authors of documents and scripts that
use the features defined in this specification, implementors of tools that
operate on pages that use the features defined in this specification, and
individuals wishing to establish the correctness of documents or
implementations with respect to the requirements of this specification.
This document is probably not suited to readers who do not already have
at least a passing familiarity with Web technologies, as in places it
sacrifices clarity for precision, and brevity for completeness.
1.3 Scope
This section is non-normative.
This specification is limited to providing a semantic-level markup
language and associated semantic-level scripting APIs for authoring
accessible pages on the Web ranging from static documents to dynamic
applications.
The scope of this specification does not include providing mechanisms for
media-specific customization of presentation (although default rendering
rules for Web browsers are included at the end of this specification, and
several mechanisms for hooking into CSS are provided as part of the
language).
The scope of this specification is not to describe an entire operating
system. In particular, hardware configuration software, image
manipulation tools, and applications that users would be expected to use
with high-end workstations on a daily basis are out of scope. In terms of
applications, this specification is targeted specifically at applications that
would be expected to be used by users on an occasional basis, or
regularly but from disparate locations, with low CPU requirements.
Examples of such applications include online purchasing systems,
searching systems, games (especially multiplayer online games), public
telephone books or address books, communications software (e-mail
clients, instant messaging clients, discussion software), document editing
software, etc.
1.4 History
This section is non-normative.
For its first five years (1990-1995), HTML went through a number of
revisions and experienced a number of extensions, primarily hosted first
at CERN, and then at the IETF.
With the creation of the W3C, HTML's development changed venue
again. A first abortive attempt at extending HTML in 1995, known as HTML
3.0, then made way for a more pragmatic approach known as HTML 3.2,
which was completed in 1997. HTML4 quickly followed later that same
year.
The following year, the W3C membership decided to stop evolving HTML
and instead begin work on an XML-based equivalent, called XHTML. This
effort started with a reformulation of HTML4 in XML, known as XHTML
1.0, which added no new features except the new serialization, and which
was completed in 2000. After XHTML 1.0, the W3C's focus turned to
making it easier for other working groups to extend XHTML, under the
banner of XHTML Modularization. In parallel with this, the W3C also
worked on a new language that was not compatible with the earlier HTML
and XHTML languages, calling it XHTML2.
Around the time that HTML's evolution was stopped in 1998, parts of the
API for HTML developed by browser vendors were specified and
published under the name DOM Level 1 (in 1998) and DOM Level 2 Core
and DOM Level 2 HTML (starting in 2000 and culminating in 2003). These
efforts then petered out, with some DOM Level 3 specifications published
in 2004 but the working group being closed before all the Level 3 drafts
were completed.
In 2003, the publication of XForms, a technology which was positioned as
the next generation of Web forms, sparked a renewed interest in evolving
HTML itself, rather than finding replacements for it. This interest was
born of the realization that XML's deployment as a Web technology
was limited to entirely new technologies (like RSS and later Atom), rather
than as a replacement for existing deployed technologies (like HTML).
A proof of concept to show that it was possible to extend HTML4's forms
to provide many of the features that XForms 1.0 introduced, without
requiring browsers to implement rendering engines that were
incompatible with existing HTML Web pages, was the first result of this
renewed interest. At this early stage, while the draft was already publicly
available, and input was already being solicited from all sources, the
specification was only under Opera Software's copyright.
The idea that HTML's evolution should be reopened was tested at a W3C
workshop in 2004, where some of the principles that underlie the HTML5
work (described below), as well as the aforementioned early draft
proposal covering just forms-related features, were presented to the W3C
jointly by Mozilla and Opera. The proposal was rejected on the grounds
that the proposal conflicted with the previously chosen direction for the
Web's evolution; the W3C staff and membership voted to continue
developing XML-based replacements instead.
Shortly thereafter, Apple, Mozilla, and Opera jointly announced their intent
to continue working on the effort under the umbrella of a new venue
called the WHATWG. A public mailing list was created, and the draft was
moved to the WHATWG site. The copyright was subsequently amended
to be jointly owned by all three vendors, and to allow reuse of the
specification.
The WHATWG was based on several core principles, in particular that
technologies need to be backwards compatible, that specifications and
implementations need to match even if this means changing the
specification rather than the implementations, and that specifications
need to be detailed enough that implementations can achieve complete
interoperability without reverse-engineering each other.
The latter requirement in particular required that the scope of the HTML5
specification include what had previously been specified in three separate
documents: HTML4, XHTML1, and DOM2 HTML. It also meant including
significantly more detail than had previously been considered the norm.
In 2006, the W3C indicated an interest in participating in the development
of HTML5 after all, and in 2007 formed a working group chartered to work
with the WHATWG on the development of the HTML5 specification.
Apple, Mozilla, and Opera allowed the W3C to publish the specification
under the W3C copyright, while keeping a version with the less restrictive
license on the WHATWG site.
For a number of years, both groups then worked together under the same
editor: Ian Hickson. In 2011, the groups came to the conclusion that they
had different goals: the W3C wanted to draw a line in the sand for
features for an HTML5 Recommendation, while the WHATWG wanted to
continue working on a Living Standard for HTML, continuously
maintaining the specification and adding new features. In mid-2012, a
new editing team was introduced at the W3C to take care of creating an
HTML5 Recommendation and preparing a Working Draft for the next HTML
version.
Since then, the W3C HTML WG has been cherry-picking patches from the
WHATWG that resolved bugs registered on the W3C HTML specification
or more accurately represented implemented reality in UAs. At the time of
publication of this document, patches from the WHATWG HTML
specification have been merged until revision 8152 inclusive. The W3C
HTML editors have also added patches that resulted from discussions
and decisions made by the W3C HTML WG as well as bug fixes from bugs
not shared by the WHATWG.
A separate document is published to document the differences between
the HTML specified in this document and the language described in the
HTML4 specification. [HTMLDIFF]
This is a warning.
interface Example {
// this is an IDL definition
};
node on the Internet appear to come from many disparate parts of the
network.
However, the IP address used for a user's requests is not the only
mechanism by which a user's requests could be related to each other.
Cookies, for example, are designed specifically to enable this, and
are the basis of most of the Web's session features that enable you to
log into a site with which you have an account.
There are other mechanisms that are more subtle. Certain
characteristics of a user's system can be used to distinguish groups
of users from each other; by collecting enough such information, an
individual user's browser's "digital fingerprint" can be computed,
which can be as good as, if not better than, an IP address at ascertaining
which requests are from the same user.
Grouping requests in this manner, especially across multiple sites,
can be used for both benign (and even arguably positive) purposes,
as well as for malevolent purposes. An example of a reasonably
benign purpose would be determining whether a particular person
seems to prefer sites with dog illustrations as opposed to sites with
cat illustrations (based on how often they visit the sites in question)
and then automatically using the preferred illustrations on subsequent
visits to participating sites. Malevolent purposes, however, could
include governments combining information such as the person's
home address (determined from the addresses they use when getting
driving directions on one site) with their apparent political affiliations
(determined by examining the forum sites that they participate in) to
determine whether the person should be prevented from voting in an
election.
Since the malevolent purposes can be remarkably evil, user agent
implementors are encouraged to consider how to provide their users
with tools to minimize leaking information that could be used to
fingerprint a user.
Unfortunately, as the first paragraph in this section implies,
sometimes there is great benefit to be derived from exposing the very
information that can also be used for fingerprinting purposes, so it's
not as easy as simply blocking all possible leaks. For instance, the
ability to log into a site to post under a specific identity requires that
the user's requests be identifiable as all being from the same user.
Other features in the platform can be used for the same purpose,
though, including, though not limited to:
The exact list of which features a user agent supports.
The maximum allowed stack depth for recursion in script.
Features that describe the user's environment, like Media
Queries and the Screen object. [MQ] [CSSOMVIEW]
The user's time zone.
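To make the list above concrete, the sketch below reads two of these signals with standard JavaScript: the time zone through Intl.DateTimeFormat, and the recursion limit by recursing until the engine throws. The function names are hypothetical, and real fingerprinting scripts combine many more signals. The depth it reports varies with the engine, its version, and the available stack, which is precisely what makes it usable as a signal.

```javascript
// Illustrative sketch: read two fingerprintable signals from the list above.
// Function names are hypothetical; only standard JavaScript APIs are used.

// The user's time zone as reported by the Intl API (e.g. an IANA zone name).
function timeZoneSignal() {
  return Intl.DateTimeFormat().resolvedOptions().timeZone || "unknown";
}

// The maximum recursion depth before the engine gives up. Engines report
// stack exhaustion with an exception (a RangeError in V8), so any error
// thrown here is treated as "limit reached".
function maxRecursionDepth() {
  let depth = 0;
  function recurse(n) {
    depth = n;
    // "+ 1" keeps the call out of tail position, so an engine with proper
    // tail calls cannot turn this into an endless loop.
    return recurse(n + 1) + 1;
  }
  try {
    recurse(1);
  } catch (e) {
    // Stack exhausted; `depth` holds the deepest level reached.
  }
  return depth;
}

// Each value alone is weak, but combined they start to separate users.
const fingerprintParts = [timeZoneSignal(), String(maxRecursionDepth())];
console.log(fingerprintParts.join("|"));
```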
an end tag, such as "</body>". (Certain start tags and end tags can in
certain cases be omitted and are implied by other tags.)
Tags have to be nested such that elements are all completely within
each other, without overlapping:
<p>This is <em>very <strong>wrong</em>!</strong></p>
<p>This <em>is <strong>correct</strong>.</em></p>
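The nesting rule above can be checked with a stack: each start tag pushes an element, and each end tag must close the element on top of the stack. This is a simplified sketch, not how HTML parsers work: it assumes a hypothetical pre-tokenized input of { type, name } objects and ignores omitted tags and void elements. A real HTML parser does not reject the first snippet; it applies error-recovery rules instead.

```javascript
// Hypothetical token shape: { type: "start" | "end", name: "p" }.
// Returns true if every element closes inside its parent (no overlap).
function isProperlyNested(tokens) {
  const open = []; // stack of currently open element names
  for (const token of tokens) {
    if (token.type === "start") {
      open.push(token.name);
    } else if (open.pop() !== token.name) {
      // The end tag does not close the most recently opened element,
      // so two elements overlap (or the end tag has no start tag).
      return false;
    }
  }
  return open.length === 0; // every start tag must have been closed
}

// Small helpers to keep the token lists readable.
const start = (name) => ({ type: "start", name });
const end = (name) => ({ type: "end", name });

// <p>This is <em>very <strong>wrong</em>!</strong></p>
const wrong = [start("p"), start("em"), start("strong"), end("em"), end("strong"), end("p")];
// <p>This <em>is <strong>correct</strong>.</em></p>
const correct = [start("p"), start("em"), start("strong"), end("strong"), end("em"), end("p")];

console.log(isProperlyNested(wrong));   // false
console.log(isProperlyNested(correct)); // true
```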
Attributes are placed inside the start tag, and consist of a name and
a value, separated by an "=" character. The attribute value can
remain unquoted if it doesn't contain space characters or any
of " ' ` = < or >. Otherwise, it has to be quoted using either single or
double quotes. The value, along with the "=" character, can be omitted
altogether if the value is the empty string.
<!-- empty attributes -->
<input name=address disabled>
<input name=address disabled="">
<!-- attributes with a value -->
<input name=address maxlength=200>
<input name=address maxlength='200'>
<input name=address maxlength="200">
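The quoting rules can be expressed as a small serializer that picks the shortest permitted form for each attribute. This is a sketch under stated assumptions: serializeAttribute is a hypothetical helper, values are assumed not to contain "&" (which raises character-reference questions not covered here), and a value containing both quote characters is double-quoted with its double quotes written as &quot;.

```javascript
// Space characters, as used by the rule above: space, tab, LF, FF, CR.
const SPACE_CHARACTERS = new Set([" ", "\t", "\n", "\f", "\r"]);
// Characters that, besides spaces, force a value to be quoted.
const MUST_QUOTE = new Set(['"', "'", "`", "=", "<", ">"]);

// True if a non-empty value may be written without quotes.
function canBeUnquoted(value) {
  if (value === "") return false;
  for (const ch of value) {
    if (SPACE_CHARACTERS.has(ch) || MUST_QUOTE.has(ch)) return false;
  }
  return true;
}

// Hypothetical helper: serialize one attribute in the shortest allowed form.
// Assumes `value` contains no "&".
function serializeAttribute(name, value) {
  if (value === "") return name;                       // empty: omit "=" and value
  if (canBeUnquoted(value)) return `${name}=${value}`;
  if (!value.includes('"')) return `${name}="${value}"`;
  if (!value.includes("'")) return `${name}='${value}'`;
  // Both quote characters present: double-quote and escape the double quotes.
  return `${name}="${value.replace(/"/g, "&quot;")}"`;
}

console.log(serializeAttribute("disabled", ""));         // disabled
console.log(serializeAttribute("maxlength", "200"));     // maxlength=200
console.log(serializeAttribute("title", "Hello World")); // title="Hello World"
console.log(serializeAttribute("title", 'say "hi"'));    // title='say "hi"'
```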
HTML user agents (e.g. Web browsers) then parse this markup,
turning it into a DOM (Document Object Model) tree. A DOM tree is an
in-memory representation of a document.
DOM trees contain several kinds of nodes, in particular
a DocumentType node, Element nodes, Text nodes, Comment nodes, and in
some cases ProcessingInstruction nodes.
The markup snippet at the top of this section would be turned into the
following DOM tree:
DOCTYPE: html
html
  head
    #text: ⏎␣␣
    title
      #text: Sample page
    #text: ⏎␣
  #text: ⏎␣
  body
    #text: ⏎␣␣
    h1
      #text: Sample page
    #text: ⏎␣␣
    p
      #text: This is a
      a href="demo.html"
        #text: simple
      #text: sample.
    #text: ⏎␣␣
    #comment: this is a comment
    #text: ⏎␣⏎
The root element of this tree is the html element, which is the element
always found at the root of HTML documents. It contains two
elements, head and body, as well as a Text node between them.
There are many more Text nodes in the DOM tree than one would
initially expect, because the source contains a number of spaces
(represented here by "␣") and line breaks ("⏎") that all end up
as Text nodes in the DOM. However, for historical reasons not all of
the spaces and line breaks in the original markup appear in the DOM.
In particular, all the whitespace before the head start tag ends up being
dropped silently, and all the whitespace after the body end tag ends up
placed at the end of the body.
The head element contains a title element, which itself contains
a Text node with the text "Sample page". Similarly, the body element
contains an h1 element, a p element, and a comment.
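The tree notation used above can be produced mechanically by a depth-first walk. The sketch below uses hypothetical plain objects as stand-ins for DOM nodes (it does not use the real DOM API, so it runs outside a browser) and prints each node indented under its parent, drawing whitespace-only Text nodes with ␣ and ⏎.

```javascript
// Hypothetical plain-object stand-ins for DOM nodes (not the real DOM API).
const doctype = (name) => ({ type: "doctype", name });
const text = (value) => ({ type: "text", value });
const comment = (value) => ({ type: "comment", value });
const el = (name, children = [], attributes = {}) =>
  ({ type: "element", name, attributes, children });

// One-line label for a node, in the notation used by the tree above.
function describe(node) {
  switch (node.type) {
    case "doctype":
      return `DOCTYPE: ${node.name}`;
    case "comment":
      return `#comment: ${node.value}`;
    case "text":
      // Whitespace-only text is invisible, so draw spaces as U+2423 (␣)
      // and line feeds as U+23CE (⏎).
      return "#text: " + (/^\s*$/.test(node.value)
        ? node.value.replace(/ /g, "\u2423").replace(/\n/g, "\u23CE")
        : node.value);
    default: {
      const attrs = Object.entries(node.attributes)
        .map(([key, val]) => ` ${key}="${val}"`)
        .join("");
      return node.name + attrs;
    }
  }
}

// Depth-first walk: each node on its own line, indented two spaces per level.
function dumpTree(roots) {
  const lines = [];
  const visit = (node, depth) => {
    lines.push("  ".repeat(depth) + describe(node));
    for (const child of node.children || []) visit(child, depth + 1);
  };
  for (const root of roots) visit(root, 0);
  return lines.join("\n");
}

// The DOCTYPE and head part of the sample document's tree.
const sample = [
  doctype("html"),
  el("html", [
    el("head", [text("\n  "), el("title", [text("Sample page")]), text("\n ")]),
  ]),
];
console.log(dumpTree(sample));
```

Running it prints the first seven lines of the tree shown earlier; extending `sample` with the body subtree reproduces the rest.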
This DOM tree can be manipulated from scripts in the page. Scripts
(typically in JavaScript) are small programs that can be embedded
using the script element or using event handler content attributes.
For example, here is a form with a script that sets the value of the
form's output element to say "Hello World":
<form name="main">
Result: <output name="result"></output>
<script>
document.forms.main.elements.result.value = 'Hello World';
</script>
</form>
Since DOM trees are used as the way to represent HTML documents
when they are processed and presented by implementations
(especially interactive implementations like Web browsers), this
specification is mostly phrased in terms of DOM trees, instead of the
markup described above.
However, if the author first created the img element and then in a
separate script added the event listeners, there's a chance that
the load event would be fired in between, leading it to be missed:
<!-- Do not use this style, it has a race condition! -->
<img id="games" src="games.png" alt="Games">
<!-- the 'load' event might fire here while the parser is taking a
     break, in which case you will not see it! -->
<script>
var img = document.getElementById('games');
img.onload = gamesLogoHasLoaded; // might never fire!
</script>
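The failure mode is not specific to images: an event that is dispatched before a listener is registered is simply lost. The sketch below models this outside a browser with the standard EventTarget interface; FakeImage and its finishLoading method are hypothetical stand-ins for an img element whose load completes at a moment the page does not control. The remedy it shows is ordering: the listener must already be in place when loading can finish, which in a page means attaching the handler on the element itself (for example with an onload content attribute) or in the same script that creates the element, before it starts loading.

```javascript
// Hypothetical stand-in for an img element: "load" is dispatched whether or
// not anyone is listening. EventTarget and Event are standard in browsers
// and are also globals in Node.js 15 and later.
class FakeImage extends EventTarget {
  finishLoading() {
    this.dispatchEvent(new Event("load"));
  }
}

let calls = 0;
function gamesLogoHasLoaded() {
  calls += 1;
}

// Racy order: loading finishes before the listener is registered.
const racy = new FakeImage();
racy.finishLoading();                              // "load" fires, nobody listens
racy.addEventListener("load", gamesLogoHasLoaded); // too late
const callsAfterRacy = calls;                      // 0: the handler never ran

// Safe order: register the listener before loading can finish.
const safe = new FakeImage();
safe.addEventListener("load", gamesLogoHasLoaded);
safe.finishLoading();
const callsAfterSafe = calls;                      // 1: the handler ran once

console.log(callsAfterRacy, callsAfterSafe);
```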
nesting span elements inside div elements all serve the same
purpose as nesting a div element in a span element, but only the
latter involves a block box in an inline box, the latter combination
is disallowed.
Another example would be the way interactive content cannot be
nested. For example, a button element cannot contain
a textarea element. This is because the default behavior of such
nested interactive elements would be highly confusing to users.
Instead of being nested, these elements can be placed side by
side.
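A checker for this rule only needs to remember the nearest interactive ancestor while walking the tree. The sketch below assumes a hypothetical plain-object tree ({ name, children }) and a deliberately partial set of interactive elements; the specification's interactive content category is longer and partly conditional.

```javascript
// Hypothetical, deliberately partial set of interactive elements. The real
// "interactive content" category is longer and partly conditional (for
// example, an a element counts only when it has an href attribute).
const INTERACTIVE = new Set(["a", "button", "select", "textarea"]);

// Walks a plain-object tree ({ name, children }) and reports every
// interactive element that sits inside another interactive element,
// as [nearest interactive ancestor, nested element] pairs.
function findNestedInteractive(node, interactiveAncestor = null, found = []) {
  const interactive = INTERACTIVE.has(node.name);
  if (interactive && interactiveAncestor !== null) {
    found.push([interactiveAncestor, node.name]);
  }
  const nextAncestor = interactive ? node.name : interactiveAncestor;
  for (const child of node.children || []) {
    findNestedInteractive(child, nextAncestor, found);
  }
  return found;
}

// A textarea nested in a button is reported...
const nested = { name: "form", children: [
  { name: "button", children: [{ name: "textarea" }] },
] };
// ...while the same elements placed side by side are not.
const sideBySide = { name: "form", children: [
  { name: "button" }, { name: "textarea" },
] };

console.log(findNestedInteractive(nested));     // one pair: button > textarea
console.log(findNestedInteractive(sideBySide)); // no pairs
```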
Errors that indicate a likely misunderstanding of the
specification
Sometimes, something is disallowed because allowing it would
likely cause author confusion.
For example, setting the disabled attribute to the value "false" is
disallowed, because despite the appearance of meaning that the
element is enabled, it in fact means that the element
is disabled (what matters for implementations is the presence of
the attribute, not its value).
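The gap between what implementations check and what conformance requires can be shown side by side. In a browser the presence check is element.hasAttribute("disabled"); the sketch below instead uses a plain object as a hypothetical stand-in for an element's attribute list so that it runs anywhere. The conformance check encodes the boolean-attribute rule that a present value must be empty or match the attribute's name case-insensitively.

```javascript
// Hypothetical stand-in for an element's attribute list: a plain object
// mapping attribute names to their string values.

// What implementations look at: presence only.
function isDisabled(attributes) {
  return Object.prototype.hasOwnProperty.call(attributes, "disabled");
}

// What conformance checkers look at: the value must be empty or match the
// attribute's name case-insensitively (toLowerCase approximates ASCII
// case-insensitive matching here).
function isConformingBooleanValue(name, value) {
  return value === "" || value.toLowerCase() === name.toLowerCase();
}

const attrs = { disabled: "false" };
console.log(isDisabled(attrs));                                    // true
console.log(isConformingBooleanValue("disabled", attrs.disabled)); // false
```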
Errors involving limits that have been imposed merely to
simplify the language
Some conformance errors simplify the language that authors need
to learn.
For example, the area element's shape attribute, despite accepting
both circ and circle values in practice as synonyms, disallows
the use of the circ value, so as to simplify tutorials and other
learning aids. There would be no benefit to allowing both, but it
would cause extra confusion when teaching the language.
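One way to see the split between accepting and allowing a value is a lookup table that records, for each keyword, both the state a user agent uses and whether a conformance checker accepts it. The table below is recalled from the area element's shape keyword definitions (circ, polygon, and rectangle as accepted but non-conforming synonyms) and should be checked against that table; parseShape is a hypothetical helper.

```javascript
// Keyword table for the area element's shape attribute, as recalled from
// its definition: circ, polygon, and rectangle are accepted synonyms but
// non-conforming.
const SHAPE_KEYWORDS = new Map([
  ["circle", { state: "circle", conforming: true }],
  ["circ", { state: "circle", conforming: false }],
  ["default", { state: "default", conforming: true }],
  ["poly", { state: "polygon", conforming: true }],
  ["polygon", { state: "polygon", conforming: false }],
  ["rect", { state: "rectangle", conforming: true }],
  ["rectangle", { state: "rectangle", conforming: false }],
]);

// Hypothetical helper: map a shape attribute value (or null when the
// attribute is absent) to the state a user agent uses and whether a
// conformance checker would accept it.
function parseShape(value) {
  if (value === null) return { state: "rectangle", conforming: true }; // attribute omitted
  const entry = SHAPE_KEYWORDS.get(value.toLowerCase()); // keywords are case-insensitive
  // Unknown keywords fall back to the rectangle state.
  return entry !== undefined ? entry : { state: "rectangle", conforming: false };
}

console.log(parseShape("circ"));   // circle state, non-conforming
console.log(parseShape("circle")); // circle state, conforming
```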
Errors that involve peculiarities of the parser
Certain elements are parsed in somewhat eccentric ways (typically
for historical reasons), and their content model restrictions are
intended to avoid exposing the author to these issues.
For example, a form element isn't allowed inside phrasing content,
because when parsed as HTML, a form element's start tag will
imply a p element's end tag. Thus, the following markup results in
two paragraphs, not one: