
Sitemap Protocol

06/06/2005 12:36 AM


Contents
1. Overview
2. XML Sitemap Format
   Sample XML Sitemap
   XML Tag Definitions
3. Providing Multiple Sitemap Files
   Sample XML Sitemap Index
   Sitemap Index XML Tag Definitions
4. Location of Sitemap Files
5. Frequently Asked Questions

Overview


The Sitemap Protocol allows you to inform search engine crawlers about URLs on your Web sites that are available for crawling. A Sitemap consists of a list of URLs and may also contain additional information about those URLs, such as when they were last modified and how frequently they change.

Sitemaps are particularly beneficial when users cannot reach all areas of a Web site through a browsable interface, that is, when users are unable to reach certain pages or regions of a site by following links. For example, any site where certain pages are only accessible via a search form would benefit from creating a Sitemap and submitting it to search engines.

This document describes the formats for Sitemap files and also explains where you should post your Sitemap files so that search engines can retrieve them.

Please note that the Sitemap Protocol supplements, but does not replace, the crawl-based mechanisms that search engines already use to discover URLs. By submitting a Sitemap (or Sitemaps) to a search engine, you help that engine's crawlers do a better job of crawling your site. Using this protocol does not guarantee that your Web pages will be included in search indexes. In addition, using this protocol may not influence the way your pages are ranked by a search engine.

Sitemap 0.84 is offered under the terms of the Attribution-ShareAlike Creative Commons License.

XML Sitemap Format


The XML Sitemap Format allows you to provide a list of URLs and include additional information about those URLs in your Sitemap. This additional information includes the date the content at that URL last changed, how often that content can be expected to change and how important that URL is relative to other URLs on your site. The XML Sitemap Format uses the following XML tags:

changefreq: how frequently the content at the URL is likely to change
lastmod: the time the content at the URL was last modified
loc: the URL location
priority: the priority of the page relative to other pages on the same site
url: encapsulates the first four tags in this list
urlset: encapsulates the first five tags in this list
Note: All data values, including URLs, in your Sitemap files must be XML-encoded. The chart below lists characters together with their corresponding encoded values. You can use either the entity or the character code to XML-encode a character. Please see the FAQ for more information about XML encoding.

Character      Symbol   Entity    Character code
Ampersand      &        &amp;     &#38;
Single quote   '        &apos;    &#39;
Double quote   "        &quot;    &#34;
Greater than   >        &gt;      &#62;
Less than      <        &lt;      &#60;

https://www.google.com/webmasters/sitemaps/docs/en/protocol.html Page 1 of 6

Sample XML Sitemap

The following example shows a Sitemap in XML format. The Sitemap in the example contains a small number of URLs, each of which is identified using the loc XML tag. In this example, a different set of optional parameters has been provided for each URL.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
  <url>
    <loc>http://www.yoursite.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>http://www.yoursite.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.yoursite.com/catalog?item=73&amp;desc=vacation_new_zealand</loc>
    <lastmod>2004-12-23</lastmod>
    <changefreq>weekly</changefreq>
  </url>
  <url>
    <loc>http://www.yoursite.com/catalog?item=74&amp;desc=vacation_newfoundland</loc>
    <lastmod>2004-12-23T18:00:15+00:00</lastmod>
    <priority>0.3</priority>
  </url>
  <url>
    <loc>http://www.yoursite.com/catalog?item=83&amp;desc=vacation_usa</loc>
    <lastmod>2004-11-23</lastmod>
  </url>
</urlset>
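Generating the file programmatically avoids hand-escaping mistakes. The following is a minimal Python 3 sketch, not part of the protocol, that builds a Sitemap like the sample above with the standard library's xml.etree.ElementTree. The function name build_sitemap and the dict-per-URL input shape are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the sample Sitemap (version 0.84).
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(entries):
    """Build a Sitemap document from a list of dicts (hypothetical input shape).

    Each dict must have a "loc" key and may have "lastmod", "changefreq"
    and "priority"; optional keys that are absent are simply omitted.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # Emit tags in the order used by the sample; ElementTree escapes
        # &, < and > in text content automatically.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Because ElementTree escapes & as &amp; in text content, the loc values can be passed in their unescaped form, for example build_sitemap([{"loc": "http://www.yoursite.com/catalog?item=12&desc=vacation_hawaii", "changefreq": "weekly"}]). The output is not pretty-printed, which makes no difference to an XML parser.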
You can compress your Sitemap files using gzip to reduce your bandwidth requirements. Please note that your uncompressed Sitemap file may not be larger than 10MB.

Note: Your Sitemap files must use UTF-8 encoding.

XML Tag Definitions

This section provides details about the XML tags that can appear in your Sitemap(s). In the "Subtags" section of some of the XML tag definitions, a question mark ("?") appearing after the name of an XML tag indicates that the tag is optional.

changefreq
Definition: Optional. This value indicates how frequently the content at a particular URL is likely to change. The value "always" should be used to describe documents that change each time they are accessed. The value "never" should be used to describe archived URLs. Please note that the value of this tag is considered a hint and not a command. Even though search engine crawlers consider this information when making decisions, they may crawl pages marked "hourly" less frequently than that, and they may crawl pages marked "yearly" more frequently than that. It is also likely that crawlers will periodically crawl pages marked "never" so that they can handle unexpected changes to those pages.
Constraints: Enumerated list. Valid values are "always", "hourly", "daily", "weekly", "monthly", "yearly" and "never".
Example: <changefreq>monthly</changefreq>
Subtag of: url
Content Format: Text
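Because changefreq is an enumerated value, a generator can reject typos before the file is published. A minimal Python 3 sketch; check_changefreq is a hypothetical helper, and treating the comparison as case-sensitive is an assumption based on the lowercase values listed above.

```python
# Valid values listed in the changefreq constraints above.
CHANGEFREQ_VALUES = frozenset(
    ["always", "hourly", "daily", "weekly", "monthly", "yearly", "never"]
)


def check_changefreq(value):
    """Return value unchanged if it is a valid changefreq, else raise ValueError.

    The comparison is case-sensitive (an assumption of this sketch).
    """
    if value not in CHANGEFREQ_VALUES:
        raise ValueError(f"invalid changefreq: {value!r}")
    return value
```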

lastmod
Definition: Optional. The time the content at the URL was last modified. You should specify the timestamp using ISO 8601; for example, 2004-09-22T14:12:14+00:00. You can omit the time portion of the ISO 8601 format; for example, 2004-09-22 is also valid. This information allows crawlers to avoid recrawling documents that haven't changed.
Constraints: Value must be in ISO 8601 format.
Example: <lastmod>2005-02-21</lastmod> or <lastmod>2005-02-21T18:00:15+00:00</lastmod>
Subtag of: url
Content Format: Text
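Both accepted lastmod forms can be produced from a Python datetime. A minimal Python 3 sketch; format_lastmod is a hypothetical helper, and treating naive datetimes as UTC is an assumption of the sketch, not a rule of the protocol.

```python
from datetime import datetime, timezone


def format_lastmod(dt, include_time=True):
    """Format a datetime as an ISO 8601 lastmod value.

    Naive datetimes are assumed to already be in UTC (an assumption of
    this sketch, not of the protocol).
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    if not include_time:
        return dt.date().isoformat()          # date only, e.g. 2005-02-21
    return dt.isoformat(timespec="seconds")   # e.g. 2005-02-21T18:00:15+00:00
```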

loc
Definition: Required. The URL of a page on your site.
Constraints: Value must be no longer than 2,048 characters.
Example: <loc>http://www.yoursite.com/catalog?item=1&amp;desc=vacation_hawaii</loc>
Subtag of: url
Content Format: Text
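A generator can enforce the loc constraint together with the FAQ's requirement that URLs be completely specified, including the protocol. A minimal Python 3 sketch; check_loc is a hypothetical helper, restricting schemes to http and https is an assumption, and so is measuring the length of the unescaped string rather than its XML-encoded form.

```python
from urllib.parse import urlsplit

MAX_LOC_LENGTH = 2048  # from the loc constraint above


def check_loc(loc):
    """Raise ValueError if loc is too long or not a fully specified http(s) URL.

    Length is measured on the unescaped string (an assumption of this sketch).
    """
    if len(loc) > MAX_LOC_LENGTH:
        raise ValueError(f"loc is {len(loc)} characters; the limit is {MAX_LOC_LENGTH}")
    parts = urlsplit(loc)
    # A bare host such as "www.yoursite.com" has no scheme and no netloc.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"loc must be an absolute http(s) URL: {loc!r}")
    return loc
```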

priority
Definition: Optional. The priority of a particular URL relative to other pages on the same site. The value for this tag is a number between 0.0 and 1.0, where 0.0 identifies the lowest-priority page(s) on your site and 1.0 identifies the highest-priority page(s). The default priority of a page is 0.5. Please note that the priority you assign to a page has no influence on the position of your URLs in a search engine's result pages. Search engines use this information when selecting between URLs on the same site, so you can use this tag to increase the likelihood that your more important pages are present in a search index. Also, please note that assigning a high priority to all of the URLs on your site will not help you. Since the priority is relative, it is only used to select between URLs on your site; the priority of your pages will not be compared to the priority of pages on other sites.
Constraints: Value must be between 0.0 and 1.0 inclusive.
Example: <priority>0.7</priority>
Subtag of: url
Content Format: Text
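The range check and the 0.5 default translate directly into code. A minimal Python 3 sketch; parse_priority is a hypothetical helper that takes the tag's text, or None when the tag is absent.

```python
DEFAULT_PRIORITY = 0.5  # default priority when <priority> is omitted


def parse_priority(text):
    """Convert a <priority> value to float, applying the default when absent."""
    if text is None:
        return DEFAULT_PRIORITY
    value = float(text)
    # Inclusive range check; float("nan") also fails this comparison.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0: {text!r}")
    return value
```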

url
Definition: Encapsulates information about a particular URL.
Subtags: changefreq?, lastmod?, loc, priority?
Subtag of: urlset
Content Format: Empty

urlset
Definition: Encapsulates information about all of the URLs in a Sitemap file.
Subtags: url
Content Format: Empty
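On the reading side, the nesting described above (a urlset containing url elements, each containing loc and the optional tags) maps directly onto an ElementTree walk. A hedged Python 3 sketch; read_sitemap is a hypothetical helper, and it assumes the 0.84 namespace used in the sample Sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace from the sample Sitemap; ElementTree reports tags as "{ns}name".
NS = {"sm": "http://www.google.com/schemas/sitemap/0.84"}


def read_sitemap(xml_bytes):
    """Yield one dict per <url> element of a urlset document.

    Absent optional tags come back as None, except priority, which gets
    the default of 0.5. XML entities such as &amp; are decoded by the parser.
    """
    urlset = ET.fromstring(xml_bytes)
    for url in urlset.findall("sm:url", NS):
        yield {
            "loc": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "changefreq": url.findtext("sm:changefreq", namespaces=NS),
            "priority": float(url.findtext("sm:priority", default="0.5", namespaces=NS)),
        }
```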

Providing Multiple Sitemap Files

You can provide multiple Sitemap files, but each file that you provide must have no more than 50,000 URLs and must be no larger than 10MB (10,485,760 bytes) when uncompressed. These limits help to ensure that your Web server does not get bogged down serving very large files. If you want to list more than 50,000 URLs, you must create multiple Sitemap files. If you anticipate your Sitemap growing beyond 50,000 URLs or 10MB, you should consider creating multiple Sitemap files. If you do provide multiple Sitemaps, you must list them in a Sitemap index file. Sitemap index files may not list more than 1,000 Sitemaps. Your Sitemap index file could be named Sitemap_index.xml. The XML format of a Sitemap index file is very similar to the XML format of a Sitemap file. The Sitemap index file uses the following XML tags:


lastmod
loc
sitemap
sitemapindex


Note: A Sitemap index file can only specify Sitemaps that are found on the same site as the Sitemap index file. For example, http://www.yoursite.com/sitemap_index.xml can include Sitemaps on http://www.yoursite.com but not on http://www.mysite.com or http://yourhost.yoursite.com.

Sample XML Sitemap Index

The following example shows a Sitemap index in XML format. The Sitemap index lists two Sitemaps:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.google.com/schemas/sitemap/0.84">
  <sitemap>
    <loc>http://www.mysite.com/sitemap1.xml.gz</loc>
    <lastmod>2004-10-01T18:23:17+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.mysite.com/sitemap2.xml.gz</loc>
    <lastmod>2005-01-01</lastmod>
  </sitemap>
</sitemapindex>
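For a large site, the limits above suggest splitting the URL list into groups of at most 50,000 and listing the resulting files in an index. A minimal Python 3 sketch; chunk_urls and build_sitemap_index are hypothetical helpers, and the sketch checks only the URL-count and index-size limits, not the 10MB size limit.

```python
import xml.etree.ElementTree as ET

MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAPS_PER_INDEX = 1_000
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into consecutive groups of at most `size` URLs."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemaps):
    """Build a Sitemap index from (loc, lastmod) pairs; lastmod may be None."""
    if len(sitemaps) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError(
            f"{len(sitemaps)} Sitemaps exceeds the index limit of {MAX_SITEMAPS_PER_INDEX}"
        )
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in sitemaps:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
        if lastmod is not None:  # lastmod is optional in the index
            ET.SubElement(sitemap, "lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Each chunk would then be written to its own file (named, for example, sitemap1.xml.gz, sitemap2.xml.gz as in the sample) and the file URLs passed to build_sitemap_index.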
Note: Sitemap URLs, like all values in your XML files, must be XML-encoded.

Sitemap Index XML Tag Definitions

loc: Required. Identifies the location of the Sitemap.
lastmod: Optional. Identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that any of the pages listed in that Sitemap were changed. The value should be in ISO 8601 format. By providing the last modification timestamp, you enable search engine crawlers to retrieve only a subset of the Sitemaps in the index; for example, a crawler can retrieve only the Sitemaps modified since a certain date. This incremental Sitemap fetching mechanism allows for the rapid discovery of new URLs on very large sites.
sitemap: Encapsulates information about an individual Sitemap.
sitemapindex: Encapsulates information about all of the Sitemaps in the file.
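The incremental fetching described for lastmod can be sketched from the consumer's side. This is an illustrative Python 3 sketch, not how any particular search engine works; sitemaps_modified_since and parse_lastmod are hypothetical helpers, and treating date-only values as midnight UTC is an assumption. It relies on datetime.fromisoformat (Python 3.7+), which accepts both forms used in the sample index.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"sm": "http://www.google.com/schemas/sitemap/0.84"}


def parse_lastmod(text):
    """Parse "YYYY-MM-DD" or "YYYY-MM-DDThh:mm:ss+hh:mm" into an aware datetime."""
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:  # date-only value: assume midnight UTC
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def sitemaps_modified_since(index_bytes, since):
    """Return Sitemap URLs whose index lastmod is at or after `since` (aware datetime).

    Entries without lastmod are always returned, since nothing is known about them.
    """
    root = ET.fromstring(index_bytes)
    selected = []
    for sitemap in root.findall("sm:sitemap", NS):
        loc = sitemap.findtext("sm:loc", namespaces=NS)
        lastmod = sitemap.findtext("sm:lastmod", namespaces=NS)
        if lastmod is None or parse_lastmod(lastmod) >= since:
            selected.append(loc)
    return selected
```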

Location of Sitemap Files


The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://yoursite.com/catalog/sitemap.gz can include any URLs starting with http://yoursite.com/catalog/ but cannot include URLs starting with http://yoursite.com/images/. If you have permission to change "http://site.org/path/sitemap.gz", it is safe to assume that you also have permission to provide information for URLs with the prefix "http://site.org/path/". Examples of URLs considered valid in http://yoursite.com/catalog/sitemap.gz include:

http://yoursite.com/catalog/show?item=23
http://yoursite.com/catalog/show?item=233&user=3453
URLs not considered valid in http://yoursite.com/catalog/sitemap.gz include:

http://yoursite.com/image/show?item=23
http://yoursite.com/image/show?item=233&user=3453
http://mysite.com/catalog/show?item=24
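The prefix rule can be checked mechanically before a URL is added to a Sitemap. A minimal Python 3 sketch; in_sitemap_scope is a hypothetical helper, and it performs a plain case-sensitive string comparison with no URL normalization, which is an assumption of the sketch.

```python
from urllib.parse import urlsplit


def in_sitemap_scope(sitemap_url, url):
    """Return True if url may be listed in the Sitemap at sitemap_url.

    The URL must start with the Sitemap's scheme, host and directory
    (everything up to and including the last "/" of the Sitemap's path).
    """
    parts = urlsplit(sitemap_url)
    directory = parts.path[: parts.path.rfind("/") + 1]
    prefix = f"{parts.scheme}://{parts.netloc}{directory}"
    return url.startswith(prefix)
```

With the examples above, in_sitemap_scope("http://yoursite.com/catalog/sitemap.gz", "http://yoursite.com/catalog/show?item=23") is True, while the image and mysite.com URLs are rejected. A Sitemap at the root, such as http://yoursite.com/sitemap.gz, accepts every URL on yoursite.com, which is why the root location is recommended.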


URLs that are not considered valid are dropped from further consideration. It is strongly recommended that you place your Sitemap at the root directory of your Web server. For example, if your HTTP Web server is at yoursite.com, then your Sitemap index file would be at "http://yoursite.com/sitemap.gz". In certain cases, you may need to produce different Sitemaps for different paths, for example if security permissions in your organization compartmentalize write access to different directories.


Frequently Asked Questions


How do I XML-encode a URL?
Does it matter which character encoding method I use to generate my Sitemap files?
How do I specify time?
How do I compute lastmod date?
Where do I place my Sitemap?
How big can my Sitemap be?
My site has tens of millions of URLs; can I somehow submit only those that have changed recently?
What happens after I produce my Sitemap?
Do URLs in the Sitemap need to be completely specified?
My site has both "http" and "https" versions of URLs. Do I need to list both?
URLs on my site have session IDs in them. Do I need to remove them?
Does the position of a URL in a Sitemap influence its use?
Some of the pages on our site use frames. Should we include the frameset URLs or the URLs of the frame contents?
Can I zip my Sitemaps or do they have to be gzipped?
Will the "priority" hint in the XML Sitemap change the ranking of my pages in search results?
Is there an XML schema that I can validate my XML Sitemap against?
Q: How do I XML-encode a URL?
To properly encode your URLs, follow the procedure recommended by the HTML 4.0 specification, section B.2.1: convert the string to UTF-8 and then URL-escape the result. For details about Internationalized Resource Identifiers, also see RFC 2396 (sections 2.3 and 2.4) and RFC 3987. The following example Python session XML-encodes a URL:

$ python
Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
>>> import xml.sax.saxutils
>>> xml.sax.saxutils.escape("http://www.test.org/view?widget=3&count>2")
The encoded URL from the example above is:

http://www.test.org/view?widget=3&amp;count&gt;2
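The session above performs only the XML-escaping step. Both steps of the answer (UTF-8 URL-escaping, then XML-escaping) can be combined in Python 3 with urllib.parse.quote and xml.sax.saxutils.escape. This is a hedged sketch: encode_sitemap_url is a hypothetical helper, and keeping "%" in the safe set assumes that any "%" already in the input begins a valid escape. Note that, unlike the session above, this version percent-encodes ">" as %3E rather than XML-escaping it, because ">" is not a legal URI character.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape


def encode_sitemap_url(url):
    """URL-escape non-ASCII and unsafe characters as UTF-8, then XML-escape."""
    # Reserved URI delimiters and "%" are left alone (assumes existing
    # %-escapes in the input are valid); everything else is percent-encoded
    # from its UTF-8 bytes, which is quote()'s default encoding.
    uri = quote(url, safe=":/?#[]@!$&'()*+,;=%")
    # XML-escape &, < and > so the value can be placed inside <loc>.
    return escape(uri)
```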
Q: Does it matter which character encoding method I use to generate my Sitemap files?
Yes. Your Sitemap files must use UTF-8 encoding.

Q: How do I specify time?
Use ISO 8601 encoding for the lastmod timestamps and all other dates and times in this protocol, for example 2004-09-22T14:12:14+00:00. If you wish, you can omit the time portion of the ISO 8601 format; for example, 2004-09-22 is also valid. However, if your site changes frequently, you are encouraged to include the time portion so crawlers have more complete information about your site.

Q: How do I compute lastmod date?
For static files, this is the actual file update date. You can use the UNIX date command to get this date:

$ date --iso-8601=seconds -u -r /home/foo/www/bar.html
2004-10-26T08:56:39+00:00
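If the Sitemap is generated by a script, the same value can be computed without shelling out to date. A minimal Python 3 sketch; file_lastmod is a hypothetical helper that reads the file's modification time and formats it in UTC with second precision, mirroring the command above.

```python
import os
from datetime import datetime, timezone


def file_lastmod(path):
    """Return a file's modification time as an ISO 8601 UTC timestamp,
    similar to `date --iso-8601=seconds -u -r PATH`."""
    mtime = os.path.getmtime(path)
    # timespec="seconds" drops the sub-second part of the timestamp.
    return datetime.fromtimestamp(mtime, tz=timezone.utc).isoformat(timespec="seconds")
```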


For many dynamic URLs, you may be able to compute a lastmod date easily, based on when the underlying data was changed or on an approximation based on periodic updates (if applicable). Using even an approximate date or timestamp can help crawlers avoid crawling URLs that have not changed, which reduces the bandwidth and CPU requirements for your Web servers.

Q: Where do I place my Sitemap?

It is strongly recommended that you place your Sitemap at the root directory of your Web server; that is, place it at http://yoursite.com/sitemap.gz. In some situations, you may want to produce different Sitemaps for different paths on your site, for example if security permissions in your organization compartmentalize write access to different directories. If you have permission to change http://site.org/path/sitemap, then it is generally safe to assume that you also have permission to report metadata under http://site.org/path/.

Q: How big can my Sitemap be?
Search engines will not process Sitemaps that are larger than 10MB (10,485,760 bytes) when uncompressed or that contain more than 50,000 URLs. This means that if your site contains more than 50,000 URLs or your Sitemap is bigger than 10MB, you must create multiple Sitemap files and use a Sitemap index file. You should use a Sitemap index file even if you have a small site but plan on growing beyond 50,000 URLs or a file size of 10MB.

Q: My site has tens of millions of URLs; can I somehow submit only those that have changed recently?
You can list the updated URLs in a small number of Sitemaps that change frequently and then use the lastmod tag in your Sitemap index file to identify those Sitemap files. Search engines will then incrementally crawl only the changed Sitemaps.

Q: What happens after I produce my Sitemap?
After you produce your Sitemap, you will need to notify search engines of the Sitemap's location. The search engines that you notify will then retrieve your Sitemap and make the URLs available to their crawlers.

Q: Do URLs in the Sitemap need to be completely specified?
Yes. Search engines will crawl the URLs exactly as you provide them. (Search engines will XML-decode your URLs if they are XML-encoded.) You must include the protocol (for example, http) in your URL, and you must include a trailing slash if your Web server requires one. For example, http://www.google.com/ is a valid URL for a Sitemap, whereas www.google.com is not.

Q: My site has both "http" and "https" versions of URLs. Do I need to list both?
No. Please list only one version of a URL in your Sitemaps. Including multiple versions of URLs may result in incomplete crawling of your site.

Q: URLs on my site have session IDs in them. Do I need to remove them?
Yes. Including session IDs in URLs may result in incomplete and redundant crawling of your site.

Q: Does the position of a URL in a Sitemap influence its use?
No. The position of a URL in the Sitemap has no impact on how it is used or regarded by search engines.

Q: Some of the pages on our site use frames. Should we include the frameset URLs or the URLs of the frame contents?
Please include both URLs.

Q: Can I zip my Sitemaps or do they have to be gzipped?
Please use gzip to compress your Sitemaps.

Q: Will the "priority" hint in the XML Sitemap change the ranking of my pages in search results?
No. The "priority" hint in your Sitemap only indicates the importance of a particular URL relative to other URLs on your own site.

Q: Is there an XML schema that I can validate my XML Sitemap against?
An XML schema is available for Sitemap files at http://www.google.com/schemas/sitemap/0.84/sitemap.xsd, and a schema for Sitemap index files is available at http://www.google.com/schemas/sitemap/0.84/siteindex.xsd.
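The gzip and size answers above can be applied together when writing a Sitemap to disk. A minimal Python 3 sketch; write_gzipped_sitemap is a hypothetical helper that encodes the document as UTF-8, checks the 10,485,760-byte uncompressed limit, and compresses with the standard library's gzip module. It does not count URLs; that limit is a separate check.

```python
import gzip

MAX_UNCOMPRESSED_BYTES = 10 * 1024 * 1024  # 10,485,760 bytes


def write_gzipped_sitemap(xml_text, path):
    """Encode a Sitemap as UTF-8, enforce the uncompressed size limit, and gzip it to path.

    Returns the uncompressed size in bytes.
    """
    data = xml_text.encode("utf-8")
    # The limit applies to the uncompressed file, so check before compressing.
    if len(data) > MAX_UNCOMPRESSED_BYTES:
        raise ValueError(
            f"uncompressed Sitemap is {len(data)} bytes; the limit is {MAX_UNCOMPRESSED_BYTES}"
        )
    with gzip.open(path, "wb") as f:
        f.write(data)
    return len(data)
```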

© 2005 Google
