

Web Architecture: HTTP, URIs & URLs


Some NETWORK terms

Term: Definition
Host: Machine connected to a network
Packet: Basic unit of Internet communication
IP: Internet Protocol to coordinate delivery of individual packets between hosts
IP address: Numerical address for an Internet host
Hostname: Case-insensitive string identifier for an Internet host
DNS: Distributed Domain Name System to translate between IP addresses and hostnames
Connection: Logical communication channel between hosts
TCP: Transmission Control Protocol providing abstraction of a reliable, bidirectional connection

JISC Netskills


Some WEB terms

Term: Definition
Web: Network of interlinked information
Hypertext: Linking related information for navigation
Internet: Worldwide network of networks using IP
Web page: Document accessible on the Web via a URI
Web site: Collection of related Web pages
Browser/Web Client: Application (user agent) to request and display Web pages
Web server: Program that receives & responds to HTTP requests
Origin server: Server where requested resource resides
Intermediary: Web component in the request path between client and origin server (proxies, gateways)


Web architecture: Key Components

HTTP/1.1: HyperText Transfer Protocol
Format and semantics of request/response messages

URI: Uniform Resource Identifier
Formatted string that identifies a resource

HTML/XHTML: HyperText Markup Language

Plus
DNS: Domain Name System
TCP/IP: Internet Protocol Suite


URI: Anatomy
A Uniform Resource Identifier identifies a resource on the internet and is independent of current location
Generic and scheme specific URIs both have two parts: a scheme, followed by scheme specific syntax



<a href="">Mail me</a>
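The split between the scheme and the scheme specific syntax can be sketched with Python's standard `urllib.parse` module. The two example URIs below are hypothetical stand-ins (they are not from the slides), and `describe` is a helper name chosen only for this illustration.

```python
from urllib.parse import urlsplit


def describe(uri: str) -> dict:
    """Split a URI into its scheme and the main scheme specific parts."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,   # None when the scheme has no network host
        "port": parts.port,       # None when no port is given
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# Hypothetical example URIs, assumed for illustration only.
web = describe("http://www.example.org:8080/docs/index.html?lang=en#intro")
mail = describe("mailto:someone@example.org")

print(web)
print(mail)
```

The http: URI carries a host, port and path, while the mailto: URI has only a scheme and an address, which shows how the syntax after the scheme varies from one scheme to another.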



When is a URI a URL?

Uniform Resource Locator
"...a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network "location"), rather than by some other attributes it may have."

http: is a URI scheme


an http: URI is a URL



HTTP: A request-response protocol

HyperText Transfer Protocol makes the Web work!
Clients ask for resources from servers by assembling and sending an HTTP request message
Servers respond with the appropriate HTTP response message, including any content to be displayed
Metadata in headers & content in an (optional) entity body

HTTP requests & responses travel as TCP/IP packets

HTTP is stateless
Each request/response pair is an independent exchange
No protocol level maintenance of state (for scalability)



A Simple Request
Participants: client (browser), DNS server, Web server

1. User types URL (or clicks a link)
2. Client makes a DNS query to the DNS server: "Where is"
3. DNS server looks up & responds "At IP address"
4. TCP connection established (2-way) between client and Web server
5. HTTP request (from client)
6. HTTP response (from server)
7. Browser processes & displays response
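The numbered steps above can be sketched with plain sockets. This is a minimal illustration, not how a real browser is built: it assumes a throwaway local server (`HelloHandler`, a hypothetical name) so that it runs without Internet access, and it uses the name `localhost` as a stand-in for a real hostname lookup.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the Web server in the diagram."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch(hostname: str, port: int, path: str) -> bytes:
    # Steps 2-3: translate the hostname into an IP address.
    family, socktype, proto, _, address = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )[0]
    # Step 4: establish the TCP connection.
    with socket.socket(family, socktype, proto) as conn:
        conn.connect(address)
        # Step 5: send the HTTP request message.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {hostname}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        conn.sendall(request.encode("ascii"))
        # Step 6: read the HTTP response until the server closes the connection.
        chunks = []
        while True:
            chunk = conn.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return b"".join(chunks)


server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

response = fetch("localhost", server.server_port, "/index.html")
server.shutdown()
server.server_close()

# Step 7 would be the browser rendering the body; here we only show the status line.
status_line = response.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```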


HTTP methods
The operations carried out over HTTP

HEAD and GET are mandatory (all general-purpose servers must support them)
Others are optional (depends on what you're up to)

GET returns current state and content of resource
This is the "default" method for HTTP

HEAD just returns response metadata
i.e. a GET without the body (content)
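The difference between GET and HEAD can be seen by sending both to the same resource. The sketch below assumes a small local test server (`PageHandler` and `PAGE` are hypothetical names for this example) and uses Python's standard `http.client` as the client.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Example page</body></html>"  # assumed sample content


class PageHandler(BaseHTTPRequestHandler):
    """Hypothetical server that answers both GET and HEAD for one page."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)  # headers plus entity body

    def do_HEAD(self):
        self._send_headers()    # same headers, no entity body

    def log_message(self, *args):
        pass


def request(method: str, port: int):
    """Return (status, Content-Length header, body) for one request."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    resp = conn.getresponse()
    result = (resp.status, resp.getheader("Content-Length"), resp.read())
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

get_result = request("GET", server.server_port)
head_result = request("HEAD", server.server_port)

server.shutdown()
server.server_close()

print("GET :", get_result)
print("HEAD:", head_result)
```

Both responses report the same status and Content-Length, but only the GET response carries the page itself.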



HTTP headers
Headers are the metadata for the request/response exchange
Some are generic and apply to both request and response, e.g.
Date -> the date/time stamp for the message
Cache-Control -> instructions for en-route caching (or not)

Understanding how to read, set and manipulate HTTP headers is very useful for managing a web site
Be aware that headers can be spoofed (not good!)


More HTTP Headers

Request headers (19 in total in 4 classes)
Response preferences -> Accept, Accept-Charset etc
Additional request info -> Authorization, From etc
Conditional headers -> If-Modified-Since etc
Constrain server behaviour -> Max-Forwards etc

Response headers (9 in total in 4 classes)
Redirection -> Location
Additional information -> Server, Retry-After etc
Authentication -> WWW-Authenticate, Proxy-Authenticate
Caching -> Age etc

Entity & Hop-by-hop headers
Info about the resource (content) -> Content-Type etc
Hop-by-hops can be read, stripped or added to en-route


Common HTTP methods

HEAD
Use: Exchange of request/response headers
Safe*: Yes; Idempotent**: Yes; Mandatory: Yes

GET
Use: Request and return the current state and content of a resource, e.g. access a web page
Safe*: Yes; Idempotent**: Yes; Mandatory: Yes

POST
Use: Request uses entity body to update resource or as input for processing, e.g. form input
Safe*: No; Idempotent**: No; Mandatory: No

PUT
Use: Server stores entity body contents at request URI location, e.g. file uploads
Safe*: No; Idempotent**: Yes; Mandatory: No

DELETE
Use: Deletes identified resource, i.e. opposite of PUT
Safe*: No; Idempotent**: Yes; Mandatory: No

* Safe: does not change state of resource
** Idempotent: the side effects of repeated, identical requests are the same as for a single request
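The "safe" and "idempotent" properties can be illustrated with a toy in-memory model. `ResourceStore` and the helper functions are hypothetical and only mimic the intent of each method. In particular, POST is modelled here as appending a submission, which is one common behaviour and not the only one a server may choose.

```python
class ResourceStore:
    """Toy in-memory stand-in for a server's resources (illustration only)."""

    def __init__(self):
        self.resources = {}    # URI -> stored content
        self.submissions = []  # inputs received via POST

    def get(self, uri):
        return self.resources.get(uri)      # reads only: safe

    def put(self, uri, body):
        self.resources[uri] = body          # same result however often repeated

    def delete(self, uri):
        self.resources.pop(uri, None)       # deleting twice leaves the same state

    def post(self, uri, body):
        self.submissions.append((uri, body))  # every repeat adds another entry

    def snapshot(self):
        return (dict(self.resources), list(self.submissions))


def state_after(operation, times):
    """Apply `operation` to a fresh store `times` times and return its state."""
    store = ResourceStore()
    store.put("/report.txt", "v1")
    for _ in range(times):
        operation(store)
    return store.snapshot()


def do_get(store):
    store.get("/report.txt")


def do_put(store):
    store.put("/report.txt", "v2")


def do_delete(store):
    store.delete("/report.txt")


def do_post(store):
    store.post("/form", "name=alice")


print("GET safe:         ", state_after(do_get, 3) == state_after(do_get, 0))
print("PUT idempotent:   ", state_after(do_put, 1) == state_after(do_put, 3))
print("DELETE idempotent:", state_after(do_delete, 1) == state_after(do_delete, 3))
print("POST idempotent:  ", state_after(do_post, 1) == state_after(do_post, 3))
```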


HTTP response codes

Generated by the server to tell a client the status of a request
41 response codes in total (some you'll never see!)

Class (Range): Examples
Informational (1XX): 100 Continue, 101 Switching Protocols
Success (2XX): 200 OK, 201 Created, 204 No Content
Redirection (3XX): 300 Multiple Choices, 301 Moved Permanently
Client Error (4XX): 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found
Server Error (5XX): 500 Internal Server Error
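Because the class of a response is given by the first digit of its code, a client can classify any response with simple arithmetic. The sketch below uses the reason phrases from Python's standard `http.HTTPStatus`; the names `CLASSES`, `status_class` and `describe` are chosen for this example only.

```python
from http import HTTPStatus

# First digit of the status code -> response class, as in the table above.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code: int) -> str:
    """Return the class name for a status code, or "Unknown" if out of range."""
    return CLASSES.get(code // 100, "Unknown")


def describe(code: int) -> str:
    """Combine the code, its standard reason phrase and its class."""
    return f"{code} {HTTPStatus(code).phrase} ({status_class(code)})"


for code in (100, 200, 301, 404, 500):
    print(describe(code))
```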


Basic HTTP request structure

Structure:
  Request line
  General headers
  Request headers
  Entity & Hop-by-hop headers
  CRLF
  Entity body

Example:
  GET /index.html HTTP/1.1
  Date: Mon, 11 Oct 2010 11:00:00
  Host:
  User-Agent: Mozilla/5.0
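A request message with this structure can be assembled as plain text, one CRLF-terminated line at a time, with an empty line separating the headers from the optional entity body. This is a minimal sketch: `build_request` is a hypothetical helper, and the Host value `www.example.org` is an assumption for illustration, not a value taken from the slide.

```python
def build_request(method: str, target: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble an HTTP/1.1 request: request line, headers, blank line, body."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # the blank line ends the header section
    return head.encode("ascii") + body


message = build_request(
    "GET",
    "/index.html",
    {
        "Host": "www.example.org",   # assumed hostname for illustration
        "User-Agent": "Mozilla/5.0",
    },
)

print(message.decode("ascii"))
```

A GET request normally has no entity body, so the message ends straight after the blank line.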



Basic HTTP response structure

Structure:
  Status line
  General headers
  Response headers
  Entity headers
  Hop-by-hop headers
  CRLF
  Entity body

Example:
  HTTP/1.1 200 OK
  Date: Mon, 11 Oct 2010 11:00:01
  Server: Apache/2.2.16 (Unix)
  Content-Length: 4488
  Content-Type: text/html

  <!DOCTYPE html />
  <html >
  XHTML WEB PAGE CONTENT
  </html>
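Reading a response is the reverse of building a request: split at the first blank line, then take the status line and headers apart. The sketch below assumes a complete response is already held in memory. `parse_response` is a hypothetical helper, and the sample message is shortened so that Content-Length matches the body.

```python
def parse_response(raw: bytes):
    """Split a complete HTTP response into status line parts, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the blank line separates head and body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), reason, headers, body


# Shortened sample response, assumed for illustration.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Date: Mon, 11 Oct 2010 11:00:01 GMT\r\n"
    b"Server: Apache/2.2.16 (Unix)\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)

version, code, reason, headers, body = parse_response(raw)
print(version, code, reason)
print(headers)
print(body)
```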



Proxies and caches

A proxy acts as a server to clients and as a client to other proxies or to an origin server
Used to share access, to anonymise clients, as a gateway to other systems, as a filter/firewall, as a cache

A cache stores HTTP messages to reduce user-perceived latency, network traffic and server load
HTTP provides extensive cache control support (what can/cannot be cached, when to invalidate entries etc.)
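How a cache decides whether a stored response may still be served can be sketched with the `max-age` and `no-store` directives of the Cache-Control header. This is a deliberately simplified model: `SimpleCache` and `max_age` are hypothetical names, and real caches also handle validation, `no-cache`, `Expires`, `Age` and more.

```python
import time


def max_age(cache_control: str):
    """Return the max-age lifetime in seconds, or None if the response is not stored."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives:
        return None
    for directive in directives:
        if directive.startswith("max-age="):
            return int(directive.split("=", 1)[1])
    return None  # no explicit lifetime: this sketch chooses not to cache


class SimpleCache:
    """Toy cache keyed by URI; an entry is served only while it is fresh."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.entries = {}  # URI -> (body, expiry time)

    def store(self, uri, body, cache_control):
        lifetime = max_age(cache_control)
        if lifetime is not None:
            self.entries[uri] = (body, self.clock() + lifetime)

    def lookup(self, uri):
        entry = self.entries.get(uri)
        if entry is None:
            return None
        body, expires = entry
        if self.clock() >= expires:
            del self.entries[uri]  # stale entry: invalidate it
            return None
        return body


# A fake clock makes the example deterministic.
now = [0.0]
cache = SimpleCache(clock=lambda: now[0])
cache.store("/logo.png", b"image-bytes", "public, max-age=60")
cache.store("/account", b"private-page", "no-store")

fresh_hit = cache.lookup("/logo.png")   # served from the cache
not_stored = cache.lookup("/account")   # never cached
now[0] = 61.0
stale_miss = cache.lookup("/logo.png")  # expired, so the origin must be asked again
print(fresh_hit, not_stored, stale_miss)
```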



Standards promote interoperability between client and server software from different providers

Client has a clear expectation of valid responses to a request
Server can unambiguously interpret a request
Levels the playing field; can promote innovation (and inhibit it!)

Two important open standards bodies for the Web

IETF for evolution of Internet and protocols (incl. HTTP)
W3C for representation of content (incl. HTML), architecture, social and legal issues, accessibility
