www.netskills.ac.uk

Web Architecture: HTTP, URIs & URLs

CSC1014

Some NETWORK terms


Term: Definition
Host: Machine connected to a network
Packet: Basic unit of Internet communication
IP: Internet Protocol to coordinate delivery of individual packets between hosts
IP address: Numerical address for an Internet host
Hostname: Case-insensitive string identifier for an Internet host
DNS: Distributed Domain Name System to translate between IP addresses and hostnames
Connection: Logical communication channel between hosts
TCP: Transmission Control Protocol providing abstraction of a reliable, bidirectional connection

JISC Netskills


Some WEB terms


Term: Definition
Web: Network of interlinked information
Hypertext: Linking related information for navigation
Internet: Worldwide network of networks using IP
Web page: Document accessible on the Web via a URI
Web site: Collection of related Web pages
Browser / Web client: Application (user agent) to request and display Web pages
Web server: Program that receives & responds to HTTP requests
Origin server: Server where requested resource resides
Intermediary: Web component in the request path between client and origin server (proxies, gateways)


Web architecture: Key Components


HTTP/1.1: HyperText Transfer Protocol
Format and semantics of request/response messages

URI: Uniform Resource Identifier
Formatted string that identifies a resource

HTML/XHTML: HyperText Markup Language

Plus
DNS: Domain Name System
TCP/IP: Internet Protocol Suite

URI: Anatomy
A Uniform Resource Identifier identifies a resource on the internet and is independent of current location
Generic
Scheme: http:
Authority: //www.cs.ncl.ac.uk
Path: /teaching/
Query: ?m=3504
Fragment: #cwork

https://internal.cs.ncl.ac.uk/modules/2011-12/csc3504/
https://internal.cs.ncl.ac.uk/modules/2011-12/modules.php?m=3504
https://internal.cs.ncl.ac.uk/modules/2011-12/csc3504/index.html#cwork

Scheme specific
Scheme: mailto:
Scheme specific syntax: some.body@ncl.ac.uk
Query: ?subject=Hello

<a href="mailto:some.body@ncl.ac.uk?subject=hello">Mail me</a>
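The anatomy above can be checked mechanically. The following is a minimal sketch, assuming Python's standard urllib.parse module; the helper name uri_anatomy is made up for illustration and is not part of the course material. It splits the slide's example URIs into scheme, authority, path, query and fragment.

```python
from urllib.parse import urlsplit


def uri_anatomy(uri: str) -> dict:
    """Split a URI into the five generic components named on the slide.

    uri_anatomy is a hypothetical helper name used only for this sketch.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # empty for schemes such as mailto:
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# The example URIs are taken from the slide.
for uri in (
    "https://internal.cs.ncl.ac.uk/modules/2011-12/modules.php?m=3504",
    "https://internal.cs.ncl.ac.uk/modules/2011-12/csc3504/index.html#cwork",
    "mailto:some.body@ncl.ac.uk?subject=hello",
):
    print(uri_anatomy(uri))
```

Note that the mailto: URI has no authority part: everything between the scheme and the query is scheme specific and is reported as the path.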



When is a URI a URL?


Uniform Resource Locator
"...a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network "location"), rather than by some other attributes it may have."

http://www.w3.org/TR/uri-clarification/#contemporary

http: is a URI scheme and an http: URI is a URL



HTTP: A request-response protocol


HyperText Transfer Protocol makes the Web work!
Clients ask for resources from servers by assembling and sending an HTTP request message
Servers respond with the appropriate HTTP response message, including any content to be displayed
Metadata in headers & content in an (optional) entity body

HTTP requests & responses travel as TCP/IP packets

HTTP is stateless
Each request/response pair is an independent exchange
No protocol-level maintenance of state (for scalability)


A Simple Request
1. User types URL (or clicks a link)
2. Client makes a DNS query to the DNS server: "Where is www.netskills.ac.uk?"
3. DNS server looks up & responds: "At IP address 128.240.233.249"
4. TCP connection established (2-way) with the Web server at 128.240.233.249
5. HTTP request (from client)
6. HTTP response (from server)
7. Browser processes & displays response
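Steps 2 to 6 of the simple request can be sketched with Python's standard socket module. This is an illustration only: the function name simple_request is hypothetical, it uses IPv4 only, it adds a "Connection: close" header so the end of the response is easy to detect, and it does no error handling or redirect following.

```python
import socket


def simple_request(hostname: str, port: int = 80, path: str = "/") -> bytes:
    """Resolve a name, connect over TCP, send a GET and return the raw reply."""
    # Steps 2-3: ask the resolver (DNS for real hostnames) for an IPv4 address.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        hostname, port, socket.AF_INET, socket.SOCK_STREAM
    )[0]
    # Step 4: open a two-way TCP connection to that address.
    with socket.socket(family, socktype, proto) as conn:
        conn.settimeout(10)
        conn.connect(sockaddr)
        # Step 5: send the HTTP request. "Connection: close" asks the server
        # to close the connection once the response is complete.
        request = (
            f"GET {path} HTTP/1.1\r\n"
            f"Host: {hostname}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        conn.sendall(request.encode("ascii"))
        # Step 6: read the response until the server closes the connection.
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

With network access, simple_request("www.netskills.ac.uk") would return the status line, headers and body as raw bytes; step 7 (processing and display) is the browser's job and is not modelled here.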


HTTP methods
The operations carried out over HTTP
HEAD, GET, POST, PUT, DELETE etc

HEAD and GET are mandatory (all resources support them)
Others are optional (depends on what you're up to)

GET returns current state and content of resource
This is the "default" method for HTTP

HEAD just returns response metadata
i.e. a GET without the body (content)
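The difference between HEAD and GET can be observed against a throwaway local server. This is a minimal sketch using Python's standard http.server and http.client modules; DemoHandler, PAGE and fetch are names invented for this example, and the server only exists for the duration of the script.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body>Hello</body></html>"  # hypothetical resource content


class DemoHandler(BaseHTTPRequestHandler):
    """Serves the same metadata for HEAD and GET; only GET sends the body."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)

    def do_HEAD(self):
        self._send_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(method: str, port: int):
    """Send one request and return (status, Content-Length header, body)."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/")
    response = conn.getresponse()
    result = (response.status, response.getheader("Content-Length"), response.read())
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
head_result = fetch("HEAD", server.server_address[1])
get_result = fetch("GET", server.server_address[1])
server.shutdown()
server.server_close()

print("HEAD:", head_result)
print("GET: ", get_result)
```

Both requests report the same status and Content-Length, but only the GET result carries the entity body.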


HTTP headers
Headers are the metadata for the request/response exchange
Some are generic and apply to both request and response e.g.
Date -> the date/time stamp for the message
Cache-Control -> instructions for en-route caching (or not)

Understanding how to read, set and manipulate HTTP headers is very useful for managing a web site
Be aware that headers can be spoofed: not good!
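As an illustration of setting the two generic headers named above, the sketch below builds Date and Cache-Control values with Python's standard library. The helper name general_headers and the max-age value are assumptions made for this example; HTTP dates are always expressed in GMT.

```python
import calendar
from email.utils import formatdate


def general_headers(timestamp: float, max_age: int) -> dict:
    """Return Date and Cache-Control headers for a message.

    general_headers is a hypothetical helper; max_age is in seconds.
    """
    return {
        "Date": formatdate(timestamp, usegmt=True),  # HTTP dates use GMT
        "Cache-Control": f"max-age={max_age}",
    }


# 11 Oct 2010 11:00:00 GMT expressed as seconds since the Unix epoch.
stamp = calendar.timegm((2010, 10, 11, 11, 0, 0))
headers = general_headers(stamp, max_age=3600)
for name, value in headers.items():
    print(f"{name}: {value}")
```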

More HTTP Headers


Request headers (19 in total in 4 classes)
Response preferences -> Accept, Accept-Charset etc
Additional request info -> Authorization, From etc
Conditional headers -> If-Modified-Since etc
Constrain server behaviour -> Max-Forwards etc

Response headers (9 in total in 4 classes)
Redirection -> Location
Additional information -> Server, Retry-After etc
Authentication -> WWW-Authenticate, Proxy-Authenticate
Caching -> Age etc

Entity & Hop-by-hop headers
Info about the resource (content) -> Content-Type etc
Hop-by-hops can be read, stripped or added to en-route


Common HTTP methods


Method | Use | Safe* | Idempotent** | Mandatory
HEAD | Exchange of request/response headers | Yes | Yes | Yes
GET | Request and return the current state and content of a resource e.g. access a web page | Yes | Yes | Yes
POST | Request uses entity body to update resource or as input for processing e.g. form input | No | No | No
PUT | Server stores entity body contents at request URI location e.g. file uploads | No | Yes | No
DELETE | Deletes identified resource i.e. opposite of PUT | No | Yes | No

* Safe: does not change state of resource
** Idempotent: the side effects of repeated, identical requests are the same as for a single request
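The Safe and Idempotent columns can be reproduced with a toy model. The sketch below is not how a real server works: it assumes an in-memory dictionary standing in for the server's resources, with POST modelled as "append the body to the resource". The names apply and classify are hypothetical.

```python
def apply(method, store, uri, body=None):
    """Apply one request to a copy of the store and return the new store.

    The store maps a URI to a list of items (a stand-in for resource state).
    """
    new_store = {key: list(items) for key, items in store.items()}
    if method == "PUT":
        new_store[uri] = [body]  # replace whatever is at the URI
    elif method == "POST":
        new_store.setdefault(uri, []).append(body)  # add to the resource
    elif method == "DELETE":
        new_store.pop(uri, None)  # remove the resource if present
    # HEAD and GET only read, so they leave the store unchanged.
    return new_store


def classify(method):
    """Check safety and idempotency by sending the same request twice."""
    start = {"/notes": ["first"]}
    once = apply(method, start, "/notes", "new")
    twice = apply(method, once, "/notes", "new")
    return {"safe": once == start, "idempotent": twice == once}


for method in ("HEAD", "GET", "POST", "PUT", "DELETE"):
    print(method, classify(method))
```

In this model a repeated PUT or DELETE leaves the store as it was after the first request, while each repeated POST adds another item, which matches the table.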

HTTP response codes


Generated by the server to tell a client the status of a request
41 response codes in total (some you'll never see!)
Class | Range | Examples
Informational | 1XX | 100 Continue, 101 Switching Protocols
Success | 2XX | 200 OK, 201 Created, 204 No Content
Redirection | 3XX | 300 Multiple Choices, 301 Moved Permanently
Client Error | 4XX | 400 Bad Request, 401 Unauthorized, 403 Forbidden, 404 Not Found
Server Error | 5XX | 500 Internal Server Error
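Because the class of a response code is determined by its first digit, a client can classify any code with integer division. The sketch below assumes Python's standard http.HTTPStatus enumeration for the reason phrases; the STATUS_CLASSES mapping and the describe function are names made up for this example.

```python
from http import HTTPStatus

# First digit of the code -> class name, as in the table above.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return 'code phrase (class)' for a standard HTTP status code."""
    status_class = STATUS_CLASSES[code // 100]
    return f"{code} {HTTPStatus(code).phrase} ({status_class})"


for code in (100, 200, 301, 404, 500):
    print(describe(code))
```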


Basic HTTP request structure


Request line -> GET /index.html HTTP/1.1
General headers -> Date: Mon, 11 Oct 2010 11:00:00 GMT
Request headers -> Host: www.internal.cs.ncl.ac.uk
Request headers -> User-Agent: Mozilla/5.0
Entity & Hop-by-hop headers
CRLF
Entity body
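The request structure can be assembled by hand: a request line, one header per line, a blank line (CRLF), then an optional entity body. The following sketch uses the values from the slide; the function name build_request is hypothetical, and the GMT suffix on the Date value is added here because the HTTP date format requires it.

```python
CRLF = "\r\n"


def build_request(method, target, headers, body=b""):
    """Assemble a raw HTTP/1.1 request message as bytes."""
    lines = [f"{method} {target} HTTP/1.1"]  # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line (CRLF on its own) separates the headers from the body.
    head = CRLF.join(lines) + CRLF + CRLF
    return head.encode("ascii") + body


message = build_request(
    "GET",
    "/index.html",
    {
        "Date": "Mon, 11 Oct 2010 11:00:00 GMT",
        "Host": "www.internal.cs.ncl.ac.uk",
        "User-Agent": "Mozilla/5.0",
    },
)
print(message.decode("ascii"))
```

A GET normally has no entity body, so the message ends with the blank line.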


Basic HTTP response structure


Status line -> HTTP/1.1 200 OK
General headers -> Date: Mon, 11 Oct 2010 11:00:01 GMT
Response headers -> Server: Apache/2.2.16 (Unix)
Entity headers -> Content-Length: 4488
Entity headers -> Content-Type: text/html
Hop-by-hop headers
CRLF
Entity body -> <!DOCTYPE html> <html> XHTML WEB PAGE CONTENT </html>
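Going the other way, a client splits a response at the first blank line: everything before it is the status line and headers, everything after it is the entity body. This is a minimal sketch, not a complete parser (it ignores chunked encoding and repeated headers); parse_response is a hypothetical name, and the body is shortened so that Content-Length is computed from it rather than copied from the slide.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return int(code), reason, headers, body


page = b"<html>XHTML WEB PAGE CONTENT</html>"  # stand-in for the real page
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Mon, 11 Oct 2010 11:00:01 GMT\r\n"
    "Server: Apache/2.2.16 (Unix)\r\n"
    f"Content-Length: {len(page)}\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
).encode("ascii") + page

status, reason, headers, entity_body = parse_response(raw)
print(status, reason)
print(headers)
print(entity_body)
```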


Proxies and caches


A proxy acts as a server to clients and as a client to other proxies or to an origin server
Used to share access, to anonymise clients, as a gateway to other systems, as a filter/firewall, as a cache

A cache stores HTTP messages to reduce user-perceived latency, network traffic and server load
HTTP provides extensive cache control support (what can/cannot be cached, when to invalidate entries etc.)
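To make the cache control idea concrete, here is a deliberately tiny sketch of a cache that understands only two Cache-Control directives, no-store and max-age. Real HTTP caching has many more rules (validation, Vary, heuristic freshness and so on); the class name TinyCache and its methods are invented for this example.

```python
import time


class TinyCache:
    """Toy cache keyed by URI; honours only 'no-store' and 'max-age=N'."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the behaviour can be demonstrated
        self._entries = {}   # uri -> (expiry time, stored body)

    def store(self, uri, body, cache_control=""):
        directives = [d.strip().lower() for d in cache_control.split(",")]
        if "no-store" in directives:
            return  # the response must not be cached
        for directive in directives:
            if directive.startswith("max-age="):
                max_age = int(directive.split("=", 1)[1])
                self._entries[uri] = (self._clock() + max_age, body)

    def lookup(self, uri):
        """Return the cached body, or None if absent or no longer fresh."""
        entry = self._entries.get(uri)
        if entry is None:
            return None
        expires_at, body = entry
        if self._clock() >= expires_at:
            del self._entries[uri]  # invalidate the stale entry
            return None
        return body


now = [0.0]  # fake clock, in seconds, so the demo does not have to wait
cache = TinyCache(clock=lambda: now[0])
cache.store("/index.html", b"<html>...</html>", "max-age=60")
print(cache.lookup("/index.html"))  # still fresh, served from the cache
now[0] = 61.0
print(cache.lookup("/index.html"))  # older than max-age, treated as a miss
```

A miss means the proxy forwards the request to the origin server; a hit avoids that round trip, which is where the savings in latency, traffic and server load come from.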


Standardisation
Standards promote interoperability between client and server software from different providers

Client has a clear expectation of valid responses to a request; Server can unambiguously interpret a request
Levels the playing field; can promote innovation (and inhibit it!)

Two important open standards bodies for the Web

IETF for evolution of Internet and protocols (incl. HTTP)
W3C for representation of content (incl. HTML), architecture, social and legal issues, accessibility
