Web
The World-Wide Web (WWW) is a pair of software applications (a server and a client) that together allow both distribution of and access to information on the Internet. The Web is not the Internet itself but a means of distributing and accessing the information that is on it.
"E-Commerce" (Electronic Commerce or EC) is the buying and selling of goods and services on the Internet, especially on the World-Wide Web.
Amazon.com
HTTP Server
Clients (browsers)
URL/URI for addressing
PHP, Python, etc. for interaction
URIs
URI (Uniform Resource Identifier)
Universal naming mechanism for identifying resources on the Web
A resource is anything to which we can attach identity (Web page, image, anchor in page, database record, etc.)
The Web is an information space; URIs are the handles
The unique Web naming/addressing technology (HTML and HTTP are not the only data format and Web protocol)
URL (Uniform Resource Locator)
Subset of URIs for some existing Internet protocols (http, ftp, mailto, etc.)
The term is no longer used in specifications
URI syntax
Scheme: tells the application the type of the resource and the mechanisms to use to access it
Examples: http, ftp, news, mailto, telnet, file
Recent examples: Azureus magnet link, Skype call link
HTTP
The Hypertext Transfer Protocol is the set of rules for exchanging files (text, graphic images, sound, video, and other multimedia files) on the World Wide Web.
Syntax
http://<host>:<port>/<path>?<searchpart>
http://<host>:<port>/<path>#<fragment>
The TCP port is optional (80 by default)
If the path is empty, the system "home page" is returned
Path and search part are interpreted by the server
Example: http://hilcoe.com.et/registration.htm#msc
Special characters are represented by '%' (escape) followed by 2 hex digits
Examples: %20 (space), %25 (%), %26 (&), %2D (-), %2F (/), %3D (=), %3F (?), etc.
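The URL syntax and the '%' escaping rule can be tried out with Python's standard urllib.parse module. This is only a small sketch: the host and fragment come from the example above, while the port and the query string are illustrative additions.

```python
from urllib.parse import quote, unquote, urlsplit

# Host and fragment are taken from the slide; ":80" and the query string
# are assumptions added so that every URL component is present.
url = "http://hilcoe.com.et:80/registration.htm?term=fall%202024#msc"

parts = urlsplit(url)
print(parts.scheme)    # scheme: which mechanism to use
print(parts.hostname)  # <host>
print(parts.port)      # <port>
print(parts.path)      # <path>
print(parts.query)     # <searchpart>, still percent-escaped
print(parts.fragment)  # <fragment>

# Escaping: every reserved character becomes '%' plus two hex digits.
escaped = quote("a b&c=d/e?", safe="")
print(escaped)

# Unescaping the codes listed on the slide.
unescaped = unquote("%20%25%26%2D%2F%3D%3F")
print(repr(unescaped))
```

Passing safe="" asks quote to escape "/" as well, which it would otherwise leave alone.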
HTTP
HTTP publishes and retrieves HTML pages on the World Wide Web. It is the language used for communication between the browser and the web server. The information transferred using HTTP can be plain text, audio, video, images, or hypertext. Proxies, tunnels, and gateways may sit between the web browser (client) and the web server. An HTTP client initiates a request by establishing a TCP connection to a particular port on the remote host (typically 80 or 8080). An HTTP server listens on that port and receives the request message from the client. Upon receiving the request, the server sends back a status line, such as 200 OK, together with a message of its own or an error message.
[Figure: HTTP exchanged with the web server, carried in IP packets over Ethernet and SONET links]
HTTP in Context
[Figure: client, DNS query to a DNS server, request for http://origin/.., HTML returned]
HTTP Transactions
An HTTP transaction is a request/reply interaction between a Web client (e.g., a browser) and a Web server, using HTTP.
Client request:

    GET / HTTP/1.1
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
    Host: www.hilcoe.com.et
    Connection: Keep-Alive

Origin server response:

    HTTP/1.1 200 OK
    Date: Mon, 15 Jul 2002 08:49:00 GMT
    Server: Apache/1.3.26 (Unix) PHP/4.2.1
    Last-Modified: Wed, 12 Jun 2002 08:49:49 GMT
    ETag: "2a-50ea-3d070b2d"
    Accept-Ranges: bytes
    Content-Length: 20714
    Keep-Alive: timeout=15, max=100
    Connection: Keep-Alive
    Content-Type: text/html

    <html> ...
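A complete request/reply transaction can be reproduced on one machine with Python's standard library. The sketch below is not the server from the slides: HomePageHandler and its page body are hypothetical, and the server listens only on the loopback address on a port chosen by the operating system.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>home page</body></html>"  # hypothetical page content


class HomePageHandler(BaseHTTPRequestHandler):
    """Hypothetical origin server: answers every GET with a small HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def run_transaction():
    # Port 0 asks the OS for any free port on the loopback interface.
    server = HTTPServer(("127.0.0.1", 0), HomePageHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/", headers={"Accept-Language": "en-us"})
        response = conn.getresponse()
        result = (response.status, response.reason,
                  response.getheader("Content-Type"), response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


status, reason, content_type, body = run_transaction()
print(status, reason, content_type, len(body))
```

The client side mirrors the exchange above: one request line plus headers goes out, and a status line, headers, and the HTML body come back.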
1 object for the page skeleton
[n] objects, one for each page element (graphics, ...)
Communication overhead, for each object:
3 TCP messages (3-way handshake)
2 HTTP messages
2 TCP messages (connection close)
Eliminates the problem of establishing multiple TCP connections
Allows a client to re-use the existing TCP connection after the initial request
Message sequence between client and origin server: TCP SYN; TCP SYN, ACK; TCP ACK; HTTP Request 1; HTTP Response 1; HTTP Request 2; HTTP Response 2; TCP FIN; TCP FIN ACK
Communication overhead
For the first object:
3 TCP messages (3-way handshake)
2 HTTP messages
2 TCP messages (connection close)
Subsequent objects:
2 HTTP messages
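The two overhead figures can be compared with a little arithmetic. The sketch below simply counts messages using the numbers given above (3 for the handshake, 2 HTTP messages per object, 2 for the close); the function names are made up for illustration, and the count ignores TCP acknowledgements and segmentation of large objects.

```python
HANDSHAKE = 3   # TCP SYN; SYN, ACK; ACK
HTTP_PAIR = 2   # one request, one response
CLOSE = 2       # TCP FIN; FIN ACK


def non_persistent_messages(objects: int) -> int:
    """Every object pays for its own connection setup and teardown."""
    return objects * (HANDSHAKE + HTTP_PAIR + CLOSE)


def persistent_messages(objects: int) -> int:
    """One setup and one teardown, then two HTTP messages per object."""
    if objects == 0:
        return 0
    return HANDSHAKE + CLOSE + objects * HTTP_PAIR


for n in (1, 5, 20):
    print(n, non_persistent_messages(n), persistent_messages(n))
```

For a single object both schemes cost the same; the saving grows with the number of embedded objects on the page.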
Non-persistent
HTTP 1.0: server parses request, responds, closes TCP connection
2 RTTs (round-trip times) to fetch each object
Each transfer suffers from TCP's initially slow sending rate
Many browsers open multiple parallel connections
Persistent
Default for HTTP 1.1
On the same TCP connection: server parses request, responds, parses new request, and so on
Client sends requests for all referenced objects as soon as it receives the base HTML
Fewer RTTs, less slow start
A client does not have to wait for a response to one request before issuing a new request on the same TCP connection
Message sequence between client and origin server: TCP SYN; TCP SYN, ACK; TCP ACK; HTTP Request 1; HTTP Request 2; HTTP Response 1; HTTP Response 2; TCP FIN; TCP FIN ACK
Restrictions
Clients should not pipeline until they are sure the connection is persistent
HTTP responses must be returned in the same order as the requests
Clients should not pipeline requests that have side effects
On error, pipelining prevents clients from knowing which of a series of pipelined requests were executed by the server
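These restrictions translate fairly directly into client-side bookkeeping. The sketch below is an illustration, not a real HTTP client: can_pipeline and match_responses are hypothetical helpers, and treating only GET and HEAD as free of side effects is an assumption made for the example.

```python
from collections import deque

# Assumption for this sketch: only these methods are treated as side-effect free.
SAFE_TO_PIPELINE = {"GET", "HEAD"}


def can_pipeline(method: str, connection_is_persistent: bool) -> bool:
    """Pipeline only on a known-persistent connection and only safe methods."""
    return connection_is_persistent and method.upper() in SAFE_TO_PIPELINE


def match_responses(requests, responses):
    """Pair each response with the oldest unanswered request (FIFO order)."""
    pending = deque(requests)
    matched = []
    for response in responses:
        matched.append((pending.popleft(), response))
    # Anything left was sent but never answered. After an error the client
    # cannot tell whether the server executed these requests.
    return matched, list(pending)


matched, unanswered = match_responses(
    ["GET /a", "GET /b", "GET /c"],
    ["200 for /a", "200 for /b"],
)
print(matched)
print(unanswered)
```

Because responses carry no request identifier, arrival order is the only thing tying a response to its request, which is why the server must preserve it.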
Recall: a browser could naively process each embedded object serially
HTTP allows clients to open multiple connections and perform multiple HTTP transactions in parallel
Properties / drawbacks
Parallel connections may make pages load faster: connection delays can be overlapped if the client's bandwidth is not saturated
Parallel connections are not always faster: if client bandwidth is scarce, it is better to transfer each object as fast as possible
Parallel connections may "feel" faster: the user sees multiple objects gradually appearing on the screen
HTTP Requests
Request line (method, URI, HTTP version)
Header lines
A blank line (shown as <CR>; CR LF on the wire) indicates the end of the headers
Optional payload

    GET /index.htm HTTP/1.1
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
    Host: www.hilcoe.com.et
    Connection: Keep-Alive
    <CR>

GET /index.htm HTTP/1.1: request "/index.htm" using HTTP version 1.1
Accept: types of documents accepted by the browser
Accept-Language: preferred language is English
Accept-Encoding: the browser understands compressed documents
User-Agent: identification of the browser (real type is IE 5.01)
Host: what the client thinks the server host is
Connection: keep the TCP connection open until explicitly disconnected
General format of a request message:
Request line: method <sp> URL <sp> version <cr><lf>
Header lines: header field name ":" <sp> value <cr><lf> (one line per header)
Blank line: <cr><lf>
Entity body
First line: tells the server the method to use, the entity (document) to apply it to, and the client's version of HTTP
General header: used in both client and server messages
Request header: gives more information about the client
Entity header/body: used when an entity is sent by the client

    POST /cgi-bin/query HTTP/1.0
    Connection: Keep-Alive
    Host: www.hilcoe.com.et
    User-Agent: Mozilla/4.0
    Content-type: application/x-www-form-urlencoded
    Content-length: 23

    query=knuth&type=author
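The message layout can be checked by taking the POST example apart by hand. The following is a minimal sketch rather than a full HTTP parser; parse_request is a hypothetical helper, and the request text uses the registered media type application/x-www-form-urlencoded.

```python
from urllib.parse import parse_qs

RAW_REQUEST = (
    "POST /cgi-bin/query HTTP/1.0\r\n"
    "Connection: Keep-Alive\r\n"
    "Host: www.hilcoe.com.et\r\n"
    "User-Agent: Mozilla/4.0\r\n"
    "Content-type: application/x-www-form-urlencoded\r\n"
    "Content-length: 23\r\n"
    "\r\n"
    "query=knuth&type=author"
)


def parse_request(raw: str):
    """Split a request into (method, target, version, headers, body)."""
    head, _, body = raw.partition("\r\n\r\n")  # the blank line ends the headers
    request_line, *header_lines = head.split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, target, version, headers, body


method, target, version, headers, body = parse_request(RAW_REQUEST)
form = parse_qs(body)
print(method, target, version)
print(headers["content-length"], len(body))
print(form)
```

Note that Content-length matches the 23 bytes of the form body, which is how the server knows where the entity ends.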
Client Methods
GET
Retrieve a resource from the server (static file, or dynamically-generated data)
DELETE
Remove a resource from the server
HEAD
Get information about a resource (but not the actual resource)
POST
Client provides some information to the server, e.g., through forms (may update the state of the server)
PUT
Provide a new or replacement resource to put on the server
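How a server might treat each of these methods can be sketched with an in-memory store. Everything here is hypothetical (the store, the handle function, and the choice to make POST append data); a real server decides for itself what POST means for a given resource.

```python
store = {}  # hypothetical in-memory resource store: path -> bytes


def handle(method: str, path: str, body: bytes = b""):
    """Return (status_code, response_body) for one request."""
    if method == "GET":
        return (200, store[path]) if path in store else (404, b"")
    if method == "HEAD":
        # Same status as GET, but the resource itself is never returned.
        return (200, b"") if path in store else (404, b"")
    if method == "PUT":
        created = path not in store
        store[path] = body  # new or replacement resource
        return (201 if created else 200, b"")
    if method == "DELETE":
        if path in store:
            del store[path]
            return (204, b"")
        return (404, b"")
    if method == "POST":
        # The client supplies data; in this sketch it is appended to the resource.
        store[path] = store.get(path, b"") + body
        return (200, b"")
    return (501, b"")  # method not implemented


print(handle("PUT", "/notes", b"first"))
print(handle("GET", "/notes"))
print(handle("HEAD", "/notes"))
print(handle("DELETE", "/notes"))
print(handle("GET", "/notes"))
```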
HTTP Replies
HTTP/1.1 200 OK: document found (code 200); server is using HTTP 1.1
Date: current date at the server
Server: software run by the server
Last-Modified: most recent modification of the document
ETag: entity tag (unique identifier for the server resource, usable for caching)
Accept-Ranges: server can return subsections of a document
Content-Length: length of the body (which follows the header) in bytes
Connection: the connection will close after the server's response
Content-Type: what kind of document is included in the response
<html>...: document text (follows the blank line)
First line: tells the client the server's version of HTTP, the status code, and a human-readable description of the status
General header: used in both client and server messages
Response header: gives more information about the server
Entity header/body: response sent to the client

    HTTP/1.0 200 OK
    Date: Mon, 15 Jul 2002 08:49:00 GMT
    Server: Apache/1.3.26 (Unix) PHP/4.2.1
    Content-type: text/html
    Content-Length: 20714

    <html>...

Status codes:
100: Continue; 101: Switching protocols
200: OK; 201: Created; 204: No content; ...
301: Moved permanently; 305: Use proxy; ...
400: Bad request; 401: Unauthorized; ...
500: Internal error; 501: Not implemented; ...
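The first digit of a status code gives its class, which a client can use even for codes it does not recognise. The sketch below splits a status line and classifies the code; parse_status_line and status_class are hypothetical helpers, while HTTPStatus is the table of standard reason phrases in Python's standard library.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str):
    """Split 'HTTP/1.1 200 OK' into (version, code, reason)."""
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason


def status_class(code: int) -> str:
    return STATUS_CLASSES[code // 100]


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(version, code, reason, status_class(code))

for listed in (100, 301, 401, 501):
    print(listed, HTTPStatus(listed).phrase, status_class(listed))
```

Splitting with a limit of 2 keeps multi-word reason phrases such as "Moved Permanently" in one piece.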
Resource Retrieval
GET Method
    GET /index.htm HTTP/1.1
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
    Host: www.hilcoe.com.et
    Connection: Keep-Alive
    <CR>

Reply

    HTTP/1.1 200 OK
    Date: Mon, 15 Jul 2002 08:49:00 GMT
    Server: Apache/1.3.26 (Unix) PHP/4.2.1
    Last-Modified: Wed, 12 Jun 2002 08:49:49 GMT
    ETag: "2a-50ea-3d070b2d"
    Accept-Ranges: bytes
    Content-Length: 20714
    Connection: close
    Content-Type: text/html
    <CR>
    <html>...
POST Method
    POST /cgi-bin/query HTTP/1.1
    Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
    Host: www.hilcoe.com.et
    Content-type: application/x-www-form-urlencoded
    Content-length: 23
    <CR>
    query=knuth&type=author

Reply as before
HTTP Proxying
Proxy servers are both Web servers and Web clients
Can be nested; can be used as firewall, cache, anonymizer, etc.
A proxy may modify requests/replies (e.g., change image formats)
Client to proxy server 1: GET http://origin/..
Proxy server 1 to proxy server 2: GET http://origin/.. with Via: proxy1
Proxy server 2 to origin server: GET http://origin/.. with Via: proxy1,proxy2
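The way the Via header grows along a chain of proxies can be sketched as follows. The forward function and the dictionary representation of a request are hypothetical, and the header values are simplified as in the diagram: real Via entries also carry the protocol version (for example "1.1 proxy1").

```python
def forward(request: dict, proxy_name: str) -> dict:
    """Return the request as this proxy would pass it on, adding itself to Via."""
    forwarded = dict(request)  # copy, so the incoming request is left untouched
    via = forwarded.get("Via")
    forwarded["Via"] = f"{via},{proxy_name}" if via else proxy_name
    return forwarded


# The HTTP version in the request line is an assumption; the path is left
# elided exactly as in the diagram.
client_request = {"request-line": "GET http://origin/.. HTTP/1.1"}
after_proxy1 = forward(client_request, "proxy1")
after_proxy2 = forward(after_proxy1, "proxy2")

print(after_proxy1["Via"])
print(after_proxy2["Via"])
```

Each proxy appends its own name, so the origin server can see the whole path the request took.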
Managed by ISPs
Firewall
A firewall is a program, usually running on an Internet gateway server, that protects the resources of one network from users of other networks. An enterprise wants a firewall to prevent outsiders from accessing its private data resources. There are a number of firewall screening methods:
Screen requests to make sure they come from acceptable domain names and IP addresses.
Do not allow Telnet access into the network except for its own users.
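The two screening rules can be expressed as a small packet-filter check. This is only a sketch under stated assumptions: the address ranges are illustrative (private and documentation ranges), the allow function is hypothetical, and screening by domain name is left out.

```python
import ipaddress

# Hypothetical policy. The networks below are illustration-only ranges.
INTERNAL_NETWORK = ipaddress.ip_network("192.168.0.0/16")
ACCEPTABLE_SOURCES = [INTERNAL_NETWORK, ipaddress.ip_network("203.0.113.0/24")]
TELNET_PORT = 23


def allow(source_ip: str, destination_port: int) -> bool:
    source = ipaddress.ip_address(source_ip)
    # Rule 1: the request must come from an acceptable address.
    if not any(source in network for network in ACCEPTABLE_SOURCES):
        return False
    # Rule 2: Telnet is reserved for the network's own users.
    if destination_port == TELNET_PORT and source not in INTERNAL_NETWORK:
        return False
    return True


print(allow("192.168.1.5", 23))   # internal user, Telnet
print(allow("203.0.113.7", 80))   # accepted outsider, Web
print(allow("203.0.113.7", 23))   # accepted outsider, Telnet
print(allow("198.51.100.1", 80))  # unknown source
```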
HTTP Caching
Improves Web performance, reduces load on the server
Cache control directives in the HTTP header (no-cache, age, etc.)
[Figure: Client 1 and Client 2 send GET http://origin/.. to a cache server, the cache server sends GET http://origin/.. to the origin server, and 200 OK replies are returned]
Deployment issues:
How to best place caches?
How many caches to use?
How to dimension a cache?
How long to cache?
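The effect of a shared cache, and of the "how long to cache?" question, can be seen in a toy model. The CacheServer class below is a hypothetical sketch that keeps every reply for a fixed number of seconds; real caches follow the cache control directives sent by the origin. The URL and the fake clock are assumptions made so that the example is deterministic.

```python
class CacheServer:
    """Sketch of a shared cache that keeps each reply for max_age seconds."""

    def __init__(self, fetch_from_origin, max_age, clock):
        self.fetch_from_origin = fetch_from_origin  # callable: url -> body
        self.max_age = max_age
        self.clock = clock                          # callable returning seconds
        self.entries = {}                           # url -> (stored_at, body)

    def get(self, url):
        entry = self.entries.get(url)
        if entry is not None and self.clock() - entry[0] < self.max_age:
            return entry[1]                         # fresh: origin not contacted
        body = self.fetch_from_origin(url)          # miss or stale: ask the origin
        self.entries[url] = (self.clock(), body)
        return body


origin_hits = []
now = [0.0]  # fake clock, advanced by hand


def origin(url):
    origin_hits.append(url)
    return f"<html>content of {url}</html>"


cache = CacheServer(origin, max_age=60, clock=lambda: now[0])
cache.get("http://origin/page")  # client 1: miss, fetched from the origin
cache.get("http://origin/page")  # client 2: served from the cache
now[0] = 120.0
cache.get("http://origin/page")  # entry is stale, fetched again
print(len(origin_hits))
```

Only the first and the third request reach the origin server; the second is answered entirely by the cache.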
Basic scheme: username:password, base-64 encoded

    echo -n "user:password" | openssl base64
    echo "c2NvdHQ6dGlnZXI=" | openssl base64 -d
[Figure: the client supplies credentials (Username: joe, Password: ********) to the origin server, which returns HTML]
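The same encoding and decoding can be done with Python's base64 module. The helper names below are hypothetical; the point of the sketch is that the Basic scheme only encodes the credentials, so anyone who sees the header can decode it.

```python
import base64


def basic_credentials(username: str, password: str) -> str:
    """Return the value for an 'Authorization' header using the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value: str):
    """Recover (username, password) from a Basic header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Basic":
        raise ValueError("not a Basic credential")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


print(basic_credentials("user", "password"))
print(decode_basic("Basic c2NvdHQ6dGlnZXI="))
```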
WWW-Authenticate header of the server's initial 401 response contains a nonce value
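A nonce in the 401 response is what the HTTP Digest scheme relies on: the client proves knowledge of the password by hashing it together with the nonce instead of sending it. The sketch below shows the original form of that computation, without the optional qop extension; the function names and the sample values are assumptions for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce):
    """Digest without the optional qop extension: MD5(HA1:nonce:HA2)."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # identifies the user
    ha2 = md5_hex(f"{method}:{uri}")                 # identifies the request
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Sample values are made up; the nonce would come from WWW-Authenticate.
first = digest_response("joe", "example", "secret", "GET", "/index.htm", "nonce-1")
second = digest_response("joe", "example", "secret", "GET", "/index.htm", "nonce-2")
print(first)
print(second)
```

Because the nonce is part of the hash, a response captured for one nonce is useless once the server issues a different one.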
Cookies
Client to origin server: GET http://origin/..
Origin server to client: 200 OK with Set-Cookie: ABC=XYZ
The client stores the cookie for later use
Cookies are scoped by a site or domain
The server can specify a desired expiration date
The client can reject cookies, limit their size/duration, etc.
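The round trip of a cookie can be sketched with Python's http.cookies module. The cookie name and value are the ones from the exchange above; the variable names are made up, and scoping and expiration are left out to keep the example short.

```python
from http.cookies import SimpleCookie

# Server side: build the value of the Set-Cookie header.
outgoing = SimpleCookie()
outgoing["ABC"] = "XYZ"
set_cookie_value = outgoing["ABC"].OutputString()

# Client side: store the cookie, then send it back on a later request.
jar = SimpleCookie()
jar.load(set_cookie_value)
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

print("Set-Cookie:", set_cookie_value)
print("Cookie:", cookie_header)
```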
Three layers
Presentation
Business
Data access
The presentation layer is the UI, which communicates with the business layer.
The business layer contains a set of methods that validate user input before calling a method from the data layer. It also ensures that the output is correct.
The rules used to validate input are called business rules. Business rules are not restricted to data validation; they can also apply to any calculations.
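The separation of the three layers can be illustrated with a very small example. All names below are hypothetical, including the registration scenario and the fee rule; the sketch only shows which layer is allowed to call which.

```python
class DataAccess:
    """Data access layer: the only code that touches storage (a dict here)."""

    def __init__(self):
        self._registrations = {}

    def save(self, student: str, credits: int) -> None:
        self._registrations[student] = credits

    def load(self, student: str) -> int:
        return self._registrations[student]


class Business:
    """Business layer: validates input and applies calculations."""

    FEE_PER_CREDIT = 100  # hypothetical business rule

    def __init__(self, data: DataAccess):
        self._data = data

    def register(self, student: str, credits: int) -> int:
        if not student.strip():
            raise ValueError("student name is required")
        if not 1 <= credits <= 20:
            raise ValueError("credits must be between 1 and 20")
        self._data.save(student, credits)
        return credits * self.FEE_PER_CREDIT


def present(business: Business, student: str, credits: int) -> str:
    """Presentation layer: talks only to the business layer."""
    try:
        fee = business.register(student, credits)
        return f"{student} registered, fee {fee}"
    except ValueError as error:
        return f"Registration rejected: {error}"


data = DataAccess()
business = Business(data)
print(present(business, "Abebe", 3))
print(present(business, "Sara", 0))
```

Invalid input is stopped in the business layer, so nothing reaches the data layer for the rejected registration.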
Architecture
1 tier
Mainframe
All processing in a single computer
All resources attached to the same computer
Access via dumb terminals
Advantage
2 tier
The personal computer
Client/server model
Logical system components are mostly on the client (UI, data access, and business rules); the server contains the data layer
Drawback:
3 tier
A client/server model, but delivered through a web server
The client only displays the GUI and data and has no part in producing results
Application layer: user interface, business rules, and data access
Data layer
3 tier (Cont )
Benefit:
Scalability
The application servers can be deployed on many machines
The database no longer requires a connection from every client, only from the application servers
Better re-use
Improved data integrity and security
Reduced distribution
Improved availability
Encapsulated database structure