Chapter 3 outline
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management
3.6 Principles of congestion control
3.7 TCP congestion control
Transport vs. network layer
network layer: logical communication between hosts
transport layer: logical communication between processes
  relies on, enhances, network layer services
Household analogy:
12 kids sending letters to 12 kids
  processes = kids
  app messages = letters in envelopes
  hosts = houses
  transport protocol = Ann and Bill
  network-layer protocol = postal service
connection setup
unreliable, unordered delivery: UDP
  no-frills extension of “best-effort” IP
services not available:
  delay guarantees
  bandwidth guarantees
[Figure: protocol stacks (application, transport, network, data link, physical) along the path between end hosts]
Multiplexing/demultiplexing
Demultiplexing at rcv host: delivering received segments to correct socket
Multiplexing at send host: gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
[Figure: host 1, host 2, host 3, each with an application layer; processes P1 to P4 attached to sockets]
Transport Layer 3-8
Department of Computer Engineering Yeditepe University ©2011
How demultiplexing works
host receives IP datagrams
  each datagram has source IP address, destination IP address
[Figure: segment format, 32 bits wide, starting with source port # and dest port #]
Connectionless demultiplexing
Create sockets with port numbers:
DatagramSocket mySocket1 = new DatagramSocket(12534);
DatagramSocket mySocket2 = new DatagramSocket(12535);
UDP socket identified by two-tuple:
(dest IP address, dest port number)
When host receives UDP segment:
  checks destination port number in segment
  directs UDP segment to socket with that port number
IP datagrams with different source IP addresses and/or source port numbers directed to same socket
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);
Connection-oriented demux
TCP socket identified by 4-tuple:
  source IP address
  source port number
  dest IP address
  dest port number
recv host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets:
  each socket identified by its own 4-tuple
Web servers have different sockets for each connecting client
  non-persistent HTTP will have different socket for each request
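The difference between the two lookup keys can be sketched in a few lines of Java. This is only an illustration: the class `DemuxSketch`, its string keys, and the client port 9157 are assumptions made for the example, not part of any real socket API.

```java
// Hypothetical sketch of demultiplexing keys; names are illustrative only.
class DemuxSketch {
    // UDP: only the destination IP and port select the socket.
    static String udpKey(String srcIp, int srcPort, String dstIp, int dstPort) {
        return dstIp + ":" + dstPort; // source fields are ignored
    }

    // TCP: all four values select the socket.
    static String tcpKey(String srcIp, int srcPort, String dstIp, int dstPort) {
        return srcIp + ":" + srcPort + ">" + dstIp + ":" + dstPort;
    }

    public static void main(String[] args) {
        // Two senders (hosts A and B) addressing the same server port on host C.
        boolean sameUdpSocket =
            udpKey("A", 9157, "C", 6428).equals(udpKey("B", 5775, "C", 6428));
        boolean sameTcpSocket =
            tcpKey("A", 9157, "C", 80).equals(tcpKey("B", 5775, "C", 80));
        System.out.println("UDP segments share a socket: " + sameUdpSocket);
        System.out.println("TCP segments share a socket: " + sameTcpSocket);
    }
}
```

With these keys, segments from different sources land on one UDP socket but on separate TCP sockets, which is why a web server ends up with one socket per connecting client.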
Connection-oriented demux (cont)
[Figure: processes P1 to P6 on several hosts; example segment header SP: 5775, DP: 80, S-IP: B, D-IP: C]
Connection-oriented demux: Threaded Web Server
[Figure: processes P1 to P4; example segment header SP: 5775, DP: 80, S-IP: B, D-IP: C]
UDP: more
often used for streaming multimedia apps
[Figure: UDP segment format, 32 bits wide]
UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
  treat segment contents as sequence of 16-bit integers
  checksum: addition (1's complement sum) of segment contents
  sender puts checksum value into UDP checksum field
Receiver:
  compute checksum of received segment
  check if computed checksum equals checksum field value:
    NO - error detected
    YES - no error detected. But maybe errors nonetheless? More later ….
Internet Checksum Example
Note
When adding numbers, a carryout from the most significant bit needs to be added to the result
Example: add two 16-bit integers
               1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
               1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound   1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum            1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum       0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
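The same arithmetic can be sketched in Java. The class name `InternetChecksum` and the choice of an `int[]` of 16-bit words as input are assumptions of this sketch; a real UDP implementation also covers a pseudo-header, which is left out here.

```java
// Minimal sketch of the 16-bit one's complement checksum described above.
class InternetChecksum {
    // Sums 16-bit words with end-around carry, then inverts the result.
    static int checksum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            // Wraparound: fold a carry out of bit 15 back into the low 16 bits.
            sum = (sum & 0xFFFF) + (sum >>> 16);
        }
        return ~sum & 0xFFFF;
    }

    public static void main(String[] args) {
        // The two 16-bit integers from the example above.
        int[] words = {0b1110011001100110, 0b1101010101010101};
        System.out.println(String.format("checksum = 0x%04X", checksum(words)));
    }
}
```

A receiver can run the same routine over the data words plus the received checksum; a result of 0 means no error was detected.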
Principles of Reliable data transfer
important in app., transport, link layers
top-10 list of important networking topics!
Reliable data transfer: getting started
We'll:
incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
consider only unidirectional data transfer
  but control info will flow in both directions!
use finite state machines (FSM) to specify sender, receiver
state: when in this “state”, next state uniquely determined by next event
[Figure: FSM notation. The arrow from state 1 to state 2 is labeled with the event causing the state transition and the actions taken on the state transition]
Rdt2.0: channel with bit errors
underlying channel may flip bits in packet
checksum to detect bit errors
rdt_rcv(rcvpkt) &&
notcorrupt(rcvpkt)
extract(rcvpkt,data)
deliver_data(data)
udt_send(ACK)
rdt2.0: operation with no errors
sender:
state "Wait for call from above":
  rdt_send(data)
    sndpkt = make_pkt(data, checksum)
    udt_send(sndpkt)
state "Wait for ACK or NAK":
  rdt_rcv(rcvpkt) && isNAK(rcvpkt)
    udt_send(sndpkt)
receiver:
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    udt_send(NAK)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    udt_send(ACK)
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
  sender doesn't know what happened at receiver!
  can't just retransmit: possible duplicate
Handling duplicates:
  sender retransmits current pkt if ACK/NAK garbled
  sender adds sequence number to each pkt
  receiver discards (doesn't deliver up) duplicate pkt
rdt2.1: receiver, handles garbled ACK/NAKs
state "Wait for 0 from below":
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
    next state: "Wait for 1 from below"
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum)
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
state "Wait for 1 from below":
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
    next state: "Wait for 0 from below"
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum)
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
rdt2.1: discussion
Sender:
seq # added to pkt
two seq. #'s (0,1) will suffice. Why?
must check if received ACK/NAK corrupted
twice as many states
  state must “remember” whether “current” pkt has 0 or 1 seq. #
Receiver:
must check if received packet is duplicate
  state indicates whether 0 or 1 is expected pkt seq #
note: receiver can not know if its last ACK/NAK received OK at sender
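The receiver side of rdt2.1 can be sketched as a two-state object in Java. The class `Rdt21Receiver`, the `onPacket` signature, and the use of strings for ACK/NAK are assumptions made for illustration; corruption is passed in as a flag instead of being computed from a checksum.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the rdt2.1 receiver: one bit of state, duplicates re-ACKed.
class Rdt21Receiver {
    private int expectedSeq = 0; // state: waiting for 0 or for 1 from below
    final List<String> delivered = new ArrayList<>();

    // Returns the reply the receiver would send back: "ACK" or "NAK".
    String onPacket(int seq, String data, boolean corrupt) {
        if (corrupt) {
            return "NAK"; // garbled packet: ask the sender to resend
        }
        if (seq == expectedSeq) {
            delivered.add(data);           // new packet: deliver up
            expectedSeq = 1 - expectedSeq; // move to the other state
        }
        // A duplicate is not delivered again, but it is still ACKed.
        return "ACK";
    }

    public static void main(String[] args) {
        Rdt21Receiver r = new Rdt21Receiver();
        System.out.println(r.onPacket(0, "a", false)); // new packet
        System.out.println(r.onPacket(0, "a", false)); // duplicate after a garbled ACK
        System.out.println(r.delivered);
    }
}
```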
rdt2.2: a NAK-free protocol
rdt3.0: channels with errors and loss
rdt3.0 sender
state "Wait for call 0 from above":
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    start_timer
    next state: "Wait for ACK0"
  rdt_rcv(rcvpkt)
    Λ (no action)
state "Wait for ACK0":
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    Λ (no action)
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    stop_timer
    next state: "Wait for call 1 from above"
state "Wait for call 1 from above":
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum)
    udt_send(sndpkt)
    start_timer
    next state: "Wait for ACK1"
  rdt_rcv(rcvpkt)
    Λ (no action)
state "Wait for ACK1":
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
    Λ (no action)
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
    stop_timer
    next state: "Wait for call 0 from above"
rdt3.0 in action
Performance of rdt3.0
$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^9\ \text{bps}} = 8\ \text{microseconds}$
U_sender: utilization, the fraction of time sender busy sending
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$ (times in msec)
1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
network protocol limits use of physical resources!
Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver
Increase utilization by a factor of 3!
$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$ (times in msec)
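The utilization figures for stop-and-wait and for three pipelined packets can be checked with a short Java sketch. The class and method names are assumptions; the inputs (L = 8000 bits, R = 1 Gbps, RTT = 30 msec) are the ones used in the slides.

```java
// Sketch: sender utilization when n packets are sent per round trip.
class Utilization {
    static double senderUtilization(int n, double bits, double rateBps, double rttSec) {
        double dTrans = bits / rateBps;        // L/R in seconds
        return n * dTrans / (rttSec + dTrans); // busy time over cycle time
    }

    public static void main(String[] args) {
        double stopAndWait = senderUtilization(1, 8000, 1e9, 0.030);
        double pipelined = senderUtilization(3, 8000, 1e9, 0.030);
        System.out.println("stop-and-wait: " + stopAndWait);
        System.out.println("3 pkts in flight: " + pipelined);
    }
}
```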
Pipelining Protocols
Go-back-N: big picture:
Sender can have up to N unacked packets in pipeline
Rcvr only sends cumulative acks
  Doesn't ack packet if there's a gap
Sender has timer for oldest unacked packet
  If timer expires, retransmit all unacked packets
Selective Repeat: big pic
Sender can have up to N unacked packets in pipeline
Rcvr acks individual packets
Sender maintains timer for each unacked packet
  When timer expires, retransmit only unack packet
Go-Back-N
Sender:
k-bit seq # in pkt header
“window” of up to N, consecutive unack'ed pkts allowed
GBN: receiver extended FSM
initially:
  expectedseqnum=1
  sndpkt = make_pkt(expectedseqnum,ACK,chksum)
state "Wait":
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && hasseqnum(rcvpkt,expectedseqnum)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(expectedseqnum,ACK,chksum)
    udt_send(sndpkt)
    expectedseqnum++
  default
    udt_send(sndpkt)
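A minimal Java sketch of this receiver follows. The class `GbnReceiver` and its `onPacket` signature are hypothetical, and one detail is an assumption of the sketch: before any in-order packet has arrived, the default action re-ACKs sequence number 0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the Go-Back-N receiver: in-order delivery, cumulative ACKs.
class GbnReceiver {
    private int expectedSeqNum = 1;
    final List<String> delivered = new ArrayList<>();

    // Returns the sequence number carried by the ACK sent in response.
    int onPacket(int seqNum, String data, boolean corrupt) {
        if (!corrupt && seqNum == expectedSeqNum) {
            delivered.add(data);
            return expectedSeqNum++; // ACK this packet, then expect the next one
        }
        // default: discard the packet and re-send the ACK for the last in-order packet
        return expectedSeqNum - 1;
    }

    public static void main(String[] args) {
        GbnReceiver g = new GbnReceiver();
        System.out.println(g.onPacket(1, "a", false)); // in order
        System.out.println(g.onPacket(3, "c", false)); // gap: packet 2 is missing
        System.out.println(g.onPacket(2, "b", false)); // gap filled
    }
}
```

Because out-of-order packets are discarded rather than buffered, the receiver needs no state beyond `expectedSeqNum`.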
GBN in action
Selective Repeat
receiver individually acknowledges all correctly
received pkts
buffers pkts, as needed, for eventual in-order delivery
to upper layer
sender only resends pkts for which ACK not
received
sender timer for each unACKed pkt
sender window
N consecutive seq #'s
again limits seq #s of sent, unACKed pkts
Selective repeat
sender
data from above:
  if next available seq # in window, send pkt
timeout(n):
  resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
  mark pkt n as received
  if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver
pkt n in [rcvbase, rcvbase+N-1]
  send ACK(n)
  out-of-order: buffer
  in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N,rcvbase-1]
  ACK(n)
otherwise:
  ignore
Selective repeat: dilemma
Example:
seq #'s: 0, 1, 2, 3
window size=3
receiver sees no difference in two scenarios!
incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
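The standard answer, given here as a known result and not taken from the slide text, is that the window may cover at most half of the sequence number space. With $k$-bit sequence numbers and window size $N$:

```latex
N \le \frac{2^{k}}{2} = 2^{k-1}
```

In the example above there are 4 sequence numbers, so the bound is $N \le 2$; a window of 3 violates it, which is why the receiver cannot tell a retransmission of packet 0 from a new packet 0.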
TCP: Overview RFCs: 793, 1122, 1323, 2018, 2581
overwhelm receiver
[Figure: socket door, TCP send buffer, segment, TCP receive buffer, socket door]
TCP seq. #'s and ACKs
Seq. #'s:
  byte stream “number” of first byte in segment's data
ACKs:
  seq # of next byte expected from other side
  cumulative ACK
Q: how receiver handles out-of-order segments
  A: TCP spec doesn't say, up to implementor
[Figure: simple telnet scenario between Host A and Host B over time. User types 'C'; host ACKs receipt of 'C', echoes back 'C'; host ACKs receipt of echoed 'C']
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
[Figure: RTT (milliseconds), roughly 100 to 350, plotted against time (seconds)]
TCP Round Trip Time and Timeout
Setting the timeout
EstimatedRTT plus “safety margin”
  large variation in EstimatedRTT -> larger safety margin
first estimate of how much SampleRTT deviates from EstimatedRTT:
$\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT}-\text{EstimatedRTT}|$
(typically, $\beta = 0.25$)
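The two moving averages can be sketched in Java as below. Several details are assumptions not stated in the text above and follow the conventions of RFC 6298: the weight 0.125 for the EstimatedRTT average, the initial values taken from the first sample, the deviation being updated before the estimate, and a safety margin of 4 times DevRTT. The class name `RttEstimator` is hypothetical.

```java
// Sketch of RTT estimation with exponentially weighted moving averages.
class RttEstimator {
    static final double ALPHA = 0.125; // assumed typical weight for EstimatedRTT
    static final double BETA = 0.25;   // weight for DevRTT, as given above

    double estimatedRtt;
    double devRtt;

    RttEstimator(double firstSampleMs) {
        estimatedRtt = firstSampleMs;  // assumption: seed from the first sample
        devRtt = firstSampleMs / 2;
    }

    void onSample(double sampleMs) {
        // Deviation is updated first, using the previous estimate.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleMs - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleMs;
    }

    // EstimatedRTT plus a safety margin that grows with the variation.
    double timeoutInterval() {
        return estimatedRtt + 4 * devRtt;
    }

    public static void main(String[] args) {
        RttEstimator e = new RttEstimator(100);
        e.onSample(200);
        System.out.println(e.estimatedRtt + " " + e.devRtt + " " + e.timeoutInterval());
    }
}
```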
TCP reliable data transfer
TCP creates rdt service on top of IP's unreliable service
Pipelined segments
Cumulative acks
TCP uses single retransmission timer
Retransmissions are triggered by:
  timeout events
  duplicate acks
Initially consider simplified TCP sender:
  ignore duplicate acks
  ignore flow control, congestion control
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
[Figure: two timelines. Lost ACK scenario: Seq=92 timeout, ACK lost (X), SendBase = 100. Premature timeout scenario: SendBase = 100, then SendBase = 120]
TCP retransmission scenarios (more)
[Figure: Cumulative ACK scenario between Host A and Host B: one ACK is lost (X) within the timeout interval; SendBase = 120]
Fast Retransmit
Time-out period often relatively long:
  long delay before resending lost packet
Detect lost segments via duplicate ACKs.
  Sender often sends many segments back-to-back
  If segment is lost, there will likely be many duplicate ACKs.
If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost:
  fast retransmit: resend segment before timer expires
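The duplicate-ACK counting can be sketched in Java. The class `FastRetransmit` is hypothetical, and the sketch makes two assumptions: duplicates are counted after the first ACK for a given byte (the common "three duplicate ACKs" reading), and sequence number wraparound is ignored.

```java
// Hypothetical sketch of the duplicate-ACK counter behind fast retransmit.
class FastRetransmit {
    private int sendBase;    // oldest unacknowledged byte
    private int dupAcks = 0; // duplicates seen for the current sendBase

    FastRetransmit(int initialSendBase) {
        sendBase = initialSendBase;
    }

    // Returns true when the segment starting at sendBase should be resent now.
    boolean onAck(int ackNum) {
        if (ackNum > sendBase) { // new data acknowledged
            sendBase = ackNum;
            dupAcks = 0;
            return false;
        }
        dupAcks++;               // duplicate ACK for already ACKed data
        return dupAcks == 3;     // resend once, on the third duplicate
    }

    public static void main(String[] args) {
        FastRetransmit f = new FastRetransmit(92);
        int[] acks = {100, 100, 100, 100};
        for (int a : acks) {
            System.out.println("ACK " + a + " -> resend? " + f.onAck(a));
        }
    }
}
```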
[Figure: Host A / Host B timeline with a lost segment (X) and the timeout interval]
Fast retransmit algorithm:
TCP Flow Control
receive side of TCP connection has a receive buffer:
  app process may be slow at reading from buffer
flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
speed-matching service: matching the send rate to the receiving app's drain rate
seq. #s no data
TCP Connection Management (cont.)
client closes socket:
clientSocket.close();
FIN, replies with ACK. Closes connection, sends FIN.
[Figure: connection teardown timeline with labels close, timed wait, closed]
TCP Connection Management (cont)
[Figure: TCP server lifecycle]
[Figure: TCP client lifecycle]
Principles of Congestion Control
Congestion:
informally: “too many sources sending too much
data too fast for network to handle”
different from flow control!
manifestations:
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem!
receivers
one router, infinite buffers
no retransmission
large delays when congested
maximum achievable throughput
[Figure: Host B, unlimited shared output link buffers]
Causes/costs of congestion: scenario 2
[Figure: panels a, b, c plotting $\lambda_{out}$; axis marks at R/3 and R/4]
“costs” of congestion:
more work (retrans) for given “goodput”
unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
four senders
multihop paths
timeout/retransmit
Q: what happens as $\lambda_{in}$ and $\lambda'_{in}$ increase?
[Figure: Host A and Host B; $\lambda_{in}$: original data; $\lambda'_{in}$: original data, plus retransmitted data; $\lambda_{out}$]
Approaches towards congestion control
Two broad approaches towards congestion control:
Case study: ATM ABR congestion control
TCP congestion control: additive increase, multiplicative decrease
Approach: increase transmission rate (window size), probing for usable bandwidth, until loss occurs
  additive increase: increase CongWin by 1 MSS every RTT until loss detected
  multiplicative decrease: cut CongWin in half after loss
Saw tooth behavior: probing for bandwidth
[Figure: congestion window size versus time, with axis marks at 8, 16, and 24 Kbytes]
TCP Slow Start
When connection begins, CongWin = 1 MSS
  Example: MSS = 500 bytes & RTT = 200 msec
  initial rate = 20 kbps
available bandwidth may be >> MSS/RTT
  desirable to quickly ramp up to respectable rate
When connection begins, increase rate exponentially fast until first loss event
Refinement: inferring loss
After 3 dup ACKs:
  CongWin is cut in half
  window then grows linearly
But after timeout event:
  CongWin instead set to 1 MSS;
  window then grows exponentially to a threshold, then grows linearly
Philosophy:
  3 dup ACKs indicates network capable of delivering some segments
  timeout indicates a “more alarming” congestion scenario
Refinement
Q: When should the exponential increase switch to linear?
A: When CongWin gets to 1/2 of its value before timeout.
Implementation:
Variable Threshold
At loss event, Threshold is set to 1/2 of CongWin just before loss event
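The rules for CongWin and Threshold can be collected into a small Java sketch. The class `CongestionWindow`, its method names, and the per-RTT granularity are assumptions of the sketch, as is capping slow-start growth at Threshold; window sizes are in units of MSS.

```java
// Sketch of CongWin evolution: slow start, additive increase, and the two loss reactions.
class CongestionWindow {
    double congWin = 1; // in MSS; connection begins with 1 MSS
    double threshold;   // in MSS

    CongestionWindow(double initialThreshold) {
        threshold = initialThreshold;
    }

    // Called once per RTT in which no loss was detected.
    void onRttWithoutLoss() {
        if (congWin < threshold) {
            congWin = Math.min(congWin * 2, threshold); // slow start: exponential growth
        } else {
            congWin += 1;                               // additive increase: 1 MSS per RTT
        }
    }

    // 3 dup ACKs: halve the window, then continue growing linearly.
    void onTripleDupAck() {
        threshold = congWin / 2;
        congWin = threshold;
    }

    // Timeout: remember half the window as Threshold, restart from 1 MSS.
    void onTimeout() {
        threshold = congWin / 2;
        congWin = 1;
    }

    public static void main(String[] args) {
        CongestionWindow c = new CongestionWindow(8);
        for (int rtt = 0; rtt < 5; rtt++) {
            c.onRttWithoutLoss();
            System.out.println("CongWin = " + c.congWin);
        }
        c.onTimeout();
        System.out.println("after timeout: CongWin = " + c.congWin
            + ", Threshold = " + c.threshold);
    }
}
```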
Summary: TCP Congestion Control
TCP throughput
What's the average throughput of TCP as a function of window size and RTT?
  Ignore slow start
Let W be the window size when loss occurs.
When window is W, throughput is W/RTT
Just after loss, window drops to W/2, throughput to W/(2·RTT).
Average throughput: 0.75 W/RTT
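The 0.75 factor follows from a short calculation, sketched here under the assumption that the window grows linearly from $W/2$ back to $W$ between loss events, so its time average is the midpoint of the two values:

```latex
\text{average throughput}
  = \frac{1}{RTT}\cdot\frac{W/2 + W}{2}
  = \frac{3}{4}\cdot\frac{W}{RTT}
```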
TCP Fairness
Fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 sharing a bottleneck router of capacity R; plot axis: Connection 1 throughput, up to R]
Fairness (more)
Fairness and UDP
Multimedia apps often do not use TCP
  do not want rate throttled by congestion control
Instead use UDP:
  pump audio/video at constant rate, tolerate packet loss
Research area: TCP friendly
Fairness and parallel TCP connections
nothing prevents app from opening parallel connections between 2 hosts.
Web browsers do this
Example: link of rate R supporting 9 connections;
  new app asks for 1 TCP, gets rate R/10
  new app asks for 11 TCPs, gets R/2 !
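The two numbers in the example can be reproduced with one formula, assuming (as the fairness goal does) that every TCP connection on the link receives an equal share. An app that opens $n$ connections alongside the 9 existing ones gets:

```latex
\text{app rate} = \frac{n}{9 + n}\,R,
\qquad n = 1:\ \frac{R}{10},
\qquad n = 11:\ \frac{11}{20}\,R \approx \frac{R}{2}
```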
Chapter 3: Summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation and implementation in the Internet
  UDP
  TCP
Next:
leaving the network “edge” (application, transport layers)
into the network “core”