Transport Layer
Computer Networking: A Top Down Approach, 5th edition.
Jim Kurose, Keith Ross
Addison-Wesley, April 2009.
Chapter 3 outline
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
  segment structure
  reliable data transfer
  flow control
  connection management
3.6 Principles of congestion control
3.7 TCP congestion control
Transport vs. network layer
network layer: logical communication between hosts
transport layer: logical communication between processes
  relies on, enhances, network layer services
Household analogy:
12 kids sending letters to 12 kids
  processes = kids
  app messages = letters in envelopes
  hosts = houses
  transport protocol = Ann and Bill
  network-layer protocol = postal service
connection setup
unreliable, unordered delivery: UDP
  no-frills extension of “best-effort” IP
services not available:
  delay guarantees
  bandwidth guarantees
[Figure: protocol stacks along the path; end hosts show application, transport, network, data link, physical; intermediate nodes show network, data link, physical]
Multiplexing/demultiplexing
Demultiplexing at rcv host:
  delivering received segments to correct socket
Multiplexing at send host:
  gathering data from multiple sockets, enveloping data with header (later used for demultiplexing)
[Figure: host 1, host 2, host 3, each with an application layer; processes P1, P2, P3, P4 and their sockets]
How demultiplexing works
host receives IP datagrams
  each datagram has source IP address, destination IP address
[Figure: segment format, 32 bits wide, with source port # and dest port # fields]
Connectionless demultiplexing
Create sockets with port numbers:
  DatagramSocket mySocket1 = new DatagramSocket(12534);
  DatagramSocket mySocket2 = new DatagramSocket(12535);
UDP socket identified by two-tuple:
  (dest IP address, dest port number)
When host receives UDP segment:
  checks destination port number in segment
  directs UDP segment to socket with that port number
IP datagrams with different source IP addresses and/or source port numbers directed to same socket
Connectionless demux (cont)
DatagramSocket serverSocket = new DatagramSocket(6428);
[Figure: processes P1, P2 and P3 on the communicating hosts]
Connection-oriented demux
TCP socket identified by 4-tuple:
  source IP address
  source port number
  dest IP address
  dest port number
receiving host uses all four values to direct segment to appropriate socket
Server host may support many simultaneous TCP sockets:
  each socket identified by its own 4-tuple
Web servers have different sockets for each connecting client
  non-persistent HTTP will have different socket for each request
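The two demultiplexing rules can be contrasted in a small sketch. This is a toy lookup table, not a real network stack: the class name DemuxSketch, the socket names, and client port 9157 are hypothetical, and the UDP table is keyed by destination port only because the sketch models a single receiving host.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy demultiplexer for one receiving host (illustrative names only). */
final class DemuxSketch {
    // UDP: socket found from the destination port alone.
    private final Map<Integer, String> udpSockets = new HashMap<>();
    // TCP: socket found from (source IP, source port, dest IP, dest port).
    private final Map<List<Object>, String> tcpSockets = new HashMap<>();

    void bindUdp(int destPort, String socketName) {
        udpSockets.put(destPort, socketName);
    }

    void openTcp(String srcIp, int srcPort, String dstIp, int dstPort, String socketName) {
        tcpSockets.put(Arrays.<Object>asList(srcIp, srcPort, dstIp, dstPort), socketName);
    }

    /** Source fields are ignored: different senders reach the same socket. */
    String demuxUdp(String srcIp, int srcPort, int destPort) {
        return udpSockets.get(destPort);
    }

    /** All four values are needed; returns null if no connection matches. */
    String demuxTcp(String srcIp, int srcPort, String dstIp, int dstPort) {
        return tcpSockets.get(Arrays.<Object>asList(srcIp, srcPort, dstIp, dstPort));
    }
}
```

With one UDP socket bound to port 6428, segments from hosts A and B both map to it, while two TCP connections to port 80 on host C map to two different sockets.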
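Connection-oriented demux (cont)
[Figure: processes P1 to P6 on the communicating hosts; example segment with SP: 5775, DP: 80, S-IP: B, D-IP: C]
Connection-oriented demux: Threaded Web Server
[Figure: processes P1 to P4 on the communicating hosts; example segment with SP: 5775, DP: 80, S-IP: B, D-IP: C]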
UDP: more
often used for streaming multimedia apps
[Figure: UDP segment format, 32 bits wide]
UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
  treat segment contents as sequence of 16-bit integers
  checksum: addition (1’s complement sum) of segment contents
  sender puts checksum value into UDP checksum field
Receiver:
  compute checksum of received segment
  check if computed checksum equals checksum field value:
    NO - error detected
    YES - no error detected. But maybe errors nonetheless? More later ….
Internet Checksum Example
Note
When adding numbers, a carryout from the most significant bit needs to be added to the result
Example: add two 16-bit integers
               1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0
               1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
wraparound   1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1
sum            1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0
checksum       0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1
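The 1's complement sum with wraparound can be written in a few lines. This is a minimal sketch, assuming the segment has already been split into 16-bit words; the class and method names are illustrative, and real UDP also covers a pseudo-header that is not modeled here.

```java
/** Sketch of the Internet checksum over 16-bit words (illustrative names). */
final class InternetChecksum {
    /** Returns the 16-bit one's-complement checksum of the given words. */
    static int checksum(int[] words) {
        int sum = 0;
        for (int w : words) {
            sum += w & 0xFFFF;
            // Wrap any carry out of the most significant bit back into the sum.
            sum = (sum & 0xFFFF) + (sum >>> 16);
        }
        // The checksum is the bitwise complement of the wrapped sum.
        return ~sum & 0xFFFF;
    }

    /** Receiver side: recompute and compare with the checksum field value. */
    static boolean verify(int[] words, int checksumField) {
        return checksum(words) == checksumField;
    }

    public static void main(String[] args) {
        // The two 16-bit integers from the slide's example.
        int[] words = {0b1110011001100110, 0b1101010101010101};
        System.out.printf("checksum = 0x%04X%n", checksum(words));
    }
}
```

For the two integers of the example the wrapped sum is 1011101110111100, so the checksum is 0100010001000011 (0x4443). A single flipped bit in either word changes the result, which is what the receiver's comparison detects.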
Principles of Reliable data transfer
important in app., transport, link layers
top-10 list of important networking topics!
Reliable data transfer: getting started
We’ll:
  incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
  consider only unidirectional data transfer
    but control info will flow in both directions!
  use finite state machines (FSM) to specify sender, receiver
state: when in this “state”, next state uniquely determined by next event
[FSM notation: an arrow from state 1 to state 2, labeled with the event causing state transition and the actions taken on state transition]
Rdt2.0: channel with bit errors
underlying channel may flip bits in packet
  checksum to detect bit errors
receiver transition:
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    udt_send(ACK)
rdt2.0: operation with no errors
sender, state "Wait for call from above":
  rdt_send(data)
    sndpkt = make_pkt(data, checksum)
    udt_send(sndpkt)
sender, state "Wait for ACK or NAK":
  rdt_rcv(rcvpkt) && isNAK(rcvpkt)
    udt_send(sndpkt)
receiver:
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    udt_send(NAK)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    udt_send(ACK)
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
  sender doesn’t know what happened at receiver!
  can’t just retransmit: possible duplicate
Handling duplicates:
  sender retransmits current pkt if ACK/NAK garbled
  sender adds sequence number to each pkt
  receiver discards (doesn’t deliver up) duplicate pkt
rdt2.1: receiver, handles garbled ACK/NAKs
state "Wait for 0 from below":
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
    (next state: "Wait for 1 from below")
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum)
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
state "Wait for 1 from below":
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq1(rcvpkt)
    extract(rcvpkt,data)
    deliver_data(data)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
    (next state: "Wait for 0 from below")
  rdt_rcv(rcvpkt) && corrupt(rcvpkt)
    sndpkt = make_pkt(NAK, chksum)
    udt_send(sndpkt)
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && has_seq0(rcvpkt)
    sndpkt = make_pkt(ACK, chksum)
    udt_send(sndpkt)
rdt2.1: discussion
Sender:
  seq # added to pkt
  two seq. #’s (0,1) will suffice. Why?
  must check if received ACK/NAK corrupted
  twice as many states
    state must “remember” whether “current” pkt has 0 or 1 seq. #
Receiver:
  must check if received packet is duplicate
    state indicates whether 0 or 1 is expected pkt seq #
  note: receiver can not know if its last ACK/NAK received OK at sender
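The receiver side of rdt2.1 can be sketched as a two-state machine in which the state is simply the expected sequence number. This is an illustration under simplifying assumptions: corruption is passed in as a flag instead of being computed from a checksum, replies are plain strings, and the class name Rdt21Receiver is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the rdt2.1 receiver: alternating-bit duplicate detection. */
final class Rdt21Receiver {
    // State: 0 means "Wait for 0 from below", 1 means "Wait for 1 from below".
    private int expectedSeq = 0;
    // Data delivered up to the application, in order.
    final List<String> delivered = new ArrayList<>();

    /** Handles one arriving packet and returns the reply sent back ("ACK" or "NAK"). */
    String receive(int seq, String data, boolean corrupt) {
        if (corrupt) {
            return "NAK";                   // garbled packet: request retransmission
        }
        if (seq == expectedSeq) {
            delivered.add(data);            // new packet: deliver it
            expectedSeq = 1 - expectedSeq;  // and switch to the other state
        }
        // A packet with the other seq # is a duplicate: not delivered, still ACKed.
        return "ACK";
    }
}
```

A retransmitted packet 0 that arrives while the receiver waits for 1 is ACKed again but not delivered a second time, which is why two sequence numbers suffice for a stop-and-wait protocol.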
rdt2.2: a NAK-free protocol
rdt3.0: channels with errors and loss
rdt3.0 sender
state "Wait for call 0 from above":
  rdt_send(data)
    sndpkt = make_pkt(0, data, checksum)
    udt_send(sndpkt)
    start_timer
    (next state: "Wait for ACK0")
  rdt_rcv(rcvpkt)
    (no action)
state "Wait for ACK0":
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) )
    (no action)
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0)
    stop_timer
    (next state: "Wait for call 1 from above")
state "Wait for call 1 from above":
  rdt_send(data)
    sndpkt = make_pkt(1, data, checksum)
    udt_send(sndpkt)
    start_timer
    (next state: "Wait for ACK1")
  rdt_rcv(rcvpkt)
    (no action)
state "Wait for ACK1":
  rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) )
    (no action)
  timeout
    udt_send(sndpkt)
    start_timer
  rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1)
    stop_timer
    (next state: "Wait for call 0 from above")
rdt3.0 in action
Performance of rdt3.0
$d_{trans} = \frac{L}{R} = \frac{8000\ \text{bits}}{10^9\ \text{bps}} = 8\ \text{microseconds}$
U_sender: utilization, the fraction of time sender busy sending
$U_{sender} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$ (times in msec)
1KB pkt every 30 msec -> 33kB/sec thruput over 1 Gbps link
network protocol limits use of physical resources!
Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
  range of sequence numbers must be increased
  buffering at sender and/or receiver
Increase utilization by a factor of 3!
$U_{sender} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$ (times in msec)
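The utilization figures can be reproduced with a short calculation. This is a sketch using the slide's numbers (L = 8000 bits, R = 10^9 bps, RTT = 30 msec); the class and parameter names are illustrative, and the cap at 1.0 is an added assumption for the case where the window is large enough to keep the link always busy.

```java
/** Sketch: sender utilization for stop-and-wait and for pipelining. */
final class SenderUtilization {
    /**
     * U_sender = n * (L/R) / (RTT + L/R), where n is the number of
     * packets the sender may have in flight before waiting for an ACK.
     */
    static double utilization(int packetsInFlight, double bits, double bitsPerSec, double rttSec) {
        double transmit = bits / bitsPerSec;  // L/R in seconds
        double u = packetsInFlight * transmit / (rttSec + transmit);
        return Math.min(1.0, u);              // the link cannot be more than fully used
    }

    public static void main(String[] args) {
        System.out.println(utilization(1, 8000, 1e9, 0.030)); // stop-and-wait (rdt3.0)
        System.out.println(utilization(3, 8000, 1e9, 0.030)); // three packets in flight
    }
}
```

With these inputs the stop-and-wait value is about 0.00027 and the three-packet pipeline about 0.0008, matching the figures above.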
Pipelining Protocols
Go-back-N: overview
  sender: up to N unACKed pkts in pipeline
  receiver: only sends cumulative ACKs
    doesn’t ACK pkt if there’s a gap
  sender: has timer for oldest unACKed pkt
    if timer expires: retransmit all unACKed packets
Selective Repeat: overview
  sender: up to N unACKed packets in pipeline
  receiver: ACKs individual pkts
  sender: maintains timer for each unACKed pkt
    if timer expires: retransmit only unACKed packet
Go-Back-N
Sender:
  k-bit seq # in pkt header
  “window” of up to N, consecutive unACKed pkts allowed
GBN: sender extended FSM
rdt_send(data)
if (nextseqnum < base+N) {
sndpkt[nextseqnum] = make_pkt(nextseqnum,data,chksum)
udt_send(sndpkt[nextseqnum])
if (base == nextseqnum)
start_timer
nextseqnum++
}
else
refuse_data(data)
initially:
  base=1
  nextseqnum=1
timeout
  start_timer
  udt_send(sndpkt[base])
  udt_send(sndpkt[base+1])
  …
  udt_send(sndpkt[nextseqnum-1])
rdt_rcv(rcvpkt) && corrupt(rcvpkt)
  (no action)
rdt_rcv(rcvpkt) && notcorrupt(rcvpkt)
  base = getacknum(rcvpkt)+1
  if (base == nextseqnum)
    stop_timer
  else
    start_timer
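The sender FSM can be mirrored in a small class. This is a sketch under simplifying assumptions: sequence numbers grow without wrapping (no k-bit field), the timer is a boolean flag, udt_send is modeled by appending the sequence number to a list, and the guard against stale ACKs is an addition not shown in the FSM. The name GbnSender is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the Go-Back-N sender with window size N. */
final class GbnSender {
    private final int windowSize;                          // N
    private int base = 1;                                  // oldest unACKed seq #
    private int nextSeqNum = 1;                            // next seq # to assign
    private final List<String> sndpkt = new ArrayList<>(); // sent data; index = seq # - 1
    final List<Integer> wire = new ArrayList<>();          // seq #s handed to udt_send, in order
    boolean timerRunning = false;

    GbnSender(int windowSize) {
        this.windowSize = windowSize;
    }

    /** rdt_send(data): returns false (refuse_data) when the window is full. */
    boolean send(String data) {
        if (nextSeqNum >= base + windowSize) {
            return false;
        }
        sndpkt.add(data);
        wire.add(nextSeqNum);
        if (base == nextSeqNum) {
            timerRunning = true;                           // start_timer for the oldest pkt
        }
        nextSeqNum++;
        return true;
    }

    /** timeout: restart the timer and resend every unACKed packet. */
    void timeout() {
        timerRunning = true;
        for (int seq = base; seq < nextSeqNum; seq++) {
            wire.add(seq);
        }
    }

    /** Uncorrupted cumulative ACK: everything up to ackNum is acknowledged. */
    void ack(int ackNum) {
        base = Math.max(base, ackNum + 1);                 // ignore stale ACKs (added guard)
        timerRunning = base != nextSeqNum;                 // stop_timer if nothing outstanding
    }
}
```

With N = 3, a fourth send is refused until ACK 1 slides the window, and a timeout resends packets 2, 3 and 4 together, which is the "go back N" behavior.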
GBN in action
Selective Repeat
receiver individually acknowledges all correctly
received pkts
buffers pkts, as needed, for eventual in-order delivery
to upper layer
sender only resends pkts for which ACK not
received
sender timer for each unACKed pkt
sender window
N consecutive seq #’s
again limits seq #s of sent, unACKed pkts
Selective repeat: sender, receiver windows
Selective repeat
sender:
  data from above:
    if next available seq # in window, send pkt
  timeout(n):
    resend pkt n, restart timer
  ACK(n) in [sendbase,sendbase+N]:
    mark pkt n as received
    if n smallest unACKed pkt, advance window base to next unACKed seq #
receiver:
  pkt n in [rcvbase, rcvbase+N-1]
    send ACK(n)
    out-of-order: buffer
    in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
  pkt n in [rcvbase-N,rcvbase-1]
    ACK(n)
  otherwise:
    ignore
Selective repeat in action
Selective repeat: dilemma
Example:
  seq #’s: 0, 1, 2, 3
  window size=3
  receiver sees no difference in two scenarios!
  incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
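One way to explore the question is to check when a retransmitted old packet can fall inside the receiver's new window. The sketch below models only the worst case from the example (all N packets received, all ACKs lost); the class and method names are illustrative. The standard result is that the window size must be at most half the sequence number space.

```java
/** Sketch: detects the selective-repeat sequence number ambiguity. */
final class SrWindowCheck {
    /**
     * The sender's window covers seq #s 0..N-1. After receiving all of them,
     * the receiver's window covers the next N seq #s (mod seqSpace). If every
     * ACK is lost, the sender retransmits 0..N-1. Returns true if one of those
     * old seq #s is also in the receiver's new window.
     * Assumes 0 < windowSize <= seqSpace.
     */
    static boolean ambiguous(int seqSpace, int windowSize) {
        for (int i = 0; i < windowSize; i++) {
            int newSeq = (windowSize + i) % seqSpace; // seq # in receiver's new window
            if (newSeq < windowSize) {
                return true;                          // same number as an old packet
            }
        }
        return false;
    }
}
```

For the example above (4 sequence numbers, window size 3) the check reports an ambiguity, while window size 2 is safe.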
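flow controlled: sender will not overwhelm receiver
[Figure: application data passes through the socket door into the TCP send buffer, travels as a segment, and leaves the TCP receive buffer through the receiving socket door]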
TCP segment structure
Fields, each row 32 bits wide:
  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | Receive window
  checksum | Urg data pointer
  Options (variable length)
  application data (variable length)
Notes:
  sequence number, acknowledgement number: counting by bytes of data (not segments!)
  URG: urgent data (generally not used)
  ACK: ACK # valid
  PSH: push data now (generally not used)
  RST, SYN, FIN: connection estab (setup, teardown commands)
  Receive window: # bytes rcvr willing to accept
  checksum: Internet checksum (as in UDP)
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
  longer than RTT
    but RTT varies
  too short: premature timeout
    unnecessary retransmissions
Q: how to estimate RTT?
  SampleRTT: measured time from segment transmission until ACK receipt
    ignore retransmissions
  SampleRTT will vary, want estimated RTT “smoother”
    average several recent measurements
Example RTT estimation:
[Figure: RTT, gaia.cs.umass.edu to fantasia.eurecom.fr; x-axis: time (seconds), 1 to 106; y-axis: RTT (milliseconds), 100 to 350]
$DevRTT = (1-\beta) \cdot DevRTT + \beta \cdot |SampleRTT - EstimatedRTT|$
(typically, $\beta = 0.25$)
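The smoothing can be sketched as follows. Only the DevRTT update appears above; the EstimatedRTT update (weight alpha = 0.125), the timeout of EstimatedRTT + 4*DevRTT, and the initialization from the first sample follow RFC 6298 and are assumptions relative to this text. The class name RttEstimator is illustrative.

```java
/** Sketch of TCP's RTT smoothing and timeout computation (RFC 6298 style). */
final class RttEstimator {
    static final double ALPHA = 0.125; // weight of a new sample in EstimatedRTT (assumed)
    static final double BETA = 0.25;   // weight of a new sample in DevRTT

    private double estimatedRtt;
    private double devRtt;

    /** Initializes from the first SampleRTT: estimate = sample, deviation = sample / 2. */
    RttEstimator(double firstSampleMs) {
        estimatedRtt = firstSampleMs;
        devRtt = firstSampleMs / 2;
    }

    /** Folds in one SampleRTT (taken from a segment that was not retransmitted). */
    void update(double sampleRttMs) {
        // DevRTT uses the EstimatedRTT from before this sample.
        devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRttMs - estimatedRtt);
        estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRttMs;
    }

    double estimatedRtt() { return estimatedRtt; }

    double devRtt() { return devRtt; }

    /** EstimatedRTT plus a safety margin of four deviations. */
    double timeoutInterval() { return estimatedRtt + 4 * devRtt; }
}
```

Starting from a 100 msec sample and then observing 200 msec, the estimate moves only to 112.5 msec, while the deviation grows to 62.5 msec and widens the timeout margin.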
TCP sender events:
data rcvd from app:
  create segment with seq #
  seq # is byte-stream number of first data byte in segment
  start timer if not already running (think of timer as for oldest unACKed segment)
  expiration interval: TimeOutInterval
timeout:
  retransmit segment that caused timeout
  restart timer
ACK rcvd:
  if acknowledges previously unACKed segments
    update what is known to be ACKed
    start timer if there are outstanding segments
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
TCP: retransmission scenarios
[Figure: Host A and Host B timelines for three scenarios.
 lost ACK scenario: ACK is lost (X), Seq=92 timeout, SendBase = 100.
 premature timeout: Seq=92 timeout fires early, SendBase = 100, then SendBase = 120.
 Cumulative ACK scenario: one ACK is lost (X), SendBase = 120.]
TCP ACK generation [RFC 1122, RFC 2581]
Fast Retransmit
time-out period often relatively long:
  long delay before resending lost packet
detect lost segments via duplicate ACKs.
  sender often sends many segments back-to-back
  if segment is lost, there will likely be many duplicate ACKs for that segment
If sender receives 3 ACKs for same data, it assumes that segment after ACKed data was lost:
  fast retransmit: resend segment before timer expires
[Figure: Host A sends seq # x1 to x5 to Host B; one segment is lost (X); Host B answers with ACK x1 four times; the triple duplicate ACKs arrive before the timeout]
[Figure: receive buffer: IP datagrams arrive, TCP data (in buffer), (currently) unused buffer space, application process reads]
speed-matching service: matching send rate to receiving application’s drain rate
app process may be slow at reading from buffer
TCP Flow control: how it works
[Figure: RcvBuffer: IP datagrams arrive, TCP data (in buffer), (currently) unused buffer space of size rwnd, application process reads]
(suppose TCP receiver discards out-of-order segments)
unused buffer space:
  = rwnd
  = RcvBuffer-[LastByteRcvd - LastByteRead]
receiver: advertises unused buffer space by including rwnd value in segment header
sender: limits # of unACKed bytes to rwnd
  guarantees receiver’s buffer doesn’t overflow
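The receive window arithmetic can be sketched in two small functions. The receiver formula is the one above; the sender-side check uses the names lastByteSent and lastByteAcked, which are assumptions introduced here to express "# of unACKed bytes". The class name ReceiveWindow is illustrative.

```java
/** Sketch of TCP flow control bookkeeping (byte counts as longs). */
final class ReceiveWindow {
    /** Receiver: rwnd = RcvBuffer - [LastByteRcvd - LastByteRead]. */
    static long rwnd(long rcvBuffer, long lastByteRcvd, long lastByteRead) {
        return rcvBuffer - (lastByteRcvd - lastByteRead);
    }

    /**
     * Sender: a new segment may be sent only if the unACKed bytes,
     * including the new segment, stay within the advertised rwnd.
     */
    static boolean senderMaySend(long lastByteSent, long lastByteAcked,
                                 long segmentBytes, long rwnd) {
        long unacked = lastByteSent - lastByteAcked;
        return unacked + segmentBytes <= rwnd;
    }
}
```

For example, a 4096-byte buffer holding 2000 unread bytes advertises rwnd = 2096; a sender with 1000 unACKed bytes may then send another 1000 bytes but not 1200.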
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
  initialize TCP variables:
    seq. #s
Three way handshake:
Step 1: client host sends TCP SYN segment to server
  specifies initial seq #
  no data
close:
  client closes socket:
  clientSocket.close();
TCP Connection Management (cont.)
Note: with small modification, can handle simultaneous FINs.
[Figure: closing sequence ending in the timed wait and closed states]
[Figure: TCP client lifecycle]
[Figure: TCP server lifecycle]
Congestion:
informally: “too many sources sending too much
data too fast for network to handle”
different from flow control!
manifestations:
lost packets (buffer overflow at routers)
long delays (queueing in router buffers)
a top-10 problem!
Causes/costs of congestion: scenario 1
two senders, two receivers
one router, infinite buffers
no retransmission
[Figure: Host A and Host B send original data at rate $\lambda_{in}$ into a router with unlimited shared output link buffers; $\lambda_{out}$ is the rate delivered]
large delays when congested
maximum achievable throughput
Causes/costs of congestion: scenario 2
always: $\lambda_{in} = \lambda_{out}$ (goodput)
“perfect” retransmission only when loss: $\lambda'_{in} > \lambda_{out}$
retransmission of delayed (not lost) packet makes $\lambda'_{in}$ larger (than perfect case) for same $\lambda_{out}$
[Figure: three plots a., b., c. of $\lambda_{out}$, each with axis limit R/2; levels R/3 and R/4 are marked]
“costs” of congestion:
  more work (retrans) for given “goodput”
  unneeded retransmissions: link carries multiple copies of pkt
Causes/costs of congestion: scenario 3
[Figure: Host A, Host B, and $\lambda_{out}$]
Case study: ATM ABR congestion control
TCP congestion control: bandwidth probing
“probing for bandwidth”: increase transmission rate on receipt of ACK, until eventually loss occurs, then decrease transmission rate
  continue to increase on ACK, decrease on loss (since available bandwidth is changing, depending on other connections in network)
[Figure: sending rate versus time; ACKs being received, so increase rate; X loss, so decrease rate; TCP’s “sawtooth” behavior]
$rate = \frac{cwnd}{RTT}$ bytes/sec
cwnd is dynamic, function of perceived network congestion
TCP Congestion Control: more details
Transitioning into/out of slowstart
ssthresh: cwnd threshold maintained by TCP
on loss event: set ssthresh to cwnd/2
remember (half of) TCP rate when congestion last occurred
when cwnd >= ssthresh: transition from slowstart to congestion
avoidance phase
initially:
  cwnd = 1 MSS
  ssthresh = 64 KB
  dupACKcount = 0
  (start in "slow start")
state "slow start":
  duplicate ACK
    dupACKcount++
  new ACK
    cwnd = cwnd+MSS
    dupACKcount = 0
    transmit new segment(s), as allowed
  cwnd > ssthresh
    (next state: "congestion avoidance")
  timeout
    ssthresh = cwnd/2
    cwnd = 1 MSS
    dupACKcount = 0
    retransmit missing segment
state "congestion avoidance":
  timeout
    ssthresh = cwnd/2
    cwnd = 1 MSS
    dupACKcount = 0
    retransmit missing segment
    (next state: "slow start")
TCP congestion control FSM: overview
initially:
  cwnd = 1 MSS
  ssthresh = 64 KB
  dupACKcount = 0
  (start in "slow start")
state "slow start":
  new ACK
    cwnd = cwnd+MSS
    dupACKcount = 0
    transmit new segment(s), as allowed
  duplicate ACK
    dupACKcount++
  cwnd > ssthresh
    (next state: "congestion avoidance")
  timeout
    ssthresh = cwnd/2
    cwnd = 1 MSS
    dupACKcount = 0
    retransmit missing segment
  dupACKcount == 3
    ssthresh= cwnd/2
    cwnd = ssthresh + 3
    retransmit missing segment
    (next state: "fast recovery")
state "congestion avoidance":
  new ACK
    cwnd = cwnd + MSS·(MSS/cwnd)
    dupACKcount = 0
    transmit new segment(s), as allowed
  duplicate ACK
    dupACKcount++
  timeout
    ssthresh = cwnd/2
    cwnd = 1 MSS
    dupACKcount = 0
    retransmit missing segment
    (next state: "slow start")
  dupACKcount == 3
    ssthresh= cwnd/2
    cwnd = ssthresh + 3
    retransmit missing segment
    (next state: "fast recovery")
state "fast recovery":
  duplicate ACK
    cwnd = cwnd + MSS
    transmit new segment(s), as allowed
  New ACK
    cwnd = ssthresh
    dupACKcount = 0
    (next state: "congestion avoidance")
  timeout
    ssthresh = cwnd/2
    cwnd = 1
    dupACKcount = 0
    retransmit missing segment
    (next state: "slow start")
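The three-state FSM can be sketched as an event-driven class. This is an illustration, not a TCP implementation: cwnd and ssthresh are doubles in bytes, retransmission and segment sending are omitted, and the FSM's "ssthresh + 3" is read as ssthresh plus 3 MSS, which is an interpretation. The class name TcpCongestionControl is hypothetical.

```java
/** Sketch of the TCP congestion control FSM: slow start, congestion avoidance, fast recovery. */
final class TcpCongestionControl {
    enum State { SLOW_START, CONGESTION_AVOIDANCE, FAST_RECOVERY }

    final double mss;
    State state = State.SLOW_START;
    double cwnd;                    // congestion window, bytes
    double ssthresh = 64 * 1024;    // initial threshold: 64 KB
    int dupAckCount = 0;

    TcpCongestionControl(double mss) {
        this.mss = mss;
        this.cwnd = mss;            // cwnd = 1 MSS
    }

    void onNewAck() {
        switch (state) {
            case SLOW_START:
                cwnd += mss;        // one MSS per ACK
                if (cwnd > ssthresh) {
                    state = State.CONGESTION_AVOIDANCE;
                }
                break;
            case CONGESTION_AVOIDANCE:
                cwnd += mss * (mss / cwnd); // about one MSS per RTT
                break;
            case FAST_RECOVERY:
                cwnd = ssthresh;    // deflate the window
                state = State.CONGESTION_AVOIDANCE;
                break;
        }
        dupAckCount = 0;
    }

    void onDuplicateAck() {
        if (state == State.FAST_RECOVERY) {
            cwnd += mss;            // each further duplicate inflates cwnd
            return;
        }
        dupAckCount++;
        if (dupAckCount == 3) {     // triple duplicate ACK: loss inferred
            ssthresh = cwnd / 2;
            cwnd = ssthresh + 3 * mss;
            state = State.FAST_RECOVERY;
        }
    }

    void onTimeout() {
        ssthresh = cwnd / 2;
        cwnd = mss;
        dupAckCount = 0;
        state = State.SLOW_START;
    }
}
```

Feeding the class a run of new ACKs, then three duplicate ACKs, then a timeout walks it through all three states and reproduces the sawtooth pattern of cwnd.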
Popular “flavors” of TCP
[Figure: congestion window plotted against Transmission round, with ssthresh levels marked; TCP Tahoe curve]
TCP throughput
Q: what’s average throughput of TCP as function of window size, RTT?
  ignoring slow start
let W be window size when loss occurs.
  when window is W, throughput is W/RTT
  just after loss, window drops to W/2, throughput to W/(2 RTT).
  average throughput: .75 W/RTT
throughput in terms of loss rate L: $\frac{1.22 \cdot MSS}{RTT \sqrt{L}}$
➜ $L = 2 \cdot 10^{-10}$ Wow
new versions of TCP for high-speed
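Both throughput expressions are easy to evaluate numerically. The sketch below inverts the loss-rate formula to find the loss rate a target throughput would require; the inputs in main (1500-byte segments, 100 msec RTT, 10 Gbps target) are the textbook's usual example and are assumptions relative to this text. The class and method names are illustrative.

```java
/** Sketch: TCP average throughput and the loss rate needed for a target rate. */
final class TcpThroughput {
    /** Average throughput 0.75 * W / RTT, in bytes/sec for W in bytes. */
    static double averageThroughput(double windowBytes, double rttSec) {
        return 0.75 * windowBytes / rttSec;
    }

    /**
     * Solves throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L,
     * given the target throughput in bits/sec and MSS in bits.
     */
    static double requiredLossRate(double mssBits, double rttSec, double targetBps) {
        double root = 1.22 * mssBits / (rttSec * targetBps); // sqrt(L)
        return root * root;
    }

    public static void main(String[] args) {
        // Assumed example: 1500-byte segments, 100 msec RTT, 10 Gbps target.
        System.out.println(requiredLossRate(1500 * 8, 0.1, 1e10));
    }
}
```

With those assumed inputs the required loss rate comes out near 2·10^-10, which is why the figure above earns a "Wow": almost no loss can be tolerated at that speed.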
TCP Fairness
fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R; a second plot has axis "Connection 1 throughput" up to R]
Fairness (more)
Fairness and UDP
  multimedia apps often do not use TCP
    do not want rate throttled by congestion control
  instead use UDP:
    pump audio/video at constant rate, tolerate packet loss
Fairness and parallel TCP connections
  nothing prevents app from opening parallel connections between 2 hosts.
  web browsers do this
  example: link of rate R supporting 9 connections;
    new app asks for 1 TCP, gets rate R/10
    new app asks for 11 TCPs, gets R/2 !
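The parallel-connection example is plain arithmetic once each connection is assumed to get an equal share of the link. A minimal sketch, with illustrative names and that equal-share idealization as its only model:

```java
/** Sketch: link share of an app that opens several parallel TCP connections. */
final class ParallelConnectionShare {
    /**
     * Assumes every TCP connection gets an equal share of the link, so an
     * app with k of the (existing + k) connections gets k / (existing + k) of R.
     */
    static double share(int appConnections, int existingConnections, double linkRate) {
        return linkRate * appConnections / (existingConnections + appConnections);
    }

    public static void main(String[] args) {
        System.out.println(share(1, 9, 1.0));   // one connection among ten
        System.out.println(share(11, 9, 1.0));  // eleven connections among twenty
    }
}
```

With 9 existing connections, one new connection gets R/10, while 11 new connections get 11/20 of R, slightly more than the R/2 quoted above.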
Chapter 3: Summary
principles behind transport layer services:
  multiplexing, demultiplexing
  reliable data transfer
  flow control
  congestion control
instantiation and implementation in the Internet
  UDP
  TCP
Next:
  leaving the network “edge” (application, transport layers)
  into the network “core”