Agenda
Packet processing
Q&A
Reality
Network Stack (Kernel Space)
Ring Buffer
Parse L2 & IP
Local?
Process (User Space)
Parse TCP/UDP
Task / Container
Forward
DMA
Route?
Ring Buffer
Construct IP
Construct TCP/UDP
write()
Socket Buffer
A
B
Network Stack
Ring Buffer
poll()
Network Stack
Ring Buffer
Busy Polling
busy_poll()
Task
C
Ring Buffer
Network Stack
RX-queue-3
CPU 1
RX-queue-4
CPU 2
Use it to ...
RX-queue-1
RX-queue-2
RX-queue-3
RX-queue-4
CPU 1
CPU 1
CPU 2
CPU 2
CPU 3
CPU 3
Hardware Offload
RX/TX Checksumming
Segmentation Offload
Network Stack
Ring Buffer
GRO
MTU
Up to 64K
Segmentation Offload
(ethtool -K eth0 tso on)
(ethtool -K eth0 gso on)
Up to 64K
Network Stack
Ring Buffer
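The segmentation offload diagram shows a buffer of up to 64K leaving the stack and being cut into MTU-sized packets further down the path. A minimal sketch of that split, assuming 20 bytes of IPv4 header and 20 bytes of TCP header without options. The function name `segment` and the default MTU of 1500 are illustrative and not taken from the slides:

```python
def segment(payload: bytes, mtu: int = 1500, header_len: int = 40) -> list:
    """Split one large send buffer into chunks that each fit one MTU-sized packet.

    header_len is an assumption: 20 bytes IPv4 + 20 bytes TCP, no options.
    """
    mss = mtu - header_len  # payload bytes that fit into a single packet
    if mss <= 0:
        raise ValueError("MTU too small for the assumed headers")
    # Slice the buffer into consecutive chunks of at most mss bytes.
    return [payload[i:i + mss] for i in range(0, len(payload), mss)]


if __name__ == "__main__":
    chunks = segment(bytes(64 * 1024))
    print(len(chunks), len(chunks[0]), len(chunks[-1]))
```

With TSO the NIC performs this split in hardware, with GSO the stack does it in software just before the driver. GRO is the reverse operation on the receive side.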
Packet Processing
Link Layer
Packet Socket
ETH_P_ALL
Ingress QoS
tcpdump
Bridge
Open vSwitch
RX Handler
Team
Bonding
macvlan
macvtap
IPv4
Proto Handler
IPv6
ARP
The Feast!
IPX
Drop
...
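The protocol handler step in the link layer diagram can be sketched as a lookup on the EtherType field of the Ethernet header. This is a simplified illustration, not the kernel's implementation: the names `PROTO_HANDLERS` and `dispatch` are hypothetical, and VLAN tags, packet sockets and RX handlers are ignored.

```python
import struct

# Registered EtherType values for the handlers named on the slide.
PROTO_HANDLERS = {
    0x0800: "IPv4",
    0x86DD: "IPv6",
    0x0806: "ARP",
    0x8137: "IPX",
}


def dispatch(frame: bytes) -> str:
    """Return the protocol handler name for an Ethernet frame, or "Drop"."""
    if len(frame) < 14:
        return "Drop"  # too short to contain an Ethernet header
    # 6 bytes destination MAC, 6 bytes source MAC, 2 bytes EtherType (big endian)
    _dst, _src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return PROTO_HANDLERS.get(ethertype, "Drop")
```

A frame whose EtherType has no registered handler falls through to "Drop", which matches the last box in the diagram.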
IP Processing
PREROUTING
IP Handler
INPUT
Route Lookup
Local Delivery
Forwarding
L4 (TCP, ...)
FORWARD
Route Lookup
Link Layer
IPv4
Construction
POSTROUTING
OUTPUT
Local Output
User Space
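The three paths through the IP processing diagram differ in which netfilter hooks a packet traverses. A small sketch of that mapping follows. The function name `hooks_traversed` and its two boolean parameters are made up for illustration. For locally generated packets the sketch covers only the sending side and does not model a loopback destination.

```python
def hooks_traversed(locally_generated: bool, destined_to_local: bool) -> list:
    """Return the netfilter hooks a packet passes, in order."""
    if locally_generated:
        # Local Output: the route lookup happens before the packet is built.
        return ["OUTPUT", "POSTROUTING"]
    if destined_to_local:
        # Local Delivery: the route lookup after PREROUTING selects INPUT.
        return ["PREROUTING", "INPUT"]
    # Forwarding: the packet enters and leaves through the link layer.
    return ["PREROUTING", "FORWARD", "POSTROUTING"]
```

For example, a forwarded packet never reaches INPUT or OUTPUT, so rules in those chains do not apply to routed traffic.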
TCP Processing
IP
Parse TCP
Lookup Socket
Socket Filter
socket locked
task exists
Receive TCP
Prequeue
poll()
Task
Backlog
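The queue selection in the TCP processing diagram can be read as a two-step decision. The sketch below assumes the order suggested by the labels: the "socket locked" check comes first, then the "task exists" check. The function and parameter names are hypothetical.

```python
def receive_path(socket_locked: bool, task_waiting: bool) -> str:
    """Pick where an incoming TCP segment goes, following the slide's labels."""
    if socket_locked:
        # The socket is in use by a task, so the segment is deferred.
        return "Backlog"
    if task_waiting:
        # A task is blocked on this socket (e.g. in poll()) and will process it.
        return "Prequeue"
    # Nobody holds or waits on the socket: process the segment right away.
    return "Receive TCP"
```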
Fast Open
1st Req (2x RTT)
Client -> Server: SYN
Server -> Client: SYN+ACK+Cookie
Client -> Server: ACK+HTTP GET
Server -> Client: Data
2nd Req (1x RTT)
Client -> Server: SYN+Cookie+HTTP GET
Server -> Client: SYN+ACK+Data
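On the server side, Fast Open is requested per listening socket. A hedged sketch using Python's `socket` module follows. The helper name `enable_fast_open` and the queue length of 16 are illustrative. Whether the option takes effect also depends on the kernel version and the `net.ipv4.tcp_fastopen` sysctl, which the sketch does not check.

```python
import socket


def enable_fast_open(listener: socket.socket, queue_len: int = 16) -> bool:
    """Try to enable TCP Fast Open on a socket; return True if the call succeeded."""
    opt = getattr(socket, "TCP_FASTOPEN", None)
    if opt is None:
        return False  # this platform's socket module does not expose the option
    try:
        # The option value is the maximum number of pending Fast Open requests.
        listener.setsockopt(socket.IPPROTO_TCP, opt, queue_len)
    except OSError:
        return False  # kernel refused the option
    return True


if __name__ == "__main__":
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("Fast Open enabled:", enable_fast_open(srv))
    srv.close()
```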
ssh
Block or EWOULDBLOCK
write()
rmem -= packet-size
wmem
overlimit?
Socket Buffer
rmem += packet-size
wmem += packet-size
rmem
overlimit?
Socket Buffer
Reduce TCP Window
TCP/IP
TCP/IP
TX Ring Buffer
wmem -= packet-size
RX Ring Buffer
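The wmem/rmem bookkeeping in the diagram can be modeled with two counters and two limits. This is a toy model under stated assumptions, not the kernel's accounting: the class name `SocketMemory`, its method names and the limits are hypothetical, and the advertised window is simplified to the free receive space.

```python
import errno


class SocketMemory:
    """Toy model of per-socket wmem/rmem accounting."""

    def __init__(self, wmem_limit: int, rmem_limit: int):
        self.wmem_limit = wmem_limit
        self.rmem_limit = rmem_limit
        self.wmem = 0  # bytes queued for transmission
        self.rmem = 0  # bytes received but not yet read by the task

    def write(self, size: int) -> int:
        """Non-blocking write(): wmem += packet-size, or EWOULDBLOCK if over limit."""
        if self.wmem + size > self.wmem_limit:
            raise BlockingIOError(errno.EWOULDBLOCK, "send buffer over limit")
        self.wmem += size
        return size

    def tx_complete(self, size: int) -> None:
        """The packet left through the TX ring buffer: wmem -= packet-size."""
        self.wmem -= size

    def receive(self, size: int) -> int:
        """A packet arrived: rmem += packet-size. Returns the window left to advertise."""
        self.rmem += size
        return max(0, self.rmem_limit - self.rmem)

    def read(self, size: int) -> None:
        """The task consumed data: rmem -= packet-size."""
        self.rmem -= size
```

A blocking socket would sleep in `write()` instead of raising, until `tx_complete()` frees enough space. On the receive side, a shrinking return value from `receive()` corresponds to the "Reduce TCP Window" box.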
torrent
write()
write()
Socket Buffer
Socket Buffer
Queuing Discipline
Driver
TX Ring Buffer
Q&A
Contact:
E-Mail: tgraf@suug.ch
Twitter: @tgraf__