
Alex Britt

2013/4/23
MATH 547


Linear Network Coding to Increase TCP Throughput

INTRODUCTION
Current network architecture transmits files between sources and receivers, for the
most part reliably and quickly. Ensuring that a signal reaches its correct destination
and remains accurate during transmission involves organizational details that are
handled modularly by a data structure called the protocol stack, illustrated in
Figure 1. At the top sits the source user, which can be, for example, a server or
another source of information to be transmitted. This information is fed to the TCP
layer, responsible for error-checking, and then to the IP layer, which directs it toward
the assigned address. Below the IP layer are additional lower-level layers, such as the
physical connections that move the bits from one router to another. On the other end, the
signal reaches the receiver's IP layer, which passes it up to the TCP layer. The receiver's
TCP layer error-checks and passes the completed message to the receiver (a computer
or a program such as a browser). This architecture has flaws, however, that can result
in extreme inefficiency, which network coding aims to solve.


Figure 1 - Normal configuration of TCP


THE WORKINGS OF TCP/IP AND THE PROTOCOL STACK
Transmission Control Protocol (TCP) is integral to the accuracy and reliability of
data communicated via the Internet. It is the service that registers incoming and
outgoing data on both the source and receiver ends of a connection, checking for errors
and ensuring that information arrives complete and in order. After a user, such as a
web browser, requests information from a source, the data is streamed from
the source to the source's TCP layer, which separates it into chunks and attaches to each
a TCP header containing information such as source, destination, its number within the
sequence of related chunks, and other options and flags. This header tells the
receiver's TCP layer which chunk it has received, so that it can confirm that all
information has arrived and is in order. The resulting units, called TCP segments, are
passed to the IP layer, which is responsible for determining the path to the receiver. To
perform this function, the IP layer attaches its own header, containing source,
destination, and other routing data used by the nodes between the source and receiver.
The contents (a TCP segment) are called the payload, and together with the header they
form a packet. When the receiver's IP layer collects packets, it checks their source and
destination fields to ensure they are the ones desired, and discards them if they are not.
The remainder have their IP headers removed, and the segments inside are passed to the
receiver's TCP layer, where the data in the TCP header allows them to be put in order and
acknowledgements (ACKs) to be sent back to the source for each segment received. IP
routing requires navigating a constantly changing set of available paths and connections
in and between networks (part of the need for the routing data), so the parts of a
message may arrive out of order, and some segments may be missing due to losses in
transmission. The receiver's TCP sends an ACK back to the source's TCP for each segment
it receives. The source-side TCP keeps a record of what it sent and when; if it does not
receive an ACK for a segment within a defined time period, it assumes the segment is lost
and resends it.
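
A minimal sketch of that source-side bookkeeping is given below. The class and method
names are hypothetical, and a real TCP implementation tracks far more state; this only
illustrates the record-ACK-retransmit cycle described above.

import time

class SimpleSender:
    """Toy model of source-side TCP bookkeeping: remember what was sent and
    when, and resend anything not acknowledged within the timeout."""

    def __init__(self, timeout=1.0):
        self.timeout = timeout
        self.unacked = {}  # sequence number -> (segment, time it was sent)

    def send(self, seq, segment, transmit):
        transmit(seq, segment)                   # hand the segment to the network
        self.unacked[seq] = (segment, time.time())

    def on_ack(self, seq):
        self.unacked.pop(seq, None)              # receiver confirmed this segment

    def resend_lost(self, transmit):
        now = time.time()
        for seq, (segment, sent_at) in list(self.unacked.items()):
            if now - sent_at > self.timeout:     # no ACK in time: assume it was lost
                self.send(seq, segment, transmit)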


LOSSY NETWORKS AND TCP'S CONGESTION CONTROL
When this algorithm of waiting for and resending lost data was designed, the
primary transmission medium was wired connections [1]. When losses are heavy, TCP
assumes they result from congestion and reduces its transmission rate accordingly.
At any given time, a source's TCP works with a limited range of the segments in
its queue. This range is called the transmission window. TCP's response to
congestion is to reduce the transmission rate by shortening this window, so that it
pushes fewer packets into the network, which relieves congestion when congestion is in
fact the cause.
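
A minimal sketch of that response, assuming the classic additive-increase /
multiplicative-decrease behavior of TCP Reno (the function names are illustrative):

def on_loss_detected(window, min_window=1):
    """Multiplicative decrease: treat the loss as congestion and halve the window."""
    return max(min_window, window // 2)

def on_segment_acked(window, max_window=64):
    """Additive increase: grow the window again while transmissions succeed."""
    return min(max_window, window + 1)

# A few losses in a row quickly close the window, throttling the sender.
w = 32
for _ in range(3):
    w = on_loss_detected(w)
print(w)  # 4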
Assuming congestion to be the main cause of segment loss was accurate in wired
networks, but with the increasing ubiquity of wireless connections it becomes in some
ways a hindrance. Wireless networks are susceptible to transmission losses caused
by a variety of factors other than congestion, including weak signals, interfering
signals or radiation sources, and physical obstructions. The way to overcome these is to
transmit as much or more, since losses to these causes are some fraction of the amount
transmitted. Therefore TCP's assumption of congestion, and its corresponding reaction,
massively hurts throughput in lossy wireless networks.


PLACING NETWORK CODING WITHIN EXISTING ARCHITECTURE
The segments coming out of the source's TCP are intercepted by the network coding (NC)
layer, made into linear combinations, and given to the IP layer along with another vector
containing the coefficients used, called the coding vector. When these reach the
receiver's NC layer, the received combinations and their coefficients are used to solve
the linear system, recovering the original vectors (the segments to be communicated),
which are then passed up to the TCP layer.

Figure 2 - Placing Network Coding within existing architecture


NETWORK CODING ALGORITHMS
Since the information transmitted consists entirely of 1s and 0s, the act of taking
linear combinations can work in a variety of ways. One example simple enough to explicate
here is the bitwise exclusive-or operator (XOR, ⊕). Each bit in a linear combination is
the XOR of the corresponding bits in the vectors it contains: [3]

[0 1 0 0 1 0 0 0] ⊕ [0 1 1 0 0 1 0 1] = [0 0 1 0 1 1 0 1]

This operation can also be thought of as adding in binary and dropping the overflow
(1 + 1 = 0), or as vector addition over the field with two elements, where
(a ⊕ b)_i = a_i ⊕ b_i.
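
A minimal sketch of the operation in code (illustrative only, not taken from any
particular TCP/NC implementation):

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings: each output bit is the
    XOR of the corresponding input bits."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))

s1 = bytes([0b01001000])            # 'H'
s2 = bytes([0b01100101])            # 'e'
combo = xor_bytes(s1, s2)           # 0b00101101
assert xor_bytes(combo, s2) == s1   # XOR is its own inverse, which is what enables decoding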
Segments in the transmission window of the TCP layer (the ones it is currently
transmitting) are captured by the NC layer and taken as the basis from which it
forms linear combinations. These combinations are passed to the IP layer, which
handles them like regular segments, sending them to their destination. The NC layer
also passes a segment containing a list of coefficients indicating which segments are
components of each of the combinations. [1]
For instance, here is the process of encoding and decoding the string "Hello!".
This is an abstraction: real segments/packets would be many pages long, and
TCP/NC operates on a much larger window of segments, but the operation holds for any
amount of data.

Characters in ASCII:

H: 01001000 e: 01100101 l: 01101100 o: 01101111 !: 00100001

For this abstraction, we treat each character as the architecture treats a segment (and
hence a packet), so we have six of them to communicate:


s1 = 01001000 (H)
s2 = 01100101 (e)
s3 = 01101100 (l)
s4 = 01101100 (l)
s5 = 01101111 (o)
s6 = 00100001 (!)
The TCP layer transmits these, and the NC layer reworks them into coded segments, for
example:

c1 = s2 ⊕ s4 ⊕ s5 ⊕ s6        c5 = s3 ⊕ s6
c2 = s1 ⊕ s2 ⊕ s3             c6 = s2 ⊕ s4
c3 = s1 ⊕ s4                  c7 = s3 ⊕ s5
c4 = s4 ⊕ s5                  c8 = s1 ⊕ s6


In bit form, these coded segments are:

c1 = 01000111    c5 = 01001101
c2 = 01000001    c6 = 00001001
c3 = 00100100    c7 = 00000011
c4 = 00000011    c8 = 01101000
Here are the coefficient vectors that form the extra segment. They have 6 positions, one
for each of the 6 original segments. The vector e1, for instance, has a 1 in the 2, 4, 5, and
6 positions, because c1 contains the XOR of s2, s4, s5, and s6:


e1 = [0 1 0 1 1 1]
e2 = [1 1 1 0 0 0]
e3 = [1 0 0 1 0 0]
e4 = [0 0 0 1 1 0]
e5 = [0 0 1 0 0 1]
e6 = [0 1 0 1 0 0]
e7 = [0 0 1 0 1 0]
e8 = [1 0 0 0 0 1]

For packets in an example this small, the coefficient vectors' relative size makes
them comically inefficient, but for real packet sizes the overhead becomes negligible. If
we wished to communicate 50 original packets, we could draw each coefficient from a field
of 256 elements (one byte per coefficient) and use only 50 bytes for all of the
coefficients of a coded packet. Since average packet size is around 1,500 bytes, this
overhead is trivial. For this example I used one on/off bit to indicate whether each
basis vector is in a given coded vector, another abstraction. When dealing with real-size
bases it proves more efficient to list the included segments by their numbers (say, 15,
47, 134, 167, and 245) and assume the others are not included. [1]
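
As a sketch, the encoding step for the "Hello!" example can be written as follows, using
the coefficient vectors listed above and treating each one-character segment as a single
byte (the function name is illustrative, not from any real implementation):

def encode(segments, coefficient_vectors):
    """Form one coded segment per coefficient vector by XORing together the
    original segments whose coefficient is 1."""
    coded = []
    for coeffs in coefficient_vectors:
        combo = 0
        for bit, segment in zip(coeffs, segments):
            if bit:
                combo ^= segment
        coded.append(combo)
    return coded

# The six original segments (ASCII codes of "Hello!") and the eight
# coefficient vectors e1..e8 listed above.
segments = [ord(ch) for ch in "Hello!"]
e = [[0, 1, 0, 1, 1, 1],   # c1 = s2 ^ s4 ^ s5 ^ s6
     [1, 1, 1, 0, 0, 0],   # c2 = s1 ^ s2 ^ s3
     [1, 0, 0, 1, 0, 0],   # c3 = s1 ^ s4
     [0, 0, 0, 1, 1, 0],   # c4 = s4 ^ s5
     [0, 0, 1, 0, 0, 1],   # c5 = s3 ^ s6
     [0, 1, 0, 1, 0, 0],   # c6 = s2 ^ s4
     [0, 0, 1, 0, 1, 0],   # c7 = s3 ^ s5
     [1, 0, 0, 0, 0, 1]]   # c8 = s1 ^ s6
coded = encode(segments, e)
print([format(c, "08b") for c in coded])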
DECODING
The nature of these vectors means that decoding them to find the original
segments is the same as solving a linear system. If the e vectors are interpreted as
the rows of a matrix and the system is solved (where the row operation is XOR), the set
of coefficient (e) vectors that, when XORed together, produces a vector with only the
nth element equal to 1 corresponds to the set of coded (c) vectors that can be XORed to
produce the original vector sn. For instance:


e2 ⊕ e4 ⊕ e6 ⊕ e7 = [1 1 1 0 0 0] ⊕ [0 0 0 1 1 0] ⊕ [0 1 0 1 0 0] ⊕ [0 0 1 0 1 0]
                  = [1 0 0 0 0 0]

means that: c2 ⊕ c4 ⊕ c6 ⊕ c7 = s1.


For a computer, it is inexpensive to decode the entire linear system in this way.
Continuing with XOR as the row operation, and substituting each segment back into the
remaining equations as soon as it is decoded:

s2 = c3 ⊕ c6 ⊕ s1        s3 = c5 ⊕ c8 ⊕ s1        s4 = c6 ⊕ s2
s5 = c7 ⊕ s3             s6 = c5 ⊕ s3

Having determined and performed the linear combinations that decode the
original segments' contents, we compare them to the ASCII encodings to get:


s1 = 01001000 = H
s2 = 01100101 = e
s3 = 01101100 = l
s4 = 01101100 = l
s5 = 01101111 = o
s6 = 00100001 = !

revealing the original string: "Hello!"
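
The same procedure can be automated as Gauss-Jordan elimination over GF(2), where the
row operation is XOR. A minimal sketch, reusing the coefficient vectors and coded bytes
from the example above, with c1 treated as lost (as in the loss example in the next
section); the function name is illustrative:

def decode(coeff_rows, coded_values, n):
    """Gauss-Jordan elimination over GF(2): XOR is the row operation, applied to
    the coefficient vectors and to the coded payloads in parallel."""
    rows = [(list(e_vec), c) for e_vec, c in zip(coeff_rows, coded_values)]
    for col in range(n):
        # Find a remaining row with a 1 in this column and swap it into place.
        pivot = next(i for i in range(col, len(rows)) if rows[i][0][col])
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # XOR the pivot row into every other row that has a 1 in this column.
        for i in range(len(rows)):
            if i != col and rows[i][0][col]:
                rows[i] = ([a ^ b for a, b in zip(rows[i][0], rows[col][0])],
                           rows[i][1] ^ rows[col][1])
    return [rows[i][1] for i in range(n)]

# Coefficient vectors e2..e8 and coded bytes c2..c8 from the example above;
# c1 never arrived, yet all six originals are still recoverable.
received_e = [[1, 1, 1, 0, 0, 0], [1, 0, 0, 1, 0, 0], [0, 0, 0, 1, 1, 0],
              [0, 0, 1, 0, 0, 1], [0, 1, 0, 1, 0, 0], [0, 0, 1, 0, 1, 0],
              [1, 0, 0, 0, 0, 1]]
received_c = [0x41, 0x24, 0x03, 0x4D, 0x09, 0x03, 0x68]
print("".join(chr(v) for v in decode(received_e, received_c, 6)))  # Hello!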



OPTIMIZING LEVEL OF REDUNDANCY
The purpose of all these operations is to prevent loss of data when some packets
fail to reach the receiver. To preserve the required degrees of freedom in those that are
received, the number of linear combinations originally transmitted is greater than the
number of segments to be communicated, since some will not reach their destination. The
factor by which the number transmitted exceeds the number of originals is the
redundancy factor R. If R is high enough, these losses are not harmful: the coded
segments remaining after losses still form a solvable linear system, which the
receiving computer can solve very quickly. For instance, in the above example, if the
coded segment c1 is lost, the originals s1...s6 can still be decoded. An optimized
algorithm would produce combinations such that any one packet could be lost
without affecting the receiver's ability to decode.
The exact meaning of R is that for every segment sent by the source, the NC layer
produces and sends R random linear combinations. Random network losses from these
are recoverable since the data of any one segment exists in many of the combinations.
Experimentation in [2] found that increasing R yielded significant improvements in
throughput up to R = 1.25. After this point, additional redundancy does not greatly
increase the probability that the remaining packets will contain all the information,
because the probability is already near 1. Excess redundancy was shown to reduce
throughput, due to the cost of transmitting additional coded segments.
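
A rough Monte Carlo sketch of that trade-off follows. The 10% loss rate is an
illustrative assumption, not a value from [2], and the coefficients are drawn from GF(2)
for simplicity, whereas real implementations use larger fields to make accidental
dependence among the combinations far less likely; the function names are hypothetical.

import random

def rank_gf2(rows):
    """Rank of a 0/1 matrix over GF(2), computed by elimination with XOR."""
    rows = [r[:] for r in rows]
    rank = 0
    width = len(rows[0]) if rows else 0
    for col in range(width):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def decode_probability(n_original, redundancy, loss_rate, trials=200):
    """Estimate how often the coded segments surviving random losses still
    carry all n_original degrees of freedom."""
    successes = 0
    n_sent = round(n_original * redundancy)
    for _ in range(trials):
        survivors = [[random.randint(0, 1) for _ in range(n_original)]
                     for _ in range(n_sent) if random.random() > loss_rate]
        if rank_gf2(survivors) == n_original:
            successes += 1
    return successes / trials

# 20 segments keeps the demo quick; the shape of the curve is the same.
for R in (1.0, 1.1, 1.25, 1.5):
    print(R, decode_probability(20, R, loss_rate=0.10))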
A variety of codes of this nature exist, including Reed-Solomon codes and the fountain
codes (Raptor, Tornado, and LT codes). Their greater complexity yields more efficient
operation, and new methods and optimizations continue to be developed. [3] For a given R,
the higher the probability of complete communication, the better a coding algorithm is.
Optimizing an algorithm entails minimizing the amount by which R must exceed 1
(illustrated in Fig. 3) to ensure reliable transmission.

Figure 3 - Effect of increasing R on probability of successful transmission


RELATIVE COSTS COMPARED TO NON-CODED TCP
This algorithm is effective at masking random losses, but it remains susceptible to
correlated losses, the kind that results mainly from congestion. So when losses escape
NC and are not masked from TCP, they are likely to be the product of congestion,
meaning that TCP's response of reducing the transmission window is appropriate and
effective. This allows TCP's countermeasures to remain effective while limiting
false-positive activations in cases other than congestion. [2]
The overhead accompanying network coding comes from two sources: 1) the
increased volume of transmission, from both the redundancy and the coding vector (the
coefficients); 2) the cost of encoding and decoding the segments at either end. As noted
above, network coding uses more bandwidth up front with its initial transmission, but
ultimately creates significant savings by eliminating the frequent retransmissions of
uncoded TCP. The computational costs of decoding even large data sets are measured
in microseconds, trivial compared to the tens or hundreds of milliseconds required for a
round trip through the network. [2]





CONCLUSION
Replacing conventional data segments with carefully chosen linear combinations
proves effective for increasing throughput in lossy networks, suggesting potential for
greatly increased bandwidth in the wireless networks that commonly experience
random losses. By masking losses from TCP's congestion-control algorithm, network
coding allows the sliding transmission window to remain large instead of closing
erroneously when random losses are misinterpreted as congestion.

REFERENCES:
1. Kim, MinJi; Médard, Muriel; and Barros, João. "Modeling Network Coded TCP
   Throughput: A Simple Model and its Validation." MIT.edu. Web. 9 Apr. 2013.
   <http://www.mit.edu/~medard/papers2011/Modeling%20Network%20Coded%20TCP.pdf>

2. Sundararajan, Jay Kumar, et al. "Network Coding Meets TCP." IEEE.org. Web. 9 Apr. 2013.
   <http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5061931&tag=1>

3. Meyer-Patel, Ketan. Personal conversation. 18 Apr. 2013.

4. Fasolo, Elena. "Network Coding Techniques." 2012. Slide show. 9 Apr. 2013.
   <cs.virginia.edu/~yw5s/Network%20coding.ppt>
   [basic tutorial helpful for initial understanding]

5. Mitzenmacher, Michael. "Network Coding Meets TCP." 2012. Slide show. 9 Apr. 2013.
   <eecs.harvard.edu/~michaelm/TALKS/London.ppt>
   [basic tutorial helpful for initial understanding]
