
Self Information

Self-information is a measure of the information content associated with the outcome of a random variable.

It is expressed in a unit of information, for example bits, nats, or hartleys, depending on the base of the logarithm used in its calculation. The amount of self-information contained in a probabilistic event depends only on the probability of that event: the smaller its probability, the larger the self-information associated with receiving the information that the event indeed occurred. As the first step in finding a measure of information, consider an information source with a series of ordered outputs:

$$x_1, x_2, \ldots, x_N \quad \text{with probabilities} \quad p_1 \ge p_2 \ge \cdots \ge p_N,$$

where the output $x_1$ is the most likely and $x_N$ is the least likely. The outputs might be, for example, the weather condition in a given city on a certain day or, perhaps, the outcome of a particular athletic event. A measure of "information" should satisfy the following conditions:

1. The information content of an output $x_j$ depends only on the probability $p_j$ of $x_j$ occurring, and not on the value of $x_j$; we denote this function by $I(p_j)$ and call it the self-information of $x_j$.
2. Self-information is a continuous function of $p_j$.
3. Self-information is a decreasing function of $p_j$.
4. If $p_j = p_{j_1} \cdot p_{j_2}$, i.e. the output is the joint occurrence of two independent events with probabilities $p_{j_1}$ and $p_{j_2}$, then $I(p_j) = I(p_{j_1}) + I(p_{j_2})$.

Only the "logarithmic" function satisfies these essential properties, and thus self-information may be written

$$I(p_j) = -\log p_j = \log \frac{1}{p_j}.$$

Mutual Information

Mutual information is one of many quantities that measure how much one random variable tells us about another. It is a dimensionless quantity with (generally) units of bits, and can be thought of as the reduction in uncertainty about one random variable given knowledge of another. High mutual information indicates a large reduction in uncertainty; low mutual information indicates a small reduction; and zero mutual information between two random variables means the variables are independent. For two discrete variables $X$ and $Y$ whose joint probability distribution is $P_{XY}(x,y)$, the mutual information between them, denoted $I(X;Y)$, is given by

$$I(X;Y) = \sum_{x,y} P_{XY}(x,y) \log \frac{P_{XY}(x,y)}{P_X(x) P_Y(y)} = E_{P_{XY}}\left[\log \frac{P_{XY}}{P_X P_Y}\right].$$
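As a minimal sketch (the 2x2 joint table is made up for the example, not taken from the text), the Python code below evaluates the sum above directly from a joint probability table.

```python
import numpy as np

def mutual_information(p_xy: np.ndarray) -> float:
    """Mutual information I(X;Y) in bits, from a joint probability table p_xy[x, y]."""
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal P_X(x)
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal P_Y(y)
    mask = p_xy > 0                          # convention: 0 * log 0 = 0
    return float(np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])))

# Example (made-up) joint distribution over two binary variables.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
print(mutual_information(p_xy))              # > 0: X and Y are dependent

# For independent variables the joint factorizes and I(X;Y) = 0.
p_ind = np.outer([0.5, 0.5], [0.3, 0.7])
print(mutual_information(p_ind))             # 0 (up to floating-point error)
```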

Here $P_X(x)$ and $P_Y(y)$ are the marginals, $P_X(x) = \sum_y P_{XY}(x,y)$ and $P_Y(y) = \sum_x P_{XY}(x,y)$, and $E_P$ is the expected value over the distribution $P$. The units of information depend on the base of the logarithm. If base 2 is used (the most common, and the one used here), information is measured in bits. To understand what $I(X;Y)$ actually means, we first need to define entropy and conditional entropy. Qualitatively, entropy is a measure of uncertainty: the higher the entropy, the more uncertain one is about a random variable. Shannon postulated that a measure of uncertainty of a random variable $X$ should be a continuous function of its probability distribution $P_X(x)$ and should satisfy the following conditions:

1. It should be maximal when $P_X(x)$ is uniform, and in this case it should increase with the number of possible values $X$ can take;
2. It should remain the same if we reorder the probabilities assigned to different values of $X$;
3. The uncertainty about two independent random variables should be the sum of the uncertainties about each of them.

He then showed that the only measure of uncertainty that satisfies all these conditions is the entropy, defined as

$$H(X) = -\sum_x P_X(x) \log P_X(x) = -E_{P_X}\left[\log P_X\right].$$
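The sketch below (again with made-up distributions) computes $H(X)$ in bits and checks Shannon's three conditions numerically.

```python
import numpy as np

def entropy(p) -> float:
    """Shannon entropy H(X) = -sum_x P(x) log2 P(x), in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return float(-np.sum(p * np.log2(p)))

# 1. Uniform is maximal, and grows with the number of outcomes.
print(entropy([0.25] * 4), entropy([0.7, 0.1, 0.1, 0.1]))     # 2.0 > ~1.36
print(entropy([0.125] * 8))                                    # 3.0 > 2.0

# 2. Reordering the probabilities leaves the entropy unchanged.
print(np.isclose(entropy([0.5, 0.3, 0.2]), entropy([0.2, 0.5, 0.3])))  # True

# 3. For independent X and Y, H((X, Y)) = H(X) + H(Y).
p_x, p_y = [0.5, 0.5], [0.9, 0.1]
joint = np.outer(p_x, p_y).ravel()
print(np.isclose(entropy(joint), entropy(p_x) + entropy(p_y)))         # True
```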

$H(X)$ has a very concrete interpretation: suppose $x$ is chosen randomly from the distribution $P_X(x)$, and someone who knows the distribution $P_X(x)$ is asked to guess which $x$ was chosen by asking only yes/no questions. If the guesser uses the optimal question-asking strategy, which is to divide the probability in half on each guess by asking questions like "is $x$ greater than $x_0$?", then the average number of yes/no questions it takes to guess $x$ lies between $H(X)$ and $H(X)+1$. This gives quantitative meaning to "uncertainty": it is the number of yes/no questions it takes to guess a random variable, given knowledge of the underlying distribution and taking the optimal question-asking strategy. The conditional entropy is the average uncertainty about $X$ after observing a second random variable $Y$, and is given by

$$H(X|Y) = -\sum_y P_Y(y) \left[ \sum_x P_{X|Y}(x|y) \log P_{X|Y}(x|y) \right] = -E_{P_Y}\left[ E_{P_{X|Y}}\left[\log P_{X|Y}\right] \right],$$

where $P_{X|Y}(x|y) \equiv P_{XY}(x,y)/P_Y(y)$ is the conditional probability of $x$ given $y$. With the definitions of $H(X)$ and $H(X|Y)$,

$$I(X;Y) = H(X) - H(X|Y). \qquad (4)$$
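Identity (4) is easy to verify numerically. The sketch below reuses the same made-up joint table as earlier and confirms that $H(X) - H(X|Y)$ matches the defining sum for $I(X;Y)$.

```python
import numpy as np

p_xy = np.array([[0.4, 0.1],     # made-up joint distribution P_XY(x, y)
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)           # marginal P_X(x)
p_y = p_xy.sum(axis=0)           # marginal P_Y(y)

# H(X) = -sum_x P_X(x) log2 P_X(x)
h_x = -np.sum(p_x * np.log2(p_x))

# H(X|Y) = -sum_y P_Y(y) sum_x P_{X|Y}(x|y) log2 P_{X|Y}(x|y)
p_x_given_y = p_xy / p_y         # column j holds P_{X|Y}(. | y=j)
h_x_given_y = -np.sum(p_y * np.sum(p_x_given_y * np.log2(p_x_given_y), axis=0))

# I(X;Y) computed directly from the definition, for comparison.
i_xy = np.sum(p_xy * np.log2(p_xy / np.outer(p_x, p_y)))

print(h_x - h_x_given_y, i_xy)   # the two values coincide: I(X;Y) = H(X) - H(X|Y)
```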

Mutual information is therefore the reduction in uncertainty about variable $X$, or the expected reduction in the number of yes/no questions needed to guess $X$ after observing $Y$.

Entropy For Markov Sources

Since a stochastic process defined by a Markov chain that is irreducible and aperiodic has a stationary distribution, its entropy rate is independent of the initial distribution. For example, for such a Markov chain $Y_k$ defined on a countable number of states, given the transition matrix $P_{ij}$, the entropy rate $H(\mathcal{Y})$ is given by

$$H(\mathcal{Y}) = -\sum_{ij} \pi_i P_{ij} \log P_{ij},$$

where $\pi_i$ is the stationary distribution of the chain. A simple consequence of this definition is that the entropy rate of an i.i.d. stochastic process is the same as the entropy of any individual member of the process.
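As an illustrative sketch (the two-state transition matrix is made up), the code below finds the stationary distribution of a chain, computes its entropy rate in bits per symbol, and checks the i.i.d. special case.

```python
import numpy as np

def entropy_rate(P: np.ndarray) -> float:
    """Entropy rate H = -sum_ij pi_i P_ij log2 P_ij of an irreducible, aperiodic chain."""
    # Stationary distribution: left eigenvector of P with eigenvalue 1, normalized.
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi = pi / pi.sum()
    mask = P > 0                                   # convention: 0 * log 0 = 0
    return float(-np.sum((pi[:, None] * P)[mask] * np.log2(P[mask])))

# Made-up two-state chain.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
print(entropy_rate(P))     # average uncertainty of the next state given the current one

# For an i.i.d. process every row of P equals the marginal distribution,
# so the entropy rate reduces to the entropy of a single symbol.
p = np.array([0.3, 0.7])
P_iid = np.tile(p, (2, 1))
print(entropy_rate(P_iid), -np.sum(p * np.log2(p)))   # the two values agree (up to numerical error)
```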
