

of regressors

18. $X = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^d \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_m & x_m^2 & \cdots & x_m^d \end{bmatrix}$

for locally weighted regression, determined by the degree d of the polynomial. Note that a column of constants (1s) is the first column; this corresponds to the constant term $\beta_0$ in the equation below. Thus, provided $(X'W_j X)^{-1}$ exists, the fit at $x_j$ is obtained as:

19. $\hat{y}_j = x_j \hat{\beta}_j = \hat{\beta}_{0j} + \hat{\beta}_{1j} x_j + \hat{\beta}_{2j} x_j^2 + \dots + \hat{\beta}_{dj} x_j^d$
where $x_j$ is the j-th row of the X matrix. Note that a separate regression on all the memory points has to be carried out for every query point, i.e., the coefficients have to be re-estimated for every $x_j$ (though they are used to estimate $\hat{y}_j$ only for the j-th point). This makes local polynomial regression even more computationally intensive than simple kernel regression for sizeable memory and query sets.

Authors generally agree that for the majority of cases, a first-order fit (local linear regression, d=1) is an adequate choice. Local linear regression balances computational ease with the flexibility to reproduce patterns that exist in the data. Nonetheless, it may fail to capture sharp curvature if present in the data; in such cases, local quadratic regression (d=2) may be needed to provide an adequate fit. Most authors agree there is usually no need for polynomials of order d>2.
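To make this concrete, here is a minimal Python sketch of the local polynomial fit of equations 18 and 19. A Gaussian kernel is assumed for the weights; the function name local_poly_fit, the bandwidth value and the example data are illustrative choices, not part of the original note.

```python
import numpy as np

def local_poly_fit(x_mem, y_mem, x_query, h=1.0, d=1):
    """Local polynomial fit at one query point (cf. eq. 18-19).

    x_mem, y_mem : 1-D arrays of memory (training) points
    x_query      : the query point x_j
    h            : kernel bandwidth (a Gaussian kernel is assumed here)
    d            : degree of the local polynomial (d=1 gives local linear regression)
    """
    x_mem = np.asarray(x_mem, float)
    # Design matrix of eq. 18: columns 1, x, x^2, ..., x^d
    X = np.vander(x_mem, N=d + 1, increasing=True)
    # Diagonal weight matrix W_j from the kernel, centred on the query point
    w = np.exp(-0.5 * ((x_mem - x_query) / h) ** 2)
    W = np.diag(w)
    # Weighted least squares coefficients: (X'W_j X)^{-1} X'W_j y
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ np.asarray(y_mem, float))
    # Eq. 19: evaluate the fitted local polynomial at the query point itself
    x_row = x_query ** np.arange(d + 1)
    return x_row @ beta

# One separate fit is carried out for every query point, as noted above
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 60))
y = np.sin(x) + 0.1 * rng.standard_normal(60)
y_hat = np.array([local_poly_fit(x, y, xq, h=0.8, d=1) for xq in x])
```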

4. Multivariate Kernel Regression


When there are multiple explanatory variables (k>1), the basic principles of kernel regression remain the same, but their implementation becomes more complex. Our independent variable data will now look like a matrix⁷ X, where $x_{ki}$ is the i-th value of the k-th variable $X_k$:

$X = \begin{bmatrix} x_{11} & x_{21} & \cdots & x_{k1} \\ \vdots & \vdots & & \vdots \\ x_{1m} & x_{2m} & \cdots & x_{km} \end{bmatrix}$

The data in memory will now take the form of pairs of vectors of values of the independent and dependent variables, $(X_1, y_1), (X_2, y_2), \dots, (X_m, y_m)$, where $X_i$ is the i-th independent variable observation vector⁸, $X_i = [x_{1i} \; x_{2i} \; \cdots \; x_{ki}]'$

⁷ Matrices and vectors are shown in bold type.
⁸ It is more convenient for later work to express this as a column vector, hence the transpose operator.
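Purely as an illustration of this data layout (the array sizes and values below are hypothetical, not from the note), the memory set can be held as an m x k array together with a vector of dependent values:

```python
import numpy as np

# Illustrative layout of the multivariate data in memory: the matrix X has
# one row per observation X_i = [x_1i, x_2i, ..., x_ki]' and one column per
# independent variable; y holds the matching dependent-variable values.
m, k = 100, 3                        # illustrative sizes, not from the note
rng = np.random.default_rng(1)
X = rng.standard_normal((m, k))      # memory matrix, shape (m, k)
y = rng.standard_normal(m)           # one dependent value per observation

X_1, y_1 = X[0], y[0]                # the first memory pair (X_1, y_1)
```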



As in the univariate case, the regression function, m(X), expresses the expected value of Y, conditional on X, as a function of X:

20. $Y = m(X) + \varepsilon$
21. $E(Y \mid X) = m(X)$
22. $E(\varepsilon) = 0$

If we now have a query or test point

$X_0 = [x_{10} \; x_{20} \; \cdots \; x_{k0}]'$

the kernel function, which gives the weights for kernel regression, will now be a function of the vector of distances
$X_j - X_0 = [x_{1j} - x_{10} \;\; x_{2j} - x_{20} \;\; \cdots \;\; x_{kj} - x_{k0}]'$

There are different ways to define the kernel function in the multivariate case. The simplest approach is to define the kernel function as a function of a single scalar measure of the distance between the points $X_j$ and $X_0$, e.g., the Euclidean distance:

$D(X_j, X_0) = \lVert X_j - X_0 \rVert$, where $\lVert X_j - X_0 \rVert = \sqrt{(x_{1j} - x_{10})^2 + (x_{2j} - x_{20})^2 + \dots + (x_{kj} - x_{k0})^2}$
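A direct transcription of this distance into Python might look as follows (the function name is illustrative):

```python
import numpy as np

def euclidean_distance(X_j, X_0):
    """D(X_j, X_0): the Euclidean distance between two observation vectors."""
    diff = np.asarray(X_j, float) - np.asarray(X_0, float)
    return np.sqrt(np.sum(diff ** 2))

# Example: distance between a memory point and a query point
print(euclidean_distance([1.0, 2.0, 3.0], [0.5, 2.5, 2.0]))
```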

The weight associated with observation $y_j$, for prediction at the point $X_0$, is then given by

23. $w_{j0} = \dfrac{K\big(D(X_j, X_0)/h\big)}{\sum_{i=1}^{m} K\big(D(X_i, X_0)/h\big)}$

Thus, in this case, the kernel function is effectively a function of a single variable, the distance, and all the univariate methods can now be used for further computations, e.g., to find the (unique) optimal bandwidth, etc.
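The sketch below computes these weights, assuming a Gaussian kernel K(u) = exp(-u²/2) and assuming the prediction at $X_0$ is the usual kernel-weighted average of the stored $y_j$ values; the names kernel_weights, kernel_predict and the small example data are illustrative, not from the note.

```python
import numpy as np

def kernel_weights(X_mem, X_0, h=1.0):
    """Normalised weights of eq. 23, using Euclidean distances and an
    (assumed) Gaussian kernel K(u) = exp(-u^2 / 2)."""
    D = np.sqrt(np.sum((np.asarray(X_mem, float) - np.asarray(X_0, float)) ** 2, axis=1))
    K = np.exp(-0.5 * (D / h) ** 2)       # kernel evaluated at the scaled distances
    return K / K.sum()                    # weights sum to 1 over the m memory points

def kernel_predict(X_mem, y_mem, X_0, h=1.0):
    """Prediction at X_0 as the weighted average of the stored y_j values."""
    return kernel_weights(X_mem, X_0, h) @ np.asarray(y_mem, float)

# Example usage with a small illustrative memory set
X_mem = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
y_mem = np.array([10.0, 12.0, 9.0])
print(kernel_predict(X_mem, y_mem, X_0=np.array([2.0, 2.0]), h=1.0))
```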
More generally, the multivariate kernel function is defined as:

24. $f(X_j, X_0) = K\big(H^{-1}(X_j - X_0)\big)$

where H is a $k \times k$ matrix of kernel widths, the bandwidth matrix. Thus in a fully general multivariate method, there are $k^2$ bandwidths to select. For practical purposes, a diagonal matrix is commonly used:
25. $H = \mathrm{diag}(h_1, h_2, \dots, h_k) = \begin{bmatrix} h_1 & & 0 \\ & \ddots & \\ 0 & & h_k \end{bmatrix}$


So in that case, there are only k bandwidths to be chosen, one for each dimension or independent variable. And the kernel function is now:

26. $f(X_j, X_0) = K\!\left(\dfrac{x_{1j} - x_{10}}{h_1}, \dfrac{x_{2j} - x_{20}}{h_2}, \dots, \dfrac{x_{kj} - x_{k0}}{h_k}\right)$

For example, the Gaussian kernel function for the multivariate case is:

27. $f(X_j, X_0) = (2\pi)^{-k/2} \exp\!\left(-\tfrac{1}{2}(X_j - X_0)'\, H^{-1} (X_j - X_0)\right)$
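As a hedged illustration of equations 25-27 with a diagonal bandwidth matrix (conventions for exactly where H enters the exponent vary between texts; here each coordinate difference is simply scaled by its own bandwidth, as in equation 26, with the normalising constant of equation 27; the function name and example values are illustrative):

```python
import numpy as np

def gaussian_kernel_diag(X_j, X_0, h):
    """Multivariate Gaussian kernel with a diagonal bandwidth matrix
    H = diag(h_1, ..., h_k): each coordinate difference is scaled by its
    own bandwidth before the Gaussian form is applied."""
    u = (np.asarray(X_j, float) - np.asarray(X_0, float)) / np.asarray(h, float)
    k = u.size
    return (2.0 * np.pi) ** (-k / 2) * np.exp(-0.5 * np.dot(u, u))

# Example: three variables, each with its own bandwidth
print(gaussian_kernel_diag([1.0, 2.0, 3.0], [0.5, 2.5, 2.0], h=[0.5, 1.0, 2.0]))
```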

Of course, as one might expect, the computation involved in estimating a kernel regression model increases several fold when multiple variables are involved. In practice, this sets an upper limit on the number of variables one can include.

5. Model selection
As mentioned earlier, choosing the bandwidth so that a good compromise between over- and under-smoothing is achieved is the crucial problem in kernel regression. In addition, one also has to decide upon the kernel function and the degree of the polynomial. The kernel function, the degree of the polynomial and the bandwidth may be said to make up the kernel regression model. The aim is, given the data we have now, to find a model which would enable us to make the most accurate and reliable predictions on new data, i.e., data outside the current dataset. In other words, we want a model with the best generalization performance. Here we shall review approaches to selecting a good kernel regression model.
Here it might be useful to mention briefly the concept of regression error statistics, which are summary measures of the deviation of regression predictions from actual values.

Thus, if for a query point $(x_j, y_j)$ the estimated value of $y_j$ from kernel regression is $\hat{y}_j$, the prediction error is

$e_j = y_j - \hat{y}_j$

Thus, if we have a set of observations $(x_j, y_j)$, $j = 1, 2, \dots, n$, for which kernel regression predictions $\hat{y}_j$ have been computed, we have a vector of prediction errors

$\mathbf{e} = [e_1 \; e_2 \; \dots \; e_n]$

We can then calculate various overall measures of the accuracy of prediction, or error statistics, e.g.:

1) The most basic and widely used statistic is the Mean Squared Error (MSE), defined as the mean (average) of the squared prediction errors, i.e., $\mathrm{MSE} = \frac{1}{n}\sum_{t} e_t^2$
2) The Root Mean Squared Error (RMSE) is the square root of the MSE

3) Sometimes one wants a measure of the percent error, e.g., the Mean (or Median) Absolute Percent Error. The Mean Absolute Percent Error (MAPE) is defined as the mean of the absolute percentage errors: $\mathrm{MAPE} = \frac{1}{n}\sum_{t}\left|\frac{e_t}{y_t}\right| \times 100$ (a short computational sketch of these error statistics follows below).
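A minimal sketch of these error statistics in Python (the function name and example numbers are illustrative):

```python
import numpy as np

def error_statistics(y, y_hat):
    """MSE, RMSE and MAPE for a set of predictions, as defined above."""
    y = np.asarray(y, float)
    e = y - np.asarray(y_hat, float)             # prediction errors e_j = y_j - y_hat_j
    mse = np.mean(e ** 2)                        # Mean Squared Error
    rmse = np.sqrt(mse)                          # Root Mean Squared Error
    mape = np.mean(np.abs(e / y)) * 100.0        # Mean Absolute Percent Error
    return mse, rmse, mape

# Example usage with illustrative numbers
print(error_statistics([10.0, 12.0, 9.0], [9.5, 12.5, 8.0]))
```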

