
Assignment 7 Regression

Harrison Zheng December 31, 2013

Introduction
Generally speaking, it is desirable to be able to model natural phenomena with a mathematical equation or function. However, developing these models is pretty tricky. They must be derived from experimental results, which leads to two main issues. First off, experimental data is made up of finite data points and not a continuous function. Second, experimental data has error, both from signal noise and other experimental error that causes readings to be inaccurate, and from the fact that theoretical models cannot perfectly model the physical world. Thus it is often necessary to determine a function of best fit for a given set of discrete data points.

Methods
Linear Least Squares Regression
The simplest way to model the behavior of a set of data is linear regression, also known as the line of best fit. Linear regression tries to find a function $f(x) = ax + b$ that can generalize the behavior of a set of data. The most commonly used algorithm for this is least squares regression. Least squares regression attempts to minimize $\sum r_i^2$, where $r_i$ is the residual term defined as $r_i = y_i - f(x_i)$ and $f(x)$ is the function (in this case a line) of best fit. Coincidentally, this method and its algorithm are taught to all magnet students in freshman Pre-Calc, so this will essentially be a rehash of the method. You basically end up setting up the linear system

$$a \sum x_i^2 + b \sum x_i = \sum x_i y_i$$

$$a \sum x_i + b n = \sum y_i$$

where $n$ is the number of data points. The formulas for $a$ and $b$ can then be determined using Cramer's rule if you're into that sort of stuff, or you could just use Gauss-Jordan elimination, but I digress. The general upshot is that you now solve that system to find $a$ and $b$ and thus a line of best fit.
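For reference, solving the linear system for $a$ and $b$ with Cramer's rule gives the closed forms sketched below. The sums run over all $n$ data points, and $D$ is just shorthand introduced here for the shared denominator; the same expressions show up in the linReg method in the appendix.

```latex
\begin{aligned}
D &= n \sum x_i^2 - \Bigl(\sum x_i\Bigr)^2, \\
a &= \frac{n \sum x_i y_i - \sum x_i \sum y_i}{D}, \\
b &= \frac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{D}.
\end{aligned}
```

Note that $D = 0$ only when every $x_i$ is the same, in which case no unique line of best fit exists.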

Power Regression
Unfortunately most things in life aren't linear. However, for interpolating a finite data set you can get a pretty decent approximation with a monomial (after all that's more or less what a Taylor series does). Now we need to find a method to minimize $\sum r_i^2$ for $f(x) = bx^a$. So it turns out if you take the natural log of both sides (I'm subbing in $y$ for $f(x)$ but just bear with me here) and separate the terms out (go log rules!) you get the equation $\ln y = \ln x^a + \ln b$, which when simplified to $\ln y = a \ln x + \ln b$ looks awfully linear. Thus we can just adopt the linear regression algorithm for power regression (though we do have to take the natural log of all the data points).
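Spelled out, the substitution looks roughly like the sketch below. The capital letters $X_i$, $Y_i$ and $B$ are names introduced here purely for illustration, and the transform assumes every $x_i$ and $y_i$ is positive so the logs exist.

```latex
\begin{aligned}
X_i &= \ln x_i, \qquad Y_i = \ln y_i, \qquad B = \ln b, \\
Y_i &\approx a X_i + B
  \quad \text{(ordinary linear least squares in } X, Y\text{)}, \\
b &= e^{B}
  \quad \text{(undo the log to recover the coefficient)}.
\end{aligned}
```

One caveat worth keeping in mind: this minimizes the squared residuals of $\ln y$, not of $y$ itself, so the resulting curve is the best fit in log space rather than in the original units.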


Logarithmic Regression

It also turns out that many natural phenomena can be modeled with logarithmic functions. Logarithmic functions, which take the form $y = a \ln x + b$, are already linear in $\ln x$, so they can use the normal least squares regression formula, albeit with the natural log taken of the x values.
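As a sketch of what that means in practice, replacing every $x_i$ in the linear least squares system with $\ln x_i$ gives the system below. The $y$ values are left untouched, and every $x_i$ is assumed to be positive.

```latex
\begin{aligned}
a \sum (\ln x_i)^2 + b \sum \ln x_i &= \sum y_i \ln x_i, \\
a \sum \ln x_i + b\,n &= \sum y_i.
\end{aligned}
```

Since $y$ is not transformed here, this fit does minimize the squared residuals in the original $y$ units.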

Data and Results


Data
So all results displayed are to 3 decimal places of precision. Also the initial dataset given is as follows.

Table 1: Data
X   1   2   3   4   5
Y   1   4   5   6   6

Linear Regression
Figure 1: Linear Regression Line through the data. Line is y = 1.2x + 0.8

So when linear regression was run on the data, the result was a line with the equation $y = 1.2x + 0.8$. Now overall, the line does a poor job of fitting the data. To be fair, the data doesn't seem very linear and thus we shouldn't really be expecting a good fit. However, the line seems to be okay for $x \in [2, 4]$ since in this range the data seems to have a quasi-linear trend going on. In conclusion the line seems to be a poor fit, though it could be useful for interpolating points for $x \in [2, 4]$.

Power Regression
Figure 2: Power Regression Line through the data. Line is $y = 1.285x^{1.112}$

So it turns out that the power regression result is actually pretty similar to the linear regression result. This shouldn't be too surprising since the power regression curve that was obtained is $y = 1.285x^{1.112}$, which is awfully close to being linear. However, it does have better end behavior than the linear regression. The power regression curve does start out at (0, 0), which for this model (a chemical reaction) seems more realistic than starting at a y-intercept. What's pretty interesting though is that the concavity seems to be wrong. The curve has upwards concavity, while the data has downwards concavity. In the future, it might be worthwhile to pursue full polynomial regression so that issues like inconsistent concavity can be fixed.
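The concavity claim can be checked with a quick second derivative. This is a worked sketch using the rounded coefficients reported for the power fit, with $a$ as the exponent and $b$ as the leading coefficient of $f(x) = bx^a$.

```latex
\begin{aligned}
f(x) &= b x^{a}, \qquad f''(x) = a(a-1)\,b\,x^{a-2}, \\
f''(x) &\approx 1.112 \cdot 0.112 \cdot 1.285 \; x^{-0.888}
        \approx 0.160\, x^{-0.888} > 0 \quad \text{for } x > 0.
\end{aligned}
```

Any fitted exponent above 1 forces upwards concavity, so a monomial could only match the downwards-bending data with an exponent between 0 and 1.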

Logarithmic Regression
Figure 3: Logarithmic Regression Line through the data. Line is y = 3.203 ln x + 1.333

Logarithmic regression returns a result of $y = 3.203 \ln x + 1.333$ and actually does a pretty good job of matching the data for the domain that the data is given in ($x \in [1, 5]$). This seems to be because logarithmic regression is the only type of regression used that can provide a trend line with downwards concavity. However, at the extremes the logarithmic model has serious deficiencies. As $x \to \infty$ the model suggests that $y \to \infty$, when in reality the data seems to be displaying asymptotic behavior where $y \to c$ and $c \approx 6.0$. Also it seems to suggest the process starts at $-\infty$. Overall, while it is a poor model to extrapolate data, it seems to be an excellent model to interpolate data.
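One way to put numbers on how well each curve fits is to compute the sum of squared residuals for each reported model. The sketch below is an illustration and not part of the original submission: the class name FitComparison and the method sumSquaredResiduals are made up for this example, and the coefficients are the rounded values from Figures 1 to 3.

```java
import java.util.function.DoubleUnaryOperator;

public class FitComparison {
    // Dataset from Table 1, stored as {x, y} pairs.
    static final double[][] DATA = {{1, 1}, {2, 4}, {3, 5}, {4, 6}, {5, 6}};

    // Sum of squared residuals r_i = y_i - f(x_i) over the dataset.
    static double sumSquaredResiduals(double[][] data, DoubleUnaryOperator f) {
        double total = 0;
        for (double[] point : data) {
            double r = point[1] - f.applyAsDouble(point[0]);
            total += r * r;
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumption: rounded coefficients as reported in the figure captions.
        double linear = sumSquaredResiduals(DATA, x -> 1.2 * x + 0.8);
        double power = sumSquaredResiduals(DATA, x -> 1.285 * Math.pow(x, 1.112));
        double log = sumSquaredResiduals(DATA, x -> 3.203 * Math.log(x) + 1.333);

        System.out.printf("Linear: %.3f%n", linear);
        System.out.printf("Power:  %.3f%n", power);
        System.out.printf("Log:    %.3f%n", log);
    }
}
```

A smaller sum means a tighter fit over the five data points. Keep in mind that the power curve was fit in log space, so its residuals in the original y units are not guaranteed to beat the linear fit's.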

Appendix
Source Code
Listing 1: Source Code

import java.lang.Math;

/**
 * Harrison Zheng
 * 12/16/13
 * Assignment 7: Regression Analysis
 */
public class reg {
    // yeah I hardcoded the dataset since it's pretty small and this will make the code
    // I'd also like to take the time to make a soapbox and complain about how sparse th
    public static double[][] data = {{1, 1}, {2, 4}, {3, 5}, {4, 6}, {5, 6}};

    /**
     * Main method, calls the other methods, nothing too new here
     */
    public static void main(String[] args) {
        double[] lin = linReg(data);
        double[] pow = powReg(data);
        double[] log = logReg(data);
        System.out.printf("Linear Regression Line: y = %.3fx + %.3f\n", lin[0], lin[1]);
        System.out.printf("Power Regression Line: y = %.3fx^(%.3f)\n", pow[1], pow[0]);
        System.out.printf("Log Regression Line: y = %.3f ln(x) + %.3f\n", log[0], log[1]);
    }

    /**
     * Does linear regression on an x,y formatted dataset
     * Solves for ax+b and returns an array w/ elements {a, b}
     */
    public static double[] linReg(double[][] input) {
        double[] line = new double[2];
        int n = input.length;
        double sigy = 0;
        double sigx = 0;
        double sigxy = 0;
        double sigxs = 0;
        double tx, ty;
        for (int i = 0; i < n; i++) {
            tx = input[i][0];
            ty = input[i][1];
            sigy += ty;
            sigx += tx;
            sigxy += ty * tx;
            sigxs += tx * tx;
        }
        line[0] = (n * sigxy - sigx * sigy) / (n * sigxs - sigx * sigx);
        line[1] = (sigxs * sigy - sigx * sigxy) / (n * sigxs - sigx * sigx);
        return line;
    }

    /**
     * Does power regression
     * Solves y=ax^n and returns the array with elements {n, a}
     */
    public static double[] powReg(double[][] input) {
        double[][] logIn = new double[input.length][2];
        double[] pow = new double[2];
        for (int i = 0; i < input.length; i++) {
            logIn[i][0] = Math.log(input[i][0]);
            logIn[i][1] = Math.log(input[i][1]);
        }
        pow = linReg(logIn);
        // linReg returns ln(a) as the intercept, so undo the log
        pow[1] = Math.exp(pow[1]);
        return pow;
    }

    /**
     * Does Log Regression
     * Finds optimal y=a*ln(x)+b and returns the array w/ elements {a, b}
     */
    public static double[] logReg(double[][] input) {
        double[][] logIn = new double[input.length][2];
        double[] log = new double[2];
        for (int i = 0; i < input.length; i++) {
            logIn[i][0] = Math.log(input[i][0]);
            logIn[i][1] = input[i][1];
        }
        log = linReg(logIn);
        return log;
    }
}