
Simple Regression

The simplest regression models involve a single response variable Y and a single predictor variable X.
STATGRAPHICS will fit a variety of functional forms, listing the models in decreasing order of R-squared. If
outliers are suspected, resistant methods can be used to fit the models instead of least squares.
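
To make the idea concrete outside of STATGRAPHICS, here is a minimal Python sketch with invented data, contrasting an ordinary least squares line with a resistant Theil-Sen fit of the kind one might use when outliers are suspected:

```python
# Simple regression on made-up data: least squares vs. a resistant fit.
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 30.0])  # last point is an outlier

# Ordinary least squares: pulled strongly toward the outlier
ols = stats.linregress(x, y)
print(f"OLS:       y = {ols.intercept:.2f} + {ols.slope:.2f} x, R-squared = {ols.rvalue**2:.3f}")

# Theil-Sen: median of pairwise slopes, resistant to the outlier
slope, intercept, _, _ = stats.theilslopes(y, x)
print(f"Theil-Sen: y = {intercept:.2f} + {slope:.2f} x")
```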
Box-Cox Transformations
When the response variable does not follow a normal distribution, it is sometimes possible to use the
methods of Box and Cox to find a transformation that improves the fit. Their transformations are based on
powers of Y. STATGRAPHICS will automatically determine the optimal power and fit an appropriate model.
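
As a sketch of the underlying idea (using SciPy rather than STATGRAPHICS, with simulated data), the optimal power can be estimated by maximum likelihood and the model then fit to the transformed response:

```python
# Box-Cox: find the power of Y that makes a skewed response most nearly normal.
import numpy as np
from scipy import stats

y = np.random.default_rng(0).lognormal(mean=1.0, sigma=0.8, size=200)  # positive, skewed

y_transformed, lam = stats.boxcox(y)  # maximum-likelihood estimate of the power
print(f"optimal Box-Cox power: {lam:.3f}")
# A regression model would then be fit to y_transformed rather than y.
```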
Polynomial Regression
Another approach to fitting a nonlinear equation is to consider polynomial functions of X. For interpolative
purposes, polynomials have the attractive property of being able to approximate many kinds of functions.
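
For illustration, a polynomial of degree 3 remains linear in its coefficients, so ordinary least squares still applies; a brief NumPy sketch with simulated data:

```python
# Polynomial regression: fit a cubic in X by least squares.
import numpy as np

x = np.linspace(0, 4, 30)
y = np.sin(x) + np.random.default_rng(1).normal(scale=0.1, size=x.size)

coeffs = np.polyfit(x, y, deg=3)   # degree-3 least squares fit
y_hat = np.polyval(coeffs, x)      # fitted values for interpolation
print("cubic coefficients (highest power first):", np.round(coeffs, 3))
```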
Calibration Models
In a typical calibration problem, a number of known samples are measured and an equation is fit relating
the measurements to the reference values. The fitted equation is then used to predict the value of an
unknown sample by generating an inverse prediction (predicting X from Y) after measuring the sample.
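
A minimal sketch of inverse prediction (all numbers hypothetical): fit the measured response Y against the known reference values X, then solve the fitted line for X at a new measurement:

```python
# Calibration: fit Y = a + b*X on known standards, then invert for an unknown.
import numpy as np
from scipy import stats

reference = np.array([0.0, 5.0, 10.0, 15.0, 20.0])  # known standards (X)
measured = np.array([0.11, 5.2, 9.9, 15.3, 19.8])   # instrument readings (Y)

fit = stats.linregress(reference, measured)

new_reading = 12.4                                   # measurement on an unknown sample
x_estimate = (new_reading - fit.intercept) / fit.slope
print(f"estimated reference value: {x_estimate:.2f}")
```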
Multiple Regression
The Multiple Regression procedure fits a model relating a response variable Y to multiple predictor
variables X1, X2, ... . The user may include all predictor variables in the fit or ask the program to use a
stepwise regression to select a subset containing only significant predictors. At the same time, the Box-Cox
method can be used to deal with non-normality and the Cochrane-Orcutt procedure to deal with
autocorrelated residuals.
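
As a rough parallel in Python (simulated data; the stepwise and Cochrane-Orcutt options described above are not shown), a multiple regression with two predictors might look like:

```python
# Multiple regression: fit Y on X1 and X2 plus an intercept.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))  # two predictors X1, X2
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()
print(model.params)   # estimated intercept and slopes
print(model.pvalues)  # significance of each predictor
```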
Comparison of Regression Lines
In some situations, it is necessary to compare several regression lines. STATGRAPHICS will fit parallel or
non-parallel linear regressions for each level of a "BY" variable and perform statistical tests to determine
whether the intercepts and/or slopes of the lines are significantly different.
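
One common way to carry out such a comparison, sketched here in Python with invented data, is to fit a single model with a group-by-slope interaction: a significant interaction coefficient indicates unequal slopes, and a significant group coefficient indicates unequal intercepts:

```python
# Comparing regression lines via an interaction model.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "x":     [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "y":     [2.1, 4.0, 6.2, 7.9, 10.1, 1.2, 1.9, 3.1, 4.2, 4.8],
    "group": ["A"] * 5 + ["B"] * 5,
})

fit = smf.ols("y ~ x * group", data=df).fit()  # separate intercepts and slopes
print(fit.summary().tables[1])                 # tests on group and x:group terms
```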
Regression Model Selection
If the number of predictors is not excessive, it is possible to fit regression models involving all
combinations of 1 predictor, 2 predictors, 3 predictors, etc., and sort the models according to a goodness-of-fit statistic. In STATGRAPHICS, the Regression Model Selection procedure implements such a scheme, selecting the models which give the best values of the adjusted R-squared or of Mallows' Cp statistic.
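
The scheme is simple enough to sketch directly (simulated data; this is an illustration, not the STATGRAPHICS implementation): fit every subset of predictors and rank the fits by adjusted R-squared:

```python
# All-subsets regression ranked by adjusted R-squared.
from itertools import combinations
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 4))  # four candidate predictors
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=80)

results = []
for k in range(1, X.shape[1] + 1):
    for subset in combinations(range(X.shape[1]), k):
        fit = sm.OLS(y, sm.add_constant(X[:, list(subset)])).fit()
        results.append((fit.rsquared_adj, subset))

best_adj_r2, best_subset = max(results)
print(f"best subset: columns {list(best_subset)}, adjusted R-squared = {best_adj_r2:.3f}")
```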
Ridge Regression
When the predictor variables are highly correlated amongst themselves, the coefficients of the resulting
least squares fit may be very imprecise. By allowing a small amount of bias in the estimates, more
reasonable coefficients may often be obtained. Ridge regression is one method to address these issues.
Often, small amounts of bias lead to dramatic reductions in the variance of the estimated model
coefficients.
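
The mechanism is visible in the closed form of the estimator: adding a small constant k to the diagonal of X'X before inverting trades a little bias for much lower variance. A sketch with deliberately collinear, invented data:

```python
# Ridge regression: beta = (X'X + k*I)^(-1) X'y stabilizes collinear fits.
import numpy as np

rng = np.random.default_rng(4)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)  # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(scale=0.1, size=50)

def ridge(X, y, k):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

print("k = 0  :", np.round(ridge(X, y, 0.0), 2))  # wildly imprecise coefficients
print("k = 0.1:", np.round(ridge(X, y, 0.1), 2))  # close to the sensible (1, 1)
```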
Nonlinear Regression
Most least squares regression programs are designed to fit models that are linear in the coefficients. When
the analyst wishes to fit an intrinsically nonlinear model, a numerical procedure must be used. The
STATGRAPHICS Nonlinear Least Squares procedure uses an algorithm due to Marquardt to fit any function
entered by the user.
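
For comparison, SciPy's curve_fit performs the same kind of iterative fit (its default algorithm for unconstrained problems is the Levenberg-Marquardt method); the model and data below are invented for illustration:

```python
# Nonlinear least squares for a model that is nonlinear in its coefficients.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(b * x)  # intrinsically nonlinear in b

x = np.linspace(0, 2, 40)
y = model(x, 2.0, 1.3) + np.random.default_rng(5).normal(scale=0.1, size=x.size)

params, cov = curve_fit(model, x, y, p0=[1.0, 1.0])  # requires starting values
print("estimates:", np.round(params, 3))
```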
Regression Analysis for Counts
For response variables that are counts, STATGRAPHICS provides two procedures: a Poisson Regression and
a Negative Binomial Regression. Each fits a loglinear model involving both quantitative and categorical
predictors.
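
A minimal loglinear sketch (counts simulated for illustration, using statsmodels rather than STATGRAPHICS): the log link makes the expected count multiplicative in the predictors:

```python
# Poisson regression: a loglinear model for count data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(0, 2, size=200)
counts = rng.poisson(np.exp(0.5 + 0.8 * x))  # true log-rate: 0.5 + 0.8x

fit = sm.GLM(counts, sm.add_constant(x),
             family=sm.families.Poisson()).fit()  # log link by default
print(np.round(fit.params, 3))                    # close to (0.5, 0.8)
```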
Regression Analysis for Proportions
When the response variable is a proportion or a binary value (0 or 1), standard regression techniques must
be modified. STATGRAPHICS provides two important procedures for this situation: Logistic Regression and
Probit Analysis. Both methods yield a prediction equation that is constrained to lie between 0 and 1.
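
A short sketch of both fits on the same simulated binary data (again an illustration, not the STATGRAPHICS procedures themselves):

```python
# Logistic vs. probit regression on a binary response.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=300)
p = 1.0 / (1.0 + np.exp(-(0.3 + 1.2 * x)))  # true logistic probabilities
y = rng.binomial(1, p)

X = sm.add_constant(x)
logit = sm.Logit(y, X).fit(disp=0)
probit = sm.Probit(y, X).fit(disp=0)
print("logit coefficients: ", np.round(logit.params, 2))
print("probit coefficients:", np.round(probit.params, 2))
```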
Life Data Regression
To describe the impact of external variables on failure times, regression models may be fit. Unfortunately,
standard least squares techniques do not work well for two reasons: the data are often censored, and the
failure time distribution is rarely Gaussian. For this reason, STATGRAPHICS provides a special procedure that will fit life data regression models with censoring, assuming either an exponential, extreme value, logistic, loglogistic, lognormal, normal or Weibull distribution.
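
As a hedged illustration of the same kind of model in Python, the third-party lifelines package fits a Weibull accelerated-failure-time regression with right censoring (the column names and data below are hypothetical):

```python
# Weibull life-data regression with right-censored observations (lifelines).
import pandas as pd
from lifelines import WeibullAFTFitter

df = pd.DataFrame({
    "time":   [10, 15, 22, 30, 44, 60, 60, 71],  # failure or censoring time
    "event":  [1, 1, 1, 1, 1, 0, 0, 1],          # 0 = censored (still running)
    "stress": [3.0, 3.0, 2.5, 2.5, 2.0, 2.0, 1.5, 1.5],
})

aft = WeibullAFTFitter()
aft.fit(df, duration_col="time", event_col="event")  # stress enters as a covariate
aft.print_summary()
```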

REGRESSION ANALYSIS

In statistical modeling, regression analysis is a statistical process for estimating the relationships
among variables. It includes many techniques for modeling and analyzing several variables, when the
focus is on the relationship between a dependent variable and one or more independent variables (or
'predictors'). More specifically, regression analysis helps one understand how the typical value of the
dependent variable (or 'criterion variable') changes when any one of the independent variables is
varied, while the other independent variables are held fixed. Most commonly, regression analysis
estimates the conditional expectation of the dependent variable given the independent variables; that is, the average value of the dependent variable when the independent variables are fixed. Less
commonly, the focus is on a quantile, or other location parameter of the conditional distribution of the
dependent variable given the independent variables. In all cases, the estimation target is
a function of the independent variables called the regression function. In regression analysis, it is
also of interest to characterize the variation of the dependent variable around the regression function, which can be described by a probability distribution. A related but distinct approach is necessary
condition analysis[1] (NCA), which estimates the maximum (rather than average) value of the
dependent variable for a given value of the independent variable (ceiling line rather than central line)
in order to identify what value of the independent variable is necessary but not sufficient for a given
value of the dependent variable.
Regression analysis is widely used for prediction and forecasting, where its use has substantial
overlap with the field of machine learning. Regression analysis is also used to understand which
among the independent variables are related to the dependent variable, and to explore the forms of
these relationships. In restricted circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables. However, this can lead to illusory or spurious relationships, so caution is advisable; for example, correlation does not imply causation. Many
techniques for carrying out regression analysis have been developed. Familiar methods such as linear
regression and ordinary least squares regression are parametric, in that the regression function is
defined in terms of a finite number of unknown parameters that are estimated from
the data. Nonparametric regression refers to techniques that allow the regression function to lie in a
specified set of functions, which may be infinite-dimensional.
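
The distinction can be seen side by side in a short sketch (simulated data): a straight line fixes the form of the regression function with two parameters, while a LOWESS smoother lets the data determine the shape locally:

```python
# Parametric (linear) vs. nonparametric (LOWESS) regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = np.sort(rng.uniform(0, 6, size=150))
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

# Parametric: two unknown parameters (intercept, slope)
line = sm.OLS(y, sm.add_constant(x)).fit()

# Nonparametric: no fixed functional form, only a smoothness setting
smooth = sm.nonparametric.lowess(y, x, frac=0.3)  # array of (x, fitted) pairs
print("linear R-squared:", round(line.rsquared, 3))
```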
The performance of regression analysis methods in practice depends on the form of the data-generating process and how it relates to the regression approach being used. Since the true form of
the data-generating process is generally not known, regression analysis often depends to some
extent on making assumptions about this process. These assumptions are sometimes testable if a
sufficient quantity of data is available. Regression models for prediction are often useful even when
the assumptions are moderately violated, although they may not perform optimally. However, in
many applications, especially with small effects or questions of causality based on observational data,
regression methods can give misleading results.

In a narrower sense, regression may refer specifically to the estimation of continuous response
variables, as opposed to the discrete response variables used in classification. The case of a
continuous output variable may be more specifically referred to as metric regression to distinguish
it from related problems.
