
Quality Innovation Report I

The purpose of this report is to deepen understanding of how to write up a
statistical analysis using the JMP program. The data used for the analysis are
taken from American Bureau Statistic. The Residential Energy Consumption Survey
(RECS) is a national area-probability sample survey that collects energy-related
data for occupied primary housing units. The survey collected data from 4,381
households in housing units statistically selected to represent the 111.1 million
housing units in the United States. Data were obtained from residential energy
suppliers for each unit in the sample to produce the Consumption and
Expenditure data.

The purpose of the report is to analyze the results that JMP produces for this
data set.

The data analysis is carried out by selecting an analysis model that suits the
characteristics of the data. These data contain dependent and independent
variables, so a model that states the relation between the two kinds of
variable is chosen: Multiple Regression with Stepwise selection and Leverage
Plots.

Results Summary:
The results of the Stepwise Regression analysis are summarized as follows:
• An increased number of “members per household” goes with “greater energy
expenditure”, but only about 1% of the variance in the observed values is
explained by the model (R squared = 0.0076); about 99% remains unexplained in
the error term (1 - 0.0076 = 0.9924).
• A greater “number of members per household” does not mean an increase in
“Floor Space per Household”. With floor space in the model, R squared is
0.6846, so about 68% of the variance is explained and about 32% remains
unexplained (1 - 0.6846 = 0.3154). The estimated coefficient is negative,
which the report reads as a weak relation.
• Increased “energy consumption per household” goes with an increase in the
number of members per household. Adding this variable raises R squared to
0.7537, leaving about 25% of the variance unexplained (1 - 0.7537 = 0.2463),
but the added term is not significant at the 0.1 level (p = 0.1249), so the
relation is weak.
General result:
• Omitted-variable bias is found, since the regression estimates do not stay
constant from one model to the next. According to the Leverage Plot analysis,
the scatter plots contain outlying points, which suggests that another
variable affects the data. That omitted variable is the Census Region or
Division.
• Overall, the model shows a weak relationship between the variables. Only some
relations have a greater chance of being real, namely that more energy
consumption means more energy expenditure per household.

Leverage Plot

Response: Energy Expenditures2 Total U.S._(billion Dollars)

Stepwise regression settings

p-value to enter a variable   0.250
p-value to remove a variable  0.100


SSE        DFE  MSE        RSquare  RSquare Adj  Cp         AICc
3672.6187  12   306.05156  0.0000   0.0000       0.7551421  115.4606

Lock  Entered  Parameter                               Estimate    nDF  SS        "F Ratio"  "Prob>F"
X     X        Intercept                               30.933076   1    0         0.000      1
               Number of Members per Household         0           1    28.07764  0.085      0.77638
               Floorspace per Household_(Square Feet)  0           1    70.98461  0.217      0.65057

SSE        DFE  MSE        RSquare  RSquare Adj  Cp         AICc
3644.541   11   331.32191  0.0076   -0.0826      2.6652725  118.8275

Lock  Entered  Parameter                               Estimate    nDF  SS        "F Ratio"  "Prob>F"
X     X        Intercept                               2.228638    1    0         0.000      1
      X        Number of Members per Household         11.2329231  1    28.07764  0.085      0.77638
               Floorspace per Household_(Square Feet)  0           1    520.2754  1.665      0.22593

SSE        DFE  MSE        RSquare  RSquare Adj  Cp         AICc
3124.2657  10   312.42657  0.1493   -0.0208      3          121.1585

Lock  Entered  Parameter                               Estimate    nDF  SS        "F Ratio"  "Prob>F"
X     X        Intercept                               -282.48946  1    0         0.000      1
      X        Number of Members per Household         82.474758   1    477.3684  1.528      0.24467
      X        Floorspace per Household_(Square Feet)  0.04718357  1    520.2754  1.665      0.22593

Step  Parameter                               Action   "Sig Prob"  Seq SS    RSquare  Cp      p
1     Number of Members per Household         Entered  0.7764      28.07764  0.0076   2.6653  2
2     Floorspace per Household_(Square Feet)  Entered  0.2259      520.2754  0.1493   3       3

The stepwise output above uses Total Energy Expenditure per Household as the
dependent variable, with Number of Members per Household and Floor Space per
Household (square feet) as the independent variables.

Result 1:
The initial value of R squared is 0.00, which indicates no relation.
When “number of members” is included, R squared changes from 0.00 to 0.0076,
and it rises to 0.1493 when the “Floor space” variable is included and the
stepwise procedure is complete.
The estimates are the partial regression coefficients; they show that the
model is Yexp = 82.474758 (number of members) + 0.04718 (floor space)
- 282.48946.
The "standardized estimates" are the standard partial regression coefficients;
they show that “number of members per household” makes the greatest
contribution to the model. The value of this multiple regression is that it
suggests “number of members per household” is a very important variable:
every change in “number of members per household” would affect “Total Energy
Expenditure per Household”, although neither term is significant at the 0.1
level (p = 0.24467 and 0.22593). Since the estimated coefficient is positive,
the result is read as: an increased number of “members per household” means
“greater energy expenditure”.
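
The R squared and adjusted R squared values reported at each stepwise step can be recomputed from the SSE and DFE columns. The Python sketch below does this under the usual definitions, R squared = 1 - SSE/SST and adjusted R squared = 1 - (SSE/DFE)/(SST/(n - 1)). The function names are illustrative and are not part of JMP; the total sum of squares is assumed to be the SSE of the intercept-only step.

```python
# Values copied from the stepwise output (response: energy expenditure).
SST = 3672.6187   # SSE of the intercept-only step, used as total sum of squares
N_OBS = 13        # DFE of the intercept-only step is 12, so n = 13


def r_squared(sse, sst=SST):
    """Share of the total variation explained by the model."""
    return 1.0 - sse / sst


def adjusted_r_squared(sse, dfe, sst=SST, n=N_OBS):
    """R squared penalised for the number of fitted parameters."""
    return 1.0 - (sse / dfe) / (sst / (n - 1))


# (SSE, DFE) for the two later steps of the stepwise run.
steps = {
    "members only": (3644.541, 11),
    "members + floor space": (3124.2657, 10),
}

for label, (sse, dfe) in steps.items():
    print(label,
          round(r_squared(sse), 4),
          round(adjusted_r_squared(sse, dfe), 4))
```

A negative adjusted R squared, as at both steps here, indicates that the added variable explains less variation than its lost degree of freedom costs.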

The same kind of analysis can be carried out after changing the dependent
variable.


Number of Members per Household

SSE        DFE  MSE        RSquare  RSquare Adj  Cp  AICc
0.0470155  9    0.0052239  0.7887   0.7183       4   -17.6251

Lock  Entered  Parameter                                          Estimate    nDF  SS        "F Ratio"  "Prob>F"
X     X        Intercept                                          3.65853924  1    0         0.000      1
      X        Floorspace per Household_(Square Feet)             -0.0005403  1    0.16764   32.091     0.00031
      X        Energy Consumption2                                0.15138245  1    0.013862  2.654      0.13775
      X        Energy Expenditures2 Total U.S._(billion Dollars)  -0.0055979  1    0.007791  1.491      0.25304

Step  Parameter                                          Action   "Sig Prob"  Seq SS    RSquare  Cp      p
1     Floorspace per Household_(Square Feet)             Entered  0.0005      0.152343  0.6846   4.4342  2
2     Energy Consumption2                                Entered  0.1249      0.015374  0.7537   3.4913  3
3     Energy Expenditures2 Total U.S._(billion Dollars)  Entered  0.2530      0.007791  0.7887   4       4

The stepwise output above uses Number of Members per Household as the
dependent variable, with Total Energy Consumption per Household, Total Energy
Expenditure per Household and Floor Space per Household (square feet) as the
independent variables.

Result 2:
As in the previous result, the initial value of R squared is 0.00, which
indicates no relation.
When “Floor space” is included, R squared changes from 0.00 to 0.6846; it then
increases to 0.7537 and 0.7887 when the “Energy Consumption” and “Energy
Expenditure” variables are included and the stepwise procedure is complete.
The change in R squared from these last two variables is not significant this
time (p = 0.1249 and 0.2530).
The estimates show that the model is Ymem = -0.0005403 (Floor space)
+ 0.15138 (Energy Consumption) - 0.0055979 (Energy Expenditure) + 3.658539,
where Ymem is the predicted number of members per household.
It shows that “energy consumption” makes the greatest contribution to the
model.
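
The fitted equation for Number of Members per Household can be written as a small function, which makes the sign of each coefficient easy to check. This is only a sketch: the coefficients are copied from the parameter table, while the function name and the example inputs are hypothetical and are not rows of the RECS data.

```python
# Coefficients copied from the parameter table for the response
# "Number of Members per Household".
INTERCEPT = 3.65853924
B_FLOOR_SPACE = -0.0005403
B_ENERGY_CONSUMPTION = 0.15138245
B_ENERGY_EXPENDITURE = -0.0055979


def predict_members(floor_space, energy_consumption, energy_expenditure):
    """Predicted number of members per household from the fitted equation."""
    return (INTERCEPT
            + B_FLOOR_SPACE * floor_space
            + B_ENERGY_CONSUMPTION * energy_consumption
            + B_ENERGY_EXPENDITURE * energy_expenditure)


# Hypothetical inputs for illustration only (assumed, not taken from the data).
example = predict_members(floor_space=2000.0,
                          energy_consumption=1.0,
                          energy_expenditure=30.0)
print(round(example, 3))

# Raising floor space while holding the other inputs fixed lowers the
# prediction, because the floor-space coefficient is negative.
larger_home = predict_members(2200.0, 1.0, 30.0)
print(larger_home < example)
```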
Result 3:
The estimate becomes negative when “Floor space” is included, which the report
reads as a weak relation between these two variables: a greater “number of
members per household” does not mean an increase in “Floor Space per
Household”.
Since the estimate for “energy consumption” is positive, increased “energy
consumption per household” goes with an increase in the number of members per
household.
Nevertheless, the F values vary from 32.091 to 2.654 and 1.491, and the
p-values of the last two terms (0.13775 and 0.25304) are greater than 0.1,
which means there is less chance that those relationships in the model are
real.
Leverage Plot
Response: Energy Expenditures2 Total U.S._(billion Dollars)

[Figure: Actual by Predicted plot. Vertical axis: Energy Expenditures2 Total
U.S._(billion Dollars), actual; horizontal axis: the same response, predicted
(ticks from 10 to 80). P<.0001, RSquare=0.94]

RSquare                             0.941931
RSquare Adj                         0.922574
Root Mean Square Error (RMSE)       4.867876
Mean of Response (Y)                30.93308
Observations (or Sum of Weights)    13

Source              DF  Sum of Squares  Mean Square  F Ratio
Model               3   3459.3527       1153.12      48.6625
Error               9   213.2660        23.70        Prob>F
Total (corrected)   12  3672.6187                    <.0001*

Term                                    Estimate   Std Error  t Ratio  Prob>|t|
Intercept                               108.10527  75.04148   1.44     0.1836
Number of Members per Household         -25.39274  20.7934    -1.22    0.2530
Floorspace per Household_(Square Feet)  -0.021029  0.011801   -1.78    0.1084
Energy Consumption2                     20.623998  1.860764   11.08    <.0001*
[Figure: Residual by Predicted plot. Vertical axis: Energy Expenditures2 Total
U.S._(billion Dollars), residual; horizontal axis: the same response,
predicted.]
[Figure: Leverage plot for Number of Members per Household (horizontal axis
from 2.3 to 2.8).]
[Figure: Leverage plot for Floorspace per Household_(Square Feet) (horizontal
axis from 1700 to 2400).]
[Figure: Leverage plot for Energy Consumption2 (horizontal axis from 0.5 to
4.0).]
Result 4:
The F ratio (3 d.f.) is 48.6625 with p < 0.0001. So we conclude that the
population R-squared is different from zero, or that at least one variable
does have an effect on “total energy expenditure”. The variance explained in
the sample is slightly more than 94% (R-squared = 0.9419), and the estimate of
the population variance explained (shrunken or adjusted R-squared) is 0.9226.
In short, the model is very successful in providing an account of why “Total
energy expenditure” varies. The predictions of the model are also reasonably
accurate. The average prediction error (root mean squared error, RMSE) is 4.87
points over the 13 observations. Relative to the mean of the response (30.93),
the prediction error is about 4.87 / 30.93 = 15.7% (coefficient of variation)
of the average value being predicted. The report regards this level of
precision as acceptable.
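
The summary statistics quoted in Result 4 follow from the analysis-of-variance sums of squares. The Python sketch below recomputes them under the standard definitions, taking the coefficient of variation as the RMSE divided by the mean of the response (30.93308 in the summary of fit). The variable names are illustrative.

```python
import math

# Values copied from the analysis-of-variance and summary-of-fit output.
SS_MODEL, DF_MODEL = 3459.3527, 3
SS_ERROR, DF_ERROR = 213.2660, 9
SS_TOTAL, DF_TOTAL = 3672.6187, 12
MEAN_RESPONSE = 30.93308

ms_model = SS_MODEL / DF_MODEL   # mean square for the model
ms_error = SS_ERROR / DF_ERROR   # mean square for the error

f_ratio = ms_model / ms_error
r_squared = 1.0 - SS_ERROR / SS_TOTAL
adj_r_squared = 1.0 - ms_error / (SS_TOTAL / DF_TOTAL)
rmse = math.sqrt(ms_error)
cv_percent = 100.0 * rmse / MEAN_RESPONSE   # RMSE relative to the mean of Y

print("F ratio:", round(f_ratio, 4))
print("R squared:", round(r_squared, 6))
print("Adjusted R squared:", round(adj_r_squared, 6))
print("RMSE:", round(rmse, 6))
print("Coefficient of variation (%):", round(cv_percent, 2))
```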

The effect of “total energy consumption” is positive (20.624 with p-value
< 0.0001). That is, more consumption of energy results in more energy
expenditure per household. In contrast, the estimates for “number of members
per household” and “floor space per household” are negative (-25.392 and
-0.021, with t values -1.22 and -1.78), which means that a larger number of
members and larger floor space go with lower energy expenditure.

Result 5:

Given the Stepwise Regression result, the last finding, “larger floor space
and a larger number of members mean lower energy expenditure”, does not seem
logical, because larger floor space and more members should result in greater
energy expenditure.
So it can be inferred that the data show omitted-variable bias, since the
regression estimates do not stay constant from one model to the next.
According to the Leverage Plot analysis, the scatter plots contain outlying
points, which suggests that another variable affects the data. That variable
is the Census Region or Division. The table (Excel data) gives an example of
the relation between Number of Members per Household, Area/Division and Total
Energy Expenditure in that area. The Northeast and South Atlantic areas show
about the same Members per Household, between 2.5 and 2.56, but Energy
Expenditure differs considerably: 36.94 (South Atlantic) versus 47.72
(Northeast).
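
The way an omitted grouping variable can reverse the sign of a coefficient can be illustrated with a small Python sketch. The numbers below are synthetic and are not the RECS figures: two hypothetical regions are constructed so that expenditure rises with household size inside each region, while the region with smaller households has a higher baseline expenditure. The helper name `slope` is also an assumption for illustration.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (simple regression)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic (members, expenditure) pairs, NOT taken from the survey.
# Within each region, expenditure rises by 5 per extra household member,
# but region B has smaller households and a baseline that is 20 higher.
region_a = [(2.6, 13.0), (2.7, 13.5), (2.8, 14.0)]
region_b = [(2.3, 31.5), (2.4, 32.0), (2.5, 32.5)]

# Regression that ignores the region (the omitted variable).
pooled = region_a + region_b
pooled_slope = slope([m for m, _ in pooled], [e for _, e in pooled])

# Regressions fitted separately inside each region.
slope_a = slope(*zip(*region_a))
slope_b = slope(*zip(*region_b))

print("pooled slope is negative:", pooled_slope < 0)
print("within-region slopes:", round(slope_a, 3), round(slope_b, 3))
```

In this constructed case the pooled slope is negative even though the slope inside each region is positive, which is the pattern the report attributes to the missing Census Region or Division variable.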
