Brodkin
P6
11-19-2015

CO2 is one of the most widely known greenhouse gases on the planet, and nearly everyone knows what CO2 pollution is. But does it directly affect how long we will live? This question has been hiding in the back of my head for years, but I've never had the chance to explore it until this project. One person who would find this data helpful is anyone working at a factory whose job is to cut down on CO2 emissions. This question struck me as particularly important because it affects not only you and me but also every other organism on Earth that breathes air. So naturally, I wondered whether the United States, one of the highest CO2-emitting countries in the world, had a lower life expectancy than countries with lower emission levels. So come explore with me; I think it'll be a gas!

Many conclusions can be drawn from the above scatterplot, but before we dive into them, the variables must be discussed. The explanatory variable in this instance is CO2 emissions (in metric tons per capita), while the response variable is life expectancy in years. In other words, CO2 emissions are the independent variable, because the amount of emissions can be controlled by the country, whereas life expectancy is the dependent variable, because it may be explained by emission levels.

As for outliers, there are some in both variables. For CO2 emissions, Australia (16.7 metric tons per capita), Aruba (24.2 metric tons per capita), and Bahrain (18.4 metric tons per capita) are all outliers because they lie beyond the whiskers of the box-and-whisker plot (shown below). For life expectancy there were only two outliers, Afghanistan (59.6 years) and Angola (51.1 years), which were also identified using the box-and-whisker plot shown below.
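
The text does not say which rule the box-and-whisker plot used to flag outliers. A common convention, assumed here, is the 1.5 × IQR rule: a value is an outlier if it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile. The minimal Python sketch below applies that rule; the helper name `iqr_outliers` and the sample values are hypothetical and are not the World Bank data. Graphing calculators and `statistics.quantiles` may compute quartiles slightly differently, so borderline cases can differ between tools.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k*IQR below Q1 or above Q3.

    Assumes the common 1.5*IQR box-plot rule; quartiles use the
    default 'exclusive' method of statistics.quantiles.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical per-capita emissions in metric tons (NOT the real data set).
sample = [0.3, 1.4, 2.0, 3.1, 4.5, 5.0, 6.2, 7.4, 8.8, 20.0]
print(iqr_outliers(sample))  # [20.0]
```

Here Q1 = 1.85 and Q3 = 7.75, so the upper fence is 16.6 and only the value 20.0 is flagged.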

Although there are few outliers, there are even fewer influential points. These are Afghanistan and Angola. Afghanistan emits only 0.3 metric tons of CO2 per capita and has a life expectancy of 59.6 years, while Angola emits only 1.4 metric tons of CO2 per capita and has a life expectancy of 51.1 years. If both of these points were removed, the line of best fit would rise slightly.

Although I initially expected a strong relationship between a country's CO2 emissions and how long a person is expected to live, finding the r value for the strength of the correlation persuaded me otherwise.
The calculated r value for this quantitative bivariate data is 0.4776714352, which indicates a moderately weak positive correlation between emission levels and life expectancy. The R² value is 0.22817, which means that approximately 23% of the variation in life expectancy can be explained by the linear relationship with emission levels.

After running a linear regression on the data, the least-squares regression equation can be calculated fairly quickly: ŷ = 68.91195 + 0.552284x. In other words, the predicted value of y equals the y-intercept plus the regression coefficient, or slope, times x.

Specifically, this equation says that with no CO2 emissions at all, a person in that country would be predicted to live 68.91195 years. Although this sounds plausible at first glance, it is actually misleading, because life expectancy depends on many hidden factors, such as how advanced the country is and its citizens' access to medical care. The slope states that for each additional metric ton of CO2 emitted per capita, the country's predicted life expectancy increases by 0.552284 years.
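
Plugging numbers into the reported equation makes both interpretations concrete: at x = 0 the prediction equals the intercept, and predictions one metric ton apart differ by exactly the slope. The function name below is hypothetical; the coefficients are the ones reported for this data.

```python
# Coefficients from the reported least-squares equation.
INTERCEPT = 68.91195   # predicted years at zero emissions
SLOPE = 0.552284       # years per metric ton of CO2 per capita

def predicted_life_expectancy(co2_tons_per_capita):
    """Hypothetical helper: evaluate y-hat = 68.91195 + 0.552284x."""
    return INTERCEPT + SLOPE * co2_tons_per_capita

print(predicted_life_expectancy(0.0))  # 68.91195 (the intercept)
# Predictions one metric ton apart differ by the slope (up to float rounding).
print(predicted_life_expectancy(11.0) - predicted_life_expectancy(10.0))
```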

The residual plot indicates that a linear model is not a great fit for this data set, because for a linear model to be appropriate, the points on a residual plot should be scattered essentially at random.

To test the accuracy of my linear regression equation, I input Angola's emission level of 1.4 metric tons per capita and found a predicted life expectancy of about 69.685148 years, which is higher than Angola's actual value of 51.1 years by about 18.585148 years. The residual, found by subtracting the predicted value from the actual value, is therefore about -18.585148.
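
The prediction and residual for Angola can be checked directly from the reported equation, using residual = actual - predicted. This is a minimal sketch; the coefficients and Angola's values (1.4 metric tons per capita, 51.1 years) are taken from the text.

```python
INTERCEPT = 68.91195  # reported y-intercept (years)
SLOPE = 0.552284      # reported slope (years per metric ton per capita)

x_angola = 1.4    # metric tons of CO2 per capita
y_angola = 51.1   # actual life expectancy in years

y_hat = INTERCEPT + SLOPE * x_angola  # 68.91195 + 0.7731976
residual = y_angola - y_hat           # residual = actual - predicted
print(f"predicted = {y_hat:.6f}, residual = {residual:.6f}")
# predicted = 69.685148, residual = -18.585148
```

A negative residual means the equation overpredicts: Angola's actual life expectancy is well below what its emission level alone would suggest.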

This regression turned out to be quite different from what I expected. It turns out that there is a moderately weak correlation between a country's CO2 emissions and the life expectancy it provides. Perhaps if more data points were selected the results would be different, but with the provided data it appears that there is little correlation between the two variables. However, this could be explained by the presence of hidden variables, such as the size of the country's population, access to medical care, and wealth.

Works Cited
"World DataBank." The World Bank DataBank. The World Bank, n.d. Web. 19 Nov. 2015.