RAW DATA ANALYSIS
A first visual comparison of the histograms for each site and hour against the Before histogram suggests that Site 3, Site 4, Site 6 and Site 7 could be the most affected sites.
[Figure: BEFORE histogram]
The distributions of the values measured with Sensor 1 and Sensor 2 suggest that there are no differences large enough to justify performing the analysis by sensor.
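Histograms from different site/hour datasets are easiest to compare when they share the same bin edges and show relative frequencies instead of counts, in case the datasets differ in size. The sketch below is a minimal, stdlib-only illustration of that idea; the function names, the equal-width binning and the sample values are hypothetical and not taken from this analysis.

```python
import bisect
from typing import List, Sequence

def shared_edges(samples: Sequence[Sequence[float]], bins: int = 10) -> List[float]:
    """Equal-width bin edges spanning every sample, so all histograms line up."""
    lo = min(min(s) for s in samples)
    hi = max(max(s) for s in samples)
    width = (hi - lo) / bins or 1.0  # fall back to unit width if all values are equal
    edges = [lo + i * width for i in range(bins + 1)]
    edges[-1] = max(edges[-1], hi)  # make sure rounding never leaves hi outside
    return edges

def relative_histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values in each bin; the right-most edge belongs to the last bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # outside the shared range (not possible for samples used to build edges)
        i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]

# Hypothetical PPM readings, for illustration only.
before = [1.0, 2.0, 2.5, 3.0]
after = [2.0, 4.0, 5.0, 6.0]
edges = shared_edges([before, after], bins=5)
print(relative_histogram(before, edges))  # [0.25, 0.5, 0.25, 0.0, 0.0]
print(relative_histogram(after, edges))   # [0.0, 0.25, 0.0, 0.25, 0.5]
```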
OUTLIERS ANALYSIS
The boxplot visualizations over the 29 datasets, before and after performing the Outlier Analysis with the Tukey and Hampel methods, mainly suggest that the Before dataset could follow a normal distribution after a logarithmic transformation.
[Figure: boxplots of PPM and PPM_LOG (LOGARITHMIC TRANSFORMATION)]
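The parameters used for the two outlier methods are not stated. A minimal sketch, assuming the conventional Tukey fences (k = 1.5 times the interquartile range) and a Hampel identifier with a threshold of t = 3 scaled MADs (MAD multiplied by 1.4826), could look as follows. The quartile method (`method="inclusive"`, the same linear interpolation used by default in R and NumPy), the natural logarithm for PPM_LOG and the function names are also assumptions.

```python
import math
import statistics
from typing import List, Sequence

def tukey_filter(x: Sequence[float], k: float = 1.5) -> List[float]:
    """Keep the points inside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in x if low <= v <= high]

def hampel_filter(x: Sequence[float], t: float = 3.0) -> List[float]:
    """Keep the points within t scaled MADs of the median (Hampel identifier)."""
    med = statistics.median(x)
    # 1.4826 makes the MAD a consistent estimate of the standard deviation for normal data
    mad = 1.4826 * statistics.median([abs(v - med) for v in x])
    if mad == 0:
        return list(x)  # design choice: with no spread, nothing is flagged
    return [v for v in x if abs(v - med) <= t * mad]

def log_transform(x: Sequence[float]) -> List[float]:
    """PPM -> PPM_LOG, assuming the natural log and strictly positive readings."""
    return [math.log(v) for v in x]

# Hypothetical readings with one obvious outlier.
ppm = [10, 11, 12, 13, 14, 100]
print(tukey_filter(ppm))   # [10, 11, 12, 13, 14]
print(hampel_filter(ppm))  # [10, 11, 12, 13, 14]
```

Both filters drop the same point here; on real data they can disagree, since Tukey's fences depend on the quartiles while the Hampel identifier depends on the median and MAD.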
The Normality Analysis results for the Before datasets suggest that the logarithmically transformed versions of the complete dataset and of the dataset resulting from Tukey's method follow a normal distribution. In these cases, the 4 tests performed (Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling) returned p-values above 0.05, so normality was not rejected.
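Of the four tests, Anderson-Darling is the simplest to write without a statistics library. The sketch below is an illustration under stated assumptions, not the implementation used in this analysis: it estimates the mean and standard deviation from the sample, applies the usual small-sample adjustment A*² = A²(1 + 0.75/n + 2.25/n²), and converts the result to an approximate p-value with a commonly used piecewise formula for that estimated-parameters case. The function name and sample data are hypothetical. As in the text, normality is not rejected when the p-value is above 0.05; for PPM_LOG the function would be applied to the logarithms of the readings.

```python
import math
import statistics
from typing import Sequence, Tuple

def _log_phi(z: float) -> float:
    """log of the standard normal CDF, computed with erfc to keep tail accuracy."""
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return math.log(max(p, 1e-300))  # guard against underflow to exactly 0

def anderson_darling_normal(x: Sequence[float]) -> Tuple[float, float]:
    """Return (adjusted A*^2, approximate p-value) for H0: x comes from a normal distribution."""
    xs = sorted(x)
    n = len(xs)
    if n < 8:
        raise ValueError("the p-value approximation needs at least 8 observations")
    mean, sd = statistics.fmean(xs), statistics.stdev(xs)
    if sd == 0:
        raise ValueError("all observations are equal")
    z = [(v - mean) / sd for v in xs]
    # A^2 = -n - (1/n) * sum_i (2i-1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    s = sum((2 * i - 1) * (_log_phi(z[i - 1]) + _log_phi(-z[n - i]))
            for i in range(1, n + 1))
    a2 = -n - s / n
    aa = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    if aa < 0.2:
        p = 1 - math.exp(-13.436 + 101.14 * aa - 223.73 * aa ** 2)
    elif aa < 0.34:
        p = 1 - math.exp(-8.318 + 42.796 * aa - 59.938 * aa ** 2)
    elif aa < 0.6:
        p = math.exp(0.9177 - 4.279 * aa - 1.38 * aa ** 2)
    elif aa < 10:
        p = math.exp(1.2937 - 5.709 * aa + 0.0186 * aa ** 2)
    else:
        p = 3.7e-24
    return aa, p

# Hypothetical samples: evenly spaced normal quantiles vs. one extreme value.
near_normal = [statistics.NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [1.0] * 9 + [100.0]
for name, sample in [("near_normal", near_normal), ("skewed", skewed)]:
    _, p = anderson_darling_normal(sample)
    print(name, "normality rejected" if p < 0.05 else "normality not rejected")
```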
The Normality Analysis results for the After datasets (the other 28 datasets, covering 7 sites and 4 hours) rejected normality in the vast majority of cases. Among the logarithmically transformed datasets, only the complete Site 5 dataset (without removing outliers) did not have normality rejected by any of the 4 tests performed.
Datasets
As the main goal of obtaining normally distributed datasets by removing the outliers was not achieved, the 28 datasets considered (7 sites by 4 hours) are the original datasets, without outliers removed. No reasons were found to remove these outlier points.
TESTS AND RESULTS
Hypothesis testing
To determine whether the measured PPM values are worse than normal, 3 one-sided tests for two independent samples were performed, comparing each of the 28 After datasets with the Before dataset and checking whether the hypothesis "mean values greater than normal" is rejected. The tests were performed on both the untransformed and the logarithmically transformed data, so the results shown here correspond to 56 comparisons in total (28 After datasets for each of the 2 data versions).
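Which variant of the one-sided T-test was used is not stated. A minimal stdlib-only sketch, assuming Welch's unequal-variance version with the alternative hypothesis mean(After) > mean(Before), is shown below; the p-value comes from Student's t distribution via the regularized incomplete beta function (continued fraction evaluated with the modified Lentz method). The helper names are hypothetical. Where SciPy is available, `scipy.stats.ttest_ind(after, before, equal_var=False, alternative="greater")` performs the same test.

```python
import math
import statistics
from typing import Sequence, Tuple

_TINY = 1e-300

def _nz(v: float) -> float:
    """Replace values too close to zero (Lentz's method needs non-zero terms)."""
    return v if abs(v) >= _TINY else _TINY

def _betacf(a: float, b: float, x: float, max_iter: int = 10_000, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even term
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 / _nz(1.0 + aa * d)
        c = _nz(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b

def t_sf(t: float, df: float) -> float:
    """Upper-tail probability P(T > t) of Student's t with df degrees of freedom."""
    tail = 0.5 * _betai(df / 2.0, 0.5, df / (df + t * t))
    return tail if t > 0 else 1.0 - tail

def welch_one_sided(after: Sequence[float], before: Sequence[float]) -> Tuple[float, float, float]:
    """Welch's t-test of H1: mean(after) > mean(before); returns (t, df, p)."""
    n1, n2 = len(after), len(before)
    v1 = statistics.variance(after) / n1
    v2 = statistics.variance(before) / n2
    t = (statistics.fmean(after) - statistics.fmean(before)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, t_sf(t, df)

after = [5.0, 6.0, 7.0, 8.0, 9.0]   # hypothetical After readings
before = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical Before readings
t, df, p = welch_one_sided(after, before)
print(t, df)  # 4.0 8.0
print("H0 rejected at 0.05" if p < 0.05 else "H0 not rejected at 0.05")
```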
T-test
Safe datasets (PPM values): Site 1 (Hour 2), Site 2 (All hours), Site 5 (Hour 4)
Safe datasets (PPM_LOG values): Site 2 (All hours)
[Figure: PPM and PPM_LOG (LOGARITHMIC TRANSFORMATION)]
*However, the T-test requires the normality assumption, which was already rejected in the Normality Analysis. Therefore, the T-test is not the right choice for testing the equality of means for these datasets.
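Because the normality assumption does not hold, a distribution-free procedure is one option for this kind of comparison. As an illustration only (a permutation test is not one of the tests named in this analysis, and the function name, number of permutations and seed are hypothetical), a one-sided permutation test on the difference in means could be sketched as follows. It assumes only that, under the null hypothesis, After and Before observations are exchangeable.

```python
import random
import statistics
from typing import Sequence

def permutation_test_greater(after: Sequence[float], before: Sequence[float],
                             n_perm: int = 10_000, seed: int = 0) -> float:
    """Monte Carlo p-value for H1: mean(after) > mean(before)."""
    rng = random.Random(seed)  # fixed seed for reproducible p-values
    observed = statistics.fmean(after) - statistics.fmean(before)
    pooled = list(after) + list(before)
    k = len(after)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of After/Before observations
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        if diff >= observed:
            extreme += 1
    # +1 in numerator and denominator counts the observed labelling itself
    return (extreme + 1) / (n_perm + 1)

after = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]  # hypothetical After readings
before = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]       # hypothetical Before readings
print(permutation_test_greater(after, before) < 0.05)  # True
```

The +1 correction keeps the estimated p-value from ever being exactly 0, which a finite number of random permutations could not justify.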
Conclusion
All sites, except for Site 2, present PPM values worse than normal.