Since there are outliers in the sample data, we cannot use a normal curve to describe the population distribution. The hypotheses to be tested can be stated as follows:
Figure 1. Histogram of home prices 2006. (n=100)
The Gainesville Realtors Association (GRA) claims that the average price of a house sold in 2009 was $100 per square foot and that this is below the average for 2006. A random sample of houses sold in 2006 was selected to test this claim. The sample data include the price per square foot for one hundred houses sold in 2006. The five-number summary, as well as the mean and the standard deviation of the data, are given in Table 1. The distribution of house prices is shown in a histogram (Figure 1), with the corresponding values given in Table 2. The coefficient of variation of the data (the standard deviation divided by the mean) is 33.69%. The distribution of the status of houses sold in 2006 is shown by the pie chart in Figure 2. According to the sample data, 11% of the houses sold were new and 89% were old. The box plot shown in Figure 3 represents the lower and upper quartiles of the price per square foot for the sample data.

To test the GRA claim with a random sample of 100 houses sold in 2006, we can use either a z-test or a t-test, since the sample size is greater than 30. We do not know the population standard deviation, but because the sample size is greater than 30 we can use the sample standard deviation in its place. We also use the sample mean as an estimate of the 2006 population mean and then apply a t-test for the population mean.
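The coefficient of variation quoted above can be checked from the summary statistics. The following is a minimal sketch that assumes the mean and standard deviation reported in Table 1; the variable names are illustrative only.

```python
# Summary statistics as reported in Table 1 (price per square foot, n = 100).
mean = 92.33
stdev = 31.1052

# Coefficient of variation: the standard deviation expressed as a
# percentage of the mean.
cv_percent = stdev / mean * 100

print(f"Coefficient of variation = {cv_percent:.2f}%")
```

A value of roughly one third indicates that prices per square foot vary considerably relative to their mean.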
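The hypotheses are:

$$H_0: \mu = 100 \qquad H_1: \mu > 100$$

From the exploratory analysis we have $n = 100$ (d.f. $= 99$), $s = 31.105$, and $\bar{x} = 92.33$. From the t-table, with $\alpha = 0.05$ and d.f. $= 99$, the one-tailed critical value is $t_{\alpha} \approx 1.66$. We want to check whether the claim that the mean price per square foot of houses sold in 2006 is greater than 100 is valid. The t-test standardizes the sample mean using the sample standard deviation. The test statistic is then:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{92.33 - 100}{31.105/\sqrt{100}} = -2.4658$$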
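The arithmetic behind the test statistic can be reproduced in a few lines. This is a sketch under the values stated in the text (n = 100, sample mean 92.33, sample standard deviation 31.105, hypothesized mean 100); the variable names are assumptions made for illustration. The sign of the result matters because the test is right-tailed.

```python
import math

# Values taken from the exploratory analysis in the text.
n = 100          # sample size
x_bar = 92.33    # sample mean (price per square foot)
s = 31.105       # sample standard deviation
mu_0 = 100.0     # hypothesized population mean under H0

# Standard error of the mean, then the one-sample t statistic.
standard_error = s / math.sqrt(n)
t_stat = (x_bar - mu_0) / standard_error

print(f"standard error = {standard_error:.4f}")
print(f"t statistic    = {t_stat:.4f}")
```

Because the sample mean lies below the hypothesized mean, the statistic is negative.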
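The p-value corresponding to this test statistic is greater than our significance level (p-value > α = 0.05). Because the sample mean (92.33) lies below the hypothesized value of 100, the test statistic falls in the left tail, far from the right-tailed rejection region. Hence we fail to reject the null hypothesis and conclude that there is not enough evidence that the mean price of houses sold in 2006 is greater than 100. This procedure is shown in Figure 4.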
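The decision rule can be sketched numerically. The example below is an approximation, not the exact calculation: it swaps the t distribution with 99 degrees of freedom for the standard normal distribution, which the Python standard library provides and which is close for a sample this large. The test statistic is the value computed from the sample (sample mean 92.33 against a hypothesized mean of 100).

```python
from statistics import NormalDist

t_stat = -2.4658   # (92.33 - 100) / (31.105 / sqrt(100))
alpha = 0.05       # significance level

# Right-tailed p-value: probability of a statistic at least this large
# under H0. The standard normal is used here as an approximation to the
# t distribution with 99 degrees of freedom.
p_value = 1 - NormalDist().cdf(t_stat)

# Reject H0 only if the p-value is below the significance level.
reject_null = p_value < alpha

print(f"approximate p-value = {p_value:.4f}")
print("reject H0" if reject_null else "fail to reject H0")
```

The p-value is close to 1 because nearly all of the distribution lies to the right of a statistic this far below zero, so the null hypothesis is not rejected.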
Variable   N    Mean   SE Mean  StDev    Minimum  Q1     Median  Q3      Maximum  Range
SqftPrice  100  92.33  3.11     31.1052  21.00    70.50  90.50   108.00  196.00   175
Table 1. Summary of selling price of homes in Gainesville, Florida, Fall 2006
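The entries in Table 1 can be cross-checked against each other, and the quartiles can be used to see where outliers lie. The sketch below assumes the common 1.5 × IQR rule for flagging outliers; the source does not state which rule its box plot uses, so the fences are an illustrative assumption.

```python
import math

# Summary statistics as reported in Table 1.
n, mean, stdev = 100, 92.33, 31.1052
minimum, q1, median, q3, maximum = 21.00, 70.50, 90.50, 108.00, 196.00

# Consistency checks on the derived columns.
se_mean = stdev / math.sqrt(n)    # standard error of the mean
data_range = maximum - minimum    # range

# Outlier fences under the 1.5 * IQR rule (an assumed convention).
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

has_low_outliers = minimum < lower_fence
has_high_outliers = maximum > upper_fence

print(f"SE mean = {se_mean:.2f}, range = {data_range:.0f}, IQR = {iqr:.2f}")
print(f"fences  = [{lower_fence:.2f}, {upper_fence:.2f}]")
print(f"low outliers: {has_low_outliers}, high outliers: {has_high_outliers}")
```

Under this rule the maximum of 196.00 lies above the upper fence while the minimum of 21.00 lies inside the lower fence, which agrees with the observation that the sample contains outliers.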
10 to 30
30 up to 50
50 up to 70
70 up to 90
90 up to 110
110 up to 130
130 up to 150
150 up to 170
170 up to 190
190 up to 210
Total
Table 2. Distribution of Homes in Gainesville by Price, Fall 2006 (n=100)

Type of Home  Frequency
New           11
Old           89
Table 3. Distribution of Homes in Gainesville
Figure 4. Box plot of the home prices, 2006. (n=100)

The assumptions of this test were:
Random variable: the average price of a house sold in 2006.
Distribution of the population: not normal.
Parameter of the distribution: $\mu$ = mean of the population.
Data collection method and type of data: assume SRS (simple random sample); quantitative variable.
Sample size: n = 100 is greater than 30, so we can use either a t-test or a z-test.