
EDEXCEL ANALYTICAL METHODS FOR ENGINEERS H1

UNIT 2 - NQF LEVEL 4


OUTCOME 4 - STATISTICS AND PROBABILITY
TUTORIAL 1 STATISTICAL DATA 1

Tabular and graphical form: data collection methods; histograms; bar charts; line diagrams;
cumulative frequency diagrams; scatter plots
Central tendency and dispersion: the concept of central tendency and variance measurement;
mean; median; mode; standard deviation; variance and interquartile range; application to
engineering production
Regression, linear correlation: determine linear correlation coefficients and regression lines and
apply linear regression and product moment correlation to a variety of engineering situations
Probability: interpretation of probability; probabilistic models; empirical variability; events and
sets; mutually exclusive events; independent events; conditional probability; sample space and
probability; addition law; product law; Bayes theorem
Probability distributions: discrete and continuous distributions, introduction to the binomial,
Poisson and normal distributions; use of the Normal distribution to estimate confidence intervals
and use of these confidence intervals to estimate the reliability and quality of appropriate
engineering components and systems

This section of the syllabus should have been covered at national level so the same material is
repeated here and should be skipped if you are already familiar with the work.
On completion of this tutorial you should be able to do the following.

Explain the use of raw data.

Present data as frequency polygons, bar graphs and histograms.

Explain and find the mean and median values.

Explain and plot ogives.

Explain and find quartiles.

D.J.Dunn www.freestudy.co.uk

1. INTRODUCTION
Statistics are used to help us analyse and understand the performance and trends in various areas of
work. These might be financial trends, things to do with the population or things to do with
manufacturing.
2. SAMPLE VALUES

The samples are taken from a POPULATION and the samples form a SET within that population.
For example a set of 100 people might be taken from a population of 1 000 000. The samples must
be truly random with no factors to bias the results to one extreme or the other. If the set is large
enough and truly random then we might expect any information we get to apply equally well to the
larger population.
The data collected will vary about a mean value. The data is a variable such as the height of people,
the values of resistors coming off a production line or the diameter of a machined component
coming off a machine tool. We find out how many of these things f fall within defined bands or
ranges of values denoted x.
We need to discuss what x means. If you were throwing a die over and over again you would get
a score of exactly 1, 2, 3, 4, 5 or 6. Hence x can only be these exact numbers and if you throw the
die repeatedly you can count how many times a particular number comes up.
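The die example can be tried out with a short simulation. This is only a sketch: the function name throw_die, the 600 throws and the fixed seed are assumptions made for illustration and are not part of the tutorial.

```python
import random
from collections import Counter

def throw_die(n_throws, seed=0):
    """Simulate n_throws of a fair six-sided die and count each score x."""
    rng = random.Random(seed)  # fixed seed (an assumption) so a run can be repeated
    scores = [rng.randint(1, 6) for _ in range(n_throws)]
    return Counter(scores)

counts = throw_die(600)
for x in range(1, 7):
    # f is the number of times the exact score x came up
    print(f"x = {x}: f = {counts[x]}")
```

Because x can only take the six exact values, no rounding or banding is needed here.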
In the case of continuous variables such as height, weight, size and so on, the values of x might be
a decimal number depending on the accuracy of measurement. In the extreme we would get one
sample for each exact decimal value and this would be no use. We could do one of two things. We
could round off the values so that all those close together will be counted as the same, or we could
count the number of samples f that fall within a specified range with x being the value at the
middle of the range. Let's use an example to get started.
3. RAW DATA
Consider a set of statistics compiled for the height of children of age 10. First we would compile a
table of heights. This would be the raw data. Note that the larger the sample we take, the more
meaningful the results will be. By measuring the heights to an accuracy of 0.01 m we may get
children with the same height but this is unlikely with a small set. This is the table of raw data for
10-year-old children.

Sample       1     2     3     4     5     6     7     8     9     10    11    12
Height (m)   1.45  1.56  1.37  1.44  1.32  1.42  1.55  1.29  1.37  1.49  1.47  1.34

Sample       13    14    15    16    17
Height (m)   1.56  1.28  1.35  1.62  1.46


4. BANDS
If we used exact measurements of heights, it is unlikely we would find two children exactly the
same height but the more we round off the values, the more likely it becomes. Also when handling
large numbers of samples, we end up with huge lists of data so to simplify the table we create bands
within which the measurements fall and we will find more than one child falls in each band. The
number of children within each band is the frequency. Next we would have to go through the
laborious task of counting how many there are in each band. If we found a child with a height
exactly on the edge of a band, we might decide to allocate a half to each band on either side
resulting in frequency values that are not whole numbers. The result is a FREQUENCY
DISTRIBUTION TABLE.

FREQUENCY DISTRIBUTION TABLE


Height (m)   1.2 - 1.3   1.3 - 1.4   1.4 - 1.5   1.5 - 1.6   1.6 - 1.7
Mid Point    1.25        1.35        1.45        1.55        1.65
Freq.        2           5           6           3           1
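The counting can be left to a short program. The sketch below is one possible way of doing it, not the only one: the function name frequency_table and its parameters are assumptions, and the heights are converted to whole centimetres so that values on a band edge are detected reliably.

```python
heights = [1.45, 1.56, 1.37, 1.44, 1.32, 1.42, 1.55, 1.29, 1.37,
           1.49, 1.47, 1.34, 1.56, 1.28, 1.35, 1.62, 1.46]

def frequency_table(values_m, low_cm=120, high_cm=170, width_cm=10):
    """Count how many values fall in each band between low_cm and high_cm."""
    n_bands = (high_cm - low_cm) // width_cm
    freq = [0.0] * n_bands
    for value in values_m:
        # Work in whole centimetres to avoid floating point trouble at band edges.
        offset = round(value * 100) - low_cm
        if offset < 0 or offset > high_cm - low_cm:
            raise ValueError(f"{value} m is outside the table")
        index, remainder = divmod(offset, width_cm)
        if remainder == 0 and 0 < index < n_bands:
            # Exactly on an interior edge: allocate a half to the band on either side.
            freq[index - 1] += 0.5
            freq[index] += 0.5
        else:
            freq[min(index, n_bands - 1)] += 1
    return freq

for i, f in enumerate(frequency_table(heights)):
    low = (120 + 10 * i) / 100
    print(f"{low:.1f} - {low + 0.1:.1f}  f = {f:g}")
```

None of the 17 heights lies on a band edge, so all the frequencies come out as whole numbers.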

5. PLOTS
If we plot frequency vertically against height horizontally, we get a frequency distribution graph
and this can be drawn in different ways.

Notice that the points are drawn at the middle of each band. The rectangles that form the histogram
are drawn between the limits of each band.
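Where no plotting package is to hand, a rough text version of the two plots can be produced as follows. This is a sketch only: the function names text_histogram and polygon_points are made up for illustration, and the bands and frequencies are those of the frequency distribution table.

```python
bands = [(1.2, 1.3), (1.3, 1.4), (1.4, 1.5), (1.5, 1.6), (1.6, 1.7)]
freqs = [2, 5, 6, 3, 1]

def text_histogram(bands, freqs, symbol="#"):
    """One bar per band, drawn between the band limits, one symbol per sample."""
    lines = []
    for (low, high), f in zip(bands, freqs):
        lines.append(f"{low:.1f} - {high:.1f} | {symbol * f} ({f})")
    return lines

def polygon_points(bands, freqs):
    """Points of the frequency polygon, placed at the middle of each band."""
    return [(round((low + high) / 2, 2), f) for (low, high), f in zip(bands, freqs)]

print("\n".join(text_histogram(bands, freqs)))
print(polygon_points(bands, freqs))
```

The bars correspond to the rectangles of the histogram and the list of points to the mid-band points joined by straight lines.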
6. MEAN

This is one of the more common statistics you will see and it's easy to compute. All you have to do
is add up all the values in a set of data and then divide that sum by the number of values in the
dataset. For our example, let the height be represented by the variable x and the frequency be f.
Sample number   1     2     3     4     5     6     7     8     9     10    11    12    13
Height (m)      1.45  1.56  1.37  1.44  1.32  1.42  1.55  1.29  1.37  1.49  1.47  1.34  1.56

Sample number   14    15    16    17    Total
Height (m)      1.28  1.35  1.62  1.46  24.34

The mean value is denoted x̄ (x bar) and x̄ = 24.34/17 = 1.432 m


We can do this a bit more simply using the frequency distribution table.

x (mid pt)   1.25   1.35   1.45   1.55   1.65   Total
f            2      5      6      3      1      17
fx           2.5    6.75   8.7    4.65   1.65   24.25

The mean value is x̄ = Σfx/Σf = 24.25/17 = 1.426 m. This is not quite as accurate as the previous
answer because every value in a band has been taken as the mid point of that band.
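Both calculations are easily checked by computer. The sketch below uses the raw heights and the mid points and frequencies from the table; the function names mean_raw and mean_grouped are assumptions made for illustration.

```python
heights = [1.45, 1.56, 1.37, 1.44, 1.32, 1.42, 1.55, 1.29, 1.37,
           1.49, 1.47, 1.34, 1.56, 1.28, 1.35, 1.62, 1.46]
mid_points = [1.25, 1.35, 1.45, 1.55, 1.65]
freqs = [2, 5, 6, 3, 1]

def mean_raw(values):
    """Add up all the values and divide by the number of values."""
    return sum(values) / len(values)

def mean_grouped(mid_points, freqs):
    """Sum of f times x divided by the sum of f, with x taken at the mid point."""
    total_fx = sum(f * x for x, f in zip(mid_points, freqs))
    return total_fx / sum(freqs)

print(f"mean from raw data     = {mean_raw(heights):.3f} m")                  # 1.432
print(f"mean from grouped data = {mean_grouped(mid_points, freqs):.3f} m")    # 1.426
```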


7. MEDIAN
Whenever you see words like "the average person ..." or "the average income of ...", you don't
always want to know the mean. Often you want to know about the one in the middle. That's the
median.
Again, this statistic is easy to determine because the median literally is the value in the middle. In
order to find it, you just line up the values in your set of data from smallest to largest. The one in
the dead-centre is your median. Our table would look like this.
Sample number   1     2     3     4     5     6     7     8     9     10    11    12    13
Height (m)      1.28  1.29  1.32  1.34  1.35  1.37  1.37  1.42  1.44  1.45  1.46  1.47  1.49

Sample number   14    15    16    17    Total
Height (m)      1.55  1.56  1.56  1.62  24.34

The mid point in the table is point number 9, so the median value is 1.44 m. If we had an even
number of samples, say 18, then there would be two values in the middle and we should average the
two to get the median.
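The rule for odd and even numbers of samples can be written as a small function. This is a minimal sketch: the function name median is chosen for illustration, and the 18th height of 1.50 m in the second call is a made-up value used only to show the even case.

```python
heights = [1.45, 1.56, 1.37, 1.44, 1.32, 1.42, 1.55, 1.29, 1.37,
           1.49, 1.47, 1.34, 1.56, 1.28, 1.35, 1.62, 1.46]

def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of samples: one value sits in the dead-centre.
        return ordered[middle]
    # Even number of samples: average the two values in the middle.
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median(heights))            # 17 samples, the ninth sorted value: 1.44
print(median(heights + [1.50]))   # 18 samples, hypothetical extra height
```

Python's standard library provides the same calculation as statistics.median.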
8. OGIVE
Let's add a new row to our frequency distribution table containing the cumulative frequency.
x        1.25   1.35   1.45   1.55   1.65   Total
f        2      5      6      3      1      17 = n
fx       2.5    6.75   8.7    4.65   1.65   24.25
cum. f   2      7      13     16     17

Point number 9 is between 1.35 and 1.45 and we can't get it exactly from this table. The median
and the mean are almost the same value but in some cases they can be very different. Comparing
the mean to the median for a set of data can tell you a lot about the distribution or weighting of the
sample. For example if the mean is much larger than the median, it could mean that you have some
exceptionally tall children or some very highly paid staff pulling the mean upwards. We can show
this with a graph called an OGIVE. This is another way of plotting the data with the cumulative
frequency plotted vertically. For a small number of samples we can plot the sorted raw table against
sample number. In effect this is using bands of 0.01 m and the sample number becomes the
cumulative frequency.
The ninth sample divides the ogive into two vertically and the projection down to the x axis gives
the median value as 1.44 m. Examining the graph we see that the banded data plot does not agree
very well with the un-banded plot that gives us the correct median. This is because we are plotting
the mid point of the band. If we plot the upper edge of each band, the agreement is better and then
the graph tells us how many children have a height less than a given value.
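The points of the banded ogive are simply running totals of the frequencies. The sketch below pairs each running total with the upper edge of its band; the function name ogive_points is an assumption made for illustration.

```python
from itertools import accumulate

upper_edges = [1.3, 1.4, 1.5, 1.6, 1.7]
freqs = [2, 5, 6, 3, 1]

def ogive_points(upper_edges, freqs):
    """Pair each upper band edge with the cumulative frequency up to that edge."""
    return list(zip(upper_edges, accumulate(freqs)))

for edge, cum_f in ogive_points(upper_edges, freqs):
    print(f"{cum_f} children have a height less than {edge} m")
```

The last cumulative frequency is always the total number of samples, 17 in this case.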


9. QUARTILES and PERCENTILES


Plotting the same data using the upper limit of each band of 0.1 m shows how many children are
either shorter or taller than that value. The table to plot is as follows. The red points are the unbanded figures.
x (upper point)   1.3    1.4    1.5    1.6    1.7    Total
f                 2      5      6      3      1      n = 17
fx                2.5    6.75   8.7    4.65   1.65   24.25
cum. f            2      7      13     16     17

If we divide the vertical scale into 4 equal parts (each 17/4 = 4.25 in this case) and project the lines down to
the x axis we have 4 divisions on the x axis. The lower part is called the lower quartile. The upper
part is called the upper quartile. The two middle parts added together are called the inter-quartile
range and if this is divided by 2 we have the semi-inter-quartile. These tell us something about how
the samples are spread around the median but a better method of doing this is to use the standard
deviation.
If we repeated the process by dividing the vertical axis into 100 parts, then we can look up the
height below which a given percentage of the children fall. These are called the PERCENTILES. The
median is the 50th percentile. The first quartile is the 25th percentile.
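Projecting a line across to the ogive and down to the x axis can be imitated by interpolation. The sketch below assumes the ogive is made of straight lines between the plotted points, which is not quite the same as reading a smooth hand-drawn curve, and it starts the curve at zero at the lowest band edge of 1.2 m. The function name value_at_percentile is an assumption made for illustration.

```python
def value_at_percentile(percent, edges, cum_f):
    """Read the x value at which the ogive reaches the given percentage of n.

    edges holds the band edges starting with the lowest one, and cum_f holds
    the cumulative frequency at each edge, starting with 0.
    """
    target = cum_f[-1] * percent / 100
    for i in range(1, len(edges)):
        if cum_f[i] >= target:
            step = cum_f[i] - cum_f[i - 1]
            if step == 0:
                return edges[i - 1]
            # Straight-line interpolation inside the band (an assumption).
            fraction = (target - cum_f[i - 1]) / step
            return edges[i - 1] + fraction * (edges[i] - edges[i - 1])
    return edges[-1]

edges = [1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
cum_f = [0, 2, 7, 13, 16, 17]

q25 = value_at_percentile(25, edges, cum_f)
q50 = value_at_percentile(50, edges, cum_f)
q75 = value_at_percentile(75, edges, cum_f)
print(f"25th percentile = {q25:.3f} m")
print(f"50th percentile = {q50:.3f} m")
print(f"75th percentile = {q75:.3f} m")
print(f"inter-quartile range = {q75 - q25:.3f} m")
```

The 50th percentile found this way comes from the banded data, so it will not agree exactly with the median of 1.44 m found from the raw table.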


WORKED EXAMPLE No.1


A company manufactures steel bars of nominal diameter 20 mm and cuts them into equal
lengths. The diameter of each length is measured at the middle for the purpose of quality
control. The results for 20 bars are given below.

Produce a frequency distribution table using bands of 0.1 mm.


Calculate the mean of the samples.
Draw a histogram.
Plot the Ogive.
Determine the median, the upper and lower quartiles and the semi-inter-quartile.
Use the tables and plots to show your solutions.
Sample          1     2     3     4     5     6     7     8     9     10
Diameter (mm)   19.9  19.8  20.1  19.9  19.7  20.1  20.0  19.6  19.7  20.1

Sample          11    12    13    14    15    16    17    18    19    20
Diameter (mm)   20.2  20.0  19.9  19.8  20.1  20.0  19.7  19.6  19.9  20.2


SOLUTION
Total = 398.3

Total samples n = 20

mean = 398.3/20=19.915

FREQUENCY DISTRIBUTION TABLE

d (mm)   19.55-19.65  19.65-19.75  19.75-19.85  19.85-19.95  19.95-20.05  20.05-20.15  20.15-20.25  Totals
Mid      19.6         19.7         19.8         19.9         20.0         20.1         20.2
f        2            3            2            4            3            4            2            n = 20
fd       39.2         59.1         39.6         79.6         60           80.4         40.4         398.3
cum. f   2            5            7            11           14           18           20

Median = 19.92 mm
Upper quartile = 20.25 - 20.07 = 0.18 mm
Lower quartile = 19.76 - 19.55 = 0.21 mm
Inter-quartile range = 20.07 - 19.76 = 0.31 mm
Semi-inter-quartile = 0.31/2 = 0.155 mm
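The rows of the solution table can be rebuilt from the 20 raw diameters as a check. In the sketch below the variable names are assumptions, and the quartile points 19.76 mm and 20.07 mm are the values quoted in the solution as read from the ogive. They are not recalculated here.

```python
from collections import Counter
from itertools import accumulate

diameters = [19.9, 19.8, 20.1, 19.9, 19.7, 20.1, 20.0, 19.6, 19.7, 20.1,
             20.2, 20.0, 19.9, 19.8, 20.1, 20.0, 19.7, 19.6, 19.9, 20.2]

counts = Counter(diameters)                  # readings are already to 0.1 mm
mids = sorted(counts)                        # mid point of each 0.1 mm band
f = [counts[m] for m in mids]                # frequency in each band
fd = [round(m * n, 1) for m, n in zip(mids, f)]
cum_f = list(accumulate(f))                  # cumulative frequency
mean = sum(diameters) / len(diameters)

# Quartile points as quoted in the solution (graph readings, not computed).
lower_point, upper_point = 19.76, 20.07
lowest_edge, highest_edge = 19.55, 20.25
upper_quartile = highest_edge - upper_point
lower_quartile = lower_point - lowest_edge
inter_quartile = upper_point - lower_point

print("Mid   ", mids)
print("f     ", f)
print("fd    ", fd)
print("cum. f", cum_f)
print(f"mean = {mean:.3f} mm")
print(f"semi-inter-quartile = {inter_quartile / 2:.3f} mm")
```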

SELF ASSESSMENT EXERCISE No.1


The diameters of a number of components are measured to the nearest 0.1 mm. The distribution
is shown.
Diameter (mm)   9.6   9.7   9.8   9.9   10.0   10.1   10.2   10.3   10.4
Number          3     9     36    88    122    90     44     7      1

Draw the histogram and the Ogive and deduce the mean, the median, the upper and lower
quartile and the semi-interquartile.

The ogive looks like this.

[Figure: ogive of cumulative frequency (0 to 450) against diameter (9.6 to 10.6 mm)]

(mean = 10.001 mm, median = 10, upper quartile = 3.6, lower quartile = 2.68,
semi-interquartile = 0.9)
