

TOPIC 1 ROLE OF STATISTICS IN ENGINEERING


Introduction
This chapter discusses the important role of statistics in the engineering field, followed by the
sources of variation in data collected for decision making. These topics form the foundation for
the data analysis topics that follow.
Learning Outcome
Identify the role that statistics can play in engineering.
Discuss how variability affects the data collected and used for engineering decisions.
Discuss the different methods that engineers use to collect data.
Engineer's Role
Solve problems through the efficient application of scientific principles by:
Refining existing products
Designing new products or processes

Engineering Methods

[Flow diagram of the engineering method. Boxes: problem description; factors identification; scientific model proposal; data collection; manipulation; validation; conclusion.]

Statistics

Statistics is the science dealing with the collection, presentation, analysis, and use of data to make
decisions, solve problems, and design products and processes.
Data collection is planned in terms of the design of surveys and experiments.
The purpose is to extract information from data.
Statistical techniques are useful for describing and understanding variability and its potential sources.
Statistics relates to a population or a sample.
A sample is a chosen subset of the population, as opposed to compiling data about the entire group;
the information available is usually only partial information from the population.
Conclusions about the population are drawn from the information obtained in the sample.
Statistics relates to concepts of uncertainty (probability theory, probability distributions).

[Diagram: statistical analysis branches into description and inference.]

(i) Descriptive statistics
Summarize the population data by describing what was observed in the sample, numerically or graphically.
Numerical description:
- mean, standard deviation (continuous data)
- frequency, percentage (categorical data)
(ii) Inferential statistics
Uses patterns in the sample data to draw inferences about the population
Inferences:
- answering yes/no questions about the data (hypothesis testing)
- estimating numerical characteristics of the data (estimation)
- describing associations within the data (correlation)
- modeling relationships within the data (regression analysis)
Inference can extend to forecasting, prediction, and estimation of unobserved values; it can include
extrapolation and interpolation of time series, and can also include data mining.
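
The numerical descriptions listed under descriptive statistics can be sketched in a few lines of Python. This is only an illustration: the diameters and inspection outcomes below are hypothetical values made up for the sketch, not data from this topic.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical continuous data: shaft diameters in millimetres.
diameters_mm = [10.1, 9.9, 10.0, 10.2, 9.8]

# Hypothetical categorical data: inspection outcomes.
outcomes = ["pass", "pass", "fail", "pass", "rework", "pass"]

# Continuous data: summarize by mean and sample standard deviation.
diameter_mean = mean(diameters_mm)
diameter_sd = stdev(diameters_mm)

# Categorical data: summarize by frequency and percentage.
frequency = Counter(outcomes)
percentage = {k: 100 * v / len(outcomes) for k, v in frequency.items()}

print(f"mean = {diameter_mean:.2f} mm, sd = {diameter_sd:.3f} mm")
for category, count in frequency.items():
    print(f"{category}: {count} ({percentage[category]:.1f}%)")
```

Here `stdev` is the sample standard deviation (divisor n - 1), which is the usual choice when the data are a sample rather than the whole population.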

Data Collection
How data will be mathematically analyzed depends on how those data were collected
Experimental design and statistics go hand in hand!

[Diagram: statistical data collection branches into experimental, observational, census, and sample survey.]

Experimental:
1. Planning the research, including finding the number of replicates for the study, using
information such as preliminary estimates of the size of treatment effects, alternative
hypotheses, and the estimated experimental variability.
To allow an unbiased estimate of the difference in treatment effects, the experiment should
compare at least one new treatment with a standard treatment or control.
2. Design of experiments (DOE), using blocking to reduce the influence of confounding
variables, and randomized assignment of treatments to subjects to allow unbiased
estimates of treatment effects and experimental error.
3. Performing the experiment and analyzing the data.
4. Examining the data set in secondary analyses, to suggest new hypotheses for future study.
5. Documenting and presenting the results.
An experiment is a controlled study in which the researcher attempts to understand cause-and-effect
relationships.
The researcher controls:
how subjects are assigned to groups
which treatments each group receives.

In analysis, the researcher compares group scores on some dependent variable.
Based on the analysis, the researcher draws a conclusion about whether the treatment (independent
variable) had a causal effect on the dependent variable.
Observational:
Typically uses a survey or case-control study to collect observations about the area of interest
and then performs statistical analysis

Observational studies attempt to understand cause-and-effect relationships. However, unlike in
experiments, the researcher is not able to control:
how subjects are assigned to groups, and/or
which treatments each group receives.

Census:
Obtains data from every member of a population. In most studies, a census is not practical,
because of the cost and/or time required.

Sample survey:
Obtains data from a subset of a population, in order to estimate population attributes.
This subset of the population will be used to represent the whole population.
Numerical measures such as the mean, variance, and standard deviation are called parameters when
they describe a population and statistics when they describe a sample.
For a sample to be used as a guide to an entire population, it is important that it is truly
representative of that overall population. Representative sampling assures that the inferences and
conclusions can be safely extended from the sample to the population as a whole.
Statistics offers methods to estimate and correct for any bias or random trend within the sample and
the data collection procedures.
There are various ways to sample a population; random sampling is the most common. Randomness is
studied using the mathematical discipline of probability theory.
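
The difference between a census and a sample survey can be sketched as follows. The population here is hypothetical (1000 simulated values with an assumed mean of 50 and standard deviation of 5), and the sample size of 50 is an arbitrary choice for illustration.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1000 simulated measurements.
population = [rng.gauss(50.0, 5.0) for _ in range(1000)]

# Census: use every member of the population; the result is a parameter.
population_mean = mean(population)

# Sample survey: simple random sample without replacement; the result is a statistic.
sample = rng.sample(population, 50)
sample_mean = mean(sample)

print(f"population mean (parameter): {population_mean:.2f}")
print(f"sample mean (statistic):     {sample_mean:.2f}")
```

The sample mean will generally differ a little from the population mean; that difference is the sampling variability that inferential statistics is designed to quantify.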

-Example-
A researcher carries out a study to determine the average height of fifth graders in a particular
school district. If only boys were measured, the results would only apply to boys, not all fifth
graders, and would thus be biased, not random. To collect unbiased data, one would randomly
choose the same number of boys and girls from each fifth grade class to measure.

-Example-
An experimental design calls for observing what food items red ants bring back to their colony as
compared to black ants. You have too many ant colonies to observe all of them, so you pick a
random sample of 5 colonies of each ant type to observe. An easy way to choose randomly is by
giving each colony a number or letter on a slip of paper. Put these in a basket and pull 5 slips for
each ant colony type. This way there is no bias toward any particular colonies.
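
Drawing slips from a basket corresponds to sampling without replacement. A minimal sketch, assuming hypothetical colony labels and colony counts (12 red and 9 black colonies are invented for illustration):

```python
import random

# Hypothetical colony labels; the counts (12 red, 9 black) are assumptions.
red_colonies = [f"R{i}" for i in range(1, 13)]
black_colonies = [f"B{i}" for i in range(1, 10)]


def draw_slips(colonies, k=5, seed=None):
    """Draw k colonies without replacement, like pulling slips from a basket."""
    return random.Random(seed).sample(colonies, k)


red_sample = draw_slips(red_colonies, seed=0)
black_sample = draw_slips(black_colonies, seed=1)
print("red colonies to observe:  ", red_sample)
print("black colonies to observe:", black_sample)
```

The seeds only make the sketch repeatable; in a real study the draw would not be fixed in advance.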

-Example-
In drug trials, fifty out of one hundred people are randomly chosen to receive the drug, while the
other fifty receive a placebo.
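
Random assignment of this kind can be sketched by shuffling the list of participants and splitting it in half. The participant IDs 1 to 100 are hypothetical.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable
participants = list(range(1, 101))  # hypothetical participant IDs 1..100

# Shuffle a copy, then split: first half gets the drug, second half the placebo.
shuffled = participants[:]
rng.shuffle(shuffled)
drug_group = sorted(shuffled[:50])
placebo_group = sorted(shuffled[50:])

print("drug group size:   ", len(drug_group))
print("placebo group size:", len(placebo_group))
```

Every participant has the same chance of landing in either group, so the groups differ only by chance before the treatment is applied.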

How many study subjects are needed?

Each study subject is called an experimental unit or replicate.
Repeating a measure more than once is replicating the units.
Replication is a must! Replication means having more than one experimental unit that will be
subjected to the independent variable or treatment.
Reasons for replicating:
1) Organisms die or don't perform.
2) To calculate averages or other statistics, you must have more than one measurement.
-Example-
Three plants are each given a different amount of water. Plant 1 receives 0.1 L/day, Plant 2 receives
0.5 L/day and Plant 3 receives 1 L/day. Only one plant receives a particular amount of water each
day. Determine the number of replicates in this experiment and state whether data analysis can
be done.

-Example-
Three plants receive 0.1 L/day, three receive 0.5 L/day, and three receive 1 L/day. With three plants
in each treatment group, data analysis such as determining the averages is carried out. Determine
the number of replicates in this experiment and state whether data analysis can be done.
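
The two watering designs can be compared with a short sketch. The growth values are hypothetical; only the water amounts come from the examples. With a single plant per treatment the standard deviation cannot be computed, which is one concrete way to see why replication matters.

```python
from statistics import StatisticsError, mean, stdev

# Hypothetical growth (cm) per plant, keyed by water given in L/day.
one_plant_each = {0.1: [4.0], 0.5: [7.5], 1.0: [9.0]}
three_plants_each = {
    0.1: [4.0, 4.6, 3.8],
    0.5: [7.5, 8.1, 7.2],
    1.0: [9.0, 8.4, 9.6],
}


def summarize(groups):
    """Return {treatment: (n, mean, sd)}; sd is None when n < 2."""
    summary = {}
    for treatment, values in groups.items():
        try:
            spread = stdev(values)
        except StatisticsError:
            spread = None  # variability cannot be estimated from one unit
        summary[treatment] = (len(values), mean(values), spread)
    return summary


for name, design in [("one plant each", one_plant_each),
                     ("three plants each", three_plants_each)]:
    print(name)
    for treatment, (n, avg, spread) in summarize(design).items():
        print(f"  {treatment} L/day: n = {n}, mean = {avg:.2f}, sd = {spread}")
```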

Pseudoreplication
Taking multiple measurements on the same experimental unit and treating each measurement
as an independent data point is not true replication.
Pseudoreplication should always be avoided because the results are not scientifically valid.
-Example-
Using one plant for an experiment measuring the effect of nitrogen on growth and counting each
branch as a separate experimental unit or replicate would be an example of pseudoreplication.
You need to use multiple separate plants for each treatment.
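
One common way to avoid pseudoreplication is to collapse repeated measurements to a single value per experimental unit before analysis. The sketch below assumes hypothetical branch lengths and treatment names; the plant, not the branch, is treated as the replicate.

```python
from statistics import mean

# Hypothetical branch lengths (cm): treatment -> plant -> branch measurements.
branch_lengths = {
    "low N": {
        "plant 1": [10.0, 11.0, 12.0],
        "plant 2": [9.0, 10.0],
        "plant 3": [12.0, 12.0, 12.0],
    },
    "high N": {
        "plant 4": [14.0, 15.0],
        "plant 5": [13.0, 13.0, 16.0],
        "plant 6": [15.0],
    },
}


def plant_level_values(plants):
    """Collapse branch measurements to one value per plant (the true replicate)."""
    return [mean(branches) for branches in plants.values()]


for treatment, plants in branch_lengths.items():
    per_plant = plant_level_values(plants)
    n_branches = sum(len(b) for b in plants.values())
    print(f"{treatment}: {n_branches} branch measurements, "
          f"n = {len(per_plant)} replicates, mean = {mean(per_plant):.2f} cm")
```

Treating the branch measurements as the sample size would overstate how much independent evidence there is; the number of plants is the number of replicates.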

Controlled experiment
An experiment where only one variable or factor is manipulated and all other variables are held
constant. An experiment is controlled if the only factor that is allowed to vary is the independent
variable (treatment). All other factors are kept as constant as possible.
Control
An experimental unit that is subjected to all the same conditions as the treated units, except that
the control does not receive an actual treatment or receives only a placebo.
Blind study
The people collecting and analyzing the data do not know which experimental units received which
treatments. Only after the data are analyzed are the treatments revealed, or decoded. The purpose
is to reduce any human bias toward an expected outcome.
-Example-
If the pots have coded stickers on the bottom that only the treatment students understand, then the
data takers will not know which plants are getting which treatment and that will reduce their bias
(preconceived expectations), and the data will be more objective and reliable. Labels can be as
simple as T1-1, T1-2, T1-3, T2-1...T2-3, and T3-1...T3-3. T1, T2 and T3 stand for the treatment
(5 g N, 10 g N or 0 g N). The numerals after the dash number each pot within the treatment group.
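
The labelling scheme in this example can be sketched as follows. The label format and treatment amounts come from the example; the function names and the `height_cm` field are hypothetical, introduced only for illustration.

```python
import random

# Decoding key, held only by the treatment students.
treatments = {"T1": "5 g N", "T2": "10 g N", "T3": "0 g N"}

# Pot labels T1-1 ... T3-3: treatment code, dash, pot number within the group.
pot_labels = [f"{code}-{pot}" for code in treatments for pot in range(1, 4)]


def make_blind_sheet(seed=None):
    """Build a data sheet listing pots in random order, with no treatment shown."""
    labels = pot_labels[:]
    random.Random(seed).shuffle(labels)
    return [{"pot": label, "height_cm": None} for label in labels]


def decode(label):
    """Look up the treatment for a pot label; meant for use only after analysis."""
    return treatments[label.split("-")[0]]


sheet = make_blind_sheet(seed=0)
print([row["pot"] for row in sheet])
print("T3-2 decodes to:", decode("T3-2"))
```

The data takers work only from the sheet; the key stays with the treatment students until the analysis is finished.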

Data Collection: Pros and Cons
Resources
When the population is large, a sample survey has a big resource advantage over a census. A well-designed sample survey can provide very precise estimates of population parameters - quicker,
cheaper, and with less manpower than a census.
Generalizability
Generalizability refers to the appropriateness of applying findings from a study to a larger
population. Generalizability requires random selection. If participants in a study are randomly
selected from a larger population, it is appropriate to generalize study results to the larger
population; if not, it is not appropriate to generalize.
Observational studies do not feature random selection; so generalizing from the results of an
observational study to a larger population can be a problem.
Causal inference
Cause-and-effect relationships can be teased out when subjects are randomly assigned to groups.
Therefore, experiments, which allow the researcher to control assignment of subjects to treatment
groups, are the best method for investigating causal relationships.
Data Recording
Counting (raw numbers)
Collecting numerical data begins with counts, called raw numbers, such as the number of flowers on
the plants. Write the numbers on a data sheet or in a science journal, then graph them or put them
in a table.
Pictures, drawings
Sometimes the data collected is in the form of a drawing when recording variables such as shape
and color. Drawings are usually necessary for presentations to help explain to an audience what
the experiment was, how it was conducted, and the results.
Non-numerical data
In some experiments the data to be collected is not numerical in nature. It might be color change,
intensity of color, or some other qualitative measure such as high, low, or medium light.

-GROUP ACTIVITY-

Engineering & Statistics


Engineers solve many types of engineering problems that must be calculated precisely, often with
little recorded data.

Statistical techniques are essential for:
(a) determining exact measurements
(b) quality control for improvements
(c) designing and building products and structures
(d) calculating the time a job requires and the number of human resources needed
-Example-
An engineer is designing a nylon connector to be used in an automotive engine application. The
engineer is considering establishing the design specification on wall thickness at 3/32 inch, but is
somewhat uncertain about the effect of this decision on the connector pulloff force. If the pulloff
force is too low, the connector may fail when it is installed in an engine. Eight prototype units are
produced and their pulloff forces measured (in pounds):
12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1.

The dot diagram is a very useful plot for displaying a small body of data, say up to
about 20 observations.

This plot allows us to see easily two features of the data: the location, or the middle, and
the scatter, or variability.
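
The location and scatter of the eight pulloff force measurements can also be computed directly. The sketch below uses the data from the example, prints a rough text version of a dot diagram, and takes the sample mean as the measure of location and the sample standard deviation as the measure of scatter (that choice of summary statistics is an assumption of the sketch).

```python
from collections import Counter
from statistics import mean, stdev

# Pulloff forces (pounds) of the eight prototype connectors.
pulloff_force = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]

location = mean(pulloff_force)   # the "middle" of the data
scatter = stdev(pulloff_force)   # sample standard deviation (divisor n - 1)

# Text dot diagram: one asterisk per observation at each distinct value.
counts = Counter(pulloff_force)
for value in sorted(counts):
    print(f"{value:5.1f} | {'*' * counts[value]}")

print(f"mean = {location:.2f} lb, s = {scatter:.2f} lb")
```

For these data the mean is 13.0 pounds and the sample standard deviation is about 0.48 pounds.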

The engineer considers an alternate design; eight prototypes are built and their pulloff
forces are measured.


The dot diagram can be used to compare two sets of data.

Since pulloff force varies or exhibits variability, it is a random variable.

A random variable, X, can be modeled by:

$X = \mu + \epsilon$
where
$\mu$ = constant
$\epsilon$ = random disturbance.
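
The idea of a random variable as a constant plus a random disturbance can be illustrated by simulation. This is a sketch under stated assumptions: the disturbance is taken to be normally distributed with mean zero, and the values 13.0 and 0.5 for the constant and the spread are arbitrary choices, none of which is specified in this topic.

```python
import random
from statistics import mean, stdev


def simulate(mu, sigma, n, seed=None):
    """Generate n observations of X = mu + disturbance.

    Assumes the disturbance is normal with mean 0 and standard deviation sigma.
    """
    rng = random.Random(seed)
    return [mu + rng.gauss(0.0, sigma) for _ in range(n)]


observations = simulate(mu=13.0, sigma=0.5, n=1000, seed=7)
print(f"first five observations: {[round(x, 2) for x in observations[:5]]}")
print(f"sample mean = {mean(observations):.2f}, sample sd = {stdev(observations):.2f}")
```

Individual observations vary because of the disturbance, while the average of many observations settles near the constant.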

Issues in Engineering Statistics applications


Misuse of statistics
Can produce subtle but serious errors in description and interpretation: subtle in the sense that
even experienced professionals make such errors, and serious in the sense that they can lead to
devastating decision errors.
For instance, social policy, medical practice, and the reliability of structures such as buildings and
bridges all rely on the proper use of statistics.
Statistical significance
Even when statistical techniques are correctly applied, the results can be difficult to interpret for
those lacking expertise.
The statistical significance of a trend in the data, which measures the extent to which the trend
could be caused by random variation in the sample, may or may not agree with an intuitive
sense of its significance.

Practice
1. Which of the following statements are true?
I. A sample survey is an example of an experimental study.
II. An observational study requires fewer resources than an experiment.
III. The best method for investigating causal relationships is an observational study.
(A) I only
(B) II only
(C) III only
(D) All of the above.
(E) None of the above.
2. Which of the following statements are true?
I. The mean of a population is denoted by x̄.
II. Sample size is never bigger than population size.
III. The population mean is a statistic.
(A) I only.
(B) II only.
(C) III only.
(D) All of the above.
(E) None of the above.
3. Hypothesis testing and estimation are both types of descriptive statistics.
(A) True
(B) False
4. A set of data organized in a participants(rows)-by-variables(columns) format is known as a
data set.
(A) True
(B) False
5. A graph that uses vertical bars to represent data is called a ____.
(A) Line graph
(B) Bar graph
(C) Scatterplot
(D) Vertical graph
6. The goal of ___________ is to focus on summarizing and explaining a specific set of data.
(A) Inferential statistics
(B) Descriptive statistics
(C) None of the above
(D) All of the above

7. A _______ is a numerical characteristic of a sample and a ______ is a numerical characteristic
of a population.
(A) Sample, population
(B) Population, sample
(C) Statistic, parameter
(D) Parameter, statistic
8. A sampling distribution might be based on which of the following?
(A) Sample means
(B) Sample correlations
(C) Sample proportions
(D) All of the above
9. "The car will probably cost about 16,000 dollars"; this number sounds more like a(n):
(A) Point estimate
(B) Interval estimate
10. The use of the laws of probability to make inferences and draw statistical conclusions about
populations based on sample data is referred to as ___________.
(A) Descriptive statistics
(B) Inferential statistics
(C) Sample statistics
(D) Population statistics
11. Which of the following are principles of questionnaire construction?
(A) Consider using multiple methods when measuring abstract constructs
(B) Use multiple items to measure abstract constructs
(C) Avoid double-barreled questions
(D) All of the above
(E) Only B and C
12. Which of these is not a method of data collection?
(A) Questionnaires
(B) Interviews
(C) Experiments
(D) Observations
13. Secondary/existing data may include which of the following?
(A) Official documents
(B) Personal documents
(C) Archived research data
(D) All of the above

14. Which of the following terms best describes data that were originally collected at an earlier
time by a different person for a different purpose?
(A) Primary data
(B) Secondary data
(C) Experimental data
(D) Field notes
15. Researchers use both open-ended and closed-ended questions to collect data. Which of the
following statements is true?
(A) Open-ended questions directly provide quantitative data based on the researcher's
predetermined response categories
(B) Closed-ended questions provide quantitative data in the participants' own words
(C) Open-ended questions provide qualitative data in the participants' own words
(D) Closed-ended questions directly provide qualitative data in the participants' own words
16. Open-ended questions provide primarily ______ data.
(A) Confirmatory data
(B) Qualitative data
(C) Predictive data
(D) None of the above
17. Which of the following is true concerning observation?
(A) It takes less time than self-report approaches
(B) It costs less money than self-report approaches
(C) It is often not possible to determine exactly why the people behave as they do
(D) All of the above
18. Qualitative observation is usually done for exploratory purposes; it is also called
___________ observation.
(A) Structured
(B) Naturalistic
(C) Complete
(D) Probed
19. Another name for a Likert Scale is a(n):
(A) Interview protocol
(B) Event sampling
(C) Summated rating scale
(D) Ranking
20. Which of the following is not one of the six major methods of data collection that are used by
educational researchers?
(A) Observation
(B) Interviews
(C) Questionnaires
(D) Checklists


21. The type of interview in which the specific topics are decided in advance but the sequence
and wording can be modified during the interview is called:
(A) The interview guide approach
(B) The informal conversational interview
(C) A closed quantitative interview
(D) The standardized open-ended interview
22. Which one of the following is not a major method of data collection?
(A) Questionnaires
(B) Interviews
(C) Secondary data
(D) Focus groups
(E) All of the above are methods of data collection
23. A census taker often collects data through which of the following?
(A) Standardized tests
(B) Interviews
(C) Secondary data
(D) Observations
24. The researcher has secretly placed him or herself (as a member) in the group that is being
studied. This researcher may be which of the following?
(A) A complete participant
(B) An observer-as-participant
(C) A participant-as-observer
(D) None of the above
25. Which of the following is not a major method of data collection?
(A) Questionnaires
(B) Focus groups
(C) Correlational method
(D) Secondary data
