
METHODS OF DATA COLLECTION

Applied Statistics and Computing Lab, Indian School of Business


Learning goals
Importance of data
How do we obtain data?
Randomization
Sampling


Data
Recorded, relevant information
Not necessarily just numbers
Any relevant facts, figures, observations or descriptions of things
But why do we need it?


Why data?
Basis / raw material for analysis
Helps provide answers to questions we cannot answer right away
Empirical support for (or against) theories
How do we answer this question:
  Do individuals with larger waist circumferences have a larger intra-abdominal adipose tissue (AT) area?
Good idea to gather relevant information
Study a few individuals, since it is not possible to study all individuals
Data:
  Age
  Waist circumference
  AT area
Try to find a model to relate these



Sources: Primary
Original sources from which researchers directly collect data
First-hand information
Collected through:
  Observation (AT-WC)
  Interviewing (Stores-Customers)
Classified as:
  Complete enumeration
  Sampling
  Designed experiment
Researcher can collect data accurately and according to research needs



Complete enumeration
One way to gather useful information
Collect data from each and every unit in the population
Applied where information on all units under study is needed:
  Population census (population: all individuals in the country)
  Preparation of voters list (population: all individuals with voter IDs)
  Selection from many applicants for a job, etc.
  Social networking sites trace all your activities, online shopping, etc.

Sources: Primary


Is complete enumeration the best method?


Expensive in time and money
Does not necessarily provide accurate information; not free of errors (non-sampling errors):
  Even in a census there is always over/under enumeration, i.e. incomplete coverage
  Tabulation/data entry errors
  Errors due to the difficulty of organizing and implementing surveys of large magnitude

Most importantly, complete enumeration might not be necessary


Need for Sampling


Only part of the population is studied
Under what circumstances is it preferred over complete enumeration?
  Limited resources
  Provision for a permissible error exists
  Destructive studies

The idea is that if the sample is representative of the population it is taken from, the results derived from it can be generalized to the population.
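
The idea of generalizing from a sample can be sketched in a few lines. The population below is simulated purely for illustration (it is not data from this lecture), and the sample size of 200 is an arbitrary choice.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated measurements (not real data).
population = [rng.gauss(100, 15) for _ in range(10_000)]

# Simple random sample: every unit has the same chance of being chosen.
sample = rng.sample(population, 200)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean:     {population_mean:.2f}")
print(f"sample mean (n=200): {sample_mean:.2f}")
```

With a sample drawn this way, the sample mean should usually land close to the population mean, which is what lets us generalize from part of the population.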

Sampling: Example
The cars data
  Not possible to study every used car in the US
  Sample of used cars
  Cars of all makes, models, engine sizes, etc. included

AT-WC example:
  Data only on men
  Not representative if interest is in measuring CVD risk for women


Representative Samples
Which samples are representative?
Samples are biased if some characteristics of the population are over/under represented
  Results from these samples cannot be adjusted

We need unbiased and representative samples


Randomization
To overcome bias
  Shuffling a deck of cards before dealing
  Stirring a pot of soup before tasting

Random selection is fair
Random sampling means sampling based on each population unit having a particular chance of being selected into the sample
  Example: in a population with 60% men and 40% women, a sample drawn with women having a greater chance of selection
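
A small simulation can show what the 60%/40% example looks like in practice. This is only a sketch: the population size and the selection chances (10%, 5%, 20%) are assumed values chosen for illustration, and `draw_sample` and `share` are hypothetical helper names.

```python
import random

def draw_sample(population, chance_by_group, rng):
    """Include each unit independently with the chance assigned to its group."""
    return [unit for unit in population if rng.random() < chance_by_group[unit]]

def share(units, group):
    """Fraction of units that belong to the given group."""
    return sum(1 for unit in units if unit == group) / len(units)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Population with 60% men and 40% women, as in the example above.
population = ["man"] * 6000 + ["woman"] * 4000

# Assumed selection chances, for illustration only.
equal_chance = draw_sample(population, {"man": 0.10, "woman": 0.10}, rng)
women_favoured = draw_sample(population, {"man": 0.05, "woman": 0.20}, rng)

print(f"women in population:            {share(population, 'woman'):.2f}")
print(f"women in equal-chance sample:   {share(equal_chance, 'woman'):.2f}")
print(f"women in women-favoured sample: {share(women_favoured, 'woman'):.2f}")
```

When every unit has the same chance, the share of women in the sample stays near 40%. When women have the greater chance, they are over-represented in the sample.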

Sampling: Advantages
Time and money saved
More accurate/reliable information
Operational flexibility
Not a new idea! Applied in day-to-day life:
  Taste only a spoonful of soup
  Only a handful of grains is examined before buying the entire sack

Sampling: Disadvantages
Sampling errors:
  Bound to occur
  Due to the fact that only a part of the population is considered

Non-sampling errors as well
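
Sampling error can be made visible by drawing many samples from the same population and seeing how much the sample means differ. The sketch below uses a simulated population and arbitrary sample sizes (25, 100, 400). These are assumptions for illustration, and `spread_of_sample_means` is a hypothetical helper name.

```python
import random
import statistics

def spread_of_sample_means(population, n, repeats, rng):
    """Standard deviation of the sample mean over repeated simple random samples."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 5,000 simulated values between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(5_000)]

spreads = {n: spread_of_sample_means(population, n, 300, rng) for n in (25, 100, 400)}

for n, spread in spreads.items():
    print(f"n = {n:3d}: sample means vary with std. dev. {spread:.2f}")
```

The sample means never agree exactly because each sample sees only part of the population, but the variation shrinks as the sample size grows.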


Sampling: Trivia
More widely used than complete enumeration
Sampling method depends on the question at hand


Source: Primary Experimental Design


An experiment is a test
Conducted to study a process or a system
Outcome not known with certainty
Investigator can control for many conditions
Data arise from such experiments


Experimental Design, Example

Source: Primary

To conduct an experiment on eye focus time
Effect of distance of object from eye on focus time
4 different distances and 5 subjects
Differences between individuals (more on this in future sessions)
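
The slides do not say in what order each subject was tested. One common way to apply randomization here, shown only as a sketch under that assumption, is to shuffle the order of the four distances separately for each subject. The distance values are taken from the results table for this experiment.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

distances = [4, 6, 8, 10]   # the four distances used in the experiment
subjects = [1, 2, 3, 4, 5]  # the five subjects

# Assumed procedure: each subject sees all four distances in a random order.
run_order = {}
for subject in subjects:
    order = distances[:]    # copy, so the original list is left untouched
    rng.shuffle(order)
    run_order[subject] = order

for subject, order in run_order.items():
    print(f"subject {subject}: test distances in order {order}")
```

Because every subject is tested at every distance, differences between individuals can later be separated from the effect of distance.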


Experimental Design, Example


The data collected from the experiment were:

              DISTANCE (ft)
SUBJECT     4     6     8    10
   1       10     7     5     6
   2        6     6     3     4
   3        6     6     3     4
   4        6     1     2     2
   5        6     6     5     3
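
A first look at these results is to average the focus times by distance and by subject. The sketch below uses the values from the table above. The variable names are illustrative, and no formal analysis is attempted.

```python
from statistics import mean

distances = [4, 6, 8, 10]

# Focus times from the table: one row per subject, one column per distance.
focus_time = {
    1: [10, 7, 5, 6],
    2: [6, 6, 3, 4],
    3: [6, 6, 3, 4],
    4: [6, 1, 2, 2],
    5: [6, 6, 5, 3],
}

# Average over subjects for each distance (column means).
mean_by_distance = {
    d: mean(row[i] for row in focus_time.values())
    for i, d in enumerate(distances)
}

# Average over distances for each subject (row means).
mean_by_subject = {s: mean(row) for s, row in focus_time.items()}

for d, m in mean_by_distance.items():
    print(f"distance {d:2d}: mean focus time {m:.2f}")
for s, m in mean_by_subject.items():
    print(f"subject {s}: mean focus time {m:.2f}")
```

The distance means come out as 6.8, 5.2, 3.6 and 3.8, and the subject means range from 2.75 to 7.0, which shows the differences between individuals mentioned earlier.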

Source: Primary


Experimental Design: Advantages and Disadvantages

Advantages:
  Accurate, as we can control for error
  Random observations (we do not know results in advance)

Disadvantages:
  Experimenter bias
  Cannot design experiments for all studies (cars example)


Source: Secondary
Data that have already been collected and compiled
Readily available
E.g.: scores data! The school already has a mark sheet for each student!


Source: Secondary (contd.)


Published sources
  Govt. publications (RBI, Labour Gazette, NSSO, etc.)
  International bodies (WHO, IBRD, IMF, etc.)
  Research scholars
Other secondary sources
  Compustat
  Prowess
  IndiaStat

Secondary sources: Advantages


Easily available
First-hand information might not be easy to collect


Secondary sources: Disadvantages


Format
Missing data
Outdated:
  Want to study poverty rates, but the definition of the poverty line has changed over the years
Different sources:
  Data available from many sources
  Have to be merged
  E.g.: we have prices from one secondary source in dollars and other data in rupees from another source. Merging them is not easy.
  Need to consider exchange rates at different times and purchasing power parity differences
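
The merging problem can be sketched as follows. Every number here is hypothetical (the prices, the rupee values and the exchange rates are made up for illustration), and `to_usd` is a hypothetical helper. The sketch only converts with a year-specific exchange rate and does not attempt a purchasing power parity adjustment.

```python
# All figures below are hypothetical and for illustration only.
usd_prices = {2005: 12.0, 2010: 15.0}    # source A: prices in dollars, by year
inr_values = {2005: 450.0, 2010: 700.0}  # source B: other data in rupees, by year
inr_per_usd = {2005: 45.0, 2010: 46.0}   # assumed exchange rate for each year

def to_usd(amount_inr, year, rates):
    """Convert a rupee amount to dollars using that year's exchange rate."""
    return amount_inr / rates[year]

# Merge only the years present in both sources, after converting to one currency.
common_years = sorted(usd_prices.keys() & inr_values.keys())
merged = {
    year: {
        "price_usd": usd_prices[year],
        "value_usd": to_usd(inr_values[year], year, inr_per_usd),
    }
    for year in common_years
}

for year, row in merged.items():
    print(f"{year}: price {row['price_usd']:.2f} USD, value {row['value_usd']:.2f} USD")
```

Using a single exchange rate for all years would distort the comparison, which is why the rate is looked up per year.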


Thank you
