Chi-square Distribution
A distribution obtained by multiplying the ratio of sample variance to population variance by the degrees of
freedom when random samples are selected from a normally distributed population.
Contingency Table
Data arranged in table form for the chi-square independence test.
Expected Frequency
The frequencies obtained by calculation.
Goodness-of-fit Test
A test to see if a sample comes from a population with the given distribution.
Independence Test
A test to see if the row and column variables are independent.
Observed Frequency
The frequencies obtained by observation. These are the sample frequencies.
The chi-square (χ²) distribution is obtained from the values of the ratio of the sample variance to the population
variance, multiplied by the degrees of freedom. This occurs when the population is normally distributed with
population variance σ².
Chi-square is non-negative. It is the ratio of two non-negative values, so it must be non-negative itself.
Chi-square is non-symmetric.
There are many different chi-square distributions, one for each degree of freedom.
The degrees of freedom when working with a single population variance are n-1.
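As a rough illustration of the definition above, the statistic for a single population variance can be computed as (n-1) times the sample variance divided by the claimed population variance. The function name, the sample values and the claimed variance of 4.0 are made up for this sketch.

```python
import statistics


def chi_square_variance_stat(sample, sigma_sq):
    """Return (statistic, degrees of freedom) for a single population variance."""
    n = len(sample)
    s_sq = statistics.variance(sample)  # sample variance (divides by n - 1)
    df = n - 1
    # Ratio of sample variance to population variance, times the degrees of freedom.
    return df * s_sq / sigma_sq, df


# Hypothetical sample and claimed population variance (assumptions for the demo).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
stat, df = chi_square_variance_stat(sample, sigma_sq=4.0)
print(stat, df)
```

Because both variances are non-negative, the result can never be negative, which matches the first property listed above.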
Chi-Square Probabilities
Since the chi-square distribution isn't symmetric, the method for looking up left-tail values is different from the
method for looking up right-tail values.
Area to the left - the table requires the area to the right, so subtract the given area from one and
look this area up in the table.
Area in both tails - divide the area by two. Look up this area for the right critical value and one
minus this area for the left critical value.
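The lookup rules above amount to a small conversion from the area you are given to the right-tail area the table expects. Here is a minimal sketch; the function name and the "right" case (use the area as given, which follows from the table being a right-tail table) are assumptions added for completeness.

```python
def table_lookup_areas(area, tail):
    """Convert a given area into the right-tail area(s) to look up in the table."""
    if tail == "right":
        # Assumed case: the table already uses the area to the right.
        return (area,)
    if tail == "left":
        # The table needs the area to the right, so subtract from one.
        return (1 - area,)
    if tail == "two":
        half = area / 2
        # (lookup for the left critical value, lookup for the right critical value)
        return (1 - half, half)
    raise ValueError("tail must be 'right', 'left' or 'two'")


print(table_lookup_areas(0.05, "left"))
print(table_lookup_areas(0.10, "two"))
```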
DF which aren't in the table
When the degrees of freedom aren't listed in the table, there are a couple of choices that you have.
You can interpolate. This is probably the more accurate way. Interpolation involves estimating the
critical value by figuring how far the given degrees of freedom are between the two df in the table and going that
far between the critical values in the table. Most people born in the 70's didn't have to learn interpolation in high
school because they had calculators which would do logarithms (we had to use tables in the "good old" days).
You can go with the critical value which is less likely to cause you to reject in error (type I error).
For a right tail test, this is the critical value further to the right (larger). For a left tail test, it is the value further to
the left (smaller). For a two-tail test, it's the value further to the left and the value further to the right. Note, it is
not the column with the degrees of freedom further to the right, it's the critical value which is further to the
right. The Bluman text has this wrong on page 422. The guideline is right, the instructions are wrong.
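Both choices can be sketched in a few lines. The example assumes the needed degrees of freedom are 35 and the table only lists 30 and 40; the two critical values are the usual table entries for a right-tail area of 0.05. The function names are made up for this illustration, and the second function handles one critical value at a time (for a two-tail test, call it once per tail).

```python
def interpolate_critical_value(df, df_low, cv_low, df_high, cv_high):
    """Linearly interpolate a critical value for a df between two table rows."""
    # How far the given df lies between the two tabled df values.
    fraction = (df - df_low) / (df_high - df_low)
    # Go that same fraction of the way between the two critical values.
    return cv_low + fraction * (cv_high - cv_low)


def conservative_critical_value(cv_low, cv_high, tail):
    """Pick the bracketing critical value less likely to cause a Type I error."""
    if tail == "right":
        return max(cv_low, cv_high)  # further to the right (larger)
    if tail == "left":
        return min(cv_low, cv_high)  # further to the left (smaller)
    raise ValueError("tail must be 'right' or 'left'")


# Table entries for right-tail area 0.05 at df = 30 and df = 40.
cv_30, cv_40 = 43.773, 55.758
print(interpolate_critical_value(35, 30, cv_30, 40, cv_40))
print(conservative_critical_value(cv_30, cv_40, "right"))
```

Note that the conservative choice compares the critical values themselves, not the degrees of freedom.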
Stats: Goodness-of-fit Test
The idea behind the chi-square goodness-of-fit test is to see if the sample comes from the population with the
claimed distribution. Another way of looking at that is to ask if the frequency distribution fits a specific pattern.
Two values are involved, an observed value, which is the frequency of a category from a sample, and the
expected frequency, which is calculated based upon the claimed distribution. The derivation of the formula is very
similar to that of the variance which was done earlier (chapter 2 or 3).
The idea is that if the observed frequency is really close to the claimed (expected) frequency, then the square of
the deviations will be small. The square of the deviation is divided by the expected frequency to weight
frequencies. A difference of 10 may be very significant if 12 was the expected frequency, but a difference of 10
isn't very significant at all if the expected frequency was 1200.
If the sum of these weighted squared deviations is small, the observed frequencies are close to the expected
frequencies and there would be no reason to reject the claim that it came from that distribution. Only when the
sum is large is there a reason to question the distribution. Therefore, the chi-square goodness-of-fit test is always
a right tail test.
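The weighting idea can be checked numerically. This is only a sketch: the function name is made up, and the two single-category comparisons reuse the "difference of 10" example from the text with hypothetical observed values of 22 and 1210.

```python
def chi_square_statistic(observed, expected):
    """Sum of squared deviations, each divided by its expected frequency."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Same difference of 10, very different contribution to the statistic.
off_by_10_of_12 = chi_square_statistic([22], [12])        # 100 / 12
off_by_10_of_1200 = chi_square_statistic([1210], [1200])  # 100 / 1200
print(off_by_10_of_12, off_by_10_of_1200)
```

Since every term is a square divided by a positive number, only large values of the sum count against the claimed distribution, which is why the test is right-tailed.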
The test statistic has a chi-square distribution when the following assumptions are met:
The data are the observed frequencies. This means that there is only one data value for each
category. Therefore, ...
The degrees of freedom is one less than the number of categories, not one less than the sample
size.
The value of the test statistic doesn't change if the order of the categories is switched.
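A small worked example may help tie these points together. It assumes a hypothetical die rolled 60 times with a claimed fair distribution; the function name and the counts are invented for the illustration. Expected frequencies come from the claimed proportions, and the degrees of freedom come from the number of categories.

```python
def goodness_of_fit(observed, claimed_proportions):
    """Return (test statistic, degrees of freedom) for a goodness-of-fit test."""
    n = sum(observed)
    # Expected frequency for each category under the claimed distribution.
    expected = [n * p for p in claimed_proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # one less than the number of categories
    return stat, df


# Hypothetical counts for faces 1..6 from 60 rolls (made-up data).
observed = [8, 12, 9, 11, 14, 6]
claimed = [1 / 6] * 6
print(goodness_of_fit(observed, claimed))

# Switching the order of the categories leaves the statistic unchanged
# (up to floating-point rounding).
print(goodness_of_fit(observed[::-1], claimed[::-1]))
```

Here the degrees of freedom are 5 (six categories), not 59 (sample size minus one).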
Stats: Test for Independence
In the test for independence, the claim is that the row and column variables are independent of each other. This is
the null hypothesis.
The multiplication rule says that if two events are independent, then the probability of both occurring is the
product of the probabilities of each occurring. This is key to working the test for independence. If you end up
rejecting the null hypothesis, then the assumption of independence must have been wrong and the row and column
variables are dependent. Remember, all hypothesis testing is done under the assumption the null hypothesis is true.
The test statistic used is the same as the chi-square goodness-of-fit test. The principle behind the test for
independence is the same as the principle behind the goodness-of-fit test. The test for independence is always a
right tail test.
In fact, you can think of the test for independence as a goodness-of-fit test where the data is arranged into table
form. This table is called a contingency table.
The test statistic has a chi-square distribution when the following assumptions are met:
The degrees of freedom are the degrees of freedom for the row variable times the degrees of
freedom for the column variable. It is not one less than the sample size; it is the product of the two degrees of
freedom.
The expected value is computed by taking the row total times the column total and dividing by the
grand total.
The value of the test statistic doesn't change if the order of the rows or columns is switched.
The value of the test statistic doesn't change if the rows and columns are interchanged (transpose
of the matrix).
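The points above can be sketched for a small contingency table. The 2 x 3 table of counts is hypothetical and the function name is made up; the expected value in each cell is the row total times the column total divided by the grand total, which is what the multiplication rule gives when the variables are independent.

```python
def independence_test(table):
    """Return (test statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    # (rows - 1) times (columns - 1), not sample size minus one.
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical observed counts (made-up data): 2 rows, 3 columns.
table = [[10, 20, 30],
         [20, 20, 20]]
print(independence_test(table))

# Interchanging rows and columns (the transpose) gives the same result.
transposed = [list(col) for col in zip(*table)]
print(independence_test(transposed))
```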
Clifford Woody: Research is a careful enquiry or examination in seeking facts or principles, a diligent
investigation to ascertain something.
Mouly: It is the process of arriving at dependable solutions to problems through the planned and systematic
collection, analysis and interpretation of data.
Research is a logical and systematic search for new and useful information on a particular topic. It is an
investigation of finding solutions to scientific and social problems through objective and systematic analysis. It is
a search for knowledge, that is, a discovery of hidden truths. Here knowledge means information about matters.
The information might be collected from different sources like experience, human beings, books, journals, nature,
etc. Research can lead to new contributions to existing knowledge. Only through research is it possible to
make progress in a field. Research drives civilization and determines the economic, social and political
development of a nation. The results of scientific research very often force a change in the philosophical view of
problems which extend far beyond the restricted domain of science itself.
Research is not confined to science and technology only. There are vast areas of research in other disciplines
such as languages, literature, history and sociology. Whatever might be the subject, research has to be an active,
diligent and systematic process of inquiry in order to discover, interpret or revise facts, events, behaviours and
theories. Applying the outcome of research for the refinement of knowledge in other subjects, or in enhancing the
quality of human life also becomes a kind of research and development.
Research is done with the help of study, experiment, observation, analysis, comparison and reasoning.
Research is in fact ubiquitous. For example, we know that cigarette smoking is injurious to health; heroin is
addictive; cow dung is a useful source of biogas; malaria is caused by the protozoan parasite Plasmodium; AIDS
(Acquired Immunodeficiency Syndrome) is caused by the virus HIV (Human Immunodeficiency Virus). How do
we know all this? We became aware of all this information only through research. More precisely, research seeks
predictions of events, explanations, relationships and theories for them.
Twitter
Pivoted from: Odeo
In 2005, Evan Williams and Biz Stone designed a platform to create, browse, and share podcasts. They were
making a bet that podcasting would become a mainstream medium for sharing news and broadcasting opinion.
They would eventually be proven correct, but not before Apple launched podcast support for iTunes in June.
Williams and Stone took a step back and researched the new market, exploring user adoption rates, technology,
and customer acquisition costs. They concluded that they had no real chance of competing against Apple.
Crucially, however, they didn't simply give up. They realized that the platform they had built had tremendous
scalability and potential. Suppose they doubled down on simplicity and just made a portal where people could
share what they were up to? They looked at existing social networks like Facebook, and researched customer
dissatisfaction. Users loved Facebook for photo-sharing and friend-snooping, but often found the News Feed to
be overwhelming and cluttered. Their new venture, Twitter, would provide a back-to-the-basics feed of
information, with a focus on news and celebrity. It seemed crazy, but they pulled it off, accomplishing one of the
most successful pivots of the 21st century.
Advantages of primary data are as follows:
The primary data are original and relevant to the topic of the research study so the degree of accuracy is very
high.
Primary data can be collected in a number of ways, such as interviews, telephone surveys and focus groups.
It can also be collected across national borders through email and post. It can include a large population
and wide geographical coverage.
Moreover, primary data is current and gives the researcher a more realistic view of the topic under
consideration.
Reliability of primary data is very high because it is collected by the concerned and reliable party.
Greater Control
Not only does primary research enable the marketer to focus on specific issues, it also enables the marketer to
have a higher level of control over how the information is collected. In this way the marketer can decide on
such issues as size of project (e.g., how many responses), location of research (e.g., geographic area) and time
frame for completing the project.
Proprietary Information
Information collected by the marketer using primary research is their own and is generally not shared with
others. Thus, information can be kept hidden from competitors and potentially offer an information
advantage to the company that undertook the primary research.
Disadvantages of primary data are as follows:
Cost
Compared to secondary research, primary data may be very expensive since there is a great deal of marketer
involvement and the expense in preparing and carrying out research can be high.
Time Consuming
To be done correctly, primary data collection requires the development and execution of a research plan. Going
from the start-point of deciding to undertake a research project to the end-point of having results often takes much
longer than acquiring secondary data.
Not Always Feasible
Some research projects, while potentially offering information that could prove quite valuable, are not within
the reach of a marketer. Many are just too large to be carried out by all but the largest companies and some are
not feasible at all. For instance, it would not be practical for McDonald's to attempt to interview every customer
who visits its stores on a certain day, since doing so would require hiring a huge number of researchers, an
unrealistic expense. Fortunately, as we will see in a later tutorial, there are ways for McDonald's to use other
methods (e.g., sampling) to meet its needs without talking with all customers.
Merits
Degree of accuracy is quite high.
Primary sources of data frequently include definitions of the various terms and units
used.