
Types of Data

Qualitative data

Qualitative data is a categorical measurement expressed not in terms of numbers, but rather
by means of a natural language description. In statistics, it is often used interchangeably with
"categorical" data.

Although we may have categories, the categories may have a structure to them. When there
is not a natural ordering of the categories, we call these nominal categories. Examples might
be gender, race, religion, or sport.

When the categories can be ordered, they are called ordinal variables. Categorical
variables that judge size (small, medium, large, etc.) are ordinal variables. Attitudes (strongly
disagree, disagree, neutral, agree, strongly agree) are also ordinal variables, though there
may be no objectively best or worst value among them. Note that the distance
between these categories is not something we can measure.

Quantitative data

Quantitative data is a numerical measurement expressed not by means of a natural
language description, but rather in terms of numbers. However, not all numbers are
continuous and measurable. For example, a social security number is a number, but not
something that one can meaningfully add or subtract.

Quantitative data are always associated with a scale of measure.

Probably the most common scale type is the ratio scale. Observations of this type are on a
scale that has a meaningful zero value and also an equidistant measure (i.e., the
difference between 10 and 20 is the same as the difference between 100 and 110). For
example, a 10-year-old girl is twice as old as a 5-year-old girl. Since you can measure zero
years, age is a ratio-scale variable. Money is another common ratio-scale quantitative
measure. Observations that you count are usually ratio-scale (e.g., number of widgets).

Branches of Statistics
Descriptive Statistics

Descriptive statistics deals with the collection and presentation of data. This is usually
the first part of a statistical analysis. It is not always as simple as it sounds: the
statistician needs to know how to design experiments, choose the right focus group,
and avoid the biases that so easily creep into an experiment.

Different areas of study require different kinds of analysis using descriptive statistics. For
example, a physicist studying turbulence in the laboratory needs average quantities
that vary over small intervals of time. The nature of this problem requires that physical
quantities be averaged from a host of data collected through the experiment.
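As a minimal sketch of what "descriptive statistics" means in practice, here is how one might summarize a set of repeated measurements with Python's standard library. The measurement values are made up for illustration:

```python
import statistics

# Hypothetical sample: repeated measurements from an experiment
measurements = [9.8, 10.1, 10.0, 9.9, 10.3, 9.7, 10.2]

mean = statistics.mean(measurements)    # central tendency of the sample
stdev = statistics.stdev(measurements)  # spread of the sample around its mean

print(f"mean = {mean:.3f}, sample std dev = {stdev:.3f}")
```

The mean and standard deviation describe the data at hand; drawing conclusions beyond this sample is the job of inferential statistics, discussed next.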
Inferential Statistics

Inferential statistics, as the name suggests, involves drawing the right conclusions from
the statistical analysis that has been performed using descriptive statistics. In the end, it
is the inferences that make studies important, and this aspect is dealt with in inferential
statistics.

Most predictions about the future, and generalizations about a population made by studying a
smaller sample, come under the purview of inferential statistics. Most social science
experiments study a small sample in order to determine how the
population in general behaves. By designing the right experiment, the researcher is able
to draw conclusions relevant to the study.

Independent Variable

Variables are properties or characteristics of some event, object, or person
that can take on different values or amounts (as opposed to constants, which
do not vary). When conducting research, experimenters often
manipulate variables. For example, an experimenter might compare the
effectiveness of four types of antidepressants. In this case, the variable is
"type of antidepressant." When a variable is manipulated by an
experimenter, it is called an independent variable.

Dependent Variable

The experiment seeks to determine the effect of the independent variable on
relief from depression. In this example, relief from depression is called a
dependent variable.

4 Types of Measurement
Nominal
Let's start with the easiest one to understand. Nominal scales are used for labeling
variables, without any quantitative value. Nominal scales could simply be called
"labels": examples include gender, race, or religion. Notice that all such categories are
mutually exclusive (no overlap) and none of them have any numerical significance.
A good way to remember all of this is that "nominal" sounds a lot like "name," and
nominal scales are kind of like names or labels.
Ordinal

With ordinal scales, it is the order of the values that is important and significant,
but the differences between them are not really known. Consider a five-point
happiness scale (Very Unhappy, Unhappy, OK, Happy, Very Happy). We know that a #4
is better than a #3 or a #2, but we don't know, and cannot quantify, how much better it
is. For example, is the difference between "OK" and "Unhappy" the same as the
difference between "Very Happy" and "Happy"? We can't say.

Ordinal scales are typically measures of non-numeric concepts like satisfaction,
happiness, or discomfort. "Ordinal" is easy to remember because it sounds like
"order," and that's the key with ordinal scales: it is the order that
matters, but that's all you really get from them.
Interval
Interval scales are numeric scales in which we know not only the order, but also the
exact differences between the values. The classic example of an interval scale is
Celsius temperature, because the difference between each value is the same. For
example, the difference between 60 and 50 degrees is a measurable 10 degrees, as
is the difference between 80 and 70 degrees. Time of day is another good example of an
interval scale in which the increments are known, consistent, and measurable. What
interval scales lack is a true zero: 0 degrees Celsius does not mean "no temperature,"
so it makes no sense to say that 20 degrees is "twice as hot" as 10 degrees.
Ratio
Ratio scales are the ultimate nirvana when it comes to measurement scales,
because they tell us about the order, they tell us the exact differences between values,
and they also have an absolute zero, which allows a wide range of both
descriptive and inferential statistics to be applied. At the risk of repeating myself,
everything above about interval data applies to ratio scales, plus ratio scales have a
clear definition of zero. Good examples of ratio variables include height and weight.

Ratio scales provide a wealth of possibilities when it comes to statistical analysis.
These variables can be meaningfully added, subtracted, multiplied, and divided (hence
"ratio"). Central tendency can be measured by mode, median, or mean; measures of
dispersion, such as the standard deviation and coefficient of variation, can also be
calculated from ratio scales.
Summary
In summary, nominal variables are used to name, or label, a series of values.
Ordinal scales provide good information about the order of choices, such as in a
customer satisfaction survey. Interval scales give us the order of values plus the
ability to quantify the difference between each one. Finally, ratio scales give us the
ultimate: order, interval values, plus the ability to calculate ratios, since a true zero
can be defined.
Discrete probability distribution

A discrete random variable is a random variable that can take on any value from a
discrete set of values. The set of possible values could be finite, such as in the case
of rolling a six-sided die, where the values lie in the set {1, 2, 3, 4, 5, 6}. However, the
set of possible values could also be countably infinite, such as the set of integers
{0, 1, -1, 2, -2, 3, -3, ...}. The requirement for a discrete random variable is that we
can enumerate all the values in the set of its possible values, as we will need to sum
over all these possibilities.

For a discrete random variable X, we form its probability distribution function by
assigning a probability that X is equal to each of its possible values. For example,
for a six-sided die, we would assign a probability of 1/6 to each of the six options. In
the context of discrete random variables, we can refer to the probability distribution
function as a probability mass function. The probability mass function P(x) for a
random variable X is defined so that for any number x, the value of P(x) is the
probability that the random variable X equals the given number x.
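The six-sided die example above can be written out directly as a probability mass function; exact fractions are used here so the probabilities sum to exactly 1:

```python
from fractions import Fraction

# Probability mass function for a fair six-sided die:
# P(x) = 1/6 for x in {1, ..., 6}, and 0 otherwise.
def pmf(x):
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)

# Because the values are enumerable, we can sum over all of them,
# and the probabilities must total 1.
total = sum(pmf(x) for x in range(1, 7))
print(total)  # 1
```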
Continuous probability distribution
A continuous random variable is a random variable that can take on any value from
a continuum, such as the set of all real numbers or an interval. We cannot form a
sum over such a set of numbers. (There are too many, since such a continuum is
uncountable.) Instead, we replace the sum used for discrete random variables with
an integral over the set of possible values.

For a continuous random variable X, we cannot form its probability distribution
function by assigning a probability that X is exactly equal to each value. The
probability distribution function we must use in this case is called a probability
density function, which essentially assigns the probability that X is near each value.
For the intuition behind why we must use such a density rather than assigning
individual probabilities, see the page that describes the idea behind the probability
density function.
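To make the "replace the sum with an integral" idea concrete, here is a small sketch using the standard normal density: the probability that X lands in an interval is the integral of the density over that interval, approximated below with a simple trapezoidal rule (the choice of distribution and integration method are illustrative, not from the text above):

```python
import math

# Probability density function of the standard normal distribution
def pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# P(X equals any single exact value) is 0; probabilities come from
# integrating the density over an interval. Approximate the integral
# of pdf over [a, b] with the trapezoidal rule.
def prob_between(a, b, n=10_000):
    h = (b - a) / n
    total = 0.5 * (pdf(a) + pdf(b))
    total += sum(pdf(a + i * h) for i in range(1, n))
    return total * h

p = prob_between(-1.0, 1.0)
print(round(p, 4))  # about 0.6827: the familiar "one standard deviation" mass
```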
Null Hypothesis

A null hypothesis usually refers to a general statement or default position that
there is no relationship between two measured phenomena, or no difference
among groups.[1] Rejecting or disproving the null hypothesis, and thus
concluding that there are grounds for believing that there is a relationship
between two phenomena (e.g., that a potential treatment has a measurable
effect), is a central task in the modern practice of science, and gives a
precise sense in which a claim can be proven false.

The null hypothesis is generally assumed to be true until evidence indicates
otherwise. In statistics, it is often denoted H0 (read "H-naught," "H-null," or
"H-zero").

The concept of a null hypothesis is used differently in two approaches to
statistical inference. In the significance testing approach of Ronald Fisher, a
null hypothesis is rejected on the basis of data that would be significantly unlikely if
the null were true, but the null hypothesis is never accepted or proved. This is
analogous to a criminal trial, in which the defendant is assumed to be
innocent (null is not rejected) until proven guilty (null is rejected) beyond a
reasonable doubt (to a statistically significant degree).
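A minimal numeric sketch of Fisher-style significance testing: take the null hypothesis that a coin is fair, and ask how unlikely an observed result would be if the null were true. The observed count of 60 heads is a made-up figure for illustration:

```python
from math import comb

# H0: the coin is fair (probability of heads = 1/2).
# Suppose we observe 60 heads in 100 flips.
n, observed = 100, 60

# One-sided p-value: the probability, under H0, of seeing 60 or more heads.
p_value = sum(comb(n, k) for k in range(observed, n + 1)) / 2 ** n

print(round(p_value, 4))  # about 0.0284
```

Since this p-value is below the conventional 0.05 threshold, the data would be considered significantly unlikely under the null, so the null is rejected; note that a larger p-value would mean only "fail to reject," never "accept," in Fisher's framework.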
Alternative Hypothesis

The alternative hypothesis (or maintained hypothesis, or research hypothesis) and
the null hypothesis are the two rival hypotheses which are compared by a
statistical hypothesis test.

An example might be where water quality in a stream has been observed
over many years, and a test is made of the null hypothesis that there is no
change in quality between the first and second halves of the data, against the
alternative hypothesis that the quality is poorer in the second half of the
record.
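The water-quality example above can be sketched as a permutation test (one of several ways such a two-sample comparison could be run; the scores below are invented for illustration). Under the null hypothesis of no change, the split into "first half" and "second half" is arbitrary, so reshuffling the labels shows how often chance alone would produce a gap as large as the one observed:

```python
import random

# Hypothetical yearly water-quality scores (higher = better)
first_half = [7.2, 7.5, 7.1, 7.4, 7.3, 7.6]
second_half = [6.8, 7.0, 6.9, 7.1, 6.7, 7.0]

# Test statistic: how much lower the second half's mean is than the first's
def stat(a, b):
    return sum(a) / len(a) - sum(b) / len(b)

observed = stat(first_half, second_half)

# Under H0 (no change), the half labels are arbitrary: shuffle and recompute
random.seed(0)
pooled = first_half + second_half
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    if stat(pooled[:6], pooled[6:]) >= observed:
        count += 1

# Fraction of shuffles at least as extreme as the observed split;
# a small value favors the alternative (quality poorer in the second half)
p_value = count / trials
print(p_value)
```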
