By Kshitij Chaurasia

Definitions. History. Concepts of causation. Defining the variable in an association. Types of Association. Spurious association. Indirect association. Direct association. Additional criteria for judging causality. Measuring an association. Problems in establishing causality. Establishing a causal inference. References.

Defining an association

Concurrence of two variables (A and B) more often than would be expected by chance.

An association is present if the probability of occurrence of a variable depends upon one or more other variables. (A Dictionary of Epidemiology, John M. Last)
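As a rough illustration of "more often than would be expected by chance", the sketch below compares the observed joint frequency of A and B with the frequency expected if the two were independent. The counts and the function name `association_excess` are hypothetical and chosen only for illustration.

```python
def association_excess(n_both, n_a, n_b, n_total):
    """Compare the observed joint frequency of A and B with the
    frequency expected if A and B occurred independently."""
    p_a = n_a / n_total            # proportion with A
    p_b = n_b / n_total            # proportion with B
    observed = n_both / n_total    # proportion with both A and B
    expected = p_a * p_b           # joint proportion expected by chance alone
    return observed, expected, observed / expected


# Hypothetical counts: 1000 people, 300 with A, 200 with B, 90 with both.
observed, expected, ratio = association_excess(90, 300, 200, 1000)
print(f"observed={observed:.3f} expected={expected:.3f} ratio={ratio:.2f}")
```

A ratio above 1 means A and B occur together more often than chance alone would predict. Whether the excess is statistically significant, and whether it is causal, are separate questions.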

Synonyms: correlation, statistical dependence, relationship

An association is said to be causal when it can be proved that a change in the independent variable produces a change in the dependent variable: A → B


Causal inference can be viewed as an exercise in measurement of an effect rather than as a criterion-guided process for deciding whether an effect is present or not.

Am J Public Health. 2005;95:S144-S150

If one of these attributes, say A, is the suspected cause and the other, say B, is a disease, then we have reason to suspect that A has caused B.

Karl Popper stressed that science progresses by rejecting or modifying causal hypotheses, not by actually proving causation.

1835: Pierre-Charles-Alexandre Louis, the "Father of Medical Statistics" and a clinician, selected a homogeneous group of 77 patients with the same, well-characterized form of pneumonia for his bloodletting analysis.

John Snow (15 March 1813 to 16 June 1858) was a British physician and a leader in the adoption of anaesthesia and medical hygiene. He is considered to be one of the fathers of epidemiology because of his work in tracing the source of a cholera outbreak in Soho, London, in 1854.


Up to the time of Louis Pasteur (1822-1895), various concepts of disease causation were in vogue, e.g.:
- the supernatural theory of disease
- the theory of humors
- the concept of contagion
- the miasmatic theory of disease
- the theory of spontaneous generation

The concept gained momentum during the 19th and the early part of the 20th century. It emphasized a one-to-one relationship between causal agent and disease: Disease agent → Man → Disease

- Susceptible host (the person at risk for the disease)
- Disease agent (the proximate cause)
- Environmental context for the interaction between host and agent

Pettenkofer of Munich (1819-1901) was an early proponent of this concept. The germ theory of disease overshadowed the multiple cause theory. Example: tuberculosis is not caused by tubercle bacilli alone; factors such as poverty, overcrowding and malnutrition also contribute.

In seeking a model that better expressed the complex reality of multicausality, some epidemiologists began thinking in terms of chains of causation.

Example- "diet-heart hypothesis" (DHH) as described by Sherwin (1978).

A diet high in saturated fat and cholesterol leads to high blood lipids, which lead to atherosclerosis (coronary artery disease), which leads to coronary heart disease and the clinical event of a myocardial infarct (heart attack). It was an oversimplified model.

Suggested by MacMahon, Pugh and Ipsen (1960). The model is suited to the study of chronic disease, where the disease agent is often unknown and the disease depends on multiple factors. It considers all predisposing factors and their complex interrelationships.

De-emphasizes the agent as the sole cause of disease, emphasizing the interplay of physical, biological and social environments.

Example: the potato famine of Ireland in the mid-19th century
- Fungal invasion of potato crops
- Predominantly peasant population subsisting on a potato diet
- Repressive British colonial rule

Commonly used paradigm in the injury prevention field.

Causal relationships are used to make public health decisions and design interventions. For example, if smoking is indeed the causal factor, it would be irresponsible to target coffee drinking as an intervention.

Independent variable: The variable which changes irrespective of dependent variable.

Dependent variable: The variable which changes according to the independent variable.

Which variable is independent and which is dependent depends on the study hypothesis. For example, hypertension is the independent variable in:

Hypertension → causes → CHD

and the dependent variable in:

Salt intake, obesity → cause → Hypertension





Intermediate or intervening variables:

Salt intake → causes → hypertension → causes → CHD


Social condition or development (causal variable) → access to prenatal care, better nutrition, vaccination, better personal hygiene (intervening variables).

Some independent variables may modify the effect of the hypothesized causal variable. Example from the slide diagram: "black" shown as a modifier of the relationship Hypertension → CHD.

Some confounding variables are also effect modifiers.

Associations can be grouped under three headings:
- Spurious association
- Indirect association
- Direct (causal) association

On the basis of causality: causal and non-causal. By direction: positive and negative.

A spurious association is an association which appears due to improper comparison.

The observed association between a disease and a suspected factor may not be real.

In a study in the UK, neonatal mortality was observed to be higher among newborns born in hospital than among those born at home. This could lead to the conclusion that home delivery is better for the health of the newborn. However, this conclusion was not drawn in the study, because the proportion of high-risk deliveries was found to be higher in hospital than at home.
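The hospital versus home comparison can be sketched numerically. The counts below are hypothetical (they are not taken from the UK study) and are chosen only to show how an unequal share of high-risk deliveries can make the crude rate higher in hospital even when, within each risk group, the hospital rate is not higher.

```python
# Hypothetical counts, NOT from the UK study: (deaths, births) by place and risk group.
data = {
    "hospital": {"high_risk": (80, 800), "low_risk": (2, 200)},
    "home":     {"high_risk": (12, 100), "low_risk": (18, 900)},
}


def rate_per_1000(deaths, births):
    """Neonatal deaths per 1000 births."""
    return 1000 * deaths / births


def crude_rate(place):
    """Overall rate for a place, ignoring the risk group."""
    deaths = sum(d for d, _ in data[place].values())
    births = sum(b for _, b in data[place].values())
    return rate_per_1000(deaths, births)


for place, groups in data.items():
    print(f"{place}: crude rate = {crude_rate(place):.1f} per 1000")
    for group, (deaths, births) in groups.items():
        print(f"    {group}: {rate_per_1000(deaths, births):.1f} per 1000")
```

Comparing like with like (high-risk with high-risk, low-risk with low-risk) removes the improper comparison that produced the apparent disadvantage of hospital delivery in the crude figures.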

Indirect association: a statistical association between a characteristic of interest and a disease due to the presence of another factor, known or unknown.

Example: A (altitude) is associated with B (endemic goitre) through the common factor C (iodine deficiency).

1. One-to-one causal association: Two variables are stated to be causally related (A → B) if a change in A is followed by a change in B. When the disease is present, the factor must also be present.

A (Factor) → B (Disease)

Example: Koch's postulates

The microorganism must be found in abundance in all organisms suffering from the disease, but should not be found in healthy animals.

The microorganism must be isolated from a diseased organism and grown in pure culture.

The cultured microorganism should cause disease when introduced into a healthy organism.

The microorganism must be reisolated from the inoculated, diseased experimental host and identified as being identical to the original specific causative agent.

2. Multifactorial causation:

Multiple factors are involved in causing the disease, ex. CHD.

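If E is the exposure factor and D is the disease, the slide diagrams show the following patterns:
- Independently causal: each exposure (E1, E2, E3) can lead to D on its own.
- Conditionally causal: E1 leads to D only in the presence of E2.
- Synergism: E1 + E2 → D
- Effect modification (a form of synergism): E2 modifies the effect of E1 on D.
- Confounding: E2 produces the association of E1 and D.

One cause with multiple effects:

E (Radiation) → D1 (Leukemia), D2 (Lung cancer), D3 (Radiation sickness)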

In the absence of controlled experimental evidence to incriminate the cause, other criteria are used to decide whether an association is causal (Hill's criteria):
- Temporal association
- Strength of association
- Specificity of the association
- Consistency of the association
- Biological plausibility
- Coherence of the association
- Biological gradient
- Experimental evidence

Temporal association:
- The causal attribute must precede the disease or unfavorable outcome.
- Exposure to the factor must have occurred before the disease developed.
- The length of the interval between exposure and disease is very important.
- If the disease develops too soon after exposure, the causal relationship is called into question.

Asbestos and lung cancer

Well-established temporal relationship: asbestos exposure → latent period of 10-20 yrs → lung cancer

New study: asbestos exposure → latent period of 3 yrs → lung cancer

In this case, the latent period is not long enough for lung cancer to develop if caused by exposure.

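Strength of association: the relationship between cause and outcome could be strong or weak.

There are statistical methods to quantify the strength of association, viz. calculation of relative risk, attributable risk, etc.

Relative risk (ratio of incidence rates), where E is the rate in the exposed and U is the rate in the unexposed:

$$\text{Relative risk} = \frac{E}{U}$$

Attributable fraction:

$$\text{Attributable fraction} = \frac{E - U}{E}$$

Odds ratio*:

$$\text{Odds ratio} = \frac{\text{odds of exposure among cases}}{\text{odds of exposure among controls}}$$

* Used in case-control studies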
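A minimal sketch of the first two measures, assuming the usual cohort-study definitions (relative risk as the ratio of the rate in the exposed to the rate in the unexposed, attributable fraction as the excess rate in the exposed divided by the rate in the exposed). The function names and the counts are hypothetical.

```python
def incidence(cases, population):
    """Incidence as a simple proportion: cases / population at risk."""
    return cases / population


def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Rate in exposed (E) divided by rate in unexposed (U)."""
    rate_e = incidence(cases_exposed, n_exposed)
    rate_u = incidence(cases_unexposed, n_unexposed)
    return rate_e / rate_u


def attributable_fraction(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Share of the rate in the exposed that is in excess of the unexposed rate: (E - U) / E."""
    rate_e = incidence(cases_exposed, n_exposed)
    rate_u = incidence(cases_unexposed, n_unexposed)
    return (rate_e - rate_u) / rate_e


# Hypothetical cohort: 30 cases among 1000 exposed, 10 cases among 1000 unexposed.
rr = relative_risk(30, 1000, 10, 1000)
af = attributable_fraction(30, 1000, 10, 1000)
print(f"relative risk = {rr:.2f}, attributable fraction = {af:.1%}")
```

In this made-up cohort the exposed have three times the rate of the unexposed, and about two thirds of the rate in the exposed is in excess of the background rate.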

The larger the relative risk or odds ratio, the higher the likelihood that the relationship is causal. However, care must be taken to examine confidence intervals and sample size.
For example, if the confidence interval is wide (e.g., 1.8 - 22.6), an OR of 12.0 is less convincing, because the true strength of the association is estimated imprecisely.
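The effect of sample size on the confidence interval can be sketched as follows. The example assumes a 2x2 case-control table and uses the log-based (Woolf) approximation for a 95% interval, which is one common choice and not necessarily the method behind the figures quoted above. All counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate 95% confidence interval (Woolf method).

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Two hypothetical studies with the same odds ratio but different sizes.
small_study = odds_ratio_ci(12, 4, 4, 16)
large_study = odds_ratio_ci(120, 40, 40, 160)

for name, (or_, lo, hi) in [("small", small_study), ("large", large_study)]:
    print(f"{name} study: OR = {or_:.1f}, 95% CI {lo:.1f} to {hi:.1f}")
```

Both studies give the same point estimate, but the smaller study yields a much wider interval, so it supports a weaker statement about the strength of the association.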

Consistency is the repeated occurrence of the association at other times and places. If a relationship is causal, the findings should be consistent with other data. A lack of consistency weakens a causal interpretation. Example: the association between smoking and lung cancer is accepted as causal partly because of its consistency.

Specificity:
- Specificity implies a one-to-one relationship between the cause and effect: a specific exposure is associated with only one disease.
- The weakest of the criteria (should probably be eliminated).
- This is used by tobacco companies to argue that smoking is not causal in lung cancer, since smoking is associated with many diseases.

Coherence: the causal significance of an association is supported by its unity with known facts that are thought to be related.

E.g.: the rising consumption of tobacco in the form of cigarettes and the rising incidence of lung cancer are coherent.

Biological plausibility: the association must be consistent with other knowledge (viz. mechanism of action, evidence from animal experiments, etc.). Sometimes the lack of plausibility may simply be due to the lack of sufficient knowledge about the pathogenesis of a disease.

Biological gradient: with an increasing dose (level of exposure to the risk factor), there is an increasing risk of disease. This is not considered necessary for a causal relationship, but it does provide additional evidence that a causal relationship exists.

Figure: age-standardized death rates due to well-established cases of bronchogenic carcinoma

If there is a true causal relationship between exposure and disease, the expectation is that we would see the association consistently in other (NOT necessarily all) subgroups of the population.

Upon elimination or reduction of exposure to the factor, the risk of disease declines. This strengthens the case that the association is causal. Example: leukoplakia lesions diminish on cessation of tobacco chewing. However, in certain cases the damage may be irreversible. Example: emphysema is not reversed by the cessation of smoking, but its progression is slowed.







Bias is systematic favoritism (error) that is present in the data collection process, resulting in misleading results. Reasons:
- No control over participants in studies
- Failure to obtain a representative sample of the population under study
- Difficulty in measuring variables

Types of bias:
Selection bias: when there is a systematic difference between the characteristics of the people selected for a study and those who are not.

Example: in a study assessing tobacco habits, the people who responded did not have a tobacco habit, whereas the people who did not respond did.

Measurement bias: when measurements or classifications of disease or exposure are inaccurate.

Example: biochemical or physiological measurements are never completely accurate.

Confounding: when another exposure exists in the study population that is associated with both the disease and the exposure being studied. Example:

Exposure (coffee drinking) → Disease (heart disease)

Confounding factor (cigarette smoking), associated with both the exposure and the disease
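The coffee, smoking and heart disease example can be sketched with a stratified analysis. The counts below are hypothetical, and the Mantel-Haenszel pooled odds ratio is used here as one standard way to adjust for a confounder; the source does not name a specific method.

```python
def odds_ratio(a, b, c, d):
    """a: exposed with disease, b: exposed without disease,
    c: unexposed with disease, d: unexposed without disease."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata of the confounder."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts for coffee drinking (exposure) and heart disease,
# split by cigarette smoking (the confounder).
smokers = (80, 320, 20, 80)
non_smokers = (5, 95, 20, 380)

# Crude table: add the two strata together, ignoring smoking.
crude = tuple(s + n for s, n in zip(smokers, non_smokers))

print(f"crude OR        = {odds_ratio(*crude):.2f}")
print(f"OR, smokers     = {odds_ratio(*smokers):.2f}")
print(f"OR, non-smokers = {odds_ratio(*non_smokers):.2f}")
print(f"adjusted OR     = {mantel_haenszel_or([smokers, non_smokers]):.2f}")
```

In these made-up data coffee drinkers are more often smokers, so the crude odds ratio is well above 1, while the odds ratio within each smoking stratum, and the pooled estimate, show no association between coffee and heart disease.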

Experimental studies
- in vitro systems
- animal studies in controlled environments

Allows for
- control of precise dose
- control of environmental conditions
- follow up

Problems with
- extrapolating data to human populations
- human diseases with no good animal models

Clinical pathologies

microbiological studies

Second step in determining causation: conducting studies in human populations

Human epidemiology. All of the study designs are important here and provide different evidence for or against a causal hypothesis.

Clinical observations

Available data (Ecological or Cross-sectional Studies)

Case-control studies

Cohort studies

Randomized trials (only used for potentially beneficial treatments)

Nature of causation
- Token causal claims
- Type causal claims

Types of causal relationships
- Direct
- Indirect

Types of causal factors
- Sufficient
- Necessary

Token causal claims: claims about causation between particular tokens, not populations.

- Event A caused event B: tobacco caused cancer.
- Having property A caused X to have property B: smoking caused a high temperature on the palate, which caused smoker's palate.
- Thing 1 having property A caused Thing 2 to have property B: a sugar diet caused caries in a caries-prone individual.

Type causal claims: claims about causation that occurs in general, or in the population.

- Events of type A cause events of type B: the tobacco habit causes a high prevalence of lung cancer.
- Having property A causes things of type X to have property B: some smokers get smoker's palate because they smoke.
- Thing 1 having property A causes Thing 2 to have property B: caries-prone individuals who are on a sugar diet have a high caries rate.

Direct: Factor → Disease
Example: ΔF508 polymorphism → cystic fibrosis

Indirect: Factor 1 → Factor 2 → Factor 3 → Factor 4
Example: high cholesterol → artery thickening → hemostatic factors → myocardial infarction

- Predisposing factors: such as age, sex, previous illness, etc.
- Enabling factors: such as low income, poor nutrition, bad housing, etc.
- Precipitating factors: such as exposure to a specific disease agent or noxious agent.
- Reinforcing factors: such as repeated exposure.

Necessary and sufficient

Without the factor, the disease does not develop. Example: HIV

Necessary but not sufficient

Multiple factors, including the main factor, are required. Example: development of tuberculosis requires M. tuberculosis and other factors, such as immunosuppression. The bacteria are still necessary, but not sufficient to cause the disease.

Sufficient but not necessary

The factor can produce the disease, but is not necessary. Example: both radiation exposure and exposure to benzene are sufficient to cause leukemia, but neither is necessary if the other is present.

Neither sufficient nor necessary

Complex models of disease etiology. Example: a high-fat diet and heart disease, hypertension, diabetes, certain kinds of cancer
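The four categories can be expressed as two simple checks on observed data: a factor looks necessary if every diseased person had it, and sufficient if everyone who had it developed the disease. The sketch below is illustrative only; the function name and the records are hypothetical, and a finite set of observations can only suggest, not prove, necessity or sufficiency.

```python
def classify_factor(records):
    """Classify a factor from (factor_present, disease_present) observations."""
    records = list(records)
    # Necessary: no diseased person lacks the factor.
    necessary = all(factor for factor, disease in records if disease)
    # Sufficient: no person with the factor escapes the disease.
    sufficient = all(disease for factor, disease in records if factor)

    if necessary and sufficient:
        return "necessary and sufficient"
    if necessary:
        return "necessary but not sufficient"
    if sufficient:
        return "sufficient but not necessary"
    return "neither sufficient nor necessary"


# Hypothetical observations: (factor present, disease present).
tb_like = [(True, True), (True, False), (False, False)]
diet_like = [(True, True), (True, False), (False, True)]

print(classify_factor(tb_like))
print(classify_factor(diet_like))
```

The first set mimics the tuberculosis pattern (the bacillus is always present in disease, but infection does not always produce disease). The second mimics the high-fat diet pattern, where disease occurs both with and without the factor.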

- The existence of a correlation or association does not necessarily imply causation.
- The concept of a single cause, once held in relation to communicable disease, has been replaced by the concept of multiple causation in disease.
- The criteria used in establishing causality in infectious disease (Koch's postulates) are not applicable to non-infectious diseases, and are not totally applicable even in some infectious diseases.
- There is a relatively long period between exposure and the clinical appearance of a disease.
- Certain factors or confounders tend to distort the relationship with suspected factors.
- Spurious associations arise between a disease and suspected factors.

Association is symmetric; causation is asymmetric.

Example:
- X associated with Y ⇒ Y associated with X
- X causes Y ⇏ Y causes X

In fact, for token causation, we think we have: X causes Y ⇒ Y does not cause X

Although different, they are connected. In general:
- If X causes Y, then X will be associated with Y.
- If X and Y are associated, then there is some sort of causal connection between them.
- Statistics is relevant to science precisely because the two are connected.
- Causal inference is really the problem of moving between these two types of claims.

1. Statistical association established.
2. Selection and information bias excluded.
3. Confounding excluded or neutralized, and the association persists.
4. Confirmatory criteria of causality met (strength, biological plausibility, consistency, experimental proof).



- Park's Textbook of Preventive and Social Medicine, K. Park.
- Biostatistics, Mahajan.
- Basic Epidemiology, Beaglehole.

1. To study historically the rise and fall of disease in the population.
2. Community diagnosis.
3. Planning and evaluation.
4. Evaluation of an individual's risks and chances.
5. Syndrome identification.
6. Completing the natural history of disease.
7. Searching for causes and risk factors.

Communicable diseases are transmitted from the reservoir or source of infection to a susceptible host.

Source or Reservoir → Modes of Transmission → Susceptible Host

The source of infection is defined as the person, animal, object or substance from which an infectious agent passes or is disseminated to the host. A reservoir is defined as any person, animal, arthropod, plant, soil or substance, or combination of these, in which an infectious agent lives and multiplies. The reservoir is the natural habitat in which the organism metabolizes and replicates.

Reservoir and source are not always synonymous. Examples: hookworm infection; tetanus.
Reservoir: a) homologous reservoir. b) heterologous reservoir.
The reservoir may be of three types: a) Human reservoir. b) Animal reservoir. c) Reservoir in non-living things.

a) Cases: a case is a person in a population or study group identified as having the particular disease. Presence of infection in a host:
a) Clinical illness: mild, moderate, severe, fatal.
b) Subclinical infection: plays a dominant role in spread.
c) Latent infection: the host does not shed the infectious agent, which lies dormant within the host, e.g. Herpes simplex.

B) Carriers: an infected person or animal that harbors a specific infectious agent in the absence of discernible clinical disease and serves as a potential source of infection for others. Carriers may be classified by:
A) Type:
a) Incubatory carriers: shed the infectious agent during the incubation period of the disease, e.g. measles, mumps, polio, hepatitis.
b) Convalescent carriers: continue to shed the disease agent during the period of convalescence, e.g. typhoid fever.
c) Healthy carriers: victims of subclinical infection who have developed the carrier state without suffering from the disease, e.g. cholera, diphtheria, polio.
B) Duration: a) temporary. b) chronic.
C) Portal of exit: urinary, respiratory, nasal carriers.

The source of infection may sometimes be animals and birds. These, like human sources of infection, may be cases or carriers. Zoonoses: diseases and infections which are transmitted to man from vertebrates. Examples are rabies, yellow fever, influenza. There is evidence that genetic recombination between animal and human viruses might produce new strains of viruses, e.g. influenza viruses.

Soil and inanimate matter can also act as reservoirs of infection. For example, soil may harbour agents that cause tetanus, anthrax, mycetoma.

Communicable diseases may be transmitted from the reservoir or source of infection to a susceptible individual in many different ways: A) direct transmission and B) indirect transmission.

A) DIRECT TRANSMISSION
a) Direct contact
b) Droplet infection
c) Contact with soil
d) Inoculation into skin or mucosa
e) Transplacental

B) INDIRECT TRANSMISSION
a) Vehicle-borne
b) Vector-borne: mechanical, biological
c) Air-borne: droplet nuclei, dust
d) Fomite-borne
e) Unclean hands and fingers

For successful parasitism:
1. The infectious agent must find a portal of entry by which it may enter the host, e.g. respiratory tract, alimentary tract, etc.
2. On gaining entry into the host, the organism must reach the appropriate tissue for its multiplication and survival.
3. The disease agent must find a way out of the body in order to reach a new host.
4. After leaving the human body, the organism must survive in the external environment for a sufficient period until a new host is found.
This is called successful parasitism.

Definition: The incubation period is the amount of time between infection with a virus or bacterium and the start of symptoms.

- Rocky Mountain spotted fever: 2-14 days
- Smallpox: 12 days
- Common cold: 2-5 days
- Measles: 8-12 days
- Chicken pox: 14-16 days
- Erythema infectiosum (fifth disease): 13-18 days
- Roseola: 9-10 days
- Rubella (German measles): 14-21 days
- Influenza: 1-2 days

- Generation time
- Communicable period
- Secondary attack rate

When the incidence of a condition in a group with a certain characteristic differs from the incidence in a group without the characteristic, an association is inferred that may or may not be causal. The strength of the association is commonly measured by the relative risk or odds ratio. The relationship can also be expressed in terms of a correlation coefficient, which is a measure of the degree to which a dependent variable varies with an independent variable.

Exposure OR genetic background OR combination of both → (Association? Causation?) → Disease or other outcome

Suppose we determine that an exposure is associated with disease. How do we know if the observed association reflects a causal relationship?