METHODOLOGY
A PROJECT
in the subject of Research Methodology in Commerce
SUBMITTED TO
UNIVERSITY OF MUMBAI
FOR SEMESTER -IV OF
MASTER OF COMMERCE
BY
KHAN MOHD. MOHSIN
Roll No.(10)
Specialization: Business Management
UNDER THE GUIDANCE OF
Dr Vivek Deolankar
YEAR - 2015-16
I, Shri Khan Mohd. Mohsin, student of M. Com. Part-II, Roll Number (10), at the
Department of Commerce, University of Mumbai, do hereby declare that the project
titled "Assignment of Research Methodology", submitted by me in the subject of
Research Methodology in Commerce for Semester IV during the academic year 2015-16, is based on actual work carried out by me under the guidance and supervision of Dr.
Vivek Deolankar.
I further state that this work is original and not submitted anywhere else for any other
examination.
Date :
Mumbai
Signature of Student
EVALUATION CERTIFICATE
This is to certify that the undersigned have assessed and evaluated the project on
Assignment of Research Methodology in the subject of Research Methodology in
Commerce submitted by Kum/Smt/Shri Khan Mohd. Mohsin , student of M. Com.
Part-II at the Department of Commerce, University of Mumbai for Semester IV during
the academic year 2015-16.
This project is original to the best of our knowledge and has been accepted for Internal
Assessment.
Internal Examiner
External Examiner
Director
University of Mumbai
Department of Commerce
Internal Assessment: Subject: Research Methodology in Commerce
Name of Student: KHAN MOHD. MOHSIN (Surname: KHAN)
Class: M.COM Part-II, Sem-IV
Branch: Business Management
Roll Number: 10
Marks Awarded:
Documentation - Internal Examiner (Out of 10 Marks)
Documentation - External Examiner (Out of 10 Marks)
Presentation (Out of 10 Marks)
Viva and Interaction (Out of 10 Marks)
Total Marks (Out of 40 Marks)
Signature
INDEX

S.No.   Particular                              Page No.

Chapter 1
1.1     Meaning of Data Processing              1
1.2     Significance of Data Processing         2-6
1.3     Problems in Data Processing             6-7

Chapter 2
2.1     Data Processing Stages                  8-22
        2.1.1 Editing                           8-11
        2.2.1 Coding                            12-13
        Classification                          14
        2.4.1 Tabulation                        15-17
        Graphic Presentation                    18-22

Chapter 3
3.1     Measure of Central Tendency             23-32
3.2     Correlation Analysis                    33-35
3.3     Regression Analysis                     36
3.4     Measure of Dispersion                   37-39

4.1     Conclusion                              40
4.2     Bibliography                            41
Answer 1
1.1 Meaning of Data Processing
Processing refers to subjecting the collected data to a process in which the accuracy,
uniformity of entries and consistency of the information gathered are examined. It is a
very important stage before the data is analyzed. Most commonly, processing is
understood as editing, coding, classification, and tabulation of the collected data.
After collecting data, the methodology of converting raw data into meaningful
statements includes data processing, data analysis, and data interpretation and
presentation. Data processing involves stages such as editing, coding,
classification, tabulation, and graphic presentation of data.
Data processing is the process of skilfully organizing data for the purpose
of data analysis and interpretation. Data processing can be done manually when
the data collected is limited, or mechanically when the collected data
involves huge quantities.
Data processing is an intermediary stage between data collection and data
analysis. The completed instruments of data collection, such as interview
questionnaires, data sheets, and field notes, contain a vast mass of data. The collected
data instruments are like raw materials and therefore they cannot straightaway
provide answers to research questions. Therefore, there is a need for skillful
manipulation of data, i.e., data processing.
Significance of Coding
1. Facilitates Classification of Data:
Coding facilitates classification of data. After providing codes to the various
responses, the data can be classified into various categories. The coded responses can
be classified into categories such as age, gender, educational level, income level, area-wise, occupation-wise, and so on.
2. Facilitates Tabulation of Data:
Since coding facilitates classification of data, it becomes easier for the researcher
to tabulate the data. The coded responses are classified into different categories,
and accordingly the data is transferred to statistical tables. The tabulated data can
then be used for analysis and interpretation.
Significance of Classification
1. Protection and Management of Data:
From the time information is collected or created until it is destroyed, it
should be classified to ensure it is protected, stored and managed appropriately. For
instance, information may be classified as public information, information for
internal use only, and confidential/restricted information. Public information can be
provided to anyone, insiders and outsiders; information that is classified as internal
use only should not be provided to outsiders; and information that is classified as
confidential may be restricted only to the top authorities in the organization.
2. Facilitates Tabulation of Data:
Coding and classification facilitate tabulation of data. In fact, coding is
considered an important element of classification. The researcher assigns codes to
responses either at the pre-data-collection stage or at the post-data-collection stage,
and accordingly the responses are classified into different categories. The classified
information is tabulated for proper analysis and interpretation.
For instance, certain coded and classified data may not fit in the statistical tables.
Therefore, this may require changes in the coding and classification of data.
5. Ease in Understanding of Data:
Tabulation helps the researchers to determine and communicate the findings in a
form which can be easily understood by others. For instance, the tabulated data may
indicate higher literacy in one state as compared to another. Therefore, one can easily
understand that the former state is more literate than the latter.
6. Facilitates Location of Specific Data:
Tabulation helps to locate specific data required by the researcher. For example,
census data provides a wealth of geographic and demographic data, but a researcher
might need only certain segments of the data from certain locations. This specific
data about certain segments from certain locations can be easily identified from the
statistical tables - with reference to density of population, gender ratio, life
expectancy, etc.
7. Supports Written Matter:
Statistical tables support written matter. The written matter gains importance due to
statistical tables. This may be because statistical tables give a good feel of the
written matter and are also easy to understand.
Significance of Graphic Presentation
1. Quick Communication:
Graphs and charts can communicate information at a glance. It does not
take much time to read and understand the message. One can easily understand the
data presented in bar charts, pie diagrams, graphs and so on. For instance, a
graph can indicate at a glance the trend in sales over a period of time, whether
increasing, decreasing or mixed.
2. Effective Appeal:
Graphic presentation may have an effective appeal to the readers. For
instance, pie diagrams, bar charts, graphs, etc., can be illustrated with the help of
effective colours. The graphs or charts easily attract attention and may create a
good impact on the mind of the readers, with special reference to understanding the
data.
Editor Bias :
Data processing may get affected due to editor bias. For instance, there may be
inconsistency in the responses given by the respondent, or some of the responses
may be incomplete. In such a situation, the editor may edit or complete the responses
in a biased manner.
Problem of Tabulation :
Data processing becomes difficult if data is not tabulated properly. When there is a
large number of tables and a lot of data to be tabulated, there is a possibility of errors.
For instance, a figure of Rs 1,00,000 may be tabulated under the column of Rs
10,000, and such faulty tabulation may lead to faulty analysis.
Answer 2
2.1 DATA PROCESSING STAGES
The various stages in Data Processing Stages are as follows :
STAGES OF DATA PROCESSING
EDITING
CODING
CLASSIFICATION
TABULATION
GRAPHIC PRESENTATION
2.1.1 EDITING
Editing is the process of examining errors and omissions in the collected data and making
necessary corrections in the same. This is desirable when there is some inconsistency in the
responses as entered in the questionnaire, or when it contains only a partial or
a vague answer.
When data is collected through schedules and questionnaires, there is a chance of
incompleteness, inaccuracy, inconsistency and absence of uniformity in the answers.
Editing is the first stage in data processing. It is a process which detects and corrects
such errors as far as possible. If an error is left undetected at this stage, the research
would not serve its purpose. Editing ensures the completeness, reliability and consistency
of the data. It is a routine task of checking the filled schedules and questionnaires.
Editing is a process of checking errors and omissions in data collection, and making
corrections, if required. Editing is required when :
There is inconsistency in the responses given by the respondents.
Respondents may provide incorrect or false responses.
Some vague/incomplete answers are given by the respondents.
No responses are provided by the respondents for certain questions.
The following are examples of editing :
The respondent has given answers which are inconsistent with each other. In such a
case, the editor has to change one of the answers so as to make it consistent with the
other one, whichever can be suitably changed.
The respondent has marked two answers instead of one for a particular question. In
such a case, the editor has to carefully examine which of the two answers would be
more accurate. Sometimes, when a decision cannot be made categorically, he may
prefer to code "no information" for that question.
The respondent has answered a question by checking one of the many possible
categories contained in the questionnaire. In addition, the respondent has written
some remarks in the margin. These remarks do not go well with the particular
category marked by the respondent.
Sometimes the questionnaire contains imaginary and fictitious data. This may be due
to cheating by the interviewer, who may fill in the entries in the questionnaire without
actually interviewing the respondent. This may also happen in the case of a mail
questionnaire, where the respondent has given an arbitrary answer without exercising
any care. The editor has to exercise his judgment in this regard.
Another type of editing is central editing, which is undertaken after the questionnaires
have been received at the headquarters. As far as possible, a single editor should carry out
this task so that consistency in editing can be ensured. However, in the case of large
studies, this may not be physically possible. When two or more editors are entrusted with
the task of editing, it is necessary that they are given uniform guidelines so that maximum
uniformity in editing can be maintained.
a. Field Editing
Editing undertaken at the time of the field survey is called field editing. At the time
of the interview, the interviewer may use several abbreviations due to time constraints.
These abbreviations need to be spelled out fully at the time of processing of data.
b. Central Editing
Editing done at the central office is called central editing. A single editor should carry
out this task so that consistency in editing can be ensured. But in the case of large
studies, two or more editors can handle the task. Sometimes, the entire questionnaire
may be divided into two parts, and each part can be edited by a separate editor.
2.2.1 CODING
Coding is the procedure of classifying the answers to a question into meaningful
categories. The symbols used to indicate these categories are called codes. Coding is
necessary to carry out the subsequent operations of tabulating and analyzing data. If
coding is not done, it will not be possible to reduce a large number of heterogeneous
responses into meaningful categories with the result that the analysis of data would be
weak and ineffective, and without proper focus.
Coding involves two steps. The first step is to specify the different categories or classes
into which the responses are to be classified. The second step is to allocate individual
answers to different categories.
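The two coding steps can be sketched in a short Python snippet; the question, category names and numeric codes below are purely illustrative assumptions, not taken from the text:

```python
# Step 1: specify the categories (codes) into which responses will fall.
# This codebook is a hypothetical example for a five-point opinion question.
CODEBOOK = {
    "strongly agree": 1,
    "agree": 2,
    "neutral": 3,
    "disagree": 4,
    "strongly disagree": 5,
}

def code_response(answer: str) -> int:
    """Step 2: allocate an individual answer to its category code."""
    return CODEBOOK[answer.strip().lower()]

responses = ["Agree", "Neutral", "strongly agree"]
codes = [code_response(r) for r in responses]
print(codes)  # [2, 3, 1]
```

Once every heterogeneous answer is reduced to a code in this way, the later tabulation step becomes a matter of counting codes.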
Coding facilitates proper tabulation and analysis of data. One of the most important
points in this respect is that the categories must be all-inclusive and mutually exclusive.
All-inclusive means that every response must fit into one of the categories, leaving out
none. The other aspect is that the categories must be mutually exclusive, i.e., they must
not be overlapping or ambiguous. To give an example, a person may, by occupation, be
an industrial worker as well as unemployed. Here, two concepts or dimensions have been
used: the first is the occupational category and the second is the current employment
status, and the two are not mutually exclusive. It would, therefore, be advisable to
use two category-sets, one for the occupations and the other for the current employment
status.
The problem of coding is not so simple, especially in respect of an open-ended question.
The response to such a question is in a descriptive form, in the words of the respondent
himself. For example, the respondent may be asked: "What is your opinion regarding the
prohibition policy of the government?" The respondent may give a lengthy answer
indicating what he feels about this policy. In the case of such responses, the coder has to
decide the category in which the response to the open-ended question is to be included.
He may first take down the entire response and then decide the category in which it
should be included.
1. Mutually Exclusive :
The categories must be mutually exclusive. A specific case or response must be
classified only once, in one category only. For instance, on the basis of occupation,
one may place the response of a particular respondent in a definite pre-determined
category. But a problem may arise if the respondent belongs to two categories.
2. Appropriateness :
The classification/coding must be appropriate to the research work. For instance, a
researcher studying brand loyalty for readymade garments may classify the
population in certain groups appropriate to the survey. Senior citizens and
kids may be ignored, as they are not much loyal to brands as far as readymade
garments are concerned.
3. Exhaustive :
The classification must be exhaustive in nature. There must be a category
where each response can be fitted or placed. Every respondent must belong to a certain
category. For instance, if the classification is based on students, then there must be a
category for every class of students. Therefore, there may be several classifications. But
if there are too many groups, the researcher may include the isolated groups under a
single category called the General Category.
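These three requirements can be illustrated with a hypothetical age-group classifier in Python; the bin boundaries are invented for the example:

```python
# Hypothetical age-group classification. The bins are illustrative only.
# They are mutually exclusive (no overlap) and exhaustive: the final
# "60+" branch acts as the general category catching all remaining cases.
def classify_age(age: int) -> str:
    if age < 18:
        return "under 18"
    elif age < 40:
        return "18-39"
    elif age < 60:
        return "40-59"
    else:
        return "60+"

ages = [12, 25, 40, 59, 60, 85]
print([classify_age(a) for a in ages])
# ['under 18', '18-39', '40-59', '40-59', '60+', '60+']
```

Because each `if`/`elif` branch is tried in order and every age falls through to exactly one branch, no respondent can be classified twice or left unclassified.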
Tabular Representation

Response                     Number    %age
1) …                         124       11.1
2) …                         211       18.9
3) I am not sure             204       18.3
4) …                         200       17.9
Total                        1115      100.0

Response                     Number    %age
6) Yes I would               124       11.1
7) I would probably          211       18.9
8) Uncertain/uninterested    780       69.0
Total                        1115      100.00
Numbering of Columns :
When there are several columns in a table, each column must be serially
numbered. Numbering of columns facilitates easy reference.
Placing of Columns :
Columns whose data are to be compared should be placed side by side. Such
placement facilitates proper comparison.
Separation of Columns :
The columns must be separated by lines, which makes the table easily readable and
attractive. Thick lines must be placed to separate two unrelated columns.
Alignment of Data :
It is important that all the figures in a column are suitably aligned. Positive
and negative signs must also be in perfect alignment.
Displaying of Data :
Display data either in chronological order for time series or by using some
standard classification. For longer time series it may be more appropriate to use
reverse chronological order in some cases, such as for monthly unemployment.
No Empty Data Cells :
Do not leave any data cell empty. Missing values should be identified as not
available or not applicable. The abbreviation "NA" can apply to either (not
available or not applicable), so it needs to be defined in a footnote.
Bar Graphs :
A bar graph or bar chart is a chart with rectangular bars with lengths proportional
to the values that they represent. The bars can be plotted vertically or horizontally. A
bar chart is very useful for recording discrete data.
A bar graph uses either horizontal or vertical bars to show comparisons
among categories. One axis of the chart shows the specific categories being compared,
and the other axis represents a discrete value.
Stacked bar graphs present the information in the same sequence on each bar. A
stacked bar can have two or more parts. For instance, the following diagram shows a
stacked bar graph.
The following table shows the subject-wise distribution of students in three colleges :
[Figure: stacked bar graph of the subject-wise distribution of students in Colleges A, B and C]

Subject        College A    College B    College C
Accountancy    300          250          150
Management     200          250          100
Economics      250          150          250
Total          750          650          500

[Figure: line graph of sales revenue over the periods Jan-Mar, Apr-June, July-Sept and Oct-Dec]

Line Graphs :
A line graph shows information that is connected in some way. A line chart or line
graph is a type of chart which displays information as a series of data points called
markers connected by straight line segments. It is a basic type of chart common in
many fields.
Line charts show how particular data change at equal intervals of time. A line
chart is often used to visualize a trend in data over intervals of time (a time series);
thus the line is often drawn chronologically.
Gantt Charts :
A Gantt chart is a type of bar chart, developed by Henry Gantt in the 1910s, that
illustrates a project schedule. For instance, a Gantt chart may consist of two
horizontal or vertical bars for each period of time/activity. One bar indicates the
planned/anticipated performance, and the other bar indicates the actual performance.

[Figure: bar chart of values for Asia, Europe and America over Jan-Apr]
Histograms :
A histogram is a special kind of bar graph where the intervals are equal. In
statistics, a histogram is a graphical representation of the distribution of data. It is an
estimate of the probability distribution of a continuous variable.
Answer 3
3.1 MEASURE OF CENTRAL TENDENCY
Meaning
A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data. As such, measures of central
tendency are sometimes called measures of central location. They are also classed as
summary statistics. The mean (often called the average) is most likely the measure of
central tendency that you are most familiar with, but there are others, such as the median
and the mode.
Definitions :
George Simpson and Fritz Kafka state that "A measure of central tendency is a typical
value around which the other values congregate."
Jeff Clark states that "Average is an attempt to find one single figure to describe
the whole range of figures in a given series."
Characteristics of Measure of central Tendency
It should be simple to calculate and easy to understand.
It should be rigidly defined.
It should be based on all the observations.
It should not be affected by extreme items.
It should be capable of further algebraic treatment.
It should have sampling stability.
It can be easily calculated in the case of distributions containing open-end class intervals.
It should be in the form of a mathematical formula.
MATHEMATICAL :
* Arithmetic Mean
* Geometric Mean
* Harmonic Mean
POSITIONAL :
* Median
* Mode
1. MEAN (ARITHMETIC)
The mean (or average) is the most popular and well known measure of central tendency.
It can be used with both discrete and continuous data, although its use is most often with
continuous data. The mean is equal to the sum of all the values in the data set divided by
the number of values in the data set. So, if we have n values in a data set with
values x1, x2, ..., xn, the sample mean is usually denoted by x̄ (pronounced "x bar").
Arithmetic mean is calculated with the following formula :
Arithmetic mean = Sum of values of all items / Total number of items
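The formula can be checked with a small illustrative data set in Python (the marks below are the same sample used later in the mode example):

```python
from statistics import mean

# Arithmetic mean = sum of the values of all items / total number of items
marks = [20, 15, 14, 20, 16, 12, 18, 13, 19, 20]
avg = sum(marks) / len(marks)
print(avg)           # 16.7
print(mean(marks))   # 16.7, same result via the stdlib helper
```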
24
(A)
25
Direct Method
26
27
28
29
2. GEOMETRIC MEAN
The geometric mean must be used when working with percentages (which are derived
from values), whereas the standard arithmetic mean works with the values themselves.
Merits of Geometric Mean
It is rigidly defined and it is based on all the observations.
It is less affected by extreme values.
It is useful to obtain averages of percentages and ratios.
It is capable of further algebraic treatment.
It is least affected by the fluctuations of sampling.
It can be used to average rates of changes and to construct index numbers.
Demerits of Geometric Mean
It is difficult to understand and still, more difficult to compute.
It can be a value which does not exist in the series.
It brings out the difference in the value ratio of change and not of absolute difference.
It gives more weightage to smaller items as compared to larger items.
Its value cannot be obtained when there are negative values or when some of them
are zero.
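As a sketch of averaging rates of change with the geometric mean, consider three illustrative yearly growth factors (the figures are invented for the example):

```python
from statistics import geometric_mean  # available in Python 3.8+

# Growth factors: +10%, +20%, -10% (illustrative figures).
growth = [1.10, 1.20, 0.90]
g = geometric_mean(growth)
print(round(g, 4))  # 1.0591

# Cross-check against the definition: nth root of the product of n values.
print(round((1.10 * 1.20 * 0.90) ** (1 / 3), 4))  # 1.0591
```

The arithmetic mean of these factors (1.0667) would overstate the average growth; the geometric mean gives the constant rate that compounds to the same overall change.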
3. HARMONIC MEAN
Harmonic mean of a series is the reciprocal of the arithmetic average of the reciprocal of
the values of its various items.
H.M. = n / (1/x1 + 1/x2 + ... + 1/xn)
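The classic illustration is averaging speeds over equal distances (the speeds below are illustrative):

```python
from statistics import harmonic_mean

# A vehicle covers the same distance at 40 km/h and then at 60 km/h;
# the average speed is the harmonic mean, not the arithmetic mean of 50.
speeds = [40, 60]
n = len(speeds)
hm = n / sum(1 / x for x in speeds)        # definition given in the text
print(round(hm, 6))                         # 48.0
print(round(harmonic_mean(speeds), 6))      # 48.0
```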
4. MEDIAN
Median is the middle value of a series when the data of a series is arranged in ascending
or descending order. It divides the series in two equal parts.
Calculation of Median
(A) Individual Distribution
Formula: Median = size of the (n + 1)/2 th item, where n = number of items.
The same formula is to be used for both even and odd numbers of items.
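The formula can be checked with Python's statistics module on small illustrative series:

```python
from statistics import median

# Odd number of items: sorted series 5, 7, 9, 12, 20 -> (5+1)/2 = 3rd item.
print(median([12, 5, 9, 20, 7]))        # 9

# Even number of items: the (n+1)/2 th "item" falls between the two middle
# values, so their mean is taken: sorted 5, 7, 9, 11, 12, 20 -> (9+11)/2.
print(median([12, 5, 9, 20, 7, 11]))    # 10.0
```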
(B) Discrete Series
Median = size of the (n + 1)/2 th item, where n = total of the frequencies; the
median item is located by taking cumulative frequencies.
(C) Continuous Distribution
Median = l + ((n/2 - c.f.) / f) × h, where l = lower limit of the median class,
c.f. = cumulative frequency of the class preceding the median class, f = frequency
of the median class, and h = class width.
5. MODE
The mode is defined as the value of a variable which occurs most frequently. It is the
value which is repeated the maximum number of times, or with the highest frequency,
in the series.
Croxton and Cowden define it thus: "The mode of a distribution is the value at the
point around which the items tend to be most heavily concentrated."
A.M. Tuttle defines it as: "Mode is the value which has the highest frequency density
in its immediate neighborhood."
A) Individual Numbers
Example: Marks obtained in a test by 10 students - 20, 15, 14, 20, 16, 12, 18, 13, 19, 20.
Solution (arrangement in ascending order): 12, 13, 14, 15, 16, 18, 19, 20, 20, 20.
The value 20 occurs three times, more often than any other value, so Mode = 20.
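The same example worked with Python's statistics module:

```python
from statistics import mode, multimode

marks = [20, 15, 14, 20, 16, 12, 18, 13, 19, 20]
print(mode(marks))       # 20, occurs three times
print(multimode(marks))  # [20], only one point of maximum concentration
```

`multimode` (Python 3.8+) is useful for the bi-modal and multi-modal distributions mentioned under the demerits below, where `mode` alone would hide the ambiguity.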
Merits
Mode is easy to understand and easy to calculate from a given series of items.
Mode is the most typical or representative value.
Mode is not affected by extreme values in the data, as it considers the value that
occurs most frequently.
Mode can be calculated for open-end class intervals, or in those cases where only the
neighborhood of the point of concentration is known.
To calculate mode, there is no need to know the value of all items of the series.
The mode can be identified from mere inspection of the values in the series and there
is no need for calculation.
Mode can be determined graphically from a histogram.
Demerits
Mode is not rigidly defined. A distribution may be bi-modal or multi-modal.
There is greater instability in the mode. It is affected by sampling fluctuations.
3.2 CORRELATION ANALYSIS
The Pearson correlation is defined only if both of the standard deviations are finite and
nonzero. It is a corollary of the Cauchy-Schwarz inequality that the correlation cannot
exceed 1 in absolute value. The correlation coefficient is symmetric:
corr(X, Y) = corr(Y, X).
If x and y are results of measurements that contain measurement error, the realistic
limits on the correlation coefficient are not -1 to +1 but a smaller range.
For the case of a linear model with a single independent variable, the coefficient of
determination (R squared) is the square of r, Pearson's product-moment coefficient.
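Pearson's r can be computed from its definition, r = cov(x, y) / (sd(x) · sd(y)), in a short stdlib-only sketch, checked here against the four (x, y) pairs discussed in the rank-correlation example below:

```python
# Pearson's product-moment correlation from first principles (no external packages).
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sdx = sum((a - mx) ** 2 for a in x) ** 0.5
    sdy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sdx * sdy)

x = [0, 10, 101, 102]
y = [1, 100, 500, 2000]
print(round(pearson_r(x, y), 4))       # 0.7544, the figure quoted for this data
print(round(pearson_r(x, y) ** 2, 4))  # R squared for a one-variable linear model
```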
Rank Correlation
Rank correlation coefficients, such as Spearman's rank correlation
coefficient (ρ) and Kendall's rank correlation coefficient (τ), measure the extent to
which, as one variable increases, the other variable tends to increase, without requiring
that increase to be represented by a linear relationship. If, as the one variable increases,
the other decreases, the rank correlation coefficients will be negative. However, this view
has little mathematical basis, as rank correlation coefficients measure a different type of
relationship than the Pearson product-moment correlation coefficient, and are best seen
as measures of a different type of association, rather than as alternative measures of the
population correlation coefficient. To illustrate the nature of rank correlation, and its
difference from linear correlation, consider the following four pairs of numbers (x, y):
(0, 1), (10, 100), (101, 500), (102, 2000).
As we go from each pair to the next pair, x increases, and so does y. This relationship is
perfect, in the sense that an increase in x is always accompanied by an increase in y. This
means that we have a perfect rank correlation, and both Spearman's and Kendall's
correlation coefficients are 1, whereas in this example the Pearson product-moment
correlation coefficient is 0.7544, indicating that the points are far from lying on a straight
line. In the same way, if y always decreases when x increases, the rank correlation
coefficients will be -1, while the Pearson product-moment correlation coefficient may or
may not be close to -1, depending on how close the points are to a straight line. Although
in the extreme cases of perfect rank correlation the two coefficients are both equal (being
both +1 or both -1), this is not in general so, and values of the two coefficients cannot
meaningfully be compared. For example, for the three pairs (1, 1), (2, 3), (3, 2),
Spearman's coefficient is 1/2, while Kendall's coefficient is 1/3.
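Both coefficients can be computed from first principles with the standard library, reproducing the values just quoted (this sketch assumes no tied values):

```python
from itertools import combinations

def ranks(values):
    # Rank of each value (1 = smallest); assumes no ties.
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    # rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), d = rank difference.
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall_tau(x, y):
    # tau = (concordant pairs - discordant pairs) / total pairs.
    pairs = list(combinations(range(len(x)), 2))
    conc = sum((x[i] - x[j]) * (y[i] - y[j]) > 0 for i, j in pairs)
    disc = sum((x[i] - x[j]) * (y[i] - y[j]) < 0 for i, j in pairs)
    return (conc - disc) / len(pairs)

print(spearman_rho([1, 2, 3], [1, 3, 2]))  # 0.5
print(kendall_tau([1, 2, 3], [1, 3, 2]))   # 1/3

# Perfect rank correlation for the monotone data discussed above:
xs, ys = [0, 10, 101, 102], [1, 100, 500, 2000]
print(spearman_rho(xs, ys), kendall_tau(xs, ys))  # 1.0 1.0
```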
3.3 REGRESSION ANALYSIS
Regression analysis involves identifying the relationship between a dependent variable
and one or more independent variables. A model of the relationship is hypothesized, and
estimates of the parameter values are used to develop an estimated regression equation.
Various tests are then employed to determine if the model is satisfactory. If the model is
deemed satisfactory, the estimated regression equation can be used to predict the value of
the dependent variable given values for the independent variables.
TYPES OF REGRESSION MODEL
SIMPLE AND MULTIPLE
In simple linear regression, the model used to describe the relationship between
a single dependent variable y and a single independent variable x is y = a0 + a1x + ε.
a0 and a1 are referred to as the model parameters, and ε is a probabilistic error term that
accounts for the variability in y that cannot be explained by the linear relationship with x.
If the error term were not present, the model would be deterministic; in that case,
knowledge of the value of x would be sufficient to determine the value of y.
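A minimal least-squares sketch of estimating a0 and a1 and then predicting y; the data below are invented for illustration and lie exactly on a line, so the error term is zero:

```python
# Least-squares estimates for the simple linear model y = a0 + a1*x.
def fit_simple_regression(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Slope: covariance of x and y over variance of x.
    a1 = (sum((a - mx) * (b - my) for a, b in zip(x, y))
          / sum((a - mx) ** 2 for a in x))
    a0 = my - a1 * mx            # Intercept: line passes through the means.
    return a0, a1

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]             # exactly y = 1 + 2x (illustrative data)
a0, a1 = fit_simple_regression(x, y)
print(a0, a1)                    # 1.0 2.0
y_hat = a0 + a1 * 6              # predict y for a new x = 6
print(y_hat)                     # 13.0
```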
3.4 MEASURE OF DISPERSION
Measures of dispersion are classified into:
* Absolute Measures
* Relative Measures
Range
It is the difference between the maximum value and the minimum value in a series of
data. In other words, it is the difference between the largest value and the smallest
value of the distribution. It is an absolute measure.
Range = Largest Value - Smallest Value
Semi-Inter-Quartile Range
It is defined as follows:
Semi-inter-Quartile Range = (Q3 - Q1) / 2
The semi-inter-quartile range considers only the middle 50% of the observations and
ignores the first and the last quarters. It is an absolute measure. The quartile
deviation also measures the average amount by which the two quartiles Q1 and Q3
differ from the median.
Mean Deviation
The range or quartile deviation does not take into account the deviations from the
central value. The mean deviation considers these differences in absolute values and
averages them.
Thus, the mean deviation, which is an absolute measure, is defined as the arithmetic
mean of the absolute values of the deviations of all the observations taken from the
mean, median or mode.
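A short illustration of the definition on invented data, taking deviations about the mean and about the median:

```python
from statistics import mean, median

data = [2, 4, 6, 8, 10]                 # illustrative series; mean = median = 6
m = mean(data)
md_mean = sum(abs(x - m) for x in data) / len(data)
print(md_mean)                           # 2.4

med = median(data)
md_median = sum(abs(x - med) for x in data) / len(data)
print(md_median)                         # 2.4, same here since mean == median
```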
Standard Deviation
This concept was developed by Karl Pearson in 1893. It is defined as the positive
square root of the arithmetic mean of the squares of the deviations of the observations
from the arithmetic mean. It is denoted by σ (sigma). It is an absolute measure. It is
the most important and widely used measure of dispersion.
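The definition can be verified directly against the stdlib's population standard deviation on an illustrative series:

```python
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]          # illustrative series; mean = 5
print(pstdev(data))                       # 2.0

# By the definition: positive square root of the mean squared deviation.
m = sum(data) / len(data)
sigma = (sum((x - m) ** 2 for x in data) / len(data)) ** 0.5
print(sigma)                              # 2.0
```

Note that `pstdev` divides by n (population form, matching the definition above), while `statistics.stdev` divides by n - 1 (sample form).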
Coefficient of Variation
The coefficient of variation is the relative measure corresponding to the standard
deviation. It is denoted by C.V. and is expressed as a percentage.
C.V. = (Standard Deviation / Mean) × 100
It is used to compare variability or consistency of two or more distributions.
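For instance, two illustrative score series with the same mean can be compared for consistency via the C.V. (the data are invented for the example):

```python
from statistics import mean, pstdev

# Two series with the same mean (50) but different spread.
a = [40, 50, 60, 50, 50]
b = [10, 100, 20, 90, 30]

def cv(series):
    # C.V. = (standard deviation / mean) * 100
    return pstdev(series) / mean(series) * 100

print(round(cv(a), 2))   # about 12.65
print(round(cv(b), 2))   # about 74.83
```

The series with the lower C.V. (here, `a`) is the more consistent one, even though both have the same mean.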
4.1 Conclusion
After answering all the questions given in this assignment, I understand how to
process data in research through each stage, namely editing, coding, classification,
tabulation and graphic presentation of data, along with the meaning, definitions and
in-depth details of each of these aspects. In the third question we studied Measures
of Central Tendency, Correlation Analysis, Regression Analysis and Measures of
Dispersion, with explanations of the mean, median and mode, Karl Pearson's
product-moment correlation coefficient and Kendall's rank correlation coefficient (τ),
followed by regression analysis and its types (simple and multiple, total and partial,
linear and non-linear), and measures of dispersion covering absolute and relative
measures. After doing this project I have gained a lot of knowledge of statistical
mathematics. This is my conclusion about this project.
4.2 BIBLIOGRAPHY
Research Methodology, by S. Mohan and R. Elangovan, Deep Publications Pvt. Ltd.
Research Methodology, by Dr. S.L. Gupta and Hitesh Gupta, International Book House.
Statistical Mathematics.