(BRM) Statistical Methods

Notes 13 Pages

Bengaluru City University

Bachelor of Business Administration

Contributed by

Prajwal Hallale

BUSINESS RESEARCH METHODS, 4TH SEMESTER BBA, BANGALORE CENTRAL UNIVERSITY

Vinutha T.N, Assistant Professor, MES Institute of Management

UNIT 4: STATISTICAL METHODS
Tabulation of data - Analysis of data –Testing of Hypothesis, Advanced techniques – ANOVA,
Chi-Square - Discriminant Analysis - Factor analysis, Conjoint analysis - Multidimensional
Scaling - Cluster Analysis
PROCESSING OPERATIONS IN RESEARCH METHODOLOGY
1. Editing: Editing of data is a process of examining the collected raw data (especially in
surveys) to detect errors and omissions and to correct these when possible. As a matter of
fact, editing involves a careful scrutiny of the completed questionnaires and/or schedules.
Editing is done to assure that the data are accurate, consistent with other facts gathered,
uniformly entered, as complete as possible and have been well arranged to facilitate coding
and tabulation. With regard to points or stages at which editing should be done, one can talk
of field editing and central editing. Field editing consists in the review of the reporting forms
by the investigator for completing (translating or rewriting) what the latter has written in
abbreviated and/or in illegible form at the time of recording the respondents’ responses.
This type of editing is necessary in view of the fact that individual writing styles often can
be difficult for others to decipher. This sort of editing should be done as soon as possible
after the interview, preferably on the very day or on the next day. While doing field editing,
the investigator must restrain himself and must not correct errors of omission by simply
guessing what the informant would have said if the question had been asked.
Central editing should take place when all forms or schedules have been completed and
returned to the office. This type of editing implies that all forms should get a thorough
editing by a single editor in a small study and by a team of editors in case of a large inquiry.
Editor(s) may correct the obvious errors such as an entry in the wrong place, entry recorded
in months when it should have been recorded in weeks, and the like. In case of inappropriate
or missing replies, the editor can sometimes determine the proper answer by reviewing the
other information in the schedule. At times, the respondent can be contacted for
clarification. The editor must strike out the answer if the same is inappropriate and he has
no basis for determining the correct answer or the response. In such a case an editing entry
of ‘no answer’ is called for. All the wrong replies, which are quite obvious, must be dropped
from the final results, especially in the context of mail surveys.
Editors must keep in view several points while performing their work: They should be
familiar with instructions given to the interviewers and coders as well as with the editing
instructions supplied to them for the purpose. While crossing out an original entry for one
reason or another, they should just draw a single line on it so that the same may remain
legible. They must make entries (if any) on the form in some distinctive colour and that too
in a standardised form. They should initial all answers which they change or supply. Editor’s
initials and the date of editing should be placed on each completed form or schedule.

2. Coding: Coding refers to the process of assigning numerals or other symbols to answers so
that responses can be put into a limited number of categories or classes. Such classes should
be appropriate to the research problem under consideration. They must also possess the
characteristic of exhaustiveness (i.e., there must be a class for every data item) and also that
of mutual exclusivity, which means that a specific answer can be placed in one and only one
cell in a given category set. Another rule to be observed is that of unidimensionality, by which
is meant that every class is defined in terms of only one concept. Coding is necessary for
efficient analysis and through it the several replies may be reduced to a small number of
classes which contain the critical information required for analysis. Coding decisions should
usually be taken at the designing stage of the questionnaire. This makes it possible to
precode the questionnaire choices, which in turn is helpful for computer tabulation as
one can key punch straight from the original questionnaires. But in case of hand
coding some standard method may be used. One such standard method is to code in the
margin with a coloured pencil. The other method can be to transcribe the data from the
questionnaire to a coding sheet. Whatever method is adopted, one should see that coding
errors are altogether eliminated or reduced to the minimum level.
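
The coding rules above can be sketched in a few lines of Python. This is only a minimal illustration: the codebook, the catch-all code 9 and the sample answers are assumptions made for the example, not part of any prescribed scheme.

```python
# Hypothetical codebook for one survey question; the categories are illustrative.
CODEBOOK = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
NO_ANSWER = 9  # assumed catch-all class so that every reply gets a code (exhaustiveness)


def code_response(answer):
    """Return the single numeric code for a raw answer (mutual exclusivity)."""
    key = answer.strip().lower() if answer else ""
    return CODEBOOK.get(key, NO_ANSWER)


raw_answers = ["Agree", " neutral ", "", "Strongly Agree", "maybe"]
coded = [code_response(a) for a in raw_answers]
print(coded)
```

Each reply falls in exactly one class, and replies that match no class are still coded, so the set of classes stays exhaustive.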

3. Classification: Most research studies result in a large volume of raw data which must be
reduced into homogeneous groups if we are to get meaningful relationships. This fact
necessitates classification of data which happens to be the process of arranging data in
groups or classes on the basis of common characteristics. Data having a common
characteristic are placed in one class and in this way the entire data get divided into a number
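of groups or classes. Classification can be one of the following two types, depending upon
the nature of the phenomenon involved: classification according to attributes (descriptive
characteristics such as literacy, sex or honesty) and classification according to class-intervals
(numerical characteristics such as income, age or weight).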

4. Tabulation: When a mass of data has been assembled, it becomes necessary for the
researcher to arrange the same in some kind of concise and logical order. This procedure is
referred to as tabulation. Thus, tabulation is the process of summarising raw data and
displaying the same in compact form (i.e., in the form of statistical tables) for further
analysis. In a broader sense, tabulation is an orderly arrangement of data in columns and
rows. Tabulation is essential because of the following reasons.
1. It conserves space and reduces explanatory and descriptive statement to a minimum.
2. It facilitates the process of comparison.
3. It facilitates the summation of items and the detection of errors and omissions.
4. It provides a basis for various statistical computations.
Tabulation can be done by hand or by mechanical or electronic devices. The choice depends
on the size and type of study, cost considerations, time pressures and the availability of
tabulating machines or computers. In relatively large inquiries, we may use mechanical or
computer tabulation if other factors are favourable and necessary facilities are available. Hand
tabulation is usually preferred in case of small inquiries where the number of questionnaires is
small and they are of relatively short length. Hand tabulation may be done using the direct tally,
the list and tally or the card sort and count methods. When there are simple codes, it is feasible
to tally directly from the questionnaire. Under this method, the codes are written on a sheet of
paper, called tally sheet, and for each response a stroke is marked against the code in which it
falls. Usually after every four strokes against a particular code, the fifth response is indicated
by drawing a diagonal or horizontal line through the strokes. These groups of five are easy to
count and the data are sorted against each code conveniently. In the listing method, the code
responses may be transcribed onto a large work-sheet, allowing a line for each questionnaire.
This way a large number of questionnaires can be listed on one work sheet. Tallies are then
made for each question. The card sorting method is the most flexible hand tabulation. In this
method the data are recorded on special cards of convenient size and shape with a series of
holes. Each hole stands for a code and when the cards are stacked, a needle is passed through
the particular hole representing a particular code. These cards are then separated and counted. In
this way frequencies of various codes can be found out by the repetition of this technique. We
can as well use the mechanical devices or the computer facility for tabulation purpose in case
we want quick results, our budget permits their use and we have a large volume of
straightforward tabulation involving a number of cross-breaks.
Tabulation may also be classified as simple and complex tabulation. The former type of
tabulation gives information about one or more groups of independent questions, whereas the
latter type of tabulation shows the division of data in two or more categories and as such is
designed to give information concerning one or more sets of inter-related questions. Simple
tabulation generally results in one-way tables which supply answers to questions about one
characteristic of data only. As against this, complex tabulation usually results in two-way tables
(which give information about two inter-related characteristics of data), three-way tables
(giving information about three interrelated characteristics of data) or still higher order tables,
also known as manifold tables, which supply information about several interrelated
characteristics of data. Two-way tables, three-way tables or manifold tables are all examples
of what is sometimes described as cross tabulation.
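
As a rough sketch of the difference between simple and complex tabulation, the following Python lines tally a one-way table and a two-way (cross) table from coded replies. The replies and the category labels are made up for the illustration.

```python
from collections import Counter

# Illustrative coded replies: (gender code, preference code) for each questionnaire.
responses = [
    ("M", "tea"), ("F", "coffee"), ("M", "coffee"), ("F", "tea"),
    ("F", "coffee"), ("M", "coffee"), ("F", "coffee"), ("M", "tea"),
]

# Simple tabulation: a one-way table for a single characteristic.
one_way = Counter(pref for _, pref in responses)

# Complex (cross) tabulation: a two-way table for two inter-related characteristics.
two_way = Counter(responses)

rows = sorted({g for g, _ in responses})
cols = sorted({p for _, p in responses})

print("Preference:", dict(one_way))
print(f"{'':8}" + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}")
for g in rows:
    counts = [two_way[(g, c)] for c in cols]
    print(f"{g:8}" + "".join(f"{n:>8}" for n in counts) + f"{sum(counts):>8}")
col_totals = [sum(two_way[(g, c)] for g in rows) for c in cols]
print(f"{'Total':8}" + "".join(f"{n:>8}" for n in col_totals) + f"{len(responses):>8}")
```

The `Counter` plays the part of the tally sheet: each reply adds one stroke against its code. Row totals are printed in the extreme right column and column totals at the bottom.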
Generally accepted principles of tabulation: Such principles of tabulation, particularly of
constructing statistical tables, can be briefly stated as follows:
1. Every table should have a clear, concise and adequate title so as to make the table intelligible
without reference to the text and this title should always be placed just above the body of
the table.
2. Every table should be given a distinct number to facilitate easy reference.
3. The column headings (captions) and the row headings (stubs) of the table should be clear
and brief.
4. The units of measurement under each heading or sub-heading must always be indicated.
5. Explanatory footnotes, if any, concerning the table should be placed directly beneath the
table, along with the reference symbols used in the table.
6. Source or sources from where the data in the table have been obtained must be indicated
just below the table.
7. Usually the columns are separated from one another by lines which make the table more
readable and attractive. Lines are always drawn at the top and bottom of the table and below
the captions.
8. There should be thick lines to separate the data under one class from the data under another
class and the lines separating the sub-divisions of the classes should be comparatively thin
lines.
9. The columns may be numbered to facilitate reference.
10. Those columns whose data are to be compared should be kept side by side. Similarly,
percentages and/or averages must also be kept close to the data.
11. It is generally considered better to approximate figures before tabulation as the same would
reduce unnecessary details in the table itself.
12. In order to emphasise the relative significance of certain categories, different kinds of type,
spacing and indentations may be used.
13. It is important that all column figures be properly aligned. Decimal points and (+) or (–)
signs should be in perfect alignment.
14. Abbreviations should be avoided to the extent possible and ditto marks should not be used
in the table.
15. Miscellaneous and exceptional items, if any, should be usually placed in the last row of the
table.
16. Tables should be made as logical, clear, accurate and simple as possible. If the data happen
to be very large, they should not be crowded in a single table for that would make the table
unwieldy and inconvenient.
17. Total of rows should normally be placed in the extreme right column and that of columns
should be placed at the bottom.
18. The arrangement of the categories in a table may be chronological, geographical,
alphabetical or according to magnitude to facilitate comparison. Above all, the table must
suit the needs and requirements of an investigation.

ANALYSIS OF DATA
Data analysis embraces a whole range of activities of both the qualitative and quantitative type.
It is a usual tendency in behavioral research that much use is made of quantitative analysis, and
statistical methods and techniques are employed. These methods and techniques hold a special
position in research because they provide answers to the problems.
Kaul defines data analysis as “studying the organized material in order to discover inherent
facts. The data are studied from as many angles as possible to explore the new facts.”
Purpose:
The following are the main purposes of data analysis:
(i) Description: It involves a set of activities that are an essential first step in the
development of most fields. A researcher must be able to identify a topic about which
much was not known; he must be able to convince others about its importance and
must be able to collect data.
(ii) Construction of Measurement Scale: The researcher should construct a measurement
scale. All numbers generated by measuring instruments can be placed into one of four
categories:
(a) Nominal: The numbers serve as nothing more than labels. For example, No. 1 is
not less than No. 2; similarly, No. 2 is neither more than No. 1 nor less than
No. 3.
(b) Ordinal: Such numbers are used to designate an ordering along some dimensions
such as from less to more, from small to large, from sooner to later.
(c) Interval: The interval scale provides more precise information than the ordinal one. By this
type of measurement, the researcher can make exact and meaningful decisions. For
example, if A, B and C are 150 cm, 145 cm and 140 cm in height, the researcher can
say that A is 5 cm taller than B and B is 5 cm taller than C.
(d) Ratio Scale: It has two unique characteristics. The intervals between points can be
demonstrated to be precisely the same and the scale has a conceptually meaningful
zero point.
(iii) Generating empirical relationships: Another purpose of analysis of data is
identification of regularities and relationships among data. The researcher has no clear
idea in advance about the relationship which will be found from the collected data. If the data are
available in detail, it is easier to determine the relationship. The researcher can
develop theories if he is able to recognize pattern and order of data. The pattern may
be showing association among variables, which may be done by calculating correlation
among variables or showing order, precedence or priority. The derivation of empirical
laws may be made in the form of simple equations relating one interval or ratio scaled
variable to a few others through graph methods.
(iv) Explanation and prediction: Generally knowledge and research are equated with the
identification of causal relationships and all research activities are directed to it. But in
many fields the research has not been developed to the level where causal explanation
is possible or valid predictions can be made. In such a situation explanation and
prediction are construed as enabling the values of one set of variables to be derived given
the values of another.
Functions: The following are the main functions of data analysis:
(i) The researcher should analyze the available data for examining the statement
of the problem.
(ii) The researcher should analyze the available data for examining each hypothesis
of the problem.
(iii) The researcher should study the original records of the data before data
analysis.
(iv) The researcher should analyze the data for thinking about the research problem
in layman’s terms.
(v) The researcher should analyze the data by attacking it through statistical
calculations.
(vi) The researcher should think in terms of significant tables that the available data
permits for the analysis of data.

STATISTICAL CALCULATIONS:
The researcher will have to use either descriptive statistics or inferential statistics for the
purpose of the analysis.
The descriptive statistics may be on any of the following forms:
(a) Measures of Central Tendency: These measures are mean, median, mode, geometric mean
and harmonic mean. In behavioral statistics the last two measures are not used. Which of the
first three will be used in social statistics depends upon the nature of the problem.
(b) Measures of Variability: These measures are range, mean deviation, quartile deviation and
standard deviation. In social statistics the first two measures are rarely used. The use of standard
deviation is very frequently made for the purpose of analysis.
(c) Measures of Relative Position: These measures are standard scores (Z or T scores),
percentiles and percentile ranks. All of them are used in educational statistics for data analysis.
(d) Measures of Relationship: These measures are coefficient of correlation, partial
correlation and multiple correlation. All of them are used in educational statistics for the
analysis of data. However, the rank method is used more often than Karl Pearson’s
method.
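
The descriptive measures listed above can be computed with Python's standard `statistics` module, as in the sketch below. The scores and study hours are invented for the illustration, and `pearson_r` is a hypothetical helper written out here by hand for Karl Pearson's coefficient.

```python
import statistics as st

scores = [52, 60, 60, 65, 71, 75, 80]  # illustrative test scores
hours = [2, 3, 3, 4, 5, 6, 7]          # illustrative study hours for the same students

# (a) Measures of central tendency
mean = st.mean(scores)
median = st.median(scores)
mode = st.mode(scores)

# (b) Measures of variability
value_range = max(scores) - min(scores)
sd = st.pstdev(scores)  # standard deviation, treating the scores as the whole population

# (c) Measure of relative position: the standard (Z) score of each value
z_scores = [(x - mean) / sd for x in scores]


# (d) Measure of relationship: Karl Pearson's coefficient of correlation
def pearson_r(x, y):
    mx, my = st.mean(x), st.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


print(round(mean, 2), median, mode, value_range, round(sd, 2))
print(round(pearson_r(hours, scores), 3))
```

Whether the population or the sample standard deviation (`st.stdev`) is appropriate depends on whether the data are the whole group of interest or a sample from it.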
The inferential statistics may be in any one of the following forms:
(a) Significance of Difference between Means: It is used to determine whether a true
difference exists between the means of the populations from which two samples are drawn.
(b) Analysis of Variance: The Z or t tests are used to determine whether there is any
significant difference between the means of two random samples. The F test enables the
researcher to determine whether the sample means differ from one another to a greater extent
than the test scores differ from their own sample means, using the F ratio.
(c) Analysis of Co-Variance: It is an extension of analysis of variance to test the significance
of difference between means of final experimental data by taking into account the Correlation
between the dependent variable and one or more Co-variates or control variables and by
adjusting initial mean differences in the group.
(d) Correlation Methods: Either of two methods of correlation can be used for the purpose of
calculating the significance of the difference between coefficients of correlation.
(e) Chi Square Test: It is used to estimate the likelihood that some factor other than chance
accounts for the observed relationship. In this test the expected frequency and observed
frequency are used for evaluating Chi Square.
(f) Regression Analysis: For calculating the probability of occurrence of any phenomenon or
for predicting the phenomenon or relationship between different variables regression analysis
is done.
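
To make the Chi Square test in (e) concrete, the sketch below evaluates Chi Square from observed and expected frequencies for a 2 x 2 table. The observed counts are invented, and the expected frequencies assume that the two characteristics are independent.

```python
# Illustrative 2 x 2 table of observed frequencies (rows: group, columns: outcome).
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency under independence: (row total x column total) / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi Square = sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_5PCT_1DF = 3.841  # tabulated Chi Square value for 1 d.f. at the 5% level
print(chi_square, dof, chi_square > CRITICAL_5PCT_1DF)
```

A computed value above the tabulated one suggests that something other than chance accounts for the observed relationship.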
ANOVA is essentially a procedure for testing the difference among different groups of data
for homogeneity. “The essence of ANOVA is that the total amount of variation in a set of data
is broken down into two types, that amount which can be attributed to chance and that amount
which can be attributed to specified causes.” There may be variation between samples and also
within sample items. ANOVA consists in splitting the variance for analytical purposes. Hence,
it is a method of analysing the variance to which a response is subject into its various
components corresponding to various sources of variation. Through this technique one can
explain whether various varieties of seeds or fertilizers or soils differ significantly so that a
policy decision could be taken accordingly, concerning a particular variety in the context of
agricultural research.
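
The splitting of total variation into a between-samples part and a within-samples part can be sketched as a one-way ANOVA in plain Python. The yield figures for the three seed varieties are assumed purely for illustration.

```python
# Illustrative yields for three seed varieties (one list per sample).
groups = [
    [6, 7, 3, 8],
    [5, 5, 3, 7],
    [5, 4, 3, 4],
]

k = len(groups)                  # number of samples
n = sum(len(g) for g in groups)  # total number of items
grand_mean = sum(sum(g) for g in groups) / n
group_means = [sum(g) / len(g) for g in groups]

# Variation between samples (attributable to the specified cause, here the variety).
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# Variation within samples (attributable to chance).
ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_ratio = ms_between / ms_within

CRITICAL_F_5PCT = 4.26  # tabulated F value for (2, 9) d.f. at the 5% level
print(ss_between, ss_within, f_ratio, f_ratio > CRITICAL_F_5PCT)
```

With these figures the F ratio falls below the tabulated value, so the difference among the varieties would be treated as insignificant, i.e., as a matter of chance.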
TESTING OF HYPOTHESIS
The word hypothesis consists of two words –Hypo+Thesis. ‘Hypo’ means tentative or subject
to the verification. ‘Thesis’ means statement about solution of the problem. Thus the literal
meaning of the term hypothesis is a tentative statement about the solution of the problem.
Hypothesis offers a solution of the problem that is to be verified empirically and based on some
rationale.
Ordinarily, when one talks about hypothesis, one simply means a mere assumption or some
supposition to be proved or disproved. But for a researcher hypothesis is a formal question that
he intends to resolve.
Quite often a research hypothesis is a predictive statement, capable of being tested by scientific
methods, that relates an independent variable to some dependent variable. For example,
consider statements like the following ones.
“Students who receive counselling will show a greater increase in creativity than students not
receiving counselling” Or “the automobile A is performing as well as automobile B.”
These are hypotheses capable of being objectively verified and tested. Thus, we may conclude
that a hypothesis states what we are looking for and it is a proposition which can be put to a
test to determine its validity.
BASIC CONCEPTS OF HYPOTHESIS TESTING
Basic concepts in the context of testing of hypotheses need to be explained.
Null hypothesis and alternative hypothesis: In the context of statistical analysis, we often
talk about null hypothesis and alternative hypothesis. If we are to compare method A with
method B about its superiority and if we proceed on the assumption that both methods are
equally good, then this assumption is termed as the null hypothesis. As against this, if we
think that the method A is superior or the method B is inferior, we are then stating what is
termed as alternative hypothesis. The null hypothesis is generally symbolized as H0 and the
alternative hypothesis as Ha. Suppose we want to test the hypothesis that the population mean
($\mu$) is equal to the hypothesised mean ($\mu_{H_0}$) = 100.
Then we would say that the null hypothesis is that the population mean is equal to the
hypothesized mean 100 and symbolically we can express as:
$H_0: \mu = \mu_{H_0} = 100$
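If our sample results do not support this null hypothesis, we should conclude that something
else is true. What we conclude on rejecting the null hypothesis is known as alternative hypothesis.
In other words, the set of alternatives to the null hypothesis is referred to as the alternative
hypothesis. If we accept H0, then we are rejecting Ha and if we reject H0, then we are accepting
Ha. For $H_0: \mu = \mu_{H_0} = 100$, we may consider three possible alternative hypotheses as
follows:
(i) $H_a: \mu \neq \mu_{H_0}$ (the population mean is not equal to 100, i.e., it may be more or
less than 100);
(ii) $H_a: \mu > \mu_{H_0}$ (the population mean is greater than 100);
(iii) $H_a: \mu < \mu_{H_0}$ (the population mean is less than 100).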

The null hypothesis and the alternative hypothesis are chosen before the sample is drawn (the
researcher must avoid the error of deriving hypotheses from the data that he collects and then
testing the hypotheses from the same data). In the choice of null hypothesis, the following
considerations are usually kept in view:
• Alternative hypothesis is usually the one which one wishes to prove and the null hypothesis
is the one which one wishes to disprove. Thus, a null hypothesis represents the hypothesis
we are trying to reject, and alternative hypothesis represents all other possibilities.
• If the rejection of a certain hypothesis when it is actually true involves great risk, it is taken
as null hypothesis because then the probability of rejecting it when it is true is $\alpha$ (the level
of significance) which is chosen very small.
• Null hypothesis should always be specific hypothesis i.e., it should not state “about” or
“approximately” a certain value. Generally, in hypothesis testing we proceed on the basis of
null hypothesis, keeping the alternative hypothesis in view. Why so? The answer is that on
the assumption that null hypothesis is true, one can assign the probabilities to different
possible sample results, but this cannot be done if we proceed with the alternative
hypothesis. Hence the use of null hypothesis (at times also known as statistical hypothesis)
is quite frequent.

The level of significance: This is a very important concept in the context of hypothesis
testing. It is always some percentage (usually 5%) which should be chosen with great care,
thought and reason. In case we take the significance level at 5 per cent, then this implies
that H0 will be rejected when the sampling result (i.e., observed evidence) has a less than
0.05 probability of occurring if H0 is true. In other words, the 5 per cent level of significance
means that the researcher is willing to take as much as a 5 per cent risk of rejecting the null
hypothesis when it (H0) happens to be true. Thus, the significance level is the maximum
value of the probability of rejecting H0 when it is true and is usually determined in advance
before testing the hypothesis.

Decision rule or test of hypothesis: Given a hypothesis H0 and an alternative hypothesis
Ha, we make a rule which is known as decision rule according to which we accept H0 (i.e.,
reject Ha) or reject H0 (i.e., accept Ha). For instance, if H0 is that a certain lot is good
(there are very few defective items in it) against Ha that the lot is not good (there are too
many defective items in it), then we must decide the number of items to be tested and the
criterion for accepting or rejecting the hypothesis. We might test 10 items in the lot and plan
our decision saying that if there are none or only 1 defective item among the 10, we will
accept H0 otherwise we will reject H0 (or accept Ha). This sort of basis is known as decision
rule.
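
The decision rule in this example (test 10 items, accept H0 if at most 1 is defective) can be examined numerically. The sketch below assumes that each tested item is defective independently with the same probability, so the number of defectives follows a binomial distribution. The defect rates of 5% for a good lot and 30% for a bad lot are hypothetical figures chosen only to show the calculation.

```python
from math import comb


def prob_accept(p_defective, n=10, max_defects=1):
    """P(accept H0) = P(at most `max_defects` defectives among n tested items)."""
    return sum(
        comb(n, k) * p_defective**k * (1 - p_defective) ** (n - k)
        for k in range(max_defects + 1)
    )


# Hypothetical defect rates: 5% for a 'good' lot, 30% for a 'bad' lot.
alpha = 1 - prob_accept(0.05)  # risk of rejecting a lot that is actually good
beta = prob_accept(0.30)       # risk of accepting a lot that is actually bad
print(round(alpha, 4), round(beta, 4))
```

Changing `n` or `max_defects` changes both risks at once, which is why the number of items to be tested and the acceptance criterion have to be decided together.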

Type I and Type II errors: In the context of testing of hypotheses, there are basically two
types of errors we can make. We may reject H0 when H0 is true and we may accept H0
when in fact H0 is not true. The former is known as Type I error and the latter as Type II
error. In other words, Type I error means rejection of hypothesis which should have been
accepted and Type II error means accepting the hypothesis which should have been rejected.
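Type I error is denoted by $\alpha$ (alpha), known as $\alpha$ error, also called the level of significance of
test; and Type II error is denoted by $\beta$ (beta), known as $\beta$ error. In a tabular form the said
two errors can be presented as follows:

                    Decision: Accept H0               Decision: Reject H0
H0 (true)           Correct decision                  Type I error ($\alpha$ error)
H0 (false)          Type II error ($\beta$ error)     Correct decision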

The probability of Type I error is usually determined in advance and is understood as the level
of significance of testing the hypothesis. If type I error is fixed at 5 per cent, it means that there
are about 5 chances in 100 that we will reject H0 when H0 is true. We can control Type I error
just by fixing it at a lower level. For instance, if we fix it at 1 per cent, we will say that the
maximum probability of committing Type I error would only be 0.01.
But with a fixed sample size, n, when we try to reduce Type I error, the probability of
committing Type II error increases. Both types of errors cannot be reduced simultaneously.
There is a trade-off between two types of errors which means that the probability of making
one type of error can only be reduced if we are willing to increase the probability of making
the other type of error. To deal with this trade-off in business situations, decision-makers decide
the appropriate level of Type I error by examining the costs or penalties attached to both types
of errors. If Type I error involves the time and trouble of reworking a batch of chemicals that
should have been accepted, whereas Type II error means taking a chance that an entire group
of users of this chemical compound will be poisoned, then in such a situation one should prefer
a Type I error to a Type II error. As a result one must set a very high level for Type I error in
one’s testing technique of a given hypothesis. Hence, in the testing of hypothesis, one must
make all possible effort to strike an adequate balance between Type I and Type II errors.
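
The trade-off between the two types of errors at a fixed sample size can be seen in a small calculation. The sketch below assumes an upper-tailed Z test with a known population standard deviation. The hypothesised mean, the "true" mean, the standard deviation and the sample size are all illustrative assumptions.

```python
from statistics import NormalDist

MU_0, MU_TRUE = 100.0, 103.0  # hypothesised mean and an assumed true mean
SIGMA, N = 10.0, 25           # assumed known population SD and fixed sample size
std_error = SIGMA / N ** 0.5


def type_ii_error(alpha):
    """Beta for an upper-tailed Z test of H0: mu = MU_0 when mu is really MU_TRUE."""
    critical_mean = MU_0 + NormalDist().inv_cdf(1 - alpha) * std_error
    # H0 is wrongly accepted whenever the sample mean falls below the cut-off.
    return NormalDist(MU_TRUE, std_error).cdf(critical_mean)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(type_ii_error(alpha), 3))
```

As the level fixed for Type I error is lowered, the computed probability of Type II error rises. With n fixed, one risk can only be reduced at the cost of the other.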
Two-tailed and One-tailed tests: In the context of hypothesis testing, these two terms are
quite important and must be clearly understood. A two-tailed test rejects the null hypothesis if,
say, the sample mean is significantly higher or lower than the hypothesised value of the mean
of the population. Such a test is appropriate when the null hypothesis is some specified value
and the alternative hypothesis is a value not equal to the specified value of the null hypothesis.
Symbolically, the two-tailed test is appropriate when we have $H_0: \mu = \mu_{H_0}$ and
$H_a: \mu \neq \mu_{H_0}$, which may mean $\mu > \mu_{H_0}$ or $\mu < \mu_{H_0}$.
Thus, in a two-tailed test, there are two rejection regions, one on each tail of the curve which
can be illustrated as under:

Mathematically we can state:
Acceptance Region A : $|Z| < 1.96$
Rejection Region R : $|Z| > 1.96$
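
The acceptance and rejection regions stated above can be applied as in the following sketch of a two-tailed Z test. It assumes the population standard deviation is known. The function name `two_tailed_z_test` and the sample figures are made up for the illustration.

```python
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Return (z, critical value, reject H0?) for H0: mu = mu_0 against Ha: mu != mu_0."""
    z = (sample_mean - mu_0) / (sigma / n ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return z, critical, abs(z) > critical


# Illustrative figures: H0 says mu = 100; a sample of 36 items has mean 104, sigma = 12.
z, critical, reject = two_tailed_z_test(104, 100, 12, 36)
print(round(z, 2), round(critical, 2), reject)
```

H0 is rejected when the computed Z lies in either tail, i.e., when its absolute value exceeds the critical value.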