# statistical analysis

```					STATISTICAL ANALYSIS NOTES FOR YEAR 2008-2009

CORRELATION ANALYSIS

Correlation analysis deals with the association between two or more variables. If
two or more quantities vary in sympathy, so that movements in one tend to be
accompanied by corresponding movements in the other, they are said to be
correlated.
Thus correlation is a statistical device which helps us in analyzing the covariation
of two or more variables. The problem of analyzing the relation between different series
should be broken into three steps:
1. Determining whether a relation exists and, if it does, measuring it.
2. Testing whether it is significant.
3. Establishing the cause and effect relation.

Significance of correlation
•   Most variables show some kind of relationship; for example, there is a
relationship between price and supply and between income and expenditure. With
the help of correlation we can find the degree of relationship between the
variables.
•   When we know the degree of relationship we can estimate the value of one
variable with the help of another variable.
•   Correlation analysis contributes to the understanding of economic
behavior, aids in locating the critically important variables on which others
depend, and may suggest to the economist the paths through
which stabilizing forces may become effective.
•   The effect of correlation is to reduce the range of uncertainty. A
prediction based on correlation analysis is likely to be more reliable and
nearer to reality.

TYPES OF CORRELATION

Positive or negative correlation- This depends upon the direction of change of the
two series. If both variables move in the same direction, so that an increase in one is
accompanied by an increase in the other and a decrease in one by a decrease in the
other, the correlation is positive. If they move in opposite directions, so that one
series increases while the other decreases, the correlation is negative.

Visit us at www.sgiithisar.com Contact at 94163-59920

Simple, partial and multiple correlation- This depends upon the number of
variables studied. When only two variables are studied it is simple correlation.
When three or more variables are studied it is either multiple or partial correlation. In
multiple correlation three or more variables are studied simultaneously. In partial
correlation we recognize three or more variables but consider the correlation between
only two of them, keeping the effect of the others constant.

Linear and non-linear correlation- This is based upon the constancy of the ratio of
change between the variables. If the amount of change in one variable tends to bear a
constant ratio to the amount of change in the other variable, the correlation is linear;
otherwise it is non-linear.

METHODS OF CORRELATION

Scatter diagram method
Graphic method
Karl Pearson’s coefficient of correlation
Rank correlation method
Concurrent deviation method

Scatter diagram method- The simplest device for ascertaining whether two variables
are related is to prepare a dot chart called a scatter diagram. The given data are
plotted on graph paper in the form of dots, one dot for each pair of X and Y values.
Merits and demerits of the method
Merits- It is the simplest way of studying correlation.
It is not influenced by the size of extreme items, whereas most mathematical
methods are influenced by extreme figures.
Demerits- We can get an idea of the correlation but we cannot find out its exact
degree.

Graphic method- The individual values of the two variables are plotted on
graph paper, giving two curves, one for the X variable and another for the Y variable.
By examining the direction and closeness of the two curves so drawn we can infer
whether or not the variables are related.

Karl Pearson’s coefficient of correlation- This is the most widely used method of
calculating correlation. The coefficient is denoted by r.

$$ r = \frac{\sum xy}{N \, \sigma_x \, \sigma_y} $$

x = (X - mean of X), y = (Y - mean of Y)
σx = standard deviation of X
σy = standard deviation of Y
N = number of pairs of observations

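Direct method of calculating correlation

$$ r = \frac{N \sum XY - (\sum X)(\sum Y)}{\sqrt{N \sum X^2 - (\sum X)^2} \; \sqrt{N \sum Y^2 - (\sum Y)^2}} $$

When deviations are taken from an assumed mean

$$ r = \frac{N \sum d_x d_y - (\sum d_x)(\sum d_y)}{\sqrt{N \sum d_x^2 - (\sum d_x)^2} \; \sqrt{N \sum d_y^2 - (\sum d_y)^2}} $$

Here d_x and d_y stand for the deviations dx and dy described in the steps.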

Steps
• Take the deviations of the X series from an assumed mean, denote these
deviations by dx and obtain the total ∑dx.
• Take the deviations of the Y series from an assumed mean, denote these
deviations by dy and obtain the total ∑dy.
• Square dx and obtain the total ∑dx².
• Square dy and obtain the total ∑dy².
• Multiply dx and dy and obtain the total ∑dx.dy.
• Substitute the values of ∑dx.dy, ∑dx, ∑dy, ∑dx² and ∑dy² in the formula.

(Assumed means: dx = X - 69, dy = Y - 112)

X           Y          dx         dy          dx²     dy²     dx.dy
78          125        9          13          81      169     117
89          137        20         25          400     625     500
99          156        30         44          900     1936    1320
60          112        -9         0           81      0       0
59          107        -10        -5          100     25      50
79          136        10         24          100     576     240
68          123        -1         11          1       121     -11
61          108        -8         -4          64      16      32
∑X=593      ∑Y=1004    ∑dx=41     ∑dy=108     1727    3468    ∑dx.dy=2248

Substituting the totals in the formula for deviations taken from an assumed mean (N = 8):

$$ r = \frac{(8)(2248) - (41)(108)}{\sqrt{(8)(1727) - (41)^2} \; \sqrt{(8)(3468) - (108)^2}} = \frac{13556}{\sqrt{12135} \; \sqrt{16080}} = 0.97 $$

For a frequency distribution the formula is

$$ r = \frac{N \sum f d_x d_y - (\sum f d_x)(\sum f d_y)}{\sqrt{N \sum f d_x^2 - (\sum f d_x)^2} \; \sqrt{N \sum f d_y^2 - (\sum f d_y)^2}} $$

A limitation of this method is that great care must be exercised in calculating and interpreting the coefficient.

Conditions
• When r is +1 there is a perfect positive relationship between the variables.
• When r is -1 there is a perfect negative relationship between the variables.
• When r is 0 there is no linear relationship between the variables.

Rank correlation method

This method was developed by the British psychologist Charles Edward Spearman in
1904. The values of each variable are ranked, in either ascending or descending order,
and the correlation is computed from the ranks.

$$ R = 1 - \frac{6 \sum D^2}{N (N^2 - 1)} $$

R denotes rank coefficient of correlation and D refers to the difference of rank between
paired items in two series.

X               Rx            Y           Ry         D square
97.8            3             73.2        1          4
99.2            7             85.8        6          1
98.8            6             78.9        4          4
98.3            4             75.8        2          4
98.4            5             77.2        3          4
96.7            1             87.2        7          36
97.1            2             83.8        5          9
Here ∑D² = 62 and N = 7, so

$$ R = 1 - \frac{6 \sum D^2}{N (N^2 - 1)} = 1 - \frac{6 \times 62}{7 (7^2 - 1)} = 1 - \frac{372}{336} = 1 - 1.107 = -0.107 $$

Equal ranks- When two or more items have equal values it is difficult to give them
distinct ranks, so each is given the average of the ranks they would otherwise occupy.
If two individuals are ranked equal at fifth place they are each given the rank
(5+6)/2 = 5.5, and if three are ranked equal at fifth place they are each given the rank
(5+6+7)/3 = 6. The formula then becomes

$$ R = 1 - \frac{6 \left[ \sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots \right]}{N (N^2 - 1)} $$

where m is the number of items sharing a common rank; one term (1/12)(m³ - m) is
added for each group of tied items.

Concurrent deviation method

This is the simplest of all the methods. The only thing required under
this method is to find out the direction of change of the X variable and the Y variable. The
formula is

$$ r_c = \pm \sqrt{\pm \frac{2c - n}{n}} $$

c = number of concurrent deviations, i.e. the number of positive signs in the dx.dy column.
n = number of pairs of observations compared (one less than the number of observations).
The sign of (2c - n) is used both inside and outside the square root.

Steps
•   Find out the direction of change of the X variable by comparing each
value with the preceding value, i.e. whether it has increased, decreased
or remained constant. If it has increased put a plus sign, if it has
decreased a minus sign, and if it is constant a zero. Denote this
column by dx.
•   In the same manner find out the direction of change of the Y variable
and denote the column by dy.
•   Multiply dx by dy and determine the value of c, the number of
positive signs. Then apply the above formula.

X            dx              Y         dy           dxdy
60                           65
55            -              40        -            +
50            -              35        -            +
56           +               75        +            +
30            -              63        -            +
70            +              80        +            +
40            -              35        -            +
35            -              20        -            +
80            +              80        +            +
80            0              60        -            0
75            -              60        0            0
C=8

Concurrent deviation method- coefficient of correlation

Here c = 8 and n = 10, so

$$ r_c = \pm \sqrt{\pm \frac{2 \times 8 - 10}{10}} = +\sqrt{0.6} = +0.775 $$

Regression analysis

The coefficient of correlation measures the closeness of the relationship between the
variables, while regression analysis helps us to estimate one variable when we know the other.
The variable whose value is given is called the independent variable and the variable whose
value is to be estimated is called the dependent variable.

Definitions- “Regression is the measure of the average relationship between two or more
variables in terms of the original units of the data.”
“One of the most frequently used techniques in economics and business research, to find
a relation between two or more variables that are related causally, is regression analysis.”

Difference between correlation and regression-
• Whereas the correlation coefficient is a measure of the degree of covariability between
x and y, the objective of regression is to study the nature of the relationship between the
variables.
• Correlation is merely a tool for ascertaining the degree of relationship between two
variables; from it we cannot say that one variable is the cause and the other the effect.
• There may be nonsense correlation between two variables which is purely due to
chance and has no practical relevance; there is nothing like nonsense regression.
• The correlation coefficient is independent of change of both scale and origin; regression
coefficients are independent of change of origin but not of scale.

Regression lines-

There are two regression lines: one of x on y and another of y on x. The line of x on y
treats x as the variable dependent on y, and vice versa.

Least square method

Regression equation of y on x

Y = a + bX

To determine the values of a and b, the following normal equations are solved:

$$ \sum Y = Na + b \sum X $$
$$ \sum XY = a \sum X + b \sum X^2 $$

Regression equation of x on y

X = a + bY

To determine the values of a and b, the following normal equations are solved:

$$ \sum X = Na + b \sum Y $$
$$ \sum XY = a \sum Y + b \sum Y^2 $$

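Deviations taken from arithmetic means of X and Y

Regression equation of X on Y:

$$ X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y}) $$

r σx/σy = regression coefficient of x on y

Deviations taken from assumed means

X on Y:

$$ X - \bar{X} = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y}), \qquad r \frac{\sigma_x}{\sigma_y} = \frac{N \sum d_x d_y - (\sum d_x)(\sum d_y)}{N \sum d_y^2 - (\sum d_y)^2} $$

where d_x = X - A_x and d_y = Y - A_y, with A_x and A_y the assumed means of X and Y.

Y on X:

$$ Y - \bar{Y} = r \frac{\sigma_y}{\sigma_x} (X - \bar{X}), \qquad r \frac{\sigma_y}{\sigma_x} = \frac{N \sum d_x d_y - (\sum d_x)(\sum d_y)}{N \sum d_x^2 - (\sum d_x)^2} $$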
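
As a worked illustration, the totals from the Karl Pearson example earlier in these notes (N = 8, ∑dx = 41, ∑dy = 108, ∑dx² = 1727, ∑dy² = 3468, ∑dx.dy = 2248) can be substituted into the assumed-mean formulas. This is only a sketch reusing that table; the symbols b_xy and b_yx for the two regression coefficients are shorthand introduced here and are not notation from these notes.

$$ b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{(8)(2248) - (41)(108)}{(8)(3468) - (108)^2} = \frac{13556}{16080} \approx 0.843 $$

$$ b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{(8)(2248) - (41)(108)}{(8)(1727) - (41)^2} = \frac{13556}{12135} \approx 1.117 $$

With mean of X = 593/8 = 74.125 and mean of Y = 1004/8 = 125.5, the two lines would be X - 74.125 = 0.843 (Y - 125.5) and Y - 125.5 = 1.117 (X - 74.125). As a check, the square root of 0.843 × 1.117 is about 0.97, which agrees with the value of r found for the same data.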

In the case of a frequency distribution

$$ r \frac{\sigma_x}{\sigma_y} = \frac{N \sum f d_x d_y - (\sum f d_x)(\sum f d_y)}{N \sum f d_y^2 - (\sum f d_y)^2} \times \frac{i_x}{i_y} $$

$$ r \frac{\sigma_y}{\sigma_x} = \frac{N \sum f d_x d_y - (\sum f d_x)(\sum f d_y)}{N \sum f d_x^2 - (\sum f d_x)^2} \times \frac{i_y}{i_x} $$

where i_x and i_y are the class intervals of X and Y.

Limitations of regression analysis-
In making estimate from a regression it is important to remember that the assumption is
being made that relationship has not changed since the regression equation was
computed.

Time Series Analysis

“A time series is a set of statistical observations arranged in chronological order.”
“A time series consists of statistical data which are collected recorded and observed over
successive increments of time.”

It is clear from the definitions that data arranged according to time form a
time series.

UTILITY OF THE TIME SERIES ANALYSIS

•   It helps in understanding past behavior- By observing data over a period of
time one can easily understand what changes have taken place in the past.
This analysis is extremely helpful in predicting future behavior.
•   It helps in planning future operations- Plans for the future cannot be made
without forecasting events and the relationships they will have. Statistical techniques
like time series analysis help in making decisions for the future.
•   It helps in evaluating current accomplishments- The actual performance can
be compared with the expected performance and the causes of variation analyzed.
•   It facilitates comparison- Different time series can be compared and important
conclusions drawn from them.

COMPONENTS OF TIME SERIES
• Secular trend
• Seasonal variations
• Cyclical variations
• Irregular variations

Y = T × S × C × I
Y denotes the result of the four components; T stands for trend, S for seasonal, C for
cyclical and I for irregular variations.
Another approach is to treat each observation of a time series as the sum of these four
components.

Y = T + S + C + I

SECULAR TREND-

The term trend is very commonly used in day to day parlance. For example, we
talk of a rising trend of population, prices etc. Such movements are called secular trend
or long term trend. The concept of secular trend applies only to data covering a long period.

SEASONAL VARIATIONS

Seasonal variations are those periodic movements in business activity which
occur regularly every year and have their origin in the nature of the year itself. The factors
that cause seasonal variations are-
1. Climate and weather conditions- The most important factor causing seasonal
variations is the climate. Changes in climate and weather
conditions such as rainfall, humidity and heat act on different products and industries
differently.
2. Customs, traditions and habits- Though nature is primarily responsible for
seasonal variations in time series, customs, traditions and habits also have
their impact. For example, on certain occasions like Deepawali, Dussehra and
Christmas there is a big demand for sweets, and also a large demand for cash
before the festivals because people need money for shopping and gifts.

CYCLICAL VARIATIONS

The term cycle refers to the recurrent variations in a time series that usually last
longer than a year and are regular neither in amplitude nor in length. Cyclical
fluctuations are long term movements that represent consistently recurring rises and
declines in activity. A business cycle consists of the recurrence of the up and down
movements of business activity from some sort of statistical trend.

IRREGULAR VARIATIONS-

Irregular variations, also called erratic, accidental or random variations, are those
variations in business activity which do not repeat in a definite pattern. There are two
reasons for recognizing irregular movements.
1. To suggest that on occasions it may be possible to explain certain movements
in the data by specific causes, and to simplify further analysis.
2. To emphasize the fact that predictions of economic conditions are always
subject to a degree of error owing to the unpredictable erratic influences which
may enter.

MEASUREMENT OF TREND

1.   Free hand method
2.   Semi- average method
3.   Moving average
4.   Method of least square

Free hand method
Plot the time series on graph paper, then examine carefully the direction
of the trend based on the plotted information. Then draw a straight line
which best fits the data according to personal judgment; this line
shows the direction of the trend.

Semi-average method
The given data are divided into two parts, preferably with the same number of
years.
Illustration-

Year            Sale of firm A
1997            102
1998            105
1999            114
2000            110
2001            108
2002            116
2003            112

Since seven years are given, the middle year (2000) is left out and the averages of the first
three years and the last three years are computed. The average of the first three years is
(102+105+114)/3 = 107 and the average of the last three years is (108+116+112)/3 = 112. Thus
we get two points, and by joining them we obtain the required trend line. It can be used for
prediction or for determining intermediate values.
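
The two semi-averages can be turned into a numerical trend line. The sketch below assumes, as is usual for this method, that each average is placed at the middle year of its half (1998 for the first half and 2002 for the second); that placement is an assumption added here and is not stated in the illustration.

$$ \text{annual change} = \frac{112 - 107}{2002 - 1998} = 1.25 $$

$$ \text{trend value for } 2000 = 107 + 2(1.25) = 109.5, \qquad \text{trend value for } 2004 = 112 + 2(1.25) = 114.5 $$

The first figure is an intermediate value on the trend line and the second is a prediction one year beyond the data.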

Method of moving average
In this method a period of, say, three, five or eight years is selected and successive
averages over that period are computed. The three-year moving averages are computed as
(a+b+c)/3, (b+c+d)/3, (c+d+e)/3, (d+e+f)/3 and so on, and the five-year moving averages as
(a+b+c+d+e)/5, (b+c+d+e+f)/5, (c+d+e+f+g)/5 and so on. Each average is placed against the
middle year of its period.

Three years moving average

year           production                  3 year total       moving average
1989           15                          -                  -
1990           21                          66                 22
1991           30                          87                 29
1992           36                          108                36
1993           42                          124                41.33
1994           46                          138                46
1995           50                          152                50.67
1996           56                          169                56.33
1997           63                          189                63
1998           70                          207                69
1999           74                          226                75.33
2000           82                          246                82
2001           90                          267                89
2002           95                          287                95.67
2003           102                         -                  -

Five years moving average

year           no of students              5 year total       moving average
1994           332                         -                  -
1995           317                         -                  -
1996           357                         1800               360
1997           392                         1873               374.6
1998           402                         1966               393.2
1999           405                         2036               407.2
2000           410                         2049               409.8
2001           427                         2085               417
2002           405                         -                  -
2003           438                         -                  -

Method of least square-

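Equation of y on x

Y = a + bX

To determine the values of a and b, the following normal equations are solved:

$$ \sum Y = Na + b \sum X $$
$$ \sum XY = a \sum X + b \sum X^2 $$

Equation of x on y

X = a + bY

To determine the values of a and b, the following normal equations are solved:

$$ \sum X = Na + b \sum Y $$
$$ \sum XY = a \sum Y + b \sum Y^2 $$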
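
As an illustration, a straight line trend Y = a + bX can be fitted to the sales of firm A used in the semi-average illustration (102, 105, 114, 110, 108, 116, 112 for 1997 to 2003). The coding X = year - 2000, which makes ∑X = 0, is a convenience assumed here and is not part of the notes.

$$ N = 7, \quad \sum X = 0, \quad \sum Y = 767, \quad \sum XY = 46, \quad \sum X^2 = 28 $$

$$ a = \frac{\sum Y}{N} = \frac{767}{7} \approx 109.57, \qquad b = \frac{\sum XY}{\sum X^2} = \frac{46}{28} \approx 1.64 $$

The fitted trend would then be Y = 109.57 + 1.64X, giving an estimate for 2004 (X = 4) of about 116.1.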

Mathematics of management

Business mathematics consists of a set of mathematical and statistical tools that
can be used for the fulfillment of one or more objectives of a business, like the
maximization of output or sales, maximization of profits, minimization of cost etc. These
tools are often known as quantitative techniques.

Main features of quantitative techniques

1. Every management problem can be represented by one or more equations with the
help of certain symbols. These symbols denote relevant variables and constants of
the problems.
2. The solution of the model is obtained by the application of one or more
techniques from the set of quantitative techniques.
3. The quantitative techniques take care of the policies and capacities of different
departments and hence avoid the occurrence of any contradiction between them.
4. This is an interdisciplinary approach to problem solving.
5. These techniques attempt to analyze the business problem in its actual working
environment, which often differs from the ideal conditions assumed in
mathematics, economics and other disciplines.

IMPORTANCE OF QUANTITATIVE TECHNIQUES.

1. Basis for scientific analysis- With the increase in complexities of modern
business it is not possible to rely on the unscientific decisions based on the
intuitions. This provides the scientific methods for tackling various
2. Tools for scientific analysis- Quantitative techniques provide
managers with a variety of tools from mathematics, statistics, economics
and operational research. These tools help the manager to provide a more
precise description and solution of the problem. The solutions obtained by
using quantitative techniques are often free from the bias of the manager
or the owner of the business.
3. Solution for various business problems- Quantitative techniques provide
solutions in almost every area of a business. They can be used in
production, marketing, inventory, finance and other areas to find answers
to various questions like (a) How should the resources be used in
production so that profits are maximized? (b) How should production
be matched to demand so as to minimize the cost of inventory?
4. Optimum allocation of resources- An allocation of resources is said to
be optimal if either a given level of output is being produced at minimum
cost or maximum output is being produced at a given cost. A quantitative
technique enables a manager to optimally allocate the resources of a
5. Selection of an optimal strategy- Using quantitative techniques it is
possible to determine the optimal strategy of a business or firm that is
facing competition from its rivals. The techniques for determining the
optimal strategy are based upon game theory.
6. Optimal deployment of resources- Using quantitative techniques it is
possible to find out the earliest and latest times for successful completion of
a project; the technique used is called program evaluation and review technique (PERT).
7. Facilitate the process of decision making- Quantitative techniques
provide a method of decision making in the face of uncertainty. These
techniques are based upon decision theory.

SCOPE OF QUANTITATIVE TECHNIQUES

Production management- Quantitative techniques are useful to production
management in (a) selecting the location site for a plant, scheduling and controlling
its development and designing the plant layout, (b) locating within the plant and
controlling the movements of required production material and finished goods
inventories and (c) scheduling and sequencing production by adequate preventive
maintenance with optimum product mix.

Personnel management- Quantitative techniques are useful to personnel management
to find out (a) optimum manpower planning, (b) the number of employees to be
maintained on the permanent or full time roll, (c) the number of persons to be kept in
a work pool intended for meeting absenteeism and (d) in studying personnel
recruiting procedures, accident rates and labor turnover.

Marketing management- Quantitative techniques equally help marketing
management to determine (a) warehouse distribution points, i.e. where warehouses
should be located, their size, the quantity to be stocked and the choice of customers, (b)
the optimum allocation of the sales budget to direct selling and promotional expenses,
(c) the choice of different media of advertising and bidding strategies and (d) the
customer preferences relating to size, color, packaging etc. for various products, as well
as how to outbid and outwit competitors.

Financial management- Quantitative techniques are also very useful to financial
management in (a) finding long range capital requirements as well as how to generate
these requirements, (b) determining optimum replacement policies, (c) working out a
profit plan for the firm, (d) developing a capital investment plan and (e) estimating credit
and investment risk.

ARITHMETIC PROGRESSION

A series in which each successive term is obtained by adding a constant quantity (known
as the common difference) to its preceding term is called an Arithmetic Progression.

An A.P. with first term equal to a and common difference equal to d is
written as
a, a+d, a+2d, a+3d, ..., a+(n-1)d
where n denotes the number of terms and l = a+(n-1)d is the last term of the A.P. with n terms.
Sum of n terms of an A.P. =
n/2 [2a + (n-1)d]
On substituting l = a+(n-1)d the formula can be written as

Sum = n/2 {a+l}

Example- Find the sum of the first 15 terms of the following series- 10, 15, 20, 25, ...
Solution- Here a = 10, d = 5, n = 15
Sum = 15/2 {2*10 + (15-1)*5} = 15/2 {90}

= 675 ans

Example- The fourth term of an A.P. is 14 and the eighth term is 26. Find the sum of the first
ten terms.
Solution-
Let a be the first term and d be the common difference.
Then it is given that a+3d = 14 and a+7d = 26. Eliminating a from these
equations, 4d = 12, so d = 3.
a + 3*3 = 14, so a = 5.
And sum = 10/2 {2*5 + 9*3} = 185
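
Both answers can be cross-checked with the second form of the formula, Sum = n/2 {a+l}. The check below is an added verification and is not part of the original solutions.

$$ \text{First example: } l = 10 + (15 - 1)(5) = 80, \qquad \text{Sum} = \frac{15}{2}(10 + 80) = 675 $$

$$ \text{Second example: } l = 5 + (10 - 1)(3) = 32, \qquad \text{Sum} = \frac{10}{2}(5 + 32) = 185 $$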

GEOMETRIC PROGRESSION

A series in which each successive term is obtained by multiplying the preceding term by
a constant quantity (known as the common ratio) is called a geometric progression.
A G.P. with first term equal to a and common ratio equal to R is written
as

$$ a, \; aR, \; aR^2, \; aR^3, \; \ldots, \; aR^{n-1} $$

Sum of n terms of G.P

The sum of n terms of a G.P. with first term equal to a and common ratio equal to R
(R not equal to 1) is written as

$$ \text{Sum} = \frac{a (1 - R^n)}{1 - R} $$

Sum of an infinite G.P

When |R| < 1 and n becomes infinite, the formula for the sum of the G.P. is given by
Sum = a / (1-R)

Example- The first term of a G.P. is 8 and the common ratio is 3. Find the sum of the first 10
terms.
Solution- a = 8, R = 3 and n = 10

$$ \text{Sum} = \frac{8 (1 - 3^{10})}{1 - 3} $$
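
The notes stop at the substituted expression. Carrying the arithmetic through, as an added step, gives:

$$ 3^{10} = 59049, \qquad \text{Sum} = \frac{8 (1 - 59049)}{1 - 3} = \frac{8 \times (-59048)}{-2} = 8 \times 29524 = 236192 $$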

QUANTITATIVE METHODS

CONTENTS:-

1. BASIC MATHEMATICS
2. ARITHMETIC PROGRESSION
3. GEOMETRIC PROGRESSION
4. MEASUREMENT OF CENTRAL TENDENCY
5. MEASUREMENT OF DISPERSION
6. SKEWNESS, MOMENTS, KURTOSIS
7. CORRELATION ANALYSIS
8. REGRESSION ANALYSIS
9. ANALYSIS OF TIME SERIES
10. PROBABILITY AND EXPECTED VALUE
11. THEORETICAL DISTRIBUTION

PROBABILITY AND EXPECTED VALUE

In day to day conversation we commonly use terms such as chance, likely and probably,
and people generally have only a vague idea of their meaning. For example, we come across
statements like “probably it may rain tomorrow” or “it is likely that Mr. A may not come to
take the class today”. The theory of probability makes such vague ideas precise.
Definition of probability- The probability of a given event is an expression of the likelihood
or chance of occurrence of the event. A probability is a number which ranges from 0 to 1:
zero for an event which cannot occur and 1 for an event certain to occur.
Calculation of probability-
1. Experiments and events- The term experiment refers to an act which
can be repeated under some given conditions. Random experiments are those
experiments whose results depend on chance, such as tossing a coin or throwing a
dice. The results of a random experiment are called outcomes.
2. Mutually exclusive events- Two events are said to be mutually exclusive or
incompatible when both cannot happen simultaneously in a single trial, i.e. the
occurrence of any one of them precludes the occurrence of the other. For example,
if a single coin is tossed either head can be up or tail can be up; both cannot be up
at the same time. Such events are called mutually exclusive events. If both events
can happen together, they are called not mutually exclusive events.
3. Independent and dependent events- Two or more events are said to be
independent when the outcome of one does not affect, and is not affected by, the other.
For example, if a coin is tossed twice the result of the second throw is in no
way affected by the result of the first throw. Similarly the results obtained by
throwing a dice are independent of the results obtained by drawing an ace from a
pack of cards.
4. Equally likely events- Events are said to be equally likely when one does not
occur more often than the others. For example, if an unbiased coin or dice is
thrown each face may be expected to be observed approximately the same number
of times in the long run. Similarly the cards of a pack of playing cards are so
closely alike that we expect each card to appear equally often when a large
number of drawings are made with replacement. However, if the coin or dice is
biased we should not expect each face to appear the same number of
times.
5. Simple and compound events- In the case of simple events we consider the
probability of the happening or not happening of a single event. For example, we
might be interested in finding out the probability of drawing a red ball from a bag
containing 10 white and 6 red balls. On the other hand, in the case of compound
events we consider the joint occurrence of two or more events.
THEOREMS OF PROBABILITY-
The addition theorem
The multiplication theorem

Addition theorem- This theorem states that if two events A and B are mutually
exclusive, the probability of the occurrence of either A or B is the sum of the individual
probabilities of A and B.
P (A or B) = P (A) + P (B)

Example- One card is drawn from a standard pack of 52. What is the probability that it is
either a king or a queen?
Solution- There are 4 kings and 4 queens in a pack of 52 cards.
The probability that the card drawn is a king = 4/52
And the probability that the card drawn is a queen = 4/52
Since the events are mutually exclusive, the probability that the card drawn is
either a king or a queen is
4/52 + 4/52 = 8/52 = 2/13 answer

When events are not mutually exclusive

P (A or B) = P (A) + P (B) - P (A and B)
For the same pack of cards, the probability of drawing a king or a heart is
P (Heart or king) = P (Heart) + P (King) - P (Heart and king)
= 13/52 + 4/52 - 1/52 = 16/52 = 4/13 answer
Multiplication theorem- This theorem states that if two events A and B are independent,
the probability that they both will occur is equal to the product of their individual
probabilities. If A and B are independent then
P (A and B) = P (A) * P (B)
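
A short illustration of the theorem, using the coin tossed twice that these notes mention as an example of independent events (the coin is assumed here to be unbiased):

$$ P(\text{head on both tosses}) = P(\text{head on first}) \times P(\text{head on second}) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4} $$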

Conditional probability

Two events A and B are said to be dependent when B can occur only when A is known
to have occurred. The probability attached to such an event is called the conditional
probability.
Example- Find the probability of drawing a queen, a king and a knave in that order from
a pack of cards in three consecutive draws, the cards drawn not being replaced.
Solution- The probability of drawing a queen = 4/52
The probability of drawing a king after a queen has been drawn = 4/51
The probability of drawing a knave after a queen and a king have been drawn = 4/50
Since they are dependent events the required probability of the compound event is
4/52 * 4/51 * 4/50 = 64/132600 = 0.00048

THEORETICAL DISTRIBUTION

Probability distributions are used in discrete and continuous series.
The distributions are
1. Binomial distribution
2. Poisson distribution
3. Normal distribution

BINOMIAL DISTRIBUTION- The binomial distribution, also known as the Bernoulli distribution,
is associated with the name of the Swiss mathematician James Bernoulli. The binomial
distribution is a probability distribution expressing the probability of one set of
dichotomous alternatives, i.e. success or failure. The assumptions are
• An experiment is performed under the same conditions for a fixed number of
trials, say n.
• In each trial there are two possible outcomes of the experiment. For lack of a
better nomenclature they are called success and failure.
• The probability of a success, denoted by p, remains constant from trial to trial. The
probability of a failure, denoted by q, is equal to (1-p). If the probability of success
is not the same in each trial we will not have a binomial distribution.
• The trials are statistically independent.
The binomial distribution-
P(r) = nCr * q^(n-r) * p^r

p = probability of success in a single trial
q = (1-p)
n = number of trials
r = number of successes
nCr = n! / (r! * (n-r)!), the number of ways of choosing r successes out of n trials
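
Illustrative example (the numbers are assumed and are not part of the original notes)-
A fair coin is tossed 4 times. What is the probability of getting exactly 2 heads?
Solution- Here n = 4, r = 2, p = 1/2 and q = 1/2.
P(2) = 4C2 * (1/2)^2 * (1/2)^2 = 6 * 1/16 = 6/16 = 0.375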

POISSON DISTRIBUTION-
The Poisson distribution is a discrete distribution used to describe the
behavior of rare events. It is expected in cases where the chance of any individual
event being a success is very small, such as the number of accidents on a road or the
number of printing mistakes on a page.
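The Poisson distribution-
P(r) = (e^(-m) * m^r) / r!

m = mean of the distribution (average number of occurrences)
r = number of occurrences (0, 1, 2, ...)
e = 2.7183 (base of natural logarithms)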
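
Illustrative example (the numbers are assumed and are not part of the original notes)-
Suppose a book has on average 2 printing mistakes per page, so m = 2.
Probability that a page has no mistake: P(0) = e^(-2) * 2^0 / 0! = e^(-2) = 0.1353 (approx.)
Probability that a page has exactly one mistake: P(1) = e^(-2) * 2^1 / 1! = 0.2707 (approx.)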

Normal distribution- The normal distribution is used in continuous series. In this
distribution the value of Z is found as

Z = (X – mean) / S.D.
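
Illustrative example (the numbers are assumed and are not part of the original notes)-
If mean = 50, S.D. = 10 and X = 65 then Z = (65 – 50) / 10 = 1.5, i.e. the value lies
1.5 standard deviations above the mean.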

Properties of normal distribution-
1. Normal distribution is bell shaped.
2. It is perfectly symmetrical about mean.
3. It is unimodal, i.e. it has only one mode.
4. Mean median and mode are equal in normal distribution.

MEASUREMENT OF DISPERSION

Dispersion;-

Dispersion is the measure of the variation of the items.

The concept of dispersion is related to the extent of variability in observations. The
variability in an observation is often measured as its deviation from a central value. A
suitable average of all such deviations is called a measure of dispersion. A measure of
dispersion enables a comparison to be made of two or more series with regard to their
variability.

Significance of Dispersion:-

1. To determine the reliability of an average;- Dispersion shows how reliable
an average is. A low variation or dispersion shows more reliability and
consistency in the data.
2. To serve as a basis for the control of variability;- Measuring dispersion shows
how much variation is present, and if the variations are high then controlling
tools can be used.
3. Useful in quality control;- Measures of variation show whether the products
are up to grade or not. If the variations are high then the technique of
production can be rectified.
4. To facilitate the use of other statistical tools;- Many powerful analytical tools in
statistics such as correlation analysis, the testing of hypothesis, analysis of
variance and regression analysis are based on the measurement of variation.

Properties of Good Measure of Variation

1.   It should be simple to understand.
2.   It should be easy to compute.
3.   It should be rigidly defined.
4.   It should be based on each and every item of the distribution.
5.   It should have sampling stability.
6.   It should not be affected by the extreme items.

Methods of measuring dispersion;-

1.   Range
2.   Quartile deviation
3.   Mean deviation
4.   Standard deviation

Range;-

Range is the simplest method of studying dispersion. It is the difference between the
value of the largest item and the value of the smallest item included in the distribution.

Range = L - S
L = Largest value
S = Smallest value

Coefficient of range = (L - S) / (L + S)
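
Illustrative example (the values are assumed and are not part of the original notes)-
For the items 2, 4, 6, 8, 10, 12, 14 we have L = 14 and S = 2.
Range = 14 - 2 = 12
Coefficient of range = (14 - 2) / (14 + 2) = 12/16 = 0.75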

Quartile deviation

It is half the difference between the third quartile and the first quartile.

Q.D. = (Q3 – Q1) / 2

Coefficient of Quartile deviation;-

= (Q3 – Q1) / (Q3 + Q1)

Q3 = size of the (3N/4)th item

Q3 = L + ((3N/4 - c.f.) / F) * H

Q1 = size of the (N/4)th item

Q1 = L + ((N/4 - c.f.) / F) * H

L = lower limit of the quartile class
C.F. = cumulative frequency of the class preceding the quartile class
F = frequency of the quartile class
H = class interval of the quartile class

Mean Deviation

M.D. = ∑ │D│ / N

∑ │D│ = sum of the deviations from the actual mean, ignoring signs
N = number of items

Mean deviation in discrete series.

M.D. = ∑ F│D│ / N

F = frequencies
N = ∑ F

Coefficient of mean deviation;- M.D. / Mean (M.D. / Median when the deviations are taken from the median)

Standard Deviation;-

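Standard deviation measures absolute dispersion or variability of the series.

S.D. = sqrt( ∑d^2 / N – (∑d / N)^2 ) * i

Standard deviation in continuous series;-

S.D. = sqrt( ∑Fd^2 / N – (∑Fd / N)^2 ) * i

d = deviation of an item (or class mid-point) from the assumed mean, divided by i
i = common factor (class interval)
N = number of items (∑F in a frequency distribution)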

Standard deviation is denoted by σ (sigma).

Variance = (S.D.)^2

Coefficient of standard deviation = S.D. / Mean
Coefficient of variation = (S.D. / Mean) * 100
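
Illustrative example (the values are assumed and are not part of the original notes)-
For the items 2, 4, 6, 8, 10, 12, 14 the mean is 56/7 = 8.
The deviations from the mean are -6, -4, -2, 0, 2, 4, 6 and their squares add up to 112.
Variance = 112/7 = 16
S.D. = sqrt(16) = 4
S.D. as a percentage of the mean = 4/8 * 100 = 50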

Skewness, moments and kurtosis

Skewness;-
When a series is not symmetrical it is said to be skewed.

Absolute measure of skewness;- Mean – Mode

Relative measure of skewness;-

1. Karl Pearson’s coefficient of skewness;-

SKp = (Mean – Mode) / Std. deviation

2. Bowley’s coefficient of skewness;-

SKb = (Q3 + Q1 – 2 Median) / (Q3 – Q1)

3. Kelly’s coefficient of skewness;-

SKk = (P10 + P90 – 2 Median) / (P90 – P10)

    = (D1 + D9 – 2 Median) / (D9 – D1)

P10 = size of the (10N/100)th item

P10 = L + ((10N/100 - c.f.) / F) * H

P90 = size of the (90N/100)th item

D1 = size of the (N/10)th item
D9 = size of the (9N/10)th item

All of these are computed in the same way as the median.

Moments

Moments are averages of the powers of the deviations of the items from the mean. The
first moment is the sum of the deviations of the items of a series from the mean of
the series, divided by the total number of items in the distribution. In other words,
it is the average deviation of the items from the mean, which is always zero. The
arithmetic means of the various powers of the deviations of the items in a distribution
are called the moments of the distribution.

Moments are denoted by μ (mu)

μ1 = ∑ (X – mean) / n
μ2 = ∑ (X – mean)^2 / n
μ3 = ∑ (X – mean)^3 / n
μ4 = ∑ (X – mean)^4 / n

For frequency distribution;-

μ1 = ∑ F (X – mean) / n
μ2 = ∑ F (X – mean)^2 / n
μ3 = ∑ F (X – mean)^3 / n
μ4 = ∑ F (X – mean)^4 / n

where n = ∑ F

The first moment about the origin is the mean.
The second moment about the mean is the variance.
The third moment about the mean measures skewness.
The fourth moment about the mean measures kurtosis.

Kurtosis

Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal
distribution.
If a curve is more peaked than the normal curve it is called leptokurtic. If a curve is more
flat topped than the normal curve it is called platykurtic. The normal curve is called mesokurtic.

β2 = μ4 / μ2^2

β2 = measure of kurtosis
μ4, μ2 = fourth moment and second moment about the mean

For the normal curve β2 = 3. If β2 is more than 3 the curve is leptokurtic and if β2 is
less than 3 the curve is platykurtic.

Important Questions other than above notes

1. Define the various methods of data collection. Explain with example.
2. Explain the term probability and its theorems.
3. Explain the following
a. Action space
b. Bayesian rule
c. Games theory
d. Expected pay off table
e. Null hypothesis and alternate hypothesis
f. One tail and two tail test
4. What is a sampling distribution? What purpose does it serve?
5. How does the size of population and the kind of random sampling determine the
shape of a sampling distribution?
6. Ch No 10
7. Explain the following with the help of examples
a. Z- test
b. T-test
c. χ2 Test
8. What is the chi-square test of goodness of fit? What precautions are necessary in
using this test?
compared with parametric methods in statistics.
10. What are index numbers? What purpose do they serve? Discuss the various
problems faced in the construction of index numbers.
11. Explain the tests used for index numbers.


```
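
The dispersion, skewness and kurtosis formulas in the notes above can be checked on a small data set. The following is a minimal Python sketch and is not part of the original notes: the function name `dispersion_summary` and the sample values are hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method (position k(N+1)/4), which can differ slightly from the N/4 convention the notes use for grouped data.

```python
import math
import statistics


def dispersion_summary(values):
    """Compute the ungrouped-data measures from the notes for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    largest, smallest = max(values), min(values)

    # Quartiles via the standard library. The default "exclusive" method
    # places Qk at position k(n + 1)/4 (an assumption of this sketch).
    q1, median, q3 = statistics.quantiles(values, n=4)

    def central_moment(k):
        # Average of the k-th power of the deviations from the mean.
        return sum((x - mean) ** k for x in values) / n

    variance = central_moment(2)
    std_dev = math.sqrt(variance)

    return {
        "range": largest - smallest,
        "coeff_range": (largest - smallest) / (largest + smallest),
        "quartile_deviation": (q3 - q1) / 2,
        "coeff_quartile_deviation": (q3 - q1) / (q3 + q1),
        "mean_deviation": sum(abs(x - mean) for x in values) / n,
        "variance": variance,
        "std_dev": std_dev,
        "coeff_variation": std_dev / mean * 100,
        "bowley_skewness": (q3 + q1 - 2 * median) / (q3 - q1),
        "third_moment": central_moment(3),
        "beta2_kurtosis": central_moment(4) / variance ** 2,
    }


sample = [2, 4, 6, 8, 10, 12, 14]  # hypothetical data, chosen for easy arithmetic
for name, value in dispersion_summary(sample).items():
    print(f"{name}: {value:.4f}")
```

For these sample values the sketch gives a range of 12, a standard deviation of 4 and β2 = 1.75, which is below 3 and so points to a platykurtic shape.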
 views: 44 posted: 4/12/2012 language: English pages: 30
Description: Notes for MBA 2nd Sem of GJU University Hissar Haryana In India