statistical analysis

					STATISTICAL ANALYSIS NOTES FOR YEAR 2008-2009


                       CORRELATION ANALYSIS


        Correlation analysis deals with the association between two or more variables. If
two or more quantities vary in sympathy, so that movements in one tend to be
accompanied by corresponding movements in the other, they are said to be
correlated.
        Thus correlation is a statistical device which helps us analyze the covariation
of two or more variables. The problem of analyzing the relation between different series
should be broken into three steps:
    1. Determining whether a relation exists and, if it does, measuring it.
    2. Testing whether it is significant.
    3. Establishing the cause and effect relation.

               Significance of correlation
           •   Most variables show some kind of relationship, such as the
               relationship between price and supply or between income and
               expenditure. With the help of correlation we can measure the degree
               of relationship between the variables.
           •   When we know the degree of relationship, we can estimate the value of
               one variable from the value of another.
           •   Correlation analysis contributes to the understanding of economic
               behavior, aids in locating the critically important variables on which
               others depend, and suggests to the economist the paths through which
               stabilizing forces may become effective.
           •   Correlation reduces the range of uncertainty: a prediction based on
               correlation analysis is likely to be more valuable and nearer to reality.

   TYPES OF CORRELATION:

   Positive or negative correlation: This depends upon the direction of change of
   the two series. If both variables move in the same direction, that is, when one
   series increases the other also increases and when one decreases the other also
   decreases, the correlation is positive. If they move in opposite directions, that
   is, when one series increases the other decreases and when one decreases the
   other increases, the correlation is negative.



        Visit us at www.sgiithisar.com Contact at 94163-59920

  Simple, partial and multiple correlation: This depends upon the number of
  variables studied. When only two variables are studied, it is simple correlation.
  When three or more variables are studied, it is multiple or partial correlation. In
  multiple correlation three or more variables are studied simultaneously. In partial
  correlation we recognize three or more variables but measure the correlation
  between two of them, holding the effect of the other variables constant.

  Linear and non-linear correlation: This is based upon the constancy of the ratio of
  change between the variables. If the amount of change in one variable tends to bear
  a constant ratio to the amount of change in the other variable, the correlation is
  linear; if the ratio is not constant, the correlation is non-linear.

METHODS OF CORRELATION:

     Scatter diagram method
     Graphic method
     Karl Pearson’s coefficient of correlation
     Rank correlation method
     Concurrent deviation method

  Scatter diagram method:
       The simplest device for ascertaining whether two variables are related is to
  prepare a dot chart called a scatter diagram. The given data are plotted on graph
  paper in the form of dots, one dot for each pair of values.
  Merits and demerits of the method:
  Merits: It is the simplest way of studying correlation.
  It is not influenced by the size of extreme items, whereas most mathematical
  methods are influenced by extreme values.
  Demerits: We get an idea of the direction and closeness of the correlation but
  cannot find its exact degree.

  Graphic method: The individual values of the two variables are plotted on
  graph paper, giving two curves, one for the X variable and another for the Y variable.
  By examining the direction and closeness of the two curves so drawn we can infer
  whether or not the variables are related.

  Karl Pearson’s coefficient of correlation: It is the most widely used method of
  calculating correlation. The coefficient of correlation is denoted by r.

   $$ r = \frac{\sum xy}{N \sigma_x \sigma_y} $$

   where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are deviations from the means,
   $\sigma_x$ is the standard deviation of X, and
   $\sigma_y$ is the standard deviation of Y.





Direct method of calculating correlation:

   $$ r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} $$




When deviations are taken from assumed means:

   $$ r = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{N\sum dx^2 - (\sum dx)^2}\,\sqrt{N\sum dy^2 - (\sum dy)^2}} $$




Steps
   • Take the deviations of X series from an assumed mean and denote these
      deviations by dx and obtain the total ∑dx
   • Take the deviations of Y series from an assumed mean and denote these
      deviations by dy and obtain the total ∑dy.
   • Square dx and obtain the total ∑dx².
   • Square dy and obtain the total ∑dy².
   • Multiply dx and dy and obtain the total ∑dx.dy.
   • Substitute the values of ∑dx.dy, ∑dx, ∑dy, ∑dx² and ∑dy² in the formula above.






Deviations are taken from the assumed means 69 (for X) and 112 (for Y).

X         Y          dx        dy         dx²          dy²          dx.dy
78        125        9         13         81           169          117
89        137        20        25         400          625          500
99        156        30        44         900          1936         1320
60        112        -9        0          81           0            0
59        107        -10       -5         100          25           50
79        136        10        24         100          576          240
68        123        -1        11         1            121          -11
61        108        -8        -4         64           16           32
∑X=593    ∑Y=1004    ∑dx=41    ∑dy=108    ∑dx²=1727    ∑dy²=3468    ∑dx.dy=2248



   $$ r = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{\sqrt{N\sum dx^2 - (\sum dx)^2}\,\sqrt{N\sum dy^2 - (\sum dy)^2}} $$

   $$ r = \frac{(8)(2248) - (41)(108)}{\sqrt{(8)(1727) - (41)^2}\,\sqrt{(8)(3468) - (108)^2}} = \frac{13556}{\sqrt{12135}\,\sqrt{16080}} \approx 0.97 $$
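
The worked example can be checked with a short script. This is a minimal sketch in Python and is not part of the original notes: the function name `pearson_assumed_mean` is hypothetical, and the assumed means 69 and 112 are inferred from the dx and dy columns of the table.

```python
from math import sqrt

def pearson_assumed_mean(xs, ys, ax, ay):
    """Pearson's r from deviations taken about assumed means ax and ay."""
    n = len(xs)
    dx = [x - ax for x in xs]          # deviations of X from its assumed mean
    dy = [y - ay for y in ys]          # deviations of Y from its assumed mean
    s_dx, s_dy = sum(dx), sum(dy)
    s_dx2 = sum(d * d for d in dx)
    s_dy2 = sum(d * d for d in dy)
    s_dxdy = sum(a * b for a, b in zip(dx, dy))
    num = n * s_dxdy - s_dx * s_dy
    den = sqrt(n * s_dx2 - s_dx ** 2) * sqrt(n * s_dy2 - s_dy ** 2)
    return num / den

X = [78, 89, 99, 60, 59, 79, 68, 61]
Y = [125, 137, 156, 112, 107, 136, 123, 108]

# Assumed means 69 and 112 are inferred from the table, not stated in the notes.
r = pearson_assumed_mean(X, Y, ax=69, ay=112)
print(round(r, 2))
```

The value of r does not depend on the assumed means chosen, so calling the function with `ax=0, ay=0` (which reduces to the direct method) should give the same coefficient.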




       For a frequency distribution the formula is

   $$ r = \frac{N\sum f\,dx\,dy - (\sum f\,dx)(\sum f\,dy)}{\sqrt{N\sum f\,dx^2 - (\sum f\,dx)^2}\,\sqrt{N\sum f\,dy^2 - (\sum f\,dy)^2}} $$

       where f is the frequency and N is the total frequency.




A limitation of this method is that great care must be exercised in calculating the coefficient.

Conditions:
   • When r is +1 there is a perfect positive relationship between the variables.
   • When r is -1 there is a perfect negative relationship between the variables.
   • When r is 0 there is no linear relationship between the variables.


   Rank correlation method

   This method was developed by the British psychologist Charles Edward Spearman in
   1904. The values of each variable are ranked in either ascending or descending
   order, using the same order for both variables.


   $$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} $$

R denotes rank coefficient of correlation and D refers to the difference of rank between
paired items in two series.


X               Rx            Y           Ry         D²
97.8            3             73.2        1          4
99.2            7             85.8        6          1
98.8            6             78.9        4          4
98.3            4             75.8        2          4
98.4            5             77.2        3          4
96.7            1             87.2        7          36
97.1            2             83.8        5          9
                                                     ∑D² = 62

   $$ R = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 62}{7(7^2 - 1)} = 1 - \frac{372}{336} = 1 - 1.107 = -0.107 $$
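
The rank correlation example can be reproduced as follows. This is a minimal sketch, assuming ascending ranks (1 = smallest value) and no tied values, as in the table; the helper names `ranks` and `spearman` are hypothetical.

```python
def ranks(values):
    """Rank values in ascending order (1 = smallest); assumes no ties."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(xs, ys):
    """Return (R, sum of squared rank differences) for paired data."""
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1)), d2

X = [97.8, 99.2, 98.8, 98.3, 98.4, 96.7, 97.1]
Y = [73.2, 85.8, 78.9, 75.8, 77.2, 87.2, 83.8]

rho, sum_d2 = spearman(X, Y)
print(sum_d2, round(rho, 3))
```

The sum of squared differences comes to 62 and the coefficient to about -0.107, which indicates a weak negative association between the two rankings.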



Equal ranks- When two or more items have equal values, each is given the average of the
ranks they would otherwise occupy. If two individuals are ranked equal at fifth place, each
is given the rank (5+6)/2 = 5.5, and if three are ranked equal at fifth place, each is given
the rank (5+6+7)/3 = 6.



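   $$ R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right]}{N(N^2 - 1)} $$

   where m is the number of items sharing a common rank; one correction term is added
   for each group of tied ranks.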


Concurrent deviation method

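       This is the simplest of all the methods. The only thing required under
this method is to find the direction of change of the X variable and the Y variable. The
formula is

   $$ r_c = \pm\sqrt{\pm\frac{2c - n}{n}} $$

c = number of concurrent deviations, that is, pairs in which X and Y change in the same direction.
n = number of pairs of deviations compared.
The sign of (2c - n)/n is used both inside and outside the square root.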

Steps
                  •   Find the direction of change of the X variable by comparing each
                      value with the one before it. If the value has increased put a
                      plus sign, if it has decreased put a minus sign, and if it is
                      unchanged put a zero. Denote this column by dx.
                  •   In the same manner find the direction of change of the Y variable
                      and denote the column by dy.
                  •   Multiply dx by dy and count c, the number of positive signs in
                      the product column. Then apply the above formula.






X            dx              Y         dy           dx.dy
60                           65
55           -               40        -            +
50           -               35        -            +
56           +               75        +            +
30           -               63        -            +
70           +               80        +            +
40           -               35        -            +
35           -               20        -            +
80           +               80        +            +
80           0               60        -            0
75           -               60        0            0
                                                    c = 8


Concurrent deviation method- coefficient of correlation

   $$ r_c = \pm\sqrt{\pm\frac{2c - n}{n}} = +\sqrt{+\frac{2 \times 8 - 10}{10}} = \sqrt{0.6} \approx 0.7746 $$
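
The steps of the concurrent deviation method can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original notes; the helper names `sign` and `concurrent_deviation` are hypothetical.

```python
from math import sqrt, copysign

def sign(v):
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)

def concurrent_deviation(xs, ys):
    """Return (c, n, r_c) for two series of equal length."""
    # Direction of change of each value compared with the one before it.
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    n = len(dx)                                       # pairs of deviations
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)   # positive products
    ratio = (2 * c - n) / n
    # The sign of the ratio is applied inside and outside the square root.
    r_c = copysign(sqrt(abs(ratio)), ratio)
    return c, n, r_c

X = [60, 55, 50, 56, 30, 70, 40, 35, 80, 80, 75]
Y = [65, 40, 35, 75, 63, 80, 35, 20, 80, 60, 60]

c, n, r_c = concurrent_deviation(X, Y)
print(c, n, round(r_c, 4))
```

Eleven observations give ten deviations, so n = 10 here, and the count of positive products is c = 8.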




                       Regression analysis



        The coefficient of correlation measures the closeness of the relationship between
the variables, whereas regression analysis helps us to estimate the value of one variable
when the value of the other is known. The variable whose value is given is the independent
variable, and the variable to be estimated is called the dependent variable.

Definitions- “Regression is the measure of the average relationship between two or more
variables in terms of the original units of the data.”
“One of the most frequently used techniques in economics and business research to find
a relation between two or more variables that are related causally is regression analysis.”

Difference between correlation and regression-
       Whereas the correlation coefficient is a measure of the degree of covariability
       between x and y, the objective of regression is to find out the nature of the
       relationship between the variables.
       Correlation is merely a tool for ascertaining the degree of relationship between two
       variables; from it we cannot say that one variable is the cause and the other the effect.
       There may be nonsense correlation between two variables, which is purely due to
       chance and has no practical relevance; there is nothing like nonsense regression.
       The correlation coefficient is independent of change of scale and origin; regression
       coefficients are independent of change of origin but not of scale.

Regression lines-

There may be two regression lines: one of x on y and another of y on x. The line of x on y
treats x as the variable dependent on y, and vice versa.

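Least squares method

Regression equation of y on x

   $$ Y = a + bX $$

To determine the values of a and b, the following normal equations are solved. The first is
obtained by summing the equation over all observations, the second by multiplying it by X
and then summing.

   $$ \sum Y = Na + b\sum X $$
   $$ \sum XY = a\sum X + b\sum X^2 $$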

Regression equation of x on y

   $$ X = a + bY $$

To determine the values of a and b, the following normal equations are solved:

   $$ \sum X = Na + b\sum Y $$
   $$ \sum XY = a\sum Y + b\sum Y^2 $$

Deviations taken from arithmetic means of X and Y

Regression equation of X on Y:

   $$ X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}) $$

   where $r\sigma_x/\sigma_y$ is the regression coefficient of x on y.

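Deviations taken from assumed means


X on Y:

   $$ X - \bar{X} = r\frac{\sigma_x}{\sigma_y}(Y - \bar{Y}) $$

   $$ r\frac{\sigma_x}{\sigma_y} = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{N\sum dy^2 - (\sum dy)^2} $$

   where dx = X - A and dy = Y - B, A and B being the assumed means of X and Y respectively.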


Y on X:

   $$ Y - \bar{Y} = r\frac{\sigma_y}{\sigma_x}(X - \bar{X}) $$

   $$ r\frac{\sigma_y}{\sigma_x} = \frac{N\sum dx\,dy - (\sum dx)(\sum dy)}{N\sum dx^2 - (\sum dx)^2} $$
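
The two regression coefficients can be computed from the same sums that are used for the correlation coefficient. The sketch below is illustrative only: reusing the eight (X, Y) pairs from the correlation example, with assumed means 69 and 112, is an assumption made here for demonstration, and the names `b_xy`, `b_yx`, `estimate_x` and `estimate_y` are hypothetical.

```python
from math import sqrt

# Data reused from the correlation example (an assumption for illustration).
X = [78, 89, 99, 60, 59, 79, 68, 61]
Y = [125, 137, 156, 112, 107, 136, 123, 108]
A, B = 69, 112                      # assumed means of X and Y

n = len(X)
dx = [x - A for x in X]
dy = [y - B for y in Y]
s_dx, s_dy = sum(dx), sum(dy)
s_dx2 = sum(d * d for d in dx)
s_dy2 = sum(d * d for d in dy)
s_dxdy = sum(a * b for a, b in zip(dx, dy))

num = n * s_dxdy - s_dx * s_dy
b_xy = num / (n * s_dy2 - s_dy ** 2)   # regression coefficient of X on Y
b_yx = num / (n * s_dx2 - s_dx ** 2)   # regression coefficient of Y on X

mean_x, mean_y = sum(X) / n, sum(Y) / n

def estimate_x(y):
    """Estimate X for a given Y from the regression line of X on Y."""
    return mean_x + b_xy * (y - mean_y)

def estimate_y(x):
    """Estimate Y for a given X from the regression line of Y on X."""
    return mean_y + b_yx * (x - mean_x)

print(round(b_xy, 4), round(b_yx, 4), round(sqrt(b_xy * b_yx), 2))
```

As a consistency check, the square root of the product of the two regression coefficients equals the magnitude of the correlation coefficient, about 0.97 for these data.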




In the case of frequency distribution

   $$ r\frac{\sigma_x}{\sigma_y} = \frac{N\sum f\,dx\,dy - (\sum f\,dx)(\sum f\,dy)}{N\sum f\,dy^2 - (\sum f\,dy)^2} \times \frac{i_x}{i_y} $$

   $$ r\frac{\sigma_y}{\sigma_x} = \frac{N\sum f\,dx\,dy - (\sum f\,dx)(\sum f\,dy)}{N\sum f\,dx^2 - (\sum f\,dx)^2} \times \frac{i_y}{i_x} $$

   where $i_x$ and $i_y$ are the class intervals of X and Y.




Limitations of regression analysis-
In making an estimate from a regression equation it is important to remember the underlying
assumption that the relationship has not changed since the regression equation was
computed.

Time Series Analysis


“A time series is a set of statistical observations arranged in chronological order.”
“A time series consists of statistical data which are collected recorded and observed over
successive increments of time.”

It is clear from the definitions that if we arrange the data according to time then it is
called time series.

UTILITY OF THE TIME SERIES ANALYSIS

   •   It helps in understanding past behavior- By observing data over a period of
       time one can easily understand what changes have taken place in the past.
       This analysis is extremely helpful in predicting future behavior.
   •   It helps in planning future operations- Plans for the future cannot be made
       without forecasting events and the relationships they will have. Statistical
       techniques like time series analysis help in making decisions for the future.
   •   It helps in evaluating current accomplishments- The actual performance can
       be compared with the expected performance and the cause of variation analyzed.
   •   It facilitates comparison- Different time series can be compared and important
       conclusions drawn.


   COMPONENTS OF TIME SERIES
     • Secular trend
     • Seasonal variations
     • Cyclical variations
     • Irregular variations

                $$ Y = T \times S \times C \times I $$
Here Y denotes the observed value, treated as the product of the four elements: T stands
for trend, S for seasonal, C for cyclical and I for irregular variations.
Another approach is to treat each observation of a time series as the sum of these four
components.

               $$ Y = T + S + C + I $$

SECULAR TREND-

        The term trend is very commonly used in day to day parlance. For example, we
talk of the rising trend of population, prices, etc. Such a long-term tendency of a series to
rise or fall is called the secular trend or long-term trend. The concept of secular trend
applies only to data covering a long period.

SEASONAL VARIATIONS

        Seasonal variations are those periodic movements in business activity which
occur regularly every year and have their origin in the nature of the year itself. The factors
that cause seasonal variations are-
    1. Climate and weather conditions- The most important factor causing seasonal
        variations is the climate. Changes in climate and weather
        conditions such as rainfall, humidity and heat act on different products and industries
        differently.
    2. Customs, traditions and habits- Though nature is primarily responsible for
        seasonal variations in time series, customs, traditions and habits also have
        their impact. For example, on occasions like Deepawali, Dussehra and
        Christmas there is a big demand for sweets, and there is also a large demand for cash
        before the festivals because people need money for shopping and gifts.



   CYCLICAL VARIATIONS

      The term cycle refers to the recurrent variations in time series that usually last
   longer than a year and are regular neither in amplitude nor in length. Cyclical
   fluctuations are long-term movements that represent consistently recurring rises and
   declines in activity. A business cycle consists of the recurrence of the up and down
   movements of business activity around some sort of statistical trend.

 IRREGULAR VARIATIONS-

     Irregular variations, also called erratic, accidental or random variations, are those
 variations in business activity which do not repeat in a definite pattern. There are two
 reasons for recognizing irregular movements.
     1. To suggest that on occasions it may be possible to explain certain movements
         in the data by specific causes and so to simplify further analysis.
     2. To emphasize the fact that predictions of economic conditions are always
         subject to a degree of error owing to the unpredictable erratic influences which
         may enter.

 MEASUREMENT OF TREND

        1.   Free hand method
        2.   Semi-average method
        3.   Moving average
        4.   Method of least square

 •   FREE HAND METHOD
          Plot the time series on a graph paper then examine carefully the direction
          of the trend based on the plotted information. Then draw a straight line
          which will best fit to the data according to personal judgment and this line
          will show the direction of the trend

        Semi-average method
        The given data are divided into two parts, preferably with the same number of
        years, and the average of each part is computed.
        Illustration-

                      Year            Sale of firm A
                      1997            102
                      1998            105
                      1999            114
                      2000            110
                      2001            108
                      2002            116
                      2003            112

Since seven years are given, the middle year is left out and the averages of the first three
years and the last three years are computed. The average of the first three years is
(102+105+114)/3 = 107 and the average of the last three years is (108+116+112)/3 = 112. Thus we
get two points, and by joining them we obtain the required trend line. It can be used for
prediction or for determining intermediate values.
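
The semi-average calculation can be sketched as follows. This is a hedged illustration rather than the notes' own procedure: it assumes each average is plotted against the middle year of its half (1998 and 2002 here), which the notes do not state explicitly, and the function names `semi_averages` and `trend_value` are hypothetical.

```python
years = [1997, 1998, 1999, 2000, 2001, 2002, 2003]
sales = [102, 105, 114, 110, 108, 116, 112]

def semi_averages(years, values):
    """Return the two (year, average) points of the semi-average method.

    With an odd number of years the middle year is left out.
    """
    n = len(values)
    half = n // 2
    first = (sum(years[:half]) / half, sum(values[:half]) / half)
    last = (sum(years[n - half:]) / half, sum(values[n - half:]) / half)
    return first, last

def trend_value(year, p1, p2):
    """Read a value off the straight line joining the two semi-average points."""
    slope = (p2[1] - p1[1]) / (p2[0] - p1[0])
    return p1[1] + slope * (year - p1[0])

p1, p2 = semi_averages(years, sales)
print(p1, p2, trend_value(2000, p1, p2))
```

Under this assumption the trend line rises by 1.25 units per year, so the trend value for the omitted middle year 2000 is 109.5.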

Method of moving average.
In this method a period is selected for the average, such as three years, five years or eight
years. The three years moving average is computed as (a+b+c)/3, (b+c+d)/3, (c+d+e)/3,
(d+e+f)/3, ... and the five years moving average as (a+b+c+d+e)/5, (b+c+d+e+f)/5,
(c+d+e+f+g)/5, ...

Three years moving average

year           production                  3 year total       moving average
1989           15                          -                  -
1990           21                          66                 22
1991           30                          87                 29
1992           36                          108                36
1993           42                          124                41.33
1994           46                          138                46
1995           50                          152                50.67
1996           56                          169                56.33
1997           63                          189                63
1998           70                          207                69
1999           74                          226                75.33
2000           82                          246                82
2001           90                          267                89
2002           95                          287                95.67
2003           102                         -                  -

Five years moving average

year           no of students              5 year total       moving average
1994           332                         -                  -
1995           317                         -                  -
1996           357                         1800               360
1997           392                         1873               374.6
1998           402                         1966               393.2
1999           405                         2036               407.2
2000           410                         2049               409.8
2001           427                         2085               417
2002           405                         -                  -
2003           438                         -                  -
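
Both tables can be regenerated with one small function. This is a minimal sketch for odd periods only, where each average is placed against the middle year of its window; the function name `moving_average` is hypothetical.

```python
def moving_average(values, k):
    """Return (moving totals, moving averages) for a window of k periods.

    For odd k, the i-th result belongs to the middle year of the window
    starting at position i.
    """
    totals = [sum(values[i:i + k]) for i in range(len(values) - k + 1)]
    averages = [round(t / k, 2) for t in totals]
    return totals, averages

production = [15, 21, 30, 36, 42, 46, 50, 56, 63, 70, 74, 82, 90, 95, 102]
students = [332, 317, 357, 392, 402, 405, 410, 427, 405, 438]

totals3, avgs3 = moving_average(production, 3)
totals5, avgs5 = moving_average(students, 5)
print(totals3, avgs3)
print(totals5, avgs5)
```

A window of k periods leaves (k - 1)/2 years without a moving average at each end of the series, which is why the first and last rows of the tables are blank.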


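Method of least squares-

Equation of y on x

   $$ Y = a + bX $$

To determine the values of a and b, the following normal equations are solved:

   $$ \sum Y = Na + b\sum X $$
   $$ \sum XY = a\sum X + b\sum X^2 $$

Equation of x on y

   $$ X = a + bY $$

To determine the values of a and b, the following normal equations are solved:

   $$ \sum X = Na + b\sum Y $$
   $$ \sum XY = a\sum Y + b\sum Y^2 $$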
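
To show how the normal equations yield a trend line, the sketch below fits Y = a + bX to the sales of firm A from the semi-average illustration. Applying least squares to that series, and measuring X as years from the origin 2000, are assumptions made here for illustration; the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Solve the two normal equations of Y = a + bX for a and b."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

years = [1997, 1998, 1999, 2000, 2001, 2002, 2003]
sales = [102, 105, 114, 110, 108, 116, 112]

# Taking the middle year as origin makes the sum of X zero,
# so a is simply the mean of Y.
x = [year - 2000 for year in years]
a, b = fit_line(x, sales)
print(round(a, 2), round(b, 2))
```

With the origin at the middle year the calculation simplifies to a = ∑Y/N and b = ∑XY/∑X², a common shortcut when the time periods are equally spaced.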


Mathematics of management

      Business mathematics consists of a set of mathematical and statistical tools that
can be used for the fulfillment of one or more objectives of a business, like the
maximization of output or sales, maximization of profits, minimization of cost, etc. These
tools are often known as quantitative techniques.

Main features of quantitative techniques

   1. Every management problem can be represented by one or more equations with the
      help of certain symbols. These symbols denote relevant variables and constants of
      the problems.
   2. The solution of the model is obtained by the application of one or more
      techniques from the set of quantitative techniques.
   3. The quantitative techniques take care of the policies and capacities of different
      departments and hence avoid the occurrence of any contradiction between them.
   4. This is an interdisciplinary approach to problem solving.
   5. These techniques attempt to analyze the business problem in its actual working
      environment, which often differs from the ideal conditions assumed in
      mathematics, economics and other disciplines.

   IMPORTANCE OF QUANTITATIVE TECHNIQUES.

          1. Basis for scientific analysis- With the increase in complexities of modern
             business it is not possible to rely on unscientific decisions based on
             intuition. Quantitative techniques provide scientific methods for tackling
             the various problems of modern business.
          2. Tools for scientific analysis- Quantitative techniques provide
             managers with a variety of tools from mathematics, statistics, economics
             and operational research. These tools help the manager to provide a more
             precise description and solution of the problem. The solutions obtained by
             using quantitative techniques are often free from the bias of the manager
             or the owner of the business.
          3. Solution for various business problems- Quantitative techniques provide
             solutions in almost every area of a business. They can be used in
             production, marketing, inventory, finance and other areas to find answers
             to various questions like (a) How should the resources be used in
             production so that profits are maximized? (b) How should production
             be matched to demand so as to minimize the cost of inventory?
          4. Optimum allocation of resources- An allocation of resources is said to
             be optimal if either a given level of output is being produced at minimum
             cost or maximum output is being produced at a given cost. A quantitative
             technique enables a manager to optimally allocate the resources of a
             business or industry.
          5. Selection of an optimal strategy- Using quantitative techniques it is
             possible to determine the optimal strategy of a business or firm that is
             facing competition from its rivals. The techniques for determining the
             optimal strategy are based upon game theory.
          6. Optimal deployment of resources- Using quantitative techniques it is
             possible to find the earliest and latest times for successful completion of
             a project; this is called the program evaluation and review technique.
          7. Facilitate the process of decision making- Quantitative techniques
             provide a method of decision making in the face of uncertainty. These
             techniques are based upon decision theory.

SCOPE OF QUANTITATIVE TECHNIQUES

  Production management- quantitative techniques are useful to the production
  management in (a) selecting the location site for a plant, scheduling and controlling
  its development and designing of plant layout. (b) Locating within the plant and
  controlling the movements of required production material and finished goods
  inventories and (c) scheduling and sequencing production by adequate preventive
  maintenance with optimum product mix.

  Personnel management- Quantitative techniques are useful to personnel management
  in (a) optimum manpower planning, (b) finding the number of employees to be
  maintained on the permanent or full time roll, (c) finding the number of persons to be
  kept in a work pool intended for meeting absenteeism, and (d) studying personnel
  recruiting procedures, accident rates and labor turnover.

  Marketing management- Quantitative techniques equally help marketing
  management to determine (a) warehouse distribution points: where warehouses
  should be located, their size, the quantity to be stocked and the choice of customers, (b)
  the optimum allocation of the sales budget to direct selling and promotional expenses,
  (c) the choice of different advertising media and bidding strategies, and (d) the
  customer preferences relating to size, color, packaging, etc. for various products, as well
  as how to outbid and outwit competitors.

  Financial management- Quantitative techniques are also very useful to financial
  management in (a) finding long range capital requirements as well as how to meet
  these requirements, (b) determining optimum replacement policies, (c) working out a
  profit plan for the firm, (d) developing a capital investment plan, and (e) estimating
  credit and investment risk.


ARITHMETIC PROGRESSION




A series in which each successive term is obtained by adding a constant quantity (known
as the common difference) to its preceding term is called an Arithmetic Progression.

An A.P with first term equal to a and common difference equal to d is
written as
a, a+d, a+2d, a+3d, ..., a+(n-1)d
where n denotes the number of terms and l = a+(n-1)d is the last term of the A.P with n terms.
Sum of n terms of an A.P =
               n/2 [2a + (n-1) d]
On substituting l = a+ (n-1) d the formula can be written as

Sum = n/2 {a+l}

Example- Find the sum of the first 15 terms of the following series: 10, 15, 20, 25, ...
Solution- Here a = 10, d = 5, n = 15
Sum = 15/2 {2*10 + (15-1)*5} = 15/2 {90} = 675

Example- The fourth term of an A.P is 14 and the eighth term is 26. Find the sum of the
first ten terms.
Solution-
              Let a be the first term and d be the common difference.
              Then it is given that a+3d = 14 and a+7d = 26. Eliminating a from these
equations, 4d = 12, so d = 3.
              a + 3*3 = 14, so a = 5.
              And sum = 10/2 {2*5 + 9*3} = 185
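
Both examples can be verified with a few lines of Python. This is an illustrative sketch; the function name `ap_sum` is hypothetical.

```python
def ap_sum(a, d, n):
    """Sum of the first n terms of an A.P: n/2 [2a + (n-1)d]."""
    # n * (2a + (n-1)d) is always even for integers, so // is exact.
    return n * (2 * a + (n - 1) * d) // 2

# First example: 10, 15, 20, 25, ... for 15 terms.
print(ap_sum(10, 5, 15))

# Second example: fourth term 14 and eighth term 26.
d = (26 - 14) // (8 - 4)   # the two terms are four steps apart
a = 14 - 3 * d             # fourth term is a + 3d
print(a, d, ap_sum(a, d, 10))
```

The formula agrees with adding the terms one by one, for example `sum(10 + 5 * k for k in range(15))`.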


GEOMETRIC PROGRESSION

A series in which each successive term is obtained by multiplying the preceding term by
a constant quantity (known as the common ratio) is called a geometric progression.
A G.P with first term equal to a and common ratio equal to R is written
as

   $$ a,\ aR,\ aR^2,\ aR^3,\ \ldots,\ aR^{n-1} $$



Sum of n terms of G.P

The sum of n terms of a G.P with the first term equal to a and common ratio equal to R
(R not equal to 1) is written as

   $$ \text{Sum} = \frac{a(1 - R^n)}{1 - R} $$



Sum of an infinite G.P

When |R| < 1 and n becomes infinite, the sum of the G.P is given by

   $$ \text{Sum} = \frac{a}{1 - R} $$

Example- The first term of a G.P is 8 and the common ratio is 3. Find the sum of the first
10 terms.
Solution- a = 8, R = 3 and n = 10

   $$ \text{Sum} = \frac{8(1 - 3^{10})}{1 - 3} = \frac{8(1 - 59049)}{-2} = 236192 $$
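
The finite and infinite sum formulas can be checked numerically. This is a minimal sketch; the function names `gp_sum` and `gp_sum_infinite` are hypothetical, and the ratio 0.5 used for the infinite case is an illustrative value that does not come from the notes.

```python
def gp_sum(a, r, n):
    """Sum of the first n terms of a G.P: a(1 - r^n)/(1 - r), for r != 1."""
    return a * (1 - r ** n) / (1 - r)

def gp_sum_infinite(a, r):
    """Sum to infinity of a G.P, valid only when |r| < 1."""
    if abs(r) >= 1:
        raise ValueError("the infinite sum exists only when |r| < 1")
    return a / (1 - r)

print(gp_sum(8, 3, 10))            # the worked example
print(gp_sum_infinite(8, 0.5))     # illustrative ratio, not from the notes
```

The finite formula agrees with adding the ten terms 8, 24, 72, ... directly.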




QUANTITATIVE METHODS

 CONTENTS:-

       1. BASIC MATHEMATICS
       2. ARITHMETIC PROGRESSION
       3. GEOMETRIC PROGRESSION
       4. MEASUREMENT OF CENTRAL TENDENCY
       5. MEASUREMENT OF DISPERSION


       6. SKEWNESS, MOMENTS, KURTOSIS
       7. CORRELATION ANALYSIS
       8. REGRESSION ANALYSIS
       9. ANALYSIS OF TIME SERIES
       10. PROBABILITY AND EXPECTED VALUE
       11. THEORETICAL DISTRIBUTION

PROBABILITY AND EXPECTED VALUE

        In day to day conversation we often use terms such as chance and probably, and
most people have only a vague idea of their meaning. For example, we come across
statements like “it will probably rain tomorrow” or “it is likely that Mr. A may not come to
take the class today”. Probability gives such vague ideas a precise numerical form.
Definition of probability- The probability of a given event is an expression of the likelihood
or chance of occurrence of the event. A probability is a number which ranges from 0 to 1:
zero for an event which cannot occur and 1 for an event certain to occur.
Calculation of probability-
    1. Experiments and events- The term experiment refers to an act which
        can be repeated under some given conditions. Random experiments are those
        experiments whose results depend on chance, such as tossing a coin or throwing a
        die. The results of a random experiment are called outcomes, and a set of one or
        more outcomes is called an event.
    2. Mutually exclusive events- Two events are said to be mutually exclusive or
        incompatible when both cannot happen simultaneously in a single trial, i.e. the
        occurrence of any one of them precludes the occurrence of the other. For example,
        if a single coin is tossed either head can be up or tail can be up. Both cannot be up
        at the same time. If both events can happen in the same trial, they are called
        not mutually exclusive events.
    3. Independent and dependent events- Two or more events are said to be
        independent when the outcome of one does not affect, and is not affected by, the other.
        For example, if a coin is tossed twice the result of the second throw is in no
        way affected by the result of the first throw. Similarly, the results obtained by
        throwing a die are independent of the results obtained by drawing an ace from a
        pack of cards. Events are dependent when the occurrence of one affects the
        probability of the other.
    4. Equally likely events- Events are said to be equally likely when one does not
        occur more often than the others. For example, if an unbiased coin or die is
        thrown, each face may be expected to be observed approximately the same number
        of times in the long run. Similarly, the cards of a pack of playing cards are so
        closely alike that we expect each card to appear equally often when a large
        number of drawings are made with replacement. However, if the coin or die is
        biased we should not expect each face to appear the same number of
        times.
   5. Simple and compound events- In the case of simple events we consider the
       probability of the happening or not happening of a single event. For example, we
       might be interested in finding the probability of drawing a red ball from a bag
       containing 10 white and 6 red balls. On the other hand, in the case of compound
       events we consider the joint occurrence of two or more events.
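
As a small illustration of a simple event, the red-ball example can be worked out as favourable cases divided by total cases. This assumes every ball is equally likely to be drawn, and the helper name classical_probability is made up for this sketch.

```python
from fractions import Fraction


def classical_probability(favourable, total):
    """Favourable cases / total cases, assuming all cases are equally likely."""
    if total <= 0 or not 0 <= favourable <= total:
        raise ValueError("need 0 <= favourable <= total and total > 0")
    return Fraction(favourable, total)


# Bag with 10 white and 6 red balls: probability of drawing a red ball
p_red = classical_probability(6, 10 + 6)
print(p_red)  # 3/8
```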
   THEOREMS OF PROBABILITY-
               The addition theorem
               The multiplication theorem
The addition theorem-
               The addition theorem states that if two events A and B are mutually
exclusive, the probability of the occurrence of either A or B is the sum of the individual
probabilities of A and B.
               P (A or B) = P (A) + P (B)


Example- One card is drawn from a standard pack of 52. What is the probability that it is
either a king or a queen?
Solution- There are 4 kings and 4 queens in a pack of 52 cards.
The probability that the card drawn is a king = 4/52
The probability that the card drawn is a queen = 4/52
Since the events are mutually exclusive, the probability that the card drawn is
either a king or a queen is
4/52 + 4/52 = 8/52 = 2/13 answer
When events are not mutually exclusive, the probability of their joint occurrence must be
subtracted so that it is not counted twice:

P (A or B) = P (A) + P (B) - P (A and B)
For example, the probability of drawing a king or a heart from the same pack is
P (Heart or king) = P (Heart) + P (King) - P (Heart and king)
13/52 + 4/52 - 1/52 = 16/52 = 4/13 answer
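
Both forms of the addition theorem can be verified by listing all 52 cards and counting. The sketch below is only a check on the arithmetic; the names DECK, prob, is_king and so on are illustrative.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def prob(event):
    """Probability that one card drawn at random satisfies `event`."""
    favourable = sum(1 for card in DECK if event(card))
    return Fraction(favourable, len(DECK))


def is_king(card):
    return card[0] == "K"


def is_queen(card):
    return card[0] == "Q"


def is_heart(card):
    return card[1] == "hearts"


# Mutually exclusive events: P(A or B) = P(A) + P(B)
p_king_or_queen = prob(lambda c: is_king(c) or is_queen(c))
print(p_king_or_queen, prob(is_king) + prob(is_queen))  # 2/13 2/13

# Not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B)
p_heart_or_king = prob(lambda c: is_heart(c) or is_king(c))
p_heart_and_king = prob(lambda c: is_heart(c) and is_king(c))
print(p_heart_or_king, prob(is_heart) + prob(is_king) - p_heart_and_king)  # 4/13 4/13
```

Counting gives 16 favourable cards for "heart or king" (13 hearts plus 3 other kings), i.e. 16/52 = 4/13.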
Multiplication theorem- This theorem states that if two events A and B are independent,
the probability that both will occur is equal to the product of their individual
probabilities. If A and B are independent then
          P (A and B) = P (A) * P (B)
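
The multiplication theorem can be illustrated with two tosses of an unbiased coin, an example assumed here and not taken from the notes. Listing the four equally likely outcomes shows that P(head on both tosses) equals the product of the two individual probabilities.

```python
from fractions import Fraction
from itertools import product

# All outcomes of two tosses: (first toss, second toss)
SAMPLE_SPACE = list(product("HT", repeat=2))


def prob(event):
    """Probability of `event` when all outcomes are equally likely."""
    favourable = sum(1 for outcome in SAMPLE_SPACE if event(outcome))
    return Fraction(favourable, len(SAMPLE_SPACE))


p_first_head = prob(lambda o: o[0] == "H")
p_second_head = prob(lambda o: o[1] == "H")
p_both_heads = prob(lambda o: o == ("H", "H"))

print(p_both_heads, p_first_head * p_second_head)  # 1/4 1/4
```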




Conditional probability




Two events A and B are said to be dependent when B can occur only when A is known
to have occurred (or vice versa). The probability attached to such an event is called the
conditional probability, written P (B | A). For dependent events the multiplication theorem becomes
          P (A and B) = P (A) * P (B | A)
Example- Find the probability of drawing a queen, a king and a knave, in that order, from
a pack of cards in three consecutive draws, the cards drawn not being replaced.
Solution- The probability of drawing a queen = 4/52
The probability of drawing a king after a queen has been drawn = 4/51
The probability of drawing a knave after a queen and a king have been drawn = 4/50
Since the events are dependent, the required probability of the compound event is
               4/52 * 4/51 * 4/50 = 64/132600 = 0.00048
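
The same calculation can be sketched in Python with exact fractions. The function name draw_sequence_probability is hypothetical. It assumes the cards wanted at each draw are of different ranks, so earlier draws reduce only the pack size and not the number of favourable cards.

```python
from fractions import Fraction


def draw_sequence_probability(favourable_counts, pack_size=52):
    """Probability of a given sequence of draws without replacement.

    favourable_counts[k] is the number of favourable cards still in the
    pack at draw k (here 4 queens, then 4 kings, then 4 knaves).
    """
    probability = Fraction(1)
    remaining = pack_size
    for favourable in favourable_counts:
        probability *= Fraction(favourable, remaining)  # conditional probability of this draw
        remaining -= 1  # one card fewer in the pack after each draw
    return probability


p = draw_sequence_probability([4, 4, 4])
print(p)  # 8/16575
print(round(float(p), 5))  # 0.00048
```

Here 8/16575 is 64/132600 in lowest terms.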


       THEORETICAL DISTRIBUTION


        Probability distributions are used for both discrete and continuous series.
The distributions discussed here are the binomial distribution, the Poisson distribution and
the normal distribution.

BINOMIAL DISTRIBUTION- The binomial distribution, also known as the Bernoulli distribution,
is associated with the name of the Swiss mathematician James Bernoulli. It
is a probability distribution expressing the probability of one set of
dichotomous alternatives, i.e. success or failure. The assumptions are
    • An experiment is performed under the same conditions for a fixed number of
        trials, say n.
    • In each trial there are two possible outcomes of the experiment. For lack of a
        better nomenclature they are called success and failure.
    • The probability of a success, denoted by p, remains constant from trial to trial. The
        probability of a failure, denoted by q, is equal to (1-p). If the probability of success
        is not the same in each trial we will not have a binomial distribution.
    • The trials are statistically independent.
    The binomial distribution-

        P(r) = nCr * q^(n-r) * p^r

p = probability of success in a single trial
q = (1-p)
n = number of trials
r = number of successes in n trials
nCr = number of ways of choosing r trials out of n = n! / (r! (n-r)!)
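
A minimal sketch of the binomial formula in Python is given below. The example of 4 tosses of an unbiased coin is assumed for illustration and does not come from the notes.

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(r) = nCr * q^(n-r) * p^r, the probability of r successes in n trials."""
    if not 0 <= r <= n:
        raise ValueError("r must lie between 0 and n")
    q = 1 - p  # probability of failure in a single trial
    return comb(n, r) * q ** (n - r) * p ** r


# Probability of exactly 2 heads in 4 tosses of an unbiased coin
print(binomial_pmf(2, 4, 0.5))  # 0.375

# The probabilities of r = 0, 1, ..., n successes add up to 1
total = sum(binomial_pmf(r, 4, 0.5) for r in range(5))
```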



POISSON DISTRIBUTION-
               The Poisson distribution is a discrete distribution used in statistical
work to describe the behavior of rare events. It is expected in
cases where the chance of any individual event being a success is very small, such as the
number of accidents on a road or printing mistakes on a page.
The Poisson distribution-

        P(r) = (e^(-m) * m^r) / r!

m = mean (average number of occurrences)
r = number of occurrences (0, 1, 2, ...)
e = 2.7183 (base of natural logarithms)
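
The Poisson formula can be sketched in Python as follows. The mean of 2 mistakes per page is a made-up figure used only to show the calculation.

```python
from math import exp, factorial


def poisson_pmf(r, m):
    """P(r) = e^(-m) * m^r / r!, the probability of exactly r occurrences."""
    if r < 0 or m <= 0:
        raise ValueError("r must be >= 0 and the mean m must be positive")
    return exp(-m) * m ** r / factorial(r)


mean_mistakes = 2  # hypothetical average number of printing mistakes per page

p_no_mistake = poisson_pmf(0, mean_mistakes)   # equals e^(-2)
p_one_mistake = poisson_pmf(1, mean_mistakes)  # equals 2 * e^(-2)
print(round(p_no_mistake, 4), round(p_one_mistake, 4))
```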




Normal distribution- The normal distribution is used for continuous series. In this distribution
the value of X is converted into a standard value Z, where

       Z = (X - mean) / S.D.

Properties of normal distribution-
   1. The normal distribution is bell shaped.
   2. It is perfectly symmetrical about the mean.
   3. It is unimodal, i.e. it has only one mode.
   4. Mean, median and mode are equal in a normal distribution.
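
The Z transformation can be sketched in Python as below. The figures (mean 60, S.D. 5, X = 70) are invented for illustration. The area under the standard normal curve is taken from NormalDist in the Python standard library (Python 3.8 or later) instead of a printed table.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Z = (X - mean) / S.D."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


z = z_score(70, 60, 5)
print(z)  # 2.0

# Area under the standard normal curve to the left of Z
area_left = NormalDist().cdf(z)
print(round(area_left, 4))  # 0.9772

# Symmetry about the mean: half of the area lies on each side of Z = 0
area_at_mean = NormalDist().cdf(0)
```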



      MEASUREMENT OF DISPERSION

Dispersion;-

               Dispersion is the measure of the variation of the items.

The concept of dispersion is related to the extent of variability in observations. The
variability of an observation is often measured as its deviation from a central value. A
suitable average of all such deviations is called a measure of dispersion. A measure of
dispersion enables two or more series to be compared with regard to their
variability.

Significance of Dispersion:-


   1. To determine the reliability of an average;-
   Dispersion is used to judge how reliable an average is. A low variation or
   dispersion shows that the average is more reliable and the data more consistent.

   2. To serve as a basis for the control of variability: - Dispersion serves as a basis for
      controlling variability. Once the variation has been measured, and if it is found to
      be high, controlling tools can be used.
   3. Useful in quality control;- Measures of variation show whether the
      products are up to grade or not. If the variation is high, the production
      technique can be rectified.
   4. To facilitate the use of other statistical tools;- Many powerful analytical tools in
      statistics, such as correlation analysis, the testing of hypotheses, analysis of
      variance and regression analysis, are based on the measurement of variation.



Properties of Good Measure of Variation

   1.   It should be simple to understand.
   2.   It should be easy to compute.
   3.   It should be rigidly defined.
   4.   It should be based on each and every item of the distribution.
   5.   It should have sampling stability.
   6.   It should not be affected by the extreme items.

Methods of measuring dispersion;-

   1.   Range
   2.   Quartile deviation
   3.   Mean deviation
   4.   standard deviation

   Range;-

   Range is the simplest method of studying dispersion. It is the difference between the
   value of the largest item and the value of the smallest item included in the distribution.


   Range = L- S
             L = largest value
             S= Smallest value


 Coefficient of range = (L - S) / (L + S)
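
A short Python sketch of the range and its coefficient follows. The five marks are hypothetical data.

```python
def value_range(values):
    """Range = L - S (largest value minus smallest value)."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of range = (L - S) / (L + S)."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


marks = [20, 35, 50, 65, 80]  # hypothetical data
print(value_range(marks))           # 60
print(coefficient_of_range(marks))  # 0.6
```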




 Quartile deviation

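             It represents half the difference between the third quartile and the first quartile.

             Q.D. = (Q3 - Q1) / 2



 Coefficient of Quartile deviation;-

 = (Q3 - Q1) / (Q3 + Q1)

 Q3 = size of the 3N/4 th item

 Q3 = L + ((3N/4 - c.f.) / F) * H

 Q1 = size of the N/4 th item

 Q1 = L + ((N/4 - c.f.) / F) * H

 L = lower limit of the quartile class
 C.F. = cumulative frequency of the class preceding the quartile class
 F = frequency of the quartile class
 H = class interval (width) of the quartile class
 N = total frequency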
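
The quartile formulas for a continuous series can be sketched as below. The frequency table is invented for illustration, the classes are assumed to be continuous (exclusive) and sorted, and the function name grouped_quantile is hypothetical.

```python
def grouped_quantile(classes, fraction):
    """Quantile of a grouped series: L + ((fraction * N - c.f.) / F) * H.

    classes  : list of (lower_limit, upper_limit, frequency), in ascending order
    fraction : 0.25 for Q1, 0.75 for Q3 (0.5 would give the median)
    """
    n = sum(freq for _, _, freq in classes)
    position = fraction * n  # N/4 for Q1, 3N/4 for Q3
    cumulative = 0           # c.f. of the classes before the current one
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            width = upper - lower  # H
            return lower + (position - cumulative) / freq * width
        cumulative += freq
    raise ValueError("fraction must lie between 0 and 1")


# Hypothetical frequency distribution, N = 40
table = [(0, 10, 5), (10, 20, 10), (20, 30, 10), (30, 40, 10), (40, 50, 5)]

q1 = grouped_quantile(table, 0.25)
q3 = grouped_quantile(table, 0.75)
quartile_deviation = (q3 - q1) / 2
coefficient_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, quartile_deviation, coefficient_qd)  # 15.0 35.0 10.0 0.4
```

In this table N/4 = 10 falls in the class 10-20 (c.f. = 5, F = 10), so Q1 = 10 + ((10 - 5) / 10) * 10 = 15, and 3N/4 = 30 falls in the class 30-40 (c.f. = 25), giving Q3 = 35.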

 Mean Deviation


 M.D. = ∑ |D| / N

 ∑ |D| = sum of the deviations of the items from the average (mean or median), ignoring signs
 N = number of items

  Mean deviation in discrete series.

 M.D. = ∑ F|D| / N

 F = frequencies
 N = ∑ F

 Coefficient of mean deviation; -       M.D. / Median (or M.D. / Mean when the deviations are taken from the mean)
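
A minimal Python sketch of the mean deviation for individual and discrete series is given below. Deviations are taken from the median here, which is an assumption made to match the coefficient M.D. / Median. The data are hypothetical.

```python
from statistics import median


def mean_deviation(values, center):
    """M.D. = sum of |D| / N, where D = value - center (signs ignored)."""
    return sum(abs(x - center) for x in values) / len(values)


def mean_deviation_discrete(values, freqs, center):
    """M.D. = sum of F|D| / N for a discrete series, with N = sum of F."""
    n = sum(freqs)
    return sum(f * abs(x - center) for x, f in zip(values, freqs)) / n


data = [2, 4, 6, 8, 10]            # hypothetical individual series
med = median(data)                 # 6
md = mean_deviation(data, med)     # (4 + 2 + 0 + 2 + 4) / 5
coefficient_md = md / med          # M.D. / Median
print(md, round(coefficient_md, 2))

# Hypothetical discrete series: values 1, 2, 3 with frequencies 1, 2, 1
md_discrete = mean_deviation_discrete([1, 2, 3], [1, 2, 1], 2)
print(md_discrete)  # 0.5
```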




 Standard Deviation;-

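    Standard deviation measures absolute dispersion or variability of the series.

 S.D. = sqrt( ∑D^2 / N - (∑D / N)^2 ) * i

 D = (X - A) / i, the deviation of an item from the assumed mean A in units of i
 i = common factor (class interval)
 N = number of items




 Standard deviation in continuous series;-

 S.D. = sqrt( ∑FD^2 / N - (∑FD / N)^2 ) * i

 D = (m - A) / i, where m is the mid-point of a class
 F = frequency of the class
 N = ∑ F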



   Standard deviation is denoted by σ (sigma).

   Variance = (S.D.)^2

   Coefficient of standard deviation = S.D. / Mean

   Coefficient of variation = (S.D. / Mean) * 100
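
The step-deviation method for a continuous series can be sketched as below, taking deviations from an assumed mean A in units of the class interval i. The frequency table and the function names are hypothetical, and the result is checked against the direct definition of the standard deviation.

```python
from math import sqrt


def sd_step_deviation(midpoints, freqs, assumed_mean, i):
    """S.D. = sqrt( sum(F*D^2)/N - (sum(F*D)/N)^2 ) * i, with D = (m - A) / i."""
    n = sum(freqs)
    d = [(m - assumed_mean) / i for m in midpoints]
    sum_fd = sum(f * x for f, x in zip(freqs, d))
    sum_fd2 = sum(f * x * x for f, x in zip(freqs, d))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * i


def sd_direct(midpoints, freqs):
    """Square root of the mean squared deviation from the actual mean."""
    n = sum(freqs)
    mean = sum(f * m for f, m in zip(freqs, midpoints)) / n
    return sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freqs, midpoints)) / n)


# Hypothetical classes 0-10, 10-20, ..., 40-50 represented by their mid-points
mid = [5, 15, 25, 35, 45]
freq = [5, 10, 10, 10, 5]

sd = sd_step_deviation(mid, freq, assumed_mean=25, i=10)
variance = sd ** 2
mean = sum(f * m for f, m in zip(freq, mid)) / sum(freq)
coefficient_of_variation = sd / mean * 100

print(round(sd, 3), round(variance, 1), round(coefficient_of_variation, 2))
```

Both routes give the same standard deviation; the step-deviation form only reduces the arithmetic when working by hand.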

Skewness, moments and kurtosis


   Skewness;-
             When a series is not symmetrical it is said to be skewed.

   Absolute measure of skewness; - Mean - Mode

   Relative measures of skewness;-

   1. Karl Pearson's coefficient of skewness;-

              SKp = (Mean - Mode) / Std. deviation


   2. Bowley's coefficient of skewness;-

              SKb = (Q3 + Q1 - 2 median) / (Q3 - Q1)


   3. Kelly's coefficient of skewness;-

              SKk = (P10 + P90 - 2 median) / (P90 - P10)

                  = (D1 + D9 - 2 median) / (D9 - D1)



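   P10 = size of the 10N/100 th item

   P10 = L + ((10N/100 - c.f.) / F) * H

   P90 = size of the 90N/100 th item

 D1 = size of the N/10 th item (D1 = P10)
 D9 = size of the 9N/10 th item (D9 = P90)

All of these are calculated in the same way as the median.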
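
The three coefficients of skewness can be sketched in Python as below, using their standard forms: Karl Pearson (Mean - Mode) / S.D., Bowley (Q3 + Q1 - 2 Median) / (Q3 - Q1) and Kelly (P90 + P10 - 2 Median) / (P90 - P10). All input figures are made up for illustration.

```python
def pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient: (Mean - Mode) / S.D."""
    return (mean - mode) / sd


def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


def kelly_skewness(p10, median, p90):
    """Kelly's coefficient: (P90 + P10 - 2 Median) / (P90 - P10)."""
    return (p90 + p10 - 2 * median) / (p90 - p10)


# Hypothetical summary figures
print(pearson_skewness(mean=50, mode=45, sd=10))   # 0.5, positively skewed
print(bowley_skewness(q1=15, median=25, q3=35))    # 0.0, symmetrical
print(kelly_skewness(p10=10, median=30, p90=40))   # negative, negatively skewed
```

A positive coefficient indicates a longer tail to the right, a negative one a longer tail to the left, and zero a symmetrical series.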

                      Moments


        Moments describe a distribution in terms of the deviations of its items from the mean.
The first moment is the sum of the deviations of the items of a
series from the mean of the series, divided by the total number of items in the distribution. In
other words, it is the average deviation of the items from the mean, which is always zero when
signs are taken into account. In general, the arithmetic means
of the various powers of the deviations of the items in a distribution are called the
moments of the distribution.


Moments are denoted by u (mu)

u1 = ∑ (X - mean) / n
u2 = ∑ (X - mean)^2 / n
u3 = ∑ (X - mean)^3 / n
u4 = ∑ (X - mean)^4 / n

For frequency distribution;-

u1 = ∑ F (X - mean) / N
u2 = ∑ F (X - mean)^2 / N
u3 = ∑ F (X - mean)^3 / N
u4 = ∑ F (X - mean)^4 / N

N = ∑ F

The first moment about the origin is the mean.
The second moment about the mean is the variance.
The third moment about the mean is used to measure skewness.
The fourth moment about the mean is used to measure kurtosis.
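
The moments about the mean can be sketched in Python as below. The series is hypothetical and the function name central_moment is chosen for illustration. Frequencies are optional, so the same function covers individual series and frequency distributions.

```python
def central_moment(values, r, freqs=None):
    """r-th moment about the mean: sum of F * (X - mean)^r / N, with N = sum of F."""
    if freqs is None:
        freqs = [1] * len(values)  # individual series: every item counted once
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n


series = [2, 4, 6, 8, 10]  # hypothetical data, mean = 6

u1 = central_moment(series, 1)  # always zero
u2 = central_moment(series, 2)  # variance
u3 = central_moment(series, 3)  # zero for a symmetrical series
u4 = central_moment(series, 4)
print(u1, u2, u3, round(u4, 1))  # 0.0 8.0 0.0 108.8
```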

                         Kurtosis

Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal
distribution.
If a curve is more peaked than the normal curve it is called leptokurtic. If a curve is more flat topped
than the normal curve it is called platykurtic. The normal curve is called mesokurtic.

β2 = u4 / u2^2

β2 = measure of kurtosis (β2 = 3 for the normal curve, more than 3 for a leptokurtic curve
and less than 3 for a platykurtic curve)

u4, u2 = fourth moment and second moment about the mean
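
The kurtosis measure can be sketched in Python as below, computing the fourth moment divided by the square of the second moment and comparing it with 3, the value for the normal curve. The data and the function names are hypothetical.

```python
def moment_about_mean(values, r):
    """r-th moment about the mean for an individual series."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** r for x in values) / len(values)


def kurtosis_beta2(values):
    """Fourth moment about the mean divided by the square of the second moment."""
    return moment_about_mean(values, 4) / moment_about_mean(values, 2) ** 2


def classify_kurtosis(beta2, tolerance=1e-9):
    """Compare with 3, the value of this measure for the normal curve."""
    if abs(beta2 - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"


series = [2, 4, 6, 8, 10]  # hypothetical data
beta2 = kurtosis_beta2(series)
kind = classify_kurtosis(beta2)
print(round(beta2, 2), kind)  # 1.7 platykurtic
```

For this series u2 = 8 and u4 = 108.8, so the measure is 108.8 / 64 = 1.7, which is flatter than the normal curve.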





Important Questions other than above notes

   1. Define the various methods of data collection. Explain with example.
   2. Explain the term probability and its theorems.
   3. Explain the following
           a. Action space
           b. Bayesian rule
           c. Games theory
           d. Expected pay off table
           e. Null hypothesis and alternate hypothesis
           f. One tail and two tail test
   4. What is a sampling distribution? What purpose does it serve?
   5. How does the size of population and the kind of random sampling determine the
       shape of a sampling distribution?
   6. Ch No 10
   7. Explain the following with the help of examples
           a. Z- test
           b. T-test
           c. χ2 Test
   8. What is the chi-square test of goodness of fit? What precautions are necessary in
       using this test?
   9. Briefly discuss the advantages and disadvantages of non-parametric methods as
       compared with parametric methods in statistics.
   10. What are index numbers? What purpose do they serve? Discuss the various
       problems faced in the construction of index numbers.
   11. Explain the tests used in index numbers.




DOCUMENT INFO
posted:4/12/2012
language:English
pages:30
Description: Notes for MBA 2nd Sem of GJU University Hissar Haryana In India