PHD

					       STATISTICS-INTRODUCTION

• Role of Statistics in Managerial Decisions
• Nature of Data: Population Data, Sample Data
• Frequency Distribution




You use statistics daily without even realizing it!!!


Examples?

Statistics is used to help determine:
Which products you should sell (Demand Stats)

How much you pay for insurance (Mortality Stats)

Whether drugs are approved for use (Drug trials)

Which cars you buy (Reliability ratings, crash tests)

Which products are on your grocery shelf (focus groups),

and where they are located (Big Bazaar & Snacks Shop are right next
to each other… what a concept!!!)

What politicians claim as their "firm beliefs" (opinion polls)

Favorites to win in sports

Whether it will rain

And so on.

  Statistics: Definition
Many people think of statistics as large amounts of numerical data, e.g.
share prices, GDP statistics, runs scored by Sachin, etc.

Definition: Statistics refers to the range of
techniques and procedures for collecting data,
summarizing data, classifying data, analyzing
data, interpreting data, displaying data and
making decisions based on data.

Definition: By statistics, we mean aggregates of
facts, affected to a marked extent by a
multiplicity of causes, numerically expressed,
enumerated or estimated according to
reasonable standards of accuracy, collected in a
systematic manner for a predetermined purpose
and placed in relation to each other.

   Characteristics of Statistics

Statistics are aggregates of facts

Statistics are affected to a marked extent by a multiplicity of causes

Statistics are numerically expressed

Statistics are enumerated or estimated according to reasonable standards of accuracy

Statistics should be collected in a systematic manner for a predetermined purpose

Statistics should be placed in relation to each other

    Why Study Statistics
It presents facts in definite & clear terms.

It gives a concise shape to a mass of figures and develops meaning from
the data.

It helps to compare two sets of figures.

It helps in formulating & testing hypotheses.

It helps in understanding & predicting future events from past &
current data.

It helps in formulating suitable policies.

It helps in understanding complex phenomena.


Statistics are widely used in business, and their use continues to increase as the
business world becomes larger, more complex, and more quantitative.

   Limitations of Statistics
Statistics does not study individual observations; it is concerned only with
groups of observations.

Statistics deals with quantitative characteristics. It does not deal with
qualitative characteristics such as beauty, honesty, sharpness, brightness,
poverty, intelligence, etc.

Statistical laws are true only on average.

Statistics does not reveal the entire story.

Statistics is only one of the methods of studying a problem.

Statistics can be misused.

Statistical data should be uniform & homogeneous.

Decision Making - Businesses
• Accounting
   Public accounting firms use statistical
   sampling procedures when conducting
   audits for their clients.

• Economics
   Economists use statistical information
   in making forecasts about the future of
   the economy or some aspect of it.

• Marketing
   Electronic point-of-sale scanners at
   retail checkout counters are used to
   collect data for a variety of marketing
   research applications.

• Production
   A variety of statistical quality
   control charts are used to monitor
   the output of a production process.

• Finance
   Financial advisors use price-earnings ratios and
   dividend yields to guide their investment
   recommendations.

Uses & Abuses of Statistics
Most of the time, samples are used to infer something (draw conclusions)
about the population. However, occasionally the conclusions are inaccurate
or inaccurately portrayed, for the following reasons:

The sample is too small or unrepresentative. Even a large sample may not represent the population.

Unauthorized or unqualified people give wrong information that the public
takes as truth. One possibility is a company sponsoring statistical research
to prove that it is better than its competitors.

Visual aids may be correct but emphasize different aspects. Specific
examples include graphs whose axes don't start at zero, thus exaggerating small
differences, and charts that misuse area to represent proportions.

Overly precise statistics or parameters may incorrectly convey a sense of high
accuracy.

Misleading, unclear or incomplete information may be shared.

Misleading Statistical Presentation

These two graphs represent sales… who has seen faster sales growth?

[Figure: two line charts of sales plotted against 1 to 31 on the horizontal axis; the left chart's vertical axis runs from 0 to 16,000, the right chart's from 10,000 to 14,000]

These are actually the same numbers with different scales along the side.

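The effect can be put into numbers with a small sketch: the share of the chart's height that a line climbs depends on the axis range as well as on the data. The sales values 11,000 and 13,000 below are hypothetical, chosen only because they fall inside both axis ranges of the figure (0 to 16,000 and 10,000 to 14,000); `apparent_rise` is an illustrative helper, not a library function.

```python
def apparent_rise(start: float, end: float, axis_min: float, axis_max: float) -> float:
    """Fraction of the plot height covered when a line moves from start to end."""
    return (end - start) / (axis_max - axis_min)

# Hypothetical sales figures: the same rise of 2,000 in both charts.
start, end = 11_000, 13_000

full_axis = apparent_rise(start, end, 0, 16_000)       # vertical axis starts at zero
truncated = apparent_rise(start, end, 10_000, 14_000)  # vertical axis starts at 10,000

print(f"Axis 0-16,000:      line climbs {full_axis:.1%} of the chart height")
print(f"Axis 10,000-14,000: line climbs {truncated:.1%} of the chart height")
print(f"Apparent exaggeration: {truncated / full_axis:.0f}x")
```

The identical rise covers one eighth of the height on the zero-based axis but half of it on the truncated axis, so it looks four times steeper.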
  Branches of Statistics
The academic discipline of statistics can be divided into
two major branches:

   – Descriptive statistics

   – Inferential statistics

  Descriptive Statistics
Deals with summarizing and presenting data in a readable, easily understood
form.

It consists of the tabular, graphical and numerical methods used to summarize data.


Techniques:

• Visualizing and Summarizing Data: Raw Data, Data Array, Distribution
• Characterizing Distributions with Numerical and Graphical Tools: Histogram,
 Ogive, Measures of Central Tendency: mean, median, mode; Measures of
 Dispersion: range, standard deviation, variance, etc.

• Exploring the Relationship between Two Variables: Scatter Diagrams,
 Correlation Coefficients, Frequency Tables

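As a minimal sketch of the numerical measures listed above, the snippet below applies Python's standard `statistics` module to the 30 marks that appear later in Numerical 1. Treating those marks as a complete population (hence `pvariance`/`pstdev` rather than the sample versions `variance`/`stdev`) is an assumption made for illustration; `describe` is a hypothetical helper name.

```python
import statistics

# The 30 marks from Numerical 1, as given in the raw data.
marks = [14, 26, 2, 34, 8, 13, 27, 37, 9, 12,
         39, 42, 45, 30, 32, 24, 24, 30, 20, 23,
         14, 18, 30, 33, 24, 34, 30, 10, 22, 14]

def describe(data):
    """Return common measures of central tendency and dispersion."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),           # most frequent value
        "range": max(data) - min(data),
        "variance": statistics.pvariance(data),  # population variance
        "std_dev": statistics.pstdev(data),      # population standard deviation
    }

for name, value in describe(marks).items():
    print(f"{name:>8}: {value:.2f}")
```

For these marks the mean and the median are both 24, the mode is 30 (it occurs four times) and the range is 45 - 2 = 43.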
   Inferential Statistics
Drawing conclusions about a population based on information from a sample.

Statistical inference is the process of using information obtained from
analyzing a sample to make estimates about characteristics of the entire
population. It is a discipline that allows us to estimate unknown quantities
from measurements made on a sample.

Using these estimates, we can then make predictions and forecast the future.

• Statistical Inference with Hypothesis Testing: null and alternative
  hypotheses, one-tailed vs. two-tailed tests, test statistics, p-value, statistical
  significance, decision rules
• The Concept of Risk and Power: risks involved, type I and II errors,
  confidence level and power of test
• Statistical Inference with Confidence Intervals: how it works, when to use it
• Equivalence of the Hypothesis Testing and the Confidence Interval
  Approaches
• Statistical Inference for a Single Sample or Group: Hypothesis Testing vs.
  Confidence Interval Approach

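To make the confidence-interval idea concrete, here is a minimal sketch of a large-sample (normal-approximation) confidence interval for a population mean, using the standard library's `statistics.NormalDist`. Treating the 40 mileage readings of Numerical 4 as a random sample is an assumption for illustration, and `mean_confidence_interval` is a hypothetical helper; for small samples a t critical value would normally replace the z value used here.

```python
import math
import statistics

def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population mean."""
    n = len(sample)
    xbar = statistics.mean(sample)          # point estimate: the sample mean
    s = statistics.stdev(sample)            # sample standard deviation (n - 1 divisor)
    z = statistics.NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    margin = z * s / math.sqrt(n)           # z times the standard error of the mean
    return xbar - margin, xbar + margin

# Km per litre of 40 motorcycles (data from Numerical 4), treated as a sample.
kmpl = [40.5, 39.7, 40.6, 39.9, 40.9, 38.9, 41.4, 40.5, 41.0, 38.8, 39.6,
        40.4, 39.9, 40.2, 40.8, 40.7, 40.6, 41.7, 40.8, 39.1, 40.1, 40.7,
        40.1, 40.7, 40.7, 39.8, 39.3, 39.6, 40.5, 41.3, 41.0, 39.9, 40.4,
        40.9, 40.1, 41.2, 40.2, 40.0, 39.4, 40.6]

low, high = mean_confidence_interval(kmpl)
print(f"95% CI for the mean: {low:.2f} to {high:.2f} km/l")
```

A higher confidence level widens the interval; a larger sample narrows it through the square root of n.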
Population & Sample

[Figure: diagram labelled "Population" and "Sample", showing the sample as a part of the population]

Population: The complete set of data elements is termed the population. It is the
set of all items in a particular study.

Sample: A sample is a portion of a population selected for further analysis. It is
a subset of the population.

Parameter: A parameter is a characteristic of the whole population (e.g. the population mean).

Statistic: A statistic is a measurable characteristic of a sample (e.g. the sample mean).

Remember: Parameter is to Population as Statistic is to Sample.

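A small sketch of the parameter/statistic distinction: the 30 marks of Numerical 1 are treated here as the whole population (an assumption for illustration), and a simple random sample of 10 is drawn with `random.Random.sample`. The seed 42 is arbitrary; a different seed gives a different sample and hence a different statistic, while the parameter stays fixed.

```python
import random
import statistics

# Treat the 30 marks of Numerical 1 as the entire population (illustrative assumption).
population = [14, 26, 2, 34, 8, 13, 27, 37, 9, 12,
              39, 42, 45, 30, 32, 24, 24, 30, 20, 23,
              14, 18, 30, 33, 24, 34, 30, 10, 22, 14]

# Parameter: computed from every member of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a simple random sample drawn without replacement.
rng = random.Random(42)                # arbitrary seed, for reproducibility
sample = rng.sample(population, 10)
sample_mean = statistics.mean(sample)

print(f"Parameter (population mean): {population_mean}")
print(f"Statistic (sample mean):     {sample_mean}")
```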
Why Sample?


Less time-consuming than a census

Less costly to administer than a
census

More practical to administer than
a census of the targeted
population

Example of a sample survey:
       Opinion polls

Data

   – Data are the facts and figures that are collected,
     summarized, analyzed, and interpreted. A collection of data
     is called a 'data set' and a single observation is called a 'data
     element'.

   – Data can be further classified as being qualitative
     (Attribute) or quantitative (Variable).

   – Variables (e.g. weight, height) are of two types: Continuous &
     Discrete
   • A continuous variable can take any value
     within a given interval, e.g. weight: 50.0, 50.2, 50.5, 51.0,
     etc.
   • A discrete variable can take only isolated values,
     e.g. the number of patients visiting a doctor: 50, 51, etc.

   – Attribute: honesty, integrity, etc.

                        Data Types

Data
   • Numerical (Quantitative)
        – Discrete
        – Continuous
   • Categorical (Qualitative)

Primary Data


   Data can be classified as Primary Data or Secondary Data.


   Primary data are those collected for a specific
   purpose directly from the field, and hence are original in nature.
   They are collected by or on behalf of the person or persons who
   are going to use the data. Once the data have been
   collected, processed & published, they become secondary
   data for subsequent use by other people for other
   applications in different contexts.

   Methods for Primary Data Collection
     • Direct Personal Interview
     • Observation
     • Indirect Oral Interviews
     • Information from Agents/Correspondents
     • Mailed Questionnaire Method

Secondary Data


   Secondary data are numerical information that has
   already been collected by some agency for a specific purpose and
   is subsequently compiled from that source for application
   in different contexts.

   There are many advantages of using secondary data:
     • It is inexpensive
     • Large quantities of data are available from a wide range of sources
     • The data may be available for many years, and
       hence we can understand trends and may forecast
       future values

Data Sources

   • Primary: Data Collection
        – Observation
        – Survey
        – Experimentation
   • Secondary: Data Compilation
        – Print or Electronic

Descriptive Statistics




Data Processing Techniques
• Raw Data

• Data Array

• Discrete Frequency Distribution

• Continuous Frequency Distribution

Raw Data & Data Array

Raw Data:
• Information before it is arranged & analysed is raw data. It is
called raw because it has not been processed by any statistical method.


• Example: the raw data in Numerical 1

Data Array:
• It involves arranging the values in either ascending or descending
order.


• Example: the data array in the solution to Numerical 1

Numerical 1 – Data Array
Raw Data

 14     26     2      34   8    13   27   37   9    12
 39     42    45      30   32   24   24   30   20   23
 14     18    30      33   24   34   30   10   22   14



Prepare a data array.

Numerical 1 – Solution
Data Array.

 2      8     9    10   12   13   14   14   14   18
 20    22     23   24   24   24   26   27   30   30
 30    30     32   33   34   34   37   39   42   45
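The data array is simply the raw data sorted. A minimal sketch using Python's built-in `sorted` (the variable names are illustrative):

```python
# Raw data from Numerical 1.
raw = [14, 26, 2, 34, 8, 13, 27, 37, 9, 12,
       39, 42, 45, 30, 32, 24, 24, 30, 20, 23,
       14, 18, 30, 33, 24, 34, 30, 10, 22, 14]

ascending = sorted(raw)                 # data array in ascending order
descending = sorted(raw, reverse=True)  # the same array in descending order

# Print in rows of 10 to match the layout of the solution above.
for i in range(0, len(ascending), 10):
    print(" ".join(f"{x:>3}" for x in ascending[i:i + 10]))
```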




Discrete Distribution
• In a discrete frequency distribution, after arranging the values
in ascending order, we count the frequency, i.e. the number of times
each value appears in the data set, by using tally marks.

• A discrete distribution is also known as an ungrouped frequency distribution (FD).


• Numerical

Numerical 2 - Discrete FD
Marks   Tally Marks   Frequency   |   Marks   Tally Marks   Frequency
  2                       1       |     24                      3
  8                       1       |     26                      1
  9                       1       |     27                      1
 10                       1       |     30                      4
 12                       1       |     32                      1
 13                       1       |     33                      1
 14                       3       |     34                      2
 18                       1       |     37                      1
 20                       1       |     39                      1
 22                       1       |     42                      1
 23                       1       |     45                      1
Total frequency = 13 + 17 = 30

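Counting the frequency of each distinct value can be sketched with `collections.Counter`; the tally column is printed as one stroke per occurrence. This is an illustration of the counting step, not a prescribed procedure.

```python
from collections import Counter

# Raw data from Numerical 1.
raw = [14, 26, 2, 34, 8, 13, 27, 37, 9, 12,
       39, 42, 45, 30, 32, 24, 24, 30, 20, 23,
       14, 18, 30, 33, 24, 34, 30, 10, 22, 14]

# Count how many times each distinct value occurs.
freq = Counter(raw)

print(f"{'Marks':>5}  {'Tally':<6} {'Frequency':>9}")
for value in sorted(freq):
    tally = "|" * freq[value]           # one stroke per occurrence
    print(f"{value:>5}  {tally:<6} {freq[value]:>9}")
print(f"Total frequency: {sum(freq.values())}")
```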
Continuous Frequency Distribution
• In this, all the values are classified into groups or classes; hence
this type of distribution is known as a grouped or continuous
frequency distribution.

   • Class Limits
   • Class Interval
   • Class Frequency
   • Class Mid Point or Class Mark

Class Limits

• The two boundaries of a class are known as its class limits. The
  class limits are the lowest and the highest values that can be
  included in the class.

• e.g. 10-20: in this class, 10 is the lower limit and 20 is the
  upper limit.

• The lower limit of a class is the value below which no
  observation can be included in the class.

• The upper limit of a class is the value above which no
  observation can be included in the class.

Class Interval
• The difference between the upper limit and the lower limit of a
  class is known as the class interval (CI) or class width of that class,
  e.g. the class 10-20 has a CI of 10.

• If the number of classes is not given for the classification,
  it can be determined by
  (A) Sturges' formula:

      No of Classes (K) = 1 + 3.322 log N   (log to base 10)

  (B) taking K as the smallest exponent such that 2^K is greater than
      or equal to N,

  where N is the total number of observations.

• Note: Normally, the number of classes should be between 5 & 15.

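Both rules for the number of classes are easy to compute. In this sketch, rounding Sturges' result to the nearest whole number is an assumption (it matches the worked example in Numerical 3, where 5.9 becomes 6; some texts round up instead), and the function names are illustrative.

```python
import math

def sturges_classes(n: int) -> int:
    """Rule (A): K = 1 + 3.322 log10(N), rounded to the nearest whole number."""
    return round(1 + 3.322 * math.log10(n))

def power_of_two_classes(n: int) -> int:
    """Rule (B): the smallest K such that 2**K >= N."""
    k = 0
    while 2 ** k < n:
        k += 1
    return k

for n in (30, 40, 50):
    print(f"N = {n}: Sturges K = {sturges_classes(n)}, 2^K rule K = {power_of_two_classes(n)}")
```

For N = 30 the two rules disagree (6 vs. 5), which is why the deck notes both and then picks one.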
    Formula for the Class Interval:

    Class Interval (i) = (Next unit value after the largest value in
        the data - Smallest value in the data)/No of Classes

    e.g. If the marks of 30 students range between 10 & 40 and we
         want to divide them into 3 classes, then

Class Interval (i) = (41-10)/3 = 10.33, rounded up to 11

The classes become 10-21, 21-32, 32-43.

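The class-interval formula and the resulting classes can be sketched as below. The `unit` parameter (1 for whole-number data such as marks) is an assumption standing in for "next unit value", and rounding up with `math.ceil` follows the worked examples (10.33 becomes 11, 7.33 becomes 8). The helper names are hypothetical.

```python
import math

def class_width(smallest: float, largest: float, k: int, unit: float = 1) -> int:
    """i = (next unit value after the largest value - smallest value) / K, rounded up."""
    return math.ceil((largest + unit - smallest) / k)

def exclusive_classes(smallest: float, width: int, k: int):
    """Build K classes where each upper limit is the next class's lower limit."""
    return [(smallest + j * width, smallest + (j + 1) * width) for j in range(k)]

width = class_width(10, 40, 3)          # (41 - 10) / 3 = 10.33, rounded up to 11
print(width, exclusive_classes(10, width, 3))
```

Usage: `class_width(2, 45, 6)` reproduces the width of 8 used in Numerical 3.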
Exclusive / Inclusive Method
• There are 2 methods of classifying data according to class
  intervals.

• Exclusive Method: the class intervals are fixed so that
  the upper limit of one class is the lower limit of the next
  class. In other words, in the exclusive method, upper limits are
  excluded from the class, e.g. 10-20, 20-30, 30-40, etc.
  This is more suitable for continuous variables.

• Inclusive Method: the upper limits are included in
  the class, e.g. 10-19, 20-29, 30-39, etc. This is more
  suitable for discrete variables.

Correction Factor
• In the inclusive type, to get the correct CI, we need
  to add the correction factor to the upper limit of each class
  and subtract it from the lower limit of each class.

• Correction Factor = (Lower Limit of 2nd Class - Upper Limit of
  1st Class)/2

• e.g. for the classes 10-19, 20-29, etc.:

  Correction factor = (20-19)/2 = 0.5, hence the class 10-19
  becomes 9.5-19.5 and the CI becomes 10.

Inclusive to exclusive
• Convert the following inclusive classes into exclusive classes

           Inclusive Type            Exclusive Type
                10-14                   9.5-14.5
                15-19                   14.5-19.5
                20-24                   19.5-24.5
                25-29                   24.5-29.5

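The conversion above applies one correction factor to every class. A minimal sketch, assuming at least two classes with the same gap between consecutive classes; `to_exclusive` is a hypothetical helper name.

```python
def to_exclusive(classes):
    """Convert inclusive classes (lower, upper) to exclusive ones using the correction factor."""
    # Correction factor = (lower limit of 2nd class - upper limit of 1st class) / 2
    cf = (classes[1][0] - classes[0][1]) / 2
    # Subtract cf from every lower limit and add it to every upper limit.
    return [(lower - cf, upper + cf) for lower, upper in classes]

inclusive = [(10, 14), (15, 19), (20, 24), (25, 29)]
for (lo, hi), (elo, ehi) in zip(inclusive, to_exclusive(inclusive)):
    print(f"{lo}-{hi}  ->  {elo}-{ehi}")
```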
Constructing FD

Step 1: Decide on the type (Inclusive / Exclusive) and the
number of classes for dividing the data, using Sturges'
formula. (If these are given in the numerical, go to Step 2
directly.)

Step 2: Sort the data into the different classes and count the
frequency of each class.

Step 3: Present the data in a table or chart.

Numerical 3 – Continuous FD
(Using the 30 marks of Numerical 1)

Step 1: Calculate the No of Classes (Sturges' formula)
No of Classes (K) = 1 + 3.322 log N
                 = 1 + 3.322 log 30
                 = 1 + 3.322 (1.477)
                 = 5.907 ≈ 6
Also, 2^5 = 32 ≥ 30 (while 2^4 = 16 < 30), hence by the other rule the
number of classes would be 5.

Let's use 6 classes and proceed, i.e. K = 6.

Step 2: Sort the data points into classes and count the number of
points in each class.
Class Interval width = (Next unit value after Largest
value - Smallest value)/K = (46-2)/6 = 44/6 = 7.33, rounded up to
8.

Hence the classes (inclusive method) are 2-9, 10-17, 18-25, 26-33, 34-41,
42-49.

    Class    Tally Marks   Frequency
    2–9                       3
   10 – 17                    6
   18 – 25                    7
   26 – 33                    8
   34 – 41                    4
   42 – 49                    2
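Step 2 can be sketched as follows, using the K, smallest value and width worked out above. Subtracting 1 to get each inclusive upper limit assumes whole-number data (a unit of 1), as with these marks.

```python
# Marks from Numerical 1.
marks = [14, 26, 2, 34, 8, 13, 27, 37, 9, 12,
         39, 42, 45, 30, 32, 24, 24, 30, 20, 23,
         14, 18, 30, 33, 24, 34, 30, 10, 22, 14]

K, smallest, width = 6, 2, 8   # values from Step 1 and Step 2

# Inclusive classes 2-9, 10-17, ..., 42-49: both limits belong to the class.
classes = [(smallest + j * width, smallest + (j + 1) * width - 1) for j in range(K)]
frequencies = [sum(lower <= x <= upper for x in marks) for lower, upper in classes]

for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower:>2} - {upper:<2}  {f}")
print(f"Total: {sum(frequencies)}")
```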




Numerical 4

The following data represent the km per litre of
40 similar motorcycles.

40.5, 39.7, 40.6, 39.9, 40.9, 38.9, 41.4, 40.5, 41.0, 38.8, 39.6,
40.4, 39.9, 40.2, 40.8, 40.7, 40.6, 41.7, 40.8, 39.1, 40.1, 40.7,
40.1, 40.7, 40.7, 39.8, 39.3, 39.6, 40.5, 41.3, 41.0, 39.9, 40.4,
40.9, 40.1, 41.2, 40.2, 40.0, 39.4, 40.6.

Construct a frequency distribution for these data, taking the
classes as 38.5-39.0, 39.0-39.5, etc.

Numerical 4 – Solution
      Classes    Tally Marks   Frequency
     38.5-39.0                    2
     39.0-39.5                    3
     39.5-40.0                    7
     40.0-40.5                    8
     40.5-41.0                    14
     41.0-41.5                    5
     41.5-42.0                     1
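The same counting step for exclusive classes can be sketched as below: a value equal to an upper limit (e.g. 40.5) goes into the next class. The class limits here are multiples of 0.5, which floating point represents exactly; for other widths, scaling the data to integers or allowing a small tolerance would be safer.

```python
# Km per litre of 40 motorcycles (Numerical 4).
kmpl = [40.5, 39.7, 40.6, 39.9, 40.9, 38.9, 41.4, 40.5, 41.0, 38.8, 39.6,
        40.4, 39.9, 40.2, 40.8, 40.7, 40.6, 41.7, 40.8, 39.1, 40.1, 40.7,
        40.1, 40.7, 40.7, 39.8, 39.3, 39.6, 40.5, 41.3, 41.0, 39.9, 40.4,
        40.9, 40.1, 41.2, 40.2, 40.0, 39.4, 40.6]

lower_start, width, k = 38.5, 0.5, 7    # classes 38.5-39.0, 39.0-39.5, ..., 41.5-42.0

frequencies = []
for j in range(k):
    lower = lower_start + j * width
    upper = lower + width
    # Exclusive method: the lower limit is included, the upper limit is not.
    count = sum(lower <= x < upper for x in kmpl)
    frequencies.append(count)
    print(f"{lower:.1f}-{upper:.1f}  {count}")
```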




Numerical 5

The credit office of a department store gave the following
statement of the payments due from 40 customers. Construct
the frequency table of the balances due, taking the classes
as Rs 50 and under Rs 200, Rs 200 and under Rs 350,
etc. Also find the relative frequencies & percentage
frequencies.

337, 570, 99, 759, 487, 352, 115, 60, 521, 95, 563, 399,
625, 215, 360, 178, 827, 301, 501, 199, 110, 501, 201, 99,
637, 328, 539, 150, 417, 250, 451, 595, 422, 344, 186, 681,
397, 790, 272, 514.




Numerical 5 – Solution
Classes    Tally Marks   Frequency   Relative Frequency   Percentage Frequency
50-200                       10      10/40 = 0.25         0.25*100 = 25
200-350                       8      0.2                  20
350-500                       8      0.2                  20
500-650                      10      0.25                 25
650-800                       3      0.075                7.5
800-950                       1      0.025                2.5
Total                        40      1.00                 100

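Relative frequency is the class frequency divided by the total number of observations, and percentage frequency is that ratio times 100. A minimal sketch for the "x and under y" classes of Numerical 5 (the variable names are illustrative):

```python
# Balances due (Rs) from 40 customers (Numerical 5).
balances = [337, 570, 99, 759, 487, 352, 115, 60, 521, 95, 563, 399,
            625, 215, 360, 178, 827, 301, 501, 199, 110, 501, 201, 99,
            637, 328, 539, 150, 417, 250, 451, 595, 422, 344, 186, 681,
            397, 790, 272, 514]

lower_start, width, k = 50, 150, 6      # Rs 50 and under 200, 200 and under 350, ...
n = len(balances)

rows = []
for j in range(k):
    lower = lower_start + j * width
    upper = lower + width
    f = sum(lower <= x < upper for x in balances)   # "lower and under upper"
    rows.append((lower, upper, f, f / n, 100 * f / n))

for lower, upper, f, rel, pct in rows:
    print(f"{lower}-{upper}  {f:>3}  {rel:.3f}  {pct:.1f}%")
print(f"Total  {sum(r[2] for r in rows)}")
```

The same loop, with classes chosen by the steps above, can be applied to the data of Numerical 6.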
Numerical 6

The following data show the time spent by passengers at
the airport before they can enter the check-in lounge.
Construct the frequency distribution using suitable
classes.

Also find the relative and percentage frequencies.

34, 40, 23, 28, 31, 40, 25, 33, 47, 32,
44, 34, 38, 31, 33, 42, 26, 35, 27, 31,
29, 40, 31, 30, 34, 31, 38, 35, 37, 33,
24, 44, 37, 39, 32, 36, 34, 36, 41, 39,
29, 22, 28, 44, 51, 31, 44, 28, 47, 31.





				
posted:4/16/2011
pages:45