11th social-economics-statistics for economics

                                CONTENTS

Foreword

Chapter 1: Introduction

Chapter 2: Collection of Data

Chapter 3: Organisation of Data

Chapter 4: Presentation of Data

Chapter 5: Measures of Central Tendency

Chapter 6: Measures of Dispersion

Chapter 7: Correlation

Chapter 8: Index Numbers

Chapter 9: Use of Statistical Tools

APPENDIX A: GLOSSARY OF STATISTICAL TERMS

APPENDIX B: TABLE OF TWO-DIGIT RANDOM NUMBERS
CHAPTER 1

Introduction

Studying this chapter should enable you to:
  • know what the subject of economics is about;
  • understand how economics is linked with the study of economic activities in consumption, production and distribution;
  • understand why knowledge of statistics can help in describing consumption, production and distribution;
  • learn about some uses of statistics in the understanding of economic activities.

1. WHY ECONOMICS?

You have, perhaps, already had Economics as a subject in your earlier classes at school. You might have been told that this subject is mainly about what Alfred Marshall (one of the founders of modern economics) called “the study of man in the ordinary business of life”. Let us understand what that means.

When you buy goods (you may want to satisfy your own personal needs or those of your family or those of any other person to whom you want to make a gift), you are called a consumer.

When you sell goods to make a profit for yourself (you may be a shopkeeper), you are called a seller.

When you produce goods (you may be a farmer or a manufacturer), you are called a producer.
When you are in a job, working for some other person, and you get paid for it (you may be employed by somebody who pays you wages or a salary), you are called a service-holder.

When you provide some kind of service to others for a payment (you may be a lawyer or a doctor or a banker or a taxi driver or a transporter of goods), you are called a service-provider.

In all these cases you will be called gainfully employed in an economic activity. Economic activities are those that are undertaken for a monetary gain. This is what economists mean by the ordinary business of life.

                Activities
     • List different activities of the members of your family. Would you call them economic activities? Give reasons.
     • Do you consider yourself a consumer? Why?

We cannot get something for nothing

If you have ever heard the story of Aladdin and his Magic Lamp, you would agree that Aladdin was a lucky guy. Whenever and whatever he wanted, he just had to rub his magic lamp and a genie appeared to fulfil his wish. When he wanted a palace to live in, the genie instantly made one for him. When he wanted expensive gifts to bring to the king when asking for his daughter’s hand, he got them in the blink of an eye.

In real life we cannot be as lucky as Aladdin. Though, like him, we have unlimited wants, we do not have a magic lamp. Take, for example, the pocket money that you get to spend. If you had more of it, you could buy almost all the things you wanted. But since your pocket money is limited, you have to choose only those things that you want the most. This is a basic teaching of Economics.

                Activities
     • Can you think for yourself of some other examples where a person with a given income has to choose which things, and in what quantities, he or she can buy at the prices that are being charged (called the current prices)?
     • What will happen if the current prices go up?

Scarcity is the root of all economic problems. Had there been no scarcity, there would have been no economic problem, and you would not have studied Economics either. In our daily life, we face various forms of scarcity. The long queues at railway booking counters, crowded buses and trains, shortage of essential commodities, the rush to get a ticket to watch a new film, etc., are all manifestations of scarcity. We face scarcity because the things that satisfy our wants are limited in availability. Can you think of some more instances of scarcity?

The resources which the producers have are limited and also have
alternative uses. Take the case of the food that you eat every day. It satisfies your want of nourishment. Farmers employed in agriculture raise crops that produce your food. At any point of time, the resources in agriculture like land, labour, water, fertiliser, etc., are given. All these resources have alternative uses. The same resources can be used in the production of non-food crops such as rubber, cotton, jute etc. Thus, alternative uses of resources give rise to the problem of choice between the different commodities that can be produced by those resources.

                Activities
     • Identify your wants. How many of them can you fulfil? How many of them are unfulfilled? Why are you unable to fulfil them?
     • What are the different kinds of scarcity that you face in your daily life? Identify their causes.

Consumption, Production and Distribution

If you thought about it, you might have realised that Economics involves the study of man engaged in economic activities of various kinds. For this, you need to know reliable facts about all the diverse economic activities like production, consumption and distribution. Economics is often discussed in three parts: consumption, production and distribution.

We want to know how the consumer, given his income and many alternative goods to choose from, decides what to buy when he knows the prices. This is the study of Consumption.

We also want to know how the producer, similarly, chooses what to produce for the market when he knows the costs and prices. This is the study of Production.

Finally, we want to know how the national income, or the total income arising from what has been produced in the country (called the Gross Domestic Product or GDP), is distributed through wages (and salaries), profits and interest. (We will leave aside here income from international trade and investment.) This is the study of Distribution.

Besides these three conventional divisions of the study of Economics, about which we want to know all the facts, modern economics has to include some of the basic problems facing the country for special studies.

For example, you might want to know why, or to what extent, some households in our society have the capacity to earn much more than others. You may want to know how many people in the country are really
poor, how many are middle-class, how many are relatively rich and so on. You may want to know how many are illiterate and will not get jobs requiring education, how many are highly educated and will have the best job opportunities, and so on. In other words, you may want to know more facts in terms of numbers that would answer questions about poverty and disparity in society. If you do not like the continuance of poverty and gross disparity and want to do something about the ills of society, you will need to know the facts about all these things before you can ask for appropriate actions by the government. If you know the facts, it may also be possible to plan your own life better. Similarly, you hear of disasters like the tsunami, earthquakes and bird flu, dangers threatening our country that affect man’s ‘ordinary business of life’ enormously; some of you may even have experienced them. Economists can look at these things provided they know how to collect and put together, systematically and correctly, the facts about what these disasters cost. You may perhaps think about it and ask yourselves whether it is right that modern economics now includes learning the basic skills involved in making useful studies for measuring poverty, how incomes are distributed, how earning opportunities are related to your education, how environmental disasters affect our lives and so on.

Obviously, if you think along these lines, you will also appreciate why we needed Statistics (which is the study of numbers relating to selected facts in a systematic form) to be added to all modern courses of economics.

Would you now agree with the following definition of economics that many economists use?

    “Economics is the study of how people and society choose to employ scarce resources that could have alternative uses in order to produce various commodities that satisfy their wants and to distribute them for consumption among various persons and groups in society.”

                Activity
     • Would you say, in the light of the discussion above, that this definition, which used to be given, seems a little inadequate now? What does it miss out?

2. STATISTICS IN ECONOMICS

In the previous section you were told about certain special studies that concern the basic problems facing a country. These studies require that we know more about economic facts in terms of numbers. Such economic facts are also known as data.

The purpose of collecting data about these economic problems is to understand and explain these problems in terms of the various causes behind them. In other words, we try to analyse them. For example, when we analyse the hardships of poverty, we try to explain it in terms of the various factors such as
unemployment, low productivity of people, backward technology, etc.

But what purpose does the analysis of poverty serve unless we are able to find ways to mitigate it? We may, therefore, also try to find those measures that help solve an economic problem. In Economics, such measures are known as policies.

So, do you realise, then, that no analysis of a problem would be possible without the availability of data on the various factors underlying an economic problem? And that, in such a situation, no policies can be formulated to solve it? If yes, then you have, to a large extent, understood the basic relationship between Economics and Statistics.

3. WHAT IS STATISTICS?

At this stage you are probably ready to know more about Statistics. You might very well want to know what the subject “Statistics” is all about. What are its specific uses in Economics? Does it have any other meaning? Let us see how we can answer these questions to get closer to the subject.

In our daily language the word ‘Statistics’ is used in two distinct senses: singular and plural. In the plural sense, ‘statistics’ means ‘numerical facts systematically collected’, as described by the Oxford Dictionary. Thus, the simple meaning of statistics in the plural sense is data.

    Do you know that the term statistics in the singular means the ‘science of collecting, classifying and using statistics’ or a ‘statistical fact’?

By data or statistics, we mean both quantitative and qualitative facts that are used in Economics. For example, a statement in Economics like “the production of rice in India has increased from 39.58 million tonnes in 1974–75 to 58.64 million tonnes in 1984–85” is a quantitative fact. The numerical figures such as ‘39.58 million tonnes’ and ‘58.64 million tonnes’ are statistics of the production of rice in India for 1974–75 and 1984–85 respectively.

In addition to quantitative data, Economics also uses qualitative data. The chief characteristic of such information is that it describes attributes of a single person or a group of persons that are important to record as accurately as possible even though they cannot be measured in quantitative terms. Take, for example, “gender”, which distinguishes a person as man/woman or boy/girl. It is often possible (and useful) to state the information about an attribute of a person in terms of degrees (like better/worse; sick/healthy/more healthy; unskilled/skilled/highly skilled etc.). Such qualitative information or statistics is often used in Economics and other social sciences, and is collected and stored systematically like quantitative information (on prices, incomes, taxes paid etc.), whether for a single person or a group of persons.

You will study in the subsequent chapters that statistics involves the collection and organisation of data. The next step is to present the data in
tabular, diagrammatic and graphic forms. The data are then summarised by calculating various numerical indices, such as the mean, variance and standard deviation, that represent the broad characteristics of the collected set of information.

                Activities
     • Think of two examples each of qualitative and quantitative data.
     • Which of the following would give you qualitative data: beauty, intelligence, income earned, marks in a subject, ability to sing, learning skills?

4. WHAT DOES STATISTICS DO?

By now, you know that Statistics is an indispensable tool for an economist that helps her/him understand an economic problem. Using its various methods, an effort is made to find the causes behind the problem with the help of its qualitative and quantitative facts. Once the causes of the problem are identified, it is easier to formulate policies to tackle it.

But there is more to Statistics. It enables an economist to present economic facts in a precise and definite form that helps in proper comprehension of what is stated. When economic facts are expressed in statistical terms, they become exact. Exact facts are more convincing than vague statements. For instance, saying, with a precise figure, that 310 people died in the recent earthquake in Kashmir is more factual and thus counts as statistical data, whereas saying that hundreds of people died does not.

Statistics also helps in condensing a mass of data into a few numerical measures (such as the mean and variance, about which you will learn later). These numerical measures help summarise data. For example, it would be impossible for you to remember the incomes of all the people in a data set if the number of people is very large. Yet one can easily remember a summary figure like the average income, which is obtained statistically. In this way, Statistics summarises a mass of data and presents meaningful overall information about it.

Quite often, Statistics is used in finding relationships between different economic factors. An economist may be interested in finding out what happens to the demand for a commodity when its price increases or decreases. Or, would the supply of a commodity be affected by changes in its own price? Or, would consumption expenditure increase when average income increases? Or, what happens to the general price level when government expenditure increases? Such questions can only be answered if some relationship exists between the various economic factors stated above. Whether such relationships exist or not can be easily verified by applying statistical methods to their data. In some cases the economist might assume certain relationships between them and would like
to test whether the assumption she/he made about the relationship is valid or not. The economist can do this only by using statistical techniques.

In another instance, the economist might be interested in predicting the changes in one economic factor due to changes in another factor. For example, she/he might be interested in knowing the impact of today’s investment on the national income in future. Such an exercise cannot be undertaken without the knowledge of Statistics.

Sometimes, the formulation of plans and policies requires knowledge of future trends. For example, an economic planner has to decide in 2005 how much the economy should produce in 2010. In other words, one must know what the expected level of consumption in 2010 could be in order to decide the production plan of the economy for 2010. In this situation, one might make a subjective judgement based on a guess about consumption in 2010. Alternatively, one might use statistical tools to predict consumption in 2010. That prediction could be based on data on consumption in past years, or in recent years obtained by surveys. Thus, statistical methods help formulate appropriate economic policies that solve economic problems.

5. CONCLUSION

Today, we increasingly use Statistics to analyse serious economic problems such as rising prices, growing population, unemployment, poverty etc., to find measures that can solve such problems. Further, it also helps evaluate the impact of such policies in solving the economic problems. For example, it can be ascertained easily using statistical techniques whether the policy of family planning is effective in checking the problem of ever-growing population.

In economic policies, Statistics plays a vital role in decision making. For example, in the present time of rising global oil prices, it might be necessary to decide how much oil India should import in 2010. The decision to import would depend on the expected domestic production of oil and the likely demand for oil in 2010. Without the use of Statistics, it cannot be determined what the expected domestic production of oil and the likely demand for oil would be. Thus, the decision to import oil cannot be made unless we know the actual requirement of oil. This vital information that helps make the decision to import oil can only be obtained statistically.

           Statistical methods are no substitute for common sense!
    There is an interesting story which is told to make fun of statistics. It is said
    that a family of four persons (husband, wife and two children) once set out
    to cross a river. The father knew the average depth of the river. So he
    calculated the average height of his family members. Since the average height
    of his family members was greater than the average depth of the river, he
    thought they could cross safely. Consequently, some members of the family
    (the children) drowned while crossing the river.
    Does the fault lie with the statistical method of calculating averages or
    with the misuse of the averages?
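The river-crossing story in the box can be made concrete with a small calculation. Below is a minimal Python sketch under stated assumptions. The heights and depths are hypothetical figures invented for illustration, since the story gives no numbers. Treating a person as safe only if taller than the deepest point is also a deliberate simplification. The father's comparison of two averages says the crossing is safe, but checking each family member against the deepest point flags the children.

```python
from statistics import mean, pstdev

# Hypothetical heights of the family members, in metres (invented for illustration).
family_heights = {"father": 1.75, "mother": 1.60, "child 1": 1.10, "child 2": 0.95}

# Hypothetical depths measured at points along the crossing, in metres.
river_depths = [0.9, 1.2, 1.5, 1.4, 1.0]

heights = list(family_heights.values())
avg_height = mean(heights)
avg_depth = mean(river_depths)
deepest = max(river_depths)

# The father's reasoning: compare one average with the other.
averages_say_safe = avg_height > avg_depth

# Common sense (simplified assumption): each person must be taller than the deepest point.
at_risk = [name for name, h in family_heights.items() if h <= deepest]

print(f"average height {avg_height:.2f} m vs average depth {avg_depth:.2f} m")
print(f"spread of heights (population standard deviation) {pstdev(heights):.2f} m")
print("averages say safe:", averages_say_safe)
print("at risk at the deepest point:", at_risk)
```

The averages themselves are computed correctly. The fault is in the question being asked, which depends on the extremes (the shortest person and the deepest point) that an average hides. The standard deviation printed alongside hints at how widely the heights are spread around their mean.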



                                         Recap
       •   Our wants are unlimited but the resources used in the production
           of goods that satisfy our wants are limited and scarce. Scarcity is
           the root of all economic problems.
       •   Resources have alternative uses.
       •   Purchase of goods by consumers to satisfy their various needs is
           Consumption.
       •   Manufacture of goods by producers for the market is Production.
       •   Division of the national income into wages, profits, rent and interest
           is Distribution.
       •   Statistics finds economic relationships using data and verifies them.
       •   Statistical tools are used in prediction of future trends.
       •   Statistical methods help analyse economic problems and
           formulate policies to solve them.
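The recap notes that statistical tools are used to predict future trends. The chapter's example is a planner in 2005 estimating consumption in 2010 from data on past years. The text does not say which tool the planner would use. The sketch below assumes the simplest one: a straight-line (least-squares) trend fitted to past values and extended forward. The consumption figures are hypothetical, invented only for illustration, and `fit_linear_trend` and `predict` are illustrative helper names, not part of any library.

```python
# Hypothetical consumption figures (arbitrary units) for past years, invented for illustration.
years = [2000, 2001, 2002, 2003, 2004]
consumption = [100.0, 104.0, 109.0, 113.0, 118.0]


def fit_linear_trend(xs, ys):
    """Fit the least-squares straight line y = y_bar + slope * (x - x_bar).

    Returns (x_bar, y_bar, slope). Working with deviations from the means
    keeps the numbers small even when x is a calendar year.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return x_bar, y_bar, sxy / sxx


def predict(trend, x):
    """Extend the fitted line to a new value of x (for example, a future year)."""
    x_bar, y_bar, slope = trend
    return y_bar + slope * (x - x_bar)


trend = fit_linear_trend(years, consumption)
forecast_2010 = predict(trend, 2010)
print(f"trend: {trend[2]:.1f} units per year; projected consumption in 2010: {forecast_2010:.1f}")
```

A straight line is only one assumption about how consumption evolves. A real projection would also use survey data on recent years, as the chapter suggests, and would be checked for plausibility, which is exactly where the warning that statistical methods are no substitute for common sense applies.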




                                      EXERCISES

     1. Mark the following statements as true or false.
        (i) Statistics can only deal with quantitative data.
        (ii) Statistics solves economic problems.
        (iii) Statistics is of no use to Economics without data.
     2. Make a list of activities that constitute the ordinary business of life. Are
        these economic activities?
     3. ‘The Government and policy makers use statistical data to formulate
        suitable policies of economic development’. Illustrate with two examples.
     4. You have unlimited wants and limited resources to satisfy them. Explain
        by giving two examples.
     5. How will you choose the wants to be satisfied?
     6. What are your reasons for studying Economics?
     7. Statistical methods are no substitute for common sense. Comment.
CHAPTER 2

Collection of Data

Studying this chapter should enable you to:
  • understand the meaning and purpose of data collection;
  • distinguish between primary and secondary sources;
  • know the mode of collection of data;
  • distinguish between Census and Sample Surveys;
  • be familiar with the techniques of sampling;
  • know about some important sources of secondary data.

1. INTRODUCTION

In the previous chapter, you read about what economics is. You also studied the role and importance of statistics in economics. In this chapter, you will study the sources of data and the mode of data collection. The purpose of collecting data is to gather evidence for reaching a sound and clear solution to a problem.

In economics, you often come across a statement like,

    “After many fluctuations the output of food grains rose to 176 million tonnes in 1990–91 and 199 million tonnes in 1996–97, but fell to 194 million tonnes in 1997–98. Production of food grains then rose continuously and touched 212 million tonnes in 2001–02.”

In this statement, you can observe that food grain production does not remain the same in different years. It varies from year to year and from crop to crop. As these values
vary, they are called variables. Variables are generally represented by the letters X, Y or Z. The values of these variables are the observations. For example, suppose food grain production in India varies from 108 million tonnes in 1970–71 to 212 million tonnes in 2001–02, as shown in the following table. The years are represented by variable X and the production of food grain in India (in million tonnes) is represented by variable Y:

                  TABLE 2.1
       Production of Food Grain in India
               (Million Tonnes)

          X (Year)        Y (Production)
          1970–71              108
          1978–79              132
          1979–80              108
          1990–91              176
          1996–97              199
          1997–98              194
          2001–02              212

Here, these values of the variables X and Y are the ‘data’, from which we can obtain information about the trend of food grain production in India. To know the fluctuations in the output of food grains, we need ‘data’ on the production of food grains in India. ‘Data’ is a tool which helps in understanding problems by providing information.

You must be wondering where ‘data’ come from and how we collect them. In the following sections we will discuss the types of data, the methods and instruments of data collection, and the sources of data.

2. WHAT ARE THE SOURCES OF DATA?

Statistical data can be obtained from two sources. The enumerator (the person who collects the data) may collect the data by conducting an enquiry or an investigation. Such data are called Primary Data, as they are based on first-hand information. Suppose you want to know about the popularity of a film star among school students. For this, you will have to enquire from a large number of school students, asking them questions to collect the desired information. The data you get are an example of primary data.

If the data have been collected and processed (scrutinised and tabulated) by some other agency, they are called Secondary Data. Generally, published data are secondary data. They can be obtained either from published sources or from any other source, for example, a web site. Thus, data are primary to the source that collects and processes them for the first time and secondary for all sources that later use them. Use of secondary data saves time and cost. For example, after collecting the data on the popularity of the film star among students, you publish a report. If somebody uses the data collected by you for a similar study, they become secondary data.

3. HOW DO WE COLLECT THE DATA?

Do you know how a manufacturer decides about a product or how a political party decides about a candidate? They conduct a survey by
asking a large group of people questions about a particular product or candidate. The purpose of surveys is to describe some characteristics like price, quality, usefulness (in the case of the product) and popularity, honesty, loyalty (in the case of the candidate). The purpose of the survey is to collect data. A survey is a method of gathering information from individuals.

Preparation of Instrument

The most common type of instrument used in surveys is the questionnaire/interview schedule. The questionnaire is either self-administered by the respondent or administered by the researcher (enumerator) or a trained investigator. While preparing the questionnaire/interview schedule, you should keep in mind the following points:

•  The questionnaire should not be too long. The number of questions should be as few as possible. Long questionnaires discourage people from completing them.
•  The series of questions should move from general to specific. The questionnaire should start with general questions and proceed to more specific ones. This helps the respondents feel comfortable. For example:
   Poor Q
   (i) Is the increase in electricity charges justified?
   (ii) Is the electricity supply in your locality regular?
   Good Q
   (i) Is the electricity supply in your locality regular?
   (ii) Is the increase in electricity charges justified?
•  The questions should be precise and clear. For example:
   Poor Q
   What percentage of your income do you spend on clothing in order to look presentable?
   Good Q
   What percentage of your income do you spend on clothing?
•  The questions should not be ambiguous, so that respondents can answer quickly, correctly and clearly. For example:
   Poor Q
   Do you spend a lot of money on books in a month?
   Good Q
   How much do you spend on books in a month?
   (i) Less than Rs 200
   (ii) Between Rs 200–300
   (iii) Between Rs 300–400
   (iv) More than Rs 400
•  The question should not use double negatives. Questions starting with “Wouldn’t you” or “Don’t you” should be avoided, as they may lead to biased responses. For example:
   Poor Q
   Don’t you think smoking should be prohibited?
   Good Q
   Do you think smoking should be prohibited?

•  The question should not be a leading question, which gives a clue
   about how the respondent should answer. For example:

   Poor Q
   How do you like the flavour of this high-quality tea?

   Good Q
   How do you like the flavour of this tea?

•  The question should not indicate alternatives to the answer. For
   example:

   Poor Q
   Would you like to do a job after college or be a housewife?

   Good Q
   Would you like to do a job, if possible?

The questionnaire may consist of closed-ended (or structured) questions
or open-ended (or unstructured) questions.

Closed-ended or structured questions can be either two-way questions or
multiple-choice questions. When there are only two possible answers,
‘yes’ or ‘no’, it is called a two-way question.

When more than two answers are possible, multiple-choice questions are
more appropriate. For example:

Q. Why did you sell your land?
   (i)   To pay off the debts.
   (ii)  To finance children’s education.
   (iii) To invest in another property.
   (iv)  Any other (please specify).

Closed-ended questions are easy to use, score and code for analysis,
because all the respondents answer from the given options. But they are
difficult to write, as the alternatives should be clearly written to
represent both sides of the issue. There is also a possibility that the
individual’s true response is not present among the options given. For
this, the choice ‘Any other’ is provided, where the respondent can write
a response that was not anticipated by the researcher. Another
limitation of multiple-choice questions is that they tend to restrict
the answers by providing alternatives, without which the respondents
might have answered differently.

Open-ended questions allow for more individualised responses, but they
are difficult to interpret and hard to score, since there is a lot of
variation in the responses. For example:

Q. What is your view about globalisation?

Mode of Data Collection

Have you ever come across a television show in which reporters ask
questions of children, housewives or the general public regarding their
examination performance, a brand of soap or a political party? The
purpose of asking these questions is to do a survey for the collection
of data. There are three basic ways of collecting data: (i) Personal
Interviews, (ii) Mailing (questionnaire) Surveys, and (iii) Telephone
Interviews.

Personal Interviews

This method is used when the researcher has access to all the members.
The researcher (or investigator) conducts face-to-face interviews with
the respondents.

Personal interviews are preferred for various reasons. Personal contact
is made between the respondent and the interviewer. The interviewer has
the opportunity of explaining the study and answering any queries of the
respondents. The interviewer can request the respondent to expand on
answers that are particularly important. Misinterpretation and
misunderstanding can be avoided. Watching the reactions of the
respondents can provide supplementary information.

Personal interviews have some demerits too. They are expensive, as they
require trained interviewers. They take a longer time to complete the
survey. The presence of the researcher may inhibit respondents from
saying what they really think.

Mailing Questionnaire

When the data in a survey are collected by mail, the questionnaire is
sent to each individual by mail with a request to complete and return it
by a given date. The advantages of this method are that it is less
expensive and that it allows the researcher to have access to people in
remote areas too, who might be difficult to reach in person or by
telephone. It does not allow the interviewer to influence the
respondents. It also permits the respondents to take sufficient time to
give thoughtful answers to the questions. These days online surveys and
surveys through the short messaging service (SMS) have become popular.
Do you know how an online survey is conducted?

The disadvantages of a mail survey are that there is less opportunity to
provide assistance in clarifying instructions, so there is a possibility
of misinterpretation of questions. Mailing is also likely to produce low
response rates, because questionnaires may be returned incomplete, may
not be returned at all, or may be lost in the mail.

Telephone Interviews

In a telephone interview, the investigator asks questions over the
telephone. The advantages of telephone interviews are that they are
cheaper than personal interviews and can be conducted in a shorter time.
They allow the researcher to assist the respondent by clarifying the
questions. A telephone interview is better in cases where the
respondents are reluctant to answer certain questions in personal
interviews.

The disadvantage of this method is access to people, as many people may
not own telephones. Telephone interviews also do not allow the
interviewer to observe the visual reactions of the respondents, which
can be helpful in obtaining information on sensitive issues.

Advantages and disadvantages of the three modes of survey

Mode of survey     Advantages                         Disadvantages

Personal           • Highest response rate            • Most expensive
Interview          • Allows use of all types of       • Possibility of influencing
                     questions                          respondents
                   • Better for using open-ended      • More time-consuming
                     questions
                   • Allows clarification of
                     ambiguous questions

Mailing            • Least expensive                  • Cannot be used by illiterates
Questionnaire      • Only method to reach remote      • Long response time
                     areas                            • Does not allow explanation of
                   • No influence on respondents        ambiguous questions
                   • Maintains anonymity of           • Reactions cannot be watched
                     respondents
                   • Best for sensitive questions

Telephone          • Relatively low cost              • Limited use
Interview          • Relatively less influence on     • Reactions cannot be watched
                     respondents                      • Possibility of influencing
                   • Relatively high response rate      respondents

Activities

•  You have to collect information from a person who lives in a remote
   village of India. Which mode of data collection will be the most
   appropriate for collecting information from him?
•  You have to interview parents about the quality of teaching in a
   school. If the principal of the school is present there, what types
   of problems can arise?

Pilot Survey

Once the questionnaire is ready, it is advisable to conduct a try-out
with a small group, which is known as a Pilot Survey or Pre-Testing of
the questionnaire. The pilot survey helps in providing a preliminary
idea about the survey. It helps in pre-testing the questionnaire, so as
to find the shortcomings and drawbacks of the questions. A pilot survey
also helps in assessing the suitability of questions, the clarity of
instructions, the performance of enumerators, and the cost and time
involved in the actual survey.

4. CENSUS AND SAMPLE SURVEYS

Census or Complete Enumeration

A survey which includes every element of the population is known as a
Census or the Method of Complete Enumeration. If certain agencies are
interested in studying the total population in India, they have to
obtain information from all the households in rural and urban India.

The essential feature of this method is that it covers every individual
unit in the entire population. You cannot select some and leave out
others. You may be familiar with the Census of India, which is carried
out every ten years. A house-to-house enquiry is carried out, covering
all households in India. Demographic data on birth and death rates,
literacy, workforce, life expectancy, size and composition of
population, etc. are collected and published by the Registrar General
of India. The last Census of India was held in February 2001.

[Figure. Source: Census of India, 2001.]

According to Census 2001, the population of India was 102.70 crore. It
was 23.83 crore according to Census 1901. In a period of a hundred
years, the population of our country increased by 78.87 crore. Census
1981 indicated that the rate of population growth during the 1960s and
1970s remained almost the same. Census 1991 indicated that the annual
growth rate of population during the 1980s was 2.14 per cent, which came
down to 1.93 per cent during the 1990s according to Census 2001.

    “At 00.00 hours of first March, 2001 the population of India stood
    at 1,027,015,247 comprising of 531,277,078 males and 495,738,169
    females. Thus, India becomes the second country in the world after
    China to cross the one billion mark.”

Source: Census of India, 2001.

Sample Survey

Population or the Universe in statistics means the totality of the items
under study. Thus, the Population or the Universe is a group to which
the results of the study are intended to apply. A population is always
all the individuals/items who possess certain characteristics (or a set
of characteristics), according to the purpose of the survey. The first
task in selecting a sample is to identify the population. Once the
population is identified, the researcher selects a Representative
Sample, as it is difficult to study the entire population. A sample
refers to a group or section of the population from which information
is to be obtained. A good sample (representative sample) is generally
smaller than the population and is capable of providing reasonably
accurate information about the population at a much lower cost and in a
shorter time.

Suppose you want to study the average income of people in a certain
region. According to the Census method, you would be required to find
out the income of every individual in the region, add them up and divide
by the number of individuals to get the average income of people in the
region. This method would require huge expenditure, as a large number of
enumerators would have to be employed. Alternatively, you can select a
representative sample of a few individuals from the region and find out
their income. The average income of the selected group of individuals is
used as an estimate of the average income of the individuals of the
entire region.

Example
•  Research problem: To study the economic condition of agricultural
   labourers in Churachandpur district of Manipur.
•  Population: All agricultural labourers in Churachandpur district.
•  Sample: Ten per cent of the agricultural labourers in Churachandpur
   district.

Most surveys are sample surveys. These are preferred in statistics for
a number of reasons. A sample can provide reasonably reliable and
accurate information at a lower cost and in a shorter time. As samples
are smaller than the population, more detailed information can be
collected by conducting intensive enquiries. As we need a smaller team
of enumerators, it is easier to train them and supervise their work
more effectively.

Now the question is: how do you do the sampling? There are two main
types of sampling, random and non-random. The following description
will make the distinction clear.

Activities

•  In which years will the next Census be held in India and China?
•  If you have to study the opinion of students about the new economics
   textbook of class XI, what will be your population and sample?
•  If a researcher wants to estimate the average yield of wheat in
   Punjab, what will be her/his population and sample?

Random Sampling

As the name suggests, random sampling is one where the individual units
from the population (samples) are selected at random. Suppose the
government wants to determine the
impact of the rise in petrol price on the household budget of a
particular locality. For this, a representative (random) sample of 30
households has to be taken and studied. The names of all the 300
households of that area are written on pieces of paper and mixed well,
then the 30 names to be interviewed are selected one by one.

In random sampling, every individual has an equal chance of being
selected, and the individuals who are selected are just like the ones
who are not selected. In the above example, all the 300 sampling units
of the population (the list of these units is called the sampling
frame) got an equal chance of being included in the sample of 30 units,
and hence the sample so drawn is a random sample. This is also called
the lottery method. The same could also be done using a Random Number
Table.

[Figure. A Population of 20 Kuchha and 20 Pucca Houses; A Representative
Sample; A Non-Representative Sample.]

How to use the Random Number Tables?

Do you know what Random Number Tables are? Random number tables have
been generated to guarantee equal probability of selection of every
individual unit (by their listed serial number in the sampling frame)
in the population. They are available either in a published form or can
be generated by using appropriate software packages (See Appendix B).
You can start using the table from anywhere, i.e., from any page,
column, row or point. In the above example, you need to select a sample
of 30 households out of 300 total households. Here, the largest serial
number is 300, a three-digit number, and therefore we consult
three-digit random numbers in sequence. We will skip the random numbers
greater than 300, since there is no household number greater than 300.
Thus, the 30 selected households are those with serial numbers: 149,
219, 111, 165, 230, 007, 089, 212, 051, 244, 300, 051, 244, 155, 300,
051, 152, 156, 205, 070, 015, 157, 040, 243, 479, 116, 122, 081, 160,
162.

Exit Polls

You must have seen that when an election takes place, the television
networks provide election coverage. They also try to predict the
results. This is done through exit polls, wherein a random sample of
voters who exit the polling booths are asked whom they voted for. The
prediction is made from the data of this sample of voters.
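
The lottery method and the random-number-table procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's own method. The function names `lottery_sample` and `table_sample` are hypothetical. Python's `random` module stands in for the mixed slips of paper, and a string of pseudo-random digits stands in for the printed table in Appendix B, which is not reproduced here. Following the text, three-digit numbers greater than 300 are skipped. The sketch additionally assumes that 000 and any serial number already drawn are skipped, so that no household is selected twice.

```python
import random


def lottery_sample(population_size, sample_size, seed=None):
    """Lottery method: each serial number 1..population_size is one 'slip of
    paper'. random.sample draws sample_size slips without replacement, so every
    household has the same chance of selection and none is drawn twice."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


def table_sample(digits, population_size, sample_size):
    """Random-number-table method: read the digit string in consecutive groups
    as wide as the largest serial number (3 digits for 300). Keep a number only
    if it is a valid serial number not chosen already, and stop once
    sample_size numbers have been found."""
    width = len(str(population_size))
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Skip 000, numbers above the largest serial number, and repeats.
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
            if len(chosen) == sample_size:
                break
    return chosen  # may be shorter than sample_size if the digits run out


if __name__ == "__main__":
    # 30 households out of 300 by the lottery method.
    print(lottery_sample(300, 30, seed=1))

    # Assumed stand-in for a printed table: 900 pseudo-random digits.
    rng = random.Random(2)
    digits = "".join(str(rng.randrange(10)) for _ in range(900))
    print(table_sample(digits, 300, 30))
```

For example, `table_sample("149219999111000149165", 300, 4)` returns `[149, 219, 111, 165]`. The number 999 is skipped as too large, 000 is skipped as invalid, and the second 149 is skipped as a repeat. Under the same rule, a number such as 479, or a serial number that has already appeared, would simply be passed over.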

Activity

•  You have to analyse the trend of foodgrains production in India for
   the last fifty years. As it is difficult to include all the years, you
   have to select a sample of production of ten years. Using the Random
   Number Tables, how will you select your sample?

Non-Random Sampling

There may be a situation where you have to select 10 out of 100
households in a locality. You have to decide which households to select
and which to reject. You may select the households that are conveniently
situated, or the households known to you or your friend. In this case,
you are using your judgement (bias) in selecting 10 households. This way
of selecting 10 out of 100 households is not a random selection. In a
non-random sampling method, not all the units of the population have an
equal chance of being selected, and the convenience or judgement of the
investigator plays an important role in the selection of the sample.
Such samples are mainly selected on the basis of judgement, purpose,
convenience or quota, and are non-random samples.

5. SAMPLING AND NON-SAMPLING ERRORS

Sampling Errors

The purpose of a sample is to obtain an estimate of some characteristic
of the population. Sampling error refers to the difference between the
sample estimate and the actual value of a characteristic of the
population (for example, the average income). It is the error that
occurs when you make an observation from the sample taken from the
population. Thus, the difference between the actual value of a parameter
of the population (which is not known) and its estimate (from the
sample) is the sampling error. It is possible to reduce the magnitude of
sampling error by taking a larger sample.

Example

Consider a case of incomes of 5 farmers of Manipur. The variable x
(income of farmers) has measurements 500, 550, 600, 650, 700. We note
that the population average is
(500 + 550 + 600 + 650 + 700) ÷ 5 = 3000 ÷ 5 = 600.

Now, suppose we select a sample of two individuals where x has
measurements of 500 and 600. The sample average is
(500 + 600) ÷ 2 = 1100 ÷ 2 = 550.

Here, the sampling error of the estimate = 600 (true value) – 550
(estimate) = 50.

Non-Sampling Errors

Non-sampling errors are more serious than sampling errors, because a
sampling error can be minimised by taking a larger sample, whereas it is
difficult to minimise non-sampling error, even by taking a large sample.
Even a Census can contain non-sampling errors. Some of the non-sampling
errors are:

Errors in Data Acquisition

This type of error arises from recording incorrect responses. Suppose
the teacher asks the students to measure the length of the teacher’s
table in the classroom. The measurements by the students may differ. The
differences may occur due to differences in measuring tapes,
carelessness of the students, etc. Similarly, suppose we want to collect
data on prices of oranges. We know that prices vary from shop to shop
and from market to market. Prices also vary according to quality.
Therefore, we can only consider the average prices. Recording mistakes
can also take place, as the enumerators or the respondents may commit
errors in recording or transcribing the data; for example, he/she may
record 13 instead of 31.

Non-Response Errors

Non-response occurs if an interviewer is unable to contact a person
listed in the sample, or a person from the sample refuses to respond.
In this case, the sample observations may not be representative.

Sampling Bias

Sampling bias occurs when the sampling plan is such that some members
of the target population could not possibly be included in the sample.

6. CENSUS OF INDIA AND NSSO

There are some agencies, both at the national and the state level, which
collect, process and tabulate statistical data. Some of the major
agencies at the national level are the Census of India, National Sample
Survey Organisation (NSSO), Central Statistical Organisation (CSO),
Registrar General of India (RGI), Directorate General of Commercial
Intelligence and Statistics (DGCIS), Labour Bureau, etc.

The Census of India provides the most complete and continuous
demographic record of population. The Census has been regularly
conducted every ten years since 1881. The first Census after
Independence was held in 1951. The Census collects information on
various aspects of population such as size, density, sex ratio,
literacy, migration, rural-urban distribution, etc. The Census in India
is not merely a statistical operation; the data are interpreted and
analysed in an interesting manner.

The NSSO was established by the Government of India to conduct
nation-wide surveys on socio-economic issues. The NSSO carries out
continuous surveys in successive rounds. The data collected by NSSO
surveys on different socio-economic subjects are released through
reports and its quarterly journal Sarvekshana. NSSO provides periodic
estimates of literacy, school enrolment, utilisation of educational
services, employment, unemployment, manufacturing and service sector
enterprises, morbidity, maternity, child care, utilisation of the public
distribution system, etc. The NSS 59th round survey (January–December
2003) was on land and livestock holdings, debt and investment. The NSS
60th round survey (January–June 2004) was on morbidity and health care.
The NSSO also undertakes the fieldwork of the Annual Survey of
Industries, conducts crop estimation surveys, and collects rural and
urban retail prices for the compilation of consumer price index
numbers.

7. CONCLUSION

Economic facts, expressed in terms of numbers, are called data. The
purpose of data collection is to understand, explain and analyse a
problem and the causes behind it. Primary data is obtained by
conducting a survey. A survey includes various steps, which need to be
planned carefully. There are various agencies which collect, process,
tabulate and publish statistical data. These can be used as secondary
data. However, the choice of the source of data and of the mode of data
collection depends on the objective of the study.



                                        Recap
        •   Data is a tool which helps in reaching a sound conclusion on any
            problem by providing information.
        •   Primary data is based on first hand information.
        •   Survey can be done by personal interviews, mailing questionnaires
            and telephone interviews.
        •   Census covers every individual/unit belonging to the population.
        •   Sample is a smaller group selected from the population from which
            the relevant information would be sought.
        •   In a random sampling, every individual is given an equal chance of
            being selected for providing information.
        •   Sampling error arises due to the difference between the actual
            value of a population characteristic and its estimate from a sample.
        •   Non-sampling errors can arise in data acquisition, by non-response
            or by bias in selection.
        •   Census of India and National Sample Survey Organisation
            are two important agencies at the national level, which collect,
            process and tabulate data.
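
The sampling-error example in Section 5 (incomes of five farmers of Manipur) can be reproduced with a few lines of Python. The sketch below is illustrative. The helper names `average`, `sampling_error` and `mean_absolute_error` are hypothetical. Averaging over every possible sample of a given size is an illustration added here and is not part of the original example. It shows the statement in the text that a larger sample reduces the magnitude of sampling error on average. A particular small sample, such as 500 and 700, can still hit the true value exactly.

```python
from itertools import combinations

# Incomes of the five farmers of Manipur from the example in Section 5.
incomes = [500, 550, 600, 650, 700]


def average(values):
    """Arithmetic mean: add the values up and divide by how many there are."""
    values = list(values)
    return sum(values) / len(values)


population_mean = average(incomes)  # true value of the parameter


def sampling_error(sample):
    """True value minus the sample estimate, as in the text: 600 - 550 = 50."""
    return population_mean - average(sample)


def mean_absolute_error(n):
    """Average size of the sampling error over every possible sample of n
    different farmers."""
    return average(abs(sampling_error(s)) for s in combinations(incomes, n))


print(population_mean)             # 600.0
print(sampling_error([500, 600]))  # 50.0
for n in range(1, len(incomes) + 1):
    print(n, round(mean_absolute_error(n), 2))
```

Run as a script, it prints 600.0 and 50.0, matching the worked example. It then prints average error sizes of 60.0, 35.0, 23.33, 15.0 and 0.0 for samples of one to five farmers. The last case includes every farmer, so it is a census and has no sampling error.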



                                     EXERCISES

    1. Frame at least four appropriate multiple-choice options for the following
       questions:
       (i)   Which of the following is the most important when you buy a new
             dress?
       (ii)  How often do you use computers?
       (iii) Which of the newspapers do you read regularly?
       (iv)  Rise in the price of petrol is justified.
       (v)   What is the monthly income of your family?
    2. Frame five two-way questions (with ‘Yes’ or ‘No’).
    3. (i)   There are many sources of data (true/false).
       (ii)  Telephone survey is the most suitable method of collecting data, when
             the population is literate and spread over a large area (true/false).
       (iii) Data collected by investigator is called the secondary data (true/false).
       (iv)  There is a certain bias involved in the non-random selection of samples
             (true/false).
       (v)   Non-sampling errors can be minimised by taking large samples
             (true/false).
    4. What do you think about the following questions? Do you find any problem
       with these questions? If yes, how?
       (i)   How far do you live from the closest market?
       (ii)  If plastic bags are only 5 percent of our garbage, should it be banned?
       (iii) Wouldn’t you be opposed to increase in price of petrol?
       (iv)  (a) Do you agree with the use of chemical fertilizers?
             (b) Do you use fertilizers in your fields?
             (c) What is the yield per hectare in your field?
    5. You want to research on the popularity of Vegetable Atta Noodles among
       children. Design a suitable questionnaire for collecting this information.
    6. In a village of 200 farms, a study was conducted to find the cropping
       pattern. Out of the 50 farms surveyed, 50% grew only wheat. Identify the
       population and the sample here.
    7. Give two examples each of sample, population and variable.
    8. Which of the following methods gives better results and why?
       (a) Census           (b) Sample
    9. Which of the following errors is more serious and why?
      (a) Sampling error (b) Non-Sampling error
   10. Suppose there are 10 students in your class. You want to select three out
       of them. How many samples are possible?
   11. Discuss how you would use the lottery method to select 3 students out of
       10 in your class.
   12. Does the lottery method always give you a random sample? Explain.
   13. Explain the procedure of selecting a random sample of 3 students out of
       10 in your class, by using random number tables.
   14. Do samples provide better results than surveys? Give reasons for your
       answer.
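Exercises 10, 11 and 13 concern drawing a random sample of 3 students out of 10. As one way to check your reasoning, the Python sketch below counts the possible samples and imitates the lottery method with a pseudo-random generator. It assumes that the order of selection does not matter and that no student can be picked twice; the roll numbers 1 to 10 and the seed are hypothetical choices, and a computer draw only stands in for shuffling chits by hand.

```python
import itertools
import math
import random

# Hypothetical roll numbers for a class of 10 students.
students = list(range(1, 11))
sample_size = 3

# Number of possible samples, assuming order does not matter and no student
# is picked twice: 10C3 = 10! / (3! * 7!).
n_samples = math.comb(len(students), sample_size)

# Cross-check by listing every possible sample explicitly.
all_samples = list(itertools.combinations(students, sample_size))

# Lottery-method sketch: every remaining "chit" is equally likely to be drawn,
# and a drawn chit is not put back (sampling without replacement).
rng = random.Random(7)  # fixed seed only so that the draw can be repeated
sample = rng.sample(students, sample_size)

print("possible samples:", n_samples)
print("one random sample:", sorted(sample))
```

Under these assumptions `n_samples` evaluates to 120, and the explicit list of combinations has the same length. Changing the seed gives a different, equally likely sample.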
                                                                       CHAPTER 3


Organisation of Data


 Studying this chapter should enable you to:
 • classify the data for further statistical analysis;
 • distinguish between quantitative and qualitative classification;
 • prepare a frequency distribution table;
 • know the technique of forming classes;
 • be familiar with the method of tally marking;
 • differentiate between univariate and bivariate frequency
   distributions.

1. INTRODUCTION

In the previous chapter you learnt how data are collected. You also
came to know the difference between census and sampling. In this
chapter, you will learn how the data that you collected are to be
classified. The purpose of classifying raw data is to bring order to
them so that they can be subjected to further statistical analysis
easily.
    Have you ever observed your local junk dealer or kabadiwallah, to
whom you sell old newspapers, broken household items, empty glass
bottles, plastics, etc.? He purchases these things from you and sells
them to those who recycle them. But with so much junk in his shop it
would be very difficult for him to manage his trade if he had not
organised it properly. To ease his situation he suitably groups or
“classifies” the various junk. He puts old newspapers together and
ties them with a rope. Then he collects all empty glass bottles in a
sack. He heaps the articles of metal in one corner of his shop and
sorts them into groups like “iron”, “copper”, “aluminium”, “brass”,
etc. In this way he groups his junk into different classes,
“newspapers”, “plastics”, “glass”, “metals”, etc., and brings order to
them. Once his junk is arranged and classified, it becomes easier for
him to find a particular item that a buyer may demand.
    Likewise, when you arrange your schoolbooks in a certain order, it
becomes easier for you to handle them. You may classify them according
to subjects, where each subject becomes a group or a class. So, when
you need a particular book on history, for instance, all you need to
do is to search for that book in the group “History”. Otherwise, you
would have to search through your entire collection to find the
particular book you are looking for.
    While classification of objects or things saves our valuable time
and effort, it is not done in an arbitrary manner. The kabadiwallah
groups his junk in such a way that each group consists of similar
items. For example, under the group “Glass” he would put empty
bottles, broken mirrors, windowpanes, etc. Similarly, when you
classify your history books under the group “History”, you would not
put a book of a different subject in that group. Otherwise the entire
purpose of grouping would be lost. Classification, therefore, is
arranging or organising similar things into groups or classes.

                            Activity
  •   Visit your local post-office to find out how letters are sorted.
      Do you know what the pin-code in a letter indicates? Ask your
      postman.

2. RAW DATA

Like the kabadiwallah’s junk, the unclassified data or raw data are
highly disorganised. They are often very large and cumbersome to
handle. Drawing meaningful conclusions from them is a tedious task
because they do not yield to statistical methods easily. Therefore
proper organisation and presentation of such data are needed before
any systematic statistical analysis is undertaken. Hence, after
collecting data, the next step is to organise and present them in a
classified form.
    Suppose you want to know the performance of students in
mathematics and you have collected data on marks in mathematics of 100
students of your school. If you present them as a table, they may
appear something like Table 3.1.

                               TABLE 3.1
        Marks in Mathematics Obtained by 100 Students in an Examination

   47   45   10   60   51   56   66  100   49   40
   60   59   56   55   62   48   59   55   51   41
   42   69   64   66   50   59   57   65   62   50
   64   30   37   75   17   56   20   14   55   90
   62   51   55   14   25   34   90   49   56   54
   70   47   49   82   40   82   60   85   65   66
   49   44   64   69   70   48   12   28   55   65
   49   40   25   41   71   80    0   56   14   22
   66   53   46   70   43   61   59   12   30   35
   45   44   57   76   82   39   32   14   90   25

    Or you could have collected data on the monthly expenditure on
food of 50 households in your neighbourhood to know their average
expenditure on food. The data collected, in that case, had you
presented them as a table, would have resembled Table 3.2.

                               TABLE 3.2
      Monthly Household Expenditure (in Rupees) on Food of 50 Households

   1904   1559   3473   1735   2760
   2041   1612   1753   1855   4439
   5090   1085   1823   2346   1523
   1211   1360   1110   2152   1183
   1218   1315   1105   2628   2712
   4248   1812   1264   1183   1171
   1007   1180   1953   1137   2048
   2025   1583   1324   2621   3676
   1397   1832   1962   2177   2575
   1293   1365   1146   3222   1396

    Both Tables 3.1 and 3.2 are raw or unclassified data. In both
tables you find that the numbers are not arranged in any order. Now,
if you are asked for the highest marks in mathematics in Table 3.1,
you have to first arrange the marks of 100 students either in
ascending or in descending order. That is a tedious task. It becomes
more tedious if, instead of 100, you have the marks of 1,000 students
to handle. Similarly, in Table 3.2, you would note that it is
difficult for you to ascertain the average monthly expenditure of 50
households. And this difficulty will go up manifold if the number were
larger, say, 5,000 households. Like our kabadiwallah, who would be
distressed to find a particular item when his junk becomes large and
disarranged, you would face a similar situation when you try to get
any information from raw data that are large. In a word, therefore, it
is a tedious task to pull information from large unclassified data.
    The raw data are summarised, and made comprehensible, by
classification. When facts of similar characteristics are placed in
the same class, it enables one to locate them easily, make
comparisons, and draw inferences without any difficulty. You
have studied in Chapter 2 that the Government of India conducts a
Census of population every ten years. The raw data of the census are
so large and fragmented that it appears an almost impossible task to
draw any meaningful conclusion from them. But when the census data are
classified according to gender, education, marital status,
occupation, etc., the structure and nature of India’s population are
easily understood.
    The raw data consist of observations on variables. Each unit of
raw data is an observation. In Table 3.1 an observation shows a
particular value of the variable “marks of a student in mathematics”.
The raw data contain 100 observations on “marks of a student” since
there are 100 students. In Table 3.2 an observation shows a particular
value of the variable “monthly expenditure of a household on food”.
The raw data in it contain 50 observations on “monthly expenditure on
food of a household” because there are 50 households.

                            Activity
  •   Collect data on the total weekly expenditure of your family for a
      year and arrange it in a table. See how many observations you
      have. Arrange the data monthly and find the number of
      observations.

3. CLASSIFICATION OF DATA

The groups or classes of a classification can be formed in various
ways. Instead of classifying your books according to subjects,
“History”, “Geography”, “Mathematics”, “Science”, etc., you could have
classified them author-wise in alphabetical order. Or, you could also
have classified them according to the year of publication. The way you
want to classify them would depend on your requirement.
    Likewise, raw data could be classified in various ways depending
on the purpose in hand. They can be grouped according to time. Such a
classification is known as a Chronological Classification. In such a
classification, data are classified either in ascending or in
descending order with reference to time such as years, quarters,
months, weeks, etc. The following example shows the population of
India classified in terms of years. The variable “population” is a
Time Series as it depicts a series of values for different years.

Example 1
              Population of India (in crores)

          Year          Population (crores)
          1951                 35.7
          1961                 43.8
          1971                 54.6
          1981                 68.4
          1991                 81.8
          2001                102.7

    In Spatial Classification the data are classified with reference
to geographical locations such as countries, states, cities,
districts, etc. Example 2 shows the yield of wheat in different
countries.
Example 2
          Yield of Wheat for Different Countries

          Country        Yield of wheat (kg/acre)
          America                1925
          Brazil                  127
          China                   893
          Denmark                 225
          France                  439
          India                   862

                            Activities
  •   In the time series of Example 1, in which year do you find the
      population of India to be the minimum? Find the year when it is
      the maximum.
  •   In Example 2, find the country whose yield of wheat is slightly
      more than India’s. How much would that be in terms of percentage?
  •   Arrange the countries of Example 2 in ascending order of yield.
      Do the same exercise for descending order of yield.

    Sometimes you come across characteristics that cannot be expressed
quantitatively. Such characteristics are called Qualities or
Attributes, for example, nationality, literacy, religion, gender,
marital status, etc. They cannot be measured. Yet these attributes can
be classified on the basis of either the presence or the absence of a
qualitative characteristic. Such a classification of data on
attributes is called a Qualitative Classification. In the following
example, the population of a country is grouped on the basis of the
qualitative variable “gender”. An observation could be either male or
female. These two characteristics could be further classified on the
basis of marital status (a qualitative variable) as given below:

Example 3
                              Population

                   Male                        Female

           Married     Unmarried       Married     Unmarried

    The classification at the first stage is based on the presence or
absence of an attribute, i.e. male or not male (female). At the
second stage, each class, male and female, is further subdivided on
the basis of the presence or absence of another attribute, i.e.
whether married or unmarried.

                            Activity
  •   The objects around you can be grouped as either living or
      non-living. Is it a quantitative classification?

    On the other hand, characteristics like height, weight, age,
income, marks of students, etc. are quantitative in nature. When the
collected data of such characteristics are grouped into
classes, the classification is a Quantitative Classification.

Example 4
      Frequency Distribution of Marks in Mathematics of 100 Students

          Marks          Frequency
          0–10                1
          10–20               8
          20–30               6
          30–40               7
          40–50              21
          50–60              23
          60–70              19
          70–80               6
          80–90               5
          90–100              4
          Total             100

    Example 4 shows the quantitative classification of the data on
marks in mathematics of 100 students given in Table 3.1 as a Frequency
Distribution.

                            Activity
  •   Express the values of frequency in Example 4 as a proportion or
      percentage of the total frequency. Note that frequency expressed
      in this way is known as relative frequency.
  •   In Example 4, which class has the maximum concentration of data?
      Express it as a percentage of total observations. Which class has
      the minimum concentration of data?

4. VARIABLES: CONTINUOUS AND DISCRETE

A simple definition of variable, which you read in the last chapter,
does not tell you how it varies. Different variables vary differently
and, depending on the way they vary, they are broadly classified into
two types:
(i) Continuous and
(ii) Discrete.
    A continuous variable can take any numerical value within its
range. It may take integral values (1, 2, 3, 4, ...), fractional
values (1/2, 2/3, 3/4, ...), and values that are not exact fractions
($\sqrt{2} = 1.414$, $\sqrt{3} = 1.732$, …, $\sqrt{7} = 2.646$). For
example, the height of a student, as he/she grows, say, from 90 cm to
150 cm, would take all the values in between them. It can take values
that are whole numbers like 90 cm, 100 cm, 108 cm, 150 cm. It can also
take fractional values like 90.85 cm, 102.34 cm, 149.99 cm, etc. that
are not whole numbers. Thus the variable “height” is capable of
manifesting in every conceivable value and its values can also be
broken down into infinite gradations. Other examples of a continuous
variable are weight, time, distance, etc.
    Unlike a continuous variable, a discrete variable can take only
certain values. Its value changes only by finite “jumps”. It “jumps”
from one value to another but does not take any intermediate value
between them. For example, a variable like the “number of students in
a class”, for different classes, would assume values that are only
whole numbers. It cannot take
any fractional value like 0.5, because “half of a student” is absurd.
Therefore it cannot take a value like 25.5 between 25 and 26. Instead,
its value could be either 25 or 26. What we observe is that as its
value changes from 25 to 26, the values in between them, the
fractions, are not taken by it. But do not get the impression that a
discrete variable cannot take any fractional value. Suppose X is a
variable that takes values like 1/8, 1/16, 1/32, 1/64, ... Is it a
discrete variable? Yes, because though X takes fractional values it
cannot take any value between two adjacent fractional values. It
changes or “jumps” from 1/8 to 1/16 and from 1/16 to 1/32, but it
cannot take a value in between 1/8 and 1/16 or between 1/16 and 1/32.

                            Activity
  •   Classify the following variables as continuous or discrete:
      area, volume, temperature, number appearing on a dice, crop
      yield, population, rainfall, number of cars on road, age.

    Earlier we mentioned that Example 4 is the frequency distribution
of marks in mathematics of 100 students as shown in Table 3.1. It
shows how the marks of 100 students are grouped into classes. You may
be wondering how we got it from the raw data of Table 3.1. But, before
we address this question, you must know what a frequency distribution
is.

5. WHAT IS A FREQUENCY DISTRIBUTION?

A frequency distribution is a comprehensive way to classify raw data
of a quantitative variable. It shows how the different values of a
variable (here, the marks in mathematics scored by a student) are
distributed in different classes along with their corresponding class
frequencies. In this case we have ten classes of marks: 0–10, 10–20,
…, 90–100. The term Class Frequency means the number of values in a
particular class. For example, in the class 30–40 we find 7 values of
marks from the raw data in Table 3.1. They are 30, 37, 34, 30, 35, 39,
32. The frequency of the class 30–40 is thus 7. But you might be
wondering why 40, which occurs three times in the raw data, is not
included in the class 30–40. Had it been included, the class frequency
of 30–40 would have been 10 instead of 7. The puzzle will be clear to
you if you are patient enough to read this chapter carefully. So
carry on. You will find the answer yourself.
    Each class in a frequency distribution table is bounded by Class
Limits. Class limits are the two ends of a class. The lowest value is
called the Lower Class Limit and the highest value the Upper Class
Limit. For example, the class limits for the class 60–70 are 60 and
70. Its lower class limit is 60 and its upper class limit is 70.
Class Interval or Class Width is
the difference between the upper class limit and the lower class
limit. For the class 60–70, the class interval is 10 (upper class
limit minus lower class limit).
    The Class Mid-Point or Class Mark is the middle value of a class.
It lies halfway between the lower class limit and the upper class
limit of a class and can be ascertained in the following manner:

$$\text{Class Mid-Point or Class Mark} = \frac{\text{Upper Class Limit} + \text{Lower Class Limit}}{2} \qquad (1)$$

    The class mark or mid-value of each class is used to represent the
class. Once raw data are grouped into classes, individual observations
are not used in further calculations. Instead, the class mark is used.

                               TABLE 3.3
      The Lower Class Limits, the Upper Class Limits and the Class Mark

   Class     Frequency    Lower Class    Upper Class    Class Mark
                             Limit          Limit
   0–10          1             0             10              5
   10–20         8            10             20             15
   20–30         6            20             30             25
   30–40         7            30             40             35
   40–50        21            40             50             45
   50–60        23            50             60             55
   60–70        19            60             70             65
   70–80         6            70             80             75
   80–90         5            80             90             85
   90–100        4            90            100             95

    Frequency Curve is a graphic representation of a frequency
distribution. Fig. 3.1 shows the diagrammatic presentation of the
frequency distribution of the data in our example above. To obtain the
frequency curve we plot the class marks on the X-axis and frequency on
the Y-axis.

   [Fig. 3.1: Diagrammatic Presentation of Frequency Distribution of Data]

How to prepare a Frequency Distribution?

While preparing a frequency distribution from the raw data of Table
3.1, the following four questions need to be addressed:
1. How many classes should we have?
2. What should be the size of each class?
3. How should we determine the class limits?
4. How should we get the frequency for each class?

How many classes should we have?

    Before we determine the number of classes, we first find out to
what extent the variable in hand changes in value. Such variations in
a variable’s value are captured by its range. The Range is the
difference between the largest and the smallest values of the
variable. A large range indicates that the values of the variable are
widely spread. On the other hand, a small range indicates that the
values of the variable are spread narrowly. In our example the range
of the variable “marks of a student” is 100 because the minimum marks
are 0 and the maximum marks 100. It indicates that the variable has a
large variation.
    After obtaining the value of range, it becomes easier to determine
the number of classes once we decide the class interval. Note that
range is the sum of all class intervals. If the class intervals are
equal then range is the product of the number of classes and the class
interval of a single class.

$$\text{Range} = \text{Number of Classes} \times \text{Class Interval} \qquad (2)$$

                            Activities
  Find the range of the following:
  •   population of India in Example 1,
  •   yield of wheat in Example 2.

    Given the value of range, the number of classes would be large if
we choose small class intervals. A frequency distribution with too
many classes would look too large. Such a distribution is not easy to
handle. So we want to have a reasonably compact set of data. On the
other hand, given the value of range, if we choose a class interval
that is too large then the number of classes becomes too small. The
data set then may be too compact and we may not like the loss of
information about its diversity. For example, suppose the range is 100
and the class interval is 50. Then the number of classes would be just
2 (i.e. 100/50 = 2). Though there is no hard-and-fast rule to
determine the number of classes, the rule of thumb often used is that
the number of classes should be between 5 and 15. In our example we
have chosen to have 10 classes. Since the range is 100 and the class
interval is 10, the number of classes is 100/10 = 10.

What should be the size of each class?

The answer to this question depends on the answer to the previous
question. Equation (2) shows that, given the range of the variable, we
can determine the number of classes once we decide the class interval.
Similarly, we can determine the class interval once we decide the
number of classes. Thus we find that these two decisions are
inter-linked with one another. We cannot decide on one without
deciding on the other.
    In Example 4, we have the number of classes as 10. Given the value
of range as 100, the class intervals are automatically 10 by equation
(2). Note that in the present context we have chosen class intervals
that are equal in magnitude. However, we could have chosen class
intervals that are not of equal magnitude. In that case, the classes
would have been of unequal width.
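The decisions described so far (range, class interval, number of classes, class limits and class marks) can be tried out on the raw data of Table 3.1 with a short Python sketch. It assumes equal class intervals of 10 starting at the smallest observation, and it puts an observation x in the class with lower limit L when L ≤ x < L + 10, which is the Exclusive Method discussed below. As a further assumption, a value equal to the upper limit of the last class (here the single mark of 100) is counted in that last class, so that no observation is dropped. The function name `frequency_distribution` is hypothetical.

```python
import math

# Raw data of Table 3.1: marks in mathematics of 100 students.
MARKS = [
    47, 45, 10, 60, 51, 56, 66, 100, 49, 40,
    60, 59, 56, 55, 62, 48, 59, 55, 51, 41,
    42, 69, 64, 66, 50, 59, 57, 65, 62, 50,
    64, 30, 37, 75, 17, 56, 20, 14, 55, 90,
    62, 51, 55, 14, 25, 34, 90, 49, 56, 54,
    70, 47, 49, 82, 40, 82, 60, 85, 65, 66,
    49, 44, 64, 69, 70, 48, 12, 28, 55, 65,
    49, 40, 25, 41, 71, 80, 0, 56, 14, 22,
    66, 53, 46, 70, 43, 61, 59, 12, 30, 35,
    45, 44, 57, 76, 82, 39, 32, 14, 90, 25,
]


def frequency_distribution(data, first_lower, interval, n_classes):
    """Return (lower limit, upper limit, class mark, frequency) for each class.

    Exclusive method: x belongs to a class when lower <= x < upper.
    Assumption: a value equal to the upper limit of the last class is
    counted in that class, so the largest observation is not dropped.
    """
    rows = []
    for k in range(n_classes):
        lower = first_lower + k * interval
        upper = lower + interval            # upper limit = lower limit + class interval
        is_last = k == n_classes - 1
        freq = sum(
            1 for x in data
            if lower <= x < upper or (is_last and x == upper)
        )
        mark = (upper + lower) / 2          # class mark, equation (1)
        rows.append((lower, upper, mark, freq))
    return rows


data_range = max(MARKS) - min(MARKS)                 # largest minus smallest value
class_interval = 10                                  # chosen, as in Example 4
n_classes = math.ceil(data_range / class_interval)   # from equation (2)

table = frequency_distribution(MARKS, min(MARKS), class_interval, n_classes)
total = sum(row[3] for row in table)

for lower, upper, mark, freq in table:
    # relative frequency = class frequency / total frequency
    print(f"{lower:>3}-{upper:<3}  mark {mark:>5.1f}  f {freq:>2}  rel {freq / total:.2f}")
```

With these choices the sketch reproduces the frequencies of Example 4 (1, 8, 6, 7, 21, 23, 19, 6, 5, 4) and the class marks of Table 3.3, and the last column gives the relative frequencies asked for in the Activity after Example 4. Setting `class_interval` to 50 leaves only 2 classes, the too-compact case described above.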
How should we determine the class limits?
When we classify raw data of a continuous variable as a frequency distribution, we, in effect, group the individual observations into classes. The value of the upper class limit of a class is obtained by adding the class interval to the value of the lower class limit of that class. For example, the upper class limit of the class 20–30 is 20 + 10 = 30, where 20 is the lower class limit and 10 is the class interval. This method is repeated for the other classes as well.
    But how do we decide the lower class limit of the first class? That is to say, why is 0 the lower class limit of the first class, 0–10? It is because we chose the minimum value of the variable as the lower limit of the first class. In fact, we could have chosen a value less than the minimum value of the variable as the lower limit of the first class. Similarly, for the upper class limit of the last class we could have chosen a value greater than the maximum value of the variable. It is important to note that, when a frequency distribution is being constructed, the class limits should be so chosen that the mid-point or class mark of each class coincides, as far as possible, with a value around which the data tend to be concentrated.
    In our example on marks of 100 students, we chose 0 as the lower limit of the first class, 0–10, because the minimum marks were 0. That is why we could not have chosen 1 as the lower class limit of that class: had we done so, we would have excluded the observation 0. The upper class limit of the first class, 0–10, is then obtained by adding the class interval to the lower class limit of the class. Thus the upper class limit of the first class becomes 0 + 10 = 10. This procedure is followed for the other classes as well.
    Have you noticed that the upper class limit of the first class is equal to the lower class limit of the second class? Both are equal to 10. The same is observed for the other classes. Why? The reason is that we have used the Exclusive Method of classification of raw data. Under this method we form classes in such a way that the lower limit of a class coincides with the upper class limit of the previous class.
    The problem we face next is how to classify an observation that is equal not only to the upper class limit of a particular class but also to the lower class limit of the next class. For example, the observation 30 is equal to the upper class limit of the class 20–30 and also to the lower class limit of the class 30–40. In which of the two classes, 20–30 or 30–40, should we put the observation 30? We could put it in either. It is a dilemma that one commonly faces while classifying data in overlapping classes. This problem is solved by the rule of classification in the Exclusive Method.
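The procedure just described can be sketched in Python. The first class starts at the minimum value, each upper limit is the lower limit plus the class interval, and a value equal to a shared limit goes to the class whose lower limit it equals. This is a minimal sketch: `make_classes` and `exclusive_class` are hypothetical helpers. Treating the upper limit of the last class as included is an assumption taken from Table 3.6, which places the maximum mark, 100, in the class 90–100.

```python
def make_classes(lower_first, width, n_classes):
    """Return (lower, upper) limits; each upper limit = lower limit + class interval."""
    classes = []
    lower = lower_first
    for _ in range(n_classes):
        classes.append((lower, lower + width))
        lower += width  # the next class starts where this one ends
    return classes


def exclusive_class(x, classes):
    """Return the class (lower, upper) with lower <= x < upper.

    Assumption: the upper limit of the LAST class is treated as included,
    because Table 3.6 places the maximum mark, 100, in the class 90-100.
    """
    for lower, upper in classes:
        if lower <= x < upper:
            return (lower, upper)
    _, last_upper = classes[-1]
    if x == last_upper:
        return classes[-1]
    raise ValueError(f"{x} lies outside all classes")


classes = make_classes(lower_first=0, width=10, n_classes=10)
print(classes[0], classes[2], classes[-1])  # (0, 10) (20, 30) (90, 100)
print(exclusive_class(30, classes))         # (30, 40), not (20, 30)
print(exclusive_class(40, classes))         # (40, 50)
print(exclusive_class(100, classes))        # (90, 100)
```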
3 2                                                    STATISTICS FOR ECONOMICS

Exclusive Method

The classes, by this method, are formed in such a way that the upper class limit of one class equals the lower class limit of the next class. In this way the continuity of the data is maintained. That is why this method of classification is most suitable for data of a continuous variable. Under this method, the upper class limit is excluded but the lower class limit of a class is included in the interval. Thus an observation that is exactly equal to the upper class limit of a class would, according to the method, not be included in that class but in the next class. On the other hand, if it were equal to the lower class limit, it would be included in that class. In our example on marks of students, the observation 40, which occurs thrice in the raw data of Table 3.1, is not included in the class 30–40. It is included in the next class, 40–50. That is why we find the frequency corresponding to the class 30–40 to be 7 instead of 10.
    There is another method of forming classes, known as the Inclusive Method of classification.

Inclusive Method

In comparison to the Exclusive Method, the Inclusive Method does not exclude the upper class limit from a class interval. It includes the upper class limit in the class. Thus both class limits are parts of the class interval.
    For example, in the frequency distribution of Table 3.4 we include in the class 800–899 those employees whose income is either Rs 800, or between Rs 800 and Rs 899, or Rs 899. If the income of an employee is exactly Rs 900, then he is put in the next class, 900–999.

                    TABLE 3.4
     Frequency Distribution of Incomes of 550
              Employees of a Company

     Income (Rs)       Number of Employees
     800–899                    50
     900–999                   100
     1000–1099                 200
     1100–1199                 150
     1200–1299                  40
     1300–1399                  10
     Total                     550

Adjustment in Class Interval

A close observation of the Inclusive Method in Table 3.4 would show that though the variable “income” is a continuous variable, no such continuity is maintained when the classes are made. We find a “gap” or discontinuity between the upper limit of a class and the lower limit of the next class. For example, between the upper limit of the first class, 899, and the lower limit of the second class, 900, we find a “gap” of 1. How, then, do we ensure the continuity of the variable while classifying data? This is achieved by making an adjustment in the class interval, in the following way:

1. Find the difference between the lower limit of the second class and the upper limit of the first class. For example, in Table 3.4 the lower limit of the second class is 900 and the upper limit of the first class is 899. The difference between them is 1, i.e. (900 – 899 = 1).
2. Divide the difference obtained in (1) by two, i.e. (1/2 = 0.5).
3. Subtract the value obtained in (2) from the lower limits of all classes (lower class limit – 0.5).
4. Add the value obtained in (2) to the upper limits of all classes (upper class limit + 0.5).

                    TABLE 3.5
     Frequency Distribution of Incomes of 550
              Employees of a Company

     Income (Rs)        Number of Employees
     799.5–899.5                 50
     899.5–999.5                100
     999.5–1099.5               200
     1099.5–1199.5              150
     1199.5–1299.5               40
     1299.5–1399.5               10
     Total                      550
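The four adjustment steps can be sketched in Python. This is an illustrative sketch rather than a prescribed procedure: `adjust_inclusive_classes` is a hypothetical helper, it assumes the gap between consecutive classes is the same throughout (as it is in Table 3.4), and it also prints each class mark computed as the average of the adjusted limits.

```python
def adjust_inclusive_classes(classes):
    """classes: list of (lower, upper) inclusive limits in increasing order.

    Assumes the gap between consecutive classes is constant, as in Table 3.4.
    """
    # Step 1: gap between lower limit of 2nd class and upper limit of 1st class
    gap = classes[1][0] - classes[0][1]
    # Step 2: half of that gap
    half = gap / 2
    # Steps 3 and 4: subtract `half` from every lower limit, add it to every upper limit
    return [(lower - half, upper + half) for lower, upper in classes]


table_3_4 = [(800, 899), (900, 999), (1000, 1099),
             (1100, 1199), (1200, 1299), (1300, 1399)]
adjusted = adjust_inclusive_classes(table_3_4)
for lower, upper in adjusted:
    mark = (lower + upper) / 2  # class mark from the adjusted limits
    print(f"{lower}-{upper}  class mark {mark}")
# first line printed: 799.5-899.5  class mark 849.5
```

The adjusted limits reproduce those of Table 3.5, and consecutive classes now share a common boundary, as under the Exclusive Method.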
After the adjustment that restores continuity of data in the frequency distribution, Table 3.4 is modified into Table 3.5.
    After the adjustments in class limits, the equality (1) that determines the value of the class mark is modified as follows:

Adjusted Class Mark = (Adjusted Upper Class Limit + Adjusted Lower Class Limit)/2.

How should we get the frequency for each class?
In simple terms, the frequency of an observation means how many times that observation occurs in the raw data. In our Table 3.1, we observe that the value 40 occurs thrice; 0 and 10 occur only once; 49 occurs five times; and so on. Thus the frequency of 40 is 3, of 0 is 1, of 10 is 1, of 49 is 5, and so on.
                                     TABLE 3.6
               Tally Marking of Marks of 100 Students in Mathematics

Class    Observations                                  Tally                        Frequency   Class Mark
0–10     0                                             /                                1           5
10–20    10, 14, 17, 12, 14, 12, 14, 14                ///// ///                        8          15
20–30    25, 25, 20, 22, 25, 28                        ///// /                          6          25
30–40    30, 37, 34, 39, 32, 30, 35                    ///// //                         7          35
40–50    47, 42, 49, 49, 45, 45, 47, 44, 40, 44,       ///// ///// ///// ///// /       21          45
         49, 46, 41, 40, 43, 48, 48, 49, 49, 40,
         41
50–60    59, 51, 53, 56, 55, 57, 55, 51, 50, 56,       ///// ///// ///// ///// ///     23          55
         59, 56, 59, 57, 59, 55, 56, 51, 55, 56,
         55, 50, 54
60–70    60, 64, 62, 66, 69, 64, 64, 60, 66, 69,       ///// ///// ///// ////          19          65
         62, 61, 66, 60, 65, 62, 65, 66, 65
70–80    70, 75, 70, 76, 70, 71                        ///// /                          6          75
80–90    82, 82, 82, 80, 85                            /////                            5          85
90–100   90, 100, 90, 90                               ////                             4          95
         Total                                                                        100

But when the data are grouped into
classes as in Example 3, the Class Frequency refers to the number of values in a particular class. The counting of class frequency is done by tally marks against the particular class.

Finding class frequency by tally marking
A tally (/) is put against a class for each student whose marks are included in that class. For example, if the marks obtained by a student are 57, we put a tally (/) against the class 50–60. If the marks are 71, a tally is put against the class 70–80. If someone obtains 40 marks, a tally is put against the class 40–50. Table 3.6 shows the tally marking of marks of 100 students in mathematics from Table 3.1.
    The counting of tallies is made easier when four of them are put as //// and the fifth tally is placed across them, making a bundle of five. Tallies are then counted in groups of five. So if there are 16 tallies in a class, we put them as three bundles of five followed by a single / for the sake of convenience. Thus the frequency of a class is equal to the number of tallies against that class.

Loss of Information
The classification of data as a frequency distribution has an inherent shortcoming. While it summarises the raw data, making it concise and comprehensible, it does not show the details that are found in raw data. There is a loss of information in classifying raw data, though much is gained by summarising it as classified data. Once the data are grouped into classes, an individual observation has no significance in further statistical calculations. In Example 4, the class 20–30 contains 6 observations: 25, 25, 20, 22, 25 and 28. So when these data are grouped as the class 20–30 in the frequency distribution, the latter provides only the number of records in that class (i.e. frequency = 6) but not their actual values. All values in this class are assumed to be equal to the middle value of the class interval, or class mark (i.e. 25). Further statistical calculations are based only on the value of the class mark and not on the values of the observations in that class. This is true for other classes as well. Thus the use of class marks instead of the actual values of the observations in statistical methods involves considerable loss of information.

Frequency distribution with unequal classes
By now you are familiar with frequency distributions of equal class intervals. You know how they are constructed out of raw data. But in some cases frequency distributions with unequal class intervals are more appropriate. If you observe the frequency distribution of Example 4, as in Table 3.6, you will notice that most of the observations are concentrated in the classes 40–50, 50–60 and 60–70. Their respective frequencies are 21, 23 and 19. It means that out of 100 observations, 63 (21 + 23 + 19) observations are concentrated in these classes. These classes are densely populated with observations. Thus, 63 per cent of the data lie between 40 and 70. The remaining 37 per cent of the data are in the classes 0–10, 10–20, 20–30, 30–40, 70–80, 80–90 and 90–100. These classes are sparsely populated with observations. Further, you will also notice that observations in these classes deviate more from their respective class marks than those in other classes. But if classes are to be formed in such a way that class marks coincide, as far as possible, with a value around which the observations in a class tend to concentrate, then unequal class intervals are more appropriate.
    Table 3.7 shows the same frequency distribution as Table 3.6 in terms of unequal classes. Each of the classes 40–50, 50–60 and 60–70 is split into two classes. The class 40–50 is divided into 40–45 and 45–50. The class 50–60 is divided into 50–55 and 55–60. And the class 60–70 is divided into 60–65 and 65–70. The new classes 40–45, 45–50, 50–55, 55–60, 60–65 and 65–70 have a class interval of 5. The other classes, 0–10, 10–20, 20–30, 30–40, 70–80, 80–90 and 90–100, retain their old class interval of 10. The last column of this table shows the new values of class marks for these classes. Compare them with the old values of class marks in Table 3.6. Notice that the observations in these classes deviate more from their old class mark values than from their new class mark values. Thus the new class mark values are more representative of the data in these classes than the old values.
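The counting that tally marks do by hand can be sketched in Python. This is a minimal illustration, not a method given in the text. `MARKS` holds the 100 marks copied from the Observations column of Table 3.6, and `class_frequencies` is a hypothetical helper that applies the exclusive rule. It keeps the maximum mark, 100, in the last class, as Table 3.6 does. The same function yields the frequencies of the equal classes (Table 3.6) and of the unequal classes (Table 3.7).

```python
from collections import Counter

# Marks of the 100 students, copied class by class from Table 3.6.
MARKS = [
    0,
    10, 14, 17, 12, 14, 12, 14, 14,
    25, 25, 20, 22, 25, 28,
    30, 37, 34, 39, 32, 30, 35,
    47, 42, 49, 49, 45, 45, 47, 44, 40, 44, 49, 46, 41, 40, 43, 48, 48, 49, 49, 40, 41,
    59, 51, 53, 56, 55, 57, 55, 51, 50, 56, 59, 56, 59, 57, 59, 55, 56, 51, 55, 56, 55, 50, 54,
    60, 64, 62, 66, 69, 64, 64, 60, 66, 69, 62, 61, 66, 60, 65, 62, 65, 66, 65,
    70, 75, 70, 76, 70, 71,
    82, 82, 82, 80, 85,
    90, 100, 90, 90,
]


def class_frequencies(data, limits):
    """Count observations per class using the exclusive rule lo <= x < hi.

    limits: class boundaries in increasing order, e.g. [0, 10, ..., 100].
    The maximum boundary is kept in the last class, as in Table 3.6.
    """
    classes = list(zip(limits[:-1], limits[1:]))
    freq = {c: 0 for c in classes}
    for x in data:
        for lo, hi in classes:
            if lo <= x < hi or x == hi == limits[-1]:
                freq[(lo, hi)] += 1  # one "tally" for this observation
                break
    return freq


# Frequency of individual observations in the raw data
obs_freq = Counter(MARKS)
print(obs_freq[40], obs_freq[49], obs_freq[0])  # 3 5 1

# Equal classes of width 10 (Table 3.6)
equal = class_frequencies(MARKS, list(range(0, 101, 10)))
print(list(equal.values()))  # [1, 8, 6, 7, 21, 23, 19, 6, 5, 4]

# 40-50, 50-60 and 60-70 split into classes of width 5 (Table 3.7)
unequal_limits = [0, 10, 20, 30, 40, 45, 50, 55, 60, 65, 70, 80, 90, 100]
unequal = class_frequencies(MARKS, unequal_limits)
print(list(unequal.values()))  # [1, 8, 6, 7, 9, 12, 7, 16, 10, 9, 6, 5, 4]
```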
                                        TABLE 3.7
                        Frequency Distribution of Unequal Classes

Class    Observations                                            Frequency    Class Mark
0–10     0                                                           1            5
10–20    10, 14, 17, 12, 14, 12, 14, 14                              8           15
20–30    25, 25, 20, 22, 25, 28                                      6           25
30–40    30, 37, 34, 39, 32, 30, 35                                  7           35
40–45    42, 44, 40, 44, 41, 40, 43, 40, 41                          9           42.5
45–50    47, 49, 49, 45, 45, 47, 49, 46, 48, 48, 49, 49             12           47.5
50–55    51, 53, 51, 50, 51, 50, 54                                  7           52.5
55–60    59, 56, 55, 57, 55, 56, 59, 56, 59, 57, 59, 55,
         56, 55, 56, 55                                             16           57.5
60–65    60, 64, 62, 64, 64, 60, 62, 61, 60, 62                     10           62.5
65–70    66, 69, 66, 69, 66, 65, 65, 66, 65                          9           67.5
70–80    70, 75, 70, 76, 70, 71                                      6           75
80–90    82, 82, 82, 80, 85                                          5           85
90–100   90, 100, 90, 90                                             4           95
         Total                                                     100
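The claim that the new class marks represent the split classes better can be checked numerically. As a rough measure, which is our choice and not the text's, the sketch below adds up the absolute distance of each observation in 40–70 from the class mark of its class. It does this first with the classes of Table 3.6 (40–50, 50–60, 60–70) and then with the six classes of Table 3.7. `OBS_40_70` and `total_abs_deviation` are hypothetical names; the data are the 63 marks listed in these classes.

```python
# The 63 marks in the classes 40-50, 50-60 and 60-70 (from Table 3.6)
OBS_40_70 = [
    47, 42, 49, 49, 45, 45, 47, 44, 40, 44, 49, 46, 41, 40, 43, 48, 48, 49, 49, 40, 41,
    59, 51, 53, 56, 55, 57, 55, 51, 50, 56, 59, 56, 59, 57, 59, 55, 56, 51, 55, 56, 55, 50, 54,
    60, 64, 62, 66, 69, 64, 64, 60, 66, 69, 62, 61, 66, 60, 65, 62, 65, 66, 65,
]


def total_abs_deviation(data, limits):
    """Sum of |x - class mark| with each x placed in its class lo <= x < hi."""
    total = 0.0
    for x in data:
        for lo, hi in zip(limits[:-1], limits[1:]):
            if lo <= x < hi:
                total += abs(x - (lo + hi) / 2)  # class mark = (lo + hi) / 2
                break
    return total


old = total_abs_deviation(OBS_40_70, [40, 50, 60, 70])              # Table 3.6 classes
new = total_abs_deviation(OBS_40_70, [40, 45, 50, 55, 60, 65, 70])  # Table 3.7 classes
print(old, new)  # 154.0 100.5
```

The smaller total for the narrower classes agrees with the observation that the new class marks are closer to the data they stand for.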

Figure 3.2 shows the frequency curve of the distribution in Table 3.7. The class marks of the table are plotted on the X-axis and the frequencies are plotted on the Y-axis.

Fig. 3.2: Frequency Curve

                 Activity
  •   If you compare Figure 3.2 with Figure 3.1, what do you observe? Do you find any difference between them? Can you explain the difference?

Frequency array
So far we have discussed the classification of data for a continuous variable using the example of percentage marks of 100 students in mathematics. For a discrete variable, the classification of its data is known as a Frequency Array. Since a discrete variable takes only integral values and no intermediate fractional values between two integral values, we have frequencies that correspond to each of its integral values.
    The example in Table 3.8 illustrates a Frequency Array.

                  TABLE 3.8
     Frequency Array of the Size of Households

     Size of the Household     Number of Households
             1                          5
             2                         15
             3                         25
             4                         35
             5                         10
             6                          5
             7                          3
             8                          2
           Total                      100

The variable “size of the household” is a discrete variable that takes only integral values, as shown in the table. Since it does not take any fractional value between two adjacent integral values, there are no classes in this frequency array. Since there are no classes in a frequency array, there are no class intervals either. And as the classes are absent in a discrete frequency distribution, there is no class mark as well.

6. BIVARIATE FREQUENCY DISTRIBUTION

The frequency distribution of a single variable is called a Univariate Distribution. Example 3.3 shows the univariate distribution of the single variable “marks of a student”. A Bivariate Frequency Distribution is the frequency distribution of two variables.
    Table 3.9 shows the frequency distribution of two variables, sales (in Rs lakh) and advertisement expenditure (in Rs thousand), of 20 companies. The values of sales are classed in different columns and the values of advertisement expenditure are classed in different rows. Each cell shows the frequency of the corresponding row and column values. For example, there are 3 firms whose sales are between Rs 135–145 lakh and whose advertisement expenditures are between Rs 64–66 thousand. The use of a bivariate distribution will be taken up in Chapter 7 on correlation.

                                      TABLE 3.9
 Bivariate Frequency Distribution of Sales (in Lakh Rs) and Advertisement Expenditure
                             (in Thousand Rs) of 20 Firms

Adv. Exp.                         Sales (Rs lakh)
(Rs thousand)  115–125  125–135  135–145  145–155  155–165  165–175  Total
62–64             2        1                                            3
64–66             1                 3                                   4
66–68             1        1        2        1                          5
68–70                      2                 2                          4
70–72                      1        1                 1        1        4
Total             4        5        6        3        1        1       20

7. CONCLUSION

The data collected from primary and secondary sources are raw or unclassified. Once the data are collected, the next step is to classify them for further statistical analysis. Classification brings order to the data. This chapter enables you to know how data can be classified through a frequency distribution in a comprehensive manner. Once you know the techniques of classification, it will be easy for you to construct a frequency distribution for both continuous and discrete variables.
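The two kinds of distribution that close this chapter, the frequency array and the bivariate frequency distribution, can be sketched in Python. The household sizes in the first part are hypothetical and serve only to show one frequency per integral value. The second part stores the cell frequencies of Table 3.9, reading advertisement expenditure as rows and sales as columns, and recomputes the row and column totals printed in the table.

```python
from collections import Counter

# Frequency array for a discrete variable: one frequency per integral value.
# The household sizes below are HYPOTHETICAL, for illustration only.
sizes = [3, 4, 2, 4, 5, 4, 3, 1, 2, 4]
freq_array = sorted(Counter(sizes).items())
print(freq_array)  # [(1, 1), (2, 2), (3, 2), (4, 4), (5, 1)]

# Bivariate frequency distribution: cell frequencies of Table 3.9.
# Rows: advertisement expenditure classes (Rs thousand);
# columns: sales classes (Rs lakh). Empty cells are stored as 0.
sales_classes = ["115-125", "125-135", "135-145", "145-155", "155-165", "165-175"]
table_3_9 = {
    "62-64": [2, 1, 0, 0, 0, 0],
    "64-66": [1, 0, 3, 0, 0, 0],
    "66-68": [1, 1, 2, 1, 0, 0],
    "68-70": [0, 2, 0, 2, 0, 0],
    "70-72": [0, 1, 1, 0, 1, 1],
}

row_totals = {row: sum(cells) for row, cells in table_3_9.items()}
col_totals = [sum(col) for col in zip(*table_3_9.values())]
print(row_totals)  # {'62-64': 3, '64-66': 4, '66-68': 5, '68-70': 4, '70-72': 4}
print(dict(zip(sales_classes, col_totals)))
# {'115-125': 4, '125-135': 5, '135-145': 6, '145-155': 3, '155-165': 1, '165-175': 1}
print(sum(col_totals))  # 20 firms
```

The frequency array needs no class limits because each integral value is its own category. The bivariate table, by contrast, needs a class for each variable and one count per pair of classes.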




                                           Recap
        •    Classification brings order to raw data.
        •    A Frequency Distribution shows how the different values of a variable
             are distributed in different classes along with their corresponding
             class frequencies.
        •    The upper class limit is excluded but the lower class limit is included in
             the Exclusive Method.
        •    Both the upper and the lower class limits are included in the Inclusive
             Method.
        •    In a Frequency Distribution, further statistical calculations are based
             only on the class mark values, instead of values of the observations.
        •    The classes should be formed in such a way that the class mark
             of each class comes as close as possible to a value around
             which the observations in a class tend to concentrate.

                                       EXERCISES

      1. Which of the following alternatives is true?
         (i) The class midpoint is equal to:
             (a) The average of the upper class limit and the lower class limit.
             (b) The product of upper class limit and the lower class limit.
             (c) The ratio of the upper class limit and the lower class limit.
             (d) None of the above.
         (ii) The frequency distribution of two variables is known as
             (a) Univariate Distribution
             (b) Bivariate Distribution
             (c) Multivariate Distribution
             (d) None of the above
         (iii) Statistical calculations in classified data are based on
             (a) the actual values of observations
             (b) the upper class limits
             (c) the lower class limits
             (d) the class midpoints
         (iv) Under Exclusive method,
             (a) the upper class limit of a class is excluded in the class interval
             (b) the upper class limit of a class is included in the class interval
             (c) the lower class limit of a class is excluded in the class interval
             (d) the lower class limit of a class is included in the class interval
         (v) Range is the
             (a) difference between the largest and the smallest observations
             (b) difference between the smallest and the largest observations
             (c) average of the largest and the smallest observations
             (d) ratio of the largest to the smallest observation
      2. Can there be any advantage in classifying things? Explain with an example
         from your daily life.
      3. What is a variable? Distinguish between a discrete and a continuous
         variable.
      4. Explain the ‘exclusive’ and ‘inclusive’ methods used in classification of
         data.
      5. Use the data in Table 3.2 that relate to monthly household expenditure
         (in Rs) on food of 50 households and
         (i) Obtain the range of monthly household expenditure on food.
         (ii) Divide the range into appropriate number of class intervals and obtain
              the frequency distribution of expenditure.
         (iii) Find the number of households whose monthly expenditure on food is
              (a) less than Rs 2000
              (b) more than Rs 3000
              (c) between Rs 1500 and Rs 2500
    6. In a city 45 families were surveyed for the number of domestic appliances
       they used. Prepare a frequency array based on their replies as recorded
       below.

         1     3    2    2    2    2    1    2     1    2    2    3    3    3      3
         3     3    2    3    2    2    6    1     6    2    1    5    1    5      3
         2     4    2    7    4    2    4    3     4    2    0    3    1    4      3
    7. What is ‘loss of information’ in classified data?
    8. Do you agree that classified data is better than raw data?
    9. Distinguish between univariate and bivariate frequency distribution.
   10. Prepare a frequency distribution by inclusive method taking class interval
       of 7 from the following data:

         28    17   15   22   29   21   23   27    18   12   7    2    9    4      6
         1     8    3    10   5    20   16   12    8    4    33   27   21   15     9
         3     36   27   18   9    2    4    6     32   31   29   18   14   13
         15    11   9    7    1    5    37   32    28   26   24   20   19   25
         19    20

                                   Suggested Activity
     •       From your old mark-sheets find the marks that you obtained in
             mathematics in the previous classes. Arrange them year-wise. Check
             whether the marks you have secured in the subject are a variable or
             not. Also see if, over the years, you have improved in mathematics.
CHAPTER 4

Presentation of Data

  Studying this chapter should
  enable you to:
  • present data using tables;
  • represent data using appropriate
     diagrams.

1. INTRODUCTION

You have already learnt in previous chapters how data are collected and organised. As data are generally voluminous, they need to be put in a compact and presentable form. This chapter deals with the presentation of data so that the voluminous data collected can be made readily usable and easily comprehended. There are generally three forms of presentation of data:
•   Textual or Descriptive presentation
•   Tabular presentation
•   Diagrammatic presentation.

2. TEXTUAL PRESENTATION OF DATA

In textual presentation, data are described within the text. When the quantity of data is not too large, this form of presentation is more suitable. Look at the following cases:

Case 1
In a bandh call given on 08 September 2005 protesting the hike in prices of petrol and diesel, 5 petrol pumps were found open and 17 were closed, whereas 2 schools were closed and the remaining 9 schools were found open in a town of Bihar.
PRESENTATION OF DATA                                                                   4 1

Case 2
Census of India 2001 reported that the Indian population had risen to 102 crore, of which only 49 crore were females against 53 crore males. 74 crore people resided in rural India and only 28 crore lived in towns or cities. While there were 62 crore non-workers against 40 crore workers in the entire country, the urban population had an even higher share of non-workers (19 crore) against workers (9 crore) as compared to the rural population, where there were 31 crore workers out of a 74 crore population....
    In both cases data have been presented only in the text. A serious drawback of this method of presentation is that one has to go through the complete text for comprehension; at the same time, it enables one to emphasise certain points of the presentation.

3. TABULAR PRESENTATION OF DATA

In a tabular presentation, data are presented in rows (read horizontally) and columns (read vertically). For example, see Table 4.1 below tabulating information about literacy rates. It has 3 rows (for male, female and total) and 3 columns (for rural, urban and total). It is called a 3 × 3 Table, giving 9 items of information in 9 boxes called the "cells" of the Table. Each cell gives information that relates an attribute of gender ("male", "female" or total) with a number (literacy percentages of rural people, urban people and total). The most important advantage of tabulation is that it organises data for further statistical treatment and decision-making. Classification used in tabulation is of four kinds:
•     Qualitative
•     Quantitative
•     Temporal and
•     Spatial

Qualitative classification
When classification is done according to qualitative characteristics like social status, physical status, nationality, etc., it is called qualitative classification. For example, in Table 4.1 the characteristics for classification are sex and location, which are qualitative in nature.

                      TABLE 4.1
   Literacy in Bihar by sex and location (per cent)

                    Location
Sex            Rural       Urban        Total
Male           57.70       80.80        60.32
Female         30.03       63.30        33.57
Total          44.42       72.71        47.53

Source: Census of India 2001, Provisional Population Totals.

Quantitative classification
In quantitative classification, the data are classified on the basis of
characteristics which are quantitative in nature. In other words, these characteristics can be measured quantitatively. For example, age, height, production, income, etc. are quantitative characteristics. Classes are formed by assigning limits, called class limits, to the values of the characteristic under consideration. Table 4.2 is an example of quantitative classification.

TABLE 4.2
Distribution of 542 respondents by their age in an election study in Bihar

Age group     No. of
(yrs)         respondents     Per cent
20–30               3            0.55
30–40              61           11.25
40–50             132           24.35
50–60             153           28.24
60–70             140           25.83
70–80              51            9.41
80–90               2            0.37
All               542          100.00

Source: Assembly election Patna central constituency 2005, A.N. Sinha Institute of Social Studies, Patna.

Here the classifying characteristic is age in years, which is quantifiable.

Activities
•  Construct a table presenting data on preferential liking of the students of your class for Star News, Zee News, BBC World, CNN, Aaj Tak and DD News.
•  Prepare a table of (i) heights (in cm) and (ii) weights (in kg) of students of your class.

Temporal classification
In this classification time becomes the classifying variable and data are categorised according to time. Time may be in hours, days, weeks, months, years, etc. For example, see Table 4.3.

TABLE 4.3
Yearly sales of a tea shop from 1995 to 2000

Years     Sale (Rs in lakhs)
1995            79.2
1996            81.3
1997            82.4
1998            80.5
1999           100.2
2000            91.2

Data Source: Unpublished data.

In this table the classifying characteristic is year, which takes values on the scale of time.

Activity
•  Go to your library and collect data on the number of books in economics that the library had at the end of each year for the last ten years, and present the data in a table.

Spatial classification
When classification is done in such a way that place becomes the classifying variable, it is called spatial classification. The place may be a village/town, block, district, state, country, etc.

Here the classifying characteristic is the destination of exports (a country or a group of countries of the world). Table 4.4 is an example of spatial classification.
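The Per cent column of Table 4.2 is obtained by dividing each class count by the total of 542 respondents and multiplying by 100. The Python sketch below does this and adds a small helper that places a single age in its class. The helper assumes the exclusive method: the lower limit is included and the upper limit is excluded, so an age of exactly 30 falls in 30–40. The names `classify` and `percent_column` and the sample ages are hypothetical and serve only as illustration.

```python
# Class limits used in Table 4.2: 20-30, 30-40, ..., 80-90 years.
CLASSES = [(lower, lower + 10) for lower in range(20, 90, 10)]


def classify(age):
    """Return the (lower, upper) class of an age.

    Assumes the exclusive method: the lower limit is included and the
    upper limit is excluded, so an age of exactly 30 goes into 30-40.
    """
    for lower, upper in CLASSES:
        if lower <= age < upper:
            return (lower, upper)
    raise ValueError(f"age {age} lies outside the classes 20-90")


def percent_column(counts, digits=2):
    """Express each class count as a per cent of the total count."""
    total = sum(counts)
    return [round(100 * count / total, digits) for count in counts]


# Number of respondents in each age class, taken from Table 4.2.
respondents = [3, 61, 132, 153, 140, 51, 2]
print(sum(respondents))             # 542
print(percent_column(respondents))  # [0.55, 11.25, 24.35, 28.23, 25.83, 9.41, 0.37]

# A few hypothetical individual ages, only to show how classification works.
for age in (23, 30, 58, 89):
    print(age, classify(age))
```

If each share is rounded independently, the 50–60 class comes out at 28.23, whereas Table 4.2 prints 28.24. The independently rounded shares add up to 99.99. The printed value therefore appears to have been adjusted by 0.01 so that the column totals exactly 100.00.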
TABLE 4.4
Export from India to rest of the world in one year as share of total export (per cent)

Destination            Export share
USA                        21.8
Germany                     5.6
Other EU                   14.7
UK                          5.7
Japan                       4.9
Russia                      2.1
Other East Europe           0.6
OPEC                       10.5
Asia                       19.0
Other LDCs                  5.6
Others                      9.5
All                       100.0
(Total Exports: US $ 33658.5 million)

Activity
•  Construct a table presenting data collected from students of your class according to their native states/residential locality.

4.4 TABULATION OF DATA AND PARTS OF A TABLE

To construct a table, it is important to learn first what the parts of a good statistical table are. When put together in a systematically ordered manner, these parts form a table. The simplest way of conceptualising a table is as data presented in rows and columns along with some explanatory notes. Tabulation can be done using one-way, two-way or three-way classification, depending upon the number of characteristics involved. A good table should essentially have the following:

(i) Table Number
A table number is assigned to a table for identification purposes. If more than one table is presented, it is the table number that distinguishes one table from another. It is given at the top or at the beginning of the title of the table. Generally, table numbers are whole numbers in ascending order if there are many tables in a book. Two-part numbers like 1.2, 3.1, etc. are also used to identify a table according to its location. For example, Table 4.5 may be read as the fifth table of the fourth chapter, and so on. (See Table 4.5)

(ii) Title
The title of a table describes the contents of the table. It has to be very clear, brief and carefully worded so that the interpretations made from the table are clear and free from any ambiguity. It is placed at the head of the table, following the table number or just below it. (See Table 4.5).

(iii) Captions or Column Headings
At the top of each column in a table, a column designation is given to explain the figures of the column. This is called a caption or column heading. (See Table 4.5)

(iv) Stubs or Row Headings
Like a caption or column heading, each row of the table has to be given a heading. The designations of the rows are also called stubs or stub items, and the complete left column is known as
stub column. A brief description of the row headings may also be given at the left hand top of the table. (See Table 4.5.)

(v) Body of the Table
The body of a table is its main part and contains the actual data. The location of any one figure in the table is fixed and determined by its row and column. For example, the figure in the second row and fourth column indicates that 25 crore females in rural India were non-workers in 2001. (See Table 4.5.)

(vi) Unit of Measurement
The unit of measurement of the figures in the table (actual data) should always be stated along with the title if the unit does not change throughout the table. If different units are used for rows or columns of the table, these units must be stated along with the 'stubs' or 'captions'. If figures are large, they should be rounded off and the method of rounding should be indicated (See Table 4.5).

Table 4.5 Population of India according to workers and non-workers by gender and location
                                                                        (Crore)
                               Workers
Location   Gender       Main   Marginal   Total     Non-worker    Total
Rural      Male          17        3        20          18          38
           Female         6        5        11          25          36
           Total         23        8        31          43          74
Urban      Male           7        1         8           7          15
           Female         1        0         1          12          13
           Total          8        1         9          19          28
All        Male          24        4        28          25          53
           Female         7        5        12          37          49
           Total         31        9        40          62         102

Source: Census of India 2001
Footnote: Figures are rounded to the nearest crore.

Parts labelled on Table 4.5: Table Number, Title, Units (Crore), Column Headings/Captions, Row Headings/Stubs, Body of the table, Source note and Footnote.

(Note: Table 4.5 presents in tabular form the same data that were already presented through Case 2 in the textual presentation of data.)

(vii) Source Note
It is a brief statement or phrase indicating the source of the data presented in the table. If there is more than one source, all the sources are to be written in the source note. The source note is generally written at the bottom of the table. (See Table 4.5).

(viii) Footnote
The footnote is the last part of the table. It explains any specific feature of the data content of the table that is not self-explanatory and has not been explained earlier.

Activities
•  How many rows and columns are essentially required to form a table?
•  Can the column/row headings of a table be quantitative?

4.5 DIAGRAMMATIC PRESENTATION OF DATA

This is the third method of presenting data. In comparison to tabular or textual presentations, this method gives the quickest understanding of the actual situation that the data describe. Diagrammatic presentation of data translates the highly abstract ideas contained in numbers quite effectively into a more concrete and easily comprehensible form.

Diagrams may be less accurate than tables, but they are much more effective in presenting the data.

There are various kinds of diagrams in common use. Amongst them the important ones are the following:
(i) Geometric diagram
(ii) Frequency diagram
(iii) Arithmetic line graph

Geometric Diagram
Bar diagrams and pie diagrams come in the category of geometric diagrams for presentation of data. Bar diagrams are of three types: simple, multiple and component bar diagrams.

Bar Diagram
Simple Bar Diagram
A bar diagram comprises a group of equispaced and equiwidth rectangular bars, one for each class or category of data. The height or length of a bar reads the magnitude of the data. The lower end of each bar touches the base line, so that the height of a bar starts from the zero unit. The bars of a bar diagram can be visually compared by their relative heights, and so the data are comprehended quickly. Data for this can be of frequency or non-frequency type. For non-frequency type data, a particular characteristic such as production, yield or population is noted at various points of time or for different states. Bars are then drawn with heights corresponding to the values of the characteristic to construct the diagram. The values of the characteristics (measured or counted)
retain the identity of each value. Figure 4.1 is an example of a bar diagram.

TABLE 4.6
Literacy Rates of Major States of India (per cent)

                                       2001                       1991
Major Indian States            Person   Male   Female     Person   Male   Female
Andhra Pradesh (AP)             60.5    70.3    50.4       44.1    55.1    32.7
Assam (AS)                      63.3    71.3    54.6       52.9    61.9    43.0
Bihar (BR)                      47.0    59.7    33.1       37.5    51.4    22.0
Jharkhand (JH)                  53.6    67.3    38.9       41.4    55.8    31.0
Gujarat (GJ)                    69.1    79.7    57.8       61.3    73.1    48.6
Haryana (HR)                    67.9    78.5    55.7       55.8    69.1    40.4
Karnataka (KA)                  66.6    76.1    56.9       56.0    67.3    44.3
Kerala (KE)                     90.9    94.2    87.7       89.8    93.6    86.2
Madhya Pradesh (MP)             63.7    76.1    50.3       44.7    58.5    29.4
Chhattisgarh (CH)               64.7    77.4    51.9       42.9    58.1    27.5
Maharashtra (MR)                76.9    86.0    67.0       64.9    76.6    52.3
Orissa (OR)                     63.1    75.3    50.5       49.1    63.1    34.7
Punjab (PB)                     69.7    75.2    63.4       58.5    65.7    50.4
Rajasthan (RJ)                  60.4    75.7    43.9       38.6    55.0    20.4
Tamil Nadu (TN)                 73.5    82.4    64.4       62.7    73.7    51.3
Uttar Pradesh (UP)              56.3    68.8    42.2       40.7    54.8    24.4
Uttaranchal (UT)                71.6    83.3    59.6       57.8    72.9    41.7
West Bengal (WB)                68.6    77.0    59.6       57.7    67.8    46.6
India                           64.8    75.3    53.7       52.2    64.1    39.3

Fig. 4.1: Bar diagram showing literacy rates (person) of major states of India, 2001.

Activity
•  You had constructed a table presenting the data about the students of your class. Draw a bar diagram for the same table.

Different types of data may require different modes of diagrammatic representation. Bar diagrams are suitable for both frequency type and non-frequency type variables and attributes. Discrete variables like family size, spots on a die, grades in an examination, etc. and attributes such as gender, religion, caste, country, etc. can be represented by bar diagrams. Bar diagrams are more convenient for non-frequency data such as income-expenditure profiles, exports/imports over the years, etc.

A category that has a longer bar (literacy of Kerala) than another category (literacy of West Bengal) has more of the measured (or enumerated) characteristic than the other. Bars (also called columns) are usually used for time series data (food grain produced between 1980–2000, decadal variation in work participation rate, registered unemployed over the years, literacy rates, etc.) (Fig 4.2).

Bar diagrams can have different forms such as the multiple bar diagram and the component bar diagram.

Activities
•  How many states (among the major states of India) had a higher female literacy rate than the national average in 2001?
•  Has the gap between the maximum and minimum female literacy rates over the states declined between the two consecutive census years 1991 and 2001?

Multiple Bar Diagram
Multiple bar diagrams (Fig.4.2) are used for comparing two or more sets of data, for example income and expenditure or import and export for different years, marks obtained in different subjects in different classes, etc.

Component Bar Diagram
Component bar diagrams or charts (Fig.4.3), also called sub-diagrams, are very useful for comparing the sizes of different component parts (the elements or parts that a thing is made up of). They also throw light on the relationship among these integral parts. Examples include sales proceeds from different products, the expenditure pattern in a typical Indian family (components being food, rent, medicine, education, power, etc.), budget outlay for receipts and expenditures, components of the labour force, population, etc. Component bar diagrams are usually shaded or coloured suitably.
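The construction rules for a simple bar diagram given earlier (bars of equal width, all starting from zero, with lengths proportional to the values) can be sketched in plain text. The sketch below draws horizontal bars for the 2001 "Person" literacy rates of a few states from Table 4.6. The function name, the choice of states and the 40-character scale are arbitrary choices made only for illustration.

```python
def text_bar_diagram(values, scale=40):
    """Draw a simple bar diagram as lines of text.

    Every bar starts from zero, has the same width (one text line) and a
    length proportional to its value; the largest value gets `scale` marks.
    """
    largest = max(values.values())
    lines = []
    for label, value in values.items():
        length = round(value / largest * scale)  # proportional bar length
        lines.append(f"{label:<6}|{'#' * length} {value}")
    return lines


# 2001 literacy rates (person, per cent) from Table 4.6.
literacy_2001 = {"KE": 90.9, "MR": 76.9, "WB": 68.6, "India": 64.8, "BR": 47.0}

for line in text_bar_diagram(literacy_2001):
    print(line)
```

Kerala's bar is the longest and West Bengal's is shorter. By the reading rule above, Kerala therefore has more of the measured characteristic (literacy) than West Bengal.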




Fig. 4.2: Multiple bar (column) diagram showing female literacy rates over two census years 1991
and 2001 by major states of India.
Interpretation: It can easily be seen from Figure 4.2 that the female literacy rate increased between 1991 and 2001 in every major state of the country. Other interpretations can similarly be made from the figure. For example, Rajasthan experienced one of the sharpest rises in female literacy: 23.5 percentage points, exceeded only by Chhattisgarh's 24.4 points according to Table 4.6.
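The interpretation of Figure 4.2 can be checked against the female columns of Table 4.6. The sketch below computes each state's rise in female literacy, in percentage points, from 1991 to 2001 and ranks the states. It uses the state codes of Table 4.6; the function name is hypothetical.

```python
# Female literacy rates (per cent) from Table 4.6 as (1991, 2001) pairs.
FEMALE_LITERACY = {
    "AP": (32.7, 50.4), "AS": (43.0, 54.6), "BR": (22.0, 33.1),
    "JH": (31.0, 38.9), "GJ": (48.6, 57.8), "HR": (40.4, 55.7),
    "KA": (44.3, 56.9), "KE": (86.2, 87.7), "MP": (29.4, 50.3),
    "CH": (27.5, 51.9), "MR": (52.3, 67.0), "OR": (34.7, 50.5),
    "PB": (50.4, 63.4), "RJ": (20.4, 43.9), "TN": (51.3, 64.4),
    "UP": (24.4, 42.2), "UT": (41.7, 59.6), "WB": (46.6, 59.6),
}


def rise_by_state(rates):
    """Change in percentage points between the two census years, largest first."""
    rises = {state: round(y2001 - y1991, 1) for state, (y1991, y2001) in rates.items()}
    return sorted(rises.items(), key=lambda item: item[1], reverse=True)


ranked = rise_by_state(FEMALE_LITERACY)
print(ranked[:3])                           # [('CH', 24.4), ('RJ', 23.5), ('MP', 20.9)]
print(all(rise > 0 for _, rise in ranked))  # True: every major state improved
```

On these figures every major state improved. Kerala improved least (1.5 points), from an already high 1991 level. Chhattisgarh's rise of 24.4 points is marginally larger than Rajasthan's 23.5 points.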

TABLE 4.7
Enrolment by gender at schools (per cent) of children aged 6–14 years in a district of Bihar

               Enrolled      Out of school
Gender         (per cent)    (per cent)
Boy               91.5            8.5
Girl              58.6           41.4
All               78.0           22.0

Data Source: Unpublished data

A component bar diagram shows the bar and its sub-divisions into two or more components. For example, the bar might show the total population of children in the age-group of 6–14 years. The components show the proportion of those who are enrolled and those who are not. A component bar diagram might also contain different component bars for boys, girls and the total of children in the given age group range, as shown in Figure 4.3. To construct a component bar diagram, first of all, a bar is constructed on the x-axis with its height equivalent to the total value of the bar [for per cent data the bar height is of 100 units (Figure 4.3)]. Otherwise, the height is equated to the total value of the bar and the proportional heights of the components are worked out using the unitary method. Smaller components are given priority in parting the bar.

Fig. 4.3: Enrolment at primary level in a district of Bihar (Component Bar Diagram)

Pie Diagram
A pie diagram is also a component
diagram, but unlike a component bar diagram, it is a circle whose area is proportionally divided among the components it represents (Fig.4.4). It is also called a pie chart. The circle is divided into as many parts as there are components by drawing straight lines from the centre to the circumference.

Pie charts are usually not drawn with the absolute values of a category. The value of each category is first expressed as a percentage of the total value of all the categories. A circle in a pie chart, irrespective of the value of its radius, is thought of as having 100 equal parts of 3.6° (360°/100) each. To find the angle that a component subtends at the centre of the circle, its percentage figure is multiplied by 3.6°. An example of this conversion of percentages of components into angular components of the circle is shown in Table 4.8.

TABLE 4.8
Distribution of Indian population by their working status (crore)

Status             Population   Per cent   Angular component
Marginal Worker         9          8.8           32°
Main Worker            31         30.4          109°
Non-Worker             62         60.8          219°
All                   102        100.0          360°

Fig. 4.4: Pie diagram for different categories of Indian population according to working status 2001.

It may be interesting to note that data represented by a component bar diagram can also be represented equally well by a pie chart. The only requirement is that the absolute values of the components have to be converted into percentages before they can be used for a pie diagram.

Activities
•  Represent data presented through Figure 4.4 by a component bar diagram.
•  Does the area of a pie have any bearing on the total value of the data to be represented by the pie diagram?

Frequency Diagram
Data in the form of grouped frequency distributions are generally represented by frequency diagrams like the histogram, frequency polygon, frequency curve and ogive.
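The text above describes two conversions for the figures of Table 4.8: percentages into angles for a pie diagram, and component values into segment heights for a component bar by the unitary method. Both can be sketched as follows. The function names and the 10-unit bar height are hypothetical. The sketch also assumes that "giving priority" to smaller components means listing them first, from the base line upwards.

```python
def pie_angles(values):
    """Per cent share and angle (in degrees) of each component of a pie diagram."""
    total = sum(values.values())
    angles = {}
    for name, value in values.items():
        share = 100 * value / total
        # Each 1 per cent of the circle corresponds to 360 / 100 = 3.6 degrees.
        angles[name] = (round(share, 1), round(share * 3.6))
    return angles


def component_heights(values, bar_height):
    """Segment heights of a component bar worked out by the unitary method.

    The whole bar (height `bar_height`) stands for the total, so one unit
    of value corresponds to bar_height / total. Smaller components come first.
    """
    total = sum(values.values())
    per_unit = bar_height / total
    ordered = sorted(values.items(), key=lambda item: item[1])
    return [(name, round(value * per_unit, 2)) for name, value in ordered]


# Indian population by working status, 2001 (crore), from Table 4.8.
working_status = {"Marginal Worker": 9, "Main Worker": 31, "Non-Worker": 62}

print(pie_angles(working_status))
# {'Marginal Worker': (8.8, 32), 'Main Worker': (30.4, 109), 'Non-Worker': (60.8, 219)}
print(component_heights(working_status, bar_height=10))
# [('Marginal Worker', 0.88), ('Main Worker', 3.04), ('Non-Worker', 6.08)]
```

With per cent data and a bar height of 100 units, as in Table 4.7, the segment heights simply equal the percentages. In other data, rounding each angle separately can leave the total a degree away from 360°. Here the three angles add up to exactly 360°, as in Table 4.8.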
Histogram
A histogram is a two-dimensional diagram. It is a set of rectangles with bases as the intervals between class boundaries (along the X-axis) and with areas proportional to the class frequencies (Fig.4.5). If the class intervals are of equal width, which they generally are, the areas of the rectangles are proportional to their respective frequencies. However, for some types of data it is convenient, and at times necessary, to use class intervals of varying width. For example, when tabulating deaths by age at death, it is meaningful as well as useful to have very short age intervals (0, 1, 2, ..., yrs/0, 7, 28, ..., days) at the beginning, when death rates are very high compared with those in most of the higher age segments of the population. For graphical representation of such data, the height of each rectangle is the quotient of the class frequency and the base (the width of the class interval), so that the area of the rectangle represents the frequency. When intervals are equal, that is, when all rectangles have the same base, area can conveniently be represented by the frequency of any interval for purposes of comparison. When the bases vary in width, the heights of the rectangles have to be adjusted to yield comparable measurements. The answer in such a situation is to use frequency density (class frequency divided by the width of the class interval) instead of absolute frequency.

TABLE 4.9
Distribution of daily wage earners in a locality of a town

Daily earning   No. of wage     Cumulative frequency
(Rs)            earners (f)     'Less than'   'More than'
45–49                2               2             85
50–54                3               5             83
55–59                5              10             80
60–64                3              13             75
65–69                6              19             72
70–74                7              26             66
75–79               12              38             59
80–84               13              51             47
85–89                9              60             34
90–94                7              67             25
95–99                6              73             18
100–104              4              77             12
105–109              2              79              8
110–114              3              82              6
115–119              3              85              3

Source: Unpublished data

Since histograms are rectangles, for each class a line parallel to the base line and as long as the class interval is drawn at a vertical distance equal to the frequency (or frequency density) of that class. A histogram is never drawn for discrete variables/data. On an interval or ratio scale, the lower class boundary of a class interval fuses with the upper class boundary of the previous interval, whether the intervals are equal or unequal. The rectangles are therefore all adjacent, and there is no open space between two consecutive rectangles. If the classes are not continuous, they are first converted into continuous classes as discussed in Chapter 3. Sometimes the common portion between two adjacent rectangles (Fig.4.6) is omitted, giving a better impression of continuity. The resulting figure gives the impression of a double staircase.
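The adjustments described above can be sketched in a few lines. Inclusive classes such as 45–49 are first turned into continuous class boundaries (44.5–49.5). Each rectangle's height is then taken as the frequency density, that is, the frequency divided by the class width. The sketch uses the classes and frequencies of Table 4.9. Merging the last three classes into a single 105–119 class is a hypothetical step, added only to show what happens when widths differ. The function names are illustrative.

```python
def to_boundaries(classes):
    """Turn inclusive classes like (45, 49) into continuous boundaries (44.5, 49.5).

    Half of the gap between one class's upper limit and the next class's
    lower limit is taken off every lower limit and added to every upper limit.
    """
    half_gap = (classes[1][0] - classes[0][1]) / 2
    return [(lower - half_gap, upper + half_gap) for lower, upper in classes]


def frequency_density(boundaries, freqs):
    """Height of each histogram rectangle: class frequency / class width."""
    return [f / (upper - lower) for (lower, upper), f in zip(boundaries, freqs)]


# Daily earning classes (Rs) and number of wage earners, from Table 4.9.
classes = [(lower, lower + 4) for lower in range(45, 120, 5)]  # 45-49, ..., 115-119
freqs = [2, 3, 5, 3, 6, 7, 12, 13, 9, 7, 6, 4, 2, 3, 3]

bounds = to_boundaries(classes)
print(bounds[0], sum(freqs))                 # (44.5, 49.5) 85
# With equal widths (5), density is frequency / 5, so the shape is unchanged.
print(frequency_density(bounds, freqs)[:3])  # [0.4, 0.6, 1.0]
# The modal class is the one with the tallest rectangle.
print(classes[freqs.index(max(freqs))])      # (80, 84)

# Hypothetical unequal widths: merge 105-109, 110-114 and 115-119 into 105-119.
merged = [(100, 104), (105, 119)]
merged_freqs = [4, 2 + 3 + 3]
print(frequency_density(to_boundaries(merged), merged_freqs))  # [0.8, 0.5333333333333333]
```

If the merged class (frequency 8) were plotted by raw frequency, it would look twice as tall as the 100–104 class (frequency 4). Its frequency density, about 0.53 earners per rupee of class width against 0.8, shows that it is actually less densely populated.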
    A histogram looks similar to a bar diagram, but there are more differences between the two than may appear at first sight. In a bar diagram, the spacing and the width or the area of bars are all arbitrary. It is the height, and not the width or the area of the bar, that really matters. A single vertical line could have served the same purpose as a bar of the same width. Moreover, in a histogram no space is left between two rectangles, but in a bar diagram some space must be left between consecutive bars (except in a multiple bar or component bar diagram). Although the bars in a bar diagram have the same width, the width of a bar is unimportant for the purpose of comparison, whereas the width in a histogram is as important as its height. We can have a bar diagram for both discrete and continuous variables, but a histogram is drawn only for a continuous variable.
    A histogram also gives the value of the mode of the frequency distribution graphically: in Figure 4.5, the x-coordinate of the dotted vertical line gives the mode.

Fig. 4.5: Histogram for the distribution of 85 daily wage earners in a locality of a town.

Frequency Polygon
A frequency polygon is a plane figure bounded by straight lines, usually four or more lines. It is an alternative to the histogram and is also derived from the histogram itself. A frequency polygon can be fitted to a histogram for studying the shape of the curve. The simplest method of drawing a frequency polygon is to join the midpoints of the top sides of the consecutive rectangles of the histogram. This leaves the two ends of the polygon away from the base line, which prevents calculation of the area under the curve. The solution is to join the two end-points thus obtained to the base line at the mid-values of the two classes with zero frequency immediately at each end of the distribution. Broken lines or dots may join the two ends with the base line. Now the total area under the curve, like the area in the histogram, represents the total frequency or sample size.
    The frequency polygon is the most common method of presenting a grouped frequency distribution. Both class boundaries and class-marks can be used along the X-axis, the distances between two consecutive class marks being proportional/equal to the width of the class intervals. Plotting of data becomes easier if the class-marks fall on the heavy lines of the graph paper. No matter whether class boundaries or midpoints are used on the X-axis, frequencies (as ordinates) are always plotted against the mid-points of class intervals. When all the points have been plotted in the graph, they are carefully joined by a series of short straight lines. Broken lines join the midpoints of two intervals, one at the beginning and the other at the end, with the two ends of the plotted curve (Fig. 4.6). When comparing two or more distributions plotted on the same axes, a frequency polygon is likely to be more useful, since the vertical and horizontal lines of two or more distributions may coincide in a histogram.

Fig. 4.6: Frequency polygon drawn for the data given in Table 4.9

Frequency Curve
The frequency curve is obtained by drawing a smooth freehand curve passing through the points of the frequency polygon as closely as possible. It may not necessarily pass through all the points of the frequency polygon, but it passes through them as closely as possible (Fig. 4.7).

Fig. 4.7: Frequency curve for Table 4.9

Ogive
Ogive is also called cumulative frequency curve. As there are two types of cumulative frequencies, namely ‘less than’ type and ‘more than’ type, there are accordingly two ogives for any grouped frequency distribution data. Here, in place of simple frequencies as in the case of the frequency polygon, cumulative frequencies are plotted along the y-axis against class limits of the frequency distribution. For the ‘less than’ ogive, the cumulative frequencies are plotted against the respective upper limits of the class intervals, whereas for the ‘more than’ ogive the cumulative frequencies are plotted against the respective lower limits of the class intervals. An interesting feature of the two ogives together is that their intersection point gives the median of the frequency distribution (Fig. 4.8(b)). As the shapes of the two ogives suggest, the ‘less than’ ogive is never decreasing and the ‘more than’ ogive is never increasing.

                              TABLE 4.10
           Frequency distribution of marks obtained in mathematics

Marks       Number of       ‘Less than’     ‘More than’
(x)         students (f)    cumulative      cumulative
                            frequency       frequency
0–20             6               6              64
20–40            5              11              58
40–60           33              44              53
60–80           14              58              20
80–100           6              64               6
Total           64
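The two ogives of Table 4.10, and the median at their intersection, can be reproduced with a small Python sketch. It assumes that both ogives are drawn as straight-line segments between the plotted points, with the ‘less than’ curve starting at 0 at the lower limit of the first class and the ‘more than’ curve ending at 0 at the upper limit of the last class; the function names `ogives` and `intersection` are illustrative only.

```python
from itertools import accumulate

# Table 4.10: class limits of marks in mathematics and number of students per class.
limits = [0, 20, 40, 60, 80, 100]
freq = [6, 5, 33, 14, 6]


def ogives(freq):
    """Cumulative frequencies at each class limit.

    'Less than' values belong to upper limits, 'more than' values to lower
    limits; both lists are aligned with `limits`.
    """
    n = sum(freq)
    less_than = [0] + list(accumulate(freq))
    more_than = [n - c for c in less_than]
    return less_than, more_than


def intersection(x, lt, mt):
    """Point where the two ogives cross, found by linear interpolation."""
    for i in range(len(x) - 1):
        gap0, gap1 = lt[i] - mt[i], lt[i + 1] - mt[i + 1]
        if gap0 <= 0 <= gap1 and gap1 != gap0:
            t = -gap0 / (gap1 - gap0)  # fraction of the way along this class
            return x[i] + t * (x[i + 1] - x[i]), lt[i] + t * (lt[i + 1] - lt[i])
    raise ValueError("ogives do not cross")


lt, mt = ogives(freq)
median, height = intersection(limits, lt, mt)
print("less than:", lt)
print("more than:", mt)
print(f"ogives cross at marks = {median:.2f}, cumulative frequency = {height:.1f}")
```

The crossing height is 32, half of the 64 students, which is why the intersection marks the median; under the straight-line assumption above its x-coordinate is about 52.73 marks. The checks in the sketch also confirm that the ‘less than’ list never decreases and the ‘more than’ list never increases.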
Fig. 4.8(a): ‘Less than’ and ‘More than’ ogive for data given in Table 4.10

Fig. 4.8(b): ‘Less than’ and ‘More than’ ogive for data given in Table 4.10

                                 Activity
   •    Can the ogive be helpful in locating the partition values of the
        distribution it represents?

Arithmetic Line Graph
An arithmetic line graph is also called a time series graph and is a method of diagrammatic presentation of data. In it, time (hour, day/date, week, month, year, etc.) is plotted along the x-axis and the value of the variable (time series data) along the y-axis. The line graph obtained by joining these plotted points is called an arithmetic line graph (time series graph). It helps in understanding the trend, periodicity, etc. in long-term time series data.
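Table 4.11, which follows, lists the export and import values plotted in Fig. 4.9. Since the points of an arithmetic line graph are simply (time, value) pairs, the statements that imports exceeded exports throughout and that the gap widened after 1995 can also be checked numerically. The Python sketch below copies the table as printed (it has no row for 1981–82); the name `trade_gap` is our own.

```python
# Table 4.11: exports and imports of India, Rs in 100 crores, copied as printed
# (the printed table has no row for 1981-82).
TRADE = [
    ("1977-78", 54, 60), ("1978-79", 57, 68), ("1979-80", 64, 91),
    ("1980-81", 67, 125), ("1982-83", 88, 143), ("1983-84", 98, 158),
    ("1984-85", 117, 171), ("1985-86", 109, 197), ("1986-87", 125, 201),
    ("1987-88", 157, 222), ("1988-89", 202, 282), ("1989-90", 277, 353),
    ("1990-91", 326, 432), ("1991-92", 440, 479), ("1992-93", 532, 634),
    ("1993-94", 698, 731), ("1994-95", 827, 900), ("1995-96", 1064, 1227),
    ("1996-97", 1186, 1369), ("1997-98", 1301, 1542), ("1998-99", 1416, 1761),
]


def trade_gap(rows):
    """Return (year, imports - exports) for each row, in time order."""
    return [(year, imports - exports) for year, exports, imports in rows]


gaps = trade_gap(TRADE)
for year, gap in gaps:
    print(f"{year}: imports exceed exports by {gap}")

# Gap for the years starting in 1995 or later.
late = [gap for year, gap in gaps if int(year[:4]) >= 1995]
print("gap from 1995-96 onwards:", late)
```

Plotting the year against each of the two value columns, and joining the points, gives the two lines of the graph; the computed gap is positive in every year and rises steadily from 1995–96 to 1998–99.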
                    TABLE 4.11
       Value of Exports and Imports of India
                (Rs in 100 crores)

Year          Exports      Imports
1977–78          54           60
1978–79          57           68
1979–80          64           91
1980–81          67          125
1982–83          88          143
1983–84          98          158
1984–85         117          171
1985–86         109          197
1986–87         125          201
1987–88         157          222
1988–89         202          282
1989–90         277          353
1990–91         326          432
1991–92         440          479
1992–93         532          634
1993–94         698          731
1994–95         827          900
1995–96        1064         1227
1996–97        1186         1369
1997–98        1301         1542
1998–99        1416         1761

    Here you can see from Fig. 4.9 that for the period 1978 to 1999, although imports were more than exports all through, the rate of acceleration went on increasing after 1988–89, and the gap between the two (imports and exports) widened after 1995.

6. CONCLUSION
By now you must have been able to learn how collected data can be presented using various forms of presentation: textual, tabular and diagrammatic. You are now also able to make an appropriate choice of the form of data presentation as well as the type of diagram to be used for a given set of data. Thus you can make presentation of data meaningful, comprehensive and purposeful.

Fig. 4.9: Arithmetic line graph for time series data given in Table 4.11 (Exports and Imports; X-axis: Year, 1978 to 1999; Y-axis: Values in Rs 100 crores; Scale: 1 cm = 200 crores on Y-axis)


                                        Recap
        •   Data (even voluminous data) speak meaningfully through
            presentation.
        •   For a small quantity of data, textual presentation serves the
            purpose better.
        •   For a large quantity of data, tabular presentation helps in
            accommodating any volume of data for one or more variables.
        •   Tabulated data can be presented through diagrams which enable
            quicker comprehension of the facts presented otherwise.




                                    EXERCISES

      Answer the following questions, 1 to 10, choosing the correct answer.
      1. Bar diagram is a
         (i) one-dimensional diagram
         (ii) two-dimensional diagram
         (iii) diagram with no dimension
         (iv) none of the above
      2. Data represented through a histogram can help in finding graphically the
         (i) mean
         (ii) mode
         (iii) median
         (iv) all the above
      3. Ogives can be helpful in locating graphically the
         (i) mode
         (ii) mean
         (iii) median
         (iv) none of the above
      4. Data represented through arithmetic line graph help in understanding
         (i) long term trend
         (ii) cyclicity in data
         (iii) seasonality in data
         (iv) all the above
      5. Width of bars in a bar diagram need not be equal (True/False).
      6. Width of rectangles in a histogram should essentially be equal (True/
         False).
      7. Histogram can only be formed with continuous classification of data
         (True/False).
    8. Histogram and column diagram are the same method of presentation of
       data. (True/False).
    9. Mode of a frequency distribution can be known graphically with the
       help of histogram. (True/False).
   10. Median of a frequency distribution cannot be known from the ogives.
       (True/False).
   11. What kind of diagrams are more effective in representing the following?
       (i) Monthly rainfall in a year
       (ii) Composition of the population of Delhi by religion
       (iii) Components of cost in a factory
   12. Suppose you want to emphasise the increase in the share of urban
       non-workers and lower level of urbanisation in India as shown in
       Example 4.2. How would you do it in the tabular form?
   13. How does the procedure of drawing a histogram differ when class
       intervals are unequal in comparison to equal class intervals in a
       frequency table?
   14. The Indian Sugar Mills Association reported that, ‘Sugar production
       during the first fortnight of December 2001 was about 3,87,000 tonnes,
       as against 3,78,000 tonnes during the same fortnight last year (2000).
       The off-take of sugar from factories during the first fortnight of December
       2001 was 2,83,000 tonnes for internal consumption and 41,000 tonnes
       for exports as against 1,54,000 tonnes for internal consumption and
       nil for exports during the same fortnight last season.’
       (i) Present the data in tabular form.
       (ii) Suppose you were to present these data in diagrammatic form which
            of the diagrams would you use and why?
       (iii) Present these data diagrammatically.
   15. The following table shows the estimated sectoral real growth rates
       (percentage change over the previous year) in GDP at factor cost.
       Year          Agriculture and      Industry     Services
                     allied sectors
       (1)           (2)                  (3)          (4)
       1994–95        5.0                  9.2          7.0
       1995–96       –0.9                 11.8         10.3
       1996–97        9.6                  6.0          7.1
       1997–98       –1.9                  5.9          9.0
       1998–99        7.2                  4.0          8.3
       1999–2000      0.8                  6.9          8.2
       Represent the data as multiple time series graphs.
                                                                      CHAPTER 5


Measures of Central Tendency




  Studying this chapter should enable you to:
  • understand the need for summarising a set of data by one single number;
  • recognise and distinguish between the different types of averages;
  • learn to compute different types of averages;
  • draw meaningful conclusions from a set of data;
  • develop an understanding of which type of average would be most useful
     in a particular situation.

1. INTRODUCTION
In the previous chapter, you have read the tabular and graphic representation of the data. In this chapter, you will study the measures of central tendency, which are numerical methods of describing the data in brief. You can see examples of summarising a large set of data in day-to-day life, such as the average marks obtained by students of a class in a test, average rainfall in an area, average production in a factory, average income of persons living in a locality or working in a firm, etc.
    Baiju is a farmer. He grows food grains on his land in a village called Balapur in Buxar district of Bihar. The village consists of 50 small farmers. Baiju has 1 acre of land. You are interested in knowing the economic condition of small farmers of Balapur. You want to compare the economic condition of Baiju in Balapur village. For this, you may have to evaluate the size of his land holding by comparing it with the size of land holdings of other farmers of Balapur. You may like to see if the land owned by Baiju is:
1. above average in the ordinary sense (see the Arithmetic Mean below)
2. above the size of what half the farmers own (see the Median below)
3. above what most of the farmers own (see the Mode below)
    In order to evaluate Baiju’s relative economic condition, you will have to summarise the whole set of data of land holdings of the farmers of Balapur. This can be done by using a measure of central tendency, which summarises the data in a single value in such a way that this single value can represent the entire data. Measuring central tendency is a way of summarising the data in the form of a typical or representative value.
    There are several statistical measures of central tendency or “averages”. The three most commonly used averages are:
• Arithmetic Mean
• Median
• Mode
    You should note that there are two more types of averages, i.e. Geometric Mean and Harmonic Mean, which are suitable in certain situations. However, the present discussion will be limited to the three types of averages mentioned above.

2. ARITHMETIC MEAN
Suppose the monthly income (in Rs) of six families is given as:
1600, 1500, 1400, 1525, 1625, 1630.
    The mean family income is obtained by adding up the incomes and dividing by the number of families:

$$\bar{X} = \text{Rs}\ \frac{1600 + 1500 + 1400 + 1525 + 1625 + 1630}{6} = \text{Rs}\ \frac{9280}{6} \approx \text{Rs}\ 1{,}547$$

    It implies that, on average, a family earns about Rs 1,547 per month.
    Arithmetic mean is the most commonly used measure of central tendency. It is defined as the sum of the values of all observations divided by the number of observations and is usually denoted by $\bar{X}$. In general, if there are N observations $X_1, X_2, X_3, \dots, X_N$, then the arithmetic mean is given by

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N}$$

where $\sum X$ = sum of all observations and $N$ = total number of observations.

How Arithmetic Mean is Calculated
The calculation of arithmetic mean can be studied under two broad categories:
1. Arithmetic Mean for Ungrouped Data.
2. Arithmetic Mean for Grouped Data.
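The direct formula above, together with the assumed mean and step deviation shortcuts explained in the pages that follow, can be compared on the six family incomes with a short Python sketch. The choices A = 1525 (a centrally located observation) and c = 5 (a common factor of all the deviations from 1525) are ours, made only for illustration, and the function names are likewise hypothetical. All three methods give the same mean, about Rs 1,546.67, which the text rounds to Rs 1,547.

```python
def mean_direct(xs):
    """Direct method: sum of all observations divided by their number."""
    return sum(xs) / len(xs)


def mean_assumed(xs, a):
    """Assumed mean method: A + (sum of deviations d = X - A) / N."""
    d = [x - a for x in xs]
    return a + sum(d) / len(xs)


def mean_step_deviation(xs, a, c):
    """Step deviation method: A + (sum of d' = (X - A) / c) / N * c."""
    d_prime = [(x - a) / c for x in xs]
    return a + sum(d_prime) / len(xs) * c


incomes = [1600, 1500, 1400, 1525, 1625, 1630]  # monthly income (Rs) of six families

print(round(mean_direct(incomes), 2))                        # 1546.67
print(round(mean_assumed(incomes, a=1525), 2))               # 1546.67
print(round(mean_step_deviation(incomes, a=1525, c=5), 2))   # 1546.67
```

Because the deviations cancel exactly as in the formula $\bar{X} = A + \sum d / N$, any value of A (and any non-zero c) gives the same result; a convenient choice only keeps the arithmetic small. The same `mean_direct` applied to the marks 40, 50, 55, 78, 58 of Example 1 gives 56.2.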
Arithmetic Mean for Series of Ungrouped Data

Direct Method
Arithmetic mean by direct method is the sum of all observations in a series divided by the total number of observations.

Example 1
Calculate Arithmetic Mean from the data showing marks of students in a class in an economics test: 40, 50, 55, 78, 58.

$$\bar{X} = \frac{\sum X}{N} = \frac{40 + 50 + 55 + 78 + 58}{5} = \frac{281}{5} = 56.2$$

    The average marks of students in the economics test are 56.2.

Assumed Mean Method
If the number of observations in the data is large and/or the figures are large, it is difficult to compute the arithmetic mean by the direct method. The computation can be made easier by using the assumed mean method.
    In order to save time in calculating the mean from a data set containing a large number of observations as well as large numerical figures, you can use the assumed mean method. Here you assume a particular figure in the data as the arithmetic mean on the basis of logic/experience. Then you take the deviation of each observation from the assumed mean. You can then take the summation of these deviations and divide it by the number of observations in the data. The actual arithmetic mean is obtained by adding the assumed mean and the ratio of the sum of deviations to the number of observations. Symbolically,
Let, A = assumed mean
     X = individual observations
     N = total number of observations
     d = deviation of individual observation from the assumed mean, i.e. d = X – A





    Then the sum of all deviations is taken as $\sum d = \sum (X - A)$.
    Then find $\frac{\sum d}{N}$.
    Then add A and $\frac{\sum d}{N}$ to get $\bar{X}$.
    Therefore,

$$\bar{X} = A + \frac{\sum d}{N}$$

    You should remember that any value, whether existing in the data or not, can be taken as the assumed mean. However, in order to simplify the calculation, a centrally located value in the data can be selected as the assumed mean.

Example 2
The following data show the weekly income of 10 families.

Family                   A    B    C    D     E    F    G     H    I    J
Weekly Income (in Rs)  850  700  100  750  5000   80  420  2500  400  360

    Compute mean family income.

                           TABLE 5.1
        Computation of Arithmetic Mean by Assumed Mean Method

Families    Income (X)    d = X – 850    d' = (X – 850)/10
A               850              0               0
B               700           –150             –15
C               100           –750             –75
D               750           –100             –10
E              5000          +4150            +415
F                80           –770             –77
G               420           –430             –43
H              2500          +1650            +165
I               400           –450             –45
J               360           –490             –49
Total         11160          +2660            +266

Arithmetic mean using assumed mean method:

$$\bar{X} = A + \frac{\sum d}{N} = 850 + \frac{2660}{10} = \text{Rs}\ 1{,}116$$

    Thus, the average weekly income of a family is Rs 1,116. You can check this by using the direct method: $\bar{X} = \sum X / N = 11160/10 = $ Rs 1,116, the same result.

Step Deviation Method
The calculations can be further simplified by dividing all the deviations taken from the assumed mean by the common factor ‘c’. The objective is to avoid large numerical figures, i.e., if d = X – A is very large, then find d'. This can be done as follows:

$$d' = \frac{d}{c} = \frac{X - A}{c}$$

The formula is given below:

$$\bar{X} = A + \frac{\sum d'}{N} \times c$$

where d' = (X – A)/c, c = common factor, N = number of observations, A = assumed mean.
    Thus, you can calculate the arithmetic mean in Example 2 by the step deviation method:

$$\bar{X} = 850 + \frac{266}{10} \times 10 = \text{Rs}\ 1{,}116$$

Calculation of Arithmetic Mean for Grouped Data

Discrete Series

Direct Method
In case of discrete series, frequency against each of the observations is
multiplied by the value of the observation. The values so obtained are summed up and divided by the total number of frequencies. Symbolically,

$$\bar{X} = \frac{\sum fX}{\sum f}$$

where $\sum fX$ = sum of the products of the variables and their frequencies, and
$\sum f$ = sum of frequencies.

Example 3
Calculate the mean farm size of cultivating households in a village for the following data.

Farm Size (in acres):           64    63    62    61    60    59
No. of Cultivating Households:   8    18    12     9     7     6

                    TABLE 5.2
      Computation of Arithmetic Mean by Direct Method

Farm Size      No. of cultivating    fX         d          fd
(in acres) X   households (f)        (1)×(2)    (X – 62)   (2)×(4)
(1)            (2)                   (3)        (4)        (5)
64              8                     512       +2         +16
63             18                    1134       +1         +18
62             12                     744        0           0
61              9                     549       –1          –9
60              7                     420       –2         –14
59              6                     354       –3         –18
Total          60                    3713       –3          –7

Arithmetic mean using direct method,

$$\bar{X} = \frac{\sum fX}{\sum f} = \frac{3713}{60} = 61.88 \text{ acres}$$

Therefore, the mean farm size in the village is 61.88 acres.

Assumed Mean Method
As in the case of individual series, the calculations can be simplified by using the assumed mean method, as described earlier, with a simple modification. Since the frequency (f) of each item is given here, we multiply each deviation (d) by its frequency to get fd. Then we get $\sum fd$. The next step is to get the total of all frequencies, i.e. $\sum f$. Then find $\sum fd / \sum f$. Finally, the arithmetic mean is calculated using the assumed mean method as

$$\bar{X} = A + \frac{\sum fd}{\sum f}$$

Step Deviation Method
In this case the deviations are divided by the common factor c, which simplifies the calculation. Here we estimate

$$d' = \frac{d}{c} = \frac{X - A}{c}$$

in order to reduce the size of the numerical figures for easier calculation. Then get fd' and $\sum fd'$. Finally, the formula for the step deviation method is given as

$$\bar{X} = A + \frac{\sum fd'}{\sum f} \times c$$

Activity
• Find the mean farm size for the data given in Example 3 by using the step deviation and assumed mean methods.

Continuous Series
Here, class intervals are given. The process of calculating arithmetic mean
in case of continuous series is the same as that of a discrete series. The only difference is that the mid-points of the various class intervals are taken. You should note that class intervals may be exclusive or inclusive or of unequal size. An example of exclusive class intervals is 0–10, 10–20 and so on. An example of inclusive class intervals is 0–9, 10–19 and so on. An example of unequal class intervals is 0–20, 20–50 and so on. In all these cases, the arithmetic mean is calculated in a similar way.

Example 4
Calculate the average marks of the following students using (a) direct method (b) step deviation method.

Direct Method
Marks:             0–10   10–20   20–30   30–40   40–50   50–60   60–70
No. of Students:      5      12      15      25       8       3       2

                         TABLE 5.3
              Computation of Average Marks for
          Exclusive Class Interval by Direct Method

Marks      No. of       Mid value   fm          d' = (m – 35)/10   fd'
(x)        students (f) (m)         (2)×(3)
(1)        (2)          (3)         (4)         (5)                (6)
0–10        5            5           25         –3                 –15
10–20      12           15          180         –2                 –24
20–30      15           25          375         –1                 –15
30–40      25           35          875          0                   0
40–50       8           45          360          1                   8
50–60       3           55          165          2                   6
60–70       2           65          130          3                   6
Total      70                      2110                            –34

Steps:
1. Obtain mid values for each class, denoted by m.
2. Obtain $\sum fm$ and apply the direct method formula:

$$\bar{X} = \frac{\sum fm}{\sum f} = \frac{2110}{70} = 30.14 \text{ marks}$$

Step deviation method
1. Obtain $d' = \frac{m - A}{c}$.
2. Take A = 35 (any arbitrary figure) and c = common factor (here, 10, the class width).

$$\bar{X} = A + \frac{\sum fd'}{\sum f} \times c = 35 + \frac{-34}{70} \times 10 = 30.14 \text{ marks}$$

An interesting property of A.M.
It is interesting to know, and useful for checking your calculation, that the sum of deviations of items about the arithmetic mean is always equal to zero. Symbolically, $\sum (X - \bar{X}) = 0$.
However, the arithmetic mean is affected by extreme values. Any large value, on either end, can push it up or down.

Weighted Arithmetic Mean
Sometimes it is important to assign weights to various items according to their importance when you calculate the arithmetic mean. For example, there are two commodities, mangoes and potatoes. You are interested in finding the average price of mangoes (p1) and potatoes (p2). The arithmetic
mean will be $\frac{p_1 + p_2}{2}$. However, you might want to give more importance to the rise in price of potatoes (p2). To do this, you may use as 'weights' the quantity of mangoes (q1) and the quantity of potatoes (q2). Now the arithmetic mean weighted by the quantities would be $\frac{q_1 p_1 + q_2 p_2}{q_1 + q_2}$.

In general, the weighted arithmetic mean is given by

$$\frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum wx}{\sum w}$$

When prices rise, you may be interested in the rise in the prices of the commodities that are more important to you. You will read more about it in the discussion of Index Numbers in Chapter 8.

Activities
• Check this property of the arithmetic mean for the following example:
  X:   4   6   8   10   12
• In the above example, if the mean is increased by 2, then what happens to the individual observations, if all are equally affected?
• If the first three items increase by 2, then what should be the values of the last two items, so that the mean remains the same?
• Replace the value 12 by 96. What happens to the arithmetic mean? Comment.

3. MEDIAN
The arithmetic mean is affected by the presence of extreme values in the data. If you take a measure of central tendency which is based on the middle position of the data, it is not affected by extreme items. Median is that positional value of the variable which divides the distribution into two equal parts: one part comprises all values greater than or equal to the median value and the other comprises all values less than or equal to it. The median is the "middle" element when the data set is arranged in order of magnitude.

Computation of median
The median can be easily computed by sorting the data from smallest to largest and picking the middle value.

Example 5
Suppose we have the following observations in a data set: 5, 7, 6, 1, 8, 10, 12, 4 and 3.
Arranging the data in ascending order, you have:
   1, 3, 4, 5, 6, 7, 8, 10, 12.
The "middle score" is 6, so the median is 6. Half of the scores are larger than 6 and half of the scores are smaller.
If there is an even number of observations in the data, there will be two observations which fall in the middle. The median in this case is computed as the
arithmetic mean of the two middle values.

Example 6
The following data provide the marks of 20 students. You are required to calculate the median marks.
25, 72, 28, 65, 29, 60, 30, 54, 32, 53, 33, 52, 35, 51, 42, 48, 45, 47, 46, 33.
Arranging the data in ascending order, you get
25, 28, 29, 30, 32, 33, 33, 35, 42, 45, 46, 47, 48, 51, 52, 53, 54, 60, 65, 72.
You can see that there are two observations in the middle, namely 45 and 46. The median can be obtained by taking the mean of the two observations:

$$\text{Median} = \frac{45 + 46}{2} = 45.5 \text{ marks}$$

In order to calculate the median, it is important to know the position of the median, i.e. the item or items at which the median lies. The position of the median can be calculated by the following formula:

$$\text{Position of median} = \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ item}$$

where N = number of items.
You may note that the above formula gives you the position of the median in an ordered array, not the median itself. The median is computed by the formula:

$$\text{Median} = \text{size of } \left(\frac{N + 1}{2}\right)^{\text{th}} \text{ item}$$

Discrete Series
In case of a discrete series, the position of the median, i.e. the $\left(\frac{N+1}{2}\right)^{\text{th}}$ item, can be located through cumulative frequency. The corresponding value at this position is the value of the median.

Example 7
The frequency distribution of the number of persons and their respective incomes (in Rs) is given below. Calculate the median income.
Income (in Rs):        10    20    30    40
Number of persons:      2     4    10     4
In order to calculate the median income, you may prepare the frequency distribution as given below.

               TABLE 5.4
  Computation of Median for Discrete Series

Income     No. of         Cumulative
(in Rs)    persons (f)    frequency (cf)
10           2              2
20           4              6
30          10             16
40           4             20

The median is located at the (N+1)/2 = (20+1)/2 = 10.5th observation. This can be easily located through cumulative frequency. The cumulative frequency first reaches 10.5 at 16 (the 7th to 16th persons all earn Rs 30), so the 10.5th observation lies in the row with c.f. 16. The income corresponding to this is Rs 30, so the median income is Rs 30.

Continuous Series
In case of a continuous series, you have to locate the median class where
the $(N/2)^{\text{th}}$ item [not the $\left(\frac{N+1}{2}\right)^{\text{th}}$ item] lies. The median can then be obtained as follows:

$$\text{Median} = L + \frac{N/2 - c.f.}{f} \times h$$

where L = lower limit of the median class,
c.f. = cumulative frequency of the class preceding the median class,
f = frequency of the median class,
h = magnitude of the median class interval.
No adjustment is required if the class intervals are of unequal size or magnitude.

Example 8
The following data relate to the daily wages of persons working in a factory. Compute the median daily wage.
Daily wages (in Rs):   55–60   50–55   45–50   40–45   35–40   30–35   25–30   20–25
Number of workers:         7      13      15      20      30      33      28      14
The data are rearranged in ascending order in Table 5.5.

In the above illustration, the median class is the class containing the (N/2)th item, i.e. the (160/2) = 80th item of the series, which lies in the 35–40 class interval. Applying the formula of the median:

                 TABLE 5.5
  Computation of Median for Continuous Series

Daily wages   No. of         Cumulative
(in Rs)       Workers (f)    Frequency
20–25           14             14
25–30           28             42
30–35           33             75
35–40           30            105
40–45           20            125
45–50           15            140
50–55           13            153
55–60            7            160

$$\text{Median} = L + \frac{N/2 - c.f.}{f} \times h = 35 + \frac{80 - 75}{30} \times (40 - 35) = \text{Rs } 35.83$$

Thus, the median daily wage is Rs 35.83. This means that 50% of the
workers are getting less than or equal to Rs 35.83 and 50% of the workers are getting more than or equal to this wage.
You should remember that the median, as a measure of central tendency, is not sensitive to all the values in the series. It concentrates on the values of the central items of the data.

Activities
• Find the mean and median for each of the four series in Table 5.6. What do you observe?

                 TABLE 5.6
     Mean and Median of different series

Series    X (Variable Values)    Mean    Median
A         1, 2, 3                ?       ?
B         1, 2, 30               ?       ?
C         1, 2, 300              ?       ?
D         1, 2, 3000             ?       ?

• Is median affected by extreme values? What are outliers?
• Is median a better method than mean?

Quartiles
Quartiles are the measures which divide the data into four equal parts, each portion containing an equal number of observations. Thus, there are three quartiles. The first Quartile (denoted by Q1) or lower quartile has 25% of the items of the distribution below it and 75% of the items greater than it. The second Quartile (denoted by Q2) or median has 50% of items below it and 50% of the observations above it. The third Quartile (denoted by Q3) or upper Quartile has 75% of the items of the distribution below it and 25% of the items above it. Thus, Q1 and Q3 denote the two limits within which the central 50% of the data lies.

Percentiles
Percentiles divide the distribution into hundred equal parts, so you can get 99 dividing positions denoted by P1, P2, P3, ..., P99. P50 is the median value. If you have secured the 82nd percentile in a management entrance examination, it means that 18 percent of the candidates who appeared in the examination are ahead of you, and 82 percent are at or below your position. If a total of one lakh students appeared, where do you stand?

Calculation of Quartiles
The method for locating the Quartile is the same as that of the median in case of individual and discrete series. The values of Q1 and Q3 of an ordered series can be obtained by the following formulas, where N is the number of observations.

$$Q_1 = \text{size of } \left(\frac{N + 1}{4}\right)^{\text{th}} \text{ item}$$
$$Q_3 = \text{size of } \left(\frac{3(N + 1)}{4}\right)^{\text{th}} \text{ item}$$

Example 9
Calculate the value of the lower quartile from the data of the marks obtained by ten students in an examination.
22, 26, 14, 30, 18, 11, 35, 41, 12, 32.
Arranging the data in ascending order,
11, 12, 14, 18, 22, 26, 30, 32, 35, 41.

$$Q_1 = \text{size of } \left(\frac{N + 1}{4}\right)^{\text{th}} \text{ item} = \text{size of } \left(\frac{10 + 1}{4}\right)^{\text{th}} \text{ item} = \text{size of } 2.75^{\text{th}} \text{ item}$$

$$= \text{2nd item} + 0.75\,(\text{3rd item} - \text{2nd item}) = 12 + 0.75\,(14 - 12) = 13.5 \text{ marks}$$

Activity
• Find out Q3 yourself.

5. MODE
Sometimes, you may be interested in knowing the most typical value of a series or the value around which the maximum concentration of items occurs. For example, a manufacturer would like to know the size of shoes that has maximum demand or the style of shirt that is most frequently demanded. Here, the mode is the most appropriate measure. The word mode has been derived from the French word "la Mode", which signifies the most fashionable values of a distribution, because it is repeated the highest number of times in the series. Mode is the most frequently observed data value. It is denoted by Mo.

Computation of Mode
Discrete Series
Consider the data set 1, 2, 3, 4, 4, 5. The mode for this data is 4 because 4 occurs most frequently (twice) in the data.

Example 10
Look at the following discrete series:
Variable     10    20    30    40    50
Frequency     2     8    20    10     5
Here, as you can see, the maximum frequency is 20, so the value of the mode is 30. In this case, as there is a unique value of mode, the data is unimodal. But the mode is not necessarily unique, unlike the arithmetic mean and median. You can have data with two modes (bi-modal) or more than two modes (multi-modal). There may be no mode at all if no value appears more frequently than any other value in the distribution. For example, in the series 1, 1, 2, 2, 3, 3, 4, 4, there is no mode.

[Figure: Unimodal Data and Bimodal Data]

Continuous Series
In case of a continuous frequency distribution, the modal class is the class with the largest frequency. Mode can be calculated by using the formula:
$$M_o = L + \frac{D_1}{D_1 + D_2} \times h$$

where L = lower limit of the modal class,
D1 = difference between the frequency of the modal class and the frequency of the class preceding the modal class (ignoring signs),
D2 = difference between the frequency of the modal class and the frequency of the class succeeding the modal class (ignoring signs),
h = class interval of the distribution.
You may note that in case of a continuous series, class intervals should be equal and the series should be exclusive to calculate the mode. If mid points are given, class intervals are to be obtained.

Example 11
Calculate the value of the modal worker family’s monthly income from the following data:
Income per month (in ’000 Rs):    Below 50  Below 45  Below 40  Below 35  Below 30  Below 25  Below 20  Below 15
Number of families:                     97        95        90        80        60        30        12         4
As you can see, this is a case of cumulative frequency distribution. In order to calculate the mode, you will have to convert it into an exclusive series.

                      TABLE 5.7
                    Grouping Table
Income          Group Frequency
(’000 Rs)       I                   II    III    IV    V     VI
45–50           97 – 95 = 2
40–45           95 – 90 = 5         7            17
35–40           90 – 80 = 10              15
30–35           80 – 60 = 20        30                 35
25–30           60 – 30 = 30              50                 60
20–25           30 – 12 = 18        48           68
15–20           12 – 4 = 8                26           56
10–15           4                   12                       30

                      TABLE 5.8
                    Analysis Table
                            Class Intervals
Columns   45–50  40–45  35–40  30–35  25–30  20–25  15–20  10–15
I                                       ×
II                                      ×      ×
III                              ×      ×
IV                               ×      ×      ×
V                                       ×      ×      ×
VI                        ×      ×      ×
Total       –      –      1      3      6      3      1      –

In
this example, the series is in descending order. Grouping and analysis tables are prepared to determine the modal class.
The value of the mode lies in the 25–30 class interval. By inspection also, it can be seen that this is the modal class, since it has the largest frequency (30).
Now L = 25, D1 = (30 – 18) = 12, D2 = (30 – 20) = 10, h = 5.
Using the formula, you can obtain the value of the mode as:

$$M_o \text{ (in ’000 Rs)} = L + \frac{D_1}{D_1 + D_2} \times h = 25 + \frac{12}{12 + 10} \times 5 = 27.727$$

Thus the modal worker family’s monthly income is about Rs 27,727.

Activities
• A shoe company, making shoes for adults only, wants to know the most popular size of shoes. Which average will be most appropriate for it?
• Take a small survey in your class to know the students’ preference for Chinese food using an appropriate measure of central tendency.
• Can mode be located graphically?

6. RELATIVE POSITION OF ARITHMETIC MEAN, MEDIAN AND MODE
Suppose we express
   Arithmetic Mean = Me
   Median = Mi
   Mode = Mo
so that e, i and o are the suffixes. The relative magnitudes of the three are either Me > Mi > Mo or Me < Mi < Mo (the suffixes occurring in alphabetical order). In other words, for a moderately skewed distribution the median lies between the arithmetic mean and the mode.

7. CONCLUSION
Measures of central tendency or averages are used to summarise the data. An average specifies a single most representative value to describe the data set. Arithmetic mean is the most commonly used average. It is simple
to calculate and is based on all the observations. But it is unduly affected by the presence of extreme items. Median is a better summary for such data. Mode is generally used to describe qualitative data. Median and mode can be easily computed graphically. In case of open-ended distributions, they can also be easily computed. Thus, it is important to select an appropriate average depending upon the purpose of analysis and the nature of the distribution.
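The grouped-data formulas for the arithmetic mean can be checked numerically. The following is a minimal Python sketch, not part of the original text, that reproduces Examples 3 and 4 and the zero-sum property of deviations; the function names (`mean_direct`, `mean_assumed`, `mean_step_deviation`, `weighted_mean`) are hypothetical helpers chosen for illustration.

```python
def mean_direct(values, freqs):
    """Direct method: X-bar = sum(fX) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def mean_assumed(values, freqs, a):
    """Assumed mean method: X-bar = A + sum(fd) / sum(f), where d = X - A."""
    sum_fd = sum(f * (x - a) for x, f in zip(values, freqs))
    return a + sum_fd / sum(freqs)

def mean_step_deviation(values, freqs, a, c):
    """Step deviation method: X-bar = A + sum(fd') / sum(f) * c, where d' = (X - A) / c."""
    sum_fd_prime = sum(f * (x - a) / c for x, f in zip(values, freqs))
    return a + sum_fd_prime / sum(freqs) * c

def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(wx) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 3 (discrete series): farm size in acres and number of households
farm_size = [64, 63, 62, 61, 60, 59]
households = [8, 18, 12, 9, 7, 6]
print(round(mean_direct(farm_size, households), 2))       # 61.88
print(round(mean_assumed(farm_size, households, 62), 2))  # 61.88

# Example 4 (continuous series): use the mid value m of each class 0-10, ..., 60-70
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
students = [5, 12, 15, 25, 8, 3, 2]
m = [(lower + upper) / 2 for lower, upper in classes]
print(round(mean_direct(m, students), 2))                  # 30.14
print(round(mean_step_deviation(m, students, 35, 10), 2))  # 30.14

# Property of the A.M.: deviations about the mean sum to zero (Activity data X)
x = [4, 6, 8, 10, 12]
x_bar = sum(x) / len(x)
print(sum(xi - x_bar for xi in x))                         # 0.0

# Frequencies act as weights, so the weighted mean equals the direct-method mean
print(weighted_mean(farm_size, households) == mean_direct(farm_size, households))  # True
```

The last line illustrates a design point: the direct method for a discrete series is itself a weighted arithmetic mean in which the frequencies serve as the weights.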



                                        Recap
      •   The measure of central tendency summarises the data with a single
          value, which can represent the entire data.
      •   Arithmetic mean is defined as the sum of the values of all observations
          divided by the number of observations.
      •   The sum of deviations of items from the arithmetic mean is always
          equal to zero.
      •   Sometimes, it is important to assign weights to various items
          according to their importance.
      •   Median is the central value of the distribution in the sense that the
          number of values less than the median is equal to the number greater
          than the median.
      •   Quartiles divide the total set of values into four equal parts.
      •   Mode is the value which occurs most frequently.
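The positional averages summarised above can be sketched in Python as well. This is an illustrative implementation under stated assumptions: fractional positions such as the 2.75th item are interpolated as in Example 9, continuous classes are given as (lower, upper) pairs in ascending order, and the modal class is taken as the class with the largest frequency rather than being found with grouping and analysis tables. All function names are hypothetical.

```python
import math
from collections import Counter

def positional_value(ordered, pos):
    """Value at a 1-based, possibly fractional position in an ordered list,
    e.g. 2.75th item = 2nd item + 0.75 * (3rd item - 2nd item). Assumes 1 <= pos <= N."""
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

def median(data):
    """Individual series: size of the ((N + 1) / 2)th item."""
    return positional_value(sorted(data), (len(data) + 1) / 2)

def quartiles(data):
    """Q1 = size of ((N + 1) / 4)th item, Q3 = size of (3(N + 1) / 4)th item."""
    ordered, n = sorted(data), len(data)
    return positional_value(ordered, (n + 1) / 4), positional_value(ordered, 3 * (n + 1) / 4)

def median_discrete(values, freqs):
    """Discrete series: first value whose cumulative frequency reaches (N + 1) / 2."""
    target = (sum(freqs) + 1) / 2
    cf = 0
    for x, f in zip(values, freqs):
        cf += f
        if cf >= target:
            return x

def median_continuous(classes, freqs):
    """Continuous series: Median = L + (N/2 - c.f.) / f * h."""
    target = sum(freqs) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= target:
            return lower + (target - cf) / f * (upper - lower)
        cf += f

def modes(data):
    """Most frequent value(s); an empty list means no mode (all values equally frequent)."""
    counts = Counter(data)
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

def mode_continuous(classes, freqs):
    """Mo = L + D1 / (D1 + D2) * h for exclusive, equal-width classes in ascending order."""
    i = max(range(len(freqs)), key=lambda k: freqs[k])
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i + 1 < len(freqs) else 0
    d1, d2 = abs(freqs[i] - before), abs(freqs[i] - after)
    lower, upper = classes[i]
    return lower + d1 / (d1 + d2) * (upper - lower)

# Examples 5 and 6 (individual series)
print(median([5, 7, 6, 1, 8, 10, 12, 4, 3]))   # 6
marks20 = [25, 72, 28, 65, 29, 60, 30, 54, 32, 53, 33, 52, 35, 51, 42, 48, 45, 47, 46, 33]
print(median(marks20))                          # 45.5

# Example 7 (discrete) and Example 8 (continuous, ascending order as in Table 5.5)
print(median_discrete([10, 20, 30, 40], [2, 4, 10, 4]))    # 30
wage_classes = [(20, 25), (25, 30), (30, 35), (35, 40), (40, 45), (45, 50), (50, 55), (55, 60)]
workers = [14, 28, 33, 30, 20, 15, 13, 7]
print(round(median_continuous(wage_classes, workers), 2))  # 35.83

# Example 9: quartiles of ten marks
q1, q3 = quartiles([22, 26, 14, 30, 18, 11, 35, 41, 12, 32])
print(q1, q3)   # 13.5 32.75

# Mode of raw data, including the no-mode case from the text
print(modes([1, 2, 3, 4, 4, 5]), modes([1, 1, 2, 2, 3, 3, 4, 4]))   # [4] []

# Example 11: "below" cumulative frequencies converted to class frequencies
below = [15, 20, 25, 30, 35, 40, 45, 50]
cum = [4, 12, 30, 60, 80, 90, 95, 97]
freqs = [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]
classes = [(b - 5, b) for b in below]   # first class taken as 10-15, as in Table 5.7
print(round(mode_continuous(classes, freqs), 3))   # 27.727 ('000 Rs)
```

Python's standard `statistics` module also provides `median` and `multimode` for raw data; note that `multimode` returns every value when all are equally frequent, whereas this chapter treats that case as having no mode.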




                                    EXERCISES

1. Which average would be suitable in the following cases?
   (i) Average size of readymade garments.
   (ii) Average intelligence of students in a class.
   (iii) Average production in a factory per shift.
   (iv) Average wages in an industrial concern.
   (v) When the sum of absolute deviations from average is least.
   (vi) When quantities of the variable are in ratios.
   (vii) In case of open-ended frequency distribution.
2. Indicate the most appropriate alternative from the multiple choices provided against each question.
   (i) The most suitable average for qualitative measurement is
       (a) arithmetic mean
       (b) median
       (c) mode
       (d) geometric mean
       (e) none of the above
   (ii) Which average is affected most by the presence of extreme items?
       (a) median
       (b) mode
       (c) arithmetic mean
       (d) geometric mean
       (e) harmonic mean
   (iii) The algebraic sum of deviations of a set of n values from A.M. is
       (a) n
       (b) 0
       (c) 1
       (d) none of the above
   [Ans. (i) b (ii) c (iii) b]
3. Comment whether the following statements are true or false.
   (i) The sum of deviations of items from median is zero.
   (ii) An average alone is not enough to compare series.
   (iii) Arithmetic mean is a positional value.
   (iv) Upper quartile is the lowest value of top 25% of items.
   (v) Median is unduly affected by extreme observations.
   [Ans. (i) False (ii) True (iii) False (iv) True (v) False]
4. If the arithmetic mean of the data given below is 28, find (a) the missing frequency, and (b) the median of the series:
   Profit per retail shop (in Rs)   0–10   10–20   20–30   30–40   40–50   50–60
   Number of retail shops             12      18      27       –      17       6
   (Ans. The value of missing frequency is 20 and value of the median is Rs 27.41)
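As a check on this answer, the sketch below (an illustration, not a prescribed method) solves the mean equation for the missing 30–40 frequency and then applies the continuous-series median formula. It assumes exclusive classes and uses class mid-points, as in Example 4; the helper name `median_continuous` is hypothetical.

```python
def median_continuous(classes, freqs):
    """Median = L + (N/2 - c.f.) / f * h, classes in ascending order."""
    target = sum(freqs) / 2
    cf = 0
    for (lower, upper), f in zip(classes, freqs):
        if cf + f >= target:
            return lower + (target - cf) / f * (upper - lower)
        cf += f

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
freqs = [12, 18, 27, None, 17, 6]   # None marks the missing frequency (30-40)
target_mean = 28

mids = [(lo + hi) / 2 for lo, hi in classes]
k = freqs.index(None)
known_f = sum(f for f in freqs if f is not None)
known_fm = sum(f * m for f, m in zip(freqs, mids) if f is not None)

# (known_fm + mids[k] * x) / (known_f + x) = target_mean is linear in x; solve for x
missing = (target_mean * known_f - known_fm) / (mids[k] - target_mean)
freqs[k] = round(missing)

print(missing, round(median_continuous(classes, freqs), 2))   # 20.0 27.41
```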
5. The following table gives the daily income of ten workers in a factory. Find the arithmetic mean.
   Workers                 A     B     C     D     E     F     G     H     I     J
   Daily Income (in Rs)  120   150   180   200   250   300   220   350   370   260
   (Ans. Rs 240)
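For an individual series, the direct method is simply the sum of the observations divided by their number; a minimal Python check:

```python
# Daily income (in Rs) of workers A to J
daily_income = [120, 150, 180, 200, 250, 300, 220, 350, 370, 260]
mean_income = sum(daily_income) / len(daily_income)   # direct method: sum(X) / N
print(mean_income)   # 240.0
```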
6. Following information pertains to the daily income of 150 families. Calculate the arithmetic mean.
   Income (in Rs)       Number of families
   More than  75              150
   More than  85              140
   More than  95              115
   More than 105               95
   More than 115               70
   More than 125               60
   More than 135               40
   More than 145               25
   (Ans. Rs 116.3)
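The "more than" counts must first be converted into class frequencies. The sketch below assumes the open-ended last class (145 and above) has the same width as the others, i.e. 145–155; the printed answer is consistent with that assumption, but the exercise does not state it.

```python
# "More than" lower limits and the number of families above each limit
more_than = [75, 85, 95, 105, 115, 125, 135, 145]
cum = [150, 140, 115, 95, 70, 60, 40, 25]
width = 10   # assumption: the last, open-ended class is 145-155

# Class frequency = difference between successive "more than" counts;
# the last class keeps its own count.
freqs = [cum[k] - cum[k + 1] for k in range(len(cum) - 1)] + [cum[-1]]
mids = [lower + width / 2 for lower in more_than]

mean = sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)
print(freqs)            # [10, 25, 20, 25, 10, 20, 15, 25]
print(round(mean, 1))   # 116.3
```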
7. The size of land holdings of 380 families in a village is given below. Find the median size of land holdings.
   Size of Land Holdings (in acres)   Less than 100   100–200   200–300   300–400   400 and above
   Number of families                            40        89       148        64              39
   (Ans. 241.22 acres)
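A sketch of the median computation, assuming the open-ended classes are 0–100 and 400–500 (hypothetical limits). The median formula only uses the median class and the cumulative frequency before it, so these end limits do not affect the result here.

```python
# End classes "Less than 100" and "400 and above" are assumed to be 0-100 and 400-500
classes = [(0, 100), (100, 200), (200, 300), (300, 400), (400, 500)]
families = [40, 89, 148, 64, 39]

half = sum(families) / 2        # N/2 = 190
cf = 0                          # cumulative frequency before the current class
for (lower, upper), f in zip(classes, families):
    if cf + f >= half:          # the first class whose c.f. reaches N/2 is the median class
        median = lower + (half - cf) / f * (upper - lower)
        break
    cf += f

print(round(median, 2))   # 241.22
```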
   8. The following series relates to the daily income of workers employed in a
      firm. Compute (a) the highest income of the lowest 50% of workers, (b) the
      minimum income earned by the top 25% of workers, and (c) the maximum income
      earned by the lowest 25% of workers.
      Daily Income (in Rs)   10–14   15–19   20–24   25–29   30–34   35–39
      Number of workers         5      10      15      20      10       5
      (Hint: compute the median, the lower quartile and the upper quartile.)
      [Ans. (a) Rs 25.13 (b) Rs 29.19 (c) Rs 19.92]
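The three answers in Exercise 8 are three positional values of the same series. (a) is the median, (b) is the upper quartile Q3 (the income above which the top 25% lie), and (c) is the lower quartile Q1. The sketch below assumes that the inclusive classes 10–14, 15–19, ... are first converted to continuous class boundaries 9.5–14.5, 14.5–19.5, ..., which is the usual practice. `grouped_quantile` is a hypothetical helper that generalises the median formula to any share p.

```python
# Exercise 8: median and quartiles of a grouped series with inclusive classes.
# Assumption: classes are converted to boundaries by subtracting 0.5.

def grouped_quantile(lowers, width, freqs, p):
    """Return L + (p*N - c.f.) / f * h, the value below which a share p lies."""
    n = sum(freqs)
    target = p * n
    cum = 0
    for lower, f in zip(lowers, freqs):
        if cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie in [0, 1]")


limits = [10, 15, 20, 25, 30, 35]          # stated lower class limits
boundaries = [l - 0.5 for l in limits]     # 9.5, 14.5, 19.5, ...
workers = [5, 10, 15, 20, 10, 5]

q1 = grouped_quantile(boundaries, 5, workers, 0.25)
median = grouped_quantile(boundaries, 5, workers, 0.50)
q3 = grouped_quantile(boundaries, 5, workers, 0.75)
print(f"median = {median}, Q1 = {q1:.4f}, Q3 = {q3}")
# median = 25.125, Q1 = 19.9167, Q3 = 29.1875
```

Read this way, (a) is the median, Rs 25.125; (b) is Q3, Rs 29.1875; and (c) is Q1, about Rs 19.92.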
   9. The following table gives the production yield of wheat, in kg per hectare,
      of 150 farms in a village. Calculate the mean, median and mode production
      yield.
      Production yield (kg per hectare)   50–53  53–56  56–59  59–62  62–65  65–68  68–71  71–74  74–77
      Number of farms                        3      8     14     30     36     28     16     10      5
      (Ans. mean = 63.82 kg per hectare, median = 63.67 kg per hectare,
      mode = 63.29 kg per hectare)
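All three answers to Exercise 9 follow from the standard grouped-data formulas with class width h = 3. The mean uses class mid-points. The median interpolates within the class that contains the N/2-th item. The mode interpolates within the class of highest frequency, using the frequencies of its two neighbouring classes. The short Python sketch below is one way to check the figures. It assumes the modal class is not the first or last class, which holds for these data.

```python
# Exercise 9: mean, median and mode of a grouped series (class width h = 3).

lowers = [50, 53, 56, 59, 62, 65, 68, 71, 74]
freqs = [3, 8, 14, 30, 36, 28, 16, 10, 5]
h = 3
n = sum(freqs)

# Mean: sum(f * m) / N, with m the class mid-point.
mean = sum((l + h / 2) * f for l, f in zip(lowers, freqs)) / n

# Median: L + (N/2 - c.f.) / f * h in the class holding the N/2-th item.
cum = 0
for l, f in zip(lowers, freqs):
    if cum + f >= n / 2:
        median = l + (n / 2 - cum) / f * h
        break
    cum += f

# Mode: L + (f1 - f0) / (2*f1 - f0 - f2) * h, where f1 is the highest
# frequency and f0, f2 are the frequencies of the classes on either side.
k = freqs.index(max(freqs))
f0, f1, f2 = freqs[k - 1], freqs[k], freqs[k + 1]
mode = lowers[k] + (f1 - f0) / (2 * f1 - f0 - f2) * h

print(round(mean, 2), round(median, 2), round(mode, 2))  # 63.82 63.67 63.29
```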
CHAPTER

      7                                                  Correlation




Studying this chapter should enable you to:
• understand the meaning of the term correlation;
• understand the nature of relationship between two variables;
• calculate the different measures of correlation;
• analyse the degree and direction of the relationships.

1. INTRODUCTION

In previous chapters you have learnt how to construct summary measures out of a mass of data and how to measure changes among similar variables. Now you will learn how to examine the relationship between two variables.

As the summer heat rises, hill stations are crowded with more and more visitors. Ice-cream sales become more brisk. Thus, the temperature is related to the number of visitors and the sale of ice-creams. Similarly, as the supply of tomatoes increases in your local mandi, its price drops. When the local harvest starts reaching the market, the price of tomatoes drops from a princely Rs 40 per kg to Rs 4 per kg or even less. Thus supply is related to price. Correlation analysis is a means for examining such relationships systematically. It deals with questions such as:
• Is there any relationship between two variables?
• If the value of one variable changes, does the value of the other also change?
• Do both the variables move in the same direction?
• How strong is the relationship?

2. TYPES OF RELATIONSHIP

Let us look at various types of relationship. The relation between movements in quantity demanded and the price of a commodity is an integral part of the theory of demand, which you will read in Class XII. Low rainfall is related to low agricultural productivity. Such examples of relationship may be given a cause and effect interpretation. Others may be just coincidence. The relation between the arrival of migratory birds in a sanctuary and the birth rates in the locality cannot be given any cause and effect interpretation; the relationship is a simple coincidence. The relationship between the size of your shoes and the money in your pocket is another such example. Even if such relationships exist, they are difficult to explain.

In another instance, a third variable's impact on two variables may give rise to a relation between the two variables. Brisk sale of ice-creams may be related to a higher number of deaths due to drowning. The victims are not drowned because they ate ice-creams. Rising temperature leads to brisk sale of ice-creams. Moreover, a large number of people start going to swimming pools to beat the heat. This might have raised the number of deaths by drowning. Thus temperature is behind the high correlation between the sale of ice-creams and deaths due to drowning.

What Does Correlation Measure?

Correlation studies and measures the direction and intensity of relationship among variables. Correlation measures covariation, not causation. Correlation should never be
interpreted as implying cause and effect relation. The presence of correlation between two variables X and Y simply means that when the value of one variable is found to change in one direction, the value of the other variable is found to change either in the same direction (i.e. positive change) or in the opposite direction (i.e. negative change), but in a definite way. For simplicity we assume here that the correlation, if it exists, is linear, i.e. the relative movement of the two variables can be represented by drawing a straight line on graph paper.

Types of Correlation

Correlation is commonly classified into negative and positive correlation. The correlation is said to be positive when the variables move together in the same direction. When income rises, consumption also rises. When income falls, consumption also falls. Sale of ice-cream and temperature move in the same direction. The correlation is negative when they move in opposite directions. When the price of apples falls, its demand increases. When the price rises, its demand decreases. When you spend more time studying, your chances of failing decline. When you spend fewer hours studying, your chances of failing increase. These are instances of negative correlation. The variables move in opposite directions.

3. TECHNIQUES FOR MEASURING CORRELATION

Widely used techniques for the study of correlation are scatter diagrams, Karl Pearson’s coefficient of correlation and Spearman’s rank correlation.

A scatter diagram visually presents the nature of association without giving any specific numerical value. A numerical measure of linear relationship between two variables is given by Karl Pearson’s coefficient of correlation. A relationship is said to be linear if it can be represented by a straight line. Another measure is Spearman’s coefficient of correlation, which measures the linear association between ranks assigned to individual items according to their attributes. Attributes are those variables which cannot be numerically measured, such as intelligence of people, physical appearance, honesty etc.

Scatter Diagram

A scatter diagram is a useful technique for visually examining the form of relationship, without calculating any numerical value. In this technique, the values of the two variables are plotted as points on a graph paper. The cluster of points, so plotted, is referred to as a scatter diagram. From a scatter diagram, one can get a fairly good idea of the nature of relationship. In a scatter diagram the degree of closeness of the scatter points and their overall direction enable us to examine the relationship. If all the points lie on a line, the correlation is perfect and is said to be unity. If the scatter points are widely dispersed around the line, the correlation is low. The correlation is said to be linear if the scatter points lie near a line or on a line.

The scatter diagrams in Fig. 7.1 to Fig. 7.5 give us an idea of the relationship between two variables. Fig. 7.1 shows a scatter around an upward rising line, indicating the movement of the variables in the same direction. When X rises Y will also rise. This is positive correlation. In Fig. 7.2 the points are found to be scattered around a downward sloping line. This time the variables move in opposite directions. When X rises Y falls and vice versa. This is negative correlation. In Fig. 7.3 there is no upward rising or downward sloping line around which the points are scattered. This is an example of no correlation. In Fig. 7.4 and Fig. 7.5 the points are no longer scattered around an upward rising or downward falling line. The points themselves are on the lines. This is referred to as perfect positive correlation and perfect negative correlation respectively.

Activity
• Collect data on height, weight and marks scored by students in your class in any two subjects in Class X. Draw the scatter diagram of these variables taking two at a time. What type of relationship do you find?

Inspection of the scatter diagram gives an idea of the nature and intensity of the relationship.

Karl Pearson’s Coefficient of Correlation

This is also known as product moment correlation and simple correlation coefficient. It gives a precise numerical value of the degree of linear relationship between two variables X and Y. The linear relationship may be given by

$$Y = a + bX$$

This type of relation may be described by a straight line. The intercept that the line makes on the Y-axis is given by a and the slope of the line is given by b. It gives the change in the value of Y for a unit change in the value of X. On the other hand, if the relation cannot be represented by a straight line, as in

$$Y = X^2$$

the value of the coefficient can be zero, for instance when the values of X are spread symmetrically around zero. It clearly shows that zero correlation need not mean absence of any type of relation between the two variables.

Let X1, X2, ..., XN be N values of X and Y1, Y2, ..., YN be the corresponding values of Y. In the subsequent presentations the subscripts indicating the unit are dropped for the sake of simplicity. The arithmetic means of X and Y are defined as

$$\bar{X} = \frac{\Sigma X}{N}; \qquad \bar{Y} = \frac{\Sigma Y}{N}$$

and their variances are as follows

$$\sigma_x^2 = \frac{\Sigma (X - \bar{X})^2}{N} = \frac{\Sigma X^2}{N} - \bar{X}^2$$
and

$$\sigma_y^2 = \frac{\Sigma (Y - \bar{Y})^2}{N} = \frac{\Sigma Y^2}{N} - \bar{Y}^2$$

The standard deviations of X and Y respectively are the positive square roots of their variances. Covariance of X and Y is defined as

$$\mathrm{Cov}(X, Y) = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{N} = \frac{\Sigma xy}{N}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$ are the deviations of the ith value of X and Y from their mean values respectively.

The sign of covariance between X and Y determines the sign of the correlation coefficient. The standard deviations are always positive. If the covariance is zero, the correlation coefficient is always zero. The product moment correlation or the Karl Pearson’s measure of correlation is given by

$$r = \frac{\Sigma xy}{N \sigma_x \sigma_y} \qquad \text{...(1)}$$

or

$$r = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{\sqrt{\Sigma (X - \bar{X})^2}\,\sqrt{\Sigma (Y - \bar{Y})^2}} \qquad \text{...(2)}$$

or

$$r = \frac{\Sigma XY - \frac{(\Sigma X)(\Sigma Y)}{N}}{\sqrt{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}\,\sqrt{\Sigma Y^2 - \frac{(\Sigma Y)^2}{N}}} \qquad \text{...(3)}$$

or

$$r = \frac{N\Sigma XY - (\Sigma X)(\Sigma Y)}{\sqrt{N\Sigma X^2 - (\Sigma X)^2}\,\sqrt{N\Sigma Y^2 - (\Sigma Y)^2}} \qquad \text{...(4)}$$

Properties of Correlation Coefficient

Let us now discuss the properties of the correlation coefficient.
• r has no unit. It is a pure number. It means units of measurement are not part of r. For instance, r between height in feet and weight in kilograms might be, say, 0.7, with no unit attached.
• A negative value of r indicates an inverse relation. A change in one variable is associated with change in the other variable in the opposite direction. When the price of a commodity rises, its demand falls. When the rate of interest rises, the demand for funds falls, because funds have now become costlier.
• If r is positive the two variables move in the same direction. When the price of coffee, a substitute for tea, rises, the demand for tea also rises. Improvement in irrigation facilities is associated with higher yield. When temperature rises the sale of ice-creams becomes brisk.
• If r = 0 the two variables are uncorrelated. There is no linear relation between them. However, other types of relation may be there.
• If r = 1 or r = –1 the correlation is perfect. The relation between them is exact.
• A high value of r indicates a strong linear relationship. Its value is said to be high when it is close to +1 or –1.
• A low value of r indicates a weak linear relation. Its value is said to be low when it is close to zero.
• The value of the correlation coefficient lies between minus one and plus one, –1 ≤ r ≤ 1. If, in any exercise, the value of r is outside this range, it indicates an error in calculation.
• The value of r is unaffected by the change of origin and change of scale. Given two variables X and Y, let us define two new variables

$$U = \frac{X - A}{B}; \qquad V = \frac{Y - C}{D}$$

where A and C are assumed means of X and Y respectively, and B and D are common factors (both positive). Then

$$r_{xy} = r_{uv}$$

This property is used to calculate the correlation coefficient in a highly simplified manner, as in the step deviation method.

As you have read in Chapter 1, statistical methods are no substitute for common sense. Here is another example, which highlights the need for understanding the data properly before correlation is calculated. An epidemic spreads in some villages and the government sends a team of doctors to the affected villages. The correlation between the number of deaths and the number of doctors sent to the villages is found to be positive. Normally the health care facilities provided by the doctors are expected to reduce the number of deaths, showing a negative correlation. The positive correlation arose for other reasons. The data relate to a specific time period. Many of the reported deaths could be terminal cases where the doctors could do little. Moreover, the benefit of the presence of doctors becomes visible only after some time. It is also possible that the reported deaths are not due to the epidemic: a tsunami may suddenly hit the state and raise the death toll.

Let us illustrate the calculation of r by examining the relationship between years of schooling of the farmer and the annual yield per acre.

Example 1

No. of years of schooling of farmers     Annual yield per acre in ’000 (Rs)
                 0                                       4
                 2                                       4
                 4                                       6
                 6                                      10
                 8                                      10
                10                                       8
                12                                       7

Formula (1) needs the values of $\Sigma xy$, $\sigma_x$ and $\sigma_y$.
TABLE 7.1
Calculation of r between years of schooling of farmers and annual yield

Years of        (X–X̄)   (X–X̄)²   Annual yield per       (Y–Ȳ)   (Y–Ȳ)²   (X–X̄)(Y–Ȳ)
education (X)                      acre in ’000 Rs (Y)
      0           –6       36              4               –3        9          18
      2           –4       16              4               –3        9          12
      4           –2        4              6               –1        1           2
      6            0        0             10                3        9           0
      8            2        4             10                3        9           6
     10            4       16              8                1        1           4
     12            6       36              7                0        0           0
ΣX = 42                Σ(X–X̄)² = 112   ΣY = 49                Σ(Y–Ȳ)² = 38   Σ(X–X̄)(Y–Ȳ) = 42

From Table 7.1 we get

$$\Sigma xy = 42, \qquad \sigma_x = \sqrt{\frac{\Sigma (X - \bar{X})^2}{N}} = \sqrt{\frac{112}{7}}, \qquad \sigma_y = \sqrt{\frac{\Sigma (Y - \bar{Y})^2}{N}} = \sqrt{\frac{38}{7}}$$

Substituting these values in formula (1),

$$r = \frac{42}{7\sqrt{\frac{112}{7}}\,\sqrt{\frac{38}{7}}} = 0.644$$

The same value can be obtained from formula (2) also.

$$r = \frac{\Sigma (X - \bar{X})(Y - \bar{Y})}{\sqrt{\Sigma (X - \bar{X})^2}\,\sqrt{\Sigma (Y - \bar{Y})^2}} = \frac{42}{\sqrt{112}\,\sqrt{38}} = 0.644 \qquad \text{...(2)}$$

Thus years of education of the farmers and annual yield per acre are positively correlated. The value of r is also fairly high. It suggests that farmers who have spent more years in education tend to get a higher yield per acre. It underlines the importance of farmers’ education.

To use formula (3),

$$r = \frac{\Sigma XY - \frac{(\Sigma X)(\Sigma Y)}{N}}{\sqrt{\Sigma X^2 - \frac{(\Sigma X)^2}{N}}\,\sqrt{\Sigma Y^2 - \frac{(\Sigma Y)^2}{N}}} \qquad \text{...(3)}$$

the values of $\Sigma XY$, $\Sigma X^2$ and $\Sigma Y^2$ have to be calculated. Now apply formula (3) to get the value of r.

Let us now look at the interpretation of different values of r. The correlation coefficient between marks secured in English and Statistics is, say, 0.1. It means that though the marks secured in the two subjects are positively correlated, the strength of the relationship is weak. Students with high marks in English may be getting relatively low marks in Statistics. Had the value of r been, say, 0.9, students with high marks in English would generally also get high marks in Statistics.
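As a cross-check on the hand calculation, here is a Python sketch that evaluates Pearson’s r for Example 1 with each of formulas (1) to (4). All four give the 0.644 obtained above. The same sketch also tests the earlier remark about Y = X²: when X is spread symmetrically around zero, r comes out as exactly 0 even though Y depends exactly on X. The function names are illustrative only.

```python
# Pearson's r for Example 1 via formulas (1)-(4), and a check that r can be 0
# for an exact but non-linear relation (Y = X**2, X symmetric about zero).
from math import sqrt


def deviations(values):
    """Deviations from the arithmetic mean: x = X - mean(X)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def r_formula1(X, Y):
    """(1): r = sum(xy) / (N * sigma_x * sigma_y)."""
    n = len(X)
    x, y = deviations(X), deviations(Y)
    sigma_x = sqrt(sum(v * v for v in x) / n)
    sigma_y = sqrt(sum(v * v for v in y) / n)
    return sum(a * b for a, b in zip(x, y)) / (n * sigma_x * sigma_y)


def r_formula2(X, Y):
    """(2): r = sum(xy) / (sqrt(sum x^2) * sqrt(sum y^2))."""
    x, y = deviations(X), deviations(Y)
    return sum(a * b for a, b in zip(x, y)) / (
        sqrt(sum(v * v for v in x)) * sqrt(sum(v * v for v in y)))


def raw_sums(X, Y):
    """Sum X, sum Y, sum XY, sum X^2, sum Y^2 used by formulas (3) and (4)."""
    return (sum(X), sum(Y), sum(a * b for a, b in zip(X, Y)),
            sum(a * a for a in X), sum(b * b for b in Y))


def r_formula3(X, Y):
    """(3): raw sums, each corrected by (sum)*(sum)/N."""
    n = len(X)
    sX, sY, sXY, sXX, sYY = raw_sums(X, Y)
    return (sXY - sX * sY / n) / (sqrt(sXX - sX ** 2 / n) * sqrt(sYY - sY ** 2 / n))


def r_formula4(X, Y):
    """(4): formula (3) with numerator and denominator multiplied by N."""
    n = len(X)
    sX, sY, sXY, sXX, sYY = raw_sums(X, Y)
    return (n * sXY - sX * sY) / (sqrt(n * sXX - sX ** 2) * sqrt(n * sYY - sY ** 2))


schooling = [0, 2, 4, 6, 8, 10, 12]   # X: years of schooling
yield_k = [4, 4, 6, 10, 10, 8, 7]     # Y: annual yield per acre, '000 Rs

formulas = (r_formula1, r_formula2, r_formula3, r_formula4)
print([round(f(schooling, yield_k), 3) for f in formulas])  # [0.644, 0.644, 0.644, 0.644]

# Y = X**2 with X symmetric about zero: sum(xy) = sum(X**3) = 0, so r = 0.
X = [-3, -2, -1, 0, 1, 2, 3]
print(r_formula3(X, [v ** 2 for v in X]))  # 0.0
```

Formulas (3) and (4) avoid computing deviations, which is why they are convenient when working by hand from raw totals.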
An example of negative correlation is the relation between the arrival of vegetables in the local mandi and the price of vegetables. If r is –0.9, a large vegetable supply in the local mandi will almost always be accompanied by a lower price of vegetables. Had it been –0.1, a large vegetable supply would be accompanied by a lower price far less consistently. The absolute value of r tells how closely a fall in price goes with a rise in supply, not how large the fall is. Had r been zero, there would have been no linear tendency for the price to fall, even after large supplies in the market. This is also a possibility if the increase in supply is taken care of by a good transport network transferring it to other markets.

Activity
• Look at the following table. Calculate r between annual growth of national income at current prices and Gross Domestic Saving as percentage of GDP.

TABLE 7.2
Year       Annual growth of      Gross Domestic Saving
           National Income       as percentage of GDP
1992–93          14                       24
1993–94          17                       23
1994–95          18                       26
1995–96          17                       27
1996–97          16                       25
1997–98          12                       25
1998–99          16                       23
1999–00          11                       25
2000–01           8                       24
2001–02          10                       23
Source: Economic Survey (2004–05), pp. 8, 9

Step deviation method to calculate correlation coefficient

When the values of the variables are large, the burden of calculation can be considerably reduced by using a property of r: r is independent of change in origin and scale. The method based on this property is known as the step deviation method. It involves the transformation of the variables X and Y as follows:

$$U = \frac{X - A}{h}; \qquad V = \frac{Y - B}{k}$$

where A and B are assumed means, and h and k are common factors. Then

$$r_{UV} = r_{XY}$$

This can be illustrated with the exercise of analysing the correlation between price index and money supply.

Example 2

Price index (X)                    120    150    190    220    230
Money supply in Rs crores (Y)     1800   2000   2500   2700   3000

The simplification, using the step deviation method, is illustrated below. Let A = 100; h = 10; B = 1700 and k = 100.
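The claim that the step deviation transformation leaves r unchanged can be checked numerically before working through the table. The Python sketch below computes r for Example 2 twice: once from the raw price index and money supply, and once from U and V with A = 100, h = 10, B = 1700 and k = 100. `pearson_r` is an illustrative helper implementing formula (3). The property relies on h and k being positive; a negative factor would flip the sign of r.

```python
# Step deviation method (Example 2): r from raw data equals r from
# U = (X - A)/h and V = (Y - B)/k, since r ignores change of origin and
# (positive) change of scale.
from math import sqrt


def pearson_r(X, Y):
    """Formula (3): r from raw sums of X, Y, XY, X^2 and Y^2."""
    n = len(X)
    sX, sY = sum(X), sum(Y)
    sXY = sum(a * b for a, b in zip(X, Y))
    sXX, sYY = sum(a * a for a in X), sum(b * b for b in Y)
    return (sXY - sX * sY / n) / (sqrt(sXX - sX ** 2 / n) * sqrt(sYY - sY ** 2 / n))


price_index = [120, 150, 190, 220, 230]          # X
money_supply = [1800, 2000, 2500, 2700, 3000]    # Y, Rs crores

A, h, B, k = 100, 10, 1700, 100
U = [(x - A) / h for x in price_index]           # 2, 5, 9, 12, 13
V = [(y - B) / k for y in money_supply]          # 1, 3, 8, 10, 13

print(round(pearson_r(U, V), 3))                       # 0.987
print(round(pearson_r(price_index, money_supply), 3))  # 0.987
```

The transformed values are small whole numbers, which is the whole point of the method when r is worked out by hand.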
The table of transformed variables is as follows:

TABLE 7.3
Calculation of r between price index and money supply using step deviation method

U = (X – 100)/10    V = (Y – 1700)/100     U²     V²     UV
        2                    1               4      1      2
        5                    3              25      9     15
        9                    8              81     64     72
       12                   10             144    100    120
       13                   13             169    169    169

ΣU = 41; ΣV = 35; ΣU² = 423; ΣV² = 343; ΣUV = 378

Substituting these values in formula (3),

$$r = \frac{\Sigma UV - \frac{(\Sigma U)(\Sigma V)}{N}}{\sqrt{\Sigma U^2 - \frac{(\Sigma U)^2}{N}}\,\sqrt{\Sigma V^2 - \frac{(\Sigma V)^2}{N}}} = \frac{378 - \frac{41 \times 35}{5}}{\sqrt{423 - \frac{(41)^2}{5}}\,\sqrt{343 - \frac{(35)^2}{5}}} = \frac{91}{\sqrt{86.8 \times 98}} \approx 0.987$$

This strong positive correlation between price index and money supply is an important premise of monetary policy. When the money supply grows the price index also rises.

Activity
• Take some examples of India’s population and national income. Calculate the correlation between them using the step deviation method and see the simplification.

Spearman’s rank correlation

Spearman’s rank correlation was developed by the British psychologist C.E. Spearman. It is used when the variables cannot be measured meaningfully in numbers, unlike price, income, weight etc. Ranking may be more meaningful when the measurements of the variables are suspect. Consider the situation where we are required to calculate the correlation between height and weight of students in a remote village. Neither measuring rods nor weighing scales are available. The students can be easily ranked in terms of height and weight without using measuring rods and weighing scales.

There are also situations when you are required to quantify qualities such as fairness, honesty etc. Ranking may be a better alternative to quantification of qualities. Moreover, sometimes the correlation coefficient between two variables with extreme values may be quite different from the coefficient without the extreme values. Under these circumstances rank correlation provides a better alternative to simple correlation.

Rank correlation coefficient and simple correlation coefficient have the same interpretation. Its formula has
CORRELATION                                                                        101

been derived from the simple correlation coefficient, with the individual values replaced by ranks. These ranks are used for the calculation of correlation. This coefficient provides a measure of linear association between the ranks assigned to the units, not their values. It is the Product Moment Correlation between the ranks. Its formula is

$$ r_k = 1 - \frac{6\sum D^2}{n^3 - n} \qquad (4) $$

where n is the number of observations and D the deviation of the ranks assigned to one variable from those assigned to the other variable. When the ranks are repeated, the formula is

$$ r_k = 1 - \frac{6\left[\sum D^2 + \frac{m_1^3 - m_1}{12} + \frac{m_2^3 - m_2}{12} + \cdots\right]}{n(n^2 - 1)} $$

where $m_1, m_2, \ldots$ are the numbers of repetitions of ranks and $\frac{m_1^3 - m_1}{12}, \ldots$ their corresponding correction factors. This correction is needed for every repeated value of both variables. If three values are repeated, there will be a correction for each value. In each correction factor, m indicates the number of times that particular value is repeated.
    All the properties of the simple correlation coefficient are applicable here. Like the Pearsonian coefficient of correlation, it lies between –1 and 1. However, it is generally not as accurate as the ordinary method. This is because not all the information concerning the data is utilised. The first differences of the values of the items in the series, arranged in order of magnitude, are almost never constant. Usually the data cluster around the central values, with smaller differences in the middle of the array. If the first differences were constant, r and $r_k$ would give identical results. The first difference is the difference between consecutive values.
    Rank correlation is preferred to the Pearsonian coefficient when extreme values are present. In general, $r_k$ is less than or equal to r.
    The calculation of rank correlation will be illustrated under three situations.
1. The ranks are given.
2. The ranks are not given. They have to be worked out from the data.
3. Ranks are repeated.

Case 1: When the ranks are given

Example 3
Five persons are assessed by three judges in a beauty contest. We have to find out which pair of judges has the nearest approach to a common perception of beauty.

                 Competitors
Judge      1     2     3     4     5
  A        1     2     3     4     5
  B        2     4     1     5     3
  C        1     3     5     2     4

    There are 3 pairs of judges, necessitating calculation of rank correlation thrice. Formula (4) will be used:
$$ r_k = 1 - \frac{6\sum D^2}{n^3 - n} \qquad (4) $$

    The rank correlation between A and B is calculated as follows:

   A     B     D     D²
   1     2    –1     1
   2     4    –2     4
   3     1     2     4
   4     5    –1     1
   5     3     2     4
 Total               14

    Substituting these values in formula (4):

$$ r_k = 1 - \frac{6 \times 14}{5^3 - 5} = 1 - \frac{84}{120} = 1 - 0.7 = 0.3 $$

    The rank correlation between A and C is calculated as follows:

   A     C     D     D²
   1     1     0     0
   2     3    –1     1
   3     5    –2     4
   4     2     2     4
   5     4     1     1
 Total               10

    Substituting these values in formula (4), the rank correlation is $1 - \frac{60}{120} = 0.5$. Similarly, for the rankings of judges B and C, $\sum D^2 = 1 + 1 + 16 + 9 + 1 = 28$, so the rank correlation is $1 - \frac{168}{120} = -0.4$. Thus, the perceptions of judges A and C are the closest, while judges B and C have very different tastes.

Case 2: When the ranks are not given

Example 4
We are given the percentage of marks secured by 5 students in Economics and Statistics. The ranking has to be worked out first, and then the rank correlation is calculated.

 Student    Marks in Statistics (X)    Marks in Economics (Y)
   A                 85                         60
   B                 60                         48
   C                 55                         49
   D                 65                         50
   E                 75                         55

 Student    Ranking in Statistics (RX)    Ranking in Economics (RY)
   A                 1                            1
   B                 4                            5
   C                 5                            4
   D                 3                            3
   E                 2                            2

    Once the ranking is complete, formula (4) is used to calculate rank correlation.

Case 3: When the ranks are repeated

Example 5
The values of X and Y are given as

 X    25    45    35    40    15    19    35    42
 Y    55    60    30    35    40    42    36    48

    In order to work out the rank correlation, the ranks of the values are worked out. Common ranks are given to the repeated items. The
common rank is the mean of the ranks which those items would have assumed if they were slightly different from each other. The next item is assigned the rank that follows the ranks already taken up by the repeated items. The formula of Spearman's rank correlation coefficient when the ranks are repeated is as follows:

$$ r_k = 1 - \frac{6\left[\sum D^2 + \frac{m_1^3 - m_1}{12} + \frac{m_2^3 - m_2}{12} + \cdots\right]}{n(n^2 - 1)} $$

where $m_1, m_2, \ldots$ are the numbers of repetitions of ranks and $\frac{m_1^3 - m_1}{12}, \ldots$ their corresponding correction factors.
    X has the value 35 at both the 4th and 5th rank. Hence both are given the average rank, i.e., the $\frac{4 + 5}{2} = 4.5$th rank.

  X     Y    Rank of X    Rank of Y    Deviation in ranking     D²
             (R')         (R'')        D = R' – R''
 25    55     6            2              4                   16
 45    60     1            1              0                    0
 35    30     4.5          8             –3.5                 12.25
 40    35     3            7             –4                   16
 15    40     8            5              3                    9
 19    42     7            4              3                    9
 35    36     4.5          6             –1.5                  2.25
 42    48     2            3             –1                    1
 Total                                                  ΣD² = 65.5

    The necessary correction thus is

$$ \frac{m^3 - m}{12} = \frac{2^3 - 2}{12} = \frac{1}{2} $$

With this correction, the formula becomes

$$ r_k = 1 - \frac{6\left[\sum D^2 + \frac{m^3 - m}{12}\right]}{n^3 - n} \qquad (5) $$

Substituting the values of these expressions,

$$ r_k = 1 - \frac{6(65.5 + 0.5)}{8^3 - 8} = 1 - \frac{396}{504} = 1 - 0.786 = 0.214 $$

    Thus there is a positive rank correlation between X and Y: both tend to move in the same direction. However, the relationship cannot be described as strong.

                                    Activity
    • Collect data on marks scored by 10 of your classmates in class IX and X examinations. Calculate the rank correlation coefficient between them. If your data do not have any repetition, repeat the exercise by taking a data set having repeated ranks. What are the circumstances in which rank correlation coefficient is preferred to simple correlation coefficient? If data are precisely measured, will you still prefer rank correlation coefficient to simple correlation? When can you be indifferent to the choice? Discuss in class.

4. CONCLUSION
We have discussed some techniques for studying the relationship between
two variables, particularly the linear relationship. The scatter diagram gives a visual presentation of the relationship and is not confined to linear relations. Measures of correlation such as Karl Pearson's coefficient of correlation and Spearman's rank correlation are, strictly, measures of linear relationship. When the variables cannot be measured precisely, rank correlation can meaningfully be used. These measures, however, do not imply causation. The knowledge of correlation gives us an idea of the direction and intensity of change in a variable when the correlated variable changes.
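The rank correlations worked out in Examples 3, 4 and 5 can be checked with a short Python sketch. The function names (`ranks_descending`, `tie_correction`, `rank_correlation`) are illustrative only, not part of any library. As in the examples, the largest value receives rank 1, repeated values share the mean of the ranks they would otherwise occupy, and a correction of (m³ – m)/12 is added for every repeated rank of either variable before applying formulas (4) and (5).

```python
from collections import Counter


def ranks_descending(values):
    """Rank values so that the largest gets rank 1.

    Repeated values share a common rank: the mean of the ranks they would
    have occupied if they were slightly different from each other.
    """
    ordered = sorted(values, reverse=True)
    common_rank = {}
    for value, m in Counter(values).items():
        first = ordered.index(value) + 1           # first rank this value occupies
        common_rank[value] = first + (m - 1) / 2   # mean of ranks first .. first + m - 1
    return [common_rank[v] for v in values]


def tie_correction(ranks):
    """Sum of (m^3 - m) / 12 over every rank shared by m > 1 items."""
    return sum((m ** 3 - m) / 12 for m in Counter(ranks).values() if m > 1)


def rank_correlation(x, y, ranks_given=False):
    """Rank correlation by formula (4), with the correction of formula (5)
    added for every repeated rank in either variable."""
    n = len(x)
    rx = list(x) if ranks_given else ranks_descending(x)
    ry = list(y) if ranks_given else ranks_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = tie_correction(rx) + tie_correction(ry)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


# Example 3: ranks given by judges A, B and C to five competitors
A, B, C = [1, 2, 3, 4, 5], [2, 4, 1, 5, 3], [1, 3, 5, 2, 4]
print(round(rank_correlation(A, B, ranks_given=True), 3))   # 0.3
print(round(rank_correlation(A, C, ranks_given=True), 3))   # 0.5
print(round(rank_correlation(B, C, ranks_given=True), 3))   # -0.4

# Example 4: marks in Statistics (X) and Economics (Y)
print(round(rank_correlation([85, 60, 55, 65, 75], [60, 48, 49, 50, 55]), 3))  # 0.9

# Example 5: X has the value 35 twice, so one correction of 1/2 is needed
x = [25, 45, 35, 40, 15, 19, 35, 42]
y = [55, 60, 30, 35, 40, 42, 36, 48]
print(round(rank_correlation(x, y), 3))   # 0.214
```

Because repeated values are detected from the ranks themselves, the same function also applies the correction when ranks are given directly and some of them are shared.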



                                          Recap
        •   Correlation analysis studies the relation between two variables.
        •   Scatter diagrams give a visual presentation of the nature of
            relationship between two variables.
        •   Karl Pearson’s coefficient of correlation r measures numerically only
            linear relationship between two variables. r lies between –1 and 1.
        •   When the variables cannot be measured precisely Spearman’s rank
            correlation can be used to measure the linear relationship
            numerically.
        •   Repeated ranks need correction factors.
        •   Correlation does not mean causation. It only means
            covariation.




                                      EXERCISES

      1. The unit of correlation coefficient between height in feet and weight in
         kgs is
         (i) kg/feet
         (ii) percentage
         (iii) non-existent
      2. The range of simple correlation coefficient is
         (i) 0 to infinity
         (ii) minus one to plus one
         (iii) minus infinity to infinity
      3. If rxy is positive the relation between X and Y is of the type
         (i) When Y increases X increases
         (ii) When Y decreases X increases
         (iii) When Y increases X does not change
    4. If rxy = 0, the variables X and Y are
       (i) linearly related
       (ii) not linearly related
       (iii) independent
    5. Of the following three measures which can measure any type of relationship
       (i) Karl Pearson’s coefficient of correlation
       (ii) Spearman’s rank correlation
       (iii) Scatter diagram
    6. If precisely measured data are available the simple correlation coefficient
       is
       (i) more accurate than rank correlation coefficient
       (ii) less accurate than rank correlation coefficient
       (iii) as accurate as the rank correlation coefficient
    7. Why is r preferred to covariance as a measure of association?
    8. Can r lie outside the –1 and 1 range depending on the type of data?
    9. Does correlation imply causation?
   10. When is rank correlation more precise than simple correlation coefficient?
   11. Does zero correlation mean independence?
   12. Can simple correlation coefficient measure any type of relationship?
   13. Collect the price of five vegetables from your local market every day for a
       week. Calculate their correlation coefficients. Interpret the result.
   14. Measure the height of your classmates. Ask them the height of their
       benchmate. Calculate the correlation coefficient of these two variables.
       Interpret the result.
   15. List some variables where accurate measurement is difficult.
   16. Interpret the values of r as 1, –1 and 0.
   17. Why does rank correlation coefficient differ from Pearsonian correlation
       coefficient?
   18. Calculate the correlation coefficient between the heights of fathers in inches
       (X) and their sons (Y)
       X 65        66      57      67        68     69      70       72
       Y 67        56      65      68        72     72      69       71
       (Ans. r ≈ 0.44)
   19. Calculate the correlation coefficient between X and Y and comment on
      their relationship:
       X    –3      –2       –1       1       2       3
       Y      9      4        1       1       4       9
       (Ans. r = 0)
      20. Calculate the correlation coefficient between X and Y and comment on
          their relationship
         X      1       3     4      5      7      8
         Y      2       6     8     10     14     16
         (Ans. r = 1)
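The answers to Exercises 19 and 20 can be checked with a minimal sketch of Karl Pearson's coefficient. It uses one standard computational form, r = (nΣXY – ΣXΣY) / √[(nΣX² – (ΣX)²)(nΣY² – (ΣY)²)], which is algebraically equivalent to dividing the covariance by the product of the standard deviations. The function name `pearson_r` is illustrative only.

```python
import math


def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation in the computational form
    r = (nΣXY - ΣXΣY) / sqrt[(nΣX² - (ΣX)²)(nΣY² - (ΣY)²)]."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Exercise 19: Y = X squared, a perfect but non-linear relationship
print(pearson_r([-3, -2, -1, 1, 2, 3], [9, 4, 1, 1, 4, 9]))    # 0.0
# Exercise 20: Y = 2X, a perfect positive linear relationship
print(pearson_r([1, 3, 4, 5, 7, 8], [2, 6, 8, 10, 14, 16]))    # 1.0
```

In Exercise 19, r is exactly 0 even though every value of Y is fully determined by X. Since r measures only linear association, a zero value does not by itself establish independence.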




                                      Activity
            •   Use all the formulae discussed here to calculate r between
                India's national income and exports, taking at least ten
                observations.
CHAPTER 6


Measures of Dispersion




  Studying this chapter should enable you to:
  • know the limitations of averages;
  • appreciate the need for measures of dispersion;
  • enumerate various measures of dispersion;
  • calculate the measures and compare them;
  • distinguish between absolute and relative measures.

1. INTRODUCTION

In the previous chapter, you have studied how to sum up the data into a single representative value. However, that value does not reveal the variability present in the data. In this chapter you will study those measures which seek to quantify the variability of the data.
    Three friends, Ram, Rahim and Maria, are chatting over a cup of tea. During the course of their conversation, they start talking about their family incomes. Ram tells them that there are four members in his family and the average income per member is Rs 15,000. Rahim says that the average income is the same in his family, though the number of members is six. Maria says that there are five members in her family, out of which one is not working. She calculates that the average income in her family, too, is Rs 15,000. They are a little surprised, since they know that Maria's father is earning a huge salary. They go into details and gather the following data:
MEASURES OF DISPERSION                                                              75


                   Family Incomes (Rs)
 Sl. No.            Ram       Rahim       Maria
   1.             12,000      7,000          0
   2.             14,000     10,000      7,000
   3.             16,000     14,000      8,000
   4.             18,000     17,000     10,000
   5.                 --     20,000     50,000
   6.                 --     22,000         --
 Total income     60,000     90,000     75,000
 Average income   15,000     15,000     15,000

    Do you notice that although the average is the same, there are considerable differences in individual incomes?
    It is quite obvious that averages try to tell only one aspect of a distribution, i.e., a representative size of the values. To understand it better, you need to know the spread of values also.
    You can see that in Ram's family, differences in incomes are comparatively lower. In Rahim's family, differences are higher, and in Maria's family they are the highest. Knowledge of the average alone is insufficient. If you have another value which reflects the quantum of variation in values, your understanding of a distribution improves considerably. For example, per capita income gives only the average income. A measure of dispersion can tell you about income inequalities, thereby improving the understanding of the relative standards of living enjoyed by different strata of society.
    Dispersion is the extent to which values in a distribution differ from the average of the distribution.
    To quantify the extent of the variation, there are certain measures, namely:
(i) Range
(ii) Quartile Deviation
(iii) Mean Deviation
(iv) Standard Deviation
    Apart from these measures, which give a numerical value, there is a graphic method for estimating dispersion.
    Range and Quartile Deviation measure the dispersion by calculating the spread within which the values lie. Mean Deviation and Standard Deviation calculate the extent to which the values differ from the average.

2. MEASURES BASED UPON SPREAD OF VALUES

Range
Range (R) is the difference between the largest (L) and the smallest value (S) in a distribution. Thus,

$$ R = L - S $$

    A higher value of Range implies higher dispersion and vice versa.
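The comparison of the three families can be made concrete by computing, for each family, the average income and the Range R = L – S from the table of family incomes. This is a minimal Python sketch; `value_range` is an illustrative name chosen so as not to shadow Python's built-in `range`.

```python
# Incomes (Rs) of the members of each family, taken from the family incomes table
families = {
    "Ram": [12000, 14000, 16000, 18000],
    "Rahim": [7000, 10000, 14000, 17000, 20000, 22000],
    "Maria": [0, 7000, 8000, 10000, 50000],
}


def value_range(values):
    """Range R = L - S: the largest value minus the smallest value."""
    return max(values) - min(values)


for name, incomes in families.items():
    average = sum(incomes) / len(incomes)
    print(name, average, value_range(incomes))
# Ram 15000.0 6000
# Rahim 15000.0 15000
# Maria 15000.0 50000
```

The averages are identical, while the ranges rise from Ram's family to Maria's, matching the verbal comparison of the three families.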
               Activities
  Look at the following values:
  20, 30, 40, 50, 200
  • Calculate the Range.
  • What is the Range if the value 200 is not present in the data set?
  • If 50 is replaced by 150, what will be the Range?

           Range: Comments
  Range is unduly affected by extreme values. It is not based on all the values. As long as the minimum and maximum values remain unaltered, any change in other values does not affect the range. It cannot be calculated for an open-ended frequency distribution.

    Notwithstanding some limitations, Range is understood and used frequently because of its simplicity. For example, we see the maximum and minimum temperatures of different cities almost daily on our TV screens and form judgments about the temperature variations in them.

  Open-ended distributions are those in which either the lower limit of the lowest class or the upper limit of the highest class or both are not specified.

                Activity
  • Collect data about the 52-week high/low of 10 shares from a newspaper. Calculate the range of share prices. Which stock is the most volatile and which is the most stable?

Quartile Deviation
The presence of even one extremely high or low value in a distribution can reduce the utility of range as a measure of dispersion. Thus, you may need a measure which is not unduly affected by the outliers.
    In such a situation, if the entire data set is divided into four equal parts, each containing 25% of the values, we get the values of Quartiles and Median. (You have already read about these in Chapter 5.)
    The upper and lower quartiles (Q3 and Q1, respectively) are used to calculate the Inter-Quartile Range, which is Q3 – Q1.
    The Inter-Quartile Range is based upon the middle 50% of the values in a distribution and is, therefore, not affected by extreme values. Half of the Inter-Quartile Range is called Quartile Deviation. Thus:

$$ \text{Q.D.} = \frac{Q_3 - Q_1}{2} $$

    Q.D. is therefore also called the Semi-Inter-Quartile Range.

Calculation of Range and Q.D. for ungrouped data

Example 1
Calculate Range and Q.D. of the following observations:
    20, 25, 29, 30, 35, 39, 41, 48, 51, 60 and 70
    Range is clearly 70 – 20 = 50.
    For Q.D., we need to calculate values of Q3 and Q1.
    Q1 is the size of the $\frac{n+1}{4}$th value.
    n being 11, Q1 is the size of the 3rd value.
    As the values are already arranged in ascending order, it can be seen that Q1, the 3rd value, is 29. [What will you do if these values are not in order?]
    Similarly, Q3 is the size of the $\frac{3(n+1)}{4}$th value, i.e., the 9th value, which is 51. Hence Q3 = 51.

$$ \text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{51 - 29}{2} = 11 $$

    Do you notice that Q.D. is the average difference of the Quartiles from the median?

                  Activity
  • Calculate the median and check whether the above statement is correct.

Calculation of Range and Q.D. for a frequency distribution

Example 2
For the following distribution of marks scored by a class of 40 students, calculate the Range and Q.D.

                  TABLE 6.1
 Class intervals (CI)     No. of students (f)
      0–10                        5
     10–20                        8
     20–40                       16
     40–60                        7
     60–90                        4
     Total                       40

    Range is just the difference between the upper limit of the highest class and the lower limit of the lowest class. So Range is 90 – 0 = 90. For Q.D., first calculate cumulative frequencies as follows:

 Class intervals (CI)    Frequencies (f)    Cumulative frequencies (c.f.)
      0–10                     5                    5
     10–20                     8                   13
     20–40                    16                   29
     40–60                     7                   36
     60–90                     4                   40
                           n = 40

    Q1 is the size of the $\frac{n}{4}$th value in a continuous series. Thus it is the size of the 10th value. The class containing the 10th value is 10–20. Hence Q1 lies in class 10–20. Now, to calculate the exact value of Q1, the following formula is used:

$$ Q_1 = L + \frac{\frac{n}{4} - \text{c.f.}}{f} \times i $$

where L = 10 (lower limit of the relevant Quartile class),
c.f. = 5 (value of c.f. for the class preceding the Quartile class),
i = 10 (interval of the Quartile class), and
f = 8 (frequency of the Quartile class). Thus,

$$ Q_1 = 10 + \frac{10 - 5}{8} \times 10 = 16.25 $$

    Similarly, Q3 is the size of the $\frac{3n}{4}$th
value, i.e., the 30th value, which lies in class 40–60. Now using the formula for Q3, its value can be calculated as follows:

$$ Q_3 = L + \frac{\frac{3n}{4} - \text{c.f.}}{f} \times i $$

$$ Q_3 = 40 + \frac{30 - 29}{7} \times 20 \approx 42.857 $$

$$ \text{Q.D.} = \frac{42.857 - 16.25}{2} \approx 13.30 $$

  In individual and discrete series, Q1 is the size of the $\frac{n+1}{4}$th value, but in a continuous distribution, it is the size of the $\frac{n}{4}$th value. Similarly, for Q3 and the median also, n is used in place of n + 1.

    If the entire group is divided into two equal halves and the median calculated for each half, you will have the median of the better students and the median of the weaker students. These medians differ from the median of the entire group by about 13.30 on average.
    Similarly, suppose you have data about the incomes of people in a town. The median income of all the people can be calculated. Now if all the people are divided into two equal groups of rich and poor, the medians of both groups can be calculated. Quartile Deviation will tell you the average difference between the medians of these two groups, belonging to rich and poor, and the median of the entire group.
    Quartile Deviation can generally be calculated for open-ended distributions and is not unduly affected by extreme values.

3. MEASURES OF DISPERSION FROM AVERAGE

Recall that dispersion was defined as the extent to which values differ from their average. Range and Quartile Deviation do not attempt to calculate how far the values are from their average. Yet, by calculating the spread of values, they do give a good idea about the dispersion. Two measures which are based upon deviation of the values from their average are Mean Deviation and Standard Deviation.
    Since the average is a central value, some deviations are positive and some are negative. If these are added as they are, the sum will not reveal anything. In fact, the sum of deviations from the Arithmetic Mean is always zero. Look at the following two sets of values.

    Set A :   5,   9,  16
    Set B :   1,   9,  20

    You can see that values in Set B are farther from the average and hence more dispersed than values in Set A. Calculate the deviations from the Arithmetic Mean and sum them up. What do you notice? Repeat the same with the Median. Can you comment upon the quantum of variation from the calculated values?
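The questions about Set A and Set B can be explored with a few lines of Python using the standard `statistics` module. This sketch sums the deviations with their signs kept (from the A.M. and from the median) and then with the signs ignored, which anticipates the idea behind Mean Deviation.

```python
from statistics import mean, median

sets = {"A": [5, 9, 16], "B": [1, 9, 20]}

for name, values in sets.items():
    am, med = mean(values), median(values)
    signed_from_am = sum(v - am for v in values)          # signs kept, about the A.M.
    signed_from_median = sum(v - med for v in values)     # signs kept, about the median
    absolute_from_am = sum(abs(v - am) for v in values)   # signs ignored
    print(name, am, med, signed_from_am, signed_from_median, absolute_from_am)
# Set A: A.M. 10, median 9, sums 0, 3 and 12
# Set B: A.M. 10, median 9, sums 0, 3 and 20
```

The signed sums are identical for the two sets, so they say nothing about spread; only the sum of absolute deviations (12 against 20) reflects the greater dispersion of Set B.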
    Mean Deviation tries to overcome this problem by ignoring the signs of deviations, i.e., it considers all deviations positive. For standard deviation, the deviations are first squared and averaged, and then the square root of the average is found. We shall now discuss them separately in detail.

Mean Deviation
Suppose a college is proposed for students of five towns A, B, C, D and E, which lie in that order along a road. The distances of the towns in kilometres from town A and the number of students in these towns are given below:

 Town    Distance from town A (km)    No. of students
  A                 0                      90
  B                 2                     150
  C                 6                     100
  D                14                     200
  E                18                      80
 Total                                    620

    Now, if the college is situated in town A, 150 students from town B will have to travel 2 kilometres each (a total of 300 kilometres) to reach the college. The objective is to find a location such that the average distance travelled by students is minimum.
    You may observe that the students will have to travel more, on average, if the college is situated at town A or E. If, on the other hand, it is somewhere in the middle, they are likely to travel less. The average distance travelled is calculated by Mean Deviation, which is simply the arithmetic mean of the absolute differences of the values from their average. The average used is either the arithmetic mean or the median.
    (Since the mode is not a stable average, it is not used to calculate Mean Deviation.)

                Activities
  • Calculate the total distance to be travelled by students if the college is situated at town A, at town C or at town E, and also if it is exactly half way between A and E.
  • Decide where, in your opinion, the college should be established if there is only one student in each town. Does it change your answer?

Calculation of Mean Deviation from Arithmetic Mean for ungrouped data

Direct Method
Steps:
(i) The A.M. of the values is calculated.
(ii) The difference between each value and the A.M. is calculated. All differences are considered positive. These are denoted as |d|.
(iii) The A.M. of these differences (called deviations) is the Mean Deviation, i.e.,

$$ \text{M.D.} = \frac{\sum |d|}{n} $$

Example 3
Calculate the Mean Deviation of the following values: 2, 4, 7, 8 and 9.
80                                                                   STATISTICS FOR ECONOMICS


$$\text{A.M.} = \frac{\sum X}{n} = \frac{30}{5} = 6$$

       X      |d|
       2       4
       4       2
       7       1
       8       2
       9       3
     Total    12

$$\text{M.D.}(\bar{x}) = \frac{12}{5} = 2.4$$

Assumed Mean Method
Mean Deviation can also be calculated by taking deviations from an assumed mean. This method is adopted especially when the actual mean is a fractional number. (Take care that the assumed mean is close to the true mean.)
    For the values in Example 3, suppose the value 7 is taken as the assumed mean. M.D. can then be calculated as under:

Example 4
       X      |d|
       2       5
       4       3
       7       0
       8       1
       9       2
     Total    11

    In such cases, the following formula is used:

$$\text{M.D.}(\bar{x}) = \frac{\sum |d| + (\bar{x} - A_{\bar{x}})(\sum f_B - \sum f_A)}{n}$$

where $\sum |d|$ is the sum of absolute deviations taken from the assumed mean,
$\bar{x}$ is the actual mean,
$A_{\bar{x}}$ is the assumed mean used to calculate deviations,
$\sum f_B$ is the number of values below the actual mean, including the actual mean,
$\sum f_A$ is the number of values above the actual mean.
    Substituting the values in the above formula (here $\sum f_B = 2$ and $\sum f_A = 3$):

$$\text{M.D.}(\bar{x}) = \frac{11 + (6 - 7)(2 - 3)}{5} = \frac{12}{5} = 2.4$$

Mean Deviation from Median for ungrouped data

Direct Method
Using the values in Example 3, M.D. from the median can be calculated as follows:
(i) Calculate the median, which is 7.
(ii) Calculate the absolute deviations from the median; denote them as |d|.
(iii) Find the average of these absolute deviations. It is the Mean Deviation.

Example 5
       X      |d| = |X – Median|
       2       5
       4       3
       7       0
       8       1
       9       2
     Total    11
M.D. from the median is thus:

$$\text{M.D.}(\text{Median}) = \frac{\sum |d|}{n} = \frac{11}{5} = 2.2$$

Short-cut Method
To calculate Mean Deviation by the short-cut method, a value (A) is used to calculate the deviations, and the following formula is applied:

$$\text{M.D.}(\text{Median}) = \frac{\sum |d| + (\text{Median} - A)(\sum f_B - \sum f_A)}{n}$$

where A = the constant from which deviations are calculated. (Other notations are the same as given in the assumed mean method.)

Mean Deviation from Mean for Continuous distribution

TABLE 6.2
    Profits of companies (Rs in lakhs)    Number of companies
    (Class-intervals)                     (Frequencies)
    10–20                                       5
    20–30                                       8
    30–50                                      16
    50–70                                       8
    70–80                                       3
    Total                                      40

Steps:
(i) Calculate the mean of the distribution.
(ii) Calculate the absolute deviations |d| of the class midpoints from the mean.
(iii) Multiply each |d| value by its corresponding frequency to get f|d| values. Sum them up to get Σf|d|.
(iv) Apply the following formula:

$$\text{M.D.}(\bar{x}) = \frac{\sum f|d|}{\sum f}$$

    The Mean Deviation of the distribution in Table 6.2 can be calculated as follows (the mean is Σfm/Σf = 1620/40 = 40.5):

Example 6
    C.I.      f    m.p.    |d|      f|d|
    10–20     5     15     25.5    127.5
    20–30     8     25     15.5    124.0
    30–50    16     40      0.5      8.0
    50–70     8     60     19.5    156.0
    70–80     3     75     34.5    103.5
    Total    40                    519.0

$$\text{M.D.}(\bar{x}) = \frac{\sum f|d|}{\sum f} = \frac{519}{40} = 12.975$$

Mean Deviation from Median

TABLE 6.3
    Class intervals    Frequencies
    20–30                   5
    30–40                  10
    40–60                  20
    60–80                   9
    80–90                   6
    Total                  50

    The procedure to calculate Mean Deviation from the median is the same as in the case of M.D. from the mean, except that deviations are taken from the median (which is 50 for Table 6.3), as given below:
Example 7
    C.I.      f    m.p.    |d|    f|d|
    20–30     5     25     25     125
    30–40    10     35     15     150
    40–60    20     50      0       0
    60–80     9     70     20     180
    80–90     6     85     35     210
    Total    50                   665

$$\text{M.D.}(\text{Median}) = \frac{\sum f|d|}{\sum f} = \frac{665}{50} = 13.3$$

    Mean Deviation: Comments
    Mean Deviation is based on all values. A change in even one value will affect it. It is the least when calculated from the median, i.e., it will be higher if calculated from the mean. However, it ignores the signs of deviations and cannot be calculated for open-ended distributions.

Standard Deviation
Standard Deviation is the positive square root of the mean of squared deviations from the mean. So if there are five values x1, x2, x3, x4 and x5, first their mean is calculated. Then the deviations of the values from the mean are calculated. These deviations are then squared. The mean of these squared deviations is the variance. The positive square root of the variance is the standard deviation.
(Note that Standard Deviation is calculated on the basis of the mean only.)

Calculation of Standard Deviation for ungrouped data
Four alternative methods are available for the calculation of the standard deviation of individual values. All these methods result in the same value of standard deviation. These are:
(i) Actual Mean Method
(ii) Assumed Mean Method
(iii) Direct Method
(iv) Step-Deviation Method

Actual Mean Method
Suppose you have to calculate the standard deviation of the following values:
    5, 10, 25, 30, 50

Example 8
    X       d      d²
    5     –19     361
    10    –14     196
    25     +1       1
    30     +6      36
    50    +26     676
    Total   0    1270

The following formula is used:

$$\sigma = \sqrt{\frac{\sum d^2}{n}} = \sqrt{\frac{1270}{5}} = \sqrt{254} = 15.937$$

    Do you notice the value from which deviations have been calculated in the above example? Is it the actual mean?

Assumed Mean Method
For the same values, deviations may be calculated from any arbitrary value
$A_{\bar{x}}$ such that $d = X - A_{\bar{x}}$. Taking $A_{\bar{x}} = 25$, the computation of the standard deviation is shown below:

Example 9
    X       d      d²
    5     –20     400
    10    –15     225
    25      0       0
    30     +5      25
    50    +25     625
    Total  –5    1275

Formula for Standard Deviation:

$$\sigma = \sqrt{\frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2} = \sqrt{\frac{1275}{5} - \left(\frac{-5}{5}\right)^2} = \sqrt{254} = 15.937$$

    The sum of deviations from a value other than the actual mean is not equal to zero.

Direct Method
Standard Deviation can also be calculated from the values directly, i.e., without taking deviations, as shown below:

Example 10
    X         x²
    5         25
    10       100
    25       625
    30       900
    50      2500
    Total 120   4150

(This amounts to taking deviations from zero.)
The following formula is used:

$$\sigma = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2} = \sqrt{\frac{4150}{5} - (24)^2} = \sqrt{254} = 15.937$$

    Standard Deviation is not affected by the value of the constant from which deviations are calculated. The value of the constant does not figure in the standard deviation formula. Thus, Standard Deviation is independent of origin.

Step-deviation Method
If the values are divisible by a common factor, they can be so divided and the standard deviation can be calculated from the resultant values as follows:

Example 11
Since all the five values are divisible by a common factor 5, we divide and get the following values (d is the deviation of x' from its mean, 4.8):
    x     x'      d        d²
    5      1    –3.8    14.44
    10     2    –2.8     7.84
    25     5    +0.2     0.04
    30     6    +1.2     1.44
    50    10    +5.2    27.04
    Total          0    50.80

(Steps in the calculation are the same as in the actual mean method.)
    The following formula is used to calculate the standard deviation:
$$\sigma = \sqrt{\frac{\sum d^2}{n}} \times c, \qquad x' = \frac{x}{c}$$

where c = common factor.
    Substituting the values,

$$\sigma = \sqrt{\frac{50.80}{5}} \times 5 = \sqrt{10.16} \times 5 = 15.937$$

    Alternatively, instead of dividing the values by a common factor, the deviations can be divided by a common factor. Standard Deviation can then be calculated as shown below:

Example 12
    x       d      d'    d'²
    5     –20     –4     16
    10    –15     –3      9
    25      0      0      0
    30     +5     +1      1
    50    +25     +5     25
    Total          –1     51

    Deviations have been calculated from an arbitrary value 25. A common factor of 5 has been used to divide the deviations.

$$\sigma = \sqrt{\frac{\sum d'^2}{n} - \left(\frac{\sum d'}{n}\right)^2} \times c = \sqrt{\frac{51}{5} - \left(\frac{-1}{5}\right)^2} \times 5 = \sqrt{10.16} \times 5 = 15.937$$

    Standard Deviation is not independent of scale. Thus, if the values or deviations are divided by a common factor, the value of the common factor is used in the formula to get the value of Standard Deviation.

Standard Deviation in Continuous frequency distribution
Like ungrouped data, S.D. can be calculated for grouped data by any of the following methods:
(i) Actual Mean Method
(ii) Assumed Mean Method
(iii) Step-Deviation Method

Actual Mean Method
For the values in Table 6.2, Standard Deviation can be calculated as follows:

Example 13
    (1)     (2)   (3)   (4)     (5)      (6)        (7)
    CI       f     m     fm      d        fd         fd²
    10–20    5    15     75   –25.5   –127.5    3251.25
    20–30    8    25    200   –15.5   –124.0    1922.00
    30–50   16    40    640    –0.5     –8.0       4.00
    50–70    8    60    480   +19.5   +156.0    3042.00
    70–80    3    75    225   +34.5   +103.5    3570.75
    Total   40          1620              0     11790.00

The following steps are required:
1. Calculate the mean of the distribution.

$$\bar{x} = \frac{\sum fm}{\sum f} = \frac{1620}{40} = 40.5$$

2. Calculate deviations of mid-values from the mean so that $d = m - \bar{x}$ (Col. 5).
3. Multiply the deviations by their
   corresponding frequencies to get 'fd' values (Col. 6). [Note that Σfd = 0.]
4. Calculate 'fd²' values by multiplying 'fd' values by 'd' values (Col. 7). Sum these up to get Σfd².
5. Apply the formula as under (here n = Σf = 40):

$$\sigma = \sqrt{\frac{\sum fd^2}{n}} = \sqrt{\frac{11790}{40}} = \sqrt{294.75} = 17.168$$

Assumed Mean Method
For the values in Example 13, the standard deviation can be calculated by taking deviations from an assumed mean (say 40) as follows:

Example 14
    (1)     (2)   (3)    (4)     (5)      (6)
    CI       f     m      d      fd       fd²
    10–20    5    15    –25    –125     3125
    20–30    8    25    –15    –120     1800
    30–50   16    40      0       0        0
    50–70    8    60    +20    +160     3200
    70–80    3    75    +35    +105     3675
    Total   40                  +20    11800

The following steps are required:
1. Calculate the mid-points of classes (Col. 3).
2. Calculate deviations of mid-points from an assumed mean such that $d = m - A_{\bar{x}}$ (Col. 4). Assumed Mean = 40.
3. Multiply the values of 'd' by the corresponding frequencies to get 'fd' values (Col. 5). (Note that the total of this column is not zero, since deviations have been taken from an assumed mean.)
4. Multiply 'fd' values (Col. 5) by 'd' values (Col. 4) to get 'fd²' values (Col. 6). Find Σfd².
5. Standard Deviation can be calculated by the following formula:

$$\sigma = \sqrt{\frac{\sum fd^2}{n} - \left(\frac{\sum fd}{n}\right)^2} = \sqrt{\frac{11800}{40} - \left(\frac{20}{40}\right)^2} = \sqrt{294.75} = 17.168$$

Step-deviation Method
In case the values of deviations are divisible by a common factor, the calculations can be simplified by the step-deviation method, as in the following example.

Example 15
    (1)     (2)   (3)    (4)    (5)    (6)    (7)
    CI       f     m      d      d'    fd'    fd'²
    10–20    5    15    –25     –5    –25     125
    20–30    8    25    –15     –3    –24      72
    30–50   16    40      0      0      0       0
    50–70    8    60    +20     +4    +32     128
    70–80    3    75    +35     +7    +21     147
    Total   40                          +4     472

Steps required:
1. Calculate class mid-points (Col. 3) and deviations from an arbitrarily chosen value, just as in the assumed mean method. In this example, deviations have been taken from the value 40 (Col. 4).
2. Divide the deviations by a common factor denoted as 'c'. c = 5 in the
   above example. The values so obtained are 'd'' values (Col. 5).
3. Multiply 'd'' values by the corresponding 'f' values (Col. 2) to obtain 'fd'' values (Col. 6).
4. Multiply 'fd'' values by 'd'' values to get 'fd'²' values (Col. 7).
5. Sum up the values in Col. 6 and Col. 7 to get Σfd' and Σfd'².
6. Apply the following formula:

$$\sigma = \sqrt{\frac{\sum fd'^2}{\sum f} - \left(\frac{\sum fd'}{\sum f}\right)^2} \times c = \sqrt{\frac{472}{40} - \left(\frac{4}{40}\right)^2} \times 5 = \sqrt{11.8 - 0.01} \times 5 = \sqrt{11.79} \times 5 = 17.168$$

    Standard Deviation: Comments
    Standard Deviation, the most widely used measure of dispersion, is based on all values. Therefore a change in even one value affects the value of standard deviation. It is independent of origin but not of scale. It is also useful in certain advanced statistical problems.

5. ABSOLUTE AND RELATIVE MEASURES OF DISPERSION
All the measures described so far are absolute measures of dispersion. They calculate a value which, at times, is difficult to interpret. For example, consider the following two data sets:

    Set A       500       700      1000
    Set B    100000    120000    130000

    Suppose the values in Set A are the daily sales recorded by an ice-cream vendor, while Set B has the daily sales of a big departmental store. The Range for Set A is 500, whereas for Set B it is 30,000. The value of Range is much higher in Set B. Can you say that the variation in sales is higher for the departmental store? It can be easily observed that the highest value in Set A is double the smallest value, whereas for Set B it is only 30% higher. Thus absolute measures may give misleading ideas about the extent of variation, especially when the averages differ significantly.
    Another weakness of absolute measures is that they give the answer in the units in which the original values are expressed. Consequently, if the values are expressed in kilometres, the dispersion will also be in kilometres. However, if the same values are expressed in metres, an absolute measure will give the answer in metres, and the value of dispersion will appear to be 1000 times as large.
    To overcome these problems, relative measures of dispersion can be used. Each absolute measure has a relative counterpart. Thus, for Range, there is the Coefficient of Range, which is calculated as follows:

$$\text{Coefficient of Range} = \frac{L - S}{L + S}$$

where L = Largest value
      S = Smallest value
    Similarly, for Quartile Deviation, it
is the Coefficient of Quartile Deviation, which can be calculated as follows:

$$\text{Coefficient of Quartile Deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

where Q3 = 3rd Quartile
      Q1 = 1st Quartile
    For Mean Deviation, it is the Coefficient of Mean Deviation:

$$\text{Coefficient of Mean Deviation} = \frac{\text{M.D.}(\bar{x})}{\bar{x}} \quad \text{or} \quad \frac{\text{M.D.}(\text{Median})}{\text{Median}}$$

    Thus, if Mean Deviation is calculated on the basis of the Mean, it is divided by the Mean. If the Median is used to calculate Mean Deviation, it is divided by the Median.
    For Standard Deviation, the relative measure is called the Coefficient of Variation, calculated as below:

$$\text{Coefficient of Variation} = \frac{\text{Standard Deviation}}{\text{Arithmetic Mean}} \times 100$$

    It is usually expressed in percentage terms and is the most commonly used relative measure of dispersion. Since relative measures are free from the units in which the values have been expressed, they can be compared even across different groups having different units of measurement.

7. LORENZ CURVE
    The measures of dispersion discussed so far give a numerical value of dispersion. A graphical measure called the Lorenz Curve is available for estimating dispersion. You may have heard of statements like 'top 10% of the people of a country earn 50% of the national income while top 20% account for 80%'. Such figures give an idea about income disparities. The Lorenz Curve uses the information expressed in a cumulative manner to indicate the degree of variability. It is especially useful in comparing the variability of two or more distributions.
    Given below are the monthly incomes of employees of a company.

TABLE 6.4
    Incomes            Number of employees
    0–5,000                     5
    5,000–10,000               10
    10,000–20,000              18
    20,000–40,000              10
    40,000–50,000               7

Example 16

Columns: (1) Income limits; (2) Mid-points; (3) Cumulative mid-points; (4) Cumulative mid-points as percentages; (5) No. of employees (frequencies); (6) Cumulative frequencies; (7) Cumulative frequencies as percentages

    (1)              (2)       (3)       (4)     (5)    (6)    (7)
    0–5000          2500      2500       2.5      5      5     10
    5000–10000      7500     10000      10.0     10     15     30
    10000–20000    15000     25000      25.0     18     33     66
    20000–40000    30000     55000      55.0     10     43     86
    40000–50000    45000    100000     100.0      7     50    100
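The cumulative columns of Example 16 can be cross-checked with a short Python sketch. It is a minimal illustration that follows the construction steps in this section literally: it cumulates the class mid-points (Col. 3) and the frequencies (Col. 6) and expresses each as a percentage of its grand total (Col. 4 and Col. 7). The names `CLASSES`, `EMPLOYEES`, `as_percentages` and `lorenz_columns` are hypothetical, and drawing the curve (on graph paper or with a plotting library) is left out.

```python
from itertools import accumulate

# Table 6.4: income classes as (lower, upper) limits and number of employees.
# These names are illustrative only.
CLASSES = [(0, 5000), (5000, 10000), (10000, 20000), (20000, 40000), (40000, 50000)]
EMPLOYEES = [5, 10, 18, 10, 7]


def as_percentages(cumulative_totals):
    """Express each cumulative total as a percentage of the grand total."""
    grand_total = cumulative_totals[-1]
    return [100 * t / grand_total for t in cumulative_totals]


def lorenz_columns(classes, freqs):
    """Return columns (2), (3), (4), (6) and (7) of Example 16."""
    mids = [(lo + hi) / 2 for lo, hi in classes]   # Col. 2
    cum_mids = list(accumulate(mids))              # Col. 3
    cum_freqs = list(accumulate(freqs))            # Col. 6
    return mids, cum_mids, as_percentages(cum_mids), cum_freqs, as_percentages(cum_freqs)


mids, cum_mids, income_pct, cum_freqs, employee_pct = lorenz_columns(CLASSES, EMPLOYEES)

# Points of the curve OAC: X = cumulative % of employees (Col. 7),
# Y = cumulative % of the variable, incomes (Col. 4). The curve starts at (0, 0);
# the line of equal distribution OC joins (0, 0) and (100, 100).
points = [(0.0, 0.0)] + list(zip(employee_pct, income_pct))
for x, y in points:
    print(f"X = {x:5.1f}%   Y = {y:5.1f}%")
```

Comparing each Y value with the X value on the same row (for example, 2.5 against 10 for the first class) shows how far the curve lies below the line of equal distribution.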
Construction of the Lorenz Curve
The following steps are required:
1. Calculate class mid-points and find cumulative totals, as in Col. 3 in Example 16 given above.
2. Calculate cumulative frequencies, as in Col. 6.
3. Express the grand totals of Col. 3 and Col. 6 as 100, and convert the cumulative totals in these columns into percentages, as in Col. 4 and Col. 7.
4. Now, on graph paper, take the cumulative percentages of the variable (incomes) on the Y-axis and the cumulative percentages of frequencies (number of employees) on the X-axis, as in Figure 6.1. Thus each axis will have values from '0' to '100'.
5. Draw a line joining the co-ordinates (0, 0) and (100, 100). This is called the line of equal distribution, shown as line 'OC' in Figure 6.1.
6. Plot the cumulative percentages of the variable against the corresponding cumulative percentages of frequency. Join these points to get the curve OAC.

Studying the Lorenz Curve
OC is called the line of equal distribution, since it would imply a situation like: top 20% of people earn 20% of the total income and top 60% earn 60% of the total income. The farther the curve OAC is from this line, the greater is the variability present in the distribution. If there are two or more curves, the one which is the farthest from line OC has the highest dispersion.

8. CONCLUSION
Although Range is the simplest to calculate and understand, it is unduly affected by extreme values. QD is not affected by extreme values as it is based on only the middle 50% of the data. However, it is more difficult to interpret. M.D. and S.D. are both based upon deviations of values from their average. M.D. calculates the average of deviations from the average but ignores the signs of deviations and therefore appears to be unmathematical. Standard Deviation attempts to calculate the average deviation from the mean. Like M.D., it is based on all values and is also applied in more advanced statistical problems. It is the most widely used measure of dispersion.
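The worked examples of this chapter can be cross-checked with a short Python sketch. It is an illustration under stated assumptions, not part of the textbook's method: all function names are hypothetical, grouped data are stored as (lower limit, upper limit, frequency) tuples and represented by class mid-points as in the text, and the median of a continuous series is found with the usual interpolation formula L + (N/2 − c.f.)/f × h, which is assumed here rather than derived in this section. A single function with an origin `a` and a scale `c` covers the actual mean, assumed mean, direct and step-deviation methods, which is another way of seeing that Standard Deviation is independent of origin but not of scale.

```python
from math import sqrt

# Data from the worked examples (names are illustrative).
VALUES_EX3 = [2, 4, 7, 8, 9]        # Examples 3 to 5
VALUES_EX8 = [5, 10, 25, 30, 50]    # Examples 8 to 12
# Grouped data as (lower limit, upper limit, frequency).
TABLE_6_2 = [(10, 20, 5), (20, 30, 8), (30, 50, 16), (50, 70, 8), (70, 80, 3)]
TABLE_6_3 = [(20, 30, 5), (30, 40, 10), (40, 60, 20), (60, 80, 9), (80, 90, 6)]


def mean(xs):
    return sum(xs) / len(xs)


def median(xs):
    s, n = sorted(xs), len(xs)
    mid = n // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def md_ungrouped(xs, centre):
    """Mean Deviation: arithmetic mean of |x - centre|."""
    return sum(abs(x - centre) for x in xs) / len(xs)


def sd_from_deviations(xs, a=0.0, c=1.0):
    """S.D. from d' = (x - a) / c, multiplied back by c.

    a = actual mean -> actual mean method; any other a -> assumed mean method;
    a = 0, c = 1 -> direct method; c > 1 -> step-deviation method.
    """
    d = [(x - a) / c for x in xs]
    n = len(d)
    return sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2) * c


def mids_and_freqs(table):
    """Class mid-points and frequencies of a grouped table."""
    return [(lo + hi) / 2 for lo, hi, _ in table], [f for _, _, f in table]


def grouped_mean(table):
    m, f = mids_and_freqs(table)
    return sum(fi * mi for fi, mi in zip(f, m)) / sum(f)


def grouped_median(table):
    """Interpolated median L + (N/2 - c.f.) / f * h (assumed standard formula)."""
    half = sum(f for _, _, f in table) / 2
    cf = 0
    for lo, hi, f in table:
        if cf + f >= half:
            return lo + (half - cf) / f * (hi - lo)
        cf += f
    raise ValueError("empty table")


def md_grouped(table, centre):
    """M.D. = sum(f * |m - centre|) / sum(f), using class mid-points m."""
    m, f = mids_and_freqs(table)
    return sum(fi * abs(mi - centre) for fi, mi in zip(f, m)) / sum(f)


def sd_grouped(table, a=0.0, c=1.0):
    """S.D. of a frequency distribution from d' = (m - a) / c."""
    m, f = mids_and_freqs(table)
    d = [(mi - a) / c for mi in m]
    n = sum(f)
    sfd = sum(fi * di for fi, di in zip(f, d))
    sfd2 = sum(fi * di * di for fi, di in zip(f, d))
    return sqrt(sfd2 / n - (sfd / n) ** 2) * c


def coefficient_of_range(xs):
    return (max(xs) - min(xs)) / (max(xs) + min(xs))


def coefficient_of_variation(sd, avg):
    return sd / avg * 100


if __name__ == "__main__":
    print("M.D. about mean, Example 3:", md_ungrouped(VALUES_EX3, mean(VALUES_EX3)))
    print("M.D. about median, Example 5:", md_ungrouped(VALUES_EX3, median(VALUES_EX3)))
    # Actual mean, assumed mean 25, direct, values / 5, deviations from 25 / 5
    for a, c in [(mean(VALUES_EX8), 1), (25, 1), (0, 1), (0, 5), (25, 5)]:
        print(f"S.D. with a = {a}, c = {c}: {sd_from_deviations(VALUES_EX8, a, c):.3f}")
    x_bar = grouped_mean(TABLE_6_2)
    print("M.D. about mean, Example 6:", md_grouped(TABLE_6_2, x_bar))
    print("M.D. about median, Example 7:", md_grouped(TABLE_6_3, grouped_median(TABLE_6_3)))
    for a, c in [(x_bar, 1), (40, 1), (40, 5)]:  # Examples 13, 14 and 15
        print(f"Grouped S.D. with a = {a}, c = {c}: {sd_grouped(TABLE_6_2, a, c):.3f}")
    cv = coefficient_of_variation(sd_grouped(TABLE_6_2, x_bar), x_bar)
    print(f"C.V. of Table 6.2: {cv:.2f}%")
    print("Coefficient of Range, Set A:", coefficient_of_range([500, 700, 1000]))
    print("Coefficient of Range, Set B:", coefficient_of_range([100000, 120000, 130000]))
```

Every method applied to the values 5, 10, 25, 30 and 50 should agree at 15.937, and the three grouped methods for Table 6.2 should agree at 17.168, matching Examples 8 to 15.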

                                       Recap
     •   A measure of dispersion improves our understanding about the
         behaviour of an economic variable.
     •   Range and Quartile Deviation are based upon the spread of values.
     •   M.D. and S.D. are based upon deviations of values from the average.
     •   Measures of dispersion could be Absolute or Relative.
     •   Absolute measures give the answer in the units in which data are
         expressed.
     •   Relative measures are free from these units, and consequently can
         be used to compare different variables.
     •   A graphic method, which estimates the dispersion from the shape
         of a curve, is called the Lorenz Curve.
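The points plotted in step 6 of the Lorenz Curve construction can be computed with a short Python sketch. It assumes the groups are already arranged from the lowest to the highest value of the variable and that each group's frequency and total value (for example, total income) are known; the income figures and the function name are hypothetical. Each returned pair is (cumulative % of frequency, cumulative % of the variable), starting from O = (0, 0); on the line of equal distribution OC the two percentages would be equal.

```python
from itertools import accumulate


def lorenz_points(groups):
    """Return (cumulative % of frequency, cumulative % of the variable) pairs.

    groups: list of (frequency, total_value) pairs, already ordered from the
    group with the lowest values of the variable to the highest.
    """
    freq_total = sum(f for f, _ in groups)
    value_total = sum(v for _, v in groups)
    cum_freq = accumulate(f for f, _ in groups)
    cum_value = accumulate(v for _, v in groups)
    points = [(0.0, 0.0)]  # the curve starts at O = (0, 0)
    for f, v in zip(cum_freq, cum_value):
        points.append((f / freq_total * 100, v / value_total * 100))
    return points


# Hypothetical data: five groups of 20 persons each; total income 100 units
groups = [(20, 5), (20, 10), (20, 15), (20, 25), (20, 45)]
for x, y in lorenz_points(groups):
    # On the line of equal distribution OC, y would equal x
    print(f"{x:5.1f}% of persons earn {y:5.1f}% of total income")
```

Here the bottom 40% of persons earn only 15% of total income, so the curve lies well below OC.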




                                    EXERCISES

   1. A measure of dispersion is a good supplement to the central value in
      understanding a frequency distribution. Comment.
   2. Which measure of dispersion is the best and how?
   3. Some measures of dispersion depend upon the spread of values whereas
      some calculate the variation of values from a central value. Do you agree?
   4. In a town, 25% of the persons earned more than Rs 45,000 whereas 75%
      earned more than Rs 18,000. Calculate the absolute and relative values of
      dispersion.
   5. The yield of wheat and rice per acre for 10 districts of a state is as under:
      District    1    2     3     4     5      6    7      8     9     10
      Wheat      12   10    15    19     21   16    18      9     25    10
      Rice       22   29    12    23     18   15    12     34     18    12
      Calculate for each crop,
      (i) Range
      (ii) Q.D.
      (iii) Mean Deviation about Mean
      (iv) Mean Deviation about Median
      (v) Standard Deviation
      (vi) Which crop has greater variation?
      (vii) Compare the values of different measures for each crop.
   6. In the previous question, calculate the relative measures of variation and
      indicate the value which, in your opinion, is more reliable.
   7. A batsman is to be selected for a cricket team. The choice is between X
      and Y on the basis of their five previous scores which are:
        X       25      85     40     80      120
        Y       50      70     65     45      80
        Which batsman should be selected if we want,
         (i) a higher run getter, or
        (ii) a more reliable batsman in the team?
      8. To check the quality of two brands of lightbulbs, their life in burning
         hours was estimated as under for 100 bulbs of each brand.
        Life                             No. of bulbs
        (in hrs)                   Brand A          Brand B
        0–50                           15               2
        50–100                         20               8
        100–150                        18              60
        150–200                        25              25
        200–250                        22               5
                                      100             100

         (i) Which brand gives higher life?
        (ii) Which brand is more dependable?
      9. Average daily wage of 50 workers of a factory was Rs 200 with a Standard
         Deviation of Rs 40. Each worker is given a raise of Rs 20. What is the
         new average daily wage and standard deviation? Have the wages become
         more or less uniform?
     10. If, in the previous question, each worker is given a hike of 10% in wages,
         how are the Mean and Standard Deviation values affected?
     11. Calculate the Mean Deviation about Mean and Standard Deviation for the
         following distribution.
        Classes                       Frequencies
        20–40                                3
        40–80                                6
        80–100                              20
        100–120                             12
        120–140                              9
                                            50
     12. The sum of 10 values is 100 and the sum of their squares is 1090. Find
         the Coefficient of Variation.
CHAPTER 8



                                                     Index Numbers




  Studying this chapter should
  enable you to:
  • understand the meaning of the term index number;
  • become familiar with the use of some widely used index numbers;
  • calculate an index number;
  • appreciate its limitations.

1. INTRODUCTION
You have learnt in the previous chapters how summary measures can be
obtained from a mass of data. Now you will learn how to obtain summary
measures of change in a group of related variables.
    Rabi goes to the market after a long gap. He finds that the prices
of most commodities have changed. Some items have become costlier,
while others have become cheaper. On his return from the market, he
tells his father about the change in the price of each and every item
he bought. It is bewildering to both. The industrial sector consists of
many subsectors. Each of them is changing. The output of some
subsectors is rising, while it is falling in others. The changes are
not uniform. A description of the individual rates of change will be
difficult to understand. Can a single figure summarise these changes?
Look at the following cases:

Case 1
An industrial worker was earning a salary of Rs 1,000 in 1982. Today, he
earns Rs 12,000. Can his standard of living be said to have risen 12
times during this period? By how much should his salary be raised so
that he is as well off as before?

Case 2
You must be reading about the sensex in the newspapers. The sensex
crossing 8000 points is, indeed, greeted with euphoria. When the sensex
dipped 600 points recently, it eroded investors’ wealth by Rs 1,53,690
crore. What exactly is the sensex?

Case 3
The government says the inflation rate will not accelerate due to the
rise in the price of petroleum products. How does one measure
inflation?
    These are a sample of questions you confront in your daily life. A
study of index numbers helps in analysing these questions.

2. WHAT IS AN INDEX NUMBER
An index number is a statistical device for measuring changes in the
magnitude of a group of related variables. It represents the general
trend of diverging ratios, from which it is calculated. It is a measure
of the average change in a group of related variables over two
different situations. The comparison may be between like categories
such as persons, schools, hospitals etc. An index number also measures
changes in the value of variables such as the prices of a specified
list of commodities, the volume of production in different sectors of
an industry, the production of various agricultural crops, the cost of
living etc.
    Conventionally, index numbers are expressed in terms of percentage.
Of the two periods, the period with which the comparison is to be made
is known as the base period. The value in the base period is given the
index number 100. If you want to know how much the price has changed in
2005 from the level in 1990, then 1990 becomes the base. The index
number of any period is in proportion with it. Thus an index number of
250 indicates that the value is two and a half times that of the base
period.
    Price index numbers measure and permit comparison of the prices of
certain goods. Quantity index numbers measure the changes in the
physical volume of production, construction or employment. Though price
index numbers are more widely used, a production index is also an
important indicator of the level of the output in the economy.
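The base-period convention can be shown with a minimal Python sketch: divide each value by the base-period value and multiply by 100. The function name `index_series` and the period labels are illustrative. Applied to the salaries in Case 1, it shows that the money salary index is 1200, i.e. 12 times the base; whether the standard of living also rose 12 times depends on how prices moved over the same period, which is what price index numbers measure.

```python
def index_series(values, base_key):
    """Express every value as a percentage of the base-period value (base = 100)."""
    base = values[base_key]
    if base == 0:
        raise ValueError("base-period value must be non-zero")
    return {period: value / base * 100 for period, value in values.items()}


# Case 1: salary in 1982 (the base) and "today" (the label is illustrative)
salary = {"1982": 1000, "today": 12000}
print(index_series(salary, "1982"))  # {'1982': 100.0, 'today': 1200.0}
```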
3. CONSTRUCTION OF AN INDEX NUMBER
In the following sections, the principles of constructing an index
number will be illustrated through price index numbers.
Let us look at the following example:

Example 1
Calculation of simple aggregative price index
                        TABLE 8.1
Commodity   Base period   Current period   Percentage
            price (Rs)    price (Rs)       change
A               2              4              100
B               5              6               20
C               4              5               25
D               2              3               50

    As you observe in this example, the percentage changes are
different for every commodity. If the percentage changes were the same
for all four items, a single measure would have been sufficient to
describe the change. However, the percentage changes differ, and
reporting the percentage change for every item will be confusing. This
happens when the number of commodities is large, which is common in any
real market situation. A price index represents these changes by a
single numerical measure.
    There are two methods of constructing an index number. It can be
computed by the aggregative method and by the method of averaging
relatives.

The Aggregative Method
The formula for a simple aggregative price index is

$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$$

where P1 and P0 indicate the price of the commodity in the current
period and base period respectively. Using the data from example 1, the
simple aggregative price index is

$$P_{01} = \frac{4 + 6 + 5 + 3}{2 + 5 + 4 + 2} \times 100 = \frac{18}{13} \times 100 = 138.5$$

Here, price is said to have risen by 38.5 percent.
    Do you know that such an index is of limited use? The reason is
that the units of measurement of prices of various commodities are not
the same. It is unweighted, because the relative importance of the
items has not been properly reflected. The items are treated as having
equal importance or weight. But what happens in reality? In reality the
items purchased differ in order of importance. Food items occupy a
large proportion of our expenditure. In that case an equal rise in the
price of an item with large weight and that of an item with low weight
will have different implications for the overall change in the price
index.
    The formula for a weighted aggregative price index is

$$P_{01} = \frac{\sum P_1 q_1}{\sum P_0 q_1} \times 100$$

    An index number becomes a weighted index when the relative
importance of items is taken care of. Here weights are quantity
weights. To construct a weighted aggregative index, a well specified
basket of commodities is taken and its worth each year is calculated.
It thus measures the changing value of a fixed aggregate of goods.
Since the total value changes with a fixed basket, the change is due to
price change. Various methods of calculating a weighted aggregative
index use different baskets with respect to time.

Example 2
Calculation of weighted aggregative price index
                        TABLE 8.2
             Base period           Current period
Commodity    Price    Quantity     Price    Quantity
             P0       q0           P1       q1
A            2        10           4         5
B            5        12           6        10
C            4        20           5        15
D            2        15           3        10

$$P_{01} = \frac{\sum P_1 q_0}{\sum P_0 q_0} \times 100
= \frac{4 \times 10 + 6 \times 12 + 5 \times 20 + 3 \times 15}{2 \times 10 + 5 \times 12 + 4 \times 20 + 2 \times 15} \times 100
= \frac{257}{190} \times 100 = 135.3$$

    This method uses the base period quantities as weights. A weighted
aggregative price index using base period quantities as weights is also
known as Laspeyres’ price index. It answers the question: if the
expenditure on the base period basket of commodities was Rs 100, how
much should the expenditure in the current period on the same basket of
commodities be? As you can see here, the value of the base period
quantities has risen by 35.3 per cent due to the price rise. Using base
period quantities as weights, the price is said to have risen by 35.3
percent.
    Since the current period quantities differ from the base period
quantities, the index number using current period weights gives a
different value of the index number.

$$P_{01} = \frac{\sum P_1 q_1}{\sum P_0 q_1} \times 100
= \frac{4 \times 5 + 6 \times 10 + 5 \times 15 + 3 \times 10}{2 \times 5 + 5 \times 10 + 4 \times 15 + 2 \times 10} \times 100
= \frac{185}{140} \times 100 = 132.1$$

    It uses the current period quantities as weights. A weighted
aggregative price index using current period quantities as weights is
known as Paasche’s price index. It helps in answering the question that, if the
current period basket of commodities was consumed in the base period
and if we were spending Rs 100 on it, how much should be the
expenditure in the current period on the same basket of commodities? A
Paasche’s price index of 132.1 is interpreted as a price rise of 32.1
percent. Using current period weights, the price is said to have risen
by 32.1 per cent.

Method of Averaging relatives
When there is only one commodity, the price index is the ratio of the
price of the commodity in the current period to that in the base
period, usually expressed in percentage terms. The method of averaging
relatives takes the average of these relatives when there are many
commodities. The price index number using price relatives is defined as

$$P_{01} = \frac{1}{n} \sum \frac{P_1}{P_0} \times 100$$

where P1 and P0 indicate the price of the ith commodity in the current
period and base period respectively. The ratio (P1/P0) × 100 is also
referred to as the price relative of the commodity. n stands for the
number of commodities. In the current example

$$P_{01} = \frac{1}{4}\left(\frac{4}{2} + \frac{6}{5} + \frac{5}{4} + \frac{3}{2}\right) \times 100 = 148.75 \approx 149$$

    Thus the prices of the commodities have risen by 49 percent.
    The weighted index of price relatives is the weighted arithmetic
mean of price relatives, defined as

$$P_{01} = \frac{\sum W \left(\frac{P_1}{P_0} \times 100\right)}{\sum W}$$

where W = Weight.
    In a weighted price relative index, weights may be determined by
the proportion or percentage of expenditure on them in total
expenditure during the base period. They can also refer to the current
period, depending on the formula used. These are, essentially, the
value shares of different commodities in the total expenditure. In
general the base period weight is preferred to the current period
weight, because calculating the weight every year is inconvenient.
Current period weights also refer to the changing values of different
baskets, which are strictly not comparable.
    Example 3 shows the type of information one needs for calculating a
weighted price index.

Example 3
Calculation of weighted price relatives index
                        TABLE 8.3
Commodity   Base year    Current year   Price      Weight
            price (Rs)   price (Rs)     relative   in %
A              2            4             200        40
B              5            6             120        30
C              4            5             125        20
D              2            3             150        10
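The formulas of Examples 1 to 3 translate directly into code. The minimal Python sketch below recomputes the simple aggregative, Laspeyres’ (base-period quantity weights), Paasche’s (current-period quantity weights), simple average of relatives and weighted average of relatives indices from the data in Tables 8.1 to 8.3. The function names are illustrative, not taken from any library, and the lists assume commodities A–D appear in the same order everywhere.

```python
"""Price index formulas of Examples 1-3 (a sketch; names are illustrative)."""


def simple_aggregative(p0, p1):
    # Sum of current prices over sum of base prices, times 100
    return sum(p1) / sum(p0) * 100


def laspeyres(p0, p1, q0):
    # Weighted aggregative index, base-period quantities q0 as weights
    num = sum(a * q for a, q in zip(p1, q0))
    den = sum(b * q for b, q in zip(p0, q0))
    return num / den * 100


def paasche(p0, p1, q1):
    # Weighted aggregative index, current-period quantities q1 as weights
    num = sum(a * q for a, q in zip(p1, q1))
    den = sum(b * q for b, q in zip(p0, q1))
    return num / den * 100


def simple_average_of_relatives(p0, p1):
    # Unweighted arithmetic mean of price relatives (P1 / P0) * 100
    relatives = [a / b * 100 for a, b in zip(p1, p0)]
    return sum(relatives) / len(relatives)


def weighted_average_of_relatives(p0, p1, w):
    # Weighted arithmetic mean of price relatives with weights W
    relatives = [a / b * 100 for a, b in zip(p1, p0)]
    return sum(wi * r for wi, r in zip(w, relatives)) / sum(w)


# Commodities A-D (Tables 8.1-8.3)
p0 = [2, 5, 4, 2]      # base period prices (Rs)
p1 = [4, 6, 5, 3]      # current period prices (Rs)
q0 = [10, 12, 20, 15]  # base period quantities
q1 = [5, 10, 15, 10]   # current period quantities
w = [40, 30, 20, 10]   # weights in % (Table 8.3)

print(round(simple_aggregative(p0, p1), 1))                # 138.5
print(round(laspeyres(p0, p1, q0), 1))                     # 135.3
print(round(paasche(p0, p1, q1), 1))                       # 132.1
print(round(simple_average_of_relatives(p0, p1), 2))       # 148.75
print(round(weighted_average_of_relatives(p0, p1, w), 1))  # 156.0
```

For the Activity after Example 3, interchanging the two periods amounts to calling `laspeyres(p1, p0, q1)` and `paasche(p1, p0, q0)`.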
The weighted price index is

$$P_{01} = \frac{\sum W \left(\frac{P_1}{P_0} \times 100\right)}{\sum W}
= \frac{40 \times 200 + 30 \times 120 + 20 \times 125 + 10 \times 150}{100} = 156$$

    The weighted price index is 156. The price index has risen by 56
percent. The values of the unweighted price index and the weighted
price index differ, as they should. The higher rise in the weighted
index is due to the doubling of the price of the most important item,
A, in example 3.

                   Activity
    •    Interchange the current period values with the base period
         values in the data given in example 2. Calculate the price
         index using Laspeyres’ and Paasche’s formulae. What difference
         do you observe from the earlier illustration?

4. SOME IMPORTANT INDEX NUMBERS

Consumer price index
Consumer price index (CPI), also known as the cost of living index,
measures the average change in retail prices. The CPI for industrial
workers is increasingly considered the appropriate indicator of general
inflation, as it shows the most accurate impact of price rise on the
cost of living of common people.

                 Consumer Price Index
  In India three CPIs are constructed. They are the CPI for industrial
  workers (1982 as base), the CPI for urban non-manual employees
  (1984–85 as base) and the CPI for agricultural labourers (1986–87 as
  base). They are routinely calculated every month to analyse the
  impact of changes in retail prices on the cost of living of these
  three broad categories of consumers. Separate indices are necessary
  because the typical consumption baskets of these groups contain many
  dissimilar items. The CPIs for industrial workers and agricultural
  labourers are published by the Labour Bureau, Shimla. The Central
  Statistical Organisation publishes the CPI for urban non-manual
  employees.
      The weight scheme in the CPI for industrial workers (1982=100) by
  major commodity groups is given in the following table. In this
  scheme food has the largest weight. Food being the most important
  category, any rise in the food price will have a significant impact
  on the CPI. This also explains the government’s frequent statement
  that an oil price hike will not be inflationary: fuel & light carries
  a weight of only 6.28 percent.

  Major Group                     Weight in %
  Food                               57.00
  Pan, supari, tobacco etc.           3.15
  Fuel & light                        6.28
  Housing                             8.67
  Clothing, bedding & footwear        8.54
  Misc. group                        16.36
  General                           100.00

  Source: Economic Survey, Government of India.

Consider the statement that the CPI
for industrial workers (1982=100) is 526 in January 2005. What does
this statement mean? It means that if the industrial worker was
spending Rs 100 in 1982 for a typical basket of commodities, he needs
Rs 526 in January 2005 to be able to buy an identical basket of
commodities. It is not necessary that he/she buys the basket. What is
important is whether he has the capability to buy it.

Example 4
Construction of consumer price index number.
                                  TABLE 8.4
Item    Weight in %   Base period   Current period   R = P1/P0 × 100      WR
        W             price (Rs)    price (Rs)       (in %)
Food       35           150            145               96.67        3383.45
Fuel       10            25             23               92.00         920.00
Cloth      20            75             65               86.67        1733.40
Rent       15            30             30              100.00        1500.00
Misc.      20            40             45              112.50        2250.00
Total     100                                                         9786.85

$$CPI = \frac{\sum WR}{\sum W} = \frac{9786.85}{100} \approx 97.87$$

    This exercise shows that the cost of living has declined by about
2.13 per cent. What does an index larger than 100 indicate? It means a
higher cost of living, necessitating an upward adjustment in wages and
salaries. The rise required is equal to the amount by which the index
exceeds 100. If the index is 150, a 50 percent upward adjustment is
required: the salaries of the employees have to be raised by 50 per
cent.

Wholesale price index
The wholesale price index number indicates the change in the general
price level. Unlike the CPI, it does not have any reference consumer
category. It does not include items pertaining to services like barber
charges, repairing etc.
    What does the statement “WPI with 1993–94 as base is 189.1 in
March, 2005” mean? It means that the general price level has risen by
89.1 percent during this period.

Industrial production index
The index number of industrial production measures changes in the level
of industrial production comprising many industries. It includes the
production of the public and the private sector. It is a weighted
average of quantity relatives. The formula for the index is

$$IIP_{01} = \frac{\sum W \left(\frac{q_1}{q_0} \times 100\right)}{\sum W}$$

In India, it is currently calculated every month with 1993–94 as the
base. In table 8.6, you can see the index number of some industrial
groupings along with their weights.
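Both Example 4 and Table 8.6 rest on a weighted arithmetic mean, ΣWR/ΣW. The minimal Python sketch below recomputes them; `weighted_mean` is an illustrative helper, not a library function. It assumes the group indices of Table 8.6 may be averaged directly with the group weights; doing so reproduces the published general index to one decimal place, but treat that construction as an assumption. The CPI here uses unrounded price relatives, giving 97.866..., which rounds to 97.87.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W * R) / sum(W)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Example 4 (Table 8.4), items in order: Food, Fuel, Cloth, Rent, Misc.
base_price = [150, 25, 75, 30, 40]
current_price = [145, 23, 65, 30, 45]
weights = [35, 10, 20, 15, 20]  # W, in %

# Price relatives R = P1 / P0 * 100
relatives = [p1 / p0 * 100 for p0, p1 in zip(base_price, current_price)]
cpi = weighted_mean(relatives, weights)
print(f"CPI = {cpi:.2f}")  # CPI = 97.87

# Table 8.6: general index as a weighted average of group indices (May 2005)
group_index = [155.2, 222.7, 196.7]   # mining & quarrying, manufacturing, electricity
group_weight = [10.47, 79.36, 10.17]  # weights in %
general = weighted_mean(group_index, group_weight)
print(f"General index = {general:.1f}")  # General index = 213.0
```

The second result also bears on the question posed after Table 8.6: mining and quarrying has the lowest index (155.2) but only 10.47 percent of the weight, so it pulls the general index down only slightly, while manufacturing, with 79.36 percent of the weight, dominates it.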
                 Wholesale Price Index
  The commodity weights in the WPI are determined by the estimates of
  the commodity value of domestic production and the value of imports
  inclusive of import duty during the base year. It is available on a
  weekly basis. Commodities are broadly classified into three
  categories, viz. primary articles; fuel, power, light and lubricants;
  and manufactured products. The weight scheme is given below. The low
  weight of fuel, power, light and lubricants explains how the
  government can get away with the statement that the oil price hike
  will not be inflationary, at least in the short run.
                        TABLE 8.5
  Category                          Weight in %   No. of items
  Primary articles                      22.0            98
  Fuel, power, light & lubricants       14.2            19
  Manufactured products                 63.8           318

  Source: Economic Survey 2004–2005, Govt. of India, p–89

                        TABLE 8.6
         Broad industrial groupings and their weights
Broad groupings          Weight in %   Index no. in May, 2005
Mining and quarrying        10.47             155.2
Manufacturing               79.36             222.7
Electricity                 10.17             196.7
General index                                 213.0

    As the table shows, the growth performances of the broad industrial
categories differ. The general index represents the average performance
of these categories. Why does a comparatively lower performance of
mining and quarrying not pull down the general index?

Index number of agricultural production
The index number of agricultural production is a weighted average of
quantity relatives. Its base period is the triennium ending 1981–82. In
2003–04 the index number of agricultural production was 179.5. It means
that agricultural production has increased by 79.5 percent over the
average of the three years 1979–80, 1980–81 and 1981–82. Foodgrains
have a weight of 62.92 percent in this index.

SENSEX
You often come across a news item in a newspaper,
    “Sensex breaches 8700 mark. BSE closes at 8650 points. Investor
wealth rises by Rs 9,000 crore. The sensex broke the 8700 mark for the
first time in its history but ended off the mark at 8650, also a new
record closing level”.
    The sensex reached its highest level till date, which reflects the
good health of the economy in general. As share prices increase,
reflected by the rise in the sensex, the value of wealth of the
shareholders also rises.
    Look at another news item,
    “Sensex dips 600 in 30 days flat. Rs 1,53,690 crore investor wealth
eroded. While the sensex has lost 338
points in two consecutive days, it has eroded 6.8% or 598 points since
October 4 when it hit an all time high at 8800 points. Investor wealth
eroded by a staggering Rs 1,53,690 crore or 6.7% during the period.”
    It shows that all is not well with the health of the economy. The
investors may find it hard to decide whether to invest or not.

  Bombay Stock Exchange
  Sensex is the short form of Bombay Stock Exchange Sensitive Index,
  with 1978–79 as base. The value of the sensex is with reference to
  this period. It is the benchmark index for the Indian stock market.
  It consists of 30 stocks which represent 13 sectors of the economy,
  and the companies listed are leaders in their respective industries.
  If the sensex rises, it indicates that the market is doing well and
  investors expect better earnings from companies. It also indicates a
  growing confidence of investors in the basic health of the economy.

    Another useful index in recent years is the human development
index. Very soon the producer price index number will replace the
wholesale price index.

Producer Price Index
The producer price index number measures price changes from the
producers’ perspective. It uses only basic prices, which exclude taxes,
trade margins and transport costs. A Working Group on Revision of
Wholesale Price Index (1993–94=100) is, inter alia, examining the
feasibility of switching over from WPI to a PPI in India, as in many
countries.

5. ISSUES IN THE CONSTRUCTION OF AN INDEX NUMBER
You should keep certain important issues in mind while constructing an
index number.
• You need to be clear about the purpose of the index. Calculation of a
volume index will be inappropriate when one needs a value index.

• Besides this, the items are not equally important for different groups of consumers when a consumer price index is constructed. A rise in the price of petrol may not directly affect the living conditions of poor agricultural labourers. Thus the items to be included in any index have to be selected carefully so as to be as representative as possible. Only then will you get a meaningful picture of the change.
• Every index should have a base. This base should be as normal as possible. Extreme values should not be selected as the base period. The period should also not lie too far in the past. A comparison between 1993 and 2005 is much more meaningful than a comparison between 1960 and 2005, because many items in a typical 1960 consumption basket have disappeared at present. Therefore, the base year for any index number is routinely updated.
• Another issue is the choice of the formula, which depends on the nature of the question to be studied. The only difference between Laspeyres’ index and Paasche’s index is the weights used in these formulae.
• Besides, there are many sources of data with different degrees of reliability. Data of poor reliability will give misleading results. Hence, due care should be taken in the collection of data. If primary data are not being used, then the most reliable source of secondary data should be chosen.

Activity
• Collect data from the local vegetable market over a week for at least 10 items. Try to construct the daily price index for the week. What problems do you encounter in applying both methods for the construction of a price index?

6. INDEX NUMBER IN ECONOMICS

Why do we need to use index numbers? The wholesale price index number (WPI), consumer price index number (CPI) and industrial production index (IIP) are widely used in policy making.
• Consumer price index numbers (CPI), or cost of living index numbers, are helpful in wage negotiation, formulation of income policy, price policy, rent control, taxation and general economic policy formulation.
• The wholesale price index (WPI) is used to eliminate the effect of changes in prices on aggregates such as national income, capital formation etc.
• The WPI is widely used to measure the rate of inflation. Inflation is a general and continuing increase in prices. If inflation becomes sufficiently large, money may lose its traditional functions as a medium of exchange and as a unit of account. Its primary impact lies in lowering the value of money. The weekly inflation rate is given by

$$\frac{X_t - X_{t-1}}{X_{t-1}} \times 100$$

where $X_t$ and $X_{t-1}$

refer to the WPI for the t-th and (t−1)-th weeks.
• CPIs are used in calculating the purchasing power of money and the real wage:
(i) Purchasing power of money = 1/Cost of living index (with the index taken as a ratio to its base; when the base is 100, this is 100/Cost of living index)
(ii) Real wage = (Money wage/Cost of living index) × 100

If the CPI (1982=100) is 526 in January 2005, the equivalent of a rupee in January 2005 is given by

$$\text{Rs}\ \frac{100}{526} = 0.19$$

It means that a rupee in January 2005 was worth only 19 paise in 1982 prices. If the money wage of the consumer is Rs 10,000, his real wage will be

$$\text{Rs}\ 10{,}000 \times \frac{100}{526} = \text{Rs}\ 1{,}901$$

It means Rs 1,901 in 1982 has the same purchasing power as Rs 10,000 in January 2005. If he/she was getting Rs 3,000 in 1982, he/she is worse off due to the rise in prices, since Rs 1,901 is less than Rs 3,000. To maintain the 1982 standard of living, the salary should be raised to Rs 15,780, obtained by multiplying the base period salary by the factor 526/100.
• The index of industrial production gives us a quantitative figure for the change in production in the industrial sector.
• The agricultural production index provides us a ready reckoner of the performance of the agricultural sector.
• The sensex is a useful guide for investors in the stock market. If the sensex is rising, investors are optimistic about the future performance of the economy. It is an appropriate time for investment.

Where can we get these index numbers?
Some of the widely used index numbers routinely published in the Economic Survey, an annual publication of the Government of India, are the WPI, CPI, Index Number of Yield of Principal Crops, Index of Industrial Production and Index of Foreign Trade.

Activity
• Check the newspapers and construct a time series of the sensex with 10 observations. What happens when the base of the consumer price index is shifted from 1982 to 2000?

7. CONCLUSION

Thus, the method of the index number enables you to calculate a single measure of change for a large number of items. Index numbers can be calculated for price, quantity, volume etc.

It is also clear from the formulae that index numbers need to be interpreted carefully. The items to be included and the choice of the base period are important. Index numbers are extremely important in policy making, as is evident from their various uses.
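The arithmetic of Section 6 (weekly inflation rate, purchasing power of money, real wage and the wage needed to keep the base-period standard of living) can be checked with a short sketch. The function names below are illustrative only, not taken from any library. The CPI of 526 (1982=100) and the wages are the figures from the worked example; the two WPI readings in the usage lines are hypothetical.

```python
def weekly_inflation(wpi_prev: float, wpi_curr: float) -> float:
    """Percentage change in the WPI from week t-1 to week t."""
    return (wpi_curr - wpi_prev) / wpi_prev * 100


def purchasing_power(cpi: float, base: float = 100.0) -> float:
    """Value of one rupee today in base-period rupees: base / CPI."""
    return base / cpi


def real_wage(money_wage: float, cpi: float, base: float = 100.0) -> float:
    """Money wage deflated to base-period prices: (money wage / CPI) x base."""
    return money_wage / cpi * base


def wage_to_maintain(base_wage: float, cpi: float, base: float = 100.0) -> float:
    """Current wage needed to keep the base-period standard of living."""
    return base_wage * cpi / base


if __name__ == "__main__":
    cpi_jan_2005 = 526  # CPI (1982 = 100), from the worked example
    print(round(purchasing_power(cpi_jan_2005), 2))   # 0.19
    print(round(real_wage(10_000, cpi_jan_2005)))     # 1901
    print(wage_to_maintain(3_000, cpi_jan_2005))      # 15780.0
    # Hypothetical WPI readings for two consecutive weeks
    print(round(weekly_inflation(175.0, 175.7), 2))   # 0.4
```

Keeping the base as a parameter makes explicit that the factor 100 in the formulae comes from the base value of the index (1982=100 here).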


                                         Recap
        •   An index number is a statistical device for measuring relative change
            in a large number of items.
        •   There are several formulae for working out an index number and
            every formula needs to be interpreted carefully.
        •   The choice of formula largely depends on the question of interest.
        •   Widely used index numbers are wholesale price index, consumer
            price index, index of industrial production, agricultural production
            index and sensex.
        •   The index numbers are indispensable in economic policy
            making.
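Section 5 notes that Laspeyres’ and Paasche’s indices differ only in the weights used, and the Recap stresses that the choice of formula depends on the question. As a minimal sketch of the standard price-index forms: Laspeyres’ index weights current and base prices by base-year quantities q0, Paasche’s by current-year quantities q1. The three-item basket below is hypothetical, invented only for illustration.

```python
from typing import Sequence


def aggregate_price_index(p0: Sequence[float], p1: Sequence[float],
                          weights: Sequence[float]) -> float:
    """Weighted aggregative price index: sum(p1*w) / sum(p0*w) * 100."""
    if not (len(p0) == len(p1) == len(weights)):
        raise ValueError("price and weight lists must have equal length")
    current = sum(p * w for p, w in zip(p1, weights))
    base = sum(p * w for p, w in zip(p0, weights))
    return current / base * 100


def laspeyres(p0: Sequence[float], p1: Sequence[float],
              q0: Sequence[float]) -> float:
    """Price index weighted by base-year quantities q0."""
    return aggregate_price_index(p0, p1, q0)


def paasche(p0: Sequence[float], p1: Sequence[float],
            q1: Sequence[float]) -> float:
    """Price index weighted by current-year quantities q1."""
    return aggregate_price_index(p0, p1, q1)


if __name__ == "__main__":
    # Hypothetical basket: base-year and current-year prices (Rs) and quantities
    p0, p1 = [10, 8, 5], [12, 10, 5]
    q0, q1 = [4, 6, 10], [3, 5, 12]
    print(round(laspeyres(p0, p1, q0), 2))  # 114.49
    print(round(paasche(p0, p1, q1), 2))    # 112.31
```

In this made-up basket, current-year consumption shifts away from the two items whose prices rose, so Paasche’s index comes out lower than Laspeyres’; with identical quantities in both years the two would coincide.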


                                      EXERCISES

      1. An index number which accounts for the relative importance of the items
         is known as
         (i) weighted index
         (ii) simple aggregative index
         (iii) simple average of relatives
      2. In most weighted index numbers, the weight pertains to the
         (i) base year
         (ii) current year
         (iii) both base and current year
      3. The impact of change in the price of a commodity with little weight in the
         index will be
         (i) small
         (ii) large
         (iii) uncertain
      4. A consumer price index measures changes in
         (i) retail prices
         (ii) wholesale prices
         (iii) producers’ prices
      5. The item having the highest weight in consumer price index for industrial
         workers is
         (i) Food
         (ii) Housing
         (iii) Clothing
      6. In general, inflation is calculated by using
         (i) wholesale price index
         (ii) consumer price index
         (iii) producers’ price index

    7. Why do we need an index number?
    8. What are the desirable properties of the base period?
    9. Why is it essential to have different CPIs for different categories of
       consumers?
   10. What does a consumer price index for industrial workers measure?
   11. What is the difference between a price index and a quantity index?
   12. Is the change in any price reflected in a price index number?
   13. Can the CPI number for urban non-manual employees represent the
       changes in the cost of living of the President of India?
   14. The monthly per capita expenditure incurred by workers of an industrial
       centre during 1980 and 2005 on the following items is given below. The
       weights of these items are 75, 10, 5, 6 and 4 respectively. Prepare a
       weighted index number for cost of living for 2005 with 1980 as the base.
             Items                Price in 1980   Price in 2005
             Food                      100             200
             Clothing                  20               25
             Fuel & lighting           15               20
             House rent                30               40
             Misc                      35               65
   15. Read the following table carefully and give your comments.

                 INDEX OF INDUSTRIAL PRODUCTION (BASE 1993–94 = 100)
      Industry                 Weight in %        1996–97         2003–2004
      General index               100               130.8           189.0
      Mining and quarrying       10.73              118.2           146.9
      Manufacturing              79.58              133.6           196.6
      Electricity                10.69              122.0           172.6
   16. Try to list the important items of consumption in your family.
   17. If the salary of a person in the base year is Rs 4,000 per annum and the
       current year salary is Rs 6,000, by how much should his salary rise to
       maintain the same standard of living if the CPI is 400?
   18. The consumer price index for June, 2005 was 125. The food index was
       120 and that of other items 135. What is the percentage of the total
       weight given to food?
   19. An enquiry into the budgets of middle class families in a certain city
       gave the following information:


            Items                         Food         Fuel   Clothing   Rent     Misc.
            Share of expenses             35%          10%      20%      15%      20%
            Price (in Rs) in 2004         1500         250      750      300       400
            Price (in Rs) in 1995         1400         200      500      200       250
            What is the cost of living index of 2004 as compared with 1995?
      20. Record the daily expenditure, quantities bought and prices paid per unit
          of the daily purchases of your family for two weeks. How has the price
          change affected your family?
      21. Given the following data:

      Year       CPI of         CPI of urban     CPI of           WPI
                 industrial     non-manual       agricultural
                 workers        employees        labourers
                 (1982=100)     (1984–85=100)    (1986–87=100)    (1993–94=100)
      1995–96    313            257              234              121.6
      1996–97    342            283              256              127.2
      1997–98    366            302              264              132.8
      1998–99    414            337              293              140.7
      1999–00    428            352              306              145.3
      2000–01    444            352              306              155.7
      2001–02    463            390              309              161.3
      2002–03    482            405              319              166.8
      2003–04    500            420              331              175.9

      Source: Economic Survey, Government of India, 2004–2005

      (i) Calculate the inflation rates using different index numbers.
      (ii) Comment on the relative values of the index numbers.
      (iii) Are they comparable?



                                              Activity
        •     Consult your class teacher to make a list of widely used index
              numbers. Get the most recent data indicating the source. Can you
              tell what the unit of an index number is?
        •     Make a table of consumer price index for industrial workers in the
              last 10 years and calculate the purchasing power of money. How is it
              changing?