Descriptive Statistics

```					POLITICAL SCIENCE 151B: SIMON JACKMAN
STANFORD UNIVERSITY
Lecture Three

Descriptive Statistics:
histograms & bar charts

An Example:
Murder Rates, by State
Data are just one variable: murder rates (per
100,000 population) for all fifty American states in
1993.

Direct inspection of raw numbers in table is not
particularly instructive.

Use tables, graphs and numerical summaries
(statistics).

Table 3.1 Agresti and Finlay

Frequency Distribution

A frequency distribution is a listing of (mutually
exclusive and exhaustive) intervals of possible
values for a variable, together with a tabulation of
the number of observations in each interval.

Partition the variable into “bins”

Count the number in each bin.

Relative Frequency

Definition: the relative frequency for an interval is
the proportion of the sample observations that fall
in that interval.

Divide counts of observations in each bin by the
total number of observations.

Histograms

Graphical representation of a frequency
distribution is a histogram.

Graph of observations per bin.

As bin width gets smaller while the total number of
observations stays constant, there are fewer observations
per bin and the histogram looks more irregular.

Bar Graphs

A bar graph is simply a histogram applied to the
special case of a relative frequency distribution.

Nominal Variables

Histograms and bar graphs can be used to
summarize nominal variables (e.g., see Fig 3.3 in
text).

Bins are the discrete categories of the nominal
variable.

Distributions

Up to the error induced from arbitrarily binning the
data, histograms let us see (literally!) the
distribution of our data.

Excellent summary tool: conveys a lot of
information quickly.

Simultaneously visualize interesting features of a
variable: central tendency, dispersion and skew.

Sample Histograms,
Population Distributions

Bin width can get smaller and smaller as the number of
observations increases.

As sample size gets arbitrarily large, bin width
goes to zero and the histogram becomes a smooth
function.

As sample size gets arbitrarily large, the histogram
tends to the population distribution of the variable.

R Demonstration

Use murder rate data from Table 3.1 of Agresti &
Finlay.

Chapter 2 of Verzani

```
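
The definitions on the slides (partition the variable into bins, count the observations in each bin, divide by the total to get relative frequencies) can be sketched in a few lines. This is only an illustration, not the lecture's R demonstration: it is written in Python, the function names `frequency_distribution`, `relative_frequencies` and `text_histogram` are hypothetical, and the 50 values are synthetic random draws standing in for the murder rates, not the numbers in Table 3.1 of Agresti and Finlay.

```python
import random
from collections import Counter


def frequency_distribution(values, bin_width, start=0.0):
    """Count observations in half-open bins [start + k*w, start + (k+1)*w)."""
    counts = Counter()
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin containing v
        counts[k] += 1
    # List bins in order, including empty bins between the lowest and highest.
    lo, hi = min(counts), max(counts)
    return [
        ((start + k * bin_width, start + (k + 1) * bin_width), counts.get(k, 0))
        for k in range(lo, hi + 1)
    ]


def relative_frequencies(freq):
    """Divide each bin count by the total number of observations."""
    n = sum(count for _, count in freq)
    return [(interval, count / n) for interval, count in freq]


def text_histogram(freq):
    """Render one row of '#' marks per bin, one mark per observation."""
    return "\n".join(
        f"[{lo:5.1f}, {hi:5.1f}) {'#' * count}" for (lo, hi), count in freq
    )


# Synthetic stand-in data: 50 made-up nonnegative "rates".
# These are NOT the values from Table 3.1.
rng = random.Random(151)
rates = [rng.gammavariate(2.0, 3.5) for _ in range(50)]

wide = frequency_distribution(rates, bin_width=4.0)    # few, well-filled bins
narrow = frequency_distribution(rates, bin_width=0.5)  # many, sparse bins

print(text_histogram(wide))
print()
print(text_histogram(narrow))
print()
for (lo, hi), share in relative_frequencies(wide):
    print(f"[{lo:5.1f}, {hi:5.1f}) {share:.2f}")
```

Comparing the two printed histograms shows the bin-width point from the Histograms slide: with the same 50 observations, shrinking the bins from 4.0 to 0.5 leaves fewer observations per bin and a more ragged picture. The relative frequencies printed at the end sum to one because every observation falls in exactly one bin.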