Statistics 1181 Project
Kelvin Cheung
Na Ding
Stephanie Gozali
Elaine Wong
STATS 1181 – 005
November 28, 2004
Table of Contents
Section 1.0 Introduction
Section 2.0 Sampling Method
Section 3.0 The Univariate Summary
    Variable 1: Number of Bedrooms
    Variable 2: Age Range
    Variable 3: Features
    Variable 4: House Size
    Variable 5: Lot Size
    Variable 6: Price
Section 4.0 The Bivariate Summary
    Price VS House Size
    Price VS Lot Size
    Price VS Number of Bedrooms
    Price VS Age Range
    House Size VS Lot Size
    House Size VS Number of Bedrooms
    Features VS Age Range
    Features VS House Size
    Features VS Lot Size
    Features VS Price
Section 5.0 Conclusions
Section 6.0 Contributions
Section 1.0 Introduction
Why do different houses have different price tags on them? In this project, we set
out to determine the factors that influence the price of a house in Vancouver West,
within Greater Vancouver, and to assess how these factors relate to the price and to
one another. We also tried to select the sample as randomly as possible so that it
would be as representative as possible of the target population, houses in Vancouver
West.
In finding the factors that influence the price of a house, there are several
variables that we will consider:
1. Number of bedrooms – Houses differ in their number of bedrooms, and we expect
this to be one of the biggest factors in determining the price of a house.
2. Age range – How old is the house? Has it been renovated? Has it been rebuilt?
Have there been additions? Does it have its own character (e.g., Victorian style,
brick style)? Houses vary in age: some people prefer newly built houses,
whereas others prefer a character house. Therefore, houses of different ages
command different prices.
3. Features – Some houses have extra features, for example, a fireplace, a swimming
pool, a security system, a waterfront or cul-de-sac location, or proximity to a
shopping area. We expect these factors to have an impact on the price.
4. House size (in sqft) – The size of a house has a definite influence on determining
the price of a house. However, some people might prefer smaller houses with a
bigger garden or vice versa.
5. Lot size (in sqft) – The size of the lot a house sits on also has an influence on the
price.
6. Price (in dollars)
To examine the relationships between the variables, we state hypotheses as the
basis of our discussion. We have ten hypotheses, which address whether the factors
are related to one another and whether they have an influence on the price.
1. Price and number of bedrooms: As the number of bedrooms increases, the house
size should increase proportionately leading to an increase in price.
2. Price and age range: The age of a house controls its price because the price of a
house is high if the house is still new and it will stay high for a number of years.
However, in other cases, as the house grows older, the price will also increase
because the house obtains its own characteristics that differentiate it from other
houses.
3. Price and features: The more features a house has or is surrounded by, the more
the house will cost. If the features are within the house (e.g., swimming pool, air
conditioner, fireplace, security system, etc.), then they increase the cost of building
the house, which in turn increases the price of the house. If the features surround the
house (e.g., proximity to a shopping or recreation area), then they also increase the
price because the house offers more benefits to the buyer, which means the house
is worth more.
4. Price and house size: As the size of a house increases, we expect the price to
increase. Although the size of a house still depends on the buyer’s preference, it
still holds true that as the size increases, the price will also increase.
5. Price and lot size: The bigger the lot, the more expensive the house, because
land is typically valued per square foot and the lot is part of what the buyer
pays for.
6. Lot size and features: Features in this case are only the features that need their
own space on the property (e.g., swimming pool, multi-car garage). We assume
that the more such features the house has, the bigger the lot, because a
swimming pool, for example, is generally outside the house.
7. Lot size and house size: The bigger the house, the bigger the lot. Even with a
small garden or backyard, a bigger house requires a bigger lot.
8. House size and number of bedrooms: The size of a house will increase
in proportion to the number of bedrooms it has. A small house is highly unlikely
to have many bedrooms because the other rooms (e.g., living room, dining
room, kitchen, etc.) also need their share of the space.
9. House size and features: The more features a house encloses, the bigger the
house. In this case, we are talking about the features that are built within
the house itself (e.g., a Jacuzzi).
10. Features and age range: We think that the older a house is, the fewer features it
will have. We assume that in the past builders did not have the technology to
build these features into a house, or that it was too expensive or too difficult
to include them (e.g., cul-de-sac, swimming pool, etc.).
Section 2.0 Sampling Method
For data collection, we obtained all the data from www.realtylink.org, which is a
website that provides a large portion of southwestern British Columbia’s real estate
listings. We used the map search tool to extract all houses within the area we were
observing. Next, we chose Greater Vancouver – Vancouver West as our target
population. In order to acquire more specific results for the houses we were going to
sample, we used “house” as our property type; the price range was set to $200,000 to
$10,000,000; the number of bedrooms and bathrooms was set to at least 1; and the age
range was set to 0 to 90+ years old.
In selecting the sample, we used Simple Random Sampling (SRS). The steps were
as follows:
1. We labelled the population from 001 to 500.
2. We picked a starting corner of the random number table (manually). We started
from the top-left corner and read continuously downward.
3. We read the random number table three digits at a time,
continuously.
4. While reading the table, we included any number that was
within 001-500.
5. We went through the random number table until we acquired a sample of 100 houses.
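The steps above can be sketched in Python. This is only an illustration under stated assumptions: the digit string stands in for the printed random number table (it is generated here, not taken from the project), the function name is hypothetical, and repeated labels are skipped, which the steps above do not state explicitly.

```python
import random


def read_three_digit_labels(digits, low=1, high=500, sample_size=100):
    """Read a string of random digits three at a time and keep labels in [low, high]."""
    selected = []
    seen = set()
    for start in range(0, len(digits) - 2, 3):
        label = int(digits[start:start + 3])
        # Keep only labels that correspond to a listing (001-500).
        # Skipping repeated labels is an assumption, not stated in the report.
        if low <= label <= high and label not in seen:
            seen.add(label)
            selected.append(label)
            if len(selected) == sample_size:
                break
    return selected


# Hypothetical stand-in for the printed random number table.
rng = random.Random(1181)
table_digits = "".join(str(rng.randrange(10)) for _ in range(6000))

sample = read_three_digit_labels(table_digits)
print(len(sample))
```

About half of all three-digit groups fall outside 001-500 and are discarded, so the table has to supply well over 200 groups to reach 100 houses.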
The main bias that may occur in our sampling is selection bias: we may
have under-represented parts of the area we sampled. In addition, a house may be sampled
more than once because we collected our data from a website and the same house may be
listed under several realtors. Finally, events in some places may increase the number of
house listings there, which would result in a poor sample because the sample would be
concentrated on houses in that vicinity.
Section 3.0 Univariate Analyses
Variable 1: Number of bedrooms
[Figure: Histogram of number of bedrooms (x-axis: number of bedrooms; y-axis: frequency)]
[Figure: Box-and-whisker plot of number of bedrooms]
Summary Statistics for Number of Bedrooms
Count = 100
Average = 4.62
Median = 5.0
Mode = 5.0
Variance = 1.49051
Standard deviation = 1.22086
Minimum = 2.0
Maximum = 8.0
Range = 6.0
Lower quartile = 4.0
Upper quartile = 5.0
Interquartile range = 1.0
Frequency Tabulation for Number of bedrooms
------------------------------------------------------------------------------------------------------------
Lower Upper Relative Cumulative Cum. Rel.
Class Limit Limit Midpoint Frequency Frequency Frequency Frequency
------------------------------------------------------------------------------------------------------------
At or below 0.0 0 0.0000 0 0.0000
1 0.0 1.25 0.625 0 0.0000 0 0.0000
2 1.25 2.5 1.875 4 0.0400 4 0.0400
3 2.5 3.75 3.125 12 0.1200 16 0.1600
4 3.75 5.0 4.375 67 0.6700 83 0.8300
5 5.0 6.25 5.625 11 0.1100 94 0.9400
6 6.25 7.5 6.875 3 0.0300 97 0.9700
7 7.5 8.75 8.125 3 0.0300 100 1.0000
8 8.75 10.0 9.375 0 0.0000 100 1.0000
Above 10.0 0 0.0000 100 1.0000
------------------------------------------------------------------------------------------------------------
Mean = 4.62 Standard deviation = 1.22086
Interpretation:
Houses differ in their number of bedrooms. The median is 5 bedrooms, so at least
half of the houses we evaluated have 5 or fewer bedrooms; the mean is 4.62 bedrooms.
Moreover, the box plot marks two extremely high values (7 and 8 bedrooms) and one
extremely low value (2 bedrooms). The distribution is concentrated rather than even:
67 of the 100 houses have 4 or 5 bedrooms.
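As a rough check on which bedroom counts a box plot would single out, the quartiles from the summary statistics can be turned into outlier fences. The sketch below assumes the common 1.5 × IQR convention; the report does not say which rule its software applied, and the function name is illustrative.

```python
def outlier_fences(lower_quartile, upper_quartile, k=1.5):
    """Return the (low, high) fences at k interquartile ranges beyond the quartiles."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - k * iqr, upper_quartile + k * iqr


# Quartiles reported for number of bedrooms: 4.0 and 5.0.
low, high = outlier_fences(4.0, 5.0)
print(low, high)  # 2.5 6.5

# Observed bedroom counts range from 2 to 8.
flagged = [b for b in range(2, 9) if b < low or b > high]
print(flagged)  # [2, 7, 8]
```

Under this assumption the values 2, 7 and 8 fall outside the fences, which gives one unusual value on the low side and two on the high side.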
Variable 2: Age Range (0-90+ years old)
[Figure: Histogram of age range from 0 to 90 years old (x-axis: age in years; y-axis: frequency)]
[Figure: Box-and-whisker plot of age range from 0 to 90 years old]
Summary Statistics for Age range from 0 to 90 years old
Count = 74
Average = 29.6486
Median = 17.5
Mode = 7.0
Variance = 619.957
Standard deviation = 24.8989
Minimum = 0.0
Maximum = 94.0
Range = 94.0
Lower quartile = 9.0
Upper quartile = 49.0
Interquartile range = 40.0
Percentiles for Age range from 0 to 90 years old
1.0% = 0.0
5.0% = 2.0
10.0% = 7.0
25.0% = 9.0
50.0% = 17.5
75.0% = 49.0
90.0% = 64.0
95.0% = 80.0
99.0% = 94.0
Frequency Tabulation for Age range from 0 to 90 years old
------------------------------------------------------------------------------------------------------------
Lower Upper Relative Cumulative Cum. Rel.
Class Limit Limit Midpoint Frequency Frequency Frequency Frequency
------------------------------------------------------------------------------------------------------------
at or below -10.0 0 0.0000 0 0.0000
1 -10.0 5.0 -2.5 5 0.0676 5 0.0676
2 5.0 20.0 12.5 35 0.4730 40 0.5405
3 20.0 35.0 27.5 8 0.1081 48 0.6486
4 35.0 50.0 42.5 8 0.1081 56 0.7568
5 50.0 65.0 57.5 12 0.1622 68 0.9189
6 65.0 80.0 72.5 4 0.0541 72 0.9730
7 80.0 95.0 87.5 2 0.0270 74 1.0000
8 95.0 110.0 102.5 0 0.0000 74 1.0000
above 110.0 0 0.0000 74 1.0000
------------------------------------------------------------------------------------------------------------
Mean = 29.6486 Standard deviation = 24.8989
Interpretation:
The age summary is based on 74 houses (Count = 74). Among these, 50% are less
than 17.5 years old and 50% are more than 17.5 years old. The mean age (29.6 years)
is well above the median, so the distribution is skewed toward older ages: there are
many younger houses and fewer older houses. More than half of the houses (40 of 74)
are 20 years old or younger, so a randomly selected house is more likely to be
20 years old or younger than to be older. Furthermore, there are no houses with an
extremely high or low age among our observations.
Variable 3: Features
Data variable: features
Number of observations: 100
Number of unique values: 2
[Figure: Bar chart for features (categories: no, yes; axis: frequency)]
Frequency Table for features
------------------------------------------------------------------------
Relative Cumulative Cum. Rel.
Class Value Frequency Frequency Frequency Frequency
------------------------------------------------------------------------
1 no 20 .2000 20 .2000
2 yes 80 .8000 100 1.0000
------------------------------------------------------------------------
[Figure: Pie chart for features (no: 20.00%; yes: 80.00%)]
Interpretation: There are 80 houses with features and 20 houses without features.
Therefore, if we were to randomly select a house out of our sample, the chance of
selecting a house with features (80%) is greater than the chance of selecting a house
without features (20%).
Variable 4: House Size
[Figure: Histogram of house size in sqft (x-axis: house size in sqft, X 1000; y-axis: frequency)]
[Figure: Box-and-whisker plot of house size in sqft]
Summary Statistics for House size in sqft
Count = 100
Average = 2877.96
Median = 2738.0
Mode = 2200.0
Variance = 659410.0
Standard deviation = 812.041
Minimum = 1056.0
Maximum = 4712.0
Range = 3656.0
Lower quartile = 2266.5
Upper quartile = 3430.0
Interquartile range = 1163.5
Percentiles for House size in sqft
1.0% = 1228.0
5.0% = 1758.0
10.0% = 1894.0
25.0% = 2266.5
50.0% = 2738.0
75.0% = 3430.0
90.0% = 4137.0
95.0% = 4394.5
99.0% = 4690.0
Frequency Tabulation for House size in sqft
------------------------------------------------------------------------------------------------------------
Lower Upper Relative Cumulative Cum. Rel.
Class Limit Limit Midpoint Frequency Frequency Frequency Frequency
------------------------------------------------------------------------------------------------------------
at or below 0.0 0 0.0000 0 0.0000
1 0.0 625.0 312.5 0 0.0000 0 0.0000
2 625.0 1250.0 937.5 1 0.0100 1 0.0100
3 1250.0 1875.0 1562.5 9 0.0900 10 0.1000
4 1875.0 2500.0 2187.5 30 0.3000 40 0.4000
5 2500.0 3125.0 2812.5 23 0.2300 63 0.6300
6 3125.0 3750.0 3437.5 21 0.2100 84 0.8400
7 3750.0 4375.0 4062.5 11 0.1100 95 0.9500
8 4375.0 5000.0 4687.5 5 0.0500 100 1.0000
above 5000.0 0 0.0000 100 1.0000
------------------------------------------------------------------------------------------------------------
Mean = 2877.96 Standard deviation = 812.041
Interpretation:
Among the house sizes, 50% are less than 2738 square feet and 50% are more
than 2738 square feet. Most houses (74 of 100) are between 1875 and 3750 square
feet, and the most common class is 1875 to 2500 square feet. Since more of the data
lie on the left side of the histogram, a randomly picked house is more likely to be
3125 square feet or smaller (63 of 100) than to be larger. There are no extremely
high or low house sizes in our sample.
Variable 5: Lot Size
[Figure: Histogram of lot size in sqft (x-axis: lot size in sqft, X 1000; y-axis: frequency)]
[Figure: Box-and-whisker plot of lot size in sqft]
Summary Statistics for Lot size in sqft
Count = 100
Average = 6596.44
Median = 6487.5
Mode = 4026.0
Variance = 6.5303E6
Standard deviation = 2555.44
Minimum = 1858.6
Maximum = 15417.0
Range = 13558.4
Lower quartile = 4306.5
Upper quartile = 7919.89
Interquartile range = 3613.39
Percentiles for Lot size in sqft
1.0% = 2235.2
5.0% = 3696.0
10.0% = 3909.43
25.0% = 4306.5
50.0% = 6487.5
75.0% = 7919.89
90.0% = 9697.0
95.0% = 11665.5
99.0% = 14609.5
Frequency Tabulation for Lot size in sqft
------------------------------------------------------------------------------------------------------------
Lower Upper Relative Cumulative Cum. Rel.
Class Limit Limit Midpoint Frequency Frequency Frequency Frequency
------------------------------------------------------------------------------------------------------------
at or below 0.0 0 0.0000 0 0.0000
1 0.0 2250.0 1125.0 1 0.0100 1 0.0100
2 2250.0 4500.0 3375.0 25 0.2500 26 0.2600
3 4500.0 6750.0 5625.0 29 0.2900 55 0.5500
4 6750.0 9000.0 7875.0 32 0.3200 87 0.8700
5 9000.0 11250.0 10125.0 7 0.0700 94 0.9400
6 11250.0 13500.0 12375.0 4 0.0400 98 0.9800
7 13500.0 15750.0 14625.0 2 0.0200 100 1.0000
8 15750.0 18000.0 16875.0 0 0.0000 100 1.0000
above 18000.0 0 0.0000 100 1.0000
------------------------------------------------------------------------------------------------------------
Mean = 6596.44 Standard deviation = 2555.44
Interpretation:
50% of our sample has a lot size that is less than 6487.5 square feet and 50% has
a lot size that is more than 6487.5 square feet. The lot sizes are mostly in the lower
values: within our sample, most of the lots (86 of 100) are between 2250 and 9000
square feet, and larger lots are not that common. The box plot shows us that there are
two houses with an extremely large lot size.
Variable 6: Price
[Figure: Histogram of price in dollars (x-axis: price in dollars, X 100000; y-axis: frequency)]
[Figure: Box-and-whisker plot of price in dollars]
Summary Statistics for Price in dollars
Count = 100
Average = 866806.0
Median = 854000.0
Mode = (blank in the output)
Variance = 4.8849E10
Standard deviation = 221018.0
Minimum = 299000.0
Maximum = 1.338E6
Range = 1.039E6
Lower quartile = 693000.0
Upper quartile = 996500.0
Interquartile range = 303500.0
Percentiles for Price in dollars
1.0% = 394000.0
5.0% = 555000.0
10.0% = 599000.0
25.0% = 693000.0
50.0% = 854000.0
75.0% = 996500.0
90.0% = 1.194E6
95.0% = 1.28E6
99.0% = 1.318E6
Frequency Tabulation for Price in dollars
------------------------------------------------------------------------------------------------------------
Lower Upper Relative Cumulative Cum. Rel.
Class Limit Limit Midpoint Frequency Frequency Frequency Frequency
------------------------------------------------------------------------------------------------------------
at or below 0.0 0 0.0000 0 0.0000
1 0.0 187500.0 93750.0 0 0.0000 0 0.0000
2 187500.0 375000.0 281250.0 1 0.0100 1 0.0100
3 375000.0 562500.0 468750.0 5 0.0500 6 0.0600
4 562500.0 750000.0 656250.0 29 0.2900 35 0.3500
5 750000.0 937500.0 843750.0 31 0.3100 66 0.6600
6 937500.0 1.125E6 1.03125E6 17 0.1700 83 0.8300
7 1.125E6 1.3125E6 1.21875E6 16 0.1600 99 0.9900
8 1.3125E6 1.5E6 1.40625E6 1 0.0100 100 1.0000
above 1.5E6 0 0.0000 100 1.0000
------------------------------------------------------------------------------------------------------------
Mean = 866806.0 Standard deviation = 221018.0
Interpretation:
Price is the key variable in our study because we are examining all the factors
that may or may not influence the price of a house. From this analysis, 50% of houses
in the sample have a price less than $854,000 and 50% have a price that is more than
$854,000. The mean ($866,806) is close to the median, and 60 of the 100 houses are
priced between $562,500 and $937,500. Furthermore, there are no extremely high or low
prices in the sample.
Section 4.0 Bivariate Analyses
The Relationship between House Size and Price
[Figure: Plot of price (in dollars) vs house size (sqft)]
[Figure: Box-and-whisker plot of house size in sqft]
From the scatter plot we can see that as house size increases, the price also tends
to increase (a positive direction). The form of the relationship is linear: the price
changes by a roughly constant amount for each unit change in house size across the
whole range of sizes. The points lie fairly close to a line, which indicates a fairly
strong relationship. The Box and Whisker Plot shows no extremely high or low house
sizes in our study. The scatter plot does not exhibit any distinct groups.
[Figure: Plot of fitted model, price (in dollars) vs house size (sqft)]
Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: price _in dollars_
Independent variable: house size _sqft_
-----------------------------------------------------------------------------
Standard T
Parameter Estimate Error Statistic P-Value
-----------------------------------------------------------------------------
Intercept 307111.0 57497.9 5.34125 0.0000
Slope 194.476 19.235 10.1105 0.0000
-----------------------------------------------------------------------------
Analysis of Variance
-----------------------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
-----------------------------------------------------------------------------
Model 2.46902E12 1 2.46902E12 102.22 0.0000
Residual 2.36703E12 98 2.41534E10
-----------------------------------------------------------------------------
Total (Corr.) 4.83605E12 99
Correlation Coefficient = 0.714524
R-squared = 51.0545 percent
R-squared (adjusted for d.f.) = 50.5551 percent
Standard Error of Est. = 155414.0
Mean absolute error = 121476.0
Durbin-Watson statistic = 2.09967 (P=0.3110)
Lag 1 residual autocorrelation = -0.0518703
The Simple Linear Regression Model is the mathematical model (i.e., the mathematical
equation) that describes the relationship.
Least squares estimated regression:
o Estimated price in dollars = 307111.0 + 194.476*house size in sqft
o Interpretation:
For each additional square foot of house size, the estimated price
of the house increases by $194.476.
The intercept has no physical meaning because the size of a house
cannot be zero. It serves only to position the
line.
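The fitted line can be cross-checked against the univariate summaries, since for a least squares line the slope equals r × (SD of price) / (SD of house size) and the line passes through the two means. The sketch below uses only figures reported in this document; the function name is illustrative.

```python
def least_squares_line(r, mean_x, sd_x, mean_y, sd_y):
    """Recover the least squares intercept and slope from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# r for price vs house size, then mean and SD of house size and of price.
intercept, slope = least_squares_line(
    r=0.714524,
    mean_x=2877.96, sd_x=812.041,
    mean_y=866806.0, sd_y=221018.0,
)
print(round(slope, 3), round(intercept))
```

The results agree with the reported slope of 194.476 and intercept of 307111.0 up to rounding in the published summary values.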
Correlation coefficient: 0.714524
o The decision points: +0.196 and -0.196; therefore, 0.196 < 0.714524
Because the correlation coefficient is bigger than the upper decision point,
there is evidence of a positive linear relationship in the population.
o Interpretation: For all the houses, larger house sizes tend to go with
higher prices and smaller house sizes with lower prices, in a roughly
linear fashion.
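The report does not say how the decision point of 0.196 was obtained. One large-sample approximation that reproduces it for n = 100 is 1.96 / √n; the sketch below assumes that rule, so treat both the formula and the function names as illustrative rather than as the method actually used.

```python
import math


def decision_point(n, z=1.96):
    """Approximate two-sided 5% cutoff for a sample correlation (assumed rule: z / sqrt(n))."""
    return z / math.sqrt(n)


def has_linear_relationship(r, n):
    """True when |r| exceeds the approximate decision point for sample size n."""
    return abs(r) > decision_point(n)


print(round(decision_point(100), 3))  # 0.196
print(has_linear_relationship(0.714524, 100))  # True
```

Under this rule the cutoff depends on the sample size, so a comparison based on fewer than 100 houses would use a somewhat larger decision point.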
The assumptions
Linearity
o The coefficient of determination: 51.0545 percent
Interpretation: 51.0545 percent of the total variation in the price of
the house can be explained by the linear relationship between the
price of the house and the house size.
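The coefficient of determination can be checked in two ways from the regression output above: as the model sum of squares divided by the total sum of squares, and as the square of the correlation coefficient. A minimal sketch, with an illustrative function name:

```python
def r_squared(model_ss, total_ss):
    """Share of the total variation explained by the fitted line."""
    return model_ss / total_ss


# Sums of squares from the Analysis of Variance table for price vs house size.
r2_from_anova = r_squared(2.46902e12, 4.83605e12)

# Square of the reported correlation coefficient.
r2_from_r = 0.714524 ** 2

print(round(100 * r2_from_anova, 2), round(100 * r2_from_r, 2))
```

Both routes give about 51.05 percent, matching the reported R-squared of 51.0545 percent.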
Homoscedasticity
[Figure: Residual plot (x-axis: house size in sqft, X 1000; y-axis: studentized residual)]
o The residual plot suggests that the data are homoscedastic because the
points form a roughly oval band. This means that the standard deviation of
the distribution of the price is about the same at every house size. This
indicates that the R.M.S. error provides a reliable estimate of the common
standard deviation, which describes the variability of the price
of the house at each single house size.
Normality
[Figure: Normal probability plot of the residuals (x-axis: residuals, X 10000; y-axis: percentage)]
o The residuals are approximately bell-shaped (normal) because the normal
probability plot looks like a straight line.
o Estimated price for a house of 3,000 square feet:
$307,111.0 + $194.476(3,000) = $890,539
R.M.S. error: 155414.0
$890,539 ± 2(155414.0) = ($579,711, $1,201,367)
Interpretation: This means we have 95% confidence (or most
likely) that a house that is 3 thousand square feet will have a price
between $579,711 and $1,201,367.
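The interval calculation for a 3,000 square foot house can be sketched as follows. The size is entered in square feet because the slope is expressed in dollars per square foot, and the "estimate ± 2 × R.M.S. error" rule is the rough 95% interval used in this report; the function names are illustrative.

```python
def predict_price(intercept, slope, x):
    """Estimated price from the fitted line at predictor value x."""
    return intercept + slope * x


def rough_95_interval(estimate, rms_error):
    """Estimate plus or minus two R.M.S. errors."""
    margin = 2 * rms_error
    return estimate - margin, estimate + margin


# House size is entered in square feet (3000), not in thousands.
estimate = predict_price(307111.0, 194.476, 3000)
low, high = rough_95_interval(estimate, 155414.0)
print(round(estimate), round(low), round(high))
```

This simple rule relies on the homoscedasticity and normality checks discussed above; it also ignores the extra uncertainty in the estimated line itself.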
The Relationship between Lot Size and Price
[Figure: Plot of price (in dollars) vs lot size (sqft)]
[Figure: Box-and-whisker plot of lot size in sqft]
From the scatter plot we can see that as lot size increases, the price also tends
to increase (a positive direction). The form of the relationship is roughly linear:
the price changes by a roughly constant amount for each unit change in lot size. The
points are widely scattered around a line, which indicates that the relationship is
weak. The Box and Whisker Plot shows there are two houses with extremely large lot
sizes in our study. The scatter plot does not exhibit any distinct groups.
[Figure: Plot of fitted model, price (in dollars) vs lot size (sqft)]
Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: price _in dollars_
Independent variable: lot size _sqft_
-----------------------------------------------------------------------------
Standard T
Parameter Estimate Error Statistic P-Value
-----------------------------------------------------------------------------
Intercept 675671.0 58195.0 11.6105 0.0000
Slope 28.9755 8.23183 3.51994 0.0007
-----------------------------------------------------------------------------
Analysis of Variance
-----------------------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
-----------------------------------------------------------------------------
Model 5.4279E11 1 5.4279E11 12.39 0.0007
Residual 4.29326E12 98 4.38088E10
-----------------------------------------------------------------------------
Total (Corr.) 4.83605E12 99
Correlation Coefficient = 0.33502
R-squared = 11.2238 percent
R-squared (adjusted for d.f.) = 10.3179 percent
Standard Error of Est. = 209306.0
Mean absolute error = 168263.0
Durbin-Watson statistic = 2.21536 (P=0.1385)
Lag 1 residual autocorrelation = -0.120617
Correlation coefficient: 0.33502
o The decision point: 0.196; therefore, 0.196 < 0.33502
Since the correlation coefficient is bigger than the decision point, which is
0.196, there is evidence of a positive linear relationship in the population.
o Interpretation: For all the houses, larger lot sizes tend to go with
higher prices and smaller lot sizes with lower prices, although the
relationship is weak.
The Simple Linear Regression Model is the mathematical model (i.e., the mathematical
equation) that describes the relationship.
Least squares estimated regression
o Estimated price in dollars = 675671.0 + 28.9755*lot size in square feet
For each additional square foot of lot size, the estimated
price of the house increases by $28.9755.
The intercept has no physical meaning because the size of a lot
can’t be zero. It is used for the purpose of positioning the line.
The Assumptions
Linearity
o The coefficient of determination: 11.2238 percent
Interpretation: 11.2238% of the total variation in the price of the
house can be explained by the linear relationship between the price
of the house and the lot size.
Homoscedasticity
[Figure: Residual plot (x-axis: lot size in sqft, X 1000; y-axis: studentized residual)]
o The residual plot suggests that the data are homoscedastic because the
points form a roughly oval band. This means that the standard deviation of
the distribution of the price is about the same at every lot size. This
indicates that the R.M.S. error provides a reliable estimate of the
common standard deviation, which describes the
variability of the price of the house at each lot size.
Normality
[Figure: Normal probability plot of the residuals (x-axis: residuals, X 100000; y-axis: percentage)]
o The residuals are not bell-shaped (normal) because the normal
probability plot does not look like a straight line.
o Estimated price for a house on a 3,000 square foot lot:
$675,671.0 + $28.9755(3,000) = $762,597.50
R.M.S. error: 209306
$762,597.50 ± 2(209306) = ($343,985.50, $1,181,209.50)
Interpretation: Since the residuals are not normally distributed,
we cannot say with 95% confidence that for a house with a 3
thousand square feet lot, the price of the house is between $343,985.50
and $1,181,209.50.
o We therefore should not use the interval found above as a 95%
prediction interval: even though the R.M.S. error is an appropriate
estimate of the common standard deviation, the normality assumption is
violated.
The Relationship between Price and Number of Bedrooms
[Figure: Plot of price (in dollars) vs number of bedrooms]
[Figure: Box-and-whisker plot of number of bedrooms]
From the scatter plot we can see that as the number of bedrooms increases, the
price also tends to increase (a positive direction). The form of the relationship is
roughly linear: the price changes by a roughly constant amount for each additional
bedroom. The points are widely scattered around a line, which indicates that the
relationship is weak. The Box and Whisker Plot shows two extremely high values for
the number of bedrooms and one extremely low value in our
study. The scatter plot does not exhibit any distinct groups.
[Figure: Plot of fitted model, price (in dollars) vs number of bedrooms]
Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: price _in dollars_
Independent variable: # of bedrooms
-----------------------------------------------------------------------------
Standard T
Parameter Estimate Error Statistic P-Value
-----------------------------------------------------------------------------
Intercept 558340.0 81200.2 6.87608 0.0000
Slope 66767.6 16998.1 3.92795 0.0002
-----------------------------------------------------------------------------
Analysis of Variance
-----------------------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
-----------------------------------------------------------------------------
Model 6.5781E11 1 6.5781E11 15.43 0.0002
Residual 4.17824E12 98 4.26351E10
-----------------------------------------------------------------------------
Total (Corr.) 4.83605E12 99
Correlation Coefficient = 0.368812
R-squared = 13.6022 percent
R-squared (adjusted for d.f.) = 12.7206 percent
Standard Error of Est. = 206483.0
Mean absolute error = 168419.0
Durbin-Watson statistic = 2.12485 (P=0.2679)
Lag 1 residual autocorrelation = -0.0745949
Correlation coefficient: 0.368812
o The decision point: 0.196; therefore, 0.196 < 0.368812
Since the correlation coefficient is bigger than the decision point, which is
0.196, there is evidence of a positive linear relationship in the population.
o Interpretation: For all the houses, a larger number of bedrooms tends to
go with a higher price and a smaller number with a lower price, although
the relationship is weak.
The Simple Linear Regression Model is the mathematical model (i.e., the mathematical
equation) that describes the relationship.
Least squares estimated regression
o Estimated price in dollars = 558,340.00 + 66,767.60*number of bedrooms
For each additional bedroom, the estimated price of the
house increases by $66,767.60.
The intercept, $558,340.00, is the estimated price of a house with
no bedrooms. Since every house in the sample has at least 2 bedrooms,
it serves mainly to position the line.
The Assumptions
Linearity
o The coefficient of determination: 13.6022 percent
Interpretation: 13.6022% of the total variation in the price of the
house can be explained by the linear relationship between the price
of the house and the number of bedrooms.
Homoscedasticity
[Figure: Residual plot (x-axis: number of bedrooms; y-axis: studentized residual)]
o The residual plot suggests that the data are homoscedastic because the
points form a roughly oval band. This means that the standard deviation of
the distribution of the price is about the same at each number of bedrooms.
This indicates that the R.M.S. error provides a reliable estimate of the
common standard deviation, which describes the
variability of the price of the house at each number of bedrooms.
Normality
[Figure: Normal probability plot of the residuals (x-axis: residuals, X 10000; y-axis: percentage)]
o The residuals are approximately bell-shaped (normal) because the normal
probability plot looks like a straight line.
o Estimated price for a house with 3 bedrooms:
$558,340.00 + $66,767.60(3) = $758,642.80
R.M.S. error: 206483
758642 ± 2(206483) = (345676, 1171608)
Interpretation: This means we have 95% confidence (or most
likely) that for a house with 3 bedrooms, the price of the house is
between $345,676 and $1,171,608.
The Relationship between Age and Price
[Figure: Plot of price (in dollars) vs age range (0-90+ years old)]
[Figure: Box-and-whisker plot of age range from 0 to 90 years old]
From the scatter plot we can see that as the age of the house increases, the
price tends to decrease (a negative direction). The form is not clearly linear: the
price does not change by the same amount for each unit change in age across the
whole range. The points are fairly widely scattered around a line, which indicates a
mild relationship. The Box and Whisker Plot shows there are no extremely high or
extremely low values in the age range in our study. The scatter plot exhibits two
distinct groups.
[Figure: Plot of Fitted Model, price (in dollars) vs. age range (0 to 90 years old)]
Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: price _in dollars_
Independent variable: age range _0_90_ years old_
-----------------------------------------------------------------------------
Parameter Estimate Standard Error T Statistic P-Value
-----------------------------------------------------------------------------
Intercept 1.00912E6 38337.4 26.3222 0.0000
Slope -4182.14 992.976 -4.21172 0.0001
-----------------------------------------------------------------------------
Analysis of Variance
-----------------------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
-----------------------------------------------------------------------------
Model 7.91556E11 1 7.91556E11 17.74 0.0001
Residual 3.21288E12 72 4.46233E10
-----------------------------------------------------------------------------
Total (Corr.) 4.00443E12 73
Correlation Coefficient = -0.444601
R-squared = 19.767 percent
R-squared (adjusted for d.f.) = 18.6526 percent
Standard Error of Est. = 211242.0
Mean absolute error = 173166.0
Durbin-Watson statistic = 2.17166 (P=0.2351)
Lag 1 residual autocorrelation = -0.0979682
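The summary figures printed under the Analysis of Variance table can be checked against the sums of squares in the table itself. The short Python sketch below is only an illustration of that arithmetic; the variable names are our own, and the inputs are the rounded values printed above, so the last digits may differ slightly from the software output.

```python
import math

# Values copied from the Analysis of Variance table above (price vs. age).
ss_model = 7.91556e11      # "Model" sum of squares
ss_residual = 3.21288e12   # "Residual" sum of squares
df_model = 1
df_residual = 72

ss_total = ss_model + ss_residual
ms_model = ss_model / df_model
ms_residual = ss_residual / df_residual

# R-squared: share of the total variation explained by the fitted line.
r_squared = ss_model / ss_total
# F-ratio: model mean square divided by residual mean square.
f_ratio = ms_model / ms_residual
# Standard error of the estimate (the R.M.S. error used later in the report).
std_error_est = math.sqrt(ms_residual)

print(f"R-squared = {100 * r_squared:.3f} percent")
print(f"F-ratio = {f_ratio:.2f}")
print(f"Standard error of est. = {std_error_est:.0f}")
```

The three printed values should agree, up to rounding, with the R-squared, F-Ratio and Standard Error of Est. lines reported above.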
Correlation coefficient: -0.444601
o The decision points: ±0.196; therefore, -0.444601 < -0.196
Since the correlation coefficient is smaller than the lower decision point,
which is -0.196, the population has an obvious linear relationship.
o Interpretation: For all the houses, as the age of the house increases or
decreases, the price of the house tends to decrease or increase
proportionally.
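The decision-point comparison above can be cross-checked with the usual t statistic for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2). This is a sketch under our own assumptions, not the method used in the report: the sample size n = 74 is taken from the regression output (73 total degrees of freedom plus one), and the function name is hypothetical.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = -0.444601   # correlation coefficient reported above
n = 74          # assumed sample size: Total (Corr.) Df of 73, plus 1

t_stat = correlation_t_statistic(r, n)
print(f"t = {t_stat:.4f}")
```

In simple linear regression this statistic equals the T statistic of the slope, so it should match the slope row of the regression table (about -4.21).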
The Simple Linear Regression Model is the mathematical model to explain the
relationship (i.e., mathematical equation)
Least squares estimated regression:
o Estimated price in dollars = 1009120 – 4182.14*the age of the house
o Interpretation:
With each passing year, the estimated price of the house
decreases by $4,182.14.
If the house is zero years old, then the estimated price of the
house will be $1,009,120.
The assumptions
Linearity
o The coefficient of determination: 19.767 percent
Interpretation: 19.767% of the total variation in the price of the
house can be explained by the linear relationship between the price
of the house and the age of the house
Homoscedasticity
[Figure: Residual Plot, studentized residual vs. age range (0 to 90 years old)]
o The graph shows that the data are not homoscedastic because the shape of
the graph is neither oval nor random. This means that the standard
deviation of the distribution of y-values is not the same at each x-value.
This indicates that the R.M.S. error does not provide a reliable estimate of
the common standard deviation of the distribution to explain the
variability of the price of the house at different ages.
Normality
[Figure: Normal Probability Plot of the residuals (price vs. age), percentage vs. residuals (X 100000)]
o The distribution is not bell-shaped because the normal probability plot
does not look like a straight line.
o Estimated price: $996,573.58
R.M.S. error: 211242
996573 ± 2(211242) = (574089, 1419057)
Interpretation: Since the normal probability plot is not a straight line, we
do not have 95% confidence that the price of a house that is 3 years old is
between $574,089 and $1,419,057.
o This doubly confirms that we should not use the above 95% prediction
interval: normality is violated and, in addition, the R.M.S. error is not an
appropriate estimate of the differing standard deviations.
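For reference, the arithmetic behind the estimate and the "estimate ± 2 × R.M.S. error" interval can be sketched in Python as follows. The function names are our own, and the code only reproduces the calculation; as noted above, the interval itself should not be relied on for this model because the assumptions are violated.

```python
# Coefficients and R.M.S. error copied from the price vs. age regression output.
INTERCEPT = 1009120.0
SLOPE_PER_YEAR = -4182.14
RMS_ERROR = 211242.0

def estimated_price(age_years):
    """Least squares estimate of price for a house of the given age."""
    return INTERCEPT + SLOPE_PER_YEAR * age_years

def rough_95_interval(estimate, rms_error):
    """Estimate plus or minus two R.M.S. errors, as used in the report."""
    return estimate - 2 * rms_error, estimate + 2 * rms_error

price_at_3 = estimated_price(3)
lower, upper = rough_95_interval(price_at_3, RMS_ERROR)
print(f"Estimated price: {price_at_3:,.2f}")
print(f"Interval: ({lower:,.0f}, {upper:,.0f})")
```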
The Relationship between the House Size and Lot Size
[Figure: Plot of lot size (sqft) vs. house size (sqft)]
[Figure: Box-and-Whisker Plot of house size (sqft)]
From the graph and the way the data are placed, we can deduce that as the house size
changes, the lot size will also change. The form of the graph tells us that as the house
size changes, the lot size changes proportionately at every value. The closeness of the
data to a line represents the strength of the relationship: as the house size increases, the
lot size shows a mild proportional increase. The Box-and-Whisker Plot shows that there
are no extremely high or extremely low house sizes in our study. The scatter plot exhibits
two distinct groups.
[Figure: Plot of Fitted Model, lot size (sqft) vs. house size (sqft)]
Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: lot size _sqft_
Independent variable: RESIDUALSHsLs
-----------------------------------------------------------------------------
Parameter Estimate Standard Error T Statistic P-Value
-----------------------------------------------------------------------------
Intercept 6596.43 95.6433 68.9692 0.0000
Slope 1.0 0.0405307 24.6726 0.0000
-----------------------------------------------------------------------------
Analysis of Variance
-----------------------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
-----------------------------------------------------------------------------
Model 5.56853E8 1 5.56853E8 608.74 0.0000
Residual 8.96468E7 98 914763.0
-----------------------------------------------------------------------------
Total (Corr.) 6.465E8 99
Correlation Coefficient = 0.928081
R-squared = 86.1335 percent
R-squared (adjusted for d.f.) = 85.992 percent
Standard Error of Est. = 956.433
Mean absolute error = 788.861
Durbin-Watson statistic = 2.13275 (P=0.2539)
Lag 1 residual autocorrelation = -0.0928307
Correlation coefficient: 0.928081
o The decision points: 0.196 and -0.196; therefore, 0.196 < 0.928081
Because the correlation coefficient is bigger than 0.196, the
population has an obvious linear relationship.
o Interpretation: For all the houses, if the house size were to be increased or
decreased, then the lot size would have a proportional increase or decrease.
The Simple Linear Regression Model is the mathematical model to explain the
relationship (i.e., mathematical equation)
Least squares estimated regression
o Estimated lot size in square feet = 6596.43 + 1.0*house size in square feet
If a house increases its size by one square foot, the estimated lot
size increases by one square foot.
The intercept has no physical meaning because the size of a house
cannot be zero. It serves only to position the line.
The Assumptions
Linearity
o The coefficient of determination: 11.2238
Interpretation: 11.2238% of the total variation in the price of the
house can be explained by the linear relationship between the price
of the house and the lot size.
Homoscedasticity
[Figure: Residual Plot, studentized residual vs. house size (sqft)]
o The graph shows that the data are homoscedastic because the graph is
oval-shaped. This means that the standard deviations of the
distribution of the lot size at each house size are the same. This
indicates that the R.M.S. error provides a reliable estimate of the
common standard deviation of the distribution to explain the
variability of the lot size at each house size.
Normality
[Figure: Normal Probability Plot of the residuals (lot size and house size), percentage vs. residuals (X 1000)]
o The distribution is not bell-shaped because the normal probability plot
does not look like a straight line.
o Estimated house size: 6599.43 square feet
R.M.S. error: 956.433
6599.43 ± 2(956.433) = (4686.564, 8512.296)
Interpretation: Since the normal probability plot is not a straight line, we
do not have 95% confidence that a house with 3 thousand square feet lot
size will have a house that is between 4686.564 square feet and 8512.296
square feet.
The Relationship between House Size and Number of Bedrooms
[Figure: Plot of house size (sqft) vs. # of bedrooms]
[Figure: Box-and-Whisker Plot of number of bedrooms]
From the graph and the way the data are placed, we can deduce that as the number
of bedrooms increases, the house size will also increase. The form of the graph tells us
that as the number of bedrooms changes, the house size also changes proportionally at
every value. The closeness of the data to a line represents the strength of the relationship:
as the number of bedrooms increases, the house size shows a weak proportional
increase. The Box-and-Whisker Plot shows that there are two houses with an extremely
high number of bedrooms and one house with an extremely low number of bedrooms in
our study. The scatter plot does not exhibit any distinct groups.
[Figure: Plot of Fitted Model, house size (sqft) vs. # of bedrooms]
Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: house size _sqft_
Independent variable: # of bedrooms
-----------------------------------------------------------------------------
Parameter Estimate Standard Error T Statistic P-Value
-----------------------------------------------------------------------------
Intercept 1277.51 273.996 4.66251 0.0000
Slope 346.418 57.3569 6.0397 0.0000
-----------------------------------------------------------------------------
Analysis of Variance
-----------------------------------------------------------------------------
Source Sum of Squares Df Mean Square F-Ratio P-Value
-----------------------------------------------------------------------------
Model 1.7708E7 1 1.7708E7 36.48 0.0000
Residual 4.75736E7 98 485445.0
-----------------------------------------------------------------------------
Total (Corr.) 6.52816E7 99
Correlation Coefficient = .520822
R-squared = 27.1256 percent
R-squared (adjusted for d.f.) = 26.382 percent
Standard Error of Est. = 696.739
Mean absolute error = 590.125
Durbin-Watson statistic = 2.10262 (P=.3059)
Lag 1 residual autocorrelation = -.0666658
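Two relationships in the output above are easy to verify by hand: each T statistic is the estimate divided by its standard error, and R-squared is the square of the correlation coefficient. The Python sketch below illustrates both checks using the printed (rounded) values; the variable names are our own.

```python
# Values copied from the regression output above (house size vs. # of bedrooms).
# Each entry is (estimate, standard error).
parameters = {
    "Intercept": (1277.51, 273.996),
    "Slope": (346.418, 57.3569),
}
correlation = 0.520822

# T statistic = estimate / standard error.
t_statistics = {name: est / se for name, (est, se) in parameters.items()}

# In simple linear regression, R-squared is the squared correlation.
r_squared_percent = 100 * correlation ** 2

for name, t in t_statistics.items():
    print(f"{name}: T = {t:.4f}")
print(f"R-squared = {r_squared_percent:.4f} percent")
```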
Correlation coefficient: 0.520822
o The decision point: 0.196; therefore, 0.196 < 0.520822
Because the correlation coefficient is bigger than 0.196, the population
has a linear relationship.
o Interpretation: For all the houses, if the number of bedrooms were to
be increased or decreased, then the size of the houses would have a
proportional increase or decrease.
The Simple Linear Regression Model is the mathematical model to explain the
relationship (i.e., mathematical equation)
Least squares estimated regression
o Estimated house size in square feet = 1277.51 + 346.418*number of
bedrooms
If a house adds 1 more bedroom, the estimated size of the house
increases by 346.418 square feet.
If the house has zero bedrooms, then the estimated size
of the house will be 1277.51 square feet.
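The interpretation of the slope and intercept can be illustrated with a few lines of Python. This is a sketch only: the function name is hypothetical, and the range of 2 to 8 bedrooms is an assumption taken from the axis of the scatter plot.

```python
# Coefficients copied from the least squares equation above.
INTERCEPT_SQFT = 1277.51
SLOPE_SQFT_PER_BEDROOM = 346.418

def estimated_house_size(bedrooms):
    """Estimated house size in square feet for a given number of bedrooms."""
    return INTERCEPT_SQFT + SLOPE_SQFT_PER_BEDROOM * bedrooms

# Estimates over the assumed range of the data (2 to 8 bedrooms).
for bedrooms in range(2, 9):
    print(f"{bedrooms} bedrooms: {estimated_house_size(bedrooms):.3f} sqft")

# Adding one bedroom always changes the estimate by the slope.
increase_per_bedroom = estimated_house_size(4) - estimated_house_size(3)
```

Each step from one row to the next differs by the slope, which is what "an increase of 346.418 square feet per bedroom" means.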
The Assumptions
Linearity
o The coefficient of determination: 27.1256 percent
Interpretation: 27.1256% of the total variation in the size of the
house can be explained by the linear relationship between the size
of the house and the number of bedrooms.
Homoscedasticity
[Figure: Residual Plot, studentized residual vs. # of bedrooms]
o The graph shows that the data are homoscedastic because the shape of
the graph is an oval. This means that the standard deviations of the
distribution of the house size at each number of bedrooms are
the same. This indicates that the R.M.S. error provides a reliable
estimate of the common standard deviation of the distribution to
explain the variability of the size of the house at each number of
bedrooms.
Normality
[Figure: Normal Probability Plot of the residuals (house size vs. # of bedrooms), percentage vs. residuals]
o The distribution is not bell-shaped because the normal
probability plot does not look like a straight line.
o Estimated house size: 1277.51 + 346.418(3) = 2316.764 square feet
R.M.S. error: 696.739
2316.764 ± 2(696.739) = (923.286, 3710.242)
Interpretation: Since the normal probability plot is not a straight line,
we do not have 95% confidence that a house with 3 bedrooms
will have a house size between 923.286 and 3710.242 square feet.
Relationship between Features and Age
[Figure: Barchart for age range (0 to 90 years old) by features (no / yes), frequency vs. age range]
Frequency Table for age range _0_90_ years old__1 by features
Row
no yes Total
---------------------------
1 | 4 | 40 | 44
| 5.41% | 54.05% | 59.46%
| 7.73 | 36.27 |
---------------------------
2 | 9 | 21 | 30
| 12.16% | 28.38% | 40.54%
| 5.27 | 24.73 |
---------------------------
Column 13 61 74
Total 17.57% 82.43% 100.00%
Note: Column 1: 0 – 30 years old
Column 2: 31-95 years old
Null hypothesis: In the population of houses, there is no relationship between features
and age.
Alternative hypothesis: In the population of houses, there is a relationship between
features and age.
Chi-Square Test
------------------------------------------
Chi-Square Df P-Value
------------------------------------------
5.39 1 0.0203
4.04 1 0.0445 (with Yates' correction)
------------------------------------------
Fisher's Exact Test for 2 by 2 Tables
-------------------------------------
One-tailed P-value = 0.0228914
Two-tailed P-value = 0.0294331
Since Degree of Freedom = 1, the Decision Point is 3.84.
Therefore, since the Chi-Square, 5.39, is larger than the Decision Point, we have enough
evidence to conclude that there is an association between features and age.
Hence, the alternative hypothesis is supported. For all houses, there is a relationship
between age and features (i.e., the older the house, the fewer features it has). Our
hypothesis was correct: the older the house, the fewer features it has.
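The two Chi-Square values reported for this 2 by 2 table can be reproduced from the observed frequencies. The Python sketch below is an illustration under our own assumptions: the function name is hypothetical, and the continuity correction is implemented in the common form that subtracts 0.5 from each absolute difference between observed and expected counts.

```python
def chi_square_statistic(observed, yates=False):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            difference = abs(count - expected)
            if yates:
                # Yates' continuity correction, intended for 2 by 2 tables.
                difference = max(difference - 0.5, 0.0)
            statistic += difference ** 2 / expected
    return statistic

# Observed counts from the frequency table (rows: age range 1 and 2;
# columns: features "no" and "yes").
age_by_features = [[4, 40], [9, 21]]
DECISION_POINT = 3.84  # decision point quoted in the report for 1 degree of freedom

plain = chi_square_statistic(age_by_features)
corrected = chi_square_statistic(age_by_features, yates=True)
print(f"Chi-square = {plain:.2f}, with Yates' correction = {corrected:.2f}")
print("Exceeds decision point:", plain > DECISION_POINT)
```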
Relationship between Features and House Size
[Figure: Barchart for house size (sqft) by features (no / yes), frequency vs. house size]
Frequency Table for house size _sqft__1 by features
Row
no yes Total
---------------------------
1 | 11 | 29 | 40
| 11.00% | 29.00% | 40.00%
| 8.00 | 32.00 |
---------------------------
2 | 8 | 23 | 31
| 8.00% | 23.00% | 31.00%
| 6.20 | 24.80 |
---------------------------
3 | 1 | 28 | 29
| 1.00% | 28.00% | 29.00%
| 5.80 | 23.20 |
---------------------------
Column 20 80 100
Total 20.00% 80.00% 100.00%
Cell contents:
Observed frequency
Percentage of table
Expected frequency
Note: Column 1: 1000 – 2500 square feet
Column 2: 2501 – 3400 square feet
Column 3: 3401 – 5000 square feet
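The expected frequencies in the third line of each cell come from the row and column totals: expected = (row total × column total) / grand total. A minimal Python sketch of that calculation for the table above follows; the variable names are our own.

```python
# Observed counts from the frequency table (rows: house size groups 1 to 3;
# columns: features "no" and "yes").
observed = [
    [11, 29],
    [8, 23],
    [1, 28],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under "no relationship": row total * column total / grand total.
expected = [
    [row_total * column_total / grand_total for column_total in column_totals]
    for row_total in row_totals
]

for row in expected:
    print("  ".join(f"{value:.2f}" for value in row))
```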
Null hypothesis: In the population of houses, there is no relationship between features
and house size.
Alternative hypothesis: In the population of houses, there is a relationship between
features and house size.
Chi-Square Test
------------------------------------------
Chi-Square Df P-Value
------------------------------------------
7.02 2 0.0298
------------------------------------------
Since Degree of Freedom = 2, the Decision Point is 5.99.
Therefore, since the Chi-Square, 7.02, is larger than the Decision Point, we have enough
evidence to conclude that there is an association between features and house size.
Hence, the alternative hypothesis is supported. For all houses, there is a relationship
between house size and features. Our hypothesis was correct: the larger the house, the
more features it has.
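As a check on the test above, the Chi-Square value and its P-Value can be recomputed from the observed counts. The Python sketch below is illustrative only and uses our own names. It relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-square distribution is exp(-x / 2), so no statistics library is needed; this shortcut does not apply to other degrees of freedom.

```python
import math

# Observed counts (rows: house size groups 1 to 3; columns: features no / yes).
observed = [[11, 29], [8, 23], [1, 28]]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, count in enumerate(row):
        expected = row_totals[i] * column_totals[j] / grand_total
        chi_square += (count - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
DECISION_POINT = 5.99  # decision point quoted in the report for 2 degrees of freedom

# Valid only because degrees_of_freedom == 2 here.
p_value = math.exp(-chi_square / 2)

print(f"Chi-square = {chi_square:.2f}, Df = {degrees_of_freedom}, P-Value = {p_value:.4f}")
print("Exceeds decision point:", chi_square > DECISION_POINT)
```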
Relationship between Features and Lot size
[Figure: Barchart for lot size (sqft) by features (no / yes), frequency vs. lot size]
Frequency Table for lot size _sqft__1 by features
Row
no yes Total
---------------------------
1 | 7 | 29 | 36
| 7.00% | 29.00% | 36.00%
| 7.20 | 28.80 |
---------------------------
2 | 8 | 24 | 32
| 8.00% | 24.00% | 32.00%
| 6.40 | 25.60 |
---------------------------
3 | 5 | 27 | 32
| 5.00% | 27.00% | 32.00%
| 6.40 | 25.60 |
---------------------------
Column 20 80 100
Total 20.00% 80.00% 100.00%
Note: Column 1: 1500 – 5500 square feet
Column 2: 5501 – 7500 square feet
Column 3: 7501 – 15500 square feet
Null hypothesis: In the population of houses, there is no relationship between features
and lot size.
Alternative hypothesis: In the population of houses, there is a relationship between
features and lot size.
Chi-Square Test
------------------------------------------
Chi-Square Df P-Value
------------------------------------------
7.02 2 0.0298
------------------------------------------
Since Degree of Freedom = 2, the Decision Point is 5.99.
Therefore, since the Chi-Square, 7.02, is larger than the Decision Point, we have enough
evidence to conclude that there is an association between features and lot size.
Hence, the alternative hypothesis is supported. For all houses, there is a relationship
between lot size and features. Our hypothesis was correct: the bigger the lot size, the
more features a house has.
Relationship between Features and Price
[Figure: Barchart for price (in dollars) by features (no / yes), frequency vs. price in dollars]
Frequency Table for price _in dollars__1 by features
Row
no yes Total
---------------------------
1 | 11 | 18 | 29
| 11.00% | 18.00% | 29.00%
| 5.80 | 23.20 |
---------------------------
2 | 4 | 29 | 33
| 4.00% | 29.00% | 33.00%
| 6.60 | 26.40 |
---------------------------
3 | 5 | 33 | 38
| 5.00% | 33.00% | 38.00%
| 7.60 | 30.40 |
---------------------------
Column 20 80 100
Total 20.00% 80.00% 100.00%
Note: Column 1: 200,000 – 700,000 dollars
Column 2: 700,001 – 900,000 dollars
Column 3: 900,001 – 1,400,000 dollars
Null hypothesis: In the population of houses, there is no relationship between features
and price.
Alternative hypothesis: In the population of houses, there is a relationship between
features and price.
Chi-Square Test
------------------------------------------
Chi-Square Df P-Value
------------------------------------------
8.22 2 0.0164
------------------------------------------
Since Degree of Freedom = 2, the Decision Point is 5.99.
Therefore, since the Chi-Square, 8.22, is larger than the Decision Point, we have enough
evidence to conclude that there is an association between features and price.
Hence, the alternative hypothesis is supported. For all houses, there is a relationship
between price and features. Our hypothesis was correct: the more the house costs, the
more features it has.
Section 5.0 Conclusions
Many factors can influence the price of a house in the Vancouver area of
Greater Vancouver. In this project, we chose five variables that we consider to
have the biggest influence on price: number of bedrooms, age range, features, house
size and lot size. Price is influenced by how the number of bedrooms, age, features,
house size and lot size vary from house to house.
From our analysis, we can see how each variable is related to the price of the
house and to other variables as well. However, not every variable is related to every
other variable. Therefore, the way all these factors are related still depends on the
buyer’s preferences when purchasing a house.
Since starting the project, we have encountered several limitations to our
results. Our first limitation: the houses we sampled are not representative of the
houses in the area of Vancouver West because we took the sample from a website
and sampled only houses that were on the market. We did not sample houses that
had been sold or are currently lived in.
Another limitation is that we had limited time, and more detailed sampling is
costly; therefore, the sample of houses we analyzed is not 100% representative of
Vancouver West. This is because each area has its own price range; the exact same
house in one area may have a different price in another area.
Our third limitation is that we took our samples from a website. If we wanted to
recheck a house the next day, the number drawn in the Simple Random Sampling
might point to a different house, because houses are sold each day and removed from
the site, and the number that used to represent a house we sampled may have been
reassigned to another house.
The fourth limitation is that we assessed the factors that influence the price of a
house systematically, but the buyer’s preferences unquestionably influence the price
of a house as well. For example, some parents will deliberately look for a house
within the boundaries of a good school for the sake of their children, so they are willing
to pay a high price to obtain that convenience and benefit.
Section 6.0 Contributions
Kelvin Cheung
Bivariate
House
Instant noodles
Na Ding
Bivariate
Univariate – feature
Some editing
Sampling – Search for Houses
Stephanie Gozali
Bivariate
Univariate
Putting the report together (i.e., intro to conclusion)
SGP work
Sampling using SRS
Elaine Wong
Edit the whole report, i.e., grammar, sentence structure, etc
Help in SGP
Bivariate
Sampling – Search for Houses