Statistic 1181 Project


Kelvin Cheung

Na Ding

Stephanie Gozali

Elaine Wong



STATS 1181 – 005

November 28, 2004

Table of Contents

Section 1.0 Introduction
Section 2.0 Sampling Method
Section 3.0 The Univariate Summary
    Variable 1: Number of Bedrooms
    Variable 2: Age Range
    Variable 3: Features
    Variable 4: House Size
    Variable 5: Lot Size
    Variable 6: Price
Section 4.0 The Bivariate Summary
    Price VS House Size
    Price VS Lot Size
    Price VS Number of Bedrooms
    Price VS Age Range
    House Size VS Lot Size
    House Size VS Number of Bedrooms
    Features VS Age Range
    Features VS House Size
    Features VS Lot Size
    Features VS Price
Section 5.0 Conclusions
Section 6.0 Contributions

Section 1.0 Introduction

Why do different houses carry different price tags? In this project, we set out to
identify the factors that influence the price of a house in the Vancouver West area of
Greater Vancouver, and to assess how these factors relate to the price and to one
another. We also try to keep the selection as random as possible so that the sample is
as representative of the target population, Vancouver West, as possible.









In finding the factors that influence the price of a house, there are several
variables that we will consider:

1. Number of bedrooms – Houses differ in their number of bedrooms, and we expect this
to be one of the biggest factors in determining the price of a house.

2. Age range – How old is the house? Has it been renovated or rebuilt? Have there been
additions? Does it have its own character (e.g., Victorian style, brick style)? Houses
vary in age: some people prefer newly built houses, whereas others prefer a character
house; therefore, houses of different ages command different prices.

3. Features – Some houses have extra features, for example a fireplace, swimming pool,
security system, waterfront location, cul-de-sac, or proximity to a shopping area. We
expect these to have an impact on the price.

4. House size (in sqft) – We expect the size of a house to influence its price.
However, some people might prefer a smaller house with a bigger garden, or vice versa.

5. Lot size (in sqft) – The size of the lot a house sits on also has an influence on the
price.

6. Price (in dollars)



To examine the associations between variables, we need hypotheses as the basis of our
discussion. We set out ten hypotheses to examine whether the variables are related to
one another and whether they influence the price.

1. Price and number of bedrooms: As the number of bedrooms increases, the house size
should increase proportionately, leading to an increase in price.

2. Price and age range: The age of a house affects its price. A new house commands a
high price, and the price stays high for a number of years. In other cases, however,
the price increases as the house grows older because the house acquires a character
that differentiates it from other houses.

3. Price and features: The more features a house has or is surrounded by, the more it
will cost. If the features are within the house (e.g., swimming pool, air conditioner,
fireplace, security system), they increase the cost of building the house, which in
turn increases its price. If the features surround the house (e.g., proximity to a
shopping or recreation area), they also increase the price because the house offers
more benefits to the buyer and is therefore worth more.

4. Price and house size: As the size of a house increases, we expect the price to
increase. Although the preferred size depends on the buyer, we still expect larger
houses to cost more.

5. Price and lot size: The bigger the lot, the more expensive the house, because lot
size is measured in square feet and the price of a house is based on its square
footage.

6. Lot size and features: Here we consider only the features that need their own space
on the property (e.g., swimming pool, multi-car garage). We assume that the more such
features a house has, the bigger the lot, because a swimming pool, for example, is
generally outside the house.

7. Lot size and house size: The bigger the house, the bigger the lot. Even with a small
garden or backyard, a larger house needs a larger lot.

8. House size and number of bedrooms: The size of a house will increase in proportion
to the number of bedrooms it has. A small house is unlikely to have many bedrooms
because the space has to be divided among all the rooms (e.g., living room, dining
room, kitchen).

9. House size and features: The more features a house encloses, the bigger the house.
Here we consider the features that are built within the house itself (e.g., Jacuzzi).

10. Features and age range: We think that the older a house is, the fewer features it
will have. We assume that in the past the technology to build these features into a
house was not available, or that including them (e.g., cul-de-sac, swimming pool) was
too expensive or too difficult.










Section 2.0 Sampling Method

For data collection, we obtained all the data from www.realtylink.org, a website that
provides a large portion of southwestern British Columbia’s real estate listings. We
chose Greater Vancouver – Vancouver West as our target population and used the map
search tool to extract all houses within that area. To narrow the results to the houses
we wanted to sample, we set the property type to “house”, the price range to $200,000
to $10,000,000, the number of bedrooms and bathrooms to at least 1, and the age range
to 0 to 90+ years old.



In selecting the sample, we used Simple Random Sampling (SRS). The steps were as
follows:

1. We labelled the houses in the population from 001 to 500.

2. We manually picked a starting corner of the random number table: we started from
the top-left corner and read continuously downward.

3. We read the table three digits at a time.

4. As we read, we kept every number that fell within 001-500.

5. We continued through the table until we had selected 100 houses.
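
The selection steps above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure that was actually carried out: it assumes the 500 listings are labelled 001-500 and substitutes Python's `random` module for the printed random number table. The function name `draw_srs` and the seed value are hypothetical. One difference from reading a table by hand is that `random.sample` draws without replacement, so no label can be selected twice.

```python
import random


def draw_srs(population_size=500, sample_size=100, seed=2004):
    """Draw a simple random sample of labels 1..population_size without replacement.

    The seed is an arbitrary, hypothetical value used only to make the draw repeatable.
    """
    rng = random.Random(seed)
    labels = range(1, population_size + 1)
    return sorted(rng.sample(labels, sample_size))


sample = draw_srs()
print(len(sample))                        # number of houses selected
print([f"{n:03d}" for n in sample[:5]])   # first few labels, zero-padded like 001-500
```

Every label has the same chance of being selected, which is the defining property of a simple random sample.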



Our sampling may suffer from selection bias. We may have under-represented parts of
the area we sampled. A house may also be sampled more than once, because we collected
our data from a website and the same house may be listed by several realtors. Finally,
local events may increase the number of listings in one part of the area, which would
concentrate the sample on houses in that vicinity and make it less representative.










Section 3.0 Univariate Analyses



Variable 1: Number of bedrooms





[Figure: Histogram of number of bedrooms (x-axis: number of bedrooms; y-axis: frequency)]

[Figure: Box-and-whisker plot of number of bedrooms]

Summary Statistics for Number of Bedrooms

Count = 100

Average = 4.62

Median = 5.0

Mode = 5.0

Variance = 1.49051

Standard deviation = 1.22086

Minimum = 2.0

Maximum = 8.0

Range = 6.0

Lower quartile = 4.0

Upper quartile = 5.0

Interquartile range = 1.0





Frequency Tabulation for Number of bedrooms

---------------------------------------------------------------------------------------
                 Lower     Upper                        Relative  Cumulative  Cum. Rel.
Class            Limit     Limit  Midpoint  Frequency  Frequency   Frequency  Frequency
---------------------------------------------------------------------------------------
At or below                  0.0                    0     0.0000           0     0.0000
1                  0.0      1.25     0.625          0     0.0000           0     0.0000
2                 1.25       2.5     1.875          4     0.0400           4     0.0400
3                  2.5      3.75     3.125         12     0.1200          16     0.1600
4                 3.75       5.0     4.375         67     0.6700          83     0.8300
5                  5.0      6.25     5.625         11     0.1100          94     0.9400
6                 6.25       7.5     6.875          3     0.0300          97     0.9700
7                  7.5      8.75     8.125          3     0.0300         100     1.0000
8                 8.75      10.0     9.375          0     0.0000         100     1.0000
Above             10.0                              0     0.0000         100     1.0000
---------------------------------------------------------------------------------------
Mean = 4.62   Standard deviation = 1.22086
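
The relative and cumulative columns of a frequency tabulation follow directly from the class frequencies. A minimal sketch, using the eight class frequencies from the bedrooms table above (the variable names are illustrative only):

```python
from itertools import accumulate

# Class frequencies for classes 1-8 of the bedrooms tabulation.
frequencies = [0, 4, 12, 67, 11, 3, 3, 0]
total = sum(frequencies)

relative = [f / total for f in frequencies]             # share of houses in each class
cumulative = list(accumulate(frequencies))              # running count of houses
cumulative_relative = [c / total for c in cumulative]   # running share of houses

rows = zip(frequencies, relative, cumulative, cumulative_relative)
for cls, (f, r, c, cr) in enumerate(rows, start=1):
    print(f"{cls}  {f:3d}  {r:.4f}  {c:3d}  {cr:.4f}")
```

The same calculation applies to the frequency tables of the other variables.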



Interpretation:

Houses differ in their number of bedrooms. The median is 5 bedrooms, so at least half
of the houses we evaluated have 5 or fewer bedrooms and at least half have 5 or more;
the average is 4.62 bedrooms. The distribution is concentrated rather than even: 67%
of the houses have 4 or 5 bedrooms. The box plot marks two extreme high values (7 and
8 bedrooms, six houses in total) and one extreme low value (2 bedrooms, four houses).
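
Which values a box plot marks as extreme can be checked from the quartiles. The sketch below assumes the common 1.5 x IQR rule; the report does not state which rule the plotting software used, so this is an assumption, and the function name `outlier_fences` is hypothetical. The quartiles are taken from the summary statistics for number of bedrooms.

```python
def outlier_fences(lower_quartile, upper_quartile, k=1.5):
    """Return the (low, high) fences; values outside them are flagged as extreme."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - k * iqr, upper_quartile + k * iqr


# Lower quartile = 4.0 and upper quartile = 5.0 for number of bedrooms.
low, high = outlier_fences(4.0, 5.0)
print(low, high)

# Bedroom counts in the sample run from the minimum (2) to the maximum (8).
flagged = [b for b in range(2, 9) if b < low or b > high]
print(flagged)
```

Under this rule, any bedroom count below the lower fence or above the upper fence would appear as a separate point on the box plot.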










Variable 2: Age Range (0-90+ years old)





[Figure: Histogram of age range from 0 to 90 years old (x-axis: age; y-axis: frequency)]

[Figure: Box-and-whisker plot of age range from 0 to 90 years old]

Summary Statistics for Age range from 0 to 90 years old

Count = 74
Average = 29.6486
Median = 17.5
Mode = 7.0
Variance = 619.957
Standard deviation = 24.8989
Minimum = 0.0
Maximum = 94.0
Range = 94.0
Lower quartile = 9.0
Upper quartile = 49.0
Interquartile range = 40.0

Percentiles for Age range from 0 to 90 years old

1.0% = 0.0
5.0% = 2.0
10.0% = 7.0
25.0% = 9.0
50.0% = 17.5
75.0% = 49.0
90.0% = 64.0
95.0% = 80.0
99.0% = 94.0





Frequency Tabulation for Age range from 0 to 90 years old

---------------------------------------------------------------------------------------
                 Lower     Upper                        Relative  Cumulative  Cum. Rel.
Class            Limit     Limit  Midpoint  Frequency  Frequency   Frequency  Frequency
---------------------------------------------------------------------------------------
at or below                -10.0                    0     0.0000           0     0.0000
1                -10.0       5.0      -2.5          5     0.0676           5     0.0676
2                  5.0      20.0      12.5         35     0.4730          40     0.5405
3                 20.0      35.0      27.5          8     0.1081          48     0.6486
4                 35.0      50.0      42.5          8     0.1081          56     0.7568
5                 50.0      65.0      57.5         12     0.1622          68     0.9189
6                 65.0      80.0      72.5          4     0.0541          72     0.9730
7                 80.0      95.0      87.5          2     0.0270          74     1.0000
8                 95.0     110.0     102.5          0     0.0000          74     1.0000
above            110.0                              0     0.0000          74     1.0000
---------------------------------------------------------------------------------------
Mean = 29.6486   Standard deviation = 24.8989










Interpretation:

The age summary is based on 74 houses (Count = 74). Among these, 50% are less than
17.5 years old and 50% are more than 17.5 years old. The distribution is skewed
toward newer houses: the average age (29.6 years) is well above the median, about 54%
of the houses are 20 years old or younger, and only about 8% are older than 65 years.
There are no houses with an extremely high or low age among our observations.










Variable 3: Features

Data variable: features



Number of observations: 100

Number of unique values: 2







[Figure: Bar chart of features (categories: no, yes; axis: frequency)]

Frequency Table for features

------------------------------------------------------------------------
                                Relative    Cumulative   Cum. Rel.
Class    Value    Frequency    Frequency     Frequency   Frequency
------------------------------------------------------------------------
1        no              20       0.2000            20      0.2000
2        yes             80       0.8000           100      1.0000
------------------------------------------------------------------------





[Figure: Pie chart of features (no: 20.00%; yes: 80.00%)]

Interpretation: There are 80 houses with features and 20 houses without features.
Therefore, if we were to randomly select a house from our sample, the chance of
selecting a house with features (80%) is greater than the chance of selecting a house
without features (20%).

Variable 4: House Size





[Figure: Histogram of house size in sqft (x-axis: house size, X 1000; y-axis: frequency)]

[Figure: Box-and-whisker plot of house size in sqft (axis: X 1000)]

Summary Statistics for House size in sqft

Count = 100
Average = 2877.96
Median = 2738.0
Mode = 2200.0
Variance = 659410.0
Standard deviation = 812.041
Minimum = 1056.0
Maximum = 4712.0
Range = 3656.0
Lower quartile = 2266.5
Upper quartile = 3430.0
Interquartile range = 1163.5

Percentiles for House size in sqft

1.0% = 1228.0
5.0% = 1758.0
10.0% = 1894.0
25.0% = 2266.5
50.0% = 2738.0
75.0% = 3430.0
90.0% = 4137.0
95.0% = 4394.5
99.0% = 4690.0





Frequency Tabulation for House size in sqft

---------------------------------------------------------------------------------------
                 Lower     Upper                        Relative  Cumulative  Cum. Rel.
Class            Limit     Limit  Midpoint  Frequency  Frequency   Frequency  Frequency
---------------------------------------------------------------------------------------
at or below                  0.0                    0     0.0000           0     0.0000
1                  0.0     625.0     312.5          0     0.0000           0     0.0000
2                625.0    1250.0     937.5          1     0.0100           1     0.0100
3               1250.0    1875.0    1562.5          9     0.0900          10     0.1000
4               1875.0    2500.0    2187.5         30     0.3000          40     0.4000
5               2500.0    3125.0    2812.5         23     0.2300          63     0.6300
6               3125.0    3750.0    3437.5         21     0.2100          84     0.8400
7               3750.0    4375.0    4062.5         11     0.1100          95     0.9500
8               4375.0    5000.0    4687.5          5     0.0500         100     1.0000
above           5000.0                              0     0.0000         100     1.0000
---------------------------------------------------------------------------------------
Mean = 2877.96   Standard deviation = 812.041



Interpretation:

Among the houses in the sample, 50% are smaller than 2738 square feet and 50% are
larger. The most common sizes are between 1875 and 2500 square feet (30% of the
houses), and the average is about 2878 square feet. Because more of the data lie on the
left side of the histogram, a randomly picked house is more likely than not to be
3125 square feet or smaller (63% of the sample). There are no extremely high or low
house sizes in our sample.










Variable 5: Lot Size





[Figure: Histogram of lot size in sqft (x-axis: lot size, X 1000; y-axis: frequency)]

[Figure: Box-and-whisker plot of lot size in sqft (axis: X 1000)]

Summary Statistics for Lot size in sqft

Count = 100
Average = 6596.44
Median = 6487.5
Mode = 4026.0
Variance = 6.5303E6
Standard deviation = 2555.44
Minimum = 1858.6
Maximum = 15417.0
Range = 13558.4
Lower quartile = 4306.5
Upper quartile = 7919.89
Interquartile range = 3613.39

Percentiles for Lot size in sqft

1.0% = 2235.2
5.0% = 3696.0
10.0% = 3909.43
25.0% = 4306.5
50.0% = 6487.5
75.0% = 7919.89
90.0% = 9697.0
95.0% = 11665.5
99.0% = 14609.5





Frequency Tabulation for Lot size in sqft

---------------------------------------------------------------------------------------
                 Lower     Upper                        Relative  Cumulative  Cum. Rel.
Class            Limit     Limit  Midpoint  Frequency  Frequency   Frequency  Frequency
---------------------------------------------------------------------------------------
at or below                  0.0                    0     0.0000           0     0.0000
1                  0.0    2250.0    1125.0          1     0.0100           1     0.0100
2               2250.0    4500.0    3375.0         25     0.2500          26     0.2600
3               4500.0    6750.0    5625.0         29     0.2900          55     0.5500
4               6750.0    9000.0    7875.0         32     0.3200          87     0.8700
5               9000.0   11250.0   10125.0          7     0.0700          94     0.9400
6              11250.0   13500.0   12375.0          4     0.0400          98     0.9800
7              13500.0   15750.0   14625.0          2     0.0200         100     1.0000
8              15750.0   18000.0   16875.0          0     0.0000         100     1.0000
above          18000.0                              0     0.0000         100     1.0000
---------------------------------------------------------------------------------------
Mean = 6596.44   Standard deviation = 2555.44





Interpretation:

50% of the houses in our sample have a lot smaller than 6487.5 square feet and 50%
have a lot larger than 6487.5 square feet. Lot sizes are concentrated in the lower
values: 86% of the lots are between 2250 and 9000 square feet, and larger lots are
uncommon. The box plot shows two houses with an extremely large lot.










Variable 6: Price





[Figure: Histogram of price in dollars (x-axis: price, X 100000; y-axis: frequency)]

[Figure: Box-and-whisker plot of price in dollars (axis: X 100000)]

Summary Statistics for Price in dollars

Count = 100
Average = 866806.0
Median = 854000.0
Mode =
Variance = 4.8849E10
Standard deviation = 221018.0
Minimum = 299000.0
Maximum = 1.338E6
Range = 1.039E6
Lower quartile = 693000.0
Upper quartile = 996500.0
Interquartile range = 303500.0

Percentiles for Price in dollars

1.0% = 394000.0
5.0% = 555000.0
10.0% = 599000.0
25.0% = 693000.0
50.0% = 854000.0
75.0% = 996500.0
90.0% = 1.194E6
95.0% = 1.28E6
99.0% = 1.318E6









Frequency Tabulation for Price in dollars

---------------------------------------------------------------------------------------
                 Lower     Upper                        Relative  Cumulative  Cum. Rel.
Class            Limit     Limit  Midpoint  Frequency  Frequency   Frequency  Frequency
---------------------------------------------------------------------------------------
at or below                  0.0                    0     0.0000           0     0.0000
1                  0.0  187500.0   93750.0          0     0.0000           0     0.0000
2             187500.0  375000.0  281250.0          1     0.0100           1     0.0100
3             375000.0  562500.0  468750.0          5     0.0500           6     0.0600
4             562500.0  750000.0  656250.0         29     0.2900          35     0.3500
5             750000.0  937500.0  843750.0         31     0.3100          66     0.6600
6             937500.0   1.125E6 1.03125E6         17     0.1700          83     0.8300
7              1.125E6  1.3125E6 1.21875E6         16     0.1600          99     0.9900
8             1.3125E6     1.5E6 1.40625E6          1     0.0100         100     1.0000
above            1.5E6                              0     0.0000         100     1.0000
---------------------------------------------------------------------------------------
Mean = 866806.0   Standard deviation = 221018.0



Interpretation:

Price is the key variable in our study because we are examining the factors that may
or may not influence the price of a house. In this sample, 50% of the houses are
priced below $854,000 and 50% above. The average ($866,806) is close to the median,
and 60% of the houses are priced between $562,500 and $937,500. There are no
extremely high or low prices in the sample.








Section 4.0 Bivariate Analyses



The Relationship between House Size and Price





[Figure: Scatter plot of price in dollars (y-axis, X 100000) vs house size in sqft (x-axis, X 1000)]

[Figure: Box-and-whisker plot of house size in sqft (axis: X 1000)]

From the scatter plot we can see that as house size increases, price also tends to
increase. The form of the relationship is linear: a given change in house size is
associated with about the same change in price across the whole range of sizes. How
closely the points cluster around a line indicates the strength of the relationship,
which here is moderately strong. The box-and-whisker plot shows no extremely high or
low house sizes in our study, and the scatter plot does not exhibit any distinct groups.









[Figure: Plot of fitted model, price in dollars (y-axis, X 100000) vs house size in sqft (x-axis, X 1000)]

Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: price _in dollars_
Independent variable: house size _sqft_
-----------------------------------------------------------------------------
                              Standard             T
Parameter       Estimate         Error     Statistic     P-Value
-----------------------------------------------------------------------------
Intercept       307111.0       57497.9       5.34125      0.0000
Slope            194.476        19.235       10.1105      0.0000
-----------------------------------------------------------------------------

Analysis of Variance
-----------------------------------------------------------------------------
Source          Sum of Squares    Df   Mean Square   F-Ratio   P-Value
-----------------------------------------------------------------------------
Model               2.46902E12     1    2.46902E12    102.22    0.0000
Residual            2.36703E12    98    2.41534E10
-----------------------------------------------------------------------------
Total (Corr.)       4.83605E12    99

Correlation Coefficient = 0.714524
R-squared = 51.0545 percent
R-squared (adjusted for d.f.) = 50.5551 percent
Standard Error of Est. = 155414.0
Mean absolute error = 121476.0
Durbin-Watson statistic = 2.09967 (P=0.3110)
Lag 1 residual autocorrelation = -0.0518703
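
Several of the summary figures in this output can be re-derived from the Analysis of Variance table, which is a useful consistency check. A minimal sketch, using only the sums of squares and degrees of freedom printed above (the variable names are illustrative):

```python
import math

# Values copied from the Analysis of Variance table for price vs house size.
ss_model = 2.46902e12
ss_residual = 2.36703e12
ss_total = 4.83605e12
df_model, df_residual = 1, 98

r_squared = ss_model / ss_total                   # share of variation explained
correlation = math.sqrt(r_squared)                # positive because the slope is positive
ms_residual = ss_residual / df_residual           # residual mean square
f_ratio = (ss_model / df_model) / ms_residual     # F-ratio of the model
std_error_est = math.sqrt(ms_residual)            # standard error of estimate (R.M.S. error)

print(f"R-squared              = {100 * r_squared:.4f} percent")
print(f"Correlation            = {correlation:.4f}")
print(f"F-ratio                = {f_ratio:.2f}")
print(f"Standard error of est. = {std_error_est:.0f}")
```

Each computed value should agree with the corresponding printed figure up to rounding.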







The Simple Linear Regression Model is the mathematical model (i.e., the equation)
that describes the relationship.
• Least squares estimated regression:
  o Estimated price in dollars = 307111.0 + 194.476 * house size in square feet
  o Interpretation:
    ▪ For each additional square foot of house size, the estimated price of the
      house increases by about $194.48.
    ▪ The intercept has no physical meaning because the size of a house cannot be
      zero. It only positions the line.

• Correlation coefficient: 0.714524
  o The decision points are +0.196 and -0.196, and 0.196 < 0.714524. Because the
    correlation coefficient is bigger than the upper decision point, we conclude
    that there is a positive linear relationship between house size and price in the
    population.
  o Interpretation: Across all houses, larger houses tend to have higher prices and
    smaller houses tend to have lower prices.
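
The comparison with the decision point can be reproduced numerically. The sketch below assumes that the decision point of 0.196 comes from the large-sample approximation 1.96 / sqrt(n) with n = 100; the report does not say how the value was obtained, so this is an assumption. As a cross-check, it also converts the correlation coefficient into a t statistic with n - 2 degrees of freedom.

```python
import math

n = 100          # number of houses in the sample
r = 0.714524     # correlation coefficient between price and house size

# Assumed origin of the decision point: 1.96 / sqrt(n).
decision_point = 1.96 / math.sqrt(n)
significant = abs(r) > decision_point

# Equivalent t statistic for testing zero correlation.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

print(f"decision point = {decision_point:.3f}")
print(f"|r| exceeds decision point: {significant}")
print(f"t statistic    = {t_stat:.4f}")
```

The t statistic computed this way should match the T Statistic reported for the slope in the regression output, since both test the same hypothesis in a simple linear regression.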



The assumptions
• Linearity
  o The coefficient of determination: 51.0545 percent
    ▪ Interpretation: 51.0545 percent of the total variation in the price of the
      house can be explained by the linear relationship between the price of the
      house and the house size.

• Homoscedasticity

[Figure: Residual plot (x-axis: house size in sqft, X 1000; y-axis: Studentized residual)]

  o The residual plot shows that the data are homoscedastic because the points form
    an oval-shaped band. This means that the standard deviation of the distribution
    of price is about the same at every house size, so the R.M.S. error provides a
    reliable estimate of this common standard deviation, that is, of the variability
    of house prices at any given house size.

• Normality





[Figure: Normal probability plot of the residuals (x-axis: residuals, X 10000; y-axis: percentage)]

  o The distribution of the residuals is bell-shaped because the normal probability
    plot looks like a straight line.
  o Estimated price for a house of 3000 square feet:
    307111.0 + 194.476 * 3000 = $890,539
    ▪ R.M.S. error: 155414.0
    ▪ $890,539 ± 2(155414.0) = ($579,711, $1,201,367)
    ▪ Interpretation: We have about 95% confidence that (that is, it is most likely
      that) a house of 3 thousand square feet will have a price between $579,711
      and $1,201,367.
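
The interval calculation can be sketched in Python by evaluating the fitted line at a house size entered in square feet and adding and subtracting two R.M.S. errors. This is a rough sketch of the "estimate ± 2 R.M.S. error" rule used in this report, not a full prediction interval; the function names `predict_price` and `approx_interval` are hypothetical, and the coefficients are taken from the regression output for price vs house size.

```python
def predict_price(house_size_sqft, intercept=307111.0, slope=194.476):
    """Estimated price in dollars from the fitted line; size must be in square feet."""
    return intercept + slope * house_size_sqft


def approx_interval(estimate, rms_error=155414.0, multiplier=2):
    """Approximate 95% interval: estimate plus or minus two R.M.S. errors."""
    half_width = multiplier * rms_error
    return estimate - half_width, estimate + half_width


estimate = predict_price(3000)            # 3 thousand square feet, entered as 3000
low, high = approx_interval(estimate)
print(f"estimate: {estimate:,.0f}")
print(f"interval: ({low:,.0f}, {high:,.0f})")
```

Entering the size in the same units as the data matters: the slope is dollars per square foot, so a 3 thousand square foot house must be entered as 3000.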










The Relationship between Lot Size and Price



[Figure: Scatter plot of price in dollars (y-axis, X 100000) vs lot size in sqft (x-axis, X 1000)]

[Figure: Box-and-whisker plot of lot size in sqft (axis: X 1000)]

From the scatter plot we can see that as lot size increases, price also tends to
increase. The form of the relationship is roughly linear, but the points are widely
scattered around the line, so the relationship is weak. The box-and-whisker plot
shows two houses with extremely large lots in our study. The scatter plot does not
exhibit any distinct groups.









[Figure: Plot of fitted model, price in dollars (y-axis, X 100000) vs lot size in sqft (x-axis, X 1000)]

Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: price _in dollars_
Independent variable: lot size _sqft_
-----------------------------------------------------------------------------
                              Standard             T
Parameter       Estimate         Error     Statistic     P-Value
-----------------------------------------------------------------------------
Intercept       675671.0       58195.0       11.6105      0.0000
Slope            28.9755       8.23183       3.51994      0.0007
-----------------------------------------------------------------------------

Analysis of Variance
-----------------------------------------------------------------------------
Source          Sum of Squares    Df   Mean Square   F-Ratio   P-Value
-----------------------------------------------------------------------------
Model                5.4279E11     1     5.4279E11     12.39    0.0007
Residual            4.29326E12    98    4.38088E10
-----------------------------------------------------------------------------
Total (Corr.)       4.83605E12    99

Correlation Coefficient = 0.33502
R-squared = 11.2238 percent
R-squared (adjusted for d.f.) = 10.3179 percent
Standard Error of Est. = 209306.0
Mean absolute error = 168263.0
Durbin-Watson statistic = 2.21536 (P=0.1385)
Lag 1 residual autocorrelation = -0.120617

• Correlation coefficient: 0.33502

  o The decision point is 0.196, and 0.196 < 0.33502. Since the correlation
    coefficient is bigger than the decision point, we conclude that there is a
    positive linear relationship between lot size and price in the population,
    although it is a weak one.
  o Interpretation: Across all houses, larger lots tend to go with higher prices and
    smaller lots with lower prices.

The Simple Linear Regression Model is the mathematical model (i.e., the equation)
that describes the relationship.
• Least squares estimated regression
  o Estimated price in dollars = 675671.0 + 28.9755 * lot size in square feet
    ▪ For each additional square foot of lot size, the estimated price of the house
      increases by about $28.98.
    ▪ The intercept has no physical meaning because the size of a lot cannot be
      zero. It only positions the line.



The Assumptions
• Linearity
  o The coefficient of determination: 11.2238 percent
    ▪ Interpretation: 11.2238% of the total variation in the price of the house can
      be explained by the linear relationship between the price of the house and
      the lot size.

• Homoscedasticity

[Figure: Residual plot (x-axis: lot size in sqft, X 1000; y-axis: Studentized residual)]

  o The residual plot shows that the data are homoscedastic because the points form
    an oval-shaped band. This means that the standard deviation of the distribution
    of price is about the same at every lot size, so the R.M.S. error provides a
    reliable estimate of this common standard deviation, that is, of the variability
    of house prices at any given lot size.

• Normality





[Figure: Normal probability plot of the residuals (x-axis: residuals, X 100000; y-axis: percentage)]

  o The distribution of the residuals is not bell-shaped because the normal
    probability plot does not look like a straight line.
  o Estimated price for a house on a 3000 square foot lot:
    675671.0 + 28.9755 * 3000 = $762,597.50
    ▪ R.M.S. error: 209306
    ▪ 762597.5 ± 2(209306) = (343985.5, 1181209.5)
    ▪ Interpretation: Because the normal probability plot is not a straight line,
      we cannot say with 95% confidence that a house on a 3 thousand square foot
      lot will have a price between $343,985.50 and $1,181,209.50.
  o We therefore should not use the 95% prediction interval above: the R.M.S. error
    is an appropriate estimate of the common standard deviation, but the normality
    assumption is violated.










The Relationship between Price and Number of Bedrooms



[Figure: Scatter plot of price in dollars (y-axis, X 100000) vs number of bedrooms (x-axis)]

[Figure: Box-and-whisker plot of number of bedrooms]

From the scatter plot we can see that as the number of bedrooms increases, price also
tends to increase. The form of the relationship is roughly linear, but the points are
widely scattered around the line, so the relationship is weak. The box-and-whisker
plot shows two extreme high values and one extreme low value for the number of
bedrooms in our study. The scatter plot does not exhibit any distinct groups.









[Figure: Plot of fitted model, price in dollars (y-axis, X 100000) vs number of bedrooms (x-axis)]

Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: price _in dollars_
Independent variable: # of bedrooms
-----------------------------------------------------------------------------
                              Standard             T
Parameter       Estimate         Error     Statistic     P-Value
-----------------------------------------------------------------------------
Intercept       558340.0       81200.2       6.87608      0.0000
Slope            66767.6       16998.1       3.92795      0.0002
-----------------------------------------------------------------------------

Analysis of Variance
-----------------------------------------------------------------------------
Source          Sum of Squares    Df   Mean Square   F-Ratio   P-Value
-----------------------------------------------------------------------------
Model                6.5781E11     1     6.5781E11     15.43    0.0002
Residual            4.17824E12    98    4.26351E10
-----------------------------------------------------------------------------
Total (Corr.)       4.83605E12    99

Correlation Coefficient = 0.368812
R-squared = 13.6022 percent
R-squared (adjusted for d.f.) = 12.7206 percent
Standard Error of Est. = 206483.0
Mean absolute error = 168419.0
Durbin-Watson statistic = 2.12485 (P=0.2679)
Lag 1 residual autocorrelation = -0.0745949

• Correlation coefficient: 0.368812

  o The decision point is 0.196, and 0.196 < 0.368812. Since the correlation
    coefficient is bigger than the decision point, we conclude that there is a
    positive linear relationship between the number of bedrooms and price in the
    population, although it is a weak one.
  o Interpretation: Across all houses, houses with more bedrooms tend to have higher
    prices and houses with fewer bedrooms tend to have lower prices.

The Simple Linear Regression Model is the mathematical model (i.e., the equation)
that describes the relationship.
• Least squares estimated regression
  o Estimated price in dollars = 558,340.00 + 66,767.60 * number of bedrooms
    ▪ For each additional bedroom, the estimated price of the house increases by
      $66,767.60.
    ▪ The intercept, $558,340.00, is the estimated price of a house with no
      bedrooms. It has no practical meaning because every house in the sample has
      at least 2 bedrooms; it only positions the line.







The Assumptions

 Linearity

o The coefficient of determination: 13.6022 percent

▪ Interpretation: 13.6022% of the total variation in the price of the house can be explained by the linear relationship between the price of the house and the number of bedrooms.

• Homoscedasticity





Figure: Residual Plot (studentized residual vs. # of bedrooms)

o The graph shows that the data are homoscedastic because the shape of

the graph is an oval. This means that the standard deviations of the

distribution of the price at each number of bedrooms are the same.

This indicates that the R.M.S. error provides a reliable estimate of the

common standard deviation of the distribution to explain the

variability of the price of the house at each number of bedrooms.










 Normality





Figure: Normal Probability Plot (percentage vs. RESIDUALS)

o The distribution of the residuals is bell-shaped because the normal probability plot looks like a straight line.

o Estimated price for a house with 3 bedrooms: $758,642.80

▪ R.M.S. error: 206483

▪ 758642 ± 2(206483) = (345676, 1171608)

▪ Interpretation: We are approximately 95% confident that the price of a house with 3 bedrooms is between $345,676 and $1,171,608.
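The interval above is the estimate plus or minus two R.M.S. errors. A minimal sketch of that arithmetic follows; the helper name is made up for illustration, and the multiplier of 2 is the rule of thumb used in this report rather than an exact t-based prediction interval.

```python
def approximate_interval(estimate: float, rms_error: float, k: float = 2.0):
    """Return (low, high) = estimate -/+ k * rms_error."""
    return estimate - k * rms_error, estimate + k * rms_error


# Estimated price for 3 bedrooms (truncated to whole dollars, as in the report)
# and the R.M.S. error from the regression output.
low, high = approximate_interval(758_642, 206_483)
print(low, high)
```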










The Relationship between Age and Price





Figure: Plot of price (in dollars) vs. age range (0-90 years old)









Figure: Box-and-Whisker Plot (age range from 0 to 90 years old)

From the scatter plot we can deduce that as the age of the house increases, the price tends to decrease. The form of the graph tells us that as the age of the house changes, the price does not change proportionately at every value. The closeness of the data to a line represents the strength of the relationship: as the age of the house increases, the price of the house shows a mild proportional decrease. The Box-and-Whisker Plot shows there are no extremely high or extremely low values of age in our study. The scatter plot exhibits two distinct groups.









Figure: Plot of Fitted Model (price in dollars vs. age range, 0-90 years old)

Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: price _in dollars_
Independent variable: age range _0_90_ years old_
-----------------------------------------------------------------------------
                                Standard            T
Parameter        Estimate         Error         Statistic       P-Value
-----------------------------------------------------------------------------
Intercept       1.00912E6        38337.4         26.3222         0.0000
Slope            -4182.14        992.976        -4.21172         0.0001
-----------------------------------------------------------------------------

Analysis of Variance
-----------------------------------------------------------------------------
Source             Sum of Squares     Df    Mean Square    F-Ratio    P-Value
-----------------------------------------------------------------------------
Model                  7.91556E11      1     7.91556E11      17.74     0.0001
Residual               3.21288E12     72     4.46233E10
-----------------------------------------------------------------------------
Total (Corr.)          4.00443E12     73

Correlation Coefficient = -0.444601
R-squared = 19.767 percent
R-squared (adjusted for d.f.) = 18.6526 percent
Standard Error of Est. = 211242.0
Mean absolute error = 173166.0
Durbin-Watson statistic = 2.17166 (P=0.2351)
Lag 1 residual autocorrelation = -0.0979682
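The T statistics in the regression output are each estimate divided by its standard error, and in a regression with one independent variable the F-Ratio equals the square of the slope's T statistic. The sketch below checks both against the printed figures; the variable names are illustrative.

```python
# Estimates and standard errors copied from the price vs. age regression output.
intercept, intercept_se = 1.00912e6, 38337.4
slope, slope_se = -4182.14, 992.976

t_intercept = intercept / intercept_se
t_slope = slope / slope_se
f_from_t = t_slope ** 2      # equals the ANOVA F-Ratio when there is one predictor

print(f"T (intercept) = {t_intercept:.3f}")
print(f"T (slope)     = {t_slope:.3f}")
print(f"F from T      = {f_from_t:.2f}")
```

Small differences in the last digit come from the rounding of the printed estimates.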









• Correlation coefficient: -0.444601

o The decision points: +/- 0.196; therefore, -0.444601 < -0.196

Since the correlation coefficient is smaller than the lower decision point, which is -0.196, there is evidence of a negative linear relationship in the population.

o Interpretation: For all the houses, as the age of the house increases or decreases, the price of the house tends to decrease or increase, respectively.





The Simple Linear Regression Model is the mathematical model to explain the

relationship (i.e., mathematical equation)

• Least squares estimated regression:

o Estimated price in dollars = 1009120 – 4182.14*the age of the house

o Interpretation:

▪ With each additional year of age, the estimated price of the house decreases by $4182.14.

▪ If the house is zero years old, then the estimated price of the house is $1,009,120.





The assumptions

 Linearity

o The coefficient of determination: 19.767 percent

 Interpretation: 19.767% of the total variation in the price of the

house can be explained by the linear relationship between the price

of the house and the age of the house



• Homoscedasticity



Figure: Residual Plot (studentized residual vs. age range, 0-90 years old)





o The graph shows that the data are not homoscedastic because the shape of the graph is not oval-shaped or random. This means that the standard deviation of the distribution of y-values at each x-value is not the same. This indicates that the R.M.S. error does not provide a reliable estimate of the common standard deviation of the distribution to explain the variability of the price of the house at different ages.

 Normality





Figure: Normal Probability Plot (percentage vs. RESIDUALSage)

o The distribution of the residuals is not bell-shaped because the normal probability plot does not look like a straight line.

o Estimated price for a house 3 years of age: $996,573.58

R.M.S. error: 211242

996573 ± 2(211242) = (574089, 1419057)

Interpretation: Since the residuals are not normally distributed, we cannot say with 95% confidence that the price of a house 3 years of age is between $574,089 and $1,419,057.

o This is double confirmation that we should not use the 95% prediction interval above: normality is violated, and in addition the R.M.S. error is not an appropriate estimate because the standard deviations differ.

The Relationship between the House Size and Lot Size





Figure: Plot of lot size (sqft) vs. house size (sqft)









Figure: Box-and-Whisker Plot (house size in sqft)

From the scatter plot we can deduce that as the house size changes, the lot size also changes. The form of the graph tells us that as the house size changes, the lot size changes proportionately at every value. The closeness of the data to a line represents the strength of the relationship: as the house size increases, the lot size shows a mild proportional increase. The Box-and-Whisker Plot shows there are no extremely high or extremely low house sizes in our study. The scatter plot exhibits two distinct groups.









Figure: Plot of Fitted Model (lot size in sqft vs. house size in sqft)

Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: lot size _sqft_
Independent variable: RESIDUALSHsLs
-----------------------------------------------------------------------------
                                Standard            T
Parameter        Estimate         Error         Statistic       P-Value
-----------------------------------------------------------------------------
Intercept         6596.43        95.6433         68.9692         0.0000
Slope                 1.0      0.0405307         24.6726         0.0000
-----------------------------------------------------------------------------

Analysis of Variance
-----------------------------------------------------------------------------
Source             Sum of Squares     Df    Mean Square    F-Ratio    P-Value
-----------------------------------------------------------------------------
Model                   5.56853E8      1      5.56853E8     608.74     0.0000
Residual                8.96468E7     98       914763.0
-----------------------------------------------------------------------------
Total (Corr.)             6.465E8     99

Correlation Coefficient = 0.928081
R-squared = 86.1335 percent
R-squared (adjusted for d.f.) = 85.992 percent
Standard Error of Est. = 956.433
Mean absolute error = 788.861
Durbin-Watson statistic = 2.13275 (P=0.2539)
Lag 1 residual autocorrelation = -0.0928307





• Correlation coefficient: 0.928081

o The decision points: 0.196 and -0.196; therefore, 0.196 < 0.928081

Because the correlation coefficient is larger than 0.196, there is evidence of a positive linear relationship in the population.

o Interpretation: For all the houses, as the house size increases or decreases, the lot size tends to increase or decrease with it.





The Simple Linear Regression Model is the mathematical model to explain the

relationship (i.e., mathematical equation)

• Least squares estimated regression

o Estimated lot size in square feet = 6596.43 + 1.0*house size in square feet

▪ If a house increases its size by one square foot, the estimated lot size increases by one square foot.

▪ The intercept has no physical meaning because the size of a house cannot be zero. It only positions the line.
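One caution about the output above: it lists RESIDUALSHsLs, not house size, as the independent variable. If that column holds residuals saved from an earlier fit of lot size on house size (an assumption; the report does not say), then least squares always returns a slope of exactly 1 and an intercept equal to the mean of the dependent variable, whatever the data are. The sketch below demonstrates this property with made-up numbers; `fit_line` and the simulated data are illustrative only and are not the houses in the sample.

```python
import random


def fit_line(x, y):
    """Return (intercept, slope, r_squared) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy ** 2 / (sxx * syy)


random.seed(0)
# Simulated data for illustration only.
house = [random.uniform(1000, 5000) for _ in range(100)]
lot = [4000 + 0.8 * h + random.gauss(0, 2000) for h in house]

# Step 1: ordinary fit of lot size on house size, keeping the residuals.
a, b, r2 = fit_line(house, lot)
residuals = [yi - (a + b * xi) for xi, yi in zip(house, lot)]

# Step 2: fit lot size on its own residuals.
a_res, b_res, r2_res = fit_line(residuals, lot)
mean_lot = sum(lot) / len(lot)

print(f"slope on residuals     = {b_res:.6f}")
print(f"intercept vs mean(lot) = {a_res:.2f} vs {mean_lot:.2f}")
print(f"R-squared: original {r2:.4f}, on residuals {r2_res:.4f}")
```

In this situation the second R-squared equals one minus the first, so a high R-squared in the second fit would indicate a weak relationship in the first.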










The Assumptions

 Linearity

o The coefficient of determination: 86.1335 percent

▪ Interpretation: 86.1335% of the total variation in the lot size can be explained by the linear relationship between the lot size and the house size.

• Homoscedasticity





Figure: Residual Plot (studentized residual vs. house size in sqft)



o The graph shows that the data are homoscedastic because the graph is

oval-shaped. This means that the standard deviations of the

distribution of the lot size at each certain house size are the same. This

indicates that the R.M.S. error provides a reliable estimate of the

common standard deviation of the distribution to explain the

variability of the lot size at each house size.










 Normality





Figure: Normal Probability Plot (percentage vs. RESIDUALS Lot and House Size)

o The distribution of the residuals is not bell-shaped because the normal probability plot does not look like a straight line.

o Estimated lot size: 6599.43 square feet

▪ R.M.S. error: 956.433

▪ 6599.43 ± 2(956.433) = (4686.564, 8512.296)

Interpretation: Since the residuals are not normally distributed, we cannot say with 95% confidence that a house of 3 thousand square feet will have a lot size between 4686.564 square feet and 8512.296 square feet.

The Relationship between House Size and Number of Bedrooms



Figure: Plot of house size (sqft) vs. # of bedrooms







Figure: Box-and-Whisker Plot (number of bedrooms)

From the scatter plot we can deduce that as the number of bedrooms increases, the house size also increases. The form of the graph tells us that as the number of bedrooms changes, the house size also changes proportionally at every value. The closeness of the data to a line represents the strength of the relationship: as the number of bedrooms increases, the house size shows a weak proportional increase. The Box-and-Whisker Plot shows there are two houses with an extremely high number of bedrooms and one house with an extremely low number of bedrooms in our study. The scatter plot does not exhibit any distinct groups.









Figure: Plot of Fitted Model (house size in sqft vs. # of bedrooms)

Regression Analysis - Linear model: Y = a + b*X
-----------------------------------------------------------------------------
Dependent variable: house size _sqft_
Independent variable: # of bedrooms
-----------------------------------------------------------------------------
                                Standard            T
Parameter        Estimate         Error         Statistic       P-Value
-----------------------------------------------------------------------------
Intercept         1277.51        273.996         4.66251         0.0000
Slope             346.418        57.3569          6.0397         0.0000
-----------------------------------------------------------------------------

Analysis of Variance
-----------------------------------------------------------------------------
Source             Sum of Squares     Df    Mean Square    F-Ratio    P-Value
-----------------------------------------------------------------------------
Model                    1.7708E7      1       1.7708E7      36.48     0.0000
Residual                4.75736E7     98       485445.0
-----------------------------------------------------------------------------
Total (Corr.)           6.52816E7     99

Correlation Coefficient = 0.520822
R-squared = 27.1256 percent
R-squared (adjusted for d.f.) = 26.382 percent
Standard Error of Est. = 696.739
Mean absolute error = 590.125
Durbin-Watson statistic = 2.10262 (P=0.3059)
Lag 1 residual autocorrelation = -0.0666658
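The "adjusted for d.f." figure in the output follows from R-squared and the sample size. The sketch below applies the usual adjustment formula; the function name is illustrative, and n = 100 is inferred from the total degrees of freedom (99) in the ANOVA table.

```python
def adjusted_r_squared(r_squared: float, n: int, predictors: int = 1) -> float:
    """Adjust R-squared for the number of predictors: 1 - (1 - R2)(n - 1)/(n - p - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - predictors - 1)


adj = adjusted_r_squared(0.271256, n=100)
print(f"Adjusted R-squared = {100 * adj:.3f} percent")
```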









• Correlation coefficient: 0.520822

o The decision point: 0.196; therefore, 0.196 < 0.520822

Because the correlation coefficient is larger than 0.196, there is evidence of a positive linear relationship in the population.

o Interpretation: For all the houses, as the number of bedrooms increases or decreases, the size of the house tends to increase or decrease with it.





The Simple Linear Regression Model is the mathematical model to explain the

relationship (i.e., mathematical equation)

• Least squares estimated regression

o Estimated house size in square feet = 1277.51 + 346.418*number of bedrooms

▪ For each additional bedroom, the estimated size of the house increases by 346.418 square feet.

▪ If the house has zero bedrooms, then the estimated size of the house is 1277.51 square feet.





The Assumptions

 Linearity

o The coefficient of determination: 27.1256 percent

 Interpretation: 27.1256% of the total variation in the size of the

house can be explained by the linear relationship between the size

of the house and the number of bedrooms.



• Homoscedasticity





Figure: Residual Plot (studentized residual vs. # of bedrooms)



o The graph shows that the data are homoscedastic because the shape of the graph is an oval. This means that the standard deviations of the distribution of the house size at each number of bedrooms are the same. This indicates that the R.M.S. error provides a reliable estimate of the common standard deviation of the distribution to explain the variability of the size of the house at each number of bedrooms.



 Normality





Figure: Normal Probability Plot (percentage vs. RESIDUALShsbe)



o The distribution of the residuals is not bell-shaped because the normal probability plot does not look like a straight line.

o Estimated house size for a house with 3 bedrooms: 1277.51 + 346.418*3 = 2316.764 square feet

▪ R.M.S. error: 696.739

▪ 2316.764 ± 2(696.739) = (923.286, 3710.242)

▪ Interpretation: Since the residuals are not normally distributed, we cannot say with 95% confidence that a house with 3 bedrooms will have a house size between 923.286 and 3710.242 square feet.

Relationship between Features and Age





Figure: Barchart for age range (0-90 years old) by features (frequency of "no" and "yes" in age ranges 1 and 2)

Frequency Table for age range _0_90_ years old__1 by features

                                      Row
              no          yes        Total
        ----------------------------------
1       |       4  |       40  |       44
        |   5.41%  |   54.05%  |   59.46%
        |    7.73  |    36.27  |
        ----------------------------------
2       |       9  |       21  |       30
        |  12.16%  |   28.38%  |   40.54%
        |    5.27  |    24.73  |
        ----------------------------------
Column         13          61          74
Total      17.57%      82.43%     100.00%

Note: Row 1: 0 – 30 years old
      Row 2: 31 – 95 years old
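The third number in each cell of the table is the expected frequency under the null hypothesis of no relationship: row total times column total divided by the grand total. A minimal sketch of that calculation, using only the observed counts from the table (variable names are illustrative):

```python
# Observed counts: rows are age ranges 1 and 2, columns are features "no" and "yes".
observed = [[4, 40],
            [9, 21]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell if features and age range were unrelated.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

for row in expected:
    print([round(e, 2) for e in row])
```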





Null hypothesis: In the population of houses, there is no relationship between features

and age.

Alternative hypothesis: In the population of houses, there is a relationship between

features and age.










Chi-Square Test

------------------------------------------

Chi-Square Df P-Value

------------------------------------------

5.39 1 0.0203

4.04 1 0.0445 (with Yates' correction)

------------------------------------------







Fisher's Exact Test for 2 by 2 Tables

-------------------------------------

One-tailed P-value = 0.0228914

Two-tailed P-value = 0.0294331







Since the degrees of freedom = 1, the decision point is 3.84.

Therefore, since the Chi-Square, 5.39, is larger than the decision point, we have enough evidence to conclude that there is an association between features and age.

Hence, we reject the null hypothesis and accept the alternative hypothesis. For all houses, there is a relationship between age and features: the older the house, the fewer features it has, as we hypothesized.
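Both Chi-Square values in the output can be reproduced from the frequency table. The sketch below sums (observed - expected)^2 / expected over the cells and, for the second value, applies Yates' continuity correction by shrinking each absolute difference by 0.5. The function is an illustration, not the routine Statgraphics uses internally.

```python
def chi_square(observed, yates=False):
    """Pearson chi-square for a table of counts, optionally with Yates' correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, rt in enumerate(row_totals):
        for j, ct in enumerate(col_totals):
            expected = rt * ct / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)   # continuity correction for 2 by 2 tables
            total += diff ** 2 / expected
    return total


features_by_age = [[4, 40], [9, 21]]
plain = chi_square(features_by_age)
corrected = chi_square(features_by_age, yates=True)
print(f"Chi-Square = {plain:.2f}, with Yates' correction = {corrected:.2f}")
```

Both values exceed the decision point of 3.84, so the correction does not change the conclusion here.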










Relationship between Features and House Size





Figure: Barchart for house size (sqft) by features (frequency of "no" and "yes" in house size groups 1 to 3)





Frequency Table for house size _sqft__1 by features

                                      Row
              no          yes        Total
        ----------------------------------
1       |      11  |       29  |       40
        |  11.00%  |   29.00%  |   40.00%
        |    8.00  |    32.00  |
        ----------------------------------
2       |       8  |       23  |       31
        |   8.00%  |   23.00%  |   31.00%
        |    6.20  |    24.80  |
        ----------------------------------
3       |       1  |       28  |       29
        |   1.00%  |   28.00%  |   29.00%
        |    5.80  |    23.20  |
        ----------------------------------
Column         20          80         100
Total      20.00%      80.00%     100.00%

Cell contents:
    Observed frequency
    Percentage of table
    Expected frequency

Note: Row 1: 1000 – 2500 square feet
      Row 2: 2501 – 3400 square feet
      Row 3: 3401 – 5000 square feet





Null hypothesis: In the population of houses, there is no relationship between features

and house size.

Alternative hypothesis: In the population of houses, there is a relationship between

features and house size.





Chi-Square Test

------------------------------------------

Chi-Square Df P-Value

------------------------------------------

7.02 2 0.0298

------------------------------------------





Since the degrees of freedom = 2, the decision point is 5.99.

Therefore, since the Chi-Square, 7.02, is larger than the decision point, we have enough evidence to conclude that there is an association between features and house size.

Hence, we reject the null hypothesis and accept the alternative hypothesis. For all houses, there is a relationship between house size and features: the larger the house, the more features it has, as we hypothesized.
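The Chi-Square value and its P-Value can be checked from the observed and expected frequencies in the table. With 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form exp(-x/2), so no statistics library is needed. The variable names below are illustrative.

```python
import math

# Rows are house size groups 1 to 3; columns are features "no" and "yes".
observed = [[11, 29], [8, 23], [1, 28]]
expected = [[8.00, 32.00], [6.20, 24.80], [5.80, 23.20]]   # as printed in the table

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
p_value = math.exp(-chi2 / 2)    # exact upper-tail probability when df = 2

print(f"Chi-Square = {chi2:.2f}, P-Value = {p_value:.4f}")
```

Most of the statistic comes from group 3, where only 1 house lacks features against an expected 5.80.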










Relationship between Features and Lot size



Figure: Barchart for lot size (sqft) by features (frequency of "no" and "yes" in lot size groups 1 to 3)

Frequency Table for lot size _sqft__1 by features

                                      Row
              no          yes        Total
        ----------------------------------
1       |       7  |       29  |       36
        |   7.00%  |   29.00%  |   36.00%
        |    7.20  |    28.80  |
        ----------------------------------
2       |       8  |       24  |       32
        |   8.00%  |   24.00%  |   32.00%
        |    6.40  |    25.60  |
        ----------------------------------
3       |       5  |       27  |       32
        |   5.00%  |   27.00%  |   32.00%
        |    6.40  |    25.60  |
        ----------------------------------
Column         20          80         100
Total      20.00%      80.00%     100.00%

Note: Row 1: 1500 – 5500 square feet
      Row 2: 5501 – 7500 square feet
      Row 3: 7501 – 15500 square feet





Null hypothesis: In the population of houses, there is no relationship between features

and lot size.

Alternative hypothesis: In the population of houses, there is a relationship between

features and lot size.






Chi-Square Test

------------------------------------------

Chi-Square Df P-Value

------------------------------------------

7.02 2 0.0298

------------------------------------------





Since the degrees of freedom = 2, the decision point is 5.99.

Therefore, since the reported Chi-Square, 7.02, is larger than the decision point, we have enough evidence to conclude that there is an association between features and lot size.

Hence, we reject the null hypothesis and accept the alternative hypothesis. For all houses, there is a relationship between lot size and features: the bigger the lot size, the more features a house has, as we hypothesized.
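Recomputing the statistic from the lot size frequency table is a useful consistency check. The sketch below uses only the observed and expected frequencies printed in that table. With these counts the sum comes to about 0.89, which is below the decision point of 5.99 and does not match the reported 7.02; the reported value and P-Value are identical to those of the house size test, so they may have been carried over from that section (this explanation is a guess, not something the report states).

```python
import math

# Rows are lot size groups 1 to 3; columns are features "no" and "yes".
observed = [[7, 29], [8, 24], [5, 27]]
expected = [[7.20, 28.80], [6.40, 25.60], [6.40, 25.60]]   # as printed in the table

chi2_lot = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
p_lot = math.exp(-chi2_lot / 2)   # exact upper-tail probability when df = 2

DECISION_POINT = 5.99
print(f"Chi-Square = {chi2_lot:.2f}, P-Value = {p_lot:.4f}")
print("Exceeds decision point:", chi2_lot > DECISION_POINT)
```

If the recomputed value is the right one, the frequency table alone would not give enough evidence of an association between features and lot size.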










Relationship between Features and Price





Figure: Barchart for price (in dollars) by features (frequency of "no" and "yes" in price groups 1 to 3)

Frequency Table for price _in dollars__1 by features

                                      Row
              no          yes        Total
        ----------------------------------
1       |      11  |       18  |       29
        |  11.00%  |   18.00%  |   29.00%
        |    5.80  |    23.20  |
        ----------------------------------
2       |       4  |       29  |       33
        |   4.00%  |   29.00%  |   33.00%
        |    6.60  |    26.40  |
        ----------------------------------
3       |       5  |       33  |       38
        |   5.00%  |   33.00%  |   38.00%
        |    7.60  |    30.40  |
        ----------------------------------
Column         20          80         100
Total      20.00%      80.00%     100.00%

Note: Row 1: 200,000 – 700,000 dollars
      Row 2: 700,001 – 900,000 dollars
      Row 3: 900,001 – 1,400,000 dollars





Null hypothesis: In the population of houses, there is no relationship between features

and price.

Alternative hypothesis: In the population of houses, there is a relationship between

features and price.








Chi-Square Test

------------------------------------------

Chi-Square Df P-Value

------------------------------------------

8.22 2 0.0164

------------------------------------------





Since the degrees of freedom = 2, the decision point is 5.99.

Therefore, since the Chi-Square, 8.22, is larger than the decision point, we have enough evidence to conclude that there is an association between features and price.

Hence, we reject the null hypothesis and accept the alternative hypothesis. For all houses, there is a relationship between price and features: the more a house costs, the more features it has, as we hypothesized.
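The chi-square test shows only that an association exists; the direction can be read from the share of houses with features in each price group. The sketch below computes those shares from the observed counts in the frequency table (the dictionary layout is illustrative).

```python
# Price group -> (houses without features, houses with features), from the table.
counts = {
    "200,000 - 700,000": (11, 18),
    "700,001 - 900,000": (4, 29),
    "900,001 - 1,400,000": (5, 33),
}

share_with_features = {
    group: yes / (no + yes) for group, (no, yes) in counts.items()
}

for group, share in share_with_features.items():
    print(f"{group:>22}: {100 * share:.1f}% with features")
```

The share rises sharply from the lowest price group to the other two, while the two higher groups are close to each other.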










Section 5.0 Conclusions

There are many factors that can influence the price of a house around the Vancouver area of Greater Vancouver. In this project we identified five variables that we believe have the biggest influence on price: number of bedrooms, age range, features, house size and lot size. Price is influenced by the variation in the number of bedrooms, age, features, house size and lot size from house to house.

From our analysis, we can see how each variable is related to the price of the house and to the other variables as well. However, not every variable is related to every other variable. In addition, how all these factors are related still depends on the buyers' preferences when they are purchasing a house.

Although we started the project early, we encountered several limitations to our results. Our first limitation is that the houses we sampled are not representative of the houses in the area of Vancouver West, because we took the sample from a single website and only sampled houses that were on the market. We did not sample houses that had already been sold or that are currently lived in.

Our second limitation is that we had limited time, and more detailed sampling would have been costly; therefore, the sample of houses we analyzed is not 100% representative of Vancouver West. This is because each area has its own price range; the exact same house in one area may have a different price in another area.

Our third limitation is that we took our sample from a website. If we rechecked the site on a later day, a number drawn in the Simple Random Sampling might point to a different house, because houses are sold every day and removed from the site, and the number that used to represent a house we sampled may have been reassigned to another house.

The fourth limitation is that we analyzed the factors that influence the price of a house systematically, but buyers' preferences unquestionably contribute to the price of a house as well. For example, some parents deliberately look for a house within the boundaries of a good school for the sake of their children, so they are willing to pay a high price to obtain that convenience and benefit.

Section 6.0 Contributions



Kelvin Cheung

• Bivariate

• House

• Instant noodles


Na Ding

• Bivariate

• Univariate – feature

• Some editing

• Sampling – Search for Houses


Stephanie Gozali

• Bivariate

• Univariate

• Putting the report together (i.e., intro to conclusion)

• SGP work

• Sampling using SRS


Elaine Wong

• Edit the whole report, i.e., grammar, sentence structure, etc.

• Help in SGP

• Bivariate

• Sampling – Search for Houses








