Decomposing the Effect of Supermarket Opening on Fruit and Vegetable
Consumption in Areas of Limited Retail Access:
Evidence from the Seacroft Intervention Study
Improving health, particularly for those whose diet is not nutritionally balanced, is now an
established part of government strategy in the UK and elsewhere. Much focus is given to
communities where deprivation is high and access to retail is low. Over the past
decade supermarkets have entered these predominantly urban and suburban areas, which
were left behind in the original clamour for car-accessible out-of-town sites. Using a
study of one such area in West Yorkshire, we ask whether fruit and vegetable consumption is
increased by such openings. Our conclusions can then be generalised to other similar areas
by policymakers here and abroad who aim to understand the role supermarkets can play in
achieving nutritional goals. Using quantile regression we seek to establish whether those who
consume the least when choice is poor do indeed react the most when presented with wider
healthy eating opportunities. Our evidence suggests supermarkets do not help those at the
lowest end of the consumption distribution as much as the simple regressions of the current
literature might suggest. As such we urge caution in blindly accepting the supermarkets' case
that their entry into poor communities will improve fruit and vegetable consumption therein.
Literature and Background
Following increased concern in the 1990s that supermarkets had become the preserve of
the car user, and that many were then left without access to healthy foodstuffs, strategies to
improve retail access rose up the UK government agenda. Policy Action Team 13 (PAT13) [1]
was charged with looking at this issue and the term “food desert” was coined to describe
communities without an appropriate fruit and vegetable retailer within 500m [2]. Meanwhile, as
out-of-town sites with planning permission for supermarkets were becoming more and more
scarce, a potential solution to both problems was being created. Supermarkets, selling large
ranges of healthy produce, could be built to serve locals and motorists alike. Early
examples were Manchester SportCity and Seacroft Green in Leeds, West Yorkshire, which
were both sites with good road access and far from the nearest other supermarket. Our
focus is on this latter opening as it has been widely studied in the literature, but similar
thinking was occurring in the United States and Australia amongst others [3]. Attention was not
lavished only on the urban situation: rural areas were suffering equally, and while the
distances involved may have been longer the lack of access was still very real [4].
Wrigley et al (2003) forms the main basis of this paper, seeking to explore the impact of the
new Tesco opening on diet in Seacroft. They, like us, draw on data from the pre and post
waves of the Seacroft Intervention Study [5] and compare consumption of fruit and vegetables
by the main survey respondent. Switching to Tesco, being close to the new store and having
consumed a high quantity of fruit and vegetables are all shown to be significant, both in two
sample t tests and in the OLS model presented. Interestingly the effect of pre-consumption is
negative with a positive coefficient on the switch to Tesco of roughly the same magnitude. All
else equal this would suggest a fall in the number of portions of fruit and vegetables eaten
per day amongst those who did not switch to the new store. Big increases are suggested for
nearby residents and those who have switched from stores previously not offering much fruit
and vegetables. They conclude that “Across the 615 respondents who completed both
waves of the survey, mean consumption of fruit and vegetables increased insignificantly
from 2.88 to 2.92 portions per day [but it is suggested that] these aggregate statistics mask a
picture of larger and more subtle changes” (Wrigley et al, 2003, page 169).
[1] Policy Action Teams were set up by the Labour government on election in 1997 but most did not report until
[2] A fuller history of the debate can be found in Wrigley et al (2003), Wrigley et al (2004a), Guy et al (2004) and
O’Neill (2005) amongst others.
[3] Studies in the international context include....
[4] Findlay and Sparks (2008b) and Fitch (2004) look at rural Scotland and the access problems there.
[5] A useful guide to that study is provided both in the opening of Wrigley et al (2003) and Wrigley et al (2002).
To delve into the effects at the various levels of fruit and vegetable consumption we will use
quantile regression, working with four quantiles of post-intervention consumption, to
understand how the various demographic, distance from store, and pre-intervention
variables interact at each level. Our use of quartiles is motivated by the need to ensure that there are no
holes in the data [6], and tests on the model with a greater number of quantiles suggest that four
quartiles are optimal [7]. Quantile regression has the advantage of allowing the impact
of certain variables to vary over the consumption distribution. It has applications in many
areas, including medicine, education and labour economics (Stilles 2007), finance (Lyons et al 2007),
traffic analysis (Wu et al 2006, Hewson 2008) and many others. All of these papers explain
why fitting average values for coefficients might fail to explain important effects at the ends
of the distribution of the type we are interested in.
Gustavsen and Ricketson (2006) model demand for vegetables using Norwegian data on
prices for substitutes and complements, regional dummies, settlement dummies, seasonal
dummies and household composition dummies. Inclusion of prices gives this dataset a
useful ability to predict the way households may react to changes in the relative price of
vegetables, something which our data does not offer. However there is no account taken of
the distances that consumers travel and so no spatial effects can be captured. It is
interesting to note that prices are not significant at the lowest quartile. For those who
consume least it is the regional effects and household composition variables that have
strongest impact. Prices become increasingly significant as you move up the distribution but
the strength of the effect goes down. The impact of children is positive in all but the lowest
quantile. This will be reversed in what we see for the Seacroft data.
We use the 600 observations for which postcode information is available in both the first and
second waves of the Seacroft Intervention Study (Wrigley et al 2004b) to capture how fruit
and vegetable consumption has changed following the opening of the Tesco store.
[6] This is necessary because some combinations of important variables of influence are not observed at all post
intervention quartiles when those quartiles are small.
[7] We consider here only symmetric quantiles because of the danger of missing data in smaller groupings. In a
different data set, or for a smaller number of explanatory variables, it may be interesting to look at asymmetric
groupings, potentially motivated around the patterns observed in the distributions that follow.
Figure 1 Density Plot of Pre-Intervention Consumption and Post-Intervention Consumption. Source:
Wrigley et al (2004b) and Own Calculations.
Overall changes in pre (dashed line) and post intervention (solid line) fruit and vegetable
consumption can be seen, in Figure 1, to be small, in line with Wrigley et al (2003). What is
noticeable is that there are differences, however small, around the mean and near the top
end. This suggests that different effects are indeed happening in different parts of the
distribution [8]. Respondents in this survey are naturally split into those who switched to Tesco
and those who did not. The next plots show the density of before and after consumption
amongst switchers and non switchers [9]. These plots again both depict the before densities as
dashed lines and the after densities as solid lines. While there does seem to be a genuine
increase in consumption amongst those who did switch we also see pronounced differences
in the did not switch plot. Around the three portions a day mark there is a clear increase
amongst the switchers, and similarly near 6 portions per day. The biggest increases in the non
switchers come higher up the distribution at 6 portions per day.
[8] Some outlier behaviour is suggested towards the right of the plot, and there are two points truncated
from the diagram for clarity. Following Wrigley et al (2003) these outliers are kept within the data set.
[9] Again a band width of 0.4 is used.
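The density plots discussed above can be illustrated with a short sketch of a kernel density estimate. The bandwidth of 0.4 is taken from the footnote; the Gaussian kernel, the function name `gaussian_kde` and the sample values are assumptions made purely for illustration and are not the study data.

```python
import math

def gaussian_kde(sample, bandwidth=0.4):
    """Return a function that estimates the density of `sample`.

    Assumes a Gaussian kernel whose standard deviation is `bandwidth`.
    """
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return density

# Hypothetical portions-per-day values (NOT the Seacroft data).
pre = [1.3, 2.0, 2.3, 2.6, 3.1, 3.6, 4.4, 6.0]
post = [1.5, 2.1, 2.3, 2.9, 3.0, 3.7, 4.9, 6.1]

pre_density = gaussian_kde(pre)
post_density = gaussian_kde(post)

# Evaluation grid from -3 to 12 portions per day in steps of 0.05.
grid = [i * 0.05 for i in range(-60, 241)]

for x in (1.0, 3.0, 6.0):
    print(f"{x:.1f} portions: pre={pre_density(x):.3f} post={post_density(x):.3f}")
```

Comparing the two curves over the grid shows where the before and after densities diverge, which is the reading applied to Figures 1 and 2.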
Figure 2: Density plot of pre-intervention consumption and post-intervention consumption split by
whether or not the respondent switched to use the new Tesco store. Source: Wrigley et al (2004b)
and Own Calculations.
Our model is distinguished by its use of the MasterMap ITN layer for the calculation of
distances between respondents and stores. This gives us the exact distance from the store
to an individual's code point location [10] via the road network.
Map 2.1 Distance Quartiles from Tesco Seacroft. Source: Wrigley et al (2004b) and Own
Calculations.
[10] Map co-ordinates for the full postcode of a respondent's address. We do not have any more
accurate details about where precisely a household is located within their street.
Table 1 summarises the data for the observations for which location information is available.
The 25th, 50th and 75th percentiles are included, as well as the mean, for the distance to Tesco and
the pre- and post-intervention portions of fruit and vegetables consumed per day. It is worth
noting that there is an increase in the average consumption for those who did switch but not
amongst those who did not. In both cases the median values stay the same. Further, for the
sample as a whole the quartiles of before and after consumption are in fact identical. We can see for the distance
variables that those who switched are nearer on average. Testing for identical average
consumption by switchers and non switchers using a two-sample t test reveals that there is a
difference that is significant at 1%.
                              Switched to Tesco    Did not Switch       Total
                              Count     %          Count     %          Count     %
                      1       120       44.6       128       42.3       248       41.2
                      0       149       55.4       203       57.7       352       58.8
Children Decision     1       127       47.2       137       45.7       264       34.1
                      0       142       52.8       194       54.3       336       55.9
Distance to Tesco     Mean    1502m                1650m                1583m
                      25%     1132m                1306m                1221m
                      50%     1409m                1636m                1539m
                      75%     1927m                2073m                1985m
Pre-Intervention      Mean    2.66                 3.08                 2.89
Portions per day      25%     1.29                 1.86                 1.57
                      50%     2.29                 2.57                 2.43
                      75%     3.57                 3.93                 3.71
Post-Intervention     Mean    2.90                 2.95                 2.93
Portions per day      25%     1.46                 1.71                 1.57
                      50%     2.29                 2.57                 2.43
                      75%     3.71                 3.71                 3.71
Table 1 Summary of Data Used in Modelling.
Source: Own Calculations on Wrigley et al (2004b)
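The two-sample comparison of switchers and non switchers mentioned above can be sketched as follows. This is a minimal illustration only: it assumes the unequal-variance (Welch) form of the test, approximates the p-value with the normal distribution (reasonable only for samples as large as those in Table 1), and uses hypothetical values rather than the study data. The function name `welch_t` is an assumption of this sketch.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic, its degrees of freedom, and a
    two-sided p-value from a large-sample normal approximation."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, df, p

# Hypothetical portions per day (NOT the Seacroft data).
switchers = [1.3, 2.3, 2.9, 3.6, 4.1, 5.0]
non_switchers = [1.9, 2.6, 3.1, 3.9, 4.4, 6.2]

t_stat, dof, p_value = welch_t(switchers, non_switchers)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, approx p = {p_value:.3f}")
```

With small samples the normal approximation overstates significance, so an exact t distribution would be preferable in practice.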
Following Wrigley et al (2003) we might expect that there would be a stronger effect nearer
to the new outlet, and that this would diminish as we move further from Tesco Seacroft.
Table 2 shows that the only significant change, and then only at the 10% level, is the 0.46 portions per day
increase shown by all respondents in the lowest distance quartile.
Distance Quartile                        1           2          3          4
Switched to Tesco       Before           2.673       2.470      2.664      2.881
                        After            3.258       2.552      2.548      3.158
                        Difference       0.585       0.082      -0.116     0.277
                        p value          0.1653      0.7687     0.7705     0.3794
Did not switch          Before           2.980       2.833      2.663      3.755
to Tesco                After            3.270       2.745      2.449      3.373
                        Difference       0.290       -0.088     -0.214     -0.382
                        p value          0.3709      0.7413     0.3798     0.3191
Overall                 Before           2.803       2.672      2.663      3.434
                        After            3.263       2.660      2.490      3.294
                        Difference       0.460       -0.012     -0.173     -0.140
                        p value          0.09187*    0.9485     0.4216     0.6043
t-test Switch v Non Switch (p value)     0.9784      0.4328     0.7669     0.5207
Table 2 - Changes in average fruit and vegetable consumption in portions per day between
the before and after waves of the Seacroft Intervention Study broken down by distance to
Tesco Seacroft quartiles. P values calculated for joint sample t-tests. * Significant at 10%
Source: Own Calculations
In this Chapter I use a quantile regression process to explain after
intervention consumption at the 25th, 50th and 75th percentiles. Following
Koenker and Bassett (1978) the quantile regression is calculated by solving:

$$\min_{\beta \in \mathbb{R}^K} \left[ \sum_{t:\, y_t \ge x_t'\beta} \tau \left| y_t - x_t'\beta \right| + \sum_{t:\, y_t < x_t'\beta} (1 - \tau) \left| y_t - x_t'\beta \right| \right]$$

where $y_t$ is the dependent variable, $x_t$ is a K by 1 vector of explanatory variables, $\beta$ is a
vector of coefficients and $\tau$ is the regression quantile. All estimation is done using R and the
package quantreg. We begin by including a large set of explanatory variables and then
systematically remove the least significant variables until we are left with the final model.
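To make the objective function concrete, the following sketch evaluates the asymmetric loss for an intercept-only model and shows that its minimiser is a sample quantile. It is an illustration in Python rather than the R and quantreg code used for the estimation, and the data values and function names are hypothetical.

```python
def check_loss(residual, tau):
    """Asymmetric absolute loss: tau*|u| when u >= 0, (1 - tau)*|u| when u < 0."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def objective(y, fitted, tau):
    """Sum of check losses over all observations."""
    return sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))

def best_constant(y, tau):
    """Minimise the objective for an intercept-only model.

    The minimum is always attained at one of the observed values, so it is
    enough to try each of them in turn.
    """
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: objective(y, [c] * len(y), tau))

# Hypothetical portions per day, including one large outlier.
y = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 9.0]

for tau in (0.25, 0.5, 0.75):
    print(f"tau = {tau}: fitted constant = {best_constant(y, tau)}")
```

Because the loss is absolute rather than squared, the outlier at 9.0 does not pull the fitted values upwards in the way it would pull the mean.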
2.4.1 Regression Results
Variable τ = 0.25 τ = 0.5 τ = 0.75 OLS
Intercept -3.12457 1.05788 -0.36740 -0.20170
(1.79915)* (0.88901) (1.93840) (2.025)
Dist1 -0.00028 0.00045 0.00089 0.00002
(0.00034) (0.00029) (0.00036)** (0.00054)
Dist3 0.00183 -0.00016 0.00077 -0.00049
(0.00096)* (0.00050) (0.00111) (0.00111)
Dist4 0.00011 -0.00021 -0.00096 -0.00002
(0.00027) (0.00025) (0.00048)** (0.00064)
D025 3.96993 -0.18679 1.19783 1.475
(1.81707)** (0.89876) (1.92738) (2.071)
D2550 3.52383 0.00494 1.84989 1.015
(1.79555)* (0.87984) (1.91653) (2.003)
D75100 3.37113 0.45264 3.82988 1.151
(1.89446)* (1.03103) (2.21525)* (2.467)
Preq1 0.67884 0.56026 0.71312 0.5857
(0.16577)*** (0.19671)*** (0.24246)*** (0.3383)*
Preq2 0.42430 1.17227 2.05287 1.083
(0.34901) (0.29251)*** (0.49142)*** (0.5521)*
Preq3 0.63689 0.64642 0.74827 0.6590
(0.07607)*** (0.07148)*** (0.09509)*** (0.1269)***
Preq4 0.09475 0.68362 0.94480 0.5502
(0.13753) (0.20870)*** (0.04771)*** (0.06306)***
F2550 0.42708 -1.22645 -2.88148 -0.9533
(0.75135) (0.56621)** (1.05537)*** (1.220)
F75100 2.63641 -0.00085 -0.71474 0.9858
(0.72595)*** (0.93059) (0.47285) (0.5264)*
D_Car -0.13415 -0.14439 -0.09373 0.03261
(0.09817) (0.07371)* (0.11320) (0.1451)
Child -0.15156 -0.26952 -0.20872 -0.1288
(0.09028)* (0.07370)*** (0.11650)* (0.1452)
Switch 0.00036 0.15556 0.27601 0.1936
(0.99680) (0.08001)* (0.10895)** (0.1432)
Table 3 - Model Coefficients from quantile and OLS regressions. Figures in parentheses are
standard errors. Significance levels are denoted by * = 10%, ** = 5% and *** = 1%. Source: R
calculations on Wrigley et al (2004b)
Table 3 provides the results of the quantile regression model that results from following the
process described in section 3. In the final column I provide parameter estimates from a
linear model fitted using OLS with the same variables. The variables that are significant in
our quantile regression are the distances to Tesco for the first, third and fourth quartiles
(Dist1, Dist3, Dist4), together with the dummies for being resident in the first, second and
fourth distance quartiles (D025, D2550, D75100). Pre-intervention consumption is significant
as a slope dummy in all four of its quartiles (Preq1 to Preq4), and as an intercept dummy in quartiles two and
four (F2550, F75100). Access to a car (D_Car), being influenced about where to shop
by children (Child) and switching to the new Tesco store (Switch) are also all significant in at
least one quantile.
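The slope and intercept dummies named above can be illustrated with a short sketch. It assumes that a slope dummy equals the underlying variable inside its own quartile and zero elsewhere, and that quartile membership is decided by the cut points for the full sample in Table 1; the treatment of values exactly on a boundary, and the function names, are assumptions of this sketch rather than a description of the original coding.

```python
# Quartile cut points for the full sample, taken from Table 1.
DIST_CUTS = (1221, 1539, 1985)   # metres to Tesco
PRE_CUTS = (1.57, 2.43, 3.71)    # pre-intervention portions per day

def quartile_index(value, cuts):
    """Return 1 to 4 according to the quartile `value` falls in.

    Assumption: a value equal to a cut point belongs to the lower quartile.
    """
    return 1 + sum(value > c for c in cuts)

def design_row(distance_m, pre_portions):
    """Build the distance and pre-intervention regressors for one respondent."""
    dq = quartile_index(distance_m, DIST_CUTS)
    pq = quartile_index(pre_portions, PRE_CUTS)
    row = {
        # Intercept dummies for distance quartiles 1, 2 and 4.
        "D025": int(dq == 1), "D2550": int(dq == 2), "D75100": int(dq == 4),
        # Intercept dummies for pre-intervention quartiles 2 and 4.
        "F2550": int(pq == 2), "F75100": int(pq == 4),
    }
    # Slope dummies: the variable itself within its quartile, zero otherwise.
    for k in (1, 3, 4):
        row[f"Dist{k}"] = distance_m if dq == k else 0.0
    for k in (1, 2, 3, 4):
        row[f"Preq{k}"] = pre_portions if pq == k else 0.0
    return row

# A respondent 1018m from Tesco who ate 4.714 portions per day beforehand.
print(design_row(1018, 4.714))
```

Under this coding each respondent contributes to exactly one distance slope at most and exactly one pre-intervention slope, which is what allows the effect of each variable to differ by quartile.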
We can see that different coefficients are significant in different equations. In our OLS
model the only significant variables are the four slope dummies from the before consumption
levels and the intercept dummy of being in the highest pre-intervention consumption quartile.
Implicitly this means that there is very little change, on average, between the two periods.
Reinforcing this is the fact that no other variables are significant. From the density plots in
Figure 1 this was always likely to be the case.
However, we do find that there are many other factors that influence the regression
differently across the quantiles. At the bottom end of the after consumption range distance
travelled is significant as a slope dummy in the third distance quartile (Dist3) and as a
dummy for the three quartiles featured. Interestingly consumption is increasing in distance
across Dist3, rather than declining as might be suggested by theory. When a household in
this quartile has their shopping decision influenced by children we see a fall in the portions of
fruit and vegetables eaten per day in a reversal of Gustavsen and Ricketson (2006). Access
to cars and switching to Tesco do not seem to have had an effect on the lowest quartile.
In the mid range distance slopes are not significant and nor are the distance intercept
dummies, suggesting much less of a spatial interaction here. What we do see though is that
all of the pre-intervention quartiles are significant, as is the F2550 intercept dummy. In the
mid range there is a negative effect of not having access to a car on portions eaten per day.
While this could be expected it is surprising that this is the only level at which we see this effect.
Children once more exert a negative influence. Switching to Tesco has a positive effect,
raising consumption by about 1 portion per week [11]. When we recall that the median
consumption level is lower than the recommended 5 per day, this increase is in line with
bringing improved diet to the area. For this quantile the intercept is not significant.
At the high end, distance again plays an important part. The slope dummies for the first and
fourth distance quartiles are significant. We also see a positive effect from living in the areas
furthest from Tesco. In the absence of any other information it is difficult to pinpoint why
exactly this should be the case. All of the slope dummies are significant, pointing to the
expected tight relationship between before and after consumption. Of the intercept pre-
intervention consumption dummies only the fourth quartile, F75100 is significant at the 10%
level. When children influence the decision about where to shop there is a negative impact
on portions purchased, which is the same as at the other quantiles. Switching to Tesco has a
positive effect, increasing consumption by about 0.28 portions per day. Individuals who consume
more fruit and vegetables than average anyway are likely to be those who will look to
increase their consumption when the product range is increased. Hence switching to Tesco
gives them more opportunity to buy fruit and vegetables and so increase their consumption.
2.4.2 Differences Across the Quartiles?
Whilst the quantile regression coefficients appear to differ across quantiles it is important to
confirm that these differences are significant. If quantile regression is to be the best way of
modelling the impact of Tesco then we need the coefficients to be significantly different
across the quantiles. We can show this using graphical representations or tests for common
coefficients. Figure 2 shows the sixteen variables in the regression and how their parameters
vary across the quantile range. On these plots the OLS linear model parameters are shown
as a solid (red) horizontal line. 90% confidence intervals for the linear model and quantile
coefficients are drawn, the former as horizontal (red) dashed lines and the latter as shaded
areas. Where coefficients are significantly different we look for the quantile regression values
coming out of the confidence intervals from the OLS linear model. This can be seen to be
the case for Dist1, Preq4 and F75100. We can formally test for a variable having different
coefficients across the quantiles and the results of this test are reported in Table 4.
[11] An increase of 0.156 portions per day, equivalently about 1 portion per week.
Figure 2 – Plots of coefficients from OLS and quantile regressions. Confidence intervals
illustrated at 90%. Source: R calculations on Wrigley et al (2004b)
In the area near Tesco it is seen that there is a significantly different effect of distance in the
three quantiles. The furthest distance quartile also reports a significant p-value as a slope
dummy suggesting differing effects at the top end of the distance spectrum as well. Of the
intercept dummies these two again come out as significantly different, as does the upper
middle distance quantile. Interestingly the pre-consumption slope dummies are significant at
25-50% and 75-100%, and again the intercept dummies follow suit. It is no surprise that this
happens given the strong link between before and after consumption levels and the way that
makes each level impact most on its own after quantile. Neither access to a car nor the
impact of children reports differential effects across quantiles.
Our variable of interest is the switch to Tesco. In Wrigley et al (2003) it was found that there
was a significant effect of switching to use the new Tesco store, but with no attempt to
separate that out across the consumption distribution. Here we see a stronger effect
amongst those who consumed more fruit and vegetables than there is at the bottom end.
With the test of varying coefficients returning a p-value of 0.056 (Table 4), this difference is
significant at the 10% level, indicating a different effect on those who consume most. With supermarket
intervention designed to help those eating the least number of portions of fruit and
vegetables this result has to be a concern. If people are only increasing their consumption if
they previously ate more healthily anyway, then the policy of permitting supermarkets to
build in such areas is not having its desired effect. Our model suggests there is a need to
look again at the way in which the supermarket impacts.
Parameter F Statistic P Value
Dist1 4.1604 0.015753**
Dist3 3.4446 0.32129
Dist4 2.6588 0.070306*
D025 3.9396 0.019624**
D2550 3.3094 0.036760**
D75100 3.3152 0.036549**
Preq1 0.3922 0.675620
Preq2 5.5388 0.003999***
Preq3 0.8063 0.446654
Preq4 22.7771 1.704e-10***
F2550 5.1675 0.005784***
F75100 11.4661 1.127e-5***
D_Car 0.1485 0.862004
Child 1.3039 0.271719
Switch 2.8807 0.056355*
Table 4 - F Test on equality of coefficients across quantiles. Significance at
* = 10%, ** = 5%, *** = 1%.
2.5 Model Prediction
To get a better feel for what the model is telling us we can have a look at the way in which
the impact of our variables affects the predicted levels of post-intervention fruit and
vegetable consumption. Some of the most theoretically important variables in the regression
are dummies and so it is straightforward to extract their effect. Similarly pre-intervention
consumption and post-intervention consumption are quite similar so there are fewer
interesting relationships to be extracted from the slope and intercept dummies on this topic.
What I am interested in, however, is the way in which distance plays a part, and specifically
whether there are strong spatial effects at the three post-intervention quantiles.
2.5.1 Indications from the Base Model
Analysis of a set of predicted consumptions, generated by the coefficients of Table 3 and
zero standard errors [12], is presented in Figure 3. To generate these predictions we use the
median distance from Tesco in each of the distance quartiles [13] and median values from
within each of the pre-intervention consumption quartiles [14].
[12] This follows the Koenker (2007) guide to using R for the estimation of quantile regression.
[13] These are 1018m, 1386m, 1836m and 2192m for quartiles one to four respectively.
[14] These are 1.142, 2, 3 and 4.714 portions per day for quartiles one to four respectively.
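A rough sketch of how point predictions of this kind might be assembled from Table 3 is given below, using the τ = 0.5 coefficients and the quartile medians reported in the footnotes. The additive form of the predictor, the way dummies are switched on for each distance quartile, the default values for the household characteristics, and all of the Python names are assumptions of this illustration; it is not the R code used to draw Figure 3.

```python
# Coefficients at tau = 0.5, transcribed from Table 3.
COEF = {
    "Intercept": 1.05788,
    "Dist1": 0.00045, "Dist3": -0.00016, "Dist4": -0.00021,
    "D025": -0.18679, "D2550": 0.00494, "D75100": 0.45264,
    "Preq4": 0.68362, "F75100": -0.00085,
    "D_Car": -0.14439, "Child": -0.26952, "Switch": 0.15556,
}

# Median distance (metres) within each distance quartile, and the median
# pre-intervention consumption within the fourth consumption quartile.
MEDIAN_DISTANCE_M = {1: 1018, 2: 1386, 3: 1836, 4: 2192}
MEDIAN_PRE_Q4 = 4.714

# Which slope and intercept dummies apply in each distance quartile
# (assumed: no slope term for quartile 2, no intercept dummy for quartile 3).
SLOPE = {1: "Dist1", 3: "Dist3", 4: "Dist4"}
INTERCEPT_DUMMY = {1: "D025", 2: "D2550", 4: "D75100"}

def predict(dist_quartile, switch, d_car=0, child=0):
    """Predicted median post-intervention portions per day for a household
    in the highest pre-intervention consumption quartile."""
    value = COEF["Intercept"] + COEF["F75100"] + COEF["Preq4"] * MEDIAN_PRE_Q4
    if dist_quartile in INTERCEPT_DUMMY:
        value += COEF[INTERCEPT_DUMMY[dist_quartile]]
    if dist_quartile in SLOPE:
        value += COEF[SLOPE[dist_quartile]] * MEDIAN_DISTANCE_M[dist_quartile]
    value += COEF["D_Car"] * d_car + COEF["Child"] * child + COEF["Switch"] * switch
    return value

for q in (1, 2, 3, 4):
    non_switcher = predict(q, switch=0)
    switcher = predict(q, switch=1)
    print(f"distance quartile {q}: non switcher {non_switcher:.2f}, switcher {switcher:.2f}")
```

Whatever the distance quartile, the gap between the two predictions in this sketch is simply the Switch coefficient, since switching enters only as an intercept shift.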
Figure 3 – Predicted fruit and vegetable consumption for households in the highest
pre-intervention consumption quartile
On the graph the three values of τ are plotted for switchers and non switchers who were
initially in the fourth quartile of pre-consumption. At the lower end the diamonds are
switchers and the circles non switchers and there is very little difference between the two.
Recall switching was not statistically significant for τ = 0.25. In the middle range, τ = 0.5,
the switchers are plotted as squares while the non switchers are upward pointing triangles.
At this middle range there is now a notable difference between the two groups; from Table
3 this is 0.16 portions per day. At the top end, τ = 0.75, downward facing triangles chart the
non switchers while solid dots denote the switchers. Here the differential between the two
sets of points is 0.28 portions per day [15].
However, while the conclusions are neat, figures like this mask the true picture of the
coefficients as there are standard errors to consider. Figure 4 plots the effect of distance for
the three quantile levels as continuous functions of distance from Tesco under the
assumption that all households switch to the new store, have a car and are influenced by
children in where they shop [16]. We keep all insignificant variables in the regression at this
time. To illustrate the relationship with Figure 3 I show the τ = 0.25 prediction for the first
distance quartile as a hollow dot. All of the solid lines will pass through the predictions that
were illustrated in Figure 3. I also illustrate 95% confidence intervals as dashed lines. Black
represents τ = 0.25, red τ = 0.5 and green τ = 0.75. Vertical dotted lines denote the
different quartile boundaries. Generally confidence intervals are wide and there is a lot of
overlap between these intervals, suggesting that the differences between coefficients may not be as
significant as might be hoped.
[15] Again from Table 3.
[16] This is the same as Figure 3.
Figure 4 - 95% confidence intervals for predicted values across distance quartiles.
The broad pattern is that consumption is predicted to fall in the first quartile, and be constant
in the second. In the third distance quartile the prediction is upward sloping at τ = 0.25 and
τ = 0.75 but downward sloping for τ = 0.5. In the final distance quartile the effect of
distance at τ = 0.25 is almost nil, while at τ = 0.5 and τ = 0.75 predictions slope
downwards.
While these patterns appear interesting it should be noted that all variables were kept within
the regression. To tighten the predictions each quantile must be considered separately, with
insignificant variables removed. After omitting a set of variables the resulting model is tested
against the original specification for significance.
2.5.2 The Fourth Distance Quartile: Importance of Cars?
There is still some interest in decomposing what is happening to make predicted
consumption increase in the fourth distance quartile. In Figure 3 an interesting pattern
emerges at all levels of τ. We can see a fall in the number of portions per day that are
predicted to be consumed from distance quartile 1 to 2 and 2 to 3, but a rise between 3 and
4. This occurs to different extents at the three post intervention consumption quantiles, the
greatest leap being at the lower end. When we reach distances of over 2km it is reasonable
to assume that car ownership will have an impact on the closeness people feel to the
Tesco store. Car deprivation, meaning a household lacking access to a motor vehicle, is about 45% in
the three quartiles nearest to Tesco, but is only 25% in the furthest quartile. This suggests
that maybe there is a different type of consumer there for which car ownership could proxy.
To explore the potential impact of car access I split distance into two new distance slope
dummies, one with car and one without. The quantile regression process is repeated and
after elimination of insignificant variables we are left with the coefficients given in Table 5.
Ncar4 is the product of Dist4 and D_Car, being a slope dummy for distance in the fourth
distance quartile for households without car access.
Variable τ = 0.25 τ = 0.5 τ = 0.75
Intercept 0.23101 0.66234*** 0.74913*
(0.16174) (0.19824) (0.42037)
Ncar4 -0.00017*** -0.00011 -0.00025
(0.00003) (0.00008) (0.00017)
D025 0.34453** 0.49610*** 0.63066***
(0.13717) (0.10374) (0.19178)
D2550 0.15477 0.28052** 0.44599***
(0.13980) (0.12240) (0.15105)
D75100 0.43277*** 0.38182*** 0.38121***
(0.10625) (0.11477) (0.14420)
q1 0.62995*** 0.52727*** 0.93902***
(0.12584) (0.18105) (0.33617)
q2 0.66417** 1.29091*** 2.11220***
(0.33507) (0.32672) (0.36202)
q3 0.59724*** 0.65455*** 0.85366***
(0.05831) (0.06876) (0.13085)
q4 0.10296 0.67210*** 0.95856***
(0.13577) (0.22327) (0.06464)
FV2550 -0.19068 -1.37403** -2.75331***
(0.70128) (0.63449) (0.81192)
FV75100 2.56719** 0.12876 -0.66109
(0.70227) (0.97767) (0.55358)
Child -0.10573 -0.19740* -0.24739**
(0.06297) (0.08174) (0.11549)
Switch 0.04904 0.07532 0.28571***
(0.05370) (0.08416) (0.10600)
Table 5 – Coefficients of the re-estimated quantile regression. Significance
* - 10%, ** - 5% and *** - 1%. Source R calculations on Wrigley et al (2004b)
In this regression distance is much less significant, with all but one of the slope dummies
dropping out. Pre-intervention consumption remains significant, as do the Child and Switch
dummies at higher τ. Only the fourth quartile displays any effect of car ownership and so I
restrict my focus to the extreme distances from Tesco Seacroft. Figure 5 shows the
downward sloping predictions for households without cars, and horizontal predictions for
those with car access, as solid lines. Around these, 95% confidence intervals are shown as
dashed lines for non car owners and dotted lines for households with motor vehicle access.
As before black is used for τ = 0.25, red for τ = 0.5 and green for τ = 0.75. There is no
overlap between the black and red lines, but there is less differential between the middle and
upper quantiles. At τ = 0.25 we see no overlap between car and non car coefficients, but we
do see some at τ = 0.5 and at τ = 0.75 there is overlap throughout the range. This suggests
that the major effects of not having a car are felt by those who consume the least fruit and
vegetables.
2.5.3 The Fourth Distance Quartile: Area Effects?
For the most part the survey area contains a number of similar council estates all laid out in
discernible patterns. However, in the north east corner of the map there is a part of the
survey area which is characterised more by closes and cul-de-sacs. Coupling this with the
differential in car access it could be inferred that there is a slightly higher level of income in
those parts. If we are to isolate effects for these individuals then the easiest way to do this is
to use distance to the Asda Killingbeck store at the south west extreme of the map. Including
a dummy for being in the fourth quartile of distance from Killingbeck isolates the area of
interest well. Running the quantile regression process once more this dummy is soon
eliminated as insignificant. We should not necessarily let this eliminate the possibility of there
being a North East corner effect, but within the limited scope of the available data it is not
possible to find another variable that is sufficiently correlated to the area to get the desired
Figure 5 – Effects of car ownership at the fourth distance quartile, with 95% confidence intervals.
2.5.4 Sharpening the Predictions
In all of our predictions there have been large standard errors caused by the presence of
insignificant variables at the various levels. A natural solution is to permit a different model
for each quantile and so eliminate all insignificant variables. We do this using an iterative
process, beginning with those that were not significant in the first model [17]. A new model is
then run, and if there are any insignificant variables those are eliminated. After each
omission set we test the joint hypothesis that the coefficients on the omitted variables were
indeed zero. When this test returns a level of significance of 5% or higher we stop. Table 6
gives the variables and their coefficients for each of the quantile levels. In the final row we report the Wald Test p
value for the final model against the original model of section 2.4.
Variable τ = 0.25 τ = 0.5 τ = 0.75
Intercept 1.42857*** 0.86474*** 1.05357***
(0.08509) (0.14194) (0.29275)
Dist1 0.00038*** 0.00047***
Preq1 -0.37500*** 0.63153*** 0.86250***
(0.09348) (0.15950) (0.25951)
Preq2 0.61412*** 1.75000***
Preq3 0.29167 0.65819*** 0.85326***
(0.04681) (0.05287) (0.09922)
Preq4 0.70041*** 0.87500***
Test p-value 5.401e-08 0.04738 0.0005038
Table 6 – Quantile Regression with Insignificant Variables Omitted. Figures in parentheses
are standard errors. Significance: * = 10%, ** = 5%, *** = 1%. Source: R Calculations on
Data From Wrigley et al (2004b)
[17] See Table 3.
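The iterative elimination described above can be sketched as a simple loop. The sketch is deliberately generic: `fit` is a caller-supplied stand-in for re-estimating the quantile regression and returning a p-value per variable, the 10% retention threshold is an assumption, and the joint Wald test against the original model is not implemented here. The toy p-values are invented for illustration only.

```python
def backward_eliminate(variables, fit, threshold=0.10):
    """Drop every variable whose p-value exceeds `threshold`, refit, and
    repeat until all remaining variables are significant.

    `fit` maps a list of variable names to a dict {name: p_value}.
    """
    kept = list(variables)
    while True:
        p_values = fit(kept)
        omission_set = [v for v in kept if p_values[v] > threshold]
        if not omission_set:
            return kept
        kept = [v for v in kept if v not in omission_set]

def toy_fit(names):
    """Hypothetical p-values: D_Car loses significance once Dist3 is dropped."""
    p = {"Preq1": 0.001, "Dist1": 0.04, "Dist3": 0.60, "D_Car": 0.09, "Switch": 0.08}
    if "Dist3" not in names:
        p["D_Car"] = 0.35
    return {name: p[name] for name in names}

start = ["Preq1", "Dist1", "Dist3", "D_Car", "Switch"]
print(backward_eliminate(start, toy_fit))
```

The toy example shows why the model has to be refitted after each omission set: a variable that was significant in the full model need not remain so once others are removed.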
At the lowest quantile only the pre-intervention consumption variables are significant,
suggesting that distance and switching to the new Tesco store do not have an impact. We
see more interesting results at the mid range, τ = 0.5. There, as well as the pre-intervention
consumptions, distance in its first quartile (Dist1) and the characteristic dummies are
significant at 5%. The influence of children and being without a car both cause falls in the
predicted post-intervention consumption as in earlier models above. Switching to Tesco is
significant at 10% and leads to an increase in the number of portions of fruit and vegetables
that a household is predicted to eat per day. So we see that the intervention does have a
positive impact at some part of the distribution. At the top end again it is the pre-intervention
diet that is important, but there is also a small distance effect in the first distance quartile.
Comparing the standard errors (in parentheses below each coefficient) with those in the
original model we can see they are all a lot lower. Hence any predictions would have much
smaller confidence intervals. However, with distance now not playing an important part,
recreating the figures above for the new models is not possible. Here I leave the illustration
to the text.
The Seacroft Intervention Study offered an excellent opportunity to study the impact that a
new supermarket, in an area of poor retail access, could have on fruit and vegetable
consumption. By extending the study to take advantage of improved data on the distances
people travel to get to stores, and by using quantile regression, we have been able to
decompose the impact across the various levels of pre-intervention consumption. What we
find is that the increase in portions of fruit and vegetables eaten on average per day, is
greatest at the top end of the distribution and there are people at the lower end who show
very little improvement at all. Spatially, impacts are concentrated most on those living near
the store but it is by no means just a local phenomenon, with all distance quantiles
displaying an increase among certain groups. Supermarkets target areas where road access
is good, but where there is a clear need for an improvement in the availability of healthy
produce. In such places it is hoped that those with some of the poorest diets will gain
residual benefit from having a cheap retailer on their doorstep. While we find that those
without children and with access to a car do display significant increases in consumption, for
those with children and no car the effect is almost nil. This tension with the objective of the
supermarket is a concern to policymakers and something that should be investigated.
Although the quantitative conclusions of this chapter are quite specific to the Seacroft study
the broad results and methods used could be applied to other geographic areas. However, it
is important to consider issues, like the high incidence of car ownership in the fourth distance
quartile, which could influence results. Adoption of a one size fits all approach is therefore
not recommended. Further, with the limited size of the Seacroft Intervention Study such
attempts to further decompose the impact of a new supermarket may well need a new study,
which is beyond the scope of this Chapter. It may be helpful to get a better understanding of
the ways in which people walk to stores, and how they make the decision about transport
mode, if we are to fully understand the benefits of using road distance data. We have also
not taken into account things like traffic levels, shopping times, slopes of the roads etc,
which could all clarify the transport picture. Possible interest lies in taking a model like this
and applying it to census data for another area to explore the predictive power contained
within. All of these extensions would offer further useful evidence to the debate about
supermarkets as a solution to poor retail access.