Statistical Analysis of the SEERAD/SAC E. coli O157 Prevalence Study, 1998-2000
SEERAD FF Project BSS/028/99

Iain J. McKendrick
Biomathematics & Statistics Scotland


Executive Summary

Properties of Data

Samples from 952 farms are included in the analysis, with a total of 14,856 faecal
samples analysed. Of these faecal samples, 1231 were positive for verocytotoxic E.
coli O157. These positive samples were sourced from 207 farms. Hence, the raw
figures indicate that 21.7% (19.2%, 24.5%) of groups sampled contained shedding
animals, and that the animal level prevalence is 8.3% (7.3%, 9.4%). However, these
figures do not allow for the effects of sampling error (which in a situation with many
groups with a small number of shedders would tend to underestimate the number of
groups containing shedders) and of the mixed nature of the sample (farms with no
infection will, by definition, have zero prevalence; a more useful statistic is the
estimate of the animal prevalence on those farms which are positive). The data are
analysed using a beta-binomial model, from which it is estimated that the proportion
of shedding animals is 7.9% with a 95% confidence interval of (6.5%, 9.6%). This is
slightly lower than the raw estimate given earlier. This adjustment arises from the
more appropriate modelling of the asymmetric prevalence distribution. It is estimated
that 22.8% of finishing groups contained at least one positive shedding animal, with a
95% confidence interval of (19.6%, 26.3%). The point-estimate and confidence
interval are both slightly higher than the raw estimates given earlier, since these
figures incorporate an adjustment to allow for farms with low shedding rates being
misclassified as negative due to sampling variability.
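The raw figures quoted above follow directly from the counts, and the sampling-error effect can be illustrated with a simple calculation. The following is a minimal sketch only: the function name `prob_all_negative`, the example values (5% shedding, 10 samples) and the assumption of independent samples within a group are illustrative assumptions, and this is not the beta-binomial model used in the report.

```python
# Counts quoted in the text.
n_farms = 952
n_samples = 14856
n_positive_samples = 1231
n_positive_farms = 207

# Raw proportions, before any adjustment for sampling error.
raw_group_prevalence = n_positive_farms / n_farms
raw_animal_prevalence = n_positive_samples / n_samples


def prob_all_negative(prevalence: float, samples: int) -> float:
    """Chance that every sample from an infected group tests negative,
    assuming independent samples with a common shedding probability
    (an illustrative assumption, not the report's model)."""
    return (1.0 - prevalence) ** samples


# Hypothetical example: 5% of animals shedding, 10 samples collected.
missed = prob_all_negative(0.05, 10)

print(f"raw group prevalence:  {raw_group_prevalence:.1%}")
print(f"raw animal prevalence: {raw_animal_prevalence:.1%}")
print(f"P(infected group appears negative): {missed:.2f}")
```

Under these assumed values a low-prevalence group would be recorded as negative more often than not, which is why the adjusted estimate of the proportion of positive groups exceeds the raw figure.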


Analysis of Within-Farm Prevalences

These data are highly skewed, with many zero returns. This is because their true
statistical distribution should be a mixture distribution, with true negative farms
always generating a zero response and positive farms generating a range of responses,
many of which will be zero, with variability arising from the between-farm variability
and the sampling variability. Ignoring this aspect of the data gives rise to models with
unacceptable residuals. The data are handled by restricting analysis to those
observations with non-zero responses. Hence, the epidemiological analysis answers
the question ‘given that the farm has at least one positive sample, what factors tend to
be associated with higher within-farm prevalences?’
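The restriction to non-zero responses can be sketched as a simple filter. The farm identifiers and counts below are hypothetical and serve only to show the shape of the data retained for the within-farm analysis.

```python
# Hypothetical per-farm records: (farm id, samples tested, samples positive).
farms = [
    ("A", 15, 0),
    ("B", 20, 3),
    ("C", 12, 0),
    ("D", 18, 9),
]

# Keep only farms with at least one positive sample.
positive_farms = [(fid, n, y) for fid, n, y in farms if y > 0]

# Observed within-farm prevalence on the retained farms.
prevalence = {fid: y / n for fid, n, y in positive_farms}
print(prevalence)
```

Farms A and C are dropped, so any model fitted to `prevalence` describes positive farms only.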

The data are analysed by fitting a series of generalised linear models to each variable
in turn, developing a multivariate model (using some of the stepwise regression
functions available for this class of model) containing all likely factors, and then
refitting this model as a generalised linear mixed model (GLMM). Hence the ultimate
model uses the most appropriate algorithm for the data. The data are consistently
fitted as binomial random variables with logit link functions. Generalised linear
models are consistently fitted with estimated dispersion parameters (all of which are
clearly greater than one), while the GLMMs are fitted with Farm as a random effect
and fixed dispersion (since farm is the basic sampling unit). Other possible random
effects are insignificant.
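A dispersion parameter greater than one indicates more variability than a pure binomial model allows. One common estimator, shown here as a sketch, is the Pearson chi-square statistic divided by the residual degrees of freedom. The report does not state which estimator was used, so the choice of the Pearson estimator, the function name `pearson_dispersion` and the data are all assumptions.

```python
def pearson_dispersion(positives, totals, fitted_probs, n_params):
    """Pearson chi-square divided by residual degrees of freedom."""
    chi_sq = 0.0
    for y, n, p in zip(positives, totals, fitted_probs):
        expected = n * p
        variance = n * p * (1.0 - p)  # variance under a pure binomial model
        chi_sq += (y - expected) ** 2 / variance
    return chi_sq / (len(positives) - n_params)


# Hypothetical counts for five positive farms.
positives = [1, 9, 2, 12, 1]
totals = [20, 18, 15, 20, 16]

# Intercept-only binomial model: the fitted probability for every farm
# is the pooled proportion of positive samples.
pooled = sum(positives) / sum(totals)
phi = pearson_dispersion(positives, totals, [pooled] * len(totals), n_params=1)

print(f"estimated dispersion: {phi:.2f}")
```

A value of `phi` well above one, as in this invented example, is the pattern the text describes for the fitted generalised linear models.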

Within the univariate analysis, examining structural variables, animal health division
and sampling month are found to be highly significant. Examining possible
explanatory variables, we find that housing status (housed or unhoused) has an
extremely significant effect on the prevalences (housed animals have a much higher
prevalence than unhoused animals).


Factor/Variable          Effect                          Comment
Division                 Highland area has a higher      Effect even stronger in
                         prevalence, South-West has a    ultimate         multivariate
                         low prevalence.                 model.
Sampling Month           Lower in summer months.         Effect      disappears     in
                                                         ultimate         multivariate
                                                         model. Effect explained
                                                         by differential housing in
                                                         different months.
Season/ Seas_List        Summer and Autumn show          Effect better explained by
                         lower prevalences.              examining results on a
                                                         month by month basis.
                                                         Effect      disappears     in
                                                         multivariate model.
Housed                   Housed animals have a much      This is the key finding of
                         higher prevalence. Highly       the study. All other parts
                         significant.                    of the analysis depend on
                                                         the correct modelling of
                                                         the ‘Housed’ effect.
Recent Move              A recent move is associated     This effect becomes even
                         with lower prevalences.         more clear when explored
                                                         in     conjunction      with
                                                         ‘Housed’.
Recent Change in Feed    Recent change in feed           This effect becomes even
                         associated   with lower         more clear when explored
                         prevalences.                    in     conjunction      with
                                                         ‘Housed’.
Silage_Home              Silage production on the        Effect explained more
                         farm is associated with lower   fully     in     multivariate
                         prevalence      in    housed    analysis.
                         animals.
Silage_Slurry            Silage production on the        Effect explained more
                         farm with the spreading of      fully     in multivariate
                         slurry is associated with       analysis.
                         lower prevalence in housed
                         animals.
N_Pigs                   Higher number of pigs is        Model result depends on 8
                         associated with lower           points with high leverage.
                         prevalence.                     Suspicious that categorical
                                                         variable derived from this
                                                         variable (Pigs) is not
                                                         significant. Effect found
                                                         not to be significant in
                                                         final multivariate model.
                                                         Probably spurious.
N_Deer                   Higher number of deer is        Model result depends on 1
                         associated with higher          point with high leverage.
                         prevalence.                     No basis for drawing any
                                                         wider conclusions from
                                                         this result. Probably
                                                         spurious.
Water                    Natural water supplies          Natural water supplies
                         associated with significantly   associated with unhoused
                         lower prevalences than main     animals. Even so, natural
                         supply.                         water supply is associated
                                                         with lower prevalence.
Housing, Supplementary   All of these factors, although  No information above that
Feed, Forage, Silage,    apparently significant in the   gained from ‘Housed’
Concentrate,             univariate analysis, are
Grass_Manure,            confounded with Housed.
Grass_Slurry,
Grass_Sewage,
Grass_Geece,
Grass_Gulls

Fitting a multi-factor model, particularly exploring the interactions between the
Housed variable and the other possible variables, we find that the following factors
are of interest:

Factor/Variable       Effect                            Log Odds  se       p-value
                                                        Ratio
Housed                Housed animals have higher        1.319     0.33     <0.001
                      prevalences.
FCattle               Farms with >100 finishing cattle  -0.702    0.23     0.004
                      have significantly lower
                      prevalences than those with <100.
Housed/‘Recent        Farms with Housed Animals and     0.480     0.43     0.26
Changes in Housing    recent changes have higher
or Diet’              prevalences than farms with
interactions          unhoused animals. This effect is
                      not formally significant.
                      Farms with Housed Animals and no  0.891     0.33     0.007
                      recent changes have higher
                      prevalences than farms with
                      Housed Animals and recent
                      changes.
Water sourced from    Farms with animals at pasture     -0.708    0.35     0.04
natural supply        have lower prevalences if the
                      water is from a natural source.
Slurry spread on Farm Farms with housed animals which   -0.5529   0.29     0.07
                      spread slurry on their silage
                      fields have a lower prevalence
                      than farms with housed animals
                      which do not.
Animal Health         Scotland divided into three
Division              regions: Highlands; Central,
                      Islands, North-East and
                      South-East; and South West.
                      Highlands exhibits a              0.969     0.42     0.02
                      significantly higher prevalence
                      than the portmanteau region.
                      The South West exhibits a         -0.600    0.28     0.03
                      significantly lower prevalence
                      than the portmanteau region.
Sampling Month        No significant effects            Various   Various  0.23
                      identified. All variability
                      explained by explanatory
                      variables above, especially
                      Housed.
Sampling Year         No significant effects            Various   Various  0.61
                      identified.

Hence, various explanatory factors and variables have been identified as being
associated with the within-farm prevalence of E. coli O157 shedding in finishing
cattle on positive farms. No statistically significant management system variability
was observed in the analysis of the basic data, and nothing further became apparent
following the fitting of the multi-factor model. Similarly, there was no evidence of
any long-term trend in prevalences over the lifetime of the study, and this conclusion
remained unaffected by the fitting of the multi-factor model. By contrast, the basic
data showed evidence of variability between different Animal Health Divisions, and
this effect remained in the multi-factor model, unexplained by any of the proposed
explanatory factors. The basic data showed highly significant evidence of cyclicity
by month. When included in a model with the full multi-factor model, the month
effect was found to be insignificant, being fully explained by other explanatory
factors. Hence it can be concluded that although the within-farm prevalences do vary
with month, this is explained by the proposed explanatory factors. By contrast, the
geographical variability in the data appears to be genuine, and is best examined after
the extraneous effects of the other explanatory factors have been allowed for in the
model.
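The log odds ratios in the multi-factor table can be easier to interpret when converted to odds ratios. The sketch below exponentiates two of the tabulated estimates and attaches an approximate 95% interval based on a normal approximation (estimate plus or minus 1.96 standard errors). The report does not give such intervals, so the interval method and the function name `odds_ratio_with_ci` are assumptions.

```python
import math


def odds_ratio_with_ci(log_or: float, se: float, z: float = 1.96):
    """Return (odds ratio, lower, upper) using a normal approximation
    on the log odds scale (an assumed method, not taken from the report)."""
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Log odds ratios and standard errors as tabulated for the within-farm model.
estimates = {
    "Housed": (1.319, 0.33),
    "FCattle": (-0.702, 0.23),
}

for name, (log_or, se) in estimates.items():
    ratio, lower, upper = odds_ratio_with_ci(log_or, se)
    print(f"{name}: OR {ratio:.2f} (approx. 95% interval {lower:.2f} to {upper:.2f})")
```

On this scale the Housed estimate corresponds to odds of shedding between three and four times higher in housed groups, while the FCattle estimate corresponds to roughly a halving of the odds.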


Analysis of Between-Farm Prevalences

The detailed data collected in the study can be converted into binary (or Bernoulli)
data, where the farm is recorded as a positive if at least one of the samples collected
from that farm is positive, and negative if all samples are negative. The binary data
can then be analysed in terms of the probability of observing a positive farm on
different types of farm. These data present fewer difficulties in analysis than the
within-farm prevalence data: since only positives and negatives are recorded, it is
impossible for a generalised linear model to provide a poor fit in terms of the
distribution of residuals, since the data do not contain enough structure for any lack
of fit to occur. Accordingly, all the models in this section are fitted with dispersion
parameter set equal to one, since it is impossible to estimate any such over-dispersion
from the data. Many of the diagnostics which are available in terms of the fit of the
model for Binomial data are not useful for Bernoulli data. It is appropriate to examine
the data in this format for two reasons: firstly, since zero prevalence farms have been
excluded from the within-farm analysis for technical statistical reasons, it is desirable
to investigate the factors which are associated with farms being negative, since
otherwise these data will never have been analysed. Secondly, there is no reason
to believe that the factors which promote high within-herd prevalences on farms
which are positive will be the same as the factors which either promote the infection
of farms with E. coli O157 or which encourage the maintenance of infection once
introduced. Obviously, a factor which is associated with high within-herd prevalence
will have potential to also be associated with a high probability of herd infection;
however, it will be interesting to identify where different factors may come into play
in the two models.
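The conversion to binary data described above can be sketched as follows. The farm identifiers and sample results are hypothetical, and the function name `farm_is_positive` is an illustrative assumption.

```python
def farm_is_positive(sample_results) -> bool:
    """A farm is positive if at least one of its samples is positive."""
    return any(sample_results)


# Hypothetical sample results per farm (True = positive sample).
samples_by_farm = {
    "A": [False, False, False],
    "B": [False, True, False],
    "C": [True, True],
}

# One Bernoulli observation per farm.
farm_status = {farm: farm_is_positive(r) for farm, r in samples_by_farm.items()}
print(farm_status)
```

Each farm contributes a single 0/1 response regardless of how many samples were positive, which is why no over-dispersion can be estimated from data in this form.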

The data are analysed by fitting a series of generalised linear models to each variable
in turn, developing a multivariate model (using some of the stepwise regression
functions available for this class of model) containing all likely factors, and then
refitting this model as a generalised linear mixed model (GLMM). Hence the ultimate
model uses the most appropriate algorithm for the data. The data are consistently
fitted as Bernoulli random variables with logit link functions. Generalised linear
models are consistently fitted with dispersion parameters fixed equal to one, while the
GLMMs are fitted with Farm as a random effect and a fixed dispersion (since farm is
the basic sampling unit). Other possible random effects are found to be insignificant.

Within the univariate analysis, examining structural variables, none are found to be
highly significant. There is some weak evidence of an effect due to Sampling Year,
but this effect is not significant at the 5% level. Examining possible explanatory
variables, by contrast to the within-herd model, we find that Housing status has a
negligible effect on the probability of a farm being identified as positive. The
following factors were found to be of interest in the univariate analysis:

Factor/Variable               Effect                         Comment
Division                      No formally statistically      No       trend     apparent,
                              significant      effects.      although it is interesting
                              Highland division has a        that Highlands are so low,
                              particularly         low       when the within-herd
                              prevalence.                    prevalence     was     high.
                                                             Effects utterly disappear in
                                                             the multifactor model.
Sampling Month                No statistically significant   In the within-farm model,
                              evidence of any effects        January-April tended to
                              (p=0.26).       Prevalences    show higher prevalences,
                              from      December        to   associated with Housing
                              February show signs of         effects. This aspect of the
                              being lower.                   dataset requires careful
                                                             interpretation, since data
                                                                 from early 2000 is
                                                                 included in the January to
                                                                 April estimates, and not in
                                                                 the other months. There is
                                                                 some evidence that the
                                                                 data from 2000 exhibits a
                                                                 lower prevalence. Hence
                                                                 this variable is analysed
                                                                 along with Sampling Year.
                                                                 However, even when Year
                                                                 and Sample Month are
                                                                 fitted in the same model,
                                                                 there is only weak
                                                                 evidence of any effect due
                                                                 to     Sampling       Month.
                                                                 However, the effects which
                                                                 are apparent in the
                                                                 univariate analysis can be
                                                                 shown to be significant
                                                                 within the multifactor
                                                                 analysis.
Sampling Year                   A small drop in 1999 and         Due to a lack of balance in
                                a large drop in 2000. The        the dataset, this result is
                                result is close to statistical   derived from a model fitted
                                significance (p=0.06).           with Sampling Month.
                                                                 There      is     compelling
                                                                 evidence of a drop in
                                                                 prevalence by year 2000,
                                                                 less so for year 1999.
                                                                 Similar results are seen in
                                                                 the multifactor model,
                                                                 where the trend is highly
                                                                 significant.
Number of Finishing Cattle      Higher     numbers       of      Each      of     the     eight
                                finishing cattle were            significant cattle number
                                associated with a high risk      factors and variables gives
                                of the farm being positive.      the same result: more
                                P-value suppressed as            animals equates to a higher
                                arising from a poorly            risk of the farm being
                                fitting model.                   positive.        Some are
                                                                 rejected as presenting a
                                                                 poorly fitting model: others
                                                                 because another factor is
                                                                 found      to     be     more
                                                                 informative. This variate
                                                                 was overly sensitive to a
                                                                 small number of farms
                                                                 with high numbers of
                                                                 finishing cattle.
Categorised Number of          Categorising the numbers      One of the most
Finishing Cattle               of animals into 4 classes,    informative factors in this
                               groups containing 1-49        sub-grouping.       Carried
                               animals were less likely to   forward      for    further
                               be identified as positive     investigation in the multi-
                               than larger groups, while     factor model.
                               groups of >200 animals
                               had       even      higher
                               prevalences still. Effects
                               are highly statistically
                               significant (p<0.001).
Number of Groups of Cattle     Higher numbers of groups      This variate was overly
                               of cattle were associated     sensitive to a small number
                               with a higher risk of the     of farms with high
                               farm being positive. p-       numbers of groups of
                               value     suppressed     as   cattle.
                               arising from a poorly
                               fitting model.
Categorised Number        of   Higher numbers of groups      Factor         relatively
Groups of Cattle               of cattle were associated     insignificant.    Lacked
                               with a higher risk of the     information relative to
                               farm     being    positive.   other terms in the sub-
                               (p=0.08). Fit still fairly    grouping.
                               poor.
Number of Cattle in Sampling   Higher      numbers      of   This variate was overly
Group                          animals in the sampling       sensitive to a small number
                               group were associated         of farms with high
                               with a higher risk of the     numbers of groups of
                               farm being positive. p-       cattle.
                               value     suppressed     as
                               arising from a poorly
                               fitting model.
Categorised Number of Cattle   Higher      numbers      of   Carried forward for further
in Sampling Group              animals in the sampling       investigation in the multi-
                               group were associated         factor model.
                               with a higher risk of the
                               farm     being     positive
                               (p<0.001).
Number of Cattle               Higher numbers of cattle      This variate was overly
                               were associated with a        sensitive to a small number
                               higher risk of the farm       of farms with high
                               being positive. p-value       numbers of cattle.
                               suppressed as arising from
                               a poorly fitting model.
Categorised Number of Cattle   Higher numbers of cattle      Carried forward for further
                               were associated with a        investigation in the multi-
                               higher risk of the farm       factor model. Lacks
                               being positive. (p=0.002).    significance when fitted
                                                             with other factors.
Source of Cattle               Farms which never buy in      Lacks significance when
                               animals have a                fitted with other factors in
                               significantly lower           the multivariate model.
                               (p=0.03) risk of being        When number of finishing
                               positive than those which     cattle or number of
                               always or sometimes buy       sampling groups are
                               in animals.                   included in the model, it
                                                             can be seen that source of
                                                             cattle lacks explanatory
                                                             power.
Breed                       Farms with B_D_DB           An extremely small level,
                            class animals have a        with a correspondingly
                            higher prevalence than      high leverage, it is not
                            others (p=0.018).           surprising that it is found
                                                        to lack significance when
                                                        fitted with other factors.
Beef Cattle on Dairy     Farms which are                Risk group identified from
Farm                     described as having a          analysis of interaction of
                         dairy system with beef         two more broadly defined
                         cattle have a statistically    factors. Possible risk of
                         significantly higher risk of   over-trawling the data.
                         being positive than other
                         farms (p=0.017).
Spreading of Slurry on Farms with unhoused
Pasture                  animals which spread
                         slurry on the pasture have
                         a higher risk of being
                         positive than those which
                         do not, or those which
                         have housed animals.
                         (p=0.003).
Spreading of Manure on Farms with unhoused
Pasture                  animals which spread
                         manure on the pasture
                         have a lower risk of being
                         positive than those which
                         do not, or those which
                         have housed animals.
                         (p=0.037).
Number of Goats          High number of goats is        This variate was overly
                         associated with a higher       sensitive to two farms with
                         risk of farm being             higher numbers of goats.
                         positive.           p-value
                         suppressed as arising from
                         a poorly fitting model.
Presence of Pigs on Farm The presence of pigs on a
                         farm is associated with a
                         higher risk of the farm
                         being classed as positive
                         (p=0.01).
Lab Operator             The identity of the lab        This effect was found to be
                         operator who carried out       spurious, arising from the
                         the assaying of the            unbalanced nature of the
                         samples was found to be a      data with respect to this
                         significant factor             effect. Different operators
                         (p=0.039).                     carried out work at
                                                        different times, on samples
                                                        with different mean
                                                        prevalences.
Max Age of Animals in    A higher maximum age is        This variate is included for
Group                    associated with a lower        completeness, since it is
                         prevalence (p=0.31).           found to be relevant in the
                                                        multi-factor model,
                                                        although, as can be seen, it
                                                        lacks any apparent
                                                        explanatory power in
                                                        isolation.

Fitting a multi-factor model, we find that the following factors and variates are of
interest:

   Factor/                  Effect              Log Odds        se        p-value
   Variable                                       Ratio
Sampling Year     Allowing       for      the    -0.425        0.21         0.04
                  explanatory factors, farms
                  sampled in year 1999 are
                  at lower risk of being
                  positive    than     those
                  sampled in 1998.

                  Allowing       for      the    -0.371        0.26         0.15
                  explanatory factors, farms
                  sampled in year 2000 are
                  at lower risk of being
                  positive    than     those
                  sampled in 1999.

                  Allowing       for      the    -0.795        0.31         0.01
                  explanatory factors, farms
                  sampled in year 2000 are
                  at lower risk of being
                  positive    than     those
                  sampled in 1998.
Sampling          A broad cyclical effect,      Various       Various       0.02
Month             with prevalence effects
                  peaking in Summer and
                  troughing in Winter.
                  Anomalous changes in
                  prevalences observed in a
                  number of months, such
                  as June, April and
                  November.
Categorised    Farms with 12-28 animals              0.687           0.23         0.003
Number      of are at a higher risk of
Animals     in being positive than those
Sampling       with <12 animals.
Group.
               Farms with >28 animals                0.462           0.19          0.03
               are at a higher risk of
               being positive than those
               with 12-28 animals.
Categorised    Farms      with     50-199            0.367           0.19          0.05
Number      of animals are at a higher
Finishing      risk of being positive than
Cattle.        those with 1-49 animals.

                    Farms with 200+ animals          0.614           0.30          0.04
                    are at a higher risk of
                    being positive than those
                    with 50-199 animals.
Spreading      of   Considering only farms           1.205           0.32        <0.001
Slurry on Pasture   with animals at pasture,
                    those which spread slurry
                    are at a higher risk than
                    those which do not.
Spreading      of   Considering only farms           -1.155          0.36         0.001
Manure         on   with animals at pasture,
Pasture             those     which     spread
                    manure are at a lower risk
                    than those which do not.
Dairy       Farms   Dairy farms with beef            1.965           0.64         0.002
with         Beef   cattle are at a higher risk
Cattle              of being positive than
                    other farms.
Presence      of    Farms with pigs are at a         0.892           0.35          0.01
pigs on farm.       higher risk of being
                    positive     than    those
                    without pigs.
Maximum age         Higher maximum age is            -0.031         0.015          0.04
of cattle in        associated with a lower
sampling group.     risk of the farm being
                    positive.


Of these, it should be pointed out that the factor ‘Categorised Number of Animals in
Sampling Group’ is correlated with the number of animals in the sampling group and
hence with the number of samples collected from the group. Hence it might be
thought likely that a positive relationship might be generated through the higher
detection probability arising from a larger sample. Consideration of the data suggests
that this is unlikely, but even if the result is discounted on this basis, the inclusion of
FCattle in the model even in the presence of the sampling group factor indicates that
the size of enterprise is a highly significant risk factor.
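
The log odds ratios tabulated above can be read as odds ratios by exponentiating them. The following Python sketch is not part of the original analysis: it converts three of the tabulated estimates and attaches approximate 95% Wald intervals (estimate plus or minus 1.96 standard errors). The function name and the use of the normal approximation are assumptions made for illustration.

```python
from math import exp

def odds_ratio_with_ci(log_or, se, z=1.96):
    """Return (odds ratio, lower, upper) from a Wald interval on the log scale."""
    return exp(log_or), exp(log_or - z * se), exp(log_or + z * se)

# Estimates and standard errors copied from the multi-factor table.
effects = {
    "Sampled 1999 vs 1998": (-0.425, 0.21),
    "Slurry spread on pasture": (1.205, 0.32),
    "Dairy farms with beef cattle": (1.965, 0.64),
}

for label, (log_or, se) in effects.items():
    ratio, lower, upper = odds_ratio_with_ci(log_or, se)
    print(f"{label}: OR = {ratio:.2f} ({lower:.2f}, {upper:.2f})")
```

An odds ratio below 1 corresponds to a negative log odds ratio (lower risk), and an interval that excludes 1 corresponds roughly to significance at the 5% level.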


Hence, various explanatory factors and variables have been identified as being
associated with the farm prevalence of E. coli O157 shedding in finishing cattle. No
statistically significant geographical or management system variability was observed
in the analysis of the basic data, and nothing further became apparent following the
fitting of the multi-factor model. By contrast, the basic data showed evidence of a
long-term trend towards lower prevalences over the lifetime of the study, and this
trend remained in the multi-factor model, unexplained by any of the proposed
explanatory factors. The basic data showed no significant evidence of any cyclicity
by month or season, although various peculiarities were observable in the analysis.
When included in the full multi-factor model, the month effect is found
to be significant. It is important to stress that this significance is associated with the
same peculiarities observed in the univariate model: the effect is not an artefact of a
poorly fitting model. Hence it can be concluded that the farm level prevalences do
vary with month, in a fashion which is not explained by the proposed explanatory
factors.




Properties of Data

Samples from 952 farms are included in the analysis, with a total of 14,856 faecal
samples analysed. Of these faecal samples, 1231 were positive for verocytotoxic E.
coli O157. These positive samples were sourced from 207 farms. Hence, the raw
figures indicate that 21.7% (19.2%, 24.5%) of groups sampled contained shedding
animals.
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=1
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]


***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   1
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       0          0.0            *
Residual       951        997.0        1.048
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00


*** Estimates of parameters ***

                                                         antilog of
                  estimate         s.e.      t(*) t pr.    estimate
Constant           -1.2807       0.0786    -16.30 <.001      0.2779
* MESSAGE: s.e.s are based on dispersion parameter with value 1
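
The constant above is reported on the logit scale. As a check, the Python sketch below (not part of the original GenStat analysis) back-transforms the estimate and a Wald interval of plus or minus 1.96 standard errors to the prevalence scale. The helper names are assumptions for illustration. To one decimal place this agrees with the raw group-level figure of 21.7% (19.2%, 24.5%) quoted in the text.

```python
from math import exp

def expit(eta):
    """Inverse of the logit link: maps a log odds value to a probability."""
    return 1.0 / (1.0 + exp(-eta))

def prevalence_with_ci(estimate, se, z=1.96):
    """Back-transform a logit-scale estimate and its Wald interval."""
    return expit(estimate), expit(estimate - z * se), expit(estimate + z * se)

# Constant and standard error from the farm-level GenStat output.
p, lower, upper = prevalence_with_ci(-1.2807, 0.0786)
print(f"{100 * p:.1f}% ({100 * lower:.1f}%, {100 * upper:.1f}%)")
```

The "antilog of estimate" column in the output is the odds p/(1-p), not the prevalence itself, which is why the inverse logit is needed.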


Analysis of the animal level prevalence is complicated by the need to fit a dispersion
parameter and the (frankly) appalling fit of the model, giving a mean and confidence
interval of 8.3% (7.3%, 9.4%).
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]


***** Regression Analysis *****

Response variate:   VTPos
 Binomial totals:   N_Sam
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio  F pr.
Regression       0           0.            *
Residual       951        5393.        5.671
Total          951        5393.        5.671

Dispersion parameter is estimated to be 5.67 from the residual deviance
* MESSAGE: The following units have large standardized residuals:
         Unit     Response    Residual
            3        15.00        3.63
           15        21.00        4.02
           30        23.00        4.50
           38        16.00        3.45
          131        17.00        3.87
          259        16.00        3.45
          273        22.00        4.40
          305        18.00        3.81
          326        18.00        3.50
          428        17.00        3.57
          464        14.00        3.32
          514        20.00        4.19
          719        16.00        3.45
          720        17.00        3.87
          864        14.00        3.51
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                                        antilog of
                  estimate         s.e.    t(951) t pr.                   estimate
Constant           -2.4041       0.0709    -33.92 <.001                    0.09035
* MESSAGE: s.e.s are based on the residual deviance
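
The standard error above is inflated to allow for the estimated dispersion. A minimal Python sketch of that adjustment follows, assuming the usual quasi-likelihood scaling in which the binomial standard error is multiplied by the square root of the dispersion parameter. It is an illustration, not the original GenStat computation; the counts and the dispersion value are taken from the text and output above.

```python
from math import log, sqrt

positives, samples = 1231, 14856   # positive samples and total samples
dispersion = 5.671                 # residual mean deviance from the output

p = positives / samples
logit = log(p / (1.0 - p))                         # constant on the logit scale
se_binomial = 1.0 / sqrt(samples * p * (1.0 - p))  # s.e. if dispersion were 1
se_scaled = se_binomial * sqrt(dispersion)         # s.e. allowing for overdispersion

print(f"logit = {logit:.4f}, binomial s.e. = {se_binomial:.4f}, scaled s.e. = {se_scaled:.4f}")
```

The scaled standard error is roughly 2.4 times the purely binomial one, which is what widens the interval for the animal-level prevalence.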


This model is, however, extremely poor, since the plot of fractional prevalences
shows that the distribution of positive samples is probably not even unimodal.




[Figure: Histogram of Fractional Prevalences. x-axis: Fractional Prevalence (0.0 to 1.0); y-axis: Frequency (0 to 800).]

However, these figures do not allow for the effects of sampling error (which in a
situation with many groups with a small number of shedders would tend to
underestimate the number of groups containing shedders) and of the mixed nature of
the sample (farms with no infection will, by definition, have zero prevalence, so a more
useful statistic is the estimate of the animal prevalence on those farms which are
positive).

In order to deal with these issues, a more complex model for the within-herd
prevalence distribution is proposed. The data are treated as being the outcome of a
mixture distribution, where a proportion pneg of the population are defined as negative
farms and will always return a zero number of positive samples. Among the positive
population, the between farm variability is modelled as a beta distribution, taking
parameters a and b, while the sampling distribution of the faecal pat sampling process
is taken to be binomial. A small number of farms were sampled using rectal samples.
The sampling distribution of this process is taken to be hypergeometric. No positive
samples were collected from rectally sampled groups. Hence, where N is the number
of animals in the group, n is the number of samples collected, and x is the observed
number of positives, the distribution of x is taken to be:

\[
P(X = 0) =
\begin{cases}
p_{neg} + (1 - p_{neg}) \dfrac{\Gamma(n+b)\,\Gamma(a+b)}{\Gamma(a+b+n)\,\Gamma(b)}
  & \text{under faecal pat sampling} \\[2ex]
p_{neg} + (1 - p_{neg}) \displaystyle\sum_{i=0}^{N-n} \binom{N-n}{i}
  \dfrac{\Gamma(i+a)\,\Gamma(N-i+b)\,\Gamma(a+b)}{\Gamma(a+b+N)\,\Gamma(a)\,\Gamma(b)}
  & \text{under rectal sampling}
\end{cases}
\]

\[
P(X = x,\ x > 0) = (1 - p_{neg}) \binom{n}{x}
  \dfrac{\Gamma(x+a)\,\Gamma(n-x+b)\,\Gamma(a+b)}{\Gamma(a+b+n)\,\Gamma(a)\,\Gamma(b)}
  \qquad \text{under faecal pat sampling}
\]


Hence, although two different sampling distributions are involved, they are based on
the same underlying parameters and can be incorporated into the same likelihood.
The log-likelihood is maximised with respect to a, b and pneg.
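
The mixture probabilities described above can be sketched in Python as follows. This is a minimal illustration, not the code used for the original fit: the function names are assumptions, the example data are hypothetical, and no maximisation routine is included. Gamma functions are evaluated on the log scale to avoid overflow.

```python
from math import exp, lgamma, log

def log_beta(p, q):
    """Logarithm of the beta function B(p, q)."""
    return lgamma(p) + lgamma(q) - lgamma(p + q)

def log_choose(n, k):
    """Logarithm of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def pat_pmf(x, n, a, b, p_neg):
    """P(X = x) for n faecal pat samples: zero-inflated beta-binomial."""
    base = (1.0 - p_neg) * exp(
        log_choose(n, x) + log_beta(x + a, n - x + b) - log_beta(a, b))
    return p_neg + base if x == 0 else base

def rectal_p_zero(N, n, a, b, p_neg):
    """P(X = 0) when n of the N animals in a group are sampled rectally."""
    total = sum(
        exp(log_choose(N - n, i) + log_beta(i + a, N - i + b) - log_beta(a, b))
        for i in range(N - n + 1))
    return p_neg + (1.0 - p_neg) * total

def log_likelihood(pat_data, a, b, p_neg):
    """Log-likelihood for a list of (x, n) pairs from pat-sampled groups."""
    return sum(log(pat_pmf(x, n, a, b, p_neg)) for x, n in pat_data)

# Hypothetical (positives, samples) pairs, for illustration only.
example = [(0, 15), (0, 12), (3, 20), (1, 10)]
print(log_likelihood(example, a=0.0687, b=0.8013, p_neg=0.1))
```

For zero positives, the rectal-sampling sum evaluates to the same value as the faecal pat expression with the same n, which gives a quick numerical check on the two forms.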

Parameter       Value
    pneg        3.98E-31
     a            0.0687
     b            0.8013

The beta distribution used to model the between-farm variability in positive groups has a
bimodal shape, reflecting the long tail towards high proportional prevalences. The
population contains a large proportion of groups with low prevalences, which are
likely to give rise to observations of zero positives. This means that the estimate of
pneg and the estimates of a and b are highly negatively correlated.




[Figure: Between-farm variability as summarised by the beta function. x-axis: Proportion Shedding (0 to 1); y-axis: pdf (0 to 6).]

The fit of the model was tested against the faecal pat-sampled observations. These
data were categorised by sample size, and expected values for each response given the
model were calculated. Many of these expectations were extremely small, so the
expectations and observations were grouped into larger combinations with
expectations of at least 5. In total, 55 variables were used to calculate a goodness-of-fit
statistic. However, the expectations also incorporated 26 constraints, conditioning on
the number of farms associated with each of the sample sizes. Hence there were
55 - 26 = 29 degrees of freedom associated with the test statistic. The fit to the data is
found to be adequate, with a chi-squared goodness-of-fit test generating a test statistic
$\chi^2_{29} = 36.4$, which has a p-value of 0.16. The mean animal-level prevalence on
positive farms was estimated by the mean of the beta distribution, $a/(a+b)$, and the
mean farm-level prevalence was estimated using a more complex procedure which took
account of the distribution of numbers of finishing cattle in the groups sampled in the
study.
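
As a worked check, the Python sketch below plugs the fitted values of a and b into the beta mean and evaluates the beta density at three points. It is illustrative only and the function name is an assumption. With both parameters below 1 the density rises towards both 0 and 1, which is the two-peaked shape referred to earlier.

```python
from math import exp, lgamma, log

a, b = 0.0687, 0.8013   # fitted beta parameters from the table above

def beta_pdf(x, a, b):
    """Density of the beta(a, b) distribution at 0 < x < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1.0) * log(x) + (b - 1.0) * log(1.0 - x))

beta_mean = a / (a + b)
print(f"mean of beta distribution: {100 * beta_mean:.1f}%")

for x in (0.01, 0.5, 0.99):
    print(f"pdf({x}) = {beta_pdf(x, a, b):.3f}")
```

The density near 0 is far higher than near 1, so the second peak is a small one at the upper end of the range.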

The number of cattle in the sampling groups has a highly skewed distribution, as shown below:




[Figure: Histogram of Number of Cattle in Sampling Groups. x-axis: Number of Cattle in Group (0 to 200); y-axis: Frequency (0 to 200).]

However, when the number of cattle is log-transformed, the distribution looks much
more symmetric:




[Figure: Histogram of the Log of Number of Cattle in Sampling Groups. x-axis: Log(Number of Cattle) (0 to 5); y-axis: Frequency (0 to 100).]

The distribution of number of cattle in the sampling groups is modelled as a log-
normal distribution, with parameters as shown in the table below:

 Parameter                Value
     mu                 2.843549
   sigma                0.708497


Assuming no relationship between size of group and the variability in prevalence
summarised in the beta distribution, the beta-binomial model was used to estimate the
fraction of groups which contained at least one shedding animal (the parameters
already estimated give enough information to do this).
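
One way such a calculation could be carried out is sketched below in Python. This is an assumption-laden illustration, not the procedure actually used in the study: pneg is treated as zero (its fitted value is negligible), group size is treated as a continuous log-normal quantity, and the average is taken over evenly spaced quantiles. Under those assumptions the result comes out close to the group-level prevalence reported in this section.

```python
from math import exp, lgamma
from statistics import NormalDist

A, B = 0.0687, 0.8013            # fitted beta parameters
MU, SIGMA = 2.843549, 0.708497   # fitted log-normal parameters for group size

def p_no_shedders(size, a=A, b=B):
    """Beta-binomial probability that a group of `size` animals has no shedders."""
    return exp(lgamma(size + b) + lgamma(a + b) - lgamma(a + b + size) - lgamma(b))

def group_level_prevalence(steps=2000):
    """Average P(at least one shedder) over the log-normal group-size distribution."""
    log_size = NormalDist(MU, SIGMA)
    total = 0.0
    for k in range(steps):
        # Mid-point quantile of the log-normal distribution of group size.
        size = exp(log_size.inv_cdf((k + 0.5) / steps))
        total += 1.0 - p_no_shedders(size)
    return total / steps

print(f"{100 * group_level_prevalence():.1f}%")
```

Larger groups are more likely to contain at least one shedder even when the within-group prevalence distribution is the same, which is why the group-size distribution enters the calculation at all.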

Confidence intervals for the prevalences were generated by exploring the nature of the
profile log-likelihood in the vicinity of the maximum, and using the chi-squared
approximation to the log-likelihood ratio to define a 95% confidence region for a, b
and pneg. Because of the strong negative correlation between pneg and a and b, pneg
was set equal to the maximum likelihood estimate. Marginal confidence intervals for
the mean prevalences were then generated from the profile log-likelihood by
identifying the maximum and minimum values of the prevalences on the boundary of
the confidence region specified by the chi-squared approximation to the profile log-
likelihood ratio. Two variables were assumed unfixed, so the confidence interval was
based on two available degrees of freedom. The results are summarised in the
following table:




                                   Point Estimate    95% Confidence Interval
Group-Level Prevalence                  22.8%            (19.6%, 26.3%)
Overall Animal-Level Prevalence          7.9%            (6.5%, 9.6%)
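
The confidence region used for these intervals rests on the chi-squared approximation to the log-likelihood ratio. As a minimal Python sketch (not the original computation), the cut-off for two degrees of freedom can be written in closed form, because the chi-squared distribution with 2 d.f. has cumulative distribution function 1 - exp(-x/2). The function names and the example log-likelihood values are hypothetical.

```python
from math import log

def chi2_2df_critical(level=0.95):
    """Critical value of the chi-squared distribution with 2 d.f."""
    return -2.0 * log(1.0 - level)

def in_confidence_region(loglik, max_loglik, level=0.95):
    """True if a parameter point lies inside the likelihood-ratio region."""
    return 2.0 * (max_loglik - loglik) <= chi2_2df_critical(level)

print(f"95% cut-off on 2 d.f.: {chi2_2df_critical():.3f}")
print(in_confidence_region(-1002.0, -1000.0))   # drop of 2 log-likelihood units
print(in_confidence_region(-1004.0, -1000.0))   # drop of 4 log-likelihood units
```

The interval end points are then the smallest and largest values of the prevalence found among parameter points on the boundary of this region.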

Just under one quarter of the groups of finishing cattle contained at least one shedding
animal. The point-estimate and confidence interval are both slightly higher than the
raw estimates given earlier, since these figures incorporate an adjustment to allow for
farms with low shedding rates being misclassified as negative due to sampling
variability. These figures imply that this misclassification occurred in just over 1% of
farms sampled, and hence, that from the population of positive groups sampled, just
under 5% (4.7%) were misclassified.

The overall proportion of animals estimated to be shedding is 7.9%. This is slightly
lower than the raw estimate given earlier. This adjustment arises from the more
appropriate modelling of the asymmetric prevalence distribution. The confidence
interval, (6.5%, 9.6%), is also slightly wider, for the same reason.

It is interesting to attempt to estimate the proportion of animals shedding in positive
groups. The difficulty with this estimate is that many groups may contain only
a small number of shedders, and it is difficult to distinguish such positive groups
(which should contribute to the estimate) from negative groups (which should not).
Estimates of this proportion are highly sensitive to the estimated value of pneg and
hence it is inappropriate to utilise the profile likelihood approach used to estimate the
earlier confidence intervals. Confidence intervals for the mean prevalences were
generated from the log-likelihood by identifying the maximum values of the
prevalence on the boundary of the confidence region specified by the chi-squared
approximation to the log-likelihood ratio. Three variables were varied, so the upper
limit of the confidence interval was based on three available degrees of freedom. The
lower bounds of the confidence interval for the within-infected groups prevalence
must occur where pneg is negligible, and when this is the case, the likelihood is
degenerate, with only two effective degrees of freedom. Therefore, the lower bound
of the confidence interval was taken to be equal to that calculated for the overall
prevalence of infected animals above, since this corresponded to a case with pneg
small and two degrees of freedom. The results are summarised in the following table:

                                              Point Estimate    95% Confidence Interval
Animal-Level Prevalence in Positive Groups         7.9%            (6.5%, 21.0%)

The mean estimate of the shedding prevalence remains the same, at 7.9%, but the
confidence interval is much wider, reflecting the uncertainty over the status of many
of the farms reported as negative. It is interesting to note that these data are consistent
with, on average, as many as 1 in 5 animals in positive groups shedding.




Analysing binomial data conditional on number of Vtpositives being greater
than zero.

Descriptive variables (Division, Sam_Month, Manage_O)
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Manage_O


* MESSAGE: Term Manage_O cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

 (Manage_O Mixed) = 0

***** Regression Analysis *****

Response variate:   VTPos
 Binomial totals:   N_Sam
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, Manage_O


*** Summary of analysis ***

                                        mean     deviance approx
             d.f.       deviance    deviance        ratio F pr.
Regression      2             0.       0.160         0.02 0.979
Residual      204          1528.       7.489
Total         206          1528.       7.418

Dispersion parameter is estimated to be 7.49 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
          620         5.00       0.048
          637         4.00       0.044
          681         4.00       0.046


*** Estimates of parameters ***

                                                               antilog of
                    estimate        s.e.   t(204)      t pr.     estimate
Constant              -0.701       0.250    -2.81      0.005       0.4958
Manage_O Beef          0.054       0.277     0.20      0.846        1.056
Manage_O Other         0.060       0.324     0.18      0.854        1.061
Manage_O Mixed             0           *        *          *        1.000
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Manage_O Dairy


Manage_O shows no significant effects. By contrast, consider Division.
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Division


***** Regression Analysis *****

Response variate:   VTPos
 Binomial totals:   N_Sam
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, Division


*** Summary of analysis ***

                                        mean    deviance approx
             d.f.       deviance    deviance       ratio F pr.
Regression      5            90.      18.017        2.52 0.031
Residual      201          1438.       7.154
Total         206          1528.       7.418

Dispersion parameter is estimated to be 7.15 from the residual deviance
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           15        21.00       0.092
           51         3.00       0.092
          139         9.00       0.088
          143         1.00       0.105
          566        15.00       0.092
          584        10.00       0.104
          637         4.00       0.101


*** Estimates of parameters ***

                                                               antilog of
                    estimate        s.e.   t(201)      t pr.     estimate
Constant              -0.653       0.202    -3.23      0.001       0.5205
Division Highland      0.725       0.395     1.84      0.068        2.065
Division Islands      -0.326       0.439    -0.74      0.458       0.7218
Division North East    0.096       0.269     0.36      0.722        1.100
Division South East    0.243       0.303     0.80      0.424        1.275
Division South West   -0.531       0.305    -1.74      0.083       0.5881
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Division Central


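The Division factor as a whole is significant (p = 0.031). The prevalence in the
Highlands shows some evidence of being higher than that in Central (p = 0.068), and
that in the South West some evidence of being lower (p = 0.083). The estimate for the
Islands is also lower, but not significantly so (p = 0.458).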




[Figure: Plot of prevalences by animal health division (univariate analysis), with 95% confidence intervals. x-axis: Central, Highlands, Islands, NE, SE, SW; y-axis: prevalence (0 to 0.8).]

The estimated prevalences on positive farms in different divisions are as follows:

Central           34%
Highlands         52%
Islands           27%
NE                36%
SE                40%
SW                23%
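
The percentages above follow from the fitted parameters: each division's linear predictor is the constant (the reference level, Central) plus that division's contrast, back-transformed with the inverse logit. The Python sketch below reproduces the calculation from the printed estimates. It is an illustration rather than the original GenStat step, and the variable names are assumptions.

```python
from math import exp

def expit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + exp(-eta))

constant = -0.653   # Central is the reference level
contrasts = {
    "Central": 0.0,
    "Highlands": 0.725,
    "Islands": -0.326,
    "NE": 0.096,
    "SE": 0.243,
    "SW": -0.531,
}

# Prevalence on positive farms in each division, as a proportion.
prevalence = {name: expit(constant + c) for name, c in contrasts.items()}
for name, p in prevalence.items():
    print(f"{name:10s} {100 * p:.0f}%")
```

The same calculation applies to any other single-factor fit in this section, with the constant taken as the estimate for that factor's reference level.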

Hence there is evidence that prevalence is low in the South West and Islands, moderate
in Central, NE and SE, and high in the Highlands.

Examining Sampling Month,
***** Regression Analysis *****

 Response variate:     VTPos
  Binomial totals:     N_Sam
     Distribution:     Binomial
    Link function:     Logit
     Fitted terms:     Constant, Sam_Mon


*** Summary of analysis ***

                                             mean   deviance approx
                d.f.      deviance       deviance      ratio F pr.
Regression        11          177.         16.104       2.32 0.011
Residual         195         1351.          6.928
Total            206         1528.          7.418

Dispersion parameter is estimated to be 6.93 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
           intermediate responses are more variable than small or large responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
          308        16.00       0.176
          326        18.00       0.164
          333        14.00       0.172


*** Estimates of parameters ***

                                                            antilog of
                  estimate           s.e.    t(195) t pr.     estimate
Constant             0.301          0.460      0.65 0.514        1.351
Sam_Mon Feb         -1.037          0.602     -1.72 0.086       0.3545
Sam_Mon Mar         -0.570          0.525     -1.09 0.279       0.5656
Sam_Mon Apr         -0.878          0.579     -1.52 0.131       0.4155
Sam_Mon May         -0.535          0.517     -1.04 0.301       0.5854
Sam_Mon Jun         -1.458          0.591     -2.47 0.014       0.2327
Sam_Mon Jul         -1.407          0.569     -2.47 0.014       0.2448
Sam_Mon Aug         -1.008          0.556     -1.81 0.071       0.3650
Sam_Mon Sep         -1.695          0.594     -2.85 0.005       0.1836
Sam_Mon Oct         -1.730          0.581     -2.98 0.003       0.1772
Sam_Mon Nov         -0.653          0.540     -1.21 0.228       0.5207
Sam_Mon Dec         -0.542          0.661     -0.82 0.413       0.5816
* MESSAGE: s.e.s are based on the   residual deviance

Parameters for factors are differences compared with the reference level:
              Factor Reference level
             Sam_Mon Jan

RKEEP ; RESIDUALS=Resids; FITTEDVALUES=Fits; ESTIMATES=Para; VCOVAR=Var



Examining the associated confidence intervals:


[Figure: Plot of prevalences by sampling month, with 95% confidence intervals. x-axis: January to December; y-axis: prevalence (0% to 90%).]

The estimated prevalences on positive farms in different sampling months are as
follows:




January         57%
February        32%
March           43%
April           36%
May             44%
June            24%
July            25%
August          33%
September       20%
October         19%
November        41%
December        44%

There are clear differences between months. The period June to October shows lower
prevalences (significantly so, relative to January, in June, July, September and
October), and there is some evidence of a peak in January.
There is, however, little point in exploring these properties further before
investigating the explanatory factors which may influence shedding rates.

The possible explanatory factors were explored in a univariate fashion using a Generalised
Linear Model, and the results are summarised in the following table. The p-values
indicate the likely significance of the fitted values. Variables with p-values of less
than 5% are indicated in red, those in the range 5%-10% in blue. Those variables
which ultimately are found to be of interest in the multivariate analysis are indicated
by bold text.




Factor/Variable  p-value   Comments
Manage_C            0.88   'Beef' and 'Others' higher than 'Dairy'
Manage_O            0.98   'Beef' and 'Others' higher than 'Dairy'
Division            0.03   'Highland' higher than others.
Sam_Month           0.01   Lower in summer months
Sample                     No variability in explanatory variable
Sam_Year            0.50   No obvious pattern
Season             0.006   Summer and Autumn lower than Winter and Spring
SeasList            0.04   Both Summer and Autumn lower than Winter and Spring
Sampler             0.85   'Fiona' is higher than 'Helen'
N_F_Cattle         0.177   Higher numbers of finishing cattle associated with lower
                           prevalence, probably better analysed as a factor, below
FCattle            0.301   No consistent pattern
N_Groups            0.35   Probably better analysed as a factor, below: More groups
                           associated with lower prevalence.
GroupsCat           0.93   No consistent pattern
N_Sam_Gr            0.22   More animals in sampling group associated with lower
                           prevalences
Min_Age             0.44   Higher minimum age associated with lower prevalence
Max_Age             0.25   Higher maximum age associated with lower prevalence
Source              0.17   'Buy in' and 'Both' lower than 'Breeding only'
NewSource           0.19   'Open' lower than 'Closed'
Breed               0.54   'DairyBeef' less than 'Beef', but not significant
Housed            <0.001   Housed animals have much higher prevalences
Housing           <0.001   Housing confounded with Housed. Otherwise nothing.
NoChange            0.59   '1' higher than '0' (not sure of interpretation)
TDHouse             0.45   Longer time associated with higher prevalences
Rec_Move           0.002   A recent move is associated with lower prevalences
RecMove2            0.33   Most recent move class 1 (<1 week) is lower than classes
                           2 and 3 (>1 week)
SupFeed           <0.001   SupFeed confounded with Housed. Otherwise nothing.
RecDFeed           0.007   Recent change in feed associated with lower prevalence
Forage             0.007   Forage confounded with Housed.
Silage             0.007   Silage confounded with Housed. Otherwise nothing.
Concentrate        0.013   Concentrate confounded with Housed.
Sil_Home           0.029   'Yes' is lower than 'No'. Silage_Home confounded with
                           Housed.
Sil_Manure          0.19   'Yes' is lower than 'No'. Silage_Manure confounded with
                           Housed.
Sil_Slurry         0.108   'Yes' is lower than 'No'. Silage_Slurry confounded with
                           Housed.
Sil_Sewage          0.44   'Yes' is lower than 'No'. Silage_Sewage confounded with
                           Housed.
Sil_Geece           0.40   'Yes' is higher than 'No'. Silage_Geece confounded with
                           Housed.
Sil_Gulls           0.37   'Yes' is higher than 'No'. Silage_Gulls confounded with
                           Housed.
Hay                 0.79   'Yes' is lower than 'No'
Hay_Manure          0.58   'Yes' is lower than 'No'
Hay_Slurry          0.69   'Yes' is lower than 'No'
Hay_Sewage                 No data points in class with Sewage on hay fields.
Hay_Geese                  No data points in class with Geese on hay fields.
Hay_Gulls           0.45   Gulls present associated with lower prevalence
Grass_Manure      <0.001   Grass_Manure confounded with Housed. Otherwise 'Yes'
                           is lower than 'No', but not significant.
Grass_Slurry      <0.001   Grass_Slurry confounded with Housed. Otherwise 'Yes'
                           is lower than 'No', but not significant.
Grass_Sewage      <0.001   Grass_Sewage confounded with Housed. Otherwise
                           nothing.
Grass_Geece       <0.001   Grass_Geece confounded with Housed. Otherwise 'Yes'
                           is lower than 'No'
Grass_Gulls       <0.001   Grass_Gulls confounded with Housed. Otherwise 'Yes' is
                           lower than 'No'
N_Cattle            0.15   More cattle associated with lower prevalence
Cattle              0.55   No clear pattern.
N_Sheep             0.37   Large numbers of sheep are protective, but better analysed
                           using a factor, below.
Sheep               0.67   (Sheep absent or present) 'With' is lower than 'Without'
N_Goats             0.21   More goats associated with higher prevalence
Goats               0.46   (Goats absent or present) 'With' is higher than 'Without'
N_Horses            0.84   More horses associated with lower prevalence
N_Pigs             0.037   More pigs associated with lower prevalence
Pigs                0.62   (Pigs absent or present) 'With' is lower than 'Without'
N_Chickens          0.33   More chickens associated with higher prevalence
Chickens               1   (Chickens absent or present) 'With' is virtually identical to
                           'Without'
N_Deer             0.026   More deer associated with higher prevalence
Deer               0.026   (Deer absent or present) 'With' is higher than 'Without'
Water              0.014   Natural prevalences significantly lower than those for
                           Mains
                                 Mains prevalences slightly higher than those farms with
Mains                       0.83 other sources.
                                 Farms with natural water sources have lower prevalences
Natural                    0.002 than those with other sources.
                                 Farms with private water sources have lower prevalences
Private                     0.08 than those with other sources; confounded with housed.
WaterCon                    0.76 With' is higher than 'Without'
                                 All but 'None', 'Animal' and ASM thrown out for lack of
WaterCT                     0.52 information: 'ASM' lower than 'Animal'
                                 Those that wanted to know had higher prevalences than
Want2Know                   0.75 those who did not
                                 Those willing to have a 2nd visit had a lower prevalence
Visit2                      0.11 than those who were not
LabOperator                 0.55 S' generated lower prevalences than 'D' and ‘H’
BeefonDairy                 0.34 This class of farm exhibits a higher prevalence

The key explanatory factor appears to be Housed, reporting whether the animals were
housed or not. Many of the other factors which appear significant are actually
confounded with Housed, and reflect this variable. It may be appropriate to report the
full results for the Housed analysis:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Housed


***** Regression Analysis *****

Response variate:   VTPos
 Binomial totals:   N_Sam
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, Housed


*** Summary of analysis ***

                                           mean   deviance approx
             d.f.      deviance        deviance      ratio F pr.
Regression      1          161.         160.526      24.06 <.001
Residual      205         1367.           6.671
Total         206         1528.           7.418

Dispersion parameter is estimated to be 6.67 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                             antilog of
                  estimate         s.e.    t(205) t pr.        estimate
Constant            -1.241        0.161     -7.73 <.001          0.2891
Housed 1             0.938        0.197      4.77 <.001           2.555
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Housed 0


Unhoused       22%
Housed         42%


[Figure: bar chart comparing prevalence in Unhoused and Housed groups; vertical axis from 0% to 60%.]



Housed animals exhibit much higher prevalences than unhoused animals.
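
The percentages quoted for unhoused and housed groups can be recovered from the parameter estimates by inverting the logit link. The Python sketch below is an illustration only and is not part of the original GenStat analysis; the function name inv_logit is an assumption introduced here, and the two estimates are copied from the output above.

```python
import math

def inv_logit(eta: float) -> float:
    """Map a linear predictor on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Estimates copied from the GenStat output above.
constant = -1.241      # logit of prevalence at the reference level (Housed 0)
housed_effect = 0.938  # change in the logit for Housed 1

# The "antilog of estimate" column is exp(estimate): odds and odds ratio.
odds_unhoused = math.exp(constant)
odds_ratio = math.exp(housed_effect)

p_unhoused = inv_logit(constant)
p_housed = inv_logit(constant + housed_effect)

print(f"odds (unhoused) = {odds_unhoused:.4f}, odds ratio = {odds_ratio:.3f}")
print(f"unhoused = {p_unhoused:.0%}, housed = {p_housed:.0%}")
```

The antilog column is therefore an odds (for the constant) and an odds ratio (for Housed 1), not a prevalence; the prevalences come from the inverse logit of the summed estimates.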

The effect of housing is so strong, and so fundamental, that it would seem wise to
review all the other factors in terms of their interaction with Housed.



Factor/Variable p-value         Comments
Manage_C                  0.153 'Beef' higher and 'Others' lower than 'Dairy'
Manage_O                   0.33 'Beef' higher and 'Others' lower than 'Dairy'
Division                  0.007 'Highland' higher than others, SW may be low. No
                                interaction with Housed.
Sam_Month                  0.31 No interaction, monthly variability explained by
                                differential housing in different months.
Sample                          No variability in explanatory variable
Sam_Year                   0.23 No obvious pattern
Season                     0.32 No obvious pattern: seasonal variability explained by
                                differential housing.
SeasList                   0.40 No obvious pattern: seasonal variability explained by
                                differential housing.
Sampler                    0.42 'Fiona' has a different effect to 'Helen' in housed and
                                unhoused farms. No obvious effect.
N_F_Cattle                0.009 Higher numbers of finishing cattle associated with lower
                                prevalence, probably better analysed as a factor, below.
                                No interaction with Housed.
FCattle                   0.032 The larger the group of cattle, the lower the
                                prevalence. No interaction with Housed.
N_Groups                  0.016 Probably better analysed as a factor, below: More groups
                                associated with lower prevalence.
GroupsCat                  0.41 No consistent pattern
N_Sam_Gr                   0.20 More housed animals in sampling groups associated with
                                lower prevalences, more unhoused associated with higher
                                prevalences.
Min_Age                    0.31 Higher minimum age associated with lower prevalence in
                                unhoused farms, opposite on housed.
Max_Age                    0.40 Higher maximum age associated with lower prevalence in
                                unhoused farms, opposite on housed.
Source                     0.09 'Buy in' does different things in housed and unhoused
                                farms. In unhoused, gives lower prevalences, in housed,
                                gives higher.
NewSource                  0.08 'Open' lower than 'Closed' in unhoused groups, vice versa
                                in housed.
Breed                      0.67 No consistent pattern.
Housing                    0.73 Housing was confounded with Housed. Deal with this, and
                                there is nothing left. 'Slats' and 'Other' are higher than
                                'Court' but nothing significant.
NoChange                   0.60 '1' higher than '0' (not sure of interpretation)
TDHouse                    0.36 Longer time associated with higher prevalences
Rec_Move                  0.004 Housed animals which have recently moved show
                                significantly lower shedding levels.
RecMove2                   0.16 In unhoused groups, most recent move class 1 (<1 week)
                                is lower than classes 2 and 3 (>1 week)
SupFeed                    0.49 SupFeed was confounded with Housed. Having removed
                                this, animals with supplementary feed have lower
                                prevalences than those without.
RecDFeed                  0.024 Housed animals which have had a recent change in
                                feed show significantly lower shedding levels.
Forage                     0.55 Forage was confounded with Housed. Now no consistent
                                pattern.
Silage                     0.51 Silage was confounded with Housed. Now no consistent
                                pattern.
Concentrate                0.67 Concentrate was confounded with Housed. Now no
                                consistent pattern.
Sil_Home                   0.04 'Yes' is lower than 'No'. 'Null response' lower than
                                'No'. No interaction with Housed.
Sil_Manure                0.047 'Yes' is lower than 'No'. 'Null response' higher than 'No'.
                                No interaction with Housed.
Sil_Slurry                0.027 'Yes' is lower than 'No'. 'Null response' higher than 'No'.
                                No interaction with Housed.
Sil_Sewage                 0.23 'Yes' is lower than 'No'. 'Null response' higher than 'No'
Sil_Geece                  0.34 No consistent pattern.
Sil_Gulls                  0.19 No consistent pattern.
Hay                        0.56 'Yes' is higher than 'No' in unhoused, vice versa in
                                housed.
Hay_Manure                 0.52 'Yes' is lower than 'No' in unhoused animals.
Hay_Slurry                 0.60 'Yes' is lower than 'No' in unhoused animals.
Hay_Sewage                      No data points in class with Sewage on hay fields.
Hay_Geese                       No data points in class with Geese on hay fields.
Hay_Gulls                  0.42 Gulls present associated with lower prevalence in
                                unhoused animals.
Grass_Manure               0.59 Grass_Manure confounded with Housed. Otherwise 'Yes'
                                is lower than 'No', but not significant.
Grass_Slurry               0.39 Grass_Slurry confounded with Housed. Otherwise 'Yes'
                                is lower than 'No', but not significant.
Grass_Sewage                    Grass_Sewage completely aliased with Housed.
Grass_Geese                0.49 Grass_Geese confounded with Housed. Otherwise 'Yes'
                                is lower than 'No'
Grass_Gulls                0.99 Grass_Gulls confounded with Housed. Otherwise 'Yes' is
                                lower than 'No'
N_Cattle                  0.012 More cattle associated with lower prevalence in housed
                                groups.
Cattle                     0.18 No clear pattern: some evidence of lower prevalences in
                                larger housed groups.
N_Sheep                    0.10 Large numbers of sheep are protective, but better analysed
                                using a factor, below. No interaction with Housed.
Sheep                      0.10 (Sheep absent or present) 'With' is lower than 'Without'
N_Goats                    0.49 Different effects in housed and unhoused.
Goats                      0.58 Different effects in housed and unhoused.
N_Horses                  0.995 More horses associated with lower prevalence in
                                unhoused groups.
N_Pigs                    0.034 More pigs associated with lower prevalence. No
                                interaction with Housed.
Pigs                       0.38 (Pigs absent or present) 'With' is lower than 'Without' in
                                unhoused groups, vice versa for housed.
N_Chickens                 0.18 More chickens associated with higher prevalence in
                                unhoused groups, vice versa in housed.
Chickens                   0.90 (Chickens absent or present) 'With' is higher than
                                'Without' in unhoused farms, vice versa for housed.
N_Deer                    0.036 More deer associated with higher prevalence. Potentially
                                highly affected by one point's leverage.
Deer                      0.036 (Deer absent or present) 'With' is higher than 'Without'.
                                Potentially highly affected by one point's leverage.
Water                      0.28 Effects explained by Housed variable. Mains water
                                associated with housed.
Mains                      0.79 Unhoused animals with mains water had higher
                                prevalences, housed animals had lower.
Natural                    0.06 Unhoused animals with natural water had lower
                                prevalences.
Private                    0.27 Unhoused animals with private water had higher
                                prevalences, housed animals had lower.
WaterCon                   0.24 'With' is higher than 'Without'
WaterCT                    1.00 No clear pattern
Want2Know                  0.39 Those that wanted to know had higher prevalences than
                                those who did not
Visit2                     0.19 Those willing to have a 2nd visit had a lower prevalence
                                than those who were not
LabOperator                0.45 'H' and 'S' generated lower prevalences than 'D' for
                                unhoused farms, higher for housed.
BeefonDairy                0.59 This class of farm exhibits a higher prevalence in housed
                                groups, lower in unhoused.

The Deer variables are driven by a single farm with a high prevalence, which was the
only farm in the study with a high number of deer, and indeed was one of only two
farms with any deer at all. This record therefore has enormous leverage, and the
resulting model is of dubious use. These variables should therefore be ignored.
The variables which are of interest are therefore Housed,
N_F_Cattle/FCattle/N_Groups/N_Cattle, Source, Housed*Rec_Move/RecDFeed,
Sil_Home/Sil_Manure/Sil_Slurry and N_Pigs. Note that the variables have been
grouped, where appropriate, into equivalence classes of what are likely to be highly
correlated factors.

Exploring the N_F_Cattle/FCattle/N_Groups/N_Cattle complex, in which every variable
associates lower prevalences with larger numbers of cattle, using forward stepwise
selection with the Akaike information criterion to select candidates for
inclusion/exclusion, we find that FCattle is the most informative measure, with
N_Groups the second most informative, but lacking statistical significance.
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
RSEARCH [PRINT=model,results; METHOD=fstepwise; CONSTANT=estimate; FACTORIAL=3; DENOMINATOR=ss;\
  INRATIO=1; OUTRATIO=1; MAXCYCLE=50; AFACTORIAL=2; CRITERION=aic; EXTRA=cp; NTERMS=60;\
  NBESTMODELS=8; FORCED=Housed] N_F_Catt+FCattle+N_Groups+N_Cattle

***** Model Selection *****

 Response variate:   VTPos
  Binomial totals:   N_Sam
     Distribution:   Binomial
    Link function:   Logit
  Number of units:   207
     Forced terms:   Constant + Housed
        Forced df:   2
       Free terms:   N_F_Catt + FCattle + N_Groups + N_Cattle


*** Stepwise (forward) analysis of deviance ***

Change                                              mean   deviance  approx
                        d.f.     deviance       deviance      ratio  F pr.
+ Housed                   1      160.526        160.526      24.82  <.001
+ FCattle                  3       58.365         19.455       3.01  0.031
+ N_Groups                 1        9.011          9.011       1.39  0.239
Residual                 201     1300.105          6.468

Total                    206     1528.006          7.418

        Final model: Constant + Housed + FCattle + N_Groups
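
The deviance ratios in the stepwise table follow from dividing each term's mean deviance by the residual mean deviance, which is the dispersion estimate when DISPERSION=* is set. The Python sketch below reproduces that arithmetic from the tabulated figures; it is an illustration added here, not part of the GenStat run, and the variable names are assumptions.

```python
# Deviances and degrees of freedom copied from the stepwise table above.
residual_deviance, residual_df = 1300.105, 201
terms = [            # (term, d.f., change in deviance)
    ("Housed", 1, 160.526),
    ("FCattle", 3, 58.365),
    ("N_Groups", 1, 9.011),
]

# The dispersion is estimated by the residual mean deviance of the final model.
dispersion = residual_deviance / residual_df

f_ratios = {}
for name, df, deviance in terms:
    mean_deviance = deviance / df
    f_ratios[name] = mean_deviance / dispersion
    print(f"+ {name:<9} mean deviance {mean_deviance:8.3f}  ratio {f_ratios[name]:6.2f}")
```

Because the denominator is the residual mean deviance of the final model, the ratio for Housed here (24.82) differs slightly from the 24.06 reported when Housed was fitted alone.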

Exploring the Housed*Rec_Move/RecDFeed complex, we see that Housed*Rec_Move is
the more informative variable.
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
RSEARCH [PRINT=model,results; METHOD=fstepwise; CONSTANT=estimate; FACTORIAL=3; DENOMINATOR=ss;\
  INRATIO=1; OUTRATIO=1; MAXCYCLE=50; AFACTORIAL=2; CRITERION=aic; EXTRA=cp; NTERMS=60;\
  NBESTMODELS=8; FORCED=Housed] Housed.(Rec_Move+RecDFeed)

***** Model Selection *****

Response variate:   VTPos
 Binomial totals:   N_Sam
    Distribution:   Binomial
   Link function:   Logit
 Number of units:   207
    Forced terms:   Constant + Housed
       Forced df:   2
      Free terms:   Housed.Rec_Move + Housed.RecDFeed


*** Stepwise (forward) analysis of deviance ***

Change                                                        mean   deviance approx
                                 d.f.        deviance     deviance      ratio F pr.
+ Housed                            1         160.526      160.526      25.16 <.001
+ Housed.Rec_Move                   2          72.370       36.185       5.67 0.004
Residual                          203        1295.110        6.380

Total                            206         1528.006        7.418

        Final model: Constant + Housed + Housed.Rec_Move

The non-inclusion of RecDFeed can be explained by a confounding between this
factor and Rec_Move. Considering the farms with shedding present, these divide into
4 categories depending on the status of the two factors:

Number of observations                            RecDFeed
                                                  0                     1
Rec_Move                0                         137                   14
                        1                         20                    36

Mean shedding fraction                            RecDFeed
                                                  0                     1
Rec_Move                0                         0.41                  0.29
                        1                         0.26                  0.24

However, the behaviour is heavily dependent on the housing status of the farm.
Tabulating the number of observations, the mean shedding fraction and the standard
error of these statistics gives the following:

Number of observations
Housed=0        RecDFeed                 Housed=1        RecDFeed
                0         1                              0         1
Rec_Move 0      38        6              Rec_Move 0      99        8
         1      15        22                      1      5         14

Mean shedding fraction
Housed=0        RecDFeed                 Housed=1        RecDFeed
                0         1                              0         1
Rec_Move 0      0.22      0.14           Rec_Move 0      0.48      0.40
         1      0.26      0.26                    1      0.26      0.20

Standard error of the mean shedding fraction
Housed=0        RecDFeed                 Housed=1        RecDFeed
                0         1                              0         1
Rec_Move 0      0.032     0.049          Rec_Move 0      0.032     0.127
         1      0.072     0.051                   1      0.093     0.041
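
As a consistency check, the pooled two-way table given earlier can be rebuilt from the housing-specific tables by summing the counts and taking count-weighted means. The Python sketch below is an illustration added here (not part of the original analysis); the dictionary layout is an assumption, and the inputs are the rounded values printed above, so agreement is only expected to two decimal places.

```python
# (count, mean shedding fraction) per (Housed, Rec_Move, RecDFeed) cell.
cells = {
    (0, 0, 0): (38, 0.22), (0, 0, 1): (6, 0.14),
    (0, 1, 0): (15, 0.26), (0, 1, 1): (22, 0.26),
    (1, 0, 0): (99, 0.48), (1, 0, 1): (8, 0.40),
    (1, 1, 0): (5, 0.26),  (1, 1, 1): (14, 0.20),
}

pooled = {}
for rec_move in (0, 1):
    for rec_dfeed in (0, 1):
        strata = [cells[(housed, rec_move, rec_dfeed)] for housed in (0, 1)]
        n = sum(count for count, _ in strata)
        weighted_mean = sum(count * mean for count, mean in strata) / n
        housed_share = strata[1][0] / n  # fraction of the pooled cell that is housed
        pooled[(rec_move, rec_dfeed)] = (n, weighted_mean)
        print(f"Rec_Move={rec_move} RecDFeed={rec_dfeed}: n={n:3d} "
              f"mean={weighted_mean:.2f} housed share={housed_share:.2f}")
```

The housed share differs markedly between cells, which is one way of seeing why the pooled means cannot be read without reference to housing status.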

The impression which might be given by a simple examination of the means is that the
higher prevalences are restricted to housed animals which have not been subject to a
recent move. However, care should be taken, given the extremely small number of
housed groups (8) which have been subjected to a change in diet without a recent
move. The difference between the mean of this group and the means in the low
prevalence groups is unlikely to be statistically significant.
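
A rough check of this statement can be made from the tabulated means and standard errors. The Python sketch below uses a simple two-sided z-test for the difference between two independent means; this is a crude normal approximation chosen here for illustration (the groups are very small and the original report does not state which test, if any, was used), and the function name z_test is an assumption.

```python
import math

def z_test(mean_a, se_a, mean_b, se_b):
    """Approximate two-sided z-test for a difference between two independent means."""
    z = (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p

# Housed groups only: (mean shedding fraction, standard error) from the tables above.
diet_only = (0.40, 0.127)      # Rec_Move=0, RecDFeed=1 (8 groups)
move_only = (0.26, 0.093)      # Rec_Move=1, RecDFeed=0 (5 groups)
move_and_diet = (0.20, 0.041)  # Rec_Move=1, RecDFeed=1 (14 groups)

for label, other in (("move only", move_only), ("move and diet", move_and_diet)):
    z, p = z_test(*diet_only, *other)
    print(f"diet only vs {label}: z = {z:.2f}, p = {p:.2f}")
```

Under this approximation neither difference reaches the conventional 5% level, which is in line with the caution expressed above.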

Clearly a positive entry for either RecDFeed or Rec_Move is associated with a lower
shedding rate, although there is no sign of an interaction: the data set defining the
most interesting aspects of the relationship is extremely sparse. For ease of analysis
we therefore define a new variable RecChnge, which defines whether either change
has taken place. The resulting interaction with Housed is highly significant
(p=0.009). The effect of this factor could be centred on the effect of a change of
location or of a change of diet: the dataset does not allow any further detail to be
established.
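
The construction of RecChnge can be sketched as follows. This Python fragment is an illustration only: the 0/1 coding of the new factor and the function name rec_chnge are assumptions, and the counts are those tabulated above for farms with shedding present.

```python
def rec_chnge(rec_move: int, rec_dfeed: int) -> int:
    """1 if either a recent move or a recent change of feed was recorded, else 0."""
    return int(rec_move == 1 or rec_dfeed == 1)

# Farm counts per (Housed, Rec_Move, RecDFeed) cell, from the tables above.
counts = {
    (0, 0, 0): 38, (0, 0, 1): 6, (0, 1, 0): 15, (0, 1, 1): 22,
    (1, 0, 0): 99, (1, 0, 1): 8, (1, 1, 0): 5,  (1, 1, 1): 14,
}

by_housing = {(housed, changed): 0 for housed in (0, 1) for changed in (0, 1)}
for (housed, rec_move, rec_dfeed), n in counts.items():
    by_housing[(housed, rec_chnge(rec_move, rec_dfeed))] += n

for (housed, changed), n in sorted(by_housing.items()):
    print(f"Housed={housed} RecChnge={changed}: {n} farms")
```

Collapsing the two factors in this way replaces three sparse cells per housing class with a single larger one, at the cost of no longer separating a move from a change of diet.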

Analysing the complex of significant silage related factors is complicated by the
questionnaire structure. Many of the questions were only asked if the responses to a
previous question took particular values. Hence, simple-minded fitting of multivariate
models will fail due to multiple aliasing of terms in the model. The data
structure can be summarised as follows:




Stratum    Responses    Comments
Housed     0, 1         Housed or unhoused
Silage     0, 1, 999    0=no silage fed; 1=silage fed; 999=question not asked
Sil_Home   0, 1, 999    0=silage fed and not produced; 1=silage fed and produced
                        on-farm; 999=no silage fed or question not asked
Others     0, 1, 999    0=silage produced, factor not present; 1=silage produced,
                        factor present; 999=no silage produced on farm or question
                        not asked

Nesting of responses (Few/Many is the annotation attached to each Silage response):

Housed   Silage        Sil_Home   Others
0        0   (Few)     999        999
0        1   (Few)     0          999
0        1   (Few)     1          0 or 1
0        999 (Many)    999        999
1        0   (Many)    999        999
1        1   (Many)    0          999
1        1   (Many)    1          0 or 1
1        999 (Few)     999        999

Aliasing will obviously be a problem, and it should be noted that non-trivial responses
to the later questions are more heavily drawn from the housed population. This may
affect the analysis. Housed has previously been shown to be a highly significant
variable. Silage is not significant, either as a main effect or in interaction with
Housed. Fitting Sil_Home in interaction with Housed gives the following results:

* MESSAGE: Term Housed.Sil_Home cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

  (Housed 1 .Sil_Home 999) = - 1.000 + (Housed 1) +
    (Sil_Home 1) + (Sil_Home 999) - (Housed 1 .Sil_Home 1)

***** Regression Analysis *****

 Response variate:       VTPos
  Binomial totals:       N_Sam
     Distribution:       Binomial
    Link function:       Logit
     Fitted terms:       Constant + Housed + Sil_Home + Housed.Sil_Home


*** Summary of analysis ***

                                                 mean       deviance approx
                  d.f.      deviance         deviance          ratio F pr.
Regression           4          205.           51.186           7.81 <.001
Residual           202         1323.            6.551
Total              206         1528.            7.418




Dispersion parameter is estimated to be 6.55 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
           intermediate responses are more variable than small or large
responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           28        10.00       0.161
          202         1.00       0.121
          277         1.00       0.177
          326        18.00       0.520
          504         1.00       0.097
          703         1.00       0.209
          846         1.00       0.113
          877        15.00       0.473
          885         1.00       0.113


*** Estimates of parameters ***

                                                                           antilog of
                              estimate         s.e.      t(202)    t pr.     estimate
Constant                         0.182        0.996        0.18    0.855        1.200
Housed 1                         1.117        0.269        4.16    <.001        3.056
Sil_Home 1                       -2.08         1.21       -1.72    0.086       0.1246
Sil_Home 999                    -1.375        0.983       -1.40    0.163       0.2528
Housed 1 .Sil_Home 1             0.347        0.746        0.47    0.642        1.416
Housed 1 .Sil_Home 999               0            *           *        *        1.000
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Housed 0
            Sil_Home 0
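
The aliasing message at the head of this output can be checked by hand. The Python sketch below (an illustration added here, with hypothetical function names) evaluates the reported identity in every combination of Housed and Sil_Home, assuming the usual 0/1 indicator coding with reference levels Housed 0 and Sil_Home 0 as listed in the output.

```python
def dummies(housed: int, sil_home: int) -> dict:
    """Indicator columns for the fitted terms (reference levels Housed 0, Sil_Home 0)."""
    h1 = int(housed == 1)
    s1 = int(sil_home == 1)
    s999 = int(sil_home == 999)
    return {"H1": h1, "S1": s1, "S999": s999, "H1.S1": h1 * s1, "H1.S999": h1 * s999}

def identity_holds(housed: int, sil_home: int) -> bool:
    """Check (Housed 1.Sil_Home 999) = -1 + (Housed 1) + (Sil_Home 1)
    + (Sil_Home 999) - (Housed 1.Sil_Home 1) for one cell."""
    d = dummies(housed, sil_home)
    rhs = -1 + d["H1"] + d["S1"] + d["S999"] - d["H1.S1"]
    return d["H1.S999"] == rhs

for housed in (0, 1):
    for sil_home in (0, 1, 999):
        print(f"Housed={housed} Sil_Home={sil_home}: {identity_holds(housed, sil_home)}")
```

The identity holds in every cell except Housed 0 with Sil_Home 0, so the message would be consistent with that cell containing no farms in the fitted data.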

Trivial answers from housed animals (Housed 1 .Sil_Home 999) are not fitted in the
model because they are aliased with previously fitted terms. However, we are not
interested in this group. Dropping the interaction term does not significantly
increase the deviance (p=0.63), whereas dropping the main Sil_Home effect does
(p=0.04). We therefore consider the model containing both the Housed and Sil_Home
main effects:
***** Regression Analysis *****

Response variate:   VTPos
 Binomial totals:   N_Sam
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant + Housed + Sil_Home


*** Summary of analysis ***

                                          mean   deviance approx
             d.f.      deviance       deviance      ratio F pr.
Regression      3          203.         67.749      10.38 <.001
Residual      203         1325.          6.526
Total         206         1528.          7.418

Dispersion parameter is estimated to be 6.53 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
           intermediate responses are more variable than small or large
responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
          326        18.00       0.520
          877        15.00       0.473


*** Estimates of parameters ***

                                                                           antilog of
                                 estimate         s.e.   t(203)    t pr.     estimate
Constant                            0.133        0.989     0.13    0.893        1.143
Housed 1                            1.166        0.247     4.72    <.001        3.208
Sil_Home 1                         -1.747        0.967    -1.81    0.072       0.1742
Sil_Home 999                       -1.345        0.979    -1.37    0.171       0.2606
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Housed 0
            Sil_Home 0
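
The two p-values quoted earlier for dropping terms (0.63 and 0.04) can be approximately recovered from the regression lines of the three fitted models. The Python sketch below is an illustration added here and makes several assumptions: the regression deviance is rebuilt as d.f. times mean deviance, each change is scaled by the dispersion of the larger model, the 1 d.f. test uses a large-sample normal approximation, and the helper names are hypothetical.

```python
import math

# Regression lines from the three fits reported above: (d.f., mean deviance).
housed_only = (1, 160.526)
main_effects = (3, 67.749)       # Housed + Sil_Home
with_interaction = (4, 51.186)   # Housed + Sil_Home + Housed.Sil_Home

def regression_deviance(fit):
    df, mean_deviance = fit
    return df * mean_deviance

def f_change(bigger, smaller, dispersion):
    """F ratio for the terms that distinguish two nested models."""
    df = bigger[0] - smaller[0]
    change = regression_deviance(bigger) - regression_deviance(smaller)
    return df, (change / df) / dispersion

# Dropping Housed.Sil_Home (1 d.f.); dispersion 6.551 on 202 residual d.f.
df_int, f_int = f_change(with_interaction, main_effects, 6.551)
# Normal approximation to the F(1, 202) tail: P(|Z| > sqrt(F)).
p_int = math.erfc(math.sqrt(f_int / 2.0))

# Dropping Sil_Home (2 d.f.); dispersion 6.526 on 203 residual d.f.
df_main, f_main = f_change(main_effects, housed_only, 6.526)
# Exact tail probability for F(2, d): (1 + 2F/d) ** (-d/2).
p_main = (1.0 + 2.0 * f_main / 203) ** (-203 / 2)

print(f"interaction: F({df_int},202) = {f_int:.2f}, p = {p_int:.2f}")
print(f"Sil_Home:    F({df_main},203) = {f_main:.2f}, p = {p_main:.2f}")
```

Both results agree with the quoted values to two decimal places, although the exact procedure used in the original analysis is not stated.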

Clearly, housed animals still present a higher prevalence, but this model indicates that
animals in the level 1 class of the Sil_Home factor have lower prevalences than those
in the level 0 class. The level 999 class is not significantly different to either of the
other two classes, but this is not surprising, given the heterogeneous nature of this
level: it mostly reflects unhoused farms, where the silage question was not asked.
Hence, among housed animals where the farm produces silage, the mean prevalence
appears to be lower. There are, of course, further factors nested within the silage
production factor. The GLM is not a good choice for the analysis of such unbalanced
data, and it is also possible to define a more informative data structure.

The silage feeding factor is not nested within the housing factor, but it should have
been: only a few farms with unhoused animals have records relating to silage
production, even if they did produce silage. Such small numbers of values, generated
randomly by accident (biased towards early samples collected by a relatively
inexperienced operator) are worthless. Hence a new factor is defined: Silage2,
defining farms with housed animals which do feed them silage. We continue this
process, defining new dummy variables: SHome2, defining farms with housed
animals, feeding silage, which do produce silage; SMan2, defining farms with housed
animals, feeding and producing silage, which spread manure on the silage fields;
SSlu2, SSew2, SGeec2 and SGull2 are defined in a similar fashion. These variables
will be fitted along with Housed in a GLMM to explore the inter-relations between
the different factors.
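
The nested dummy variables can be sketched as follows. This Python fragment is an illustration only: it assumes the original factors are coded 0, 1 and 999 as in the data-structure summary, shows only the first three variables, and the function name and example records are hypothetical.

```python
def nested_dummies(housed: int, silage: int, sil_home: int, sil_manure: int) -> dict:
    """Build nested 0/1 indicators; each is 1 only if every level above it is 1."""
    silage2 = int(housed == 1 and silage == 1)        # housed and fed silage
    shome2 = int(silage2 == 1 and sil_home == 1)      # ... and silage produced on-farm
    sman2 = int(shome2 == 1 and sil_manure == 1)      # ... and manure spread on silage fields
    return {"Silage2": silage2, "SHome2": shome2, "SMan2": sman2}

# Hypothetical records: (Housed, Silage, Sil_Home, Sil_Manure); 999 = not asked.
examples = [
    (0, 999, 999, 999),  # unhoused, silage questions not asked
    (1, 0, 999, 999),    # housed, no silage fed
    (1, 1, 0, 999),      # housed, silage fed but not produced on-farm
    (1, 1, 1, 1),        # housed, home-produced silage, manure spread
]
for record in examples:
    print(record, nested_dummies(*record))
```

Coding every non-applicable response as 0 removes the separate 999 level, which is what allows the variables to be fitted together without the aliasing seen in the GLM.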

Fitting the Housed, Silage Feeding and Silage Production factors gives the following
output:
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
  LINK=logit; DISPERSION=1; FIXED=Housed+Silage2+SHome2; RANDOM=Farm; CONSTANT=estimate;\
  FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean] VTPos; NBINOMIAL=N_Sam

***** Generalised Linear Mixed Model Analysis *****

               Method:    cf Schall (1991) Biometrika
     Response variate:    VTPos
         Distribution:    BINOMIAL
        Link function:    LOGIT

          Random model:   Farm
           Fixed model:   Constant + (Housed + Silage2) + SHome2

* Dispersion parameter fixed at value 1.000


*** Monitoring information ***

 Iteration     Gammas Dispersion     Max change
         1      1.347      1.000     2.0404E+00
         2      1.734      1.000     3.8636E-01
         3      1.903      1.000     1.6972E-01
         4      1.927      1.000     2.3823E-02
         5      1.929      1.000     1.6145E-03
         6      1.929      1.000     2.0148E-04
         7      1.929      1.000     2.4608E-05


*** Estimated Variance Components ***

Random term                       Component            S.e.

Farm                                  1.929          0.235


*** Residual variance model ***

Term                 Factor           Model(order)          Parameter   Estimate   S.e.

Dispersn                               Identity             Sigma2         1.000   FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm        1       0.05510
 Dispersn        2       0.00000          0.00000

                                  1              2



*** Table of effects for Constant ***

               -1.471     Standard error:      0.1727



*** Table of effects for Housed ***

 Housed        0.0000    1.0000
               0.0000    1.4064

Standard error of differences:                0.3088



*** Table of effects for Silage2 ***

Silage2        0.0000    1.0000
               0.0000    1.3649

Standard error of differences:                1.083



*** Table of effects for SHome2 ***

       SHome2           0.0000          1.0000
                        0.0000         -1.7519

Standard error of differences:                 1.065




*** Tables of means ***


*** Table of predicted means for Housed ***

       Housed            0.0000         1.0000
                        -1.6650        -0.2586


*** Table of predicted means for Silage2 ***

       Silage2           0.0000         1.0000
                        -1.6442        -0.2794


*** Table of predicted means for SHome2 ***

       SHome2       0.0000           1.0000
                   -0.0859          -1.8377


*** Back-transformed Means (on the original scale) ***



       Housed
       0.0000          0.1591
       1.0000          0.4357


       Silage2
        0.0000         0.1619
        1.0000         0.4306


       SHome2
       0.0000          0.4785
       1.0000          0.1373

Note: means are probabilities not expected values.


VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


   Fixed term                   Wald statistic        d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

   Housed                            27.72              1       27.72      <0.001
   Silage2                            1.31              1        1.31       0.253
   SHome2                             2.71              1        2.71       0.100

* Dropping individual terms from full fixed model

   Housed                            20.74              1        20.74     <0.001
   Silage2                            1.59              1         1.59      0.208
   SHome2                             2.71              1         2.71      0.100

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.


As expected, Housed is highly significant, while silage feeding (Silage2) explains
virtually none of the variability. Silage production (SHome2), however, has borderline
significance (p=0.10) in explaining some of the variability seen in the data. Fitting
the production variables in turn gives the following p-values from the Wald statistic
(when all other factors have also been fitted):

             p-value
Manure        0.11
Sewage        0.91
Slurry        0.06
Geese         0.90
Gulls         0.61

Clearly, Gulls, Geese and Sewage have no significant effect. However, the spreading
of manure and the spreading of slurry both appear worth further examination. When
they are both fitted in the same model, the spreading of manure lacks significance,
with a p-value of 0.135, while the spreading of slurry is still within the range of
interest (p=0.08). Fitting the model with only slurry spreading gives rise to the
following Wald statistics:
VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


  Fixed term                Wald statistic           d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

  Housed                            27.94              1       27.94     <0.001
  Silage2                            1.31              1        1.31      0.252
  SHome2                             2.73              1        2.73      0.098
  SSlur2                             3.40              1        3.40      0.065

* Dropping individual terms from full fixed model

  Housed                            20.91              1       20.91     <0.001
  Silage2                            1.61              1        1.61      0.205
  SHome2                             1.91              1        1.91      0.167
  SSlur2                             3.40              1        3.40      0.065

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.

We note that Silage2 (feeding) continues to lack any significance, while the presence
of the slurry spreading factor (SSlur2) removes any significance from the silage
production factor (SHome2). Refitting the model without Silage2 causes only
marginal changes. Refitting the model without SHome2 gives:
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; \
  DISTRIBUTION=binomial; LINK=logit; DISPERSION=1; FIXED=Housed+SSlur2; \
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; \
  CADJUST=mean] VTPos; NBINOMIAL=N_Sam

***** Generalised Linear Mixed Model Analysis *****

                 Method:   cf Schall (1991) Biometrika
       Response variate:   VTPos
           Distribution:   BINOMIAL
          Link function:   LOGIT

           Random model:   Farm
            Fixed model:   Constant + Housed + SSlur2

* Dispersion parameter fixed at value 1.000


*** Monitoring information ***

Iteration       Gammas Dispersion       Max change
        1        1.340      1.000       1.9846E+00
        2        1.713      1.000       3.7332E-01
        3        1.882      1.000       1.6920E-01
        4        1.906      1.000       2.4158E-02
        5        1.908      1.000       1.6650E-03
        6        1.908      1.000       2.0323E-04
        7        1.908      1.000       2.4199E-05


*** Estimated Variance Components ***

Random term                 Component         S.e.

Farm                            1.908        0.232


*** Residual variance model ***

Term                Factor         Model(order)        Parameter   Estimate    S.e.

Dispersn                           Identity            Sigma2        1.000    FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm       1     0.05384
 Dispersn       2     0.00000       0.00000

                               1              2



*** Table of effects for Constant ***

            -1.471     Standard error:    0.1719



*** Table of effects for Housed ***

 Housed     0.0000    1.0000
            0.0000    1.3767

Standard error of differences:           0.2380



*** Table of effects for SSlur2 ***

       SSlur2        0.0000         1.0000
                     0.0000        -0.6851

Standard error of differences:         0.2917




*** Tables of means ***


*** Table of predicted means for Housed ***

 Housed     0.0000    1.0000
            -1.813    -0.437


*** Table of predicted means for SSlur2 ***

 SSlur2     0.0000    1.0000
            -0.782    -1.467


*** Back-transformed Means (on the original scale) ***



       Housed
       0.0000        0.1403
       1.0000        0.3926


       SSlur2
       0.0000        0.3138
       1.0000        0.1873

Note: means are probabilities not expected values.


VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


   Fixed term                 Wald statistic        d.f.    Wald/d.f.      Chi-sq prob

* Sequentially adding terms to fixed model

   Housed                          27.94              1         27.94       <0.001
   SSlur2                           5.52              1          5.52        0.019

* Dropping individual terms from full fixed model

   Housed                          33.45              1         33.45       <0.001
   SSlur2                           5.52              1          5.52        0.019

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.


The spreading of slurry on silage fields on farms where the animals are housed is
associated with statistically significantly lower (p=0.02) shedding levels. The other
factors are explained either by their association with housing or with slurry spreading.
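
The Wald statistic and p-value for SSlur2 can be checked by hand from the table of effects above. The sketch below is an illustrative Python calculation, not part of the Genstat analysis. It assumes that the single degree of freedom Wald statistic is the squared ratio of the effect (-0.6851) to its standard error of differences (0.2917), referred to a chi-square distribution with 1 d.f.; the function name is hypothetical.

```python
import math


def wald_chi2_1df(estimate, se):
    """Wald statistic and chi-square (1 d.f.) upper-tail probability."""
    stat = (estimate / se) ** 2
    # For 1 d.f. the chi-square upper tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = wald_chi2_1df(-0.6851, 0.2917)
odds_ratio = math.exp(-0.6851)  # effect expressed as a multiplicative change in odds
print(f"Wald = {stat:.2f}, p = {p_value:.3f}, odds ratio = {odds_ratio:.2f}")
```

On these inputs the calculation agrees with the reported Wald statistic of 5.52 and p-value of 0.019, and suggests that slurry spreading is associated with roughly a halving of the odds of shedding.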

Only one farm is recorded as having both housed animals and a natural water supply.
Hence, any effect of natural water supply can be estimated only for unhoused animals.
Refitting the model only to unhoused animals, we find that the effect remains
statistically significant (p=0.03). The factor is therefore redefined to identify
farms with unhoused animals which have access to a natural water supply (Natural2).

Hence, the factors which appear to be particularly likely to be relevant in the multi-
factor model are Housed, FCattle, Housed*Source, Housed*RecChnge, SSlur2,
N_Pigs and Natural2. Forcing the model to contain Housed, we use stepwise
regression to evaluate which of these factors should be included in a multi-factor
model:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
RSEARCH [PRINT=model,results; METHOD=fstepwise; FORCED=Housed; CONSTANT=estimate; \
  FACTORIAL=3; DENOMINATOR=ss; INRATIO=1; OUTRATIO=1; MAXCYCLE=50; AFACTORIAL=2; \
  CRITERION=aic; EXTRA=cp; NTERMS=60; NBESTMODELS=8] \
  FCattle + Housed.Source + Housed.RecChnge + SSlur2 + N_Pigs + Natural2

***** Model Selection *****

 Response variate:   VTPos
  Binomial totals:   N_Sam
     Distribution:   Binomial
    Link function:   Logit
  Number of units:   207
     Forced terms:   Constant + Housed
        Forced df:   2
       Free terms:   FCattle + Housed.Source + Housed.RecChnge +
                     SSlur2 + N_Pigs + Natural2


*** Stepwise (forward) analysis of deviance ***

Change                                                         mean   deviance approx
                                 d.f.      deviance        deviance      ratio F pr.
+ Housed                            1       160.526         160.526      26.69 <.001
+ Housed.RecChnge                   2        61.752          30.876       5.13 0.007
+ Housed.Source                     4        57.622          14.405       2.40 0.052
+ Natural2                          1        23.184          23.184       3.85 0.051
+ FCattle                           3        34.351          11.450       1.90 0.130
+ SSlur2                            1        22.338          22.338       3.71 0.055
+ N_Pigs                            1         7.532           7.532       1.25 0.264
Residual                          193      1160.702           6.014

Total                             206      1528.006          7.418




      Final model: Constant + Housed + Housed.RecChnge + Housed.Source +
                   Natural2 + FCattle + SSlur2 + N_Pigs

All of the factors are statistically significant with p-values less than or near 0.05,
except for N_Pigs, which no longer shows any appreciable evidence of an effect, and
FCattle, which now has a significance level of 0.13. Dropping N_Pigs from the full
model above produces a small change in deviance (p=0.26) by an F-test. We
therefore conclude that the univariate significance of the N_Pigs variable is caused by
some aspect of the data better explained by one of the other factors. Dropping FCattle
from the (new) full model produces a larger change in deviance (p=0.11) by an F-test.
We retain FCattle for the moment.
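
The deviance ratios in the stepwise table above follow from simple arithmetic: each mean deviance is divided by the residual mean deviance, which also serves as the estimate of the dispersion. The following Python sketch is illustrative only and not part of the Genstat analysis. It reproduces that arithmetic for four rows of the table; the function name is hypothetical, and the F probabilities are not recomputed here.

```python
def deviance_ratios(changes, residual_deviance, residual_df):
    """Return the residual mean deviance and (term, mean deviance, ratio) rows."""
    dispersion = residual_deviance / residual_df
    rows = []
    for term, df, deviance in changes:
        mean_deviance = deviance / df
        rows.append((term, mean_deviance, mean_deviance / dispersion))
    return dispersion, rows


# (term, d.f., deviance) copied from the stepwise analysis of deviance.
changes = [
    ("Housed", 1, 160.526),
    ("Housed.RecChnge", 2, 61.752),
    ("Housed.Source", 4, 57.622),
    ("N_Pigs", 1, 7.532),
]
dispersion, rows = deviance_ratios(changes, 1160.702, 193)
print(f"residual mean deviance = {dispersion:.3f}")
for term, mean_deviance, ratio in rows:
    print(f"{term:<16} mean deviance {mean_deviance:8.3f}  ratio {ratio:6.2f}")
```

A residual mean deviance of about 6 indicates substantial over-dispersion relative to the binomial, which is why the tests are based on F ratios and not on the raw changes in deviance.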

Fitting the remaining factors in a multi-factor model, we generate the following
output:
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
TERMS [FACT=9] Housed + Housed.RecChnge + Housed.Source + Natural2 + FCattle + SSlur2
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] \
  Housed + Housed.RecChnge + Housed.Source + Natural2 + FCattle + SSlur2


***** Regression Analysis *****

 Response variate:   VTPos
  Binomial totals:   N_Sam
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant + Housed + Housed.RecChnge + Housed.Source +
                     Natural2 + FCattle + SSlur2


*** Summary of analysis ***

                                          mean   deviance approx
              d.f.       deviance     deviance      ratio F pr.
Regression      12           360.       29.981       4.98 <.001
Residual       194          1168.        6.022
Total          206          1528.        7.418

Dispersion parameter is estimated to be 6.02 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                                           antilog of
                              estimate         s.e.      t(194)    t pr.     estimate
Constant                        -0.682        0.304       -2.25    0.026       0.5058
Housed 1                         0.961        0.348        2.76    0.006        2.616
Housed 0 .RecChnge 1            -0.179        0.322       -0.56    0.579       0.8362
Housed 1 .RecChnge 1            -0.780        0.302       -2.59    0.010       0.4584
Housed 0 .Source Buy            -0.883        0.473       -1.87    0.064       0.4134
Housed 0 .Source Both           -0.392        0.446       -0.88    0.380       0.6756
Housed 1 .Source Buy            -0.178        0.268       -0.66    0.507       0.8371
Housed 1 .Source Both           -0.479        0.311       -1.54    0.126       0.6196
Natural2 1                      -0.661        0.349       -1.89    0.060       0.5164
FCattle 2                        0.152        0.231        0.66    0.512        1.164
FCattle 3                       -0.364        0.268       -1.36    0.176       0.6950
FCattle 4                       -0.455        0.327       -1.39    0.165       0.6344
SSlur2 1                        -0.493        0.257       -1.92    0.057       0.6106
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Housed 0
            Natural2 0
             FCattle 1
              SSlur2 0
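
The t statistics and antilogs in the table of estimates can be read as follows: each t value is the estimate divided by its standard error, and each antilog is the corresponding odds ratio relative to the reference level. The Python sketch below is illustrative only and not part of the Genstat analysis. The function name is hypothetical, and the approximate 95% interval uses the normal multiplier 1.96 as a simplifying assumption in place of the t distribution on 194 d.f.

```python
import math


def summarise_logit_estimate(estimate, se, z=1.96):
    """Return (estimate / s.e., odds ratio, approximate 95% interval for the odds ratio)."""
    ratio = estimate / se
    odds_ratio = math.exp(estimate)
    lower = math.exp(estimate - z * se)
    upper = math.exp(estimate + z * se)
    return ratio, odds_ratio, (lower, upper)


# 'Housed 1' row: estimate 0.961, s.e. 0.348 (values as printed, hence rounded).
ratio, odds_ratio, interval = summarise_logit_estimate(0.961, 0.348)
print(f"t = {ratio:.2f}, odds ratio = {odds_ratio:.2f}, "
      f"approx. 95% interval = ({interval[0]:.2f}, {interval[1]:.2f})")
```

Because the printed estimates are rounded to three decimal places, the odds ratios recomputed this way can differ from the tabulated antilogs in the third significant figure.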


Again using stepwise regression to explore the properties of the data, we force the
above factors to be included in the model, and explore whether any other factors now
should be included in the model (excluding time and geographical variables which
will be considered later):
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
RSEARCH [PRINT=model,results; METHOD=fstepwise; \
  FORCED=Housed + Housed.RecChnge + Housed.Source + Natural2 + FCattle + SSlur2; \
  CONSTANT=estimate; FACTORIAL=3; DENOMINATOR=ss; INRATIO=1; OUTRATIO=1; \
  MAXCYCLE=50; AFACTORIAL=2; CRITERION=aic; EXTRA=cp; NTERMS=60; NBESTMODELS=8] \
  BeefOnDairy + Breed + Cattle + Chicks + Forage + Goats + Gra_Geec + Gra_Gull + \
  Gra_Manu + Gra_Slur + Hay + Hay_Manu + Lab_Op + Manage_C + Manage_O + Max_Age + \
  Min_Age + N_Goats + N_Horses + N_Pigs + N_Sheep + NoChange + Pigs + Sampler + \
  Sheep + T_DHouse + Visit2 + Want2Kno + Mains + Private + Water_Con + WaterCT

***** Model Selection *****

Response variate:    VTPos
 Binomial totals:    N_Sam
    Distribution:    Binomial
   Link function:    Logit
 Number of units:    199
    Forced terms:    Constant + Housed + Housed.RecChnge + Housed.Source +
                     Natural2 + FCattle + SSlur2
          Forced df: 13
         Free terms: BeefOnDairy + Breed + Cattle + Chicks + Forage +
                     Goats + Gra_Geec + Gra_Gull + Gra_Manu + Gra_Slur +
                     Hay + Hay_Manu + Lab_Op + Manage_C + Manage_O +
                     Max_Age + Min_Age + N_Goats + N_Horses + N_Pigs +
                     N_Sheep + NoChange + Pigs + Sampler + Sheep +
                     T_DHouse + Visit2 + Want2Kno + Mains + Private +
                     Water_Con + WaterCT


*** Stepwise (forward) analysis of deviance ***

Change                                                     mean   deviance approx
                                 d.f.     deviance     deviance      ratio F pr.
+ Housed
+ Housed.RecChnge
+ Housed.Source
+ Natural2
+ FCattle
+ SSlur2                          12      321.379       26.782       4.72    <.001
+ Sheep                            1       24.244       24.244       4.27    0.040
+ Visit2                           1       14.035       14.035       2.47    0.118
+ Breed                            5       39.495        7.899       1.39    0.229
+ Chicks                           1       13.171       13.171       2.32    0.129
+ Water_Con                        1       14.980       14.980       2.64    0.106
+ Forage                           2       15.461        7.731       1.36    0.259
+ NoChange                         1        6.347        6.347       1.12    0.292
Residual                         174      987.200        5.674

Total                            198     1436.312        7.254

        Final model: Constant + Housed + Housed.RecChnge + Housed.Source +
                     Natural2 + FCattle + SSlur2 + Sheep + Visit2 +
                     Breed + Chicks + Water_Con + Forage + NoChange


The threshold for inclusion is set deliberately low, so many of these will lack
statistical significance. We examine their suitability for inclusion in the model by
implementing a backwards stepwise procedure.



1/ NoChange is not statistically significant when dropped (p=0.38). NoChange is
dropped.
2/ Forage is not statistically significant when dropped (p=0.37). Forage is dropped.
3/ Breed is not statistically significant when dropped (p=0.42). Breed is dropped.
4/ Chicks is not statistically significant when dropped (p=0.23). Chicks is dropped.
5/ Visit2 is not statistically significant when dropped (p=0.14). Visit2 is dropped.
6/ Water_Con is not statistically significant when dropped (p=0.23). Water_Con is
dropped.

When FCattle is experimentally dropped from the model, it registers a significance of
0.09. It is therefore retained, as is Sheep.

Hence we conclude that the multivariate model to be carried forward to the GLMM
process is Housed + FCattle + Housed.Source + Housed.RecChnge + SSlur2 +
Natural2 + Sheep.

Fitting this model in the Generalised Linear Mixed Model context gives the following
output (neither county nor veterinary practice is found to be a significant random
effect):
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; \
  DISTRIBUTION=binomial; LINK=logit; DISPERSION=1; \
  FIXED=Housed + FCattle + Housed.Source + Housed.RecChnge + SSlur2 + Natural2 + Sheep; \
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; \
  CADJUST=mean] VTPos; NBINOMIAL=N_Sam

***** Generalised Linear Mixed Model Analysis *****

                 Method:   cf Schall (1991) Biometrika
       Response variate:   VTPos
           Distribution:   BINOMIAL
          Link function:   LOGIT

           Random model:   Farm
            Fixed model:   Constant + (((((Housed + FCattle) + (Housed . Source))
                           + (Housed . RecChnge)) + SSlur2) + Natural2) + Sheep

* Dispersion parameter fixed at value 1.000


*** Monitoring information ***

Iteration       Gammas Dispersion       Max change
        1        1.192      1.000       1.9262E+00
        2        1.565      1.000       3.7302E-01
        3        1.707      1.000       1.4208E-01
        4        1.727      1.000       1.9953E-02
        5        1.729      1.000       1.3488E-03
        6        1.729      1.000       1.5644E-04
        7        1.729      1.000       1.7719E-05


*** Estimated Variance Components ***

Random term                 Component         S.e.

Farm                           1.729         0.221


*** Residual variance model ***

Term              Factor         Model(order)     Parameter   Estimate     S.e.

Dispersn                         Identity         Sigma2        1.000     FIXED




*** Estimated Variance matrix for Variance Components ***

     Farm     1      0.04876
 Dispersn     2      0.00000       0.00000

                              1             2



*** Table of effects for Constant ***

          -0.6691     Standard error:           0.36486



*** Table of effects for Housed ***

 Housed     0.0000   1.0000
            0.0000   1.2032

Standard error of differences:        0.3911



*** Table of effects for FCattle ***

    FCattle               1             2                3               4
                     0.0000        0.1717          -0.4264         -0.6731

Standard error of differences:       Average              0.3330
                                     Maximum              0.3876
                                     Minimum              0.2608

Average variance of differences:                          0.1133


*** Table of effects for Housed.Source ***

      Source          Breed           Buy               Both
      Housed
      0.0000         0.0000        -0.8806            -0.2403
      1.0000         0.0000        -0.0607            -0.4802

Standard error of differences:       Average              0.4572
                                     Maximum              0.5820
                                     Minimum              0.3133

Average variance of differences:                          0.2177


*** Table of effects for Housed.RecChnge ***

    RecChnge          0.0000       1.0000
      Housed
      0.0000         0.0000        -0.1825
      1.0000         0.0000        -0.9878

Standard error of differences:       Average              0.3687
                                     Maximum              0.4842
                                     Minimum              0.3388

Average variance of differences:                          0.1393


*** Table of effects for SSlur2 ***

     SSlur2          0.0000        1.0000
                     0.0000       -0.4288

Standard error of differences:        0.2977



*** Table of effects for Natural2 ***

   Natural2         0.0000            1.0000
                    0.0000           -0.7141

Standard error of differences:           0.3534



*** Table of effects for Sheep ***

      Sheep              1                 2
                    0.0000           -0.3043

Standard error of differences:           0.2317




*** Tables of means ***


*** Table of predicted means for Housed ***

 Housed    0.0000   1.0000
           -2.090   -1.096


*** Table of predicted means for FCattle ***

FCattle         1         2          3          4
           -1.361    -1.189     -1.787     -2.034


*** Table of predicted means for Housed.Source ***

      Source    Breed          Buy       Both
      Housed
      0.0000   -1.716        -2.597    -1.956
      1.0000   -0.915        -0.976    -1.396


*** Table of predicted means for Housed.RecChnge ***

    RecChnge   0.0000     1.0000
      Housed
      0.0000   -1.998        -2.181
      1.0000   -0.602        -1.590


*** Table of predicted means for SSlur2 ***

 SSlur2    0.0000   1.0000
           -1.378   -1.807


*** Table of predicted means for Natural2 ***

Natural2   0.0000   1.0000
           -1.236   -1.950


*** Table of predicted means for Sheep ***

  Sheep         1         2
           -1.440    -1.745


*** Back-transformed Means (on the original scale) ***



      Housed
      0.0000        0.1101
      1.0000        0.2506


     FCattle
             1       0.2041
             2       0.2334
             3       0.1434
             4       0.1157

        Housed       0.0000      1.0000
        Source
         Breed       0.1524      0.2859
           Buy       0.0694      0.2737
          Both       0.1239      0.1985

       RecChnge      0.0000      1.0000
         Housed
         0.0000      0.1194      0.1015
         1.0000      0.3539      0.1695


        SSlur2
        0.0000       0.2013
        1.0000       0.1410


       Natural2
         0.0000      0.2252
         1.0000      0.1246


         Sheep
             1       0.1915
             2       0.1487

Note: means are probabilities not expected values.


VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


   Fixed term                 Wald statistic        d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

   Housed                          29.54              1       29.54     <0.001
   FCattle                         10.11              3        3.37      0.018
   Housed.Source                    5.57              4        1.39      0.234
   Housed.RecChnge                 10.75              2        5.38      0.005
   SSlur2                           2.34              1        2.34      0.126
   Natural2                         4.13              1        4.13      0.042
   Sheep                            1.73              1        1.73      0.189

* Dropping individual terms from full fixed model

   FCattle                          7.20              3        2.40      0.066
   Housed.Source                    5.60              4        1.40      0.231
   Housed.RecChnge                  8.84              2        4.42      0.012
   SSlur2                           2.08              1        2.08      0.150
   Natural2                         4.08              1        4.08      0.043
   Sheep                            1.73              1        1.73      0.189

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.



Remembering that the Wald tests are liberal, these results show no evidence for
retaining Sheep and Housed.Source in the model.
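
The chi-square probabilities attached to the Wald statistics can be checked directly, since closed forms exist for small degrees of freedom. The Python sketch below is illustrative only and not part of the Genstat analysis; the function name is hypothetical and only 1 to 4 d.f. are covered, which is all that is needed for the table above.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability, using closed forms for df = 1..4."""
    half = x / 2.0
    if df == 1:
        return math.erfc(math.sqrt(half))
    if df == 2:
        return math.exp(-half)
    if df == 3:
        return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    if df == 4:
        return math.exp(-half) * (1.0 + half)
    raise ValueError("only df = 1..4 are implemented in this sketch")


# (term, Wald statistic, d.f.) from the 'dropping individual terms' table.
for term, wald, df in [("FCattle", 7.20, 3), ("Housed.Source", 5.60, 4),
                       ("Housed.RecChnge", 8.84, 2), ("Sheep", 1.73, 1)]:
    print(f"{term:<16} Wald {wald:5.2f} on {df} d.f.  p = {chi2_sf(wald, df):.3f}")
```

Small discrepancies in the third decimal place are to be expected, because the printed Wald statistics are themselves rounded to two decimal places.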

Refitting the model without these factors gives the following output:
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; \
  DISTRIBUTION=binomial; LINK=logit; DISPERSION=1; \
  FIXED=Housed + FCattle + Housed.RecChnge + SSlur2 + Natural2; \
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; \
  CADJUST=mean] VTPos; NBINOMIAL=N_Sam

***** Generalised Linear Mixed Model Analysis *****

                 Method:       cf Schall (1991) Biometrika
       Response variate:       VTPos
           Distribution:       BINOMIAL
          Link function:       LOGIT

          Random model:        Farm
           Fixed model:        Constant + (((Housed + FCattle) + (Housed . RecChnge))
                               + SSlur2) + Natural2

* Dispersion parameter fixed at value 1.000


*** Monitoring information ***

Iteration            Gammas Dispersion        Max change
        1             1.253      1.000        1.8574E+00
        2             1.585      1.000        3.3224E-01
        3             1.736      1.000        1.5076E-01
        4             1.757      1.000        2.1145E-02
        5             1.759      1.000        1.4440E-03
        6             1.759      1.000        1.6155E-04
        7             1.759      1.000        1.7614E-05


*** Estimated Variance Components ***

Random term                       Component            S.e.

Farm                                  1.759           0.221


*** Residual variance model ***

Term                  Factor          Model(order)          Parameter       Estimate      S.e.

Dispersn                               Identity             Sigma2                1.000   FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm        1       0.04875
 Dispersn        2       0.00000         0.00000

                                  1               2



*** Table of effects for Constant ***

            -1.071        Standard error:      0.3036



*** Table of effects for Housed ***

 Housed     0.0000       1.0000
            0.0000       1.3188

Standard error of differences:                0.3318



*** Table of effects for FCattle ***

       FCattle                1               2               3               4
                         0.0000          0.1309         -0.5034         -0.7694

Standard error of differences:             Average             0.3248
                                           Maximum             0.3815
                                           Minimum             0.2595

Average variance of differences:                     0.1077


*** Table of effects for Housed.RecChnge ***

    RecChnge        0.0000          1.0000
      Housed
      0.0000        0.0000         -0.1043
      1.0000        0.0000         -0.8906

Standard error of differences:        Average        0.3661
                                      Maximum        0.4804
                                      Minimum        0.3361

Average variance of differences:                     0.1373


*** Table of effects for SSlur2 ***

     SSlur2         0.0000          1.0000
                    0.0000         -0.5229

Standard error of differences:         0.2901



*** Table of effects for Natural2 ***

   Natural2         0.0000          1.0000
                    0.0000         -0.7082

Standard error of differences:         0.3525




*** Tables of means ***


*** Table of predicted means for Housed ***

 Housed    0.0000   1.0000
           -2.024   -1.099


*** Table of predicted means for FCattle ***

FCattle         1         2        3          4
           -1.276    -1.145   -1.779     -2.045


*** Table of predicted means for Housed.RecChnge ***

    RecChnge   0.0000     1.0000
      Housed
      0.0000   -1.972     -2.077
      1.0000   -0.653     -1.544


*** Table of predicted means for SSlur2 ***

 SSlur2    0.0000   1.0000
           -1.300   -1.823


*** Table of predicted means for Natural2 ***

Natural2   0.0000   1.0000
           -1.207   -1.916


*** Back-transformed Means (on the original scale) ***




        Housed
        0.0000      0.1167
        1.0000      0.2500


       FCattle
             1      0.2182
             2      0.2414
             3      0.1444
             4      0.1145

       RecChnge      0.0000       1.0000
         Housed
         0.0000     0.1221       0.1114
         1.0000     0.3422       0.1759


        SSlur2
        0.0000      0.2141
        1.0000      0.1391


       Natural2
         0.0000     0.2301
         1.0000     0.1283

Note: means are probabilities not expected values.


VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


  Fixed term                  Wald statistic        d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

  Housed                           29.34              1       29.34     <0.001
  FCattle                          10.03              3        3.34      0.018
  Housed.RecChnge                   9.87              2        4.94      0.007
  SSlur2                            3.23              1        3.23      0.072
  Natural2                          4.04              1        4.04      0.045

* Dropping individual terms from full fixed model

  FCattle                           9.21              3        3.07      0.027
  Housed.RecChnge                   7.14              2        3.57      0.028
  SSlur2                            3.25              1        3.25      0.071
  Natural2                          4.04              1        4.04      0.045

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.

These results show that farms on which the sampled animals were housed have
statistically significantly higher prevalences (p<0.001) than those on which the
sampled animals were unhoused (graph in JulyResults.xls[Multivariate Housed]).
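The chi-square probabilities in the Wald table can be checked by hand. The sketch below is a minimal Python illustration, not the Genstat implementation: `chi2_sf` is our own helper, which uses the closed-form upper-tail expressions for integer degrees of freedom, and the statistics are those of the "sequentially adding" table above. Small discrepancies in the third decimal place are to be expected because the printed Wald statistics are rounded.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer d.f."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even d.f.: exp(-h) times a finite sum of h^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd d.f.: normal tail plus a finite sum over half-integer gamma terms.
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += h ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * total


# (Wald statistic, d.f.) for each term, copied from the sequential Wald table.
wald_table = {
    "Housed": (29.34, 1),
    "FCattle": (10.03, 3),
    "Housed.RecChnge": (9.87, 2),
    "SSlur2": (3.23, 1),
    "Natural2": (4.04, 1),
}

p_values = {term: chi2_sf(w, df) for term, (w, df) in wald_table.items()}

for term, p in p_values.items():
    print(f"{term:<16} p = {p:.3f}")
```

As the Genstat message notes, the chi-square reference distribution is an asymptotic approximation, so these probabilities are best read as approximate.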




[Figure: prevalences in housed and unhoused animals, with 95% confidence intervals. Vertical axis: Prevalence (0.00 to 0.40); horizontal axis: Class of Farm (Unhoused, Housed).]

The estimated prevalences on positive farms by housing status are as follows:

            Mean
   Class  Prevalence
 Unhoused   11.7%
  Housed    25.0%


The number of finishing cattle on the farm was used to define a categorical factor as
follows:

Category Name          Number of Finishing Cattle
       1                          <50
       2                         50-100
       3                        100-200
       4                          >200

Farms which fell into categories 3 and 4 had statistically significantly lower
prevalences than those in categories 1 and 2 (p=0.004).
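For readers wishing to reproduce the FCattle factor, the following Python sketch shows one possible coding. It rests on an assumption: the published bands overlap at exactly 100 animals, so the choice made here (placing 100 in category 3) is ours and may not match the original coding. The function name is hypothetical.

```python
def fcattle_category(n_finishing_cattle: int) -> int:
    """Return the FCattle category (1 to 4) for a given number of finishing cattle.

    The bands follow the table above (<50, 50-100, 100-200, >200). The treatment
    of a herd of exactly 100 is an assumption, since the printed bands overlap.
    """
    if n_finishing_cattle < 50:
        return 1
    if n_finishing_cattle < 100:  # assumption: exactly 100 falls in category 3
        return 2
    if n_finishing_cattle <= 200:  # ">200" in the table implies 200 is category 3
        return 3
    return 4


# Example herd sizes and the categories they would receive under this coding.
herd_sizes = [30, 50, 99, 150, 200, 450]
categories = [fcattle_category(n) for n in herd_sizes]
print(list(zip(herd_sizes, categories)))
```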




[Figure: prevalences in farms by FCattle category (FCattle 1 to FCattle 4), with 95% confidence intervals. Vertical axis range: 0.00 to 0.40.]

The estimated prevalences on positive farms by number of finishing cattle are as
follows:

           Mean
Category Prevalence
   1       21.8%
   2       24.1%
   3       14.4%
   4       11.5%

The variable recording whether there has been any change in diet or housing in the
immediate past (RecChnge) is statistically significant when fitted in interaction with
Housed (Wald test on dropping the term from the full model: p=0.028).


[Figure: prevalences in farms by Housed.RecChnge category (Unhoused/No Changes, Unhoused/With Changes, Housed/No Changes, Housed/With Changes), with 95% confidence intervals. Vertical axis range: 0.00 to 0.50.]


The estimated prevalences on positive farms by housing/change status are as follows:

                              Mean
        Category           Prevalence
   Unhoused/No Changes       12.2%
  Unhoused/With Changes      11.1%
    Housed/No Changes        34.2%
   Housed/With Changes       17.6%

Among unhoused animals there is no significant effect of recent changes (p=0.76).
Among housed animals with recent changes the prevalence is higher, although the
difference is not statistically significant (p=0.26), while among housed animals
without recent changes the prevalence is higher again, and significantly so (p=0.007).
This can be interpreted as a ‘build-up’ effect: housing increases the prevalence, and
the presence of a recent change implies that the housing effect will have had a shorter
period of time to take effect. It should be remembered that this factor could reflect
either changes in diet or changes in location: although it is tempting to interpret the
results in terms of the change in location, this uncertainty should be borne in mind.
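One way to put the 'build-up' interpretation on a quantitative footing is to express the Housed.RecChnge predicted means as odds ratios relative to unhoused groups with no recent changes. The Python sketch below is illustrative only: it uses the logit-scale predicted means printed by Genstat earlier in this section, the variable names are ours, and the resulting ratios are descriptive summaries rather than the significance tests quoted above.

```python
import math

# Predicted means on the logit scale for Housed.RecChnge, from the Genstat output.
logit_means = {
    ("Unhoused", "No changes"): -1.972,
    ("Unhoused", "With changes"): -2.077,
    ("Housed", "No changes"): -0.653,
    ("Housed", "With changes"): -1.544,
}

baseline = logit_means[("Unhoused", "No changes")]

# A difference of logits is a log odds ratio, so exponentiating gives the odds ratio.
odds_ratios = {cell: math.exp(eta - baseline) for cell, eta in logit_means.items()}

for (housing, change), ratio in odds_ratios.items():
    print(f"{housing:<9} {change:<13} odds ratio = {ratio:.2f}")
```

On this scale the ordering described in the text is visible directly: the ratio is largest for housed groups without recent changes, intermediate for housed groups with recent changes, and close to one for unhoused groups with recent changes.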


[Figure: prevalences in farms by water source (Natural Water Source, Other), with 95% confidence intervals. Vertical axis range: 0.00 to 0.35.]

The estimated prevalences on positive farms by water source are as follows:

              Mean
 Category Prevalence
Natural Water 12.8%
    Other     23.0%

Farms on which unhoused animals have access to a natural water supply have a lower
prevalence than other farms (p=0.045).




[Figure: prevalences in farms by slurry spreading status (No Slurry Spread, Slurry Spread), with 95% confidence intervals. Vertical axis range: 0.00 to 0.30.]

The estimated prevalences on positive farms by slurry spreading status are as follows:

                    Mean
    Category     Prevalence
No Slurry Spread   21.4%
  Slurry Spread    13.9%

Farms on which slurry is spread on the silage fields have a lower prevalence than
those farms on which no slurry is spread. This difference is not statistically
significant (p=0.07) but would seem worth reporting.

Having fitted all the likely explanatory variables in the multifactor model, we now
return to explore the effect that the inclusion of these factors may have on the fit of
the structural factors.

Fitting Division and Division.Housed gives the following output:
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; \
  DISTRIBUTION=binomial; LINK=logit; DISPERSION=1; \
  FIXED=Housed + FCattle + Housed.RecChnge + SSlur2 + Natural2 + Division + Division.Housed; \
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean] \
  VTPos; NBINOMIAL=N_Sam

***** Generalised Linear Mixed Model Analysis *****

               Method:   cf Schall (1991) Biometrika
     Response variate:   VTPos
         Distribution:   BINOMIAL
        Link function:   LOGIT

          Random model: Farm
           Fixed model: Constant + (((((Housed + FCattle) + (Housed . RecChnge)
) + SSlur2) + Natural2) + Division) + (Housed . Division)

* Dispersion parameter fixed at value 1.000
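In conventional notation, the model specified by this GLMM call can be written roughly as follows. The symbols are our own shorthand rather than anything printed by Genstat: i indexes sampled groups, f(i) denotes the farm to which group i belongs, x_i is the design vector coding the fixed terms listed above, and fixing the dispersion parameter at 1 means that no extra-binomial scaling is applied beyond the farm random effect.

```latex
\begin{aligned}
\mathrm{VTPos}_i \mid u_{f(i)} &\sim \mathrm{Binomial}\bigl(\mathrm{N\_Sam}_i,\; p_i\bigr), \\
\operatorname{logit}(p_i) &= \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta} + u_{f(i)}, \\
u_f &\sim \mathrm{N}\bigl(0,\; \sigma^2_{\mathrm{Farm}}\bigr).
\end{aligned}
```

The "Estimated Variance Components" table reports the estimate of the farm variance, and the "Table of effects" entries are the corresponding elements of the fixed-effect vector on the logit scale.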




*** Monitoring information ***

Iteration            Gammas Dispersion        Max change
        1             1.292      1.000        1.7250E+00
        2             1.533      1.000        2.4051E-01
        3             1.678      1.000        1.4561E-01
        4             1.697      1.000        1.9120E-02
        5             1.699      1.000        1.2743E-03
        6             1.699      1.000        1.3341E-04
        7             1.699      1.000        1.3609E-05


*** Estimated Variance Components ***

Random term                       Component            S.e.

Farm                                  1.699           0.221


*** Residual variance model ***

Term                  Factor          Model(order)          Parameter       Estimate       S.e.

Dispersn                              Identity              Sigma2                1.000   FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm        1       0.04885
 Dispersn        2       0.00000         0.00000

                                  1               2



*** Table of effects for Constant ***

            -1.232        Standard error:      0.4404



*** Table of effects for Housed ***

 Housed     0.0000       1.0000
            0.0000       1.3835

Standard error of differences:                0.5199



*** Table of effects for FCattle ***

       FCattle                1               2               3               4
                         0.0000          0.2010         -0.3886         -0.6241

Standard error of differences:             Average             0.3262
                                           Maximum             0.3793
                                           Minimum             0.2614

Average variance of differences:                               0.1085


*** Table of effects for Housed.RecChnge ***

       RecChnge          0.0000          1.0000
         Housed
         0.0000          0.0000          -0.2138
         1.0000          0.0000          -0.8995

Standard error of differences:             Average             0.3702
                                           Maximum             0.4860
                                           Minimum             0.3347

Average variance of differences:                               0.1404




*** Table of effects for SSlur2 ***

     SSlur2        0.0000           1.0000
                   0.0000          -0.3293

Standard error of differences:         0.3029



*** Table of effects for Natural2 ***

   Natural2        0.0000           1.0000
                   0.0000          -0.6814

Standard error of differences:         0.3534



*** Table of effects for Division ***

   Division        Central    Highland           Islands       North East    South East
                    0.0000      0.6400            0.4473           0.1987        0.5044


   Division    South West
                  -0.4383

Standard error of differences:        Average           0.6037
                                      Maximum           0.7070
                                      Minimum           0.4892

Average variance of differences:                        0.3678


*** Table of effects for Housed.Division ***

    Division       Central       Highland            Islands   North East    South East
      Housed
      0.0000        0.0000          0.0000            0.0000        0.0000        0.0000
      1.0000        0.0000          1.1292           -1.7037       -0.1396       -0.4355


    Division   South West
      Housed
      0.0000         0.0000
      1.0000        -0.0928

Standard error of differences:        Average           0.8699
                                      Maximum            1.378
                                      Minimum           0.6133

Average variance of differences:                        0.8114



*** Tables of means ***


*** Table of predicted means for Housed ***

 Housed   0.0000    1.0000
          -1.822    -0.988


*** Table of predicted means for FCattle ***

FCattle        1          2        3          4
          -1.202     -1.001   -1.591     -1.826


*** Table of predicted means for Housed.RecChnge ***

    RecChnge   0.0000     1.0000
      Housed
      0.0000   -1.715     -1.929
      1.0000   -0.538     -1.438



*** Table of predicted means for SSlur2 ***

 SSlur2    0.0000    1.0000
           -1.240    -1.570


*** Table of predicted means for Natural2 ***

Natural2   0.0000    1.0000
           -1.064    -1.746


*** Table of predicted means for Division ***

   Division         Central    Highland       Islands     North East    South East
                     -1.527      -0.322        -1.931         -1.398        -1.240


   Division    South West
                   -2.011


*** Table of predicted means for Housed.Division ***

    Division        Central    Highland         Islands   North East    South East
      Housed
      0.0000          -2.047      -1.407         -1.600        -1.848        -1.543
      1.0000          -1.006       0.763         -2.263        -0.947        -0.937


    Division   South West
      Housed
      0.0000          -2.485
      1.0000          -1.538


*** Back-transformed Means (on the original scale) ***



      Housed
      0.0000        0.1392
      1.0000        0.2713


     FCattle
           1        0.2311
           2        0.2688
           3        0.1693
           4        0.1387

    RecChnge        0.0000     1.0000
      Housed
      0.0000        0.1526     0.1269
      1.0000        0.3686     0.1919


      SSlur2
      0.0000        0.2244
      1.0000        0.1723


    Natural2
      0.0000        0.2565
      1.0000        0.1486


    Division
     Central        0.1785
    Highland        0.4202
     Islands        0.1266
  North East        0.1982
  South East        0.2244
  South West        0.1180




       Housed        0.0000       1.0000
     Division
      Central        0.1144       0.2677
     Highland        0.1967       0.6820
      Islands        0.1680       0.0943
   North East        0.1361       0.2794
   South East        0.1762       0.2814
   South West        0.0769       0.1769

Note: means are probabilities not expected values.



VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


   Fixed term                 Wald statistic         d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

   Housed                           29.73              1        29.73     <0.001
   FCattle                          10.16              3         3.39      0.017
   Housed.RecChnge                  10.11              2         5.05      0.006
   SSlur2                            3.30              1         3.30      0.069
   Natural2                          4.12              1         4.12      0.042
   Division                         12.27              5         2.45      0.031
   Housed.Division                   4.78              5         0.96      0.443

* Dropping individual terms from full fixed model

   FCattle                           7.24              3        2.41      0.065
   Housed.RecChnge                   7.65              2        3.82      0.022
   SSlur2                            1.18              1        1.18      0.277
   Natural2                          3.72              1        3.72      0.054
   Housed.Division                   4.78              5        0.96      0.443

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.


Hence, although Housed.Division is not significant (p=0.443), there is still significant
evidence of geographical variability unexplained by the fitted epidemiological factors
(in fact, the geographical distinctions are clearer after the effects of the other factors
have been removed).
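As a consistency check on the output for this model, the "dropping" Wald statistic for a two-level factor should equal the squared ratio of its effect to its standard error. The Python sketch below assumes that this is how the single-d.f. statistics are formed, which the printed figures bear out to rounding; the helper name `wald_1df` is ours.

```python
def wald_1df(estimate: float, std_error: float) -> float:
    """Wald statistic for a single-parameter term: (estimate / s.e.) squared."""
    return (estimate / std_error) ** 2


# (effect, standard error of difference) for the two-level factors, copied from
# the "Table of effects" output for the model that includes Division.
effects = {
    "SSlur2": (-0.3293, 0.3029),
    "Natural2": (-0.6814, 0.3534),
}

wald = {term: wald_1df(est, se) for term, (est, se) in effects.items()}

for term, w in wald.items():
    print(f"{term:<9} Wald = {w:.2f}")
```

These values can be compared with the SSlur2 and Natural2 rows of the "dropping individual terms" table (1.18 and 3.72 respectively).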

Fitting Manage_O gives the following Wald statistics:
VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


   Fixed term                  Wald statistic        d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

   Housed                           29.16              1       29.16     <0.001
   FCattle                           9.98              3        3.33      0.019
   Housed.RecChnge                   9.82              2        4.91      0.007
   SSlur2                            3.21              1        3.21      0.073
   Natural2                          4.01              1        4.01      0.045
   Manage_O                          0.93              2        0.46      0.630

* Dropping individual terms from full fixed model

   FCattle                           9.05              3        3.02      0.029
   Housed.RecChnge                   7.24              2        3.62      0.027
   SSlur2                            3.50              1        3.50      0.062
   Natural2                          3.90              1        3.90      0.048
   Manage_O                          0.93              2        0.46      0.630

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.



Fitting Housed.Manage_O gives the following Wald statistics:
VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


  Fixed term                Wald statistic         d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

  Housed                            29.36            1       29.36     <0.001
  FCattle                           10.10            3        3.37      0.018
  Housed.RecChnge                    9.97            2        4.98      0.007
  SSlur2                             3.26            1        3.26      0.071
  Natural2                           4.01            1        4.01      0.045
  Housed.Manage_O                    6.25            4        1.56      0.181

* Dropping individual terms from full fixed model

  FCattle                           10.47            3        3.49      0.015
  Housed.RecChnge                    7.61            2        3.80      0.022
  SSlur2                             3.53            1        3.53      0.060
  Natural2                           4.02            1        4.02      0.045
  Housed.Manage_O                    6.25            4        1.56      0.181

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.


Hence there is no evidence of Manage_O or its interaction with Housed having any
significant effect on the prevalence.

Fitting Sam_Mon (which was highly significant in the univariate analysis) gives the
following output:
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; \
  DISTRIBUTION=binomial; LINK=logit; DISPERSION=1; \
  FIXED=Housed + FCattle + Housed.RecChnge + SSlur2 + Natural2 + Sam_Mon + Sam_Mon.Housed; \
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean] \
  VTPos; NBINOMIAL=N_Sam

***** Generalised Linear Mixed Model Analysis *****

                 Method:   cf Schall (1991) Biometrika
       Response variate:   VTPos
           Distribution:   BINOMIAL
          Link function:   LOGIT

          Random model: Farm
           Fixed model: Constant + (((((Housed + FCattle) + (Housed . RecChnge)
) + SSlur2) + Natural2) + Sam_Mon) + (Housed . Sam_Mon)

* Dispersion parameter fixed at value 1.000


*** Monitoring information ***

Iteration       Gammas Dispersion     Max change
        1        1.241      1.000     1.7883E+00
        2        1.548      1.000     3.0736E-01
        3        1.701      1.000     1.5289E-01
        4        1.722      1.000     2.1058E-02
        5        1.724      1.000     1.4655E-03
        6        1.724      1.000     1.5810E-04
        7        1.724      1.000     1.6533E-05


*** Estimated Variance Components ***

Random term                      Component             S.e.

Farm                                 1.724            0.228


*** Residual variance model ***

Term                 Factor          Model(order)          Parameter        Estimate      S.e.

Dispersn                             Identity              Sigma2                 1.000   FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm        1      0.05218
 Dispersn        2      0.00000          0.00000

                                 1                2



*** Table of effects for Constant ***

               -2.230    Standard error:       1.3511



*** Table of effects for Housed ***

 Housed        0.0000   1.0000
                0.000    2.929

Standard error of differences:                1.267



*** Table of effects for FCattle ***

       FCattle               1              2                 3               4
                        0.0000         0.1277           -0.6122         -0.7928

Standard error of differences:               Average           0.3353
                                             Maximum           0.3965
                                             Minimum           0.2706

Average variance of differences:                              0.1147


*** Table of effects for Housed.RecChnge ***

       RecChnge         0.0000           1.0000
         Housed
         0.0000         0.0000         -0.0109
         1.0000         0.0000         -0.9641

Standard error of differences:            Average             0.4218
                                          Maximum             0.5547
                                          Minimum             0.3789

Average variance of differences:                              0.1824


*** Table of effects for SSlur2 ***

       SSlur2           0.0000         1.0000
                        0.0000        -0.5271

Standard error of differences:               0.3023




*** Table of effects for Natural2 ***

   Natural2        0.0000         1.0000
                   0.0000        -0.6120

Standard error of differences:        0.3729



*** Table of effects for Sam_Mon ***

    Sam_Mon           Jan            Feb               Mar          Apr      May
                   0.0000        -1.1247            0.2774      -0.0039   1.1748


    Sam_Mon           Jun             Jul              Aug         Sep       Oct
                   1.3308          1.3849           1.4567      0.8369    0.5001


    Sam_Mon            Nov           Dec
                   -0.2282       -0.2707

Standard error of differences:       Average            1.259
                                     Maximum            2.157
                                     Minimum           0.5449

Average variance of differences:                        1.816


*** Table of effects for Housed.Sam_Mon ***

     Sam_Mon           Jan            Feb              Mar          Apr      May
      Housed
      0.0000             *              *            0.0000      0.0000    0.0000
      1.0000        0.0000         0.0000           -0.6260     -0.5438   -0.8902


     Sam_Mon           Jun            Jul              Aug          Sep      Oct
      Housed
      0.0000        0.0000          0.0000           0.0000      0.0000    0.0000
      1.0000             *         -2.7419          -1.5238     -4.0927   -1.4113


     Sam_Mon           Nov            Dec
      Housed
      0.0000        0.0000              *
      1.0000        0.0000         0.0000

Standard error of differences:       Average            1.655
                                     Maximum            2.152
                                     Minimum           0.9070

Average variance of differences:                        2.805



*** Tables of means ***


*** Table of predicted means for Housed ***

     Housed        0.0000        1.0000
                        *             *


*** Table of predicted means for FCattle ***

FCattle        1         2        3          4
          -1.726    -1.599   -2.338     -2.519


*** Table of predicted means for Housed.RecChnge ***

    RecChnge        0.0000         1.0000
      Housed
      0.0000             *              *
      1.0000             *              *


*** Table of predicted means for SSlur2 ***

 SSlur2    0.0000     1.0000
           -1.782     -2.309


*** Table of predicted means for Natural2 ***

Natural2   0.0000     1.0000
           -1.740     -2.352


*** Table of predicted means for Sam_Mon ***

Sam_Mon        Jan         Feb       Mar          Apr        May         Jun       Jul       Aug
                 *           *    -1.934       -2.174     -1.169           *    -1.885    -1.204


Sam_Mon       Sep        Oct         Nov           Dec
           -3.108     -2.104      -2.127             *


*** Table of predicted means for Housed.Sam_Mon ***

     Sam_Mon         Jan         Feb       Mar           Apr       May         Jun       Jul
      Housed
      0.0000         *              *   -2.847       -3.129    -1.950     -1.794     -1.740
      1.0000    -0.673         -1.797   -1.021       -1.220    -0.388          *     -2.030


     Sam_Mon         Aug         Sep       Oct           Nov       Dec
      Housed
      0.0000    -1.668         -2.288   -2.625       -3.353         *
      1.0000    -0.740         -3.928   -1.584       -0.901    -0.943


*** Back-transformed Means (on the original scale) ***



      Housed
      0.0000               *
      1.0000               *


     FCattle
           1         0.1511
           2         0.1682
           3         0.0880
           4         0.0745

    RecChnge         0.0000         1.0000
      Housed
      0.0000               *               *
      1.0000               *               *


      SSlur2
      0.0000         0.1440
      1.0000         0.0904


    Natural2
      0.0000         0.1494
      1.0000         0.0869


     Sam_Mon
         Jan              *
         Feb              *
         Mar         0.1263
         Apr         0.1021
         May         0.2370
         Jun              *
         Jul         0.1319
         Aug         0.2308
         Sep         0.0428
         Oct         0.1087
         Nov         0.1065
         Dec              *

        Housed       0.0000      1.0000
       Sam_Mon
           Jan            *      0.3379
           Feb            *      0.1422
           Mar       0.0548      0.2648
           Apr       0.0419      0.2279
           May       0.1246      0.4042
           Jun       0.1426           *
           Jul       0.1493      0.1161
           Aug       0.1587      0.3231
           Sep       0.0921      0.0193
           Oct       0.0676      0.1703
           Nov       0.0338      0.2889
           Dec            *      0.2802

Note: means are probabilities not expected values.

VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


   Fixed term                 Wald statistic        d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

   Housed                          29.42              1       29.42     <0.001
   FCattle                         10.20              3        3.40      0.017
   Housed.RecChnge                  9.95              2        4.97      0.007
   SSlur2                           3.30              1        3.30      0.069
   Natural2                         4.00              1        4.00      0.045
   Sam_Mon                         14.10             11        1.28      0.227
   Housed.Sam_Mon                   9.12              7        1.30      0.244

* Dropping individual terms from full fixed model

   FCattle                         10.76              3        3.59      0.013
   Housed.RecChnge                  6.48              2        3.24      0.039
   SSlur2                           3.04              1        3.04      0.081
   Natural2                         2.69              1        2.69      0.101
   Housed.Sam_Mon                   9.12              7        1.30      0.244

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.


Neither Sam_Mon nor Housed.Sam_Mon is statistically significant. Hence, the
explanatory variables (particularly Housed) have explained most of the variability that
was attributed to month in the univariate analysis. We confirm this by refitting the
model without any of the housing terms:
VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


   Fixed term                 Wald statistic        d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

   FCattle                          4.89              3        1.63      0.180
   SSlur2                           0.01              1        0.01      0.943
   Natural2                        21.19              1       21.19     <0.001
   Sam_Mon                         24.74             11        2.25      0.010

* Dropping individual terms from full fixed model

   FCattle                          8.12              3        2.71      0.044
   SSlur2                           2.19              1        2.19      0.139
   Natural2                        12.75              1       12.75     <0.001
   Sam_Mon                         24.74             11        2.25      0.010

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.


With the housing terms removed, Sam_Mon becomes statistically significant (p=0.010).
This confirms that the month-to-month variability is almost completely explained by
the Housed terms.

Reviewing the pattern of housing of animals over the year we see the following
pattern:


[Figure: proportion of sampling groups housed, by month (January to December), with 95% confidence intervals. Vertical axis: Proportion Groups Housed (0 to 1); horizontal axis: Month.]

In the univariate analysis, the months exhibiting a lower prevalence were identified as
June to October. June to September are the months with the lowest proportion of
animals housed, while in October, although a higher proportion of groups are housed,
the ‘recent change’ factor is likely to operate to reduce the shedding prevalence.

Fitting Sam_Year and Sam_Year.Housed to the data gives rise to the following
summary statistics:
VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


   Fixed term                 Wald statistic         d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

   Housed                           29.01             1       29.01       <0.001
   FCattle                           9.95             3        3.32        0.019
   Housed.RecChnge                   9.79             2        4.89        0.007
   SSlur2                            3.19             1        3.19        0.074
   Natural2                          3.98             1        3.98        0.046
   Sam_Year                          1.00             2        0.50        0.606
   Housed.Sam_Year                   2.33             2        1.17        0.312

* Dropping individual terms from full fixed model

   FCattle                          8.30              3       2.77            0.040
   Housed.RecChnge                  4.87              2       2.43            0.088
   SSlur2                           3.30              1       3.30            0.069
   Natural2                         3.44              1       3.44            0.064
   Housed.Sam_Year                  2.33              2       1.17            0.312

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.
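
The 'Chi-sq prob' column in the Wald table can be checked by hand, since each entry is the upper-tail probability of a chi-square distribution with the stated degrees of freedom. The following is a minimal Python sketch of that check, not part of the Genstat session: the helper name chi2_sf is hypothetical, and the Wald statistics are copied from the sequential table above.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variate with integer df >= 1."""
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    half = x / 2.0
    # Closed forms for the two base cases (1 and 2 degrees of freedom).
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

# (term, Wald statistic, d.f.) copied from the sequential Wald table
sequential = [
    ("Housed", 29.01, 1),
    ("FCattle", 9.95, 3),
    ("Housed.RecChnge", 9.79, 2),
    ("SSlur2", 3.19, 1),
    ("Natural2", 3.98, 1),
    ("Sam_Year", 1.00, 2),
    ("Housed.Sam_Year", 2.33, 2),
]

for term, wald, df in sequential:
    print(f"{term:18s} Wald/d.f. = {wald / df:5.2f}   p = {chi2_sf(wald, df):.3f}")
```

As the Genstat message notes, these probabilities rest on an asymptotic approximation, so agreement with the table confirms only the arithmetic and not the adequacy of the approximation.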


There is no evidence of any year-on-year trend in prevalence in either housed or
unhoused animals.

Returning to the model with the explanatory factors and animal health division, the
prevalences by area, after adjusting for the significant explanatory variables, are given
by fitting the following model:
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
  LINK=logit; DISPERSION=1; FIXED=Housed+FCattle+Housed.RecChnge+SSlur2+Natural2+Division;\
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean]\
  VTPos; NBINOMIAL=N_Sam

***** Generalised Linear Mixed Model Analysis *****

                 Method:   cf Schall (1991) Biometrika
       Response variate:   VTPos
           Distribution:   BINOMIAL
          Link function:   LOGIT

          Random model: Farm
           Fixed model: Constant + ((((Housed + FCattle) + (Housed . RecChnge))
 + SSlur2) + Natural2) + Division

* Dispersion parameter fixed at value 1.000


*** Monitoring information ***

 Iteration      Gammas Dispersion       Max change
         1       1.237      1.000       1.7843E+00
         2       1.533      1.000       2.9627E-01
         3       1.666      1.000       1.3312E-01
         4       1.685      1.000       1.8695E-02
         5       1.686      1.000       1.2612E-03
         6       1.686      1.000       1.3440E-04
         7       1.686      1.000       1.3954E-05


*** Estimated Variance Components ***

Random term                 Component         S.e.

Farm                           1.686         0.217


*** Residual variance model ***

Term              Factor         Model(order)     Parameter     Estimate          S.e.

Dispersn                         Identity         Sigma2              1.000      FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm     1       0.04689
 Dispersn     2       0.00000        0.00000

                               1              2



*** Table of effects for Constant ***

            -1.178     Standard error:    0.3600



*** Table of effects for Housed ***

 Housed     0.0000    1.0000
            0.0000    1.2921

Standard error of differences:          0.3288



*** Table of effects for FCattle ***

    FCattle                1              2              3            4
                      0.0000         0.2252        -0.4122      -0.6472

Standard error of differences:         Average         0.3220
                                       Maximum         0.3765
                                       Minimum         0.2586

Average variance of differences:                       0.1058


*** Table of effects for Housed.RecChnge ***

    RecChnge          0.0000         1.0000
      Housed
      0.0000          0.0000         -0.1707
      1.0000          0.0000         -0.8901

Standard error of differences:         Average         0.3654
                                       Maximum         0.4806
                                       Minimum         0.3326

Average variance of differences:                       0.1369


*** Table of effects for SSlur2 ***

     SSlur2          0.0000          1.0000
                     0.0000         -0.3338

Standard error of differences:          0.2957



*** Table of effects for Natural2 ***

   Natural2          0.0000          1.0000
                     0.0000         -0.7004

Standard error of differences:          0.3498



*** Table of effects for Division ***

   Division          Central       Highland        Islands   North East   South East
                      0.0000         1.0762         0.1254       0.1065       0.1952




   Division    South West
                  -0.4932

Standard error of differences:        Average       0.4146
                                      Maximum       0.5626
                                      Minimum       0.2942

Average variance of differences:                    0.1788



*** Tables of means ***


*** Table of predicted means for Housed ***

 Housed    0.0000    1.0000
           -1.821    -0.888


*** Table of predicted means for FCattle ***

FCattle         1          2         3          4
           -1.146     -0.921    -1.558     -1.793


*** Table of predicted means for Housed.RecChnge ***

    RecChnge   0.0000     1.0000
      Housed
      0.0000   -1.735        -1.906
      1.0000   -0.443        -1.333


*** Table of predicted means for SSlur2 ***

 SSlur2    0.0000    1.0000
           -1.188    -1.521


*** Table of predicted means for Natural2 ***

Natural2   0.0000    1.0000
           -1.004    -1.705


*** Table of predicted means for Division ***

   Division         Central      Highland       Islands   North East   South East
                     -1.523        -0.447        -1.398       -1.416       -1.328


   Division    South West
                   -2.016


*** Back-transformed Means (on the original scale) ***



      Housed
      0.0000        0.1393
      1.0000        0.2914


     FCattle
           1        0.2412
           2        0.2848
           3        0.1739
           4        0.1427

    RecChnge        0.0000        1.0000
      Housed
      0.0000        0.1499        0.1294
      1.0000        0.3909        0.2086




        SSlur2
        0.0000      0.2337
        1.0000      0.1792


    Natural2
      0.0000        0.2681
      1.0000        0.1538


    Division
     Central        0.1790
    Highland        0.3902
     Islands        0.1982
  North East        0.1952
  South East        0.2095
  South West        0.1175

Note: means are probabilities not expected values.
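
The back-transformed means above are related to the tables of predicted means by the inverse of the logit link. A minimal Python sketch of that conversion follows, using the Division predicted means from the output; the function name inv_logit is an illustrative assumption and is not a Genstat identifier.

```python
import math

def inv_logit(eta):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Predicted means on the logit scale, copied from the Division table above
predicted_means = {
    "Central": -1.523,
    "Highland": -0.447,
    "Islands": -1.398,
    "North East": -1.416,
    "South East": -1.328,
    "South West": -2.016,
}

for division, eta in predicted_means.items():
    print(f"{division:11s} {inv_logit(eta):.4f}")
```

Because the predicted means are printed to three decimal places, values recomputed in this way may differ from the Genstat back-transformed means in the fourth decimal place.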



[Figure: prevalence (y-axis, 0 to 0.7) by Division (x-axis): Central, Highland, Islands, North East, South East, South West]



Plot of prevalences by animal health division, with 95% confidence intervals.

The mean prevalence in Highland division appears to be significantly higher than
those in Central, Islands, North-East and South-East (p=0.02), while the prevalences
in these regions are significantly higher than that in the South-West (p=0.03). These
trends match those seen in the univariate analysis.

Reviewing the fit of the model, plotting the observed and expected fractions of
positive pats for the 207 observations included in the model gives the following plot:




[Figure: Model Probability (y-axis, 0 to 1) against Observed Fraction (x-axis, 0 to 1)]



Plot of observed and fitted fractional prevalences.

Overall, the fit looks fairly reasonable, with a few minor outliers. The only serious
lack of fit occurs for maximal prevalences, where the fitted value will always be
smaller than an observed 100% shedding rate. Even this cluster of negative residuals
looks likely to be of negligible effect. To assess this more formally, we examine a
residual plot for the model. The residuals and fitted values from the model (based on
the inclusion only of fixed effects) are recovered by refitting the model using the
marginal method of Breslow & Clayton (1993) and then recovering the residuals
using

VKEEP [RES=Residuals;FIT=Fitted].

The resulting fitted values are converted back onto the proportion scale using the
inverse of the logit function, and the resulting plot is shown below:




[Figure: Residual (y-axis, -1.5 to 1.5) against Fitted Fraction (x-axis, 0 to 1)]



Plot of residual against model fit (random effects model).

The histogram of these residuals should also be examined.


[Figure: Histogram of Residuals (Random); Frequency (y-axis, 0 to 40) against Residuals (Random) (x-axis, -0.8 to 1.2)]



Histogram of residuals (random effects model).




The pattern of the residuals against the fitted value is fairly typical of this class of
residuals. The histogram is sufficiently symmetric for the fit of the model to be
regarded as acceptable, although there may be some evidence of sub-populations in
the histogram. Interpretation of these residuals is problematic. To fully evaluate the
fit of the model, we examine the deviance residuals from the equivalent fixed effect
model with overdispersion. This model is close in its properties to the mixed model,
and the deviance residuals are easier to interpret. The residuals are recovered using
the RKEEP command, using the default residual settings in RKEEP and MODEL.
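
For reference, the deviance residual of a single binomial observation is the signed square root of that observation's contribution to the deviance. The following Python sketch shows the unstandardised form only. It is an illustration under stated assumptions rather than a reproduction of the Genstat calculation: the function name is hypothetical, the example farm is invented, and the RKEEP defaults may additionally standardise the residuals (for example by leverage and the estimated dispersion).

```python
import math

def binomial_deviance_residual(y, n, mu):
    """Unstandardised deviance residual for y positives out of n samples,
    given a fitted probability mu with 0 < mu < 1."""
    def xlogy(a, b):
        # Convention: 0 * log(0 / b) is taken as 0.
        return 0.0 if a == 0 else a * math.log(a / b)

    # Contribution of this observation to the binomial deviance.
    d = 2.0 * (xlogy(y, n * mu) + xlogy(n - y, n * (1.0 - mu)))
    # The sign follows observed minus fitted count; max() guards against rounding below zero.
    return math.copysign(math.sqrt(max(d, 0.0)), y - n * mu)

# Hypothetical farm: 1 positive pat out of 15, fitted prevalence 25%
print(binomial_deviance_residual(1, 15, 0.25))
```

A farm with fewer positive pats than the model predicts gives a negative residual, and one with more gives a positive residual.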

The resulting fitted values are converted back onto the proportion scale using the
inverse of the logit function, and the resulting plot is shown below:


[Figure: Residual (y-axis, -2.5 to 3) against Fitted Fraction (x-axis, 0 to 1)]



Plot of residual against model fit (fixed effects model).

The histogram of these residuals should also be examined.




[Figure: Histogram of Residuals (Fixed); Frequency (y-axis, 0 to 30) against Residuals (Fixed) (x-axis, -1.50 to 2.25)]



Histogram of residuals (fixed effects model).

These graphics are much easier to interpret. The main peculiarities appear to be a
clustering of moderately negative residuals associated with observed fractional
prevalences in the range 20-30%, and a slightly disproportionate number of high (>2)
residuals. The latter are, however, drawn from a wide range of observations with
different prevalences. It is, indeed, plausible that the latter peculiarity is a side-effect
of the former, since if the residual histogram is visualised as a confounding of two
subpopulations, one centred on a value slightly larger than zero, and the other on a
value around –0.75, both sub-populations appear reasonably normally distributed in
the dataset. No points have been highlighted by Genstat as exhibiting high leverage.
Calculating Cook’s statistics for each observation, to identify observations which
combine large residuals with high leverage, shows no particular pattern. No
sub-population of the dataset appears to be having a consistently strong effect on the
model.




[Figure: VTPos; y-axis 0.0 to 3.0; x-axis: Fitted values suitably transformed (40 to 120)]



Plotting the Cook’s statistics against the various explanatory factors shows no
particular trend. Only one point stands out in this exercise: the point (Farm 515) with
the largest Cook’s statistic appears as an outlier in both the Highland level of the
Division factor and in the Housed with recent change level of the Housed.RecChnge
interaction term. However, removing this farm from the model has a negligible effect
on the residuals (and on the model and associated p-values in general).

The subpopulation of residuals corresponds to a group of farms with lower than
expected shedding levels. The predicted prevalence is in the range 20%-30%, while
that observed is much lower: typically only one or two positive pats. Examination of
the properties of these observations shows some pattern. They tend to be observations
from farms which lack any of the obvious risk factors or, where these are present, have
them offset by other, protective factors. Hence, their fitted risk is close to the
estimated mean, which is higher than the actual prevalence seen on these farms. This does
not appear to be a response to the inclusion of any specific factor in the model (given
the lack of evidence for significant leverage in the model); rather, it is a property of
the response distribution, where on some farms there are far fewer positive pats detected than on
apparently similar farms. This could reflect some unidentified and hence unmodelled
explanatory factor, or some peculiarity of the distribution which describes the random
terms. It is difficult to interpret such effects in purely random terms: the most obvious
aspect of the raw data, namely the apparent ‘bulge’ at high prevalences, can be
explained by various aspects of contagion models (such as the stochastic threshold
theorem) or by hypothesising the existence of hyper-shedding cattle. It is more
difficult to conceptualise a distributional effect which gives rise to a smaller
population at moderate prevalences.

If this sub-population does reflect a genuine and unidentified explanatory factor, at
least it is an unidentified protective factor rather than an unidentified risk factor.
Examination of the residuals would suggest that the residuals, although less than
perfect, are not sufficiently asymmetric to undermine the asymptotic assumptions
which underlie the calculation of standard errors and p-values. Hence, the results
reported in this document are still valid, and can be reported with confidence.




Analysing Bernoulli data (absence or presence of farm level infection)

Initially, the effect of the descriptive variables (Division, Sam_Month, Manage_O)
will be assessed:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Manage_O

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, Manage_O


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       3          1.0        0.328      0.33 0.805
Residual       948        996.0        1.051
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
          221         0.00      0.1845
          351         0.00      0.1845


*** Estimates of parameters ***

                                                               antilog of
                    estimate       s.e.      t(*)      t pr.     estimate
Constant              -1.291      0.196     -6.58      <.001       0.2750
Manage_O Beef          0.010      0.220      0.05      0.963        1.010
Manage_O Other         0.032      0.260      0.12      0.903        1.032
Manage_O Mixed         -4.27       6.95     -0.61      0.539      0.01400
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Manage_O Dairy


Manage_O shows no significant effects. Division shows more interesting effects.
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Division


***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, Division




*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       5          7.9        1.580      1.58 0.162
Residual       946        989.1        1.046
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                                 antilog of
                        estimate        s.e.      t(*)    t pr.    estimate
Constant                  -1.106       0.170     -6.52    <.001      0.3309
Division Highland         -0.612       0.336     -1.82    0.069      0.5423
Division Islands          -0.475       0.339     -1.40    0.161      0.6221
Division North East       -0.005       0.232     -0.02    0.982      0.9947
Division South East        0.017       0.260      0.07    0.948       1.017
Division South West       -0.354       0.236     -1.50    0.133      0.7020
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Division Central


Overall, there is no statistically significant evidence of any differences in the levels of
farm prevalence in different areas of Scotland. The prevalences in the Central, North-East
and South-East divisions are all comparable, with the prevalence in the South-West being
lower, and that in the Highlands and the Islands lower still.


[Figure: farm prevalence (y-axis, 0% to 35%) by division (x-axis): Central, Highlands, Islands, NE, SE, SW]

Plot of farm prevalences by animal health division (univariate analysis), with
95% confidence intervals.

The estimated prevalences of positive farms in different divisions are as follows:

Central        25%
Highlands      15%
Islands        17%
NE             25%
SE             25%
SW             19%
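
These percentages follow from the parameter estimates in the regression output. Each Division estimate is a difference from the reference level (Central) on the logit scale, so the prevalence for a division is the inverse logit of the constant plus that division's estimate, and the 'antilog of estimate' column is the corresponding odds ratio relative to Central. A minimal Python sketch of the calculation is given below; the function names are illustrative assumptions, and the estimates are copied from the output above.

```python
import math

def inv_logit(eta):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

constant = -1.106  # logit of farm prevalence in the reference level (Central)

# Differences from Central on the logit scale, copied from the estimates table
contrasts = {
    "Central": 0.0,
    "Highlands": -0.612,
    "Islands": -0.475,
    "NE": -0.005,
    "SE": 0.017,
    "SW": -0.354,
}

def prevalence(division):
    """Estimated proportion of positive farms in a division."""
    return inv_logit(constant + contrasts[division])

def odds_ratio(division):
    """Odds of a positive farm relative to Central (the 'antilog of estimate')."""
    return math.exp(contrasts[division])

for division in contrasts:
    print(f"{division:10s} {prevalence(division):4.0%}   odds ratio vs Central {odds_ratio(division):.4f}")
```

Confidence limits for divisions other than the reference level would also need the covariances between the estimates, which are not shown in this output, so the sketch is restricted to point estimates.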

These results are interesting: in particular, the high animal prevalence in the
Highlands is matched with a low farm prevalence. However, no trend is apparent when
the animal and farm prevalences are plotted by Division and, in general, it must be
stressed that the farm prevalence effects are not statistically significant.

Examining Sampling Month,
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Sam_Mon


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Sam_Mon


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression      11         19.0        1.731      1.73 0.060
Residual       940        978.0        1.040
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                            antilog of
                     estimate       s.e.     t(*)   t pr.     estimate
Constant               -2.031      0.376    -5.40   <.001       0.1311
Sam_Mon Feb             0.292      0.481     0.61   0.544        1.340
Sam_Mon Mar             0.784      0.435     1.80   0.071        2.190
Sam_Mon Apr             0.340      0.475     0.71   0.475        1.405
Sam_Mon May             1.051      0.432     2.43   0.015        2.860
Sam_Mon Jun             0.502      0.485     1.04   0.300        1.652
Sam_Mon Jul             1.010      0.465     2.17   0.030        2.745
Sam_Mon Aug             0.915      0.463     1.97   0.048        2.496
Sam_Mon Sep             1.030      0.466     2.21   0.027        2.801
Sam_Mon Oct             0.696      0.452     1.54   0.123        2.007
Sam_Mon Nov             1.364      0.466     2.93   0.003        3.910
Sam_Mon Dec             0.677      0.546     1.24   0.215        1.968
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
             Sam_Mon Jan

RKEEP ; ESTIMATES=Est; VCOVARIANCE=Var


Overall, there is no statistically significant evidence of any differences in farm
prevalence in different months. Examining the associated confidence intervals:


[Figure: farm prevalence (y-axis, 0% to 50%) by sampling month (x-axis, January to December)]




Plot of farm prevalences by sampling month, with 95% confidence intervals.

The estimated farm prevalences in different sampling months are as follows:

January          12%
February         15%
March            22%
April            16%
May              27%
June             18%
July             26%
August           25%
September        27%
October          21%
November         34%
December         21%

Although there is no formally statistically significant evidence of differences between
the mean prevalences on a month-by-month basis, a clear trend is visible in the data.
January, February and April are associated with the lowest three prevalences, while
the prevalence from March is fairly low. At the within-farm level, these months were
associated with some of the highest animal prevalences, explained by factors such as
housing of animals. Given the complex multivariate model which was required in the
analysis of the within-farm data, there is little point in exploring these properties
further before investigating the explanatory factors which might affect prevalence
levels.
The possible explanatory factors were explored in a univariate fashion using a Generalised
Linear Model; the results are summarised in the following table. The p-values
indicate the likely significance of the fitted values. Variables with p-values of less
than 5% are indicated in red, those in the range 5%-10% in blue. Those variables
which ultimately are found to be of interest in the multivariate analysis are indicated
by bold text.

Factor/Variable p-value          Comments
Manage_C                    0.67 ‘Beef’ and ‘Others' higher than 'Dairy'
Manage_O                    0.80 ‘Beef’ and ‘Others' higher than 'Dairy'
Division                    0.16 ‘Highland’ lower than others
Sam_Month                   0.06 Lower in January and February
Sample                      0.28 Lower in rectal samples
Sam_Year                   0.004 Consistent drop with time
Season                      0.04 Winter lower than other seasons
                                 Both Winter estimates lower than other seasons: final
SeasList                    0.01 Spring may also be lower

Sampler                     0.18 ‘Fiona' is higher than 'Helen'
                                 Higher numbers of finishing cattle associated with higher
                                 farm prevalence, probably better analysed as a factor,
N_F_Cattle                <0.001 below
FCattle                   <0.001 Groups 2 and 3 higher than group 1, group 4 higher again
                                 Probably better analysed as a factor, below: more groups
N_Groups                    0.04 associated with higher prevalence
GroupsCat                   0.08 More groups associated with higher prevalence
N_Sam_Gr                  <0.001 More sampling groups associated with higher prevalences
Min_Age                     0.74 Higher minimum age associated with lower prevalence
Max_Age                     0.31 Higher maximum age associated with lower prevalence
                                 ‘Buy in' and ‘Both’ higher prevalences than 'Breeding
Source                      0.01 only'
NewSource                   0.03 ‘Open' higher than 'Closed'
Breed                       0.03 ‘B_D_DB ' higher than others. No consistent pattern
                                 Farms with Housed animals are more likely to exhibit
Housed                      0.64 shedding animals: but this is not statistically significant
                                 ‘Byre’ excluded due to badly fitting model: too few
                                 observations. All alternatives have lower prevalences than
Housing                     0.17 ‘Court’.
NoChange                    0.87 1' higher than '0' (not sure of interpretation)
TDHouse                     0.46 Longer time associated with higher prevalences
Rec_Move                    0.66 A recent move is associated with lower prevalences
                                 Most recent move class 1 (<1 week) is lower than classes
RecMove2                    0.58 2 and 3 (>1 week)
                                 Farms with animals receiving supplementary feed less
SupFeed                     0.80 likely to be positive
RecDFeed                    0.69 Recent change in feed associated with higher prevalence
Forage                      0.39 Farms with animals having forage less likely to be



                                          78
                      positive
Silage           0.64 Farms with animals having silage less likely to be positive
                      Farms with animals having concentrate more likely to be
Concentrate      0.31 positive
Sil_Home         0.83 ‘Yes' is higher than 'No'
Sil_Manure       0.68 ‘Yes' is lower than 'No'
Sil_Slurry       0.16 ‘Yes' is higher than 'No'
Sil_Sewage       0.60 ‘Yes' is higher than 'No'
Sil_Geece        0.22 ‘Yes' is lower than 'No'
Sil_Gulls        0.57 ‘Yes' is higher than 'No'
Hay              0.87 ‘Yes' is lower than 'No'
Hay_Manure       0.68 ‘Yes' is lower than 'No'
Hay_Slurry       0.12 ‘Yes' is higher than 'No'
Hay_Sewage            No data points in class with Sewage on hay fields.
Hay_Geese        0.27 Geece present associated with lower prevalence
Hay_Gulls        0.22 Gulls present associated with lower prevalence
                      Farms reporting use of manure on grass less likely to be
Grass_Manure     0.02 positive for shedding
                      Farms reporting use of slurry on grass more likely to be
Grass_Slurry   <0.001 positive for shedding
                      Farms reporting use of sewage on grass less likely to be
Grass_Sewage     0.54 positive for shedding
                      Farms reporting geece on grass less likely to be positive
Grass_Geece      0.52 for shedding
                      Farms reporting gulls on grass more likely to be positive
Grass_Gulls      0.49 for shedding
N_Cattle        0.004 More cattle associated with higher prevalence
Cattle          0.002 Groups 2 and 3 show higher prevalences than group 1
                      Larger numbers of sheep are protective, but better
N_Sheep          0.41 analysed using a factor
Sheep            0.42 (Sheep absent or present) 'With' is higher than 'Without'
N_Goats          0.08 More goats associated with higher prevalence
Goats            0.44 (Goats absent or present) 'With' is higher than 'Without'
N_Horses         0.69 More horses associated with lower prevalence
N_Pigs           0.32 More pigs associated with lower prevalence
Pigs             0.01 (Pigs absent or present) 'With' is higher than 'Without'
N_Chickens       0.97 More chickens associated with lower prevalence
Chickens         0.46 (Chickens absent or present) 'With' is lower than 'Without'
N_Deer           0.28 More deer associated with higher prevalence
Deer             0.38 (Deer absent or present) 'With' is higher than 'Without'
Water            0.16 No obvious pattern
Mains            0.21 Mains supply farms have a higher mean prevalence
Natural          0.10 Natural supply farms have a lower mean prevalence
Private          0.34 Private supply farms have a lower mean prevalence
WaterCon         0.66 'With' is higher than 'Without'
                      All but 'None', 'Animal' and ASM thrown out for lack of
WaterCT          0.81 information: ordering 'Animals', 'None', 'ASM'
                      Those that wanted to know had lower prevalences than
Want2Know        0.68 those who did not
                      Those willing to have a 2nd visit had a lower prevalence
Visit2           0.82 than those who were not
                      'S' generated lower prevalences than 'D' and 'H'. 'H'
LabOperator      0.04 was lower than 'D'.
BeefonDairy      0.02 This class of farm exhibits a higher prevalence

Unlike the analysis of the prevalence data from positive farms, no factor appears to be
absolutely pivotal in defining the system in the way that the Housed/Unhoused
classification was for the Binomial data. The properties of the interesting factors
will therefore be reviewed in depth. These are N_F_Cattle/FCattle/N_Groups/GroupsCat/
N_Sam_Gr/Cattle/N_Cattle, Source/NewSource, Breed/BeefonDairy (BeefonDairy is defined
as a particular interaction of a management and a breed factor), Grass_Manure,
Grass_Slurry, N_Goats, Pigs and LabOperator. Sample_Year and a
variety of associated Sample_Month and/or Seasonal factors are all worth further
investigation as possible descriptive factors. Note that the variables have been grouped,
where appropriate, into equivalence classes of what are likely to be highly correlated factors.

All of the measures in the N_F_Cattle/FCattle/N_Groups/GroupsCat/N_Sam_Gr/Cattle/N_Cattle
group are associated with the size of the animal population on the farm. All of
these factors and variables associate higher numbers of cattle and/or groups with a higher
probability of the farm yielding a sample containing VT E. coli O157. Examining the
output from the model for N_F_Cattle, we note the high leverage associated with the
larger values of the explanatory variable.

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] N_F_Catt
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] N_F_Catt


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, N_F_Catt


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1         17.1       17.065     17.06 <.001
Residual       950        980.0        1.032
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           intermediate responses are more variable than small or large responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           65         0.00      0.0606
           70         1.00      0.0118
          130         1.00      0.0212
          172         0.00      0.0225
          286         1.00      0.0212
          308         1.00      0.0102
          422         0.00      0.0102
          440         1.00      0.0554
          444         1.00      0.0368
          450         0.00      0.0673
          454         0.00      0.0152
          455         0.00      0.0212
          496         0.00      0.0212
          499         0.00      0.0152
          527         0.00      0.0078
          529         1.00      0.0279
          545         0.00      0.0085
          552         0.00      0.0102
          578         1.00      0.0102
          683         1.00      0.0423
          737         0.00      0.0102
          775         0.00      0.0131
          781         1.00      0.0082
          838         1.00      0.0111
          861         1.00      0.0212
          874         0.00      0.0517
          884         1.00      0.0102
          920         0.00      0.0187
          952         0.00      0.0102


*** Estimates of parameters ***

                                                         antilog of
                  estimate         s.e.      t(*) t pr.    estimate
Constant            -1.567        0.108    -14.54 <.001      0.2087
N_F_Catt          0.003631    0.000884       4.11 <.001       1.004
* MESSAGE: s.e.s are based on dispersion parameter with value 1
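
The 'antilog of estimate' column is the exponential of each estimate on the logit scale, so for N_F_Catt it can be read as an odds ratio per additional animal. The following Python sketch is not part of the original GenStat analysis. It reproduces that column from the estimates printed above and rescales the slope to an increment of 100 animals. The function names and the choice of increment are illustrative assumptions.

```python
import math

# Estimates copied from the GenStat output above (logit scale).
CONSTANT = -1.567
SLOPE_N_F_CATT = 0.003631


def odds_ratio(estimate, increment=1.0):
    """Odds ratio implied by a logit-scale estimate for a given increment."""
    return math.exp(estimate * increment)


def inverse_logit(eta):
    """Convert a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# 'antilog of estimate' for the constant: the fitted odds when N_F_Catt = 0.
baseline_odds = odds_ratio(CONSTANT)
# Odds ratio per additional animal, as printed in the output.
per_animal = odds_ratio(SLOPE_N_F_CATT)
# Odds ratio per 100 additional animals (the increment is an assumption).
per_hundred = odds_ratio(SLOPE_N_F_CATT, 100)
# Fitted probability of a positive farm when N_F_Catt = 0.
baseline_probability = inverse_logit(CONSTANT)

print(round(baseline_odds, 4), round(per_animal, 3))
print(round(per_hundred, 2), round(baseline_probability, 3))
```

An odds ratio that looks negligible per animal is more easily judged on a coarser scale, which is one reason a categorised version of the variable can be easier to interpret.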

Large leverages arising from a sparse tail in the distribution of an explanatory
variable are generally a sign of a poor model. Hence, FCattle is to be preferred as an
explanatory variable. The output from this model still exhibits the same leverage
issues, but these effects are confined to the largest class of the factor, which is of
relatively little importance.
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] FCattle
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] FCattle

***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, FCattle


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       3         20.1        6.704      6.70 <.001
Residual       948        976.9        1.030
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           22         1.00      0.0158
           65         0.00      0.0158
           70         1.00      0.0158
           97         0.00      0.0158
          130         1.00      0.0158
          172         0.00      0.0158
          200         0.00      0.0158
          280         0.00      0.0158
          286         1.00      0.0158
          308         1.00      0.0158
          322         0.00      0.0158
          324         0.00      0.0158
         355         0.00         0.0158
         363         0.00         0.0158
         369         0.00         0.0158
         383         1.00         0.0158
         386         0.00         0.0158
         388         0.00         0.0158
         421         0.00         0.0158
         422         0.00         0.0158
         425         1.00         0.0158
         440         1.00         0.0158
         444         1.00         0.0158
         446         1.00         0.0158
         450         0.00         0.0158
         454         0.00         0.0158
         455         0.00         0.0158
         468         0.00         0.0158
         472         0.00         0.0158
         489         0.00         0.0158
         496         0.00         0.0158
         499         0.00         0.0158
         527         0.00         0.0158
         529         1.00         0.0158
         545         0.00         0.0158
         552         0.00         0.0158
         560         0.00         0.0158
         578         1.00         0.0158
         620         1.00         0.0158
         651         1.00         0.0158
         661         0.00         0.0158
         667         0.00         0.0158
         683         1.00         0.0158
         688         1.00         0.0158
         705         0.00         0.0158
         725         0.00         0.0158
         737         0.00         0.0158
         752         0.00         0.0158
         763         0.00         0.0158
         775         0.00         0.0158
         781         1.00         0.0158
         805         0.00         0.0158
         809         1.00         0.0158
         838         1.00         0.0158
         857         1.00         0.0158
         861         1.00         0.0158
         874         0.00         0.0158
         884         1.00         0.0158
         897         0.00         0.0158
         920         0.00         0.0158
         922         1.00         0.0158
         945         0.00         0.0158
         952         0.00         0.0158


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.649        0.126    -13.08    <.001     0.1923
FCattle 2            0.587        0.192      3.06    0.002      1.799
FCattle 3            0.588        0.214      2.75    0.006      1.800
FCattle 4            1.095        0.290      3.78    <.001      2.990
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
             FCattle 1
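
Because the factor parameters are differences from the reference level on the logit scale, the fitted probability for each FCattle level is obtained by adding the level effect to the constant and back-transforming. The Python sketch below is an illustration rather than part of the original analysis. It uses the rounded estimates printed above, and the dictionary and function names are assumptions introduced for the example.

```python
import math

# Rounded estimates copied from the GenStat output above (logit scale).
CONSTANT = -1.649
# Level 1 is the reference level, so its effect is zero by construction.
FCATTLE_EFFECTS = {1: 0.0, 2: 0.587, 3: 0.588, 4: 1.095}


def fitted_probability(level):
    """Fitted probability that a farm in the given FCattle level is positive."""
    eta = CONSTANT + FCATTLE_EFFECTS[level]
    return 1.0 / (1.0 + math.exp(-eta))


probabilities = {level: fitted_probability(level) for level in FCATTLE_EFFECTS}

for level, prob in probabilities.items():
    print(f"FCattle {level}: fitted probability {prob:.3f}")
```

On this scale levels 2 and 3 are almost indistinguishable, while level 4 stands apart from the rest.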

When the model is refitted, constrained to model only the smaller classes, the
following output is generated:
RESTRICT FCattle; CONDITION=FCattle.LT.4
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] FCattle
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] FCattle

* MESSAGE: Term FCattle cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

 (FCattle 4) = 0

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, FCattle


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       2         12.4        6.211      6.21 0.002
Residual       886        894.2        1.009
Total          888        906.6        1.021
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.649        0.126    -13.08    <.001     0.1923
FCattle 2            0.587        0.192      3.06    0.002      1.799
FCattle 3            0.588        0.214      2.75    0.006      1.800
FCattle 4                0            *         *        *      1.000
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
             FCattle 1

The effect is still highly significant (p=0.002). Hence, FCattle is always to be
preferred over N_F_Cattle.
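
The leverage argument used here can be illustrated with a small numerical sketch. The Python code below is not the GenStat procedure: it fits a logistic regression with a single continuous covariate by iteratively reweighted least squares and computes the leverages as the diagonal of the weighted hat matrix. The herd sizes and responses are invented for illustration and are not study data. The single large value plays the role of the sparse tail.

```python
import math


def logistic_leverages(x, y, iterations=25):
    """Fit logit(p) = b0 + b1 * x by IRLS; return (b0, b1, leverages)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        eta = [b0 + b1 * xi for xi in x]
        p = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [pi * (1.0 - pi) for pi in p]  # binomial variance weights
        # Working response of the IRLS algorithm.
        z = [e + (yi - pi) / wi for e, yi, pi, wi in zip(eta, y, p, w)]
        # Weighted sums needed for the 2 x 2 normal equations.
        s = sum(w)
        sx = sum(wi * xi for wi, xi in zip(w, x))
        sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        sz = sum(wi * zi for wi, zi in zip(w, z))
        sxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s * sxx - sx * sx
        b0 = (sxx * sz - sx * sxz) / det
        b1 = (s * sxz - sx * sz) / det
    # Diagonal of W^(1/2) X (X'WX)^(-1) X' W^(1/2), using the final weights.
    leverages = [
        wi * (sxx - 2.0 * xi * sx + xi * xi * s) / det
        for wi, xi in zip(w, x)
    ]
    return b0, b1, leverages


# Hypothetical herd sizes (not study data): eight small values and one outlier.
herd_size = [1, 2, 3, 4, 5, 6, 7, 8, 20]
positive = [0, 1, 0, 1, 0, 1, 0, 1, 0]

b0, b1, leverages = logistic_leverages(herd_size, positive)
for size, h in zip(herd_size, leverages):
    print(f"herd size {size:3d}: leverage {h:.3f}")
```

The leverages sum to the number of fitted parameters, so a single outlying value that takes a large share leaves the slope resting largely on one observation. Grouping the variable into a factor confines that influence to the highest class.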

Similar considerations apply to N_Groups, where the tail of the distribution has a
strong leverage on the model:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] N_Groups
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] N_Groups


***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, N_Groups


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1          4.0        4.044      4.04 0.044
Residual       950        993.0        1.045
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           65         0.00      0.0461
           97         0.00      0.0104
          249         0.00      0.0087
          293         0.00      0.0104
          324         0.00      0.0123
          440         1.00      0.0594
          450         0.00      0.0797
          454         0.00      0.0087
          487         0.00      0.0072
          494         0.00      0.0087
          496         0.00      0.1141
          527         0.00      0.0166
          529         1.00      0.0123
          545         0.00      0.0277
          552         0.00      0.0217
          748         0.00      0.0123
          781         1.00      0.0123
          861         1.00      0.0217
          922         1.00      0.2307
          945         0.00      0.0104
          946         1.00      0.0087


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.417        0.104    -13.56    <.001     0.2426
N_Groups            0.0375       0.0185      2.02    0.043      1.038
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Replacing N_Groups with GroupsCat gives rise to the following output:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] GroupsCat
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] GroupsCat

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, GroupsCat


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       3          6.7        2.230      2.23 0.082
Residual       948        990.3        1.045
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           50         0.00      0.0231
           65         0.00      0.0231
           97         0.00      0.0231
          172         0.00      0.0231
          249         0.00        0.0231
          254         1.00        0.0231
          285         0.00        0.0231
          293         0.00        0.0231
          324         0.00        0.0231
          330         0.00        0.0231
          331         0.00        0.0231
          440         1.00        0.0231
          450         0.00        0.0231
          454         0.00        0.0231
          459         0.00        0.0231
          460         1.00        0.0231
          487         0.00        0.0231
          494         0.00        0.0231
          496         0.00        0.0231
          520         1.00        0.0231
          527         0.00        0.0231
          529         1.00        0.0231
          545         0.00        0.0231
          552         0.00        0.0231
          599         1.00        0.0231
          667         0.00        0.0231
          688         1.00        0.0231
          692         0.00        0.0231
          709         0.00        0.0231
          748         0.00        0.0231
          761         0.00        0.0231
          775         0.00        0.0231
          781         1.00        0.0231
          813         1.00        0.0231
          839         0.00        0.0231
          857         1.00        0.0231
          861         1.00        0.0231
          864         1.00        0.0231
          901         0.00        0.0231
          922         1.00        0.0231
          945         0.00        0.0231
          946         1.00        0.0231
          952         0.00        0.0231


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.604        0.178     -9.03    <.001     0.2011
GroupsCat 2          0.391        0.203      1.92    0.054      1.478
GroupsCat 3          0.318        0.303      1.05    0.295      1.374
GroupsCat 4          0.876        0.370      2.37    0.018      2.401
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
           GroupsCat 1

On first review, this model output may appear less acceptable than the first, since the
number of high leverage observations is higher. However, these observations are exactly
those allocated to the highest level of the factor. The true suitability of the model
can again be examined by constraining the model to ignore this level.
RESTRICT GroupsCat; CONDITION=GroupsCat.LT.4
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] GroupsCat
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] GroupsCat

* MESSAGE: Term GroupsCat cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

  (GroupsCat 4) = 0




***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, GroupsCat


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       2          3.9        1.935      1.94 0.144
Residual       906        936.1        1.033
Total          908        939.9        1.035
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.604        0.178     -9.03    <.001     0.2011
GroupsCat 2          0.391        0.203      1.92    0.054      1.478
GroupsCat 3          0.318        0.303      1.05    0.295      1.374
GroupsCat 4              0            *         *        *      1.000
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
           GroupsCat 1

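Leverage is not a problem in this model, but much of the significance of the effects
has been lost. Regrouping N_Groups into a two-level factor, RevGCat (one group versus
more than one), gives the following output: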
GROUPS [LMETHOD=*; boundaries=upper] N_Groups; RevGCat; limits=!(1.5); LABELS=!T(One, More)
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] RevGCat
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] RevGCat


***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, RevGCat


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1          4.6        4.581      4.58 0.032
Residual       950        992.4        1.045
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.604        0.178     -9.03    <.001     0.2011
RevGCat 2            0.413        0.198      2.09    0.037      1.512
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
             RevGCat 1

Hence, farms with more than one sampling group are more likely to exhibit positive
samples (p=0.04). RevGCat is a more appropriate term to include in a model than
N_Groups or GroupsCat.
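
The 'approx chi pr' values in the summary tables can be checked by referring the regression deviance to a chi-squared distribution on the regression degrees of freedom, which is the natural reference when the dispersion parameter is fixed at 1. The Python sketch below is an illustration, not the GenStat implementation. It uses the closed-form upper tail of the chi-squared distribution for integer degrees of freedom, and the function name is an assumption. Deviances are taken as the printed mean deviance multiplied by the degrees of freedom, so agreement with the output is limited by rounding.

```python
import math


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-squared variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term = 1.0
        total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_j (x/2)^(j-1/2) / Gamma(j+1/2)
    total = 0.0
    term = math.sqrt(half) / math.gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# (label, regression deviance, regression d.f.) taken from the output above.
checks = [
    ("RevGCat", 4.581, 1),
    ("GroupsCat, levels 1-3 only", 2 * 1.935, 2),
    ("GroupsCat, all levels", 3 * 2.230, 3),
]
p_values = {label: chi2_upper_tail(dev, df) for label, dev, df in checks}

for label, p in p_values.items():
    print(f"{label}: p = {p:.3f}")
```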

Similar considerations apply to the N_Cattle and Cattle terms. N_Cattle is a
significant variable, but some of the larger values exert a strong leverage on the
results:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] N_Cattle
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] N_Cattle

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, N_Cattle


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1          8.3        8.275      8.27 0.004
Residual       950        988.7        1.041
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           62         1.00      0.0165
           70         1.00      0.0109
          182         0.00      0.0097
          200         0.00      0.0104
          201         0.00      0.0108
          310         0.00      0.0083
          370         0.00      0.0108
          418         0.00      0.0072
          444         1.00      0.0464
          460         1.00      0.0083
          494         0.00      0.0503
          496         0.00      0.0216
          527         0.00      0.1084
          599         1.00      0.0104
          651         1.00      0.0116
          680         0.00      0.0125
          737         0.00      0.0372
          748         0.00      0.0079
          750         1.00      0.0190
          761         0.00      0.0417
          763         0.00      0.0665
          769         0.00      0.0186
          884         1.00      0.0216




*** Estimates of parameters ***

                                                         antilog of
                  estimate         s.e.      t(*) t pr.    estimate
Constant            -1.466        0.103    -14.18 <.001      0.2307
N_Cattle          0.001299    0.000446       2.91 0.004       1.001
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Fitting Cattle gives similar results, but the leverage effects are confined to the two
largest levels.
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Cattle
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Cattle


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Cattle


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       3         14.4        4.815      4.81 0.002
Residual       948        982.6        1.036
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           62         1.00      0.0454
           70         1.00      0.0454
          165         0.00      0.0454
          182         0.00      0.0454
          200         0.00      0.0454
          201         0.00      0.0454
          284         1.00      0.0454
          310         0.00      0.0454
          348         0.00      0.0454
          370         0.00      0.0454
          418         0.00      0.0454
          437         0.00      0.0454
          444         1.00      0.1664
          460         1.00      0.0454
          494         0.00      0.1664
          496         0.00      0.0454
          527         0.00      0.1664
          599         1.00      0.0454
          603         1.00      0.0454
          651         1.00      0.0454
          680         0.00      0.0454
          737         0.00      0.1664
          748         0.00      0.0454
          750         1.00      0.0454
          761         0.00      0.1664
          763         0.00      0.1664
          769         0.00      0.0454
          884         1.00      0.0454


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.560        0.118    -13.24    <.001     0.2101
Cattle 2             0.514        0.162      3.18    0.001      1.672
Cattle 3             1.192        0.449      2.65    0.008      3.294
Cattle 4             -0.05         1.10     -0.04    0.964     0.9517
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Cattle 1

However, the leverage issues are restricted to the largest two levels of the factor.
Refitting the model, restricting the fit to lower levels, gives the following output:
RESTRICT Cattle; CONDITION=Cattle.LT.3

* MESSAGE: The structure Cattle is already restricted. Results may be unexpected.
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Cattle
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Cattle

* MESSAGE: Term Cattle cannot be fully included in the model
  because 2 parameters are aliased with terms already in the model

  (Cattle 3) = 0

  (Cattle 4) = 0

***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Cattle


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1         10.2       10.176     10.18 0.001
Residual       922        947.4        1.028
Total          923        957.6        1.037
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.560        0.118    -13.24    <.001     0.2101
Cattle 2             0.514        0.162      3.18    0.001      1.672
Cattle 3                 0            *         *        *      1.000
Cattle 4                 0            *         *        *      1.000
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Cattle 1

The Cattle factor is highly significant and well-fitting. It is therefore preferable to the
N_Cattle variable.
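
The significance of the Cattle 2 term can also be expressed as a Wald test and an approximate confidence interval for the odds ratio. The Python sketch below is illustrative only. It assumes a standard normal reference distribution for the ratio of estimate to standard error, which appears consistent with the printed 't pr.' values when the dispersion is fixed at 1, and it uses the rounded estimate and standard error from the output, so the results agree with GenStat only approximately.

```python
import math

# Rounded values copied from the restricted Cattle model output above.
ESTIMATE = 0.514      # Cattle 2 versus Cattle 1, logit scale
STANDARD_ERROR = 0.162
Z_95 = 1.96           # normal quantile for an approximate 95% interval


def two_sided_normal_p(z):
    """Two-sided tail probability of a standard normal variate."""
    return math.erfc(abs(z) / math.sqrt(2.0))


z_statistic = ESTIMATE / STANDARD_ERROR
p_value = two_sided_normal_p(z_statistic)

# Interval on the logit scale, back-transformed to the odds-ratio scale.
odds_ratio = math.exp(ESTIMATE)
lower = math.exp(ESTIMATE - Z_95 * STANDARD_ERROR)
upper = math.exp(ESTIMATE + Z_95 * STANDARD_ERROR)

print(f"z = {z_statistic:.2f}, two-sided p = {p_value:.4f}")
print(f"odds ratio = {odds_ratio:.3f}, approx 95% CI ({lower:.2f}, {upper:.2f})")
```

The interval excludes 1, in line with the small p-value reported for this term.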




Fitting N_Sam_Gr gives rise to the following output:
GROUPS [LMETHOD=*; boundaries=upper] N_Groups; RevGCat; limits=!(1.5); LABELS=!T(One, More)
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] N_Sam_Gr


***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, N_Sam_Gr


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1         23.1       23.052     23.05 <.001
Residual       950        974.0        1.025
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           18         0.00      0.0147
           54         0.00      0.0098
           59         0.00      0.0074
           61         1.00      0.0070
           70         1.00      0.0505
          107         0.00      0.0126
          123         0.00      0.0351
          149         0.00      0.0158
          167         0.00      0.0290
          267         1.00      0.0186
          363         0.00      0.0228
          413         0.00      0.0169
          440         1.00      0.0074
          503         0.00      0.0158
          510         0.00      0.0074
          532         1.00      0.0169
          544         0.00      0.0098
          578         1.00      0.0228
          584         1.00      0.0351
          603         1.00      0.0090
          609         0.00      0.0198
          620         1.00      0.0290
          637         1.00      0.0074
          681         1.00      0.0136
          703         1.00      0.0290
          743         0.00      0.0070
          781         1.00      0.0136
          831         0.00      0.0141
          838         1.00      0.0074
          891         0.00      0.0086
          906         0.00      0.0406
          924         0.00      0.0116


*** Estimates of parameters ***

                                                             antilog of
                    estimate         s.e.     t(*)   t pr.     estimate
Constant              -1.770        0.134   -13.23   <.001       0.1703
N_Sam_Gr             0.02106      0.00444     4.75   <.001        1.021
* MESSAGE: s.e.s are based on dispersion parameter with value 1


Again, many of the points have high leverage: these are farms with particularly high
numbers of animals. Examining the properties of N_Sam_Gr, we define a factor based
on the quartiles of its distribution.
DESCRIBE [SELECTION=nobs,nmv,mean,median,min,max,q1,q3] N_Sam_Gr


Summary statistics for N_Sam_Gr

     Number of observations   =   952
   Number of missing values   =   0
                       Mean   =   21.85
                     Median   =   17.00
                    Minimum   =   2.00
                    Maximum   =   177.00
             Lower quartile   =   11.00
             Upper quartile   =   28.00
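
As a rough illustration of how a quartile-based factor such as SamGrF can be built from N_Sam_Gr, the following Python sketch assigns each group size to one of four levels using the limits 11, 17 and 28 reported above. It assumes that each limit is an inclusive upper boundary (which is how the boundaries=upper setting is read here); the function name and the example sizes are hypothetical.

```python
from bisect import bisect_left

# Upper limits of the first three levels: lower quartile, median, upper quartile.
LIMITS = [11, 17, 28]


def sam_gr_level(n_samples):
    """Return the 1-based factor level for a group with n_samples samples.

    Assumption: each limit is an inclusive upper boundary, so a group of 11
    falls in level 1 and a group of 12 falls in level 2.
    """
    return bisect_left(LIMITS, n_samples) + 1


# Hypothetical group sizes spanning the observed range (2 to 177).
example_sizes = [2, 11, 12, 17, 18, 28, 29, 177]
levels = [sam_gr_level(n) for n in example_sizes]

for size, level in zip(example_sizes, levels):
    print(f"N_Sam_Gr = {size:3d} -> level {level}")
```

Under this reading, the top level collects every group larger than the upper quartile, which is where the high-leverage farms sit.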

GROUPS [LMETHOD=*; boundaries=upper] N_Sam_Gr; SamGrF; limits=!(11,17,28)
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] SamGrF


***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, SamGrF


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       3         32.2       10.728     10.73 <.001
Residual       948        964.8        1.018
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -2.089        0.204    -10.24    <.001     0.1239
SamGrF 2             0.836        0.257      3.25    0.001      2.307
SamGrF 3             0.847        0.256      3.31    <.001      2.332
SamGrF 4             1.330        0.248      5.37    <.001      3.782
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              SamGrF 1


This factor fits well and is extremely statistically significant. Hence, SamGrF is
preferred to N_Sam_Gr for further analysis.
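
To make the logit-scale estimates easier to read, the sketch below converts the SamGrF parameter estimates printed above into fitted probabilities of a farm being positive, and into odds ratios relative to level 1. It is only an illustration using the rounded estimates from the output, so the last digit can differ slightly from the printed antilogs; the variable names are hypothetical.

```python
import math


def inv_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Rounded estimates copied from the output above; level 1 is the reference.
constant = -2.089
effects = {1: 0.0, 2: 0.836, 3: 0.847, 4: 1.330}

fitted = {level: inv_logit(constant + effect) for level, effect in effects.items()}
odds_ratios = {level: math.exp(effect) for level, effect in effects.items()}

for level in sorted(effects):
    print(f"SamGrF {level}: fitted probability {fitted[level]:.3f}, "
          f"odds ratio vs level 1 {odds_ratios[level]:.3f}")
```

Read this way, the fitted probability of a positive result rises from roughly 0.11 in the smallest sampled groups to roughly 0.32 in the largest.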



Since Natural was such an important factor in the levels of shedding analysis, and the
observed p-value in this analysis was only marginally above 0.1, it is worthwhile to
review the effect of this factor in more depth. Only 7 farms have housed animals and a
natural source of water, so we focus on unhoused animals and use the factor Natural2
to review the effect of natural water supplies on unhoused animals only. The observed
p-value then increases to 0.12. Hence, this factor is not considered for inclusion in
the multifactor model.

Hence, FCattle, RevGCat, SamGrF and Cattle are the preferred factors for further
review, with the other factors being removed primarily for reasons of model fit.

The factors FCattle, RevGCat, SamGrF and Cattle form a complex in which each factor
associates a higher risk of shedding being identified on a farm with larger numbers of
cattle. Exploring this complex using forward stepwise selection, with the Akaike
information criterion used to select candidates for inclusion/exclusion, we generate
the following output:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
RSEARCH [PRINT=model,results; METHOD=fstepwise; CONSTANT=estimate; FACTORIAL=3; DENOMINATOR=ss;\
  INRATIO=1; OUTRATIO=1; MAXCYCLE=50; AFACTORIAL=2; CRITERION=aic; EXTRA=cp; NTERMS=60;\
  NBESTMODELS=8] FCattle+RevGCat+SamGrF+Cattle

***** Model Selection *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
  Number of units:   952
     Forced terms:   Constant
        Forced df:   1
       Free terms:   FCattle + RevGCat + SamGrF + Cattle


*** Stepwise (forward) analysis of deviance ***

Change                                                      mean    deviance approx
                                 d.f.     deviance      deviance       ratio chi pr
+ SamGrF                            3       32.184        10.728       10.73 <.001
+ Cattle                            3       10.905         3.635        3.63 0.012
+ FCattle                           3        6.721         2.240        2.24 0.081
Residual                          942      947.210         1.006

Total                             951      997.020          1.048

        Final model: Constant + SamGrF + Cattle + FCattle


The factor categorising the numbers of sampling groups (SamGrF) is the most relevant,
but the factor categorising the number of cattle on the farm (Cattle) also shows signs
of strong statistical significance (p=0.012). The factor categorising the number of
finishing cattle (FCattle) shows weaker signs of statistical significance (p=0.081),
even in the presence of the other two factors. Only the factor categorising the total
numbers of groups of cattle on the farm (RevGCat) is found to lack any real statistical
significance, and it is not selected. On this basis, each of the factors FCattle,
SamGrF and Cattle should be candidates for inclusion in the multivariate model.
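
The approximate chi-square probabilities in the stepwise table can be checked by hand. The sketch below computes the upper-tail chi-square probability for each change in deviance and also the quantity "deviance drop minus twice the degrees of freedom", which is positive exactly when adding the term lowers an AIC of the usual form (deviance plus twice the number of parameters). That this is the rule applied by the selection procedure is an assumption here, and the function name chi2_sf is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df % 2 == 1:
        q = math.erfc(math.sqrt(x / 2.0))  # exact result for df = 1
        k = 1
    else:
        q = math.exp(-x / 2.0)             # exact result for df = 2
        k = 2
    # Recurrence: Q_{k+2}(x) = Q_k(x) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


# (term, d.f., change in deviance) copied from the stepwise table above.
steps = [("SamGrF", 3, 32.184), ("Cattle", 3, 10.905), ("FCattle", 3, 6.721)]

results = {}
for term, df, deviance in steps:
    p_value = chi2_sf(deviance, df)
    aic_drop = deviance - 2 * df  # positive when the term lowers AIC
    results[term] = (p_value, aic_drop)
    print(f"{term:8s} p = {p_value:.3f}  deviance - 2*d.f. = {aic_drop:.3f}")
```

On this reckoning FCattle lowers the criterion only slightly (by about 0.7), which is consistent with its comparatively weak p-value of 0.081.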

Considering Source and NewSource as candidate factors, fitting Source (the basic data) gives
the following output:




MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Source


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Source


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       2          4.7        2.341      2.34 0.096
Residual       949        992.3        1.046
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                               antilog of
                  estimate         s.e.      t(*)        t pr.   estimate
Constant            -1.418        0.103    -13.73        <.001     0.2422
Source Buy           0.326        0.190      1.71        0.087      1.385
Source Both          0.372        0.212      1.75        0.080      1.451
* MESSAGE: s.e.s are based on dispersion parameter       with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Source Breed


The factor shows a moderate level of statistical significance, but this is entirely due to
the differences between the class of farms which never buy replacement cattle on one
hand, and those which buy or do both on the other. There is no evidence of any
statistically significant difference between these latter two groups: t=0.16, p=0.87.
Hence, it would seem sensible to replace Source with a new factor, NewSource,
which consolidates the farms into a single ‘Open’ class and a ‘Closed’ class. Fitting
this factor gives:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] NewSource


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, NewSource


*** Summary of analysis ***

                                            mean    deviance approx
              d.f.      deviance        deviance       ratio chi pr
Regression       1           4.6           4.647        4.65 0.031
Residual       950         992.4           1.045
Total          951         997.0           1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.418        0.103    -13.73    <.001     0.2422
NewSource 2          0.346        0.159      2.17    0.030      1.413
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
           NewSource 1

Farms which never buy in replacement cattle have statistically significantly (p=0.03)
lower risk of exhibiting a shedding animal than those which occasionally or
frequently buy animals in. NewSource will be a candidate factor in the multivariate
analysis.
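
For a sense of the size of the NewSource effect, the following sketch turns the printed estimate and standard error into an odds ratio with an approximate 95% Wald interval, and recomputes the two-sided p-value from a normal approximation. The interval is not part of the original output; it is an illustrative calculation from the rounded figures above, and the variable names are hypothetical.

```python
import math

# Rounded values copied from the NewSource output above (level 2 vs level 1).
estimate = 0.346
std_err = 0.159
z_crit = 1.96  # normal quantile for an approximate 95% interval

odds_ratio = math.exp(estimate)
lower = math.exp(estimate - z_crit * std_err)
upper = math.exp(estimate + z_crit * std_err)

# Two-sided p-value for the Wald statistic under a standard normal reference.
wald_z = estimate / std_err
p_two_sided = math.erfc(abs(wald_z) / math.sqrt(2.0))

print(f"odds ratio {odds_ratio:.3f}, approx. 95% interval ({lower:.3f}, {upper:.3f})")
print(f"Wald z = {wald_z:.2f}, two-sided p = {p_two_sided:.3f}")
```

The lower limit of the interval sits only just above 1, in keeping with a p-value close to 0.03.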

BeefonDairy is a variable defined after close consideration of the properties of the
dataset, in particular, Breed and Manage_O. Breed shows some evidence of
significance in the bivariate analysis, but there is also evidence that the effect is
confined to a subset of farms. Manage_O exhibits no evidence of significant
differences in prevalence, but is important in understanding the patterns seen in
Breed.

Fitting Breed as a main effect gives the following output:
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Breed


* MESSAGE: Term Breed cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

  (Breed B_D) = 0

***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Breed


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       5         12.6        2.513      2.51 0.028
Residual       946        984.5        1.041
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           intermediate responses are more variable than small or large responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
            7         1.00       0.091
            8         0.00       0.024
           17         1.00       0.091
           60         0.00       0.024
           87         0.00       0.024
          101         1.00       0.091
          110         0.00       0.024
          113         0.00       0.091
          116         0.00       0.091
          118         0.00       0.091
          184         0.00       0.024
          185         0.00       0.091
          223         0.00       0.024
          280         0.00       0.024
          291         0.00       0.024
          306         0.00       0.024
          314         0.00       0.024
          338         0.00       0.024
          345         0.00       0.024
          350         0.00       0.024
          447         0.00       0.091
          479         0.00       0.024
          485         0.00       0.024
          494         0.00       0.024
          542         0.00       0.024
          593         0.00       0.024
          595         0.00       0.024
          596         0.00       0.024
          598         0.00       0.024
          599         1.00       0.024
          600         0.00       0.166
          607         0.00       0.024
          619         0.00       0.024
          620         1.00       0.166
          637         1.00       0.166
          645         0.00       0.024
          646         1.00       0.024
          661         0.00       0.024
          688         1.00       0.166
          702         0.00       0.024
          708         0.00       0.024
          725         0.00       0.024
          728         0.00       0.024
          729         0.00       0.024
          735         0.00       0.024
          747         0.00       0.024
          755         0.00       0.091
          762         0.00       0.024
          813         1.00       0.024
          825         1.00       0.024
          826         0.00       0.024
          856         0.00       0.024
          859         1.00       0.091
          864         1.00       0.024
          884         1.00       0.166
          896         0.00       0.024
          911         0.00       0.166
          951         0.00       0.024
          952         0.00       0.091


*** Estimates of parameters ***

                                                            antilog of
                  estimate          s.e.     t(*)   t pr.     estimate
Constant           -1.2595        0.0872   -14.44   <.001       0.2838
Breed DB            -0.532         0.352    -1.51   0.131       0.5873
Breed D              0.700         0.632     1.11   0.268        2.014
Breed B_DB           0.182         0.301     0.60   0.546        1.200
Breed DB_D          -0.742         0.484    -1.53   0.126       0.4762
Breed B_D                0             *        *       *        1.000
Breed B_D_DB         1.953         0.868     2.25   0.024        7.047
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
               Breed B


However, the pattern is very different on different types of farm.

Tabulating the number of farms and number of positive farms with respect to their
recorded values for Breed and Manage_O gives the following results (the number of
farms recorded as “Mixed” is too small for any statistical analysis, and these farms
are excluded; no animals were recorded as “B_D”):

Number of farms
                Dairy      Beef     Other
B                  11       576       173
DB                 59         3         8
D                  11         -         -
B_DB               25        18        18
DB_D               42         -         -
B_D_DB              5         1         -

Number of positive farms
                Dairy      Beef     Other
B                   6       123        39
DB                  9         0         1
D                   4         -         -
B_DB                6         6         4
DB_D                5         -         -
B_D_DB              3         1         -

The means and marginal means for these tables are given by:

                Dairy      Beef     Other       All
B               0.545     0.214     0.225     0.221
DB              0.153     0.000     0.125     0.143
D               0.364         -         -     0.364
B_DB            0.240     0.333     0.222     0.262
DB_D            0.119         -         -     0.119
B_D_DB          0.600     1.000         -     0.667
All             0.216     0.217     0.221     0.218
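
The means and marginal means can be reproduced directly from the two count tables. The sketch below is a minimal check in Python, with the counts typed in from the tables above and empty cells omitted; the dictionary layout and variable names are hypothetical.

```python
# Farm counts and positive-farm counts by Breed (rows) and Manage_O (columns),
# copied from the tables above. Cells shown as "-" are omitted.
farms = {
    "B":      {"Dairy": 11, "Beef": 576, "Other": 173},
    "DB":     {"Dairy": 59, "Beef": 3, "Other": 8},
    "D":      {"Dairy": 11},
    "B_DB":   {"Dairy": 25, "Beef": 18, "Other": 18},
    "DB_D":   {"Dairy": 42},
    "B_D_DB": {"Dairy": 5, "Beef": 1},
}
positives = {
    "B":      {"Dairy": 6, "Beef": 123, "Other": 39},
    "DB":     {"Dairy": 9, "Beef": 0, "Other": 1},
    "D":      {"Dairy": 4},
    "B_DB":   {"Dairy": 6, "Beef": 6, "Other": 4},
    "DB_D":   {"Dairy": 5},
    "B_D_DB": {"Dairy": 3, "Beef": 1},
}
columns = ["Dairy", "Beef", "Other"]

# Cell means: proportion of positive farms in each Breed x Manage_O cell.
cell_mean = {b: {m: positives[b][m] / n for m, n in row.items()}
             for b, row in farms.items()}

# Marginal means pool the counts, so they are weighted by the number of farms.
row_mean = {b: sum(positives[b].values()) / sum(row.values())
            for b, row in farms.items()}
col_mean = {m: sum(positives[b].get(m, 0) for b in farms)
               / sum(farms[b].get(m, 0) for b in farms)
            for m in columns}

total_farms = sum(sum(row.values()) for row in farms.values())
total_positives = sum(sum(row.values()) for row in positives.values())
overall = total_positives / total_farms

for breed in farms:
    cells = "  ".join(f"{m}: {cell_mean[breed][m]:.3f}" for m in farms[breed])
    print(f"{breed:7s} {cells}  All: {row_mean[breed]:.3f}")
print("All     " + "  ".join(f"{m}: {col_mean[m]:.3f}" for m in columns)
      + f"  All: {overall:.3f}")
```

Because the marginal means pool the counts, the "All" row is dominated by the large B (beef animal) classes on beef and 'Other' farms.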

Overall, there are clearly no significant differences between the mean prevalences on
the different classes of farm. Likewise, there is no clear evidence of any differences
in the prevalence rates for different breeds on beef farms, and no evidence of any
differences in the prevalence rates for different breeds on ‘Other’ farms. Similarly,
for every breed except beef animals, there is no evidence of any differences in
prevalence for the breed on different types of farm. However, an attempt to fit the
interaction of Breed and Manage_O to the prevalence data gives the following output:
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed.Manage_O
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Breed.Manage_O


* MESSAGE: Term Breed.Manage_O cannot be fully included in the model
  because 14 parameters are aliased with terms already in the model

 (Breed B .Manage_O Mixed) = 0

 (Breed DB .Manage_O Mixed) = 0

 (Breed D .Manage_O Beef) = 0

 (Breed D .Manage_O Other) = 0

 (Breed D .Manage_O Mixed) = 0

 (Breed DB_D .Manage_O Beef) = 0

 (Breed DB_D .Manage_O Other) = 0

 (Breed DB_D .Manage_O Mixed) = 0

 (Breed B_D .Manage_O Dairy) = 0

 (Breed B_D .Manage_O Beef) = 0

 (Breed B_D .Manage_O Other) = 0

 (Breed B_D .Manage_O Mixed) = 0

 (Breed B_D_DB .Manage_O Other) = 0

 (Breed B_D_DB .Manage_O Mixed) = 0

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant + Breed.Manage_O


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression      13         22.0        1.689      1.69 0.056
Residual       938        975.1        1.040
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
            5         0.00       0.125
            7         1.00       0.091
            9         0.00       0.056
           17         1.00       0.091
           30         1.00       0.056
           89         0.00       0.056
           93         0.00       0.123
          101         1.00       0.091
          113         0.00       0.091
          114         0.00       0.056
          116         0.00       0.091
          118         0.00       0.091
          131         1.00       0.091
          143         1.00       0.056
          148         0.00       0.123
          183         1.00       0.056
          185         0.00       0.091
          221         0.00       0.184
         222         0.00         0.056
         274         0.00         0.125
         297         0.00         0.091
         301         1.00         0.091
         316         0.00         0.125
         340         0.00         0.056
         343         1.00         0.056
         351         0.00         0.184
         384         0.00         0.091
         385         1.00         0.091
         391         0.00         0.125
         440         1.00         0.056
         441         0.00         0.125
         447         0.00         0.091
         461         0.00         0.056
         467         0.00         0.125
         469         0.00         0.056
         495         1.00         0.056
         497         0.00         0.091
         503         0.00         0.056
         544         0.00         0.123
         550         1.00         0.056
         572         1.00         0.125
         590         0.00         0.125
         600         0.00         0.200
         601         1.00         0.091
         602         0.00         0.091
         620         1.00         0.200
         629         0.00         0.056
         636         0.00         0.056
         637         1.00         0.200
         640         1.00         0.056
         660         0.00         0.056
         667         0.00         0.056
         688         1.00         0.369
         701         1.00         0.091
         751         0.00         0.056
         755         0.00         0.091
         767         0.00         0.056
         777         0.00         0.056
         788         1.00         0.091
         806         0.00         0.056
         809         1.00         0.056
         810         0.00         0.056
         812         0.00         0.056
         816         0.00         0.056
         835         0.00         0.056
         858         0.00         0.091
         859         1.00         0.091
         866         0.00         0.056
         867         1.00         0.056
         882         0.00         0.056
         884         1.00         0.200
         895         0.00         0.056
         906         0.00         0.056
         911         0.00         0.200
         912         0.00         0.056
         923         0.00         0.056
         952         0.00         0.091


*** Estimates of parameters ***

                                                                       antilog of
                              estimate          s.e.    t(*)   t pr.     estimate
Constant                         0.182         0.606    0.30   0.763        1.200
Breed B .Manage_O Beef          -1.486         0.614   -2.42   0.016       0.2263
Breed B .Manage_O Other         -1.417         0.632   -2.24   0.025       0.2425
Breed B .Manage_O Mixed              0             *       *       *        1.000
Breed DB .Manage_O Dairy        -1.897         0.706   -2.69   0.007       0.1500
Breed DB .Manage_O Beef          -6.75          9.36   -0.72   0.471     0.001175
Breed DB .Manage_O Other         -2.13          1.23   -1.73   0.083       0.1190
Breed DB .Manage_O Mixed             0             *       *       *        1.000
Breed D .Manage_O Dairy         -0.742         0.872   -0.85   0.395       0.4762
Breed D .Manage_O Beef               0             *       *       *        1.000
Breed D .Manage_O Other              0             *       *       *        1.000
Breed D .Manage_O Mixed              0             *       *       *        1.000
Breed B_DB .Manage_O Dairy      -1.335         0.765   -1.74   0.081       0.2632
Breed B_DB .Manage_O Beef       -0.875         0.785   -1.11   0.265       0.4167
Breed B_DB .Manage_O Other      -1.435         0.830   -1.73   0.084       0.2381
Breed B_DB .Manage_O Mixed        -6.7          11.5   -0.59   0.556     0.001175
Breed DB_D .Manage_O Dairy      -2.184         0.771   -2.83   0.005       0.1126
Breed DB_D .Manage_O Beef            0             *       *       *        1.000
Breed DB_D .Manage_O Other           0             *       *       *        1.000
Breed DB_D .Manage_O Mixed           0             *       *       *        1.000
Breed B_D .Manage_O Dairy            0             *       *       *        1.000
Breed B_D .Manage_O Beef             0             *       *       *        1.000
Breed B_D .Manage_O Other            0             *       *       *        1.000
Breed B_D .Manage_O Mixed            0             *       *       *        1.000
Breed B_D_DB .Manage_O Dairy      0.22          1.10    0.20   0.839        1.250
Breed B_D_DB .Manage_O Beef       5.04          8.33    0.60   0.545        153.9
Breed B_D_DB .Manage_O Other         0             *       *       *        1.000
Breed B_D_DB .Manage_O Mixed         0             *       *       *        1.000
* MESSAGE: s.e.s are based on dispersion parameter with value 1


The model fit is extremely messy: many of the terms are aliased, and the leverage
situation is extremely complicated. The model fit has a p-value of 0.056, not quite
formally significant, but rather impressive given that 13 degrees of freedom have been
used to fit interaction terms where we believe that only one term is likely to be
significant.

As noted earlier, there is no evidence of any pattern as a function of breed in the beef
and ‘Other’ herds: hence it might be informative to examine the output from fitting
Breed to only the Dairy herds:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Breed


* MESSAGE: Term Breed cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

  (Breed B_D) = 0

***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Breed


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       5         14.6       2.9249      2.92 0.012
Residual       147        144.9       0.9859
Total          152        159.5       1.0496
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
          600         0.00       0.199
          620         1.00       0.199
          637         1.00       0.199
          884         1.00       0.199
          911         0.00       0.199


*** Estimates of parameters ***

                                                            antilog of
                  estimate         s.e.      t(*)     t pr.   estimate
Constant             0.182        0.605      0.30     0.763      1.200
Breed DB            -1.897        0.705     -2.69     0.007     0.1500
Breed D             -0.742        0.870     -0.85     0.394     0.4762
Breed B_DB          -1.335        0.764     -1.75     0.081     0.2632
Breed DB_D          -2.184        0.770     -2.84     0.005     0.1126
Breed B_D                0            *         *         *      1.000
Breed B_D_DB          0.22         1.09      0.20     0.838      1.250
* MESSAGE: s.e.s are based on dispersion parameter    with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
               Breed B


The resulting model is statistically significant (p=0.01). It may be informative to
examine confidence intervals for the mean prevalences for different breeds in dairy
herds:


[Figure: confidence intervals for the mean prevalence by breed in dairy herds.
Vertical axis: Mean Prevalence (0 to 1). Horizontal axis: Breed of Animal
(B, DB, D, B_D, DB_D, B_D_DB).]



Restricting attention only to animals outwith the B or B_D_DB classes, the following
output is generated:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Breed


* MESSAGE: Term Breed cannot be fully included in the model
  because 3 parameters are aliased with terms already in the model

 (Breed B) = 0

 (Breed B_D) = 0

 (Breed B_D_DB) = 0

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, Breed


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       3          4.1       1.3683      1.37 0.250
Residual       133        123.0       0.9251
Total          136        127.1       0.9348
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
            7         1.00       0.091
           17         1.00       0.091
          101         1.00       0.091
          113         0.00       0.091
          116         0.00       0.091
          118         0.00       0.091
          185         0.00       0.091
          447         0.00       0.091
          755         0.00       0.091
          859         1.00       0.091
          952         0.00       0.091


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.715        0.362     -4.74    <.001     0.1800
Breed B                  0            *         *        *      1.000
Breed D              1.155        0.723      1.60    0.110      3.175
Breed B_DB           0.562        0.591      0.95    0.341      1.754
Breed DB_D          -0.287        0.598     -0.48    0.632     0.7508
Breed B_D                0            *         *        *      1.000
Breed B_D_DB             0            *         *        *      1.000
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
               Breed DB


There is no evidence of any differences between the prevalences on these classes of
farms. Examining the B and B_D_DB classes, tabulating their positive and negative
values and carrying out a Fisher’s Exact test, we get:
FEXACT2X2 [PRINT=prob] C1

***** Fisher's Exact Test *****

One-tailed significance level                    0.635
                                 Mid-P value     0.433

Two-tailed significance level
     Two times one-tailed significance level     1.269
                                 Mid-P value     0.865
     Sum of all outcomes with Prob<=Observed     1.000
                                 Mid-P value     0.798
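
The quantities in this output can be illustrated with a minimal sketch in Python. The cell counts for the B and B_D_DB classes are not printed in the report, so the table used below is purely hypothetical, and the function name fisher_exact_2x2 is an illustrative assumption rather than the Genstat procedure. The choice of tail, and the mid-P definition (the usual value less half the probability of the observed table), are also assumptions, although the latter is consistent with the arithmetic of the figures printed above.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Exact tests for the 2x2 table [[a, b], [c, d]] with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    denom = comb(n, col1)
    # Hypergeometric probability of each possible value of the top-left cell.
    prob = {k: comb(row1, k) * comb(n - row1, col1 - k) / denom
            for k in range(lo, hi + 1)}
    p_obs = prob[a]
    # One tail: tables with a top-left cell at least as large as observed
    # (assumption: the package may report the opposite tail).
    upper = sum(p for k, p in prob.items() if k >= a)
    # Two-sided: every table no more probable than the observed one.
    two_sided = sum(p for p in prob.values() if p <= p_obs * (1 + 1e-12))
    return {
        "one_tailed": upper,
        "one_tailed_mid_p": upper - 0.5 * p_obs,
        "two_times_one_tailed": 2 * upper,          # may exceed 1
        "two_sided": two_sided,
        "two_sided_mid_p": two_sided - 0.5 * p_obs,
    }


# Hypothetical counts: rows are two farm classes, columns are positive/negative.
result = fisher_exact_2x2(3, 1, 1, 3)
for name, value in result.items():
    print(f"{name:22s} {value:.3f}")
```

Doubling the one-tailed value is not capped at 1 here, which is why a figure such as 1.269 can appear in the output above.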


There is no evidence of any difference in prevalence between the B and B_D_DB
classes in dairy herds. However, fitting a model only to dairy herds, while excluding
the beef class, gives the following output:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Breed


* MESSAGE: Term Breed cannot be fully included in the model
  because 2 parameters are aliased with terms already in the model

  (Breed B) = 0

  (Breed B_D) = 0

***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Breed


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       4          8.4       2.0954      2.10 0.079
Residual       137        129.8       0.9472
Total          141        138.1       0.9798
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
          600         0.00       0.199
          620         1.00       0.199
          637         1.00       0.199
          884         1.00       0.199
          911         0.00       0.199


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.715        0.362     -4.74    <.001     0.1800
Breed B                  0            *         *        *      1.000
Breed D              1.155        0.723      1.60    0.110      3.175
Breed B_DB           0.562        0.591      0.95    0.341      1.754
Breed DB_D          -0.287        0.598     -0.48    0.632     0.7508
Breed B_D                0            *         *        *      1.000
Breed B_D_DB         2.120        0.980      2.16    0.031      8.333
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
               Breed DB



Hence, although the prevalence in group B_D_DB is higher, strictly speaking the
overall Breed effect is not statistically significant (p=0.08), so this class cannot
formally be declared different from the lower-prevalence classes. However, the
sample size is extremely small, and the comparison will have lacked power.

The greatest danger in this exercise is to overtrawl the data. The overall effect of
fitting the Manage_O by Breed interaction was close to formal statistical significance.
Hence, treating the overall test as a form of Fisher's protected test for multiple
comparisons, we are justified in investigating the properties of individual interaction
terms. However, it would seem unwise to be overly liberal in then assigning importance
to extremely small subsets of the data which themselves lack formal statistical
significance. In addition, the effect of beef animals on dairy herds appears to be
specific to this type of farm. It is impossible to have the same confidence about the
properties of the B_D_DB class, since the sample size in anything but the dairy herd
is negligible.

In conclusion, it seems rational to create a new variable, BeefonDairy, to identify
those farms with beef animals and a dairy management system. Fitting this variable
gives the following results:
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] BeefonDairy
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] BeefonDairy


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, BeefonDairy


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1          5.7        5.685      5.69 0.017
Residual       950        991.3        1.044
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
          131         1.00      0.0908
          297         0.00      0.0908
          301         1.00      0.0908
          384         0.00      0.0908
          385         1.00      0.0908
          497         0.00      0.0908
          601         1.00      0.0908
          602         0.00      0.0908
          701         1.00      0.0908
          788         1.00      0.0908
          858         0.00      0.0908




*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant           -1.3033       0.0794    -16.42    <.001     0.2716
BeefonDairy 1        1.486        0.610      2.43    0.015      4.418
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
             BeefonDairy 0
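
As a worked illustration of how the logit-scale estimates above translate into odds and probabilities, the following Python sketch recomputes the "antilog of estimate" column and converts the fitted values back to the probability scale. The 95% interval uses a normal (Wald) approximation, which is an assumption introduced here and not part of the Genstat output; the variable names are illustrative.

```python
from math import exp


def logit_to_prob(eta):
    """Inverse logit: convert a log-odds value to a probability."""
    return 1.0 / (1.0 + exp(-eta))


# Estimates and standard error as printed in the output above.
constant, beta, se = -1.3033, 1.486, 0.610

odds_ratio = exp(beta)                       # the "antilog of estimate" column
# Approximate 95% Wald interval on the odds-ratio scale (assumption).
ci_low, ci_high = exp(beta - 1.96 * se), exp(beta + 1.96 * se)

p_reference = logit_to_prob(constant)        # BeefonDairy = 0
p_beef_on_dairy = logit_to_prob(constant + beta)   # BeefonDairy = 1

print(f"odds ratio             {odds_ratio:.3f}")
print(f"approx. 95% interval   ({ci_low:.2f}, {ci_high:.2f})")
print(f"fitted P(positive), 0  {p_reference:.3f}")
print(f"fitted P(positive), 1  {p_beef_on_dairy:.3f}")
```

The width of the interval reflects the small number of farms in the BeefonDairy class.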

Farms in this class appear to have a significantly (p=0.02) higher prevalence.
However, care must be taken over interpreting this factor, since it is derived from an
extensive examination of the properties of the dataset.

Nevertheless, BeefonDairy should clearly be incorporated into the multivariate analysis.
Fitting a model with both BeefonDairy and Breed as main effects, we generate the
following output:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] BeefonDairy+Breed
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] BeefonDairy+Breed


* MESSAGE: Term Breed cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

 (Breed B_D) = 0

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant + BeefonDairy + Breed


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       6         18.1        3.019      3.02 0.006
Residual       945        978.9        1.036
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
            7         1.00       0.091
           17         1.00       0.091
          101         1.00       0.091
          113         0.00       0.091
          116         0.00       0.091
          118         0.00       0.091
          131         1.00       0.091
          185         0.00       0.091
          297         0.00       0.091
          301         1.00       0.091
          384         0.00       0.091
          385         1.00       0.091
          447         0.00       0.091
          497         0.00       0.091
          600         0.00       0.166
          601         1.00       0.091
          602         0.00       0.091
          620         1.00       0.166
          637         1.00       0.166
          688         1.00       0.166
          701         1.00       0.091
          755         0.00       0.091
          788         1.00       0.091
          858         0.00       0.091
          859         1.00       0.091
          884         1.00       0.166
          911         0.00       0.166
          952         0.00       0.091


*** Estimates of parameters ***

                                                                        antilog of
                              estimate         s.e.      t(*) t pr.       estimate
Constant                        -1.792        0.341     -5.25 <.001         0.1667
BeefonDairy 1                    1.470        0.611      2.40 0.016          4.348
Breed B                          0.504        0.353      1.43 0.153          1.656
Breed D                          1.232        0.713      1.73 0.084          3.429
Breed B_DB                       0.714        0.447      1.60 0.110          2.043
Breed DB_D                      -0.210        0.586     -0.36 0.721         0.8108
Breed B_D                            0            *         *     *          1.000
Breed B_D_DB                     2.485        0.928      2.68 0.007          12.00
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
         BeefonDairy 0
               Breed DB

DROP [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes] Breed


***** Regression Analysis *****

Response variate:      VFarmPos
 Binomial totals:      N_Bin
    Distribution:      Binomial
   Link function:      Logit
    Fitted terms:      Constant + BeefonDairy


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1          5.7        5.685      5.69 0.017
Residual       950        991.3        1.044
Total          951        997.0        1.048

Change           5         12.4        2.486      2.49 0.029
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
          131         1.00      0.0908
          297         0.00      0.0908
          301         1.00      0.0908
          384         0.00      0.0908
          385         1.00      0.0908
          497         0.00      0.0908
          601         1.00      0.0908
          602         0.00      0.0908
          701         1.00      0.0908
          788         1.00      0.0908
          858         0.00      0.0908




*** Estimates of parameters ***

                                                                        antilog of
                              estimate         s.e.      t(*) t pr.       estimate
Constant                       -1.3033       0.0794    -16.42 <.001         0.2716
BeefonDairy 1                    1.486        0.610      2.43 0.015          4.418
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
         BeefonDairy 0
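
The "Change" line in the summary above compares the two nested models through the change in deviance. With the dispersion parameter fixed at 1, the approximate chi-squared probability can be reproduced by referring that change to a chi-squared distribution on the change in degrees of freedom. The sketch below does this in Python using only the standard library; the helper chi2_sf is an illustrative assumption written for integer degrees of freedom, not a Genstat function.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return exp(-half) * total
    # Odd df: erfc term plus a finite series in half-integer powers of x.
    total, term = 0.0, sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return erfc(sqrt(half)) + sqrt(2.0 / pi) * exp(-half) * total


# Dropping Breed: mean deviance 2.486 on 5 d.f., as printed above.
p_breed = chi2_sf(5 * 2.486, 5)
# BeefonDairy alone: regression deviance 5.685 on 1 d.f.
p_beef_on_dairy = chi2_sf(5.685, 1)

print(f"Breed (change on 5 d.f.)  p = {p_breed:.3f}")
print(f"BeefonDairy (1 d.f.)      p = {p_beef_on_dairy:.3f}")
```

The same function can be applied to any of the "chi pr" entries in this section by multiplying the mean deviance by its degrees of freedom.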

Both Breed (p=0.03) and BeefonDairy (p=0.02) formally explain a significant amount of
the variability in the dataset. The latter is expected, but the former result deserves
further attention. Unsurprisingly, the effect is completely driven by the B_D_DB
level of the factor: this small group of 6 farms has a much higher prevalence.
Leverage is a problem, but it would seem reasonable to define a new factor, Breed2,
based exclusively around this breed class, and include it in the multivariate analysis.
Fitting Breed2 gives the following output:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed2
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Breed2


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Breed2


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1          5.6        5.595      5.59 0.018
Residual       950        991.4        1.044
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
          600         0.00      0.1656
          620         1.00      0.1656
          637         1.00      0.1656
          688         1.00      0.1656
          884         1.00      0.1656
          911         0.00      0.1656


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant           -1.2975       0.0790    -16.42    <.001     0.2732
Breed2 1             1.991        0.867      2.30    0.022      7.320
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Breed2 0




Breed2 is therefore included in the multivariate analysis.

When investigating the properties of the factors Grass_Manure and Grass_Slurry, it is
important to remember that these questions were, for the most part, only asked of farms where
the animals were at pasture. Only 3 farms with housed animals recorded an answer to the
questions about the properties of their pasture.

Tabulating out the properties by Housing and slurry status gives the following tables:

Number of Farms
Housed   No Slurry   Yes Slurry   Blank
     0         308           77       0
     1           3            0     563

Number Positive
Housed   No Slurry   Yes Slurry   Blank
     0          53           27       -
     1           0            -     126

Fraction Positive
Housed   No Slurry   Yes Slurry   Blank
     0       0.172        0.351       -
     1       0.000            -   0.224

The effect is clearly not just due to differences between housed and unhoused farms.
Fitting the GLM gives the following output (the small number of housed farms with
non-blank returns will have little effect on the estimates and is ignored for the
moment):
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Gra_Slur
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Gra_Slur


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Gra_Slur


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       2         11.6        5.777      5.78 0.003
Residual       948        982.4        1.036
Total          950        994.0        1.046
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           46         0.00      0.0129
           51         1.00      0.0129
           53         1.00      0.0129
           55         1.00      0.0129
           61         1.00      0.0129
           63         0.00      0.0129
           80         1.00      0.0129
           83         0.00      0.0129
           84         0.00      0.0129
           86         0.00      0.0129
           87         0.00      0.0129
           92         0.00      0.0129
          100         1.00      0.0129
          110         0.00      0.0129
          116         0.00      0.0129
          118         0.00      0.0129
          119         0.00      0.0129
          128         0.00      0.0129
          129         0.00      0.0129
          132         0.00      0.0129
          133         1.00      0.0129
          135         0.00      0.0129
          139         1.00      0.0129
          143         1.00      0.0129
          174         1.00      0.0129
          180         0.00      0.0129
          189         1.00      0.0129
          190         0.00      0.0129
          196         1.00      0.0129
          199         0.00      0.0129
          202         1.00      0.0129
          204         1.00      0.0129
          206         0.00      0.0129
          215         1.00      0.0129
          217         1.00      0.0129
          219         0.00      0.0129
          225         0.00      0.0129
          226         0.00      0.0129
          230         0.00      0.0129
          247         0.00      0.0129
          345         0.00      0.0129
          507         0.00      0.0129
          533         0.00      0.0129
          541         0.00      0.0129
          542         0.00      0.0129
          543         0.00      0.0129
          546         0.00      0.0129
          547         0.00      0.0129
          548         0.00      0.0129
          552         0.00      0.0129
          566         1.00      0.0129
          578         1.00      0.0129
          581         1.00      0.0129
          593         0.00      0.0129
          598         0.00      0.0129
          603         1.00      0.0129
          606         0.00      0.0129
          608         0.00      0.0129
          612         0.00      0.0129
          613         1.00      0.0129
          637         1.00      0.0129
          639         1.00      0.0129
          640         1.00      0.0129
          645         0.00      0.0129
          646         1.00      0.0129
          659         0.00      0.0129
          662         0.00      0.0129
          663         0.00      0.0129
          665         0.00      0.0129
          670         0.00      0.0129
          677         0.00      0.0129
          681         1.00      0.0129
          690         0.00      0.0129
          702         0.00      0.0129
          703         1.00      0.0129
          707         0.00      0.0129
          924         0.00      0.0129



*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.583        0.151    -10.50    <.001     0.2054
Gra_Slur 1           0.967        0.282      3.43    <.001      2.629
Gra_Slur 999         0.339        0.181      1.87    0.061      1.404
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Gra_Slur 0

Among animals at pasture, those on farms which spread slurry on the grass are at a
higher risk of shedding than those on farms which do not.
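
Because Gra_Slur is the only term in this model, its estimates can be checked directly against the tabulated counts: each parameter is a difference in log-odds between a level and the reference level. The Python sketch below makes that calculation. It assumes that the three housed farms answering "No" are counted in level 0 alongside the unhoused farms, which is what the printed constant suggests, and it uses the usual large-sample standard error for a log-odds; the variable and function names are illustrative.

```python
from math import log, sqrt

# (positive farms, total farms) for each Gra_Slur level, from the tables above.
counts = {
    "0": (53 + 0, 308 + 3),   # no slurry, including the 3 housed farms (assumption)
    "1": (27, 77),            # slurry spread on grass
    "999": (126, 563),        # blank return (housed farms)
}


def log_odds(pos, total):
    """Log-odds of a farm being positive."""
    return log(pos / (total - pos))


def var_log_odds(pos, total):
    """Large-sample variance of an estimated log-odds."""
    return 1.0 / pos + 1.0 / (total - pos)


ref = counts["0"]
estimates = {"Constant": (log_odds(*ref), sqrt(var_log_odds(*ref)))}
for level in ("1", "999"):
    diff = log_odds(*counts[level]) - log_odds(*ref)
    se = sqrt(var_log_odds(*counts[level]) + var_log_odds(*ref))
    estimates[f"Gra_Slur {level}"] = (diff, se)

for name, (est, se) in estimates.items():
    print(f"{name:14s} estimate {est:7.3f}   s.e. {se:.3f}")
```

Under these assumptions the values agree closely with the Genstat estimates printed above.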

Considering Gra_Manure, we can generate the following tables:

Number of Farms
Housed   No Manure   Yes Manure   Blank
     0         281          104       0
     1           3            0     563

Number Positive
Housed   No Manure   Yes Manure   Blank
     0          67           13       -
     1           0            -     126

Fraction Positive
Housed   No Manure   Yes Manure   Blank
     0       0.238        0.125       -
     1       0.000            -   0.224

Again, any significance due to this factor is clearly not just due to differences between
housed and unhoused animals. In fact, the prevalences on housed farms (0.224) and on
unhoused farms with no manure on pasture (0.238) are virtually identical. The apparent
effect is that unhoused farms which do spread manure have a lower prevalence. Fitting
this as a GLM gives the following output:
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Gra_Manu
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Gra_Manu


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Gra_Manu


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       2          6.6        3.307      3.31 0.037
Residual       948        987.3        1.042
Total          950        994.0        1.046
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.175        0.139     -8.43    <.001     0.3088
Gra_Manu 1          -0.771        0.328     -2.35    0.019     0.4627
Gra_Manu 999        -0.068        0.172     -0.40    0.691     0.9338
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Gra_Manu 0

As indicated above, the significant effect (p=0.04) is associated with the spreading of
manure on farms with unhoused animals, where farms which spread manure are less
likely to present shedding animals.

It is necessary to investigate whether there is any confounding of effects occurring
between Gra_Slurry and Gra_Manure. Tabulating out the properties of the dataset
gives the following tables:

Number of Farms

Unhoused         Slurry
Manure           0            1
        0      241           40
        1       67           37

Housed         563

Number Positive

Unhoused         Slurry
Manure           0            1
        0       49           18
        1        4            9

Housed         126

Fraction Positive

Unhoused         Slurry
Manure           0            1
        0    0.203        0.450
        1    0.060        0.243

Housed       0.224
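
One informal way to read the cross-tabulation above for confounding is to compute the odds ratio for each factor separately within each level of the other: if one effect were merely a proxy for the other, it would be expected to shrink towards 1 within strata. The Python sketch below does this for the unhoused farms only, using the counts tabulated above. It is a descriptive check under those assumptions, not a substitute for the fitted GLM, and the names are illustrative.

```python
# Unhoused farms only: (positive, total), indexed by (manure, slurry).
cells = {
    (0, 0): (49, 241), (0, 1): (18, 40),
    (1, 0): (4, 67),   (1, 1): (9, 37),
}


def odds(pos, total):
    """Odds of a farm being positive."""
    return pos / (total - pos)


# Fraction positive in each cell, reproducing the last table above.
fractions = {key: pos / total for key, (pos, total) in cells.items()}

# Slurry odds ratio within each manure stratum, and vice versa.
slurry_or = {m: odds(*cells[(m, 1)]) / odds(*cells[(m, 0)]) for m in (0, 1)}
manure_or = {s: odds(*cells[(1, s)]) / odds(*cells[(0, s)]) for s in (0, 1)}

for (m, s), frac in sorted(fractions.items()):
    print(f"manure={m} slurry={s}  fraction positive {frac:.3f}")
for m, value in slurry_or.items():
    print(f"slurry odds ratio within manure={m}: {value:.2f}")
for s, value in manure_or.items():
    print(f"manure odds ratio within slurry={s}: {value:.2f}")
```

If both sets of stratum-specific odds ratios stay on the same side of 1, that is consistent with the two effects operating separately.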

All the groups have reasonable support in the data, and the Slurry and Manure
effects both appear to be operating on unhoused animals. Fitting both terms
in the same GLM gives the following results (aliasing, mainly due to the blank coding
in both factors for most housed farms, makes the output messy, but will not affect the
main estimates of interest):
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Gra_Manu*Gra_Slur
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9] Gra_Manu*Gra_Slur


* MESSAGE: Term Gra_Slur cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

  (Gra_Slur 999) = (Gra_Manu 999)

* MESSAGE: Term Gra_Manu.Gra_Slur cannot be fully included in the model
  because 3 parameters are aliased with terms already in the model

  (Gra_Manu 1 .Gra_Slur 999) = 0

  (Gra_Manu 999 .Gra_Slur 1) = 0

  (Gra_Manu 999 .Gra_Slur 999) = (Gra_Manu 999)

***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant + Gra_Manu + Gra_Slur + Gra_Manu.Gra_Slur


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       4         24.1        6.034      6.03 <.001
Residual       946        969.8        1.025
Total          950        994.0        1.046
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The residuals do not appear to be random;
           for example, fitted values in the range 0.22 to 0.22
           are consistently larger than observed values
           and fitted values in the range 0.45 to 0.45
           are consistently smaller than observed values
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           46         0.00      0.0269
           51         1.00      0.0250
           53         1.00      0.0250
           55         1.00      0.0250
           61         1.00      0.0269
           63         0.00      0.0250
           80         1.00      0.0269
           83         0.00      0.0250
           84         0.00      0.0269
           86         0.00      0.0269
           87         0.00      0.0250
           92         0.00      0.0269
          100         1.00      0.0250
           110         0.00       0.0250
           116         0.00       0.0269
           118         0.00       0.0269
           119         0.00       0.0269
           128         0.00       0.0250
           129         0.00       0.0269
           132         0.00       0.0269
           133         1.00       0.0250
           135         0.00       0.0250
           139         1.00       0.0269
           143         1.00       0.0250
           174         1.00       0.0269
           180         0.00       0.0269
           189         1.00       0.0250
           190         0.00       0.0269
           196         1.00       0.0269
           199         0.00       0.0269
           202         1.00       0.0250
           204         1.00       0.0250
           206         0.00       0.0269
           215         1.00       0.0250
           217         1.00       0.0250
           219         0.00       0.0269
           225         0.00       0.0269
           226         0.00       0.0250
           230         0.00       0.0250
           247         0.00       0.0250
           345         0.00       0.0250
           507         0.00       0.0269
           533         0.00       0.0250
           541         0.00       0.0269
           542         0.00       0.0250
           543         0.00       0.0250
           546         0.00       0.0250
           547         0.00       0.0250
           548         0.00       0.0250
           552         0.00       0.0269
           566         1.00       0.0250
           578         1.00       0.0250
           581         1.00       0.0269
           593         0.00       0.0250
           598         0.00       0.0250
           603         1.00       0.0269
           606         0.00       0.0269
           608         0.00       0.0250
           612         0.00       0.0250
           613         1.00       0.0250
           637         1.00       0.0269
           639         1.00       0.0250
           640         1.00       0.0269
           645         0.00       0.0269
           646         1.00       0.0250
           659         0.00       0.0250
           662         0.00       0.0269
           663         0.00       0.0269
           665         0.00       0.0269
           670         0.00       0.0269
           677         0.00       0.0269
           681         1.00       0.0250
           690         0.00       0.0250
           702         0.00       0.0269
           703         1.00       0.0250
           707         0.00       0.0269
           924         0.00       0.0269


*** Estimates of parameters ***

                                                                         antilog of
                              estimate            s.e.    t(*)   t pr.     estimate
Constant                        -1.381           0.159   -8.66   <.001       0.2513
Gra_Manu 1                      -1.376           0.539   -2.55   0.011       0.2527
Gra_Manu 999                     0.138           0.189    0.73   0.466        1.147
Gra_Slur 1                       1.180           0.356    3.32   <.001        3.256
Gra_Slur 999                         0               *       *       *        1.000
Gra_Manu 1 .Gra_Slur 1           0.441           0.733    0.60   0.547        1.555
Gra_Manu 1 .Gra_Slur 999             0               *       *       *        1.000
Gra_Manu 999 .Gra_Slur 1             0               *       *       *        1.000
Gra_Manu 999 .Gra_Slur 999           0               *       *       *        1.000
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Gra_Manu 0
            Gra_Slur 0

DROP [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes] Gra_Manu.Gra_Slur


* MESSAGE: Term Gra_Slur cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

 (Gra_Slur 999) = (Gra_Manu 999)

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant + Gra_Manu + Gra_Slur


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       3         23.8        7.922      7.92 <.001
Residual       947        970.2        1.024
Total          950        994.0        1.046

Change           1          0.4        0.370      0.37 0.543
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The residuals do not appear to be random;
           for example, fitted values in the range 0.22 to 0.22
           are consistently larger than observed values
           and fitted values in the range 0.47 to 0.47
           are consistently smaller than observed values
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           46         0.00      0.0185
           51         1.00      0.0200
           53         1.00      0.0200
           55         1.00      0.0200
           61         1.00      0.0185
           63         0.00      0.0200
           80         1.00      0.0185
           83         0.00      0.0200
           84         0.00      0.0185
           86         0.00      0.0185
           87         0.00      0.0200
           92         0.00      0.0185
          100         1.00      0.0200
          110         0.00      0.0200
          116         0.00      0.0185
          118         0.00      0.0185
          119         0.00      0.0185
          128         0.00      0.0200
          129         0.00      0.0185
          132         0.00      0.0185
          133         1.00      0.0200
          135         0.00      0.0200
          139         1.00      0.0185
          143         1.00      0.0200
          174         1.00      0.0185
          180         0.00        0.0185
          189         1.00        0.0200
          190         0.00        0.0185
          196         1.00        0.0185
          199         0.00        0.0185
          202         1.00        0.0200
          204         1.00        0.0200
          206         0.00        0.0185
          215         1.00        0.0200
          217         1.00        0.0200
          219         0.00        0.0185
          225         0.00        0.0185
          226         0.00        0.0200
          230         0.00        0.0200
          247         0.00        0.0200
          345         0.00        0.0200
          507         0.00        0.0185
          533         0.00        0.0200
          541         0.00        0.0185
          542         0.00        0.0200
          543         0.00        0.0200
          546         0.00        0.0200
          547         0.00        0.0200
          548         0.00        0.0200
          552         0.00        0.0185
          566         1.00        0.0200
          578         1.00        0.0200
          581         1.00        0.0185
          593         0.00        0.0200
          598         0.00        0.0200
          603         1.00        0.0185
          606         0.00        0.0185
          608         0.00        0.0200
          612         0.00        0.0200
          613         1.00        0.0200
          637         1.00        0.0185
          639         1.00        0.0200
          640         1.00        0.0185
          645         0.00        0.0185
          646         1.00        0.0200
          659         0.00        0.0200
          662         0.00        0.0185
          663         0.00        0.0185
          665         0.00        0.0185
          670         0.00        0.0185
          677         0.00        0.0185
          681         1.00        0.0200
          690         0.00        0.0200
          702         0.00        0.0185
          703         1.00        0.0200
          707         0.00        0.0185
          924         0.00        0.0185


*** Estimates of parameters ***

                                                                      antilog of
                              estimate         s.e.      t(*) t pr.     estimate
Constant                        -1.403        0.156     -8.97 <.001       0.2459
Gra_Manu 1                      -1.148        0.354     -3.24 0.001       0.3172
Gra_Manu 999                     0.159        0.186      0.86 0.392        1.173
Gra_Slur 1                       1.288        0.307      4.19 <.001        3.624
Gra_Slur 999                         0            *         *     *        1.000
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Gra_Manu 0
            Gra_Slur 0

There is no evidence of a statistically significant interaction between the factors
(p=0.54). Taken independently, the spreading of manure is protective, while the
spreading of slurry is a risk factor for shedding being observed on the farm. It will be
important to stress that this result has been established only for farms with
unhoused animals, since the relevant data were not collected for housed farms. Hence,
both Gra_Slurry and Gra_Manure will be considered in the multifactor model.
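The figures quoted above can be checked by hand. The following is a minimal sketch,
not part of the original GenStat session: it exponentiates the logit-scale estimates to
obtain odds ratios, and converts the "Change" deviance (0.370 on 1 d.f.) into a
chi-squared tail probability. The function name chi2_sf_1df is our own, and the 1 d.f.
tail is computed from the standard identity with the complementary error function.

```python
import math

# Logit-scale estimates copied from the "Estimates of parameters" table above.
estimates = {"Gra_Manu 1": -1.148, "Gra_Slur 1": 1.288}

# exp(estimate) is the odds ratio relative to the reference level 0.
odds_ratios = {name: math.exp(b) for name, b in estimates.items()}


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-squared variate with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


# "Change" line: deviance 0.370 on 1 d.f. when the interaction is dropped.
p_interaction = chi2_sf_1df(0.370)

for name, ratio in odds_ratios.items():
    print(f"{name}: odds ratio {ratio:.3f}")
print(f"interaction p-value: {p_interaction:.3f}")
```

An odds ratio below 1 (manure) corresponds to the protective effect described above,
and one above 1 (slurry) to the risk factor. Small differences in the last digit
relative to the GenStat "antilog of estimate" column arise because the printed
estimates are rounded.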

Considering N_Goats, it is suspicious that this variable is statistically significant,
while the related factor reporting the absence or presence of goats is not. Plotting a
histogram of N_Goats, we see that the bulk of the records contains zero. Generating a
new histogram of the non-zero values of N_Goats, we see the following picture:


[Figure: Histogram of N_Goats, non-zero values only. x-axis: N_Goats (ticks 2 to 16);
y-axis: Frequency (ticks 0 to 20).]


Fitting the model to N_Goats, we generate the following output:
5831       MODEL   [DISTRIBUTION=binomial;       LINK=logit;    DISPERSION=1]     VFarmPos;
NBINOMIAL=N_Bin
5832 TERMS [FACT=9] N_Goats
5833   FIT [PRINT=model,summary,estimates;    CONSTANT=estimate;    FPROB=yes;   TPROB=yes;
FACT=9]\
5834   N_Goats

5834............................................................................


***** Regression Analysis *****

Response variate:      VFarmPos
 Binomial totals:      N_Bin
    Distribution:      Binomial
   Link function:      Logit
    Fitted terms:      Constant, N_Goats


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1          3.1        3.150      3.15 0.076
Residual       950        993.9        1.046
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
            9         0.00      0.0075
           95         0.00      0.0515
          170         0.00      0.0075
          243         0.00      0.0075
          343         1.00      0.0171
          366         1.00      0.0075
          367         1.00      0.3600
          368         0.00      0.0075
          537         0.00      0.0075
          554         0.00      0.0316
          585         0.00      0.0171
          673         0.00      0.0075
          676         0.00      0.0515
          720         1.00      0.2125
          746         1.00      0.0515
          766         0.00      0.0765
          792         0.00      0.0075
          799         0.00      0.0075
          818         0.00      0.0515


*** Estimates of parameters ***

                                                          antilog of
                   estimate         s.e.      t(*) t pr.    estimate
Constant            -1.2989       0.0793    -16.38 <.001      0.2728
N_Goats              0.1635       0.0943      1.73 0.083       1.178
* MESSAGE: s.e.s are based on dispersion parameter with value 1

The two units with the highest leverage correspond to the farms with 10 and 16 goats.
Removing these ultra-high leverage points from the analysis gives rise to the
following output:
5934       MODEL   [DISTRIBUTION=binomial;     LINK=logit;   DISPERSION=1]       VFarmPos;
NBINOMIAL=N_Bin
5935 TERMS [FACT=9] N_Goats
5936   FIT [PRINT=model,summary,estimates;   CONSTANT=estimate;    FPROB=yes;   TPROB=yes;
FACT=9]\
5937   N_Goats

5937............................................................................


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, N_Goats


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1          0.0        0.019      0.02 0.891
Residual       948        990.9        1.045
Total          949        990.9        1.044
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
            9         0.00      0.0192
           95         0.00      0.1143
          170         0.00      0.0192
          243         0.00      0.0192
          343         1.00      0.0422
          366         1.00      0.0192
          367         0.00      0.0192
         536          0.00        0.0192
         553          0.00        0.0741
         584          0.00        0.0422
         672          0.00        0.0192
         675          0.00        0.1143
         744          1.00        0.1143
         764          0.00        0.1626
         790          0.00        0.0192
         797          0.00        0.0192
         816          0.00        0.1143


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant           -1.2888       0.0795    -16.22    <.001     0.2756
N_Goats             -0.023        0.172     -0.14    0.892     0.9770
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Having removed the two high leverage points, N_Goats no longer exhibits any
particular statistical significance (p=0.89). It will therefore not be considered for
inclusion in the multifactor model.
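To illustrate why a handful of farms with many goats can drive the fit, the sketch
below computes leverages for a one-covariate logistic regression. It is an
illustration under stated assumptions only: the data are hypothetical (they are not
the study data), the model is fitted by plain Newton iterations rather than by
GenStat, and all function names are our own. The leverage of unit i is
w_i x_i'(X'WX)^-1 x_i, with w_i = p_i(1 - p_i).

```python
import math

# Hypothetical data: 40 farms without goats plus a few farms with goats.
goats = [0] * 40 + [1, 1, 2, 2, 3, 4, 10, 16]
positive = [1] * 10 + [0] * 30 + [0, 1, 0, 0, 1, 0, 1, 1]


def prob(b0, b1, xi):
    """Fitted probability under logit(p) = b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))


def information(b0, b1, x):
    """Entries of the 2x2 matrix X'WX for the current coefficients."""
    a00 = a01 = a11 = 0.0
    for xi in x:
        p = prob(b0, b1, xi)
        w = p * (1.0 - p)
        a00 += w
        a01 += w * xi
        a11 += w * xi * xi
    return a00, a01, a11


def fit_logistic(x, y, n_iter=25):
    """Newton iterations for the two coefficients, starting from zero."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        a00, a01, a11 = information(b0, b1, x)
        s0 = sum(yi - prob(b0, b1, xi) for xi, yi in zip(x, y))
        s1 = sum((yi - prob(b0, b1, xi)) * xi for xi, yi in zip(x, y))
        det = a00 * a11 - a01 * a01
        b0 += (a11 * s0 - a01 * s1) / det
        b1 += (a00 * s1 - a01 * s0) / det
    return b0, b1


def leverages(b0, b1, x):
    """Diagonal of the hat matrix W^(1/2) X (X'WX)^-1 X' W^(1/2)."""
    a00, a01, a11 = information(b0, b1, x)
    det = a00 * a11 - a01 * a01
    out = []
    for xi in x:
        p = prob(b0, b1, xi)
        w = p * (1.0 - p)
        out.append(w * (a11 - 2.0 * a01 * xi + a00 * xi * xi) / det)
    return out


b0, b1 = fit_logistic(goats, positive)
h = leverages(b0, b1, goats)
for xi, hi in sorted(set(zip(goats, h))):
    print(f"goats={xi:2d}  leverage={hi:.4f}")
print(f"sum of leverages = {sum(h):.4f}")
```

The leverages always sum to the number of fitted parameters (two here), so any unit
holding a large share of that total has a correspondingly large influence on the
estimated slope. This is the motivation for refitting without such units, as was done
above.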

The next factor which will receive detailed consideration is Pigs. Fitting this factor
gives rise to the following output:
5558        MODEL  [DISTRIBUTION=binomial;       LINK=logit;   DISPERSION=1]     VFarmPos;
NBINOMIAL=N_Bin
5559 TERMS [FACT=9] Pigs
5560   FIT [PRINT=model,summary,estimates;   CONSTANT=estimate;    FPROB=yes;   TPROB=yes;
FACT=9]\
5561   Pigs

5561............................................................................


***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, Pigs


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1          6.6        6.567      6.57 0.010
Residual       950        990.5        1.043
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
            2         0.00      0.0244
           13         0.00      0.0244
           25         1.00      0.0244
           53         1.00      0.0244
           66         0.00      0.0244
           80         1.00      0.0244
          106         0.00      0.0244
          170         0.00      0.0244
          274         0.00      0.0244
          323         0.00      0.0244
          326         1.00      0.0244
          337         0.00      0.0244
          346         0.00      0.0244
          360            1.00        0.0244
          400            0.00        0.0244
          428            1.00        0.0244
          440            1.00        0.0244
          456            0.00        0.0244
          463            1.00        0.0244
          469            0.00        0.0244
          470            0.00        0.0244
          482            1.00        0.0244
          520            1.00        0.0244
          527            0.00        0.0244
          572            1.00        0.0244
          581            1.00        0.0244
          640            1.00        0.0244
          659            0.00        0.0244
          673            0.00        0.0244
          680            0.00        0.0244
          682            1.00        0.0244
          720            1.00        0.0244
          727            0.00        0.0244
          746            1.00        0.0244
          749            0.00        0.0244
          758            0.00        0.0244
          769            0.00        0.0244
          799            0.00        0.0244
          818            0.00        0.0244
          932            0.00        0.0244
          950            0.00        0.0244


*** Estimates of parameters ***

                                                                 antilog of
                  estimate         s.e.      t(*)          t pr.   estimate
Constant           -1.3270       0.0812    -16.34          <.001     0.2653
Pigs 2               0.881        0.330      2.67          0.008      2.413
* MESSAGE: s.e.s are based on dispersion parameter         with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
                Pigs 1

Hence, the presence of pigs on a farm is associated with a higher risk of the farm
exhibiting positive samples. Pigs will be included as a candidate factor in the
multifactor analysis.
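The size of the Pigs effect is easier to judge on the probability scale. The sketch
below back-transforms the estimates above. Two points are assumptions on our part
rather than GenStat output: level 2 of Pigs is taken to mean "pigs present" (inferred
from the interpretation above), and the 95% interval is a simple Wald interval,
estimate +/- 1.96 s.e., which the output does not itself report.

```python
import math

# Estimates copied from the "Estimates of parameters" table for Pigs.
constant = -1.3270      # logit for the reference level (Pigs 1)
pigs_effect = 0.881     # difference for Pigs 2 on the logit scale
pigs_se = 0.330         # standard error of that difference


def inv_logit(eta):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


p_reference = inv_logit(constant)
p_pigs = inv_logit(constant + pigs_effect)
odds_ratio = math.exp(pigs_effect)

# Wald interval on the logit scale, then exponentiated (an assumed method).
z = 1.96
wald_ci = (math.exp(pigs_effect - z * pigs_se),
           math.exp(pigs_effect + z * pigs_se))

print(f"P(positive | Pigs 1) = {p_reference:.3f}")
print(f"P(positive | Pigs 2) = {p_pigs:.3f}")
print(f"odds ratio = {odds_ratio:.3f}, Wald 95% CI = "
      f"({wald_ci[0]:.2f}, {wald_ci[1]:.2f})")
```

The interval is wide because relatively few farms fall in level 2 of the factor, which
is consistent with those farms all appearing in the high-leverage list.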

Fitting Lab Operator as a factor gives rise to the following output:
5563       MODEL   [DISTRIBUTION=binomial;            LINK=logit;   DISPERSION=1]       VFarmPos;
NBINOMIAL=N_Bin
5564 TERMS [FACT=9] Lab_Op
5565   FIT [PRINT=model,summary,estimates;           CONSTANT=estimate;   FPROB=yes;   TPROB=yes;
FACT=9]\
5566   Lab_Op

5566............................................................................


***** Regression Analysis *****

 Response variate:     VFarmPos
  Binomial totals:     N_Bin
     Distribution:     Binomial
    Link function:     Logit
     Fitted terms:     Constant, Lab_Op


*** Summary of analysis ***

                                              mean    deviance approx
                d.f.      deviance        deviance       ratio chi pr
Regression         2           6.5           3.256        3.26 0.039
Residual         925         958.2           1.036
Total          927        964.7        1.041
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.080        0.122     -8.83    <.001     0.3397
Lab_Op H            -0.304        0.169     -1.80    0.072     0.7379
Lab_Op S            -0.635        0.284     -2.24    0.025     0.5299
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Lab_Op D


There are clear differences between the prevalence rates associated with the different
Lab Operators. On the face of it, this is alarming: the results of a study should be
independent of the technician who assays the samples. However, the samples analysed
by the different technicians are not randomly distributed across the lifetime of the
study, and the initial analysis indicated that there was major variation in prevalence
over the study.

Tabulating the number of samples processed by each operator in each month of the
study, we get the following values:

Month              D        H          S
         3         2         3         0
         4         6         9         0
         5         9         5         0
         6        10        21         0
         7        19        19         0
         8        25        13         0
         9        22        26         0
        10        24        21         0
        11        19        19         0
        12        12        14         0
        13        23        13         0
        14        17        25         0
        15        21        32         0
        16        26        18         0
        17        19        20         0
        18        18        17         0
        19        15        15         0
        20        20        15         0
        21        13         6         0
        22        31        20         0
        23         0        21         0
        24         0        13         0
        25         0        22        11
        26         0        22        23
        27        0        28        35
        28        0        13        18
        29        0         9        31

Tabulating the mean prevalences seen in these months, we get the following table:

Month             D        H           S
         3    0.000     0.333          -
         4    0.833     0.000          -
         5    0.222     0.600          -
         6    0.200     0.190          -
         7    0.211     0.263          -
         8    0.320     0.231          -
         9    0.273     0.154          -
        10    0.167     0.143          -
        11    0.368     0.421          -
        12    0.250     0.357          -
        13    0.087     0.077          -
        14    0.294     0.200          -
        15    0.286     0.188          -
        16    0.115     0.222          -
        17    0.316     0.300          -
        18    0.167     0.118          -
        19    0.267     0.333          -
        20    0.200     0.200          -
        21    0.462     0.333          -
        22    0.290     0.200          -
        23         -    0.238          -
        24         -    0.000          -
        25         -    0.182      0.091
        26         -    0.136      0.000
        27         -    0.179      0.257
        28         -    0.077      0.056
        29         -    0.000      0.226

Restricting the analysis to months 3-22, when only operators D and H were present,
and fitting Lab Operator as an explanatory variable, we get the following output:
5724       MODEL   [DISTRIBUTION=binomial;       LINK=logit;   DISPERSION=1]     VFarmPos;
NBINOMIAL=N_Bin
5725 TERMS [FACT=9] Lab_Op
5726   FIT [PRINT=model,summary,estimates;   CONSTANT=estimate;    FPROB=yes;   TPROB=yes;
FACT=9]\
5727   Lab_Op

5727............................................................................


* MESSAGE: Term Lab_Op cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

 (Lab_Op S) = 0

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, Lab_Op


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       1          0.8        0.844      0.84 0.358
Residual       680        749.3        1.102
Total          681        750.1        1.101
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00


*** Estimates of parameters ***

                                                              antilog of
                  estimate         s.e.      t(*)       t pr.   estimate
Constant            -1.080        0.122     -8.83       <.001     0.3397
Lab_Op H            -0.165        0.180     -0.92       0.357     0.8476
Lab_Op S                 0            *         *           *      1.000
* MESSAGE: s.e.s are based on dispersion parameter      with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Lab_Op D

There is no significant difference (p=0.36) between the two operators during the
months for which they were both operating.

Restricting the analysis to months 25-29, when only operators H and S were present,
and fitting Lab Operator as an explanatory variable, we get the following output:
5730       MODEL   [DISTRIBUTION=binomial;         LINK=logit;   DISPERSION=1]       VFarmPos;
NBINOMIAL=N_Bin
5731 TERMS [FACT=9] Lab_Op

**** G5W0013 **** Warning (Code RE 49). Statement 1 on Line 5731
Command: TERMS [FACT=9] Lab_Op
No observations found at the reference level of a factor
The reference level for factor Lab_Op was Level 1, and has been changed to Level
 2

5732   FIT [PRINT=model,summary,estimates;        CONSTANT=estimate;   FPROB=yes;   TPROB=yes;
FACT=9]\
5733   Lab_Op

5733............................................................................


* MESSAGE: Term Lab_Op cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

 (Lab_Op D) = 0

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant, Lab_Op


*** Summary of analysis ***

                                           mean    deviance approx
             d.f.      deviance        deviance       ratio chi pr
Regression       1          0.1       0.0853      0.09 0.770
Residual       210        176.3       0.8397
Total          211        176.4       0.8362
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.829        0.299     -6.12    <.001     0.1605
Lab_Op D                 0            *         *        *      1.000
Lab_Op S             0.115        0.393      0.29    0.771      1.122
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              Lab_Op H

There is no significant difference (p=0.77) between the two operators during the
months for which they were both operating. The apparent Lab Operator effect is an
artefact of the unbalanced nature of the dataset with respect to this factor. It will
therefore not be considered as a candidate factor for the multifactor analysis.
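The confounding of operator with time can be seen directly from the two monthly
tables. The sketch below is our own reconstruction, not part of the GenStat session: it
recovers the number of positive units per operator and month by rounding count x mean
prevalence to the nearest integer (an assumption, since the tables report rounded
proportions), and then pools over three windows: all months, months 3-22 and months
25-29.

```python
# (month, units handled by D/H/S, mean prevalence for D/H/S).
# None marks a "-" entry in the prevalence table.
ROWS = [
    (3, (2, 3, 0), (0.000, 0.333, None)),
    (4, (6, 9, 0), (0.833, 0.000, None)),
    (5, (9, 5, 0), (0.222, 0.600, None)),
    (6, (10, 21, 0), (0.200, 0.190, None)),
    (7, (19, 19, 0), (0.211, 0.263, None)),
    (8, (25, 13, 0), (0.320, 0.231, None)),
    (9, (22, 26, 0), (0.273, 0.154, None)),
    (10, (24, 21, 0), (0.167, 0.143, None)),
    (11, (19, 19, 0), (0.368, 0.421, None)),
    (12, (12, 14, 0), (0.250, 0.357, None)),
    (13, (23, 13, 0), (0.087, 0.077, None)),
    (14, (17, 25, 0), (0.294, 0.200, None)),
    (15, (21, 32, 0), (0.286, 0.188, None)),
    (16, (26, 18, 0), (0.115, 0.222, None)),
    (17, (19, 20, 0), (0.316, 0.300, None)),
    (18, (18, 17, 0), (0.167, 0.118, None)),
    (19, (15, 15, 0), (0.267, 0.333, None)),
    (20, (20, 15, 0), (0.200, 0.200, None)),
    (21, (13, 6, 0), (0.462, 0.333, None)),
    (22, (31, 20, 0), (0.290, 0.200, None)),
    (23, (0, 21, 0), (None, 0.238, None)),
    (24, (0, 13, 0), (None, 0.000, None)),
    (25, (0, 22, 11), (None, 0.182, 0.091)),
    (26, (0, 22, 23), (None, 0.136, 0.000)),
    (27, (0, 28, 35), (None, 0.179, 0.257)),
    (28, (0, 13, 18), (None, 0.077, 0.056)),
    (29, (0, 9, 31), (None, 0.000, 0.226)),
]
OPERATORS = ("D", "H", "S")


def pooled(months):
    """Return {operator: (positives, units)} pooled over the given months."""
    totals = {op: [0, 0] for op in OPERATORS}
    for month, counts, prevalences in ROWS:
        if month not in months:
            continue
        for op, n, prev in zip(OPERATORS, counts, prevalences):
            if n and prev is not None:
                totals[op][0] += round(n * prev)  # recover the integer count
                totals[op][1] += n
    return {op: tuple(t) for op, t in totals.items()}


windows = {
    "all months": range(3, 30),
    "months 3-22": range(3, 23),
    "months 25-29": range(25, 30),
}
results = {label: pooled(months) for label, months in windows.items()}

for label, by_operator in results.items():
    parts = [f"{op} {pos}/{n} = {pos / n:.3f}"
             for op, (pos, n) in by_operator.items() if n]
    print(f"{label:13s} " + ", ".join(parts))
```

Pooled over all months, the operators appear to differ markedly, but within each
window where two operators overlap their pooled prevalences are much closer. The
unit totals in each window (928, 682 and 212) agree with the total degrees of freedom
plus one in the three corresponding fits above.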

We have considered all the candidate explanatory factors. The following factors:
FCattle, SamGrF, Cattle, NewSource, BeefonDairy, Breed2, Gra_Slurry, Gra_Manure
and Pigs will be candidates for inclusion in the multifactor model. However, the
identification in the univariate analyses of significant year and (possibly) seasonal
effects would indicate a need for some investigation of these possible descriptive
factors prior to the fitting of the multifactor model.

Fitting Sam_Year gives rise to the following output:
5753       MODEL    [DISTRIBUTION=binomial;   LINK=logit;    DISPERSION=1]    VFarmPos;
NBINOMIAL=N_Bin
5754 TERMS [FACT=9] Sam_Year
5755   FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes;
FACT=9]\
5756   Sam_Year
5756............................................................................


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Sam_Year


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       2         10.8        5.419      5.42 0.004
Residual       949        986.2        1.039
Total          951        997.0        1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -1.025        0.126     -8.14    <.001     0.3587
Sam_Year 1999       -0.254        0.173     -1.47    0.142     0.7759
Sam_Year 2000       -0.739        0.232     -3.19    0.001     0.4775
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Sam_Year 1998

The effect looks conclusive: a drop in 1999 relative to 1998 was then continued in
2000. However, the results may be deceptive: only a fraction (months 1-5) of 2000
was sampled, and the analysis of monthly figures above might suggest that these
months exhibit lower levels of farm prevalence. Hence the figure for Year 2000
could be biased. However, by restricting the analysis only to the months January-
May, we can quickly test this hypothesis:
5774       MODEL    [DISTRIBUTION=binomial;   LINK=logit;    DISPERSION=1]    VFarmPos;
NBINOMIAL=N_Bin
5775 TERMS [FACT=9] Sam_Year
5776   FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes;
FACT=9]\
5777   Sam_Year
5777............................................................................


***** Regression Analysis *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
     Fitted terms:   Constant, Sam_Year


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression       2          9.0       4.4964      4.50 0.011
Residual       474        458.8       0.9680
Total          476        467.8       0.9828
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
            1         0.00      0.0195
            2         0.00      0.0195
            3         1.00      0.0195
            4         0.00      0.0195
            5         0.00      0.0195
            6         0.00      0.0195
            7         1.00      0.0195
            8         0.00      0.0195
            9         0.00      0.0195
           10         0.00      0.0195
           11         0.00      0.0195
           12         0.00      0.0195
           13         0.00      0.0195
           14         1.00      0.0195
           15         1.00      0.0195
           16         0.00      0.0195
           17         1.00      0.0195
           18         0.00      0.0195
           19         1.00      0.0195
           20         0.00      0.0195
           21         0.00      0.0195
           22         1.00      0.0195
           23         0.00      0.0195
           24         0.00        0.0195
           25         1.00        0.0195
           26         0.00        0.0195
           27         0.00        0.0195
           28         1.00        0.0195
           29         1.00        0.0195
           30         1.00        0.0195
           31         1.00        0.0195
           32         1.00        0.0195
           33         0.00        0.0195
           34         1.00        0.0195
           35         0.00        0.0195
           36         0.00        0.0195
           37         0.00        0.0195
           38         1.00        0.0195
           39         1.00        0.0195
           40         0.00        0.0195
           41         0.00        0.0195
           42         0.00        0.0195
           43         0.00        0.0195
           44         0.00        0.0195
           45         0.00        0.0195
           46         0.00        0.0195
           47         0.00        0.0195
           48         0.00        0.0195
           49         0.00        0.0195
           50         0.00        0.0195
           51         1.00        0.0195


*** Estimates of parameters ***

                                                           antilog of
                  estimate         s.e.      t(*)    t pr.   estimate
Constant            -0.693        0.296     -2.34    0.019     0.5000
Sam_Year 1999       -0.658        0.341     -1.93    0.054     0.5176
Sam_Year 2000       -1.071        0.354     -3.02    0.003     0.3425
* MESSAGE: s.e.s are based on dispersion parameter   with value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Sam_Year 1998

Restricting the analysis to only the first five months of the year, there is clear
evidence of a year on year drop in the farm prevalence.
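As a check on the two Sam_Year fits, the sketch below converts the logit-scale
estimates into fitted farm prevalences for each year, and recomputes the chi-squared
probability for the January-May regression deviance. It is our own illustration and
not GenStat output; the only statistical fact used beyond the tables is that the
upper tail of a chi-squared variate with 2 d.f. is exp(-x/2).

```python
import math


def inv_logit(eta):
    """Convert a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# (constant, 1999 effect, 2000 effect) on the logit scale, from the two fits.
all_months = (-1.025, -0.254, -0.739)
jan_to_may = (-0.693, -0.658, -1.071)


def yearly_probabilities(constant, effect_1999, effect_2000):
    """Fitted probability of a positive farm in each sampling year."""
    return {
        1998: inv_logit(constant),
        1999: inv_logit(constant + effect_1999),
        2000: inv_logit(constant + effect_2000),
    }


full = yearly_probabilities(*all_months)
restricted = yearly_probabilities(*jan_to_may)

# Regression deviance for January-May: mean deviance 4.4964 on 2 d.f.
deviance = 2 * 4.4964
p_restricted = math.exp(-deviance / 2.0)

for year in (1998, 1999, 2000):
    print(f"{year}: all months {full[year]:.3f}, Jan-May {restricted[year]:.3f}")
print(f"Jan-May chi-squared probability: {p_restricted:.3f}")
```

The fitted value for 2000 is the same in both fits, as it must be, since 2000 was
sampled only in January to May. The restriction changes only the 1998 and 1999
figures with which 2000 is compared.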

There are issues of balance in the dataset when considering Sam_Year and
Sam_Month as factors to be fitted within the same model. It is therefore appropriate
to use a Generalised Linear Mixed Model to analyse these data, since it will give rise
to better estimates when fitting a model to highly unbalanced data. The model to be
fitted is Sam_Year+Sam_Month (it is impossible to fit an interaction between these
factors due to collinearity in the data), and it gives rise to the following output:
5709       GLMM    [PRINT=model,monitor,components,vcovariance,means,backmeans,effects;
DISTRIBUTION=binomial;\
5710            LINK=logit;    DISPERSION=1;    FIXED=Sam_Year+Sam_Mon;    RANDOM=Farm;
CONSTANT=estimate;\
5711         FACT=9;   PSE=*;   MAXCYCLE=20;   FMETHOD=all;   CADJUST=mean]   VFarmPos;
NBINOMIAL=N_Bin

***** Generalised Linear Mixed Model Analysis *****

               Method:    cf Schall (1991) Biometrika
     Response variate:    VFarmPos
         Distribution:    BINOMIAL
        Link function:    LOGIT

          Random model:   Farm
           Fixed model:   Constant + Sam_Year + Sam_Mon

* Dispersion parameter fixed at value 1.000


*** Monitoring information ***

Iteration     Gammas Dispersion            Max change
        1    0.08797      1.000            3.7834E+00
        2 0.000001000      1.000            8.7973E-02
        3   0.007951      1.000            7.9504E-03
        4    0.08668      1.000            7.8730E-02
        5    0.08698      1.000            3.0157E-04
        6    0.08777      1.000            7.9033E-04
        7    0.08780      1.000            2.6984E-05


*** Estimated Variance Components ***

Random term                   Component            S.e.

Farm                              0.088           0.276


*** Residual variance model ***

Term              Factor           Model(order)      Parameter           Estimate      S.e.

Dispersn                           Identity          Sigma2                  1.000     FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm     1      0.07627
 Dispersn     2      0.00000         0.00000

                              1               2



*** Table of effects for Constant ***

            -1.637    Standard error:       0.4301



*** Table of effects for Sam_Year ***

   Sam_Year            1998            1999             2000
                     0.0000         -0.1716          -0.6894

Standard error of differences:         Average            0.2471
                                       Maximum            0.2947
                                       Minimum            0.1938

Average variance of differences:                          0.06277


*** Table of effects for Sam_Mon ***

Sam_Mon        Jan      Feb          Mar       Apr         May         Jun      Jul      Aug
            0.0000   0.3126       0.8039    0.2403      0.9472      0.1870   0.6891   0.6000


Sam_Mon        Sep      Oct          Nov       Dec
            0.6828   0.3909       1.0287    0.3368

Standard error of differences:         Average            0.4246
                                       Maximum            0.5717
                                       Minimum            0.3162

Average variance of differences:                          0.1830



*** Tables of means ***


*** Table of predicted means for Sam_Year ***



Sam_Year      1998      1999       2000
            -1.119    -1.290     -1.808


*** Table of predicted means for Sam_Mon ***

Sam_Mon        Jan       Feb        Mar        Apr       May       Jun        Jul      Aug
            -1.924    -1.611     -1.120     -1.684    -0.977    -1.737     -1.235   -1.324


Sam_Mon        Sep       Oct        Nov        Dec
            -1.241    -1.533     -0.895     -1.587


*** Back-transformed Means (on the original scale) ***



       Sam_Year
           1998      0.2463
           1999      0.2158
           2000      0.1409


       Sam_Mon
           Jan       0.1274
           Feb       0.1664
           Mar       0.2460
           Apr       0.1566
           May       0.2735
           Jun       0.1497
           Jul       0.2253
           Aug       0.2102
           Sep       0.2242
           Oct       0.1775
           Nov       0.2900
           Dec       0.1698

Note: means are probabilities not expected values.


5712    FSPREADSHEET Vars[1]
5713    VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


  Fixed term                   Wald statistic         d.f.     Wald/d.f.     Chi-sq prob

* Sequentially adding terms to fixed model

  Sam_Year                           9.29               2          4.64         0.010
  Sam_Mon                           13.58              11          1.23         0.257

* Dropping individual terms from full fixed model

  Sam_Year                           5.64               2          2.82         0.059
  Sam_Mon                           13.58              11          1.23         0.257
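
The "Chi-sq prob" column above can be checked by referring each Wald statistic to a
chi-square distribution with the stated degrees of freedom. The following is a minimal
sketch in Python (not part of the original GenStat analysis); the function name chi2_sf
is an assumption introduced for illustration, and it uses the closed-form series for
integer degrees of freedom so that only the standard library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df = 2k+1:
    # erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{i=1}^{k} x^(i-1/2) / (2i-1)!!
    total = 0.0
    term = math.sqrt(x)
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= x / (2 * i - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# Wald statistics and d.f. copied from the table above (reported p-values in comments).
wald_tests = {
    "Sam_Year, added sequentially": (9.29, 2),   # reported 0.010
    "Sam_Year, dropped from full": (5.64, 2),    # reported 0.059
    "Sam_Mon": (13.58, 11),                      # reported 0.257
}
for label, (statistic, df) in wald_tests.items():
    print(f"{label}: p = {chi2_sf(statistic, df):.3f}")
```

Because the printed Wald statistics are rounded to two decimal places, the recomputed
probabilities can differ from the GenStat values in the third decimal place.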


The year of sampling is very close to statistical significance at the 5% level when
dropped from the full fixed model (p=0.059), with prevalence showing a small drop in
1999 and a larger drop in 2000. The estimated mean farm prevalences for each year are
as follows:

                        Mean Farm
         Year           Prevalence
         1998              0.25
         1999              0.22
         2000              0.14
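
The prevalences tabulated above are the predicted means from the GLMM output, which are
reported on the logit scale, passed through the inverse logit. A minimal sketch of that
back-transformation in Python (an illustration, not part of the original GenStat
analysis; the helper name inv_logit is an assumption) is:

```python
import math


def inv_logit(eta: float) -> float:
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Predicted means for Sam_Year on the logit scale, copied from the GenStat output.
predicted_means_logit = {"1998": -1.119, "1999": -1.290, "2000": -1.808}

prevalence = {year: inv_logit(eta) for year, eta in predicted_means_logit.items()}
for year, p in prevalence.items():
    print(f"{year}: {p:.2f}")
```

The same transformation applied to the Sam_Mon predicted means gives the monthly
prevalences. As the GenStat note says, these are probabilities evaluated at the mean
on the logit scale, not expected values averaged over the farm random effect.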



Plotting the mean prevalences by year, with the associated 95% confidence intervals,
gives:

[Figure: Mean Farm Prevalence (vertical axis, 0.00 to 1.00) plotted against Year
(1998, 1999, 2000), with 95% confidence intervals.]



There is evidence of a mild drop in prevalence in 1999, followed by a larger decrease
in 2000.

The month of sampling shows no sign of statistical significance (p=0.26). The mean
prevalences for these months are as follows:

                                 Mean Farm
 Month                           Prevalence
  Jan                               0.13
  Feb                               0.17
  Mar                               0.25
  Apr                               0.16
  May                               0.27
  Jun                               0.15
  Jul                               0.23
  Aug                               0.21
  Sep                               0.22
  Oct                               0.18
  Nov                               0.29
  Dec                               0.17

It is informative to plot the mean prevalences by month with the associated 95%
confidence intervals.




[Figure: Mean Farm Prevalence (vertical axis, 0.00 to 1.00) plotted against Month
(Jan to Dec), with 95% confidence intervals.]



There is some evidence of drops in prevalence in April and June and an increase in
November. It is also noticeable that December, January, and February present some
of the lowest prevalences across the months, even after adjusting for the Sampling
Year effect.

It will clearly be important to assess the nature of the year effect after allowing for
any explanatory factors which are identified as part of the modelling exercise. Given
the importance of Month in the within-herd prevalence model, it will also be
important to assess whether any Sampling Month-related effects become apparent in
the multi-factor model.

Turning to the candidate factors for the multi-factor model, a forward stepwise search
is run with no terms forced into the model.
5911   RSEARCH [METHOD=fstep]
FCattle+SamGrF+Cattle+NewSource+BeefonDairy+Breed2+Gra_Slur+Gra_Manu+Pigs

***** Model Selection *****

 Response variate:                     VFarmPos
  Binomial totals:                     N_Bin
     Distribution:                     Binomial
    Link function:                     Logit
  Number of units:                     951
     Forced terms:                     Constant
        Forced df:                     1
       Free terms:                     FCattle + SamGrF + Cattle + NewSource + BeefonDairy +
                                       Breed2 + Gra_Slur + Gra_Manu + Pigs


*** Stepwise (forward) analysis of deviance ***

Change                                 mean   deviance  approx
                   d.f.    deviance deviance     ratio  chi pr
+ SamGrF              3     31.4232  10.4744     10.47   <.001
+ Gra_Slur            2     10.8629   5.4314      5.43   0.004
+ Gra_Manu            1     10.8730  10.8730     10.87   <.001
+ BeefonDairy         1      7.8689   7.8689      7.87   0.005
+ Pigs                1      5.1369   5.1369      5.14   0.023
+ FCattle             3      7.6210   2.5403      2.54   0.055
+ Cattle              3      5.7489   1.9163      1.92   0.124
+ Breed2              1      2.4449   2.4449      2.44   0.118
+ NewSource           1      2.1589   2.1589      2.16   0.142
Residual            934    909.8257   0.9741

Total               950    993.9643   1.0463

        Final model: Constant + SamGrF + Gra_Slur + Gra_Manu +
                     BeefonDairy + Pigs + FCattle + Cattle + Breed2 +
                     NewSource

SamGrF, Gra_Slur (grass slurry), Gra_Manu (grass manure), BeefonDairy and Pigs all
enter the model with statistical significance at the 5% level. FCattle is close to this
level of statistical significance, while Cattle, Breed2 and NewSource all exhibit
p-values greater than 0.1. However, none of the variables has a p-value so high that it
would seem sensible to remove it from the analysis at this point. Cattle, Breed2 and
NewSource all give rise to p-values which are appreciably higher than those seen within
the univariate analyses. For the factor Cattle, this is not an enormous surprise, given
the many other factors included in the model which reflect the size of the farm
operation. However, it is important to establish which aspects of the model are causing
the drop in significance assigned to Breed2 and NewSource.

Breed2 and NewSource are each fitted in turn, both alone and together with each other
candidate variable. The significance of the factor, based on the change in deviance
when it is removed from the two-factor model, is tabulated.

Initially considering the Breed2 factor,

 Other Factor        P-Value
-                       0.021
SamGrF                   0.05
Gra_Slur                0.024
Gra_Manu                0.037
BeefonDairy             0.017
Pigs                    0.015
FCattle                 0.052
Cattle                  0.038
NewSource               0.024

It is clear that no single factor is strongly associated with the drop in significance
seen in the multi-factor model. This is probably related to the relatively low support
present in the dataset for the factor Breed2: only 6 farms in the dataset had this type
of animal present. On balance, it is more likely that the effect is spurious, arising
from the high leverage of these 6 farms and the unbalanced nature of the dataset. On
these grounds, Breed2 should ultimately be excluded from the multifactor analysis.

By contrast, tabulating the effects of other factors on NewSource gives:

 Other Factor        P-Value
-                       0.026
SamGrF                  0.151
Gra_Slur                0.028
Gra_Manu                0.031
BeefonDairy             0.039
Pigs                    0.037
FCattle                 0.222
Cattle                 <0.001
Breed2                  0.037

It is immediately clear that the finishing cattle number factors, SamGrF and FCattle,
are associated with a dramatic drop in the significance of NewSource. This is probably
due to some type of correlation between farm size and open status. First, the
multifactor model is fitted without these two factors to confirm the relationship.
5735       MODEL    [DISTRIBUTION=binomial;   LINK=logit;    DISPERSION=1]  VFarmPos;
NBINOMIAL=N_Bin
5736   TERMS [FACT=9] SamGrF + Gra_Slur + Gra_Manu +BeefonDairy + Pigs + FCattle +
Cattle + Breed2 +NewSource
5737    FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes;
FACT=9]\
5738   Gra_Slur + Gra_Manu +BeefonDairy + Pigs + Cattle + Breed2 +NewSource

5738............................................................................


* MESSAGE: Term Gra_Manu cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

 (Gra_Manu 999) = (Gra_Slur 999)

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant + Gra_Slur + Gra_Manu + BeefonDairy +
                    Pigs + Cattle + Breed2 + NewSource


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression      10         58.3       5.8334      5.83 <.001
Residual       940        935.6       0.9954
Total          950        994.0       1.0463
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The residuals do not appear to be random;
           for example, fitted values in the range 0.04 to 0.07
           are consistently larger than observed values
           and fitted values in the range 0.36 to 0.38
           are consistently smaller than observed values
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           62         1.00       0.049
           70         1.00       0.049
           80         1.00       0.051
          131         1.00       0.094
          165         0.00       0.049
          182         0.00       0.047
          200         0.00       0.049
          201         0.00       0.049
          284         1.00       0.057
          297         0.00       0.094
          301         1.00       0.094
         310         0.00         0.047
         348         0.00         0.047
         370         0.00         0.047
         384         0.00         0.094
         385         1.00         0.094
         418         0.00         0.047
         437         0.00         0.047
         444         1.00         0.141
         460         1.00         0.047
         494         0.00         0.141
         496         0.00         0.049
         497         0.00         0.102
         527         0.00         0.255
         581         1.00         0.051
         599         1.00         0.047
         600         0.00         0.176
         601         1.00         0.108
         602         0.00         0.072
         603         1.00         0.084
         620         1.00         0.167
         637         1.00         0.217
         640         1.00         0.051
         651         1.00         0.049
         659         0.00         0.046
         680         0.00         0.073
         688         1.00         0.211
         701         1.00         0.094
         737         0.00         0.200
         748         0.00         0.047
         750         1.00         0.047
         761         0.00         0.141
         763         0.00         0.141
         769         0.00         0.073
         788         1.00         0.094
         858         0.00         0.102
         884         1.00         0.133
         911         0.00         0.167
         950         0.00         0.041


*** Estimates of parameters ***

                                                                        antilog of
                              estimate         s.e.        t(*) t pr.     estimate
Constant                        -1.887        0.203       -9.29 <.001       0.1515
Gra_Slur 1                       1.118        0.319        3.50 <.001        3.058
Gra_Slur 999                     0.045        0.191        0.24 0.813        1.046
Gra_Manu 1                      -1.179        0.367       -3.22 0.001       0.3075
Gra_Manu 999                         0            *           *     *        1.000
BeefonDairy 1                    1.313        0.645        2.04 0.042        3.716
Pigs 2                           0.890        0.343        2.60 0.009        2.436
Cattle 2                         0.532        0.182        2.93 0.003        1.702
Cattle 3                         1.271        0.470        2.70 0.007        3.566
Cattle 4                         -0.04         1.12       -0.04 0.972       0.9610
Breed2 1                         1.709        0.901        1.90 0.058        5.525
NewSource 2                      0.501        0.178        2.81 0.005        1.650
* MESSAGE: s.e.s are based on dispersion parameter with   value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
            Gra_Slur 0
            Gra_Manu 0
         BeefonDairy 0
                Pigs 1
              Cattle 1
              Breed2 0
           NewSource 1
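
The "antilog of estimate" column above is the exponential of the logit-scale estimate,
that is, the odds ratio relative to the reference level, and with the dispersion fixed
at 1 the t(*) column is the estimate divided by its standard error, referred to a
standard normal distribution. The sketch below (Python, standard library only; not part
of the original GenStat analysis) reproduces these quantities for NewSource 2 and adds
an approximate 95% Wald interval for the odds ratio. The interval is an illustrative
addition that GenStat does not print here, and the function name wald_summary is an
assumption.

```python
import math


def wald_summary(estimate: float, se: float, z_crit: float = 1.96) -> dict:
    """Odds ratio, approximate Wald interval, z statistic and two-sided normal p-value."""
    z = estimate / se
    return {
        "odds_ratio": math.exp(estimate),
        "ci_low": math.exp(estimate - z_crit * se),
        "ci_high": math.exp(estimate + z_crit * se),
        "z": z,
        # Two-sided tail probability of a standard normal variable.
        "p": math.erfc(abs(z) / math.sqrt(2.0)),
    }


# Estimate and s.e. for NewSource 2, copied from the table of estimates above.
newsource = wald_summary(estimate=0.501, se=0.178)
print(f"odds ratio {newsource['odds_ratio']:.3f} "
      f"({newsource['ci_low']:.2f}, {newsource['ci_high']:.2f}), "
      f"z = {newsource['z']:.2f}, p = {newsource['p']:.3f}")
```

The interval treats the estimate as normally distributed on the logit scale, which is
the same approximation that underlies the printed t probabilities.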

Clearly, in the absence of the finishing cattle size factors, NewSource is highly
significant (p=0.005). The relationship between these factors will initially be
investigated through tabulation.

Tabulating the properties of the dataset with respect to NewSource and SamGrF
gives:



n         SamGrF
NewSource       1              2         3          4
        1     180            142       148        125
        2      65             92        93        107

mean      SamGrF
NewSource        1             2         3          4
        1    0.106         0.197     0.216      0.296
        2    0.123         0.261     0.237      0.346

var       SamGrF
NewSource        1             2         3          4
        1    0.095         0.159     0.171      0.210
        2    0.110         0.195     0.183      0.228

se        SamGrF
NewSource        1             2         3          4
        1    0.023         0.034     0.034      0.041
        2    0.041         0.046     0.044      0.046

There is little evidence of any difference due to NewSource at any of the levels of
SamGrF: in each case the mean is higher in the open farms, but the difference is
small relative to the standard errors.
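
The comparison of differences against standard errors can be made explicit with a rough
calculation from the tabulated means and standard errors. The sketch below (Python,
standard library only; not part of the original analysis) assumes that the two
NewSource groups are independent at each level of SamGrF and that a normal
approximation is adequate. NewSource level 2 is taken to be the open farms, as implied
by the text above.

```python
import math

# Means and standard errors copied from the NewSource by SamGrF tabulation above.
samgrf_levels = [1, 2, 3, 4]
mean_closed = [0.106, 0.197, 0.216, 0.296]  # NewSource = 1
mean_open = [0.123, 0.261, 0.237, 0.346]    # NewSource = 2 (assumed to be open farms)
se_closed = [0.023, 0.034, 0.034, 0.041]
se_open = [0.041, 0.046, 0.044, 0.046]

z_scores = []
for level, m1, m2, s1, s2 in zip(samgrf_levels, mean_closed, mean_open,
                                 se_closed, se_open):
    difference = m2 - m1
    # Standard error of a difference between two independent means.
    se_difference = math.sqrt(s1 ** 2 + s2 ** 2)
    z = difference / se_difference
    z_scores.append(z)
    print(f"SamGrF {level}: difference {difference:.3f}, "
          f"s.e. {se_difference:.3f}, z = {z:.2f}")
```

Every ratio falls well below the conventional 1.96 threshold, which is consistent with
the statement that the open/closed difference is small relative to its standard error
within each level of SamGrF.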

Tabulating the properties of the dataset with respect to NewSource and FCattle gives:

n         FCattle
NewSource         1           2         3         4
        1      341          141        88        25
        2      124          108        87        38

mean      FCattle
NewSource         1           2         3          4
        1    0.158        0.255     0.216      0.280
        2    0.169        0.259     0.299      0.421

var       FCattle
NewSource         1           2         3          4
        1    0.134        0.191     0.171      0.210
        2    0.142        0.194     0.212      0.250

se        FCattle
NewSource         1           2         3          4
        1    0.020        0.037     0.044      0.092
        2    0.034        0.042     0.049      0.081




Again, there is negligible difference in the mean behaviour between open and closed
farms except in the farms with the largest numbers of finishing cattle, and there the
numbers of farms are small, so that the associated standard errors are large. The
evidence for NewSource being the driving factor behind the variability seen in these
tables is weak and contradictory. By contrast, both FCattle and SamGrF show
self-consistent patterns of effect: all the higher levels of the factor consistently
show significantly different prevalence levels from the lowest level. On balance, it is
more likely that the NewSource effect is, at best, small and lacking in statistical
significance in this study. On these grounds, NewSource should ultimately be
excluded from the multifactor analysis.

Fitting the remaining factors in a multi-factor model, we generate the following
output:
5780       MODEL    [DISTRIBUTION=binomial;   LINK=logit;    DISPERSION=1] VFarmPos;
NBINOMIAL=N_Bin
5781   TERMS [FACT=9] SamGrF + Gra_Slur + Gra_Manu+BeefonDairy + Pigs + FCattle +
Cattle
5782   FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes;
FACT=9]\
5783   SamGrF + Gra_Slur + Gra_Manu+BeefonDairy + Pigs + FCattle + Cattle

5783............................................................................


* MESSAGE: Term Gra_Manu cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

 (Gra_Manu 999) = (Gra_Slur 999)

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant + SamGrF + Gra_Slur + Gra_Manu +
                    BeefonDairy + Pigs + FCattle + Cattle


*** Summary of analysis ***

                                        mean deviance approx
              d.f.     deviance     deviance     ratio chi pr
Regression      14         79.5       5.6811      5.68 <.001
Residual       936        914.4       0.9770
Total          950        994.0       1.0463
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The residuals do not appear to be random;
           for example, fitted values in the range 0.09 to 0.10
           are consistently larger than observed values
           and fitted values in the range 0.33 to 0.34
           are consistently smaller than observed values
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           62         1.00       0.060
           70         1.00       0.068
           80         1.00       0.062
          131         1.00       0.098
          165         0.00       0.060
          182         0.00       0.062
          200         0.00       0.068
          284         1.00       0.059
          297         0.00       0.103
          301         1.00       0.106
          310         0.00        0.057
          370         0.00        0.062
          384         0.00        0.111
          385         1.00        0.107
          418         0.00        0.059
          437         0.00        0.058
          444         1.00        0.214
          460         1.00        0.056
          494         0.00        0.114
          496         0.00        0.074
          497         0.00        0.094
          527         0.00        0.289
          581         1.00        0.059
          601         1.00        0.111
          602         0.00        0.071
          603         1.00        0.083
          640         1.00        0.052
          651         1.00        0.069
          659         0.00        0.052
          680         0.00        0.074
          701         1.00        0.094
          737         0.00        0.214
          748         0.00        0.058
          750         1.00        0.056
          761         0.00        0.114
          763         0.00        0.096
          769         0.00        0.085
          788         1.00        0.104
          858         0.00        0.115
          884         1.00        0.063


*** Estimates of parameters ***

                                                                        antilog of
                              estimate         s.e.        t(*) t pr.     estimate
Constant                        -2.460        0.272       -9.04 <.001      0.08544
SamGrF 2                         0.786        0.266        2.96 0.003        2.195
SamGrF 3                         0.642        0.269        2.39 0.017        1.901
SamGrF 4                         1.135        0.267        4.25 <.001        3.111
Gra_Slur 1                       1.121        0.322        3.48 <.001        3.068
Gra_Slur 999                     0.121        0.197        0.61 0.540        1.128
Gra_Manu 1                      -1.131        0.371       -3.05 0.002       0.3228
Gra_Manu 999                         0            *           *     *        1.000
BeefonDairy 1                    1.788        0.651        2.75 0.006        5.980
Pigs 2                           0.893        0.347        2.57 0.010        2.443
FCattle 2                        0.280        0.207        1.35 0.176        1.324
FCattle 3                        0.183        0.234        0.79 0.432        1.201
FCattle 4                        0.783        0.317        2.47 0.013        2.187
Cattle 2                         0.277        0.175        1.58 0.113        1.320
Cattle 3                         0.845        0.475        1.78 0.075        2.328
Cattle 4                         -0.90         1.15       -0.78 0.434       0.4054
* MESSAGE: s.e.s are based on dispersion parameter with   value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
              SamGrF 1
            Gra_Slur 0
            Gra_Manu 0
         BeefonDairy 0
                Pigs 1
             FCattle 1
              Cattle 1


All of the factors included in this model give rise to effects qualitatively similar to
those seen in the univariate analyses.

Again using stepwise regression to explore the properties of the data, we force the
above factors to be included in the model, and explore whether any other factors now
should be included in the model (excluding time and geographical variables which
will be considered later):



5838   RSEARCH [METHOD=fstep;FORCED=FCattle+SamGrF+Cattle+BeefonDairy+Gra_Slur+Gra_Manu+Pigs]
Manage_O \\
5839 +Sampler+ Max_Age + Min_Age + Housed + Housing+ NoChange + T_DHouse+Sup_Feed\\
5840 +Forage + Silage+Conc+ Sil_Home+
Sil_Manu+Sil_Slur+Sil_Sewa+Sil_Geec+Sil_Gull+Hay + Hay_Manu +
Hay_Slur+Hay_Geec+Hay_Gull\\
5841 +Gra_Sewa+Gra_Geec+Gra_Gull+Sheep + N_Horses+ Chicks + Deer+ Water + Water_Con +
WaterCT+ Want2Kno \\
5842 + Visit2

***** Model Selection *****

Response variate:    VFarmPos
 Binomial totals:    N_Bin
    Distribution:    Binomial
   Link function:    Logit
 Number of units:    950
    Forced terms:    Constant + FCattle + SamGrF + Cattle + BeefonDairy +
                     Gra_Slur + Gra_Manu + Pigs
          Forced df: 15
         Free terms: Manage_O + Sampler + Max_Age + Min_Age + Housed +
                     Housing + NoChange + T_DHouse + Sup_Feed +
                     Forage + Silage + Conc + Sil_Home + Sil_Manu +
                     Sil_Slur + Sil_Sewa + Sil_Geec + Sil_Gull +
                     Hay + Hay_Manu + Hay_Slur + Hay_Geec + Hay_Gull +
                     Gra_Sewa + Gra_Geec + Gra_Gull + Sheep + N_Horses +
                     Chicks + Deer + Water + Water_Con + WaterCT +
                     Want2Kno + Visit2


*** Stepwise (forward) analysis of deviance ***

Change                                                     mean   deviance approx
                                 d.f.     deviance     deviance      ratio chi pr
+ FCattle
+ SamGrF
+ Cattle
+ BeefonDairy
+ Gra_Slur
+ Gra_Manu
+ Pigs                            14      79.6050       5.6861       5.69   <.001
+ Housing                          4      13.2654       3.3163       3.32   0.010
+ Max_Age                          1       4.0096       4.0096       4.01   0.045
+ Water                            6       8.7817       1.4636       1.46   0.186
+ Sampler                          1       3.5225       3.5225       3.52   0.061
+ T_DHouse                         1       3.1108       3.1108       3.11   0.078
+ Sil_Geec                         2       3.1216       1.5608       1.56   0.210
+ Hay_Slur                         2       2.9882       1.4941       1.49   0.224
+ Hay_Geec                         1       4.7456       4.7456       4.75   0.029
+ Hay_Manu                         1       3.5651       3.5651       3.57   0.059
+ Hay_Gull                         1       2.6588       2.6588       2.66   0.103
+ Manage_O                         3       3.1311       1.0437       1.04   0.372
+ Sil_Slur                         1       1.3876       1.3876       1.39   0.239
+ Sil_Gull                         1       1.0677       1.0677       1.07   0.301
Residual                         910     858.5149       0.9434

Total                             949     993.4757       1.0469

        Final model: Constant + FCattle + SamGrF + Cattle + BeefonDairy +
                     Gra_Slur + Gra_Manu + Pigs + Housing + Max_Age +
                     Water + Sampler + T_DHouse + Sil_Geec + Hay_Slur +
                     Hay_Geec + Hay_Manu + Hay_Gull + Manage_O +
                     Sil_Slur + Sil_Gull


On fitting this model, it becomes apparent that the model is subject to a serious lack
of fit due to aliasing between Housing and Gra_Slur (grass slurry). Housing is by far
the less readily interpretable variable and is dropped. Rerunning the stepwise
procedure gives:
5851   RSEARCH [METHOD=fstep;FORCED=FCattle+SamGrF+Cattle+BeefonDairy+Gra_Slur+Gra_Manu+Pigs]
Manage_O \\
5852 +Sampler+ Max_Age + Min_Age + Housed + NoChange + T_DHouse+Sup_Feed\\
5853 +Forage + Silage+Conc+ Sil_Home+
Sil_Manu+Sil_Slur+Sil_Sewa+Sil_Geec+Sil_Gull+Hay + Hay_Manu +
Hay_Slur+Hay_Geec+Hay_Gull\\
5854 +Gra_Sewa+Gra_Geec+Gra_Gull+Sheep + N_Horses+ Chicks + Deer+ Water + Water_Con +
WaterCT+ Want2Kno \\
5855 + Visit2

***** Model Selection *****

 Response variate:   VFarmPos
  Binomial totals:   N_Bin
     Distribution:   Binomial
    Link function:   Logit
  Number of units:   950
     Forced terms:   Constant + FCattle + SamGrF + Cattle + BeefonDairy +
                     Gra_Slur + Gra_Manu + Pigs
          Forced df: 15
         Free terms: Manage_O + Sampler + Max_Age + Min_Age + Housed +
                     NoChange + T_DHouse + Sup_Feed + Forage +
                     Silage + Conc + Sil_Home + Sil_Manu + Sil_Slur +
                     Sil_Sewa + Sil_Geec + Sil_Gull + Hay + Hay_Manu +
                     Hay_Slur + Hay_Geec + Hay_Gull + Gra_Sewa +
                     Gra_Geec + Gra_Gull + Sheep + N_Horses + Chicks +
                     Deer + Water + Water_Con + WaterCT + Want2Kno +
                     Visit2


*** Stepwise (forward) analysis of deviance ***

Change                                                     mean   deviance approx
                                 d.f.     deviance     deviance      ratio chi pr
+ FCattle
+ SamGrF
+ Cattle
+ BeefonDairy
+ Gra_Slur
+ Gra_Manu
+ Pigs                            14      79.6050       5.6861       5.69   <.001
+ Sampler                          1       4.4342       4.4342       4.43   0.035
+ Max_Age                          1       3.7562       3.7562       3.76   0.053
+ Water                            6       8.0363       1.3394       1.34   0.235
+ T_DHouse                         1       3.1636       3.1636       3.16   0.075
+ Hay_Geec                         2       3.4529       1.7264       1.73   0.178
+ Hay_Slur                         1       3.8311       3.8311       3.83   0.050
+ Hay_Manu                         1       3.4441       3.4441       3.44   0.063
+ Manage_O                         3       4.3497       1.4499       1.45   0.226
+ Hay_Gull                         1       2.3074       2.3074       2.31   0.129
+ Sil_Geec                         2       2.5615       1.2807       1.28   0.278
+ Housed                           1       1.3066       1.3066       1.31   0.253
Residual                         915     873.2272       0.9543

Total                            949     993.4757       1.0469

        Final model: Constant + FCattle + SamGrF + Cattle + BeefonDairy +
                     Gra_Slur + Gra_Manu + Pigs + Sampler + Max_Age +
                     Water + T_DHouse + Hay_Geec + Hay_Slur + Hay_Manu +
                     Manage_O + Hay_Gull + Sil_Geec + Housed


The threshold for inclusion in the forward search is deliberately lenient, so many of
these factors will lack statistical significance. We examine their suitability for
inclusion in the model by implementing a backwards stepwise procedure.

1/ Housed is not statistically significant when dropped (p=0.22). Housed is dropped.
2/ Sil_Geec is not statistically significant when dropped (p=0.11). Sil_Geec is
dropped.
3/ Sil_Slur is not statistically significant when dropped (p=0.84). Sil_Slur is dropped.
4/ Sampler is not statistically significant when dropped (p=0.17). Sampler is dropped.
5/ Sil_Gull is not statistically significant when dropped (p=0.57). Sil_Gull is
dropped.
6/ Water is not statistically significant when dropped (p=0.19). Water is dropped.
7/ Hay_Geec is not statistically significant when dropped (p=0.20). Hay_Geec is
dropped.
8/ Hay_Manu is not statistically significant when dropped (p=0.54). Hay_Manu is
dropped.
9/ Hay_Gull is not statistically significant when dropped (p=0.30). Hay_Gull is
dropped.
10/ Hay_Slur is not statistically significant when dropped (p=0.25). Hay_Slur is
dropped.
11/ T_DHouse is not statistically significant when dropped (p=0.11). T_DHouse is
dropped.
12/ Cattle is not statistically significant when dropped (p=0.18). Cattle is dropped.

All the remaining terms are statistically significant at the 10% level or better. The
variate Max_Age has been added as a new candidate term: a higher maximum age among
the animals in the sample group is associated with the samples being less likely to
contain a positive. Examination of the histogram of this variable suggests that it is
unlikely to be subject to serious leverage problems.
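
The backwards stepwise procedure described above can be sketched generically in Python. This is an illustration under stated assumptions and not the Genstat code used for the analysis: the function backward_eliminate and the callback drop_p_value are hypothetical names, the rule of always dropping the term with the largest p-value is one common variant (the order used in the report may differ), and the p-values in the toy example are made up and ignore the current model.

```python
def backward_eliminate(terms, drop_p_value, threshold=0.10):
    """Drop the least significant term until every remaining term has p < threshold.

    drop_p_value(term, current_terms) must return the p-value for removing
    `term` from a model containing `current_terms` (e.g. from a deviance test).
    """
    current = list(terms)
    dropped = []
    while current:
        p_by_term = {t: drop_p_value(t, current) for t in current}
        worst = max(p_by_term, key=p_by_term.get)
        if p_by_term[worst] < threshold:
            break  # everything left is significant at the chosen level
        current.remove(worst)
        dropped.append((worst, p_by_term[worst]))
    return current, dropped


# Toy illustration only: fixed, made-up p-values that do not depend on the model.
toy_p = {"A": 0.002, "B": 0.45, "C": 0.08, "D": 0.22}
kept, dropped = backward_eliminate(list(toy_p), lambda term, current: toy_p[term])
print(kept, dropped)
```

In a real analysis the callback would refit the model without the candidate term at each step, since the p-value for one term changes as other terms are removed.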
DROP [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes] Cattle


* MESSAGE: Term Gra_Manu cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model

 (Gra_Manu 999) = (Gra_Slur 999)

***** Regression Analysis *****

Response variate:   VFarmPos
 Binomial totals:   N_Bin
    Distribution:   Binomial
   Link function:   Logit
    Fitted terms:   Constant + FCattle + SamGrF + BeefonDairy +
                    Gra_Slur + Gra_Manu + Pigs + Max_Age


*** Summary of analysis ***

                                        mean    deviance approx
             d.f.      deviance     deviance       ratio chi pr
Regression     12          78.0       6.5038        6.50 <.001
Residual      937         915.4       0.9770
Total         949         993.5       1.0469

Change           3          4.9       1.6474      1.65 0.176
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00
* MESSAGE: The residuals do not appear to be random;
           for example, fitted values in the range 0.07 to 0.09
           are consistently larger than observed values
           and fitted values in the range 0.55 to 0.58
           are consistently smaller than observed values
* MESSAGE: The error variance does not appear to be constant:
           large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
         Unit     Response    Leverage
           25         1.00       0.046
           53         1.00       0.049
           80         1.00       0.060
          131         1.00       0.091
          297         0.00       0.105
          301         1.00       0.108
          384         0.00       0.109
          385         1.00       0.102
           440         1.00       0.049
          497         0.00        0.099
          527         0.00        0.051
          552         0.00        0.050
          581         1.00        0.059
          601         1.00        0.109
          602         0.00        0.096
          640         1.00        0.052
          659         0.00        0.049
          701         1.00        0.097
          788         1.00        0.101
          858         0.00        0.114


*** Estimates of parameters ***

                                                                        antilog of
                              estimate         s.e.        t(*) t pr.     estimate
Constant                        -1.750        0.384       -4.55 <.001       0.1738
FCattle 2                        0.383        0.208        1.84 0.066        1.466
FCattle 3                        0.362        0.234        1.55 0.122        1.436
FCattle 4                        0.981        0.318        3.08 0.002        2.668
SamGrF 2                         0.746        0.266        2.81 0.005        2.109
SamGrF 3                         0.632        0.269        2.35 0.019        1.882
SamGrF 4                         1.097        0.268        4.09 <.001        2.995
BeefonDairy 1                    1.967        0.641        3.07 0.002        7.150
Gra_Slur 1                       1.201        0.319        3.76 <.001        3.324
Gra_Slur 999                     0.084        0.197        0.43 0.668        1.088
Gra_Manu 1                      -1.164        0.369       -3.15 0.002       0.3122
Gra_Manu 999                         0            *           *     *        1.000
Pigs 2                           0.891        0.347        2.57 0.010        2.438
Max_Age                        -0.0309       0.0154       -2.01 0.044       0.9695
* MESSAGE: s.e.s are based on dispersion parameter with   value 1

Parameters for factors are differences compared with the reference level:
              Factor Reference level
             FCattle 1
              SamGrF 1
         BeefonDairy 0
            Gra_Slur 0
            Gra_Manu 0
                Pigs 1
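
The "antilog of estimate" column in the table of estimates above is the exponential of each logit-scale estimate, which can be read as an odds ratio relative to the reference level (or per unit of Max_Age). The Python sketch below illustrates this for a few rows and adds an approximate Wald-type 95% interval. The interval is not part of the Genstat output; it assumes approximate normality of the estimates on the logit scale with the dispersion parameter fixed at 1, and the function name is illustrative.

```python
import math

# (parameter, estimate, s.e.) copied from the table of estimates above
estimates = [
    ("FCattle 4", 0.981, 0.318),
    ("BeefonDairy 1", 1.967, 0.641),
    ("Gra_Slur 1", 1.201, 0.319),
    ("Gra_Manu 1", -1.164, 0.369),
    ("Max_Age", -0.0309, 0.0154),
]


def odds_ratio_with_ci(estimate, se, z=1.96):
    """Return (odds ratio, lower, upper) using a Wald interval on the logit scale."""
    return (math.exp(estimate),
            math.exp(estimate - z * se),
            math.exp(estimate + z * se))


results = {name: odds_ratio_with_ci(b, se) for name, b, se in estimates}
for name, (odds_ratio, lower, upper) in results.items():
    print(f"{name:14s} OR = {odds_ratio:.3f}  approx. 95% CI ({lower:.3f}, {upper:.3f})")
```

Small differences in the last decimal place from the printed antilogs are to be expected, because the estimates used here are already rounded.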


Hence, the factors FCattle, SamGrF, BeefonDairy, Gra_Slur, Gra_Manu and Pigs, and
the variate Max_Age, are carried forward for detailed review in the Generalised Linear
Mixed Model.

Fitting this model in the Generalised Linear Mixed Model context gives the following
output. Initially, County and veterinary practice are fitted as possible random effects
along with Farm.
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects;\
  DISTRIBUTION=binomial; LINK=logit; DISPERSION=1;\
  FIXED=SamGrF + Gra_Slur + Gra_Manu + BeefonDairy + Pigs + FCattle + Max_Age;\
  RANDOM=County + Vet + Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20;\
  FMETHOD=fixed; CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin

**** G5W0001 **** Warning (Code VC 38). Statement 131 in Procedure GLMM

Command: REML [PRINT=*; RMETHOD=all] TRANS
Value of deviance at final iteration larger than at previous iteration(s)

Minimum deviance = 2199.17: value at final iteration = 2215.26


**** G5W0002 **** Warning (Code VD 12). Statement 131 in Procedure GLMM

Command: REML [PRINT=*; RMETHOD=all] TRANS
REML algorithm has diverged/parameters out of bounds - output not available

Results may be unreliable. Printed estimates of variance parameters/monitoring
information are available from REML or VDISPLAY and will indicate which
parameters are unstable. Redefine the model or use better initial values.


**** G5W0003 **** Warning (Code VD 12). Statement 135 in Procedure GLMM

Command: VKEEP #RAND; COMP=V[]
REML algorithm has diverged/parameters out of bounds - output not available

Results may be unreliable. Printed estimates of variance parameters/monitoring
information are available from REML or VDISPLAY and will indicate which
parameters are unstable. Redefine the model or use better initial values.


* Message: Negative variance components present:

* Tables of effects/means will be produced for random model terms but should be
  used with caution



***** Generalised Linear Mixed Model Analysis *****

              Method:    Marginal model, cf Breslow & Clayton (1993) JASA
    Response variate:    VFarmPos
        Distribution:    BINOMIAL
       Link function:    LOGIT

         Random model:   (County + Vet) + Farm
          Fixed model:   Constant + (((((SamGrF + Gra_Slur) + Gra_Manu) +
                         BeefonDairy) + Pigs) + FCattle) + Max_Age

* Dispersion parameter fixed at value 1.000


******** Warning from GLMM:
         missing values generated in weights/working variate.


*** Monitoring information ***

Iteration     Gammas                            Dispersion   Max change
        1   0.009026    0.0001000     0.2296         1.000   3.4670E+00

******** Warning from GLMM:
         missing values generated in weights/working variate.

        2   0.005302    0.0001000   0.0001000       1.000    2.2952E-01

******** Warning from GLMM:
         missing values generated in weights/working variate.

        3   0.005788    0.0001000     0.09561       1.000    9.5507E-02

******** Warning from GLMM:
         missing values generated in weights/working variate.

        4   0.005703    0.0001000     0.2326        1.000    1.3699E-01

******** Warning from GLMM:
         missing values generated in weights/working variate.

        5   0.005657    0.0001000     0.2370        1.000    4.3747E-03

******** Warning from GLMM:
         missing values generated in weights/working variate.

        6   0.005638    0.0001000     0.2368         1.000   2.0973E-04

******** Warning from GLMM:
         missing values generated in weights/working variate.

        7   0.005635    0.0001000     0.2367         1.000   7.3462E-05


*** Estimated Variance Components ***

Random term                      Component             S.e.

County                               0.006           0.052
Vet                                  0.000           0.093
Farm                                 0.237           0.304


*** Residual variance model ***

Term                Factor            Model(order)      Parameter           Estimate    S.e.

Dispersn                              Identity          Sigma2                1.000     FIXED



*** Estimated Variance matrix for Variance Components ***

   County       1        0.00272
      Vet       2       -0.00204         0.00858
     Farm       3       -0.00034        -0.00654           0.09255
 Dispersn       4        0.00000         0.00000           0.00000      0.00000

                                 1               2                3           4



*** Table of effects for Constant ***

               -2.349    Standard error:       0.2667



*** Table of effects for SamGrF ***

 SamGrF             1        2            3         4
               0.0000   0.7505       0.6321    1.0896

Standard error of differences:            Average             0.2515
                                          Maximum             0.2737
                                          Minimum             0.2280

Average variance of differences:                              0.06369


*** Table of effects for Gra_Slur ***

Gra_Slur          0.0      1.0        999.0
               0.0000   1.2091       0.0885

Standard error of differences:            Average              0.2846
                                          Maximum              0.3277
                                          Minimum              0.2011

Average variance of differences:                              0.08451


*** Table of effects for Gra_Manu ***

   Gra_Manu                0.0             1.0           999.0
                        0.0000         -1.1654          0.0000

Standard error of differences:                0.3776



*** Table of effects for BeefonDairy ***

BeefonDairy   0.0000   1.0000
           0.0000   1.9659

Standard error of differences:           0.6598



*** Table of effects for Pigs ***

   Pigs         1          2
           0.0000     0.8876

Standard error of differences:           0.3565



*** Table of effects for FCattle ***

FCattle         1          2         3          4
           0.0000     0.3834    0.3564     0.9813

Standard error of differences:       Average           0.2796
                                     Maximum           0.3352
                                     Minimum           0.2130

Average variance of differences:                      0.08062


*** Table of effects for Max_Age ***

          -0.03106     Standard error:      0.015738




**** G5W0004 **** Warning (Code VC 19). Statement 268 in Procedure GLMM

Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[]
Table/sed matrix not available for mean effects of covariates

Table of mean effects cannot be saved for term Max_Age
as it is a variate/covariate



*** Tables of means ***


* Using covariate mean values




*** Table of predicted means for SamGrF ***

     SamGrF                1            2                3           4
                     -0.4479       0.3025           0.1842      0.6417


*** Table of predicted means for Gra_Slur ***

   Gra_Slur              0.0          1.0           999.0
                     -0.2624       0.9467         -0.1740


*** Table of predicted means for Gra_Manu ***

   Gra_Manu             0.0          1.0             999.0
                     0.5586      -0.6068            0.5586


*** Table of predicted means for BeefonDairy ***

BeefonDairy           0.0000      1.0000
                     -0.8129      1.1531


*** Table of predicted means for Pigs ***

       Pigs             1            2
                  -0.2737       0.6139


*** Table of predicted means for FCattle ***

    FCattle             1            2              3          4
                  -0.2602       0.1233         0.0962     0.7212


*** Back-transformed Means (on the original scale) ***


* Using covariate mean values



      SamGrF
           1      0.3899
           2      0.5751
           3      0.5459
           4      0.6551


    Gra_Slur
         0.0      0.4348
         1.0      0.7204
       999.0      0.4566


    Gra_Manu
         0.0      0.6361
         1.0      0.3528
       999.0      0.6361


 BeefonDairy
      0.0000      0.3073
      1.0000      0.7601


        Pigs
           1      0.4320
           2      0.6488


     FCattle
           1      0.4353
           2      0.5308
           3      0.5240
           4      0.6729

Note: means are probabilities not expected values.
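
The back-transformed means above are the predicted means on the logit scale passed through the inverse logit function. The Python sketch below illustrates this for the SamGrF predicted means from this fit; the function name inv_logit is illustrative, and the results should agree with the printed back-transformed values up to rounding.

```python
import math


def inv_logit(eta):
    """Map a value on the logit (linear predictor) scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Predicted means on the logit scale for SamGrF levels 1 to 4 (from the table above)
samgrf_logit = {1: -0.4479, 2: 0.3025, 3: 0.1842, 4: 0.6417}

samgrf_prob = {level: inv_logit(eta) for level, eta in samgrf_logit.items()}
for level, prob in samgrf_prob.items():
    print(f"SamGrF {level}: {prob:.4f}")
```

Differences between predicted means on the logit scale reproduce the corresponding effects; for example, 0.3025 - (-0.4479) = 0.7504, which matches the SamGrF level 2 effect of 0.7505 up to rounding.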

Veterinary practice is clearly the least important variance component: its estimate is
0.000, with a standard error of 0.093. The model is refitted without this random factor.
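
As a rough guide to the relative size of the three variance components, each estimate can be compared with its standard error. The Python sketch below does this for the values reported above. Such ratios are only indicative, because the sampling distribution of a variance component is skewed when the true value is close to zero, so this is not a formal test.

```python
# (random term, component, s.e.) copied from the estimated variance components above
components = [
    ("County", 0.006, 0.052),
    ("Vet", 0.000, 0.093),
    ("Farm", 0.237, 0.304),
]

# Ratio of each estimate to its standard error (a crude, informal measure)
ratios = {term: comp / se for term, comp, se in components}
smallest = min(ratios, key=ratios.get)

for term, ratio in ratios.items():
    print(f"{term:8s} component / s.e. = {ratio:.2f}")
print("Smallest ratio:", smallest)
```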
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects;\
  DISTRIBUTION=binomial; LINK=logit; DISPERSION=1;\
  FIXED=SamGrF + Gra_Slur + Gra_Manu + BeefonDairy + Pigs + FCattle + Max_Age;\
  RANDOM=County + Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20;\
  FMETHOD=fixed; CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin

* Message: Negative variance components present:

* Tables of effects/means will be produced for random model terms but should be
  used with caution



***** Generalised Linear Mixed Model Analysis *****

                 Method:     Marginal model, cf Breslow & Clayton (1993) JASA
       Response variate:     VFarmPos
           Distribution:     BINOMIAL
          Link function:     LOGIT

           Random model:     County + Farm
            Fixed model:     Constant + (((((SamGrF + Gra_Slur) + Gra_Manu) +
                             BeefonDairy) + Pigs) + FCattle) + Max_Age

* Dispersion parameter fixed at value 1.000


******** Warning from GLMM:
         missing values generated in weights/working variate.


*** Monitoring information ***

Iteration         Gammas                Dispersion         Max change
        1      0.0001000      0.1210         1.000         3.5292E+00

******** Warning from GLMM:
         missing values generated in weights/working variate.

           2   0.0001000   0.0001000          1.000        1.2092E-01

******** Warning from GLMM:
         missing values generated in weights/working variate.

           3   0.0001000   0.0001000          1.000        0.0000E+00


*** Estimated Variance Components ***

Random term                   Component               S.e.

County                               0.000           0.042
Farm                                 0.000           0.278


*** Residual variance model ***

Term                Factor            Model(order)     Parameter        Estimate    S.e.

Dispersn                              Identity         Sigma2             1.000     FIXED



*** Estimated Variance matrix for Variance Components ***

   County       1        0.00173
     Farm       2       -0.00160        0.07753
 Dispersn       3        0.00000        0.00000            0.00000

                                 1               2               3



*** Table of effects for Constant ***

               -2.345    Standard error:      0.2540



*** Table of effects for SamGrF ***

 SamGrF             1        2            3        4
               0.0000   0.7439       0.6306   1.0895

Standard error of differences:            Average            0.2424
                                          Maximum            0.2623
                                          Minimum            0.2212

Average variance of differences:                             0.05914


*** Table of effects for Gra_Slur ***

Gra_Slur       0.0       1.0    999.0
            0.0000    1.2053   0.0917

Standard error of differences:      Average           0.2747
                                    Maximum           0.3156
                                    Minimum           0.1946

Average variance of differences:                    0.07864


*** Table of effects for Gra_Manu ***

   Gra_Manu              0.0         1.0           999.0
                      0.0000     -1.1552          0.0000

Standard error of differences:          0.3589



*** Table of effects for BeefonDairy ***

BeefonDairy   0.0000   1.0000
           0.0000   1.9648

Standard error of differences:          0.6380



*** Table of effects for Pigs ***

   Pigs          1         2
            0.0000    0.8917

Standard error of differences:          0.3454



*** Table of effects for FCattle ***

FCattle          1         2        3         4
            0.0000    0.3818   0.3515    0.9802

Standard error of differences:      Average           0.2710
                                    Maximum           0.3254
                                    Minimum           0.2062

Average variance of differences:                      0.07578


*** Table of effects for Max_Age ***

           -0.03085    Standard error:     0.015202




**** G5W0005 **** Warning (Code VC 19). Statement 268 in Procedure GLMM

Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[]
Table/sed matrix not available for mean effects of covariates

Table of mean effects cannot be saved for term Max_Age
as it is a variate/covariate



*** Tables of means ***


* Using covariate mean values


*** Table of predicted means for SamGrF ***

     SamGrF             1            2               3        4
                  -0.4409       0.3030          0.1897   0.6486


*** Table of predicted means for Gra_Slur ***

   Gra_Slur           0.0          1.0           999.0
                  -0.2572       0.9481         -0.1655


*** Table of predicted means for Gra_Manu ***

   Gra_Manu           0.0           1.0          999.0
                   0.5602       -0.5950         0.5602


*** Table of predicted means for BeefonDairy ***

BeefonDairy        0.0000       1.0000
                  -0.8073       1.1575


*** Table of predicted means for Pigs ***

       Pigs             1            2
                  -0.2707       0.6210


*** Table of predicted means for FCattle ***

    FCattle             1            2               3        4
                  -0.2533       0.1286          0.0982   0.7270


*** Back-transformed Means (on the original scale) ***


* Using covariate mean values



      SamGrF
           1      0.3915
           2      0.5752
           3      0.5473
           4      0.6567


    Gra_Slur
         0.0      0.4360
         1.0      0.7207
       999.0      0.4587


    Gra_Manu
         0.0      0.6365
         1.0      0.3555
       999.0      0.6365


 BeefonDairy
      0.0000      0.3085
      1.0000      0.7609


        Pigs
           1      0.4327
           2      0.6504


     FCattle
           1      0.4370
           2      0.5321
           3      0.5245
           4      0.6741

Note: means are probabilities not expected values.

Neither variance component is significantly different from zero; both are estimated
as 0.000. It would seem sensible, however, to attempt to fit the model with only the
lowest stratum of variability.
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects;\
  DISTRIBUTION=binomial; LINK=logit; DISPERSION=1;\
  FIXED=SamGrF + Gra_Slur + Gra_Manu + BeefonDairy + Pigs + FCattle + Max_Age;\
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20;\
  FMETHOD=fixed; CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin

***** Generalised Linear Mixed Model Analysis *****

                 Method:      Marginal model, cf Breslow & Clayton (1993) JASA
       Response variate:      VFarmPos
           Distribution:      BINOMIAL
          Link function:      LOGIT

           Random model:      Farm
            Fixed model:      Constant + (((((SamGrF + Gra_Slur) + Gra_Manu) +
                              BeefonDairy) + Pigs) + FCattle) + Max_Age

* Dispersion parameter fixed at value 1.000


******** Warning from GLMM:
         missing values generated in weights/working variate.


*** Monitoring information ***

Iteration           Gammas Dispersion       Max change
        1          0.09728      1.000       3.5426E+00

******** Warning from GLMM:
         missing values generated in weights/working variate.

           2   0.0001000        1.000       9.7176E-02

******** Warning from GLMM:
         missing values generated in weights/working variate.

           3   0.0001000        1.000       0.0000E+00


*** Estimated Variance Components ***

Random term                     Component          S.e.

Farm                               0.000          0.276


*** Residual variance model ***

Term                 Factor         Model(order)    Parameter     Estimate    S.e.

Dispersn                            Identity        Sigma2           1.000    FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm      1        0.07605
 Dispersn      2        0.00000         0.00000

                               1               2



*** Table of effects for Constant ***

            -2.345     Standard error:       0.2540



*** Table of effects for SamGrF ***

 SamGrF          1         2            3         4
            0.0000    0.7439       0.6307    1.0896

Standard error of differences:          Average           0.2424
                                        Maximum           0.2623
                                        Minimum           0.2212

Average variance of differences:                         0.05914


*** Table of effects for Gra_Slur ***

Gra_Slur       0.0       1.0        999.0
            0.0000    1.2053       0.0917

Standard error of differences:          Average           0.2746
                                        Maximum           0.3156
                                        Minimum           0.1946

Average variance of differences:                         0.07863


*** Table of effects for Gra_Manu ***

   Gra_Manu              0.0             1.0            999.0
                      0.0000         -1.1552           0.0000

Standard error of differences:              0.3589



*** Table of effects for BeefonDairy ***

BeefonDairy   0.0000   1.0000
           0.0000   1.9648

Standard error of differences:              0.6380



*** Table of effects for Pigs ***

   Pigs          1         2
            0.0000    0.8918

Standard error of differences:              0.3454



*** Table of effects for FCattle ***

FCattle          1         2            3          4
            0.0000    0.3818       0.3515     0.9802

Standard error of differences:          Average           0.2710
                                        Maximum           0.3254
                                        Minimum           0.2062

Average variance of differences:                         0.07578


*** Table of effects for Max_Age ***

           -0.03085    Standard error:         0.015201


**** G5W0006 **** Warning (Code VC 19). Statement 268 in Procedure GLMM

Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[]
Table/sed matrix not available for mean effects of covariates

Table of mean effects cannot be saved for term Max_Age
as it is a variate/covariate



*** Tables of means ***


* Using covariate mean values




*** Table of predicted means for SamGrF ***

     SamGrF             1            2               3         4
                  -0.4408       0.3031          0.1898    0.6487


*** Table of predicted means for Gra_Slur ***

   Gra_Slur           0.0          1.0           999.0
                  -0.2571       0.9481         -0.1654


*** Table of predicted means for Gra_Manu ***

   Gra_Manu           0.0           1.0          999.0
                   0.5603       -0.5949         0.5603


*** Table of predicted means for BeefonDairy ***

BeefonDairy        0.0000       1.0000
                  -0.8072       1.1576


*** Table of predicted means for Pigs ***

       Pigs             1            2
                  -0.2707       0.6211


*** Table of predicted means for FCattle ***

    FCattle             1            2               3         4
                  -0.2532       0.1287          0.0983    0.7270


*** Back-transformed Means (on the original scale) ***


* Using covariate mean values



      SamGrF
           1      0.3915
           2      0.5752
           3      0.5473
           4      0.6567


    Gra_Slur
         0.0      0.4361
         1.0      0.7207
       999.0      0.4587


    Gra_Manu
         0.0      0.6365
         1.0      0.3555
       999.0      0.6365


 BeefonDairy
      0.0000      0.3085
      1.0000      0.7609


        Pigs
           1      0.4327
           2      0.6505


     FCattle
           1      0.4370
           2      0.5321
           3      0.5246
           4      0.6742

Note: means are probabilities not expected values.

Given the complete lack of significance of the Farm effect, it was thought worthwhile
to investigate the equivalent model incorporating County as the sole random effect.
GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects;\
  DISTRIBUTION=binomial; LINK=logit; DISPERSION=1;\
  FIXED=SamGrF + Gra_Slur + Gra_Manu + BeefonDairy + Pigs + FCattle + Max_Age;\
  RANDOM=County; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20;\
  FMETHOD=fixed; CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin

* Message: Negative variance components present:

* Tables of effects/means will be produced for random model terms but should be
  used with caution



***** Generalised Linear Mixed Model Analysis *****

              Method:    Marginal model, cf Breslow & Clayton (1993) JASA
    Response variate:    VFarmPos
        Distribution:    BINOMIAL
       Link function:    LOGIT

         Random model:   County
          Fixed model:   Constant + (((((SamGrF + Gra_Slur) + Gra_Manu) +
                         BeefonDairy) + Pigs) + FCattle) + Max_Age

* Dispersion parameter fixed at value 1.000


******** Warning from GLMM:
         missing values generated in weights/working variate.


*** Monitoring information ***

Iteration      Gammas Dispersion      Max change
        1   0.0001000      1.000      1.1357E-02

******** Warning from GLMM:
         missing values generated in weights/working variate.

        2   0.0001000      1.000      0.0000E+00


*** Estimated Variance Components ***

Random term                    Component             S.e.

County                             0.000           0.036


*** Residual variance model ***

Term              Factor            Model(order)      Parameter       Estimate    S.e.

Dispersn                            Identity          Sigma2            1.000     FIXED



*** Estimated Variance matrix for Variance Components ***

   County     1      0.0012650
 Dispersn     2      0.0000000      0.0000000

                               1               2



*** Table of effects for Constant ***

            -2.235     Standard error:       0.2221



*** Table of effects for SamGrF ***

 SamGrF          1         2            3         4
            0.0000    0.6742       0.5690    1.0116

Standard error of differences:          Average             0.2221
                                        Maximum             0.2363
                                        Minimum             0.2097

Average variance of differences:                            0.04944


*** Table of effects for Gra_Slur ***

Gra_Slur       0.0       1.0        999.0
            0.0000    1.1463       0.0813

Standard error of differences:          Average             0.2590
                                        Maximum             0.2994
                                        Minimum             0.1803

Average variance of differences:                            0.07020


*** Table of effects for Gra_Manu ***

   Gra_Manu              0.0             1.0           999.0
                      0.0000         -1.0643          0.0000

Standard error of differences:              0.3138



*** Table of effects for BeefonDairy ***

BeefonDairy   0.0000   1.0000
           0.0000   1.9072

Standard error of differences:              0.6258



*** Table of effects for Pigs ***

   Pigs         1          2
           0.0000     0.8653

Standard error of differences:          0.3371



*** Table of effects for FCattle ***

FCattle         1          2        3          4
           0.0000     0.3586   0.3300     0.9417

Standard error of differences:       Average           0.2587
                                     Maximum           0.3151
                                     Minimum           0.1918

Average variance of differences:                     0.06943


*** Table of effects for Max_Age ***

          -0.02929     Standard error:      0.014025




******** Warning (Code VC 19). Statement 268 in Procedure GLMM

Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[]
Table/sed matrix not available for mean effects of covariates

Table of mean effects cannot be saved for term Max_Age
as it is a variate/covariate



*** Tables of means ***


* Using covariate mean values




*** Table of predicted means for SamGrF ***

     SamGrF                1            2               3            4
                     -0.3868       0.2874          0.1822       0.6248


*** Table of predicted means for Gra_Slur ***

   Gra_Slur              0.0          1.0          999.0
                     -0.2322       0.9140        -0.1510


*** Table of predicted means for Gra_Manu ***

   Gra_Manu             0.0          1.0            999.0
                     0.5317      -0.5326           0.5317


*** Table of predicted means for BeefonDairy ***

BeefonDairy           0.0000     1.0000
                     -0.7767     1.1305


*** Table of predicted means for Pigs ***

       Pigs                1            2
                     -0.2557       0.6096


*** Table of predicted means for FCattle ***

       FCattle            1             2             3             4
                    -0.2307        0.1280        0.0993        0.7111


*** Back-transformed Means (on the original scale) ***


* Using covariate mean values



        SamGrF
             1      0.4045
             2      0.5714
             3      0.5454
             4      0.6513


       Gra_Slur
            0.0     0.4422
            1.0     0.7138
          999.0     0.4623


       Gra_Manu
            0.0     0.6299
            1.0     0.3699
          999.0     0.6299


 BeefonDairy
      0.0000        0.3150
      1.0000        0.7559


          Pigs
             1      0.4364
             2      0.6478


       FCattle
             1      0.4426
             2      0.5320
             3      0.5248
             4      0.6706

Note: means are probabilities not expected values.

Hence there is no evidence that any of the random effects is important. However, it
would seem sensible to use a REML-type algorithm to fit the data, given the strongly
unbalanced nature of the dataset. Hence, we will fit the model with Farm as the sole
random effect. Refitting the model (output not listed) and calculating Wald statistics
for the fixed effects gives the following results:
5582    VDISPLAY [PRINT=Wald]

5582............................................................................


*** Wald tests for fixed effects ***


  Fixed term                  Wald statistic         d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

  SamGrF                           26.59               3        8.86     <0.001
  Gra_Slur                          9.38               2        4.69      0.009
  Gra_Manu                          9.17               1        9.17      0.002
  BeefonDairy                       8.10               1        8.10      0.004
  Pigs                              5.21               1        5.21      0.022
  FCattle                           7.79               3        2.60      0.051
  Max_Age                           4.12               1        4.12      0.042




* Dropping individual terms from full fixed model

   SamGrF                                    17.56             3      5.85         <0.001
   Gra_Slur                                  15.20             2      7.60         <0.001
   Gra_Manu                                  10.36             1     10.36          0.001
   BeefonDairy                                9.48             1      9.48          0.002
   Pigs                                       6.67             1      6.67          0.010
   FCattle                                   10.36             3      3.45          0.016
   Max_Age                                    4.12             1      4.12          0.042

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.

Even allowing for the liberal nature of the Wald tests, it is clear that there is strong
statistical evidence for the inclusion of each of the factors in the final multi-factor
model.
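
The "Chi-sq prob" column in the Wald tables is the upper-tail probability of a chi-square distribution with the stated degrees of freedom. As a hedged illustration only, the sketch below evaluates that tail with the closed-form series for integer degrees of freedom, using the Python standard library; the function name chi2_upper_tail is an assumption of this example and is not part of the GenStat output. Because the Wald statistics are printed to two decimals, the results agree with the tabulated probabilities only to about the third decimal place.

```python
import math

def chi2_upper_tail(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variate with integer df."""
    if x < 0 or df < 1:
        raise ValueError("x must be non-negative and df a positive integer")
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        total = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{k=1}^{(df-1)/2} (x/2)^(k-1/2) / Gamma(k+1/2)
    total = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

# (term, Wald statistic, d.f.) from the "sequentially adding terms" table above.
sequential_wald = [
    ("Gra_Slur", 9.38, 2),
    ("Gra_Manu", 9.17, 1),
    ("FCattle", 7.79, 3),
    ("Max_Age", 4.12, 1),
]

for term, wald, df in sequential_wald:
    print(f"{term:<10} Wald/d.f. = {wald / df:5.2f}  p = {chi2_upper_tail(wald, df):.3f}")
```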

Each factor will be reviewed in turn, plotting the mean estimated farm prevalence for
different levels of each factor, along with the associated 95% confidence intervals.

Considering SamGrF, there is clear evidence that farms with fewer than 12 animals in
the sampling group have a lower probability of exhibiting shedding.

  Category                      Mean Farm Prevalence
    <12                                 0.39
   12-17                                0.58
   18-28                                0.55
    >28                                 0.66

Any trend in the data would be assumed to be monotonic, and hence it seems likely
that the (statistically non-significant) difference between categories 2 and 3 is simply
due to stochastic noise. It is not immediately clear how the prevalence in the highest
category relates to those in the intermediate categories.


[Figure: Mean Farm Prevalence (vertical axis, 0 to 1) plotted by SamGrF category (<12, 12-17, 18-28, >28).]

Contrasting the mean in the first category with the means in the 2 intermediate
categories, we find that the mean difference (on the logit scale) equals 0.69, the
standard error is 0.23 and hence the t-statistic equals 2.93, with an associated p-value
of 0.003. Hence, the probability of detecting shedding is lower in groups containing
fewer than 12 animals than in groups containing 12-28 animals. Contrasting the mean
in the final category with the means in the 2 intermediate categories, we find that the
mean difference on the logit scale equals 0.40, the standard error is 0.19 and hence the
t-statistic equals 2.12, with an associated p-value of 0.03. Hence, the probability of
detecting shedding is lower in groups containing 12-28 animals than in groups
containing more animals.
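
The contrasts above follow the usual pattern of dividing a difference on the logit scale by its standard error and referring the result to a reference distribution. The source does not state which reference distribution was used; the sketch below assumes a two-sided standard normal, which is an inference from the fact that it reproduces the quoted p-values. The function name and the list layout are assumptions of this example. Note that the ratio of the rounded difference and standard error differs slightly from the reported statistic, because the report rounds both inputs to two decimals.

```python
import math

def two_sided_normal_p(statistic: float) -> float:
    """Two-sided tail probability of a standard normal at the given statistic."""
    return math.erfc(abs(statistic) / math.sqrt(2.0))

# (contrast, difference on logit scale, standard error, reported t-statistic)
contrasts = [
    ("<12 vs 12-28", 0.69, 0.23, 2.93),
    (">28 vs 12-28", 0.40, 0.19, 2.12),
]

results = []
for label, diff, se, reported_t in contrasts:
    ratio_from_rounded = diff / se             # approximate, inputs are rounded
    p_value = two_sided_normal_p(reported_t)   # uses the statistic as reported
    results.append((label, ratio_from_rounded, p_value))
    print(f"{label}: diff/se = {ratio_from_rounded:.2f}, "
          f"reported t = {reported_t:.2f}, p = {p_value:.3f}")
```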

It might be thought that this is a truism: if, on all farms, each animal has an
independent chance of shedding, then the larger the number of samples tested, the
more likely it is that a positive sample will be detected. In practice, we might
suspect that the independence assumption is extremely unlikely to be true, but we
need to assess the results under such a hypothesis. The first requirement is to estimate
the independent probability of animal infection. For each category, we tabulate the
median number of samples collected and, based on the estimated farm prevalence for
that category, derive an estimate of the individual probability.

                 Mean             Median         Individual
 Category      Prevalence         Samples        Probability
       <12              0.39                 8           0.060
     12-17              0.58                14           0.059
     18-28              0.55                18           0.043
       >28              0.66                22           0.047

The larger the number of samples collected from a group, the weaker the effect of
variability in the sampling distribution on the individual probabilities. However, the
highest category is unbounded, which will increase the variability again. On this basis, a
value of 0.043, derived from the 18-28 category, is used as the estimate of the
individual probability. Applying this probability to the median number of samples in
each category gives the modelled prevalences tabulated below.

               Estimated
               Prevalence       Modelled
 Category      from Data        Prevalence
      <12          0.39             0.30
    12-17          0.58             0.46
    18-28          0.55             0.55
      >28          0.66             0.62
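
The source does not print the formula behind the two tables above. Under the stated independence hypothesis, a natural reading is that a group of n independently sampled animals yields at least one positive with probability 1 - (1 - p)^n, so that p = 1 - (1 - P)^(1/n) for a group prevalence P. The sketch below works on that assumption; the function names are hypothetical, and the inputs are the rounded values from the tables. With these rounded inputs the individual probabilities agree with the tabulated ones to within about 0.001, and the modelled prevalences match to two decimals.

```python
def individual_probability(group_prevalence: float, n_samples: int) -> float:
    """Per-animal probability p such that 1 - (1 - p)**n equals the group prevalence."""
    return 1.0 - (1.0 - group_prevalence) ** (1.0 / n_samples)

def modelled_prevalence(p_individual: float, n_samples: int) -> float:
    """Probability of at least one positive among n independent samples."""
    return 1.0 - (1.0 - p_individual) ** n_samples

# (category, estimated mean farm prevalence, median number of samples), from the tables above.
categories = [("<12", 0.39, 8), ("12-17", 0.58, 14), ("18-28", 0.55, 18), (">28", 0.66, 22)]

P_REFERENCE = 0.043  # value adopted in the text, taken from the 18-28 category

modelled = {}
for name, prevalence, n in categories:
    p_ind = individual_probability(prevalence, n)
    modelled[name] = modelled_prevalence(P_REFERENCE, n)
    print(f"{name:>6}: individual p = {p_ind:.3f}, modelled prevalence = {modelled[name]:.2f}")
```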

Given the sizeable numbers of farms in the study, the differences between the
estimated and modelled prevalences in the lowest two categories are appreciable.
Similar results were generated by calculating the individual probability of detection
for each farm, and then averaging these by category. On this basis, it seems unlikely
that the pattern of prevalences associated with SamGrF is purely explicable as
a mechanical association with the highly correlated term, number of samples
collected. Moreover, the within-herd prevalence estimated here is very much less
than that calculated from the within-herd prevalence data. This must cast
considerable doubt on the argument that the observed effect is an artefact of the
sampling scheme, although this possibility should still be taken into account when
discussing this variable. In addition, the inclusion of FCattle in the model, even in the
presence of SamGrF, indicates that there are genuine ‘size of operation’ effects
present in the epidemiology of infection.

In view of this, we will next consider the factor FCattle. The pattern of prevalence
can be seen in the following diagram:

[Figure: Mean Farm Prevalence (vertical axis, 0 to 1) plotted by FCattle category (1-49, 50-99, 100-199, 200+).]


The mean farm prevalences for each category of FCattle are as follows:

  Category                      Mean Farm Prevalence
    1-49                                0.44
   50-99                                0.53
  100-199                               0.52
   200+                                 0.67

There is some indication of an upwards trend in the data with respect to higher
numbers of finishing cattle, especially when comparing the lowest category, the
middle two categories and the highest category.

Comparing the lowest category (<50 animals) with the two intermediate categories
(50-99 and 100-199), the mean difference in prevalences (on the logit scale) is 0.37,
with a standard error of 0.19, giving rise to a t-statistic of 1.98 and an associated p-
value of 0.048. Comparing the intermediate categories with the highest category
(200+ animals), the mean difference in prevalences (on the logit scale) is 0.61, with a
standard error of 0.30, giving rise to a t-statistic of 2.07 and an associated p-value of
0.039. Hence, there is evidence of a trend of increased risk of shedding being
identified, associated with higher numbers of finishing cattle on the farm. In the
context of SamGrF also being fitted to the model, this result is almost certainly a
genuine effect of enterprise size. It might be associated with some threshold results
from epidemic modelling theory, or with some aspects of animal management on
larger enterprises.

Next, we consider the effect of spreading slurry on pasture. It will be remembered
that this question was, in the main, asked only of farms with animals at pasture. Hence,
this analysis includes a ‘Housed’ category, to reflect the prevalences seen,
on average, on farms on which the question was not asked. The mean prevalences for
the different categories are as follows:

                        Mean Farm
      Category          Prevalence
 Unhoused: No Slurry       0.44
Unhoused: Slurry Spread    0.72
       Housed              0.46

It is apparent that the mean prevalences seen in Housed animals and in Pastured
animals from farms which do not spread slurry are virtually identical. However, the
mean prevalence on farms which do spread slurry is appreciably higher. Comparing
farms with animals at pasture which spread slurry with those which do not, we find
that the mean difference in prevalences (on the logit scale) equals 1.21 and the
standard error equals 0.32, giving rise to a t-statistic of 3.82 and an
associated p-value less than 0.001. The spreading of slurry on pasture is a significant
risk factor on farms with animals at pasture.
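
A difference on the logit scale can be read as a log odds ratio. As an illustration only, the sketch below converts the slurry contrast (difference 1.21, standard error 0.32) into an odds ratio with an approximate Wald-type 95% interval. Neither the odds ratio nor the interval is reported in the source; they are derived here from the rounded figures in the text, and the function name is an assumption of this example.

```python
import math

def odds_ratio_interval(log_odds_diff: float, se: float, z: float = 1.96):
    """Return (odds ratio, lower limit, upper limit) from a logit-scale difference."""
    return (
        math.exp(log_odds_diff),
        math.exp(log_odds_diff - z * se),
        math.exp(log_odds_diff + z * se),
    )

# Slurry spread versus no slurry, on farms with animals at pasture (values from the text).
slurry = odds_ratio_interval(1.21, 0.32)
print(f"odds ratio = {slurry[0]:.2f}, approx. 95% interval = ({slurry[1]:.2f}, {slurry[2]:.2f})")
```

An interval that excludes 1 is consistent with the significant contrast reported in the text.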


[Figure: Mean Farm Prevalence (vertical axis, 0 to 1) plotted by category (Unhoused: No Slurry, Unhoused: Slurry Spread, Housed).]



Next, we consider the effect of spreading manure on pasture. Again, this question
was, in the main, asked only of farms with animals at pasture. Hence, this analysis
again includes a ‘Housed’ category, to reflect the prevalences seen, on
average, on farms on which the question was not asked.




[Figure: Mean Farm Prevalence (vertical axis, 0 to 1) plotted by category (Unhoused: No Manure, Unhoused: Manure Spread, Housed).]



The mean farm prevalences for the different categories of farm are as follows:

                         Mean Farm
        Category         Prevalence
  Unhoused: No Manure       0.64
 Unhoused: Manure Spread    0.36
         Housed             0.64

It is apparent that the mean prevalences seen in Housed animals and in Pastured
animals from farms which do not spread manure are virtually identical. However, the
mean prevalence on farms which do spread manure is appreciably lower. Comparing
farms with animals at pasture which spread manure with those which do not, we find
that the mean difference in prevalences (on the logit scale) equals 1.16 and the
standard error equals 0.36, giving rise to a t-statistic of 3.22 and an
associated p-value of 0.001. The spreading of manure on pasture is a significant
protective factor on farms with animals at pasture. This result may appear somewhat
counterintuitive: however, it may be related to the manure management regime in
place on a farm which wishes to spread this material on pasture. If the regimen which
is put in place to achieve this reduces contact of animals with faeces in the short term
during time periods when the animals are housed, this may have a negative effect on
the ability of the infection to maintain itself on the farm, and hence it gives rise to a
reduction in farm prevalence even later, when the animals (ironically) are at pasture,
and hence in contact with the manure. The results seen earlier in the within-herd
prevalence analysis would suggest that the contact of animals with infection while
housed is more important in maintaining high prevalence levels than any contact
while at pasture. It is unfortunate that the design of the study does not allow any
investigation of whether similar manure on pasture effects are present on farms with
(currently) housed animals.




Farms with beef animals in a dairy herd were identified as high risk in the earlier
analyses. Considering the BeefonDairy factor, it is immediately clear that the
prevalence is much higher on this class of farm.

                                  Mean Farm
            Category              Prevalence
Not a Dairy Farm with Beef Cattle    0.31
  Dairy Farm with Beef Cattle        0.76

The means and 95% confidence intervals are given in the following plot:


[Figure: Mean Farm Prevalence with 95% confidence intervals (vertical axis, 0 to 1) plotted by category (Not a Dairy Farm with Beef Cattle, Dairy Farm with Beef Cattle).]



Carrying out a t-test, we find that the mean difference (on the logit scale) is 1.96, the
standard error is 0.64, the t-statistic equals 3.08, and the associated p-value is 0.002.
Hence, the prevalence is higher in this class of farm, and the difference is highly
statistically significant.
It is of some concern that this particular group was only identified through a detailed
examination of the data, but the high prevalence seen in this group is extremely
striking.

The final factor which has been examined is Pigs. The mean farm prevalence for each
category is as follows:

                                                Mean Farm
        Category                                Prevalence
     Pigs not present                              0.43
       Pigs present                                0.65

The picture becomes clearer if the means are plotted with the associated 95%
confidence intervals:




[Figure: Mean Farm Prevalence with 95% confidence intervals (vertical axis, 0 to 1) plotted by category (Pigs not present, Pigs present).]



The data would suggest that farms with pigs present exhibit a higher prevalence than
those which do not. Carrying out a t-test, the mean difference (on the logit scale) is
found to be 0.89, the standard error is 0.35, the t-statistic equals 2.58, and the
associated p-value 0.01. Hence, the prevalence is statistically significantly higher in
this class of farm.

The only variate which has been included in the model is Max_Age. The effect of
this variate on the linear predictor is summarised by the associated coefficient, which
takes the estimated value of –0.03, with a standard error of 0.015. The associated p-
value equals 0.04. Hence, this result suggests that the higher the maximum age of
animal present in the sampling group, the less likely is the group to present a positive
sample. The nature of the effect is similar to that seen in the univariate analysis,
where the associated p-value was 0.30. However, the removal of noise through the
fitting of other explanatory factors has clearly allowed the multi-factor model to
identify the utility of this variate in explaining aspects of the data. A review of the
histogram of the variate would suggest that it is unlikely to be subject to issues of
leverage.
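
Because Max_Age enters the linear predictor on the logit scale, its coefficient acts multiplicatively on the odds of a group presenting a positive sample. The sketch below illustrates this using the rounded coefficient of -0.03 quoted above; the function name is an assumption of this example, and "unit" means whatever unit Max_Age is recorded in, which this section does not state. The multipliers shown are derived for illustration and are not reported in the source.

```python
import math

def odds_multiplier(coefficient: float, increase: float) -> float:
    """Factor by which the odds change when the variate rises by `increase` units."""
    return math.exp(coefficient * increase)

MAX_AGE_COEF = -0.03  # rounded estimate quoted in the text

multipliers = {units: odds_multiplier(MAX_AGE_COEF, units) for units in (1, 10)}
for units, factor in multipliers.items():
    print(f"+{units} unit(s) of Max_Age multiplies the odds by {factor:.3f}")
```

A multiplier below 1 corresponds to the decreasing prevalence with increasing maximum age described in the text.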

Having fitted all the likely explanatory variables in the multifactor model, we now
return to explore the effect that the inclusion of these factors may have on the fit of
the structural factors.

Fitting Division in addition to the above explanatory variables gives the following
output:
5567  GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
5568    LINK=logit; DISPERSION=1; FIXED=SamGrF + Gra_Slur + Gra_Manu + BeefonDairy + Pigs + FCattle + Max_Age + Division;\
5569    RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=fixed; CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin

***** Generalised Linear Mixed Model Analysis *****




                 Method:     Marginal model, cf Breslow & Clayton (1993) JASA
       Response variate:     VFarmPos
           Distribution:     BINOMIAL
          Link function:     LOGIT

          Random model: Farm
           Fixed model: Constant + ((((((SamGrF + Gra_Slur) + Gra_Manu) + BeefonDairy) + Pigs) + FCattle) + Max_Age) + Division

* Dispersion parameter fixed at value 1.000


******** Warning from GLMM:
         missing values generated in weights/working variate.


*** Monitoring information ***

Iteration         Gammas Dispersion            Max change
        1         0.1286      1.000            3.5100E+00

******** Warning from GLMM:
         missing values generated in weights/working variate.

           2 0.000001000             1.000      1.2864E-01

******** Warning from GLMM:
         missing values generated in weights/working variate.

           3 0.000001000             1.000      0.0000E+00


*** Estimated Variance Components ***

Random term                   Component                 S.e.

Farm                                  0.000            0.277


*** Residual variance model ***

Term                Factor              Model(order)      Parameter      Estimate    S.e.

Dispersn                               Identity           Sigma2           1.000    FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm     1       0.07674
 Dispersn     2       0.00000                0.00000

                                 1                2



*** Table of effects for Constant ***

            -2.144     Standard error:           0.3003



*** Table of effects for SamGrF ***

 SamGrF          1          2              3          4
            0.0000     0.7174         0.5415     1.0466

Standard error of differences:                Average          0.2447
                                              Maximum          0.2661
                                              Minimum          0.2227

Average variance of differences:                               0.06023


*** Table of effects for Gra_Slur ***

Gra_Slur       0.0         1.0        999.0
            0.0000      1.2801       0.0802

Standard error of differences:        Average           0.2790
                                      Maximum           0.3217
                                      Minimum           0.1955

Average variance of differences:                     0.08130


*** Table of effects for Gra_Manu ***

   Gra_Manu              0.0           1.0          999.0
                      0.0000       -1.1381         0.0000

Standard error of differences:           0.3610



*** Table of effects for BeefonDairy ***

BeefonDairy     0.0000   1.0000
              0.000    2.015

Standard error of differences:          0.6400



*** Table of effects for Pigs ***

   Pigs         1          2
           0.0000     0.8741

Standard error of differences:          0.3480



*** Table of effects for FCattle ***

FCattle         1          2        3          4
           0.0000     0.3680   0.3494     0.9796

Standard error of differences:        Average           0.2747
                                      Maximum           0.3277
                                      Minimum           0.2076

Average variance of differences:                     0.07788


*** Table of effects for Max_Age ***

          -0.03181     Standard error:       0.015407



*** Table of effects for Division ***

   Division          Central      Highland        Islands   North East   South East
                      0.0000       -0.4960        -0.2883       0.0093       0.0066


   Division     South West
                   -0.3872

Standard error of differences:        Average           0.3212
                                      Maximum           0.4244
                                      Minimum           0.2437

Average variance of differences:                        0.1062




**** G5W0003 **** Warning (Code VC 19). Statement 268 in Procedure GLMM

Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[]
Table/sed matrix not available for mean effects of covariates
Table of mean effects cannot be saved for term Max_Age
as it is a variate/covariate



*** Tables of means ***


* Using covariate mean values




*** Table of predicted means for SamGrF ***

     SamGrF             1              2             3            4
                  -0.3942         0.3232        0.1473       0.6524


*** Table of predicted means for Gra_Slur ***

   Gra_Slur           0.0            1.0         999.0
                  -0.2713         1.0088       -0.1911


*** Table of predicted means for Gra_Manu ***

   Gra_Manu           0.0            1.0         999.0
                   0.5615        -0.5766        0.5615


*** Table of predicted means for BeefonDairy ***

BeefonDairy   0.0000   1.0000
           -0.825    1.189


*** Table of predicted means for Pigs ***

       Pigs             1              2
                  -0.2549         0.6192


*** Table of predicted means for FCattle ***

    FCattle             1              2             3            4
                  -0.2421         0.1259        0.1073       0.7375


*** Table of predicted means for Division ***

   Division      Central        Highland      Islands    North East   South East
                  0.3748         -0.1213       0.0864        0.3841       0.3814


   Division    South West
                  -0.0124


*** Back-transformed Means (on the original scale) ***


* Using covariate mean values



      SamGrF
           1      0.4027
           2      0.5801
           3      0.5367
           4      0.6575


    Gra_Slur
         0.0      0.4326
         1.0      0.7328
       999.0      0.4524



       Gra_Manu
            0.0     0.6368
            1.0     0.3597
          999.0     0.6368


  BeefonDairy
       0.0000       0.3047
       1.0000       0.7666


          Pigs
             1      0.4366
             2      0.6500


       FCattle
             1      0.4398
             2      0.5314
             3      0.5268
             4      0.6764


     Division
      Central       0.5926
     Highland       0.4697
      Islands       0.5216
   North East       0.5949
   South East       0.5942
   South West       0.4969

Note: means are probabilities not expected values.


5570    VDISPLAY [PRINT=Wald]

5570............................................................................


*** Wald tests for fixed effects ***


   Fixed term                Wald statistic         d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

   SamGrF                         26.27               3        8.76     <0.001
   Gra_Slur                        9.23               2        4.61      0.010
   Gra_Manu                        9.05               1        9.05      0.003
   BeefonDairy                     8.03               1        8.03      0.005
   Pigs                            5.15               1        5.15      0.023
   FCattle                         7.56               3        2.52      0.056
   Max_Age                         4.05               1        4.05      0.044
   Division                        5.56               5        1.11      0.352

* Dropping individual terms from full fixed model

   SamGrF                         16.43               3        5.48     <0.001
   Gra_Slur                       16.61               2        8.31     <0.001
   Gra_Manu                        9.94               1        9.94      0.002
   BeefonDairy                     9.91               1        9.91      0.002
   Pigs                            6.31               1        6.31      0.012
   FCattle                         9.79               3        3.26      0.020
   Max_Age                         4.26               1        4.26      0.039
   Division                        5.56               5        1.11      0.352

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.

As in the univariate analysis, there is clearly no evidence of any variability which is
explained by Animal Health Division (p=0.35). For completeness, the plot of the
mean prevalences by animal health division, adjusted for the other explanatory factors,
is as follows:


[Figure: Mean Farm Prevalence (vertical axis, 0.00 to 1.00) plotted by Animal Health Division (Central, Highland, Islands, North East, South East, South West).]



Although Highland Division is still the lowest prevalence division, it is much less
extreme: clearly, much of the between-division variability has been explained by the
explanatory variables.

Considering Management class, fitting Manage_O gives rise to the following output
(summarised):
5583                      VDISPLAY [PRINT=Wald]

5583............................................................................


*** Wald tests for fixed effects ***


  Fixed term                                  Wald statistic           d.f.    Wald/d.f.    Chi-sq prob

* Sequentially adding terms to fixed model

  SamGrF                                              26.46              3         8.82       <0.001
  Gra_Slur                                             9.43              2         4.72        0.009
  Gra_Manu                                             9.17              1         9.17        0.002
  BeefonDairy                                          8.02              1         8.02        0.005
  Pigs                                                 5.16              1         5.16        0.023
  FCattle                                              7.79              3         2.60        0.051
  Max_Age                                              4.01              1         4.01        0.045
  Manage_O                                             1.49              3         0.50        0.685

* Dropping individual terms from full fixed model

  SamGrF                                              17.22              3         5.74       <0.001
  Gra_Slur                                            15.73              2         7.87       <0.001
  Gra_Manu                                            10.51              1        10.51        0.001
  BeefonDairy                                         10.32              1        10.32        0.001
  Pigs                                                 6.27              1         6.27        0.012
  FCattle                                             10.37              3         3.46        0.016
  Max_Age                                              2.63              1         2.63        0.105
  Manage_O                                             1.49              3         0.50        0.685




* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.


As seen in the earlier univariate analysis, there is clearly no evidence of any
systematic effect due to Management Class.

Given the evidence for trend in the data with respect to Sampling Year, and our
continued interest in Sampling Month, the first model to investigate temporal trend
will fit a separate effect for each of the 27 months of the study:
5661  GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
5662    LINK=logit; DISPERSION=1; FIXED=SamGrF + Gra_Slur + Gra_Manu + BeefonDairy + Pigs + FCattle + Max_Age + Month;\
5663    RANDOM=Farm; CONSTANT=estimate; FACT=9; PTERMS=Month; PSE=*; MAXCYCLE=20; FMETHOD=all;\
5664    CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin; MEANS=Means; VARMEANS=Vars

***** Generalised Linear Mixed Model Analysis *****

              Method:   cf Schall (1991) Biometrika
    Response variate:   VFarmPos
        Distribution:   BINOMIAL
       Link function:   LOGIT

          Random model: Farm
           Fixed model: Constant + ((((((SamGrF + Gra_Slur) + Gra_Manu) + BeefonDairy) + Pigs) + FCattle) + Max_Age) + Month

* Dispersion parameter fixed at value 1.000


******** Warning from GLMM:
         missing values generated in weights/working variate.


*** Monitoring information ***

Iteration     Gammas Dispersion    Max change
        1     0.2879      1.000    3.2307E+00

******** Warning from GLMM:
         missing values generated in weights/working variate.

        2 0.000001000      1.000    2.8787E-01

******** Warning from GLMM:
         missing values generated in weights/working variate.

        3    0.06742      1.000    6.7416E-02

******** Warning from GLMM:
         missing values generated in weights/working variate.

        4     0.2668      1.000    1.9935E-01

******** Warning from GLMM:
         missing values generated in weights/working variate.

        5     0.2801      1.000    1.3329E-02

******** Warning from GLMM:
         missing values generated in weights/working variate.

        6     0.2854      1.000    5.3309E-03

******** Warning from GLMM:
         missing values generated in weights/working variate.

        7     0.2862      1.000    7.9267E-04

******** Warning from GLMM:
         missing values generated in weights/working variate.

           8       0.2863        1.000        6.5874E-05


*** Estimated Variance Components ***

Random term                      Component             S.e.

Farm                                 0.286            0.310


*** Residual variance model ***

Term                Factor            Model(order)        Parameter          Estimate         S.e.

Dispersn                              Identity            Sigma2                  1.000      FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm      1        0.09603
 Dispersn      2        0.00000          0.00000

                                 1                2



*** Table of effects for Month ***

  Month         3.00      4.00        5.00         6.00        7.00      8.00      9.00      10.00
               0.000     1.220       1.004        0.507       0.903     1.066     0.838      0.184


  Month        11.00     12.00        13.00       14.00       15.00      16.00    17.00       18.00
               1.334     0.567       -1.054       0.134       0.119     -0.460    0.852      -0.192


  Month        19.00     20.00       21.00        22.00       23.00      24.00     25.00      26.00
               0.916     0.367       1.638        0.304       0.049     -9.310    -0.455     -1.563


  Month        27.00     28.00       29.00
               0.272    -1.353       0.461

Standard error of differences:               Average            3.266
                                             Maximum            34.94
                                             Minimum           0.4839

Average variance of differences:                               90.87



*** Tables of means ***


*** Table of predicted means for Month ***

       Month              3.00             4.00             5.00           6.00             7.00
                       -0.1603           1.0599           0.8437         0.3464           0.7428


       Month             8.00              9.00            10.00          11.00            12.00
                       0.9057            0.6782           0.0238         1.1738           0.4065


       Month             13.00          14.00               15.00         16.00            17.00
                       -1.2148        -0.0262             -0.0415       -0.6201           0.6913


       Month             18.00            19.00            20.00          21.00            22.00
                       -0.3520           0.7553           0.2069         1.4780           0.1441


       Month             23.00          24.00               25.00         26.00            27.00
                       -0.1110        -9.4700             -0.6156       -1.7231           0.1118



       Month         28.00         29.00
                   -1.5135        0.3006


*** Back-transformed Means (on the original scale) ***



        Month
         3.00      0.4600
         4.00      0.7427
         5.00      0.6992
         6.00      0.5857
         7.00      0.6776
         8.00      0.7121
         9.00      0.6633
        10.00      0.5059
        11.00      0.7638
        12.00      0.6002
        13.00      0.2289
        14.00      0.4934
        15.00      0.4896
        16.00      0.3498
        17.00      0.6663
        18.00      0.4129
        19.00      0.6803
        20.00      0.5515
        21.00      0.8143
        22.00      0.5360
        23.00      0.4723
        24.00      0.0001
        25.00      0.3508
        26.00      0.1515
        27.00      0.5279
        28.00      0.1804
        29.00      0.5746

Note: means are probabilities not expected values.

5666   VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


  Fixed term                 Wald statistic         d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

  SamGrF                          19.74               3        6.58     <0.001
  Gra_Slur                         8.08               2        4.04      0.018
  Gra_Manu                         7.51               1        7.51      0.006
  BeefonDairy                      5.57               1        5.57      0.018
  Pigs                             3.52               1        3.52      0.060
  FCattle                          5.77               3        1.92      0.123
  Max_Age                          3.11               1        3.11      0.078
  Month                           45.97              26        1.77      0.009

* Dropping individual terms from full fixed model

  SamGrF                          17.97               3        5.99     <0.001
  Gra_Slur                        16.57               2        8.29     <0.001
  Gra_Manu                         9.35               1        9.35      0.002
  BeefonDairy                      8.72               1        8.72      0.003
  Pigs                             6.65               1        6.65      0.010
  FCattle                         11.47               3        3.82      0.009
  Max_Age                          7.83               1        7.83      0.005
  Month                           45.97              26        1.77      0.009

Clearly, the month in which farms were sampled has a highly significant effect (p=0.009)
on the probability of a farm being identified as positive, even after allowing for the
explanatory variables. The plot of mean prevalences by sampling month is as follows:


[Figure: Mean Farm Prevalence (vertical axis, 0 to 1) plotted against Month of sampling (horizontal axis).]



There is a clear visual downwards trend in prevalence as the survey progressed, along
with a seasonal effect which is slightly apparent in the 1998 data, is very apparent in
the 1999 data, and which seems likely to be present in the 2000 data. In addition,
there are peculiarities in the pattern of observed prevalences. In each of 1998 and
1999, there is evidence of an appreciable drop in prevalence in June, and in each of
1999 and 2000, there is evidence of an appreciable drop in prevalence in April. It is
possible to overemphasise such apparent correlations in time series data, but it is
reasonable to assume that the observed prevalence could change according to month,
in line with changes in herd management and diet.
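
The Month factor in the tables above runs from level 3 to level 29 and carries 26 degrees of freedom in the Wald test. The following is a minimal sketch of how such a running month index could be decoded into a year and a calendar month. It assumes that level 1 corresponds to January 1998, which is not stated in the output, and the function name decode_month is purely illustrative. Under that assumption it also counts how the 26 degrees of freedom would split between a three-level year factor, a twelve-level calendar-month factor, and whatever remains.

```python
from collections import Counter

# ASSUMPTION: Month level 1 = January 1998 (not stated in the Genstat output).
def decode_month(level, start_year=1998):
    """Map a running month index (1 = January of start_year) to (year, month)."""
    year = start_year + (level - 1) // 12
    month = (level - 1) % 12 + 1
    return year, month

levels = range(3, 30)                       # Month levels 3..29 seen in the tables
cells = [decode_month(m) for m in levels]   # one (year, month) cell per level
months_per_year = Counter(year for year, _ in cells)

df_month_factor = len(cells) - 1                        # full Month factor
df_year = len(months_per_year) - 1                      # year factor
df_calendar_month = len({m for _, m in cells}) - 1      # calendar-month factor
df_left = df_month_factor - df_year - df_calendar_month # left for year-by-month terms

print(dict(months_per_year))
print(df_month_factor, df_year, df_calendar_month, df_left)
```

If the assumption holds, each combination of year and calendar month is a single level of Month, so the full Month factor already contains everything that a year-by-month interaction could describe.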

Fitting Sampling Month at this level of detail does not help to define a picture of any
seasonal effects on the prevalence. Any model with which it is hoped to achieve this
objective must allow for the long-term drop in prevalence and the month-to-month
variability. The simplest appropriate model is felt to be one which fits both Sampling
Year and Month of Sample as fixed effects. It will not be possible to fit an interaction
term. Since the data were collected in random clusters by week within Animal Health
Division, it is theoretically possible that some of the drops and peaks might be
associated with the particular Divisions which were sampled during that month. This
is unlikely, given the lack of significance seen earlier for Animal Health Division as a
factor, but to test for this, the model is refitted also including Animal Health Division:

5677   VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


   Fixed term                 Wald statistic      d.f.  Wald/d.f.  Chi-sq prob

* Sequentially adding terms to fixed model

   SamGrF                            21.25             3    7.08      <0.001
   Gra_Slur                           7.85             2    3.93       0.020
   Gra_Manu                           7.70             1    7.70       0.006
   BeefonDairy                        6.45             1    6.45       0.011
   Pigs                               4.14             1    4.14       0.042
   FCattle                            5.99             3    2.00       0.112
   Max_Age                            3.15             1    3.15       0.076
   Division                           4.51             5    0.90       0.479
   Sam_Year                          14.19             2    7.09      <0.001
   Sam_Mon                           20.54            11    1.87       0.038

* Dropping individual terms from full fixed model

   SamGrF                            16.92             3    5.64      <0.001
   Gra_Slur                          16.91             2    8.46      <0.001
   Gra_Manu                           9.36             1    9.36       0.002
   BeefonDairy                       10.55             1   10.55       0.001
   Pigs                               6.98             1    6.98       0.008
   FCattle                           10.04             3    3.35       0.018
   Max_Age                            7.19             1    7.19       0.007
   Division                           4.28             5    0.86       0.510
   Sam_Year                           6.91             2    3.45       0.032
   Sam_Mon                           20.54            11    1.87       0.038

The summarised results show that Division is not statistically significant as an effect
(p=0.51 when dropped from the full model), while Sampling Month remains significant
(p=0.038). Hence, the model is refitted without this extraneous variable:
5684   GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
5685     LINK=logit; DISPERSION=1; FIXED=SamGrF + Gra_Slur + Gra_Manu + BeefonDairy + Pigs + FCattle + Max_Age + Sam_Year + Sam_Mon;\
5686     RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean]\
5687     VFarmPos; NBINOMIAL=N_Bin

***** Generalised Linear Mixed Model Analysis *****

                 Method:   cf Schall (1991) Biometrika
       Response variate:   VFarmPos
           Distribution:   BINOMIAL
          Link function:   LOGIT

          Random model: Farm
           Fixed model: Constant + (((((((SamGrF + Gra_Slur) + Gra_Manu) + BeefonDairy) + Pigs) + FCattle) + Max_Age) + Sam_Year) + Sam_Mon

* Dispersion parameter fixed at value 1.000


 ******** Warning from GLMM:
          missing values generated in weights/working variate.


*** Monitoring information ***

 Iteration       Gammas Dispersion      Max change
         1       0.1777      1.000      3.3648E+00

 ******** Warning from GLMM:
          missing values generated in weights/working variate.

          2 0.000001000       1.000      1.7774E-01

 ******** Warning from GLMM:
          missing values generated in weights/working variate.

          3 0.000001000      1.000      0.0000E+00


*** Estimated Variance Components ***

Random term                 Component          S.e.

Farm                           0.000         0.283




*** Residual variance model ***

Term              Factor           Model(order)       Parameter    Estimate   S.e.

Dispersn                           Identity           Sigma2         1.000    FIXED



*** Estimated Variance matrix for Variance Components ***

     Farm     1      0.07987
 Dispersn     2      0.00000         0.00000

                              1               2



*** Table of effects for Constant ***

            -3.180    Standard error:        0.5414



*** Table of effects for SamGrF ***

 SamGrF          1        2            3          4
            0.0000   0.8074       0.7204     1.1645

Standard error of differences:         Average           0.2488
                                       Maximum           0.2693
                                       Minimum           0.2278

Average variance of differences:                        0.06224


*** Table of effects for Gra_Slur ***

Gra_Slur       0.0      1.0        999.0
            0.0000   1.3087       0.6259

Standard error of differences:             Average        0.3242
                                           Maximum        0.3755
                                           Minimum        0.2704

Average variance of differences:                         0.1069


*** Table of effects for Gra_Manu ***

   Gra_Manu             0.0             1.0            999.0
                     0.0000         -1.1917           0.0000

Standard error of differences:             0.3676



*** Table of effects for BeefonDairy ***

BeefonDairy     0.0000   1.0000
              0.000    2.206

Standard error of differences:             0.6646



*** Table of effects for Pigs ***

   Pigs          1        2
            0.0000   1.0280

Standard error of differences:             0.3655



*** Table of effects for FCattle ***

FCattle         1          2        3         4
           0.0000     0.3878   0.3844    1.1158

Standard error of differences:       Average           0.2810
                                     Maximum           0.3371
                                     Minimum           0.2135

Average variance of differences:                       0.08154


*** Table of effects for Max_Age ***

          -0.04357     Standard error:      0.016086



*** Table of effects for Sam_Year ***

   Sam_Year            1998         1999             2000
                     0.0000      -0.4249          -0.7956

Standard error of differences:       Average           0.2577
                                     Maximum           0.3071
                                     Minimum           0.2076

Average variance of differences:                     0.06806


*** Table of effects for Sam_Mon ***

Sam_Mon       Jan        Feb      Mar       Apr        May          Jun        Jul      Aug
           0.0000     0.1722   0.8812    0.2479     1.2634       0.4584     1.1696   1.0222


Sam_Mon       Sep        Oct      Nov       Dec
           1.2757     0.5800   1.1474    0.2218

Standard error of differences:       Average           0.4615
                                     Maximum           0.5939
                                     Minimum           0.3495

Average variance of differences:                       0.2163




**** G5W0020 **** Warning (Code VC 19). Statement 268 in Procedure GLMM

Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[]
Table/sed matrix not available for mean effects of covariates

Table of mean effects cannot be saved for term Max_Age
as it is a variate/covariate



*** Tables of means ***


* Using covariate mean values




*** Table of predicted means for SamGrF ***

     SamGrF                1            2              3                4
                     -0.5474       0.2600         0.1730           0.6171


*** Table of predicted means for Gra_Slur ***

   Gra_Slur              0.0          1.0          999.0
                     -0.5192       0.7895         0.1067




*** Table of predicted means for Gra_Manu ***

   Gra_Manu           0.0           1.0          999.0
                   0.5229       -0.6688         0.5229


*** Table of predicted means for BeefonDairy ***

BeefonDairy   0.0000   1.0000
           -0.977    1.228


*** Table of predicted means for Pigs ***

       Pigs             1            2
                  -0.3883       0.6397


*** Table of predicted means for FCattle ***

    FCattle             1            2               3        4
                  -0.3463       0.0415          0.0381   0.7694


*** Table of predicted means for Sam_Year ***

   Sam_Year          1998         1999            2000
                   0.5325       0.1076         -0.2630


*** Table of predicted means for Sam_Mon ***

    Sam_Mon           Jan           Feb            Mar       Apr      May
                  -0.5776       -0.4054         0.3036   -0.3298   0.6857


    Sam_Mon           Jun          Jul             Aug      Sep       Oct
                  -0.1192       0.5920          0.4446   0.6980    0.0023


    Sam_Mon           Nov           Dec
                   0.5698       -0.3558


*** Back-transformed Means (on the original scale) ***


* Using covariate mean values



      SamGrF
           1      0.3665
           2      0.5646
           3      0.5431
           4      0.6496


    Gra_Slur
         0.0      0.3730
         1.0      0.6877
       999.0      0.5267


    Gra_Manu
         0.0      0.6278
         1.0      0.3388
       999.0      0.6278


 BeefonDairy
      0.0000      0.2735
      1.0000      0.7736


        Pigs
           1      0.4041
           2      0.6547


       FCattle
             1      0.4143
             2      0.5104
             3      0.5095
             4      0.6834


       Sam_Year
           1998     0.6301
           1999     0.5269
           2000     0.4346


       Sam_Mon
           Jan      0.3595
           Feb      0.4000
           Mar      0.5753
           Apr      0.4183
           May      0.6650
           Jun      0.4702
           Jul      0.6438
           Aug      0.6094
           Sep      0.6678
           Oct      0.5006
           Nov      0.6387
           Dec      0.4120

Note: means are probabilities not expected values.
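
The back-transformed means above are the predicted means on the logit scale passed through the inverse of the logit link. As a minimal sketch (the function name inv_logit is illustrative, and the three Sam_Year values are copied from the table of predicted means), the calculation can be reproduced as follows:

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Predicted means for Sam_Year on the logit scale, copied from the output above.
predicted_logit = {1998: 0.5325, 1999: 0.1076, 2000: -0.2630}

back_transformed = {year: inv_logit(eta) for year, eta in predicted_logit.items()}
for year, p in back_transformed.items():
    print(year, round(p, 4))
```

To the four decimal places printed, these agree with the Sam_Year entries in the back-transformed table (0.6301, 0.5269, 0.4346).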


5688    VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


  Fixed term                 Wald statistic         d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

  SamGrF                          23.99               3        8.00     <0.001
  Gra_Slur                         8.78               2        4.39      0.012
  Gra_Manu                         8.64               1        8.64      0.003
  BeefonDairy                      6.90               1        6.90      0.009
  Pigs                             4.40               1        4.40      0.036
  FCattle                          6.56               3        2.19      0.087
  Max_Age                          3.47               1        3.47      0.063
  Sam_Year                        16.00               2        8.00     <0.001
  Sam_Mon                         22.04              11        2.00      0.024

* Dropping individual terms from full fixed model

  SamGrF                          19.06                3       6.35     <0.001
  Gra_Slur                        18.22                2       9.11     <0.001
  Gra_Manu                        10.51                1      10.51      0.001
  BeefonDairy                     11.01                1      11.01     <0.001
  Pigs                             7.91                1       7.91      0.005
  FCattle                         11.86                3       3.95      0.008
  Max_Age                          7.33                1       7.33      0.007
  Sam_Year                         7.25                2       3.63      0.027
  Sam_Mon                         22.04               11       2.00      0.024

Both Month and Year of Sampling are found to have a statistically significant
influence on the probability of a farm being classed as positive for shedding. The
inclusion of these structural variables has a negligible effect on the significances
estimated for the explanatory factors.
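
The "Chi-sq prob" column in the Wald tables is the upper-tail probability of a chi-squared distribution with the stated degrees of freedom, evaluated at the Wald statistic. The sketch below is not Genstat's own routine. It is a small stand-alone implementation for integer degrees of freedom (the function name chi2_sf is illustrative), applied to a few rows of the "dropping individual terms" table.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: erfc(sqrt(x/2)) plus a finite series with half-integer gamma terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (i + 0.5)
    return total

# (Wald statistic, d.f.) copied from the "dropping individual terms" table above.
wald = {"Sam_Year": (7.25, 2), "Sam_Mon": (22.04, 11),
        "Max_Age": (7.33, 1), "FCattle": (11.86, 3)}
for term_name, (stat, df) in wald.items():
    print(term_name, round(stat / df, 2), round(chi2_sf(stat, df), 3))
```

The printed ratios correspond to the Wald/d.f. column, and the probabilities agree with the tabulated values to the three decimal places shown.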




Reviewing the effect of Sampling Year, the estimated mean prevalences for the three
years of the study, adjusted for Sampling Month effects and all the explanatory
factors, are:

                                 Year          Mean Farm Prevalence
                                 1998                  0.63
                                 1999                  0.53
                                 2000                  0.43

Plotting the mean prevalence with the associated 95% confidence intervals gives:


[Figure: Mean Farm Prevalence (vertical axis, 0.00 to 1.00) with 95% confidence intervals, plotted by Year (1998, 1999, 2000).]



The nature of the trend is clear. There is a year-on-year drop in prevalence, which is
statistically significant overall (p=0.03). The drop from 1998 to 1999 corresponds to a
mean change of -0.425 on the logit scale, with a standard error of 0.208. The associated
t-statistic equals 2.05, with a p-value of 0.04. The drop from 1999 to 2000 is not
statistically significant (change=-0.37, se=0.26, t=1.44, p=0.15). The nature of the
trend is identical to that seen in the analysis involving only year and month, but the
estimated effect is much more significant for 1998/1999, presumably since much of the
extraneous noise in the initial analysis has been explained by the explanatory
variables in the multi-factor model. The effect is less significant for 1999/2000,
presumably since much of the effect in 2000 has been explained by other explanatory
factors which were strongly unbalanced in the (abbreviated) sampling year 2000.
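
The year-to-year comparisons can be checked from the Sam_Year table of effects. The sketch below takes the effects and the reported test statistics from the text and output above. It assumes a standard normal reference distribution for the two-sided p-values, since the report does not state which reference distribution was used, and the helper name two_sided_p is illustrative. It also expresses each change as an odds ratio, which is simply the exponential of a difference on the logit scale.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a statistic referred to a standard normal (assumed)."""
    return math.erfc(abs(z) / math.sqrt(2.0))

# Sam_Year effects on the logit scale, copied from the table of effects.
effects = {1998: 0.0, 1999: -0.4249, 2000: -0.7956}

change_98_99 = effects[1999] - effects[1998]
change_99_00 = effects[2000] - effects[1999]

# Test statistics as reported in the text.
p_98_99 = two_sided_p(2.05)
p_99_00 = two_sided_p(1.44)

# A change on the logit scale is a log odds ratio.
odds_ratio_98_99 = math.exp(change_98_99)
odds_ratio_99_00 = math.exp(change_99_00)

print(round(change_98_99, 3), round(p_98_99, 2), round(odds_ratio_98_99, 2))
print(round(change_99_00, 3), round(p_99_00, 2), round(odds_ratio_99_00, 2))
```

Under these assumptions the p-values round to the 0.04 and 0.15 quoted above, and the odds of a farm being classed as positive are estimated to fall by roughly a third between 1998 and 1999.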

Reviewing the effect of Sampling Month, the estimated mean prevalences for each
month of the year, adjusted for Sampling Year effects and all the explanatory factors,
are:

                                 Mean Farm
 Month                           Prevalence
  Jan                               0.36
  Feb                               0.40
  Mar                               0.58
  Apr                               0.42
  May                               0.67
  Jun                               0.47
  Jul                               0.64
  Aug                               0.61
  Sep                               0.67
  Oct                               0.50
  Nov                               0.64
  Dec                               0.41

A clearer picture is provided by plotting the mean prevalence with the associated
95% confidence intervals, giving:


[Figure: Mean Farm Prevalence (vertical axis, 0.00 to 1.00) with 95% confidence intervals, plotted by Month of Sampling (Jan to Dec).]



There appears to be a clear seasonal cycle in prevalence, with higher values in late
Spring and Summer, and lower values in December to February. However, over and
above this, there is evidence of other monthly effects occurring against the cycle,
perhaps most obviously in June, and probably in March, April and November. Again,
the nature of the month-to-month effect is unchanged relative to the initial analysis
involving only month and year, but the estimated effects exhibit a greater
significance, presumably due to the greater explanatory value of the multi-factor
model.

It is tempting to consider that, previous evidence notwithstanding, the Sampling
Month effect might be associated with Housing status, as was the within-farm
prevalence on positive farms. To test this hypothesis, the model is refitted, including
Housed as a further explanatory factor. The (summarised) results are as follows:

5730   VDISPLAY [PRINT=Wald]


*** Wald tests for fixed effects ***


   Fixed term              Wald statistic         d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model

   SamGrF                        24.20              3        8.07      <0.001
   Gra_Slur                       8.46              2        4.23       0.015
   Gra_Manu                       8.77              1        8.77       0.003
   BeefonDairy                    6.90              1        6.90       0.009
   Pigs                           4.37              1        4.37       0.037
   FCattle                        6.40              3        2.13       0.094
   Max_Age                        3.45              1        3.45       0.063
   Sam_Year                      15.98              2        7.99      <0.001
   Housed                         0.55              1        0.55       0.460
   Sam_Mon                       21.94             11        1.99       0.025

* Dropping individual terms from full fixed model

   SamGrF                        19.25              3        6.42      <0.001
   Gra_Slur                      16.54              2        8.27      <0.001
   Gra_Manu                      10.65              1       10.65       0.001
   BeefonDairy                   11.02              1       11.02      <0.001
   Pigs                           7.87              1        7.87       0.005
   FCattle                       11.64              3        3.88       0.009
   Max_Age                        7.24              1        7.24       0.007
   Sam_Year                       7.32              2        3.66       0.026
   Housed                         0.53              1        0.53       0.467
   Sam_Mon                       21.94             11        1.99       0.025


Housed remains non-significant as an explanatory factor (p=0.47), and Sampling
Month and Sampling Year are unchanged in terms of overall significance levels.

Hence, there is clear evidence of a temporal structure in the data, both over the long
term (a significant decrease in the proportion of farms detected as positive over the
lifetime of the project), and over the short term (a significant month-to-month
variability, unexplained by the explanatory variables fitted in the multi-factor model).




Appendix 1: Variates and Factors Collected by the Farm Questionnaire.

Factor/Variable   Comments                                                                        Levels
Manage_O          Observed management type.                                                       Beef, Dairy, Other, Mixed
Division          Animal Health Division, with one division divided into Highlands and Islands.   Central, Highlands, Islands, NE, SE, SW
Sam_Month         Month in which samples were collected.                                          January-December
Sample            Type of sampling scheme.                                                        Faecal Pat, Rectal
Sam_Year          Year in which samples were collected.                                           1998, 1999, 2000

Sampler           Person carrying out sampling.                                                   H, F (codes)
N_F_Cattle        Number of finishing cattle on farm.                                             Variate
FCattle           Number of finishing cattle, categorised into groups.                            <50, 50-99, 100-199, 200+
N_Groups          Number of management groups of cattle on farm.                                  Variate
GroupsCat         Number of management groups, categorised into groups.                           1, 2-5, 6-9, 10+
N_Sam_Gr          Number of finishing cattle in sampling group.                                   Variate
Min_Age           Minimum age of animals in sampling group.                                       Variate
Max_Age           Maximum age of animals in sampling group.                                       Variate
Source            Farm policy for replacement cattle.                                             Buy In, Breeding Only, Both
NewSource         Restructuring of 'Source' into open and closed farms.                           Open, Closed
Breed             Breed of cattle in sampling group.                                              Beef (Suckler Beef), Dairy Beef, Dairy (Bull Beef), Combinations of these
Housed            Whether sampling group are housed or unhoused.                                  Housed, Unhoused
Housing           For housed animals only: type of housing.                                       Court/Straw Yard, Slats, Byre, Other
TDHouse           Number of months for which animals have been in current housed state.           Variate
Rec_Move          Whether or not the sampling group have been moved in the 4 weeks prior to sampling. Yes, No
SupFeed           For unhoused animals only: whether the sampling group is fed supplements.         Yes, No
RecDFeed          Whether or not the sampling group have had a change in diet in the 4 weeks prior to sampling. Yes, No
Forage            For housed animals only: whether the sampling group is fed forage.                Yes, No
Silage            For housed animals only: whether the sampling group is fed silage.                Yes, No
Concentrate       For housed animals only: whether the sampling group is fed concentrate.           Yes, No
Sil_Home          For housed animals fed silage only: whether the farm produces silage.             Yes, No
Sil_Manure        For housed animals fed farm-produced silage only: whether the farm spreads manure on the silage fields. Yes, No
Sil_Slurry        For housed animals fed farm-produced silage only: whether the farm spreads slurry on the silage fields. Yes, No
Sil_Sewage        For housed animals fed farm-produced silage only: whether the farm spreads sewage on the silage fields. Yes, No
Sil_Geece         For housed animals fed farm-produced silage only: whether geese have been observed on the silage fields. Yes, No
Sil_Gulls         For housed animals fed farm-produced silage only: whether gulls have been observed on the silage fields. Yes, No
Hay               Whether the farm produces hay.                                                    Yes, No
Hay_Manure        If the farm produces hay only: whether the farm spreads manure on the hay fields. Yes, No
Hay_Slurry        If the farm produces hay only: whether the farm spreads slurry on the hay fields. Yes, No
Hay_Sewage        If the farm produces hay only: whether the farm spreads sewage on the hay fields. Yes, No
Hay_Geese         If the farm produces hay only: whether geese have been observed on the hay fields. Yes, No
Hay_Gulls         If the farm produces hay only: whether gulls have been observed on the hay fields. Yes, No
Grass_Manure      Whether the farm spreads manure on pasture.                                       Yes, No
Grass_Slurry      Whether the farm spreads slurry on pasture.                                       Yes, No
Grass_Sewage      Whether the farm spreads sewage on pasture.                                       Yes, No
Grass_Geece       Whether geese have been observed on pasture.                                      Yes, No
Grass_Gulls       Whether gulls have been observed on pasture.                                      Yes, No
N_Cattle          Number of cattle on farm other than the finishing group.                          Variate
Cattle            Number of cattle on farm other than the finishing group, categorised into a       <100, 100-499, 500-899, 900+
                  factor.
N_Sheep           Number of sheep on farm.                                                          Variate
Sheep             Absence/presence of sheep on farm.                                                Yes, No
N_Goats           Number of goats on farm.                                                          Variate
Goats             Absence/presence of goats on farm.                                                Yes, No
N_Horses          Number of horses on farm.                                                         Variate
N_Pigs            Number of pigs on farm.                                                           Variate
Pigs              Absence/presence of pigs on farm.                                                 Yes, No
N_Chickens    Number of chickens on farm.                                                    Variate
Chickens      Absence/presence of chickens on farm.                                          Yes, No
N_Deer        Number of deer on farm.                                                        Variate
Deer          Absence/presence of deer on farm.                                              Yes, No
Mains         Whether sampling group is watered with a mains supply.                         Yes, No
Private       Whether sampling group is watered with a private supply.                       Yes, No
Natural       Whether sampling group is watered with a natural supply.                       Yes, No
WaterCon      Whether water has been contaminated within the 12 months prior to sampling.    Yes, No
WaterCT       Possible sources of contamination.                                             Animals Upstream, Septic Tank, Midden,
                                                                                             Combinations of these
Want2Know     Whether farmer wishes to know results of sampling.                             Yes, No
Visit2        Whether farmer is willing to have a further set of samples collected.          Yes, No
LabOperator   Lab operator responsible for assaying faeces samples.                          S, D, H (codes)
BeefonDairy   Whether farm is classed as a dairy farm with suckler beef cattle.              Yes, No
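
The table above lists several factors that summarise a count variate: Cattle bands N_Cattle into four levels, and Sheep, Goats, Pigs, Chickens and Deer record absence/presence alongside the corresponding counts. The following Python sketch illustrates one way such a derivation might be coded. It is not taken from the report. The function names cattle_band and presence are hypothetical, and two rules are assumed rather than documented: that the band boundaries are inclusive exactly as the level labels suggest, and that "presence" means a count strictly greater than zero. The report does not say how missing counts were handled, so the sketch does not handle them.

```python
def cattle_band(n_cattle: int) -> str:
    """Map an N_Cattle count to one of the four levels listed for Cattle.

    Assumption: boundaries follow the level labels literally
    (<100, 100-499, 500-899, 900+).
    """
    if n_cattle < 0:
        raise ValueError("count must be non-negative")
    if n_cattle < 100:
        return "<100"
    if n_cattle < 500:
        return "100-499"
    if n_cattle < 900:
        return "500-899"
    return "900+"


def presence(count: int) -> str:
    """Map a count (e.g. N_Sheep) to a Yes/No presence factor.

    Assumption: any strictly positive count is recorded as "Yes".
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return "Yes" if count > 0 else "No"


# Hypothetical farm record, for illustration only.
record = {"N_Cattle": 250, "N_Sheep": 0}
print(cattle_band(record["N_Cattle"]), presence(record["N_Sheep"]))
```

Keeping both the variate and the derived factor, as the table does, would allow an analysis to use either the raw count or the coarser grouping.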



