Statistical Analysis of the SEERAD/SAC E. coli O157 Prevalence Study, 1998-2000

SEERAD FF Project BSS/028/99

Iain J. McKendrick
Biomathematics & Statistics Scotland

Executive Summary

Properties of Data

Samples from 952 farms are included in the analysis, with a total of 14,856 faecal samples analysed. Of these faecal samples, 1231 were positive for verocytotoxic E. coli O157. These positive samples were sourced from 207 farms. Hence, the raw figures indicate that 21.7% (19.2%, 24.5%) of groups sampled contained shedding animals, and that the animal level prevalence is 8.3% (7.3%, 9.4%). However, these figures do not allow for the effects of sampling error (which in a situation with many groups with a small number of shedders would tend to underestimate the number of groups containing shedders) and of the mixed nature of the sample (farms with no infection will, by definition, have zero prevalence, so a more useful statistic is the estimate of the animal prevalence on those farms which are positive).

The data are analysed using a beta-binomial model, from which it is estimated that the proportion of shedding animals is 7.9% with a 95% confidence interval of (6.5%, 9.6%). This is slightly lower than the raw estimate given earlier. This adjustment arises from the more appropriate modelling of the asymmetric prevalence distribution.

It is estimated that 22.8% of finishing groups contained at least one positive shedding animal, with a 95% confidence interval of (19.6%, 26.3%). The point-estimate and confidence interval are both slightly higher than the raw estimates given earlier, since these figures incorporate an adjustment to allow for farms with low shedding rates being misclassified as negative due to sampling variability.

Analysis of Within-Farm Prevalences

These data are highly skewed, with many zero returns.
This is because their true statistical distribution should be a mixture distribution, with true negative farms always generating a zero response and positive farms generating a range of responses, many of which will be zero, with variability arising from the between-farm variability and the sampling variability. Ignoring this aspect of the data gives rise to models with unacceptable residuals. The data are handled by restricting analysis to those observations with non-zero responses. Hence, the epidemiological analysis answers the question ‘given that the farm has at least one positive sample, what factors tend to be associated with higher within-farm prevalences?’

The data are analysed by fitting a series of generalised linear models to each variable in turn, developing a multivariate model (using some of the stepwise regression functions available for this class of model) containing all likely factors, and then refitting this model as a generalised linear mixed model (GLMM). Hence the ultimate model uses the most appropriate algorithm for the data. The data are consistently fitted as binomial random variables with logit link functions. Generalised linear models are consistently fitted with estimated dispersion parameters (all of which are clearly greater than one), while the GLMMs are fitted with Farm as a random effect and fixed dispersion (since farm is the basic sampling unit). Other possible random effects are not significant.

Within the univariate analysis, examining structural variables, animal health division and sampling month are found to be highly significant. Examining possible explanatory variables, we find that housing status (housed or unhoused) has an extremely significant effect on the prevalences (housed animals have a much higher prevalence than unhoused animals).

| Factor/Variable | Effect | Comment |
|---|---|---|
| Division | Highland area has a higher prevalence, South-West has a low prevalence. | Effect even stronger in ultimate multivariate model. |
| Sampling Month | Lower in summer months. | Effect disappears in ultimate multivariate model. Effect explained by differential housing in different months. |
| Season/Seas_List | Summer and Autumn show lower prevalences. | Effect better explained by examining results on a month by month basis. Effect disappears in multivariate model. |
| Housed | Housed animals have a much higher prevalence. Highly significant. | This is the key finding of the study. All other parts of the analysis depend on the correct modelling of the ‘Housed’ effect. |
| Recent Move | A recent move is associated with lower prevalences. | This effect becomes even more clear when explored in conjunction with ‘Housed’. |
| Recent Change in Feed | Recent change in feed associated with lower prevalences. | This effect becomes even more clear when explored in conjunction with ‘Housed’. |
| Silage_Home | Silage production on the farm is associated with lower prevalence in housed animals. | Effect explained more fully in multivariate analysis. |
| Silage_Slurry | Silage production on the farm with the spreading of slurry is associated with lower prevalence in housed animals. | Effect explained more fully in multivariate analysis. |
| N_Pigs | Higher number of pigs is associated with lower prevalence. | Model result depends on 8 points with high leverage. Suspicious that categorical variable derived from this variable (Pigs) is not significant. Effect found not to be significant in final multivariate model. Probably spurious. |
| N_Deer | Higher number of deer is associated with higher prevalence. | Model result depends on 1 point with high leverage. No basis for drawing any wider conclusions from this result. Probably spurious. |
| Water | Natural water supplies associated with significantly lower prevalences than main supply. | Natural water supplies associated with unhoused animals. Even so, natural water supply is associated with lower prevalence. |
| Housing, Supplementary Feed, Forage, Silage, Concentrate, Grass_Manure, Grass_Slurry, Grass_Sewage, Grass_Geece, Grass_Gulls | All of these factors, although apparently significant in the univariate analysis, are confounded with Housed. | No information above that gained from ‘Housed’. |

Fitting a multi-factor model, particularly exploring the interactions between the Housed variable and the other possible variables, we find that the following factors are of interest:

| Factor/Variable | Effect | Log Odds Ratio | se | p-value |
|---|---|---|---|---|
| Housed | Housed animals have higher prevalences. | 1.319 | 0.33 | <0.001 |
| FCattle | Farms with >100 finishing cattle have significantly lower prevalences than those with <100. | -0.702 | 0.23 | 0.004 |
| Housed/‘Recent Changes in Housing or Diet’ interactions | Farms with Housed Animals and recent changes have higher prevalences than farms with unhoused animals. This effect is not formally significant. | 0.480 | 0.43 | 0.26 |
| | Farms with Housed Animals and no recent changes have higher prevalences than farms with Housed Animals and recent changes. | 0.891 | 0.33 | 0.007 |
| Water sourced from natural supply | Farms with animals at pasture have lower prevalences if the water is from a natural source. | -0.708 | 0.35 | 0.04 |
| Slurry spread on Farm | Farms with housed animals which spread slurry on their silage fields have a lower prevalence than farms with housed animals which do not. | -0.5529 | 0.29 | 0.07 |
| Animal Health Division | Scotland divided into three regions: Highlands; Central, Islands, North-East and South-East; and South West. | | | |
| | Highlands exhibits a significantly higher prevalence than the portmanteau region. | 0.969 | 0.42 | 0.02 |
| | The South West exhibits a significantly lower prevalence than the portmanteau region. | -0.600 | 0.28 | 0.03 |
| Sampling Month | No significant effects identified. All variability explained by explanatory variables above, especially Housed. | Various | Various | 0.23 |
| Sampling Year | No significant effects identified. | Various | Various | 0.61 |
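The log odds ratios in the multi-factor table are easier to interpret once back-transformed to odds ratios. The sketch below does this for three rows of the table and attaches an approximate 95% Wald interval. The function name `odds_ratio_summary` is hypothetical, and the normal approximation (z = 1.96) is an assumption made here for illustration. The report's own p-values come from the fitted GLMM and need not agree exactly with a Wald calculation.

```python
import math

def odds_ratio_summary(log_or, se, z=1.96):
    """Back-transform a log odds ratio and its approximate Wald interval.

    Assumption: a normal approximation on the logit scale; this is an
    illustration, not the report's own interval method.
    """
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))

# (log odds ratio, standard error) copied from the multi-factor table
rows = {
    "Housed": (1.319, 0.33),
    "FCattle": (-0.702, 0.23),
    "Water sourced from natural supply": (-0.708, 0.35),
}

for name, (log_or, se) in rows.items():
    est, lo, hi = odds_ratio_summary(log_or, se)
    print(f"{name}: odds ratio {est:.2f} ({lo:.2f}, {hi:.2f})")
```

Under these assumptions the Housed effect corresponds to odds of shedding roughly 3.7 times higher in housed groups, while the FCattle effect roughly halves the odds.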
Hence, various explanatory factors and variables have been identified as being associated with the within-farm prevalence of E. coli O157 shedding in finishing cattle on positive farms. No statistically significant management system variability was observed in the analysis of the basic data, and nothing further became apparent following the fitting of the multi-factor model. Similarly, there was no evidence of any long-term trend in prevalences over the lifetime of the study, and this conclusion remained unaffected by the fitting of the multi-factor model. By contrast, the basic data showed evidence of variability between different Animal Health Divisions, and this effect remained in the multi-factor model, unexplained by any of the proposed explanatory factors. The basic data showed highly significant evidence of cyclicity by month. When included in a model with the full multi-factor model, the month effect was found to be insignificant, being fully explained by other explanatory factors. Hence it can be concluded that although the within-farm prevalences do vary with month, this is explained by the proposed explanatory factors. By contrast, the geographical variability in the data appears to be genuine, and is best examined after the extraneous effects of the other explanatory factors have been allowed for in the model. Analysis of Between-Farm Prevalences The detailed data collected in the study can be converted into binary (or Bernoulli) data, where the farm is recorded as a positive if at least one of the samples collected from that farm is positive, and negative if all samples are negative. The binary data can then be analysed in terms of the probability of observing a positive farm on different types of farm. 
These data present fewer difficulties in analysis than the within-farm prevalence data: since only positives and negatives are recorded, it is impossible for a generalised linear model to provide a poor fit in terms of the distribution of residuals, since the data do not contain enough structure for any lack of fit to occur. Accordingly, all the models in this section are fitted with dispersion parameter set equal to one, since it is impossible to estimate any such over-dispersion from the data. Many of the diagnostics which are available in terms of the fit of the model for Binomial data are not useful for Bernoulli data.

It is appropriate to examine the data in this format for two reasons. Firstly, since zero prevalence farms have been excluded from the within-farm analysis for technical statistical reasons, it is desirable to investigate the factors which are associated with farms being negative, since otherwise these data will never have been analysed. Secondly, there is no reason to believe that the factors which promote high within-herd prevalences on farms which are positive will be the same as the factors which either promote the infection of farms with E. coli O157 or which encourage the maintenance of infection once introduced. Obviously, a factor which is associated with high within-herd prevalence will have potential to also be associated with a high probability of herd infection; however, it will be interesting to identify where different factors may come into play in the two models.

The data are analysed by fitting a series of generalised linear models to each variable in turn, developing a multivariate model (using some of the stepwise regression functions available for this class of model) containing all likely factors, and then refitting this model as a generalised linear mixed model (GLMM). Hence the ultimate model uses the most appropriate algorithm for the data.
The data are consistently fitted as Bernoulli random variables with logit link functions. Generalised linear models are consistently fitted with dispersion parameters fixed equal to one, while the GLMMs are fitted with Farm as a random effect and a fixed dispersion (since farm is the basic sampling unit). Other possible random effects are found not to be significant.

Within the univariate analysis, examining structural variables, none are found to be highly significant. There is some weak evidence of an effect due to Sampling Year, but this effect is not significant at the 5% level. Examining possible explanatory variables, by contrast to the within-herd model, we find that Housing status has a negligible effect on the probability of a farm being identified as positive. The following factors were found to be of interest in the univariate analysis:

| Factor/Variable | Effect | Comment |
|---|---|---|
| Division | No formally statistically significant effects. Highland division has a particularly low prevalence. | No trend apparent, although it is interesting that Highlands are so low, when the within-herd prevalence was high. Effects utterly disappear in the multifactor model. |
| Sampling Month | No statistically significant evidence of any effects (p=0.26). Prevalences from December to February show signs of being lower. | In the within-farm model, January-April tended to show higher prevalences, associated with Housing effects. This aspect of the dataset requires careful interpretation, since data from early 2000 are included in the January to April estimates, and not in the other months. There is some evidence that the data from 2000 exhibit a lower prevalence. Hence this variable is analysed along with Sampling Year. However, even when Year and Sample Month are fitted in the same model, there is only weak evidence of any effect due to Sampling Month. However, the effects which are apparent in the univariate analysis can be shown to be significant within the multifactor analysis. |
| Sampling Year | A small drop in 1999 and a large drop in 2000. The result is close to statistical significance (p=0.06). | Due to a lack of balance in the dataset, this result is derived from a model fitted with Sampling Month. There is compelling evidence of a drop in prevalence by year 2000, less so for year 1999. Similar results are seen in the multifactor model, where the trend is highly significant. |
| Number of Finishing Cattle | Higher numbers of finishing cattle were associated with a high risk of the farm being positive. P-value suppressed as arising from a poorly fitting model. | Each of the eight significant cattle number factors and variables gives the same result: more animals equates to a higher risk of the farm being positive. Some are rejected as presenting a poorly fitting model: others because another factor is found to be more informative. This variate was overly sensitive to a small number of farms with high numbers of finishing cattle. |
| Categorised Number of Finishing Cattle | Categorising the numbers of animals into 4 classes, groups containing 1-49 animals were less likely to be identified as positive than larger groups, while groups of >200 animals had even higher prevalences still. Effects are highly statistically significant (p<0.001). | One of the most informative factors in this sub-grouping. Carried forward for further investigation in the multi-factor model. |
| Number of Groups of Cattle | Higher numbers of groups of cattle were associated with a higher risk of the farm being positive. p-value suppressed as arising from a poorly fitting model. | This variate was overly sensitive to a small number of farms with high numbers of groups of cattle. |
| Categorised Number of Groups of Cattle | Higher numbers of groups of cattle were associated with a higher risk of the farm being positive (p=0.08). Fit still fairly poor. | Factor relatively insignificant. Lacked information relative to other terms in the sub-grouping. |
| Number of Cattle in Sampling Group | Higher numbers of animals in the sampling group were associated with a higher risk of the farm being positive. p-value suppressed as arising from a poorly fitting model. | This variate was overly sensitive to a small number of farms with high numbers of groups of cattle. |
| Categorised Number of Cattle in Sampling Group | Higher numbers of animals in the sampling group were associated with a higher risk of the farm being positive (p<0.001). | Carried forward for further investigation in the multi-factor model. |
| Number of Cattle | Higher numbers of cattle were associated with a higher risk of the farm being positive. p-value suppressed as arising from a poorly fitting model. | This variate was overly sensitive to a small number of farms with high numbers of cattle. |
| Categorised Number of Cattle | Higher numbers of cattle were associated with a higher risk of the farm being positive (p=0.002). | Carried forward for further investigation in the multi-factor model. Lacks significance when fitted with other factors. |
| Source of Cattle | Farms which never buy in animals have a significantly lower (p=0.03) risk of being positive than those which always or sometimes buy in animals. | Lacks significance when fitted with other factors in the multivariate model. When number of finishing cattle or number of sampling groups are included in the model, it can be seen that source of cattle lacks explanatory power. |
| Breed | Farms with B_D_DB class animals have a higher prevalence than others (p=0.018). | An extremely small level, with a correspondingly high leverage; it is not surprising that it is found to lack significance when fitted with other factors. |
| Beef Cattle on Dairy Farm | Farms which are described as having a dairy system with beef cattle have a statistically significantly higher risk of being positive than other farms (p=0.017). | Risk group identified from analysis of interaction of two more broadly defined factors. Possible risk of over-trawling the data. |
| Spreading of Slurry on Pasture | Farms with unhoused animals which spread slurry on the pasture have a higher risk of being positive than those which do not, or those which have housed animals (p=0.003). | |
| Spreading of Manure on Pasture | Farms with unhoused animals which spread manure on the pasture have a lower risk of being positive than those which do not, or those which have housed animals (p=0.037). | |
| Number of Goats | High number of goats is associated with a higher risk of farm being positive. p-value suppressed as arising from a poorly fitting model. | This variate was overly sensitive to two farms with higher numbers of goats. |
| Presence of Pigs on Farm | The presence of pigs on a farm is associated with a higher risk of the farm being classed as positive (p=0.01). | |
| Lab Operator | The identity of the lab operator who carried out the assaying of the samples was found to be a significant effect (p=0.039). | This effect was found to be spurious, arising from the unbalanced nature of the data with respect to this factor. Different operators carried out work at different times, on samples with different mean prevalences. |
| Max Age of Animals in Group | A higher maximum age is associated with a lower prevalence (p=0.31). | This variate is included for completeness, since it is found to be relevant in the multi-factor model, although, as can be seen, it lacks any apparent explanatory power in isolation. |

Fitting a multi-factor model, we find that the following factors and variates are of interest:

| Factor/Variable | Effect | Log Odds Ratio | se | p-value |
|---|---|---|---|---|
| Sampling Year | Allowing for the explanatory factors, farms sampled in year 1999 are at lower risk of being positive than those sampled in 1998. | -0.425 | 0.21 | 0.04 |
| | Allowing for the explanatory factors, farms sampled in year 2000 are at lower risk of being positive than those sampled in 1999. | -0.371 | 0.26 | 0.15 |
| | Allowing for the explanatory factors, farms sampled in year 2000 are at lower risk of being positive than those sampled in 1998. | -0.795 | 0.31 | 0.01 |
| Sampling Month | A broad cyclical effect, with prevalence effects peaking in Summer and troughing in Winter. Anomalous changes in prevalences observed in a number of months, such as June, April and November. | Various | Various | 0.02 |
| Categorised Number of Animals in Sampling Group | Farms with 12-28 animals are at a higher risk of being positive than those with <12 animals. | 0.687 | 0.23 | 0.003 |
| | Farms with >28 animals are at a higher risk of being positive than those with 12-28 animals. | 0.462 | 0.19 | 0.03 |
| Categorised Number of Finishing Cattle | Farms with 50-199 animals are at a higher risk of being positive than those with 1-49 animals. | 0.367 | 0.19 | 0.05 |
| | Farms with 200+ animals are at a higher risk of being positive than those with 50-199 animals. | 0.614 | 0.30 | 0.04 |
| Spreading of Slurry on Pasture | Considering only farms with animals at pasture, those which spread slurry are at a higher risk than those which do not. | 1.205 | 0.32 | <0.001 |
| Spreading of Manure on Pasture | Considering only farms with animals at pasture, those which spread manure are at a lower risk than those which do not. | -1.155 | 0.36 | 0.001 |
| Dairy Farms with Beef Cattle | Dairy farms with beef cattle are at a higher risk of being positive than other farms. | 1.965 | 0.64 | 0.002 |
| Presence of pigs on farm | Farms with pigs are at a higher risk of being positive than those without pigs. | 0.892 | 0.35 | 0.01 |
| Maximum age of cattle in sampling group | Higher maximum age is associated with a lower risk of the farm being positive. | -0.031 | 0.015 | 0.04 |

Of these, it should be pointed out that the factor ‘Categorised Number of Animals in Sampling Group’ is correlated with the number of animals in the sampling group and hence with the number of samples collected from the group.
Hence it might be thought likely that a positive relationship might be generated through the higher detection probability arising from a larger sample. Consideration of the data suggests that this is unlikely, but even if the result is discounted on this basis, the inclusion of FCattle in the model even in the presence of the sampling group factor indicates that the size of enterprise is a highly significant risk factor.

Hence, various explanatory factors and variables have been identified as being associated with the farm prevalence of E. coli O157 shedding in finishing cattle. No statistically significant geographical or management system variability was observed in the analysis of the basic data, and nothing further became apparent following the fitting of the multi-factor model. By contrast, the basic data showed evidence of a long-term trend towards lower prevalences over the lifetime of the study, and this trend remained in the multi-factor model, unexplained by any of the proposed explanatory factors. The basic data showed no significant evidence of any cyclicity by month or season, although various peculiarities were observable in the analysis. When included in a model with the full multi-factor model, the month effect is found to be significant. It is important to stress that this significance is associated with the same peculiarities observed in the univariate model: the effect is not an artefact of a poorly fitting model. Hence it can be concluded that the farm level prevalences do vary with month, in a fashion which is not explained by the proposed explanatory factors.

Properties of Data

Samples from 952 farms are included in the analysis, with a total of 14,856 faecal samples analysed. Of these faecal samples, 1231 were positive for verocytotoxic E. coli O157. These positive samples were sourced from 207 farms. Hence, the raw figures indicate that 21.7% (19.2%, 24.5%) of groups sampled contained shedding animals.
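The raw interval quoted above is consistent with a Wald interval computed on the logit scale and then back-transformed. The sketch below works directly from the counts given in the text (207 positive farms out of 952; 1231 positive samples out of 14,856). The function name `logit_interval` is hypothetical. The optional `dispersion` argument scales the standard error by the square root of a dispersion parameter, in the usual quasi-binomial way; the value 5.67 used for the animal-level figure is the estimate reported in this section's fitted output.

```python
import math

def logit_interval(positives, total, z=1.96, dispersion=1.0):
    """Proportion with an approximate 95% interval via the logit scale.

    Assumption: an intercept-only binomial model, so the logit estimate is
    log(positives / negatives) with variance 1/positives + 1/negatives,
    inflated by the dispersion parameter.
    """
    negatives = total - positives
    logit = math.log(positives / negatives)
    se = math.sqrt(dispersion * (1 / positives + 1 / negatives))

    def expit(v):
        return 1 / (1 + math.exp(-v))

    return expit(logit), expit(logit - z * se), expit(logit + z * se)

farm = logit_interval(207, 952)                        # group-level, dispersion fixed at 1
animal = logit_interval(1231, 14856, dispersion=5.67)  # animal-level, over-dispersed

print("Group level:  {:.1%} ({:.1%}, {:.1%})".format(*farm))
print("Animal level: {:.1%} ({:.1%}, {:.1%})".format(*animal))
```

To one decimal place, both intervals agree with the raw figures quoted in the report.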
    "Modelling of binomial proportions. (e.g. by logits)."
    MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=1
    FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]

    ***** Regression Analysis *****

    Response variate: VFarmPos
    Binomial totals:  1
    Distribution:     Binomial
    Link function:    Logit
    Fitted terms:     Constant

    *** Summary of analysis ***

                                        mean   deviance   approx
                  d.f.   deviance   deviance      ratio   chi pr
    Regression       0        0.0          *
    Residual       951      997.0      1.048
    Total          951      997.0      1.048

    * MESSAGE: ratios are based on dispersion parameter with value 1
    Dispersion parameter is fixed at 1.00

    *** Estimates of parameters ***

                                                      antilog of
                estimate      s.e.     t(*)   t pr.     estimate
    Constant     -1.2807    0.0786   -16.30   <.001       0.2779

    * MESSAGE: s.e.s are based on dispersion parameter with value 1

Analysis of the animal level prevalence is complicated by the need to fit a dispersion parameter and the (frankly) appalling fit of the model, giving a mean and confidence interval of 8.3% (7.3%, 9.4%).

    "Modelling of binomial proportions. (e.g. by logits)."
    MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
    FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]

    ***** Regression Analysis *****

    Response variate: VTPos
    Binomial totals:  N_Sam
    Distribution:     Binomial
    Link function:    Logit
    Fitted terms:     Constant

    *** Summary of analysis ***

                                        mean   deviance   approx
                  d.f.   deviance   deviance      ratio    F pr.
    Regression       0         0.          *
    Residual       951      5393.      5.671
    Total          951      5393.      5.671

    Dispersion parameter is estimated to be 5.67 from the residual deviance

    * MESSAGE: The following units have large standardized residuals:
        Unit   Response   Residual
           3      15.00       3.63
          15      21.00       4.02
          30      23.00       4.50
          38      16.00       3.45
         131      17.00       3.87
         259      16.00       3.45
         273      22.00       4.40
         305      18.00       3.81
         326      18.00       3.50
         428      17.00       3.57
         464      14.00       3.32
         514      20.00       4.19
         719      16.00       3.45
         720      17.00       3.87
         864      14.00       3.51

    * MESSAGE: The error variance does not appear to be constant:
      large responses are more variable than small responses

    *** Estimates of parameters ***

                                                       antilog of
                estimate      s.e.   t(951)   t pr.      estimate
    Constant     -2.4041    0.0709   -33.92   <.001       0.09035

    * MESSAGE: s.e.s are based on the residual deviance

This model is, however, extremely poor, since the plot of fractional prevalences shows that the distribution of positive samples is probably not even unimodal.

[Figure: Histogram of Fractional Prevalences (x-axis: Fractional Prevalence, 0.0 to 1.0; y-axis: Frequency, 0 to 800).]

However, these figures do not allow for the effects of sampling error (which in a situation with many groups with a small number of shedders would tend to underestimate the number of groups containing shedders) and of the mixed nature of the sample (farms with no infection will, by definition, have zero prevalence, so a more useful statistic is the estimate of the animal prevalence on those farms which are positive).

In order to deal with these issues, a more complex model for the within-herd prevalence distribution is proposed. The data are treated as being the outcome of a mixture distribution, where a proportion pneg of the population are defined as negative farms and will always return a zero number of positive samples. Among the positive population, the between farm variability is modelled as a beta distribution, taking parameters a and b, while the sampling distribution of the faecal pat sampling process is taken to be binomial. A small number of farms were sampled using rectal samples.
The sampling distribution of this process is taken to be hypergeometric. No positive samples were collected from rectally sampled groups. Hence, where N is the number of animals in the group, n is the number of samples collected, and x is the observed number of positives, the distribution of x is taken to be:

\[
P(X = 0) = p_{neg} + (1 - p_{neg})\,\frac{\Gamma(n+b)\,\Gamma(a+b)}{\Gamma(a+b+n)\,\Gamma(b)} \quad \text{under faecal pat sampling}
\]

\[
P(X = 0) = p_{neg} + (1 - p_{neg}) \sum_{i=0}^{N-n} \binom{N-n}{i}\,\frac{\Gamma(i+a)\,\Gamma(N-i+b)\,\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)\,\Gamma(N+a+b)} \quad \text{under rectal sampling}
\]

\[
P(X = x,\ x > 0) = (1 - p_{neg}) \binom{n}{x}\,\frac{\Gamma(x+a)\,\Gamma(n-x+b)\,\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)\,\Gamma(n+a+b)} \quad \text{under faecal pat sampling}
\]

Hence, although two different sampling distributions are involved, they are based on the same underlying parameters and can be incorporated into the same likelihood. The log-likelihood is maximised with respect to a, b and pneg.

| Parameter | Value |
|---|---|
| pneg | 3.98E-31 |
| a | 0.0687 |
| b | 0.8013 |

The beta function to model the between farm variability in positive groups has a bi-modal shape, reflecting the long tail towards high proportional prevalences. The population contains a large proportion of groups with low prevalences, which are likely to give rise to observations of zero positives. This means that the estimates for pneg and for a and b are highly negatively correlated.

[Figure: Between-farm variability as summarised by the beta function (x-axis: Proportion Shedding, 0 to 1; y-axis: pdf, 0 to 6).]

The fit of the model was tested against the faecal pat-sampled observations. These data were categorised by sample size, and expected values for each response given the model were calculated. Many of these expectations were extremely small, so the expectations and observations were grouped into larger combinations with expectations of at least 5. 55 variables were used to calculate a goodness of fit statistic. However, the expectations also incorporated 26 constraints, conditioning on the number of farms associated with each of the sample sizes. Hence there were 29 degrees of freedom associated with the test statistic.
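The zero-count probabilities in this model can be evaluated numerically with log-gamma functions, which avoids overflow for larger groups. The sketch below is illustrative only: the function names (`log_beta`, `p_zero_pat`, `p_zero_rectal`) are hypothetical, the symbols a, b, pneg, N and n follow the text, and the parameter values are the fitted values reported above. It assumes the rectal-sampling sum runs over the number of shedders among the N - n unsampled animals. Under that reading the sum collapses to the faecal pat expression, because a hypergeometric sample from a beta-binomial group is itself beta-binomial; the code checks this numerically.

```python
import math

def log_beta(x, y):
    """Log of the beta function B(x, y), via log-gamma."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)

def p_zero_pat(n, a, b, p_neg=0.0):
    """P(X = 0) for n faecal pat samples (zero-inflated beta-binomial)."""
    return p_neg + (1 - p_neg) * math.exp(log_beta(a, b + n) - log_beta(a, b))

def p_zero_rectal(N, n, a, b, p_neg=0.0):
    """P(X = 0) for n rectal samples drawn without replacement from N animals.

    Assumption: i counts the shedders among the N - n unsampled animals.
    """
    total = 0.0
    for i in range(N - n + 1):
        log_term = (math.log(math.comb(N - n, i))
                    + log_beta(a + i, b + N - i) - log_beta(a, b))
        total += math.exp(log_term)
    return p_neg + (1 - p_neg) * total

A, B = 0.0687, 0.8013          # fitted beta parameters from the report
mean_prevalence = A / (A + B)  # mean of the beta distribution

print(f"mean of beta distribution: {mean_prevalence:.3f}")
print(f"P(X=0), 10 pat samples:    {p_zero_pat(10, A, B):.4f}")
print(f"P(X=0), 10 of 40 rectal:   {p_zero_rectal(40, 10, A, B):.4f}")
```

The two zero-count probabilities printed at the end should agree, which is one way of seeing why both sampling schemes can share a single likelihood in a, b and pneg.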
The fit to the data is found to be adequate, with a chi-squared goodness-of-fit test generating a test statistic \(\chi^2_{29} = 36.4\), which has a p-value of 0.16.

The mean animal-level prevalence on positive farms was estimated by the mean of the beta distribution, \(a/(a+b)\), and the mean farm level prevalence was estimated using a more complex procedure which took account of the distribution of numbers of finishing cattle in the groups sampled in the study. This distribution is highly skewed, as shown below:

[Figure: Histogram of Number of Cattle in Sampling Groups (x-axis: Number of Cattle in Group, 0 to 200; y-axis: Frequency).]

However, when the numbers of cattle are log-transformed, the distribution looks much more symmetric:

[Figure: Histogram of the Log of Number of Cattle in Sampling Groups (x-axis: Log(Number of Cattle), 0 to 5; y-axis: Frequency).]

The distribution of number of cattle in the sampling groups is modelled as a log-normal distribution, with parameters as shown in the table below:

| Parameter | Value |
|---|---|
| mu | 2.843549 |
| sigma | 0.708497 |

Assuming no relationship between size of group and the variability in prevalence summarised in the beta distribution, the beta-binomial model was used to estimate the fraction of groups which contained at least one shedding animal (the parameters already estimated give enough information to do this).

Confidence intervals for the prevalences were generated by exploring the nature of the profile log-likelihood in the vicinity of the maximum, and using the chi-squared approximation to the log-likelihood ratio to define a 95% confidence region for a, b and pneg. Because of the strong negative correlation between pneg and a and b, pneg was set equal to the maximum likelihood estimate.
Marginal confidence intervals for the mean prevalences were then generated from the profile log-likelihood by identifying the maximum and minimum values of the prevalences on the boundary of the confidence region specified by the chi-squared approximation to the profile log-likelihood ratio. Two variables were assumed unfixed, so the confidence interval was based on two available degrees of freedom. The results are summarised in the following table:

| | Point Estimate | 95% Confidence Interval |
|---|---|---|
| Group-Level Prevalence | 22.8% | (19.6%, 26.3%) |
| Overall Animal-Level Prevalence | 7.9% | (6.5%, 9.6%) |

Just under one quarter of the groups of finishing cattle contained at least one shedding animal. The point-estimate and confidence interval are both slightly higher than the raw estimates given earlier, since these figures incorporate an adjustment to allow for farms with low shedding rates being misclassified as negative due to sampling variability. These figures imply that this misclassification occurred in just over 1% of farms sampled, and hence, that from the population of positive groups sampled, just under 5% (4.7%) were misclassified.

The overall proportion of animals estimated to be shedding is 7.9%. This is slightly lower than the raw estimate given earlier. This adjustment arises from the more appropriate modelling of the asymmetric prevalence distribution. The confidence interval, (6.5%, 9.6%), is also slightly wider, for the same reason.

It is interesting to attempt to estimate the proportion of animals shedding in positive groups. The difficulty with this estimate is that many groups may contain only a small number of shedders, and it is difficult to distinguish such positive groups (which should contribute to the estimate) from negative groups (which should not).
Estimates of this proportion are highly sensitive to the estimated value of pneg, and hence it is inappropriate to utilise the profile likelihood approach used to estimate the earlier confidence intervals. Confidence intervals for the mean prevalences were generated from the log-likelihood by identifying the maximum values of the prevalence on the boundary of the confidence region specified by the chi-squared approximation to the log-likelihood ratio. Three variables were varied, so the upper limit of the confidence interval was based on three available degrees of freedom. The lower bound of the confidence interval for the prevalence within infected groups must occur where pneg is negligible, and when this is the case the likelihood is degenerate, with only two effective degrees of freedom. Therefore, the lower bound of the confidence interval was taken to be equal to that calculated for the overall prevalence of infected animals above, since this corresponded to a case with pneg small and two degrees of freedom. The results are summarised in the following table:

                                             Point Estimate   95% Confidence Interval
Animal-Level Prevalence in Positive Groups   7.9%             (6.5%, 21.0%)

The mean estimate of the shedding prevalence remains the same, at 7.9%, but the confidence interval is much wider, reflecting the uncertainty over the status of many of the farms reported as negative. It is interesting to note that these data are consistent with, on average, as many as 1 in 5 animals in positive groups shedding.

Analysing binomial data conditional on number of VT positives being greater than zero.
Descriptive variables (Division, Sam_Month, Manage_O)

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Manage_O

* MESSAGE: Term Manage_O cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model
  (Manage_O Mixed) = 0

***** Regression Analysis *****

Response variate: VTPos
 Binomial totals: N_Sam
    Distribution: Binomial
   Link function: Logit
    Fitted terms: Constant, Manage_O

*** Summary of analysis ***

                                 mean   deviance   approx
             d.f.   deviance   deviance    ratio    F pr.
Regression      2        0.      0.160      0.02    0.979
Residual      204     1528.      7.489
Total         206     1528.      7.418

Dispersion parameter is estimated to be 7.49 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
  large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit   Response   Leverage
   620       5.00      0.048
   637       4.00      0.044
   681       4.00      0.046

*** Estimates of parameters ***

                                                      antilog of
                  estimate    s.e.   t(204)   t pr.     estimate
Constant            -0.701   0.250    -2.81   0.005       0.4958
Manage_O Beef        0.054   0.277     0.20   0.846        1.056
Manage_O Other       0.060   0.324     0.18   0.854        1.061
Manage_O Mixed           0       *        *       *        1.000
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
  Factor     Reference level
  Manage_O   Dairy

Manage_O shows no significant effects. By contrast, consider Division.

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Division

***** Regression Analysis *****

Response variate: VTPos
 Binomial totals: N_Sam
    Distribution: Binomial
   Link function: Logit
    Fitted terms: Constant, Division

*** Summary of analysis ***

                                 mean   deviance   approx
             d.f.   deviance   deviance    ratio    F pr.
Regression      5       90.     18.017      2.52    0.031
Residual      201     1438.      7.154
Total         206     1528.      7.418

Dispersion parameter is estimated to be 7.15 from the residual deviance
* MESSAGE: The following units have high leverage:
  Unit   Response   Leverage
    15      21.00      0.092
    51       3.00      0.092
   139       9.00      0.088
   143       1.00      0.105
   566      15.00      0.092
   584      10.00      0.104
   637       4.00      0.101

*** Estimates of parameters ***

                                                        antilog of
                      estimate    s.e.   t(201)   t pr.   estimate
Constant                -0.653   0.202    -3.23   0.001     0.5205
Division Highland        0.725   0.395     1.84   0.068      2.065
Division Islands        -0.326   0.439    -0.74   0.458     0.7218
Division North East      0.096   0.269     0.36   0.722      1.100
Division South East      0.243   0.303     0.80   0.424      1.275
Division South West     -0.531   0.305    -1.74   0.083     0.5881
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
  Factor     Reference level
  Division   Central

The estimated prevalence in the Highlands is higher than that in Central (p=0.068), while those in the Islands and the South West are estimated to be lower (p=0.458 and p=0.083 respectively).

[Figure: Plot of prevalences by animal health division (univariate analysis), with 95% confidence intervals.]

The estimated prevalences on positive farms in different divisions are as follows:

Central     34%
Highlands   52%
Islands     27%
NE          36%
SE          40%
SW          23%

Hence there is evidence that the South West and Islands are low, Central, NE and SE are moderate, and Highlands is high in terms of prevalence.

Examining Sampling Month,

***** Regression Analysis *****

Response variate: VTPos
 Binomial totals: N_Sam
    Distribution: Binomial
   Link function: Logit
    Fitted terms: Constant, Sam_Mon

*** Summary of analysis ***

                                 mean   deviance   approx
             d.f.   deviance   deviance    ratio    F pr.
Regression     11      177.     16.104      2.32    0.011
Residual      195     1351.      6.928
Total         206     1528.      7.418

Dispersion parameter is estimated to be 6.93 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
  intermediate responses are more variable than small or large responses
* MESSAGE: The following units have high leverage:
  Unit   Response   Leverage
   308      16.00      0.176
   326      18.00      0.164
   333      14.00      0.172

*** Estimates of parameters ***

                                                 antilog of
               estimate    s.e.   t(195)   t pr.   estimate
Constant          0.301   0.460     0.65   0.514      1.351
Sam_Mon Feb      -1.037   0.602    -1.72   0.086     0.3545
Sam_Mon Mar      -0.570   0.525    -1.09   0.279     0.5656
Sam_Mon Apr      -0.878   0.579    -1.52   0.131     0.4155
Sam_Mon May      -0.535   0.517    -1.04   0.301     0.5854
Sam_Mon Jun      -1.458   0.591    -2.47   0.014     0.2327
Sam_Mon Jul      -1.407   0.569    -2.47   0.014     0.2448
Sam_Mon Aug      -1.008   0.556    -1.81   0.071     0.3650
Sam_Mon Sep      -1.695   0.594    -2.85   0.005     0.1836
Sam_Mon Oct      -1.730   0.581    -2.98   0.003     0.1772
Sam_Mon Nov      -0.653   0.540    -1.21   0.228     0.5207
Sam_Mon Dec      -0.542   0.661    -0.82   0.413     0.5816
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
  Factor    Reference level
  Sam_Mon   Jan

RKEEP ; RESIDUALS=Resids; FITTEDVALUES=Fits; ESTIMATES=Para; VCOVAR=Var

Examining the associated confidence intervals:

[Figure: Plot of prevalences by sampling month, with 95% confidence intervals.]

The estimated prevalences on positive farms in different sampling months are as follows:

January     57%
February    32%
March       43%
April       36%
May         44%
June        24%
July        25%
August      33%
September   20%
October     19%
November    41%
December    44%

There are clear differences between different months. The period June to October shows significantly lower prevalences and there is some evidence of a peak in January. There is, however, little point in exploring these properties further before investigating the explanatory factors which may influence shedding rates.
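The monthly prevalences listed above are the fitted values on the logit scale, back-transformed to proportions: the constant is the January logit, and each Sam_Mon estimate is a difference from January. A minimal sketch of that back-transformation, using the parameter estimates from the output above (the function and variable names are illustrative and not part of the original Genstat analysis):

```python
import math


def inv_logit(x):
    """Back-transform a logit to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


# Estimates copied from the Sam_Mon output; January is the reference level.
constant = 0.301
month_effects = {
    "Jan": 0.0, "Feb": -1.037, "Mar": -0.570, "Apr": -0.878,
    "May": -0.535, "Jun": -1.458, "Jul": -1.407, "Aug": -1.008,
    "Sep": -1.695, "Oct": -1.730, "Nov": -0.653, "Dec": -0.542,
}

prevalence = {m: inv_logit(constant + e) for m, e in month_effects.items()}
for month, p in prevalence.items():
    print(f"{month}: {100 * p:.0f}%")
```

The same calculation applied to the Division estimates (constant -0.653) gives the divisional prevalences quoted earlier, for example inv_logit(-0.653 + 0.725) for Highland.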
Exploring the possible explanatory factors in a univariate fashion using a Generalised Linear Model, the results are summarised in the following table. The p-values indicate the likely significance of the fitted values. Variables with p-values of less than 5% are indicated in red, those in the range 5%-10% in blue. Those variables which ultimately are found to be of interest in the multivariate analysis are indicated by bold text.

Factor/Variable   p-value   Comments
Manage_C          0.88      'Beef' and 'Others' higher than 'Dairy'
Manage_O          0.98      'Beef' and 'Others' higher than 'Dairy'
Division          0.03      'Highland' higher than others.
Sam_Month         0.01      Lower in summer months
Sample            -         No variability in explanatory variable
Sam_Year          0.50      No obvious pattern
Season            0.006     Summer and Autumn lower than Winter and Spring
SeasList          0.04      Both Summer and Autumn lower than Winter and Spring
Sampler           0.85      'Fiona' is higher than 'Helen'
N_F_Cattle        0.177     Higher numbers of finishing cattle associated with lower prevalence, probably better analysed as a factor, below
FCattle           0.301     No consistent pattern
N_Groups          0.35      Probably better analysed as a factor, below: More groups associated with lower prevalence.
GroupsCat         0.93      No consistent pattern
N_Sam_Gr          0.22      More animals in sampling group associated with lower prevalences
Min_Age           0.44      Higher minimum age associated with lower prevalence
Max_Age           0.25      Higher maximum age associated with lower prevalence
Source            0.17      'Buy in' and 'Both' lower than 'Breeding only'
NewSource         0.19      'Open' lower than 'Closed'
Breed             0.54      'DairyBeef' less than 'Beef', but not significant
Housed            <0.001    Housed animals have much higher prevalences
Housing           <0.001    Housing confounded with Housed. Otherwise nothing.
NoChange          0.59      '1' higher than '0' (not sure of interpretation)
TDHouse           0.45      Longer time associated with higher prevalences
Rec_Move          0.002     A recent move is associated with lower prevalences
RecMove2          0.33      Most recent move class 1 (<1 week) is lower than classes 2 and 3 (>1 week)
SupFeed           <0.001    SupFeed confounded with Housed. Otherwise nothing.
RecDFeed          0.007     Recent change in feed associated with lower prevalence
Forage            0.007     Forage confounded with Housed.
Silage            0.007     Silage confounded with Housed. Otherwise nothing.
Concentrate       0.013     Concentrate confounded with Housed.
Sil_Home          0.029     'Yes' is lower than 'No'. Silage_Home confounded with Housed.
Sil_Manure        0.19      'Yes' is lower than 'No'. Silage_Manure confounded with Housed.
Sil_Slurry        0.108     'Yes' is lower than 'No'. Silage_Slurry confounded with Housed.
Sil_Sewage        0.44      'Yes' is lower than 'No'. Silage_Sewage confounded with Housed.
Sil_Geece         0.40      'Yes' is higher than 'No'. Silage_Geece confounded with Housed.
Sil_Gulls         0.37      'Yes' is higher than 'No'. Silage_Gulls confounded with Housed.
Hay               0.79      'Yes' is lower than 'No'
Hay_Manure        0.58      'Yes' is lower than 'No'
Hay_Slurry        0.69      'Yes' is lower than 'No'
Hay_Sewage        -         No data points in class with Sewage on hay fields.
Hay_Geese         -         No data points in class with Geese on hay fields.
Hay_Gulls         0.45      Gulls present associated with lower prevalence
Grass_Manure      <0.001    Grass_Manure confounded with Housed. Otherwise 'Yes' is lower than 'No', but not significant.
Grass_Slurry      <0.001    Grass_Slurry confounded with Housed. Otherwise 'Yes' is lower than 'No', but not significant.
Grass_Sewage      <0.001    Grass_Sewage confounded with Housed. Otherwise nothing.
Grass_Geece       <0.001    Grass_Geece confounded with Housed. Otherwise 'Yes' is lower than 'No'
Grass_Gulls       <0.001    Grass_Gulls confounded with Housed. Otherwise 'Yes' is lower than 'No'
N_Cattle          0.15      More cattle associated with lower prevalence
Cattle            0.55      No clear pattern.
N_Sheep           0.37      Large numbers of sheep are protective, but better analysed using a factor, below.
Sheep             0.67      (Sheep absent or present) 'With' is lower than 'Without'
N_Goats           0.21      More goats associated with higher prevalence
Goats             0.46      (Goats absent or present) 'With' is higher than 'Without'
N_Horses          0.84      More horses associated with lower prevalence
N_Pigs            0.037     More pigs associated with lower prevalence
Pigs              0.62      (Pigs absent or present) 'With' is lower than 'Without'
N_Chickens        0.33      More chickens associated with higher prevalence
Chickens          1         (Chickens absent or present) 'With' is virtually identical to 'Without'
N_Deer            0.026     More deer associated with higher prevalence
Deer              0.026     (Deer absent or present) 'With' is higher than 'Without'
Water             0.014     Natural prevalences significantly lower than those for Mains
Mains             0.83      Mains prevalences slightly higher than those farms with other sources.
Natural           0.002     Farms with natural water sources have lower prevalences than those with other sources.
Private           0.08      Farms with private water sources have lower prevalences than those with other sources; confounded with housed.
WaterCon          0.76      'With' is higher than 'Without'
WaterCT           0.52      All but 'None', 'Animal' and 'ASM' thrown out for lack of information: 'ASM' lower than 'Animal'
Want2Know         0.75      Those that wanted to know had higher prevalences than those who did not
Visit2            0.11      Those willing to have a 2nd visit had a lower prevalence than those who were not
LabOperator       0.55      'S' generated lower prevalences than 'D' and 'H'
BeefonDairy       0.34      This class of farm exhibits a higher prevalence

The key explanatory factor appears to be Housed, reporting whether the animals were housed or not. Many of the other factors which appear significant are actually confounded with Housed, and reflect this variable.
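Because the dispersion parameter is estimated rather than fixed at one, each univariate test compares the regression mean deviance with the residual mean deviance, which is also the dispersion estimate. A minimal sketch reproducing the deviance ratios for the three descriptive-variable fits reported earlier (the mean deviances are copied from that output; the function name is illustrative, and the F-distribution tail probability is not computed here):

```python
def deviance_ratio(regression_mean_deviance, residual_mean_deviance):
    """Ratio referred to an F distribution when dispersion is estimated."""
    return regression_mean_deviance / residual_mean_deviance


# (regression mean deviance, residual mean deviance) from the output.
fits = {
    "Manage_O": (0.160, 7.489),
    "Division": (18.017, 7.154),
    "Sam_Mon": (16.104, 6.928),
}

ratios = {name: deviance_ratio(*values) for name, values in fits.items()}
for name, ratio in ratios.items():
    # The residual mean deviance doubles as the dispersion estimate.
    print(f"{name}: dispersion {fits[name][1]:.2f}, deviance ratio {ratio:.2f}")
```

A dispersion estimate near 7 means the data are far more variable than a pure binomial model allows, which is why the ratios are much smaller than the raw regression deviances would suggest.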
It may be appropriate to report the full results for the Housed analysis:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Housed

***** Regression Analysis *****

Response variate: VTPos
 Binomial totals: N_Sam
    Distribution: Binomial
   Link function: Logit
    Fitted terms: Constant, Housed

*** Summary of analysis ***

                                 mean   deviance   approx
             d.f.   deviance   deviance    ratio    F pr.
Regression      1      161.    160.526     24.06    <.001
Residual      205     1367.      6.671
Total         206     1528.      7.418

Dispersion parameter is estimated to be 6.67 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
  large responses are more variable than small responses

*** Estimates of parameters ***

                                              antilog of
            estimate    s.e.   t(205)   t pr.   estimate
Constant      -1.241   0.161    -7.73   <.001     0.2891
Housed 1       0.938   0.197     4.77   <.001      2.555
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
  Factor   Reference level
  Housed   0

Unhoused   22%
Housed     42%

[Figure: bar chart of prevalence (axis 0% to 60%) for Unhoused and Housed.]

Housed animals exhibit much higher prevalences than unhoused animals. The effect of housing is so strong, and so fundamental, that it would seem wise to review all the other factors in terms of their interaction with Housing.

Factor/Variable   p-value   Comments
Manage_C          0.153     'Beef' higher and 'Others' lower than 'Dairy'
Manage_O          0.33      'Beef' higher and 'Others' lower than 'Dairy'
Division          0.007     'Highland' higher than others, SW may be low. No interaction with Housed.
Sam_Month         0.31      No interaction, monthly variability explained by differential housing in different months.
Sample            -         No variability in explanatory variable
Sam_Year          0.23      No obvious pattern
Season            0.32      No obvious pattern: seasonal variability explained by differential housing.
SeasList          0.40      No obvious pattern: seasonal variability explained by differential housing.
Sampler           0.42      'Fiona' has a different effect to 'Helen' in housed and unhoused farms. No obvious effect.
N_F_Cattle        0.009     Higher numbers of finishing cattle associated with lower prevalence, probably better analysed as a factor, below. No interaction with Housed.
FCattle           0.032     The larger the group of cattle, the lower the prevalence. No interaction with Housed.
N_Groups          0.016     Probably better analysed as a factor, below: More groups associated with lower prevalence.
GroupsCat         0.41      No consistent pattern
N_Sam_Gr          0.20      More housed animals in sampling groups associated with lower prevalences, more unhoused associated with higher prevalences.
Min_Age           0.31      Higher minimum age associated with lower prevalence in unhoused farms, opposite on housed.
Max_Age           0.40      Higher maximum age associated with lower prevalence in unhoused farms, opposite on housed.
Source            0.09      'Buy in' does different things in housed and unhoused farms. In unhoused, gives lower prevalences, in housed, gives higher.
NewSource         0.08      'Open' lower than 'Closed' in unhoused groups, vice versa in housed.
Breed             0.67      No consistent pattern.
Housing           0.73      Housing was confounded with Housed. Deal with this, and there is nothing left. 'Slats' and 'Other' are higher than 'Court' but nothing significant.
NoChange          0.60      '1' higher than '0' (not sure of interpretation)
TDHouse           0.36      Longer time associated with higher prevalences
Rec_Move          0.004     Housed animals which have recently moved show significantly lower shedding levels.
RecMove2          0.16      In unhoused groups, most recent move class 1 (<1 week) is lower than classes 2 and 3 (>1 week)
SupFeed           0.49      SupFeed was confounded with Housed. Having removed this, animals with supplementary feed have lower prevalences than those without.
RecDFeed          0.024     Housed animals which have had a recent change in feed show significantly lower shedding levels.
Forage            0.55      Forage was confounded with Housed. Now no consistent pattern.
Silage            0.51      Silage was confounded with Housed. Now no consistent pattern.
Concentrate       0.67      Concentrate was confounded with Housed. Now no consistent pattern.
Sil_Home          0.04      'Yes' is lower than 'No'. 'Null response' lower than 'No'. No interaction with Housed.
Sil_Manure        0.047     'Yes' is lower than 'No'. 'Null response' higher than 'No'. No interaction with Housed.
Sil_Slurry        0.027     'Yes' is lower than 'No'. 'Null response' higher than 'No'. No interaction with Housed.
Sil_Sewage        0.23      'Yes' is lower than 'No'. 'Null response' higher than 'No'
Sil_Geece         0.34      No consistent pattern.
Sil_Gulls         0.19      No consistent pattern.
Hay               0.56      'Yes' is higher than 'No' in unhoused, vice versa in housed.
Hay_Manure        0.52      'Yes' is lower than 'No' in unhoused animals.
Hay_Slurry        0.60      'Yes' is lower than 'No' in unhoused animals.
Hay_Sewage        -         No data points in class with Sewage on hay fields.
Hay_Geese         -         No data points in class with Geese on hay fields.
Hay_Gulls         0.42      Gulls present associated with lower prevalence in unhoused animals.
Grass_Manure      0.59      Grass_Manure confounded with Housed. Otherwise 'Yes' is lower than 'No', but not significant.
Grass_Slurry      0.39      Grass_Slurry confounded with Housed. Otherwise 'Yes' is lower than 'No', but not significant.
Grass_Sewage      -         Grass_Sewage completely aliased with Housed.
Grass_Geese       0.49      Grass_Geese confounded with Housed. Otherwise 'Yes' is lower than 'No'
Grass_Gulls       0.99      Grass_Gulls confounded with Housed. Otherwise 'Yes' is lower than 'No'
N_Cattle          0.012     More cattle associated with lower prevalence in housed groups.
Cattle            0.18      No clear pattern: some evidence of lower prevalences in larger housed groups.
N_Sheep           0.10      Large numbers of sheep are protective, but better analysed using a factor, below. No interaction with Housed.
Sheep             0.10      (Sheep absent or present) 'With' is lower than 'Without'
N_Goats           0.49      Different effects in housed and unhoused.
Goats             0.58      Different effects in housed and unhoused.
N_Horses          0.995     More horses associated with lower prevalence in unhoused groups.
N_Pigs            0.034     More pigs associated with lower prevalence. No interaction with Housed.
Pigs              0.38      (Pigs absent or present) 'With' is lower than 'Without' in unhoused groups, vice versa for housed.
N_Chickens        0.18      More chickens associated with higher prevalence in unhoused groups, vice versa in housed.
Chickens          0.90      (Chickens absent or present) 'With' is higher than 'Without' in unhoused farms, vice versa for housed.
N_Deer            0.036     More deer associated with higher prevalence. Potentially highly affected by one point's leverage.
Deer              0.036     (Deer absent or present) 'With' is higher than 'Without'. Potentially highly affected by one point's leverage.
Water             0.28      Effects explained by Housed variable. Mains water associated with housed.
Mains             0.79      Unhoused animals with mains water had higher prevalences, housed animals had lower.
Natural           0.06      Unhoused animals with natural water had lower prevalences.
Private           0.27      Unhoused animals with private water had higher prevalences, housed animals had lower.
WaterCon          0.24      'With' is higher than 'Without'
WaterCT           1.00      No clear pattern
Want2Know         0.39      Those that wanted to know had higher prevalences than those who did not
Visit2            0.19      Those willing to have a 2nd visit had a lower prevalence than those who were not
LabOperator       0.45      'H' and 'S' generated lower prevalences than 'D' for unhoused farms, higher for housed.
BeefonDairy       0.59      This class of farm exhibits a higher prevalence in housed groups, lower in unhoused.

The Deer variables are driven by the presence of one farm in the study with a high prevalence, which was the only farm with a high number of deer, and indeed was one of only two farms with any deer at all. This record therefore has enormous leverage, and the resulting model is of dubious use. This variable should therefore be ignored.
The variables which are of interest are therefore Housed, N_FCattle/FCattle/NGroups/NCattle, Source, Housed*Rec_Move/RecDFeed, Sil_Home/Sil_Manure/Sil_Slurry and N_Pigs. Note that the variables have been grouped, where appropriate, into equivalence classes of what are likely to be highly correlated factors.

Exploring the N_FCattle/FCattle/NGroups/NCattle complex, which all associate lower prevalences with larger numbers of cattle, using forward stepwise selection with the Akaike information criterion to select candidates for inclusion/exclusion, we find that FCattle is the most informative measure, with NGroups the second most informative, but lacking statistical significance.

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
RSEARCH [PRINT=model,results; METHOD=fstepwise; CONSTANT=estimate; FACTORIAL=3; DENOMINATOR=ss;\
  INRATIO=1; OUTRATIO=1; MAXCYCLE=50; AFACTORIAL=2; CRITERION=aic; EXTRA=cp; NTERMS=60;\
  NBESTMODELS=8; FORCED=Housed] N_F_Catt+FCattle+N_Groups+N_Cattle

***** Model Selection *****

Response variate: VTPos
 Binomial totals: N_Sam
    Distribution: Binomial
   Link function: Logit
 Number of units: 207
    Forced terms: Constant + Housed
       Forced df: 2
      Free terms: N_F_Catt + FCattle + N_Groups + N_Cattle

*** Stepwise (forward) analysis of deviance ***

                                  mean   deviance   approx
Change        d.f.   deviance   deviance    ratio    F pr.
+ Housed         1    160.526    160.526    24.82    <.001
+ FCattle        3     58.365     19.455     3.01    0.031
+ N_Groups       1      9.011      9.011     1.39    0.239
Residual       201   1300.105      6.468
Total          206   1528.006      7.418

Final model: Constant + Housed + FCattle + N_Groups

Exploring the Housed*Rec_Move/RecDFeed complex, we see that Housed*Rec_Move is the more informative variable.
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
RSEARCH [PRINT=model,results; METHOD=fstepwise; CONSTANT=estimate; FACTORIAL=3; DENOMINATOR=ss;\
  INRATIO=1; OUTRATIO=1; MAXCYCLE=50; AFACTORIAL=2; CRITERION=aic; EXTRA=cp; NTERMS=60;\
  NBESTMODELS=8; FORCED=Housed] Housed.(Rec_Move+RecDFeed)

***** Model Selection *****

Response variate: VTPos
 Binomial totals: N_Sam
    Distribution: Binomial
   Link function: Logit
 Number of units: 207
    Forced terms: Constant + Housed
       Forced df: 2
      Free terms: Housed.Rec_Move + Housed.RecDFeed

*** Stepwise (forward) analysis of deviance ***

                                         mean   deviance   approx
Change               d.f.   deviance   deviance    ratio    F pr.
+ Housed                1    160.526    160.526    25.16    <.001
+ Housed.Rec_Move       2     72.370     36.185     5.67    0.004
Residual              203   1295.110      6.380
Total                 206   1528.006      7.418

Final model: Constant + Housed + Housed.Rec_Move

The non-inclusion of RecDFeed can be explained by a confounding between this factor and Rec_Move. Considering the farms with shedding present, these divide into 4 categories depending on the status of the two factors:

Number of observations
              RecDFeed
              0       1
Rec_Move 0    137     14
         1    20      36

Mean shedding fraction
              RecDFeed
              0       1
Rec_Move 0    0.41    0.29
         1    0.26    0.24

However, the behaviour is heavily dependent on the housing status of the farm. Tabulating the number of observations, the mean shedding fraction and the standard error of these statistics gives the following:

Number of observations
              Housed=0, RecDFeed     Housed=1, RecDFeed
              0       1              0       1
Rec_Move 0    38      6              99      8
         1    15      22             5       14

Mean shedding fraction
              Housed=0, RecDFeed     Housed=1, RecDFeed
              0       1              0       1
Rec_Move 0    0.22    0.14           0.48    0.40
         1    0.26    0.26           0.26    0.20

Standard error
              Housed=0, RecDFeed     Housed=1, RecDFeed
              0       1              0       1
Rec_Move 0    0.032   0.049          0.032   0.127
         1    0.072   0.051          0.093   0.041

The impression which might be given by a simple examination of the means would be that the higher prevalences are restricted only to housed animals which have not been subject to a recent move.
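The overall table and the tables split by housing status can be checked against each other. A minimal sketch, assuming (this is not stated in the source) that each tabulated mean is a simple average over farms, so that the overall mean in a cell is the count-weighted average of the housed and unhoused means; the dictionary layout and names are illustrative:

```python
# Keys are (Rec_Move, RecDFeed); values copied from the tables above.
counts = {
    0: {(0, 0): 38, (0, 1): 6, (1, 0): 15, (1, 1): 22},   # Housed=0
    1: {(0, 0): 99, (0, 1): 8, (1, 0): 5, (1, 1): 14},    # Housed=1
}
means = {
    0: {(0, 0): 0.22, (0, 1): 0.14, (1, 0): 0.26, (1, 1): 0.26},
    1: {(0, 0): 0.48, (0, 1): 0.40, (1, 0): 0.26, (1, 1): 0.20},
}

pooled_counts = {}
pooled_means = {}
for cell in counts[0]:
    n_total = counts[0][cell] + counts[1][cell]
    weighted = sum(counts[h][cell] * means[h][cell] for h in (0, 1))
    pooled_counts[cell] = n_total
    pooled_means[cell] = weighted / n_total

for cell in sorted(pooled_counts):
    print(cell, pooled_counts[cell], round(pooled_means[cell], 2))
```

Under that assumption the pooled values agree with the overall table to two decimal places, which shows how the overall contrast between cells is largely a reflection of how the housed and unhoused farms are distributed across them.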
However, care should be taken given the extremely small numbers of groups which have been subjected to a change in diet without a recent move. The difference between the mean of this group and the means in the low prevalence group is unlikely to be statistically significant. Clearly a positive entry for either RecDFeed or Rec_Move is associated with a lower shedding rate, although there is no sign of an interaction: the data set defining the most interesting aspects of the relationship is extremely sparse. For ease of analysis we therefore define a new variable RecChnge, which defines whether either change has taken place. The resulting interaction with Housed is highly significant (p=0.009). The effect of this factor could be centred on the effect of a change of location or of a change of diet: the dataset does not allow any further detail to be established.

Analysing the complex of significant silage-related factors is complicated by the questionnaire structure. Many of the questions were only asked if the responses to a previous question took particular values. Hence, simple-minded fitting of multivariate models will fail due to multiple aliasing of terms in the model. The data structure can be summarised as follows:

Stratum     Responses (Housed=0)       Responses (Housed=1)       Comments
Housed      0                          1                          Housed or unhoused
Silage      0     1     999            0     1     999            0=no silage fed; 1=silage fed; 999=question not asked
            Few   Few   Many           Many  Many  Few
Sil_Home    999   0 1   999            999   0 1   999            0=silage fed and not produced; 1=silage fed and produced on-farm; 999=no silage fed or question not asked
Others      999   999 0 1   999        999   999 0 1   999        0=silage produced, factor not present; 1=silage produced, factor present; 999=no silage produced on farm or question not asked

Aliasing will obviously be a problem, and it should be noted that non-trivial responses to the later questions are more heavily drawn from the housed population. This may affect the analysis. Housed has previously been shown to be a highly significant variable.
Silage is not significant, either as a main effect or in interaction with Housed. Fitting Sil_Home in interaction with Housed gives the following results:

* MESSAGE: Term Housed.Sil_Home cannot be fully included in the model
  because 1 parameter is aliased with terms already in the model
  (Housed 1 .Sil_Home 999) = - 1.000 + (Housed 1) + (Sil_Home 1) + (Sil_Home 999) - (Housed 1 .Sil_Home 1)

***** Regression Analysis *****

Response variate: VTPos
 Binomial totals: N_Sam
    Distribution: Binomial
   Link function: Logit
    Fitted terms: Constant + Housed + Sil_Home + Housed.Sil_Home

*** Summary of analysis ***

                                 mean   deviance   approx
             d.f.   deviance   deviance    ratio    F pr.
Regression      4      205.     51.186      7.81    <.001
Residual      202     1323.      6.551
Total         206     1528.      7.418

Dispersion parameter is estimated to be 6.55 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
  intermediate responses are more variable than small or large responses
* MESSAGE: The following units have high leverage:
  Unit   Response   Leverage
    28      10.00      0.161
   202       1.00      0.121
   277       1.00      0.177
   326      18.00      0.520
   504       1.00      0.097
   703       1.00      0.209
   846       1.00      0.113
   877      15.00      0.473
   885       1.00      0.113

*** Estimates of parameters ***

                                                            antilog of
                          estimate    s.e.   t(202)   t pr.   estimate
Constant                     0.182   0.996     0.18   0.855      1.200
Housed 1                     1.117   0.269     4.16   <.001      3.056
Sil_Home 1                   -2.08    1.21    -1.72   0.086     0.1246
Sil_Home 999                -1.375   0.983    -1.40   0.163     0.2528
Housed 1 .Sil_Home 1         0.347   0.746     0.47   0.642      1.416
Housed 1 .Sil_Home 999           0       *        *       *      1.000
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
  Factor     Reference level
  Housed     0
  Sil_Home   0

Trivial (999) answers from housed animals are not fitted in the model because they are aliased with a previously-fitted term. However, we are not interested in this group. Dropping the interaction term does not significantly increase the deviance (p=0.63); however, dropping the main Sil_Home effect significantly increases the deviance (p=0.04).
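The aliasing message above can be read as a linear identity among the indicator (dummy) columns of the design matrix. A minimal sketch that checks the identity cell by cell; the conclusion that one cell must be empty among the 207 positive farms is an inference from the message, not something stated in the source, and the function name is illustrative:

```python
from itertools import product


def alias_residual(housed, sil_home):
    """Left side minus right side of the identity reported in the message."""
    lhs = int(housed == 1 and sil_home == 999)
    rhs = (-1
           + int(housed == 1)
           + int(sil_home == 1)
           + int(sil_home == 999)
           - int(housed == 1 and sil_home == 1))
    return lhs - rhs


# Cells of the Housed x Sil_Home classification where the identity fails.
violations = [(h, s) for h, s in product((0, 1), (0, 1, 999))
              if alias_residual(h, s) != 0]
print(violations)  # [(0, 0)]
```

The identity holds in every cell except Housed=0, Sil_Home=0, so the columns are exactly collinear only if no fitted observation falls in that cell. This is consistent with the sparse silage records for unhoused groups, and it explains why one interaction parameter cannot be estimated.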
We therefore consider the model containing both the Housed and Sil_Home main effects:

***** Regression Analysis *****

Response variate: VTPos
 Binomial totals: N_Sam
    Distribution: Binomial
   Link function: Logit
    Fitted terms: Constant + Housed + Sil_Home

*** Summary of analysis ***

                                 mean   deviance   approx
             d.f.   deviance   deviance    ratio    F pr.
Regression      3      203.     67.749     10.38    <.001
Residual      203     1325.      6.526
Total         206     1528.      7.418

Dispersion parameter is estimated to be 6.53 from the residual deviance
* MESSAGE: The error variance does not appear to be constant:
  intermediate responses are more variable than small or large responses
* MESSAGE: The following units have high leverage:
  Unit   Response   Leverage
   326      18.00      0.520
   877      15.00      0.473

*** Estimates of parameters ***

                                                  antilog of
                estimate    s.e.   t(203)   t pr.   estimate
Constant           0.133   0.989     0.13   0.893      1.143
Housed 1           1.166   0.247     4.72   <.001      3.208
Sil_Home 1        -1.747   0.967    -1.81   0.072     0.1742
Sil_Home 999      -1.345   0.979    -1.37   0.171     0.2606
* MESSAGE: s.e.s are based on the residual deviance

Parameters for factors are differences compared with the reference level:
  Factor     Reference level
  Housed     0
  Sil_Home   0

Clearly, housed animals still present a higher prevalence, but this model indicates that animals in the level 1 class of the Sil_Home factor have lower prevalences than those in the level 0 class. The level 999 class is not significantly different to either of the other two classes, but this is not surprising, given the heterogeneous nature of this level: it mostly reflects unhoused farms, where the silage question was not asked. Hence, among housed animals where the farm produces silage, the mean prevalence appears to be lower.

There are, of course, further factors nested within the silage production factor. The GLM model is not a good choice for the analysis of such unbalanced data, and it is also possible to define a more informative data structure.
The silage feeding factor is not nested within the housing factor, but it should have been: only a few farms with unhoused animals have records relating to silage production, even if they did produce silage. Such small numbers of values, generated randomly by accident (biased towards early samples collected by a relatively inexperienced operator), are worthless. Hence a new factor is defined: Silage2, defining farms with housed animals which do feed them silage. We continue this process, defining new dummy variables: SHome2, defining farms with housed animals, feeding silage, which do produce silage; SMan2, defining farms with housed animals, feeding and producing silage, which spread manure on the silage fields; SSlur2, SSew2, SGeec2 and SGull2 are defined in a similar fashion. These variables will be fitted along with Housed in a GLMM to explore the inter-relations between the different factors. Fitting the Housed, Silage Feeding and Silage Production factors gives the following output:

GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
  LINK=logit; DISPERSION=1; FIXED=Housed+Silage2+SHome2; RANDOM=Farm; CONSTANT=estimate;\
  FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean] VTPos; NBINOMIAL=N_Sam

***** Generalised Linear Mixed Model Analysis *****

Method:           cf Schall (1991) Biometrika
Response variate: VTPos
Distribution:     BINOMIAL
Link function:    LOGIT
Random model:     Farm
Fixed model:      Constant + (Housed + Silage2) + SHome2

* Dispersion parameter fixed at value 1.000

*** Monitoring information ***

Iteration   Gammas   Dispersion   Max change
1           1.347    1.000        2.0404E+00
2           1.734    1.000        3.8636E-01
3           1.903    1.000        1.6972E-01
4           1.927    1.000        2.3823E-02
5           1.929    1.000        1.6145E-03
6           1.929    1.000        2.0148E-04
7           1.929    1.000        2.4608E-05

*** Estimated Variance Components ***

Random term   Component   S.e.
Farm          1.929       0.235

*** Residual variance model ***

Term       Factor   Model(order)   Parameter   Estimate   S.e.
Dispersn            Identity       Sigma2      1.000      FIXED

*** Estimated Variance matrix for Variance Components ***

Farm       1   0.05510
Dispersn   2   0.00000   0.00000
                     1         2

*** Table of effects for Constant ***
  -1.471   Standard error: 0.1727

*** Table of effects for Housed ***
  Housed    0.0000   1.0000
            0.0000   1.4064
  Standard error of differences: 0.3088

*** Table of effects for Silage2 ***
  Silage2   0.0000   1.0000
            0.0000   1.3649
  Standard error of differences: 1.083

*** Table of effects for SHome2 ***
  SHome2    0.0000   1.0000
            0.0000  -1.7519
  Standard error of differences: 1.065

*** Tables of means ***

*** Table of predicted means for Housed ***
  Housed    0.0000   1.0000
           -1.6650  -0.2586

*** Table of predicted means for Silage2 ***
  Silage2   0.0000   1.0000
           -1.6442  -0.2794

*** Table of predicted means for SHome2 ***
  SHome2    0.0000   1.0000
           -0.0859  -1.8377

*** Back-transformed Means (on the original scale) ***

  Housed    0.0000   0.1591
            1.0000   0.4357
  Silage2   0.0000   0.1619
            1.0000   0.4306
  SHome2    0.0000   0.4785
            1.0000   0.1373

Note: means are probabilities not expected values.

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term   Wald statistic   d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model
Housed       27.72            1      27.72       <0.001
Silage2       1.31            1       1.31        0.253
SHome2        2.71            1       2.71        0.100

* Dropping individual terms from full fixed model
Housed       20.74            1      20.74       <0.001
Silage2       1.59            1       1.59        0.208
SHome2        2.71            1       2.71        0.100

* Message: chi-square distribution for Wald tests is an asymptotic approximation
  (i.e. for large samples) and underestimates the probabilities in other cases.

Inevitably, Housed is highly significant, while Silage Feeding explains virtually none of the variability. Silage production, however, has borderline significance in explaining some of the variability seen in the data. Fitting the production variables in turn gives the following p-values from the Wald statistic (when all other factors have also been fitted).
Factor    p-value
Manure    0.11
Sewage    0.91
Slurry    0.06
Geese     0.90
Gulls     0.61

Clearly, Gulls, Geese and Sewage have no significant effect. However, the spreading of manure and the spreading of slurry both appear worth further examination. When they are both fitted in the same model, the spreading of manure lacks significance, with a p-value of 0.135, while the spreading of slurry is still within the range of interest (p=0.08). Fitting the model with only slurry spreading gives rise to the following Wald statistics:

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term    Wald statistic   d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model
Housed                 27.94      1       27.94        <0.001
Silage2                 1.31      1        1.31         0.252
SHome2                  2.73      1        2.73         0.098
SSlur2                  3.40      1        3.40         0.065

* Dropping individual terms from full fixed model
Housed                 20.91      1       20.91        <0.001
Silage2                 1.61      1        1.61         0.205
SHome2                  1.91      1        1.91         0.167
SSlur2                  3.40      1        3.40         0.065

* Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases.

We note that Silage2 (feeding) continues to lack any significance, while the presence of the slurry spreading factor (SSlur2) removes any significance from the silage production factor (SHome2). Refitting the model without Silage2 causes only marginal changes.
Refitting the model without SHome2 gives: 6516 GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\ 6517 LINK=logit; DISPERSION=1; FIXED=Housed+SSlur2; RANDOM=Farm; CONSTANT=estimate; FACT=9;\ 6518 PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean] VTPos; NBINOMIAL=N_Sam ***** Generalised Linear Mixed Model Analysis ***** Method: cf Schall (1991) Biometrika Response variate: VTPos Distribution: BINOMIAL Link function: LOGIT Random model: Farm Fixed model: Constant + Housed + SSlur2 * Dispersion parameter fixed at value 1.000 *** Monitoring information *** Iteration Gammas Dispersion Max change 1 1.340 1.000 1.9846E+00 2 1.713 1.000 3.7332E-01 3 1.882 1.000 1.6920E-01 4 1.906 1.000 2.4158E-02 5 1.908 1.000 1.6650E-03 6 1.908 1.000 2.0323E-04 7 1.908 1.000 2.4199E-05 *** Estimated Variance Components *** Random term Component S.e. Farm 1.908 0.232 *** Residual variance model *** 38 Term Factor Model(order) Parameter Estimate S.e. Dispersn Identity Sigma2 1.000 FIXED *** Estimated Variance matrix for Variance Components *** Farm 1 0.05384 Dispersn 2 0.00000 0.00000 1 2 *** Table of effects for Constant *** -1.471 Standard error: 0.1719 *** Table of effects for Housed *** Housed 0.0000 1.0000 0.0000 1.3767 Standard error of differences: 0.2380 *** Table of effects for SSlur2 *** SSlur2 0.0000 1.0000 0.0000 -0.6851 Standard error of differences: 0.2917 *** Tables of means *** *** Table of predicted means for Housed *** Housed 0.0000 1.0000 -1.813 -0.437 *** Table of predicted means for SSlur2 *** SSlur2 0.0000 1.0000 -0.782 -1.467 *** Back-transformed Means (on the original scale) *** Housed 0.0000 0.1403 1.0000 0.3926 SSlur2 0.0000 0.3138 1.0000 0.1873 Note: means are probabilities not expected values. 6519 VDISPLAY [PRINT=Wald] *** Wald tests for fixed effects *** 39 Fixed term Wald statistic d.f. Wald/d.f. 
Chi-sq prob

* Sequentially adding terms to fixed model
Housed                 27.94      1       27.94        <0.001
SSlur2                  5.52      1        5.52         0.019

* Dropping individual terms from full fixed model
Housed                 33.45      1       33.45        <0.001
SSlur2                  5.52      1        5.52         0.019

* Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases.

The spreading of slurry on silage fields on farms where the animals are housed is associated with statistically significantly lower (p=0.02) shedding levels. The other factors are explained either by their association with housing or with slurry spreading.

Only one farm is recorded as having both housed animals and a natural water supply. Hence, any effect of natural water supply can be estimated only for unhoused animals. Refitting the model only to unhoused animals, we find that the effect remains statistically significant (p=0.03). The factor is therefore redefined to identify farms with unhoused animals that have access to a natural water supply (Natural2).

Hence, the factors which appear to be particularly likely to be relevant in the multi-factor model are Housed, FCattle, Housed*Source, Housed*RecChnge, SSlur2, N_Pigs and Natural2.
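For a term with a single degree of freedom, the Wald statistic reported by VDISPLAY can be checked by hand as the squared ratio of the effect to its standard error of differences, with the p-value taken from the chi-square distribution on 1 d.f. The Python sketch below is an illustrative check rather than part of the original analysis; the function name `wald_1df` is an assumption, and the inputs are the SSlur2 effect (-0.6851) and standard error (0.2917) printed for the Housed + SSlur2 model.

```python
import math


def wald_1df(effect, se):
    """Wald chi-square statistic and p-value for a single-parameter term."""
    z = effect / se
    wald = z * z
    # For 1 d.f. the chi-square upper tail equals the two-sided normal tail:
    # P(chi2_1 > w) = erfc(sqrt(w / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value


# SSlur2 effect and standard error of differences from the Housed + SSlur2 fit.
wald, p = wald_1df(-0.6851, 0.2917)
print(f"Wald = {wald:.2f}, p = {p:.3f}")
```

The result should agree with the printed Wald statistic and probability for SSlur2 up to rounding. As the Genstat message warns, the chi-square reference distribution is itself a large-sample approximation, so such p-values are best read as somewhat optimistic.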
Forcing the model to contain Housed, we use stepwise regression to evaluate which of these factors should be included in a multi-factor model:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam
RSEARCH [PRINT=model,results; METHOD=fstepwise; FORCED=Housed; CONSTANT=estimate; FACTORIAL=3; DENOMINATOR=ss;\
  INRATIO=1; OUTRATIO=1; MAXCYCLE=50; AFACTORIAL=2; CRITERION=aic; EXTRA=cp; NTERMS=60;\
  NBESTMODELS=8] FCattle + Housed.Source + Housed.RecChnge +SSlur2 + N_Pigs + Natural2

***** Model Selection *****

Response variate: VTPos
Binomial totals: N_Sam
Distribution: Binomial
Link function: Logit
Number of units: 207
Forced terms: Constant + Housed
Forced df: 2
Free terms: FCattle + Housed.Source + Housed.RecChnge + SSlur2 + N_Pigs + Natural2

*** Stepwise (forward) analysis of deviance ***

                                                mean   deviance   approx
Change               d.f.     deviance      deviance      ratio    F pr.
+ Housed                1      160.526       160.526      26.69    <.001
+ Housed.RecChnge       2       61.752        30.876       5.13    0.007
+ Housed.Source         4       57.622        14.405       2.40    0.052
+ Natural2              1       23.184        23.184       3.85    0.051
+ FCattle               3       34.351        11.450       1.90    0.130
+ SSlur2                1       22.338        22.338       3.71    0.055
+ N_Pigs                1        7.532         7.532       1.25    0.264
Residual              193     1160.702         6.014
Total                 206     1528.006         7.418

Final model: Constant + Housed + Housed.RecChnge + Housed.Source + Natural2 + FCattle + SSlur2 + N_Pigs

All of the factors are statistically significant, with p-values less than or near 0.05, except for N_Pigs, which has ceased to show any appreciable evidence of fit, and FCattle, which now has a significance level of 0.13. Dropping N_Pigs from the full model above produces a small change in deviance (p=0.26) by an F-test. We therefore conclude that the univariate significance of the N_Pigs variable is caused by some aspect of the data better explained by one of the other factors. Dropping FCattle from the (new) full model produces a larger change in deviance (p=0.11) by an F-test. It is decided to retain FCattle for the moment.
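Because the dispersion parameter is estimated (DISPERSION=*), each term in the stepwise table is judged by a deviance ratio: the term's mean deviance divided by the residual mean deviance, which is also the estimated dispersion. The short Python sketch below recomputes that ratio from the printed table as a hedged illustration; the names `terms` and `deviance_ratio` are assumptions. The approximate F probabilities themselves are not recomputed, since that would need an F-distribution routine.

```python
# Residual line of the stepwise analysis of deviance above.
residual_deviance = 1160.702
residual_df = 193

# (change in d.f., change in deviance) for a few of the added terms.
terms = {
    "Housed": (1, 160.526),
    "Housed.RecChnge": (2, 61.752),
    "Natural2": (1, 23.184),
    "N_Pigs": (1, 7.532),
}


def deviance_ratio(df, deviance, resid_dev, resid_df):
    """Mean deviance of a term divided by the residual mean deviance."""
    dispersion = resid_dev / resid_df
    return (deviance / df) / dispersion


dispersion = residual_deviance / residual_df
print(f"estimated dispersion = {dispersion:.3f}")
for name, (df, dev) in terms.items():
    ratio = deviance_ratio(df, dev, residual_deviance, residual_df)
    print(f"{name:<16} deviance ratio = {ratio:.2f}")
```

A dispersion of about 6 rather than 1 is the overdispersion that motivates refitting the selected model as a GLMM with Farm as a random effect.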
Fitting the remaining factors in a multi-factor model, we generate the following output: 6600 "Modelling of binomial proportions. (e.g. by logits)." 6601 MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam 6602 TERMS [FACT=9] Housed + Housed.RecChnge + Housed.Source + Natural2 + FCattle + SSlur2 6603 FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\ 6604 Housed + Housed.RecChnge + Housed.Source + Natural2 + FCattle + SSlur2 ***** Regression Analysis ***** Response variate: VTPos Binomial totals: N_Sam Distribution: Binomial Link function: Logit Fitted terms: Constant + Housed + Housed.RecChnge + Housed.Source + Natural2 + FCattle + SSlur2 *** Summary of analysis *** mean deviance approx d.f. deviance deviance ratio F pr. Regression 12 360. 29.981 4.98 <.001 Residual 194 1168. 6.022 Total 206 1528. 7.418 Dispersion parameter is estimated to be 6.02 from the residual deviance * MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses *** Estimates of parameters *** antilog of estimate s.e. t(194) t pr. 
estimate Constant -0.682 0.304 -2.25 0.026 0.5058 Housed 1 0.961 0.348 2.76 0.006 2.616 Housed 0 .RecChnge 1 -0.179 0.322 -0.56 0.579 0.8362 Housed 1 .RecChnge 1 -0.780 0.302 -2.59 0.010 0.4584 Housed 0 .Source Buy -0.883 0.473 -1.87 0.064 0.4134 Housed 0 .Source Both -0.392 0.446 -0.88 0.380 0.6756 Housed 1 .Source Buy -0.178 0.268 -0.66 0.507 0.8371 Housed 1 .Source Both -0.479 0.311 -1.54 0.126 0.6196 Natural2 1 -0.661 0.349 -1.89 0.060 0.5164 FCattle 2 0.152 0.231 0.66 0.512 1.164 FCattle 3 -0.364 0.268 -1.36 0.176 0.6950 FCattle 4 -0.455 0.327 -1.39 0.165 0.6344 SSlur2 1 -0.493 0.257 -1.92 0.057 0.6106 * MESSAGE: s.e.s are based on the residual deviance Parameters for factors are differences compared with the reference level: Factor Reference level Housed 0 Natural2 0 41 FCattle 1 SSlur2 0 Again using stepwise regression to explore the properties of the data, we force the above factors to be included in the model, and explore whether any other factors now should be included in the model (excluding time and geographical variables which will be considered later): 6605 MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=*] VTPos; NBINOMIAL=N_Sam 6606 RSEARCH [PRINT=model,results; METHOD=fstepwise; FORCED=Housed + Housed.RecChnge\ 6607 + Housed.Source + Natural2 + FCattle + SSlur2; CONSTANT=estimate; FACTORIAL=3; DENOMINATOR=ss;\ 6608 INRATIO=1; OUTRATIO=1; MAXCYCLE=50; AFACTORIAL=2; CRITERION=aic; EXTRA=cp; NTERMS=60;\ 6609 NBESTMODELS=8] BeefOnDairy + Breed + Cattle + Chicks +Forage + Goats \ 6610 + Gra_Geec + Gra_Gull + Gra_Manu + Gra_Slur + Hay + Hay_Manu + Lab_Op + Manage_C +\ 6611 Manage_O + Max_Age + Min_Age + N_Goats + N_Horses + N_Pigs + N_Sheep + NoChange + \ 6612 Pigs + Sampler + Sheep + T_DHouse + Visit2 + Want2Kno + Mains+Private+Water_Con + WaterCT ***** Model Selection ***** Response variate: VTPos Binomial totals: N_Sam Distribution: Binomial Link function: Logit Number of units: 199 Forced terms: Constant + Housed + Housed.RecChnge + 
Housed.Source + Natural2 + FCattle + SSlur2
Forced df: 13
Free terms: BeefOnDairy + Breed + Cattle + Chicks + Forage + Goats + Gra_Geec + Gra_Gull + Gra_Manu + Gra_Slur + Hay + Hay_Manu + Lab_Op + Manage_C + Manage_O + Max_Age + Min_Age + N_Goats + N_Horses + N_Pigs + N_Sheep + NoChange + Pigs + Sampler + Sheep + T_DHouse + Visit2 + Want2Kno + Mains + Private + Water_Con + WaterCT

*** Stepwise (forward) analysis of deviance ***

                                                mean   deviance   approx
Change               d.f.     deviance      deviance      ratio    F pr.
+ Housed + Housed.RecChnge + Housed.Source + Natural2 + FCattle + SSlur2
                       12      321.379        26.782       4.72    <.001
+ Sheep                 1       24.244        24.244       4.27    0.040
+ Visit2                1       14.035        14.035       2.47    0.118
+ Breed                 5       39.495         7.899       1.39    0.229
+ Chicks                1       13.171        13.171       2.32    0.129
+ Water_Con             1       14.980        14.980       2.64    0.106
+ Forage                2       15.461         7.731       1.36    0.259
+ NoChange              1        6.347         6.347       1.12    0.292
Residual              174      987.200         5.674
Total                 198     1436.312         7.254

Final model: Constant + Housed + Housed.RecChnge + Housed.Source + Natural2 + FCattle + SSlur2 + Sheep + Visit2 + Breed + Chicks + Water_Con + Forage + NoChange

The threshold for inclusion is set deliberately low, so many of these terms will lack statistical significance. We examine their suitability for inclusion in the model by implementing a backwards stepwise procedure.

1/ NoChange is not statistically significant when dropped (p=0.38). NoChange is dropped.
2/ Forage is not statistically significant when dropped (p=0.37). Forage is dropped.
3/ Breed is not statistically significant when dropped (p=0.42). Breed is dropped.
4/ Chicks is not statistically significant when dropped (p=0.23). Chicks is dropped.
5/ Visit2 is not statistically significant when dropped (p=0.14). Visit2 is dropped.
6/ Water_Con is not statistically significant when dropped (p=0.23). Water_Con is dropped.

When FCattle is experimentally dropped from the model, it registers a significance of 0.09. It is therefore retained, as is Sheep.
Hence we conclude that the multivariate model to be carried forward to the GLMM process is Housed + FCattle + Housed.Source + Housed.RecChnge + SSlur2 + Natural2+Sheep Fitting this model in the Generalised Linear Mixed Model context gives the following output (neither county or veterinary practice are found to be significant random effects): 6629 GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\ 6630 LINK=logit; DISPERSION=1; FIXED=Housed + FCattle + Housed.Source + Housed.RecChnge + SSlur2 + Natural2+Sheep;\ 6631 RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean]\ 6632 VTPos; NBINOMIAL=N_Sam ***** Generalised Linear Mixed Model Analysis ***** Method: cf Schall (1991) Biometrika Response variate: VTPos Distribution: BINOMIAL Link function: LOGIT Random model: Farm Fixed model: Constant + (((((Housed + FCattle) + (Housed . Source)) + (Housed . RecChnge)) + SSlur2) + Natural2) + Sheep * Dispersion parameter fixed at value 1.000 *** Monitoring information *** Iteration Gammas Dispersion Max change 1 1.192 1.000 1.9262E+00 2 1.565 1.000 3.7302E-01 3 1.707 1.000 1.4208E-01 4 1.727 1.000 1.9953E-02 5 1.729 1.000 1.3488E-03 6 1.729 1.000 1.5644E-04 7 1.729 1.000 1.7719E-05 *** Estimated Variance Components *** Random term Component S.e. Farm 1.729 0.221 *** Residual variance model *** Term Factor Model(order) Parameter Estimate S.e. 
Dispersn Identity Sigma2 1.000 FIXED 43 *** Estimated Variance matrix for Variance Components *** Farm 1 0.04876 Dispersn 2 0.00000 0.00000 1 2 *** Table of effects for Constant *** -0.6691 Standard error: 0.36486 *** Table of effects for Housed *** Housed 0.0000 1.0000 0.0000 1.2032 Standard error of differences: 0.3911 *** Table of effects for FCattle *** FCattle 1 2 3 4 0.0000 0.1717 -0.4264 -0.6731 Standard error of differences: Average 0.3330 Maximum 0.3876 Minimum 0.2608 Average variance of differences: 0.1133 *** Table of effects for Housed.Source *** Source Breed Buy Both Housed 0.0000 0.0000 -0.8806 -0.2403 1.0000 0.0000 -0.0607 -0.4802 Standard error of differences: Average 0.4572 Maximum 0.5820 Minimum 0.3133 Average variance of differences: 0.2177 *** Table of effects for Housed.RecChnge *** RecChnge 0.0000 1.0000 Housed 0.0000 0.0000 -0.1825 1.0000 0.0000 -0.9878 Standard error of differences: Average 0.3687 Maximum 0.4842 Minimum 0.3388 Average variance of differences: 0.1393 *** Table of effects for SSlur2 *** SSlur2 0.0000 1.0000 0.0000 -0.4288 Standard error of differences: 0.2977 *** Table of effects for Natural2 *** 44 Natural2 0.0000 1.0000 0.0000 -0.7141 Standard error of differences: 0.3534 *** Table of effects for Sheep *** Sheep 1 2 0.0000 -0.3043 Standard error of differences: 0.2317 *** Tables of means *** *** Table of predicted means for Housed *** Housed 0.0000 1.0000 -2.090 -1.096 *** Table of predicted means for FCattle *** FCattle 1 2 3 4 -1.361 -1.189 -1.787 -2.034 *** Table of predicted means for Housed.Source *** Source Breed Buy Both Housed 0.0000 -1.716 -2.597 -1.956 1.0000 -0.915 -0.976 -1.396 *** Table of predicted means for Housed.RecChnge *** RecChnge 0.0000 1.0000 Housed 0.0000 -1.998 -2.181 1.0000 -0.602 -1.590 *** Table of predicted means for SSlur2 *** SSlur2 0.0000 1.0000 -1.378 -1.807 *** Table of predicted means for Natural2 *** Natural2 0.0000 1.0000 -1.236 -1.950 *** Table of predicted means for Sheep *** Sheep 1 2 
-1.440 -1.745 *** Back-transformed Means (on the original scale) *** Housed 0.0000 0.1101 1.0000 0.2506 FCattle 45 1 0.2041 2 0.2334 3 0.1434 4 0.1157 Housed 0.0000 1.0000 Source Breed 0.1524 0.2859 Buy 0.0694 0.2737 Both 0.1239 0.1985 RecChnge 0.0000 1.0000 Housed 0.0000 0.1194 0.1015 1.0000 0.3539 0.1695 SSlur2 0.0000 0.2013 1.0000 0.1410 Natural2 0.0000 0.2252 1.0000 0.1246 Sheep 1 0.1915 2 0.1487 Note: means are probabilities not expected values. 6633 VDISPLAY [PRINT=Wald] *** Wald tests for fixed effects *** Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob * Sequentially adding terms to fixed model Housed 29.54 1 29.54 <0.001 FCattle 10.11 3 3.37 0.018 Housed.Source 5.57 4 1.39 0.234 Housed.RecChnge 10.75 2 5.38 0.005 SSlur2 2.34 1 2.34 0.126 Natural2 4.13 1 4.13 0.042 Sheep 1.73 1 1.73 0.189 * Dropping individual terms from full fixed model FCattle 7.20 3 2.40 0.066 Housed.Source 5.60 4 1.40 0.231 Housed.RecChnge 8.84 2 4.42 0.012 SSlur2 2.08 1 2.08 0.150 Natural2 4.08 1 4.08 0.043 Sheep 1.73 1 1.73 0.189 * Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases. Remembering that the Wald tests are liberal, these results show no evidence for retaining Sheep and Housed.Source in the model. Refitting the model without these factors gives the following output: 6634 GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\ 46 6635 LINK=logit; DISPERSION=1; FIXED=Housed + FCattle + Housed.RecChnge + SSlur2 + Natural2;\ 6636 RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean]\ 6637 VTPos; NBINOMIAL=N_Sam ***** Generalised Linear Mixed Model Analysis ***** Method: cf Schall (1991) Biometrika Response variate: VTPos Distribution: BINOMIAL Link function: LOGIT Random model: Farm Fixed model: Constant + (((Housed + FCattle) + (Housed . 
RecChnge)) + SSlur2) + Natural2 * Dispersion parameter fixed at value 1.000 *** Monitoring information *** Iteration Gammas Dispersion Max change 1 1.253 1.000 1.8574E+00 2 1.585 1.000 3.3224E-01 3 1.736 1.000 1.5076E-01 4 1.757 1.000 2.1145E-02 5 1.759 1.000 1.4440E-03 6 1.759 1.000 1.6155E-04 7 1.759 1.000 1.7614E-05 *** Estimated Variance Components *** Random term Component S.e. Farm 1.759 0.221 *** Residual variance model *** Term Factor Model(order) Parameter Estimate S.e. Dispersn Identity Sigma2 1.000 FIXED *** Estimated Variance matrix for Variance Components *** Farm 1 0.04875 Dispersn 2 0.00000 0.00000 1 2 *** Table of effects for Constant *** -1.071 Standard error: 0.3036 *** Table of effects for Housed *** Housed 0.0000 1.0000 0.0000 1.3188 Standard error of differences: 0.3318 *** Table of effects for FCattle *** FCattle 1 2 3 4 0.0000 0.1309 -0.5034 -0.7694 Standard error of differences: Average 0.3248 47 Maximum 0.3815 Minimum 0.2595 Average variance of differences: 0.1077 *** Table of effects for Housed.RecChnge *** RecChnge 0.0000 1.0000 Housed 0.0000 0.0000 -0.1043 1.0000 0.0000 -0.8906 Standard error of differences: Average 0.3661 Maximum 0.4804 Minimum 0.3361 Average variance of differences: 0.1373 *** Table of effects for SSlur2 *** SSlur2 0.0000 1.0000 0.0000 -0.5229 Standard error of differences: 0.2901 *** Table of effects for Natural2 *** Natural2 0.0000 1.0000 0.0000 -0.7082 Standard error of differences: 0.3525 *** Tables of means *** *** Table of predicted means for Housed *** Housed 0.0000 1.0000 -2.024 -1.099 *** Table of predicted means for FCattle *** FCattle 1 2 3 4 -1.276 -1.145 -1.779 -2.045 *** Table of predicted means for Housed.RecChnge *** RecChnge 0.0000 1.0000 Housed 0.0000 -1.972 -2.077 1.0000 -0.653 -1.544 *** Table of predicted means for SSlur2 *** SSlur2 0.0000 1.0000 -1.300 -1.823 *** Table of predicted means for Natural2 *** Natural2 0.0000 1.0000 -1.207 -1.916 *** Back-transformed Means (on the original scale) *** 48 
Housed
  0.0000   0.1167
  1.0000   0.2500

FCattle
  1   0.2182
  2   0.2414
  3   0.1444
  4   0.1145

          RecChnge   0.0000   1.0000
Housed
  0.0000             0.1221   0.1114
  1.0000             0.3422   0.1759

SSlur2
  0.0000   0.2141
  1.0000   0.1391

Natural2
  0.0000   0.2301
  1.0000   0.1283

Note: means are probabilities not expected values.

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term         Wald statistic   d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model
Housed                      29.34      1       29.34        <0.001
FCattle                     10.03      3        3.34         0.018
Housed.RecChnge              9.87      2        4.94         0.007
SSlur2                       3.23      1        3.23         0.072
Natural2                     4.04      1        4.04         0.045

* Dropping individual terms from full fixed model
FCattle                      9.21      3        3.07         0.027
Housed.RecChnge              7.14      2        3.57         0.028
SSlur2                       3.25      1        3.25         0.071
Natural2                     4.04      1        4.04         0.045

* Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases.

These results show that farms on which the sampled animals were housed show statistically significantly higher (p<0.001) prevalences than those where the sampled animals were unhoused (Graph in JulyResults.xls[Multivariate Housed]).

[Figure: Plot of prevalences in housed and unhoused animals, with 95% confidence intervals. Vertical axis: Prevalence; horizontal axis: Class of Farm (Unhoused, Housed).]

The estimated prevalences on positive farms by housing status are as follows:

Class       Mean Prevalence
Unhoused    11.7%
Housed      25.0%

The number of finishing cattle on the farm was used to define a categorical factor as follows:

Category Name   Number of Finishing Cattle
1               <50
2               50-100
3               100-200
4               >200

Farms which fell into categories 3 and 4 had statistically significantly lower prevalences than those in categories 1 and 2 (p=0.004).

[Figure: Plot of prevalences in farms by FCattle category, with 95% confidence intervals. Categories: FCattle 1, FCattle 2, FCattle 3, FCattle 4.]
The estimated prevalences on positive farms by number of finishing cattle are as follows:

Category   Mean Prevalence
1          21.8%
2          24.1%
3          14.4%
4          11.5%

The variable defining whether there has been any change in diet or housing in the immediate past is significant when fitted in interaction with Housed.

[Figure: Plot of prevalences in farms by Housed.RecChnge category, with 95% confidence intervals. Categories: Unhoused/No Changes, Unhoused/With Changes, Housed/No Changes, Housed/With Changes.]

The estimated prevalences on positive farms by housing/change status are as follows:

Category                 Mean Prevalence
Unhoused/No Changes      12.2%
Unhoused/With Changes    11.1%
Housed/No Changes        34.2%
Housed/With Changes      17.6%

There is no significant effect due to changes among unhoused animals (p=0.76). However, the prevalence among housed animals with recent changes is higher, although not statistically significantly so (p=0.26), while the prevalence among housed animals without recent changes is significantly higher again (p=0.007). This can be interpreted as a 'build-up' effect: housing increases the prevalence, and the presence of a recent change implies that the housing effect will have had a shorter period of time to take effect. It should be remembered that this factor could reflect either changes in diet or changes in location: although it is tempting to interpret the results in terms of the change in location, this uncertainty should be borne in mind.

[Figure: Plot of prevalences in farms by water source, with 95% confidence intervals. Categories: Natural Water Source, Other.]

The estimated prevalences on positive farms by water source are as follows:

Category         Mean Prevalence
Natural Water    12.8%
Other            23.0%

Farms on which unhoused animals have access to a natural water supply have a lower prevalence (p=0.045) than other farms.
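The prevalences quoted in the tables above are the "back-transformed means" of the GLMM output: each predicted mean on the logit scale is converted to a probability with the inverse logit, p = 1 / (1 + exp(-eta)). The Python sketch below illustrates that conversion using predicted means copied from the final GLMM output. The names `inv_logit` and `logit_means`, and the row labels, are assumptions made for the example, and agreement with the printed probabilities is only to within rounding of the logit-scale means.

```python
import math


def inv_logit(eta):
    """Back-transform a logit-scale mean to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Predicted means on the logit scale, copied from the final GLMM output.
logit_means = {
    "Unhoused": -2.024,
    "Housed": -1.099,
    "FCattle 1": -1.276,
    "FCattle 2": -1.145,
    "FCattle 3": -1.779,
    "FCattle 4": -2.045,
    "Natural water": -1.916,
    "Other water": -1.207,
}

for label, eta in logit_means.items():
    print(f"{label:<14} {100.0 * inv_logit(eta):5.1f}%")
```

As the Genstat note says, these values are probabilities for a farm with a typical random effect rather than expected values averaged over the farm distribution, so they need not match raw sample proportions.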
[Figure: Plot of prevalences in farms by slurry spreading status, with 95% confidence intervals. Categories: No Slurry Spread, Slurry Spread.]

The estimated prevalences on positive farms by slurry spreading status are as follows:

Category            Mean Prevalence
No Slurry Spread    21.4%
Slurry Spread       13.9%

Farms on which slurry is spread on the silage fields have a lower prevalence than those farms on which no slurry is spread. This difference is not statistically significant (p=0.07) but would seem worth reporting.

Having fitted all the likely explanatory variables in the multifactor model, we now return to explore the effect that the inclusion of these factors may have on the fit of the structural factors. Fitting Division and Division.Housed gives the following output:

GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
  LINK=logit; DISPERSION=1; FIXED=Housed + FCattle + Housed.RecChnge + SSlur2+ Natural2+Division+Division.Housed;\
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean]\
  VTPos; NBINOMIAL=N_Sam

***** Generalised Linear Mixed Model Analysis *****

Method: cf Schall (1991) Biometrika
Response variate: VTPos
Distribution: BINOMIAL
Link function: LOGIT
Random model: Farm
Fixed model: Constant + (((((Housed + FCattle) + (Housed . RecChnge)) + SSlur2) + Natural2) + Division) + (Housed . Division)

* Dispersion parameter fixed at value 1.000

*** Monitoring information ***

Iteration   Gammas   Dispersion   Max change
        1    1.292        1.000   1.7250E+00
        2    1.533        1.000   2.4051E-01
        3    1.678        1.000   1.4561E-01
        4    1.697        1.000   1.9120E-02
        5    1.699        1.000   1.2743E-03
        6    1.699        1.000   1.3341E-04
        7    1.699        1.000   1.3609E-05

*** Estimated Variance Components ***

Random term   Component    S.e.
Farm              1.699   0.221

*** Residual variance model ***

Term Factor Model(order) Parameter Estimate S.e.
Dispersn Identity Sigma2 1.000 FIXED *** Estimated Variance matrix for Variance Components *** Farm 1 0.04885 Dispersn 2 0.00000 0.00000 1 2 *** Table of effects for Constant *** -1.232 Standard error: 0.4404 *** Table of effects for Housed *** Housed 0.0000 1.0000 0.0000 1.3835 Standard error of differences: 0.5199 *** Table of effects for FCattle *** FCattle 1 2 3 4 0.0000 0.2010 -0.3886 -0.6241 Standard error of differences: Average 0.3262 Maximum 0.3793 Minimum 0.2614 Average variance of differences: 0.1085 *** Table of effects for Housed.RecChnge *** RecChnge 0.0000 1.0000 Housed 0.0000 0.0000 -0.2138 1.0000 0.0000 -0.8995 Standard error of differences: Average 0.3702 Maximum 0.4860 Minimum 0.3347 Average variance of differences: 0.1404 54 *** Table of effects for SSlur2 *** SSlur2 0.0000 1.0000 0.0000 -0.3293 Standard error of differences: 0.3029 *** Table of effects for Natural2 *** Natural2 0.0000 1.0000 0.0000 -0.6814 Standard error of differences: 0.3534 *** Table of effects for Division *** Division Central Highland Islands North East South East 0.0000 0.6400 0.4473 0.1987 0.5044 Division South West -0.4383 Standard error of differences: Average 0.6037 Maximum 0.7070 Minimum 0.4892 Average variance of differences: 0.3678 *** Table of effects for Housed.Division *** Division Central Highland Islands North East South East Housed 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 0.0000 1.1292 -1.7037 -0.1396 -0.4355 Division South West Housed 0.0000 0.0000 1.0000 -0.0928 Standard error of differences: Average 0.8699 Maximum 1.378 Minimum 0.6133 Average variance of differences: 0.8114 *** Tables of means *** *** Table of predicted means for Housed *** Housed 0.0000 1.0000 -1.822 -0.988 *** Table of predicted means for FCattle *** FCattle 1 2 3 4 -1.202 -1.001 -1.591 -1.826 *** Table of predicted means for Housed.RecChnge *** RecChnge 0.0000 1.0000 Housed 0.0000 -1.715 -1.929 1.0000 -0.538 -1.438 55 *** Table of predicted means for SSlur2 *** SSlur2 0.0000 
1.0000 -1.240 -1.570 *** Table of predicted means for Natural2 *** Natural2 0.0000 1.0000 -1.064 -1.746 *** Table of predicted means for Division *** Division Central Highland Islands North East South East -1.527 -0.322 -1.931 -1.398 -1.240 Division South West -2.011 *** Table of predicted means for Housed.Division *** Division Central Highland Islands North East South East Housed 0.0000 -2.047 -1.407 -1.600 -1.848 -1.543 1.0000 -1.006 0.763 -2.263 -0.947 -0.937 Division South West Housed 0.0000 -2.485 1.0000 -1.538 *** Back-transformed Means (on the original scale) *** Housed 0.0000 0.1392 1.0000 0.2713 FCattle 1 0.2311 2 0.2688 3 0.1693 4 0.1387 RecChnge 0.0000 1.0000 Housed 0.0000 0.1526 0.1269 1.0000 0.3686 0.1919 SSlur2 0.0000 0.2244 1.0000 0.1723 Natural2 0.0000 0.2565 1.0000 0.1486 Division Central 0.1785 Highland 0.4202 Islands 0.1266 North East 0.1982 South East 0.2244 South West 0.1180 56 Housed 0.0000 1.0000 Division Central 0.1144 0.2677 Highland 0.1967 0.6820 Islands 0.1680 0.0943 North East 0.1361 0.2794 South East 0.1762 0.2814 South West 0.0769 0.1769 Note: means are probabilities not expected values. 7106 VDISPLAY [PRINT=Wald] *** Wald tests for fixed effects *** Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob * Sequentially adding terms to fixed model Housed 29.73 1 29.73 <0.001 FCattle 10.16 3 3.39 0.017 Housed.RecChnge 10.11 2 5.05 0.006 SSlur2 3.30 1 3.30 0.069 Natural2 4.12 1 4.12 0.042 Division 12.27 5 2.45 0.031 Housed.Division 4.78 5 0.96 0.443 * Dropping individual terms from full fixed model FCattle 7.24 3 2.41 0.065 Housed.RecChnge 7.65 2 3.82 0.022 SSlur2 1.18 1 1.18 0.277 Natural2 3.72 1 3.72 0.054 Housed.Division 4.78 5 0.96 0.443 * Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases. 
Hence, although Housed.Division is not significant, there is still significant evidence of geographical variability unexplained by the fitted epidemiological factors (in fact, the geographical distinctions are clearer after the effects of the other factors have been removed).

Fitting Manage_O gives the following Wald statistics:

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term         Wald statistic   d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model
Housed                      29.16      1       29.16        <0.001
FCattle                      9.98      3        3.33         0.019
Housed.RecChnge              9.82      2        4.91         0.007
SSlur2                       3.21      1        3.21         0.073
Natural2                     4.01      1        4.01         0.045
Manage_O                     0.93      2        0.46         0.630

* Dropping individual terms from full fixed model
FCattle                      9.05      3        3.02         0.029
Housed.RecChnge              7.24      2        3.62         0.027
SSlur2                       3.50      1        3.50         0.062
Natural2                     3.90      1        3.90         0.048
Manage_O                     0.93      2        0.46         0.630

* Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases.

Fitting Housed.Manage_O gives the following Wald statistics:

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term         Wald statistic   d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model
Housed                      29.36      1       29.36        <0.001
FCattle                     10.10      3        3.37         0.018
Housed.RecChnge              9.97      2        4.98         0.007
SSlur2                       3.26      1        3.26         0.071
Natural2                     4.01      1        4.01         0.045
Housed.Manage_O              6.25      4        1.56         0.181

* Dropping individual terms from full fixed model
FCattle                     10.47      3        3.49         0.015
Housed.RecChnge              7.61      2        3.80         0.022
SSlur2                       3.53      1        3.53         0.060
Natural2                     4.02      1        4.02         0.045
Housed.Manage_O              6.25      4        1.56         0.181

* Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases.

Hence there is no evidence of Manage_O or its interaction with Housed having any significant effect on the prevalence.
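The "Chi-sq prob" column for terms with more than one degree of freedom can be reproduced from the Wald statistic using closed-form expressions for the chi-square upper tail at integer degrees of freedom. The Python sketch below is an illustrative check using only the standard library; the function name `chi2_upper_tail` is an assumption and is not a Genstat routine. Small differences from the printed probabilities are expected because the Wald statistics are printed to two decimal places.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for X ~ chi-square with a positive integer number of d.f."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even d.f.: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd d.f.: normal tail plus a finite series over odd double factorials,
    # erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * (1 + x/3 + x^2/15 + ...).
    tail = math.erfc(math.sqrt(half))
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    return tail + math.sqrt(2.0 * x / math.pi) * math.exp(-half) * total


# (Wald statistic, d.f.) pairs copied from the Wald tables above.
reported = {
    "FCattle": (9.98, 3),
    "Housed.RecChnge": (9.82, 2),
    "Manage_O": (0.93, 2),
    "Housed.Manage_O": (6.25, 4),
}
for term_name, (wald, df) in reported.items():
    print(f"{term_name:<16} p = {chi2_upper_tail(wald, df):.3f}")
```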
Fitting Sam_Mon (which was highly significant in the univariate analysis) gives the following output: 7126 GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\ 7127 LINK=logit; DISPERSION=1; FIXED=Housed + FCattle + Housed.RecChnge + SSlur2+ Natural2+Sam_Mon+Sam_Mon.Housed;\ 7128 RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean]\ 7129 VTPos; NBINOMIAL=N_Sam ***** Generalised Linear Mixed Model Analysis ***** Method: cf Schall (1991) Biometrika Response variate: VTPos Distribution: BINOMIAL Link function: LOGIT Random model: Farm Fixed model: Constant + (((((Housed + FCattle) + (Housed . RecChnge) ) + SSlur2) + Natural2) + Sam_Mon) + (Housed . Sam_Mon) * Dispersion parameter fixed at value 1.000 *** Monitoring information *** Iteration Gammas Dispersion Max change 1 1.241 1.000 1.7883E+00 2 1.548 1.000 3.0736E-01 3 1.701 1.000 1.5289E-01 4 1.722 1.000 2.1058E-02 58 5 1.724 1.000 1.4655E-03 6 1.724 1.000 1.5810E-04 7 1.724 1.000 1.6533E-05 *** Estimated Variance Components *** Random term Component S.e. Farm 1.724 0.228 *** Residual variance model *** Term Factor Model(order) Parameter Estimate S.e. 
Dispersn Identity Sigma2 1.000 FIXED *** Estimated Variance matrix for Variance Components *** Farm 1 0.05218 Dispersn 2 0.00000 0.00000 1 2 *** Table of effects for Constant *** -2.230 Standard error: 1.3511 *** Table of effects for Housed *** Housed 0.0000 1.0000 0.000 2.929 Standard error of differences: 1.267 *** Table of effects for FCattle *** FCattle 1 2 3 4 0.0000 0.1277 -0.6122 -0.7928 Standard error of differences: Average 0.3353 Maximum 0.3965 Minimum 0.2706 Average variance of differences: 0.1147 *** Table of effects for Housed.RecChnge *** RecChnge 0.0000 1.0000 Housed 0.0000 0.0000 -0.0109 1.0000 0.0000 -0.9641 Standard error of differences: Average 0.4218 Maximum 0.5547 Minimum 0.3789 Average variance of differences: 0.1824 *** Table of effects for SSlur2 *** SSlur2 0.0000 1.0000 0.0000 -0.5271 Standard error of differences: 0.3023 59 *** Table of effects for Natural2 *** Natural2 0.0000 1.0000 0.0000 -0.6120 Standard error of differences: 0.3729 *** Table of effects for Sam_Mon *** Sam_Mon Jan Feb Mar Apr May 0.0000 -1.1247 0.2774 -0.0039 1.1748 Sam_Mon Jun Jul Aug Sep Oct 1.3308 1.3849 1.4567 0.8369 0.5001 Sam_Mon Nov Dec -0.2282 -0.2707 Standard error of differences: Average 1.259 Maximum 2.157 Minimum 0.5449 Average variance of differences: 1.816 *** Table of effects for Housed.Sam_Mon *** Sam_Mon Jan Feb Mar Apr May Housed 0.0000 * * 0.0000 0.0000 0.0000 1.0000 0.0000 0.0000 -0.6260 -0.5438 -0.8902 Sam_Mon Jun Jul Aug Sep Oct Housed 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 1.0000 * -2.7419 -1.5238 -4.0927 -1.4113 Sam_Mon Nov Dec Housed 0.0000 0.0000 * 1.0000 0.0000 0.0000 Standard error of differences: Average 1.655 Maximum 2.152 Minimum 0.9070 Average variance of differences: 2.805 *** Tables of means *** *** Table of predicted means for Housed *** Housed 0.0000 1.0000 * * *** Table of predicted means for FCattle *** FCattle 1 2 3 4 -1.726 -1.599 -2.338 -2.519 *** Table of predicted means for Housed.RecChnge *** RecChnge 0.0000 1.0000 Housed 
60 0.0000 * * 1.0000 * * *** Table of predicted means for SSlur2 *** SSlur2 0.0000 1.0000 -1.782 -2.309 *** Table of predicted means for Natural2 *** Natural2 0.0000 1.0000 -1.740 -2.352 *** Table of predicted means for Sam_Mon *** Sam_Mon Jan Feb Mar Apr May Jun Jul Aug * * -1.934 -2.174 -1.169 * -1.885 -1.204 Sam_Mon Sep Oct Nov Dec -3.108 -2.104 -2.127 * *** Table of predicted means for Housed.Sam_Mon *** Sam_Mon Jan Feb Mar Apr May Jun Jul Housed 0.0000 * * -2.847 -3.129 -1.950 -1.794 -1.740 1.0000 -0.673 -1.797 -1.021 -1.220 -0.388 * -2.030 Sam_Mon Aug Sep Oct Nov Dec Housed 0.0000 -1.668 -2.288 -2.625 -3.353 * 1.0000 -0.740 -3.928 -1.584 -0.901 -0.943 *** Back-transformed Means (on the original scale) *** Housed 0.0000 * 1.0000 * FCattle 1 0.1511 2 0.1682 3 0.0880 4 0.0745 RecChnge 0.0000 1.0000 Housed 0.0000 * * 1.0000 * * SSlur2 0.0000 0.1440 1.0000 0.0904 Natural2 0.0000 0.1494 1.0000 0.0869 Sam_Mon Jan * Feb * Mar 0.1263 Apr 0.1021 May 0.2370 61 Jun * Jul 0.1319 Aug 0.2308 Sep 0.0428 Oct 0.1087 Nov 0.1065 Dec * Housed 0.0000 1.0000 Sam_Mon Jan * 0.3379 Feb * 0.1422 Mar 0.0548 0.2648 Apr 0.0419 0.2279 May 0.1246 0.4042 Jun 0.1426 * Jul 0.1493 0.1161 Aug 0.1587 0.3231 Sep 0.0921 0.0193 Oct 0.0676 0.1703 Nov 0.0338 0.2889 Dec * 0.2802 Note: means are probabilities not expected values. 7121 VDISPLAY [PRINT=Wald] *** Wald tests for fixed effects *** Fixed term Wald statistic d.f. Wald/d.f. Chi-sq prob * Sequentially adding terms to fixed model Housed 29.42 1 29.42 <0.001 FCattle 10.20 3 3.40 0.017 Housed.RecChnge 9.95 2 4.97 0.007 SSlur2 3.30 1 3.30 0.069 Natural2 4.00 1 4.00 0.045 Sam_Mon 14.10 11 1.28 0.227 Housed.Sam_Mon 9.12 7 1.30 0.244 * Dropping individual terms from full fixed model FCattle 10.76 3 3.59 0.013 Housed.RecChnge 6.48 2 3.24 0.039 SSlur2 3.04 1 3.04 0.081 Natural2 2.69 1 2.69 0.101 Housed.Sam_Mon 9.12 7 1.30 0.244 * Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. 
for large samples) and underestimates the probabilities in other cases.

Neither Sam_Mon nor Housed.Sam_Mon is statistically significant. Hence, the explanatory variables (particularly Housed) have explained most of the variability that was assigned to Month in the univariate analysis. We confirm this by refitting the model without any of the housing terms:

7134 VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term           Wald statistic   d.f.   Wald/d.f.   Chi-sq prob

* Sequentially adding terms to fixed model
FCattle                    4.89         3       1.63        0.180
SSlur2                     0.01         1       0.01        0.943
Natural2                  21.19         1      21.19       <0.001
Sam_Mon                   24.74        11       2.25        0.010

* Dropping individual terms from full fixed model
FCattle                    8.12         3       2.71        0.044
SSlur2                     2.19         1       2.19        0.139
Natural2                  12.75         1      12.75       <0.001
Sam_Mon                   24.74        11       2.25        0.010

* Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases.

This output confirms that the month-to-month variability is almost completely explained by the Housed terms. Reviewing the housing of animals over the year, we see the following pattern:

[Figure: Proportion of Sampling Groups Housed, by Month, with 95% Confidence Intervals. Vertical axis: Proportion Groups Housed (0 to 1). Horizontal axis: Month (January to December).]

In the univariate analysis, the months exhibiting a lower prevalence were identified as June to October. June to September are the months with the lowest proportion of animals housed, while in October, although a higher proportion of groups are housed, the 'recent change' factor is likely to operate to reduce the shedding prevalence.

Fitting Sam_Year and Sam_Year.Housed to the data gives rise to the following summary statistics:

7139 VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term Wald statistic d.f. Wald/d.f.
Chi-sq prob * Sequentially adding terms to fixed model 63 Housed 29.01 1 29.01 <0.001 FCattle 9.95 3 3.32 0.019 Housed.RecChnge 9.79 2 4.89 0.007 SSlur2 3.19 1 3.19 0.074 Natural2 3.98 1 3.98 0.046 Sam_Year 1.00 2 0.50 0.606 Housed.Sam_Year 2.33 2 1.17 0.312 * Dropping individual terms from full fixed model FCattle 8.30 3 2.77 0.040 Housed.RecChnge 4.87 2 2.43 0.088 SSlur2 3.30 1 3.30 0.069 Natural2 3.44 1 3.44 0.064 Housed.Sam_Year 2.33 2 1.17 0.312 * Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases. There is no evidence of any year-on-year trend in prevalence in either housed or unhoused animals. Returning to the model with the explanatory factors and animal health division, the prevalences by area, after adjusting for the significant explanatory variables, are given by fitting the following model: 7140 GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\ 7141 LINK=logit; DISPERSION=1; FIXED=Housed+ FCattle + Housed.RecChnge+SSlur2+ Natural2+Division;\ 7142 RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean]\ 7143 VTPos; NBINOMIAL=N_Sam ***** Generalised Linear Mixed Model Analysis ***** Method: cf Schall (1991) Biometrika Response variate: VTPos Distribution: BINOMIAL Link function: LOGIT Random model: Farm Fixed model: Constant + ((((Housed + FCattle) + (Housed . RecChnge)) + SSlur2) + Natural2) + Division * Dispersion parameter fixed at value 1.000 *** Monitoring information *** Iteration Gammas Dispersion Max change 1 1.237 1.000 1.7843E+00 2 1.533 1.000 2.9627E-01 3 1.666 1.000 1.3312E-01 4 1.685 1.000 1.8695E-02 5 1.686 1.000 1.2612E-03 6 1.686 1.000 1.3440E-04 7 1.686 1.000 1.3954E-05 *** Estimated Variance Components *** Random term Component S.e. Farm 1.686 0.217 *** Residual variance model *** Term Factor Model(order) Parameter Estimate S.e. 
Dispersn Identity Sigma2 1.000 FIXED 64 *** Estimated Variance matrix for Variance Components *** Farm 1 0.04689 Dispersn 2 0.00000 0.00000 1 2 *** Table of effects for Constant *** -1.178 Standard error: 0.3600 *** Table of effects for Housed *** Housed 0.0000 1.0000 0.0000 1.2921 Standard error of differences: 0.3288 *** Table of effects for FCattle *** FCattle 1 2 3 4 0.0000 0.2252 -0.4122 -0.6472 Standard error of differences: Average 0.3220 Maximum 0.3765 Minimum 0.2586 Average variance of differences: 0.1058 *** Table of effects for Housed.RecChnge *** RecChnge 0.0000 1.0000 Housed 0.0000 0.0000 -0.1707 1.0000 0.0000 -0.8901 Standard error of differences: Average 0.3654 Maximum 0.4806 Minimum 0.3326 Average variance of differences: 0.1369 *** Table of effects for SSlur2 *** SSlur2 0.0000 1.0000 0.0000 -0.3338 Standard error of differences: 0.2957 *** Table of effects for Natural2 *** Natural2 0.0000 1.0000 0.0000 -0.7004 Standard error of differences: 0.3498 *** Table of effects for Division *** Division Central Highland Islands North East South East 0.0000 1.0762 0.1254 0.1065 0.1952 65 Division South West -0.4932 Standard error of differences: Average 0.4146 Maximum 0.5626 Minimum 0.2942 Average variance of differences: 0.1788 *** Tables of means *** *** Table of predicted means for Housed *** Housed 0.0000 1.0000 -1.821 -0.888 *** Table of predicted means for FCattle *** FCattle 1 2 3 4 -1.146 -0.921 -1.558 -1.793 *** Table of predicted means for Housed.RecChnge *** RecChnge 0.0000 1.0000 Housed 0.0000 -1.735 -1.906 1.0000 -0.443 -1.333 *** Table of predicted means for SSlur2 *** SSlur2 0.0000 1.0000 -1.188 -1.521 *** Table of predicted means for Natural2 *** Natural2 0.0000 1.0000 -1.004 -1.705 *** Table of predicted means for Division *** Division Central Highland Islands North East South East -1.523 -0.447 -1.398 -1.416 -1.328 Division South West -2.016 *** Back-transformed Means (on the original scale) *** Housed 0.0000 0.1393 1.0000 0.2914 FCattle 1 
0.2412
   2          0.2848
   3          0.1739
   4          0.1427

         RecChnge   0.0000   1.0000
Housed
0.0000              0.1499   0.1294
1.0000              0.3909   0.2086

SSlur2
0.0000        0.2337
1.0000        0.1792

Natural2
0.0000        0.2681
1.0000        0.1538

Division
Central       0.1790
Highland      0.3902
Islands       0.1982
North East    0.1952
South East    0.2095
South West    0.1175

Note: means are probabilities not expected values.

[Figure: Plot of prevalences by animal health division, with 95% confidence intervals. Horizontal axis: Division (Central, Highland, Islands, North East, South East, South West). Vertical axis: 0 to 0.7.]

The mean prevalence in Highland division appears to be significantly higher than those in Central, Islands, North-East and South-East (p=0.02), while the prevalences in these regions are significantly higher than that in the South-West (p=0.03). These trends match those seen in the univariate analysis.

To review the fit of the model, the observed and expected fractions of positive pats are plotted for the 207 observations included in the model:

[Figure: Plot of observed and fitted fractional prevalences. Horizontal axis: Observed Fraction (0 to 1). Vertical axis: Model Probability (0 to 1).]

Overall, the fit looks fairly reasonable, with a few minor outliers. The only serious lack of fit occurs for maximal prevalences, where the fitted value will always be smaller than an observed 100% shedding rate. Even this cluster of negative residuals looks likely to be of negligible effect. To assess this more formally, we examine a residual plot for the model. The residuals and fitted values from the model (based on the inclusion only of fixed effects) are recovered by refitting the model using the marginal method of Breslow & Clayton (1993) and then recovering the residuals using VKEEP [RES=Residuals;FIT=Fitted].
The resulting fitted values are converted back onto the proportion scale using the inverse of the logit function, and the resulting plot is shown below:

[Figure: Plot of residual against model fit (random effects model). Horizontal axis: Fitted Fraction (0 to 1). Vertical axis: Residual (-1.5 to 1.5).]

The histogram of these residuals should also be examined.

[Figure: Histogram of residuals (random effects model). Horizontal axis: Residuals (Random). Vertical axis: Frequency.]

The pattern of the residuals against the fitted value is fairly typical of this class of residuals. The histogram is sufficiently symmetric for the fit of the model to be regarded as acceptable, although there may be some evidence of sub-populations in the histogram. Interpretation of these residuals is problematic.

To fully evaluate the fit of the model, we examine the deviance residuals from the equivalent fixed effect model with overdispersion. This model is close in its properties to the mixed model, and the deviance residuals are easier to interpret. The residuals are recovered using the RKEEP command, using the default residual settings in RKEEP and MODEL. The resulting fitted values are converted back onto the proportion scale using the inverse of the logit function, and the resulting plot is shown below:

[Figure: Plot of residual against model fit (fixed effects model). Horizontal axis: Fitted Fraction (0 to 1). Vertical axis: Residual (-2.5 to 3).]

The histogram of these residuals should also be examined.

[Figure: Histogram of residuals (fixed effects model). Horizontal axis: Residuals (Fixed). Vertical axis: Frequency.]

These graphics are much easier to interpret. The main peculiarities appear to be a clustering of moderately negative residuals associated with observed fractional prevalences in the range 20-30%, and a slightly disproportionate number of high (>2) residuals.
The latter are, however, drawn from a wide range of observations with different prevalences. It is, indeed, plausible that the latter peculiarity is a side-effect of the former: if the residual histogram is visualised as a mixture of two sub-populations, one centred on a value slightly larger than zero and the other on a value around -0.75, both sub-populations appear reasonably normally distributed.

No points have been highlighted by Genstat as exhibiting high leverage. Cook's statistics were calculated for each observation to identify observations which combine large residuals with high leverage; no particular pattern is apparent. No sub-population of the dataset appears to be having a consistently strong effect on the model.

[Figure: plot titled VTPos. Horizontal axis: Fitted values suitably transformed.]

Plotting the Cook's statistics against the various explanatory factors shows no particular trend. Only one point stands out in this exercise: the point (Farm 515) with the largest Cook's statistic appears as an outlier in both the Highland level of the Division factor and in the 'Housed with recent change' level of the Housed.RecChnge interaction term. However, removing this farm from the model has a negligible effect on the residuals (and on the model and associated p-values in general).

The sub-population of residuals corresponds to a group of farms with lower than expected shedding levels. The predicted prevalence is in the range 20%-30%, while that observed is much lower: typically only one or two positive pats. Examination of the properties of these observations shows some pattern. They tend to be observations from farms which lack any of the obvious risk factors, or, if they do have them, these are offset by other, protective factors. Hence, their fitted risk is close to the estimated mean, which is higher than the actual prevalence seen on these farms.
This does not appear to be a response to the inclusion of any specific factor in the model (given the lack of evidence for significant leverage in the model); rather, it is a property of the response distribution, where on some farms far fewer positive pats are detected than on apparently similar farms. This could reflect some unidentified and hence unmodelled explanatory factor, or some peculiarity of the distribution which describes the random terms. It is difficult to interpret such effects in purely random terms: the most obvious aspect of the raw data, namely the apparent 'bulge' at high prevalences, can be explained by various aspects of contagion models (such as the stochastic threshold theorem) or by hypothesising the existence of hyper-shedding cattle. It is more difficult to conceptualise a distributional effect which gives rise to a smaller population at moderate prevalences. If this sub-population does reflect a genuine and unidentified explanatory factor, at least it is an unidentified protective factor rather than an unidentified risk factor.

Examination of the residuals suggests that, although less than perfect, they are not sufficiently asymmetric to undermine the asymptotic assumptions which underlie the calculation of standard errors and p-values. Hence, the results reported in this document are still valid, and can be reported with confidence.

Analysing Bernoulli data (absence or presence of farm level infection)

Initially, the effect of the descriptive variables (Division, Sam_Month, Manage_O) will be assessed:

5559 MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
5560 FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
5561 Manage_O

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Manage_O

*** Summary of analysis ***

mean deviance approx d.f.
deviance deviance ratio chi pr Regression 3 1.0 0.328 0.33 0.805 Residual 948 996.0 1.051 Total 951 997.0 1.048 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 * MESSAGE: The following units have high leverage: Unit Response Leverage 221 0.00 0.1845 351 0.00 0.1845 *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. estimate Constant -1.291 0.196 -6.58 <.001 0.2750 Manage_O Beef 0.010 0.220 0.05 0.963 1.010 Manage_O Other 0.032 0.260 0.12 0.903 1.032 Manage_O Mixed -4.27 6.95 -0.61 0.539 0.01400 * MESSAGE: s.e.s are based on dispersion parameter with value 1 Parameters for factors are differences compared with the reference level: Factor Reference level Manage_O Dairy Manage_O shows no significant effects. Division shows more interesting effects. 5562 MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin 5563 FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\ 5564 Division 5564............................................................................ ***** Regression Analysis ***** Response variate: VFarmPos Binomial totals: N_Bin Distribution: Binomial Link function: Logit Fitted terms: Constant, Division 74 *** Summary of analysis *** mean deviance approx d.f. deviance deviance ratio chi pr Regression 5 7.9 1.580 1.58 0.162 Residual 946 989.1 1.046 Total 951 997.0 1.048 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 * MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. 
estimate
Constant                -1.106    0.170    -6.52    <.001    0.3309
Division Highland       -0.612    0.336    -1.82    0.069    0.5423
Division Islands        -0.475    0.339    -1.40    0.161    0.6221
Division North East     -0.005    0.232    -0.02    0.982    0.9947
Division South East      0.017    0.260     0.07    0.948    1.017
Division South West     -0.354    0.236    -1.50    0.133    0.7020

* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
Factor      Reference level
Division    Central

Overall, there is no statistically significant evidence of any differences in the levels of farm prevalence in different areas of Scotland. The prevalences in the Central, North-East and South-East divisions are all comparable, with the prevalence in the South-West being lower, and those in the Highlands and the Islands lower still.

[Figure: Plot of farm prevalences by animal health division (univariate analysis), with 95% confidence intervals. Horizontal axis: Central, Highlands, Islands, NE, SE, SW. Vertical axis: 0% to 35%.]

The estimated prevalences of positive farms in different divisions are as follows:

Central      25%
Highlands    15%
Islands      17%
NE           25%
SE           25%
SW           19%

These results are interesting, noting in particular that the high animal prevalence in the Highlands is matched with a low farm prevalence. However, no trend is apparent when the animal and farm prevalences are plotted by Division, and in general it must be stressed that the farm prevalence effects are not statistically significant.

Examining Sampling Month,

5602 MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
5603 FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
5604 Sam_Mon
5604............................................................................

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Sam_Mon

*** Summary of analysis ***

mean deviance approx d.f.
deviance deviance ratio chi pr
Regression     11     19.0    1.731    1.73    0.060
Residual      940    978.0    1.040
Total         951    997.0    1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses

*** Estimates of parameters ***

                                                    antilog of
               estimate    s.e.     t(*)    t pr.   estimate
Constant        -2.031    0.376    -5.40    <.001    0.1311
Sam_Mon Feb      0.292    0.481     0.61    0.544    1.340
Sam_Mon Mar      0.784    0.435     1.80    0.071    2.190
Sam_Mon Apr      0.340    0.475     0.71    0.475    1.405
Sam_Mon May      1.051    0.432     2.43    0.015    2.860
Sam_Mon Jun      0.502    0.485     1.04    0.300    1.652
Sam_Mon Jul      1.010    0.465     2.17    0.030    2.745
Sam_Mon Aug      0.915    0.463     1.97    0.048    2.496
Sam_Mon Sep      1.030    0.466     2.21    0.027    2.801
Sam_Mon Oct      0.696    0.452     1.54    0.123    2.007
Sam_Mon Nov      1.364    0.466     2.93    0.003    3.910
Sam_Mon Dec      0.677    0.546     1.24    0.215    1.968

* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
Factor     Reference level
Sam_Mon    Jan

5605 RKEEP ; ESTIMATES=Est; VCOVARIANCE=Var

Overall, there is no statistically significant evidence of any differences in farm prevalence in different months. Examining the associated confidence intervals:

[Figure: Plot of farm prevalences by sampling month, with 95% confidence intervals. Horizontal axis: January to December. Vertical axis: 0% to 50%.]

The estimated farm prevalences in different sampling months are as follows:

January      12%
February     15%
March        22%
April        16%
May          27%
June         18%
July         26%
August       25%
September    27%
October      21%
November     34%
December     21%

Although there is no formally statistically significant evidence of differences between the mean prevalences on a month-by-month basis, a clear trend is visible in the data.
January, February and April are associated with the lowest three prevalences, while the prevalence from March is fairly low. At the within-farm level, these months were associated with some of the highest animal prevalences, explained by factors such as housing of animals. Given the complex multivariate model which was required in the analysis of the within-farm data, there is little point in exploring these properties further before investigating the explanatory factors which might affect prevalence levels.

The possible explanatory factors are explored in a univariate fashion using a Generalised Linear Model, and the results are summarised in the following table. The p-values indicate the likely significance of the fitted values. Variables with p-values of less than 5% are indicated in red, those in the range 5%-10% in blue. Those variables which ultimately are found to be of interest in the multivariate analysis are indicated by bold text.

Factor/Variable   p-value   Comments
Manage_C          0.67      'Beef' and 'Others' higher than 'Dairy'
Manage_O          0.80      'Beef' and 'Others' higher than 'Dairy'
Division          0.16      'Highland' lower than others
Sam_Month         0.06      Lower in January and February
Sample            0.28      Lower in rectal samples
Sam_Year          0.004     Consistent drop with time
Season            0.04      Winter lower than other seasons
SeasList          0.01      Both Winter estimates lower than other seasons: final Spring may also be lower
Sampler           0.18      'Fiona' is higher than 'Helen'
N_F_Cattle        <0.001    Higher numbers of finishing cattle associated with higher farm prevalence, probably better analysed as a factor, below
FCattle           <0.001    Groups 2 and 3 higher than group 1, group 4 higher again
N_Groups          0.04      Probably better analysed as a factor, below: more groups associated with higher prevalence
GroupsCat         0.08      More groups associated with higher prevalence
N_Sam_Gr          <0.001    More sampling groups associated with higher prevalences
Min_Age           0.74      Higher minimum age associated with lower prevalence
Max_Age           0.31      Higher maximum age associated with lower prevalence
Source            0.01      'Buy in' and 'Both' higher prevalences than 'Breeding only'
NewSource         0.03      'Open' higher than 'Closed'
Breed             0.03      'B_D_DB' higher than others. No consistent pattern
Housed            0.64      Farms with Housed animals are more likely to exhibit shedding animals: but this is not statistically significant
Housing           0.17      'Byre' excluded due to badly fitting model: too few observations. All alternatives have lower prevalences than 'Court'.
NoChange          0.87      '1' higher than '0' (not sure of interpretation)
TDHouse           0.46      Longer time associated with higher prevalences
Rec_Move          0.66      A recent move is associated with lower prevalences
RecMove2          0.58      Most recent move class 1 (<1 week) is lower than classes 2 and 3 (>1 week)
SupFeed           0.80      Farms with animals receiving supplementary feed less likely to be positive
RecDFeed          0.69      Recent change in feed associated with higher prevalence
Forage            0.39      Farms with animals having forage less likely to be positive
Silage            0.64      Farms with animals having silage less likely to be positive
Concentrate       0.31      Farms with animals having concentrate more likely to be positive
Sil_Home          0.83      'Yes' is higher than 'No'
Sil_Manure        0.68      'Yes' is lower than 'No'
Sil_Slurry        0.16      'Yes' is higher than 'No'
Sil_Sewage        0.60      'Yes' is higher than 'No'
Sil_Geece         0.22      'Yes' is lower than 'No'
Sil_Gulls         0.57      'Yes' is higher than 'No'
Hay               0.87      'Yes' is lower than 'No'
Hay_Manure        0.68      'Yes' is lower than 'No'
Hay_Slurry        0.12      'Yes' is higher than 'No'
Hay_Sewage                  No data points in class with Sewage on hay fields.
Hay_Geese         0.27      Geese present associated with lower prevalence
Hay_Gulls         0.22      Gulls present associated with lower prevalence
Grass_Manure      0.02      Farms reporting use of manure on grass less likely to be positive for shedding
Grass_Slurry      <0.001    Farms reporting use of slurry on grass more likely to be positive for shedding
Grass_Sewage      0.54      Farms reporting use of sewage on grass less likely to be positive for shedding
Grass_Geece       0.52      Farms reporting geese on grass less likely to be positive for shedding
Grass_Gulls       0.49      Farms reporting gulls on grass more likely to be positive for shedding
N_Cattle          0.004     More cattle associated with higher prevalence
Cattle            0.002     Groups 2 and 3 show higher prevalences than group 1
N_Sheep           0.41      Larger numbers of sheep are protective, but better analysed using a factor
Sheep             0.42      (Sheep absent or present) 'With' is higher than 'Without'
N_Goats           0.08      More goats associated with higher prevalence
Goats             0.44      (Goats absent or present) 'With' is higher than 'Without'
N_Horses          0.69      More horses associated with lower prevalence
N_Pigs            0.32      More pigs associated with lower prevalence
Pigs              0.01      (Pigs absent or present) 'With' is higher than 'Without'
N_Chickens        0.97      More chickens associated with lower prevalence
Chickens          0.46      (Chickens absent or present) 'With' is lower than 'Without'
N_Deer            0.28      More deer associated with higher prevalence
Deer              0.38      (Deer absent or present) 'With' is higher than 'Without'
Water             0.16      No obvious pattern
Mains             0.21      Mains supply farms have a higher mean prevalence
Natural           0.10      Natural supply farms have a lower mean prevalence
Private           0.34      Private supply farms have a lower mean prevalence
WaterCon          0.66      'With' is higher than 'Without'
WaterCT           0.81      All but 'None', 'Animal' and 'ASM' thrown out for lack of information: ordering 'Animals', 'None', 'ASM'
Want2Know         0.68      Those that wanted to know had lower prevalences than those who did not
Visit2            0.82      Those willing to have a 2nd visit had a lower prevalence than those who were not
LabOperator       0.04      'S' generated lower prevalences than 'D' and 'H'. 'H' was lower than 'D'.
BeefonDairy       0.02      This class of farm exhibits a higher prevalence

Unlike the analysis of the prevalence data from positive farms, no factor appears to be absolutely pivotal in defining the system in the way that the Housed/Unhoused classification did for the Binomial data. The properties of the interesting factors will therefore be reviewed in depth. These are N_F_Cattle/FCattle/N_Groups/GroupsCat/N_Sam_Gr/Cattle/N_Cattle, Source/NewSource, Breed/BeefonDairy (BeefonDairy is defined as a particular interaction of a management and a breed factor), Grass_Manure, Grass_Slurry, N_Goats, Pigs and LabOperator. Sample_Year and a variety of associated Sample_Month and/or Seasonal factors are all worth further investigation as possible descriptive factors. Note that the variables have been grouped, where appropriate, into equivalence classes of what are likely to be highly correlated factors.

Exploring the N_F_Cattle/FCattle/N_Groups/GroupsCat/N_Sam_Gr/Cattle/N_Cattle group, all of these measures are associated with the size of the animal population on the farm. All of these factors and variables associate higher numbers of cattle and/or groups with a higher probability of the farm exhibiting a sample containing VT E. coli O157. Examining the output from the model for N_F_Cattle, we note the high leverage which is associated with the larger values of the explanatory variable.

5621 MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
5622 TERMS [FACT=9] N_F_Catt
5623 FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
5624 N_F_Catt

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, N_F_Catt

*** Summary of analysis ***

mean deviance approx d.f.
deviance deviance ratio chi pr Regression 1 17.1 17.065 17.06 <.001 Residual 950 980.0 1.032 Total 951 997.0 1.048 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 * MESSAGE: The error variance does not appear to be constant: intermediate responses are more variable than small or large responses * MESSAGE: The following units have high leverage: Unit Response Leverage 65 0.00 0.0606 70 1.00 0.0118 130 1.00 0.0212 172 0.00 0.0225 286 1.00 0.0212 308 1.00 0.0102 422 0.00 0.0102 440 1.00 0.0554 444 1.00 0.0368 450 0.00 0.0673 454 0.00 0.0152 455 0.00 0.0212 496 0.00 0.0212 499 0.00 0.0152 527 0.00 0.0078 80 529 1.00 0.0279 545 0.00 0.0085 552 0.00 0.0102 578 1.00 0.0102 683 1.00 0.0423 737 0.00 0.0102 775 0.00 0.0131 781 1.00 0.0082 838 1.00 0.0111 861 1.00 0.0212 874 0.00 0.0517 884 1.00 0.0102 920 0.00 0.0187 952 0.00 0.0102 *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. estimate Constant -1.567 0.108 -14.54 <.001 0.2087 N_F_Catt 0.003631 0.000884 4.11 <.001 1.004 MESSAGE: s.e.s are based on dispersion parameter with value 1 Such large leverages associated with a sparse tail of the distribution of a variable are generally associated with poor models. Hence, FCattle is to be preferred as an explanatory variable. The output from this model still exhibits the same leverage issues, but these effects are confined to the largest classification class, which is of relatively little importance. 5625 "Modelling of binomial proportions. (e.g. by logits)." 5626 MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin 5627 TERMS [FACT=9] FCattle 5628 FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\ 5629 FCattle ***** Regression Analysis ***** Response variate: VFarmPos Binomial totals: N_Bin Distribution: Binomial Link function: Logit Fitted terms: Constant, FCattle *** Summary of analysis *** mean deviance approx d.f. 
deviance deviance ratio chi pr Regression 3 20.1 6.704 6.70 <.001 Residual 948 976.9 1.030 Total 951 997.0 1.048 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 * MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses * MESSAGE: The following units have high leverage: Unit Response Leverage 22 1.00 0.0158 65 0.00 0.0158 70 1.00 0.0158 97 0.00 0.0158 130 1.00 0.0158 172 0.00 0.0158 200 0.00 0.0158 280 0.00 0.0158 286 1.00 0.0158 308 1.00 0.0158 322 0.00 0.0158 324 0.00 0.0158 81 355 0.00 0.0158 363 0.00 0.0158 369 0.00 0.0158 383 1.00 0.0158 386 0.00 0.0158 388 0.00 0.0158 421 0.00 0.0158 422 0.00 0.0158 425 1.00 0.0158 440 1.00 0.0158 444 1.00 0.0158 446 1.00 0.0158 450 0.00 0.0158 454 0.00 0.0158 455 0.00 0.0158 468 0.00 0.0158 472 0.00 0.0158 489 0.00 0.0158 496 0.00 0.0158 499 0.00 0.0158 527 0.00 0.0158 529 1.00 0.0158 545 0.00 0.0158 552 0.00 0.0158 560 0.00 0.0158 578 1.00 0.0158 620 1.00 0.0158 651 1.00 0.0158 661 0.00 0.0158 667 0.00 0.0158 683 1.00 0.0158 688 1.00 0.0158 705 0.00 0.0158 725 0.00 0.0158 737 0.00 0.0158 752 0.00 0.0158 763 0.00 0.0158 775 0.00 0.0158 781 1.00 0.0158 805 0.00 0.0158 809 1.00 0.0158 838 1.00 0.0158 857 1.00 0.0158 861 1.00 0.0158 874 0.00 0.0158 884 1.00 0.0158 897 0.00 0.0158 920 0.00 0.0158 922 1.00 0.0158 945 0.00 0.0158 952 0.00 0.0158 *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. 
estimate Constant -1.649 0.126 -13.08 <.001 0.1923 FCattle 2 0.587 0.192 3.06 0.002 1.799 FCattle 3 0.588 0.214 2.75 0.006 1.800 FCattle 4 1.095 0.290 3.78 <.001 2.990 * MESSAGE: s.e.s are based on dispersion parameter with value 1 Parameters for factors are differences compared with the reference level: Factor Reference level FCattle 1 When the model is refitted, constrained to model only the smaller classes, the following output is generated: 5630 RESTRICT FCattle;CONDITION=FCattle.LT.4 5631 "Modelling of binomial proportions. (e.g. by logits)." 5632 MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin 5633 TERMS [FACT=9] FCattle 82 5634 FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\ 5635 FCattle * MESSAGE: Term FCattle cannot be fully included in the model because 1 parameter is aliased with terms already in the model (FCattle 4) = 0 ***** Regression Analysis ***** Response variate: VFarmPos Binomial totals: N_Bin Distribution: Binomial Link function: Logit Fitted terms: Constant, FCattle *** Summary of analysis *** mean deviance approx d.f. deviance deviance ratio chi pr Regression 2 12.4 6.211 6.21 0.002 Residual 886 894.2 1.009 Total 888 906.6 1.021 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 * MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. estimate Constant -1.649 0.126 -13.08 <.001 0.1923 FCattle 2 0.587 0.192 3.06 0.002 1.799 FCattle 3 0.588 0.214 2.75 0.006 1.800 FCattle 4 0 * * * 1.000 * MESSAGE: s.e.s are based on dispersion parameter with value 1 Parameters for factors are differences compared with the reference level: Factor Reference level FCattle 1 The effect is still highly significant (p=0.002). Hence, FCattle is always to be preferred over N_F_Cattle. 
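As a reading aid for the "approx chi pr" column, the following Python sketch reproduces the quoted probability from the regression deviance and its degrees of freedom. It assumes (this is an interpretation of the output, not something the log states) that the probability is the upper tail of a chi-square distribution evaluated at the regression deviance, with the dispersion fixed at 1. The function name chi2_sf is hypothetical, and only the standard library is used.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for the two starting cases: df = 2 and df = 1.
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q

# Restricted FCattle fit: mean deviance 6.211 on 2 d.f., so deviance = 2 * 6.211.
p_restricted = chi2_sf(2 * 6.211, 2)
# Unrestricted FCattle fit: deviance 20.1 on 3 d.f.
p_full = chi2_sf(20.1, 3)

print(round(p_restricted, 3))
print(p_full < 0.001)
```

Under this assumption the restricted fit gives a probability that rounds to the 0.002 quoted in the text, and the unrestricted fit falls below the "<.001" threshold printed by Genstat.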
Similar considerations apply to N_Groups, where the tail of the distribution has a strong leverage on the model:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] N_Groups
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  N_Groups

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, N_Groups

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      1       4.0          4.044            4.04          0.044
Residual      950     993.0          1.045
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:

  Unit  Response  Leverage
    65      0.00    0.0461
    97      0.00    0.0104
   249      0.00    0.0087
   293      0.00    0.0104
   324      0.00    0.0123
   440      1.00    0.0594
   450      0.00    0.0797
   454      0.00    0.0087
   487      0.00    0.0072
   494      0.00    0.0087
   496      0.00    0.1141
   527      0.00    0.0166
   529      1.00    0.0123
   545      0.00    0.0277
   552      0.00    0.0217
   748      0.00    0.0123
   781      1.00    0.0123
   861      1.00    0.0217
   922      1.00    0.2307
   945      0.00    0.0104
   946      1.00    0.0087

*** Estimates of parameters ***

            estimate    s.e.    t(*)   t pr.  antilog of estimate
Constant      -1.417   0.104  -13.56   <.001               0.2426
N_Groups      0.0375  0.0185    2.02   0.043                1.038

* MESSAGE: s.e.s are based on dispersion parameter with value 1

Replacing N_Groups with GroupsCat gives rise to the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] GroupsCat
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  GroupsCat

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, GroupsCat

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      3       6.7          2.230            2.23          0.082
Residual      948     990.3          1.045
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:

  Unit  Response  Leverage
    50      0.00    0.0231
    65      0.00    0.0231
    97      0.00    0.0231
   172      0.00    0.0231
   249      0.00    0.0231
   254      1.00    0.0231
   285      0.00    0.0231
   293      0.00    0.0231
   324      0.00    0.0231
   330      0.00    0.0231
   331      0.00    0.0231
   440      1.00    0.0231
   450      0.00    0.0231
   454      0.00    0.0231
   459      0.00    0.0231
   460      1.00    0.0231
   487      0.00    0.0231
   494      0.00    0.0231
   496      0.00    0.0231
   520      1.00    0.0231
   527      0.00    0.0231
   529      1.00    0.0231
   545      0.00    0.0231
   552      0.00    0.0231
   599      1.00    0.0231
   667      0.00    0.0231
   688      1.00    0.0231
   692      0.00    0.0231
   709      0.00    0.0231
   748      0.00    0.0231
   761      0.00    0.0231
   775      0.00    0.0231
   781      1.00    0.0231
   813      1.00    0.0231
   839      0.00    0.0231
   857      1.00    0.0231
   861      1.00    0.0231
   864      1.00    0.0231
   901      0.00    0.0231
   922      1.00    0.0231
   945      0.00    0.0231
   946      1.00    0.0231
   952      0.00    0.0231

*** Estimates of parameters ***

              estimate   s.e.   t(*)   t pr.  antilog of estimate
Constant        -1.604  0.178  -9.03   <.001               0.2011
GroupsCat 2      0.391  0.203   1.92   0.054                1.478
GroupsCat 3      0.318  0.303   1.05   0.295                1.374
GroupsCat 4      0.876  0.370   2.37   0.018                2.401

* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
  Factor     Reference level
  GroupsCat  1

On first review, this model output may appear less acceptable than the first, since the number of high leverage observations is higher. However, these are precisely the observations allocated to the highest level of the factor. The true suitability of the model can again be examined by constraining the model to ignore this level.

RESTRICT GroupsCat;CONDITION=GroupsCat.LT.4
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] GroupsCat
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  GroupsCat

* MESSAGE: Term GroupsCat cannot be fully included in the model because 1 parameter is aliased with terms already in the model
  (GroupsCat 4) = 0

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, GroupsCat

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      2       3.9          1.935            1.94          0.144
Residual      906     936.1          1.033
Total         908     939.9          1.035

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses

*** Estimates of parameters ***

              estimate   s.e.   t(*)   t pr.  antilog of estimate
Constant        -1.604  0.178  -9.03   <.001               0.2011
GroupsCat 2      0.391  0.203   1.92   0.054                1.478
GroupsCat 3      0.318  0.303   1.05   0.295                1.374
GroupsCat 4          0      *      *       *                1.000

* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
  Factor     Reference level
  GroupsCat  1

Leverage is not a problem in this model, but much of the significance of the effects has been lost. The factor is therefore collapsed to two levels (RevGCat), separating farms with one group from farms with more than one:

GROUPS [LMETHOD=*;boundaries=upper] N_Groups; RevGCat; limits=!(1.5); LABELS=!T(One, More)
"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] RevGCat
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  RevGCat

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, RevGCat

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      1       4.6          4.581            4.58          0.032
Residual      950     992.4          1.045
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses

*** Estimates of parameters ***

            estimate   s.e.   t(*)   t pr.  antilog of estimate
Constant      -1.604  0.178  -9.03   <.001               0.2011
RevGCat 2      0.413  0.198   2.09   0.037                1.512

* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
  Factor   Reference level
  RevGCat  1

Hence, farms with more than one sampling group are more likely to exhibit positive samples (p=0.04). RevGCat is a more appropriate term to include in a model than N_Groups or GroupsCat.

Similar considerations apply to the N_Cattle and Cattle terms. N_Cattle is a significant variable, but some of the larger values exert a strong leverage on the results:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] N_Cattle
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  N_Cattle

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, N_Cattle

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      1       8.3          8.275            8.27          0.004
Residual      950     988.7          1.041
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:

  Unit  Response  Leverage
    62      1.00    0.0165
    70      1.00    0.0109
   182      0.00    0.0097
   200      0.00    0.0104
   201      0.00    0.0108
   310      0.00    0.0083
   370      0.00    0.0108
   418      0.00    0.0072
   444      1.00    0.0464
   460      1.00    0.0083
   494      0.00    0.0503
   496      0.00    0.0216
   527      0.00    0.1084
   599      1.00    0.0104
   651      1.00    0.0116
   680      0.00    0.0125
   737      0.00    0.0372
   748      0.00    0.0079
   750      1.00    0.0190
   761      0.00    0.0417
   763      0.00    0.0665
   769      0.00    0.0186
   884      1.00    0.0216

*** Estimates of parameters ***

            estimate      s.e.    t(*)   t pr.  antilog of estimate
Constant      -1.466     0.103  -14.18   <.001               0.2307
N_Cattle    0.001299  0.000446    2.91   0.004                1.001

* MESSAGE: s.e.s are based on dispersion parameter with value 1

Fitting Cattle gives similar results, but the leverage effects are confined to the larger two levels.

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Cattle
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Cattle

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Cattle

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      3      14.4          4.815            4.81          0.002
Residual      948     982.6          1.036
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:

  Unit  Response  Leverage
    62      1.00    0.0454
    70      1.00    0.0454
   165      0.00    0.0454
   182      0.00    0.0454
   200      0.00    0.0454
   201      0.00    0.0454
   284      1.00    0.0454
   310      0.00    0.0454
   348      0.00    0.0454
   370      0.00    0.0454
   418      0.00    0.0454
   437      0.00    0.0454
   444      1.00    0.1664
   460      1.00    0.0454
   494      0.00    0.1664
   496      0.00    0.0454
   527      0.00    0.1664
   599      1.00    0.0454
   603      1.00    0.0454
   651      1.00    0.0454
   680      0.00    0.0454
   737      0.00    0.1664
   748      0.00    0.0454
   750      1.00    0.0454
   761      0.00    0.1664
   763      0.00    0.1664
   769      0.00    0.0454
   884      1.00    0.0454

*** Estimates of parameters ***

           estimate   s.e.    t(*)   t pr.  antilog of estimate
Constant     -1.560  0.118  -13.24   <.001               0.2101
Cattle 2      0.514  0.162    3.18   0.001                1.672
Cattle 3      1.192  0.449    2.65   0.008                3.294
Cattle 4      -0.05   1.10   -0.04   0.964               0.9517

* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
  Factor  Reference level
  Cattle  1

However, the leverage issues are restricted to the largest two levels of the factor. Refitting the model, restricting the fit to lower levels, gives the following output:

RESTRICT Cattle;CONDITION=Cattle.LT.3
* MESSAGE: The structure Cattle is already restricted. Results may be unexpected.
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Cattle
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Cattle

* MESSAGE: Term Cattle cannot be fully included in the model because 2 parameters are aliased with terms already in the model
  (Cattle 3) = 0
  (Cattle 4) = 0

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Cattle

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      1      10.2         10.176           10.18          0.001
Residual      922     947.4          1.028
Total         923     957.6          1.037

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses

*** Estimates of parameters ***

           estimate   s.e.    t(*)   t pr.  antilog of estimate
Constant     -1.560  0.118  -13.24   <.001               0.2101
Cattle 2      0.514  0.162    3.18   0.001                1.672
Cattle 3          0      *       *       *                1.000
Cattle 4          0      *       *       *                1.000

* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
  Factor  Reference level
  Cattle  1

The Cattle factor is highly significant and well-fitting. It is therefore preferable to the N_Cattle variable.

Fitting N_Sam_Gr gives rise to the following output:

GROUPS [LMETHOD=*;boundaries=upper] N_Groups; RevGCat; limits=!(1.5); LABELS=!T(One, More)
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  N_Sam_Gr
***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, N_Sam_Gr

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      1      23.1         23.052           23.05          <.001
Residual      950     974.0          1.025
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:

  Unit  Response  Leverage
    18      0.00    0.0147
    54      0.00    0.0098
    59      0.00    0.0074
    61      1.00    0.0070
    70      1.00    0.0505
   107      0.00    0.0126
   123      0.00    0.0351
   149      0.00    0.0158
   167      0.00    0.0290
   267      1.00    0.0186
   363      0.00    0.0228
   413      0.00    0.0169
   440      1.00    0.0074
   503      0.00    0.0158
   510      0.00    0.0074
   532      1.00    0.0169
   544      0.00    0.0098
   578      1.00    0.0228
   584      1.00    0.0351
   603      1.00    0.0090
   609      0.00    0.0198
   620      1.00    0.0290
   637      1.00    0.0074
   681      1.00    0.0136
   703      1.00    0.0290
   743      0.00    0.0070
   781      1.00    0.0136
   831      0.00    0.0141
   838      1.00    0.0074
   891      0.00    0.0086
   906      0.00    0.0406
   924      0.00    0.0116

*** Estimates of parameters ***

            estimate     s.e.    t(*)   t pr.  antilog of estimate
Constant      -1.770    0.134  -13.23   <.001               0.1703
N_Sam_Gr     0.02106  0.00444    4.75   <.001                1.021

* MESSAGE: s.e.s are based on dispersion parameter with value 1

Again, many of the points have a high leverage: these are farms with particularly high numbers of animals. Examining the properties of N_Sam_Gr, we define a factor based on the quartiles of the distribution.
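The quartile-based categorisation can be illustrated with a short Python sketch. It uses the limits 11, 17 and 28 that appear in the GROUPS call and summary statistics that follow, and it assumes that boundaries=upper places a value equal to a limit in the lower class. That reading of the option, the function name to_factor and the example values are illustrative assumptions, not part of the original analysis.

```python
from bisect import bisect_left

def to_factor(value, upper_limits):
    """Return a 1-based level; level i holds values up to and including upper_limits[i - 1]."""
    # bisect_left counts the limits strictly below the value, so a value equal
    # to a limit stays in the lower class (the assumed meaning of boundaries=upper).
    return bisect_left(upper_limits, value) + 1

limits = [11, 17, 28]                    # lower quartile, median, upper quartile of N_Sam_Gr
examples = [2, 11, 12, 17, 28, 29, 177]  # hypothetical values, including the observed min and max
levels = [to_factor(v, limits) for v in examples]
print(levels)  # [1, 1, 2, 2, 3, 4, 4]
```

Using quartiles as limits gives four classes of roughly equal size, which is what keeps any single class from carrying the sparse upper tail on its own.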
DESCRIBE [SELECTION=nobs,nmv,mean,median,min,max,q1,q3] N_Sam_Gr

Summary statistics for N_Sam_Gr

  Number of observations = 952
  Number of missing values = 0
  Mean = 21.85
  Median = 17.00
  Minimum = 2.00
  Maximum = 177.00
  Lower quartile = 11.00
  Upper quartile = 28.00

GROUPS [LMETHOD=*;boundaries=upper] N_Sam_Gr; SamGrF; limits=!(11,17,28)
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  SamGrF

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, SamGrF

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      3      32.2         10.728           10.73          <.001
Residual      948     964.8          1.018
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses

*** Estimates of parameters ***

           estimate   s.e.    t(*)   t pr.  antilog of estimate
Constant     -2.089  0.204  -10.24   <.001               0.1239
SamGrF 2      0.836  0.257    3.25   0.001                2.307
SamGrF 3      0.847  0.256    3.31   <.001                2.332
SamGrF 4      1.330  0.248    5.37   <.001                3.782

* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
  Factor  Reference level
  SamGrF  1

This factor fits well and is extremely statistically significant. Hence, SamGrF is preferred to N_Sam_Gr for further analysis.

Since Natural was such an important factor in the levels of shedding analysis, and the observed p-value in this analysis was only marginally above 0.1, it is worthwhile to review the effect of this factor in more depth.
Restricting attention to unhoused animals (only 7 farms had housed animals and a natural source of water), and using the factor Natural2 to review the effect of natural water supplies on unhoused animals only, the observed p-value increases to 0.12. Hence, this factor is not considered for inclusion in the multifactor model.

Returning to the measures of herd size, FCattle, RevGCat, SamGrF and Cattle are the preferred factors for further review, with the other factors being removed primarily for reasons of model fit. These factors all associate a higher risk of shedding being identified on a farm with larger numbers of cattle. Exploring the FCattle/RevGCat/SamGrF/Cattle complex using forward stepwise selection, with the Akaike information criterion used to select candidates for inclusion/exclusion, we generate the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
RSEARCH [PRINT=model,results; METHOD=fstepwise; CONSTANT=estimate; FACTORIAL=3; DENOMINATOR=ss;\
  INRATIO=1; OUTRATIO=1; MAXCYCLE=50; AFACTORIAL=2; CRITERION=aic; EXTRA=cp; NTERMS=60;\
  NBESTMODELS=8] FCattle+RevGCat+SamGrF+Cattle

***** Model Selection *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Number of units: 952
Forced terms: Constant
Forced df: 1
Free terms: FCattle + RevGCat + SamGrF + Cattle

*** Stepwise (forward) analysis of deviance ***

Change       d.f.  deviance  mean deviance  deviance ratio  approx chi pr
+ SamGrF        3    32.184         10.728           10.73          <.001
+ Cattle        3    10.905          3.635            3.63          0.012
+ FCattle       3     6.721          2.240            2.24          0.081
Residual      942   947.210          1.006
Total         951   997.020          1.048

Final model: Constant + SamGrF + Cattle + FCattle

The factor categorising the numbers of sampling groups is the most relevant, but the factor categorising the number of cattle on the farm also shows signs of strong statistical significance.
The factor categorising the number of finishing cattle shows signs of statistical significance, even in the presence of the preceding two factors. Only the factor categorising the total numbers of groups of cattle on the farm is found to lack any real statistical significance. On this basis, each of the factors FCattle, SamGrF and Cattle should be candidates for inclusion in the multivariate model.

Considering Source and NewSource as candidate factors, fitting Source (the basic data) gives the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Source

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Source

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      2       4.7          2.341            2.34          0.096
Residual      949     992.3          1.046
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses

*** Estimates of parameters ***

              estimate   s.e.    t(*)   t pr.  antilog of estimate
Constant        -1.418  0.103  -13.73   <.001               0.2422
Source Buy       0.326  0.190    1.71   0.087                1.385
Source Both      0.372  0.212    1.75   0.080                1.451

* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
  Factor  Reference level
  Source  Breed

The factor shows a moderate level of statistical significance, but this is entirely due to the differences between the class of farms which never buy replacement cattle on one hand, and those which buy or do both on the other. There is no evidence of any statistically significant difference between these latter two groups: t=0.16, p=0.87.
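To make the Source estimates easier to interpret, the following Python sketch converts them from the logit scale into odds ratios (the "antilog of estimate" column) and into fitted probabilities of a farm being positive. The estimates are copied from the output above; the variable names and the helper inv_logit are illustrative only.

```python
import math

def inv_logit(x):
    """Map a value on the logit scale back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

constant = -1.418  # logit for the reference level (Source = Breed)
effects = {"Breed": 0.0, "Buy": 0.326, "Both": 0.372}  # differences from the reference level

odds_ratio = {level: math.exp(b) for level, b in effects.items()}
fitted = {level: inv_logit(constant + b) for level, b in effects.items()}

for level in effects:
    print(level, round(odds_ratio[level], 3), round(fitted[level], 3))
```

On these figures the fitted probabilities are roughly 0.195 for farms that breed their own replacements, against roughly 0.251 and 0.260 for the "Buy" and "Both" classes, which shows how close the latter two classes are to each other.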
Hence, it would seem sensible to replace Source with a new factor, NewSource, which consolidates the farms into a single ‘Open’ class and a ‘Closed’ class. Fitting this factor gives:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  NewSource

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, NewSource

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      1       4.6          4.647            4.65          0.031
Residual      950     992.4          1.045
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses

*** Estimates of parameters ***

              estimate   s.e.    t(*)   t pr.  antilog of estimate
Constant        -1.418  0.103  -13.73   <.001               0.2422
NewSource 2      0.346  0.159    2.17   0.030                1.413

* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
  Factor     Reference level
  NewSource  1

Farms which never buy in replacement cattle have a statistically significantly (p=0.03) lower risk of exhibiting a shedding animal than those which occasionally or frequently buy animals in. NewSource will be a candidate factor in the multivariate analysis.

BeefonDairy is a variable defined after close consideration of the properties of the dataset, in particular, Breed and Manage_O. Breed shows some evidence of significance in the bivariate analysis, but there is also evidence that the effect is confined to a subset of farms. Manage_O exhibits no evidence of significant differences in prevalence, but is important in understanding the patterns seen in Breed.
Fitting Breed as a main effect gives the following output:

"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Breed

* MESSAGE: Term Breed cannot be fully included in the model because 1 parameter is aliased with terms already in the model
  (Breed B_D) = 0

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Breed

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression      5      12.6          2.513            2.51          0.028
Residual      946     984.5          1.041
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: intermediate responses are more variable than small or large responses
* MESSAGE: The following units have high leverage:

  Unit  Response  Leverage
     7      1.00     0.091
     8      0.00     0.024
    17      1.00     0.091
    60      0.00     0.024
    87      0.00     0.024
   101      1.00     0.091
   110      0.00     0.024
   113      0.00     0.091
   116      0.00     0.091
   118      0.00     0.091
   184      0.00     0.024
   185      0.00     0.091
   223      0.00     0.024
   280      0.00     0.024
   291      0.00     0.024
   306      0.00     0.024
   314      0.00     0.024
   338      0.00     0.024
   345      0.00     0.024
   350      0.00     0.024
   447      0.00     0.091
   479      0.00     0.024
   485      0.00     0.024
   494      0.00     0.024
   542      0.00     0.024
   593      0.00     0.024
   595      0.00     0.024
   596      0.00     0.024
   598      0.00     0.024
   599      1.00     0.024
   600      0.00     0.166
   607      0.00     0.024
   619      0.00     0.024
   620      1.00     0.166
   637      1.00     0.166
   645      0.00     0.024
   646      1.00     0.024
   661      0.00     0.024
   688      1.00     0.166
   702      0.00     0.024
   708      0.00     0.024
   725      0.00     0.024
   728      0.00     0.024
   729      0.00     0.024
   735      0.00     0.024
   747      0.00     0.024
   755      0.00     0.091
   762      0.00     0.024
   813      1.00     0.024
   825      1.00     0.024
   826      0.00     0.024
   856      0.00     0.024
   859      1.00     0.091
   864      1.00     0.024
   884      1.00     0.166
   896      0.00     0.024
   911      0.00     0.166
   951      0.00     0.024
   952      0.00     0.091

*** Estimates of parameters ***

               estimate    s.e.    t(*)   t pr.  antilog of estimate
Constant        -1.2595  0.0872  -14.44   <.001               0.2838
Breed DB         -0.532   0.352   -1.51   0.131               0.5873
Breed D           0.700   0.632    1.11   0.268                2.014
Breed B_DB        0.182   0.301    0.60   0.546                1.200
Breed DB_D       -0.742   0.484   -1.53   0.126               0.4762
Breed B_D             0       *       *       *                1.000
Breed B_D_DB      1.953   0.868    2.25   0.024                7.047

* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
  Factor  Reference level
  Breed   B

However, the pattern is very different on different types of farm. Tabulating the number of farms and number of positive farms with respect to their recorded values for Breed and Manage_O gives the following results (the numbers of farms recorded as “Mixed” are too small for any statistical analysis, and are excluded; no animals were recorded as “B_D”):

Number     Dairy  Beef  Other
B             11   576    173
DB            59     3      8
D             11     -      -
B_DB          25    18     18
DB_D          42     -      -
B_D_DB         5     1      -

Positives  Dairy  Beef  Other
B              6   123     39
DB             9     0      1
D              4     -      -
B_DB           6     6      4
DB_D           5     -      -
B_D_DB         3     1      -

The means and marginal means for these tables are given by:

           Dairy   Beef  Other    All
B          0.545  0.214  0.225  0.221
DB         0.153  0.000  0.125  0.143
D          0.364      -      -  0.364
B_DB       0.240  0.333  0.222  0.262
DB_D       0.119      -      -  0.119
B_D_DB     0.600  1.000      -  0.667
All        0.216  0.217  0.221  0.218

Overall, there are clearly no significant differences between the mean prevalences on the different classes of farm. Moreover, there is no clear evidence of any differences in the prevalence rates for different breeds on beef farms, and no evidence of any differences in the prevalence rates for different breeds on ‘Other’ farms. Similarly, for every breed except beef animals, there is no evidence of any differences in prevalence for the breed on different types of farm.
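The table of means can be recomputed directly from the two count tables, which is a useful check on the transcription. The Python sketch below is a minimal illustration: the counts are copied from the tables above, cells shown as "-" are omitted, and the function names are hypothetical.

```python
# Number of farms and number of positive farms by Breed (rows) and Manage_O (columns).
number = {
    "B":      {"Dairy": 11, "Beef": 576, "Other": 173},
    "DB":     {"Dairy": 59, "Beef": 3, "Other": 8},
    "D":      {"Dairy": 11},
    "B_DB":   {"Dairy": 25, "Beef": 18, "Other": 18},
    "DB_D":   {"Dairy": 42},
    "B_D_DB": {"Dairy": 5, "Beef": 1},
}
positives = {
    "B":      {"Dairy": 6, "Beef": 123, "Other": 39},
    "DB":     {"Dairy": 9, "Beef": 0, "Other": 1},
    "D":      {"Dairy": 4},
    "B_DB":   {"Dairy": 6, "Beef": 6, "Other": 4},
    "DB_D":   {"Dairy": 5},
    "B_D_DB": {"Dairy": 3, "Beef": 1},
}

def cell_mean(breed, farm_type):
    """Proportion of positive farms in one Breed x Manage_O cell."""
    return positives[breed][farm_type] / number[breed][farm_type]

def breed_mean(breed):
    """Marginal proportion for a breed, pooled over farm types."""
    return sum(positives[breed].values()) / sum(number[breed].values())

def farm_type_mean(farm_type):
    """Marginal proportion for a farm type, pooled over breeds."""
    pos = sum(row.get(farm_type, 0) for row in positives.values())
    tot = sum(row.get(farm_type, 0) for row in number.values())
    return pos / tot

total_pos = sum(sum(row.values()) for row in positives.values())
total_farms = sum(sum(row.values()) for row in number.values())
overall = total_pos / total_farms

for breed in number:
    print(breed, round(breed_mean(breed), 3))
print("All", round(overall, 3))
```

The marginal means are pooled proportions (total positives over total farms), not averages of the cell means, which is why the "All" row and column are dominated by the large beef-breed classes.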
However, an attempt to fit the interaction of Breed and Manage_O to the prevalence data gives the following output:

"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed.Manage_O
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Breed.Manage_O

* MESSAGE: Term Breed.Manage_O cannot be fully included in the model because 14 parameters are aliased with terms already in the model
  (Breed B .Manage_O Mixed) = 0
  (Breed DB .Manage_O Mixed) = 0
  (Breed D .Manage_O Beef) = 0
  (Breed D .Manage_O Other) = 0
  (Breed D .Manage_O Mixed) = 0
  (Breed DB_D .Manage_O Beef) = 0
  (Breed DB_D .Manage_O Other) = 0
  (Breed DB_D .Manage_O Mixed) = 0
  (Breed B_D .Manage_O Dairy) = 0
  (Breed B_D .Manage_O Beef) = 0
  (Breed B_D .Manage_O Other) = 0
  (Breed B_D .Manage_O Mixed) = 0
  (Breed B_D_DB .Manage_O Other) = 0
  (Breed B_D_DB .Manage_O Mixed) = 0

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant + Breed.Manage_O

*** Summary of analysis ***

             d.f.  deviance  mean deviance  deviance ratio  approx chi pr
Regression     13      22.0          1.689            1.69          0.056
Residual      938     975.1          1.040
Total         951     997.0          1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The following units have high leverage:

  Unit  Response  Leverage
     5      0.00     0.125
     7      1.00     0.091
     9      0.00     0.056
    17      1.00     0.091
    30      1.00     0.056
    89      0.00     0.056
    93      0.00     0.123
   101      1.00     0.091
   113      0.00     0.091
   114      0.00     0.056
   116      0.00     0.091
   118      0.00     0.091
   131      1.00     0.091
   143      1.00     0.056
   148      0.00     0.123
   183      1.00     0.056
   185      0.00     0.091
   221      0.00     0.184
   222      0.00     0.056
   274      0.00     0.125
   297      0.00     0.091
   301      1.00     0.091
   316      0.00     0.125
   340      0.00     0.056
   343      1.00     0.056
   351      0.00     0.184
   384      0.00     0.091
   385      1.00     0.091
   391      0.00     0.125
   440      1.00     0.056
   441      0.00     0.125
   447      0.00     0.091
   461      0.00     0.056
   467      0.00     0.125
   469      0.00     0.056
   495      1.00     0.056
   497      0.00     0.091
   503      0.00     0.056
   544      0.00     0.123
   550      1.00     0.056
   572      1.00     0.125
   590      0.00     0.125
   600      0.00     0.200
   601      1.00     0.091
   602      0.00     0.091
   620      1.00     0.200
   629      0.00     0.056
   636      0.00     0.056
   637      1.00     0.200
   640      1.00     0.056
   660      0.00     0.056
   667      0.00     0.056
   688      1.00     0.369
   701      1.00     0.091
   751      0.00     0.056
   755      0.00     0.091
   767      0.00     0.056
   777      0.00     0.056
   788      1.00     0.091
   806      0.00     0.056
   809      1.00     0.056
   810      0.00     0.056
   812      0.00     0.056
   816      0.00     0.056
   835      0.00     0.056
   858      0.00     0.091
   859      1.00     0.091
   866      0.00     0.056
   867      1.00     0.056
   882      0.00     0.056
   884      1.00     0.200
   895      0.00     0.056
   906      0.00     0.056
   911      0.00     0.200
   912      0.00     0.056
   923      0.00     0.056
   952      0.00     0.091

*** Estimates of parameters ***

                               estimate   s.e.   t(*)  t pr.  antilog of estimate
Constant                          0.182  0.606   0.30  0.763                1.200
Breed B .Manage_O Beef           -1.486  0.614  -2.42  0.016               0.2263
Breed B .Manage_O Other          -1.417  0.632  -2.24  0.025               0.2425
Breed B .Manage_O Mixed               0      *      *      *                1.000
Breed DB .Manage_O Dairy         -1.897  0.706  -2.69  0.007               0.1500
Breed DB .Manage_O Beef           -6.75   9.36  -0.72  0.471             0.001175
Breed DB .Manage_O Other          -2.13   1.23  -1.73  0.083               0.1190
Breed DB .Manage_O Mixed              0      *      *      *                1.000
Breed D .Manage_O Dairy          -0.742  0.872  -0.85  0.395               0.4762
Breed D .Manage_O Beef                0      *      *      *                1.000
Breed D .Manage_O Other               0      *      *      *                1.000
Breed D .Manage_O Mixed               0      *      *      *                1.000
Breed B_DB .Manage_O Dairy       -1.335  0.765  -1.74  0.081               0.2632
Breed B_DB .Manage_O Beef        -0.875  0.785  -1.11  0.265               0.4167
Breed B_DB .Manage_O Other       -1.435  0.830  -1.73  0.084               0.2381
Breed B_DB .Manage_O Mixed         -6.7   11.5  -0.59  0.556             0.001175
Breed DB_D .Manage_O Dairy       -2.184  0.771  -2.83  0.005               0.1126
Breed DB_D .Manage_O Beef             0      *      *      *                1.000
Breed DB_D .Manage_O Other            0      *      *      *                1.000
Breed DB_D .Manage_O Mixed            0      *      *      *                1.000
Breed B_D .Manage_O Dairy             0      *      *      *                1.000
Breed B_D .Manage_O Beef              0      *      *      *                1.000
Breed B_D .Manage_O Other             0      *      *      *                1.000
Breed B_D .Manage_O Mixed             0      *      *      *                1.000
Breed B_D_DB .Manage_O Dairy       0.22   1.10   0.20  0.839                1.250
Breed B_D_DB .Manage_O Beef        5.04   8.33   0.60  0.545                153.9
Breed B_D_DB .Manage_O Other          0      *      *      *                1.000
Breed B_D_DB .Manage_O Mixed          0      *      *      *                1.000

* MESSAGE: s.e.s are based on dispersion parameter with value 1

The model fit is extremely messy: many of the terms are aliased, and the leverage situation is extremely complicated. The model fit has a p-value of 0.056, which is not quite formally significant, but is rather impressive given that 13 degrees of freedom have been used to fit interaction terms when we believe that only one term is likely to be significant.
As noted earlier, there is no evidence of any pattern as a function of breed in the beef and ‘Other’ herds. Hence it might be informative to examine the output from fitting Breed to only the Dairy herds:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Breed

* MESSAGE: Term Breed cannot be fully included in the model because 1 parameter is aliased with terms already in the model
  (Breed B_D) = 0

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Breed

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       5      14.6    2.9249      2.92   0.012
Residual       147     144.9    0.9859
Total          152     159.5    1.0496
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
   600      0.00     0.199
   620      1.00     0.199
   637      1.00     0.199
   884      1.00     0.199
   911      0.00     0.199

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant            0.182    0.605     0.30   0.763      1.200
Breed DB           -1.897    0.705    -2.69   0.007     0.1500
Breed D            -0.742    0.870    -0.85   0.394     0.4762
Breed B_DB         -1.335    0.764    -1.75   0.081     0.2632
Breed DB_D         -2.184    0.770    -2.84   0.005     0.1126
Breed B_D               0        *        *       *      1.000
Breed B_D_DB         0.22     1.09     0.20   0.838      1.250
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor  Reference level
  Breed   B

The resulting model is statistically significant (p=0.01).
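The estimates above are on the logit scale, with the "antilog of estimate" column giving the corresponding odds (for the constant) or odds ratios relative to Breed B. The following sketch, in Python rather than GenStat, back-transforms the printed estimates into fitted farm-level prevalences for dairy herds. The names inv_logit, effects and fitted are our own, and the aliased B_D level is omitted because no estimate is available for it.

```python
import math

def inv_logit(eta):
    """Back-transform a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Printed estimates from the dairy-only fit; Breed B is the reference level.
constant = 0.182
effects = {
    "DB": -1.897,
    "D": -0.742,
    "B_DB": -1.335,
    "DB_D": -2.184,
    "B_D_DB": 0.22,
}

# Fitted probability that a dairy farm of each breed class is positive.
fitted = {"B": inv_logit(constant)}
for breed, estimate in effects.items():
    fitted[breed] = inv_logit(constant + estimate)

for breed, prevalence in fitted.items():
    print(f"{breed:7s} {prevalence:.3f}")

# The antilog column is simply exp(estimate), e.g. the odds ratio for DB vs B.
odds_ratio_db = math.exp(effects["DB"])
```

On this scale the contrast is easier to see: the B and B_D_DB classes have fitted prevalences above one half, while the remaining classes lie well below.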
It may be informative to examine confidence intervals for the mean prevalences for different breeds in dairy herds:

[Figure: confidence intervals for Mean Prevalence (vertical axis, 0 to 1) by Breed of Animal (B, DB, D, B_D, DB_D, B_D_DB) in dairy herds]

Restricting attention only to animals outwith the B or B_D_DB classes, the following output is generated:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Breed

* MESSAGE: Term Breed cannot be fully included in the model because 3 parameters are aliased with terms already in the model
  (Breed B) = 0
  (Breed B_D) = 0
  (Breed B_D_DB) = 0

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Breed

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       3       4.1    1.3683      1.37   0.250
Residual       133     123.0    0.9251
Total          136     127.1    0.9348
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
     7      1.00     0.091
    17      1.00     0.091
   101      1.00     0.091
   113      0.00     0.091
   116      0.00     0.091
   118      0.00     0.091
   185      0.00     0.091
   447      0.00     0.091
   755      0.00     0.091
   859      1.00     0.091
   952      0.00     0.091

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant           -1.715    0.362    -4.74   <.001     0.1800
Breed B                 0        *        *       *      1.000
Breed D             1.155    0.723     1.60   0.110      3.175
Breed B_DB          0.562    0.591     0.95   0.341      1.754
Breed DB_D         -0.287    0.598    -0.48   0.632     0.7508
Breed B_D               0        *        *       *      1.000
Breed B_D_DB            0        *        *       *      1.000
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor  Reference level
  Breed   DB

There is no evidence of any differences between the prevalences on these classes of farms. Examining the B and B_D_DB classes, tabulating their positive and negative values and carrying out a Fisher’s Exact test, we get:

FEXACT2X2 [PRINT=prob] C1

***** Fisher's Exact Test *****

One-tailed significance level                     0.635
  Mid-P value                                     0.433

Two-tailed significance level
  Two times one-tailed significance level         1.269
  Mid-P value                                     0.865
  Sum of all outcomes with Prob<=Observed         1.000
  Mid-P value                                     0.798

There is no evidence of any difference in prevalence between the B and B_D_DB classes in dairy herds. However, fitting a model only to dairy herds, while excluding the beef class, gives the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Breed

* MESSAGE: Term Breed cannot be fully included in the model because 2 parameters are aliased with terms already in the model
  (Breed B) = 0
  (Breed B_D) = 0

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Breed

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       4       8.4    2.0954      2.10   0.079
Residual       137     129.8    0.9472
Total          141     138.1    0.9798
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
   600      0.00     0.199
   620      1.00     0.199
   637      1.00     0.199
   884      1.00     0.199
   911      0.00     0.199

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant           -1.715    0.362    -4.74   <.001     0.1800
Breed B                 0        *        *       *      1.000
Breed D             1.155    0.723     1.60   0.110      3.175
Breed B_DB          0.562    0.591     0.95   0.341      1.754
Breed DB_D         -0.287    0.598    -0.48   0.632     0.7508
Breed B_D               0        *        *       *      1.000
Breed B_D_DB        2.120    0.980     2.16   0.031      8.333
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor  Reference level
  Breed   DB

Hence, although the prevalence in group B_D_DB is higher, strictly speaking it is not statistically significantly higher than in the lower classes (p=0.08). However, the sample size is extremely small, and the comparison will have lacked power.

The greatest danger in this exercise is to overtrawl the data. The overall effect of fitting the Manage_O by Breed interaction was close to formal statistical significance. Hence, invoking the overall test as a type of Fisher test for multiple comparisons, we are not unjustified in investigating the properties of individual interaction terms. However, it would seem unwise to be overly liberal in then assigning importance to extremely small samples from the data which actually lack formal statistical significance. In addition, the effect of beef animals on dairy herds appears to be specific to this type of farm.
It is impossible to have the same confidence about the properties of the B_D_DB class, since the sample size in anything but the dairy herd is negligible. In conclusion, it seems rational to create a new variable, BeefonDairy, to identify those farms with beef animals and a dairy management system. Fitting this variable gives the following results:

"Modelling of binomial proportions. (e.g. by logits)."
MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] BeefonDairy
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  BeefonDairy

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, BeefonDairy

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       1       5.7     5.685      5.69   0.017
Residual       950     991.3     1.044
Total          951     997.0     1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
   131      1.00    0.0908
   297      0.00    0.0908
   301      1.00    0.0908
   384      0.00    0.0908
   385      1.00    0.0908
   497      0.00    0.0908
   601      1.00    0.0908
   602      0.00    0.0908
   701      1.00    0.0908
   788      1.00    0.0908
   858      0.00    0.0908

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant          -1.3033   0.0794   -16.42   <.001     0.2716
BeefonDairy 1       1.486    0.610     2.43   0.015      4.418
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor       Reference level
  BeefonDairy  0

Farms in this class appear to have a significantly (p=0.02) higher prevalence.
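To give a sense of the uncertainty attached to this effect, the estimate and standard error can be converted into an odds ratio with an approximate interval. The sketch below is in Python rather than GenStat and computes a Wald-type 95% interval, which is our own addition: the report itself does not quote an interval, and with only 11 farms in the BeefonDairy class the normal approximation should be treated with caution. The function name odds_ratio_ci is hypothetical.

```python
import math

def odds_ratio_ci(estimate, se, z=1.96):
    """Odds ratio and approximate Wald interval from a logit-scale estimate.

    Illustrative helper: exponentiates estimate +/- z * se.
    """
    return (
        math.exp(estimate),
        math.exp(estimate - z * se),
        math.exp(estimate + z * se),
    )

# Printed estimate and standard error for BeefonDairy 1.
beef_or, beef_lo, beef_hi = odds_ratio_ci(1.486, 0.610)
print(f"odds ratio {beef_or:.2f}, approx. 95% interval ({beef_lo:.2f}, {beef_hi:.2f})")
```

The interval excludes one, in line with the reported p-value, but it is very wide, which reflects the small number of farms in this class.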
However, care must be taken over interpreting this factor, since it is derived from an extensive examination of the properties of the dataset. Nevertheless, BeefonDairy should clearly be incorporated into the multivariate analysis. Fitting a model with both BeefonDairy and Breed as main effects, we generate the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] BeefonDairy+Breed
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  BeefonDairy+Breed

* MESSAGE: Term Breed cannot be fully included in the model because 1 parameter is aliased with terms already in the model
  (Breed B_D) = 0

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant + BeefonDairy + Breed

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       6      18.1     3.019      3.02   0.006
Residual       945     978.9     1.036
Total          951     997.0     1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
     7      1.00     0.091
    17      1.00     0.091
   101      1.00     0.091
   113      0.00     0.091
   116      0.00     0.091
   118      0.00     0.091
   131      1.00     0.091
   185      0.00     0.091
   297      0.00     0.091
   301      1.00     0.091
   384      0.00     0.091
   385      1.00     0.091
   447      0.00     0.091
   497      0.00     0.091
   600      0.00     0.166
   601      1.00     0.091
   602      0.00     0.091
   620      1.00     0.166
   637      1.00     0.166
   688      1.00     0.166
   701      1.00     0.091
   755      0.00     0.091
   788      1.00     0.091
   858      0.00     0.091
   859      1.00     0.091
   884      1.00     0.166
   911      0.00     0.166
   952      0.00     0.091

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant           -1.792    0.341    -5.25   <.001     0.1667
BeefonDairy 1       1.470    0.611     2.40   0.016      4.348
Breed B             0.504    0.353     1.43   0.153      1.656
Breed D             1.232    0.713     1.73   0.084      3.429
Breed B_DB          0.714    0.447     1.60   0.110      2.043
Breed DB_D         -0.210    0.586    -0.36   0.721     0.8108
Breed B_D               0        *        *       *      1.000
Breed B_D_DB        2.485    0.928     2.68   0.007      12.00
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor       Reference level
  BeefonDairy  0
  Breed        DB

DROP [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes] Breed

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant + BeefonDairy

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       1       5.7     5.685      5.69   0.017
Residual       950     991.3     1.044
Total          951     997.0     1.048
Change           5      12.4     2.486      2.49   0.029
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
   131      1.00    0.0908
   297      0.00    0.0908
   301      1.00    0.0908
   384      0.00    0.0908
   385      1.00    0.0908
   497      0.00    0.0908
   601      1.00    0.0908
   602      0.00    0.0908
   701      1.00    0.0908
   788      1.00    0.0908
   858      0.00    0.0908

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant          -1.3033   0.0794   -16.42   <.001     0.2716
BeefonDairy 1       1.486    0.610     2.43   0.015      4.418
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor       Reference level
  BeefonDairy  0

Both Breed (p=0.03) and BeefonDairy (p=0.02) formally explain a significant amount of the variability in the dataset.
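The p-value for Breed quoted here comes from the "Change" line of the DROP output, which compares the two nested models. The sketch below, in Python rather than GenStat, reproduces that comparison from the two printed residual deviances. The helper chi2_sf is our own (a standard recurrence for the chi-squared upper tail with integer degrees of freedom), and because the printed deviances are rounded the result only approximates the reported 0.029.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for chi-squared with integer df and x > 0 (illustrative helper).

    Builds Q(df/2, x/2) upwards from Q(1, z) = exp(-z) (even df) or
    Q(1/2, z) = erfc(sqrt(z)) (odd df), adding z**a * exp(-z) / Gamma(a + 1)
    at each step.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Residual deviance and d.f. for the two nested fits, as printed above.
full = {"deviance": 978.9, "df": 945}     # Constant + BeefonDairy + Breed
reduced = {"deviance": 991.3, "df": 950}  # Constant + BeefonDairy

change_dev = reduced["deviance"] - full["deviance"]
change_df = reduced["df"] - full["df"]
p_breed = chi2_sf(change_dev, change_df)
print(f"change in deviance {change_dev:.1f} on {change_df} d.f., p = {p_breed:.3f}")
```

The same calculation with the roles of the terms reversed would give the test for BeefonDairy adjusted for Breed, although the output needed for that comparison is not shown in this section.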
The latter is expected, but the former result deserves further attention. Unsurprisingly, the effect is completely driven by the B_D_DB level of the factor. This small group of 6 farms has a much higher prevalence. Leverage is a problem, but it would seem reasonable to define a new factor based exclusively around this breed, and include it in the multivariate analysis. Fitting Breed2 gives the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Breed2
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Breed2

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Breed2

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       1       5.6     5.595      5.59   0.018
Residual       950     991.4     1.044
Total          951     997.0     1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
   600      0.00    0.1656
   620      1.00    0.1656
   637      1.00    0.1656
   688      1.00    0.1656
   884      1.00    0.1656
   911      0.00    0.1656

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant          -1.2975   0.0790   -16.42   <.001     0.2732
Breed2 1            1.991    0.867     2.30   0.022      7.320
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor  Reference level
  Breed2  0

Breed2 is therefore included in the multivariate analysis.

When investigating the properties of the factors Grass_Manure and Grass_Slurry, it is important to remember that these questions were, for the most part, only asked of farms where the animals were at pasture.
Only 3 farms with housed animals recorded an answer to the questions about the properties of their pasture. Tabulating out the properties by Housing and slurry status gives the following tables:

Number of Farms
  Housed   No Slurry   Yes Slurry   Blank
  0              308           77       0
  1                3            0     563

Number Positive
  Housed   No Slurry   Yes Slurry   Blank
  0               53           27       -
  1                0            -     126

Fraction Positive
  Housed   No Slurry   Yes Slurry   Blank
  0            0.172        0.351       -
  1            0.000            -   0.224

The effect is clearly not just due to differences between housed and unhoused farms. Fitting the GLM gives the following output (the effect of the small number of housed farms which have non-blank returns will be small and hence will be ignored for the moment):

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Gra_Slur
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Gra_Slur

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Gra_Slur

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       2      11.6     5.777      5.78   0.003
Residual       948     982.4     1.036
Total          950     994.0     1.046
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
    46      0.00    0.0129
    51      1.00    0.0129
    53      1.00    0.0129
    55      1.00    0.0129
    61      1.00    0.0129
    63      0.00    0.0129
    80      1.00    0.0129
    83      0.00    0.0129
    84      0.00    0.0129
    86      0.00    0.0129
    87      0.00    0.0129
    92      0.00    0.0129
   100      1.00    0.0129
   110      0.00    0.0129
   116      0.00    0.0129
   118      0.00    0.0129
   119      0.00    0.0129
   128      0.00    0.0129
   129      0.00    0.0129
   132      0.00    0.0129
   133      1.00    0.0129
   135      0.00    0.0129
   139      1.00    0.0129
   143      1.00    0.0129
   174      1.00    0.0129
   180      0.00    0.0129
   189      1.00    0.0129
   190      0.00    0.0129
   196      1.00    0.0129
   199      0.00    0.0129
   202      1.00    0.0129
   204      1.00    0.0129
   206      0.00    0.0129
   215      1.00    0.0129
   217      1.00    0.0129
   219      0.00    0.0129
   225      0.00    0.0129
   226      0.00    0.0129
   230      0.00    0.0129
   247      0.00    0.0129
   345      0.00    0.0129
   507      0.00    0.0129
   533      0.00    0.0129
   541      0.00    0.0129
   542      0.00    0.0129
   543      0.00    0.0129
   546      0.00    0.0129
   547      0.00    0.0129
   548      0.00    0.0129
   552      0.00    0.0129
   566      1.00    0.0129
   578      1.00    0.0129
   581      1.00    0.0129
   593      0.00    0.0129
   598      0.00    0.0129
   603      1.00    0.0129
   606      0.00    0.0129
   608      0.00    0.0129
   612      0.00    0.0129
   613      1.00    0.0129
   637      1.00    0.0129
   639      1.00    0.0129
   640      1.00    0.0129
   645      0.00    0.0129
   646      1.00    0.0129
   659      0.00    0.0129
   662      0.00    0.0129
   663      0.00    0.0129
   665      0.00    0.0129
   670      0.00    0.0129
   677      0.00    0.0129
   681      1.00    0.0129
   690      0.00    0.0129
   702      0.00    0.0129
   703      1.00    0.0129
   707      0.00    0.0129
   924      0.00    0.0129

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant           -1.583    0.151   -10.50   <.001     0.2054
Gra_Slur 1          0.967    0.282     3.43   <.001      2.629
Gra_Slur 999        0.339    0.181     1.87   0.061      1.404
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor    Reference level
  Gra_Slur  0

Among farms with animals at pasture, those which spread slurry on the grass are at a higher risk of presenting shedding animals than those which do not.

Considering Gra_Manure, we can generate the following tables:

Number of Farms
  Housed   No Manure   Yes Manure   Blank
  0              281          104       0
  1                3            0     563

Number Positive
  Housed   No Manure   Yes Manure   Blank
  0               67           13       -
  1                0            -     126

Fraction Positive
  Housed   No Manure   Yes Manure   Blank
  0            0.238        0.125       -
  1            0.000            -   0.224

Again, any significance due to this factor is clearly not just due to differences between housed and unhoused animals. In fact, the prevalences on housed farms and on unhoused farms which do not spread manure on pasture are virtually identical. The apparent effect is of unhoused farms which do spread manure having a lower prevalence. Fitting this as a GLM gives the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Gra_Manu
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Gra_Manu

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Gra_Manu

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       2       6.6     3.307      3.31   0.037
Residual       948     987.3     1.042
Total          950     994.0     1.046
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant           -1.175    0.139    -8.43   <.001     0.3088
Gra_Manu 1         -0.771    0.328    -2.35   0.019     0.4627
Gra_Manu 999       -0.068    0.172    -0.40   0.691     0.9338
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor    Reference level
  Gra_Manu  0

As indicated above, the significant effect (p=0.04) is associated with the spreading of manure on farms with unhoused animals, where farms which spread manure are less likely to present shedding animals.

It is necessary to investigate whether there is any confounding of effects occurring between Gra_Slurry and Gra_Manure. Tabulating out the properties of the dataset gives the following tables:

Number of Farms
                        Slurry
                        0       1
  Unhoused   Manure 0   241     40
             Manure 1   67      37
  Housed                563

Number Positive
                        Slurry
                        0       1
  Unhoused   Manure 0   49      18
             Manure 1   4       9
  Housed                126

Fraction Positive
                        Slurry
                        0       1
  Unhoused   Manure 0   0.203   0.450
             Manure 1   0.060   0.243
  Housed                0.224

All the groups have reasonable support in the data, and it is clear that the Slurry and Manure effects both appear to be operating on unhoused animals.
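The cross-tabulation of slurry and manure status for unhoused farms can be checked, and the direction of each effect within each level of the other examined, with a few lines of arithmetic. The sketch below is in Python rather than GenStat and uses the counts tabulated above. The dictionary layout and names are our own, and the within-stratum odds ratios are raw ratios from the table, not GLM estimates, so they should not be expected to match the fitted model output exactly.

```python
# Counts for unhoused farms, keyed by (manure, slurry) status.
farms = {(0, 0): 241, (0, 1): 40, (1, 0): 67, (1, 1): 37}
positive = {(0, 0): 49, (0, 1): 18, (1, 0): 4, (1, 1): 9}

# Fraction of farms positive in each cell of the table.
fraction = {cell: positive[cell] / farms[cell] for cell in farms}

def odds(cell):
    """Odds of a farm in this cell being positive (positives / negatives)."""
    pos = positive[cell]
    return pos / (farms[cell] - pos)

# Odds ratio for slurry within each manure stratum, and vice versa.
slurry_or = {m: odds((m, 1)) / odds((m, 0)) for m in (0, 1)}
manure_or = {s: odds((1, s)) / odds((0, s)) for s in (0, 1)}

for cell in sorted(fraction):
    print(f"manure={cell[0]} slurry={cell[1]} fraction positive {fraction[cell]:.3f}")
print("slurry odds ratios by manure stratum:", slurry_or)
print("manure odds ratios by slurry stratum:", manure_or)
```

In both manure strata the slurry odds ratio is above one, and in both slurry strata the manure odds ratio is below one, which is the pattern described in the text.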
Fitting both terms in the same GLM gives the following results (aliasing, mainly due to the blank coding in both factors for most housed farms, makes the output messy but will not affect the main estimates of interest):

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Gra_Manu*Gra_Slur
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Gra_Manu*Gra_Slur

* MESSAGE: Term Gra_Slur cannot be fully included in the model because 1 parameter is aliased with terms already in the model
  (Gra_Slur 999) = (Gra_Manu 999)
* MESSAGE: Term Gra_Manu.Gra_Slur cannot be fully included in the model because 3 parameters are aliased with terms already in the model
  (Gra_Manu 1 .Gra_Slur 999) = 0
  (Gra_Manu 999 .Gra_Slur 1) = 0
  (Gra_Manu 999 .Gra_Slur 999) = (Gra_Manu 999)

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant + Gra_Manu + Gra_Slur + Gra_Manu.Gra_Slur

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       4      24.1     6.034      6.03   <.001
Residual       946     969.8     1.025
Total          950     994.0     1.046
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The residuals do not appear to be random; for example, fitted values in the range 0.22 to 0.22 are consistently larger than observed values and fitted values in the range 0.45 to 0.45 are consistently smaller than observed values
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
    46      0.00    0.0269
    51      1.00    0.0250
    53      1.00    0.0250
    55      1.00    0.0250
    61      1.00    0.0269
    63      0.00    0.0250
    80      1.00    0.0269
    83      0.00    0.0250
    84      0.00    0.0269
    86      0.00    0.0269
    87      0.00    0.0250
    92      0.00    0.0269
   100      1.00    0.0250
   110      0.00    0.0250
   116      0.00    0.0269
   118      0.00    0.0269
   119      0.00    0.0269
   128      0.00    0.0250
   129      0.00    0.0269
   132      0.00    0.0269
   133      1.00    0.0250
   135      0.00    0.0250
   139      1.00    0.0269
   143      1.00    0.0250
   174      1.00    0.0269
   180      0.00    0.0269
   189      1.00    0.0250
   190      0.00    0.0269
   196      1.00    0.0269
   199      0.00    0.0269
   202      1.00    0.0250
   204      1.00    0.0250
   206      0.00    0.0269
   215      1.00    0.0250
   217      1.00    0.0250
   219      0.00    0.0269
   225      0.00    0.0269
   226      0.00    0.0250
   230      0.00    0.0250
   247      0.00    0.0250
   345      0.00    0.0250
   507      0.00    0.0269
   533      0.00    0.0250
   541      0.00    0.0269
   542      0.00    0.0250
   543      0.00    0.0250
   546      0.00    0.0250
   547      0.00    0.0250
   548      0.00    0.0250
   552      0.00    0.0269
   566      1.00    0.0250
   578      1.00    0.0250
   581      1.00    0.0269
   593      0.00    0.0250
   598      0.00    0.0250
   603      1.00    0.0269
   606      0.00    0.0269
   608      0.00    0.0250
   612      0.00    0.0250
   613      1.00    0.0250
   637      1.00    0.0269
   639      1.00    0.0250
   640      1.00    0.0269
   645      0.00    0.0269
   646      1.00    0.0250
   659      0.00    0.0250
   662      0.00    0.0269
   663      0.00    0.0269
   665      0.00    0.0269
   670      0.00    0.0269
   677      0.00    0.0269
   681      1.00    0.0250
   690      0.00    0.0250
   702      0.00    0.0269
   703      1.00    0.0250
   707      0.00    0.0269
   924      0.00    0.0269

*** Estimates of parameters ***

                                                                antilog of
                             estimate     s.e.     t(*)   t pr.   estimate
Constant                       -1.381    0.159    -8.66   <.001     0.2513
Gra_Manu 1                     -1.376    0.539    -2.55   0.011     0.2527
Gra_Manu 999                    0.138    0.189     0.73   0.466      1.147
Gra_Slur 1                      1.180    0.356     3.32   <.001      3.256
Gra_Slur 999                        0        *        *       *      1.000
Gra_Manu 1 .Gra_Slur 1          0.441    0.733     0.60   0.547      1.555
Gra_Manu 1 .Gra_Slur 999            0        *        *       *      1.000
Gra_Manu 999 .Gra_Slur 1            0        *        *       *      1.000
Gra_Manu 999 .Gra_Slur 999          0        *        *       *      1.000
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor    Reference level
  Gra_Manu  0
  Gra_Slur  0

DROP [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes] Gra_Manu.Gra_Slur

* MESSAGE: Term Gra_Slur cannot be fully included in the model because 1 parameter is aliased with terms already in the model
  (Gra_Slur 999) = (Gra_Manu 999)

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant + Gra_Manu + Gra_Slur

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       3      23.8     7.922      7.92   <.001
Residual       947     970.2     1.024
Total          950     994.0     1.046
Change           1       0.4     0.370      0.37   0.543
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The residuals do not appear to be random; for example, fitted values in the range 0.22 to 0.22 are consistently larger than observed values and fitted values in the range 0.47 to 0.47 are consistently smaller than observed values
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
    46      0.00    0.0185
    51      1.00    0.0200
    53      1.00    0.0200
    55      1.00    0.0200
    61      1.00    0.0185
    63      0.00    0.0200
    80      1.00    0.0185
    83      0.00    0.0200
    84      0.00    0.0185
    86      0.00    0.0185
    87      0.00    0.0200
    92      0.00    0.0185
   100      1.00    0.0200
   110      0.00    0.0200
   116      0.00    0.0185
   118      0.00    0.0185
   119      0.00    0.0185
   128      0.00    0.0200
   129      0.00    0.0185
   132      0.00    0.0185
   133      1.00    0.0200
   135      0.00    0.0200
   139      1.00    0.0185
   143      1.00    0.0200
   174      1.00    0.0185
   180      0.00    0.0185
   189      1.00    0.0200
   190      0.00    0.0185
   196      1.00    0.0185
   199      0.00    0.0185
   202      1.00    0.0200
   204      1.00    0.0200
   206      0.00    0.0185
   215      1.00    0.0200
   217      1.00    0.0200
   219      0.00    0.0185
   225      0.00    0.0185
   226      0.00    0.0200
   230      0.00    0.0200
   247      0.00    0.0200
   345      0.00    0.0200
   507      0.00    0.0185
   533      0.00    0.0200
   541      0.00    0.0185
   542      0.00    0.0200
   543      0.00    0.0200
   546      0.00    0.0200
   547      0.00    0.0200
   548      0.00    0.0200
   552      0.00    0.0185
   566      1.00    0.0200
   578      1.00    0.0200
   581      1.00    0.0185
   593      0.00    0.0200
   598      0.00    0.0200
   603      1.00    0.0185
   606      0.00    0.0185
   608      0.00    0.0200
   612      0.00    0.0200
   613      1.00    0.0200
   637      1.00    0.0185
   639      1.00    0.0200
   640      1.00    0.0185
   645      0.00    0.0185
   646      1.00    0.0200
   659      0.00    0.0200
   662      0.00    0.0185
   663      0.00    0.0185
   665      0.00    0.0185
   670      0.00    0.0185
   677      0.00    0.0185
   681      1.00    0.0200
   690      0.00    0.0200
   702      0.00    0.0185
   703      1.00    0.0200
   707      0.00    0.0185
   924      0.00    0.0185

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant           -1.403    0.156    -8.97   <.001     0.2459
Gra_Manu 1         -1.148    0.354    -3.24   0.001     0.3172
Gra_Manu 999        0.159    0.186     0.86   0.392      1.173
Gra_Slur 1          1.288    0.307     4.19   <.001      3.624
Gra_Slur 999            0        *        *       *      1.000
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor    Reference level
  Gra_Manu  0
  Gra_Slur  0

There is no evidence of a statistically significant interaction between the factors (p=0.54), while independently, the spreading of manure is protective and the spreading of slurry is a risk factor for shedding being observed on the farm. It will be important to stress that this result has been established only for farms with unhoused animals, since the relevant data were not collected for housed farms. Hence, both Gra_Slurry and Gra_Manure will be considered in the multifactor model.

Considering N_Goats, it is suspicious that this variable is statistically significant, while the related factor reporting the absence or presence of goats is not. Plotting a histogram of N_Goats, we see that the bulk of the records contain zero. Generating a new histogram of the non-zero values of N_Goats, we see the following picture:

[Figure: Histogram of N_Goats, non-zero values only; horizontal axis N_Goats (2 to 16), vertical axis Frequency (0 to 20)]

Fitting the model to N_Goats, we generate the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] N_Goats
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  N_Goats

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, N_Goats

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       1       3.1     3.150      3.15   0.076
Residual       950     993.9     1.046
Total          951     997.0     1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
     9      0.00    0.0075
    95      0.00    0.0515
   170      0.00    0.0075
   243      0.00    0.0075
   343      1.00    0.0171
   366      1.00    0.0075
   367      1.00    0.3600
   368      0.00    0.0075
   537      0.00    0.0075
   554      0.00    0.0316
   585      0.00    0.0171
   673      0.00    0.0075
   676      0.00    0.0515
   720      1.00    0.2125
   746      1.00    0.0515
   766      0.00    0.0765
   792      0.00    0.0075
   799      0.00    0.0075
   818      0.00    0.0515

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant          -1.2989   0.0793   -16.38   <.001     0.2728
N_Goats            0.1635   0.0943     1.73   0.083      1.178
* MESSAGE: s.e.s are based on dispersion parameter with value 1

The two units with the highest leverage correspond to the farms with 10 and 16 goats. Removing these ultra-high leverage points from the analysis gives rise to the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] N_Goats
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  N_Goats

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, N_Goats

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       1       0.0     0.019      0.02   0.891
Residual       948     990.9     1.045
Total          949     990.9     1.044
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
     9      0.00    0.0192
    95      0.00    0.1143
   170      0.00    0.0192
   243      0.00    0.0192
   343      1.00    0.0422
   366      1.00    0.0192
   367      0.00    0.0192
   536      0.00    0.0192
   553      0.00    0.0741
   584      0.00    0.0422
   672      0.00    0.0192
   675      0.00    0.1143
   744      1.00    0.1143
   764      0.00    0.1626
   790      0.00    0.0192
   797      0.00    0.0192
   816      0.00    0.1143

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant          -1.2888   0.0795   -16.22   <.001     0.2756
N_Goats            -0.023    0.172    -0.14   0.892     0.9770
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Having removed the two high leverage points, N_Goats no longer exhibits any statistical significance (p=0.89). It will therefore not be considered for inclusion in the multifactor model.

The next factor which will receive detailed consideration is Pigs. Fitting this factor gives rise to the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Pigs
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Pigs

***** Regression Analysis *****

Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant, Pigs

*** Summary of analysis ***

                                  mean  deviance  approx
              d.f.  deviance  deviance     ratio  chi pr
Regression       1       6.6     6.567      6.57   0.010
Residual       950     990.5     1.043
Total          951     997.0     1.048
* MESSAGE: ratios are based on dispersion parameter with value 1

Dispersion parameter is fixed at 1.00

* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
  Unit  Response  Leverage
     2      0.00    0.0244
    13      0.00    0.0244
    25      1.00    0.0244
    53      1.00    0.0244
    66      0.00    0.0244
    80      1.00    0.0244
   106      0.00    0.0244
   170      0.00    0.0244
   274      0.00    0.0244
   323      0.00    0.0244
   326      1.00    0.0244
   337      0.00    0.0244
   346      0.00    0.0244
   360      1.00    0.0244
   400      0.00    0.0244
   428      1.00    0.0244
   440      1.00    0.0244
   456      0.00    0.0244
   463      1.00    0.0244
   469      0.00    0.0244
   470      0.00    0.0244
   482      1.00    0.0244
   520      1.00    0.0244
   527      0.00    0.0244
   572      1.00    0.0244
   581      1.00    0.0244
   640      1.00    0.0244
   659      0.00    0.0244
   673      0.00    0.0244
   680      0.00    0.0244
   682      1.00    0.0244
   720      1.00    0.0244
   727      0.00    0.0244
   746      1.00    0.0244
   749      0.00    0.0244
   758      0.00    0.0244
   769      0.00    0.0244
   799      0.00    0.0244
   818      0.00    0.0244
   932      0.00    0.0244
   950      0.00    0.0244

*** Estimates of parameters ***

                                                    antilog of
                 estimate     s.e.     t(*)   t pr.   estimate
Constant          -1.3270   0.0812   -16.34   <.001     0.2653
Pigs 2              0.881    0.330     2.67   0.008      2.413
* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
  Factor  Reference level
  Pigs    1

Hence, the presence of pigs on a farm is associated with a higher risk of the farm exhibiting positive samples. Pigs will be included as a candidate factor in the multifactor analysis.

Fitting Lab Operator as a factor gives rise to the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Lab_Op
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Lab_Op
***** Regression Analysis *****

 Response variate: VFarmPos
  Binomial totals: N_Bin
     Distribution: Binomial
    Link function: Logit
     Fitted terms: Constant, Lab_Op

*** Summary of analysis ***

                                      mean   deviance   approx
               d.f.    deviance   deviance      ratio   chi pr
Regression        2         6.5      3.256       3.26    0.039
Residual        925       958.2      1.036
Total           927       964.7      1.041

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses

*** Estimates of parameters ***

                 estimate     s.e.     t(*)    t pr.   antilog of estimate
Constant           -1.080    0.122    -8.83    <.001                0.3397
Lab_Op H           -0.304    0.169    -1.80    0.072                0.7379
Lab_Op S           -0.635    0.284    -2.24    0.025                0.5299

* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
    Factor    Reference level
    Lab_Op    D

There are clear differences between the prevalence rates associated with the different Lab Operators. Taken at face value, this is alarming: the results of a study should be independent of the technician assaying the samples. However, the samples analysed by each technician were not randomly allocated across the lifetime of the study, and the initial analysis indicated that there was major variation in prevalence over the study period.
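As a reading aid for the parameter table above: with a logit link, the "antilog of estimate" column is the exponentiated coefficient, that is, the odds for the reference operator D (Constant row) or the odds ratio relative to D (Lab_Op rows). The following is a minimal Python sketch, not part of the original Genstat analysis; the function and variable names are hypothetical, and the inputs are the rounded estimates printed above, so results agree with the Genstat column only to about three decimal places.

```python
import math

# Rounded logit-scale estimates copied from the Lab_Op fit above.
constant = -1.080                      # log-odds for reference operator D
effects = {"H": -0.304, "S": -0.635}   # differences from operator D


def odds_ratio(beta):
    """Antilog of a logit-scale coefficient."""
    return math.exp(beta)


def implied_prevalence(log_odds):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Odds ratios for H and S relative to D (the 'antilog of estimate' column).
odds_ratios = {op: odds_ratio(b) for op, b in effects.items()}

# Implied probability that a farm tests positive, per operator.
prevalence = {"D": implied_prevalence(constant)}
for op, b in effects.items():
    prevalence[op] = implied_prevalence(constant + b)

for op in ("D", "H", "S"):
    print(op, round(prevalence[op], 3))
```

On these rounded inputs the implied farm-level prevalences are roughly 0.25, 0.20 and 0.15 for operators D, H and S respectively, which is the unadjusted pattern that the month-restricted analyses go on to examine.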
Tabulating the number of samples processed by each operator in each month of the study, we get the following values:

Month     D     H     S
    3     2     3     0
    4     6     9     0
    5     9     5     0
    6    10    21     0
    7    19    19     0
    8    25    13     0
    9    22    26     0
   10    24    21     0
   11    19    19     0
   12    12    14     0
   13    23    13     0
   14    17    25     0
   15    21    32     0
   16    26    18     0
   17    19    20     0
   18    18    17     0
   19    15    15     0
   20    20    15     0
   21    13     6     0
   22    31    20     0
   23     0    21     0
   24     0    13     0
   25     0    22    11
   26     0    22    23
   27     0    28    35
   28     0    13    18
   29     0     9    31

Tabulating the mean prevalences seen in these months, we get the following table:

Month       D       H       S
    3   0.000   0.333       -
    4   0.833   0.000       -
    5   0.222   0.600       -
    6   0.200   0.190       -
    7   0.211   0.263       -
    8   0.320   0.231       -
    9   0.273   0.154       -
   10   0.167   0.143       -
   11   0.368   0.421       -
   12   0.250   0.357       -
   13   0.087   0.077       -
   14   0.294   0.200       -
   15   0.286   0.188       -
   16   0.115   0.222       -
   17   0.316   0.300       -
   18   0.167   0.118       -
   19   0.267   0.333       -
   20   0.200   0.200       -
   21   0.462   0.333       -
   22   0.290   0.200       -
   23       -   0.238       -
   24       -   0.000       -
   25       -   0.182   0.091
   26       -   0.136   0.000
   27       -   0.179   0.257
   28       -   0.077   0.056
   29       -   0.000   0.226

Restricting the analysis to months 3-22, when only operators D and H were present, and fitting Lab Operator as an explanatory variable, we get the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Lab_Op
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Lab_Op

* MESSAGE: Term Lab_Op cannot be fully included in the model because 1 parameter is aliased with terms already in the model
  (Lab_Op S) = 0

***** Regression Analysis *****

 Response variate: VFarmPos
  Binomial totals: N_Bin
     Distribution: Binomial
    Link function: Logit
     Fitted terms: Constant, Lab_Op

*** Summary of analysis ***

mean deviance approx d.f.
deviance deviance ratio chi pr Regression 1 0.8 0.844 0.84 0.358 Residual 680 749.3 1.102 Total 681 750.1 1.101 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. estimate Constant -1.080 0.122 -8.83 <.001 0.3397 Lab_Op H -0.165 0.180 -0.92 0.357 0.8476 Lab_Op S 0 * * * 1.000 * MESSAGE: s.e.s are based on dispersion parameter with value 1 Parameters for factors are differences compared with the reference level: Factor Reference level Lab_Op D There is no significant difference (p=0.36) between the two operators during the months for which they were both operating. Restricting the analysis to months 25-29, when only operators H and S were present, and fitting Lab Operator as an explanatory variable, we get the following output: 5730 MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin 5731 TERMS [FACT=9] Lab_Op **** G5W0013 **** Warning (Code RE 49). Statement 1 on Line 5731 Command: TERMS [FACT=9] Lab_Op No observations found at the reference level of a factor The reference level for factor Lab_Op was Level 1, and has been changed to Level 2 5732 FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\ 5733 Lab_Op 5733............................................................................ * MESSAGE: Term Lab_Op cannot be fully included in the model because 1 parameter is aliased with terms already in the model (Lab_Op D) = 0 ***** Regression Analysis ***** Response variate: VFarmPos Binomial totals: N_Bin Distribution: Binomial Link function: Logit Fitted terms: Constant, Lab_Op *** Summary of analysis *** mean deviance approx d.f. 
deviance deviance ratio chi pr 121 Regression 1 0.1 0.0853 0.09 0.770 Residual 210 176.3 0.8397 Total 211 176.4 0.8362 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. estimate Constant -1.829 0.299 -6.12 <.001 0.1605 Lab_Op D 0 * * * 1.000 Lab_Op S 0.115 0.393 0.29 0.771 1.122 * MESSAGE: s.e.s are based on dispersion parameter with value 1 Parameters for factors are differences compared with the reference level: Factor Reference level Lab_Op H There is no significant difference (p=0.77) between the two operators during the months for which they were both operating. The apparent Lab Operator effect is an artefact of the unbalanced nature of the dataset with respect to this factor. It will therefore not be considered as a candidate factor for the multifactor analysis. We have considered all the candidate explanatory factors. The following factors: FCattle, SamGrF, Cattle, NewSource, BeefonDairy, Breed2, Gra_Slurry, Gra_Manure and Pigs will be candidates for inclusion in the multifactor model. However, the identification in the univariate analyses of significant year and (possibly) seasonal effects would indicate a need for some investigation of these possible descriptive factors prior to the fitting of the multifactor model. Fitting Sam_Year gives rise to the following output: 5753 MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin 5754 TERMS [FACT=9] Sam_Year 5755 FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\ 5756 Sam_Year 5756............................................................................ ***** Regression Analysis ***** Response variate: VFarmPos Binomial totals: N_Bin Distribution: Binomial Link function: Logit Fitted terms: Constant, Sam_Year *** Summary of analysis *** mean deviance approx d.f. 
deviance deviance ratio chi pr
Regression        2        10.8      5.419       5.42    0.004
Residual        949       986.2      1.039
Total           951       997.0      1.048

* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses

*** Estimates of parameters ***

                 estimate     s.e.     t(*)    t pr.   antilog of estimate
Constant           -1.025    0.126    -8.14    <.001                0.3587
Sam_Year 1999      -0.254    0.173    -1.47    0.142                0.7759
Sam_Year 2000      -0.739    0.232    -3.19    0.001                0.4775

* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
    Factor      Reference level
    Sam_Year    1998

The effect looks conclusive: a drop in 1999 relative to 1998 was then continued in 2000. However, the results may be deceptive: only a fraction (months 1-5) of 2000 was sampled, and the analysis of monthly figures above might suggest that these months exhibit lower levels of farm prevalence. Hence the figure for Year 2000 could be biased. However, by restricting the analysis only to the months January-May, we can quickly test this hypothesis:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] Sam_Year
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Sam_Year

***** Regression Analysis *****

 Response variate: VFarmPos
  Binomial totals: N_Bin
     Distribution: Binomial
    Link function: Logit
     Fitted terms: Constant, Sam_Year

*** Summary of analysis ***

mean deviance approx d.f.
deviance deviance ratio chi pr Regression 2 9.0 4.4964 4.50 0.011 Residual 474 458.8 0.9680 Total 476 467.8 0.9828 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 * MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses * MESSAGE: The following units have high leverage: Unit Response Leverage 1 0.00 0.0195 2 0.00 0.0195 3 1.00 0.0195 4 0.00 0.0195 5 0.00 0.0195 6 0.00 0.0195 7 1.00 0.0195 8 0.00 0.0195 9 0.00 0.0195 10 0.00 0.0195 11 0.00 0.0195 12 0.00 0.0195 13 0.00 0.0195 14 1.00 0.0195 15 1.00 0.0195 16 0.00 0.0195 17 1.00 0.0195 18 0.00 0.0195 19 1.00 0.0195 20 0.00 0.0195 21 0.00 0.0195 22 1.00 0.0195 23 0.00 0.0195 123 24 0.00 0.0195 25 1.00 0.0195 26 0.00 0.0195 27 0.00 0.0195 28 1.00 0.0195 29 1.00 0.0195 30 1.00 0.0195 31 1.00 0.0195 32 1.00 0.0195 33 0.00 0.0195 34 1.00 0.0195 35 0.00 0.0195 36 0.00 0.0195 37 0.00 0.0195 38 1.00 0.0195 39 1.00 0.0195 40 0.00 0.0195 41 0.00 0.0195 42 0.00 0.0195 43 0.00 0.0195 44 0.00 0.0195 45 0.00 0.0195 46 0.00 0.0195 47 0.00 0.0195 48 0.00 0.0195 49 0.00 0.0195 50 0.00 0.0195 51 1.00 0.0195 *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. estimate Constant -0.693 0.296 -2.34 0.019 0.5000 Sam_Year 1999 -0.658 0.341 -1.93 0.054 0.5176 Sam_Year 2000 -1.071 0.354 -3.02 0.003 0.3425 * MESSAGE: s.e.s are based on dispersion parameter with value 1 Parameters for factors are differences compared with the reference level: Factor Reference level Sam_Year 1998 Restricting the analysis to only the first five months of the year, there is clear evidence of a year on year drop in the farm prevalence. There are issues of balance in the dataset when considering Sam_Year and Sam_Month as factors to be fitted within the same model. 
It is therefore appropriate to use a Generalised Linear Mixed Model to analyse these data, since it will give rise to better estimates when fitting a model to highly unbalanced data. The model to be fitted is Sam_Year+Sam_Month (it is impossible to fit an interaction between these factors due to collinearity in the data), and it gives rise to the following output:

GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
  LINK=logit; DISPERSION=1; FIXED=Sam_Year+Sam_Mon; RANDOM=Farm; CONSTANT=estimate;\
  FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin

***** Generalised Linear Mixed Model Analysis *****

          Method: cf Schall (1991) Biometrika
Response variate: VFarmPos
    Distribution: BINOMIAL
   Link function: LOGIT
    Random model: Farm
     Fixed model: Constant + Sam_Year + Sam_Mon

* Dispersion parameter fixed at value 1.000

*** Monitoring information ***

Iteration         Gammas   Dispersion    Max change
        1        0.08797        1.000    3.7834E+00
        2    0.000001000        1.000    8.7973E-02
        3       0.007951        1.000    7.9504E-03
        4        0.08668        1.000    7.8730E-02
        5        0.08698        1.000    3.0157E-04
        6        0.08777        1.000    7.9033E-04
        7        0.08780        1.000    2.6984E-05

*** Estimated Variance Components ***

Random term    Component     S.e.
Farm               0.088    0.276

*** Residual variance model ***

Term Factor Model(order) Parameter Estimate S.e.
Dispersn Identity Sigma2 1.000 FIXED

*** Estimated Variance matrix for Variance Components ***

Farm        1    0.07627
Dispersn    2    0.00000    0.00000
                       1          2

*** Table of effects for Constant ***

    -1.637     Standard error: 0.4301

*** Table of effects for Sam_Year ***

Sam_Year      1998      1999      2000
            0.0000   -0.1716   -0.6894

Standard error of differences:   Average 0.2471   Maximum 0.2947   Minimum 0.1938
Average variance of differences: 0.06277

*** Table of effects for Sam_Mon ***

Sam_Mon      Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
          0.0000   0.3126   0.8039   0.2403   0.9472   0.1870   0.6891   0.6000

Sam_Mon      Sep      Oct      Nov      Dec
          0.6828   0.3909   1.0287   0.3368

Standard error of differences:   Average 0.4246   Maximum 0.5717   Minimum 0.3162
Average variance of differences: 0.1830

*** Tables of means ***

*** Table of predicted means for Sam_Year ***

Sam_Year      1998      1999      2000
            -1.119    -1.290    -1.808

*** Table of predicted means for Sam_Mon ***

Sam_Mon      Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
          -1.924   -1.611   -1.120   -1.684   -0.977   -1.737   -1.235   -1.324

Sam_Mon      Sep      Oct      Nov      Dec
          -1.241   -1.533   -0.895   -1.587

*** Back-transformed Means (on the original scale) ***

Sam_Year
    1998    0.2463
    1999    0.2158
    2000    0.1409

Sam_Mon
    Jan     0.1274
    Feb     0.1664
    Mar     0.2460
    Apr     0.1566
    May     0.2735
    Jun     0.1497
    Jul     0.2253
    Aug     0.2102
    Sep     0.2242
    Oct     0.1775
    Nov     0.2900
    Dec     0.1698

Note: means are probabilities not expected values.

FSPREADSHEET Vars[1]
VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term    Wald statistic    d.f.    Wald/d.f.    Chi-sq prob

* Sequentially adding terms to fixed model
Sam_Year                9.29       2         4.64          0.010
Sam_Mon                13.58      11         1.23          0.257

* Dropping individual terms from full fixed model
Sam_Year                5.64       2         2.82          0.059
Sam_Mon                13.58      11         1.23          0.257

The year of sampling appears to be very close to statistical significance (p=0.059), exhibiting a small drop in 1999 and a large drop in 2000.
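Two pieces of arithmetic in the output above can be checked by hand. The back-transformed means are the inverse logit of the predicted means, and the Wald probabilities for Sam_Year (2 d.f.) follow from the closed-form upper tail of a chi-squared distribution with 2 degrees of freedom. The sketch below is in Python for illustration only (the analysis itself was run in Genstat); the function names are hypothetical and the inputs are the rounded figures printed above, so agreement is to roughly three decimal places.

```python
import math


def inv_logit(eta):
    """Back-transform a logit-scale mean to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-squared variate with 2 d.f.

    For 2 d.f. the survival function has the closed form exp(-x/2).
    """
    return math.exp(-x / 2.0)


# Predicted means for Sam_Year on the logit scale (from the output above).
predicted_means = {1998: -1.119, 1999: -1.290, 2000: -1.808}
back_transformed = {yr: inv_logit(m) for yr, m in predicted_means.items()}

# Wald statistics for Sam_Year, each on 2 d.f.
p_sequential = chi2_sf_2df(9.29)   # adding Sam_Year first
p_dropped = chi2_sf_2df(5.64)      # dropping Sam_Year from the full model

for yr, p in back_transformed.items():
    print(yr, round(p, 3))
print(round(p_sequential, 3), round(p_dropped, 3))
```

The closed form applies only to the 2 d.f. Sam_Year tests; the 11 d.f. Sam_Mon test would need a general chi-squared tail function.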
The estimated mean farm prevalences for each year are as follows:

Year    Mean Farm Prevalence
1998    0.25
1999    0.22
2000    0.14

Plotting the mean prevalences by year, with the associated 95% confidence intervals, gives:

[Figure: Mean Farm Prevalence (vertical axis, 0.00 to 1.00) by Year (1998, 1999, 2000), with 95% confidence intervals.]

There is evidence of a mild drop in prevalence in 1999, followed by a larger decrease in 2000.

The month of sampling shows no sign of statistical significance (p=0.26). The mean prevalences for these months are as follows:

Month    Mean Farm Prevalence
Jan      0.13
Feb      0.17
Mar      0.25
Apr      0.16
May      0.27
Jun      0.15
Jul      0.23
Aug      0.21
Sep      0.22
Oct      0.18
Nov      0.29
Dec      0.17

It is informative to plot the mean prevalences by month with the associated 95% confidence intervals.

[Figure: Mean Farm Prevalence (vertical axis, 0.00 to 1.00) by Month (Jan to Dec), with 95% confidence intervals.]

There is some evidence of drops in prevalence in April and June and an increase in November. It is also noticeable that December, January and February present some of the lowest prevalences across the months, even after adjusting for the Sampling Year effect.

It will clearly be important to assess the nature of the year effect after allowing for any explanatory factors which are identified as part of the modelling exercise. Given the importance of Month in the within-herd prevalence model, it will also be important to assess whether any Sampling Month-related effects become apparent in the multi-factor model.

Considering the candidate factors for the multivariate model, no terms are forced into the model.
RSEARCH [METHOD=fstep] FCattle+SamGrF+Cattle+NewSource+BeefonDairy+Breed2+Gra_Slur+Gra_Manu+Pigs

***** Model Selection *****

 Response variate: VFarmPos
  Binomial totals: N_Bin
     Distribution: Binomial
    Link function: Logit
  Number of units: 951
     Forced terms: Constant
        Forced df: 1
       Free terms: FCattle + SamGrF + Cattle + NewSource + BeefonDairy + Breed2 + Gra_Slur + Gra_Manu + Pigs

*** Stepwise (forward) analysis of deviance ***

Change                                mean   deviance   approx
               d.f.    deviance   deviance      ratio   chi pr
+ SamGrF          3     31.4232    10.4744      10.47    <.001
+ Gra_Slur        2     10.8629     5.4314       5.43    0.004
+ Gra_Manu        1     10.8730    10.8730      10.87    <.001
+ BeefonDairy     1      7.8689     7.8689       7.87    0.005
+ Pigs            1      5.1369     5.1369       5.14    0.023
+ FCattle         3      7.6210     2.5403       2.54    0.055
+ Cattle          3      5.7489     1.9163       1.92    0.124
+ Breed2          1      2.4449     2.4449       2.44    0.118
+ NewSource       1      2.1589     2.1589       2.16    0.142
Residual        934    909.8257     0.9741
Total           950    993.9643     1.0463

Final model: Constant + SamGrF + Gra_Slur + Gra_Manu + BeefonDairy + Pigs + FCattle + Cattle + Breed2 + NewSource

SamGrF, Gra_Slur (grass slurry), Gra_Manu (grass manure), BeefonDairy and Pigs all enter the model with effects that are statistically significant at the 5% level. FCattle is close to this level of statistical significance, while Cattle, Breed2 and NewSource all exhibit p-values greater than 0.1. However, none of the variables has such low significance that it would seem sensible to remove it from the analysis at this point.

Cattle, Breed2 and NewSource all give rise to p-values which are appreciably higher than those seen within the univariate analyses. For the factor Cattle, this is not a great surprise, given the many other factors included in the model which reflect the size of the farm operation. However, it is important to establish which aspects of the model are causing the drop in significance assigned to Breed2 and NewSource. In turn, each of Breed2 and NewSource is fitted with and without each other candidate variable.
The significance of the factor, based on the change in deviance when it is removed from the two-factor model, is tabulated. Initially considering the Breed2 factor,

Other Factor    P-Value
-               0.021
SamGrF          0.05
Gra_Slur        0.024
Gra_Manu        0.037
BeefonDairy     0.017
Pigs            0.015
FCattle         0.052
Cattle          0.038
NewSource       0.024

It is clear that no single factor is strongly associated with the drop in significance seen in the multi-factor model. This is probably related to the relatively low support present in the dataset for the factor Breed2. Only 6 farms in the dataset had this type of animal present. On balance, it is more likely that the effect is spurious, associated with the high leverage associated with these 6 farms and the unbalanced nature of the dataset. On these grounds, Breed2 should ultimately be excluded from the multifactor analysis.

By contrast, tabulating the effects of other factors on NewSource gives:

Other Factor    P-Value
-               0.026
SamGrF          0.151
Gra_Slur        0.028
Gra_Manu        0.031
BeefonDairy     0.039
Pigs            0.037
FCattle         0.222
Cattle          <0.001
Breed2          0.037

It is immediately clear that the finishing cattle number factors, SamGrF and FCattle, are associated with a dramatic drop in the significance associated with NewSource. This is probably due to some type of correlation between large and open farms. Firstly, the multifactor model is fitted without these two factors to confirm the relationship.

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] SamGrF + Gra_Slur + Gra_Manu +BeefonDairy + Pigs + FCattle + Cattle + Breed2 +NewSource
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  Gra_Slur + Gra_Manu +BeefonDairy + Pigs + Cattle + Breed2 +NewSource
* MESSAGE: Term Gra_Manu cannot be fully included in the model because 1 parameter is aliased with terms already in the model (Gra_Manu 999) = (Gra_Slur 999) ***** Regression Analysis ***** Response variate: VFarmPos Binomial totals: N_Bin Distribution: Binomial Link function: Logit Fitted terms: Constant + Gra_Slur + Gra_Manu + BeefonDairy + Pigs + Cattle + Breed2 + NewSource *** Summary of analysis *** mean deviance approx d.f. deviance deviance ratio chi pr Regression 10 58.3 5.8334 5.83 <.001 Residual 940 935.6 0.9954 Total 950 994.0 1.0463 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 * MESSAGE: The residuals do not appear to be random; for example, fitted values in the range 0.04 to 0.07 are consistently larger than observed values and fitted values in the range 0.36 to 0.38 are consistently smaller than observed values * MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses * MESSAGE: The following units have high leverage: Unit Response Leverage 62 1.00 0.049 70 1.00 0.049 80 1.00 0.051 131 1.00 0.094 165 0.00 0.049 182 0.00 0.047 200 0.00 0.049 201 0.00 0.049 284 1.00 0.057 297 0.00 0.094 301 1.00 0.094 130 310 0.00 0.047 348 0.00 0.047 370 0.00 0.047 384 0.00 0.094 385 1.00 0.094 418 0.00 0.047 437 0.00 0.047 444 1.00 0.141 460 1.00 0.047 494 0.00 0.141 496 0.00 0.049 497 0.00 0.102 527 0.00 0.255 581 1.00 0.051 599 1.00 0.047 600 0.00 0.176 601 1.00 0.108 602 0.00 0.072 603 1.00 0.084 620 1.00 0.167 637 1.00 0.217 640 1.00 0.051 651 1.00 0.049 659 0.00 0.046 680 0.00 0.073 688 1.00 0.211 701 1.00 0.094 737 0.00 0.200 748 0.00 0.047 750 1.00 0.047 761 0.00 0.141 763 0.00 0.141 769 0.00 0.073 788 1.00 0.094 858 0.00 0.102 884 1.00 0.133 911 0.00 0.167 950 0.00 0.041 *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. 
estimate
                 estimate     s.e.     t(*)    t pr.   antilog of estimate
Constant           -1.887    0.203    -9.29    <.001                0.1515
Gra_Slur 1          1.118    0.319     3.50    <.001                3.058
Gra_Slur 999        0.045    0.191     0.24    0.813                1.046
Gra_Manu 1         -1.179    0.367    -3.22    0.001                0.3075
Gra_Manu 999            0        *        *        *                1.000
BeefonDairy 1       1.313    0.645     2.04    0.042                3.716
Pigs 2              0.890    0.343     2.60    0.009                2.436
Cattle 2            0.532    0.182     2.93    0.003                1.702
Cattle 3            1.271    0.470     2.70    0.007                3.566
Cattle 4            -0.04     1.12    -0.04    0.972                0.9610
Breed2 1            1.709    0.901     1.90    0.058                5.525
NewSource 2         0.501    0.178     2.81    0.005                1.650

* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
    Factor         Reference level
    Gra_Slur       0
    Gra_Manu       0
    BeefonDairy    0
    Pigs           1
    Cattle         1
    Breed2         0
    NewSource      1

Clearly, in the absence of the finishing cattle size factors, NewSource is highly significant (p=0.005). The relationship between these factors will initially be investigated through tabulation. Tabulating the properties of the dataset with respect to NewSource and SamGrF gives:

n
                     SamGrF
NewSource        1        2        3        4
        1      180      142      148      125
        2       65       92       93      107

mean
                     SamGrF
NewSource        1        2        3        4
        1    0.106    0.197    0.216    0.296
        2    0.123    0.261    0.237    0.346

var
                     SamGrF
NewSource        1        2        3        4
        1    0.095    0.159    0.171    0.210
        2    0.110    0.195    0.183    0.228

se
                     SamGrF
NewSource        1        2        3        4
        1    0.023    0.034    0.034    0.041
        2    0.041    0.046    0.044    0.046

There is little evidence of any difference due to NewSource at any of the levels of SamGrF: in each case the mean is higher in the open farms, but the difference is not appreciable relative to the standard errors.
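The statement that the differences are "not appreciable relative to the standard errors" can be made concrete with a rough two-sample comparison at each level of SamGrF. The Python sketch below is an informal illustration, not the method used in the report (which relies on the fitted logistic models); it assumes independent binary responses so that each standard error is sqrt(p(1-p)/n), and the variable names are hypothetical. The counts and means are copied from the tables above.

```python
import math

# Farm counts and mean farm-level positivity, by NewSource (1, 2) and SamGrF (1..4).
n = {1: [180, 142, 148, 125], 2: [65, 92, 93, 107]}
mean = {1: [0.106, 0.197, 0.216, 0.296], 2: [0.123, 0.261, 0.237, 0.346]}


def binary_se(p, size):
    """Standard error of a proportion p estimated from `size` binary responses."""
    return math.sqrt(p * (1.0 - p) / size)


z_scores = []
for level in range(4):
    se1 = binary_se(mean[1][level], n[1][level])
    se2 = binary_se(mean[2][level], n[2][level])
    diff = mean[2][level] - mean[1][level]
    # Approximate z-score for the NewSource 2 minus NewSource 1 difference.
    z_scores.append(diff / math.sqrt(se1 ** 2 + se2 ** 2))

for level, z in enumerate(z_scores, start=1):
    print("SamGrF", level, "z =", round(z, 2))
```

Under these assumptions every difference is positive but each z-score is well below 1.96, which agrees with the reading given in the text.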
Tabulating the properties of the dataset with respect to NewSource and FCattle gives:

n
                     FCattle
NewSource        1        2        3        4
        1      341      141       88       25
        2      124      108       87       38

mean
                     FCattle
NewSource        1        2        3        4
        1    0.158    0.255    0.216    0.280
        2    0.169    0.259    0.299    0.421

var
                     FCattle
NewSource        1        2        3        4
        1    0.134    0.191    0.171    0.210
        2    0.142    0.194    0.212    0.250

se
                     FCattle
NewSource        1        2        3        4
        1    0.020    0.037    0.044    0.092
        2    0.034    0.042    0.049    0.081

Again, there is negligible difference in the mean behaviour between open and closed farms except in the farms with the largest numbers of finishing cattle, and there the numbers are small, ensuring that the associated standard errors are large.

The evidence for NewSource being the driving factor behind the variability seen in these tables is weak and contradictory. By contrast, both FCattle and SamGrF show self-consistent patterns of effect: all the higher levels of the factor consistently show significantly different prevalence levels to the lowest level. On balance, it is more likely that the NewSource effect is, at best, small and lacking in statistical significance in this study. On these grounds, NewSource should ultimately be excluded from the multifactor analysis.

Fitting the remaining factors in a multi-factor model, we generate the following output:

MODEL [DISTRIBUTION=binomial; LINK=logit; DISPERSION=1] VFarmPos; NBINOMIAL=N_Bin
TERMS [FACT=9] SamGrF + Gra_Slur + Gra_Manu+BeefonDairy + Pigs + FCattle + Cattle
FIT [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes; FACT=9]\
  SamGrF + Gra_Slur + Gra_Manu+BeefonDairy + Pigs + FCattle + Cattle
* MESSAGE: Term Gra_Manu cannot be fully included in the model because 1 parameter is aliased with terms already in the model (Gra_Manu 999) = (Gra_Slur 999) ***** Regression Analysis ***** Response variate: VFarmPos Binomial totals: N_Bin Distribution: Binomial Link function: Logit Fitted terms: Constant + SamGrF + Gra_Slur + Gra_Manu + BeefonDairy + Pigs + FCattle + Cattle *** Summary of analysis *** mean deviance approx d.f. deviance deviance ratio chi pr Regression 14 79.5 5.6811 5.68 <.001 Residual 936 914.4 0.9770 Total 950 994.0 1.0463 * MESSAGE: ratios are based on dispersion parameter with value 1 Dispersion parameter is fixed at 1.00 * MESSAGE: The residuals do not appear to be random; for example, fitted values in the range 0.09 to 0.10 are consistently larger than observed values and fitted values in the range 0.33 to 0.34 are consistently smaller than observed values * MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses * MESSAGE: The following units have high leverage: Unit Response Leverage 62 1.00 0.060 70 1.00 0.068 80 1.00 0.062 131 1.00 0.098 165 0.00 0.060 182 0.00 0.062 200 0.00 0.068 284 1.00 0.059 297 0.00 0.103 301 1.00 0.106 133 310 0.00 0.057 370 0.00 0.062 384 0.00 0.111 385 1.00 0.107 418 0.00 0.059 437 0.00 0.058 444 1.00 0.214 460 1.00 0.056 494 0.00 0.114 496 0.00 0.074 497 0.00 0.094 527 0.00 0.289 581 1.00 0.059 601 1.00 0.111 602 0.00 0.071 603 1.00 0.083 640 1.00 0.052 651 1.00 0.069 659 0.00 0.052 680 0.00 0.074 701 1.00 0.094 737 0.00 0.214 748 0.00 0.058 750 1.00 0.056 761 0.00 0.114 763 0.00 0.096 769 0.00 0.085 788 1.00 0.104 858 0.00 0.115 884 1.00 0.063 *** Estimates of parameters *** antilog of estimate s.e. t(*) t pr. 
estimate
                 estimate     s.e.     t(*)    t pr.   antilog of estimate
Constant           -2.460    0.272    -9.04    <.001                0.08544
SamGrF 2            0.786    0.266     2.96    0.003                2.195
SamGrF 3            0.642    0.269     2.39    0.017                1.901
SamGrF 4            1.135    0.267     4.25    <.001                3.111
Gra_Slur 1          1.121    0.322     3.48    <.001                3.068
Gra_Slur 999        0.121    0.197     0.61    0.540                1.128
Gra_Manu 1         -1.131    0.371    -3.05    0.002                0.3228
Gra_Manu 999            0        *        *        *                1.000
BeefonDairy 1       1.788    0.651     2.75    0.006                5.980
Pigs 2              0.893    0.347     2.57    0.010                2.443
FCattle 2           0.280    0.207     1.35    0.176                1.324
FCattle 3           0.183    0.234     0.79    0.432                1.201
FCattle 4           0.783    0.317     2.47    0.013                2.187
Cattle 2            0.277    0.175     1.58    0.113                1.320
Cattle 3            0.845    0.475     1.78    0.075                2.328
Cattle 4            -0.90     1.15    -0.78    0.434                0.4054

* MESSAGE: s.e.s are based on dispersion parameter with value 1

Parameters for factors are differences compared with the reference level:
    Factor         Reference level
    SamGrF         1
    Gra_Slur       0
    Gra_Manu       0
    BeefonDairy    0
    Pigs           1
    FCattle        1
    Cattle         1

All of the factors included in this model give rise to effects qualitatively similar to those seen in the univariate analyses. Again using stepwise regression to explore the properties of the data, we force the above factors to be included in the model, and explore whether any other factors should now be included (excluding time and geographical variables, which will be considered later):

RSEARCH [METHOD=fstep;FORCED=FCattle+SamGrF+Cattle+BeefonDairy+Gra_Slur+Gra_Manu+Pigs] Manage_O \
  +Sampler+ Max_Age + Min_Age + Housed + Housing+ NoChange + T_DHouse+Sup_Feed\
  +Forage + Silage+Conc+ Sil_Home+ Sil_Manu+Sil_Slur+Sil_Sewa+Sil_Geec+Sil_Gull+Hay + Hay_Manu + Hay_Slur+Hay_Geec+Hay_Gull\
  +Gra_Sewa+Gra_Geec+Gra_Gull+Sheep + N_Horses+ Chicks + Deer+ Water + Water_Con + WaterCT+ Want2Kno \
  + Visit2

***** Model Selection *****

 Response variate: VFarmPos
  Binomial totals: N_Bin
     Distribution: Binomial
    Link function: Logit
  Number of units: 950
     Forced terms: Constant + FCattle + SamGrF + Cattle + BeefonDairy + Gra_Slur + Gra_Manu + Pigs
        Forced df: 15
Free terms: Manage_O + Sampler + Max_Age
+ Min_Age + Housed + Housing + NoChange + T_DHouse + Sup_Feed + Forage + Silage + Conc + Sil_Home + Sil_Manu + Sil_Slur + Sil_Sewa + Sil_Geec + Sil_Gull + Hay + Hay_Manu + Hay_Slur + Hay_Geec + Hay_Gull + Gra_Sewa + Gra_Geec + Gra_Gull + Sheep + N_Horses + Chicks + Deer + Water + Water_Con + WaterCT + Want2Kno + Visit2 *** Stepwise (forward) analysis of deviance *** Change mean deviance approx d.f. deviance deviance ratio chi pr + FCattle + SamGrF + Cattle + BeefonDairy + Gra_Slur + Gra_Manu + Pigs 14 79.6050 5.6861 5.69 <.001 + Housing 4 13.2654 3.3163 3.32 0.010 + Max_Age 1 4.0096 4.0096 4.01 0.045 + Water 6 8.7817 1.4636 1.46 0.186 + Sampler 1 3.5225 3.5225 3.52 0.061 + T_DHouse 1 3.1108 3.1108 3.11 0.078 + Sil_Geec 2 3.1216 1.5608 1.56 0.210 + Hay_Slur 2 2.9882 1.4941 1.49 0.224 + Hay_Geec 1 4.7456 4.7456 4.75 0.029 + Hay_Manu 1 3.5651 3.5651 3.57 0.059 + Hay_Gull 1 2.6588 2.6588 2.66 0.103 + Manage_O 3 3.1311 1.0437 1.04 0.372 + Sil_Slur 1 1.3876 1.3876 1.39 0.239 + Sil_Gull 1 1.0677 1.0677 1.07 0.301 Residual 910 858.5149 0.9434 Total 949 993.4757 1.0469 Final model: Constant + FCattle + SamGrF + Cattle + BeefonDairy + Gra_Slur + Gra_Manu + Pigs + Housing + Max_Age + Water + Sampler + T_DHouse + Sil_Geec + Hay_Slur + Hay_Geec + Hay_Manu + Hay_Gull + Manage_O + Sil_Slur + Sil_Gull On fitting this model, it becomes apparent that the model is subject to a serious lack of fit due to aliasing between Housing and Grass_Slurry. Housing is by far the less understandable variable and is dropped. 
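As a check on how to read the stepwise table above: with the dispersion fixed at 1, the "approx chi pr" column appears consistent with referring each change in deviance to a chi-squared distribution on the stated degrees of freedom. The following Python sketch is an illustration under that assumption, not Genstat's own code; `chi2_sf` is a hypothetical helper using standard closed forms for integer degrees of freedom, and the rows are copied from the table.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variate with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(df // 2):
        if j > 0:
            term *= x / (2 * j + 1)
        total += term
    return total


# (term, change in deviance, d.f., reported 'approx chi pr') from the table above.
rows = [
    ("Housing", 13.2654, 4, 0.010),
    ("Max_Age", 4.0096, 1, 0.045),
    ("Water", 8.7817, 6, 0.186),
    ("Manage_O", 3.1311, 3, 0.372),
]

for name, deviance, df, reported in rows:
    print(name, round(chi2_sf(deviance, df), 3), "reported", reported)
```

Because many terms are screened in sequence, these probabilities are best treated as a rough guide to inclusion, which is how the report uses them.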
Recalculating the stepwise procedure gives: 5851 RSEARCH [METHOD=fstep;FORCED=FCattle+SamGrF+Cattle+BeefonDairy+Gra_Slur+Gra_Manu+Pigs] Manage_O \\ 5852 +Sampler+ Max_Age + Min_Age + Housed + NoChange + T_DHouse+Sup_Feed\\ 135 5853 +Forage + Silage+Conc+ Sil_Home+ Sil_Manu+Sil_Slur+Sil_Sewa+Sil_Geec+Sil_Gull+Hay + Hay_Manu + Hay_Slur+Hay_Geec+Hay_Gull\\ 5854 +Gra_Sewa+Gra_Geec+Gra_Gull+Sheep + N_Horses+ Chicks + Deer+ Water + Water_Con + WaterCT+ Want2Kno \\ 5855 + Visit2 ***** Model Selection ***** Response variate: VFarmPos Binomial totals: N_Bin Distribution: Binomial Link function: Logit Number of units: 950 Forced terms: Constant + FCattle + SamGrF + Cattle + BeefonDairy + Gra_Slur + Gra_Manu + Pigs Forced df: 15 Free terms: Manage_O + Sampler + Max_Age + Min_Age + Housed + NoChange + T_DHouse + Sup_Feed + Forage + Silage + Conc + Sil_Home + Sil_Manu + Sil_Slur + Sil_Sewa + Sil_Geec + Sil_Gull + Hay + Hay_Manu + Hay_Slur + Hay_Geec + Hay_Gull + Gra_Sewa + Gra_Geec + Gra_Gull + Sheep + N_Horses + Chicks + Deer + Water + Water_Con + WaterCT + Want2Kno + Visit2 *** Stepwise (forward) analysis of deviance *** Change mean deviance approx d.f. 
deviance deviance ratio chi pr
+ FCattle + SamGrF + Cattle + BeefonDairy + Gra_Slur + Gra_Manu + Pigs   14   79.6050   5.6861   5.69   <.001
+ Sampler     1    4.4342   4.4342   4.43   0.035
+ Max_Age     1    3.7562   3.7562   3.76   0.053
+ Water       6    8.0363   1.3394   1.34   0.235
+ T_DHouse    1    3.1636   3.1636   3.16   0.075
+ Hay_Geec    2    3.4529   1.7264   1.73   0.178
+ Hay_Slur    1    3.8311   3.8311   3.83   0.050
+ Hay_Manu    1    3.4441   3.4441   3.44   0.063
+ Manage_O    3    4.3497   1.4499   1.45   0.226
+ Hay_Gull    1    2.3074   2.3074   2.31   0.129
+ Sil_Geec    2    2.5615   1.2807   1.28   0.278
+ Housed      1    1.3066   1.3066   1.31   0.253
Residual    915  873.2272   0.9543
Total       949  993.4757   1.0469
Final model: Constant + FCattle + SamGrF + Cattle + BeefonDairy + Gra_Slur + Gra_Manu + Pigs + Sampler + Max_Age + Water + T_DHouse + Hay_Geec + Hay_Slur + Hay_Manu + Manage_O + Hay_Gull + Sil_Geec + Housed

The threshold for inclusion is set deliberately low, so many of these factors will lack statistical significance. We examine their suitability for inclusion in the model by implementing a backwards stepwise procedure.
1/ Housed is not statistically significant when dropped (p=0.22). Housed is dropped.
2/ Sil_Geec is not statistically significant when dropped (p=0.11). Sil_Geec is dropped.
3/ Sil_Slur is not statistically significant when dropped (p=0.84). Sil_Slur is dropped.
4/ Sampler is not statistically significant when dropped (p=0.17). Sampler is dropped.
5/ Sil_Gull is not statistically significant when dropped (p=0.57). Sil_Gull is dropped.
6/ Water is not statistically significant when dropped (p=0.19). Water is dropped.
7/ Hay_Geec is not statistically significant when dropped (p=0.20). Hay_Geec is dropped.
8/ Hay_Manu is not statistically significant when dropped (p=0.54). Hay_Manu is dropped.
9/ Hay_Gull is not statistically significant when dropped (p=0.30). Hay_Gull is dropped.
10/ Hay_Slur is not statistically significant when dropped (p=0.25). Hay_Slur is dropped.
11/ T_DHouse is not statistically significant when dropped (p=0.11). T_DHouse is dropped.
12/ Cattle is not statistically significant when dropped (p=0.18). Cattle is dropped.

All the remaining factors are statistically significant at the 10% level or better. The variate Max_Age has been added as a new candidate term: a higher maximum age among the animals in the sample group means that the samples are less likely to contain a positive. Examination of the histogram of this variable suggests that it is unlikely to be subject to serious leverage problems.

5883 DROP [PRINT=model,summary,estimates; CONSTANT=estimate; FPROB=yes; TPROB=yes] Cattle
* MESSAGE: Term Gra_Manu cannot be fully included in the model because 1 parameter is aliased with terms already in the model
(Gra_Manu 999) = (Gra_Slur 999)
***** Regression Analysis *****
Response variate: VFarmPos
Binomial totals: N_Bin
Distribution: Binomial
Link function: Logit
Fitted terms: Constant + FCattle + SamGrF + BeefonDairy + Gra_Slur + Gra_Manu + Pigs + Max_Age
*** Summary of analysis ***
mean deviance approx d.f.
deviance deviance ratio chi pr
Regression    12    78.0   6.5038   6.50   <.001
Residual     937   915.4   0.9770
Total        949   993.5   1.0469
Change         3     4.9   1.6474   1.65   0.176
* MESSAGE: ratios are based on dispersion parameter with value 1
Dispersion parameter is fixed at 1.00
* MESSAGE: The residuals do not appear to be random; for example, fitted values in the range 0.07 to 0.09 are consistently larger than observed values and fitted values in the range 0.55 to 0.58 are consistently smaller than observed values
* MESSAGE: The error variance does not appear to be constant: large responses are more variable than small responses
* MESSAGE: The following units have high leverage:
Unit   Response   Leverage
 25      1.00      0.046
 53      1.00      0.049
 80      1.00      0.060
131      1.00      0.091
297      0.00      0.105
301      1.00      0.108
384      0.00      0.109
385      1.00      0.102
440      1.00      0.049
497      0.00      0.099
527      0.00      0.051
552      0.00      0.050
581      1.00      0.059
601      1.00      0.109
602      0.00      0.096
640      1.00      0.052
659      0.00      0.049
701      1.00      0.097
788      1.00      0.101
858      0.00      0.114
*** Estimates of parameters ***
                 estimate     s.e.     t(*)    t pr.   antilog of estimate
Constant          -1.750     0.384    -4.55    <.001    0.1738
FCattle 2          0.383     0.208     1.84    0.066    1.466
FCattle 3          0.362     0.234     1.55    0.122    1.436
FCattle 4          0.981     0.318     3.08    0.002    2.668
SamGrF 2           0.746     0.266     2.81    0.005    2.109
SamGrF 3           0.632     0.269     2.35    0.019    1.882
SamGrF 4           1.097     0.268     4.09    <.001    2.995
BeefonDairy 1      1.967     0.641     3.07    0.002    7.150
Gra_Slur 1         1.201     0.319     3.76    <.001    3.324
Gra_Slur 999       0.084     0.197     0.43    0.668    1.088
Gra_Manu 1        -1.164     0.369    -3.15    0.002    0.3122
Gra_Manu 999       0         *         *       *        1.000
Pigs 2             0.891     0.347     2.57    0.010    2.438
Max_Age           -0.0309    0.0154   -2.01    0.044    0.9695
* MESSAGE: s.e.s are based on dispersion parameter with value 1
Parameters for factors are differences compared with the reference level:
Factor         Reference level
FCattle        1
SamGrF         1
BeefonDairy    0
Gra_Slur       0
Gra_Manu       0
Pigs           1

Hence, the factors FCattle, SamGrF, BeefonDairy, Gra_Slur, Gra_Manu and Pigs, and the variate Max_Age, are carried forward for detailed review in the Generalised Linear Mixed Model. Fitting this model in the Generalised Linear Mixed Model context gives the following output. Initially, County and veterinary practice are fitted as possible random effects along with Farm.

5560 GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
5561 LINK=logit; DISPERSION=1; FIXED=SamGrF + Gra_Slur + Gra_Manu +BeefonDairy + Pigs + FCattle + Max_Age;\
5562 RANDOM=County+Vet+Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=fixed;\
5563 CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin
**** G5W0001 **** Warning (Code VC 38). Statement 131 in Procedure GLMM
Command: REML [PRINT=*; RMETHOD=all] TRANS
Value of deviance at final iteration larger than at previous iteration(s)
Minimum deviance = 2199.17: value at final iteration = 2215.26
**** G5W0002 **** Warning (Code VD 12). Statement 131 in Procedure GLMM
Command: REML [PRINT=*; RMETHOD=all] TRANS
REML algorithm has diverged/parameters out of bounds - output not available
Results may be unreliable.
Printed estimates of variance parameters/monitoring information are available from REML or VDISPLAY and will indicate which parameters are unstable. Redefine the model or use better initial values. **** G5W0003 **** Warning (Code VD 12). Statement 135 in Procedure GLMM Command: VKEEP #RAND; COMP=V[] REML algorithm has diverged/parameters out of bounds - output not available Results may be unreliable. Printed estimates of variance parameters/monitoring information are available from REML or VDISPLAY and will indicate which parameters are unstable. Redefine the model or use better initial values. * Message: Negative variance components present: * Tables of effects/means will be produced for random model terms but should be used with caution ***** Generalised Linear Mixed Model Analysis ***** Method: Marginal model, cf Breslow & Clayton (1993) JASA Response variate: VFarmPos Distribution: BINOMIAL Link function: LOGIT Random model: (County + Vet) + Farm Fixed model: Constant + (((((SamGrF + Gra_Slur) + Gra_Manu) + Beefin Dairy) + Pigs) + FCattle) + Max_Age * Dispersion parameter fixed at value 1.000 ******** Warning from GLMM: missing values generated in weights/working variate. *** Monitoring information *** Iteration Gammas Dispersion Max change 1 0.009026 0.0001000 0.2296 1.000 3.4670E+00 ******** Warning from GLMM: missing values generated in weights/working variate. 2 0.005302 0.0001000 0.0001000 1.000 2.2952E-01 ******** Warning from GLMM: missing values generated in weights/working variate. 3 0.005788 0.0001000 0.09561 1.000 9.5507E-02 ******** Warning from GLMM: missing values generated in weights/working variate. 4 0.005703 0.0001000 0.2326 1.000 1.3699E-01 ******** Warning from GLMM: missing values generated in weights/working variate. 5 0.005657 0.0001000 0.2370 1.000 4.3747E-03 ******** Warning from GLMM: missing values generated in weights/working variate. 
6 0.005638 0.0001000 0.2368 1.000 2.0973E-04 ******** Warning from GLMM: missing values generated in weights/working variate. 139 7 0.005635 0.0001000 0.2367 1.000 7.3462E-05 *** Estimated Variance Components *** Random term Component S.e. County 0.006 0.052 Vet 0.000 0.093 Farm 0.237 0.304 *** Residual variance model *** Term Factor Model(order) Parameter Estimate S.e. Dispersn Identity Sigma2 1.000 FIXED *** Estimated Variance matrix for Variance Components *** County 1 0.00272 Vet 2 -0.00204 0.00858 Farm 3 -0.00034 -0.00654 0.09255 Dispersn 4 0.00000 0.00000 0.00000 0.00000 1 2 3 4 *** Table of effects for Constant *** -2.349 Standard error: 0.2667 *** Table of effects for SamGrF *** SamGrF 1 2 3 4 0.0000 0.7505 0.6321 1.0896 Standard error of differences: Average 0.2515 Maximum 0.2737 Minimum 0.2280 Average variance of differences: 0.06369 *** Table of effects for Gra_Slur *** Gra_Slur 0.0 1.0 999.0 0.0000 1.2091 0.0885 Standard error of differences: Average 0.2846 Maximum 0.3277 Minimum 0.2011 Average variance of differences: 0.08451 *** Table of effects for Gra_Manu *** Gra_Manu 0.0 1.0 999.0 0.0000 -1.1654 0.0000 Standard error of differences: 0.3776 *** Table of effects for BeefonDairy *** BeefonDairy 0.0000 1.0000 0.0000 1.9659 140 Standard error of differences: 0.6598 *** Table of effects for Pigs *** Pigs 1 2 0.0000 0.8876 Standard error of differences: 0.3565 *** Table of effects for FCattle *** FCattle 1 2 3 4 0.0000 0.3834 0.3564 0.9813 Standard error of differences: Average 0.2796 Maximum 0.3352 Minimum 0.2130 Average variance of differences: 0.08062 *** Table of effects for Max_Age *** -0.03106 Standard error: 0.015738 **** G5W0004 **** Warning (Code VC 19). 
Statement 268 in Procedure GLMM Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[] Table/sed matrix not available for mean effects of covariates Table of mean effects cannot be saved for term Max_Age as it is a variate/covariate *** Tables of means *** * Using covariate mean values *** Table of predicted means for SamGrF *** SamGrF 1 2 3 4 -0.4479 0.3025 0.1842 0.6417 *** Table of predicted means for Gra_Slur *** Gra_Slur 0.0 1.0 999.0 -0.2624 0.9467 -0.1740 *** Table of predicted means for Gra_Manu *** Gra_Manu 0.0 1.0 999.0 0.5586 -0.6068 0.5586 *** Table of predicted means for BeefonDairy *** BeefonDairy 0.0000 1.0000 -0.8129 1.1531 *** Table of predicted means for Pigs *** 141 Pigs 1 2 -0.2737 0.6139 *** Table of predicted means for FCattle *** FCattle 1 2 3 4 -0.2602 0.1233 0.0962 0.7212 *** Back-transformed Means (on the original scale) *** * Using covariate mean values SamGrF 1 0.3899 2 0.5751 3 0.5459 4 0.6551 Gra_Slur 0.0 0.4348 1.0 0.7204 999.0 0.4566 Gra_Manu 0.0 0.6361 1.0 0.3528 999.0 0.6361 BeefonDairy 0.0000 0.3073 1.0000 0.7601 Pigs 1 0.4320 2 0.6488 FCattle 1 0.4353 2 0.5308 3 0.5240 4 0.6729 Note: means are probabilities not expected values. Veterinary practice is clearly the least significant (in fact, virtually non-existent) variance component. The model is refitted without this random factor. 
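The back-transformed means in the output above are the predicted means on the logit scale passed through the inverse-logit function. The following is a minimal sketch of that conversion, written in Python purely for illustration (the analysis itself was run in GenStat; the function name inv_logit and the dictionary names are our own). The input values are copied from the tables of predicted means for the model with County, Vet and Farm as random effects.

```python
import math


def inv_logit(eta: float) -> float:
    """Map a value on the logit (linear predictor) scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Predicted means on the logit scale, copied from the GLMM output above.
predicted_means = {
    "SamGrF 1": -0.4479,
    "SamGrF 4": 0.6417,
    "BeefonDairy 0": -0.8129,
    "BeefonDairy 1": 1.1531,
    "Pigs 2": 0.6139,
}

# Back-transform each mean to the probability scale.
back_transformed = {level: inv_logit(eta) for level, eta in predicted_means.items()}

for level, prob in back_transformed.items():
    print(f"{level:<14} {prob:.4f}")
```

To within the rounding of the printed inputs, the results agree with the back-transformed means listed in the output (for example 0.3899 for SamGrF 1 and 0.7601 for BeefonDairy 1).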
5564 GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\ 5565 LINK=logit; DISPERSION=1; FIXED=SamGrF + Gra_Slur + Gra_Manu +BeefonDairy + Pigs + FCattle + Max_Age;\ 5566 RANDOM=County+Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=fixed;\ 5567 CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin * Message: Negative variance components present: * Tables of effects/means will be produced for random model terms but should be used with caution ***** Generalised Linear Mixed Model Analysis ***** 142 Method: Marginal model, cf Breslow & Clayton (1993) JASA Response variate: VFarmPos Distribution: BINOMIAL Link function: LOGIT Random model: County + Farm Fixed model: Constant + (((((SamGrF + Gra_Slur) + Gra_Manu) + Beefin Dairy) + Pigs) + FCattle) + Max_Age * Dispersion parameter fixed at value 1.000 ******** Warning from GLMM: missing values generated in weights/working variate. *** Monitoring information *** Iteration Gammas Dispersion Max change 1 0.0001000 0.1210 1.000 3.5292E+00 ******** Warning from GLMM: missing values generated in weights/working variate. 2 0.0001000 0.0001000 1.000 1.2092E-01 ******** Warning from GLMM: missing values generated in weights/working variate. 3 0.0001000 0.0001000 1.000 0.0000E+00 *** Estimated Variance Components *** Random term Component S.e. County 0.000 0.042 Farm 0.000 0.278 *** Residual variance model *** Term Factor Model(order) Parameter Estimate S.e. 
Dispersn Identity Sigma2 1.000 FIXED *** Estimated Variance matrix for Variance Components *** County 1 0.00173 Farm 2 -0.00160 0.07753 Dispersn 3 0.00000 0.00000 0.00000 1 2 3 *** Table of effects for Constant *** -2.345 Standard error: 0.2540 *** Table of effects for SamGrF *** SamGrF 1 2 3 4 0.0000 0.7439 0.6306 1.0895 Standard error of differences: Average 0.2424 Maximum 0.2623 Minimum 0.2212 143 Average variance of differences: 0.05914 *** Table of effects for Gra_Slur *** Gra_Slur 0.0 1.0 999.0 0.0000 1.2053 0.0917 Standard error of differences: Average 0.2747 Maximum 0.3156 Minimum 0.1946 Average variance of differences: 0.07864 *** Table of effects for Gra_Manu *** Gra_Manu 0.0 1.0 999.0 0.0000 -1.1552 0.0000 Standard error of differences: 0.3589 *** Table of effects for BeefonDairy *** BeefonDairy 0.0000 1.0000 0.0000 1.9648 Standard error of differences: 0.6380 *** Table of effects for Pigs *** Pigs 1 2 0.0000 0.8917 Standard error of differences: 0.3454 *** Table of effects for FCattle *** FCattle 1 2 3 4 0.0000 0.3818 0.3515 0.9802 Standard error of differences: Average 0.2710 Maximum 0.3254 Minimum 0.2062 Average variance of differences: 0.07578 *** Table of effects for Max_Age *** -0.03085 Standard error: 0.015202 **** G5W0005 **** Warning (Code VC 19). 
Statement 268 in Procedure GLMM Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[] Table/sed matrix not available for mean effects of covariates Table of mean effects cannot be saved for term Max_Age as it is a variate/covariate *** Tables of means *** * Using covariate mean values 144 *** Table of predicted means for SamGrF *** SamGrF 1 2 3 4 -0.4409 0.3030 0.1897 0.6486 *** Table of predicted means for Gra_Slur *** Gra_Slur 0.0 1.0 999.0 -0.2572 0.9481 -0.1655 *** Table of predicted means for Gra_Manu *** Gra_Manu 0.0 1.0 999.0 0.5602 -0.5950 0.5602 *** Table of predicted means for BeefonDairy *** BeefonDairy 0.0000 1.0000 -0.8073 1.1575 *** Table of predicted means for Pigs *** Pigs 1 2 -0.2707 0.6210 *** Table of predicted means for FCattle *** FCattle 1 2 3 4 -0.2533 0.1286 0.0982 0.7270 *** Back-transformed Means (on the original scale) *** * Using covariate mean values SamGrF 1 0.3915 2 0.5752 3 0.5473 4 0.6567 Gra_Slur 0.0 0.4360 1.0 0.7207 999.0 0.4587 Gra_Manu 0.0 0.6365 1.0 0.3555 999.0 0.6365 BeefonDairy 0.0000 0.3085 1.0000 0.7609 Pigs 1 0.4327 2 0.6504 FCattle 1 0.4370 145 2 0.5321 3 0.5245 4 0.6741 Note: means are probabilities not expected values. Neither variance component is significantly affecting the model. It would seem sensible, however, to attempt to fit the model with only the lowest stratum of variability. 
5568 GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\ 5569 LINK=logit; DISPERSION=1; FIXED=SamGrF + Gra_Slur + Gra_Manu +BeefonDairy + Pigs + FCattle + Max_Age;\ 5570 RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=fixed;\ 5571 CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin ***** Generalised Linear Mixed Model Analysis ***** Method: Marginal model, cf Breslow & Clayton (1993) JASA Response variate: VFarmPos Distribution: BINOMIAL Link function: LOGIT Random model: Farm Fixed model: Constant + (((((SamGrF + Gra_Slur) + Gra_Manu) + Beefin Dairy) + Pigs) + FCattle) + Max_Age * Dispersion parameter fixed at value 1.000 ******** Warning from GLMM: missing values generated in weights/working variate. *** Monitoring information *** Iteration Gammas Dispersion Max change 1 0.09728 1.000 3.5426E+00 ******** Warning from GLMM: missing values generated in weights/working variate. 2 0.0001000 1.000 9.7176E-02 ******** Warning from GLMM: missing values generated in weights/working variate. 3 0.0001000 1.000 0.0000E+00 *** Estimated Variance Components *** Random term Component S.e. Farm 0.000 0.276 *** Residual variance model *** Term Factor Model(order) Parameter Estimate S.e. 
Dispersn Identity Sigma2 1.000 FIXED *** Estimated Variance matrix for Variance Components *** Farm 1 0.07605 Dispersn 2 0.00000 0.00000 146 1 2 *** Table of effects for Constant *** -2.345 Standard error: 0.2540 *** Table of effects for SamGrF *** SamGrF 1 2 3 4 0.0000 0.7439 0.6307 1.0896 Standard error of differences: Average 0.2424 Maximum 0.2623 Minimum 0.2212 Average variance of differences: 0.05914 *** Table of effects for Gra_Slur *** Gra_Slur 0.0 1.0 999.0 0.0000 1.2053 0.0917 Standard error of differences: Average 0.2746 Maximum 0.3156 Minimum 0.1946 Average variance of differences: 0.07863 *** Table of effects for Gra_Manu *** Gra_Manu 0.0 1.0 999.0 0.0000 -1.1552 0.0000 Standard error of differences: 0.3589 *** Table of effects for BeefonDairy *** BeefonDairy 0.0000 1.0000 0.0000 1.9648 Standard error of differences: 0.6380 *** Table of effects for Pigs *** Pigs 1 2 0.0000 0.8918 Standard error of differences: 0.3454 *** Table of effects for FCattle *** FCattle 1 2 3 4 0.0000 0.3818 0.3515 0.9802 Standard error of differences: Average 0.2710 Maximum 0.3254 Minimum 0.2062 Average variance of differences: 0.07578 *** Table of effects for Max_Age *** -0.03085 Standard error: 0.015201 147 **** G5W0006 **** Warning (Code VC 19). 
Statement 268 in Procedure GLMM Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[] Table/sed matrix not available for mean effects of covariates Table of mean effects cannot be saved for term Max_Age as it is a variate/covariate *** Tables of means *** * Using covariate mean values *** Table of predicted means for SamGrF *** SamGrF 1 2 3 4 -0.4408 0.3031 0.1898 0.6487 *** Table of predicted means for Gra_Slur *** Gra_Slur 0.0 1.0 999.0 -0.2571 0.9481 -0.1654 *** Table of predicted means for Gra_Manu *** Gra_Manu 0.0 1.0 999.0 0.5603 -0.5949 0.5603 *** Table of predicted means for BeefonDairy *** BeefonDairy 0.0000 1.0000 -0.8072 1.1576 *** Table of predicted means for Pigs *** Pigs 1 2 -0.2707 0.6211 *** Table of predicted means for FCattle *** FCattle 1 2 3 4 -0.2532 0.1287 0.0983 0.7270 *** Back-transformed Means (on the original scale) *** * Using covariate mean values SamGrF 1 0.3915 2 0.5752 3 0.5473 4 0.6567 Gra_Slur 0.0 0.4361 1.0 0.7207 999.0 0.4587 148 Gra_Manu 0.0 0.6365 1.0 0.3555 999.0 0.6365 BeefonDairy 0.0000 0.3085 1.0000 0.7609 Pigs 1 0.4327 2 0.6505 FCattle 1 0.4370 2 0.5321 3 0.5246 4 0.6742 Note: means are probabilities not expected values. Given the complete lack of significance of the Farm effect, it was thought worthwhile to investigate the equivalent model incorporating County as the sole random effect. 
5576 GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\ 5577 LINK=logit; DISPERSION=1; FIXED=SamGrF + Gra_Slur + Gra_Manu +BeefonDairy + Pigs + FCattle + Max_Age;\ 5578 RANDOM=County; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=fixed;\ 5579 CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin * Message: Negative variance components present: * Tables of effects/means will be produced for random model terms but should be used with caution ***** Generalised Linear Mixed Model Analysis ***** Method: Marginal model, cf Breslow & Clayton (1993) JASA Response variate: VFarmPos Distribution: BINOMIAL Link function: LOGIT Random model: County Fixed model: Constant + (((((SamGrF + Gra_Slur) + Gra_Manu) + Beefin Dairy) + Pigs) + FCattle) + Max_Age * Dispersion parameter fixed at value 1.000 ******** Warning from GLMM: missing values generated in weights/working variate. *** Monitoring information *** Iteration Gammas Dispersion Max change 1 0.0001000 1.000 1.1357E-02 ******** Warning from GLMM: missing values generated in weights/working variate. 2 0.0001000 1.000 0.0000E+00 149 *** Estimated Variance Components *** Random term Component S.e. County 0.000 0.036 *** Residual variance model *** Term Factor Model(order) Parameter Estimate S.e. 
Dispersn Identity Sigma2 1.000 FIXED *** Estimated Variance matrix for Variance Components *** County 1 0.0012650 Dispersn 2 0.0000000 0.0000000 1 2 *** Table of effects for Constant *** -2.235 Standard error: 0.2221 *** Table of effects for SamGrF *** SamGrF 1 2 3 4 0.0000 0.6742 0.5690 1.0116 Standard error of differences: Average 0.2221 Maximum 0.2363 Minimum 0.2097 Average variance of differences: 0.04944 *** Table of effects for Gra_Slur *** Gra_Slur 0.0 1.0 999.0 0.0000 1.1463 0.0813 Standard error of differences: Average 0.2590 Maximum 0.2994 Minimum 0.1803 Average variance of differences: 0.07020 *** Table of effects for Gra_Manu *** Gra_Manu 0.0 1.0 999.0 0.0000 -1.0643 0.0000 Standard error of differences: 0.3138 *** Table of effects for BeefonDairy *** BeefonDairy 0.0000 1.0000 0.0000 1.9072 Standard error of differences: 0.6258 *** Table of effects for Pigs *** 150 Pigs 1 2 0.0000 0.8653 Standard error of differences: 0.3371 *** Table of effects for FCattle *** FCattle 1 2 3 4 0.0000 0.3586 0.3300 0.9417 Standard error of differences: Average 0.2587 Maximum 0.3151 Minimum 0.1918 Average variance of differences: 0.06943 *** Table of effects for Max_Age *** -0.02929 Standard error: 0.014025 ******** Warning (Code VC 19). 
Statement 268 in Procedure GLMM Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[] Table/sed matrix not available for mean effects of covariates Table of mean effects cannot be saved for term Max_Age as it is a variate/covariate *** Tables of means *** * Using covariate mean values *** Table of predicted means for SamGrF *** SamGrF 1 2 3 4 -0.3868 0.2874 0.1822 0.6248 *** Table of predicted means for Gra_Slur *** Gra_Slur 0.0 1.0 999.0 -0.2322 0.9140 -0.1510 *** Table of predicted means for Gra_Manu *** Gra_Manu 0.0 1.0 999.0 0.5317 -0.5326 0.5317 *** Table of predicted means for BeefonDairy *** BeefonDairy 0.0000 1.0000 -0.7767 1.1305 *** Table of predicted means for Pigs *** Pigs 1 2 -0.2557 0.6096 *** Table of predicted means for FCattle *** 151 FCattle 1 2 3 4 -0.2307 0.1280 0.0993 0.7111 *** Back-transformed Means (on the original scale) *** * Using covariate mean values SamGrF 1 0.4045 2 0.5714 3 0.5454 4 0.6513 Gra_Slur 0.0 0.4422 1.0 0.7138 999.0 0.4623 Gra_Manu 0.0 0.6299 1.0 0.3699 999.0 0.6299 BeefonDairy 0.0000 0.3150 1.0000 0.7559 Pigs 1 0.4364 2 0.6478 FCattle 1 0.4426 2 0.5320 3 0.5248 4 0.6706 Note: means are probabilities not expected values. Hence there is no evidence of any of the random effects being particularly important. However, it would seem sensible to use a REML-type algorithm to fit the data, given the strongly unbalanced nature of the dataset. Hence, we will fit the model with Farm as the sole random effect. Refitting the model (output not listed) and calculating Wald statistics for the fixed effects gives the following results: 5582 VDISPLAY [PRINT=Wald] 5582............................................................................ *** Wald tests for fixed effects *** Fixed term Wald statistic d.f. Wald/d.f. 
Chi-sq prob
* Sequentially adding terms to fixed model
SamGrF         26.59    3     8.86    <0.001
Gra_Slur        9.38    2     4.69     0.009
Gra_Manu        9.17    1     9.17     0.002
BeefonDairy     8.10    1     8.10     0.004
Pigs            5.21    1     5.21     0.022
FCattle         7.79    3     2.60     0.051
Max_Age         4.12    1     4.12     0.042
* Dropping individual terms from full fixed model
SamGrF         17.56    3     5.85    <0.001
Gra_Slur       15.20    2     7.60    <0.001
Gra_Manu       10.36    1    10.36     0.001
BeefonDairy     9.48    1     9.48     0.002
Pigs            6.67    1     6.67     0.010
FCattle        10.36    3     3.45     0.016
Max_Age         4.12    1     4.12     0.042
* Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases.

Even allowing for the liberal nature of the Wald tests, it is clear that there is strong statistical evidence for the inclusion of each of the factors in the final multi-factor model. Each factor will be reviewed in turn, plotting the mean estimated farm prevalence for different levels of each factor, along with the associated 95% confidence intervals.

Considering SamGrF, there is clear evidence that farms with fewer than 12 animals in the sampling group have a lower probability of exhibiting shedding.

Category    Mean Farm Prevalence
<12         0.39
12-17       0.58
18-28       0.55
>28         0.66

Any trend in the data would be assumed to be monotonic, and hence it seems likely that the (statistically insignificant) difference between categories 2 and 3 is simply due to stochastic noise. It is not immediately clear how the prevalence in the highest category relates to those in the intermediate categories.

[Figure: mean farm prevalence with 95% confidence intervals, by SamGrF category (<12, 12-17, 18-28, >28).]

Contrasting the mean in the first category with the means in the 2 intermediate categories, we find that the mean difference (on the logit scale) equals 0.69, the standard error is 0.23 and hence the t-statistic equals 2.93, with an associated p-value of 0.003.
Hence, the probability of detecting shedding is lower in groups containing fewer than 12 animals than in groups containing 12-28 animals. Contrasting the mean in the final category with the means in the 2 intermediate categories, we find that the mean difference on the logit scale equals 0.40, the standard error is 0.19 and hence the t-statistic equals 2.12, with an associated p-value of 0.03. Hence, the probability of detecting shedding is lower in groups containing 12-28 animals than in groups containing more animals.

It might be thought that this is a truism: if, on all farms, each animal has an independent chance of shedding, then the larger the number of samples tested, the more likely it is that a positive sample will be detected. In practice, we might suspect that the independence assumption is extremely unlikely to be true, but we need to assess the results under such a hypothesis. The first requirement is to estimate the independent probability of animal infection. For each category, we tabulate the median number of samples collected and hence, based on the estimated farm prevalences for these categories, an estimate of the individual probabilities.

Category    Mean Prevalence    Median Samples    Individual Probability
<12         0.39                8                0.060
12-17       0.58               14                0.059
18-28       0.55               18                0.043
>28         0.66               22                0.047

The larger the number of samples collected per group, the weaker the effect of variability in the sampling distribution on the individual probabilities. However, the highest category is unbounded, which will increase the variability again. On this basis, a value of 0.043, derived from the 18-28 category, is used as the estimate of the individual probability.

Category    Estimated Prevalence from Data    Modelled Prevalence
<12         0.39                              0.30
12-17       0.58                              0.46
18-28       0.55                              0.55
>28         0.66                              0.62

Given the sizeable numbers of farms in the study, the differences between the estimated and modelled prevalences in the lowest two categories are appreciable.
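The two tables above can be reproduced with a short calculation. Under the independence hypothesis, if each of n sampled animals sheds with probability p, the chance that a group returns at least one positive is 1 - (1 - p)^n, and inverting this gives p = 1 - (1 - P)^(1/n) for an observed group prevalence P. The sketch below is illustrative only: the function and variable names are our own, and it assumes that the four-decimal back-transformed means from the GLMM output (Farm as sole random effect) were the inputs, since the two-decimal figures in the tables are rounded.

```python
def individual_probability(prevalence: float, n_samples: int) -> float:
    """Per-animal shedding probability implied by a group-level prevalence,
    assuming the n_samples animals shed independently."""
    return 1.0 - (1.0 - prevalence) ** (1.0 / n_samples)


def modelled_prevalence(p_animal: float, n_samples: int) -> float:
    """Probability that at least one of n_samples independent samples is positive."""
    return 1.0 - (1.0 - p_animal) ** n_samples


# (estimated farm prevalence, median number of samples) for each SamGrF category.
categories = {
    "<12": (0.3915, 8),
    "12-17": (0.5752, 14),
    "18-28": (0.5473, 18),
    ">28": (0.6567, 22),
}

# Individual probability adopted in the text (taken from the 18-28 category).
P_REF = 0.043

implied = {name: individual_probability(prev, n) for name, (prev, n) in categories.items()}
modelled = {name: modelled_prevalence(P_REF, n) for name, (_, n) in categories.items()}

for name, (prev, n) in categories.items():
    print(f"{name:<6} {prev:.2f} {n:>3} {implied[name]:.3f} {modelled[name]:.2f}")
```

The implied individual probabilities round to 0.060, 0.059, 0.043 and 0.047, and the modelled prevalences to 0.30, 0.46, 0.55 and 0.62, matching the tabulated values.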
Similar results were generated by calculating the individual probability of detection for each farm, and then averaging these by category. On this basis, it seems unlikely that the pattern of prevalences associated with SamGrF is purely explicable as a mechanical association with the highly correlated term, number of samples collected. Moreover, the within-herd prevalence estimated here is very much less than that calculated from the within-herd prevalence data. This must cast considerable doubt on the argument that this observed effect is an artefact of the sampling scheme. However, this possibility should be taken into account when discussing this variable.

The inclusion of FCattle in the model, even in the presence of SamGrF, nevertheless indicates that there are genuine ‘size of operation’ effects present in the epidemiology of infection. In view of this, we will next consider the factor FCattle. The pattern of prevalence can be seen in the following diagram:

[Figure: mean farm prevalence, by FCattle category (1-49, 50-99, 100-199, 200+).]

The mean farm prevalences for each category of FCattle are as follows:

Category    Mean Farm Prevalence
1-49        0.44
50-99       0.53
100-199     0.52
200+        0.67

There is some indication of an upwards trend in the data with respect to higher numbers of finishing cattle, especially when comparing the lowest category, the middle two categories and the highest category. Comparing the lowest category (<50 animals) with the two intermediate categories (50-99 and 100-199), the mean difference in prevalences (on the logit scale) is 0.37, with a standard error of 0.19, giving rise to a t-statistic of 1.98 and an associated p-value of 0.048. Comparing the intermediate categories with the highest category (200+ animals), the mean difference in prevalences (on the logit scale) is 0.61, with a standard error of 0.30, giving rise to a t-statistic of 2.07 and an associated p-value of 0.039.
Hence, there is evidence of a trend of increased risk of shedding being identified, associated with higher numbers of finishing cattle on the farm. In the context of SamGrF also being fitted to the model, this result is almost certainly a genuine effect of enterprise size. It might be associated with threshold results from epidemic modelling theory, or with aspects of animal management on larger enterprises.

Next, we consider the effect of spreading slurry on pasture. It will be remembered that this question was, in the main, asked only of farms with animals at pasture. Hence this analysis includes a ‘Housed’ category, to reflect the prevalences seen, on average, on farms on which the question was not asked. The mean prevalences for the different categories are as follows:

Category                   Mean Farm Prevalence
Unhoused: No Slurry        0.44
Unhoused: Slurry Spread    0.72
Housed                     0.46

It is apparent that the mean prevalences seen in Housed animals and in Pastured animals from farms which do not spread slurry are virtually identical. However, the mean prevalence on farms which do spread slurry is appreciably higher. Comparing the mean prevalences on farms with animals at pasture, between those which spread slurry and those which do not, we find that the mean difference (on the logit scale) equals 1.21 and the standard error equals 0.32, giving rise to a t-statistic of 3.82 and an associated p-value less than 0.001. The spreading of slurry on pasture is a significant risk factor on farms with animals at pasture.

[Figure: mean farm prevalence, by category (Unhoused: No Slurry, Unhoused: Slurry Spread, Housed).]

Next, we consider the effect of spreading manure on pasture. Again, this question was, in the main, asked only of farms with animals at pasture. Hence this analysis again includes a ‘Housed’ category, to reflect the prevalences seen, on average, on farms on which the question was not asked.
[Figure: mean farm prevalence, by category (Unhoused: No Manure, Unhoused: Manure Spread, Housed).]

The mean farm prevalences for the different categories of farm are as follows:

Category                   Mean Farm Prevalence
Unhoused: No Manure        0.64
Unhoused: Manure Spread    0.36
Housed                     0.64

It is apparent that the mean prevalences seen in Housed animals and in Pastured animals from farms which do not spread manure are virtually identical. However, the mean prevalence on farms which do spread manure is appreciably lower. Comparing the mean prevalences on farms with animals at pasture, between those which spread manure and those which do not, we find that the mean difference (on the logit scale) equals 1.16 and the standard error equals 0.36, giving rise to a t-statistic of 3.22 and an associated p-value of 0.001. The spreading of manure on pasture is a significant protective factor on farms with animals at pasture.

This result may appear somewhat counterintuitive; however, it may be related to the manure management regime in place on a farm which wishes to spread this material on pasture. If the regime put in place to achieve this reduces the contact of animals with faeces in the short term, during periods when the animals are housed, this may reduce the ability of the infection to maintain itself on the farm, and hence give rise to a reduction in farm prevalence even later, when the animals (ironically) are at pasture and hence in contact with the manure. The results seen earlier in the within-herd prevalence analysis would suggest that the contact of animals with infection while housed is more important in maintaining high prevalence levels than any contact while at pasture. It is unfortunate that the design of the study does not allow any investigation of whether similar manure-on-pasture effects are present on farms with (currently) housed animals.
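The t-statistics and p-values quoted for the slurry and manure contrasts can be checked from the tables of effects in the GLMM output with Farm as the sole random effect. The sketch below divides each effect by its standard error and refers the ratio to a standard normal distribution (two-sided). It is an illustration under stated assumptions rather than the procedure actually used: the function name wald_test is our own, and for Gra_Slur we assume that the maximum standard error of differences printed in the output (0.3156) is the one that applies to the level 1 versus level 0 comparison, because it reproduces the quoted statistic.

```python
import math


def wald_test(estimate: float, std_error: float) -> tuple[float, float]:
    """Return the Wald z-statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# (effect on the logit scale, standard error of the difference),
# copied from the tables of effects in the GLMM output.
contrasts = {
    "Gra_Slur 1 vs 0": (1.2053, 0.3156),   # SE: assumed to be the maximum s.e.d.
    "Gra_Manu 1 vs 0": (-1.1552, 0.3589),
}

results = {name: wald_test(est, se) for name, (est, se) in contrasts.items()}

for name, (z, p) in results.items():
    print(f"{name:<16} z = {z:6.2f}   p = {p:.4f}")
```

The statistics come out at about 3.82 for slurry and 3.22 (in absolute value) for manure, with p-values below 0.001 and of about 0.001 respectively, in line with the figures quoted in the text. The negative sign for manure reflects the protective direction of the effect.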
Farms with beef animals in a dairy herd were identified as high risk in the earlier analyses. Considering the BeefonDairy factor, it is immediately clear that the prevalence is much higher on this class of farm.

Category                             Mean Farm Prevalence
Not a Dairy Farm with Beef Cattle    0.31
Dairy Farm with Beef Cattle          0.76

The means and 95% confidence intervals are given in the following plot:

[Figure: Mean Farm Prevalence (0 to 1) by category: Not a Dairy Farm with Beef Cattle, Dairy Farm with Beef Cattle]

Carrying out a t-test, the mean difference (on the logit scale) is found to be 1.96, the standard error is 0.64, the t-statistic equals 3.08, and the associated p-value is 0.002. Hence, the prevalence is highly significantly higher in this class of farm. It is of some concern that this particular group was only identified through a detailed examination of the data, but the high prevalence seen in this group is extremely striking.

The final factor which has been examined is Pigs. The mean farm prevalence for each category is as follows:

Category            Mean Farm Prevalence
Pigs not present    0.43
Pigs present        0.65

The picture becomes clearer if the means are plotted with the associated 95% confidence intervals:

[Figure: Mean Farm Prevalence (0 to 1) by category: Pigs not present, Pigs present]

The data would suggest that farms with pigs present exhibit a higher prevalence than those without. Carrying out a t-test, the mean difference (on the logit scale) is found to be 0.89, the standard error is 0.35, the t-statistic equals 2.58, and the associated p-value is 0.01. Hence, the prevalence is statistically significantly higher in this class of farm.

The only variate which has been included in the model is Max_Age. The effect of this variate on the linear predictor is summarised by the associated coefficient, which takes the estimated value of -0.03, with a standard error of 0.015. The associated p-value equals 0.04.
Hence, this result suggests that the higher the maximum age of animal present in the sampling group, the less likely the group is to present a positive sample. The nature of the effect is similar to that seen in the univariate analysis, where the associated p-value was 0.30. However, the removal of noise through the fitting of other explanatory factors has clearly allowed the multi-factor model to identify the utility of this variate in explaining aspects of the data. A review of the histogram of the variate suggests that it is unlikely to be subject to issues of leverage.

Having fitted all the likely explanatory variables in the multi-factor model, we now return to explore the effect that the inclusion of these factors may have on the fit of the structural factors. Fitting Division in addition to the above explanatory variables gives the following output:

GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
  LINK=logit; DISPERSION=1; FIXED=SamGrF + Gra_Slur + Gra_Manu+BeefonDairy + Pigs + FCattle + Max_Age+Division;\
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=fixed; CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin

***** Generalised Linear Mixed Model Analysis *****

Method:            Marginal model, cf Breslow & Clayton (1993) JASA
Response variate:  VFarmPos
Distribution:      BINOMIAL
Link function:     LOGIT
Random model:      Farm
Fixed model:       Constant + ((((((SamGrF + Gra_Slur) + Gra_Manu) + BeefonDairy) + Pigs) + FCattle) + Max_Age) + Division

* Dispersion parameter fixed at value 1.000

******** Warning from GLMM: missing values generated in weights/working variate.

*** Monitoring information ***

Iteration  Gammas       Dispersion  Max change
1          0.1286       1.000       3.5100E+00
******** Warning from GLMM: missing values generated in weights/working variate.
2          0.000001000  1.000       1.2864E-01
******** Warning from GLMM: missing values generated in weights/working variate.
3          0.000001000  1.000       0.0000E+00

*** Estimated Variance Components ***

Random term  Component  S.e.
Farm         0.000      0.277

*** Residual variance model ***

Term      Factor  Model(order)  Parameter  Estimate  S.e.
Dispersn          Identity      Sigma2     1.000     FIXED

*** Estimated Variance matrix for Variance Components ***

Farm      1  0.07674
Dispersn  2  0.00000  0.00000
             1        2

*** Table of effects for Constant ***
-2.144    Standard error: 0.3003

*** Table of effects for SamGrF ***
SamGrF  1       2       3       4
        0.0000  0.7174  0.5415  1.0466
Standard error of differences: Average 0.2447  Maximum 0.2661  Minimum 0.2227
Average variance of differences: 0.06023

*** Table of effects for Gra_Slur ***
Gra_Slur  0.0     1.0     999.0
          0.0000  1.2801  0.0802
Standard error of differences: Average 0.2790  Maximum 0.3217  Minimum 0.1955
Average variance of differences: 0.08130

*** Table of effects for Gra_Manu ***
Gra_Manu  0.0     1.0      999.0
          0.0000  -1.1381  0.0000
Standard error of differences: 0.3610

*** Table of effects for BeefonDairy ***
BeefonDairy  0.0000  1.0000
             0.000   2.015
Standard error of differences: 0.6400

*** Table of effects for Pigs ***
Pigs  1       2
      0.0000  0.8741
Standard error of differences: 0.3480

*** Table of effects for FCattle ***
FCattle  1       2       3       4
         0.0000  0.3680  0.3494  0.9796
Standard error of differences: Average 0.2747  Maximum 0.3277  Minimum 0.2076
Average variance of differences: 0.07788

*** Table of effects for Max_Age ***
-0.03181    Standard error: 0.015407

*** Table of effects for Division ***
Division  Central  Highland  Islands  North East  South East  South West
          0.0000   -0.4960   -0.2883  0.0093      0.0066      -0.3872
Standard error of differences: Average 0.3212  Maximum 0.4244  Minimum 0.2437
Average variance of differences: 0.1062

**** G5W0003 **** Warning (Code VC 19). Statement 268 in Procedure GLMM
Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[]
Table/sed matrix not available for mean effects of covariates
Table of mean effects cannot be saved for term Max_Age as it is a variate/covariate

*** Tables of means ***
* Using covariate mean values

*** Table of predicted means for SamGrF ***
SamGrF  1        2       3       4
        -0.3942  0.3232  0.1473  0.6524

*** Table of predicted means for Gra_Slur ***
Gra_Slur  0.0      1.0     999.0
          -0.2713  1.0088  -0.1911

*** Table of predicted means for Gra_Manu ***
Gra_Manu  0.0     1.0      999.0
          0.5615  -0.5766  0.5615

*** Table of predicted means for BeefonDairy ***
BeefonDairy  0.0000  1.0000
             -0.825  1.189

*** Table of predicted means for Pigs ***
Pigs  1        2
      -0.2549  0.6192

*** Table of predicted means for FCattle ***
FCattle  1        2       3       4
         -0.2421  0.1259  0.1073  0.7375

*** Table of predicted means for Division ***
Division  Central  Highland  Islands  North East  South East  South West
          0.3748   -0.1213   0.0864   0.3841      0.3814      -0.0124

*** Back-transformed Means (on the original scale) ***
* Using covariate mean values

SamGrF       1           0.4027
             2           0.5801
             3           0.5367
             4           0.6575
Gra_Slur     0.0         0.4326
             1.0         0.7328
             999.0       0.4524
Gra_Manu     0.0         0.6368
             1.0         0.3597
             999.0       0.6368
BeefonDairy  0.0000      0.3047
             1.0000      0.7666
Pigs         1           0.4366
             2           0.6500
FCattle      1           0.4398
             2           0.5314
             3           0.5268
             4           0.6764
Division     Central     0.5926
             Highland    0.4697
             Islands     0.5216
             North East  0.5949
             South East  0.5942
             South West  0.4969

Note: means are probabilities not expected values.

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term   Wald statistic  d.f.  Wald/d.f.  Chi-sq prob

* Sequentially adding terms to fixed model
SamGrF       26.27           3     8.76       <0.001
Gra_Slur     9.23            2     4.61       0.010
Gra_Manu     9.05            1     9.05       0.003
BeefonDairy  8.03            1     8.03       0.005
Pigs         5.15            1     5.15       0.023
FCattle      7.56            3     2.52       0.056
Max_Age      4.05            1     4.05       0.044
Division     5.56            5     1.11       0.352

* Dropping individual terms from full fixed model
SamGrF       16.43           3     5.48       <0.001
Gra_Slur     16.61           2     8.31       <0.001
Gra_Manu     9.94            1     9.94       0.002
BeefonDairy  9.91            1     9.91       0.002
Pigs         6.31            1     6.31       0.012
FCattle      9.79            3     3.26       0.020
Max_Age      4.26            1     4.26       0.039
Division     5.56            5     1.11       0.352

* Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases.

As in the univariate analysis, there is clearly no evidence of any variability which is explained by Animal Health Division (p=0.35). For completeness, the plot of the mean prevalences by animal health division, adjusted for the other explanatory factors, is as follows:

[Figure: Mean Farm Prevalence (0.00 to 1.00) by Animal Health Division: Central, Highland, Islands, North East, South East, South West]

Although Highland Division is still the lowest prevalence division, it is much less extreme; clearly, much of the between-division variability has been explained by the explanatory variables.

Considering Management class, fitting Manage_O gives rise to the following output (summarised):

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term   Wald statistic  d.f.  Wald/d.f.  Chi-sq prob

* Sequentially adding terms to fixed model
SamGrF       26.46           3     8.82       <0.001
Gra_Slur     9.43            2     4.72       0.009
Gra_Manu     9.17            1     9.17       0.002
BeefonDairy  8.02            1     8.02       0.005
Pigs         5.16            1     5.16       0.023
FCattle      7.79            3     2.60       0.051
Max_Age      4.01            1     4.01       0.045
Manage_O     1.49            3     0.50       0.685

* Dropping individual terms from full fixed model
SamGrF       17.22           3     5.74       <0.001
Gra_Slur     15.73           2     7.87       <0.001
Gra_Manu     10.51           1     10.51      0.001
BeefonDairy  10.32           1     10.32      0.001
Pigs         6.27            1     6.27       0.012
FCattle      10.37           3     3.46       0.016
Max_Age      2.63            1     2.63       0.105
Manage_O     1.49            3     0.50       0.685

* Message: chi-square distribution for Wald tests is an asymptotic approximation (i.e. for large samples) and underestimates the probabilities in other cases.

As seen in the earlier univariate analysis, there is clearly no evidence of any systematic effect due to Management Class.

Given the evidence for trend in the data with respect to Sampling Year, and our continued interest in Sampling Month, the first model to investigate temporal trend will fit a separate effect for each of the 27 months of the study:

GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
  LINK=logit; DISPERSION=1; FIXED=SamGrF + Gra_Slur + Gra_Manu+BeefonDairy + Pigs + FCattle + Max_Age+Month;\
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PTERMS=Month; PSE=*; MAXCYCLE=20; FMETHOD=all;\
  CADJUST=mean] VFarmPos; NBINOMIAL=N_Bin; MEANS=Means; VARMEANS=Vars

***** Generalised Linear Mixed Model Analysis *****

Method:            cf Schall (1991) Biometrika
Response variate:  VFarmPos
Distribution:      BINOMIAL
Link function:     LOGIT
Random model:      Farm
Fixed model:       Constant + ((((((SamGrF + Gra_Slur) + Gra_Manu) + BeefonDairy) + Pigs) + FCattle) + Max_Age) + Month

* Dispersion parameter fixed at value 1.000

******** Warning from GLMM: missing values generated in weights/working variate.
*** Monitoring information ***

Iteration  Gammas       Dispersion  Max change
1          0.2879       1.000       3.2307E+00
******** Warning from GLMM: missing values generated in weights/working variate.
2          0.000001000  1.000       2.8787E-01
******** Warning from GLMM: missing values generated in weights/working variate.
3          0.06742      1.000       6.7416E-02
******** Warning from GLMM: missing values generated in weights/working variate.
4          0.2668       1.000       1.9935E-01
******** Warning from GLMM: missing values generated in weights/working variate.
5          0.2801       1.000       1.3329E-02
******** Warning from GLMM: missing values generated in weights/working variate.
6          0.2854       1.000       5.3309E-03
******** Warning from GLMM: missing values generated in weights/working variate.
7          0.2862       1.000       7.9267E-04
******** Warning from GLMM: missing values generated in weights/working variate.
8          0.2863       1.000       6.5874E-05

*** Estimated Variance Components ***

Random term  Component  S.e.
Farm         0.286      0.310

*** Residual variance model ***

Term      Factor  Model(order)  Parameter  Estimate  S.e.
Dispersn          Identity      Sigma2     1.000     FIXED

*** Estimated Variance matrix for Variance Components ***

Farm      1  0.09603
Dispersn  2  0.00000  0.00000
             1        2

*** Table of effects for Month ***
Month  3.00    4.00    5.00    6.00    7.00    8.00    9.00    10.00
       0.000   1.220   1.004   0.507   0.903   1.066   0.838   0.184
Month  11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00
       1.334   0.567   -1.054  0.134   0.119   -0.460  0.852   -0.192
Month  19.00   20.00   21.00   22.00   23.00   24.00   25.00   26.00
       0.916   0.367   1.638   0.304   0.049   -9.310  -0.455  -1.563
Month  27.00   28.00   29.00
       0.272   -1.353  0.461
Standard error of differences: Average 3.266  Maximum 34.94  Minimum 0.4839
Average variance of differences: 90.87

*** Tables of means ***

*** Table of predicted means for Month ***
Month  3.00     4.00     5.00     6.00     7.00
       -0.1603  1.0599   0.8437   0.3464   0.7428
Month  8.00     9.00     10.00    11.00    12.00
       0.9057   0.6782   0.0238   1.1738   0.4065
Month  13.00    14.00    15.00    16.00    17.00
       -1.2148  -0.0262  -0.0415  -0.6201  0.6913
Month  18.00    19.00    20.00    21.00    22.00
       -0.3520  0.7553   0.2069   1.4780   0.1441
Month  23.00    24.00    25.00    26.00    27.00
       -0.1110  -9.4700  -0.6156  -1.7231  0.1118
Month  28.00    29.00
       -1.5135  0.3006

*** Back-transformed Means (on the original scale) ***

Month  3.00   0.4600
       4.00   0.7427
       5.00   0.6992
       6.00   0.5857
       7.00   0.6776
       8.00   0.7121
       9.00   0.6633
       10.00  0.5059
       11.00  0.7638
       12.00  0.6002
       13.00  0.2289
       14.00  0.4934
       15.00  0.4896
       16.00  0.3498
       17.00  0.6663
       18.00  0.4129
       19.00  0.6803
       20.00  0.5515
       21.00  0.8143
       22.00  0.5360
       23.00  0.4723
       24.00  0.0001
       25.00  0.3508
       26.00  0.1515
       27.00  0.5279
       28.00  0.1804
       29.00  0.5746

Note: means are probabilities not expected values.

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term   Wald statistic  d.f.  Wald/d.f.  Chi-sq prob

* Sequentially adding terms to fixed model
SamGrF       19.74           3     6.58       <0.001
Gra_Slur     8.08            2     4.04       0.018
Gra_Manu     7.51            1     7.51       0.006
BeefonDairy  5.57            1     5.57       0.018
Pigs         3.52            1     3.52       0.060
FCattle      5.77            3     1.92       0.123
Max_Age      3.11            1     3.11       0.078
Month        45.97           26    1.77       0.009

* Dropping individual terms from full fixed model
SamGrF       17.97           3     5.99       <0.001
Gra_Slur     16.57           2     8.29       <0.001
Gra_Manu     9.35            1     9.35       0.002
BeefonDairy  8.72            1     8.72       0.003
Pigs         6.65            1     6.65       0.010
FCattle      11.47           3     3.82       0.009
Max_Age      7.83            1     7.83       0.005
Month        45.97           26    1.77       0.009

Clearly, the month in which farms were sampled has a highly significant effect on the probability of a farm being identified as positive, even after allowing for the explanatory variables. The plot of mean prevalences by sampling month is as follows:

[Figure: Mean Farm Prevalence (0 to 1) by Month]

There is a clear visual downwards trend in prevalence as the survey progressed, along with a seasonal effect which is slightly apparent in the 1998 data, very apparent in the 1999 data, and likely to be present in the 2000 data. In addition, there are peculiarities in the pattern of observed prevalences. In each of 1998 and 1999, there is evidence of an appreciable drop in prevalence in June, and in each of 1999 and 2000, there is evidence of an appreciable drop in prevalence in April. It is possible to overemphasise such apparent correlations in time series data, but it is reasonable to assume that the observed prevalence could change according to month, in line with changes in herd management and diet.

Fitting Sampling Month at this level of detail does not help to define a picture of any seasonal effects on the prevalence. Any model with which it is hoped to achieve this objective must allow for the long term drop in prevalence and the month-to-month variability.
The simplest appropriate model is felt to be one which fits both Sampling Year and Month of Sample as fixed effects. It will not be possible to fit an interaction term. Since the data were collected in random clusters by week within Animal Health Division, it is theoretically possible that some of the drops and peaks might be associated with the particular Divisions which were sampled during that month. This is unlikely, given the lack of significance seen earlier for Animal Health Division as a factor, but to test for this, the model is refitted also including Animal Health Division:

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term   Wald statistic  d.f.  Wald/d.f.  Chi-sq prob

* Sequentially adding terms to fixed model
SamGrF       21.25           3     7.08       <0.001
Gra_Slur     7.85            2     3.93       0.020
Gra_Manu     7.70            1     7.70       0.006
BeefonDairy  6.45            1     6.45       0.011
Pigs         4.14            1     4.14       0.042
FCattle      5.99            3     2.00       0.112
Max_Age      3.15            1     3.15       0.076
Division     4.51            5     0.90       0.479
Sam_Year     14.19           2     7.09       <0.001
Sam_Mon      20.54           11    1.87       0.038

* Dropping individual terms from full fixed model
SamGrF       16.92           3     5.64       <0.001
Gra_Slur     16.91           2     8.46       <0.001
Gra_Manu     9.36            1     9.36       0.002
BeefonDairy  10.55           1     10.55      0.001
Pigs         6.98            1     6.98       0.008
FCattle      10.04           3     3.35       0.018
Max_Age      7.19            1     7.19       0.007
Division     4.28            5     0.86       0.510
Sam_Year     6.91            2     3.45       0.032
Sam_Mon      20.54           11    1.87       0.038

The summarised results show that Division is not significant as an effect, while Sampling Month is still significant.
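The ‘Chi-sq prob’ column in Wald tables of this kind is the upper tail probability of a chi-square distribution with the stated degrees of freedom. As a minimal sketch (in Python, not the GenStat code used for the analysis), it can be evaluated for integer degrees of freedom with the standard recurrence for the regularised upper incomplete gamma function. The function name chi2_upper_tail is hypothetical, and the checks use Wald statistics reported above for Division and Sam_Year.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for X ~ chi-square with a positive integer number of d.f."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("x must be non-negative and df a positive integer")
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Wald statistics and d.f. as reported in the table above
p_division = chi2_upper_tail(4.51, 5)    # sequentially added Division term
p_year_drop = chi2_upper_tail(6.91, 2)   # Sam_Year dropped from the full model
print(round(p_division, 3), round(p_year_drop, 3))
```

As the GenStat message notes, the chi-square reference distribution is only a large-sample approximation, so probabilities computed in this way are best read as indicative.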
Hence, the model is refitted without this extraneous variable:

GLMM [PRINT=model,monitor,components,vcovariance,means,backmeans,effects; DISTRIBUTION=binomial;\
  LINK=logit; DISPERSION=1; FIXED=SamGrF + Gra_Slur + Gra_Manu+BeefonDairy + Pigs + FCattle + Max_Age+Sam_Year+Sam_Mon;\
  RANDOM=Farm; CONSTANT=estimate; FACT=9; PSE=*; MAXCYCLE=20; FMETHOD=all; CADJUST=mean]\
  VFarmPos; NBINOMIAL=N_Bin

***** Generalised Linear Mixed Model Analysis *****

Method:            cf Schall (1991) Biometrika
Response variate:  VFarmPos
Distribution:      BINOMIAL
Link function:     LOGIT
Random model:      Farm
Fixed model:       Constant + (((((((SamGrF + Gra_Slur) + Gra_Manu) + BeefonDairy) + Pigs) + FCattle) + Max_Age) + Sam_Year) + Sam_Mon

* Dispersion parameter fixed at value 1.000

******** Warning from GLMM: missing values generated in weights/working variate.

*** Monitoring information ***

Iteration  Gammas       Dispersion  Max change
1          0.1777       1.000       3.3648E+00
******** Warning from GLMM: missing values generated in weights/working variate.
2          0.000001000  1.000       1.7774E-01
******** Warning from GLMM: missing values generated in weights/working variate.
3          0.000001000  1.000       0.0000E+00

*** Estimated Variance Components ***

Random term  Component  S.e.
Farm         0.000      0.283

*** Residual variance model ***

Term      Factor  Model(order)  Parameter  Estimate  S.e.
Dispersn          Identity      Sigma2     1.000     FIXED

*** Estimated Variance matrix for Variance Components ***

Farm      1  0.07987
Dispersn  2  0.00000  0.00000
             1        2

*** Table of effects for Constant ***
-3.180    Standard error: 0.5414

*** Table of effects for SamGrF ***
SamGrF  1       2       3       4
        0.0000  0.8074  0.7204  1.1645
Standard error of differences: Average 0.2488  Maximum 0.2693  Minimum 0.2278
Average variance of differences: 0.06224

*** Table of effects for Gra_Slur ***
Gra_Slur  0.0     1.0     999.0
          0.0000  1.3087  0.6259
Standard error of differences: Average 0.3242  Maximum 0.3755  Minimum 0.2704
Average variance of differences: 0.1069

*** Table of effects for Gra_Manu ***
Gra_Manu  0.0     1.0      999.0
          0.0000  -1.1917  0.0000
Standard error of differences: 0.3676

*** Table of effects for BeefonDairy ***
BeefonDairy  0.0000  1.0000
             0.000   2.206
Standard error of differences: 0.6646

*** Table of effects for Pigs ***
Pigs  1       2
      0.0000  1.0280
Standard error of differences: 0.3655

*** Table of effects for FCattle ***
FCattle  1       2       3       4
         0.0000  0.3878  0.3844  1.1158
Standard error of differences: Average 0.2810  Maximum 0.3371  Minimum 0.2135
Average variance of differences: 0.08154

*** Table of effects for Max_Age ***
-0.04357    Standard error: 0.016086

*** Table of effects for Sam_Year ***
Sam_Year  1998    1999     2000
          0.0000  -0.4249  -0.7956
Standard error of differences: Average 0.2577  Maximum 0.3071  Minimum 0.2076
Average variance of differences: 0.06806

*** Table of effects for Sam_Mon ***
Sam_Mon  Jan     Feb     Mar     Apr     May     Jun     Jul     Aug
         0.0000  0.1722  0.8812  0.2479  1.2634  0.4584  1.1696  1.0222
Sam_Mon  Sep     Oct     Nov     Dec
         1.2757  0.5800  1.1474  0.2218
Standard error of differences: Average 0.4615  Maximum 0.5939  Minimum 0.3495
Average variance of differences: 0.2163

**** G5W0020 **** Warning (Code VC 19). Statement 268 in Procedure GLMM
Command: VKEEP #PFORM; MEANS=MEANS[]; VARMEANS=VARMEANS[]
Table/sed matrix not available for mean effects of covariates
Table of mean effects cannot be saved for term Max_Age as it is a variate/covariate

*** Tables of means ***
* Using covariate mean values

*** Table of predicted means for SamGrF ***
SamGrF  1        2       3       4
        -0.5474  0.2600  0.1730  0.6171

*** Table of predicted means for Gra_Slur ***
Gra_Slur  0.0      1.0     999.0
          -0.5192  0.7895  0.1067

*** Table of predicted means for Gra_Manu ***
Gra_Manu  0.0     1.0      999.0
          0.5229  -0.6688  0.5229

*** Table of predicted means for BeefonDairy ***
BeefonDairy  0.0000  1.0000
             -0.977  1.228

*** Table of predicted means for Pigs ***
Pigs  1        2
      -0.3883  0.6397

*** Table of predicted means for FCattle ***
FCattle  1        2       3       4
         -0.3463  0.0415  0.0381  0.7694

*** Table of predicted means for Sam_Year ***
Sam_Year  1998    1999    2000
          0.5325  0.1076  -0.2630

*** Table of predicted means for Sam_Mon ***
Sam_Mon  Jan      Feb      Mar     Apr      May
         -0.5776  -0.4054  0.3036  -0.3298  0.6857
Sam_Mon  Jun      Jul      Aug     Sep      Oct
         -0.1192  0.5920   0.4446  0.6980   0.0023
Sam_Mon  Nov      Dec
         0.5698   -0.3558

*** Back-transformed Means (on the original scale) ***
* Using covariate mean values

SamGrF       1       0.3665
             2       0.5646
             3       0.5431
             4       0.6496
Gra_Slur     0.0     0.3730
             1.0     0.6877
             999.0   0.5267
Gra_Manu     0.0     0.6278
             1.0     0.3388
             999.0   0.6278
BeefonDairy  0.0000  0.2735
             1.0000  0.7736
Pigs         1       0.4041
             2       0.6547
FCattle      1       0.4143
             2       0.5104
             3       0.5095
             4       0.6834
Sam_Year     1998    0.6301
             1999    0.5269
             2000    0.4346
Sam_Mon      Jan     0.3595
             Feb     0.4000
             Mar     0.5753
             Apr     0.4183
             May     0.6650
             Jun     0.4702
             Jul     0.6438
             Aug     0.6094
             Sep     0.6678
             Oct     0.5006
             Nov     0.6387
             Dec     0.4120

Note: means are probabilities not expected values.

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term   Wald statistic  d.f.  Wald/d.f.  Chi-sq prob

* Sequentially adding terms to fixed model
SamGrF       23.99           3     8.00       <0.001
Gra_Slur     8.78            2     4.39       0.012
Gra_Manu     8.64            1     8.64       0.003
BeefonDairy  6.90            1     6.90       0.009
Pigs         4.40            1     4.40       0.036
FCattle      6.56            3     2.19       0.087
Max_Age      3.47            1     3.47       0.063
Sam_Year     16.00           2     8.00       <0.001
Sam_Mon      22.04           11    2.00       0.024

* Dropping individual terms from full fixed model
SamGrF       19.06           3     6.35       <0.001
Gra_Slur     18.22           2     9.11       <0.001
Gra_Manu     10.51           1     10.51      0.001
BeefonDairy  11.01           1     11.01      <0.001
Pigs         7.91            1     7.91       0.005
FCattle      11.86           3     3.95       0.008
Max_Age      7.33            1     7.33       0.007
Sam_Year     7.25            2     3.63       0.027
Sam_Mon      22.04           11    2.00       0.024

Both Month and Year of Sampling are found to have a statistically significant influence on the probability of a farm being classed as positive for shedding. The inclusion of these structural variables has a negligible effect on the significances estimated for the explanatory factors.

Reviewing the effect of Sampling Year, the estimated mean prevalences for the three years of the study, adjusted for Sampling Month effects and all the explanatory factors, are:

Year    Mean Farm Prevalence
1998    0.63
1999    0.53
2000    0.43

Plotting the mean prevalence with the associated 95% confidence intervals gives:

[Figure: Mean Farm Prevalence (0.00 to 1.00) by Year: 1998, 1999, 2000]

The nature of the trend is clear. There is a year-on-year drop in prevalence, which is statistically significant overall (p=0.03). The drop from 1998 to 1999 exhibits a mean change of -0.425, with a standard error of 0.208. The associated t-statistic equals 2.05, with a p-value of 0.04. The drop from 1999 to 2000 is not statistically significant (change=-0.37, se=0.26, t=1.44, p=0.15).
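The yearly mean prevalences quoted above are back-transformations of predicted means on the logit scale. A minimal sketch of that step in Python (not the GenStat code used in the analysis), assuming the logit-scale predicted means 0.5325, 0.1076 and -0.2630 and the year effects -0.4249 and -0.7956 reported for Sam_Year in the model output; the variable and function names are illustrative.

```python
import math


def inv_logit(eta):
    """Back-transform a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Predicted means for Sam_Year on the logit scale, as reported in the output
year_logit_means = {1998: 0.5325, 1999: 0.1076, 2000: -0.2630}
year_prevalence = {year: round(inv_logit(eta), 2)
                   for year, eta in year_logit_means.items()}

# Year effects relative to 1998; their difference is the 1999 to 2000 change
effect_1999, effect_2000 = -0.4249, -0.7956
drop_1999_2000 = effect_2000 - effect_1999

print(year_prevalence)
print(round(drop_1999_2000, 2))
```

Because the back-transformation is non-linear, equal steps on the logit scale do not correspond to equal steps in prevalence, which is why comparisons between years are tested on the logit scale.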
The nature of the trend is identical to that seen in the analysis involving only year and month, but the estimated effects are much more significant for 1998/1999, presumably since much of the extraneous noise in the initial analysis has been explained by the explanatory variables in the multi-factor model, and less significant for 1999/2000, presumably since much of the effect in 2000 has been explained by other explanatory factors which were strongly unbalanced in the (abbreviated) sampling year 2000.

Reviewing the effect of Sampling Month, the estimated mean prevalences for each month of the year, adjusted for Sampling Year effects and all the explanatory factors, are:

Month    Mean Farm Prevalence
Jan      0.36
Feb      0.40
Mar      0.58
Apr      0.42
May      0.67
Jun      0.47
Jul      0.64
Aug      0.61
Sep      0.67
Oct      0.50
Nov      0.64
Dec      0.41

A clearer picture is provided by plotting the mean prevalence with the associated 95% confidence intervals, giving:

[Figure: Mean Farm Prevalence (0.00 to 1.00) by Month of Sampling, Jan to Dec]

There appears to be a clear seasonal cycle in prevalence, with higher values in late Spring and Summer, and lower values in December to February. However, over and above this, there is evidence of other monthly effects occurring against the cycle, perhaps most blatantly in June, and probably in March, April and November. Again, the nature of the month-to-month effect is unchanged relative to the initial analysis involving only month and year, but the estimated effects exhibit a greater significance, presumably due to the greater explanatory value of the multi-factor model.

It is tempting to consider that, previous evidence notwithstanding, the Sampling Month effect might be associated with Housing status, as was the within-farm prevalence on positive farms. To test this hypothesis, the model is refitted, including Housed as a further explanatory factor.
The (summarised) results are as follows:

VDISPLAY [PRINT=Wald]

*** Wald tests for fixed effects ***

Fixed term   Wald statistic  d.f.  Wald/d.f.  Chi-sq prob

* Sequentially adding terms to fixed model
SamGrF       24.20           3     8.07       <0.001
Gra_Slur     8.46            2     4.23       0.015
Gra_Manu     8.77            1     8.77       0.003
BeefonDairy  6.90            1     6.90       0.009
Pigs         4.37            1     4.37       0.037
FCattle      6.40            3     2.13       0.094
Max_Age      3.45            1     3.45       0.063
Sam_Year     15.98           2     7.99       <0.001
Housed       0.55            1     0.55       0.460
Sam_Mon      21.94           11    1.99       0.025

* Dropping individual terms from full fixed model
SamGrF       19.25           3     6.42       <0.001
Gra_Slur     16.54           2     8.27       <0.001
Gra_Manu     10.65           1     10.65      0.001
BeefonDairy  11.02           1     11.02      <0.001
Pigs         7.87            1     7.87       0.005
FCattle      11.64           3     3.88       0.009
Max_Age      7.24            1     7.24       0.007
Sam_Year     7.32            2     3.66       0.026
Housed       0.53            1     0.53       0.467
Sam_Mon      21.94           11    1.99       0.025

Housed remains entirely non-significant as an explanatory factor, and Sampling Month and Sampling Year are unchanged in terms of overall significance levels.

Hence, there is clear evidence of a temporal structure in the data, both over the long term (a significant decrease in the proportion of farms detected as positive over the lifetime of the project) and over the short term (a significant month-to-month variability, unexplained by the explanatory variables fitted in the multi-factor model).

Appendix 1: Variates and Factors Collected by the Farm Questionnaire.

Each entry gives the Factor/Variable, followed by Comments and then Levels (‘Variate’ denotes a numerical variate rather than a factor).

Manage_O: Observed management type. Levels: Beef, Dairy, Other, Mixed.
Division: Animal Health Division, with one division divided into Highlands and Islands. Levels: Central, Highlands, Islands, NE, SE, SW.
Sam_Month: Month in which samples were collected. Levels: January-December.
Sample: Type of sampling scheme. Levels: Faecal Pat, Rectal.
Sam_Year: Year in which samples were collected. Levels: 1998, 1999, 2000.
Sampler: Person carrying out sampling. Levels: H, F (codes).
N_F_Cattle: Number of finishing cattle on farm. Variate.
FCattle: Number of finishing cattle, categorised into groups. Levels: <50, 50-99, 100-199, 200+.
N_Groups: Number of management groups of cattle on farm. Variate.
GroupsCat: Number of management groups, categorised into groups. Levels: 1, 2-5, 6-9, 10+.
N_Sam_Gr: Number of finishing cattle in sampling group. Variate.
Min_Age: Minimum age of animals in sampling group. Variate.
Max_Age: Maximum age of animals in sampling group. Variate.
Source: Farm policy for replacement cattle. Levels: Buy In, Breeding Only, Both.
NewSource: Restructuring of 'Source' into open and closed farms. Levels: Open, Closed.
Breed: Breed of cattle in sampling group. Levels: Beef (Suckler Beef), Dairy Beef, Dairy (Bull Beef), Combinations of these.
Housed: Whether sampling group are housed or unhoused. Levels: Housed, Unhoused.
Housing: For housed animals only: type of housing. Levels: Court/Straw Yard, Slats, Byre, Other.
TDHouse: Number of months for which animals have been in current housed state. Variate.
Rec_Move: Whether or not the sampling group have been moved in the 4 weeks prior to sampling. Levels: Yes, No.
SupFeed: For unhoused animals only: whether the sampling group is fed supplements. Levels: Yes, No.
RecDFeed: Whether or not the sampling group have had a change in diet in the 4 weeks prior to sampling. Levels: Yes, No.
Forage: For housed animals only: whether the sampling group is fed forage. Levels: Yes, No.
Silage: For housed animals only: whether the sampling group is fed silage. Levels: Yes, No.
Concentrate: For housed animals only: whether the sampling group is fed concentrate. Levels: Yes, No.
Sil_Home: For housed animals fed silage only: whether the farm produces silage. Levels: Yes, No.
Sil_Manure: For housed animals fed farm-produced silage only: whether the farm spreads manure on the silage fields. Levels: Yes, No.
Sil_Slurry: For housed animals fed farm-produced silage only: whether the farm spreads slurry on the silage fields. Levels: Yes, No.
Sil_Sewage: For housed animals fed farm-produced silage only: whether the farm spreads sewage on the silage fields. Levels: Yes, No.
Sil_Geece: For housed animals fed farm-produced silage only: whether geese have been observed on the silage fields. Levels: Yes, No.
Sil_Gulls: For housed animals fed farm-produced silage only: whether gulls have been observed on the silage fields. Levels: Yes, No.
Hay: Whether the farm produces hay. Levels: Yes, No.
Hay_Manure: If the farm produces hay only: whether the farm spreads manure on the hay fields. Levels: Yes, No.
Hay_Slurry: If the farm produces hay only: whether the farm spreads slurry on the hay fields. Levels: Yes, No.
Hay_Sewage: If the farm produces hay only: whether the farm spreads sewage on the hay fields. Levels: Yes, No.
Hay_Geese: If the farm produces hay only: whether geese have been observed on the hay fields. Levels: Yes, No.
Hay_Gulls: If the farm produces hay only: whether gulls have been observed on the hay fields. Levels: Yes, No.
Grass_Manure: Whether the farm spreads manure on pasture. Levels: Yes, No.
Grass_Slurry: Whether the farm spreads slurry on pasture. Levels: Yes, No.
Grass_Sewage: Whether the farm spreads sewage on pasture. Levels: Yes, No.
Grass_Geece: Whether geese have been observed on pasture. Levels: Yes, No.
Grass_Gulls: Whether gulls have been observed on pasture. Levels: Yes, No.
N_Cattle: Number of cattle on farm other than the finishing group. Variate.
Cattle: Number of cattle on farm other than the finishing group, categorised into a factor. Levels: <100, 100-499, 500-899, 900+.
N_Sheep: Number of sheep on farm. Variate.
Sheep: Absence/presence of sheep on farm. Levels: Yes, No.
N_Goats: Number of goats on farm. Variate.
Goats: Absence/presence of goats on farm. Levels: Yes, No.
N_Horses: Number of horses on farm. Variate.
N_Pigs: Number of pigs on farm. Variate.
Pigs: Absence/presence of pigs on farm. Levels: Yes, No.
N_Chickens: Number of chickens on farm. Variate.
Chickens: Absence/presence of chickens on farm. Levels: Yes, No.
N_Deer: Number of deer on farm. Variate.
Deer: Absence/presence of deer on farm. Levels: Yes, No.
Mains: Whether sampling group is watered with a mains supply. Levels: Yes, No.
Private: Whether sampling group is watered with a private supply. Levels: Yes, No.
Natural: Whether sampling group is watered with a natural supply. Levels: Yes, No.
WaterCon: Whether water has been contaminated within the 12 months prior to sampling. Levels: Yes, No.
WaterCT: Possible sources of contamination. Levels: Animals Upstream, Septic Tank, Midden, Combinations of these.
Want2Know: Whether farmer wishes to know results of sampling. Levels: Yes, No.
Visit2: Whether farmer is willing to have a further set of samples collected. Levels: Yes, No.
LabOperator: Lab operator responsible for assaying faeces samples. Levels: S, D, H (codes).
BeefonDairy: Whether farm is classed as a dairy farm with suckler beef cattle. Levels: Yes, No.