Sunday, July 7, 2013

Does Neg- vs. Aux-Contraction Vary Geographically In England? A Miniature Study

On Friday Sam Kirkham and I met some sixth-formers [high school students] and gave them an introduction to our department and sociolinguistics in general. We decided to take advantage of the opportunity and use the students as unpaid research assistants. We designed a small questionnaire that they could give to each other, and to people waiting for a bus or sitting on the [unusually] sunny steps of Alexandra Square. To illustrate north-south differences, we included a few questions about TRAP-BATH and FOOT-STRUT. We also had this item:

This alternation, which has all but disappeared from US English, involves a choice between so-called negative contraction (I haven't been) and auxiliary [or operator] contraction (I've not been). The auxiliary in question can be is, are, have, has, had, will, or would (Varela Pérez 2013: 257); in this study we are looking at a single instance with have, an environment that favors negative contraction, compared to is or are.

Peter Trudgill was the first sociolinguist to suggest a geographic correlation for this variable, claiming that auxiliary contraction increases "the further north one goes" (1978: 13). However, his early proclamations of this sort have not always survived later scrutiny. As another example, Hughes & Trudgill (1979: 25) stated that the particle verb alternation (pour out the tea vs. pour the tea out) also patterned along a north-south continuum, but this was not at all borne out in an experimental study involving 145 UK (and Irish) speakers (Haddican & Johnson 2012).

Regarding contraction, studies have indeed found either no clear geographic correlation (Anderwald 2002, Tagliamonte & Smith 2002), or a weak one in the opposite direction, meaning that southerners may slightly prefer auxiliary contraction (Gasparrini 2001). However, nothing approaching a dialectological study of this variable has ever been conducted. For example, the eight places studied by Tagliamonte & Smith are scattered and to some extent intentionally unrepresentative of UK speech. In the present miniature study, we will achieve far less depth about each location, but a wider geographical coverage (though certainly unrepresentative in its own way).

We obtained 52 responses to the question on contraction, associated with 36 places of origin in England. The distribution was as follows: 23 people said "I haven't been to Ireland" was "much better", 16 said it was "slightly better", 7 said the two alternatives were "equally good", 3 said "I've not been to Ireland" was "slightly better", and 3 said it was "much better". This overall strong preference for NEG-contraction with have is in line with the literature. A simple way to address the geographical question is to divide the responses into three categories - South, Midlands, and North - and compare the responses in each group. Although the measurement scale of the question is ordinal, we will assume linearity and assign numerical scores, ranging from 0 for judging NEG-contraction "much better", to 4 for judging AUX-contraction "much better".
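The scoring scheme is easy to make concrete. The following is a minimal sketch in Python (not the R used for the analysis in this post); the variable names are illustrative, and the only data are the five response counts reported above.

```python
# Response counts from the paragraph above, keyed by the assumed linear score:
# 0 = NEG-contraction "much better" ... 4 = AUX-contraction "much better".
counts = {0: 23, 1: 16, 2: 7, 3: 3, 4: 3}

n_responses = sum(counts.values())

# Mean score under the linearity assumption (equal spacing between categories).
mean_score = sum(score * n for score, n in counts.items()) / n_responses

# Share of respondents on each side of the "equally good" midpoint.
share_neg = (counts[0] + counts[1]) / n_responses
share_aux = (counts[3] + counts[4]) / n_responses

print(f"{n_responses} responses, mean score {mean_score:.2f}")
print(f"prefer NEG: {share_neg:.0%}, prefer AUX: {share_aux:.0%}")
```

An overall mean below 1 on a 0-to-4 scale is another way of seeing the strong preference for NEG-contraction.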

Using Wikipedia's traditional definition of the Midlands to divide the regions gives average scores that are, at the very least, suggestive of a difference in line with Trudgill's original claim:

South (11 responses): 0.55
Midlands (8 responses): 0.63
North (33 responses): 1.21
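As a quick consistency check, the regional means can be converted back into score totals and pooled. This is a sketch in Python rather than R, and it assumes the means listed above were rounded to two decimal places; the names are illustrative.

```python
# Regional sample sizes and rounded mean scores as listed above.
regions = {"South": (11, 0.55), "Midlands": (8, 0.63), "North": (33, 1.21)}

# Score totals must be whole numbers, so rounding n * mean recovers them
# (assuming the listed means were rounded to two decimals).
totals = {name: round(n * mean) for name, (n, mean) in regions.items()}

# Pool South and Midlands by adding totals, not by averaging the two means.
n_south_mid = regions["South"][0] + regions["Midlands"][0]
mean_south_mid = (totals["South"] + totals["Midlands"]) / n_south_mid
mean_north = totals["North"] / regions["North"][0]

print(totals, "sum of scores:", sum(totals.values()))
print(f"South + Midlands: {mean_south_mid:.2f}, North: {mean_north:.2f}")
```

The recovered totals add up to the same sum of scores implied by the overall response distribution, so the regional figures and the overall counts agree with each other.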

If we combine the very similar South and Midlands regions and contrast their data with that from the North, it is initially unclear just how much evidence we have for a geographical difference. While a conventional t.test() returns a p-value of .03, the non-parametric wilcox.test() (or Mann-Whitney test, more appropriate here because the response is not only ordinal but quite skewed) gives p = .12, which would not be interpreted as statistically significant. However, we should also consider that none of the 19 respondents from the South and Midlands expressed a positive preference for AUX-contraction, while 6 of 33 Northern subjects did so. While dispreferred everywhere, AUX-contraction appears to be more acceptable in the North.
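One way to put a number on the "none of 19 versus 6 of 33" observation is a one-sided, Fisher-style calculation. This was not part of the original analysis; it is only a sketch, in Python rather than R, of how one might quantify it, using nothing but the counts given above.

```python
from math import comb

n_north, n_south_mid = 33, 19   # respondents per group, from the text
n_aux = 6                       # respondents who preferred AUX-contraction

# Hypergeometric probability that all AUX-preferring respondents would come
# from the North if preference were unrelated to region. This is the one-sided
# tail of Fisher's exact test for the most extreme table (not in the original
# analysis).
p_all_north = comb(n_north, n_aux) / comb(n_north + n_south_mid, n_aux)

print(f"P(all {n_aux} in the North by chance) = {p_all_north:.3f}")
```

With these counts the probability comes out a little above .05, so this calculation, too, is suggestive rather than conclusive.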

It is rarely a good idea to reduce a continuous variable to a set of discrete categories, and collapsing 36 distinct places into three regions is no exception, even though the division between North, Midlands and South has considerable historical precedent (the areas correspond roughly to the Northumbrian, Mercian, and Saxon kingdoms - and dialects - of the Old English period). If AUX-contraction really increases in a continuous manner "the further north one goes", then an analysis that treats latitude as a continuous variable will be more successful in revealing the effect. Incorporating the dimension of longitude as well, though it makes the statistics more complex, is potentially even more revealing.

The place names given by the respondents (usually cities or towns, sometimes counties) were entered into an online geocoder to obtain their latitudes and longitudes. There are many R packages (as well as other software) that could produce a map of this data; some options are described here. I found an outline map of England here, intended for use with the sp package, but I plotted it with ordinary 'base' R graphics (since I have yet to learn ggplot2, I do not know how to produce maps like this!). The only commands used for this map are plot(), points(), cluster.overplot() to separate the responses from the same place, and legend().

A basic spatial statistic called Moran's I is often run to establish whether the data show global spatial autocorrelation. Like any correlation, Moran's I can range from -1 to +1. A value of 0 would reflect a random spatial distribution of high and low values (dark and light points). A positive value means that similar values tend to cluster together, while a negative value means that high and low values are interspersed more than would be expected by chance (imagine the black and white squares on a chessboard). The statistic depends on a matrix of spatial weights; for example, all points within a certain distance could be considered neighbors, or the closest k points regardless of distance. Other, more gradual criteria can also be applied (see here and also Grieve et al. 2011).
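To make the definition concrete, here is a minimal sketch of Moran's I with k-nearest-neighbor weights. It is written in plain Python, not the R routines used for this post, and it makes some assumptions: the weights are binary and not row-standardized, distances are straight-line distances between coordinates, and the toy coordinates and values are invented purely for illustration. It computes only the statistic, not a p-value.

```python
import math

def knn_weights(coords, k):
    """Binary weights: w[i][j] = 1 if j is one of the k nearest points to i."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        # Sort the other points by distance to i (ties broken by index).
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in dists[:k]:
            w[i][j] = 1.0
    return w

def morans_i(values, w):
    """Moran's I = (n / W) * sum_ij w_ij d_i d_j / sum_i d_i^2, d = deviation."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_w = sum(sum(row) for row in w)
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / total_w) * (num / den)

# Invented toy data: two tight groups of points with identical scores within
# each group (clustered), versus evenly spaced points with alternating scores
# (the one-dimensional analogue of a chessboard).
clustered_coords = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
clustered_values = [0, 0, 0, 4, 4, 4]
alternating_coords = [(x, 0) for x in range(6)]
alternating_values = [0, 4, 0, 4, 0, 4]

i_clustered = morans_i(clustered_values, knn_weights(clustered_coords, 2))
i_alternating = morans_i(alternating_values, knn_weights(alternating_coords, 2))
print(f"clustered: {i_clustered:.2f}, alternating: {i_alternating:.2f}")
```

The clustered toy data give a positive value and the alternating data a negative one, matching the interpretation of the sign described above.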

I decided, somewhat arbitrarily, to use 5-nearest-neighbors as the threshold. If responses, on average, are more similar to their 5 nearest neighbors than to responses further away, then Moran's I should be positive. In fact, Moran's I is -0.102, which is associated with a p-value of .27. This means that the responses favoring AUX-contraction and NEG-contraction are not clustered, but in fact almost random in their spatial patterning. This conclusion is disappointing! On the bright side, a lack of spatial autocorrelation means that an ordinary regression can be performed with less fear of error. But an lm() model with latitude as a predictor is also not statistically significant (p = .27). Of course, such a model implies a gradual effect of latitude, which to some extent goes against the idea of coherent dialect regions.

If a linguistic feature has wide variability in every community, then it is possible that global spatial autocorrelation will be low - especially with a small number of respondents - even though an overall geographical difference may exist. As this is a miniature study, we cannot pursue the debate further but can only note that if a small amount of crude data collected one afternoon in Lancaster can provide this much information, a larger collection effort could likely settle the question once and for all as to whether the preferred means of contraction has a geographic component. We will conclude by using the method of generalized additive modeling (mgcv package) to create a smoothed map of contraction preference.

Based on this plot, we would think that contraction varies geographically! But geographic patterns, like other types, can certainly arise by chance. To settle this question would require a dialectological investigation - that is, one conducted at many places. But the data collected on Friday, in a few hours, by sixth form students, restores some faith in Peter Trudgill's conjecture, which may have been dismissed too hastily by linguists.


Anderwald, Lieselotte. 2002. Negation in Non-standard British English: Gaps, Regularizations, Asymmetries. London: Routledge.

Gasparrini, Désirée. 2001. It isn’t, it is not or it’s not? Regional Differences in Contraction in Spoken British English. Master’s thesis. University of Zürich.

Grieve, Jack, Dirk Speelman and Dirk Geeraerts. 2011. A statistical method for the identification and aggregation of regional linguistic variation. Language Variation and Change 23: 193-221.

Haddican, Bill and Daniel Ezra Johnson. 2012. Effects on the Particle Verb Alternation across English Dialects. University of Pennsylvania Working Papers in Linguistics 18(2): Article 5.

Hughes, Arthur and Peter Trudgill. 1979. English Accents and Dialects: An Introduction to Social and Regional Varieties of British English. London: Edward Arnold.

Tagliamonte, Sali and Jennifer Smith. 2002. 'Either it isn’t or it’s not': NEG/AUX Contraction in British Dialects. English World-Wide 23(2): 251-281.

Trudgill, Peter. 1978. Sociolinguistic patterns in British English. London: Edward Arnold.

Varela Pérez, José Ramón. 2013. Operator and negative contraction in spoken British English: a change in progress. In Bas Aarts, Joanne Close, and Geoffrey Leech (eds.), The Verb Phrase in English: Investigating Recent Language Change With Corpora. Cambridge University Press. 256-285.


  1. The alternation seems pretty common to me here in Georgia.

  2. "I've not" sounds Irish to me.