Sunday, December 7, 2014

Sour Grapes And Sweet Mixed Models: Avoiding The Goldilocks Problem in Publishing


Longtime readers of this blog, and people who look in the archives (try 2010), know that I sometimes write poems. I think they're good, and my kinder readers, or those who know less about academic poetry -- and I certainly belong in both categories -- ask if I've tried to publish them. And I say I would if I could, but I know I can't. And also, I already have.

Before Gutenberg, people basically passed notes. It was democratic, it was slow, it could be subversive, but it wasn't easy to connect across long distances or outside of one's social networks. The invention of printing allowed a much wider and more rapid dissemination of all kinds of writing. At the same time, it concentrated power in the hands of publishers, who decided what to print and what not to. And made money off it.

Today, publishers still make these decisions, and they still make money off it, but they exist for a different reason. No longer necessary for getting your work "out there", publishing (as opposed to self-publishing) gives the writer prestige and hopefully career advancement, and it helps readers decide what is worth reading (or at least what to cite).

In today's world, where there is a lot out there even on obscure topics, and an overwhelming amount on popular ones, it is certainly necessary for judgments and decisions to be made. No one can read everything, and not everything is worth reading. However, we should be able to make these judgments and decisions for ourselves. In Anglo-American legal terms, we don't need, or want, "prior restraint".

Publishing is outdated and will eventually disappear. Pre-publication peer review will fall along with it and, in my view, be replaced with some version of post-self-publication peer review. But many more thoughtful people than me are out there debating what this new world might look like.

The revolution will be bad news for publishing houses, obviously, and will also pose problems for anyone wanting to evaluate academics the way they are now -- and thus for academics who need to be evaluated, promoted, given tenure, etc. Without getting lost in speculation about how these institutions might work in a post-publishing world, we can anticipate that they will work better than they do now, when they depend so heavily on a flawed system of peer review and publishing.

At a minimum, an author asks one thing from the current system: "If my article be rejected, Lord, it is thy will, but please let not a blatantly shittier article be published in the same journal within six months." But the author's perspective is not only biased, but limited. We may think our article is good (or know we really need to publish it), but we are not interested in every other type of thing the journal might legitimately publish.

And while "not interested" may indeed sometimes mean "a waste of time for me to read", it doesn't necessarily mean "a waste of paper for the journal to print", because other readers with other interests are out there. (I'm setting aside the important fact that there is no paper, that any space restrictions have become intentional, not physical.)

The problem with pre-publication peer review is not that there are too many submissions of varying quality coming from too many different perspectives (where the relationship between quality and perspective is complicated), although there are, and this will make editors' jobs difficult as long as journals exist. The problem is that given this quantity and diversity, pre-publication peer review doesn't (and can't) employ nearly enough reviewers.

Potential reviewers will be more or less familiar with the topic of a submission, and (especially if they are more familiar with it) more or less already in agreement with what it says. Less familiar reviewers may overestimate the originality of the submission, while more familiar reviewers may overestimate the knowledge of the intended audience. There may be reviewers whose own work is praised or developed further in the submission, and others whose work is criticized.

Even when these reviewers all provide useful feedback, a publishing decision must be made based on their recommendations. And the more inconsistency there is between reviewers, or any raters, the more opinions you need to achieve a reliable outcome. Considering how much inter-reviewer variation exists in our field, I think two or three reviewers, no matter how carefully chosen, are not always enough. So it doesn't surprise me that certain things "fall through the cracks", while others do the opposite (whatever the metaphor would be). Luckily, in the future, we will all be reviewers, so this problem will be eliminated along with the journals and publishers.

Some years ago, having published an article pointing out one advantage of using mixed-effects models on sociolinguistic data, when speakers vary individually in their use of a variable -- namely the reduction of absurd levels of Type I error for between-speaker predictors (Johnson 2009) -- I was invited to say more on the topic at a panel at NWAV 38 in October 2009.

In this presentation, I reiterated the point about Type I error, and discussed three other advantages of mixed models: better estimation of the effects of between-speaker predictors (when some speakers have more data than others), better estimation of the effects of within-speaker predictors (when those predictors are not balanced across speakers), and better estimation of the effects of within-speaker predictors (in general, in logistic regression).

I wrote this up and submitted it to a journal (8/10), revised and resubmitted it twice (4/11, 2/12), and then submitted and re-submitted it to another journal (7/12, 4/13). While this process has greatly improved the manuscript, it still tries to make the same points as the NWAV presentation. While the reviewers did not challenge these points directly, they raised valid concerns about the manuscript's length (too much of it) and organization (too little of it), and about its appeal to the various potential readers of the respective journals.

For example, some judged it to be inaccessible, while for others, it read too much like a textbook. Another point of disagreement related to the value of using simulated data to illustrate statistical points, which I had done in Johnson 2009, and more recently here, here, here, here, and here.

When it came to the core content, the reviewers' opinions were equally divergent. I was told that fixed-effects models could be as good as (or even better than) mixed-effects models: "too new!" I was told that the mixed models I focus on, which don't contain random slopes, were not good enough: "too old!" And I was told that the mixed models I discussed were just fine -- "just right?" -- but that everyone knows this already: "d'oh!"

The editors gave detailed and thoughtful recommendations, both general and specific, on how to revamp the manuscript, including trying to find a middle ground between these apparently contradictory perspectives. But even if the article were published in a journal, any reader would still find themselves falling into one (or more) of these camps. Or, more likely, having a unique perspective of their own.

My perspective is that this article could be useful to some readers, and that readers should be able to decide if it is for themselves. Clearly, not everyone already "knows" that we "should" use mixed models, since fixed-effects models are still seen in the wild. They rarely appear in the mixed-esque form devised by Paolillo (2013), but more often in their native (naive) incarnation, where any individual variation -- by speaker, by word, etc. -- is simply and majestically ignored. GoldVarb Bear made several appearances at NWAV 43.

Random intercepts are a stepping-stone, not an alternative, to random slopes (but see Bear, er, Barr et al. 2013). And even if the reduction of Type I error is the most important thing for some readers, the other advantages of random intercepts -- their effect on the regression coefficients themselves -- deserve to be known and discussed. To me, they are not very complicated to understand, but they are not obvious. Readers are welcome to disagree.
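For concreteness, here is what the distinction looks like in lme4 model syntax. This is only a sketch: the variable names (app, gender, style, speaker) and the data frame d are illustrative, not taken from any particular study.

    library(lme4)

    # Random intercept only: each speaker gets their own baseline rate,
    # but the effect of the within-speaker predictor (here, style) is
    # assumed to be the same for every speaker.
    m_intercept <- glmer(app ~ gender + style + (1 | speaker),
                         data = d, family = binomial)

    # Adding a random slope: speakers may also differ in how strongly
    # style affects them (the "maximal" structure of Barr et al. 2013).
    m_slope <- glmer(app ~ gender + style + (1 + style | speaker),
                     data = d, family = binomial)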

Of course, whether individual speakers and words actually have different rates of variation (let alone constraints on variation) is not a settled question in the first place. This article assumes (conservatively?) that they do, and I think most people would agree when it comes to speakers, while many would disagree when it comes to words. But a nice thing about mixed-effects models is that they work either way.

All of this is a way of introducing this document, and placing what is partially sour grapes in some kind of coherent frame. This article is distributed free of charge and without any restriction on its use. If it seems long, boring, and/or repetitive, you can skim it, and certainly don't print it! If something seems wrong, please let me know. And if it's helpful in your work, you can cite it if you like.

This particular article is certainly not a model of anything, just a few ideas, or observations, of a quantitative and methodological kind. But maybe we don't always need a publishing infrastructure to share our ideas -- or at least not if they're not our best ones.


References

Barr, Dale J., Roger Levy, Christoph Scheepers and Harry J. Tily. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68: 255–278.

Johnson, Daniel Ezra. 2009. Getting off the GoldVarb standard: Introducing Rbrul for mixed-effects variable rule analysis. Language and Linguistics Compass 3(1): 359-383.

Johnson, Daniel Ezra. 2014. Progress in regression: Why natural language data calls for mixed-effects models. Self-published manuscript.

Paolillo, John C. 2013. Individual effects in variation analysis: Model, software, and research design. Language Variation and Change 25(1): 89-118.

Wednesday, December 3, 2014

Are You Talkin' To ME? In defense of mixed-effects models


At NWAV 43 in Chicago, Joseph Roy and Stephen Levey presented a poster calling for "caution" in the use of mixed-effects models in situations where the data is highly unbalanced, especially if some of the random-effects groups (speakers) have only a small number of observations (tokens).

One of their findings involved a model where a certain factor received a very high factor weight, like .999, which pushed the other factor weights in the group well below .500. Although I have been unable to look at their data, and so can't determine what caused this to happen, it reminded me that sum-contrast coefficients or factor weights can only be interpreted relative to the other ones in the same group.

If a factor group contains the levels A, B, and C, an outlying coefficient for C does not affect the difference between A and B. This is much easier to see if the coefficients are expressed in log-odds units. In log-odds, it seems obvious that the difference between A: -1 and B: +1 is the same as the difference between A': -5 and B': -3. The difference in each case is 2 log-odds.

Expressed as factor weights -- A: .269, B: .731; A': .007, B': .047 -- this equivalence is obscured, to say the least. It is impossible to consistently describe the difference between any two factor weights, if there are three or more in the group. To put it mildly, this is one of the disadvantages of using factor weights for reporting the results of logistic regressions.
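The arithmetic is easy to check in R, where plogis() performs the usual conversion from a log-odds coefficient x to a factor weight, 1 / (1 + exp(-x)):

    log_odds <- c(A = -1, B = 1, Aprime = -5, Bprime = -3)
    round(plogis(log_odds), 3)
    #      A      B Aprime Bprime
    #  0.269  0.731  0.007  0.047

    # The log-odds differences are identical (2 in each case)...
    1 - (-1)
    -3 - (-5)

    # ...but the corresponding factor-weight differences are not:
    plogis(1) - plogis(-1)      # 0.462
    plogis(-3) - plogis(-5)     # 0.041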

Since factor weights (and the Varbrul program that produces them) have several other drawbacks, I am more interested in the (software-independent) question that Roy & Levey raise, about fitting mixed-effects models to unbalanced data. Even though handling unbalanced data is one of the main selling points of mixed models (Pinheiro & Bates 2000), Roy and Levey claim that such analyses "with less than 30-50 tokens per speaker, with at least 30-50 speakers, vastly overestimate variance", citing Moineddin et al. (2007).

However, Moineddin et al. actually only claim to find such an overestimate "when group size is small (e.g. 5)". In any case, the focus on group size points to the possibility that the small numbers of tokens for some speakers is the real issue, rather than the data imbalance itself.

Fixed-effects models like Varbrul's vastly underestimate speaker variance by not estimating it at all and assuming it to be zero. Therefore, they inflate the significance of between-speaker (social) factors. P-values associated with these factors are too low, increasing the rate of Type I error beyond the nominal 5% (this behavior is called "anti-conservative"). All things being equal, the more tokens there are per speaker, the worse the performance of a fixed-effects model will be (Johnson 2009).

With only 20 tokens per speaker, the advantage of the mixed-effects model can be small, but there is no sign that mixed models ever err in the opposite direction, by overestimating speaker variance -- at least, not in the balanced, simulated data sets of Johnson (2009). If they did, they would show p-values that are higher than they should be, resulting in Type I error rates below 5% (this behavior is called "conservative").

It is difficult to compare the performance of statistical models on real data samples (as Roy and Levey do for three Canadian English variables), because the true population parameters are never known. Simulations are a much better way to assess the consequences of a claim like this.

I simulated data from 20 "speakers" in two groups -- 10 "male", 10 "female" -- with a population gender effect of zero, and speaker effects normally distributed with a standard deviation of either zero (no individual-speaker effects), 0.1 log-odds (95% of speakers with input probabilities between .451 and .549), or 0.2 log-odds (95% of speakers between .403 and .597).
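Those probability ranges are easy to verify with plogis(): 95% of a normal distribution lies within about 1.96 standard deviations of its mean, so (a quick check, not part of the simulation itself):

    plogis(1.96 * 0.1 * c(-1, 1))   # 0.451 0.549
    plogis(1.96 * 0.2 * c(-1, 1))   # 0.403 0.597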

The average number of tokens per speaker (N_s) ranged from 5 to 100. The number of tokens per speaker was either balanced (all speakers have N_s tokens), imbalanced (N_s * rnorm(20, 1, 0.5)), or very imbalanced (N_s * rnorm(20, 1, 1)). Each speaker had at least one token and no speaker had more than three times the average number of tokens.

For each of these settings, 1000 datasets were generated and two models were fit to each dataset: a fixed-effects model with a predictor for gender (equivalent to the "cautious" Varbrul model that Roy & Levey implicitly recommend), and a mixed-effects (glmer) model with a predictor for gender and a random intercept for speaker. In each case, the drop1 function (a likelihood-ratio test) was used to calculate the Type I error rate -- the proportion of the 1000 models with p < .05 for gender. Because there is no real gender effect, if everything is working properly, this rate should always be 5%.
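For readers who would like to experiment, here is a minimal sketch of one cell of this simulation. It is not the exact code behind the figure below: the function and object names are mine, and the likelihood-ratio test is computed directly from the log-likelihoods rather than through drop1 (the p-values are the same test), but the logic follows the description above. It assumes the lme4 package.

    library(lme4)

    # Likelihood-ratio p-value for dropping one term (what drop1 reports)
    lrt_p <- function(m0, m1) {
      stat <- as.numeric(2 * (logLik(m1) - logLik(m0)))
      df   <- attr(logLik(m1), "df") - attr(logLik(m0), "df")
      pchisq(stat, df, lower.tail = FALSE)
    }

    simulate_once <- function(n_speakers = 20, mean_tokens = 50,
                              speaker_sd = 0.1, imbalance = 0) {
      gender <- rep(c("m", "f"), each = n_speakers / 2)   # true gender effect = 0
      sp_eff <- rnorm(n_speakers, 0, speaker_sd)          # speaker intercepts (log-odds)
      n_tok  <- round(mean_tokens * rnorm(n_speakers, 1, imbalance))
      n_tok  <- pmin(pmax(n_tok, 1), 3 * mean_tokens)     # 1 to 3x the average
      d <- data.frame(speaker = factor(rep(seq_len(n_speakers), n_tok)),
                      gender  = rep(gender, n_tok),
                      y       = rbinom(sum(n_tok), 1, plogis(rep(sp_eff, n_tok))))
      c(fixed = lrt_p(glm(y ~ 1, binomial, d), glm(y ~ gender, binomial, d)),
        mixed = lrt_p(glmer(y ~ (1 | speaker), data = d, family = binomial),
                      glmer(y ~ gender + (1 | speaker), data = d, family = binomial)))
    }

    # Type I error rate = share of runs with p < .05
    # (1000 runs per setting in the text; fewer here to keep it quick)
    p <- replicate(200, simulate_once(mean_tokens = 50, speaker_sd = 0.1, imbalance = 0.5))
    rowMeans(p < 0.05)

Varying mean_tokens, speaker_sd, and imbalance over a grid should reproduce something like the layout of the figure below.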



For each panel, the figure above plots the proportion of significant p-values (p < .05) obtained from the simulation, in blue for the fixed-effects model and in magenta for the mixed model. A loess smoothing line has been added to each panel. Again, since the true population gender difference is always zero, any result deemed significant is a type I error. The figure shows that:

1) If there is no individual-speaker variation (left column), the fixed-effects model appears to behave properly, with 5% Type I error, and the mixed model is slightly conservative, with 4% Type I error. There is no effect of the average number of tokens per speaker (within each panel), nor is there any effect of data imbalance (between the rows of the figure).

2) If there is individual-speaker variation (center and right columns), the fixed-effects model error rate is always above 5%, and it increases roughly linearly in proportion to the number of tokens per speaker. The greater the individual-speaker variation, the faster the increase in the Type I error rate for the fixed-effects model, and therefore the larger the disadvantage compared with the mixed model.

The mixed model proportions are much closer to 5%. We do see a small increase in Type I error as the number of tokens per speaker increases; the mixed model goes from being slightly conservative (p-values too high, Type I error below 5%) to slightly anti-conservative (p-values too low, Type I error above 5%).

Finally, there is a small increase in Type I error associated with greater data imbalance across groups. However, this effect can be seen for both types of models. There is no evidence that mixed models are more susceptible to error from this source, either with a low or a high number of average tokens per speaker.

In summary, the simulation does not show any sign of markedly overconservative behavior from the mixed models, even when the number of tokens per speaker is low and the degree of imbalance is high. This is likely to be because the mixed model is not "vastly overestimating" speaker variance in any general way, despite Roy & Levey's warnings to the contrary.

We can look at what is going on with these estimates of speaker variance, starting with a "middle-of-the-road" case where the average number of tokens per speaker is 50, the true individual-speaker standard deviation is 0.1, and there is no imbalance across groups.

For this balanced case, the fixed-effects model gives an overall Type I error rate of 6.4%, while the mixed model gives 4.4%. The mean estimate of the individual-speaker standard deviation, in the mixed model, is 0.063. Note that this average is an underestimate, not an overestimate, of the true value in the population, which is 0.1.

Indeed, in 214 of the 1000 runs, the mixed model underestimated the speaker variance as much as it possibly could: it came out as zero. For these runs, the proportion of Type I error was higher: 6.1%, and similar to the fixed-effects model, as we would expect.

In 475 of the runs, a positive speaker standard deviation was estimated that was still below 0.1, and the Type I error rate was 5.3%. And in 311 runs, the speaker variation was indeed overestimated, that is, the estimate came out higher than 0.1. The Type I error rate for these runs was only 1.9%.

Mixed models can overestimate speaker variance -- incidentally, this is because of the sample data they are given, not because of some glitch -- and when this happens, the p-value for a between-speaker effect will be too high (conservative), compared to what we would calculate if the true variance in the population were known. However, in just as many cases, the opposite thing happens: the speaker variance is underestimated, resulting in p-values that are too low (anti-conservative). On average, though, the mixed-effects model does not behave in an overly conservative way.
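To look at these estimates yourself, given a fitted glmer model like the one in the sketch above, the speaker intercept's estimated standard deviation and variance can both be read off with VarCorr() (again only a sketch; the object names are illustrative):

    fit <- glmer(y ~ gender + (1 | speaker), data = d, family = binomial)
    vc  <- as.data.frame(VarCorr(fit))   # one row per variance component
    vc$sdcor[1]        # estimated between-speaker SD, comparable to the simulated 0.1
    vc$vcov[1]         # the same component expressed as a variance
    vc$sdcor[1] == 0   # TRUE when the estimate collapses to zero (a boundary fit)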

If we make the same data quite unbalanced across groups (keeping the average of 50 tokens per speaker and the speaker standard deviation of 0.1), the Type I error rates rise to 8.3% for the fixed-effects model and 5.6% for the mixed model. So data imbalance does inflate Type I error, but mixed models still maintain a consistent advantage. And it is still roughly as common for the mixed model to estimate zero speaker variance (35% of runs) as it is to overestimate the true value (28% of runs).

I speculated above that small groups -- speakers with few tokens -- might pose more of a problem than unbalanced data itself. Keeping the population speaker standard deviation of 0.1, and the high level of data imbalance, but considering the case with only 10 tokens per speaker on average, we see that the Type I error rates are 4.5% for fixed, 3.0% for mixed.

The figure of 4.5% would probably average out close to 5%; it's within the range of error exhibited by the points on the figure above (top row, middle column). Recall that our simulations go as low as 5 tokens per speaker, and if there were only 1 token per speaker, no one would assail the accuracy of a fixed-effects model because it ignored individual-speaker variation (or, put another way, within-speaker correlation). But sociolinguistic studies with only a handful of observations per speaker or text are not that common, outside of New York department stores, rare discourse variables, and historical syntax.

For the mixed model, the Type I error rate is the lowest we have seen, even though only 28% of runs overestimated the speaker variance. Many of these overestimated it considerably, however, contributing to the overall conservative behavior.

Perhaps this is all that Roy & Levey intended by their admonition to use caution with mixed models. But a better target of caution might be any data set like this one: a binary linguistic variable, collected from 10 "men" and 10 "women", where two people contributed one token each, another contributed 2, another 4, etc., while others contributed 29 or 30 tokens. As much as we love "naturalistic" data, it is not hard to see that such a data set is far from ideal for answering the question of whether men or women use a linguistic variable more often. If we have to start with very unbalanced data sets, including groups with too few observations to reasonably generalize from, it is too much to expect that any one statistical procedure can always save us.

The simulations used here are idealized -- for one thing, they assume normal distributions of speaker effects -- but they are replicable, and can be tweaked and improved in any number of ways. Simulations are not meant to replicate all the complexities of "real data", but rather to allow the manipulation of known properties of the data. When comparing the performance of two models, it really helps to know the actual properties of what is being modeled. Attempting to use real data to compare the performance of models at best confuses sample and population, and at worst casts unwarranted doubt on reliable tools.


References

Johnson, Daniel Ezra. 2009. Getting off the GoldVarb standard: Introducing Rbrul for mixed-effects variable rule analysis. Language and Linguistics Compass 3(1): 359-383.

Moineddin, R., F. I. Matheson and R. H. Glazier. 2007. A simulation study of sample size for multilevel logistic regression models. BMC Medical Research Methodology 7(1): 34.

Pinheiro, J. C. and D. M. Bates. 2000. Mixed-effects models in S and S-PLUS. New York: Springer.

Roy, J. and S. Levey. 2014. Mixed-effects models and unbalanced sociolinguistic data: the need for caution. Poster presented at NWAV 43, Chicago. http://www.nwav43.illinois.edu/program/documents/Roy-LongAbstract.pdf

Saturday, August 23, 2014

How Should We Measure What "Doesn't Exist" - Part 1

I was told the other day that vowel formants don't exist, or that they're not real, or something; that measuring F1 and F2 is like trying to measure the diameter of an irregular, non-spherical object, like this comet:


I can see two ways in which this is true. First, a model of the human vocal tract consisting of two (or even more) formant values is far from accurate: the vocal tract can generate anti-formants, and even without that complication, the shape and material of the mouth and pharynx are not very much like those of a flute or clarinet; the vocal tract is a much more complex resonator. However, most of us are not interested in modeling this physical, biological object as such, nor, unless we are very much oriented towards articulatory phonetics, the movements of the tongue for their own sake. We are more interested in representing the aspects of vowel sounds that are important linguistically.

But here too, the conception that "vowels are things with two formants and a midpoint" is a substantial oversimplification. Most obviously, vowels extend over (more or less) time, and their properties can change (more or less) during this time. Both these aspects - duration and glide direction/extent - can be linguistically distinctive, although they often are not. So to fully convey the properties of a vowel, we will need to create a dynamic representation. While this creates challenges for measurement and especially for analysis, these problems are orthogonal to the question of how best to describe a vowel at a single point - more precisely, a short window - in time.

As mentioned, the "point" spectrum of a vowel sound, a function mapping frequency to amplitude, is complex. For one thing, it contains some information that is relatively speaker-specific. This data is of great importance in forensic and other speaker-recognition tasks, but in sociophonetic/LVC work we are usually happy to ignore it (if we even can). Other aspects of a vowel's spectrum, not necessarily easily separable from the first category, convey information about pitch, nasality, and the umbrella term "voice quality". These things may or may not interest us, depending on the language involved and the particular research being conducted.

And then there is "vowel quality" proper, which linguists generally describe using four pairs of terms: high-low (or closed-open), back-front, rounded-unrounded, and tense-lax. Originally, these parameters were thought of as straightforwardly articulatory, reflecting the position of the jaw, tongue, lips, and pharynx. As we will see, this articulatory view has shifted, but the IPA alphabet still uses the same basic terms (though no longer tense-lax), and for the most part Ladefoged & Johnson (2014) classify English vowels along the same dimensions as A. M. Bell (1867), who used primary-wide instead of tense-lax:


Equally long ago, scientists were beginning to realize that the vocal tract's first two resonances (or formants, as we now say) are related to the high-low and back-front dimensions of vowel quality, respectively. The higher the first formant (F1), the lower the vowel, and the higher the second formant (F2), the fronter the vowel. Helmholtz (1863/1877) partially demonstrated this with experiments in which he held a series of tuning forks up to the mouth, and he also made the first steps in vowel synthesis. But he found it difficult - like present-day researchers, sometimes - to distinguish F1 and F2 in back vowels, treating them as having only a single resonance:


In contrast to Helmholtz, A. G. Bell (1879) and Hermann (1894) both emphasized that the resonance frequencies characterizing specific vowels were independent of the fundamental frequency (F0) of the voice. Bell provided an early description of formant bandwidth, while Hermann coined the term "formant" itself.

In the new century, experiments on the phonautograph, the phonograph, the Phonodeik, and other machines led most investigators to realize that all vowel sounds consisted of several formants (not just one for back vowels and two for front vowels, as Helmholtz had influentially claimed); advances were also made in vowel synthesis, using electrical devices. In particular, Paget (1930) argued for the importance of two formants for each vowel. But the most progress in acoustic phonetics was made after the invention of the sound spectrograph and its declassification following World War II.

Four significant publications from the immediate postwar years - Essner (1947), Delattre (1948), Joos (1948), and Potter & Peterson (1948) - all use spectrographic data to analyze vowels, and all four explicitly identify the high-low dimension with F1 (or "bar 1" for P & P) and the back-front dimension with F2 (or "bar 2" for P & P). Higher formants are variously recognized, but considered less important. In Delattre's analysis, the French front rounded vowels are seen to have lower F2 (and F3) values than the corresponding unrounded vowels:


Jakobson, Fant and Halle (1952) is somewhat different; it defines the acoustic correlates of distinctive phonological features, aiming to cover vowels and consonants with the same definitions. So the pair compact-diffuse refers to the relative prominence of a central spectral peak, while grave-acute reflects the skewness of the spectrum. In vowels, compact is associated with a more open tongue position and higher F1, and diffuse with a more closed tongue position and lower F1. Grave or back vowels have F2 closer to F1, acute or front vowels have F2 closer to F3. (Indeed, F2 - F1 has been suggested as an alternative to F2 for measuring the back-front dimension, e.g. in the first three editions of Ladefoged's Course in Phonetics.)

These authors all more or less privilege an acoustic view of vowel differences over an articulatory view, noting that acoustic measurements may correspond better to the traditional "articulatory" vowel quadrilateral than the actual position of the tongue during articulation. The true relationship between articulation and acoustics was not well understood until Fant (1960). But the discrepancy had been suspected at least since Russell (1928), whose X-ray study showed that the highest point of the tongue was not directly related to vowel quality; it was noticeably higher in [i] than in [u], for example. In addition, certain vowel qualities, like [a], could be produced with several different configurations of the tongue. Observations like these led Russell to make the well-known claim that "phoneticians are thinking in terms of acoustic fact, and using physiological fantasy to express the idea".

Outside of a motor theory of speech perception, the idea that the vowel space is defined by acoustic resonances rather than articulatory configurations should be acceptable, although the integration of continuous acoustic parameters into phonological theory has never been smoothly accomplished. While we don't know that formant frequencies are directly tracked by our perceptual apparatus, it seems that they are far from fictional, at least from the point of view of how they function in language.

[To be continued in Part 2, with less history and more practical suggestions!]

References (the works by Harrington, Jacobi, and Lindsey, though not cited above, were useful in learning about the topic):

Bell, Alexander Graham. 1879. Vowel theories. American Journal of Otology 1: 163-180.

Bell, Alexander Melville. 1867. Visible Speech: The Science of Universal Alphabetics. London: Simkin, Marshall & Co.

Delattre, Pierre. 1948. Un triangle acoustique des voyelles orales du français. The French Review 21(6): 477-484.

Essner, C. 1947. Recherches sur la structure des voyelles orales. Archives Néerlandaises de Phonétique Expérimentale 20: 40-77.

Fant, Gunnar. 1960. Acoustic theory of speech production. The Hague: Mouton.

Harrington, Jonathan. 2010. Acoustic phonetics. In W. Hardcastle et al. (eds.), The Handbook of Phonetic Sciences, 2nd edition. Oxford: Blackwell.

Helmholtz, Hermann von. 1877. Die Lehre von den Tonempfindungen. 4th edition. Braunschweig: Friedrich Vieweg.

Hermann, Ludimar. 1894. Beiträge zur Lehre von der Klangwahrnehmung. Pflügers Archiv 56: 467-499.

Jacobi, I. 2009. On variation and change in diphthongs and long vowels of spoken Dutch. Ph.D. dissertation, University of Amsterdam.

Jakobson, Roman, Fant, C. Gunnar M., and Halle, Morris. 1952. Preliminaries to speech analysis: the distinctive features and their correlates. Cambridge: MIT Press.

Joos, Martin. 1948. Acoustic phonetics. Language Monograph 23. Baltimore: Waverly Press.

Ladefoged, Peter. A course in phonetics. First edition, 1975. Second edition, 1982. Third edition, 1993. Fourth edition, 2001. Fifth edition, 2006. Sixth edition, 2011.

Ladefoged, Peter & Johnson, Keith. 2014. A course in phonetics. Seventh edition. Stamford, CT: Cengage Learning.

Lindsey, Geoff. 2013. The vowel space. http://englishspeechservices.com/blog/the-vowel-space/

Mattingly, Ignatius G. 1999. A short history of acoustic phonetics in the U.S. Proceedings of the XIVth International Congress of Phonetic Sciences. San Francisco, CA. 1-6.

Paget, Richard. 1930. Human speech. London: Routledge & Kegan Paul.

Potter, Ralph K. & Peterson, Gordon E. 1948. The representation of vowels and their movements. Journal of the Acoustical Society of America 20(4): 528-535.

Russell, George Oscar. 1928. The vowel, its physiological mechanism as shown by X-ray. Columbus: OSU Press.

Tuesday, July 15, 2014

LAEL Department 40th Birthday Rap


Before one department all others pale
It's 40 years old, and its name is LAEL
In the old school days you had to walk from town
You got hit with a cane if you forgot your gown

Since then we've grown, now we have all flavors
So we don't understand each other's papers
We got CDA and we got pragmatics
And clever students who don't know mathematics
We got literacy and SLA
And a little bit of corpus somewhere far away

But we don't care about subfield labels
We know what matters and it's called league tables
We're gonna catch up, we're gonna grab the gold
Although linguistics at Cambridge is 900 years old

We love Vic, Vicky, Becky, Elaine and Marjorie
But the Phonetics Lab looks like a doctor's surgery
They have a torture cage with which they'll surround you
Then they'll mold your palate and they'll ultrasound you

Speaking of ultrasound, everybody's having babies
Rescuing kittens, and Winston's got rabies
Still there's only one thing that we really fear
But Alison comes only twice a year

Now if this rap isn't quite what you expected
Don't forget when I applied I was initially rejected
But I'm so glad that they changed their mind
Even though I left America far behind

Of course I miss my right to bear arms
But an English life has different charms
We got the Lake District, the Trough of Bowland
The Duchess of Cambridge, and Ronnie Rowlands

So the question is, or maybe I already know
Como dijeron Los Clash, should I stay or should I go?

[Video: the rap, performed during a break in the song]