Sunday, October 25, 2015

Quantifying Overlap: A Shiny App for NWAV 44

TL;DR
Pillai (actually 1 - Pillai):
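# F2, F1: numeric formant vectors; vowel.class: the word class of each token
df <- data.frame(x = F2, y = F1, class = vowel.class)
# multivariate linear model (MANOVA) with both formants as responses
m <- lm(cbind(x, y) ~ class, data = df)
pillai <- 1 - anova(m)["class", "Pillai"]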

Bhattacharyya affinity:
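library(adehabitatHR)
df <- data.frame(x = F2, y = F1, class = vowel.class)
# the single data column (class) identifies the two groups to compare
spdf <- SpatialPointsDataFrame(cbind(df$x, df$y), data.frame(class = factor(df$class)))
# Bhattacharyya affinity between the two Epanechnikov-kernel density estimates
ba <- kerneloverlap(spdf, method = "BA", kern = "epa")[1, 2]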



Today at NWAV in Toronto, I ran out of time but got through most of this presentation. It is not a set of PowerPoint slides but a Shiny app, which contains an interactive Overlap Simulator and an ANAE Explorer for the low back vowels. I think that interactive apps like this can be very useful as part of presentations and even publications, as we move away from the model of the traditional paper journal. The R/Shiny code that makes up the app is here and here, but please note that this was my first time trying to write this kind of code! Included in the code are functions for the Pillai score, the Bhattacharyya affinity, and the "Closer Centroid Correct" measure discussed in the text.

To summarize my talk, the popular Pillai score -- as noted by Nycz & Hall-Lew -- is not really a measure of the overlap of two clouds of points, such as vowel tokens from two different word classes. Pillai (a parametric statistic making several assumptions about the data) is more similar to an R-squared measurement, asking what proportion of the total variability in the data is "explained" by the difference (in means) between the two categories. Even when two clouds are clearly non-overlapping, there is still residual variation in each cloud, which means that Pillai will not come out as 1. On the other hand, if the means of the two groups are equal, Pillai will always come out as 0, even if the clouds have different shapes and are not technically showing complete overlap. Finally, Pillai is sensitive to imbalance in token numbers between word classes. If one class has more data than the other, Pillai suggests that there is more overlap than if the number of tokens were equal.
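The three drawbacks can be checked on simulated data. The following is only a sketch: `pillai_overlap()` is a hypothetical helper written for this illustration (it is not taken from the app's code), and the clouds are random normal draws rather than vowel tokens.

```r
# Hypothetical helper (not from the app's source): 1 - Pillai for two classes.
pillai_overlap <- function(x, y, class) {
  d <- data.frame(x = x, y = y, class = factor(class))
  m <- lm(cbind(x, y) ~ class, data = d)
  1 - anova(m)["class", "Pillai"]
}

set.seed(44)  # arbitrary seed; all data below are simulated, not ANAE tokens

# 1. Clearly separated clouds: the result is small but not exactly 0,
#    because the variation within each cloud keeps Pillai below 1.
sep <- pillai_overlap(
  x = c(rnorm(200, 0), rnorm(200, 10)),
  y = c(rnorm(200, 0), rnorm(200, 10)),
  class = rep(c("a", "b"), each = 200)
)

# 2. Same mean, very different spread: Pillai is near 0, so 1 - Pillai is
#    near 1, although the two clouds do not coincide.
shape <- pillai_overlap(
  x = c(rnorm(200, 0, 1), rnorm(200, 0, 5)),
  y = c(rnorm(200, 0, 1), rnorm(200, 0, 5)),
  class = rep(c("a", "b"), each = 200)
)

# 3. Same separation (2 SD along x), balanced versus imbalanced token counts.
balanced <- pillai_overlap(
  x = c(rnorm(200, 0), rnorm(200, 2)),
  y = rnorm(400),
  class = rep(c("a", "b"), each = 200)
)
imbalanced <- pillai_overlap(
  x = c(rnorm(200, 0), rnorm(20, 2)),
  y = rnorm(220),
  class = rep(c("a", "b"), times = c(200, 20))
)

round(c(separated = sep, same_mean = shape,
        balanced = balanced, imbalanced = imbalanced), 3)
```

In runs like this one, the imbalanced case should report more "overlap" than the balanced case even though the two distributions are the same distance apart.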

The Overlap Simulator allows the user to observe these drawbacks of the Pillai score, and to note that the Bhattacharyya affinity (or coefficient) generally does not suffer from the same problems (although it is also skewed, to a lesser degree, when the tokens are imbalanced across groups). BA was explicitly designed as a measure of the overlap of two continuous distributions, and has a very simple mathematical formula: multiply the class probabilities, take the square root, and integrate over the plane. For R to estimate and implement BA, though, a few parameters must be set: the type of kernel, the kernel bandwidth, and the grid size. I have mainly used the default values for these.
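To make the "multiply, take the square root, integrate" recipe concrete, here is a rough sketch that approximates BA on a grid. It is not what `kerneloverlap()` does internally: it swaps in `MASS::kde2d()` (Gaussian kernels, normal-reference bandwidth) for the density estimates, and the function name `ba_grid()`, the 25% padding, and the grid size of 100 are all assumptions made for this illustration.

```r
library(MASS)  # for kde2d(); MASS is one of R's recommended packages

# Hypothetical helper: grid approximation of the Bhattacharyya affinity
# between the estimated densities of two classes of points.
ba_grid <- function(x, y, class, n = 100) {
  class <- factor(class)
  stopifnot(nlevels(class) == 2)
  # Pad the plotting window so that the density tails are not cut off.
  pad_x <- diff(range(x)) * 0.25
  pad_y <- diff(range(y)) * 0.25
  lims <- c(range(x) + c(-pad_x, pad_x), range(y) + c(-pad_y, pad_y))
  a <- class == levels(class)[1]
  # Estimate each class's density on the same grid.
  d1 <- kde2d(x[a], y[a], n = n, lims = lims)
  d2 <- kde2d(x[!a], y[!a], n = n, lims = lims)
  # Multiply, take the square root, and integrate (sum times cell area).
  cell <- diff(d1$x[1:2]) * diff(d1$y[1:2])
  sum(sqrt(d1$z * d2$z)) * cell
}

set.seed(44)  # simulated clouds, not vowel data
cls <- rep(c("a", "b"), each = 200)

# Two samples from the same distribution: BA should be close to 1.
same <- ba_grid(rnorm(400), rnorm(400), cls)

# Two well-separated clouds: BA should be close to 0.
apart <- ba_grid(c(rnorm(200, 0), rnorm(200, 10)),
                 c(rnorm(200, 0), rnorm(200, 10)), cls)

round(c(same = same, apart = apart), 3)
```

The kernel type, bandwidth, and grid size all affect the estimate, which is why those parameters have to be chosen before BA can be computed.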

Another measure of overlap, which I came up with (as far as I know), is the Closer Centroid Correct Criterion (CCCC). This seems to perform similarly to BA, although it tends to have a lower value (when converted to a scale where 0 means no overlap and 1 means complete overlap). One possible advantage of CCCC is that its calculation is very simple: it represents the chance that a point is closer to the centroid or mean of its own class rather than the other class. This seems like it could reflect the amount of confusion that a listener might have in distinguishing two vowel classes in the speech of another person, and also would presumably (?) be computationally/brain-instantiable much more readily than the Bhattacharyya method, which involves estimation, multiplication, and integration of two-dimensional probability distributions.
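A minimal sketch of the calculation follows. It assumes plain Euclidean distance in the two-dimensional space and centroids computed from the same tokens being classified; the version in the app may differ. The rescaling in `cccc_overlap()` is likewise only one plausible way to get the 0-to-1 overlap scale mentioned above, and both function names are made up for this illustration.

```r
# Hypothetical helper: proportion of points closer to their own class
# centroid than to the other class's centroid.
cccc <- function(x, y, class) {
  class <- factor(class)
  stopifnot(nlevels(class) == 2)
  a <- class == levels(class)[1]
  # Class centroids (means of x and y within each class).
  ca <- c(mean(x[a]), mean(y[a]))
  cb <- c(mean(x[!a]), mean(y[!a]))
  # Euclidean distance from every point to each centroid (an assumption).
  da <- sqrt((x - ca[1])^2 + (y - ca[2])^2)
  db <- sqrt((x - cb[1])^2 + (y - cb[2])^2)
  # "Correct" means the point's own centroid is strictly the closer one.
  mean(ifelse(a, da < db, db < da))
}

# Assumed rescaling: chance level (.5 correct) maps to complete overlap (1),
# perfect classification (1 correct) maps to no overlap (0).
cccc_overlap <- function(p) 2 * (1 - p)

set.seed(44)  # simulated clouds, not vowel data
cls <- rep(c("a", "b"), each = 200)

p_apart <- cccc(c(rnorm(200, 0), rnorm(200, 10)),
                c(rnorm(200, 0), rnorm(200, 10)), cls)
p_same <- cccc(rnorm(400), rnorm(400), cls)

round(c(apart = cccc_overlap(p_apart), same = cccc_overlap(p_same)), 3)
```

Nothing here requires density estimation or integration, only two means and a distance comparison per token.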

While results from the ANAE Explorer were preliminary, it was evident that the Pillai metric failed to reflect the degree of low back separation of some of the speakers in the Mid-Atlantic and Inland North regions. Another point to mention is that it makes a big difference whether the overlap of the LOT and THOUGHT vowels is assessed with or without making an adjustment for phonetic environment.

Experimenting with this adjustment -- which amounts to working with the residuals from a regression model that fits preceding- and following-segment coefficients pooled across all speakers -- shows that LOT and THOUGHT usually appear to overlap more once phonetic effects are taken into account. An extreme example of this is Gus K. from Nashville, TN, whose unadjusted BA was .320, but whose adjusted BA was .818. However, factoring out phonetic environment can sometimes have the opposite effect, like for Tony M. from Knoxville, TN, whose unadjusted BA was .690 and whose adjusted BA was .234.
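The adjustment can be sketched roughly as below. This is an illustration under assumptions, not the app's code: the column names (`F1`, `F2`, `preceding`, `following`), the helper `adjust_for_environment()`, and the simulated tokens are all hypothetical, and the post does not say whether the pooled model also contained terms for speaker or word class.

```r
# Hypothetical helper: replace each formant with its residual from a model
# that contains only preceding- and following-segment effects, fitted to
# the pooled data from all speakers.
adjust_for_environment <- function(d) {
  for (f in c("F1", "F2")) {
    m <- lm(reformulate(c("preceding", "following"), response = f),
            data = d, na.action = na.exclude)
    # na.exclude keeps the residuals aligned with the rows of d.
    d[[paste0(f, ".adj")]] <- resid(m)
  }
  d
}

set.seed(44)  # simulated tokens with made-up environment effects
n <- 500
tokens <- data.frame(
  preceding = sample(c("labial", "coronal", "velar"), n, replace = TRUE),
  following = sample(c("stop", "nasal", "lateral"), n, replace = TRUE)
)
tokens$F1 <- 700 + 30 * (tokens$following == "nasal") + rnorm(n, sd = 40)
tokens$F2 <- 1200 - 100 * (tokens$following == "lateral") + rnorm(n, sd = 80)

adj <- adjust_for_environment(tokens)

# After adjustment, the mean residual within each environment is zero,
# so what remains is variation not attributable to the environment terms.
round(tapply(adj$F2.adj, adj$following, mean), 6)
```

The overlap measures would then be computed on `F1.adj` and `F2.adj` for each speaker instead of on the raw formants.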

Sunday, October 4, 2015

Slash Non-Fiction

There may be times when you do not wish to exclude [tokens with] a factor from your entire analysis, but you do want to exclude [that factor] from the results of a given factor group. - Goldvarb 2001 Users' Manual

Since Rbrul was created in 2008, users have asked about the possibility of this type of partial exclusion. Until now, this capability has continued to be available only in Goldvarb. While Goldvarb is still the only way to achieve more complex exclusions (for example, excluding tokens from the results of one factor group based on the values of another factor group), Rbrul now emulates the Goldvarb "slash operator" in its primary usage.

If one or more tokens have "/" (slash) for a certain predictor (factor group), then regardless of the value(s) of the dependent variable for those tokens, the log-odds coefficient for the slashed group is forced to zero (factor weight .500), although this is reported as "NA" for the sake of clarity. In effect, these tokens are ignored in the calculation of the other results for that predictor, while being included normally in the calculations for the other predictors.
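One way to picture this, outside of Rbrul, is with hand-built sum contrasts in which slashed tokens get a row of zeros. The sketch below is only an illustration of the idea and is not Rbrul's implementation; `slash_contrasts()`, the predictor names, and the simulated data are all hypothetical.

```r
# Hypothetical helper: sum-contrast columns for a factor group in which
# tokens coded "/" contribute nothing (all-zero rows).
slash_contrasts <- function(f) {
  f <- as.character(f)
  kept <- sort(unique(f[f != "/"]))
  k <- length(kept)
  stopifnot(k >= 2)
  C <- contr.sum(k)  # k x (k - 1) matrix of sum contrasts
  rownames(C) <- kept
  X <- matrix(0, nrow = length(f), ncol = k - 1,
              dimnames = list(NULL, paste0("c", seq_len(k - 1))))
  # Non-slashed tokens get their level's contrast row; slashed rows stay 0.
  X[f != "/", ] <- C[f[f != "/"], , drop = FALSE]
  X
}

set.seed(1)  # simulated tokens with no real effects built in
d <- data.frame(
  y = rbinom(300, 1, 0.5),
  style = sample(c("casual", "careful", "/"), 300, replace = TRUE),
  sex = sample(c("f", "m"), 300, replace = TRUE)
)

S <- slash_contrasts(d$style)
# Every token, slashed or not, still informs the intercept and the sex effect.
fit <- glm(y ~ S + sex, data = d, family = binomial)
coef(fit)
```

Because the slashed rows are zero in `S`, those tokens sit at a log-odds contribution of 0 for style (factor weight .500) without influencing the estimated difference between "casual" and "careful".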

I have not tested this new function very thoroughly, only making sure that the output matched Goldvarb on a few simple models. So please give me feedback (danielezrajohnson@gmail.com) if it seems to have problems, or even if it seems to be working properly.

Thanks for your patience, and happy slashing!

Saturday, May 23, 2015

♫  Ain't Even Trying ♫

I said I got loving on my mind, Loretta
Should I shake it off and try to forget her?
She said maybe, but it's still too soon
You walk the line, you'll see about June

If you see me walking, that's my way of living
The sun's in the water, but there's light in the west
If you hear me crying, that's my way of singing
Ain't even trying to get this love off my chest


If you need a manager, I'll play Doolittle
If you want, you can play me like a fiddle
I was playing in the jailhouse, honey
You looked in, I walked out like Johnny

If you see me walking, that's my way of living
The sun's in the water, but there's light in the west
If you hear me crying, that's my way of singing
Ain't even trying to get this love off my chest


I can't fight like Mooney Lynn
Can I turn out the light and let the moon shine in?
This half moon is in first quarter
You'd make a fine full moon, Miss Carter

If you see me walking your way, I'm living
When you're laughing, I can rest
I'll be singing to you, love, when you're crying
Ain't even trying to get this love off my chest

Wednesday, May 20, 2015

diamond

how can i fade
emerald to jade
sapphire to topaz
ruby to red glass
how can i solder
steel onto silver
how can i hold
gilt after gold
how can i order
wine after water
how can i share
vapor with air
you left a conundrum
harder than corundum

Sunday, April 26, 2015

♫  Get Me Back ♫

Can't waste another day in Margaritaville
I gotta wake up from this dream
Can't take another round with Jose Cuervo
I need another shot at that old Jim Beam

Gonna get me back to work, back from vacation
Get me back to the land I love
Gonna get my radio back onto that country station
If I can't get back the girl I'm thinking of


I used to see you over a pint glass
Now your shadow's on my plastic cup
But when I saw your face in a coconut
I knew that it was time to go home and give up

Gonna get me back to work, back from vacation
Get me back to the land I love
Gonna get my radio back onto that country station
If I can't get back the girl I'm thinking of


Some folks come here to get happy
Some are here to forget who they are
I thought I came to get lucky
But you can't drown bad luck at a swim-up bar

Gonna get me back to work, back from vacation
Get me back to the land I love
Gonna get my radio back onto that country station
If I can't get back the girl I'm thinking of

Wednesday, April 22, 2015

if i get offa this mountain

she's a drunkard's dream
you're the midnight sun
she is all five senses
you are just the one

Tuesday, April 21, 2015

about face time

I wake to emails that are not from you
The single buzzes on my run aren't too
Later, the doubles throw names on the lock
screen, but they're someone else's - we don't talk.
The T runs late, a fortnight after U
Riders are smiling, texting back, a few
have the correct fair hair, but the wrong face
I check your typing bubble - just in case.
The flight east makes a short night run away
I'd have a red eye even at midday.
I'll dry, return to touch before the fall
rains mute my will - "touch to return to call".
The Lune drains slowly west - the day we met it ran uphill
We'll find our level best - I know you're standing, hope you're still

louisiana (1997)

What does the king of Mardi
Gras do when the partys over
when he gets up round twelve
o clock on Ash Wednesday What
do you reckon he gives up for
Lent I bet he takes a stroll
 through the French Quarter
  Maybe he will see a nice
  string of white plastic
   beads like big pearls
    Jewelry just Que in
     a few hours fait a
     ago now a le Roi gutter near Bourbon
    Kings de Mardi Gras Street Yesterday
   litter après la soirée he handed just
   lies quand il se relève such necklaces
   sur le midi le mercredi des to subjects
   Cendres Que renonce-t-il pour le carême
  Il se promène à travers le Vieux Carré  en
 baîllant Il trouve un joli collier de perles
           blanches    perles en plastique
                          abandonnées  hier
                          le Roi   le     rama
                                            sse

Monday, April 20, 2015

Thoughts on 18 July 1997 [at Goldman Sachs]

Kelly LaRosa
is tall and pretty
My tax return shows a
liability
Carolyn Katz
Roberta Goss
I have a job that's
a total loss

pacific

Your grass is greener even when it's brown
That buoyancy you found was just a bay
You'll pass the Golden Gate before you drown
This boy will find you swimming there some day

Thursday, April 16, 2015

An Interview With A Climate-Change Denier

Of temperature and tone: Has climate shaped human languages?
by Mary Caperton Morton
Earth Magazine
April 20, 2015

I participated in this journalism in January and February, and Morton wrote a great, balanced article. She was good enough to send the story in advance, allowing me to double-check my quotes. Still, the piece changed a bit during the editing process. Probably for reasons of length, the original surprise ending - where I had the last word! - was removed. Here it is:

'But it may take more than a few experimental games of telephone to convince linguists that languages evolve in response to environmental changes, Johnson says. "They're trying to suggest that the same forces that shape human cultures also shape human language. And since, quite obviously, aspects of human culture, especially material culture, reflect the physical environment, they expect language to do so too. But the interplay between culture and language has been a hotly debated topic for decades."'

Wednesday, April 15, 2015

♫  I'll Make A Move To Nashville ♫

These hands never held your face
But there's teardrops on my guitar case
I'll wait for you, but if you wait too long
You're gonna wind up in a country song

I would move across the world for you
And the way you move already asked me to
They'd write songs about our love, but if you move on, then I will
If I can't make a move on you, I'll make a move to Nashville


There's no letters to return
And just one picture of us to burn
You're gonna hear, if you still can't see
It's tender kisses or Tennessee

I would move across the world for you
And the way you move already asked me to
They'd write songs about our love, but if you move on, then I will
If I can't make a move on you, I'll make a move to Nashville


The move worked out for Elvis
I bet he never had trouble like this
It worked for Alan Jackson
And it worked for Taylor Swift

I would move across the world for you
And the way you move already asked me to
They'd write songs about our love, but if you move on, then I will
If I can't make a move on you, I'll make a move to Nashville

Tuesday, April 7, 2015

it was just that the time was wrong

without your typing on the screen
the writing's on the wall
without you willing to be seen
april's a fruitless fall
without a happy ending
may is a long massage
without a long flight pending
is june just a mirage?
was march this year? i'll miss you
till the second of your text
the first of if i kiss you
will steal seconds from the next
see you soon or in some second if i never
catch you later or remember you forever

Monday, March 30, 2015

littorally

Invertebrates can end up like a girl
Luckless in amber, wounded round a pearl
Neither at home in water nor awing
Some fly into a swatter, some a ring

Wednesday, January 21, 2015

My First Post-Publication Review: "Climate, vocal folds, and tonal languages" by Everett et al. (2015)

I suggested in my last post that in the future, the current system of academic peer review should be replaced by post-publication peer review. A recent publication in PNAS (Proceedings of the National Academy of Sciences) provides a good opportunity for me to attempt such a review myself. Of course, an article published in PNAS has already gone through peer review, but the close relationship between authors and editors there often results in a product that – even more than always – might benefit from some outside criticism.

"Climate, vocal folds, and tonal languages: Connecting the physiological and geographic dots," by Caleb Everett, Damián Blasi and Seán Roberts, is an ambitious, though short paper that argues for a connection between low humidity and temperature and the absence of tone, particularly complex tone (defined as three or more levels of phonemic pitch contrast), in languages around the world.

To make this argument, Everett et al. first cite literature, particularly from laryngology, supporting the idea that pitch distinctions are more difficult to make under dry and cold conditions. And if complex tone is "maladaptive (even in minor ways)," they predict that languages spoken in comparatively arid and/or chilly locales will be more likely to "lose/never acquire" complex tonal contrasts. The majority of the paper is then devoted to demonstrating that the predicted correlation exists: globally, within large language families, and across linguistic isolates.

To me, it was helpful to think of the paper's structure in the opposite direction. Clearly, if the correlation between climate and tone does not exist – or if it does exist, but could be due to chance – then as far as this particular claim is concerned, we could stop right there. However, it would still be worth thinking about the proposed explanation. While ease of articulation and perception are certainly important forces driving language change, the idea that these forces themselves might vary based on totally extra-linguistic factors (like climate) is very intriguing.

But if we do accept Everett et al.'s argument, we might also expect that many other small differences in people's environments should differentially favor changes to their language over the long term. So assuming more phonetic predictions of this type can be made, the theory would have a problem if the geographical correlations don't pan out as well as they seem to in this case. Also, if we extend our interest to small differences in people's anatomy in different parts of the world, we might be treading on ground that is considered, at least since the mid-twentieth century, rather dangerous.

Returning to the specifics of the article, the argument that dry and cold air negatively affects the production of precise pitch differences is well-supported, but the magnitude of these effects is never made clear. While the evolutionary argument does not depend on the effects being large, it would have been nice to know, for example, if the increased imprecision in pitch when "jitter measurements increased by over 50%" were comparable in magnitude to tonal pitch differences, or not. It is also relevant that language hearers typically "normalize" or compensate for phonetic differences of considerable size in the speech of their interlocutors. This point, and more generally the relationship between pitch (a phonetic property) and tone (a phonological one), were not considered.

Surprisingly, in the section about the geographic correlation, no quantitative estimate is ever given of the effects of humidity (or temperature) on the likelihood of a language having complex tone. This information is presented in a cumulative distribution plot (Figure 2), which does have the advantage (from the authors' point of view) of maximizing the appearance of the effect.

When numbers are presented, they compare the climate properties of tonal vs. non-tonal languages, rather than treating climate as the explanatory variable it is claimed to be. This may seem like a quibble, but it makes it rather difficult to understand just how strong an association is being shown. For example, when we read that "the average [humidity] for isolates with complex tone is 0.017, whereas the average for other isolates is 0.013," this measures a difference in average climate (whatever that means), depending on the language type. What the reader deserves to know is how different the tonal properties of languages are, depending on the climate.

Although the bulk of this section quite correctly attempts to eliminate areal effects as an explanation for the association between climate and tone, the final paragraph seemingly does an about-face, suggesting that "tone spreads across languages more effectively via interlinguistic contact in regions with favorable ambient conditions" and less effectively in cold/dry regions. This expands the scope of the hypothesis beyond language transmission to include language contact, without any additional evidence, and possibly at the risk of circularity.

I would have expected that what linguists already know about tonogenesis would be more relevant to this topic. Mentioning it for the first time in their discussion and conclusions section, Everett et al. say only that this literature does not predict any effect of climate. Actually, this might make perfect sense if languages in dry, cold climates only tend to lose tone, rather than "lose/never acquire" it (to return to the authors' curious conflation). But in this case, some discussion of how tone is ordinarily thought to be lost might have been worthwhile, even if the influence of climate could be independent.

In summary, I found the argument for the geographic correlation itself to be fairly strong, although I did not really look into the details here. The link between the proposed phonetic effect and language change was plausible, but needed more grounding in research on language change in general and the loss of tone in particular. But I was less convinced that the physiological (or phonetic) effects of dry and cold air are really an obstacle to producing phonological tone. Like Everett et al., I too hope "that experimental phoneticians and others examine the effects of ambient air conditions on the production of tones and other sound patterns, so that we can better understand this pivotal way in which human sound systems appear to be ecologically adaptive." Unless they are too busy.