Sunday, December 7, 2014

Sour Grapes And Sweet Mixed Models: Avoiding The Goldilocks Problem in Publishing


Longtime readers of this blog, and people who look in the archives (try 2010), know that I sometimes write poems. I think they're good, and my kinder readers, or those who know less about academic poetry -- and I certainly belong in both categories -- ask if I've tried to publish them. And I say I would if I could, but I know I can't. And also, I already have.

Before Gutenberg, people basically passed notes. It was democratic, it was slow, it could be subversive, but it wasn't easy to connect across long distances or outside of one's social networks. The invention of printing allowed a much wider and more rapid dissemination of all kinds of writing. At the same time, it concentrated power in the hands of publishers, who decided what to print and what not to. And made money off it.

Today, publishers still make these decisions, and they still make money off it, but they exist for a different reason. No longer necessary for getting your work "out there", publishing (as opposed to self-publishing) gives the writer prestige and hopefully career advancement, and it helps readers decide what is worth reading (or at least what to cite).

In today's world, where there is a lot out there even on obscure topics, and an overwhelming amount on popular ones, it is certainly necessary for judgments and decisions to be made. No one can read everything, and not everything is worth reading. However, we should be able to make these judgments and decisions for ourselves. In Anglo-American legal terms, we don't need, or want, "prior restraint".

Publishing is outdated and will eventually disappear. Pre-publication peer review will fall along with it and, in my view, be replaced with some version of post-self-publication peer review. But many more thoughtful people than me are out there debating what this new world might look like.

The revolution will be bad news for publishing houses, obviously, and will also pose problems for anyone wanting to evaluate academics the way they are now -- and thus for academics who need to be evaluated, promoted, given tenure, etc. Without getting lost in speculation about how these institutions might work in a post-publishing world, we can anticipate that they will work better than they do now, when they depend so heavily on a flawed system of peer review and publishing.

At a minimum, an author asks one thing from the current system: "If my article be rejected, Lord, it is thy will, but please let not a blatantly shittier article be published in the same journal within six months." But the author's perspective is not only biased, but limited. We may think our article is good (or know we really need to publish it), but we are not interested in every other type of thing the journal might legitimately publish.

And while "not interested" may indeed sometimes mean "a waste of time for me to read", it doesn't necessarily mean "a waste of paper for the journal to print", because other readers with other interests are out there. (I'm setting aside the important fact that there is no paper, that any space restrictions have become intentional, not physical.)

The problem with pre-publication peer review is not that there are too many submissions of varying quality coming from too many different perspectives (where the relationship between quality and perspective is complicated), although there are, and this will make editors' jobs difficult as long as journals exist. The problem is that given this quantity and diversity, pre-publication peer review doesn't (and can't) employ nearly enough reviewers.

Potential reviewers will be more or less familiar with the topic of a submission, and (especially if they are more familiar with it) more or less already in agreement with what it says. Less familiar reviewers may overestimate the originality of the submission, while more familiar reviewers may overestimate the knowledge of the intended audience. There may be reviewers whose own work is praised or developed further in the submission, and others whose work is criticized.

Even when these reviewers all provide useful feedback, a publishing decision must be made based on their recommendations. And the more inconsistency there is between reviewers, or any raters, the more opinions you need to achieve a reliable outcome. Considering how much inter-reviewer variation exists in our field, I think two or three reviewers, no matter how carefully chosen, are not always enough. So it doesn't surprise me that certain things "fall through the cracks", while others do the opposite (whatever the metaphor would be). Luckily, in the future, we will all be reviewers, so this problem will be eliminated along with the journals and publishers.
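One standard way to quantify that trade-off (this illustration is mine, not anything from the journals' process) is the Spearman-Brown formula, which predicts the reliability of a pooled verdict from k raters given the level of agreement between individual raters. A few lines of Python:

    # Spearman-Brown: reliability of the average of k raters, given the
    # typical correlation (agreement) between any two individual raters.
    def pooled_reliability(r_single, k):
        return k * r_single / (1 + (k - 1) * r_single)

    # If individual reviewers only agree at around r = 0.3, two or three of
    # them pooled still give a fairly unreliable verdict, and it takes
    # roughly ten to get above 0.8.
    for k in (1, 2, 3, 5, 10):
        print(k, round(pooled_reliability(0.3, k), 2))

The exact agreement figure is invented, but the shape of the curve is the point: with noisy raters, a handful of opinions does not get you very far.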

Some years ago, having published an article pointing out one advantage of using mixed-effects models on sociolinguistic data, when speakers vary individually in their use of a variable -- namely the reduction of absurd levels of Type I error for between-speaker predictors (Johnson 2009) -- I was invited to say more on the topic at a panel at NWAV 38 in October 2009.

In this presentation, I reiterated the point about Type I error, and discussed three other advantages of mixed models: better estimation of the effects of between-speaker predictors (when some speakers have more data than others), better estimation of the effects of within-speaker predictors (when those predictors are not balanced across speakers), and better estimation of the effects of within-speaker predictors (in general, in logistic regression).
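To make the Type I error point concrete for readers who think in code, here is a minimal simulation sketch, written in Python with pandas and statsmodels rather than the R-based tools the manuscript itself uses, and with invented variable names. Individual speakers differ in their baseline rates, the between-speaker predictor ("gender") has no true effect, and the fixed-effects model treats every token as independent evidence about it:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    n_speakers, n_tokens = 20, 50

    # one between-speaker predictor, constant within speaker, with NO true effect
    speaker = np.repeat(np.arange(n_speakers), n_tokens)
    gender = np.repeat(rng.integers(0, 2, n_speakers), n_tokens)

    # individual speakers differ in their baseline rate (random intercepts)
    speaker_effect = np.repeat(rng.normal(0, 1.0, n_speakers), n_tokens)
    y = speaker_effect + rng.normal(0, 1.0, n_speakers * n_tokens)
    data = pd.DataFrame({"y": y, "gender": gender, "speaker": speaker})

    # fixed-effects model: every token counts as independent evidence about 'gender'
    fixed = smf.ols("y ~ gender", data).fit()

    # mixed-effects model: a by-speaker random intercept absorbs the individual variation
    mixed = smf.mixedlm("y ~ gender", data, groups=data["speaker"]).fit()

    print("std. error for gender, fixed:", round(fixed.bse["gender"], 3),
          " mixed:", round(mixed.bse["gender"], 3))
    print("p-value for gender,    fixed:", round(fixed.pvalues["gender"], 3),
          " mixed:", round(mixed.pvalues["gender"], 3))

Any single run will bounce around, but across repeated simulations the pattern is the one described above: the fixed-effects model's standard error for the between-speaker predictor is far too small, so it regularly "finds" effects that are not there, while the mixed model's larger standard error keeps Type I error where it belongs.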

I wrote this up and submitted it to a journal (8/10), revised and resubmitted it twice (4/11, 2/12), and then submitted and resubmitted it to another journal (7/12, 4/13). This process has greatly improved the manuscript, which still tries to make the same points as the NWAV presentation. Although the reviewers did not challenge these points directly, they raised valid concerns about the manuscript's length (too much of it) and organization (too little of it), and about its appeal to the various potential readers of the respective journals.

For example, some judged it to be inaccessible, while for others, it read too much like a textbook. Another point of disagreement related to the value of using simulated data to illustrate statistical points, which I had done in Johnson 2009, and more recently here, here, here, here, and here.

When it came to the core content, the reviewers' opinions were equally divergent. I was told that fixed-effects models could be as good as (or even better than) mixed-effects models: "too new!" I was told that the mixed models I focus on, which don't contain random slopes, were not good enough: "too old!" And I was told that the mixed models I discussed were just fine -- "just right?" -- but that everyone knows this already: "d'oh!"

The editors gave detailed and thoughtful recommendations, both general and specific, on how to revamp the manuscript, including trying to find a middle ground between these apparently contradictory perspectives. But even if the article were published in a journal, any reader would still find themselves falling into one (or more) of these camps. Or, more likely, having a unique perspective of their own.

My perspective is that this article could be useful to some readers, and that readers should be able to decide if it is for themselves. Clearly, not everyone already "knows" that we "should" use mixed models, since fixed-effects models are still seen in the wild. They rarely appear in the mixed-esque form devised by Paolillo 2013, but more often in their native (naive) incarnation, where any individual variation -- by speaker, by word, etc. -- is simply and majestically ignored. GoldVarb Bear made several appearances at NWAV 43.

Random intercepts are a stepping-stone to random slopes, not an alternative to them (but see Bear, er, Barr et al. 2013). And even if the reduction of Type I error is the most important thing for some readers, the other advantages of random intercepts -- their effect on the regression coefficients themselves -- deserve to be known and discussed. To me, they are not very complicated to understand, but they are not obvious. Readers are welcome to disagree.
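As an illustration of what that stepping-stone leads to (again a sketch in Python and statsmodels, with invented variable names, not anything from the manuscript), moving from random intercepts to random slopes is one extra argument rather than a different world:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n_speakers, n_tokens = 20, 50
    speaker = np.repeat(np.arange(n_speakers), n_tokens)

    # a within-speaker predictor ('style') that varies from token to token,
    # and speakers who differ both in baseline rate and in their response to it
    style = rng.integers(0, 2, n_speakers * n_tokens)
    intercepts = np.repeat(rng.normal(0.0, 1.0, n_speakers), n_tokens)
    slopes = np.repeat(rng.normal(0.5, 0.5, n_speakers), n_tokens)
    y = intercepts + slopes * style + rng.normal(0, 1.0, n_speakers * n_tokens)
    data = pd.DataFrame({"y": y, "style": style, "speaker": speaker})

    # random intercepts only: the models the manuscript focuses on
    intercepts_only = smf.mixedlm("y ~ style", data, groups=data["speaker"]).fit()

    # the same call with a by-speaker random slope for 'style' added
    with_slopes = smf.mixedlm("y ~ style", data, groups=data["speaker"],
                              re_formula="~style").fit()
    print(with_slopes.summary())

The summary reports a variance estimate for the by-speaker slope alongside the intercept variance; if that estimate is near zero for a given dataset, the simpler random-intercept model is losing little.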

Of course, whether individual speakers and words actually have different rates of variation (let alone constraints on variation) is not a settled question in the first place. This article assumes (conservatively?) that they do, and I think most people would agree when it comes to speakers, while many would disagree when it comes to words. But a nice thing about mixed-effects models is that they work either way.

All of this is a way of introducing this document, and placing what is partially sour grapes in some kind of coherent frame. This article is distributed free of charge and without any restriction on its use. If it seems long, boring, and/or repetitive, you can skim it, and certainly don't print it! If something seems wrong, please let me know. And if it's helpful in your work, you can cite it if you like.

This particular article is certainly not a model of anything, just a few ideas, or observations, of a quantitative and methodological kind. But maybe we don't always need a publishing infrastructure to share our ideas -- or at least not if they're not our best ones.


References

Barr, Dale J., Roger Levy, Christoph Scheepers and Harry J. Tily. 2013. Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language 68: 255–278.

Johnson, Daniel Ezra. 2009. Getting off the GoldVarb standard: Introducing Rbrul for mixed-effects variable rule analysis. Language and Linguistics Compass 3(1): 359–383.

Johnson, Daniel Ezra. 2014. Progress in regression: Why natural language data calls for mixed-effects models. Self-published manuscript.

Paolillo, John C. 2013. Individual effects in variation analysis: Model, software, and research design. Language Variation and Change 25(1): 89–118.
