A tool for Conference analysis

While we know that gospel principles are eternal, we must also admit that the language used to describe them changes over time. And now we have a tool for discovering and analyzing how Church leaders have changed their descriptions of the gospel over the past 160 years.

BYU Linguistics professor Mark Davies has released his Corpus of LDS General Conference Talks, a database containing General Conference talks since 1850 (some 10,000 talks and 24 million words) along with robust tools for searching and analyzing how the language in the talks has changed over time. This corpus, or collection of texts, is just the latest of several that Davies has made available to researchers, including his 400-million-word Corpus of Historical American English and his 410-million-word Corpus of Contemporary American English.

This is much more than a long word-processing document, and its tools are better than the average search tools that simply find every occurrence of the word “green.” The texts in this corpus include much more information than just words: they are dated, and each word has been tagged with its part of speech. The search tools are also much more sophisticated than those you will find in any word processor. Users can search not just for a word but also for its synonyms, which essentially lets them search for a concept instead of a single word (searching for “sin” and its synonyms also finds evil, wickedness, iniquity, crime, transgression, immorality, transgress, err, wrongdoing, lapse, debauchery, depravity, turpitude, misdemeanor and misdeed).

Best of all, users can look at the frequency of these words and concepts over time, learning, for example, that the concept of “sin” was mentioned twice as often in the 1850s as it is now, and 50% more often through the 1880s, before falling in the early 1900s to a level 20% lower than now. The concept was again a popular topic in the 1960s and 1970s (50% more than now) before dropping back down again.
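Comparisons like “twice as often as now” only make sense when raw hit counts are adjusted for how many words each decade contributes. For readers who want to check such ratios themselves, here is a minimal Python sketch of that arithmetic. The decade totals and hit counts in it are hypothetical placeholders, not figures from Davies’ corpus, and the function names are my own.

```python
def per_million(hits, total_words):
    """Return occurrences per million words of text."""
    return hits * 1_000_000 / total_words


def relative_to(rates, baseline):
    """Express every decade's rate as a multiple of the baseline decade."""
    base = rates[baseline]
    return {decade: rate / base for decade, rate in rates.items()}


# HYPOTHETICAL numbers for illustration only: (hits, total words in decade).
counts = {
    "1850s": (400, 1_000_000),
    "2000s": (600, 3_000_000),
}

rates = {decade: per_million(h, w) for decade, (h, w) in counts.items()}
ratios = relative_to(rates, "2000s")

for decade in counts:
    print(decade, round(rates[decade], 1), round(ratios[decade], 2))
```

With these made-up inputs the 2000s have more raw hits than the 1850s, yet the 1850s rate is double the 2000s rate, which is why the per-million figure is the one to compare across decades.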

Late last fall Google introduced a tool with some of these capabilities, drawing a few posts here on the bloggernacle about how it could be used for Mormon Studies (see J. Max Wilson’s post at Millennial Star and my own on a still unexplained Mormon literary mystery on A Motley Vision). We were then unaware that Davies already had a tool that provided the same information and allowed more sophisticated searches. While not as large as Google Books, which includes 500 billion words, the tools in Davies’ smaller (400 million words) Corpus of Historical American English are much more sophisticated, and Davies argues that, at least for researchers, his corpus is more useful.

Unfortunately, the interface for Davies’ corpus isn’t as easy to use as Google’s, mainly because it is so much more sophisticated. It is hard to make complex tools easy to use; sophistication comes at a price. Davies’ system also doesn’t give the nice graphs that the Google Labs project provides. However, it’s easy to copy the data from Davies’ corpus into a spreadsheet, where any spreadsheet jockey worth his salt can produce very nice graphs.

But perhaps most importantly, Davies’ General Conference corpus has one overriding advantage over any other: it is limited to just Conference talks. Searching on Google Books’ Ngram Viewer or even on Davies’ own Corpus of Historical American English tells you about the overall use of words and concepts; it gives you an idea of how the culture as a whole used language. The General Conference corpus helps us understand the word use of a much smaller group of people: LDS general authorities. That restriction alone makes this corpus extremely useful to Mormon Studies.

Of course, this also raises the question, “What other corpora could be useful to Mormon Studies?” Off the top of my head, it should be possible to put together corpora for things like the text of Mormon periodicals, Mormon missionary diaries (from BYU’s collection), the Deseret News and even the collection of contemporary Mormon texts we call the bloggernacle. I wonder what we could learn if we were able to analyze these corpora also?


17 comments for “A tool for Conference analysis”

  1. Of course, the first thing I did was type in “Social Justice” and hit search…

  2. Hurray for the ’30s and ’40s on that one, Matt!

    I searched “blood atonement” and discovered that Ben E. Rich had a real bee in his bonnet over that in the 19-aughts. Wouldn’t have guessed that.

    Being able to pull up a contextual quote and the speaker’s name means this tool is going to be of use to me as a sophisticated index, even if I don’t understand and can’t use the linguistic tools it was intended for. Thanks, Kent.

  3. Wow what a cool website. I typed in “last days” and it was mentioned 233 times in the 1850s as opposed to 61 times in the 2000s.

  4. That is an interesting insight, Brad. The closer the “end times” come, the less we talk about them. I wonder if societal norms sway that? That ‘end times’ talk is seen as lunatic and so authorities stay away from it.

  5. Kent,

    about the capabilities of this corpus, when searching “end times” does it include “last days” or “latter days”? It seems like your description of it should allow for them, but I don’t know why “end times” wouldn’t return hits for each time the full name of the church is used – unless there are limits in place to restrict hits.

  6. So early saints saw the end times as only a few years or decades away and now we see them as decades or centuries? That’s feasible.

    But if we accept we don’t know “when” the end times are, then by definition we would have to be closer to them today than in the past no matter when they are. So why don’t we talk about them anymore? Do we really view them as farther away? I know I don’t.

  7. Jax, as I indicated in the OP, the interface isn’t always easy. I tried to do synonyms of “last days” but it won’t let me do synonyms of a phrase, just the individual words. There are also no synonyms of “days” so you end up with [=last] days as the search term. End times comes out as [=end] times, but it has almost no results.

    It would probably help if we could come up with a single word synonym for “last days”

  8. Very cool. I did a search for “socialism” — there are a few mentions of it each decade all the way back to the 1850s, and then an astounding 80 times in the 1960s!


  9. Yeah I typed in communism and it appeared a whopping 191 times in the 1960s as opposed to 0 times in 1990s and 1 time in the 2000s.

  10. “Working mother” got 9 hits:
    1960s = 6
    1980s = 3
    But “working mothers” got 15 hits distinct from “working mother”
    1960s = 7
    1970s = 3
    1980s = 4
    1990s = 1
    Professor Davies, if you’re here, could you advise searchers how to compensate for this? Or could you change the search function to include simple plurals?
    I love that you can see quickly and neatly who the speaker is and exactly what year. Could you also show somehow unique talks? For example, all 3 1980s hits of “working mother” are from the same speaker; are they all from one talk? Thanks for this interesting and useful tool!

  11. Jennie, that function is there. I think there is a symbol of a little “?” (question mark) in a box at the end of each search field. If you click on that box, you’ll get the search function help, which will tell you how to include plurals.

    As I indicated in the post, this interface isn’t quite as easy as Google’s interface, but it does handle searches that are more complex than most of us encounter regularly [For example, you can search for uses of “work” as a noun, as opposed to “work” as a verb, something Google can’t do]. Studying the help page may tell you how to do what you want. It will certainly give you a better idea of what the capabilities of this corpus are.

  12. It’s a cool idea. I wish I were smart enough to run it. I’ve tried it, and can’t get it to go. I’m an idiot savant – just can’t do computers. In all other areas I’m omniscient. Except for the ones I aint.
