<\/span><\/h3>\n\n\n\nIf you are like most people in the United States, you read and\nwrite one phrase, sentence, and paragraph at a time. Then, you consider all the words, sentences, and paragraphs of\na full individual text, and that tells you what that text is about. <\/p>\n\n\n\n
For example, when you read the news, you\nprobably read or skim each news article or post from the beginning onward, and\nthen you think about what each one is about. For a class or your own purposes, you might\nalso consider the audience of a particular article, such as whether it is international\nor domestic, or left-leaning or right-leaning. This kind of attention to the\nrhetoric and rhetorical situation of individual texts is something you have probably\npracticed a good deal. <\/p>\n\n\n\n
Reading one sentence and text at a time\nis what your teachers tend to do when they read papers, too: they read your\npaper from start to finish, and then they read your classmate\u2019s paper, and so\non.<\/p>\n\n\n\n
You and your instructors may also think\nabout some aspects of writing across<\/em> individual texts, such as genre or\npurpose. Your teachers might look across a stack of papers, for instance,\nand consider how well a class of students has used primary evidence in a\nresearch paper. In another example, you might look over a Twitter feed to see how\noften people retweet posts in a particular thread. In such instances, you and your\nteachers are paying attention to aspects of the rhetorical situation across\nmultiple texts. <\/p>\n\n\n\nBy contrast, you probably spend little time\nthinking about how language<\/em>\u2014in words, phrases, and sentences\u2014is used across<\/em>\nthe texts you read and write. That kind of focus, on language across texts, is\ncommon in linguistic approaches to writing, which are more popular outside of\nthe U.S. than inside the U.S. Accordingly, if your writing teachers have been\ntrained in U.S. rhetoric and composition rather than linguistics, they know a\nlot about students\u2019 writing generally but may not know a lot about the specific\nlanguage that students use across their papers and across courses. <\/p>\n\n\n\nWhat does all this mean? Most U.S. readers and writers, and most U.S.\nstudent writing research, tend to discuss written texts one text at a time. Understanding\nacross<\/em> texts tends to focus on contextual <\/em>patterns, such as\naudience or genre. Most U.S. readers and writers know less about textual<\/em>\npatterns, or patterns of language across texts and contexts. <\/p>\n\n\n\nOf course, on some level, you do think about language patterns, maybe without even\nrealizing it. It\u2019s part of why you can recognize a newspaper article and why\nyou know how to write a text message: you have paid attention to how people use\nlanguage in patterned ways. But this kind of knowledge\u2014the kind we pick\nup through casual observation\u2014is often subconscious and is rarely systematic. 
For\nexample, you can probably write a text message that is appropriate for a given rhetorical\nsituation without thinking much about it, because you have picked up on what\nkind of language is appropriate for the genre (text message) and audience (your\nrecipient, such as a family member or friend). But what do you do when you need\nto write something unfamiliar to you? If you are writing your first college composition\nessay, or your first psychology case study, how do you know what language patterns\nare preferred?<\/p>\n\n\n\n
<\/span>Corpus\nLinguistic Analysis <\/strong><\/span><\/h3>\n\n\n\nThis brings us to analysis that uses computer-aided tools to offer\nus a view of language patterns across texts\u2014a bird\u2019s eye view of written language\npatterns. This kind of analysis is called corpus linguistic analysis: the term corpus<\/strong>\nrefers to a body of texts, and linguistic analysis<\/strong>, as you saw before,\nrefers to the examination of patterns of language use. As a complement to understanding\none text at a time, corpus linguistic analysis can help us systematically\nanalyze and understand written language in terms of patterns across many texts\nand across time. <\/p>\n\n\n\nReading so far, you may already be picking up on three premises,\nor assumptions, related to corpus linguistics: <\/p>\n\n\n\n
\nTexts\nmake meaning in patterned ways across texts and contexts. <\/li>\n\n\n\n It\ncan be hard to comprehend language patterns if we are trained to read and\nanalyze only one text at a time. <\/li>\n\n\n\n Attention\nto language across texts and contexts can teach us additional information about\nwhat is expected in particular rhetorical situations. <\/li>\n<\/ul>\n\n\n\nYou are probably already picking up on a detailed definition of\ncorpus linguistic analysis, too. Corpus\nlinguistic analysis<\/strong> refers to the examination of textual patterns in a selected body of naturally produced texts, usually via computer-aided tools that\nfacilitate searching, sorting, and calculating large-scale textual patterns. <\/em><\/p>\n\n\n\nNotice two key terms inside this definition:<\/p>\n\n\n\n
\nTextual\npatterns<\/em><\/strong>: <\/em>lexical or grammatical patterns that persist across texts in a\ncorpus, in contrast to more varied choices or to patterns in other corpora<\/li>\n\n\n\nNaturally\nproduced texts<\/em><\/strong>: <\/em>a given corpus consists only of language produced for authentic,\nreal-world purposes<\/li>\n<\/ul>\n\n\n\nIn sum, corpus linguistic analysis is about identifying choices\npeople make (and don\u2019t make) across texts, and we can use the results of such\nanalysis to enhance our understanding of how language and texts work. Corpus\nlinguistic analysis has been widely used since the mid- to late-20th<\/sup>\ncentury, especially outside of the U.S., in places like England, Asia, and\nAustralia, to help teachers and students learn about expert and student writing\nchoices that come up again and again.<\/p>\n\n\n\n<\/span>The Bird\u2019s-Eye View of Language: Why Corpus Linguistic Analysis?<\/strong><\/span><\/h3>\n\n\n\nYou may not be convinced yet. If we are\nmost used to reading and writing one text at a time, why introduce something\ndifferent? Why get a bird\u2019s-eye view of language patterns across texts? <\/p>\n\n\n\n
Looking across texts lets us\nsee details that we can miss or\nmisperceive when we read one text at a time. Here are two key reasons why\ncorpus linguistic analysis can be useful, followed by examples from corpus\nlinguistic analysis of academic writing.<\/p>\n\n\n\n
\nOur perceptions of language use are\noften misleading<\/strong>. <\/li>\n<\/ul>\n\n\n\nIt\u2019s easy to come to inaccurate\nconclusions about language, because some things catch our attention more than\nothers. For instance, people tend to think that language is changing rapidly\nwhen they read slang words on the Internet. But actually, there are many more\nwords on the Internet that have been around a long time than there are new\nwords. Corpus linguistic analysis has shown that only around 3% of online\nlanguage use includes internet-specific slang such as abbreviations. It\u2019s just\nthat the newer words grab our attention more than the old ones. In this\nexample, corpus linguistic analysis helps us quantify what percentage of words\non the internet are actually new words, and what percentage are words we have\nbeen using for a while. Let\u2019s\nconsider one more example, this one from research on academic writing.<\/strong><\/p>\n\n\n\nHave you ever found it difficult\nto read college textbooks? <\/strong>Doug Biber and his research team used\ncorpus linguistic analysis to analyze different kinds of language use on\ncollege campuses, including research articles, textbooks, and office hours. One\nthing they wanted to investigate was how textbooks compared to these other\nkinds of language use, because instructors often think that textbooks provide\neasy-to-read narrative descriptions for students. <\/p>\n\n\n\nBased on corpus linguistic\nanalysis of all of these kinds of language, Biber et al. found that textbooks\nare not characterized by narrative, accessible language like spoken\nconversation. Instead, they tend to include dense, present-tense discussions of\nimplications, making textbooks challenging to read for students. In some ways,\ntextbooks are just as difficult to parse as research articles. <\/p>\n\n\n\n
\nMuch of our knowledge about\nwritten language is tacit, or unconscious<\/strong> (Odell et al.).<\/li>\n<\/ul>\n\n\n\nOnce we have learned to write in a\nparticular way, it is easy to forget the conscious steps we had to learn to do\nit in the first place. That is why it can be hard for your teachers to realize\nwhat might be challenging about an academic writing task they assign, and why\nit might be hard for you to explain to a grandparent how to write a tweet or\nhow to use hashtags. Let\u2019s again turn to a more specific example from research\non academic writing.<\/p>\n\n\n\n
Have you ever felt like you didn\u2019t know what a teacher wanted in your writing?<\/strong> What teachers want can be subtle, or even unstated. Brown and Aull did a corpus analysis of advanced placement English essays that showed two distinct patterns in successful and unsuccessful essays. The successful student writing included specific, detailed phrases, while unsuccessful student writing included generic, emphatic phrases. This means, for instance, that a successful student essay might include the following sentence:<\/p>\n\n\n\nA\ntwentieth-century understanding of grief<\/strong>\nsuggests that it takes time<\/strong>. <\/p>\n\n\n\nIn this sentence, a detailed\nphrase about an understanding of grief (underlined in the example) is the\nsubject of the sentence. <\/p>\n\n\n\n
By contrast, an unsuccessful\nstudent essay might instead say: <\/p>\n\n\n\n
Grief<\/strong>\nobviously takes time<\/strong>. <\/p>\n\n\n\nThis sentence includes a simple\nsubject (grief<\/em>) as well as the emphatic word obviously<\/em>. To\nacademic readers, the second sentence can seem too general and too strong.<\/p>\n\n\n\nThe bottom line is that our\nperceptions of language use can miss important patterns, because we tend to\nread one word, sentence, and text at a time. Getting a bird\u2019s-eye view allows\nus to understand more about the kinds of choices people tend to make with\nlanguage, including successful and unsuccessful choices in academic writing. As\nwe learn about such patterns and practice looking for them, we can become more\nadept at recognizing what characterizes different kinds of written texts. <\/p>\n\n\n\n
Example\nexercise: Words that hang out with one another <\/strong><\/p>\n\n\n\nLet\u2019s get some practice thinking about language patterns. We\u2019ll do\nthis by considering collocations<\/strong>, or\nthe words that most often hang out with other words. (The technical,\nfancy-sounding definition of collocations is \u201cthe habitual juxtaposition of a\nparticular word with another word or words with a frequency greater than\nchance.\u201d)<\/p>\n\n\n\nFirst, try to guess: What words collocate, or hang out, most often\nwith the word idea <\/em>in U.S. English? <\/p>\n\n\n\nSpecifically, what words do you think come just before idea<\/em>,\nin all sorts of U.S. English (spoken, fiction, academic, news, and magazine)? List\nyour top 5 guesses. <\/p>\n\n\n\n________________ idea<\/em><\/p>\n\n\n\n________________ idea<\/em><\/p>\n\n\n\n________________ idea<\/em><\/p>\n\n\n\n________________ idea<\/em><\/p>\n\n\n\n________________ idea<\/em><\/p>\n\n\n\nTo test your guesses, we can turn to corpus linguistic analysis,\nusing the Corpus of Contemporary American English (COCA). COCA is an online\ndatabase where you can search all kinds of patterns in American English, across\nspoken conversation, fiction, academic writing, news, and magazines. You\u2019ll see\nCOCA listed in the resources below with a URL so that you can check it out\nyourself.<\/p>\n\n\n\n
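If you are curious what a search like this does behind the scenes, the counting step can be sketched in a few lines of Python. This is only a rough illustration, not how COCA itself is built: the function name collocates, its offset parameter, and the four toy sentences are all made up for this example, and the sentences are not COCA data. The sketch splits each text into lowercase words, finds every occurrence of the target word, and tallies the word sitting a fixed number of positions away from it (one position before, by default).

```python
import re
from collections import Counter


def collocates(texts, target, offset=-1, top_n=10):
    """Tally the words found `offset` positions away from `target`.

    offset=-1 counts the word just before the target;
    offset=1 counts the word just after it.
    """
    counts = Counter()
    for text in texts:
        # Very simple tokenizer (an assumption): lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, token in enumerate(tokens):
            j = i + offset
            # Skip positions that fall off either end of the text.
            if token == target and 0 <= j < len(tokens):
                counts[tokens[j]] += 1
    return counts.most_common(top_n)


# Invented sentences for illustration only (not drawn from COCA).
toy_corpus = [
    "That sounds like a good idea to me.",
    "It was a good idea at the time.",
    "The whole idea is to save time.",
    "Nobody thought it was a bad idea.",
]

print(collocates(toy_corpus, "idea"))
```

With only four sentences, the counts mean very little. The same tally run over many millions of words is what lets a corpus tool rank the words that most often appear next to a search word. Changing offset to 1 would count the words that come just after the target instead.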
For this search, we\u2019ll look for all words immediately to the left of\nidea. These are called 1L<\/em> collocates, because they appear 1 space to the\nleft.<\/em><\/p>\n\n\n\nUse of the word IDEA<\/strong> in\nCOCA (all registers)<\/p>\n\n\n\n\n Top 10\n 1L Collocates <\/strong>\n <\/td>\n <\/strong>\n <\/td><\/tr>\n good<\/strong><\/strong>\n <\/td>\n idea<\/strong>\n <\/td><\/tr>\n bad<\/strong><\/strong>\n <\/td>\n idea<\/strong>\n <\/td><\/tr>\n whole<\/strong><\/strong>\n <\/td>\n idea<\/strong>\n <\/td><\/tr>\n great<\/strong><\/strong>\n <\/td>\n idea<\/strong>\n <\/td><\/tr>\n better<\/strong><\/strong>\n <\/td>\n idea<\/strong>\n <\/td><\/tr>\n new<\/strong><\/strong>\n <\/td>\n idea<\/strong>\n <\/td><\/tr>\n very <\/strong><\/strong>\n <\/td>\n idea<\/strong>\n <\/td><\/tr>\n basic<\/strong><\/strong>\n <\/td>\n idea<\/strong>\n <\/td><\/tr>\n clear<\/strong><\/strong>\n <\/td>\n idea<\/strong>\n <\/td><\/tr>\n general<\/strong><\/strong>\n <\/td>\n idea<\/strong>\n <\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\nHow many of your guesses were right? Did you guess that not only\nare good idea <\/em>and bad idea<\/em> popular, but so too are the\nexpressions (the) very idea, basic idea, <\/em>and general idea?<\/em> <\/p>\n\n\n\nLet\u2019s think about these patterns. Several collocations show evaluation\n<\/em>of an idea (good idea, bad idea, great idea<\/em>), including some\ncomparison (better idea, new idea<\/em>). Others show emphasis<\/em> on an\nidea ((the) very idea<\/em>). Finally, others convey a summary or gist of an\nidea (whole idea, basic idea, general idea<\/em>). (Clear idea <\/em>is used\nboth in evaluation and in summary statements.) <\/p>\n\n\n\nMany people guess that people describe ideas as good <\/em>and bad<\/em>,\nbut they don\u2019t realize how often speakers and writers use idea <\/em>to let\ntheir audience know that they are summarizing something. 
As you read before, this\nis the kind of thing that corpus linguistic analysis can uncover: common\npatterns of language use that we don\u2019t necessarily pay attention to but that\ncan tell us what matters to people in a given type of writing. Picking up on\nthese collocates might, for instance, help students begin to notice how often\npeople summarize, and when they tend to do so.<\/p>\n\n\n\nIf we use the above examples, for instance, you could consider the\nfollowing as you begin to read and write in a new course: How do writers\ndescribe ideas? Do they evaluate them (e.g., as good, bad, <\/em>or correct<\/em>)?\nDo they describe them (e.g., as theoretical, abstract,<\/em> or practical<\/em>)?\nDo they summarize them (e.g., general, overall<\/em>)?<\/p>\n\n\n\nLet\u2019s explore one more example, this one concerning something many\nstudents wonder about: the first person in academic writing.<\/p>\n\n\n\n
Here\u2019s our question for this one: How do writers draw attention to\nthemselves as writers by using the first person I<\/em> or we<\/em>?<\/p>\n\n\n\nLet\u2019s first make a guess about expert academic writing. In academic writing published in the U.S., what words do you think collocate, or hang out, with I<\/em>? Specifically, what words do you think most often appear right after I,<\/em> or immediately to the right<\/strong> of the word I<\/em>, in academic writing? Again, note your top 5 guesses.<\/p>\n\n\n\nI <\/em>________________\n<\/p>\n\n\n\nI <\/em>________________\n<\/p>\n\n\n\nI <\/em>________________\n<\/p>\n\n\n\nI <\/em>________________\n<\/p>\n\n\n\nI <\/em>________________\n<\/p>\n\n\n\nWe can again use corpus linguistic analysis to find out how\naccurate your guesses are. Specifically, we can use the Corpus of Contemporary\nAmerican English academic subcorpus (COCAA) and search for words 1 space to the right, or 1R, of I.<\/em><\/p>\n\n\n\nUse of the word I <\/em>in COCA, Academic writing<\/p>\n\n\n\n\n <\/strong>\n <\/td>\n Top 10 1R Collocates<\/strong>\n <\/td><\/tr>\n I<\/strong>\n <\/td>\n have <\/strong>\n <\/td><\/tr>\n I<\/strong>\n <\/td>\n was<\/strong>\n <\/td><\/tr>\n I<\/strong>\n <\/td>\n think<\/strong>\n <\/td><\/tr>