Oh, what a BAWE! The British Academic Written English corpus

BAWE case study from the Life Sciences collection in FLAX showing links to Wikipedia resources

This is the sixth post in a blog series based on the TOETOE International project with the University of Oxford, the UK Higher Education Academy (HEA) and the Joint Information Systems Committee (JISC). I have also made this post in the OEP series available as a .pdf on Slideshare.

FLAX British Academic Written English (BAWE) collections

The BAWE collections in FLAX, as demonstrated in the training video below, enable you to interact with the BAWE corpus of university student writing from across the disciplines and to learn about the thirteen different genres assigned by the makers of the corpus (Nesi, Gardner, Thompson & Wickens, 2007). The complete manual on the making of the BAWE (Heuboeck, Holmes & Nesi, 2010) is freely available from the following link (The BAWE Corpus Manual, An Investigation of Genres of Assessed Writing in British Higher Education). Features from the FLAX open source software (OSS) project for understanding the BAWE include word lists and keyness indicators; collocations; lexical bundles; a glossary function with Wikipedia; and a variety of automated functions for searching, saving and linking within the BAWE corpus.
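To make the idea of a lexical bundle concrete, here is a minimal Python sketch that counts recurrent word sequences across a set of texts. It is only an illustration: the function name `lexical_bundles`, the thresholds and the sample sentences are my own assumptions, and this post does not describe how FLAX actually extracts bundles from the BAWE.

```python
import re
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_freq=2, min_texts=2):
    """Return recurrent n-word sequences with their frequencies.

    A sequence is kept only if it occurs at least `min_freq` times overall
    and appears in at least `min_texts` different texts (hypothetical
    thresholds chosen for illustration).
    """
    freq = Counter()
    spread = defaultdict(set)  # which texts each sequence appears in
    for idx, text in enumerate(texts):
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            spread[gram].add(idx)
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and len(spread[gram]) >= min_texts
    }


# Invented sample sentences, not taken from the BAWE.
texts = [
    "On the other hand the results differ.",
    "The data, on the other hand, agree.",
]
print(lexical_bundles(texts))
```

Requiring a bundle to appear in more than one text is a common way to avoid picking up phrases that a single writer happens to repeat.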

From its earliest inception the FLAX project has been envisioned and advanced with the language teacher and learner in mind. Since 2008, I have been engaged with the FLAX project to provide user feedback on the development of the language reference collections and to devise ways to promote the project resources within mainstream English language teaching and learning communities. A simplified and intuitive interface has been developed for presenting language collections and interactive learning activities based on the powerful and complex handling of search queries from a range of linked corpora and open linguistic content.

Another open web-based interface for accessing the BAWE is located within the commercial Sketch Engine project. This project provides the more traditional KWIC (KeyWord In Context) concordancer interface for linguistic data presentation with strings of search terms embedded in truncated language context snippets. The Using Sketch Engine with BAWE manual (Nesi & Thompson, 2011) provides an in-depth user guide for the more expert corpus user.

Sketch Engine open concordancer interface for the BAWE showing results for a KWIC query for the item ‘research’.
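For readers new to concordancing, the KWIC layout can be sketched in a few lines of Python. This is a toy illustration under my own assumptions (the function name `kwic`, the `width` parameter and the sample sentence are invented), and it is not how Sketch Engine itself is implemented.

```python
import re


def kwic(text, keyword, width=30):
    """Return one aligned line per match of `keyword`.

    Each line shows up to `width` characters of left context, the matched
    word, and up to `width` characters of right context, so that the
    search term lines up in a single column.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} {match.group(0)} {right:<{width}}")
    return lines


# Invented sample text, not taken from the BAWE.
sample = ("This research draws on earlier research methods. "
          "Further research is needed.")
for line in kwic(sample, "research", width=20):
    print(line)
```

Truncating the context to a fixed number of characters is what produces the clipped snippets on either side of the search term in a concordance display.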

The Word Tree corpus interface, a JISC Rapid Innovation project based at Coventry University, provides yet another open web-based alternative to KWIC searches for analysing the BAWE. One of the project’s goals is for the open source code developed for this project, which is available from GitHub, to be re-used in further open corpus-based projects for analysing additional corpora. The project can be followed via the Word Tree project blog and the JISC final report, which outline issues encountered with managing and processing the presentation of large amounts of linguistic data through a word tree interface that provides click-through pathways and the ability to prune and graft word tree searches.

The Word Tree corpus interface for the BAWE showing a search query word tree for the items ‘research’ and ‘research methods’
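The data structure behind a word tree can be sketched as a nested dictionary of the words that follow a search term, with a count on every branch. The Python below is my own simplified illustration (the names `build_word_tree` and `prune`, the `depth` limit and the sample sentences are all assumptions) and is not the code released by the Word Tree project.

```python
import re


def _new_node():
    return {"count": 0, "children": {}}


def build_word_tree(sentences, root, depth=3):
    """Build a tree of the words that follow `root`, up to `depth` words."""
    tree = _new_node()
    root = root.lower()
    for sentence in sentences:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        for i, token in enumerate(tokens):
            if token != root:
                continue
            tree["count"] += 1
            node = tree
            # Walk down the tree, one following word at a time.
            for following in tokens[i + 1:i + 1 + depth]:
                node = node["children"].setdefault(following, _new_node())
                node["count"] += 1
    return tree


def prune(tree, min_count):
    """Return a copy of the tree without branches rarer than `min_count`."""
    return {
        "count": tree["count"],
        "children": {
            word: prune(child, min_count)
            for word, child in tree["children"].items()
            if child["count"] >= min_count
        },
    }


# Invented sample sentences, not taken from the BAWE.
sentences = [
    "Research methods are varied.",
    "Research methods were used.",
    "Research shows promise.",
]
tree = build_word_tree(sentences, "research")
print(prune(tree, 2))
```

Pruning by a minimum count is one simple way to keep the display readable when a frequent search term has thousands of different continuations.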

Reference corpora versus specialist corpora

Comparisons made between language as it is used in reference corpora, such as the British National Corpus (BNC) which provides a snapshot of how English occurs across a variety of contexts, and how it is used in specialist academic sub-corpora, or in actual student-generated academic text corpora as in the case of the BAWE, help us to identify which words and phrases occur more commonly in specific as well as in general academic contexts of use. Not confined by the boundaries of a printed volume, the openly available web-based BAWE collections in FLAX (demonstrated in the video above) are arguably more powerful than the average dictionary or coursebook for practice with academic English.
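One common way to make this kind of comparison is a keyness score such as log-likelihood, which measures how unexpectedly frequent a word is in a specialist corpus relative to a reference corpus. The Python sketch below is an illustration under my own assumptions: the function names and the tiny token lists are invented, and this post does not say which keyness statistic FLAX uses.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for one word.

    a, b: frequency of the word in the study and reference corpus.
    c, d: total number of tokens in the study and reference corpus.
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a:
        score += a * math.log(a / expected_study)
    if b:
        score += b * math.log(b / expected_ref)
    return 2 * score


def keywords(study_tokens, reference_tokens, top=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        if a / c > b / d:  # keep only words overused in the study corpus
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


# Invented toy corpora standing in for a specialist and a reference corpus.
study = "the enzyme assay the enzyme buffer".split()
reference = "the cat sat on the mat the dog".split()
print(keywords(study, reference))
```

A word like "the" scores low because it is frequent everywhere, while a subject-specific word that is rare in the reference corpus rises to the top of the list.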

Before commencing my journeys with the TOETOE International project, I wrote an extensive project blog post on open trends within corpora and ELT materials development, Radio Ga Ga: corpus-based resources, you’ve yet to have your finest hour. At the Open Education conference in Vancouver in October 2012, in my presentation on the Great Beyond with Open ELT Resources (see below), I outlined the development work that TOETOE and the FLAX team were going to embark on with respect to the BAWE corpus, and the evaluations of the earlier BAWE collections in FLAX that we would be seeking from international participants collaborating with the project. Feedback from international stakeholders in China (Confucian dynamism in Chinese ELT context) and Korea (the English language skyline in South Korea) on the BAWE collections in FLAX led to further design and development iterations back in New Zealand with the FLAX team (Love is a stranger in an open car to tempt you in and drive you far away…toward open educational practice). These stages are captured in the project blog posts named here in brackets.

Earlier in 2012 FLAX had developed the wikify function for matching key words and phrases in the BAWE collections to Wikipedia entries as a glossary support feature. This helps with subject-specific language in the BAWE, which may be daunting to learners and teachers who are not yet familiar with a given topic area, even though learners are expected to develop proficiency in specific academic English if they are to engage in English-medium higher education programmes. For example, the technical language from a biology methodology recount text in the BAWE can be glossed for enhanced understanding in FLAX with links to Wikipedia definitions and related topics.
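The general idea of glossing phrases against Wikipedia can be sketched as a longest-match lookup against a list of article titles. The Python below is a toy illustration only: the function names, the small hard-coded title list and the sample sentence are my own assumptions, and this post does not describe how the FLAX wikify function actually matches phrases.

```python
import re


def wikify(text, titles, max_len=3):
    """Return (phrase, title) pairs, preferring the longest match.

    `titles` is a hypothetical local list of Wikipedia article titles;
    phrases of up to `max_len` words are checked at each position.
    """
    lookup = {title.lower(): title for title in titles}
    tokens = re.findall(r"[A-Za-z][A-Za-z\-]*", text)
    matches, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            title = lookup.get(phrase.lower())
            if title:
                matches.append((phrase, title))
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no title starts at this word
    return matches


def wikipedia_url(title):
    """Build an English Wikipedia article URL from a title."""
    return "https://en.wikipedia.org/wiki/" + title.replace(" ", "_")


# Invented title list and sentence, not taken from the BAWE.
titles = ["Polymerase chain reaction", "Gel electrophoresis", "Polymerase"]
text = ("Samples were amplified by polymerase chain reaction "
        "and checked by gel electrophoresis.")
for phrase, title in wikify(text, titles):
    print(phrase, "->", wikipedia_url(title))
```

Preferring the longest match means "polymerase chain reaction" is linked as a single term rather than as the shorter entry "Polymerase".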

Corpus-based approaches for understanding genre in EAP

“Unsurprisingly, the utility of the corpus is increased when it has been annotated, making it no longer a body of text where linguistic information is implicitly present, but one which may be considered a repository of linguistic information.” (McEnery & Wilson, 2012, ICT4LT)

Corpus studies help with investigations into more than just discrete language items. The study of genres, as different communities of practice develop them, is also central to corpus work for better understanding the written assessment types that students will actually encounter across the academy. Generic EAP writing assessments, especially those found in College Composition and Writing Across the Curriculum programmes (Freedman; Petraglia, 1995; Russell, 2002), have been criticized for becoming genres unto themselves, with serious doubts cast on their ability to resemble, or assist with transfer to, the multitude of specific genres that students will be expected to engage with in their different academic programmes. Generic EAP teaching resources and writing assignments that teach general things about academic language and writing have resulted in EAP writing that Wardle (2009) describes as conforming to ‘mutt genres’.

In response to the issue of genre in university writing, the BAWE corpus collections in FLAX provide EAP teachers and students with a first-hand look into this student-generated corpus of assessed undergraduate and taught postgraduate writing collected at three UK universities: Warwick, Oxford Brookes and Reading. Thirteen different genres were assigned by the developers of the BAWE (Nesi et al., 2004-2007), as can be seen below (hyperlinks to the Life Sciences sub-corpus of the BAWE collections in FLAX):

The Oxford Text Archive, where the BAWE is managed by the University of Oxford IT Services, granted access to the FLAX project to develop OSS for language learning and teaching on top of this valuable research corpus, in the same way that FLAX have developed OSS to enable access to the BNC, which is also managed and distributed by the University of Oxford IT Services. Four sub-corpora have been developed in FLAX, corresponding to written academic assessments across the major academic disciplines identified by the makers of the BAWE: the Physical Sciences, the Life Sciences, the Social Sciences and the Arts and Humanities BAWE collections in FLAX. It was determined that student texts from the BAWE would serve as an achievable model of academic writing for EAP students, and that this corpus of student texts would serve as a starting point if linked to wider resources, namely the BNC, Wikipedia, the Learning Collocations collection in FLAX and the live Web, thereby providing a ‘bridge’ to more expert writing.

The developers of the BAWE corpus have a follow-on ESRC-funded project, Writing for a Purpose, which offers learning resources based on the BAWE for enhancing understanding of genre in writing across the disciplines. These resources are going to be promoted at the upcoming 2013 IATEFL and BALEAP conferences and will definitely be something to look out for.

 

References

Freedman, A. (1995). The What, Where, When, Why, and How of Classroom Genres. In J. Petraglia (Ed.), Reconceiving Writing, Rethinking Writing Instruction (pp. 121–44). Mahwah, NJ: Lawrence Erlbaum.

Heuboeck, A., Holmes, J. & Nesi, H. (2010). The BAWE corpus manual for the project entitled, ‘An Investigation of Genres of Assessed Writing in British Higher Education’, version 3. Retrieved from http://www.coventry.ac.uk/Global/05%20Research%20section%20assets/Research/British%20Academic%20Written%20English%20Corpus%20%28BAWE%29/Microsoft%20Word%20-%20BAWEmanual%20v3%20-%20BAWEmanual%20v3.pdf

McEnery, T. & Wilson, A. (2012). Corpus linguistics. Module 3.4 in Davies, G. (Ed.), Information and Communications Technology for Language Teachers (ICT4LT), Slough, Thames Valley University [Online]. Retrieved from http://www.ict4lt.org/en/en_mod3-4.htm

Nesi, H., Gardner, S., Thompson, P. & Wickens, P. (2007). The British Academic Written English (BAWE) corpus, developed at the Universities of Warwick, Reading and Oxford Brookes under the directorship of Hilary Nesi and Sheena Gardner (formerly of the Centre for Applied Linguistics [previously called CELTE], Warwick), Paul Thompson (Department of Applied Linguistics, Reading) and Paul Wickens (Westminster Institute of Education, Oxford Brookes), with funding from the ESRC (RES-000-23-0800).

Nesi, H. & Thompson, P. (2011). Using Sketch Engine with BAWE. Retrieved from http://wwwm.coventry.ac.uk/researchnet/BAWE/Documents/Using%20Sketch%20Engine%20with%20BAWE%202011.pdf

Nesi, H. & Gardner, S. (2012). Genres across the disciplines: student writing in Higher Education. Cambridge: Cambridge University Press.

Petraglia, J. (Ed.). (1995). Reconceiving Writing, Rethinking Writing Instruction. Mahwah, NJ: Lawrence Erlbaum.

Russell, D. (2002). Writing in the Academic Disciplines: A Curricular History. 2nd ed. Carbondale: Southern Illinois UP.

Wardle, E. (2009). “‘Mutt Genres’ and the Goal of FYC: Can We Help Students Write the Genres of the University?” College Composition and Communication 60: 765-789.

2 Comments

    • Hi, thanks for your interest.

      There are a few different options for your request, depending on what you are planning to do with the corpus:

      The home of the BAWE corpus project is at Coventry University, where you can access further information and resources about the BAWE http://www.coventry.ac.uk/research/research-directory/art-design/british-academic-written-english-corpus-bawe/

      If you want to request access to the corpus for non-commercial educational research purposes (as the FLAX project has done) you can either contact Coventry (details above) or the Oxford Text Archive where it is housed http://ota.ahds.ac.uk/about/contact.xml

      However, if you just want to access the BAWE for search and language learning/teaching purposes you can either use the FLAX interface or the Sketch Engine interface (the latter being the more traditional concordance interface) – these are both open and free to access via the web.

      There is also the new Writing for a Purpose project coming out soon via the British Council website with additional derivative resources for using the BAWE in EAP.

      All the best,

      Alannah
