Radio Ga Ga: corpus-based resources, you’ve yet to have your finest hour

Radio Ga Ga album cover by Queen via Wikipedia

These past few months I’ve been tuning in to a lot of different practitioner events and discussions across a range of educational communities which I feel are relevant to English language education where uses for corpus-based resources are concerned. There’s something very distinct about the way these different communities are coming together and the way they are sharing their ideas and outputs. In this post, I will liken their behaviour to different types of radio station broadcast, highlighting differences in communication style and the types of audience (and audience participation) they tend to attract.

I’ve also been re-setting my residential as well as my work stations. No longer at Durham University’s English Language Centre, I’m now London-based and have just set off on a whirlwind adventure for further open educational resources (OER) development and dissemination work with collaborators and stakeholders in a variety of locations around the world. TOETOE is going international and is now being hosted by Oxford University Computing Services (OUCS) in conjunction with the Higher Education Academy (HEA) and the Joint Information Systems Committee (JISC) as part of the UK government-funded OER International programme.

I will also be spreading the word about the newly formed Open Education Special Interest Group (OESIG), the Flexible Language Acquisition (FLAX) open corpus-based language resources project at the University of Waikato, and select research corpora, including the British National Corpus (BNC) and the British Academic Written English (BAWE) corpus, both managed by OUCS, which have been prised open by FLAX and TOETOE for uses in English as a Foreign Language (EFL) – also referred to as English as a Second Language (ESL) in North America – and English for Academic Purposes (EAP). Stay tuned to this blog in the coming months for more insights into open corpus-based English language resources and their uses in different teaching and learning contexts.

This post is what those in the blogging business refer to as a ‘cornerstone’ post, as it includes many insights into the past few months of my teaching fellowship in OER with the Support Centre in Open Educational Resources (SCORE) at the Open University in the UK: many posts within one, as it were. It also provides a road map for taking my project work forward while identifying shorter blogging themes for the posts that will follow. It will act as the mother-ship TOETOE post from which subsequent satellite posts will be linked. Please use the red menu hyperlinks in the section below to dip in and out of the four main sections of this blog post series. I have chosen this more reflective style of writing through blogging so that my growing understanding in this area is more accessible to unanticipated readers who may stumble upon this blog and, hopefully, comment to help me refine my work. Two more formal case studies on my TOETOE project to date will be coming out soon via the HEA and the JISC.

I have also made this hyperlinked post (in five sections) available as a .pdf on Slideshare.

Which station(s) are you listening to?

BBC Radio has been going since 1922. With audiences in the UK, four stations in particular are firm favourites: youth-oriented BBC Radio 1, featuring new and contemporary music; BBC Radio 2, with middle-of-the-road music for the more mature audience; high culture and arts-oriented BBC Radio 3; and news and current affairs-oriented BBC Radio 4. Of course there are many more stations, but these four are very typical of those found around the world. I’ve selected these four very distinct stations as the basis for a metaphor about the way four very distinct educational practitioner communities are intersecting with corpus-based language teaching resources. This metaphor will draw on thought waves from the following:

Radio 1 – what’s new and hip in open corpus-based resources and practices

Radio 2 – the greatest hits in ELT materials development and publishing

Radio 3 – research from teaching and language corpora

Radio 4 – the current talk in EAP: open platforms for defining practice


Leave a Comment

  1. Some good stuff Alannah, thanks.
    Software: WST is aimed more at researchers, but freeware such as AntConc, various resources at Lextutor, or online interfaces to corpora such as at BYU, are more amenable; many (eg AntConc) have video tutorials, either on their own site or via YouTube.
    DDL bibliographies: there are various links from CorpusCALL, as well as our own list of papers relating to corpus consultation for language learning / teaching. Erin Shaw also has a nice introduction to using BYU corpora DDL-wise at


    • Thanks, Alex, for your comments. I would say AntConc is also used a lot by researchers, judging by the AntConc Google Groups discussions, which I follow. The training resources Laurence Anthony provides for AntConc, in terms of YouTube videos and worksheets on his website, do bridge the gap into language learning and teaching uses for AntConc, especially in ESP/ESAP. I did ask Laurence Anthony about putting his training videos directly onto his university website again, as certain parts of the world can’t access YouTube, and he said he had also thought about that issue. I find him incredibly responsive to questions, like the rest of the DDL community whom I’ve been emailing for my research and, of course, the TaLCers whom I met in person in Warsaw. The DDL bibliographies are great for researchers and developers who have access to the publications, many of which are subscription-only. Perhaps it would be good to include information on whether open access and pre-pub versions of the literature are available as part of the annotating/tagging activity that goes into making these bibliographies. Yes, Erin Shaw’s work with BYU corpora is great; creating awareness in the mainstream language teaching community of these freely available resources is one of my key areas of interest. As with a lot of elearning resources that are scattered across the web, it is really those teachers engaging in DDL, along with teacher training and publishing/marketing bodies, who can help direct teachers and learners to these resources for DDL, but I find these promotional-educational pathways to be scattered too.


    • Hi Laura, and thanks for your positive feedback on my blog post! Great to see you’re working with French academic corpora released under Creative Commons. Sounds interesting – how are you thinking of improving it? I’ve also been thinking about bringing open access English journal articles published under Creative Commons into the FLAX collections to link to their English collocations and Wikipedia databases. There’s no reason why similar collections couldn’t be built with French content. You just need a small team of interested practitioners so you can crowdsource the building of collections. The FLAX team would be interested if you wanted to start doing this. Also, the FLAX software is open source and multilingual, so you could download it and try building open French collections with their built-in interactivity tools for making e.g. language learning resources. They have an active user support group for their Greenstone open source digital library software, and there are many different communities building all sorts of digital collections, so they have a lot of expertise to share.

      Thanks again!



  2. Great post, Alannah – a mine of information and resources which I shall keep coming back to. As a footnote, have you seen an article in the latest Applied Linguistics (September 2012: 33/4) by Kwanghyun Park, titled ‘Learner-Corpus Interaction: A locus of microgenesis in corpus-assisted L2 writing’? It claims to be the first attempt to monitor learners’ use of corpora in real time in order to self-edit and improve their academic writing, and it shows that ‘favorable learning outcomes accrue when learners evaluate search results based on careful analysis’. What’s interesting (to me, anyway) is how relatively small the (customised) corpus was: 50 or so academic texts freely available on-line (approx 350,000 words). Even so, it seemed sufficient to address many of the queries that the three (Chinese-speaking) subjects put to it. Anyway, I pass that on for what it’s worth!


  3. Great to have you here, Scott.
    Your comments about the corpus-assisted L2 writing study are indeed worth sharing…I expect we’ll be seeing a lot more in the way of studies presenting real-time data from learner interactions with intelligent systems for text analysis that can easily record e.g. searches and time spent on activities to trace learner pathways with a variety of data outputs (e.g. log files and screen shots)…simply because we have the technology to do so now.
    We now also have the technology to build DIY customised corpora that are relatively small in size which can be linked to larger reference corpora for comparative analytical purposes. Making this more intuitive is something the FLAX team have been working on and we’ll be putting out some training videos soon to show teachers and learners how to build their own collections online and to build interactivity into their collections with games for e.g. collocations to improve their understanding of target L2 reading collections and fluency in L2 writing tasks.
    The technology is getting easier to manipulate and those who are building freeware for these purposes – and experimenting with more user-friendly interfaces that are not the usual KeyWordInContext (KWIC) search interfaces – are more likely to succeed with scaling these approaches and tools in mainstream language learning and teaching. What I’m particularly interested in at the moment is exploring how these corpus-based approaches can be scaled in the online world of open and distance education where currently we have a lot of OER in English and arguably a greater willingness and need to employ open technologies and practices for language education as with, for example, the rising MOOC trend for global higher education. Not surprisingly, we are seeing a lot of those learners who are interested in pursuing more Arts and Humanities or Social Sciences courses online with English-medium MOOCs from the likes of Coursera dropping out because of English written requirements for assessment combined with a lack of support resources for English for Specific Academic Purposes as is discussed here:
    Needless to say, there are so many exciting opportunities for Data Driven Learning for the masses and perhaps the research into DDL needs to be looking beyond the classroom to amass sizable learner interaction data and successful approaches that can be replicated at minimal cost in both open and traditional education contexts. I’ll be blogging about this as my work comes together more in this area….


  4. Really useful! Nice to have so many links to resources in one place, thanks 🙂

    I have been trying to promote my EFL students’ use of BAWE and Lextutor to test their usage of newly acquired vocabulary, but I think the latter could do with a more user-friendly interface. At present the user interfaces aren’t sufficiently intuitive to allow students to just get on with such activity, which means tutorials are needed, and providing them requires internet access in the classroom, which is not always possible. To be honest, I think the user interfaces and the name ‘corpus’ scare practitioners too.

    I have also introduced a couple of colleagues to AntConc with a view to compiling our own local corpora to help with curriculum development and classroom teaching practice. I think the use of this kind of software is the future of language learning curriculum and teaching materials design.


  5. Thanks for writing in!
    I’ll be releasing some training videos on new features with the BAWE collections in FLAX later this week and I’ll ping the links to you. It’s great to see someone else trying to introduce this same range of tools to their students – I only work with the freely available ones as these are more likely to be followed up by students away from classrooms and labs.
    You’re right about interface issues and that’s why FLAX are trying to keep things really simple. Quite often concordancing software programs have lots of features for corpus linguists, e.g. for enabling complex queries that return a lot of descriptive statistical information, and these simply baffle language teachers and learners. I believe that the design of any technology user interface for uses in education has a better chance of success if it follows the design principles of simplicity, accessibility and functionality. Downes (2004) defines simplicity in educational technology design in terms of tools which are not only easy to use but which have also been designed to perform necessary functions only. FLAX can also be downloaded and installed on mainframes or run from CD-ROMs to get around the connectivity issue, and there is also a FLAX Moodle plug-in.

    I see that you’re based in China. I’ve just put out a new post related to my recent visit to Beijing and Dalian – it would be great to hear your views on my experiences based there and on my plans to keep working with ELT practitioners there.

