Radio Ga Ga 1 – What’s new and hip in open corpus-based resources and practices

Radio Ga Ga by Queen via YouTube

This is the first satellite post from the mothership post, Radio Ga Ga: corpus-based resources, you’ve yet to have your finest hour. I have also made the complete hyperlinked post (in five sections) available as a .pdf on Slideshare.

Radio 1

Original, in-house and live, this station brings us what’s new in the world of OER for corpus-based language resources.

Flipped conferencing

Kicking things off in late March, Clare Carr from Durham and I co-presented an OER for EAP corpus-based teacher and learner training cascade project at the EuroCALL CMC & Teacher Education Annual Workshop in Bologna, Italy. This was very much a flipped conference: draft presentation papers were sent to participants to read in advance, and the focus at the physical event was on discussion rather than presentation. Russell Stannard of Teacher Training Videos (TTV) was the keynote speaker, and I have been developing some training resources for the FLAX open-source corpus collections which will be ready to go live on TTV soon. New collections in FLAX have opened up the BAWE corpus and linked it to the BNC, a Google-derived n-gram corpus and Wikimedia resources, namely Wikipedia and Wiktionary. These collections show what’s cutting edge in the developer world of open corpus-based resources for language learning and teaching.

Focusing on linked resources: which academic vocabulary list?

In a later post, I will be looking at Mark Davies’ new work with Academic Vocabulary Lists based on a 110 million-word academic sub-corpus of the Corpus of Contemporary American English (COCA), and at his innovative web tools and collections based on the COCA. This work moves away from the Academic Word List (AWL) by Coxhead (2000), which was based on a 3.5 million-word corpus. Once again, Davies’ Word and Phrase project website at Brigham Young University contains a bundle of powerfully linked resources, including a collocational thesaurus which links to other leading research resources such as WordNet, the ongoing lexical database project at Princeton.

The open approach to developing non-commercial learning and teaching corpus-based resources in FLAX also shows the commitment to OER at OUCS (including the Oxford Text Archive), where the BAWE and the BNC research corpora are both managed. Click on the image below to visit the BAWE collections in FLAX.

BAWE case study text from the Life Sciences collection in FLAX with Wikipedia resources

Open eBooks for language learning and teaching

Learning Through Sharing: Open Resources, Open Practices, Open Communication was the theme of the EuroCALL conference, and to follow things up the organisers have released a call for OER in languages for the creation of an open eBook on the same theme. The book will be “a collection of case studies providing practical suggestions for the incorporation of Open Educational Resources (OER) and Practices (OEP), and Open Communication principles to the language classroom and to the initial and continuing development of language teachers.” This open-access eBook, aimed at practitioners in secondary and tertiary education, will be freely available for download. If you’re interested in contributing to this electronic volume, please send a case study proposal (maximum 500 words) by 15 October 2012 to the co-editors of the publication, Ana Beaven (University of Bologna, Italy), Anna Comas-Quinn (Open University, UK) and Barbara Sawhill (Oberlin College, USA).

MOOC on Open Translation tools and practices

Another learning event which I’ve just picked up from EuroCALL is a pilot Massive Open Online Course (MOOC) in open translation practices, being run by the Open University in the UK from 15 October to 7 December 2012 (8 weeks), with the accompanying course website opening on 10 October 2012. The course site (visit the “Get involved” tab) explains: “Open translation practices rely on crowd sourcing, and are used for translating open resources such as TED talks and Wikipedia articles, and also in global blogging and citizen media projects such as Global Voices. There are many tools to support Open Translation practices, from Google translation tools to online dictionaries like Wordreference, or translation workflow tools like Transifex.” Some of these tools and practices will be explored in the OT12 MOOC.

Bringing open corpus-based projects to the Open Education community

On the back of the Cambridge 2012 conference, Innovation and Impact – Openly Collaborating to Enhance Education, held in April, I’ve been working on another eBook chapter on open corpus-based resources which will be launched very soon at the Open Education conference in Vancouver. The Cambridge 2012 event was jointly hosted in Cambridge, England by the OpenCourseWare Consortium (OCWC) and SCORE. Terri Edwards from Durham and I presented on EAP student and teacher perceptions of training with open corpus-based resources from three projects: FLAX, the Lextutor and AntConc. These three projects vary in their openness and in the type of resources they offer. In future posts I will be looking in more depth at their work and the communities that form around their resources. The following video from the conference captures our presentation and the ensuing discussion with a non-specialist audience curious to know how open corpus-based resources can help with the open education vision. A growing number of learners worldwide wish to access higher education, where the OER and most published research are in English. Embedding these tools and resources into online and distance education to support those learners opens a whole new world of possibilities for open corpus-based resources and for EAP practitioners working in this area.

A further video, from a panel discussion which I contributed to (an OER kaleidoscope for languages), looks at three further open language resources projects that are currently underway and building momentum here in the UK: OpenLives, LORO and the CommunityCafe. The talk also refers to other established OER projects for languages and the humanities, including LanguageBox and the HumBox.

A world declaration for OER

The World OER Congress in June at the UNESCO headquarters in Paris marked ten years since the coining of the term OER in 2002, and saw the formal adoption of an OER declaration (click on the image to see the declaration). I’ve included the following quotation from the declaration to provide a backdrop to this growing open education movement as it applies to language teaching and learning, and to highlight that attribution for original work is commonplace with Creative Commons licensing.

Emphasizing that the term Open Educational Resources (OER) was coined at UNESCO’s 2002 Forum on OpenCourseWare and designates “teaching, learning and research materials in any medium, digital or otherwise, that reside in the public domain or have been released under an open license that permits no-cost access, use, adaptation and redistribution by others with no or limited restrictions. Open licensing is built within the existing framework of intellectual property rights as defined by relevant international conventions and respects the authorship of the work”.

Wikimedia – why not?

Wikimedia Foundation

Earlier in September, I volunteered to present at the EduWiki conference in Leicester, which was hosted by the Wikimedia UK chapter. Most people are familiar with Wikipedia, which is the sixth most visited website in the world. It is, however, but one of many sister projects managed by the Wikimedia Foundation, along with others such as Wikiversity and Wiktionary.

I will also be blogging soon about widely held misconceptions about the uses of Wikipedia in EAP and EFL / ESL, while exploring its potential in writing instruction with reference to some very exciting education projects using Wikipedia around the world. The types of texts that make up Wikipedia, and many academics’ realisation that they need to reach wider audiences with their work through more accessible modes of writing, are issues I will be commenting on in this blog in the very near future.

I presented the work the FLAX team have done with text mining, which incorporates David Milne’s Wikipedia mining tool and makes evident the potential of Wikipedia as an open corpus resource in language learning and teaching. I demonstrated how this Wikipedia corpus has been linked to other research corpora in FLAX, namely the BNC and the BAWE, for the development of corpus-based OER for EFL / ESL and EAP. And let’s not forget that it’s all free!

The open approach to corpus resources development

There is no reason why the open approach taken by FLAX cannot be extended to build open corpus-based collections for learning and teaching other modern languages, linking different language versions of Wikipedia to relevant research corpora and resources in the target language. In particular, the functionality in the FLAX collections that enables you to compare how language is used differently across a range of corpora, further supported by additional resources such as Wiktionary and Roget’s Thesaurus, makes for a very powerful language resource. Crowd-sourcing corpus resources through open research and education practices, and through the development of open infrastructure for managing and making these resources available, is not as far off in the future as we might think. The Common Language Resources and Technology Infrastructure (CLARIN) mission in Europe is a leading success story in the direction currently being taken with corpus-based resources (read more about the recent workshop for CLARIN-D held in Leipzig, Germany).
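For readers curious about what comparing usage across corpora can look like in practice, here is a minimal sketch in Python. It is not how FLAX does it: the two tiny sample texts, the corpus names and the helper functions (`tokenize`, `per_million`, `compare`) are all made up for illustration. The idea is simply to normalise a word's count to a rate per million words so that corpora of different sizes can be compared fairly.

```python
import re
from collections import Counter

# Hypothetical miniature "corpora"; real ones such as the BAWE or BNC
# would be loaded from files and contain millions of words.
corpora = {
    "academic_sample": "The data suggest that the analysis is robust. "
                       "Further analysis is required.",
    "general_sample": "We had a chat about the weather and then did "
                      "a quick analysis of the match.",
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def per_million(word, tokens):
    """Return the frequency of `word` per million tokens (0.0 if empty)."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] / len(tokens) * 1_000_000

def compare(word, corpora):
    """Map each corpus name to the normalised frequency of `word`."""
    return {
        name: per_million(word.lower(), tokenize(text))
        for name, text in corpora.items()
    }

if __name__ == "__main__":
    for name, rate in compare("analysis", corpora).items():
        print(f"{name}: {rate:.0f} per million words")
```

With samples this small the numbers mean very little, but the same normalisation is what lets a learner see that a word is relatively more common in academic writing than in general English.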


Coxhead, A. (2000). The Academic Word List.

