Designing a pedagogical interface for a repository of video interviews

One of the goals of the Corpus to Classroom project is to design a pedagogical interface for the repository of video clips being generated from the more than 100 interviews collected earlier as part of the Spanish in Texas project. Our interviews with teachers and materials developers confirmed that teachers are potentially interested in applying the following types of filtering criteria to their searches:

  • Grammar topics: e.g., search for those clips that contain a significant number of occurrences of por and para
  • Functional topics: e.g., search for those clips that contain exponents of the function apologizing
  • Vocabulary: e.g., clips that contain words (possibly from a pre-defined list) related to the topic la familia (papá, mamá, padre(s), madre, hermano/a, abuelo/a…)
  • Thematic: e.g., clips talking about food, traditions, reasons for moving to the US (in our case)…

This list is not complete, but it is a starting point that covers the most common types of criteria (emotion and phonetics were also mentioned).
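To make the grammar and vocabulary criteria more concrete, here is a minimal Python sketch of how a clip transcript might be checked against a word list. It is only an illustration, not the project's implementation: the word lists, the function names, the sample sentence, and the `min_hits` threshold are all assumptions made for this example.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical word lists, made up for this example.
GRAMMAR_TARGETS = {"por", "para"}
FAMILY_WORDS = {"papá", "mamá", "padre", "padres", "madre",
                "hermano", "hermana", "abuelo", "abuela"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    # Normalize to NFC so accented letters are single characters.
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", text)


def count_matches(transcript, targets):
    """Return how often each target word occurs in the transcript."""
    counts = Counter(tokenize(transcript))
    return {word: counts[word] for word in targets if counts[word] > 0}


def matches_criterion(transcript, targets, min_hits=2):
    """True if the transcript has at least min_hits target occurrences.

    min_hits is an arbitrary threshold standing in for "a significant
    number of occurrences".
    """
    return sum(count_matches(transcript, targets).values()) >= min_hits


# Invented sample sentence, not taken from the corpus.
sample = "Mi mamá trabaja para la escuela y mi abuelo vino por trabajo."
print(count_matches(sample, GRAMMAR_TARGETS))
print(matches_criterion(sample, FAMILY_WORDS))
```

Functional and thematic criteria would probably need more than word lists, for example manual tagging or pattern-based annotation, so they are left out of this sketch.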

With this in mind, we are considering a standard search engine (such as Apache Solr/Lucene) so that teachers can search for clips and use facets (filtering options) to drill down into the results or define finer-grained queries. We are also considering typical corpus query tools (such as CWB, SketchEngine, or NoSketchEngine). Together, the two kinds of tool would cover both parts of our task: the search engine handles Information Retrieval (document retrieval on the basis of word- or term-based queries), and the corpus query tools handle Information Extraction (queries driven by linguistic patterns).
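For readers unfamiliar with facets, the following Python sketch shows the idea in miniature. It does not use Solr or any other search engine API; it is a plain in-memory illustration, and the clip records, field names, and tag values are hypothetical.

```python
from collections import Counter

# Hypothetical clip records; field names and tags are assumptions.
CLIPS = [
    {"id": "clip-001", "theme": ["food", "traditions"], "grammar": ["por/para"]},
    {"id": "clip-002", "theme": ["moving to the US"], "grammar": ["por/para", "preterite"]},
    {"id": "clip-003", "theme": ["food"], "grammar": ["preterite"]},
]


def facet_counts(clips, field):
    """Count how many clips carry each value of a field.

    These counts are what a faceted interface shows next to each
    filtering option.
    """
    counts = Counter()
    for clip in clips:
        counts.update(clip.get(field, []))
    return counts


def filter_clips(clips, selections):
    """Keep only the clips that match every selected facet value."""
    return [
        clip for clip in clips
        if all(value in clip.get(field, [])
               for field, value in selections.items())
    ]


print(facet_counts(CLIPS, "theme"))
narrowed = filter_clips(CLIPS, {"theme": "food", "grammar": "preterite"})
print([clip["id"] for clip in narrowed])
```

Each selection narrows the result set further, which is the "dig down" behavior described above. A real search engine would compute the same kind of counts over an index rather than a Python list.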

We will further describe our advances in future posts.

LIFT off!

This blog will chronicle the development of the SPinTX Corpus and our work to bring a pedagogically useful corpus of authentic Spanish and bilingual Spanish-English speech samples into language classrooms across Texas. The Spanish in Texas (SPinTX) Project was selected to receive funding from the Longhorn Innovation Fund for Technology (LIFT) for the grant period September 1, 2012 – August 31, 2013. Development of the Corpus began in 2010 and is ongoing under the auspices of the Title VI Center for Open Educational Resources and Language Learning (COERLL).

The focus of the project over the next year will be to help educators exploit the SPinTX corpus to customize materials for the teaching of Spanish at all educational levels. The aims of the project are:

  • to develop a pedagogically friendly interface for the corpus;
  • to involve teachers and learners, via crowd-sourcing, social networking, and workshops, in the development of open educational resources (OER); and
  • to develop a model for using open source tools and a pedagogical interface that can be adapted for any language corpus.

In the spirit of openness, we will be sharing and discussing what we learn and create throughout the project. We invite you to join us as we explore new tools and methods for integrating authentic content and open data into the language classroom!