Brainstorming on the search & browse interface

We are thinking of offering teachers a practical and user-friendly way to access the video clips in the SPinTX corpus. We assume that teachers might sometimes be unsure what they can ask of a corpus query interface: they did not design the compilation process, and the corpus may be quite small (compare this with Google, which queries the entire web).

Thus we want to offer teachers two clip retrieval modes: the search mode and the browsing mode. The search mode is the usual Google-like search based on key terms. I would type “banco Medellín” to retrieve documents related to banks (financial institutions) in Medellín (Colombia). However, I would type “banco madera Medellín” if I were looking for documents about carpenters or stores selling wooden benches in Medellín, since “banco” also means a bench to sit on.
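To make the search mode concrete, here is a minimal sketch of key-term retrieval in Python. It assumes that each clip has a plain-text transcript and that a clip matches only when it contains every query term; the clip identifiers, the transcripts, and the function names are invented for illustration and are not taken from the SPinTX corpus.

```python
import re

# Hypothetical mini-corpus: clip id -> transcript (invented for illustration).
CLIPS = {
    "clip_01": "Fui al banco en Medellín para abrir una cuenta.",
    "clip_02": "Mi abuelo hizo un banco de madera en su taller de Medellín.",
    "clip_03": "En Bogotá hay muchos parques.",
}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def search(query, clips):
    """Return the ids of the clips whose transcript contains every query term."""
    terms = tokenize(query)
    if not terms:
        return []  # an empty query matches nothing
    return sorted(cid for cid, text in clips.items() if terms <= tokenize(text))


print(search("banco Medellín", CLIPS))         # ['clip_01', 'clip_02']
print(search("banco madera Medellín", CLIPS))  # ['clip_02']
```

Adding “madera” narrows the results to the bench sense of “banco”, which is the behaviour described above. A real interface would probably also need lemmatization and ranking, which this sketch leaves out.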

The browsing functionality is intended to facilitate the visual exploration of pedagogically relevant information extracted from the corpus. One initial thought is to use information clouds, as reflected in the figure below. Imagine a blank square with two drop-down menus. In one of them you could select a topic, which determines the lexical goal (the vocabulary). In the other you could select the linguistic topic, which could range from grammatical categories to functional ones, along with other classification criteria that could be relevant for language instruction and learning.

Figure 1 shows what this strategy would look like if we select “Todos” (all topics) in the thematic drop-down list and “Gram: Prep. régimen” (a grammar topic: verb and preposition combinations) in the linguistic one. The size of each verb+preposition combination currently reflects its number of occurrences in the corpus, though it could instead reflect the number of documents in which it appears.

Wireframe of the user interface for exploring the corpus via the browsing mode.

Figure 1. Wireframe of a user interface for browsing the corpus information on the basis of thematic criteria and linguistic criteria.
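The two sizing options mentioned above (occurrences in the corpus versus number of documents) can be sketched as follows. This is only an illustration under stated assumptions: the clips, their topics, and their verb+preposition combinations are invented, the function names are hypothetical, and the linear mapping from counts to font sizes is one possible choice among several (a logarithmic scale is another common option for clouds).

```python
from collections import Counter

# Hypothetical annotated clips (topics and combinations invented for illustration).
CLIPS = [
    {"id": "clip_01", "topic": "Familia",
     "combos": ["pensar en", "pensar en", "soñar con"]},
    {"id": "clip_02", "topic": "Trabajo",
     "combos": ["pensar en", "depender de"]},
    {"id": "clip_03", "topic": "Familia",
     "combos": ["soñar con", "pensar en", "pensar en", "pensar en"]},
]


def cloud_counts(clips, topic="Todos", per_document=False):
    """Count combinations in the clips of the selected topic.

    per_document=False counts every occurrence;
    per_document=True counts each combination at most once per clip.
    """
    counts = Counter()
    for clip in clips:
        if topic != "Todos" and clip["topic"] != topic:
            continue  # clip filtered out by the thematic drop-down
        combos = set(clip["combos"]) if per_document else clip["combos"]
        counts.update(combos)
    return counts


def font_sizes(counts, min_pt=10, max_pt=40):
    """Map counts linearly onto font sizes between min_pt and max_pt."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        # All items are equally frequent: give them the same mid-range size.
        return {item: (min_pt + max_pt) // 2 for item in counts}
    scale = (max_pt - min_pt) / (hi - lo)
    return {item: round(min_pt + (n - lo) * scale) for item, n in counts.items()}


by_occurrence = font_sizes(cloud_counts(CLIPS))
by_document = font_sizes(cloud_counts(CLIPS, per_document=True))
print(by_occurrence)  # {'pensar en': 40, 'soñar con': 16, 'depender de': 10}
print(by_document)    # {'pensar en': 40, 'soñar con': 25, 'depender de': 10}
```

In this toy data, “soñar con” looks larger when sized by document count than by raw occurrences, because a single clip that repeats “pensar en” several times no longer dominates the scale.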

I am a graduate student at Universitat Pompeu Fabra in Barcelona and at Eberhard Karls Universität in Tübingen. My PhD dissertation studies the methodological and technical aspects that can help produce ICALL activities that are pedagogically meaningful and computationally feasible. I have worked on the design and implementation of several software solutions that incorporate Natural Language Processing techniques, among them web interfaces to parallel and monolingual corpora, morphosyntactic taggers and parsers, and end-user spelling, grammar and style checking tools.

