We are using VISLCG3 to annotate the transcripts of our oral video clips with pedagogically relevant information. VISLCG3 is open-source software, released under a GNU General Public License, that implements Constraint Grammar, a finite-state-like formalism that allows for the linguistic analysis of text in so-called local contexts using hand-crafted rules.
VISLCG3 (CG3 for short) can be used mainly for three types of operations: replacing information assigned to a word, removing information assigned to a word (by a previous module, e.g., a dictionary look-up module), or adding information to a word. For the purposes of our project we mainly use CG3 to add information, that is, to add pedagogically relevant annotations to texts (oral transcripts) that have previously been tagged with TreeTagger with basic linguistic information such as lemma and part of speech (this deserves a separate post). We also use it to post-edit the tagging, since our tagger systematically makes certain decisions with which we do not agree.
ADD and ADDRELATION
The two operators that we use most frequently are ADD and ADDRELATION. They perform similar actions: both add one piece of information to the reading of a particular word. The only difference is that the former applies to phenomena that span a single word (a cohort, in CG3’s terminology), while the latter applies to phenomena that span two or more cohorts, optionally with words in between. These annotated phenomena are linked to an internal Pedagogical Typology (still in progress), which we built by extracting linguistic and communicative topics often found in Spanish textbooks.
The following rules exemplify how we use ADD and ADDRELATION to pedagogically annotate our corpus. The first two are fairly straightforward; the third is more complex and gives the reader an idea of the power of formalisms such as CG3.
- ADD (@Prag:MarcadoresDisc:Probabilidad) Adverbio IF (0 Quizas);
The above rule states that any adverb reading of the word(s) included in the set Quizas (which includes both quizás and quizá and is defined elsewhere in the grammar file) will be assigned the information @Prag:MarcadoresDisc:Probabilidad.
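For readers unfamiliar with CG3 set definitions, here is a minimal sketch of how the two sets used in this rule could be declared. The tag `ADV` and the lemma spellings are assumptions for illustration; the definitions in the actual grammar file may differ.

```cg3
# Hypothetical set definitions (illustrative; not copied from the project's grammar).
LIST Adverbio = ADV ;                 # any reading carrying the (assumed) adverb tag ADV
LIST Quizas   = "quizás" "quizá" ;    # quoted tags match the base form (lemma) of a reading

# The rule from above: tag adverb readings whose lemma is in Quizas.
ADD (@Prag:MarcadoresDisc:Probabilidad) Adverbio IF (0 Quizas) ;
```

With definitions along these lines, a reading such as `"quizás" ADV` would be expected to come out of the grammar with the additional tag appended.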
- ADDRELATION (Gram:SerEstar:EstarAux) VerboEstar IF (0 FiniteVerbForms) TO (1 Gerundio);
The above rule states that any occurrence of the verb estar that is conjugated (that is, a finite verb form) and is followed by a gerund will be tagged as an instance of estar used as an auxiliary, with the tag R:Gram:SerEstar:EstarAux. The rule establishes an index-based relation between the form of estar and the corresponding gerund.
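The sketch below shows what the sets behind this rule, and a matching input, could look like. The tag names (`VEfin`, `VLger`, and so on) are modeled on TreeTagger's Spanish tagset and should be treated as assumptions, as should the exact contents of each set.

```cg3
# Hypothetical set definitions; tag names are assumptions modeled on TreeTagger's Spanish tagset.
LIST VerboEstar      = "estar" ;                        # readings whose lemma is estar
LIST FiniteVerbForms = VEfin VHfin VLfin VMfin VSfin ;  # finite verb tags
LIST Gerundio        = VEger VHger VLger VMger VSger ;  # gerund tags

# The rule from above: relate a finite form of estar to the gerund right after it.
ADDRELATION (Gram:SerEstar:EstarAux) VerboEstar IF (0 FiniteVerbForms) TO (1 Gerundio) ;

# Example input in CG3 stream format for "está comiendo" (shown here as comments):
#   "<está>"
#       "estar" VEfin
#   "<comiendo>"
#       "comer" VLger
```

Here the cohort for está is the target of the rule, and the `TO (1 Gerundio)` test picks out the cohort for comiendo as the other end of the relation.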
- ADDRELATION (Func:Deseos:Ajenos) Verbo IF (0 VerbosExpresarDeseos) (1 Que) TO (*2 Verbo + Subjuntivo BARRIER GrupoNominal OR LimiteOracionSimple OR FiniteVerbForms);
The above rule states that any instance of a verb included in the list VerbosExpresarDeseos (which is defined elsewhere and includes verbs such as gustar, desear, querer…) that is followed by que and, further on, by a verb in the subjunctive mood should be annotated as an instance of a way of expressing desire. The rule uses the scanning operator (*), which works much like a Kleene star, together with the keyword BARRIER to allow flexibility in the actual location of the verb (not necessarily right after the que), while ensuring that the scan does not cross certain linguistic items such as conjunctions or sentence delimiters, so that the rule stays within a safe scope. The rule attaches the tag R:Func:Deseos:Ajenos to the target verb, and an index records the direction of the relation with the subjunctive verb (note the TO).
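To make the scanning behaviour more concrete, the following sketch contrasts three kinds of contextual test on a toy example. The `Demo:*` tags and the set definitions are hypothetical and exist only for illustration; they are not part of the project's grammar.

```cg3
# Hypothetical sets for illustration only (tag names assumed).
LIST Estar           = "estar" ;
LIST Gerundio        = VLger ;
LIST FiniteVerbForms = VLfin ;

# Fixed position: the cohort immediately to the right must be a gerund.
ADD (Demo:Adjacent) Estar IF (1 Gerundio) ;

# Scanning position: a gerund may appear anywhere further right in the window.
ADD (Demo:Scan) Estar IF (*1 Gerundio) ;

# Scanning with a barrier: the scan is abandoned if a finite verb
# is encountered before a gerund is found.
ADD (Demo:ScanBarrier) Estar IF (*1 Gerundio BARRIER FiniteVerbForms) ;
```

The third form is the pattern used in the rule above: the asterisk widens the search, and the BARRIER keeps it from reaching into material that most likely belongs to a different clause.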