Lab 2: Multilingual generation and translation
This lab corresponds to Chapters 5 to 9 of the Notes, but follows them only loosely. It is therefore structured by exercise session rather than by chapter. The abstract syntax is given in the subdirectory grammars/abstract/.
Session 5
- Design a morphology for the main lexical types (N, A, V) with parameters and a couple of paradigms.
- Test it by implementing the lexicon in the MicroLang module. You need to define the lincat of N, A, V and V2, as well as the paradigms in MicroResource.
To deliver: the lexicon part of the files MicroLangX.gf and MicroResourceX.gf for your language of choice X. Follow the structure of MicroLangEng and MicroResourceEng when preparing these.
Session 6
- Define the linearization types of main phrasal categories - the remaining categories in MicroLang.
- Define the rest of the linearization rules in MicroLang.
To deliver: MicroLangX and MicroResourceX for your language of choice, with the lexicon part from Session 5 completed with the syntax part.
Session 7
- Add concrete UD labels.
- Generate a synthetic UD treebank by using gfud.
To deliver: for your language X, the file MicroLangX.labels and a treebank of 20 trees in CoNLL format.
Deadline: the end of the course.
Session 8
The task is to add a language to the grammar in wikipedia/. More details coming soon.
More information on the task can be found in
- the lecture notes: https://github.com/aarneranta/NLG-examples/blob/main/doc/gf-nlg.pdf
- the slides shown in lecture 9: https://docs.google.com/presentation/d/1gQTI_vv6anBCaUCJujGxJZbmZBKuXPN_rRUYRgdvXio/edit?usp=sharing
- the GF Summer School film: https://www.youtube.com/watch?v=gX_y2BqJ0w0&list=PL7VkoRLnYYP6EZngakW7lNNCTjfC93uh0&index=15
A method for testing your Micro grammar
Since MicroLang is a proper part of the RGL, it can be easily implemented as an application grammar.
How to do this is shown in grammar/functor/, where the implementation consists of two files:
- MicroLangFunctor.gf, which is a generic implementation working for all RGL languages,
- MicroLangFunctorEng.gf, which is a functor instantiation for English, easily reproducible for languages other than Eng.
To use this for testing, you can take the following steps:
- Build a functor instantiation for your language by copying MicroLangFunctorEng.gf and changing Eng in the file name and inside the file to your language code.
- Use GF to create a test file by random generation:
$ echo "gr -number=1000 | l -tabtreebank" | gf english/MicroLangEng.gf functor/MicroLangFunctorEng.gf >test.tmp
- Inspect the resulting file test.tmp. You can also use the Unix tool cut to create separate files for the two versions of the grammar and diff to compare them:
$ cut -f2 test.tmp >test1.tmp
$ cut -f3 test.tmp >test2.tmp
$ diff test1.tmp test2.tmp
52c52
< the hot fire teachs her
---
> the hot fire teaches her
69c69
< the man teachs the apples
---
> the man teaches the apples
122c122
As the diff shows, our implementation inflects the verb "teach" incorrectly: it produces "teachs" where the reference implementation produces "teaches".
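The steps above can be sketched as two small shell functions. This is only a sketch under stated assumptions: the function names make_instance and report_mismatches are made up for illustration, the language code passed to make_instance is whatever RGL code you choose, and the comparison assumes that test.tmp has the tree in column 1, your own grammar in column 2 and the functor reference in column 3, as in the cut commands above. Unlike diff, the awk variant keeps the abstract tree next to each mismatch.

```shell
#!/bin/sh
# Sketch only: function names and arguments are illustrative assumptions.

# Step 1: copy the English functor instantiation for another language code,
# replacing every occurrence of "Eng" inside the file as well.
# Usage: make_instance <functor-directory> <language-code>
make_instance() {
  dir=$1
  lang=$2
  sed "s/Eng/$lang/g" "$dir/MicroLangFunctorEng.gf" > "$dir/MicroLangFunctor$lang.gf"
}

# Step 3: for every line of a tab-separated treebank file where columns 2 and 3
# differ, print the tree (column 1) followed by both linearizations.
# Usage: report_mismatches <treebank-file>
report_mismatches() {
  awk -F '\t' '$2 != $3 { printf "%s\n  own: %s\n  ref: %s\n", $1, $2, $3 }' "$1"
}
```

For example, after sourcing these functions, report_mismatches test.tmp would list each tree whose two linearizations differ. Check the copied functor file by hand, since the sed command replaces every "Eng" in it, including any that might occur in comments.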
The Mini grammar can be tested in the same way, by building a reference implementation using the functor in functor/.