Lab 1: Multilingual generation and translation
In this lab, you will implement the concrete syntax of a grammar for a language of your choice.
The abstract syntax is given in the directory grammar/abstract/ and an example concrete syntax for English can be found in grammar/english/.
Part 1: setup and lexicon
- Create a subfolder in grammar/ for your language of choice
- Copy the contents of grammar/english/ to your new folder and apply the necessary renamings (i.e. replace all occurrences of Eng with the new language code)
- Translate the words in the lexicon part of MicroLangXxx
- Test your new concrete syntax by generating a few random trees in the GF interpreter. When you linearize them, you should see sentences in a mixture of English and your chosen language. To do this you can use the commands
  - i MicroLangXxx.gf to import the grammar
  - gr | l to generate a random tree and linearize it
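The copy-and-rename step above can be scripted. The following is only a sketch: the function name copy_grammar, the folder name italian and the language code Ita are hypothetical placeholders, and it assumes that every file to be copied ends in Eng.gf. It replaces Eng only where it is not followed by a letter, digit or underscore, so that module names such as MicroLangEng are renamed while the word "English" in comments is left alone. Check the result by hand afterwards.

```shell
# Copy every *Eng.gf file from a source folder into a new folder,
# renaming the files and the module names inside them.
# Usage: copy_grammar SRC_DIR DST_DIR NEW_CODE   (all three are placeholders)
copy_grammar() {
  src=$1
  dst=$2
  code=$3
  mkdir -p "$dst"
  for f in "$src"/*Eng.gf; do
    [ -e "$f" ] || continue              # skip if the pattern matched nothing
    base=$(basename "$f" Eng.gf)         # e.g. MicroLang
    # Replace Eng at the end of an identifier: either followed by a
    # non-identifier character or at the end of the line.
    sed -e "s/Eng\([^A-Za-z0-9_]\)/${code}\1/g" \
        -e "s/Eng\$/${code}/" \
        "$f" > "$dst/${base}${code}.gf"
  done
}
```

Run from the repository root, a call might look like copy_grammar grammar/english grammar/italian Ita.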
Part 2: morphology
- Design the morphological types of the major parts of speech (NOUN, ADJ, and VERB) in your selected language, i.e. identify their inflectional and inherent features using: a traditional grammar book or a Wikipedia article and/or data from universaldependencies.org. In the latter case:
- Implement these in GF by defining parameters and writing a couple of paradigms. In this phase, you will work in the MicroResXxx module
- Define the lincats for N, A, V and V2 in MicroLangXxx
- Test your GF morphology. To do that, you can import the grammar with the -retain flag and use the compute_concrete command on the various lexical items. For example, cc star_N returns the full inflectional table for the noun "star"
Part 3: syntax
- Define the linearization types of main phrasal categories - the remaining categories in MicroLang.
- Define the rest of the linearization rules in MicroLang.
Part 4: testing your grammar against the RGL
Since MicroLang is a proper part of the RGL, it can be easily implemented as an application grammar.
How to do this is shown in grammar/functor/, where the implementation consists of two files:
- MicroLangFunctor.gf, which is a generic implementation working for all RGL languages,
- MicroLangFunctorEng.gf, which is a functor instantiation for English, easily reproducible for languages other than Eng.
To use this for testing, you can take the following steps:
- Build a functor instantiation for your language by copying MicroLangFunctorEng.gf and changing Eng in the file name and inside the file to your language code.
- Use GF to create a testfile by random generation:
$ echo "gr -number=1000 | l -tabtreebank" | gf english/MicroLangEng.gf functor/MicroLangFunctorEng.gf >test.tmp
- Inspect the resulting file test.tmp. You can also use Unix cut to create separate files for the two versions of the grammar and diff to compare them:
$ cut -f2 test.tmp >test1.tmp
$ cut -f3 test.tmp >test2.tmp
$ diff test1.tmp test2.tmp
52c52
< the hot fire teachs her
---
> the hot fire teaches her
69c69
< the man teachs the apples
---
> the man teaches the apples
122c122
As the diff output shows, our implementation inflects the verb "teach" incorrectly: it produces "teachs" where the RGL-based grammar produces "teaches".
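Because diff only shows the differing strings, it can be handy to see the tree that produced each mismatch as well. The sketch below assumes that test.tmp is tab-separated with the tree in the first column and the two linearizations in the second and third columns, which is what the cut commands above suggest. The function name show_mismatches is a hypothetical helper, not part of GF.

```shell
# Print each treebank entry whose two linearizations differ,
# followed by a count of the mismatches.
# Assumed columns (tab-separated): tree, first grammar, second grammar.
show_mismatches() {
  awk -F'\t' '
    $2 != $3 {
      n++
      printf "%s\n  first:  %s\n  second: %s\n", $1, $2, $3
    }
    END { printf "%d mismatching line(s)\n", n }
  ' "$1"
}
```

Calling show_mismatches test.tmp lists the offending trees, which makes it easier to find the paradigm or rule responsible for the difference.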
The Mini grammar can be tested in the same way, by building a reference implementation using the functor in functor/.
Submit MicroLangXxx.gf and MicroResXxx.gf on Canvas.