Lab 2: Multilingual generation and translation

This lab corresponds to Chapters 5 to 9 of the Notes, but follows them only loosely. Therefore, we structure it by exercise session rather than by chapter. The abstract syntax is given in the subdirectory grammars/abstract/.

Session 5

  1. Design a morphology for the main lexical categories (N, A, V), with parameters and a couple of paradigms.
  2. Test it by implementing the lexicon in the MicroLang module. You need to define the lincats of N, A, V and V2, as well as the paradigms in MicroResource.

To deliver: the lexicon part of the files MicroLangX.gf and MicroResourceX.gf for your language of choice X. Follow the structure of MicroLangEng and MicroResourceEng when preparing these.
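As a language-neutral illustration of what "parameters and paradigms" means, here is a minimal Python analogue, not GF code: an enum plays the role of a parameter type, a dict plays the role of an inflection table, and each category gets a worst-case paradigm (all forms given) and a smart paradigm (forms guessed from one string). All names (Number, VForm, add_s, mk_noun, smart_verb, ...) are hypothetical and do not come from MicroResourceEng; the English spelling rules are deliberately simplified.

```python
from enum import Enum


class Number(Enum):
    # Analogue of a GF parameter type such as: param Number = Sg | Pl
    SG = "Sg"
    PL = "Pl"


class VForm(Enum):
    # Hypothetical verb forms, restricted to the present tense for brevity
    INF = "Inf"
    PRES_SG3 = "PresSg3"


def add_s(stem: str) -> str:
    """Attach -s using simplified English spelling rules."""
    if stem.endswith(("s", "x", "z", "ch", "sh")):
        return stem + "es"  # teach -> teaches, box -> boxes
    if len(stem) > 1 and stem.endswith("y") and stem[-2] not in "aeiou":
        return stem[:-1] + "ies"  # fly -> flies
    return stem + "s"  # walk -> walks, apple -> apples


def mk_noun(sg: str, pl: str) -> dict:
    """Worst-case noun paradigm: every form is given explicitly."""
    return {Number.SG: sg, Number.PL: pl}


def smart_noun(sg: str) -> dict:
    """Smart noun paradigm: guess the plural from the singular."""
    return mk_noun(sg, add_s(sg))


def mk_verb(inf: str, pres_sg3: str) -> dict:
    """Worst-case verb paradigm."""
    return {VForm.INF: inf, VForm.PRES_SG3: pres_sg3}


def smart_verb(inf: str) -> dict:
    """Smart verb paradigm: guess the third person singular."""
    return mk_verb(inf, add_s(inf))


if __name__ == "__main__":
    print(smart_verb("teach")[VForm.PRES_SG3])  # teaches
    print(smart_noun("apple")[Number.PL])       # apples
    print(mk_noun("man", "men")[Number.PL])     # men
```

In GF the same design would be a param type, a table such as Number => Str inside the lincat, and oper paradigms; a smart paradigm can pattern-match on string endings, which is where a rule like the -es ending after "ch" belongs.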

Session 6

  1. Define the linearization types of the main phrasal categories, i.e. the remaining categories in MicroLang.
  2. Define the rest of the linearization rules in MicroLang.

To deliver: MicroLangX.gf and MicroResourceX.gf for your language of choice, with the lexicon part from Session 5 completed by the syntax part.
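The main design decision in this session is which features each phrasal category must carry. A minimal Python sketch under simplifying assumptions (English present tense only, hypothetical names NP, VP, compl_v2, pred_vp): the NP record keeps its case forms plus its agreement features, the VP keeps the verb's inflection open, and predication lets the subject's features select the verb form.

```python
from dataclasses import dataclass
from enum import Enum


class Number(Enum):
    SG = "Sg"
    PL = "Pl"


@dataclass(frozen=True)
class NP:
    # Rough analogue of a lincat like NP = {s : Case => Str ; a : Agr}:
    # one string per case plus the agreement features the NP imposes.
    nom: str
    acc: str
    number: Number
    person: int  # 1, 2 or 3


@dataclass(frozen=True)
class VP:
    # The verb keeps its inflection open until a subject is known;
    # the object string is fixed when the VP is built.
    inf: str
    pres_sg3: str
    compl: str


def compl_v2(inf: str, pres_sg3: str, obj: NP) -> VP:
    """Combine a two-place verb with its object, which takes the accusative form."""
    return VP(inf, pres_sg3, obj.acc)


def pred_vp(subj: NP, vp: VP) -> str:
    """Predication: the subject's agreement features select the verb form."""
    third_sg = subj.number is Number.SG and subj.person == 3
    verb = vp.pres_sg3 if third_sg else vp.inf
    return " ".join(w for w in (subj.nom, verb, vp.compl) if w)


if __name__ == "__main__":
    she = NP("she", "her", Number.SG, 3)
    we = NP("we", "us", Number.PL, 1)
    the_man = NP("the man", "the man", Number.SG, 3)
    teach_her = compl_v2("teach", "teaches", she)
    print(pred_vp(the_man, teach_her))  # the man teaches her
    print(pred_vp(we, teach_her))       # we teach her
```

The point of the design is that agreement is decided only when subject and verb phrase meet, so the VP must not commit to a verb form earlier.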

Session 7

  1. Add concrete UD labels.
  2. Generate a synthetic UD treebank by using gfud.

To deliver: for your language X, the file MicroLangX.labels and a treebank of 20 trees in CoNLL format.

Deadline: the end of the course.
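Before delivering, it can be worth checking the treebank mechanically. A minimal Python sketch, assuming the CoNLL-U variant used by UD (ten tab-separated columns per token line, "#" comment lines, a blank line after each sentence): it counts the trees and checks that each has exactly one token whose HEAD is 0. The script and function names are hypothetical; this is not part of gfud.

```python
from __future__ import annotations

import sys


def read_sentences(text: str) -> list[list[list[str]]]:
    """Split CoNLL-U text into sentences, each a list of token rows (lists of columns)."""
    sentences: list[list[list[str]]] = []
    current: list[list[str]] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
        elif line.startswith("#"):
            continue  # sentence-level comments such as "# text = ..."
        else:
            current.append(line.split("\t"))
    if current:  # tolerate a missing final blank line
        sentences.append(current)
    return sentences


def check_treebank(text: str, expected_trees: int = 20) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    sentences = read_sentences(text)
    if len(sentences) != expected_trees:
        problems.append(f"expected {expected_trees} trees, found {len(sentences)}")
    for i, rows in enumerate(sentences, start=1):
        for row in rows:
            if len(row) != 10:
                problems.append(f"tree {i}: row has {len(row)} columns, expected 10")
        # Column 7 (index 6) is HEAD; the root token is the one with HEAD 0.
        roots = [row for row in rows if len(row) == 10 and row[6] == "0"]
        if len(roots) != 1:
            problems.append(f"tree {i}: {len(roots)} roots, expected 1")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python check_conllu.py treebank.conllu
    with open(sys.argv[1], encoding="utf-8") as f:
        problems = check_treebank(f.read())
    print("\n".join(problems) if problems else "OK")
```

These checks are only a sanity test of the format; they say nothing about whether the labels themselves are linguistically right.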

Session 8

The task is to add a language to the grammar in wikipedia/. More details coming soon.

More information on the task can be found in

A method for testing your Micro grammar

Since MicroLang is a proper part of the RGL, it can easily be implemented as an application grammar on top of the RGL. The directory grammar/functor/ shows how to do this; the implementation consists of two files:

  • MicroLangFunctor.gf, a generic implementation that works for all RGL languages,
  • MicroLangFunctorEng.gf, a functor instantiation for English, easily reproducible for languages other than English.
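Creating the instantiation for another language is a mechanical renaming. A minimal Python sketch, assuming that the language code occurs only as the suffix of identifiers and of the file name (as in GrammarEng); the helper names are hypothetical. Lowercase occurrences, such as directory names in path pragmas if the file has any, are not touched and should be checked by hand.

```python
import re
import sys
from pathlib import Path


def instantiate_functor(text: str, code: str, base: str = "Eng") -> str:
    """Replace `base` wherever it ends a word, e.g. GrammarEng -> GrammarSwe.

    The word boundary after `base` leaves words like "English" untouched.
    """
    return re.sub(re.escape(base) + r"\b", code, text)


def make_instantiation(src: Path, code: str, base: str = "Eng") -> Path:
    """Write a copy of `src` with `base` replaced by `code` in name and content."""
    dst = src.with_name(src.name.replace(base, code))
    if dst == src:
        raise ValueError(f"{src.name} does not contain {base!r}; refusing to overwrite")
    text = src.read_text(encoding="utf-8")
    dst.write_text(instantiate_functor(text, code, base), encoding="utf-8")
    return dst


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python instantiate.py functor/MicroLangFunctorEng.gf Swe
    print(make_instantiation(Path(sys.argv[1]), sys.argv[2]))
```

With the usage line above, the script would write functor/MicroLangFunctorSwe.gf next to the original file.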

To use this for testing, you can take the following steps:

  1. Build a functor instantiation for your language by copying MicroLangFunctorEng.gf and changing Eng in the file name and inside the file to your language code.

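  2. Use GF to create a test file by random generation. The command below generates 1000 random trees and linearizes each of them with both loaded grammars, one tab-separated line per tree: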

  $ echo "gr -number=1000 | l -tabtreebank" | gf english/MicroLangEng.gf functor/MicroLangFunctorEng.gf >test.tmp
  3. Inspect the resulting file test.tmp. You can also use Unix cut to extract the output of the two versions of the grammar into separate files, and diff to compare them:
  $ cut -f2 test.tmp >test1.tmp
  $ cut -f3 test.tmp >test2.tmp
  $ diff test1.tmp test2.tmp

  52c52
  < the hot fire teachs her
  ---
  > the hot fire teaches her
  69c69
  < the man teachs the apples
  ---
  > the man teaches the apples
  122c122

As this result shows, our implementation inflects the verb "teach" incorrectly: it produces "teachs" where the reference implementation has "teaches".
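The cut and diff steps can also be done in one pass that keeps each differing pair next to its abstract tree, which helps locate the rule or lexical entry at fault. A minimal Python sketch, assuming, as the cut commands above suggest, that column 1 of test.tmp holds the tree and columns 2 and 3 hold the two linearizations; the script name is hypothetical and lines without enough columns are skipped.

```python
import sys
from typing import Iterable, Iterator, Tuple


def compare_treebank(
    lines: Iterable[str], ours: int = 1, reference: int = 2
) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (line number, tree, our output, reference output) for each mismatch.

    Column indices are 0-based: 0 is the tree, 1 and 2 correspond to
    `cut -f2` and `cut -f3`.
    """
    for n, line in enumerate(lines, start=1):
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= max(ours, reference):
            continue  # e.g. empty lines or lines without tabs
        if cols[ours] != cols[reference]:
            yield n, cols[0], cols[ours], cols[reference]


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python compare_treebank.py test.tmp
    with open(sys.argv[1], encoding="utf-8") as f:
        for n, tree, mine, ref in compare_treebank(f):
            print(f"{n}: {tree}\n  < {mine}\n  > {ref}")
```

Because every line of the file is counted, the reported line numbers agree with the ones diff prints for the cut outputs.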

The Mini grammar can be tested in the same way, by building a reference implementation using the functor in functor/.