
Lab 1: Multilingual generation and translation

In this lab, you will implement the concrete syntax of a grammar for a language of your choice. The abstract syntax is given in the directory grammar/abstract/ and an example concrete syntax for English can be found in grammar/english/.

You will need to install:

Part 1: setup and lexicon

  1. Create a subfolder in grammar/ for your language of choice
  2. Copy the contents of grammar/english/ to your new folder and apply the necessary renamings (i.e. replace all occurrences of Eng with the new language code)
  3. Translate the words in the lexicon part of MicroLangXxx
  4. Test your new concrete syntax by generating a few random trees in the GF interpreter. When you linearize them, you should see sentences in a mixture of English and your chosen language. To do this, you can use the commands generate_random (gr) and linearize (l), for example gr -number=5 | l
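
The copy-and-rename in step 2 can also be scripted. The following is a minimal Python sketch and not part of the lab material: the function name clone_grammar and the default code Xxx are hypothetical, and the pattern assumes the language code only appears at the end of an identifier (as in MicroLangEng), so that words such as "English" in comments are left alone. The script does not understand GF syntax, so review the copied files by hand.

```python
import re
from pathlib import Path


def clone_grammar(src_dir, dst_dir, old="Eng", new="Xxx"):
    """Copy every .gf file from src_dir to dst_dir, swapping the language code.

    Returns the list of file names written (hypothetical helper, for illustration).
    """
    # Match the code only at the end of an identifier:
    # "MicroLangEng" is rewritten, "English" is not.
    pattern = re.compile(rf"(?<=\w){re.escape(old)}\b")
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.gf")):
        new_name = pattern.sub(new, path.stem) + path.suffix
        text = path.read_text(encoding="utf-8")
        (dst / new_name).write_text(pattern.sub(new, text), encoding="utf-8")
        written.append(new_name)
    return written
```

For example, clone_grammar("grammar/english", "grammar/xxx", new="Xxx") would fill the folder created in step 1, where xxx and Xxx stand for your own folder name and language code.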

Part 2: morphology

  1. Design the morphological types of the major parts of speech (NOUN, ADJ, and VERB) in your selected language, i.e. identify their inflectional and inherent features. You can use a traditional grammar book, a Wikipedia article, and/or data from universaldependencies.org. In the latter case:
    1. download a treebank for your language
    2. use deptreepy or write your own script to query the treebank and look up what morphological features actually occur in the data for each POS
  2. Implement these in GF by defining parameters and writing a couple of paradigms. In this phase, you will work in the MicroResXxx module
  3. Test your GF morphology. To do that, you can import the grammar with the -retain flag and use the compute_concrete (cc) command on the various lexical items. For example, cc star_N returns the full inflection table for the noun "star"
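
If you write your own script for the treebank query in step 1, it could look like the sketch below. It assumes the treebank is in CoNLL-U format (ten tab-separated columns, with the universal POS tag in column 4 and the morphological features in column 6); the function name features_by_pos is made up for this illustration and is unrelated to deptreepy.

```python
from collections import defaultdict


def features_by_pos(conllu_lines, pos_tags=("NOUN", "ADJ", "VERB")):
    """Map each POS tag to {feature name: sorted list of values seen in the data}."""
    table = {pos: defaultdict(set) for pos in pos_tags}
    for line in conllu_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # sentence separators and comment lines
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # malformed rows, multiword tokens (1-2), empty nodes (1.1)
        upos, feats = cols[3], cols[5]
        if upos not in table or feats == "_":
            continue  # other parts of speech, or no features annotated
        for pair in feats.split("|"):
            name, _, value = pair.partition("=")
            # A feature may carry several comma-separated values.
            table[upos][name].update(value.split(","))
    return {pos: {k: sorted(v) for k, v in fs.items()} for pos, fs in table.items()}
```

The function accepts any iterable of lines, so an open file handle for the downloaded .conllu file works as input. The feature names it reports (for instance Number or Case, depending on the language) are candidates for GF parameter types in MicroResXxx.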

Part 3: syntax

  1. Define the linearization types of the main phrasal categories, i.e. the remaining categories in MicroLang.
  2. Define the rest of the linearization rules in MicroLang.

Part 4: testing your grammar against the RGL

Since MicroLang is a proper part of the RGL, it can be easily implemented as an application grammar. How to do this is shown in grammar/functor/, where the implementation consists of two files:

  • MicroLangFunctor.gf, which is a generic implementation that works for all RGL languages,
  • MicroLangFunctorEng.gf, which is a functor instantiation for English and is easy to reproduce for languages other than English.

To use this for testing, you can take the following steps:

  1. Build a functor instantiation for your language by copying MicroLangFunctorEng.gf and changing Eng in the file name and inside the file to your language code.

  2. Use GF to create a testfile by random generation:

  $ echo "gr -number=1000 | l -tabtreebank" | gf english/MicroLangEng.gf functor/MicroLangFunctorEng.gf >test.tmp
  3. Inspect the resulting file test.tmp. You can also use Unix cut to split it into separate files for the two versions of the grammar, and diff to compare them:
  $ cut -f2 test.tmp >test1.tmp
  $ cut -f3 test.tmp >test2.tmp
  $ diff test1.tmp test2.tmp

  52c52
  < the hot fire teachs her
  ---
  > the hot fire teaches her
  69c69
  < the man teachs the apples
  ---
  > the man teaches the apples
  122c122

As the diff shows, our implementation inflects the verb "teach" incorrectly: it produces "teachs" where the RGL-based version produces "teaches".
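
The diff output does not show which tree produced each mismatch. As a rough alternative, the Python sketch below reads the rows of test.tmp directly. It assumes that column 1 holds the abstract syntax tree, column 2 the linearization from our grammar, and column 3 the one from the functor, which matches the cut -f2 and cut -f3 calls above; the function name mismatches is hypothetical.

```python
def mismatches(lines):
    """Yield (tree, ours, reference) for each row whose two linearizations differ."""
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 3:
            continue  # blank or incomplete rows
        tree, ours, reference = cols[0], cols[1], cols[2]
        if ours != reference:
            yield tree, ours, reference
```

Passing an open file handle for test.tmp and printing each triple shows the tree next to both outputs, which makes it easier to find the paradigm or linearization rule responsible.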

The Mini grammar can be tested in the same way, by building a reference implementation using the functor in functor/.


Submit MicroLangXxx.gf and MicroResXxx.gf on Canvas.