# Lab 1: Multilingual generation and translation

In this lab, you will implement the concrete syntax of a grammar for a language of your choice.
The abstract syntax is given in the directory [`grammar/abstract/`](grammar/abstract/) and an example concrete syntax for English can be found in [`grammar/english/`](grammar/english/).

## Part 1: setup and lexicon
1. Create a subfolder in [`grammar/`](grammar/) for your language of choice.
2. Copy the contents of [`grammar/english/`](grammar/english/) to your new folder and apply the necessary renamings (i.e. replace all occurrences of `Eng` with the new language code).
3. Translate the words in the lexicon part of `MicroLangXxx`.
4. Test your new concrete syntax by generating a few random trees in the GF interpreter. When you linearize them, you should see sentences in a mixture of English and your chosen language. To do this, you can use the commands:
   - `i MicroLangXxx.gf` to [import](https://www.grammaticalframework.org/doc/gf-shell-reference.html#toc18) the grammar
   - `gr | l` to [generate a random tree](https://www.grammaticalframework.org/doc/gf-shell-reference.html#toc15) and [linearize](https://www.grammaticalframework.org/doc/gf-shell-reference.html#toc19) it

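
Steps 1 and 2 can be scripted. The following is a minimal sketch, not part of the lab material: the function name `new_language`, the folder name `italian` and the code `Ita` are made-up examples, and it assumes you run it from the directory that contains `grammar/`.

```shell
# Sketch of steps 1-2: copy the English concrete syntax into a new folder,
# replacing `Eng` with the new language code in file names and file contents.
# `new_language` is a hypothetical helper, not something the repository provides.
new_language() {
  dir="$1"    # name of the new subfolder, e.g. italian
  code="$2"   # language code that replaces Eng, e.g. Ita
  mkdir -p "grammar/$dir" || return 1
  for src in grammar/english/*.gf; do
    [ -e "$src" ] || continue
    dst="grammar/$dir/$(basename "$src" | sed "s/Eng/$code/g")"
    # This replaces every occurrence of Eng, also inside words such as
    # "English" in comments, so skim the copied files afterwards.
    sed "s/Eng/$code/g" "$src" > "$dst"
  done
}
```

Calling `new_language italian Ita` would then leave `grammar/english/` untouched and write the renamed copies to `grammar/italian/`.
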
## Part 2: morphology
1. Design the morphological types of the major parts of speech (NOUN, ADJ, and VERB) in your selected language, i.e. identify their inflectional and inherent features. You can use a traditional grammar book, a Wikipedia article, and/or data from [universaldependencies.org](https://universaldependencies.org/). In the latter case:
   1. download a treebank for your language
   2. use [deptreepy](https://github.com/aarneranta/deptreepy/) or [STUnD](https://harisont.github.io/STUnD/) to query the treebank and look up what morphological features actually occur in the data for each POS
2. Implement these in GF by defining parameters and writing a couple of paradigms. In this phase, you will work in the `MicroResXxx` module.
3. Define the `lincat`s for `N`, `A`, `V` and `V2` in `MicroLangXxx`.
4. Test your GF morphology. To do that, you can import the grammar with the `-retain` flag and use the [`compute_concrete`](https://www.grammaticalframework.org/doc/gf-shell-reference.html#toc8) command on the various lexical items. For example, `cc star_N` returns the full inflectional table for the noun "star".

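
If you only want a quick overview of the features attested in a treebank, plain Unix tools are enough. The sketch below is an alternative to deptreepy and STUnD, not a description of how they work. It assumes a standard CoNLL-U file, where each token line has ten tab-separated columns with the UPOS tag in column 4 and the `|`-separated morphological features in column 6. The function name `feats_for_pos` is made up for this example.

```shell
# Count how often each Feature=Value pair occurs for one UPOS tag.
# Assumes CoNLL-U input: 10 tab-separated columns, UPOS in column 4,
# FEATS in column 6 ("_" when a token has no features).
feats_for_pos() {
  pos="$1"    # UPOS tag, e.g. NOUN, ADJ or VERB
  file="$2"   # path to a .conllu file
  awk -F'\t' -v pos="$pos" \
    '!/^#/ && NF == 10 && $4 == pos && $6 != "_" { print $6 }' "$file" |
    tr '|' '\n' |       # one Feature=Value pair per line
    sort | uniq -c |    # count identical pairs
    sort -rn            # most frequent first
}
```

For example, `feats_for_pos NOUN treebank.conllu` (with a hypothetical file name) lists the noun features in the data together with their frequencies, which is a reasonable starting point for deciding which parameters the noun type needs.
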
## Part 3: syntax
1. Define the linearization types of the main phrasal categories, i.e. the remaining categories in `MicroLang`.
2. Define the rest of the linearization rules in `MicroLang`.

## Part 4: testing your grammar against the RGL
Since `MicroLang` is a proper part of the RGL (the GF Resource Grammar Library), it can easily be implemented as an application grammar.
How to do this is shown in `grammar/functor/`, where the implementation consists of two files:
- `MicroLangFunctor.gf`, a generic implementation that works for all RGL languages,
- `MicroLangFunctorEng.gf`, a *functor instantiation* for English, which is easy to reproduce for languages other than English.

To use this for testing, you can take the following steps:

1. Build a functor instantiation for your language by copying `MicroLangFunctorEng.gf` and changing `Eng` in the file name and inside the file to your language code.

2. Use GF to create a test file by random generation:
```
$ echo "gr -number=1000 | l -tabtreebank" | gf english/MicroLangEng.gf functor/MicroLangFunctorEng.gf >test.tmp
```

3. Inspect the resulting file `test.tmp`.
You can also use Unix `cut` to create separate files for the two versions of the grammar and `diff` to compare them:
```
$ cut -f2 test.tmp >test1.tmp
$ cut -f3 test.tmp >test2.tmp
$ diff test1.tmp test2.tmp

52c52
< the hot fire teachs her
---
> the hot fire teaches her
69c69
< the man teachs the apples
---
> the man teaches the apples
122c122
```
As the result shows, our implementation inflects the verb "teach" incorrectly.

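
`diff` shows the differing sentences but not the trees that produced them. As a possible complement, the sketch below reads `test.tmp` directly and prints the first column next to both linearizations for every mismatching line. It assumes that column 1 of the `-tabtreebank` output holds the abstract syntax tree, with the two linearizations in columns 2 and 3 as in the `cut` commands above. The function name `mismatches` is made up for this example.

```shell
# Print every line of a tab-separated treebank file whose two
# linearizations (columns 2 and 3) differ, together with column 1,
# which is assumed to be the abstract syntax tree.
mismatches() {
  awk -F'\t' '$2 != $3 {
    print "line " NR ": " $1
    print "  < " $2
    print "  > " $3
  }' "$1"
}
```

Running `mismatches test.tmp` prints three lines per mismatch, and `mismatches test.tmp | grep -c '^line '` gives the number of mismatching trees.
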
The Mini grammar can be tested in the same way, by building a reference implementation using the functor in `functor/`.

---

Submit `MicroLangXxx.gf` and `MicroResXxx.gf` on Canvas.