adapted lab1 to new structure

Author: Arianna Masciolini
Date: 2025-03-26 15:07:07 +01:00
Parent: 5ef3990633
Commit: 125b8a2b0d

@@ -3,61 +3,31 @@
 In this lab, you will implement the concrete syntax of a grammar for a language of your choice.
 The abstract syntax is given in the directory [`../grammar/abstract/`](../grammar/abstract/) and an example concrete syntax for English can be found in [`../grammar/english/`](../grammar/english/).
-## Part 1: design the morphological types of the major parts of speech in your selected language
-1. Go to [universaldependencies.org](https://universaldependencies.org/) and download Version 2.7+ treebanks
-2. Look up the Parallel UD treebanks for those 21 languages that have it. They are named e.g. `UD_English-PUD/`
-3. Select a language to compare with English.
-4. Make statistics about the frequencies of POS tags and dependency labels in your language compared with English: find the top-20 tags/labels and their number of occurrences. What does this tell you about the language? (This can be done with shell or Python programming or, more easily, with the [deptreepy](https://github.com/aarneranta/deptreepy/) or [gf-ud](https://github.com/grammaticalFramework/gf-ud) tools. The latter is also available on the eduserv server.)
+## Part 1: setup and lexicon
+1. Create a subfolder in [`../grammar/`](../grammar/) for your language of choice.
+2. Copy the contents of [`../grammar/english/`](../grammar/english/) to your new folder and apply the necessary renamings (i.e. replace all occurrences of `Eng` with the new language code).
+3. Translate the words in the lexicon part of `MicroLangXxx`.
+4. Test your new concrete syntax by generating a few random trees in the GF interpreter. When you linearize them, you should see sentences in a mixture of English and your chosen language. To do this, you can use the commands:
+   - `i MicroLangXxx.gf` to [import](https://www.grammaticalframework.org/doc/gf-shell-reference.html#toc18) the grammar
+   - `gr | l` to [generate a random tree](https://www.grammaticalframework.org/doc/gf-shell-reference.html#toc15) and [linearize](https://www.grammaticalframework.org/doc/gf-shell-reference.html#toc19) it
-1. It is enough to cover NOUN, ADJ, and VERB.
-2. Use a traditional grammar book or a Wikipedia article to identify the inflectional and inherent features.
-3. Then use data from PUD to check which morphological features actually occur in the treebank for that language.
+## Part 2: morphology
+1. Design the morphological types of the major parts of speech (NOUN, ADJ, and VERB) in your selected language, i.e. identify their inflectional and inherent features using a traditional grammar book or a Wikipedia article and/or data from [universaldependencies.org](https://universaldependencies.org/). In the latter case:
+   1. download a treebank for your language
+   2. use [deptreepy](https://github.com/aarneranta/deptreepy/) or [STUnD](https://harisont.github.io/STUnD/) to query the treebank and look up which morphological features actually occur in the data for each POS
+2. Implement these in GF by defining parameters and writing a couple of paradigms. In this phase, you will work in the `MicroResXxx` module.
+3. Define the `lincat`s for `N`, `A`, `V` and `V2` in `MicroLangXxx`.
+4. Test your GF morphology. To do that, you can import the grammar with the `-retain` flag (`i -retain MicroLangXxx.gf`) and use the [`compute_concrete`](https://www.grammaticalframework.org/doc/gf-shell-reference.html#toc8) command on the various lexical items. For example, `cc star_N` returns the full inflectional table for the noun "star".
-## After lecture 6
+## Part 3: syntax
+1. Define the linearization types of the main phrasal categories, i.e. the remaining categories in `MicroLang`.
+2. Define the rest of the linearization rules in `MicroLang`.
-1. Design a morphology for the main lexical types (N, A, V) with parameters and a couple of paradigms.
-2. Test it by implementing the lexicon in the MicroLang module. You need to define lincat N,A,V,V2 as well as the paradigms in MicroResource.
-*To deliver*: the lexicon part of files MicroGrammarX.gf and MicroResourceX.gf for your language of choice X. Follow the structure of MicroGrammarEng and MicroResourceEng when preparing these.
-## After lecture 7
-1. Define the linearization types of main phrasal categories - the remaining categories in MicroLang.
-2. Define the rest of the linearization rules in MicroLang.
-*To deliver*: MicroLangX and MicroResourceX for your language of choice, with the lexicon part from Session 5 completed with syntax part.
-## After lecture 8
-1. Try out the applications in `../python` and read its README carefully.
-2. Add a concrete syntax for your language to one of the grammars
-in `../python/`, either `Query` or `Draw`.
-The simplest way to do this
-is first to copy the `Eng` grammar and then to change the words; the
-syntax may work well as it is. Even though it can be a bit unnatural,
-it should be in a wide sense natural.
-3. Compile the grammar with `gf -make Query???.gf` so that your grammar
-gets included (the same for `Draw`).
-4. Generate phrases in GF by first importing your pgf file and then
-issuing the command `gt | l -treebank`; fix your grammar if it looks
-too bad.
-5. Test the corresponding Python application with your language.
-The Python code with embedded GF grammars will be explained in a greater
-detail in Lecture 9.
-*To deliver*: your grammar module.
-*Deadline*: 29 May 2024. Demo your grammars (both Micro and this one) at
-the last lecture of the course!
-## A method for testing your Micro grammar
-Since MicroLang is a proper part of the RGL, it can be easily implemented as an application grammar.
+## Part 4: testing your grammar against the RGL
+Since `MicroLang` is a proper part of the RGL, it can be easily implemented as an application grammar.
 How to do this is shown in `grammar/functor/`, where the implementation consists of two files:
 - `MicroLangFunctor.gf` which is a generic implementation working for all RGL languages,
-- `MicroLangFunctorEng.gf` which is a *functor instantiation* for English, easily reproduciple for other languages than `Eng`.
+- `MicroLangFunctorEng.gf` which is a *functor instantiation* for English, easily reproducible for other languages than `Eng`.
 To use this for testing, you can take the following steps:
@@ -87,9 +57,8 @@ But you can also use Unix `cut` to create separate files for the two versions of
 ```
 As seen from the result in this case, our implementation has a wrong inflection of the verb "teach".
 The Mini grammar can be tested in the same way, by building a reference implementation using the functor in `functor/`.
+---
+Submit `MicroLangXxx.gf` and `MicroResXxx.gf` on Canvas.
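Part 2, step 1 of the new version suggests querying a UD treebank with deptreepy or STUnD. The goal is to find out which morphological features actually occur with each POS. If neither tool is at hand, you can get roughly the same overview with a few lines of standard-library Python. The following is a minimal sketch and is not part of the lab materials. It assumes a treebank file in the standard 10-column CoNLL-U format. The function names `feature_inventory` and `group_by_feature` are hypothetical.

```python
"""Sketch: list the morphological features that occur with each POS in a CoNLL-U file.

A rough stand-in for querying a UD treebank with deptreepy or STUnD;
the function names are hypothetical and not part of the lab materials.
"""
import sys
from collections import Counter, defaultdict

# CoNLL-U has 10 tab-separated columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
UPOS_COL, FEATS_COL = 3, 5


def feature_inventory(lines, pos_tags=("NOUN", "ADJ", "VERB")):
    """Map each UPOS tag in pos_tags to a Counter of 'Feature=Value' strings."""
    inventory = defaultdict(Counter)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # blank lines end sentences; '#' lines are comments
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        upos, feats = cols[UPOS_COL], cols[FEATS_COL]
        if upos in pos_tags and feats != "_":
            for pair in feats.split("|"):
                inventory[upos][pair] += 1
    return inventory


def group_by_feature(counter):
    """Turn e.g. {'Number=Sing': 3, 'Number=Plur': 1} into {'Number': {'Sing', 'Plur'}}."""
    grouped = defaultdict(set)
    for pair in counter:
        name, _, value = pair.partition("=")
        grouped[name].add(value)
    return dict(grouped)


def main(path):
    with open(path, encoding="utf-8") as f:
        inventory = feature_inventory(f)
    for upos, counter in sorted(inventory.items()):
        print(upos)
        for name, values in sorted(group_by_feature(counter).items()):
            print(f"  {name}: {', '.join(sorted(values))}")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])  # usage: python feats.py path/to/treebank.conllu
```

Run it on a treebank file, e.g. `python feats.py treebank.conllu` (both names hypothetical). It prints each of NOUN, ADJ and VERB that occurs in the data, followed by one line per feature listing the values observed, such as `Number: Plur, Sing`. A small treebank can miss rare forms, so treat the result as a starting point for the parameters in `MicroResXxx` rather than a complete inventory.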
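Part 4 tests your grammar against a reference implementation built from the RGL functor in `grammar/functor/`. The second hunk mentions splitting the output for the two versions with Unix `cut`. As an alternative, you can save the output of a command such as `gt | l -treebank` to a text file and compare it in Python. The script prints only the trees on which your concrete syntax and the reference disagree. This is a minimal sketch with the following assumptions:

- Each tree is printed as an `Abstract: tree` line followed by one `Concrete: string` line per concrete syntax. Check this against your own output.
- The function names `parse_treebank` and `mismatches` are hypothetical.

```python
"""Sketch: report trees on which two concrete syntaxes disagree.

Reads saved output of a GF shell command such as `gt | l -treebank` and
compares two named concrete syntaxes, as an alternative to splitting the
output with Unix `cut`. Function names are hypothetical.
"""
import sys


def parse_treebank(lines, abstract):
    """Yield (tree, {concrete: linearization}) pairs.

    Assumed format (check yours): 'Abstract: tree' followed by one
    'Concrete: string' line per concrete syntax.
    """
    tree, lins = None, {}
    for line in lines:
        name, sep, text = line.rstrip("\r\n").partition(": ")
        if not sep:
            continue  # blank or unrelated line
        if name == abstract:
            if tree is not None:
                yield tree, lins
            tree, lins = text, {}  # start collecting linearizations of a new tree
        elif tree is not None:
            lins[name] = text
    if tree is not None:
        yield tree, lins


def mismatches(lines, abstract, ours, reference):
    """Return (tree, our_string, reference_string) for every tree where the two differ."""
    return [
        (tree, lins.get(ours), lins.get(reference))
        for tree, lins in parse_treebank(lines, abstract)
        if lins.get(ours) != lins.get(reference)
    ]


if __name__ == "__main__" and len(sys.argv) == 5:
    # usage: python compare.py treebank.txt ABSTRACT OURS REFERENCE
    path, abstract, ours, reference = sys.argv[1:]
    with open(path, encoding="utf-8") as f:
        for tree, got, expected in mismatches(f, abstract, ours, reference):
            print(tree)
            print(f"  {ours}: {got}")
            print(f"  {reference}: {expected}")
```

For example, consider `python compare.py treebank.txt MicroLang MicroLangXxx MicroLangFunctorXxx`, where the file and module names are hypothetical and follow the `Xxx` convention. It would list only the trees where `MicroLangXxx` deviates from the functor-based reference, such as the wrong inflection of "teach" mentioned above.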