more clarifications lab 3

This commit is contained in:
Arianna Masciolini
2025-05-27 14:26:29 +02:00
parent 53e08ebf88
commit 3885c0d226


@@ -47,6 +47,20 @@ expands to
We recommend that you annotate at least the first few sentences from scratch.
When you start feeling confident, you may pre-parse the remaining ones with UDPipe and manually correct the automatic annotation.
If you are unsure about an annotation choice you made, you can add a comment line (starting with `#`) right before the sentence in question.
To fully comply with the CoNLL-U standard, comment lines should consist of key-value pairs, e.g.
```conllu
# comment = your comment here
```
but for this assignment, lines like
```
# your comment here
```
are perfectly acceptable too.
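For instance, a comment placed right before a sentence could look like this (the sentence and its annotation are invented; only the layout matters):

```conllu
# sent_id = 3
# text = She saw it.
# comment = unsure whether "saw" should be the root here
1	She	she	PRON	_	_	2	nsubj	_	_
2	saw	see	VERB	_	_	0	root	_	_
3	it	it	PRON	_	_	2	obj	_	SpaceAfter=No
4	.	.	PUNCT	_	_	2	punct	_	_
```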
### Step 4: make sure your files match the CoNLL-U specification
Once you have a complete CoNLL-U file, you can use [deptreepy](https://github.com/aarneranta/deptreepy/), [STUnD](https://harisont.github.io/STUnD/) or [the official online CoNLL-U viewer](https://universaldependencies.org/conllu_viewer.html) to visualize it.
@@ -91,6 +105,7 @@ If you are working on MLTGPU, you may choose a large treebank such as [Swedish-T
If you are working on your laptop and/or if your language does not have a lot of data available, you may want to use a smaller treebank, such as [Amharic-ATT](https://github.com/UniversalDependencies/UD_Amharic-ATT), which only comes with a test set.
In this case, split the test set into a training and a development portion (e.g. 80% of the sentences for training and 20% for development).
Make sure both files end with an empty line.
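If you prefer not to split the file by hand, a few lines of Python will do. The snippet below is only a sketch and the file names are placeholders to be replaced with your own:

```python
# Sketch: split a CoNLL-U test file into 80% training / 20% development data.
# Input and output file names are examples; adjust them to your treebank.
def read_sentences(path):
    """Return the sentence blocks of a CoNLL-U file as a list of strings."""
    with open(path, encoding="utf-8") as f:
        return [block for block in f.read().strip().split("\n\n") if block]

sentences = read_sentences("am_att-ud-test.conllu")
cutoff = int(len(sentences) * 0.8)  # 80% for training

for portion, name in [(sentences[:cutoff], "train"), (sentences[cutoff:], "dev")]:
    with open(f"compsyn-{name}.conllu", "w", encoding="utf-8") as out:
        # sentences are separated by blank lines; the file ends with an empty line
        out.write("\n\n".join(portion) + "\n\n")
```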
To ensure that MaChAmp works correctly, preprocess __all__ of your data (including your own test data) by running
@@ -111,16 +126,22 @@ python3 train.py --dataset_configs configs/compsyn.json --device N
from the MaChAmp folder.
If you are working on MLTGPU, replace `N` with `0` (GPU). If you are using your laptop or EDUSERV, replace it with `-1`, which instructs MaChAmp to train the model on the CPU. If you are working on MLTGPU, replace `N` with `0` (GPU). If you are using your laptop or EDUSERV, replace it with `-1`, which instructs MaChAmp to train the model on the CPU.
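For example, on MLTGPU the full command becomes

```
python3 train.py --dataset_configs configs/compsyn.json --device 0
```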
Everything you see on screen at this stage will be saved in a training log file called `logs/compsyn/DATE/log.txt`.
### Step 4: evaluation
Run your newly trained model with
```
python predict.py logs/compsyn/DATE/model.pt PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu --device N
```
This saves your model's predictions, i.e. the trees produced by your new parser, in `predictions/OUTPUT-FILE-NAME.conllu`.
You can take a look at this file to get a first impression of how your model performs.
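For a rough, unofficial sanity check at this point (before the proper evaluation below), a short Python sketch can count how often the predicted HEAD and DEPREL columns match your gold annotation. The file names are placeholders, and the script assumes both files contain the same sentences with identical tokenization:

```python
# Rough sanity check (not the official metric): token-level UAS/LAS-style counts.
def tokens(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            # skip comments, blank lines, and multiword/empty-node lines (IDs with - or .)
            if line and not line.startswith("#") and line.split("\t")[0].isdigit():
                yield line.split("\t")

total = same_head = same_both = 0
for gold, pred in zip(tokens("gold.conllu"), tokens("predictions/output.conllu")):
    total += 1
    if gold[6] == pred[6]:          # HEAD column
        same_head += 1
        if gold[7] == pred[7]:      # DEPREL column
            same_both += 1

print(f"UAS approx. {same_head / total:.2%}, LAS approx. {same_both / total:.2%}")
```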
Then, use the `machamp/scripts/misc/conll18_ud_eval.py` script to evaluate the system output against your annotations. You can run it as
```
python scripts/misc/conll18_ud_eval.py PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu
```
On Canvas, submit the training logs, the predictions and the output of `conll18_ud_eval.py`, along with a short text summarizing your considerations on the performance of the parser, based both on the predictions themselves and on the results of the evaluation.