more clarifications lab 3

Arianna Masciolini
2025-05-27 14:26:29 +02:00
parent 53e08ebf88
commit 3885c0d226


@@ -47,6 +47,20 @@ expands to
We recommend that you annotate at least the first few sentences from scratch.
When you start feeling confident, you may pre-parse the remaining ones with UDPipe and manually correct the automatic annotation.
If you are unsure about an annotation choice you made, you can add a comment line (starting with `#`) right before the sentence in question.
To fully comply with the CoNLL-U standard, comment lines should consist of key-value pairs, e.g.
```conllu
# comment = your comment here
```
but for this assignment, lines like
```
# your comment here
```
are perfectly acceptable too.
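For example, a commented sentence might look roughly like this (the sentence and its annotation below are just a made-up illustration; remember that the 10 columns of each token line must be tab-separated):
```conllu
# text = She reads books.
# unsure whether "books" should be obj here
1	She	she	PRON	_	Case=Nom|Number=Sing|Person=3|PronType=Prs	2	nsubj	_	_
2	reads	read	VERB	_	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	0	root	_	_
3	books	book	NOUN	_	Number=Plur	2	obj	_	SpaceAfter=No
4	.	.	PUNCT	_	_	2	punct	_	_
```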
### Step 4: make sure your files match the CoNLL-U specification
Once you have a complete CoNLL-U file, you can use [deptreepy](https://github.com/aarneranta/deptreepy/), [STUnD](https://harisont.github.io/STUnD/) or [the official online CoNLL-U viewer](https://universaldependencies.org/conllu_viewer.html) to visualize it.
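If you want a quick automated sanity check before opening a viewer, a small script along these lines can catch the most common formatting slips (this is only a rough sketch, not a substitute for the tools above; it checks two basic requirements of the format: 10 tab-separated columns per token line and a final empty line):
```python
# Rough CoNLL-U sanity check: every non-comment, non-blank line must have
# exactly 10 tab-separated columns, and the file must end with an empty line.
import sys

path = sys.argv[1]  # pass the path to your .conllu file as the first argument

with open(path, encoding="utf-8") as f:
    text = f.read()

for i, line in enumerate(text.split("\n"), start=1):
    if line == "" or line.startswith("#"):
        continue  # blank lines and comment lines are fine as they are
    n_cols = len(line.split("\t"))
    if n_cols != 10:
        print(f"line {i}: expected 10 tab-separated columns, found {n_cols}")

if not text.endswith("\n\n"):
    print("warning: the file does not end with an empty line")
```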
@@ -91,6 +105,7 @@ If you are working on MLTGPU, you may choose a large treebank such as [Swedish-T
If you are working on your laptop and/or if your language does not have a lot of data available, you may want to use a smaller treebank, such as [Amharic-ATT](https://github.com/UniversalDependencies/UD_Amharic-ATT), which only comes with a test set.
In this case, split the test set into a training and a development portion (e.g. 80% of the sentences for training and 20% for development).
Make sure both files end with an empty line.
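One simple way to do the split is to cut the file at sentence boundaries (blank lines) rather than at arbitrary line numbers. A rough sketch (the file names and the 80/20 ratio below are only examples) could look like this:
```python
# Split a CoNLL-U file into a training and a development portion at sentence
# boundaries. File names and the 80/20 ratio are examples, not requirements.

with open("am_att-ud-test.conllu", encoding="utf-8") as f:
    # sentences are separated by blank lines
    sentences = [s for s in f.read().split("\n\n") if s.strip()]

cutoff = int(len(sentences) * 0.8)

for name, part in [("train.conllu", sentences[:cutoff]),
                   ("dev.conllu", sentences[cutoff:])]:
    with open(name, "w", encoding="utf-8") as f:
        # rejoin the sentences and make sure the file ends with an empty line
        f.write("\n\n".join(part) + "\n\n")
```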
To ensure that MaChAmp works correctly, preprocess __all__ of your data (including your own test data) by running
@@ -111,16 +126,22 @@ python3 train.py --dataset_configs configs/compsyn.json --device N
from the MaChAmp folder.
If you are working on MLTGPU, replace `N` with `0` (GPU). If you are using your laptop or EDUSERV, replace it with `-1`, which instructs MaChAmp to train the model on the CPU.
Everything you see on screen at this stage will be saved in a training log file called `logs/compsyn/DATE/log.txt`.
### Step 4: evaluation
Run your newly trained model with
```
python predict.py logs/compsyn/DATE/model.pt PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu --device N
```
This saves your model's predictions, i.e. the trees produced by your new parser, in `predictions/OUTPUT-FILE-NAME.conllu`.
You can take a look at this file to get a first impression of how your model performs.
Then, use the `machamp/scripts/misc/conll18_ud_eval.py` script to evaluate the system output against your annotations. You can run it as
```
python scripts/misc/conll18_ud_eval.py PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu
```
On Canvas, submit the training logs, the predictions and the output of `conll18_ud_eval.py`, along with a short text summarizing your considerations on the performance of the parser, based both on the predictions themselves and on the results of the evaluation.