We recommend that you annotate at least the first few sentences from scratch.
When you start feeling confident, you may pre-parse the remaining ones with UDPipe and manually correct the automatic annotation.
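If you go the UDPipe route, one option is the LINDAT web service (it also has a browser UI, if you prefer). A minimal sketch, assuming `curl` and a plain-text file of sentences; the model name is a placeholder, so pick the one matching your language from the service's model list. The JSON response carries the CoNLL-U annotation in its `result` field:

```
# send raw text to the UDPipe web service, enabling its tokenizer, tagger and parser;
# "swedish" is a placeholder model name
curl -F data=@sentences.txt -F model=swedish -F tokenizer= -F tagger= -F parser= \
     https://lindat.mff.cuni.cz/services/udpipe/api/process
```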
If you are unsure about an annotation choice you made, you can add a comment line (starting with `#`) right before the sentence in question.
To fully comply with the CoNLL-U standard, comment lines should consist of key-value pairs, e.g.

```conllu
# comment = your comment here
```

but for this assignment lines like

```
# your comment here
```

are perfectly acceptable too.
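For example, a free-form comment placed before a sentence could look like this (the sentence and its analysis are made up for illustration):

```conllu
# not sure whether "home" should be advmod or obl
# text = She went home
1	She	she	PRON	_	_	2	nsubj	_	_
2	went	go	VERB	_	_	0	root	_	_
3	home	home	ADV	_	_	2	advmod	_	_
```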
### Step 4: make sure your files match the CoNLL-U specification
Once you have full CoNLL-U, you can use [deptreepy](https://github.com/aarneranta/deptreepy/), [STUnD](https://harisont.github.io/STUnD/) or [the official online CoNLL-U viewer](https://universaldependencies.org/conllu_viewer.html) to visualize it.
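For a stricter, automated check, you can also run the official UD validation script from a local clone of the [UniversalDependencies tools](https://github.com/UniversalDependencies/tools) repository. A sketch, assuming such a clone; adjust the language code to yours (levels 1-2 check format compliance only, higher levels are increasingly language-specific):

```
# format-level validation; "sv" is a placeholder language code
python3 validate.py --lang sv --level 2 YOUR-FILE.conllu
```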
If you are working on MLTGPU, you may choose a large treebank such as [Swedish-Talbanken](https://github.com/UniversalDependencies/UD_Swedish-Talbanken).
If you are working on your laptop and/or if your language does not have a lot of data available, you may want to use a smaller treebank, such as [Amharic-ATT](https://github.com/UniversalDependencies/UD_Amharic-ATT), which only comes with a test set.
In this case, split the test set into a training and a development portion (e.g. 80% of the sentences for training and 20% for development).
Make sure both files end with an empty line.
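One way to do the split without breaking sentences apart is to treat the blank-line-separated blocks as records, for instance with `awk`. A sketch; the file names are placeholders:

```
# count the sentences (blank-line-separated records); file names are placeholders
N=$(awk -v RS= 'END {print NR}' am_att-ud-test.conllu)
# first 80% of the sentences go to train, the rest to dev;
# ORS='\n\n' keeps one blank line between sentences and guarantees a trailing empty line
awk -v RS= -v ORS='\n\n' -v n="$N" 'NR <= int(0.8*n)' am_att-ud-test.conllu > am_att-train.conllu
awk -v RS= -v ORS='\n\n' -v n="$N" 'NR >  int(0.8*n)' am_att-ud-test.conllu > am_att-dev.conllu
```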
To ensure that MaChAmp works correctly, preprocess __all__ of your data (including your own test data) by running
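MaChAmp's bundled CoNLL-U cleaning script. The script name and calling convention below are an assumption (based on the `scripts/misc/` layout this lab already uses for `conll18_ud_eval.py`); check your MaChAmp checkout for the exact command:

```
# assumed invocation of MaChAmp's cleaning script; the file name is a placeholder
python3 scripts/misc/cleanconl.py YOUR-FILE.conllu
```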
Once your data is preprocessed, train the parser by running

```
python3 train.py --dataset_configs configs/compsyn.json --device N
```

from the MaChAmp folder.
If you are working on MLTGPU, replace `N` with `0` (GPU). If you are using your laptop or EDUSERV, replace it with `-1`, which instructs MaChAmp to train the model on the CPU.

Everything you see on screen at this stage will be saved in a training log file called `logs/compsyn/DATE/log.txt`.
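If you want to keep an eye on training from another terminal, a standard option is to follow the log file as it grows:

```
# stream the training log; replace DATE with the actual run directory name
tail -f logs/compsyn/DATE/log.txt
```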
### Step 4: evaluation
Run your newly trained model with

```
python predict.py logs/compsyn/DATE/model.pt PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu --device N
```

This saves your model's predictions, i.e. the trees produced by your new parser, in `predictions/OUTPUT-FILE-NAME.conllu`.
You can take a look at this file to get a first impression of how your model performs.

Then, use the `machamp/scripts/misc/conll18_ud_eval.py` script to evaluate the system output against your annotations. You can run it as

```
python scripts/misc/conll18_ud_eval.py PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu
```
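The script prints summary scores by default; it also accepts a `-v`/`--verbose` flag for a per-metric breakdown (UAS, LAS, UPOS accuracy and so on), which can be handy for the write-up:

```
python scripts/misc/conll18_ud_eval.py -v PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu
```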
On Canvas, submit the training logs, the predictions and the output of `conll18_ud_eval.py`, along with a short text summarizing your considerations on the performance of the parser, based on the predictions themselves and on the results of the evaluation.