diff --git a/lab3/README.md b/lab3/README.md
index a2c942f..c9c182a 100644
--- a/lab3/README.md
+++ b/lab3/README.md
@@ -47,6 +47,20 @@ expands to
 
 We recommend that you annotate at least the first few sentences from scratch.
 When you start feeling confident, you may pre-parse the remaining ones with UDPipe and manually correct the automatic annotation.
+If you are unsure about an annotation choice you made, you can add a comment line (starting with `#`) right before the sentence in question.
+To fully comply with the CoNLL-U standard, comment lines should consist of key-value pairs, e.g.
+
+```conllu
+# comment = your comment here
+```
+
+but for this assignment lines like
+
+```
+# your comment here
+```
+
+are perfectly acceptable too.
 
 ### Step 4: make sure your files match the CoNLL-U specification
 Once you have full CoNLL, you can use [deptreepy](https://github.com/aarneranta/deptreepy/), [STUnD](https://harisont.github.io/STUnD/) or [the official online CoNLL-U viewer](https://universaldependencies.org/conllu_viewer.html) to visualize it.
@@ -91,6 +105,7 @@ If you are working on MLTGPU, you may choose a large treebank such as [Swedish-T
 
 If you are working on your laptop and/or if your language does not have a lot of data available, you may want to use a smaller treebank, such as [Amharic-ATT](https://github.com/UniversalDependencies/UD_Amharic-ATT), which only comes with a test set.
 In this case, split the test into a training and a development portion (e.g. 80% of the sentences for training and 20% for development).
+Make sure both files end with an empty line.
 
 To ensure that MaChAmp works correctly, preprocess __all__ of your data (including your own test data) by running
 
@@ -111,16 +126,22 @@ python3 train.py --dataset_configs configs/compsyn.json --device N
 
 from the MaChAmp folder.
 If you are working on MLTGPU, replace `N` with `0` (GPU).
 If you are using your laptop or EDUSERV, replace it with `-1`, which instructs MaChAmp to train the model on the CPU.
+Everything you see on screen at this stage will be saved in a training log file called `logs/compsyn/DATE/log.txt`.
+
 ### Step 4: evaluation
 Run your newly trained model with
 
 ```
 python predict.py logs/compsyn/DATE/model.pt PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu --device N
 ```
-and use the `machamp/scripts/misc/conll18_ud_eval.py` script to evaluate the system output against your annotations. You can run it as
+
+This saves your model's predictions, i.e. the trees produced by your new parser, in `predictions/OUTPUT-FILE-NAME.conllu`.
+You can take a look at this file to get a first impression of how your model performs.
+
+Then, use the `machamp/scripts/misc/conll18_ud_eval.py` script to evaluate the system output against your annotations. You can run it as
 
 ```
 python scripts/misc/conll18_ud_eval.py PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu
 ```
-On Canvas, submit the training logs, the predictions and the output of `conll18_ud_eval.py`, along with a short text summarizing your considerations on the performance of the parser.
\ No newline at end of file
+On Canvas, submit the training logs, the predictions and the output of `conll18_ud_eval.py`, along with a short text summarizing your considerations on the performance of the parser, based on the predictions themselves and on the results of the evaluation.
\ No newline at end of file
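
If you prefer to script the 80%/20% train/dev split rather than doing it by hand, a minimal Python sketch could look like the following. The file names (`test.conllu`, `train.conllu`, `dev.conllu`) are placeholders, and the split is done at sentence boundaries, which CoNLL-U marks with blank lines, so both output files automatically end with an empty line.

```python
# Minimal sketch (not part of the lab instructions): split a single-file
# CoNLL-U treebank into a training and a development portion at sentence
# boundaries. All file names below are placeholders.

def read_sentences(path):
    """Return a list of sentences; each sentence is a list of its lines
    (comment lines and token lines), without the separating blank line."""
    sentences, current = [], []
    with open(path, encoding="utf-8") as infile:
        for line in infile:
            line = line.rstrip("\n")
            if not line.strip():          # blank line = end of sentence
                if current:
                    sentences.append(current)
                    current = []
            else:
                current.append(line)
    if current:                           # file without a final blank line
        sentences.append(current)
    return sentences

def write_sentences(path, sentences):
    """Write sentences with a blank line after each one, so the file
    always ends with an empty line."""
    with open(path, "w", encoding="utf-8") as outfile:
        for sentence in sentences:
            outfile.write("\n".join(sentence) + "\n\n")

sentences = read_sentences("test.conllu")            # placeholder input file
cutoff = int(len(sentences) * 0.8)                   # 80% of the sentences
write_sentences("train.conllu", sentences[:cutoff])  # training portion
write_sentences("dev.conllu", sentences[cutoff:])    # development portion (20%)
```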