minor updates for smoother lab 2 part 2

This commit is contained in:
Arianna Masciolini
2025-05-18 17:27:22 +02:00
parent a2a29f4b35
commit 003f0edbc4
2 changed files with 13 additions and 3 deletions

@@ -64,7 +64,9 @@ Submit the two CoNLL-U files on Canvas.
## Part 2: UD parsing
In this part of the lab, you will train and evaluate a UD parsing + POS tagging model.
For better performance, you are strongly encouraged to use the MLTGPU server.
If you want to install MaChAmp on your own computer, keep in mind that very old and very new Python versions are not supported.
For more information, see [here](https://github.com/machamp-nlp/machamp/issues/42).
### Step 1: setting up MaChAmp
1. optional, but recommended: create a Python virtual environment with the command
@@ -81,7 +83,7 @@ For better performance, you are strongly encouraged to use the MLTGPU server.
pip3 install -r requirements.txt
```
### Step 2: preparing the training and development data
Choose a UD treebank for one of the two languages you annotated in [part 1](#part-1-ud-annotation) and download it.
If you translated the corpus to a language that does not have a UD treebank, download a treebank for a related language (e.g. Italian if you annotated sentences in Sardinian).
@@ -90,6 +92,14 @@ If you are working on MLTGPU, you may choose a large treebank such as [Swedish-T
If you are working on your laptop and/or if your language does not have a lot of data available, you may want to use a smaller treebank, such as [Amharic-ATT](https://github.com/UniversalDependencies/UD_Amharic-ATT), which only comes with a test set.
In this case, split the test set into a training and a development portion (e.g. 80% of the sentences for training and 20% for development), as in the sketch below.
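If you take this route, a few lines of Python can do the split. The following is a minimal sketch, not part of the lab materials; the file names are placeholders for your own treebank's files. It relies on the fact that CoNLL-U sentences are separated by blank lines.
```
# Hypothetical helper, not part of the lab materials: split a CoNLL-U
# file into a training and a development portion on sentence boundaries.
def split_conllu(path, train_path, dev_path, train_frac=0.8):
    with open(path, encoding="utf-8") as f:
        # In CoNLL-U, sentences are separated by blank lines.
        sentences = f.read().strip().split("\n\n")
    cut = int(len(sentences) * train_frac)
    for out_path, chunk in ((train_path, sentences[:cut]),
                            (dev_path, sentences[cut:])):
        with open(out_path, "w", encoding="utf-8") as out:
            out.write("\n\n".join(chunk) + "\n\n")

# Placeholder file names; substitute your own treebank's splits.
split_conllu("am_att-ud-test.conllu", "train.conllu", "dev.conllu")
```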
To ensure that MaChAmp works correctly, preprocess __all__ of your data (including your own test data) by running
```
python scripts/misc/cleanconl.py PATH-TO-A-DATASET-SPLIT
```
This replaces the contents of your input file with a "cleaned up" version of the same treebank.
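Since the script processes one file at a time, you can loop over all of your splits. A hedged convenience sketch, assuming your files live in a `data/` directory (a placeholder path) and that you run it from the machamp repository root:
```
# Hypothetical convenience loop, not part of the lab materials: run
# MaChAmp's cleanconl.py over every CoNLL-U split in a data directory.
# cleanconl.py rewrites each file in place, so keep backups if you
# need the originals.
import glob
import subprocess
import sys

for split in glob.glob("data/*.conllu"):  # "data/" is a placeholder path
    subprocess.run([sys.executable, "scripts/misc/cleanconl.py", split],
                   check=True)
```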
### Step 3: training
Copy `compsyn.json` to `machamp/configs` and replace the training and development data paths with the paths to the files you selected/created in step 2.
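You can also script this edit. The sketch below is an assumption-laden convenience, not part of the lab materials: it presumes the config stores the paths under keys named `train_data_path` and `dev_data_path` somewhere in the (possibly nested) JSON, which is the usual MaChAmp convention; inspect `compsyn.json` to confirm, and substitute your own paths for the placeholders.
```
# Hypothetical sketch, not part of the lab materials: point the MaChAmp
# config at your own data. Assumes keys named "train_data_path" and
# "dev_data_path"; check compsyn.json before relying on this.
import json

NEW_PATHS = {  # placeholder paths; substitute your step 2 files
    "train_data_path": "data/train.conllu",
    "dev_data_path": "data/dev.conllu",
}

def set_paths(node):
    # Walk the JSON tree and overwrite any matching path keys.
    if isinstance(node, dict):
        for key, value in node.items():
            if key in NEW_PATHS:
                node[key] = NEW_PATHS[key]
            else:
                set_paths(value)
    elif isinstance(node, list):
        for item in node:
            set_paths(item)

with open("machamp/configs/compsyn.json", encoding="utf-8") as f:
    config = json.load(f)
set_paths(config)
with open("machamp/configs/compsyn.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4)
```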
@@ -110,7 +120,7 @@ python predict.py logs/compsyn/DATE/model.pt PATH-TO-YOUR-PART1-TREEBANK predict
and use the `machamp/scripts/misc/conll18_ud_eval.py` script to evaluate the system output against your annotations. You can run it as
```
python scripts/misc/conll18_ud_eval.py PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu
```
On Canvas, submit the training logs, the predictions and the output of `conll18_ud_eval.py`, along with a short text summarizing your observations on the parser's performance.