We recommend that you annotate at least the first few sentences from scratch.
When you start feeling confident, you may pre-parse the remaining ones with UDPipe and manually correct the automatic annotation.
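If you go the UDPipe route, one option is the LINDAT web service (it also has a browser UI, if you prefer). A minimal sketch, assuming `curl` and a plain-text file of sentences; the model name is a placeholder, so pick the one matching your language from the service's model list. The JSON response carries the CoNLL-U annotation in its `result` field:

```
# send raw text to the UDPipe web service, enabling its tokenizer, tagger and parser;
# "swedish" is a placeholder model name
curl -F data=@sentences.txt -F model=swedish -F tokenizer= -F tagger= -F parser= \
     https://lindat.mff.cuni.cz/services/udpipe/api/process
```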
If you are unsure about an annotation choice you made, you can add a comment line (starting with `#`) right before the sentence in question.
To fully comply with the CoNLL-U standard, comment lines should consist of key-value pairs, e.g.

```conllu
# comment = your comment here
```

but for this assignment lines like

```
# your comment here
```

are perfectly acceptable too.
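For example, a free-form comment placed before a sentence could look like this (the sentence and its analysis are made up for illustration):

```conllu
# not sure whether "home" should be advmod or obl
# text = She went home
1	She	she	PRON	_	_	2	nsubj	_	_
2	went	go	VERB	_	_	0	root	_	_
3	home	home	ADV	_	_	2	advmod	_	_
```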
### Step 4: make sure your files match the CoNLL-U specification
Once you have full CoNLL-U, you can use [deptreepy](https://github.com/aarneranta/deptreepy/), [STUnD](https://harisont.github.io/STUnD/) or [the official online CoNLL-U viewer](https://universaldependencies.org/conllu_viewer.html) to visualize it.
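For a stricter, automated check, you can also run the official UD validation script from a local clone of the [UniversalDependencies tools](https://github.com/UniversalDependencies/tools) repository. A sketch, assuming such a clone; adjust the language code to yours (levels 1-2 check format compliance only, higher levels are increasingly language-specific):

```
# format-level validation; "sv" is a placeholder language code
python3 validate.py --lang sv --level 2 YOUR-FILE.conllu
```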
If you are working on MLTGPU, you may choose a large treebank such as [Swedish-Talbanken](https://github.com/UniversalDependencies/UD_Swedish-Talbanken).
If you are working on your laptop and/or if your language does not have a lot of data available, you may want to use a smaller treebank, such as [Amharic-ATT](https://github.com/UniversalDependencies/UD_Amharic-ATT), which only comes with a test set.
In this case, split the test set into a training and a development portion (e.g. 80% of the sentences for training and 20% for development).
Make sure both files end with an empty line.
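One way to do the split without breaking sentences apart is to treat the blank-line-separated blocks as records, for instance with `awk`. A sketch; the file names are placeholders:

```
# count the sentences (blank-line-separated records); file names are placeholders
N=$(awk -v RS= 'END {print NR}' am_att-ud-test.conllu)
# first 80% of the sentences go to train, the rest to dev;
# ORS='\n\n' keeps one blank line between sentences and guarantees a trailing empty line
awk -v RS= -v ORS='\n\n' -v n="$N" 'NR <= int(0.8*n)' am_att-ud-test.conllu > am_att-train.conllu
awk -v RS= -v ORS='\n\n' -v n="$N" 'NR >  int(0.8*n)' am_att-ud-test.conllu > am_att-dev.conllu
```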
To ensure that MaChAmp works correctly, preprocess __all__ of your data (including your own test data) by running
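MaChAmp's bundled CoNLL-U cleaning script. The script name and calling convention below are an assumption (based on the `scripts/misc/` layout this lab already uses for `conll18_ud_eval.py`); check your MaChAmp checkout for the exact command:

```
# assumed invocation of MaChAmp's cleaning script; the file name is a placeholder
python3 scripts/misc/cleanconl.py YOUR-FILE.conllu
```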
Once your data is preprocessed, train the parser by running

```
python3 train.py --dataset_configs configs/compsyn.json --device N
```

from the MaChAmp folder.
If you are working on MLTGPU, replace `N` with `0` (GPU). If you are using your laptop or EDUSERV, replace it with `-1`, which instructs MaChAmp to train the model on the CPU.

Everything you see on screen at this stage will be saved in a training log file called `logs/compsyn/DATE/log.txt`.
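If you want to keep an eye on training from another terminal, a standard option is to follow the log file as it grows:

```
# stream the training log; replace DATE with the actual run directory name
tail -f logs/compsyn/DATE/log.txt
```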
### Step 4: evaluation
Run your newly trained model with

```
python predict.py logs/compsyn/DATE/model.pt PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu --device N
```

This saves your model's predictions, i.e. the trees produced by your new parser, in `predictions/OUTPUT-FILE-NAME.conllu`.
You can take a look at this file to get a first impression of how your model performs.

Then, use the `machamp/scripts/misc/conll18_ud_eval.py` script to evaluate the system output against your annotations. You can run it as

```
python scripts/misc/conll18_ud_eval.py PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu
```
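The script prints summary scores by default; it also accepts a `-v`/`--verbose` flag for a per-metric breakdown (UAS, LAS, UPOS accuracy and so on), which can be handy for the write-up:

```
python scripts/misc/conll18_ud_eval.py -v PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu
```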
On Canvas, submit the training logs, the predictions and the output of `conll18_ud_eval.py`, along with a short text summarizing your considerations on the performance of the parser, based on the predictions themselves and on the results of the evaluation.