more clarifications lab 3
@@ -47,6 +47,20 @@ expands to

We recommend that you annotate at least the first few sentences from scratch.
When you start feeling confident, you may pre-parse the remaining ones with UDPipe and manually correct the automatic annotation.

If you are unsure about an annotation choice you made, you can add a comment line (starting with `#`) right before the sentence in question.
To fully comply with the CoNLL-U standard, comment lines should consist of key-value pairs, e.g.

```conllu
# comment = your comment here
```

but for this assignment lines like

```
# your comment here
```

are perfectly acceptable too.
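
For instance, a free-form comment flagging an uncertain relation choice could be placed like this. The sentence below is a hypothetical example made up purely for illustration (columns are tab-separated, as CoNLL-U requires):

```conllu
# unsure about the deprel of "her": iobj or obl?
# sent_id = 12
# text = I gave her the book.
1	I	I	PRON	_	_	2	nsubj	_	_
2	gave	give	VERB	_	_	0	root	_	_
3	her	she	PRON	_	_	2	iobj	_	_
4	the	the	DET	_	_	5	det	_	_
5	book	book	NOUN	_	_	2	obj	_	_
6	.	.	PUNCT	_	_	2	punct	_	_
```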

### Step 4: make sure your files match the CoNLL-U specification

Once you have a fully annotated CoNLL-U file, you can use [deptreepy](https://github.com/aarneranta/deptreepy/), [STUnD](https://harisont.github.io/STUnD/) or [the official online CoNLL-U viewer](https://universaldependencies.org/conllu_viewer.html) to visualize it.
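
If you also want a scripted sanity check before visualizing, here is a minimal sketch assuming the third-party `conllu` Python package (`pip install conllu`); note that it only catches gross formatting errors, not every violation of the specification:

```python
import sys

from conllu import parse
from conllu.exceptions import ParseException

# Try to parse the treebank given on the command line; a failure
# here usually points to a formatting problem, e.g. a wrong number
# of columns or a malformed token ID.
with open(sys.argv[1], encoding="utf-8") as f:
    try:
        sentences = parse(f.read())
        print(f"OK: parsed {len(sentences)} sentences")
    except ParseException as e:
        print(f"Malformed CoNLL-U: {e}")
```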
@@ -91,6 +105,7 @@ If you are working on MLTGPU, you may choose a large treebank such as [Swedish-T

If you are working on your laptop and/or if your language does not have a lot of data available, you may want to use a smaller treebank, such as [Amharic-ATT](https://github.com/UniversalDependencies/UD_Amharic-ATT), which only comes with a test set.
In this case, split the test set into a training and a development portion (e.g. 80% of the sentences for training and 20% for development).
Make sure both files end with an empty line.
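
A minimal sketch of such a split, using only the Python standard library (the input file name is a placeholder for your treebank's test file):

```python
import random

# Sentences in a CoNLL-U file are separated by blank lines,
# so splitting on "\n\n" yields one block per sentence.
with open("test.conllu", encoding="utf-8") as f:  # placeholder name
    sentences = [b for b in f.read().strip().split("\n\n") if b]

random.seed(0)  # fixed seed, so the split is reproducible
random.shuffle(sentences)
cut = int(0.8 * len(sentences))  # 80% training, 20% development

# The trailing "\n\n" ensures both files end with an empty line.
for path, part in [("train.conllu", sentences[:cut]),
                   ("dev.conllu", sentences[cut:])]:
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n\n".join(part) + "\n\n")
```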

To ensure that MaChAmp works correctly, preprocess __all__ of your data (including your own test data) by running

@@ -111,16 +126,22 @@ python3 train.py --dataset_configs configs/compsyn.json --device N

```
python3 train.py --dataset_configs configs/compsyn.json --device N
```

from the MaChAmp folder.
If you are working on MLTGPU, replace `N` with `0` (GPU). If you are using your laptop or EDUSERV, replace it with `-1`, which instructs MaChAmp to train the model on the CPU.

Everything you see on screen at this stage will be saved in a training log file called `logs/compsyn/DATE/log.txt`.

### Step 4: evaluation

Run your newly trained model with

```
python predict.py logs/compsyn/DATE/model.pt PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu --device N
```

This saves your model's predictions, i.e. the trees produced by your new parser, in `predictions/OUTPUT-FILE-NAME.conllu`.
You can take a look at this file to get a first impression of how your model performs.
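
For a quick look, a short script like this one prints the first predicted tree (again a sketch assuming the third-party `conllu` package; the file name is the placeholder used above):

```python
from conllu import parse  # third-party package: pip install conllu

# Load the predictions and pick out the first sentence.
with open("predictions/OUTPUT-FILE-NAME.conllu", encoding="utf-8") as f:
    first = parse(f.read())[0]

# One line per token: ID, word form, UPOS tag, head and relation.
for token in first:
    print(token["id"], token["form"], token["upos"],
          token["head"], token["deprel"], sep="\t")
```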

Then, use the `machamp/scripts/misc/conll18_ud_eval.py` script to evaluate the system output against your annotations. You can run it as

```
python scripts/misc/conll18_ud_eval.py PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu
```

On Canvas, submit the training logs, the predictions and the output of `conll18_ud_eval.py`, along with a short text summarizing your considerations on the performance of the parser, based on the predictions themselves and on the results of the evaluation.