From 646752515b3a3587770a5b7fb47dc2fd5fce5799 Mon Sep 17 00:00:00 2001
From: Aarne Ranta
Date: Tue, 4 Apr 2023 17:19:36 +0200
Subject: [PATCH] UDpipe instructions

---
 lab1/lecture3-examples.txt | 38 +++++++++++++++++++++++
 lab1/using_udpipe.md       | 63 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 101 insertions(+)
 create mode 100644 lab1/lecture3-examples.txt
 create mode 100644 lab1/using_udpipe.md

diff --git a/lab1/lecture3-examples.txt b/lab1/lecture3-examples.txt
new file mode 100644
index 0000000..9b06b95
--- /dev/null
+++ b/lab1/lecture3-examples.txt
@@ -0,0 +1,38 @@
+John prefers wine to beer.
+John definitely prefers beer to wine today if he is hungry.
+John walks.
+John does not walk.
+John has walked.
+John is old.
+John is a doctor.
+John is here.
+Probably the best beer in the world.
+Why?
+
+She will.
+The black cat was seen.
+There is an elephant in the room.
+It is too cold in the room.
+She gave me a hint.
+She gave a hint to me.
+John saw a man with a telescope.
+She walks today.
+genetically modified
+
+Why does she walk?
+
+Why does she not walk?
+
+Every man in the city works in the city.
+She is here.
+She has been here.
+The reason is that I am tired.
+He can sing.
+Does he sing?
+
+Would he have sung?
+
+He would not have been tired.
+He called me a "bad loser".
+
+
diff --git a/lab1/using_udpipe.md b/lab1/using_udpipe.md
new file mode 100644
index 0000000..5f1741e
--- /dev/null
+++ b/lab1/using_udpipe.md
@@ -0,0 +1,63 @@
+# UDPipe: quick instructions
+
+## Download and install
+
+The simplest way to use UDPipe is to install the binaries for UDPipe-1, which exist for several operating systems.
+They can be downloaded from
+
+https://github.com/ufal/udpipe/releases/download/v1.3.0/udpipe-1.3.0-bin.zip
+
+When you have downloaded and unzipped this file, you will find the binary for your system in a subdirectory, for instance,
+```
+udpipe-1.3.0-bin/bin-macos/udpipe
+```
+is the binary for MacOS.
+If you include this directory on your PATH, you can run the command `udpipe` from anywhere.
+
+Running the parser for a language requires a model for that language.
+Models can be accessed via
+
+https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3131
+
+This page includes a long list of models and a command to install them all.
+If you need only some of them, you can download them individually, for instance,
+```
+$ wget https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-3131//english-lines-ud-2.5-191206.udpipe
+```
+
+## Running the parser
+
+Assuming that the binary `udpipe` is on your PATH and the model `english-lines-ud-2.5-191206.udpipe` is in your current directory, you can parse a single sentence with
+```
+$ echo "my hovercraft is full of eels" | udpipe --tokenize --tag --parse english-lines-ud-2.5-191206.udpipe
+```
+The result is a UD tree in the CoNLL-U notation,
+```
+# newdoc
+# newpar
+# sent_id = 1
+# text = my hovercraft is full of eels
+1 my I PRON P1SG-GEN Number=Sing|Person=1|Poss=Yes|PronType=Prs 2 nmod:poss _ _
+2 hovercraft hovercraft NOUN SG-NOM Number=Sing 4 nsubj _ _
+3 is be AUX PRES Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 4 cop _ _
+4 full full ADJ POS Degree=Pos 0 root _ _
+5 of of ADP _ _ 6 case _ _
+6 eels eel NOUN PL-NOM Number=Plur 4 nmod _ SpacesAfter=\n
+```
+If you also have `gfud` and `pdflatex` on your PATH, you can pipe the result into `gfud conll2pdf` to see the graphical tree.
+
+As `udpipe` reads standard input, you can also feed it the contents of a file, such as `lecture3-examples.txt`:
+```
+$ cat lecture3-examples.txt | udpipe --tokenize --tag --parse english-lines-ud-2.5-191206.udpipe
+```
+Note that sentences in that file must either end with a period or be separated by empty lines, because otherwise the whole file is parsed as one sentence.
+
+
+## Training a new model
+
+If you have a treebank in the CoNLL-U format, you can use it to train a new model (replace `<treebank>` and `<model>` with your own file names):
+```
+$ cat <treebank>.conllu | udpipe --tokenizer none --tagger none --train <model>.udpipe
+```
+
+