UDPipe instructions

2023-04-04 17:19:36 +02:00
parent c8abcf4bd4
commit 646752515b
2 changed files with 101 additions and 0 deletions
--- a/lab1/lecture3-examples.txt
+++ b/lab1/lecture3-examples.txt
@@ -0,0 +1,38 @@
+John prefers wine to beer.
+John definitely prefers beer to wine today if he is hungry.
+John walks.
+John does not walk.
+John has walked.
+John is old.
+John is a doctor.
+John is here.
+Probably the best beer in the world.
+Why?
+
+She will.
+The black cat was seen.
+There is an elephant in the room.
+It is too cold in the room.
+She gave me a hint.
+She gave a hint to me.
+John saw a man with a telescope.
+She walks today.
+genetically modified
+
+Why does she walk?
+
+Why does she not walk?
+
+Every man in the city works in the city.
+She is here.
+She has been here.
+The reason is that I am tired.
+He can sing.
+Does he sing?
+
+Would he have sung?
+
+He would not have been tired.
+He called me a "bad loser".
+
+
--- a/lab1/using_udpipe.md
+++ b/lab1/using_udpipe.md
@@ -0,0 +1,63 @@
+# UDPipe: quick instructions
+
+## Download and install
+
+The simplest way to use UDPipe is to install the precompiled binaries of UDPipe 1, which are available for several operating systems.
+They can be downloaded from
+
+https://github.com/ufal/udpipe/releases/download/v1.3.0/udpipe-1.3.0-bin.zip
+
+When you have downloaded and unzipped this file, you will find the binary for your system in a subdirectory, for instance,
+```
+udpipe-1.3.0-bin/bin-macos/udpipe
+```
+is the binary for macOS.
+If you add this directory to your `PATH`, you can run the command `udpipe` from anywhere.
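+
+The following is a minimal sketch of the `PATH` setup. It assumes that the zip was unzipped in the current directory and that you are on macOS. On another system, replace `bin-macos` with the subdirectory you find in the unzipped folder. The `export` only affects the current shell session. To make it permanent, add an equivalent line with the absolute path to your shell's startup file (for example `~/.bashrc` or `~/.zshrc`).
+
+```shell
+# Directory containing the udpipe binary (assumption: macOS, unzipped here).
+UDPIPE_DIR="$PWD/udpipe-1.3.0-bin/bin-macos"
+# Prepend it to PATH so that `udpipe` can be run from any directory.
+export PATH="$UDPIPE_DIR:$PATH"
+```
+
+Afterwards, `command -v udpipe` should print the full path of the binary.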
+
+Running the parser for a language requires a model for that language.
+Models can be accessed via
+
+https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-3131
+
+This page lists the available models and gives a command that downloads all of them.
+If you need only some of them, you can download them individually, for instance,
+```
+$ wget https://lindat.mff.cuni.cz/repository/xmlui/bitstream/handle/11234/1-3131/english-lines-ud-2.5-191206.udpipe
+```
+
+## Running the parser
+
+Assuming that the binary `udpipe` is on your `PATH` and the model file `english-lines-ud-2.5-191206.udpipe` is in the current directory, you can parse a single sentence with
+```
+$ echo "my hovercraft is full of eels" | udpipe --tokenize --tag --parse english-lines-ud-2.5-191206.udpipe
+```
+The output is a UD tree in the CoNLL-U format:
+```
+# newdoc
+# newpar
+# sent_id = 1
+# text = my hovercraft is full of eels
+1	my	I	PRON	P1SG-GEN	Number=Sing|Person=1|Poss=Yes|PronType=Prs	2	nmod:poss	_	_
+2	hovercraft	hovercraft	NOUN	SG-NOM	Number=Sing	4	nsubj	_	_
+3	is	be	AUX	PRES	Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin	4	cop	_	_
+4	full	full	ADJ	POS	Degree=Pos	0	root	_	_
+5	of	of	ADP	_	_	6	case	_	_
+6	eels	eel	NOUN	PL-NOM	Number=Plur	4	nmod	_	SpacesAfter=\n
+```
+If you also have `gfud` and `pdflatex` on your path, you can pipe the result into `gfud conll2pdf` to see the graphical tree.
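+
+Each token line of the CoNLL-U output has ten tab-separated columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS and MISC. HEAD is the ID of the governing word (0 for the root), and DEPREL is the dependency label.
+
+To show how the columns fit together, here is a minimal sketch of a shell function. The name `conllu_arcs` is a hypothetical helper, not part of UDPipe. It uses `awk` to print each dependency as `dependent -label-> head`. It skips comment lines, as well as multiword-token and empty-node lines (IDs such as `1-2` or `3.1`).
+
+```shell
+# Print each dependency of CoNLL-U input (stdin or files) as "dependent -deprel-> head".
+# Columns: 1 ID, 2 FORM, 3 LEMMA, 4 UPOS, 5 XPOS, 6 FEATS, 7 HEAD, 8 DEPREL, 9 DEPS, 10 MISC
+conllu_arcs() {
+  awk -F'\t' '
+    # Print the arcs of the sentence collected so far, then forget it.
+    function emit(   i, id, h) {
+      for (i = 1; i <= n; i++) {
+        id = ids[i]; h = head[id]
+        printf "%s -%s-> %s\n", form[id], rel[id], (h == "0" ? "ROOT" : form[h])
+      }
+      n = 0; split("", form); split("", head); split("", rel)
+    }
+    /^#/ { next }                     # sentence-level comments
+    NF == 0 { emit(); next }          # an empty line ends a sentence
+    $1 ~ /^[0-9]+$/ {                 # ordinary words only (skip 1-2 and 3.1 IDs)
+      n++; ids[n] = $1; form[$1] = $2; head[$1] = $7; rel[$1] = $8
+    }
+    END { emit() }                    # in case the last sentence lacks an empty line
+  ' "$@"
+}
+```
+
+For example: `echo "my hovercraft is full of eels" | udpipe --tokenize --tag --parse english-lines-ud-2.5-191206.udpipe | conllu_arcs`.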
+
+Since `udpipe` reads standard input, you can also feed it a whole file, such as `lecture3-examples.txt`:
+```
+$ cat <myfile> | udpipe --tokenize --tag --parse <model>
+```
+Note that the sentences in the file must either end with a period or be separated by empty lines; otherwise, consecutive lines may be merged and parsed as a single sentence.
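+
+Suppose your file has one sentence per line, but not every line ends with a period. One way to meet this requirement is to insert an empty line after every line. The sketch below does this with POSIX `sed`, whose `G` command appends the (empty) hold space to each line. The name `one_sentence_per_line` is a hypothetical helper. Lines that are already empty simply become two empty lines.
+
+```shell
+# Insert an empty line after every input line (stdin or the given files),
+# so that each line is separated from the next one by an empty line.
+one_sentence_per_line() {
+  sed G "$@"
+}
+```
+
+Usage: `one_sentence_per_line lecture3-examples.txt | udpipe --tokenize --tag --parse <model>`.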
+
+
+## Training a new model
+
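+If you have a treebank in CoNLL-U format, you can use it to train a new model. The options `--tokenizer none` and `--tagger none` switch off training of the tokenizer and the tagger, so the following command trains only the parser: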
+```
+$ cat <myfile>.conllu | udpipe --tokenizer none --tagger none --train <myfile>.udpipe
+```
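+
+Before training, it can save time to check that the treebank is well-formed. The following rough sketch is not a full validator, and the name `conllu_check` is a hypothetical helper. It counts sentences and reports every token line that does not have the ten tab-separated columns CoNLL-U requires.
+
+```shell
+# Count the sentences in CoNLL-U input (stdin or files) and report
+# token lines that do not have exactly 10 tab-separated columns.
+conllu_check() {
+  awk -F'\t' '
+    /^#/ { next }                                        # comment lines
+    NF == 0 { if (in_sent) sents++; in_sent = 0; next }  # empty line ends a sentence
+    {
+      in_sent = 1
+      if (NF != 10) { bad++; printf "line %d: %d columns\n", NR, NF }
+    }
+    END {
+      if (in_sent) sents++                               # last sentence without final empty line
+      printf "sentences: %d, malformed token lines: %d\n", sents + 0, bad + 0
+    }
+  ' "$@"
+}
+```
+
+Usage: `conllu_check <myfile>.conllu`.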
+
+