last year's lecture material moved to directory 2025
175
lectures/2025/README.md
Normal file
@@ -0,0 +1,175 @@
|
||||
# Computational Syntax Lectures: Outline
|
||||
|
||||
## Lecture 1
|
||||
|
||||
Course notes: Chapter 1
|
||||
|
||||
Participants' native languages:
|
||||
Chinese (2), Dutch, English, Finnish, French (2), Greek, Hebrew, Italian (3),
|
||||
Korean, Persian (2), Polish, Portuguese, Romanian (3), Russian, Spanish, Swedish (2),
|
||||
Swiss German, West-Assyrian - 24 students, 17 languages + 2 teachers, 1 more language
|
||||
|
||||
A formal grammar is no longer expected to match natural language exactly
|
||||
- analysis: should be wider than the language (we will use UD)
|
||||
- generation: should be contained in the language (we will use GF)
|
||||
- in both formats, we aim to use universal concepts for many languages
|
||||
|
||||
Phrase structure grammars, context-free = BNF, grammar rules, trees
|
||||
- example: [english.cf](lecture-01/english.cf)
|
||||
- testing grammars in GF: import, generate_random, parse, linearize, visualize_parse, help
|
||||
|
||||
|
||||
GF grammars: dividing .cf into abstract and concrete .gf
|
||||
- example: [Intro*.gf](lecture-01/)
|
||||
- forms of rules: cat, fun, lincat, lin
|
||||
- word order switch English-Italian
|
||||
- to solve next time:
|
||||
|
||||
Experiments in GF:
|
||||
- https://cloud.grammaticalframework.org/minibar/minibar.html
|
||||
- Grammar: ResourceDemo, Startcat: S
|
||||
|
||||
|
||||
## Lecture 2
|
||||
|
||||
Agreement, parameter definitions, variable and inherent features, linearization types
|
||||
|
||||
[IntroEng.gf](lecture-02/IntroEng.gf)
|
||||
|
||||
For you to do:
|
||||
- write a concrete syntax for some other language, carefully thinking about
|
||||
|
||||
|
||||
### Instructions for ARM Mac users
|
||||
|
||||
The GF Download page contains a binary for the Mac with an Intel processor, but it will not work for newer Macs, which use an ARM Processor (called M1, M2, or M3 by Apple).
|
||||
|
||||
We have therefore prepared a binary for these newer Macs.
|
||||
To download it, open a terminal and do:
|
||||
```
cd # go to your home directory
mkdir tmp # if the directory tmp does not exist already
cd tmp
wget https://www.grammaticalframework.org/~aarne/gf-mac.gz
```
|
||||
This is better than downloading via a browser, because macOS may block a program downloaded that way as "unreliable".
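
Note that `wget` is typically not preinstalled on macOS, whereas `curl` is. The following is a minimal sketch of a fallback, assuming only standard shell tools; the function names `pick_downloader` and `fetch_gf` are made up for illustration and are not part of GF.

```shell
# Hypothetical helper: report which download tool is available.
pick_downloader() {
  if command -v wget >/dev/null 2>&1; then
    echo wget
  elif command -v curl >/dev/null 2>&1; then
    echo curl
  else
    echo none
  fi
}

# Fetch the same URL as in the wget command with whichever tool was found.
fetch_gf() {
  url='https://www.grammaticalframework.org/~aarne/gf-mac.gz'
  case "$(pick_downloader)" in
    wget) wget "$url" ;;
    curl) curl -L -O "$url" ;;   # -L follows redirects, -O keeps the remote file name
    *)    echo "install wget or curl first" >&2; return 1 ;;
  esac
}
```

After defining both functions, running `fetch_gf` inside the `tmp` directory would take the place of the `wget` line.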
|
||||
|
||||
After download, stay in the terminal and do:
|
||||
```
gunzip gf-mac.gz
mv gf-mac gf
chmod a+x gf
./gf
```
|
||||
You should now see the GF prompt. Type 'help' to see if it works!
|
||||
|
||||
Hint: if any of the terminal commands used above are unfamiliar to you, it is a good time to learn them now.
|
||||
They will be useful throughout your future career as a programmer!
|
||||
One readily available resource is the `man` command, for instance,
```
man gunzip
```
|
||||
|
||||
The next thing is to move it to a place where you can find it from anywhere in your system.
|
||||
One standard place is
|
||||
```
mv gf /usr/local/bin
```
|
||||
If you get "permission denied", you will have to write
|
||||
```
sudo mv gf /usr/local/bin
```
|
||||
and type your computer's password.
|
||||
|
||||
Then you can try
|
||||
```
cd
gf
```
|
||||
to verify that GF works in your home directory.
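
If `gf` is not found after the move, the directory may be missing from your `PATH`. Below is a small sketch for checking this, assuming a POSIX-style shell; the helper name `on_path` is invented for this example.

```shell
# Hypothetical helper: succeed if the given directory is listed in $PATH.
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if on_path /usr/local/bin; then
  echo "/usr/local/bin is on PATH"
else
  echo "/usr/local/bin is NOT on PATH"
fi

# Show which gf executable the shell would run, if any.
command -v gf || echo "gf not found"
```

Wrapping `$PATH` in colons lets the pattern match a directory at the start, middle, or end of the list without matching a longer path that merely contains it.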
|
||||
|
||||
After that, you can test it in the course GitHub directory
|
||||
```
cd comp-syntax-gu-mlt/lectures/lecture-02
gf
> import IntroEng.gf # in GF
```
|
||||
You can work here for a while.
|
||||
The next step will be to install the RGL, but this can wait a bit.
|
||||
The instructions in https://www.grammaticalframework.org/download/index-3.11.html should work even for the ARM Mac.
|
||||
|
||||
|
||||
## Lecture 3
|
||||
|
||||
Course notes: Chapter 2, Chapter 5
|
||||
|
||||
Analysing UD data with shell commands:
|
||||
```
cat treebanks/UD_Swedish-Talbanken/sv_talbanken-ud-train.conllu | cut -f4 | grep -v "#" | sort
cat treebanks/UD_Swedish-Talbanken/sv_talbanken-ud-train.conllu | cut -f4 | grep -v "#" | sort -u
cat treebanks/UD_Swedish-Talbanken/sv_talbanken-ud-train.conllu | cut -f4 | grep -v "#" | sort -u | wc
```
|
||||
Again, make sure to learn to use these shell commands!
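
To see what each stage of the pipeline contributes, it can help to run it on a tiny input first. The sketch below uses a hand-typed CoNLL-U fragment for "the black cat sees us now" (an illustrative stand-in, not the Talbanken data); the function names are invented for this example.

```shell
# Hand-made CoNLL-U sample: tab-separated, column 4 holds the POS tag.
sample_conllu() {
  printf '# text = the black cat sees us now\n'
  printf '1\tthe\tthe\tDET\tDT\t_\t3\tdet\t_\t_\n'
  printf '2\tblack\tblack\tADJ\tJJ\t_\t3\tamod\t_\t_\n'
  printf '3\tcat\tcat\tNOUN\tNN\t_\t4\tnsubj\t_\t_\n'
  printf '4\tsees\tsee\tVERB\tVBZ\t_\t0\troot\t_\t_\n'
  printf '5\tus\twe\tPRON\tPRP\t_\t4\tobj\t_\t_\n'
  printf '6\tnow\tnow\tADV\tRB\t_\t4\tadvmod\t_\t_\n'
}

# Same stages as in the commands above:
# take column 4, drop comment lines, sort, and remove duplicates.
pos_tags() {
  cut -f4 | grep -v "#" | sort -u
}

sample_conllu | pos_tags          # one distinct POS tag per line
sample_conllu | pos_tags | wc -l  # number of distinct POS tags
```

Note that `cut` passes lines without any tab through unchanged, which is why the comment line survives until `grep -v "#"` removes it.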
|
||||
|
||||
Adding deptreepy to the pipeline:
|
||||
```
cat treebanks/UD_English-EWT/en_ewt-ud-train.conllu | ./deptreepy.py "statistics POS"
cat treebanks/UD_English-EWT/en_ewt-ud-train.conllu | ./deptreepy.py "match_wordlines (POS X)"
cat treebanks/UD_English-EWT/en_ewt-ud-train.conllu | ./deptreepy.py "statistics FEATS"
cat treebanks/UD_English-EWT/en_ewt-ud-train.conllu | ./deptreepy.py "match_wordlines (POS NOUN) | statistics FEATS"
```
|
||||
Download deptreepy and the UD treebanks, and do the same for other treebanks of other languages!
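
For comparison, the idea behind the last query (keep the NOUN word lines, then count their feature values) can be approximated with plain `awk`. This is only a rough stand-in written for illustration, not how deptreepy itself works, and the sample lines are hand-typed rather than taken from a treebank.

```shell
# Hand-made CoNLL-U word lines: column 4 is the POS tag, column 6 the features.
sample_words() {
  printf '1\tthe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t3\tdet\t_\t_\n'
  printf '2\tblack\tblack\tADJ\tJJ\tDegree=Pos\t3\tamod\t_\t_\n'
  printf '3\tcat\tcat\tNOUN\tNN\tNumber=Sing\t4\tnsubj\t_\t_\n'
  printf '4\tsees\tsee\tVERB\tVBZ\tNumber=Sing|Person=3\t0\troot\t_\t_\n'
  printf '5\tcats\tcat\tNOUN\tNNS\tNumber=Plur\t4\tobj\t_\t_\n'
}

# Keep word lines whose POS equals $1, then count each distinct FEATS value.
feats_of_pos() {
  awk -F '\t' -v pos="$1" '
    $4 == pos { count[$6]++ }
    END { for (f in count) print count[f], f }
  ' | sort
}

sample_words | feats_of_pos NOUN
```

Each output line gives a count followed by a feature string, so singular and plural nouns show up as separate rows.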
|
||||
|
||||
We confirmed a Swedish inflection table by looking up a word at https://svenska.se/ and also learned what is inherent and what is variable.
|
||||
|
||||
Started MorphologyEng.gf and MorphologySwe.gf in lecture-03/.
|
||||
|
||||
|
||||
## Lecture 7
|
||||
|
||||
We took a look at the RGL synopsis, https://www.grammaticalframework.org/lib/doc/synopsis/
|
||||
|
||||
We focused on a few things:
|
||||
- the hierarchic view of categories (Chapter 1)
|
||||
- Sentence/Clause distinction, looking at "inflection tables" of clauses in https://cloud.grammaticalframework.org/minibar/minibar.html (ResourceDemo, startcat Cl)
|
||||
- verb valencies: V, V2, V3, VA, VV, etc., and the "sense distinctions" that come with different valency patterns and are typically translated with different words
|
||||
|
||||
An example of verb valencies, "look":
|
||||
- V: look ; titta
|
||||
- V2: look at ; titta på
|
||||
- V2: look for ; leta efter
|
||||
- V2: look after ; ta hand om
|
||||
- V2: look like ; se ut som
- V3: look up ; slå upp
|
||||
- VA: look (good) ; se (bra) ut
|
||||
|
||||
We also briefly discussed complements vs. adjuncts and pointed out that they can be difficult to distinguish and that UD does not even try.
|
||||
|
||||
|
||||
## Lecture 8
|
||||
|
||||
Installing RGL: a binary release can be found in
|
||||
|
||||
https://github.com/GrammaticalFramework/gf-rgl/releases/tag/20250429
|
||||
|
||||
Steps:
|
||||
1. Download rgl-20250429.tgz
|
||||
2. Put it into some good place, for instance ~/GF or /usr/local/lib
|
||||
3. Uncompress it with `tar xvfz`
|
||||
4. The top directory created is lib, with subdirectories alltenses, prelude, present. List them to see lots of .gfo files
|
||||
5. Export the absolute path to this lib as the value of the environment variable `GF_LIB_PATH`, which GF recognizes: `export GF_LIB_PATH=/Users/aarne/GF/lib` if this is where you have placed it.
|
||||
6. This export command can also be added to your .bashrc or .zprofile, or whatever shell initialization file you have
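
Steps 5 and 6 can be sketched as a small shell function that appends the export line to an initialization file only if it is not already there. The function name and the example paths are assumptions made for illustration; substitute your own shell file and RGL location.

```shell
# Hypothetical helper: add_gf_lib_path RCFILE LIBDIR
# Appends an export of GF_LIB_PATH to RCFILE unless the same line is already present.
add_gf_lib_path() {
  rcfile=$1
  line="export GF_LIB_PATH=$2"
  touch "$rcfile"
  if ! grep -qxF "$line" "$rcfile"; then
    printf '%s\n' "$line" >> "$rcfile"
  fi
}

# Example call (commented out so that nothing is modified by accident):
# add_gf_lib_path "$HOME/.zprofile" "$HOME/GF/lib"
```

Because the check compares whole lines, running the function twice with the same arguments leaves a single export line in the file. In a newly opened terminal, `echo $GF_LIB_PATH` should then print the chosen path.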
|
||||
|
||||
When you have done this, you can test if it works in the following way:
|
||||
```
$ gf
> i alltenses/LangEng.gfo
> gr -cat=Cl | l -table
```
|
||||
We also looked at the source of the RGL, obtained by cloning https://github.com/GrammaticalFramework/gf-rgl
|
||||
The binaries can be compiled from this, if you have a Haskell compiler.
|
||||
If you don't have one, you can still keep the sources just for documentation.
|
||||
They can be imported in the GF program, but compiling the whole RGL is easier if you use `make install`, which requires Haskell.
|
||||
|
||||
|
||||
|
||||
30
lectures/2025/lecture-01/Intro.gf
Normal file
@@ -0,0 +1,30 @@
|
||||
abstract Intro = {
|
||||
|
||||
cat
|
||||
S ;
|
||||
NP ;
|
||||
VP ;
|
||||
CN ;
|
||||
AP ;
|
||||
Det ;
|
||||
Pron ;
|
||||
N ;
|
||||
A ;
|
||||
V2 ;
|
||||
|
||||
fun
|
||||
PredVP : NP -> VP -> S ;
|
||||
ComplV2 : V2 -> NP -> VP ;
|
||||
DetCN : Det -> CN -> NP ;
|
||||
AdjCN : AP -> CN -> CN ;
|
||||
UseN : N -> CN ;
|
||||
UseA : A -> AP ;
|
||||
UsePron : Pron -> NP ;
|
||||
|
||||
the_Det : Det ;
|
||||
black_A : A ;
|
||||
cat_N : N ;
|
||||
see_V2 : V2 ;
|
||||
we_Pron : Pron ;
|
||||
|
||||
}
|
||||
30
lectures/2025/lecture-01/IntroEng.gf
Normal file
@@ -0,0 +1,30 @@
|
||||
concrete IntroEng of Intro = {
|
||||
|
||||
lincat
|
||||
S = Str ;
|
||||
NP = Str ;
|
||||
VP = Str ;
|
||||
CN = Str ;
|
||||
AP = Str ;
|
||||
Det = Str ;
|
||||
Pron = Str ;
|
||||
N = Str ;
|
||||
A = Str ;
|
||||
V2 = Str ;
|
||||
|
||||
lin
|
||||
PredVP np vp = np ++ vp ;
|
||||
ComplV2 v2 np = v2 ++ np ;
|
||||
DetCN det cn = det ++ cn ;
|
||||
AdjCN ap cn = ap ++ cn ;
|
||||
UseN n = n ;
|
||||
UseA a = a ;
|
||||
UsePron pron = pron ;
|
||||
|
||||
the_Det = "the" ;
|
||||
black_A = "black" ;
|
||||
cat_N = "cat" ;
|
||||
see_V2 = "sees" ;
|
||||
we_Pron = "us" ;
|
||||
|
||||
}
|
||||
30
lectures/2025/lecture-01/IntroIta.gf
Normal file
@@ -0,0 +1,30 @@
|
||||
concrete IntroIta of Intro = {
|
||||
|
||||
lincat
|
||||
S = Str ;
|
||||
NP = Str ;
|
||||
VP = Str ;
|
||||
CN = Str ;
|
||||
AP = Str ;
|
||||
Det = Str ;
|
||||
Pron = Str ;
|
||||
N = Str ;
|
||||
A = Str ;
|
||||
V2 = Str ;
|
||||
|
||||
lin
|
||||
PredVP np vp = np ++ vp ;
|
||||
ComplV2 v2 np = np ++ v2 ;
|
||||
DetCN det cn = det ++ cn ;
|
||||
AdjCN ap cn = cn ++ ap ;
|
||||
UseN n = n ;
|
||||
UseA a = a ;
|
||||
UsePron pron = pron ;
|
||||
|
||||
the_Det = "il" ;
|
||||
black_A = "nero" ;
|
||||
cat_N = "gatto" ;
|
||||
see_V2 = "vede" ;
|
||||
we_Pron = "ci" ;
|
||||
|
||||
}
|
||||
14
lectures/2025/lecture-01/english.cf
Normal file
@@ -0,0 +1,14 @@
|
||||
S ::= NP VP ;
|
||||
NP ::= Det CN ;
|
||||
NP ::= Pron ;
|
||||
CN ::= AP CN ;
|
||||
CN ::= N ;
|
||||
VP ::= V2 NP ;
|
||||
AP ::= A ;
|
||||
|
||||
Det ::= "the" ;
|
||||
N ::= "cat" ;
|
||||
A ::= "black" ;
|
||||
V2 ::= "sees" ;
|
||||
Pron ::= "us" ;
|
||||
|
||||
30
lectures/2025/lecture-02/Intro.gf
Normal file
@@ -0,0 +1,30 @@
|
||||
abstract Intro = {
|
||||
|
||||
cat
|
||||
S ;
|
||||
NP ;
|
||||
VP ;
|
||||
CN ;
|
||||
AP ;
|
||||
Det ;
|
||||
Pron ;
|
||||
N ;
|
||||
A ;
|
||||
V2 ;
|
||||
|
||||
fun
|
||||
PredVP : NP -> VP -> S ;
|
||||
ComplV2 : V2 -> NP -> VP ;
|
||||
DetCN : Det -> CN -> NP ;
|
||||
AdjCN : AP -> CN -> CN ;
|
||||
UseN : N -> CN ;
|
||||
UseA : A -> AP ;
|
||||
UsePron : Pron -> NP ;
|
||||
|
||||
the_Det : Det ;
|
||||
black_A : A ;
|
||||
cat_N : N ;
|
||||
see_V2 : V2 ;
|
||||
we_Pron : Pron ;
|
||||
|
||||
}
|
||||
40
lectures/2025/lecture-02/IntroEng.gf
Normal file
@@ -0,0 +1,40 @@
|
||||
concrete IntroEng of Intro = {
|
||||
|
||||
lincat
|
||||
S = Str ;
|
||||
NP = {s : Case => Str ; a : Agr} ;
|
||||
VP = Agr => Str ;
|
||||
CN = Str ;
|
||||
AP = Str ;
|
||||
Det = Str ;
|
||||
Pron = {s : Case => Str ; a : Agr} ;
|
||||
N = Str ;
|
||||
A = Str ;
|
||||
V2 = Agr => Str ;
|
||||
|
||||
lin
|
||||
PredVP np vp = np.s ! Nom ++ vp ! np.a ;
|
||||
ComplV2 v2 np = table {a => v2 ! a ++ np.s ! Acc} ;
|
||||
DetCN det cn = {
|
||||
s = table {_ => det ++ cn} ;
|
||||
a = SgP3
|
||||
} ;
|
||||
AdjCN ap cn = ap ++ cn ;
|
||||
UseN n = n ;
|
||||
UseA a = a ;
|
||||
UsePron pron = pron ;
|
||||
|
||||
the_Det = "the" ;
|
||||
black_A = "black" ;
|
||||
cat_N = "cat" ;
|
||||
see_V2 = table {SgP3 => "sees" ; Other => "see"} ;
|
||||
we_Pron = {
|
||||
s = table {Nom => "we" ; Acc => "us"} ;
|
||||
a = Other
|
||||
} ;
|
||||
|
||||
param
|
||||
Case = Nom | Acc ;
|
||||
Agr = SgP3 | Other ;
|
||||
|
||||
}
|
||||
30
lectures/2025/lecture-02/IntroIta.gf
Normal file
@@ -0,0 +1,30 @@
|
||||
concrete IntroIta of Intro = {
|
||||
|
||||
lincat
|
||||
S = Str ;
|
||||
NP = Str ;
|
||||
VP = Str ;
|
||||
CN = Str ;
|
||||
AP = Str ;
|
||||
Det = Str ;
|
||||
Pron = Str ;
|
||||
N = Str ;
|
||||
A = Str ;
|
||||
V2 = Str ;
|
||||
|
||||
lin
|
||||
PredVP np vp = np ++ vp ;
|
||||
ComplV2 v2 np = np ++ v2 ;
|
||||
DetCN det cn = det ++ cn ;
|
||||
AdjCN ap cn = cn ++ ap ;
|
||||
UseN n = n ;
|
||||
UseA a = a ;
|
||||
UsePron pron = pron ;
|
||||
|
||||
the_Det = "il" ;
|
||||
black_A = "nero" ;
|
||||
cat_N = "gatto" ;
|
||||
see_V2 = "vede" ;
|
||||
we_Pron = "ci" ;
|
||||
|
||||
}
|
||||
34
lectures/2025/lecture-03/MorphologyEng.gf
Normal file
@@ -0,0 +1,34 @@
|
||||
resource MorphologyEng = {
|
||||
|
||||
param
|
||||
Number = Sg | Pl ;
|
||||
|
||||
oper
|
||||
Noun : Type = {s : Number => Str} ;
|
||||
|
||||
mkNoun : Str -> Str -> Noun = \sg, pl ->
|
||||
{s = table {Sg => sg ; Pl => pl}} ;
|
||||
|
||||
regNoun : Str -> Noun = \sg -> mkNoun sg (sg + "s") ;
|
||||
|
||||
smartNoun : Str -> Noun = \sg -> case sg of {
|
||||
_ + ("s" | "ch" | "sh") => mkNoun sg (sg + "es") ;
|
||||
_ + ("ay" | "ey" | "oy" | "uy") => regNoun sg ;
|
||||
x + "y" => mkNoun sg (x + "ies") ;
|
||||
_ => regNoun sg
|
||||
} ;
|
||||
|
||||
-- to test
|
||||
teacher_N : Noun = {s = table {Sg => "teacher" ; Pl => "teachers"}} ;
|
||||
|
||||
cat_N : Noun = mkNoun "cat" "cats" ;
|
||||
|
||||
dog_N : Noun = regNoun "dog" ;
|
||||
|
||||
bus_N : Noun = smartNoun "bus" ;
|
||||
baby_N : Noun = smartNoun "baby" ;
|
||||
fly_N : Noun = smartNoun "fly" ;
|
||||
|
||||
|
||||
}
|
||||
|
||||
61
lectures/2025/lecture-03/MorphologySwe.gf
Normal file
@@ -0,0 +1,61 @@
|
||||
resource MorphologySwe = {
|
||||
|
||||
param
|
||||
Case = Nom | Gen ;
|
||||
Definite = Ind | Def ;
|
||||
Gender = Com | Neut ;
|
||||
Number = Sg | Pl ;
|
||||
|
||||
NForm = NF Number Definite Case ; -- NF is a constructor
|
||||
|
||||
oper
|
||||
-- Noun = {s : Number => Definite => Case => Str ; g : Gender} ;
|
||||
Noun = {s : NForm => Str ; g : Gender} ;
|
||||
|
||||
mkNoun : (sin, sig, sdn, sdg, pin, pig, pdn, pdg : Str) -> Gender -> Noun =
|
||||
\sin, sig, sdn, sdg, pin, pig, pdn, pdg, g -> {
|
||||
s = table {
|
||||
NF Sg Ind Nom => sin ;
|
||||
NF Sg Ind Gen => sig ;
|
||||
NF Sg Def Nom => sdn ;
|
||||
NF Sg Def Gen => sdg ;
|
||||
NF Pl Ind Nom => pin ;
|
||||
NF Pl Ind Gen => pig ;
|
||||
NF Pl Def Nom => pdn ;
|
||||
NF Pl Def Gen => pdg
|
||||
} ;
|
||||
g = g
|
||||
} ;
|
||||
|
||||
addS : Str -> Str = \s -> case s of {
|
||||
_ + ("s" | "x" | "z") => s ;
|
||||
_ => s + "s"
|
||||
} ;
|
||||
|
||||
mk4Noun : (sin, sdn, pin, pdn : Str) -> Noun =
|
||||
\sin, sdn, pin, pdn -> {
|
||||
s = table {
|
||||
NF Sg Ind Nom => sin ;
|
||||
NF Sg Ind Gen => addS sin ;
|
||||
NF Sg Def Nom => sdn ;
|
||||
NF Sg Def Gen => addS sdn ;
|
||||
NF Pl Ind Nom => pin ;
|
||||
NF Pl Ind Gen => addS pin ;
|
||||
NF Pl Def Nom => pdn ;
|
||||
NF Pl Def Gen => addS pdn
|
||||
} ;
|
||||
g = case sdn of {
|
||||
_ + "n" => Com ;
|
||||
_ => Neut
|
||||
}
|
||||
} ;
|
||||
|
||||
smartNoun : Str -> Noun = \mamma -> case mamma of {
|
||||
mamm + "a" => mkNoun mamma (mamma + "s") (mamma + "n") (mamma + "ns")
|
||||
(mamm + "or") (mamm + "ors") (mamm + "orna") (mamm + "ornas")
|
||||
Com ;
|
||||
bil => mkNoun bil (bil + "s") (bil + "en") (bil + "ens")
|
||||
(bil + "ar") (bil + "ars") (bil + "arna") (bil + "arnas") Com
|
||||
} ;
|
||||
|
||||
}
|
||||
BIN
lectures/2025/lecture-04/slides.pdf
Normal file
21
lectures/2025/lecture-05/Agreement.gf
Normal file
@@ -0,0 +1,21 @@
|
||||
abstract Agreement = {
|
||||
cat
|
||||
NP ;
|
||||
CN ;
|
||||
N ;
|
||||
A ;
|
||||
Det ;
|
||||
|
||||
fun
|
||||
DetCN : Det -> CN -> NP ; -- this black cat
|
||||
AdjCN : A -> N -> CN ; -- black cat
|
||||
UseN : N -> CN ; -- cat
|
||||
|
||||
cat_N : N ;
|
||||
house_N : N ;
|
||||
black_A : A ;
|
||||
big_A : A ;
|
||||
-- simplification of pronouns just to make English interesting
|
||||
this_Det : Det ;
|
||||
these_Det : Det ;
|
||||
}
|
||||
32
lectures/2025/lecture-05/AgreementEng.gf
Normal file
@@ -0,0 +1,32 @@
|
||||
concrete AgreementEng of Agreement = open MorphologyEng in {
|
||||
lincat
|
||||
NP = {s: Str; n: Number} ;
|
||||
CN = Noun ;
|
||||
N = Noun ;
|
||||
A = {s: Str} ;
|
||||
Det = {s: Str; n: Number} ;
|
||||
|
||||
lin
|
||||
DetCN d cn = {
|
||||
s = d.s ++ (cn.s ! d.n) ;
|
||||
n = d.n ;
|
||||
} ;
|
||||
AdjCN a cn = {
|
||||
s = \\n => a.s ++ (cn.s ! n) ;
|
||||
} ;
|
||||
UseN n = n ;
|
||||
|
||||
cat_N = regNoun "cat" ;
|
||||
house_N = regNoun "house" ;
|
||||
black_A = {s = "black"} ;
|
||||
big_A = {s = "big"} ;
|
||||
this_Det = {
|
||||
s = "this";
|
||||
n = Sg ;
|
||||
} ;
|
||||
these_Det = {
|
||||
s = "these";
|
||||
n = Pl ;
|
||||
} ;
|
||||
|
||||
}
|
||||
42
lectures/2025/lecture-05/AgreementSwe.gf
Normal file
@@ -0,0 +1,42 @@
|
||||
concrete AgreementSwe of Agreement = open MorphologySwe in {
|
||||
lincat
|
||||
NP = {s: Str; a: NPAgreement} ;
|
||||
CN = Noun ;
|
||||
N = Noun ;
|
||||
A = Adjective ;
|
||||
Det = {s : Gender => Str; n: Number; d: Definite} ; -- and possible Definiteness
|
||||
|
||||
lin
|
||||
DetCN d cn = {
|
||||
s = (d.s ! cn.g) ++ (cn.s ! (NF d.n d.d Nom)) ;
|
||||
a = NPAgr d.n d.d cn.g ;
|
||||
} ;
|
||||
AdjCN a n = {
|
||||
s = \\nf => let agr = NPAgr (nform2number nf) (nform2definite nf) n.g
|
||||
in (a.s ! agr) ++ (n.s ! nf) ;
|
||||
g = n.g
|
||||
} ;
|
||||
UseN n = n ;
|
||||
|
||||
cat_N = mk4Noun "katt" "katten" "katter" "katterna" ;
|
||||
house_N = mk4Noun "hus" "huset" "hus" "husen" ;
|
||||
black_A = mk3Adjective "svart" "svart" "svarta" ;
|
||||
big_A = mk3Adjective "stor" "stort" "stora" ;
|
||||
this_Det = {
|
||||
s = table {
|
||||
Com => "den här" ;
|
||||
Neut => "det här"
|
||||
} ;
|
||||
n = Sg ;
|
||||
d = Def ;
|
||||
} ;
|
||||
these_Det = {
|
||||
s = table {
|
||||
Com => "de här" ;
|
||||
Neut => "de här"
|
||||
};
|
||||
n = Pl ;
|
||||
d = Def ;
|
||||
} ;
|
||||
|
||||
}
|
||||
34
lectures/2025/lecture-05/MorphologyEng.gf
Normal file
@@ -0,0 +1,34 @@
|
||||
resource MorphologyEng = {
|
||||
|
||||
param
|
||||
Number = Sg | Pl ;
|
||||
|
||||
oper
|
||||
Noun : Type = {s : Number => Str} ;
|
||||
|
||||
mkNoun : Str -> Str -> Noun = \sg, pl ->
|
||||
{s = table {Sg => sg ; Pl => pl}} ;
|
||||
|
||||
regNoun : Str -> Noun = \sg -> mkNoun sg (sg + "s") ;
|
||||
|
||||
smartNoun : Str -> Noun = \sg -> case sg of {
|
||||
_ + ("s" | "ch" | "sh") => mkNoun sg (sg + "es") ;
|
||||
_ + ("ay" | "ey" | "oy" | "uy") => regNoun sg ;
|
||||
x + "y" => mkNoun sg (x + "ies") ;
|
||||
_ => regNoun sg
|
||||
} ;
|
||||
|
||||
-- to test
|
||||
teacher_N : Noun = {s = table {Sg => "teacher" ; Pl => "teachers"}} ;
|
||||
|
||||
cat_N : Noun = mkNoun "cat" "cats" ;
|
||||
|
||||
dog_N : Noun = regNoun "dog" ;
|
||||
|
||||
bus_N : Noun = smartNoun "bus" ;
|
||||
baby_N : Noun = smartNoun "baby" ;
|
||||
fly_N : Noun = smartNoun "fly" ;
|
||||
|
||||
|
||||
}
|
||||
|
||||
85
lectures/2025/lecture-05/MorphologySwe.gf
Normal file
@@ -0,0 +1,85 @@
|
||||
resource MorphologySwe = {
|
||||
|
||||
param
|
||||
Case = Nom | Gen ;
|
||||
Definite = Ind | Def ;
|
||||
Gender = Com | Neut ;
|
||||
Number = Sg | Pl ;
|
||||
|
||||
NForm = NF Number Definite Case ; -- NF is a constructor
|
||||
NPAgreement = NPAgr Number Definite Gender ;
|
||||
|
||||
oper
|
||||
nform2number : NForm -> Number = \nf -> case nf of {
|
||||
(NF n _ _) => n
|
||||
} ;
|
||||
|
||||
nform2definite : NForm -> Definite = \nf -> case nf of {
|
||||
(NF _ d _) => d
|
||||
} ;
|
||||
|
||||
-- Noun = {s : Number => Definite => Case => Str ; g : Gender} ;
|
||||
Noun = {s : NForm => Str ; g : Gender} ;
|
||||
|
||||
Adjective = { s: NPAgreement => Str } ;
|
||||
|
||||
mkNoun : (sin, sig, sdn, sdg, pin, pig, pdn, pdg : Str) -> Gender -> Noun =
|
||||
\sin, sig, sdn, sdg, pin, pig, pdn, pdg, g -> {
|
||||
s = table {
|
||||
NF Sg Ind Nom => sin ;
|
||||
NF Sg Ind Gen => sig ;
|
||||
NF Sg Def Nom => sdn ;
|
||||
NF Sg Def Gen => sdg ;
|
||||
NF Pl Ind Nom => pin ;
|
||||
NF Pl Ind Gen => pig ;
|
||||
NF Pl Def Nom => pdn ;
|
||||
NF Pl Def Gen => pdg
|
||||
} ;
|
||||
g = g
|
||||
} ;
|
||||
|
||||
addS : Str -> Str = \s -> case s of {
|
||||
_ + ("s" | "x" | "z") => s ;
|
||||
_ => s + "s"
|
||||
} ;
|
||||
|
||||
mk3Adjective : (stor, stort, stora : Str) -> Adjective = \stor, stort, stora -> {
|
||||
s = table {
|
||||
NPAgr Sg Ind Com => stor ;
|
||||
NPAgr Sg Ind Neut => stort ;
|
||||
NPAgr Sg Def Com => stora ;
|
||||
NPAgr Sg Def Neut => stora ;
|
||||
NPAgr Pl Ind Com => stora ;
|
||||
NPAgr Pl Ind Neut => stora ;
|
||||
NPAgr Pl Def Com => stora ;
|
||||
NPAgr Pl Def Neut => stora
|
||||
}
|
||||
} ;
|
||||
|
||||
mk4Noun : (sin, sdn, pin, pdn : Str) -> Noun =
|
||||
\sin, sdn, pin, pdn -> {
|
||||
s = table {
|
||||
NF Sg Ind Nom => sin ;
|
||||
NF Sg Ind Gen => addS sin ;
|
||||
NF Sg Def Nom => sdn ;
|
||||
NF Sg Def Gen => addS sdn ;
|
||||
NF Pl Ind Nom => pin ;
|
||||
NF Pl Ind Gen => addS pin ;
|
||||
NF Pl Def Nom => pdn ;
|
||||
NF Pl Def Gen => addS pdn
|
||||
} ;
|
||||
g = case sdn of {
|
||||
_ + "n" => Com ;
|
||||
_ => Neut
|
||||
}
|
||||
} ;
|
||||
|
||||
smartNoun : Str -> Noun = \mamma -> case mamma of {
|
||||
mamm + "a" => mkNoun mamma (mamma + "s") (mamma + "n") (mamma + "ns")
|
||||
(mamm + "or") (mamm + "ors") (mamm + "orna") (mamm + "ornas")
|
||||
Com ;
|
||||
bil => mkNoun bil (bil + "s") (bil + "en") (bil + "ens")
|
||||
(bil + "ar") (bil + "ars") (bil + "arna") (bil + "arnas") Com
|
||||
} ;
|
||||
|
||||
}
|
||||
33
lectures/2025/lecture-05/examples.md
Normal file
@@ -0,0 +1,33 @@
|
||||
## Number agreement in NPs
|
||||
|
||||
| Singular | Plural |
| --- | --- |
| black cat | black cats |
| musta kissa | __mustat__ kissat |
| gatto nero | gatti __neri__ |
| schwarze Katze | schwarze Katzen |
| chat noir | chats __noirs__ |
| μαύρη γάτα | __μαύρες__ γάτες |
| czarny kot | __czarne__ koty |
| gato negro | gatos __negros__ |
| pisică neagră | pisici __negre__ |
| svart katt | __svarta__ katter |
| zwarte kat | zwarte katten |
| gato preto | gatos __pretos__ |
| черная кошка | __черные__ кошки |
|
||||
|
||||
- these black cats - de här svarta katterna
|
||||
- these black houses - de här svarta husen
|
||||
- these cats - de här katterna
|
||||
- these houses - de här husen
|
||||
- this black cat - den här svarta katten
|
||||
- this black house - det här svarta huset
|
||||
- this cat - den här katten
|
||||
- this house - det här huset
|
||||
|
||||
- big cat(s) - stor katt / stora katten / stora katter / stora katterna
|
||||
- black cat(s) - svart katt / svarta katten / svarta katter / svarta katterna
|
||||
- big house(s) - stort hus / stora huset / stora hus / stora husen
|
||||
- black house(s) - svart hus / svarta huset / svarta hus / svarta husen
|
||||
- cat - katt
|
||||
- house - hus
|
||||
94
lectures/2025/lecture-n-1/.gitignore
vendored
Normal file
@@ -0,0 +1,94 @@
|
||||
## Core latex/pdflatex auxiliary files:
|
||||
*.aux
|
||||
*.lof
|
||||
*.log
|
||||
*.lot
|
||||
*.fls
|
||||
*.out
|
||||
*.toc
|
||||
|
||||
## Intermediate documents:
|
||||
*.dvi
|
||||
# these rules might exclude image files for figures etc.
|
||||
# *.ps
|
||||
# *.eps
|
||||
# *.pdf
|
||||
|
||||
## Bibliography auxiliary files (bibtex/biblatex/biber):
|
||||
*.bbl
|
||||
*.bcf
|
||||
*.blg
|
||||
*-blx.aux
|
||||
*-blx.bib
|
||||
*.run.xml
|
||||
|
||||
## Build tool auxiliary files:
|
||||
*.fdb_latexmk
|
||||
*.synctex.gz
|
||||
*.synctex.gz(busy)
|
||||
*.pdfsync
|
||||
|
||||
## Auxiliary and intermediate files from other packages:
|
||||
|
||||
# algorithms
|
||||
*.alg
|
||||
*.loa
|
||||
|
||||
# amsthm
|
||||
*.thm
|
||||
|
||||
# beamer
|
||||
*.nav
|
||||
*.snm
|
||||
*.vrb
|
||||
|
||||
# glossaries
|
||||
*.acn
|
||||
*.acr
|
||||
*.glg
|
||||
*.glo
|
||||
*.gls
|
||||
|
||||
# hyperref
|
||||
*.brf
|
||||
|
||||
# listings
|
||||
*.lol
|
||||
|
||||
# makeidx
|
||||
*.idx
|
||||
*.ilg
|
||||
*.ind
|
||||
*.ist
|
||||
|
||||
# minitoc
|
||||
*.maf
|
||||
*.mtc
|
||||
*.mtc0
|
||||
|
||||
# minted
|
||||
*.pyg
|
||||
|
||||
# nomencl
|
||||
*.nlo
|
||||
|
||||
# sagetex
|
||||
*.sagetex.sage
|
||||
*.sagetex.py
|
||||
*.sagetex.scmd
|
||||
|
||||
# sympy
|
||||
*.sout
|
||||
*.sympy
|
||||
sympy-plots-for-*.tex/
|
||||
|
||||
# todonotes
|
||||
*.tdo
|
||||
|
||||
# xindy
|
||||
*.xdy
|
||||
|
||||
# useless files
|
||||
color_scheme.png
|
||||
identicon.png
|
||||
._wordcount_selection.tex
|
||||
187
lectures/2025/lecture-n-1/beamerthemelucid.sty
Normal file
@@ -0,0 +1,187 @@
|
||||
\usepackage{tikz}
|
||||
\usetikzlibrary{calc}
|
||||
|
||||
% -------- COLOR SCHEME --------
|
||||
\definecolor{PrimaryColor}{RGB}{7,79,140} % primary color (blue)
|
||||
\definecolor{SecondaryColor}{RGB}{242,88,26} % bulleted lists
|
||||
\definecolor{BackgroundColor}{RGB}{255,255,255} % background & titles (white)
|
||||
\definecolor{TextColor}{RGB}{0,0,0} % text (black)
|
||||
\definecolor{ProgBarBGColor}{RGB}{175,175,175} % progress bar background (grey)
|
||||
|
||||
|
||||
% set colours
|
||||
\setbeamercolor{normal text}{fg=TextColor}\usebeamercolor*{normal text}
|
||||
\setbeamercolor{alerted text}{fg=PrimaryColor}
|
||||
\setbeamercolor{section in toc}{fg=PrimaryColor}
|
||||
\setbeamercolor{structure}{fg=SecondaryColor}
|
||||
\hypersetup{colorlinks,linkcolor=,urlcolor=SecondaryColor}
|
||||
|
||||
% set fonts
|
||||
\setbeamerfont{itemize/enumerate body}{size=\large}
|
||||
\setbeamerfont{itemize/enumerate subbody}{size=\normalsize}
|
||||
\setbeamerfont{itemize/enumerate subsubbody}{size=\small}
|
||||
|
||||
% make pixelated bullets
|
||||
\setbeamertemplate{itemize item}{
|
||||
\tikz{
|
||||
\draw[fill=SecondaryColor,draw=none] (0, 0) rectangle(0.1, 0.1);
|
||||
\draw[fill=SecondaryColor,draw=none] (0.1, 0.1) rectangle(0.2, 0.2);
|
||||
\draw[fill=SecondaryColor,draw=none] (0, 0.2) rectangle(0.1, 0.3);
|
||||
}
|
||||
}
|
||||
\setbeamertemplate{itemize subitem}{
|
||||
\tikz{
|
||||
\draw[fill=SecondaryColor,draw=none] (0, 0) rectangle(0.075, 0.075);
|
||||
\draw[fill=SecondaryColor,draw=none] (0.075, 0.075) rectangle(0.15, 0.15);
|
||||
\draw[fill=SecondaryColor,draw=none] (0, 0.15) rectangle(0.075, 0.225);
|
||||
}
|
||||
}
|
||||
\setbeamertemplate{itemize subsubitem}{
|
||||
\tikz{
|
||||
\draw[fill=SecondaryColor,draw=none] (0.050, 0.050) rectangle(0.15, 0.15);
|
||||
}
|
||||
}
|
||||
|
||||
% disable navigation
|
||||
\setbeamertemplate{navigation symbols}{}
|
||||
|
||||
% disable the damn default logo!
|
||||
\setbeamertemplate{sidebar right}{}
|
||||
|
||||
% custom draw the title page above
|
||||
\setbeamertemplate{title page}{}
|
||||
|
||||
% again, manually draw the frame title above
|
||||
\setbeamertemplate{frametitle}{}
|
||||
|
||||
% disable "Figure:" in the captions
|
||||
% TODO: somehow this doesn't work for md-generated slides
|
||||
%\setbeamertemplate{caption}{\tiny\insertcaption}
|
||||
%\setbeamertemplate{caption label separator}{}
|
||||
|
||||
% add some space below the footnotes so they don't end up on the progress bar
|
||||
\setbeamertemplate{footnote}{
|
||||
\parindent 0em
|
||||
\noindent
|
||||
\raggedright
|
||||
\hbox to 0.8em{\hfil\insertfootnotemark}
|
||||
\insertfootnotetext
|
||||
\par
|
||||
\vspace{2em}
|
||||
}
|
||||
|
||||
% add the same vspace both before and after quotes
|
||||
\setbeamertemplate{quote begin}{\vspace{0.5em}}
|
||||
\setbeamertemplate{quote end}{\vspace{0.5em}}
|
||||
|
||||
% progress bar counters
|
||||
\newcounter{showProgressBar}
|
||||
\setcounter{showProgressBar}{1}
|
||||
\newcounter{showSlideNumbers}
|
||||
\setcounter{showSlideNumbers}{1}
|
||||
\newcounter{showSlideTotal}
|
||||
\setcounter{showSlideTotal}{1}
|
||||
|
||||
% use \makeatletter for our progress bar definitions
|
||||
% progress bar idea from http://tex.stackexchange.com/a/59749/44221
|
||||
% slightly adapted for visual purposes here
|
||||
\makeatletter
|
||||
\newcount\progressbar@tmpcounta% auxiliary counter
|
||||
\newcount\progressbar@tmpcountb% auxiliary counter
|
||||
\newdimen\progressbar@pbwidth %progressbar width
|
||||
\newdimen\progressbar@tmpdim % auxiliary dimension
|
||||
|
||||
\newdimen\slidewidth % auxiliary dimension
|
||||
\newdimen\slideheight % auxiliary dimension
|
||||
|
||||
% make the progress bar go across the screen
|
||||
\progressbar@pbwidth=\the\paperwidth
|
||||
\slidewidth=\the\paperwidth
|
||||
\slideheight=\the\paperheight
|
||||
|
||||
% draw everything with tikz
|
||||
\setbeamertemplate{background}{ % all slides
|
||||
% progress bar stuff
|
||||
\progressbar@tmpcounta=\insertframenumber
|
||||
\progressbar@tmpcountb=\inserttotalframenumber
|
||||
\progressbar@tmpdim=\progressbar@pbwidth
|
||||
\divide\progressbar@tmpdim by 100
|
||||
\multiply\progressbar@tmpdim by \progressbar@tmpcounta
|
||||
\divide\progressbar@tmpdim by \progressbar@tmpcountb
|
||||
\multiply\progressbar@tmpdim by 100
|
||||
|
||||
\begin{tikzpicture}
|
||||
% set up the entire slide as the canvas
|
||||
\useasboundingbox (0,0) rectangle(\the\paperwidth,\the\paperheight);
|
||||
|
||||
% background
|
||||
\fill[color=BackgroundColor] (0,0) rectangle(\the\paperwidth,\the\paperheight);
|
||||
|
||||
\ifnum\thepage=1\relax % only title slides
|
||||
% primary color rectangle
|
||||
\fill[color=PrimaryColor] (0, 4cm) rectangle(\slidewidth,\slideheight);
|
||||
|
||||
% text (title, subtitle, author, date)
|
||||
\node[anchor=south,text width=\slidewidth-1cm,inner xsep=0.5cm] at (0.5\slidewidth,4cm) {\color{BackgroundColor}\Huge\textbf{\inserttitle}};
|
||||
\node[anchor=north east,text width=\slidewidth-1cm,align=right] at (\slidewidth-0.4cm,4cm) {\color{PrimaryColor}\large\textbf{\insertsubtitle}};
|
||||
\node at (0.5\slidewidth,2cm) {\color{PrimaryColor}\LARGE\insertauthor};
|
||||
\node at (0.5\slidewidth,1.25cm) {\color{PrimaryColor}\Large\insertinstitute};
|
||||
\node[anchor=south east] at(\slidewidth,0cm) {\color{PrimaryColor}\tiny\insertdate};
|
||||
\else % other slides
|
||||
% title bar
|
||||
\fill[color=PrimaryColor] (0, \slideheight-1cm) rectangle(\slidewidth,\slideheight);
|
||||
|
||||
% slide title
|
||||
\node[anchor=north,text width=\slidewidth-0.75cm,inner xsep=0.5cm,inner ysep=0.25cm] at (0.5\slidewidth,\slideheight) {\color{BackgroundColor}\huge\textbf{\insertframetitle}};
|
||||
|
||||
% logo (TODO: autoscale; now it expects 350x350)
|
||||
\node[anchor=north east] at (\slidewidth-0.25cm,\slideheight+0.06cm){\insertlogo};
|
||||
|
||||
% show progress bar
|
||||
\ifnum \value{showProgressBar}>0\relax%
|
||||
% progress bar icon in the middle of the screen
|
||||
\draw[fill=ProgBarBGColor,draw=none] (0cm,0cm) rectangle(\slidewidth,0.25cm);
|
||||
\draw[fill=PrimaryColor,draw=none] (0cm,0cm) rectangle(\progressbar@tmpdim,0.25cm);
|
||||
|
||||
% bottom info
|
||||
\node[anchor=south west] at(0cm,0.25cm) {\color{PrimaryColor}\tiny\vphantom{lp}\insertsection};
|
||||
% if slide numbers are active
|
||||
\ifnum \value{showSlideNumbers}>0\relax%
|
||||
% if slide totals are active
|
||||
\ifnum \value{showSlideTotal}>0\relax%
|
||||
% draw both slide number and slide total
|
||||
\node[anchor=south east] at(\slidewidth,0.25cm) {\color{PrimaryColor}\tiny\insertframenumber/\inserttotalframenumber};
|
||||
\else
|
||||
\node[anchor=south east] at(\slidewidth,0.25cm) {\color{PrimaryColor}\tiny\insertframenumber};
|
||||
\fi
|
||||
\fi
|
||||
\else
|
||||
% section title in the bottom left
|
||||
\node[anchor=south west] at(0cm,0cm) {\color{PrimaryColor}\tiny\vphantom{lp}\insertsection};
|
||||
% if we're showing slide numbers
|
||||
\ifnum \value{showSlideNumbers}>0\relax%
|
||||
% if slide totals are active
|
||||
\ifnum \value{showSlideTotal}>0\relax%
|
||||
% slide number and slide total
|
||||
\node[anchor=south east] at(\slidewidth,0cm) {\color{PrimaryColor}\tiny\insertframenumber/\inserttotalframenumber};
|
||||
\else
|
||||
\node[anchor=south east] at(\slidewidth,0cm) {\color{PrimaryColor}\tiny\insertframenumber};
|
||||
\fi
|
||||
\fi
|
||||
\fi
|
||||
\fi
|
||||
\end{tikzpicture}
|
||||
}
|
||||
\makeatother
|
||||
|
||||
\AtBeginSection{\frame{\sectionpage}} % section pages
|
||||
\setbeamertemplate{section page}
|
||||
{
|
||||
\begin{tikzpicture}
|
||||
% set up the entire slide as the canvas
|
||||
\useasboundingbox (0,0) rectangle(\slidewidth,\slideheight);
|
||||
\fill[color=BackgroundColor] (-1cm, 2cm) rectangle (\slidewidth, \slideheight+0.1cm);
|
||||
\fill[color=PrimaryColor] (-1cm, 0.5\slideheight-1cm) rectangle(\slidewidth, 0.5\slideheight+1cm);
|
||||
\node[text width=\the\paperwidth-1cm,align=center] at (0.4\slidewidth, 0.5\slideheight) {\color{BackgroundColor}\Huge\textbf{\insertsection}};
|
||||
\end{tikzpicture}
|
||||
}
|
||||
BIN
lectures/2025/lecture-n-1/gu.png
Normal file
|
After Width: | Height: | Size: 81 KiB |
BIN
lectures/2025/lecture-n-1/img/argmining.png
Normal file
|
After Width: | Height: | Size: 57 KiB |
BIN
lectures/2025/lecture-n-1/img/gfast.png
Normal file
|
After Width: | Height: | Size: 64 KiB |
BIN
lectures/2025/lecture-n-1/img/machamp.png
Normal file
|
After Width: | Height: | Size: 337 KiB |
BIN
lectures/2025/lecture-n-1/img/sets.png
Normal file
|
After Width: | Height: | Size: 160 KiB |
1
lectures/2025/lecture-n-1/img/sets.svg
Normal file
|
After Width: | Height: | Size: 51 KiB |
6
lectures/2025/lecture-n-1/img/ud.conllu
Normal file
@@ -0,0 +1,6 @@
|
||||
1 the the DET DT Definite=Def|PronType=Art 3 det _ TokenRange=0:3
|
||||
2 black black ADJ JJ Degree=Pos 3 amod _ TokenRange=4:9
|
||||
3 cat cat NOUN NN Number=Sing 4 nsubj _ TokenRange=10:13
|
||||
4 sees see VERB VBZ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 root _ TokenRange=14:18
|
||||
5 us we PRON PRP Case=Acc|Number=Plur|Person=1|PronType=Prs 4 obj _ TokenRange=19:21
|
||||
6 now now ADV RB PronType=Dem 4 advmod _ SpaceAfter=No|TokenRange=22:25
|
||||
51
lectures/2025/lecture-n-1/img/ud.svg
Normal file
@@ -0,0 +1,51 @@
|
||||
<svg width="317"
|
||||
height="115"
|
||||
viewBox="0 0 317 115"
|
||||
version="1.1"
|
||||
xmlns="http://www.w3.org/2000/svg">
|
||||
<text x="5" y="108" font-size="16">the</text>
|
||||
<text x="42" y="108" font-size="16">black</text>
|
||||
<text x="97" y="108" font-size="16">cat</text>
|
||||
<text x="143" y="108" font-size="16">sees</text>
|
||||
<text x="189" y="108" font-size="16">us</text>
|
||||
<text x="235" y="108" font-size="16">now</text>
|
||||
<text x="5" y="93" font-size="10">DET</text>
|
||||
<text x="42" y="93" font-size="10">ADJ</text>
|
||||
<text x="97" y="93" font-size="10">NOUN</text>
|
||||
<text x="143" y="93" font-size="10">VERB</text>
|
||||
<text x="189" y="93" font-size="10">PRON</text>
|
||||
<text x="235" y="93" font-size="10">ADV</text>
|
||||
<path d="M 17 80 Q 17 47 50 47 L 72 47 Q 105 47 105 80"
|
||||
stroke="black"
|
||||
fill="none"/>
|
||||
<line x1="17" y1="75" x2="17" y2="80" stroke="black"/>
|
||||
<path d="M 17 80 14 74 20 74"/>
|
||||
<text x="54" y="42" font-size="10">det</text>
|
||||
<path d="M 55 80 Q 55 63 71 63 L 88 63 Q 104 63 104 80"
|
||||
stroke="black"
|
||||
fill="none"/>
|
||||
<line x1="55" y1="75" x2="55" y2="80" stroke="black"/>
|
||||
<path d="M 55 80 52 74 58 74"/>
|
||||
<text x="71" y="58" font-size="10">amod</text>
|
||||
<path d="M 110 80 Q 110 63 127 63 L 133 63 Q 150 63 150 80"
|
||||
stroke="black"
|
||||
fill="none"/>
|
||||
<line x1="110" y1="75" x2="110" y2="80" stroke="black"/>
|
||||
<path d="M 110 80 107 74 113 74"/>
|
||||
<text x="119" y="58" font-size="10">nsubj</text>
|
||||
<line x1="158" y1="20" x2="158" y2="80" stroke="black"/>
|
||||
<path d="M 158 80 155 74 161 74"/>
|
||||
<text x="163" y="28" font-size="10">root</text>
|
||||
<path d="M 166 80 Q 166 63 183 63 L 189 63 Q 206 63 206 80"
|
||||
stroke="black"
|
||||
fill="none"/>
|
||||
<line x1="206" y1="75" x2="206" y2="80" stroke="black"/>
|
||||
<path d="M 206 80 203 74 209 74"/>
|
||||
<text x="179" y="58" font-size="10">obj</text>
|
||||
<path d="M 165 80 Q 165 47 198 47 L 220 47 Q 253 47 253 80"
|
||||
stroke="black"
|
||||
fill="none"/>
|
||||
<line x1="253" y1="75" x2="253" y2="80" stroke="black"/>
|
||||
<path d="M 253 80 250 74 256 74"/>
|
||||
<text x="195" y="42" font-size="10">advmod</text>
|
||||
</svg>
|
||||
|
After Width: | Height: | Size: 2.1 KiB |
219
lectures/2025/lecture-n-1/slides.md
Normal file
@@ -0,0 +1,219 @@
|
||||
---
|
||||
title: "Training and evaluating \\newline dependency parsers"
|
||||
subtitle: "(added to the course by popular demand)"
|
||||
author: "Arianna Masciolini"
|
||||
theme: "lucid"
|
||||
logo: "gu.png"
|
||||
date: "VT25"
|
||||
institute: "LT2214 Computational Syntax"
|
||||
---
|
||||
|
||||
## Today's topic
|
||||
\bigskip \bigskip
|
||||

|
||||
|
||||
# Parsing
|
||||
|
||||
## A structured prediction task
|
||||
Sequence $\to$ structure, e.g.
|
||||
|
||||
- natural language sentence $\to$ syntax tree
|
||||
- code $\to$ AST
|
||||
- argumentative essay $\to$ argumentative structure
|
||||
- ...
|
||||
|
||||
## Example (argmining)
|
||||
|
||||
> Språkbanken has better fika than CLASP: every fika, someone bakes. Sure, CLASP has a better coffee machine. On the other hand, there are more important things than coffee. In fact, most people drink tea in the afternoon.
|
||||
|
||||
## Example (argmining)
|
||||

|
||||
|
||||
\footnotesize From "A gentle introduction to argumentation mining" (Lindahl et al., 2022)
|
||||
|
||||
# Syntactic parsing
|
||||
|
||||
## From sentence to tree
|
||||
From chapter 18 of _Speech and Language Processing_ (Jurafsky & Martin, January 2024 draft):
|
||||
|
||||
> Syntactic parsing is the task of assigning a syntactic structure to a sentence
|
||||
|
||||
- the structure is usually a _syntax tree_
|
||||
- two main classes of approaches:
|
||||
- constituency parsing (e.g. GF)
|
||||
- dependency parsing (e.g. UD)
|
||||
|
||||
## Example (GF)
|
||||
```
|
||||
MicroLang> i MicroLangEng.gf
|
||||
linking ... OK
|
||||
|
||||
Languages: MicroLangEng
|
||||
7 msec
|
||||
MicroLang> p "the black cat sees us now"
|
||||
PredVPS (DetCN the_Det (AdjCN (PositA black_A)
|
||||
(UseN cat_N))) (AdvVP (ComplV2 see_V2 (UsePron
|
||||
we_Pron)) now_Adv)
|
||||
```
|
||||
|
||||
## Example (GF)
|
||||
```haskell
|
||||
PredVPS
|
||||
(DetCN
|
||||
the_Det
|
||||
(AdjCN (PositA black_A) (UseN cat_N))
|
||||
)
|
||||
(AdvVP
|
||||
(ComplV2 see_V2 (UsePron we_Pron))
|
||||
now_Adv
|
||||
)
|
||||
```
|
||||
|
||||
## Example (GF)
|
||||

|
||||
|
||||
# Dependency parsing
|
||||
|
||||
## Example (UD)
|
||||

|
||||
|
||||
\small
|
||||
```
|
||||
1 the _ DET _ _ 3 det _ _
|
||||
2 black _ ADJ _ _ 3 amod _ _
|
||||
3 cat _ NOUN _ _ 4 nsubj _ _
|
||||
4 sees _ VERB _ _ 0 root _ _
|
||||
5 us _ PRON _ _ 4 obj _ _
|
||||
6 now _ ADV _ _ 4 advmod _ _
|
||||
```
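
A minimal sketch of how the CoNLL-U rows above could be read into Python. The `Token` type and the function name `parse_conllu_sentence` are made up for illustration. Real CoNLL-U files are tab-separated; the whitespace fallback is only an assumption that makes the slide layout parse.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    upos: str
    head: int      # 0 means "attached to the artificial root"
    deprel: str


def parse_conllu_sentence(block: str) -> list[Token]:
    """Read one sentence, skipping comments, multiword ranges and empty nodes."""
    tokens = []
    for line in block.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumption: fall back to whitespace splitting for slide-style input.
        cols = line.split("\t") if "\t" in line else line.split()
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if not cols[0].isdigit():  # IDs such as 1-2 (ranges) or 1.1 (empty nodes)
            continue
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


EXAMPLE = """
1 the _ DET _ _ 3 det _ _
2 black _ ADJ _ _ 3 amod _ _
3 cat _ NOUN _ _ 4 nsubj _ _
4 sees _ VERB _ _ 0 root _ _
5 us _ PRON _ _ 4 obj _ _
6 now _ ADV _ _ 4 advmod _ _
"""

tokens = parse_conllu_sentence(EXAMPLE)
for t in tokens:
    head_form = "ROOT" if t.head == 0 else tokens[t.head - 1].form
    print(f"{t.form} --{t.deprel}--> {head_form}")
```

Each printed line is one arc of the tree drawn above, read from dependent to head.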
|
||||
|
||||
## Two paradigms
|
||||
- __graph-based algorithms__: find the optimal tree from the set of all possible candidate solutions (or a subset of it)
|
||||
- __transition-based algorithms__: incrementally build a tree by solving a sequence of classification problems
|
||||
|
||||
## Graph-based approaches
|
||||
$$\hat{t} = \underset{t \in T(s)}{\operatorname{argmax}}\, \operatorname{score}(s,t)$$
|
||||
|
||||
- $t$: candidate tree
|
||||
- $\hat{t}$: predicted tree
|
||||
- $s$: input sentence
|
||||
- $T(s)$: set of candidate trees for $s$
|
||||
|
||||
## Complexity
|
||||
Depends on:
|
||||
|
||||
- choice of $T$ (upper bound: $n^{n-1}$, where $n$ is the number of words in $s$)
|
||||
- scoring function (in the __arc-factored model__, the score of a tree is the sum of the scores of its edges, each scored individually by a neural network)
|
||||
|
||||
|
||||
In practice: $O(n^3)$ complexity
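
To make the argmax concrete, here is a brute-force sketch of arc-factored parsing for a tiny sentence. It is not how real parsers search: it enumerates every head assignment, which is exactly the exponential blow-up that dedicated algorithms avoid. The score matrix is hand-made (a real system would get it from a neural network), and the single-root constraint is an assumption borrowed from UD.

```python
from itertools import product


def is_tree(heads):
    """heads[i] is the head of word i+1; 0 is the artificial root.

    A valid tree has exactly one word attached to the root and no cycles.
    """
    if sum(1 for h in heads if h == 0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:              # walk up towards the root
            if node in seen:          # came back to a visited word: cycle
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def tree_score(heads, arc_scores):
    """Arc-factored score: sum of the individual edge scores."""
    return sum(arc_scores[h][d] for d, h in enumerate(heads, start=1))


def best_tree(arc_scores):
    """Exhaustive argmax over T(s); only feasible for very short sentences."""
    n = len(arc_scores) - 1
    candidates = (h for h in product(range(n + 1), repeat=n) if is_tree(h))
    return max(candidates, key=lambda h: tree_score(h, arc_scores))


# arc_scores[head][dependent] for "cat sees us"; index 0 is the root.
scores = [
    [0.0, 1.0, 9.0, 1.0],   # root -> cat / sees / us
    [0.0, 0.0, 2.0, 1.0],   # cat  -> ...
    [0.0, 8.0, 0.0, 7.0],   # sees -> ...
    [0.0, 3.0, 1.0, 0.0],   # us   -> ...
]

prediction = best_tree(scores)
print(prediction, tree_score(prediction, scores))
```

With these scores the winning tree attaches "sees" to the root and both other words to "sees".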
|
||||
|
||||
## Transition-based approaches
|
||||
- trees are built through a sequence of steps, called _transitions_
|
||||
- training requires:
|
||||
- a gold-standard treebank (as for graph-based approaches)
|
||||
- an _oracle_, i.e. an algorithm that converts each tree into a gold-standard sequence of transitions
|
||||
- much more efficient: $O(n)$
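
The oracle idea can be sketched as follows. The slides do not commit to a particular transition system, so this example assumes the common arc-standard one (`SHIFT`, `LEFT-ARC`, `RIGHT-ARC`) and a simple static oracle that only handles projective trees. The function names are illustrative.

```python
from collections import Counter


def oracle(heads):
    """Turn gold heads (heads[i] = head of word i+1, 0 = root) into transitions."""
    n = len(heads)
    gold = {i + 1: h for i, h in enumerate(heads)}
    pending = Counter(heads)          # dependents each head still has to collect
    stack, buffer = [0], list(range(1, n + 1))
    transitions = []
    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            if below != 0 and gold[below] == top:
                transitions.append("LEFT-ARC")    # below becomes a dependent of top
                stack.pop(-2)
                pending[top] -= 1
                continue
            if gold[top] == below and pending[top] == 0:
                transitions.append("RIGHT-ARC")   # top becomes a dependent of below
                stack.pop()
                pending[below] -= 1
                continue
        if not buffer:
            raise ValueError("tree is non-projective for this oracle")
        transitions.append("SHIFT")
        stack.append(buffer.pop(0))
    return transitions


def run(transitions, n):
    """Replay a transition sequence and return the heads it builds."""
    stack, buffer = [0], list(range(1, n + 1))
    heads = [None] * n
    for t in transitions:
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":
            dep = stack.pop(-2)
            heads[dep - 1] = stack[-1]
        elif t == "RIGHT-ARC":
            dep = stack.pop()
            heads[dep - 1] = stack[-1]
        else:
            raise ValueError(f"unknown transition: {t}")
    return heads


gold_heads = [3, 3, 4, 0, 4, 4]       # the black cat sees us now
sequence = oracle(gold_heads)
print(sequence)
print(run(sequence, len(gold_heads)) == gold_heads)
```

In this system every word is shifted once and attached once, so a sentence of $n$ words takes $2n$ transitions. At parsing time a classifier picks each transition, which is where the linear running time comes from.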
|
||||
|
||||
## Evaluation
|
||||
2 main metrics:
|
||||
|
||||
- __UAS__ (Unlabelled Attachment Score): what fraction of nodes are attached to the correct dependency head?
|
||||
- __LAS__ (Labelled Attachment Score): what fraction of nodes are attached to the correct dependency head _with an arc labelled with the correct relation type_[^1]?
|
||||
|
||||
[^1]: in UD: the `DEPREL` column
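
As a worked example, the two metrics can be computed like this. The sketch assumes gold and predicted analyses share the same tokenization and are given as `(head, deprel)` pairs; the predicted tree is invented for illustration.

```python
def attachment_scores(gold, pred):
    """Return (UAS, LAS) for two equally long lists of (head, deprel) pairs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    total = len(gold)
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))
    correct_arcs = sum(g == p for g, p in zip(gold, pred))   # head AND label
    return correct_heads / total, correct_arcs / total


# the black cat sees us now
gold = [(3, "det"), (3, "amod"), (4, "nsubj"), (0, "root"), (4, "obj"), (4, "advmod")]
# Hypothetical parser output: wrong label on "us", wrong head on "now".
pred = [(3, "det"), (3, "amod"), (4, "nsubj"), (0, "root"), (4, "iobj"), (5, "advmod")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2%}, LAS = {las:.2%}")
```

Here five of six heads are right (UAS) but only four of six arcs also carry the right label (LAS), so LAS can never exceed UAS.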
|
||||
|
||||
# Specifics of UD parsing
|
||||
|
||||
## Not just parsing per se
|
||||
UD "parsers" typically do a lot more than dependency parsing:
|
||||
|
||||
- sentence segmentation
|
||||
- tokenization
|
||||
- lemmatization (`LEMMA` column)
|
||||
- POS tagging (`UPOS` + `XPOS`)
|
||||
- morphological tagging (`FEATS`)
|
||||
- ...
|
||||
|
||||
Sometimes, some of these tasks are performed __jointly__ to achieve better performance.
|
||||
|
||||
## Evaluation (UD-specific)
|
||||
Some more specific metrics:
|
||||
|
||||
- __CLAS__ (Content-word LAS): LAS limited to content words
|
||||
- __MLAS__ (Morphology-Aware LAS): CLAS that also uses the `FEATS` column
|
||||
- __BLEX__ (Bi-Lexical dependency score): CLAS that also uses the `LEMMA` column
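
A rough sketch of CLAS, under two assumptions that are not stated on the slide: content words are taken to be those whose relation is not one of the function-word relations listed below (nor `punct`), and gold and system output share the same tokenization. Whether a word counts as a content word is decided by each side's own label, so precision and recall can differ.

```python
# Assumption: relations treated as "not content" in this sketch.
FUNCTION_DEPRELS = {"aux", "case", "cc", "clf", "cop", "det", "mark", "punct"}


def is_content(deprel):
    """Ignore language-specific subtypes such as nsubj:pass."""
    return deprel.split(":")[0] not in FUNCTION_DEPRELS


def clas(gold, pred):
    """Return (precision, recall, F1) over content words only."""
    gold_content = {i for i, (_, rel) in enumerate(gold) if is_content(rel)}
    pred_content = {i for i, (_, rel) in enumerate(pred) if is_content(rel)}
    correct = {i for i in gold_content & pred_content if gold[i] == pred[i]}
    precision = len(correct) / len(pred_content) if pred_content else 0.0
    recall = len(correct) / len(gold_content) if gold_content else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# the black cat sees us now
gold = [(3, "det"), (3, "amod"), (4, "nsubj"), (0, "root"), (4, "obj"), (4, "advmod")]
# Hypothetical output that mislabels "the" as amod, i.e. as a content word.
pred = [(3, "amod"), (3, "amod"), (4, "nsubj"), (0, "root"), (4, "obj"), (4, "advmod")]

precision, recall, f1 = clas(gold, pred)
print(f"P = {precision:.2%}, R = {recall:.2%}, F1 = {f1:.2%}")
```

The mislabelled determiner adds a spurious content word on the system side, which lowers precision while recall stays perfect.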
|
||||
|
||||
## Evaluation script output
|
||||
\small
|
||||
```
|
||||
Metric | Precision | Recall | F1 Score | AligndAcc
|
||||
-----------+-----------+-----------+-----------+-----------
|
||||
Tokens | 100.00 | 100.00 | 100.00 |
|
||||
Sentences | 100.00 | 100.00 | 100.00 |
|
||||
Words | 100.00 | 100.00 | 100.00 |
|
||||
UPOS | 98.36 | 98.36 | 98.36 | 98.36
|
||||
XPOS | 100.00 | 100.00 | 100.00 | 100.00
|
||||
UFeats | 100.00 | 100.00 | 100.00 | 100.00
|
||||
AllTags | 98.36 | 98.36 | 98.36 | 98.36
|
||||
Lemmas | 100.00 | 100.00 | 100.00 | 100.00
|
||||
UAS | 92.73 | 92.73 | 92.73 | 92.73
|
||||
LAS | 90.30 | 90.30 | 90.30 | 90.30
|
||||
CLAS | 88.50 | 88.34 | 88.42 | 88.34
|
||||
MLAS | 86.72 | 86.56 | 86.64 | 86.56
|
||||
BLEX | 88.50 | 88.34 | 88.42 | 88.34
|
||||
```
|
||||
|
||||
## Three generations of parsers
|
||||
(all transition-based)
|
||||
|
||||
1. __MaltParser__ (Nivre et al. 2006): "classic" transition-based parser, data-driven but not NN-based
|
||||
2. __UDPipe__: neural parser, personal favorite
|
||||
- v1 (Straka et al. 2016): fast, solid software, easy to install and available anywhere
|
||||
- v2 (Straka et al. 2018): much better results but slower and only available through an API/via the web GUI
|
||||
3. __MaChAmp__ (van der Goot et al. 2021): transformer-based toolkit for multi-task learning, works on all CoNLL-like data, close to the SOTA, relatively easy to install and train
|
||||
|
||||
## MaChAmp config example
|
||||
```json
|
||||
{"compsyn": {
|
||||
"train_data_path": "PATH-TO-YOUR-TRAIN-SPLIT",
|
||||
"dev_data_path": "PATH-TO-YOUR-DEV-SPLIT",
|
||||
"word_idx": 1,
|
||||
"tasks": {
|
||||
"upos": {
|
||||
"task_type": "seq",
|
||||
"column_idx": 3
|
||||
},
|
||||
"dependency": {
|
||||
"task_type": "dependency",
|
||||
"column_idx": 6}}}}
|
||||
```
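
The indices in the config are easier to read next to the CoNLL-U column names. The sketch below rebuilds the same configuration as a Python dictionary and reports which column each index points to, assuming the indices are 0-based. The helper `describe` is illustrative and not part of MaChAmp.

```python
import json

CONLLU_COLUMNS = ["ID", "FORM", "LEMMA", "UPOS", "XPOS",
                  "FEATS", "HEAD", "DEPREL", "DEPS", "MISC"]

config = {
    "compsyn": {
        "train_data_path": "PATH-TO-YOUR-TRAIN-SPLIT",
        "dev_data_path": "PATH-TO-YOUR-DEV-SPLIT",
        "word_idx": 1,
        "tasks": {
            "upos": {"task_type": "seq", "column_idx": 3},
            "dependency": {"task_type": "dependency", "column_idx": 6},
        },
    }
}


def describe(config):
    """Map each task name to the CoNLL-U column its column_idx refers to."""
    columns = {}
    for dataset, settings in config.items():
        print(f"{dataset}: words read from {CONLLU_COLUMNS[settings['word_idx']]}")
        for task, task_settings in settings["tasks"].items():
            columns[task] = CONLLU_COLUMNS[task_settings["column_idx"]]
            print(f"  {task} ({task_settings['task_type']}) -> {columns[task]}")
    return columns


task_columns = describe(config)
print(json.dumps(config, indent=2))   # same content as the JSON on the slide
```

Under that reading, `word_idx` points at `FORM`, the tagging task at `UPOS`, and the dependency task at `HEAD` (with the relation label presumably taken from the neighbouring `DEPREL` column).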
|
||||
|
||||
## Your task (lab 3)
|
||||

|
||||
|
||||
1. annotate a small treebank for your language of choice (started yesterday)
|
||||
2. __train a parser-tagger on a reference UD treebank__ (tomorrow, or maybe even today: installation)
|
||||
3. evaluate it on your treebank
|
||||
|
||||
# To learn more
|
||||
|
||||
## Main sources
|
||||
- chapters 18-19 of the January 2024 draft of _Speech and Language Processing_ (Jurafsky & Martin) (full text available [__here__](https://web.stanford.edu/~jurafsky/slp3/))
|
||||
- unit 3-2 of Johansson & Kuhlmann's course "Deep Learning for Natural Language Processing" ([__slides and videos__](https://liu-nlp.ai/dl4nlp/modules/module3/))
|
||||
- section 10.9.2 on parser evaluation from Aarne's course notes (on Canvas)
|
||||
|
||||
## Papers describing the parsers
|
||||
- _MaltParser: A Data-Driven Parser-Generator for Dependency Parsing_ (Nivre et al. 2006) ([__PDF__](http://lrec-conf.org/proceedings/lrec2006/pdf/162_pdf.pdf))
|
||||
- _UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing_ (Straka et al. 2016) ([__PDF__](https://aclanthology.org/L16-1680.pdf))
|
||||
- _UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task_ (Straka et al. 2018) ([__PDF__](https://aclanthology.org/K18-2020.pdf))
|
||||
- _Massive Choice, Ample Tasks (MACHAMP): A Toolkit for Multi-task Learning in NLP_ (van der Goot et al., 2021) ([__PDF__](https://arxiv.org/pdf/2005.14672))
|
||||
|
||||
## CSE courses you may like
|
||||
1. [DIT231](https://www.gu.se/en/study-gothenburg/programming-language-technology-dit231) Programming language technology
|
||||
- build a complete compiler
|
||||
2. [DIT301](https://www.gu.se/en/study-gothenburg/compiler-construction-dit301) Compiler construction
|
||||
- the hardcore version of 1.
|
||||
- build another compiler _and optimize it_
|
||||
3. DIT247 Machine learning for NLP (?)
|
||||
- has a module on dependency parsing similar to the one in "Deep Learning for Natural Language Processing"
|
||||
BIN
lectures/2025/lecture-n-1/slides.pdf
Normal file
94
lectures/2025/lecture-n/arianna/.gitignore
vendored
Normal file
@@ -0,0 +1,94 @@
|
||||
## Core latex/pdflatex auxiliary files:
|
||||
*.aux
|
||||
*.lof
|
||||
*.log
|
||||
*.lot
|
||||
*.fls
|
||||
*.out
|
||||
*.toc
|
||||
|
||||
## Intermediate documents:
|
||||
*.dvi
|
||||
# these rules might exclude image files for figures etc.
|
||||
# *.ps
|
||||
# *.eps
|
||||
# *.pdf
|
||||
|
||||
## Bibliography auxiliary files (bibtex/biblatex/biber):
|
||||
*.bbl
|
||||
*.bcf
|
||||
*.blg
|
||||
*-blx.aux
|
||||
*-blx.bib
|
||||
*.run.xml
|
||||
|
||||
## Build tool auxiliary files:
|
||||
*.fdb_latexmk
|
||||
*.synctex.gz
|
||||
*.synctex.gz(busy)
|
||||
*.pdfsync
|
||||
|
||||
## Auxiliary and intermediate files from other packages:
|
||||
|
||||
# algorithms
|
||||
*.alg
|
||||
*.loa
|
||||
|
||||
# amsthm
|
||||
*.thm
|
||||
|
||||
# beamer
|
||||
*.nav
|
||||
*.snm
|
||||
*.vrb
|
||||
|
||||
# glossaries
|
||||
*.acn
|
||||
*.acr
|
||||
*.glg
|
||||
*.glo
|
||||
*.gls
|
||||
|
||||
# hyperref
|
||||
*.brf
|
||||
|
||||
# listings
|
||||
*.lol
|
||||
|
||||
# makeidx
|
||||
*.idx
|
||||
*.ilg
|
||||
*.ind
|
||||
*.ist
|
||||
|
||||
# minitoc
|
||||
*.maf
|
||||
*.mtc
|
||||
*.mtc0
|
||||
|
||||
# minted
|
||||
*.pyg
|
||||
|
||||
# nomencl
|
||||
*.nlo
|
||||
|
||||
# sagetex
|
||||
*.sagetex.sage
|
||||
*.sagetex.py
|
||||
*.sagetex.scmd
|
||||
|
||||
# sympy
|
||||
*.sout
|
||||
*.sympy
|
||||
sympy-plots-for-*.tex/
|
||||
|
||||
# todonotes
|
||||
*.tdo
|
||||
|
||||
# xindy
|
||||
*.xdy
|
||||
|
||||
# useless files
|
||||
color_scheme.png
|
||||
identicon.png
|
||||
._wordcount_selection.tex
|
||||
187
lectures/2025/lecture-n/arianna/beamerthemelucid.sty
Normal file
@@ -0,0 +1,187 @@
|
||||
\usepackage{tikz}
|
||||
\usetikzlibrary{calc}
|
||||
|
||||
% -------- COLOR SCHEME --------
|
||||
\definecolor{PrimaryColor}{RGB}{7,79,140} % primary color (blue)
|
||||
\definecolor{SecondaryColor}{RGB}{242,88,26} % bulleted lists
|
||||
\definecolor{BackgroundColor}{RGB}{255,255,255} % background & titles (white)
|
||||
\definecolor{TextColor}{RGB}{0,0,0} % text (black)
|
||||
\definecolor{ProgBarBGColor}{RGB}{175,175,175} % progress bar background (grey)
|
||||
|
||||
|
||||
% set colours
|
||||
\setbeamercolor{normal text}{fg=TextColor}\usebeamercolor*{normal text}
|
||||
\setbeamercolor{alerted text}{fg=PrimaryColor}
|
||||
\setbeamercolor{section in toc}{fg=PrimaryColor}
|
||||
\setbeamercolor{structure}{fg=SecondaryColor}
|
||||
\hypersetup{colorlinks,linkcolor=,urlcolor=SecondaryColor}
|
||||
|
||||
% set fonts
|
||||
\setbeamerfont{itemize/enumerate body}{size=\large}
|
||||
\setbeamerfont{itemize/enumerate subbody}{size=\normalsize}
|
||||
\setbeamerfont{itemize/enumerate subsubbody}{size=\small}
|
||||
|
||||
% make pixelated bullets
|
||||
\setbeamertemplate{itemize item}{
|
||||
\tikz{
|
||||
\draw[fill=SecondaryColor,draw=none] (0, 0) rectangle(0.1, 0.1);
|
||||
\draw[fill=SecondaryColor,draw=none] (0.1, 0.1) rectangle(0.2, 0.2);
|
||||
\draw[fill=SecondaryColor,draw=none] (0, 0.2) rectangle(0.1, 0.3);
|
||||
}
|
||||
}
|
||||
\setbeamertemplate{itemize subitem}{
|
||||
\tikz{
|
||||
\draw[fill=SecondaryColor,draw=none] (0, 0) rectangle(0.075, 0.075);
|
||||
\draw[fill=SecondaryColor,draw=none] (0.075, 0.075) rectangle(0.15, 0.15);
|
||||
\draw[fill=SecondaryColor,draw=none] (0, 0.15) rectangle(0.075, 0.225);
|
||||
}
|
||||
}
|
||||
\setbeamertemplate{itemize subsubitem}{
|
||||
\tikz{
|
||||
\draw[fill=SecondaryColor,draw=none] (0.050, 0.050) rectangle(0.15, 0.15);
|
||||
}
|
||||
}
|
||||
|
||||
% disable navigation
|
||||
\setbeamertemplate{navigation symbols}{}
|
||||
|
||||
% disable the damn default logo!
|
||||
\setbeamertemplate{sidebar right}{}
|
||||
|
||||
% custom draw the title page above
|
||||
\setbeamertemplate{title page}{}
|
||||
|
||||
% again, manually draw the frame title above
|
||||
\setbeamertemplate{frametitle}{}
|
||||
|
||||
% disable "Figure:" in the captions
|
||||
% TODO: somehow this doesn't work for md-generated slides
|
||||
%\setbeamertemplate{caption}{\tiny\insertcaption}
|
||||
%\setbeamertemplate{caption label separator}{}
|
||||
|
||||
% add some space below the footnotes so they don't end up on the progress bar
|
||||
\setbeamertemplate{footnote}{
|
||||
\parindent 0em
|
||||
\noindent
|
||||
\raggedright
|
||||
\hbox to 0.8em{\hfil\insertfootnotemark}
|
||||
\insertfootnotetext
|
||||
\par
|
||||
\vspace{2em}
|
||||
}
|
||||
|
||||
% add the same vspace both before and after quotes
|
||||
\setbeamertemplate{quote begin}{\vspace{0.5em}}
|
||||
\setbeamertemplate{quote end}{\vspace{0.5em}}
|
||||
|
||||
% progress bar counters
|
||||
\newcounter{showProgressBar}
|
||||
\setcounter{showProgressBar}{1}
|
||||
\newcounter{showSlideNumbers}
|
||||
\setcounter{showSlideNumbers}{1}
|
||||
\newcounter{showSlideTotal}
|
||||
\setcounter{showSlideTotal}{1}
|
||||
|
||||
% use \makeatletter for our progress bar definitions
|
||||
% progress bar idea from http://tex.stackexchange.com/a/59749/44221
|
||||
% slightly adapted for visual purposes here
|
||||
\makeatletter
|
||||
\newcount\progressbar@tmpcounta% auxiliary counter
|
||||
\newcount\progressbar@tmpcountb% auxiliary counter
|
||||
\newdimen\progressbar@pbwidth %progressbar width
|
||||
\newdimen\progressbar@tmpdim % auxiliary dimension
|
||||
|
||||
\newdimen\slidewidth % auxiliary dimension
|
||||
\newdimen\slideheight % auxiliary dimension
|
||||
|
||||
% make the progress bar go across the screen
|
||||
\progressbar@pbwidth=\the\paperwidth
|
||||
\slidewidth=\the\paperwidth
|
||||
\slideheight=\the\paperheight
|
||||
|
||||
% draw everything with tikz
|
||||
\setbeamertemplate{background}{ % all slides
|
||||
% progress bar stuff
|
||||
\progressbar@tmpcounta=\insertframenumber
|
||||
\progressbar@tmpcountb=\inserttotalframenumber
|
||||
\progressbar@tmpdim=\progressbar@pbwidth
|
||||
\divide\progressbar@tmpdim by 100
|
||||
\multiply\progressbar@tmpdim by \progressbar@tmpcounta
|
||||
\divide\progressbar@tmpdim by \progressbar@tmpcountb
|
||||
\multiply\progressbar@tmpdim by 100
|
||||
|
||||
\begin{tikzpicture}
|
||||
% set up the entire slide as the canvas
|
||||
\useasboundingbox (0,0) rectangle(\the\paperwidth,\the\paperheight);
|
||||
|
||||
% background
|
||||
\fill[color=BackgroundColor] (0,0) rectangle(\the\paperwidth,\the\paperheight);
|
||||
|
||||
\ifnum\thepage=1\relax % only title slides
|
||||
% primary color rectangle
|
||||
\fill[color=PrimaryColor] (0, 4cm) rectangle(\slidewidth,\slideheight);
|
||||
|
||||
% text (title, subtitle, author, date)
|
||||
\node[anchor=south,text width=\slidewidth-1cm,inner xsep=0.5cm] at (0.5\slidewidth,4cm) {\color{BackgroundColor}\Huge\textbf{\inserttitle}};
|
||||
\node[anchor=north east,text width=\slidewidth-1cm,align=right] at (\slidewidth-0.4cm,4cm) {\color{PrimaryColor}\large\textbf{\insertsubtitle}};
|
||||
\node at (0.5\slidewidth,2cm) {\color{PrimaryColor}\LARGE\insertauthor};
|
||||
\node at (0.5\slidewidth,1.25cm) {\color{PrimaryColor}\Large\insertinstitute};
|
||||
\node[anchor=south east] at(\slidewidth,0cm) {\color{PrimaryColor}\tiny\insertdate};
|
||||
\else % other slides
|
||||
% title bar
|
||||
\fill[color=PrimaryColor] (0, \slideheight-1cm) rectangle(\slidewidth,\slideheight);
|
||||
|
||||
% slide title
|
||||
\node[anchor=north,text width=\slidewidth-0.75cm,inner xsep=0.5cm,inner ysep=0.25cm] at (0.5\slidewidth,\slideheight) {\color{BackgroundColor}\huge\textbf{\insertframetitle}};
|
||||
|
||||
% logo (TODO: autoscale; for now it expects a 350x350 image)
|
||||
\node[anchor=north east] at (\slidewidth-0.25cm,\slideheight+0.06cm){\insertlogo};
|
||||
|
||||
% show progress bar
|
||||
\ifnum \value{showProgressBar}>0\relax%
|
||||
% progress bar icon in the middle of the screen
|
||||
\draw[fill=ProgBarBGColor,draw=none] (0cm,0cm) rectangle(\slidewidth,0.25cm);
|
||||
\draw[fill=PrimaryColor,draw=none] (0cm,0cm) rectangle(\progressbar@tmpdim,0.25cm);
|
||||
|
||||
% bottom info
|
||||
\node[anchor=south west] at(0cm,0.25cm) {\color{PrimaryColor}\tiny\vphantom{lp}\insertsection};
|
||||
% if slide numbers are active
|
||||
\ifnum \value{showSlideNumbers}>0\relax%
|
||||
% if slide totals are active
|
||||
\ifnum \value{showSlideTotal}>0\relax%
|
||||
% draw both slide number and slide total
|
||||
\node[anchor=south east] at(\slidewidth,0.25cm) {\color{PrimaryColor}\tiny\insertframenumber/\inserttotalframenumber};
|
||||
\else
|
||||
\node[anchor=south east] at(\slidewidth,0.25cm) {\color{PrimaryColor}\tiny\insertframenumber};
|
||||
\fi
|
||||
\fi
|
||||
\else
|
||||
% section title in the bottom left
|
||||
\node[anchor=south west] at(0cm,0cm) {\color{PrimaryColor}\tiny\vphantom{lp}\insertsection};
|
||||
% if we're showing slide numbers
|
||||
\ifnum \value{showSlideNumbers}>0\relax%
|
||||
% if slide totals are active
|
||||
\ifnum \value{showSlideTotal}>0\relax%
|
||||
% slide number and slide total
|
||||
\node[anchor=south east] at(\slidewidth,0cm) {\color{PrimaryColor}\tiny\insertframenumber/\inserttotalframenumber};
|
||||
\else
|
||||
\node[anchor=south east] at(\slidewidth,0cm) {\color{PrimaryColor}\tiny\insertframenumber};
|
||||
\fi
|
||||
\fi
|
||||
\fi
|
||||
\fi
|
||||
\end{tikzpicture}
|
||||
}
|
||||
\makeatother
|
||||
|
||||
\AtBeginSection{\frame{\sectionpage}} % section pages
|
||||
\setbeamertemplate{section page}
|
||||
{
|
||||
\begin{tikzpicture}
|
||||
% set up the entire slide as the canvas
|
||||
\useasboundingbox (0,0) rectangle(\slidewidth,\slideheight);
|
||||
\fill[color=BackgroundColor] (-1cm, 2cm) rectangle (\slidewidth, \slideheight+0.1cm);
|
||||
\fill[color=PrimaryColor] (-1cm, 0.5\slideheight-1cm) rectangle(\slidewidth, 0.5\slideheight+1cm);
|
||||
\node[text width=\the\paperwidth-1cm,align=center] at (0.4\slidewidth, 0.5\slideheight) {\color{BackgroundColor}\Huge\textbf{\insertsection}};
|
||||
\end{tikzpicture}
|
||||
}
|
||||
BIN
lectures/2025/lecture-n/arianna/gu.png
Normal file
|
After Width: | Height: | Size: 81 KiB |
BIN
lectures/2025/lecture-n/arianna/img/cda.png
Normal file
|
After Width: | Height: | Size: 258 KiB |
24
lectures/2025/lecture-n/arianna/img/ex.conllu
Normal file
@@ -0,0 +1,24 @@
|
||||
# generator = UDPipe 2, https://lindat.mff.cuni.cz/services/udpipe
|
||||
# udpipe_model = swedish-talbanken-ud-2.15-241121
|
||||
# udpipe_model_licence = CC BY-NC-SA
|
||||
# newdoc
|
||||
# newpar
|
||||
# sent_id = 1
|
||||
# text = den är smog salt och det bra för all kropen
|
||||
1 den den PRON PN|UTR|SIN|DEF|SUB/OBJ Definite=Def|Gender=Com|Number=Sing|PronType=Prs 4 nsubj _ TokenRange=0:3
|
||||
2 är vara AUX VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 4 cop _ TokenRange=4:6
|
||||
3 smog smog ADV AB _ 4 advmod _ TokenRange=7:11
|
||||
4 salt salt ADJ JJ|POS|UTR|SIN|IND|NOM Case=Nom|Definite=Ind|Degree=Pos|Number=Sing 0 root _ TokenRange=12:16
|
||||
5 och och CCONJ KN _ 7 cc _ TokenRange=17:20
|
||||
6 det den PRON PN|NEU|SIN|DEF|SUB/OBJ Definite=Def|Gender=Neut|Number=Sing|PronType=Prs 7 nsubj _ TokenRange=21:24
|
||||
7 bra bra ADJ JJ|POS|UTR/NEU|SIN/PLU|IND/DEF|NOM Case=Nom|Degree=Pos 4 conj _ TokenRange=25:28
|
||||
8 för för ADP PP _ 10 case _ TokenRange=29:32
|
||||
9 all all DET DT|UTR|SIN|IND/DEF Gender=Com|Number=Sing|PronType=Tot 10 det _ TokenRange=33:36
|
||||
10 kropen krop NOUN NN|UTR|SIN|DEF|NOM Case=Nom|Definite=Def|Gender=Com|Number=Sing 7 obl _ SpaceAfter=No|TokenRange=37:43
|
||||
|
||||
1 Självklart självklar ADV JJ|POS|NEU|SIN|IND|NOM Degree=Pos 0 root _ ORIG_LABEL=root
|
||||
2 att att SCONJ SN _ 5 mark _ CorrectionLabels=S-Clause
|
||||
3 det den PRON PN|NEU|SIN|DEF|SUB/OBJ Definite=Def|Gender=Neut|Number=Sing|PronType=Prs 5 nsubj _ _
|
||||
4 är vara AUX VB|PRS|AKT Mood=Ind|Tense=Pres|VerbForm=Fin|Voice=Act 5 cop _ CorrectionLabels=S-Clause
|
||||
5 viktigt viktig ADJ JJ|POS|NEU|SIN|IND|NOM Case=Nom|Definite=Ind|Degree=Pos|Gender=Neut|Number=Sing 1 csubj _ _
|
||||
6 . . PUNCT MAD _ 1 punct _ _
|
||||
BIN
lectures/2025/lecture-n/arianna/img/ex.pdf
Normal file
111
lectures/2025/lecture-n/arianna/img/ex.tex
Normal file
@@ -0,0 +1,111 @@
|
||||
\documentclass{article}
|
||||
\usepackage[a4paper,margin=0.5in,landscape]{geometry}
|
||||
\usepackage[utf8]{inputenc}
|
||||
\begin{document}
|
||||
%% den är smog salt och det bra för all kropen
|
||||
\setlength{\unitlength}{0.2mm}
|
||||
\begin{picture}(531.0,110.0)
|
||||
\put(0.0,0.0){den}
|
||||
\put(46.0,0.0){är}
|
||||
\put(83.0,0.0){smog}
|
||||
\put(129.0,0.0){salt}
|
||||
\put(175.0,0.0){och}
|
||||
\put(230.0,0.0){det}
|
||||
\put(276.0,0.0){bra}
|
||||
\put(313.0,0.0){för}
|
||||
\put(350.0,0.0){all}
|
||||
\put(387.0,0.0){kropen}
|
||||
\put(0.0,15.0){{\tiny PRON}}
|
||||
\put(46.0,15.0){{\tiny AUX}}
|
||||
\put(83.0,15.0){{\tiny ADV}}
|
||||
\put(129.0,15.0){{\tiny ADJ}}
|
||||
\put(175.0,15.0){{\tiny CCONJ}}
|
||||
\put(230.0,15.0){{\tiny PRON}}
|
||||
\put(276.0,15.0){{\tiny ADJ}}
|
||||
\put(313.0,15.0){{\tiny ADP}}
|
||||
\put(350.0,15.0){{\tiny DET}}
|
||||
\put(387.0,15.0){{\tiny NOUN}}
|
||||
\put(0.0,-11.0){{\scriptsize {\slshape den}}}
|
||||
\put(46.0,-11.0){{\scriptsize {\slshape vara}}}
|
||||
\put(83.0,-11.0){{\scriptsize {\slshape smog}}}
|
||||
\put(129.0,-11.0){{\scriptsize {\slshape salt}}}
|
||||
\put(175.0,-11.0){{\scriptsize {\slshape och}}}
|
||||
\put(230.0,-11.0){{\scriptsize {\slshape den}}}
|
||||
\put(276.0,-11.0){{\scriptsize {\slshape bra}}}
|
||||
\put(313.0,-11.0){{\scriptsize {\slshape för}}}
|
||||
\put(350.0,-11.0){{\scriptsize {\slshape all}}}
|
||||
\put(387.0,-11.0){{\scriptsize {\slshape krop}}}
|
||||
\put(74.5,30.0){\oval(126.67441860465117,100.0)[t]}
|
||||
\put(11.162790697674417,35.0){\vector(0,-1){5.0}}
|
||||
\put(63.25,83.0){{\tiny nsubj}}
|
||||
\put(97.5,30.0){\oval(79.3855421686747,66.66666666666667)[t]}
|
||||
\put(57.80722891566265,35.0){\vector(0,-1){5.0}}
|
||||
\put(90.75,66.33333333333334){{\tiny cop}}
|
||||
\put(116.0,30.0){\oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(96.26086956521739,35.0){\vector(0,-1){5.0}}
|
||||
\put(102.5,49.66666666666667){{\tiny advmod}}
|
||||
\put(144.0,110.0){\vector(0,-1){80.0}}
|
||||
\put(149.0,100.0){{\tiny root}}
|
||||
\put(235.5,30.0){\oval(98.02970297029702,66.66666666666667)[t]}
|
||||
\put(186.4851485148515,35.0){\vector(0,-1){5.0}}
|
||||
\put(231.0,66.33333333333334){{\tiny cc}}
|
||||
\put(263.0,30.0){\oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(243.26086956521738,35.0){\vector(0,-1){5.0}}
|
||||
\put(251.75,49.66666666666667){{\tiny nsubj}}
|
||||
\put(222.5,30.0){\oval(144.9591836734694,100.0)[t]}
|
||||
\put(294.9795918367347,35.0){\vector(0,-1){5.0}}
|
||||
\put(213.5,83.0){{\tiny conj}}
|
||||
\put(360.0,30.0){\oval(69.94594594594595,66.66666666666667)[t]}
|
||||
\put(325.02702702702703,35.0){\vector(0,-1){5.0}}
|
||||
\put(351.0,66.33333333333334){{\tiny case}}
|
||||
\put(378.5,30.0){\oval(28.89189189189189,33.333333333333336)[t]}
|
||||
\put(364.05405405405406,35.0){\vector(0,-1){5.0}}
|
||||
\put(371.75,49.66666666666667){{\tiny det}}
|
||||
\put(351.5,30.0){\oval(108.29729729729729,100.0)[t]}
|
||||
\put(405.64864864864865,35.0){\vector(0,-1){5.0}}
|
||||
\put(344.75,83.0){{\tiny obl}}
|
||||
\end{picture}
|
||||
|
||||
|
||||
\vspace{4mm}
|
||||
%% Självklart att det är viktigt .
|
||||
\setlength{\unitlength}{0.2mm}
|
||||
\begin{picture}(406.0,150.0)
|
||||
\put(0.0,0.0){Självklart}
|
||||
\put(100.0,0.0){att}
|
||||
\put(155.0,0.0){det}
|
||||
\put(201.0,0.0){är}
|
||||
\put(238.0,0.0){viktigt}
|
||||
\put(311.0,0.0){.}
|
||||
\put(0.0,15.0){{\tiny ADV}}
|
||||
\put(100.0,15.0){{\tiny SCONJ}}
|
||||
\put(155.0,15.0){{\tiny PRON}}
|
||||
\put(201.0,15.0){{\tiny AUX}}
|
||||
\put(238.0,15.0){{\tiny ADJ}}
|
||||
\put(311.0,15.0){{\tiny PUNCT}}
|
||||
\put(0.0,-11.0){{\scriptsize {\slshape självklar}}}
|
||||
\put(100.0,-11.0){{\scriptsize {\slshape att}}}
|
||||
\put(155.0,-11.0){{\scriptsize {\slshape den}}}
|
||||
\put(201.0,-11.0){{\scriptsize {\slshape vara}}}
|
||||
\put(238.0,-11.0){{\scriptsize {\slshape viktig}}}
|
||||
\put(311.0,-11.0){{\scriptsize {\slshape .}}}
|
||||
\put(15.0,150.0){\vector(0,-1){120.0}}
|
||||
\put(20.0,140.0){{\tiny root}}
|
||||
\put(179.0,30.0){\oval(135.82608695652175,100.0)[t]}
|
||||
\put(111.08695652173913,35.0){\vector(0,-1){5.0}}
|
||||
\put(170.0,83.0){{\tiny mark}}
|
||||
\put(206.5,30.0){\oval(79.3855421686747,66.66666666666667)[t]}
|
||||
\put(166.80722891566265,35.0){\vector(0,-1){5.0}}
|
||||
\put(195.25,66.33333333333334){{\tiny nsubj}}
|
||||
\put(229.5,30.0){\oval(28.89189189189189,33.333333333333336)[t]}
|
||||
\put(215.05405405405406,35.0){\vector(0,-1){5.0}}
|
||||
\put(222.75,49.66666666666667){{\tiny cop}}
|
||||
\put(139.0,30.0){\oval(236.73949579831933,133.33333333333334)[t]}
|
||||
\put(257.3697478991597,35.0){\vector(0,-1){5.0}}
|
||||
\put(127.75,99.66666666666667){{\tiny csubj}}
|
||||
\put(175.5,30.0){\oval(310.0353697749196,166.66666666666666)[t]}
|
||||
\put(330.51768488745984,35.0){\vector(0,-1){5.0}}
|
||||
\put(164.25,116.33333333333333){{\tiny punct}}
|
||||
\end{picture}
|
||||
|
||||
\end{document}
|
||||
BIN
lectures/2025/lecture-n/arianna/img/l1l2.png
Normal file
|
After Width: | Height: | Size: 60 KiB |
637
lectures/2025/lecture-n/arianna/slides.md
Normal file
@@ -0,0 +1,637 @@
|
||||
---
|
||||
title: "UD as an annotation standard \\newline for learner language"
|
||||
subtitle: "a case study on L2 Swedish"
|
||||
author: "Arianna Masciolini"
|
||||
theme: "lucid"
|
||||
logo: "gu.png"
|
||||
date: "VT25"
|
||||
institute: "LT2214 Computational Syntax"
|
||||
---
|
||||
|
||||
## Learner data
|
||||
|
||||
<!--see any problems?-->
|
||||
|
||||
\bigskip \bigskip
|
||||
|
||||
### English (FCE)
|
||||
\small
|
||||
```xml
|
||||
I also suggest that more plays and films should
|
||||
<ns type="RV"> <ns type="FV"><i>be taken</i><c>take</c>
|
||||
</ns> place</ns>.
|
||||
```
|
||||
|
||||
### Italian (VALICO)
|
||||
\small
|
||||
```xml
|
||||
Finse <MC><i>aveva paura</i><c>che aveva paura</c>
|
||||
</MC> di un <DN><i>rapito</i><c>rapimento</c></DN>.
|
||||
```
|
||||
|
||||
### Swedish (SweLL)
|
||||
\small
|
||||
```xml
|
||||
<sentence> <w ref="1">"</w> <w ref="2" target_form="Det"
|
||||
correction_label="L-Ref">Den</w> <w ref="3">är</w>
|
||||
<w ref="4">en</w> <w ref="5">tredjedel</w>
|
||||
<w ref="6">av</w> <w ref="7">din</w> <w ref="8">dag</w>
|
||||
<w ref="9">!</w> </sentence>
|
||||
```
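
For readers who want to poke at this kind of annotation, here is a small sketch that reads the Swedish snippet above with Python's standard XML parser. It only relies on the attributes visible in the example (`ref`, `target_form`, `correction_label`) and assumes nothing else about the SweLL format; the function names are made up.

```python
import xml.etree.ElementTree as ET

SENTENCE = '''<sentence> <w ref="1">"</w> <w ref="2" target_form="Det"
correction_label="L-Ref">Den</w> <w ref="3">är</w>
<w ref="4">en</w> <w ref="5">tredjedel</w>
<w ref="6">av</w> <w ref="7">din</w> <w ref="8">dag</w>
<w ref="9">!</w> </sentence>'''


def corrections(xml_sentence):
    """List (ref, original, target, label) for every corrected token."""
    root = ET.fromstring(xml_sentence)
    return [
        (w.get("ref"), w.text, w.get("target_form"), w.get("correction_label"))
        for w in root.iter("w")
        if w.get("target_form") is not None
    ]


def normalized(xml_sentence):
    """Rebuild the corrected token sequence, keeping uncorrected tokens as they are."""
    root = ET.fromstring(xml_sentence)
    return [w.get("target_form", w.text) for w in root.iter("w")]


print(corrections(SENTENCE))
print(" ".join(normalized(SENTENCE)))
```

The output pairs the learner's form with its correction and label, which is all the information this annotation style provides about the error.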
|
||||
|
||||
## The problems
|
||||
- coarse-grained error labels
|
||||
- exclusive focus on errors
|
||||
- lots of manual annotation needed
|
||||
- lack of interoperability between corpora
|
||||
|
||||
## The solution: UD
|
||||
- fine-grained morphosyntactic annotation <!--answers the first two-->
|
||||
- parsers
|
||||
- cross-linguistic consistency $\to$ possibility to compare:
|
||||
- L2 vs. standard
|
||||
- L1 vs. L2
|
||||
- different L2s
|
||||
|
||||
## L1-L2 treebanks
|
||||
|
||||
<!--I did not come up with this, or actually I did, but someone else had already-->
|
||||
|
||||

|
||||
|
||||
\bigskip
|
||||
|
||||
- L2 sentences $\parallel$ correction hypotheses <!--explain hypotheses-->
|
||||
- no explicit error tagging
|
||||
|
||||
<!--I love parallel treebanks btw - concept-alignment thesis with Aarne-->
|
||||
|
||||
## UD treebanks of learner language
|
||||
\bigskip
|
||||
|
||||
| **language** | **name** | **size** | **status** | **parallel** |
|
||||
| ----------: | --------- | -------: | :-----------: | :--------: |
|
||||
| Chinese | CFL | 451 | released | **yes**\*\* |
|
||||
| English | ESL | 5124 | retired\* | **yes** |
|
||||
| English | ESLSpok | 2320 | released | no |
|
||||
| Italian | Valico | 398 | released | **yes** |
|
||||
| Korean | KSL | 12977 | released | no |
|
||||
| Russian | ? | 500 | WIP | **yes** |
|
||||
| \color{SecondaryColor}Swedish | \color{SecondaryColor}SweLL | \color{SecondaryColor}\~5000 | \color{SecondaryColor}WIP | \color{SecondaryColor}**yes** |
|
||||
|
||||
\footnotesize \*available for download but not part of the latest UD release
|
||||
\newline\footnotesize \*\*only L2 half available
|
||||
|
||||
## Challenges
|
||||
| **expectations** | **reality** |
|
||||
| -----: | :----- |
|
||||
| fine-grained annotation | when the validator allows that |
|
||||
| parsers | don't work terribly well |
|
||||
| cross-linguistic consistency | is limited to error-free spans |
|
||||
|
||||
## The `root` of the problem
|
||||
The UD guidelines are designed with standard language in mind
|
||||
|
||||
- should we annotate the intended meaning (correction) and/or the observed language use?
|
||||
- how to handle mismatches between the characteristics of individual tokens and their use in context?
|
||||
|
||||
# Treebanking SweLL
|
||||
|
||||
## Source corpus
|
||||
__SweLL-gold__, aka the Swedish Learner Language corpus:
|
||||
|
||||
- __genre__: essays (misc topics)
|
||||
- __learners__: adult L2 Swedish learners with various language backgrounds and proficiency levels
|
||||
- __annotation__: error tagging, pseudonymization and normalization (minimal edits)
|
||||
- __license__: CLARIN-ID -PRIV \underline{-NORED} -BY
|
||||
|
||||
## Example 0
|
||||
\setlength{\unitlength}{0.20mm}
|
||||
\begin{picture}(406.0,150.0)
|
||||
\put(0.0,0.0){Självklart}
|
||||
\put(100.0,0.0){\bfseries att}
|
||||
\put(155.0,0.0){\bfseries det}
|
||||
\put(201.0,0.0){\bfseries är}
|
||||
\put(238.0,0.0){viktigt}
|
||||
\put(311.0,0.0){.}
|
||||
\put(0.0,-11.0){{\scriptsize {\slshape of.course}}}
|
||||
\put(100.0,-11.0){{\scriptsize {\slshape that}}}
|
||||
\put(155.0,-11.0){{\scriptsize {\slshape it}}}
|
||||
\put(201.0,-11.0){{\scriptsize {\slshape is}}}
|
||||
\put(238.0,-11.0){{\scriptsize {\slshape important}}}
|
||||
\put(311.0,-11.0){{\scriptsize {\slshape .}}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "Självklart __är det__ viktigt."
|
||||
- \small translation: "Of course it is important."
|
||||
|
||||
## Example 0
|
||||
\setlength{\unitlength}{0.20mm}
|
||||
\begin{picture}(406.0,150.0)
|
||||
\put(0.0,0.0){Självklart}
|
||||
\put(100.0,0.0){\bfseries att}
|
||||
\put(155.0,0.0){\bfseries det}
|
||||
\put(201.0,0.0){\bfseries är}
|
||||
\put(238.0,0.0){viktigt}
|
||||
\put(311.0,0.0){.}
|
||||
\put(0.0,15.0){{\tiny ADV}}
|
||||
\put(100.0,15.0){{\tiny SCONJ}}
|
||||
\put(155.0,15.0){{\tiny PRON}}
|
||||
\put(201.0,15.0){{\tiny AUX}}
|
||||
\put(238.0,15.0){{\tiny ADJ}}
|
||||
\put(311.0,15.0){{\tiny PUNCT}}
|
||||
\put(0.0,-11.0){{\scriptsize {\slshape of.course}}}
|
||||
\put(100.0,-11.0){{\scriptsize {\slshape that}}}
|
||||
\put(155.0,-11.0){{\scriptsize {\slshape it}}}
|
||||
\put(201.0,-11.0){{\scriptsize {\slshape is}}}
|
||||
\put(238.0,-11.0){{\scriptsize {\slshape important}}}
|
||||
\put(311.0,-11.0){{\scriptsize {\slshape .}}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "Självklart __är det__ viktigt."
|
||||
- \small translation: "Of course it is important."
|
||||
|
||||
## Example 0
|
||||
\setlength{\unitlength}{0.20mm}
|
||||
\begin{picture}(406.0,150.0)
|
||||
\put(0.0,0.0){Självklart}
|
||||
\put(100.0,0.0){\bfseries att}
|
||||
\put(155.0,0.0){\bfseries det}
|
||||
\put(201.0,0.0){\bfseries är}
|
||||
\put(238.0,0.0){viktigt}
|
||||
\put(311.0,0.0){.}
|
||||
\put(0.0,15.0){{\tiny ADV}}
|
||||
\put(100.0,15.0){{\tiny SCONJ}}
|
||||
\put(155.0,15.0){{\tiny PRON}}
|
||||
\put(201.0,15.0){{\tiny AUX}}
|
||||
\put(238.0,15.0){{\tiny ADJ}}
|
||||
\put(311.0,15.0){{\tiny PUNCT}}
|
||||
\put(0.0,-11.0){{\scriptsize {\slshape of.course}}}
|
||||
\put(100.0,-11.0){{\scriptsize {\slshape that}}}
|
||||
\put(155.0,-11.0){{\scriptsize {\slshape it}}}
|
||||
\put(201.0,-11.0){{\scriptsize {\slshape is}}}
|
||||
\put(238.0,-11.0){{\scriptsize {\slshape important}}}
|
||||
\put(311.0,-11.0){{\scriptsize {\slshape .}}}
|
||||
\put(15.0,150.0){\vector(0,-1){120.0}}
|
||||
\put(20.0,140.0){{\tiny root}}
|
||||
\put(179.0,30.0){\oval(135.82608695652175,100.0)[t]}
|
||||
\put(111.08695652173913,35.0){\vector(0,-1){5.0}}
|
||||
\put(170.0,83.0){{\tiny mark}}
|
||||
\put(206.5,30.0){\oval(79.3855421686747,66.66666666666667)[t]}
|
||||
\put(166.80722891566265,35.0){\vector(0,-1){5.0}}
|
||||
\put(195.25,66.33333333333334){{\tiny nsubj}}
|
||||
\put(229.5,30.0){\oval(28.89189189189189,33.333333333333336)[t]}
|
||||
\put(215.05405405405406,35.0){\vector(0,-1){5.0}}
|
||||
\put(222.75,49.66666666666667){{\tiny cop}}
|
||||
\put(139.0,30.0){\oval(236.73949579831933,133.33333333333334)[t]}
|
||||
\put(257.3697478991597,35.0){\vector(0,-1){5.0}}
|
||||
\put(127.75,99.66666666666667){{\tiny csubj}}
|
||||
\put(175.5,30.0){\oval(310.0353697749196,166.66666666666666)[t]}
|
||||
\put(330.51768488745984,35.0){\vector(0,-1){5.0}}
|
||||
\put(164.25,116.33333333333333){{\tiny punct}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "Självklart __är det__ viktigt."
|
||||
- \small translation: "Of course it is important."
|
||||
|
||||
## Example 1
|
||||
\setlength{\unitlength}{0.23mm}
|
||||
\begin{picture}(409.0,130.0)
|
||||
\put(0.0,0.0){Jag}
|
||||
\put(46.0,0.0){hade}
|
||||
\put(92.0,0.0){\bfseries emotskänslor}
|
||||
\put(200.0,0.0){fast}
|
||||
\put(270.0,0.0){jag}
|
||||
\put(311.0,0.0){\bfseries var}
|
||||
\put(348.0,0.0){\bfseries vänta}
|
||||
\put(403.0,0.0){det}
|
||||
\put(0.0,-11.0){{\scriptsize {\slshape I}}}
|
||||
\put(46.0,-11.0){{\scriptsize {\slshape had}}}
|
||||
\put(92.0,-11.0){{\scriptsize {\slshape againstfeelings}}}
|
||||
\put(200.0,-11.0){{\scriptsize {\slshape although}}}
|
||||
\put(270.0,-11.0){{\scriptsize {\slshape I}}}
|
||||
\put(311.0,-11.0){{\scriptsize {\slshape was}}}
|
||||
\put(348.0,-11.0){{\scriptsize {\slshape wait}}}
|
||||
\put(403.0,-11.0){{\scriptsize {\slshape that}}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "Jag hade __motstridiga känslor__ fast jag __hade väntat mig__ det"
|
||||
- \small translation: "I had mixed feelings although I was expecting that"
|
||||
|
||||
## Example 1
|
||||
\setlength{\unitlength}{0.23mm}
|
||||
\begin{picture}(409.0,130.0)
|
||||
\put(0.0,0.0){Jag}
|
||||
\put(46.0,0.0){hade}
|
||||
\put(92.0,0.0){\bfseries emotskänslor}
|
||||
\put(200.0,0.0){fast}
|
||||
\put(270.0,0.0){jag}
|
||||
\put(311.0,0.0){\bfseries var}
|
||||
\put(348.0,0.0){\bfseries vänta}
|
||||
\put(403.0,0.0){det}
|
||||
\put(0.0,15.0){{\tiny PRON}}
|
||||
\put(46.0,15.0){{\tiny VERB}}
|
||||
\put(92.0,15.0){{\tiny NOUN}}
|
||||
\put(200.0,15.0){{\tiny SCONJ}}
|
||||
\put(270.0,15.0){{\tiny PRON}}
|
||||
\put(311.0,15.0){{\tiny AUX}}
|
||||
\put(348.0,15.0){{\tiny VERB}}
|
||||
\put(403.0,15.0){{\tiny PRON}}
|
||||
\put(0.0,-11.0){{\scriptsize {\slshape I}}}
|
||||
\put(46.0,-11.0){{\scriptsize {\slshape had}}}
|
||||
\put(92.0,-11.0){{\scriptsize {\slshape againstfeelings}}}
|
||||
\put(200.0,-11.0){{\scriptsize {\slshape although}}}
|
||||
\put(270.0,-11.0){{\scriptsize {\slshape I}}}
|
||||
\put(311.0,-11.0){{\scriptsize {\slshape was}}}
|
||||
\put(348.0,-11.0){{\scriptsize {\slshape wait}}}
|
||||
\put(403.0,-11.0){{\scriptsize {\slshape that}}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "Jag hade __motstridiga känslor__ fast jag __hade väntat mig__ det"
|
||||
- \small translation: "I had mixed feelings although I was expecting that"
|
||||
|
||||
## Example 1
|
||||
\setlength{\unitlength}{0.23mm}
|
||||
\begin{picture}(409.0,130.0)
|
||||
\put(0.0,0.0){Jag}
|
||||
\put(46.0,0.0){hade}
|
||||
\put(92.0,0.0){\bfseries emotskänslor}
|
||||
\put(200.0,0.0){fast}
|
||||
\put(270.0,0.0){jag}
|
||||
\put(311.0,0.0){\bfseries var}
|
||||
\put(348.0,0.0){\bfseries vänta}
|
||||
\put(403.0,0.0){det}
|
||||
\put(0.0,15.0){{\tiny PRON}}
|
||||
\put(46.0,15.0){{\tiny VERB}}
|
||||
\put(92.0,15.0){{\tiny NOUN}}
|
||||
\put(200.0,15.0){{\tiny SCONJ}}
|
||||
\put(270.0,15.0){{\tiny PRON}}
|
||||
\put(311.0,15.0){{\tiny AUX}}
|
||||
\put(348.0,15.0){{\tiny VERB}}
|
||||
\put(403.0,15.0){{\tiny PRON}}
|
||||
\put(0.0,-11.0){{\scriptsize {\slshape I}}}
|
||||
\put(46.0,-11.0){{\scriptsize {\slshape had}}}
|
||||
\put(92.0,-11.0){{\scriptsize {\slshape againstfeelings}}}
|
||||
\put(200.0,-11.0){{\scriptsize {\slshape although}}}
|
||||
\put(270.0,-11.0){{\scriptsize {\slshape I}}}
|
||||
\put(311.0,-11.0){{\scriptsize {\slshape was}}}
|
||||
\put(348.0,-11.0){{\scriptsize {\slshape wait}}}
|
||||
\put(403.0,-11.0){{\scriptsize {\slshape that}}}
|
||||
\put(33.0,30.0){\oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(13.26086956521739,35.0){\vector(0,-1){5.0}}
|
||||
\put(21.75,49.66666666666667){{\tiny nsubj}}
|
||||
\put(61.0,130.0){\vector(0,-1){100.0}}
|
||||
\put(66.0,120.0){{\tiny root}}
|
||||
\put(89.0,30.0){\oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(108.73913043478261,35.0){\vector(0,-1){5.0}}
|
||||
\put(82.25,49.66666666666667){{\tiny obj}}
|
||||
\put(289.0,30.0){\oval(135.82608695652175,100.0)[t]}
|
||||
\put(221.08695652173913,35.0){\vector(0,-1){5.0}}
|
||||
\put(280.0,83.0){{\tiny mark}}
|
||||
\put(316.5,30.0){\oval(79.3855421686747,66.66666666666667)[t]}
|
||||
\put(276.8072289156627,35.0){\vector(0,-1){5.0}}
|
||||
\put(305.25,66.33333333333334){{\tiny nsubj}}
|
||||
\put(339.5,30.0){\oval(28.89189189189189,33.333333333333336)[t]}
|
||||
\put(325.05405405405406,35.0){\vector(0,-1){5.0}}
|
||||
\put(332.75,49.66666666666667){{\tiny \bfseries ?}}
|
||||
\put(217.0,30.0){\oval(301.0066225165563,133.33333333333334)[t]}
|
||||
\put(367.50331125827813,35.0){\vector(0,-1){5.0}}
|
||||
\put(205.75,99.66666666666667){{\tiny advcl}}
|
||||
\put(395.5,30.0){\oval(49.54545454545455,33.333333333333336)[t]}
|
||||
\put(420.27272727272725,35.0){\vector(0,-1){5.0}}
|
||||
\put(388.75,49.66666666666667){{\tiny obj}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "Jag hade __motstridiga känslor__ fast jag __hade väntat mig__ det"
|
||||
- \small translation: "I had mixed feelings although I was expecting that"
|
||||
|
||||
## Example 1
|
||||
\setlength{\unitlength}{0.23mm}
|
||||
\begin{picture}(409.0,130.0)
|
||||
\put(0.0,0.0){Jag}
|
||||
\put(46.0,0.0){hade}
|
||||
\put(92.0,0.0){\bfseries emotskänslor}
|
||||
\put(200.0,0.0){fast}
|
||||
\put(270.0,0.0){jag}
|
||||
\put(311.0,0.0){\bfseries var}
|
||||
\put(348.0,0.0){\bfseries vänta}
|
||||
\put(403.0,0.0){det}
|
||||
\put(0.0,15.0){{\tiny PRON}}
|
||||
\put(46.0,15.0){{\tiny VERB}}
|
||||
\put(92.0,15.0){{\tiny NOUN}}
|
||||
\put(200.0,15.0){{\tiny SCONJ}}
|
||||
\put(270.0,15.0){{\tiny PRON}}
|
||||
\put(311.0,15.0){{\tiny AUX}}
|
||||
\put(348.0,15.0){{\tiny VERB}}
|
||||
\put(403.0,15.0){{\tiny PRON}}
|
||||
\put(0.0,-11.0){{\scriptsize {\slshape I}}}
|
||||
\put(46.0,-11.0){{\scriptsize {\slshape had}}}
|
||||
\put(92.0,-11.0){{\scriptsize {\slshape againstfeelings}}}
|
||||
\put(200.0,-11.0){{\scriptsize {\slshape although}}}
|
||||
\put(270.0,-11.0){{\scriptsize {\slshape I}}}
|
||||
\put(311.0,-11.0){{\scriptsize {\slshape was}}}
|
||||
\put(348.0,-11.0){{\scriptsize {\slshape wait}}}
|
||||
\put(403.0,-11.0){{\scriptsize {\slshape that}}}
|
||||
\put(33.0,30.0){\oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(13.26086956521739,35.0){\vector(0,-1){5.0}}
|
||||
\put(21.75,49.66666666666667){{\tiny nsubj}}
|
||||
\put(61.0,130.0){\vector(0,-1){100.0}}
|
||||
\put(66.0,120.0){{\tiny root}}
|
||||
\put(89.0,30.0){\oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(108.73913043478261,35.0){\vector(0,-1){5.0}}
|
||||
\put(82.25,49.66666666666667){{\tiny obj}}
|
||||
\put(289.0,30.0){\oval(135.82608695652175,100.0)[t]}
|
||||
\put(221.08695652173913,35.0){\vector(0,-1){5.0}}
|
||||
\put(280.0,83.0){{\tiny mark}}
|
||||
\put(316.5,30.0){\oval(79.3855421686747,66.66666666666667)[t]}
|
||||
\put(276.8072289156627,35.0){\vector(0,-1){5.0}}
|
||||
\put(305.25,66.33333333333334){{\tiny nsubj}}
|
||||
\put(339.5,30.0){\oval(28.89189189189189,33.333333333333336)[t]}
|
||||
\put(325.05405405405406,35.0){\vector(0,-1){5.0}}
|
||||
\put(332.75,49.66666666666667){{\tiny \bfseries aux:*}}
|
||||
\put(217.0,30.0){\oval(301.0066225165563,133.33333333333334)[t]}
|
||||
\put(367.50331125827813,35.0){\vector(0,-1){5.0}}
|
||||
\put(205.75,99.66666666666667){{\tiny advcl}}
|
||||
\put(395.5,30.0){\oval(49.54545454545455,33.333333333333336)[t]}
|
||||
\put(420.27272727272725,35.0){\vector(0,-1){5.0}}
|
||||
\put(388.75,49.66666666666667){{\tiny obj}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "Jag hade __motstridiga känslor__ fast jag __hade väntat mig__ det"
|
||||
- \small translation: "I had mixed feelings although I was expecting that"
|
||||
|
||||
## Example 2
|
||||
\setlength{\unitlength}{0.23mm}
|
||||
\begin{picture}(195.0,110.0)
|
||||
\put(0.0,0.0){en}
|
||||
\put(37.0,0.0){lång}
|
||||
\put(83.0,0.0){\bfseries bus}
|
||||
\put(129.0,0.0){\bfseries resa}
|
||||
\put(0.0,-13.0){{\scriptsize {\slshape a}}}
|
||||
\put(37.0,-13.0){{\scriptsize {\slshape long}}}
|
||||
\put(83.0,-13.0){{\scriptsize {\slshape bus}}}
|
||||
\put(129.0,-13.0){{\scriptsize {\slshape trip}}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "en lång __bussresa__"
|
||||
- \small translation: "a long bus trip"
|
||||
|
||||
## Example 2
|
||||
\setlength{\unitlength}{0.23mm}
|
||||
\begin{picture}(195.0,110.0)
|
||||
\put(0.0,0.0){en}
|
||||
\put(37.0,0.0){lång}
|
||||
\put(83.0,0.0){\bfseries bus}
|
||||
\put(129.0,0.0){\bfseries resa}
|
||||
\put(0.0,15.0){{\tiny DET}}
|
||||
\put(37.0,15.0){{\tiny ADJ}}
|
||||
\put(83.0,15.0){{\tiny NOUN}}
|
||||
\put(129.0,15.0){{\tiny NOUN}}
|
||||
\put(0.0,-13.0){{\scriptsize {\slshape a}}}
|
||||
\put(37.0,-13.0){{\scriptsize {\slshape long}}}
|
||||
\put(83.0,-13.0){{\scriptsize {\slshape bus}}}
|
||||
\put(129.0,-13.0){{\scriptsize {\slshape trip}}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "en lång __bussresa__"
|
||||
- \small translation: "a long bus trip"
|
||||
|
||||
## Example 2
|
||||
\setlength{\unitlength}{0.23mm}
|
||||
\begin{picture}(195.0,110.0)
|
||||
\put(0.0,0.0){en}
|
||||
\put(37.0,0.0){lång}
|
||||
\put(83.0,0.0){\bfseries bus}
|
||||
\put(129.0,0.0){\bfseries resa}
|
||||
\put(0.0,15.0){{\tiny DET}}
|
||||
\put(37.0,15.0){{\tiny ADJ}}
|
||||
\put(83.0,15.0){{\tiny NOUN}}
|
||||
\put(129.0,15.0){{\tiny NOUN}}
|
||||
\put(0.0,-13.0){{\scriptsize {\slshape a}}}
|
||||
\put(37.0,-13.0){{\scriptsize {\slshape long}}}
|
||||
\put(83.0,-13.0){{\scriptsize {\slshape bus}}}
|
||||
\put(129.0,-13.0){{\scriptsize {\slshape trip}}}
|
||||
\put(74.5,30.0){\oval(126.67441860465117,100.0)[t]}
|
||||
\put(11.162790697674417,35.0){\vector(0,-1){5.0}}
|
||||
\put(67.75,83.0){{\tiny det}}
|
||||
\put(93.0,30.0){\oval(88.73913043478261,66.66666666666667)[t]}
|
||||
\put(48.630434782608695,35.0){\vector(0,-1){5.0}}
|
||||
\put(84.0,66.33333333333334){{\tiny amod}}
|
||||
\put(116.0,30.0){\oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(96.26086956521739,35.0){\vector(0,-1){5.0}}
|
||||
\put(75.5,49.66666666666667){{\tiny compound:*}}
|
||||
\put(144.0,110.0){\vector(0,-1){80.0}}
|
||||
\put(149.0,100.0){{\tiny root}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "en lång __bussresa__"
|
||||
- \small translation: "a long bus trip"
|
||||
|
||||
## Example 3
|
||||
\setlength{\unitlength}{0.23mm}
|
||||
\begin{picture}(531.0,110.0)
|
||||
\small
|
||||
\put(0.0,0.0){\bfseries den}
|
||||
\put(46.0,0.0){\bfseries är}
|
||||
\put(83.0,0.0){\bfseries smog}
|
||||
\put(129.0,0.0){salt}
|
||||
\put(175.0,0.0){och}
|
||||
\put(230.0,0.0){det}
|
||||
\put(276.0,0.0){bra}
|
||||
\put(313.0,0.0){för}
|
||||
\put(350.0,0.0){\bfseries all}
|
||||
\put(387.0,0.0){\bfseries kropen}
|
||||
\put(0.0,15.0){{\tiny PRON}}
|
||||
\put(46.0,15.0){{\tiny AUX}}
|
||||
\put(83.0,15.0){{\tiny NOUN}}
|
||||
\put(129.0,15.0){{\tiny NOUN}}
|
||||
\put(175.0,15.0){{\tiny CCONJ}}
|
||||
\put(230.0,15.0){{\tiny PRON}}
|
||||
\put(276.0,15.0){{\tiny ADJ}}
|
||||
\put(313.0,15.0){{\tiny ADP}}
|
||||
\put(350.0,15.0){{\tiny DET}}
|
||||
\put(387.0,15.0){{\tiny NOUN}}
|
||||
\put(0.0,-13.0){{\scriptsize {\slshape it}}}
|
||||
\put(46.0,-13.0){{\scriptsize {\slshape is}}}
|
||||
\put(83.0,-13.0){{\scriptsize {\slshape taste?}}}
|
||||
\put(129.0,-13.0){{\scriptsize {\slshape salt}}}
|
||||
\put(175.0,-13.0){{\scriptsize {\slshape and}}}
|
||||
\put(230.0,-13.0){{\scriptsize {\slshape it}}}
|
||||
\put(276.0,-13.0){{\scriptsize {\slshape good}}}
|
||||
\put(313.0,-13.0){{\scriptsize {\slshape for}}}
|
||||
\put(350.0,-13.0){{\scriptsize {\slshape all}}}
|
||||
\put(387.0,-13.0){{\scriptsize {\slshape the.body}}}
|
||||
\put(51.5,30.0){\oval(79.3855421686747,66.66666666666667)[t]}
|
||||
\put(11.807228915662648,35.0){\vector(0,-1){5.0}}
|
||||
\put(40.25,66.33333333333334){{\tiny nsubj}}
|
||||
\put(74.5,30.0){\oval(28.89189189189189,33.333333333333336)[t]}
|
||||
\put(60.054054054054056,35.0){\vector(0,-1){5.0}}
|
||||
\put(67.75,49.66666666666667){{\tiny cop}}
|
||||
\put(98.0,110.0){\vector(0,-1){80.0}}
|
||||
\put(103.0,100.0){{\tiny root}}
|
||||
\put(126.0,30.0){\oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(145.73913043478262,35.0){\vector(0,-1){5.0}}
|
||||
\put(117.0,49.66666666666667){{\tiny nmod}}
|
||||
\put(235.5,30.0){\oval(98.02970297029702,66.66666666666667)[t]}
|
||||
\put(186.4851485148515,35.0){\vector(0,-1){5.0}}
|
||||
\put(231.0,66.33333333333334){{\tiny cc}}
|
||||
\put(263.0,30.0){\oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(243.26086956521738,35.0){\vector(0,-1){5.0}}
|
||||
\put(251.75,49.66666666666667){{\tiny nsubj}}
|
||||
\put(199.5,30.0){\oval(191.4455958549223,100.0)[t]}
|
||||
\put(295.22279792746116,35.0){\vector(0,-1){5.0}}
|
||||
\put(190.5,83.0){{\tiny conj}}
|
||||
\put(360.0,30.0){\oval(69.94594594594595,66.66666666666667)[t]}
|
||||
\put(325.02702702702703,35.0){\vector(0,-1){5.0}}
|
||||
\put(351.0,66.33333333333334){{\tiny case}}
|
||||
\put(378.5,30.0){\oval(28.89189189189189,33.333333333333336)[t]}
|
||||
\put(364.05405405405406,35.0){\vector(0,-1){5.0}}
|
||||
\put(371.75,49.66666666666667){{\tiny det}}
|
||||
\put(351.5,30.0){\oval(108.29729729729729,100.0)[t]}
|
||||
\put(405.64864864864865,35.0){\vector(0,-1){5.0}}
|
||||
\put(344.75,83.0){{\tiny obl}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip
|
||||
|
||||
- \small correction: "__Det smakar__ salt och det __är__ bra för __hela kroppen__"
|
||||
- \small translation: "it tastes salty and it's good for the whole body"
|
||||
|
||||
## Example 3: parser output
|
||||
\setlength{\unitlength}{0.23mm}
|
||||
\begin{picture}(531.0,110.0)
|
||||
\put(0.0,0.0){\bfseries den}
|
||||
\put(46.0,0.0){\bfseries är}
|
||||
\put(83.0,0.0){\bfseries smog}
|
||||
\put(129.0,0.0){salt}
|
||||
\put(175.0,0.0){och}
|
||||
\put(230.0,0.0){det}
|
||||
\put(276.0,0.0){bra}
|
||||
\put(313.0,0.0){för}
|
||||
\put(350.0,0.0){\bfseries all}
|
||||
\put(387.0,0.0){\bfseries kropen}
|
||||
\put(0.0,15.0){{\tiny PRON}}
|
||||
\put(46.0,15.0){{\tiny AUX}}
|
||||
\put(83.0,15.0){{\tiny \color{SecondaryColor} ADV}}
|
||||
\put(129.0,15.0){{\tiny \color{SecondaryColor} ADJ}}
|
||||
\put(175.0,15.0){{\tiny CCONJ}}
|
||||
\put(230.0,15.0){{\tiny PRON}}
|
||||
\put(276.0,15.0){{\tiny ADJ}}
|
||||
\put(313.0,15.0){{\tiny ADP}}
|
||||
\put(350.0,15.0){{\tiny DET}}
|
||||
\put(387.0,15.0){{\tiny NOUN}}
|
||||
\put(74.5,30.0){\oval(126.67441860465117,100.0)[t]}
|
||||
\put(11.162790697674417,35.0){\vector(0,-1){5.0}}
|
||||
\put(63.25,83.0){{\tiny nsubj}}
|
||||
\put(97.5,30.0){\oval(79.3855421686747,66.66666666666667)[t]}
|
||||
\put(57.80722891566265,35.0){\vector(0,-1){5.0}}
|
||||
\put(90.75,66.33333333333334){{\tiny cop}}
|
||||
\put(116.0,30.0){\color{SecondaryColor} \oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(96.26086956521739,35.0){\color{SecondaryColor} \vector(0,-1){5.0}}
|
||||
\put(102.5,49.66666666666667){{\tiny \color{SecondaryColor} advmod}}
|
||||
\put(144.0,110.0){\color{SecondaryColor} \vector(0,-1){80.0}}
|
||||
\put(149.0,100.0){{\tiny \color{SecondaryColor} root}}
|
||||
\put(235.5,30.0){\oval(98.02970297029702,66.66666666666667)[t]}
|
||||
\put(186.4851485148515,35.0){\vector(0,-1){5.0}}
|
||||
\put(231.0,66.33333333333334){{\tiny cc}}
|
||||
\put(263.0,30.0){\oval(39.47826086956522,33.333333333333336)[t]}
|
||||
\put(243.26086956521738,35.0){\vector(0,-1){5.0}}
|
||||
\put(251.75,49.66666666666667){{\tiny nsubj}}
|
||||
\put(222.5,30.0){\oval(144.9591836734694,100.0)[t]}
|
||||
\put(294.9795918367347,35.0){\vector(0,-1){5.0}}
|
||||
\put(213.5,83.0){{\tiny conj}}
|
||||
\put(360.0,30.0){\oval(69.94594594594595,66.66666666666667)[t]}
|
||||
\put(325.02702702702703,35.0){\vector(0,-1){5.0}}
|
||||
\put(351.0,66.33333333333334){{\tiny case}}
|
||||
\put(378.5,30.0){\oval(28.89189189189189,33.333333333333336)[t]}
|
||||
\put(364.05405405405406,35.0){\vector(0,-1){5.0}}
|
||||
\put(371.75,49.66666666666667){{\tiny det}}
|
||||
\put(351.5,30.0){\oval(108.29729729729729,100.0)[t]}
|
||||
\put(405.64864864864865,35.0){\vector(0,-1){5.0}}
|
||||
\put(344.75,83.0){{\tiny obl}}
|
||||
\end{picture}
|
||||
|
||||
\bigskip \bigskip
|
||||
|
||||
\footnotesize (obtained with the UDPipe 2 Talbanken 2.15 model)
|
||||
|
||||
<!--and this is without FEATS and LEMMA!-->
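As a rough illustration, the UPOS tags of the manual analysis and of the parser output for Example 3 can be compared token by token. The tag lists below are copied from the two trees; the helper name `upos_mismatches` is hypothetical, and the sketch ignores heads and relation labels, which differ as well.

```python
# Tokens and UPOS tags as shown in the two Example 3 trees.
FORMS = ["den", "är", "smog", "salt", "och", "det", "bra", "för", "all", "kropen"]
MANUAL = ["PRON", "AUX", "NOUN", "NOUN", "CCONJ", "PRON", "ADJ", "ADP", "DET", "NOUN"]
PARSER = ["PRON", "AUX", "ADV", "ADJ", "CCONJ", "PRON", "ADJ", "ADP", "DET", "NOUN"]


def upos_mismatches(forms, manual, parser):
    """Return (form, manual tag, parser tag) for every disagreement."""
    return [(f, m, p) for f, m, p in zip(forms, manual, parser) if m != p]


mismatches = upos_mismatches(FORMS, MANUAL, PARSER)
accuracy = 1 - len(mismatches) / len(FORMS)

for form, manual_tag, parser_tag in mismatches:
    print(f"{form}: manual={manual_tag} parser={parser_tag}")
print(f"UPOS agreement: {accuracy:.0%}")
```

Both disagreements fall on the erroneous span and its immediate neighbour, and they come with a different choice of `root`.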
|
||||
|
||||
## Our principles
|
||||
- the validator is a tool, not a goal:
|
||||
  - __*literal* criteria at the token level__
  - __*distributional* criteria at the syntax level__
|
||||
- __borrow from L1__ guidelines when necessary
|
||||
- __correction-aware annotation__: the annotation of learner sentences should be consistent with the semantics of the correction hypothesis
|
||||
|
||||
## Status
|
||||
- guidelines and test set (200/500 sentences) WIP
|
||||
- remaining 5000 + 500 sentences TODO \pause
|
||||
- you are welcome to __participate__!
|
||||
- you do _not_ have to be a native speaker
|
||||
(in fact, none of the current annotators is)
|
||||
- you _might_ be able to do this as a course project
|
||||
|
||||
# Exploring parallel learner treebanks with STUnD
|
||||
|
||||
## STUnD
|
||||
- _Sökverktyg för Tvåspråkiga Universal Dependencies-trädbanker_, or
|
||||
- Search Tool for (parallel) Universal Dependencies Treebanks
|
||||
- available at `demo.spraakbanken.gu.se/stund` (hopefully)
|
||||
|
||||
## Under the hood
|
||||
1. identify subtree alignments
|
||||
2. run the query on the LHS treebanks, looking for matching subtrees
|
||||
3. find the corresponding RHS subtree (and check if it matches the RHS-specific patterns)
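The three steps can be sketched on Example 2 as follows. This is a toy illustration, not STUnD's implementation: the alignment step is replaced by a naive stand-in that pairs subtree roots sharing UPOS and relation, the L1 analysis of "en lång bussresa" is assumed, the `compound:*` subtype is simplified to `compound`, and all names (`Token`, `subtree`, `align`, `search`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    id: int
    form: str
    upos: str
    head: int  # 0 marks the root
    deprel: str


def subtree(sentence, root_id):
    """Tokens dominated by root_id (inclusive), in sentence order."""
    ids = {root_id}
    changed = True
    while changed:
        changed = False
        for t in sentence:
            if t.head in ids and t.id not in ids:
                ids.add(t.id)
                changed = True
    return [t for t in sentence if t.id in ids]


def align(lhs, rhs):
    """Step 1 (toy stand-in): pair subtrees whose roots share UPOS and deprel."""
    pairs, used = [], set()
    for l in lhs:
        for r in rhs:
            if r.id not in used and (l.upos, l.deprel) == (r.upos, r.deprel):
                pairs.append((subtree(lhs, l.id), subtree(rhs, r.id)))
                used.add(r.id)
                break
    return pairs


def has_deprel(tokens, deprel):
    return any(t.deprel == deprel for t in tokens)


def search(lhs, rhs, lhs_pattern, rhs_pattern):
    """Steps 2 and 3: match on the LHS, then check the aligned RHS subtree."""
    return [(l, r) for l, r in align(lhs, rhs) if lhs_pattern(l) and rhs_pattern(r)]


# Learner sentence (LHS) and assumed analysis of the correction (RHS).
l2 = [Token(1, "en", "DET", 4, "det"), Token(2, "lång", "ADJ", 4, "amod"),
      Token(3, "bus", "NOUN", 4, "compound"), Token(4, "resa", "NOUN", 0, "root")]
l1 = [Token(1, "en", "DET", 3, "det"), Token(2, "lång", "ADJ", 3, "amod"),
      Token(3, "bussresa", "NOUN", 0, "root")]

# Query: a compound on the learner side that is absent from the correction.
hits = search(l2, l1,
              lambda ts: has_deprel(ts, "compound"),
              lambda ts: not has_deprel(ts, "compound"))
for l, r in hits:
    print(" ".join(t.form for t in l), "->", " ".join(t.form for t in r))
```

Running the query on the LHS first keeps the RHS check cheap, since it only touches subtrees that are already aligned to a match.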
|
||||
|
||||
## Use cases
|
||||
- error retrieval: patterns (queries) $\to$ trees
|
||||
- pattern extraction: trees $\to$ patterns
|
||||
- feedback comment generation: patterns $\to$ natural language comments <!--maybe goto mittsem slides-->
|
||||
|
||||
# Sources
|
||||
|
||||
## In order of appearance
|
||||
- \small John Lee, Keying Li, and Herman Leung. _L1-L2 parallel dependency treebank as learner corpus_. In Proceedings of the 15th International Conference on Parsing Technologies, pages 44-49, Pisa, Italy, September 2017. Association for Computational Linguistics
|
||||
- \small John Lee, Herman Leung, and Keying Li. _Towards Universal Dependencies for learner Chinese_. In Marie-Catherine de Marneffe, Joakim Nivre, and Sebastian Schuster, editors, Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies (UDW 2017), pages 67-71, Gothenburg, Sweden, May 2017. Association for Computational Linguistics
|
||||
|
||||
## In order of appearance
|
||||
- \small Yevgeni Berzak, Jessica Kenney, Carolyn Spadine, Jing Xian Wang, Lucia Lam, Keiko Sophie Mori, Sebastian Garza, and Boris Katz. _Universal Dependencies for learner English_. In Katrin Erk and Noah A. Smith, editors, Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 737-746, Berlin, Germany, August 2016. Association for Computational Linguistics
|
||||
- \small Elisa Di Nuovo, Manuela Sanguinetti, Alessandro Mazzei, Elisa Corino, and Cristina Bosco. _VALICO-UD: Treebanking an Italian learner corpus in Universal Dependencies_. IJCoL. Italian Journal of Computational Linguistics, 8(8-1), 2022
|
||||
|
||||
## In order of appearance
|
||||
- \small Hakyung Sung and Gyu-Ho Shin. _Constructing a dependency treebank for second language learners of Korean_. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 3747-3758, Torino, Italia, May 2024. ELRA and ICCL
|
||||
- \small Hakyung Sung and Gyu-Ho Shin. _Second language Korean Universal Dependency treebank v1.2: Focus on data augmentation and annotation scheme refinement_. In Špela Arhar Holdt, Nikolai Ilinykh, Barbara Scalvini, Micaella Bruton, Iben Nyholm Debess, and Crina Madalina Tudor, editors, Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025), pages 13-19, Tallinn, Estonia, March 2025. University of Tartu Library, Estonia
|
||||
|
||||
## In order of appearance
|
||||
- \small Alla Rozovskaya. _Universal Dependencies for learner Russian_. In Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, and Nianwen Xue, editors, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 17112-17119, Torino, Italia, May 2024. ELRA and ICCL
|
||||
- \small Elena Volodina, Lena Granstedt, Arild Matsson, Beáta Megyesi, Ildikó Pilán, Julia Prentice, Dan Rosén, Lisa Rudebeck, Carl-Johan Schenström, Gunlög Sundberg, et al. _The SweLL language learner corpus: From design to annotation_. Northern European Journal of Language Technology, 6:67-104, 2019
|
||||
- \small Arianna Masciolini. _A query engine for L1-L2 parallel dependency treebanks_. In Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa), pages 574-587, Tórshavn, Faroe Islands, May 2023. University of Tartu Library
|
||||
|
||||
## In order of appearance
|
||||
- \small Arianna Masciolini, Elena Volodina, and Dana Dannélls. _Towards automatically extracting morphosyntactical error patterns from L1-L2 parallel dependency treebanks_. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 585-597, Toronto, Canada, July 2023. Association for Computational Linguistics
|
||||
- \small Arianna Masciolini and Márton A Tóth. _STUnD: ett Sökverktyg för Tvåspråkiga Universal Dependencies-trädbanker_. In Proceedings of the Huminfra Conference, pages 95-109, Gothenburg, Sweden, 2024
|
||||
|
||||
## To appear
|
||||
- \small Arianna Masciolini, Herbert Lange and Márton A Tóth. _Exploring parallel corpora with STUnD: a Search Tool for Universal Dependencies_. In the upcoming Huminfra Handbook, Gothenburg, Sweden, __most likely__ 2025
|
||||
- \small a paper about harmonization of UD guidelines for L2 treebanks (under review)
|
||||