From 088f52a0f6390e22ab1be8a235019a490082a636 Mon Sep 17 00:00:00 2001
From: Arianna Masciolini <uzkamascio@gmail.com>
Date: Sun, 29 Mar 2026 21:15:06 +0200
Subject: [PATCH] 2026 modifications to lab 3

---
 lab3/README.md                      | 39 +++++++++++------------------
 lab3/comp-syntax-corpus-english.txt |  2 +-
 lab3/comp-syntax-corpus-swedish.txt | 24 ------------------
 3 files changed, 15 insertions(+), 50 deletions(-)
 delete mode 100644 lab3/comp-syntax-corpus-swedish.txt
diff --git a/lab3/README.md b/lab3/README.md
index 6f57e6d..b705a81 100644
--- a/lab3/README.md
+++ b/lab3/README.md
@@ -12,28 +12,27 @@ Go to [universaldependencies.org](https://universaldependencies.org/) and downlo
 Choose a short (5-10 tokens) and a long (>25 words) sentence and convert it from CoNNL-U to a graphical trees by hand.
 
 ### Step 2: choose a corpus
-Choose one of the two corpora provided in this folder:
+Choose a corpus of 25+ sentences.
 
-- [`comp-syntax-corpus-english.txt`](comp-syntax-corpus-english.txt) is a combination of __English__ sentences from different sources, including [the Parallel UD treebank (PUD)](https://github.com/UniversalDependencies/UD_English-PUD/tree/master). If you want to cheat - or just check your answers - you can look for them in the official treebank. You can also compare your analyses with those of an automatic parser, such as [UDPipe](https://lindat.mff.cuni.cz/services/udpipe/), which you can try directly in your browser. These automatic analyses must of course be taken with a grain of salt
-- [`comp-syntax-corpus-swedish.txt`](comp-syntax-corpus-swedish.txt) consists of teacher-corrected sentences from the [__Swedish__ Learner Language (SweLL) corpus](https://spraakbanken.gu.se/en/resources/swell-gold), which is currently being annotated in UD for the first time. 
-In this case, there is no "gold standard" to check your answers against, but you can still compare your solutions with [UDPipe](https://lindat.mff.cuni.cz/services/udpipe/)'s automatic analyses.
+If you want to start with __English__, you can use [`comp-syntax-corpus-english.txt`](comp-syntax-corpus-english.txt), a combination of sentences from different sources, including [the Parallel UD treebank (PUD)](https://github.com/UniversalDependencies/UD_English-PUD/tree/master). If you want to cheat - or just check your answers - you can look for them in the official treebank. You can also compare your analyses with those of an automatic parser, such as [UDPipe](https://lindat.mff.cuni.cz/services/udpipe/), which you can try directly in your browser. These automatic analyses must of course be taken with a grain of salt. Note that the first few sentences of this corpus are pre-tokenized and POS-tagged, with each token in the form `word:<UPOS>`.
 
-In both corpora, the first few sentences are pre-tokenized and POS-tagged. Each token is in the form
 
-`word:<UPOS>`.
+If you want to work with __Swedish__ and might be interested in contributing to an [official UD treebank](https://github.com/universaldependencies/UD_Swedish-SweLL), ask Arianna for [a sample of the Swedish Learner Language corpus](https://spraakbanken.gu.se/en/resources/swell).
+
+If you have other data in mind that you think would be interesting to annotate in UD (not necessarily in English or Swedish), don't hesitate to bring it up during a lab session!
 
 ### Step 3: annotate
 For each sentence in the corpus, the annotation tasks consists in:
 
 1. analyzing the sentence in UD
-2. translating it to a language of your choice
+2. translating it to a language of your choice (as long as either the source or the translation is in English or Swedish)
 3. analyzing your translation
 
 The only required fields are `ID`, `FORM`, `UPOS`, `HEAD` and `DEPREL`.
 
 In the end, you will submit two parallel CoNLL-U files, one containing the analyses of the source sentences and one for the analyses of the translations.
 
-To produce the CoNLL-U files, you may work in your text editor (if you use Visual Studio Code, you can use the [vscode-conllu](https://marketplace.visualstudio.com/items?itemName=lgrobol.vscode-conllu) to get syntax highlighting), use a spreadsheet program and then export to TSV, or use a dedicated graphical annotation tool such as [Arborator](https://arborator.grew.fr/#/).
+To produce the CoNLL-U files, you may work in your text editor (you can usually get syntax highlighting by changing the extension to `.tsv`), use a spreadsheet program and then export to TSV, or use a dedicated graphical annotation tool such as [Arborator](https://arborator.grew.fr/#/) (helpful but unstable!).
 
 If you work in your text editor, it might be easier to first write a simplified CoNLL-U, with just the fields `ID`, `FORM`, `UPOS`, `HEAD` and `DEPREL`, separated by tabs, and then expand it to full CoNLL-U with [this script](https://gist.github.com/harisont/612a87d20f729aa3411041f873367fa2) (or similar).
 
@@ -54,7 +53,7 @@ To fully comply with the CoNLL-U standard, comment lines should consist of key-v
 # comment = your comment here
 ```
 
-but for this assigment lines like
+but for this assignment lines like
 
 ```
 # your comment here
@@ -63,24 +62,14 @@ but for this assigment lines like
 are perfectly acceptable too.
 
 ### Step 4: make sure your files match the CoNLL-U specification
-Once you have full CoNLL, you can use [deptreepy](https://github.com/aarneranta/deptreepy/), [STUnD](https://harisont.github.io/STUnD/) or [the official online CoNNL-U viewer](https://universaldependencies.org/conllu_viewer.html) to visualize it.
-
-With deptreepy, you will need to issue the command
-
-`cat my-file.conllu | python deptreepy.py visualize_conllu > my-file.html`
-
-which creates an HTML file you can open in you web browser.
-
-If you can visualize your trees with any of these tools, that's a very good sign that your file _more or less_ matches the CoNNL-U format!
-
-As a last step, validate your treebank with the official UD validator. 
+Check your treebank with the official UD validator. 
 To do that, clone or download the [UD tools repository](https://github.com/UniversalDependencies/tools), move inside the corresponding folder and run
 
 ```
-python validate.py PATH-TO-YOUR-TREEBANK.conllu --lang=2-LETTER-LANGCODE-FOR-YOUR-LANGUAGE --level=1
+python validate.py PATH-TO-YOUR-TREEBANK.conllu --lang=2-LETTER-LANGCODE-FOR-YOUR-LANGUAGE --level=2
 ```
 
-If you want to check for more subtle errors, you can [go up a few levels](https://harisont.github.io/gfaqs.html#ud-validator).
+Level 2 should be enough for part 2, but you can [go up a few levels](https://harisont.github.io/gfaqs.html#ud-validator) to check for more subtle errors.
 
 Submit the two CoNLL-U files on Canvas.
 
@@ -91,7 +80,7 @@ If you want to install MaChAmp on your own computer, keep in mind that very old
 For more information, see [here](https://github.com/machamp-nlp/machamp/issues/42). 
 
 ### Step 1: setting up MaChAmp
-1. optional, but recommended: create a Python virtual environment with the command
+1. create a Python virtual environment with the command
    ```
    python -m venv ENVNAME
    ```
@@ -124,7 +113,7 @@ python scripts/misc/cleanconl.py PATH-TO-A-DATASET-SPLIT
 This replaces the contents of your input file with a "cleaned up" version of the same treebank.
 
 ### Step 3: training
-Copy `compsyn.json` to `machamp/configs` and replace the traning and development data paths with the paths to the files you selected/created in step 2.
+Copy `compsyn.json` to `machamp/configs` and replace the training and development data paths with the paths to the files you selected/created in step 2.
 
 You can now train your model by running
 
@@ -152,4 +141,4 @@ Then, use the `machamp/scripts/misc/conll18_ud_eval.py` script to evaluate the s
 python scripts/misc/conll18_ud_eval.py PATH-TO-YOUR-PART1-TREEBANK predictions/OUTPUT-FILE-NAME.conllu
 ```
 
-On Canvas, submit the training logs, the predictions and the output of `conll18_ud_eval.py`, along with a short text summarizing your considerations on the performance of the parser, based on the predictions themselves and on the output of the results of the evaluation.
\ No newline at end of file
+On Canvas, submit the training logs, the predictions and the output of `conll18_ud_eval.py`, along with a short text summarizing your observations on the parser's performance, based on both the predictions themselves and the automatic evaluation.
\ No newline at end of file
diff --git a/lab3/comp-syntax-corpus-english.txt b/lab3/comp-syntax-corpus-english.txt
index 3d0624a..77c1136 100644
--- a/lab3/comp-syntax-corpus-english.txt
+++ b/lab3/comp-syntax-corpus-english.txt
@@ -6,7 +6,7 @@ The:<DET> study:<NOUN> of:<ADP> volcanoes:<NOUN> is:<AUX> called:<VERB> volcanol
 It:<PRON> was:<AUX> conducted:<VERB> just:<ADV> off:<ADP> the:<DET> Mexican:<ADJ> coast:<NOUN> from:<ADP> April:<PROPN> to:<ADP> June:<PROPN> .:<PUNCT>
 ":<PUNCT> Her:<PRON> voice:<NOUN> literally:<ADV> went:<VERB> around:<ADP> the:<DET> world:<NOUN> ,:<PUNCT> ":<PUNCT> Leive:<PROPN> said:<VERB> .:<PUNCT>
 A:<DET> witness:<NOUN> told:<VERB> police:<NOUN> that:<SCONJ> the:<DET> victim:<NOUN> had:<AUX> attacked:<VERB> the:<DET> suspect:<NOUN> in:<ADP> April:<PROPN> .:<PUNCT>
-It:<PRON> 's:<AUX> most:<ADV> obvious:<ADJ> when:<SSUBJ> a:<DET> celebrity:<NOUN> 's:<PART> name:<NOUN> is:<AUX> initially:<ADV> quite:<ADV> rare:<ADJ> .:<PUNCT>
+It:<PRON> 's:<AUX> most:<ADV> obvious:<ADJ> when:<SCONJ> a:<DET> celebrity:<NOUN> 's:<PART> name:<NOUN> is:<AUX> initially:<ADV> quite:<ADV> rare:<ADJ> .:<PUNCT>
 This:<PRON> has:<AUX> not:<PART> stopped:<VERB> investors:<NOUN> flocking:<VERB> to:<PART> put:<VERB> their:<PRON> money:<NOUN> in:<ADP> the:<DET> funds:<NOUN> .:<PUNCT>
 This:<DET> discordance:<NOUN> between:<ADP> economic:<ADJ> data:<NOUN> and:<CCONJ> political:<ADJ> rhetoric:<NOUN> is:<AUX> familiar:<ADJ> ,:<PUNCT> or:<CCONJ> should:<AUX> be:<AUX> .:<PUNCT>
 The:<DET> feasibility:<NOUN> study:<NOUN> estimates:<VERB> that:<SCONJ> it:<PRON> would:<AUX> take:<VERB> passengers:<NOUN> about:<ADV> four:<NUM> minutes:<NOUN> to:<PART> cross:<VERB> the:<DET> Potomac:<PROPN> River:<PROPN> on:<ADP> the:<DET> gondola:<NOUN> .:<PUNCT>
diff --git a/lab3/comp-syntax-corpus-swedish.txt b/lab3/comp-syntax-corpus-swedish.txt
deleted file mode 100644
index 002d99e..0000000
--- a/lab3/comp-syntax-corpus-swedish.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-Jag:<PRON> tycker:<VERB> att:<SCONJ> du:<PRON> ska:<AUX> börja:<VERB> med:<ADP> en:<DET> språkkurs:<NOUN>.:<PUNCT>
-Flerspråkighet:<NOUN> gynnar:<VERB> oss:<PRON> även:<ADV> på:<ADP> arbetsmarknaden:<NOUN>.:<PUNCT>
-Språket:<NOUN> är:<AUX> lätt:<ADJ> och:<CCONJ> jag:<PRON> kan:<AUX> läsa:<VERB> utan:<ADP> något:<DET> problem:<PRON>.:<PUNCT>
-Man:<PRON> känner:<VERB> sig:<PRON> ensam:<ADJ> när:<SCONJ> man:<PRON> inte:<PART> kan:<AUX> prata:<VERB> språket:<NOUN> bra:<ADV>.:<PUNCT>
-Det:<PRON> kan:<AUX> vara:<AUX> kroppsspråk:<NOUN> men:<CCONJ> främst:<ADV> sker:<VERB> det:<PRON> genom:<ADP> talet:<NOUN>.
-Språket:<NOUN> är:<AUX> nyckeln:<NOUN> till:<ADP> alla:<DET> låsta:<ADJ> dörrar:<NOUN>,:<PUNCT> har:<AUX> vi:<PRON> hört:<VERB> flera:<ADJ> gånger:<NOUN>.:<PUNCT>
-Att:<PART> kunna:<VERB> ett:<DET> språk:<NOUN> är:<AUX> en:<DET> av:<ADP> de:<DET> viktigaste:<ADJ> och:<CCONJ> värdefullaste:<ADJ> egenskaper:<NOUN> en:<DET> människa:<NOUN> kan:<AUX> ha:<VERB> så:<SCONJ> det:<PRON> är:<AUX> värt:<ADJ> mer:<ADV> än:<ADP> vad:<PRON> man:<PRON> tror:<VERB>.:<PUNCT>
-Med:<ADP> andra:<ADJ> ord:<NOUN>,:<PUNCT> språket:<NOUN> är:<AUX> nyckeln:<NOUN> till:<ADP> alla:<DET> låsta:<ADJ> dörrar:<NOUN>,:<PUNCT> men:<CCONJ> det:<PRON> finns:<VERB> viktigare:<ADJ> saker:<NOUN> att:<PART> satsa:<VERB> på:<ADP> som:<PRON> jag:<PRON> kommer:<AUX> att:<PART> nämna:<VERB> längre:<ADV> ner:<ADV>.:<PUNCT>
-Han:<PRON> kom:<VERB> till:<ADP> Sverige:<PROPN> för:<ADP> 4:<NUM> år:<NOUN> sedan:<ADV>,:<PUNCT> han:<PRON> kunde:<AUX> inte:<PART> tala:<VERB> svenska:<ADJ> språket:<NOUN>,<PUNCT> ingen:<DET> engelska:<NOUN>,:<PUNCT> han:<PRON> kunde:<AUX> i:<ADP> princip:<NOUN> inte:<PART> kommunicera:<VERB> med:<ADP> någon:<PRON> här<ADV>.:<PUNCT>
-För:<ADP> det:<DET> första:<ADJ> hänger:<VERB> språket:<NOUN> ihop:<ADV> med:<ADP> tillhörighet:<NOUN>,:<PUNCT> särskilt:<ADV> för:<ADP> de:<DET> nya:<ADJ> invandrare:<NOUN> som:<PRON> har:<AUX> bestämt:<VERB> sig:<PRON> för:<ADP> att:<PART> flytta:<VERB> och:<CCONJ> bosätta:<VERB> sig:<PRON> i:<ADP> Sverige:<PROPN>.:<PUNCT>
-Om:<SCONJ> alla:<PRON> hade:<AUX> talat:<VERB> samma:<DET> språk:<NOUN> hade:<AUX> det:<PRON> förmodligen:<ADV> inte:<PART> funnits:<VERB> något:<DET> utanförskap:<NOUN>,:<PUNCT> utan:<CCONJ> man:<PRON> hade:<AUX> fått:<VERB> en:<DET> typ:<NOUN> av:<ADP> gemenskap:<NOUN> där:<ADV> man:<PRON> delar:<VERB> samma:<DET> kultur:<NOUN>.:<PUNCT>
-Att:<PART> lära:<VERB> sig:<PRON> ett:<DET> språk:<NOUN> är:<AUX> väldigt:<ADV> svårt:<ADJ>,:<PUNCT> speciellt:<ADV> för:<ADP> vuxna:<ADJ> människor:<NOUN>,:<PUNCT> och:<CCONJ> eftersom:<SCONJ> majoritetsspråket:<NOUN> blir:<VERB> en:<DET> viktig:<ADJ> del:<NOUN> i:<ADP> en:<DET> persons:<NOUN> liv:<NOUN> räcker:<VERB> det:<PRON> inte:<PART> att:<PART> tala:<VERB> det:<PRON> på:<ADP> söndagar:<NOUN> utan:<CCONJ> det:<PRON> måste:<AUX> läras:<VERB> in:<PART> som:<SCONJ> ett:<DET> modersmål:<NOUN>,:<PUNCT> vilket:<PRON> finansieras:<VERB> av:<ADP> oss:<PRON> skattebetalare:<NOUN>.:<PUNCT>
-Avslutningsvis så vill jag förmedla att vi bör rädda världen innan språken.
-Språket är ganska enkelt, och det är lätt att förstå vad romanen handlar om.
-Det är även kostsamt för staten att se till att dessa minoritetsspråk lever kvar.
-Låt mig säga att det är inte för sent att rädda de små språken, vi måste ta steget nu.
-Att hålla dessa minoritetsspråk vid liv är både slöseri med tid och mycket ekonomiskt krävande.
-Jag tackar alla lärare på Sfi som hjälper oss för att vi ska kunna bli bättre på svenska språket.
-Språk skapades för flera tusen år sedan och vissa språk har tynat bort medan några nya har skapats.
-Samhället behöver flerspråkiga och vägen till kommunikation och till att begripa andras kulturer är ett språk.
-Om man kan fler språk har man fler möjligheter att använda sig av dem vilket leder till utveckling.
-Därför tycker jag att vi bör införa ett förbud mot främmande språk i statliga myndigheter och föreningar.
-Men jag anser först och främst att språket är som själen, det som ger oss livskraft, säregenhet och karaktär.
-På Sveriges riksdags hemsida kan man läsa om hur Sverige bidrar med att skydda dessa språk med hjälp av statligt bidrag.