comp-syntax-gu-mlt/lab2/README.md
2025-05-17 10:05:59 +02:00

Lab 2: Multilingual text generation from Wikidata

This lab uses GF to generate texts from facts in the Wikidata database. You are given:

  • an abstract syntax and an English concrete syntax, in the subdirectory grammars/
  • a JSON dump from Wikidata, in the subdirectory data/
  • a Python file that connects Wikidata with GF, in the subdirectory scripts/

Your task is to create a concrete syntax for some other language by using the GF RGL, and to evaluate the text that it generates. The steps to take are the following:

  • in scripts/, run python3 find_labels.py da > ../grammars/LabelsDan.gf
  • in grammars/, copy the beginning of LabelsEng.gf into LabelsDan.gf, changing Eng to Dan
  • in grammars/, copy NobelEng.gf to NobelDan.gf and do the necessary changes
  • in grammars/, start GF and import NobelDan.gf to do some testing
  • in grammars/ outside GF, do gf -make NobelEng.gf NobelDan.gf
  • in scripts/, generate all texts with python3 describe_nobel.py Dan (do this if possible; otherwise see the workaround below)

Replace da and Dan with your own language codes!
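
To make the substitution concrete, the following minimal sketch prints the command-line steps above for a given pair of language codes. The function name lab_commands is hypothetical and not part of the lab scripts; the commands themselves are copied from the list above, and the manual editing steps (copying and adapting the .gf files, testing inside GF) are deliberately left out.

```python
def lab_commands(wiki_code: str, gf_code: str) -> list[tuple[str, str]]:
    """Return (directory, command) pairs for the non-interactive lab steps.

    wiki_code is the Wikidata language code (e.g. "da"),
    gf_code is the GF RGL language code (e.g. "Dan").
    Hypothetical helper: it only builds strings, it does not run anything.
    """
    return [
        ("scripts", f"python3 find_labels.py {wiki_code} > ../grammars/Labels{gf_code}.gf"),
        ("grammars", f"gf -make NobelEng.gf Nobel{gf_code}.gf"),
        ("scripts", f"python3 describe_nobel.py {gf_code}"),
    ]


if __name__ == "__main__":
    # Print the commands for Danish; replace the codes with your own.
    for directory, command in lab_commands("da", "Dan"):
        print(f"(cd {directory} && {command})")
```

For Swedish, for example, one would call lab_commands("sv", "Swe"), assuming Swe is the RGL code of the language.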

The last step above requires the Python module pgf (pip3 install pgf). If you don't manage to install pgf, a quick way to test is to run the following in GF:

 import NobelEng.gf
 rf -file="../data/trees.gft" -lines -tree | l
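
For comparison, if pgf is installed, the same read-and-linearize loop as in the GF commands above can be sketched in Python. This is not the code of describe_nobel.py, only an illustration under assumptions: the function names are hypothetical, the compiled grammar is assumed to be grammars/Nobel.pgf (which presumes the abstract syntax is called Nobel), and the concrete syntax is assumed to be called NobelEng.

```python
def read_tree_lines(path):
    """Return the non-empty lines of a tree file, one tree per line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def linearize_file(pgf_path, concrete_name, trees_path):
    """Linearize every tree in trees_path with the given concrete syntax."""
    import pgf  # imported here so that read_tree_lines works without pgf

    grammar = pgf.readPGF(pgf_path)
    concrete = grammar.languages[concrete_name]
    return [
        concrete.linearize(pgf.readExpr(line))
        for line in read_tree_lines(trees_path)
    ]


# Example call, run from grammars/ after gf -make (file names are assumptions):
# for text in linearize_file("Nobel.pgf", "NobelEng", "../data/trees.gft"):
#     print(text)
```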

If you need gender agreement of names

(This note was added late and is therefore not required in the 2025 course.)

In some languages, names of laureates require gender agreement. In that case, use the GF command

  rf -file="../data/gendertrees.gft" -lines -tree | l

or, if it works for you, the Python command

  python3 describe_nobel.py Dan gender

This requires you to define linearizations of the gender-specific functions MaleName and FemaleName so that the gender agreement is set properly. The following works for many languages:

  FemaleName s = mkNP (mkPN s.s feminine) ;