mirror of
https://github.com/GrammaticalFramework/gf-rgl.git
synced 2026-06-11 08:06:34 -06:00
MkMorphoDict: Extracting a minimal morphological dictionary from an existing GF dictionary.
Aarne Ranta 2020-03-02
principles:
There should be a single source for each lemgram (i.e. inflection table of a word)
Functions names should be easy to guess: baseform_Category (but avoiding accidental errors if this is not a unique key)
Hence,
Functions are 1-to-1 with lemgrams, i.e. inflection tables, thus
- no sense distinctions
- no subcategorizations
- no variants
Functionname = baseform_category, with exceptions
- same baseform_Category, different inflection tables: lie_1_V, lie_2_V
- words that have non-ident characters: 'bird\'s-eye_A'
- words that start with non-letters: W_'tween_Adv
Example run, English:
gf -make ../english/DictEng.gf
runghc MkMorphodict.hs DictEngAbs.pgf MorphoDictEng
Result: 64923 -> 56599 functions, of which 21679 could be compounds
Swedish, using a dump of SALDO (not available in these sources)
cd saldo/
runghc SaldoGF.hs
# combine abs.tmp with Saldo.header to obtain Saldo.gf
# combine cnc.tmp with SaldoSwe.header to obtain SaldoSwe.gf
gf -make SaldoSwe.gf
cd ..
runghc MkMorphodict.hs saldo/Saldo.pgf MorphoDictSwe