mirror of https://github.com/GrammaticalFramework/gf-rgl.git
synced 2026-05-27 08:58:55 -06:00
updated morphodict/README.md with MkMorphodict help
@@ -20,7 +20,6 @@ import System.Environment (getArgs)

-- example:
-- gf -make ../english/DictEng.gf
-- runghc
-- runghc MkMorphodict.hs pgf MorphoDictEng.config DictEngAbs.pgf MorphoDictEng
-- 64923 -> 56599 functions

@@ -138,7 +137,7 @@ mkMorphoDict env =
_ -> []

renames :: [RawRule] -> [RuleData]
-- renames fls = [((mkFun (f ++ [show i,c]),c),l) | (i,((f,c),l)) <- zip [1..] fls] -- disambiguate with int
--- renames fls = [((mkFun (f ++ [show i,c]),c),l) | (i,((f,c),l)) <- zip [1..] fls] -- disambiguate with int
renames fls = [((mkFun (f ++ fs ++ [c]),c),l) | (i,(((f,c),l),fs)) <- zip [1..] (zip fls (minimize fls))] -- disambiguate with different forms

minimize :: [RawRule] -> [[String]]

@@ -21,13 +21,13 @@ Functions names should be easy to guess:
- `baseform_Category`

Base forms that have more than one lemgram are an exception to this scheme.
They should be numbered as
- `lie_1_V` ("lie, lay, lain")
- `lie_2_V` ("lie, lied, lied")
They should be disambiguated by adding the differing forms, as in
- `lie_lay_V` ("lie, lay, lain")
- `lie_lied_V` ("lie, lied, lied")

Such distinctions are made in all cases where there are alternative inflections, even if there is no sense distinction:
- `learn_1_V` ("learn, learned, learned")
- `learn_2_V` ("learn, learnt, learnt")
- `learn_learned_V` ("learn, learned, learned")
- `learn_learnt_V` ("learn, learnt, learnt")

Hence,
- no `variants` should appear in the MorphoDict
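
The disambiguation rule above (append the forms in which same-named variants differ, as in `lie_lay_V`) can be sketched in Haskell. This is a hypothetical illustration, not the code of `MkMorphodict.hs`: the `Lemma` type, `differingPositions`, and `disambiguate` are assumed names, variants are compared position by position, and forms from the first differing positions are added until all names are distinct. Falling back to numbers such as `lie_1_V` when two variants have identical forms is also an assumption of this sketch.

```haskell
import Data.List (intercalate, nub, transpose)

-- Hypothetical lemma representation (not the types in MkMorphodict.hs):
-- base form, category, and the full list of inflected forms.
data Lemma = Lemma { baseform :: String, category :: String, forms :: [String] }

-- 0-based positions at which the variants do not all carry the same form.
differingPositions :: [Lemma] -> [Int]
differingPositions ls =
  [ i | (i, col) <- zip [0 ..] (transpose (map forms ls)), length (nub col) > 1 ]

-- One function name per variant of the same base form and category,
-- e.g. lie_lay_V and lie_lied_V. Forms at the first k differing positions
-- are appended, increasing k until all names are distinct; numbering
-- (lie_1_V, ...) is used only if the forms cannot tell the variants apart.
disambiguate :: [Lemma] -> [String]
disambiguate [l] = [baseform l ++ "_" ++ category l]
disambiguate ls = go 1
  where
    ps = differingPositions ls
    -- the forms of lemma l at the first k differing positions (safe lookup)
    picked k l = nub [ f | p <- take k ps, f <- take 1 (drop p (forms l)) ]
    nameWith k l = intercalate "_" ([baseform l] ++ picked k l ++ [category l])
    numbered =
      [ intercalate "_" [baseform l, show i, category l] | (i, l) <- zip [1 :: Int ..] ls ]
    unique xs = length (nub xs) == length xs
    go k
      | unique names = names
      | k >= length ps = numbered
      | otherwise = go (k + 1)
      where
        names = map (nameWith k) ls
```

For the two `learn` variants, `disambiguate` gives `["learn_learned_V","learn_learnt_V"]`, while a single lemma keeps the plain `baseform_Category` name.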
@@ -115,26 +115,50 @@ To guarantee compatibility with the rest of the RGL and application grammars,

## Bootstrapping with `MkMorphoDict`

THIS WAS AN EARLY EXPERIMENT, TO BE UPDATED

Example run, English:

gf -make ../english/DictEng.gf
runghc MkMorphodict.hs DictEngAbs.pgf MorphoDictEng

Result: 64923 -> 56599 functions, of which 21679 could be compounds

Swedish, using a dump of SALDO (not available in these sources):
```
cd saldo/
runghc SaldoGF.hs
# combine abs.tmp with Saldo.header to obtain Saldo.gf
# combine cnc.tmp with SaldoSwe.header to obtain SaldoSwe.gf
gf -make SaldoSwe.gf
cd ..
runghc MkMorphodict.hs saldo/Saldo.pgf MorphoDictSwe
gf -make ../english/DictEng.gf
runghc MkMorphodict.hs pgf MorphoDictEng.config DictEngAbs.pgf MorphoDictEng
```
Alternatively, if you have raw data from another source in the format "N woman women", you can run:
```
runghc MkMorphodict.hs raw MorphoDictEng.config raw_words_eng.txt MorphoDictEng
```
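
A minimal sketch of reading such a raw file, assuming one entry per line with the category first and single-token forms separated by whitespace. `RawEntry`, `parseRawWords`, and `groupVariants` are hypothetical names, and the actual parser in `MkMorphodict.hs` may differ. Grouping by category and first form shows which entries would need disambiguated function names.

```haskell
import Data.List (groupBy, sortOn)
import Data.Maybe (mapMaybe)

-- One line of a raw word list such as "N woman women": a legacy category
-- followed by its word forms (hypothetical type, for illustration only).
data RawEntry = RawEntry { rawCat :: String, rawForms :: [String] }
  deriving (Eq, Show)

-- Split a line on whitespace; lines without at least one form are skipped.
-- Assumes single-token forms, so multiwords like "more regular" would break.
parseRawLine :: String -> Maybe RawEntry
parseRawLine line = case words line of
  (c : fs@(_ : _)) -> Just (RawEntry c fs)
  _ -> Nothing

parseRawWords :: String -> [RawEntry]
parseRawWords = mapMaybe parseRawLine . lines

-- Group entries with the same category and first form; groups with more
-- than one member are the ones needing disambiguated function names.
groupVariants :: [RawEntry] -> [[RawEntry]]
groupVariants = groupBy (\a b -> key a == key b) . sortOn key
  where
    key e = (rawCat e, take 1 (rawForms e))
```

For example, `parseRawWords <$> readFile "raw_words_eng.txt"` would read the file named in the command above.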
The script needs a *configuration file* that maps legacy categories and form lists to parts of GF code:
```
N : N mkN 0 2
A : A mkA 0 2 4 6
V : V mkV 0 4 2
V2 : V mkV 0 4 2
Adv : Adv mkAdv 0
Prep : Prep mkPrep 0
```
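
Judging from the example lines, each entry seems to give the legacy category, the GF category, the smart paradigm, and the positions of the forms to pass to it. Under that reading, and assuming English verb forms are listed in the order of the RGL's `VForm` parameter (infinitive, present, past participle, present participle, past), `V : V mkV 0 4 2` would select the infinitive, past, and past participle, as in `mkV "drink" "drank" "drunk"`. The Haskell sketch below encodes this assumption; `ConfigEntry`, `parseConfigLine`, `gfString`, and `renderLin` are hypothetical and not taken from `MkMorphodict.hs`.

```haskell
import Data.Char (isDigit)

-- One line of the configuration file, e.g. "V2 : V mkV 0 4 2".
-- The field meanings are inferred from the example lines (an assumption).
data ConfigEntry = ConfigEntry
  { legacyCat :: String  -- category in the source lexicon, e.g. "V2"
  , gfCat     :: String  -- RGL category of the generated entry, e.g. "V"
  , paradigm  :: String  -- smart paradigm to apply, e.g. "mkV"
  , formIxs   :: [Int]   -- assumed: 0-based positions of the forms to pass
  } deriving (Eq, Show)

-- Parse "LEGACY : CAT PARADIGM IX..."; anything else gives Nothing.
parseConfigLine :: String -> Maybe ConfigEntry
parseConfigLine line = case words line of
  (lc : ":" : gc : par : ixs@(_ : _))
    | all (all isDigit) ixs -> Just (ConfigEntry lc gc par (map read ixs))
  _ -> Nothing

-- Quote a form as a GF string literal, escaping quotes and backslashes.
-- (Haskell's show would turn letters such as Swedish å into escapes like \229.)
gfString :: String -> String
gfString s = "\"" ++ concatMap esc s ++ "\""
  where
    esc '"'  = "\\\""
    esc '\\' = "\\\\"
    esc c    = [c]

-- Right-hand side of a lin rule, e.g. mkV "drink" "drank" "drunk";
-- Nothing if an index points past the end of the form list.
renderLin :: ConfigEntry -> [String] -> Maybe String
renderLin e fs = do
  args <- mapM pick (formIxs e)
  return (unwords (paradigm e : map gfString args))
  where
    pick i
      | i < length fs = Just (fs !! i)
      | otherwise = Nothing
```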
In addition, it needs *header files* containing lines to be prefixed to the generated files:
```
concrete MorphoDictEng of MorphoDictEngAbs =
  CatEng [N,A,V,Adv,Prep] **
  open
    ParadigmsEng
  in
  {
```

```
abstract MorphoDictEngAbs =
  Cat [N,A,V,Adv,Prep] **
  {
```
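
Both headers shown end with an opening brace, which suggests the generated judgements are appended after the header and the module is then closed with `}`. A sketch of that assembly in Haskell, assuming a hypothetical `Rule` record holding each generated entry; `withHeader`, `abstractModule`, and `concreteModule` are illustrative names only.

```haskell
-- A generated dictionary entry (hypothetical type): function name,
-- RGL category, and the right-hand side of its lin rule.
data Rule = Rule { funName :: String, funCat :: String, linRhs :: String }

-- The header lines end with "{", so the body is appended and closed with "}".
withHeader :: [String] -> [String] -> String
withHeader header body = unlines (header ++ body ++ ["}"])

-- One fun judgement per entry, e.g. "  fun lie_lay_V : V ;"
abstractModule :: [String] -> [Rule] -> String
abstractModule header rules =
  withHeader header [ "  fun " ++ funName r ++ " : " ++ funCat r ++ " ;" | r <- rules ]

-- One lin judgement per entry, e.g. "  lin lie_lay_V = mkV ... ;"
concreteModule :: [String] -> [Rule] -> String
concreteModule header rules =
  withHeader header [ "  lin " ++ funName r ++ " = " ++ linRhs r ++ " ;" | r <- rules ]
```

A caller could then write, e.g., `writeFile "MorphoDictEngAbs.gf" (abstractModule hdr rules)`, where `hdr` holds the header lines and `rules` the generated entries (both hypothetical).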

For the time being, see `MkMorphodict.hs` for more details.

If the config and header files are sound, the script produces compilable GF files.
They also mostly comply with the guidelines given in this document.

Some things TODO:
- deal with multiwords such as "more regular" generated by Paradigms
- use references to native Irreg files instead of very long smart paradigms
- support increments in addition to overwrites

## Things to do

To support the construction of a `MorphoDict`, the following should be guaranteed in the RGL: