From eb50c5a9e3111ff1aa31df918e86b7486b874cab Mon Sep 17 00:00:00 2001
From: aarneranta
Date: Thu, 27 May 2021 11:16:33 +0200
Subject: [PATCH] more in morphodict/README.md

---
 src/morphodict/README.md | 39 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/src/morphodict/README.md b/src/morphodict/README.md
index 8717c2d0e..590e4c64f 100644
--- a/src/morphodict/README.md
+++ b/src/morphodict/README.md
@@ -27,7 +27,7 @@ They should be numbered as
 
 Such distinctions are made in all cases where there are alternative inflections, even if there is no sense distinction:
 - `learn_1_V` ("learn, learned, learned")
-- `learn_1_V` ("learn, learnt, learnt")
+- `learn_2_V` ("learn, learnt, learnt")
 
 Hence,
 - no `variants` should appear in the MorphoDict
@@ -38,6 +38,20 @@ The dictionary will also exclude *multiwords* consisting of several tokens.
 Most of the time, even *compounds* written as single tokens should be excluded.
 However, as the status of a compound is not always clear, and since they do not create spurious morphological analyses, they can be tolerated, in particular if extracted from legacy sources.
 
+Since multiwords and compounds are excluded, `Paradigms` and `MakeStructural` should for each language provide API functions for easy definitions of them, preferably of the form
+```
+ mkC : Str -> C -> C
+```
+The situation when this is not enough is when separate functions are needed for gluing and concatenation compounds.
+
+*Open question*: what to do with compound prepositions that are common in e.g. English?
+The above principles imply
+```
+ according_to_Prep = mkPrep "according" to_Prep
+```
+defined *outside* `MorphoDictEng`, so that `mkPrep` comes from `ParadigmsEng` and `to_Prep` from `MorphoDictEng`.
+This may sound like against tradition, but follows the general guidelines of morphological dictionaries.
+
 
 ## Relevant categories
 
@@ -85,6 +99,20 @@ However, if the word contains characters that are not legal in identifiers, the
 
 If function names are formed by the API function `PGF.mkCId`, these conventions are automatically followed.
 
+## Coding conventions
+
+To enable easy ocular and automatic inspection,
+- write one entry per line, each prefixed by `fun` or `lin` keyword
+- sort the entries alphabetically
+- use paradigms with enough many arguments to make the characteristic forms explicit
+
+To guarantee compatibility with the rest of the RGL and application grammars,
+- paradigms used should be imported from `Paradigms` and `MakeStructural` rather than defined in `MorphoDict` itself
+- import of *low-level modules* such as `Res` should be avoided
+- `MorphoDict` should be self-contained, i.e. not inherit from other modules such as `Structural` or `Irreg`. But it is OK to `open` them in a qualified mode to use when defining linearizations.
+
+
+
 ## Bootstrapping with `MkMorphoDict`
 
 THIS WAS AN EARLY EXPERIMENT, TO BE UPDATED
@@ -107,3 +135,12 @@ Swedish, using a dump of SALDO (not available in these sources)
 runghc MkMorphodict.hs saldo/Saldo.pgf MorphoDictSwe
 ```
 
+## Things to do
+
+To support the construction of a `MorphoDict`, the following should be guaranteed in the RGL:
+- in `Paradigms`, explicit smart paradigms with characteristic forms and inherent features for each category
+- in `Paradigms`, API constants for all inherent features that are needed
+- in `Paradigms`, compound-constructing functions for all categories that need them
+- in `Extra`, the extra categories that one wants to include in that language
+
+