Comment out the contents of these files, except their headers and module
- brackets.
+ brackets. This will give you a set of templates out of which the grammar
+ will grow as you uncomment and modify the files rule by rule.
@@ -207,7 +208,7 @@ were introduced above is a natural order to proceed, even though not the
only one. So you will find yourseld iterating the following steps:
-- Select a phrase module, e.g. NounDut, and uncomment one
+
- Select a phrase category module, e.g. NounDut, and uncomment one
linearization rule (for instance, DefSg, which is
not too complicated).
@@ -239,6 +240,7 @@ only one. So you will find yourseld iterating the following steps:
- Spare some tree-linearization pairs for later regression testing.
+ You can do this way (!!to be completed)
You are likely to run this cycle a few times for each linearization rule
@@ -247,11 +249,272 @@ you implement, and some hundreds of times altogether. There are 159
-Of course, you don't need to complete one phrase module before starting
+Of course, you don't need to complete one phrase category module before starting
with the next one. Actually, a suitable subset of Noun,
Verb, and Adjective will lead to a reasonable coverage
very soon, keep you motivated, and reveal errors.
+
Resource modules used
+
+These modules will be written by you.
+
+- ResDut: parameter types and auxiliary operations
+
- MorphoDut: complete inflection engine; not needed for Test.
+
+These modules are language-independent and provided by the existing resource
+package.
+
+- ParamX: parameter types used in many languages
+
- TenseX: implementation of the logical tense, anteriority,
+ and polarity parameters
+
- Coordination: operations to deal with lists and coordination
+
- Prelude: general-purpose operations on strings, records,
+ truth values, etc.
+
- Predefined: general-purpose operations with hard-coded definitions
+
+
+
+
+Morphology and lexicon
+
+When the implementation of Test is complete, it is time to
+work out the lexicon files. The underlying machinery is provided in
+MorphoDut, which is, in effect, your linguistic theory of
+Dutch morphology. It can contain very sophisticated and complicated
+definitions, which are not necessarily suitable for actually building a
+lexicon. For this purpose, you should write the module
+
+- ParadigmsDut: morphological paradigms for the lexicographer.
+
+This module provides high-level ways to define the linearization of
+lexical items, of categories N, A, V and their complement-taking
+variants.
+
+
+
+For ease of use, the Paradigms modules follow a certain
+naming convention. Thus they provide, for each lexical category, such as N,
+the functions
+
+For the complement-taking variants, such as V2, we provide
+
+The golden rule for the design of paradigms is that
+
+- The user will only need function applications with constants and strings,
+ never any records or tables.
+
+The discipline of data abstraction moreover requires that the user of the resource
+is not given access to parameter constructors, but only to constants that denote
+them. This gives the resource grammarian the freedom to change the underlying
+data representation if needed. It means that the ParadigmsDut module has
+to define constants for those parameter types and constructors that
+the application grammarian may need to use, e.g.
+
+ oper
+ Case : Type ;
+ nominative, accusative, genitive : Case ;
+
+These constants are defined in terms of parameter types and constructors
+in ResDut and MorphoDut, modules which are not
+accessible to the application grammarian.
+
+
+Lock fields
+
+An important difference between MorphoDut and
+ParadigmsDut is that the former uses "raw" record types
+as lincats, whereas the latter uses category symbols defined in
+CatDut. When these category symbols are used to denote
+record types in a resource module, such as ParadigmsDut,
+a lock field is added to the record, so that categories
+with the same implementation are not confused with each other.
+(This is inspired by the newtype discipline in Haskell.)
+For instance, the lincats of adverbs and conjunctions may be the same
+in CatDut:
+
+ lincat Adv = {s : Str} ;
+ lincat Conj = {s : Str} ;
+
+But when these category symbols are used to denote their linearization
+types in a resource module, these definitions are translated to
+
+ oper Adv : Type = {s : Str ; lock_Adv : {}} ;
+ oper Conj : Type = {s : Str ; lock_Conj : {}} ;
+
+In this way, the user of a resource grammar cannot confuse adverbs with
+conjunctions. In other words, the lock fields force the type checker
+to function as a grammaticality checker.
+
+
+
+When the resource grammar is opened in an application grammar, the
+lock fields are never seen (except possibly in type error messages),
+and the application grammarian should never write them herself. If she
+has to do this, it is a sign that the resource grammar is incomplete, and
+the proper way to proceed is to fix the resource grammar.
+
+
+
+The resource grammarian has to provide the dummy lock field values
+in her hidden definitions of constants in Paradigms. For instance,
+
+ mkAdv : Str -> Adv ;
+ -- mkAdv s = {s = s ; lock_Adv = <>} ;
+
+
+
+Lexicon construction
+
+The lexicon belonging to LangDut consists of two modules:
+
+- StructuralDut, structural words, built by directly using
+ MorphoDut.
+
- BasicDut, content words, built by using ParadigmsDut.
+
+The reason why MorphoDut has to be used in StructuralDut
+is that ParadigmsDut does not contain constructors for closed
+word classes such as pronouns and determiners. The reason why we
+recommend ParadigmsDut for building BasicDut is that
+the coverage of the paradigms is thereby tested and that the
+use of the paradigms in BasicDut gives a good set of examples for
+those who want to build new lexica.
+
+
+
+
+Inside phrase category modules
+
+Noun
+
+Verb
+
+Adjective
+
+
+Lexicon extension
+
+The irregularity lexicon
+
+It may be handy to provide a separate module of irregular
+verbs and other words which are difficult for a lexicographer
+to handle. There are usually a limited number of such words - a
+few hundred perhaps. Building such a lexicon separately also
+makes it less important to cover everything by the
+worst-case paradigms (mkV etc).
+
+
+
+Lexicon extraction from a word list
+
+You can often find resources such as lists of
+irregular verbs on the internet. For instance, the
+
+Dutch for Travelers page gives a list of verbs in the
+traditional tabular format, which begins as follows:
+
+ begrijpen begrijp begreep begrepen to understand
+ bijten bijt beet gebeten to bite
+ binden bind bond gebonden to tie
+ breken breek brak gebroken to break
+
+All you have to do is to write a suitable verb paradigm
+
+ irregV : Str -> Str -> Str -> Str -> V ;
+
+and a Perl or Python or Haskell script that transforms
+the table to
+
+ begrijpen_V = irregV "begrijpen" "begrijp" "begreep" "begrepen" ;
+ bijten_V = irregV "bijten" "bijt" "beet" "gebeten" ;
+ binden_V = irregV "binden" "bind" "bond" "gebonden" ;
+
+(You may want to use the English translation for some purpose, as well.)
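
As a minimal sketch of such a script, here is one possible Python version.
It assumes that every row of the table has the four verb forms as its first
four whitespace-separated fields and that everything after them is the
English gloss. The function names table_line_to_gf and table_to_gf and the
keep_gloss option are hypothetical, introduced only for this illustration;
irregV is the paradigm declared above.

```python
def table_line_to_gf(line, keep_gloss=False):
    """Turn one row of the verb table into a GF lexicon entry.

    Assumption: the first four whitespace-separated fields are the verb
    forms; anything after them is the English gloss.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least four verb forms: {line!r}")
    forms = fields[:4]
    gloss = " ".join(fields[4:])
    # Quote each form as a GF string argument of irregV.
    args = " ".join(f'"{form}"' for form in forms)
    entry = f"{forms[0]}_V = irregV {args} ;"
    if keep_gloss and gloss:
        # Keep the English translation as a GF comment.
        entry += f" -- {gloss}"
    return entry


def table_to_gf(text, keep_gloss=False):
    """Convert a whole table, skipping blank lines."""
    return [
        table_line_to_gf(line, keep_gloss)
        for line in text.splitlines()
        if line.strip()
    ]


if __name__ == "__main__":
    sample = """
    begrijpen begrijp begreep begrepen to understand
    bijten bijt beet gebeten to bite
    binden bind bond gebonden to tie
    """
    for entry in table_to_gf(sample):
        print(entry)
```

With keep_gloss=True the English translation is kept as a GF comment after
each entry. Rows with fewer than four fields are rejected rather than
silently skipped, so that a malformed table is noticed early.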
+
+
+
+When using ready-made word lists, you should think about
+copyright issues. Ideally, all resource grammar material should
+be provided under the GNU General Public License.
+
+
+
+
Lexicon extraction from raw text data
+
+This is a cheap technique to build a lexicon of thousands
+of words, if text data is available in digital format.
+See the
+Functional Morphology homepage for details.
+
+
+
+Extending the resource grammar API
+
+Sooner or later it will happen that the resource grammar API
+does not suffice for all applications. A common reason is
+that it does not include idiomatic expressions in a given language.
+The solution then is in the first place to build language-specific
+extension modules. This chapter will deal with this issue.
+
+
+Writing an instance of a parametrized resource grammar implementation
+
+Above we have looked at how a resource implementation is built by
+the copy and paste method (from English to Dutch), that is, formally
+speaking, from scratch. A more elegant solution available for
+families of languages such as Romance and Scandinavian is to
+use parametrized modules. The advantages are
+
+- theoretical: linguistic generalizations and insights
+
- practical: maintainability improves with fewer components
+
+In this chapter, we will look at an example: adding Portuguese to
+the Romance family.
+
+
+
+Parametrizing a resource grammar implementation
+
+This is the most demanding form of resource grammar writing.
+We do not recommend the method of parametrizing from the
+beginning: it is easier to have one language first implemented
+in the conventional way and then add another language of the
+same family by parametrization. This means that the copy and
+paste method is still used, but this time the differences
+are put into an interface module.
+
+
+
+This chapter will work out an example of how an Estonian grammar
+is constructed from the Finnish grammar through parametrization.
+
+
+