diff --git a/resource-1.0/doc/Resource-HOWTO.html b/resource-1.0/doc/Resource-HOWTO.html index 4435d3c8e..58e05bd46 100644 --- a/resource-1.0/doc/Resource-HOWTO.html +++ b/resource-1.0/doc/Resource-HOWTO.html @@ -7,9 +7,56 @@

Resource grammar writing HOWTO

Author: Aarne Ranta <aarne (at) cs.chalmers.se>
-Last update: Fri May 26 17:36:48 2006 +Last update: Fri Jun 16 00:59:52 2006
+

+
+

+ + +

+
+

The purpose of this document is to tell how to implement the GF resource grammar API for a new language. We will not cover how @@ -17,23 +64,43 @@ to use the resource grammar, nor how to change the API. But we will give some hints how to extend the API.

-Notice. This document concerns the API v. 1.0 which has not -yet been released. You can find the current code -in GF/lib/resource-1.0/. See the -resource-1.0/README for +A manual for using the resource grammar is found in +

+

+http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf. +

+

+A tutorial on GF, also introducing the idea of resource grammars, is found in +

+

+http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html. +

+

+This document concerns the API v. 1.0. You can find the current code in +

+

+http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/ +

+

+See the README for details on how this differs from previous versions.

+

The resource grammar API

The API is divided into a bunch of abstract modules. The following figure gives the dependencies of these modules.

- +

-The module structure is rather flat: almost every module is a direct -parent of the top module Lang. The idea +Thus the API consists of a grammar and a lexicon, which is +provided for test purposes. +

+

The module structure is rather flat: most modules are direct +parents of Grammar. The idea is that you can concentrate on one linguistic aspect at a time, or also distribute the work among several authors. The module Cat defines the "glue" that ties the aspects together - a type system @@ -41,6 +108,7 @@ to which all the other modules conform, so that e.g. NP means the same thing in those modules that use NPs and those that construct them.

+

Phrase category modules

The direct parents of the top will be called phrase category modules, @@ -65,6 +133,7 @@ one of a small number of different types). Thus we have

  • Idiom: idiomatic phrases such as existentials +

    Infrastructure modules

    Expressions of each phrase category are constructed in the corresponding @@ -93,6 +162,7 @@ can skip the lincat definition of a category and use the default {s : Str} until you need to change it to something else. In English, for instance, many categories do have this linearization type.

    +

    Lexical modules

    What is lexical and what is syntactic is not as clearcut in GF as in @@ -129,6 +199,45 @@ different languages on the level of a resource grammar. In other words, application grammars are likely to use the resource in different ways for different languages.

    + +

    Language-dependent syntax modules

    +

    +In addition to the common API, there is room for language-dependent extensions +of the resource. The top level of each language looks as follows (with English as an example): +

    +
    +    abstract English = Grammar, ExtraEngAbs, DictEngAbs
    +
    +

    +where ExtraEngAbs is a collection of syntactic structures specific to English, +and DictEngAbs is an English dictionary +(at the moment, it consists of IrregEngAbs, +the irregular verbs of English). Each of these language-specific grammars has +the potential to grow into a full-scale grammar of the language. These grammars +can also be used as libraries, but the possibility of using functors is lost. +

    +

    +To give a better overview of language-specific structures, +modules like ExtraEngAbs +are built from a language-independent module ExtraAbs +by restricted inheritance: +

    +
    +    abstract ExtraEngAbs = Extra [f,g,...]
    +
    +

    +Thus any category and function in Extra may be shared by a subset of all +languages. One can see this set-up as a matrix, which tells +what Extra structures +are implemented in what languages. For the common API in Grammar, the matrix +is filled with 1's (everything is implemented in every language). +
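    The restricted inheritance described above can be sketched as follows. This is only an illustration: the category ExtraCat and the functions ExtraFun1, ExtraFun2 and ExtraFun3 are hypothetical placeholders, not names from the actual library, and the sketch has not been checked against a GF compiler. Each module would go in its own file.

```
-- Extra.gf (hypothetical contents): a language-independent pool of
-- structures that some, but not all, languages implement.
abstract Extra = Cat ** {
  cat ExtraCat ;
  fun
    ExtraFun1 : NP -> ExtraCat ;
    ExtraFun2 : VP -> ExtraCat ;
    ExtraFun3 : ExtraCat -> S ;
}

-- ExtraEngAbs.gf: English inherits only the names listed in brackets.
-- ExtraFun2 is left out, i.e. its cell in the English column of the
-- matrix is empty.
abstract ExtraEngAbs = Extra [ExtraCat, ExtraFun1, ExtraFun3] ** {
}
```

    Under these assumptions, another language could list a different subset of Extra in its own module, which is what makes the matrix view possible.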

    +

    +In a minimal resource grammar implementation, the language-dependent +extensions are just empty modules, but it is good to provide them for +the sake of uniformity. +

    +

    The core of the syntax

    Among all categories and functions, a handful are @@ -153,6 +262,7 @@ rules relate the categories to each other. It is intended to be a first approximation when designing the parameter system of a new language.

    +

    Another reduced API

    If you want to experiment with a small subset of the resource API first, @@ -161,6 +271,7 @@ try out the module explained in the GF Tutorial.

    +

    The present-tense fragment

    Some lines in the resource library are suffixed with the comment @@ -176,7 +287,9 @@ implementation. To compile a grammar with present-tense-only, use i -preproc=GF/lib/resource-1.0/mkPresent LangGer.gf

    +

    Phases of the work

    +

    Putting up a directory

    Unless you are writing an instance of a parametrized implementation @@ -262,6 +375,7 @@ as e.g. VerbGer.

    +

    Direction of work

    The real work starts now. There are many ways to proceed, the main ones being @@ -360,6 +474,7 @@ and dependences there are in your language, and you can now produce very much in the order you please. +

    The develop-test cycle

    The following develop-test cycle will @@ -416,6 +531,7 @@ follow soon. (You will find out that these explanations involve a rational reconstruction of the live process! Among other things, the API was changed during the actual process to make it more intuitive.)

    +

    Resource modules used

    These modules will be written by you. @@ -472,6 +588,7 @@ almost everything. This led in practice to the duplication of almost all code on the lin and oper levels, and made the code hard to understand and maintain.

    +

    Morphology and lexicon

    The paradigms needed to implement @@ -542,6 +659,7 @@ These constants are defined in terms of parameter types and constructors in ResGer and MorphoGer, which modules are not visible to the application grammarian.

    +

    Lock fields

    An important difference between MorphoGer and @@ -588,6 +706,7 @@ in her hidden definitions of constants in Paradigms. For instance, -- mkAdv s = {s = s ; lock_Adv = <>} ;

    +

    Lexicon construction

    The lexicon belonging to LangGer consists of two modules: @@ -607,17 +726,20 @@ the coverage of the paradigms gets thereby tested and that the use of the paradigms in LexiconGer gives a good set of examples for those who want to build new lexica.

    +

    Inside grammar modules

    Detailed implementation tricks are found in the comments of each module.

    +

    The category system

    +

    Phrase category modules

    +

    Resource modules

    +

    Lexicon

    +

    Lexicon extension

    +

    The irregularity lexicon

    It may be handy to provide a separate module of irregular @@ -658,6 +784,7 @@ few hundred perhaps. Building such a lexicon separately also makes it less important to cover everything by the worst-case paradigms (mkV etc).

    +

    Lexicon extraction from a word list

    You can often find resources such as lists of @@ -692,6 +819,7 @@ When using ready-made word lists, you should think about copyright issues. Ideally, all resource grammar material should be provided under the GNU General Public License.

    +

    Lexicon extraction from raw text data

    This is a cheap technique to build a lexicon of thousands @@ -699,6 +827,7 @@ of words, if text data is available in digital format. See the Functional Morphology homepage for details.

    +

    Extending the resource grammar API

    Sooner or later it will happen that the resource grammar API @@ -707,6 +836,7 @@ that it does not include idiomatic expressions in a given language. The solution then is in the first place to build language-specific extension modules. This chapter will deal with this issue (to be completed).

    +

    Writing an instance of parametrized resource grammar implementation

    Above we have looked at how a resource implementation is built by @@ -726,6 +856,7 @@ the Romance family (to be completed). Here is a set of slides on the topic.

    +

    Parametrizing a resource grammar implementation

    This is the most demanding form of resource grammar writing. @@ -742,5 +873,5 @@ is constructed from the Finnish grammar through parametrization.

    - + diff --git a/resource-1.0/doc/Resource-HOWTO.txt b/resource-1.0/doc/Resource-HOWTO.txt index 937a30df5..6b8df4563 100644 --- a/resource-1.0/doc/Resource-HOWTO.txt +++ b/resource-1.0/doc/Resource-HOWTO.txt @@ -14,11 +14,19 @@ resource grammar API for a new language. We will //not// cover how to use the resource grammar, nor how to change the API. But we will give some hints how to extend the API. +A manual for using the resource grammar is found in -**Notice**. This document concerns the API v. 1.0 which has not -yet been released. You can find the current code -in [``GF/lib/resource-1.0/`` ..]. See the -[``resource-1.0/README`` ../README] for +[``http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf`` http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf]. + +A tutorial on GF, also introducing the idea of resource grammars, is found in + +[``http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html`` ../../../doc/tutorial/gf-tutorial2.html]. + +This document concerns the API v. 1.0. You can find the current code in + +[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/`` ..] + +See the [``README`` ../README] for details on how this differs from previous versions. @@ -28,10 +36,13 @@ details on how this differs from previous versions. The API is divided into a bunch of ``abstract`` modules. The following figure gives the dependencies of these modules. -[Lang.png] +[Grammar.png] -The module structure is rather flat: almost every module is a direct -parent of the top module ``Lang``. The idea +Thus the API consists of a grammar and a lexicon, which is +provided for test purposes. + +The module structure is rather flat: most modules are direct +parents of ``Grammar``. The idea is that you can concentrate on one linguistic aspect at a time, or also distribute the work among several authors. 
The module ``Cat`` defines the "glue" that ties the aspects together - a type system @@ -127,6 +138,38 @@ application grammars are likely to use the resource in different ways for different languages. +==Language-dependent syntax modules== + +In addition to the common API, there is room for language-dependent extensions +of the resource. The top level of each language looks as follows (with English as an example): +``` + abstract English = Grammar, ExtraEngAbs, DictEngAbs +``` +where ``ExtraEngAbs`` is a collection of syntactic structures specific to English, +and ``DictEngAbs`` is an English dictionary +(at the moment, it consists of ``IrregEngAbs``, +the irregular verbs of English). Each of these language-specific grammars has +the potential to grow into a full-scale grammar of the language. These grammars +can also be used as libraries, but the possibility of using functors is lost. + +To give a better overview of language-specific structures, +modules like ``ExtraEngAbs`` +are built from a language-independent module ``ExtraAbs`` +by restricted inheritance: +``` + abstract ExtraEngAbs = Extra [f,g,...] +``` +Thus any category and function in ``Extra`` may be shared by a subset of all +languages. One can see this set-up as a matrix, which tells +what ``Extra`` structures +are implemented in what languages. For the common API in ``Grammar``, the matrix +is filled with 1's (everything is implemented in every language). + +In a minimal resource grammar implementation, the language-dependent +extensions are just empty modules, but it is good to provide them for +the sake of uniformity. + + ==The core of the syntax== Among all categories and functions, a handful are