diff --git a/doc/tutorial/gf-tutorial2.html b/doc/tutorial/gf-tutorial2.html index 00caa1d58..d657f7cc8 100644 --- a/doc/tutorial/gf-tutorial2.html +++ b/doc/tutorial/gf-tutorial2.html @@ -7,7 +7,7 @@

Grammatical Framework Tutorial

Author: Aarne Ranta <aarne (at) cs.chalmers.se>
-Last update: Wed Jan 25 16:03:03 2006 +Last update: Fri Jun 16 01:02:28 2006

@@ -34,7 +34,7 @@ Last update: Wed Jan 25 16:03:03 2006
  • Labelled context-free grammars
  • The labelled context-free format -
  • The ``.gf`` grammar format +
  • The .gf grammar format @@ -222,7 +229,8 @@ These grammars can be used as libraries to define application grammars. In this way, it is possible to write a high-quality grammar without knowing about linguistics: in general, to write an application grammar by using the resource library just requires practical knowledge of -the target language. +the target language, and all theoretical knowledge about its grammar +is given by the libraries.

    Who is this tutorial for

    @@ -258,9 +266,10 @@ notation (also known as BNF). The BNF format is often a good starting point for GF grammar development, because it is simple and widely used. However, the BNF format is not good for multilingual grammars. While it is possible to -translate the words contained in a BNF grammar to another -language, proper translation usually involves more, e.g. -changing the word order in +"translate" by just changing the words contained in a +BNF grammar to words of some other +language, proper translation usually involves more. +For instance, the order of words may have to be changed:

       Italian cheese ===> formaggio italiano
    @@ -279,14 +288,14 @@ Italian adjectives usually have four forms where English
     has just one:
     

    -    delicious (wine | wines | pizza | pizzas)
    +    delicious (wine, wines, pizza, pizzas)
         vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose
     

    The morphology of a language describes the forms of its words. While the complete description of morphology -belongs to resource grammars, the tutorial will explain the -main programming concepts involved. This will moreover +belongs to resource grammars, this tutorial will explain the +programming concepts involved in morphology. This will moreover make it possible to grow the fragment covered by the food example. The tutorial will in fact build a toy resource grammar in order to illustrate the module structure of library-based application @@ -584,7 +593,7 @@ a sentence but a sequence of ten sentences.

    Labelled context-free grammars

    The syntax trees returned by GF's parser in the previous examples -are not so nice to look at. The identifiers of form Mks +are not so nice to look at. The identifiers that form the tree are labels of the BNF rules. To see which label corresponds to which rule, you can use the print_grammar = pg command with the printer flag set to cf (which means context-free): @@ -631,7 +640,7 @@ labels to each rule. In files with the suffix .cf, you can prefix rules with labels that you provide yourself - these may be more useful than the automatically generated ones. The following is a possible -labelling of paleolithic.cf with nicer-looking labels. +labelling of food.cf with nicer-looking labels.

         Is.        S       ::= Item "is" Quality ;
    @@ -661,7 +670,7 @@ With this grammar, the trees look as follows:
     
     

    -

    The ``.gf`` grammar format

    +

    The .gf grammar format

    To see what there is in GF's shell state when a grammar has been imported, you can give the plain command @@ -696,7 +705,7 @@ A GF grammar consists of two main parts:

    -The EBNF and CF formats fuse these two things together, but it is possible +The CF format fuses these two things together, but it is possible to take them apart. For instance, the sentence formation rule

    @@ -773,7 +782,7 @@ judgement forms:
     

    We return to the precise meanings of these judgement forms later. First we will look at how judgements are grouped into modules, and -show how the paleolithic grammar is +show how the food grammar is expressed by using modules and judgements.

    @@ -950,7 +959,7 @@ A system with this property is called a multilingual grammar.

    Multilingual grammars can be used for applications such as -translation. Let us buid an Italian concrete syntax for +translation. Let us build an Italian concrete syntax for Food and then test the resulting multilingual grammar.

    @@ -1179,10 +1188,11 @@ The graph uses
  • square boxes for concrete modules
  • black-headed arrows for inheritance
  • white-headed arrows for the concrete-of-abstract relation -

    - +

    + +

    System commands

    @@ -1203,7 +1213,7 @@ shell escape symbol !. The resulting graph was shown in the previou

    The command print_multi = pm is used for printing the current multilingual grammar in various formats, of which the format -printer=graph just -shows the module dependencies. Use the help to see what other formats +shows the module dependencies. Use help to see what other formats are available:

    @@ -1216,9 +1226,9 @@ are available:
     
     

    The golden rule of functional programming

    -In comparison to the .cf format, the .gf format still looks rather +In comparison to the .cf format, the .gf format looks rather verbose, and demands lots more characters to be written. You have probably -done this by the copy-paste-modify method, which is a standard way to +done this by the copy-paste-modify method, which is a common way to avoid repeating work.

    @@ -1232,8 +1242,8 @@ method. The golden rule of functional programming says that

    A function separates the shared parts of different computations from the changing parts, parameters. In functional programming languages, such as -Haskell, it is possible to share muc more than in -the languages such as C and Java. +Haskell, it is possible to share much more than in +languages such as C and Java.

    Operation definitions

    @@ -1283,11 +1293,8 @@ strings and records. resource StringOper = { oper SS : Type = {s : Str} ; - ss : Str -> SS = \x -> {s = x} ; - cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ; - prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ; }
    @@ -1433,7 +1440,7 @@ forms of a word are formed.

    From the GF point of view, a paradigm is a function that takes a lemma - -a string also known as a dictionary form - and returns an inflection +also known as a dictionary form - and returns an inflection table of the desired type. Paradigms are not functions in the sense of the fun judgements of abstract syntax (which operate on trees and not on strings), but operations defined in oper judgements. @@ -1457,13 +1464,13 @@ are written together to form one token. Thus, for instance,

  • -

    Worst-case macros and data abstraction

    +

    Worst-case functions and data abstraction

    Some English nouns, such as mouse, are so irregular that it makes no sense to see them as instances of a paradigm. Even then, it is useful to perform data abstraction from the definition of the type Noun, and introduce a constructor -operation, a worst-case macro for nouns: +operation, a worst-case function for nouns:

         oper mkNoun : Str -> Str -> Noun = \x,y -> {
    @@ -1490,7 +1497,7 @@ and
     instead of writing the inflection table explicitly.
     

    -The grammar engineering advantage of worst-case macros is that +The grammar engineering advantage of worst-case functions is that the author of the resource module may change the definitions of Noun and mkNoun, and still retain the interface (i.e. the system of type signatures) that makes it @@ -1498,7 +1505,7 @@ correct to use these functions in concrete modules. In programming terms, Noun is then treated as an abstract datatype.
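As a minimal sketch of this abstraction, assume that Noun is a record containing a table over Number, as in the earlier sections of the tutorial. The module name NounSketch and the constants mouse and house are hypothetical and serve only as illustration:

```gf
resource NounSketch = {
  param Number = Sg | Pl ;
  oper
    -- the representation that users of the module should not rely on
    Noun : Type = {s : Number => Str} ;

    -- worst-case function: both forms are given explicitly
    mkNoun : Str -> Str -> Noun = \x,y ->
      {s = table {Sg => x ; Pl => y}} ;

    -- regular paradigm, defined through the worst-case function
    regNoun : Str -> Noun = \x -> mkNoun x (x + "s") ;

    -- lexical entries use only the type signatures above
    mouse : Noun = mkNoun "mouse" "mice" ;
    house : Noun = regNoun "house" ;
}
```

Since mouse and house refer only to the signatures of mkNoun and regNoun, a later change to the record type of Noun would presumably require edits only in the bodies of Noun and mkNoun.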

    -

    A system of paradigms using ``Prelude`` operations

    +

    A system of paradigms using Prelude operations

    In addition to the completely regular noun paradigm regNoun, some other frequent noun paradigms deserve to be @@ -1707,7 +1714,7 @@ The rule of subject-verb agreement in English says that the verb phrase must be inflected in the number of the subject. This means that a noun phrase (functioning as a subject), inherently has a number, which it passes to the verb. The verb does not -have a number, but must be able to receive whatever number the +have a number, but must be able to receive whatever number the subject has. This distinction is nicely represented by the different linearization types of noun phrases and verb phrases:

    @@ -1717,7 +1724,8 @@ different linearization types of noun phrases and verb phrases:

    We say that the number of NP is an inherent feature, -whereas the number of NP is parametric. +whereas the number of VP is a variable feature (or a +parametric feature).

    The agreement rule itself is expressed in the linearization rule of @@ -1823,7 +1831,7 @@ Here is an example of pattern matching, the paradigm of regular adjectives. }

    -A constructor can have patterns as arguments. For instance, +A constructor can be used as a pattern that has patterns as arguments. For instance, the adjectival paradigm in which the two singular forms are the same, can be defined

    @@ -1837,9 +1845,9 @@ can be defined

    Morphological analysis and morphology quiz

    -Even though in GF morphology -is mostly seen as an auxiliary of syntax, a morphology once defined -can be used on its own right. The command morpho_analyse = ma +Even though morphology is in GF +mostly used as an auxiliary for syntax, it +can also be useful in its own right. The command morpho_analyse = ma can be used to read a text and return for each word the analyses that it has in the current concrete syntax.

    @@ -1865,11 +1873,12 @@ the category is set to be something else than S. For instance, Score 0/1

    -Finally, a list of morphological exercises and save it in a +Finally, a list of morphological exercises can be generated +off-line and saved in a file for later use, by the command morpho_list = ml

    -    > morpho_list -number=25 -cat=V
    +    > morpho_list -number=25 -cat=V | wf exx.txt
     

    The number flag gives the number of exercises generated. @@ -1884,25 +1893,36 @@ a sentence may place the object between the verb and the particle: he switched it off.

    -The first of the following judgements defines transitive verbs as +The following judgement defines transitive verbs as discontinuous constituents, i.e. as having a linearization -type with two strings and not just one. The second judgement +type with two strings and not just one. +

    +
    +    lincat TV = {s : Number => Str ; part : Str} ;
    +
    +

    +This linearization rule shows how the constituents are separated by the object in complementization.

    -    lincat TV         = {s : Number => Str ; part : Str} ;
         lin PredTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.part} ;
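The following sketch shows the same two-field idea in a self-contained resource module that can be tried without an abstract syntax. All names here (ParticleVerbs, mkTV, predTV, switchOff, itNP) are hypothetical, and the NP type is simplified to a single string, which is an assumption and not the tutorial's definition:

```gf
resource ParticleVerbs = {
  param Number = Sg | Pl ;
  oper
    -- a verb with a separable particle: two string-valued fields
    TV : Type = {s : Number => Str ; part : Str} ;
    NP : Type = {s : Str} ;

    -- hypothetical constructor: singular form, plural form, particle
    mkTV : Str -> Str -> Str -> TV = \sg,pl,p ->
      {s = table {Sg => sg ; Pl => pl} ; part = p} ;

    -- same body as the PredTV rule: the object goes between verb and particle
    predTV : TV -> NP -> {s : Number => Str} = \tv,obj ->
      {s = \\n => tv.s ! n ++ obj.s ++ tv.part} ;

    switchOff : TV = mkTV "switches" "switch" "off" ;
    itNP : NP = {s = "it"} ;
}
```

With these definitions, the term (predTV switchOff itNP).s ! Sg selects the singular verb form and places the object before the particle, giving the token sequence switches, it, off.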
     

    There is no restriction in the number of discontinuous constituents (or other fields) a lincat may contain. The only condition is that the fields must be of finite types, i.e. built from records, tables, -parameters, and Str, and not functions. A mathematical result +parameters, and Str, and not functions. +

    +

    +A mathematical result about parsing in GF says that the worst-case complexity of parsing -increases with the number of discontinuous constituents. Moreover, -the parsing and linearization commands only give reliable results -for categories whose linearization type has a unique Str valued -field labelled s. +increases with the number of discontinuous constituents. This is +potentially a reason to avoid discontinuous constituents. +Moreover, the parsing and linearization commands only give accurate +results for categories whose linearization type has a unique Str +valued field labelled s. Therefore, discontinuous constituents +are not a good idea in top-level categories accessed by the users +of a grammar application.

    More constructs for concrete syntax

    @@ -1953,8 +1973,25 @@ can be used e.g. if a word lacks a certain form. In general, variants should be used cautiously. It is not recommended for modules aimed to be libraries, because the user of the library has no way to choose among the variants. -Moreover, even though variants admits lists of any type, -its semantics for complex types can cause surprises. +Moreover, variants is only defined for basic types (Str +and parameter types). The grammar compiler will admit +variants for any types, but it will push it to the +level of basic types in a way that may be unwanted. +For instance, German has two words meaning "car", +Wagen, which is Masculine, and Auto, which is Neuter. +However, if one writes +

    +
    +    variants {{s = "Wagen" ; g = Masc} ; {s = "Auto" ; g = Neutr}}
    +
    +

    +this will compute to +

    +
    +    {s = variants {"Wagen" ; "Auto"} ; g = variants {Masc ; Neutr}}
    +
    +

    +which will also accept erroneous combinations of strings and genders.
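One way to avoid this effect, sketched here under the assumption that it is acceptable to have two abstract functions for the two words, is to give each word its own lexical entry, so that string and gender stay paired. The module names Cars and CarsGer and the function names Wagen_N and Auto_N are hypothetical; the two modules would normally be placed in separate files:

```gf
abstract Cars = {
  cat N ;
  fun Wagen_N, Auto_N : N ;
}

concrete CarsGer of Cars = {
  param Gender = Masc | Fem | Neutr ;
  lincat N = {s : Str ; g : Gender} ;
  lin
    -- each record keeps its own string together with its own gender
    Wagen_N = {s = "Wagen" ; g = Masc} ;
    Auto_N  = {s = "Auto"  ; g = Neutr} ;
}
```

The choice between the two words is then visible in the syntax tree and is no longer hidden inside a variants expression.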

    Record extension and subtyping

    @@ -2039,9 +2076,6 @@ possible to write, slightly surprisingly,

    Regular expression patterns

    -(New since 7 January 2006.) -

    -

    To define string operations computed at compile time, such as in morphology, it is handy to use regular expression patterns:

    @@ -2076,7 +2110,6 @@ Another example: English noun plural formation. x + "y" => x + "ies" ; _ => w + "s" } ; -

    Semantics: variables are always bound to the first match, which is the first @@ -2085,8 +2118,10 @@ in the sequence of binding lists Match p v defined as follows. In t

         Match (p1|p2) v = Match p1 v ++ Match p2 v
    -    Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | i <- [0..length s], (s1,s2) = splitAt i s]
    -    Match p*      s = Match "" s ++ Match p s ++ Match (p + p) s ++ ...
    +    Match (p1+p2) s = [Match p1 s1 ++ Match p2 s2 | 
    +                         i <- [0..length s], (s1,s2) = splitAt i s]
    +    Match p*      s = [[]] if Match "" s ++ Match p s ++ Match (p+p) s ++... /= []
    +    Match -p      v = [[]] if Match p v = []
         Match c       v = [[]] if c == v  -- for constant and literal patterns c
         Match x       v = [[(x,v)]]       -- for variable patterns x
         Match x@p     v = [[(x,v)]] + M   if M = Match p v /= []
    @@ -2097,14 +2132,18 @@ Examples:
     

    Prefix-dependent choices

    -The construct exemplified in +Sometimes a token has different forms depending on the token +that follows. An example is the English indefinite article, +which is "an" if a vowel follows, "a" otherwise. +Which form is chosen can only be decided at run time, i.e. +when a string is actually built. GF has a special construct for +such tokens, the pre construct exemplified in

         oper artIndef : Str = 
    @@ -2152,22 +2191,61 @@ they can be used as arguments. For example:
       
         -- e.g. (StreetAddress 10 "Downing Street") : Address
     
    -

    +

    +The linearization type is {s : Str} for all these categories. +

    -

    More features of the module system

    +

    More concepts of abstract syntax

    -

    Interfaces, instances, and functors

    +

    GF as a logical framework

    +

    +In this section, we will show how +to encode advanced semantic concepts in an abstract syntax. +We use concepts inherited from type theory. Type theory +is the basis of many systems known as logical frameworks, which are +used for representing mathematical theorems and their proofs on a computer. +In fact, GF has a logical framework as its proper part: +this part is the abstract syntax. +

    +

    +In a logical framework, the formalization of a mathematical theory +is a set of type and function declarations. The following is an example +of such a theory, represented as an abstract module in GF. +

    +
    +    abstract Geometry = {
    +      cat 
    +        Line ; Point ; Circle ;            -- basic types of figures
    +        Prop ;                             -- proposition
    +      fun
    +        Parallel : Line -> Line -> Prop ;  -- x is parallel to y
    +        Centre : Circle -> Point ;         -- the centre of c
    +      } 
    +
    +
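To connect this theory to the concrete syntax of the earlier sections, here is a possible English linearization, given only as a sketch. The module name GeometryEng and the English wordings are assumptions; the linearization type {s : Str} is chosen for simplicity:

```gf
concrete GeometryEng of Geometry = {
  lincat
    Line = {s : Str} ;
    Point = {s : Str} ;
    Circle = {s : Str} ;
    Prop = {s : Str} ;
  lin
    -- Parallel x y : Prop
    Parallel x y = {s = x.s ++ "is parallel to" ++ y.s} ;
    -- Centre c : Point
    Centre c = {s = "the centre of" ++ c.s} ;
}
```

Since Geometry as shown declares no constants of type Line or Circle, there are no closed trees to linearize yet; the sketch only indicates how the type and function declarations could be rendered as text.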

    +

    Dependent types

    + +

    Higher-order abstract syntax

    + +

    Semantic definitions

    + +

    List categories

    + +

    More features of the module system

    + +

    Interfaces, instances, and functors

    +

    Resource grammars and their reuse

    A resource grammar is a grammar built on linguistic grounds, to describe a language rather than a domain. -The GF resource grammar library contains resource grammars for +The GF resource grammar library, which contains resource grammars for 10 languages, is described more closely in the following documents:

    - +

    Efficiency of grammars

    Issues: @@ -2290,7 +2550,7 @@ Issues:

  • parsing efficiency: -mcfg vs. others - +

    Speech input and output

    The speak_aloud = sa command sends a string to the speech @@ -2320,7 +2580,7 @@ The method works only for grammars of English. Both Flite and ATK are freely available through the links above, but they are not distributed together with GF.

    - +

    Multilingual syntax editor

    The @@ -2337,12 +2597,12 @@ Here is a snapshot of the editor: The grammars of the snapshot are from the Letter grammar package.

    - +

    Interactive Development Environment (IDE)

    Forthcoming.

    - +

    Communicating with GF

    Other processes can communicate with the GF command interpreter, @@ -2359,7 +2619,7 @@ Thus the most silent way to invoke GF is - +

    Embedded grammars in Haskell, Java, and Prolog

    GF grammars can be used as parts of programs written in the @@ -2371,15 +2631,15 @@ following languages. The links give more documentation.

  • Prolog - +

    Alternative input and output grammar formats

    A summary is given in the following chart of GF grammar compiler phases:

    - +

    Case studies

    - +

    Interfacing formal and natural languages

    Formal and Informal Software Specifications, @@ -2392,6 +2652,6 @@ English and German. A simpler example will be explained here.

    - + diff --git a/lib/resource-1.0/doc/Resource-HOWTO.html b/lib/resource-1.0/doc/Resource-HOWTO.html index 4435d3c8e..58e05bd46 100644 --- a/lib/resource-1.0/doc/Resource-HOWTO.html +++ b/lib/resource-1.0/doc/Resource-HOWTO.html @@ -7,9 +7,56 @@

    Resource grammar writing HOWTO

    Author: Aarne Ranta <aarne (at) cs.chalmers.se>
    -Last update: Fri May 26 17:36:48 2006 +Last update: Fri Jun 16 00:59:52 2006
    +

    +
    +

    + + +

    +
    +

    The purpose of this document is to tell how to implement the GF resource grammar API for a new language. We will not cover how @@ -17,23 +64,43 @@ to use the resource grammar, nor how to change the API. But we will give some hints how to extend the API.

    -Notice. This document concerns the API v. 1.0 which has not -yet been released. You can find the current code -in GF/lib/resource-1.0/. See the -resource-1.0/README for +A manual for using the resource grammar is found in +

    +

    +http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf. +

    +

    +A tutorial on GF, also introducing the idea of resource grammars, is found in +

    +

    +http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html. +

    +

    +This document concerns the API v. 1.0. You can find the current code in +

    +

    +http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/ +

    +

    +See the README for details on how this differs from previous versions.

    +

    The resource grammar API

    The API is divided into a bunch of abstract modules. The following figure gives the dependencies of these modules.

    - +

    -The module structure is rather flat: almost every module is a direct -parent of the top module Lang. The idea +Thus the API consists of a grammar and a lexicon, which is +provided for test purposes. +

    +

    +The module structure is rather flat: most modules are direct +parents of Grammar. The idea is that you can concentrate on one linguistic aspect at a time, or also distribute the work among several authors. The module Cat defines the "glue" that ties the aspects together - a type system @@ -41,6 +108,7 @@ to which all the other modules conform, so that e.g. NP means the same thing in those modules that use NPs and those that construct them.

    +

    Phrase category modules

    The direct parents of the top will be called phrase category modules, @@ -65,6 +133,7 @@ one of a small number of different types). Thus we have

  • Idiom: idiomatic phrases such as existentials +

    Infrastructure modules

    Expressions of each phrase category are constructed in the corresponding @@ -93,6 +162,7 @@ can skip the lincat definition of a category and use the default {s : Str} until you need to change it to something else. In English, for instance, many categories do have this linearization type.

    +

    Lexical modules

    What is lexical and what is syntactic is not as clearcut in GF as in @@ -129,6 +199,45 @@ different languages on the level of a resource grammar. In other words, application grammars are likely to use the resource in different ways for different languages.

    + +

    Language-dependent syntax modules

    +

    +In addition to the common API, there is room for language-dependent extensions +of the resource. The top level of each language looks as follows (with English as example):

    +
    +    abstract English = Grammar, ExtraEngAbs, DictEngAbs
    +
    +

    +where ExtraEngAbs is a collection of syntactic structures specific to English, +and DictEngAbs is an English dictionary +(at the moment, it consists of IrregEngAbs, +the irregular verbs of English). Each of these language-specific grammars has +the potential to grow into a full-scale grammar of the language. These grammars +can also be used as libraries, but the possibility of using functors is lost. +

    +

    +To give a better overview of language-specific structures, +modules like ExtraEngAbs +are built from a language-independent module ExtraAbs +by restricted inheritance: +

    +
    +    abstract ExtraEngAbs = Extra [f,g,...]
    +
    +

    +Thus any category and function in Extra may be shared by a subset of all +languages. One can see this set-up as a matrix, which tells +what Extra structures +are implemented in what languages. For the common API in Grammar, the matrix +is filled with 1's (everything is implemented in every language). +

    +

    +In a minimal resource grammar implementation, the language-dependent +extensions are just empty modules, but it is good to provide them for +the sake of uniformity. +

    +

    The core of the syntax

    Among all categories and functions, a handful are @@ -153,6 +262,7 @@ rules relate the categories to each other. It is intended to be a first approximation when designing the parameter system of a new language.

    +

    Another reduced API

    If you want to experiment with a small subset of the resource API first, @@ -161,6 +271,7 @@ try out the module explained in the GF Tutorial.

    +

    The present-tense fragment

    Some lines in the resource library are suffixed with the comment @@ -176,7 +287,9 @@ implementation. To compile a grammar with present-tense-only, use i -preproc=GF/lib/resource-1.0/mkPresent LangGer.gf

    +

    Phases of the work

    +

    Putting up a directory

    Unless you are writing an instance of a parametrized implementation @@ -262,6 +375,7 @@ as e.g. VerbGer.

    +

    Direction of work

    The real work starts now. There are many ways to proceed, the main ones being @@ -360,6 +474,7 @@ and dependences there are in your language, and you can now produce very much in the order you please. +

    The develop-test cycle

    The following develop-test cycle will @@ -416,6 +531,7 @@ follow soon. (You will find out that these explanations involve a rational reconstruction of the live process! Among other things, the API was changed during the actual process to make it more intuitive.)

    +

    Resource modules used

    These modules will be written by you. @@ -472,6 +588,7 @@ almost everything. This led in practice to the duplication of almost all code on the lin and oper levels, and made the code hard to understand and maintain.

    +

    Morphology and lexicon

    The paradigms needed to implement @@ -542,6 +659,7 @@ These constants are defined in terms of parameter types and constructors in ResGer and MorphoGer, which modules are not visible to the application grammarian.

    +

    Lock fields

    An important difference between MorphoGer and @@ -588,6 +706,7 @@ in her hidden definitions of constants in Paradigms. For instance, -- mkAdv s = {s = s ; lock_Adv = <>} ;

    +

    Lexicon construction

    The lexicon belonging to LangGer consists of two modules: @@ -607,17 +726,20 @@ the coverage of the paradigms gets thereby tested and that the use of the paradigms in LexiconGer gives a good set of examples for those who want to build new lexica.

    +

    Inside grammar modules

    Detailed implementation tricks are found in the comments of each module.

    +

    The category system

    +

    Phrase category modules

    +

    Resource modules

    +

    Lexicon

    +

    Lexicon extension

    +

    The irregularity lexicon

    It may be handy to provide a separate module of irregular @@ -658,6 +784,7 @@ few hundred perhaps. Building such a lexicon separately also makes it less important to cover everything by the worst-case paradigms (mkV etc).

    +

    Lexicon extraction from a word list

    You can often find resources such as lists of @@ -692,6 +819,7 @@ When using ready-made word lists, you should think about copyright issues. Ideally, all resource grammar material should be provided under GNU General Public License.

    +

    Lexicon extraction from raw text data

    This is a cheap technique to build a lexicon of thousands @@ -699,6 +827,7 @@ of words, if text data is available in digital format. See the Functional Morphology homepage for details.

    +

    Extending the resource grammar API

    Sooner or later it will happen that the resource grammar API @@ -707,6 +836,7 @@ that it does not include idiomatic expressions in a given language. The solution then is in the first place to build language-specific extension modules. This chapter will deal with this issue (to be completed).

    +

    Writing an instance of parametrized resource grammar implementation

    Above we have looked at how a resource implementation is built by @@ -726,6 +856,7 @@ the Romance family (to be completed). Here is a set of slides on the topic.

    +

    Parametrizing a resource grammar implementation

    This is the most demanding form of resource grammar writing. @@ -742,5 +873,5 @@ is constructed from the Finnish grammar through parametrization.

    - + diff --git a/lib/resource-1.0/doc/Resource-HOWTO.txt b/lib/resource-1.0/doc/Resource-HOWTO.txt index 937a30df5..6b8df4563 100644 --- a/lib/resource-1.0/doc/Resource-HOWTO.txt +++ b/lib/resource-1.0/doc/Resource-HOWTO.txt @@ -14,11 +14,19 @@ resource grammar API for a new language. We will //not// cover how to use the resource grammar, nor how to change the API. But we will give some hints how to extend the API. +A manual for using the resource grammar is found in -**Notice**. This document concerns the API v. 1.0 which has not -yet been released. You can find the current code -in [``GF/lib/resource-1.0/`` ..]. See the -[``resource-1.0/README`` ../README] for +[``http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf`` http://www.cs.chalmers.se/~aarne/GF/doc/resource.pdf]. + +A tutorial on GF, also introducing the idea of resource grammars, is found in + +[``http://www.cs.chalmers.se/~aarne/GF/doc/tutorial/gf-tutorial2.html`` ../../../doc/tutorial/gf-tutorial2.html]. + +This document concerns the API v. 1.0. You can find the current code in + +[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/`` ..] + +See the [``README`` ../README] for details on how this differs from previous versions. @@ -28,10 +36,13 @@ details on how this differs from previous versions. The API is divided into a bunch of ``abstract`` modules. The following figure gives the dependencies of these modules. -[Lang.png] +[Grammar.png] -The module structure is rather flat: almost every module is a direct -parent of the top module ``Lang``. The idea +Thus the API consists of a grammar and a lexicon, which is +provided for test purposes. + +The module structure is rather flat: most modules are direct +parents of ``Grammar``. The idea is that you can concentrate on one linguistic aspect at a time, or also distribute the work among several authors. 
The module ``Cat`` defines the "glue" that ties the aspects together - a type system @@ -127,6 +138,38 @@ application grammars are likely to use the resource in different ways for different languages. +==Language-dependent syntax modules== + +In addition to the common API, there is room for language-dependent extensions +of the resource. The top level of each language looks as follows (with English as example): +``` + abstract English = Grammar, ExtraEngAbs, DictEngAbs +``` +where ``ExtraEngAbs`` is a collection of syntactic structures specific to English, +and ``DictEngAbs`` is an English dictionary +(at the moment, it consists of ``IrregEngAbs``, +the irregular verbs of English). Each of these language-specific grammars has +the potential to grow into a full-scale grammar of the language. These grammars +can also be used as libraries, but the possibility of using functors is lost. + +To give a better overview of language-specific structures, +modules like ``ExtraEngAbs`` +are built from a language-independent module ``ExtraAbs`` +by restricted inheritance: +``` + abstract ExtraEngAbs = Extra [f,g,...] +``` +Thus any category and function in ``Extra`` may be shared by a subset of all +languages. One can see this set-up as a matrix, which tells +what ``Extra`` structures +are implemented in what languages. For the common API in ``Grammar``, the matrix +is filled with 1's (everything is implemented in every language). + +In a minimal resource grammar implementation, the language-dependent +extensions are just empty modules, but it is good to provide them for +the sake of uniformity. + + ==The core of the syntax== Among all categories and functions, a handful are