
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>The GF Resource Grammar Library Version 1.0</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>The GF Resource Grammar Library Version 1.0</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Sat Jan 13 17:48:13 2007
</FONT></CENTER>

<P>
<!-- NEW -->
</P>
<H2>Plan</H2>
<P>
Purpose
</P>
<P>
Background
</P>
<P>
Coverage
</P>
<P>
Structure
</P>
<P>
How to use
</P>
<P>
How to implement a new language
</P>
<P>
How to extend the API
</P>
<P>
<!-- NEW -->
</P>
<H2>Purpose</H2>
<H3>Library for applications</H3>
<P>
High-level access to grammatical rules
</P>
<P>
E.g. <I>You have k new messages</I> rendered in each of ten languages <I>X</I>:
</P>
<PRE>
    render X (Have (You (Number (k (New Message)))))
</PRE>
<P></P>
<P>
Usability for different purposes
</P>
<UL>
<LI>translation systems
<LI>software localization
<LI>dialogue systems
<LI>language teaching
</UL>

<P>
<!-- NEW -->
</P>
<H3>Not primarily code for a parser</H3>
<P>
Often in NLP, a grammar is just high-level code for a parser.
</P>
<P>
But a hand-written grammar can be inadequate for parsing:
</P>
<UL>
<LI>too much manual work
<LI>too inefficient
<LI>not robust
<LI>too ambiguous
</UL>

<P>
Moreover, a grammar fine-tuned for parsing may not be reusable
</P>
<UL>
<LI>for generation
<LI>for specialized grammars
<LI>as library
</UL>

<P>
<!-- NEW -->
</P>
<H3>Grammar as language definition</H3>
<P>
Linguistic ontology: <B>abstract syntax</B>
</P>
<P>
E.g. adjectival modification rule
</P>
<PRE>
    AdjCN : AP -&gt; CN -&gt; CN ;
</PRE>
<P>
Rendering in different languages: <B>concrete syntax</B>
</P>
<PRE>
    AdjCN (PositA even_A) (UseN number_N)

    even number, even numbers

    jämnt tal, jämna tal

    nombre pair, nombres pairs
</PRE>
<P>
Abstract syntax abstracts away from inflection, agreement, and word order.
</P>
<P>
Resource grammars take a generation perspective rather than a parsing one:
</P>
<UL>
<LI>abstract syntax serves as a key to renderings in different languages
</UL>
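<P>
A toy Python model can make the abstract/concrete split tangible. It is not GF and not how the library is implemented: the <CODE>Concrete</CODE> class, its <CODE>adj_first</CODE> word-order flag and the small inflection tables are assumptions made for illustration. The abstract call is the same for every language; each concrete syntax decides agreement and word order.
</P>
<PRE>
# Toy model of one abstract rule, AdjCN, with three concrete syntaxes.
# Not GF: the class, lexicon and tables are illustrative assumptions.

class Concrete:
    def __init__(self, nouns, adjs, adj_first):
        self.nouns = nouns          # maps noun to (gender, {number: form})
        self.adjs = adjs            # maps adjective to {(gender, number): form}
        self.adj_first = adj_first  # word order: adjective before noun?

    def adj_cn(self, adj, noun, number):
        # Agreement: the adjective form is selected by the noun's
        # inherent gender and by the number of the whole phrase.
        gender, forms = self.nouns[noun]
        a = self.adjs[adj][(gender, number)]
        n = forms[number]
        return f"{a} {n}" if self.adj_first else f"{n} {a}"


LANGS = {
    "Eng": Concrete(
        {"number_N": (None, {"sg": "number", "pl": "numbers"})},
        {"even_A": {(None, "sg"): "even", (None, "pl"): "even"}},
        adj_first=True),
    "Swe": Concrete(
        {"number_N": ("neutr", {"sg": "tal", "pl": "tal"})},
        {"even_A": {("utr", "sg"): "jämn", ("neutr", "sg"): "jämnt",
                    ("utr", "pl"): "jämna", ("neutr", "pl"): "jämna"}},
        adj_first=True),
    "Fre": Concrete(
        {"number_N": ("masc", {"sg": "nombre", "pl": "nombres"})},
        {"even_A": {("masc", "sg"): "pair", ("masc", "pl"): "pairs",
                    ("fem", "sg"): "paire", ("fem", "pl"): "paires"}},
        adj_first=False),
}

for lang, conc in LANGS.items():
    print(lang, conc.adj_cn("even_A", "number_N", "sg") + ",",
          conc.adj_cn("even_A", "number_N", "pl"))
</PRE>
<P>
Run as a script, this prints the same six renderings as the example above, e.g. <CODE>Swe jämnt tal, jämna tal</CODE>.
</P>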

<P>
<!-- NEW -->
</P>
<H3>Usability by non-linguists</H3>
<P>
Division of labour: resource grammars hide linguistic details
</P>
<UL>
<LI><CODE>AdjCN : AP -&gt; CN -&gt; CN</CODE> hides agreement, word order,...
</UL>

<P>
Presentation: "school grammar" concepts, dictionary-like conventions
</P>
<PRE>
    bird_N = reg2N "Vogel" "Vögel" masculine
</PRE>
<P>
API = Application Programmer's Interface
</P>
<P>
Documentation: <CODE>gfdoc</CODE>
</P>
<UL>
<LI>produces HTML documentation from GF source files
</UL>

<P>
IDE = Integrated Development Environment (forthcoming)
</P>
<UL>
<LI>library browser and syntax editor for grammar writing
</UL>

<P>
Example-based grammar writing
</P>
<PRE>
    render Ita (parse Eng "you have k messages")
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H3>Scientific interest</H3>
<P>
Linguistics
</P>
<UL>
<LI>definition of linguistic ontology
<LI>describing language on this level of abstraction
<LI>coping with different problems in different languages
<LI>sharing concrete-syntax code between languages
<LI>creating a resource for other NLP applications
</UL>

<P>
Computer science
</P>
<UL>
<LI>data structures for grammar rules
<LI>type systems for grammars
<LI>algorithms: parsing, generation, grammar compilation
<LI>domain-specific programming language (GF)
<LI>module system
</UL>

<P>
<!-- NEW -->
</P>
<H2>Background</H2>
<H3>History</H3>
<P>
2002: v. 0.2
</P>
<UL>
<LI>English, French, German, Swedish
</UL>

<P>
2003: v. 0.6
</P>
<UL>
<LI>module system
<LI>added Finnish, Italian, Russian
<LI>used in KeY
</UL>

<P>
2005: v. 0.9
</P>
<UL>
<LI>tenses
<LI>added Danish, Norwegian, Spanish; no German
<LI>used in WebALT
</UL>

<P>
2006: v. 1.0
</P>
<UL>
<LI>approximate CLE coverage
<LI>reorganized module system and implementation
<LI>not yet (4/3/2006) for Danish and Russian
</UL>

<P>
<!-- NEW -->
</P>
<H3>Authors</H3>
<P>
Janna Khegai (Russian modules, forthcoming),
Björn Bringert (many Swadesh lexica),
Inger Andersson and Therese Söderberg (Spanish morphology),
Ludmilla Bogavac (Russian morphology),
Carlos Gonzalia (Spanish cardinals),
Harald Hammarström (German morphology),
Patrik Jansson (Swedish cardinals),
Aarne Ranta.
</P>
<P>
We are grateful for contributions and
comments to several other people who have used this and
the previous versions of the resource library, including
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Hans-Joachim Daniels,
Kristofer Johannisson,
Anni Laine,
Wanjiku Ng'ang'a,
Jordi Saludes.
</P>
<P>
<!-- NEW -->
</P>
<H3>Related work</H3>
<P>
CLE (Core Language Engine,
<A HREF="http://mitpress.mit.edu/catalog/item/default.asp?tid=7739&amp;ttype=2">Book 1992</A>)
</P>
<UL>
<LI>English, Swedish, French, Danish
<LI>uses Definite Clause Grammars, implementation in Prolog
<LI>coverage for the ATIS corpus,
  <A HREF="http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521770777">Spoken Language Translator (2001)</A>
<LI>grammar specialization via explanation-based learning
</UL>

<P>
<!-- NEW -->
</P>
<H3>Slightly less related work</H3>
<P>
<A HREF="http://www.delph-in.net/matrix/">LinGO Grammar Matrix</A>
</P>
<UL>
<LI>English, German, Japanese, Spanish, ...
<LI>uses HPSG, implementation in LKB
<LI>a check list for parallel grammar implementations
</UL>

<P>
<A HREF="http://www2.parc.com/istl/groups/nltt/pargram/">Pargram</A>
</P>
<UL>
<LI>Target languages: Arabic, Chinese, English, French, German, Hungarian, Japanese,
Malagasy, Norwegian, Turkish, Urdu, Vietnamese, and Welsh
<LI>uses LFG
<LI>one set of big grammars, transfer rules
</UL>

<P>
Rosetta Machine Translation (<A HREF="http://citeseer.ist.psu.edu/181924.html">Book 1994</A>)
</P>
<UL>
<LI>Dutch, English, French
<LI>uses M-grammars, compositional translation inspired by Montague
<LI>compositional transfer rules
</UL>

<P>
<!-- NEW -->
</P>
<H2>Coverage</H2>
<H3>Languages</H3>
<P>
The current GF Resource Project covers ten languages:
</P>
<UL>
<LI><CODE>Dan</CODE>ish
<LI><CODE>Eng</CODE>lish
<LI><CODE>Fin</CODE>nish
<LI><CODE>Fre</CODE>nch
<LI><CODE>Ger</CODE>man
<LI><CODE>Ita</CODE>lian
<LI><CODE>Nor</CODE>wegian (bokmål)
<LI><CODE>Rus</CODE>sian
<LI><CODE>Spa</CODE>nish
<LI><CODE>Swe</CODE>dish
</UL>

<P>
In addition, parts of Arabic, Estonian, Latin, and Urdu are implemented.
</P>
<P>
API 1.0 not yet implemented for Danish and Russian
</P>
<P>
<!-- NEW -->
</P>
<H3>Morphology and lexicon</H3>
<P>
Complete inflection engine
</P>
<UL>
<LI>all word classes
<LI>all forms
<LI>all inflectional paradigms
</UL>

<P>
Basic lexicon
</P>
<UL>
<LI>100 structural words
<LI>340 content words, mainly for testing
<LI>these include the 207 <A HREF="http://en.wiktionary.org/wiki/Swadesh_List">Swadesh words</A>
</UL>

<P>
It is more important to enable lexicon extensions than to
provide a huge lexicon.
</P>
<UL>
<LI>technical lexica can have very special words, which tend to be regular
</UL>

<P>
<!-- NEW -->
</P>
<H3>Syntactic structures</H3>
<P>
Texts:
sequences of phrases with punctuation
</P>
<P>
Phrases:
declaratives, questions, imperatives, vocatives
</P>
<P>
Tense, mood, and polarity:
present, past, future, conditional ; simultaneous, anterior ; positive, negative
</P>
<P>
Questions:
yes-no, "wh" ; direct, indirect
</P>
<P>
Clauses:
main, relative, embedded (subject, object, adverbial)
</P>
<P>
Verb phrases:
intransitive, transitive, ditransitive, prepositional
</P>
<P>
Noun phrases:
proper names, pronouns, determiners, possessives, cardinals and ordinals
</P>
<P>
Coordination:
lists of sentences, noun phrases, adverbs, adjectival phrases
</P>
<P>
<!-- NEW -->
</P>
<H3>Quantitative measures</H3>
<P>
67 categories
</P>
<P>
150 abstract syntax combination rules
</P>
<P>
100 structural words
</P>
<P>
340 content words in a test lexicon
</P>
<P>
35 kLines of source code (4/3/2006):
</P>
<PRE>
    abstract     1131
    english      2344
    german       2386
    finnish      3396
    norwegian    1257
    swedish      1465
    scandinavian 1023
    french       3246 -- Besch + Irreg + Morpho 2111
    italian      7797 -- Besch 6512
    spanish      7120 -- Besch 5877
    romance      1066
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H2>Structure of the API</H2>
<H3>Language-independent ground API</H3>
<P>
<IMG ALIGN="middle" SRC="Lang.png" BORDER="0" ALT="">
</P>
<P>
<!-- NEW -->
</P>
<H3>The structure of a text sentence</H3>
<PRE>
  John walks.

  TFullStop              : Phr -&gt; Text -&gt; Text              | TQuestMark, TExclMark
    (PhrUtt              : PConj -&gt; Utt -&gt; Voc -&gt; Phr       | PhrYes, PhrNo, ...
      NoPConj                                               | but_PConj, ...
      (UttS              : S -&gt; Utt                         | UttQS, UttImp, UttNP, ...
        (UseCl           : Tense -&gt; Anter -&gt; Pol -&gt; Cl -&gt; S
          TPres
          ASimul
          PPos
          (PredVP        : NP -&gt; VP -&gt; Cl                   | ImpersNP, ExistNP, ...
            (UsePN       : PN -&gt; NP
              john_PN)
            (UseV        : V  -&gt; VP                         | ComplV2, UseComp, ...
              walk_V))))
      NoVoc)                                                | VocNP, please_Voc, ...
    TEmpty
</PRE>
<P></P>
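<P>
The abstract syntax is typed: each function has a signature, and a tree is well formed only if every argument has the category the function expects. Below is a minimal Python sketch of this check, with signatures copied from the tree above; the nested-tuple encoding and the <CODE>infer</CODE> function are assumptions for illustration (GF performs this checking itself).
</P>
<PRE>
# Signatures from the tree above:
# function name maps to (argument categories, value category).
SIG = {
    "TFullStop": (("Phr", "Text"), "Text"),
    "TEmpty":    ((), "Text"),
    "PhrUtt":    (("PConj", "Utt", "Voc"), "Phr"),
    "NoPConj":   ((), "PConj"),
    "NoVoc":     ((), "Voc"),
    "UttS":      (("S",), "Utt"),
    "UseCl":     (("Tense", "Anter", "Pol", "Cl"), "S"),
    "TPres":     ((), "Tense"),
    "ASimul":    ((), "Anter"),
    "PPos":      ((), "Pol"),
    "PredVP":    (("NP", "VP"), "Cl"),
    "UsePN":     (("PN",), "NP"),
    "john_PN":   ((), "PN"),
    "UseV":      (("V",), "VP"),
    "walk_V":    ((), "V"),
}


def infer(tree):
    """Return the category of a tree (fun, arg1, ...), or raise TypeError."""
    fun, *args = tree
    arg_cats, value_cat = SIG[fun]
    if len(args) != len(arg_cats):
        raise TypeError(f"{fun} expects {len(arg_cats)} arguments, got {len(args)}")
    for arg, expected in zip(args, arg_cats):
        found = infer(arg)
        if found != expected:
            raise TypeError(f"{fun}: expected {expected}, found {found}")
    return value_cat


# "John walks."
john_walks = (
    "TFullStop",
    ("PhrUtt",
        ("NoPConj",),
        ("UttS",
            ("UseCl", ("TPres",), ("ASimul",), ("PPos",),
                ("PredVP",
                    ("UsePN", ("john_PN",)),
                    ("UseV", ("walk_V",))))),
        ("NoVoc",)),
    ("TEmpty",),
)

print(infer(john_walks))  # Text
</PRE>
<P>
Replacing <CODE>("walk_V",)</CODE> by <CODE>("john_PN",)</CODE> under <CODE>UseV</CODE> makes <CODE>infer</CODE> raise a <CODE>TypeError</CODE>, since <CODE>UseV</CODE> expects a <CODE>V</CODE>.
</P>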
<P>
<!-- NEW -->
</P>
<H3>The structure in the syntax editor</H3>
<P>
<IMG ALIGN="middle" SRC="editor.png" BORDER="0" ALT="">
</P>
<P>
<!-- NEW -->
</P>
<H3>Language-dependent paradigm modules</H3>
<H4>Regular paradigms</H4>
<P>
Every language implements these regular patterns that take
"dictionary forms" as arguments.
</P>
<PRE>
    regN : Str -&gt; N
    regA : Str -&gt; A
    regV : Str -&gt; V
</PRE>
<P>
Their usefulness varies. For instance, they
all work quite well for Finnish and English.
For Swedish, less so:
</P>
<PRE>
    regN "val" ---&gt; val, valen, valar, valarna
</PRE>
<P>
Initializing a lexicon with <CODE>regX</CODE> for every entry is
usually a good starting point in grammar development.
</P>
<P>
<!-- NEW -->
</P>
<H4>Regular paradigms, continued</H4>
<P>
In Swedish, giving the gender of an <CODE>N</CODE> improves the result a lot:
</P>
<PRE>
    regGenN "val" neutrum ---&gt; val, valet, val, valen
</PRE>
<P></P>
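<P>
The effect of the gender argument can be sketched in Python. This is a deliberately simplified model, not the library's <CODE>regGenN</CODE>: it only handles consonant-final nouns, treating utrum nouns like <I>val</I> (whale) with the <I>-ar</I> plural and neutrum nouns like <I>val</I> (election) with an unchanged plural. Here the "regular" function computes the four forms and passes them to a worst-case constructor; both function names are hypothetical.
</P>
<PRE>
def mk_n(apa, apan, apor, aporna):
    """Worst-case paradigm: all four forms given explicitly."""
    return {"sg_indef": apa, "sg_def": apan, "pl_indef": apor, "pl_def": aporna}


def reg_gen_n(word, gender):
    """Toy regular paradigm for consonant-final Swedish nouns
    (an assumption, much narrower than the real library function)."""
    if gender == "utrum":      # e.g. en val (whale): -en, -ar, -arna
        return mk_n(word, word + "en", word + "ar", word + "arna")
    if gender == "neutrum":    # e.g. ett val (election): -et, unchanged, -en
        return mk_n(word, word + "et", word, word + "en")
    raise ValueError(f"unknown gender: {gender}")


print(list(reg_gen_n("val", "utrum").values()))    # ['val', 'valen', 'valar', 'valarna']
print(list(reg_gen_n("val", "neutrum").values()))  # ['val', 'valet', 'val', 'valen']
</PRE>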
<P>
There are also special paradigms taking other forms as arguments:
</P>
<PRE>
    mk2N   : (nyckel,nycklar : Str) -&gt; N

    mk1N   : (bilarna : Str) -&gt; N

    irregV : (dricka, drack, druckit : Str) -&gt; V
</PRE>
<P></P>
<P>
Regular verbs are actually implemented the
<A HREF="http://lexin.nada.kth.se/sve-sve.shtml">Lexin</A> way, taking the present-tense form as argument:
</P>
<PRE>
    regV : (talar : Str) -&gt; V
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H4>Worst-case paradigms</H4>
<P>
To cover all situations, worst-case paradigms are given, e.g. for Swedish:
</P>
<PRE>
    mkN : (apa,apan,apor,aporna : Str) -&gt; N

    mkA : (liten, litet, lilla, små, mindre, minst, minsta : Str) -&gt; A

    mkV : (supa,super,sup,söp,supit,supen : Str) -&gt; V
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H4>Irregular words</H4>
<P>
Irregular words are collected in <CODE>IrregX</CODE>, e.g. Swedish:
</P>
<PRE>
      draga_V : V =
        mkV
          (variants { "dra"  ; "draga"})
          (variants { "drar" ; "drager"})
          (variants { "dra"  ; "drag" })
          "drog"
          "dragit"
          "dragen" ;
</PRE>
<P>
Goal: eliminate the user's need for worst-case functions.
</P>
<P>
<!-- NEW -->
</P>
<H3>Language-dependent syntax extensions</H3>
<P>
Syntactic structures that are not shared by all languages.
</P>
<P>
Alternative (and often more idiomatic) ways to say what is already covered by the API.
</P>
<P>
Not implemented yet.
</P>
<P>
Candidates:
</P>
<UL>
<LI>Norwegian post-possessives: <CODE>bilen min</CODE>
<LI>French question forms: <CODE>est-ce que tu dors ?</CODE>
<LI>Romance simple past tenses
</UL>

<P>
<!-- NEW -->
</P>
<H3>Special-purpose APIs</H3>
<P>
Mathematical
</P>
<P>
Multimodal
</P>
<P>
Present
</P>
<P>
Minimal
</P>
<P>
Shallow
</P>
<P>
<!-- NEW -->
</P>
<H2>How to use the resource as a top-level grammar</H2>
<H3>Compiling</H3>
<P>
It is a good idea to compile the library, so that it can be opened faster
</P>
<PRE>
    GF/lib/resource-1.0% make

    writes GF/lib/alltenses
           GF/lib/present
           GF/lib/resource-1.0/langs.gfcm
</PRE>
<P>
If you don't intend to change the library, you never need to process the source
files again. Just use one of the following:
</P>
<PRE>
    gf -nocf langs.gfcm                                    -- all 8 languages

    gf -nocf -path=alltenses:prelude alltenses/LangSwe.gfc -- Swedish only

    gf -nocf -path=present:prelude present/LangSwe.gfc     -- Swedish in present tense only
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H3>Parsing</H3>
<P>
The default parser does not work! (It is obsolete anyway.)
</P>
<P>
The MCFG parser (the new standard) works in theory, but can
in practice be too slow to build.
</P>
<P>
But it does work for some languages, after a wait of approximately 20 seconds:
</P>
<PRE>
    p -mcfg -lang=LangEng -cat=S "I would see her"

    p -mcfg -lang=LangSwe -cat=S "jag skulle se henne"
</PRE>
<P>
Parsing in <CODE>present/</CODE> versions is quicker.
</P>
<P>
Remedies:
</P>
<UL>
<LI>write application grammars for parsing
<LI>use treebank lookup instead
</UL>

<P>
<!-- NEW -->
</P>
<H3>Treebank generation</H3>
<P>
Multilingual treebank entry = tree + linearizations
</P>
<P>
Some examples of treebank generation, assuming <CODE>langs.gfcm</CODE>:
</P>
<PRE>
    gr -cat=S   -number=10 -cf | tb                  -- 10 random S

    gt -cat=Phr -depth=4       | tb -xml | wf ex.xml -- all Phr to depth 4, into file ex.xml
</PRE>
<P>
Regression testing
</P>
<PRE>
    rf ex.xml | tb -c      -- read treebank from file and compare to present grammars
</PRE>
<P>
Updating a treebank
</P>
<PRE>
    rf old.xml | tb -trees | tb -xml | wf new.xml    -- read old from file, write new to file
</PRE>
<P></P>
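<P>
Regression testing amounts to re-linearizing every stored tree and comparing the result with the stored strings. A minimal Python sketch of that comparison, assuming a treebank held as a list of (tree, {language: string}) pairs and a <CODE>linearize</CODE> function standing in for the current grammars; both are hypothetical, and <CODE>tb -c</CODE> does the real work inside GF.
</P>
<PRE>
def compare_treebank(treebank, linearize):
    """Report (tree, lang, stored, current) for each linearization that changed."""
    diffs = []
    for tree, stored in treebank:
        for lang, old in stored.items():
            new = linearize(tree, lang)
            if new != old:
                diffs.append((tree, lang, old, new))
    return diffs


def update_treebank(treebank, linearize):
    """Keep the trees, regenerate every stored linearization."""
    return [(tree, {lang: linearize(tree, lang) for lang in stored})
            for tree, stored in treebank]


# Hypothetical data: a stored Swedish string the current grammar no longer produces.
TREE = "PredVP (UsePron they_Pron) (PassV2 seek_V2)"
old_bank = [(TREE, {"Eng": "They are sought", "Swe": "De söks"})]
CURRENT = {("Eng", TREE): "They are sought", ("Swe", TREE): "De blir sökta"}


def linearize(tree, lang):
    # Stand-in for the current grammars: a fixed lookup table.
    return CURRENT[(lang, tree)]


print(compare_treebank(old_bank, linearize))
new_bank = update_treebank(old_bank, linearize)
print(compare_treebank(new_bank, linearize))   # []
</PRE>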
<P>
<!-- NEW -->
</P>
<H3>The multilingual treebank format</H3>
<P>
Tree + linearizations
</P>
<PRE>
    &gt; gr -cat=Cl | tb
    PredVP (UsePron they_Pron) (PassV2 seek_V2)
    They are sought
    Elles sont cherchées
    Son buscadas
    Vengono cercate
    De blir sökta
    De blir lette
    Sie werden gesucht
    Heidät etsitään
</PRE>
<P>
These can also be wrapped in XML tags (<CODE>tb -xml</CODE>)
</P>
<P>
<!-- NEW -->
</P>
<H3>Treebank-based parsing</H3>
<P>
Brute-force method that helps if real parsing is more expensive.
</P>
<PRE>
    make treebank                     -- make treebank with all languages

    gf -treebank langs.xml            -- start GF by reading the treebank

    &gt; ut -strings -treebank=LangIta   -- show all Ita strings

    &gt; ut -treebank=LangIta -raw "Quello non si romperebbe" -- look up a string

    &gt; i -nocf langs.gfcm              -- read grammar to be able to linearize

    &gt; ut -treebank=LangIta "Quello non si romperebbe" | l -multi  -- translate to all
</PRE>
<P></P>
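<P>
The brute-force method can be pictured as an inverted table from strings to trees: linearize once, then parse by lookup. A Python sketch of the idea, using the single treebank entry shown earlier; the function name and data layout are assumptions, not GF's implementation. Ambiguous strings would simply map to several trees.
</P>
<PRE>
from collections import defaultdict


def build_index(treebank, lang):
    """Invert a treebank: each string in lang maps to all trees that yield it."""
    index = defaultdict(list)
    for tree, lins in treebank:
        index[lins[lang]].append(tree)
    return index


# One entry, taken from the treebank example (tree kept as a string).
TREEBANK = [
    ("PredVP (UsePron they_Pron) (PassV2 seek_V2)",
     {"Eng": "They are sought", "Ita": "Vengono cercate",
      "Swe": "De blir sökta", "Ger": "Sie werden gesucht"}),
]

index = build_index(TREEBANK, "Ita")
lins = dict(TREEBANK)   # maps each tree to its linearizations, for translation

for tree in index.get("Vengono cercate", []):
    print(tree)
    print(lins[tree]["Eng"], "/", lins[tree]["Ger"])

# Strings outside the treebank are simply not parsed.
print(index.get("Quello non si romperebbe", []))   # []
</PRE>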
<P>
<!-- NEW -->
</P>
<H3>Morphology</H3>
<P>
Use the morphological analyser:
</P>
<PRE>
    gf -nocf -retain -path=alltenses:prelude alltenses/LangSwe.gf
    &gt; ma "jag kan inte höra vad du säger"
</PRE>
<P></P>
<P>
Try out a morphology quiz
</P>
<PRE>
    &gt; mq -cat=V
</PRE>
<P></P>
<P>
Try out inflection patterns
</P>
<PRE>
    gf -retain -path=alltenses:prelude alltenses/ParadigmsSwe.gfr
    &gt; cc regV "lyser"
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H3>Syntax editing</H3>
<P>
The simplest way to start editing with all grammars is
</P>
<PRE>
    gfeditor langs.gfcm
</PRE>
<P>
The forthcoming IDE will extend the syntax editor with
a <CODE>Paradigms</CODE> file browser and a control on what
parts of an application grammar remain to be implemented.
</P>
<P>
<!-- NEW -->
</P>
<H3>Efficient parsing via application grammar</H3>
<P>
Get rid of discontinuous constituents (in particular, <CODE>VP</CODE>)
</P>
<P>
Example: <A HREF="gfdoc/Predication.html"><CODE>mathematical/Predication</CODE></A>:
</P>
<PRE>
    predV2 : V2 -&gt; NP -&gt; NP -&gt; Cl
</PRE>
<P>
instead of <CODE>PredVP np (ComplV2 v2 np')</CODE>
</P>
<P>
<!-- NEW -->
</P>
<H2>How to use as a library</H2>
<H3>Specialization through parametrized modules</H3>
<P>
The application grammar is implemented with reference to
the resource API
</P>
<P>
Individual languages are instantiations
</P>
<P>
Example: <A HREF="../../../examples/tram/TramI.gf">tram</A>
</P>
<P>
<!-- NEW -->
</P>
<H3>Compile-time transfer</H3>
<P>
Instead of parametrized modules:
</P>
<P>
select resource functions differently for different languages
</P>
<P>
Example: imperative vs. infinitive in mathematical exercises
</P>
<P>
<!-- NEW -->
</P>
<H3>A natural division into modules</H3>
<P>
Lexicon in language-dependent modules
</P>
<P>
Combination rules in a parametrized module
</P>
<P>
<!-- NEW -->
</P>
<H3>Example-based grammar writing</H3>
<P>
Example: <A HREF="../../../examples/animal/QuestionsI.gfe">animal</A>
</P>
<PRE>
  --# -resource=present/LangEng.gf
  --# -path=.:present:prelude

  -- to compile: gf -examples QuestionsI.gfe

  incomplete concrete QuestionsI of Questions = open Lang in {
    lincat
      Phrase = Phr ;
      Entity = N ;
      Action = V2 ;
    lin
      Who  love_V2 man_N           = in Phr "who loves men" ;
      Whom man_N love_V2           = in Phr "whom does the man love" ;
      Answer woman_N love_V2 man_N = in Phr "the woman loves men" ;
  }
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H2>How to implement a new language</H2>
<P>
See <A HREF="Resource-HOWTO.html">Resource-HOWTO</A>
</P>
<H3>Ordinary modules</H3>
<P>
Write a concrete syntax module for each abstract module in the API
</P>
<P>
Write a <CODE>Paradigms</CODE> module
</P>
<P>
Examples: English, Finnish, German, Russian
</P>
<P>
<!-- NEW -->
</P>
<H3>Parametrized modules</H3>
<P>
Examples: Romance (French, Italian, Spanish), Scandinavian (Danish, Norwegian, Swedish)
</P>
<P>
Write a <CODE>Diff</CODE> interface for a family of languages
</P>
<P>
Write concrete syntaxes as functors opening the interface
</P>
<P>
Write separate <CODE>Paradigms</CODE> modules for each language
</P>
<P>
Advantages:
</P>
<UL>
<LI>easier maintenance of library
<LI>insights into language families
</UL>

<P>
Problems:
</P>
<UL>
<LI>more abstract thinking required
<LI>individual grammars may not come out optimal in elegance and efficiency
</UL>
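<P>
The functor idea can be sketched outside GF by analogy: the shared concrete syntax becomes a function of a <CODE>Diff</CODE>-like record of language-specific parameters. The Python below is only that analogy; the <CODE>Diff</CODE> fields (negation particles) and the example sentences are illustrative assumptions, far simpler than the actual Romance <CODE>Diff</CODE> interface.
</P>
<PRE>
from dataclasses import dataclass


@dataclass(frozen=True)
class Diff:
    """Language-specific parameters of the family (hypothetical fields)."""
    neg_before: str   # particle before the finite verb
    neg_after: str    # particle after it ("" if none)


def romance_functor(diff):
    """Shared family-level code: returns a predication rule for one language."""
    def pred(subject, verb, positive=True):
        if positive:
            words = [subject, verb]
        else:
            words = [subject, diff.neg_before, verb, diff.neg_after]
        return " ".join(w for w in words if w)
    return pred


# Instantiations, analogous to applying a GF functor to each language's Diff.
pred_fre = romance_functor(Diff("ne", "pas"))
pred_ita = romance_functor(Diff("non", ""))
pred_spa = romance_functor(Diff("no", ""))

print(pred_fre("Jean", "dort", positive=False))     # Jean ne dort pas
print(pred_ita("Gianni", "dorme", positive=False))  # Gianni non dorme
print(pred_spa("Juan", "duerme", positive=False))   # Juan no duerme
</PRE>
<P>
The maintenance advantage is visible even here: a fix in <CODE>romance_functor</CODE> reaches all three languages at once, while anything the shared code cannot express must be pushed into <CODE>Diff</CODE>.
</P>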

<P>
<!-- NEW -->
</P>
<H3>The core API</H3>
<P>
Everything else is a variation of this:
</P>
<PRE>
  cat
    Cl ;   -- clause
    VP ;   -- verb phrase
    V2 ;   -- two-place verb
    NP ;   -- noun phrase
    CN ;   -- common noun
    Det ;  -- determiner
    AP ;   -- adjectival phrase

  fun
    PredVP  : NP  -&gt; VP -&gt; Cl ;   -- predication
    ComplV2 : V2  -&gt; NP -&gt; VP ;   -- complementization
    DetCN   : Det -&gt; CN -&gt; NP ;   -- determination
    ModCN   : AP  -&gt; CN -&gt; CN ;   -- modification
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H3>The core API in Latin: parameters</H3>
<P>
This <A HREF="latin.gf">toy Latin grammar</A> shows in a nutshell how the core
can be implemented.
</P>
<PRE>
  param
    Number   = Sg | Pl ;
    Person   = P1 | P2 | P3 ;
    Tense    = Pres | Past ;
    Polarity = Pos | Neg ;
    Case     = Nom | Acc | Dat ;
    Gender   = Masc | Fem | Neutr ;
  oper
    Agr = {g : Gender ; n : Number ; p : Person} ; -- agreement features
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H3>The core API in Latin: linearization types</H3>
<PRE>
  lincat
    Cl = {
      s : Tense =&gt; Polarity =&gt; Str
      } ;
    VP  = {
      verb  : Tense =&gt; Polarity =&gt; Agr =&gt; Str ;  -- finite verb
      neg   : Polarity =&gt; Str ;                  -- negation
      compl : Agr =&gt; Str                         -- complement
      } ;
    V2 = {
      s : Tense =&gt; Number =&gt; Person =&gt; Str ;
      c : Case                                   -- complement case
      } ;
    NP = {
      s : Case =&gt; Str ;
      a : Agr                                    -- agreement features
      } ;
    CN = {
      s : Number =&gt; Case =&gt; Str ;
      g : Gender
      } ;
    Det = {
      s : Gender =&gt; Case =&gt; Str ;
      n : Number
      } ;
    AP = {
      s : Gender =&gt; Number =&gt; Case =&gt; Str
      } ;
</PRE>
<P></P>
<P>
<!-- NEW -->
</P>
<H3>The core API in Latin: predication and complementization</H3>
<PRE>
  lin
    PredVP np vp = {
      s = \\t,p =&gt;
        let
          agr = np.a ;
          subject = np.s ! Nom ;
          object  = vp.compl ! agr ;
          verb    = vp.neg ! p ++ vp.verb ! t ! p ! agr
        in
        subject ++ object ++ verb
      } ;

    ComplV2 v np = {
      verb  = \\t,p,a =&gt; v.s ! t ! a.n ! a.p ;
      compl = \\_ =&gt; np.s ! v.c ;
      neg   = table {Pos =&gt; [] ; Neg =&gt; "non"}
      } ;
</PRE>
<P></P>
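<P>
To see how the tables interact, here is a direct Python transcription of <CODE>PredVP</CODE> and <CODE>ComplV2</CODE>, with GF records as dicts and tables as functions or dicts. The lexicon (<I>puer</I>, <I>puella</I>, <I>amare</I>) is an assumption for illustration and covers only the forms the example needs.
</P>
<PRE>
# GF's [] becomes "" and is dropped when joining, mirroring ++ with [].

def pred_vp(np, vp):
    def s(t, p):                       # Cl.s : Tense => Polarity => Str
        agr = np["a"]
        subject = np["s"]["Nom"]
        obj = vp["compl"](agr)
        verb = " ".join(w for w in (vp["neg"][p], vp["verb"](t, p, agr)) if w)
        return " ".join(w for w in (subject, obj, verb) if w)   # SOV order
    return {"s": s}


def compl_v2(v, np):
    return {
        "verb": lambda t, p, a: v["s"][(t, a["n"], a["p"])],
        "compl": lambda _agr: np["s"][v["c"]],     # object in the verb's case
        "neg": {"Pos": "", "Neg": "non"},
    }


# Toy lexicon (only the forms used below).
puer = {"s": {"Nom": "puer", "Acc": "puerum"},
        "a": {"g": "Masc", "n": "Sg", "p": "P3"}}
puella = {"s": {"Nom": "puella", "Acc": "puellam"},
          "a": {"g": "Fem", "n": "Sg", "p": "P3"}}
amare = {"s": {("Pres", "Sg", "P3"): "amat", ("Past", "Sg", "P3"): "amavit"},
         "c": "Acc"}

cl = pred_vp(puer, compl_v2(amare, puella))
print(cl["s"]("Pres", "Pos"))   # puer puellam amat
print(cl["s"]("Past", "Neg"))   # puer puellam non amavit
</PRE>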
<P>
<!-- NEW -->
</P>
<H3>The core API in Latin: determination and modification</H3>
<PRE>
    DetCN det cn =
      let
        g = cn.g ;
        n = det.n
      in {
        s = \\c =&gt; det.s ! g ! c ++ cn.s ! n ! c ;
        a = {g = g ; n = n ; p = P3}
        } ;

    ModCN ap cn =
      let
        g = cn.g
      in {
        s = \\n,c =&gt; cn.s ! n ! c ++ ap.s ! g ! n ! c ;
        g = g
        } ;
</PRE>
<P></P>
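<P>
The agreement flow in <CODE>DetCN</CODE> and <CODE>ModCN</CODE> can be traced the same way: the noun's inherent gender <CODE>g</CODE> selects the adjective and determiner forms, while the determiner's number <CODE>n</CODE> selects the noun form. Again a Python transcription with an assumed toy lexicon (<I>hic</I>, <I>bonus</I>, <I>puella</I>, <I>puer</I>), singular nominative and accusative only.
</P>
<PRE>
CASES = ("Nom", "Acc")


def mod_cn(ap, cn):
    g = cn["g"]
    s = {(n, c): cn["s"][(n, c)] + " " + ap["s"][(g, n, c)] for (n, c) in cn["s"]}
    return {"s": s, "g": g}


def det_cn(det, cn):
    g, n = cn["g"], det["n"]
    s = {c: det["s"][(g, c)] + " " + cn["s"][(n, c)] for c in CASES}
    return {"s": s, "a": {"g": g, "n": n, "p": "P3"}}


# Toy lexicon (only singular Nom/Acc forms).
puella_N = {"s": {("Sg", "Nom"): "puella", ("Sg", "Acc"): "puellam"}, "g": "Fem"}
puer_N = {"s": {("Sg", "Nom"): "puer", ("Sg", "Acc"): "puerum"}, "g": "Masc"}
bonus_A = {"s": {("Fem", "Sg", "Nom"): "bona", ("Fem", "Sg", "Acc"): "bonam",
                 ("Masc", "Sg", "Nom"): "bonus", ("Masc", "Sg", "Acc"): "bonum"}}
hic_Det = {"s": {("Fem", "Nom"): "haec", ("Fem", "Acc"): "hanc",
                 ("Masc", "Nom"): "hic", ("Masc", "Acc"): "hunc"},
           "n": "Sg"}

np1 = det_cn(hic_Det, mod_cn(bonus_A, puella_N))
np2 = det_cn(hic_Det, mod_cn(bonus_A, puer_N))
print(np1["s"]["Nom"], "/", np1["s"]["Acc"])   # haec puella bona / hanc puellam bonam
print(np2["s"]["Nom"], "/", np2["s"]["Acc"])   # hic puer bonus / hunc puerum bonum
</PRE>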
<P>
<!-- NEW -->
</P>
<H3>How to proceed</H3>
<OL>
<LI>set up a directory with dummy modules by copying from e.g. English and
commenting out the contents, so that you have a compilable <CODE>LangX</CODE> all the time
<P></P>
<LI>start with nouns and their inflection
<P></P>
<LI>proceed to verbs and their inflection
<P></P>
<LI>add some noun phrases
<P></P>
<LI>implement predication
</OL>

<P>
<!-- NEW -->
</P>
<H2>How to extend the API</H2>
<P>
Extend old modules or add a new one?
</P>
<P>
Usually better to start a new one: then you don't have to implement it
for all languages at once.
</P>
<P>
Exception: if you are working with a language-specific API extension,
you can work directly in that module.
</P>

<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags clt2006.txt -->
</BODY></HTML>