preserve 1.0

2026-06-29 18:48:35 -06:00 · 2006-06-22 22:27:48 +00:00
parent 251bc4c738
commit 4821244741
569 changed files with 0 additions and 0 deletions
@@ -1,899 +0,0 @@
-The GF Resource Grammar Library Version 1.0
-Author: Aarne Ranta <aarne (at) cs.chalmers.se>
-Last update: %%date(%c)
-
-% NOTE: this is a txt2tags file.
-% Create an html file from this file using:
-% txt2tags --toc clt2006.txt
-
-%!target:html
-
-%!postproc(html): #NEW <!-- NEW -->
-
-
-#NEW 
-
-==Plan==
-
-Purpose
-
-Background
-
-Coverage
-
-Structure
-
-How to use
-
-How to implement a new language
-
-How to extend the API
-
-
-
-#NEW
-
-==Purpose==
-
-===Library for applications===
-
-High-level access to grammatical rules
-
-E.g. //You have k new messages// rendered in ten languages //X//
-```
-  render X (Have (You (Number (k (New Message)))))
-```
-
-Usable for different purposes
- translation systems
- software localization
- dialogue systems
- language teaching
-
-
-#NEW
-
-===Not primarily code for a parser===
-
-Often in NLP, a grammar is just high-level code for a parser.
-
-But a hand-written grammar can be inadequate for parsing:
- too much manual work
- too inefficient
- not robust
- too ambiguous
-
-
-Moreover, a grammar fine-tuned for parsing may not be reusable
- for generation
- for specialized grammars
- as library
-
-
-#NEW
-
-===Grammar as language definition===
-
-Linguistic ontology: **abstract syntax**
-
-E.g. adjectival modification rule
-```
-  AdjCN : AP -> CN -> CN ;
-```
-Rendering in different languages: **concrete syntax**
-```
-  AdjCN (PositA even_A) (UseN number_N)
-
-  even number, even numbers
-
-  jämnt tal, jämna tal
-
-  nombre pair, nombres pairs
-```
-Abstract away from inflection, agreement, word order.
-
-Resource grammars take a generation perspective rather than a parsing one
- abstract syntax serves as a key to renderings in different languages
-
-
-
-#NEW
-
-===Usability by non-linguists===
-
-Division of labour: resource grammars hide linguistic details
- ``AdjCN : AP -> CN -> CN`` hides agreement, word order,...
-
-
-Presentation: "school grammar" concepts, dictionary-like conventions
-```
-  bird_N = reg2N "Vogel" "Vögel" masculine
-```
-API = Application Programmer's Interface
-
-Documentation: ``gfdoc`` 
- produces html from gf
-
-
-IDE = Interactive Development Environment (forthcoming)
- library browser and syntax editor for grammar writing
-
-
-Example-based grammar writing
-```
-  render Ita (parse Eng "you have k messages")
-```
-
-
-#NEW
-
-===Scientific interest===
-
-Linguistics
- definition of linguistic ontology
- describing language on this level of abstraction
- coping with different problems in different languages
- sharing concrete-syntax code between languages
- creating a resource for other NLP applications
-
-
-Computer science
- datastructures for grammar rules
- type systems for grammars
- algorithms: parsing, generation, grammar compilation
- domain-specific programming language (GF)
- module system
-
-
-
-#NEW
-
-==Background==
-
-===History===
-
-2002: v. 0.2
- English, French, German, Swedish
-
-
-2003: v. 0.6
- module system
- added Finnish, Italian, Russian
- used in KeY
-
-
-2005: v. 0.9 
- tenses
- added Danish, Norwegian, Spanish; no German
- used in WebALT
-
-
-2006: v. 1.0
- approximate CLE coverage
- reorganized module system and implementation
- not yet (4/3/2006) for Danish and Russian
-
-
-#NEW
-
-===Authors===
-
-Janna Khegai (Russian modules, forthcoming),
-Bjorn Bringert (many Swadesh lexica),
-Inger Andersson and Therese Söderberg (Spanish morphology),
-Ludmilla Bogavac (Russian morphology),
-Carlos Gonzalia (Spanish cardinals), 
-Harald Hammarström (German morphology),
-Patrik Jansson (Swedish cardinals),
-Aarne Ranta.
-
-We are grateful for contributions and 
-comments to several other people who have used this and 
-the previous versions of the resource library, including
-Ana Bove,
-David Burke,
-Lauri Carlson,
-Gloria Casanellas,
-Karin Cavallin,
-Hans-Joachim Daniels,
-Kristofer Johannisson,
-Anni Laine,
-Wanjiku Ng'ang'a,
-Jordi Saludes.
-
-
-#NEW
-
-===Related work===
-
-CLE (Core Language Engine, 
-[Book 1992 http://mitpress.mit.edu/catalog/item/default.asp?tid=7739&ttype=2])
- English, Swedish, French, Danish
- uses Definite Clause Grammars, implementation in Prolog
- coverage for the ATIS corpus, 
-  [Spoken Language Translator (2001) http://www.cambridge.org/uk/catalogue/catalogue.asp?isbn=0521770777]
- grammar specialization via explanation-based learning
-
-
-#NEW
-
-===Slightly less related work===
-
-[LinGO Grammar Matrix http://www.delph-in.net/matrix/]
- English, German, Japanese, Spanish, ...
- uses HPSG, implementation in LKB
- a check list for parallel grammar implementations
-
-
-[Pargram http://www2.parc.com/istl/groups/nltt/pargram/]
- aims at Arabic, Chinese, English, French, German, Hungarian, Japanese, Malagasy, Norwegian, Turkish, Urdu, Vietnamese, and Welsh
- uses LFG
- one set of big grammars, transfer rules
-
-
-Rosetta Machine Translation ([Book 1994 http://citeseer.ist.psu.edu/181924.html])
- Dutch, English, French
- uses M-grammars, compositional translation inspired by Montague
- compositional transfer rules
-
-
-#NEW
-
-==Coverage==
-
-===Languages===
-
-The current GF Resource Project covers ten languages:
- ``Dan``ish
- ``Eng``lish
- ``Fin``nish
- ``Fre``nch
- ``Ger``man
- ``Ita``lian
- ``Nor``wegian (bokmål)
- ``Rus``sian
- ``Spa``nish
- ``Swe``dish
-
-
-In addition, partial grammars for Arabic, Estonian, Latin, and Urdu
-
-API 1.0 not yet implemented for Danish and Russian
-
-
-
-#NEW
-
-===Morphology and lexicon===
-
-Complete inflection engine
- all word classes
- all forms
- all inflectional paradigms
-
-
-Basic lexicon
- 100 structural words
- 340 content words, mainly for testing
- these include the 207 [Swadesh words http://en.wiktionary.org/wiki/Swadesh_List]
-
-
-It is more important to enable lexicon extensions than to 
-provide a huge lexicon.
- technical lexica can have very special words, which tend to be regular
-
-
-
-
-
-#NEW
-
-===Syntactic structures===
-
-Texts: 
-sequences of phrases with punctuation
-
-Phrases: 
-declaratives, questions, imperatives, vocatives
-
-Tense, mood, and polarity: 
-present, past, future, conditional ; simultaneous, anterior ; positive, negative
-
-Questions: 
-yes-no, "wh" ; direct, indirect
-
-Clauses: 
-main, relative, embedded (subject, object, adverbial)
-
-Verb phrases: 
-intransitive, transitive, ditransitive, prepositional
-
-Noun phrases: 
-proper names, pronouns, determiners, possessives, cardinals and ordinals
-
-Coordination:
-lists of sentences, noun phrases, adverbs, adjectival phrases
-
-
-#NEW
-
-===Quantitative measures===
-
-67 categories
-
-150 abstract syntax combination rules
-
-100 structural words
-
-340 content words in a test lexicon
-
-35 kLines of source code (4/3/2006):
-```
-  abstract     1131
-  english      2344
-  german       2386
-  finnish      3396
-  norwegian    1257
-  swedish      1465
-  scandinavian 1023
-  french       3246 -- of which Besch + Irreg + Morpho: 2111
-  italian      7797 -- of which Besch: 6512
-  spanish      7120 -- of which Besch: 5877
-  romance      1066
-```
-
-
-#NEW
-
-==Structure of the API==
-
-===Language-independent ground API===
-
-[Lang.png]
-
-
-#NEW
-
-===The structure of a text sentence===
-
-```
-John walks.
-
-TFullStop              : Phr -> Text -> Text              | TQuestMark, TExclMark
-  (PhrUtt              : PConj -> Utt -> Voc -> Phr       | PhrYes, PhrNo, ...
-    NoPConj                                               | but_PConj, ...
-    (UttS              : S -> Utt                         | UttQS, UttImp, UttNP, ...
-      (UseCl           : Tense -> Anter -> Pol -> Cl -> S
-        TPres              
-        ASimul 
-        PPos 
-        (PredVP        : NP -> VP -> Cl                   | ImpersNP, ExistNP, ...
-          (UsePN       : PN -> NP 
-            john_PN) 
-          (UseV        : V  -> VP                         | ComplV2, UseComp, ...
-            walk_V)))) 
-    NoVoc)                                                | VocNP, please_Voc, ...
-  TEmpty
-```
-
-#NEW
-
-===The structure in the syntax editor===
-
-[editor.png]
-
-
-#NEW
-
-===Language-dependent paradigm modules===
-
-====Regular paradigms====
-
-Every language implements these regular patterns that take
-"dictionary forms" as arguments.
-```
-  regN : Str -> N
-  regA : Str -> A 
-  regV : Str -> V
-```
-Their usefulness varies. For instance, they
-all work quite well in Finnish and English.
-In Swedish, less so:
-```
-  regN "val" ---> val, valen, valar, valarna
-```
-Initializing a lexicon with ``regX`` for every entry is
-usually a good starting point in grammar development.
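-
-A minimal sketch of such a first lexicon, assuming hypothetical module names ``LexStart`` and ``LexStartSwe`` (each in a file of its own) and that the library's ``Cat``, ``CatSwe`` and ``ParadigmsSwe`` modules can be opened as shown:
-```
-abstract LexStart = Cat ** {                                          -- hypothetical module
-  fun
-    car_N, election_N : N ;
-    warm_A            : A ;
-    shine_V           : V ;
-}
-
-concrete LexStartSwe of LexStart = CatSwe ** open ParadigmsSwe in {  -- hypothetical module
-  lin
-    car_N      = regN "bil" ;
-    election_N = regN "val" ;    -- gives valen, valar: wrong for "election"
-    warm_A     = regA "varm" ;
-    shine_V    = regV "lyser" ;  -- present-tense form, the argument regV takes
-}
-```
-Entries that come out wrong are then replaced, one by one, with more informative paradigms.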
-
-
-#NEW
-
-====Regular paradigms====
-
-In Swedish, giving the gender of an ``N`` improves the result a lot:
-```
-  regGenN "val" neutrum ---> val, valet, val, valen
-```
-
-There are also paradigms taking other forms as arguments:
-```
-  mk2N   : (nyckel,nycklar : Str) -> N
-
-  mk1N   : (bilarna : Str) -> N
-
-  irregV : (dricka, drack, druckit : Str) -> V
-```
-
-Regular verbs are actually implemented the
-[Lexin http://lexin.nada.kth.se/sve-sve.shtml] way, taking the present-tense form as argument:
-```
-  regV : (talar : Str) -> V
-```
-
-
-#NEW
-
-====Worst-case paradigms====
-
-To cover all situations, worst-case paradigms are given. E.g. Swedish
-```
-  mkN : (apa,apan,apor,aporna : Str) -> N
-
-  mkA : (liten, litet, lilla, små, mindre, minst, minsta : Str) -> A
-
-  mkV : (supa,super,sup,söp,supit,supen : Str) -> V
-```
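-
-Refining a lexicon then means moving each entry only as far down this scale as it needs. A sketch with hypothetical module names (each in a file of its own), using the Swedish paradigms listed on the previous slides:
-```
-abstract LexRefined = Cat ** {                                            -- hypothetical module
-  fun
-    car_N, election_N, key_N, monkey_N : N ;
-    small_A : A ;
-    drink_V : V ;
-}
-
-concrete LexRefinedSwe of LexRefined = CatSwe ** open ParadigmsSwe in {  -- hypothetical module
-  lin
-    car_N      = regN "bil" ;                        -- regular paradigm, assumed to suffice
-    election_N = regGenN "val" neutrum ;             -- gender selects the declension
-    key_N      = mk2N "nyckel" "nycklar" ;           -- singular and plural
-    monkey_N   = mkN "apa" "apan" "apor" "aporna" ;  -- worst case: all four forms
-    small_A    = mkA "liten" "litet" "lilla" "små" "mindre" "minst" "minsta" ;
-    drink_V    = irregV "dricka" "drack" "druckit" ;
-}
-```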
-
-
-#NEW
-
-====Irregular words====
-
-Irregular words are collected in ``IrregX``, e.g. Swedish:
-```
-    draga_V : V = 
-      mkV 
-        (variants { "dra"  ; "draga"}) 
-        (variants { "drar" ; "drager"}) 
-        (variants { "dra"  ; "drag" }) 
-        "drog" 
-        "dragit" 
-        "dragen" ;
-```
-Goal: eliminate the user's need for worst-case functions.
-
-
-
-#NEW
-
-===Language-dependent syntax extensions===
-
-Syntactic structures that are not shared by all languages.
-
-Alternative (and often more idiomatic) ways to say what is already covered by the API.
-
-Not implemented yet.
-
-Candidates:
- Norwegian post-possessives: ``bilen min``
- French question forms: ``est-ce que tu dors ?``
- Romance simple past tenses
-
-
-#NEW
-
-===Special-purpose APIs===
-
-Mathematical
-
-Multimodal
-
-Present
-
-Minimal
-
-Shallow
-
-
-#NEW
-
-==How to use the resource as top-level grammar==
-
-===Compiling===
-
-It is a good idea to compile the library, so that it can be opened faster
-```
-  GF/lib/resource-1.0% make
-
-  writes GF/lib/alltenses
-         GF/lib/present
-         GF/lib/resource-1.0/langs.gfcm
-```
-If you don't intend to change the library, you never need to process the source
-files again. Just do one of the following:
-```
-  gf -nocf langs.gfcm                                    -- all 8 languages
- 
-  gf -nocf -path=alltenses:prelude alltenses/LangSwe.gfc -- Swedish only
-
-  gf -nocf -path=present:prelude present/LangSwe.gfc     -- Swedish in present tense only
-```
-
-
-#NEW
-
-===Parsing===
-
-The default parser does not work! (It is obsolete anyway.)
-
-The MCFG parser (the new standard) works in theory, but can
-in practice be too slow to build.
-
-But it does work for some languages, after waiting approximately 20 seconds:
-```
-  p -mcfg -lang=LangEng -cat=S "I would see her"
-
-  p -mcfg -lang=LangSwe -cat=S "jag skulle se henne"
-```
-Parsing in ``present/`` versions is quicker.
-
-Remedies:
- write application grammars for parsing
- use treebank lookup instead
-
-
-#NEW
-
-===Treebank generation===
-
-Multilingual treebank entry = tree + linearizations
-
-Some examples on treebank generation, assuming ``langs.gfcm``
-```
-  gr -cat=S   -number=10 -cf | tb                  -- 10 random S
-
-  gt -cat=Phr -depth=4       | tb -xml | wf ex.xml -- all Phr to depth 4, into file ex.xml
-```
-Regression testing
-```
-  rf ex.xml | tb -c      -- read treebank from file and compare with the current grammars
-```
-Updating a treebank
-```
-  rf old.xml | tb -trees | tb -xml | wf new.xml    -- read old from file, write new to file
-```
-
-#NEW
-
-===The multilingual treebank format===
-
-Tree + linearizations
-```
-  > gr -cat=Cl | tb
-  PredVP (UsePron they_Pron) (PassV2 seek_V2)
-  They are sought
-  Elles sont cherchées
-  Son buscadas
-  Vengono cercate
-  De blir sökta
-  De blir lette
-  Sie werden gesucht
-  Heidät etsitään
-```
-These can also be wrapped in XML tags (``tb -xml``)
-
-
-#NEW
-
-===Treebank-based parsing===
-
-Brute-force method that helps if real parsing is more expensive.
-```
-  make treebank                     -- make treebank with all languages
-
-  gf -treebank langs.xml            -- start GF by reading the treebank
-
-  > ut -strings -treebank=LangIta   -- show all Ita strings
-
-  > ut -treebank=LangIta -raw "Quello non si romperebbe" -- look up a string
-
-  > i -nocf langs.gfcm              -- read grammar to be able to linearize
-
-  > ut -treebank=LangIta "Quello non si romperebbe" | l -multi  -- translate to all
-```
-
-
-#NEW
-
-===Morphology===
-
-Use the morphological analyser:
-```
-  gf -nocf -retain -path=alltenses:prelude alltenses/LangSwe.gf
-  > ma "jag kan inte höra vad du säger"
-```
-
-Try out a morphology quiz
-```
-  > mq -cat=V
-```
-
-Try out inflection patterns
-```
-  gf -retain -path=alltenses:prelude alltenses/ParadigmsSwe.gfr
-  > cc regV "lyser"
-```
-
-
-
-#NEW
-
-===Syntax editing===
-
-The simplest way to start editing with all grammars is
-``` 
-  gfeditor langs.gfcm
-```
-The forthcoming IDE will extend the syntax editor with
-a ``Paradigms`` file browser and a check of which
-parts of an application grammar remain to be implemented.
-
-
-#NEW
-
-===Efficient parsing via application grammar===
-
-Get rid of discontinuous constituents (in particular, ``VP``)
-
-Example: [``mathematical/Predication`` gfdoc/Predication.html]:
-```
-  predV2 : V2 -> NP -> NP -> Cl
-```
-instead of ``PredVP np (ComplV2 v2 np')``
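-
-One way to sketch the idea in an application grammar: a flat operation builds the ``VP`` and consumes it at once, so ``VP`` never becomes a category of the application grammar itself. The names ``Facts``, ``FactsI`` and ``Holds`` are hypothetical, and this ``predV2`` is our own definition, not necessarily the one in ``Predication``:
-```
-abstract Facts = {                                    -- hypothetical application
-  cat Fact ; Entity ; Relation ;
-  fun Holds : Relation -> Entity -> Entity -> Fact ;
-}
-
-incomplete concrete FactsI of Facts = open Lang in {  -- hypothetical functor
-  lincat
-    Fact     = Cl ;
-    Entity   = NP ;
-    Relation = V2 ;
-  oper
-    -- flat two-place predication, defined from the core API
-    predV2 : V2 -> NP -> NP -> Cl = \v,s,o -> PredVP s (ComplV2 v o) ;
-  lin
-    Holds r x y = predV2 r x y ;
-}
-```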
-
-
-#NEW
-
-==How to use as library==
-
-===Specialization through parametrized modules===
-
-The application grammar is implemented with reference to
-the resource API
-
-Individual languages are instantiations
-
-Example: [tram ../../../examples/tram/TramI.gf]
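-
-In outline, with a hypothetical application grammar; the lexical functions ``like_V2``, ``john_PN`` and ``they_Pron`` are assumed to be available in ``Lang``:
-```
-abstract Likes = {                                    -- hypothetical application
-  cat Statement ; Person ;
-  fun
-    Like       : Person -> Person -> Statement ;
-    John, They : Person ;
-}
-
-incomplete concrete LikesI of Likes = open Lang in {  -- written once, against the API
-  lincat
-    Statement = Cl ;
-    Person    = NP ;
-  lin
-    Like x y = PredVP x (ComplV2 like_V2 y) ;
-    John     = UsePN john_PN ;
-    They     = UsePron they_Pron ;
-}
-
-concrete LikesEng of Likes = LikesI with (Lang = LangEng) ;  -- one instantiation
-concrete LikesSwe of Likes = LikesI with (Lang = LangSwe) ;  -- per language
-```
-Each module goes in a file of its own; adding a language the resource already covers then costs one line.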
-
-
-#NEW
-
-===Compile-time transfer===
-
-Instead of parametrized modules:
-
-select resource functions differently for different languages
-
-Example: imperative vs. infinitive in mathematical exercises
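-
-A sketch with a hypothetical exercise grammar, assuming for illustration that English phrases exercises with the imperative and French with the infinitive. ``find_V2`` is assumed to be in the lexicon, and ``UttImpSg``, ``ImpVP`` and ``UttVP`` are assumed to be the resource functions for imperative and infinitival utterances:
-```
-abstract Exercise = {                                  -- hypothetical application
-  cat Task ; Object ;
-  fun Find : Object -> Task ;
-}
-
-concrete ExerciseEng of Exercise = open LangEng in {
-  lincat Task = Utt ; Object = NP ;
-  lin Find x = UttImpSg PPos (ImpVP (ComplV2 find_V2 x)) ;  -- imperative
-}
-
-concrete ExerciseFre of Exercise = open LangFre in {
-  lincat Task = Utt ; Object = NP ;
-  lin Find x = UttVP (ComplV2 find_V2 x) ;                   -- infinitive
-}
-```
-The choice is fixed when each concrete syntax is compiled, so no transfer step is needed at run time.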
-
-
-#NEW
-
-===A natural division into modules===
-
-Lexicon in language-dependent modules
-
-Combination rules in a parametrized module
-
-#NEW
-
-===Example-based grammar writing===
-
-Example: [animal ../../../examples/animal/QuestionsI.gfe]
-```
---# -resource=present/LangEng.gf
---# -path=.:present:prelude
-
--- to compile: gf -examples QuestionsI.gfe
-
-incomplete concrete QuestionsI of Questions = open Lang in {
-  lincat
-    Phrase = Phr ;
-    Entity = N ;
-    Action = V2 ;
-  lin 
-    Who  love_V2 man_N           = in Phr "who loves men" ;
-    Whom man_N love_V2           = in Phr "whom does the man love" ;
-    Answer woman_N love_V2 man_N = in Phr "the woman loves men" ;
-}
-```
-
-#NEW
-
-==How to implement a new language==
-
-See [Resource-HOWTO Resource-HOWTO.html]
-
-===Ordinary modules===
-
-Write a concrete syntax module for each abstract module in the API
-
-Write a ``Paradigms`` module
-
-Examples: English, Finnish, German, Russian
-
-#NEW
-
-===Parametrized modules===
-
-Examples: Romance (French, Italian, Spanish), Scandinavian (Danish, Norwegian, Swedish)
-
-Write a ``Diff`` interface for a family of languages
-
-Write concrete syntaxes as functors opening the interface
-
-Write separate ``Paradigms`` modules for each language
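-
-The pattern on a toy scale, with hypothetical module names (the library's real ``Diff`` modules are much larger): one ``oper`` varies within the family, the functor uses it, and each language is an instantiation. Each module goes in a file of its own.
-```
-abstract Toy = {                                          -- hypothetical abstract syntax
-  cat S ; V ;
-  fun
-    Sleep : V ;
-    NegV  : V -> S ;
-}
-
-interface DiffToy = {                                     -- what differs in the family
-  oper negation : Str ;
-}
-
-instance DiffToySwe of DiffToy = { oper negation = "inte" ; }
-instance DiffToyNor of DiffToy = { oper negation = "ikke" ; }
-
-incomplete concrete ToyScand of Toy = open DiffToy in {  -- shared code
-  lincat S, V = {s : Str} ;
-  lin
-    Sleep  = {s = "sover"} ;   -- same word in Danish, Norwegian, Swedish
-    NegV v = {s = v.s ++ negation} ;
-}
-
-concrete ToySwe of Toy = ToyScand with (DiffToy = DiffToySwe) ;
-concrete ToyNor of Toy = ToyScand with (DiffToy = DiffToyNor) ;
-```
-Here ``NegV Sleep`` is linearized as //sover inte// in ``ToySwe`` and //sover ikke// in ``ToyNor``.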
-
-Advantages:
- easier maintenance of library
- insights into language families
-
-
-Problems:
- more abstract thinking required
- individual grammars may not come out optimal in elegance and efficiency
-
-
-#NEW
-
-===The core API===
-
-Everything else is a variation of this:
-```
-cat
-  Cl ;   -- clause
-  VP ;   -- verb phrase
-  V2 ;   -- two-place verb
-  NP ;   -- noun phrase
-  CN ;   -- common noun
-  Det ;  -- determiner
-  AP ;   -- adjectival phrase
-
-fun
-  PredVP  : NP  -> VP -> Cl ;   -- predication
-  ComplV2 : V2  -> NP -> VP ;   -- complementization
-  DetCN   : Det -> CN -> NP ;   -- determination
-  ModCN   : AP  -> CN -> CN ;   -- modification
-```
-
-#NEW
-
-===The core API in Latin: parameters===
-
-This [toy Latin grammar latin.gf] shows in a nutshell how the core
-can be implemented.
-```
-param
-  Number   = Sg | Pl ;
-  Person   = P1 | P2 | P3 ;
-  Tense    = Pres | Past ;
-  Polarity = Pos | Neg ;
-  Case     = Nom | Acc | Dat ;
-  Gender   = Masc | Fem | Neutr ;
-oper
-  Agr = {g : Gender ; n : Number ; p : Person} ; -- agreement features
-```
-
-#NEW
-
-===The core API in Latin: linearization types===
-
-```
-lincat
-  Cl = {
-    s : Tense => Polarity => Str
-    } ;
-  VP  = {
-    verb  : Tense => Polarity => Agr => Str ;  -- finite verb
-    neg   : Polarity => Str ;                  -- negation
-    compl : Agr => Str                         -- complement
-    } ;
-  V2 = {
-    s : Tense => Number => Person => Str ; 
-    c : Case                                   -- complement case
-    } ;
-  NP = {
-    s : Case => Str ; 
-    a : Agr                                    -- agreement features
-    } ;
-  CN = {
-    s : Number => Case => Str ; 
-    g : Gender
-    } ;
-  Det = {
-    s : Gender => Case => Str ; 
-    n : Number
-    } ;
-  AP = {
-    s : Gender => Number => Case => Str
-    } ;
-```
-
-#NEW
-
-===The core API in Latin: predication and complementization===
-
-```
-lin
-  PredVP np vp = {
-    s = \\t,p => 
-      let
-        agr = np.a ;
-        subject = np.s ! Nom ;
-        object  = vp.compl ! agr ;
-        verb    = vp.neg ! p ++ vp.verb ! t ! p ! agr  
-      in                      
-      subject ++ object ++ verb
-    } ;
-
-  ComplV2 v np = {
-    verb  = \\t,p,a => v.s ! t ! a.n ! a.p ;
-    compl = \\_ => np.s ! v.c ;
-    neg   = table {Pos => [] ; Neg => "non"}
-    } ;
-```
-
-#NEW
-
-===The core API in Latin: determination and modification===
-
-```
-  DetCN det cn = 
-    let 
-      g = cn.g ; 
-      n = det.n
-    in {
-      s = \\c => det.s ! g ! c ++ cn.s ! n ! c ;
-      a = {g = g ; n = n ; p = P3}
-      } ;
-
-  ModCN ap cn = 
-    let 
-      g = cn.g 
-    in {
-      s = \\n,c => cn.s ! n ! c ++ ap.s ! g ! n ! c ;
-      g = g
-      } ;
-```
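-
-To try out the toy grammar, a few lexical entries are needed. A sketch, assuming the core abstract syntax and its Latin concrete syntax are modules called ``Core`` and ``CoreLat`` (hypothetical names; ``latin.gf`` may be organized differently), with the parameters above in scope:
-```
-abstract CoreLex = Core ** {           -- hypothetical module names
-  fun
-    servus_CN : CN ;
-    omnis_Det : Det ;
-    amare_V2  : V2 ;
-}
-
-concrete CoreLexLat of CoreLex = CoreLat ** {
-  lin
-    servus_CN = {
-      s = table {
-        Sg => table {Nom => "servus" ; Acc => "servum" ; Dat => "servo"} ;
-        Pl => table {Nom => "servi"  ; Acc => "servos" ; Dat => "servis"}
-        } ;
-      g = Masc
-      } ;
-    omnis_Det = {
-      s = table {
-        Neutr => table {Dat => "omni" ; _ => "omne"} ;
-        _     => table {Nom => "omnis" ; Acc => "omnem" ; Dat => "omni"}
-        } ;
-      n = Sg
-      } ;
-    amare_V2 = {
-      s = table {
-        Pres => table {
-          Sg => table {P1 => "amo"    ; P2 => "amas"   ; P3 => "amat"} ;
-          Pl => table {P1 => "amamus" ; P2 => "amatis" ; P3 => "amant"}
-          } ;
-        Past => table {                  -- imperfect chosen for the toy Past
-          Sg => table {P1 => "amabam"   ; P2 => "amabas"   ; P3 => "amabat"} ;
-          Pl => table {P1 => "amabamus" ; P2 => "amabatis" ; P3 => "amabant"}
-          }
-        } ;
-      c = Acc
-      } ;
-}
-```
-With these entries, ``PredVP (DetCN omnis_Det servus_CN) (ComplV2 amare_V2 (DetCN omnis_Det servus_CN))`` should be linearized as //omnis servus omnem servum amat// in the present tense with positive polarity, and with //non// before the verb with negative polarity.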
-
-
-#NEW
-
-===How to proceed===
-
-+ put up a directory with dummy modules by copying from e.g. English and
-commenting out the contents, so that you have a compilable ``LangX`` all the time
-
-+ start with nouns and their inflection
-
-+ proceed to verbs and their inflection
-
-+ add some noun phrases
-
-+ implement predication
-
-
-#NEW
-
-==How to extend the API==
-
-Extend old modules or add a new one?
-
-It is usually better to start a new one: then you don't have to implement it
-for all languages at once.
-
-Exception: if you are working with a language-specific API extension,
-you can work directly in that module.