gf-core/lib/resource/doc/index-1.1.txt
2007-12-12 20:30:11 +00:00

GF Resource Grammar Library v. 1.1
Author: Aarne Ranta <aarne (at) cs.chalmers.se>
Last update: %%date(%c)
% NOTE: this is a txt2tags file.
% Create an html file from this file using:
% txt2tags --toc -thtml index.txt
%!target:html
The GF Resource Grammar Library defines the basic grammar of
ten languages:
Danish, English, Finnish, French, German,
Italian, Norwegian, Russian, Spanish, Swedish.
A still incomplete implementation for Arabic is also
included.
**New in Version 1.1**
- Simpler APIs using overloading:
  - [Constructors gfdoc/Constructors.html]: almost all trees in a category ``C``
  can be built by the function ``mkC``.
  - [Combinators gfdoc/Combinators.html]: cross-cut grammatical functions:
  predication, application, modification, coordination.
  - [Symbolic gfdoc/Symbolic.html]: noun phrases with mathematical symbols.
  An example of use is [``logic`` ../../../examples/logic].
  The API of version 1.0 remains valid and can be used in combination with this.
- Some new functions.
- Bug fixes.
==Authors==
Inger Andersson and Therese Soderberg (Spanish morphology),
Nicolas Barth and Sylvain Pogodalla (French verb list),
Ali El Dada (Arabic modules),
Janna Khegai (Russian modules),
Bjorn Bringert (many Swadesh lexica),
Carlos Gonzalía (Spanish cardinals),
Harald Hammarström (German morphology),
Patrik Jansson (Swedish cardinals),
Andreas Priesnitz (German lexicon),
Aarne Ranta.
We are grateful to several other people who have used this and
the previous versions of the resource library for their
contributions and comments, including
Ludmilla Bogavac,
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Robin Cooper,
Hans-Joachim Daniels,
Elisabet Engdahl,
Markus Forsberg,
Kristofer Johannisson,
Anni Laine,
Peter Ljunglöf,
Saara Myllyntausta,
Wanjiku Ng'ang'a,
Jordi Saludes.
==License==
The GF Resource Grammar Library is open-source software licensed under
the GNU General Public License. See the file [LICENSE ../LICENSE] for more
details.
==Scope==
Coverage, for each language:
- complete morphology
- lexicon of the ca. 100 most important structural words
- test lexicon of ca. 300 content words (rough equivalents in each language)
- list of irregular verbs (separately for each language)
- representative fragment of syntax (cf. CLE (Core Language Engine))
- rather flat semantics (cf. Quasi-Logical Form of CLE)
Organization:
- top-level (API) modules
- Ground API + special-purpose APIs
- "school grammar" concepts rather than advanced linguistic theory
Presentation:
- tool ``gfdoc`` for generating HTML from grammars
- example collections
==Quick start==
Go to the main directory, compile the grammars, and run a test.
```
cd GF/lib/resource-1.0
make
make test
```
This will take quite some time. An alternative is to use the
precompiled grammar package [``compiled.tgz`` ../../compiled.tgz].
This package has the necessary ``gfc`` and ``gfr`` files directly under ``GF/lib``.
```
GF/lib/alltenses
GF/lib/mathematical
GF/lib/multimodal
GF/lib/present
```
For instance, do
```
cd GF/lib/
gf
> i -path=present:prelude present/LangEng.gfc
> gr -cat=S -number=3 -cf | tb
```
For more examples, see the [Overview slides clt2006.html].
The ``make`` procedure does not build Arabic, but it can
be compiled in the same way as the other languages.
==Encoding==
Finnish, German, Romance, and Scandinavian languages are in ISO Latin-1 (ISO-8859-1).
Arabic and Russian are in UTF-8.
English is in pure ASCII.
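If the Latin-1 output of these languages has to be shown in a UTF-8
environment, it can be converted with the standard ``iconv`` tool. The
following is only a sketch: the function name ``latin1_to_utf8`` and the
file names are hypothetical and not part of the library.

```shell
# Sketch: convert a Latin-1 text file to UTF-8 with iconv.
# The function name and the file arguments are illustrative assumptions.
latin1_to_utf8() {
    # $1: input file encoded in ISO-8859-1
    # $2: output file, written in UTF-8
    iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2"
}
```

A call such as ``latin1_to_utf8 out-latin1.txt out-utf8.txt`` leaves the input
file untouched and writes the converted text to the second file.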
===The language-independent ground API===
This API is accessible by both ``present`` and ``alltenses``.
The API is divided into a number of ``abstract`` modules.
The following figure gives the dependencies of these modules.
[Grammar.png]
The documentation of the individual modules:
- [Common gfdoc/Common.html]: abstract notions with language-independent implementations
- [Cat gfdoc/Cat.html]: the category system
- [Noun gfdoc/Noun.html]: construction of nouns and noun phrases
- [Adjective gfdoc/Adjective.html]: construction of adjectival phrases
- [Verb gfdoc/Verb.html]: construction of verb phrases
- [Adverb gfdoc/Adverb.html]: construction of adverbial phrases
- [Numeral gfdoc/Numeral.html]: construction of cardinal and ordinal numerals
- [Sentence gfdoc/Sentence.html]: construction of sentences and imperatives
- [Question gfdoc/Question.html]: construction of questions
- [Relative gfdoc/Relative.html]: construction of relative clauses
- [Conjunction gfdoc/Conjunction.html]: coordination of phrases
- [Phrase gfdoc/Phrase.html]: construction of the major units of text and speech
- [Text gfdoc/Text.html]: construction of texts from phrases, using punctuation
- [Idiom gfdoc/Idiom.html]: idiomatic phrases, such as existentials
- [Structural gfdoc/Structural.html]: a lexicon of structural words
- [Lexicon gfdoc/Lexicon.html]: a lexicon of other common words, for test purposes
- [Grammar gfdoc/Grammar.html]: the main module comprising all but ``Lexicon``
- [Lang gfdoc/Lang.html]: the main module comprising both ``Grammar`` and ``Lexicon``
===The language-dependent APIs===
- [ParadigmsDan gfdoc/ParadigmsDan.html]: Danish lexical paradigms
- [ParadigmsEng gfdoc/ParadigmsEng.html]: English lexical paradigms
- [ParadigmsFin gfdoc/ParadigmsFin.html]: Finnish lexical paradigms
- [ParadigmsFre gfdoc/ParadigmsFre.html]: French lexical paradigms
- [ParadigmsIta gfdoc/ParadigmsIta.html]: Italian lexical paradigms
- [ParadigmsGer gfdoc/ParadigmsGer.html]: German lexical paradigms
- [ParadigmsNor gfdoc/ParadigmsNor.html]: Norwegian lexical paradigms
- [ParadigmsRus gfdoc/ParadigmsRus.html]: Russian lexical paradigms
- [ParadigmsSpa gfdoc/ParadigmsSpa.html]: Spanish lexical paradigms
- [ParadigmsSwe gfdoc/ParadigmsSwe.html]: Swedish lexical paradigms
- [IrregDan ../danish/IrregDan.gf]: Danish irregular verbs (very incomplete)
- [IrregEng ../english/IrregEng.gf]: English irregular verbs
- [IrregFre ../french/IrregFre.gf]: French irregular verbs
- [IrregGer ../german/IrregGer.gf]: German irregular verbs
- [IrregNor ../norwegian/IrregNor.gf]: Norwegian irregular verbs (very incomplete)
- [IrregSpa ../spanish/IrregSpa.gf]: Spanish irregular verbs
- [IrregSwe ../swedish/IrregSwe.gf]: Swedish irregular verbs
This is the structure of each language-dependent top module.
[English.png]
- [Extra ../abstract/Extra.gf]: extra constructs implemented in some languages
- [ExtraDan ../danish/ExtraDanAbs.gf]: extra constructs in Danish
- [ExtraEng ../english/ExtraEngAbs.gf]: extra constructs in English
- [ExtraFin ../finnish/ExtraFinAbs.gf]: extra constructs in Finnish
- [ExtraFre ../french/ExtraFreAbs.gf]: extra constructs in French
- [ExtraIta ../italian/ExtraItaAbs.gf]: extra constructs in Italian
- [ExtraNor ../norwegian/ExtraNorAbs.gf]: extra constructs in Norwegian
- [ExtraRus ../russian/ExtraRusAbs.gf]: extra constructs in Russian
- [ExtraScand ../scandinavian/ExtraScandAbs.gf]: extra constructs in Scandinavian
- [ExtraSpa ../spanish/ExtraSpaAbs.gf]: extra constructs in Spanish
- [ExtraSwe ../swedish/ExtraSweAbs.gf]: extra constructs in Swedish
- [Danish ../danish/DanishAbs.gf]: Danish with all extras
- [English ../english/EnglishAbs.gf]: English with all extras
- [Finnish ../finnish/FinnishAbs.gf]: Finnish with all extras
- [French ../french/FrenchAbs.gf]: French with all extras
- [German ../german/GermanAbs.gf]: German with all extras
- [Italian ../italian/ItalianAbs.gf]: Italian with all extras
- [Norwegian ../norwegian/NorwegianAbs.gf]: Norwegian with all extras
- [Russian ../russian/RussianAbs.gf]: Russian with all extras
- [Spanish ../spanish/SpanishAbs.gf]: Spanish with all extras
- [Swedish ../swedish/SwedishAbs.gf]: Swedish with all extras
===Special-purpose APIs===
====Present====
The API is the same as for the full ground API, but the compiler
has ignored all verb and sentence tenses except the present.
Lines ignored in the source files are marked by ``--# notpresent``.
The result is a smaller and more efficient grammar, which is still
sufficient for many applications.
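To get a feeling for what the marker does, the effect can be imitated
on a single file with ``grep``. This is just an illustration under the
assumption that the marker appears literally on each line to be dropped;
the function name ``strip_notpresent`` is hypothetical, and the library
build itself does not use this script.

```shell
# Sketch: print a source file without the lines tagged "--# notpresent".
# This only imitates the pruning described above; it is not the real build step.
strip_notpresent() {
    # -v inverts the match, -e protects the pattern that starts with a dash
    grep -v -e '--# notpresent' "$1"
}
```

Comparing the output with the original file shows which tense forms
disappear from the ``present`` version of a module.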
====Multimodal====
The API is the same as for the full ground API, but with modified
linearization types of ``NP`` and ``Adv``, and of all other categories
depending on them: an extra field is added for a demonstrative pointing
gesture. Some functions for constructing demonstratives are provided.
- [Multi gfdoc/Multi.html]: main module for multimodal dialogue systems
====Mathematical====
- [Mathematical gfdoc/Mathematical.html]: main module for mathematical language
- [Predication gfdoc/Predication.html]: predication with verbs, adjectives, etc.
- [Symbol gfdoc/Symbol.html]: symbols and numbers in text
==Using the library==
===The compiled version===
The simplest way to get the library is to install the precompiled version
[``lib/compiled.tgz`` ../../compiled.tgz]. Just do
```
cd GF/lib
tar xvfz compiled.tgz
```
There is no need to link application grammars to the source directories of the
library. Use one (or several) of the following packages instead:
- ``lib/alltenses`` the complete ground-API library with all forms
- ``lib/present`` a pruned ground-API library with present tense only
- ``lib/mathematical`` special-purpose API for mathematical applications
- ``lib/multimodal`` the complete ground-API with demonstratives for
multimodal dialogue applications
===Linking applications to libraries===
Typically, open one of
- ``GrammarX`` for just syntax
- ``LangX`` for both syntax and a small lexicon
- ``X`` (e.g. ``English``) for syntax, lexicon, and language-dependent extensions
Usually you also need your own lexicon, and hence have to open
- ``ParadigmsX`` for lexicon-building functions
It is advisable to use the bare package names in paths pointing to the
libraries. Here is an example, from ``examples/dialogue/LightsEng.gf``:
```
--# -path=.:alltenses:multimodal:prelude
```
To reach these directories from anywhere, set the environment variable
``GF_LIB_PATH`` to point to the directory ``GF/lib/``. For instance,
I have the following line in my ``.bashrc`` file:
```
export GF_LIB_PATH=/home/aarne/GF/lib
```
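A quick way to see whether a library directory is complete is to check
that the four package directories are there. The following is a sketch
only: the function name ``missing_packages`` is hypothetical, and it
assumes the directory layout of the precompiled package.

```shell
# Sketch: list the precompiled packages that are missing under a lib directory.
# Assumes the four package names of the precompiled distribution.
missing_packages() {
    lib_dir=$1
    for pkg in alltenses present mathematical multimodal; do
        # print the package name only if its directory does not exist
        [ -d "$lib_dir/$pkg" ] || printf '%s\n' "$pkg"
    done
}
```

Running ``missing_packages "$GF_LIB_PATH"`` prints nothing if all four
packages are in place.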
The ``mathematical`` API shares modules with
``present``. It is therefore not a good idea to use it in combination with
``alltenses``.
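Such a combination is easy to overlook in a long path. The following
sketch checks the first ``--# -path=`` line of a grammar file for it.
The function name ``mixes_mathematical_and_alltenses`` is hypothetical,
and the check assumes that the path uses the bare package names
separated by colons.

```shell
# Sketch: succeed (exit status 0) if the -path pragma of a grammar file
# names both "mathematical" and "alltenses"; fail (status 1) otherwise.
mixes_mathematical_and_alltenses() {
    # take the first path pragma of the file, e.g. "--# -path=.:present:prelude"
    path_line=$(grep -e '^--# -path=' "$1" | head -n 1)
    # keep what follows the first "=" and wrap it in colons for exact matching
    dirs=":${path_line#*=}:"
    case "$dirs" in
        *:mathematical:*) ;;
        *) return 1 ;;
    esac
    case "$dirs" in
        *:alltenses:*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For instance, ``mixes_mathematical_and_alltenses MyGrammarEng.gf && echo "check the path"``
prints the warning only for a path that lists both packages.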
===Using the libraries as top-level grammars===
If you have done ``make`` in ``lib/resource-1.0``, you will have
a file ``langs.gfcm``. This file starts up fast and can be used for
tasks such as treebank generation:
```
> i -nocf langs.gfcm
> gr -cat=S -cf -number=10 | tb
```
The ``-nocf`` flag saves startup time and memory by preventing the
creation of context-free parse grammars.
The resource grammar libraries do //not// support
parsing very well. While it is theoretically possible to parse with any
GF grammar, the resource grammars are so abstract and complex that
building the actual parser in memory may need too many resources
to succeed.
An exception is ``LangEng``. It is actually feasible to parse with
both ``alltenses/LangEng`` and ``present/LangEng`` - the latter being
much faster than the former. The ``-fcfg`` flag (fast multiple context-free grammar)
must be used:
```
p -lang=LangEng -fcfg "this man is old"
```
Parsing with the ``-fcfg`` flag takes a few extra seconds the first time in
each session, but is faster on later runs. From GF 2.6 on, ``fcfg`` is the
default parser of GF and the flag is not needed.
It is also possible to parse in Scandinavian languages
(Danish, Norwegian, Swedish) and, with enough memory (``gf +RTS -K512M``),
German.
==Example applications==
These applications are meant to serve as starting points for
new applications, showing how the libraries can be used in
typical situations.
===Bronzeage===
The [examples/bronzeage ../../../examples/bronzeage]
grammar set implements a language fragment
based on the Swadesh list of 200 words. It is useful for
things like language training.
===Dialogue===
The [examples/dialogue ../../../examples/dialogue]
grammar set implements the user grammars of a
multimodal dialogue system.
Its purpose is to serve as a prototype for applications in the
TALK project.
===Animals===
The [examples/animal ../../../examples/animal]
grammar set implements some queries about animals.
Its purpose is to serve as a prototype for example-based
grammar writing.
==Known bugs and missing components==
Danish
- the lexicon and chosen inflections are only partially verified
English
Finnish
- wrong cases in some passive constructions
French
- multiple clitics (with V3) not always right
- third person pronominal questions with inverted word order
have wrong forms if "t" is required
(e.g. "comment fera-t-il" becomes "comment fera il")
German
Italian
- multiple clitics (with V3) not always right
Norwegian
- the lexicon and chosen inflections are only partially verified
Russian
- some functions missing
- some regular paradigms are missing
Spanish
- multiple clitics (with V3) not always right
- missing contractions with imperatives and clitics
Swedish
==More reading==
[GF Resource Grammar Library ../../../doc/resource.pdf] (pdf).
Printable user manual with API documentation (version 1.0).
[Grammars as Software Libraries gslt-sem-2006.html]. Slides
with background and motivation for the resource grammar library.
[GF Resource Grammar Library Version 1.0 clt2006.html]. Slides
giving an overview of the library and practical hints on its use.
[How to write resource grammars Resource-HOWTO.html]. Helps you
start if you want to add another language to the library.
[Parametrized modules for Romance languages http://www.cs.chalmers.se/~aarne/geocal2006.pdf].
Slides explaining some ideas in the implementation of
French, Italian, and Spanish.
[Grammar writing by examples http://www.cs.chalmers.se/~aarne/slides/webalt-2005.pdf].
Slides showing how the method is used.
[Multimodal Resource Grammars http://www.cs.chalmers.se/~aarne/slides/talk-edin2005.pdf].
Slides showing how to use the multimodal resource library. N.B. the library
examples are from ``multimodal/old``, which is a reduced-size API.