GF Resource Grammar Library v. 1.0
Author: Aarne Ranta <aarne (at) cs.chalmers.se>
Last update: %%date(%c)

% NOTE: this is a txt2tags file.
% Create an html file from this file using:
% txt2tags --toc -thtml index.txt

%!target:html

The GF Resource Grammar Library defines the basic grammar of
ten languages:
Danish, English, Finnish, French, German,
Italian, Norwegian, Russian, Spanish, Swedish.

**Notice**. This document concerns the API v. 1.0, which has not
yet been "officially" released. The release will be made in combination
with a new version of GF itself, since the grammars use new features
not available in GF 2.4.

**New**. V. 1.0 is now (26 May) available for Russian and Danish.
But the modules still need some testing, especially the Danish
lexicon (corrections welcome!).


==Authors==

Inger Andersson and Therese Soderberg (Spanish morphology),
Nicolas Barth and Sylvain Pogodalla (French verb list),
Janna Khegai (Russian modules),
Bjorn Bringert (many Swadesh lexica),
Carlos Gonzalia (Spanish cardinals),
Harald Hammarstrom (German morphology),
Patrik Jansson (Swedish cardinals),
Andreas Priesnitz (German lexicon),
Aarne Ranta.

We are grateful for the contributions and comments of several
other people who have used this and
previous versions of the resource library, including
Ludmilla Bogavac,
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Robin Cooper,
Ali El Dada,
Hans-Joachim Daniels,
Elisabet Engdahl,
Markus Forsberg,
Kristofer Johannisson,
Anni Laine,
Saara Myllyntausta,
Wanjiku Ng'ang'a,
Jordi Saludes.


==License==

The GF Resource Grammar Library is open-source software licensed under
the GNU General Public License. See the file [LICENSE ../LICENSE] for more
details.


==Scope==

Coverage, for each language:
- complete morphology
- lexicon of the ca. 100 most important structural words
- test lexicon of ca. 300 content words
- representative fragment of syntax (cf. CLE (Core Language Engine))
- rather flat semantics (cf. Quasi-Logical Form of CLE)


Organization:
- top-level (API) modules
- Ground API + special-purpose APIs
- "school grammar" concepts rather than advanced linguistic theory


Presentation:
- tool ``gfdoc`` for generating HTML from grammars
- example collections


==Quick start==

Go to the main directory, compile the grammars, and run a test.
```
cd GF/lib/resource-1.0
make
make test
```
This will take quite some time. An alternative is to use the
precompiled grammar package from the GF download page. This package
has the necessary ``gfc`` and ``gfr`` files directly under ``GF/lib``.
```
GF/lib/alltenses
GF/lib/mathematical
GF/lib/multimodal
GF/lib/present
```
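Whether the precompiled packages are in place can be checked with a few lines of shell. The following is only a sketch: the function name ``check_gf_lib`` is made up for this example and is not part of GF, and it assumes a POSIX shell with ``find`` and ``wc`` available. The directory names are the four packages listed above.
```
# check_gf_lib DIR: for each precompiled package under DIR (default: GF/lib),
# report whether the directory exists and how many .gfc files it contains.
# Hypothetical helper, not part of GF. Returns non-zero if a package is missing.
check_gf_lib() {
    lib_dir="${1:-GF/lib}"
    rc=0
    for pkg in alltenses mathematical multimodal present; do
        if [ -d "$lib_dir/$pkg" ]; then
            # count the compiled grammar files in this package
            n=$(find "$lib_dir/$pkg" -name '*.gfc' | wc -l | tr -d ' ')
            echo "$pkg: $n gfc files"
        else
            echo "$pkg: missing"
            rc=1
        fi
    done
    return "$rc"
}
```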
Do for instance
```
cd GF/lib/
gf
> i -path=present:prelude present/LangEng.gfc
> gr -cat=S -number=3 -cf | tb
```
For more examples, see the [Overview slides clt2006.html].


===The language-independent ground API===

This API is available in both ``present`` and ``alltenses``.
The API is divided into a number of ``abstract`` modules.
The following figure gives the dependencies of these modules.

[Lang.png]

The documentation of the individual modules:

- [Common gfdoc/Common.html]: abstract notions with language-independent implementations
- [Cat gfdoc/Cat.html]: the category system
- [Noun gfdoc/Noun.html]: construction of nouns and noun phrases
- [Adjective gfdoc/Adjective.html]: construction of adjectival phrases
- [Verb gfdoc/Verb.html]: construction of verb phrases
- [Adverb gfdoc/Adverb.html]: construction of adverbial phrases
- [Numeral gfdoc/Numeral.html]: construction of cardinal and ordinal numerals
- [Sentence gfdoc/Sentence.html]: construction of sentences and imperatives
- [Question gfdoc/Question.html]: construction of questions
- [Relative gfdoc/Relative.html]: construction of relative clauses
- [Conjunction gfdoc/Conjunction.html]: coordination of phrases
- [Phrase gfdoc/Phrase.html]: construction of the major units of text and speech
- [Text gfdoc/Text.html]: construction of texts from phrases, using punctuation
- [Idiom gfdoc/Idiom.html]: idiomatic phrases, such as existentials
- [Structural gfdoc/Structural.html]: a lexicon of structural words
- [Lexicon gfdoc/Lexicon.html]: a lexicon of other common words, for test purposes
- [Lang gfdoc/Lang.html]: the main module comprising all the others


===The language-dependent APIs===

- [ParadigmsDan gfdoc/ParadigmsDan.html]: Danish lexical paradigms
- [ParadigmsEng gfdoc/ParadigmsEng.html]: English lexical paradigms
- [ParadigmsFin gfdoc/ParadigmsFin.html]: Finnish lexical paradigms
- [ParadigmsFre gfdoc/ParadigmsFre.html]: French lexical paradigms
- [ParadigmsIta gfdoc/ParadigmsIta.html]: Italian lexical paradigms
- [ParadigmsGer gfdoc/ParadigmsGer.html]: German lexical paradigms
- [ParadigmsNor gfdoc/ParadigmsNor.html]: Norwegian lexical paradigms
- [ParadigmsRus gfdoc/ParadigmsRus.html]: Russian lexical paradigms
- [ParadigmsSpa gfdoc/ParadigmsSpa.html]: Spanish lexical paradigms
- [ParadigmsSwe gfdoc/ParadigmsSwe.html]: Swedish lexical paradigms


- [IrregDan gfdoc/IrregDan.gf]: Danish irregular verbs (very incomplete)
- [IrregEng gfdoc/IrregEng.gf]: English irregular verbs
- [IrregFre gfdoc/IrregFre.gf]: French irregular verbs
% - [IrregGer gfdoc/IrregGer.gf]: German irregular verbs
- [IrregNor gfdoc/IrregNor.gf]: Norwegian irregular verbs (very incomplete)
- [IrregSwe gfdoc/IrregSwe.gf]: Swedish irregular verbs


===Special-purpose APIs===

====Present====

The API is the same as for the full ground API, but the compiler
has ignored all verb and sentence tenses except the present.
Lines ignored in the source files are marked by ``--# notpresent``.
The result is a smaller and more efficient grammar, which is still
sufficient for many applications.
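
To get a rough idea of how much of a source module is pruned, the marked lines can be counted or filtered with standard tools. This is only an illustration with ``grep``, not the mechanism the GF compiler itself uses; the function names are hypothetical and a POSIX shell is assumed.
```
# Hypothetical helpers, not part of GF.
# In both functions, grep's exit status 1 ("no line selected") is treated
# as success, while real errors (status 2, e.g. unreadable file) still fail.

# strip_notpresent FILE: print FILE without the lines marked "--# notpresent".
strip_notpresent() {
    grep -v -e '--# notpresent' "$1" || [ "$?" -eq 1 ]
}

# count_notpresent FILE: print the number of lines marked "--# notpresent".
count_notpresent() {
    grep -c -e '--# notpresent' "$1" || [ "$?" -eq 1 ]
}
```
For example, ``count_notpresent SomeModule.gf`` prints the number of marked lines in a (hypothetical) source file ``SomeModule.gf``.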


====Multimodal====

- [Multi gfdoc/Multi.html]: main module for multimodal dialogue systems


====Mathematical====

- [Mathematical gfdoc/Mathematical.html]: main module for mathematical language
- [Predication gfdoc/Predication.html]: predication with verbs, adjectives, etc.
- [Symbol gfdoc/Symbol.html]: symbols and numbers in text


==Using the library==

===The compiled version===

The simplest way to get the library is to install the precompiled version
[``lib/compiled.tgz`` ../../compiled.tgz]. Just do
```
cd GF/lib
tar xvfz compiled.tgz
```
There is no need to link application grammars to the source directories of the
library. Use one (or several) of the following packages instead:

- ``lib/alltenses`` the complete ground-API library with all forms
- ``lib/present`` a pruned ground-API library with present tense only
- ``lib/mathematical`` special-purpose API for mathematical applications
- ``lib/multimodal`` the complete ground-API with demonstratives for multimodal dialogue applications


===Linking applications to libraries===

Notice, however, that both special-purpose APIs share modules with
``present``. It is therefore not a good idea to use them in combination with
``alltenses``.

It is advisable to use the bare package names in paths pointing to the
libraries. Here is an example, from ``examples/dialogue/LightsEng.gf``:
```
--# -path=.:alltenses:multimodal:prelude
```
To reach these directories from anywhere, set the environment variable
``GF_LIB_PATH`` to point to the directory ``GF/lib/``. For instance,
I have the following line in my ``.bashrc`` file:
```
export GF_LIB_PATH=/home/aarne/GF/lib
```
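The entries of such a path can be checked against the current directory and ``GF_LIB_PATH`` before starting GF. The sketch below is not part of GF: the function names are hypothetical, a POSIX shell is assumed, and the lookup it performs (current directory first, then ``GF_LIB_PATH``) is an assumption made for illustration, not a description of how GF itself resolves paths.
```
# Hypothetical helpers, not part of GF.

# path_entries FILE: print the entries of the first "--# -path=" line
# in FILE, one entry per line.
path_entries() {
    sed -n 's/^--# *-path=//p' "$1" | head -n 1 | tr ':' '\n'
}

# check_path FILE: report for each entry whether a directory of that name
# exists relative to the current directory or under $GF_LIB_PATH
# (assumed lookup order, see above). Returns non-zero if any entry is missing.
check_path() {
    rc=0
    for entry in $(path_entries "$1"); do
        if [ -d "$entry" ] || [ -d "${GF_LIB_PATH:-.}/$entry" ]; then
            echo "ok      $entry"
        else
            echo "missing $entry"
            rc=1
        fi
    done
    return "$rc"
}
```
Run it from the directory of the grammar, for instance ``check_path LightsEng.gf``.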


===Using the libraries as top-level grammars===

If you have done ``make`` in ``lib/resource-1.0``, you will have
a file ``langs.gfcm``. This file can be used with fast startup for
tasks such as treebank generation:
```
> i -nocf langs.gfcm
> gr -cat=S -cf -number=10 | tb
```
The ``-nocf`` flag saves startup time and memory by preventing the
creation of context-free parse grammars.
The resource grammar libraries do //not// support
parsing very well. While it is theoretically possible to parse with any
GF grammar, the resource grammars are so abstract and complex that
building the actual parser in memory may need too many resources
to succeed.

An exception is ``LangEng``. It is actually feasible to parse with
both ``alltenses/LangEng`` and ``present/LangEng``, the latter being
much faster than the former. The ``-mcfg`` flag (multiple context-free grammar)
must be used:
```
p -lang=LangEng -mcfg -parser=topdown "this man is old"
```
Parsing with the ``-mcfg`` flag takes a few extra seconds the first time in
each session, but is faster on later runs.


==Example applications==

These applications are meant to serve as starting points for
new applications, showing how the libraries can be used in
typical situations.

===Bronzeage===

The [examples/bronzeage ../../../examples/bronzeage]
grammar set implements a language fragment
based on the Swadesh list of 200 words. It is useful for
things like language training.


===Dialogue===

The [examples/dialogue ../../../examples/dialogue]
grammar set implements the user grammars of a
multimodal dialogue system.
Its purpose is to serve as a prototype for applications in the
TALK project.


===Animals===

The [examples/animal ../../../examples/animal]
grammar set implements some queries about animals.
Its purpose is to serve as a prototype for example-based
grammar writing.


==Known bugs and missing components==

These bugs should be fixed before the final release of v. 1.0.

Danish
- the lexicon and chosen inflections are only partially verified


English
- only contracted negation forms


Finnish
- wrong cases in some passive constructions


French
- only direct word order in questions


German
- no list of irregular verbs


Italian
- no contraction of infinitives before clitics
- no list of irregular verbs


Norwegian
- the lexicon and chosen inflections are only partially verified


Russian
- some functions missing
- regular paradigms are missing


Spanish
- no ordinal numbers


Swedish
-



==More reading==

[Grammars as Software Libraries gslt-sem-2006.html]. Slides
with background and motivation for the resource grammar library.

[GF Resource Grammar Library Version 1.0 clt2006.html]. Slides
giving an overview of the library and practical hints on its use.

[How to write resource grammars Resource-HOWTO.html]. Helps you
start if you want to add another language to the library.

[Parametrized modules for Romance languages http://www.cs.chalmers.se/~aarne/geocal2006.pdf].
Slides explaining some ideas in the implementation of
French, Italian, and Spanish.

[Grammar writing by examples http://www.cs.chalmers.se/~aarne/slides/webalt-2005.pdf].
Slides showing how the method is used.

[Multimodal Resource Grammars http://www.cs.chalmers.se/~aarne/slides/talk-edin2005.pdf].
Slides showing how to use the multimodal resource library. N.B. the library
examples are from ``multimodal/old``, which is a reduced-size API.