GF Resource Grammar Library v. 1.1
Author: Aarne Ranta <aarne (at) cs.chalmers.se>
Last update: %%date(%c)

% NOTE: this is a txt2tags file.
% Create an html file from this file using:
% txt2tags --toc -thtml index.txt

%!target:html

The GF Resource Grammar Library defines the basic grammar of
ten languages:
Danish, English, Finnish, French, German,
Italian, Norwegian, Russian, Spanish, Swedish.
A still incomplete implementation for Arabic is also
included.

**New in Version 1.1**

- Simpler APIs using overloading. The API of version 1.0 remains valid
  and can be used in combination with the new APIs. An example of use is
  [``logic`` ../../../examples/logic]. The new APIs are:
  - [Constructors gfdoc/Constructors.html]: almost all trees in a category ``C``
    can be built by the function ``mkC``.
  - [Combinators gfdoc/Combinators.html]: cross-cutting grammatical functions:
    predication, application, modification, and coordination.
  - [Symbolic gfdoc/Symbolic.html]: noun phrases with mathematical symbols.
- Some new functions.
- Bug fixes.


==Authors==

Inger Andersson and Therese Soderberg (Spanish morphology),
Nicolas Barth and Sylvain Pogodalla (French verb list),
Ali El Dada (Arabic modules),
Janna Khegai (Russian modules),
Bjorn Bringert (many Swadesh lexica),
Carlos Gonzalía (Spanish cardinals),
Harald Hammarström (German morphology),
Patrik Jansson (Swedish cardinals),
Andreas Priesnitz (German lexicon),
Aarne Ranta.

We are grateful for contributions and comments from several other people
who have used this and previous versions of the resource library, including
Ludmilla Bogavac,
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Robin Cooper,
Hans-Joachim Daniels,
Elisabet Engdahl,
Markus Forsberg,
Kristofer Johannisson,
Anni Laine,
Peter Ljunglöf,
Saara Myllyntausta,
Wanjiku Ng'ang'a,
Jordi Saludes.


==License==

The GF Resource Grammar Library is open-source software licensed under
the GNU General Public License. See the file [LICENSE ../LICENSE] for more
details.


==Scope==

Coverage, for each language:
- complete morphology
- lexicon of the ca. 100 most important structural words
- test lexicon of ca. 300 content words (rough equivalents in each language)
- list of irregular verbs (separately for each language)
- representative fragment of syntax (cf. the Core Language Engine, CLE)
- rather flat semantics (cf. the Quasi-Logical Form of CLE)


Organization:
- top-level (API) modules
- ground API and special-purpose APIs
- "school grammar" concepts rather than advanced linguistic theory


Presentation:
- the ``gfdoc`` tool for generating HTML documentation from grammars
- example collections


==Quick start==

Go to the main directory, compile the grammars, and run a test:
```
cd GF/lib/resource-1.0
make
make test
```
This takes quite some time. An alternative is to use the
precompiled grammar package [``compiled.tgz`` ../../compiled.tgz].
This package puts the necessary ``gfc`` and ``gfr`` files directly under ``GF/lib``,
in the following directories:
```
GF/lib/alltenses
GF/lib/mathematical
GF/lib/multimodal
GF/lib/present
```
For instance, do
```
cd GF/lib/
gf
> i -path=present:prelude present/LangEng.gfc
> gr -cat=S -number=3 -cf | tb
```
For more examples, see the [Overview slides clt2006.html].
The ``make`` procedure does not compile Arabic, but Arabic can
be compiled in the same way as the other languages.


==Encoding==

Finnish, German, the Romance languages, and the Scandinavian languages are
encoded in ISO-8859-1 (Latin-1).

Arabic and Russian are in UTF-8.

English is in pure ASCII.


===The language-independent ground API===

This API is available in both the ``present`` and the ``alltenses`` packages.
The API is divided into a number of ``abstract`` modules.
The following figure shows the dependencies between these modules.

[Grammar.png]

The documentation of the individual modules:

- [Common gfdoc/Common.html]: abstract notions with language-independent implementations
- [Cat gfdoc/Cat.html]: the category system
- [Noun gfdoc/Noun.html]: construction of nouns and noun phrases
- [Adjective gfdoc/Adjective.html]: construction of adjectival phrases
- [Verb gfdoc/Verb.html]: construction of verb phrases
- [Adverb gfdoc/Adverb.html]: construction of adverbial phrases
- [Numeral gfdoc/Numeral.html]: construction of cardinal and ordinal numerals
- [Sentence gfdoc/Sentence.html]: construction of sentences and imperatives
- [Question gfdoc/Question.html]: construction of questions
- [Relative gfdoc/Relative.html]: construction of relative clauses
- [Conjunction gfdoc/Conjunction.html]: coordination of phrases
- [Phrase gfdoc/Phrase.html]: construction of the major units of text and speech
- [Text gfdoc/Text.html]: construction of texts from phrases, using punctuation
- [Idiom gfdoc/Idiom.html]: idiomatic phrases, such as existentials
- [Structural gfdoc/Structural.html]: a lexicon of structural words
- [Lexicon gfdoc/Lexicon.html]: a lexicon of other common words, for test purposes
- [Grammar gfdoc/Grammar.html]: the main module comprising all but ``Lexicon``
- [Lang gfdoc/Lang.html]: the main module comprising both ``Grammar`` and ``Lexicon``


===The language-dependent APIs===

- [ParadigmsDan gfdoc/ParadigmsDan.html]: Danish lexical paradigms
- [ParadigmsEng gfdoc/ParadigmsEng.html]: English lexical paradigms
- [ParadigmsFin gfdoc/ParadigmsFin.html]: Finnish lexical paradigms
- [ParadigmsFre gfdoc/ParadigmsFre.html]: French lexical paradigms
- [ParadigmsIta gfdoc/ParadigmsIta.html]: Italian lexical paradigms
- [ParadigmsGer gfdoc/ParadigmsGer.html]: German lexical paradigms
- [ParadigmsNor gfdoc/ParadigmsNor.html]: Norwegian lexical paradigms
- [ParadigmsRus gfdoc/ParadigmsRus.html]: Russian lexical paradigms
- [ParadigmsSpa gfdoc/ParadigmsSpa.html]: Spanish lexical paradigms
- [ParadigmsSwe gfdoc/ParadigmsSwe.html]: Swedish lexical paradigms


- [IrregDan ../danish/IrregDan.gf]: Danish irregular verbs (very incomplete)
- [IrregEng ../english/IrregEng.gf]: English irregular verbs
- [IrregFre ../french/IrregFre.gf]: French irregular verbs
- [IrregGer ../german/IrregGer.gf]: German irregular verbs
- [IrregNor ../norwegian/IrregNor.gf]: Norwegian irregular verbs (very incomplete)
- [IrregSpa ../spanish/IrregSpa.gf]: Spanish irregular verbs
- [IrregSwe ../swedish/IrregSwe.gf]: Swedish irregular verbs


The following figure shows the structure of each language-dependent top module.

[English.png]

- [Extra ../abstract/Extra.gf]: extra constructs implemented in some languages
- [ExtraDan ../danish/ExtraDanAbs.gf]: extra constructs in Danish
- [ExtraEng ../english/ExtraEngAbs.gf]: extra constructs in English
- [ExtraFin ../finnish/ExtraFinAbs.gf]: extra constructs in Finnish
- [ExtraFre ../french/ExtraFreAbs.gf]: extra constructs in French
- [ExtraIta ../italian/ExtraItaAbs.gf]: extra constructs in Italian
- [ExtraNor ../norwegian/ExtraNorAbs.gf]: extra constructs in Norwegian
- [ExtraRus ../russian/ExtraRusAbs.gf]: extra constructs in Russian
- [ExtraScand ../scandinavian/ExtraScandAbs.gf]: extra constructs in the Scandinavian languages
- [ExtraSpa ../spanish/ExtraSpaAbs.gf]: extra constructs in Spanish
- [ExtraSwe ../swedish/ExtraSweAbs.gf]: extra constructs in Swedish


- [Danish ../danish/DanishAbs.gf]: Danish with all extras
- [English ../english/EnglishAbs.gf]: English with all extras
- [Finnish ../finnish/FinnishAbs.gf]: Finnish with all extras
- [French ../french/FrenchAbs.gf]: French with all extras
- [German ../german/GermanAbs.gf]: German with all extras
- [Italian ../italian/ItalianAbs.gf]: Italian with all extras
- [Norwegian ../norwegian/NorwegianAbs.gf]: Norwegian with all extras
- [Russian ../russian/RussianAbs.gf]: Russian with all extras
- [Spanish ../spanish/SpanishAbs.gf]: Spanish with all extras
- [Swedish ../swedish/SwedishAbs.gf]: Swedish with all extras


===Special-purpose APIs===

====Present====

The API is the same as the full ground API, but when building this package
the compiler ignores all verb and sentence tenses except the present.
Lines ignored in the source files are marked by ``--# notpresent``.
The result is a smaller and more efficient grammar, which is still
sufficient for many applications.


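The marking convention can be explored directly on the source files. The following is a minimal POSIX shell sketch, assuming only what is stated above: that each line dropped from the present-tense version carries the ``--# notpresent`` marker. The helpers ``strip_notpresent`` and ``count_notpresent`` are hypothetical (not part of GF), and filtering out the marked lines only approximates what the present-tense build ignores; it is not the procedure that ``make`` actually uses.

```
# Hypothetical helper, not part of GF: print a grammar source file without
# the lines tagged "--# notpresent". This only approximates what the
# present-tense build ignores; the real build procedure may differ.
strip_notpresent() {
  # -v inverts the match; -e lets the pattern start with "--".
  # grep exits with 1 when no line is printed (every line was tagged):
  # treat that as success, but still report real errors (exit status 2).
  grep -v -e '--# notpresent' "$1" || [ "$?" -eq 1 ]
}

# Hypothetical helper: print, for each file given, how many lines carry the marker.
count_notpresent() {
  for f in "$@"; do
    # grep -c prints 0 (and exits 1) when nothing matches; the count is still captured.
    printf '%s: %s\n' "$f" "$(grep -c -e '--# notpresent' "$f")"
  done
}
```

Run in one of the language directories, ``count_notpresent *.gf`` would give a rough idea of how much of each module is specific to the non-present tenses, and ``strip_notpresent`` approximates what remains of a module in the present-only version.
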
====Multimodal====

The API is the same as the full ground API, but with modified
linearization types for ``NP`` and ``Adv`` and for all other categories
that depend on them: an extra field is added for a demonstrative pointing
gesture. Some functions for constructing demonstratives are provided.

- [Multi gfdoc/Multi.html]: main module for multimodal dialogue systems


====Mathematical====

- [Mathematical gfdoc/Mathematical.html]: main module for mathematical language
- [Predication gfdoc/Predication.html]: predication with verbs, adjectives, etc.
- [Symbol gfdoc/Symbol.html]: symbols and numbers in text


==Using the library==

===The compiled version===

The simplest way to get the library is to install the precompiled version
[``lib/compiled.tgz`` ../../compiled.tgz]. Just run:
```
cd GF/lib
tar xvfz compiled.tgz
```
There is no need to link application grammars to the source directories of the
library. Use one (or several) of the following packages instead:

- ``lib/alltenses``: the complete ground-API library with all forms
- ``lib/present``: a pruned ground-API library with present tense only
- ``lib/mathematical``: special-purpose API for mathematical applications
- ``lib/multimodal``: the complete ground API with demonstratives for
  multimodal dialogue applications


===Linking applications to libraries===

Typically, an application grammar opens one of the following modules,
where ``X`` stands for a language (e.g. ``GrammarEng``, ``LangEng``):
- ``GrammarX`` for just syntax
- ``LangX`` for both syntax and a small lexicon
- the language's top module (e.g. ``English``) for syntax, lexicon, and language-dependent extensions


Usually you also need a lexicon of your own, and hence have to open
- ``ParadigmsX`` for lexicon-building functions


It is advisable to use the bare package names in paths pointing to the
libraries. Here is an example, from ``examples/dialogue/LightsEng.gf``:
```
--# -path=.:alltenses:multimodal:prelude
```
To reach these directories from anywhere, set the environment variable
``GF_LIB_PATH`` to point to the directory ``GF/lib/``. For instance,
I have the following line in my ``.bashrc`` file:
```
export GF_LIB_PATH=/home/aarne/GF/lib
```

The ``mathematical`` API shares modules with ``present``. It is therefore
not a good idea to use it in combination with ``alltenses``, since the path
would then contain both the present-only and the all-tenses versions of the
same modules.


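Before compiling an application grammar, it can help to check that every entry of its ``-path`` pragma can be found, and that ``mathematical`` is not mixed with ``alltenses``. The following POSIX shell sketch does this. ``check_gf_path`` is a hypothetical helper, not part of GF, and it assumes (as the text above suggests) that a bare package name not found relative to the current directory is looked up under ``GF_LIB_PATH``; GF's actual lookup rules may differ.

```
# Hypothetical helper, not part of GF. Reads the first "--# -path=" pragma
# of a grammar file and reports, for each colon-separated entry, whether the
# directory exists relative to the current directory or under $GF_LIB_PATH.
# The lookup order is an assumption; GF itself may resolve paths differently.
check_gf_path() {
  entries=$(sed -n 's/^--# -path=//p' "$1" | head -n 1)
  if [ -z "$entries" ]; then
    echo "no -path pragma found in $1" >&2
    return 1
  fi

  # mathematical shares modules with present, so warn if it is combined with alltenses.
  case ":$entries:" in
    *:alltenses:*)
      case ":$entries:" in
        *:mathematical:*)
          echo "warning: mathematical is combined with alltenses" >&2 ;;
      esac ;;
  esac

  status=0
  old_ifs=$IFS
  IFS=:                         # split the pragma on colons
  for entry in $entries; do
    if [ -d "$entry" ]; then
      echo "local:   $entry"
    elif [ -n "$GF_LIB_PATH" ] && [ -d "$GF_LIB_PATH/$entry" ]; then
      echo "library: $GF_LIB_PATH/$entry"
    else
      echo "missing: $entry"
      status=1
    fi
  done
  IFS=$old_ifs
  return "$status"
}
```

With ``GF_LIB_PATH`` set as above, running ``check_gf_path LightsEng.gf`` in ``examples/dialogue`` should report ``.`` as local and ``alltenses``, ``multimodal``, and ``prelude`` as library packages; a nonzero exit status signals that some entry could not be found.
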
===Using the libraries as top-level grammars===

If you have run ``make`` in ``lib/resource-1.0``, you will have
a file ``langs.gfcm``. This file can be loaded with fast startup for
tasks such as treebank generation:
```
> i -nocf langs.gfcm
> gr -cat=S -cf -number=10 | tb
```
The ``-nocf`` flag saves startup time and memory by preventing the
creation of context-free parse grammars.

The resource grammar libraries do //not// support
parsing very well. While it is theoretically possible to parse with any
GF grammar, the resource grammars are so abstract and complex that
building the actual parser in memory may simply require too many resources
to succeed.

An exception is ``LangEng``. It is actually feasible to parse with
both ``alltenses/LangEng`` and ``present/LangEng``, the latter being
much faster than the former. The ``-fcfg`` flag (fast multiple context-free grammar)
must be used:
```
> p -lang=LangEng -fcfg "this man is old"
```
Parsing with the ``-fcfg`` flag takes a few extra seconds the first time in
each session, but later runs are faster. From GF 2.6 onwards, ``fcfg`` is the
default parser of GF and the flag is not needed.

It is also possible to parse the Scandinavian languages
(Danish, Norwegian, Swedish) and, with enough memory (``gf +RTS -K512M``),
German.


==Example applications==

These applications are meant to serve as starting points for
new applications, showing how the libraries can be used in
typical situations.

===Bronzeage===

The [examples/bronzeage ../../../examples/bronzeage]
grammar set implements a language fragment
based on the Swadesh list of 200 words. It is useful for
applications such as language training.

===Dialogue===

The [examples/dialogue ../../../examples/dialogue]
grammar set implements the user grammars of a
multimodal dialogue system.
Its purpose is to serve as a prototype for applications in the
TALK project.

===Animals===

The [examples/animal ../../../examples/animal]
grammar set implements some queries about animals.
Its purpose is to serve as a prototype for example-based
grammar writing.


==Known bugs and missing components==

Danish
- the lexicon and chosen inflections are only partially verified


English


Finnish
- wrong cases in some passive constructions


French
- multiple clitics (with V3) are not always right
- third-person pronominal questions with inverted word order
  have wrong forms when a "t" is required
  (e.g. "comment fera-t-il" becomes "comment fera il")


German


Italian
- multiple clitics (with V3) are not always right


Norwegian
- the lexicon and chosen inflections are only partially verified


Russian
- some functions are missing
- some regular paradigms are missing


Spanish
- multiple clitics (with V3) are not always right
- missing contractions with imperatives and clitics


Swedish


==More reading==

[GF Resource Grammar Library ../../../doc/resource.pdf] (pdf).
Printable user manual with API documentation (version 1.0).

[Grammars as Software Libraries gslt-sem-2006.html]. Slides
with background and motivation for the resource grammar library.

[GF Resource Grammar Library Version 1.0 clt2006.html]. Slides
giving an overview of the library and practical hints on its use.

[How to write resource grammars Resource-HOWTO.html]. Helps you
get started if you want to add another language to the library.

[Parametrized modules for Romance languages http://www.cs.chalmers.se/~aarne/geocal2006.pdf].
Slides explaining some ideas in the implementation of
French, Italian, and Spanish.

[Grammar writing by examples http://www.cs.chalmers.se/~aarne/slides/webalt-2005.pdf].
Slides showing how the method of grammar writing by examples is used.

[Multimodal Resource Grammars http://www.cs.chalmers.se/~aarne/slides/talk-edin2005.pdf].
Slides showing how to use the multimodal resource library. N.B. the library
examples are from ``multimodal/old``, which is a reduced-size API.