<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>GF Resource Grammar Library v. 1.1</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>GF Resource Grammar Library v. 1.1</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Thu Apr 19 23:35:29 2007
</FONT></CENTER>
<P>
The GF Resource Grammar Library defines the basic grammar of
ten languages:
Danish, English, Finnish, French, German,
Italian, Norwegian, Russian, Spanish, Swedish.
A still incomplete implementation for Arabic is also
included.
</P>
<P>
<B>New in Version 1.1</B>
</P>
<UL>
<LI>Simpler APIs using overloading:
<UL>
<LI><A HREF="gfdoc/Constructors.html">Constructors</A>: almost all trees in a category <CODE>C</CODE>
can be built by the function <CODE>mkC</CODE>.
<LI><A HREF="gfdoc/Combinators.html">Combinators</A>: cross-cut grammatical functions:
predication, application, modification, coordination.
<LI><A HREF="gfdoc/Symbolic.html">Symbolic</A>: noun phrases with mathematical symbols.
</UL>
</UL>

<P>
An example of use is <A HREF="../../../examples/logic"><CODE>logic</CODE></A>.
The API of version 1.0 remains valid and can be used in combination with this.
</P>
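<P>
As a rough illustration of the overloading behind these APIs, the following sketch
shows how several ground-API functions could be collected under the single name
<CODE>mkNP</CODE>. It assumes that <CODE>DetCN</CODE>, <CODE>UsePN</CODE>, and
<CODE>UsePron</CODE> have the types written next to them; the actual definitions in
<CODE>Constructors</CODE> may differ, so consult the linked documentation before
relying on any of these names.
</P>
<PRE>
  -- fragment of a resource module that opens the ground API (assumed)
  oper
    -- one name, several types: the type checker selects the instance
    -- whose argument types match the call
    mkNP = overload {
      mkNP : Det -> CN -> NP = DetCN ;     -- determiner + common noun
      mkNP : PN -> NP        = UsePN ;     -- proper name
      mkNP : Pron -> NP      = UsePron     -- pronoun
    } ;
</PRE>
<P>
With such a definition, an application grammar writes <CODE>mkNP</CODE> in all three
situations instead of remembering three separate function names.
</P>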
<UL>
<LI>Some new functions.
<LI>Bug fixes.
</UL>

<H2>Authors</H2>
<P>
Inger Andersson and Therese Soderberg (Spanish morphology),
Nicolas Barth and Sylvain Pogodalla (French verb list),
Ali El Dada (Arabic modules),
Janna Khegai (Russian modules),
Bjorn Bringert (many Swadesh lexica),
Carlos Gonzalía (Spanish cardinals),
Harald Hammarström (German morphology),
Patrik Jansson (Swedish cardinals),
Andreas Priesnitz (German lexicon),
Aarne Ranta.
</P>
<P>
We are grateful for the contributions and comments of
several other people who have used this and
previous versions of the resource library, including
Ludmilla Bogavac,
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Robin Cooper,
Hans-Joachim Daniels,
Elisabet Engdahl,
Markus Forsberg,
Kristofer Johannisson,
Anni Laine,
Peter Ljunglöf,
Saara Myllyntausta,
Wanjiku Ng'ang'a,
Jordi Saludes.
</P>
<H2>License</H2>
<P>
The GF Resource Grammar Library is open-source software licensed under
the GNU General Public License. See the file <A HREF="../LICENSE">LICENSE</A> for more
details.
</P>
<H2>Scope</H2>
<P>
Coverage, for each language:
</P>
<UL>
<LI>complete morphology
<LI>lexicon of the ca. 100 most important structural words
<LI>test lexicon of ca. 300 content words (rough equivalents in each language)
<LI>list of irregular verbs (separately for each language)
<LI>representative fragment of syntax (cf. CLE (Core Language Engine))
<LI>rather flat semantics (cf. Quasi-Logical Form of CLE)
</UL>

<P>
Organization:
</P>
<UL>
<LI>top-level (API) modules
<LI>Ground API + special-purpose APIs
<LI>"school grammar" concepts rather than advanced linguistic theory
</UL>

<P>
Presentation:
</P>
<UL>
<LI>tool <CODE>gfdoc</CODE> for generating HTML from grammars
<LI>example collections
</UL>

<H2>Quick start</H2>
<P>
Go to the main directory, compile the grammars, and run a test.
</P>
<PRE>
cd GF/lib/resource-1.0
make
make test
</PRE>
<P>
This will take quite some time. An alternative is to use the
precompiled grammar package <A HREF="../../compiled.tgz"><CODE>compiled.tgz</CODE></A>.
This package places the necessary <CODE>gfc</CODE> and <CODE>gfr</CODE> files directly under <CODE>GF/lib</CODE>,
in the following directories:
</P>
<PRE>
GF/lib/alltenses
GF/lib/mathematical
GF/lib/multimodal
GF/lib/present
</PRE>
<P>
For instance, run
</P>
<PRE>
cd GF/lib/
gf
> i -path=present:prelude present/LangEng.gfc
> gr -cat=S -number=3 -cf | tb
</PRE>
<P>
For more examples, see the <A HREF="clt2006.html">Overview slides</A>.
The <CODE>make</CODE> procedure does not make Arabic, but it can
be compiled in the same way as the other languages.
</P>
<H2>Encoding</H2>
<P>
Finnish, German, Romance, and Scandinavian languages are in ISO Latin-1 (ISO-8859-1).
</P>
<P>
Arabic and Russian are in UTF-8.
</P>
<P>
English is in pure ASCII.
</P>
<H3>The language-independent ground API</H3>
<P>
This API is accessible through both <CODE>present</CODE> and <CODE>alltenses</CODE>.
The API is divided into a number of <CODE>abstract</CODE> modules.
The following figure gives the dependencies of these modules.
</P>
<P>
<IMG ALIGN="left" SRC="Grammar.png" BORDER="0" ALT="">
</P>
<P>
The documentation of the individual modules:
</P>
<UL>
<LI><A HREF="gfdoc/Common.html">Common</A>: abstract notions with language-independent implementations
<LI><A HREF="gfdoc/Cat.html">Cat</A>: the category system
<LI><A HREF="gfdoc/Noun.html">Noun</A>: construction of nouns and noun phrases
<LI><A HREF="gfdoc/Adjective.html">Adjective</A>: construction of adjectival phrases
<LI><A HREF="gfdoc/Verb.html">Verb</A>: construction of verb phrases
<LI><A HREF="gfdoc/Adverb.html">Adverb</A>: construction of adverbial phrases
<LI><A HREF="gfdoc/Numeral.html">Numeral</A>: construction of cardinal and ordinal numerals
<LI><A HREF="gfdoc/Sentence.html">Sentence</A>: construction of sentences and imperatives
<LI><A HREF="gfdoc/Question.html">Question</A>: construction of questions
<LI><A HREF="gfdoc/Relative.html">Relative</A>: construction of relative clauses
<LI><A HREF="gfdoc/Conjunction.html">Conjunction</A>: coordination of phrases
<LI><A HREF="gfdoc/Phrase.html">Phrase</A>: construction of the major units of text and speech
<LI><A HREF="gfdoc/Text.html">Text</A>: construction of texts from phrases, using punctuation
<LI><A HREF="gfdoc/Idiom.html">Idiom</A>: idiomatic phrases, such as existentials
<LI><A HREF="gfdoc/Structural.html">Structural</A>: a lexicon of structural words
<LI><A HREF="gfdoc/Lexicon.html">Lexicon</A>: a lexicon of other common words, for test purposes
<LI><A HREF="gfdoc/Grammar.html">Grammar</A>: the main module comprising all but <CODE>Lexicon</CODE>
<LI><A HREF="gfdoc/Lang.html">Lang</A>: the main module comprising both <CODE>Grammar</CODE> and <CODE>Lexicon</CODE>
</UL>

<H3>The language-dependent APIs</H3>
<UL>
<LI><A HREF="gfdoc/ParadigmsDan.html">ParadigmsDan</A>: Danish lexical paradigms
<LI><A HREF="gfdoc/ParadigmsEng.html">ParadigmsEng</A>: English lexical paradigms
<LI><A HREF="gfdoc/ParadigmsFin.html">ParadigmsFin</A>: Finnish lexical paradigms
<LI><A HREF="gfdoc/ParadigmsFre.html">ParadigmsFre</A>: French lexical paradigms
<LI><A HREF="gfdoc/ParadigmsIta.html">ParadigmsIta</A>: Italian lexical paradigms
<LI><A HREF="gfdoc/ParadigmsGer.html">ParadigmsGer</A>: German lexical paradigms
<LI><A HREF="gfdoc/ParadigmsNor.html">ParadigmsNor</A>: Norwegian lexical paradigms
<LI><A HREF="gfdoc/ParadigmsRus.html">ParadigmsRus</A>: Russian lexical paradigms
<LI><A HREF="gfdoc/ParadigmsSpa.html">ParadigmsSpa</A>: Spanish lexical paradigms
<LI><A HREF="gfdoc/ParadigmsSwe.html">ParadigmsSwe</A>: Swedish lexical paradigms
</UL>

<UL>
<LI><A HREF="../danish/IrregDan.gf">IrregDan</A>: Danish irregular verbs (very incomplete)
<LI><A HREF="../english/IrregEng.gf">IrregEng</A>: English irregular verbs
<LI><A HREF="../french/IrregFre.gf">IrregFre</A>: French irregular verbs
<LI><A HREF="../german/IrregGer.gf">IrregGer</A>: German irregular verbs
<LI><A HREF="../norwegian/IrregNor.gf">IrregNor</A>: Norwegian irregular verbs (very incomplete)
<LI><A HREF="../spanish/IrregSpa.gf">IrregSpa</A>: Spanish irregular verbs
<LI><A HREF="../swedish/IrregSwe.gf">IrregSwe</A>: Swedish irregular verbs
</UL>

<P>
This is the structure of each language-dependent top module.
</P>
<P>
<IMG ALIGN="middle" SRC="English.png" BORDER="0" ALT="">
</P>
<UL>
<LI><A HREF="../abstract/Extra.gf">Extra</A>: extra constructs implemented in some languages
<LI><A HREF="../danish/ExtraDanAbs.gf">ExtraDan</A>: extra constructs in Danish
<LI><A HREF="../english/ExtraEngAbs.gf">ExtraEng</A>: extra constructs in English
<LI><A HREF="../finnish/ExtraFinAbs.gf">ExtraFin</A>: extra constructs in Finnish
<LI><A HREF="../french/ExtraFreAbs.gf">ExtraFre</A>: extra constructs in French
<LI><A HREF="../italian/ExtraItaAbs.gf">ExtraIta</A>: extra constructs in Italian
<LI><A HREF="../norwegian/ExtraNorAbs.gf">ExtraNor</A>: extra constructs in Norwegian
<LI><A HREF="../russian/ExtraRusAbs.gf">ExtraRus</A>: extra constructs in Russian
<LI><A HREF="../scandinavian/ExtraScandAbs.gf">ExtraScand</A>: extra constructs in Scandinavian
<LI><A HREF="../spanish/ExtraSpaAbs.gf">ExtraSpa</A>: extra constructs in Spanish
<LI><A HREF="../swedish/ExtraSweAbs.gf">ExtraSwe</A>: extra constructs in Swedish
</UL>

<UL>
<LI><A HREF="../danish/DanishAbs.gf">Danish</A>: Danish with all extras
<LI><A HREF="../english/EnglishAbs.gf">English</A>: English with all extras
<LI><A HREF="../finnish/FinnishAbs.gf">Finnish</A>: Finnish with all extras
<LI><A HREF="../french/FrenchAbs.gf">French</A>: French with all extras
<LI><A HREF="../german/GermanAbs.gf">German</A>: German with all extras
<LI><A HREF="../italian/ItalianAbs.gf">Italian</A>: Italian with all extras
<LI><A HREF="../norwegian/NorwegianAbs.gf">Norwegian</A>: Norwegian with all extras
<LI><A HREF="../russian/RussianAbs.gf">Russian</A>: Russian with all extras
<LI><A HREF="../spanish/SpanishAbs.gf">Spanish</A>: Spanish with all extras
<LI><A HREF="../swedish/SwedishAbs.gf">Swedish</A>: Swedish with all extras
</UL>

<H3>Special-purpose APIs</H3>
<H4>Present</H4>
<P>
The API is the same as for the full ground API, but the compiler
has ignored all verb and sentence tenses except the present.
Lines ignored in the source files are marked by <CODE>--# notpresent</CODE>.
The result is a smaller and more efficient grammar, which is still
sufficient for many applications.
</P>
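<P>
A minimal sketch of what such marking might look like, here for tense constants in an
abstract module. The function names are assumptions chosen for illustration and need
not match the library sources; the point is only that each line carrying the
<CODE>--# notpresent</CODE> marker is left out of the <CODE>present</CODE> version.
</P>
<PRE>
  fun
    -- kept in both present and alltenses
    TPres : Tense ;
    -- the next two lines are skipped when the present version is built
    TPast : Tense ;  --# notpresent
    TFut  : Tense ;  --# notpresent
</PRE>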
<H4>Multimodal</H4>
<P>
The API is the same as for the full ground API, but with modified
linearization types of <CODE>NP</CODE> and <CODE>Adv</CODE>, and all other categories
depending on them: an extra field is added for a demonstrative pointing
gesture. Some functions for constructing demonstratives are provided.
</P>
<UL>
<LI><A HREF="gfdoc/Multi.html">Multi</A>: main module for multimodal dialogue systems
</UL>

<H4>Mathematical</H4>
<UL>
<LI><A HREF="gfdoc/Mathematical.html">Mathematical</A>: main module for mathematical language
<LI><A HREF="gfdoc/Predication.html">Predication</A>: predication with verbs, adjectives, etc.
<LI><A HREF="gfdoc/Symbol.html">Symbol</A>: symbols and numbers in text
</UL>

<H2>Using the library</H2>
<H3>The compiled version</H3>
<P>
The simplest way to get the library is to install the precompiled version
<A HREF="../../compiled.tgz"><CODE>lib/compiled.tgz</CODE></A>. Just do
</P>
<PRE>
cd GF/lib
tar xvfz compiled.tgz
</PRE>
<P>
There is no need to link application grammars to the source directories of the
library. Use one (or several) of the following packages instead:
</P>
<UL>
<LI><CODE>lib/alltenses</CODE>: the complete ground-API library with all forms
<LI><CODE>lib/present</CODE>: a pruned ground-API library with present tense only
<LI><CODE>lib/mathematical</CODE>: special-purpose API for mathematical applications
<LI><CODE>lib/multimodal</CODE>: the complete ground-API with demonstratives for
multimodal dialogue applications
</UL>

<H3>Linking applications to libraries</H3>
<P>
Typically, open one of
</P>
<UL>
<LI><CODE>GrammarX</CODE> for just syntax
<LI><CODE>LangX</CODE> for both syntax and a small lexicon
<LI><CODE>X</CODE> (e.g. <CODE>English</CODE>) for syntax, lexicon, and language-dependent extensions
</UL>

<P>
Usually you also need your own lexicon, and hence have to open
</P>
<UL>
<LI><CODE>ParadigmsX</CODE> for lexicon-building functions
</UL>

<P>
It is advisable to use the bare package names in paths pointing to the
libraries. Here is an example, from <CODE>examples/dialogue/LightsEng.gf</CODE>:
</P>
<PRE>
--# -path=.:alltenses:multimodal:prelude
</PRE>
<P>
To reach these directories from anywhere, set the environment variable
<CODE>GF_LIB_PATH</CODE> to point to the directory <CODE>GF/lib/</CODE>. For instance,
I have the following line in my <CODE>.bashrc</CODE> file:
</P>
<PRE>
export GF_LIB_PATH=/home/aarne/GF/lib
</PRE>
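<P>
Putting these pieces together, an application grammar might look like the following
sketch. The abstract syntax <CODE>Animals</CODE> and its functions are hypothetical,
and <CODE>regN</CODE> is assumed to be the regular-noun paradigm exported by
<CODE>ParadigmsEng</CODE>; see the <CODE>ParadigmsEng</CODE> documentation for the
functions actually available.
</P>
<PRE>
  --# -path=.:present:prelude

  -- Animals is a hypothetical abstract syntax with
  --   cat Animal ; fun Dog, Horse : Animal ;
  concrete AnimalsEng of Animals = open LangEng, ParadigmsEng in {

    lincat
      Animal = N ;             -- reuse the library's noun category

    lin
      Dog   = regN "dog" ;     -- inflection comes from the paradigm
      Horse = regN "horse" ;
  }
</PRE>
<P>
Because the path uses only bare package names, the same file compiles from any
directory once <CODE>GF_LIB_PATH</CODE> is set as described above.
</P>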
<P>
The <CODE>mathematical</CODE> API shares modules with
<CODE>present</CODE>. It is therefore not a good idea to use it in combination with
<CODE>alltenses</CODE>.
</P>
<H3>Using the libraries as top-level grammars</H3>
<P>
If you have done <CODE>make</CODE> in <CODE>lib/resource-1.0</CODE>, you will have
a file <CODE>langs.gfcm</CODE>. This file can be used with fast startup for
tasks such as treebank generation:
</P>
<PRE>
> i -nocf langs.gfcm
> gr -cat=S -cf -number=10 | tb
</PRE>
<P>
The <CODE>-nocf</CODE> flag saves startup time and memory by preventing the
creation of context-free parse grammars.
The resource grammar libraries do <I>not</I> support
parsing very well. While it is theoretically possible to parse with any
GF grammar, the resource grammars are so abstract and complex that
building the actual parser in memory may need too many resources
to succeed.
</P>
<P>
An exception is <CODE>LangEng</CODE>. It is actually feasible to parse with
both <CODE>alltenses/LangEng</CODE> and <CODE>present/LangEng</CODE>, the latter being
much faster than the former. The <CODE>-fcfg</CODE> flag (fast multiple context-free grammar)
must be used:
</P>
<PRE>
p -lang=LangEng -fcfg "this man is old"
</PRE>
<P>
Parsing with the <CODE>-fcfg</CODE> flag takes a few extra seconds the first time in
each session, but is faster on later runs. From GF 2.6 on, <CODE>fcfg</CODE> is the
default parser of GF and the flag is not needed.
</P>
<P>
It is also possible to parse in Scandinavian languages
(Danish, Norwegian, Swedish) and, with enough memory (<CODE>gf +RTS -K512M</CODE>),
German.
</P>
<H2>Example applications</H2>
<P>
These applications are meant to serve as starting points for
new applications, showing how the libraries can be used in
typical situations.
</P>
<H3>Bronzeage</H3>
<P>
The <A HREF="../../../examples/bronzeage">examples/bronzeage</A>
grammar set implements a language fragment
based on the Swadesh list of 200 words. It is useful for
things like language training.
</P>
<H3>Dialogue</H3>
<P>
The <A HREF="../../../examples/dialogue">examples/dialogue</A>
grammar set implements the user grammars of a
multimodal dialogue system.
Its purpose is to serve as a prototype for applications in the
TALK project.
</P>
<H3>Animals</H3>
<P>
The <A HREF="../../../examples/animal">examples/animal</A>
grammar set implements some queries about animals.
Its purpose is to serve as a prototype for example-based
grammar writing.
</P>
<H2>Known bugs and missing components</H2>
<P>
Danish
</P>
<UL>
<LI>the lexicon and chosen inflections are only partially verified
</UL>

<P>
English
</P>
<P>
Finnish
</P>
<UL>
<LI>wrong cases in some passive constructions
</UL>

<P>
French
</P>
<UL>
<LI>multiple clitics (with V3) not always right
<LI>third person pronominal questions with inverted word order
have wrong forms if "t" is required
(e.g. "comment fera-t-il" becomes "comment fera il")
</UL>

<P>
German
</P>
<P>
Italian
</P>
<UL>
<LI>multiple clitics (with V3) not always right
</UL>

<P>
Norwegian
</P>
<UL>
<LI>the lexicon and chosen inflections are only partially verified
</UL>

<P>
Russian
</P>
<UL>
<LI>some functions missing
<LI>some regular paradigms are missing
</UL>

<P>
Spanish
</P>
<UL>
<LI>multiple clitics (with V3) not always right
<LI>missing contractions with imperatives and clitics
</UL>

<P>
Swedish
</P>
<H2>More reading</H2>
<P>
<A HREF="../../../doc/resource.pdf">GF Resource Grammar Library</A> (pdf).
Printable user manual with API documentation (version 1.0).
</P>
<P>
<A HREF="gslt-sem-2006.html">Grammars as Software Libraries</A>. Slides
with background and motivation for the resource grammar library.
</P>
<P>
<A HREF="clt2006.html">GF Resource Grammar Library Version 1.0</A>. Slides
giving an overview of the library and practical hints on its use.
</P>
<P>
<A HREF="Resource-HOWTO.html">How to write resource grammars</A>. Helps you
start if you want to add another language to the library.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/geocal2006.pdf">Parametrized modules for Romance languages</A>.
Slides explaining some ideas in the implementation of
French, Italian, and Spanish.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/slides/webalt-2005.pdf">Grammar writing by examples</A>.
Slides showing how the method is used.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/slides/talk-edin2005.pdf">Multimodal Resource Grammars</A>.
Slides showing how to use the multimodal resource library. N.B. the library
examples are from <CODE>multimodal/old</CODE>, which is a reduced-size API.
</P>

<!-- html code generated by txt2tags 2.4 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags index.txt -->
</BODY></HTML>