<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>GF Resource Grammar Library v. 1.1</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>GF Resource Grammar Library v. 1.1</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Thu Apr 19 23:35:29 2007
</FONT></CENTER>
<P>
The GF Resource Grammar Library defines the basic grammar of
ten languages:
Danish, English, Finnish, French, German,
Italian, Norwegian, Russian, Spanish, Swedish.
A still incomplete implementation for Arabic is also
included.
</P>
<P>
<B>New in Version 1.1</B>
</P>
<UL>
<LI>Simpler APIs using overloading:
<UL>
<LI><A HREF="gfdoc/Constructors.html">Constructors</A>: almost all trees in a category <CODE>C</CODE>
can be built by the function <CODE>mkC</CODE>.
<LI><A HREF="gfdoc/Combinators.html">Combinators</A>: cross-cut grammatical functions:
predication, application, modification, coordination.
<LI><A HREF="gfdoc/Symbolic.html">Symbolic</A>: noun phrases with mathematical symbols.
</UL>
</UL>
<P>
An example of use is <A HREF="../../../examples/logic"><CODE>logic</CODE></A>.
The API of version 1.0 remains valid and can be used in combination with this.
</P>
<UL>
<LI>Some new functions.
<LI>Bug fixes.
</UL>
<H2>Authors</H2>
<P>
Inger Andersson and Therese Söderberg (Spanish morphology),
Nicolas Barth and Sylvain Pogodalla (French verb list),
Ali El Dada (Arabic modules),
Janna Khegai (Russian modules),
Björn Bringert (many Swadesh lexica),
Carlos Gonzalía (Spanish cardinals),
Harald Hammarström (German morphology),
Patrik Jansson (Swedish cardinals),
Andreas Priesnitz (German lexicon),
Aarne Ranta.
</P>
<P>
We are grateful to several other people who have used this and
previous versions of the resource library for their contributions
and comments, including
Ludmilla Bogavac,
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Robin Cooper,
Hans-Joachim Daniels,
Elisabet Engdahl,
Markus Forsberg,
Kristofer Johannisson,
Anni Laine,
Peter Ljunglöf,
Saara Myllyntausta,
Wanjiku Ng'ang'a,
Jordi Saludes.
</P>
<H2>License</H2>
<P>
The GF Resource Grammar Library is open-source software licensed under
the GNU General Public License. See the file <A HREF="../LICENSE">LICENSE</A> for more
details.
</P>
<H2>Scope</H2>
<P>
Coverage, for each language:
</P>
<UL>
<LI>complete morphology
<LI>lexicon of the ca. 100 most important structural words
<LI>test lexicon of ca. 300 content words (rough equivalents in each language)
<LI>list of irregular verbs (separately for each language)
<LI>representative fragment of syntax (cf. CLE (Core Language Engine))
<LI>rather flat semantics (cf. Quasi-Logical Form of CLE)
</UL>
<P>
Organization:
</P>
<UL>
<LI>top-level (API) modules
<LI>Ground API + special-purpose APIs
<LI>"school grammar" concepts rather than advanced linguistic theory
</UL>
<P>
Presentation:
</P>
<UL>
<LI>tool <CODE>gfdoc</CODE> for generating HTML from grammars
<LI>example collections
</UL>
<H2>Quick start</H2>
<P>
Go to the main directory, compile the grammars, and run a test.
</P>
<PRE>
cd GF/lib/resource-1.0
make
make test
</PRE>
<P>
This will take quite some time. An alternative is to use the
precompiled grammar package <A HREF="../../compiled.tgz"><CODE>compiled.tgz</CODE></A>.
This package has the necessary <CODE>gfc</CODE> and <CODE>gfr</CODE> files directly under <CODE>GF/lib</CODE>.
</P>
<PRE>
GF/lib/alltenses
GF/lib/mathematical
GF/lib/multimodal
GF/lib/present
</PRE>
<P>
Do for instance
</P>
<PRE>
cd GF/lib/
gf
&gt; i -path=present:prelude present/LangEng.gfc
&gt; gr -cat=S -number=3 -cf | tb
</PRE>
<P>
For more examples, see the <A HREF="clt2006.html">Overview slides</A>.
The <CODE>make</CODE> procedure does not make Arabic, but it can
be compiled in the same way as the other languages.
</P>
<H2>Encoding</H2>
<P>
Finnish, German, Romance, and Scandinavian languages are in ISO Latin-1 (ISO-8859-1).
</P>
<P>
Arabic and Russian are in UTF-8.
</P>
<P>
English is in pure ASCII.
</P>
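<P>
If an application needs all of its text in one encoding, the ISO Latin-1
material can be converted to UTF-8. The following is only a sketch, assuming
the standard <CODE>iconv</CODE> utility is installed; the function name
<CODE>latin1_to_utf8</CODE> is hypothetical and not part of GF.
</P>

```shell
# Hypothetical helper, not part of GF: convert ISO Latin-1 text on stdin
# to UTF-8 on stdout, using the standard iconv utility.
latin1_to_utf8() {
  iconv -f ISO-8859-1 -t UTF-8
}

# Example: the Latin-1 byte 0xE4 (a with umlaut) becomes the two UTF-8
# bytes C3 A4; od prints the resulting bytes in hexadecimal.
printf '\344\n' | latin1_to_utf8 | od -An -tx1
```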
<H3>The language-independent ground API</H3>
<P>
This API is accessible by both <CODE>present</CODE> and <CODE>alltenses</CODE>.
The API is divided into a number of <CODE>abstract</CODE> modules.
The following figure gives the dependencies of these modules.
</P>
<P>
<IMG ALIGN="left" SRC="Grammar.png" BORDER="0" ALT="">
</P>
<P>
The documentation of the individual modules:
</P>
<UL>
<LI><A HREF="gfdoc/Common.html">Common</A>: abstract notions with language-independent implementations
<LI><A HREF="gfdoc/Cat.html">Cat</A>: the category system
<LI><A HREF="gfdoc/Noun.html">Noun</A>: construction of nouns and noun phrases
<LI><A HREF="gfdoc/Adjective.html">Adjective</A>: construction of adjectival phrases
<LI><A HREF="gfdoc/Verb.html">Verb</A>: construction of verb phrases
<LI><A HREF="gfdoc/Adverb.html">Adverb</A>: construction of adverbial phrases
<LI><A HREF="gfdoc/Numeral.html">Numeral</A>: construction of cardinal and ordinal numerals
<LI><A HREF="gfdoc/Sentence.html">Sentence</A>: construction of sentences and imperatives
<LI><A HREF="gfdoc/Question.html">Question</A>: construction of questions
<LI><A HREF="gfdoc/Relative.html">Relative</A>: construction of relative clauses
<LI><A HREF="gfdoc/Conjunction.html">Conjunction</A>: coordination of phrases
<LI><A HREF="gfdoc/Phrase.html">Phrase</A>: construction of the major units of text and speech
<LI><A HREF="gfdoc/Text.html">Text</A>: construction of texts from phrases, using punctuation
<LI><A HREF="gfdoc/Idiom.html">Idiom</A>: idiomatic phrases, such as existentials
<LI><A HREF="gfdoc/Structural.html">Structural</A>: a lexicon of structural words
<LI><A HREF="gfdoc/Lexicon.html">Lexicon</A>: a lexicon of other common words, for test purposes
<LI><A HREF="gfdoc/Grammar.html">Grammar</A>: the main module comprising all but <CODE>Lexicon</CODE>
<LI><A HREF="gfdoc/Lang.html">Lang</A>: the main module comprising both <CODE>Grammar</CODE> and <CODE>Lexicon</CODE>
</UL>
<H3>The language-dependent APIs</H3>
<UL>
<LI><A HREF="gfdoc/ParadigmsDan.html">ParadigmsDan</A>: Danish lexical paradigms
<LI><A HREF="gfdoc/ParadigmsEng.html">ParadigmsEng</A>: English lexical paradigms
<LI><A HREF="gfdoc/ParadigmsFin.html">ParadigmsFin</A>: Finnish lexical paradigms
<LI><A HREF="gfdoc/ParadigmsFre.html">ParadigmsFre</A>: French lexical paradigms
<LI><A HREF="gfdoc/ParadigmsIta.html">ParadigmsIta</A>: Italian lexical paradigms
<LI><A HREF="gfdoc/ParadigmsGer.html">ParadigmsGer</A>: German lexical paradigms
<LI><A HREF="gfdoc/ParadigmsNor.html">ParadigmsNor</A>: Norwegian lexical paradigms
<LI><A HREF="gfdoc/ParadigmsRus.html">ParadigmsRus</A>: Russian lexical paradigms
<LI><A HREF="gfdoc/ParadigmsSpa.html">ParadigmsSpa</A>: Spanish lexical paradigms
<LI><A HREF="gfdoc/ParadigmsSwe.html">ParadigmsSwe</A>: Swedish lexical paradigms
</UL>
<UL>
<LI><A HREF="../danish/IrregDan.gf">IrregDan</A>: Danish irregular verbs (very incomplete)
<LI><A HREF="../english/IrregEng.gf">IrregEng</A>: English irregular verbs
<LI><A HREF="../french/IrregFre.gf">IrregFre</A>: French irregular verbs
<LI><A HREF="../german/IrregGer.gf">IrregGer</A>: German irregular verbs
<LI><A HREF="../norwegian/IrregNor.gf">IrregNor</A>: Norwegian irregular verbs (very incomplete)
<LI><A HREF="../spanish/IrregSpa.gf">IrregSpa</A>: Spanish irregular verbs
<LI><A HREF="../swedish/IrregSwe.gf">IrregSwe</A>: Swedish irregular verbs
</UL>
<P>
This is the structure of each language-dependent top module.
</P>
<P>
<IMG ALIGN="middle" SRC="English.png" BORDER="0" ALT="">
</P>
<UL>
<LI><A HREF="../abstract/Extra.gf">Extra</A>: extra constructs implemented in some languages
<LI><A HREF="../danish/ExtraDanAbs.gf">ExtraDan</A>: extra constructs in Danish
<LI><A HREF="../english/ExtraEngAbs.gf">ExtraEng</A>: extra constructs in English
<LI><A HREF="../finnish/ExtraFinAbs.gf">ExtraFin</A>: extra constructs in Finnish
<LI><A HREF="../french/ExtraFreAbs.gf">ExtraFre</A>: extra constructs in French
<LI><A HREF="../italian/ExtraItaAbs.gf">ExtraIta</A>: extra constructs in Italian
<LI><A HREF="../norwegian/ExtraNorAbs.gf">ExtraNor</A>: extra constructs in Norwegian
<LI><A HREF="../russian/ExtraRusAbs.gf">ExtraRus</A>: extra constructs in Russian
<LI><A HREF="../scandinavian/ExtraScandAbs.gf">ExtraScand</A>: extra constructs in Scandinavian
<LI><A HREF="../spanish/ExtraSpaAbs.gf">ExtraSpa</A>: extra constructs in Spanish
<LI><A HREF="../swedish/ExtraSweAbs.gf">ExtraSwe</A>: extra constructs in Swedish
</UL>
<UL>
<LI><A HREF="../danish/DanishAbs.gf">Danish</A>: Danish with all extras
<LI><A HREF="../english/EnglishAbs.gf">English</A>: English with all extras
<LI><A HREF="../finnish/FinnishAbs.gf">Finnish</A>: Finnish with all extras
<LI><A HREF="../french/FrenchAbs.gf">French</A>: French with all extras
<LI><A HREF="../german/GermanAbs.gf">German</A>: German with all extras
<LI><A HREF="../italian/ItalianAbs.gf">Italian</A>: Italian with all extras
<LI><A HREF="../norwegian/NorwegianAbs.gf">Norwegian</A>: Norwegian with all extras
<LI><A HREF="../russian/RussianAbs.gf">Russian</A>: Russian with all extras
<LI><A HREF="../spanish/SpanishAbs.gf">Spanish</A>: Spanish with all extras
<LI><A HREF="../swedish/SwedishAbs.gf">Swedish</A>: Swedish with all extras
</UL>
<H3>Special-purpose APIs</H3>
<H4>Present</H4>
<P>
The API is the same as for the full ground API, but the compiler
has ignored all verb and sentence tenses except the present.
Lines ignored in the source files are marked by <CODE>--# notpresent</CODE>.
The result is a smaller and more efficient grammar, which is still
sufficient for many applications.
</P>
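<P>
The effect of the marker can be pictured as dropping every marked line from
a source file. The sketch below only mimics that effect with <CODE>grep</CODE>;
it is not the library's actual build step, the function name
<CODE>strip_notpresent</CODE> is hypothetical, and the sample lines are made up
for illustration.
</P>

```shell
# Illustration only: drop every line carrying the "--# notpresent" marker.
# The library's own build does the real pruning; this just mimics the effect.
strip_notpresent() {
  grep -v -e '--# notpresent'
}

# Made-up sample lines, not copied from the library sources.
# Only the unmarked first line survives the filter.
printf '%s\n' \
  'TPres : Tense ;' \
  'TPast : Tense ; --# notpresent' \
  'TFut  : Tense ; --# notpresent' | strip_notpresent
```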
<H4>Multimodal</H4>
<P>
The API is the same as for the full ground API, but with modified
linearization types of <CODE>NP</CODE> and <CODE>Adv</CODE>, and all other categories
depending on them: an extra field is added for a demonstrative pointing
gesture. Some functions for constructing demonstratives are provided.
</P>
<UL>
<LI><A HREF="gfdoc/Multi.html">Multi</A>: main module for multimodal dialogue systems
</UL>
<H4>Mathematical</H4>
<UL>
<LI><A HREF="gfdoc/Mathematical.html">Mathematical</A>: main module for mathematical language
<LI><A HREF="gfdoc/Predication.html">Predication</A>: predication with verbs, adjectives, etc.
<LI><A HREF="gfdoc/Symbol.html">Symbol</A>: symbols and numbers in text
</UL>
<H2>Using the library</H2>
<H3>The compiled version</H3>
<P>
The simplest way to get the library is to install the precompiled version
<A HREF="../../compiled.tgz"><CODE>lib/compiled.tgz</CODE></A>. Just do
</P>
<PRE>
cd GF/lib
tar xvfz compiled.tgz
</PRE>
<P>
There is no need to link application grammars to the source directories of the
library. Use one (or several) of the following packages instead:
</P>
<UL>
<LI><CODE>lib/alltenses</CODE>: the complete ground-API library with all forms
<LI><CODE>lib/present</CODE>: a pruned ground-API library with present tense only
<LI><CODE>lib/mathematical</CODE>: special-purpose API for mathematical applications
<LI><CODE>lib/multimodal</CODE>: the complete ground-API with demonstratives for
multimodal dialogue applications
</UL>
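<P>
After unpacking, it can be useful to confirm that the four package
directories are in place. The following is a minimal sketch; the function name
<CODE>check_gf_packages</CODE> is hypothetical, and the default root
<CODE>GF/lib</CODE> is an assumption about where the archive was unpacked.
</P>

```shell
# Hypothetical helper: report which of the four precompiled packages
# exist as directories under the given library root (default: GF/lib).
check_gf_packages() {
  root="${1:-GF/lib}"
  for pkg in alltenses present mathematical multimodal; do
    if [ -d "$root/$pkg" ]; then
      echo "$pkg: found"
    else
      echo "$pkg: missing"
    fi
  done
}

# Prints one line per package, e.g. when run from the parent of GF/.
check_gf_packages GF/lib
```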
<H3>Linking applications to libraries</H3>
<P>
Typically, open one of
</P>
<UL>
<LI><CODE>GrammarX</CODE> for just syntax
<LI><CODE>LangX</CODE> for both syntax and a small lexicon
<LI><CODE>X</CODE> (e.g. <CODE>English</CODE>) for syntax, lexicon, and language-dependent extensions
</UL>
<P>
Usually you also need your own lexicon, and hence have to open
</P>
<UL>
<LI><CODE>ParadigmsX</CODE> for lexicon-building functions
</UL>
<P>
It is advisable to use the bare package names in paths pointing to the
libraries. Here is an example, from <CODE>examples/dialogue/LightsEng.gf</CODE>:
</P>
<PRE>
--# -path=.:alltenses:multimodal:prelude
</PRE>
<P>
To reach these directories from anywhere, set the environment variable
<CODE>GF_LIB_PATH</CODE> to point to the directory <CODE>GF/lib/</CODE>. For instance,
I have the following line in my <CODE>.bashrc</CODE> file:
</P>
<PRE>
export GF_LIB_PATH=/home/aarne/GF/lib
</PRE>
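<P>
To picture how bare package names combine with the library root, the sketch
below prints the directory that each entry of a <CODE>-path</CODE> value would
name under a given root. It is an illustration only, not GF's actual lookup
code; the function name <CODE>expand_gf_path</CODE> is hypothetical, and the
fallback root is simply the example value shown above.
</P>

```shell
# Illustration only (not GF's actual lookup code): for each entry of a
# colon-separated -path value, print the directory it would name under
# the library root given as the second argument. "." is left as it is.
expand_gf_path() {
  echo "$1" | tr ':' '\n' | while read -r entry; do
    case "$entry" in
      .) echo "." ;;
      *) echo "$2/$entry" ;;
    esac
  done
}

# Uses GF_LIB_PATH if it is set, otherwise the example value from the text.
expand_gf_path ".:alltenses:multimodal:prelude" "${GF_LIB_PATH:-/home/aarne/GF/lib}"
```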
<P></P>
<P>
The <CODE>mathematical</CODE> API shares modules with
<CODE>present</CODE>. It is therefore not a good idea to use it in combination with
<CODE>alltenses</CODE>.
</P>
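<P>
A build script could guard against this combination before starting GF.
The check below is a sketch under that assumption; the function name
<CODE>path_mixes_math_and_alltenses</CODE> is hypothetical and nothing like it
ships with the library.
</P>

```shell
# Hypothetical check: succeed (exit status 0) only when a colon-separated
# -path value names both mathematical and alltenses.
path_mixes_math_and_alltenses() {
  case ":$1:" in
    *:mathematical:*) ;;
    *) return 1 ;;
  esac
  case ":$1:" in
    *:alltenses:*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_mixes_math_and_alltenses ".:mathematical:alltenses:prelude"; then
  echo "warning: mathematical shares modules with present; avoid alltenses"
fi
```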
<H3>Using the libraries as top-level grammars</H3>
<P>
If you have done <CODE>make</CODE> in <CODE>lib/resource-1.0</CODE>, you will have
a file <CODE>langs.gfcm</CODE>. This file can be used with fast startup for
tasks such as treebank generation:
</P>
<PRE>
&gt; i -nocf langs.gfcm
&gt; gr -cat=S -cf -number=10 | tb
</PRE>
<P>
The <CODE>-nocf</CODE> flag saves startup time and memory by preventing the
creation of context-free parse grammars.
The resource grammar libraries do <I>not</I> support
parsing very well. While it is theoretically possible to parse with any
GF grammar, the resource grammars are so abstract and complex that
building the actual parser in memory may just need too many resources
to succeed.
</P>
<P>
An exception is <CODE>LangEng</CODE>. It is actually feasible to parse with
both <CODE>alltenses/LangEng</CODE> and <CODE>present/LangEng</CODE>, the latter being
much faster than the former. The <CODE>-fcfg</CODE> flag (fast multiple context-free grammar)
must be used:
</P>
<PRE>
p -lang=LangEng -fcfg "this man is old"
</PRE>
<P>
Parsing with the <CODE>-fcfg</CODE> flag takes a few extra seconds the first time in
each session, but is faster in later runs. From GF 2.6 onwards, <CODE>fcfg</CODE> is the
default parser of GF and the flag is not needed.
</P>
<P>
It is also possible to parse in Scandinavian languages
(Danish, Norwegian, Swedish) and, with enough memory (<CODE>gf +RTS -K512M</CODE>),
German.
</P>
<H2>Example applications</H2>
<P>
These applications are meant to serve as starting points for
new applications, showing how the libraries can be used in
typical situations.
</P>
<H3>Bronzeage</H3>
<P>
The <A HREF="../../../examples/bronzeage">examples/bronzeage</A>
grammar set implements a language fragment
based on the Swadesh list of 200 words. It is useful for
things like language training.
</P>
<H3>Dialogue</H3>
<P>
The <A HREF="../../../examples/dialogue">examples/dialogue</A>
grammar set implements the user grammars of some
multimodal dialogue system.
Its purpose is to serve as a prototype for applications in the
TALK project.
</P>
<H3>Animals</H3>
<P>
The <A HREF="../../../examples/animal">examples/animal</A>
grammar set implements some queries about animals.
Its purpose is to serve as a prototype for example-based
grammar writing.
</P>
<H2>Known bugs and missing components</H2>
<P>
Danish
</P>
<UL>
<LI>the lexicon and chosen inflections are only partially verified
</UL>
<P>
English
</P>
<P>
Finnish
</P>
<UL>
<LI>wrong cases in some passive constructions
</UL>
<P>
French
</P>
<UL>
<LI>multiple clitics (with V3) not always right
<LI>third person pronominal questions with inverted word order
have wrong forms if "t" is required
(e.g. "comment fera-t-il" becomes "comment fera il")
</UL>
<P>
German
</P>
<P>
Italian
</P>
<UL>
<LI>multiple clitics (with V3) not always right
</UL>
<P>
Norwegian
</P>
<UL>
<LI>the lexicon and chosen inflections are only partially verified
</UL>
<P>
Russian
</P>
<UL>
<LI>some functions missing
<LI>some regular paradigms are missing
</UL>
<P>
Spanish
</P>
<UL>
<LI>multiple clitics (with V3) not always right
<LI>missing contractions with imperatives and clitics
</UL>
<P>
Swedish
</P>
<H2>More reading</H2>
<P>
<A HREF="../../../doc/resource.pdf">GF Resource Grammar Library</A> (pdf).
Printable user manual with API documentation (version 1.0).
</P>
<P>
<A HREF="gslt-sem-2006.html">Grammars as Software Libraries</A>. Slides
with background and motivation for the resource grammar library.
</P>
<P>
<A HREF="clt2006.html">GF Resource Grammar Library Version 1.0</A>. Slides
giving an overview of the library and practical hints on its use.
</P>
<P>
<A HREF="Resource-HOWTO.html">How to write resource grammars</A>. Helps you
start if you want to add another language to the library.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/geocal2006.pdf">Parametrized modules for Romance languages</A>.
Slides explaining some ideas in the implementation of
French, Italian, and Spanish.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/slides/webalt-2005.pdf">Grammar writing by examples</A>.
Slides showing how the method is used.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/slides/talk-edin2005.pdf">Multimodal Resource Grammars</A>.
Slides showing how to use the multimodal resource library. N.B. the library
examples are from <CODE>multimodal/old</CODE>, which is a reduced-size API.
</P>
<!-- html code generated by txt2tags 2.4 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags index.txt -->
</BODY></HTML>