<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>GF Resource Grammar Library v. 1.0</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>GF Resource Grammar Library v. 1.0</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Thu Mar 2 12:03:59 2006
</FONT></CENTER>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">Authors</A>
<LI><A HREF="#toc2">License</A>
<LI><A HREF="#toc3">Scope</A>
<UL>
<LI><A HREF="#toc4">The language-independent ground API</A>
<LI><A HREF="#toc5">The language-dependent APIs</A>
<LI><A HREF="#toc6">Special-purpose APIs</A>
</UL>
<LI><A HREF="#toc7">Using the library</A>
<UL>
<LI><A HREF="#toc8">The compiled version</A>
<LI><A HREF="#toc9">Linking applications to libraries</A>
<LI><A HREF="#toc10">Using the libraries as top-level grammars</A>
</UL>
<LI><A HREF="#toc11">Example applications</A>
<UL>
<LI><A HREF="#toc12">Bronzeage</A>
<LI><A HREF="#toc13">Tram</A>
<LI><A HREF="#toc14">Animals</A>
</UL>
<LI><A HREF="#toc15">More reading</A>
</UL>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
The GF Resource Grammar Library defines the basic grammar of
ten languages:
Danish, English, Finnish, French, German,
Italian, Norwegian, Russian, Spanish, Swedish.
</P>
<P>
<B>Notice</B>. This document concerns the API v. 1.0, which has not
yet been "officially" released. The release will be made in combination
with a new version of GF itself, since the grammars use new features
not available in GF 2.4.
</P>
<P>
V. 1.0 is not yet available for Russian and Danish: for them,
we refer to <A HREF="../../resource/">v. 0.9</A>.
</P>
<A NAME="toc1"></A>
<H2>Authors</H2>
<P>
Janna Khegai (Russian modules, forthcoming),
Bjorn Bringert (many Swadesh lexica),
Carlos Gonzalia (Spanish cardinals),
Patrik Jansson (Swedish cardinals),
Aarne Ranta.
</P>
<P>
We are grateful to several other people who have used this and
the previous versions of the resource library for their
contributions and comments, including
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Hans-Joachim Daniels,
Kristofer Johannisson,
Anni Laine,
Wanjiku Ng'ang'a,
Jordi Saludes.
</P>
<A NAME="toc2"></A>
<H2>License</H2>
<P>
The GF Resource Grammar Library is open-source software licensed under
the GNU General Public License. See the file <A HREF="../LICENSE">LICENSE</A> for more
details.
</P>
<A NAME="toc3"></A>
<H2>Scope</H2>
<P>
Coverage, for each language:
</P>
<UL>
<LI>complete morphology
<LI>lexicon of the ca. 100 most important structural words
<LI>test lexicon of ca. 300 content words
<LI>representative fragment of syntax (cf. CLE (Core Language Engine))
<LI>rather flat semantics (cf. Quasi-Logical Form of CLE)
</UL>

<P>
Organization:
</P>
<UL>
<LI>top-level (API) modules
<LI>Ground API + special-purpose APIs
<LI>"school grammar" concepts rather than advanced linguistic theory
</UL>

<P>
Presentation:
</P>
<UL>
<LI>tool <CODE>gfdoc</CODE> for generating HTML from grammars
<LI>example collections
</UL>

<A NAME="toc4"></A>
<H3>The language-independent ground API</H3>
<P>
This API is accessible through both <CODE>present</CODE> and <CODE>alltenses</CODE>.
The API is divided into a number of <CODE>abstract</CODE> modules.
The following figure gives the dependencies of these modules.
</P>
<P>
<IMG ALIGN="left" SRC="Lang.png" BORDER="0" ALT="">
</P>
<P>
The documentation of the individual modules:
</P>
<UL>
<LI><A HREF="gfdoc/Common.html">Common</A>: abstract notions with language-independent implementations
<LI><A HREF="gfdoc/Cat.html">Cat</A>: the category system
<LI><A HREF="gfdoc/Noun.html">Noun</A>: construction of nouns and noun phrases
<LI><A HREF="gfdoc/Adjective.html">Adjective</A>: construction of adjectival phrases
<LI><A HREF="gfdoc/Verb.html">Verb</A>: construction of verb phrases
<LI><A HREF="gfdoc/Adverb.html">Adverb</A>: construction of adverbial phrases
<LI><A HREF="gfdoc/Numeral.html">Numeral</A>: construction of cardinal and ordinal numerals
<LI><A HREF="gfdoc/Sentence.html">Sentence</A>: construction of sentences and imperatives
<LI><A HREF="gfdoc/Question.html">Question</A>: construction of questions
<LI><A HREF="gfdoc/Relative.html">Relative</A>: construction of relative clauses
<LI><A HREF="gfdoc/Conjunction.html">Conjunction</A>: coordination of phrases
<LI><A HREF="gfdoc/Phrase.html">Phrase</A>: construction of the major units of text and speech
<LI><A HREF="gfdoc/Text.html">Text</A>: construction of texts from phrases, using punctuation
<LI><A HREF="gfdoc/Idiom.html">Idiom</A>: idiomatic phrases, such as existentials
<LI><A HREF="gfdoc/Structural.html">Structural</A>: a lexicon of structural words
<LI><A HREF="gfdoc/Lexicon.html">Lexicon</A>: a lexicon of other common words, for test purposes
<LI><A HREF="gfdoc/Lang.html">Lang</A>: the main module comprising all the others
</UL>

<A NAME="toc5"></A>
<H3>The language-dependent APIs</H3>
<UL>
<LI><A HREF="gfdoc/ParadigmsEng.html">ParadigmsEng</A>: English lexical paradigms
<LI><A HREF="gfdoc/ParadigmsFin.html">ParadigmsFin</A>: Finnish lexical paradigms
<LI><A HREF="gfdoc/ParadigmsFre.html">ParadigmsFre</A>: French lexical paradigms
<LI><A HREF="gfdoc/ParadigmsIta.html">ParadigmsIta</A>: Italian lexical paradigms
<LI><A HREF="gfdoc/ParadigmsGer.html">ParadigmsGer</A>: German lexical paradigms
<LI><A HREF="gfdoc/ParadigmsNor.html">ParadigmsNor</A>: Norwegian lexical paradigms
<LI><A HREF="gfdoc/ParadigmsSpa.html">ParadigmsSpa</A>: Spanish lexical paradigms
<LI><A HREF="gfdoc/ParadigmsSwe.html">ParadigmsSwe</A>: Swedish lexical paradigms
</UL>

<UL>
<LI><A HREF="gfdoc/IrregEng.gf">IrregEng</A>: English irregular verbs
<LI><A HREF="gfdoc/IrregFre.gf">IrregFre</A>: French irregular verbs
<LI><A HREF="gfdoc/IrregNor.gf">IrregNor</A>: Norwegian irregular verbs
<LI><A HREF="gfdoc/IrregSwe.gf">IrregSwe</A>: Swedish irregular verbs
</UL>

<A NAME="toc6"></A>
<H3>Special-purpose APIs</H3>
<H4>Present</H4>
<P>
The API is the same as for the full ground API, but the compiler
has ignored all verb and sentence tenses except the present.
Lines ignored in the source files are marked by <CODE>--# notpresent</CODE>.
The result is a smaller and more efficient grammar, which is still
sufficient for many applications.
</P>
<H4>Multimodal</H4>
<UL>
<LI><A HREF="gfdoc/Multimodal.html">Multimodal</A>: main module for multimodal dialogue systems
<LI><A HREF="gfdoc/Demonstrative.html">Demonstrative</A>: demonstrative noun phrases and adverbs
</UL>

<H4>Mathematical</H4>
<UL>
<LI><A HREF="gfdoc/Mathematical.html">Mathematical</A>: main module for mathematical language
<LI><A HREF="gfdoc/Predication.html">Predication</A>: predication with verbs, adjectives, etc.
<LI><A HREF="gfdoc/Symbol.html">Symbol</A>: symbols and numbers in text
</UL>

<A NAME="toc7"></A>
<H2>Using the library</H2>
<A NAME="toc8"></A>
<H3>The compiled version</H3>
<P>
The simplest way to get the library is to install the precompiled version
<A HREF="../../compiled.tgz"><CODE>lib/compiled.tgz</CODE></A>. Just do
</P>
<PRE>
cd GF/lib
tar xvfz compiled.tgz
</PRE>
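<P>
Before unpacking, it can be useful to see which packages an archive contains.
The following is a minimal sketch and not part of the GF distribution: the
helper name <CODE>list_top_dirs</CODE> is hypothetical, and the sketch assumes
that the archive stores its packages as top-level directories without a
leading <CODE>./</CODE>.
</P>
<PRE>
# Hypothetical helper: print the top-level entries of a gzipped tar archive.
# Assumes member names look like "present/..." (no leading "./").
list_top_dirs() {
  # t = list members, z = gunzip, f = read from the named file
  tar tzf "$1" | cut -d/ -f1 | sort -u
}
</PRE>
<P>
For instance, <CODE>list_top_dirs compiled.tgz</CODE> prints one name per line.
</P>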
<P>
There is no need to link application grammars to the source directories of the
library. Use one (or several) of the following packages instead:
</P>
<UL>
<LI><CODE>lib/alltenses</CODE> the complete ground-API library with all forms
<LI><CODE>lib/present</CODE> a pruned ground-API library with present tense only
<LI><CODE>lib/mathematical</CODE> special-purpose API for mathematical applications
<LI><CODE>lib/multimodal</CODE> special-purpose API for multimodal dialogue applications
</UL>

<A NAME="toc9"></A>
<H3>Linking applications to libraries</H3>
<P>
Notice, however, that both special-purpose APIs share modules with
<CODE>present</CODE>. It is therefore not a good idea to use them in combination with
<CODE>alltenses</CODE>.
</P>
<P>
It is advisable to use the bare package names in paths pointing to the
libraries. Here is an example, from <CODE>examples/tram</CODE>:
</P>
<PRE>
--# -path=.:present:multimodal:mathematical:prelude
</PRE>
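<P>
The restriction on combining packages can be checked mechanically. The shell
function below is only a sketch and not part of GF; the name
<CODE>check_gf_path</CODE> is hypothetical. It takes a colon-separated path value
like the one above, assumes bare package names, and reports a conflict when
<CODE>alltenses</CODE> appears together with <CODE>multimodal</CODE> or
<CODE>mathematical</CODE>.
</P>
<PRE>
# Hypothetical helper: check a colon-separated GF path value for the
# combination advised against above (special-purpose API with alltenses).
check_gf_path() {
  # Wrap the value in colons so each package matches as a whole component.
  gf_path=":$1:"
  case "$gf_path" in
    *:alltenses:*)
      case "$gf_path" in
        *:multimodal:* | *:mathematical:*)
          echo "conflict: special-purpose API combined with alltenses"
          return 1
          ;;
      esac
      ;;
  esac
  echo "ok"
}
</PRE>
<P>
Called as <CODE>check_gf_path ".:present:multimodal:mathematical:prelude"</CODE>,
the function prints <CODE>ok</CODE>, since that path uses <CODE>present</CODE>.
</P>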
<P>
To reach these directories from anywhere, set the environment variable
<CODE>GF_LIB_PATH</CODE> to point to the directory <CODE>GF/lib/</CODE>. For instance,
I have the following line in my <CODE>.bashrc</CODE> file:
</P>
<PRE>
export GF_LIB_PATH=/home/aarne/GF/lib
</PRE>
<P></P>
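<P>
A quick way to confirm the setting is to test that the package directories
exist under the library directory. The following is a sketch under stated
assumptions, not a tool shipped with GF: the function name
<CODE>check_gf_lib</CODE> is hypothetical, and it assumes that the four packages
of the compiled version sit directly under the directory it is given.
</P>
<PRE>
# Hypothetical helper: report which of the compiled packages are present
# in the given library directory. Returns non-zero if any is missing.
check_gf_lib() {
  lib_dir="$1"
  if [ -z "$lib_dir" ]; then
    echo "no library directory given (is GF_LIB_PATH set?)"
    return 1
  fi
  missing=0
  for pkg in alltenses present mathematical multimodal; do
    if [ -d "$lib_dir/$pkg" ]; then
      echo "found: $pkg"
    else
      echo "missing: $pkg"
      missing=1
    fi
  done
  return "$missing"
}
</PRE>
<P>
It would be called as <CODE>check_gf_lib "$GF_LIB_PATH"</CODE> in a shell where
the variable has been exported.
</P>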
<A NAME="toc10"></A>
<H3>Using the libraries as top-level grammars</H3>
<P>
If you have done <CODE>make</CODE> in <CODE>lib/resource-1.0</CODE>, you will have
a file <CODE>langs.gfcm</CODE>. This file can be used with fast startup for
tasks such as treebank generation:
</P>
<PRE>
&gt; i -nocf langs.gfcm
&gt; gr -cat=S -cf -number=10 | tb
</PRE>
<P>
The <CODE>-nocf</CODE> flag saves startup time and memory by preventing the
creation of context-free parse grammars.
The resource grammar libraries do <I>not</I> support
parsing very well. While it is theoretically possible to parse with any
GF grammar, the resource grammars are so abstract and complex that
building the actual parser in memory may require too many resources
to succeed.
</P>
<P>
An exception is <CODE>LangEng</CODE>. It is actually feasible to parse with
both <CODE>alltenses/LangEng</CODE> and <CODE>present/LangEng</CODE>, the latter being
much faster than the former. The <CODE>-mcfg</CODE> flag (multiple context-free grammar)
must be used:
</P>
<PRE>
p -lang=LangEng -mcfg "this man is old"
</PRE>
<P>
Parsing with the <CODE>-mcfg</CODE> flag takes a few extra seconds the first time in
each session, but is faster on later runs.
</P>
<A NAME="toc11"></A>
<H2>Example applications</H2>
<P>
These applications are meant to serve as starting points for
new applications, showing how the libraries can be used in
typical situations.
</P>
<A NAME="toc12"></A>
<H3>Bronzeage</H3>
<P>
The <A HREF="../../../examples/bronzeage">examples/bronzeage</A>
grammar set implements a language fragment
based on the Swadesh list of 200 words. It is useful for
things like language training.
</P>
<A NAME="toc13"></A>
<H3>Tram</H3>
<P>
The <A HREF="../../../examples/tram">examples/tram</A>
grammar set implements the user grammar of a
multimodal dialogue system concerning public transport.
Its purpose is to serve as a prototype for applications in the
TALK project.
</P>
<A NAME="toc14"></A>
<H3>Animals</H3>
<P>
The <A HREF="../../../examples/animal">examples/animal</A>
grammar set implements some queries about animals.
Its purpose is to serve as a prototype for example-based
grammar writing.
</P>
<A NAME="toc15"></A>
<H2>More reading</H2>
<P>
<A HREF="gslt-sem-2006.html">Grammars as Software Libraries</A>. Slides
with background and motivation for the resource grammar library.
</P>
<P>
<A HREF="Resource-HOWTO.html">How to write resource grammars</A>. Helps you
start if you want to add another language to the library.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/geocal2006.pdf">Parametrized modules for Romance languages</A>.
Slides explaining some ideas in the implementation of
French, Italian, and Spanish.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/slides/webalt-2005.pdf">Grammar writing by examples</A>.
Slides showing how the method is used.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/slides/talk-edin2005.pdf">Multimodal Resource Grammars</A>.
Slides showing how to use the multimodal resource library.
</P>

<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -\-toc -thtml index.txt -->
</BODY></HTML>