index for new resource API
305
lib/resource-1.0/doc/index.html
Normal file
@@ -0,0 +1,305 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<TITLE>GF Resource Grammar Library v. 1.2</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>GF Resource Grammar Library v. 1.2</H1>
<FONT SIZE="4">
<I>Author: Aarne Ranta &lt;aarne (at) cs.chalmers.se&gt;</I><BR>
Last update: Wed Jul 4 23:00:32 2007
</FONT></CENTER>

<P>
The GF Resource Grammar Library defines the basic grammar of
ten languages:
Danish, English, Finnish, French, German,
Italian, Norwegian, Russian, Spanish, Swedish.
Incomplete implementations of Arabic and Catalan are also
included.
</P>
<P>
<B>New in Version 1.2</B>
</P>
<UL>
<LI>Simpler APIs using overloading: see <A HREF="synopsis.html">Synopsis</A>.
The API of version 1.0 remains valid and can be used in combination with this.
<LI>Bug fixes.
</UL>

<H2>Authors</H2>
<P>
Inger Andersson and Therese Soderberg (Spanish morphology),
Nicolas Barth and Sylvain Pogodalla (French verb list),
Ali El Dada (Arabic modules),
Janna Khegai (Russian modules),
Bjorn Bringert (many Swadesh lexica),
Carlos Gonzalía (Spanish cardinals),
Harald Hammarström (German morphology),
Patrik Jansson (Swedish cardinals),
Andreas Priesnitz (German lexicon),
Aarne Ranta,
Jordi Saludes (Catalan modules),
Henning Thielemann (German lexicon).
</P>
<P>
We are grateful for contributions and
comments to several other people who have used this and
the previous versions of the resource library, including
Ludmilla Bogavac,
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Robin Cooper,
Hans-Joachim Daniels,
Elisabet Engdahl,
Markus Forsberg,
Kristofer Johannisson,
Anni Laine,
Peter Ljunglöf,
Saara Myllyntausta,
Wanjiku Ng'ang'a,
Nadine Perera,
Jordi Saludes.
</P>
<H2>License</H2>
<P>
The GF Resource Grammar Library is open-source software licensed under
the GNU Lesser General Public License (LGPL). See the file <A HREF="../LICENSE">LICENSE</A> for more
details.
</P>
<H2>Scope</H2>
<P>
Coverage, for each language:
</P>
<UL>
<LI>complete morphology
<LI>lexicon of the ca. 100 most important structural words
<LI>test lexicon of ca. 300 content words (rough equivalents in each language)
<LI>list of irregular verbs (separately for each language)
<LI>representative fragment of syntax (cf. CLE (Core Language Engine))
<LI>rather flat semantics (cf. Quasi-Logical Form of CLE)
</UL>

<P>
Organization:
</P>
<UL>
<LI>top-level (API) modules
<LI>Ground API + special-purpose APIs
<LI>"school grammar" concepts rather than advanced linguistic theory
</UL>

<P>
Presentation:
</P>
<UL>
<LI>tool <CODE>gfdoc</CODE> for generating HTML from grammars
<LI>example collections
</UL>

<H2>Location</H2>
<P>
Assuming you have installed the libraries, you will find the precompiled
<CODE>gfc</CODE> and <CODE>gfr</CODE> files directly under <CODE>$GF_LIB_PATH</CODE>, whose default
value is <CODE>/usr/local/share/GF/</CODE>. The precompiled subdirectories are
</P>
<PRE>
alltenses
mathematical
multimodal
present
</PRE>
<P>
For instance, run
</P>
<PRE>
cd $GF_LIB_PATH
gf alltenses/langs.gfcm

> p -cat=S -lang=LangEng "this grammar is too big" | tb
</PRE>
<P>
For more details, see the <A HREF="synopsis.html">Synopsis</A>.
</P>
<H2>Compilation</H2>
<P>
If you want to compile the library from scratch, use <CODE>make</CODE>:
</P>
<PRE>
cd $GF_LIB_PATH/resource-1.0
make
</PRE>
<P>
The <CODE>make</CODE> procedure does not build Arabic and Catalan by default, but you
can uncomment the relevant lines in <CODE>Makefile</CODE> to compile them.
</P>
<H2>Encoding</H2>
<P>
Finnish, German, Romance, and Scandinavian languages are in ISO Latin-1 (ISO-8859-1).
</P>
<P>
Arabic and Russian are in UTF-8.
</P>
<P>
English is in pure ASCII.
</P>
<P>
Unfortunately, the different encodings make it hard to get
a nice view of all languages simultaneously. The easiest way to achieve this is
to use <CODE>gfeditor</CODE>, which automatically converts grammars to UTF-8.
</P>
<H2>Using the resource as library</H2>
<P>
The library API is accessible from both <CODE>present</CODE> and <CODE>alltenses</CODE>. The modules you most often need are
</P>
<UL>
<LI><CODE>Syntax</CODE>, the interface to syntactic structures
<LI><CODE>Syntax</CODE><I>L</I>, the implementations of <CODE>Syntax</CODE> for each language <I>L</I>
<LI><CODE>Paradigms</CODE><I>L</I>, the morphological paradigms for each language <I>L</I>
</UL>

<P>
The <A HREF="synopsis.html">Synopsis</A> gives examples of the typical usage of these
modules.
</P>
<H2>Using the resource as top level grammar</H2>
<P>
The following modules can be used for parsing and linearization. They are accessible from both
<CODE>present</CODE> and <CODE>alltenses</CODE>.
</P>
<UL>
<LI><CODE>Lang</CODE><I>L</I> for each language <I>L</I>, implementing a common abstract syntax <CODE>Lang</CODE>
<LI><CODE>Danish</CODE>, <CODE>English</CODE>, etc., implementing <CODE>Lang</CODE> with language-specific extensions
</UL>

<P>
In addition, both <CODE>present</CODE> and <CODE>alltenses</CODE> contain the file
</P>
<UL>
<LI><CODE>langs.gfcm</CODE>, a package with precompiled <CODE>Lang</CODE><I>L</I> grammars
</UL>

<P>
A way to test and view the resource grammar is to load <CODE>langs.gfcm</CODE> either into <CODE>gfeditor</CODE>
or into the <CODE>gf</CODE> shell and perform actions such as syntax editing and treebank generation.
For instance, the command
</P>
<PRE>
> p -lang=LangEng -cat=S "this grammar is too big" | tb
</PRE>
<P>
creates a treebank entry with translations of this sentence.
</P>
<P>
For parsing, currently only English and the Scandinavian languages are within the limits of
reasonable resources. For other languages <I>L</I>, parsing with <CODE>Lang</CODE><I>L</I> will probably
exhaust the computer's resources before parser generation finishes.
</P>
<H2>Accessing the lower level ground API</H2>
<P>
The <CODE>Syntax</CODE> API is implemented in terms of a number of <CODE>abstract</CODE> modules, which
as of version 1.2 are mainly interesting for implementors of the resource.
See the <A HREF="index-1.1.html">documentation for version 1.1</A> for more details.
</P>
<H2>Known bugs and missing components</H2>
<P>
Danish
</P>
<UL>
<LI>the lexicon and chosen inflections are only partially verified
</UL>

<P>
English
</P>
<P>
Finnish
</P>
<UL>
<LI>wrong cases in some passive constructions
</UL>

<P>
French
</P>
<UL>
<LI>multiple clitics (with V3) not always right
<LI>third person pronominal questions with inverted word order
have wrong forms if "t" is required
(e.g. "comment fera-t-il" becomes "comment fera il")
</UL>

<P>
German
</P>
<P>
Italian
</P>
<UL>
<LI>multiple clitics (with V3) not always right
</UL>

<P>
Norwegian
</P>
<UL>
<LI>the lexicon and chosen inflections are only partially verified
</UL>

<P>
Russian
</P>
<UL>
<LI>some functions missing
<LI>some regular paradigms are missing
</UL>

<P>
Spanish
</P>
<UL>
<LI>multiple clitics (with V3) not always right
<LI>missing contractions with imperatives and clitics
</UL>

<P>
Swedish
</P>
<H2>More reading</H2>
<P>
<A HREF="../../../doc/resource.pdf">GF Resource Grammar Library</A> (pdf).
Printable user manual with API documentation.
</P>
<P>
<A HREF="gslt-sem-2006.html">Grammars as Software Libraries</A>. Slides
with background and motivation for the resource grammar library.
</P>
<P>
<A HREF="clt2006.html">GF Resource Grammar Library Version 1.0</A>. Slides
giving an overview of the library and practical hints on its use.
</P>
<P>
<A HREF="Resource-HOWTO.html">How to write resource grammars</A>. Helps you
start if you want to add another language to the library.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/geocal2006.pdf">Parametrized modules for Romance languages</A>.
Slides explaining some ideas in the implementation of
French, Italian, and Spanish.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/slides/webalt-2005.pdf">Grammar writing by examples</A>.
Slides showing how linearization rules are written as strings parsable by the resource grammar.
</P>
<P>
<A HREF="http://www.cs.chalmers.se/~aarne/slides/talk-edin2005.pdf">Multimodal Resource Grammars</A>.
Slides showing how to use the multimodal resource library. N.B. the library
examples are from <CODE>multimodal/old</CODE>, which is a reduced-size API.
</P>

<!-- html code generated by txt2tags 2.4 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -thtml index.txt -->
</BODY></HTML>
256
lib/resource-1.0/doc/index.txt
Normal file
@@ -0,0 +1,256 @@
GF Resource Grammar Library v. 1.2
Author: Aarne Ranta <aarne (at) cs.chalmers.se>
Last update: %%date(%c)

% NOTE: this is a txt2tags file.
% Create an html file from this file using:
% txt2tags --toc -thtml index.txt

%!target:html


The GF Resource Grammar Library defines the basic grammar of
ten languages:
Danish, English, Finnish, French, German,
Italian, Norwegian, Russian, Spanish, Swedish.
Incomplete implementations of Arabic and Catalan are also
included.

**New in Version 1.2**
- Simpler APIs using overloading: see [Synopsis synopsis.html].
The API of version 1.0 remains valid and can be used in combination with this.
- Bug fixes.


==Authors==

Inger Andersson and Therese Soderberg (Spanish morphology),
Nicolas Barth and Sylvain Pogodalla (French verb list),
Ali El Dada (Arabic modules),
Janna Khegai (Russian modules),
Bjorn Bringert (many Swadesh lexica),
Carlos Gonzalía (Spanish cardinals),
Harald Hammarström (German morphology),
Patrik Jansson (Swedish cardinals),
Andreas Priesnitz (German lexicon),
Aarne Ranta,
Jordi Saludes (Catalan modules),
Henning Thielemann (German lexicon).


We are grateful for contributions and
comments to several other people who have used this and
the previous versions of the resource library, including
Ludmilla Bogavac,
Ana Bove,
David Burke,
Lauri Carlson,
Gloria Casanellas,
Karin Cavallin,
Robin Cooper,
Hans-Joachim Daniels,
Elisabet Engdahl,
Markus Forsberg,
Kristofer Johannisson,
Anni Laine,
Peter Ljunglöf,
Saara Myllyntausta,
Wanjiku Ng'ang'a,
Nadine Perera,
Jordi Saludes.


==License==

The GF Resource Grammar Library is open-source software licensed under
the GNU Lesser General Public License (LGPL). See the file [LICENSE ../LICENSE] for more
details.


==Scope==

Coverage, for each language:
- complete morphology
- lexicon of the ca. 100 most important structural words
- test lexicon of ca. 300 content words (rough equivalents in each language)
- list of irregular verbs (separately for each language)
- representative fragment of syntax (cf. CLE (Core Language Engine))
- rather flat semantics (cf. Quasi-Logical Form of CLE)


Organization:
- top-level (API) modules
- Ground API + special-purpose APIs
- "school grammar" concepts rather than advanced linguistic theory


Presentation:
- tool ``gfdoc`` for generating HTML from grammars
- example collections


==Location==

Assuming you have installed the libraries, you will find the precompiled
``gfc`` and ``gfr`` files directly under ``$GF_LIB_PATH``, whose default
value is ``/usr/local/share/GF/``. The precompiled subdirectories are
```
alltenses
mathematical
multimodal
present
```
For instance, run
```
cd $GF_LIB_PATH
gf alltenses/langs.gfcm

> p -cat=S -lang=LangEng "this grammar is too big" | tb
```
For more details, see the [Synopsis synopsis.html].
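As a rough sketch of the layout described in this section, the following shell fragment
resolves the library directory, falling back to the documented default
``/usr/local/share/GF/`` when ``$GF_LIB_PATH`` is unset, and reports which of the four
precompiled subdirectories are present. The function name ``list_precompiled`` is made up
for this illustration and is not part of GF.
```
# Hypothetical helper, not part of GF: report which of the precompiled
# subdirectories exist under the given library directory.
list_precompiled() {
  # Drop one trailing slash so the paths below contain no "//".
  lib_dir=${1%/}
  for sub in alltenses mathematical multimodal present; do
    if [ -d "$lib_dir/$sub" ]; then
      echo "found:   $sub"
    else
      echo "missing: $sub"
    fi
  done
}

# Fall back to the documented default when GF_LIB_PATH is unset or empty.
list_precompiled "${GF_LIB_PATH:-/usr/local/share/GF/}"
```
If every line says ``missing``, the libraries are probably installed elsewhere and
``$GF_LIB_PATH`` should be set accordingly.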


==Compilation==

If you want to compile the library from scratch, use ``make``:
```
cd $GF_LIB_PATH/resource-1.0
make
```
The ``make`` procedure does not build Arabic and Catalan by default, but you
can uncomment the relevant lines in ``Makefile`` to compile them.


==Encoding==

Finnish, German, Romance, and Scandinavian languages are in ISO Latin-1 (ISO-8859-1).

Arabic and Russian are in UTF-8.

English is in pure ASCII.

Unfortunately, the different encodings make it hard to get
a nice view of all languages simultaneously. The easiest way to achieve this is
to use ``gfeditor``, which automatically converts grammars to UTF-8.
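Outside ``gfeditor``, text in one of the Latin-1 languages can also be converted by hand
for viewing. The following is a minimal sketch, assuming the standard ``iconv`` tool is
installed; the function name ``latin1_to_utf8`` is invented for this example and is not a
GF command, and nothing is claimed here about how GF itself treats converted grammar files.
```
# Hypothetical helper, not part of GF: convert ISO Latin-1 text on
# standard input to UTF-8 on standard output using iconv.
latin1_to_utf8() {
  iconv -f ISO-8859-1 -t UTF-8
}

# "Ljunglöf" with the ö given as the single Latin-1 byte 0xF6 (octal 366).
printf 'Ljungl\366f\n' | latin1_to_utf8
```
A whole file can be converted in the same way by redirecting it into the function and
redirecting the output to a new file.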


==Using the resource as library==

The library API is accessible from both ``present`` and ``alltenses``. The modules you most often need are
- ``Syntax``, the interface to syntactic structures
- ``Syntax``//L//, the implementations of ``Syntax`` for each language //L//
- ``Paradigms``//L//, the morphological paradigms for each language //L//


The [Synopsis synopsis.html] gives examples of the typical usage of these
modules.


==Using the resource as top level grammar==

The following modules can be used for parsing and linearization. They are accessible from both
``present`` and ``alltenses``.
- ``Lang``//L// for each language //L//, implementing a common abstract syntax ``Lang``
- ``Danish``, ``English``, etc., implementing ``Lang`` with language-specific extensions


In addition, both ``present`` and ``alltenses`` contain the file
- ``langs.gfcm``, a package with precompiled ``Lang``//L// grammars


A way to test and view the resource grammar is to load ``langs.gfcm`` either into ``gfeditor``
or into the ``gf`` shell and perform actions such as syntax editing and treebank generation.
For instance, the command
```
> p -lang=LangEng -cat=S "this grammar is too big" | tb
```
creates a treebank entry with translations of this sentence.

For parsing, currently only English and the Scandinavian languages are within the limits of
reasonable resources. For other languages //L//, parsing with ``Lang``//L// will probably
exhaust the computer's resources before parser generation finishes.



==Accessing the lower level ground API==

The ``Syntax`` API is implemented in terms of a number of ``abstract`` modules, which
as of version 1.2 are mainly interesting for implementors of the resource.
See the [documentation for version 1.1 index-1.1.html] for more details.


==Known bugs and missing components==

Danish
- the lexicon and chosen inflections are only partially verified


English


Finnish
- wrong cases in some passive constructions


French
- multiple clitics (with V3) not always right
- third person pronominal questions with inverted word order
have wrong forms if "t" is required
(e.g. "comment fera-t-il" becomes "comment fera il")


German


Italian
- multiple clitics (with V3) not always right


Norwegian
- the lexicon and chosen inflections are only partially verified


Russian
- some functions missing
- some regular paradigms are missing


Spanish
- multiple clitics (with V3) not always right
- missing contractions with imperatives and clitics


Swedish


==More reading==

[GF Resource Grammar Library ../../../doc/resource.pdf] (pdf).
Printable user manual with API documentation.

[Grammars as Software Libraries gslt-sem-2006.html]. Slides
with background and motivation for the resource grammar library.

[GF Resource Grammar Library Version 1.0 clt2006.html]. Slides
giving an overview of the library and practical hints on its use.

[How to write resource grammars Resource-HOWTO.html]. Helps you
start if you want to add another language to the library.

[Parametrized modules for Romance languages http://www.cs.chalmers.se/~aarne/geocal2006.pdf].
Slides explaining some ideas in the implementation of
French, Italian, and Spanish.

[Grammar writing by examples http://www.cs.chalmers.se/~aarne/slides/webalt-2005.pdf].
Slides showing how linearization rules are written as strings parsable by the resource grammar.

[Multimodal Resource Grammars http://www.cs.chalmers.se/~aarne/slides/talk-edin2005.pdf].
Slides showing how to use the multimodal resource library. N.B. the library
examples are from ``multimodal/old``, which is a reduced-size API.