<html>
|
|
|
|
<body bgcolor="#FFFFFF" text="#000000" >
|
|
|
|
<center>
|
|
<img SRC="../../doc/gf-logo.gif">
|
|
|
|
<h1>The GF Resource Grammar Library</h1>
|
|
|
|
|
|
<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a>
|
|
2002-2004
|
|
|
|
<p>
|
|
|
|
Version 0.6: <a href="../../download/gf-lib.tgz">source package</a>.
|
|
|
|
<p>
|
|
|
|
Current languages: English, Finnish, French, German, Italian, Russian, Swedish.
|
|
|
|
</center>
|
|
|
|
<font size=2>
|
|
<b>News</b>. <br>
|
|
|
|
10/8/2004 This document updated as a revision of the
|
|
<a href="http://tournesol.cs.chalmers.se/aarne/GF/resource/">old resource page</a>.
|
|
|
|
<br>
|
|
|
|
13/4/2004 Version 0.6 written using the module system of GF 2, with
extended coverage. The files are placed in separate subdirectories (one
per language) and have been renamed, so that file names
(without the extension <tt>.gf</tt>) are also legal module names.
|
|
</font>
|
|
|
|
<p>
|
|
|
|
<i>
|
|
<b>Notice</b>. You need GF Version 2.0beta or later
|
|
to work with these resource grammars.
|
|
It is available from the
|
|
<a href="http://www.cs.chalmers.se/~aarne/GF/">GF home page</a>.
|
|
</i>
|
|
|
|
|
|
<p>
|
|
|
|
|
|
<h2>Introduction</h2>
|
|
|
|
Just as programs in general can be divided into
|
|
<ul>
|
|
<li> application programs
|
|
<li> library programs
|
|
</ul>
|
|
GF grammars can be divided into
|
|
<ul>
|
|
<li> <b>application grammars</b>
|
|
<li> <b>resource grammars</b>
|
|
</ul>
|
|
An application grammar is typically built around
|
|
a semantic model, which is formalized as the abstract
|
|
syntax of the language. Concrete syntax defines
|
|
a mapping from the abstract syntax into English or
|
|
Swedish or some other language.
|
|
|
|
<p>
|
|
|
|
A resource grammar is not based on semantics, but its
|
|
purpose is to define the linguistic "surface" structures
|
|
of some language. The availability of these structures makes it easier to
|
|
write application grammars.
|
|
|
|
<p>
|
|
|
|
With resource grammars, we aim to achieve <b>division of labour</b> in
|
|
grammar writing:
|
|
<ul>
|
|
<li> application grammars are written by domain experts
|
|
<li> resource grammars are written by linguists
|
|
</ul>
|
|
By using resource grammars, experts in application domains can take
linguistic details for granted. For instance, to
express the linearization of the arithmetical predicate <i>even</i>
in French, a domain expert does not have to write
|
|
<pre>
  lin Even x = {s =
    table {
      m => x.s ++
           table {Ind => "est" ; Subj => "soit"} ! m ++
           table {Masc => "pair" ; Fem => "paire"} ! x.g
      }
    } ;
</pre>
|
|
but simply
|
|
<pre>
  lin Even = predA1 (adjReg "pair") ;
</pre>
|
|
The author of the French resource grammar will have defined the
functions <tt>predA1</tt> and <tt>adjReg</tt> in such a way that
they can be used in all applications.
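<p>
To make concrete what such functions hide, here is a minimal,
self-contained sketch of toy versions of <tt>adjReg</tt> and <tt>predA1</tt>.
It is not the code of the French resource grammar: the module name
<tt>ToyFre</tt>, the types <tt>NP</tt>, <tt>Adj</tt>, and <tt>S</tt>, and the
simplistic feminine rule are assumptions made only for illustration.
<pre>
  resource ToyFre = {
    param
      Gender = Masc | Fem ;
      Mood   = Ind | Subj ;
    oper
      -- toy types (hypothetical): a noun phrase has an inherent gender,
      -- an adjective inflects for gender, a sentence for mood
      NP  : Type = {s : Str ; g : Gender} ;
      Adj : Type = {s : Gender => Str} ;
      S   : Type = {s : Mood => Str} ;

      -- regular adjective: the feminine adds "e" (a deliberately naive rule)
      adjReg : Str -> Adj = \pair ->
        {s = table {Masc => pair ; Fem => pair + "e"}} ;

      -- one-place adjectival predication with the copula
      predA1 : Adj -> NP -> S = \adj,np ->
        {s = table {
           m => np.s ++
                table {Ind => "est" ; Subj => "soit"} ! m ++
                adj.s ! np.g
           }
        } ;
  }
</pre>
With these definitions, <tt>predA1 (adjReg "pair")</tt> is a function from
noun phrases to sentences that computes the same table as the hand-written
rule, while the records, tables, and selections stay inside the library.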
|
|
|
|
<p>
|
|
|
|
What is more, the resource grammar has a <b>language-independent
|
|
API</b>, which makes it possible to write the corresponding rule
|
|
for other languages in a very similar way. For instance, the
|
|
German rule is
|
|
<pre>
  lin Even = predA1 (adjReg "gerade") ;
</pre>
|
|
|
|
|
|
|
|
<h2>Coverage</h2>
|
|
|
|
The ultimate goal of the resource grammar library is full coverage of the linguistic
structures of each language. As of Version 0.6, we still have some way
to go to reach that goal. But we do have
|
|
<ul>
|
|
<li> fairly complete sets of inflection paradigms for each language;
<li> a representative fragment of syntax covering present-tense
indicative, interrogative, and imperative sentences;
<li> lexica of structural words such as pronouns, articles, and conjunctions.
|
|
</ul>
|
|
|
|
|
|
<h2>Demo</h2>
|
|
|
|
To get an idea of the coverage of the resource library, and also
to help find the right functions for your applications, you
can run
|
|
<pre>
  make test
  jgf TestAll.gfcm
</pre>
|
|
This opens the syntax editor with all seven resource grammars
extended with a small lexicon.
|
|
|
|
|
|
|
|
|
|
<h2>Programmer's view on resource grammars</h2>
|
|
|
|
The resource grammar library has a hierarchical structure. Its main layers are
|
|
<ul>
|
|
<li> The language-dependent <b>core resources</b>, to be described below.
|
|
<li> The language-independent <b>core resource API</b>,
<a href="doc/Combinations.html"><tt>Combinations.gf</tt></a> and
<a href="doc/Structural.html"><tt>Structural.gf</tt></a>.
|
|
<li> The <b>derived resource libraries</b>, some of which are
|
|
language-dependent, some of which aren't. The most important
|
|
ones are the language-dependent lexical paradigm modules
|
|
<tt>ParadigmsX.gf</tt>.
|
|
</ul>
|
|
The core resources should not be needed by application grammarians: it should
|
|
be enough to use the core resource API and the derived libraries. If
|
|
this is not the case, the best solution is to extend the derived resource
|
|
libraries or create new ones.
|
|
|
|
|
|
|
|
<h3>Grammaticality guarantee via data abstraction</h3>
|
|
|
|
An important principle is that
|
|
<ul>
|
|
<li> the core resource API and the derived resource libraries guarantee
|
|
that all type-correct uses of them preserve grammaticality.
|
|
</ul>
|
|
This principle is both a guideline for resource grammarians
and an argument for the application grammarian to use these libraries.
By using only the libraries, we mean that
|
|
<ul>
|
|
<li> all <tt>lin</tt> and
|
|
<tt>lincat</tt> rules are built solely from library functions and
|
|
argument variables.
|
|
</ul>
|
|
Thus for instance no records, tables, selections or projections should appear
|
|
in the rules. What we have achieved then is <b>total data abstraction</b>,
|
|
and the grammaticality guarantee can be given.
|
|
|
|
<p>
|
|
|
|
Since the resource grammars are a work in progress, their coverage is not
|
|
yet sufficient for complete data abstraction. In addition, there may of course
|
|
be bugs in the resource grammars that destroy grammaticality. The GF group is
|
|
grateful for bug reports, requests, and contributions!
|
|
|
|
<p>
|
|
|
|
The most important exception to total data abstraction in practice is the
|
|
incompleteness of resource lexica. Since it is impossible to have
|
|
full coverage of all the words in a language, users often have to introduce
|
|
their own lexical entries, and thereby use literal strings in their GF code.
|
|
The safest and most convenient way of doing this is via the functions
defined in the <tt>ParadigmsX.gf</tt> files. Using these functions guarantees
|
|
that the lexical entries created are type-correct. But nothing guards
|
|
against misspelling a word, picking a wrong inflectional pattern, or
|
|
a wrong inherent feature (such as gender).
|
|
|
|
|
|
|
|
<h3>The resource grammar documentation in <tt>gfdoc</tt></h3>
|
|
|
|
All documented GF grammars linked from this page
|
|
have been written in GF and then translated to HTML
|
|
using a light-weight documentation tool,
|
|
<tt>gfdoc</tt>. The tool is available as a part of the GF
|
|
source code package, in the Haskell file
|
|
<tt>util/GFDoc.hs</tt>, which can be run in the Hugs interpreter
by the script <tt>util/gfdoc</tt>. The program also has the
flag <tt>+latex</tt>, which produces output in LaTeX instead of
HTML.
|
|
|
|
|
|
|
|
<h3>The core resource API</h3>
|
|
|
|
The API is divided into two modules, <tt>Combinations</tt> and
|
|
its extension <tt>Structural</tt>.
|
|
|
|
<p>
|
|
|
|
The file <a href="doc/Combinations.html"><tt>Combinations.gf</tt></a>
|
|
gives the core resource type signatures of phrasal categories and
|
|
syntactic combination rules, together with some explanations
|
|
and examples. The examples are so far only in English, but their
|
|
equivalents are available in all of the languages for which the
|
|
API has been implemented.
|
|
|
|
<p>
|
|
|
|
The file <a href="doc/Structural.html"><tt>Structural.gf</tt></a>
|
|
gives a list of structural words such as determiners, pronouns,
|
|
prepositions, and conjunctions.
|
|
|
|
<p>
|
|
|
|
The file <tt>Structural.gf</tt> cannot be imported directly, but only
via the generated files <tt>ResourceX.gf</tt> for each language <tt>X</tt>.
|
|
In these files, the <tt>fun/lin</tt> and <tt>cat/lincat</tt> judgements have been
|
|
translated into <tt>oper</tt> judgements.
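<p>
The following sketch illustrates this translation on a toy grammar.
The module names, the categories, and the rule are hypothetical, and
the generated <tt>ResourceX.gf</tt> files may differ in detail.
Each module would be placed in a file of its own.
<pre>
  -- Toy.gf: abstract syntax with cat and fun judgements
  abstract Toy = {
    cat S ; NP ; VP ;
    fun PredVP : NP -> VP -> S ;
  }

  -- ToyEng.gf: concrete syntax with lincat and lin judgements
  concrete ToyEng of Toy = {
    lincat S, NP, VP = {s : Str} ;
    lin PredVP np vp = {s = np.s ++ vp.s} ;
  }

  -- ToyResEng.gf: the same content expressed as oper judgements
  resource ToyResEng = {
    oper
      S  : Type = {s : Str} ;
      NP : Type = {s : Str} ;
      VP : Type = {s : Str} ;
      PredVP : NP -> VP -> S = \np,vp -> {s = np.s ++ vp.s} ;
  }
</pre>
Each <tt>cat/lincat</tt> pair becomes an <tt>oper</tt> defining a type, and
each <tt>fun/lin</tt> pair becomes an <tt>oper</tt> defining a function of
that type, so that an application grammar can open the resource module and
call <tt>PredVP</tt> like any other operation.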
|
|
|
|
|
|
|
|
<h3>The lexical paradigm modules</h3>
|
|
|
|
The lexical paradigm modules define, for
|
|
each lexical category, a <b>worst-case macro</b> for adding words
|
|
of that category by giving a sufficient number of characteristic
|
|
forms. In addition, the most common <b>regular paradigms</b> are
|
|
included, where it is enough just to give one form to generate
|
|
all the others.
|
|
|
|
<p>
|
|
|
|
For example, the English paradigm module has the worst-case macro for nouns,
|
|
<pre>
  mkN : (man,men,man's,men's : Str) -> Gender -> N ;
</pre>
|
|
taking four forms and a gender (<tt>human</tt> or <tt>nonhuman</tt>,
|
|
as is also explained in the module). Its application
|
|
<pre>
  mkN "mouse" "mice" "mouse's" "mice's" nonhuman
</pre>
|
|
defines all information that is needed for the noun <i>mouse</i>.
|
|
There are also some regular patterns, for instance,
|
|
<pre>
  nReg  : Str -> Gender -> N ;  -- dog, dogs
  nKiss : Str -> Gender -> N ;  -- kiss, kisses
</pre>
|
|
examples of which are
|
|
<pre>
  nReg "car" nonhuman
  nKiss "waitress" human
</pre>
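<p>
As a rough illustration of how a worst-case macro and regular paradigms
relate, here is a self-contained toy module. It is a sketch, not the code of
<tt>ParadigmsEng.gf</tt>: the module name, the parameter types
<tt>Number</tt> and <tt>Case</tt>, and the record type of <tt>N</tt> are
assumptions.
<pre>
  resource ToyParadigmsEng = {
    param
      Number = Sg | Pl ;
      Case   = Nom | Gen ;
      Gender = Human | NonHuman ;
    oper
      -- a noun: a full inflection table plus an inherent gender
      N : Type = {s : Number => Case => Str ; g : Gender} ;

      human    : Gender = Human ;
      nonhuman : Gender = NonHuman ;

      -- worst-case macro: all four forms are given explicitly
      mkN : (man,men,mans,mens : Str) -> Gender -> N =
        \man,men,mans,mens,g -> {
          s = table {
            Sg => table {Nom => man ; Gen => mans} ;
            Pl => table {Nom => men ; Gen => mens}
            } ;
          g = g
        } ;

      -- regular paradigms: one form is enough, the rest is computed
      nReg : Str -> Gender -> N = \dog,g ->
        mkN dog (dog + "s") (dog + "'s") (dog + "s'") g ;

      nKiss : Str -> Gender -> N = \kiss,g ->
        mkN kiss (kiss + "es") (kiss + "'s") (kiss + "es'") g ;
  }
</pre>
In this sketch, <tt>nReg "mouse" nonhuman</tt> is type-correct but yields the
plural <i>mouses</i>, which shows why the choice of paradigm remains the
responsibility of the user.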
|
|
|
|
<p>
|
|
|
|
Here are the documented versions of the paradigm modules:
|
|
<ul>
|
|
<li> English: <a href="doc/ParadigmsEng.html"><tt>ParadigmsEng.gf</tt></a>
|
|
<li> Finnish: <a href="doc/ParadigmsFin.html"><tt>ParadigmsFin.gf</tt></a>
|
|
<li> French: <a href="doc/ParadigmsFre.html"><tt>ParadigmsFre.gf</tt></a>
|
|
<li> German: <a href="doc/ParadigmsGer.html"><tt>ParadigmsGer.gf</tt></a>
|
|
<li> Italian: <a href="doc/ParadigmsIta.html"><tt>ParadigmsIta.gf</tt></a>
|
|
<li> Russian: <a href="doc/ParadigmsRus.html"><tt>ParadigmsRus.gf</tt></a>
|
|
<li> Swedish: <a href="doc/ParadigmsSwe.html"><tt>ParadigmsSwe.gf</tt></a>
|
|
</ul>
|
|
|
|
|
|
<h3>The derived resource libraries</h3>
|
|
|
|
The core resource grammar is minimal in the sense that it defines the
|
|
smallest syntactic combinations and has no redundancy. For applications, it
|
|
is usually more convenient to use combinations of the minimal rules.
|
|
Some such combinations are given in the <b>predication library</b>,
|
|
which defines the simultaneous applications of one- and two-place
|
|
verbs and adjectives to all their argument noun phrases. It also
|
|
defines some other constructions useful for logical and mathematical
|
|
applications.
|
|
|
|
<p>
|
|
|
|
The API of the predication library is in the file
|
|
<a href="doc/Predication.html"><tt>Predication.gf</tt></a>.
|
|
What is imported is one of the language-dependent files,
|
|
<tt>X/PredicationX.gf</tt> for each language <tt>X</tt>.
|
|
|
|
|
|
|
|
|
|
<h2>Linguist's view on resource grammars</h2>
|
|
|
|
<h3>GF and other grammar formalisms</h3>
|
|
|
|
Linguists in particular might be interested in resource
grammars for their own sake, not as a basis for applications.
|
|
Since few linguists are so far familiar with GF, we refer to the
|
|
<a href="http://www.cs.chalmers.se/~aarne/GF/">GF Homepage</a>
|
|
and especially to the
|
|
<a href="http://www.cs.chalmers.se/~aarne/GF/Tutorial/">GF Tutorial</a>.
|
|
What comes here is a brief summary of the relation of GF to
|
|
other record-based formalisms.
|
|
|
|
<p>
|
|
|
|
The records of GF are much like feature structures in PATR or HPSG.
|
|
The main differences are that
|
|
<ul>
|
|
<li> GF has a type system inherited from
|
|
functional programming languages;
|
|
<li> GF records are primarily obtained as linearizations of trees, not
|
|
as parses of strings.
|
|
</ul>
|
|
The latter difference explains why a GF record typically carries more
|
|
information than a feature structure. For instance, the record describing
|
|
the French noun <i>cheval</i> is
|
|
<pre>
  {s = table {Sg => "cheval" ; Pl => "chevaux"} ; g = Masc} ;
</pre>
|
|
showing the full inflection table of the (abstract) noun <i>cheval</i>.
|
|
A PATR record
|
|
for the French word <i>cheval</i> would be
|
|
<pre>
  {s = "cheval" ; n = Sg ; g = Masc} ;
</pre>
|
|
showing just the information that can be gathered from the (concrete)
|
|
string <i>cheval</i>.
|
|
There is a rather straightforward sense in which the PATR record is an
|
|
<b>instance</b> of the GF record.
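<p>
This relation can be sketched in GF itself. The module below is a toy
example under stated assumptions: the names <tt>ToyCheval</tt>,
<tt>Noun</tt>, <tt>Word</tt>, and <tt>atNumber</tt> are hypothetical and do
not come from the resource library.
<pre>
  resource ToyCheval = {
    param
      Number = Sg | Pl ;
      Gender = Masc | Fem ;
    oper
      -- GF-style record: the whole inflection table of the noun
      Noun : Type = {s : Number => Str ; g : Gender} ;

      -- PATR-style record: one word form with its features
      Word : Type = {s : Str ; n : Number ; g : Gender} ;

      cheval : Noun =
        {s = table {Sg => "cheval" ; Pl => "chevaux"} ; g = Masc} ;

      -- selecting one number from the table gives an instance
      atNumber : Noun -> Number -> Word = \noun,n ->
        {s = noun.s ! n ; n = n ; g = noun.g} ;
  }
</pre>
Here <tt>atNumber cheval Sg</tt> computes the PATR-style record for
<i>cheval</i> shown above, and <tt>atNumber cheval Pl</tt> gives the
corresponding record for <i>chevaux</i>.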
|
|
|
|
<p>
|
|
|
|
When generating language from syntax trees (or from logical formulas via
|
|
syntax trees), the record containing full inflection tables is an efficient
|
|
(linear-time) method of producing the correct forms.
|
|
This is important when text is generated in real time in
|
|
an interactive system.
|
|
|
|
|
|
|
|
<h2>The structure of core resource grammars</h2>
|
|
|
|
As explained above, the application grammarian's view on resource grammars
|
|
is through API modules. They are collections of type signatures of functions.
|
|
It is the task of linguists to define these functions.
|
|
The definitions are in the end given
|
|
in the <b>core resource grammars</b>.
|
|
|
|
<p>
|
|
|
|
We have divided the core resource grammar for each language <tt>X</tt>
|
|
into the following parts:
|
|
<ul>
|
|
<li> Type system: <tt>TypesX.gf</tt>
|
|
<li> Morphology: <tt>MorphoX.gf</tt>
|
|
<li> Syntax: <tt>SyntaxX.gf</tt>
|
|
</ul>
|
|
To get the most powerful resource grammar for each language, one can use
|
|
these files directly.
|
|
|
|
<p>
|
|
|
|
However, the languages we have studied have so much in common
|
|
that we have gathered a considerable set of categories and rules
|
|
in a <b>multilingual resource grammar</b>. Its parts are
|
|
<ul>
|
|
<li> Abstract syntax: <tt>Resource.gf</tt>
<li> Language-dependent concrete syntax: <tt>ResourceX.gf</tt> for
each language.
|
|
</ul>
|
|
The advantage of using this API in application grammars is that
|
|
<b>their concrete syntax looks the same for all languages</b>
|
|
up to non-structural words. Thus it is possible to produce concrete syntaxes
for new languages while knowing almost nothing about them.
|
|
The abstract syntax serves as a common API to the core resource grammar.
|
|
|
|
|
|
<h3>The code for the core resource grammars</h3>
|
|
|
|
Each language has its resource code in a separate directory.
|
|
You can view the code as it is, or download it and run <tt>gfdoc</tt>
|
|
on each file.
|
|
<ul>
|
|
<li> English:
|
|
<a href="english"><tt>english</tt></a>
|
|
<li> Finnish:
|
|
<a href="finnish"><tt>finnish</tt></a>
|
|
<li> Shared Romance:
|
|
<a href="romance"><tt>romance</tt></a>
|
|
<li> French (building on Romance):
|
|
<a href="french"><tt>french</tt></a>
|
|
<li> Italian (building on Romance):
|
|
<a href="italian"><tt>italian</tt></a>
|
|
<li> Russian:
|
|
<a href="russian"><tt>russian</tt></a>
|
|
<li> German:
|
|
<a href="german"><tt>german</tt></a>
|
|
<li> Swedish:
|
|
<a href="swedish"><tt>swedish</tt></a>
|
|
</ul>
|
|
|
|
|
|
<h2>Compiling and using the resource</h2>
|
|
|
|
To compile the resource into reusable operations, for all languages, type
|
|
<pre>
  make
</pre>
|
|
in the <tt>resource/</tt> directory.
|
|
This requires that you have a recent version of GF (>= 2.0).
|
|
What you get is a set of files with names <tt>ResourceX.gfr</tt>,
|
|
<tt>ResourceX.gfc</tt>, <tt>ParadigmsX.gfr</tt>, and <tt>ParadigmsX.gfc</tt>.
|
|
You need never consult any of these files,
|
|
but only look into the <a href="doc">documentation</a>.
|
|
|
|
|
|
|
|
<h2>Examples of using the resource grammars</h2>
|
|
|
|
<h3>A test suite</h3>
|
|
|
|
The grammars <tt>TestResourceX.gf</tt> define a few expressions of each
|
|
lexical category and make it possible to test linearization, parsing,
|
|
random generation, and editing.
|
|
|
|
|
|
<h3>A database query language</h3>
|
|
|
|
The grammars
|
|
<a href="../database/">
|
|
<tt>database/(Database | Restaurant)X.gf</tt></a>
|
|
make use of the resource. The <tt>RestaurantX.gf</tt>
|
|
grammars are just one possible application building on the generic
|
|
<tt>DatabaseX.gf</tt> grammars.
|
|
Notice that the
|
|
<tt>DatabaseX</tt> grammars are defined as instantiations of
|
|
the parametrized module <tt>DatabaseI</tt>.
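<p>
The pattern of a parametrized module and its instantiations can be sketched
as follows. This is a toy example, not the actual <tt>DatabaseI</tt> code:
all module names, the category <tt>Q</tt>, and the lexical items are
hypothetical, and each module would be placed in a file of its own.
<pre>
  -- abstract syntax shared by all languages
  abstract ToyDb = {
    cat Q ;
    fun WhichCheap : Q ;
  }

  -- interface: the language-dependent parameters of the grammar
  interface LexToy = {
    oper
      which_Str : Str ;
      cheap_Str : Str ;
  }

  -- one instance of the interface per language
  instance LexToyEng of LexToy = {
    oper
      which_Str = "which" ;
      cheap_Str = "cheap" ;
  }

  -- parametrized (incomplete) concrete syntax, written only once
  incomplete concrete ToyDbI of ToyDb = open LexToy in {
    lincat Q = {s : Str} ;
    lin WhichCheap = {s = which_Str ++ cheap_Str} ;
  }

  -- instantiation for English
  concrete ToyDbEng of ToyDb = ToyDbI with (LexToy = LexToyEng) ;
</pre>
Adding a language then amounts to writing a new instance of the interface
and a one-line instantiation, while the linearization rules themselves are
shared.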
|
|
|
|
|
|
<h2>Functional morphology</h2>
|
|
|
|
Even though GF is a useful language for describing syntax and semantics, it
|
|
is not the optimal choice for morphology.
|
|
One reason is the absence of low-level
|
|
programming, such as string matching. Another reason is efficiency.
|
|
In connection with the resource grammar project, we have started another
|
|
project, <a href="http://www.cs.chalmers.se/%7Emarkus/FM">
|
|
functional morphology</a>,
|
|
which uses Haskell to implement
|
|
morphology. Haskell morphologies can then be used for generating
|
|
GF morphologies.
|
|
|
|
|
|
<h2>Further reading</h2>
|
|
|
|
<a
|
|
href="http://www.cs.chalmers.se/~aarne/slides/multi-eng-slides.pdf">
|
|
Slides on modular grammar engineering</a>.
|
|
|
|
</body>
|
|
</html>
|
|
|