<html>
<body bgcolor="#FFFFFF" text="#000000" >
<center>
<img SRC="./gf-logo.gif">
<h1>The GF Resource Grammar Library</h1>
<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a>
2002-2004
<p>
Version 0.7: <a href="gf-resource.tgz">source package</a>.
<p>
Current languages: English, Finnish, French, German, Italian, Russian, Swedish.
</center>
<font size=2>
<b>News</b> <br>
24/6/2004 Version 0.7 released together with the release of GF 2.0.
<br>
13/4/2004 Version 0.6 completed using the module system of GF 2. Also an
extended coverage. The files are placed in separate subdirectories (one
per language) and have different names than before, so that file names
(without the extension <tt>.gf</tt>) are also legal module names.
<br>
15/8/2003 Version 0.4 with Finnish added. Some updates of the Russian modules.
<br>
25/6/2003 Release of GF 1.2 making it more efficient to work with
resource grammars. See
<a href="http://www.cs.chalmers.se/~aarne/GF/doc/gf-1.2.html">highlights</a>.
Also <a href="gf-resource-0.3.tgz">source package version 0.3</a>
with some bug fixes.
<br>
5/6/2003. Russian resource modules by
<a href="http://www.cs.chalmers.se/~janna">Janna Khegai</a>.
Cyrillic strings in the files <tt>*.RusU.gf</tt> use UTF-8 encoding,
which is automatically detected by the Java GUI to GF. However, in web
browsers the encoding must be set manually.
<br>
3/6/2003. New version of this document, with separate sections
on application and resource grammarians' views and
added documentation
on the type system of each language <tt>X</tt>
in <tt>CombinationsX.gf</tt>.
<br>
23/5/2003. High-level lexicon access also in
French,
Italian,
and
Swedish.
<br>
23/5/2003.
Italian grammar based on generic Romance modules, shared with French.
<br>
14/4/2003. High-level access to define a lexicon in
English and German.
<p>
<i>
<b>Notice</b>. You need GF Version 2.0 or later
to work with these resource grammars.
It is available from the
<a href="http://www.cs.chalmers.se/~aarne/GF/">GF home page</a>.
</i>
</font>
<p>
<h2>Introduction</h2>
As programs in general can be divided into
application programs and library programs,
GF grammars can be divided into
<b>application grammars</b> and
<b>resource grammars</b>.
An application grammar is typically built around
a semantic model, which is formalized as the abstract
syntax of the language. Concrete syntax defines
a mapping from the abstract syntax into English or
Swedish or some other language.
A resource grammar is not based on semantics, but its
purpose is to define the linguistic "surface" structures
of some language. The availability of these structures makes it easier to
write application grammars.
<h3>Abstraction level</h3>
Resource grammars
<b>raise the level of abstraction in concrete syntax</b>.
The author of an application grammar is freed from thinking
about inflection, word order, etc., but can use structured
tree-like objects in linearization rules. For instance, to
express the linearization of the arithmetical predicate <i>even</i>
in French, she no longer has to write
<pre>
lin Even x = {s =
  table {
    m => x.s ++
         table {Ind => "est" ; Subj => "soit"} ! m ++
         table {Masc => "pair" ; Fem => "paire"} ! x.g
  }
} ;
</pre>
but simply
<pre>
lin Even = predA1 (adjReg "pair") ;
</pre>
The author of the French resource grammar will have defined the
functions <tt>predA1</tt> and <tt>adjReg</tt> in such a way that
they can be used in all applications. The type checker of the GF grammar
compiler guarantees that only grammatically correct combinations
can be formed by the resource grammar functions.
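<p>
To make the contrast concrete, here is a small Python model (not GF, and not
the library's actual code) of what the two versions of the rule compute.
The names <tt>adj_reg</tt> and <tt>pred_a1</tt> are hypothetical stand-ins
mirroring <tt>adjReg</tt> and <tt>predA1</tt>, the record layout is assumed
from the hand-written rule above, and the subject <i>la somme</i> is an
invented example.
<pre>
```python
# Python model of the French linearization of Even; illustrative only.
# adj_reg and pred_a1 are hypothetical stand-ins for adjReg and predA1.

def adj_reg(masc):
    """Regular adjective: the feminine is assumed to add -e to the masculine."""
    return {"Masc": masc, "Fem": masc + "e"}

def pred_a1(adj):
    """Turn an adjective into a function from a subject record to a sentence table."""
    copula = {"Ind": "est", "Subj": "soit"}

    def apply(subject):
        # One sentence per mood; the adjective agrees with the subject's gender.
        return {mood: " ".join([subject["s"], copula[mood], adj[subject["g"]]])
                for mood in copula}

    return apply

even = pred_a1(adj_reg("pair"))

# A hypothetical feminine subject, "la somme" (the sum).
somme = {"s": "la somme", "g": "Fem"}
print(even(somme)["Ind"])   # la somme est paire
print(even(somme)["Subj"])  # la somme soit paire
```
</pre>
In this model the tables over mood and gender are written once, inside the
two helper functions, and the application rule itself mentions neither.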
<h3>Unity of language</h3>
In addition to high abstraction level, reusability, and the division
of labour, resource grammars have the virtue of making sense of the
<b>unity of a language</b> such as English: while application grammars
depend on applications, resource grammars depend on language.
What is more, resource grammars for related languages can
share much of their code: to what degree this can be done gives
a measure of how related the languages are.
Thus we find resource grammars to be an interesting linguistic
project in its own right.
<h3>Semantics</h3>
We leave open whether the <b>semantics</b>
of resource grammars can also be explained on a general level. The philosophy of GF,
inherited from logical frameworks,
is that semantics is only given to
application grammars. (You can also compare application grammars to Wittgenstein's
"language games".)
This view gives us a lot of freedom in formulating resource grammars.
When describing them, we sometimes say that such-and-such a construction
is likely to be ruled out by semantic reasons; what we mean is that
this will actually happen in application grammars; we do <i>not</i>
mean that GF has no semantic rules.
An example is the question
<i>From which city is every number even or odd?</i>.
The resource grammar makes it possible to form this question,
but it can hardly be correct in any application grammar that has
a rigorous semantics.
<h2>Programmer's view on resource grammars</h2>
The resource grammar library has a hierarchical structure. Its main layers are
<ul>
<li> The language-independent <b>core resource API</b>,
<a href="Combinations.html"><tt>Combinations.gf</tt></a> and
<a href="Structural.html"><tt>Structural.gf</tt></a>.
<li> The language-dependent lexical paradigm modules
<tt>ParadigmsX.gf</tt>.
<li> The <b>derived resource libraries</b>, some of which are
language-dependent, some of which aren't.
<li> The language-dependent <b>resource infrastructure</b>, to be described below.
</ul>
The resource infrastructure should not be needed by application grammarians: it should
be enough to use the core resource API, the paradigm modules, and the derived libraries. If
this is not the case, the best solution is to extend the derived resource
libraries or create new ones.
<h3>Grammaticality guarantee via data abstraction</h3>
An important principle is that
<ul>
<li> the core resource API and the derived resource libraries guarantee
that all type-correct uses of them preserve grammaticality.
</ul>
This principle is simultaneously a guidance for resource grammarians
and an argument for the application grammarian to use these libraries.
What we mean by "only using the libraries" is that
<ul>
<li> all <tt>lin</tt> and
<tt>lincat</tt> rules are built solely from library functions and
argument variables.
</ul>
Thus for instance no records, tables, selections or projections should appear
in the rules. What we have achieved then is <b>total data abstraction</b>,
and the grammaticality guarantee can be given.
<p>
Since the resource grammars are a work in progress, their coverage is not
yet sufficient for complete data abstraction. In addition, there may of course
be bugs in the resource grammars that destroy grammaticality. The GF group is
grateful for bug reports, requests, and contributions!
<p>
The most important exception to total data abstraction in practice
concerns resource lexica. Since it is impossible to have a
full coverage of all the words in a language, users often have to introduce
their own lexical entries, and thereby use literal strings in their GF code.
The safest and most convenient way of doing this is via functions
defined in <tt>ParadigmsX.gf</tt> files. Using these functions guarantees
that the lexical entries created are type-correct. But nothing guards
against misspelling a word, picking a wrong inflectional pattern, or
a wrong inherent feature (such as the gender of a French noun).
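<p>
The limits of this guarantee can be illustrated with a small Python model of
a lexical entry that carries an inherent gender. It is only a sketch: the
names <tt>mk_noun</tt> and <tt>def_np</tt> are hypothetical and do not
correspond to functions of the library, and the run-time check stands in for
what the GF type checker would do at compile time.
<pre>
```python
# Python model of a lexical entry with an inherent gender; illustrative only.
# mk_noun and def_np are hypothetical names, not functions of the library.

GENDERS = ("Masc", "Fem")

def mk_noun(singular, gender):
    """Build a noun record; rejects anything that is not a known gender."""
    if gender not in GENDERS:
        raise ValueError("unknown gender: " + repr(gender))
    return {"s": singular, "g": gender}

def def_np(noun):
    """Definite noun phrase: the article agrees with the noun's gender."""
    article = {"Masc": "le", "Fem": "la"}[noun["g"]]
    return article + " " + noun["s"]

print(def_np(mk_noun("maison", "Fem")))   # la maison
# Accepted by every check above, but the gender is wrong for this word:
print(def_np(mk_noun("maison", "Masc")))  # le maison (ungrammatical)
```
</pre>
Both entries are well-formed as far as the checks go. Only knowledge of
French tells the second one apart as an error.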
<h3>The resource grammar documentation in <tt>gfdoc</tt></h3>
All documented GF grammars linked from this page
have been written in GF and then translated to HTML
using a light-weight documentation tool,
<tt>gfdoc</tt>. The tool is available as a part of the GF
package. The program also has the
flag <tt>-latex</tt>, which produces output in LaTeX instead of
HTML.
<h3>The core resource API</h3>
The API is divided into two modules, <tt>Combinations</tt> and
its extension <tt>Structural</tt>.
<p>
The module <a href="Combinations.html"><tt>Combinations</tt></a>
gives the core resource type signatures of phrasal categories and
syntactic combination rules, together with some explanations
and examples. The examples are so far only in English, but their
equivalents are available in all of the languages for which the
API has been implemented.
<p>
The module <a href="Structural.html"><tt>Structural</tt></a>
defines structural words such as determiners, pronouns,
prepositions, and conjunctions.
<p>
The file <tt>Structural.gf</tt> cannot be imported directly, but only
via the generated files <tt>ResourceX.gf</tt> for each language <tt>X</tt>.
In these files, the <tt>fun/lin</tt> and <tt>cat/lincat</tt> judgements have been
translated into <tt>oper</tt> judgements.
<h3>The lexical paradigm modules</h3>
The lexical paradigm modules define, for
each lexical category, a <b>worst-case macro</b> for adding words
of that category by giving a sufficient number of characteristic
forms. In addition, the most common <b>regular paradigms</b> are
included, where it is enough just to give one form to generate
all the others.
<p>
For example, the English paradigm module has the worst-case macro for nouns,
<pre>
mkN : (man,men,man's,men's : Str) -> Gender -> N ;
</pre>
taking four forms and a gender (<tt>human</tt> or <tt>nonhuman</tt>,
as is also explained in the module). Its application
<pre>
mkN "mouse" "mice" "mouse's" "mice's" nonhuman
</pre>
defines all information that is needed for the noun <i>mouse</i>.
There are also some regular patterns, for instance,
<pre>
nReg : Str -> Gender -> N ; -- dog, dogs
nKiss : Str -> Gender -> N ; -- kiss, kisses
</pre>
examples of which are
<pre>
nReg "car" nonhuman
nKiss "waitress" human
</pre>
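<p>
The relation between the worst-case macro and the regular paradigms can be
modelled in Python as follows. This is a sketch under assumptions, not the
definitions in <tt>ParadigmsEng.gf</tt>: the names <tt>mk_n</tt>,
<tt>n_reg</tt> and <tt>n_kiss</tt> are hypothetical stand-ins for
<tt>mkN</tt>, <tt>nReg</tt> and <tt>nKiss</tt>, and the genitive forms
produced by the regular paradigms are guesses at the intended pattern.
<pre>
```python
# Python model of a worst-case macro and two regular paradigms for English nouns.
# mk_n, n_reg and n_kiss are hypothetical stand-ins for mkN, nReg and nKiss;
# the genitive forms produced by the regular paradigms are assumptions.

def mk_n(sg_nom, pl_nom, sg_gen, pl_gen, gender):
    """Worst-case macro: every characteristic form is given explicitly."""
    return {
        "s": {("Sg", "Nom"): sg_nom, ("Pl", "Nom"): pl_nom,
              ("Sg", "Gen"): sg_gen, ("Pl", "Gen"): pl_gen},
        "g": gender,
    }

def n_reg(stem, gender):
    """Regular pattern (dog, dogs): one form generates all the others."""
    return mk_n(stem, stem + "s", stem + "'s", stem + "s'", gender)

def n_kiss(stem, gender):
    """Sibilant pattern (kiss, kisses): the plural takes -es."""
    return mk_n(stem, stem + "es", stem + "'s", stem + "es'", gender)

mouse = mk_n("mouse", "mice", "mouse's", "mice's", "nonhuman")
car = n_reg("car", "nonhuman")
waitress = n_kiss("waitress", "human")

print(mouse["s"][("Pl", "Nom")])     # mice
print(car["s"][("Pl", "Gen")])       # cars'
print(waitress["s"][("Pl", "Nom")])  # waitresses
```
</pre>
Each regular paradigm is a thin wrapper around the worst-case macro, so every
noun ends up with the same record shape regardless of how it was defined.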
<p>
Here are the documented versions of the paradigm modules:
<ul>
<li> English: <a href="ParadigmsEng.html"><tt>ParadigmsEng.gf</tt></a>
<li> Finnish: <a href="ParadigmsFin.html"><tt>ParadigmsFin.gf</tt></a>
<li> French: <a href="ParadigmsFra.html"><tt>ParadigmsFra.gf</tt></a>
<li> German: <a href="ParadigmsGer.html"><tt>ParadigmsDeu.gf</tt></a>
<li> Italian: <a href="ParadigmsIta.html"><tt>ParadigmsIta.gf</tt></a>
<li> Russian: <a href="ParadigmsRus.html"><tt>ParadigmsRus.gf</tt></a>
<li> Swedish: <a href="ParadigmsSwe.html"><tt>ParadigmsSwe.gf</tt></a>
</ul>
<h3>The derived resource libraries</h3>
The core resource grammar is minimal in the sense that it defines the
smallest syntactic combinations and has no redundancy. For applications, it
is usually more convenient to use combinations of the minimal rules.
Some such combinations are given in the <b>predication library</b>,
which defines the simultaneous applications of one- and two-place
verbs and adjectives to all their argument noun phrases. It also
defines some other constructions useful for logical and mathematical
applications.
<p>
The API of the predication library is the module
<a href="Predication.html"><tt>Predication</tt></a>.
What is imported is one of the language-dependent modules,
<tt>X/PredicationX</tt> for each language <tt>X</tt>.
<h3>The language-dependent type systems</h3>
Sometimes it is useful for the application grammarian to know what the
language-dependent linearization types are for each category in the
core resource. These types are defined in the files <tt>CombinationsX.gf</tt>.
They can be translated to documents by <tt>gfdoc</tt> if desired.
<!--
<ul>
<li> English: <a href="CombinationsEng.html"><tt>CombinationsEng.gf</tt></a>
<li> Finnish: <a href="CombinationsFin.html"><tt>CombinationsFin.gf</tt></a>
<li> French: <a href="CombinationsFra.html"><tt>CombinationsFra.gf</tt></a>
<li> German: <a href="CombinationsDeu.html"><tt>CombinationsDeu.gf</tt></a>
<li> Italian: <a href="CombinationsIta.html"><tt>CombinationsIta.gf</tt></a>
<li> Russian: <a href="CombinationsRus.html"><tt>CombinationsRusU.gf</tt></a>
<li> Swedish: <a href="CombinationsSwe.html"><tt>CombinationsSwe.gf</tt></a>
</ul>
-->
<p>
For the sake of uniformity, we have tried to use the same names
of parameter types when applicable. For instance, the gender parameter
is called <tt>Gender</tt> in every grammar, even though its values
differ. The definitions of the parameter
types are given in the modules <tt>TypesX</tt>.
The application grammarian following the complete abstraction principle
does not open these modules and hence cannot
use the parameter constructors directly, but uses instead the
names defined in <tt>ParadigmsX</tt>.
<h2>Linguist's view on resource grammars</h2>
<h3>GF and other grammar formalisms</h3>
Linguists in particular might be interested in resource
grammars for their own sake, not as a basis for applications.
Since few linguists are so far familiar with GF, we refer to the
<a href="http://www.cs.chalmers.se/~aarne/GF/">GF Homepage</a>
and especially to the
<a href="http://www.cs.chalmers.se/~aarne/GF/Tutorial/">GF Tutorial</a>.
What comes here is a brief summary of the relation of GF to
other record-based formalisms.
<p>
The records of GF are much like feature structures in PATR or HPSG.
The main differences are that
<ul>
<li> GF has a type system inherited from
functional programming languages;
<li> GF records are primarily obtained as linearizations of trees, not
as parses of strings.
</ul>
The latter difference explains why a GF record typically carries more
information than a feature structure. For instance, the record describing
the French noun <i>cheval</i> is
<pre>
{s = table {Sg => "cheval" ; Pl => "chevaux"} ; g = Masc} ;
</pre>
showing the full inflection table of the (abstract) noun <i>cheval</i>.
A PATR record
for the French word <i>cheval</i> would be
<pre>
{s = "cheval" ; n = Sg ; g = Masc} ;
</pre>
showing just the information that can be gathered from the (concrete)
string <i>cheval</i>.
There is a rather straightforward sense in which the PATR record is an
<b>instance</b> of the GF record.
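<p>
One way to read the instance relation is sketched below in Python. The
dictionaries only model the two records shown above, and the helper
<tt>instances</tt> is a hypothetical name introduced for this illustration.
It is not part of GF or PATR.
<pre>
```python
# Python model of the relation between a GF record and PATR-style records.
# The function instances is a hypothetical helper, not part of GF or PATR.

cheval = {"s": {"Sg": "cheval", "Pl": "chevaux"}, "g": "Masc"}

def instances(record):
    """List one flat record per row of the inflection table."""
    return [{"s": form, "n": number, "g": record["g"]}
            for number, form in record["s"].items()]

for flat in instances(cheval):
    print(flat)
# {'s': 'cheval', 'n': 'Sg', 'g': 'Masc'}
# {'s': 'chevaux', 'n': 'Pl', 'g': 'Masc'}
```
</pre>
Each flat record fixes one value of the number parameter and keeps the
inherent gender unchanged.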
<p>
When generating language from syntax trees (or from logical formulas via
syntax trees), the record containing full inflection tables is an efficient
(linear-time) method of producing the correct forms.
This is important when text is generated in real time in
an interactive system.
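<p>
Generation by table lookup can be sketched in Python as follows. This is a
toy model under stated assumptions: the tree shape, the two-word lexicon and
the names <tt>LEXICON</tt>, <tt>DEF_ART</tt> and <tt>linearize</tt> are all
hypothetical, and real GF linearization handles far richer structures.
<pre>
```python
# Python model of generation by table lookup; illustrative only.
# The tree shape and the names LEXICON, DEF_ART and linearize are hypothetical.

LEXICON = {
    "cheval": {"s": {"Sg": "cheval", "Pl": "chevaux"}, "g": "Masc"},
    "ville": {"s": {"Sg": "ville", "Pl": "villes"}, "g": "Fem"},
}

DEF_ART = {("Sg", "Masc"): "le", ("Sg", "Fem"): "la",
           ("Pl", "Masc"): "les", ("Pl", "Fem"): "les"}

def linearize(tree):
    """Linearize a tree (number, noun_key): each word is one dictionary lookup."""
    number, key = tree
    noun = LEXICON[key]
    return DEF_ART[(number, noun["g"])] + " " + noun["s"][number]

print(linearize(("Pl", "cheval")))  # les chevaux
print(linearize(("Sg", "ville")))   # la ville
```
</pre>
In this model no form is computed at generation time: every word is selected
from a precomputed table, so the work per word does not depend on how
irregular its inflection is.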
<h2>The resource grammar infrastructure</h2>
As explained above, the application grammarian's view on resource grammars
is through API modules. They are collections of type signatures of functions.
It is the task of linguists to define these functions.
The definitions are in the end given
in the <b>resource grammar infrastructure</b>.
<p>
We have divided the core resource grammar for each language <tt>X</tt>
into the following parts:
<ul>
<li> Type system: <tt>TypesX.gf</tt>
<li> Morphology: <tt>MorphoX.gf</tt>
<li> Syntax: <tt>SyntaxX.gf</tt>
</ul>
To get the most powerful resource grammar for each language, one can use
these files directly. To view these modules, documents can be generated
by <tt>gfdoc</tt>.
<h2>Compiling and using the resource</h2>
If you want to use the resource grammars,
you should download and unpack the
<a href="gf-resource.tgz">GF grammar package</a> and go to the
directory <tt>newresource</tt>.
At Chalmers, however, we keep the resource grammars in the
GF CVS archive, in the directory <tt>grammars/newresource/</tt>,
and it is better to get them from there: the package available on the web
is usually not quite up to date.
<p>
To compile the resource into precompiled modules, for all languages, type
<pre>
make
</pre>
in the <tt>resource/</tt> directory.
What you get is a set of <tt>gfr</tt> and <tt>gfc</tt> files.
You never need to consult any of these files;
for the most part it is enough to look into <tt>resource.Abs.gf</tt> for the type
signatures of syntactic structures.
<h2>Examples of using the resource grammars</h2>
<h3>A test suite</h3>
The grammars <tt>TestX</tt> define a few expressions of each
lexical category and make it possible to test linearization, parsing,
random generation, and editing.
<h3>A database query language</h3>
The grammars <tt>database/(Database|Restaurant)X.gf</tt>
make use of the resource. The <tt>RestaurantX.gf</tt>
grammars are just one possible application building on the generic
<tt>DatabaseX.gf</tt> grammars.
<h2>Functional morphology</h2>
Even though GF is a useful language for describing syntax and semantics, it
is not the optimal choice for morphology.
One reason is the absence of low-level
programming, such as string matching. Another reason is efficiency.
In connection with the resource grammar project, we have started another
project, <b>functional morphology</b>, which uses Haskell to implement
morphology. See the
<a href="http://www.cs.chalmers.se/~markus/FM"><tt>Functional Morphology Homepage</tt></a>
for more information.
<h2>Further reading</h2>
The paper <i>Modular Grammar Engineering in GF</i> and
a set of slides on the same topic.
</body>
</html>