mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-12 14:29:31 -06:00
516 lines
17 KiB
HTML
<html>
|
|
|
|
<body bgcolor="#FFFFFF" text="#000000" >
|
|
|
|
<center>
|
|
<img SRC="./gf-logo.gif">
|
|
|
|
<h1>The GF Resource Grammar Library</h1>
|
|
|
|
|
|
<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a>
|
|
2002-2004
|
|
|
|
<p>
|
|
|
|
Version 0.7: <a href="gf-resource.tgz">source package</a>.
|
|
|
|
<p>
|
|
|
|
Current languages: English, Finnish, French, German, Italian, Russian, Swedish.
|
|
|
|
</center>
|
|
|
|
<font size=2>
|
|
<b>News</b> <br>
|
|
|
|
24/6/2004 Version 0.7 released together with the release of GF 2.0.
|
|
|
|
<br>
|
|
|
|
13/4/2004 Version 0.6 completed using the module system of GF 2. Also an
|
|
extended coverage. The files are placed in separate subdirectories (one
|
|
per language) and have different names than before, so that file names
|
|
(without the extension <tt>.gf</tt>) are also legal module names.
|
|
|
|
<br>
|
|
|
|
15/8/2003 Version 0.4 with Finnish added. Some updates of the Russian modules.
|
|
|
|
<br>
|
|
|
|
25/6/2003 Release of GF 1.2 making it more efficient to work with
|
|
resource grammars. See
|
|
<a href="http://www.cs.chalmers.se/~aarne/GF/doc/gf-1.2.html">highlights</a>.
|
|
Also <a href="gf-resource-0.3.tgz">source package version 0.3</a>
|
|
with some bug fixes.
|
|
|
|
<br>
|
|
|
|
5/6/2003. Russian resource modules by
|
|
<a href="http://www.cs.chalmers.se/~janna">Janna Khegai</a>.
|
|
Cyrillic strings in the files <tt>*.RusU.gf</tt> use UTF-8 encoding,
|
|
which is automatically detected by the Java GUI to GF. However, in web
|
|
browsers the encoding must be set manually.
|
|
|
|
<br>
|
|
|
|
3/6/2003. New version of this document, with separate sections
|
|
on application and resource grammarians' views and
|
|
added documentation
|
|
on the type system of each language <tt>X</tt>
|
|
in <tt>CombinationsX.gf</tt>.
|
|
|
|
<br>
|
|
|
|
23/5/2003. High-level lexicon access also in
|
|
French,
|
|
Italian,
|
|
and
|
|
Swedish.
|
|
|
|
<br>
|
|
|
|
23/5/2003.
|
|
Italian grammar based on generic Romance modules, shared with French.
|
|
|
|
<br>
|
|
|
|
14/4/2003. High-level access to define a lexicon in
|
|
English and German.
|
|
|
|
<p>
|
|
|
|
<i>
|
|
<b>Notice</b>. You need GF Version 2.0 or later
|
|
to work with these resource grammars.
|
|
It is available from the
|
|
<a href="http://www.cs.chalmers.se/~aarne/GF/">GF home page</a>.
|
|
</i>
|
|
</font>
|
|
|
|
<p>
|
|
|
|
|
|
<h2>Introduction</h2>
|
|
|
|
As programs in general can be divided into
|
|
application programs and library programs,
|
|
GF grammars can be divided into
|
|
<b>application grammars</b> and
|
|
<b>resource grammars</b>.
|
|
An application grammar is typically built around
|
|
a semantic model, which is formalized as the abstract
|
|
syntax of the language. Concrete syntax defines
|
|
a mapping from the abstract syntax into English or
|
|
Swedish or some other language.
|
|
A resource grammar is not based on semantics, but its
|
|
purpose is to define the linguistic "surface" structures
|
|
of some language. The availability of these structures makes it easier to
|
|
write application grammars.
|
|
|
|
|
|
|
|
<h3>Abstraction level</h3>
|
|
|
|
Resource grammars
|
|
<b>raise the level of abstraction in concrete syntax</b>.
|
|
The author of an application grammar is freed from thinking
|
|
about inflection, word order, etc., and can instead use structured
|
|
tree-like objects in linearization rules. For instance, to
|
|
express the linearization of the arithmetical predicate <i>even</i>
|
|
in French, she no longer has to write
|
|
<pre>
  lin Even x = {s =
    table {
      m => x.s ++
           table {Ind => "est" ; Subj => "soit"} ! m ++
           table {Masc => "pair" ; Fem => "paire"} ! x.g
      }
    } ;
</pre>
|
|
but simply
|
|
<pre>
  lin Even = predA1 (adjReg "pair") ;
</pre>
|
|
The author of the French resource grammar will have defined the
|
|
functions <tt>predA1</tt> and <tt>adjReg</tt> in such a way that
|
|
they can be used in all applications. The type checker of the GF grammar
|
|
compiler guarantees that only grammatically correct combinations
|
|
can be formed by the resource grammar functions.
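<p>
As a rough illustration of what a resource grammarian might write behind such
functions, here is a minimal, self-contained sketch. The module name
<tt>ToyFra</tt>, its simplified record types, and the definitions themselves
are assumptions made for this example only; they are not the library's actual
code, which has to cover much more (tense, negation, irregular adjectives, and so on).
<pre>
  resource ToyFra = {
    param
      Gender = Masc | Fem ;
      Mood   = Ind | Subj ;
    oper
      -- Toy linearization types, far simpler than those of the real library.
      NP  : Type = {s : Str ; g : Gender} ;
      Adj : Type = {s : Gender => Str} ;
      S   : Type = {s : Mood => Str} ;

      -- Regular adjective: the feminine form adds "e" to the masculine.
      adjReg : Str -> Adj = \pair ->
        {s = table {Masc => pair ; Fem => pair + "e"}} ;

      -- One-place adjectival predication: the copula in the requested mood,
      -- followed by the adjective form agreeing with the subject's gender.
      predA1 : Adj -> NP -> S = \adj,np ->
        {s = table {
           m => np.s ++
                table {Ind => "est" ; Subj => "soit"} ! m ++
                adj.s ! np.g
           }
        } ;
  }
</pre>
With these toy definitions, <tt>predA1 (adjReg "pair")</tt> applied to a subject
builds the same table as the hand-written rule above. The tables and selections
are confined to the resource module, so the application grammar itself contains none.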
|
|
|
|
|
|
|
|
<h3>Unity of language</h3>
|
|
|
|
In addition to high abstraction level, reusability, and the division
|
|
of labour, resource grammars have the virtue of making sense of the
|
|
<b>unity of a language</b> such as English: while application grammars
|
|
depend on applications, resource grammars depend on language.
|
|
What is more, resource grammars for related languages can
|
|
share much of their code: to what degree this can be done gives
|
|
a measure of how related the languages are.
|
|
Thus we find resource grammars to be an interesting linguistic
|
|
project in its own right.
|
|
|
|
|
|
<h3>Semantics</h3>
|
|
|
|
We leave it open whether we can also explain the <b>semantics</b>
of resource grammars at a general level. The philosophy of GF,
|
|
inherited from logical frameworks,
|
|
is that semantics is only given to
|
|
application grammars. (You can also compare application grammars to Wittgenstein's
|
|
"language games").
|
|
This view gives us a lot of freedom in formulating resource grammars.
|
|
When describing them, we sometimes say that such-and-such a construction
|
|
is likely to be ruled out by semantic reasons; what we mean is that
|
|
this will actually happen in application grammars; we do <i>not</i>
|
|
mean that GF has no semantic rules.
|
|
An example is the question
|
|
<i>From which city is every number even or odd?</i>.
|
|
The resource grammar makes it possible to form this question,
|
|
but it can hardly be correct in any application grammar that has
|
|
a rigorous semantics.
|
|
|
|
|
|
|
|
<h2>Programmer's view on resource grammars</h2>
|
|
|
|
The resource grammar library has a hierarchical structure. Its main layers are
|
|
<ul>
|
|
<li> The language-independent <b>core resource API</b>,
<a href="Combinations.html"><tt>Combinations.gf</tt></a> and
<a href="Structural.html"><tt>Structural.gf</tt></a>.
<li> The language-dependent lexical paradigm modules
<tt>ParadigmsX.gf</tt>.
|
|
<li> The <b>derived resource libraries</b>, some of which are
|
|
language-dependent, some of which aren't.
|
|
<li> The language-dependent <b>resource infrastructure</b>, to be described below.
|
|
</ul>
|
|
The resource infrastructure should not be needed by application grammarians: it should
|
|
be enough to use the core resource API, the paradigm modules, and the derived libraries. If
|
|
this is not the case, the best solution is to extend the derived resource
|
|
libraries or create new ones.
|
|
|
|
|
|
|
|
<h3>Grammaticality guarantee via data abstraction</h3>
|
|
|
|
An important principle is that
|
|
<ul>
|
|
<li> the core resource API and the derived resource libraries guarantee
|
|
that all type-correct uses of them preserve grammaticality.
|
|
</ul>
|
|
This principle is simultaneously a guidance for resource grammarians
|
|
and an argument for the application grammarian to use these libraries.
|
|
What we mean by "only using the libraries" is that
|
|
<ul>
|
|
<li> all <tt>lin</tt> and
|
|
<tt>lincat</tt> rules are built solely from library functions and
|
|
argument variables.
|
|
</ul>
|
|
Thus for instance no records, tables, selections or projections should appear
|
|
in the rules. What we have achieved then is <b>total data abstraction</b>,
|
|
and the grammaticality guarantee can be given.
|
|
|
|
<p>
|
|
|
|
Since the resource grammars are work in progress, their coverage is not
|
|
yet sufficient for complete data abstraction. In addition, there may of course
|
|
be bugs in the resource grammars that destroy grammaticality. The GF group is
|
|
grateful for bug reports, requests, and contributions!
|
|
|
|
<p>
|
|
|
|
The most important exception to total data abstraction in practice
|
|
concerns resource lexica. Since it is impossible to have a
|
|
full coverage of all the words in a language, users often have to introduce
|
|
their own lexical entries, and thereby use literal strings in their GF code.
|
|
The safest and most convenient way of doing this is via functions
|
|
defined in <tt>ParadigmsX.gf</tt> files. Using these functions guarantees
|
|
that the lexical entries created are type-correct. But nothing guards
|
|
against misspelling a word, picking a wrong inflectional pattern, or
|
|
a wrong inherent feature (such as the gender of a French noun).
|
|
|
|
|
|
|
|
<h3>The resource grammar documentation in <tt>gfdoc</tt></h3>
|
|
|
|
All documented GF grammars linked from this page
|
|
have been written in GF and then translated to HTML
|
|
using a light-weight documentation tool,
|
|
<tt>gfdoc</tt>. The tool is available as a part of the GF
|
|
package. The program also has the
|
|
flag <tt>-latex</tt>, which produces output in Latex instead of
|
|
HTML.
|
|
|
|
|
|
|
|
<h3>The core resource API</h3>
|
|
|
|
The API is divided into two modules, <tt>Combinations</tt> and
|
|
its extension <tt>Structural</tt>.
|
|
|
|
<p>
|
|
|
|
The module <a href="Combinations.html"><tt>Combinations</tt></a>
|
|
gives the core resource type signatures of phrasal categories and
|
|
syntactic combination rules, together with some explanations
|
|
and examples. The examples are so far only in English, but their
|
|
equivalents are available in all of the languages for which the
|
|
API has been implemented.
|
|
|
|
<p>
|
|
|
|
The module <a href="Structural.html"><tt>Structural</tt></a>
|
|
defines structural words such as determiners, pronouns,
|
|
prepositions, and conjunctions.
|
|
|
|
<p>
|
|
|
|
The file <tt>Structural.gf</tt> cannot be imported directly, but only
via the generated files <tt>ResourceX.gf</tt> for each language <tt>X</tt>.
|
|
In these files, the <tt>fun/lin</tt> and <tt>cat/lincat</tt> judgements have been
|
|
translated into <tt>oper</tt> judgements.
|
|
|
|
|
|
|
|
<h3>The lexical paradigm modules</h3>
|
|
|
|
The lexical paradigm modules define, for
|
|
each lexical category, a <b>worst-case macro</b> for adding words
|
|
of that category by giving a sufficient number of characteristic
|
|
forms. In addition, the most common <b>regular paradigms</b> are
|
|
included, where it is enough just to give one form to generate
|
|
all the others.
|
|
|
|
<p>
|
|
|
|
For example, the English paradigm module has the worst-case macro for nouns,
|
|
<pre>
  mkN : (man,men,man's,men's : Str) -> Gender -> N ;
</pre>
|
|
taking four forms and a gender (<tt>human</tt> or <tt>nonhuman</tt>,
|
|
as is also explained in the module). Its application
|
|
<pre>
  mkN "mouse" "mice" "mouse's" "mice's" nonhuman
</pre>
|
|
defines all information that is needed for the noun <i>mouse</i>.
|
|
There are also some regular patterns, for instance,
|
|
<pre>
  nReg  : Str -> Gender -> N ;  -- dog, dogs
  nKiss : Str -> Gender -> N ;  -- kiss, kisses
</pre>
|
|
examples of which are
|
|
<pre>
  nReg "car" nonhuman
  nKiss "waitress" human
</pre>
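<p>
One way such paradigm functions could be defined is sketched below. The module
name <tt>ToyParadigmsEng</tt>, the parameter types, and the record type chosen
for <tt>N</tt> are assumptions for illustration; the definitions in the real
<tt>ParadigmsEng.gf</tt> may well differ.
<pre>
  resource ToyParadigmsEng = {
    param
      Number = Sg | Pl ;
      Case   = Nom | Gen ;
      Gender = Human | NonHuman ;
    oper
      -- Assumed noun type: a two-dimensional inflection table plus inherent gender.
      N : Type = {s : Number => Case => Str ; g : Gender} ;

      -- Names for the gender values, so that users need not see the constructors.
      human    : Gender = Human ;
      nonhuman : Gender = NonHuman ;

      -- Worst-case macro: all four forms are given explicitly.
      mkN : (man,men,man's,men's : Str) -> Gender -> N =
        \man,men,mans,mens,g -> {
          s = table {
            Sg => table {Nom => man ; Gen => mans} ;
            Pl => table {Nom => men ; Gen => mens}
            } ;
          g = g
        } ;

      -- Regular paradigms: one form suffices, the rest are computed from it.
      nReg : Str -> Gender -> N = \dog ->
        mkN dog (dog + "s") (dog + "'s") (dog + "s'") ;

      nKiss : Str -> Gender -> N = \kiss ->
        mkN kiss (kiss + "es") (kiss + "'s") (kiss + "es'") ;
  }
</pre>
Defining the regular paradigms through the worst-case macro keeps the record
structure in one place: if the type of <tt>N</tt> changes, only <tt>mkN</tt>
has to be rewritten.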
|
|
|
|
<p>
|
|
|
|
Here are the documented versions of the paradigm modules:
|
|
<ul>
|
|
<li> English: <a href="ParadigmsEng.html"><tt>ParadigmsEng.gf</tt></a>
|
|
<li> Finnish: <a href="ParadigmsFin.html"><tt>ParadigmsFin.gf</tt></a>
|
|
<li> French: <a href="ParadigmsFra.html"><tt>ParadigmsFra.gf</tt></a>
|
|
<li> German: <a href="ParadigmsGer.html"><tt>ParadigmsDeu.gf</tt></a>
|
|
<li> Italian: <a href="ParadigmsIta.html"><tt>ParadigmsIta.gf</tt></a>
|
|
<li> Russian: <a href="ParadigmsRus.html"><tt>ParadigmsRus.gf</tt></a>
|
|
<li> Swedish: <a href="ParadigmsSwe.html"><tt>ParadigmsSwe.gf</tt></a>
|
|
</ul>
|
|
|
|
|
|
<h3>The derived resource libraries</h3>
|
|
|
|
The core resource grammar is minimal in the sense that it defines the
|
|
smallest syntactic combinations and has no redundancy. For applications, it
|
|
is usually more convenient to use combinations of the minimal rules.
|
|
Some such combinations are given in the <b>predication library</b>,
|
|
which defines the simultaneous applications of one- and two-place
|
|
verbs and adjectives to all their argument noun phrases. It also
|
|
defines some other constructions useful for logical and mathematical
|
|
applications.
|
|
|
|
<p>
|
|
|
|
The API of the predication library is the module
|
|
<a href="Predication.html"><tt>Predication</tt></a>.
|
|
What is imported is one of the language-dependent modules,
|
|
<tt>X/PredicationX</tt> for each language <tt>X</tt>.
|
|
|
|
|
|
|
|
<h3>The language-dependent type systems</h3>
|
|
|
|
Sometimes it is useful for the application grammarian to know what the
|
|
language-dependent linearization types are for each category in the
|
|
core resource. These types are defined in the files <tt>CombinationsX.gf</tt>.
|
|
They can be translated to documents by <tt>gfdoc</tt> if desired.
|
|
<!--
|
|
|
|
<ul>
|
|
<li> English: <a href="CombinationsEng.html"><tt>CombinationsEng.gf</tt></a>
|
|
<li> Finnish: <a href="CombinationsFin.html"><tt>CombinationsFin.gf</tt></a>
|
|
<li> French: <a href="CombinationsFra.html"><tt>CombinationsFra.gf</tt></a>
|
|
<li> German: <a href="CombinationsDeu.html"><tt>CombinationsDeu.gf</tt></a>
|
|
<li> Italian: <a href="CombinationsIta.html"><tt>CombinationsIta.gf</tt></a>
|
|
<li> Russian: <a href="CombinationsRus.html"><tt>CombinationsRusU.gf</tt></a>
|
|
<li> Swedish: <a href="CombinationsSwe.html"><tt>CombinationsSwe.gf</tt></a>
|
|
</ul>
|
|
-->
|
|
|
|
<p>
|
|
|
|
For the sake of uniformity, we have tried to use the same names
|
|
of parameter types when applicable. For instance, the gender parameter
|
|
is called <tt>Gender</tt> in every grammar, even though its values
|
|
differ. The definitions of the parameter
|
|
types are given in the modules <tt>TypesX</tt>.
|
|
The application grammarian following the complete abstraction principle
|
|
does not open these modules and cannot hence
|
|
use the parameter constructors directly, but rather the
|
|
names defined in <tt>ParadigmsX</tt>.
|
|
|
|
|
|
|
|
<h2>Linguist's view on resource grammars</h2>
|
|
|
|
<h3>GF and other grammar formalisms</h3>
|
|
|
|
Linguists in particular might be interested in resource
|
|
grammars for their own sake, not as a basis for applications.
|
|
Since few linguists are so far familiar with GF, we refer to the
|
|
<a href="http://www.cs.chalmers.se/~aarne/GF/">GF Homepage</a>
|
|
and especially to the
|
|
<a href="http://www.cs.chalmers.se/~aarne/GF/Tutorial/">GF Tutorial</a>.
|
|
What comes here is a brief summary of the relation of GF to
|
|
other record-based formalisms.
|
|
|
|
<p>
|
|
|
|
The records of GF are much like feature structures in PATR or HPSG.
|
|
The main differences are that
|
|
<ul>
|
|
<li> GF has a type system inherited from
|
|
functional programming languages;
|
|
<li> GF records are primarily obtained as linearizations of trees, not
|
|
as parses of strings.
|
|
</ul>
|
|
The latter difference explains why a GF record typically carries more
|
|
information than a feature structure. For instance, the record describing
|
|
the French noun <i>cheval</i> is
|
|
<pre>
  {s = table {Sg => "cheval" ; Pl => "chevaux"} ; g = Masc} ;
</pre>
|
|
showing the full inflection table of the (abstract) noun <i>cheval</i>.
|
|
A PATR record
|
|
for the French word <i>cheval</i> would be
|
|
<pre>
  {s = "cheval" ; n = Sg ; g = Masc} ;
</pre>
|
|
showing just the information that can be gathered from the (concrete)
|
|
string <i>cheval</i>.
|
|
There is a rather straightforward sense in which the PATR record is an
|
|
<b>instance</b> of the GF record.
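<p>
The instance relation can be made concrete with a small sketch: fixing the
number selects one row of the inflection table. The module name
<tt>ToyCheval</tt> and the oper names are hypothetical, introduced only for this example.
<pre>
  resource ToyCheval = {
    param
      Number = Sg | Pl ;
      Gender = Masc | Fem ;
    oper
      -- The GF record: the full inflection table plus the inherent gender.
      cheval : {s : Number => Str ; g : Gender} =
        {s = table {Sg => "cheval" ; Pl => "chevaux"} ; g = Masc} ;

      -- The PATR-style instance: the number is fixed to Sg,
      -- and the matching form is selected from the table.
      chevalSg : {s : Str ; n : Number ; g : Gender} =
        {s = cheval.s ! Sg ; n = Sg ; g = cheval.g} ;
  }
</pre>
Here <tt>chevalSg</tt> evaluates to a record with the same content as the PATR
record above; choosing <tt>Pl</tt> instead would give the corresponding record
for <i>chevaux</i>.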
|
|
|
|
<p>
|
|
|
|
When generating language from syntax trees (or from logical formulas via
|
|
syntax trees), the record containing full inflection tables is an efficient
|
|
(linear-time) method of producing the correct forms.
|
|
This is important when text is generated in real time in
|
|
an interactive system.
|
|
|
|
|
|
|
|
<h2>The resource grammar infrastructure</h2>
|
|
|
|
As explained above, the application grammarian's view on resource grammars
|
|
is through API modules. They are collections of type signatures of functions.
|
|
It is the task of linguists to define these functions.
|
|
The definitions are in the end given
|
|
in the <b>resource grammar infrastructure</b>.
|
|
|
|
<p>
|
|
|
|
We have divided the core resource grammar for each language <tt>X</tt>
|
|
into the following parts:
|
|
<ul>
|
|
<li> Type system: <tt>TypesX.gf</tt>
|
|
<li> Morphology: <tt>MorphoX.gf</tt>
|
|
<li> Syntax: <tt>SyntaxX.gf</tt>
|
|
</ul>
|
|
To get the most powerful resource grammar for each language, one can use
|
|
these files directly. To view these modules, documents can be generated
|
|
by <tt>gfdoc</tt>.
|
|
|
|
|
|
<h2>Compiling and using the resource</h2>
|
|
|
|
If you want to use the resource grammars,
|
|
you should download and unpack the
|
|
<a href="gf-resource.tgz">GF grammar package</a> and go to the
|
|
directory <tt>newresource</tt>.
|
|
At Chalmers, however, we keep the resource grammars in the
|
|
GF CVS archive, in the directory <tt>grammars/newresource/</tt>,
|
|
and it is best to get them from there. The package available on the web
is usually not quite up to date.
|
|
|
|
<p>
|
|
|
|
To compile the resource into precompiled modules, for all languages, type
|
|
<pre>
  make
</pre>
|
|
in the <tt>resource/</tt> directory.
|
|
What you get is a set of <tt>gfr</tt> and <tt>gfc</tt> files.
|
|
You never need to consult these files directly;
for the type signatures of syntactic structures,
look in <tt>resource.Abs.gf</tt> instead.
|
|
|
|
|
|
|
|
<h2>Examples of using the resource grammars</h2>
|
|
|
|
<h3>A test suite</h3>
|
|
|
|
The grammars <tt>TestX</tt> define a few expressions of each
|
|
lexical category and make it possible to test linearization, parsing,
|
|
random generation, and editing.
|
|
|
|
|
|
<h3>A database query language</h3>
|
|
|
|
The grammars <tt>database/(Database|Restaurant)X.gf</tt>
|
|
make use of the resource. The <tt>RestaurantX.gf</tt>
|
|
grammars are just one possible application building on the generic
|
|
<tt>DatabaseX.gf</tt> grammars.
|
|
|
|
|
|
|
|
|
|
<h2>Functional morphology</h2>
|
|
|
|
Even though GF is a useful language for describing syntax and semantics, it
|
|
is not the optimal choice for morphology.
|
|
One reason is the absence of low-level
|
|
programming, such as string matching. Another reason is efficiency.
|
|
In connection with the resource grammar project, we have started another
|
|
project, <b>functional morphology</b>, which uses Haskell to implement
|
|
morphology. See the
|
|
<a href="http://www.cs.chalmers.se/~markus/FM"><tt>Functional Morphology Homepage</tt></a>
|
|
for more information.
|
|
|
|
|
|
|
|
<h2>Further reading</h2>
|
|
|
|
The paper <i>Modular Grammar Engineering in GF</i> and
a set of slides on the same topic.
|
|
|
|
</body>
|
|
</html>
|
|
|