mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-28 05:52:51 -06:00
resource doc
lib/resource-0.6/doc/gf-resource.html
<html>

<body bgcolor="#FFFFFF" text="#000000" >

<center>
<img SRC="./gf-logo.gif">

<h1>The GF Resource Grammar Library</h1>


<a href="http://www.cs.chalmers.se/~aarne">Aarne Ranta</a>
2002-2004

<p>

Version 0.7: <a href="gf-resource.tgz">source package</a>.

<p>

Current languages: English, Finnish, French, German, Italian, Russian, Swedish.

</center>

<font size=2>
<b>News</b> <br>

24/6/2004 Version 0.7 released together with the release of GF 2.0.

<br>

13/4/2004 Version 0.6 completed using the module system of GF 2, with
extended coverage. The files are placed in separate subdirectories (one
per language) and have different names than before, so that file names
(without the extension <tt>.gf</tt>) are also legal module names.

<br>

15/8/2003 Version 0.4 with Finnish added. Some updates to the Russian modules.

<br>

25/6/2003 Release of GF 1.2 making it more efficient to work with
resource grammars. See
<a href="http://www.cs.chalmers.se/~aarne/GF/doc/gf-1.2.html">highlights</a>.
Also <a href="gf-resource-0.3.tgz">source package version 0.3</a>
with some bug fixes.

<br>

5/6/2003. Russian resource modules by
<a href="http://www.cs.chalmers.se/~janna">Janna Khegai</a>.
Cyrillic strings in the files <tt>*.RusU.gf</tt> use UTF-8 encoding,
which is automatically detected by the Java GUI of GF. However, in web
browsers the encoding must be set manually.

<br>

3/6/2003. New version of this document, with separate sections
on the application and resource grammarians' views and
added documentation
on the type system of each language <tt>X</tt>
in <tt>CombinationsX.gf</tt>.

<br>

23/5/2003. High-level lexicon access also in
French,
Italian,
and
Swedish.

<br>

23/5/2003.
Italian grammar based on generic Romance modules, shared with French.

<br>

14/4/2003. High-level access to define a lexicon in
English and German.

<p>

<i>
<b>Notice</b>. You need GF Version 2.0 or later
to work with these resource grammars.
It is available from the
<a href="http://www.cs.chalmers.se/~aarne/GF/">GF home page</a>.
</i>
</font>

<p>


<h2>Introduction</h2>

Just as programs in general can be divided into
application programs and library programs,
GF grammars can be divided into
<b>application grammars</b> and
<b>resource grammars</b>.
An application grammar is typically built around
a semantic model, which is formalized as the abstract
syntax of the language. The concrete syntax defines
a mapping from the abstract syntax into English,
Swedish, or some other language.
A resource grammar is not based on semantics; its
purpose is to define the linguistic "surface" structures
of some language. The availability of these structures makes it easier to
write application grammars.


<h3>Abstraction level</h3>

Resource grammars
<b>raise the level of abstraction in concrete syntax</b>.
The author of an application grammar is freed from thinking
about inflection, word order, etc., and can instead use structured
tree-like objects in linearization rules. For instance, to
express the linearization of the arithmetical predicate <i>even</i>
in French, she no longer has to write
<pre>
  lin Even x = {s =
    table {
      m => x.s ++
           table {Ind => "est" ; Subj => "soit"} ! m ++
           table {Masc => "pair" ; Fem => "paire"} ! x.g
      }
    } ;
</pre>
but simply
<pre>
  lin Even = predA1 (adjReg "pair") ;
</pre>
The author of the French resource grammar will have defined the
functions <tt>predA1</tt> and <tt>adjReg</tt> in such a way that
they can be used in all applications. The type checker of the GF grammar
compiler guarantees that only grammatically correct combinations
can be formed by the resource grammar functions.
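
<p>

The contrast between the two styles can be sketched outside GF. The following
is a minimal Python model, not the resource library itself: the names
<tt>even_by_hand</tt>, <tt>adj_reg</tt>, and <tt>pred_a1</tt> are hypothetical
stand-ins, records are modelled as dictionaries, and the rule that a regular
feminine adjective adds <i>-e</i> is a toy assumption made for this example.

```python
# Hypothetical Python model of the two linearization styles.
# Nothing here is the GF library; the names only mirror the GF snippets.

COPULA = {"Ind": "est", "Subj": "soit"}  # mood -> form of "etre"


def even_by_hand(x):
    """Low-level style: the mood and gender tables are spelled out in the rule."""
    adjective = {"Masc": "pair", "Fem": "paire"}
    return {"s": {m: f'{x["s"]} {COPULA[m]} {adjective[x["g"]]}' for m in COPULA}}


def adj_reg(masc):
    """Toy regular adjective paradigm (assumption): feminine = masculine + "e"."""
    return {"Masc": masc, "Fem": masc + "e"}


def pred_a1(adj):
    """Library style: turn an adjective into a function from subject to sentence."""
    def apply(x):
        return {"s": {m: f'{x["s"]} {COPULA[m]} {adj[x["g"]]}' for m in COPULA}}
    return apply


# The application grammarian only writes this line.
even = pred_a1(adj_reg("pair"))

somme = {"s": "la somme", "g": "Fem"}  # a noun phrase with inherent gender
print(even(somme)["s"]["Ind"])         # la somme est paire
```

In this model the agreement logic lives in one place, <tt>pred_a1</tt>, so
every application that uses it gets the same behaviour without restating the
tables.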

<h3>Unity of language</h3>

In addition to a high abstraction level, reusability, and the division
of labour, resource grammars have the virtue of making sense of the
<b>unity of a language</b> such as English: while application grammars
depend on applications, resource grammars depend on language.
What is more, resource grammars for related languages can
share much of their code: the degree to which this can be done gives
a measure of how related the languages are.
Thus we find resource grammars to be an interesting linguistic
project in its own right.

<h3>Semantics</h3>

We leave it open whether we can also explain the <b>semantics</b>
of resource grammars at a general level. The philosophy of GF,
inherited from logical frameworks,
is that semantics is only given to
application grammars. (You can also compare application grammars to Wittgenstein's
"language games".)
This view gives us a lot of freedom in formulating resource grammars.
When describing them, we sometimes say that such-and-such a construction
is likely to be ruled out for semantic reasons; what we mean is that
this will actually happen in application grammars; we do <i>not</i>
mean that GF has no semantic rules.
An example is the question
<i>From which city is every number even or odd?</i>
The resource grammar makes it possible to form this question,
but it can hardly be correct in any application grammar that has
a rigorous semantics.


<h2>Programmer's view on resource grammars</h2>

The resource grammar library has a hierarchical structure. Its main layers are
<ul>
<li> The language-independent <b>core resource API</b>,
     <a href="Combinations.html"><tt>Combinations.gf</tt></a> and
     <a href="Structural.html"><tt>Structural.gf</tt></a>.
<li> The language-dependent lexical paradigm modules
     <tt>ParadigmsX.gf</tt>.
<li> The <b>derived resource libraries</b>, some of which are
     language-dependent, some of which aren't.
<li> The language-dependent <b>resource infrastructure</b>, to be described below.
</ul>
The resource infrastructure should not be needed by application grammarians: it should
be enough to use the core resource API, the paradigm modules, and the derived libraries. If
this is not the case, the best solution is to extend the derived resource
libraries or create new ones.


<h3>Grammaticality guarantee via data abstraction</h3>

An important principle is that
<ul>
<li> the core resource API and the derived resource libraries guarantee
     that all type-correct uses of them preserve grammaticality.
</ul>
This principle is simultaneously a guideline for resource grammarians
and an argument for the application grammarian to use only these libraries.
Using only the libraries means that
<ul>
<li> all <tt>lin</tt> and
     <tt>lincat</tt> rules are built solely from library functions and
     argument variables.
</ul>
Thus, for instance, no records, tables, selections or projections should appear
in the rules. What we have then achieved is <b>total data abstraction</b>,
and the grammaticality guarantee can be given.

<p>

Since the resource grammars are a work in progress, their coverage is not
yet sufficient for complete data abstraction. In addition, there may of course
be bugs in the resource grammars that destroy grammaticality. The GF group is
grateful for bug reports, requests, and contributions!

<p>

The most important exception to total data abstraction in practice
concerns resource lexica. Since it is impossible to have
full coverage of all the words in a language, users often have to introduce
their own lexical entries, and thereby use literal strings in their GF code.
The safest and most convenient way of doing this is via functions
defined in the <tt>ParadigmsX.gf</tt> files. Using these functions guarantees
that the lexical entries created are type-correct. But nothing guards
against misspelling a word, picking a wrong inflectional pattern, or
a wrong inherent feature (such as the gender of a French noun).


<h3>The resource grammar documentation in <tt>gfdoc</tt></h3>

All documented GF grammars linked from this page
have been written in GF and then translated to HTML
using a light-weight documentation tool,
<tt>gfdoc</tt>. The tool is available as a part of the GF
package. The program also has the
flag <tt>-latex</tt>, which produces output in LaTeX instead of
HTML.


<h3>The core resource API</h3>

The API is divided into two modules, <tt>Combinations</tt> and
its extension <tt>Structural</tt>.

<p>

The module <a href="Combinations.html"><tt>Combinations</tt></a>
gives the core resource type signatures of phrasal categories and
syntactic combination rules, together with some explanations
and examples. The examples are so far only in English, but their
equivalents are available in all of the languages for which the
API has been implemented.

<p>

The module <a href="Structural.html"><tt>Structural</tt></a>
defines structural words such as determiners, pronouns,
prepositions, and conjunctions.

<p>

The file <tt>Structural.gf</tt> cannot be imported directly, but only
via the generated files <tt>ResourceX.gf</tt> for each language <tt>X</tt>.
In these files, the <tt>fun/lin</tt> and <tt>cat/lincat</tt> judgements have been
translated into <tt>oper</tt> judgements.


<h3>The lexical paradigm modules</h3>

The lexical paradigm modules define, for
each lexical category, a <b>worst-case macro</b> for adding words
of that category by giving a sufficient number of characteristic
forms. In addition, the most common <b>regular paradigms</b> are
included, where it is enough to give just one form to generate
all the others.

<p>

For example, the English paradigm module has the worst-case macro for nouns,
<pre>
  mkN : (man,men,man's,men's : Str) -> Gender -> N ;
</pre>
taking four forms and a gender (<tt>human</tt> or <tt>nonhuman</tt>,
as is also explained in the module). Its application
<pre>
  mkN "mouse" "mice" "mouse's" "mice's" nonhuman
</pre>
defines all the information that is needed for the noun <i>mouse</i>.
There are also some regular patterns, for instance,
<pre>
  nReg  : Str -> Gender -> N ;  -- dog, dogs
  nKiss : Str -> Gender -> N ;  -- kiss, kisses
</pre>
examples of which are
<pre>
  nReg  "car" nonhuman
  nKiss "waitress" human
</pre>
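
<p>

To make the relation between the worst-case macro and the regular paradigms
concrete, here is a rough Python sketch. It is a model for illustration only
and does not reproduce <tt>ParadigmsEng.gf</tt>: the names <tt>Noun</tt>,
<tt>mk_n</tt>, <tt>n_reg</tt>, and <tt>n_kiss</tt> are hypothetical, and the way
the genitive forms are built in the regular paradigms is an assumption of this
sketch.

```python
from typing import NamedTuple

GENDERS = {"human", "nonhuman"}


class Noun(NamedTuple):
    """The four forms plus the inherent gender, as in the mkN signature."""
    sg: str
    pl: str
    sg_gen: str
    pl_gen: str
    gender: str


def mk_n(sg, pl, sg_gen, pl_gen, gender):
    """Worst-case macro: every form is given explicitly."""
    if gender not in GENDERS:
        # Models the type check: only declared gender values are accepted.
        raise ValueError(f"unknown gender: {gender}")
    return Noun(sg, pl, sg_gen, pl_gen, gender)


def n_reg(stem, gender):
    """Regular paradigm (dog, dogs); genitive forms are an assumption."""
    return mk_n(stem, stem + "s", stem + "'s", stem + "s'", gender)


def n_kiss(stem, gender):
    """Sibilant paradigm (kiss, kisses); genitive forms are an assumption."""
    return mk_n(stem, stem + "es", stem + "'s", stem + "es'", gender)


mouse = mk_n("mouse", "mice", "mouse's", "mice's", "nonhuman")
car = n_reg("car", "nonhuman")
waitress = n_kiss("waitress", "human")
print(car.pl, waitress.pl, mouse.pl)  # cars waitresses mice
```

As in the GF modules, the check only covers what the types can express: a
misspelled stem or a wrong choice between <tt>n_reg</tt> and <tt>n_kiss</tt>
goes through unnoticed.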

<p>

Here are the documented versions of the paradigm modules:
<ul>
<li> English: <a href="ParadigmsEng.html"><tt>ParadigmsEng.gf</tt></a>
<li> Finnish: <a href="ParadigmsFin.html"><tt>ParadigmsFin.gf</tt></a>
<li> French: <a href="ParadigmsFra.html"><tt>ParadigmsFra.gf</tt></a>
<li> German: <a href="ParadigmsGer.html"><tt>ParadigmsDeu.gf</tt></a>
<li> Italian: <a href="ParadigmsIta.html"><tt>ParadigmsIta.gf</tt></a>
<li> Russian: <a href="ParadigmsRus.html"><tt>ParadigmsRus.gf</tt></a>
<li> Swedish: <a href="ParadigmsSwe.html"><tt>ParadigmsSwe.gf</tt></a>
</ul>

<h3>The derived resource libraries</h3>

The core resource grammar is minimal in the sense that it defines the
smallest syntactic combinations and has no redundancy. For applications, it
is usually more convenient to use combinations of the minimal rules.
Some such combinations are given in the <b>predication library</b>,
which defines the simultaneous application of one- and two-place
verbs and adjectives to all their argument noun phrases. It also
defines some other constructions useful for logical and mathematical
applications.

<p>

The API of the predication library is the module
<a href="Predication.html"><tt>Predication</tt></a>.
What an application actually imports is one of the language-dependent modules,
<tt>X/PredicationX</tt> for each language <tt>X</tt>.


<h3>The language-dependent type systems</h3>

Sometimes it is useful for the application grammarian to know what the
language-dependent linearization types are for each category in the
core resource. These types are defined in the files <tt>CombinationsX.gf</tt>.
They can be translated to documents by <tt>gfdoc</tt> if desired.
<!--

<ul>
<li> English: <a href="CombinationsEng.html"><tt>CombinationsEng.gf</tt></a>
<li> Finnish: <a href="CombinationsFin.html"><tt>CombinationsFin.gf</tt></a>
<li> French: <a href="CombinationsFra.html"><tt>CombinationsFra.gf</tt></a>
<li> German: <a href="CombinationsDeu.html"><tt>CombinationsDeu.gf</tt></a>
<li> Italian: <a href="CombinationsIta.html"><tt>CombinationsIta.gf</tt></a>
<li> Russian: <a href="CombinationsRus.html"><tt>CombinationsRusU.gf</tt></a>
<li> Swedish: <a href="CombinationsSwe.html"><tt>CombinationsSwe.gf</tt></a>
</ul>
-->

<p>

For the sake of uniformity, we have tried to use the same names
for parameter types when applicable. For instance, the gender parameter
is called <tt>Gender</tt> in every grammar, even though its values
differ. The definitions of the parameter
types are given in the modules <tt>TypesX</tt>.
The application grammarian following the complete abstraction principle
does not open these modules and hence cannot
use the parameter constructors directly, but rather uses the
names defined in <tt>ParadigmsX</tt>.


<h2>Linguist's view on resource grammars</h2>

<h3>GF and other grammar formalisms</h3>

Linguists in particular might be interested in resource
grammars for their own sake, not as a basis for applications.
Since few linguists are so far familiar with GF, we refer to the
<a href="http://www.cs.chalmers.se/~aarne/GF/">GF Homepage</a>
and especially to the
<a href="http://www.cs.chalmers.se/~aarne/GF/Tutorial/">GF Tutorial</a>.
What follows is a brief summary of the relation of GF to
other record-based formalisms.

<p>

The records of GF are much like feature structures in PATR or HPSG.
The main differences are that
<ul>
<li> GF has a type system inherited from
     functional programming languages;
<li> GF records are primarily obtained as linearizations of trees, not
     as parses of strings.
</ul>
The latter difference explains why a GF record typically carries more
information than a feature structure. For instance, the record describing
the French noun <i>cheval</i> is
<pre>
  {s = table {Sg => "cheval" ; Pl => "chevaux"} ; g = Masc} ;
</pre>
showing the full inflection table of the (abstract) noun <i>cheval</i>.
A PATR record
for the French word <i>cheval</i> would be
<pre>
  {s = "cheval" ; n = Sg ; g = Masc} ;
</pre>
showing just the information that can be gathered from the (concrete)
string <i>cheval</i>.
There is a rather straightforward sense in which the PATR record is an
<b>instance</b> of the GF record.
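
<p>

One way to read this instance relation is sketched below in Python, with
records modelled as dictionaries. This is an illustration under our own
assumptions, not a definition taken from GF or PATR: the function name
<tt>instances</tt> is hypothetical, and "instance" is interpreted as fixing one
value of the table parameter and keeping the inherent features.

```python
# GF-style record: a full inflection table plus the inherent gender.
gf_cheval = {"s": {"Sg": "cheval", "Pl": "chevaux"}, "g": "Masc"}


def instances(gf_record):
    """List the PATR-style records obtained by fixing each number value."""
    return [
        {"s": form, "n": number, "g": gf_record["g"]}
        for number, form in gf_record["s"].items()
    ]


# PATR-style record for the concrete string "cheval".
patr_cheval = {"s": "cheval", "n": "Sg", "g": "Masc"}

print(patr_cheval in instances(gf_cheval))  # True

# Generation goes the other way: selecting a form is a single table lookup.
print(gf_cheval["s"]["Pl"])  # chevaux
```

The sketch also shows why the GF record suits generation: once the number is
known from the syntax tree, the correct form is found by one lookup in the
table.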

<p>

When generating language from syntax trees (or from logical formulas via
syntax trees), a record containing full inflection tables gives an efficient
(linear-time) method of producing the correct forms.
This is important when text is generated in real time in
an interactive system.


<h2>The resource grammar infrastructure</h2>

As explained above, the application grammarian's view on resource grammars
is through API modules. They are collections of type signatures of functions.
It is the task of linguists to define these functions.
The definitions are in the end given
in the <b>resource grammar infrastructure</b>.

<p>

We have divided the core resource grammar for each language <tt>X</tt>
into the following parts:
<ul>
<li> Type system: <tt>TypesX.gf</tt>
<li> Morphology: <tt>MorphoX.gf</tt>
<li> Syntax: <tt>SyntaxX.gf</tt>
</ul>
To get the most powerful resource grammar for each language, one can use
these files directly. To view these modules, documents can be generated
by <tt>gfdoc</tt>.

<h2>Compiling and using the resource</h2>

If you want to use the resource grammars,
you should download and unpack the
<a href="gf-resource.tgz">GF grammar package</a> and go to the
directory <tt>newresource</tt>.
At Chalmers, however, we keep the resource grammars in the
GF CVS archive, in the directory <tt>grammars/newresource/</tt>,
and it is better to get them from there, since the package available on the web
is usually not quite up to date.

<p>

To compile the resource into precompiled modules, for all languages, type
<pre>
  make
</pre>
in the <tt>resource/</tt> directory.
What you get is a set of <tt>gfr</tt> and <tt>gfc</tt> files.
You never need to consult any of these files directly;
for the type signatures of syntactic structures, look mostly into
<tt>resource.Abs.gf</tt>.


<h2>Examples of using the resource grammars</h2>

<h3>A test suite</h3>

The grammars <tt>TestX</tt> define a few expressions of each
lexical category and make it possible to test linearization, parsing,
random generation, and editing.

<h3>A database query language</h3>

The grammars <tt>database/(Database|Restaurant)X.gf</tt>
make use of the resource. The <tt>RestaurantX.gf</tt>
grammars are just one possible application building on the generic
<tt>DatabaseX.gf</tt> grammars.



<h2>Functional morphology</h2>

Even though GF is a useful language for describing syntax and semantics, it
is not the optimal choice for morphology.
One reason is the absence of low-level
programming, such as string matching. Another reason is efficiency.
In connection with the resource grammar project, we have started another
project, <b>functional morphology</b>, which uses Haskell to implement
morphology. See the
<a href="http://www.cs.chalmers.se/~markus/FM"><tt>Functional Morphology Homepage</tt></a>
for more information.


<h2>Further reading</h2>

The paper Modular Grammar Engineering in GF and
a set of slides on the same topic.

</body>
</html>