The GF Resource Grammar Library

Aarne Ranta 2002-2004

Current languages: English, Finnish, French, German, Italian, Russian, Swedish.

News.
13/4/2004. Version 0.6, written using the module system of GF 2 and with extended coverage. The files are placed in separate subdirectories (one per language) and have different names than before, so that file names (without the extension .gf) are also legal module names.
15/8/2003. Version 0.4 with Finnish added. Some updates of the Russian modules.
25/6/2003. Release of GF 1.2, which makes it much more efficient to work with resource grammars. See highlights. Also source package version 0.3 with some bug fixes.
5/6/2003. Russian resource modules by Janna Khegai. Cyrillic strings in the files *.RusU.gf use UTF-8 encoding, which is automatically detected by the Java GUI to GF. However, in web browsers the encoding must be set manually.
3/6/2003. New version of this document, with separate sections on application and resource grammarians' views and added documentation on the type system of each language X in resource.X.gf.
23/5/2003. High-level lexicon access also in French, Italian, and Swedish.
23/5/2003. Italian grammar based on generic Romance modules, shared with French.
14/4/2003. High-level access to define a lexicon in English and German.

Notice. You need GF Version 2.0beta or later to work with these resource grammars. It is available from the GF home page.

Introduction

As programs in general can be divided into application programs and library programs, GF grammars can be divided into application grammars and resource grammars. An application grammar is typically built around a semantic model, which is formalized as the abstract syntax of the language. Concrete syntax defines a mapping from the abstract syntax into English or Swedish or some other language. A resource grammar is not based on semantics, but its purpose is to define the linguistic "surface" structures of some language. The availability of these structures makes it easier to write application grammars.

Abstraction level

Resource grammars raise the level of abstraction in concrete syntax. The author of an application grammar is freed from thinking about inflection, word order, etc., and can instead use structured tree-like objects in linearization rules. For instance, to express the linearization of the arithmetical predicate even in French, she no longer has to write

  lin Even x = {s =
      table {
        m => x.s ++ 
             table {Ind  => "est" ;  Subj => "soit"} ! m ++
             table {Masc => "pair" ; Fem  => "paire"} ! x.g
      }
    } ;

but simply

  lin Even = predA1 (adjReg "pair") ;

The author of the French resource grammar will have defined the functions predA1 and adjReg in such a way that they can be used in all applications.
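
The hand-written rule above also presupposes parameter types and linearization types that the application grammarian has to declare herself. The following is only a minimal sketch of that supporting code: the module names Arith and ArithFre, the categories Prop and Nat, the function Zero, and the parameter type Mode are assumptions made for illustration and are not taken from the library.

  -- Arith.gf (hypothetical): abstract syntax of a tiny arithmetic fragment
  abstract Arith = {
    cat Prop ; Nat ;
    fun Even : Nat -> Prop ;
    fun Zero : Nat ;
  }

  -- ArithFre.gf (hypothetical): French concrete syntax without any library
  concrete ArithFre of Arith = {
    param Mode   = Ind | Subj ;          -- indicative vs. subjunctive
    param Gender = Masc | Fem ;
    lincat Prop = {s : Mode => Str} ;    -- a proposition inflects in mode
    lincat Nat  = {s : Str ; g : Gender} ;  -- a term carries inherent gender
    lin Even x = {s =
        table {
          m => x.s ++
               table {Ind  => "est" ;  Subj => "soit"} ! m ++
               table {Masc => "pair" ; Fem  => "paire"} ! x.g
        }
      } ;
    lin Zero = {s = "zéro" ; g = Masc} ;
  }

With the resource grammar, all of the param and lincat declarations as well as the tables disappear from the application code, which is the sense in which the level of abstraction is raised.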

Unity of language

In addition to a high abstraction level, reusability, and the division of labour, resource grammars have the virtue of making sense of the unity of a language such as English: while application grammars depend on applications, resource grammars depend on language. What is more, resource grammars for related languages can share much of their code: the degree to which this can be done gives a measure of how related the languages are. Thus we find resource grammars to be an interesting linguistic project in their own right.

Semantics

We leave it open whether the semantics of resource grammars can also be explained on a general level. The philosophy of GF, inherited from logical frameworks, is that semantics is only given to application grammars. (You can also compare them to Wittgenstein's "language games".) This view gives us a lot of freedom in formulating resource grammars. When describing them, we sometimes say that such-and-such a construction is likely to be ruled out for semantic reasons; what we mean is that this will actually happen in application grammars, not that the resource grammar itself contains semantic rules. An example of this is the free formation of question adverbials, e.g. From which city is every number even or odd? The resource grammar makes it possible to form this question, but it can hardly be meaningful in any sensible application grammar.

Programmer's view on resource grammars

The resource grammar library has a hierarchical structure. Its main layers are

The language-dependent core resources, to be described below.
The language-independent core resource API, Combinations.gf and Structural.gf.
The derived resource libraries, some of which are language-dependent, some of which aren't. The most important ones are the language-dependent lexical paradigm modules ParadigmsX.gf.

The core resources should not be needed by application grammarians: it should be enough to use the core resource API and the derived libraries. If this is not the case, the best solution is to extend the derived resource libraries or create new ones.

Grammaticality guarantee via data abstraction

An important principle is that

the core resource API and the derived resource libraries guarantee that all type-correct uses of them preserve grammaticality.

This principle is simultaneously a guideline for resource grammarians and an argument for the application grammarian to use these libraries. What we mean by "only using the libraries" is that

all lin and lincat rules are built solely from library functions and argument variables.

Thus for instance no records, tables, selections or projections should appear in the rules. What we have achieved then is total data abstraction, and the grammaticality guarantee can be given.

Since the resource grammars are work in progress, their coverage is not yet sufficient for complete data abstraction. In addition, there may of course be bugs in the resource grammars that destroy grammaticality. The GF group is grateful for bug reports, requests, and contributions!

The most important exception to total data abstraction in practice is the incompleteness of resource lexica. Since it is impossible to have full coverage of all the words in a language, users often have to introduce their own lexical entries, and thereby use literal strings in their GF code. The safest and most convenient way of using this is via functions defined in ParadigmsX.gf files. Using these functions guarantees that the lexical entries created are type-correct. But nothing guards against misspelling a word, picking a wrong inflectional pattern, or a wrong inherent feature (such as gender).

The resource grammar documentation in gfdoc

All documented GF grammars linked from this page have been written in GF and then translated to HTML using a light-weight documentation tool, gfdoc. The tool is available as a part of the GF source code package, in the Haskell file util/GFDoc.hs, which can be run in the Hugs interpreter by the script util/gfdoc. The program also has the flag +latex, which produces output in LaTeX instead of HTML.

The core resource API

The API is divided into two modules, Combinations and its extension Structural.

The file Combinations.gf gives the core resource type signatures of phrasal categories and syntactic combination rules, together with some explanations and examples. The examples are so far only in English, but their equivalents are available in all of the languages for which the API has been implemented.

The file Structural.gf gives a list of structural words such as determiners, pronouns, prepositions, and conjunctions.

The file Structural.gf cannot be imported directly, but only via the generated files ResourceX.gf for each language X. In these files, the fun/lin and cat/lincat judgements have been translated into oper judgements.
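
To make the translation into oper judgements concrete, here is a minimal sketch. The modules Mini, MiniEng and ResMini, the category CN as defined here, and the function Big are hypothetical illustrations; they do not reproduce the contents of the actual generated ResourceX.gf files.

  -- A rule stated as cat/lincat and fun/lin judgements
  abstract Mini = {
    cat CN ;
    fun Big : CN -> CN ;
  }

  concrete MiniEng of Mini = {
    lincat CN = {s : Str} ;
    lin Big cn = {s = "big" ++ cn.s} ;
  }

  -- The same content restated as oper judgements in a resource module:
  -- the category becomes a type definition, the function a typed operation
  resource ResMini = {
    oper CN  : Type = {s : Str} ;
    oper Big : CN -> CN = \cn -> {s = "big" ++ cn.s} ;
  }

The point of the oper form is that an application grammar can open the resource module and call Big inside its own lin rules, which is not possible with a fun/lin pair.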

The lexical paradigm modules

The lexical paradigm modules define, for each lexical category, a worst-case macro for adding words of that category by giving a sufficient number of characteristic forms. In addition, the most common regular paradigms are included, where it is enough just to give one form to generate all the others.

For example, the English paradigm module has the worst-case macro for nouns,

  mkN : (man,men,man's,men's : Str) -> Gender -> N ;

taking four forms and a gender (human or nonhuman, as is also explained in the module). Its application

  mkN "mouse" "mice" "mouse's" "mice's" nonhuman

defines all information that is needed for the noun mouse. There are also some regular patterns, for instance,

  nReg  : Str -> Gender -> N ;   -- dog, dogs
  nKiss : Str -> Gender -> N ;   -- kiss, kisses

examples of which are

  nReg "car" nonhuman
  nKiss "waitress" human
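
These paradigm functions guarantee type-correctness only, as noted earlier in connection with data abstraction. The sketch below illustrates the pitfall; the module names LexSketch and ParadigmsEng and the three constant names are assumptions, and the comments describe the forms one would presumably get if nReg simply adds the regular endings.

  -- Hypothetical user lexicon built on the English paradigm module
  resource LexSketch = open ParadigmsEng in {
    oper
      kissGood : N = nKiss "kiss"  nonhuman ;  -- intended: kiss, kisses
      kissBad  : N = nReg  "kiss"  nonhuman ;  -- type-correct, wrong paradigm
      mouseBad : N = nReg  "mouse" nonhuman ;  -- type-correct, but mouse is irregular
  }

All three definitions pass the type checker, so the choice of paradigm remains the responsibility of the person writing the lexicon.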

Here are the documented versions of the paradigm modules:

The derived resource libraries

The core resource grammar is minimal in the sense that it defines the smallest syntactic combinations and has no redundancy. For applications, it is usually more convenient to use combinations of the minimal rules. Some such combinations are given in the predication library, which defines the simultaneous applications of one- and two-place verbs and adjectives to all their argument noun phrases. It also defines some other constructions useful for logical and mathematical applications.

The API of the predication library is in the file Predication.gf. What is imported is one of the language-dependent files, X/PredicationX.gf for each language X.

The language-dependent type systems

Sometimes it is useful for the application grammarian to know what the language-dependent linearization types are for each category in the core resource. These types are defined in the files CombinationsX.gf.

For the sake of uniformity, we have tried to use the same names of parameter types when applicable. For instance, the gender parameter is called Gender in every grammar, even though its values differ. The definitions of the parameter types are given in the files TypesX.gf. The application grammarian following the complete abstraction principle should not use the parameter constructors directly, but rather the names defined in ParadigmsX.gf.
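
The relation between parameter constructors and the names exported by the paradigm modules can be pictured as follows. This is a sketch under stated assumptions: the module names TypesSketch and ParadigmsSketch and the constructor names Hum and NoHum are invented for illustration, and the actual definitions in TypesX.gf and ParadigmsX.gf may differ.

  -- Hypothetical type system module: defines the constructors
  resource TypesSketch = {
    param Gender = Hum | NoHum ;
  }

  -- Hypothetical paradigm module: exports stable names for the constructors
  resource ParadigmsSketch = open TypesSketch in {
    oper
      human    : Gender = Hum ;
      nonhuman : Gender = NoHum ;
  }

An application grammar that only mentions human and nonhuman keeps working even if the constructors of Gender are later renamed or reorganized.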

Linguist's view on resource grammars

GF and other grammar formalisms

Linguists in particular might be interested in resource grammars for their own sake, not as a basis of applications. Since few linguists are so far familiar with GF, we refer to the GF Homepage and especially to the GF Tutorial. What comes here is a brief summary of the relation of GF to other record-based formalisms.

The records of GF are much like feature structures in PATR or HPSG. The main differences are that

GF has a type system inherited from functional programming languages;
GF records are primarily obtained as linearizations of trees, not as parses of strings.

The latter difference explains why a GF record typically carries more information than a feature structure. For instance, the record describing the French noun cheval is

  {s = table {Sg => "cheval" ; Pl => "chevaux"} ; g = Masc} ;

showing the full inflection table of the (abstract) noun cheval. A PATR record for the French word cheval would be

  {s = "cheval" ; n = Sg ; g = Masc} ;

showing just the information that can be gathered from the (concrete) string cheval. There is a rather straightforward sense in which the PATR record is an instance of the GF record.
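
The sense in which the PATR record is an instance of the GF record can be sketched in GF itself. The module name Cheval, the type names Noun and Form, and the operation formOf are hypothetical names introduced for this illustration.

  resource Cheval = {
    param Number = Sg | Pl ;
    param Gender = Masc | Fem ;
    oper
      -- GF-style noun: full inflection table plus inherent gender
      Noun : Type = {s : Number => Str ; g : Gender} ;
      cheval : Noun =
        {s = table {Sg => "cheval" ; Pl => "chevaux"} ; g = Masc} ;

      -- PATR-style record: one concrete string with its features
      Form : Type = {s : Str ; n : Number ; g : Gender} ;

      -- select one form from the table and record which one it was
      formOf : Noun -> Number -> Form = \noun,num ->
        {s = noun.s ! num ; n = num ; g = noun.g} ;
  }

Here formOf cheval Sg should compute to a record equal to the PATR record above, and formOf cheval Pl to the corresponding record for chevaux. Going in the other direction, from a single form to the full table, is not possible without further lexical knowledge.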

When generating language from syntax trees (or from logical formulas via syntax trees), the record containing full inflection tables is an efficient (linear-time) method of producing the correct forms. This is important when text is generated in real time in an interactive system.

The structure of core resource grammars

As explained above, the application grammarian's view on resource grammars is through API modules. They are collections of type signatures of functions. It is the task of linguists to define these functions. The definitions are in the end given in the core resource grammars.

We have divided the core resource grammar for each language X into the following parts:

Type system: types.X.gf
Morphology: morpho.X.gf
Syntax: syntax.X.gf

To get the most powerful resource grammar for each language, one can use these files directly.

However, the languages we have studied have so much in common that we have gathered a considerable set of categories and rules in a multilingual resource grammar. Its parts are

Abstract syntax: Resource.gf
Language-dependent concrete syntax: ResourceX.gf for each language.

The advantage of using this API in application grammars is that their concrete syntax looks the same for all languages up to non-structural words. Thus it is possible to produce concrete syntaxes for new languages while knowing almost nothing about them. The abstract syntax serves as a common API to the core resource grammar.

The code for the core resource grammars

The following links go to the gfdoc-generated HTML files while showing the names of the GF files.

English: types.Eng.gf, morpho.Eng.gf, syntax.Eng.gf.
Finnish: types.Fin.gf, morpho.Fin.gf, syntax.Fin.gf.
Shared Romance: types.Romance.gf, syntax.Romance.gf.
French: types.Fra.gf, morpho.Fra.gf, syntax.Fra.gf.
Italian: types.Ita.gf, morpho.Ita.gf, syntax.Ita.gf.
Russian: types.RusU.gf, morpho.RusU.gf, syntax.RusU.gf.
German: types.Deu.gf, morpho.Deu.gf, syntax.Deu.gf.
Swedish: types.Swe.gf, morpho.Swe.gf, syntax.Swe.gf.

Compiling and using the resource

If you want to use the resource grammars, you should download and unpack the source package. At Chalmers, however, we keep the resource grammars in the GF CVS archive, in the directory Grammars/resource/, and it is better to get them from there if you can. The package accessible through the web is usually not quite up to date.

To compile the resource into reusable operations, for all languages, type

  make

in the resource/ directory. This requires that you have a recent version of GF (>= 1.1). What you get is a set of files with names res.X.gf. The file res.Types.gf gives the type signatures of the operations. You never need to consult any of these files; for the type signatures of syntactic structures, it is mostly enough to look into resource.Abs.gf.

Examples of using the resource grammars

A test suite

The grammars test.X.gf define a few expressions of each lexical category and make it possible to test linearization, parsing, random generation, and editing.

A database query language

The grammars database/(database | restaurant).X.gf make use of the resource. The restaurant.X.gf grammars are just one possible application building on the generic database.X.gf grammars. Look at the abstract database syntax and, as an example, the French concrete syntax.

A dialogue system for the video recorder

The grammars video/video.X.gf are meant to be a prototype for dialogue systems. Look at the abstract video grammar and, as an example, the English concrete syntax.

Functional morphology

Even though GF is a useful language for describing syntax and semantics, it is not the optimal choice for morphology. One reason is the absence of low-level programming, such as string matching. Another reason is efficiency. In connection with the resource grammar project, we have started another project, functional morphology, which uses Haskell to implement morphology. Haskell morphologies can then be used for generating GF morphologies, as exemplified by large parts of morpho.Swe.gf.

Work is in progress to document functional morphology, but here is a beginning:

General library for defining morphologies: General.hs.
Swedish inflection engine: RulesSw.hs.

(Notice that we have here used gfdoc on Haskell files.)

To see that it is nevertheless possible to implement morphology in GF, look at the French morphology in morpho.Fra.gf. Its verb part is complete in the sense that it implements all 88 inflection tables of the Bescherelle.