Grammatical Framework: Tutorial, Applications, and Reference Manual
|
|
Aarne Ranta
|
|
Draft %%date(%c)
|
|
|
|
% NOTE: this is a txt2tags file.
|
|
% Create an html file from this file using:
|
|
% txt2tags --toc gf-tutorial2.txt
|
|
|
|
%!target:html
|
|
%!encoding: iso-8859-1
|
|
|
|
%%!postproc(tex): "section\*" "section"
|
|
|
|
%!postproc(tex): "subsection\*" "section"
|
|
%!postproc(tex): "section\*" "chapter"
|
|
|
|
%!postproc(html): #BCEN <center>
|
|
%!postproc(html): #ECEN </center>
|
|
|
|
%!postproc(tex): #BCEN "begin{center}"
|
|
%!postproc(tex): #ECEN "end{center}"
|
|
|
|
%!preproc(html): #EDITORPNG [../quick-editor.png]
|
|
%!preproc(tex): #EDITORPNG [../../lib/resource-1.0/doc/10lang-small.png]
|
|
|
|
%!preproc(html): #LOGOPNG [../gf-logo.png]
|
|
%!preproc(tex): #LOGOPNG ""
|
|
|
|
|
|
%!postproc(tex): #PARTone "part{Tutorial}"
|
|
%!postproc(tex): #PARTtwo "part{Applications of Grammars}"
|
|
%!postproc(tex): #PARTfour "part{Advanced Grammar Writing}"
|
|
%!postproc(tex): #PARTthree "part{Reference Manual}"
|
|
|
|
%!postproc(tex): #PARTbnf "include{DocGF}"
|
|
%!postproc(tex): #PARTquickref "chapter{Quick Reference}"
|
|
%!postproc(tex): #twocolumn "twocolumn"
|
|
%!postproc(tex): #smallsize "tiny"
|
|
%!postproc(tex): #startappendix "appendix"
|
|
|
|
|
|
%!postproc(tex): #FORMULAone "input{FORMULAone}"
|
|
|
|
#LOGOPNG
|
|
|
|
|
|
|
|
%--!
|
|
=Introduction=
|
|
|
|
In this Introduction, we will discuss the field of natural language processing
|
|
and locate the place of GF in the field. We will continue with a brief history
|
|
of GF and its applications, followed by an overview of this book.
|
|
This Introduction contains no technical material that is presupposed in
|
|
later chapters. Therefore, the practically oriented reader can jump
|
|
directly to the Tutorial starting from Chapter 2.
|
|
|
|
|
|
|
|
==Natural language application programming==
|
|
|
|
Making computers understand human language is one of the oldest dreams of
|
|
programmers. Projects with machine translation started almost as soon as
|
|
the first computers appeared in the 1940's. They were partly encouraged by the
|
|
success of decryption during the Second World War. Thus some American scientists
|
|
had the vision that Russian can be seen as encrypted English, which can be
|
|
deciphered by algorithms similar to those used for cracking the Germans' Enigma.
|
|
|
|
Despite substantial efforts in machine translation, the early visions were not
|
|
realized, and the general conclusion reached by the mid-1960's was that
|
|
high-quality broad-coverage machine translation is impossible. Machine
|
|
translation was toned down to the less ambitious and more specialized group of
|
|
tasks that started to be called computational linguistics.
|
|
Parallel to this, fantasies of "speaking robots" and
|
|
other language-understanding machines prevailed, exemplified by such science
|
|
fiction figures as the HAL computer in "2001: A Space Odyssey".
|
|
|
|
The language understanding machines we see today are a variety of
|
|
products, which focus on different aspects of the task and none of which comes
|
|
even close to HAL or a machine translator with human-like capacities. Here is a
|
|
list of some such applications:
|
|
- browse-quality machine translation: Systran
|
|
- machine translation specialized on weather reports: Meteo
|
|
- electronic dictionaries: desktop, web-based, portable
|
|
- spelling and grammar checkers
|
|
- dialogue systems enabling simple speech interaction with a computer
|
|
|
|
|
|
A common feature of these applications is that their construction requires
|
|
**linguistic knowledge**: theoretical understanding of languages. As opposed to
|
|
practical understanding, which means the ability to speak, listen, write, and
|
|
read, theoretical understanding means knowledge of the **rules** of language.
|
|
It is by expressing these rules in a programming language that we can hope to
|
|
make a computer understand at least something of a natural language.
|
|
|
|
This is where GF comes into the picture. GF, Grammatical Framework, is a programming
|
|
language designed for expressing linguistic rules. A set of such rules is called
|
|
a **grammar**. GF is designed to make it easy to write grammar rules; this is
|
|
much easier than in a general-purpose programming language such as
|
|
Java or C or Haskell. But it is also in many ways easier and more productive
|
|
than in other languages specialized in grammars. The best-known of these
|
|
is the **BNF notation** (Backus Naur Form), which is also known as
|
|
**context-free grammars**. It is used in compiler tools such as YACC.
|
|
While BNF is an excellent way to specify the grammar of a programming language,
|
|
it does not scale up to the complexities of natural languages.
|
|
|
|
Linguists have of course developed many formalisms that are designed for
|
|
describing natural languages. In comparison with them, one advantage of GF is
|
|
its support for **multilingual grammars**. In a multilingual grammar, a
|
|
semantic representation can be shared between several languages, in such a
|
|
way that a grammar written for one language can be easily ported to another
|
|
one. The grammar also supports translation between the languages it includes.
|
|
The most comprehensive multilingual grammars written in GF cover almost 100
|
|
languages.
|
|
|
|
GF does not only enable the writing of grammars. It is also equipped with tools
|
|
for integrating grammars in language-processing systems.
|
|
To build a language application usually involves much more than just a grammar,
|
|
and these other parts are often written in general-purpose languages.
|
|
Since it is important that the grammar can be integrated seamlessly with
|
|
the rest of the application, GF grammars can be converted into
|
|
**embedded grammars**, which can be directly used as components of
|
|
programs written in other languages such as C, Java, JavaScript, and Haskell.
|
|
|
|
Since natural language application programming requires linguistic knowledge, it
|
|
is usually considered to need linguistic training. The mission of GF is to relieve
|
|
some of this need. This is achieved in two ways:
|
|
- GF works in a way familiar to ordinary programmers, namely as a **compiler**
|
|
that analyses a language and generates a result.
|
|
- GF has a set of **resource grammar libraries**, which encapsulate much of
|
|
the linguistic knowledge needed when writing grammars.
|
|
|
|
|
|
This said, GF makes no claim to "fire linguists" from natural language programming
|
|
projects. The claim is just that there should be a **division of labour**:
|
|
in GF, grammar can be divided into different **modules**, where some modules
|
|
require linguistic knowledge and others don't. Linguists working on the linguistic
|
|
modules will appreciate the way GF supports abstractions and generalizations, and
|
|
also the grammar development tools that enable testing of linguistic rules.
|
|
Non-linguists working on application-oriented modules will appreciate the
|
|
possibility to rely on grammar rules defined in the linguistic library modules,
|
|
and to focus on other aspects of the task.
|
|
|
|
|
|
|
|
==A brief history of GF and its applications==
|
|
|
|
GF belongs to the tradition of **functional programming languages**, exemplified
|
|
by ML and Haskell and, somewhat more remotely, Lisp. One branch
|
|
of functional programming is **type theory**, which in turn has its roots in
|
|
logic and the foundations of mathematics. GF was, in the first place, created to
|
|
implement the idea that type theory can provide **semantics**, i.e. formalize
|
|
the meaning of natural languages. Several aspects of type-theoretical semantics
|
|
were covered in the monograph //Type-Theoretical Grammar// (Ranta 1994).
|
|
But a stronger aspect grew out of subsequent experiments dealing with different
|
|
languages: it is possible to have a common semantics for many languages, and
|
|
thereby build systems that translate between languages via the semantics.
|
|
The first implementation of this idea was written as a plug-in to the
|
|
proof editor Alfa (Magnusson & Nordström 1994) in 1995. It supported the
|
|
generation of sentences in six languages from mathematical formulas that were
|
|
manipulated in the proof editor. One example area was geometry:
|
|
- Formula:
|
|
#FORMULAone
|
|
- English:
|
|
//If a point p lies outside a line l, then there is a line m such that p lies on m and m is parallel to l.//
|
|
- Finnish:
|
|
//Jos piste p on suoran l ulkopuolella, niin on olemassa suora m sellainen että p on suoralla m ja m on yhdensuuntainen l:n kanssa.//
|
|
- French:
|
|
//Si un point p est extérieur à une ligne l, alors il existe une ligne m telle que p soit sur la ligne m et que m soit parallèle à l.//
|
|
- German:
|
|
//Wenn ein Punkt p außerhalb einer Geraden l liegt, dann gibt es so eine Gerade m daß p auf der Geraden m liegt und m parallel zu l ist.//
|
|
- Italian:
|
|
//Si un punto p è esteriore a una linea l, allora esiste una linea m tale che p sia sulla linea m e che m sia parallela a l.//
|
|
- Swedish:
|
|
//Om en punkt p ligger utanför linje l, så finns det en linje m sådan att p ligger på m och m är parallel med l.//
|
|
|
|
|
|
As a stand-alone programming language, GF was first implemented in 1998. This
|
|
took place at Xerox Research Centre Europe in Grenoble, within a project entitled
|
|
//Multilingual Document Authoring//. The goal of the project was to build a tool
|
|
for writing documents in multiple languages simultaneously, so that the user
|
|
need only know one of the languages; the rest will be produced automatically
|
|
via translations from the type-theoretical semantics (Dymetman & al. 2000).
|
|
In addition to GF itself, the project produced some prototype applications,
|
|
e.g. a restaurant phrase book and an editor for medical drug descriptions.
|
|
An important aspect was the adaptability of the system to new domains and
|
|
languages; hence the need for a language where such adaptations can be made
|
|
by just writing new grammars.
|
|
|
|
Most grammars that were built in the Xerox project
|
|
remained property of Xerox Corporation, but the GF formalism and its
|
|
implementation were released as open-source software under the GNU General
|
|
Public License. From 1999, the development of GF continued mostly at
|
|
the Department of Computing Science of Chalmers University of Technology
|
|
and Gothenburg University. In this environment, both functional programming
|
|
and type theory are strong research areas. This helped GF to develop into
|
|
a more stable and more full-fledged programming language.
|
|
|
|
At Chalmers GF was soon used in courses given to computer science
|
|
students and in joint projects with non-linguist research groups.
|
|
This activity was soon summarized in the idea of making GF into
|
|
"the working programmer's grammar formalism", as
|
|
opposed to a tool requiring linguistic expertise. A nice experience from
|
|
the courses (both graduate and undergraduate) was that computer scientists are
|
|
often very interested in languages and have firm intuitions on grammar; given
|
|
a suitable programming tool, they can achieve impressive results in short time.
|
|
GF was to be made into such a tool, which meant above all that
|
|
it was developed in the way programming languages are,
|
|
following the virtues of familiarity and "the least surprise".
|
|
Issues of stability are also important, including
|
|
backward compatibility and portability to different platforms.
|
|
As a mark of stability, version 1.0 of GF was released in
|
|
2002. Documentation
was also found important. In addition to on-line documents,
|
|
a reference article appeared in the
|
|
//Journal of Functional Programming// in 2004,
|
|
and a long tutorial text was published in the post-publication of
|
|
ESSLLI lecture notes.
|
|
|
|
The first full-scale applications of GF were natural-language interfaces.
|
|
The first one was for the proof editor Alfa (Hallgren & Ranta 2000).
|
|
The second one was a syntax editor and a natural-language interface to the
|
|
software specification language OCL (Object Constraint Language) built
|
|
within the KeY project (Ahrendt & al. 2006).
|
|
These projects boosted the implementation side
|
|
of GF itself, in particular, the graphical syntax editor (Khegai & al. 2003).
|
|
At the same time, some major mathematical properties of GF were established
|
|
in the PhD thesis of Peter Ljunglöf (2004), which led to improved
|
|
parser implementations.
|
|
|
|
At the same time as GF was used in joint projects with computer science groups,
|
|
collaboration with the Linguistics Department of
|
|
Gothenburg University served as a "linguistic sanity check" of GF. Two efforts
|
|
that have been formative to the development of GF were started within this
|
|
collaboration:
|
|
- resource grammar libraries
|
|
- dialogue system applications
|
|
|
|
|
|
It was the resource grammar libraries that made GF really usable for non-linguist
|
|
programmers in more serious projects. They were heavily missed in the Alfa
|
|
project, and heavily used and improved in the KeY project. The development of
|
|
the library started in 2002; a version stable enough to be released with number
|
|
1.0 was complete in 2006, comprising ten languages.
|
|
|
|
Dialogue systems, on the other hand, turned
|
|
out to be a major source of interesting problems and also of successful solutions.
|
|
Much of this work was carried out in the European project TALK (Tools for Ambient
|
|
Linguistic Knowledge, 2004-2006), also involving sites from Cambridge,
|
|
Edinburgh, and BMW in Munich. In addition to complete systems, the TALK
|
|
project produced supporting tools for embedded grammars
|
|
and speech recognition, as well as additions of spoken language structures
|
|
to the resource grammar library.
|
|
|
|
Besides dialogue systems, multilingual authoring and translation continued
|
|
to be the main application of GF. The European WebALT project (Web Advanced
|
|
Learning Technologies, 2005-2006) used GF to build a tool for translating
|
|
mathematical exercises from formal specifications (written in MathML) to
|
|
six languages. A tool integrating GF with a computer algebra system was also
developed. The project gave rise to a company, WebALT Inc.
|
|
|
|
At the time of writing this (August 2007), the release of GF has version
|
|
number 2.8. It is a stable system that has been built with contributions
|
|
of dozens of persons and been used by at least hundreds; download figures
|
|
are in thousands. New ideas on how to use GF are posted by users almost
|
|
every week.
|
|
|
|
|
|
|
|
==The purpose and scope of this book==
|
|
|
|
One purpose of this book is to serve the growing user base of GF with
|
|
a definitive manual that gathers all relevant information in one place.
|
|
However, it is also intended to serve those who want to get started with GF, and
|
|
who don't necessarily have the technical background of the typical
|
|
users. We believe that learning to program in GF is not more difficult
|
|
than learning some other programming language. As for learning the linguistic
|
|
aspects, our experience is that writing grammars is an excellent introduction
|
|
to the problems of linguistics. In this way, linguistic
|
|
theory can be learnt at the same time as it is motivated by concrete problems.
|
|
|
|
The book thus starts with a Tutorial (Part I), which gradually explains all
|
|
the constructs of the GF programming language. Also the design and style
|
|
aspects of grammar engineering are covered, to help the user to scale
|
|
up from small to large and possibly collaborative applications. Linguistic
|
|
concepts are explained at the same time as they are introduced in grammars.
|
|
|
|
After the Tutorial, the book continues with a manual on building applications
|
|
that have embedded grammars as components (Part II). Part III goes through some
|
|
examples of more advanced grammar writing, in particular, the internals of the
|
|
resource grammar library.
|
|
Part IV is a complete reference manual, and the two Appendices
|
|
show a grammar of the GF language and a quick reference card of GF.
|
|
|
|
What is not given much space in the book is theoretical discussions of
|
|
GF, especially in comparison to other grammar formalisms. Even though important
|
|
in the development of GF as a scientifically justified framework, such
|
|
discussions are not relevant for programmers who just want to use GF - any more
|
|
than, say, a book on Haskell has to include comparisons with Java. In fact,
|
|
comparisons with Java in a Haskell introduction would make more sense
|
|
than comparisons with DCG or HPSG or LFG in a GF introduction:
|
|
many Haskell learners can already be expected to know Java, whereas
|
|
most GF learners are not expected
|
|
to know any grammar formalisms, except perhaps BNF.
|
|
|
|
|
|
|
|
#PARTone
|
|
|
|
=An overview of the tutorial=
|
|
|
|
The tutorial gives a hands-on introduction to grammar writing in GF.
|
|
We start in Chapter 3
|
|
by building a "Hello World" grammar, which covers greetings
|
|
in three languages: English (//hello world//),
|
|
Finnish (//terve maailma//), and Italian (//ciao mondo//).
|
|
This **multilingual grammar** is based on the most central idea of GF:
|
|
the distinction between **abstract syntax**
|
|
(the logical structure) and **concrete syntax** (the
|
|
sequence of words).
|
|
|
|
From the "Hello World" example, we proceed
|
|
in Chapter 4
|
|
to a larger grammar for the domain of food.
|
|
In this grammar, you can say things like
|
|
```
this Italian cheese is delicious
```
|
|
in English and Italian. This grammar illustrates how translation is
|
|
more than just replacement of words. For instance, the order of
|
|
words may have to be changed:
|
|
```
Italian cheese ===> formaggio italiano
```
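
As a preview of how GF handles this, here is a minimal sketch of a grammar in
which one abstract function is linearized with different word orders in English
and Italian. The module names ``Kinds``, ``KindsEng``, and ``KindsIta`` and the
function name ``QKind`` are assumptions made for this illustration and need not
be the names used in Chapter 4. Each module would go into its own ``.gf`` file.

```
-- file Kinds.gf (hypothetical): the language-independent structure
abstract Kinds = {
  cat Kind ; Quality ;
  fun
    QKind : Quality -> Kind -> Kind ;  -- a kind modified by a quality
    Cheese : Kind ;
    Italian : Quality ;
}

-- file KindsEng.gf (hypothetical): the quality comes before the kind
concrete KindsEng of Kinds = {
  lincat Kind, Quality = {s : Str} ;
  lin
    QKind quality kind = {s = quality.s ++ kind.s} ;
    Cheese = {s = "cheese"} ;
    Italian = {s = "Italian"} ;
}

-- file KindsIta.gf (hypothetical): the quality comes after the kind
concrete KindsIta of Kinds = {
  lincat Kind, Quality = {s : Str} ;
  lin
    QKind quality kind = {s = kind.s ++ quality.s} ;
    Cheese = {s = "formaggio"} ;
    Italian = {s = "italiano"} ;
}
```

The tree ``QKind Italian Cheese`` is the same for both languages; only the two
``lin`` rules for ``QKind`` differ, and they alone decide the word order.
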
|
|
Moreover, words can have different forms, and which forms
|
|
they have varies from language to language. For instance,
|
|
Italian adjectives usually have four forms where English
|
|
has just one:
|
|
```
delicious (wine, wines, pizza, pizzas)
vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose
```
|
|
The **morphology** of a language describes the
|
|
forms of its words, and the basics
|
|
of it are explained in Chapter 5.
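
The four Italian forms can be pictured as a table indexed by gender and number.
The following sketch, a preview of constructs that are explained later, writes
this down as a GF resource module. The names ``AdjIta``, ``Gender``, ``Number``,
``Adjective``, and ``delizioso`` are chosen here for illustration only.

```
-- file AdjIta.gf (hypothetical)
resource AdjIta = {
  param
    Gender = Masc | Fem ;
    Number = Sg | Pl ;

  oper
    -- an adjective is an inflection table over gender and number
    Adjective : Type = {s : Gender => Number => Str} ;

    delizioso : Adjective = {
      s = table {
        Masc => table {Sg => "delizioso" ; Pl => "deliziosi"} ;
        Fem  => table {Sg => "deliziosa" ; Pl => "deliziose"}
      }
    } ;
}
```

Selecting ``delizioso.s ! Fem ! Pl`` from this table gives the form
//deliziose//. An English counterpart would need only the single string
//delicious//.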
|
|
|
|
The complete description of morphology
|
|
belongs to resource grammars, whose use is covered in Chapter 6.
|
|
Writing resource grammars will only be covered in Part III;
|
|
however, the Tutorial does explain all the
|
|
programming concepts involved in resource grammars.
|
|
|
|
In addition to multilinguality, **semantics** is an important aspect of GF
|
|
grammars. The concepts needed for "purely linguistic" grammars belong to
|
|
the concrete syntax part of GF, whereas semantics is expressed in the abstract
|
|
syntax. After the presentation of concrete syntax constructs, we proceed
|
|
in Chapter 7 to the enrichment of abstract syntax with **dependent types**,
|
|
**variable bindings**, and **semantic definitions**.
|
|
|
|
English and Italian are used as example languages in many grammars.
|
|
Of course, we will not presuppose that the reader knows any Italian.
|
|
We have chosen Italian because it has a rich structure
|
|
that illustrates very well the capacities of GF.
|
|
Moreover, even those readers who don't know Italian will find many of
|
|
its words familiar, due to the Latin heritage.
|
|
The exercises will encourage the reader to
|
|
port the examples to other languages as well; in particular,
|
|
it should be instructive for the reader to look at her
|
|
own native language from the point of view of writing a grammar
|
|
implementation.
|
|
|
|
To learn how to write GF grammars is not the only goal of
|
|
this tutorial. We will also explain the most important
|
|
commands of the GF system, mostly in passing. With these commands,
|
|
simple application programs such as translation and
|
|
quiz systems can be built simply by writing scripts for the
|
|
GF system. More complicated applications, such as natural-language
|
|
interfaces and dialogue systems, moreover require programming in
|
|
some general-purpose language; such applications are covered in Part II.
|
|
|
|
|
|
==Who should read this tutorial==
|
|
|
|
This tutorial has been written for all programmers
|
|
who want to learn to write grammars in GF.
|
|
It will go through GF's programming concepts, and does not
|
|
presuppose knowledge of any of the main ingredients of GF:
|
|
linguistics, functional programming, and type theory.
|
|
This knowledge will be introduced as a part of grammar writing
|
|
practice.
|
|
|
|
Thus the tutorial should be accessible to anyone who has some
|
|
previous experience from any programming language; the basics
|
|
of using computers are also presupposed, e.g. the use of
|
|
text editors and the management of files.
|
|
|
|
Those who already know GF well can skip the tutorial part,
|
|
or skim through it, and go directly to the parts on applications
|
|
and advanced grammar writing.
|
|
Many of these topics will involve large-scale GF programming,
|
|
and/or programming in other languages in which GF grammars are embedded.
|
|
|
|
|
|
|
|
=Getting started=
|
|
|
|
In this chapter, we will introduce the GF system and write the first GF grammar,
|
|
a "Hello World" grammar. While extremely small, this grammar already illustrates
|
|
how GF can be used for the tasks of translation and multilingual
|
|
generation.
|
|
|
|
|
|
==What GF is==
|
|
|
|
We use the term GF for three different things:
|
|
- a **system** (computer program) used for working with grammars
|
|
- a **programming language** in which grammars can be written
|
|
- a **theory** about grammars and languages
|
|
|
|
|
|
The relation between these things is obvious: the GF system is an implementation
|
|
of the GF programming language, which in turn is built on the ideas of the
|
|
GF theory. The main focus of this book is on the GF programming language.
|
|
We learn how grammars are written in this language. At the same time, we learn
|
|
the way of thinking in the GF theory. To make this all useful and fun, and
|
|
to encourage experimenting, we make the grammars run on a computer by
|
|
using the GF system.
|
|
|
|
|
|
|
|
%--!
|
|
==What GF grammars are used for==
|
|
|
|
A grammar is a definition of a language.
|
|
From this definition, different language processing components
|
|
can be derived:
|
|
- **parsing**: to analyse the language
|
|
- **linearization**: to generate the language
|
|
- **translation**: to analyse one language and generate another
|
|
|
|
|
|
A GF grammar can be seen as a declarative program from which these
|
|
processing tasks can be automatically derived. In addition, many
|
|
other tasks are readily available for GF grammars:
|
|
- **morphological analysis**: find out the possible inflection forms of words
|
|
- **morphological synthesis**: generate all inflection forms of words
|
|
- **random generation**: generate random expressions
|
|
- **corpus generation**: generate all expressions
|
|
- **treebank generation**: generate a list of trees with their linearizations
|
|
- **teaching quizzes**: train morphology and translation
|
|
- **multilingual authoring**: create a document in many languages simultaneously
|
|
- **speech input**: optimize a speech recognition system for your grammar
|
|
|
|
|
|
A typical GF application is based on a **multilingual grammar** involving
|
|
translation on a special domain. Existing applications of this idea include
|
|
- [Alfa http://www.cs.chalmers.se/~hallgren/Alfa/Tutorial/GFplugin.html]:
|
|
a natural-language interface to a proof editor
|
|
(languages: English, French, Swedish)
|
|
- [KeY http://www.key-project.org/]:
|
|
a multilingual authoring system for creating software specifications
|
|
(languages: OCL, English, German)
|
|
- [TALK http://www.talk-project.org]:
|
|
multilingual and multimodal dialogue systems
|
|
(languages: English, Finnish, French, German, Italian, Spanish, Swedish)
|
|
- [WebALT http://webalt.math.helsinki.fi/content/index_eng.html]:
|
|
a multilingual translator of mathematical exercises
|
|
(languages: Catalan, English, Finnish, French, Spanish, Swedish)
|
|
- [Numeral translator http://www.cs.chalmers.se/~bringert/gf/translate/]:
|
|
number words from 1 to 999,999
|
|
(88 languages)
|
|
|
|
|
|
The specialization of a grammar to a domain makes it possible to
|
|
obtain much better translations than in an unlimited machine translation
|
|
system. This is due to the well-defined semantics of such domains.
|
|
Grammars having this character are called **application grammars**.
|
|
They are different from most grammars written by linguists just
|
|
because they are multilingual and domain-specific.
|
|
|
|
However, there are also grammars of another kind, which we call **resource grammars**.
|
|
These are large, comprehensive grammars that can be used on any domain.
|
|
The GF Resource Grammar Library has resource grammars for 12 languages.
|
|
These grammars can be used as **libraries** to define application grammars.
|
|
In this way, it is possible to write a high-quality grammar without
|
|
knowing about linguistics: in general, to write an application grammar
|
|
by using the resource library just requires practical knowledge of
|
|
the target language; all theoretical knowledge about its grammar
|
|
is given in the libraries.
|
|
|
|
|
|
|
|
|
|
%--!
|
|
==Getting the GF system==
|
|
|
|
The GF system is open-source free software, which can be downloaded via the
|
|
GF Homepage:
|
|
|
|
[``http://www.digitalgrammars.com/gf`` http://www.digitalgrammars.com/gf]
|
|
|
|
There you can download
|
|
- binaries for Linux, Mac OS X, and Windows
|
|
- source code and documentation
|
|
- grammar libraries and examples
|
|
|
|
|
|
If you want to compile GF from source, you need a Haskell compiler.
|
|
To compile the interactive editor, you also need a Java compiler.
|
|
But normally you don't have to compile anything yourself, and you definitely
|
|
don't need to know Haskell or Java to use GF.
|
|
|
|
We are assuming the availability of a Unix shell. Linux and Mac OS X users
|
|
have it automatically, the latter under the name "terminal".
|
|
Windows users are recommended to install Cygwin, the free Unix shell for Windows.
|
|
|
|
|
|
%--!
|
|
==Running the GF system==
|
|
|
|
To start the GF system, assuming you have installed it, just type
|
|
``gf`` in the Unix (or Cygwin) shell:
|
|
```
% gf
```
|
|
You will see GF's welcome message and the prompt ``>``.
|
|
The command
|
|
```
> help
```
|
|
will give you a list of available commands.
|
|
|
|
As a common convention in this book, we will use
|
|
- ``%`` as a prompt that marks system commands
|
|
- ``>`` as a prompt that marks GF commands
|
|
|
|
|
|
Thus you should not type these prompts, but only the characters that
|
|
follow them.
|
|
|
|
|
|
|
|
==A "Hello World" grammar==
|
|
|
|
The tradition in programming language tutorials is to start with a
|
|
program that prints "Hello World" on the terminal. GF should be no
|
|
exception. But our program has features that distinguish it from
|
|
most "Hello World" programs:
|
|
- **Multilinguality**: the message is printed in many languages.
|
|
- **Reversibility**: in addition to printing, you can **parse** the
|
|
message and translate it to other languages.
|
|
|
|
|
|
===The program: abstract syntax and concrete syntaxes===
|
|
|
|
A GF program, in general, is a **multilingual grammar**. Its main parts
|
|
are
|
|
- an **abstract syntax**
|
|
- one or more **concrete syntaxes**
|
|
|
|
|
|
The abstract syntax defines, in a language-independent way, what **meanings**
|
|
can be expressed in the grammar. In the "Hello World" grammar we want
|
|
to express //Greetings//, where we greet a //Recipient//, which can be
|
|
//World// or //Mum// or //Friends//. Here is the entire
|
|
GF code for the abstract syntax:
|
|
```
-- a "Hello World" grammar
abstract Hello = {

  flags startcat = Greeting ;

  cat Greeting ; Recipient ;

  fun
    Hello : Recipient -> Greeting ;
    World, Mum, Friends : Recipient ;
}
```
|
|
The code has the following parts:
|
|
- a **comment** (optional), saying what the module is doing
- a **module header** indicating that it is an abstract syntax
  module named ``Hello``
- a **module body** in braces, consisting of
  - a **startcat flag declaration** stating that ``Greeting`` is the
    main category, i.e. the one in which parsing and generation are
    performed by default
  - **category declarations** stating that ``Greeting`` and ``Recipient``
    are categories, i.e. types of meanings
  - **function declarations** stating what meaning-building functions there
    are; these are the three possible recipients, as well as the function
    ``Hello`` constructing a greeting from a recipient
|
|
|
|
|
|
A concrete syntax defines a mapping from the abstract meanings to their
|
|
expressions in a language. We first give an English concrete syntax:
|
|
```
concrete HelloEng of Hello = {

  lincat Greeting, Recipient = {s : Str} ;

  lin
    Hello rec = {s = "hello" ++ rec.s} ;
    World = {s = "world"} ;
    Mum = {s = "mum"} ;
    Friends = {s = "friends"} ;
}
```
|
|
The major parts of this code are:
|
|
- a module header indicating that it is a concrete syntax of the abstract syntax
  ``Hello``, itself named ``HelloEng``
- a module body in braces, consisting of
  - **linearization type definitions** stating that
    ``Greeting`` and ``Recipient`` are **records** with a **string** ``s``
  - **linearization definitions** telling what records are assigned to
    each of the meanings defined in the abstract syntax; the recipients are
    linearized to records containing single words, whereas the ``Hello`` greeting
    has a function telling that the word ``hello`` is prefixed to the string
    contained in the argument record
|
|
|
|
|
|
|
|
|
|
To make the grammar truly multilingual, we add a Finnish and an Italian concrete
|
|
syntax:
|
|
```
concrete HelloFin of Hello = {
  lincat Greeting, Recipient = {s : Str} ;
  lin
    Hello rec = {s = "terve" ++ rec.s} ;
    World = {s = "maailma"} ;
    Mum = {s = "äiti"} ;
    Friends = {s = "ystävät"} ;
}

concrete HelloIta of Hello = {
  lincat Greeting, Recipient = {s : Str} ;
  lin
    Hello rec = {s = "ciao" ++ rec.s} ;
    World = {s = "mondo"} ;
    Mum = {s = "mamma"} ;
    Friends = {s = "amici"} ;
}
```
|
|
Now we have a trilingual grammar usable for translation and
|
|
many other tasks, which we will now look into.
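
Porting the grammar to yet another language takes one more concrete syntax
module of the same shape. As an illustration only, here is a sketch of a
German version; the module name ``HelloGer`` and the choice of words are
assumptions made for this example and are not part of the grammar above.

```
-- file HelloGer.gf (hypothetical): a fourth concrete syntax of Hello
concrete HelloGer of Hello = {
  lincat Greeting, Recipient = {s : Str} ;
  lin
    Hello rec = {s = "hallo" ++ rec.s} ;
    World = {s = "Welt"} ;
    Mum = {s = "Mama"} ;
    Friends = {s = "Freunde"} ;
}
```

The abstract syntax ``Hello`` is not touched at all. With this module, the tree
``Hello World`` is linearized as //hallo Welt//.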
|
|
|
|
|
|
|
|
===Using the grammar in the GF system===
|
|
|
|
In order to compile the grammar in GF, each of the four modules
|
|
has to be put in a file named //Modulename//``.gf``:
|
|
```
Hello.gf  HelloEng.gf  HelloFin.gf  HelloIta.gf
```
|
|
The first GF command needed when using a grammar is to **import** it.
|
|
The command has a long name, ``import``, and a short name, ``i``.
|
|
You can thus type either
|
|
```
> import HelloEng.gf
```
or
```
> i HelloEng.gf
```
|
|
to get the same effect. In general, all GF commands have a long and a short name;
|
|
short names are convenient when typing commands by hand, whereas long command
|
|
names are more readable in scripts, i.e. files that include sequences of commands.
|
|
|
|
The effect of ``import`` is that the GF program **compiles** your grammar
|
|
into an internal representation, and shows a new prompt when it is ready.
|
|
It will also show how much CPU time was consumed:
|
|
```
|
|
> i HelloEng.gf
|
|
- compiling Hello.gf... wrote file Hello.gfc 8 msec
|
|
- compiling HelloEng.gf... wrote file HelloEng.gfc 12 msec
|
|
|
|
12 msec
|
|
>
|
|
```
|
|
You can now use GF for **parsing**:
|
|
```
|
|
> parse "hello world"
|
|
Hello World
|
|
```
|
|
The ``parse`` (= ``p``) command takes a **string**
|
|
(in double quotes) and returns an **abstract syntax tree** - the meaning
|
|
of the string defined in the abstract syntax.
|
|
A tree is, in general, something easier than a string
|
|
for a machine to understand and to process further, although this
|
|
is not so obvious in this simple grammar.
|
|
|
|
Strings that return a tree when parsed do so in virtue of the grammar
|
|
you imported. Try to parse something that is not in the grammar, and parsing will fail:
|
|
```
|
|
> parse "hello dad"
|
|
Unknown words: dad
|
|
|
|
> parse "world hello"
|
|
no tree found
|
|
```
|
|
In the first example, the failure is caused by an unknown word.
|
|
In the second example, the combination of words is ungrammatical.
|
|
|
|
In addition to parsing, you can also use GF for **linearizing**
|
|
(``linearize = l``). This is the inverse of
|
|
parsing, taking trees into strings:
|
|
```
|
|
> linearize Hello World
|
|
hello world
|
|
```
|
|
What is the use of this? Typically not that you type in a tree at
|
|
the GF prompt. The utility of linearization comes from the fact that
|
|
you can obtain a tree from somewhere else - for instance, from
|
|
a parser. A prime example of this is **translation**: you parse
|
|
with one concrete syntax and linearize with another. Let us
|
|
now do this by first importing the Italian grammar:
|
|
```
|
|
> import HelloIta.gf
|
|
```
|
|
We can now parse with ``HelloEng`` and **pipe** the result
|
|
into linearizing with ``HelloIta``:
|
|
```
|
|
> parse -lang=HelloEng "hello mum" | linearize -lang=HelloIta
|
|
ciao mamma
|
|
```
|
|
Notice that the commands must use a **language flag** to indicate
|
|
which concrete syntax is used in each operation.
|
|
|
|
To conclude the translation exercise, we import the Finnish grammar
|
|
and pipe English parsing into **multilingual generation**:
|
|
```
|
|
> parse -lang=HelloEng "hello friends" | linearize -multi
|
|
terve ystävät
|
|
ciao amici
|
|
hello friends
|
|
```
|
|
|
|
**Exercise**. Test the parsing and translation examples shown above, as well as
|
|
five other examples.
|
|
|
|
**Exercise**. Extend the grammar ``Hello.gf`` and some of the
|
|
concrete syntaxes by five new recipients and one new greeting
|
|
form.
|
|
|
|
**Exercise**. Add a concrete syntax for some other
|
|
language you know.
|
|
|
|
|
|
|
|
==Using grammars from outside GF==
|
|
|
|
A "hello world" program written e.g. in C should be executable from the
|
|
Unix shell and print its output on the terminal. This is possible in GF
|
|
as well, by using the ``gf`` program in a Unix pipe. Invoking ``gf``
|
|
can be done with grammar names as arguments,
|
|
```
|
|
% gf HelloEng.gf HelloFin.gf HelloIta.gf
|
|
```
|
|
which has the same effect as opening ``gf`` and then importing the
|
|
grammars. A command can be sent to this ``gf`` state by piping it from
|
|
Unix's ``echo`` command:
|
|
```
|
|
% echo "l -multi Hello World" | gf HelloEng.gf HelloFin.gf HelloIta.gf
|
|
```
|
|
which will execute the command and then quit. Alternatively, one can write
|
|
a **script**,
|
|
```
|
|
import HelloEng.gf
|
|
import HelloFin.gf
|
|
import HelloIta.gf
|
|
linearize -multi Hello World
|
|
```
|
|
If we name this script ``hello.gfs``, we can do
|
|
```
|
|
% gf -batch -s <hello.gfs
|
|
|
|
ciao mondo
|
|
terve maailma
|
|
hello world
|
|
```
|
|
The options ``-batch`` and ``-s`` ("silent") suppress GF's prompts, CPU time,
|
|
and other messages.
|
|
|
|
Writing GF scripts and Unix shell scripts that call GF is the simplest
|
|
way to build application programs that use GF grammars.
|
|
|
|
**Exercise**. (For Unix hackers.) Write a GF application that reads
|
|
an English string from the standard input and writes an Italian
|
|
translation to the output.
|
|
|
|
|
|
|
|
==What else can be done with the grammar==
|
|
|
|
Now we have built our first multilingual grammar and seen the basic
|
|
functionalities of GF: parsing and linearization. We have tested
|
|
these functionalities inside the GF program. In the forthcoming
|
|
chapters, we will build larger grammars and have more fun with
|
|
these functionalities. But we will also introduce many more,
|
|
as listed above in Section 3.2:
|
|
random generation,
|
|
exhaustive generation,
|
|
treebank generation,
|
|
syntax editing,
|
|
morphological analysis,
|
|
and
|
|
translation and morphological quizzes.
|
|
|
|
|
|
|
|
The usefulness of GF would be quite limited if grammars were
|
|
usable only inside the GF program. Later in this book,
|
|
we will see many other ways of using grammars:
|
|
- compile them to new formats, such as speech recognition grammars
|
|
- embed them in Java and Haskell programs
|
|
- build applications using compilation and embedding:
|
|
- voice commands
|
|
- spoken language translators
|
|
- dialogue systems
|
|
- user interfaces
|
|
- localization: parametrize the messages printed by a program
|
|
to support different languages
|
|
|
|
|
|
All GF functionalities, both those inside the GF program and those
|
|
ported to other environments,
|
|
are of course applicable to the simplest of grammars,
|
|
such as the ``Hello`` grammars presented above. But the main focus
|
|
of this tutorial will be on grammar writing. Thus we will show
|
|
how larger and more expressive grammars can be built by using
|
|
the constructs of the GF programming language, before entering the
|
|
applications in the next part of the book.
|
|
|
|
|
|
|
|
==Summary of GF language features==
|
|
|
|
A GF grammar consists of **modules**,
|
|
into which judgements are grouped. The most important
|
|
module forms are
|
|
- ``abstract`` A ``=`` M, abstract syntax A with judgements in
|
|
the module body M.
|
|
- ``concrete`` C ``of`` A ``=`` M, concrete syntax C of the
|
|
abstract syntax A, with judgements in the module body M.
|
|
|
|
|
|
Each module is written in a file named //Modulename//``.gf``.
|
|
|
|
Rules in a GF grammar are called **judgements**, and the keywords
|
|
``fun`` and ``lin`` are used for distinguishing between two
|
|
**judgement forms**. Here is a summary of the most important
|
|
judgement forms:
|
|
|
|
- abstract syntax
|
|
|
|
| form | reading |
|
|
| ``cat`` C | C is a category
|
|
| ``fun`` f ``:`` A | f is a function of type A
|
|
|
|
- concrete syntax
|
|
|
|
| form | reading |
|
|
| ``lincat`` C ``=`` T | category C has linearization type T
|
|
| ``lin`` f ``=`` t | function f has linearization t
|
|
|
|
|
|
Both abstract and concrete modules may moreover contain definitions of
|
|
**flags**, of the form
|
|
- ``flags`` //flag//``=``//value//
|
|
|
|
|
|
and **comments** of the forms
|
|
- ``--`` //anything till a newline//
|
|
- ``{-`` //anything except hyphen followed by closing brace// ``-}``
|
|
|
|
|
|
Shorthands permit the sharing of
|
|
the keyword in subsequent judgements,
|
|
```
|
|
cat Phrase ; Item ; === cat Phrase ; cat Item ;
|
|
```
|
|
and of the right-hand-side in subsequent judgements of the same form
|
|
```
|
|
fun World, Mum, Friends : Recipient ; ===
|
|
fun World : Recipient ; Mum : Recipient ; Friends : Recipient ;
|
|
```
|
|
We use the symbol ``===`` to indicate **syntactic sugar** when
|
|
speaking about GF. Thus it is not a symbol of the GF language.
|
|
|
|
The order of judgements in a module is free. In particular, an identifier
|
|
need not be declared before it is used.
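
As a small illustration of the free order, the following sketch declares functions before the categories they mention. The module name ``OrderDemo`` is made up for this example; the categories and functions are those of the ``Hello`` grammar.
```
abstract OrderDemo = {
  -- Hello and World mention Greeting and Recipient,
  -- which are declared only further down
  fun Hello : Recipient -> Greeting ;
  fun World : Recipient ;
  cat Greeting ; Recipient ;
}
```
GF should accept this module just as it would accept a version with the ``cat`` judgement first.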
|
|
|
|
An **identifier** is a letter followed by a sequence of letters, digits, and
|
|
characters ``'`` or ``_``. Each identifier can only be
|
|
introduced once in the same module.
|
|
|
|
**Types** in an abstract syntax are either **basic types**,
|
|
i.e. ones introduced in ``cat`` judgements, or
|
|
**function types** of the form
|
|
```
|
|
A1 -> ... -> An -> A
|
|
```
|
|
where each of ``A1, ..., An, A`` is a basic type (this restriction
|
|
will be relaxed later). The last type in the arrow-separated sequence
|
|
is the **value type** of the function type; the earlier types are
|
|
its **argument types**.
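
To make the terminology concrete, here is a hypothetical abstract syntax with functions of zero, one, and two argument types. The module ``GreetTwo`` and the function ``HelloBoth`` are assumptions introduced for illustration only; they are not part of the ``Hello`` grammar.
```
abstract GreetTwo = {
  cat Greeting ; Recipient ;
  fun
    World, Mum : Recipient ;                          -- no argument types; value type Recipient
    Hello : Recipient -> Greeting ;                   -- argument type Recipient; value type Greeting
    HelloBoth : Recipient -> Recipient -> Greeting ;  -- two argument types; value type Greeting
}
```
A tree such as ``HelloBoth World Mum`` is then of type ``Greeting``.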
|
|
|
|
In a concrete syntax, the available types include
|
|
- the type of strings, ``Str``
|
|
- record types of form ``{`` r1 : T1 ; ... ; rn : Tn ``}``
|
|
|
|
|
|
**Terms** used in linearizations have the forms
|
|
- quoted string: ``"foo"``, of type ``Str``
|
|
- concatenation of strings: ``"foo" ++ "bar"``, of type ``Str``
|
|
- record: ``{`` r1 = t1 ; ... ; rn = tn ``}``,
|
|
of type ``{`` r1 : R1 ; ... ; rn : Rn ``}``
|
|
- projection ``t.r`` with a record label, of the corresponding record
|
|
field type
|
|
- argument variable ``x`` bound by the left-hand-side of a ``lin`` rule,
|
|
of the corresponding linearization type
|
|
|
|
|
|
Each semi-colon separated part in record types and records is called a
|
|
**field**. The identifier introduced by the left-hand-side of a field
|
|
is called a **label**.
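
The following sketch puts these term forms together in one small grammar. The modules ``Polite`` and ``PoliteEng`` and the label ``title`` are invented for this example, and each module would go into its own file.
```
abstract Polite = {
  cat Greeting ; Recipient ;
  fun
    Hello : Recipient -> Greeting ;
    World : Recipient ;
}

concrete PoliteEng of Polite = {
  lincat
    Greeting = {s : Str} ;                 -- record type with one field
    Recipient = {s : Str ; title : Str} ;  -- record type with two fields, labels s and title
  lin
    World = {s = "world" ; title = "dear"} ;           -- record built from two quoted strings
    Hello rec = {s = "hello" ++ rec.title ++ rec.s} ;  -- rec is an argument variable;
                                                       -- rec.title and rec.s are projections
}
```
With these modules, ``Hello World`` would be expected to linearize to //hello dear world//.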
|
|
|
|
Each quoted string is treated as one **token**, and strings concatenated by
|
|
``++`` are treated as separate tokens. Tokens are, by default, written with
|
|
a space in between. This behaviour can be changed by ``lexer`` and ``unlexer``
|
|
flags, as will be explained later in Section ??.
|
|
|
|
|
|
|
|
|
|
|
|
=Designing a grammar for complex phrases=
|
|
|
|
We will now start with a grammar that has much more structure than
|
|
the ``Hello`` grammar. We will look at how the abstract syntax
|
|
is divided into suitable categories, and how infinitely many
|
|
phrases can be built by using recursive rules. We will also
|
|
introduce **modularity** by showing how a large grammar can be
|
|
divided into modules, and how functions defined in **resource modules**
|
|
can be used to share code in and among modules.
|
|
|
|
|
|
==The abstract syntax Food==
|
|
|
|
We will write a grammar that
|
|
defines a set of phrases usable for speaking about food:
|
|
- the main category is ``Phrase``
|
|
- a ``Phrase`` can be built by assigning a ``Quality`` to an ``Item``
|
|
(e.g. //this cheese is Italian//)
|
|
- an ``Item`` is built from a ``Kind`` by prefixing "this" or "that"
|
|
(e.g. //this wine//)
|
|
- a ``Kind`` is either **atomic** (e.g. //cheese//), or formed by
|
|
modifying a given ``Kind`` with a ``Quality`` (e.g. //Italian cheese//)
|
|
- a ``Quality`` is either atomic (e.g. //Italian//),
|
|
or built by modifying a given ``Quality`` (e.g. //very warm//)
|
|
|
|
|
|
These verbal descriptions can be expressed as the following abstract syntax:
|
|
```
|
|
abstract Food = {
|
|
|
|
flags startcat = Phrase ;
|
|
|
|
cat
|
|
Phrase ; Item ; Kind ; Quality ;
|
|
|
|
fun
|
|
Is : Item -> Quality -> Phrase ;
|
|
This, That : Kind -> Item ;
|
|
QKind : Quality -> Kind -> Kind ;
|
|
Wine, Cheese, Fish : Kind ;
|
|
Very : Quality -> Quality ;
|
|
Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ;
|
|
}
|
|
```
|
|
In this abstract syntax, we can build ``Phrase``s such as
|
|
```
|
|
Is (This (QKind Delicious (QKind Italian Wine))) (Very (Very Expensive))
|
|
```
|
|
In the English concrete syntax, we will want to linearize this into
|
|
```
|
|
this delicious Italian wine is very very expensive
|
|
```
|
|
|
|
|
|
==The concrete syntax FoodEng==
|
|
|
|
The English concrete syntax gives no surprises:
|
|
```
|
|
concrete FoodEng of Food = {
|
|
|
|
lincat
|
|
Phrase, Item, Kind, Quality = {s : Str} ;
|
|
|
|
lin
|
|
Is item quality = {s = item.s ++ "is" ++ quality.s} ;
|
|
This kind = {s = "this" ++ kind.s} ;
|
|
That kind = {s = "that" ++ kind.s} ;
|
|
QKind quality kind = {s = quality.s ++ kind.s} ;
|
|
Wine = {s = "wine"} ;
|
|
Cheese = {s = "cheese"} ;
|
|
Fish = {s = "fish"} ;
|
|
Very quality = {s = "very" ++ quality.s} ;
|
|
Fresh = {s = "fresh"} ;
|
|
Warm = {s = "warm"} ;
|
|
Italian = {s = "Italian"} ;
|
|
Expensive = {s = "expensive"} ;
|
|
Delicious = {s = "delicious"} ;
|
|
Boring = {s = "boring"} ;
|
|
}
|
|
```
|
|
Let us test how the grammar works in parsing:
|
|
```
|
|
> import FoodEng.gf
|
|
> parse "this delicious wine is very very Italian"
|
|
Is (This (QKind Delicious Wine)) (Very (Very Italian))
|
|
```
|
|
You can also try parsing in other categories than the ``startcat``,
|
|
by setting the command-line ``cat`` flag:
|
|
```
|
|
> p -cat=Kind "very Italian wine"
|
|
QKind (Very Italian) Wine
|
|
```
|
|
|
|
**Exercise**. Extend the ``Food`` grammar by ten new food kinds and
|
|
qualities, and run the parser with new kinds of examples.
|
|
|
|
|
|
**Exercise**. Add a rule that enables question phrases of the form
|
|
//is this cheese Italian//.
|
|
|
|
|
|
**Exercise**. Enable the optional prefixing of
|
|
phrases with the words "excuse me but". Do this in such a way that
|
|
the prefix can occur at most once.
|
|
|
|
|
|
|
|
==Commands for testing grammars==
|
|
|
|
===Generating trees and strings===
|
|
|
|
When we have a grammar above a trivial size, especially a recursive
|
|
one, we need more efficient ways of testing it than just by parsing
|
|
sentences that happen to come to our minds. One way to do this is
|
|
based on **automatic generation**, which can be either
|
|
**random** or **exhaustive**.
|
|
|
|
Random generation (``generate_random = gr``) is an operation that
|
|
builds a random tree in accordance with an abstract syntax:
|
|
```
|
|
> generate_random
|
|
Is (This (QKind Italian Fish)) Fresh
|
|
```
|
|
By using a pipe, random generation can be fed into linearization:
|
|
```
|
|
> generate_random | linearize
|
|
this Italian fish is fresh
|
|
```
|
|
Random generation is a good way to test a grammar; it can also
|
|
be fun. By using the ``number`` flag, several strings can be generated
|
|
in one command:
|
|
```
|
|
> gr -number=10 | l
|
|
that wine is boring
|
|
that fresh cheese is fresh
|
|
that cheese is very boring
|
|
this cheese is Italian
|
|
that expensive cheese is expensive
|
|
that fish is fresh
|
|
that wine is very Italian
|
|
this wine is Italian
|
|
this cheese is boring
|
|
this fish is boring
|
|
```
|
|
To generate //all// phrases that a grammar can produce,
|
|
GF provides the command ``generate_trees = gt``.
|
|
```
|
|
> generate_trees | l
|
|
that cheese is very Italian
|
|
that cheese is very boring
|
|
that cheese is very delicious
|
|
that cheese is very expensive
|
|
that cheese is very fresh
|
|
...
|
|
this wine is expensive
|
|
this wine is fresh
|
|
this wine is warm
|
|
|
|
```
|
|
You get quite a few trees but not all of them: only up to a given
|
|
**depth** of trees. The default depth is 3; the depth can be
|
|
set by using the ``depth`` flag:
|
|
```
|
|
> generate_trees -depth=5 | l
|
|
```
|
|
Other options to the generation commands (like all commands) can be seen
|
|
by GF's ``help = h`` command:
|
|
```
|
|
> help gr
|
|
> help gt
|
|
```
|
|
|
|
**Exercise**. If the command ``gt`` generated all
|
|
trees in your grammar, it would never terminate. Why?
|
|
|
|
**Exercise**. Measure how many trees the grammar gives with depths 4 and 5,
|
|
respectively. You can use the Unix **word count** command ``wc`` to count lines.
|
|
**Hint**. You can pipe the output of a GF command into a Unix command by
|
|
using the escape ``?``, as follows:
|
|
```
|
|
> generate_trees -depth=4 | ? wc
|
|
```
|
|
|
|
|
|
|
|
|
|
|
|
===More on pipes; tracing===
|
|
|
|
A pipe of GF commands can have any length, but the "output type"
|
|
(either string or tree) of one command must always match the "input type"
|
|
of the next command, in order for the result to make sense.
|
|
|
|
The intermediate results in a pipe can be observed by putting the
|
|
**tracing** flag ``-tr`` to each command whose output you
|
|
want to see:
|
|
```
|
|
> gr -tr | l -tr | p
|
|
|
|
Is (This Cheese) Boring
|
|
this cheese is boring
|
|
Is (This Cheese) Boring
|
|
```
|
|
This facility is useful for test purposes: for instance, you
|
|
may want to see if a grammar is **ambiguous**, i.e.
|
|
contains strings that can be parsed in more than one way.
|
|
|
|
**Exercise**. Extend the ``Food`` grammar so that it produces ambiguous
|
|
strings, and try out the ambiguity test.
|
|
|
|
|
|
|
|
===Writing and reading files===
|
|
|
|
To save the outputs of GF commands into a file, you can
|
|
pipe them to the ``write_file = wf`` command,
|
|
```
|
|
> gr -number=10 | l | write_file exx.tmp
|
|
```
|
|
You can read the file back to GF with the
|
|
``read_file = rf`` command,
|
|
```
|
|
> read_file exx.tmp | p -lines
|
|
```
|
|
Notice the flag ``-lines`` given to the parsing
|
|
command. This flag tells GF to parse each line of
|
|
the file separately. Without the flag, the grammar could
|
|
not recognize the string in the file, because it is not
|
|
a sentence but a sequence of ten sentences.
|
|
|
|
Files with examples can be used for **regression testing**
|
|
of grammars.
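
One possible regression test, using only the commands introduced above, is a round trip: parse the saved sentences, linearize the resulting trees, and write the result to a second file. This is a sketch under the assumptions that only ``FoodEng`` has been imported, that the grammar is unambiguous, and that ``exx.tmp`` was written as shown earlier; the file name ``exx.out`` is made up.
```
> read_file exx.tmp | p -lines | l | write_file exx.out
```
After a change in the grammar, the two files can be compared outside GF, for instance with the Unix ``diff`` command. Any difference points to a sentence that is no longer parsed or is linearized differently.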
|
|
|
|
|
|
|
|
|
|
|
|
==An Italian concrete syntax==
|
|
|
|
We write the Italian grammar in a straightforward way, by replacing
|
|
English words with their usual dictionary equivalents:
|
|
```
|
|
concrete FoodIta of Food = {
|
|
|
|
lincat
|
|
Phrase, Item, Kind, Quality = {s : Str} ;
|
|
|
|
lin
|
|
Is item quality = {s = item.s ++ "è" ++ quality.s} ;
|
|
This kind = {s = "questo" ++ kind.s} ;
|
|
That kind = {s = "quello" ++ kind.s} ;
|
|
QKind quality kind = {s = kind.s ++ quality.s} ;
|
|
Wine = {s = "vino"} ;
|
|
Cheese = {s = "formaggio"} ;
|
|
Fish = {s = "pesce"} ;
|
|
Very quality = {s = "molto" ++ quality.s} ;
|
|
Fresh = {s = "fresco"} ;
|
|
Warm = {s = "caldo"} ;
|
|
Italian = {s = "italiano"} ;
|
|
Expensive = {s = "caro"} ;
|
|
Delicious = {s = "delizioso"} ;
|
|
Boring = {s = "noioso"} ;
|
|
}
|
|
```
|
|
An alert reader, or one who already knows Italian, may notice one point in
|
|
which the change is more radical than just replacement of words: the order of
|
|
a quality and the kind it modifies in
|
|
```
|
|
QKind quality kind = {s = kind.s ++ quality.s} ;
|
|
```
|
|
Thus Italian says ``vino italiano`` for ``Italian wine``.
|
|
|
|
**Exercise**. Write a concrete syntax of ``Food`` for some other language.
|
|
You will probably end up with grammatically incorrect linearizations - but don't
|
|
worry about this yet.
|
|
|
|
**Exercise**. If you have written ``Food`` for German, Swedish, or some
|
|
other language, test with random or exhaustive generation what constructs
|
|
come out incorrect, and prepare a list of those that cannot be helped
|
|
with the currently available fragment of GF. You can return to your list
|
|
after having worked out Chapter 5.
|
|
|
|
|
|
|
|
==More applications of multilingual grammars==
|
|
|
|
===Multilingual treebanks===
|
|
|
|
A **multilingual treebank** is a set of trees with their
|
|
translations in different languages:
|
|
```
|
|
> gr -number=2 | tree_bank
|
|
|
|
Is (That Cheese) (Very Boring)
|
|
quello formaggio è molto noioso
|
|
that cheese is very boring
|
|
|
|
Is (That Cheese) Fresh
|
|
quello formaggio è fresco
|
|
that cheese is fresh
|
|
```
|
|
|
|
|
|
===Translation session===
|
|
|
|
If translation is what you want to do with a set of grammars, a convenient
|
|
way to do it is to open a ``translation_session = ts``. In this session,
|
|
you can translate between all the languages that are in scope.
|
|
A dot ``.`` terminates the translation session.
|
|
```
|
|
> ts
|
|
|
|
trans> that very warm cheese is boring
|
|
quello formaggio molto caldo è noioso
|
|
that very warm cheese is boring
|
|
|
|
trans> questo vino molto italiano è molto delizioso
|
|
questo vino molto italiano è molto delizioso
|
|
this very Italian wine is very delicious
|
|
|
|
trans> .
|
|
>
|
|
```
|
|
|
|
|
|
===Translation quiz===
|
|
|
|
This is a simple language exercise that can be automatically
|
|
generated from a multilingual grammar. The system generates a set of
|
|
random sentences, displays them in one language, and checks the user's
|
|
answer given in another language. The command ``translation_quiz = tq``
|
|
does this in a subshell of GF.
|
|
```
|
|
> translation_quiz FoodEng FoodIta
|
|
|
|
Welcome to GF Translation Quiz.
|
|
The quiz is over when you have done at least 10 examples
|
|
with at least 75 % success.
|
|
You can interrupt the quiz by entering a line consisting of a dot ('.').
|
|
|
|
this fish is warm
|
|
questo pesce è caldo
|
|
> Yes.
|
|
Score 1/1
|
|
|
|
this cheese is Italian
|
|
questo formaggio è noioso
|
|
> No, not questo formaggio è noioso, but
|
|
questo formaggio è italiano
|
|
|
|
Score 1/2
|
|
this fish is expensive
|
|
```
|
|
You can also generate a list of translation exercises and save it in a
|
|
file for later use, by the command ``translation_list = tl``
|
|
```
|
|
> translation_list -number=25 FoodEng FoodIta | write_file transl.txt
|
|
```
|
|
The ``number`` flag gives the number of sentences generated.
|
|
|
|
|
|
|
|
===Multilingual syntax editing===
|
|
|
|
Any multilingual grammar can be used in the graphical syntax editor, which is
|
|
opened by the shell
|
|
command ``gfeditor`` followed by the names of the grammar files.
|
|
Thus
|
|
```
|
|
% gfeditor FoodEng.gf FoodIta.gf
|
|
```
|
|
opens the editor for the two ``Food`` grammars.
|
|
|
|
The editor supports commands for manipulating an abstract syntax tree.
|
|
The process is started by choosing a category from the "New" menu.
|
|
Choosing ``Phrase`` creates a new tree of type ``Phrase``. A new tree
|
|
is in general completely unknown: it consists of a **metavariable**
|
|
``?1``. However, since the category ``Phrase`` in ``Food`` has
|
|
only one possible constructor, ``Is``, the tree is readily
|
|
given the form ``Is ?1 ?2``. Here is what the editor looks like at
|
|
this stage:
|
|
|
|
[food1.png]
|
|
|
|
Editing goes on by **refinements**, i.e. choices of constructors from
|
|
the menu, until no metavariables remain. Here is a tree resulting from the
|
|
current editing session:
|
|
|
|
[food2.png]
|
|
|
|
Editing can be continued even when the tree is finished. The user can shift
|
|
the **focus** to one of the subtrees by clicking on it or on the corresponding
|
|
part of a linearization. In the picture, the focus is on "fish".
|
|
Since there are no metavariables,
|
|
the menu shows no refinements, but some other possible actions:
|
|
- to **change** "fish" to "cheese" or "wine"
|
|
- to **delete** "fish", i.e. change it to a metavariable
|
|
- to **wrap** "fish" in a qualification, i.e. change it to
|
|
``QKind ? Fish``, where the quality can be given in a later refinement
|
|
|
|
|
|
In addition to menu-based editing, the tool supports refinement by parsing,
|
|
which is accessible by middle-clicking in the tree or in the linearization field.
|
|
|
|
**Exercise**. Construct the sentence
|
|
//this very expensive cheese is very very delicious//
|
|
and its Italian translation by using ``gfeditor``.
|
|
|
|
|
|
==Context-free grammars and GF==
|
|
|
|
Readers not familiar with context-free grammars, also known as BNF grammars, can
|
|
skip this section. Those who are familiar with them will find here the exact
|
|
relation between GF and context-free grammars. We will moreover show how
|
|
the BNF format can be used as input to the GF program; it is often more
|
|
concise than GF proper, but also more restricted in expressive power.
|
|
|
|
|
|
===The "cf" grammar format===
|
|
|
|
The grammar ``FoodEng`` could be written in a BNF format as follows:
|
|
```
|
|
Is. Phrase ::= Item "is" Quality ;
|
|
That. Item ::= "that" Kind ;
|
|
This. Item ::= "this" Kind ;
|
|
QKind. Kind ::= Quality Kind ;
|
|
Cheese. Kind ::= "cheese" ;
|
|
Fish. Kind ::= "fish" ;
|
|
Wine. Kind ::= "wine" ;
|
|
Italian. Quality ::= "Italian" ;
|
|
Boring. Quality ::= "boring" ;
|
|
Delicious. Quality ::= "delicious" ;
|
|
Expensive. Quality ::= "expensive" ;
|
|
Fresh. Quality ::= "fresh" ;
|
|
Very. Quality ::= "very" Quality ;
|
|
Warm. Quality ::= "warm" ;
|
|
```
|
|
In this format, each rule is prefixed by a **label** that gives
|
|
the constructor function GF gives in its ``fun`` rules. In fact,
|
|
each context-free rule is a fusion of a ``fun`` and a ``lin`` rule:
|
|
it states simultaneously that
|
|
- the label is a function from the nonterminal categories
|
|
on the right-hand side to the category on the left-hand side;
|
|
the first rule gives
|
|
```
|
|
fun Is : Item -> Quality -> Phrase
|
|
```
|
|
- trees built by the label are linearized in the way indicated
|
|
by the right-hand side;
|
|
the first rule gives
|
|
```
|
|
lin Is item quality = {s = item.s ++ "is" ++ quality.s}
|
|
```
|
|
|
|
|
|
The translation from BNF to GF described above is in fact used in
|
|
the GF system to convert BNF grammars into GF. BNF files are recognized
|
|
by the file name suffix ``.cf``; thus the grammar above can be
|
|
put into a file named ``food.cf`` and read into GF by
|
|
```
|
|
> import food.cf
|
|
```
|
|
|
|
|
|
===Restrictions of context-free grammars===
|
|
|
|
Even though we managed to write ``FoodEng`` in the context-free format,
|
|
we cannot do this for GF grammars in general. If we just try to do this
|
|
for ``FoodIta`` as well, we lose an important aspect of multilinguality:
|
|
that the order of constituents is defined separately in concrete syntax.
|
|
Thus we could not use context-free ``FoodEng`` and ``FoodIta`` in a multilingual
|
|
grammar that supports translation via common abstract syntax: the
|
|
qualification function ``QKind`` has different types in the two
|
|
grammars.
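
To see the type mismatch concretely, compare the ``fun`` judgements that the two context-free rules for ``QKind`` would induce. The Italian rule is an assumption here: it is what a BNF version of ``FoodIta`` would presumably contain, since the quality follows the kind in Italian.
```
-- from the English rule    QKind. Kind ::= Quality Kind ;
fun QKind : Quality -> Kind -> Kind ;

-- from the assumed Italian rule    QKind. Kind ::= Kind Quality ;
fun QKind : Kind -> Quality -> Kind ;
```
The two judgements cannot coexist in one abstract syntax, whereas in GF proper a single ``fun`` judgement is shared and only the ``lin`` rules differ.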
|
|
|
|
In general terms, the separation of concrete and abstract syntax allows
|
|
three deviations from context-free grammar:
|
|
- **permutation**: changing the order of constituents
|
|
- **suppression**: omitting constituents
|
|
- **reduplication**: repeating constituents
|
|
|
|
|
|
The third property is the one that definitely shows that GF is
|
|
stronger than context-free: GF can define the **copy language**
|
|
``{x x | x <- (a|b)*}``, which is known not to be context-free.
|
|
The other properties have more to do with the kind of trees that
|
|
the grammar can associate with strings: permutation is important
|
|
in multilingual grammars, and suppression is exploited in grammars
|
|
where trees carry some hidden semantic information (see Chapter 7
|
|
below).
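
The three deviations can be sketched in a toy grammar. The modules ``Deviate`` and ``DeviateEng`` and all names in them are hypothetical, and each module would go into its own file.
```
abstract Deviate = {
  cat S ; A ;
  fun
    Swap, Drop : A -> A -> S ;
    Twice : A -> S ;
    One, Two : A ;
}

concrete DeviateEng of Deviate = {
  lincat S, A = {s : Str} ;
  lin
    Swap x y = {s = y.s ++ x.s} ;  -- permutation: arguments in reverse order
    Drop x y = {s = x.s} ;         -- suppression: y does not appear in the string
    Twice x = {s = x.s ++ x.s} ;   -- reduplication: x appears twice
    One = {s = "one"} ;
    Two = {s = "two"} ;
}
```
Under these definitions, ``Swap One Two`` would linearize to //two one//, ``Drop One Two`` to //one//, and ``Twice One`` to //one one//.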
|
|
|
|
Of course, context-free grammars are also restricted from the
|
|
grammar engineering point of view. They give no support to
|
|
modules, functions, and parameters, which are so central
|
|
for the productivity of GF.
|
|
|
|
**Exercise**. GF can also interpret unlabelled BNF grammars, by
|
|
creating labels automatically. The right-hand sides of BNF rules
|
|
can moreover be disjunctions, e.g.
|
|
```
|
|
Quality ::= "fresh" | "Italian" | "very" Quality ;
|
|
```
|
|
Experiment with this format in GF, possibly with a grammar that
|
|
you import from some other source, such as a programming language
|
|
document.
|
|
|
|
**Exercise**. Define the copy language ``{x x | x <- (a|b)*}`` in GF.
|
|
|
|
|
|
|
|
%--!
|
|
==Modules and files==
|
|
|
|
GF uses suffixes to recognize different file formats. The most
|
|
important ones are:
|
|
- Source files: //Modulename//``.gf``
|
|
- Target files: //Modulename//``.gfc``
|
|
|
|
|
|
When you import ``FoodEng.gf``, you see the target files being
|
|
generated:
|
|
```
|
|
> i FoodEng.gf
|
|
- compiling Food.gf... wrote file Food.gfc 16 msec
|
|
- compiling FoodEng.gf... wrote file FoodEng.gfc 20 msec
|
|
```
|
|
You also see that the GF program does not only read the file
|
|
``FoodEng.gf``, but also all other files that it
|
|
depends on - in this case, ``Food.gf``.
|
|
|
|
For each file that is compiled, a ``.gfc`` file
|
|
is generated. The GFC format (="GF Canonical") is the
|
|
"machine code" of GF, which is faster to process than
|
|
GF source files. When reading a module, GF decides whether
|
|
to use an existing ``.gfc`` file or to generate
|
|
a new one, by looking at modification times.
|
|
|
|
**Exercise**. What happens when you import ``FoodEng.gf`` for
|
|
a second time? Try this in different situations:
|
|
- Right after importing it the first time (the modules are kept in
|
|
the memory of GF and need no reloading).
|
|
- After issuing the command ``empty`` (``e``), which clears the memory
|
|
of GF.
|
|
- After making a small change in ``FoodEng.gf``, be it only an added space.
|
|
- After making a change in ``Food.gf``.
|
|
|
|
|
|
|
|
|
|
|
|
==Using operations and resource modules==
|
|
|
|
===The golden rule of functional programming===
|
|
|
|
When writing a grammar, you have to type lots of
|
|
characters. You have probably
|
|
done this by the copy-and-paste method, which is a common way to
|
|
avoid repeating work.
|
|
|
|
However, there is a more elegant way to avoid repeating work than
|
|
the copy-and-paste
|
|
method. The **golden rule of functional programming** says that
|
|
- whenever you find yourself programming by copy-and-paste,
|
|
write a function instead.
|
|
|
|
|
|
A function separates the shared parts of different computations from the
|
|
changing parts, its **arguments**, or **parameters**.
|
|
In functional programming languages, such as
|
|
[Haskell http://www.haskell.org], it is possible to share much more
|
|
code with functions than in languages such as C and Java, because
|
|
of higher-order functions (functions that take functions as arguments).
|
|
|
|
|
|
===Operation definitions===
|
|
|
|
GF is a functional programming language, not only in the sense that
|
|
the abstract syntax is a system of functions (``fun``), but also because
|
|
functional programming can be used when defining concrete syntax. This is
|
|
done by using a new form of judgement, with the keyword ``oper`` (for
|
|
**operation**), distinct from ``fun`` for the sake of clarity.
|
|
Here is a simple example of an operation:
|
|
```
|
|
oper ss : Str -> {s : Str} = \x -> {s = x} ;
|
|
```
|
|
The operation can be **applied** to an argument, and GF will
|
|
**compute** the application into a value. For instance,
|
|
```
|
|
ss "boy" ===> {s = "boy"}
|
|
```
|
|
We use the symbol ``===>`` to indicate how an expression is
|
|
computed into a value; this symbol is not a part of GF.
|
|
|
|
Thus an ``oper`` judgement includes the name of the defined operation,
|
|
its type, and an expression defining it. As for the syntax of the defining
|
|
expression, notice the **lambda abstraction** form ``\``//x// ``->`` //t// of
|
|
the function. It reads: function with variable //x// and **function body**
|
|
//t//.
|
|
|
|
For lambda abstraction with multiple arguments, we have the shorthand
|
|
```
|
|
\x,y,z -> t === \x -> \y -> \z -> t
|
|
```
|
|
The notation we have used for linearization rules, where
|
|
variables are bound on the left-hand side, is actually syntactic
|
|
sugar for abstraction:
|
|
```
|
|
lin f x = t === lin f = \x -> t
|
|
```
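
For instance, a concrete syntax of ``Hello`` can give the linearization of the function ``Hello`` as an explicit lambda abstraction. The module name ``HelloLam`` is made up for this sketch; the words are those of the English examples shown earlier.
```
concrete HelloLam of Hello = {
  lincat Greeting, Recipient = {s : Str} ;
  lin
    -- same as: Hello rec = {s = "hello" ++ rec.s} ;
    Hello = \rec -> {s = "hello" ++ rec.s} ;
    World = {s = "world"} ;
    Mum = {s = "mum"} ;
    Friends = {s = "friends"} ;
}
```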
|
|
|
|
|
|
|
|
|
|
|
|
%--!
|
|
===The ``resource`` module type===
|
|
|
|
Operation definitions can be included in a concrete syntax.
|
|
But they are usually not really tied to a particular
|
|
set of linearization rules.
|
|
They should rather be seen as **resources**
|
|
usable in many concrete syntaxes.
|
|
|
|
The ``resource`` module type is used to package
|
|
``oper`` definitions into reusable resources. Here is
|
|
an example, with a handful of operations to manipulate
|
|
strings and records.
|
|
```
|
|
resource StringOper = {
|
|
oper
|
|
SS : Type = {s : Str} ;
|
|
ss : Str -> SS = \x -> {s = x} ;
|
|
cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;
|
|
prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
|
|
}
|
|
```
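
To see how these operations compute, here is a step-by-step evaluation of one application, written with the ``===>`` symbol for computation. The steps are a hand-worked sketch of how the definitions unfold, not output from GF.
```
prefix "very" (ss "warm")
  ===> ss ("very" ++ (ss "warm").s)
  ===> ss ("very" ++ {s = "warm"}.s)
  ===> ss ("very" ++ "warm")
  ===> {s = "very" ++ "warm"}
```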
|
|
|
|
|
|
|
|
%--!
|
|
===Opening a resource===
|
|
|
|
Any number of ``resource`` modules can be
|
|
**opened** in a ``concrete`` syntax, which
|
|
makes definitions contained
|
|
in the resource usable in the concrete syntax. Here is
|
|
an example, where the resource ``StringOper`` is
|
|
opened in a new version of ``FoodEng``.
|
|
```
|
|
concrete FoodEng of Food = open StringOper in {
|
|
|
|
lincat
|
|
S, Item, Kind, Quality = SS ;
|
|
|
|
lin
|
|
Is item quality = cc item (prefix "is" quality) ;
|
|
This k = prefix "this" k ;
|
|
That k = prefix "that" k ;
|
|
QKind k q = cc k q ;
|
|
Wine = ss "wine" ;
|
|
Cheese = ss "cheese" ;
|
|
Fish = ss "fish" ;
|
|
Very = prefix "very" ;
|
|
Fresh = ss "fresh" ;
|
|
Warm = ss "warm" ;
|
|
Italian = ss "Italian" ;
|
|
Expensive = ss "expensive" ;
|
|
Delicious = ss "delicious" ;
|
|
Boring = ss "boring" ;
|
|
}
|
|
```
|
|
|
|
**Exercise**. Use the same string operations to write ``FoodIta``
|
|
more concisely.
|
|
|
|
|
|
|
|
%--!
|
|
===Partial application===
|
|
|
|
GF, like Haskell, permits **partial application** of
|
|
functions. An example of this is the rule
|
|
```
|
|
lin This k = prefix "this" k ;
|
|
```
|
|
which can be written more concisely
|
|
```
|
|
lin This = prefix "this" ;
|
|
```
|
|
The first form is perhaps more intuitive to write
|
|
but, once you get used to partial application, you will appreciate its
|
|
conciseness and elegance. The logic of partial application
|
|
is known as **currying**, with a reference to Haskell B. Curry.
|
|
The idea is that any //n//-place function can be defined as a 1-place
|
|
function whose value is an (//n//-1)-place function. Thus
|
|
```
|
|
oper prefix : Str -> SS -> SS ;
|
|
```
|
|
can be used as a 1-place function that takes a ``Str`` into a
|
|
function ``SS -> SS``. The expected linearization of ``This`` is exactly
|
|
a function of such a type, operating on an argument of type ``Kind``
|
|
whose linearization is of type ``SS``. Thus we can define the
|
|
linearization directly as ``prefix "this"``.
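
The same idea can be tried out in a resource module. The following is a minimal sketch, assuming the module ``StringOper`` shown earlier; the module name ``PartialDemo`` and the operations ``addThis`` and ``thisWine`` are hypothetical.
```
resource PartialDemo = open StringOper in {
  oper
    -- prefix applied to its first argument only: the result is a function
    addThis : SS -> SS = prefix "this" ;
    -- supplying the remaining argument gives a record of type SS
    thisWine : SS = addThis (ss "wine") ;   -- same as prefix "this" (ss "wine")
}
```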
|
|
|
|
**Exercise**. Define an operation ``infix`` analogous to ``prefix``,
|
|
such that it allows you to write
|
|
```
|
|
lin Is = infix "is" ;
|
|
```
|
|
|
|
|
|
|
|
===Testing resource modules===
|
|
|
|
To test a ``resource`` module independently, you must import it
|
|
with the flag ``-retain``, which tells GF to retain ``oper`` definitions
|
|
in memory; the usual behaviour is that ``oper`` definitions
|
|
are just applied to compile linearization rules
|
|
(this is called **inlining**) and then thrown away.
|
|
```
|
|
> i -retain StringOper.gf
|
|
```
|
|
The command ``compute_concrete = cc`` computes any expression
|
|
formed by operations and other GF constructs. For example,
|
|
```
|
|
> compute_concrete prefix "in" (ss "addition")
|
|
{
|
|
s : Str = "in" ++ "addition"
|
|
}
|
|
```
|
|
|
|
|
|
==Grammar architecture==
|
|
|
|
===Extending a grammar===
|
|
|
|
The module system of GF makes it possible to **extend** a
|
|
grammar in different ways. The syntax of extension is
|
|
shown by the following example. We extend ``Food`` by
|
|
adding a category of questions and two new functions.
|
|
```
|
|
abstract Morefood = Food ** {
|
|
cat
|
|
Question ;
|
|
fun
|
|
QIs : Item -> Quality -> Question ;
|
|
Pizza : Kind ;
|
|
|
|
}
|
|
```
|
|
Parallel to the abstract syntax, extensions can
|
|
be built for concrete syntaxes:
|
|
```
|
|
concrete MorefoodEng of Morefood = FoodEng ** {
|
|
lincat
|
|
Question = {s : Str} ;
|
|
lin
|
|
QIs item quality = {s = "is" ++ item.s ++ quality.s} ;
|
|
Pizza = {s = "pizza"} ;
|
|
}
|
|
```
|
|
The effect of extension is that all of the contents of the extended
|
|
and extending module are put together. We also say that the new
|
|
module **inherits** the contents of the old module.
|
|
|
|
At the same time as extending a module of the same type, a concrete
|
|
syntax module may open resources. The syntax is shown by the
|
|
following Italian grammar module:
|
|
```
|
|
concrete MorefoodIta of Morefood = FoodIta ** open StringOper in {
|
|
lincat
|
|
Question = SS ;
|
|
lin
|
|
QIs item quality = ss (item.s ++ "è" ++ quality.s) ;
|
|
Pizza = ss "pizza" ;
|
|
}
|
|
```
|
|
Resource modules can extend other resource modules, in the
|
|
same way as modules of other types can extend modules of the
|
|
same type. Thus it is possible to build resource hierarchies.
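
A resource extension could look as follows. This is only a sketch: the module ``MoreStringOper`` and the operations ``postfix`` and ``bracket`` are hypothetical names, built on the ``StringOper`` module shown earlier.
```
resource MoreStringOper = StringOper ** {
  oper
    -- SS, ss, cc, and prefix are inherited from StringOper
    postfix : SS -> Str -> SS = \x,p -> ss (x.s ++ p) ;
    bracket : SS -> SS = \x -> ss ("(" ++ x.s ++ ")") ;
}
```
A module that opens ``MoreStringOper`` can use both the inherited and the new operations.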
|
|
|
|
|
|
===Multiple inheritance===
|
|
|
|
Specialized vocabularies can be represented as small grammars that
|
|
only do "one thing" each. For instance, the following are grammars
|
|
for fruit and mushrooms
|
|
```
|
|
abstract Fruit = {
|
|
cat Fruit ;
|
|
fun Apple, Peach : Fruit ;
|
|
}
|
|
|
|
abstract Mushroom = {
|
|
cat Mushroom ;
|
|
fun Cep, Agaric : Mushroom ;
|
|
}
|
|
```
|
|
They can afterwards be combined into bigger grammars by using
|
|
**multiple inheritance**, i.e. extension of several grammars at the
|
|
same time:
|
|
```
|
|
abstract Foodmarket = Food, Fruit, Mushroom ** {
|
|
fun
|
|
FruitKind : Fruit -> Kind ;
|
|
MushroomKind : Mushroom -> Kind ;
|
|
}
|
|
```
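
Concrete syntaxes can be combined in the same way. The following sketch assumes English concrete syntaxes ``FruitEng`` and ``MushroomEng``, which are not shown elsewhere in this text and are written here only for illustration; each module goes into a file of its own. Since opened resources are not inherited, the new rules use plain records rather than ``StringOper`` operations.
```
-- file FruitEng.gf (hypothetical)
concrete FruitEng of Fruit = {
  lincat Fruit = {s : Str} ;
  lin
    Apple = {s = "apple"} ;
    Peach = {s = "peach"} ;
}

-- file MushroomEng.gf (hypothetical)
concrete MushroomEng of Mushroom = {
  lincat Mushroom = {s : Str} ;
  lin
    Cep = {s = "cep"} ;
    Agaric = {s = "agaric"} ;
}

-- file FoodmarketEng.gf (hypothetical)
concrete FoodmarketEng of Foodmarket = FoodEng, FruitEng, MushroomEng ** {
  lin
    -- fruits and mushrooms have the same linearization type as Kind
    FruitKind fruit = fruit ;
    MushroomKind mushroom = mushroom ;
}
```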
|
|
The main advantages of splitting a grammar into modules are
|
|
**reusability**, **separate compilation**, and **division of labour**.
|
|
Reusability means
|
|
that one and the same module can be put into different uses; for instance,
|
|
a module with mushroom names might be used in a mycological information system
|
|
as well as in a restaurant phrasebook. Separate compilation means that a module
|
|
once compiled into ``.gfc`` need not be compiled again unless changes have
|
|
taken place.
|
|
Division of labour means simply that programmers that are experts in
|
|
special areas can work on modules belonging to those areas.
|
|
|
|
**Exercise**. Refactor ``Food`` by taking apart ``Wine`` into a special
|
|
``Drink`` module.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
==Summary of GF language features==
|
|
|
|
Module extensions, multiple inheritance.
|
|
|
|
Resource modules.
|
|
|
|
Oper judgements.
|
|
|
|
Lambda abstraction.
|
|
|
|
The ``.cf`` grammar format.
|
|
|
|
|
|
|
|
|
|
=Grammars with parameters=
|
|
|
|
In this Chapter, we will introduce the techniques needed for
|
|
describing the inflection of words, as well as the rules by
|
|
which proper word forms are selected in syntactic combinations.
|
|
These techniques are already needed in a very slight extension
|
|
of the Food grammar of the previous Chapter. While explaining
|
|
how the linguistic problems are solved for English and Italian,
|
|
we also explain all the language constructs GF has for
|
|
defining concrete syntax.
|
|
|
|
|
|
|
|
==The problem: words have to be inflected==
|
|
|
|
Suppose we want to say, with the vocabulary included in
|
|
``Food.gf``, things like
|
|
```
|
|
all Italian wines are delicious
|
|
```
|
|
The new grammatical devices we need are the plural forms
|
|
of nouns and verbs (//wines, are//), as opposed to their
|
|
singular forms.
|
|
|
|
The introduction of plural forms requires two things:
|
|
- the **inflection** of nouns and verbs in singular and plural
|
|
- the **agreement** of the verb with its subject:
|
|
the verb must have the same number as the subject
|
|
|
|
|
|
Different languages have different rules of inflection and agreement.
|
|
For instance, Italian also has agreement in gender (masculine vs. feminine).
|
|
We want to express such special features of languages in the
|
|
concrete syntax while ignoring them in the abstract syntax.
|
|
|
|
To be able to do all this, we need one new judgement form
|
|
and some new expression forms.
|
|
We also need to generalize linearization types
|
|
from strings to more complex types.
|
|
|
|
**Exercise**. Make a list of the possible forms that nouns,
|
|
adjectives, and verbs can have in some languages that you know.
|
|
|
|
|
|
%--!
|
|
==Parameters and tables==
|
|
|
|
We define the **parameter type** of number in English by
|
|
using a new form of judgement:
|
|
```
|
|
param Number = Sg | Pl ;
|
|
```
|
|
To express that ``Kind`` expressions in English have a linearization
|
|
depending on number, we replace the linearization type ``{s : Str}``
|
|
with a type where the ``s`` field is a **table** depending on number:
|
|
```
|
|
lincat Kind = {s : Number => Str} ;
|
|
```
|
|
The **table type** ``Number => Str`` is in many respects similar to
|
|
a function type (``Number -> Str``). The main difference is that the
|
|
argument type of a table type must always be a parameter type. This means
|
|
that the argument-value pairs can be listed in a finite table. The following
|
|
example shows such a table:
|
|
```
|
|
lin Cheese = {s = table {
|
|
Sg => "cheese" ;
|
|
Pl => "cheeses"
|
|
}
|
|
} ;
|
|
```
|
|
The table consists of **branches**, where a **pattern** on the
|
|
left of the arrow ``=>`` is assigned a **value** on the right.
|
|
|
|
The application of a table to a parameter is done by the **selection**
|
|
operator ``!``. For instance,
|
|
```
|
|
table {Sg => "cheese" ; Pl => "cheeses"} ! Pl
|
|
```
|
|
is a selection that computes into the value ``"cheeses"``.
|
|
This computation is performed by **pattern matching**: return
|
|
the value from the first branch whose pattern matches the
|
|
selection argument. Thus
|
|
```
|
|
table {Sg => "cheese" ; Pl => "cheeses"} ! Pl
|
|
===> "cheeses"
|
|
```
|
|
|
|
**Exercise**. In a previous exercise, we made a list of the possible
|
|
forms that nouns, adjectives, and verbs can have in some languages that
|
|
you know. Now take some of the results and implement them by
|
|
using parameter type definitions and tables. Write them into a ``resource``
|
|
module, which you can test by using the command ``compute_concrete``.
|
|
|
|
|
|
|
|
%--!
|
|
==Inflection tables and paradigms==
|
|
|
|
All English common nouns are inflected for number, most of them in the
|
|
same way: the plural form is obtained from the singular by adding the
|
|
ending //s//. This rule is an example of
|
|
a **paradigm** - a formula telling how the inflection
|
|
forms of a word are formed.
|
|
|
|
From the GF point of view, a paradigm is a function that takes a **lemma** -
|
|
also known as a **dictionary form** or a **citation form** - and returns an inflection
|
|
table of desired type. Paradigms are not functions in the sense of the
|
|
``fun`` judgements of abstract syntax (which operate on trees and not
|
|
on strings), but operations defined in ``oper`` judgements.
|
|
The following operation defines the regular noun paradigm of English:
|
|
```
|
|
oper regNoun : Str -> {s : Number => Str} = \x -> {
|
|
s = table {
|
|
Sg => x ;
|
|
Pl => x + "s"
|
|
}
|
|
} ;
|
|
```
|
|
The **gluing** operator ``+`` tells that
|
|
the string held in the variable ``x`` and the ending ``"s"``
|
|
are written together to form one **token**. Thus, for instance,
|
|
```
|
|
(regNoun "cheese").s ! Pl ===> "cheese" + "s" ===> "cheeses"
|
|
```
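
The difference between gluing (``+``) and concatenation (``++``) can be seen by comparing two definitions. The names ``glued`` and ``spaced`` are hypothetical; the fragment is meant to be placed in a ``resource`` module.
```
oper
  glued  : Str = "cheese" + "s" ;    -- one token: "cheeses"
  spaced : Str = "cheese" ++ "s" ;   -- two tokens: "cheese" and "s"
```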
|
|
|
|
**Exercise**. Identify cases in which the ``regNoun`` paradigm does not
|
|
apply in English, and implement some alternative paradigms.
|
|
|
|
**Exercise**. Implement a paradigm for regular verbs in English.
|
|
|
|
**Exercise**. Implement some regular paradigms for other languages you have
|
|
considered in earlier exercises.
|
|
|
|
|
|
|
|
==Using parameters in concrete syntax==
|
|
|
|
We can now enrich the concrete syntax definitions to
|
|
comprise morphology. This will permit a more radical
|
|
variation between languages (e.g. English and Italian)
|
|
than just the use of different words. In general,
|
|
parameters and linearization types are different in
|
|
different languages - but this has no effect on
|
|
the common abstract syntax.
|
|
|
|
We consider a grammar ``Foods``, which is similar to
|
|
``Food``, with the addition of two plural determiners,
|
|
```
|
|
fun These, Those : Kind -> Item ;
|
|
```
|
|
We also add a noun which in Italian has the feminine gender; all nouns in
|
|
``Food`` were carefully chosen to be masculine!
|
|
```
|
|
fun Pizza : Kind ;
|
|
```
|
|
This will force us to deal with gender in the Italian grammar, which is what
|
|
we need for the grammar to scale up for larger applications.
|
|
|
|
|
|
|
|
%--!
|
|
===Agreement===
|
|
|
|
In the English ``Foods`` grammar, we need just one type of parameters:
|
|
``Number`` as defined above. The phrase-forming rule
|
|
```
|
|
fun Is : Item -> Quality -> Phrase ;
|
|
```
|
|
is affected by the number because of **subject-verb agreement**.
|
|
In English, agreement says that the verb of a sentence
|
|
must be inflected in the number of the subject. Thus we will linearize
|
|
```
|
|
Is (This Pizza) Warm >> "this pizza is warm"
|
|
Is (These Pizza) Warm >> "these pizzas are warm"
|
|
```
|
|
It is the **copula**, i.e. the verb //be// that is affected. We define
|
|
the copula as the operation
|
|
```
|
|
oper copula : Number => Str =
|
|
table {
|
|
Sg => "is" ;
|
|
Pl => "are"
|
|
} ;
|
|
```
|
|
We don't need to inflect the copula for person and tense yet.
|
|
|
|
The form of the copula in a sentence depends on the
|
|
**subject** of the sentence, i.e. the item
|
|
that is qualified. This means that the linearization of an ``Item``
must provide a number. The obvious way to guarantee this is to put
a number as a field in the linearization type:
|
|
```
|
|
lincat Item = {s : Str ; n : Number} ;
|
|
```
|
|
Now we can write precisely the ``Is`` rule that expresses agreement:
|
|
```
|
|
lin Is item qual = {s = item.s ++ copula ! item.n ++ qual.s} ;
|
|
```
|
|
The copula needs a number, which it receives from the subject item.
|
|
|
|
|
|
===Determiners===
|
|
|
|
Let us turn to ``Item`` subjects and see how they receive their
|
|
numbers. The two rules
|
|
```
|
|
fun This, These : Kind -> Item ;
|
|
```
|
|
form ``Item``s from ``Kind``s by adding **determiners**, either
|
|
//this// or //these//. The determiners
|
|
require different numbers of their ``Kind`` arguments: ``This``
|
|
requires the singular (//this pizza//) and ``These`` the plural
|
|
(//these pizzas//). The ``Kind`` is the same in both cases: ``Pizza``.
|
|
Thus a ``Kind`` must have both singular and plural forms.
|
|
The simplest way to express this is by using a table:
|
|
```
|
|
lincat Kind = {s : Number => Str} ;
|
|
```
|
|
The linearization rules for ``This`` and ``These`` can now be written
|
|
```
|
|
lin This kind = {
|
|
s = "this" ++ kind.s ! Sg ;
|
|
n = Sg
|
|
} ;
|
|
|
|
lin These kind = {
|
|
s = "these" ++ kind.s ! Pl ;
|
|
n = Pl
|
|
} ;
|
|
```
|
|
The grammatical relation between the determiner and the noun is similar to
|
|
agreement, but due to some subtle differences into which we don't go here
|
|
it is often called **government**.
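
To see how determination and agreement work together, here is a step-by-step computation in the ``===>`` notation. It is a sketch that assumes ``Pizza`` is linearized with ``regNoun "pizza"``, ``Warm`` as ``{s = "warm"}``, and ``copula`` as the table defined above.
```
These Pizza
  ===> {s = "these" ++ "pizzas" ; n = Pl}

Is (These Pizza) Warm
  ===> {s = "these" ++ "pizzas" ++ copula ! Pl ++ "warm"}
  ===> {s = "these" ++ "pizzas" ++ "are" ++ "warm"}
```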
|
|
|
|
Since the same pattern for determination is used four times in the ``FoodsEng`` grammar,
|
|
we codify it as an operation,
|
|
```
|
|
oper det :
|
|
Str -> Number -> {s : Number => Str} -> {s : Str ; n : Number} =
|
|
\det,n,kind -> {
|
|
s = det ++ kind.s ! n ;
|
|
n = n
|
|
} ;
|
|
```
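
With this operation, the four determiner rules could be written by partial application, as sketched below. The argument order follows the definition just given (determiner word first, then number); the complete ``FoodsEng`` module later in this chapter takes the number first.
```
lin
  This  = det "this"  Sg ;
  That  = det "that"  Sg ;
  These = det "these" Pl ;
  Those = det "those" Pl ;
```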
|
|
In a more **lexicalized** grammar, determiners would be made into a
|
|
category of their own and given an inherent number:
|
|
```
|
|
lincat Det = {s : Str ; n : Number} ;
|
|
fun Det : Det -> Kind -> Item ;
|
|
lin Det det kind = {
|
|
s = det.s ++ kind.s ! det.n ;
|
|
n = det.n
|
|
} ;
|
|
```
|
|
This is essentially what is done in the linguistically motivated resource grammars.
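
In such a grammar, the determiners themselves become lexical entries. The following sketch uses the hypothetical function names ``ThisDet`` and ``TheseDet``; like the fragment above, it mixes abstract and concrete syntax judgements for brevity.
```
cat Det ;
fun ThisDet, TheseDet : Det ;

lin ThisDet  = {s = "this"  ; n = Sg} ;
lin TheseDet = {s = "these" ; n = Pl} ;

-- assuming Pizza is linearized with regNoun "pizza":
-- Det TheseDet Pizza ===> {s = "these" ++ "pizzas" ; n = Pl}
```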
|
|
|
|
|
|
|
|
===Parametric vs. inherent features===
|
|
|
|
``Kind``s, like common nouns in English in general, have both singular
|
|
and plural forms; what form is chosen is determined by the construction
|
|
in which the noun is used. We say that the number is a
|
|
**parametric feature** of nouns. In GF, parametric features
|
|
appear as argument types in tables in linearization types.
|
|
|
|
``Item``s, like noun phrases functioning as subjects in general, don't
|
|
have variation in number. The number is instead an **inherent feature**,
|
|
which the noun phrase passes to the verb. In GF, inherent features
|
|
appear as record fields in linearization types.
|
|
|
|
A category can have both parametric and inherent features. As we will see
|
|
in the Italian ``Foods`` grammar, nouns have parametric number and
|
|
inherent gender:
|
|
```
|
|
lincat Kind = {s : Number => Str ; g : Gender} ;
|
|
```
|
|
Nothing prevents the same parameter type from appearing both
|
|
as parametric and inherent feature, or the appearance of several inherent
|
|
features of the same type, etc. Determining the linearization types
|
|
of categories is one of the most crucial steps in the design of a GF
|
|
grammar. These two conditions must be in balance:
|
|
- existence: what forms are possible to build by morphological and
|
|
other means?
|
|
- need: what features are expected via agreement or government?
|
|
|
|
|
|
Grammar books and dictionaries give good advice on existence; for instance,
|
|
an Italian dictionary has entries such as
|
|
- **uomo**, pl. //uomini//, n.m. "man"
|
|
|
|
|
|
which tells that //uomo// is a masculine noun with the plural form //uomini//.
|
|
From this alone, or with a couple more examples, we can generalize to the type
|
|
for all nouns in Italian: they have both singular and plural forms and thus
|
|
a parametric number, and they have an inherent gender.
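
As a sketch, the dictionary entry above could be rendered in GF as follows. The parameter type ``Gender`` with constructors ``Masc`` and ``Fem`` and the operation name ``uomo`` are assumptions made for this example.
```
param Number = Sg | Pl ;
param Gender = Masc | Fem ;

oper uomo : {s : Number => Str ; g : Gender} = {
  s = table {Sg => "uomo" ; Pl => "uomini"} ;   -- parametric number
  g = Masc                                      -- inherent gender
} ;
```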
|
|
|
|
The distinction between parametric and inherent features can be stated in
|
|
object-oriented programming terms: a linearization type is like a **class**,
|
|
which has a **method** for linearization and also some **attributes**.
|
|
In this class, the parametric features appear as supplementary arguments to the
|
|
linearization method, whereas the inherent features appear as attributes.
|
|
|
|
Sometimes the puzzle of making agreement and government work in a grammar has
|
|
several solutions. For instance, **precedence** in programming languages can
|
|
be equivalently described by a parametric or an inherent feature
|
|
(see Section ?? below).
|
|
|
|
In natural language applications that use the resource grammar library,
|
|
all parameters are hidden from the user, who thereby does not need to bother
|
|
about them. The only thing that one has to think about is what linguistic
|
|
categories are given as linearization types to each semantic category.
|
|
|
|
|
|
|
|
==An English concrete syntax for Foods with parameters==
|
|
|
|
We repeat some of the rules above by showing the entire
|
|
module ``FoodsEng``, equipped with parameters. The parameters and
|
|
operations are, for the sake of brevity, included in the same module
|
|
and not in a separate ``resource``. However, some string operations
|
|
from the library [``Prelude`` ../../lib/prelude/Prelude.gf]
|
|
are used.
|
|
```
|
|
--# -path=.:prelude
|
|
|
|
concrete FoodsEng of Foods = open Prelude in {
|
|
|
|
lincat
|
|
S, Quality = SS ;
|
|
Kind = {s : Number => Str} ;
|
|
Item = {s : Str ; n : Number} ;
|
|
|
|
lin
|
|
Is item quality = ss (item.s ++ copula item.n ++ quality.s) ;
|
|
This = det Sg "this" ;
|
|
That = det Sg "that" ;
|
|
These = det Pl "these" ;
|
|
Those = det Pl "those" ;
|
|
QKind quality kind = {s = \\n => quality.s ++ kind.s ! n} ;
|
|
Wine = regNoun "wine" ;
|
|
Cheese = regNoun "cheese" ;
|
|
Fish = noun "fish" "fish" ;
|
|
Pizza = regNoun "pizza" ;
|
|
Very = prefixSS "very" ;
|
|
Fresh = ss "fresh" ;
|
|
Warm = ss "warm" ;
|
|
Italian = ss "Italian" ;
|
|
Expensive = ss "expensive" ;
|
|
Delicious = ss "delicious" ;
|
|
Boring = ss "boring" ;
|
|
|
|
param
|
|
Number = Sg | Pl ;
|
|
|
|
oper
|
|
det : Number -> Str -> {s : Number => Str} -> {s : Str ; n : Number} =
|
|
\n,d,cn -> {
|
|
s = d ++ cn.s ! n ;
|
|
n = n
|
|
} ;
|
|
noun : Str -> Str -> {s : Number => Str} =
|
|
\man,men -> {s = table {
|
|
Sg => man ;
|
|
Pl => men
|
|
}
|
|
} ;
|
|
regNoun : Str -> {s : Number => Str} =
|
|
\car -> noun car (car + "s") ;
|
|
copula : Number -> Str =
|
|
\n -> case n of {
|
|
Sg => "is" ;
|
|
Pl => "are"
|
|
} ;
|
|
}
|
|
```
|
|
Notice the ``case`` expression in the ``copula`` rule. Such expressions
|
|
are common in functional programming languages. In GF they are just syntactic
|
|
sugar for table selections:
|
|
```
|
|
case e of {...} === table {...} ! e
|
|
```
|
|
|
|
|
|
==Pattern matching==
|
|
|
|
We have so far built all expressions of the ``table`` form
|
|
from branches whose patterns are constants introduced in
|
|
``param`` definitions, as well as constant strings.
|
|
But there are more expressive patterns. Here is a summary of the possible forms:
|
|
- a constructor pattern (identifier introduced in a ``param`` definition) matches
|
|
the identical constructor
|
|
- a variable pattern (identifier other than constant parameter) matches anything
|
|
- the wild card ``_`` matches anything
|
|
- a string literal pattern, e.g. ``"s"``, matches the same string
|
|
- a disjunctive pattern ``P | ... | Q`` matches anything that
|
|
one of the disjuncts matches
|
|
|
|
|
|
Pattern matching is performed in the order in which the branches
|
|
appear in the table: the branch of the first matching pattern is followed.
|
|
Thus we could write the regular noun paradigm equally well as
|
|
```
|
|
regNoun : Str -> {s : Number => Str} =
|
|
\car -> {s = table {
|
|
Sg => car ;
|
|
_ => car + "s"
|
|
}
|
|
} ;
|
|
```
|
|
where the wildcard matches anything but the singular.
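
A disjunctive pattern can be illustrated with a parameter type of three values. The following is a sketch; the type ``Person`` and the operation ``presSg`` (singular present-tense forms of a regular English verb) are hypothetical names.
```
param Person = P1 | P2 | P3 ;

oper presSg : Str -> Person => Str = \walk -> table {
  P3      => walk + "s" ;   -- constructor pattern
  P1 | P2 => walk           -- disjunctive pattern
} ;
```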
|
|
|
|
Tables with only one branch are a common special case.
|
|
Either the value is the same for all parameters, as in
|
|
```
|
|
lin Fish = {s = table {_ => "fish"}} ;
|
|
```
|
|
or a parameter variable is just passed on to the right-hand side,
|
|
as in
|
|
```
|
|
lin QKind quality kind = {s = table {n => quality.s ++ kind.s ! n}} ;
|
|
```
|
|
GF has syntactic sugar for writing one-branch tables concisely:
|
|
```
|
|
\\P,...,Q => t === table {P => ... table {Q => t} ...}
|
|
```
|
|
Thus we could rewrite the above rules
|
|
```
|
|
lin Fish = {s = \\_ => "fish"} ;
|
|
lin QKind quality kind = {s = \\n => quality.s ++ kind.s ! n} ;
|
|
```
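
The sugar also covers nested tables with several arguments. As a sketch, an invariable word in a language with gender and number could be defined in either of the following equivalent ways; ``invar`` and ``invar2`` are hypothetical names, and ``Gender`` is assumed to have the constructors ``Masc`` and ``Fem``.
```
param Number = Sg | Pl ;
param Gender = Masc | Fem ;

oper
  -- the same string for every gender and number
  invar : Str -> Gender => Number => Str = \s -> \\_,_ => s ;
  -- the same written out with nested one-branch tables
  invar2 : Str -> Gender => Number => Str =
    \s -> table {_ => table {_ => s}} ;
```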
|
|
|
|
|
|
%--!
|
|
==Hierarchic parameter types==
|
|
|
|
The reader familiar with a functional programming language such as
|
|
[Haskell http://www.haskell.org] must have noticed the similarity
|
|
between parameter types in GF and **algebraic datatypes** (``data`` definitions
|
|
in Haskell). The parameter types of GF are actually a special case of algebraic
|
|
datatypes: the main restriction is that in GF, these types must be finite.
|
|
It is this restriction that makes it possible to invert linearization rules into
|
|
parsing methods.
|
|
|
|
However, finite is not the same thing as enumerated. Even in GF, parameter
|
|
constructors can take arguments, provided these arguments are from other
|
|
parameter types - only recursion is forbidden. Such parameter types impose a
|
|
hierarchic order among parameters. They are often needed to define
|
|
the linguistically most accurate parameter systems.
|
|
|
|
To give an example, Swedish adjectives
|
|
are inflected for number (singular or plural) and
|
|
gender (uter or neuter). These parameters would suggest 2*2=4 different
|
|
forms. However, the gender distinction is done only in the singular. Therefore,
|
|
it would be inaccurate to define adjective paradigms using the type
|
|
``Gender => Number => Str``. The following hierarchic definition
|
|
yields an accurate system of three adjectival forms.
|
|
```
|
|
param AdjForm = ASg Gender | APl ;
|
|
param Gender = Utr | Neutr ;
|
|
```
|
|
Here is an example of pattern matching, the paradigm of regular adjectives.
|
|
```
|
|
oper regAdj : Str -> AdjForm => Str = \fin -> table {
|
|
ASg Utr => fin ;
|
|
ASg Neutr => fin + "t" ;
|
|
APl => fin + "a"
} ;
|
|
```
|
|
A constructor can be used as a pattern that has patterns as arguments. For instance,
|
|
the adjectival paradigm in which the two singular forms are the same,
|
|
can be defined
|
|
```
|
|
oper plattAdj : Str -> AdjForm => Str = \platt -> table {
|
|
ASg _ => platt ;
|
|
APl => platt + "a"
} ;
|
|
```
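
Selection from such tables takes a complete parameter value, i.e. a constructor together with its arguments. The computations below are a sketch in the ``===>`` notation, using the two paradigms just defined.
```
(regAdj "fin") ! (ASg Utr)         ===> "fin"
(regAdj "fin") ! (ASg Neutr)       ===> "fin" + "t"  ===> "fint"
(regAdj "fin") ! APl               ===> "fin" + "a"  ===> "fina"
(plattAdj "platt") ! (ASg Neutr)   ===> "platt"
```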
|
|
|
|
|
|
|
|
|
|
%--!
|
|
==Discontinuous constituents==
|
|
|
|
A linearization type may contain more strings than one.
|
|
An example of where this is useful is English particle
|
|
verbs, such as //switch off//. The linearization of
|
|
a sentence may place the object between the verb and the particle:
|
|
//he switched it off//.
|
|
|
|
The following judgement defines transitive verbs as
|
|
**discontinuous constituents**, i.e. as having a linearization
|
|
type with two strings and not just one.
|
|
```
|
|
lincat TV = {s : Number => Str ; part : Str} ;
|
|
```
|
|
In the abstract syntax, we can now have a rule that combines a subject and an object
|
|
item with a transitive verb to form a sentence:
|
|
```
|
|
fun AppTV : Item -> TV -> Item -> Phrase ;
|
|
```
|
|
The linearization rule places the object between the two parts of the verb:
|
|
```
|
|
lin AppTV subj tv obj = {s = subj.s ++ tv.s ! subj.n ++ obj.s ++ tv.part} ;
|
|
```
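
A lexical entry for such a verb could look as follows. This is a sketch in which the function name ``SwitchOff`` is hypothetical.
```
fun SwitchOff : TV ;

lin SwitchOff = {
  s = table {Sg => "switches" ; Pl => "switch"} ;
  part = "off"
} ;

-- for a subject subj with subj.n = Sg:
-- AppTV subj SwitchOff obj ===> {s = subj.s ++ "switches" ++ obj.s ++ "off"}
```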
|
|
There is no restriction on the number of discontinuous constituents
|
|
(or other fields) a ``lincat`` may contain. The only condition is that
|
|
the fields must be built from records, tables,
|
|
parameters, and ``Str``, but not functions.
|
|
|
|
A mathematical result
|
|
about parsing in GF says that the worst-case complexity of parsing
|
|
increases with the number of discontinuous constituents. This is
|
|
potentially a reason to avoid discontinuous constituents.
|
|
|
|
Moreover, the parsing and linearization commands only give accurate
|
|
results for categories whose linearization type has a unique ``Str``
|
|
valued field labelled ``s``. Therefore, discontinuous constituents
|
|
are not a good idea in top-level categories accessed by the users
|
|
of a grammar application.
|
|
|
|
**Exercise**. Define the language ``a^n b^n c^n`` in GF, i.e.
|
|
any number of //a//'s followed by the same number of //b//'s and
|
|
the same number of //c//'s. This language is not context-free,
|
|
but can be defined in GF by using discontinuous constituents.
|
|
|
|
|
|
==More constructs for concrete syntax==
|
|
|
|
In this section, we go through constructs that are not necessary
|
|
in simple grammars or when the concrete syntax relies on libraries.
|
|
But they are useful when writing advanced concrete syntax implementations,
|
|
such as resource grammar libraries. They complete
|
|
our presentation of concrete syntax constructs.
|
|
|
|
|
|
%--!
|
|
===Local definitions===
|
|
|
|
Local definitions ("``let`` expressions") are used in functional
|
|
programming for two reasons: to structure the code into smaller
|
|
expressions, and to avoid repeated computation of one and
|
|
the same expression. Here is an example from
|
|
Italian morphology. The operation needs to analyse the
|
|
last letter of the lemma, to select a plural ending.
|
|
It also needs the stem consisting of all letters other than the last,
to add the ending to. The stem and the last letter are computed
in a local definition.
|
|
```
|
|
oper regNoun : Str -> Noun = \vino ->
|
|
let
|
|
vin = init vino ;
|
|
o = last vino
|
|
in
|
|
case o of {
|
|
"a" => mkNoun Fem vino (vin + "e") ;
|
|
"o" | "e" => mkNoun Masc vino (vin + "i") ;
|
|
_ => mkNoun Masc vino vino
|
|
} ;
|
|
```
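
The paradigm above relies on a type ``Noun`` and a constructor operation ``mkNoun`` that are not defined in this section. One possible way to define them is sketched below, assuming the parameter types ``Number`` and ``Gender`` and the ``Prelude`` operations ``init`` and ``last``.
```
param Number = Sg | Pl ;
param Gender = Masc | Fem ;

oper
  Noun : Type = {s : Number => Str ; g : Gender} ;

  -- gender, singular form, plural form
  mkNoun : Gender -> Str -> Str -> Noun = \g,sg,pl -> {
    s = table {Sg => sg ; Pl => pl} ;
    g = g
  } ;
```
With these definitions, ``regNoun "vino"`` would select the second branch and yield the plural //vini//.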
|
|
|
|
|
|
|
|
===Record extension and subtyping===
|
|
|
|
Record types and records can be **extended** with new fields. For instance,
|
|
in German it is natural to see transitive verbs as verbs with a case, which
|
|
is usually accusative or dative, and is passed to the object of the verb.
|
|
The symbol ``**`` is used for both record types and record objects.
|
|
```
|
|
lincat TV = Verb ** {c : Case} ;
|
|
|
|
lin Follow = regVerb "folgen" ** {c = Dative} ;
|
|
```
|
|
To extend a record type or a record with a field whose label it
|
|
already has is a type error. It is also an error to extend a type or
|
|
object that is not a record.
|
|
|
|
A record type //T// is a **subtype** of another one //R//, if //T// has
|
|
all the fields of //R// and possibly other fields. For instance,
|
|
an extension of a record type is always a subtype of it.
|
|
|
|
If //T// is a subtype of //R//, an object of //T// can be used whenever
|
|
an object of //R// is required. For instance, a transitive verb can
|
|
be used whenever a verb is required.
|
|
|
|
**Contravariance** means that a function taking an //R// as argument
|
|
can also be applied to any object of a subtype //T//.
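
The following resource module is a minimal sketch of record extension and subtyping. The module name ``SubtypeDemo`` and the operations ``folgen``, ``sgForm``, and ``test`` are hypothetical, and the verb type is simplified to a table over number.
```
resource SubtypeDemo = {
  param
    Number = Sg | Pl ;
    Case = Accusative | Dative ;
  oper
    Verb : Type = {s : Number => Str} ;
    TV   : Type = Verb ** {c : Case} ;      -- record type extension

    -- record extension: a verb record with an added case field
    folgen : TV = {s = table {Sg => "folgt" ; Pl => "folgen"}} ** {c = Dative} ;

    -- an operation expecting a Verb ...
    sgForm : Verb -> Str = \v -> v.s ! Sg ;

    -- ... can be applied to a TV, since TV is a subtype of Verb
    test : Str = sgForm folgen ;
}
```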
|
|
|
|
|
|
|
|
===Tuples and product types===
|
|
|
|
Product types and tuples are syntactic sugar for record types and records:
|
|
```
|
|
T1 * ... * Tn === {p1 : T1 ; ... ; pn : Tn}
<t1, ..., tn> === {p1 = t1 ; ... ; pn = tn}
|
|
```
|
|
Thus the labels ``p1, p2,...`` are hard-coded.
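
For example, the two definitions below denote the same record, and its components are accessed with the labels ``p1`` and ``p2``. This is a sketch; the names ``entry``, ``entry2``, and ``word`` are hypothetical.
```
param Number = Sg | Pl ;

oper
  entry  : Str * Number = <"pizza", Sg> ;
  entry2 : {p1 : Str ; p2 : Number} = {p1 = "pizza" ; p2 = Sg} ;
  word   : Str = entry.p1 ;   -- projection with a hard-coded label
```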
|
|
|
|
|
|
===Record and tuple patterns===
|
|
|
|
Record types of parameter types count themselves as parameter types.
|
|
A typical example is a record of agreement features, e.g. Italian
|
|
```
|
|
oper Agr : PType = {g : Gender ; n : Number ; p : Person} ;
|
|
```
|
|
Notice the term ``PType`` rather than just ``Type`` referring to
|
|
parameter types. Every ``PType`` is also a ``Type``, but not vice-versa.
|
|
|
|
Pattern matching is done in the expected way, but it can moreover
|
|
utilize partial records: the branch
|
|
```
|
|
{g = Fem} => t
|
|
```
|
|
in a table of type ``Agr => T`` means the same as
|
|
```
|
|
{g = Fem ; n = _ ; p = _} => t
|
|
```
|
|
Tuple patterns are translated to record patterns in the
|
|
same way as tuples to records; partial patterns make it
|
|
possible to write, slightly surprisingly,
|
|
```
|
|
case <g,n,p> of {
|
|
<Fem> => t
|
|
...
|
|
}
|
|
```
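
Partial record patterns combine naturally with first-match semantics. The following sketch gives the endings of a regular Italian adjective as a table over ``Agr``; the operation name ``adjEnding`` and the constructors of the three parameter types are assumptions made for this example.
```
param
  Gender = Masc | Fem ;
  Number = Sg | Pl ;
  Person = P1 | P2 | P3 ;

oper
  Agr : PType = {g : Gender ; n : Number ; p : Person} ;

  adjEnding : Agr => Str = table {
    {g = Masc ; n = Sg} => "o" ;   -- person is left out of the pattern
    {g = Fem  ; n = Sg} => "a" ;
    {g = Masc}          => "i" ;   -- reached only in the plural
    {g = Fem}           => "e"
  } ;
```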
|
|
|
|
===Regular expression patterns===
|
|
|
|
To define string operations computed at compile time, such
|
|
as in morphology, it is handy to use regular expression patterns:
|
|
- //p// ``+`` //q// : token consisting of //p// followed by //q//
|
|
- //p// ``*`` : token //p// repeated 0 or more times
|
|
(max the length of the string to be matched)
|
|
- ``-`` //p// : matches anything that //p// does not match
|
|
- //x// ``@`` //p// : bind to //x// what //p// matches
|
|
- //p// ``|`` //q// : matches what either //p// or //q// matches
|
|
|
|
|
|
The last three apply to all types of patterns, the first two only to token strings.
|
|
As an example, we give a rule for the formation of English word forms
|
|
ending with an //s// and used in the formation of both plural nouns and
|
|
third-person present-tense verbs.
|
|
```
|
|
add_s : Str -> Str = \w -> case w of {
|
|
_ + "oo" => w + "s" ; -- bamboo
|
|
_ + ("s" | "z" | "x" | "sh" | "o") => w + "es" ; -- bus, hero
|
|
_ + ("a" | "o" | "u" | "e") + "y" => w + "s" ; -- boy
|
|
x + "y" => x + "ies" ; -- fly
|
|
_ => w + "s" -- car
|
|
} ;
|
|
```
|
|
Here is another example, the plural formation in Swedish 2nd declension.
|
|
The second branch uses a variable binding with ``@`` to cover the cases where an
|
|
unstressed pre-final vowel //e// disappears in the plural
|
|
(//nyckel-nycklar, seger-segrar, bil-bilar//):
|
|
```
|
|
plural2 : Str -> Str = \w -> case w of {
|
|
pojk + "e" => pojk + "ar" ;
|
|
nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ;
|
|
bil => bil + "ar"
|
|
} ;
|
|
```
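
Traced by hand in the ``===>`` notation, the three branches apply to the example words as follows:
```
plural2 "pojke"   ===> "pojk" + "ar"        ===> "pojkar"    -- first branch
plural2 "nyckel"  ===> "nyck" + "l" + "ar"  ===> "nycklar"   -- second branch, l = "l"
plural2 "seger"   ===> "seg" + "r" + "ar"   ===> "segrar"    -- second branch, l = "r"
plural2 "bil"     ===> "bil" + "ar"         ===> "bilar"     -- third branch
```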
|
|
Variables in regular expression patterns
|
|
are always bound to the **first match**, which is the first
|
|
in the sequence of binding lists. For example:
|
|
- ``x + "e" + y`` matches ``"peter"`` with ``x = "p", y = "ter"``
|
|
- ``x + "er"*`` matches ``"burgerer"`` with ``x = "burg"``
|
|
|
|
|
|
|
|
**Exercise**. Implement the German **Umlaut** operation on word stems.
|
|
The operation changes the vowel of the stressed stem syllable as follows:
|
|
//a// to //ä//, //au// to //äu//, //o// to //ö//, and //u// to //ü//. You
|
|
can assume that the operation only takes syllables as arguments. Test the
|
|
operation to see whether it correctly changes //Arzt// to //Ärzt//,
|
|
//Baum// to //Bäum//, //Topf// to //Töpf//, and //Kuh// to //Küh//.
|
|
|
|
**Exercise**. Define an operation that deletes all vowels from the
|
|
end of a string, so that e.g. "aigeia" becomes "aig".
|
|
|
|
|
|
===Free variation===
|
|
|
|
Sometimes there are many alternative ways to define a concrete syntax.
|
|
For instance, the verb negation in English can be expressed both by
|
|
//does not// and //doesn't//. In linguistic terms, these expressions
|
|
are in **free variation**. The ``variants`` construct of GF can
|
|
be used to give a list of strings in free variation. For example,
|
|
```
|
|
NegVerb verb = {s = variants {["does not"] ; "doesn't"} ++ verb.s ! Pl} ;
|
|
```
|
|
An empty variant list
|
|
```
|
|
variants {}
|
|
```
|
|
can be used e.g. if a word lacks a certain form.
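A minimal sketch of this use, assuming the parameter type ``Number`` with the values ``Sg`` and ``Pl`` as defined earlier; the operation name ``plurOnly`` is hypothetical. A noun such as //scissors//, which has no singular, leaves the ``Sg`` slot empty:
```
  oper plurOnly : Str -> {s : Number => Str} = \w -> {
    s = table {
      Sg => variants {} ;  -- no singular form exists
      Pl => w
    }
  } ;
```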
|
|
|
|
In general, ``variants`` should be used cautiously. It is not
|
|
recommended for modules intended as libraries, because the
|
|
user of the library has no way to choose among the variants.
|
|
|
|
|
|
===Prefix-dependent choices===
|
|
|
|
Sometimes a token has different forms depending on the token
|
|
that follows. An example is the English indefinite article,
|
|
which is //an// if a vowel follows, //a// otherwise.
|
|
Which form is chosen can only be decided at run time, i.e.
|
|
when a string is actually built. GF has a special construct for
|
|
such tokens, the ``pre`` construct exemplified in
|
|
```
|
|
oper artIndef : Str =
|
|
pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ;
|
|
```
|
|
Thus
|
|
```
|
|
artIndef ++ "cheese" ---> "a" ++ "cheese"
|
|
artIndef ++ "apple" ---> "an" ++ "apple"
|
|
```
|
|
This very example does not work in all situations: the prefix
|
|
//u// has no general rules, and some problematic words are
|
|
//euphemism, one-eyed, n-gram//. It is possible to write
|
|
```
|
|
oper artIndef : Str =
|
|
pre {"a" ;
|
|
"a" / strs {"eu" ; "one"} ;
|
|
"an" / strs {"a" ; "e" ; "i" ; "o" ; "n-"}
|
|
} ;
|
|
```
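Assuming that the alternatives are tried in the order written, so that the longer prefixes //eu// and //one// take precedence over the single vowels, the revised definition would behave as follows:
```
  artIndef ++ "euphemism"  --->  "a"  ++ "euphemism"
  artIndef ++ "one-eyed"   --->  "a"  ++ "one-eyed"
  artIndef ++ "n-gram"     --->  "an" ++ "n-gram"
  artIndef ++ "orange"     --->  "an" ++ "orange"
  artIndef ++ "nut"        --->  "a"  ++ "nut"
```
Words beginning with //u// are still not covered by any alternative and therefore get the default //a//, which suits //union// but not //umbrella//.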
|
|
|
|
**Exercise**. The masculine singular definite article in Italian has three forms:
|
|
- //l'// before a vowel (any of //aeiouh//): //l'amico// ("the friend")
|
|
- //lo// before "impure s"
|
|
(any of "sb", "sc", "sd", "sf", "sg", "sm", "sp", "st", "sv", "z"):
|
|
//lo stato// ("the state")
|
|
- //il// otherwise: //il vino// ("the wine")
|
|
|
|
|
|
Define this by using prefix-dependent choice.
|
|
|
|
|
|
|
|
===Predefined types===
|
|
|
|
GF has the following predefined categories in abstract syntax:
|
|
```
|
|
cat Int ; -- integers, e.g. 0, 5, 743145151019
|
|
cat Float ; -- floats, e.g. 0.0, 3.1415926
|
|
cat String ; -- strings, e.g. "", "foo", "123"
|
|
```
|
|
The objects of each of these categories are **literals**
|
|
as indicated in the comments above. No ``fun`` definition
|
|
can have a predefined category as its value type, but
|
|
they can be used as arguments. For example:
|
|
```
|
|
fun StreetAddress : Int -> String -> Address ;
|
|
lin StreetAddress number street = {s = number.s ++ street.s} ;
|
|
|
|
-- e.g. (StreetAddress 10 "Downing Street") : Address
|
|
```
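For context, a minimal pair of modules in which these two judgements could be compiled might look as follows. The module names ``Addresses`` and ``AddressesEng`` are hypothetical, and the linearization type chosen for ``Address`` is an assumption.
```
  abstract Addresses = {
    cat Address ;
    fun StreetAddress : Int -> String -> Address ;
  }

  concrete AddressesEng of Addresses = {
    lincat Address = {s : Str} ;
    -- number and street are literals; each is a record with a field s
    lin StreetAddress number street = {s = number.s ++ street.s} ;
  }
```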
|
|
The linearization type is ``{s : Str}`` for all these categories.
|
|
|
|
|
|
===Function types with variables===
|
|
|
|
In Chapter 8, we will introduce **dependent function types**, where
|
|
the value type depends on the argument. To this end, we need a notation
|
|
that binds a variable to the argument type, as in
|
|
```
|
|
switchOff : (k : Kind) -> Action k
|
|
```
|
|
Function types //without//
|
|
variables are actually a shorthand notation: writing
|
|
```
|
|
PredVP : NP -> VP -> S
|
|
```
|
|
is shorthand for
|
|
```
|
|
PredVP : (x : NP) -> (y : VP) -> S
|
|
```
|
|
or any other naming of the variables. Actually the use of variables
|
|
sometimes shortens the code, since they can share a type:
|
|
```
|
|
octuple : (x,y,z,u,v,w,s,t : Str) -> Str
|
|
```
|
|
If a bound variable is not used, it can here, as elsewhere in GF, be replaced by
|
|
a wildcard:
|
|
```
|
|
octuple : (_,_,_,_,_,_,_,_ : Str) -> Str
|
|
```
|
|
A good practice for functions with many arguments of the same type
|
|
is to indicate the number of arguments:
|
|
```
|
|
octuple : (x1,_,_,_,_,_,_,x8 : Str) -> Str
|
|
```
|
|
One can also use heuristic variable names to document what
|
|
information each argument is expected to provide.
|
|
This is very handy in the types of inflection paradigms:
|
|
```
|
|
mkV : (drink,drank,drunk : Str) -> V
|
|
```
|
|
|
|
|
|
===Separating operation types and definitions===
|
|
|
|
In grammars intended as libraries, it is useful to separate operation
|
|
definitions from their type signatures. The user is only interested
|
|
in the type, whereas the definition is kept for the implementor and
|
|
the maintainer. This is possible by using separate ``oper`` fragments
|
|
for the two parts:
|
|
```
|
|
oper regNoun : Str -> Noun ;
|
|
oper regNoun s = mkNoun s (s + "s") ;
|
|
```
|
|
The type checker combines the two into one ``oper`` judgement to see
|
|
if the definition matches the type. Notice that, in this way, it
|
|
is possible to bind the argument variables on the left hand side
|
|
instead of using a lambda.
|
|
|
|
In the library module, the type signatures are typically placed in
|
|
the beginning and the definitions in the end. A more radical separation
|
|
can be achieved by using the ``interface`` and ``instance`` module types
|
|
(see below Section ??): the type signatures are placed in the interface
|
|
and the definitions in the instance.
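The layout with signatures first and definitions last can be sketched in a single resource module. This is only an illustration: the module name ``NounOps`` and the record type chosen for ``Noun`` are assumptions, not part of any library.
```
  resource NounOps = {
    param Number = Sg | Pl ;

    -- type signatures: what the library user reads
    oper
      Noun    : Type ;
      mkNoun  : Str -> Str -> Noun ;
      regNoun : Str -> Noun ;

    -- definitions: for the implementor and the maintainer
    oper
      Noun = {s : Number => Str} ;
      mkNoun sg pl = {s = table {Sg => sg ; Pl => pl}} ;
      regNoun s = mkNoun s (s + "s") ;
  }
```
Here the definitions of ``mkNoun`` and ``regNoun`` bind their arguments on the left-hand side, so no lambda is needed.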
|
|
|
|
|
|
|
|
|
|
===Overloading of operations===
|
|
|
|
Large libraries, such as the GF Resource Grammar Library, may define
|
|
hundreds of names. This can be impractical
|
|
for both the library author and the user: the author has to invent longer
|
|
and longer names which are not always intuitive,
|
|
and the user has to learn or at least be able to find all these names.
|
|
A solution to this problem, adopted by languages such as C++,
|
|
is **overloading**: one and the same name can be used for several functions.
|
|
When such a name is used, the
|
|
compiler performs **overload resolution** to find out which of
|
|
the possible functions is meant. Overload resolution is based on
|
|
the types of the functions: all functions that
|
|
have the same name must have different types.
|
|
|
|
In C++, functions with the same name can be scattered everywhere in the program.
|
|
In GF, they must be grouped together in ``overload`` groups. Here is an example
|
|
of an overload group, giving three different ways to define verbs in English:
|
|
```
|
|
oper mkV : overload {
|
|
mkV : (walk : Str) -> V ; -- regular verbs
|
|
mkV : (omit,omitting : Str) -> V ; -- reg. verbs with duplication
|
|
mkV : (sing,sang,sung : Str) -> V ; -- irregular verbs
|
|
mkV : (run,ran,run,running : Str) -> V -- irreg. verbs with duplication
|
|
}
|
|
```
|
|
Intuitively, the forms correspond to the way regular and irregular words
|
|
are given in most dictionaries: by listing relevant forms, instead of
|
|
referring to a paradigm number identifier.
|
|
|
|
The ``mkV`` example above gives only the possible types of the overloaded
|
|
operation. Their definitions can be given separately, maybe in another module
|
|
(cf. the section above). An overload group with definitions looks as follows:
|
|
```
|
|
oper mkV = overload {
|
|
mkV : (walk : Str) -> V = regV ;
|
|
mkV : (omit,omitting : Str) -> V = ... ;
|
|
mkV : (sing,sang,sung : Str) -> V = ... ;
|
|
mkV : (run,ran,run,running : Str) -> V = ... ;
|
|
}
|
|
```
|
|
Notice that the types of the branches must be repeated so that they can be
|
|
associated with proper definitions; the order of the branches has no
|
|
significance.
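To make the mechanism concrete, here is a small self-contained sketch with all definitions filled in. The record type ``V`` is deliberately simplified to two forms, and the names ``VerbOps``, ``regV``, and ``irregV`` as defined here are assumptions made for the illustration.
```
  resource VerbOps = {
    oper
      V : Type = {s : Str ; past : Str} ;

      regV : Str -> V = \walk ->
        {s = walk ; past = walk + "ed"} ;

      irregV : (sing,sang : Str) -> V = \sing,sang ->
        {s = sing ; past = sang} ;

      mkV = overload {
        mkV : (walk : Str) -> V = regV ;         -- one form given
        mkV : (sing,sang : Str) -> V = irregV ;  -- two forms given
      } ;
  }
```
With these definitions, an application of ``mkV`` to one string would be resolved to ``regV`` and an application to two strings to ``irregV``, since the two branches differ in the number of ``Str`` arguments.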
|
|
|
|
|
|
|
|
==The Italian Food grammar==
|
|
|
|
We conclude the parametrization of the Food grammar by presenting an
|
|
Italian variant, now complete with parameters, inflection, and
|
|
agreement.
|
|
|
|
The header part is similar to English:
|
|
```
|
|
--# -path=.:prelude
|
|
|
|
concrete FoodsIta of Foods = open Prelude in {
|
|
```
|
|
Parameters include not only number but also gender.
|
|
```
|
|
param
|
|
Number = Sg | Pl ;
|
|
Gender = Masc | Fem ;
|
|
```
|
|
Qualities are inflected for gender and number, whereas kinds
|
|
have a parametric number (as in English) and an inherent gender.
|
|
Items have an inherent number (as in English) but also gender.
|
|
```
|
|
lincat
|
|
Phr = SS ;
|
|
Quality = {s : Gender => Number => Str} ;
|
|
Kind = {s : Number => Str ; g : Gender} ;
|
|
Item = {s : Str ; g : Gender ; n : Number} ;
|
|
```
|
|
A Quality is expressed by an adjective, which in Italian has one form for each
|
|
gender-number combination.
|
|
```
|
|
oper
|
|
adjective : (_,_,_,_ : Str) -> {s : Gender => Number => Str} =
|
|
\nero,nera,neri,nere -> {
|
|
s = table {
|
|
Masc => table {
|
|
Sg => nero ;
|
|
Pl => neri
|
|
} ;
|
|
Fem => table {
|
|
Sg => nera ;
|
|
Pl => nere
|
|
}
|
|
}
|
|
} ;
|
|
```
|
|
The very common case of regular adjectives works by adding
|
|
endings to the stem.
|
|
```
|
|
regAdj : Str -> {s : Gender => Number => Str} = \nero ->
|
|
let ner = init nero
|
|
in adjective nero (ner + "a") (ner + "i") (ner + "e") ;
|
|
```
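As a worked trace, assuming that ``init`` is the ``Prelude`` operation that drops the last character of a string, the regular paradigm unfolds as follows:
```
  regAdj "caldo"
    --->  adjective "caldo" ("cald" + "a") ("cald" + "i") ("cald" + "e")

  -- resulting forms:
  --   Masc, Sg: "caldo"    Masc, Pl: "caldi"
  --   Fem,  Sg: "calda"    Fem,  Pl: "calde"
```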
|
|
For noun inflection, there are several paradigms; since only two forms
|
|
are ever needed, we will just give them explicitly (the resource grammar
|
|
library also has a paradigm that takes the singular form and infers the
|
|
plural and the gender from it).
|
|
```
|
|
noun : Str -> Str -> Gender -> {s : Number => Str ; g : Gender} =
|
|
\man,men,g -> {
|
|
s = table {
|
|
Sg => man ;
|
|
Pl => men
|
|
} ;
|
|
g = g
|
|
} ;
|
|
```
|
|
As in ``FoodsEng``, we need only number variation for the copula.
|
|
```
|
|
copula : Number -> Str =
|
|
\n -> case n of {
|
|
Sg => "è" ;
|
|
Pl => "sono"
|
|
} ;
|
|
```
|
|
Determination is more complex than in English, because of gender:
|
|
it uses separate determiner forms for the two genders, and selects
|
|
one of them as a function of the noun determined.
|
|
```
|
|
|
|
det : Number -> Str -> Str -> {s : Number => Str ; g : Gender} ->
|
|
{s : Str ; g : Gender ; n : Number} =
|
|
\n,m,f,cn -> {
|
|
s = case cn.g of {Masc => m ; Fem => f} ++ cn.s ! n ;
|
|
g = cn.g ;
|
|
n = n
|
|
} ;
|
|
```
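The following traces illustrate how ``det`` picks the determiner form from the inherent gender of the noun; they use the ``noun`` operation defined above.
```
  det Sg "questo" "questa" (noun "vino" "vini" Masc)
    --->  {s = "questo" ++ "vino" ; g = Masc ; n = Sg}

  det Pl "questi" "queste" (noun "pizza" "pizze" Fem)
    --->  {s = "queste" ++ "pizze" ; g = Fem ; n = Pl}
```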
|
|
Here is, finally, the complete set of linearization rules.
|
|
```
|
|
lin
|
|
Is item quality =
|
|
ss (item.s ++ copula item.n ++ quality.s ! item.g ! item.n) ;
|
|
This = det Sg "questo" "questa" ;
|
|
That = det Sg "quello" "quella" ;
|
|
These = det Pl "questi" "queste" ;
|
|
Those = det Pl "quelli" "quelle" ;
|
|
QKind quality kind = {
|
|
s = \\n => kind.s ! n ++ quality.s ! kind.g ! n ;
|
|
g = kind.g
|
|
} ;
|
|
Wine = noun "vino" "vini" Masc ;
|
|
Cheese = noun "formaggio" "formaggi" Masc ;
|
|
Fish = noun "pesce" "pesci" Masc ;
|
|
Pizza = noun "pizza" "pizze" Fem ;
|
|
Very qual = {s = \\g,n => "molto" ++ qual.s ! g ! n} ;
|
|
Fresh = adjective "fresco" "fresca" "freschi" "fresche" ;
|
|
Warm = regAdj "caldo" ;
|
|
Italian = regAdj "italiano" ;
|
|
Expensive = regAdj "caro" ;
|
|
Delicious = regAdj "delizioso" ;
|
|
Boring = regAdj "noioso" ;
|
|
}
|
|
```
|
|
The grammars ``FoodsEng`` and ``FoodsIta`` can be found on line, and
|
|
in the GF distribution, in the directory
|
|
[``examples/tutorial/foods/`` ../../examples/tutorial/foods/].
|
|
|
|
|
|
**Exercise**. Experiment with multilingual generation and translation in the
|
|
``Foods`` grammars.
|
|
|
|
|
|
**Exercise**. Add items, qualities, and determiners to the grammar, and try to get
|
|
their inflection and inherent features right.
|
|
|
|
**Exercise**. Write a concrete syntax of ``Food`` for a language of your choice,
|
|
now aiming for complete grammatical correctness by the use of parameters.
|
|
|
|
|
|
|
|
|
|
|
|
=Using the resource grammar library=
|
|
|
|
In this chapter, we will take a look at the GF resource grammar library.
|
|
We will use the library to implement a slightly extended ``Food`` grammar
|
|
and port it to some new languages. Some new concepts of GF's module system
|
|
are also introduced, most notably the technique of **parametrized modules**,
|
|
which has become an important "design pattern" for multilingual grammars.
|
|
|
|
|
|
==The coverage of the library==
|
|
|
|
The GF Resource Grammar Library contains grammar rules for
|
|
10 languages (in addition, 2 languages are available as incomplete
|
|
implementations, and a few more are under construction). Its purpose
|
|
is to make these rules available for application programmers,
|
|
who can thereby concentrate on the semantic and stylistic
|
|
aspects of their grammars, without having to think about
|
|
grammaticality. The targeted level of application grammarians
|
|
is that of a skilled programmer with
|
|
a practical knowledge of the target languages, but without
|
|
theoretical knowledge about their grammars.
|
|
Such a combination of
|
|
skills is typical of programmers who, for instance, want to localize
|
|
software to new languages.
|
|
|
|
The current resource languages are
|
|
- ``Ara``bic (incomplete)
|
|
- ``Cat``alan (incomplete)
|
|
- ``Dan``ish
|
|
- ``Eng``lish
|
|
- ``Fin``nish
|
|
- ``Fre``nch
|
|
- ``Ger``man
|
|
- ``Ita``lian
|
|
- ``Nor``wegian
|
|
- ``Rus``sian
|
|
- ``Spa``nish
|
|
- ``Swe``dish
|
|
|
|
|
|
The first three letters (``Eng`` etc) are used in grammar module names.
|
|
The incomplete Arabic and Catalan implementations are
|
|
enough to be used in many applications; they both contain, among other
|
|
things, complete inflectional morphology.
|
|
|
|
|
|
==The resource API==
|
|
|
|
The resource library API is divided into language-specific
|
|
and language-independent parts. To put it roughly,
|
|
- the syntax API is language-independent, i.e. has the same types and functions for all
|
|
languages.
|
|
Its name is ``Syntax``//L// for each language //L//
|
|
- the morphology API is language-specific, i.e. has partly different types and functions
|
|
for different languages.
|
|
Its name is ``Paradigms``//L// for each language //L//
|
|
|
|
|
|
A full documentation of the API is available on-line in the
|
|
[resource synopsis ../../lib/resource-1.0/synopsis.html]. For our
|
|
examples, we will only need a fragment of the full API.
|
|
|
|
In the first examples,
|
|
we will make use of the following categories, from the module ``Syntax``.
|
|
|
|
|| Category | Explanation | Example ||
|
|
| ``Utt`` | sentence, question, word... | "be quiet" |
|
|
| ``Adv`` | verb-phrase-modifying adverb | "in the house" |
|
|
| ``AdA`` | adjective-modifying adverb | "very" |
|
|
| ``S`` | declarative sentence | "she lived here" |
|
|
| ``Cl`` | declarative clause, with all tenses | "she looks at this" |
|
|
| ``AP`` | adjectival phrase | "very warm" |
|
|
| ``CN`` | common noun (without determiner) | "red house" |
|
|
| ``NP`` | noun phrase (subject or object) | "the red house" |
|
|
| ``Det`` | determiner phrase | "those seven" |
|
|
| ``Predet`` | predeterminer | "only" |
|
|
| ``Quant`` | quantifier with both sg and pl | "this/these" |
|
|
| ``Prep`` | preposition, or just case | "in" |
|
|
| ``A`` | one-place adjective | "warm" |
|
|
| ``N`` | common noun | "house" |
|
|
|
|
|
|
We will need the following syntax rules from ``Syntax``.
|
|
|
|
|| Function | Type | Example ||
|
|
| ``mkUtt`` | ``S -> Utt`` | //John walked// |
|
|
| ``mkUtt`` | ``Cl -> Utt`` | //John walks// |
|
|
| ``mkCl`` | ``NP -> AP -> Cl`` | //John is very old// |
|
|
| ``mkNP`` | ``Det -> CN -> NP`` | //the first old man// |
|
|
| ``mkNP`` | ``Predet -> NP -> NP`` | //only John// |
|
|
| ``mkDet`` | ``Quant -> Det`` | //this// |
|
|
| ``mkCN`` | ``N -> CN`` | //house// |
|
|
| ``mkCN`` | ``AP -> CN -> CN`` | //very big blue house// |
|
|
| ``mkAP`` | ``A -> AP`` | //old// |
|
|
| ``mkAP`` | ``AdA -> AP -> AP`` | //very very old// |
|
|
|
|
We will also need the following structural words from ``Syntax``.
|
|
|
|
|| Function | Type | Example ||
|
|
| ``all_Predet`` | ``Predet`` | //all// |
|
|
| ``defPlDet`` | ``Det`` | //the (houses)// |
|
|
| ``this_Quant`` | ``Quant`` | //this// |
|
|
| ``very_AdA`` | ``AdA`` | //very// |
|
|
|
|
|
|
For French, we will use the following part of ``ParadigmsFre``.
|
|
|
|
|| Function | Type ||
|
|
| ``Gender`` | ``Type`` |
|
|
| ``masculine`` | ``Gender`` |
|
|
| ``feminine`` | ``Gender`` |
|
|
| ``mkN`` | ``(cheval : Str) -> N`` |
|
|
| ``mkN`` | ``(foie : Str) -> Gender -> N`` |
|
|
| ``mkA`` | ``(cher : Str) -> A`` |
|
|
| ``mkA`` | ``(sec,seche : Str) -> A`` |
|
|
|
|
|
|
For German, we will use the following part of ``ParadigmsGer``.
|
|
|
|
|| Function | Type ||
|
|
| ``Gender`` | ``Type`` |
|
|
| ``masculine`` | ``Gender`` |
|
|
| ``feminine`` | ``Gender`` |
|
|
| ``neuter`` | ``Gender`` |
|
|
| ``mkN`` | ``(Stufe : Str) -> N`` |
|
|
| ``mkN`` | ``(Bild,Bilder : Str) -> Gender -> N`` |
|
|
| ``mkA`` | ``(klein : Str) -> A`` |
|
|
| ``mkA`` | ``(gut,besser,beste : Str) -> A`` |
|
|
|
|
|
|
**Exercise**. Try out the morphological paradigms in different languages. Do
|
|
so as follows:
|
|
```
|
|
> i -path=alltenses:prelude -retain alltenses/ParadigmsGer.gfr
|
|
> cc mkN "Farbe"
|
|
> cc mkA "gut" "besser" "beste"
|
|
```
|
|
|
|
|
|
==Example: French==
|
|
|
|
We start with an abstract syntax that is like ``Food`` before, but
|
|
has a plural determiner (//all wines//) and some new nouns that will
|
|
need different genders in most languages.
|
|
```
|
|
abstract Food = {
|
|
cat
|
|
S ; Item ; Kind ; Quality ;
|
|
fun
|
|
Is : Item -> Quality -> S ;
|
|
This, All : Kind -> Item ;
|
|
QKind : Quality -> Kind -> Kind ;
|
|
Wine, Cheese, Fish, Beer, Pizza : Kind ;
|
|
Very : Quality -> Quality ;
|
|
Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ;
|
|
}
|
|
```
|
|
The French implementation opens ``SyntaxFre`` and ``ParadigmsFre``
|
|
to get access to the resource libraries needed. In order to find
|
|
the libraries, a ``path`` directive is prepended; it is interpreted
|
|
relative to the environment variable ``GF_LIB_PATH``.
|
|
```
|
|
--# -path=.:present:prelude
|
|
|
|
concrete FoodFre of Food = open SyntaxFre,ParadigmsFre in {
|
|
lincat
|
|
S = Utt ;
|
|
Item = NP ;
|
|
Kind = CN ;
|
|
Quality = AP ;
|
|
lin
|
|
Is item quality = mkUtt (mkCl item quality) ;
|
|
This kind = mkNP (mkDet this_Quant) kind ;
|
|
All kind = mkNP all_Predet (mkNP defPlDet kind) ;
|
|
QKind quality kind = mkCN quality kind ;
|
|
Wine = mkCN (mkN "vin") ;
|
|
Beer = mkCN (mkN "bière") ;
|
|
Pizza = mkCN (mkN "pizza" feminine) ;
|
|
Cheese = mkCN (mkN "fromage" masculine) ;
|
|
Fish = mkCN (mkN "poisson") ;
|
|
Very quality = mkAP very_AdA quality ;
|
|
Fresh = mkAP (mkA "frais" "fraîche") ;
|
|
Warm = mkAP (mkA "chaud") ;
|
|
Italian = mkAP (mkA "italien") ;
|
|
Expensive = mkAP (mkA "cher") ;
|
|
Delicious = mkAP (mkA "délicieux") ;
|
|
Boring = mkAP (mkA "ennuyeux") ;
|
|
}
|
|
```
|
|
The ``lincat`` definitions in ``FoodFre`` assign **resource categories**
|
|
to **application categories**. In a sense, the application categories
|
|
are **semantic**, as they correspond to concepts in the grammar application,
|
|
whereas the resource categories are **syntactic**: they give the linguistic
|
|
means to express concepts in any application.
|
|
|
|
The ``lin`` definitions likewise assign resource functions to application
|
|
functions. Under the hood, there is a lot of matching with parameters to
|
|
take care of word order, inflection, and agreement. But the user of the
|
|
library sees nothing of this: the only parameters you need to give are
|
|
the genders of some nouns, which cannot be correctly inferred from the word.
|
|
|
|
In French, for example, the one-argument ``mkN`` assigns the noun the feminine
|
|
gender if and only if it ends with an //e//. Therefore the words //fromage// and
|
|
//pizza// are given genders manually.
|
|
One can of course always give genders manually, to be on the safe side.
|
|
|
|
As for inflection, the one-argument adjective pattern ``mkA`` takes care of
|
|
completely regular adjectives such as //chaud-chaude//, but also of special
|
|
cases such as //italien-italienne//, //cher-chère//, and //délicieux-délicieuse//.
|
|
But it cannot form //frais-fraîche// properly. Once again, you can give more
|
|
forms to be on the safe side. You can also test the paradigms in the GF
|
|
system.
|
|
|
|
**Exercise**. Compile the grammar ``FoodFre`` and generate and parse some sentences.
|
|
|
|
**Exercise**. Write a concrete syntax of ``Food`` for English or some other language
|
|
included in the resource library. You can also compare the output with the hand-written
|
|
grammars presented earlier in this tutorial.
|
|
|
|
**Exercise**. In particular, try to write a concrete syntax for Italian, even if
|
|
you don't know Italian. What you need to know is that "beer" is //birra// and
|
|
"pizza" is //pizza//, and that all the nouns and adjectives in the grammar
|
|
are regular.
|
|
|
|
|
|
|
|
==Functor implementation of multilingual grammars==
|
|
|
|
If you did the exercise of writing a concrete syntax of ``Food`` for some other
|
|
language, you probably noticed that much of the code looks exactly the same
|
|
as for French. The immediate reason for this is that the ``Syntax`` API is the
|
|
same for all languages; the deeper reason is that all languages (at least those
|
|
in the resource package) implement the same syntactic structures and tend to use them
|
|
in similar ways. Thus it is only the lexical parts of a concrete syntax that
|
|
you need to write anew for a new language. In brief,
|
|
- first copy the concrete syntax for one language
|
|
- then change the words (the strings and perhaps some paradigms)
|
|
|
|
|
|
But programming by copy-and-paste is not worthy of a functional programmer.
|
|
Can we write a function that takes care of the shared parts of grammar modules?
|
|
Yes, we can. It is not a function in the ``fun`` or ``oper`` sense, but
|
|
a function operating on modules, called a **functor**. This construct
|
|
is familiar from the functional languages ML and OCaml, but it does not
|
|
exist in Haskell. It also bears some resemblance to templates in C++.
|
|
Functors are also known as **parametrized modules**.
|
|
|
|
In GF, a functor is a module that ``open``s one or more **interfaces**.
|
|
An ``interface`` is a module similar to a ``resource``, but it only
|
|
contains the types of ``oper``s, not their definitions. You can think
|
|
of an interface as a kind of a record type. Thus a functor is a kind
|
|
of a function taking records as arguments and producing a module
|
|
as value.
|
|
|
|
Let us look at a functor implementation of the ``Food`` grammar.
|
|
Consider its module header first:
|
|
```
|
|
incomplete concrete FoodI of Food = open Syntax, LexFood in
|
|
```
|
|
In the functor-function analogy, ``FoodI`` would be presented as a function
|
|
with the following type signature:
|
|
```
|
|
FoodI : instance of Syntax -> instance of LexFood -> concrete of Food
|
|
```
|
|
It takes as arguments two interfaces:
|
|
- ``Syntax``, the resource grammar interface
|
|
- ``LexFood``, the domain-specific lexicon interface
|
|
|
|
|
|
Functors opening ``Syntax`` and a domain lexicon interface are in fact
|
|
so typical in GF applications, that this structure could be called
|
|
a **design pattern**
|
|
for GF grammars. The idea in this pattern is, again, that
|
|
the languages use the same syntactic structures but different words.
|
|
|
|
Before going to the details of the module bodies, let us look at how functors
|
|
are concretely used. An interface has a header such as
|
|
```
|
|
interface LexFood = open Syntax in
|
|
```
|
|
To give an ``instance`` of it means that all ``oper``s are given definitions (of
|
|
appropriate types). For example,
|
|
```
|
|
instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in
|
|
```
|
|
Notice that when an interface opens an interface, such as ``Syntax``,
|
|
then its instance
|
|
opens an instance of it. But the instance may also open some other
|
|
resources - typically,
|
|
a domain lexicon instance opens a ``Paradigms`` module.
|
|
|
|
In the function-functor analogy, we now have
|
|
```
|
|
SyntaxGer : instance of Syntax
|
|
LexFoodGer : instance of LexFood
|
|
```
|
|
Thus we can complete the German implementation by "applying" the functor:
|
|
```
|
|
FoodI SyntaxGer LexFoodGer : concrete of Food
|
|
```
|
|
The GF syntax for doing so is
|
|
```
|
|
concrete FoodGer of Food = FoodI with
|
|
(Syntax = SyntaxGer),
|
|
(LexFood = LexFoodGer) ;
|
|
```
|
|
Notice that this is the //complete// module, not just a header of it.
|
|
The module body is received from ``FoodI``, by instantiating the
|
|
interface constants with their definitions given in the German
|
|
instances.
|
|
|
|
A module of this form, characterized by the keyword ``with``, is
|
|
called a **functor instantiation**.
|
|
|
|
Here is the complete code for the functor ``FoodI``:
|
|
```
|
|
incomplete concrete FoodI of Food = open Syntax, LexFood in {
|
|
lincat
|
|
S = Utt ;
|
|
Item = NP ;
|
|
Kind = CN ;
|
|
Quality = AP ;
|
|
lin
|
|
Is item quality = mkUtt (mkCl item quality) ;
|
|
This kind = mkNP (mkDet this_Quant) kind ;
|
|
All kind = mkNP all_Predet (mkNP defPlDet kind) ;
|
|
QKind quality kind = mkCN quality kind ;
|
|
Wine = mkCN wine_N ;
|
|
Beer = mkCN beer_N ;
|
|
Pizza = mkCN pizza_N ;
|
|
Cheese = mkCN cheese_N ;
|
|
Fish = mkCN fish_N ;
|
|
Very quality = mkAP very_AdA quality ;
|
|
Fresh = mkAP fresh_A ;
|
|
Warm = mkAP warm_A ;
|
|
Italian = mkAP italian_A ;
|
|
Expensive = mkAP expensive_A ;
|
|
Delicious = mkAP delicious_A ;
|
|
Boring = mkAP boring_A ;
|
|
}
|
|
```
|
|
|
|
|
|
==Interfaces and instances==
|
|
|
|
Let us now define the ``LexFood`` interface:
|
|
```
|
|
interface LexFood = open Syntax in {
|
|
oper
|
|
wine_N : N ;
|
|
beer_N : N ;
|
|
pizza_N : N ;
|
|
cheese_N : N ;
|
|
fish_N : N ;
|
|
fresh_A : A ;
|
|
warm_A : A ;
|
|
italian_A : A ;
|
|
expensive_A : A ;
|
|
delicious_A : A ;
|
|
boring_A : A ;
|
|
}
|
|
```
|
|
In this interface, only lexical items are declared. In general, an
|
|
interface can declare any functions and also types. The ``Syntax``
|
|
interface does so.
|
|
|
|
Here is the German instance of the interface:
|
|
```
|
|
instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in {
|
|
oper
|
|
wine_N = mkN "Wein" ;
|
|
beer_N = mkN "Bier" "Biere" neuter ;
|
|
pizza_N = mkN "Pizza" "Pizzen" feminine ;
|
|
cheese_N = mkN "Käse" "Käse" masculine ;
|
|
fish_N = mkN "Fisch" ;
|
|
fresh_A = mkA "frisch" ;
|
|
warm_A = mkA "warm" "wärmer" "wärmste" ;
|
|
italian_A = mkA "italienisch" ;
|
|
expensive_A = mkA "teuer" ;
|
|
delicious_A = mkA "köstlich" ;
|
|
boring_A = mkA "langweilig" ;
|
|
}
|
|
```
|
|
Just to complete the picture, we repeat the German functor instantiation
|
|
for ``FoodI``, this time with a path directive that makes it compilable.
|
|
```
|
|
--# -path=.:present:prelude
|
|
|
|
concrete FoodGer of Food = FoodI with
|
|
(Syntax = SyntaxGer),
|
|
(LexFood = LexFoodGer) ;
|
|
```
|
|
|
|
|
|
**Exercise**. Compile and test ``FoodGer``.
|
|
|
|
**Exercise**. Refactor ``FoodFre`` into a functor instantiation.
|
|
|
|
|
|
|
|
==Adding languages to a functor implementation==
|
|
|
|
Once we have an application grammar defined by using a functor,
|
|
adding a new language is simple. Just two modules need to be written:
|
|
- a domain lexicon instance
|
|
- a functor instantiation
|
|
|
|
|
|
The functor instantiation is completely mechanical to write.
|
|
Here is one for Finnish:
|
|
```
|
|
--# -path=.:present:prelude
|
|
|
|
concrete FoodFin of Food = FoodI with
|
|
(Syntax = SyntaxFin),
|
|
(LexFood = LexFoodFin) ;
|
|
```
|
|
The domain lexicon instance requires some knowledge of the words of the
|
|
language: what words are used for which concepts, how the words are
|
|
inflected, plus features such as genders. Here is a lexicon instance for
|
|
Finnish:
|
|
```
|
|
instance LexFoodFin of LexFood = open SyntaxFin, ParadigmsFin in {
|
|
oper
|
|
wine_N = mkN "viini" ;
|
|
beer_N = mkN "olut" ;
|
|
pizza_N = mkN "pizza" ;
|
|
cheese_N = mkN "juusto" ;
|
|
fish_N = mkN "kala" ;
|
|
fresh_A = mkA "tuore" ;
|
|
warm_A = mkA "lämmin" ;
|
|
italian_A = mkA "italialainen" ;
|
|
expensive_A = mkA "kallis" ;
|
|
delicious_A = mkA "herkullinen" ;
|
|
boring_A = mkA "tylsä" ;
|
|
}
|
|
```
|
|
|
|
**Exercise**. Instantiate the functor ``FoodI`` to some language of
|
|
your choice.
|
|
|
|
|
|
==Division of labour revisited==
|
|
|
|
One purpose of the resource grammars was stated to be a division
|
|
of labour between linguists and application grammarians. We can now
|
|
reflect on what this means more precisely, by asking ourselves what
|
|
skills are required of grammarians working on different components.
|
|
|
|
Building a GF application starts from the abstract syntax. Writing
|
|
an abstract syntax requires
|
|
- understanding the semantic structure of the application domain
|
|
- knowledge of the GF fragment with categories and functions
|
|
|
|
|
|
If the concrete syntax is written by means of a functor, the programmer
|
|
has to decide what parts of the implementation are put to the interface
|
|
and what parts are shared in the functor. This requires
|
|
- knowing how the domain concepts are expressed in natural language
|
|
- knowledge of the resource grammar library - the categories and combinators
|
|
- understanding what parts are likely to be expressed in language-dependent
|
|
ways, so that they must belong to the interface and not the functor
|
|
- knowledge of the GF fragment with function applications and strings
|
|
|
|
|
|
Instantiating a ready-made functor to a new language is less demanding.
|
|
It requires essentially
|
|
- knowing how the domain words are expressed in the language
|
|
- knowing, roughly, how these words are inflected
|
|
- knowledge of the paradigms available in the library
|
|
- knowledge of the GF fragment with function applications and strings
|
|
|
|
|
|
Notice that none of these tasks requires the use of GF records, tables,
|
|
or parameters. Thus only a small fragment of GF is needed; the rest of
|
|
GF is only relevant for those who write the libraries.
|
|
|
|
Of course, grammar writing is not always straightforward usage of libraries.
|
|
For example, GF can be used for other languages than just those in the
|
|
libraries - for both natural and formal languages. A knowledge of records
|
|
and tables can, unfortunately, also be needed for understanding GF's error
|
|
messages.
|
|
|
|
**Exercise**. Design a small grammar that can be used for controlling
|
|
an MP3 player. The grammar should be able to recognize commands such
|
|
as //play this song//, with the following variations:
|
|
- verbs: //play//, //remove//
|
|
- objects: //song//, //artist//
|
|
- determiners: //this//, //the previous//
|
|
- verbs without arguments: //stop//, //pause//
|
|
|
|
|
|
The implementation goes in the following phases:
|
|
+ abstract syntax
|
|
+ functor and lexicon interface
|
|
+ lexicon instance for the first language
|
|
+ functor instantiation for the first language
|
|
+ lexicon instance for the second language
|
|
+ functor instantiation for the second language
|
|
+ ...
|
|
|
|
|
|
|
|
==Restricted inheritance==
|
|
|
|
A functor implementation using the resource ``Syntax`` interface
|
|
works as long as all concepts are expressed by using the same structures
|
|
in all languages. If this is not the case, the deviant linearization can
|
|
be made into a parameter and moved to the domain lexicon interface.
|
|
|
|
Let us take a slightly contrived example: assume that English has
|
|
no word for ``Pizza``, but has to use the paraphrase //Italian pie//.
|
|
This paraphrase is no longer a noun ``N``, but a complex phrase
|
|
in the category ``CN``. An obvious way to solve this problem is
|
|
to change the interface ``LexFood`` so that the constant declared for
|
|
``Pizza`` gets a new type:
|
|
```
|
|
oper pizza_CN : CN ;
|
|
```
|
|
But this solution is unstable: we may end up changing the interface
|
|
and the functor with each new language, and we must every time also
|
|
change the interface instances for the old languages to maintain
|
|
type correctness.
|
|
|
|
A better solution is to use **restricted inheritance**: the English
|
|
instantiation inherits the functor implementation except for the
|
|
constant ``Pizza``. This is how we write:
|
|
```
|
|
--# -path=.:present:prelude
|
|
|
|
concrete FoodEng of Food = FoodI - [Pizza] with
|
|
(Syntax = SyntaxEng),
|
|
(LexFood = LexFoodEng) **
|
|
open SyntaxEng, ParadigmsEng in {
|
|
|
|
lin Pizza = mkCN (mkA "Italian") (mkN "pie") ;
|
|
}
|
|
```
|
|
Restricted inheritance is available for all inherited modules. One can for
|
|
instance exclude some mushrooms and pick up just some fruit in
|
|
the ``Foodmarket`` example:
|
|
```
|
|
abstract Foodmarket = Food, Fruit [Peach], Mushroom - [Agaric]
|
|
```
|
|
A concrete syntax of ``Foodmarket`` must then indicate the same inheritance
|
|
restrictions.
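
As a sketch of what this means in practice, the English concrete syntax could repeat the restrictions of the abstract module as shown below. The module names ``FruitEng`` and ``MushroomEng`` are assumptions that follow the naming pattern used so far; they are not defined in this section.
```
-- file FoodmarketEng.gf (hypothetical)
concrete FoodmarketEng of Foodmarket =
  FoodEng, FruitEng [Peach], MushroomEng - [Agaric] ** {
  -- no rules of its own: everything is inherited,
  -- subject to the same restrictions as in the abstract syntax
}
```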
|
|
|
|
|
|
**Exercise**. Change ``FoodGer`` in such a way that it says, instead of
|
|
//X is Y//, the equivalent of //X must be Y// (//X muss Y sein//).
|
|
You will have to browse the full resource API to find all
|
|
the functions needed.
|
|
|
|
|
|
==Browsing the resource with GF commands==
|
|
|
|
In addition to reading the
|
|
[resource synopsis ../../lib/resource-1.0/synopsis.html], you
|
|
can find resource function combinations by using the parser. This
|
|
is so because the resource library is in the end implemented as
|
|
a top-level ``abstract-concrete`` grammar, on which parsing
|
|
and linearization work.
|
|
|
|
Unfortunately, only English and the Scandinavian languages can be
|
|
parsed within acceptable computer resource limits when the full
|
|
resource is used.
|
|
|
|
To look for a syntax tree in the overload API by parsing, proceed as follows:
|
|
```
|
|
> $GF_LIB_PATH
|
|
> i -path=alltenses:prelude alltenses/OverLangEng.gfc
|
|
> p -cat=S -overload "this grammar is too big"
|
|
mkS (mkCl (mkNP (mkDet this_Quant) grammar_N) (mkAP too_AdA big_A))
|
|
```
|
|
To view linearizations in all languages by parsing from English:
|
|
```
|
|
> i alltenses/langs.gfcm
|
|
> p -cat=S -lang=LangEng "this grammar is too big" | tb
|
|
UseCl TPres ASimul PPos (PredVP (DetCN (DetSg (SgQuant this_Quant)
|
|
NoOrd) (UseN grammar_N)) (UseComp (CompAP (AdAP too_AdA (PositA big_A)))))
|
|
Den här grammatiken är för stor
|
|
Esta gramática es demasiado grande
|
|
(Cyrillic: eta grammatika govorit des'at' jazykov)
|
|
Denne grammatikken er for stor
|
|
Questa grammatica è troppo grande
|
|
Diese Grammatik ist zu groß
|
|
Cette grammaire est trop grande
|
|
Tämä kielioppi on liian suuri
|
|
This grammar is too big
|
|
Denne grammatik er for stor
|
|
```
|
|
Unfortunately, the Russian grammar currently uses a different
|
|
character encoding than the rest and is therefore not displayed correctly
|
|
in a terminal window. However, the GF syntax editor does display all
|
|
examples correctly:
|
|
```
|
|
% gfeditor alltenses/langs.gfcm
|
|
```
|
|
When you have constructed the tree, you will see the following screen:
|
|
|
|
#BCEN
|
|
|
|
[../../lib/resource-1.0/doc/10lang-small.png]
|
|
|
|
#ECEN
|
|
|
|
|
|
**Exercise**. Find the resource grammar translations for the following
|
|
English phrases (parse in the category ``Phr``). You can first try to
|
|
build the terms manually.
|
|
|
|
//every man loves a woman//
|
|
|
|
//this grammar speaks more than ten languages//
|
|
|
|
//which languages aren't in the grammar//
|
|
|
|
//which languages did you want to speak//
|
|
|
|
|
|
|
|
=Refining semantics in abstract syntax=
|
|
|
|
While the concrete syntax constructs of GF have already been
|
|
introduced, there is much more that can be done in the abstract
|
|
syntax. The techniques of **dependent types** and
|
|
**higher-order abstract syntax** are introduced in this chapter,
|
|
which thereby concludes the presentation of the GF language.
|
|
|
|
|
|
|
|
==GF as a logical framework==
|
|
|
|
In this section, we will show how
|
|
to encode advanced semantic concepts in an abstract syntax.
|
|
We use concepts inherited from **type theory**. Type theory
|
|
is the basis of many systems known as **logical frameworks**, which are
|
|
used for representing mathematical theorems and their proofs on a computer.
|
|
In fact, GF has a logical framework as its proper part:
|
|
this part is the abstract syntax.
|
|
|
|
In a logical framework, the formalization of a mathematical theory
|
|
is a set of type and function declarations. The following is an example
|
|
of such a theory, represented as an ``abstract`` module in GF.
|
|
```
|
|
abstract Arithm = {
|
|
cat
|
|
Prop ; -- proposition
|
|
Nat ; -- natural number
|
|
fun
|
|
Zero : Nat ; -- 0
|
|
Succ : Nat -> Nat ; -- successor of x
|
|
Even : Nat -> Prop ; -- x is even
|
|
And : Prop -> Prop -> Prop ; -- A and B
|
|
}
|
|
```
|
|
|
|
**Exercise**. Give a concrete syntax of ``Arithm``, either from scratch or
|
|
by using the resource library.
|
|
|
|
|
|
|
|
|
|
==Dependent types==
|
|
|
|
**Dependent types** are a characteristic feature of GF,
|
|
inherited from the **constructive type theory** of Martin-Löf and
|
|
distinguishing GF from most other grammar formalisms and
|
|
functional programming languages.
|
|
|
|
Dependent types can be used for stating stronger
|
|
**conditions of well-formedness** than ordinary types.
|
|
A simple example is a "smart house" system, which
|
|
defines voice commands for household appliances. This example
|
|
is borrowed from the
|
|
[Regulus Book http://cslipublications.stanford.edu/site/1575865262.html]
|
|
(Rayner & al. 2006).
|
|
|
|
One who enters a smart house can use speech to dim lights, switch
|
|
on the fan, etc. For each ``Kind`` of a device, there is a set of
|
|
``Action``s that can be performed on it; thus one can dim the lights but
|
|
not the fan, for example. These dependencies can be expressed by
|
|
making the type ``Action`` dependent on ``Kind``. We express this
|
|
as follows in ``cat`` declarations:
|
|
```
|
|
cat
|
|
Command ;
|
|
Kind ;
|
|
Action Kind ;
|
|
Device Kind ;
|
|
```
|
|
The crucial use of the dependencies is made in the rule for forming commands:
|
|
```
|
|
fun CAction : (k : Kind) -> Action k -> Device k -> Command ;
|
|
```
|
|
In other words: an action and a device can be combined into a command only
|
|
if they are of the same ``Kind`` ``k``. If we have the functions
|
|
```
|
|
DKindOne : (k : Kind) -> Device k ; -- the light
|
|
|
|
light, fan : Kind ;
|
|
dim : Action light ;
|
|
```
|
|
we can form the syntax tree
|
|
```
|
|
CAction light dim (DKindOne light)
|
|
```
|
|
but we cannot form the trees
|
|
```
|
|
CAction light dim (DKindOne fan)
|
|
CAction fan dim (DKindOne light)
|
|
CAction fan dim (DKindOne fan)
|
|
```
|
|
Linearization rules are written as usual: the concrete syntax does not
|
|
know if a category is a dependent type. In English, you can write as follows:
|
|
```
|
|
lincat Action = {s : Str} ;
|
|
lin CAction kind act dev = {s = act.s ++ dev.s} ;
|
|
```
|
|
Notice that the argument ``kind`` does not appear in the linearization.
|
|
The type checker will be able to reconstruct it from the ``dev`` argument.
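
The fragments above can be assembled into a small grammar. The following is only one possible sketch: the module names ``Smart`` and ``SmartEng`` are the ones used later in this chapter, but the linearization rules for ``DKindOne``, ``light``, ``fan``, and ``dim`` are assumptions made for illustration.
```
-- file Smart.gf
abstract Smart = {
  flags startcat = Command ;
  cat
    Command ;
    Kind ;
    Action Kind ;     -- actions depend on the kind of device
    Device Kind ;     -- so do devices
  fun
    CAction  : (k : Kind) -> Action k -> Device k -> Command ;
    DKindOne : (k : Kind) -> Device k ;
    light, fan : Kind ;
    dim : Action light ;
}

-- file SmartEng.gf
concrete SmartEng of Smart = {
  lincat
    Command, Kind, Action, Device = {s : Str} ;
  lin
    CAction _ act dev = {s = act.s ++ dev.s} ;  -- the Kind is suppressed
    DKindOne k = {s = "the" ++ k.s} ;
    light = {s = "light"} ;
    fan = {s = "fan"} ;
    dim = {s = "dim"} ;
}
```
Note that the concrete syntax treats ``Action`` and ``Device`` as ordinary categories; the dependency on ``Kind`` is visible only in the abstract syntax.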
|
|
|
|
Parsing with dependent types is performed in two phases:
|
|
+ context-free parsing
|
|
+ filtering through the type checker
|
|
|
|
|
|
If you just parse in the usual way, you don't enter the second phase, and
|
|
the ``kind`` argument is not found:
|
|
```
|
|
> parse "dim the light"
|
|
CAction ? dim (DKindOne light)
|
|
```
|
|
Moreover, type-incorrect commands are not rejected:
|
|
```
|
|
> parse "dim the fan"
|
|
CAction ? dim (DKindOne fan)
|
|
```
|
|
The question mark ``?`` is a **metavariable**, and is returned by the parser
|
|
for any subtree that is suppressed by a linearization rule.
|
|
|
|
To get rid of metavariables, you must feed the parse result into the
|
|
second phase of **solving** them. The ``solve`` process uses the dependent
|
|
type checker to restore the values of the metavariables. It is invoked by
|
|
the command ``put_tree = pt`` with the flag ``-transform=solve``:
|
|
```
|
|
> parse "dim the light" | put_tree -transform=solve
|
|
CAction light dim (DKindOne light)
|
|
```
|
|
The ``solve`` process may fail, in which case no tree is returned:
|
|
```
|
|
> parse "dim the fan" | put_tree -transform=solve
|
|
no tree found
|
|
```
|
|
|
|
|
|
**Exercise**. Write an abstract syntax module with the above contents
|
|
and an appropriate English concrete syntax. Try to parse the commands
|
|
//dim the light// and //dim the fan//, with and without ``solve`` filtering.
|
|
|
|
|
|
**Exercise**. Perform random and exhaustive generation, with and without
|
|
``solve`` filtering.
|
|
|
|
**Exercise**. Add some device kinds and actions to the grammar.
|
|
|
|
|
|
==Polymorphism==
|
|
|
|
Sometimes an action can be performed on all kinds of devices. It would be
|
|
possible to introduce separate ``fun`` constants for each kind-action pair,
|
|
but this would be tedious. Instead, one can use **polymorphic** actions,
|
|
i.e. actions that take a ``Kind`` as an argument and produce an ``Action``
|
|
for that ``Kind``:
|
|
```
|
|
fun switchOn, switchOff : (k : Kind) -> Action k ;
|
|
```
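As an illustration, the English linearization rules for these polymorphic actions could look as follows. This is a sketch under the assumption that ``Action`` has the linearization type ``{s : Str}`` used earlier; the ``Kind`` argument is ignored, just as in ``CAction``.
```
lin switchOn  _ = {s = "switch" ++ "on"} ;
lin switchOff _ = {s = "switch" ++ "off"} ;

-- trees that are now well typed for every Kind, for instance:
--   CAction light (switchOn light) (DKindOne light)
--   CAction fan   (switchOff fan)  (DKindOne fan)
```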
|
|
Functions that are not polymorphic are **monomorphic**. However, the
|
|
dichotomy into monomorphism and full polymorphism is not always sufficient
|
|
for good semantic modelling: very typically, some actions are defined
|
|
for a proper subset of devices, but not just one. For instance, both doors and
|
|
windows can be opened, whereas lights cannot.
|
|
We will return to this problem by introducing the
|
|
concept of **restricted polymorphism** later,
|
|
after the section on proof objects.
|
|
|
|
|
|
|
|
==Dependent types and spoken language models==
|
|
|
|
We have used dependent types to control semantic well-formedness
|
|
in grammars. This is important in traditional type theory
|
|
applications such as proof assistants, where only mathematically
|
|
meaningful formulas should be constructed. But semantic filtering has
|
|
also proved important in speech recognition, because it reduces the
|
|
ambiguity of the results.
|
|
|
|
|
|
===Grammar-based language models===
|
|
|
|
The standard way of using GF in speech recognition is by building
|
|
**grammar-based language models**. To this end, GF comes with compilers
|
|
into several formats that are used in speech recognition systems.
|
|
One such format is GSL, used in the [Nuance speech recognizer www.nuance.com].
|
|
It is produced from GF simply by printing a grammar with the flag
|
|
``-printer=gsl``.
|
|
```
|
|
> import -conversion=finite SmartEng.gf
|
|
> print_grammar -printer=gsl
|
|
|
|
;GSL2.0
|
|
; Nuance speech recognition grammar for SmartEng
|
|
; Generated by GF
|
|
|
|
.MAIN SmartEng_2
|
|
|
|
SmartEng_0 [("switch" "off") ("switch" "on")]
|
|
SmartEng_1 ["dim" ("switch" "off")
|
|
("switch" "on")]
|
|
SmartEng_2 [(SmartEng_0 SmartEng_3)
|
|
(SmartEng_1 SmartEng_4)]
|
|
SmartEng_3 ("the" SmartEng_5)
|
|
SmartEng_4 ("the" SmartEng_6)
|
|
SmartEng_5 "fan"
|
|
SmartEng_6 "light"
|
|
```
|
|
Now, GSL is a context-free format, so how does it cope with dependent types?
|
|
In general, dependent types can give rise to infinitely many basic types
|
|
(exercise!), whereas a context-free grammar can by definition only have
|
|
finitely many nonterminals.
|
|
|
|
This is where the flag ``-conversion=finite`` is needed in the ``import``
|
|
command. Its effect is to convert a GF grammar with dependent types to
|
|
one without, so that each instance of a dependent type is replaced by
|
|
an atomic type. This can then be used as a nonterminal in a context-free
|
|
grammar. The ``finite`` conversion presupposes that every
|
|
dependent type has only finitely many instances, which is in fact
|
|
the case in the ``Smart`` grammar.
|
|
|
|
|
|
**Exercise**. If you have access to the Nuance speech recognizer,
|
|
test it with GF-generated language models for ``SmartEng``. Do this
|
|
both with and without ``-conversion=finite``.
|
|
|
|
**Exercise**. Construct an abstract syntax with infinitely many instances
|
|
of dependent types.
|
|
|
|
|
|
===Statistical language models===
|
|
|
|
An alternative to grammar-based language models is provided by
|
|
**statistical language models** (**SLM**s). An SLM is
|
|
built from a **corpus**, i.e. a set of utterances. It specifies the
|
|
probability of each **n-gram**, i.e. sequence of //n// words. The
|
|
typical value of //n// is 2 (bigrams) or 3 (trigrams).
|
|
|
|
One advantage of SLMs over grammar-based models is that they are
|
|
**robust**, i.e. they can be used to recognize sequences that would
|
|
be out of the grammar or the corpus. Another advantage is that
|
|
an SLM can be built "for free" if a corpus is available.
|
|
|
|
However, collecting a corpus can require a lot of work, and writing
|
|
a grammar can be less demanding, especially with tools such as GF or
|
|
Regulus. This advantage of grammars can be combined with robustness
|
|
by creating a back-up SLM from a **synthesized corpus**. This means
|
|
simply that the grammar is used for generating such a corpus.
|
|
In GF, this can be done with the ``generate_trees`` command.
|
|
As with grammar-based models, the quality of the SLM is better
|
|
if meaningless utterances are excluded from the corpus. Thus
|
|
a good way to generate an SLM from a GF grammar is by using
|
|
dependent types and filtering the results through the type checker:
|
|
```
|
|
> generate_trees | put_tree -transform=solve | linearize
|
|
```
|
|
|
|
|
|
**Exercise**. Measure the size of the corpus generated from
|
|
``SmartEng``, with and without type checker filtering.
|
|
|
|
|
|
|
|
==Digression: dependent types in concrete syntax==
|
|
|
|
The **functional fragment** of GF
|
|
terms and types comprises function types, applications, lambda
|
|
abstracts, constants, and variables. This fragment is similar in
|
|
abstract and concrete syntax. In particular,
|
|
dependent types are also available in concrete syntax.
|
|
We have not made use of them yet,
|
|
but we will now look at one example of how they
|
|
can be used.
|
|
|
|
Those readers who are familiar with functional programming languages
|
|
like ML and Haskell may already have missed **polymorphic**
|
|
functions. For instance, Haskell programmers have access to
|
|
the functions
|
|
```
|
|
const :: a -> b -> a
|
|
const c _ = c
|
|
|
|
flip :: (a -> b -> c) -> b -> a -> c
|
|
flip f y x = f x y
|
|
```
|
|
which can be used for any given types ``a``, ``b``, and ``c``.
|
|
|
|
The GF counterparts of polymorphic functions are **monomorphic**
|
|
functions with explicit **type variables**. Thus the above
|
|
definitions can be written
|
|
```
|
|
oper const : (a,b : Type) -> a -> b -> a =
|
|
\_,_,c,_ -> c ;
|
|
|
|
oper flip : (a,b,c : Type) -> (a -> b -> c) -> b -> a -> c =
|
|
\_,_,_,f,x,y -> f y x ;
|
|
```
|
|
When the operations are used, the type checker requires
|
|
them to be equipped with all their arguments; this may be a nuisance
|
|
for a Haskell or ML programmer.
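
To make this concrete, here is a small usage sketch of the two operations defined above. The names ``first`` and ``second`` are hypothetical; the point is only that the type arguments have to be written out at every call.
```
oper
  -- both type arguments are Str; the value arguments follow
  first : Str = const Str Str "one" "two" ;   -- computes to "one"

  -- flip applied to (const Str Str) swaps the value arguments,
  -- so the result is const Str Str "two" "one"
  second : Str = flip Str Str Str (const Str Str) "one" "two" ;
```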
|
|
|
|
|
|
|
|
==Proof objects==
|
|
|
|
Perhaps the best-known idea in constructive type theory is
|
|
the **Curry-Howard isomorphism**, also known as the
|
|
**propositions as types principle**. Its earliest formulations
|
|
were attempts to give semantics to the logical systems of
|
|
propositional and predicate calculus. In this section, we will consider
|
|
a more elementary example, showing how the notion of proof is useful
|
|
outside mathematics as well.
|
|
|
|
We first define the category of unary (also known as Peano-style)
|
|
natural numbers:
|
|
```
|
|
cat Nat ;
|
|
fun Zero : Nat ;
|
|
fun Succ : Nat -> Nat ;
|
|
```
|
|
The **successor function** ``Succ`` generates an infinite
|
|
sequence of natural numbers, beginning from ``Zero``.
|
|
|
|
We then define what it means for a number //x// to be //less than//
|
|
a number //y//. Our definition is based on two axioms:
|
|
- ``Zero`` is less than ``Succ`` //y// for any //y//.
|
|
- If //x// is less than //y//, then ``Succ`` //x// is less than ``Succ`` //y//.
|
|
|
|
|
|
The most straightforward way of expressing these axioms in type theory
|
|
is as typing judgements that introduce objects of a type ``Less`` //x y//:
|
|
```
|
|
cat Less Nat Nat ;
|
|
fun lessZ : (y : Nat) -> Less Zero (Succ y) ;
|
|
fun lessS : (x,y : Nat) -> Less x y -> Less (Succ x) (Succ y) ;
|
|
```
|
|
Objects formed by ``lessZ`` and ``lessS`` are
|
|
called **proof objects**: they establish the truth of certain
|
|
mathematical propositions.
|
|
For instance, the fact that 2 is less than
|
|
4 has the proof object
|
|
```
|
|
lessS (Succ Zero) (Succ (Succ (Succ Zero)))
|
|
(lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero)))
|
|
```
|
|
whose type is
|
|
```
|
|
Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero))))
|
|
```
|
|
which is the formalization of the proposition that 2 is less than 4.
|
|
|
|
GF grammars can be used to provide a **semantic control** of
|
|
well-formedness of expressions. We have already seen examples of this:
|
|
the grammar of well-formed actions on household devices. By introducing proof objects
|
|
we have now added a very powerful technique of expressing semantic conditions.
|
|
|
|
A simple example of the use of proof objects is the definition of
|
|
well-formed //time spans//: a time span is expected to be from an earlier to
|
|
a later time:
|
|
```
|
|
from 3 to 8
|
|
```
|
|
is thus well-formed, whereas
|
|
```
|
|
from 8 to 3
|
|
```
|
|
is not. The following rules for spans impose this condition
|
|
by using the ``Less`` predicate:
|
|
```
|
|
cat Span ;
|
|
fun span : (m,n : Nat) -> Less m n -> Span ;
|
|
```
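As a worked example, the span from 1 to 2 can be built with the rules above, and a concrete syntax can suppress the proof argument. The linearization rules below are a sketch with assumed record types and a unary number notation; they are not prescribed by the text.
```
-- proof that 1 is less than 2:
--   lessZ Zero                           : Less Zero (Succ Zero)
--   lessS Zero (Succ Zero) (lessZ Zero)  : Less (Succ Zero) (Succ (Succ Zero))
--
-- the span as a tree:
--   span (Succ Zero) (Succ (Succ Zero)) (lessS Zero (Succ Zero) (lessZ Zero))

lincat Nat, Less, Span = {s : Str} ;
lin
  Zero   = {s = "0"} ;
  Succ n = {s = "s" ++ "(" ++ n.s ++ ")"} ;          -- unary notation
  lessZ _     = {s = []} ;                           -- proofs have no
  lessS _ _ _ = {s = []} ;                           -- visible form
  span m n _  = {s = "from" ++ m.s ++ "to" ++ n.s} ; -- proof suppressed
```
Because the third argument of ``span`` is suppressed, the parser returns a metavariable in its place, which has to be solved by the type checker in the way described for ``Kind`` arguments above.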
|
|
|
|
**Exercise**. Write an abstract and concrete syntax with the
|
|
concepts of this section, and experiment with it in GF.
|
|
|
|
|
|
**Exercise**. Define the notions of "even" and "odd" in terms
|
|
of proof objects. **Hint**. You need one function for proving
|
|
that 0 is even, and two other functions for propagating the
|
|
properties.
|
|
|
|
|
|
|
|
|
|
===Proof-carrying documents===
|
|
|
|
Another possible application of proof objects is **proof-carrying documents**:
|
|
to be semantically well-formed, the abstract syntax of a document must contain a proof
|
|
of some property, although the proof is not shown in the concrete document.
|
|
Think, for instance, of small documents describing flight connections:
|
|
|
|
//To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague.//
|
|
|
|
The well-formedness of this text is partly expressible by dependent typing:
|
|
```
|
|
cat
|
|
City ;
|
|
Flight City City ;
|
|
fun
|
|
Gothenburg, Frankfurt, Prague : City ;
|
|
LH3043 : Flight Gothenburg Frankfurt ;
|
|
OK0537 : Flight Frankfurt Prague ;
|
|
```
|
|
This rules out texts saying //take OK0537 from Gothenburg to Prague//.
|
|
However, there is a
|
|
further condition saying that it must be possible to
|
|
change from LH3043 to OK0537 in Frankfurt.
|
|
This can be modelled as a proof object of a suitable type,
|
|
which is required by the constructor
|
|
that connects flights.
|
|
```
|
|
cat
|
|
IsPossible (x,y,z : City)(Flight x y)(Flight y z) ;
|
|
fun
|
|
Connect : (x,y,z : City) ->
|
|
(u : Flight x y) -> (v : Flight y z) ->
|
|
IsPossible x y z u v -> Flight x z ;
|
|
```
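To see how the proof object enters the tree, assume an axiom stating that the change in Frankfurt is possible. The constant ``changeFRA`` is hypothetical and serves only as an illustration.
```
fun
  -- hypothetical axiom: LH3043 connects to OK0537 in Frankfurt
  changeFRA : IsPossible Gothenburg Frankfurt Prague LH3043 OK0537 ;

-- the connection described in the text then has the tree
--   Connect Gothenburg Frankfurt Prague LH3043 OK0537 changeFRA
-- whose type is
--   Flight Gothenburg Prague
```
Without a proof object of the right ``IsPossible`` type, no such tree can be formed, even though both flights are individually well typed.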
|
|
|
|
|
|
==Restricted polymorphism==
|
|
|
|
In the first version of the smart house grammar ``Smart``,
|
|
all Actions were either
|
|
- **monomorphic**: defined for one Kind
|
|
- **polymorphic**: defined for all Kinds
|
|
|
|
|
|
To make this scale up for new Kinds, we can refine this to
|
|
**restricted polymorphism**: defined for Kinds of a certain **class**
|
|
|
|
|
|
The notion of class can be expressed in abstract syntax
|
|
by using the Curry-Howard isomorphism as follows:
|
|
- a class is a **predicate** of Kinds - i.e. a type depending on Kinds
|
|
- a Kind is in a class if there is a proof object of this type
|
|
|
|
|
|
Here is an example with switching and dimming. The classes are called
|
|
``Switchable`` and ``Dimmable``.
|
|
```
|
|
cat
|
|
Switchable Kind ;
|
|
Dimmable Kind ;
|
|
fun
|
|
switchable_light : Switchable light ;
|
|
switchable_fan : Switchable fan ;
|
|
dimmable_light : Dimmable light ;
|
|
|
|
switchOn : (k : Kind) -> Switchable k -> Action k ;
|
|
dim : (k : Kind) -> Dimmable k -> Action k ;
|
|
```
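The following sketch shows which command trees these declarations allow, together with possible English linearization rules. The record types and strings are assumptions in the style of the earlier ``Smart`` examples.
```
-- well typed: there is a proof that lights are dimmable
--   CAction light (dim light dimmable_light) (DKindOne light)
-- well typed: there is a proof that fans are switchable
--   CAction fan (switchOn fan switchable_fan) (DKindOne fan)
-- not constructible: no function yields an object of type Dimmable fan
--   CAction fan (dim fan ?) (DKindOne fan)

lincat Switchable, Dimmable = {s : Str} ;
lin
  switchable_light = {s = []} ;   -- proof objects are never shown
  switchable_fan   = {s = []} ;
  dimmable_light   = {s = []} ;
  switchOn _ _ = {s = "switch" ++ "on"} ;
  dim      _ _ = {s = "dim"} ;
```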
|
|
One advantage of this formalization is that classes for new
|
|
actions can be added incrementally.
|
|
|
|
**Exercise**. Write a new version of the ``Smart`` grammar with
|
|
classes, and test it in GF.
|
|
|
|
**Exercise**. Add some actions, kinds, and classes to the grammar.
|
|
Try to port the grammar to a new language. You will probably find
|
|
out that restricted polymorphism works differently in different languages.
|
|
For instance, in Finnish not only doors but also TVs and radios
|
|
can be "opened", which means switching them on.
|
|
|
|
|
|
==Variable bindings==
|
|
|
|
Mathematical notation and programming languages have
|
|
expressions that **bind** variables. For instance,
|
|
a universally quantified proposition
|
|
```
|
|
(All x)B(x)
|
|
```
|
|
consists of the **binding** ``(All x)`` of the variable ``x``,
|
|
and the **body** ``B(x)``, where the variable ``x`` can have
|
|
**bound occurrences**.
|
|
|
|
Variable bindings appear in informal mathematical language as well, for
|
|
instance,
|
|
```
|
|
for all x, x is equal to x
|
|
|
|
the function that for any numbers x and y returns the maximum of x+y
|
|
and x*y
|
|
|
|
Let x be a natural number. Assume that x is even. Then x + 3 is odd.
|
|
```
|
|
In type theory, variable-binding expression forms can be formalized
|
|
as functions that take functions as arguments. The universal
|
|
quantifier is defined
|
|
```
|
|
fun All : (Ind -> Prop) -> Prop
|
|
```
|
|
where ``Ind`` is the type of individuals and ``Prop``,
|
|
the type of propositions. If we have, for instance, the equality predicate
|
|
```
|
|
fun Eq : Ind -> Ind -> Prop
|
|
```
|
|
we may form the tree
|
|
```
|
|
All (\x -> Eq x x)
|
|
```
|
|
which corresponds to the ordinary notation
|
|
```
|
|
(All x)(x = x).
|
|
```
|
|
An abstract syntax where trees have functions as arguments, as in
|
|
the two examples above, has turned out to be precisely the right
|
|
thing for the semantics and computer implementation of
|
|
variable-binding expressions. The advantage lies in the fact that
|
|
only one variable-binding expression form is needed, the lambda abstract
|
|
``\x -> b``, and all other bindings can be reduced to it.
|
|
This makes it easier to implement mathematical theories and reason
|
|
about them, since variable binding is tricky to implement and
|
|
to reason about. The idea of using functions as arguments of
|
|
syntactic constructors is known as **higher-order abstract syntax**.
|
|
|
|
The question now arises: how to define linearization rules
|
|
for variable-binding expressions?
|
|
Let us first consider universal quantification,
|
|
```
|
|
fun All : (Ind -> Prop) -> Prop
|
|
```
|
|
We write
|
|
```
|
|
lin All B = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s}
|
|
```
|
|
to obtain the form shown above.
|
|
This linearization rule brings in a new GF concept - the ``$0``
|
|
field of ``B`` containing a bound variable symbol.
|
|
The general rule is that, if an argument type of a function is
|
|
itself a function type ``A -> C``, the linearization type of
|
|
this argument is the linearization type of ``C``
|
|
together with a new field ``$0 : Str``. In the linearization rule
|
|
for ``All``, the argument ``B`` thus has the linearization
|
|
type
|
|
```
|
|
{$0 : Str ; s : Str},
|
|
```
|
|
since the linearization type of ``Prop`` is
|
|
```
|
|
{s : Str}
|
|
```
|
|
In other words, the linearization of a function
|
|
consists of a linearization of the body together with a
|
|
field for a linearization of the bound variable.
|
|
Those familiar with type theory or lambda calculus
|
|
should notice that GF requires trees to be in
|
|
**eta-expanded** form in order to be linearizable:
|
|
any function of type
|
|
```
|
|
A -> B
|
|
```
|
|
always has a syntax tree of the form
|
|
```
|
|
\x -> b
|
|
```
|
|
where ``b : B`` under the assumption ``x : A``.
|
|
It is in this form that an expression can be analysed
|
|
as having a bound variable and a body.
|
|
|
|
|
|
Given the linearization rule
|
|
```
|
|
lin Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"}
|
|
```
|
|
the linearization of
|
|
```
|
|
\x -> Eq x x
|
|
```
|
|
is the record
|
|
```
|
|
{$0 = "x" ; s = ["( x = x )"]}
|
|
```
|
|
Thus we can compute the linearization of the formula,
|
|
```
|
|
All (\x -> Eq x x) --> {s = ["( All x ) ( x = x )"]}
|
|
```
|
|
How did we get the //linearization// of the variable ``x``
|
|
into the string ``"x"``? GF grammars have no rules for
|
|
this: it is just hard-wired in GF that variable symbols are
|
|
linearized into the same strings that represent them in
|
|
the print-out of the abstract syntax.
|
|
|
|
To be able to //parse// variable symbols, however, GF needs to know what
|
|
to look for (instead of e.g. trying to parse //any//
|
|
string as a variable). What strings are parsed as variable symbols
|
|
is defined in the lexical analysis part of GF parsing
|
|
```
|
|
> p -cat=Prop -lexer=codevars "(All x)(x = x)"
|
|
All (\x -> Eq x x)
|
|
```
|
|
(see more details on lexers below). If several variables are bound in the
|
|
same argument, the labels are ``$0, $1, $2``, etc.
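
The rules discussed in this section fit together into the following minimal grammar. It is a sketch: the module names ``Logic`` and ``LogicSymb`` are made up for the illustration, while the rules themselves are the ones given above.
```
-- file Logic.gf
abstract Logic = {
  cat Prop ; Ind ;
  fun
    All : (Ind -> Prop) -> Prop ;   -- binds one variable
    Eq  : Ind -> Ind -> Prop ;
}

-- file LogicSymb.gf
concrete LogicSymb of Logic = {
  lincat Prop, Ind = {s : Str} ;
  lin
    -- B.$0 is the bound variable, B.s the body
    All B  = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s} ;
    Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"} ;
}
```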
|
|
|
|
|
|
**Exercise**. Write an abstract syntax of the whole
|
|
**predicate calculus**, with the
|
|
**connectives** "and", "or", "implies", and "not", and the
|
|
**quantifiers** "exists" and "for all". Use higher-order functions
|
|
to guarantee that unbound variables do not occur.
|
|
|
|
**Exercise**. Write a concrete syntax for your favourite
|
|
notation of predicate calculus. Use LaTeX as the target language
|
|
if you want nice output. You can also try producing Haskell boolean
|
|
expressions. Use as many parentheses as you need to
|
|
guarantee non-ambiguity.
|
|
|
|
|
|
|
|
==Semantic definitions==
|
|
|
|
We have seen that,
|
|
just like functional programming languages, GF has declarations
|
|
of functions, telling what the type of a function is.
|
|
But we have not yet shown how to **compute**
|
|
these functions: all we can do is provide them with arguments
|
|
and linearize the resulting terms.
|
|
Since our main interest is the well-formedness of expressions,
|
|
this has not yet bothered
|
|
us very much. As we will see, however, computation does play a role
|
|
even in the well-formedness of expressions when dependent types are
|
|
present.
|
|
|
|
GF has a form of judgement for **semantic definitions**,
|
|
recognized by the key word ``def``. At its simplest, it is just
|
|
the definition of one constant, e.g.
|
|
```
|
|
def one = Succ Zero ;
|
|
```
|
|
We can also define a function with arguments,
|
|
```
|
|
def Neg A = Impl A Abs ;
|
|
```
|
|
which is still a special case of the most general notion of
|
|
definition, that of a group of **pattern equations**:
|
|
```
|
|
def
|
|
sum x Zero = x ;
|
|
sum x (Succ y) = Succ (sum x y) ;
|
|
```
|
|
To compute a term is, as in functional programming languages,
|
|
simply to follow a chain of reductions until no definition
|
|
can be applied. For instance, we compute
|
|
```
|
|
sum one one -->
|
|
sum (Succ Zero) (Succ Zero) -->
|
|
Succ (sum (Succ Zero) Zero) -->
|
|
Succ (Succ Zero)
|
|
```
|
|
Computation in GF is performed with the ``pt`` command and the
|
|
``compute`` transformation, e.g.
|
|
```
|
|
> p -tr "1 + 1" | pt -transform=compute -tr | l
|
|
sum one one
|
|
Succ (Succ Zero)
|
|
s(s(0))
|
|
```
|
|
|
|
The ``def`` definitions of a grammar induce a notion of
|
|
**definitional equality** among trees: two trees are
|
|
definitionally equal if they compute into the same tree.
|
|
Thus, trivially, all trees in a chain of computation
|
|
(such as the one above)
|
|
are definitionally equal to each other. So are the trees
|
|
```
|
|
sum Zero (Succ one)
|
|
Succ one
|
|
sum (sum Zero Zero) (sum (Succ Zero) one)
|
|
```
|
|
and infinitely many other trees.
|
|
|
|
A fact that has to be emphasized about ``def`` definitions is that
|
|
they are //not// performed as a first step of linearization.
|
|
We say that **linearization is intensional**, which means that
|
|
the definitional equality of two trees does not imply that
|
|
they have the same linearizations. For instance, each of the seven terms
|
|
shown above has a different linearization in arithmetic notation:
|
|
```
|
|
1 + 1
|
|
s(0) + s(0)
|
|
s(s(0) + 0)
|
|
s(s(0))
|
|
0 + s(0)
|
|
s(1)
|
|
0 + 0 + s(0) + 1
|
|
```
|
|
This notion of intensionality is
|
|
no more exotic than the intensionality of any **pretty-printing**
|
|
function of a programming language (function that shows
|
|
the expressions of the language as strings). It is vital for
|
|
pretty-printing to be intensional in this sense - if we want,
|
|
for instance, to trace a chain of computation by pretty-printing each
|
|
intermediate step, what we want to see is a sequence of different
|
|
expressions, which are definitionally equal.
|
|
|
|
What is more exotic is that GF has two ways of referring to the
|
|
abstract syntax objects. In the concrete syntax, the reference is intensional.
|
|
In the abstract syntax, the reference is extensional, since
|
|
**type checking is extensional**. The reason is that,
|
|
in the type theory with dependent types, types may depend on terms.
|
|
Two types depending on terms that are definitionally equal are
|
|
equal types. For instance,
|
|
```
|
|
Proof (Odd one)
|
|
Proof (Odd (Succ Zero))
|
|
```
|
|
are equal types. Hence, any tree that type checks as a proof that
|
|
1 is odd also type checks as a proof that the successor of 0 is odd.
|
|
(Recall, in this connection, that the
|
|
arguments a category depends on never play any role
|
|
in the linearization of trees of that category,
|
|
nor in the definition of the linearization type.)
|
|
|
|
In addition to computation, definitions impose a
|
|
**paraphrase** relation on expressions:
|
|
two strings are paraphrases if they
|
|
are linearizations of trees that are
|
|
definitionally equal.
|
|
Paraphrases are sometimes interesting for
|
|
translation: the **direct translation**
|
|
of a string, which is the linearization of the same tree
|
|
in the target language, may be inadequate because it is e.g.
|
|
unidiomatic or ambiguous. In such a case,
|
|
the translation algorithm may be made to consider
|
|
translation by a paraphrase.
|
|
|
|
To express the distinction between
|
|
**constructors** (=**canonical** functions)
|
|
and other functions, GF has a judgement form
|
|
``data`` to tell that certain functions are canonical, e.g.
|
|
```
|
|
data Nat = Succ | Zero ;
|
|
```
|
|
Unlike in Haskell, but similarly to ALF (where constructor functions
|
|
are marked with a flag ``C``),
|
|
new constructors can be added to
|
|
a type with new ``data`` judgements. The type signatures of constructors
|
|
are given separately, in ordinary ``fun`` judgements.
|
|
One can also write directly
|
|
```
|
|
data Succ : Nat -> Nat ;
|
|
```
|
|
which is equivalent to the two judgements
|
|
```
|
|
fun Succ : Nat -> Nat ;
|
|
data Nat = Succ ;
|
|
```
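Putting the judgement forms of this section together, an abstract syntax for unary arithmetic could be written as in the sketch below. The module name ``Arith`` is an assumption; the constructors and definitions are the ones introduced above.
```
-- file Arith.gf
abstract Arith = {
  cat Nat ;
  data
    Zero : Nat ;                 -- constructors (canonical functions)
    Succ : Nat -> Nat ;
  fun
    one : Nat ;                  -- defined functions
    sum : Nat -> Nat -> Nat ;
  def
    one = Succ Zero ;
    sum x Zero = x ;
    sum x (Succ y) = Succ (sum x y) ;
}
```
With these definitions, ``sum one one`` and ``Succ (Succ Zero)`` are definitionally equal, although a concrete syntax may linearize them differently.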
|
|
|
|
**Exercise**. Implement an interpreter of a small functional programming
|
|
language with natural numbers, lists, pairs, lambdas, etc. Use higher-order
|
|
abstract syntax with semantic definitions. As target language, use
|
|
your favourite programming language.
|
|
|
|
**Exercise**. To make your interpreted language look nice, use
|
|
**precedences** instead of putting parentheses everywhere.
|
|
You can use the [precedence library ../../lib/prelude/Precedence.gf]
|
|
of GF to facilitate this.
|
|
|
|
|
|
|
|
#PARTtwo
|
|
|
|
=Embedded grammars in Haskell=
|
|
|
|
GF grammars can be used as parts of programs written in the
|
|
following languages. We will go through a skeleton application in
|
|
Haskell, while the next chapter will show how to build an
|
|
application in Java.
|
|
|
|
We will show how to build a minimal resource grammar
|
|
application whose architecture scales up to much
|
|
larger applications. The application is run from the
|
|
shell by the command
|
|
```
|
|
math
|
|
```
|
|
whereafter it reads user input in English and French.
|
|
To each input line, it answers by the truth value of
|
|
the sentence.
|
|
```
|
|
./math
|
|
zéro est pair
|
|
True
|
|
zero is odd
|
|
False
|
|
zero is even and zero is odd
|
|
False
|
|
```
|
|
The source of the application consists of the following
|
|
files:
|
|
```
|
|
LexEng.gf -- English instance of Lex
|
|
LexFre.gf -- French instance of Lex
|
|
Lex.gf -- lexicon interface
|
|
Makefile -- a makefile
|
|
MathEng.gf -- English instantiation of MathI
|
|
MathFre.gf -- French instantiation of MathI
|
|
Math.gf -- abstract syntax
|
|
MathI.gf -- concrete syntax functor for Math
|
|
Run.hs -- Haskell Main module
|
|
```
|
|
The system was built in 22 steps explained below.
|
|
|
|
|
|
==Writing GF grammars==
|
|
|
|
===Creating the first grammar===
|
|
|
|
1. Write ``Math.gf``, which defines what you want to say.
|
|
```
|
|
abstract Math = {
|
|
cat Prop ; Elem ;
|
|
fun
|
|
And : Prop -> Prop -> Prop ;
|
|
Even : Elem -> Prop ;
|
|
Zero : Elem ;
|
|
}
|
|
```
|
|
2. Write ``Lex.gf``, which defines which language-dependent
|
|
parts are needed in the concrete syntax. These are mostly
|
|
words (lexicon), but can in fact be any operations. The definitions
|
|
only use resource abstract syntax, which is opened.
|
|
```
|
|
interface Lex = open Syntax in {
|
|
oper
|
|
even_A : A ;
|
|
zero_PN : PN ;
|
|
}
|
|
```
|
|
3. Write ``LexEng.gf``, the English implementation of ``Lex.gf``.
|
|
This module uses English resource libraries.
|
|
```
|
|
instance LexEng of Lex = open GrammarEng, ParadigmsEng in {
|
|
oper
|
|
even_A = regA "even" ;
|
|
zero_PN = regPN "zero" ;
|
|
|
|
}
|
|
```
|
|
4. Write ``MathI.gf``, a language-independent concrete syntax of
|
|
``Math.gf``. It opens interfaces,
|
|
which makes it an incomplete module, aka. parametrized module, aka.
|
|
functor.
|
|
```
|
|
incomplete concrete MathI of Math =
|
|
|
|
open Syntax, Lex in {
|
|
|
|
flags startcat = Prop ;
|
|
|
|
lincat
|
|
Prop = S ;
|
|
Elem = NP ;
|
|
lin
|
|
And x y = mkS and_Conj x y ;
|
|
Even x = mkS (mkCl x even_A) ;
|
|
Zero = mkNP zero_PN ;
|
|
}
|
|
```
|
|
5. Write ``MathEng.gf``, which is just an instantiation of ``MathI.gf``,
|
|
replacing the interfaces by their English instances. This is the module
|
|
that will be used as a top module in GF, so it contains a path to
|
|
the libraries.
|
|
```
|
|
--# -path=.:present:prelude

concrete MathEng of Math = MathI with
  (Syntax = SyntaxEng),
  (Lex = LexEng) ;
|
|
```
|
|
|
|
|
|
===Testing===
|
|
|
|
6. Test the grammar in GF by random generation and parsing.
|
|
```
|
|
$ gf
|
|
> i MathEng.gf
|
|
> gr -tr | l -tr | p
|
|
And (Even Zero) (Even Zero)
|
|
zero is even and zero is even
|
|
And (Even Zero) (Even Zero)
|
|
```
|
|
When importing the grammar, you will fail if you haven't
|
|
- correctly defined your ``GF_LIB_PATH`` as ``GF/lib``
|
|
- installed the resource package or
|
|
compiled the resource from source by ``make`` in ``GF/lib/resource-1.0``
|
|
|
|
|
|
|
|
===Adding a new language===
|
|
|
|
7. Now it is time to add a new language. Write a French lexicon ``LexFre.gf``:
|
|
```
|
|
instance LexFre of Lex = open SyntaxFre, ParadigmsFre in {
|
|
oper
|
|
even_A = mkA "pair" ;
|
|
zero_PN = mkPN "zéro" ;
|
|
}
|
|
```
|
|
8. You also need a French concrete syntax, ``MathFre.gf``:
|
|
```
|
|
--# -path=.:present:prelude
|
|
|
|
concrete MathFre of Math = MathI with
|
|
(Syntax = SyntaxFre),
|
|
(Lex = LexFre) ;
|
|
```
|
|
9. This time, you can test multilingual generation:
|
|
```
|
|
> i MathFre.gf
|
|
> gr | tb
|
|
Even Zero
|
|
zéro est pair
|
|
zero is even
|
|
```
|
|
|
|
|
|
===Extending the language===
|
|
|
|
10. You want to add a predicate saying that a number is odd.
|
|
It is first added to ``Math.gf``:
|
|
```
|
|
fun Odd : Elem -> Prop ;
|
|
```
|
|
11. You need a new word in ``Lex.gf``.
|
|
```
|
|
oper odd_A : A ;
|
|
```
|
|
12. Then you can give a language-independent concrete syntax in
|
|
``MathI.gf``:
|
|
```
|
|
lin Odd x = mkS (mkCl x odd_A) ;
|
|
```
|
|
13. The new word is implemented in ``LexEng.gf``.
|
|
```
|
|
oper odd_A = mkA "odd" ;
|
|
```
|
|
14. The new word is implemented in ``LexFre.gf``.
|
|
```
|
|
oper odd_A = mkA "impair" ;
|
|
```
|
|
15. Now you can test with the extended lexicon. First empty
|
|
the environment to get rid of the old abstract syntax, then
|
|
import the new versions of the grammars.
|
|
```
|
|
> e
|
|
> i MathEng.gf
|
|
> i MathFre.gf
|
|
> gr | tb
|
|
And (Odd Zero) (Even Zero)
|
|
zéro est impair et zéro est pair
|
|
zero is odd and zero is even
|
|
```
|
|
|
|
|
|
==Building a user program==
|
|
|
|
===Producing a compiled grammar package===
|
|
|
|
16. Your grammar is going to be used by persons who do not need
|
|
to compile it again. They may not have access to the resource library,
|
|
either. Therefore it is advisable to produce a multilingual grammar
|
|
package in a single file. We call this package ``math.gfcm`` and
|
|
produce it, when we have ``MathEng.gf`` and
|
|
``MathFre.gf`` in the GF state, by the command
|
|
```
|
|
> pm | wf math.gfcm
|
|
```
|
|
|
|
|
|
===Writing the Haskell application===
|
|
|
|
17. Write the Haskell main file ``Run.hs``. It uses the ``EmbedAPI``
|
|
module defining some basic functionalities such as parsing.
|
|
The answer is produced by an interpreter of trees returned by the parser.
|
|
```
|
|
module Main where

import GSyntax
import GF.Embed.EmbedAPI

-- Load the compiled grammar package and start the read-answer loop.
main :: IO ()
main = do
  gr <- file2grammar "math.gfcm"
  loop gr

-- Read one line, answer it, and repeat.
loop :: MultiGrammar -> IO ()
loop gr = do
  s <- getLine
  interpret gr s
  loop gr

-- Parse the line as a Prop and print the truth value of the first parse.
interpret :: MultiGrammar -> String -> IO ()
interpret gr s = do
  let tss = parseAll gr "Prop" s
  case (concat tss) of
    []  -> putStrLn "no parse"
    t:_ -> print $ answer $ fg t

-- The truth value of a proposition.
answer :: GProp -> Bool
answer p = case p of
  (GOdd x1)    -> odd (value x1)
  (GEven x1)   -> even (value x1)
  (GAnd x1 x2) -> answer x1 && answer x2

-- The number denoted by an element.
value :: GElem -> Int
value e = case e of
  GZero -> 0
|
|
```
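
The ``loop`` of ``Run.hs`` calls ``getLine`` unconditionally, so the program
stops with an end-of-file exception when the input ends, for instance when
input is piped from a file. A minimal sketch of an alternative, using only the
standard Prelude, is given below. The names ``processInput`` and ``runWith``
are hypothetical, and the ``respond`` argument stands for a pure function
that parses one line and renders the answer as a string.
```
-- Turn a line-by-line responder into a function on the whole input.
processInput :: (String -> String) -> String -> String
processInput respond = unlines . map respond . lines

-- Answer every input line until end of input, then terminate normally.
runWith :: (String -> String) -> IO ()
runWith respond = interact (processInput respond)
```
Because the responder is pure, it can be tried out in GHCi without any
grammar loaded, e.g. with ``processInput reverse "ab\ncd\n"``.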
|
|
|
|
18. The syntax trees manipulated by the interpreter are not raw
|
|
GF trees, but objects of the Haskell datatype ``GProp``.
|
|
From any GF grammar, a file ``GSyntax.hs`` with
|
|
datatypes corresponding to its abstract
|
|
syntax can be produced by the command
|
|
```
|
|
> pg -printer=haskell | wf GSyntax.hs
|
|
```
|
|
The module also defines the overloaded functions
|
|
``gf`` and ``fg`` for translating from these types to
|
|
raw trees and back.
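
The generated module is not shown in this chapter. As a rough sketch, its
datatypes for ``Math`` could look as follows; this shape is only inferred from
the constructors used in ``Run.hs``, and the real module also contains the
functions ``gf`` and ``fg``, which are omitted here. The interpreter is
repeated so that the sketch can be loaded on its own.
```
-- Assumed shape of the generated datatypes (not copied from GSyntax.hs).
data GElem = GZero
  deriving (Eq, Show)

data GProp
  = GAnd GProp GProp
  | GEven GElem
  | GOdd GElem
  deriving (Eq, Show)

-- The interpreter from Run.hs.
answer :: GProp -> Bool
answer p = case p of
  GOdd x   -> odd (value x)
  GEven x  -> even (value x)
  GAnd x y -> answer x && answer y

value :: GElem -> Int
value GZero = 0
```
With these definitions, the tree ``GAnd (GEven GZero) (GOdd GZero)``, which
corresponds to //zero is even and zero is odd//, can be evaluated with ``answer``
directly in GHCi, without going through the parser.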
|
|
|
|
|
|
===Compiling the Haskell grammar===
|
|
|
|
19. Before compiling ``Run.hs``, you must check that the
|
|
embedded GF modules are found. The easiest way to do this
|
|
is by two symbolic links to your GF source directories:
|
|
```
|
|
$ ln -s /home/aarne/GF/src/GF
|
|
$ ln -s /home/aarne/GF/src/Transfer/
|
|
```
|
|
|
|
20. Now you can run the GHC Haskell compiler to produce the program.
|
|
```
|
|
$ ghc --make -o math Run.hs
|
|
```
|
|
The program can be tested with the command ``./math``.
|
|
|
|
|
|
===Building a distribution===
|
|
|
|
21. For a stand-alone binary-only distribution, only
|
|
the two files ``math`` and ``math.gfcm`` are needed.
|
|
For a source distribution, the files mentioned in
|
|
the beginning of this chapter are needed.
|
|
|
|
|
|
===Using a Makefile===
|
|
|
|
22. As a part of the source distribution, a ``Makefile`` is
|
|
essential. The ``Makefile`` is also useful when developing the
|
|
application. It should always be possible to build an executable
|
|
from source by typing ``make``. Here is a minimal such ``Makefile``:
|
|
```
|
|
all:
|
|
	echo "pm | wf math.gfcm" | gf MathEng.gf MathFre.gf
	echo "pg -printer=haskell | wf GSyntax.hs" | gf math.gfcm
	ghc --make -o math Run.hs
|
|
```
|
|
|
|
|
|
==The Embedded GF Haskell API==
|
|
|
|
|
|
|
|
=Embedded grammars in Java FORTHCOMING=
|
|
|
|
In this chapter, we will build an application in Java similar to the one
built in Haskell in the previous chapter. This application gives
|
|
a template with the overall program structure that can be
|
|
extended with larger grammars and more Java functionalities.
|
|
|
|
Before the chapter is written, the document
|
|
|
|
[``http://www.cs.chalmers.se/~bringert/gf/gf-java.html`` http://www.cs.chalmers.se/~bringert/gf/gf-java.html]
|
|
|
|
by Björn Bringert gives more information on embedded grammars in Java.
|
|
|
|
|
|
|
|
=Spoken language translators FORTHCOMING=
|
|
|
|
In this chapter, it will be shown how a multilingual grammar is
|
|
equipped with speech recognition and speech synthesis to obtain
|
|
a spoken language translator.
|
|
|
|
Before the chapter is written, the document
|
|
|
|
[``http://www.cs.chalmers.se/~bringert/gf/translatespeech.html`` http://www.cs.chalmers.se/~bringert/gf/translatespeech.html]
|
|
|
|
by Björn Bringert gives more information on spoken language translation with GF.
|
|
|
|
|
|
|
|
=Multimodal dialogue systems FORTHCOMING=
|
|
|
|
In this chapter, we will show how to build a dialogue system in GF:
|
|
a system in which the user can talk with the computer to accomplish
|
|
a task such as finding a route on a map of transfer systems.
|
|
The grammars are **multimodal**, which means that spoken input
|
|
can be completed with mouse clicks.
|
|
|
|
Before the chapter is written, the article
|
|
"Multimodal Dialogue System Grammars" by
|
|
Björn Bringert, Robin Cooper, Peter Ljunglöf, and Aarne Ranta
|
|
(//Proceedings of DIALOR'05, Ninth Workshop on the Semantics and Pragmatics of Dialogue//,
|
|
Nancy, France, June 9-11, 2005)
|
|
provides information on multimodal grammars. The paper is available at
|
|
|
|
[``http://www.cs.chalmers.se/~bringert/publ/mm-grammars-dialor/mm-grammars-dialor.pdf`` http://www.cs.chalmers.se/~bringert/publ/mm-grammars-dialor/mm-grammars-dialor.pdf]
|
|
|
|
|
|
|
|
=Grammars of formal languages FORTHCOMING=
|
|
|
|
In this chapter, we will build a grammar for a formal language and interface
|
|
it with natural language.
|
|
|
|
==Precedence and fixity==
|
|
|
|
==Extensible natural-language interfaces==
|
|
|
|
|
|
|
|
|
|
#PARTfour
|
|
|
|
=Implementing morphology and syntax=
|
|
|
|
In this chapter, we will dig deeper into linguistic concepts than
|
|
so far. We will build an implementation of a linguistically motivated
|
|
fragment of English and Italian, covering basic morphology and syntax.
|
|
The result is a miniature of the GF resource library, whose internals will
|
|
be covered in the next chapter. There are two main purposes
|
|
for this chapter:
|
|
- to understand the linguistic concepts underlying the resource
|
|
grammar library
|
|
- to get practice in the more advanced constructs of concrete syntax
|
|
|
|
|
|
|
|
|
|
==Lexical vs. syntactic rules==
|
|
|
|
So far we have seen a grammar from a semantic point of view:
|
|
a grammar specifies a system of meanings (specified in the abstract syntax) and
|
|
tells how they are expressed in some language (as specified in a concrete syntax).
|
|
In resource grammars, as in linguistic tradition, the goal is to
|
|
specify the **grammatically correct combinations of words**, whatever their
|
|
meanings are.
|
|
|
|
Thus the grammar has two kinds of categories and two kinds of rules:
|
|
- lexical:
|
|
- lexical categories, to classify words
|
|
- lexical rules, to define words and their properties
|
|
|
|
|
|
- phrasal (combinatorial, syntactic):
|
|
- phrasal categories, to classify phrases of arbitrary size
|
|
- phrasal rules, to combine phrases into larger phrases
|
|
|
|
|
|
Many grammar formalisms force a radical distinction between the lexical and syntactic
|
|
components; sometimes it is not even possible to express the two kinds of rules in
|
|
the same formalism. GF has no such restrictions. Nevertheless, it has turned out
|
|
to be a good discipline to maintain a distinction between the lexical and syntactic
|
|
components.
|
|
|
|
|
|
|
|
==The abstract syntax==
|
|
|
|
Let us go through the abstract syntax contained in the module ``Syntax``.
|
|
It can be found in the file
|
|
[``examples/tutorial/syntax/Syntax.gf`` examples/tutorial/syntax/Syntax.gf].
|
|
|
|
|
|
===Lexical categories===
|
|
|
|
Words are classified into two kinds of categories: **closed** and
|
|
**open**. The defining property of closed categories is that their
words can easily be enumerated; new words are very seldom
introduced in them. In general, closed categories
|
|
contain **structural words**, also known as **function words**.
|
|
In ``Syntax``, we have just two closed lexical categories:
|
|
```
|
|
cat
|
|
Det ; -- determiner e.g. "this"
|
|
AdA ; -- adadjective e.g. "very"
|
|
```
|
|
We have already used words of both categories in the ``Food``
|
|
examples; they have just not been assigned a category, but
|
|
treated as **syncategorematic**. In GF, a syncategorematic
word is one that is introduced in a linearization rule of
some construction alongside some other expressions that
|
|
are combined; there is no abstract syntax tree for that word
|
|
alone. Thus in the rules
|
|
```
|
|
fun That : Kind -> Item ;
|
|
lin That k = {s = "that" ++ k.s} ;
|
|
```
|
|
the word //that// is syncategorematic. In linguistically motivated
|
|
grammars, syncategorematic words are usually avoided, whereas in
|
|
semantically motivated grammars, structural words are often treated
|
|
as syncategorematic. This is partly so because the concept expressed
|
|
by a structural word in one language is often expressed by some other
|
|
means than an individual word in another. For instance, the definite
|
|
article //the// is a determiner word in English, whereas Swedish expresses
|
|
determination by inflecting the determined noun: //the wine// is //vinet//
|
|
in Swedish.
|
|
|
|
As for open classes, we will use four:
|
|
```
|
|
cat
|
|
N ; -- noun e.g. "pizza"
|
|
A ; -- adjective e.g. "good"
|
|
V ; -- intransitive verb e.g. "boil"
|
|
V2 ; -- two-place verb e.g. "eat"
|
|
```
|
|
Two-place verbs differ from intransitive verbs syntactically by
|
|
taking an object. In the lexicon, they must be equipped with information
|
|
on the //case// of the object in some languages (such as German and Latin),
|
|
and on the //preposition// in some languages (such as English).
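
To make the lexical information of two-place verbs concrete, here is a small
Haskell sketch of the English case, where a verb carries an optional preposition.
The type ``V2`` with its fields and the function ``complV2`` are hypothetical
illustrations, not the definitions used in the GF resource library.
```
-- A two-place verb together with the preposition (if any) of its object.
data V2 = V2 { v2Verb :: String, v2Prep :: Maybe String }

eatV2, talkV2 :: V2
eatV2  = V2 "eat"  Nothing
talkV2 = V2 "talk" (Just "about")

-- Complementation: the verb, then the preposition if present, then the object.
complV2 :: V2 -> String -> String
complV2 (V2 v p) obj = unwords (v : maybe [] (\x -> [x]) p ++ [obj])
```
For a language with cases, the ``Maybe String`` field would be replaced by
a case parameter that selects the right form of the object.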
|
|
|
|
|
|
|
|
===Lexical rules===
|
|
|
|
The words of closed categories can be listed once and for all in a
|
|
library. The ``Syntax`` module has the following:
|
|
```
|
|
fun
|
|
this_Det, that_Det, these_Det, those_Det,
|
|
every_Det, theSg_Det, thePl_Det, indef_Det, plur_Det, two_Det : Det ;
|
|
very_AdA : AdA ;
|
|
```
|
|
The naming convention for lexical rules is that we use a word followed by
|
|
the category. In this way we can for instance distinguish the determiner
|
|
//that// from the conjunction //that//. But there are also rules where this
|
|
does not quite suffice. English has no distinction between singular and
|
|
plural //the//; yet they behave differently as determiners, analogously to
|
|
//this// vs. //these//. The function //indef_Det// is the indefinite article
|
|
//a//, whereas //plur_Det// is semantically the plural indefinite article,
|
|
which has no separate word in English, unlike in some other languages, e.g.
|
|
//des// in French.
|
|
|
|
Open lexical categories have no objects in ``Syntax``. However, we can
|
|
build lexical modules as extensions of ``Syntax``. An example is
|
|
[``examples/tutorial/syntax/Test.gf`` examples/tutorial/syntax/Test.gf],
|
|
which we use to test the syntax. Its vocabulary is from the food domain:
|
|
```
|
|
abstract Test = Syntax ** {
|
|
fun
|
|
wine_N, cheese_N, fish_N, pizza_N, waiter_N, customer_N : N ;
|
|
fresh_A, warm_A, italian_A, expensive_A, delicious_A, boring_A : A ;
|
|
stink_V : V ;
|
|
eat_V2, love_V2, talk_V2 : V2 ;
|
|
}
|
|
```
|
|
|
|
===Phrasal categories===
|
|
|
|
The topmost category in ``Syntax`` is ``Phr``, **phrase**, covering
|
|
all complete sentences, which have a punctuation mark and could be
|
|
used alone to make an utterance. In addition to **declarative sentences**
|
|
``S``, there are also **question sentences** ``QS``:
|
|
```
|
|
cat
|
|
Phr ; -- any complete sentence e.g. "Is this pizza good?"
|
|
S ; -- declarative sentence e.g. "this pizza is good"
|
|
QS ; -- question sentence e.g. "is this pizza good"
|
|
```
|
|
The main parts of a sentence are usually taken to be the **noun phrase** ``NP`` and
|
|
the **verb phrase** ``VP``. In analogy to noun phrases, we consider
|
|
**interrogative phrases**, which are used for forming question sentences.
|
|
```
|
|
NP ; -- noun phrase e.g. "this pizza"
|
|
IP ; -- interrogative phrase e.g. "which pizza"
|
|
VP ; -- verb phrase e.g. "is good"
|
|
```
|
|
The "smallest" phrasal categories are **common nouns** ``CN`` and
|
|
**adjectival phrases** ``AP``:
|
|
```
|
|
CN ; -- common noun phrase e.g. "very good pizza"
|
|
AP ; -- adjectival phrase e.g. "very good"
|
|
```
|
|
Common nouns are typically combined with determiners to build noun
|
|
phrases, whereas adjectival phrases are combined with the copula to
|
|
form verb phrases.
|
|
|
|
|
|
===Phrasal rules===
|
|
|
|
Phrasal rules specify how complex phrases are built from simpler ones.
|
|
At the bottom, there are **lexical insertion rules** telling how
|
|
words from each lexical category are "promoted" to phrases; i.e. how
|
|
the most elementary phrases are built.
|
|
```
|
|
fun
|
|
UseN : N -> CN ; -- pizza
|
|
UseA : A -> AP ; -- good
|
|
UseV : V -> VP ; -- stink
|
|
```
|
|
Structural words usually don't form phrases themselves; thus they
|
|
are in the first place used for promoting "lower" phrase categories
|
|
to "higher" ones,
|
|
```
|
|
DetCN : Det -> CN -> NP ; -- this pizza
|
|
```
|
|
or for recursively building more complex phrases:
|
|
```
|
|
AdAP : AdA -> AP -> AP ; -- very good
|
|
```
|
|
In analogy to ``DetCN``, we could have a rule forming interrogative
|
|
noun phrases with interrogative determiners such as //which//. In
|
|
``Syntax``, we however make a shortcut and just treat //which//
|
|
syncategorematically:
|
|
```
|
|
WhichCN : CN -> IP ;
|
|
```
|
|
Starting from the top of the grammar, we need two rules promoting
|
|
sentences and questions into complete phrases:
|
|
```
|
|
PhrS : S -> Phr ; -- This pizza is good.
|
|
PhrQS : QS -> Phr ; -- Is this pizza good?
|
|
```
|
|
The most central rule in most grammars is the **predication rule**,
|
|
which combines a noun
|
|
phrase and a verb phrase into a sentence. In the present grammar,
|
|
though not in the full resource grammar library, we split this
|
|
rule into two: one for positive and one for negated sentences:
|
|
```
|
|
PosVP, NegVP : NP -> VP -> S ; -- this pizza is/isn't good
|
|
```
|
|
In the same way, question sentences can be formed with these two
|
|
**polarities**:
|
|
```
|
|
QPosVP, QNegVP : NP -> VP -> QS ; -- is/isn't this pizza good
|
|
```
|
|
Questions can also be formed with interrogative noun phrases:
|
|
```
|
|
IPPosVP, IPNegVP : IP -> VP -> QS ; -- which pizza is/isn't good
|
|
```
|
|
Verb phrases can be built by **complementation**, where a two-place
|
|
verb needs a noun phrase complement, and the (syncategorematic) copula
|
|
can take an adjectival phrase as complement:
|
|
```
|
|
ComplV2 : V2 -> NP -> VP ; -- eat this pizza
|
|
ComplAP : AP -> VP ; -- be good
|
|
```
|
|
**Adjectival modification** is a recursive rule for forming common nouns:
|
|
```
|
|
ModCN : AP -> CN -> CN ; -- warm pizza
|
|
```
|
|
Finally, we have two special rules that are instances of so-called
|
|
**wh-movement**. The idea with this term is that a question such
|
|
as //which pizza do you eat// is a result of moving //which pizza//
|
|
from its "proper" place which is after the verb: //you eat which pizza//:
|
|
```
|
|
IPPosV2, IPNegV2 : IP -> NP -> V2 -> QS ; -- which pizza do/don't you eat
|
|
```
|
|
The full resource grammar has a more general treatment of this phenomenon.
|
|
But these special cases are already quite useful; moreover, they illustrate
|
|
variation that is possible in English between
|
|
**pied piping** (//about which pizza do you talk//) and
**preposition stranding** (//which pizza do you talk about//).
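
The way the ``fun`` rules constrain trees can be sketched with Haskell
datatypes, in the style of the embedded Haskell application. The sketch covers
only a part of the categories and rules of ``Syntax`` and ``Test``. Haskell
constructors must start with a capital letter, so ``this_Det`` is written
``This_Det``; this renaming is an assumption of the sketch.
```
-- Lexical categories, with a few words from Syntax and Test.
data Det = This_Det | That_Det        deriving Show
data AdA = Very_AdA                   deriving Show
data N   = Pizza_N | Wine_N           deriving Show
data A   = Warm_A | Delicious_A       deriving Show

-- Phrasal categories; each constructor mirrors one fun rule.
data CN  = UseN N | ModCN AP CN       deriving Show
data AP  = UseA A | AdAP AdA AP       deriving Show
data NP  = DetCN Det CN               deriving Show
data VP  = ComplAP AP                 deriving Show
data S   = PosVP NP VP | NegVP NP VP  deriving Show
data Phr = PhrS S                     deriving Show

-- A tree for a sentence like "this warm pizza is very delicious".
example :: Phr
example =
  PhrS (PosVP (DetCN This_Det (ModCN (UseA Warm_A) (UseN Pizza_N)))
              (ComplAP (AdAP Very_AdA (UseA Delicious_A))))
```
An ill-formed combination such as ``DetCN This_Det (UseA Warm_A)`` is rejected
by the Haskell type checker, just as the corresponding GF tree is rejected by GF.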
|
|
|
|
|
|
==Concrete syntax: English morphology==
|
|
|
|
===Worst-case functions and data abstraction===
|
|
|
|
Some English nouns, such as ``mouse``, are so irregular that
|
|
it makes no sense to see them as instances of a paradigm. Even
|
|
then, it is useful to perform **data abstraction** from the
|
|
definition of the type ``Noun``, and introduce a constructor
|
|
operation, a **worst-case function** for nouns:
|
|
```
|
|
oper mkNoun : Str -> Str -> Noun = \x,y -> {
|
|
s = table {
|
|
Sg => x ;
|
|
Pl => y
|
|
}
|
|
} ;
|
|
```
|
|
Thus we can define
|
|
```
|
|
lin Mouse = mkNoun "mouse" "mice" ;
|
|
```
|
|
and
|
|
```
|
|
oper regNoun : Str -> Noun = \x ->
|
|
mkNoun x (x + "s") ;
|
|
```
|
|
instead of writing the inflection tables explicitly.
|
|
|
|
The grammar engineering advantage of worst-case functions is that
|
|
the author of the resource module may change the definitions of
|
|
``Noun`` and ``mkNoun``, and still retain the
|
|
interface (i.e. the system of type signatures) that makes it
|
|
correct to use these functions in concrete modules. In programming
|
|
terms, ``Noun`` is then treated as an **abstract datatype**.
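
The same idea of data abstraction can be sketched in Haskell, the host language
of the embedded application. All names below are chosen to mirror the GF
operations, and ``inflect`` is a hypothetical counterpart of the GF selection
``n.s ! num``.
```
-- Number is a parameter type, as in the GF param definition.
data Number = Sg | Pl
  deriving (Eq, Show)

-- The representation of nouns. In a real module, the export list would
-- hide this constructor, so that clients can only use mkNoun and regNoun.
newtype Noun = Noun (Number -> String)

-- Worst-case function: both forms are given explicitly.
mkNoun :: String -> String -> Noun
mkNoun sg pl = Noun table
  where
    table Sg = sg
    table Pl = pl

-- Regular paradigm, defined by means of the worst-case function.
regNoun :: String -> Noun
regNoun x = mkNoun x (x ++ "s")

-- Selection of a form from the inflection table.
inflect :: Noun -> Number -> String
inflect (Noun table) = table
```
If the representation of ``Noun`` is later changed, for instance by adding
case forms, only ``mkNoun`` and ``inflect`` have to be rewritten; definitions
such as ``regNoun`` remain valid.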
|
|
|
|
|
|
===A system of paradigms using predefined string operations===
|
|
|
|
In addition to the completely regular noun paradigm ``regNoun``,
|
|
some other frequent noun paradigms deserve to be
|
|
defined, for instance,
|
|
```
|
|
sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ;
|
|
```
|
|
What about nouns like //fly//, with the plural //flies//? The already
|
|
available solution is to use the longest common prefix
|
|
//fl// (also known as the **technical stem**) as argument, and define
|
|
```
|
|
yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ;
|
|
```
|
|
But this paradigm would be very unintuitive to use, because the technical stem
|
|
is not an existing form of the word. A better solution is to use
|
|
the lemma and a string operator ``init``, which returns the initial segment (i.e.
|
|
all characters but the last) of a string:
|
|
```
|
|
yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ;
|
|
```
|
|
The operation ``init`` belongs to a set of operations in the
|
|
resource module ``Prelude``, which therefore has to be
|
|
``open``ed so that ``init`` can be used.
|
|
```
|
|
> cc init "curry"
|
|
"curr"
|
|
```
|
|
Its dual is ``last``:
|
|
```
|
|
> cc last "curry"
|
|
"y"
|
|
```
|
|
As generalizations of the library functions ``init`` and ``last``, GF has
|
|
two predefined functions:
``Predef.dp``, which "drops" everything but a suffix of a given length,
and ``Predef.tk``, which "takes" a prefix
by omitting a given number of characters from the end. For instance,
|
|
```
|
|
> cc Predef.tk 3 "worried"
|
|
"worr"
|
|
> cc Predef.dp 3 "worried"
|
|
"ied"
|
|
```
|
|
The prefix ``Predef`` is given to a handful of functions that could
|
|
not be defined internally in GF. They are available in all modules
|
|
without explicit ``open`` of the module ``Predef``.
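
The behaviour of these four operations on the examples above can be modelled
in Haskell as follows. This is only a sketch written to agree with the shown
examples; it makes no claim about how the GF functions treat edge cases such
as a number larger than the length of the string. The names ``initStr`` and
``lastStr`` are chosen to avoid a clash with the Haskell Prelude.
```
-- Take a prefix by omitting the last n characters (cf. Predef.tk).
tk :: Int -> String -> String
tk n s = take (length s - n) s

-- Keep only the last n characters, dropping the rest (cf. Predef.dp).
dp :: Int -> String -> String
dp n s = drop (length s - n) s

-- The operations init and last as the special cases with n = 1.
-- Like the GF last, lastStr returns a string rather than a character.
initStr, lastStr :: String -> String
initStr = tk 1
lastStr = dp 1
```
For any string ``s`` and any ``n`` between 0 and the length of ``s``,
``tk n s ++ dp n s`` gives back ``s``.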
|
|
|
|
|
|
|
|
===An intelligent noun paradigm using pattern matching===
|
|
|
|
It may be hard for the user of a resource morphology to pick the right
|
|
inflection paradigm. A way to help this is to define a more intelligent
|
|
paradigm, which chooses the ending by first analysing the lemma.
|
|
The following variant for English regular nouns puts together all the
|
|
previously shown paradigms, and chooses one of them on the basis of
|
|
the final letter of the lemma (found by the prelude operation ``last``).
|
|
```
|
|
regNoun : Str -> Noun = \s -> case last s of {
|
|
"s" | "z" => mkNoun s (s + "es") ;
|
|
"y" => mkNoun s (init s + "ies") ;
|
|
_ => mkNoun s (s + "s")
|
|
} ;
|
|
```
|
|
The paradigm ``regNoun`` does not give the correct forms for
|
|
all nouns. For instance, //mouse - mice// and
|
|
//fish - fish// must be given by using ``mkNoun``.
|
|
Also the word //boy// would be inflected incorrectly; to prevent
|
|
this, either use ``mkNoun`` or modify
|
|
``regNoun`` so that the ``"y"`` case does not
|
|
apply if the second-last character is a vowel.
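
The second option, checking the second-last character, can be sketched in
Haskell as follows. Nouns are modelled here simply as pairs of a singular and
a plural form, which is an assumption of this sketch and not the GF type
``Noun``; the vowel test covers only the case discussed above.
```
-- A noun is modelled as a pair (singular, plural).
type Noun = (String, String)

mkNoun :: String -> String -> Noun
mkNoun sg pl = (sg, pl)

-- Choose the plural ending by inspecting the end of the lemma.
regNoun :: String -> Noun
regNoun s = case reverse s of
  'y' : c : _
    | c `notElem` "aeiou" -> mkNoun s (init s ++ "ies")  -- fly - flies
  c : _
    | c `elem` "sz"       -> mkNoun s (s ++ "es")        -- kiss - kisses
  _                       -> mkNoun s (s ++ "s")         -- boy - boys
```
When a guard fails, Haskell falls through to the next alternative, so //boy//
reaches the last line and gets the plural //boys//.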
|
|
|
|
**Exercise**. Extend the ``regNoun`` paradigm so that it takes care
|
|
of all variations there are in English. Test it with the nouns
|
|
//ax//, //bamboo//, //boy//, //bush//, //hero//, //match//.
|
|
**Hint**. The library functions ``Predef.dp`` and ``Predef.tk``
|
|
are useful in this task.
|
|
|
|
**Exercise**. The same rules that form plural nouns in English also
|
|
apply in the formation of third-person singular verbs.
|
|
Write a regular verb paradigm that uses this idea, but first
|
|
rewrite ``regNoun`` so that the analysis needed to build //s//-forms
|
|
is factored out as a separate ``oper``, which is shared with
|
|
``regVerb``.
|
|
|
|
|
|
===Morphological resource modules===
|
|
|
|
A common idiom is to
|
|
gather the ``oper`` and ``param`` definitions
|
|
needed for inflecting words in
|
|
a language into a morphology module. Here is a simple
|
|
example, [``MorphoEng`` resource/MorphoEng.gf].
|
|
```
|
|
--# -path=.:prelude
|
|
|
|
resource MorphoEng = open Prelude in {
|
|
|
|
param
|
|
Number = Sg | Pl ;
|
|
|
|
oper
|
|
Noun, Verb : Type = {s : Number => Str} ;
|
|
|
|
mkNoun : Str -> Str -> Noun = \x,y -> {
|
|
s = table {
|
|
Sg => x ;
|
|
Pl => y
|
|
}
|
|
} ;
|
|
|
|
regNoun : Str -> Noun = \s -> case last s of {
|
|
"s" | "z" => mkNoun s (s + "es") ;
|
|
"y" => mkNoun s (init s + "ies") ;
|
|
_ => mkNoun s (s + "s")
|
|
} ;
|
|
|
|
mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ;
|
|
|
|
regVerb : Str -> Verb = \s -> case last s of {
|
|
"s" | "z" => mkVerb s (s + "es") ;
|
|
"y" => mkVerb s (init s + "ies") ;
|
|
"o" => mkVerb s (s + "es") ;
|
|
_ => mkVerb s (s + "s")
|
|
} ;
|
|
}
|
|
```
|
|
The first line gives as a hint to the compiler the
|
|
**search path** needed to find all the other modules that the
|
|
module depends on. The directory ``prelude`` is a subdirectory of
|
|
``GF/lib``; to be able to refer to it in this simple way, you can
|
|
set the environment variable ``GF_LIB_PATH`` to point to this
|
|
directory.
|
|
|
|
|
|
===Morphological analysis and morphology quiz===
|
|
|
|
Even though morphology is in GF
|
|
mostly used as an auxiliary for syntax, it
|
|
can also be useful in its own right. The command ``morpho_analyse = ma``
|
|
can be used to read a text and return for each word the analyses that
|
|
it has in the current concrete syntax.
|
|
```
|
|
> rf bible.txt | morpho_analyse
|
|
```
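
The idea behind morphological analysis, looking up a word form in all
inflection tables, can be illustrated with a toy model in Haskell. The
three-word lexicon and the function ``analyse`` are hypothetical; they do not
reproduce the algorithm or the output format of ``morpho_analyse``.
```
data Number = Sg | Pl
  deriving (Eq, Show)

-- A toy lexicon: each lemma with its full inflection table.
lexicon :: [(String, [(Number, String)])]
lexicon =
  [ ("mouse", [(Sg, "mouse"), (Pl, "mice")])
  , ("fish",  [(Sg, "fish"),  (Pl, "fish")])
  , ("pizza", [(Sg, "pizza"), (Pl, "pizzas")])
  ]

-- All analyses of a word form: search every table for the form.
analyse :: String -> [(String, Number)]
analyse w =
  [ (lemma, num) | (lemma, forms) <- lexicon, (num, form) <- forms, form == w ]
```
A form such as //fish// gets two analyses, one for each number, whereas an
unknown word gets the empty list.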
|
|
In the same way as translation exercises, morphological exercises can
|
|
be generated, by the command ``morpho_quiz = mq``. Usually,
|
|
the category is set to be something other than ``S``. For instance,
|
|
```
|
|
> cd GF/lib/resource-1.0/
|
|
> i french/IrregFre.gf
|
|
> morpho_quiz -cat=V
|
|
|
|
Welcome to GF Morphology Quiz.
|
|
...
|
|
|
|
réapparaître : VFin VCondit Pl P2
|
|
réapparaitriez
|
|
> No, not réapparaitriez, but
|
|
réapparaîtriez
|
|
Score 0/1
|
|
```
|
|
Finally, a list of morphological exercises can be generated
|
|
off-line and saved in a
|
|
file for later use, by the command ``morpho_list = ml``:
|
|
```
|
|
> morpho_list -number=25 -cat=V | wf exx.txt
|
|
```
|
|
The ``number`` flag gives the number of exercises generated.
|
|
|
|
|
|
|
|
==Concrete syntax: English phrase building FORTHCOMING==
|
|
|
|
|
|
===Predication===
|
|
|
|
|
|
===Complementization===
|
|
|
|
|
|
===Determination===
|
|
|
|
|
|
===Modification===
|
|
|
|
|
|
===Putting the syntax together===
|
|
|
|
|
|
==Concrete syntax for Italian FORTHCOMING==
|
|
|
|
|
|
|
|
|
|
=Inside the resource grammar library FORTHCOMING=
|
|
|
|
This chapter is meant for those who want to understand the GF resource
|
|
grammar library more thoroughly - in particular, for those who
|
|
want to write their own implementations.
|
|
|
|
Before the chapter is finished, more information can be found in
|
|
|
|
[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/Resource-HOWTO.html`` http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/Resource-HOWTO.html]
|
|
|
|
|
|
|
|
|
|
=Building a compiler in GF FORTHCOMING=
|
|
|
|
The purpose of this chapter is to show how the expressive power
|
|
of GF can be used in a complete definition of a language, which
|
|
includes both its syntax and semantics. We will write a grammar
|
|
for a subset of the C programming language, taking care of
|
|
parsing and type checking. As an alternative concrete syntax,
|
|
we will use JVM (Java Virtual Machine), so that we can use
|
|
the grammar to compile C code into runnable JVM code.
|
|
|
|
Before the chapter is finished, more information can be found in
|
|
|
|
[``http://www.cs.chalmers.se/~aarne/GF/doc/gfcc.pdf`` http://www.cs.chalmers.se/~aarne/GF/doc/gfcc.pdf]
|
|
|
|
|
|
|
|
=Using Transfer for semantic actions FORTHCOMING=
|
|
|
|
Semantic actions on syntax trees can be defined in a general purpose language,
|
|
as is done in embedded Java and Haskell applications. But this method has
|
|
two drawbacks:
|
|
- the definitions are not portable from one language to another
|
|
- the host languages do not support the dependent type system of GF
|
|
|
|
|
|
In this chapter, a powerful technique provided by a separate ``transfer`` language
|
|
is introduced, and applied to build logical representations from syntax trees,
|
|
perform anaphora resolution and generation, and optimize text generation by
|
|
aggregation.
|
|
|
|
Before the chapter is ready, more information on ``transfer`` can be found in
|
|
[``http://www.cs.chalmers.se/~aarne/GF/doc/transfer.html`` http://www.cs.chalmers.se/~aarne/GF/doc/transfer.html]
|
|
|
|
Many aspects of logical semantics and how they are implemented by using ``def``
|
|
definitions are covered in the article
|
|
"Computational semantics in type theory" by Aarne Ranta
|
|
(//Mathematics and Social Sciences//, 165, pp. 31-57, 2004),
|
|
available in
|
|
|
|
[``http://msh.revues.org/document2925.html`` http://msh.revues.org/document2925.html]
|
|
|
|
|
|
#PARTthree
|
|
|
|
|
|
=Syntax and semantics of the GF language FORTHCOMING=
|
|
|
|
Before this chapter is written, we refer to Appendix A with a BNF grammar of GF,
|
|
and Appendix B with a quick reference.
|
|
|
|
|
|
|
|
=The resource grammar API FORTHCOMING=
|
|
|
|
Before this chapter is written, we refer to the
|
|
Resource Grammar Synopsis:
|
|
|
|
[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/synopsis.html`` ../../lib/resource-1.0/synopsis.html]
|
|
|
|
|
|
=The low-level GFC format FORTHCOMING=
|
|
|
|
This is the format generated by the GF grammar compiler. The format is
|
|
undergoing a revision, so a reference manual will appear later.
|
|
|
|
|
|
=The GF system FORTHCOMING=
|
|
|
|
==The command language of the GF shell==
|
|
|
|
Before this chapter is written, we refer to online help obtained
|
|
in the GF shell with the command ``help``.
|
|
|
|
|
|
==The multilingual syntax editor==
|
|
|
|
The
|
|
[Editor User Manual http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm]
|
|
describes the use of the editor, which works for any multilingual GF grammar.
|
|
|
|
Here is a snapshot of the editor:
|
|
|
|
%#BCEN
|
|
|
|
%#EDITORPNG
|
|
|
|
%#ECEN
|
|
|
|
|
|
The grammars of the snapshot are from the
|
|
[Letter grammar package http://www.cs.chalmers.se/~aarne/GF/examples/letter].
|
|
|
|
|
|
==Communicating with GF==
|
|
|
|
Other processes can communicate with the GF command interpreter,
|
|
and also with the GF syntax editor. Useful flags when invoking GF are
|
|
- ``-batch`` suppresses the prompts and structures the communication with XML tags.
|
|
- ``-s`` suppresses non-output non-error messages and XML tags.
|
|
- ``-nocpu`` suppresses CPU time indication.
|
|
|
|
|
|
Thus the most silent way to invoke GF is
|
|
```
|
|
gf -batch -s -nocpu
|
|
```
|
|
|
|
|
|
|
|
=Documenting grammars with GFDoc FORTHCOMING=
|
|
|
|
GFDoc is a very simple system generating HTML and LaTeX from GF grammars.
|
|
Some mark-up has been defined to enable annotations of the source code.
|
|
|
|
Before this chapter is written, a summary of the tool is obtained by
|
|
the command
|
|
```
|
|
% gfdoc
|
|
```
|
|
The ``gfdoc`` program is normally installed as part of the GF installation.
|
|
|
|
|
|
|
|
#startappendix
|
|
|
|
#PARTbnf
|
|
|
|
#twocolumn
|
|
|
|
#PARTquickref
|
|
|
|
#smallsize
|
|
|
|
|
|
This is a quick reference on GF grammars. It aims to
|
|
cover all forms of expression available when writing
|
|
grammars. It assumes basic knowledge of GF, which
|
|
can be acquired from the Tutorial part of this book.
|
|
For the commands of the GF system, online help is available in the
GF shell with the ``help`` command. Help on invoking
GF from the operating system shell is obtained with ``gf -help``.
|
|
|
|
|
|
==A complete example==
|
|
|
|
This is a complete example of a GF grammar divided
into three modules, each in its own file. The grammar recognizes the
phrases //one pizza// and //two pizzas//.
|
|
|
|
File ``Order.gf``:
|
|
```
|
|
abstract Order = {
|
|
cat
|
|
Order ;
|
|
Item ;
|
|
fun
|
|
One, Two : Item -> Order ;
|
|
Pizza : Item ;
|
|
}
|
|
```
|
|
File ``OrderEng.gf`` (the top file):
|
|
```
|
|
--# -path=.:prelude
|
|
concrete OrderEng of Order =
|
|
open Res, Prelude in {
|
|
flags startcat=Order ;
|
|
lincat
|
|
Order = SS ;
|
|
Item = {s : Num => Str} ;
|
|
lin
|
|
One it = ss ("one" ++ it.s ! Sg) ;
|
|
Two it = ss ("two" ++ it.s ! Pl) ;
|
|
Pizza = regNoun "pizza" ;
|
|
}
|
|
```
|
|
File ``Res.gf``:
|
|
```
|
|
resource Res = open Prelude in {
|
|
param Num = Sg | Pl ;
|
|
oper regNoun : Str -> {s : Num => Str} =
|
|
\dog -> {s = table {
|
|
Sg => dog ;
|
|
_ => dog + "s"
|
|
}
|
|
} ;
|
|
}
|
|
```
|
|
To use this example, do
|
|
```
|
|
% gf -- in shell: start GF
|
|
> i OrderEng.gf -- in GF: import grammar
|
|
> p "one pizza" -- parse string
|
|
> l Two Pizza -- linearize tree
|
|
```
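
Since the abstract syntax ``Order`` is shared, a second concrete syntax
is enough to make the grammar multilingual. The following is only a sketch:
the module name ``OrderFre`` is hypothetical, the feminine form //une// is
hard-coded (the grammar has no gender parameter), and ``regNoun`` from
``Res`` is reused on the assumption that the plural is formed with //s//.
```
--# -path=.:prelude
-- File OrderFre.gf (hypothetical second concrete syntax)
concrete OrderFre of Order =
  open Res, Prelude in {
  flags startcat=Order ;
  lincat
    Order = SS ;
    Item = {s : Num => Str} ;
  lin
    One it = ss ("une" ++ it.s ! Sg) ;   -- gender is not modelled here
    Two it = ss ("deux" ++ it.s ! Pl) ;
    Pizza = regNoun "pizza" ;
}
```
With both concrete syntaxes imported, translation can be tried by piping
a parse into a linearization. The values of the ``-lang`` flag are assumed
here to be the names of the concrete modules.
```
> i OrderEng.gf                  -- import English
> i OrderFre.gf                  -- import French
> p -lang=OrderEng "two pizzas" | l -lang=OrderFre
```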
|
|
|
|
|
|
|
|
==Modules and files==
|
|
|
|
One module per file.
|
|
File named ``Foo.gf`` contains module named
|
|
``Foo``.
|
|
|
|
Each module has the structure
|
|
```
|
|
moduletypename =
|
|
Inherits ** -- optional
|
|
open Opens in -- optional
|
|
{ Judgements }
|
|
```
|
|
Inherits are names of modules of the same type.
|
|
Inheritance can be restricted:
|
|
```
|
|
Mo[f,g], -- inherit only f,g from Mo
|
|
Lo-[f,g] -- inherits all but f,g from Lo
|
|
```
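
As an illustration of restricted inheritance, the following sketch extends
the ``Order`` module of the complete example in two ways. The module names
``OrderNoTwo`` and ``OrderOnlyOne`` and the function ``Beer`` are
hypothetical. Note that a restricted list has to mention the categories
that the inherited functions need.
```
-- File OrderNoTwo.gf: everything from Order except Two
abstract OrderNoTwo = Order - [Two] ** {
  fun Beer : Item ;
}

-- File OrderOnlyOne.gf: only the listed names from Order
abstract OrderOnlyOne = Order [Order, Item, One, Pizza] ** {
  fun Beer : Item ;
}
```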
|
|
Opens are possible in ``concrete`` and ``resource``.
|
|
They are names of modules of these two types, possibly
|
|
qualified:
|
|
```
|
|
(M = Mo), -- refer to f as M.f or Mo.f
|
|
(Lo = Lo) -- refer to f as Lo.f
|
|
```
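
A qualified open can be sketched with a variant of ``OrderEng`` from the
complete example. The module name ``OrderEngQ`` and the abbreviation ``R``
are chosen for illustration only. Every name coming from ``Res`` is
written with the qualifier.
```
--# -path=.:prelude
-- File OrderEngQ.gf (hypothetical)
concrete OrderEngQ of Order =
  open (R = Res), Prelude in {
  lincat
    Order = SS ;
    Item = {s : R.Num => Str} ;
  lin
    One it = ss ("one" ++ it.s ! R.Sg) ;
    Two it = ss ("two" ++ it.s ! R.Pl) ;
    Pizza = R.regNoun "pizza" ;
}
```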
|
|
Module types and judgements in them:
|
|
```
|
|
abstract A -- cat, fun, def, data
|
|
concrete C of A -- lincat, lin, lindef, printname
|
|
resource R -- param, oper
|
|
|
|
interface I -- like resource, but can have
|
|
oper f : T without definition
|
|
instance J of I -- like resource, defines opers
|
|
that I leaves undefined
|
|
incomplete -- functor: concrete that opens
|
|
concrete CI of A = one or more interfaces
|
|
open I in ...
|
|
concrete CJ of A = -- completion: concrete that
|
|
CI with instantiates a functor by
|
|
(I = J) instances of open interfaces
|
|
```
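
The last four module forms work together. The following sketch, under the
assumption that ``Order`` and ``Res`` are the modules of the complete
example, factors the English words out into an interface. All new module
and oper names (``LexOrder``, ``LexOrderEng``, ``OrderI``, ``OrderEng2``,
``one_Str``, ``two_Str``, ``pizza_N``) are hypothetical. Each module goes
into its own file.
```
-- File LexOrder.gf: opers declared without definitions
interface LexOrder = open Res in {
  oper
    one_Str : Str ;
    two_Str : Str ;
    pizza_N : {s : Num => Str} ;
}

-- File LexOrderEng.gf: definitions of the opers left open
instance LexOrderEng of LexOrder = open Res in {
  oper
    one_Str = "one" ;
    two_Str = "two" ;
    pizza_N = regNoun "pizza" ;
}

-- File OrderI.gf: functor, a concrete that opens an interface
incomplete concrete OrderI of Order =
  open LexOrder, Res, Prelude in {
  lincat
    Order = SS ;
    Item = {s : Num => Str} ;
  lin
    One it = ss (one_Str ++ it.s ! Sg) ;
    Two it = ss (two_Str ++ it.s ! Pl) ;
    Pizza = pizza_N ;
}

-- File OrderEng2.gf: completion of the functor
--# -path=.:prelude
concrete OrderEng2 of Order = OrderI with
  (LexOrder = LexOrderEng) ;
```
Another language then needs only a new instance of ``LexOrder`` and a
new completion, as long as the word order in ``OrderI`` fits.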
|
|
The forms
|
|
``param``, ``oper``
|
|
may appear in ``concrete`` as well, but are then
|
|
not inherited by extensions.
|
|
|
|
All modules can moreover have ``flags`` and comments.
|
|
Comments have the forms
|
|
```
|
|
-- till the end of line
|
|
{- any number of lines between -}
|
|
--# used for compiler pragmas
|
|
```
|
|
A ``concrete`` can be opened like a ``resource``.
|
|
It is translated as follows:
|
|
```
|
|
cat C ---> oper C : Type =
|
|
lincat C = T T ** {lock_C : {}}
|
|
|
|
fun f : G -> C ---> oper f : A* -> C* = \g ->
|
|
lin f = t t g ** {lock_C = <>}
|
|
```
|
|
An ``abstract`` can be opened like an ``interface``.
|
|
Any ``concrete`` of it then works as an ``instance``.
|
|
|
|
|
|
|
|
==Judgements==
|
|
|
|
```
|
|
cat C -- declare category C
|
|
cat C (x:A)(y:B x) -- dependent category C
|
|
cat C A B -- same as C (x : A)(y : B)
|
|
fun f : T -- declare function f of type T
|
|
def f = t -- define f as t
|
|
def f p q = t -- define f by pattern matching
|
|
data C = f | g -- set f,g as constructors of C
|
|
data f : A -> C -- same as
|
|
fun f : A -> C; data C=f
|
|
|
|
lincat C = T -- define lin.type of cat C
|
|
lin f = t -- define lin. of fun f
|
|
lin f x y = t -- same as lin f = \x y -> t
|
|
lindef C = \s -> t -- default lin. of cat C
|
|
printname fun f = s -- printname shown in menus
|
|
printname cat C = s -- printname shown in menus
|
|
printname f = s -- same as printname fun f = s
|
|
|
|
param P = C | D Q R -- define parameter type P
|
|
with constructors
|
|
C : P, D : Q -> R -> P
|
|
oper h : T = t -- define oper h of type T
|
|
oper h = t -- omit type, if inferrable
|
|
|
|
flags p=v -- set value of flag p
|
|
```
|
|
Judgements are terminated by semicolons (``;``).
|
|
Subsequent judgements of the same form may share the
|
|
keyword:
|
|
```
|
|
cat C ; D ; -- same as cat C ; cat D ;
|
|
```
|
|
Judgements can also share RHS:
|
|
```
|
|
fun f,g : A -- same as fun f : A ; g : A
|
|
```
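
The abstract syntax judgements can be seen together in a small module.
This is a sketch; the module name ``Arith`` and all names in it are
chosen for illustration.
```
-- File Arith.gf (hypothetical)
abstract Arith = {
  cat Nat ;
  data
    Zero : Nat ;               -- constructors of Nat
    Succ : Nat -> Nat ;
  fun
    plus : Nat -> Nat -> Nat ; -- a defined function
  def
    plus x Zero = x ;          -- definition by pattern matching
    plus x (Succ y) = Succ (plus x y) ;
}
```
Here ``data`` and ``def`` each share their keyword between two
judgements.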
|
|
|
|
|
|
==Types==
|
|
|
|
Abstract syntax (in ``fun``):
|
|
```
|
|
C -- basic type, if cat C
|
|
C a b -- basic type for dep. category
|
|
(x : A) -> B -- dep. functions from A to B
|
|
(_ : A) -> B -- nondep. functions from A to B
|
|
(p,q : A) -> B -- same as (p : A)-> (q : A) -> B
|
|
A -> B -- same as (_ : A) -> B
|
|
Int -- predefined integer type
|
|
Float -- predefined float type
|
|
String -- predefined string type
|
|
```
|
|
Concrete syntax (in ``lincat``):
|
|
```
|
|
Str -- token lists
|
|
P -- parameter type, if param P
|
|
P => B -- table type, if P param. type
|
|
{s : Str ; p : P}-- record type
|
|
{s,t : Str} -- same as {s : Str ; t : Str}
|
|
{a : A} **{b : B}-- record type extension, same as
|
|
{a : A ; b : B}
|
|
A * B * C -- tuple type, same as
|
|
{p1 : A ; p2 : B ; p3 : C}
|
|
Ints n -- type of n first integers
|
|
```
|
|
Resource (in ``oper``): all those of concrete, plus
|
|
```
|
|
Tok -- tokens (subtype of Str)
|
|
A -> B -- functions from A to B
|
|
Int -- integers
|
|
Strs -- list of prefixes (for pre)
|
|
PType -- parameter type
|
|
Type -- any type
|
|
```
|
|
As parameter types, one can use any finite type:
|
|
``P`` defined in ``param P``,
|
|
``Ints n``, and record types of parameter types.
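
The following sketch shows some of these types in use, including a record
of parameters as the argument type of a table. The module name
``TypesDemo`` and all names in it are hypothetical.
```
-- File TypesDemo.gf (hypothetical)
resource TypesDemo = {
  param
    Number = Sg | Pl ;
    Gender = Masc | Fem ;
  oper
    -- record type with a table field and a parameter field
    Noun : Type = {s : Number => Str ; g : Gender} ;
    -- nested table type
    Adj : Type = {s : Gender => Number => Str} ;
    -- record of parameter types, usable as a parameter type
    Agr : Type = {n : Number ; g : Gender} ;
    AdjAgr : Type = {s : Agr => Str} ;
}
```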
|
|
|
|
|
|
|
|
==Expressions==
|
|
|
|
Syntax trees = full function applications
|
|
```
|
|
f a b -- : C if fun f : A -> B -> C
|
|
1977 -- : Int
|
|
3.14 -- : Float
|
|
"foo" -- : String
|
|
```
|
|
Higher-order abstract syntax (HOAS): functions as arguments:
|
|
```
|
|
F a (\x -> c) -- : C if a : A, c : C (x : B),
|
|
fun F : A -> (B -> C) -> C
|
|
```
|
|
Tokens and token lists
|
|
```
|
|
"hello" -- : Tok, singleton Str
|
|
"hello" ++ "world" -- : Str
|
|
["hello world"] -- : Str, same as "hello" ++ "world"
|
|
"hello" + "world" -- : Tok, computes to "helloworld"
|
|
[] -- : Str, empty list
|
|
```
|
|
Parameters
|
|
```
|
|
Sg -- atomic constructor
|
|
VPres Sg P2 -- applied constructor
|
|
{n = Sg ; p = P3} -- record of parameters
|
|
```
|
|
Tables
|
|
```
|
|
table { -- by full branches
|
|
Sg => "mouse" ;
|
|
Pl => "mice"
|
|
}
|
|
table { -- by pattern matching
|
|
Pl => "mice" ;
|
|
_ => "mouse" -- wildcard pattern
|
|
}
|
|
table {
|
|
n => regn n "cat" -- variable pattern
|
|
}
|
|
table Num {...} -- table given with arg. type
|
|
table ["ox"; "oxen"] -- table as course of values
|
|
\\_ => "fish" -- same as table {_ => "fish"}
|
|
\\p,q => t -- same as \\p => \\q => t
|
|
|
|
t ! p -- select p from table t
|
|
case e of {...} -- same as table {...} ! e
|
|
```
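
Tables, selection, and ``case`` can be combined as in the following
sketch. The module name ``TableDemo`` and the opers in it are hypothetical.
```
-- File TableDemo.gf (hypothetical)
resource TableDemo = {
  param Num = Sg | Pl ;
  oper
    -- table by full branches
    mouse : Num => Str = table {Sg => "mouse" ; Pl => "mice"} ;
    -- one-branch table with a wildcard
    fish : Num => Str = \\_ => "fish" ;
    -- selection inside a case expression
    counted : (Num => Str) -> Num -> Str = \noun,n ->
      case n of {
        Sg => "one" ++ noun ! Sg ;
        Pl => "many" ++ noun ! Pl
      } ;
}
```
Assuming the ``-retain`` flag of ``i`` and the ``cc`` command are
available in your version of the GF shell, the opers can be tested with
```
> i -retain TableDemo.gf
> cc counted mouse Pl
```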
|
|
Records
|
|
```
|
|
{s = "Liz"; g = Fem} -- record in full form
|
|
{s,t = "et"} -- same as {s = "et";t= "et"}
|
|
{s = "Liz"} ** -- record extension: same as
|
|
{g = Fem} {s = "Liz" ; g = Fem}
|
|
|
|
<a,b,c> -- tuple, same as {p1=a;p2=b;p3=c}
|
|
```
|
|
Functions
|
|
```
|
|
\x -> t -- lambda abstract
|
|
\x,y -> t -- same as \x -> \y -> t
|
|
\x,_ -> t -- binding not in t
|
|
```
|
|
Local definitions
|
|
```
|
|
let x : A = d in t -- let definition
|
|
let x = d in t -- let definition, type inferred
|
|
let x=d ; y=e in t -- same as
|
|
let x=d in let y=e in t
|
|
let {...} in t -- same as let ... in t
|
|
|
|
t where {...} -- same as let ... in t
|
|
```
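
A local definition is useful when a computed string is needed in several
places or deserves a name. The following is a sketch with hypothetical
names (``LetDemo``, ``verbForms``).
```
-- File LetDemo.gf (hypothetical)
resource LetDemo = {
  oper
    verbForms : Str -> {inf, pres3 : Str} = \walk ->
      let walks : Str = walk + "s"   -- local definition with type
      in {inf = walk ; pres3 = walks} ;
}
```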
|
|
Free variation
|
|
```
|
|
variants {x ; y} -- both x and y possible
|
|
variants {} -- nothing possible
|
|
```
|
|
Prefix-dependent choices
|
|
```
|
|
pre {"a" ; "an" / v} -- "an" before v, "a" otherw.
|
|
strs {"a" ; "i" ;"o"}-- list of condition prefixes
|
|
```
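
A typical use of ``pre`` is the English indefinite article. The sketch
below uses a rough list of vowel prefixes; the module name
``ArticleDemo`` and the oper ``indef`` are hypothetical.
```
-- File ArticleDemo.gf (hypothetical)
resource ArticleDemo = {
  oper
    -- "an" before a token starting with one of the prefixes, else "a"
    artIndef : Str =
      pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ;
    indef : Str -> Str = \noun -> artIndef ++ noun ;
}
```
The choice between the alternatives is made only when the following
token is known, so ``artIndef`` can be defined without reference to any
particular noun.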
|
|
Typed expression
|
|
```
|
|
<t:T> -- same as t, to help type inference
|
|
```
|
|
Accessing bound variables in ``lin``: use fields ``$1, $2, $3,...``.
|
|
Example:
|
|
```
|
|
fun F : (A : Set) -> (El A -> Prop) -> Prop ;
|
|
lin F A B = {s = ["for all"] ++ A.s ++ B.$1 ++ B.s}
|
|
```
|
|
|
|
|
|
==Pattern matching==
|
|
|
|
These patterns can be used in branches of ``table`` and
|
|
``case`` expressions. Patterns are matched in the order in
|
|
which they appear in the grammar.
|
|
```
|
|
C -- atomic param constructor
|
|
C p q -- param constr. applied to patterns
|
|
x -- variable, matches anything
|
|
_ -- wildcard, matches anything
|
|
"foo" -- string
|
|
56 -- integer
|
|
{s = p ; y = q} -- record, matches extensions too
|
|
<p,q> -- tuple, same as {p1=p ; p2=q}
|
|
p | q -- disjunction, binds to first match
|
|
x@p -- binds x to what p matches
|
|
- p -- negation
|
|
p + "s" -- sequence of two string patterns
|
|
p* -- repetition of a string pattern
|
|
```
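
String patterns make it possible to write small morphological rules
directly. The following sketch of an English plural rule is illustrative
only and far from complete; the names ``PatternDemo`` and ``plural`` are
hypothetical.
```
-- File PatternDemo.gf (hypothetical)
resource PatternDemo = {
  oper
    plural : Str -> Str = \w -> case w of {
      _ + ("s" | "x" | "sh" | "ch") => w + "es" ;       -- bus, box
      _ + ("a" | "e" | "o" | "u") + "y" => w + "s" ;    -- boy
      stem + "y" => stem + "ies" ;                      -- fly
      _ => w + "s"                                      -- car
    } ;
}
```
Since patterns are matched in order, the vowel + //y// branch must come
before the general //y// branch.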
|
|
|
|
==Sample library functions==
|
|
|
|
```
|
|
-- lib/prelude/Predef.gf
|
|
drop : Int -> Tok -> Tok -- drop prefix of length
|
|
take : Int -> Tok -> Tok -- take prefix of length
|
|
tk : Int -> Tok -> Tok -- drop suffix of length
|
|
dp : Int -> Tok -> Tok -- take suffix of length
|
|
occur : Tok -> Tok -> PBool -- test if substring
|
|
occurs : Tok -> Tok -> PBool -- test if any char occurs
|
|
show : (P:Type) -> P ->Tok -- param to string
|
|
read : (P:Type) -> Tok-> P -- string to param
|
|
toStr : (L:Type) -> L ->Str -- find "first" string
|
|
|
|
-- lib/prelude/Prelude.gf
|
|
param Bool = True | False
|
|
oper
|
|
SS : Type -- the type {s : Str}
|
|
ss : Str -> SS -- construct SS
|
|
cc2 : (_,_ : SS) -> SS -- concat SS's
|
|
optStr : Str -> Str -- string or empty
|
|
strOpt : Str -> Str -- empty or string
|
|
bothWays : Str -> Str -> Str -- X++Y or Y++X
|
|
init : Tok -> Tok -- all but last char
|
|
last : Tok -> Tok -- last char
|
|
prefixSS : Str -> SS -> SS
|
|
postfixSS : Str -> SS -> SS
|
|
infixSS : Str -> SS -> SS -> SS
|
|
if_then_else : (A : Type) -> Bool -> A -> A -> A
|
|
if_then_Str : Bool -> Str -> Str -> Str
|
|
```
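
A few of these functions in use, as a sketch. The module name
``LibDemo`` and the opers defined in it are hypothetical. ``Predef``
functions are written with the module qualifier.
```
--# -path=.:prelude
-- File LibDemo.gf (hypothetical)
resource LibDemo = open Prelude in {
  oper
    helloTo : SS -> SS = prefixSS "hello" ;  -- partial application
    greeting : SS = helloTo (ss "world") ;
    maybePlease : Str = optStr "please" ;    -- "please" or empty
    -- drop the last character of "fly", then add "ies"
    flies : Str = Predef.tk 1 "fly" + "ies" ;
}
```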
|
|
|
|
|
|
==Flags==
|
|
|
|
Flags can appear, with growing priority,
|
|
- in files, in ``flags`` judgements, without a dash (``-``)
- as flags to ``gf`` when invoked, with a dash
- as flags to various GF commands, with a dash
|
|
|
|
|
|
Some common flags used in grammars:
|
|
```
|
|
startcat=cat use this category as default
|
|
|
|
lexer=literals int and string literals recognized
|
|
lexer=code like program code
|
|
lexer=text like text: spacing, capitals
|
|
lexer=textlit text, unknowns as string lits
|
|
|
|
unlexer=code like program code
|
|
unlexer=codelit code, remove string lit quotes
|
|
unlexer=text like text: punctuation, capitals
|
|
unlexer=textlit text, remove string lit quotes
|
|
unlexer=concat remove all spaces
|
|
unlexer=bind remove spaces around "&+"
|
|
|
|
optimize=all_subs best for almost any concrete
|
|
optimize=values good for lexicon concrete
|
|
optimize=all usually good for resource
|
|
optimize=noexpand for resource, if =all too big
|
|
```
|
|
For the full set of values for ``FLAG``,
|
|
use the online help command ``h -FLAG``.
|
|
|
|
|
|
|
|
==File paths==
|
|
|
|
Colon-separated lists of directories searched in the
|
|
given order:
|
|
```
|
|
--# -path=.:../abstract:../common:prelude
|
|
```
|
|
The path can be given, in order of growing priority, as the
first line of the top file, as a flag to ``gf``
when invoked, or as a flag to the ``i`` command.
|
|
The prefix ``--#`` is used only in files.
|
|
|
|
If the environment variable ``GF_LIB_PATH`` is defined, its
|
|
value is automatically prefixed to each directory to
|
|
extend the original search path.
|
|
|
|
|
|
==Alternative grammar formats==
|
|
|
|
**Old GF** (before GF 2.0):
|
|
all forms of judgement can appear in any kind of module, and
division into files uses ``include``s.
|
|
A file ``Foo.gf`` is recognized as the old format
|
|
if it lacks a module header.
|
|
|
|
**Context-free** (file ``foo.cf``). The form of rules is e.g.
|
|
```
|
|
Fun. S ::= NP "is" AP ;
|
|
```
|
|
If ``Fun`` is omitted, it is generated automatically.
|
|
Rules must be one per line. The RHS can be empty.
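
In terms of ordinary GF modules, the rule above corresponds roughly to
one ``fun`` judgement and one ``lin`` judgement, with every category
linearized as a record with a single string. The following is a sketch of
that correspondence, not the literal output of the compiler; the module
names ``Cf`` and ``CfCnc`` are hypothetical.
```
-- File Cf.gf (hypothetical)
abstract Cf = {
  cat S ; NP ; AP ;
  fun Fun : NP -> AP -> S ;
}

-- File CfCnc.gf (hypothetical)
concrete CfCnc of Cf = {
  lincat
    S = {s : Str} ;
    NP = {s : Str} ;
    AP = {s : Str} ;
  lin
    Fun np ap = {s = np.s ++ "is" ++ ap.s} ;
}
```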
|
|
|
|
**Extended BNF** (file ``foo.ebnf``). The form of rules is e.g.
|
|
```
|
|
S ::= (NP+ ("is" | "was") AP | V NP*) ;
|
|
```
|
|
where the RHS is a regular expression of categories
|
|
and quoted tokens: ``"foo", CAT, T U, T|U, T*, T+, T?``, or empty.
|
|
Rule labels are generated automatically.
|
|
|
|
|
|
**Probabilistic grammars** (not a separate format).
|
|
You can set the probability of a function ``f`` (in its value category) by
|
|
```
|
|
--# prob f 0.009
|
|
```
|
|
These are put into a file given to GF using the ``probs=File`` flag
|
|
on the command line. This file can be the grammar file itself.
|
|
|
|
**Example-based grammars** (file ``foo.gfe``). Expressions of the form
|
|
```
|
|
in Cat "example string"
|
|
```
|
|
are preprocessed by using a parser given by the flag
|
|
```
|
|
--# -resource=File
|
|
```
|
|
and the result is written to ``foo.gf``.
|
|
|
|
|