chaps 1-4

aarne
2007-08-30 13:37:15 +00:00
parent 890b00658c
commit 36c3c2369c
2 changed files with 246 additions and 152 deletions


@@ -0,0 +1,4 @@
\(
(\forall p : \mbox{Pt})(\forall l : \mbox{Ln})(\mbox{Ext}(p,l) \; \supset \;
(\exists m : \mbox{Ln})(\mbox{Inc}(p,m) \& \mbox{Par}(m,l)))
\)


@@ -39,6 +39,7 @@ Draft %%date(%c)
%!postproc(tex): #startappendix "appendix"
%!postproc(tex): #FORMULAone "input{FORMULAone}"
#LOGOPNG
@@ -97,10 +98,21 @@ language designed for expressing linguistic rules. A set of such rules is called
a **grammar**. GF is designed to make it easy to write grammar rules; this is
much easier than in a general-purpose programming language such as
Java or C or Haskell. But it is also in many ways easier and more productive
than in other languages specialized in grammars. The most well-known of these
is the **BNF notation** (Backus-Naur Form) for
**context-free grammars**, which is used in compiler tools such as YACC.
While BNF is an excellent way to specify the grammar of a programming language,
it does not scale up to the complexities of natural languages.
Linguists have of course developed many formalisms that are designed for
describing natural languages. In comparison with them, one advantage of GF is
its support for **multilingual grammars**. In a multilingual grammar, a
semantic representation can be shared between several languages, in such a
way that a grammar written for one language can be easily ported to another
one. The grammar also supports translation between the languages it includes.
The most comprehensive multilingual grammars written in GF cover almost 100
languages.
GF not only enables the writing of grammars; it is also equipped with tools
for integrating grammars into language-processing systems.
Building a language application usually involves much more than just a grammar,
@@ -139,26 +151,43 @@ of functional programming is **type theory**, which in turn has its roots in
logic and the foundations of mathematics. GF was, in the first place, created to
implement the idea that type theory can provide **semantics**, i.e. formalize
the meaning of natural languages. Several aspects of type-theoretical semantics
were covered in the monograph //Type-Theoretical Grammar// (Ranta 1994).
But a more important idea grew out of subsequent experiments dealing with different
languages: it is possible to have a common semantics for many languages, and
thereby build systems that translate between languages via the semantics.
The first implementation of this idea was written as a plug-in to the
proof editor Alfa (Magnusson & Nordström 1994) in 1995. It supported the
generation of sentences in six languages from mathematical formulas that were
manipulated in the proof editor. One example area was geometry:
- Formula:
#FORMULAone
- English:
//If a point p lies outside a line l, then there is a line m such that p lies on m and m is parallel to l.//
- Finnish:
//Jos piste p on suoran l ulkopuolella, niin on olemassa suora m sellainen että p on suoralla m ja m on yhdensuuntainen l:n kanssa.//
- French:
//Si un point p est extérieur à une ligne l, alors il existe une ligne m telle que p soit sur la ligne m et que m soit parallèle à l.//
- German:
//Wenn ein Punkt p außerhalb einer Geraden l liegt, dann gibt es so eine Gerade m daß p auf der Geraden m liegt und m parallel zu l ist.//
- Italian:
//Si un punto p è esteriore a una linea l, allora esiste una line m tale che p sia sulla linea m e che m sia parallela a l.//
- Swedish:
//Om en punkt p ligger utanför linje l, så finns det en linje m sådan att p ligger på m och m är parallel med l.//
As a stand-alone programming language, GF was first implemented in 1998. This
took place at Xerox Research Centre Europe in Grenoble, within a project entitled
//Multilingual Document Authoring//. The goal of the project was to build a tool
for writing documents in multiple languages simultaneously, so that the user
need only know one of the languages; the rest are produced automatically
via translations from the type-theoretical semantics (Dymetman et al. 2000).
In addition to GF itself, the project produced some prototype applications,
e.g. a restaurant phrase book and an editor for medical drug descriptions.
An important aspect was the adaptability of the system to new domains and
languages; hence the need for a language where such adaptations can be made
by just writing new grammars.
Most grammars that were built in the Xerox project
remained the property of Xerox Corporation, but the GF formalism and its
implementation were released as open-source software under the GNU General
Public License. From 1999, the development of GF continued mostly at
@@ -167,9 +196,9 @@ and Gothenburg University. In this environment, both functional programming
and type theory are strong research areas. This helped GF develop into
a more stable and more full-fledged programming language.
At Chalmers, GF was soon used in courses given to computer science
students and in joint projects with non-linguist research groups.
This activity was soon summarized in the idea of making GF into
"the working programmer's grammar formalism", as
opposed to a tool requiring linguistic expertise. A nice experience from
the courses (both graduate and undergraduate) was that computer scientists are
@@ -180,10 +209,11 @@ it was developed in the way programming languages are,
following the virtues of familiarity and "the least surprise".
Issues of stability are also important, including
backward compatibility and portability to different platforms.
As a mark of stability, version 1.0 of GF was released in
2002. Documentation was also found important; there can hardly be too much of it.
In addition to on-line documents, a reference article appeared in the
//Journal of Functional Programming// in 2004,
and a long tutorial text was published in the post-publication of the
ESSLLI lecture notes.
@@ -200,7 +230,7 @@ parser implementations.
At the same time as GF was used in joint projects with computer science groups,
collaboration with the Linguistics Department of
Gothenburg University served as a "linguistic sanity check" of GF. Two efforts
that have been formative for the development of GF were started within this
collaboration:
- resource grammar libraries
@@ -226,13 +256,13 @@ Besides dialogue systems, multilingual authoring and translation continued
to be the main application of GF. The European WebALT project (Web Advanced
Learning Technologies, 2005-2006) used GF to build a tool for translating
mathematical exercises from formal specifications (written in MathML) to
six languages. A tool integrating GF with a computer algebra system was also
developed. The project gave rise to a company, WebALT Inc.
At the time of writing (August 2007), the current release of GF is version
2.8. It is a stable system that has been built with contributions
from dozens of people and has been used by at least hundreds; download figures
are in the thousands. New ideas on how to use GF are posted by users almost
every week.
@@ -244,15 +274,16 @@ a definitive manual that gathers all relevant information in one place.
However, it is also intended to serve those who want to get started with GF, and
who don't necessarily have the technical background of the typical
users. We believe that learning to program in GF is no more difficult
than learning any other programming language. As for learning the linguistic
aspects, our experience is that writing grammars is an excellent introduction
to the problems of linguistics. In this way, linguistic
theory can be learnt at the same time as it is motivated by concrete problems.
The book thus starts with a Tutorial (Part I), which gradually explains all
the constructs of the GF programming language. The design and style
aspects of grammar engineering are also covered, to help the user scale
up from small to large and possibly collaborative applications. Linguistic
concepts are explained at the same time as they are introduced in grammars.
After the Tutorial, the book continues with a manual on building applications
that have embedded grammars as components (Part II). Part III goes through some
@@ -262,7 +293,7 @@ Part IV is a complete reference manual, and the two Appendices
show a grammar of the GF language and a quick reference card of GF.
Theoretical discussions of GF, especially in comparison to other grammar
formalisms, are not given much space in the book. Even though important
in the development of GF as a scientifically justified framework, such
discussions are not relevant for programmers who just want to use GF - any more
than, say, a book on Haskell has to include comparisons with Java. In fact,
@@ -283,8 +314,8 @@ We start in Chapter 3
by building a "Hello World" grammar, which covers greetings
in three languages: English (//hello world//),
Finnish (//terve maailma//), and Italian (//ciao mondo//).
This **multilingual grammar** is based on the most central idea of GF:
the distinction between **abstract syntax**
(the logical structure) and **concrete syntax** (the
sequence of words).
@@ -310,11 +341,13 @@ has just one:
vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose
```
The **morphology** of a language describes the
forms of its words, and its basics are explained in Chapter 5.
The complete description of morphology
belongs to resource grammars, whose use is covered in Chapter 6.
Writing resource grammars will only be covered in Part III;
however, the Tutorial does explain all the
programming concepts involved in resource grammars.
In addition to multilinguality, **semantics** is an important aspect of GF
@@ -324,14 +357,17 @@ syntax. After the presentation of concrete syntax constructs, we proceed
in Chapter 7 to the enrichment of abstract syntax with **dependent types**,
**variable bindings**, and **semantic definitions**.
English and Italian are used as example languages in many grammars.
Of course, we will not presuppose that the reader knows any Italian.
We have chosen Italian because it has a rich structure
that illustrates very well the capacities of GF.
Moreover, even those readers who don't know Italian will find many of
its words familiar, due to their Latin heritage.
The exercises will encourage the reader to
port the examples to other languages as well; in particular,
it should be instructive for the reader to look at her
own native language from the point of view of writing a grammar
implementation.
To learn how to write GF grammars is not the only goal of
this tutorial. We will also explain the most important
@@ -354,7 +390,7 @@ This knowledge will be introduced as a part of grammar writing
practice.
Thus the tutorial should be accessible to anyone who has some
previous experience with any programming language; the basics
of using computers are also presupposed, e.g. the use of
text editors and the management of files.
@@ -368,7 +404,7 @@ and/or programming in other languages in which GF grammars are embedded.
=Getting started=
In this chapter, we will introduce the GF system and write the first GF grammar,
a "Hello World" grammar. While extremely small, this grammar already illustrates
how GF can be used for the tasks of translation and multilingual
generation.
@@ -385,7 +421,7 @@ We use the term GF for three different things:
The relation between these things is obvious: the GF system is an implementation
of the GF programming language, which in turn is built on the ideas of the
GF theory. The main focus of this book is on the GF programming language.
We learn how grammars are written in this language. At the same time, we learn
the way of thinking in the GF theory. To make this all useful and fun, and
to encourage experimenting, we make the grammars run on a computer by
using the GF system.
@@ -410,7 +446,7 @@ other tasks are readily available for GF grammars:
- **morphological synthesis**: generate all inflection forms of words
- **random generation**: generate random expressions
- **corpus generation**: generate all expressions
- **treebank generation**: generate a list of trees with their linearizations
- **teaching quizzes**: train morphology and translation
- **multilingual authoring**: create a document in many languages simultaneously
- **speech input**: optimize a speech recognition system for your grammar
@@ -456,12 +492,12 @@ is given in the libraries.
%--!
==Getting the GF system==
The GF system is open-source free software, which can be downloaded via the
GF Homepage:
[``http://www.digitalgrammars.com/gf`` http://www.digitalgrammars.com/gf]
There you can download
- binaries for Linux, Mac OS X, and Windows
@@ -471,7 +507,7 @@ There you can download
If you want to compile GF from source, you need a Haskell compiler.
To compile the interactive editor, you also need a Java compiler.
But normally you don't have to compile anything yourself, and you definitely
don't need to know Haskell or Java to use GF.
We assume the availability of a Unix shell. Linux and Mac OS X users
@@ -480,9 +516,9 @@ Windows users are recommended to install Cygwin, the free Unix shell for Windows
%--!
==Running the GF system==
To start the GF system, assuming you have installed it, just type
``gf`` in the Unix (or Cygwin) shell:
```
% gf
@@ -494,7 +530,7 @@ The command
```
will give you a list of available commands.
As a common convention in this book, we will use
- ``%`` as a prompt that marks system commands
- ``>`` as a prompt that marks GF commands
@@ -547,8 +583,9 @@ The code has the following parts:
module named ``Hello``
- a **module body** in braces, consisting of
- a **startcat flag declaration** stating that ``Greeting`` is the
main category, i.e. the one in which parsing and generation are
performed by default
- **category declarations** stating that ``Greeting`` and ``Recipient``
are categories, i.e. types of meanings
- **function declarations** stating what meaning-building functions there
are; these are the three possible recipients, as well as the function
@@ -578,7 +615,8 @@ The major parts of this code are:
- **linearization definitions** telling what records are assigned to
each of the meanings defined in the abstract syntax; the recipients are
linearized to records containing single words, whereas the ``Hello`` greeting
has a function telling that the word ``hello`` is prefixed to the string
contained in the argument record
@@ -609,16 +647,16 @@ many other tasks, which we will now look into.
===Using the grammar in the GF system===
In order to compile the grammar in GF, each of the four modules
has to be put in a file named //Modulename//``.gf``:
```
Hello.gf HelloEng.gf HelloFin.gf HelloIta.gf
```
The first GF command needed when using a grammar is to **import** it.
The command has a long name, ``import``, and a short name, ``i``.
You can thus type either
```
> import HelloEng.gf
```
@@ -627,8 +665,8 @@ or
> i HelloEng.gf
```
to get the same effect. In general, all GF commands have a long and a short name;
short names are convenient when typing commands by hand, whereas long command
names are more readable in scripts, i.e. files that include sequences of commands.
The effect of ``import`` is that the GF program **compiles** your grammar
into an internal representation, and shows a new prompt when it is ready.
@@ -688,7 +726,7 @@ into linearizing with ``HelloIta``:
ciao mamma
```
Notice that the commands must use a **language flag** to indicate
which concrete syntax is used in each operation.
To conclude the translation exercise, we import the Finnish grammar
and pipe English parsing into **multilingual generation**:
==Using grammars from outside GF==
A "hello world" program written e.g. in C should be executable from the
Unix shell and print its output on the terminal. This is possible in GF
as well, by using the ``gf`` program in a Unix pipe. ``gf`` can be
invoked with grammar file names as arguments,
```
% gf HelloEng.gf HelloFin.gf HelloIta.gf
```
which has the same effect as opening ``gf`` and then importing the
grammars. A command can be sent to this ``gf`` session by piping it from
the Unix ``echo`` command:
```
% echo "l -multi Hello World" | gf HelloEng.gf HelloFin.gf HelloIta.gf
```
which will execute the command and then quit. Alternatively, one can write
the commands in a **script**:
```
import HelloEng.gf
import HelloFin.gf
import HelloIta.gf
linearize -multi Hello World
```
If we name this script ``hello.gfs``, we can run it as follows:
```
% gf -batch -s <hello.gfs
ciao mondo
terve maailma
hello world
```
The options ``-batch`` and ``-s`` ("silent") suppress GF's prompts, CPU time
reports, and other messages.
Writing GF scripts and Unix shell scripts that call GF is the simplest
way to build application programs that use GF grammars.
**Exercise**. (For Unix hackers.) Write a GF application that reads
an English string from the standard input and writes an Italian
translation to the output.
==What else can be done with the grammar==
Now we have built our first multilingual grammar and seen the basic
functionalities of GF: parsing and linearization. We have tested
these functionalities inside the GF program. In the forthcoming
chapters, we will build larger grammars and have more fun with
these functionalities. But we will also introduce many of the other
tasks listed above in Section 3.2: random generation, exhaustive generation,
treebank generation, syntax editing, morphological analysis, and
translation and morphological quizzes.
The usefulness of GF would be quite limited if grammars were
@@ -803,6 +886,9 @@ and of the right-hand-side in subsequent judgements of the same form
fun World, Mum, Friends : Recipient ; ===
fun World : Recipient ; Mum : Recipient ; Friends : Recipient ;
```
We use the symbol ``===`` to indicate **syntactic sugar** when
speaking about GF. Thus it is not a symbol of the GF language.
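The same convention applies to the other kinds of judgements. As a sketch, assuming the ``Hello`` categories and a ``{s : Str}`` linearization type for both of them, each of the following pairs are equivalent; like the ``fun`` example, the lines use ``===`` and are therefore not a GF module as such:
```
cat Greeting ; Recipient ; ===
cat Greeting ; cat Recipient ;

lincat Greeting, Recipient = {s : Str} ; ===
lincat Greeting = {s : Str} ; Recipient = {s : Str} ;
```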
The order of judgements in a module is free. In particular, an identifier
need not be declared before it is used.
@@ -828,6 +914,7 @@ In a concrete syntax, the available types include
**Terms** used in linearizations have the forms
- quoted string: ``"foo"``, of type ``Str``
- concatenation of strings: ``"foo" ++ "bar"``, of type ``Str``
- record: ``{`` r1 = t1 ; ... ; rn = tn ``}``,
of type ``{`` r1 : R1 ; ... ; rn : Rn ``}``
- projection ``t.r`` with a record label, of the corresponding record
@@ -836,6 +923,15 @@ In a concrete syntax, the available types include
of the corresponding linearization type
Each semicolon-separated part in record types and records is called a
**field**. The identifier introduced by the left-hand side of a field
is called a **label**.
Each quoted string is treated as one **token**, and strings concatenated by
``++`` are treated as separate tokens. Tokens are, by default, written with
a space in between. This behaviour can be changed by ``lexer`` and ``unlexer``
flags, as will be explained later in Section ??.
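To see these term forms and the token rule together, consider the following sketch of an English concrete syntax for the ``Hello`` abstract syntax. The module name ``HelloEngTerms`` and the two-word recipient //my friends// are made up for this illustration, and ``{s : Str}`` is assumed as the linearization type; to be compiled, the module would go in a file named ``HelloEngTerms.gf``.
```
-- Hypothetical module, written only to illustrate the term forms.
concrete HelloEngTerms of Hello = {
  -- record type with one field, whose label is s
  lincat Greeting, Recipient = {s : Str} ;
  lin
    -- a record whose field is the concatenation of a quoted string
    -- and the projection recip.s of the argument record
    Hello recip = {s = "hello" ++ recip.s} ;
    -- a single quoted string: one token
    World = {s = "world"} ;
    Mum = {s = "mum"} ;
    -- two quoted strings joined by ++: two tokens
    Friends = {s = "my" ++ "friends"} ;
}
```
With this module, ``Hello Friends`` would be linearized into the three tokens //hello//, //my// and //friends//, written with spaces in between. Writing ``"my friends"`` as one quoted string would instead give a single token that happens to contain a space.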
@@ -847,8 +943,8 @@ the ``Hello`` grammar. We will look at how the abstract syntax
is divided into suitable categories, and how infinitely many
phrases can be built by using recursive rules. We will also
introduce **modularity** by showing how a large grammar can be
divided into modules, and how functions defined in **resource modules**
can be used to share code within and among modules.
==The abstract syntax Food==
@@ -856,12 +952,14 @@ can be used for avoiding repeated code.
We will write a grammar that
defines a set of phrases usable for speaking about food:
- the main category is ``Phrase``
- a ``Phrase`` can be built by assigning a ``Quality`` to an ``Item``
(e.g. //this cheese is Italian//)
- an ``Item`` is built from a ``Kind`` by prefixing "this" or "that"
(e.g. //this wine//)
- a ``Kind`` is either **atomic** (e.g. //cheese//), or formed by
modifying a given ``Kind`` with a ``Quality`` (e.g. //Italian cheese//)
- a ``Quality`` is either atomic (e.g. //Italian//),
or built by modifying a given ``Quality`` with "very" (e.g. //very warm//)
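As an informal check of the list above, the categories and constructors can be modelled in Python. This is a sketch, not GF syntax. The constructor names ``Is``, ``This``, ``QKind``, ``Cheese``, ``Fish`` and ``Boring`` appear elsewhere in this chapter; ``That``, ``Wine``, ``Italian`` and ``Very`` are assumed names. The function ``category`` accepts exactly the trees that the description allows. Because ``QKind`` and ``Very`` are recursive, the set of trees is infinite.

```
# Hypothetical Python model of the Food abstract syntax (not GF code).
# A tree is a tuple (constructor, *subtrees).
# FUNS maps each constructor to (argument categories, value category).
FUNS = {
    "Is":      (("Item", "Quality"), "Phrase"),
    "This":    (("Kind",), "Item"),
    "That":    (("Kind",), "Item"),
    "QKind":   (("Quality", "Kind"), "Kind"),
    "Cheese":  ((), "Kind"),
    "Wine":    ((), "Kind"),
    "Fish":    ((), "Kind"),
    "Very":    (("Quality",), "Quality"),
    "Italian": ((), "Quality"),
    "Boring":  ((), "Quality"),
}

def category(tree):
    """Return the category of a well-formed tree; raise TypeError otherwise."""
    fun, *args = tree
    arg_cats, value_cat = FUNS[fun]
    if [category(a) for a in args] != list(arg_cats):
        raise TypeError(f"ill-formed tree: {tree!r}")
    return value_cat

# "this Italian cheese is very boring"
t = ("Is", ("This", ("QKind", ("Italian",), ("Cheese",))), ("Very", ("Boring",)))
print(category(t))  # Phrase
```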
These verbal descriptions can be expressed as the following abstract syntax:
@@ -963,7 +1061,7 @@ builds a random tree in accordance with an abstract syntax:
```
By using a pipe, random generation can be fed into linearization:
```
> generate_random | linearize
this Italian fish is fresh
```
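To see what the pipe does, here is a rough Python model of ``generate_random | linearize`` for the English ``Food`` grammar. It rests on several assumptions of our own:
- trees are tuples and each linearization rule is a Python function;
- the depth bound is only a device to guarantee termination;
- the constructor names ``That``, ``Wine``, ``Very``, ``Italian`` and ``Fresh`` are assumed.

GF's actual generation strategy may differ.

```
import random

# Hypothetical model of "generate_random | linearize" (not GF code).
# FUNS: constructor -> (argument categories, value category)
# LIN:  constructor -> function from argument strings to a string
FUNS = {
    "Is": (("Item", "Quality"), "Phrase"),
    "This": (("Kind",), "Item"), "That": (("Kind",), "Item"),
    "QKind": (("Quality", "Kind"), "Kind"),
    "Cheese": ((), "Kind"), "Wine": ((), "Kind"), "Fish": ((), "Kind"),
    "Very": (("Quality",), "Quality"),
    "Italian": ((), "Quality"), "Fresh": ((), "Quality"), "Boring": ((), "Quality"),
}
LIN = {
    "Is": lambda item, quality: f"{item} is {quality}",
    "This": lambda kind: f"this {kind}",
    "That": lambda kind: f"that {kind}",
    "QKind": lambda quality, kind: f"{quality} {kind}",
    "Cheese": lambda: "cheese", "Wine": lambda: "wine", "Fish": lambda: "fish",
    "Very": lambda quality: f"very {quality}",
    "Italian": lambda: "Italian", "Fresh": lambda: "fresh", "Boring": lambda: "boring",
}

def generate_random(cat, rng, depth=4):
    """Build a random tree of category cat.

    Recursive constructors (those with an argument of their own category)
    are only chosen while depth > 0, so generation always terminates.
    """
    options = [f for f, (args, value) in FUNS.items()
               if value == cat and (depth > 0 or cat not in args)]
    fun = rng.choice(options)
    arg_cats, _ = FUNS[fun]
    return (fun, *(generate_random(c, rng, depth - 1) for c in arg_cats))

def linearize(tree):
    fun, *args = tree
    return LIN[fun](*(linearize(a) for a in args))

rng = random.Random(2007)       # fixed seed for reproducible output
print(linearize(generate_random("Phrase", rng)))  # one random Food phrase
```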
Random generation is a good way to test a grammar; it can also
@@ -1041,7 +1139,7 @@ want to see:
this cheese is boring
Is (This Cheese) Boring
```
This facility is useful for test purposes: for instance, you
may want to see if a grammar is **ambiguous**, i.e.
contains strings that can be parsed in more than one way.
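One simple way to look for ambiguity is exhaustive generation, sketched here in Python rather than with GF commands. The procedure has three steps:
- enumerate all trees up to a given height;
- linearize each tree;
- report the strings that are produced by more than one tree.

The grammar fragment below is a hypothetical reduction of ``Food``. The extra atomic kind ``ItalianCheese`` is invented only to create an ambiguity on purpose.

```
from collections import defaultdict
from itertools import product

# Hypothetical sketch: constructor -> (argument cats, value cat, linearization)
FOOD = {
    "Is": (("Item", "Quality"), "Phrase", lambda i, q: f"{i} is {q}"),
    "This": (("Kind",), "Item", lambda k: f"this {k}"),
    "QKind": (("Quality", "Kind"), "Kind", lambda q, k: f"{q} {k}"),
    "Cheese": ((), "Kind", lambda: "cheese"),
    "Wine": ((), "Kind", lambda: "wine"),
    "Very": (("Quality",), "Quality", lambda q: f"very {q}"),
    "Italian": ((), "Quality", lambda: "Italian"),
}

def trees(grammar, cat, height):
    """All trees of category cat whose height is at most `height`."""
    if height == 0:
        return []
    result = []
    for fun, (arg_cats, value, _) in grammar.items():
        if value == cat:
            subtrees = [trees(grammar, c, height - 1) for c in arg_cats]
            result += [(fun, *args) for args in product(*subtrees)]
    return result

def linearize(grammar, tree):
    fun, *args = tree
    return grammar[fun][2](*(linearize(grammar, a) for a in args))

def ambiguities(grammar, cat, height):
    """Map every string that has more than one tree to the list of its trees."""
    by_string = defaultdict(list)
    for t in trees(grammar, cat, height):
        by_string[linearize(grammar, t)].append(t)
    return {s: ts for s, ts in by_string.items() if len(ts) > 1}

print(ambiguities(FOOD, "Phrase", 4))  # {}

# Hypothetical extension: an atomic Kind that reads like QKind Italian Cheese
AMBIGUOUS = dict(FOOD, ItalianCheese=((), "Kind", lambda: "Italian cheese"))
for s in sorted(ambiguities(AMBIGUOUS, "Phrase", 4)):
    print(s)  # each of these strings has two trees
```

A search bounded by height can only reveal ambiguities up to that height. Finding none does not prove that the grammar is unambiguous.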
@@ -1074,43 +1172,6 @@ of grammars.
==An Italian concrete syntax==
@@ -1140,7 +1201,7 @@ English words with their usual dictionary equivalents:
}
```
An alert reader, or one who already knows Italian, may notice one point in
which the change is more radical than a mere replacement of words: the order of
a quality and the kind it modifies in
```
QKind quality kind = {s = kind.s ++ quality.s} ;
@@ -1148,13 +1209,14 @@ a quality and the kind it modifies in
Thus Italian says ``vino italiano`` for ``Italian wine``.
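The permutation becomes visible when one tree is linearized with two concrete syntaxes. In the Python sketch below (not GF code), each concrete syntax is a table of functions. Only //vino italiano// comes from the text. The other Italian words are assumed dictionary equivalents, and agreement is ignored.

```
# Hypothetical sketch: one abstract tree, two concrete syntaxes.
ENG = {
    "Is": lambda item, quality: f"{item} is {quality}",
    "This": lambda kind: f"this {kind}",
    "QKind": lambda quality, kind: f"{quality} {kind}",
    "Very": lambda quality: f"very {quality}",
    "Wine": lambda: "wine",
    "Italian": lambda: "Italian",
    "Boring": lambda: "boring",
}
ITA = {
    "Is": lambda item, quality: f"{item} è {quality}",
    "This": lambda kind: f"questo {kind}",
    "QKind": lambda quality, kind: f"{kind} {quality}",  # order permuted
    "Very": lambda quality: f"molto {quality}",
    "Wine": lambda: "vino",
    "Italian": lambda: "italiano",
    "Boring": lambda: "noioso",
}

def linearize(concrete, tree):
    fun, *args = tree
    return concrete[fun](*(linearize(concrete, a) for a in args))

tree = ("Is", ("This", ("QKind", ("Italian",), ("Wine",))), ("Very", ("Boring",)))
print(linearize(ENG, tree))  # this Italian wine is very boring
print(linearize(ITA, tree))  # questo vino italiano è molto noioso
```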
**Exercise**. Write a concrete syntax of ``Food`` for some other language.
You will probably end up with grammatically incorrect linearizations, but don't
worry about this yet.
**Exercise**. If you have written ``Food`` for German, Swedish, or some
other language, test with random or exhaustive generation which constructs
come out incorrect, and prepare a list of those that cannot be fixed
with the currently available fragment of GF. You can return to your list
after having worked through Chapter 5.
@@ -1267,16 +1329,16 @@ current editing session:
Editing can be continued even when the tree is finished. The user can shift
the **focus** to any subtree by clicking on it or on the corresponding
part of a linearization. In the picture, the focus is on "fish".
Since there are no metavariables,
the menu shows no refinements, but some other possible actions:
- to **change** "fish" to "cheese" or "wine"
- to **delete** "fish", i.e. change it to a metavariable
- to **wrap** "fish" in a qualification, i.e. change it to
``QKind ? Fish``, where the quality can be given in a later refinement
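The three editing actions can be described as operations on trees with metavariables. The following Python sketch is a hypothetical model, not the editor's implementation. It makes these assumptions:
- a tree is a tuple;
- the metavariable is the string ``"?"``;
- the focus is a path of argument positions;
- ``Fresh`` is a constructor name.

```
# Hypothetical model of focus-based editing (not the GF editor's code).
META = "?"

def subtree(tree, path):
    """Follow a path of 0-based argument positions from the root."""
    for i in path:
        tree = tree[i + 1]          # tree[0] is the constructor name
    return tree

def replace_at(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    parts = list(tree)
    parts[i + 1] = replace_at(tree[i + 1], path[1:], new)
    return tuple(parts)

def change(tree, path, atom):
    """Change the focused subtree to another atomic constructor."""
    return replace_at(tree, path, (atom,))

def delete(tree, path):
    """Replace the focused subtree by a metavariable."""
    return replace_at(tree, path, META)

def wrap_in_qkind(tree, path):
    """Wrap the focused Kind as QKind ? kind, leaving the quality open."""
    return replace_at(tree, path, ("QKind", META, subtree(tree, path)))

def metavariables(tree, path=()):
    """Paths of all metavariables; an empty list means the tree is complete."""
    if tree == META:
        return [path]
    return [p for i, child in enumerate(tree[1:])
            for p in metavariables(child, path + (i,))]

tree = ("Is", ("This", ("Fish",)), ("Fresh",))   # "this fish is fresh"
focus = (0, 0)                                    # the Kind "fish"
print(metavariables(tree))                 # []
print(change(tree, focus, "Cheese"))       # ('Is', ('This', ('Cheese',)), ('Fresh',))
print(wrap_in_qkind(tree, focus))          # ('Is', ('This', ('QKind', '?', ('Fish',))), ('Fresh',))
print(metavariables(delete(tree, focus)))  # [(0, 0)]
```

An empty result from ``metavariables`` corresponds to the situation in the picture, where the menu offers no refinements.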
In addition to menu-based editing, the tool supports refinement by parsing,
which is accessible by middle-clicking in the tree or in the linearization field.
**Exercise**. Construct the sentence
//this very expensive cheese is very very delicious//
@@ -1313,7 +1375,7 @@ The grammar ``FoodEng`` could be written in a BNF format as follows:
```
In this format, each rule is prefixed by a **label** that names
the constructor function that GF declares in its ``fun`` rules. In fact,
each context-free rule is a fusion of a ``fun`` and a ``lin`` rule:
it states simultaneously that
- the label is a function from the nonterminal categories
on the right-hand side to the category on the left-hand side;
@@ -1329,7 +1391,7 @@ it states simultaneously that
```
The translation from BNF to GF described above is in fact used in
the GF system to convert BNF grammars into GF. BNF files are recognized
by the file name suffix ``.cf``; thus the grammar above can be
put into a file named ``food.cf`` and read into GF by
@@ -1349,11 +1411,11 @@ grammar that supports translation via common abstract syntax: the
qualification function ``QKind`` has different types in the two
grammars.
In general terms, the separation of concrete and abstract syntax allows
three deviations from context-free grammar:
- **permutation**: changing the order of constituents
- **suppression**: omitting constituents
- **reduplication**: repeating constituents
The third property is the one that definitely shows that GF is
@@ -1361,8 +1423,8 @@ stronger than context-free: GF can define the **copy language**
``{x x | x <- (a|b)*}``, which is known not to be context-free.
The other properties have more to do with the kind of trees that
the grammar can associate with strings: permutation is important
in multilingual grammars, and suppression is exploited in grammars
where trees carry some hidden semantic information (see Chapter 7
below).
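The three deviations can be illustrated with linearization functions written in Python. This is a sketch, not GF syntax. The abstract syntax of the copy language used here, with the constructors ``A``, ``B``, ``E`` and ``Copy``, is hypothetical. The point is that ``lin_copy`` uses its argument twice, so the strings produced are exactly those of the form //x x//.

```
from itertools import product

# Permutation and suppression: one abstract function, different lin rules.
lin_qkind_eng = lambda quality, kind: f"{quality} {kind}"  # Italian wine
lin_qkind_ita = lambda quality, kind: f"{kind} {quality}"  # vino italiano
lin_suppress = lambda hidden, kind: kind                   # quality not shown

# Reduplication, with an assumed abstract syntax for the copy language:
#   fun A, B : W -> W ; E : W ; Copy : W -> S
# and the linearization  lin Copy w = w ++ w
def lin_w(tree):
    fun, *args = tree
    return "" if fun == "E" else {"A": "a", "B": "b"}[fun] + lin_w(args[0])

def lin_copy(tree):
    _, w = tree                 # tree is ("Copy", w)
    s = lin_w(w)
    return s + s                # the argument is used twice

def tree_for(word):
    """Build the W tree whose linearization is word (a string over a, b)."""
    tree = ("E",)
    for ch in reversed(word):
        tree = (ch.upper(), tree)
    return tree

words = ["".join(p) for n in range(4) for p in product("ab", repeat=n)]
language = {lin_copy(("Copy", tree_for(w))) for w in words}
print("abab" in language, "aab" in language)  # True False
```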
Of course, context-free grammars are also restricted from the
@@ -1384,13 +1446,53 @@ document.
%--!
==Modules and files==
GF uses suffixes to recognize different file formats. The most
important ones are:
- Source files: //Modulename//``.gf``
- Target files: //Modulename//``.gfc``
When you import ``FoodEng.gf``, you see the target files being
generated:
```
> i FoodEng.gf
- compiling Food.gf... wrote file Food.gfc 16 msec
- compiling FoodEng.gf... wrote file FoodEng.gfc 20 msec
```
You also see that GF reads not only the file
``FoodEng.gf``, but also all other files that it
depends on; in this case, ``Food.gf``.
For each file that is compiled, a ``.gfc`` file
is generated. The GFC format (for "GF Canonical") is the
"machine code" of GF, which is faster to process than
GF source files. When reading a module, GF decides whether
to use an existing ``.gfc`` file or to generate
a new one, by looking at modification times.
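The decision described above can be sketched as a make-style rule. This Python version is an assumption about the general idea, not GF's actual code: a target is regenerated if it is missing or older than any of its sources. Treating ``Food.gf`` as a source of ``FoodEng.gfc`` is also an assumption. It matters for the last case of the exercise that follows.

```
import os
import tempfile

# Hypothetical make-style rule (an assumption, not GF's documented policy).

def is_stale(target, sources):
    """True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    built = os.path.getmtime(target)
    return any(os.path.getmtime(src) > built for src in sources)

with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "Food.gf")
    tgt = os.path.join(d, "Food.gfc")
    with open(src, "w") as f:
        f.write("abstract Food = {}\n")
    print(is_stale(tgt, [src]))   # True: no .gfc file yet
    with open(tgt, "w") as f:
        f.write("")
    os.utime(src, (1000, 1000))   # source older than target
    os.utime(tgt, (2000, 2000))
    print(is_stale(tgt, [src]))   # False: the .gfc file can be reused
    os.utime(src, (3000, 3000))   # source edited after compilation
    print(is_stale(tgt, [src]))   # True
```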
**Exercise**. What happens when you import ``FoodEng.gf`` for
a second time? Try this in different situations:
- Right after importing it the first time (the modules are kept in
the memory of GF and need no reloading).
- After issuing the command ``empty`` (``e``), which clears the memory
of GF.
- After making a small change in ``FoodEng.gf``, be it only an added space.
- After making a change in ``Food.gf``.
==Using operations and resource modules==
===The golden rule of functional programming===
When writing a grammar, you have to type lots of
characters. You have probably
done this by the copy-and-paste method, which is a common way to
avoid repeating work.
However, there is a more elegant way to avoid repeating work than
@@ -1437,13 +1539,11 @@ For lambda abstraction with multiple arguments, we have the shorthand
```
\x,y,z -> t   ===   \x -> \y -> \z -> t
```
The notation we have used for linearization rules, where
variables are bound on the left-hand side, is actually syntactic
sugar for abstraction:
```
lin f x = t     ===   lin f = \x -> t
lin f x y = t   ===   lin f = \x,y -> t
```
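Readers coming from general-purpose languages may find it helpful to see the same two equivalences imitated in Python. This is only an analogy, not GF. In Python a three-argument lambda and a curried one are different functions that merely compute the same result, whereas in GF the two notations denote the same term. The name ``lin_this`` is hypothetical.

```
# 1. \x,y,z -> t  ===  \x -> \y -> \z -> t  (in Python: equal results only)
multi = lambda x, y, z: [x, y, z]
curried = lambda x: lambda y: lambda z: [x, y, z]

# 2. lin f x = t  ===  lin f = \x -> t  (a rule for This on a Kind record)
def lin_this(kind):
    return {"s": "this " + kind["s"]}

lin_this_lambda = lambda kind: {"s": "this " + kind["s"]}

print(multi("a", "b", "c") == curried("a")("b")("c"))            # True
print(lin_this({"s": "wine"}) == lin_this_lambda({"s": "wine"}))  # True
```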
@@ -1454,7 +1554,8 @@ is shorthand for
===The ``resource`` module type===
Operator definitions can be included in a concrete syntax.
But they are usually not tied to any particular
set of linearization rules.
They should rather be seen as **resources**
usable in many concrete syntaxes.
@@ -1471,9 +1572,6 @@ strings and records.
    prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
}
```
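For readers who think in terms of ordinary functions, the operations of such a resource module behave like the Python functions below. This is an analogy rather than GF code. It assumes that ``SS`` is modelled as a dictionary with a single field ``s`` holding a tuple of tokens, and that ``ss`` builds the record ``{s = x}``. The linearization rules mentioned in the comment are illustrative.

```
# Hypothetical Python analogue of resource operations (not GF code).

def ss(x):
    r"""ss : Str -> SS = \x -> {s = x}"""
    return {"s": x}

def prefix(p, x):
    r"""prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s)"""
    return ss(p + x["s"])

# Linearization rules can use these operations instead of spelling out
# records, e.g. (assumed rules) Cheese = ss "cheese" ; This k = prefix "this" k
cheese = ss(("cheese",))
this_cheese = prefix(("this",), cheese)
print(this_cheese)  # {'s': ('this', 'cheese')}
```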
@@ -1570,8 +1668,6 @@ formed by operations and other GF constructs. For example,
```
==Grammar architecture==
===Extending a grammar===
@@ -1617,7 +1713,9 @@ following Italian grammar module:
    Pizza = ss "pizza" ;
}
```
Resource modules can extend other resource modules, in the
same way as modules of other types can extend modules of the
same type. Thus it is possible to build resource hierarchies.
===Multiple inheritance===
@@ -1646,6 +1744,16 @@ same time:
    MushroomKind : Mushroom -> Kind ;
}
```
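Extension with several parent modules can be pictured as taking the union of their judgements. The Python sketch below is a simplification that rests on these assumptions:
- a module is a dictionary from function names to types;
- only ``fun`` judgements are modelled;
- ``Cep`` is a hypothetical mushroom name;
- a name declared with different types in two parents is rejected.

How GF itself handles such clashes is not described here.

```
# Hypothetical sketch of multiple inheritance as a union of judgements.

def extend(*parents, new):
    """Merge parent modules and the new judgements into one module."""
    merged = {}
    for module in (*parents, new):
        for name, typ in module.items():
            if merged.get(name, typ) != typ:
                raise ValueError(f"conflicting declarations of {name}")
            merged[name] = typ
    return merged

food = {"Is": "Item -> Quality -> Phrase", "This": "Kind -> Item", "Cheese": "Kind"}
mushroom = {"Cep": "Mushroom"}   # hypothetical mushroom name
food_market = extend(food, mushroom, new={"MushroomKind": "Mushroom -> Kind"})
print(sorted(food_market))  # ['Cep', 'Cheese', 'Is', 'MushroomKind', 'This']
```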
The main advantages of splitting a grammar into modules are
**reusability**, **separate compilation**, and **division of labour**.
Reusability means
that one and the same module can be put to different uses; for instance,
a module with mushroom names might be used in a mycological information system
as well as in a restaurant phrasebook. Separate compilation means that a module,
once compiled into ``.gfc``, need not be compiled again unless it has
changed.
Division of labour means simply that programmers who are experts in
special areas can work on the modules belonging to those areas.
**Exercise**. Refactor ``Food`` by moving ``Wine`` into a separate
``Drink`` module.
@@ -1653,24 +1761,6 @@ same time:
===Division of labour===
Using operations defined in resource modules is a
way to avoid repetitive code.
In addition, it enables a new kind of modularity
and division of labour in grammar writing: grammarians familiar with
the linguistic details of a language can make their knowledge
available through resource grammar modules, whose users only need
to pick the right operations and not to know their implementation
details.
In the following chapter, we will go through some
such linguistic details. The programming constructs needed when
doing this are useful for all GF programmers, even for those who don't
hand-code the linguistics of their applications but get them
from libraries. It can also be of general interest to learn about the
linguistic concepts of inflection, agreement, and parts of speech in the
form of precise, computer-executable code.