intro revised

This commit is contained in:
aarne
2007-08-28 13:40:13 +00:00
parent 16a6034cdb
commit 37b520b440

@@ -42,30 +42,30 @@ Last update: %%date(%c)
==Natural language application programming== ==Natural language application programming==
Making computers understand human language is one of the oldest dreams of Making computers understand human language is one of the oldest dreams of
programmers. Projects with machine translations started almost as soon as programmers. Projects with machine translation started almost as soon as
the first computers appeared in the 1940's. This was partly encouraged by the the first computers appeared in the 1940's. They were partly encouraged by the
success of decryption during the Second World War. Thus some American scientists success of decryption during the Second World War. Thus some American scientists
had the vision that Russian can be seen as encrypted English, which can be had the vision that Russian can be seen as encrypted English, which can be
deciphered by similar algorithms as those used for cracking the Germans' Enigma. deciphered by similar algorithms as those used for cracking the Germans' Enigma.
Despite substantial efforts on machine translation, the early visions were not Despite substantial efforts in machine translation, the early visions were not
realized, and the general conclusion reached by the mid-1960's was that realized, and the general conclusion reached by the mid-1960's was that
high-quality broad-coverage machine translation is impossible. Machine high-quality broad-coverage machine translation is impossible. Machine
translation was translated to the less ambitious and more specialized tasks of translation was tuned down to the less ambitious and more specialized group of
computational linguistics. Parallel to this, fantasies of "speaking robots" and tasks that started to be called computational linguistics.
Parallel to this, fantasies of "speaking robots" and
other language-understanding machines prevailed, exemplified by such science other language-understanding machines prevailed, exemplified by such science
fiction figures as the HAL computer in the film "2001: A Space Odyssey" from fiction figures as the HAL computer in "2001: A Space Odyssey".
1970.
What we see in today's market of language understanding machines is a variety of The language understanding machines we see today are a variety of
products, which focus on different aspects of the task and none of which comes products, which focus on different aspects of the task and none of which comes
even close to HAL or a machine translator with human-like capacities. Here is a even close to HAL or a machine translator with human-like capacities. Here is a
list of some such applications: list of some such applications:
- browse-quality machine translation: Systran - browse-quality machine translation: Systran
- machine translation specialized on weather reports: Meteo - machine translation specialized on weather reports: Meteo
- electronic dictionaries - electronic dictionaries: desktop, web-based, portable
- spelling and grammar checkers - spelling and grammar checkers
- dialogue systems for enabling simple speech interaction with a computer - dialogue systems enabling simple speech interaction with a computer
A common feature of these applications is that their construction requires A common feature of these applications is that their construction requires
@@ -77,14 +77,21 @@ make a computer understand at least something of a natural language.
This is where GF comes into picture. GF, Grammatical Framework, is a programming This is where GF comes into picture. GF, Grammatical Framework, is a programming
language designed for expressing linguistic rules. A set of such rules is called language designed for expressing linguistic rules. A set of such rules is called
a **grammar**. GF is designed in such a way that it is much easier to write a **grammar**. GF is designed to make it easy to write grammar rules; this is
grammar rules in it than in a general-purpose programming language, such as much easier than in a general-purpose programming language such as
Java or C or Haskell. At the same time, GF is equipped with tools for Java or C or Haskell. But it is also in many ways easier and more productive
**embedded grammars**. This means that a GF grammar can be used as a component than in other languages specialized in grammars; the most well-known of these
of a program written in another language, such as Java or C or Haskell. To build is the BNF notation (Backus Naur Form), which is also known as
a language application usually involves much more than just a grammar, and it is context-free grammars and implemented in tools such as YACC.
important that the grammar can be integrated seamlessly with the rest of the
application. GF does not only enable the writing of grammars. It is also equipped with tools
for integrating grammars in language-processing systems.
To build a language application usually involves much more than just a grammar,
and these other parts are often written in general-purpose languages.
Since it is important that the grammar can be integrated seamlessly with
the rest of the application, GF grammars can be converted into
**embedded grammars**, which can be directly used as components of
programs written in other languages such as C, Java, JavaScript, and Haskell.
Since natural language application programming requires linguistic knowledge, it Since natural language application programming requires linguistic knowledge, it
is usually considered to need linguistic training. The mission of GF is to relieve is usually considered to need linguistic training. The mission of GF is to relieve
@@ -96,85 +103,89 @@ some of this need. This is achieved in two ways:
This said, GF makes no claim to "fire linguists" from natural language programming This said, GF makes no claim to "fire linguists" from natural language programming
projects. The claim is rather one of the **division of labour**: GF enables the projects. The claim is just that there should be a **division of labour**:
division of grammar writing into different **modules**, where some modules in GF, grammar can be divided into different **modules**, where some modules
require linguistic knowledge and others don't. Linguists working on the linguistic require linguistic knowledge and others don't. Linguists working on the linguistic
modules will appreciate the way GF supports abstractions and generalizations, and modules will appreciate the way GF supports abstractions and generalizations, and
also the grammar development tools that enable testing of linguistic rules. also the grammar development tools that enable testing of linguistic rules.
Non-linguists working on the application-oriented modules will appreciate the Non-linguists working on application-oriented modules will appreciate the
possibility to take grammar rules for granted and focus on other aspects of possibility to rely on grammar rules defined in the linguistic library modules,
the program. and to focus on other aspects of the task.
==The history of GF and its applications== ==A brief history of GF and its applications==
GF belongs to the tradition of **functional programming languages**, exemplified GF belongs to the tradition of **functional programming languages**, exemplified
by Lisp and, as later and closer relatives, ML and Haskell. An important branch by ML and Haskell and, somewhat more remotely, Lisp. One branch
of functional programming is **type theory**, which in turn has its roots in of functional programming is **type theory**, which in turn has its roots in
logic and the foundations of mathematics. GF was, in the first place, created to logic and the foundations of mathematics. GF was, in the first place, created to
implement the idea that type theory can provide **semantics**, i.e. formalize implement the idea that type theory can provide **semantics**, i.e. formalize
the meaning of natural languages. Several aspects of type-theoretical semantics the meaning of natural languages. Several aspects of type-theoretical semantics
were covered in the monograph //Type-Theoretical Grammar// (A. Ranta, OUP 1994). were covered in the monograph //Type-Theoretical Grammar// (A. Ranta, OUP 1994).
But a stronger aspect grew out of subsequent experiments dealing with different But a stronger aspect grew out of subsequent experiments dealing with different
languages: it is possible to have a common semantics for many language, and languages: it is possible to have a common semantics for many languages, and
thereby build systems that translate between languages via the semantics. During thereby build systems that translate between languages via the semantics.
this period, discussions with Per Martin-Löf (Ranta's PhD supervisor at the The first implementation of this idea was written as a plug-in to the
University of Stockholm) had a major impact on the work, and cooperation proof editor Alfa (Magnusson & Nordström 1994) in 1995.
with Petri Mäenpää at the University of Helsinki led to the first computer
implementations.
As a stand-alone programming language, GF was first implemented in 1998. This As a stand-alone programming language, GF was first implemented in 1998. This
took place at Xerox Research Centre Europe in Grenoble, within a project entitled took place at Xerox Research Centre Europe in Grenoble, within a project entitled
//Multilingual Document Authoring//. The leading idea in the project was to //Multilingual Document Authoring//. The goal of the project was to build a tool
enable writing documents in multiple languages simultaneously, so that the user for writing documents in multiple languages simultaneously, so that the user
need only know one of the languages; the rest will be produced automatically need only know one of the languages; the rest will be produced automatically
via translations from the type-theoretical semantics. The Xerox staff involved via translations from the type-theoretical semantics.
in the project included Marc Dymetman, Lauri Karttunen, Veronika Lux, In addition to GF itself, the project produced some prototype applications,
Sylvain Pogodalla, and Annie Zaenen. e.g. a restaurant phrase book and an editor of medical drug descriptions.
An important aspect was the adaptability of the system to new domains and
languages; hence the need of a language where such adaptations can be made
by just writing new grammars.
The Xerox project produced some prototype applications, e.g. a restaurant phrase The grammars that were built in the Xerox project
book and an editor of medical drug descriptions. The grammars that were built remained property of Xerox Corporation, but the GF formalism and its
remained the property of Xerox, but the GF formalism and its implementation implementation were released as open-source software under GNU General
were released as open-source software under GNU General Public License. The Public License. From 1999, the development of GF continued mostly at
principal author of GF got an academic position in 1999, at the Department of the Department of Computing Science of Chalmers University of Technology
Computing Science of Chalmers University of Technology and Gothenburg University. and Gothenburg University. In this environment, both functional programming
At Chalmers, both functional programming and type theory flourish, and in this and type theory are strong research areas. This helped GF to develop into
environment, GF developed into a more stable and more full-fledged programming a more stable and more full-fledged programming language.
language. In this process, collaboration with Koen Claessen, Thierry Coquand,
Thomas Hallgren, Patrik Jansson, and Bengt Nordström made important contributions.
The idea of making GF into "the working programmer's grammar formalism", as At Chalmers, GF has also been used in courses given to computer science
opposed to a tool requiring linguistic expertise, was confirmed at Chalmers students and in joint projects with non-linguist research groups.
in courses given to computer science students and later in joint research This activity soon crystallized the idea of making GF into
projects. A nice experience of the courses was that computer scientists are "the working programmer's grammar formalism", as
opposed to a tool requiring linguistic expertise. A nice experience from
the courses (both graduate and undergraduate) was that computer scientists are
often very interested in languages and have firm intuitions on grammar; given often very interested in languages and have firm intuitions on grammar; given
a suitable programming tool, they can achieve impressive results. GF seemed to a suitable programming tool, they can achieve impressive results in short time.
be close to such a tool, and, in subsequent collaborations at the Department, GF was to be made into such a tool, which meant above all that
it evolved even more to a programming language with the virtues of familiarity it was developed in the way programming languages are,
and "the least surprise". Issues of stability are also important, including following the virtues of familiarity and "the least surprise".
backward compatibility, and documentation is something there can hardly be Issues of stability are also important, including
too much of. As a mark of stability, version 1.0 of GF was released in backward compatibility and portability to different platforms.
2002. In 2004, a theoretical reference paper appeared in the Journal Documentation is something there can hardly be too much of.
of Functional Programming, as well as a long tutorial text in the ESSLLI As a mark of stability, version 1.0 of GF was released in
lecture notes post-publication. 2002. In 2004, a reference article appeared in the
//Journal of Functional Programming//,
and a long tutorial text was published in the post-publication of
ESSLLI lecture notes.
The first full-scale applications of GF emerged as natural-language interfaces. The first full-scale applications of GF were natural-language interfaces.
The first one was for the proof editor Alfa, written with Thomas Hallgren. The first one was for the proof editor Alfa (Hallgren & Ranta 2000).
The second one was a syntax editor and a natural-language interface to the The second one was a syntax editor and a natural-language interface to the
software specification language OCL (Object Constraint Language) built software specification language OCL (Object Constraint Language) built
within the KeY project. This work was done first with Reiner Hähnle, then within the KeY project (Ahrendt & al. 2006).
with the students Kristoffer Johannisson (PhD 2005), Hans-Joachim Daniels, These projects boosted the implementation side
and David Burke. On the GF implementation side, Janna Khegai (PhD 2006) built of GF itself, in particular, the graphical syntax editor (Khegai & al. 2003).
a Java-based syntax editor. Peter Ljunglöf (PhD 2004) succeeded to identify At the same time, some major mathematical properties of GF were established
the complexity of parsing in GF and found an algorithm that greatly improved in the PhD thesis of Peter Ljunglöf (2004), which led to improved
the use of GF in parsing. He implemented the algorithm with Håkan Burden, and parser implementations.
it was later still improved by Krasimir Angelov.
At the same time, collaboration with the Linguistics Department of At the same time as GF was used in joint projects with computer science groups,
Gothenburg University served as a "linguistic sanity check" of GF. collaboration with the Linguistics Department of
Robin Cooper, an eminent linguist working at the Department, initiated Gothenburg University served as a "linguistic sanity check" of GF. Two efforts
two efforts that have formed the development of GF: that have been formative to the development of GF were started within this
collaboration:
- resource grammar libraries - resource grammar libraries
- dialogue system applications - dialogue system applications
@@ -188,43 +199,37 @@ the library started in 2002; a version stable enough to be released with number
Dialogue systems, on the other hand, turned Dialogue systems, on the other hand, turned
out to be a major source of interesting problems and also of successful solutions. out to be a major source of interesting problems and also of successful solutions.
Much of this work was carried out in the European project TALK (Tools for Ambient Much of this work was carried out in the European project TALK (Tools for Ambient
Linguistic Knowledge, 2004-2006), by Björn Bringert, Rebecca Jonson, and Linguistic Knowledge, 2004-2006), also involving sites from Cambridge,
Peter Ljunglöf in Gothenburg, and Oliver Lemon (Edinburgh), Nadine Perera (BMW), Edinburgh, and BMW in Munich. In addition to complete systems, the TALK
and Karl Weilhammer (Cambridge) at the other sites. In addition to project produced supporting tools for embedded grammars
complete systems, this project produced supporting tools for embedded grammars and speech recognition, as well as additions of spoken language structures
and speech recognition, and additions to the resource grammar library. to the resource grammar library.
Besides dialogue systems, multilingual authoring and translation continues Besides dialogue systems, multilingual authoring and translation continued
to be the main application of GF. The European WebALT project (Web Advanced to be the main application of GF. The European WebALT project (Web Advanced
Learning Technologies, 2005-2006), used GF to build a tool for translating Learning Technologies, 2005-2006), used GF to build a tool for translating
mathematical exercises from formal specifications (written in MathML) to mathematical exercises from formal specifications (written in MathML) to
six languages. Also, a tool integrating GF with a computer algebra system was six languages. Also, a tool integrating GF with a computer algebra system was
developed. The project gave rise to a company, WebALT Inc. Many members developed. The project gave rise to a company, WebALT Inc.
of the WebALT staff also contributed to GF and the resource grammar library:
Lauri Carlson, Glòria Casanellas, Anni Laine, Wanjiku N'gan'ga, and
Jordi Saludes.
As of the time of writing (August 2007), the release of GF has version At the time of writing this (August 2007), the release of GF has version
number 2.8. It is a stable system that has been built with contributions number 2.8. It is a stable system that has been built with contributions
of dozens of persons and been used by at least hundreds; download figures of dozens of persons and been used by at least hundreds; download figures
are in thousands. New ideas of how to apply GF are posted by users almost are in thousands. New ideas of how to apply GF are posted by users almost
every week. These users are often programmers with good knowledge of every week.
functional languages, highly developed instinct for programming language
design, and firm intuitions on natural language. Another group of users
are those that have been trained in GF on courses.
==The purpose and scope of this book== ==The purpose and scope of this book==
The purpose of this book is to serve the growing user base of GF with One purpose of this book is to serve the growing user base of GF with
a manual that gathers all relevant information in one place. However, it a definitive manual that gathers all relevant information in one place.
is also intended to serve those who want to get started with GF, and However, it is also intended to serve those who want to get started with GF, and
who don't necessarily have the technical background of the typical who don't necessarily have the technical background of the typical
users. We believe that learning to program in GF is not more difficult users. We believe that learning to program in GF is not more difficult
than learning some other programming language; as for the linguistic than learning some other programming language; as for the linguistic
aspects, we believe that writing grammars is an excellent introduction aspects, our experience is that writing grammars is an excellent introduction
to the problems of linguistics, where theory can be learnt at the to the problems of linguistics. In this way, theory can be learnt at the
same time as it is motivated by concrete problems. same time as it is motivated by concrete problems.
The book thus starts with a tutorial, which gradually explains all The book thus starts with a tutorial, which gradually explains all
@@ -233,22 +238,21 @@ aspects of grammar engineering are covered, to help the user to scale
up from small to large and possibly collaborative applications. up from small to large and possibly collaborative applications.
After the tutorial, the book continues with a "cook book" containing After the tutorial, the book continues with a "cook book" containing
hints and case studies for advanced users. Moreover, the resource hints and case studies for advanced users. Moreover, the resource
grammar library is covered in some detail, which will help the grammar library is covered in some detail, which will help those
programmers who want to port the library to new languages, but also programmers who want to port the library to new languages, but also
motivate linguistically the choices made in the libraries. motivate linguistically the choices made in the libraries.
A complete reference manual concludes the book, with a quick reference A complete reference manual concludes the book, with a quick reference
card as an appendix. card as an appendix.
What is not covered by the book is theoretical discussions of What is not given much space in the book is theoretical discussions of
GF, especially in comparison to other grammar formalisms. Even though important GF, especially in comparison to other grammar formalisms. Even though important
in the development of GF as a scientifically justified framework, such in the development of GF as a scientifically justified framework, such
discussions are not relevant for programmers who want to use GF - any more discussions are not relevant for programmers who just want to use GF - any more
than, say, a book on Haskell has to include comparisons with Java. In fact, than, say, a book on Haskell has to include comparisons with Java. In fact,
introducing Haskell by references to Java may have some point, since many introducing Haskell by references to Java may make more sense
of the readers can already be assumed to know Java. But, even though some than comparing GF with DCG or HPSG or LFG: many Haskell learners can
readers will know DCG or HPSG or LFG, we will not assume this; we will just already be expected to know Java, but most GF learners are not expected
note in passing the relation between GF and context-free grammars, also to know any grammar formalism, except perhaps BNF.
known as BNF grammars in computer science.
@@ -256,11 +260,13 @@ known as BNF grammars in computer science.
=Getting started= =Getting started=
In this chapter, we will introduce the GF program and write a first GF grammar. In this chapter, we will introduce the GF program and write the first GF grammar,
We show how the grammar is used for the tasks of translation and multilingual a "Hello World" grammar. While extremely small, this grammar already illustrates
how GF can be used for the tasks of translation and multilingual
generation. generation.
==What GF is== ==What GF is==
We use the term GF for three different things: We use the term GF for three different things:
@@ -273,8 +279,9 @@ The relation between these things is obvious: the GF system is an implementation
of the GF programming language, which in turn is built on the ideas of the of the GF programming language, which in turn is built on the ideas of the
GF theory. The main focus of this book is on the GF programming language. GF theory. The main focus of this book is on the GF programming language.
We learn how grammars are written in the language. At the same time, we learn We learn how grammars are written in the language. At the same time, we learn
the way of thinking in the GF theory. To make this all useful and fun, we the way of thinking in the GF theory. To make this all useful and fun, and
make the grammars run on a computer by using the GF system. to encourage experimenting, we make the grammars run on a computer by
using the GF system.
@@ -330,18 +337,16 @@ because they are multilingual and domain-specific.
However, there is another kind of grammars, which we call **resource grammars**. However, there is another kind of grammars, which we call **resource grammars**.
These are large, comprehensive grammars that can be used on any domain. These are large, comprehensive grammars that can be used on any domain.
The GF Resource Grammar Library has resource grammars for 10 languages. The GF Resource Grammar Library has resource grammars for 12 languages.
These grammars can be used as **libraries** to define application grammars. These grammars can be used as **libraries** to define application grammars.
In this way, it is possible to write a high-quality grammar without In this way, it is possible to write a high-quality grammar without
knowing about linguistics: in general, to write an application grammar knowing about linguistics: in general, to write an application grammar
by using the resource library just requires practical knowledge of by using the resource library just requires practical knowledge of
the target language, and all theoretical knowledge about its grammar the target language, and all theoretical knowledge about its grammar
is given by the libraries. is given in the libraries.
%--!
==Who is the tutorial for== ==Who is the tutorial for==
The tutorial part of this book is mainly for programmers The tutorial part of this book is mainly for programmers
@@ -349,33 +354,37 @@ who want to learn to write application grammars.
It will go through GF's programming concepts, and does not It will go through GF's programming concepts, and does not
presuppose knowledge of any of the main ingredients of GF: presuppose knowledge of any of the main ingredients of GF:
linguistics, functional programming, and type theory. linguistics, functional programming, and type theory.
Thus it should be accessible to anyone who has some This knowledge will be introduced as a part of grammar writing
previous programming experience from any language; the basics practice.
Thus the book should be accessible to anyone who has some
previous programming experience from any programming language; the basics
of using computers are also presupposed, e.g. the use of of using computers are also presupposed, e.g. the use of
text editors and the management of files. text editors and the management of files.
Those who already know GF well can skip the tutorial part, Those who already know GF well can skip the tutorial part,
or skim through it, and go directly to the part on advanced applications. or skim through it, and go directly to the part on advanced applications.
These will involve large scale GF programming, such as needed in resource Many of these applications will involve large scale GF programming,
grammars, and also the embedding of GF in systems such as and/or programming in other languages in which GF grammars are embedded.
natural-language user interfaces and dialogue systems.
%--!
==The coverage of the tutorial== ==The coverage of the tutorial==
The tutorial gives a hands-on introduction to grammar writing. The tutorial gives a hands-on introduction to grammar writing.
We start by building a "Hello World" grammar, which covers greetings We start in this chapter
in three languages (//hello world//, //terve maailma//, //ciao mondo//). by building a "Hello World" grammar, which covers greetings
in three languages: English (//hello world//),
Finnish (//terve maailma//), and Italian (//ciao mondo//).
This **multilingual grammar** is based on the distinction, central in This **multilingual grammar** is based on the distinction, central in
GF, between the **abstract syntax** GF, between **abstract syntax**
(the logical structure) and the **concrete syntax** (the (the logical structure) and **concrete syntax** (the
sequence of words) of expressions. sequence of words).
From the "Hello World" example, we proceed From the "Hello World" example, we proceed
to a larger grammar for the domain of food: in the next chapter
in this grammar, you can say things like to a larger grammar for the domain of food.
In this grammar, you can say things like
``` ```
this Italian cheese is delicious this Italian cheese is delicious
``` ```
@@ -394,53 +403,48 @@ has just one:
vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose
``` ```
The **morphology** of a language describes the The **morphology** of a language describes the
forms of its words. forms of its words, and the basics of it are explained in Chapter 4.
While the complete description of morphology The complete description of morphology
belongs to resource grammars, and the use of them will be covered belongs to resource grammars, whose use is covered in Chapter 6.
by the tutorial. However, we will explain all the However, we will explain all the
programming concepts involved in resource grammars. programming concepts involved in resource grammars, and also
The tutorial will in fact build a miniature resource grammar in order build a miniature resource grammar in order
to give an introduction to linguistically oriented grammar writing. to give an introduction to linguistically oriented grammar writing (Chapter 5).
Of course, we will not presuppose that the reader knows Italian. Italian is used as the example language of many grammars.
We have chosen Italian as the example language because it has a rich Of course, we will not presuppose that the reader knows any Italian.
morphological structure that illustrates very well the capacities of We have chosen Italian because it has a rich structure
GF. Moreover, even those who don't know Italian, will find many of that illustrates very well the capacities of GF.
Moreover, even those readers who don't know Italian, will find many of
its words familiar. The exercises will encourage the reader to its words familiar. The exercises will encourage the reader to
port the examples to other languages; in fact, many GF port the examples to other languages; in fact, many GF
applications work for 5-10 languages. applications work for 5-10 languages.
Thus it is by elaborating the Food grammar example that
the tutorial makes a guided tour through most of GF.
While the constructs of the GF language are the main focus,
also the commands of the GF system are introduced as they
are needed.
In addition to multilinguality, **semantics** is an important aspect of GF In addition to multilinguality, **semantics** is an important aspect of GF
grammars. The concepts needed for "purely linguistic" grammars belong to grammars. The concepts needed for "purely linguistic" grammars belong to
the concrete syntax part of GF, whereas semantics is expressed in the abstract the concrete syntax part of GF, whereas semantics is expressed in the abstract
syntax. After the presentation of concrete syntax constructs, we proceed syntax. After the presentation of concrete syntax constructs, we proceed
to the enrichment of abstract syntax with **dependent types**, in Chapter 7 to the enrichment of abstract syntax with **dependent types**,
**variable bindings**, and **semantic definitions**. **variable bindings**, and **semantic definitions**.
To learn how to write GF grammars is not the only goal of To learn how to write GF grammars is not the only goal of
this tutorial. We will also explain the most important this tutorial. We will also explain the most important
commands of the GF system. With these commands, commands of the GF system. With these commands,
simple applications of grammars, such as translation and simple application programs such as translation and
quiz systems, can be built simply by writing scripts for the quiz systems, can be built simply by writing scripts for the
system. GF system.
More complicated applications, such as natural-language More complicated applications, such as natural-language
interfaces and dialogue systems, moreover require programming in interfaces and dialogue systems, moreover require programming in
some general-purpose language. The part on advanced topics will some general-purpose language. Part II on advanced applications will
explain how GF grammars are used as components of Haskell and Java programs. explain how GF grammars are used as components of Haskell and Java programs.
%--! %--!
==Getting the GF program== ==Getting the GF program==
The GF program is open-source free software, which you can download via the The GF program is open-source free software, which can be downloaded via the
GF Homepage: GF Homepage:
[``http://www.cs.chalmers.se/~aarne/GF`` http://www.cs.chalmers.se/~aarne/GF] [``http://www.cs.chalmers.se/~aarne/GF`` http://www.cs.chalmers.se/~aarne/GF]
@@ -485,6 +489,7 @@ Thus you should not type these prompts, but only the characters that
follow them. follow them.
==A "Hello World" grammar== ==A "Hello World" grammar==
The tradition in programming language tutorials is to start with a The tradition in programming language tutorials is to start with a
@@ -620,6 +625,7 @@ It will also show how much CPU time was consumed:
- compiling HelloEng.gf... wrote file HelloEng.gfc 12 msec - compiling HelloEng.gf... wrote file HelloEng.gfc 12 msec
12 msec 12 msec
>
``` ```
You can now use GF for **parsing**: You can now use GF for **parsing**:
``` ```
@@ -634,7 +640,7 @@ for a machine to understand and to process further, although this
is not so obvious in this simple grammar. is not so obvious in this simple grammar.
Strings that return a tree when parsed do so in virtue of the grammar Strings that return a tree when parsed do so in virtue of the grammar
you imported. Try parsing something that is not in grammar, and you fail you imported. Try to parse something that is not in grammar, and you will fail
``` ```
> parse "hello dad" > parse "hello dad"
Unknown words: dad Unknown words: dad
@@ -708,7 +714,7 @@ these functionalities. But we will also introduce many more:
The usefulness of GF would be quite limited if grammars were The usefulness of GF would be quite limited if grammars were
usable only inside the GF program. In the forthcoming chapters, usable only inside the GF program. Later in this book,
we will see many other ways of using grammars: we will see many other ways of using grammars:
- compile them to new formats, such as speech recognition grammars - compile them to new formats, such as speech recognition grammars
- embed them in Java and Haskell programs - embed them in Java and Haskell programs
@@ -823,7 +829,7 @@ In a concrete syntax, the available types include
=Designing a grammar for complex phrases= =Designing a grammar for complex phrases=
We will now start with a grammar that has much more structure than We will now start with a grammar that has much more structure than
the ``Hello`` grammar. We will look at how the abstract the ``Hello`` grammar. We will look at how the abstract syntax
is divided into suitable categories, and how infinitely many is divided into suitable categories, and how infinitely many
phrases can be built by using recursive rules. We will also phrases can be built by using recursive rules. We will also
introduce **modularity** by showing how a large grammar can be introduce **modularity** by showing how a large grammar can be
@@ -833,7 +839,8 @@ can be used for avoiding repeated code.
==The abstract syntax Food== ==The abstract syntax Food==
The grammar we wrote defines a set of phrases usable for speaking about food: We will write a grammar that
defines a set of phrases usable for speaking about food:
- the main category is ``Phrase`` - the main category is ``Phrase``
- a ``Phrase`` can be built by assigning a ``Quality`` to an ``Item`` - a ``Phrase`` can be built by assigning a ``Quality`` to an ``Item``
- an ``Item`` is built from a ``Kind`` by prefixing "this" or "that" - an ``Item`` is built from a ``Kind`` by prefixing "this" or "that"
@@ -861,7 +868,11 @@ These verbal descriptions can be expressed as the following abstract syntax:
Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ; Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ;
} }
``` ```
In the concrete syntax, we will be able to build phrases such as In this abstract syntax, we can build ``Phrase``s such as
```
Is (This (QKind Delicious (QKind Italian Wine))) (Very (Very Expensive))
```
In the English concrete syntax, we will want to linearize this into
``` ```
this delicious Italian wine is very very expensive this delicious Italian wine is very very expensive
``` ```
@@ -924,7 +935,7 @@ the prefix can occur at most once.
===Generating trees and strings=== ===Generating trees and strings===
When we have a grammar above the trivial size, especially a recursive When we have a grammar above a trivial size, especially a recursive
one, we need more efficient ways of testing it than just by parsing one, we need more efficient ways of testing it than just by parsing
sentences that happen to come to our minds. One way to do this is sentences that happen to come to our minds. One way to do this is
based on **automatic generation**, which can be either based on **automatic generation**, which can be either
@@ -1054,8 +1065,8 @@ of grammars.
GF uses suffixes to recognize different file formats. The most GF uses suffixes to recognize different file formats. The most
important ones are: important ones are:
- Source files: //Modulname//``.gf`` - Source files: //Modulename//``.gf``
- Target files: //Modulname//``.gfc`` - Target files: //Modulename//``.gfc``
When you import ``FoodEng.gf``, you see the target files being When you import ``FoodEng.gf``, you see the target files being
@@ -1258,7 +1269,7 @@ which gets accessible by middle-clicking at the linearization field.
and its Italian translation by using ``gfeditor``. and its Italian translation by using ``gfeditor``.
==The context-free grammar format== ==Context-free grammars and GF==
Readers not familiar with context-free grammars, also known as BNF grammars, can Readers not familiar with context-free grammars, also known as BNF grammars, can
skip this section. Those that are familiar with them will find here the exact skip this section. Those that are familiar with them will find here the exact
@@ -1267,6 +1278,65 @@ the BNF format can be used as input to the GF program; it is often more
concise than GF proper, but also more restricted in expressive power. concise than GF proper, but also more restricted in expressive power.
===The "cf" grammar format===
The grammar ``FoodEng`` could be written in a BNF format as follows:
```
Is. Phrase ::= Item "is" Quality ;
That. Item ::= "that" Kind ;
This. Item ::= "this" Kind ;
QKind. Kind ::= Quality Kind ;
Cheese. Kind ::= "cheese" ;
Fish. Kind ::= "fish" ;
Wine. Kind ::= "wine" ;
Italian. Quality ::= "Italian" ;
Boring. Quality ::= "boring" ;
Delicious. Quality ::= "delicious" ;
Expensive. Quality ::= "expensive" ;
Fresh. Quality ::= "fresh" ;
Very. Quality ::= "very" Quality ;
Warm. Quality ::= "warm" ;
```
In this format, each rule is prefixed by a **label** that names
the constructor function that GF declares in its ``fun`` rules. In fact,
each context-free rule is a fusion of a ``fun`` and a ``lin`` rule:
it states simultaneously that
- the label is a function from the nonterminal categories
on the right-hand side to the category on the left-hand side;
the first rule gives
```
fun Is : Item -> Quality -> Phrase
```
- trees built by the label are linearized in the way indicated
by the right-hand side;
the first rule gives
```
lin Is item quality = {s = item.s ++ "is" ++ quality.s}
```
The translations from BNF to GF described above are in fact used in
the GF system to convert BNF grammars into GF. BNF files are recognized
by the file name suffix ``.cf``; thus the grammar above can be
put into a file named ``food.cf`` and read into GF by
```
> import food.cf
```
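The conversion can be illustrated outside GF with a few lines of Python.
The following is only a sketch of the idea, not the code used in the GF system:
the function name ``cf_to_gf`` is hypothetical, terminals are assumed to be
single quoted words, and every rule is assumed to have a non-empty
right-hand side.

```python
def cf_to_gf(rule):
    """Translate one labelled BNF rule into a (fun, lin) pair of GF judgements.

    Illustrative sketch only; cf_to_gf is a hypothetical helper,
    not a function of the GF system.
    """
    # Split 'Label. Cat ::= symbols ;' into its head and right-hand side.
    head, rhs = rule.strip().rstrip(";").split("::=")
    label, cat = (part.strip() for part in head.split("."))

    arg_cats = []   # argument categories, in right-hand-side order
    variables = []  # one variable name per nonterminal
    parts = []      # pieces of the linearization, later joined by ++
    for sym in rhs.split():
        if sym.startswith('"'):
            # Terminal: copied into the linearization as a string token.
            parts.append(sym)
        else:
            # Nonterminal: becomes an argument of the function.
            var = sym.lower()
            if var in variables:
                # Keep variable names distinct when a category repeats.
                var = f"{var}{len(variables) + 1}"
            arg_cats.append(sym)
            variables.append(var)
            parts.append(f"{var}.s")

    fun = f"fun {label} : " + " -> ".join(arg_cats + [cat])
    lin = "lin " + " ".join([label] + variables) + " = {s = " + " ++ ".join(parts) + "}"
    return fun, lin


fun_rule, lin_rule = cf_to_gf('Is. Phrase ::= Item "is" Quality ;')
print(fun_rule)
print(lin_rule)
```

Applied to the first rule of ``food.cf``, the function returns the ``fun`` and
``lin`` judgements shown above for ``Is``.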
===Restrictions of context-free grammars===
Even though we managed to write ``FoodEng`` in the context-free format,
we cannot do this for GF grammars in general. If we try to do the same
for ``FoodIta``, we lose an important aspect of multilinguality:
that the order of constituents is defined separately in each concrete syntax.
A context-free rule fixes the argument order of its label and the word order
at the same time, so two languages that order their constituents differently
get different function types for the same label.
Thus we could not use ``FoodEng`` and ``FoodIta`` in a multilingual
grammar that supports translation via a common abstract syntax.
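To see the problem concretely, we can compare the function types that the
context-free format would induce for ``QKind`` in the two languages.
The sketch below is in Python and makes two assumptions that are not part of
the grammars above: that Italian puts the quality after the kind
(as in "vino italiano"), and that the Italian rule would therefore be written
``QKind. Kind ::= Kind Quality``. The helper ``fun_type`` is a hypothetical name.

```python
def fun_type(rule):
    """Return the GF function type induced by one labelled BNF rule.

    Hypothetical helper for illustration; not part of the GF system.
    """
    head, rhs = rule.strip().rstrip(";").split("::=")
    result_cat = head.split(".")[1].strip()
    # Arguments follow the left-to-right order of the nonterminals.
    arg_cats = [sym for sym in rhs.split() if not sym.startswith('"')]
    return " -> ".join(arg_cats + [result_cat])


# English rule, as in the BNF version of FoodEng.
english = fun_type('QKind. Kind ::= Quality Kind ;')
# Assumed Italian rule: the quality follows the kind.
italian = fun_type('QKind. Kind ::= Kind Quality ;')

print(english)  # Quality -> Kind -> Kind
print(italian)  # Kind -> Quality -> Kind
```

The two types disagree on the order of arguments, so the two rules cannot
share one ``fun`` judgement in a common abstract syntax.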
**Exercise**. Define the copy language ``{x x | x <- (a|b)*}`` in GF.
This language is known not to be context-free.
==Using operations and resource modules== ==Using operations and resource modules==