From 1d8281f3c5b079a23df6ec3fb774b344e1469e50 Mon Sep 17 00:00:00 2001 From: aarne Date: Fri, 31 Aug 2007 12:11:35 +0000 Subject: [PATCH] moved away book --- doc/tutorial/SETLENGTHS.tex | 4 + doc/tutorial/gf-book.txt | 5623 ------------------------------- doc/tutorial/gf-tutorial2_9.txt | 4316 ------------------------ doc/tutorial/prelude | 8 +- 4 files changed, 11 insertions(+), 9940 deletions(-) create mode 100644 doc/tutorial/SETLENGTHS.tex delete mode 100644 doc/tutorial/gf-book.txt delete mode 100644 doc/tutorial/gf-tutorial2_9.txt diff --git a/doc/tutorial/SETLENGTHS.tex b/doc/tutorial/SETLENGTHS.tex new file mode 100644 index 000000000..5d0fc1c72 --- /dev/null +++ b/doc/tutorial/SETLENGTHS.tex @@ -0,0 +1,4 @@ + +\setlength{\parskip}{0mm} +\setlength{\parindent}{4mm} + diff --git a/doc/tutorial/gf-book.txt b/doc/tutorial/gf-book.txt deleted file mode 100644 index 4e228955b..000000000 --- a/doc/tutorial/gf-book.txt +++ /dev/null @@ -1,5623 +0,0 @@ -Grammatical Framework: Tutorial, Applications, and Reference Manual -Aarne Ranta -Draft %%date(%c) - -% NOTE: this is a txt2tags file. -% Create an html file from this file using: -% txt2tags --toc gf-tutorial2.txt - -%!target:html -%!encoding: iso-8859-1 - -%%!postproc(tex): "section\*" "section" - -%!postproc(tex): "subsection\*" "section" -%!postproc(tex): "section\*" "chapter" - -%!postproc(html): #BCEN
-%!postproc(html): #ECEN
- -%!postproc(tex): #BCEN "begin{center}" -%!postproc(tex): #ECEN "end{center}" - -%!preproc(html): #EDITORPNG [../quick-editor.png] -%!preproc(tex): #EDITORPNG [../../lib/resource-1.0/doc/10lang-small.png] - -%!preproc(html): #LOGOPNG [../gf-logo.png] -%!preproc(tex): #LOGOPNG "" - - -%!postproc(tex): #PARTone "part{Tutorial}" -%!postproc(tex): #PARTtwo "part{Applications of Grammars}" -%!postproc(tex): #PARTfour "part{Advanced Grammar Writing}" -%!postproc(tex): #PARTthree "part{Reference Manual}" - -%!postproc(tex): #PARTbnf "include{DocGF}" -%!postproc(tex): #PARTquickref "chapter{Quick Reference}" -%!postproc(tex): #twocolumn "twocolumn" -%!postproc(tex): #smallsize "tiny" -%!postproc(tex): #startappendix "appendix" - - -%!postproc(tex): #FORMULAone "input{FORMULAone}" - -#LOGOPNG - - - -%--! -=Introduction= - -In this Introduction, we will discuss the field of natural language processing -and locate the place of GF in the field. We will continue with a brief history -of GF and its applications, followed by an overview of this book. -This Introduction contains no technical material that is presupposed in -later chapters. Therefore, the practically oriented reader can jump -directly to the Tutorial starting from Chapter 2. - - - -==Natural language application programming== - -Making computers understand human language is one of the oldest dreams of -programmers. Projects with machine translation started almost as soon as -the first computers appeared in the 1940's. They were partly encouraged by the -success of decryption during the Second World War. Thus some American scientists -had the vision that Russian can be seen as encrypted English, which can be -deciphered by algorithms similar to those used for cracking the Germans' Enigma. - -Despite substantial efforts in machine translation, the early visions were not -realized, and the general conclusion reached by the mid-1960's was that -high-quality broad-coverage machine translation is impossible. 
Machine -translation was scaled down to the less ambitious and more specialized group of -tasks that started to be called computational linguistics. -Parallel to this, fantasies of "speaking robots" and -other language-understanding machines prevailed, exemplified by such science -fiction figures as the HAL computer in "2001: A Space Odyssey". - -The language understanding machines we see today are a variety of -products, which focus on different aspects of the task and none of which comes -even close to HAL or a machine translator with human-like capacities. Here is a -list of some such applications: -- browse-quality machine translation: Systran -- machine translation specialized on weather reports: Meteo -- electronic dictionaries: desktop, web-based, portable -- spelling and grammar checkers -- dialogue systems enabling simple speech interaction with a computer - - -A common feature of these applications is that their construction requires -**linguistic knowledge**: theoretical understanding of languages. As opposed to -practical understanding, which means the ability to speak, listen, write, and -read, theoretical understanding means knowledge of the **rules** of language. -It is by expressing these rules in a programming language that we can hope to -make a computer understand at least something of a natural language. - -This is where GF comes into the picture. GF, Grammatical Framework, is a programming -language designed for expressing linguistic rules. A set of such rules is called -a **grammar**. GF is designed to make it easy to write grammar rules; this is -much easier than in a general-purpose programming language such as -Java or C or Haskell. But it is also in many ways easier and more productive -than in other languages specialized in grammars. The most well-known of these -is the **BNF notation** (Backus Naur Form), which is also known as -**context-free grammars**. It is used in compiler tools such as YACC. 
-While BNF is an excellent way to specify the grammar of a programming language, -it does not scale up to the complexities of natural languages. - -Linguists have of course developed many formalisms that are designed for -describing natural languages. In comparison with them, one advantage of GF is -its support for **multilingual grammars**. In a multilingual grammar, a -semantic representation can be shared between several languages, in such a -way that a grammar written for one language can be easily ported to another -one. The grammar also supports translation between the languages it includes. -The most comprehensive multilingual grammars written in GF cover almost 100 -languages. - -GF does not only enable the writing of grammars. It is also equipped with tools -for integrating grammars in language-processing systems. -To build a language application usually involves much more than just a grammar, -and these other parts are often written in general-purpose languages. -Since it is important that the grammar can be integrated seamlessly with -the rest of the application, GF grammars can be converted into -**embedded grammars**, which can be directly used as components of -programs written in other languages such as C, Java, JavaScript, and Haskell. - -Since natural language application programming requires linguistic knowledge, it -is usually considered to need linguistic training. The mission of GF is to relieve -some of this need. This is achieved in two ways: -- GF works in a way familiar to ordinary programmers, namely as a **compiler** - that analyses a language and generates a result. -- GF has a set of **resource grammar libraries**, which encapsulate much of - the linguistic knowledge needed when writing grammars. - - -This said, GF makes no claim to "fire linguists" from natural language programming -projects. 
The claim is just that there should be a **division of labour**: -in GF, grammar can be divided into different **modules**, where some modules -require linguistic knowledge and others don't. Linguists working on the linguistic -modules will appreciate the way GF supports abstractions and generalizations, and -also the grammar development tools that enable testing of linguistic rules. -Non-linguists working on application-oriented modules will appreciate the -possibility to rely on grammar rules defined in the linguistic library modules, -and to focus on other aspects of the task. - - - -==A brief history of GF and its applications== - -GF belongs to the tradition of **functional programming languages**, exemplified -by ML and Haskell and, somewhat more remotely, Lisp. One branch -of functional programming is **type theory**, which in turn has its roots in -logic and the foundations of mathematics. GF was, in the first place, created to -implement the idea that type theory can provide **semantics**, i.e. formalize -the meaning of natural languages. Several aspects of type-theoretical semantics -were covered in the monograph //Type-Theoretical Grammar// (Ranta 1994). -But a stronger aspect grew out of subsequent experiments dealing with different -languages: it is possible to have a common semantics for many languages, and -thereby build systems that translate between languages via the semantics. -The first implementation of this idea was written as a plug-in to the -proof editor Alfa (Magnusson & Nordström 1994) in 1995. It supported the -generation of sentences in six languages from mathematical formulas that were -manipulated in the proof editor. 
One example area was geometry: -- Formula: -#FORMULAone -- English: -//If a point p lies outside a line l, then there is a line m such that p lies on m and m is parallel to l.// -- Finnish: -//Jos piste p on suoran l ulkopuolella, niin on olemassa suora m sellainen että p on suoralla m ja m on yhdensuuntainen l:n kanssa.// -- French: -//Si un point p est extérieur à une ligne l, alors il existe une ligne m telle que p soit sur la ligne m et que m soit parallèle à l.// -- German: -//Wenn ein Punkt p außerhalb einer Geraden l liegt, dann gibt es so eine Gerade m daß p auf der Geraden m liegt und m parallel zu l ist.// -- Italian: -//Se un punto p è esteriore a una linea l, allora esiste una linea m tale che p sia sulla linea m e che m sia parallela a l.// -- Swedish: -//Om en punkt p ligger utanför linje l, så finns det en linje m sådan att p ligger på m och m är parallell med l.// - - -As a stand-alone programming language, GF was first implemented in 1998. This -took place at Xerox Research Centre Europe in Grenoble, within a project entitled -//Multilingual Document Authoring//. The goal of the project was to build a tool -for writing documents in multiple languages simultaneously, so that the user -need only know one of the languages; the rest will be produced automatically -via translations from the type-theoretical semantics (Dymetman & al. 2000). -In addition to GF itself, the project produced some prototype applications, -e.g. a restaurant phrase book and an editor for medical drug descriptions. -An important aspect was the adaptability of the system to new domains and -languages; hence the need for a language where such adaptations can be made -by just writing new grammars. - -Most grammars that were built in the Xerox project -remained property of Xerox Corporation, but the GF formalism and its -implementation were released as open-source software under the GNU General -Public License. 
From 1999, the development of GF continued mostly at -the Department of Computing Science of Chalmers University of Technology -and Gothenburg University. In this environment, both functional programming -and type theory are strong research areas. This helped GF to develop into -a more stable and more full-fledged programming language. - -At Chalmers GF was soon used in courses given to computer science -students and in joint projects with non-linguist research groups. -This activity was soon summarized in the idea of making GF into -"the working programmer's grammar formalism", as -opposed to a tool requiring linguistic expertise. A nice experience from -the courses (both graduate and undergraduate) was that computer scientists are -often very interested in languages and have firm intuitions on grammar; given -a suitable programming tool, they can achieve impressive results in a short time. -GF was to be made into such a tool, which meant above all that -it was developed in the way programming languages are, -following the virtues of familiarity and "the least surprise". -Issues of stability are also important, including -backward compatibility and portability to different platforms. -As a mark of stability, version 1.0 of GF was released in -2002. Documentation -was also found important. In addition to on-line documents, -a reference article appeared in the -//Journal of Functional Programming// in 2004, -and a long tutorial text was published in the post-publication of -ESSLLI lecture notes. - -The first full-scale applications of GF were natural-language interfaces. -The first one was for the proof editor Alfa (Hallgren & Ranta 2000). -The second one was a syntax editor and a natural-language interface to the -software specification language OCL (Object Constraint Language) built -within the KeY project (Ahrendt & al. 2006). -These projects boosted the implementation side -of GF itself, in particular, the graphical syntax editor (Khegai & al. 2003). 
-At the same time, some major mathematical properties of GF were established -in the PhD thesis of Peter Ljunglöf (2004), which led to improved -parser implementations. - -At the same time as GF was used in joint projects with computer science groups, -collaboration with the Linguistics Department of -Gothenburg University served as a "linguistic sanity check" of GF. Two efforts -that have been formative to the development of GF were started within this -collaboration: -- resource grammar libraries -- dialogue system applications - - -It was the resource grammar libraries that made GF really usable for non-linguist -programmers in more serious projects. They were heavily missed in the Alfa -project, and heavily used and improved in the KeY project. The development of -the library started in 2002; a version stable enough to be released with number -1.0 was complete in 2006, comprising ten languages. - -Dialogue systems, on the other hand, turned -out to be a major source of interesting problems and also of successful solutions. -Much of this work was carried out in the European project TALK (Tools for Ambient -Linguistic Knowledge, 2004-2006), also involving sites from Cambridge, -Edinburgh, and BMW in Munich. In addition to complete systems, the TALK -project produced supporting tools for embedded grammars -and speech recognition, as well as additions of spoken language structures -to the resource grammar library. - -Besides dialogue systems, multilingual authoring and translation continued -to be the main application of GF. The European WebALT project (Web Advanced -Learning Technologies, 2005-2006), used GF to build a tool for translating -mathematical exercises from formal specifications (written in MathML) to -six languages. Also a tool integrating GF with a computer algebra system was -developed. The project gave rise to a company, WebALT Inc. - -At the time of writing this (August 2007), the release of GF has version -number 2.8. 
It is a stable system that has been built with contributions -of dozens of persons and been used by at least hundreds; download figures -are in thousands. New ideas on how to use GF are posted by users almost -every week. - - - -==The purpose and scope of this book== - -One purpose of this book is to serve the growing user base of GF with -a definitive manual that gathers all relevant information in one place. -However, it is also intended to serve those who want to get started with GF, and -who don't necessarily have the technical background of the typical -users. We believe that learning to program in GF is not more difficult -than learning some other programming language. As for learning the linguistic -aspects, our experience is that writing grammars is an excellent introduction -to the problems of linguistics. In this way, linguistic -theory can be learnt at the same time as it is motivated by concrete problems. - -The book thus starts with a Tutorial (Part I), which gradually explains all -the constructs of the GF programming language. Also the design and style -aspects of grammar engineering are covered, to help the user to scale -up from small to large and possibly collaborative applications. Linguistic -concepts are explained at the same time as they are introduced in grammars. - -After the Tutorial, the book continues with a manual on building applications -that have embedded grammars as components (Part II). Part III goes through some -examples of more advanced grammar writing, in particular, the internals of the -resource grammar library. -Part IV is a complete reference manual, and the two Appendices -show a grammar of the GF language and a quick reference card of GF. - -What is not given much space in the book is theoretical discussions of -GF, especially in comparison to other grammar formalisms. 
Even though important -in the development of GF as a scientifically justified framework, such -discussions are not relevant for programmers who just want to use GF - any more -than, say, a book on Haskell has to include comparisons with Java. In fact, -comparisons with Java in a Haskell introduction would make more sense -than comparisons with DCG or HPSG or LFG in a GF introduction: -many Haskell learners can already be expected to know Java, whereas -most GF learners are not expected -to know any grammar formalisms, except perhaps BNF. - - - -#PARTone - -=An overview of the tutorial= - -The tutorial gives a hands-on introduction to grammar writing in GF. -We start in Chapter 3 -by building a "Hello World" grammar, which covers greetings -in three languages: English (//hello world//), -Finnish (//terve maailma//), and Italian (//ciao mondo//). -This **multilingual grammar** is based on the most central idea of GF: -the distinction between **abstract syntax** -(the logical structure) and **concrete syntax** (the -sequence of words). - -From the "Hello World" example, we proceed -in Chapter 4 -to a larger grammar for the domain of food. -In this grammar, you can say things like -``` - this Italian cheese is delicious -``` -in English and Italian. This grammar illustrates how translation is -more than just replacement of words. For instance, the order of -words may have to be changed: -``` - Italian cheese ===> formaggio italiano -``` -Moreover, words can have different forms, and which forms -they have varies from language to language. For instance, -Italian adjectives usually have four forms where English -has just one: -``` - delicious (wine, wines, pizza, pizzas) - vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose -``` -The **morphology** of a language describes the -forms of its words, and the basics -of it are explained in Chapter 5. - -The complete description of morphology -belongs to resource grammars, whose use is covered in Chapter 6. 
-Writing resource grammars will only be covered in Part III; -however, the Tutorial does explain all the -programming concepts involved in resource grammars. - -In addition to multilinguality, **semantics** is an important aspect of GF -grammars. The concepts needed for "purely linguistic" grammars belong to -the concrete syntax part of GF, whereas semantics is expressed in the abstract -syntax. After the presentation of concrete syntax constructs, we proceed -in Chapter 7 to the enrichment of abstract syntax with **dependent types**, -**variable bindings**, and **semantic definitions**. - -English and Italian are used as example languages in many grammars. -Of course, we will not presuppose that the reader knows any Italian. -We have chosen Italian because it has a rich structure -that illustrates very well the capacities of GF. -Moreover, even those readers who don't know Italian will find many of -its words familiar, due to the Latin heritage. -The exercises will encourage the reader to -port the examples to other languages as well; in particular, -it should be instructive for the reader to look at her -own native language from the point of view of writing a grammar -implementation. - -To learn how to write GF grammars is not the only goal of -this tutorial. We will also explain the most important -commands of the GF system, mostly in passing. With these commands, -simple application programs, such as translation and -quiz systems, can be built simply by writing scripts for the -GF system. More complicated applications, such as natural-language -interfaces and dialogue systems, moreover require programming in -some general-purpose language; such applications are covered in Part II. - - -==Who should read this tutorial== - -This tutorial has been written for all programmers -who want to learn to write grammars in GF. 
-It will go through GF's programming concepts, and does not -presuppose knowledge of any of the main ingredients of GF: -linguistics, functional programming, and type theory. -This knowledge will be introduced as a part of grammar writing -practice. - -Thus the tutorial should be accessible to anyone who has some -previous experience with any programming language; the basics -of using computers are also presupposed, e.g. the use of -text editors and the management of files. - -Those who already know GF well can skip the tutorial part, -or skim through it, and go directly to the parts on applications -and advanced grammar writing. -Many of these topics will involve large scale GF programming, -and/or programming in other languages in which GF grammars are embedded. - - - -=Getting started= - -In this chapter, we will introduce the GF system and write the first GF grammar, -a "Hello World" grammar. While extremely small, this grammar already illustrates -how GF can be used for the tasks of translation and multilingual -generation. - - -==What GF is== - -We use the term GF for three different things: -- a **system** (computer program) used for working with grammars -- a **programming language** in which grammars can be written -- a **theory** about grammars and languages - - -The relation between these things is obvious: the GF system is an implementation -of the GF programming language, which in turn is built on the ideas of the -GF theory. The main focus of this book is on the GF programming language. -We learn how grammars are written in this language. At the same time, we learn -the way of thinking in the GF theory. To make this all useful and fun, and -to encourage experimenting, we make the grammars run on a computer by -using the GF system. - - - -%--! -==What GF grammars are used for== - -A grammar is a definition of a language. 
-From this definition, different language processing components -can be derived: -- **parsing**: to analyse the language -- **linearization**: to generate the language -- **translation**: to analyse one language and generate another - - -A GF grammar can be seen as a declarative program from which these -processing tasks can be automatically derived. In addition, many -other tasks are readily available for GF grammars: -- **morphological analysis**: find out the possible inflection forms of words -- **morphological synthesis**: generate all inflection forms of words -- **random generation**: generate random expressions -- **corpus generation**: generate all expressions -- **treebank generation**: generate a list of trees with their linearizations -- **teaching quizzes**: train morphology and translation -- **multilingual authoring**: create a document in many languages simultaneously -- **speech input**: optimize a speech recognition system for your grammar - - -A typical GF application is based on a **multilingual grammar** involving -translation on a special domain. 
Existing applications of this idea include -- [Alfa http://www.cs.chalmers.se/~hallgren/Alfa/Tutorial/GFplugin.html]: - a natural-language interface to a proof editor - (languages: English, French, Swedish) -- [KeY http://www.key-project.org/]: - a multilingual authoring system for creating software specifications - (languages: OCL, English, German) -- [TALK http://www.talk-project.org]: - multilingual and multimodal dialogue systems - (languages: English, Finnish, French, German, Italian, Spanish, Swedish) -- [WebALT http://webalt.math.helsinki.fi/content/index_eng.html]: - a multilingual translator of mathematical exercises - (languages: Catalan, English, Finnish, French, Spanish, Swedish) -- [Numeral translator http://www.cs.chalmers.se/~bringert/gf/translate/]: - number words from 1 to 999,999 - (88 languages) - - -The specialization of a grammar to a domain makes it possible to -obtain much better translations than in an unlimited machine translation -system. This is due to the well-defined semantics of such domains. -Grammars having this character are called **application grammars**. -They are different from most grammars written by linguists just -because they are multilingual and domain-specific. - -However, there is another kind of grammar, which we call **resource grammars**. -These are large, comprehensive grammars that can be used on any domain. -The GF Resource Grammar Library has resource grammars for 12 languages. -These grammars can be used as **libraries** to define application grammars. -In this way, it is possible to write a high-quality grammar without -knowing about linguistics: in general, to write an application grammar -by using the resource library just requires practical knowledge of -the target language, and all theoretical knowledge about its grammar -is given in the libraries. - - - - -%--! 
-==Getting the GF system== - -The GF system is open-source free software, which can be downloaded via the -GF Homepage: - -[``http://www.digitalgrammars.com/gf`` http://www.digitalgrammars.com/gf] - -There you can download -- binaries for Linux, Mac OS X, and Windows -- source code and documentation -- grammar libraries and examples - - -If you want to compile GF from source, you need a Haskell compiler. -To compile the interactive editor, you also need a Java compiler. -But normally you don't have to compile anything yourself, and you definitely -don't need to know Haskell or Java to use GF. - -We are assuming the availability of a Unix shell. Linux and Mac OS X users -have it automatically, the latter under the name "terminal". -Windows users are recommended to install Cygwin, the free Unix shell for Windows. - - -%--! -==Running the GF system== - -To start the GF system, assuming you have installed it, just type -``gf`` in the Unix (or Cygwin) shell: -``` - % gf -``` -You will see GF's welcome message and the prompt ``>``. -The command -``` - > help -``` -will give you a list of available commands. - -As a common convention in this book, we will use -- ``%`` as a prompt that marks system commands -- ``>`` as a prompt that marks GF commands - - -Thus you should not type these prompts, but only the characters that -follow them. - - - -==A "Hello World" grammar== - -The tradition in programming language tutorials is to start with a -program that prints "Hello World" on the terminal. GF should be no -exception. But our program has features that distinguish it from -most "Hello World" programs: -- **Multilinguality**: the message is printed in many languages. -- **Reversibility**: in addition to printing, you can **parse** the - message and translate it to other languages. - - -===The program: abstract syntax and concrete syntaxes=== - -A GF program, in general, is a **multilingual grammar**. 
Its main parts -are -- an **abstract syntax** -- one or more **concrete syntaxes** - - -The abstract syntax defines, in a language-independent way, what **meanings** -can be expressed in the grammar. In the "Hello World" grammar we want -to express //Greetings//, where we greet a //Recipient//, which can be -//World// or //Mum// or //Friends//. Here is the entire -GF code for the abstract syntax: -``` - -- a "Hello World" grammar - abstract Hello = { - - flags startcat = Greeting ; - - cat Greeting ; Recipient ; - - fun - Hello : Recipient -> Greeting ; - World, Mum, Friends : Recipient ; - } -``` -The code has the following parts: -- a **comment** (optional), saying what the module is doing -- a **module header** indicating that it is an abstract syntax - module named ``Hello`` -- a **module body** in braces, consisting of - - a **startcat flag declaration** stating that ``Greeting`` is the - main category, i.e. the one in which parsing and generation is - performed by default - - **category declarations** stating that ``Greeting`` and ``Recipient`` - are categories, i.e. types of meanings - - **function declarations** stating what meaning-building functions there - are; these are the three possible recipients, as well as the function - ``Hello`` constructing a greeting from a recipient - - -A concrete syntax defines a mapping from the abstract meanings to their -expressions in a language. 
We first give an English concrete syntax: -``` - concrete HelloEng of Hello = { - - lincat Greeting, Recipient = {s : Str} ; - - lin - Hello rec = {s = "hello" ++ rec.s} ; - World = {s = "world"} ; - Mum = {s = "mum"} ; - Friends = {s = "friends"} ; - } -``` -The major parts of this code are: -- a module header indicating that it is a concrete syntax of the abstract syntax - ``Hello``, itself named ``HelloEng`` -- a module body in braces, consisting of - - **linearization type definitions** stating that - ``Greeting`` and ``Recipient`` are **records** with a **string** ``s`` - - **linearization definitions** telling what records are assigned to - each of the meanings defined in the abstract syntax; the recipients are - linearized to records containing single words, whereas the ``Hello`` greeting - has a function telling that the word ``hello`` is prefixed to the string - contained in the argument record - - - - -To make the grammar truly multilingual, we add a Finnish and an Italian concrete -syntax: -``` - concrete HelloFin of Hello = { - lincat Greeting, Recipient = {s : Str} ; - lin - Hello rec = {s = "terve" ++ rec.s} ; - World = {s = "maailma"} ; - Mum = {s = "äiti"} ; - Friends = {s = "ystävät"} ; - } - - concrete HelloIta of Hello = { - lincat Greeting, Recipient = {s : Str} ; - lin - Hello rec = {s = "ciao" ++ rec.s} ; - World = {s = "mondo"} ; - Mum = {s = "mamma"} ; - Friends = {s = "amici"} ; - } -``` -Now we have a trilingual grammar usable for translation and -many other tasks, which we will now look into. - - - -===Using the grammar in the GF system=== - -In order to compile the grammar in GF, each of the four modules -has to be put in a file named //Modulename//``.gf``: -``` - Hello.gf HelloEng.gf HelloFin.gf HelloIta.gf -``` -The first GF command needed when using a grammar is to **import** it. -The command has a long name, ``import``, and a short name, ``i``. 
-You can thus type either -``` - > import HelloEng.gf -``` -or -``` - > i HelloEng.gf -``` -to get the same effect. In general, all GF commands have a long and a short name; -short names are convenient when typing commands by hand, whereas long command -names are more readable in scripts, i.e. files that include sequences of commands. - -The effect of ``import`` is that the GF program **compiles** your grammar -into an internal representation, and shows a new prompt when it is ready. -It will also show how much CPU time was consumed: -``` - > i HelloEng.gf - - compiling Hello.gf... wrote file Hello.gfc 8 msec - - compiling HelloEng.gf... wrote file HelloEng.gfc 12 msec - - 12 msec - > -``` -You can now use GF for **parsing**: -``` - > parse "hello world" - Hello World -``` -The ``parse`` (= ``p``) command takes a **string** -(in double quotes) and returns an **abstract syntax tree** - the meaning -of the string defined in the abstract syntax. -A tree is, in general, something easier than a string -for a machine to understand and to process further, although this -is not so obvious in this simple grammar. - -Strings that return a tree when parsed do so in virtue of the grammar -you imported. Try to parse something that is not in the grammar, and parsing will fail: -``` - > parse "hello dad" - Unknown words: dad - - > parse "world hello" - no tree found -``` -In the first example, the failure is caused by an unknown word. -In the second example, the combination of words is ungrammatical. - -In addition to parsing, you can also use GF for **linearizing** -(``linearize = l``). This is the inverse of -parsing, taking trees into strings: -``` - > linearize Hello World - hello world -``` -What is the use of this? Typically not that you type in a tree at -the GF prompt. The utility of linearization comes from the fact that -you can obtain a tree from somewhere else - for instance, from -a parser. 
A prime example of this is **translation**: you parse -with one concrete syntax and linearize with another. Let us -now do this by first importing the Italian grammar: -``` - > import HelloIta.gf -``` -We can now parse with ``HelloEng`` and **pipe** the result -into linearizing with ``HelloIta``: -``` - > parse -lang=HelloEng "hello mum" | linearize -lang=HelloIta - ciao mamma -``` -Notice that the commands must use a **language flag** to indicate -which concrete syntax is used in each operation. - -To conclude the translation exercise, we import the Finnish grammar -and pipe English parsing into **multilingual generation**: -``` - > parse -lang=HelloEng "hello friends" | linearize -multi - terve ystävät - ciao amici - hello friends -``` - -**Exercise**. Test the parsing and translation examples shown above, as well as -five other examples. - -**Exercise**. Extend the grammar ``Hello.gf`` and some of the -concrete syntaxes by five new recipients and one new greeting -form. - -**Exercise**. Add a concrete syntax for some other -languages you might know. - - - -==Using grammars from outside GF== - -A "hello world" program written e.g. in C should be executable from the -Unix shell and print its output on the terminal. This is possible in GF -as well, by using the ``gf`` program in a Unix pipe. The ``gf`` program -can be invoked with grammar names as arguments, -``` - % gf HelloEng.gf HelloFin.gf HelloIta.gf -``` -which has the same effect as opening ``gf`` and then importing the -grammars. A command can be sent to this ``gf`` state by piping it from -Unix's ``echo`` command: -``` - % echo "l -multi Hello World" | gf HelloEng.gf HelloFin.gf HelloIta.gf -``` -which will execute the command and then quit. Alternatively, one can write -a **script**, -``` - import HelloEng.gf - import HelloFin.gf - import HelloIta.gf - linearize -multi Hello World -``` -If we name this script ``hello.gfs``, we can do -``` - $ gf -batch -s ... 
-> An -> A -``` -where each of ``A1, ..., An, A`` is a basic type (this restriction -will be relaxed later). The last type in the arrow-separated sequence -is the **value type** of the function type, the earlier types are -its **argument types**. - -In a concrete syntax, the available types include -- the type of strings, ``Str`` -- record types of form ``{`` r1 : T1 ; ... ; rn : Tn ``}`` - - -**Terms** used in linearizations have the forms -- quoted string: ``"foo"``, of type ``Str`` -- concatenation of strings: ``"foo" ++ "bar"``, of type ``Str`` -- record: ``{`` r1 = t1 ; ... ; rn = tn ``}``, - of type ``{`` r1 : T1 ; ... ; rn : Tn ``}`` -- projection ``t.r`` with a record label, of the corresponding record - field type -- argument variable ``x`` bound by the left-hand-side of a ``lin`` rule, - of the corresponding linearization type - - -Each semi-colon separated part in record types and records is called a -**field**. The identifier introduced by the left-hand-side of a field -is called a **label**. - -Each quoted string is treated as one **token**, and strings concatenated by -``++`` are treated as separate tokens. Tokens are, by default, written with -a space in between. This behaviour can be changed by ``lexer`` and ``unlexer`` -flags, as will be explained later in Section ??. - - - - - -=Designing a grammar for complex phrases= - -We will now start with a grammar that has much more structure than -the ``Hello`` grammar. We will look at how the abstract syntax -is divided into suitable categories, and how infinitely many -phrases can be built by using recursive rules. We will also -introduce **modularity** by showing how a large grammar can be -divided into modules, and how functions defined in **resource modules** -can be used to share code in and among modules. 
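
As a minimal sketch of what a recursive rule looks like, consider the following toy grammar. The names ``Count``, ``CountEng``, ``Num``, ``Zero``, and ``Succ`` are hypothetical and used for illustration only; each module would go into its own ``.gf`` file.
```
  abstract Count = {
    flags startcat = Num ;
    cat Num ;
    fun
      Zero : Num ;           -- atomic: takes no arguments
      Succ : Num -> Num ;    -- recursive: builds a Num from a Num
  }

  concrete CountEng of Count = {
    lincat Num = {s : Str} ;
    lin
      Zero   = {s = "zero"} ;
      Succ n = {s = "next" ++ "after" ++ n.s} ;
  }
```
Because ``Succ`` can be applied to its own result, the two functions already define infinitely many trees: ``Zero``, ``Succ Zero``, ``Succ (Succ Zero)``, and so on. By the rules above, the last of these linearizes to //next after next after zero//.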
- - -==The abstract syntax Food== - -We will write a grammar that -defines a set of phrases usable for speaking about food: -- the main category is ``Phrase`` -- a ``Phrase`` can be built by assigning a ``Quality`` to an ``Item`` - (e.g. //this cheese is Italian//) -- an ``Item`` is built from a ``Kind`` by prefixing "this" or "that" - (e.g. //this wine//) -- a ``Kind`` is either **atomic** (e.g. //cheese//), or formed by - modifying a given ``Kind`` with a ``Quality`` (e.g. //Italian cheese//) -- a ``Quality`` is either atomic (e.g. //Italian//), - or built by modifying a given ``Quality`` (e.g. //very warm//) - - -These verbal descriptions can be expressed as the following abstract syntax: -``` - abstract Food = { - - flags startcat = Phrase ; - - cat - Phrase ; Item ; Kind ; Quality ; - - fun - Is : Item -> Quality -> Phrase ; - This, That : Kind -> Item ; - QKind : Quality -> Kind -> Kind ; - Wine, Cheese, Fish : Kind ; - Very : Quality -> Quality ; - Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ; - } -``` -In this abstract syntax, we can build ``Phrase``s such as -``` - Is (This (QKind Delicious (QKind Italian Wine))) (Very (Very Expensive)) -``` -In the English concrete syntax, we will want to linearize this into -``` - this delicious Italian wine is very very expensive -``` - - -==The concrete syntax FoodEng== - -The English concrete syntax gives no surprises: -``` - concrete FoodEng of Food = { - - lincat - Phrase, Item, Kind, Quality = {s : Str} ; - - lin - Is item quality = {s = item.s ++ "is" ++ quality.s} ; - This kind = {s = "this" ++ kind.s} ; - That kind = {s = "that" ++ kind.s} ; - QKind quality kind = {s = quality.s ++ kind.s} ; - Wine = {s = "wine"} ; - Cheese = {s = "cheese"} ; - Fish = {s = "fish"} ; - Very quality = {s = "very" ++ quality.s} ; - Fresh = {s = "fresh"} ; - Warm = {s = "warm"} ; - Italian = {s = "Italian"} ; - Expensive = {s = "expensive"} ; - Delicious = {s = "delicious"} ; - Boring = {s = "boring"} ; - } -``` -Let 
us test how the grammar works in parsing: -``` - > import FoodEng.gf - > parse "this delicious wine is very very Italian" - Is (This (QKind Delicious Wine)) (Very (Very Italian)) -``` -You can also try parsing in other categories than the ``startcat``, -by setting the command-line ``cat`` flag: -``` - > p -cat=Kind "very Italian wine" - QKind (Very Italian) Wine -``` - -**Exercise**. Extend the ``Food`` grammar by ten new food kinds and -qualities, and run the parser with new kinds of examples. - - -**Exercise**. Add a rule that enables question phrases of the form -//is this cheese Italian//. - - -**Exercise**. Enable the optional prefixing of -phrases with the words "excuse me but". Do this in such a way that -the prefix can occur at most once. - - - -==Commands for testing grammars== - -===Generating trees and strings=== - -When we have a grammar above a trivial size, especially a recursive -one, we need more efficient ways of testing it than just by parsing -sentences that happen to come to our minds. One way to do this is -based on **automatic generation**, which can be either -**random** or **exhaustive**. - -Random generation (``generate_random = gr``) is an operation that -builds a random tree in accordance with an abstract syntax: -``` - > generate_random - Is (This (QKind Italian Fish)) Fresh -``` -By using a pipe, random generation can be fed into linearization: -``` - > generate_random | linearize - this Italian fish is fresh -``` -Random generation is a good way to test a grammar; it can also -be fun. 
By using the ``number`` flag, several strings can be generated -in one command: -``` - > gr -number=10 | l - that wine is boring - that fresh cheese is fresh - that cheese is very boring - this cheese is Italian - that expensive cheese is expensive - that fish is fresh - that wine is very Italian - this wine is Italian - this cheese is boring - this fish is boring -``` -To generate //all// phrases that a grammar can produce, -GF provides the command ``generate_trees = gt``. -``` - > generate_trees | l - that cheese is very Italian - that cheese is very boring - that cheese is very delicious - that cheese is very expensive - that cheese is very fresh - ... - this wine is expensive - this wine is fresh - this wine is warm - -``` -You get quite a few trees but not all of them: only up to a given -**depth** of trees. The default depth is 3; the depth can be -set by using the ``depth`` flag: -``` - > generate_trees -depth=5 | l -``` -Other options to the generation commands (like all commands) can be seen -by GF's ``help = h`` command: -``` - > help gr - > help gt -``` - -**Exercise**. If the command ``gt`` generated all -trees in your grammar, it would never terminate. Why? - -**Exercise**. Measure how many trees the grammar gives with depths 4 and 5, -respectively. You can use the Unix **word count** command ``wc`` to count lines. -**Hint**. You can pipe the output of a GF command into a Unix command by -using the escape ``?``, as follows: -``` - > generate_trees -depth=4 | ? wc -``` - - - - - -===More on pipes; tracing=== - -A pipe of GF commands can have any length, but the "output type" -(either string or tree) of one command must always match the "input type" -of the next command, in order for the result to make sense. 
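
As an illustration, here is a sketch of a four-step pipe in which the types line up, assuming that ``FoodEng.gf`` is the only grammar imported so that no language flags are needed. It uses only commands introduced above.
```
  > generate_random | linearize | parse | linearize
```
Reading from left to right, ``generate_random`` produces a tree, ``linearize`` turns the tree into a string, ``parse`` turns the string back into a tree, and the final ``linearize`` produces a string again. Each command thus receives the kind of input it expects.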
- -The intermediate results in a pipe can be observed by putting the -**tracing** flag ``-tr`` to each command whose output you -want to see: -``` - > gr -tr | l -tr | p - - Is (This Cheese) Boring - this cheese is boring - Is (This Cheese) Boring -``` -This facility is useful for test purposes: for instance, you -may want to see if a grammar is **ambiguous**, i.e. -contains strings that can be parsed in more than one way. - -**Exercise**. Extend the ``Food`` grammar so that it produces ambiguous -strings, and try out the ambiguity test. - - - -===Writing and reading files=== - -To save the outputs of GF commands into a file, you can -pipe it to the ``write_file = wf`` command, -``` - > gr -number=10 | l | write_file exx.tmp -``` -You can read the file back to GF with the -``read_file = rf`` command, -``` - > read_file exx.tmp | p -lines -``` -Notice the flag ``-lines`` given to the parsing -command. This flag tells GF to parse each line of -the file separately. Without the flag, the grammar could -not recognize the string in the file, because it is not -a sentence but a sequence of ten sentences. - -Files with examples can be used for **regression testing** -of grammars. 
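
One possible shape for such a regression test is sketched below as a GF script. It assumes that ``exx.tmp`` was written as shown above and that the grammar is unambiguous on those sentences; the output file name ``exx.out`` is a hypothetical choice.
```
  import FoodEng.gf
  read_file exx.tmp | parse -lines | linearize | write_file exx.out
```
The script parses every saved sentence and linearizes the resulting trees again. Comparing ``exx.out`` with ``exx.tmp`` outside GF, for instance with the Unix ``diff`` command, then shows whether a later change to the grammar has altered which sentences are accepted or how they are rendered. If a sentence is ambiguous, it yields several trees and hence several output lines, so the two files will differ in length.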
- - - - - -==An Italian concrete syntax== - -We write the Italian grammar in a straightforward way, by replacing -English words with their usual dictionary equivalents: -``` - concrete FoodIta of Food = { - - lincat - Phrase, Item, Kind, Quality = {s : Str} ; - - lin - Is item quality = {s = item.s ++ "è" ++ quality.s} ; - This kind = {s = "questo" ++ kind.s} ; - That kind = {s = "quello" ++ kind.s} ; - QKind quality kind = {s = kind.s ++ quality.s} ; - Wine = {s = "vino"} ; - Cheese = {s = "formaggio"} ; - Fish = {s = "pesce"} ; - Very quality = {s = "molto" ++ quality.s} ; - Fresh = {s = "fresco"} ; - Warm = {s = "caldo"} ; - Italian = {s = "italiano"} ; - Expensive = {s = "caro"} ; - Delicious = {s = "delizioso"} ; - Boring = {s = "noioso"} ; - } -``` -An alert reader, or one who already knows Italian, may notice one point in -which the change is more radical than just replacement of words: the order of -a quality and the kind it modifies in -``` - QKind quality kind = {s = kind.s ++ quality.s} ; -``` -Thus Italian says ``vino italiano`` for ``Italian wine``. - -**Exercise**. Write a concrete syntax of ``Food`` for some other language. -You will probably end up with grammatically incorrect linearizations - but don't -worry about this yet. - -**Exercise**. If you have written ``Food`` for German, Swedish, or some -other language, test with random or exhaustive generation what constructs -come out incorrect, and prepare a list of those ones that cannot be helped -with the currently available fragment of GF. You can return to your list -after having worked out Chapter 5. 
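
The effect of the different word order can be observed directly by translating a phrase. This is a sketch assuming that both ``FoodEng.gf`` and ``FoodIta.gf`` are given as shown above; the output is the one that follows from the ``FoodIta`` rules.
```
  > import FoodEng.gf
  > import FoodIta.gf
  > parse -lang=FoodEng "this Italian wine is delicious" | linearize -lang=FoodIta
  questo vino italiano è delizioso
```
The intermediate tree is ``Is (This (QKind Italian Wine)) Delicious`` in both languages. Only the ``QKind`` linearization rule decides whether the quality is written before or after the kind.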
- - - -==More applications of multilingual grammars== - -===Multilingual treebanks=== - -A **multilingual treebank** is a set of trees with their -translations in different languages: -``` - > gr -number=2 | tree_bank - - Is (That Cheese) (Very Boring) - quello formaggio è molto noioso - that cheese is very boring - - Is (That Cheese) Fresh - quello formaggio è fresco - that cheese is fresh -``` - - -===Translation session=== - -If translation is what you want to do with a set of grammars, a convenient -way to do it is to open a ``translation_session = ts``. In this session, -you can translate between all the languages that are in scope. -A dot ``.`` terminates the translation session. -``` - > ts - - trans> that very warm cheese is boring - quello formaggio molto caldo è noioso - that very warm cheese is boring - - trans> questo vino molto italiano è molto delizioso - questo vino molto italiano è molto delizioso - this very Italian wine is very delicious - - trans> . - > -``` - - -===Translation quiz=== - -This is a simple language exercise that can be automatically -generated from a multilingual grammar. The system generates a set of -random sentences, displays them in one language, and checks the user's -answer given in another language. The command ``translation_quiz = tq`` -does this in a subshell of GF. -``` - > translation_quiz FoodEng FoodIta - - Welcome to GF Translation Quiz. - The quiz is over when you have done at least 10 examples - with at least 75 % success. - You can interrupt the quiz by entering a line consisting of a dot ('.'). - - this fish is warm - questo pesce è caldo - > Yes. 
- Score 1/1 - - this cheese is Italian - questo formaggio è noioso - > No, not questo formaggio è noioso, but - questo formaggio è italiano - - Score 1/2 - this fish is expensive -``` -You can also generate a list of translation exercises and save it in a -file for later use, by the command ``translation_list = tl`` -``` - > translation_list -number=25 FoodEng FoodIta | write_file transl.txt -``` -The ``number`` flag gives the number of sentences generated. - - - -===Multilingual syntax editing=== - -Any multilingual grammar can be used in the graphical syntax editor, which is -opened by the shell -command ``gfeditor`` followed by the names of the grammar files. -Thus -``` - % gfeditor FoodEng.gf FoodIta.gf -``` -opens the editor for the two ``Food`` grammars. - -The editor supports commands for manipulating an abstract syntax tree. -The process is started by choosing a category from the "New" menu. -Choosing ``Phrase`` creates a new tree of type ``Phrase``. A new tree -is in general completely unknown: it consists of a **metavariable** -``?1``. However, since the category ``Phrase`` in ``Food`` has -only one possible constructor, ``Is``, the tree is readily -given the form ``Is ?1 ?2``. Here is what the editor looks like at -this stage: - - [food1.png] - -Editing goes on by **refinements**, i.e. choices of constructors from -the menu, until no metavariables remain. Here is a tree resulting from the -current editing session: - - [food2.png] - -Editing can be continued even when the tree is finished. The user can shift -the **focus** to some of the subtrees by clicking on it or on the corresponding -part of a linearization. In the picture, the focus is on "fish". -Since there are no metavariables, -the menu shows no refinements, but some other possible actions: -- to **change** "fish" to "cheese" or "wine" -- to **delete** "fish", i.e. change it to a metavariable -- to **wrap** "fish" in a qualification, i.e. change it to - ``QKind ? 
Fish``, where the quality can be given in a later refinement - - -In addition to menu-based editing, the tool supports refinement by parsing, -which is accessible by middle-clicking in the tree or in the linearization field. - -**Exercise**. Construct the sentence -//this very expensive cheese is very very delicious// -and its Italian translation by using ``gfeditor``. - - -==Context-free grammars and GF== - -Readers not familiar with context-free grammars, also known as BNF grammars, can -skip this section. Those that are familiar with them will find here the exact -relation between GF and context-free grammars. We will moreover show how -the BNF format can be used as input to the GF program; it is often more -concise than GF proper, but also more restricted in expressive power. - - -===The "cf" grammar format=== - -The grammar ``FoodEng`` could be written in a BNF format as follows: -``` - Is. Phrase ::= Item "is" Quality ; - That. Item ::= "that" Kind ; - This. Item ::= "this" Kind ; - QKind. Kind ::= Quality Kind ; - Cheese. Kind ::= "cheese" ; - Fish. Kind ::= "fish" ; - Wine. Kind ::= "wine" ; - Italian. Quality ::= "Italian" ; - Boring. Quality ::= "boring" ; - Delicious. Quality ::= "delicious" ; - Expensive. Quality ::= "expensive" ; - Fresh. Quality ::= "fresh" ; - Very. Quality ::= "very" Quality ; - Warm. Quality ::= "warm" ; -``` -In this format, each rule is prefixed by a **label** that gives -the constructor function GF gives in its ``fun`` rules. 
In fact, -each context-free rule is a fusion of a ``fun`` and a ``lin`` rule: -it states simultaneously that -- the label is a function from the nonterminal categories - on the right-hand side to the category on the left-hand side; - the first rule gives -``` - fun Is : Item -> Quality -> Phrase -``` -- trees built by the label are linearized in the way indicated - by the right-hand side; - the first rule gives -``` - lin Is item quality = {s = item.s ++ "is" ++ quality.s} -``` - - -The translation from BNF to GF described above is in fact used in -the GF system to convert BNF grammars into GF. BNF files are recognized -by the file name suffix ``.cf``; thus the grammar above can be -put into a file named ``food.cf`` and read into GF by -``` - > import food.cf -``` - - -===Restrictions of context-free grammars=== - -Even though we managed to write ``FoodEng`` in the context-free format, -we cannot do this for GF grammars in general. If we just try to do this -for ``FoodIta`` as well, we lose an important aspect of multilinguality: -that the order of constituents is defined separately in concrete syntax. -Thus we could not use context-free ``FoodEng`` and ``FoodIta`` in a multilingual -grammar that supports translation via common abstract syntax: the -qualification function ``QKind`` has different types in the two -grammars. - -In general terms, the separation of concrete and abstract syntax allows -three deviations from context-free grammar: -- **permutation**: changing the order of constituents -- **suppression**: omitting constituents -- **reduplication**: repeating constituents - - -The third property is the one that definitely shows that GF is -stronger than context-free: GF can define the **copy language** -``{x x | x <- (a|b)*}``, which is known not to be context-free. 
-The other properties have more to do with the kind of trees that -the grammar can associate with strings: permutation is important -in multilingual grammars, and suppression is exploited in grammars -where trees carry some hidden semantic information (see Chapter 7 -below). - -Of course, context-free grammars are also restricted from the -grammar engineering point of view. They give no support to -modules, functions, and parameters, which are so central -for the productivity of GF. - -**Exercise**. GF can also interpret unlabelled BNF grammars, by -creating labels automatically. The right-hand sides of BNF rules -can moreover be disjunctions, e.g. -``` - Quality ::= "fresh" | "Italian" | "very" Quality ; -``` -Experiment with this format in GF, possibly with a grammar that -you import from some other source, such as a programming language -document. - -**Exercise**. Define the copy language ``{x x | x <- (a|b)*}`` in GF. - - - -%--! -==Modules and files== - -GF uses suffixes to recognize different file formats. The most -important ones are: -- Source files: //Modulename//``.gf`` -- Target files: //Modulename//``.gfc`` - - -When you import ``FoodEng.gf``, you see the target files being -generated: -``` - > i FoodEng.gf - - compiling Food.gf... wrote file Food.gfc 16 msec - - compiling FoodEng.gf... wrote file FoodEng.gfc 20 msec -``` -You also see that the GF program does not only read the file -``FoodEng.gf``, but also all other files that it -depends on - in this case, ``Food.gf``. - -For each file that is compiled, a ``.gfc`` file -is generated. The GFC format (="GF Canonical") is the -"machine code" of GF, which is faster to process than -GF source files. When reading a module, GF decides whether -to use an existing ``.gfc`` file or to generate -a new one, by looking at modification times. - -**Exercise**. What happens when you import ``FoodEng.gf`` for -a second time? 
Try this in different situations: -- Right after importing it the first time (the modules are kept in - the memory of GF and need no reloading). -- After issuing the command ``empty`` (``e``), which clears the memory - of GF. -- After making a small change in ``FoodEng.gf``, be it only an added space. -- After making a change in ``Food.gf``. - - - - - -==Using operations and resource modules== - -===The golden rule of functional programming=== - -When writing a grammar, you have to type lots of -characters. You have probably -done this by the copy-and-paste method, which is a common way to -avoid repeating work. - -However, there is a more elegant way to avoid repeating work than -the copy-and-paste -method. The **golden rule of functional programming** says that -- whenever you find yourself programming by copy-and-paste, - write a function instead. - - -A function separates the shared parts of different computations from the -changing parts, its **arguments**, or **parameters**. -In functional programming languages, such as -[Haskell http://www.haskell.org], it is possible to share much more -code with functions than in languages such as C and Java, because -of higher-order functions (functions that take functions as arguments). - - -===Operation definitions=== - -GF is a functional programming language, not only in the sense that -the abstract syntax is a system of functions (``fun``), but also because -functional programming can be used when defining concrete syntax. This is -done by using a new form of judgement, with the keyword ``oper`` (for -**operation**), distinct from ``fun`` for the sake of clarity. -Here is a simple example of an operation: -``` - oper ss : Str -> {s : Str} = \x -> {s = x} ; -``` -The operation can be **applied** to an argument, and GF will -**compute** the application into a value. 
For instance, -``` - ss "boy" ===> {s = "boy"} -``` -We use the symbol ``===>`` to indicate how an expression is -computed into a value; this symbol is not a part of GF. - -Thus an ``oper`` judgement includes the name of the defined operation, -its type, and an expression defining it. As for the syntax of the defining -expression, notice the **lambda abstraction** form ``\``//x// ``->`` //t// of -the function. It reads: function with variable //x// and **function body** -//t//. - -For lambda abstraction with multiple arguments, we have the shorthand -``` - \x,y,z -> t === \x -> \y -> \z -> t -``` -The notation we have used for linearization rules, where -variables are bound on the left-hand side, is actually syntactic -sugar for abstraction: -``` - lin f x = t === lin f = \x -> t -``` - - - - - -%--! -===The ``resource`` module type=== - -Operator definitions can be included in a concrete syntax. -But they are usually not really tied to a particular -set of linearization rules. -They should rather be seen as **resources** -usable in many concrete syntaxes. - -The ``resource`` module type is used to package -``oper`` definitions into reusable resources. Here is -an example, with a handful of operations to manipulate -strings and records. -``` - resource StringOper = { - oper - SS : Type = {s : Str} ; - ss : Str -> SS = \x -> {s = x} ; - cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ; - prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ; - } -``` - - - -%--! -===Opening a resource=== - -Any number of ``resource`` modules can be -**opened** in a ``concrete`` syntax, which -makes definitions contained -in the resource usable in the concrete syntax. Here is -an example, where the resource ``StringOper`` is -opened in a new version of ``FoodEng``. 
-``` - concrete FoodEng of Food = open StringOper in { - - lincat - Phrase, Item, Kind, Quality = SS ; - - lin - Is item quality = cc item (prefix "is" quality) ; - This k = prefix "this" k ; - That k = prefix "that" k ; - QKind q k = cc q k ; - Wine = ss "wine" ; - Cheese = ss "cheese" ; - Fish = ss "fish" ; - Very = prefix "very" ; - Fresh = ss "fresh" ; - Warm = ss "warm" ; - Italian = ss "Italian" ; - Expensive = ss "expensive" ; - Delicious = ss "delicious" ; - Boring = ss "boring" ; - } -``` - -**Exercise**. Use the same string operations to write ``FoodIta`` -more concisely. - - - -%--! -===Partial application=== - -GF, like Haskell, permits **partial application** of -functions. An example of this is the rule -``` - lin This k = prefix "this" k ; -``` -which can be written more concisely -``` - lin This = prefix "this" ; -``` -The first form is perhaps more intuitive to write -but, once you get used to partial application, you will appreciate its -conciseness and elegance. The logic of partial application -is known as **currying**, with a reference to Haskell B. Curry. -The idea is that any //n//-place function can be defined as a 1-place -function whose value is an (//n//-1)-place function. Thus -``` - oper prefix : Str -> SS -> SS ; -``` -can be used as a 1-place function that takes a ``Str`` into a -function ``SS -> SS``. The expected linearization of ``This`` is exactly -a function of such a type, operating on an argument of type ``Kind`` -whose linearization is of type ``SS``. Thus we can define the -linearization directly as ``prefix "this"``. - -**Exercise**. 
Define an operation ``infix`` analogous to ``prefix``, -such that it allows you to write -``` - lin Is = infix "is" ; -``` - - - -===Testing resource modules=== - -To test a ``resource`` module independently, you must import it -with the flag ``-retain``, which tells GF to retain ``oper`` definitions -in the memory; the usual behaviour is that ``oper`` definitions -are just applied to compile linearization rules -(this is called **inlining**) and then thrown away. -``` - > i -retain StringOper.gf -``` -The command ``compute_concrete = cc`` computes any expression -formed by operations and other GF constructs. For example, -``` - > compute_concrete prefix "in" (ss "addition") - { - s : Str = "in" ++ "addition" - } -``` - - -==Grammar architecture== - -===Extending a grammar=== - -The module system of GF makes it possible to **extend** a -grammar in different ways. The syntax of extension is -shown by the following example. We extend ``Food`` by -adding a category of questions and two new functions. -``` - abstract Morefood = Food ** { - cat - Question ; - fun - QIs : Item -> Quality -> Question ; - Pizza : Kind ; - - } -``` -Parallel to the abstract syntax, extensions can -be built for concrete syntaxes: -``` - concrete MorefoodEng of Morefood = FoodEng ** { - lincat - Question = {s : Str} ; - lin - QIs item quality = {s = "is" ++ item.s ++ quality.s} ; - Pizza = {s = "pizza"} ; - } -``` -The effect of extension is that all of the contents of the extended -and extending module are put together. We also say that the new -module **inherits** the contents of the old module. - -At the same time as extending a module of the same type, a concrete -syntax module may open resources. 
The syntax is shown by the -following Italian grammar module: -``` - concrete MorefoodIta of Morefood = FoodIta ** open StringOper in { - lincat - Question = SS ; - lin - QIs item quality = ss (item.s ++ "è" ++ quality.s) ; - Pizza = ss "pizza" ; - } -``` -Resource modules can extend other resource modules, in the -same way as modules of other types can extend modules of the -same type. Thus it is possible to build resource hierarchies. - - -===Multiple inheritance=== - -Specialized vocabularies can be represented as small grammars that -only do "one thing" each. For instance, the following are grammars -for fruit and mushrooms -``` - abstract Fruit = { - cat Fruit ; - fun Apple, Peach : Fruit ; - } - - abstract Mushroom = { - cat Mushroom ; - fun Cep, Agaric : Mushroom ; - } -``` -They can afterwards be combined into bigger grammars by using -**multiple inheritance**, i.e. extension of several grammars at the -same time: -``` - abstract Foodmarket = Food, Fruit, Mushroom ** { - fun - FruitKind : Fruit -> Kind ; - MushroomKind : Mushroom -> Kind ; - } -``` -The main advantages of splitting a grammar into modules are -**reusability**, **separate compilation**, and **division of labour**. -Reusability means -that one and the same module can be put into different uses; for instance, -a module with mushroom names might be used in a mycological information system -as well as in a restaurant phrasebook. Separate compilation means that a module -once compiled into ``.gfc`` need not be compiled again unless changes have -taken place. -Division of labour means simply that programmers that are experts in -special areas can work on modules belonging to those areas. - -**Exercise**. Refactor ``Food`` by taking apart ``Wine`` into a special -``Drink`` module. - - - - - - - -==Summary of GF language features== - -Module extensions, multiple inheritance. - -Resource modules. - -Oper judgements. - -Lambda abstraction. - -The ``.cf`` grammar format. 
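
To tie module extension and multiple inheritance together, here is a sketch of how English concrete syntaxes for the ``Fruit``, ``Mushroom``, and ``Foodmarket`` abstract syntaxes might look. The module names ``FruitEng``, ``MushroomEng``, and ``FoodmarketEng`` and the chosen words are assumptions made for illustration; each module would go into its own ``.gf`` file.
```
  concrete FruitEng of Fruit = {
    lincat Fruit = {s : Str} ;
    lin
      Apple = {s = "apple"} ;
      Peach = {s = "peach"} ;
  }

  concrete MushroomEng of Mushroom = {
    lincat Mushroom = {s : Str} ;
    lin
      Cep    = {s = "cep"} ;
      Agaric = {s = "agaric"} ;
  }

  -- the concrete syntax inherits in parallel with the abstract syntax
  concrete FoodmarketEng of Foodmarket = FoodEng, FruitEng, MushroomEng ** {
    lin
      FruitKind fruit       = fruit ;      -- Fruit and Kind share the type {s : Str}
      MushroomKind mushroom = mushroom ;
  }
```
Since ``Fruit``, ``Mushroom``, and ``Kind`` all have the linearization type ``{s : Str}`` here, the two coercion functions can simply return their argument record unchanged.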
- - - - -=Grammars with parameters= - -In this Chapter, we will introduce the techniques needed for -describing the inflection of words, as well as the rules by -which proper word forms are selected in syntactic combinations. -These techniques are already needed in a very slight extension -of the Food grammar of the previous Chapter. While explaining -how the linguistic problems are solved for English and Italian, -we also explain all the language constructs GF has for -defining concrete syntax. - - - -==The problem: words have to be inflected== - -Suppose we want to say, with the vocabulary included in -``Food.gf``, things like -``` - all Italian wines are delicious -``` -What we need in addition are the plural forms -of nouns and verbs (//wines, are//), as opposed to their -singular forms. - -The introduction of plural forms requires two things: -- the **inflection** of nouns and verbs in singular and plural -- the **agreement** of the verb to subject: - the verb must have the same number as the subject - - -Different languages have different rules of inflection and agreement. -For instance, Italian has also agreement in gender (masculine vs. feminine). -We want to express such special features of languages in the -concrete syntax while ignoring them in the abstract syntax. - -To be able to do all this, we need one new judgement form -and some new expression forms. -We also need to generalize linearization types -from strings to more complex types. - -**Exercise**. Make a list of the possible forms that nouns, -adjectives, and verbs can have in some languages that you know. - - -%--! 
-==Parameters and tables== - -We define the **parameter type** of number in English by -using a new form of judgement: -``` - param Number = Sg | Pl ; -``` -To express that ``Kind`` expressions in English have a linearization -depending on number, we replace the linearization type ``{s : Str}`` -with a type where the ``s`` field is a **table** depending on number: -``` - lincat Kind = {s : Number => Str} ; -``` -The **table type** ``Number => Str`` is in many respects similar to -a function type (``Number -> Str``). The main difference is that the -argument type of a table type must always be a parameter type. This means -that the argument-value pairs can be listed in a finite table. The following -example shows such a table: -``` - lin Cheese = {s = table { - Sg => "cheese" ; - Pl => "cheeses" - } - } ; -``` -The table consists of **branches**, where a **pattern** on the -left of the arrow ``=>`` is assigned a **value** on the right. - -The application of a table to a parameter is done by the **selection** -operator ``!``. For instance, -``` - table {Sg => "cheese" ; Pl => "cheeses"} ! Pl -``` -is a selection that computes into the value ``"cheeses"``. -This computation is performed by **pattern matching**: return -the value from the first branch whose pattern matches the -selection argument. Thus -``` - table {Sg => "cheese" ; Pl => "cheeses"} ! Pl - ===> "cheeses" -``` - -**Exercise**. In a previous exercise, we made a list of the possible -forms that nouns, adjectives, and verbs can have in some languages that -you know. Now take some of the results and implement them by -using parameter type definitions and tables. Write them into a ``resource`` -module, which you can test by using the command ``compute_concrete``. - - - -%--! -==Inflection tables and paradigms== - -All English common nouns are inflected for number, most of them in the -same way: the plural form is obtained from the singular by adding the -ending //s//. 
This rule is an example of -a **paradigm** - a formula telling how the inflection -forms of a word are formed. - -From the GF point of view, a paradigm is a function that takes a **lemma** - -also known as a **dictionary form** or a **citation form** - and returns an inflection -table of the desired type. Paradigms are not functions in the sense of the -``fun`` judgements of abstract syntax (which operate on trees and not -on strings), but operations defined in ``oper`` judgements. -The following operation defines the regular noun paradigm of English: -``` - oper regNoun : Str -> {s : Number => Str} = \x -> { - s = table { - Sg => x ; - Pl => x + "s" - } - } ; -``` -The **gluing** operator ``+`` indicates that -the string held in the variable ``x`` and the ending ``"s"`` -are written together to form one **token**. Thus, for instance, -``` - (regNoun "cheese").s ! Pl ===> "cheese" + "s" ===> "cheeses" -``` - -**Exercise**. Identify cases in which the ``regNoun`` paradigm does not -apply in English, and implement some alternative paradigms. - -**Exercise**. Implement a paradigm for regular verbs in English. - -**Exercise**. Implement some regular paradigms for other languages you have -considered in earlier exercises. - - - -==Using parameters in concrete syntax== - -We can now enrich the concrete syntax definitions to -comprise morphology. This will permit a more radical -variation between languages (e.g. English and Italian) -than just the use of different words. In general, -parameters and linearization types are different in -different languages - but this has no effect on -the common abstract syntax. - -We consider a grammar ``Foods``, which is similar to -``Food``, with the addition of two plural determiners, -``` - fun These, Those : Kind -> Item ; -``` -We also add a noun which in Italian has the feminine gender; all nouns in -``Food`` were carefully chosen to be masculine! 
-``` - fun Pizza : Kind ; -``` -This will force us to deal with gender in the Italian grammar, which is what -we need for the grammar to scale up for larger applications. - - - -%--! -===Agreement=== - -In the English ``Foods`` grammar, we need just one parameter type: -``Number`` as defined above. The phrase-forming rule -``` - fun Is : Item -> Quality -> Phrase ; -``` -is affected by the number because of **subject-verb agreement**. -In English, agreement says that the verb of a sentence -must be inflected in the number of the subject. Thus we will linearize -``` - Is (This Pizza) Warm >> "this pizza is warm" - Is (These Pizza) Warm >> "these pizzas are warm" -``` -It is the **copula**, i.e. the verb //be//, that is affected. We define -the copula as the operation -``` - oper copula : Number => Str = - table { - Sg => "is" ; - Pl => "are" - } ; -``` -We don't need to inflect the copula for person and tense yet. - -The form of the copula in a sentence depends on the -**subject** of the sentence, i.e. the item -that is qualified. This means that an item must have a number to provide. -In other words, the linearization of an ``Item`` must provide a number. The -obvious way to guarantee this is by putting a number as a field in -the linearization type: -``` - lincat Item = {s : Str ; n : Number} ; -``` -Now we can write precisely the ``Is`` rule that expresses agreement: -``` - lin Is item qual = {s = item.s ++ copula ! item.n ++ qual.s} ; -``` -The copula needs a number, which it receives from the subject item. - - -===Determiners=== - -Let us turn to ``Item`` subjects and see how they receive their -numbers. The two rules -``` - fun This, These : Kind -> Item ; -``` -form ``Item``s from ``Kind``s by adding **determiners**, either -//this// or //these//. The determiners -require different numbers of their ``Kind`` arguments: ``This`` -requires the singular (//this pizza//) and ``These`` the plural -(//these pizzas//). 
The ``Kind`` is the same in both cases: ``Pizza``. -Thus a ``Kind`` must have both singular and plural forms. -The simplest way to express this is by using a table: -``` - lincat Kind = {s : Number => Str} ; -``` -The linearization rules for ``This`` and ``These`` can now be written -``` - lin This kind = { - s = "this" ++ kind.s ! Sg ; - n = Sg - } ; - - lin These kind = { - s = "these" ++ kind.s ! Pl ; - n = Pl - } ; -``` -The grammatical relation between the determiner and the noun is similar to -agreement, but due to some subtle differences, into which we don't go here, -it is often called **government**. - -Since the same pattern for determination is used four times in the ``FoodsEng`` grammar, -we codify it as an operation, -``` - oper det : - Number -> Str -> {s : Number => Str} -> {s : Str ; n : Number} = - \n,det,kind -> { - s = det ++ kind.s ! n ; - n = n - } ; -``` -In a more **lexicalized** grammar, determiners would be made into a -category of their own and given an inherent number: -``` - lincat Det = {s : Str ; n : Number} ; - fun Det : Det -> Kind -> Item ; - lin Det det kind = { - s = det.s ++ kind.s ! det.n ; - n = det.n - } ; -``` -This is essentially what is done in the linguistically motivated resource grammars. - - - -===Parametric vs. inherent features=== - -``Kind``s, as in general common nouns in English, have both singular -and plural forms; what form is chosen is determined by the construction -in which the noun is used. We say that the number is a -**parametric feature** of nouns. In GF, parametric features -appear as argument types in tables in linearization types. - -``Item``s, as in general noun phrases functioning as subjects, don't -have variation in number. The number is instead an **inherent feature**, -which the noun phrase passes to the verb. In GF, inherent features -appear as record fields in linearization types. - -A category can have both parametric and inherent features. 
As we will see -in the Italian ``Foods`` grammar, nouns have parametric number and -inherent gender: -``` - lincat Kind = {s : Number => Str ; g : Gender} ; -``` -Nothing prevents the same parameter type from appearing both -as a parametric and as an inherent feature, or the appearance of several inherent -features of the same type, etc. Determining the linearization types -of categories is one of the most crucial steps in the design of a GF -grammar. These two conditions must be in balance: -- existence: what forms are possible to build by morphological and - other means? -- need: what features are expected via agreement or government? - - -Grammar books and dictionaries give good advice on existence; for instance, -an Italian dictionary has entries such as -- **uomo**, pl. //uomini//, n.m. "man" - - -which tells us that //uomo// is a masculine noun with the plural form //uomini//. -From this alone, or with a couple more examples, we can generalize to the type -for all nouns in Italian: they have both singular and plural forms and thus -a parametric number, and they have an inherent gender. - -The distinction between parametric and inherent features can be stated in -object-oriented programming terms: a linearization type is like a **class**, -which has a **method** for linearization and also some **attributes**. -In this class, the parametric features appear as supplementary arguments to the -linearization method, whereas the inherent features appear as attributes. - -Sometimes the puzzle of making agreement and government work in a grammar has -several solutions. For instance, **precedence** in programming languages can -be equivalently described by a parametric or an inherent feature -(see Section ?? below). - -In natural language applications that use the resource grammar library, -all parameters are hidden from the user, who thereby does not need to bother -about them. 
The only thing that one has to think about is what linguistic -categories are given as linearization types to each semantic category. - - - -==An English concrete syntax for Foods with parameters== - -We repeat some of the rules above by showing the entire -module ``FoodsEng``, equipped with parameters. The parameters and -operations are, for the sake of brevity, included in the same module -and not in a separate ``resource``. However, some string operations -from the library [``Prelude`` ../../lib/prelude/Prelude.gf] -are used. -``` - --# -path=.:prelude - - concrete FoodsEng of Foods = open Prelude in { - - lincat - S, Quality = SS ; - Kind = {s : Number => Str} ; - Item = {s : Str ; n : Number} ; - - lin - Is item quality = ss (item.s ++ copula item.n ++ quality.s) ; - This = det Sg "this" ; - That = det Sg "that" ; - These = det Pl "these" ; - Those = det Pl "those" ; - QKind quality kind = {s = \\n => quality.s ++ kind.s ! n} ; - Wine = regNoun "wine" ; - Cheese = regNoun "cheese" ; - Fish = noun "fish" "fish" ; - Pizza = regNoun "pizza" ; - Very = prefixSS "very" ; - Fresh = ss "fresh" ; - Warm = ss "warm" ; - Italian = ss "Italian" ; - Expensive = ss "expensive" ; - Delicious = ss "delicious" ; - Boring = ss "boring" ; - - param - Number = Sg | Pl ; - - oper - det : Number -> Str -> {s : Number => Str} -> {s : Str ; n : Number} = - \n,d,cn -> { - s = d ++ cn.s ! n ; - n = n - } ; - noun : Str -> Str -> {s : Number => Str} = - \man,men -> {s = table { - Sg => man ; - Pl => men - } - } ; - regNoun : Str -> {s : Number => Str} = - \car -> noun car (car + "s") ; - copula : Number -> Str = - \n -> case n of { - Sg => "is" ; - Pl => "are" - } ; - } -``` -Notice the ``case`` expression in the ``copula`` rule. Such expressions -are common in functional programming languages. In GF they are just syntactic -sugar for table selections: -``` - case e of {...} === table {...} ! 
e -``` - - -==Pattern matching== - -We have so far built all expressions of the ``table`` form -from branches whose patterns are constants introduced in -``param`` definitions, as well as constant strings. -But there are more expressive patterns. Here is a summary of the possible forms: -- a constructor pattern (identifier introduced in a ``param`` definition) matches - the identical constructor -- a variable pattern (identifier other than constant parameter) matches anything -- the wild card ``_`` matches anything -- a string literal pattern, e.g. ``"s"``, matches the same string -- a disjunctive pattern ``P | ... | Q`` matches anything that - one of the disjuncts matches - - -Pattern matching is performed in the order in which the branches -appear in the table: the branch of the first matching pattern is followed. -Thus we could write the regular noun paradigm equally well as -``` - regNoun : Str -> {s : Number => Str} = - \car -> {s = table { - Sg => car ; - _ => car + "s" - } - } ; -``` -where the wildcard matches anything but the singular. - -Tables with only one branch are a common special case. -Either the value is the same for all parameters, as in -``` - lin Fish = {s = table {_ => "fish"}} ; -``` -or a parameter variable is just passed on to the right-hand-side, -as in -``` - lin QKind quality kind = {s = table {n => quality.s ++ kind.s ! n}} ; -``` -GF has syntactic sugar for writing one-branch tables concisely: -``` - \\P,...,Q => t === table {P => ... table {Q => t} ...} -``` -Thus we could rewrite the above rules -``` - lin Fish = {s = \\_ => "fish"} ; - lin QKind quality kind = {s = \\n => quality.s ++ kind.s ! n} ; -``` - - -%--! -==Hierarchic parameter types== - -The reader familiar with a functional programming language such as -[Haskell http://www.haskell.org] must have noticed the similarity -between parameter types in GF and **algebraic datatypes** (``data`` definitions -in Haskell). 
The parameter types of GF are actually a special case of algebraic -datatypes: the main restriction is that in GF, these types must be finite. -It is this restriction that makes it possible to invert linearization rules into -parsing methods. - -However, finite is not the same thing as enumerated. Even in GF, parameter -constructors can take arguments, provided these arguments are from other -parameter types - only recursion is forbidden. Such parameter types impose a -hierarchic order among parameters. They are often needed to define -the linguistically most accurate parameter systems. - -To give an example, Swedish adjectives -are inflected for number (singular or plural) and -gender (uter or neuter). These parameters would suggest 2*2=4 different -forms. However, the gender distinction is done only in the singular. Therefore, -it would be inaccurate to define adjective paradigms using the type -``Gender => Number => Str``. The following hierarchic definition -yields an accurate system of three adjectival forms. -``` - param AdjForm = ASg Gender | APl ; - param Gender = Utr | Neutr ; -``` -Here is an example of pattern matching, the paradigm of regular adjectives. -``` - oper regAdj : Str -> AdjForm => Str = \fin -> table { - ASg Utr => fin ; - ASg Neutr => fin + "t" ; - APl => fin + "a" ; - } -``` -A constructor can be used as a pattern that has patterns as arguments. For instance, -the adjectival paradigm in which the two singular forms are the same, -can be defined -``` - oper plattAdj : Str -> AdjForm => Str = \platt -> table { - ASg _ => platt ; - APl => platt + "a" ; - } -``` - - - - -%--! -==Discontinuous constituents== - -A linearization type may contain more strings than one. -An example of where this is useful are English particle -verbs, such as //switch off//. The linearization of -a sentence may place the object between the verb and the particle: -//he switched it off//. 
- -The following judgement defines transitive verbs as -**discontinuous constituents**, i.e. as having a linearization -type with two strings and not just one. -``` - lincat TV = {s : Number => Str ; part : Str} ; -``` -In the abstract syntax, we can now have a rule that combines a subject and an object -item with a transitive verb to form a sentence: -``` - fun AppTV : Item -> TV -> Item -> Phrase ; -``` -The linearization rule places the object between the two parts of the verb: -``` - lin AppTV subj tv obj = {s = subj.s ++ tv.s ! subj.n ++ obj.s ++ tv.part} ; -``` -There is no restriction on the number of discontinuous constituents -(or other fields) a ``lincat`` may contain. The only condition is that -the fields must be built from records, tables, -parameters, and ``Str``, but not functions. - -A mathematical result -about parsing in GF says that the worst-case complexity of parsing -increases with the number of discontinuous constituents. This is -potentially a reason to avoid discontinuous constituents. - -Moreover, the parsing and linearization commands only give accurate -results for categories whose linearization type has a unique ``Str`` -valued field labelled ``s``. Therefore, discontinuous constituents -are not a good idea in top-level categories accessed by the users -of a grammar application. - -**Exercise**. Define the language ``a^n b^n c^n`` in GF, i.e. -any number of //a//'s followed by the same number of //b//'s and -the same number of //c//'s. This language is not context-free, -but can be defined in GF by using discontinuous constituents. - - -==More constructs for concrete syntax== - -In this section, we go through constructs that are not necessary -in simple grammars or when the concrete syntax relies on libraries. -But they are useful when writing advanced concrete syntax implementations, -such as resource grammar libraries. They complete -our presentation of concrete syntax constructs. - - -%--! 
-===Local definitions=== - -Local definitions ("``let`` expressions") are used in functional -programming for two reasons: to structure the code into smaller -expressions, and to avoid repeated computation of one and -the same expression. Here is an example from -Italian morphology. The operation needs to analyse the -last letter of the lemma, to select a plural ending. -It also needs the stem consisting of all letters other than the last, -to add the ending to. The stem and the last letter are computed -in a local definition. -``` - oper regNoun : Str -> Noun = \vino -> - let - vin = init vino ; - o = last vino - in - case o of { - "a" => mkNoun Fem vino (vin + "e") ; - "o" | "e" => mkNoun Masc vino (vin + "i") ; - _ => mkNoun Masc vino vino - } ; -``` - - - -===Record extension and subtyping=== - -Record types and records can be **extended** with new fields. For instance, -in German it is natural to see transitive verbs as verbs with a case, which -is usually accusative or dative, and is passed to the object of the verb. -The symbol ``**`` is used for both record types and record objects. -``` - lincat TV = Verb ** {c : Case} ; - - lin Follow = regVerb "folgen" ** {c = Dative} ; -``` -To extend a record type or a record with a field whose label it -already has is a type error. It is also an error to extend a type or -object that is not a record. - -A record type //T// is a **subtype** of another one //R//, if //T// has -all the fields of //R// and possibly other fields. For instance, -an extension of a record type is always a subtype of it. - -If //T// is a subtype of //R//, an object of //T// can be used whenever -an object of //R// is required. For instance, a transitive verb can -be used whenever a verb is required. - -**Contravariance** means that a function taking an //R// as argument -can also be applied to any object of a subtype //T//. - - - -===Tuples and product types=== - -Product types and tuples are syntactic sugar for record types and records: -``` - T1 * ... 
* Tn === {p1 : T1 ; ... ; pn : Tn} - <t1, ..., tn> === {p1 = t1 ; ... ; pn = tn} -``` -Thus the labels ``p1, p2,...`` are hard-coded. - - -===Record and tuple patterns=== - -Record types of parameter types themselves count as parameter types. -A typical example is a record of agreement features, e.g. Italian -``` - oper Agr : PType = {g : Gender ; n : Number ; p : Person} ; -``` -Notice the term ``PType`` rather than just ``Type`` referring to -parameter types. Every ``PType`` is also a ``Type``, but not vice-versa. - -Pattern matching is done in the expected way, but it can moreover -utilize partial records: the branch -``` - {g = Fem} => t -``` -in a table of type ``Agr => T`` means the same as -``` - {g = Fem ; n = _ ; p = _} => t -``` -Tuple patterns are translated to record patterns in the -same way as tuples to records; partial patterns make it -possible to write, slightly surprisingly, -``` - case <g,n,p> of { - <Fem> => t - ... - } -``` - -===Regular expression patterns=== - -To define string operations computed at compile time, such -as in morphology, it is handy to use regular expression patterns: - - //p// ``+`` //q// : token consisting of //p// followed by //q// - - //p// ``*`` : token //p// repeated 0 or more times - (at most the length of the string to be matched) - - ``-`` //p// : matches anything that //p// does not match - - //x// ``@`` //p// : bind to //x// what //p// matches - - //p// ``|`` //q// : matches what either //p// or //q// matches - - -The last three apply to all types of patterns, the first two only to token strings. -As an example, we give a rule for the formation of English word forms -ending with an //s// and used in the formation of both plural nouns and -third-person present-tense verbs. 
-``` - add_s : Str -> Str = \w -> case w of { - _ + "oo" => w + "s" ; -- bamboo - _ + ("s" | "z" | "x" | "sh" | "o") => w + "es" ; -- bus, hero - _ + ("a" | "o" | "u" | "e") + "y" => w + "s" ; -- boy - x + "y" => x + "ies" ; -- fly - _ => w + "s" -- car - } ; -``` -Here is another example, the plural formation in Swedish 2nd declension. -The second branch uses a variable binding with ``@`` to cover the cases where an -unstressed pre-final vowel //e// disappears in the plural -(//nyckel-nycklar, seger-segrar, bil-bilar//): -``` - plural2 : Str -> Str = \w -> case w of { - pojk + "e" => pojk + "ar" ; - nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ; - bil => bil + "ar" - } ; -``` -Variables in regular expression patterns -are always bound to the **first match**, which is the first -in the sequence of binding lists. For example: -- ``x + "e" + y`` matches ``"peter"`` with ``x = "p", y = "ter"`` -- ``x + "er"*`` matches ``"burgerer"`` with ``x = "burg"`` - - - -**Exercise**. Implement the German **Umlaut** operation on word stems. -The operation changes the vowel of the stressed stem syllable as follows: -//a// to //ä//, //au// to //äu//, //o// to //ö//, and //u// to //ü//. You -can assume that the operation only takes syllables as arguments. Test the -operation to see whether it correctly changes //Arzt// to //Ärzt//, -//Baum// to //Bäum//, //Topf// to //Töpf//, and //Kuh// to //Küh//. - -**Exercise**. Define an operation that deletes all vowels from the -end of a string, so that e.g. "aigeia" becomes "aig". - - -===Free variation=== - -Sometimes there are many alternative ways to define a concrete syntax. -For instance, the verb negation in English can be expressed both by -//does not// and //doesn't//. In linguistic terms, these expressions -are in **free variation**. The ``variants`` construct of GF can -be used to give a list of strings in free variation. For example, -``` - NegVerb verb = {s = variants {["does not"] ; "doesn't"} ++ verb.s ! 
Pl} ; -``` -An empty variant list -``` - variants {} -``` -can be used e.g. if a word lacks a certain form. - -In general, ``variants`` should be used cautiously. It is not -recommended for modules intended as libraries, because the -user of the library has no way to choose among the variants. - - -===Prefix-dependent choices=== - -Sometimes a token has different forms depending on the token -that follows. An example is the English indefinite article, -which is //an// if a vowel follows, //a// otherwise. -Which form is chosen can only be decided at run time, i.e. -when a string is actually built. GF has a special construct for -such tokens, the ``pre`` construct exemplified in -``` - oper artIndef : Str = - pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ; -``` -Thus -``` - artIndef ++ "cheese" ---> "a" ++ "cheese" - artIndef ++ "apple" ---> "an" ++ "apple" -``` -This very example does not work in all situations: the prefix -//u// has no general rules, and some problematic words are -//euphemism, one-eyed, n-gram//. It is possible to write -``` - oper artIndef : Str = - pre {"a" ; - "a" / strs {"eu" ; "one"} ; - "an" / strs {"a" ; "e" ; "i" ; "o" ; "n-"} - } ; -``` - -**Exercise**. The Italian masculine singular definite article has three forms: -- //l'// before a vowel (any of //aeiouh//): //l'amico// ("the friend") -- //lo// before "impure s" - (any of "sb", "sc", "sd", "sf", "sg", "sm", "sp", "st", "sv", "z"): - //lo stato// ("the state") -- //il// otherwise: //il vino// ("the wine") - - -Define this by using prefix-dependent choice. - - - -===Predefined types=== - -GF has the following predefined categories in abstract syntax: -``` - cat Int ; -- integers, e.g. 0, 5, 743145151019 - cat Float ; -- floats, e.g. 0.0, 3.1415926 - cat String ; -- strings, e.g. "", "foo", "123" -``` -The objects of each of these categories are **literals** -as indicated in the comments above. 
No ``fun`` definition -can have a predefined category as its value type, but -they can be used as arguments. For example: -``` - fun StreetAddress : Int -> String -> Address ; - lin StreetAddress number street = {s = number.s ++ street.s} ; - - -- e.g. (StreetAddress 10 "Downing Street") : Address -``` -FIXME: The linearization type is ``{s : Str}`` for all these categories. - - -===Function types with variables=== - -In Chapter 8, we will introduce **dependent function types**, where -the value type depends on the argument. To this end, we need a notation -that binds a variable to the argument type, as in -``` - switchOff : (k : Kind) -> Action k -``` -Function types //without// -variables are actually a shorthand notation: writing -``` - PredVP : NP -> VP -> S -``` -is shorthand for -``` - PredVP : (x : NP) -> (y : VP) -> S -``` -or any other naming of the variables. Actually the use of variables -sometimes shortens the code, since they can share a type: -``` - octuple : (x,y,z,u,v,w,s,t : Str) -> Str -``` -If a bound variable is not used, it can here, as elsewhere in GF, be replaced by -a wildcard: -``` - octuple : (_,_,_,_,_,_,_,_ : Str) -> Str -``` -A good practice for functions with many arguments of the same type -is to indicate the number of arguments: -``` - octuple : (x1,_,_,_,_,_,_,x8 : Str) -> Str -``` -One can also use mnemonic variable names to document what -information each argument is expected to provide. -This is very handy in the types of inflection paradigms: -``` - mkV : (drink,drank,drunk : Str) -> V -``` - - -===Separating operation types and definitions=== - -In grammars intended as libraries, it is useful to separate operation -definitions from their type signatures. The user is only interested -in the type, whereas the definition is kept for the implementor and -the maintainer. 
This is possible by using separate ``oper`` fragments -for the two parts: -``` - oper regNoun : Str -> Noun ; - oper regNoun s = mkNoun s (s + "s") ; -``` -The type checker combines the two into one ``oper`` judgement to see -if the definition matches the type. Notice that, in this way, it -is possible to bind the argument variables on the left hand side -instead of using a lambda. - -In the library module, the type signatures are typically placed at -the beginning and the definitions at the end. A more radical separation -can be achieved by using the ``interface`` and ``instance`` module types -(see below Section ??): the type signatures are placed in the interface -and the definitions in the instance. - - - - -===Overloading of operations=== - -Large libraries, such as the GF Resource Grammar Library, may define -hundreds of names. This can be impractical -for both the library author and the user: the author has to invent longer -and longer names which are not always intuitive, -and the user has to learn or at least be able to find all these names. -A solution to this problem, adopted by languages such as C++, -is **overloading**: one and the same name can be used for several functions. -When such a name is used, the -compiler performs **overload resolution** to find out which of -the possible functions is meant. Overload resolution is based on -the types of the functions: all functions that -have the same name must have different types. - -In C++, functions with the same name can be scattered everywhere in the program. -In GF, they must be grouped together in ``overload`` groups. Here is an example -of an overload group, giving four different ways to define verbs in English: -``` - oper mkV : overload { - mkV : (walk : Str) -> V ; -- regular verbs - mkV : (omit,omitting : Str) -> V ; -- reg. verbs with duplication - mkV : (sing,sang,sung : Str) -> V ; -- irregular verbs - mkV : (run,ran,run,running : Str) -> V -- irreg. 
verbs with duplication - } -``` -Intuitively, the forms correspond to the way regular and irregular words -are given in most dictionaries: by listing relevant forms, instead of -referring to a paradigm number identifier. - -The ``mkV`` example above gives only the possible types of the overloaded -operation. Their definitions can be given separately, maybe in another module -(cf. the section above). An overload group with definitions looks as follows: -``` - oper mkV = overload { - mkV : (walk : Str) -> V = regV ; - mkV : (omit,omitting : Str) -> V = ... ; - mkV : (sing,sang,sung : Str) -> V = ... ; - mkV : (run,ran,run,running : Str) -> V = ... ; - } -``` -Notice that the types of the branches must be repeated so that they can be -associated with proper definitions; the order of the branches has no -significance. - - - -==The Italian Food grammar== - -We conclude the parametrization of the Food grammar by presenting an -Italian variant, now complete with parameters, inflection, and -agreement. - -The header part is similar to English: -``` ---# -path=.:prelude - -concrete FoodsIta of Foods = open Prelude in { -``` -Parameters include not only number but also gender. -``` - param - Number = Sg | Pl ; - Gender = Masc | Fem ; -``` -Qualities are inflected for gender and number, whereas kinds -have a parametric number (as in English) and an inherent gender. -Items have an inherent number (as in English) but also gender. -``` - lincat - Phr = SS ; - Quality = {s : Gender => Number => Str} ; - Kind = {s : Number => Str ; g : Gender} ; - Item = {s : Str ; g : Gender ; n : Number} ; -``` -A Quality is expressed by an adjective, which in Italian has one form for each -gender-number combination. 
-``` - oper - adjective : (_,_,_,_ : Str) -> {s : Gender => Number => Str} = - \nero,nera,neri,nere -> { - s = table { - Masc => table { - Sg => nero ; - Pl => neri - } ; - Fem => table { - Sg => nera ; - Pl => nere - } - } - } ; -``` -The very common case of regular adjectives works by adding -endings to the stem. -``` - regAdj : Str -> {s : Gender => Number => Str} = \nero -> - let ner = init nero - in adjective nero (ner + "a") (ner + "i") (ner + "e") ; -``` -For noun inflection, there are several paradigms; since only two forms -are ever needed, we will just give them explicitly (the resource grammar -library also has a paradigm that takes the singular form and infers the -plural and the gender from it). -``` - noun : Str -> Str -> Gender -> {s : Number => Str ; g : Gender} = - \man,men,g -> { - s = table { - Sg => man ; - Pl => men - } ; - g = g - } ; -``` -As in ``FoodsEng``, we need only number variation for the copula. -``` - copula : Number -> Str = - \n -> case n of { - Sg => "è" ; - Pl => "sono" - } ; -``` -Determination is more complex than in English, because of gender: -it uses separate determiner forms for the two genders, and selects -one of them as a function of the noun determined. -``` - - det : Number -> Str -> Str -> {s : Number => Str ; g : Gender} -> - {s : Str ; g : Gender ; n : Number} = - \n,m,f,cn -> { - s = case cn.g of {Masc => m ; Fem => f} ++ cn.s ! n ; - g = cn.g ; - n = n - } ; -``` -Here is, finally, the complete set of linearization rules. -``` - lin - Is item quality = - ss (item.s ++ copula item.n ++ quality.s ! item.g ! item.n) ; - This = det Sg "questo" "questa" ; - That = det Sg "quello" "quella" ; - These = det Pl "questi" "queste" ; - Those = det Pl "quelli" "quelle" ; - QKind quality kind = { - s = \\n => kind.s ! n ++ quality.s ! kind.g ! 
n ; - g = kind.g - } ; - Wine = noun "vino" "vini" Masc ; - Cheese = noun "formaggio" "formaggi" Masc ; - Fish = noun "pesce" "pesci" Masc ; - Pizza = noun "pizza" "pizze" Fem ; - Very qual = {s = \\g,n => "molto" ++ qual.s ! g ! n} ; - Fresh = adjective "fresco" "fresca" "freschi" "fresche" ; - Warm = regAdj "caldo" ; - Italian = regAdj "italiano" ; - Expensive = regAdj "caro" ; - Delicious = regAdj "delizioso" ; - Boring = regAdj "noioso" ; - } -``` -The grammars ``FoodsEng`` and ``FoodsIta`` can be found on line, and -in the GF distribution, in the directory -[``examples/tutorial/foods/`` ../../examples/tutorial/foods/]. - - -**Exercise**. Experiment with multilingual generation and translation in the -``Foods`` grammars. - - -**Exercise**. Add items, qualities, and determiners to the grammar, and try to get -their inflection and inherent features right. - -**Exercise**. Write a concrete syntax of ``Food`` for a language of your choice, -now aiming for complete grammatical correctness by the use of parameters. - - - - - -=Using the resource grammar library= - -In this chapter, we will take a look at the GF resource grammar library. -We will use the library to implement a slightly extended ``Food`` grammar -and port it to some new languages. Some new concepts of GF's module system -are also introduced, most notably the technique of **parametrized modules**, -which has become an important "design pattern" for multilingual grammars. - - -==The coverage of the library== - -The GF Resource Grammar Library contains grammar rules for -10 languages (in addition, 2 languages are available as incomplete -implementations, and a few more are under construction). Its purpose -is to make these rules available for application programmers, -who can thereby concentrate on the semantic and stylistic -aspects of their grammars, without having to think about -grammaticality. 
The targeted level of application grammarians -is that of a skilled programmer with -a practical knowledge of the target languages, but without -theoretical knowledge about their grammars. -Such a combination of -skills is typical of programmers who, for instance, want to localize -software to new languages. - -The current resource languages are -- ``Ara``bic (incomplete) -- ``Cat``alan (incomplete) -- ``Dan``ish -- ``Eng``lish -- ``Fin``nish -- ``Fre``nch -- ``Ger``man -- ``Ita``lian -- ``Nor``wegian -- ``Rus``sian -- ``Spa``nish -- ``Swe``dish - - -The first three letters (``Eng`` etc) are used in grammar module names. -The incomplete Arabic and Catalan implementations are -enough to be used in many applications; they both contain, among other -things, complete inflectional morphology. - - -==The resource API== - -The resource library API is divided into language-specific -and language-independent parts. To put it roughly, -- the syntax API is language-independent, i.e. has the same types and functions for all - languages. - Its name is ``Syntax``//L// for each language //L// -- the morphology API is language-specific, i.e. has partly different types and functions - for different languages. - Its name is ``Paradigms``//L// for each language //L// - - -A full documentation of the API is available on-line in the -[resource synopsis ../../lib/resource-1.0/synopsis.html]. For our -examples, we will only need a fragment of the full API. - -In the first examples, -we will make use of the following categories, from the module ``Syntax``. - -|| Category | Explanation | Example || -| ``Utt`` | sentence, question, word... 
| "be quiet" | -| ``Adv`` | verb-phrase-modifying adverb, | "in the house" | -| ``AdA`` | adjective-modifying adverb, | "very" | -| ``S`` | declarative sentence | "she lived here" | -| ``Cl`` | declarative clause, with all tenses | "she looks at this" | -| ``AP`` | adjectival phrase | "very warm" | -| ``CN`` | common noun (without determiner) | "red house" | -| ``NP`` | noun phrase (subject or object) | "the red house" | -| ``Det`` | determiner phrase | "those seven" | -| ``Predet`` | predeterminer | "only" | -| ``Quant`` | quantifier with both sg and pl | "this/these" | -| ``Prep`` | preposition, or just case | "in" | -| ``A`` | one-place adjective | "warm" | -| ``N`` | common noun | "house" | - - -We will need the following syntax rules from ``Syntax``. - -|| Function | Type | Example || -| ``mkUtt`` | ``S -> Utt`` | //John walked// | -| ``mkUtt`` | ``Cl -> Utt`` | //John walks// | -| ``mkCl`` | ``NP -> AP -> Cl`` | //John is very old// | -| ``mkNP`` | ``Det -> CN -> NP`` | //the first old man// | -| ``mkNP`` | ``Predet -> NP -> NP`` | //only John// | -| ``mkDet`` | ``Quant -> Det`` | //this// | -| ``mkCN`` | ``N -> CN`` | //house// | -| ``mkCN`` | ``AP -> CN -> CN`` | //very big blue house// | -| ``mkAP`` | ``A -> AP`` | //old// | -| ``mkAP`` | ``AdA -> AP -> AP`` | //very very old// | - -We will also need the following structural words from ``Syntax``. - -|| Function | Type | Example || -| ``all_Predet`` | ``Predet`` | //all// | -| ``defPlDet`` | ``Det`` | //the (houses)// | -| ``this_Quant`` | ``Quant`` | //this// | -| ``very_AdA`` | ``AdA`` | //very// | - - -For French, we will use the following part of ``ParadigmsFre``. 
- -|| Function | Type || -| ``Gender`` | ``Type`` | -| ``masculine`` | ``Gender`` | -| ``feminine`` | ``Gender`` | -| ``mkN`` | ``(cheval : Str) -> N`` | -| ``mkN`` | ``(foie : Str) -> Gender -> N`` | -| ``mkA`` | ``(cher : Str) -> A`` | -| ``mkA`` | ``(sec,seche : Str) -> A`` | - - -For German, we will use the following part of ``ParadigmsGer``. - -|| Function | Type || -| ``Gender`` | ``Type`` | -| ``masculine`` | ``Gender`` | -| ``feminine`` | ``Gender`` | -| ``neuter`` | ``Gender`` | -| ``mkN`` | ``(Stufe : Str) -> N`` | -| ``mkN`` | ``(Bild,Bilder : Str) -> Gender -> N`` | -| ``mkA`` | ``(klein : Str) -> A`` | -| ``mkA`` | ``(gut,besser,beste : Str) -> A`` | - - -**Exercise**. Try out the morphological paradigms in different languages. Do -in this way: -``` - > i -path=alltenses:prelude -retain alltenses/ParadigmsGer.gfr - > cc mkN "Farbe" - > cc mkA "gut" "besser" "beste" -``` - - -==Example: French== - -We start with an abstract syntax that is like ``Food`` before, but -has a plural determiner (//all wines//) and some new nouns that will -need different genders in most languages. -``` - abstract Food = { - cat - S ; Item ; Kind ; Quality ; - fun - Is : Item -> Quality -> S ; - This, All : Kind -> Item ; - QKind : Quality -> Kind -> Kind ; - Wine, Cheese, Fish, Beer, Pizza : Kind ; - Very : Quality -> Quality ; - Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ; - } -``` -The French implementation opens ``SyntaxFre`` and ``ParadigmsFre`` -to get access to the resource libraries needed. In order to find -the libraries, a ``path`` directive is prepended; it is interpreted -relative to the environment variable ``GF_LIB_PATH``. 
-``` - --# -path=.:present:prelude - - concrete FoodFre of Food = open SyntaxFre,ParadigmsFre in { - lincat - S = Utt ; - Item = NP ; - Kind = CN ; - Quality = AP ; - lin - Is item quality = mkUtt (mkCl item quality) ; - This kind = mkNP (mkDet this_Quant) kind ; - All kind = mkNP all_Predet (mkNP defPlDet kind) ; - QKind quality kind = mkCN quality kind ; - Wine = mkCN (mkN "vin") ; - Beer = mkCN (mkN "bière") ; - Pizza = mkCN (mkN "pizza" feminine) ; - Cheese = mkCN (mkN "fromage" masculine) ; - Fish = mkCN (mkN "poisson") ; - Very quality = mkAP very_AdA quality ; - Fresh = mkAP (mkA "frais" "fraîche") ; - Warm = mkAP (mkA "chaud") ; - Italian = mkAP (mkA "italien") ; - Expensive = mkAP (mkA "cher") ; - Delicious = mkAP (mkA "délicieux") ; - Boring = mkAP (mkA "ennuyeux") ; - } -``` -The ``lincat`` definitions in ``FoodFre`` assign **resource categories** -to **application categories**. In a sense, the application categories -are **semantic**, as they correspond to concepts in the grammar application, -whereas the resource categories are **syntactic**: they give the linguistic -means to express concepts in any application. - -The ``lin`` definitions likewise assign resource functions to application -functions. Under the hood, there is a lot of matching with parameters to -take care of word order, inflection, and agreement. But the user of the -library sees nothing of this: the only parameters you need to give are -the genders of some nouns, which cannot be correctly inferred from the word. - -In French, for example, the one-argument ``mkN`` assigns the noun the feminine -gender if and only if it ends with an //e//. Therefore the words //fromage// and -//pizza// are given genders manually. -One can of course always give genders manually, to be on the safe side. 
- -As for inflection, the one-argument adjective pattern ``mkA`` takes care of -completely regular adjectives such as //chaud-chaude//, but also of special -cases such as //italien-italienne//, //cher-chère//, and //délicieux-délicieuse//. -But it cannot form //frais-fraîche// properly. Once again, you can give more -forms to be on the safe side. You can also test the paradigms in the GF -system. - -**Exercise**. Compile the grammar ``FoodFre`` and generate and parse some sentences. - -**Exercise**. Write a concrete syntax of ``Food`` for English or some other language -included in the resource library. You can also compare the output with the hand-written -grammars presented earlier in this tutorial. - -**Exercise**. In particular, try to write a concrete syntax for Italian, even if -you don't know Italian. What you need to know is that "beer" is //birra// and -"pizza" is //pizza//, and that all the nouns and adjectives in the grammar -are regular. - - - -==Functor implementation of multilingual grammars== - -If you did the exercise of writing a concrete syntax of ``Food`` for some other -language, you probably noticed that much of the code looks exactly the same -as for French. The immediate reason for this is that the ``Syntax`` API is the -same for all languages; the deeper reason is that all languages (at least those -in the resource package) implement the same syntactic structures and tend to use them -in similar ways. Thus it is only the lexical parts of a concrete syntax that -you need to write anew for a new language. In brief, -- first copy the concrete syntax for one language -- then change the words (the strings and perhaps some paradigms) - - -But programming by copy-and-paste is not worthy of a functional programmer. -Can we write a function that takes care of the shared parts of grammar modules? -Yes, we can. It is not a function in the ``fun`` or ``oper`` sense, but -a function operating on modules, called a **functor**. 
This construct -is familiar from the functional languages ML and OCaml, but it does not -exist in Haskell. It also bears some resemblance to templates in C++. -Functors are also known as **parametrized modules**. - -In GF, a functor is a module that ``open``s one or more **interfaces**. -An ``interface`` is a module similar to a ``resource``, but it only -contains the types of ``oper``s, not their definitions. You can think -of an interface as a kind of a record type. Thus a functor is a kind -of a function taking records as arguments and producing a module -as value. - -Let us look at a functor implementation of the ``Food`` grammar. -Consider its module header first: -``` - incomplete concrete FoodI of Food = open Syntax, LexFood in -``` -In the functor-function analogy, ``FoodI`` would be presented as a function -with the following type signature: -``` - FoodI : instance of Syntax -> instance of LexFood -> concrete of Food -``` -It takes as arguments two interfaces: -- ``Syntax``, the resource grammar interface -- ``LexFood``, the domain-specific lexicon interface - - -Functors opening ``Syntax`` and a domain lexicon interface are in fact -so typical in GF applications, that this structure could be called -a **design pattern** -for GF grammars. The idea in this pattern is, again, that -the languages use the same syntactic structures but different words. - -Before going to the details of the module bodies, let us look at how functors -are concretely used. An interface has a header such as -``` - interface LexFood = open Syntax in -``` -To give an ``instance`` of it means that all ``oper``s are given definitions (of -appropriate types). For example, -``` - instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in -``` -Notice that when an interface opens an interface, such as ``Syntax``, -then its instance -opens an instance of it. But the instance may also open some other -resources - typically, -a domain lexicon instance opens a ``Paradigms`` module. 
- -In the function-functor analogy, we now have -``` - SyntaxGer : instance of Syntax - LexFoodGer : instance of LexFood -``` -Thus we can complete the German implementation by "applying" the functor: -``` - FoodI SyntaxGer LexFoodGer : concrete of Food -``` -The GF syntax for doing so is -``` - concrete FoodGer of Food = FoodI with - (Syntax = SyntaxGer), - (LexFood = LexFoodGer) ; -``` -Notice that this is the //complete// module, not just a header of it. -The module body is received from ``FoodI``, by instantiating the -interface constants with their definitions given in the German -instances. - -A module of this form, characterized by the keyword ``with``, is -called a **functor instantiation**. - -Here is the complete code for the functor ``FoodI``: -``` - incomplete concrete FoodI of Food = open Syntax, LexFood in { - lincat - S = Utt ; - Item = NP ; - Kind = CN ; - Quality = AP ; - lin - Is item quality = mkUtt (mkCl item quality) ; - This kind = mkNP (mkDet this_Quant) kind ; - All kind = mkNP all_Predet (mkNP defPlDet kind) ; - QKind quality kind = mkCN quality kind ; - Wine = mkCN wine_N ; - Beer = mkCN beer_N ; - Pizza = mkCN pizza_N ; - Cheese = mkCN cheese_N ; - Fish = mkCN fish_N ; - Very quality = mkAP very_AdA quality ; - Fresh = mkAP fresh_A ; - Warm = mkAP warm_A ; - Italian = mkAP italian_A ; - Expensive = mkAP expensive_A ; - Delicious = mkAP delicious_A ; - Boring = mkAP boring_A ; -} -``` - - -==Interfaces and instances== - -Let us now define the ``LexFood`` interface: -``` - interface LexFood = open Syntax in { - oper - wine_N : N ; - beer_N : N ; - pizza_N : N ; - cheese_N : N ; - fish_N : N ; - fresh_A : A ; - warm_A : A ; - italian_A : A ; - expensive_A : A ; - delicious_A : A ; - boring_A : A ; -} -``` -In this interface, only lexical items are declared. In general, an -interface can declare any functions and also types. The ``Syntax`` -interface does so. 
- -Here is the German instance of the interface: -``` - instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in { - oper - wine_N = mkN "Wein" ; - beer_N = mkN "Bier" "Biere" neuter ; - pizza_N = mkN "Pizza" "Pizzen" feminine ; - cheese_N = mkN "Käse" "Käsen" masculine ; - fish_N = mkN "Fisch" ; - fresh_A = mkA "frisch" ; - warm_A = mkA "warm" "wärmer" "wärmste" ; - italian_A = mkA "italienisch" ; - expensive_A = mkA "teuer" ; - delicious_A = mkA "köstlich" ; - boring_A = mkA "langweilig" ; - } -``` -Just to complete the picture, we repeat the German functor instantiation -for ``FoodI``, this time with a path directive that makes it compilable. -``` - --# -path=.:present:prelude - - concrete FoodGer of Food = FoodI with - (Syntax = SyntaxGer), - (LexFood = LexFoodGer) ; -``` - - -**Exercise**. Compile and test ``FoodGer``. - -**Exercise**. Refactor ``FoodFre`` into a functor instantiation. - - - -==Adding languages to a functor implementation== - -Once we have an application grammar defined by using a functor, -adding a new language is simple. Just two modules need to be written: -- a domain lexicon instance -- a functor instantiation - - -The functor instantiation is completely mechanical to write. -Here is one for Finnish: -``` ---# -path=.:present:prelude - -concrete FoodFin of Food = FoodI with - (Syntax = SyntaxFin), - (LexFood = LexFoodFin) ; -``` -The domain lexicon instance requires some knowledge of the words of the -language: what words are used for which concepts, how the words are -inflected, plus features such as genders. 
Here is a lexicon instance for -Finnish: -``` - instance LexFoodFin of LexFood = open SyntaxFin, ParadigmsFin in { - oper - wine_N = mkN "viini" ; - beer_N = mkN "olut" ; - pizza_N = mkN "pizza" ; - cheese_N = mkN "juusto" ; - fish_N = mkN "kala" ; - fresh_A = mkA "tuore" ; - warm_A = mkA "lämmin" ; - italian_A = mkA "italialainen" ; - expensive_A = mkA "kallis" ; - delicious_A = mkA "herkullinen" ; - boring_A = mkA "tylsä" ; - } -``` - -**Exercise**. Instantiate the functor ``FoodI`` to some language of -your choice. - - -==Division of labour revisited== - -One purpose with the resource grammars was stated to be a division -of labour between linguists and application grammarians. We can now -reflect on what this means more precisely, by asking ourselves what -skills are required of grammarians working on different components. - -Building a GF application starts from the abstract syntax. Writing -an abstract syntax requires -- understanding the semantic structure of the application domain -- knowledge of the GF fragment with categories and functions - - -If the concrete syntax is written by means of a functor, the programmer -has to decide what parts of the implementation are put to the interface -and what parts are shared in the functor. This requires -- knowing how the domain concepts are expressed in natural language -- knowledge of the resource grammar library - the categories and combinators -- understanding what parts are likely to be expressed in language-dependent - ways, so that they must belong to the interface and not the functor -- knowledge of the GF fragment with function applications and strings - - -Instantiating a ready-made functor to a new language is less demanding. 
-It requires essentially -- knowing how the domain words are expressed in the language -- knowing, roughly, how these words are inflected -- knowledge of the paradigms available in the library -- knowledge of the GF fragment with function applications and strings - - -Notice that none of these tasks requires the use of GF records, tables, -or parameters. Thus only a small fragment of GF is needed; the rest of -GF is only relevant for those who write the libraries. - -Of course, grammar writing is not always straightforward usage of libraries. -For example, GF can be used for other languages than just those in the -libraries - for both natural and formal languages. A knowledge of records -and tables can, unfortunately, also be needed for understanding GF's error -messages. - -**Exercise**. Design a small grammar that can be used for controlling -an MP3 player. The grammar should be able to recognize commands such -as //play this song//, with the following variations: -- verbs: //play//, //remove// -- objects: //song//, //artist// -- determiners: //this//, //the previous// -- verbs without arguments: //stop//, //pause// - - -The implementation goes in the following phases: -+ abstract syntax -+ functor and lexicon interface -+ lexicon instance for the first language -+ functor instantiation for the first language -+ lexicon instance for the second language -+ functor instantiation for the second language -+ ... - - - -==Restricted inheritance== - -A functor implementation using the resource ``Syntax`` interface -works as long as all concepts are expressed by using the same structures -in all languages. If this is not the case, the deviant linearization can -be made into a parameter and moved to the domain lexicon interface. - -Let us take a slightly contrived example: assume that English has -no word for ``Pizza``, but has to use the paraphrase //Italian pie//. -This paraphrase is no longer a noun ``N``, but a complex phrase -in the category ``CN``. 
An obvious way to solve this problem is -to change interface ``LexFood`` so that the constant declared for -``Pizza`` gets a new type: -``` - oper pizza_CN : CN ; -``` -But this solution is unstable: we may end up changing the interface -and the functor with each new language, and we must every time also -change the interface instances for the old languages to maintain -type correctness. - -A better solution is to use **restricted inheritance**: the English -instantiation inherits the functor implementation except for the -constant ``Pizza``. This is how we write it: -``` - --# -path=.:present:prelude - - concrete FoodEng of Food = FoodI - [Pizza] with - (Syntax = SyntaxEng), - (LexFood = LexFoodEng) ** - open SyntaxEng, ParadigmsEng in { - - lin Pizza = mkCN (mkA "Italian") (mkN "pie") ; - } -``` -Restricted inheritance is available for all inherited modules. One can for -instance exclude some mushrooms and pick up just some fruit in -the ``Foodmarket`` example: -``` - abstract Foodmarket = Food, Fruit [Peach], Mushroom - [Agaric] -``` -A concrete syntax of ``Foodmarket`` must then indicate the same inheritance -restrictions. - - -**Exercise**. Change ``FoodGer`` in such a way that it says, instead of -//X is Y//, the equivalent of //X must be Y// (//X muss Y sein//). -You will have to browse the full resource API to find all -the functions needed. - - -==Browsing the resource with GF commands== - -In addition to reading the -[resource synopsis ../../lib/resource-1.0/synopsis.html], you -can find resource function combinations by using the parser. This -is so because the resource library is in the end implemented as -a top-level ``abstract-concrete`` grammar, on which parsing -and linearization work. - -Unfortunately, only English and the Scandinavian languages can be -parsed within acceptable computer resource limits when the full -resource is used. 
- -To look for a syntax tree in the overload API by parsing, do like this: -``` - > $GF_LIB_PATH - > i -path=alltenses:prelude alltenses/OverLangEng.gfc - > p -cat=S -overload "this grammar is too big" - mkS (mkCl (mkNP (mkDet this_Quant) grammar_N) (mkAP too_AdA big_A)) -``` -To view linearizations in all languages by parsing from English: -``` - > i alltenses/langs.gfcm - > p -cat=S -lang=LangEng "this grammar is too big" | tb - UseCl TPres ASimul PPos (PredVP (DetCN (DetSg (SgQuant this_Quant) - NoOrd) (UseN grammar_N)) (UseComp (CompAP (AdAP too_AdA (PositA big_A))))) - Den här grammatiken är för stor - Esta gramática es demasiado grande - (Cyrillic: eta grammatika govorit des'at' jazykov) - Denne grammatikken er for stor - Questa grammatica è troppo grande - Diese Grammatik ist zu groß - Cette grammaire est trop grande - Tämä kielioppi on liian suuri - This grammar is too big - Denne grammatik er for stor -``` -Unfortunately, the Russian grammar uses at the moment a different -character encoding than the rest and is therefore not displayed correctly -in a terminal window. However, the GF syntax editor does display all -examples correctly: -``` - % gfeditor alltenses/langs.gfcm -``` -When you have constructed the tree, you will see the following screen: - -#BCEN - - [../../lib/resource-1.0/doc/10lang-small.png] - -#ECEN - - -**Exercise**. Find the resource grammar translations for the following -English phrases (parse in the category ``Phr``). You can first try to -build the terms manually. - -//every man loves a woman// - -//this grammar speaks more than ten languages// - -//which languages aren't in the grammar// - -//which languages did you want to speak// - - - -=Refining semantics in abstract syntax= - -While the concrete syntax constructs of GF have been already -introduced, there is much more that can be done in the abstract -syntax. 
The techniques of **dependent types** and -**higher order abstract syntax** are introduced in this Chapter, -which thereby concludes the presentation of the GF language. - - - -==GF as a logical framework== - -In this section, we will show how -to encode advanced semantic concepts in an abstract syntax. -We use concepts inherited from **type theory**. Type theory -is the basis of many systems known as **logical frameworks**, which are -used for representing mathematical theorems and their proofs on a computer. -In fact, GF has a logical framework as its proper part: -this part is the abstract syntax. - -In a logical framework, the formalization of a mathematical theory -is a set of type and function declarations. The following is an example -of such a theory, represented as an ``abstract`` module in GF. -``` -abstract Arithm = { - cat - Prop ; -- proposition - Nat ; -- natural number - fun - Zero : Nat ; -- 0 - Succ : Nat -> Nat ; -- successor of x - Even : Nat -> Prop ; -- x is even - And : Prop -> Prop -> Prop ; -- A and B - } -``` - -**Exercise**. Give a concrete syntax of ``Arithm``, either from scratch or -by using the resource library. - - - - -==Dependent types== - -**Dependent types** are a characteristic feature of GF, -inherited from the **constructive type theory** of Martin-Löf and -distinguishing GF from most other grammar formalisms and -functional programming languages. - -Dependent types can be used for stating stronger -**conditions of well-formedness** than ordinary types. -A simple example is a "smart house" system, which -defines voice commands for household appliances. This example -is borrowed from the -[Regulus Book http://cslipublications.stanford.edu/site/1575865262.html] -(Rayner & al. 2006). - -One who enters a smart house can use speech to dim lights, switch -on the fan, etc. For each ``Kind`` of a device, there is a set of -``Actions`` that can be performed on it; thus one can dim the lights but - not the fan, for example. 
These dependencies can be expressed by -making the type ``Action`` dependent on ``Kind``. We express this -as follows in ``cat`` declarations: -``` - cat - Command ; - Kind ; - Action Kind ; - Device Kind ; -``` -The crucial use of the dependencies is made in the rule for forming commands: -``` - fun CAction : (k : Kind) -> Action k -> Device k -> Command ; -``` -In other words: an action and a device can be combined into a command only -if they are of the same ``Kind`` ``k``. If we have the functions -``` - DKindOne : (k : Kind) -> Device k ; -- the light - - light, fan : Kind ; - dim : Action light ; -``` -we can form the syntax tree -``` - CAction light dim (DKindOne light) -``` -but we cannot form the trees -``` - CAction light dim (DKindOne fan) - CAction fan dim (DKindOne light) - CAction fan dim (DKindOne fan) -``` -Linearization rules are written as usual: the concrete syntax does not -know if a category is a dependent type. In English, you can write as follows: -``` - lincat Action = {s : Str} ; - lin CAction kind act dev = {s = act.s ++ dev.s} ; -``` -Notice that the argument ``kind`` does not appear in the linearization. -The type checker will be able to reconstruct it from the ``dev`` argument. - -Parsing with dependent types is performed in two phases: -+ context-free parsing -+ filtering through type checker - - -If you just parse in the usual way, you don't enter the second phase, and -the ``kind`` argument is not found: -``` - > parse "dim the light" - CAction ? dim (DKindOne light) -``` -Moreover, type-incorrect commands are not rejected: -``` - > parse "dim the fan" - CAction ? dim (DKindOne fan) -``` -The question mark ``?`` is a **metavariable**, and is returned by the parser -for any subtree that is suppressed by a linearization rule. - -To get rid of metavariables, you must feed the parse result into the -second phase of **solving** them. The ``solve`` process uses the dependent -type checker to restore the values of the metavariables. 
It is invoked by -the command ``put_tree = pt`` with the flag ``-transform=solve``: -``` - > parse "dim the light" | put_tree -transform=solve - CAction light dim (DKindOne light) -``` -The ``solve`` process may fail, in which case no tree is returned: -``` - > parse "dim the fan" | put_tree -transform=solve - no tree found -``` - - -**Exercise**. Write an abstract syntax module with the above contents -and an appropriate English concrete syntax. Try to parse the commands -//dim the light// and //dim the fan//, with and without ``solve`` filtering. - - -**Exercise**. Perform random and exhaustive generation, with and without -``solve`` filtering. - -**Exercise**. Add some device kinds and actions to the grammar. - - -==Polymorphism== - -Sometimes an action can be performed on all kinds of devices. It would be -possible to introduce separate ``fun`` constants for each kind-action pair, -but this would be tedious. Instead, one can use **polymorphic** actions, -i.e. actions that take a ``Kind`` as an argument and produce an ``Action`` -for that ``Kind``: -``` - fun switchOn, switchOff : (k : Kind) -> Action k ; -``` -Functions that are not polymorphic are **monomorphic**. However, the -dichotomy into monomorphism and full polymorphism is not always sufficient -for good semantic modelling: very typically, some actions are defined -for a proper subset of devices, but not just one. For instance, both doors and -windows can be opened, whereas lights cannot. -We will return to this problem by introducing the -concept of **restricted polymorphism** later, -after a chapter on proof objects. - - - -==Dependent types and spoken language models== - -We have used dependent types to control semantic well-formedness -in grammars. This is important in traditional type theory -applications such as proof assistants, where only mathematically -meaningful formulas should be constructed. 
But semantic filtering has -also proved important in speech recognition, because it reduces the -ambiguity of the results. - - -===Grammar-based language models=== - -The standard way of using GF in speech recognition is by building -**grammar-based language models**. To this end, GF comes with compilers -into several formats that are used in speech recognition systems. -One such format is GSL, used in the [Nuance speech recognizer www.nuance.com]. -It is produced from GF simply by printing a grammar with the flag -``-printer=gsl``. -``` - > import -conversion=finite SmartEng.gf - > print_grammar -printer=gsl - - ;GSL2.0 - ; Nuance speech recognition grammar for SmartEng - ; Generated by GF - - .MAIN SmartEng_2 - - SmartEng_0 [("switch" "off") ("switch" "on")] - SmartEng_1 ["dim" ("switch" "off") - ("switch" "on")] - SmartEng_2 [(SmartEng_0 SmartEng_3) - (SmartEng_1 SmartEng_4)] - SmartEng_3 ("the" SmartEng_5) - SmartEng_4 ("the" SmartEng_6) - SmartEng_5 "fan" - SmartEng_6 "light" -``` -Now, GSL is a context-free format, so how does it cope with dependent types? -In general, dependent types can give rise to infinitely many basic types -(exercise!), whereas a context-free grammar can by definition only have -finitely many nonterminals. - -This is where the flag ``-conversion=finite`` is needed in the ``import`` -command. Its effect is to convert a GF grammar with dependent types to -one without, so that each instance of a dependent type is replaced by -an atomic type. This can then be used as a nonterminal in a context-free -grammar. The ``finite`` conversion presupposes that every -dependent type has only finitely many instances, which is in fact -the case in the ``Smart`` grammar. - - -**Exercise**. If you have access to the Nuance speech recognizer, -test it with GF-generated language models for ``SmartEng``. Do this -both with and without ``-conversion=finite``. - -**Exercise**. Construct an abstract syntax with infinitely many instances -of dependent types. 
- - -===Statistical language models=== - -An alternative to grammar-based language models is offered by -**statistical language models** (**SLM**s). An SLM is -built from a **corpus**, i.e. a set of utterances. It specifies the -probability of each **n-gram**, i.e. sequence of //n// words. The -typical value of //n// is 2 (bigrams) or 3 (trigrams). - -One advantage of SLMs over grammar-based models is that they are -**robust**, i.e. they can be used to recognize sequences that would -be out of the grammar or the corpus. Another advantage is that -an SLM can be built "for free" if a corpus is available. - -However, collecting a corpus can require a lot of work, and writing -a grammar can be less demanding, especially with tools such as GF or -Regulus. This advantage of grammars can be combined with robustness -by creating a back-up SLM from a **synthesized corpus**. This means -simply that the grammar is used for generating such a corpus. -In GF, this can be done with the ``generate_trees`` command. -As with grammar-based models, the quality of the SLM is better -if meaningless utterances are excluded from the corpus. Thus -a good way to generate an SLM from a GF grammar is by using -dependent types and filtering the results through the type checker: -``` - > generate_trees | put_tree -transform=solve | linearize -``` - - -**Exercise**. Measure the size of the corpus generated from -``SmartEng``, with and without type checker filtering. - - - -==Digression: dependent types in concrete syntax== - -The **functional fragment** of GF -terms and types comprises function types, applications, lambda -abstracts, constants, and variables. This fragment is similar in -abstract and concrete syntax. In particular, -dependent types are also available in concrete syntax. -We have not made use of them yet, -but we will now look at one example of how they -can be used. 
- -Those readers who are familiar with functional programming languages -like ML and Haskell, may already have missed **polymorphic** -functions. For instance, Haskell programmers have access to -the functions -``` - const :: a -> b -> a - const c _ = c - - flip :: (a -> b -> c) -> b -> a -> c - flip f y x = f x y -``` -which can be used for any given types ``a``,``b``, and ``c``. - -The GF counterpart of polymorphic functions are **monomorphic** -functions with explicit **type variables**. Thus the above -definitions can be written -``` - oper const :(a,b : Type) -> a -> b -> a = - \_,_,c,_ -> c ; - - oper flip : (a,b,c : Type) -> (a -> b ->c) -> b -> a -> c = - \_,_,_,f,x,y -> f y x ; -``` -When the operations are used, the type checker requires -them to be equipped with all their arguments; this may be a nuisance -for a Haskell or ML programmer. - - - -==Proof objects== - -Perhaps the most well-known idea in constructive type theory is -the **Curry-Howard isomorphism**, also known as the -**propositions as types principle**. Its earliest formulations -were attempts to give semantics to the logical systems of -propositional and predicate calculus. In this section, we will consider -a more elementary example, showing how the notion of proof is useful -outside mathematics, as well. - -We first define the category of unary (also known as Peano-style) -natural numbers: -``` - cat Nat ; - fun Zero : Nat ; - fun Succ : Nat -> Nat ; -``` -The **successor function** ``Succ`` generates an infinite -sequence of natural numbers, beginning from ``Zero``. - -We then define what it means for a number //x// to be //less than// -a number //y//. Our definition is based on two axioms: -- ``Zero`` is less than ``Succ`` //y// for any //y//. -- If //x// is less than //y//, then ``Succ`` //x// is less than ``Succ`` //y//. 
- - -The most straightforward way of expressing these axioms in type theory -is as typing judgements that introduce objects of a type ``Less`` //x y//: -``` - cat Less Nat Nat ; - fun lessZ : (y : Nat) -> Less Zero (Succ y) ; - fun lessS : (x,y : Nat) -> Less x y -> Less (Succ x) (Succ y) ; -``` -Objects formed by ``lessZ`` and ``lessS`` are -called **proof objects**: they establish the truth of certain -mathematical propositions. -For instance, the fact that 2 is less than -4 has the proof object -``` - lessS (Succ Zero) (Succ (Succ (Succ Zero))) - (lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero))) -``` -whose type is -``` - Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero)))) -``` -which is the formalization of the proposition that 2 is less than 4. - -GF grammars can be used to provide a **semantic control** of -well-formedness of expressions. We have already seen examples of this: -the grammar of well-formed actions on household devices. By introducing proof objects -we have now added a very powerful technique of expressing semantic conditions. - -A simple example of the use of proof objects is the definition of -well-formed //time spans//: a time span is expected to be from an earlier to -a later time: -``` - from 3 to 8 -``` -is thus well-formed, whereas -``` - from 8 to 3 -``` -is not. The following rules for spans impose this condition -by using the ``Less`` predicate: -``` - cat Span ; - fun span : (m,n : Nat) -> Less m n -> Span ; -``` - -**Exercise**. Write an abstract and concrete syntax with the -concepts of this section, and experiment with it in GF. - - -**Exercise**. Define the notions of "even" and "odd" in terms -of proof objects. **Hint**. You need one function for proving -that 0 is even, and two other functions for propagating the -properties.
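
The way a proof object certifies a proposition can also be mimicked outside GF.
The following is a minimal Haskell sketch, not part of GF or of the grammar above.
Haskell has no dependent types, so the proof terms here are untyped, and the
proposition they establish is recomputed by the function ``proves``, which plays
the role that the type checker plays in GF. The names ``proves``, ``prove``, and
``nat`` are assumptions made for this illustration only.

```haskell
-- Unary natural numbers, mirroring  cat Nat ; fun Zero ; fun Succ.
data Nat = Zero | Succ Nat
  deriving (Eq, Show)

-- Untyped proof terms, mirroring the GF functions lessZ and lessS.
data Less
  = LessZ Nat            -- lessZ y     : Less Zero (Succ y)
  | LessS Nat Nat Less   -- lessS x y p : Less (Succ x) (Succ y), given p : Less x y
  deriving (Eq, Show)

-- Which proposition "x is less than y" does a proof term establish?
-- Nothing means that the term is ill-formed, i.e. it would not type check.
proves :: Less -> Maybe (Nat, Nat)
proves (LessZ y) = Just (Zero, Succ y)
proves (LessS x y p)
  | proves p == Just (x, y) = Just (Succ x, Succ y)
  | otherwise               = Nothing

-- Build a proof that m is less than n, if there is one.
prove :: Nat -> Nat -> Maybe Less
prove Zero     (Succ y) = Just (LessZ y)
prove (Succ x) (Succ y) = LessS x y <$> prove x y
prove _        Zero     = Nothing

-- Hypothetical helper: unary notation for a non-negative Int.
nat :: Int -> Nat
nat k = iterate Succ Zero !! k
```

In GHCi, ``prove (nat 3) (nat 8)`` returns a proof term, so the span //from 3 to 8//
can be built, whereas ``prove (nat 8) (nat 3)`` returns ``Nothing``, so no span
//from 8 to 3// exists.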
- - - - -===Proof-carrying documents=== - -Another possible application of proof objects is **proof-carrying documents**: -to be semantically well-formed, the abstract syntax of a document must contain a proof -of some property, although the proof is not shown in the concrete document. -Think, for instance, of small documents describing flight connections: - -//To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague.// - -The well-formedness of this text is partly expressible by dependent typing: -``` - cat - City ; - Flight City City ; - fun - Gothenburg, Frankfurt, Prague : City ; - LH3043 : Flight Gothenburg Frankfurt ; - OK0537 : Flight Frankfurt Prague ; -``` -This rules out texts saying //take OK0537 from Gothenburg to Prague//. -However, there is a -further condition saying that it must be possible to -change from LH3043 to OK0537 in Frankfurt. -This can be modelled as a proof object of a suitable type, -which is required by the constructor -that connects flights. -``` - cat - IsPossible (x,y,z : City)(Flight x y)(Flight y z) ; - fun - Connect : (x,y,z : City) -> - (u : Flight x y) -> (v : Flight y z) -> - IsPossible x y z u v -> Flight x z ; -``` - - -==Restricted polymorphism== - -In the first version of the smart house grammar ``Smart``, -all Actions were either -- **monomorphic**: defined for one Kind -- **polymorphic**: defined for all Kinds - - -To make this scale up for new Kinds, we can refine this to -**restricted polymorphism**: defined for Kinds of a certain **class** - - -The notion of class can be expressed in abstract syntax -by using the Curry-Howard isomorphism as follows: -- a class is a **predicate** of Kinds - i.e. a type depending on Kinds -- a Kind is in a class if there is a proof object of this type - - -Here is an example with switching and dimming. The classes are called -``Switchable`` and ``Dimmable``.
-``` -cat - Switchable Kind ; - Dimmable Kind ; -fun - switchable_light : Switchable light ; - switchable_fan : Switchable fan ; - dimmable_light : Dimmable light ; - - switchOn : (k : Kind) -> Switchable k -> Action k ; - dim : (k : Kind) -> Dimmable k -> Action k ; -``` -One advantage of this formalization is that classes for new -actions can be added incrementally. - -**Exercise**. Write a new version of the ``Smart`` grammar with -classes, and test it in GF. - -**Exercise**. Add some actions, kinds, and classes to the grammar. -Try to port the grammar to a new language. You will probably find -out that restricted polymorphism works differently in different languages. -For instance, in Finnish not only doors but also TVs and radios -can be "opened", which means switching them on. - - -==Variable bindings== - -Mathematical notation and programming languages have -expressions that **bind** variables. For instance, -a universally quantified proposition -``` - (All x)B(x) -``` -consists of the **binding** ``(All x)`` of the variable ``x``, -and the **body** ``B(x)``, where the variable ``x`` can have -**bound occurrences**. - -Variable bindings appear in informal mathematical language as well, for -instance, -``` - for all x, x is equal to x - - the function that for any numbers x and y returns the maximum of x+y - and x*y - - Let x be a natural number. Assume that x is even. Then x + 3 is odd. -``` -In type theory, variable-binding expression forms can be formalized -as functions that take functions as arguments. The universal -quantifier is defined -``` - fun All : (Ind -> Prop) -> Prop -``` -where ``Ind`` is the type of individuals and ``Prop``, -the type of propositions. If we have, for instance, the equality predicate -``` - fun Eq : Ind -> Ind -> Prop -``` -we may form the tree -``` - All (\x -> Eq x x) -``` -which corresponds to the ordinary notation -``` - (All x)(x = x).
-``` -An abstract syntax where trees have functions as arguments, as in -the two examples above, has turned out to be precisely the right -thing for the semantics and computer implementation of -variable-binding expressions. The advantage lies in the fact that -only one variable-binding expression form is needed, the lambda abstract -``\x -> b``, and all other bindings can be reduced to it. -This makes it easier to implement mathematical theories and reason -about them, since variable binding is tricky to implement and -to reason about. The idea of using functions as arguments of -syntactic constructors is known as **higher-order abstract syntax**. - -The question now arises: how to define linearization rules -for variable-binding expressions? -Let us first consider universal quantification, -``` - fun All : (Ind -> Prop) -> Prop -``` -We write -``` - lin All B = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s} -``` -to obtain the form shown above. -This linearization rule brings in a new GF concept - the ``$0`` -field of ``B`` containing a bound variable symbol. -The general rule is that, if an argument type of a function is -itself a function type ``A -> C``, the linearization type of -this argument is the linearization type of ``C`` -together with a new field ``$0 : Str``. In the linearization rule -for ``All``, the argument ``B`` thus has the linearization -type -``` - {$0 : Str ; s : Str}, -``` -since the linearization type of ``Prop`` is -``` - {s : Str} -``` -In other words, the linearization of a function -consists of a linearization of the body together with a -field for a linearization of the bound variable. -Those familiar with type theory or lambda calculus -should notice that GF requires trees to be in -**eta-expanded** form in order to be linearizable: -any function of type -``` - A -> B -``` -always has a syntax tree of the form -``` - \x -> b -``` -where ``b : B`` under the assumption ``x : A``. 
-It is in this form that an expression can be analysed -as having a bound variable and a body. - - -Given the linearization rule -``` - lin Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"} -``` -the linearization of -``` - \x -> Eq x x -``` -is the record -``` - {$0 = "x" ; s = ["( x = x )"]} -``` -Thus we can compute the linearization of the formula, -``` - All (\x -> Eq x x) --> {s = ["( All x ) ( x = x )"]}. -``` -How did we get the //linearization// of the variable ``x`` -into the string ``"x"``? GF grammars have no rules for -this: it is just hard-wired in GF that variable symbols are -linearized into the same strings that represent them in -the print-out of the abstract syntax. - -To be able to //parse// variable symbols, however, GF needs to know what -to look for (instead of e.g. trying to parse //any// -string as a variable). What strings are parsed as variable symbols -is defined in the lexical analysis part of GF parsing: -``` - > p -cat=Prop -lexer=codevars "(All x)(x = x)" - All (\x -> Eq x x) -``` -(see more details on lexers below). If several variables are bound in the -same argument, the labels are ``$0, $1, $2``, etc. - - -**Exercise**. Write an abstract syntax of the whole -**predicate calculus**, with the -**connectives** "and", "or", "implies", and "not", and the -**quantifiers** "exists" and "for all". Use higher-order functions -to guarantee that unbound variables do not occur. - -**Exercise**. Write a concrete syntax for your favourite -notation of predicate calculus. Use LaTeX as target language -if you want nice output. You can also try producing Haskell boolean -expressions. Use as many parentheses as you need to -guarantee non-ambiguity. - - - -==Semantic definitions== - -We have seen that, -just like functional programming languages, GF has declarations -of functions, telling what the type of a function is.
-But we have not yet shown how to **compute** -these functions: all we can do is provide them with arguments -and linearize the resulting terms. -Since our main interest is the well-formedness of expressions, -this has not yet bothered -us very much. As we will see, however, computation does play a role -even in the well-formedness of expressions when dependent types are -present. - -GF has a form of judgement for **semantic definitions**, -recognized by the keyword ``def``. At its simplest, it is just -the definition of one constant, e.g. -``` - def one = Succ Zero ; -``` -We can also define a function with arguments, -``` - def Neg A = Impl A Abs ; -``` -which is still a special case of the most general notion of -definition, that of a group of **pattern equations**: -``` - def - sum x Zero = x ; - sum x (Succ y) = Succ (sum x y) ; -``` -To compute a term is, as in functional programming languages, -simply to follow a chain of reductions until no definition -can be applied. For instance, we compute -``` - sum one one --> - sum (Succ Zero) (Succ Zero) --> - Succ (sum (Succ Zero) Zero) --> - Succ (Succ Zero) -``` -Computation in GF is performed with the ``pt`` command and the -``compute`` transformation, e.g. -``` - > p -tr "1 + 1" | pt -transform=compute -tr | l - sum one one - Succ (Succ Zero) - s(s(0)) -``` - -The ``def`` definitions of a grammar induce a notion of -**definitional equality** among trees: two trees are -definitionally equal if they compute into the same tree. -Thus, trivially, all trees in a chain of computation -(such as the one above) -are definitionally equal to each other. So are the trees -``` - sum Zero (Succ one) - Succ one - sum (sum Zero Zero) (sum (Succ Zero) one) -``` -and infinitely many other trees. - -A fact that has to be emphasized about ``def`` definitions is that -they are //not// performed as a first step of linearization.
-We say that **linearization is intensional**, which means that -the definitional equality of two trees does not imply that -they have the same linearizations. For instance, each of the seven terms -shown above has a different linearization in arithmetic notation: -``` - 1 + 1 - s(0) + s(0) - s(s(0) + 0) - s(s(0)) - 0 + s(1) - s(1) - 0 + 0 + s(0) + 1 -``` -This notion of intensionality is -no more exotic than the intensionality of any **pretty-printing** -function of a programming language (function that shows -the expressions of the language as strings). It is vital for -pretty-printing to be intensional in this sense - if we want, -for instance, to trace a chain of computation by pretty-printing each -intermediate step, what we want to see is a sequence of different -expressions that are definitionally equal. - -What is more exotic is that GF has two ways of referring to the -abstract syntax objects. In the concrete syntax, the reference is intensional. -In the abstract syntax, the reference is extensional, since -**type checking is extensional**. The reason is that, -in the type theory with dependent types, types may depend on terms. -Two types depending on terms that are definitionally equal are -equal types. For instance, -``` - Proof (Odd one) - Proof (Odd (Succ Zero)) -``` -are equal types. Hence, any tree that type checks as a proof that -1 is odd also type checks as a proof that the successor of 0 is odd. -(Recall, in this connection, that the -arguments a category depends on never play any role -in the linearization of trees of that category, -nor in the definition of the linearization type.) - -In addition to computation, definitions impose a -**paraphrase** relation on expressions: -two strings are paraphrases if they -are linearizations of trees that are -definitionally equal.
-Paraphrases are sometimes interesting for -translation: the **direct translation** -of a string, which is the linearization of the same tree -in the target language, may be inadequate because it is e.g. -unidiomatic or ambiguous. In such a case, -the translation algorithm may be made to consider -translation by a paraphrase. - -To express the distinction between -**constructors** (=**canonical** functions) -and other functions, GF has a judgement form -``data`` to tell that certain functions are canonical, e.g. -``` - data Nat = Succ | Zero ; -``` -Unlike in Haskell, but similarly to ALF (where constructor functions -are marked with a flag ``C``), -new constructors can be added to -a type with new ``data`` judgements. The type signatures of constructors -are given separately, in ordinary ``fun`` judgements. -One can also write directly -``` - data Succ : Nat -> Nat ; -``` -which is equivalent to the two judgements -``` - fun Succ : Nat -> Nat ; - data Nat = Succ ; -``` - -**Exercise**. Implement an interpreter of a small functional programming -language with natural numbers, lists, pairs, lambdas, etc. Use higher-order -abstract syntax with semantic definitions. As target language, use -your favourite programming language. - -**Exercise**. To make your interpreted language look nice, use -**precedences** instead of putting parentheses everywhere. -You can use the [precedence library ../../lib/prelude/Precedence.gf] -of GF to facilitate this. - - - -#PARTtwo - -=Embedded grammars in Haskell= - -GF grammars can be used as parts of programs written in -other programming languages. We will go through a skeleton application in -Haskell, while the next chapter will show how to build an -application in Java. - -We will show how to build a minimal resource grammar -application whose architecture scales up to much -larger applications. The application is run from the -shell by the command -``` - math -``` -whereafter it reads user input in English and French.
-It answers each input line with the truth value of -the sentence. -``` - ./math - zéro est pair - True - zero is odd - False - zero is even and zero is odd - False -``` -The source of the application consists of the following -files: -``` - LexEng.gf -- English instance of Lex - LexFre.gf -- French instance of Lex - Lex.gf -- lexicon interface - Makefile -- a makefile - MathEng.gf -- English instantiation of MathI - MathFre.gf -- French instantiation of MathI - Math.gf -- abstract syntax - MathI.gf -- concrete syntax functor for Math - Run.hs -- Haskell Main module -``` -The system was built in 22 steps explained below. - - -==Writing GF grammars== - -===Creating the first grammar=== - -1. Write ``Math.gf``, which defines what you want to say. -``` - abstract Math = { - cat Prop ; Elem ; - fun - And : Prop -> Prop -> Prop ; - Even : Elem -> Prop ; - Zero : Elem ; - } -``` -2. Write ``Lex.gf``, which defines which language-dependent -parts are needed in the concrete syntax. These are mostly -words (lexicon), but can in fact be any operations. The definitions -only use resource abstract syntax, which is opened. -``` - interface Lex = open Syntax in { - oper - even_A : A ; - zero_PN : PN ; - } -``` -3. Write ``LexEng.gf``, the English implementation of ``Lex.gf``. -This module uses English resource libraries. -``` - instance LexEng of Lex = open GrammarEng, ParadigmsEng in { - oper - even_A = regA "even" ; - zero_PN = regPN "zero" ; - - } -``` -4. Write ``MathI.gf``, a language-independent concrete syntax of -``Math.gf``. It opens interfaces, -which makes it an incomplete module, also known as a parametrized module or -functor. -``` - incomplete concrete MathI of Math = - - open Syntax, Lex in { - - flags startcat = Prop ; - - lincat - Prop = S ; - Elem = NP ; - lin - And x y = mkS and_Conj x y ; - Even x = mkS (mkCl x even_A) ; - Zero = mkNP zero_PN ; - } -``` -5.
Write ``MathEng.gf``, which is just an instantiation of ``MathI.gf``, -replacing the interfaces by their English instances. This is the module -that will be used as a top module in GF, so it contains a path to -the libraries. -``` - --# -path=.:present:prelude - - concrete MathEng of Math = MathI with - (Syntax = SyntaxEng), - (Lex = LexEng) ; -``` - - -===Testing=== - -6. Test the grammar in GF by random generation and parsing. -``` - $ gf - > i MathEng.gf - > gr -tr | l -tr | p - And (Even Zero) (Even Zero) - zero is even and zero is even - And (Even Zero) (Even Zero) -``` -The import will fail if you haven't -- correctly defined your ``GF_LIB_PATH`` as ``GF/lib`` -- installed the resource package or - compiled the resource from source by ``make`` in ``GF/lib/resource-1.0`` - - - -===Adding a new language=== - -7. Now it is time to add a new language. Write a French lexicon ``LexFre.gf``: -``` - instance LexFre of Lex = open SyntaxFre, ParadigmsFre in { - oper - even_A = mkA "pair" ; - zero_PN = mkPN "zéro" ; - } -``` -8. You also need a French concrete syntax, ``MathFre.gf``: -``` - --# -path=.:present:prelude - - concrete MathFre of Math = MathI with - (Syntax = SyntaxFre), - (Lex = LexFre) ; -``` -9. This time, you can test multilingual generation: -``` - > i MathFre.gf - > gr | tb - Even Zero - zéro est pair - zero is even -``` - - -===Extending the language=== - -10. You want to add a predicate saying that a number is odd. -It is first added to ``Math.gf``: -``` - fun Odd : Elem -> Prop ; -``` -11. You need a new word in ``Lex.gf``. -``` - oper odd_A : A ; -``` -12. Then you can give a language-independent concrete syntax in -``MathI.gf``: -``` - lin Odd x = mkS (mkCl x odd_A) ; -``` -13. The new word is implemented in ``LexEng.gf``. -``` - oper odd_A = mkA "odd" ; -``` -14. The new word is implemented in ``LexFre.gf``. -``` - oper odd_A = mkA "impair" ; -``` -15. Now you can test with the extended lexicon.
First empty -the environment to get rid of the old abstract syntax, then -import the new versions of the grammars. -``` - > e - > i MathEng.gf - > i MathFre.gf - > gr | tb - And (Odd Zero) (Even Zero) - zéro est impair et zéro est pair - zero is odd and zero is even -``` - - -==Building a user program== - -===Producing a compiled grammar package=== - -16. Your grammar is going to be used by persons who do not need -to compile it again. They may not have access to the resource library, -either. Therefore it is advisable to produce a multilingual grammar -package in a single file. We call this package ``math.gfcm`` and -produce it, when we have ``MathEng.gf`` and -``MathFre.gf`` in the GF state, by the command -``` - > pm | wf math.gfcm -``` - - -===Writing the Haskell application=== - -17. Write the Haskell main file ``Run.hs``. It uses the ``EmbedAPI`` -module defining some basic functionalities such as parsing. -The answer is produced by an interpreter of trees returned by the parser. -``` -module Main where - -import GSyntax -import GF.Embed.EmbedAPI - -main :: IO () -main = do - gr <- file2grammar "math.gfcm" - loop gr - -loop :: MultiGrammar -> IO () -loop gr = do - s <- getLine - interpret gr s - loop gr - -interpret :: MultiGrammar -> String -> IO () -interpret gr s = do - let tss = parseAll gr "Prop" s - case (concat tss) of - [] -> putStrLn "no parse" - t:_ -> print $ answer $ fg t - -answer :: GProp -> Bool -answer p = case p of - (GOdd x1) -> odd (value x1) - (GEven x1) -> even (value x1) - (GAnd x1 x2) -> answer x1 && answer x2 - -value :: GElem -> Int -value e = case e of - GZero -> 0 -``` - -18. The syntax trees manipulated by the interpreter are not raw -GF trees, but objects of the Haskell datatype ``GProp``.
-From any GF grammar, a file ``GSyntax.hs`` with -datatypes corresponding to its abstract -syntax can be produced by the command -``` - > pg -printer=haskell | wf GSyntax.hs -``` -The module also defines the overloaded functions -``gf`` and ``fg`` for translating from these types to -raw trees and back. - - -===Compiling the Haskell grammar=== - -19. Before compiling ``Run.hs``, you must check that the -embedded GF modules are found. The easiest way to do this -is by two symbolic links to your GF source directories: -``` - $ ln -s /home/aarne/GF/src/GF - $ ln -s /home/aarne/GF/src/Transfer/ -``` - -20. Now you can run the GHC Haskell compiler to produce the program. -``` - $ ghc --make -o math Run.hs -``` -The program can be tested with the command ``./math``. - - -===Building a distribution=== - -21. For a stand-alone binary-only distribution, only -the two files ``math`` and ``math.gfcm`` are needed. -For a source distribution, the files mentioned at -the beginning of this document are needed. - - -===Using a Makefile=== - -22. As a part of the source distribution, a ``Makefile`` is -essential. The ``Makefile`` is also useful when developing the -application. It should always be possible to build an executable -from source by typing ``make``. Here is a minimal such ``Makefile``: -``` - all: - echo "pm | wf math.gfcm" | gf MathEng.gf MathFre.gf - echo "pg -printer=haskell | wf GSyntax.hs" | gf math.gfcm - ghc --make -o math Run.hs -``` - - -==The Embedded GF Haskell API== - - - -=Embedded grammars in Java FORTHCOMING= - -In this chapter, we will build an application in Java similar to the one -built in Haskell in the previous chapter. This application gives -a template with the overall program structure that can be -extended with larger grammars and more Java functionalities.
- -Before the chapter is written, the document - - [``http://www.cs.chalmers.se/~bringert/gf/gf-java.html`` http://www.cs.chalmers.se/~bringert/gf/gf-java.html] - -by Björn Bringert gives more information on embedded grammars in Java. - - - -=Spoken language translators FORTHCOMING= - -In this chapter, it will be shown how a multilingual grammar is -equipped with speech recognition and speech synthesis to obtain -a spoken language translator. - -Before the chapter is written, the document - - [``http://www.cs.chalmers.se/~bringert/gf/translatespeech.html`` http://www.cs.chalmers.se/~bringert/gf/translatespeech.html] - -by Björn Bringert gives more information on spoken language translation with GF. - - - -=Multimodal dialogue systems FORTHCOMING= - -In this chapter, we will show how to build a dialogue system in GF: -a system in which the user can talk with the computer to accomplish -a task such as finding a route on a map of transfer systems. -The grammars are **multimodal**, which means that spoken input -can be completed with mouse clicks. - -Before the chapter is written, the article -"Multimodal Dialogue System Grammars" by -Björn Bringert, Robin Cooper, Peter Ljunglöf, and Aarne Ranta -(//Proceedings of DIALOR'05, Ninth Workshop on the Semantics and Pragmatics of Dialogue//, -Nancy, France, June 9-11, 2005) -provides information on multilingual grammars. The paper is available in - -[``http://www.cs.chalmers.se/~bringert/publ/mm-grammars-dialor/mm-grammars-dialor.pdf`` http://www.cs.chalmers.se/~bringert/publ/mm-grammars-dialor/mm-grammars-dialor.pdf] - - - -=Grammars of formal languages FORTHCOMING= - -In this chapter, we will build a grammar for a formal language and interface -it with natural language. - -==Precedence and fixity== - -==Extensible natural-language interfaces== - - - - -#PARTfour - -=Implementing morphology and syntax= - -In this chapter, we will dig deeper into linguistic concepts than -so far. 
We will build an implementation of a linguistically motivated -fragment of English and Italian, covering basic morphology and syntax. -The result is a miniature of the GF resource library, whose internals will -be covered in the next chapter. There are two main purposes -for this chapter: -- to understand the linguistic concepts underlying the resource - grammar library -- to get practice in the more advanced constructs of concrete syntax - - - - -==Lexical vs. syntactic rules== - -So far we have seen a grammar from a semantic point of view: -a grammar specifies a system of meanings (specified in the abstract syntax) and -tells how they are expressed in some language (as specified in a concrete syntax). -In resource grammars, as in linguistic tradition, the goal is to -specify the **grammatically correct combinations of words**, whatever their -meanings are. - -Thus the grammar has two kinds of categories and two kinds of rules: -- lexical: - - lexical categories, to classify words - - lexical rules, to define words and their properties - - -- phrasal (combinatorial, syntactic): - - phrasal categories, to classify phrases of arbitrary size - - phrasal rules, to combine phrases into larger phrases - - -Many grammar formalisms force a radical distinction between the lexical and syntactic -components; sometimes it is not even possible to express the two kinds of rules in -the same formalism. GF has no such restrictions. Nevertheless, it has turned out -to be a good discipline to maintain a distinction between the lexical and syntactic -components. - - - -==The abstract syntax== - -Let us go through the abstract syntax contained in the module ``Syntax``. -It can be found in the file -[``examples/tutorial/syntax/Syntax.gf`` examples/tutorial/syntax/Syntax.gf]. - - -===Lexical categories=== - -Words are classified into two kinds of categories: **closed** and -**open**.
The defining property of closed categories is that -their words can easily be enumerated; it is very seldom that any -new words are introduced in them. In general, closed categories -contain **structural words**, also known as **function words**. -In ``Syntax``, we have just two closed lexical categories: -``` - cat - Det ; -- determiner e.g. "this" - AdA ; -- adadjective e.g. "very" -``` -We have already used words of both categories in the ``Food`` -examples; they have just not been assigned a category, but -treated as **syncategorematic**. In GF, a syncategorematic -word is one that is introduced in a linearization rule of -some construction alongside some other expressions that -are combined; there is no abstract syntax tree for that word -alone. Thus in the rules -``` - fun That : Kind -> Item ; - lin That k = {s = "that" ++ k.s} ; -``` -the word //that// is syncategorematic. In linguistically motivated -grammars, syncategorematic words are usually avoided, whereas in -semantically motivated grammars, structural words are often treated -as syncategorematic. This is partly so because the concept expressed -by a structural word in one language is often expressed by some other -means than an individual word in another. For instance, the definite -article //the// is a determiner word in English, whereas Swedish expresses -determination by inflecting the determined noun: //the wine// is //vinet// -in Swedish. - -As for open classes, we will use four: -``` - cat - N ; -- noun e.g. "pizza" - A ; -- adjective e.g. "good" - V ; -- intransitive verb e.g. "boil" - V2 ; -- two-place verb e.g. "eat" -``` -Two-place verbs differ from intransitive verbs syntactically by -taking an object. In the lexicon, they must be equipped with information -on the //case// of the object in some languages (such as German and Latin), -and on the //preposition// in some languages (such as English).
- - - -===Lexical rules=== - -The words of closed categories can be listed once and for all in a -library. The ``Syntax`` module has the following: -``` - fun - this_Det, that_Det, these_Det, those_Det, - every_Det, theSg_Det, thePl_Det, indef_Det, plur_Det, two_Det : Det ; - very_AdA : AdA ; -``` -The naming convention for lexical rules is that we use a word followed by -the category. In this way we can for instance distinguish the determiner -//that// from the conjunction //that//. But there are also rules where this -does not quite suffice. English has no distinction between singular and -plural //the//; yet they behave differently as determiners, analogously to -//this// vs. //these//. The function ``indef_Det`` is the indefinite article -//a//, whereas ``plur_Det`` is semantically the plural indefinite article, -which has no separate word in English but does in some other languages, e.g. -//des// in French. - -Open lexical categories have no objects in ``Syntax``. However, we can -build lexical modules as extensions of ``Syntax``. An example is -[``examples/tutorial/syntax/Test.gf`` examples/tutorial/syntax/Test.gf], -which we use to test the syntax. Its vocabulary is from the food domain: -``` - abstract Test = Syntax ** { - fun - wine_N, cheese_N, fish_N, pizza_N, waiter_N, customer_N : N ; - fresh_A, warm_A, italian_A, expensive_A, delicious_A, boring_A : A ; - stink_V : V ; - eat_V2, love_V2, talk_V2 : V2 ; - } -``` - -===Phrasal categories=== - -The topmost category in ``Syntax`` is ``Phr``, **phrase**, covering -all complete sentences, which have a punctuation mark and could be -used alone to make an utterance. In addition to **declarative sentences** -``S``, there are also **question sentences** ``QS``: -``` - cat - Phr ; -- any complete sentence e.g. "Is this pizza good?" - S ; -- declarative sentence e.g. "this pizza is good" - QS ; -- question sentence e.g.
"is this pizza good" -``` -The main parts of a sentence are usually taken to be the **noun phrase** ``NP`` and -the **verb phrase** ``VP``. In analogy to noun phrases, we consider -**interrogative phrases**, which are used for forming question sentences. -``` - NP ; -- noun phrase e.g. "this pizza" - IP ; -- interrogative phrase e.g. "which pizza" - VP ; -- verb phrase e.g. "is good" -``` -The "smallest" phrasal categories are **common nouns** ``CN`` and -**adjectival phrases** ``AP``: -``` - CN ; -- common noun phrase e.g. "very good pizza" - AP ; -- adjectival phrase e.g. "very good" -``` -Common nouns are typically combined with determiners to build noun -phrases, whereas adjectival phrases are combined with the copula to -form verb phrases. - - -===Phrasal rules=== - -Phrasal rules specify how complex phrases are built from simpler ones. -At the bottom, there are **lexical insertion rules** telling how -words from each lexical category are "promoted" to phrases; i.e. how -the most elementary phrases are built. -``` - fun - UseN : N -> CN ; -- pizza - UseA : A -> AP ; -- good - UseV : V -> VP ; -- stink -``` -Structural words usually don't form phrases themselves; thus they -are primarily used for promoting "lower" phrase categories -to "higher" ones, -``` - DetCN : Det -> CN -> NP ; -- this pizza -``` -or for recursively building more complex phrases: -``` - AdAP : AdA -> AP -> AP ; -- very good -``` -In analogy to ``DetCN``, we could have a rule forming interrogative -noun phrases with interrogative determiners such as //which//. In -``Syntax``, however, we take a shortcut and just treat //which// -syncategorematically: -``` - WhichCN : CN -> IP ; -``` -Starting from the top of the grammar, we need two rules promoting -sentences and questions into complete phrases: -``` - PhrS : S -> Phr ; -- This pizza is good. - PhrQS : QS -> Phr ; -- Is this pizza good?
-``` -The most central rule in most grammars is the **predication rule**, -which combines a noun -phrase and a verb phrase into a sentence. In the present grammar, -though not in the full resource grammar library, we split this -rule into two: one for positive and one for negated sentences: -``` - PosVP, NegVP : NP -> VP -> S ; -- this pizza is/isn't good -``` -In the same way, question sentences can be formed with these two -**polarities**: -``` - QPosVP, QNegVP : NP -> VP -> QS ; -- is/isn't this pizza good -``` -Another form of question is one with an interrogative noun phrase: -``` - IPPosVP, IPNegVP : IP -> VP -> QS ; -- which pizza is/isn't good -``` -Verb phrases can be built by **complementation**, where a two-place -verb needs a noun phrase complement, and the (syncategorematic) copula -can take an adjectival phrase as complement: -``` - ComplV2 : V2 -> NP -> VP ; -- eat this pizza - ComplAP : AP -> VP ; -- be good -``` -**Adjectival modification** is a recursive rule for forming common nouns: -``` - ModCN : AP -> CN -> CN ; -- warm pizza -``` -Finally, we have two special rules that are instances of so-called -**wh-movement**. The idea with this term is that a question such -as //which pizza do you eat// is a result of moving //which pizza// -from its "proper" place, which is after the verb: //you eat which pizza//: -``` - IPPosV2, IPNegV2 : IP -> NP -> V2 -> QS ; -- which pizza do/don't you eat -``` -The full resource grammar has a more general treatment of this phenomenon. -But these special cases are already quite useful; moreover, they illustrate -variation that is possible in English between -**pied piping** (//about which pizza do you talk//) and -**preposition stranding** (//which pizza do you talk about//). - - -==Concrete syntax: English morphology== - -===Worst-case functions and data abstraction=== - -Some English nouns, such as ``mouse``, are so irregular that -it makes no sense to see them as instances of a paradigm.
Even -then, it is useful to perform **data abstraction** from the -definition of the type ``Noun``, and introduce a constructor -operation, a **worst-case function** for nouns: -``` - oper mkNoun : Str -> Str -> Noun = \x,y -> { - s = table { - Sg => x ; - Pl => y - } - } ; -``` -Thus we can define -``` - lin Mouse = mkNoun "mouse" "mice" ; -``` -and -``` - oper regNoun : Str -> Noun = \x -> - mkNoun x (x + "s") ; -``` -instead of writing the inflection tables explicitly. - -The grammar engineering advantage of worst-case functions is that -the author of the resource module may change the definitions of -``Noun`` and ``mkNoun``, and still retain the -interface (i.e. the system of type signatures) that makes it -correct to use these functions in concrete modules. In programming -terms, ``Noun`` is then treated as an **abstract datatype**. - - -===A system of paradigms using predefined string operations=== - -In addition to the completely regular noun paradigm ``regNoun``, -some other frequent noun paradigms deserve to be -defined, for instance, -``` - sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ; -``` -What about nouns like //fly//, with the plural //flies//? The already -available solution is to use the longest common prefix -//fl// (also known as the **technical stem**) as argument, and define -``` - yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ; -``` -But this paradigm would be very unintuitive to use, because the technical stem -is not an existing form of the word. A better solution is to use -the lemma and a string operator ``init``, which returns the initial segment (i.e. -all characters but the last) of a string: -``` - yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ; -``` -The operation ``init`` belongs to a set of operations in the -resource module ``Prelude``, which therefore has to be -``open``ed so that ``init`` can be used. 
-``` - > cc init "curry" - "curr" -``` -Its dual is ``last``: -``` - > cc last "curry" - "y" -``` -As generalizations of the library functions ``init`` and ``last``, GF has -two predefined functions: -``Predef.dp``, which "drops" a prefix and returns the suffix of the given length, -and ``Predef.tk``, which "takes" a prefix -by omitting the given number of characters from the end. For instance, -``` - > cc Predef.tk 3 "worried" - "worr" - > cc Predef.dp 3 "worried" - "ied" -``` -The prefix ``Predef`` is given to a handful of functions that could -not be defined internally in GF. They are available in all modules -without explicit ``open`` of the module ``Predef``. - - - -===An intelligent noun paradigm using pattern matching=== - -It may be hard for the user of a resource morphology to pick the right -inflection paradigm. A way to help with this is to define a more intelligent -paradigm, which chooses the ending by first analysing the lemma. -The following variant for English regular nouns puts together all the -previously shown paradigms, and chooses one of them on the basis of -the final letter of the lemma (found by the prelude operation ``last``). -``` - regNoun : Str -> Noun = \s -> case last s of { - "s" | "z" => mkNoun s (s + "es") ; - "y" => mkNoun s (init s + "ies") ; - _ => mkNoun s (s + "s") - } ; -``` -The paradigm ``regNoun`` does not give the correct forms for -all nouns. For instance, //mouse - mice// and -//fish - fish// must be given by using ``mkNoun``. -Also the word //boy// would be inflected incorrectly; to prevent -this, either use ``mkNoun`` or modify -``regNoun`` so that the ``"y"`` case does not -apply if the second-last character is a vowel. - -**Exercise**. Extend the ``regNoun`` paradigm so that it takes care -of all variations there are in English. Test it with the nouns -//ax//, //bamboo//, //boy//, //bush//, //hero//, //match//. -**Hint**. The library functions ``Predef.dp`` and ``Predef.tk`` -are useful in this task. - -**Exercise**.
The same rules that form plural nouns in English also -apply in the formation of third-person singular verbs. -Write a regular verb paradigm that uses this idea, but first -rewrite ``regNoun`` so that the analysis needed to build //s//-forms -is factored out as a separate ``oper``, which is shared with -``regVerb``. - - -===Morphological resource modules=== - -A common idiom is to -gather the ``oper`` and ``param`` definitions -needed for inflecting words in -a language into a morphology module. Here is a simple -example, [``MorphoEng`` resource/MorphoEng.gf]. -``` - --# -path=.:prelude - - resource MorphoEng = open Prelude in { - - param - Number = Sg | Pl ; - - oper - Noun, Verb : Type = {s : Number => Str} ; - - mkNoun : Str -> Str -> Noun = \x,y -> { - s = table { - Sg => x ; - Pl => y - } - } ; - - regNoun : Str -> Noun = \s -> case last s of { - "s" | "z" => mkNoun s (s + "es") ; - "y" => mkNoun s (init s + "ies") ; - _ => mkNoun s (s + "s") - } ; - - mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ; - - regVerb : Str -> Verb = \s -> case last s of { - "s" | "z" => mkVerb s (s + "es") ; - "y" => mkVerb s (init s + "ies") ; - "o" => mkVerb s (s + "es") ; - _ => mkVerb s (s + "s") - } ; - } -``` -The first line gives the compiler, as a hint, the -**search path** needed to find all the other modules that the -module depends on. The directory ``prelude`` is a subdirectory of -``GF/lib``; to be able to refer to it in this simple way, you can -set the environment variable ``GF_LIB_PATH`` to point to -``GF/lib``. - - -===Morphological analysis and morphology quiz=== - -Even though morphology in GF is -mostly used as an auxiliary for syntax, it -can also be useful in its own right. The command ``morpho_analyse = ma`` -can be used to read a text and return for each word the analyses that -it has in the current concrete syntax.
-``` - > rf bible.txt | morpho_analyse -``` -In the same way as translation exercises, morphological exercises can -be generated, by the command ``morpho_quiz = mq``. Usually, -the category is set to be something other than ``S``. For instance, -``` - > cd GF/lib/resource-1.0/ - > i french/IrregFre.gf - > morpho_quiz -cat=V - - Welcome to GF Morphology Quiz. - ... - - réapparaître : VFin VCondit Pl P2 - réapparaitriez - > No, not réapparaitriez, but - réapparaîtriez - Score 0/1 -``` -Finally, a list of morphological exercises can be generated -off-line and saved in a -file for later use, by the command ``morpho_list = ml``: -``` - > morpho_list -number=25 -cat=V | wf exx.txt -``` -The ``number`` flag gives the number of exercises generated. - - - -==Concrete syntax: English phrase building FORTHCOMING== - - -===Predication=== - - -===Complementization=== - - -===Determination=== - - -===Modification=== - - -===Putting the syntax together=== - - -==Concrete syntax for Italian FORTHCOMING== - - - - -=Inside the resource grammar library FORTHCOMING= - -This chapter is meant for those who want to understand the GF resource -grammar library more thoroughly - in particular, for those who -want to write their own implementations. - -Before the chapter is finished, more information can be found in - -[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/Resource-HOWTO.html`` http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/Resource-HOWTO.html] - - - - -=Building a compiler in GF FORTHCOMING= - -The purpose of this chapter is to show how the expressive power -of GF can be used in a complete definition of a language, which -includes both its syntax and semantics. We will write a grammar -for a subset of the C programming language, taking care of -parsing and type checking. As an alternative concrete syntax, -we will use JVM (Java Virtual Machine), so that we can use -the grammar to compile C code into runnable JVM code.
- -Before the chapter is finished, more information can be found in - -[``http://www.cs.chalmers.se/~aarne/GF/doc/gfcc.pdf`` http://www.cs.chalmers.se/~aarne/GF/doc/gfcc.pdf] - - - -=Using Transfer for semantic actions FORTHCOMING= - -Semantic actions on syntax trees can be defined in a general-purpose language, -as is done in embedded Java and Haskell applications. But this method has -two drawbacks: -- the definitions are not portable from one language to another -- the host languages do not support the dependent type system of GF - - -In this chapter, a powerful technique provided by a separate ``transfer`` language -is introduced, and applied to build logical representations from syntax trees, -perform anaphora resolution and generation, and optimize text generation by -aggregation. - -Before the chapter is ready, more information on ``transfer`` can be found in -[``http://www.cs.chalmers.se/~aarne/GF/doc/transfer.html`` http://www.cs.chalmers.se/~aarne/GF/doc/transfer.html] - -Many aspects of logical semantics and how they are implemented by using ``def`` -definitions are covered in the article -"Computational semantics in type theory" by Aarne Ranta -(//Mathematics and Social Sciences//, 165, pp. 31-57, 2004), -available at - -[``http://msh.revues.org/document2925.html`` http://msh.revues.org/document2925.html] - - -#PARTthree - - -=Syntax and semantics of the GF language FORTHCOMING= - -Before this chapter is written, we refer to Appendix A with a BNF grammar of GF, -and Appendix B with a quick reference. - - - -=The resource grammar API FORTHCOMING= - -Before this chapter is written, we refer to the -Resource Grammar Synopsis: - -[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/synopsis.html`` ../../lib/resource-1.0/synopsis.html] - - -=The low-level GFC format FORTHCOMING= - -This is the format generated by the GF grammar compiler. The format is -undergoing a revision, so a reference manual will appear later.
- - -=The GF system FORTHCOMING= - -==The command language of the GF shell== - -Before this chapter is written, we refer to online help obtained -in the GF shell with the command ``help``. - - -==The multilingual syntax editor== - -The -[Editor User Manual http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm] -describes the use of the editor, which works for any multilingual GF grammar. - -Here is a snapshot of the editor: - -%#BCEN - -%#EDITORPNG - -%#ECEN - - -The grammars of the snapshot are from the -[Letter grammar package http://www.cs.chalmers.se/~aarne/GF/examples/letter]. - - -==Communicating with GF== - -Other processes can communicate with the GF command interpreter, -and also with the GF syntax editor. Useful flags when invoking GF are -- ``-batch`` suppresses the prompts and structures the communication with XML tags. -- ``-s`` suppresses non-output non-error messages and XML tags. -- ``-nocpu`` suppresses CPU time indication. - - -Thus the most silent way to invoke GF is -``` - gf -batch -s -nocpu -``` - - - -=Documenting grammars with GFDoc FORTHCOMING= - -GFDoc is a very simple system generating HTML and LaTeX from GF grammars. -Some mark-up has been defined to enable annotations of the source code. - -Before this chapter is written, a summary of the tool is obtained by -the command -``` - % gfdoc -``` -The ``gfdoc`` program is normally installed as part of the GF installation. - - - -#startappendix - -#PARTbnf - -#twocolumn - -#PARTquickref - -#smallsize - - -This is a quick reference on GF grammars. It aims to -cover all forms of expression available when writing -grammars. It assumes basic knowledge of GF, which -can be acquired from the Tutorial part of this book. -For the commands of the GF system, help is obtained on line by the -help command (``help``). Help on invoking -GF from the shell is obtained with (``gf -help``).
- - -==A complete example== - -This is a complete example of a GF grammar divided -into three modules, each in its own file. The grammar recognizes the -phrases //one pizza// and //two pizzas//. - -File ``Order.gf``: -``` -abstract Order = { -cat - Order ; - Item ; -fun - One, Two : Item -> Order ; - Pizza : Item ; -} -``` -File ``OrderEng.gf`` (the top file): -``` ---# -path=.:prelude -concrete OrderEng of Order = - open Res, Prelude in { -flags startcat=Order ; -lincat - Order = SS ; - Item = {s : Num => Str} ; -lin - One it = ss ("one" ++ it.s ! Sg) ; - Two it = ss ("two" ++ it.s ! Pl) ; - Pizza = regNoun "pizza" ; -} -``` -File ``Res.gf``: -``` -resource Res = open Prelude in { -param Num = Sg | Pl ; -oper regNoun : Str -> {s : Num => Str} = - \dog -> {s = table { - Sg => dog ; - _ => dog + "s" - } - } ; -} -``` -To use this example, do -``` - % gf -- in shell: start GF - > i OrderEng.gf -- in GF: import grammar - > p "one pizza" -- parse string - > l Two Pizza -- linearize tree -``` - - - -==Modules and files== - -One module per file. -File named ``Foo.gf`` contains module named -``Foo``. - -Each module has the structure -``` -moduletypename = - Inherits ** -- optional - open Opens in -- optional - { Judgements } -``` -Inherits are names of modules of the same type. -Inheritance can be restricted: -``` - Mo[f,g], -- inherit only f,g from Mo - Lo-[f,g] -- inherit all but f,g from Lo -``` -Opens are possible in ``concrete`` and ``resource``.
-They are names of modules of these two types, possibly -qualified: -``` - (M = Mo), -- refer to f as M.f or Mo.f - (Lo = Lo) -- refer to f as Lo.f -``` -Module types and judgements in them: -``` -abstract A -- cat, fun, def, data -concrete C of A -- lincat, lin, lindef, printname -resource R -- param, oper - -interface I -- like resource, but can have - oper f : T without definition -instance J of I -- like resource, defines opers - that I leaves undefined -incomplete -- functor: concrete that opens - concrete CI of A = one or more interfaces - open I in ... -concrete CJ of A = -- completion: concrete that - CI with instantiates a functor by - (I = J) instances of open interfaces -``` -The forms -``param``, ``oper`` -may appear in ``concrete`` as well, but are then -not inherited to extensions. - -All modules can moreover have ``flags`` and comments. -Comments have the forms -``` --- till the end of line -{- any number of lines between -} ---# used for compiler pragmas -``` -A ``concrete`` can be opened like a ``resource``. -It is translated as follows: -``` -cat C ---> oper C : Type = -lincat C = T T ** {lock_C : {}} - -fun f : G -> C ---> oper f : A* -> C* = \g -> -lin f = t t g ** {lock_C = <>} -``` -An ``abstract`` can be opened like an ``interface``. -Any ``concrete`` of it then works as an ``instance``. - - - -==Judgements== - -``` -cat C -- declare category C -cat C (x:A)(y:B x) -- dependent category C -cat C A B -- same as C (x : A)(y : B) -fun f : T -- declare function f of type T -def f = t -- define f as t -def f p q = t -- define f by pattern matching -data C = f | g -- set f,g as constructors of C -data f : A -> C -- same as - fun f : A -> C; data C=f - -lincat C = T -- define lin.type of cat C -lin f = t -- define lin. of fun f -lin f x y = t -- same as lin f = \x y -> t -lindef C = \s -> t -- default lin. 
of cat C -printname fun f = s -- printname shown in menus -printname cat C = s -- printname shown in menus -printname f = s -- same as printname fun f = s - -param P = C | D Q R -- define parameter type P - with constructors - C : P, D : Q -> R -> P -oper h : T = t -- define oper h of type T -oper h = t -- omit type, if inferrable - -flags p=v -- set value of flag p -``` -Judgements are terminated by semicolons (``;``). -Subsequent judgments of the same form may share the -keyword: -``` -cat C ; D ; -- same as cat C ; cat D ; -``` -Judgements can also share RHS: -``` -fun f,g : A -- same as fun f : A ; g : A -``` - - -==Types== - -Abstract syntax (in ``fun``): -``` -C -- basic type, if cat C -C a b -- basic type for dep. category -(x : A) -> B -- dep. functions from A to B -(_ : A) -> B -- nondep. functions from A to B -(p,q : A) -> B -- same as (p : A)-> (q : A) -> B -A -> B -- same as (_ : A) -> B -Int -- predefined integer type -Float -- predefined float type -String -- predefined string type -``` -Concrete syntax (in ``lincat``): -``` -Str -- token lists -P -- parameter type, if param P -P => B -- table type, if P param. type -{s : Str ; p : P}-- record type -{s,t : Str} -- same as {s : Str ; t : Str} -{a : A} **{b : B}-- record type extension, same as - {a : A ; b : B} -A * B * C -- tuple type, same as - {p1 : A ; p2 : B ; p3 : C} -Ints n -- type of n first integers -``` -Resource (in ``oper``): all those of concrete, plus -``` -Tok -- tokens (subtype of Str) -A -> B -- functions from A to B -Int -- integers -Strs -- list of prefixes (for pre) -PType -- parameter type -Type -- any type -``` -As parameter types, one can use any finite type: -``P`` defined in ``param P``, -``Ints n``, and record types of parameter types. 
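
The point about record types of parameter types can be illustrated with a minimal sketch. The module name ``ParamDemo`` and the operation ``copula`` are hypothetical, introduced here only for illustration; the sketch assumes nothing beyond the ``param``, ``oper``, ``table`` and record-pattern forms listed in this reference.
```
resource ParamDemo = {
  param
    Number = Sg | Pl ;
    Person = P1 | P2 | P3 ;
  oper
    -- a record of parameter types is itself a finite (parameter) type
    Agr : PType = {n : Number ; p : Person} ;

    -- hence Agr can be used as the argument type of a table
    copula : Agr => Str = table {
      {n = Sg ; p = P1} => "am" ;
      {n = Sg ; p = P3} => "is" ;
      _                 => "are"   -- all remaining combinations
    } ;
}
```
With these definitions, a selection such as ``copula ! {n = Sg ; p = P3}`` picks the second branch. Packing ``Number`` and ``Person`` into one record keeps agreement features together, so that a single table argument replaces a nested table over two parameters.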
- - - -==Expressions== - -Syntax trees = full function applications -``` -f a b -- : C if fun f : A -> B -> C -1977 -- : Int -3.14 -- : Float -"foo" -- : String -``` -Higher-Order Abstract syntax (HOAS): functions as arguments: -``` -F a (\x -> c) -- : C if a : A, c : C (x : B), - fun F : A -> (B -> C) -> C -``` -Tokens and token lists -``` -"hello" -- : Tok, singleton Str -"hello" ++ "world" -- : Str -["hello world"] -- : Str, same as "hello" ++ "world" -"hello" + "world" -- : Tok, computes to "helloworld" -[] -- : Str, empty list -``` -Parameters -``` -Sg -- atomic constructor -VPres Sg P2 -- applied constructor -{n = Sg ; p = P3} -- record of parameters -``` -Tables -``` -table { -- by full branches - Sg => "mouse" ; - Pl => "mice" - } -table { -- by pattern matching - Pl => "mice" ; - _ => "mouse" -- wildcard pattern - } -table { - n => regn n "cat" -- variable pattern - } -table Num {...} -- table given with arg. type -table ["ox"; "oxen"] -- table as course of values -\\_ => "fish" -- same as table {_ => "fish"} -\\p,q => t -- same as \\p => \\q => t - -t ! p -- select p from table t -case e of {...} -- same as table {...} ! e -``` -Records -``` -{s = "Liz"; g = Fem} -- record in full form -{s,t = "et"} -- same as {s = "et";t= "et"} -{s = "Liz"} ** -- record extension: same as - {g = Fem} {s = "Liz" ; g = Fem} - -<a,b,c> -- tuple, same as {p1=a;p2=b;p3=c} -``` -Functions -``` -\x -> t -- lambda abstract -\x,y -> t -- same as \x -> \y -> t -\x,_ -> t -- binding not in t -``` -Local definitions -``` -let x : A = d in t -- let definition -let x = d in t -- let defin, type inferred -let x=d ; y=e in t -- same as - let x=d in let y=e in t -let {...} in t -- same as let ... in t - -t where {...} -- same as let ... in t -``` -Free variation -``` -variants {x ; y} -- both x and y possible -variants {} -- nothing possible -``` -Prefix-dependent choices -``` -pre {"a" ; "an" / v} -- "an" before v, "a" otherw.
-strs {"a" ; "i" ;"o"}-- list of condition prefixes -``` -Typed expression -``` -<t:T> -- same as t, to help type inference -``` -Accessing bound variables in ``lin``: use fields ``$1, $2, $3,...``. -Example: -``` -fun F : (A : Set) -> (El A -> Prop) -> Prop ; -lin F A B = {s = ["for all"] ++ A.s ++ B.$1 ++ B.s} -``` - - -==Pattern matching== - -These patterns can be used in branches of ``table`` and -``case`` expressions. Patterns are matched in the order in -which they appear in the grammar. -``` -C -- atomic param constructor -C p q -- param constr. applied to patterns -x -- variable, matches anything -_ -- wildcard, matches anything -"foo" -- string -56 -- integer -{s = p ; y = q} -- record, matches extensions too -<p,q> -- tuple, same as {p1=p ; p2=q} -p | q -- disjunction, binds to first match -x@p -- binds x to what p matches -- p -- negation -p + "s" -- sequence of two string patterns -p* -- repetition of a string pattern -``` - -==Sample library functions== - -``` --- lib/prelude/Predef.gf -drop : Int -> Tok -> Tok -- drop prefix of length -take : Int -> Tok -> Tok -- take prefix of length -tk : Int -> Tok -> Tok -- drop suffix of length -dp : Int -> Tok -> Tok -- take suffix of length -occur : Tok -> Tok -> PBool -- test if substring -occurs : Tok -> Tok -> PBool -- test if any char occurs -show : (P:Type) -> P ->Tok -- param to string -read : (P:Type) -> Tok-> P -- string to param -toStr : (L:Type) -> L ->Str -- find "first" string - --- lib/prelude/Prelude.gf -param Bool = True | False -oper - SS : Type -- the type {s : Str} - ss : Str -> SS -- construct SS - cc2 : (_,_ : SS) -> SS -- concat SS's - optStr : Str -> Str -- string or empty - strOpt : Str -> Str -- empty or string - bothWays : Str -> Str -> Str -- X++Y or Y++X - init : Tok -> Tok -- all but last char - last : Tok -> Tok -- last char - prefixSS : Str -> SS -> SS - postfixSS : Str -> SS -> SS - infixSS : Str -> SS -> SS -> SS - if_then_else : (A : Type) -> Bool -> A -> A -> A - if_then_Str : Bool ->
Str -> Str -> Str -``` - - -==Flags== - -Flags can appear, with growing priority, -- in files, judgement ``flags`` and without dash (``-``) -- as flags to ``gf`` when invoked, with dash -- as flags to various GF commands, with dash - - -Some common flags used in grammars: -``` -startcat=cat use this category as default - -lexer=literals int and string literals recognized -lexer=code like program code -lexer=text like text: spacing, capitals -lexer=textlit text, unknowns as string lits - -unlexer=code like program code -unlexer=codelit code, remove string lit quotes -unlexer=text like text: punctuation, capitals -unlexer=textlit text, remove string lit quotes -unlexer=concat remove all spaces -unlexer=bind remove spaces around "&+" - -optimize=all_subs best for almost any concrete -optimize=values good for lexicon concrete -optimize=all usually good for resource -optimize=noexpand for resource, if =all too big -``` -For the full set of values for ``FLAG``, -use on-line ``h -FLAG``. - - - -==File paths== - -Colon-separated lists of directories searched in the -given order: -``` ---# -path=.:../abstract:../common:prelude -``` -This can be given (in order of growing preference) as the -first line in the top file, as a flag to ``gf`` -when invoked, or as a flag to the ``i`` command. -The prefix ``--#`` is used only in files. - -If the environment variable ``GF_LIB_PATH`` is defined, its -value is automatically prefixed to each directory to -extend the original search path. - - -==Alternative grammar formats== - -**Old GF** (before GF 2.0): -all judgements in any kinds of modules, -division into files uses ``include``s. -A file ``Foo.gf`` is recognized as the old format -if it lacks a module header. - -**Context-free** (file ``foo.cf``). The form of rules is e.g. -``` -Fun. S ::= NP "is" AP ; -``` -If ``Fun`` is omitted, it is generated automatically. -Rules must be one per line. The RHS can be empty. - -**Extended BNF** (file ``foo.ebnf``). The form of rules is e.g.
-``` -S ::= (NP+ ("is" | "was") AP | V NP*) ; -``` -where the RHS is a regular expression of categories -and quoted tokens: ``"foo", CAT, T U, T|U, T*, T+, T?``, or empty. -Rule labels are generated automatically. - - -**Probabilistic grammars** (not a separate format). -You can set the probability of a function ``f`` (in its value category) by -``` ---# prob f 0.009 -``` -These are put into a file given to GF using the ``probs=File`` flag -on command line. This file can be the grammar file itself. - -**Example-based grammars** (file ``foo.gfe``). Expressions of the form -``` -in Cat "example string" -``` -are preprocessed by using a parser given by the flag -``` ---# -resource=File -``` -and the result is written to ``foo.gf``. - - diff --git a/doc/tutorial/gf-tutorial2_9.txt b/doc/tutorial/gf-tutorial2_9.txt deleted file mode 100644 index 9363e16f3..000000000 --- a/doc/tutorial/gf-tutorial2_9.txt +++ /dev/null @@ -1,4316 +0,0 @@ -Grammatical Framework: Tutorial, Advanced Applications, and Reference Manual -Author: Aarne Ranta aarne (at) cs.chalmers.se -Last update: %%date(%c) - -% NOTE: this is a txt2tags file. -% Create an html file from this file using: -% txt2tags --toc gf-tutorial2.txt - -%!target:html -%!encoding: iso-8859-1 - -%%!postproc(tex): "section\*" "section" - -%!postproc(tex): "subsection\*" "section" -%!postproc(tex): "section\*" "chapter" - -%!postproc(html): #BCEN
-%!postproc(html): #ECEN
- -%!postproc(tex): #BCEN "begin{center}" -%!postproc(tex): #ECEN "end{center}" - -%!preproc(html): #EDITORPNG [../quick-editor.png] -%!preproc(tex): #EDITORPNG [../../lib/resource-1.0/doc/10lang-small.png] - -%!preproc(html): #LOGOPNG [../gf-logo.png] -%!preproc(tex): #LOGOPNG "" - - -%!postproc(tex): #PARTone "part{Tutorial}" -%!postproc(tex): #PARTtwo "part{Advanced Applications}" -%!postproc(tex): #PARTthree "part{Reference Manual}" - - -#LOGOPNG - - - -%--! -=Introduction= - -==Natural language application programming== - -Making computers understand human language is one of the oldest dreams of -programmers. Projects in machine translation started almost as soon as -the first computers appeared in the 1940's. This was partly encouraged by the -success of decryption during the Second World War. Thus some American scientists -had the vision that Russian can be seen as encrypted English, which can be -deciphered by algorithms similar to those used for cracking the Germans' Enigma. - -Despite substantial efforts on machine translation, the early visions were not -realized, and the general conclusion reached by the mid-1960's was that -high-quality broad-coverage machine translation is impossible. Machine -translation was reduced to the less ambitious and more specialized tasks of -computational linguistics. Parallel to this, fantasies of "speaking robots" and -other language-understanding machines prevailed, exemplified by such science -fiction figures as the HAL computer in the film "2001: A Space Odyssey" from -1968. - -What we see in today's market of language-understanding machines is a variety of -products, which focus on different aspects of the task and none of which comes -even close to HAL or a machine translator with human-like capacities.
Here is a -list of some such applications: -- browse-quality machine translation: Systran -- machine translation specialized in weather reports: Meteo -- electronic dictionaries -- spelling and grammar checkers -- dialogue systems for enabling simple speech interaction with a computer - - -A common feature of these applications is that their construction requires -**linguistic knowledge**: theoretical understanding of languages. As opposed to -practical understanding, which means the ability to speak, listen, write, and -read, theoretical understanding means knowledge of the **rules** of language. -It is by expressing these rules in a programming language that we can hope to -make a computer understand at least something of a natural language. - -This is where GF comes into the picture. GF, Grammatical Framework, is a programming -language designed for expressing linguistic rules. A set of such rules is called -a **grammar**. GF is designed in such a way that it is much easier to write -grammar rules in it than in a general-purpose programming language, such as -Java or C or Haskell. At the same time, GF is equipped with tools for -**embedded grammars**. This means that a GF grammar can be used as a component -of a program written in another language, such as Java or C or Haskell. Building -a language application usually involves much more than just a grammar, and it is -important that the grammar can be integrated seamlessly with the rest of the -application. - -Since natural language application programming requires linguistic knowledge, it -is usually considered to need linguistic training. The mission of GF is to relieve -some of this need. This is achieved in two ways: -- GF works in a way familiar to ordinary programmers, namely as a **compiler** - that analyses a language and generates a result. -- GF has a set of **resource grammar libraries**, which encapsulate much of - the linguistic knowledge needed when writing grammars.
- - -That said, GF makes no claim to "fire linguists" from natural language programming -projects. The claim is rather one of the **division of labour**: GF enables the -division of grammar writing into different **modules**, where some modules -require linguistic knowledge and others don't. Linguists working on the linguistic -modules will appreciate the way GF supports abstractions and generalizations, and -also the grammar development tools that enable testing of linguistic rules. -Non-linguists working on the application-oriented modules will appreciate the -possibility of taking grammar rules for granted and focusing on other aspects of -the program. - - - -==The history of GF and its applications== - -GF belongs to the tradition of **functional programming languages**, exemplified -by Lisp and, as later and closer relatives, ML and Haskell. An important branch -of functional programming is **type theory**, which in turn has its roots in -logic and the foundations of mathematics. GF was, in the first place, created to -implement the idea that type theory can provide **semantics**, i.e. formalize -the meaning of natural languages. Several aspects of type-theoretical semantics -were covered in the monograph //Type-Theoretical Grammar// (A. Ranta, OUP 1994). -But a stronger aspect grew out of subsequent experiments dealing with different -languages: it is possible to have a common semantics for many languages, and -thereby build systems that translate between languages via the semantics. During -this period, discussions with Per Martin-Löf (Ranta's PhD supervisor at the -University of Stockholm) had a major impact on the work, and cooperation -with Petri Mäenpää at the University of Helsinki led to the first computer -implementations. - -As a stand-alone programming language, GF was first implemented in 1998. This -took place at Xerox Research Centre Europe in Grenoble, within a project entitled -//Multilingual Document Authoring//.
The leading idea in the project was to -enable writing documents in multiple languages simultaneously, so that the user -need only know one of the languages; the rest will be produced automatically -via translations from the type-theoretical semantics. The Xerox staff involved -in the project included Marc Dymetman, Lauri Karttunen, Veronika Lux, -Sylvain Pogodalla, and Annie Zaenen. - -The Xerox project produced some prototype applications, e.g. a restaurant phrase -book and an editor of medical drug descriptions. The grammars that were built -remained the property of Xerox, but the GF formalism and its implementation -were released as open-source software under the GNU General Public License. The -principal author of GF got an academic position in 1999, at the Department of -Computing Science of Chalmers University of Technology and Gothenburg University. -At Chalmers, both functional programming and type theory flourish, and in this -environment, GF developed into a more stable and more full-fledged programming -language. In this process, collaboration with Koen Claessen, Thierry Coquand, -Thomas Hallgren, Patrik Jansson, and Bengt Nordström made important contributions. - -The idea of making GF into "the working programmer's grammar formalism", as -opposed to a tool requiring linguistic expertise, was confirmed at Chalmers -in courses given to computer science students and later in joint research -projects. A nice experience of the courses was that computer scientists are -often very interested in languages and have firm intuitions on grammar; given -a suitable programming tool, they can achieve impressive results. GF seemed to -be close to such a tool, and, in subsequent collaborations at the Department, -it evolved even further into a programming language with the virtues of familiarity -and "the least surprise". Issues of stability are also important, including -backward compatibility, and documentation is something there can hardly be -too much of.
As a mark of stability, version 1.0 of GF was released in -2002. In 2004, a theoretical reference paper appeared in the Journal -of Functional Programming, as well as a long tutorial text in the ESSLLI -lecture notes post-publication. - -The first full-scale applications of GF emerged as natural-language interfaces. -The first one was for the proof editor Alfa, written with Thomas Hallgren. -The second one was a syntax editor and a natural-language interface to the -software specification language OCL (Object Constraint Language) built -within the KeY project. This work was done first with Reiner Hähnle, then -with the students Kristoffer Johannisson (PhD 2005), Hans-Joachim Daniels, -and David Burke. On the GF implementation side, Janna Khegai (PhD 2006) built -a Java-based syntax editor. Peter Ljunglöf (PhD 2004) succeeded in identifying -the complexity of parsing in GF and found an algorithm that greatly improved -the use of GF in parsing. He implemented the algorithm with Håkan Burden, and -it was later further improved by Krasimir Angelov. - -At the same time, collaboration with the Linguistics Department of -Gothenburg University served as a "linguistic sanity check" of GF. -Robin Cooper, an eminent linguist working at the Department, initiated -two efforts that have shaped the development of GF: -- resource grammar libraries -- dialogue system applications - - -It was the resource grammar libraries that made GF really usable for non-linguist -programmers in more serious projects. They were sorely missed in the Alfa -project, and heavily used and improved in the KeY project. The development of -the library started in 2002; a version stable enough to be released with number -1.0 was complete in 2006, comprising ten languages. - -Dialogue systems, on the other hand, turned -out to be a major source of interesting problems and also of successful solutions. 
-Much of this work was carried out in the European project TALK (Tools for Ambient -Linguistic Knowledge, 2004-2006), by Björn Bringert, Rebecca Jonson, and -Peter Ljunglöf in Gothenburg, and Oliver Lemon (Edinburgh), Nadine Perera (BMW), -and Karl Weilhammer (Cambridge) at the other sites. In addition to -complete systems, this project produced supporting tools for embedded grammars -and speech recognition, and additions to the resource grammar library. - -Besides dialogue systems, multilingual authoring and translation continues -to be the main application of GF. The European WebALT project (Web Advanced -Learning Technologies, 2005-2006) used GF to build a tool for translating -mathematical exercises from formal specifications (written in MathML) to -six languages. A tool integrating GF with a computer algebra system was -also developed. The project gave rise to a company, WebALT Inc. Many members -of the WebALT staff also contributed to GF and the resource grammar library: -Lauri Carlson, Glòria Casanellas, Anni Laine, Wanjiku N'gan'ga, and -Jordi Saludes. - -As of the time of writing (August 2007), the release of GF has version -number 2.8. It is a stable system that has been built with contributions -from dozens of people and been used by at least hundreds; download figures -are in the thousands. New ideas of how to apply GF are posted by users almost -every week. These users are often programmers with good knowledge of -functional languages, a highly developed instinct for programming language -design, and firm intuitions on natural language. Another group of users -are those that have been trained in GF in courses. - - - -==The purpose and scope of this book== - -The purpose of this book is to serve the growing user base of GF with -a manual that gathers all relevant information in one place. However, it -is also intended to serve those who want to get started with GF, and -who don't necessarily have the technical background of the typical -users. 
We believe that learning to program in GF is no more difficult -than learning some other programming language; as for the linguistic -aspects, we believe that writing grammars is an excellent introduction -to the problems of linguistics, where theory can be learnt at the -same time as it is motivated by concrete problems. - -The book thus starts with a tutorial, which gradually explains all -the constructs of the GF programming language. The design and style -aspects of grammar engineering are also covered, to help the user to scale -up from small to large and possibly collaborative applications. -After the tutorial, the book continues with a "cook book" containing -hints and case studies for advanced users. Moreover, the resource -grammar library is covered in some detail, which will help the -programmers who want to port the library to new languages, but also -motivate linguistically the choices made in the libraries. -A complete reference manual concludes the book, with a quick reference -card as an appendix. - -What the book does not cover is theoretical discussions of -GF, especially in comparison to other grammar formalisms. Even though important -in the development of GF as a scientifically justified framework, such -discussions are not relevant for programmers who want to use GF - any more -than, say, a book on Haskell has to include comparisons with Java. In fact, -introducing Haskell by references to Java may have some point, since many -of the readers can already be assumed to know Java. But, even though some -readers will know DCG or HPSG or LFG, we will not assume this; we will just -note in passing the relation between GF and context-free grammars, also -known as BNF grammars in computer science. - - - -#PARTone - -=Getting started= - -In this chapter, we will introduce the GF program and write a first GF grammar. -We show how the grammar is used for the tasks of translation and multilingual -generation. 
- - -==What GF is== - -We use the term GF for three different things: -- a **system** (computer program) used for working with grammars -- a **programming language** in which grammars can be written -- a **theory** about grammars and languages - - -The relation between these things is obvious: the GF system is an implementation -of the GF programming language, which in turn is built on the ideas of the -GF theory. The main focus of this book is on the GF programming language. -We learn how grammars are written in the language. At the same time, we learn -the way of thinking in the GF theory. To make this all useful and fun, we -make the grammars run on a computer by using the GF system. - - - -%--! -==What GF grammars are used for== - -A grammar is a definition of a language. -From this definition, different language processing components -can be derived: -- **parsing**: to analyse the language -- **linearization**: to generate the language -- **translation**: to analyse one language and generate another - - -A GF grammar can be seen as a declarative program from which these -processing tasks can be automatically derived. In addition, many -other tasks are readily available for GF grammars: -- **morphological analysis**: find out the possible inflection forms of words -- **morphological synthesis**: generate all inflection forms of words -- **random generation**: generate random expressions -- **corpus generation**: generate all expressions -- **treebank generation**: generate a list of trees with multiple linearizations -- **teaching quizzes**: train morphology and translation -- **multilingual authoring**: create a document in many languages simultaneously -- **speech input**: optimize a speech recognition system for your grammar - - -A typical GF application is based on a **multilingual grammar** involving -translation on a special domain. 
Existing applications of this idea include -- [Alfa http://www.cs.chalmers.se/~hallgren/Alfa/Tutorial/GFplugin.html]: - a natural-language interface to a proof editor - (languages: English, French, Swedish) -- [KeY http://www.key-project.org/]: - a multilingual authoring system for creating software specifications - (languages: OCL, English, German) -- [TALK http://www.talk-project.org]: - multilingual and multimodal dialogue systems - (languages: English, Finnish, French, German, Italian, Spanish, Swedish) -- [WebALT http://webalt.math.helsinki.fi/content/index_eng.html]: - a multilingual translator of mathematical exercises - (languages: Catalan, English, Finnish, French, Spanish, Swedish) -- [Numeral translator http://www.cs.chalmers.se/~bringert/gf/translate/]: - number words from 1 to 999,999 - (88 languages) - - -The specialization of a grammar to a domain makes it possible to -obtain much better translations than in an unlimited machine translation -system. This is due to the well-defined semantics of such domains. -Grammars having this character are called **application grammars**. -They are different from most grammars written by linguists just -because they are multilingual and domain-specific. - -However, there is another kind of grammar, which we call **resource grammars**. -These are large, comprehensive grammars that can be used on any domain. -The GF Resource Grammar Library has resource grammars for 10 languages. -These grammars can be used as **libraries** to define application grammars. -In this way, it is possible to write a high-quality grammar without -knowing about linguistics: in general, to write an application grammar -by using the resource library just requires practical knowledge of -the target language, and all theoretical knowledge about its grammar -is given by the libraries. - - - - -%--! -==Who is the tutorial for== - -The tutorial part of this book is mainly for programmers -who want to learn to write application grammars. 
-It will go through GF's programming concepts, and does not -presuppose knowledge of any of the main ingredients of GF: -linguistics, functional programming, and type theory. -Thus it should be accessible to anyone who has some -previous programming experience in any language; the basics -of using computers are also presupposed, e.g. the use of -text editors and the management of files. - -Those who already know GF well can skip the tutorial part, -or skim through it, and go directly to the part on advanced applications. -These will involve large scale GF programming, such as needed in resource -grammars, and also the embedding of GF in systems such as -natural-language user interfaces and dialogue systems. - - - -%--! -==The coverage of the tutorial== - -The tutorial gives a hands-on introduction to grammar writing. -We start by building a "Hello World" grammar, which covers greetings -in three languages (//hello world//, //terve maailma//, //ciao mondo//). -This **multilingual grammar** is based on the distinction, central in -GF, between the **abstract syntax** -(the logical structure) and the **concrete syntax** (the -sequence of words) of expressions. - -From the "Hello World" example, we proceed -to a larger grammar for the domain of food: -in this grammar, you can say things like -``` - this Italian cheese is delicious -``` -in English and Italian. This grammar illustrates how translation is -more than just replacement of words. For instance, the order of -words may have to be changed: -``` - Italian cheese ===> formaggio italiano -``` -Moreover, words can have different forms, and which forms -they have varies from language to language. For instance, -Italian adjectives usually have four forms where English -has just one: -``` - delicious (wine, wines, pizza, pizzas) - vino delizioso, vini deliziosi, pizza deliziosa, pizze deliziose -``` -The **morphology** of a language describes the -forms of its words. 
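-
-As a hedged preview of how such forms can be described, here is a minimal sketch using parameter types and tables, constructs that have not been introduced at this point. The names ``AdjSketch``, ``Adjective``, and ``delizioso_A`` are hypothetical and do not come from the tutorial's own grammars.
-```
-  resource AdjSketch = {
-    param
-      Number = Sg | Pl ;
-      Gender = Masc | Fem ;
-    oper
-      -- an adjective is a table from gender and number to strings
-      Adjective : Type = {s : Gender => Number => Str} ;
-
-      -- the four forms listed above for "delizioso"
-      delizioso_A : Adjective = {
-        s = table {
-          Masc => table {Sg => "delizioso" ; Pl => "deliziosi"} ;
-          Fem => table {Sg => "deliziosa" ; Pl => "deliziose"}
-        }
-      } ;
-  }
-```
-In such a sketch, an English adjective would need only a single string field, ``{s : Str}``.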
- -The complete description of morphology -belongs to resource grammars, and the use of them will be covered -by the tutorial. However, we will also explain all the -programming concepts involved in resource grammars. -The tutorial will in fact build a miniature resource grammar in order -to give an introduction to linguistically oriented grammar writing. - -Of course, we will not presuppose that the reader knows Italian. -We have chosen Italian as the example language because it has a rich -morphological structure that illustrates very well the capacities of -GF. Moreover, even those who don't know Italian will find many of -its words familiar. The exercises will encourage the reader to -port the examples to other languages; in fact, many GF -applications work for 5-10 languages. - -Thus it is by elaborating the Food grammar example that -the tutorial makes a guided tour through most of GF. -While the constructs of the GF language are the main focus, -the commands of the GF system are also introduced as they -are needed. - -In addition to multilinguality, **semantics** is an important aspect of GF -grammars. The concepts needed for "purely linguistic" grammars belong to -the concrete syntax part of GF, whereas semantics is expressed in the abstract -syntax. After the presentation of concrete syntax constructs, we proceed -to the enrichment of abstract syntax with **dependent types**, -**variable bindings**, and **semantic definitions**. - -To learn how to write GF grammars is not the only goal of -this tutorial. We will also explain the most important -commands of the GF system. With these commands, -simple applications of grammars, such as translation and -quiz systems, can be built simply by writing scripts for the -system. - -More complicated applications, such as natural-language -interfaces and dialogue systems, moreover require programming in -some general-purpose language. 
The part on advanced topics will -explain how GF grammars are used as components of Haskell and Java programs. - - -%--! -==Getting the GF program== - -The GF program is open-source free software, which you can download via the -GF Homepage: - -[``http://www.cs.chalmers.se/~aarne/GF`` http://www.cs.chalmers.se/~aarne/GF] - -There you can download -- binaries for Linux, Mac OS X, and Windows -- source code and documentation -- grammar libraries and examples - - -If you want to compile GF from source, you need a Haskell compiler. -To compile the interactive editor, you also need a Java compiler. -But normally you don't have to compile, and you definitely -don't need to know Haskell or Java to use GF. - -We are assuming the availability of a Unix shell. Linux and Mac OS X users -have it automatically, the latter under the name "terminal". -Windows users are recommended to install Cygwin, the free Unix shell for Windows. - - -%--! -==Running the GF program== - -To start the GF program, assuming you have installed it, just type -``gf`` in the Unix (or Cygwin) shell: -``` - % gf -``` -You will see GF's welcome message and the prompt ``>``. -The command -``` - > help -``` -will give you a list of available commands. - -As a common convention in this Tutorial, we will use -- ``%`` as a prompt that marks system commands -- ``>`` as a prompt that marks GF commands - - -Thus you should not type these prompts, but only the characters that -follow them. - - -==A "Hello World" grammar== - -The tradition in programming language tutorials is to start with a -program that prints "Hello World" on the terminal. GF should be no -exception. But our program has features that distinguish it from -most "Hello World" programs: -- **Multilinguality**: the message is printed in many languages. -- **Reversibility**: in addition to printing, you can **parse** the - message and translate it to other languages. 
- - -===The program: abstract syntax and concrete syntaxes=== - -A GF program, in general, is a **multilingual grammar**. Its main parts -are -- an **abstract syntax** -- one or more **concrete syntaxes** - - -The abstract syntax defines, in a language-independent way, what **meanings** -can be expressed in the grammar. In the "Hello World" grammar we want -to express //Greetings//, where we greet a //Recipient//, which can be -//World// or //Mum// or //Friends//. Here is the entire -GF code for the abstract syntax: -``` - -- a "Hello World" grammar - abstract Hello = { - - flags startcat = Greeting ; - - cat Greeting ; Recipient ; - - fun - Hello : Recipient -> Greeting ; - World, Mum, Friends : Recipient ; - } -``` -The code has the following parts: -- a **comment** (optional), saying what the module is doing -- a **module header** indicating that it is an abstract syntax - module named ``Hello`` -- a **module body** in braces, consisting of - - a **startcat flag declaration** stating that ``Greeting`` is the - main category, i.e. the one we are most interested in - - **category declarations** stating that ``Greeting`` and ``Recipient`` - are categories, i.e. types of meanings - - **function declarations** stating what meaning-building functions there - are; these are the three possible recipients, as well as the function - ``Hello`` constructing a greeting from a recipient - - -A concrete syntax defines a mapping from the abstract meanings to their -expressions in a language. 
We first give an English concrete syntax: -``` - concrete HelloEng of Hello = { - - lincat Greeting, Recipient = {s : Str} ; - - lin - Hello rec = {s = "hello" ++ rec.s} ; - World = {s = "world"} ; - Mum = {s = "mum"} ; - Friends = {s = "friends"} ; - } -``` -The major parts of this code are: -- a module header indicating that it is a concrete syntax of the abstract syntax - ``Hello``, itself named ``HelloEng`` -- a module body in braces, consisting of - - **linearization type definitions** stating that - ``Greeting`` and ``Recipient`` are **records** with a **string** ``s`` - - **linearization definitions** telling what records are assigned to - each of the meanings defined in the abstract syntax; the recipients are - linearized to records containing single words, whereas the ``Hello`` greeting - has a function telling that the word ``hello`` is prefixed to the argument - - - - -To make the grammar truly multilingual, we add a Finnish and an Italian concrete -syntax: -``` - concrete HelloFin of Hello = { - lincat Greeting, Recipient = {s : Str} ; - lin - Hello rec = {s = "terve" ++ rec.s} ; - World = {s = "maailma"} ; - Mum = {s = "äiti"} ; - Friends = {s = "ystävät"} ; - } - - concrete HelloIta of Hello = { - lincat Greeting, Recipient = {s : Str} ; - lin - Hello rec = {s = "ciao" ++ rec.s} ; - World = {s = "mondo"} ; - Mum = {s = "mamma"} ; - Friends = {s = "amici"} ; - } -``` -Now we have a trilingual grammar usable for translation and -many other tasks, which we will now look into. - - - -===Using the grammar in the GF program=== - -In order to compile the grammar in GF, each of the four modules -has to be put in a file named //modulename//``.gf``: -``` - Hello.gf HelloEng.gf HelloFin.gf HelloIta.gf -``` -The first GF command needed when using a grammar is to **import** it. -The command has a long name, ``import``, and a short name, ``i``. -You can type either -``` - > import HelloEng.gf -``` -or -``` - > i HelloEng.gf -``` -to get the same effect. 
In general, all GF commands have a long and a short name; -short names are convenient when typing commands by hand, whereas long names -are more readable in scripts, i.e. files with lists of commands. - -The effect of ``import`` is that the GF program **compiles** your grammar -into an internal representation, and shows a new prompt when it is ready. -It will also show how much CPU time was consumed: -``` - > i HelloEng.gf - - compiling Hello.gf... wrote file Hello.gfc 8 msec - - compiling HelloEng.gf... wrote file HelloEng.gfc 12 msec - - 12 msec -``` -You can now use GF for **parsing**: -``` - > parse "hello world" - Hello World -``` -The ``parse`` (= ``p``) command takes a **string** -(in double quotes) and returns an **abstract syntax tree** - the meaning -of the string defined in the abstract syntax. -A tree is, in general, something easier than a string -for a machine to understand and to process further, although this -is not so obvious in this simple grammar. - -Strings that return a tree when parsed do so in virtue of the grammar -you imported. Try parsing something that is not in the grammar, and parsing fails: -``` - > parse "hello dad" - Unknown words: dad - - > parse "world hello" - no tree found -``` -In the first example, the failure is caused by an unknown word. -In the second example, the combination of words is ungrammatical. - -In addition to parsing, you can also use GF for **linearizing** -(``linearize = l``). This is the inverse of -parsing, taking trees into strings: -``` - > linearize Hello World - hello world -``` -What is the use of this? Typically not that you type in a tree at -the GF prompt. The utility of linearization comes from the fact that -you can obtain a tree from somewhere else - for instance, from -a parser. A prime example of this is **translation**: you parse -with one concrete syntax and linearize with another. 
Let us -now do this by first importing the Italian grammar: -``` - > import HelloIta.gf -``` -We can now parse with ``HelloEng`` and **pipe** the result -into linearizing with ``HelloIta``: -``` - > parse -lang=HelloEng "hello mum" | linearize -lang=HelloIta - ciao mamma -``` -Notice that the commands must use a **language flag** to indicate -which concrete syntax is used in each of the operations. - -To conclude the translation exercise, we import the Finnish grammar -and pipe English parsing into **multilingual generation**: -``` - > import HelloFin.gf - > parse -lang=HelloEng "hello friends" | linearize -multi - terve ystävät - ciao amici - hello friends -``` - -**Exercise**. Test the parsing and translation examples shown above, as well as -five other examples. - -**Exercise**. Extend the grammar ``Hello.gf`` and some of the -concrete syntaxes by five new recipients and one new greeting -form. - -**Exercise**. Add a concrete syntax for some other -language you might know. - - - -==What else can be done with the grammar== - -Now we have built our first multilingual grammar and seen the basic -functionalities of GF: parsing and linearization. We have tested -these functionalities inside the GF program. In the forthcoming -chapters, we will build larger grammars and have more fun with -these functionalities. But we will also introduce many more: -- random generation -- exhaustive generation -- treebank generation -- syntax editing -- morphological analysis -- translation and morphological quizzes -- semantic filtering - - -The usefulness of GF would be quite limited if grammars were -usable only inside the GF program. 
In the forthcoming chapters, -we will see many other ways of using grammars: -- compile them to new formats, such as speech recognition grammars -- embed them in Java and Haskell programs -- build applications using compilation and embedding: - - voice commands - - spoken language translators - - dialogue systems - - user interfaces - - localization: parametrize the messages printed by a program - to support different languages - - -All GF functionalities, both those inside the GF program and those -ported to other environments, -are of course applicable to the simplest of grammars, -such as the ``Hello`` grammars presented above. But the main focus -of this tutorial will be on grammar writing. Thus we will show -how larger and more expressive grammars can be built by using -the constructs of the GF programming language, before entering the -applications in the next part of the book. - - - -==Summary of GF language features== - -A GF grammar consists of **modules**, -into which judgements are grouped. The most important -module forms are -- ``abstract`` A ``=`` M, abstract syntax A with judgements in - the module body M. -- ``concrete`` C ``of`` A ``=`` M, concrete syntax C of the - abstract syntax A, with judgements in the module body M. - - -Each module is written in a file named //Modulename//``.gf``. - -Rules in a GF grammar are called **judgements**, and the keywords -``fun`` and ``lin`` are used for distinguishing between two -**judgement forms**. 
Here is a summary of the most important -judgement forms: - - - abstract syntax - - | form | reading | - | ``cat`` C | C is a category - | ``fun`` f ``:`` A | f is a function of type A - - - concrete syntax - - | form | reading | - | ``lincat`` C ``=`` T | category C has linearization type T - | ``lin`` f ``=`` t | function f has linearization t - - -Both abstract and concrete modules may moreover contain definitions of -**flags**, of the form -- ``flags`` //flag//``=``//value// - - -and **comments** of the forms -- ``--`` //anything till a newline// -- ``{-`` //anything except hyphen followed by closing brace// ``-}`` - - -Shorthands permit the sharing of -the keyword in subsequent judgements, -``` - cat Phrase ; Item ; === cat Phrase ; cat Item ; -``` -and of the right-hand-side in subsequent judgements of the same form -``` - fun World, Mum, Friends : Recipient ; === - fun World : Recipient ; Mum : Recipient ; Friends : Recipient ; -``` -The order of judgements in a module is free. In particular, an identifier -need not be declared before it is used. - -An **identifier** is a letter followed by a sequence of letters, digits, and -characters ``'`` or ``_``. Each identifier can only be -introduced once in the same module. - -**Types** in an abstract syntax are either **basic types**, -i.e. ones introduced in ``cat`` judgements, or -**function types** of the form -``` - A1 -> ... -> An -> A -``` -where each of ``A1, ..., An, A`` is a basic type (this restriction -will be relaxed later). The last type in the arrow-separated sequence -is the **value type** of the function type, the earlier types are -its **argument types**. - -In a concrete syntax, the available types include -- the type of strings, ``Str`` -- record types of form ``{`` r1 : T1 ; ... ; rn : Tn ``}`` - - -**Terms** used in linearizations have the forms -- quoted string: ``"foo"``, of type ``Str`` -- record: ``{`` r1 = t1 ; ... ; rn = tn ``}``, - of type ``{`` r1 : R1 ; ... 
; rn : Rn ``}`` -- projection ``t.r`` with a record label, of the corresponding record - field type -- argument variable ``x`` bound by the left-hand-side of a ``lin`` rule, - of the corresponding linearization type - - - - - - -=Designing a grammar for complex phrases= - -We will now start with a grammar that has much more structure than -the ``Hello`` grammar. We will look at how the abstract syntax -is divided into suitable categories, and how infinitely many -phrases can be built by using recursive rules. We will also -introduce **modularity** by showing how a large grammar can be -divided into modules, and how functions defined in **resource modules** -can be used for avoiding repeated code. - - -==The abstract syntax Food== - -The grammar we will write defines a set of phrases usable for speaking about food: -- the main category is ``Phrase`` -- a ``Phrase`` can be built by assigning a ``Quality`` to an ``Item`` -- an ``Item`` is built from a ``Kind`` by prefixing "this" or "that" -- a ``Kind`` is either **atomic**, such as "cheese" and "wine", or formed - by modifying a given ``Kind`` with a ``Quality`` -- a ``Quality`` is either atomic, such as "Italian" and "boring", - or built by modifying a given ``Quality`` with the word "very" - - -These verbal descriptions can be expressed as the following abstract syntax: -``` - abstract Food = { - - flags startcat = Phrase ; - - cat - Phrase ; Item ; Kind ; Quality ; - - fun - Is : Item -> Quality -> Phrase ; - This, That : Kind -> Item ; - QKind : Quality -> Kind -> Kind ; - Wine, Cheese, Fish : Kind ; - Very : Quality -> Quality ; - Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ; - } -``` -In the concrete syntax, we will be able to build phrases such as -``` - this delicious Italian wine is very very expensive -``` - - -==The concrete syntax FoodEng== - -The English concrete syntax gives no surprises: -``` - concrete FoodEng of Food = { - - lincat - Phrase, Item, Kind, Quality = {s : Str} ; - - lin - Is item quality = {s = item.s 
++ "is" ++ quality.s} ; - This kind = {s = "this" ++ kind.s} ; - That kind = {s = "that" ++ kind.s} ; - QKind quality kind = {s = quality.s ++ kind.s} ; - Wine = {s = "wine"} ; - Cheese = {s = "cheese"} ; - Fish = {s = "fish"} ; - Very quality = {s = "very" ++ quality.s} ; - Fresh = {s = "fresh"} ; - Warm = {s = "warm"} ; - Italian = {s = "Italian"} ; - Expensive = {s = "expensive"} ; - Delicious = {s = "delicious"} ; - Boring = {s = "boring"} ; - } -``` -Let us test how the grammar works in parsing: -``` - > import FoodEng.gf - > parse "this delicious wine is very very Italian" - Is (This (QKind Delicious Wine)) (Very (Very Italian)) -``` -You can also try parsing in other categories than the ``startcat``, -by setting the command-line ``cat`` flag: -``` - > p -cat=Kind "very Italian wine" - QKind (Very Italian) Wine -``` - -**Exercise**. Extend the ``Food`` grammar by ten new food kinds and -qualities, and run the parser with new kinds of examples. - - -**Exercise**. Add a rule that enables question phrases of the form -//is this cheese Italian//. - - -**Exercise**. Enable the optional prefixing of -phrases with the words "excuse me but". Do this in such a way that -the prefix can occur at most once. - - - -==Commands for testing grammars== - -===Generating trees and strings=== - -When we have a grammar above the trivial size, especially a recursive -one, we need more efficient ways of testing it than just by parsing -sentences that happen to come to our minds. One way to do this is -based on **automatic generation**, which can be either -**random** or **exhaustive**. - -Random generation (``generate_random = gr``) is an operation that -builds a random tree in accordance with an abstract syntax: -``` - > generate_random - Is (This (QKind Italian Fish)) Fresh -``` -By using a pipe, random generation can be fed into linearization: -``` - > gr | l - this Italian fish is fresh -``` -Random generation is a good way to test a grammar; it can also -be fun. 
By using the ``number`` flag, several strings can be generated -in one command: -``` - > gr -number=10 | l - that wine is boring - that fresh cheese is fresh - that cheese is very boring - this cheese is Italian - that expensive cheese is expensive - that fish is fresh - that wine is very Italian - this wine is Italian - this cheese is boring - this fish is boring -``` -To generate //all// phrases that a grammar can produce, -GF provides the command ``generate_trees = gt``. -``` - > generate_trees | l - that cheese is very Italian - that cheese is very boring - that cheese is very delicious - that cheese is very expensive - that cheese is very fresh - ... - this wine is expensive - this wine is fresh - this wine is warm - -``` -You get quite a few trees but not all of them: only up to a given -**depth** of trees. The default depth is 3; the depth can be -set by using the ``depth`` flag: -``` - > generate_trees -depth=5 | l -``` -Other options to the generation commands (like all commands) can be seen -with GF's ``help = h`` command: -``` - > help gr - > help gt -``` - -**Exercise**. If the command ``gt`` generated all -trees in your grammar, it would never terminate. Why? - -**Exercise**. Measure how many trees the grammar gives with depths 4 and 5, -respectively. You can use the Unix **word count** command ``wc`` to count lines. -**Hint**. You can pipe the output of a GF command into a Unix command by -using the escape ``?``, as follows: -``` - > generate_trees -depth=4 | ? wc -``` - - - - - -===More on pipes; tracing=== - -A pipe of GF commands can have any length, but the "output type" -(either string or tree) of one command must always match the "input type" -of the next command, in order for the result to make sense. 
- -The intermediate results in a pipe can be observed by adding the -**tracing** flag ``-tr`` to each command whose output you -want to see: -``` - > gr -tr | l -tr | p - - Is (This Cheese) Boring - this cheese is boring - Is (This Cheese) Boring -``` -This facility is good for test purposes: for instance, you -may want to see if a grammar is **ambiguous**, i.e. -contains strings that can be parsed in more than one way. - -**Exercise**. Extend the ``Food`` grammar so that it produces ambiguous -strings, and try out the ambiguity test. - - - -===Writing and reading files=== - -To save the outputs of GF commands into a file, you can -pipe them to the ``write_file = wf`` command, -``` - > gr -number=10 | l | write_file exx.tmp -``` -You can read the file back to GF with the -``read_file = rf`` command, -``` - > read_file exx.tmp | p -lines -``` -Notice the flag ``-lines`` given to the parsing -command. This flag tells GF to parse each line of -the file separately. Without the flag, the grammar could -not recognize the string in the file, because it is not -a sentence but a sequence of ten sentences. - -Files with examples can be used for **regression testing** -of grammars. - - - - -%--! -==Modules and files== - -GF uses suffixes to recognize different file formats. The most -important ones are: -- Source files: //Modulename//``.gf`` -- Target files: //Modulename//``.gfc`` - - -When you import ``FoodEng.gf``, you see the target files being -generated: -``` - > i FoodEng.gf - - compiling Food.gf... wrote file Food.gfc 16 msec - - compiling FoodEng.gf... wrote file FoodEng.gfc 20 msec -``` -You also see that the GF program does not only read the file -``FoodEng.gf``, but also all other files that it -depends on - in this case, ``Food.gf``. - -For each file that is compiled, a ``.gfc`` file -is generated. The GFC format (="GF Canonical") is the -"machine code" of GF, which is faster to process than -GF source files. 
When reading a module, GF decides whether -to use an existing ``.gfc`` file or to generate -a new one, by looking at modification times. - -**Exercise**. What happens when you import ``FoodEng.gf`` for -a second time? Try this in different situations: -- Right after importing it the first time (the modules are kept in - the memory of GF and need no reloading). -- After issuing the command ``empty`` (``e``), which clears the memory - of GF. -- After making a small change in ``FoodEng.gf``, be it only an added space. -- After making a change in ``Food.gf``. - - - -==An Italian concrete syntax== - -We write the Italian grammar in a straightforward way, by replacing -English words with their usual dictionary equivalents: -``` - concrete FoodIta of Food = { - - lincat - Phrase, Item, Kind, Quality = {s : Str} ; - - lin - Is item quality = {s = item.s ++ "è" ++ quality.s} ; - This kind = {s = "questo" ++ kind.s} ; - That kind = {s = "quello" ++ kind.s} ; - QKind quality kind = {s = kind.s ++ quality.s} ; - Wine = {s = "vino"} ; - Cheese = {s = "formaggio"} ; - Fish = {s = "pesce"} ; - Very quality = {s = "molto" ++ quality.s} ; - Fresh = {s = "fresco"} ; - Warm = {s = "caldo"} ; - Italian = {s = "italiano"} ; - Expensive = {s = "caro"} ; - Delicious = {s = "delizioso"} ; - Boring = {s = "noioso"} ; - } -``` -An alert reader, or one who already knows Italian, may notice one point in -which a change more radical than replacement of words is made: the order of -a quality and the kind it modifies in -``` - QKind quality kind = {s = kind.s ++ quality.s} ; -``` -Thus Italian says ``vino italiano`` for ``Italian wine``. - -**Exercise**. Write a concrete syntax of ``Food`` for some other language. -You will probably end up with grammatically incorrect output - but don't -worry about this yet. - -**Exercise**. 
If you have written ``Food`` for German, Swedish, or some -other language, test with random or exhaustive generation what constructs -come out incorrect, and prepare a list of those that cannot be helped -with the currently available fragment of GF. - - - -==More applications of multilingual grammars== - -===Multilingual treebanks=== - -A **multilingual treebank** is a set of trees with their -translations in different languages: -``` - > gr -number=2 | tree_bank - - Is (That Cheese) (Very Boring) - quello formaggio è molto noioso - that cheese is very boring - - Is (That Cheese) Fresh - quello formaggio è fresco - that cheese is fresh -``` - - -===Translation session=== - -If translation is what you want to do with a set of grammars, a convenient -way to do it is to open a ``translation_session = ts``. In this session, -you can translate between all the languages that are in scope. -A dot ``.`` terminates the translation session. -``` - > ts - - trans> that very warm cheese is boring - quello formaggio molto caldo è noioso - that very warm cheese is boring - - trans> questo vino molto italiano è molto delizioso - questo vino molto italiano è molto delizioso - this very Italian wine is very delicious - - trans> . - > -``` - - -===Translation quiz=== - -This is a simple language exercise that can be automatically -generated from a multilingual grammar. The system generates a set of -random sentences, displays them in one language, and checks the user's -answer given in another language. The command ``translation_quiz = tq`` -does this in a subshell of GF. -``` - > translation_quiz FoodEng FoodIta - - Welcome to GF Translation Quiz. - The quiz is over when you have done at least 10 examples - with at least 75 % success. - You can interrupt the quiz by entering a line consisting of a dot ('.'). - - this fish is warm - questo pesce è caldo - > Yes.
- Score 1/1 - - this cheese is Italian - questo formaggio è noioso - > No, not questo formaggio è noioso, but - questo formaggio è italiano - - Score 1/2 - this fish is expensive -``` -You can also generate a list of translation exercises and save it in a -file for later use, by the command ``translation_list = tl`` -``` - > translation_list -number=25 FoodEng FoodIta | write_file transl.txt -``` -The ``number`` flag gives the number of sentences generated. - - - -===Multilingual syntax editing=== - -Any multilingual grammar can be used in the graphical syntax editor, which is -opened by the shell -command ``gfeditor`` followed by the names of the grammar files. -Thus -``` - % gfeditor FoodEng.gf FoodIta.gf -``` -opens the editor for the two ``Food`` grammars. - -The editor supports commands for manipulating an abstract syntax tree. -The process is started by choosing a category from the "New" menu. -Choosing ``Phrase`` creates a new tree of type ``Phrase``. A new tree -is in general completely unknown: it consists of a **metavariable** -``?1``. However, since the category ``Phrase`` in ``Food`` has -only one possible constructor, ``Is``, the tree is readily -given the form ``Is ?1 ?2``. Here is what the editor looks like at -this stage: - - [food1.png] - -Editing goes on by **refinements**, i.e. choices of constructors from -the menu, until no metavariables remain. Here is a tree resulting from the -current editing session: - - [food2.png] - -Editing can be continued even when the tree is finished. The user can shift -the **focus** to one of the subtrees by clicking on it or on the corresponding -part of a linearization. In the picture, the focus is on "fish". -The menu shows no refinements, since there are no metavariables, but other -possible actions: -- to **change** "fish" to "cheese" or "wine" -- to **delete** "fish", i.e. change it to a metavariable -- to **wrap** "fish" in a qualification, i.e. change it to - ``QKind ?
Fish``, where the quality can be given in a later refinement - - -In addition to menu-based editing, the tool supports refinement by parsing, -which becomes accessible by middle-clicking on the linearization field. - -**Exercise**. Construct the sentence -//this very expensive cheese is very very delicious// -and its Italian translation by using ``gfeditor``. - - -==The context-free grammar format== - -Readers not familiar with context-free grammars, also known as BNF grammars, can -skip this section. Those that are familiar with them will find here the exact -relation between GF and context-free grammars. We will moreover show how -the BNF format can be used as input to the GF program; it is often more -concise than GF proper, but also more restricted in expressive power. - - - -==Using resource modules== - -===The golden rule of functional programming=== - -When writing a grammar, you have to type lots of -characters. You have probably -done this by the copy-paste-modify method, which is a common way to -avoid repeating work. - -However, there is a more elegant way to avoid repeating work than -the copy-and-paste -method. The **golden rule of functional programming** says that -- whenever you find yourself programming by copy-and-paste, - write a function instead. - - -A function separates the shared parts of different computations from the -changing parts, its **arguments**, or **parameters**. -In functional programming languages, such as -[Haskell http://www.haskell.org], it is possible to share much more -code with functions than in languages such as C and Java, because -of higher-order functions (functions that take functions as arguments). - - -===Operation definitions=== - -GF is a functional programming language, not only in the sense that -the abstract syntax is a system of functions (``fun``), but also because -functional programming can be used when defining concrete syntax.
This is -done by using a new form of judgement, with the keyword ``oper`` (for -**operation**), distinct from ``fun`` for the sake of clarity. -Here is a simple example of an operation: -``` - oper ss : Str -> {s : Str} = \x -> {s = x} ; -``` -The operation can be **applied** to an argument, and GF will -**compute** the application into a value. For instance, -``` - ss "boy" ===> {s = "boy"} -``` -We use the symbol ``===>`` to indicate how an expression is -computed into a value; this symbol is not a part of GF. - -Thus an ``oper`` judgement includes the name of the defined operation, -its type, and an expression defining it. As for the syntax of the defining -expression, notice the **lambda abstraction** form ``\``//x// ``->`` //t// of -the function. It reads: function with variable //x// and **function body** -//t//. - -For lambda abstraction with multiple arguments, we have the shorthand -``` - \x,y,z -> t === \x -> \y -> \z -> t -``` -The notation we have used for linearization rules, -``` - lin f x y = t -``` -is shorthand for -``` - lin f = \x,y -> t -``` - - - - - -%--! -===The ``resource`` module type=== - -Operator definitions can be included in a concrete syntax. -But they are not really tied to a particular set of linearization rules. -They should rather be seen as **resources** -usable in many concrete syntaxes. - -The ``resource`` module type is used to package -``oper`` definitions into reusable resources. Here is -an example, with a handful of operations to manipulate -strings and records. -``` - resource StringOper = { - oper - SS : Type = {s : Str} ; - ss : Str -> SS = \x -> {s = x} ; - cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ; - prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ; - } -``` -Resource modules can extend other resource modules, in the -same way as modules of other types can extend modules of the -same type. Thus it is possible to build resource hierarchies. - - - -%--! 
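
To see what the operations of ``StringOper`` compute, they can be mimicked in Python. This is only a rough model under stated assumptions: GF records are represented as Python dictionaries, and the token concatenation ``++`` is approximated by joining strings with a space. The Python names simply mirror the GF ones.

```python
# Model of:  ss : Str -> SS = \x -> {s = x} ;
def ss(x):
    return {"s": x}


# Model of:  cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;
# The GF operator ++ is approximated by joining with a space.
def cc(x, y):
    return ss(x["s"] + " " + y["s"])


# Model of:  prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
def prefix(p, x):
    return ss(p + " " + x["s"])


item = prefix("this", ss("cheese"))
print(item)
print(cc(item, prefix("is", ss("fresh"))))
```

Each operation returns a new record, so the operations can be nested freely, just as in a GF linearization rule.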
-===Opening a resource=== - -Any number of ``resource`` modules can be -**opened** in a ``concrete`` syntax, which -makes definitions contained -in the resource usable in the concrete syntax. Here is -an example, where the resource ``StringOper`` is -opened in a new version of ``FoodEng``. -``` - concrete FoodEng of Food = open StringOper in { - - lincat - Phrase, Item, Kind, Quality = SS ; - - lin - Is item quality = cc item (prefix "is" quality) ; - This k = prefix "this" k ; - That k = prefix "that" k ; - QKind q k = cc q k ; - Wine = ss "wine" ; - Cheese = ss "cheese" ; - Fish = ss "fish" ; - Very = prefix "very" ; - Fresh = ss "fresh" ; - Warm = ss "warm" ; - Italian = ss "Italian" ; - Expensive = ss "expensive" ; - Delicious = ss "delicious" ; - Boring = ss "boring" ; - } -``` - -**Exercise**. Use the same string operations to write ``FoodIta`` -more concisely. - - - -%--! -===Partial application=== - -GF, like Haskell, permits **partial application** of -functions. An example of this is the rule -``` - lin This k = prefix "this" k ; -``` -which can be written more concisely -``` - lin This = prefix "this" ; -``` -The first form is perhaps more intuitive to write -but, once you get used to partial application, you will appreciate its -conciseness and elegance. The logic of partial application -is known as **currying**, with a reference to Haskell B. Curry. -The idea is that any //n//-place function can be defined as a 1-place -function whose value is an //n-//1 -place function. Thus -``` - oper prefix : Str -> SS -> SS ; -``` -can be used as a 1-place function that takes a ``Str`` into a -function ``SS -> SS``. The expected linearization of ``This`` is exactly -a function of such a type, operating on an argument of type ``Kind`` -whose linearization is of type ``SS``. Thus we can define the -linearization directly as ``prefix "this"``. - -**Exercise**.
Define an operation ``infix`` analogous to ``prefix``, -such that it allows you to write -``` - lin Is = infix "is" ; -``` - - - -===Testing resource modules=== - -To test a ``resource`` module independently, you must import it -with the flag ``-retain``, which tells GF to retain ``oper`` definitions -in the memory; the usual behaviour is that ``oper`` definitions -are just applied to compile linearization rules -(this is called **inlining**) and then thrown away. -``` - > i -retain StringOper.gf -``` -The command ``compute_concrete = cc`` computes any expression -formed by operations and other GF constructs. For example, -``` - > compute_concrete prefix "in" (ss "addition") - { - s : Str = "in" ++ "addition" - } -``` - - - - -==Grammar architecture== - -===Extending a grammar=== - -The module system of GF makes it possible to **extend** a -grammar in different ways. The syntax of extension is -shown by the following example. We extend ``Food`` by -adding a category of questions and two new functions. -``` - abstract Morefood = Food ** { - cat - Question ; - fun - QIs : Item -> Quality -> Question ; - Pizza : Kind ; - - } -``` -Parallel to the abstract syntax, extensions can -be built for concrete syntaxes: -``` - concrete MorefoodEng of Morefood = FoodEng ** { - lincat - Question = {s : Str} ; - lin - QIs item quality = {s = "is" ++ item.s ++ quality.s} ; - Pizza = {s = "pizza"} ; - } -``` -The effect of extension is that all of the contents of the extended -and extending module are put together. We also say that the new -module **inherits** the contents of the old module. - -At the same time as extending a module of the same type, a concrete -syntax module may open resources. 
The syntax is shown by the -following Italian grammar module: -``` - concrete MorefoodIta of Morefood = FoodIta ** open StringOper in { - lincat - Question = SS ; - lin - QIs item quality = ss (item.s ++ "è" ++ quality.s) ; - Pizza = ss "pizza" ; - } -``` - - - -===Multiple inheritance=== - -Specialized vocabularies can be represented as small grammars that -only do "one thing" each. For instance, the following are grammars -for fruit and mushrooms -``` - abstract Fruit = { - cat Fruit ; - fun Apple, Peach : Fruit ; - } - - abstract Mushroom = { - cat Mushroom ; - fun Cep, Agaric : Mushroom ; - } -``` -They can afterwards be combined into bigger grammars by using -**multiple inheritance**, i.e. extension of several grammars at the -same time: -``` - abstract Foodmarket = Food, Fruit, Mushroom ** { - fun - FruitKind : Fruit -> Kind ; - MushroomKind : Mushroom -> Kind ; - } -``` - -**Exercise**. Refactor ``Food`` by taking apart ``Wine`` into a special -``Drink`` module. - - - -===System commands=== - -To document your grammar, you may want to print the -graph into a file, e.g. a ``.png`` file that -can be included in an HTML document. You can do this -by first printing the graph into a file ``.dot`` and then -processing this file with the ``dot`` program (from the Graphviz package). -``` - > pm -printer=graph | wf Foodmarket.dot - > ! dot -Tpng Foodmarket.dot > Foodmarket.png -``` -The latter command is a Unix command, issued from GF by using the -shell escape symbol ``!``. The resulting graph was shown in the previous section. - -The command ``print_multi = pm`` is used for printing the current multilingual -grammar in various formats, of which the format ``-printer=graph`` just -shows the module dependencies. Use ``help`` to see what other formats -are available: -``` - > help pm - > help -printer - > help help -``` -Another form of system commands are those usable in GF pipes. The escape symbol -is then ``?``. -``` - > generate_trees | ? 
wc -``` - - -===Division of labour=== - -Using operations defined in resource modules is a -way to avoid repetitive code. -In addition, it enables a new kind of modularity -and division of labour in grammar writing: grammarians familiar with -the linguistic details of a language can make their knowledge -available through resource grammar modules, whose users only need -to pick the right operations and not to know their implementation -details. - -In the following sections, we will go through some -such linguistic details. The programming constructs needed when -doing this are useful for all GF programmers, even for those who don't -hand-code the linguistics of their applications but get them -from libraries. And it is quite interesting to know something about the -linguistic concepts of inflection, agreement, and parts of speech. - - -==Summary of GF language features== - -Module extensions, multiple inheritance. - -Resource modules. - -Oper judgements. - -Lambda abstraction. - -The ``.cf`` grammar format. - - - - -=Grammars with parameters= - -==The problem: words have to be inflected== - -Suppose we want to say, with the vocabulary included in -``Food.gf``, things like -``` - all Italian wines are delicious -``` -The new grammatical facility we need are the plural forms -of nouns and verbs (//wines, are//), as opposed to their -singular forms. - -The introduction of plural forms requires two things: -- the **inflection** of nouns and verbs in singular and plural -- the **agreement** of the verb to subject: - the verb must have the same number as the subject - - -Different languages have different rules of inflection and agreement. -For instance, Italian has also agreement in gender (masculine vs. feminine). -We want to express such special features of languages in the -concrete syntax while ignoring them in the abstract syntax. - -To be able to do all this, we need one new judgement form -and many new expression forms. 
-We also need to generalize linearization types -from strings to more complex types. - -**Exercise**. Make a list of the possible forms that nouns, -adjectives, and verbs can have in some languages that you know. - - -%--! -==Parameters and tables== - -We define the **parameter type** of number in English by -using a new form of judgement: -``` - param Number = Sg | Pl ; -``` -To express that ``Kind`` expressions in English have a linearization -depending on number, we replace the linearization type ``{s : Str}`` -with a type where the ``s`` field is a **table** depending on number: -``` - lincat Kind = {s : Number => Str} ; -``` -The **table type** ``Number => Str`` is in many respects similar to -a function type (``Number -> Str``). The main difference is that the -argument type of a table type must always be a parameter type. This means -that the argument-value pairs can be listed in a finite table. The following -example shows such a table: -``` - lin Cheese = {s = table { - Sg => "cheese" ; - Pl => "cheeses" - } - } ; -``` -The table consists of **branches**, where a **pattern** on the -left of the arrow ``=>`` is assigned a **value** on the right. - -The application of a table to a parameter is done by the **selection** -operator ``!``. For instance, -``` - table {Sg => "cheese" ; Pl => "cheeses"} ! Pl -``` -is a selection that computes into the value ``"cheeses"``. -This computation is performed by **pattern matching**: return -the value from the first branch whose pattern matches the -selection argument. Thus -``` - table {Sg => "cheese" ; Pl => "cheeses"} ! Pl - ===> "cheeses" -``` - -**Exercise**. In a previous exercise, we made a list of the possible -forms that nouns, adjectives, and verbs can have in some languages that -you know. Now take some of the results and implement them by -using parameter type definitions and tables. Write them into a ``resource`` -module, which you can test by using the command ``compute_concrete``. - - - -%--! 
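
The behaviour of tables and selection described in this section can be modelled in Python. The sketch below is an illustration only, under the assumption that a parameter type is represented as an ``Enum`` and a table as a dictionary; ``select`` and ``is_total`` are hypothetical helper names, not GF functions.

```python
from enum import Enum


class Number(Enum):
    """Model of:  param Number = Sg | Pl ;"""
    Sg = "Sg"
    Pl = "Pl"


# Model of:  lin Cheese = {s = table {Sg => "cheese" ; Pl => "cheeses"}} ;
cheese = {"s": {Number.Sg: "cheese", Number.Pl: "cheeses"}}


def select(table, param):
    """Model of the selection operator:  table ! param"""
    return table[param]


def is_total(table, param_type):
    """A table lists one value for every value of its parameter type."""
    return set(table) == set(param_type)


print(select(cheese["s"], Number.Pl))
print(is_total(cheese["s"], Number))
```

Because a parameter type has finitely many values, the whole table can be written out and checked for completeness, which is not possible for a function over an infinite type such as strings.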
-==Inflection tables and paradigms== - -All English common nouns are inflected in number, most of them in the -same way: the plural form is obtained from the singular by adding the -ending //s//. This rule is an example of -a **paradigm** - a formula telling how the inflection -forms of a word are formed. - -From the GF point of view, a paradigm is a function that takes a **lemma** - -also known as a **dictionary form** - and returns an inflection -table of the desired type. Paradigms are not functions in the sense of the -``fun`` judgements of abstract syntax (which operate on trees and not -on strings), but operations defined in ``oper`` judgements. -The following operation defines the regular noun paradigm of English: -``` - oper regNoun : Str -> {s : Number => Str} = \x -> { - s = table { - Sg => x ; - Pl => x + "s" - } - } ; -``` -The **gluing** operator ``+`` indicates that -the string held in the variable ``x`` and the ending ``"s"`` -are written together to form one **token**. Thus, for instance, -``` - (regNoun "cheese").s ! Pl ===> "cheese" + "s" ===> "cheeses" -``` - -**Exercise**. Identify cases in which the ``regNoun`` paradigm does not -apply in English, and implement some alternative paradigms. - -**Exercise**. Implement a paradigm for regular verbs in English. - -**Exercise**. Implement some regular paradigms for other languages you have -considered in earlier exercises. - - - -==Using parameters in concrete syntax== - -We can now enrich the concrete syntax definitions to -comprise morphology. This will permit a more radical -variation between languages (e.g. English and Italian) -than just the use of different words. In general, -parameters and linearization types are different in -different languages - but this does not prevent the -use of a common abstract syntax. - - -%--! -===Parametric vs. inherent features, agreement=== - -The rule of subject-verb agreement in English says that the verb -phrase must be inflected in the number of the subject.
This -means that a noun phrase (functioning as a subject) inherently -has a number, which it passes to the verb. The verb does not -//have// a number, but must be able to //receive// whatever number the -subject has. This distinction is nicely represented by the -different linearization types of **noun phrases** and **verb phrases**: -``` - lincat NP = {s : Str ; n : Number} ; - lincat VP = {s : Number => Str} ; -``` -We say that the number of ``NP`` is an **inherent feature**, -whereas the number of ``VP`` is a **variable feature** (or a -**parametric feature**). - -The agreement rule itself is expressed in the linearization rule of -the predication function: -``` - lin PredVP np vp = {s = np.s ++ vp.s ! np.n} ; -``` -The following section will present -``FoodsEng``, assuming the abstract syntax ``Foods`` -that is similar to ``Food`` but also has the -plural determiners ``These`` and ``Those``. -The reader is invited to inspect the way in which agreement works in -the formation of sentences. - - -%--! -===English concrete syntax with parameters=== - -The grammar uses both -[``Prelude`` ../../lib/prelude/Prelude.gf] and -[``MorphoEng`` resource/MorphoEng]. -We will later see how to make the grammar even -more high-level by using a resource grammar library -and parametrized modules. -``` ---# -path=.:resource:prelude - -concrete FoodsEng of Foods = open Prelude, MorphoEng in { - - lincat - S, Quality = SS ; - Kind = {s : Number => Str} ; - Item = {s : Str ; n : Number} ; - - lin - Is item quality = - ss (item.s ++ (mkVerb "are" "is").s ! item.n ++ quality.s) ; - This = det Sg "this" ; - That = det Sg "that" ; - These = det Pl "these" ; - Those = det Pl "those" ; - QKind quality kind = {s = \\n => quality.s ++ kind.s !
n} ; - Wine = regNoun "wine" ; - Cheese = regNoun "cheese" ; - Fish = mkNoun "fish" "fish" ; - Very = prefixSS "very" ; - Fresh = ss "fresh" ; - Warm = ss "warm" ; - Italian = ss "Italian" ; - Expensive = ss "expensive" ; - Delicious = ss "delicious" ; - Boring = ss "boring" ; - - oper - det : Number -> Str -> Noun -> {s : Str ; n : Number} = - \n,d,cn -> { - s = d ++ cn.s ! n ; - n = n - } ; -} -``` - - -==Pattern matching== - -We have so far built all expressions of the ``table`` form -from branches whose patterns are constants introduced in -``param`` definitions, as well as constant strings. -But there are more expressive patterns. Here is a summary of the possible forms: -- a constructor pattern (identifier introduced in a ``param`` definition) matches - the identical constructor -- a variable pattern (identifier other than constant parameter) matches anything -- the wild card ``_`` matches anything -- a string literal pattern, e.g. ``"s"``, matches the same string -- a disjunctive pattern ``P | ... | Q`` matches anything that - one of the disjuncts matches - - -Pattern matching is performed in the order in which the branches -appear in the table: the branch of the first matching pattern is followed. -As a first example, let us take an English noun that has the same form in -singular and plural: -``` - lin Fish = {s = table {_ => "fish"}} ; -``` -As syntactic sugar, one-branch tables can be written concisely, -``` - \\P,...,Q => t === table {P => ... table {Q => t} ...} -``` -Thus we could rewrite the above rule -``` - lin Fish = {s = \\_ => "fish"} ; -``` -Finally, the ``case`` expressions common in functional -programming languages are syntactic sugar for table selections: -``` - case e of {...} === table {...} ! e -``` - - - -%--!
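
The first-match rule of the pattern matching section can be sketched in Python. This is a simplified model, not GF's implementation: a table is a list of (pattern, value) branches, ``WILDCARD`` stands for ``_``, a tuple stands for a disjunctive pattern, and variable patterns are left out. All names are hypothetical.

```python
WILDCARD = object()  # stands for the GF wild card _


def matches(pattern, value):
    """The wild card matches anything; a tuple is a disjunctive pattern P | ... | Q."""
    if pattern is WILDCARD:
        return True
    if isinstance(pattern, tuple):
        return any(matches(p, value) for p in pattern)
    return pattern == value  # constructor or string literal pattern


def select(branches, value):
    """Model of  table {...} ! value : the first matching branch is followed."""
    for pattern, result in branches:
        if matches(pattern, value):
            return result
    raise ValueError("no branch matches " + repr(value))


def case(value, branches):
    """Model of  case e of {...}  ===  table {...} ! e"""
    return select(branches, value)


fish = [(WILDCARD, "fish")]                     # table {_ => "fish"}
ending = [(("s", "x"), "es"), (WILDCARD, "s")]  # disjunction first, default last
print(select(fish, "Pl"))
print(case("x", ending))
print(case("t", ending))
```

Since branches are tried in order, a wild card branch placed first would hide every branch after it, which is why the default branch is written last.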
-==Hierarchic parameter types== - -The reader familiar with a functional programming language such as -[Haskell http://www.haskell.org] must have noticed the similarity -between parameter types in GF and **algebraic datatypes** (``data`` definitions -in Haskell). The GF parameter types are actually a special case of algebraic -datatypes: the main restriction is that in GF, these types must be finite. -(It is this restriction that makes it possible to invert linearization rules into -parsing methods.) - -However, finite is not the same thing as enumerated. Even in GF, parameter -constructors can take arguments, provided these arguments are from other -parameter types - only recursion is forbidden. Such parameter types impose a -hierarchic order among parameters. They are often needed to define -the linguistically most accurate parameter systems. - -To give an example, Swedish adjectives -are inflected in number (singular or plural) and -gender (uter or neuter). These parameters would suggest 2*2=4 different -forms. However, the gender distinction is done only in the singular. Therefore, -it would be inaccurate to define adjective paradigms using the type -``Gender => Number => Str``. The following hierarchic definition -yields an accurate system of three adjectival forms. -``` - param AdjForm = ASg Gender | APl ; - param Gender = Utr | Neutr ; -``` -Here is an example of pattern matching, the paradigm of regular adjectives. -``` - oper regAdj : Str -> AdjForm => Str = \fin -> table { - ASg Utr => fin ; - ASg Neutr => fin + "t" ; - APl => fin + "a" ; - } -``` -A constructor can be used as a pattern that has patterns as arguments. For instance, -the adjectival paradigm in which the two singular forms are the same, -can be defined -``` - oper plattAdj : Str -> AdjForm => Str = \platt -> table { - ASg _ => platt ; - APl => platt + "a" ; - } -``` - - - - -%--! -==Discontinuous constituents== - -A linearization type may contain more strings than one. 
-An example of where this is useful are English particle -verbs, such as //switch off//. The linearization of -a sentence may place the object between the verb and the particle: -//he switched it off//. - -The following judgement defines transitive verbs as -**discontinuous constituents**, i.e. as having a linearization -type with two strings and not just one. -``` - lincat TV = {s : Number => Str ; part : Str} ; -``` -This linearization rule -shows how the constituents are separated by the object in complementization. -``` - lin PredTV tv obj = {s = \\n => tv.s ! n ++ obj.s ++ tv.part} ; -``` -There is no restriction in the number of discontinuous constituents -(or other fields) a ``lincat`` may contain. The only condition is that -the fields must be of finite types, i.e. built from records, tables, -parameters, and ``Str``, and not functions. - -A mathematical result -about parsing in GF says that the worst-case complexity of parsing -increases with the number of discontinuous constituents. This is -potentially a reason to avoid discontinuous constituents. -Moreover, the parsing and linearization commands only give accurate -results for categories whose linearization type has a unique ``Str`` -valued field labelled ``s``. Therefore, discontinuous constituents -are not a good idea in top-level categories accessed by the users -of a grammar application. - - -**Exercise**. Define the language ``a^n b^n c^n`` in GF. - - -==More constructs for concrete syntax== - -In this section, we go through constructs that are not necessary -in simple grammars or when the concrete syntax relies on libraries. -But they are useful when writing advanced concrete syntax implementations, -such as resource grammar libraries. Moreover, they conclude -the presentation of concrete syntax constructs. - - -%--! 
-===Local definitions=== - -Local definitions ("``let`` expressions") are used in functional -programming for two reasons: to structure the code into smaller -expressions, and to avoid repeated computation of one and -the same expression. Here is an example, from -[``MorphoIta`` resource/MorphoIta.gf]: -``` - oper regNoun : Str -> Noun = \vino -> - let - vin = init vino ; - o = last vino - in - case o of { - "a" => mkNoun Fem vino (vin + "e") ; - "o" | "e" => mkNoun Masc vino (vin + "i") ; - _ => mkNoun Masc vino vino - } ; -``` - - - -===Record extension and subtyping=== - -Record types and records can be **extended** with new fields. For instance, -in German it is natural to see transitive verbs as verbs with a case. -The symbol ``**`` is used for both constructs. -``` - lincat TV = Verb ** {c : Case} ; - - lin Follow = regVerb "folgen" ** {c = Dative} ; -``` -To extend a record type or a record with a field whose label it -already has is a type error. It is also an error to extend a type or -object that is not a record. - -A record type //T// is a **subtype** of another one //R//, if //T// has -all the fields of //R// and possibly other fields. For instance, -an extension of a record type is always a subtype of it. - -If //T// is a subtype of //R//, an object of //T// can be used whenever -an object of //R// is required. For instance, a transitive verb can -be used whenever a verb is required. - -**Contravariance** means that a function taking an //R// as argument -can also be applied to any object of a subtype //T//. - - - -===Tuples and product types=== - -Product types and tuples are syntactic sugar for record types and records: -``` - T1 * ... * Tn === {p1 : T1 ; ... ; pn : Tn} - <t1, ..., tn> === {p1 = t1 ; ... ; pn = tn} -``` -Thus the labels ``p1, p2,...`` are hard-coded. - - -===Record and tuple patterns=== - -Record types of parameter types also count as parameter types. -A typical example is a record of agreement features, e.g.
French -``` - oper Agr : PType = {g : Gender ; n : Number ; p : Person} ; -``` -Notice the term ``PType`` rather than just ``Type`` referring to -parameter types. Every ``PType`` is also a ``Type``, but not vice-versa. - -Pattern matching is done in the expected way, but it can moreover -utilize partial records: the branch -``` - {g = Fem} => t -``` -in a table of type ``Agr => T`` means the same as -``` - {g = Fem ; n = _ ; p = _} => t -``` -Tuple patterns are translated to record patterns in the -same way as tuples to records; partial patterns make it -possible to write, slightly surprisingly, -``` - case <g,n,p> of { - <Fem> => t - ... - } -``` - -===Regular expression patterns=== - -To define string operations computed at compile time, such -as in morphology, it is handy to use regular expression patterns: - - //p// ``+`` //q// : token consisting of //p// followed by //q// - - //p// ``*`` : token //p// repeated 0 or more times - (max the length of the string to be matched) - - ``-`` //p// : matches anything that //p// does not match - - //x// ``@`` //p// : bind to //x// what //p// matches - - //p// ``|`` //q// : matches what either //p// or //q// matches - - -The last three apply to all types of patterns, the first two only to token strings. -As an example, we give a rule for the formation of English word forms -ending with an //s// and used in the formation of both plural nouns and -third-person present-tense verbs. -``` - add_s : Str -> Str = \w -> case w of { - _ + "oo" => w + "s" ; -- bamboo - _ + ("s" | "z" | "x" | "sh" | "o") => w + "es" ; -- bus, hero - _ + ("a" | "o" | "u" | "e") + "y" => w + "s" ; -- boy - x + "y" => x + "ies" ; -- fly - _ => w + "s" -- car - } ; -``` -Here is another example, the plural formation in Swedish 2nd declension.
-The second branch uses a variable binding with ``@`` to cover the cases where an -unstressed pre-final vowel //e// disappears in the plural -(//nyckel-nycklar, seger-segrar, bil-bilar//): -``` - plural2 : Str -> Str = \w -> case w of { - pojk + "e" => pojk + "ar" ; - nyck + "e" + l@("l" | "r" | "n") => nyck + l + "ar" ; - bil => bil + "ar" - } ; -``` -Variables in regular expression patterns -are always bound to the **first match**, which is the first -in the sequence of binding lists. For example: -- ``x + "e" + y`` matches ``"peter"`` with ``x = "p", y = "ter"`` -- ``x + "er"*`` matches ``"burgerer"`` with ``x = "burg"`` - - - -**Exercise**. Implement the German **Umlaut** operation on word stems. -The operation changes the vowel of the stressed stem syllable as follows: -//a// to //ä//, //au// to //äu//, //o// to //ö//, and //u// to //ü//. You -can assume that the operation only takes syllables as arguments. Test the -operation to see whether it correctly changes //Arzt// to //Ärzt//, -//Baum// to //Bäum//, //Topf// to //Töpf//, and //Kuh// to //Küh//. - -**Exercise**. Define an operation that deletes all vowels from the -end of a string, so that e.g. "aigeia" becomes "aig". - - -===Free variation=== - -Sometimes there are many alternative ways to define a concrete syntax. -For instance, the verb negation in English can be expressed both by -//does not// and //doesn't//. In linguistic terms, these expressions -are in **free variation**. The ``variants`` construct of GF can -be used to give a list of strings in free variation. For example, -``` - NegVerb verb = {s = variants {["does not"] ; "doesn't"} ++ verb.s ! Pl} ; -``` -An empty variant list -``` - variants {} -``` -can be used e.g. if a word lacks a certain form. - -In general, ``variants`` should be used cautiously. It is not -recommended for modules intended as libraries, because the -user of the library has no way to choose among the variants. - - -%--! 
-===Prefix-dependent choices=== - -Sometimes a token has different forms depending on the token -that follows. An example is the English indefinite article, -which is //an// if a vowel follows, //a// otherwise. -Which form is chosen can only be decided at run time, i.e. -when a string is actually built. GF has a special construct for -such tokens, the ``pre`` construct exemplified in -``` - oper artIndef : Str = - pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ; -``` -Thus -``` - artIndef ++ "cheese" ---> "a" ++ "cheese" - artIndef ++ "apple" ---> "an" ++ "apple" -``` -This very example does not work in all situations: the prefix -//u// has no general rules, and some problematic words are -//euphemism, one-eyed, n-gram//. It is possible to write -``` - oper artIndef : Str = - pre {"a" ; - "a" / strs {"eu" ; "one"} ; - "an" / strs {"a" ; "e" ; "i" ; "o" ; "n-"} - } ; -``` - - -===Predefined types=== - -GF has the following predefined categories in abstract syntax: -``` - cat Int ; -- integers, e.g. 0, 5, 743145151019 - cat Float ; -- floats, e.g. 0.0, 3.1415926 - cat String ; -- strings, e.g. "", "foo", "123" -``` -The objects of each of these categories are **literals** -as indicated in the comments above. No ``fun`` definition -can have a predefined category as its value type, but -they can be used as arguments. For example: -``` - fun StreetAddress : Int -> String -> Address ; - lin StreetAddress number street = {s = number.s ++ street.s} ; - - -- e.g. (StreetAddress 10 "Downing Street") : Address -``` -FIXME: The linearization type is ``{s : Str}`` for all these categories. - - -===Overloading of operations=== - -Large libraries, such as the GF Resource Grammar Library, may define -hundreds of names. This can be impractical -for both the library author and the user: the author has to invent longer -and longer names which are not always intuitive, -and the user has to learn or at least be able to find all these names. 
-A solution to this problem, adopted by languages such as C++, -is **overloading**: one and the same name can be used for several functions. -When such a name is used, the -compiler performs **overload resolution** to find out which of -the possible functions is meant. Overload resolution is based on -the types of the functions: all functions that -have the same name must have different types. - -In C++, functions with the same name can be scattered everywhere in the program. -In GF, they must be grouped together in ``overload`` groups. Here is an example -of an overload group, giving four different ways to define verbs in English: -``` - oper mkV = overload { - mkV : (walk : Str) -> V = ... ; -- regular verbs - mkV : (omit,omitted : Str) -> V = ... ; -- regular verbs with duplication - mkV : (sing,sang,sung : Str) -> V = ... ; -- irregular verbs - mkV : (run,ran,run,running : Str) -> V = ... ; -- irregular verbs with duplication - } ; -``` -Intuitively, the forms correspond to the way regular and irregular words -are given in a dictionary: by listing relevant forms, instead of -referring to a paradigm. - - - - -=Implementing morphology and syntax= - -In this chapter, we will dig deeper into linguistic concepts than -so far. We will build an implementation of a linguistically motivated -fragment of English and Italian, covering basic morphology and syntax. -The result is a miniature of the GF resource library, which will -be covered in the next chapter. There are two main purposes -for this chapter: -- first, to understand the linguistic concepts underlying the resource - grammar library -- second, to get practice in the more advanced constructs of concrete syntax - - -However, the reader who is not willing to work on an advanced level -of concrete syntax may just skim through the introductory parts of -each section, thus using the chapter for its first purpose only. 
- - - -==Worst-case functions and data abstraction== - -Some English nouns, such as ``mouse``, are so irregular that -it makes no sense to see them as instances of a paradigm. Even -then, it is useful to perform **data abstraction** from the -definition of the type ``Noun``, and introduce a constructor -operation, a **worst-case function** for nouns: -``` - oper mkNoun : Str -> Str -> Noun = \x,y -> { - s = table { - Sg => x ; - Pl => y - } - } ; -``` -Thus we can define -``` - lin Mouse = mkNoun "mouse" "mice" ; -``` -and -``` - oper regNoun : Str -> Noun = \x -> - mkNoun x (x + "s") ; -``` -instead of writing the inflection tables explicitly. - -The grammar engineering advantage of worst-case functions is that -the author of the resource module may change the definitions of -``Noun`` and ``mkNoun``, and still retain the -interface (i.e. the system of type signatures) that makes it -correct to use these functions in concrete modules. In programming -terms, ``Noun`` is then treated as an **abstract datatype**. - - - -%--! -==A system of paradigms using predefined string operations== - -In addition to the completely regular noun paradigm ``regNoun``, -some other frequent noun paradigms deserve to be -defined, for instance, -``` - sNoun : Str -> Noun = \kiss -> mkNoun kiss (kiss + "es") ; -``` -What about nouns like //fly//, with the plural //flies//? The already -available solution is to use the longest common prefix -//fl// (also known as the **technical stem**) as argument, and define -``` - yNoun : Str -> Noun = \fl -> mkNoun (fl + "y") (fl + "ies") ; -``` -But this paradigm would be very unintuitive to use, because the technical stem -is not an existing form of the word. A better solution is to use -the lemma and a string operator ``init``, which returns the initial segment (i.e. 
all characters but the last) of a string: -``` - yNoun : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ; -``` -The operation ``init`` belongs to a set of operations in the -resource module ``Prelude``, which therefore has to be -``open``ed so that ``init`` can be used. -``` - > cc init "curry" - "curr" -``` -Its dual is ``last``: -``` - > cc last "curry" - "y" -``` -As generalizations of the library functions ``init`` and ``last``, GF has -two predefined functions: -``Predef.dp``, which "drops" a prefix and returns a suffix of the given length, -and ``Predef.tk``, which "takes" a prefix -by omitting the given number of characters from the end. For instance, -``` - > cc Predef.tk 3 "worried" - "worr" - > cc Predef.dp 3 "worried" - "ied" -``` -The prefix ``Predef`` is given to a handful of functions that could -not be defined internally in GF. They are available in all modules -without explicit ``open`` of the module ``Predef``. - - - - - - -%--! -==An intelligent noun paradigm using pattern matching== - -It may be hard for the user of a resource morphology to pick the right -inflection paradigm. A way to help this is to define a more intelligent -paradigm, which chooses the ending by first analysing the lemma. -The following variant for English regular nouns puts together all the -previously shown paradigms, and chooses one of them on the basis of -the final letter of the lemma (found by the prelude operation ``last``). -``` - regNoun : Str -> Noun = \s -> case last s of { - "s" | "z" => mkNoun s (s + "es") ; - "y" => mkNoun s (init s + "ies") ; - _ => mkNoun s (s + "s") - } ; -``` -The paradigm ``regNoun`` does not give the correct forms for -all nouns. For instance, //mouse - mice// and -//fish - fish// must be given by using ``mkNoun``. -Also the word //boy// would be inflected incorrectly; to prevent -this, either use ``mkNoun`` or modify -``regNoun`` so that the ``"y"`` case does not -apply if the second-last character is a vowel. - -**Exercise**. 
Extend the ``regNoun`` paradigm so that it takes care -of all variations there are in English. Test it with the nouns -//ax//, //bamboo//, //boy//, //bush//, //hero//, //match//. -**Hint**. The library functions ``Predef.dp`` and ``Predef.tk`` -are useful in this task. - -**Exercise**. The same rules that form plural nouns in English also -apply in the formation of third-person singular verbs. -Write a regular verb paradigm that uses this idea, but first -rewrite ``regNoun`` so that the analysis needed to build //s//-forms -is factored out as a separate ``oper``, which is shared with -``regVerb``. - - - - - -%--! -==Morphological resource modules== - -A common idiom is to -gather the ``oper`` and ``param`` definitions -needed for inflecting words in -a language into a morphology module. Here is a simple -example, [``MorphoEng`` resource/MorphoEng.gf]. -``` - --# -path=.:prelude - - resource MorphoEng = open Prelude in { - - param - Number = Sg | Pl ; - - oper - Noun, Verb : Type = {s : Number => Str} ; - - mkNoun : Str -> Str -> Noun = \x,y -> { - s = table { - Sg => x ; - Pl => y - } - } ; - - regNoun : Str -> Noun = \s -> case last s of { - "s" | "z" => mkNoun s (s + "es") ; - "y" => mkNoun s (init s + "ies") ; - _ => mkNoun s (s + "s") - } ; - - mkVerb : Str -> Str -> Verb = \x,y -> mkNoun y x ; - - regVerb : Str -> Verb = \s -> case last s of { - "s" | "z" => mkVerb s (s + "es") ; - "y" => mkVerb s (init s + "ies") ; - "o" => mkVerb s (s + "es") ; - _ => mkVerb s (s + "s") - } ; - } -``` -The first line gives as a hint to the compiler the -**search path** needed to find all the other modules that the -module depends on. The directory ``prelude`` is a subdirectory of -``GF/lib``; to be able to refer to it in this simple way, you can -set the environment variable ``GF_LIB_PATH`` to point to this -directory. - - - -%--! 
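As a quick check of the paradigms in ``MorphoEng``, the following sketch opens the module from another resource. The module name ``TestMorphoEng`` and the constant names are hypothetical, chosen only for illustration; the comments state the forms that follow from the definitions of ``mkNoun``, ``regNoun`` and ``regVerb`` shown above.
```
--# -path=.:prelude

-- Hypothetical test module: only MorphoEng and its opers come from the text.
resource TestMorphoEng = open MorphoEng in {
  oper
    fly_N   : Noun = regNoun "fly" ;          -- Sg "fly",  Pl "flies"
    kiss_N  : Noun = regNoun "kiss" ;         -- Sg "kiss", Pl "kisses"
    mouse_N : Noun = mkNoun "mouse" "mice" ;  -- irregular, both forms given
    go_V    : Verb = regVerb "go" ;           -- Sg "goes", Pl "go"
}
```
Each constant can then be inspected with ``cc`` once the module has been imported with the ``-retain`` flag.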
-==Morphological analysis and morphology quiz== - -Even though morphology is in GF -mostly used as an auxiliary for syntax, it -can also be useful in its own right. The command ``morpho_analyse = ma`` -can be used to read a text and return for each word the analyses that -it has in the current concrete syntax. -``` - > rf bible.txt | morpho_analyse -``` -In the same way as translation exercises, morphological exercises can -be generated, by the command ``morpho_quiz = mq``. Usually, -the category is set to be something other than ``S``. For instance, -``` - > cd GF/lib/resource-1.0/ - > i french/IrregFre.gf - > morpho_quiz -cat=V - - Welcome to GF Morphology Quiz. - ... - - réapparaître : VFin VCondit Pl P2 - réapparaitriez - > No, not réapparaitriez, but - réapparaîtriez - Score 0/1 -``` -Finally, a list of morphological exercises can be generated -off-line and saved in a -file for later use, by the command ``morpho_list = ml`` -``` - > morpho_list -number=25 -cat=V | wf exx.txt -``` -The ``number`` flag gives the number of exercises generated. - - - - - -=Using the resource grammar library= - -In this chapter, we will take a look at the GF resource grammar library. -We will use the library to implement a slightly extended ``Food`` grammar -and port it to some new languages. - -**Exercise**. Define the mini resource of the previous chapter by -using a functor over the full resource. - - -==The coverage of the library== - -The GF Resource Grammar Library contains grammar rules for -10 languages (in addition, 2 languages are available as incomplete -implementations, and a few more are under construction). Its purpose -is to make these rules available for application programmers, -who can thereby concentrate on the semantic and stylistic -aspects of their grammars, without having to think about -grammaticality. 
The targeted level of application grammarians -is that of a skilled programmer with -a practical knowledge of the target languages, but without -theoretical knowledge about their grammars. -Such a combination of -skills is typical of programmers who, for instance, want to localize -software to new languages. - -The current resource languages are -- ``Ara``bic (incomplete) -- ``Cat``alan (incomplete) -- ``Dan``ish -- ``Eng``lish -- ``Fin``nish -- ``Fre``nch -- ``Ger``man -- ``Ita``lian -- ``Nor``wegian -- ``Rus``sian -- ``Spa``nish -- ``Swe``dish - - -The first three letters (``Eng`` etc) are used in grammar module names. -The incomplete Arabic and Catalan implementations are -enough to be used in many applications; they both contain, among other -things, complete inflectional morphology. - - -==The resource API== - -The resource library API is divided into language-specific -and language-independent parts. To put it roughly, -- the syntax API is language-independent, i.e. has the same types and functions for all - languages. - Its name is ``Syntax``//L// for each language //L// -- the morphology API is language-specific, i.e. has partly different types and functions - for different languages. - Its name is ``Paradigms``//L// for each language //L// - - -A full documentation of the API is available on-line in the -[resource synopsis ../../lib/resource-1.0/synopsis.html]. For our -examples, we will only need a fragment of the full API. - -In the first examples, -we will make use of the following categories, from the module ``Syntax``. - -|| Category | Explanation | Example || -| ``Utt`` | sentence, question, word... 
| "be quiet" | -| ``Adv`` | verb-phrase-modifying adverb, | "in the house" | -| ``AdA`` | adjective-modifying adverb, | "very" | -| ``S`` | declarative sentence | "she lived here" | -| ``Cl`` | declarative clause, with all tenses | "she looks at this" | -| ``AP`` | adjectival phrase | "very warm" | -| ``CN`` | common noun (without determiner) | "red house" | -| ``NP`` | noun phrase (subject or object) | "the red house" | -| ``Det`` | determiner phrase | "those seven" | -| ``Predet`` | predeterminer | "only" | -| ``Quant`` | quantifier with both sg and pl | "this/these" | -| ``Prep`` | preposition, or just case | "in" | -| ``A`` | one-place adjective | "warm" | -| ``N`` | common noun | "house" | - - -We will need the following syntax rules from ``Syntax``. - -|| Function | Type | Example || -| ``mkUtt`` | ``S -> Utt`` | //John walked// | -| ``mkUtt`` | ``Cl -> Utt`` | //John walks// | -| ``mkCl`` | ``NP -> AP -> Cl`` | //John is very old// | -| ``mkNP`` | ``Det -> CN -> NP`` | //the first old man// | -| ``mkNP`` | ``Predet -> NP -> NP`` | //only John// | -| ``mkDet`` | ``Quant -> Det`` | //this// | -| ``mkCN`` | ``N -> CN`` | //house// | -| ``mkCN`` | ``AP -> CN -> CN`` | //very big blue house// | -| ``mkAP`` | ``A -> AP`` | //old// | -| ``mkAP`` | ``AdA -> AP -> AP`` | //very very old// | - -We will also need the following structural words from ``Syntax``. - -|| Function | Type | Example || -| ``all_Predet`` | ``Predet`` | //all// | -| ``defPlDet`` | ``Det`` | //the (houses)// | -| ``this_Quant`` | ``Quant`` | //this// | -| ``very_AdA`` | ``AdA`` | //very// | - - -For French, we will use the following part of ``ParadigmsFre``. 
- -|| Function | Type || -| ``Gender`` | ``Type`` | -| ``masculine`` | ``Gender`` | -| ``feminine`` | ``Gender`` | -| ``mkN`` | ``(cheval : Str) -> N`` | -| ``mkN`` | ``(foie : Str) -> Gender -> N`` | -| ``mkA`` | ``(cher : Str) -> A`` | -| ``mkA`` | ``(sec,seche : Str) -> A`` | - - -For German, we will use the following part of ``ParadigmsGer``. - -|| Function | Type || -| ``Gender`` | ``Type`` | -| ``masculine`` | ``Gender`` | -| ``feminine`` | ``Gender`` | -| ``neuter`` | ``Gender`` | -| ``mkN`` | ``(Stufe : Str) -> N`` | -| ``mkN`` | ``(Bild,Bilder : Str) -> Gender -> N`` | -| ``mkA`` | ``(klein : Str) -> A`` | -| ``mkA`` | ``(gut,besser,beste : Str) -> A`` | - - -**Exercise**. Try out the morphological paradigms in different languages. Do -it this way: -``` - > i -path=alltenses:prelude -retain alltenses/ParadigmsGer.gfr - > cc mkN "Farbe" - > cc mkA "gut" "besser" "beste" -``` - - -==Example: French== - -We start with an abstract syntax that is like ``Food`` before, but -has a plural determiner (//all wines//) and some new nouns that will -need different genders in most languages. -``` - abstract Food = { - cat - S ; Item ; Kind ; Quality ; - fun - Is : Item -> Quality -> S ; - This, All : Kind -> Item ; - QKind : Quality -> Kind -> Kind ; - Wine, Cheese, Fish, Beer, Pizza : Kind ; - Very : Quality -> Quality ; - Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ; - } -``` -The French implementation opens ``SyntaxFre`` and ``ParadigmsFre`` -to get access to the resource libraries needed. In order to find -the libraries, a ``path`` directive is prepended; it is interpreted -relative to the environment variable ``GF_LIB_PATH``. 
-``` - --# -path=.:present:prelude - - concrete FoodFre of Food = open SyntaxFre,ParadigmsFre in { - lincat - S = Utt ; - Item = NP ; - Kind = CN ; - Quality = AP ; - lin - Is item quality = mkUtt (mkCl item quality) ; - This kind = mkNP (mkDet this_Quant) kind ; - All kind = mkNP all_Predet (mkNP defPlDet kind) ; - QKind quality kind = mkCN quality kind ; - Wine = mkCN (mkN "vin") ; - Beer = mkCN (mkN "bière") ; - Pizza = mkCN (mkN "pizza" feminine) ; - Cheese = mkCN (mkN "fromage" masculine) ; - Fish = mkCN (mkN "poisson") ; - Very quality = mkAP very_AdA quality ; - Fresh = mkAP (mkA "frais" "fraîche") ; - Warm = mkAP (mkA "chaud") ; - Italian = mkAP (mkA "italien") ; - Expensive = mkAP (mkA "cher") ; - Delicious = mkAP (mkA "délicieux") ; - Boring = mkAP (mkA "ennuyeux") ; - } -``` -The ``lincat`` definitions in ``FoodFre`` assign **resource categories** -to **application categories**. In a sense, the application categories -are **semantic**, as they correspond to concepts in the grammar application, -whereas the resource categories are **syntactic**: they give the linguistic -means to express concepts in any application. - -The ``lin`` definitions likewise assign resource functions to application -functions. Under the hood, there is a lot of matching with parameters to -take care of word order, inflection, and agreement. But the user of the -library sees nothing of this: the only parameters you need to give are -the genders of some nouns, which cannot be correctly inferred from the word. - -In French, for example, the one-argument ``mkN`` assigns the noun the feminine -gender if and only if it ends with an //e//. Therefore the words //fromage// and -//pizza// are given genders manually. -One can of course always give genders manually, to be on the safe side. 
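The remark about giving genders manually can be illustrated with a small sketch in which every noun of the ``Food`` lexicon gets an explicit gender. It uses only the two-argument ``mkN`` listed in the ``ParadigmsFre`` table above; the module name ``FoodNounsFre`` and the constant names are hypothetical.
```
--# -path=.:present:prelude

-- Hypothetical helper module: each noun is given its gender explicitly,
-- so nothing depends on the ending-based guess of the one-argument mkN.
resource FoodNounsFre = open SyntaxFre, ParadigmsFre in {
  oper
    vin_N     : N = mkN "vin" masculine ;
    biere_N   : N = mkN "bière" feminine ;
    pizza_N   : N = mkN "pizza" feminine ;
    fromage_N : N = mkN "fromage" masculine ;
    poisson_N : N = mkN "poisson" masculine ;
}
```
The price of this safety is a more verbose lexicon; the one-argument form remains convenient for words whose gender follows from the final //e//.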
- -As for inflection, the one-argument adjective pattern ``mkA`` takes care of -completely regular adjectives such as //chaud-chaude//, but also of special -cases such as //italien-italienne//, //cher-chère//, and //délicieux-délicieuse//. -But it cannot form //frais-fraîche// properly. Once again, you can give more -forms to be on the safe side. You can also test the paradigms in the GF -system. - -**Exercise**. Compile the grammar ``FoodFre`` and generate and parse some sentences. - -**Exercise**. Write a concrete syntax of ``Food`` for English or some other language -included in the resource library. You can also compare the output with the hand-written -grammars presented earlier in this tutorial. - -**Exercise**. In particular, try to write a concrete syntax for Italian, even if -you don't know Italian. What you need to know is that "beer" is //birra// and -"pizza" is //pizza//, and that all the nouns and adjectives in the grammar -are regular. - - - -==Functor implementation of multilingual grammars== - -If you did the exercise of writing a concrete syntax of ``Food`` for some other -language, you probably noticed that much of the code looks exactly the same -as for French. The immediate reason for this is that the ``Syntax`` API is the -same for all languages; the deeper reason is that all languages (at least those -in the resource package) implement the same syntactic structures and tend to use them -in similar ways. Thus it is only the lexical parts of a concrete syntax that -you need to write anew for a new language. In brief, -- first copy the concrete syntax for one language -- then change the words (the strings and perhaps some paradigms) - - -But programming by copy-and-paste is not worthy of a functional programmer. -Can we write a function that takes care of the shared parts of grammar modules? -Yes, we can. It is not a function in the ``fun`` or ``oper`` sense, but -a function operating on modules, called a **functor**. 
This construct -is familiar from the functional languages ML and OCaml, but it does not -exist in Haskell. It also bears some resemblance to templates in C++. -Functors are also known as **parametrized modules**. - -In GF, a functor is a module that ``open``s one or more **interfaces**. -An ``interface`` is a module similar to a ``resource``, but it only -contains the types of ``oper``s, not their definitions. You can think -of an interface as a kind of a record type. Thus a functor is a kind -of a function taking records as arguments and producing a module -as value. - -Let us look at a functor implementation of the ``Food`` grammar. -Consider its module header first: -``` - incomplete concrete FoodI of Food = open Syntax, LexFood in -``` -In the functor-function analogy, ``FoodI`` would be presented as a function -with the following type signature: -``` - FoodI : instance of Syntax -> instance of LexFood -> concrete of Food -``` -It takes as arguments two interfaces: -- ``Syntax``, the resource grammar interface -- ``LexFood``, the domain-specific lexicon interface - - -Functors opening ``Syntax`` and a domain lexicon interface are in fact -so typical in GF applications, that this structure could be called -a **design pattern** -for GF grammars. The idea in this pattern is, again, that -the languages use the same syntactic structures but different words. - -Before going to the details of the module bodies, let us look at how functors -are concretely used. An interface has a header such as -``` - interface LexFood = open Syntax in -``` -To give an ``instance`` of it means that all ``oper``s are given definitions (of -appropriate types). For example, -``` - instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in -``` -Notice that when an interface opens an interface, such as ``Syntax``, -then its instance -opens an instance of it. But the instance may also open some other -resources - typically, -a domain lexicon instance opens a ``Paradigms`` module. 
- -In the function-functor analogy, we now have -``` - SyntaxGer : instance of Syntax - LexFoodGer : instance of LexFood -``` -Thus we can complete the German implementation by "applying" the functor: -``` - FoodI SyntaxGer LexFoodGer : concrete of Food -``` -The GF syntax for doing so is -``` - concrete FoodGer of Food = FoodI with - (Syntax = SyntaxGer), - (LexFood = LexFoodGer) ; -``` -Notice that this is the //complete// module, not just a header of it. -The module body is received from ``FoodI``, by instantiating the -interface constants with their definitions given in the German -instances. - -A module of this form, characterized by the keyword ``with``, is -called a **functor instantiation**. - -Here is the complete code for the functor ``FoodI``: -``` - incomplete concrete FoodI of Food = open Syntax, LexFood in { - lincat - S = Utt ; - Item = NP ; - Kind = CN ; - Quality = AP ; - lin - Is item quality = mkUtt (mkCl item quality) ; - This kind = mkNP (mkDet this_Quant) kind ; - All kind = mkNP all_Predet (mkNP defPlDet kind) ; - QKind quality kind = mkCN quality kind ; - Wine = mkCN wine_N ; - Beer = mkCN beer_N ; - Pizza = mkCN pizza_N ; - Cheese = mkCN cheese_N ; - Fish = mkCN fish_N ; - Very quality = mkAP very_AdA quality ; - Fresh = mkAP fresh_A ; - Warm = mkAP warm_A ; - Italian = mkAP italian_A ; - Expensive = mkAP expensive_A ; - Delicious = mkAP delicious_A ; - Boring = mkAP boring_A ; -} -``` - - -==Interfaces and instances== - -Let us now define the ``LexFood`` interface: -``` - interface LexFood = open Syntax in { - oper - wine_N : N ; - beer_N : N ; - pizza_N : N ; - cheese_N : N ; - fish_N : N ; - fresh_A : A ; - warm_A : A ; - italian_A : A ; - expensive_A : A ; - delicious_A : A ; - boring_A : A ; -} -``` -In this interface, only lexical items are declared. In general, an -interface can declare any functions and also types. The ``Syntax`` -interface does so. 
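To illustrate the remark that an interface can also declare types and functions, here is a minimal sketch of an interface with one type, one function and one lexical item, together with a possible German instance. All names (``LexDrink``, ``Drink``, ``mkDrink``, ``wine_Drink``) are hypothetical and not part of the library; the sketch assumes that a type can be declared in an interface as an ``oper`` of type ``Type`` and defined in the instance.
```
-- Hypothetical interface: a type, a function and a lexical item,
-- all declared without definitions.
interface LexDrink = open Syntax in {
  oper
    Drink      : Type ;
    mkDrink    : N -> Drink ;
    wine_Drink : Drink ;
}

-- A possible instance: every declared constant receives a definition
-- of the appropriate type.
instance LexDrinkGer of LexDrink = open SyntaxGer, ParadigmsGer in {
  oper
    Drink      = CN ;
    mkDrink    = \n -> mkCN n ;
    wine_Drink = mkDrink (mkN "Wein") ;
}
```
A functor opening ``LexDrink`` could then use ``Drink`` without knowing which resource category a given language chooses for it.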
- -Here is the German instance of the interface: -``` - instance LexFoodGer of LexFood = open SyntaxGer, ParadigmsGer in { - oper - wine_N = mkN "Wein" ; - beer_N = mkN "Bier" "Biere" neuter ; - pizza_N = mkN "Pizza" "Pizzen" feminine ; - cheese_N = mkN "Käse" "Käse" masculine ; - fish_N = mkN "Fisch" ; - fresh_A = mkA "frisch" ; - warm_A = mkA "warm" "wärmer" "wärmste" ; - italian_A = mkA "italienisch" ; - expensive_A = mkA "teuer" ; - delicious_A = mkA "köstlich" ; - boring_A = mkA "langweilig" ; - } -``` -Just to complete the picture, we repeat the German functor instantiation -for ``FoodI``, this time with a path directive that makes it compilable. -``` - --# -path=.:present:prelude - - concrete FoodGer of Food = FoodI with - (Syntax = SyntaxGer), - (LexFood = LexFoodGer) ; -``` - - -**Exercise**. Compile and test ``FoodGer``. - -**Exercise**. Refactor ``FoodFre`` into a functor instantiation. - - - -==Adding languages to a functor implementation== - -Once we have an application grammar defined by using a functor, -adding a new language is simple. Just two modules need to be written: -- a domain lexicon instance -- a functor instantiation - - -The functor instantiation is completely mechanical to write. -Here is one for Finnish: -``` ---# -path=.:present:prelude - -concrete FoodFin of Food = FoodI with - (Syntax = SyntaxFin), - (LexFood = LexFoodFin) ; -``` -The domain lexicon instance requires some knowledge of the words of the -language: what words are used for which concepts, how the words are -inflected, plus features such as genders. 
Here is a lexicon instance for -Finnish: -``` - instance LexFoodFin of LexFood = open SyntaxFin, ParadigmsFin in { - oper - wine_N = mkN "viini" ; - beer_N = mkN "olut" ; - pizza_N = mkN "pizza" ; - cheese_N = mkN "juusto" ; - fish_N = mkN "kala" ; - fresh_A = mkA "tuore" ; - warm_A = mkA "lämmin" ; - italian_A = mkA "italialainen" ; - expensive_A = mkA "kallis" ; - delicious_A = mkA "herkullinen" ; - boring_A = mkA "tylsä" ; - } -``` - -**Exercise**. Instantiate the functor ``FoodI`` to some language of -your choice. - - -==Division of labour revisited== - -One purpose with the resource grammars was stated to be a division -of labour between linguists and application grammarians. We can now -reflect on what this means more precisely, by asking ourselves what -skills are required of grammarians working on different components. - -Building a GF application starts from the abstract syntax. Writing -an abstract syntax requires -- understanding the semantic structure of the application domain -- knowledge of the GF fragment with categories and functions - - -If the concrete syntax is written by means of a functor, the programmer -has to decide what parts of the implementation are put to the interface -and what parts are shared in the functor. This requires -- knowing how the domain concepts are expressed in natural language -- knowledge of the resource grammar library - the categories and combinators -- understanding what parts are likely to be expressed in language-dependent - ways, so that they must belong to the interface and not the functor -- knowledge of the GF fragment with function applications and strings - - -Instantiating a ready-made functor to a new language is less demanding. 
-It requires essentially -- knowing how the domain words are expressed in the language -- knowing, roughly, how these words are inflected -- knowledge of the paradigms available in the library -- knowledge of the GF fragment with function applications and strings - - -Notice that none of these tasks requires the use of GF records, tables, -or parameters. Thus only a small fragment of GF is needed; the rest of -GF is only relevant for those who write the libraries. - -Of course, grammar writing is not always straightforward usage of libraries. -For example, GF can be used for other languages than just those in the -libraries - for both natural and formal languages. A knowledge of records -and tables can, unfortunately, also be needed for understanding GF's error -messages. - -**Exercise**. Design a small grammar that can be used for controlling -an MP3 player. The grammar should be able to recognize commands such -as //play this song//, with the following variations: -- verbs: //play//, //remove// -- objects: //song//, //artist// -- determiners: //this//, //the previous// -- verbs without arguments: //stop//, //pause// - - -The implementation goes in the following phases: -+ abstract syntax -+ functor and lexicon interface -+ lexicon instance for the first language -+ functor instantiation for the first language -+ lexicon instance for the second language -+ functor instantiation for the second language -+ ... - - - -==Restricted inheritance== - -A functor implementation using the resource ``Syntax`` interface -works as long as all concepts are expressed by using the same structures -in all languages. If this is not the case, the deviant linearization can -be made into a parameter and moved to the domain lexicon interface. - -Let us take a slightly contrived example: assume that English has -no word for ``Pizza``, but has to use the paraphrase //Italian pie//. -This paraphrase is no longer a noun ``N``, but a complex phrase -in the category ``CN``. 
An obvious way to solve this problem is -to change interface ``LexFood`` so that the constant declared for -``Pizza`` gets a new type: -``` - oper pizza_CN : CN ; -``` -But this solution is unstable: we may end up changing the interface -and the functor with each new language, and we must every time also -change the interface instances for the old languages to maintain -type correctness. - -A better solution is to use **restricted inheritance**: the English -instantiation inherits the functor implementation except for the -constant ``Pizza``. This is how we write: -``` - --# -path=.:present:prelude - - concrete FoodEng of Food = FoodI - [Pizza] with - (Syntax = SyntaxEng), - (LexFood = LexFoodEng) ** - open SyntaxEng, ParadigmsEng in { - - lin Pizza = mkCN (mkA "Italian") (mkN "pie") ; - } -``` -Restricted inheritance is available for all inherited modules. One can for -instance exclude some mushrooms and pick up just some fruit in -the ``Foodmarket`` example: -``` - abstract Foodmarket = Food, Fruit [Peach], Mushroom - [Agaric] -``` -A concrete syntax of ``Foodmarket`` must then indicate the same inheritance -restrictions. - - -**Exercise**. Change ``FoodGer`` in such a way that it says, instead of -//X is Y//, the equivalent of //X must be Y// (//X muss Y sein//). -You will have to browse the full resource API to find all -the functions needed. - - -==Browsing the resource with GF commands== - -In addition to reading the -[resource synopsis ../../lib/resource-1.0/synopsis.html], you -can find resource function combinations by using the parser. This -is so because the resource library is in the end implemented as -a top-level ``abstract-concrete`` grammar, on which parsing -and linearization work. - -Unfortunately, only English and the Scandinavian languages can be -parsed within acceptable computer resource limits when the full -resource is used. 
-
-To look for a syntax tree in the overload API by parsing, proceed as follows:
-```
-  > $GF_LIB_PATH
-  > i -path=alltenses:prelude alltenses/OverLangEng.gfc
-  > p -cat=S -overload "this grammar is too big"
-  mkS (mkCl (mkNP (mkDet this_Quant) grammar_N) (mkAP too_AdA big_A))
-```
-To view linearizations in all languages by parsing from English:
-```
-  > i alltenses/langs.gfcm
-  > p -cat=S -lang=LangEng "this grammar is too big" | tb
-  UseCl TPres ASimul PPos (PredVP (DetCN (DetSg (SgQuant this_Quant)
-  NoOrd) (UseN grammar_N)) (UseComp (CompAP (AdAP too_AdA (PositA big_A)))))
-  Den här grammatiken är för stor
-  Esta gramática es demasiado grande
-  (Cyrillic: eta grammatika govorit des'at' jazykov)
-  Denne grammatikken er for stor
-  Questa grammatica è troppo grande
-  Diese Grammatik ist zu groß
-  Cette grammaire est trop grande
-  Tämä kielioppi on liian suuri
-  This grammar is too big
-  Denne grammatik er for stor
-```
-Unfortunately, the Russian grammar uses at the moment a different
-character encoding than the rest and is therefore not displayed correctly
-in a terminal window. However, the GF syntax editor does display all
-examples correctly:
-```
-  % gfeditor alltenses/langs.gfcm
-```
-When you have constructed the tree, you will see the following screen:
-
-#BCEN
-
-  [../../lib/resource-1.0/doc/10lang-small.png]
-
-#ECEN
-
-
-**Exercise**. Find the resource grammar translations for the following
-English phrases (parse in the category ``Phr``). You can first try to
-build the terms manually.
-
-//every man loves a woman//
-
-//this grammar speaks more than ten languages//
-
-//which languages aren't in the grammar//
-
-//which languages did you want to speak//
-
-
-
-=Refining semantics in abstract syntax=
-
-==GF as a logical framework==
-
-In this section, we will show how
-to encode advanced semantic concepts in an abstract syntax.
-We use concepts inherited from **type theory**.
Type theory
-is the basis of many systems known as **logical frameworks**, which are
-used for representing mathematical theorems and their proofs on a computer.
-In fact, GF has a logical framework as its proper part:
-this part is the abstract syntax.
-
-In a logical framework, the formalization of a mathematical theory
-is a set of type and function declarations. The following is an example
-of such a theory, represented as an ``abstract`` module in GF.
-```
-abstract Arithm = {
-  cat
-    Prop ;                        -- proposition
-    Nat ;                         -- natural number
-  fun
-    Zero : Nat ;                  -- 0
-    Succ : Nat -> Nat ;           -- successor of x
-    Even : Nat -> Prop ;          -- x is even
-    And  : Prop -> Prop -> Prop ; -- A and B
-}
-```
-
-**Exercise**. Give a concrete syntax of ``Arithm``, either from scratch or
-by using the resource library.
-
-
-
-
-==Dependent types==
-
-**Dependent types** are a characteristic feature of GF,
-inherited from the **constructive type theory** of Martin-Löf and
-distinguishing GF from most other grammar formalisms and
-functional programming languages.
-
-Dependent types can be used for stating stronger
-**conditions of well-formedness** than ordinary types.
-A simple example is a "smart house" system, which
-defines voice commands for household appliances. This example
-is borrowed from the
-[Regulus Book http://cslipublications.stanford.edu/site/1575865262.html]
-(Rayner & al. 2006).
-
-One who enters a smart house can use speech to dim lights, switch
-on the fan, etc. For each ``Kind`` of a device, there is a set of
-``Actions`` that can be performed on it; thus one can dim the lights but
-not the fan, for example. These dependencies can be expressed
-by making the type ``Action`` dependent on ``Kind``.
We express this
-as follows in ``cat`` declarations:
-```
-  cat
-    Command ;
-    Kind ;
-    Action Kind ;
-    Device Kind ;
-```
-The crucial use of the dependencies is made in the rule for forming commands:
-```
-  fun CAction : (k : Kind) -> Action k -> Device k -> Command ;
-```
-In other words: an action and a device can be combined into a command only
-if they are of the same ``Kind`` ``k``. If we have the functions
-```
-  DKindOne : (k : Kind) -> Device k ; -- the light
-
-  light, fan : Kind ;
-  dim : Action light ;
-```
-we can form the syntax tree
-```
-  CAction light dim (DKindOne light)
-```
-but we cannot form the trees
-```
-  CAction light dim (DKindOne fan)
-  CAction fan dim (DKindOne light)
-  CAction fan dim (DKindOne fan)
-```
-Linearization rules are written as usual: the concrete syntax does not
-know if a category is a dependent type. In English, you can write as follows:
-```
-  lincat Action = {s : Str} ;
-  lin CAction kind act dev = {s = act.s ++ dev.s} ;
-```
-Notice that the argument ``kind`` does not appear in the linearization.
-The type checker will be able to reconstruct it from the ``dev`` argument.
-
-Parsing with dependent types is performed in two phases:
-+ context-free parsing
-+ filtering through type checker
-
-
-If you just parse in the usual way, you don't enter the second phase, and
-the ``kind`` argument is not found:
-```
-  > parse "dim the light"
-  CAction ? dim (DKindOne light)
-```
-Moreover, type-incorrect commands are not rejected:
-```
-  > parse "dim the fan"
-  CAction ? dim (DKindOne fan)
-```
-The question mark ``?`` is a **metavariable**, and is returned by the parser
-for any subtree that is suppressed by a linearization rule.
-
-To get rid of metavariables, you must feed the parse result into the
-second phase of **solving** them. The ``solve`` process uses the dependent
-type checker to restore the values of the metavariables.
It is invoked by
-the command ``put_tree = pt`` with the flag ``-transform=solve``:
-```
-  > parse "dim the light" | put_tree -transform=solve
-  CAction light dim (DKindOne light)
-```
-The ``solve`` process may fail, in which case no tree is returned:
-```
-  > parse "dim the fan" | put_tree -transform=solve
-  no tree found
-```
-
-
-**Exercise**. Write an abstract syntax module with the above contents
-and an appropriate English concrete syntax. Try to parse the commands
-//dim the light// and //dim the fan//, with and without ``solve`` filtering.
-
-
-**Exercise**. Perform random and exhaustive generation, with and without
-``solve`` filtering.
-
-**Exercise**. Add some device kinds and actions to the grammar.
-
-
-==Polymorphism==
-
-Sometimes an action can be performed on all kinds of devices. It would be
-possible to introduce separate ``fun`` constants for each kind-action pair,
-but this would be tedious. Instead, one can use **polymorphic** actions,
-i.e. actions that take a ``Kind`` as an argument and produce an ``Action``
-for that ``Kind``:
-```
-  fun switchOn, switchOff : (k : Kind) -> Action k ;
-```
-Functions that are not polymorphic are **monomorphic**. However, the
-dichotomy into monomorphism and full polymorphism is not always sufficient
-for good semantic modelling: very typically, some actions are defined
-for a proper subset of devices, but not just one. For instance, both doors and
-windows can be opened, whereas lights cannot.
-We will return to this problem by introducing the
-concept of **restricted polymorphism** later,
-after a chapter on proof objects.
-
-
-
-==Dependent types and spoken language models==
-
-We have used dependent types to control semantic well-formedness
-in grammars. This is important in traditional type theory
-applications such as proof assistants, where only mathematically
-meaningful formulas should be constructed.
But semantic filtering has
-also proved important in speech recognition, because it reduces the
-ambiguity of the results.
-
-
-===Grammar-based language models===
-
-The standard way of using GF in speech recognition is by building
-**grammar-based language models**. To this end, GF comes with compilers
-into several formats that are used in speech recognition systems.
-One such format is GSL, used in the [Nuance speech recognizer www.nuance.com].
-It is produced from GF simply by printing a grammar with the flag
-``-printer=gsl``.
-```
-  > import -conversion=finite SmartEng.gf
-  > print_grammar -printer=gsl
-
-  ;GSL2.0
-  ; Nuance speech recognition grammar for SmartEng
-  ; Generated by GF
-
-  .MAIN SmartEng_2
-
-  SmartEng_0 [("switch" "off") ("switch" "on")]
-  SmartEng_1 ["dim" ("switch" "off")
-              ("switch" "on")]
-  SmartEng_2 [(SmartEng_0 SmartEng_3)
-              (SmartEng_1 SmartEng_4)]
-  SmartEng_3 ("the" SmartEng_5)
-  SmartEng_4 ("the" SmartEng_6)
-  SmartEng_5 "fan"
-  SmartEng_6 "light"
-```
-Now, GSL is a context-free format, so how does it cope with dependent types?
-In general, dependent types can give rise to infinitely many basic types
-(exercise!), whereas a context-free grammar can by definition only have
-finitely many nonterminals.
-
-This is where the flag ``-conversion=finite`` is needed in the ``import``
-command. Its effect is to convert a GF grammar with dependent types to
-one without, so that each instance of a dependent type is replaced by
-an atomic type. This can then be used as a nonterminal in a context-free
-grammar. The ``finite`` conversion presupposes that every
-dependent type has only finitely many instances, which is in fact
-the case in the ``Smart`` grammar.
-
-
-**Exercise**. If you have access to the Nuance speech recognizer,
-test it with GF-generated language models for ``SmartEng``. Do this
-both with and without ``-conversion=finite``.
-
-**Exercise**. Construct an abstract syntax with infinitely many instances
-of dependent types.
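-
-The effect of the ``finite`` conversion on the ``Smart`` grammar can be
-pictured with a hand-written sketch. The module below is not output
-produced by GF: the module name and all the suffixed names
-(``Action_light``, ``CAction_fan``, and so on) are hypothetical, and the
-sketch assumes that the only kinds are ``light`` and ``fan``, with ``dim``
-defined for lights only, as the GSL listing suggests. Each instance of a
-dependent type becomes an atomic category, so the ``Kind`` arguments
-disappear.
-```
-abstract SmartFinite = {
-  cat
-    Command ;
-    Action_light ; Action_fan ;   -- the instances of Action k
-    Device_light ; Device_fan ;   -- the instances of Device k
-  fun
-    -- one copy of CAction for each kind
-    CAction_light : Action_light -> Device_light -> Command ;
-    CAction_fan   : Action_fan   -> Device_fan   -> Command ;
-    -- monomorphic action: only the light instance exists
-    dim_light : Action_light ;
-    -- polymorphic actions: one copy for each kind
-    switchOn_light, switchOff_light : Action_light ;
-    switchOn_fan,   switchOff_fan   : Action_fan ;
-    -- one copy of DKindOne for each kind
-    DKindOne_light : Device_light ;
-    DKindOne_fan   : Device_fan ;
-}
-```
-Since ``dim_light`` has no ``fan`` counterpart, //dim the fan// has no tree
-in such a grammar. This agrees with the GSL rules, where ``"dim"`` occurs
-only in the alternative that is combined with ``"light"``.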
-
-
-===Statistical language models===
-
-An alternative to grammar-based language models is
-**statistical language models** (**SLM**s). An SLM is
-built from a **corpus**, i.e. a set of utterances. It specifies the
-probability of each **n-gram**, i.e. sequence of //n// words. The
-typical value of //n// is 2 (bigrams) or 3 (trigrams).
-
-One advantage of SLMs over grammar-based models is that they are
-**robust**, i.e. they can be used to recognize sequences that would
-be out of the grammar or the corpus. Another advantage is that
-an SLM can be built "for free" if a corpus is available.
-
-However, collecting a corpus can require a lot of work, and writing
-a grammar can be less demanding, especially with tools such as GF or
-Regulus. This advantage of grammars can be combined with robustness
-by creating a back-up SLM from a **synthesized corpus**. This means
-simply that the grammar is used for generating such a corpus.
-In GF, this can be done with the ``generate_trees`` command.
-As with grammar-based models, the quality of the SLM is better
-if meaningless utterances are excluded from the corpus. Thus
-a good way to generate an SLM from a GF grammar is by using
-dependent types and filtering the results through the type checker:
-```
-  > generate_trees | put_tree -transform=solve | linearize
-```
-
-
-**Exercise**. Measure the size of the corpus generated from
-``SmartEng``, with and without type checker filtering.
-
-
-
-==Digression: dependent types in concrete syntax==
-
-===Variables in function types===
-
-A dependent function type needs to introduce a variable for
-its argument type, as in
-```
-  switchOff : (k : Kind) -> Action k
-```
-Function types //without//
-variables are actually a shorthand notation: writing
-```
-  fun PredVP : NP -> VP -> S
-```
-is shorthand for
-```
-  fun PredVP : (x : NP) -> (y : VP) -> S
-```
-or any other naming of the variables.
Actually the use of variables
-sometimes shortens the code, since they can share a type:
-```
-  octuple : (x,y,z,u,v,w,s,t : Str) -> Str
-```
-If a bound variable is not used, it can here, as elsewhere in GF, be replaced by
-a wildcard:
-```
-  octuple : (_,_,_,_,_,_,_,_ : Str) -> Str
-```
-A good practice for functions with many arguments of the same type
-is to indicate the number of arguments:
-```
-  octuple : (x1,_,_,_,_,_,_,x8 : Str) -> Str
-```
-One can also use the variables to document what each argument is expected
-to provide, as is done in inflection paradigms in the resource grammar.
-```
-  mkV : (drink,drank,drunk : Str) -> V
-```
-
-
-===Polymorphism in concrete syntax===
-
-The **functional fragment** of GF
-terms and types comprises function types, applications, lambda
-abstracts, constants, and variables. This fragment is similar in
-abstract and concrete syntax. In particular,
-dependent types are also available in concrete syntax.
-We have not made use of them yet,
-but we will now look at one example of how they
-can be used.
-
-Those readers who are familiar with functional programming languages
-like ML and Haskell may already have missed **polymorphic**
-functions. For instance, Haskell programmers have access to
-the functions
-```
-  const :: a -> b -> a
-  const c _ = c
-
-  flip :: (a -> b -> c) -> b -> a -> c
-  flip f y x = f x y
-```
-which can be used for any given types ``a``, ``b``, and ``c``.
-
-The GF counterpart of polymorphic functions are **monomorphic**
-functions with explicit **type variables**. Thus the above
-definitions can be written
-```
-  oper const : (a,b : Type) -> a -> b -> a =
-    \_,_,c,_ -> c ;
-
-  oper flip : (a,b,c : Type) -> (a -> b -> c) -> b -> a -> c =
-    \_,_,_,f,x,y -> f y x ;
-```
-When the operations are used, the type checker requires
-them to be equipped with all their arguments; this may be a nuisance
-for a Haskell or ML programmer.
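-
-What "equipped with all their arguments" means in practice can be seen in
-a small sketch. The resource module below repeats the two definitions and
-applies them; the module name ``PolyDemo`` and the operations ``glue``,
-``first``, and ``swapped`` are hypothetical names introduced only for this
-illustration.
-```
-resource PolyDemo = {
-  oper
-    -- the two operations from the text, with explicit type arguments
-    const : (a,b : Type) -> a -> b -> a =
-      \_,_,c,_ -> c ;
-    flip : (a,b,c : Type) -> (a -> b -> c) -> b -> a -> c =
-      \_,_,_,f,x,y -> f y x ;
-
-    -- hypothetical helper: concatenates two token lists
-    glue : Str -> Str -> Str = \s,t -> s ++ t ;
-
-    -- the type arguments must be written out at every use
-    first   : Str = const Str Str "foo" "bar" ;          -- keeps "foo"
-    swapped : Str = flip Str Str Str glue "foo" "bar" ;  -- glue "bar" "foo"
-}
-```
-A Haskell programmer would write just ``const "foo" "bar"`` and leave the
-types to be inferred; in GF the arguments ``Str Str`` cannot be omitted.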
-
-
-
-==Proof objects==
-
-Perhaps the most well-known idea in constructive type theory is
-the **Curry-Howard isomorphism**, also known as the
-**propositions as types principle**. Its earliest formulations
-were attempts to give semantics to the logical systems of
-propositional and predicate calculus. In this section, we will consider
-a more elementary example, showing how the notion of proof is useful
-outside mathematics, as well.
-
-We first define the category of unary (also known as Peano-style)
-natural numbers:
-```
-  cat Nat ;
-  fun Zero : Nat ;
-  fun Succ : Nat -> Nat ;
-```
-The **successor function** ``Succ`` generates an infinite
-sequence of natural numbers, beginning from ``Zero``.
-
-We then define what it means for a number //x// to be //less than//
-a number //y//. Our definition is based on two axioms:
-- ``Zero`` is less than ``Succ`` //y// for any //y//.
-- If //x// is less than //y//, then ``Succ`` //x// is less than ``Succ`` //y//.
-
-
-The most straightforward way of expressing these axioms in type theory
-is as typing judgements that introduce objects of a type ``Less`` //x y//:
-```
-  cat Less Nat Nat ;
-  fun lessZ : (y : Nat) -> Less Zero (Succ y) ;
-  fun lessS : (x,y : Nat) -> Less x y -> Less (Succ x) (Succ y) ;
-```
-Objects formed by ``lessZ`` and ``lessS`` are
-called **proof objects**: they establish the truth of certain
-mathematical propositions.
-For instance, the fact that 2 is less than
-4 has the proof object
-```
-  lessS (Succ Zero) (Succ (Succ (Succ Zero)))
-    (lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero)))
-```
-whose type is
-```
-  Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero))))
-```
-which is the formalization of the proposition that 2 is less than 4.
-
-GF grammars can be used to provide a **semantic control** of
-well-formedness of expressions. We have already seen examples of this:
-the grammar of well-formed actions on household devices.
By introducing proof objects
-we have now added a very powerful technique of expressing semantic conditions.
-
-A simple example of the use of proof objects is the definition of
-well-formed //time spans//: a time span is expected to be from an earlier to
-a later time:
-```
-  from 3 to 8
-```
-is thus well-formed, whereas
-```
-  from 8 to 3
-```
-is not. The following rules for spans impose this condition
-by using the ``Less`` predicate:
-```
-  cat Span ;
-  fun span : (m,n : Nat) -> Less m n -> Span ;
-```
-
-**Exercise**. Write an abstract and concrete syntax with the
-concepts of this section, and experiment with it in GF.
-
-
-**Exercise**. Define the notions of "even" and "odd" in terms
-of proof objects. **Hint**. You need one function for proving
-that 0 is even, and two other functions for propagating the
-properties.
-
-
-
-
-===Proof-carrying documents===
-
-Another possible application of proof objects is **proof-carrying documents**:
-to be semantically well-formed, the abstract syntax of a document must contain a proof
-of some property, although the proof is not shown in the concrete document.
-Think, for instance, of small documents describing flight connections:
-
-//To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague.//
-
-The well-formedness of this text is partly expressible by dependent typing:
-```
-  cat
-    City ;
-    Flight City City ;
-  fun
-    Gothenburg, Frankfurt, Prague : City ;
-    LH3043 : Flight Gothenburg Frankfurt ;
-    OK0537 : Flight Frankfurt Prague ;
-```
-This rules out texts saying //take OK0537 from Gothenburg to Prague//.
-However, there is a
-further condition saying that it must be possible to
-change from LH3043 to OK0537 in Frankfurt.
-This can be modelled as a proof object of a suitable type,
-which is required by the constructor
-that connects flights.
-```
-  cat
-    IsPossible (x,y,z : City)(Flight x y)(Flight y z) ;
-  fun
-    Connect : (x,y,z : City) ->
-      (u : Flight x y) -> (v : Flight y z) ->
-      IsPossible x y z u v -> Flight x z ;
-```
-
-
-==Restricted polymorphism==
-
-In the first version of the smart house grammar ``Smart``,
-all Actions were either
-- **monomorphic**: defined for one Kind
-- **polymorphic**: defined for all Kinds
-
-
-To make this scale up for new Kinds, we can refine this to
-**restricted polymorphism**: defined for Kinds of a certain **class**
-
-
-The notion of class can be expressed in abstract syntax
-by using the Curry-Howard isomorphism as follows:
-- a class is a **predicate** of Kinds - i.e. a type depending on Kinds
-- a Kind is in a class if there is a proof object of this type
-
-
-Here is an example with switching and dimming. The classes are called
-``Switchable`` and ``Dimmable``.
-```
-cat
-  Switchable Kind ;
-  Dimmable Kind ;
-fun
-  switchable_light : Switchable light ;
-  switchable_fan : Switchable fan ;
-  dimmable_light : Dimmable light ;
-
-  switchOn : (k : Kind) -> Switchable k -> Action k ;
-  dim : (k : Kind) -> Dimmable k -> Action k ;
-```
-One advantage of this formalization is that classes for new
-actions can be added incrementally.
-
-**Exercise**. Write a new version of the ``Smart`` grammar with
-classes, and test it in GF.
-
-**Exercise**. Add some actions, kinds, and classes to the grammar.
-Try to port the grammar to a new language. You will probably find
-out that restricted polymorphism works differently in different languages.
-For instance, in Finnish not only doors but also TVs and radios
-can be "opened", which means switching them on.
-
-
-==Variable bindings==
-
-Mathematical notation and programming languages have
-expressions that **bind** variables.
For instance,
-a universally quantified proposition
-```
-  (All x)B(x)
-```
-consists of the **binding** ``(All x)`` of the variable ``x``,
-and the **body** ``B(x)``, where the variable ``x`` can have
-**bound occurrences**.
-
-Variable bindings appear in informal mathematical language as well, for
-instance,
-```
-  for all x, x is equal to x
-
-  the function that for any numbers x and y returns the maximum of x+y
-  and x*y
-
-  Let x be a natural number. Assume that x is even. Then x + 3 is odd.
-```
-In type theory, variable-binding expression forms can be formalized
-as functions that take functions as arguments. The universal
-quantifier is defined
-```
-  fun All : (Ind -> Prop) -> Prop
-```
-where ``Ind`` is the type of individuals and ``Prop``,
-the type of propositions. If we have, for instance, the equality predicate
-```
-  fun Eq : Ind -> Ind -> Prop
-```
-we may form the tree
-```
-  All (\x -> Eq x x)
-```
-which corresponds to the ordinary notation
-```
-  (All x)(x = x).
-```
-An abstract syntax where trees have functions as arguments, as in
-the two examples above, has turned out to be precisely the right
-thing for the semantics and computer implementation of
-variable-binding expressions. The advantage lies in the fact that
-only one variable-binding expression form is needed, the lambda abstract
-``\x -> b``, and all other bindings can be reduced to it.
-This makes it easier to implement mathematical theories and reason
-about them, since variable binding is tricky to implement and
-to reason about. The idea of using functions as arguments of
-syntactic constructors is known as **higher-order abstract syntax**.
-
-The question now arises: how to define linearization rules
-for variable-binding expressions?
-Let us first consider universal quantification,
-```
-  fun All : (Ind -> Prop) -> Prop
-```
-We write
-```
-  lin All B = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s}
-```
-to obtain the form shown above.
-This linearization rule brings in a new GF concept - the ``$0``
-field of ``B`` containing a bound variable symbol.
-The general rule is that, if an argument type of a function is
-itself a function type ``A -> C``, the linearization type of
-this argument is the linearization type of ``C``
-together with a new field ``$0 : Str``. In the linearization rule
-for ``All``, the argument ``B`` thus has the linearization
-type
-```
-  {$0 : Str ; s : Str}
-```
-since the linearization type of ``Prop`` is
-```
-  {s : Str}
-```
-In other words, the linearization of a function
-consists of a linearization of the body together with a
-field for a linearization of the bound variable.
-Those familiar with type theory or lambda calculus
-should notice that GF requires trees to be in
-**eta-expanded** form in order to be linearizable:
-any function of type
-```
-  A -> B
-```
-always has a syntax tree of the form
-```
-  \x -> b
-```
-where ``b : B`` under the assumption ``x : A``.
-It is in this form that an expression can be analysed
-as having a bound variable and a body.
-
-
-Given the linearization rule
-```
-  lin Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"}
-```
-the linearization of
-```
-  \x -> Eq x x
-```
-is the record
-```
-  {$0 = "x" ; s = ["( x = x )"]}
-```
-Thus we can compute the linearization of the formula,
-```
-  All (\x -> Eq x x)  -->  {s = ["( All x ) ( x = x )"]}
-```
-How did we get the //linearization// of the variable ``x``
-into the string ``"x"``? GF grammars have no rules for
-this: it is just hard-wired in GF that variable symbols are
-linearized into the same strings that represent them in
-the print-out of the abstract syntax.
-
-To be able to //parse// variable symbols, however, GF needs to know what
-to look for (instead of e.g. trying to parse //any//
-string as a variable).
What strings are parsed as variable symbols
-is defined in the lexical analysis part of GF parsing:
-```
-  > p -cat=Prop -lexer=codevars "(All x)(x = x)"
-  All (\x -> Eq x x)
-```
-(see more details on lexers below). If several variables are bound in the
-same argument, the labels are ``$0, $1, $2``, etc.
-
-
-**Exercise**. Write an abstract syntax of the whole
-**predicate calculus**, with the
-**connectives** "and", "or", "implies", and "not", and the
-**quantifiers** "exists" and "for all". Use higher-order functions
-to guarantee that unbound variables do not occur.
-
-**Exercise**. Write a concrete syntax for your favourite
-notation of predicate calculus. Use LaTeX as target language
-if you want nice output. You can also try producing Haskell boolean
-expressions. Use as many parentheses as you need to
-guarantee non-ambiguity.
-
-
-
-==Semantic definitions==
-
-We have seen that,
-just like functional programming languages, GF has declarations
-of functions, telling what the type of a function is.
-But we have not yet shown how to **compute**
-these functions: all we can do is provide them with arguments
-and linearize the resulting terms.
-Since our main interest is the well-formedness of expressions,
-this has not yet bothered
-us very much. As we will see, however, computation does play a role
-even in the well-formedness of expressions when dependent types are
-present.
-
-GF has a form of judgement for **semantic definitions**,
-recognized by the key word ``def``. At its simplest, it is just
-the definition of one constant, e.g.
-```
-  def one = Succ Zero ;
-```
-We can also define a function with arguments,
-```
-  def Neg A = Impl A Abs ;
-```
-which is still a special case of the most general notion of
-definition, that of a group of **pattern equations**:
-```
-  def
-    sum x Zero = x ;
-    sum x (Succ y) = Succ (sum x y) ;
-```
-To compute a term is, as in functional programming languages,
-simply to follow a chain of reductions until no definition
-can be applied. For instance, we compute
-```
-  sum one one -->
-  sum (Succ Zero) (Succ Zero) -->
-  Succ (sum (Succ Zero) Zero) -->
-  Succ (Succ Zero)
-```
-Computation in GF is performed with the ``pt`` command and the
-``compute`` transformation, e.g.
-```
-  > p -tr "1 + 1" | pt -transform=compute -tr | l
-  sum one one
-  Succ (Succ Zero)
-  s(s(0))
-```
-
-The ``def`` definitions of a grammar induce a notion of
-**definitional equality** among trees: two trees are
-definitionally equal if they compute into the same tree.
-Thus, trivially, all trees in a chain of computation
-(such as the one above)
-are definitionally equal to each other. So are the trees
-```
-  sum Zero (Succ one)
-  Succ one
-  sum (sum Zero Zero) (sum (Succ Zero) one)
-```
-and infinitely many other trees.
-
-A fact that has to be emphasized about ``def`` definitions is that
-they are //not// performed as a first step of linearization.
-We say that **linearization is intensional**, which means that
-the definitional equality of two trees does not imply that
-they have the same linearizations. For instance, each of the seven terms
-shown above has a different linearization in arithmetic notation:
-```
-  1 + 1
-  s(0) + s(0)
-  s(s(0) + 0)
-  s(s(0))
-  0 + s(1)
-  s(1)
-  0 + 0 + s(0) + 1
-```
-This notion of intensionality is
-no more exotic than the intensionality of any **pretty-printing**
-function of a programming language (function that shows
-the expressions of the language as strings).
It is vital for
-pretty-printing to be intensional in this sense - if we want,
-for instance, to trace a chain of computation by pretty-printing each
-intermediate step, what we want to see is a sequence of different
-expressions, which are definitionally equal.
-
-What is more exotic is that GF has two ways of referring to the
-abstract syntax objects. In the concrete syntax, the reference is intensional.
-In the abstract syntax, the reference is extensional, since
-**type checking is extensional**. The reason is that,
-in the type theory with dependent types, types may depend on terms.
-Two types depending on terms that are definitionally equal are
-equal types. For instance,
-```
-  Proof (Odd one)
-  Proof (Odd (Succ Zero))
-```
-are equal types. Hence, any tree that type checks as a proof that
-1 is odd also type checks as a proof that the successor of 0 is odd.
-(Recall, in this connection, that the
-arguments a category depends on never play any role
-in the linearization of trees of that category,
-nor in the definition of the linearization type.)
-
-In addition to computation, definitions impose a
-**paraphrase** relation on expressions:
-two strings are paraphrases if they
-are linearizations of trees that are
-definitionally equal.
-Paraphrases are sometimes interesting for
-translation: the **direct translation**
-of a string, which is the linearization of the same tree
-in the target language, may be inadequate because it is e.g.
-unidiomatic or ambiguous. In such a case,
-the translation algorithm may be made to consider
-translation by a paraphrase.
-
-To express the distinction between
-**constructors** (=**canonical** functions)
-and other functions, GF has a judgement form
-``data`` to tell that certain functions are canonical, e.g.
-```
-  data Nat = Succ | Zero ;
-```
-Unlike in Haskell, but similarly to ALF (where constructor functions
-are marked with a flag ``C``),
-new constructors can be added to
-a type with new ``data`` judgements.
The type signatures of constructors
-are given separately, in ordinary ``fun`` judgements.
-One can also write directly
-```
-  data Succ : Nat -> Nat ;
-```
-which is equivalent to the two judgements
-```
-  fun Succ : Nat -> Nat ;
-  data Nat = Succ ;
-```
-
-**Exercise**. Implement an interpreter of a small functional programming
-language with natural numbers, lists, pairs, lambdas, etc. Use higher-order
-abstract syntax with semantic definitions. As target language, use
-your favourite programming language.
-
-**Exercise**. To make your interpreted language look nice, use
-**precedences** instead of putting parentheses everywhere.
-You can use the [precedence library ../../lib/prelude/Precedence.gf]
-of GF to facilitate this.
-
-
-
-#PARTtwo
-
-=Embedded grammars in Haskell=
-
-GF grammars can be used as parts of programs written in the
-following languages. We will go through a skeleton application in
-Haskell, while the next chapter will show how to build an
-application in Java.
-
-We will show how to build a minimal resource grammar
-application whose architecture scales up to much
-larger applications. The application is run from the
-shell by the command
-```
-  math
-```
-whereafter it reads user input in English and French.
-To each input line, it answers by the truth value of
-the sentence.
-```
-  ./math
-  zéro est pair
-  True
-  zero is odd
-  False
-  zero is even and zero is odd
-  False
-```
-The source of the application consists of the following
-files:
-```
-  LexEng.gf   -- English instance of Lex
-  LexFre.gf   -- French instance of Lex
-  Lex.gf      -- lexicon interface
-  Makefile    -- a makefile
-  MathEng.gf  -- English instantiation of MathI
-  MathFre.gf  -- French instantiation of MathI
-  Math.gf     -- abstract syntax
-  MathI.gf    -- concrete syntax functor for Math
-  Run.hs      -- Haskell Main module
-```
-The system was built in 22 steps explained below.
-
-
-==Writing GF grammars==
-
-===Creating the first grammar===
-
-1.
Write ``Math.gf``, which defines what you want to say.
-```
-  abstract Math = {
-    cat Prop ; Elem ;
-    fun
-      And : Prop -> Prop -> Prop ;
-      Even : Elem -> Prop ;
-      Zero : Elem ;
-  }
-```
-2. Write ``Lex.gf``, which defines which language-dependent
-parts are needed in the concrete syntax. These are mostly
-words (lexicon), but can in fact be any operations. The definitions
-only use resource abstract syntax, which is opened.
-```
-  interface Lex = open Syntax in {
-    oper
-      even_A : A ;
-      zero_PN : PN ;
-  }
-```
-3. Write ``LexEng.gf``, the English implementation of ``Lex.gf``.
-This module uses English resource libraries.
-```
-  instance LexEng of Lex = open GrammarEng, ParadigmsEng in {
-    oper
-      even_A = regA "even" ;
-      zero_PN = regPN "zero" ;
-
-  }
-```
-4. Write ``MathI.gf``, a language-independent concrete syntax of
-``Math.gf``. It opens interfaces,
-which makes it an incomplete module, also known as a parametrized module
-or functor.
-```
-  incomplete concrete MathI of Math =
-
-    open Syntax, Lex in {
-
-    flags startcat = Prop ;
-
-    lincat
-      Prop = S ;
-      Elem = NP ;
-    lin
-      And x y = mkS and_Conj x y ;
-      Even x = mkS (mkCl x even_A) ;
-      Zero = mkNP zero_PN ;
-  }
-```
-5. Write ``MathEng.gf``, which is just an instantiation of ``MathI.gf``,
-replacing the interfaces by their English instances. This is the module
-that will be used as a top module in GF, so it contains a path to
-the libraries.
-```
-  --# -path=.:present:prelude
-
-  concrete MathEng of Math = MathI with
-    (Syntax = SyntaxEng),
-    (Lex = LexEng) ;
-```
-
-
-===Testing===
-
-6. Test the grammar in GF by random generation and parsing.
-```
-  $ gf
-  > i MathEng.gf
-  > gr -tr | l -tr | p
-  And (Even Zero) (Even Zero)
-  zero is even and zero is even
-  And (Even Zero) (Even Zero)
-```
-Importing the grammar will fail if you haven't
-- correctly defined your ``GF_LIB_PATH`` as ``GF/lib``
-- installed the resource package or
-  compiled the resource from source by ``make`` in ``GF/lib/resource-1.0``
-
-
-
-===Adding a new language===
-
-7. Now it is time to add a new language. Write a French lexicon ``LexFre.gf``:
-```
-  instance LexFre of Lex = open SyntaxFre, ParadigmsFre in {
-    oper
-      even_A = mkA "pair" ;
-      zero_PN = mkPN "zéro" ;
-  }
-```
-8. You also need a French concrete syntax, ``MathFre.gf``:
-```
-  --# -path=.:present:prelude
-
-  concrete MathFre of Math = MathI with
-    (Syntax = SyntaxFre),
-    (Lex = LexFre) ;
-```
-9. This time, you can test multilingual generation:
-```
-  > i MathFre.gf
-  > gr | tb
-  Even Zero
-  zéro est pair
-  zero is even
-```
-
-
-===Extending the language===
-
-10. You want to add a predicate saying that a number is odd.
-It is first added to ``Math.gf``:
-```
-  fun Odd : Elem -> Prop ;
-```
-11. You need a new word in ``Lex.gf``.
-```
-  oper odd_A : A ;
-```
-12. Then you can give a language-independent concrete syntax in
-``MathI.gf``:
-```
-  lin Odd x = mkS (mkCl x odd_A) ;
-```
-13. The new word is implemented in ``LexEng.gf``.
-```
-  oper odd_A = mkA "odd" ;
-```
-14. The new word is implemented in ``LexFre.gf``.
-```
-  oper odd_A = mkA "impair" ;
-```
-15. Now you can test with the extended lexicon. First empty
-the environment to get rid of the old abstract syntax, then
-import the new versions of the grammars.
-```
-  > e
-  > i MathEng.gf
-  > i MathFre.gf
-  > gr | tb
-  And (Odd Zero) (Even Zero)
-  zéro est impair et zéro est pair
-  zero is odd and zero is even
-```
-
-
-==Building a user program==
-
-===Producing a compiled grammar package===
-
-16. Your grammar is going to be used by persons who do not need
-to compile it again.
They may not have access to the resource library, -either. Therefore it is advisable to produce a multilingual grammar -package in a single file. We call this package ``math.gfcm`` and -produce it, when we have ``MathEng.gf`` and -``MathFre.gf`` in the GF state, by the command -``` - > pm | wf math.gfcm -``` - - -===Writing the Haskell application=== - -17. Write the Haskell main file ``Run.hs``. It uses the ``EmbedAPI`` -module, which defines some basic functionality such as parsing. -The answer is produced by an interpreter of trees returned by the parser. -``` -module Main where - -import GSyntax -import GF.Embed.EmbedAPI - -main :: IO () -main = do - gr <- file2grammar "math.gfcm" - loop gr - -loop :: MultiGrammar -> IO () -loop gr = do - s <- getLine - interpret gr s - loop gr - -interpret :: MultiGrammar -> String -> IO () -interpret gr s = do - let tss = parseAll gr "Prop" s - case (concat tss) of - [] -> putStrLn "no parse" - t:_ -> print $ answer $ fg t - -answer :: GProp -> Bool -answer p = case p of - (GOdd x1) -> odd (value x1) - (GEven x1) -> even (value x1) - (GAnd x1 x2) -> answer x1 && answer x2 - -value :: GElem -> Int -value e = case e of - GZero -> 0 -``` - -18. The syntax trees manipulated by the interpreter are not raw -GF trees, but objects of the Haskell datatype ``GProp``. -From any GF grammar, a file ``GSyntax.hs`` with -datatypes corresponding to its abstract -syntax can be produced by the command -``` - > pg -printer=haskell | wf GSyntax.hs -``` -The module also defines the overloaded functions -``gf`` and ``fg`` for translating from these types to -raw trees and back. - - -===Compiling the Haskell grammar=== - -19. Before compiling ``Run.hs``, you must check that the -embedded GF modules are found. The easiest way to do this -is to create two symbolic links to your GF source directories: -``` - $ ln -s /home/aarne/GF/src/GF - $ ln -s /home/aarne/GF/src/Transfer/ -``` - -20. Now you can run the GHC Haskell compiler to produce the program. 
-``` - $ ghc --make -o math Run.hs -``` -The program can be tested with the command ``./math``. - - -===Building a distribution=== - -21. For a stand-alone binary-only distribution, only -the two files ``math`` and ``math.gfcm`` are needed. -For a source distribution, the files mentioned at -the beginning of this document are needed. - - -===Using a Makefile=== - -22. As a part of the source distribution, a ``Makefile`` is -essential. The ``Makefile`` is also useful when developing the -application. It should always be possible to build an executable -from source by typing ``make``. Here is a minimal such ``Makefile``: -``` - all: - echo "pm | wf math.gfcm" | gf MathEng.gf MathFre.gf - echo "pg -printer=haskell | wf GSyntax.hs" | gf math.gfcm - ghc --make -o math Run.hs -``` - - -==The Embedded GF Haskell API== - - - -=Embedded grammars in Java= - -Forthcoming; at the moment, the document - - [``http://www.cs.chalmers.se/~bringert/gf/gf-java.html`` http://www.cs.chalmers.se/~bringert/gf/gf-java.html] - -by Björn Bringert gives more information on Java. - - -=Spoken language translators= - - -=Multimodal dialogue systems= - - -=Grammar of formal languages= - -==Precedence and fixity== - -==Higher-order abstract syntax== - -==Extensible natural-language interfaces== - - - -=Inside the resource grammar library= - -==Writing your own resource implementation== - -==Parametrized modules for language families== - - - -=Using Transfer for semantic actions= - - - -#PARTthree - - -=Syntax and semantics of the GF grammar formalism= - -=The resource grammar API= - -=The GFC format= - -=The command language of the GF shell= - -==Lexers and unlexers== - -Lexers and unlexers can be chosen from -a list of predefined ones, using the flags ``-lexer`` and ``-unlexer``, either -in the grammar file or on the GF command line. Here are some often-used lexers -and unlexers: -``` - The default is words. 
- -lexer=words tokens are separated by spaces or newlines - -lexer=literals like words, but GF integer and string literals recognized - -lexer=vars like words, but "x","x_...","$...$" as vars, "?..." as meta - -lexer=chars each character is a token - -lexer=code use Haskell's lex - -lexer=codevars like code, but treat unknown words as variables, ?? as meta - -lexer=text with conventions on punctuation and capital letters - -lexer=codelit like code, but treat unknown words as string literals - -lexer=textlit like text, but treat unknown words as string literals - - The default is unwords. - -unlexer=unwords space-separated token list (like unwords) - -unlexer=text format as text: punctuation, capitals, paragraph

- -unlexer=code format as code (spacing, indentation) - -unlexer=textlit like text, but remove string literal quotes - -unlexer=codelit like code, but remove string literal quotes - -unlexer=concat remove all spaces -``` -More options can be found with ``help -lexer`` and ``help -unlexer``. - - - - -==Speech input and output== - -The ``speak_aloud = sa`` command sends a string to the speech -synthesizer -[Flite http://www.speech.cs.cmu.edu/flite/doc/]. -It is typically used via a pipe: -``` generate_random | linearize | speak_aloud -The result is only satisfactory for English. - -The ``speech_input = si`` command receives a string from a -speech recognizer that requires the installation of -[ATK http://mi.eng.cam.ac.uk/~sjy/software.htm]. -It is typically used to pipe input to a parser: -``` speech_input -tr | parse -The method works only for grammars of English. - -Both Flite and ATK are freely available through the links -above, but they are not distributed together with GF. - - - -==Multilingual syntax editor== - -The -[Editor User Manual http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm] -describes the use of the editor, which works for any multilingual GF grammar. - -Here is a snapshot of the editor: - -%#BCEN - -%#EDITORPNG - -%#ECEN - - -The grammars in the snapshot are from the -[Letter grammar package http://www.cs.chalmers.se/~aarne/GF/examples/letter]. - - -==Communicating with GF== - -Other processes can communicate with the GF command interpreter, -and also with the GF syntax editor. Useful flags when invoking GF are -- ``-batch`` suppresses the prompts and structures the communication with XML tags. -- ``-s`` suppresses non-output non-error messages and XML tags. -- ``-nocpu`` suppresses CPU time indication. 
- - -Thus the most silent way to invoke GF is -``` - gf -batch -s -nocpu -``` - - - - - - -=Further reading= - -Syntax Editor User Manual: - -[``http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm`` http://www.cs.chalmers.se/~aarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm] - -Resource Grammar Synopsis (on using resource grammars): - -[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/synopsis.html`` ../../lib/resource-1.0/synopsis.html] - -Resource Grammar HOWTO (on writing resource grammars): - -[``http://www.cs.chalmers.se/~aarne/GF/lib/resource-1.0/doc/Resource-HOWTO.html`` ../../lib/resource-1.0/doc/Resource-HOWTO.html] - -GF Homepage: - -[``http://www.cs.chalmers.se/~aarne/GF/doc`` ../..] - diff --git a/doc/tutorial/prelude b/doc/tutorial/prelude index e1790817b..3f7b84056 100644 --- a/doc/tutorial/prelude +++ b/doc/tutorial/prelude @@ -1,6 +1,12 @@ -\documentclass[11pt]{book} +\documentclass[nwbk_0pt]{book} \usepackage[latin1]{inputenc} +%\setlength{\oddsidemargin}{0mm} +%\setlength{\evensidemargin}{-2mm} +%\setlength{\topmargin}{-12mm} +%\setlength{\textheight}{220mm} +%\setlength{\textwidth}{158mm} + \newcommand{\bequ}{\begin{quote}} \newcommand{\enqu}{\end{quote}} %%% \ No newline at end of file