From 777028dcd8b4038d1fd725f8e851f218f78fab14 Mon Sep 17 00:00:00 2001
From: "John J. Camilleri"
-
-
-This is a hands-on introduction to grammar writing in GF.
-
-Main ingredients of GF:
-
-Prerequisites:
-
-
-
-Lesson 1: a multilingual "Hello World" grammar. English, Finnish, Italian.
-
-Lesson 2: a larger grammar for the domain of food. English and Italian.
-
-Lesson 3: parameters - morphology and agreement.
-
-Lesson 4: using the resource grammar library.
-
-Lesson 5: semantics - dependent types, variable bindings,
-and semantic definitions.
-
-Lesson 6: implementing formal languages.
-
-Lesson 7: embedded grammar applications.
-
-
-
-You can chop this tutorial into a set of slides by the command
-
-where the program
-The slides will appear as a set of files beginning with
-Internal links will not work in the slide format, except for those in the
-upper left corner of each slide, and the links behind the "Contents" link.
-
-
-
-Goals:
-
-
-
-We use the term GF for three different things:
-
-The GF system is an implementation
-of the GF programming language, which in turn is built on the ideas of the
-GF theory.
-
-The focus of this tutorial is on using the GF programming language.
-
-At the same time, we learn the way of thinking in the GF theory.
-
-We make the grammars run on a computer by
-using the GF system.
-
-
-
-A GF program is called a grammar.
-
-A grammar defines a language.
-
-From this definition, language processing components can be derived:
-
-In general, a GF grammar is multilingual:
-
-
-
-Open-source free software, downloaded via the GF Homepage:
-
-There you find
-
-Many examples in this tutorial are
-online.
-
-Normally you don't have to compile GF yourself.
-But, if you do want to compile GF from source follow the
-instructions in the Developers Guide.
-
-
-
-Type
-You will see GF's welcome message and the prompt
-will give you a list of available commands.
-
-As a common convention, we will use
-
-Thus you should not type these prompts, but only the characters that
-follow them.
-
-
-
-Like most programming language tutorials, we start with a
-program that prints "Hello World" on the terminal.
-
-Extra features:
-
-
-
-A GF program, in general, is a multilingual grammar. Its main parts
-are
-
-The abstract syntax defines what meanings
-can be expressed in the grammar
-
-
-
-GF code for the abstract syntax:
-
-The code has the following parts:
-
-
-
-English concrete syntax (mapping from meanings to strings):
-
-The major parts of this code are:
-
-Notice the concatenation
-
-
-Finnish and an Italian concrete syntaxes:
-
-
-
-In order to compile the grammar in GF,
-we create four files, one for each module, named Modulename
-The first GF command: import a grammar.
-
-All commands also have short names; here:
-
-The GF system will compile your grammar
-into an internal representation and show the CPU time was consumed, followed
-by a new prompt:
-
-
-
-You can use GF for parsing (
-Parsing takes a string into an abstract syntax tree.
-
-The notation for trees is that of function application:
-
-Parentheses are only needed for grouping.
-
-Parsing something that is not in grammar will fail:
-
-
-
-You can also use GF for linearization (
-Translation: pipe linearization to parsing:
-
-Default of the language flag (
-Multilingual generation:
-
-Linearization is by default to all available languages.
-
-
-
-
-
-You can use the
-You can also write a script, a file containing the lines
-
-
-
-If we name this script
-The option
-See Lesson 7, for stand-alone programs that don't need the GF system to run.
-
-Exercise. (For Unix hackers.) Write a GF application that reads
-an English string from the standard input and writes an Italian
-translation to the output.
-
-
-
-Some more functions that will be covered:
-
-
-
-Application programs, using techniques from Lesson 7:
-
-
-
-Goals:
-
-
-
-Phrases usable for speaking about food:
-
-Abstract syntax:
-
-Example
-
-
-
-
-Test the grammar for parsing:
-
-Parse in other categories setting the
-
-
-
-
-Random generation (
-By using a pipe, random generation can be fed into linearization:
-
-Use the
-
-
-To generate all phrases that a grammar can produce,
-use
-The default depth is 3; the depth can be
-set by using the
-What options a command has can be seen by the
-
-
-
-
-Put the tracing option
-Useful for test purposes: the pipe above can show
-if a grammar is ambiguous, i.e.
-contains strings that can be parsed in more than one way.
-
-Exercise. Extend the
-
-
-To save the outputs into a file, pipe it to the
-To read a file to GF, use the
-The flag
-Files with examples can be used for regression testing
-of grammars - the most systematic way to do this is by
-treebanks; see here.
-
-
-
-Parentheses give a linear representation of trees,
-useful for the computer.
-
-Human eye may prefer to see a visualization:
-The tree is generated in postscript (
-
-This command uses the program Graphviz, which you
-might not have, but which are freely available on the web.
-
-You can save the temporary file
-Then you can process this file with the
-You can also visualize parse trees, which show categories and words instead of
-function symbols. The command is
-
-
-
-You can give a system command without leaving GF:
-
-A system command may also receive its argument from
-a GF pipes. It then uses the symbol
-This command example returns the number of generated trees.
-
-Exercise.
-Measure how many trees the grammar
-
-
-Just (?) replace English words with their dictionary equivalents:
-
-
-
-Not just replacing words:
-
-The order of a quality and the kind it modifies is changed in
-
-Thus Italian says
-(Some Italian adjectives
-are put before the noun. This distinction can be controlled by parameters,
-which are introduced in Lesson 3.)
-
-Multilingual grammars have yet another visualization option:
-word alignment, which shows what words correspond to each other.
-Technically, this means words that have the same smallest spanning subtrees
-in abstract syntax. The command is
-
-
-
-
-
-Semantically indistinguishable ways of expressing a thing.
-
-The variants construct of GF expresses free variation. For example,
-
-By default, the
-
-
-An equivalent notation for variants is
-
-This notation also allows the limiting case: an empty variant list,
-
-It can be used e.g. if a word lacks a certain inflection form.
-
-Free variation works for all types in concrete syntax; all terms in
-a variant list must be of the same type.
-
-
-
-Multilingual treebank: a set of trees with their
-linearizations in different languages:
-
-
-
-
-
-
-The grammar
-GF can convert BNF grammars into GF.
-BNF files are recognized by the file name suffix
-The compiler creates separate abstract and concrete modules internally.
-
-
-
-Separating concrete and abstract syntax allows
-three deviations from context-free grammar:
-
-Exercise. Define the non-context-free
-copy language
-
-
-GF uses suffixes to recognize different file formats:
-
-Importing generates target from source:
-
-The
-When reading a module, GF decides whether
-to use an existing
-
-
-Exercise. What happens when you import
-
-
-The golden rule of functional programmin:
-
-Whenever you find yourself programming by copy-and-paste, write a function instead.
-
-Functions in concrete syntax are defined using the keyword
-Example:
-
-The operation can be applied to an argument, and GF will
-compute the value:
-
-The symbol
-
-
-Notice the lambda abstraction form
-
-This is read:
-
-For lambda abstraction with multiple arguments, we have the shorthand
-
-Linearization rules actually use syntactic
-sugar for abstraction:
-
-
-
-The
-
-
-Any number of
-
-
-The rule
-
-can be written more concisely
-
-Part of the art in functional programming:
-decide the order of arguments in a function,
-so that partial application can be used as much as possible.
-
-For instance,
-Exercise. Define an operation
-
-
-Import with the flag
-Compute the value with
-
-
-A new module can extend an old one:
-
-Parallel to the abstract syntax, extensions can
-be built for concrete syntaxes:
-
-The effect of extension: all of the contents of the extended
-and extending modules are put together.
-
-In other words: the new module inherits the contents of the old module.
-
-
-
-Simultaneous extension and opening:
-
-Resource modules can extend other resource modules - thus it is
-possible to build resource hierarchies.
-
-
-
-Extend several grammars at the same time:
-
-where
-
-Exercise. Refactor
-
-
-Goals:
-
-It is possible to skip this chapter and go directly
-to the next, since the use of the GF Resource Grammar library
-makes it unnecessary to use parameters: they
-could be left to library implementors.
-
-
-
-Plural forms are needed in things like
-Grammatical Framework Tutorial
-
-Aarne Ranta
-December 2010 for GF 3.2
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-Overview
-
-
-
-
-
-
-Outline
-Slides
-
- htmls gf-tutorial.html
-
-htmls is distributed with GF (see below), in
-01-gf-tutorial.htmls.
-Lesson 1: Getting Started with GF
-
-
-
-
-What GF is
-
-
-
-GF grammars and language processing tasks
-
-
-
-
-
-
-Getting the GF system
-
-
-
-Running the GF system
-gf in the Unix (or Cygwin) shell:
-
- % gf
-
->.
-The command
-
- > help
-
-
-
-
-% as a prompt that marks system commands
-> as a prompt that marks GF commands
-A "Hello World" grammar
-
-
-
-The program: abstract syntax and concrete syntaxes
-
-
-
-
-
-
-
- -- a "Hello World" grammar
- abstract Hello = {
-
- flags startcat = Greeting ;
-
- cat Greeting ; Recipient ;
-
- fun
- Hello : Recipient -> Greeting ;
- World, Mum, Friends : Recipient ;
- }
-
-
-
-
-Hello
-
-
-Greeting is the
- default start category for parsing and generation
-
- concrete HelloEng of Hello = {
-
- lincat Greeting, Recipient = {s : Str} ;
-
- lin
- Hello recip = {s = "hello" ++ recip.s} ;
- World = {s = "world"} ;
- Mum = {s = "mum"} ;
- Friends = {s = "friends"} ;
- }
-
-
-
-
-Hello, itself named HelloEng
-
-
-Greeting and Recipient are records with a string s
- ++ and the record projection ..
-
- concrete HelloFin of Hello = {
- lincat Greeting, Recipient = {s : Str} ;
- lin
- Hello recip = {s = "terve" ++ recip.s} ;
- World = {s = "maailma"} ;
- Mum = {s = "äiti"} ;
- Friends = {s = "ystävät"} ;
- }
-
- concrete HelloIta of Hello = {
- lincat Greeting, Recipient = {s : Str} ;
- lin
- Hello recip = {s = "ciao" ++ recip.s} ;
- World = {s = "mondo"} ;
- Mum = {s = "mamma"} ;
- Friends = {s = "amici"} ;
- }
-
-
-Using grammars in the GF system
-.gf:
-
- Hello.gf HelloEng.gf HelloFin.gf HelloIta.gf
-
-
- > import HelloEng.gf
-
-
- > i HelloEng.gf
-
-
- > i HelloEng.gf
- - compiling Hello.gf... wrote file Hello.gfo 8 msec
- - compiling HelloEng.gf... wrote file HelloEng.gfo 12 msec
-
- 12 msec
- >
-
-
-parse = p)
-
- > parse "hello world"
- Hello World
-
-
- function argument1 ... argumentn
-
-
- > parse "hello dad"
- Unknown words: dad
-
- > parse "world hello"
- no tree found
-
-
-linearize = l).
-It takes trees into strings:
-
- > linearize Hello World
- hello world
-
-
- > import HelloEng.gf
- > import HelloIta.gf
-
- > parse -lang=HelloEng "hello mum" | linearize -lang=HelloIta
- ciao mamma
-
--lang): the last-imported concrete syntax.
-
- > parse -lang=HelloEng "hello friends" | linearize
- terve ystävät
- ciao amici
- hello friends
-
-Exercises on the Hello World grammar
-
-
-
-Hello.gf and some of the
-concrete syntaxes by five new recipients and one new greeting
-form.
-
-Hello grammars, for example, leave out
-some line, omit a variable in a lin rule, or change the name
-in one occurrence
-of a variable. Inspect the error messages generated by GF.
-Using grammars from outside GF
-gf program in a Unix pipe.
-
-
-
-
- % echo "l Hello World" | gf HelloEng.gf HelloFin.gf HelloIta.gf
-
-
- import HelloEng.gf
- import HelloFin.gf
- import HelloIta.gf
- linearize Hello World
-
-
-GF scripts
-hello.gfs, we can do
-
- $ gf --run <hello.gfs
-
- ciao mondo
- terve maailma
- hello world
-
---run removes prompts, CPU time, and other messages.
-What else can be done with the grammar
-
-
-
-Embedded grammar applications
-
-
-
-
-
-Lesson 2: Designing a grammar for complex phrases
-
-
-
-
-The abstract syntax Food
-
-
-
-Phrase
-Phrase can be built by assigning a Quality to an Item
- (e.g. this cheese is Italian)
-Item is build from a Kind by prefixing this or that
- (e.g. this wine)
-Kind is either atomic (e.g. cheese), or formed
- qualifying a given Kind with a Quality (e.g. Italian cheese)
-Quality is either atomic (e.g. Italian,
- or built by modifying a given Quality with the word very (e.g. very warm)
-
- abstract Food = {
-
- flags startcat = Phrase ;
-
- cat
- Phrase ; Item ; Kind ; Quality ;
-
- fun
- Is : Item -> Quality -> Phrase ;
- This, That : Kind -> Item ;
- QKind : Quality -> Kind -> Kind ;
- Wine, Cheese, Fish : Kind ;
- Very : Quality -> Quality ;
- Fresh, Warm, Italian, Expensive, Delicious, Boring : Quality ;
- }
-
-Phrase
-
- Is (This (QKind Delicious (QKind Italian Wine))) (Very (Very Expensive))
- this delicious Italian wine is very very expensive
-
-
-The concrete syntax FoodEng
-
- concrete FoodEng of Food = {
-
- lincat
- Phrase, Item, Kind, Quality = {s : Str} ;
-
- lin
- Is item quality = {s = item.s ++ "is" ++ quality.s} ;
- This kind = {s = "this" ++ kind.s} ;
- That kind = {s = "that" ++ kind.s} ;
- QKind quality kind = {s = quality.s ++ kind.s} ;
- Wine = {s = "wine"} ;
- Cheese = {s = "cheese"} ;
- Fish = {s = "fish"} ;
- Very quality = {s = "very" ++ quality.s} ;
- Fresh = {s = "fresh"} ;
- Warm = {s = "warm"} ;
- Italian = {s = "Italian"} ;
- Expensive = {s = "expensive"} ;
- Delicious = {s = "delicious"} ;
- Boring = {s = "boring"} ;
- }
-
-
-
- > import FoodEng.gf
- > parse "this delicious wine is very very Italian"
- Is (This (QKind Delicious Wine)) (Very (Very Italian))
-
-cat flag:
-
- p -cat=Kind "very Italian wine"
- QKind (Very Italian) Wine
-
-
-Exercises on the Food grammar
-
-
-
-Food grammar by ten new food kinds and
-qualities, and run the parser with new kinds of examples.
-
-Commands for testing grammars
-
-Generating trees and strings
-generate_random = gr): build
-build a random tree in accordance with an abstract syntax:
-
- > generate_random
- Is (This (QKind Italian Fish)) Fresh
-
-
- > generate_random | linearize
- this Italian fish is fresh
-
-number flag to generate several trees:
-
- > gr -number=4 | l
- that wine is boring
- that fresh cheese is fresh
- that cheese is very boring
- this cheese is Italian
-
-
-generate_trees = gt.
-
- > generate_trees | l
- that cheese is very Italian
- that cheese is very boring
- that cheese is very delicious
- ...
- this wine is fresh
- this wine is warm
-
-depth flag:
-
- > generate_trees -depth=2 | l
-
-help = h command:
-
- > help gr
- > help gt
-
-
-Exercises on generation
-
-
-
-gt generated all
-trees in your grammar, it would never terminate. Why?
-
-wc to count lines.
-More on pipes: tracing
--tr to each command whose output you
-want to see:
-
- > gr -tr | l -tr | p
-
- Is (This Cheese) Boring
- this cheese is boring
- Is (This Cheese) Boring
-
-Food grammar so that it produces ambiguous
-strings, and try out the ambiguity test.
-Writing and reading files
-write_file = wf command,
-
- > gr -number=10 | linearize | write_file -file=exx.tmp
-
-read_file = rf command,
-
- > read_file -file=exx.tmp -lines | parse
-
--lines tells GF to read each line of the file separately.
-Visualizing trees
-visualize_tree = vt:
-
- > parse "this delicious cheese is very Italian" | visualize_tree
-
-.ps) file. The -view option is used for
-telling what command to use to view the file. Its default is "open", which works
-on Mac OS X. On Ubuntu Linux, one can write
-
- > parse "this delicious cheese is very Italian" | visualize_tree -view="eog"
-
-
-
-_grph.dot,
-which the command vt produces.
-dot
-program (from the Graphviz package).
-
- % dot -Tpng _grph.dot > mytree.png
-
-visualize_parse = vp:
-
- > parse "this delicious cheese is very Italian" | visualize_parse
-
-
-
-System commands
-! followed by a Unix command,
-
- > ! dot -Tpng grphtmp.dot > mytree.png
- > ! open mytree.png
-
-?:
-
- > generate_trees -depth=4 | ? wc -l
-
-FoodEng gives with depths 4 and 5,
-respectively. Use the Unix word count command wc to count lines, and
-a system pipe from a GF command into a Unix command.
-An Italian concrete syntax
-
-
- concrete FoodIta of Food = {
-
- lincat
- Phrase, Item, Kind, Quality = {s : Str} ;
-
- lin
- Is item quality = {s = item.s ++ "è" ++ quality.s} ;
- This kind = {s = "questo" ++ kind.s} ;
- That kind = {s = "quel" ++ kind.s} ;
- QKind quality kind = {s = kind.s ++ quality.s} ;
- Wine = {s = "vino"} ;
- Cheese = {s = "formaggio"} ;
- Fish = {s = "pesce"} ;
- Very quality = {s = "molto" ++ quality.s} ;
- Fresh = {s = "fresco"} ;
- Warm = {s = "caldo"} ;
- Italian = {s = "italiano"} ;
- Expensive = {s = "caro"} ;
- Delicious = {s = "delizioso"} ;
- Boring = {s = "noioso"} ;
- }
-
-
-
- QKind quality kind = {s = kind.s ++ quality.s} ;
-
-vino italiano for Italian wine.
-align_words = aw:
-
- > parse "this delicious cheese is very Italian" | align_words
-
-
-
-Exercises on multilinguality
-
-
-
-Food for some other language.
-You will probably end up with grammatically incorrect
-linearizations - but don't
-worry about this yet.
-
-Food for German, Swedish, or some
-other language, test with random or exhaustive generation what constructs
-come out incorrect, and prepare a list of those ones that cannot be helped
-with the currently available fragment of GF. You can return to your list
-after having worked out Lesson 3.
-Free variation
-
- lin Delicious = {s = "delicious" | "exquisit" | "tasty"} ;
-
-linearize command
-shows only the first variant from such lists; to see them
-all, use the option -all:
-
- > p "this exquisit wine is delicious" | l -all
- this delicious wine is delicious
- this delicious wine is exquisit
- ...
-
-
-
- lin Delicious = {s = variants {"delicious" ; "exquisit" ; "tasty"}} ;
-
-
- variants {}
-
-More application of multilingual grammars
-
-Multilingual treebanks
-
-
- > gr -number=2 | l -treebank
-
- Is (That Cheese) (Very Boring)
- quel formaggio è molto noioso
- that cheese is very boring
-
- Is (That Cheese) Fresh
- quel formaggio è fresco
- that cheese is fresh
-
-
-Translation quiz
-translation_quiz = tq:
-generate random sentences, display them in one language, and check the user's
-answer given in another language.
-
- > translation_quiz -from=FoodEng -to=FoodIta
-
- Welcome to GF Translation Quiz.
- The quiz is over when you have done at least 10 examples
- with at least 75 % success.
- You can interrupt the quiz by entering a line consisting of a dot ('.').
-
- this fish is warm
- questo pesce è caldo
- > Yes.
- Score 1/1
-
- this cheese is Italian
- questo formaggio è noioso
- > No, not questo formaggio è noioso, but
- questo formaggio è italiano
-
- Score 1/2
- this fish is expensive
-
-
-Context-free grammars and GF
-
-The "cf" grammar format
-FoodEng can be written in a BNF format as follows:
-
- Is. Phrase ::= Item "is" Quality ;
- That. Item ::= "that" Kind ;
- This. Item ::= "this" Kind ;
- QKind. Kind ::= Quality Kind ;
- Cheese. Kind ::= "cheese" ;
- Fish. Kind ::= "fish" ;
- Wine. Kind ::= "wine" ;
- Italian. Quality ::= "Italian" ;
- Boring. Quality ::= "boring" ;
- Delicious. Quality ::= "delicious" ;
- Expensive. Quality ::= "expensive" ;
- Fresh. Quality ::= "fresh" ;
- Very. Quality ::= "very" Quality ;
- Warm. Quality ::= "warm" ;
-
-.cf (for context-free):
-
- > import food.cf
-
-Restrictions of context-free grammars
-
-
-
-{x x | x <- (a|b)*} in GF.
-Modules and files
-
-
-
-.gf
-.gfo
-
- > i FoodEng.gf
- - compiling Food.gf... wrote file Food.gfo 16 msec
- - compiling FoodEng.gf... wrote file FoodEng.gfo 20 msec
-
-.gfo format (="GF Object") is precompiled GF, which is
-faster to load than source GF (.gf).
-.gfo file or to generate
-a new one, by looking at modification times.
-FoodEng.gf for
-a second time? Try this in different situations:
-
-
-
-empty (e), which clears the memory
- of GF.
-FoodEng.gf, be it only an added space.
-Food.gf.
-Using operations and resource modules
-
-Operation definitions
-oper (for
-operation), distinct from fun for the sake of clarity.
-
- oper ss : Str -> {s : Str} = \x -> {s = x} ;
-
-
- ss "boy" ===> {s = "boy"}
-
-===> will be used for computation.
-
-
-
-\x -> t
-
-
-
-
- \x,y -> t === \x -> \y -> t
-
-
- lin f x = t === lin f = \x -> t
-
-
-The ``resource`` module type
-resource module type is used to package
-oper definitions into reusable resources.
-
- resource StringOper = {
- oper
- SS : Type = {s : Str} ;
- ss : Str -> SS = \x -> {s = x} ;
- cc : SS -> SS -> SS = \x,y -> ss (x.s ++ y.s) ;
- prefix : Str -> SS -> SS = \p,x -> ss (p ++ x.s) ;
- }
-
-
-Opening a resource
-resource modules can be
-opened in a concrete syntax.
-
- concrete FoodEng of Food = open StringOper in {
-
- lincat
- S, Item, Kind, Quality = SS ;
-
- lin
- Is item quality = cc item (prefix "is" quality) ;
- This k = prefix "this" k ;
- That k = prefix "that" k ;
- QKind k q = cc k q ;
- Wine = ss "wine" ;
- Cheese = ss "cheese" ;
- Fish = ss "fish" ;
- Very = prefix "very" ;
- Fresh = ss "fresh" ;
- Warm = ss "warm" ;
- Italian = ss "Italian" ;
- Expensive = ss "expensive" ;
- Delicious = ss "delicious" ;
- Boring = ss "boring" ;
- }
-
-
-Partial application
-
-
- lin This k = prefix "this" k ;
-
-
- lin This = prefix "this" ;
-
-prefix is typically applied to
-linearization variables with constant strings. Hence we
-put the Str argument before the SS argument.
-infix analogous to prefix,
-such that it allows you to write
-
- lin Is = infix "is" ;
-
-
-Testing resource modules
--retain,
-
- > import -retain StringOper.gf
-
-compute_concrete = cc,
-
- > compute_concrete prefix "in" (ss "addition")
- {s : Str = "in" ++ "addition"}
-
-
-Grammar architecture
-
-
-Extending a grammar
-
- abstract Morefood = Food ** {
- cat
- Question ;
- fun
- QIs : Item -> Quality -> Question ;
- Pizza : Kind ;
- }
-
-
- concrete MorefoodEng of Morefood = FoodEng ** {
- lincat
- Question = {s : Str} ;
- lin
- QIs item quality = {s = "is" ++ item.s ++ quality.s} ;
- Pizza = {s = "pizza"} ;
- }
-
-
- concrete MorefoodIta of Morefood = FoodIta ** open StringOper in {
- lincat
- Question = SS ;
- lin
- QIs item quality = ss (item.s ++ "è" ++ quality.s) ;
- Pizza = ss "pizza" ;
- }
-
-Multiple inheritance
-
- abstract Foodmarket = Food, Fruit, Mushroom ** {
- fun
- FruitKind : Fruit -> Kind ;
- MushroomKind : Mushroom -> Kind ;
- }
-
-
- abstract Fruit = {
- cat Fruit ;
- fun Apple, Peach : Fruit ;
- }
-
- abstract Mushroom = {
- cat Mushroom ;
- fun Cep, Agaric : Mushroom ;
- }
-
-
-Food by taking apart Wine into a special
-Drink module.
-Lesson 3: Grammars with parameters
-
-
-
-
-
-
-
-
-
-The problem: words have to be inflected
-
-Different languages have different types of inflection and agreement. -
--In a multilingual grammar, -we want to ignore such distinctions in abstract syntax. -
--Exercise. Make a list of the possible forms that nouns, -adjectives, and verbs can have in some languages that you know. -
-- -
- --We define the parameter type of number in English by -a new form of judgement: -
-- param Number = Sg | Pl ; --
-This judgement defines the parameter type Number by listing
-its two constructors, Sg and Pl
-(singular and plural).
-
-We give Kind a linearization type that has a table depending on number:
-
- lincat Kind = {s : Number => Str} ;
-
-
-The table type Number => Str is similar a function type
-(Number -> Str).
-
-Difference: the argument must be a parameter type. Then -the argument-value pairs can be listed in a finite table. -
-- -
--Here is a table: -
-
- lin Cheese = {
- s = table {
- Sg => "cheese" ;
- Pl => "cheeses"
- }
- } ;
-
-
-The table has branches, with a pattern on the
-left of the arrow => and a value on the right.
-
-The application of a table is done by the selection operator !.
-
-It which is computed by pattern matching: return -the value from the first branch whose pattern matches the -argument. For instance, -
-
- table {Sg => "cheese" ; Pl => "cheeses"} ! Pl
- ===> "cheeses"
-
-
-- -
--Case expressions are syntactic sugar: -
-
- case e of {...} === table {...} ! e
-
--Since they are familiar to Haskell and ML programmers, they can come out handy -when writing GF programs. -
-- -
--Constructors can take arguments from other parameter types. -
--Example: forms of English verbs (except be): -
-- param VerbForm = VPresent Number | VPast | VPastPart | VPresPart ; --
-Fact expressed: only present tense has number variation. -
--Example table: the forms of the verb drink: -
-
- table {
- VPresent Sg => "drinks" ;
- VPresent Pl => "drink" ;
- VPast => "drank" ;
- VPastPart => "drunk" ;
- VPresPart => "drinking"
- }
-
-
-
-Exercise. In an earlier exercise (previous section),
-you made a list of the possible
-forms that nouns, adjectives, and verbs can have in some languages that
-you know. Now take some of the results and implement them by
-using parameter type definitions and tables. Write them into a resource
-module, which you can test by using the command compute_concrete.
-
- -
- --A morphological paradigm is a formula telling how a class of -words is inflected. -
--From the GF point of view, a paradigm is a function that takes -a lemma (also known as a dictionary form, or a citation form) and -returns an inflection table. -
--The following operation defines the regular noun paradigm of English: -
-
- oper regNoun : Str -> {s : Number => Str} = \dog -> {
- s = table {
- Sg => dog ;
- Pl => dog + "s"
- }
- } ;
-
-
-The gluing operator + glues strings to one token:
-
- (regNoun "cheese").s ! Pl ===> "cheese" + "s" ===> "cheeses" -- -
- -
--A more complex example: regular verbs, -
-
- oper regVerb : Str -> {s : VerbForm => Str} = \talk -> {
- s = table {
- VPresent Sg => talk + "s" ;
- VPresent Pl => talk ;
- VPresPart => talk + "ing" ;
- _ => talk + "ed"
- }
- } ;
-
-
-The catch-all case for the past tense and the past participle
-uses a wild card pattern _.
-
- -
- -regNoun paradigm does not
-apply in English, and implement some alternative paradigms.
-
-- -
- --Purpose: a more radical -variation between languages -than just the use of different words and word orders. -
-
-We add to the grammar Food two rules for forming plural items:
-
- fun These, Those : Kind -> Item ; --
-We also add a noun which in Italian has the feminine case: -
-- fun Pizza : Kind ; --
-This will force us to deal with gender- -
-- -
- --In English, the phrase-forming rule -
-- fun Is : Item -> Quality -> Phrase ; --
-is affected by the number because of subject-verb agreement: -the verb of a sentence must be inflected in the number of the subject, -
-- Is (This Pizza) Warm ===> "this pizza is warm" - Is (These Pizza) Warm ===> "these pizzas are warm" --
-It is the copula (the verb be) that is affected: -
-
- oper copula : Number -> Str = \n ->
- case n of {
- Sg => "is" ;
- Pl => "are"
- } ;
-
-
-The subject Item must have such a number to provide to the copula:
-
- lincat Item = {s : Str ; n : Number} ;
-
--Now we can write -
-
- lin Is item qual = {s = item.s ++ copula item.n ++ qual.s} ;
-
-
-- -
- -
-How does an Item subject receive its number? The rules
-
- fun This, These : Kind -> Item ; --
-add determiners, either this or these, which -require different this pizza vs. -these pizzas. -
-
-Thus Kind must have both singular and plural forms:
-
- lincat Kind = {s : Number => Str} ;
-
--We can write -
-
- lin This kind = {
- s = "this" ++ kind.s ! Sg ;
- n = Sg
- } ;
-
- lin These kind = {
- s = "these" ++ kind.s ! Pl ;
- n = Pl
- } ;
-
-
-- -
--To avoid copy-and-paste, we can factor out the pattern of determination, -
-
- oper det :
- Str -> Number -> {s : Number => Str} -> {s : Str ; n : Number} =
- \det,n,kind -> {
- s = det ++ kind.s ! n ;
- n = n
- } ;
-
--Now we can write -
-- lin This = det Sg "this" ; - lin These = det Pl "these" ; --
-In a more lexicalized grammar, determiners would be a category: -
-
- lincat Det = {s : Str ; n : Number} ;
- fun Det : Det -> Kind -> Item ;
- lin Det det kind = {
- s = det.s ++ kind.s ! det.n ;
- n = det.n
- } ;
-
-
-- -
- -
-Kinds have number as a parametric feature: both singular and plural
-can be formed,
-
- lincat Kind = {s : Number => Str} ;
-
-
-Items have number as an inherent feature: they are inherently either
-singular or plural,
-
- lincat Item = {s : Str ; n : Number} ;
-
-
-Italian Kind will have parametric number and inherent gender:
-
- lincat Kind = {s : Number => Str ; g : Gender} ;
-
-
-- -
--Questions to ask when designing parameters: -
--Dictionaries give good advice: -
-For words, inherent features are usually given as lexical information. -
--For combinations, they are inherited from some part of the construction -(typically the one called the head). Italian modification: -
-
- lin QKind qual kind =
- let gen = kind.g in {
- s = table {n => kind.s ! n ++ qual.s ! gen ! n} ;
- g = gen
- } ;
-
--Notice -
-let expression)
-n
-- -
- -
-We use some string operations from the library Prelude are used.
-
- concrete FoodsEng of Foods = open Prelude in {
-
- lincat
- S, Quality = SS ;
- Kind = {s : Number => Str} ;
- Item = {s : Str ; n : Number} ;
-
- lin
- Is item quality = ss (item.s ++ copula item.n ++ quality.s) ;
- This = det Sg "this" ;
- That = det Sg "that" ;
- These = det Pl "these" ;
- Those = det Pl "those" ;
- QKind quality kind = {s = table {n => quality.s ++ kind.s ! n}} ;
- Wine = regNoun "wine" ;
- Cheese = regNoun "cheese" ;
- Fish = noun "fish" "fish" ;
- Pizza = regNoun "pizza" ;
- Very = prefixSS "very" ;
- Fresh = ss "fresh" ;
- Warm = ss "warm" ;
- Italian = ss "Italian" ;
- Expensive = ss "expensive" ;
- Delicious = ss "delicious" ;
- Boring = ss "boring" ;
-
-
-- -
-
- param
- Number = Sg | Pl ;
-
- oper
- det : Number -> Str -> {s : Number => Str} -> {s : Str ; n : Number} =
- \n,d,cn -> {
- s = d ++ cn.s ! n ;
- n = n
- } ;
- noun : Str -> Str -> {s : Number => Str} =
- \man,men -> {s = table {
- Sg => man ;
- Pl => men
- }
- } ;
- regNoun : Str -> {s : Number => Str} =
- \car -> noun car (car + "s") ;
- copula : Number -> Str =
- \n -> case n of {
- Sg => "is" ;
- Pl => "are"
- } ;
- }
-
-
-- -
- --Let us extend the English noun paradigms so that we can -deal with all nouns, not just the regular ones. The goal is to -provide a morphology module that makes it easy to -add words to a lexicon. -
-- -
- --We perform data abstraction from the type -of nouns by writing a a worst-case function: -
-
- oper Noun : Type = {s : Number => Str} ;
-
- oper mkNoun : Str -> Str -> Noun = \x,y -> {
- s = table {
- Sg => x ;
- Pl => y
- }
- } ;
-
- oper regNoun : Str -> Noun = \x -> mkNoun x (x + "s") ;
-
--Then we can define -
-- lincat N = Noun ; - lin Mouse = mkNoun "mouse" "mice" ; - lin House = regNoun "house" ; --
-where the underlying types are not seen. -
-- -
--We are free to change the undelying definitions, e.g. -add case (nominative or genitive) to noun inflection: -
-
- param Case = Nom | Gen ;
-
- oper Noun : Type = {s : Number => Case => Str} ;
-
--Now we have to redefine the worst-case function -
-
- oper mkNoun : Str -> Str -> Noun = \x,y -> {
- s = table {
- Sg => table {
- Nom => x ;
- Gen => x + "'s"
- } ;
- Pl => table {
- Nom => y ;
- Gen => y + case last y of {
- "s" => "'" ;
- _ => "'s"
- }
- }
- } ;
-
--But up from this level, we can retain the old definitions -
-- lin Mouse = mkNoun "mouse" "mice" ; - oper regNoun : Str -> Noun = \x -> mkNoun x (x + "s") ; -- -
- -
-
-In the last definition of mkNoun, we used a case expression
-on the last character of the plural, as well as the Prelude
-operation
-
- last : Str -> Str ; --
-returning the string consisting of the last character. -
--The case expression uses pattern matching over strings, which -is supported in GF, alongside with pattern matching over -parameters. -
-- -
- --The regular dog-dogs paradigm has -predictable variations: -
--We could provide alternative paradigms: -
-- noun_y : Str -> Noun = \fly -> mkNoun fly (init fly + "ies") ; - noun_s : Str -> Noun = \bus -> mkNoun bus (bus + "es") ; --
-(The Prelude function init drops the last character of a token.)
-
-Drawbacks: -
-- -
--Better solution: a smart paradigm: -
-
- regNoun : Str -> Noun = \w ->
- let
- ws : Str = case w of {
- _ + ("a" | "e" | "i" | "o") + "o" => w + "s" ; -- bamboo
- _ + ("s" | "x" | "sh" | "o") => w + "es" ; -- bus, hero
- _ + "z" => w + "zes" ;-- quiz
- _ + ("a" | "e" | "o" | "u") + "y" => w + "s" ; -- boy
- x + "y" => x + "ies" ;-- fly
- _ => w + "s" -- car
- }
- in
- mkNoun w ws
-
--GF has regular expression patterns: -
-| Q
-+ Q
-
-The patterns are ordered in such a way that, for instance,
-the suffix "oo" prevents bamboo from matching the suffix
-"o".
-
- -
- -regNoun so that the analysis needed to build s-forms
-is factored out as a separate oper, which is shared with
-regVerb.
-
-- -
- --In Lesson 5, dependent function types need a notation -that binds a variable to the argument type, as in -
-- switchOff : (k : Kind) -> Action k --
-Function types without variables are actually a shorthand: -
-- PredVP : NP -> VP -> S --
-means -
-- PredVP : (x : NP) -> (y : VP) -> S --
-or any other naming of the variables. -
-- -
--Sometimes variables shorten the code, since they can share a type: -
-- octuple : (x,y,z,u,v,w,s,t : Str) -> Str --
-If a bound variable is not used, it can be replaced by a wildcard: -
-- octuple : (_,_,_,_,_,_,_,_ : Str) -> Str --
-A good practice is to indicate the number of arguments: -
-- octuple : (x1,_,_,_,_,_,_,x8 : Str) -> Str --
-For inflection paradigms, it is handy to use heuristic variable names, -looking like the expected forms: -
-- mkNoun : (mouse,mice : Str) -> Noun -- -
- -
- -
-In librarues, it is useful to group type signatures separately from
-definitions. It is possible to divide an oper judgement,
-
- oper regNoun : Str -> Noun ; - oper regNoun s = mkNoun s (s + "s") ; --
-and put the parts in different places. -
-
-With the interface and instance module types
-(see here): the parts can even be put to different files.
-
- -
- --Overloading: different functions can be given the same name, as e.g. in C++. -
--The compiler performs overload resolution, which works as long as the -functions have different types. -
-
-In GF, the functions must be grouped together in overload groups.
-
-Example: different ways to define nouns in English: -
-
- oper mkN : overload {
- mkN : (dog : Str) -> Noun ; -- regular nouns
- mkN : (mouse,mice : Str) -> Noun ; -- irregular nouns
- }
-
--Cf. dictionaries: if the -word is regular, just one form is needed. If it is irregular, -more forms are given. -
--The definition can be given separately, or at the same time, as the types: -
-
- oper mkN = overload {
- mkN : (dog : Str) -> Noun = regNoun ;
- mkN : (mouse,mice : Str) -> Noun = mkNoun ;
- }
-
--Exercise. Design a system of English verb paradigms presented by -an overload group. -
-- -
- -
-The command morpho_analyse = ma
-can be used to read a text and return for each word its analyses
-(in the current grammar):
-
- > read_file bible.txt | morpho_analyse --
-The command morpho_quiz = mq generates inflection exercises.
-
- % gf -path=alltenses:prelude $GF_LIB_PATH/alltenses/IrregFre.gfo - - > morpho_quiz -cat=V - - Welcome to GF Morphology Quiz. - ... - - réapparaître : VFin VCondit Pl P2 - réapparaitriez - > No, not réapparaitriez, but - réapparaîtriez - Score 0/1 --
-To create a list for later use, use the command morpho_list = ml
-
- > morpho_list -number=25 -cat=V | write_file exx.txt -- -
- -
- --Parameters include not only number but also gender. -
-
- concrete FoodsIta of Foods = open Prelude in {
-
- param
- Number = Sg | Pl ;
- Gender = Masc | Fem ;
-
--Qualities are inflected for gender and number, whereas kinds -have a parametric number and an inherent gender. -Items have an inherent number and gender. -
-
- lincat
- Phr = SS ;
- Quality = {s : Gender => Number => Str} ;
- Kind = {s : Number => Str ; g : Gender} ;
- Item = {s : Str ; g : Gender ; n : Number} ;
-
-
-- -
--A Quality is an adjective, with one form for each gender-number combination. -
-
- oper
- adjective : (_,_,_,_ : Str) -> {s : Gender => Number => Str} =
- \nero,nera,neri,nere -> {
- s = table {
- Masc => table {
- Sg => nero ;
- Pl => neri
- } ;
- Fem => table {
- Sg => nera ;
- Pl => nere
- }
- }
- } ;
-
--Regular adjectives work by adding endings to the stem. -
-
- regAdj : Str -> {s : Gender => Number => Str} = \nero ->
- let ner = init nero
- in adjective nero (ner + "a") (ner + "i") (ner + "e") ;
-
-
-- -
--For noun inflection, we are happy to give the two forms and the gender -explicitly: -
-
- noun : Str -> Str -> Gender -> {s : Number => Str ; g : Gender} =
- \vino,vini,g -> {
- s = table {
- Sg => vino ;
- Pl => vini
- } ;
- g = g
- } ;
-
--We need only number variation for the copula. -
-
- copula : Number -> Str =
- \n -> case n of {
- Sg => "è" ;
- Pl => "sono"
- } ;
-
-
-- -
--Determination is more complex than in English, because of gender: -
-
- det : Number -> Str -> Str -> {s : Number => Str ; g : Gender} ->
- {s : Str ; g : Gender ; n : Number} =
- \n,m,f,cn -> {
- s = case cn.g of {Masc => m ; Fem => f} ++ cn.s ! n ;
- g = cn.g ;
- n = n
- } ;
-
-
-- -
--The complete set of linearization rules: -
-
- lin
- Is item quality =
- ss (item.s ++ copula item.n ++ quality.s ! item.g ! item.n) ;
- This = det Sg "questo" "questa" ;
- That = det Sg "quel" "quella" ;
- These = det Pl "questi" "queste" ;
- Those = det Pl "quei" "quelle" ;
- QKind quality kind = {
- s = \\n => kind.s ! n ++ quality.s ! kind.g ! n ;
- g = kind.g
- } ;
- Wine = noun "vino" "vini" Masc ;
- Cheese = noun "formaggio" "formaggi" Masc ;
- Fish = noun "pesce" "pesci" Masc ;
- Pizza = noun "pizza" "pizze" Fem ;
- Very qual = {s = \\g,n => "molto" ++ qual.s ! g ! n} ;
- Fresh = adjective "fresco" "fresca" "freschi" "fresche" ;
- Warm = regAdj "caldo" ;
- Italian = regAdj "italiano" ;
- Expensive = regAdj "caro" ;
- Delicious = regAdj "delizioso" ;
- Boring = regAdj "noioso" ;
- }
-
-
-- -
- -Foods grammars.
-
-Food for a language of your choice,
-now aiming for complete grammatical correctness by the use of parameters.
-
-FoodsIta. You can do this by printing the grammar in the context-free format
-(print_grammar -printer=bnf) and counting the lines.
-- -
- --A linearization record may contain more strings than one, and those -strings can be put apart in linearization. -
--Example: English particle -verbs, (switch off). The object can appear between: -
--he switched it off -
--The verb switch off is called a -discontinuous constituents. -
--We can define transitive verbs and their combinations as follows: -
-
- lincat TV = {s : Number => Str ; part : Str} ;
-
- fun AppTV : Item -> TV -> Item -> Phrase ;
-
- lin AppTV subj tv obj =
- {s = subj.s ++ tv.s ! subj.n ++ obj.s ++ tv.part} ;
-
-
-
-Exercise. Define the language a^n b^n c^n in GF, i.e.
-any number of a's followed by the same number of b's and
-the same number of c's. This language is not context-free,
-but can be defined in GF by using discontinuous constituents.
-
- -
- --Tokens are created in the following ways: -
-"foo"
-t + s
-init, tail, tk, dp
--Since tokens must be known at compile time, -the above operations may not be applied to run-time variables -(i.e. variables that stand for function arguments in linearization rules). -
--Hence it is not legal to write -
-
- cat Noun ;
- fun Plural : Noun -> Noun ;
- lin Plural n = {s = n.s + "s"} ;
-
-
-because n is a run-time variable. Also
-
- lin Plural n = {s = (regNoun n).s ! Pl} ;
-
-
-is incorrect with regNoun as defined here, because the run-time
-variable is eventually sent to string pattern matching and gluing.
-
- -
--How to write tokens together without a space? -
-
- lin Question p = {s = p + "?"} ;
-
--is incorrect. -
--The way to go is to use an unlexer that creates correct spacing -after linearization. -
-
-Correspondingly, a lexer that e.g. analyses "warm?" into
-to tokens is needed before parsing.
-This topic will be covered in here.
-
- -
- -
-The symbol ** is used for both record types and record objects.
-
- lincat TV = Verb ** {c : Case} ;
-
- lin Follow = regVerb "folgen" ** {c = Dative} ;
-
-
-TV becomes a subtype of Verb.
-
-If T is a subtype of R, an object of T can be used whenever -an object of R is required. -
--Covariance: a function returning a record T as value can -also be used to return a value of a supertype R. -
--Contravariance: a function taking an R as argument -can also be applied to any object of a subtype T. -
-- -
--Product types and tuples are syntactic sugar for record types and records: -
-
- T1 * ... * Tn === {p1 : T1 ; ... ; pn : Tn}
- <t1, ..., tn> === {p1 = T1 ; ... ; pn = Tn}
-
-
-Thus the labels p1, p2,... are hard-coded.
-
- -
--English indefinite article: -
-
- oper artIndef : Str =
- pre {"a" ; "an" / strs {"a" ; "e" ; "i" ; "o"}} ;
-
--Thus -
-- artIndef ++ "cheese" ---> "a" ++ "cheese" - artIndef ++ "apple" ---> "an" ++ "apple" -- -
- -
- --Goals: -
-- -
- --The current 16 resource languages (GF version 3.2, December 2010) are -
-Bulgarian
-Catalan
-Danish
-Dutch
-English
-Finnish
-French
-German
-Italian
-Norwegian
-Polish
-Ron, Romanian
-Russian
-Spanish
-Swedish
-Urdu
-
-The first three letters (Eng etc) are used in grammar module names
-(ISO 639-3 standard).
-
- -
- --Semantic grammars (up to now in this tutorial): -a grammar defines a system of meanings (abstract syntax) and -tells how they are expressed(concrete syntax). -
--Resource grammars (as usual in linguistic tradition): -a grammar specifies the grammatically correct combinations of words, -whatever their meanings are. -
--With resource grammars, we can achieve a -wider coverage than with semantic grammars. -
-- -
- --A resource grammar has two kinds of categories and two kinds of rules: -
--GE makes no formal distinction between these two kinds. -
--But it is a good discipline to follow. -
-- -
- --Two kinds of lexical categories: -
-- Conj ; -- conjunction e.g. "and" - Det ; -- determiner e.g. "this" -- -
- N ; -- noun e.g. "pizza" - A ; -- adjective e.g. "good" - V ; -- verb e.g. "sleep" --
- -
- -
-Closed classes: module Syntax. In the Foods grammar, we need
-
- this_Det, that_Det, these_Det, those_Det : Det ; - very_AdA : AdA ; --
-Naming convention: word followed by the category (so we can -distinguish the quantifier that from the conjunction that). -
-
-Open classes have no objects in Syntax. Words are
-built as they are needed in applications: if we have
-
- fun Wine : Kind ; --
-we will define -
-- lin Wine = mkN "wine" ; --
-where we use mkN from ParadigmsEng:
-
- -
- --Alternative concrete syntax for -
-- fun Wine : Kind ; --
-is to provide a resource lexicon, which contains definitions such as -
-- oper wine_N : N = mkN "wine" ; --
-so that we can write -
-- lin Wine = wine_N ; --
-Advantages: -
-- -
- -
-In Foods, we need just four phrasal categories:
-
- Cl ; -- clause e.g. "this pizza is good" - NP ; -- noun phrase e.g. "this pizza" - CN ; -- common noun e.g. "warm pizza" - AP ; -- adjectival phrase e.g. "very warm" --
-Clauses are similar to sentences (S), but without a
-fixed tense and mood; see here for how they relate.
-
-Common nouns are made into noun phrases by adding determiners. -
-- -
- --We need the following combinations: -
-- mkCl : NP -> AP -> Cl ; -- e.g. "this pizza is very warm" - mkNP : Det -> CN -> NP ; -- e.g. "this pizza" - mkCN : AP -> CN -> CN ; -- e.g. "warm pizza" - mkAP : AdA -> AP -> AP ; -- e.g. "very warm" --
-We also need lexical insertion, to form phrases from single words: -
-- mkCN : N -> NP ; - mkAP : A -> AP ; --
-Naming convention: to construct a C, use a function mkC.
-
-Heavy overloading: the current library
-(version 1.2) has 23 operations named mkNP!
-
- -
- --The sentence -
- mkCl - (mkNP these_Det - (mkCN (mkAP very_AdA (mkAP warm_A)) (mkCN pizza_CN))) - (mkAP italian_AP) --
-The task now: to define the concrete syntax of Foods so that
-this syntactic tree gives the value of linearizing the semantic tree
-
- Is (These (QKind (Very Warm) Pizza)) Italian -- -
- -
- --Language-specific and language-independent parts - roughly, -
-SyntaxL has the same types and
- functions for all languages L
-ParadigmsL has partly
- different types and functions
- for different languages L
--Full API documentation on-line: the resource synopsis, -
-
-grammaticalframework.org/lib/doc/synopsis.html
-
- -
- -| Category | -Explanation | -Example | -|
|---|---|---|---|
Cl |
-clause (sentence), with all tenses | -she looks at this | -|
AP |
-adjectival phrase | -very warm | -|
CN |
-common noun (without determiner) | -red house | -|
NP |
-noun phrase (subject or object) | -the red house | -|
AdA |
-adjective-modifying adverb, | -very | -|
Det |
-determiner | -these | -|
A |
-one-place adjective | -warm | -|
N |
-common noun | -house | -|
- -
- -| Function | -Type | -Example | -|
|---|---|---|---|
mkCl |
-NP -> AP -> Cl |
-John is very old | -|
mkNP |
-Det -> CN -> NP |
-these old man | -|
mkCN |
-N -> CN |
-house | -|
mkCN |
-AP -> CN -> CN |
-very big blue house | -|
mkAP |
-A -> AP |
-old | -|
mkAP |
-AdA -> AP -> AP |
-very very old | -|
- -
- -| Function | -Type | -In English | -|
|---|---|---|---|
this_Det |
-Det |
-this | -|
that_Det |
-Det |
-that | -|
these_Det |
-Det |
-this | -|
those_Det |
-Det |
-that | -|
very_AdA |
-AdA |
-very | -|
- -
- -
-From ParadigmsEng:
-
| Function | -Type | -|
|---|---|---|
mkN |
-(dog : Str) -> N |
-|
mkN |
-(man,men : Str) -> N |
-|
mkA |
-(cold : Str) -> A |
-|
-From ParadigmsIta:
-
| Function | -Type | -|
|---|---|---|
mkN |
-(vino : Str) -> N |
-|
mkA |
-(caro : Str) -> A |
-|
- -
- -
-From ParadigmsGer:
-
| Function | -Type | -|
|---|---|---|
Gender |
-Type |
-|
masculine |
-Gender |
-|
feminine |
-Gender |
-|
neuter |
-Gender |
-|
mkN |
-(Stufe : Str) -> N |
-|
mkN |
-(Bild,Bilder : Str) -> Gender -> N |
-|
mkA |
-(klein : Str) -> A |
-|
mkA |
-(gut,besser,beste : Str) -> A |
-|
-From ParadigmsFin:
-
| Function | -Type | -|
|---|---|---|
mkN |
-(talo : Str) -> N |
-|
mkA |
-(hieno : Str) -> A |
-|
- -
- --1. Try out the morphological paradigms in different languages. Do -as follows: -
-- > i -path=alltenses -retain alltenses/ParadigmsGer.gfo - > cc -table mkN "Farbe" - > cc -table mkA "gut" "besser" "beste" -- -
- -
- -
-We assume the abstract syntax Foods from Lesson 3.
-
-We don't need to think about inflection and agreement, but just pick -functions from the resource grammar library. -
--We need a path with -
-.
-../foods, in which Foods.gf resides.
-present, which is relative to the
- environment variable GF_LIB_PATH
--Thus the beginning of the module is -
-
- --# -path=.:../foods:present
-
- concrete FoodsEng of Foods = open SyntaxEng,ParadigmsEng in {
-
-
-- -
- -
-As linearization types, we use clauses for Phrase, noun phrases
-for Item, common nouns for Kind, and adjectival phrases for Quality.
-
- lincat - Phrase = Cl ; - Item = NP ; - Kind = CN ; - Quality = AP ; --
-Now the combination rules we need almost write themselves automatically: -
-- lin - Is item quality = mkCl item quality ; - This kind = mkNP this_Det kind ; - That kind = mkNP that_Det kind ; - These kind = mkNP these_Det kind ; - Those kind = mkNP those_Det kind ; - QKind quality kind = mkCN quality kind ; - Very quality = mkAP very_AdA quality ; -- -
- -
- --We use resource paradigms and lexical insertion rules. -
--The two-place noun paradigm is needed only once, for -fish - everythins else is regular. -
-- Wine = mkCN (mkN "wine") ; - Pizza = mkCN (mkN "pizza") ; - Cheese = mkCN (mkN "cheese") ; - Fish = mkCN (mkN "fish" "fish") ; - Fresh = mkAP (mkA "fresh") ; - Warm = mkAP (mkA "warm") ; - Italian = mkAP (mkA "Italian") ; - Expensive = mkAP (mkA "expensive") ; - Delicious = mkAP (mkA "delicious") ; - Boring = mkAP (mkA "boring") ; - } -- -
- -
- -
-1. Compile the grammar FoodsEng and generate
-and parse some sentences.
-
-2. Write a concrete syntax of Foods for Italian
-or some other language included in the resource library. You can
-compare the results with the hand-written
-grammars presented earlier in this tutorial.
-
- -
- -
-If you write a concrete syntax of Foods for some other
-language, much of the code will look exactly the same
-as for English. This is because
-
Syntax API is the same for all languages (because
- all languages in the resource package do implement the same
- syntactic structures)
--But lexical rules are more language-dependent. -
--Thus, to port a grammar to a new language, you -
--Can we avoid this programming by copy-and-paste? -
-- -
- --Functors familiar from the functional programming languages ML and OCaml, -also known as parametrized modules. -
-
-In GF, a functor is a module that opens one or more interfaces.
-
-An interface is a module similar to a resource, but it only
-contains the types of opers, not (necessarily) their definitions.
-
-Syntax for functors: add the keyword incomplete. We will use the header
-
- incomplete concrete FoodsI of Foods = open Syntax, LexFoods in --
-where -
-- interface Syntax -- the resource grammar interface - interface LexFoods -- the domain lexicon interface --
-When we moreover have -
-- instance SyntaxEng of Syntax -- the English resource grammar - instance LexFoodsEng of LexFoods -- the English domain lexicon --
-we can write a functor instantiation, -
-- concrete FoodsGer of Foods = FoodsI with - (Syntax = SyntaxGer), - (LexFoods = LexFoodsGer) ; -- -
- -
- -
- --# -path=.:../foods
-
- incomplete concrete FoodsI of Foods = open Syntax, LexFoods in {
- lincat
- Phrase = Cl ;
- Item = NP ;
- Kind = CN ;
- Quality = AP ;
- lin
- Is item quality = mkCl item quality ;
- This kind = mkNP this_Det kind ;
- That kind = mkNP that_Det kind ;
- These kind = mkNP these_Det kind ;
- Those kind = mkNP those_Det kind ;
- QKind quality kind = mkCN quality kind ;
- Very quality = mkAP very_AdA quality ;
-
- Wine = mkCN wine_N ;
- Pizza = mkCN pizza_N ;
- Cheese = mkCN cheese_N ;
- Fish = mkCN fish_N ;
- Fresh = mkAP fresh_A ;
- Warm = mkAP warm_A ;
- Italian = mkAP italian_A ;
- Expensive = mkAP expensive_A ;
- Delicious = mkAP delicious_A ;
- Boring = mkAP boring_A ;
- }
-
-
-- -
- -
- interface LexFoods = open Syntax in {
- oper
- wine_N : N ;
- pizza_N : N ;
- cheese_N : N ;
- fish_N : N ;
- fresh_A : A ;
- warm_A : A ;
- italian_A : A ;
- expensive_A : A ;
- delicious_A : A ;
- boring_A : A ;
- }
-
-
-- -
- -
- instance LexFoodsGer of LexFoods = open SyntaxGer, ParadigmsGer in {
- oper
- wine_N = mkN "Wein" ;
- pizza_N = mkN "Pizza" "Pizzen" feminine ;
- cheese_N = mkN "Käse" "Käsen" masculine ;
- fish_N = mkN "Fisch" ;
- fresh_A = mkA "frisch" ;
- warm_A = mkA "warm" "wärmer" "wärmste" ;
- italian_A = mkA "italienisch" ;
- expensive_A = mkA "teuer" ;
- delicious_A = mkA "köstlich" ;
- boring_A = mkA "langweilig" ;
- }
-
-
-- -
- -- --# -path=.:../foods:present - - concrete FoodsGer of Foods = FoodsI with - (Syntax = SyntaxGer), - (LexFoods = LexFoodsGer) ; -- -
- -
- --Just two modules are needed: -
--The functor instantiation is completely mechanical to write. -
--The domain lexicon instance requires some knowledge of the words of the -language: -
-- -
- --Lexicon instance -
-
- instance LexFoodsFin of LexFoods = open SyntaxFin, ParadigmsFin in {
- oper
- wine_N = mkN "viini" ;
- pizza_N = mkN "pizza" ;
- cheese_N = mkN "juusto" ;
- fish_N = mkN "kala" ;
- fresh_A = mkA "tuore" ;
- warm_A = mkA "lämmin" ;
- italian_A = mkA "italialainen" ;
- expensive_A = mkA "kallis" ;
- delicious_A = mkA "herkullinen" ;
- boring_A = mkA "tylsä" ;
- }
-
--Functor instantiation -
-- --# -path=.:../foods:present - - concrete FoodsFin of Foods = FoodsI with - (Syntax = SyntaxFin), - (LexFoods = LexFoodsFin) ; -- -
- -
- --This can be seen as a design pattern for multilingual grammars: -
-- concrete DomainL* - - instance LexDomainL instance SyntaxL* - - incomplete concrete DomainI - / | \ - interface LexDomain abstract Domain interface Syntax* --
-Modules marked with * are either given in the library, or trivial.
-
-Of the hand-written modules, only LexDomainL is language-dependent.
-
- -
- -
-1. Compile and test FoodsGer.
-
-2. Refactor FoodsEng into a functor instantiation.
-
-3. Instantiate the functor FoodsI to some language of
-your choice.
-
-4. Design a small grammar that can be used for controlling -an MP3 player. The grammar should be able to recognize commands such -as play this song, with the following variations: -
--The implementation goes in the following phases: -
-- -
- -
-Problem: a functor only works when all languages use the resource Syntax
-in the same way.
-
-Example (contrived): assume that English has
-no word for Pizza, but has to use the paraphrase Italian pie.
-This is no longer a noun N, but a complex phrase
-in the category CN.
-
-Possible solution: change interface the LexFoods with
-
- oper pizza_CN : CN ; --
-Problem with this solution: -
-- -
- --A module may inherit just a selection of names. -
-
-Example: the FoodMarket example "Rsecarchitecture:
-
- abstract Foodmarket = Food, Fruit [Peach], Mushroom - [Agaric] --
-Here, from Fruit we include Peach only, and from Mushroom
-we exclude Agaric.
-
-A concrete syntax of Foodmarket must make the analogous restrictions.
-
- -
- -
-The English instantiation inherits the functor
-implementation except for the constant Pizza. This constant
-is defined in the body instead:
-
- --# -path=.:../foods:present
-
- concrete FoodsEng of Foods = FoodsI - [Pizza] with
- (Syntax = SyntaxEng),
- (LexFoods = LexFoodsEng) **
- open SyntaxEng, ParadigmsEng in {
-
- lin Pizza = mkCN (mkA "Italian") (mkN "pie") ;
- }
-
-
-- -
- --Abstract syntax modules can be used as interfaces, -and concrete syntaxes as their instances. -
--The following correspondencies are then applied: -
-- cat C <---> oper C : Type - - fun f : A <---> oper f : A - - lincat C = T <---> oper C : Type = T - - lin f = t <---> oper f : A = t -- -
- -
- -
-1. Find resource grammar terms for the following
-English phrases (in the category Phr). You can first try to
-build the terms manually.
-
-every man loves a woman -
--this grammar speaks more than ten languages -
--which languages aren't in the grammar -
--which languages did you want to speak -
--Then translate the phrases to other languages. -
-- -
- -
-In Foods grammars, we have used the path
-
- --# -path=.:../foods --
-The library subdirectory present is a restricted version
-of the resource, with only present tense of verbs and sentences.
-
-By just changing the path, we get all tenses: -
-- --# -path=.:../foods:alltenses --
-Now we can see all the tenses of phrases, by using the -all flag
-in linearization:
-
- > gr | l -all - This wine is delicious - Is this wine delicious - This wine isn't delicious - Isn't this wine delicious - This wine is not delicious - Is this wine not delicious - This wine has been delicious - Has this wine been delicious - This wine hasn't been delicious - Hasn't this wine been delicious - This wine has not been delicious - Has this wine not been delicious - This wine was delicious - Was this wine delicious - This wine wasn't delicious - Wasn't this wine delicious - This wine was not delicious - Was this wine not delicious - This wine had been delicious - Had this wine been delicious - This wine hadn't been delicious - Hadn't this wine been delicious - This wine had not been delicious - Had this wine not been delicious - This wine will be delicious - Will this wine be delicious - This wine won't be delicious - Won't this wine be delicious - This wine will not be delicious - Will this wine not be delicious - This wine will have been delicious - Will this wine have been delicious - This wine won't have been delicious - Won't this wine have been delicious - This wine will not have been delicious - Will this wine not have been delicious - This wine would be delicious - Would this wine be delicious - This wine wouldn't be delicious - Wouldn't this wine be delicious - This wine would not be delicious - Would this wine not be delicious - This wine would have been delicious - Would this wine have been delicious - This wine wouldn't have been delicious - Wouldn't this wine have been delicious - This wine would not have been delicious - Would this wine not have been delicious --
-We also see -
--The list is even longer in languages that have more -tenses and moods, e.g. the Romance languages. -
-- -
- --Goals: -
-- -
- --Problem: to express conditions of semantic well-formedness. -
--Example: a voice command system for a "smart house" wants to -eliminate meaningless commands. -
--Thus we want to restrict particular actions to -particular devices - we can dim a light, but we cannot -dim a fan. -
--The following example is borrowed from the -Regulus Book (Rayner & al. 2006). -
--A simple example is a "smart house" system, which -defines voice commands for household appliances. -
-- -
- --Ontology: -
--Abstract syntax formalizing this: -
-- cat - Command ; - Kind ; - Device Kind ; -- argument type Kind - Action Kind ; - fun - CAction : (k : Kind) -> Action k -> Device k -> Command ; --
-Device and Action are both dependent types.
-
- -
- -
-Assume the kinds light and fan,
-
- light, fan : Kind ; - dim : Action light ; --
-Given a kind, k, you can form the device the k. -
-- DKindOne : (k : Kind) -> Device k ; -- the light --
-Now we can form the syntax tree -
-- CAction light dim (DKindOne light) --
-but we cannot form the trees -
-- CAction light dim (DKindOne fan) - CAction fan dim (DKindOne light) - CAction fan dim (DKindOne fan) -- -
- -
- --Concrete syntax does not know if a category is a dependent type. -
-
- lincat Action = {s : Str} ;
- lin CAction _ act dev = {s = act.s ++ dev.s} ;
-
-
-Notice that the Kind argument is suppressed in linearization.
-
-Parsing with dependent types is performed in two phases: -
-
-By just doing the first phase, the kind argument is not found:
-
- > parse "dim the light" - CAction ? dim (DKindOne light) --
-Moreover, type-incorrect commands are not rejected: -
-- > parse "dim the fan" - CAction ? dim (DKindOne fan) --
-The term ? is a metavariable, returned by the parser
-for any subtree that is suppressed by a linearization rule.
-These are the same kind of metavariables as were used here
-to mark incomplete parts of trees in the syntax editor.
-
- -
- -
-Use the command put_tree = pt with the option -typecheck:
-
- > parse "dim the light" | put_tree -typecheck - CAction light dim (DKindOne light) --
-The typecheck process may fail, in which case an error message
-is shown and no tree is returned:
-
- > parse "dim the fan" | put_tree -typecheck - - Error in tree UCommand (CAction ? 0 dim (DKindOne fan)) : - (? 0 <> fan) (? 0 <> light) -- -
- -
- --Sometimes an action can be performed on all kinds of devices. -
-
-This is represented as a function that takes a Kind as an argument
-and produce an Action for that Kind:
-
- fun switchOn, switchOff : (k : Kind) -> Action k ; --
-Functions of this kind are called polymorphic. -
--We can use this kind of polymorphism in concrete syntax as well, -to express Haskell-type library functions: -
-- oper const :(a,b : Type) -> a -> b -> a = - \_,_,c,_ -> c ; - - oper flip : (a,b,c : Type) -> (a -> b ->c) -> b -> a -> c = - \_,_,_,f,x,y -> f y x ; -- -
- -
- -
-1. Write an abstract syntax module with above contents
-and an appropriate English concrete syntax. Try to parse the commands
-dim the light and dim the fan, with and without solve filtering.
-
-2. Perform random and exhaustive generation, with and without
-solve filtering.
-
-3. Add some device kinds and actions to the grammar. -
-- -
- --Curry-Howard isomorphism = propositions as types principle: -a proposition is a type of proofs (= proof objects). -
--Example: define the less than proposition for natural numbers, -
-- cat Nat ; - fun Zero : Nat ; - fun Succ : Nat -> Nat ; --
-Define inductively what it means for a number x to be less than -a number y: -
-Zero is less than Succ y for any y.
-Succ x is less than Succ y.
-
-Expressing these axioms in type theory
-with a dependent type Less x y and two functions constructing
-its objects:
-
- cat Less Nat Nat ; - fun lessZ : (y : Nat) -> Less Zero (Succ y) ; - fun lessS : (x,y : Nat) -> Less x y -> Less (Succ x) (Succ y) ; --
-Example: the fact that 2 is less that 4 has the proof object -
-- lessS (Succ Zero) (Succ (Succ (Succ Zero))) - (lessS Zero (Succ (Succ Zero)) (lessZ (Succ Zero))) - : Less (Succ (Succ Zero)) (Succ (Succ (Succ (Succ Zero)))) -- -
- -
- --Idea: to be semantically well-formed, the abstract syntax of a document -must contain a proof of some property, -although the proof is not shown in the concrete document. -
--Example: documents describing flight connections: -
--To fly from Gothenburg to Prague, first take LH3043 to Frankfurt, then OK0537 to Prague. -
--The well-formedness of this text is partly expressible by dependent typing: -
-- cat - City ; - Flight City City ; - fun - Gothenburg, Frankfurt, Prague : City ; - LH3043 : Flight Gothenburg Frankfurt ; - OK0537 : Flight Frankfurt Prague ; --
-To extend the conditions to flight connections, we introduce a category -of proofs that a change is possible: -
-- cat IsPossible (x,y,z : City)(Flight x y)(Flight y z) ; --
-A legal connection is formed by the function -
-- fun Connect : (x,y,z : City) -> - (u : Flight x y) -> (v : Flight y z) -> - IsPossible x y z u v -> Flight x z ; -- -
- -
- --Above, all Actions were either of -
--To make this scale up for new Kinds, we can refine this to -restricted polymorphism: defined for Kinds of a certain class -
--The notion of class uses the Curry-Howard isomorphism as follows: -
-- -
- --We modify the smart house grammar: -
-- cat - Switchable Kind ; - Dimmable Kind ; - fun - switchable_light : Switchable light ; - switchable_fan : Switchable fan ; - dimmable_light : Dimmable light ; - - switchOn : (k : Kind) -> Switchable k -> Action k ; - dim : (k : Kind) -> Dimmable k -> Action k ; --
-Classes for new actions can be added incrementally. -
-- -
- --Mathematical notation and programming languages have -expressions that bind variables. -
--Example: universal quantifier formula -
-- (All x)B(x) --
-The variable x has a binding (All x), and
-occurs bound in the body B(x).
-
-Examples from informal mathematical language: -
-- for all x, x is equal to x - - the function that for any numbers x and y returns the maximum of x+y - and x*y - - Let x be a natural number. Assume that x is even. Then x + 3 is odd. -- -
- -
- --Abstract syntax can use functions as arguments: -
-- cat Ind ; Prop ; - fun All : (Ind -> Prop) -> Prop --
-where Ind is the type of individuals and Prop,
-the type of propositions.
-
-Let us add an equality predicate -
-- fun Eq : Ind -> Ind -> Prop --
-Now we can form the tree -
-- All (\x -> Eq x x) --
-which we want to relate to the ordinary notation -
-- (All x)(x = x) --
-In higher-order abstract syntax (HOAS), all variable bindings are -expressed using higher-order syntactic constructors. -
-- -
- --HOAS has proved to be useful in the semantics and computer implementation of -variable-binding expressions. -
--How do we relate HOAS to the concrete syntax? -
--In GF, we write -
-
- fun All : (Ind -> Prop) -> Prop
- lin All B = {s = "(" ++ "All" ++ B.$0 ++ ")" ++ B.s}
-
-
-General rule: if an argument type of a fun function is
-a function type A -> C, the linearization type of
-this argument is the linearization type of C
-together with a new field $0 : Str.
-
-The argument B thus has the linearization type
-
- {s : Str ; $0 : Str},
-
-
-If there are more bindings, we add $1, $2, etc.
-
- -
- --To make sense of linearization, syntax trees must be -eta-expanded: for any function of type -
-- A -> B --
-an eta-expanded syntax tree has the form -
-- \x -> b --
-where b : B under the assumption x : A.
-
-Given the linearization rule -
-
- lin Eq a b = {s = "(" ++ a.s ++ "=" ++ b.s ++ ")"}
-
--the linearization of the tree -
-- \x -> Eq x x --
-is the record -
-
- {$0 = "x", s = ["( x = x )"]}
-
--Then we can compute the linearization of the formula, -
-
- All (\x -> Eq x x) --> {s = "[( All x ) ( x = x )]"}.
-
-
-The linearization of the variable x is,
-"automagically", the string "x".
-
- -
- --GF can treat any one-word string as a variable symbol. -
-- > p -cat=Prop "( All x ) ( x = x )" - All (\x -> Eq x x) --
-Variables must be bound if they are used: -
-- > p -cat=Prop "( All x ) ( x = y )" - no tree found -- -
- -
- --1. Write an abstract syntax of the whole -predicate calculus, with the -connectives "and", "or", "implies", and "not", and the -quantifiers "exists" and "for all". Use higher-order functions -to guarantee that unbounded variables do not occur. -
--2. Write a concrete syntax for your favourite -notation of predicate calculus. Use Latex as target language -if you want nice output. You can also try producing boolean -expressions of some programming language. Use as many parenthesis as you need to -guarantee non-ambiguity. -
-- -
- -
-The fun judgements of GF are declarations of functions, giving their types.
-
-Can we compute fun functions?
-
-Mostly we are not interested, since functions are seen as constructors, -i.e. data forms - as usual with -
-- fun Zero : Nat ; - fun Succ : Nat -> Nat ; --
-But it is also possible to give semantic definitions to functions.
-The key word is def:
-
- fun one : Nat ; - def one = Succ Zero ; - - fun twice : Nat -> Nat ; - def twice x = plus x x ; - - fun plus : Nat -> Nat -> Nat ; - def - plus x Zero = x ; - plus x (Succ y) = Succ (Sum x y) ; -- -
- -
- --Computation: follow a chain of definition until no definition -can be applied, -
-- plus one one --> - plus (Succ Zero) (Succ Zero) --> - Succ (plus (Succ Zero) Zero) --> - Succ (Succ Zero) --
-Computation in GF is performed with the put_term command and the
-compute transformation, e.g.
-
- > parse -tr "1 + 1" | put_term -transform=compute -tr | l - plus one one - Succ (Succ Zero) - s(s(0)) -- -
- -
- --Two trees are definitionally equal if they compute into the same tree. -
--Definitional equality does not guarantee sameness of linearization: -
-- plus one one ===> 1 + 1 - Succ (Succ Zero) ===> s(s(0)) --
-The main use of this concept is in type checking: sameness of types. -
--Thus e.g. the following types are equal -
-- Less Zero one - Less Zero (Succ Zero)) --
-so that an object of one also is an object of the other. -
-- -
- -
-The judgement form data tells that a category has
-certain functions as constructors:
-
- data Nat = Succ | Zero ; --
-The type signatures of constructors are given separately, -
-- fun Zero : Nat ; - fun Succ : Nat -> Nat ; --
-There is also a shorthand: -
-- data Succ : Nat -> Nat ; === fun Succ : Nat -> Nat ; - data Nat = Succ ; --
-Notice: in def definitions, identifier patterns not
-marked as data will be treated as variables.
-
- -
- --1. Implement an interpreter of a small functional programming -language with natural numbers, lists, pairs, lambdas, etc. Use higher-order -abstract syntax with semantic definitions. As concrete syntax, use -your favourite programming language. -
-
-2. There is no termination checking for def definitions.
-Construct an examples that makes type checking loop.
-Type checking can be invoked with put_term -transform=solve.
-
- -
- --Goals: -
-- -
- --We construct a calculator with addition, subtraction, multiplication, and -division of integers. -
-
- abstract Calculator = {
-
- cat Exp ;
-
- fun
- EPlus, EMinus, ETimes, EDiv : Exp -> Exp -> Exp ;
- EInt : Int -> Exp ;
- }
-
-
-The category Int is a built-in category of
-integers. Its syntax trees integer literals, i.e.
-sequences of digits:
-
- 5457455814608954681 : Int --
-These are the only objects of type Int:
-grammars are not allowed to declare functions with Int as value type.
-
- -
- --We begin with a -concrete syntax that always uses parentheses around binary -operator applications: -
-
- concrete CalculatorP of Calculator = {
-
- lincat
- Exp = SS ;
- lin
- EPlus = infix "+" ;
- EMinus = infix "-" ;
- ETimes = infix "*" ;
- EDiv = infix "/" ;
- EInt i = i ;
-
- oper
- infix : Str -> SS -> SS -> SS = \f,x,y ->
- ss ("(" ++ x.s ++ f ++ y.s ++ ")") ;
- }
-
--Now we have -
-- > linearize EPlus (EInt 2) (ETimes (EInt 3) (EInt 4)) - ( 2 + ( 3 * 4 ) ) --
-First problems: -
-- -
- --The input of parsing in GF is not just a string, but a list of -tokens, returned by a lexer. -
--The default lexer in GF returns chunks separated by spaces: -
-- "(12 + (3 * 4))" ===> "(12", "+", "(3". "*". "4))" --
-The proper way would be -
-
- "(", "12", "+", "(", "3", "*", "4", ")", ")"
-
-
-Moreover, the tokens "12", "3", and "4" should be recognized as
-integer literals - they cannot be found in the grammar.
-
- -
-
-Lexers are invoked by flags to the command put_string = ps.
-
- > put_string -lexcode "(2 + (3 * 4))" - ( 2 + ( 3 * 4 ) ) --
-This can be piped into a parser, as usual: -
-- > ps -lexcode "(2 + (3 * 4))" | parse - EPlus (EInt 2) (ETimes (EInt 3) (EInt 4)) --
-In linearization, we use a corresponding unlexer: -
-- > linearize EPlus (EInt 2) (ETimes (EInt 3) (EInt 4)) | ps -unlexcode - (2 + (3 * 4)) -- -
- -
- -| lexer | -unlexer | -description | -|
|---|---|---|---|
chars |
-unchars |
-each character is a token | -|
lexcode |
-unlexcode |
-program code conventions (uses Haskell's lex) | -|
lexmixed |
-unlexmixed |
-like text, but between $ signs like code | -|
lextext |
-unlextext |
-with conventions on punctuation and capitals | -|
words |
-unwords |
-(default) tokens separated by space characters | -|
- -
- --Arithmetic expressions should be unambiguous. If we write -
-- 2 + 3 * 4 --
-it should be parsed as one, but not both, of -
-- EPlus (EInt 2) (ETimes (EInt 3) (EInt 4)) - ETimes (EPlus (EInt 2) (EInt 3)) (EInt 4) --
-We choose the former tree, because -multiplication has higher precedence than addition. -
--To express the latter tree, we have to use parentheses: -
-- (2 + 3) * 4 --
-The usual precedence rules: -
-1 + 2 + 3 means the same as (1 + 2) + 3.
-- -
- --Precedence can be made into an inherent feature of expressions: -
-
- oper
- Prec : PType = Ints 2 ;
- TermPrec : Type = {s : Str ; p : Prec} ;
-
- mkPrec : Prec -> Str -> TermPrec = \p,s -> {s = s ; p = p} ;
-
- lincat
- Exp = TermPrec ;
-
-
-Notice Ints 2: a parameter type, whose values are the integers
-0,1,2.
-
-Using precedence levels: compare the inherent precedence of an -expression with the expected precedence. -
--This idea is encoded in the operation -
-
- oper usePrec : TermPrec -> Prec -> Str = \x,p ->
- case lessPrec x.p p of {
- True => "(" x.s ")" ;
- False => x.s
- } ;
-
-
-(We use lessPrec from lib/prelude/Formal.)
-
- -
- --We can define left-associative infix expressions: -
-- infixl : Prec -> Str -> (_,_ : TermPrec) -> TermPrec = \p,f,x,y -> - mkPrec p (usePrec x p ++ f ++ usePrec y (nextPrec p)) ; --
-Constant-like expressions (the highest level): -
-- constant : Str -> TermPrec = mkPrec 2 ; --
-All these operations can be found in lib/prelude/Formal,
-which has 5 levels.
-
-Now we can write the whole concrete syntax of Calculator compactly:
-
- concrete CalculatorC of Calculator = open Formal, Prelude in {
-
- flags lexer = codelit ; unlexer = code ; startcat = Exp ;
-
- lincat Exp = TermPrec ;
-
- lin
- EPlus = infixl 0 "+" ;
- EMinus = infixl 0 "-" ;
- ETimes = infixl 1 "*" ;
- EDiv = infixl 1 "/" ;
- EInt i = constant i.s ;
- }
-
-
-- -
- -
-1. Define non-associative and right-associative infix operations
-analogous to infixl.
-
-2. Add a constructor that puts parentheses around expressions
-to raise their precedence, but that is eliminated by a def definition.
-Test parsing with and without a pipe to pt -transform=compute.
-
- -
- --Translate arithmetic (infix) to JVM (postfix): -
-- 2 + 3 * 4 - - ===> - - iconst 2 : iconst 3 ; iconst 4 ; imul ; iadd --
-Just give linearization rules for JVM: -
-
- lin
- EPlus = postfix "iadd" ;
- EMinus = postfix "isub" ;
- ETimes = postfix "imul" ;
- EDiv = postfix "idiv" ;
- EInt i = ss ("iconst" ++ i.s) ;
- oper
- postfix : Str -> SS -> SS -> SS = \op,x,y ->
- ss (x.s ++ ";" ++ y.s ++ ";" ++ op) ;
-
-
-- -
- --A straight code programming language, with -initializations and assignments: -
-- int x = 2 + 3 ; - int y = x + 1 ; - x = x + 9 * y ; --
-We define programs by the following constructors: -
-- fun - PEmpty : Prog ; - PInit : Exp -> (Var -> Prog) -> Prog ; - PAss : Var -> Exp -> Prog -> Prog ; --
-PInit uses higher-order abstract syntax for making the
-initialized variable available in the continuation of the program.
-
-The abstract syntax tree for the above code is -
-- PInit (EPlus (EInt 2) (EInt 3)) (\x -> - PInit (EPlus (EVar x) (EInt 1)) (\y -> - PAss x (EPlus (EVar x) (ETimes (EInt 9) (EVar y))) - PEmpty)) --
-No uninitialized variables are allowed - there are no constructors for Var!
-But we do have the rule
-
- fun EVar : Var -> Exp ; --
-The rest of the grammar is just the same as for arithmetic expressions
-here. The best way to implement it is perhaps by writing a
-module that extends the expression module. The most natural start category
-of the extension is Prog.
-
- -
- --1. Define a C-like concrete syntax of the straight-code language. -
-
-2. Extend the straight-code language to expressions of type float.
-To guarantee type safety, you can define a category Typ of types, and
-make Exp and Var dependent on Typ. Basic floating point expressions
-can be formed from literal of the built-in GF type Float. The arithmetic
-operations should be made polymorphic (as here).
-
-3. Extend JVM generation to the straight-code language, using -two more instructions -
-iload x, which loads the value of the variable x
-istore x which stores a value to the variable x
--Thus the code for the example in the previous section is -
-- iconst 2 ; iconst 3 ; iadd ; istore x ; - iload x ; iconst 1 ; iadd ; istore y ; - iload x ; iconst 9 ; iload y ; imul ; iadd ; istore x ; -- -
-4. If you made the exercise of adding floating point numbers to
-the language, you can now cash out the main advantage of type checking
-for code generation: selecting type-correct JVM instructions. The floating
-point instructions are precisely the same as the integer one, except that
-the prefix is f instead of i, and that fconst takes floating
-point literals as arguments.
-
- -
- --Goals: -
-- -
- --GF grammars can be used as parts of programs written in other programming -languages, to be called host languages. -This facility is based on several components: -
-- -
- --The portable format is called PGF, "Portable Grammar Format". -
-
-This format is produced by using GF as batch compiler, with the option -make,
-from the operative system shell:
-
- % gf -make SOURCE.gf --
-PGF is the recommended format in -which final grammar products are distributed, because they -are stripped from superfluous information and can be started and applied -faster than sets of separate modules. -
--Application programmers have never any need to read or modify PGF files. -
--PGF thus plays the same role as machine code in -general-purpose programming (or bytecode in Java). -
-- -
- --The Haskell API contains (among other things) the following types and functions: -
-- readPGF :: FilePath -> IO PGF - - linearize :: PGF -> Language -> Tree -> String - parse :: PGF -> Language -> Category -> String -> [Tree] - - linearizeAll :: PGF -> Tree -> [String] - linearizeAllLang :: PGF -> Tree -> [(Language,String)] - - parseAll :: PGF -> Category -> String -> [[Tree]] - parseAllLang :: PGF -> Category -> String -> [(Language,[Tree])] - - languages :: PGF -> [Language] - categories :: PGF -> [Category] - startCat :: PGF -> Category --
-This is the only module that needs to be imported in the Haskell application.
-It is available as a part of the GF distribution, in the file
-src/PGF.hs.
-
- -
- --Let us first build a stand-alone translator, which can translate -in any multilingual grammar between any languages in the grammar. -
-- module Main where - - import PGF - import System (getArgs) - - main :: IO () - main = do - file:_ <- getArgs - gr <- readPGF file - interact (translate gr) - - translate :: PGF -> String -> String - translate gr s = case parseAllLang gr (startCat gr) s of - (lg,t:_):_ -> unlines [linearize gr l t | l <- languages gr, l /= lg] - _ -> "NO PARSE" --
-To run the translator, first compile it by -
-- % ghc -make -o trans Translator.hs --
-For this, you need the Haskell compiler GHC. -
-- -
- -
-Then produce a PGF file. For instance, the Food grammar set can be
-compiled as follows:
-
- % gf -make FoodEng.gf FoodIta.gf --
-This produces the file Food.pgf (its name comes from the abstract syntax).
-
-The Haskell library function interact makes the trans program work
-like a Unix filter, which reads from standard input and writes to standard
-output. Therefore it can be a part of a pipe and read and write files.
-The simplest way to translate is to echo input to the program:
-
- % echo "this wine is delicious" | ./trans Food.pgf - questo vino è delizioso --
-The result is given in all languages except the input language. -
-- -
- -
-To avoid starting the translator over and over again:
-change interact in the main function to loop, defined as
-follows:
-
- loop :: (String -> String) -> IO () - loop trans = do - s <- getLine - if s == "quit" then putStrLn "bye" else do - putStrLn $ trans s - loop trans --
-The loop keeps on translating line by line until the input line
-is quit.
-
- -
- --The next application is also a translator, but it adds a -transfer component - a function that transforms syntax trees. -
--The transfer function we use is one that computes a question into an answer. -
--The program accepts simple questions about arithmetic and answers -"yes" or "no" in the language in which the question was made: -
-- Is 123 prime? - No. - 77 est impair ? - Oui. --
-We change the pure translator by giving
-the translate function the transfer as an extra argument:
-
- translate :: (Tree -> Tree) -> PGF -> String -> String --
-Ordinary translation as a special case where
-transfer is the identity function (id in Haskell).
-
-To reply in the same language as the question: -
-- translate tr gr = case parseAllLang gr (startCat gr) s of - (lg,t:_):_ -> linearize gr lg (tr t) - _ -> "NO PARSE" -- -
- -
- --Input: abstract syntax judgements -
-
- abstract Query = {
-
- flags startcat=Question ;
-
- cat
- Answer ; Question ; Object ;
-
- fun
- Even : Object -> Question ;
- Odd : Object -> Question ;
- Prime : Object -> Question ;
- Number : Int -> Object ;
-
- Yes : Answer ;
- No : Answer ;
- }
-
-
-- -
- --To make it easy to define a transfer function, we export the -abstract syntax to a system of Haskell datatypes: -
-- % gf --output-format=haskell Query.pgf --
-It is also possible to produce the Haskell file together with PGF, by -
-- % gf -make --output-format=haskell QueryEng.gf --
-The result is a file named Query.hs, containing a
-module named Query.
-
- -
--Output: Haskell definitions -
-- module Query where - import PGF - - data GAnswer = - GYes - | GNo - - data GObject = GNumber GInt - - data GQuestion = - GPrime GObject - | GOdd GObject - | GEven GObject - - newtype GInt = GInt Integer --
-All type and constructor names are prefixed with a G to prevent clashes.
-
-The Haskell module name is the same as the abstract syntax name. -
-- -
- --Haskell's type checker guarantees that the functions are well-typed also with -respect to GF. -
-- answer :: GQuestion -> GAnswer - answer p = case p of - GOdd x -> test odd x - GEven x -> test even x - GPrime x -> test prime x - - value :: GObject -> Int - value e = case e of - GNumber (GInt i) -> fromInteger i - - test :: (Int -> Bool) -> GObject -> GAnswer - test f x = if f (value x) then GYes else GNo -- -
- -
- --The generated Haskell module also contains -
-
- class Gf a where
- gf :: a -> Tree
- fg :: Tree -> a
-
- instance Gf GQuestion where
- gf (GEven x1) = DTr [] (AC (CId "Even")) [gf x1]
- gf (GOdd x1) = DTr [] (AC (CId "Odd")) [gf x1]
- gf (GPrime x1) = DTr [] (AC (CId "Prime")) [gf x1]
- fg t =
- case t of
- DTr [] (AC (CId "Even")) [x1] -> GEven (fg x1)
- DTr [] (AC (CId "Odd")) [x1] -> GOdd (fg x1)
- DTr [] (AC (CId "Prime")) [x1] -> GPrime (fg x1)
- _ -> error ("no Question " ++ show t)
-
--For the programmer, it is enough to know: -
-G
-gf translates from Haskell objects to GF trees
-fg translates from GF trees to Haskell objects
-- -
- -- module TransferDef where - - import PGF (Tree) - import Query -- generated from GF - - transfer :: Tree -> Tree - transfer = gf . answer . fg - - answer :: GQuestion -> GAnswer - answer p = case p of - GOdd x -> test odd x - GEven x -> test even x - GPrime x -> test prime x - - value :: GObject -> Int - value e = case e of - GNumber (GInt i) -> fromInteger i - - test :: (Int -> Bool) -> GObject -> GAnswer - test f x = if f (value x) then GYes else GNo - - prime :: Int -> Bool - prime x = elem x primes where - primes = sieve [2 .. x] - sieve (p:xs) = p : sieve [ n | n <- xs, n `mod` p > 0 ] - sieve [] = [] -- -
- -
- -
-Here is the complete code in the Haskell file TransferLoop.hs.
-
- module Main where - - import PGF - import TransferDef (transfer) - - main :: IO () - main = do - gr <- readPGF "Query.pgf" - loop (translate transfer gr) - - loop :: (String -> String) -> IO () - loop trans = do - s <- getLine - if s == "quit" then putStrLn "bye" else do - putStrLn $ trans s - loop trans - - translate :: (Tree -> Tree) -> PGF -> String -> String - translate tr gr s = case parseAllLang gr (startCat gr) s of - (lg,t:_):_ -> linearize gr lg (tr t) - _ -> "NO PARSE" -- -
- -
- -
-To automate the production of the system, we write a Makefile as follows:
-
- all: - gf -make --output-format=haskell QueryEng - ghc --make -o ./math TransferLoop.hs - strip math --
-(The empty segments starting the command lines in a Makefile must be tabs.) -Now we can compile the whole system by just typing -
-- make --
-Then you can run it by typing -
-- ./math --
-Just to summarize, the source of the application consists of the following files: -
-- Makefile -- a makefile - Math.gf -- abstract syntax - Math???.gf -- concrete syntaxes - TransferDef.hs -- definition of question-to-answer function - TransferLoop.hs -- Haskell Main module -- -
- -
- -
-PGF files can be used in web servers, for which there is a Haskell library included
-in src/server/. How to build a server for tasks like translators is explained
-in the README file in that directory.
-
-One of the servers that can be readily built with the library (without any
-programming required) is fridge poetry magnets. It is an application that
-uses an incremental parser to suggest grammatically correct next words. Here
-is an example of its application to the Foods grammars.
-
-
-
- -
- --JavaScript is a programming language that has interpreters built in in most -web browsers. It is therefore usable for client side web programs, which can even -be run without access to the internet. The following figure shows a JavaScript -program compiled from GF grammars as run on an iPhone. -
-
-
-
- -
- -
-JavaScript is one of the output formats of the GF batch compiler. Thus the following
-command generates a JavaScript file from two Food grammars.
-
- % gf -make --output-format=js FoodEng.gf FoodIta.gf --
-The name of the generated file is Food.js, derived from the top-most abstract
-syntax name. This file contains the multilingual grammar as a JavaScript object.
-
- -
- -
-To perform parsing and linearization, the run-time library
-gflib.js is used. It is included in GF/lib/javascript/, together with
-some other JavaScript and HTML files; these files can be used
-as templates for building applications.
-
-An example of usage is
-translator.html,
-which is in fact initialized with
-a pointer to the Food grammar, so that it provides translation between the English
-and Italian grammars:
-
-
-
-The grammar must have the name grammar.js. The abstract syntax and start
-category names in translator.html must match the ones in the grammar.
-With these changes, the translator works for any multilingual grammar.
-
- -
- --The standard way of using GF in speech recognition is by building -grammar-based language models. -
--GF supports several formats, including -GSL, the formatused in the Nuance speech recognizer. -
-
-GSL is produced from GF by running gf with the flag
---output-format=gsl.
-
-Example: GSL generated from FoodsEng.gf.
-
- % gf -make --output-format=gsl FoodsEng.gf
- % more FoodsEng.gsl
-
- ;GSL2.0
- ; Nuance speech recognition grammar for FoodsEng
- ; Generated by GF
-
- .MAIN Phrase_cat
-
- Item_1 [("that" Kind_1) ("this" Kind_1)]
- Item_2 [("these" Kind_2) ("those" Kind_2)]
- Item_cat [Item_1 Item_2]
- Kind_1 ["cheese" "fish" "pizza" (Quality_1 Kind_1)
- "wine"]
- Kind_2 ["cheeses" "fish" "pizzas"
- (Quality_1 Kind_2) "wines"]
- Kind_cat [Kind_1 Kind_2]
- Phrase_1 [(Item_1 "is" Quality_1)
- (Item_2 "are" Quality_1)]
- Phrase_cat Phrase_1
-
- Quality_1 ["boring" "delicious" "expensive"
- "fresh" "italian" ("very" Quality_1) "warm"]
- Quality_cat Quality_1
-
-
-- -
- -
-Other formats available via the --output-format flag include:
-
| Format | -Description | -|
|---|---|---|
gsl |
-Nuance GSL speech recognition grammar | -|
jsgf |
-Java Speech Grammar Format (JSGF) | -|
jsgf_sisr_old |
-JSGF with semantic tags in SISR WD 20030401 format | -|
srgs_abnf |
-SRGS ABNF format | -|
srgs_xml |
-SRGS XML format | -|
srgs_xml_prob |
-SRGS XML format, with weights | -|
slf |
-finite automaton in the HTK SLF format | -|
slf_sub |
-finite automaton with sub-automata in HTK SLF | -|
-All currently available formats can be seen with gf --help.
-