diff --git a/doc/Makefile b/doc/Makefile new file mode 100644 index 000000000..771212515 --- /dev/null +++ b/doc/Makefile @@ -0,0 +1,4 @@ +all: + txt2tags gslt-sem-2006.txt + htmls gslt-sem-2006.html + diff --git a/doc/gf-resource.txt b/doc/gf-resource.txt new file mode 100644 index 000000000..1b277691a --- /dev/null +++ b/doc/gf-resource.txt @@ -0,0 +1,1048 @@ +GF Resource Grammar Library +Author: Aarne Ranta +Last update: %%date(%c) + +% NOTE: this is a txt2tags file. +% Create an html file from this file using: +% txt2tags --toc gf-resource.txt + +%!target:html + +%!postproc(html): #NEW + + +#NEW +==GF = Grammatical Framework== + +GF is a grammar formalism based on functional programming and type theory. + + + +GF was designed to be nice for //ordinary programmers// to use: by this +we mean programmers without training in linguistics. + + + +The mission of GF is to make natural-language applications available for +ordinary programmers, in tasks like + +- software documentation +- domain-specific translation +- human-computer interaction +- dialogue systems + +Thus GF is //not// primarily another theoretical framework for +linguists. + + + +#NEW +==Multilingual grammars== + +A GF grammar consists of an abstract syntax and a set +of concrete syntaxes. + + + +**Abstract syntax**: language-independent representation +``` + cat Prop ; Nat ; + fun Even : Nat -> Prop ; + fun NInt : Int -> Nat ; +``` +**Concrete syntax**: mapping from abstract syntax trees to strings in a language +(English, French, German, Swedish,...) +``` + lin Even x = {s = x.s ++ "is" ++ "even"} ; + + lin Even x = {s = x.s ++ "est" ++ "pair"} ; + + lin Even x = {s = x.s ++ "ist" ++ "gerade"} ; + + lin Even x = {s = x.s ++ "är" ++ "jämnt"} ; +``` +We can **translate** between languages via the abstract syntax: +``` + 4 is even 4 ist gerade + \ / + Even (NInt 4) + / \ + 4 est pair 4 är jämnt +``` + + + +But is it really so simple? 
+ + +#NEW +==Difficulties with concrete syntax== + +Most languages have rules of **inflection**, **agreement**, +and **word order**, which have to be obeyed when putting together +expressions. + + + +The previous multilingual grammar breaks these rules in many situations: +// +2 and 3 is even +la somme de 3 et de 5 est pair +wenn 2 ist gerade, dann 2+2 ist gerade +om 2 är jämnt, 2+2 är jämnt +// +All these sentences are grammatically incorrect. + + + +#NEW +==Solving the difficulties== + +GF has tools for expressing the linguistic rules that are needed to +produce correct translations in different languages. + + + +Instead of just strings, we need **parameters**, **tables**, +and **record types**. For instance, French: +``` + param Mod = Ind | Subj ; + param Gen = Masc | Fem ; + + lincat Nat = {s : Str ; g : Gen} ; + lincat Prop = {s : Mod => Str} ; + + lin Even x = {s = + table { + m => x.s ++ + case m of {Ind => "est" ; Subj => "soit"} ++ + case x.g of {Masc => "pair" ; Fem => "paire"} + } + } ; +``` +To learn more about these constructs, consult GF documentation, e.g. the +[../../../doc/tutorial/gf-tutorial2.html New Grammarian's Tutorial]. +However, in what follows we will show how to avoid learning them and +still write linguistically correct grammars. + + +#NEW +==Language + Libraries== + +Writing natural language grammars still requires +theoretical knowledge about the language. + + + +Which kind of programmer is easier to find? + +- one who can write a sorting algorithm +- one who can write a grammar for Swedish determiners + + + + +In mainstream programming, sorting algorithms are not +written by hand but taken from **libraries**. + + + +In the same way, we want to create grammar libraries that encapsulate +basic linguistic facts. + + + +Cf. the Java success story: the language is just one half of the +success; the libraries are the other half. 
+ + + +#NEW +==Example of library-based grammar writing== + +To define a Swedish expression of a mathematical predicate from scratch: +``` + Even x = + let jämn = case <x.n,x.g> of { + <Sg,Utr> => "jämn" ; + <Sg,Neutr> => "jämnt" ; + <Pl,_> => "jämna" + } + in + {s = table { + Main => x.s ! Nom ++ "är" ++ jämn ; + Inv => "är" ++ x.s ! Nom ++ jämn ; + Sub => x.s ! Nom ++ "är" ++ jämn + } + } +``` +To use library functions for syntax and morphology: +``` + Even = predA (regA "jämn") ; +``` +For the French version, we write +``` + Even = predA (regA "pair") ; +``` + + + +#NEW +==Questions in grammar library design== + +What should there be in the library? + +- morphology, lexicon, syntax, semantics,... + + + +How do we organize and present the library? + +- division into modules, level of granularity + +- "school grammar" vs. sophisticated linguistic concepts + + + +Where do we get the data from? + +- automatic extraction or hand-writing? + +- reuse of existing resources? + +Extra constraint: we want open-source free software and +hence cannot use existing proprietary resources. + + +#NEW +==Answers to questions in grammar library design== + +The current GF resource grammar library has +made the following decisions: + +The library has, for each language + +- complete morphology, some lexicon (500 words), representative fragment of syntax, +very little semantics, + + + +Organization and presentation: + +- division into top-level (API) modules, and internal modules (only +interesting for resource implementors) + +- the API is, as much as possible, common in different languages + +- we favour "school grammar" concepts rather than innovative linguistic theory + + + +Where do we get the data from? 
+ +- morphology and syntax are hand-written + +- the 500-word lexicon is hand-written, but a tool is provided + for automatic lexicon extraction + +- we have not reused existing resources + +The resource grammar library is entirely +open-source free software (under the GNU GPL). + + + + + +#NEW +==The scope of a resource grammar library for a language== + +All morphological paradigms + + + +Basic lexicon of structural, common, and irregular words + + + +Basic syntactic structures + + + +Currently, +- //no// semantics, +- //no// language-specific structures if not necessary for expressivity. + + + + + +#NEW +==Success criteria== + +Grammatical correctness + + + +Semantic coverage: you can express whatever you want. + + + +Usability as a library for non-linguists. + + + +(Bonus for linguists:) nice generalizations w.r.t. language +families, using the module system of GF. + + + +#NEW +==These are not our success criteria== + +Language coverage: to be able to parse all expressions. + +Example: +the French //passé simple// tense, although covered by the +morphology, is not used in the language-independent API, but +only the //passé composé// is. However, an application +accessing the French-specific (or Romance-specific) +modules can use the passé simple. + + + +Semantic correctness: only to produce meaningful expressions. + +Example: the following sentences can be generated +``` + colourless green ideas sleep furiously + + the time is seventy past forty-two +``` +However, an application grammar can use a domain-specific +semantics to guarantee semantic well-formedness. + + + +(Warning for linguists:) theoretical innovation in +syntax is not among the goals +(and it would be hidden from users anyway!). + + + +#NEW +==So where is semantics?== + +GF incorporates a **Logical Framework** and is therefore +capable of expressing logical semantics //à la// Montague +or any other flavour, including anaphora and discourse. 
+ + + +But we do //not// try to give semantics once and +for all for the whole language. + + + +Instead, we expect semantics to be given in +**application grammars** built on semantic models +of different domains. + + + +Example application: number theory +``` + fun Even : Nat -> Prop ; -- a mathematical predicate + + lin Even = predA (regA "even") ; -- English translation + lin Even = predA (regA "pair") ; -- French translation + lin Even = predA (regA "jämn") ; -- Swedish translation +``` +How could the resource predict that just //these// +translations are correct in this domain? + + + +Application grammars are built by experts of these domains +who - thanks to resource grammars - no longer need to be +experts in linguistics. + + + + + + + +#NEW +==Languages== + +The current GF Resource Project covers ten languages: + +-``Dan``ish +-``Eng``lish +-``Fin``nish +-``Fre``nch +-``Ger``man +-``Ita``lian +-``Nor``wegian +-``Rus``sian +-``Spa``nish +-``Swe``dish + +The first three letters (``Dan`` etc.) are used in grammar module names. + + + +#NEW +==Library structure 1: language-independent API== + + +- ``Lang`` is the top module collecting all of the following. + + + +- syntactic ``Categories`` (parts of speech, word classes), e.g. +``` + V ; NP ; CN ; Det ; -- verb, noun phrase, common noun, determiner +``` +- ``Rules`` for combining words and phrases, e.g. +``` + DetNP : Det -> CN -> NP ; -- combine Det and CN into NP +``` +- the most common ``Structural`` words (determiners, +conjunctions, pronouns) (now 83), e.g. +``` + and_Conj : Conj ; +``` +- ``Numerals``, number words from 1 to 999,999 with their +inflections, e.g. +``` + n8 : Digit ; +``` +- ``Basic`` lexicon of (now 218) frequent everyday words +``` + man_N : N ; +``` + + + +In addition, and not included in ``Lang``, there is +- ``SwadeshLex``, lexicon of (now 206) words from the +[http://en.wiktionary.org/wiki/Swadesh_List Swadesh list], e.g. 
+``` + squeeze_V : V ; +``` +Of course, there is some overlap between ``SwadeshLex`` and the other modules. + + +#NEW +==Library structure 2: language-dependent modules== + +- morphological ``Paradigms``, e.g. Swedish +``` + mkN : Str -> Str -> Str -> Str -> Gender -> N ; -- worst-case nouns + mkN : Str -> N ; -- regular nouns +``` +- (in some languages) irregular ``Verbs``, e.g. +``` + angripa_V = irregV "angripa" "angrep" "angripit" ; +``` +- (not yet available) ``Ext``ended syntax with language-specific rules +``` + PassBli : V2 -> NP -> VP ; -- bli överkörd av ngn +``` + + + +#NEW +==How much can be language-independent?== + +For the ten languages we have considered, it //is// possible +to implement the current API. + + + +Reservations: + +- this does not necessarily extend to all other languages +- this does not necessarily cover the most idiomatic expressions + of each language +- this may not be the easiest API to implement (e.g. negation and +inversion with //do// in English suggest that some other +structure would be more natural) +- it is not guaranteed that the same structure has the same semantics +in all the different languages + + + +#NEW +==Library structure: language-independent API== + +%#center + [Lang.gif] +%#center + + +#NEW +==API documentation== + +[Categories.html Categories] + + +[Rules.html Rules] + + +Two alternative views on sentence formation by predication: +[Clause.html Clause], +[Verbphrase.html Verbphrase] + + +[Structural.html Structural] + + + +[Time.html Time] + + +[Basic.html Basic] + + + +[Lang.html Lang] + + + +See also [../../resource-1.0/doc/gfdoc resource v 1.0 documentation], +now implemented for English, German, and Swedish. 
+ + + +#NEW +==Paradigms documentation== + +[ParadigmsEng.html English paradigms] + +[BasicEng.html example use of English paradigms] + +[VerbsEng.html English verbs] + + + +[ParadigmsFin.html Finnish paradigms] + +[BasicFin.html example use of Finnish paradigms] + + + +[ParadigmsFre.html French paradigms] + +[BasicFre.html example use of French paradigms] + +[VerbsFre.html French verbs] + + + +[ParadigmsIta.html Italian paradigms] + +[BasicIta.html example use of Italian paradigms] + +[BeschIta.html Italian verb conjugations] + + + +[ParadigmsNor.html Norwegian paradigms] + +[BasicNor.html example use of Norwegian paradigms] + +[VerbsNor.html Norwegian verbs] + + +[ParadigmsSpa.html Spanish paradigms] + +[BasicSpa.html example use of Spanish paradigms] + +[BeschSpa.html Spanish verb conjugations] + + +[ParadigmsSwe.html Swedish paradigms] + +[BasicSwe.html example use of Swedish paradigms] + +[VerbsSwe.html Swedish verbs] + + + +#NEW +==Use as top-level grammar: testing== + +Import a set of ``LangX`` grammars: +``` + i english/LangEng.gf + i swedish/LangSwe.gf +``` +Alternatively, you can ``make`` a precompiled package of +all the languages by using ``lib/resource/Makefile``: +``` + make + gf langs.gfcm +``` +Then you can test with translation, random generation, morphological analysis... +``` + > p -lang=LangEng "I have loved her." | l -lang=LangFre + Je l' ai aimée. + + > gr -cat=NP | l -multi + The sock + Strumpan + Strømpen + La media + La calza + La chaussette + Sukka +``` + + +#NEW +==Use as top-level grammar: language learning quizzes== + +Morpho quiz with words (e.g. French verbs): +``` + i french/VerbsFre.gf + mq -cat=V +``` +Morpho quiz with phrases (e.g. Swedish clauses): +``` + i swedish/LangSwe.gf + mq -cat=Cl +``` +Translation quiz with sentences (e.g. 
sentences from English to Swedish): +``` + i english/LangEng.gf + i swedish/LangSwe.gf + tq -cat=S LangEng LangSwe +``` + + + + +#NEW +==Use as library== + +Import directly by ``open``: +``` + concrete AppNor of App = open LangNor, ParadigmsNor in {...} +``` +(Note for the users of GF 2.1 and older: +the dummy ``reuse`` modules and their bulky ``.gfr`` versions +are no longer needed!) + + + +If you need to convert resource records to strings, and don't want to know +the concrete type (as you never should), you can use +``` + Predef.toStr : (L : Type) -> L -> Str ; +``` +``L`` must be a linearization type. For instance, +``` + toStr LangNor.CN (ModAP (PositADeg old_ADeg) (UseN car_N)) + ---> "gammel bil" +``` + + + + +#NEW +==Use as library through parser== + +You can use the parser with a ``LangX`` grammar +when developing a resource. + + + +Using the ``-v`` option shows if the parser fails because +of unknown words. +``` + > p -cat=S -v -lexer=words "jag ska åka till Chalmers" + unknown tokens [TS "åka",TS "Chalmers"] +``` +Then try to select words that ``LangX`` recognizes: +``` + > p -cat=S "jag ska springa till Danmark" + UseCl (PosTP TFuture ASimul) + (AdvCl (SPredV i_NP run_V) + (AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark))))) +``` +Use these API structures and extend the vocabulary to match your needs. +``` + åka_V = lexV "åker" ; + Chalmers = regPN "Chalmers" neutrum ; +``` + +#NEW +==Syntax editor as library browser== + +You can run the syntax editor on ``LangX`` to +find resource API functions through context-sensitive menus. +For instance, the shell command +``` + gfeditor LangEng.gf LangFre.gf +``` +opens the editor with English and French views. The +[http://www.cs.chalmers.se/%7Eaarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm +Editor User Manual] gives more information on the use of the editor. + + + +A restriction of the editor is that it does not give access to +``ParadigmsX`` modules. 
An IDE extending the editor +into a grammar programming tool is work in progress. + + + + +#NEW +==Example application: a small translation system== + +In this system, you can express questions and answers of +the following forms: +``` + Who chases mice ? + Whom does the lion chase ? + The dog chases cats. +``` +We build the abstract syntax in two phases: + +- [example/Questions.gf>Questions] defines question and + answer forms independently of domain +- [example/Animals.gf>Animals] defines a lexicon with + animals and things that animals do. + + + + +The concrete syntax of English is built in three phases: + +- [example/HandQuestionsI.gf QuestionsI] is a parametrized module + using the API module ``Resource``. +- [example/QuestionsEng.gf QuestionsEng] is an instantiation + of the API with ``ResourceEng``. +- [example/AnimalsEng.gf AnimalsEng] is a concrete syntax + of ``Animals`` using ``ParadigmsEng`` and ``VerbsEng``. + + + + +The concrete syntax of Swedish is built upon ``QuestionsI`` +in a similar way, with the modules +[example/QuestionsSwe.gf QuestionsSwe] and +[example/AnimalsSwe.gf AnimalsSwe]. + + + +The concrete syntax of French consists similarly of the modules +[example/QuestionsFre.gf QuestionsFre] and +[example/AnimalsFre.gf AnimalsFre]. + + + + +#NEW +==Compiling the example application== + +The resources are bulky, and it therefore takes a lot of +time and memory to load the grammars. However, they can be +compiled into the ``gfcm`` +(**GF canonical multilingual**) format, +which is almost one thousand times smaller and faster to load +for this set of grammars. + + + +To produce an end-user multilingual grammar ``animals.gfcm``, +write the sequence of compilation commands in a ``gfs`` (**GF script**) +file, say +[example/mkAnimals.gfs ``mkAnimals.gfs``], +and then call GF with +``` + gf <mkAnimals.gfs +``` +To try out the grammar, +``` + > i animals.gfcm + + > gr | l -multi + vem jagar hundar ? + qui chasse des chiens ? 
+ who chases dogs ? +``` + + +#NEW + +==Grammar writing by examples== + +(New in GF 2.3) + + + +You can use the resource grammar as a parser on a special file format, +``.gfe`` ("GF examples"). Here is the real source, +[example/QuestionsI.gfe QuestionsI.gfe], which +generated +[example/QuestionsI.gf QuestionsI.gf] +when you executed the GF command +``` + i -ex AnimalsEng.gf +``` +Since ``QuestionsI`` is an incomplete module ("functor"), +it need only be built once. This is why only the first +command in ``mkAnimals.gfs`` needs the flag ``-ex``. + + + +Of course, the grammar of any language can be created by +parsing any language, as long as they have a common resource API. +The use of the English resource is generally recommended, because it +is smaller and faster to parse than the other languages. + + +#NEW +==Constants and variables in examples== + +The file [example/QuestionsI.gfe QuestionsI.gfe] uses +as resource ``LangEng``, which contains all resource syntax and +a lexicon of ca. 300 words. A linearization rule, such as +``` + lin Who love_V2 man_N = in Phr "who loves men ?" ; +``` +uses, as argument variables, constants for words that can be found in +the lexicon. It is due to this that the example can be parsed. +When the resulting rule, +``` + lin Who love_V2 man_N = + QuestPhrase (UseQCl (PosTP TPresent ASimul) + (QPredV2 who8one_IP love_V2 (IndefNumNP NoNum (UseN man_N)))) ; +``` +is read by the GF compiler, the identifiers ``love_V2`` and +``man_N`` are not treated as constants, but, following +the normal binding rules of functional languages, as bound variables. +This is what gives the example method the generality that is needed. + + + +To write linearization rules by examples one thus has to know at +least one abstract syntax constant for each category for which +one needs a variable. 
+ + + +#NEW +==Extending the lexicon on the fly== + +The greatest limitation of the example method is that the lexicon +may lack many of the words that are needed in examples. If parsing +fails because of this, the compiler gives a list of unknown words +in its error message. An obvious solution is, +of course, to extend the resource lexicon and try again. +A more light-weight solution is to add a **substitution** to +the example. For instance, if you want the example "the pope" +but the lexicon does not have the word "pope", you can write +``` + lin Pope = in NP "the man" {man_N = regN "pope"} ; +``` +The resulting linearization rule is initially +``` + lin Pope = DefOneNP (UseN man_N) ; +``` +but the substitution changes this to +``` + lin Pope = DefOneNP (UseN (regN "pope")) ; +``` +In this way, you do not have to extend the resource lexicon, but you +need to open the Paradigms module to compile the resulting term. + + + +Of course, the substituted expressions may come from another language +than the main language of the example: +``` + lin Pope = in NP "the man" {man_N = regN "pape" masculine} ; +``` +If several substitutions are needed, semicolons are used as separators: +``` + {man_N = regN "pope" ; walk_V = regV "pray"} ; +``` + + +#NEW +==Implementation details: low-level files== + +**For developers of resource grammars.** +The modules listed in this section should never be imported in application +grammars. + + + +Each of the API implementations uses the following auxiliary resource modules: + +- ``Types``, the morphological paradigms and word classes +- ``Morpho``, inflection machinery +- ``Syntax``, complex categories and their combinations + +In addition, the following language-independent modules from ``lib/prelude`` +are used. 
+ +- ``Predef``, operations whose definitions are hard-coded in GF +- ``Prelude``, generic string and boolean operations +- ``Coordination``, coordination structures for arbitrary categories + + + +#NEW +==Implementation details: the structure of low-level files== + +%#center + [Low.gif] +%#center + + +#NEW +==How to change a resource grammar?== + +In many cases, the source of a bug is in one of +the low-level modules. Try to trace it back there +by starting from the high-level module. + + + +(Much more to be written...) + + +#NEW +==How to write a resource grammar?== + +Start with a more limited goal, e.g. to implement +the ``stoneage`` grammar (``examples/stoneage``) +for your language. + + + +For this, you need + +- most of ``Types`` +- most of ``Morpho`` +- some of ``Syntax`` +- most of ``Paradigms`` + + + + +A useful command to test ``oper``s: +``` + i -retain MorphoRot.gf + cc regNoun "foo" +``` + + + +See also [../../resource-1.0/doc/Resource-HOWTO.html Resource-HOWTO] +(under construction). + + +#NEW +==The use of parametrized modules== + +In two language families, a lot of code is shared. +- Romance: French, Italian, Spanish +- Scandinavian: Danish, Norwegian, Swedish + + +The structure looks like this. 
+ + [] + + +#NEW +==Current status== + + | Language | v0.6 | v0.9 | v1.0 | Paradigms | Lexicon | Verbs | + | Danish | - | X | - | - | - | - + | English | X | X | X | X | X | X + | Finnish | X | + | - | X | X | 0 + | French | X | X | X | X | X | X + | German | X | - | X | X | - | - + | Italian | X | X | - | X | X | X + | Norwegian | - | X | X | X | X | X + | Russian | X | X | - | * | - | - + | Spanish | - | X | - | X | X | X + | Swedish | X | X | X | X | X | X + +X = implemented (few exceptions may occur) + ++ = implemented for a large part + +* = linguistic material ready for implementation + +- = not implemented + +0 = not applicable + + +#NEW +==Known bugs and limitations== + +(//The listed limitations are ones that do not follow from the table on +the previous page//.) + +Danish + +English: +missing uncontracted negations. + +Finnish: +compiling the heuristic paradigms is slow; +possessive and interrogative suffixes have no proper lexer. + +French: +no inverted questions; +some verbs in Basic should be reflexive + +German + +Italian: +no omission of unstressed subject pronouns; +some verbs in Basic should be reflexive; +bad forms of reflexive infinitives + +Norwegian: +possessives of type //bilen min// not included + +Russian + +Spanish: +no omission of unstressed subject pronouns; +no switch to dative case for human objects; +some verbs in Basic should be reflexive; +bad forms of reflexive infinitives; +spurious parameter for verb auxiliary inherited from Romance + + +Swedish: + + + +#NEW +==Obtaining it== + +Get the grammar package from +[http://sourceforge.net/project/showfiles.php?group_id=132285 +GF Download Page]. The current libraries are in +``lib/resource``. Version 0.6 is in +``lib/resource-0.6``. + + + +The very very latest version of GF and its libraries is in +[http://www.cs.chalmers.se/~bringert/gf/downloads/snapshots/ Snapshots]. 
+ diff --git a/doc/gslt-sem-2006.txt b/doc/gslt-sem-2006.txt new file mode 100644 index 000000000..a98959b18 --- /dev/null +++ b/doc/gslt-sem-2006.txt @@ -0,0 +1,312 @@ +Grammars as Software Libraries +Author: Aarne Ranta +Last update: %%date(%c) + +% NOTE: this is a txt2tags file. +% Create an html file from this file using: +% txt2tags --toc gslt-sem-2006.txt + +%!target:html + +%!postproc(html): #NEW + +#NEW + +==Software Libraries== + +The main device of **division of labour** in programming. + +Instead of writing a sorting algorithm over and over again, +the programmers take it from a library. You write (in Haskell), +``` + Data.List.sort xs +``` +instead of a lot of code actually implementing sorting. + +Practical advantages: +- division of labour +- faster development of new software + + +#NEW + +==Abstraction== + +Libraries promote **abstraction**: you abstract away from details. + +The use of libraries is therefore a good programming style. + +It is also **scientifically interesting** to create libraries: +you have to think about abstractions on your domain of expertise. + +Notice: libraries can bring abstraction to almost any language, +if it just has a support for functions or macros. + + +#NEW + +==Grammars as libraries?== + +Example: we want to create a GUI (Graphical User Interface) button +that says //yes//, and **localize** it to different languages: +``` + Yes Ja Kyllä Oui Ja Sì +``` +Possible ways to do this: ++ Go around dictionaries to find the word in different languages +``` + yesButton english = button "Yes" + yesButton swedish = button "Ja" + yesButton finnish = button "Kyllä" +``` ++ Hire more programmers to perform localization in different languages ++ Use a library ``GUIText`` such that you can write +``` + yesButton lang = button (render lang GUIText.Yes) +``` + + + +#NEW + +==A slightly more advanced example== + +This is what you often see as a feedback from a program: +``` + You have 1 messages. 
+``` +Or perhaps with a little more thought: +``` + You have 1 message(s). +``` +The code that should be written is of course +``` + mess n = "You have" +++ show n +++ messages ++ "." + where + messages = if n==1 then "message" else "messages" +``` +(E.g. VoiceXML gives good support for this.) + + +#NEW + +==Problems with the more advanced example== + +The same as with "Yes": you have to know the words "you", +"have", "message". + +//Moreover//, you have to know the inflection of the equivalent +of "message": +``` + if n==1 then "meddelande" else "meddelanden" +``` +//Moreover//, you have to know the congruence with different numbers +(e.g. Russian, Arabic): +``` + if n==1 then "m" else + if n==2 then "mein" else "moun" +``` +You also have to know the case required by the verb "have" +(e.g. Finnish: nominative in singular, partitive in plural). + +//Moreover//, you have to know what is the proper way to politely +address the user: +``` + Du har 3 meddelanden / Ni har 3 meddelanden + Vous avez 3 messages / Tu as 3 messages +``` +(This can also depend on country and the kind of program.) + + +#NEW + +==A library-based solution== + +In analogy with the "Yes" case, you write +``` + mess lang n = render lang (MailText.YouHaveMessages n) +``` +Hmm, is this so smart? What about if you want to say +``` + You have 4 documents. + You have 5 jewels. + I have 7 surprises. +``` +It is time to move from **canned text** to a **grammar**. 
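The canned-text approach above can be sketched in Haskell as follows. This is only a minimal illustration: the ``MailText`` type, the two-language ``Language`` type and the helper ``plural`` are assumptions made for this sketch, not the API of an existing library. The Swedish word forms are the ones quoted earlier.

```haskell
-- Hypothetical canned-text library; all names are assumptions for this sketch.
data Language = English | Swedish

-- One constructor per canned message.
newtype MailText = YouHaveMessages Int

-- Pick the singular form for exactly one item, the plural otherwise.
plural :: Int -> String -> String -> String
plural n sg pl = if n == 1 then sg else pl

-- Each language needs its own hand-written line for each message.
render :: Language -> MailText -> String
render English (YouHaveMessages n) =
  "You have " ++ show n ++ " " ++ plural n "message" "messages" ++ "."
render Swedish (YouHaveMessages n) =
  "Du har " ++ show n ++ " " ++ plural n "meddelande" "meddelanden" ++ "."

-- The application-level function from the text.
mess :: Language -> Int -> String
mess lang n = render lang (YouHaveMessages n)
```

Every further message (documents, jewels, surprises) would need a new constructor and a new line per language, which is the limitation of canned text noted above.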
+ + + +#NEW + +==An improved library-based solution== + +You may want to write +``` + mess lang n = render lang (Have PolYou (Num n Message)) + sword lang n = render lang (Have FamYou (Num n Sword)) + surpr lang n = render lang (Have I (Num n Surprise)) +``` +For this purpose, you need a library with the following API +(Application Programmer's Interface): +``` + Have : NounPhrase -> NounPhrase -> Sentence + + PolYou, FamYou, I : NounPhrase + + Num : Int -> Noun -> NounPhrase + + Message, Sword, Surprise : Noun +``` +You also need a top-level rendering function +``` + render : Language -> Sentence -> String +``` + + +#NEW + +==An optimal solution?== + +The library API for language will certainly grow big and become +difficult to use. Why couldn't I just write +``` + mess lang n = render lang (parse english "you have n messages") +``` +To this end, the API should provide the top-level function +``` + parse : Language -> String -> Sentence +``` +The library that we will present actually has this as well! + +The only complication is that ``parse`` does not always return +just one sentence. There may be zero: +``` + you have n mesaggse +``` +or many: +``` + Have PolYou (Num n Message) + Have FamYou (Num n Message) + Have PlurYou (Num n Message) +``` + + +#NEW + +==The components of a grammar library== + +The library has **construction functions** like +``` + Have : NounPhrase -> NounPhrase -> Sentence + PolYou : NounPhrase +``` +These functions build **grammatical structures**, which +can have different realizations in different languages. + +Therefore we also need **realization functions**, +``` + render : Language -> Sentence -> String + parse : Language -> String -> [Sentence] +``` +Both of them require major linguistic expertise to write - but, +once this is done, they can be used with very little linguistic +knowledge by application programmers! 
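The construction and realization functions above can be modelled in Haskell as a data type plus a function over it. The following is a minimal sketch, assuming English only: the noun table, the agreement rule for "have" and the helper names ``nounForms``, ``renderNP`` and ``capitalize`` are assumptions for illustration, and ``parse`` is left out.

```haskell
import Data.Char (toUpper)

-- Toy model of the API above; English is the only language in this sketch.
data Language = English

data Noun = Message | Sword | Surprise

-- Construction functions become data constructors.
data NounPhrase = PolYou | FamYou | I | Num Int Noun

data Sentence = Have NounPhrase NounPhrase

-- Singular and plural form of each noun (hypothetical mini-lexicon).
nounForms :: Noun -> (String, String)
nounForms Message  = ("message", "messages")
nounForms Sword    = ("sword", "swords")
nounForms Surprise = ("surprise", "surprises")

-- Realize a noun phrase; a numeral selects singular or plural.
renderNP :: Language -> NounPhrase -> String
renderNP English PolYou = "you"
renderNP English FamYou = "you"
renderNP English I      = "I"
renderNP English (Num n noun) = show n ++ " " ++ (if n == 1 then sg else pl)
  where (sg, pl) = nounForms noun

-- Upper-case the first character of a sentence.
capitalize :: String -> String
capitalize []       = []
capitalize (c : cs) = toUpper c : cs

-- Realization function: the verb agrees with the subject.
render :: Language -> Sentence -> String
render English (Have subj obj) =
  capitalize (unwords [renderNP English subj, verb, renderNP English obj] ++ ".")
  where
    verb = case subj of
      Num 1 _ -> "has"   -- third person singular subject
      _       -> "have"
```

In GHCi, ``render English (Have PolYou (Num 1 Message))`` evaluates to ``"You have 1 message."``. The polite and familiar forms coincide in English; a language that distinguishes them would add its own equations without changing the application code.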
+ + +#NEW + +==Implementing a grammar library in GF== + +GF = Grammatical Framework + +Those who know GF have already seen the introduction as a +seduction argument for GF. + +In GF, +- construction functions = **abstract syntax** +- realization functions = **concrete syntax** + + +Example: +``` + abstract GUIText = { + cat Text ; + fun Yes : Text ; + } + concrete GUITextEng of GUIText = { + lin Yes = ss "yes" ; + } + concrete GUITextFin of GUIText = { + lin Yes = ss "kyllä" ; + } +``` + + +#NEW + +==Linearization and parsing== + +The realization function is, for each language, implemented by +**linearization rules** (``lin``). + +The linearization rules directly give the ``render`` method: +``` + render english x = GUITextEng.lin x +``` +The GF formalism moreover has the property of **reversibility**: +a set of linearization rules automatically generates a parser as +well. + +While reversibility is of minor importance for the applications +shown above, it is crucial for other applications of GF grammars. + + +#NEW + +==Applying GF== + +**multilingual grammar** = abstract syntax + concrete syntaxes + +Early instances of the idea (from 1998) - **application grammars**: +- multilingual authoring +- domain-specific translation +- dialogue systems + + +Later development (from 2001) - **resource grammars**: +- grammar libraries with language-independent APIs + + +Of course, one important use of resource grammars is +to help write application grammars in GF. + +In addition to GF itself, GF grammars can be accessed in +Haskell, Prolog, and Java programs. + + +#NEW + +==Domain, ontology, idiom== + +An abstract syntax can represent +- a **semantic model** +- an **ontology** + + +The concrete syntax defines how the **concepts** of the ontology +are represented in natural language (or in a formal language). + +The following requirements are made: +- linguistic correctness (inflection, agreement, word order,...) 
+- semantic correctness (express the intended concepts) +- conformance to the domain idiom (use natural phrasing) + + +Benefit: translation via a semantic model of the domain can reach high quality. + +Problem: the expertise of both a linguist and a domain expert is required. + + + + +%http://www.boost.org/ \ No newline at end of file