diff --git a/doc/Makefile b/doc/Makefile deleted file mode 100644 index 771212515..000000000 --- a/doc/Makefile +++ /dev/null @@ -1,4 +0,0 @@ -all: - txt2tags gslt-sem-2006.txt - htmls gslt-sem-2006.html - diff --git a/doc/gf-resource.txt b/doc/gf-resource.txt deleted file mode 100644 index 1b277691a..000000000 --- a/doc/gf-resource.txt +++ /dev/null @@ -1,1048 +0,0 @@ -GF Resource Grammar Library -Author: Aarne Ranta -Last update: %%date(%c) - -% NOTE: this is a txt2tags file. -% Create an html file from this file using: -% txt2tags --toc gf-resource.txt - -%!target:html - -%!postproc(html): #NEW - - -#NEW -==GF = Grammatical Framework== - -GF is a grammar formalism based on functional programming and type theory. - - - -GF was designed to be nice for //ordinary programmers// to use: by this -we mean programmers without training in linguistics. - - - -The mission of GF is to make natural-language applications available for -ordinary programmers, in tasks like - -- software documentation -- domain-specific translation -- human-computer interaction -- dialogue systems - -Thus GF is //not// primarily another theoretical framework for -linguists. - - - -#NEW -==Multilingual grammars== - -A GF grammar consists of an abstract syntax and a set -of concrete syntaxes. - - - -**Abstract syntax**: language-independent representation -``` - cat Prop ; Nat ; - fun Even : Nat -> Prop ; - fun NInt : Int -> Nat ; -``` -**Concrete syntax**: mapping from abstract syntax trees to strings in a language -(English, French, German, Swedish,...) -``` - lin Even x = {s = x.s ++ "is" ++ "even"} ; - - lin Even x = {s = x.s ++ "est" ++ "pair"} ; - - lin Even x = {s = x.s ++ "ist" ++ "gerade"} ; - - lin Even x = {s = x.s ++ "är" ++ "jämnt"} ; -``` -We can **translate** between languages via the abstract syntax: -``` - 4 is even 4 ist gerade - \ / - Even (NInt 4) - / \ - 4 est pair 4 är jämnt -``` - - - -But is it really so simple? 
- - -#NEW -==Difficulties with concrete syntax== - -Most languages have rules of **inflection**, **agreement**, -and **word order**, which have to be obeyed when putting together -expressions. - - - -The previous multilingual grammar breaks these rules in many situations: -// -2 and 3 is even -la somme de 3 et de 5 est pair -wenn 2 ist gerade, dann 2+2 ist gerade -om 2 är jämnt, 2+2 är jämnt -// -All these sentences are grammatically incorrect. - - - -#NEW -==Solving the difficulties== - -GF has tools for expressing the linguistic rules that are needed to -produce correct translations in different languages. - - - -Instead of just strings, we need **parameters**, **tables**, -and **record types**. For instance, French: -``` - param Mod = Ind | Subj ; - param Gen = Masc | Fem ; - - lincat Nat = {s : Str ; g : Gen} ; - lincat Prop = {s : Mod => Str} ; - - lin Even x = {s = - table { - m => x.s ++ - case m of {Ind => "est" ; Subj => "soit"} ++ - case x.g of {Masc => "pair" ; Fem => "paire"} - } - } ; -``` -To learn more about these constructs, consult GF documentation, e.g. the -[../../../doc/tutorial/gf-tutorial2.html New Grammarian's Tutorial]. -However, in what follows we will show how to avoid learning them and -still write linguistically correct grammars. - - -#NEW -==Language + Libraries== - -Writing natural language grammars still requires -theoretical knowledge about the language. - - - -Which kind of programmer is easier to find? - -- one who can write a sorting algorithm -- one who can write a grammar for Swedish determiners - - - - -In main-stream programming, sorting algorithms are not -written by hand but taken from **libraries**. - - - -In the same way, we want to create grammar libraries that encapsulate -basic linguistic facts. - - - -Cf. the Java success story: the language is just a half of the -success - libraries are another half. 
- - - -#NEW -==Example of library-based grammar writing== - -To define a Swedish expression of a mathematical predicate from scratch: -``` - Even x = - let jämn = case <x.n,x.g> of { - <Sg,Utr> => "jämn" ; - <Sg,Neutr> => "jämnt" ; - <Pl,_> => "jämna" - } - in - {s = table { - Main => x.s ! Nom ++ "är" ++ jämn ; - Inv => "är" ++ x.s ! Nom ++ jämn ; - Sub => x.s ! Nom ++ "är" ++ jämn - } - } -``` -To use library functions for syntax and morphology: -``` - Even = predA (regA "jämn") ; -``` -For the French version, we write -``` - Even = predA (regA "pair") ; -``` - - - -#NEW -==Questions in grammar library design== - -What should there be in the library? - -- morphology, lexicon, syntax, semantics,... - - - -How do we organize and present the library? - -- division into modules, level of granularity - -- "school grammar" vs. sophisticated linguistic concepts - - - -Where do we get the data from? - -- automatic extraction or hand-writing? - -- reuse of existing resources? - -Extra constraint: we want open-source free software and -hence cannot use existing proprietary resources. - - -#NEW -==Answers to questions in grammar library design== - -The current GF resource grammar library has -made the following decisions: - -The library has, for each language - -- complete morphology, some lexicon (500 words), representative fragment of syntax, -very little semantics, - - - -Organization and presentation: - -- division into top-level (API) modules, and internal modules (only -interesting for resource implementors) - -- the API is, as much as possible, common in different languages - -- we favour "school grammar" concepts rather than innovative linguistic theory - - - -Where do we get the data from? 
- -- morphology and syntax are hand-written - -- the 500-word lexicon is hand-written, but a tool is provided - for automatic lexicon extraction - -- we have not reused existing resources - -The resource grammar library is entirely -open-source free software (under GNU GPL license). - - - - - -#NEW -==The scope of a resource grammar library for a language== - -All morphological paradigms - - - -Basic lexicon of structural, common, and irregular words - - - -Basic syntactic structures - - - -Currently, -- //no// semantics, -- //no// language-specific structures if not necessary for expressivity. - - - - - -#NEW -==Success criteria== - -Grammatical correctness - - - -Semantic coverage: you can express whatever you want. - - - -Usability as library for non-linguists. - - - -(Bonus for linguists:) nice generalizations w.r.t. language -families, using the module system of GF. - - - -#NEW -==These are not our success criteria== - -Language coverage: to be able to parse all expressions. - -Example: -the French //passé simple// tense, although covered by the -morphology, is not used in the language-independent API, but -only the //passé composé// is. However, an application -accessing the French-specific (or Romance-specific) -modules can use the passé simple. - - - -Semantic correctness: only to produce meaningful expressions. - -Example: the following sentences can be generated -``` - colourless green ideas sleep furiously - - the time is seventy past forty-two -``` -However, an application grammar can use a domain-specific -semantics to guarantee semantic well-formedness. - - - -(Warning for linguists:) theoretical innovation in -syntax is not among the goals -(and it would be hidden from users anyway!). - - - -#NEW -==So where is semantics?== - -GF incorporates a **Logical Framework** and is therefore -capable of expressing logical semantics //à la// Montague -or any other flavour, including anaphora and discourse. 
- - - -But we do //not// try to give semantics once and -for all for the whole language. - - - -Instead, we expect semantics to be given in -**application grammars** built on semantic models -of different domains. - - - -Example application: number theory -``` - fun Even : Nat -> Prop ; -- a mathematical predicate - - lin Even = predA (regA "even") ; -- English translation - lin Even = predA (regA "pair") ; -- French translation - lin Even = predA (regA "jämn") ; -- Swedish translation -``` -How could the resource predict that just //these// -translations are correct in this domain? - - - -Application grammars are built by experts of these domains -who - thanks to resource grammars - no longer need to be -experts in linguistics. - - - - - - - -#NEW -==Languages== - -The current GF Resource Project covers ten languages: - --``Dan``ish --``Eng``lish --``Fin``nish --``Fre``nch --``Ger``man --``Ita``lian --``Nor``wegian --``Rus``sian --``Spa``nish --``Swe``dish - -The first three letters (``Dan`` etc) are used in grammar module names - - - -#NEW -==Library structure 1: language-independent API== - - -- ``Lang`` is the top module collecting all of the following. - - - -- syntactic ``Categories`` (parts of speech, word classes), e.g. -``` - V ; NP ; CN ; Det ; -- verb, noun phrase, common noun, determiner -``` -- ``Rules`` for combining words and phrases, e.g. -``` - DetNP : Det -> CN -> NP ; -- combine Det and CN into NP -``` -- the most common ``Structural`` words (determiners, -conjunctions, pronouns) (now 83), e.g. -``` - and_Conj : Conj ; -``` -- ``Numerals``, number words from 1 to 999,999 with their -inflections, e.g. -``` - n8 : Digit ; -``` -- ``Basic`` lexicon of (now 218) frequent everyday words -``` - man_N : N ; -``` - - - -In addition, and not included in ``Lang``, there is -- ``SwadeshLex``, lexicon of (now 206) words from the -[http://en.wiktionary.org/wiki/Swadesh_List Swadesh list], e.g. 
-``` - squeeze_V : V ; -``` -Of course, there is some overlap between ``SwadeshLex`` and the other modules. - - -#NEW -==Library structure 2: language-dependent modules== - -- morphological ``Paradigms``, e.g. Swedish -``` - mkN : Str -> Str -> Str -> Str -> Gender -> N ; -- worst-case nouns - mkN : Str -> N ; -- regular nouns -``` -- (in some languages) irregular ``Verbs``, e.g. -``` - angripa_V = irregV "angripa" "angrep" "angripit" ; -``` -- (not yet available) ``Ext``ended syntax with language-specific rules -``` - PassBli : V2 -> NP -> VP ; -- bli överkörd av ngn -``` - - - -#NEW -==How much can be language-independent?== - -For the ten languages we have considered, it //is// possible -to implement the current API. - - - -Reservations: - -- this does not necessarily extend to all other languages -- this does not necessarily cover the most idiomatic expressions - of each language -- this may not be the easiest API to implement (e.g. negation and -inversion with //do// in English suggest that some other -structure would be more natural) -- it is not guaranteed that the same structure has the same semantics -in all different languages - - - -#NEW -==Library structure: language-independent API== - -%#center - [src="Lang.gif] -%#center - - -#NEW -==API documentation== - -[Categories.html Categories] - - -[Rules.html Rules] - - -Two alternative views on sentence formation by predication: -[Clause.html Clause], -[Verbphrase.html Verbphrase] - - -[Structural.html Structural] - - - -[Time.html Time] - - -[Basic.html Basic] - - - -[Lang.html Lang] - - - -See also [../../resource-1.0/doc/gfdoc resource v 1.0 documentation], -now implemented for English, German, and Swedish. 
- - - -#NEW -==Paradigms documentation== - -[ParadigmsEng.html English paradigms] - -[BasicEng.html example use of English paradigms] - -[VerbsEng.html English verbs] - - - -[ParadigmsFin.html Finnish paradigms] - -[BasicFin.html example use of Finnish paradigms] - - - -[ParadigmsFre.html French paradigms] - -[BasicFre.html example use of French paradigms] - -[VerbsFre.html French verbs] - - - -[ParadigmsIta.html Italian paradigms] - -[BasicIta.html example use of Italian paradigms] - -[BeschIta.html Italian verb conjugations] - - - -[ParadigmsNor.html Norwegian paradigms] - -[BasicNor.html example use of Norwegian paradigms] - -[VerbsNor.html Norwegian verbs] - - -[ParadigmsSpa.html Spanish paradigms] - -[BasicSpa.html example use of Spanish paradigms] - -[BeschSpa.html Spanish verb conjugations] - - -[ParadigmsSwe.html Swedish paradigms] - -[BasicSwe.html example use of Swedish paradigms] - -[VerbsSwe.html Swedish verbs] - - - -#NEW -==Use as top-level grammar: testing== - -Import a set of ``LangX`` grammars: -``` - i english/LangEng.gf - i swedish/LangSwe.gf -``` -Alternatively, you can ``make`` a precompiled package of -all the languages by using ``lib/resource/Makefile``: -``` - make - gf langs.gfcm -``` -Then you can test with translation, random generation, morphological analysis... -``` - > p -lang=LangEng "I have loved her." | l -lang=LangFre - Je l' ai aimée. - - > gr -cat=NP | l -multi - The sock - Strumpan - Strømpen - La media - La calza - La chaussette - Sukka -``` - - -#NEW -==Use as top-level grammar: language learning quizzes== - -Morpho quiz with words (e.g. French verbs): -``` - i french/VerbsFre.gf - mq -cat=V -``` -Morpho quiz with phrases (e.g. Swedish clauses): -``` - i swedish/LangSwe.gf - mq -cat=Cl -``` -Translation quiz with sentences (e.g. 
sentences from English to Swedish): -``` - i english/LangEng.gf - i swedish/LangSwe.gf - tq -cat=S LangEng LangSwe -``` - - - - -#NEW -==Use as library== - -Import directly by ``open``: -``` - concrete AppNor of App = open LangNor, ParadigmsNor in {...} -``` -(Note for the users of GF 2.1 and older: -the dummy ``reuse`` modules and their bulky ``.gfr`` versions -are no longer needed!) - - - -If you need to convert resource records to strings, and don't want to know -the concrete type (as you never should), you can use -``` - Predef.toStr : (L : Type) -> L -> Str ; -``` -``L`` must be a linearization type. For instance, -``` - toStr LangNor.CN (ModAP (PositADeg old_ADeg) (UseN car_N)) - ---> "gammel bil" -``` - - - - -#NEW -==Use as library through parser== - -You can use the parser with a ``LangX`` grammar -when developing a resource. - - - -Using the ``-v`` option shows if the parser fails because -of unknown words. -``` - > p -cat=S -v -lexer=words "jag ska åka till Chalmers" - unknown tokens [TS "åka",TS "Chalmers"] -``` -Then try to select words that ``LangX`` recognizes: -``` - > p -cat=S "jag ska springa till Danmark" - UseCl (PosTP TFuture ASimul) - (AdvCl (SPredV i_NP run_V) - (AdvPP (PrepNP to_Prep (UsePN (PNCountry Denmark))))) -``` -Use these API structures and extend vocabulary to match your need. -``` - åka_V = lexV "åker" ; - Chalmers = regPN "Chalmers" neutrum ; -``` - -#NEW -==Syntax editor as library browser== - -You can run the syntax editor on ``LangX`` to -find resource API functions through context-sensitive menus. -For instance, the shell command -``` - gfeditor LangEng.gf LangFre.gf -``` -opens the editor with English and French views. The -[http://www.cs.chalmers.se/%7Eaarne/GF2.0/doc/javaGUImanual/javaGUImanual.htm -Editor User Manual] gives more information on the use of the editor. - - - -A restriction of the editor is that it does not give access to -``ParadigmsX`` modules. 
An IDE environment extending the editor -to a grammar programming tool is work in progress. - - - - -#NEW -==Example application: a small translation system== - -In this system, you can express questions and answers of -the following forms: -``` - Who chases mice ? - Whom does the lion chase ? - The dog chases cats. -``` -We build the abstract syntax in two phases: - -- [example/Questions.gf>Questions] defines question and - answer forms independently of domain -- [example/Animals.gf>Animals] defines a lexicon with - animals and things that animals do. - - - - -The concrete syntax of English is built in three phases: - -- [example/HandQuestionsI.gf QuestionsI] is a parametrized module - using the API module ``Resource``. -- [example/QuestionsEng.gf QuestionsEng] is an instantiation - of the API with ``ResourceEng``. -- [example/AnimalsEng.gf AnimalsEng] is a concrete syntax - of ``Animals`` using ``ParadigmsEng`` and ``VerbsEng``. - - - - -The concrete syntax of Swedish is built upon ``QuestionsI`` -in a similar way, with the modules -[example/QuestionsSwe.gf QuestionsSwe] and -[example/AnimalsSwe.gf AnimalsSwe]. - - - -The concrete syntax of French consists similarly of the modules -[example/QuestionsFre.gf QuestionsFre] and -[example/AnimalsFre.gf AnimalsFre]. - - - - -#NEW -==Compiling the example application== - -The resources are bulky, and it therefore takes a lot of -time and memory to load the grammars. However, they can be -compiled into the ``gfcm`` -(**GF canonical multilingual**) format, -which is almost one thousand times smaller and faster to load -for this set of grammars. - - - -To produce an end-user multilingual grammar ``animals.gfcm``, -write the sequence of compilation commands in a ``gfs`` (**GF script**) -file, say -[example/mkAnimals.gfs ``mkAnimals.gfs``], -and then call GF with -``` - gf <mkAnimals.gfs -``` -To try out the grammar, -``` - > i animals.gfcm - - > gr | l -multi - vem jagar hundar ? - qui chasse des chiens ? 
- who chases dogs ? -``` - - -#NEW - -==Grammar writing by examples== - -(New in GF 2.3) - - - -You can use the resource grammar as a parser on a special file format, -``.gfe`` ("GF examples"). Here is the real source, -[example/QuestionsI.gfe QuestionsI.gfe], which -generated -[example/QuestionsI.gf QuestionsI.gf] -when you executed the GF command -``` - i -ex AnimalsEng.gf -``` -Since ``QuestionsI`` is an incomplete module ("functor"), -it need only be built once. This is why only the first -command in ``mkAnimals.gfs`` needs the flag ``-ex``. - - - -Of course, the grammar of any language can be created by -parsing any language, as long as they have a common resource API. -The use of the English resource is generally recommended, because it -is smaller and faster to parse than the other languages. - - -#NEW -==Constants and variables in examples== - -The file [example/QuestionsI.gfe QuestionsI.gfe] uses -as resource ``LangEng``, which contains all resource syntax and -a lexicon of ca. 300 words. A linearization rule, such as -``` - lin Who love_V2 man_N = in Phr "who loves men ?" ; -``` -uses as argument variables constants for words that can be found in -the lexicon. It is due to this that the example can be parsed. -When the resulting rule, -``` - lin Who love_V2 man_N = - QuestPhrase (UseQCl (PosTP TPresent ASimul) - (QPredV2 who8one_IP love_V2 (IndefNumNP NoNum (UseN man_N)))) ; -``` -is read by the GF compiler, the identifiers ``love_V2`` and -``man_N`` are not treated as constants, but, following -the normal binding rules of functional languages, as bound variables. -This is what gives the example method the generality that is needed. - - - -To write linearization rules by examples one thus has to know at -least one abstract syntax constant for each category for which -one needs a variable. 
- - - -#NEW -==Extending the lexicon on the fly== - -The greatest limitation of the example method is that the lexicon -may lack many of the words that are needed in examples. If parsing -fails because of this, the compiler gives a list of unknown words -in its error message. An obvious solution is, -of course, to extend the resource lexicon and try again. -A more light-weight solution is to add a **substitution** to -the example. For instance, if you want the example "the pope" -but the lexicon does not have the word "pope", you can write -``` - lin Pope = in NP "the man" {man_N = regN "pope"} ; -``` -The resulting linearization rule is initially -``` - lin Pope = DefOneNP (UseN man_N) ; -``` -but the substitution changes this to -``` - lin Pope = DefOneNP (UseN (regN "pope")) ; -``` -In this way, you do not have to extend the resource lexicon, but you -need to open the Paradigms module to compile the resulting term. - - - -Of course, the substituted expressions may come from another language -than the main language of the example: -``` - lin Pope = in NP "the man" {man_N = regN "pape" masculine} ; -``` -If many substitutions are needed, semicolons are used as separators: -``` - {man_N = regN "pope" ; walk_V = regV "pray"} ; -``` - - -#NEW -==Implementation details: low-level files== - -**For developers of resource grammars.** -The modules listed in this section should never be imported in application -grammars. - - - -Each of the API implementations uses the following auxiliary resource modules: - -- ``Types``, the morphological paradigms and word classes -- ``Morpho``, inflection machinery -- ``Syntax``, complex categories and their combinations - -In addition, the following language-independent modules from ``lib/prelude`` -are used. 
- -- ``Predef``, operations whose definitions are hard-coded in GF -- ``Prelude``, generic string and boolean operations -- ``Coordination``, coordination structures for arbitrary categories - - - -#NEW -==Implementation details: the structure of low-level files== - -%#center - [Low.gif] -%#center - - -#NEW -==How to change a resource grammar?== - -In many cases, the source of a bug is in one of -the low-level modules. Try to trace it back there -by starting from the high-level module. - - - -(Much more to be written...) - - -#NEW -==How to write a resource grammar?== - -Start with a more limited goal, e.g. to implement -the ``stoneage`` grammar (``examples/stoneage``) -for your language. - - - -For this, you need - -- most of ``Types`` -- most of ``Morpho`` -- some of ``Syntax`` -- most of ``Paradigms`` - - - - -A useful command to test ``oper``s: -``` - i -retain MorphoRot.gf - cc regNoun "foo" -``` - - - -See also [../../resource-1.0/doc/Resource-HOWTO.html Resource-HOWTO] -(under construction). - - -#NEW -==The use of parametrized modules== - -In two language families, a lot of code is shared. -- Romance: French, Italian, Spanish -- Scandinavian: Danish, Norwegian, Swedish - - -The structure looks like this. 
- - [] - - -#NEW -==Current status== - - | Language | v0.6 | v0.9 | v1.0 | Paradigms | Lexicon | Verbs | - | Danish | - | X | - | - | - | - - | English | X | X | X | X | X | X - | Finnish | X | + | - | X | X | 0 - | French | X | X | X | X | X | X - | German | X | - | X | X | - | - - | Italian | X | X | - | X | X | X - | Norwegian | - | X | X | X | X | X - | Russian | X | X | - | * | - | - - | Spanish | - | X | - | X | X | X - | Swedish | X | X | X | X | X | X - -X = implemented (few exceptions may occur) - -+ = implemented for a large part - -* = linguistic material ready for implementation - -- = not implemented - -0 = not applicable - - -#NEW -==Known bugs and limitations== - -(//The listed limitations are ones that do not follow from the table on -the previous page//.) - -Danish - -English: -missing uncontracted negations. - -Finnish: -compiling the heuristic paradigms is slow; -possessive and interrogative suffixes have no proper lexer. - -French: -no inverted questions; -some verbs in Basic should be reflexive - -German - -Italian: -no omission of unstressed subject pronouns; -some verbs in Basic should be reflexive; -bad forms of reflexive infinitives - -Norwegian: -possessives of type //bilen min// not included - -Russian - -Spanish: -no omission of unstressed subject pronouns; -no switch to dative case for human objects; -some verbs in Basic should be reflexive; -bad forms of reflexive infinitives; -spurious parameter for verb auxiliary inherited from Romance - - -Swedish: - - - -#NEW -==Obtaining it== - -Get the grammar package from -[http://sourceforge.net/project/showfiles.php?group_id=132285 -GF Download Page]. The current libraries are in -``lib/resource``. Version 0.6 is in -``lib/resource-0.6``. - - - -The very very latest version of GF and its libraries is in -[http://www.cs.chalmers.se/~bringert/gf/downloads/snapshots/ Snapshots]. 
- diff --git a/doc/gslt-sem-2006.txt b/doc/gslt-sem-2006.txt deleted file mode 100644 index a98959b18..000000000 --- a/doc/gslt-sem-2006.txt +++ /dev/null @@ -1,312 +0,0 @@ -Grammars as Software Libraries -Author: Aarne Ranta -Last update: %%date(%c) - -% NOTE: this is a txt2tags file. -% Create an html file from this file using: -% txt2tags --toc gslt-sem-2006.txt - -%!target:html - -%!postproc(html): #NEW - -#NEW - -==Software Libraries== - -The main device of **division of labour** in programming. - -Instead of writing a sorting algorithm over and over again, -the programmers take it from a library. You write (in Haskell), -``` - Data.List.sort xs -``` -instead of a lot of code actually implementing sorting. - -Practical advantages: -- division of labour -- faster development of new software - - -#NEW - -==Abstraction== - -Libraries promote **abstraction**: you abstract away from details. - -The use of libraries is therefore a good programming style. - -It is also **scientifically interesting** to create libraries: -you have to think about abstractions on your domain of expertise. - -Notice: libraries can bring abstraction to almost any language, -if it just has a support for functions or macros. - - -#NEW - -==Grammars as libraries?== - -Example: we want to create a GUI (Graphical User Interface) button -that says //yes//, and **localize** it to different languages: -``` - Yes Ja Kyllä Oui Ja Sì -``` -Possible ways to do this: -+ Go around dictionaries to find the word in different languages -``` - yesButton english = button "Yes" - yesButton swedish = button "Ja" - yesButton finnish = button "Kyllä" -``` -+ Hire more programmers to perform localization in different languages -+ Use a library ``GUIText`` such that you can write -``` - yesButton lang = button (render lang GUIText.Yes) -``` - - - -#NEW - -==A slightly more advanced example== - -This is what you often see as a feedback from a program: -``` - You have 1 messages. 
-``` -Or perhaps with a little more thought: -``` - You have 1 message(s). -``` -The code that should be written is of course -``` - mess n = "You have" +++ show n +++ messages ++ "." - where - messages = if n==1 then "message" else "messages" -``` -(E.g. VoiceXML gives good support for this.) - - -#NEW - -==Problems with the more advanced example== - -The same as with "Yes": you have to know the words "you", -"have", "message". - -//Moreover//, you have to know the inflection of the equivalent -of "message": -``` - if n==1 then "meddelande" else "meddelanden" -``` -//Moreover//, you have to know the congruence with different numbers -(e.g. Russian, Arabic): -``` - if n==1 then "m" else - if n==2 then "mein" else "moun" -``` -You also have to know the case required by the verb "have" -(e.g. Finnish: nominative in singular, partitive in plural). - -//Moreover//, you have to know what is the proper way to politely -address the user: -``` - Du har 3 meddelanden / Ni har 3 meddelanden - Vous avez 3 messages / Tu as 3 messages -``` -(This can also depend on country and the kind of program.) - - -#NEW - -==A library-based solution== - -In analogy with the "Yes" case, you write -``` - mess lang n = render lang (MailText.YouHaveMessages n) -``` -Hmm, is this so smart? What about if you want to say -``` - You have 4 documents. - You have 5 jewels. - I have 7 surprises. -``` -It is time to move from **canned text** to a **grammar**. 
- - - -#NEW - -==An improved library-based solution== - -You may want to write -``` - mess lang n = render lang (Have PolYou (Num n Message)) - sword lang n = render lang (Have FamYou (Num n Sword)) - surpr lang n = render lang (Have I (Num n Surprise)) -``` -For this purpose, you need a library with the following API -(Application Programmer's Interface): -``` - Have : NounPhrase -> NounPhrase -> Sentence - - PolYou, FamYou, I : NounPhrase - - Num : Int -> Noun -> NounPhrase - - Message, Sword, Surprise : Noun -``` -You also need a top-level rendering function -``` - render : Language -> Sentence -> String -``` - - -#NEW - -==An optimal solution?== - -The library API for language will certainly grow big and become -difficult to use. Why couldn't I just write -``` - mess lang n = render lang (parse english "you have n messages") -``` -To this end, the API should provide the top-level function -``` - parse : Language -> String -> Sentence -``` -The library that we will present actually has this as well! - -The only complication is that ``parse`` does not always return -just one sentence. Those may be zero: -``` - you have n mesaggse -``` -or many: -``` - Have PolYou (Num n Message) - Have FamYou (Num n Message) - Have PlurYou (Num n Message) -``` - - -#NEW - -==The components of a grammar library== - -The library has **construction functions** like -``` - Have : NounPhrase -> NounPhrase -> Sentence - PolYou : NounPhrase -``` -These functions build **grammatical structures**, which -can have different realizations in different languages. - -Therefore we also need **realization functions**, -``` - render : Language -> Sentence -> String - parse : Language -> String -> [Sentence] -``` -Both of them require major linguistic expertise to write - but, -once this is done, they can be used with very little linguistic -knowledge by application programmers! 
- - -#NEW - -==Implementing a grammar library in GF== - -GF = Grammatical Framework - -Those who know GF have already seen the introduction as a -seduction argument for GF. - -In GF, -- construction functions = **abstract syntax** -- realization functions = **concrete syntax** - - -Example: -``` - abstract GUIText = { - cat Text ; - fun Yes : Text ; - } - concrete GUITextEng of GUIText = { - lin Yes = ss "yes" ; - } - concrete GUITextFin of GUIText = { - lin Yes = ss "kyllä" ; - } -``` - - -#NEW - -==Linearization and parsing== - -The realization function is, for each language, implemented by -**linearization rules** (``lin``). - -The linearization rules directly give the ``render`` method: -``` - render english x = GUITextEng.lin x -``` -The GF formalism moreover has the property of **reversibility**: -a set of linearization rules automatically generates a parser as -well. - -While reversibility has a minor importance for the applications -shown above, it is crucial for other applications of GF grammars. - - -#NEW - -==Applying GF== - -**multilingual grammar** = abstract syntax + concrete syntaxes - -Early instances of the idea (from 1998) - **application grammars**: -- multilingual authoring -- domain-specific translation -- dialogue systems - - -Later development (from 2001) - **resource grammars**: -- grammar libraries with language-independent APIs - - -Of course, one important use of resource grammars is -to help writing application grammars in GF. - -In addition to GF itself, GF grammars can be accessed in -Haskell, Prolog, and Java programs. - - -#NEW - -==Domain, ontology, idiom== - -An abstract syntax can represent -- a **semantic model** -- an **ontology** - - -The concrete syntax defines how the **concepts** of the ontology -are represented in natural language (or in a formal language). - -The following requirements are made: -- linguistic correctness (inflection, agreement, word order,...) 
-- semantic correctness (express the intended concepts) -- conformance to the domain idiom (use natural phrasing) - - -Benefit: translation via semantic model of domain can reach high quality. - -Problem: the expertise of both a linguist and a domain expert is required. - - - - -%http://www.boost.org/ \ No newline at end of file