mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-23 11:42:49 -06:00
GSLT sem, final version
@@ -44,7 +44,7 @@ Staff contributions to grammar libraries:
|
|||||||
- Aarne Ranta
|
- Aarne Ranta
|
||||||
|
|
||||||
|
|
||||||
Student projects on libraries:
|
Student projects on grammar libraries:
|
||||||
- Inger Andersson & Therese Söderberg: Spanish morphology
|
- Inger Andersson & Therese Söderberg: Spanish morphology
|
||||||
- Ludmilla Bogavac: Russian morphology
|
- Ludmilla Bogavac: Russian morphology
|
||||||
- Ali El Dada: Arabic morphology and syntax
|
- Ali El Dada: Arabic morphology and syntax
|
||||||
@@ -52,6 +52,12 @@ Student projects on libraries:
|
|||||||
- Michael Pellauer: Estonian morphology
|
- Michael Pellauer: Estonian morphology
|
||||||
|
|
||||||
|
|
||||||
|
Technology, also:
|
||||||
|
- Håkan Burden
|
||||||
|
- Hans-Joachim Daniels
|
||||||
|
- Kristofer Johannisson
|
||||||
|
- Peter Ljunglöf
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
|
|
||||||
@@ -67,7 +73,6 @@ the programmers take it from a library. You write (in Haskell),
|
|||||||
instead of a lot of code actually implementing sorting.
|
instead of a lot of code actually implementing sorting.
|
||||||
|
|
||||||
Practical advantages:
|
Practical advantages:
|
||||||
- division of labour
|
|
||||||
- faster development of new software
|
- faster development of new software
|
||||||
- quality guarantee and automatic improvements
|
- quality guarantee and automatic improvements
|
||||||
|
|
||||||
@@ -109,11 +114,20 @@ Possible ways to do this:
|
|||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
|
|
||||||
3. Use a library ``GUIText`` such that you can write
|
3. Use a library ``Text`` such that you can write
|
||||||
```
|
```
|
||||||
yesButton lang = button (render lang GUIText.Yes)
|
yesButton lang = button (Text.render lang Text.Yes)
|
||||||
|
```
|
||||||
|
The library has an API (Application Programmer's Interface) with:
|
||||||
|
+ A repository of text elements such as
|
||||||
|
```
|
||||||
|
Yes : Text
|
||||||
|
No : Text
|
||||||
|
```
|
||||||
|
+ A function rendering text elements in different languages:
|
||||||
|
```
|
||||||
|
render : Language -> Text -> String
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
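A minimal Haskell sketch of the ``Text`` API described above. Only ``Yes``, ``No``, the type of ``render``, and the words "yes", "no", "kyllä", "ei" come from the slides; the ``Language`` constructors and the string-valued ``button`` are assumptions made for illustration.

```haskell
-- Toy version of the Text library: a repository of text elements
-- plus a rendering function. Not the actual library code.

-- The repository of text elements.
data Text = Yes | No
  deriving (Show, Eq, Enum, Bounded)

-- Hypothetical set of supported languages.
data Language = English | Finnish
  deriving (Show, Eq, Enum, Bounded)

-- render : Language -> Text -> String
render :: Language -> Text -> String
render English Yes = "yes"
render English No  = "no"
render Finnish Yes = "kyllä"
render Finnish No  = "ei"

-- Stand-in for a GUI toolkit's button constructor (assumed):
-- here a button is just its label in brackets.
button :: String -> String
button label = "[" ++ label ++ "]"

-- The application code from the slide, unchanged in shape.
yesButton :: Language -> String
yesButton lang = button (render lang Yes)
```

The application programmer only touches ``yesButton``; adding a language means adding equations to ``render``, not changing the application.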
#NEW
|
#NEW
|
||||||
@@ -134,7 +148,7 @@ The code that should be written is of course
|
|||||||
where
|
where
|
||||||
messages = if n==1 then "message" else "messages"
|
messages = if n==1 then "message" else "messages"
|
||||||
```
|
```
|
||||||
(E.g. VoiceXML gives support for this.)
|
(E.g. VoiceXML supports this.)
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
@@ -163,8 +177,11 @@ of "message":
|
|||||||
==More problems with the advanced example==
|
==More problems with the advanced example==
|
||||||
|
|
||||||
You also have to know the case required by the verb "have"
|
You also have to know the case required by the verb "have"
|
||||||
(e.g. Finnish: nominative in singular, partitive in plural).
|
e.g. Finnish:
|
||||||
|
```
|
||||||
|
1 viesti -- nominative
|
||||||
|
4 viestiä -- partitive
|
||||||
|
```
|
||||||
//Moreover//, you have to know what is the proper way to politely
|
//Moreover//, you have to know what is the proper way to politely
|
||||||
address the user:
|
address the user:
|
||||||
```
|
```
|
||||||
@@ -180,7 +197,7 @@ address the user:
|
|||||||
|
|
||||||
In analogy with the "Yes" case, you write
|
In analogy with the "Yes" case, you write
|
||||||
```
|
```
|
||||||
mess lang n = render lang (MailText.YouHaveMessages n)
|
mess lang n = render lang (Text.YouHaveMessages n)
|
||||||
```
|
```
|
||||||
Hmm, is this so smart? What about if you want to say
|
Hmm, is this so smart? What about if you want to say
|
||||||
```
|
```
|
||||||
@@ -202,8 +219,7 @@ You may want to write
|
|||||||
sword lang n = render lang (Have FamYou (Num n Jewel))
|
sword lang n = render lang (Have FamYou (Num n Jewel))
|
||||||
surpr lang n = render lang (Have I (Num n Surprise))
|
surpr lang n = render lang (Have I (Num n Surprise))
|
||||||
```
|
```
|
||||||
For this purpose, you need a library with the following API
|
For this purpose, you need a library with the API
|
||||||
(Application Programmer's Interface):
|
|
||||||
```
|
```
|
||||||
Have : NounPhrase -> NounPhrase -> Sentence
|
Have : NounPhrase -> NounPhrase -> Sentence
|
||||||
|
|
||||||
@@ -213,16 +229,13 @@ For this purpose, you need a library with the following API
|
|||||||
Num : Int -> Noun -> NounPhrase
|
Num : Int -> Noun -> NounPhrase
|
||||||
|
|
||||||
Message : Noun
|
Message : Noun
|
||||||
```
|
Jewel : Noun
|
||||||
You also need a top-level rendering function
|
|
||||||
```
|
|
||||||
render : Language -> Sentence -> String
|
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
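The API above can be mimicked in Haskell as a rough sketch. The constructors ``Have``, ``Num``, ``Message``, ``I``, ``FamYou``, ``PolYou`` follow the slides; the helper names, the two-language ``Language`` type, and the one-word lexicon are assumptions, and the Finnish forms beyond "viesti"/"viestiä" are filled in by hand for illustration only.

```haskell
-- Toy model of the sentence-level API; not the real library.
data Language = English | Finnish
  deriving (Show, Eq)

data Noun = Message
  deriving (Show, Eq)

data NounPhrase = I | FamYou | PolYou | Num Int Noun
  deriving (Show, Eq)

data Sentence = Have NounPhrase NounPhrase
  deriving (Show, Eq)

-- Top-level rendering: possessor, verb, possessed thing.
render :: Language -> Sentence -> String
render lang (Have owner thing) =
  unwords [possessor lang owner, haveVerb lang owner, possessed lang thing]

-- English uses the subject form; Finnish puts the owner in the adessive.
possessor :: Language -> NounPhrase -> String
possessor English I               = "I"
possessor English FamYou          = "you"
possessor English PolYou          = "you"
possessor English (Num n Message) = countedEng n
possessor Finnish I               = "minulla"
possessor Finnish FamYou          = "sinulla"
possessor Finnish PolYou          = "teillä"
possessor Finnish (Num n Message) = show n ++ " viestillä"

-- English agrees with the subject; Finnish uses "on" throughout here.
haveVerb :: Language -> NounPhrase -> String
haveVerb English (Num 1 _) = "has"
haveVerb English _         = "have"
haveVerb Finnish _         = "on"

-- The possessed thing: this is where the Finnish case choice shows up.
possessed :: Language -> NounPhrase -> String
possessed English I               = "me"
possessed English FamYou          = "you"
possessed English PolYou          = "you"
possessed English (Num n Message) = countedEng n
possessed Finnish I               = "minut"
possessed Finnish FamYou          = "sinut"
possessed Finnish PolYou          = "teidät"
possessed Finnish (Num n Message)
  | n == 1    = "1 viesti"            -- nominative
  | otherwise = show n ++ " viestiä"  -- partitive

-- English number agreement on the noun.
countedEng :: Int -> String
countedEng 1 = "1 message"
countedEng n = show n ++ " messages"

-- Application code, in analogy with the "Yes" case.
mess :: Language -> Int -> String
mess lang n = render lang (Have FamYou (Num n Message))
```

All linguistic decisions (number, case, politeness) live inside ``render``; the application only builds trees.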
#NEW
|
#NEW
|
||||||
|
|
||||||
==An optimal solution?==
|
==The ultimate solution?==
|
||||||
|
|
||||||
The library API for language will certainly grow big and become
|
The library API for language will certainly grow big and become
|
||||||
difficult to use. Why couldn't I just write
|
difficult to use. Why couldn't I just write
|
||||||
@@ -241,14 +254,18 @@ The library that we will present actually has this as well!
|
|||||||
The only complication is that ``parse`` does not always return
|
The only complication is that ``parse`` does not always return
|
||||||
just one sentence. Those may be zero:
|
just one sentence. Those may be zero:
|
||||||
```
|
```
|
||||||
you have n mesaggse
|
"you have n mesaggse"
|
||||||
|
|
||||||
```
|
```
|
||||||
or many:
|
or many:
|
||||||
```
|
```
|
||||||
|
"you have n messages"
|
||||||
|
|
||||||
Have PolYou (Num n Message)
|
Have PolYou (Num n Message)
|
||||||
Have FamYou (Num n Message)
|
Have FamYou (Num n Message)
|
||||||
Have PlurYou (Num n Message)
|
Have PlurYou (Num n Message)
|
||||||
```
|
```
|
||||||
|
Thus some amount of interaction is needed.
|
||||||
|
|
||||||
|
|
||||||
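Why ``parse`` returns a list can be seen in a small generate-and-test sketch. This is not how GF parses (GF derives a parser from the linearization rules); the brute-force search, the bounded candidate set, and the English-only ``render`` are assumptions chosen to keep the example short.

```haskell
-- Toy parser: enumerate candidate trees and keep those that render
-- to the input string. Constructor names follow the slides.
data Language = English
  deriving (Show, Eq)

data Noun = Message
  deriving (Show, Eq)

data NounPhrase = PolYou | FamYou | PlurYou | Num Int Noun
  deriving (Show, Eq)

data Sentence = Have NounPhrase NounPhrase
  deriving (Show, Eq)

render :: Language -> Sentence -> String
render English (Have owner thing) = unwords [np owner, verb owner, np thing]
  where
    np (Num 1 Message) = "1 message"
    np (Num n Message) = show n ++ " messages"
    np _               = "you"   -- all three "you" variants look alike
    verb (Num 1 _)     = "has"
    verb _             = "have"

-- Hypothetical finite search space: "you have n messages" for n < 100.
candidates :: [Sentence]
candidates =
  [ Have you (Num n Message) | you <- [PolYou, FamYou, PlurYou], n <- [0 .. 99] ]

-- parse : Language -> String -> [Sentence]
parse :: Language -> String -> [Sentence]
parse lang s = [ t | t <- candidates, render lang t == s ]
```

With these definitions, ``parse English "you have 4 mesaggse"`` is the empty list, and ``parse English "you have 4 messages"`` yields the three readings with ``PolYou``, ``FamYou``, and ``PlurYou``, so the caller has to choose.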
#NEW
|
#NEW
|
||||||
@@ -268,7 +285,7 @@ Therefore we also need **realization functions**,
|
|||||||
render : Language -> Sentence -> String
|
render : Language -> Sentence -> String
|
||||||
parse : Language -> String -> [Sentence]
|
parse : Language -> String -> [Sentence]
|
||||||
```
|
```
|
||||||
Both of them require major linguistic expertise to write - but,
|
Both of them require linguistic expertise to write - but,
|
||||||
once this is done, they can be used with very little linguistic
|
once this is done, they can be used with very little linguistic
|
||||||
knowledge by application programmers!
|
knowledge by application programmers!
|
||||||
|
|
||||||
@@ -291,17 +308,20 @@ In GF,
|
|||||||
|
|
||||||
Simplest possible example:
|
Simplest possible example:
|
||||||
```
|
```
|
||||||
abstract GUIText = {
|
abstract Text = {
|
||||||
cat Text ;
|
cat Text ;
|
||||||
fun Yes : Text ;
|
fun Yes : Text ;
|
||||||
|
fun No : Text ;
|
||||||
}
|
}
|
||||||
|
|
||||||
concrete GUITextEng of GUIText = {
|
concrete TextEng of Text = {
|
||||||
lin Yes = ss "yes" ;
|
lin Yes = ss "yes" ;
|
||||||
|
lin No = ss "no" ;
|
||||||
}
|
}
|
||||||
|
|
||||||
concrete GUITextFin of GUIText = {
|
concrete TextFin of Text = {
|
||||||
lin Yes = ss "kyllä" ;
|
lin Yes = ss "kyllä" ;
|
||||||
|
lin No = ss "ei" ;
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
@@ -315,11 +335,11 @@ The realizatin function is, for each language, implemented by
|
|||||||
|
|
||||||
The linearization rules directly give the ``render`` method:
|
The linearization rules directly give the ``render`` method:
|
||||||
```
|
```
|
||||||
render english x = GUITextEng.lin x
|
render english x = TextEng.lin x
|
||||||
```
|
```
|
||||||
The GF formalism moreover has the property of **reversibility**:
|
The GF formalism moreover has the property of **reversibility**:
|
||||||
a set of linearization rules automatically generates a parser as
|
- a set of linearization rules automatically generates a parser.
|
||||||
well.
|
|
||||||
|
|
||||||
%While reversibility has a minor importance for the applications
|
%While reversibility has a minor importance for the applications
|
||||||
%shown above, it is crucial for other applications of GF grammars.
|
%shown above, it is crucial for other applications of GF grammars.
|
||||||
@@ -332,8 +352,8 @@ well.
|
|||||||
**multilingual grammar** = abstract syntax + concrete syntaxes
|
**multilingual grammar** = abstract syntax + concrete syntaxes
|
||||||
|
|
||||||
Examples of the idea:
|
Examples of the idea:
|
||||||
- multilingual authoring
|
|
||||||
- domain-specific translation
|
- domain-specific translation
|
||||||
|
- multilingual authoring
|
||||||
- dialogue systems
|
- dialogue systems
|
||||||
|
|
||||||
|
|
||||||
@@ -342,17 +362,17 @@ Examples of the idea:
|
|||||||
|
|
||||||
==Domain, ontology, idiom==
|
==Domain, ontology, idiom==
|
||||||
|
|
||||||
An abstract syntax represents
|
An abstract syntax has other names:
|
||||||
- a **semantic model**
|
- a **semantic model**
|
||||||
- an **ontology**
|
- an **ontology**
|
||||||
|
|
||||||
|
|
||||||
The concrete syntax defines how the concepts of the ontology
|
The concrete syntax defines how the ontology
|
||||||
are represented in a language.
|
is represented in a language.
|
||||||
|
|
||||||
The following requirements are made:
|
The following requirements are made:
|
||||||
- linguistic correctness (inflection, agreement, word order,...)
|
- linguistic correctness (inflection, agreement, word order,...)
|
||||||
- semantic correctness (express the intended concepts)
|
- semantic correctness (express the concepts properly)
|
||||||
- conformance to the domain idiom (use proper terms and phrasing)
|
- conformance to the domain idiom (use proper terms and phrasing)
|
||||||
|
|
||||||
|
|
||||||
@@ -373,17 +393,17 @@ Arithmetic of natural numbers: abstract syntax
|
|||||||
**Concrete syntax**: mapping from abstract syntax trees to strings in a language
|
**Concrete syntax**: mapping from abstract syntax trees to strings in a language
|
||||||
(English, French, German, Swedish,...)
|
(English, French, German, Swedish,...)
|
||||||
```
|
```
|
||||||
lin Even x = {s = x.s ++ "is" ++ "even"} ;
|
lin Even x = {s = x.s ++ "is" ++ "even"} ;
|
||||||
lin Even x = {s = x.s ++ "est" ++ "pair"} ;
|
lin Even x = {s = x.s ++ "est" ++ "pair"} ;
|
||||||
lin Even x = {s = x.s ++ "ist" ++ "gerade"} ;
|
lin Even x = {s = x.s ++ "ist" ++ "gerade"} ;
|
||||||
lin Even x = {s = x.s ++ "är" ++ "jämnt"} ;
|
lin Even x = {s = x.s ++ "är" ++ "jämnt"} ;
|
||||||
```
|
```
|
||||||
|
|
||||||
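The four ``lin`` rules above can be read as one function from abstract trees to strings per language. A Haskell sketch, assuming a hypothetical ``Nat`` wrapper for number expressions and using ``unwords`` in place of GF's token concatenation ``++``:

```haskell
-- Toy abstract syntax: only the Even predicate over numerals.
newtype Nat = Nat Int
  deriving (Show, Eq)

data Prop = Even Nat
  deriving (Show, Eq)

data Language = Eng | Fre | Ger | Swe
  deriving (Show, Eq, Enum, Bounded)

-- Linearization of numerals: digits in every language.
linNat :: Nat -> String
linNat (Nat n) = show n

-- One equation per concrete syntax, mirroring the four lin rules.
lin :: Language -> Prop -> String
lin Eng (Even x) = unwords [linNat x, "is", "even"]
lin Fre (Even x) = unwords [linNat x, "est", "pair"]
lin Ger (Even x) = unwords [linNat x, "ist", "gerade"]
lin Swe (Even x) = unwords [linNat x, "är", "jämnt"]

-- The same tree rendered in every language.
linAll :: Prop -> [String]
linAll p = [ lin l p | l <- [minBound .. maxBound] ]
```

Plain string concatenation like this is all the rules above do; it carries no information about gender, number, or word order.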
#NEW
|
#NEW
|
||||||
|
|
||||||
==Translation system==
|
==Translation system==
|
||||||
|
|
||||||
We can **translate** between languages via the abstract syntax:
|
We can translate using the abstract syntax as interlingua:
|
||||||
```
|
```
|
||||||
4 is even 4 ist gerade
|
4 is even 4 ist gerade
|
||||||
\ /
|
\ /
|
||||||
@@ -405,7 +425,7 @@ The previous multilingual grammar breaks these rules in many situations:
|
|||||||
2 and 3 is even
|
2 and 3 is even
|
||||||
la somme de 3 et de 5 est pair
|
la somme de 3 et de 5 est pair
|
||||||
wenn 2 ist gerade, dann 2+2 ist gerade
|
wenn 2 ist gerade, dann 2+2 ist gerade
|
||||||
om 2 är jämnt, 2+2 är jämnt
|
om x är jämnt, summan av x och 2 är jämnt
|
||||||
```
|
```
|
||||||
All these sentences are grammatically incorrect.
|
All these sentences are grammatically incorrect.
|
||||||
|
|
||||||
@@ -415,11 +435,10 @@ All these sentences are grammatically incorrect.
|
|||||||
|
|
||||||
==Solving the difficulties==
|
==Solving the difficulties==
|
||||||
|
|
||||||
GF can express the linguistic rules that are needed to
|
GF //can// express the linguistic rules that are needed to
|
||||||
produce correct translations. (Expressive power
|
produce correct translations:
|
||||||
between TAG and HPSG, but the language is more high-level.)
|
|
||||||
|
|
||||||
Instead of just strings, we need **parameters**, **tables**,
|
In addition to strings, we use **parameters**, **tables**,
|
||||||
and **record types**. For instance, French:
|
and **record types**. For instance, French:
|
||||||
```
|
```
|
||||||
param Mod = Ind | Subj ;
|
param Mod = Ind | Subj ;
|
||||||
@@ -455,20 +474,33 @@ Resource grammar ("syntactic grammar")
|
|||||||
- author: linguist
|
- author: linguist
|
||||||
|
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
==GF as programming language==
|
||||||
|
|
||||||
|
The expressive power is between TAG and HPSG.
|
||||||
|
|
||||||
|
The language is more high-level: a modern, **typed functional programming language**.
|
||||||
|
|
||||||
|
It enables linguistic generalizations and abstractions.
|
||||||
|
|
||||||
|
But we don't want to bother application grammarians with these details.
|
||||||
|
|
||||||
|
We have built a **module system** that can hide details.
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
|
|
||||||
==Concrete syntax using library==
|
==Concrete syntax using library==
|
||||||
|
|
||||||
Language-independent API
|
Assume the following API
|
||||||
```
|
```
|
||||||
cat S ; NP ; A ;
|
cat S ; NP ; A ;
|
||||||
|
|
||||||
fun predA : NP -> A -> S ;
|
fun predA : A -> NP -> S ;
|
||||||
|
|
||||||
oper regA : Str -> A ;
|
oper regA : Str -> A ;
|
||||||
```
|
```
|
||||||
Implementation for four languages
|
Now implement ``Even`` for four languages
|
||||||
```
|
```
|
||||||
lincat
|
lincat
|
||||||
Prop = S ;
|
Prop = S ;
|
||||||
@@ -479,11 +511,11 @@ Implementation for four languages
|
|||||||
Even = predA (regA "pair") ; -- French
|
Even = predA (regA "pair") ; -- French
|
||||||
Even = predA (regA "gerade") ; -- German
|
Even = predA (regA "gerade") ; -- German
|
||||||
```
|
```
|
||||||
Notice: choice of adjective is domain expert knowledge.
|
Notice: the choice of adjective is domain expert knowledge.
|
||||||
|
|
||||||
|
|
||||||
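What the library hides behind ``predA`` and ``regA`` can be hinted at with a Haskell sketch for French. The signatures follow the API above (with ``A -> NP -> S``); the record fields, the ``Gender`` type, and the one-rule "add -e for feminine" paradigm are strong simplifications assumed for illustration, not the actual resource grammar.

```haskell
-- Toy French resource: adjectives inflect for gender, noun phrases
-- carry a gender, and predA makes them agree.
data Gender = Masc | Fem
  deriving (Show, Eq)

newtype A = A { adjForm :: Gender -> String }

data NP = NP { npString :: String, npGender :: Gender }

newtype S = S { sString :: String }

-- Regular adjective: feminine adds -e unless the stem already ends in e.
regA :: String -> A
regA stem = A form
  where
    form Masc = stem
    form Fem
      | not (null stem) && last stem == 'e' = stem
      | otherwise                           = stem ++ "e"

-- Predication with the copula "est"; agreement is decided here,
-- not by the application grammarian.
predA :: A -> NP -> S
predA adj np = S (unwords [npString np, "est", adjForm adj (npGender np)])

-- Application-level rule, as on the slide (renamed to avoid Prelude.even).
evenProp :: NP -> S
evenProp = predA (regA "pair")
```

With a feminine subject such as ``NP "la somme de 3 et de 5" Fem`` the adjective comes out as "paire", which the naive string-based rule could not produce.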
#NEW
|
#NEW
|
||||||
==Design questions for grammar the library==
|
==Design questions for the grammar library==
|
||||||
|
|
||||||
What should there be in the library?
|
What should there be in the library?
|
||||||
- morphology, lexicon, syntax, semantics,...
|
- morphology, lexicon, syntax, semantics,...
|
||||||
@@ -506,7 +538,7 @@ hence cannot use existing proprietary resources.
|
|||||||
#NEW
|
#NEW
|
||||||
==Design decisions==
|
==Design decisions==
|
||||||
|
|
||||||
The current GF resource grammar library has, for each language,
|
Coverage, for each language:
|
||||||
- complete morphology
|
- complete morphology
|
||||||
- lexicon of the most important structural words
|
- lexicon of the most important structural words
|
||||||
- test lexicon of ca. 300 content words
|
- test lexicon of ca. 300 content words
|
||||||
@@ -514,13 +546,16 @@ The current GF resource grammar library has, for each language,
|
|||||||
- rather flat semantics (cf. Quasi-Logical Form of CLE)
|
- rather flat semantics (cf. Quasi-Logical Form of CLE)
|
||||||
|
|
||||||
|
|
||||||
Organization and presentation:
|
Organization:
|
||||||
- top-level (API) modules
|
- top-level (API) modules
|
||||||
- internal modules (only interesting for resource implementors)
|
- Ground API + special-purpose APIs ("macro packages")
|
||||||
- we favour "school grammar" concepts rather than innovative linguistic theory
|
- "school grammar" concepts rather than advanced linguistic theory
|
||||||
- tool ``gfdoc`` for generating HTML from grammars
|
|
||||||
|
|
||||||
|
|
||||||
|
Presentation:
|
||||||
|
- tool ``gfdoc`` for generating HTML from grammars
|
||||||
|
- example collections
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Design decisions, cont'd==
|
==Design decisions, cont'd==
|
||||||
@@ -533,17 +568,14 @@ Where do we get the data from?
|
|||||||
- we have not reused existing resources
|
- we have not reused existing resources
|
||||||
|
|
||||||
|
|
||||||
The resource grammar library is entirely
|
The resource grammar library is entirely open-source free software (under GNU GPL license).
|
||||||
open-source free software (under GNU GPL license).
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Success criteria==
|
==Success criteria and evaluation==
|
||||||
|
|
||||||
Grammatical correctness of everything generated.
|
Grammatical correctness of everything generated.
|
||||||
|
|
||||||
@@ -551,56 +583,58 @@ Semantic coverage: you can express whatever you want.
|
|||||||
|
|
||||||
Usability as library for non-linguists.
|
Usability as library for non-linguists.
|
||||||
|
|
||||||
(Bonus for linguists:) nice generalizations w.r.t. language
|
Evaluation: tested in third-party projects.
|
||||||
families, using the module system of GF.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==These are not our success criteria==
|
==These are not our success criteria==
|
||||||
|
|
||||||
Language coverage: to be able to parse all expressions.
|
Language coverage:
|
||||||
|
- to be able to parse all expressions.
|
||||||
- Example: French //passé simple//, although covered by the
|
- Example: French //passé simple//, although covered by the
|
||||||
morphology, is not available through the language-independent API.
|
morphology, is not available through the language-independent API.
|
||||||
|
- But: reconsidered to improve example-based grammar writing
|
||||||
|
|
||||||
|
|
||||||
Semantic correctness: only to produce meaningful expressions.
|
Semantic correctness:
|
||||||
|
- only to produce meaningful expressions.
|
||||||
- Example: the following sentences can be generated
|
- Example: the following sentences can be generated
|
||||||
```
|
```
|
||||||
colourless green ideas sleep furiously
|
colourless green ideas sleep furiously
|
||||||
|
|
||||||
the time is seventy past forty-two
|
the time is seventy past forty-two
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
(Warning for linguists:) theoretical innovation in
|
Linguistic innovation in syntax:
|
||||||
syntax is not among the goals
|
- rather a presentation of "known facts"
|
||||||
(and it would be hidden from users anyway!).
|
- innovation would be hidden from users anyway...
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==So where is semantics?==
|
==Where is semantics?==
|
||||||
|
|
||||||
Application grammars typically use domain-specific
|
Application grammars use domain-specific
|
||||||
semantics to guarantee semantic well-formedness.
|
semantics to guarantee semantic well-formedness.
|
||||||
|
|
||||||
GF incorporates a **Logical Framework** and is therefore
|
GF incorporates a **Logical Framework** and can express
|
||||||
capable of expressing logical semantics //à la// Montague
|
- logical semantics //à la// Montague
|
||||||
or any other flavour, including anaphora and discourse.
|
- anaphora and discourse using dependent types
|
||||||
|
|
||||||
|
|
||||||
|
Language-independent API is a rough semantic model.
|
||||||
|
|
||||||
But we do //not// try to give semantics once and
|
But we do //not// try to give semantics once and
|
||||||
for all for the whole language.
|
for all for the whole language.
|
||||||
|
|
||||||
Instead, we expect semantics to be given in
|
|
||||||
**application grammars** built on semantic models
|
|
||||||
of different domains.
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Levels of representation==
|
==Representations in different APIs==
|
||||||
|
|
||||||
No fixed set of levels; here some examples:
|
**Grammar composition**: any grammar can serve as resource to another one.
|
||||||
|
|
||||||
|
No fixed set of representation levels; here some examples for
|
||||||
```
|
```
|
||||||
2 is even
|
2 is even
|
||||||
2 är jämnt
|
2 är jämnt
|
||||||
@@ -616,8 +650,10 @@ In ``Predication`` (high level resource API)
|
|||||||
```
|
```
|
||||||
In ``Lang`` (ground level resource API)
|
In ``Lang`` (ground level resource API)
|
||||||
```
|
```
|
||||||
UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2)) (UseComp (CompAP (PositA (regA "even")))))
|
UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2))
|
||||||
UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2)) (UseComp (CompAP (PositA (regA "jämn")))))
|
(UseComp (CompAP (PositA (regA "even")))))
|
||||||
|
UseCl TPres ASimul PPos (PredVP (UsePN (IntPN 2))
|
||||||
|
(UseComp (CompAP (PositA (regA "jämn")))))
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
@@ -632,15 +668,15 @@ The current GF Resource Project covers ten languages:
|
|||||||
- ``Fre``nch
|
- ``Fre``nch
|
||||||
- ``Ger``man
|
- ``Ger``man
|
||||||
- ``Ita``lian
|
- ``Ita``lian
|
||||||
- ``Nor``wegian
|
- ``Nor``wegian (bokmål)
|
||||||
- ``Rus``sian
|
- ``Rus``sian
|
||||||
- ``Spa``nish
|
- ``Spa``nish
|
||||||
- ``Swe``dish
|
- ``Swe``dish
|
||||||
|
|
||||||
|
|
||||||
The first three letters (``Dan`` etc) are used in grammar module names
|
Implementation of API v 1.0 projected for the end of February.
|
||||||
|
|
||||||
In addition, we have parts (morphology) of Arabic, Estonian, and Urdu
|
In addition, we have parts (morphology) of Arabic, Estonian, Latin, and Urdu
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
@@ -652,26 +688,44 @@ In addition, we have parts (morphology) of Arabic, Estonian, and Urdu
|
|||||||
|
|
||||||
[Examples of each category gfdoc/Cat.html]
|
[Examples of each category gfdoc/Cat.html]
|
||||||
|
|
||||||
|
Cf. "matrix" in BLARK, LinGo
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Library structure 2: language-dependent modules==
|
==Library structure 2: language-dependent APIs==
|
||||||
|
|
||||||
- morphological paradigms, e.g. ``ParadigmsSwe``
|
- morphological paradigms, e.g. ``ParadigmsSwe``
|
||||||
```
|
```
|
||||||
mkN : (x1,_,_,x4 : Str) -> N ; -- worst-case noun constructor
|
mkN : (man,mannen,män,männen : Str) -> N ; -- worst-case nouns
|
||||||
regN : Str -> N ; -- regular noun constructor
|
regV : (leker : Str) -> V ; -- regular verbs
|
||||||
```
|
```
|
||||||
- (in some languages) irregular verbs (and other words), e.g. ``IrregSwe``
|
- irregular words esp. verbs, e.g. ``IrregSwe``
|
||||||
```
|
```
|
||||||
angripa_V = irregV "angripa" "angrep" "angripit" ;
|
angripa_V = irregV "angripa" "angrep" "angripit" ;
|
||||||
```
|
```
|
||||||
- (not yet available) extended syntax with language-specific rules, e.g. ``ExtNor``
|
- extended syntax with language-specific rules, e.g. ``ExtNor``
|
||||||
```
|
```
|
||||||
PostPoss : CN -> Pron -> NP ; -- bilen min
|
PostPoss : CN -> Pron -> NP ; -- bilen min
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
==Difficulties encountered==
|
||||||
|
|
||||||
|
English: negation and auxiliary vs. non-auxiliary verbs
|
||||||
|
|
||||||
|
Finnish: object case
|
||||||
|
|
||||||
|
German: double infinitives
|
||||||
|
|
||||||
|
Romance: clitic pronouns
|
||||||
|
|
||||||
|
Scandinavian: determiners
|
||||||
|
|
||||||
|
//In particular//: how to make the grammars efficient
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==How much can be language-independent?==
|
==How much can be language-independent?==
|
||||||
|
|
||||||
@@ -682,10 +736,33 @@ Reservations:
|
|||||||
|
|
||||||
- does not necessarily extend to all other languages
|
- does not necessarily extend to all other languages
|
||||||
- does not necessarily cover the most idiomatic expressions of each language
|
- does not necessarily cover the most idiomatic expressions of each language
|
||||||
- may not be the easiest API to implement (e.g. negation and
|
- may not be the easiest API to implement
|
||||||
inversion with //do// in English suggest that some other
|
- e.g. negation and inversion with //do// in English suggest that some other
|
||||||
structure would be more natural)
|
structure would be more natural
|
||||||
- no guaranteed that same structure has the same semantics in all different languages
|
|
||||||
|
|
||||||
|
- the structures may not have the same semantics in all different languages
|
||||||
|
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
==Using the library==
|
||||||
|
|
||||||
|
Simplest case: use the API in the same way for all languages.
|
||||||
|
- **+** grammar localization for free
|
||||||
|
- **-** not the best idioms for each language
|
||||||
|
|
||||||
|
|
||||||
|
In practice: use the API in different ways for different languages
|
||||||
|
```
|
||||||
|
-- Eng: x's name is y
|
||||||
|
Name x y = predNP (GenCN x (regN "name")) (StringNP y)
|
||||||
|
-- Swe: x heter y
|
||||||
|
Name x y = predV2 x heta_V2 (StringNP y)
|
||||||
|
```
|
||||||
|
This amounts to **compile-time transfer**.
|
||||||
|
|
||||||
|
Surprisingly, writing an application grammar requires more native-speaker knowledge
|
||||||
|
than writing a resource grammar!
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
@@ -703,23 +780,6 @@ Exploited in two families:
|
|||||||
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
|
||||||
==Using the library==
|
|
||||||
|
|
||||||
Simplest case: use the API in the same way for all languages.
|
|
||||||
- **+** grammar localization for free
|
|
||||||
- **-** not the best idioms for each language
|
|
||||||
|
|
||||||
|
|
||||||
In practice: use the API in different ways for different languages
|
|
||||||
```
|
|
||||||
Name x y = predNP (GenCN x (regN "name")) (StringNP y) -- Eng: x's name is y
|
|
||||||
Name x y = predV2 x heta_V2 (StringNP y) -- Swe: x heter y
|
|
||||||
```
|
|
||||||
This amounts to **compile-time transfer**.
|
|
||||||
|
|
||||||
Writing an application grammar requires more native-speaker knowledge
|
|
||||||
than writing a resource grammar!
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@@ -774,29 +834,6 @@ Example heuristic, from [ParadigsSwe gfdoc/ParadigmsSwe.html]:
|
|||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
|
||||||
==Corpus generation==
|
|
||||||
|
|
||||||
The most general format is **multilingual treebank** generation:
|
|
||||||
```
|
|
||||||
> gr -tr | l -multi
|
|
||||||
UseCl TCond AAnter PPos (PredVP (DetCN (DetSg DefSg NoOrd)
|
|
||||||
(AdjCN (PositA young_A) (UseN man_N))) (ComplV2 love_V2 (UsePron she_Pron)))
|
|
||||||
|
|
||||||
den unga mannen skulle ha älskat henne
|
|
||||||
|
|
||||||
der junge Mann würde sie geliebt haben
|
|
||||||
|
|
||||||
le jeune homme l' aurait aimée
|
|
||||||
|
|
||||||
the young man would have loved her
|
|
||||||
```
|
|
||||||
A special case is corpus generation, either exhaustive or random with
|
|
||||||
or without probability weights attached to constructors.
|
|
||||||
|
|
||||||
Cf. Rebecca Jonson this afternoon.
|
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Use as program components==
|
==Use as program components==
|
||||||
|
|
||||||
@@ -807,6 +844,49 @@ Parsing, generation, translation
|
|||||||
Push-button creation of spoken language translators (using Nuance)
|
Push-button creation of spoken language translators (using Nuance)
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
==Grammar library as linguistic resource==
|
||||||
|
|
||||||
|
Can we use the libraries outside domain-specific fragments?
|
||||||
|
|
||||||
|
We seem to be approaching full coverage from below.
|
||||||
|
|
||||||
|
The resource API is not good for heavy-duty parsing (too abstract and
|
||||||
|
therefore too inefficient).
|
||||||
|
|
||||||
|
Two ideas:
|
||||||
|
- write shallow parsers as application grammars
|
||||||
|
- generate corpora and use statistical parsing methods
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
==Corpus generation==
|
||||||
|
|
||||||
|
The most general format is **multilingual treebank** generation:
|
||||||
|
```
|
||||||
|
> gr -tr | l -multi
|
||||||
|
UseCl TCond AAnter PNeg (PredVP (DetCN (DetSg DefSg NoOrd)
|
||||||
|
(AdjCN (PositA young_A) (UseN woman_N))) (ComplV2 love_V2 (UsePron he_Pron)))
|
||||||
|
|
||||||
|
The young woman wouldn't have loved him
|
||||||
|
Den unga kvinnan skulle inte ha älskat honom
|
||||||
|
Den unge kvinna ville ikke ha elska ham
|
||||||
|
La joven mujer no lo habría amado
|
||||||
|
La giovane donna non lo avrebbe amato
|
||||||
|
La jeune femme ne l' aurait pas aimé
|
||||||
|
Nuori nainen ei olisi rakastanut häntä
|
||||||
|
```
|
||||||
|
This is either exhaustive or random, possibly
|
||||||
|
with probability weights attached to constructors.
|
||||||
|
|
||||||
|
A special case is **corpus generation**: just leave one language.
|
||||||
|
|
||||||
|
Can this be useful? Cf. Rebecca Jonson this afternoon.
|
||||||
|
|
||||||
|
|
||||||
#NEW
|
#NEW
|
||||||
==Related work==
|
==Related work==
|
||||||
|
|
||||||
@@ -818,10 +898,23 @@ CLE = Core Language Engine
|
|||||||
- therefore, transfer at compile time as often as possible
|
- therefore, transfer at compile time as often as possible
|
||||||
|
|
||||||
|
|
||||||
Lingo Matrix project (HPSG)
|
LinGo Matrix project (HPSG)
|
||||||
- methodology rather than formal discipline for multilingual grammars
|
- methodology rather than formal discipline for multilingual grammars
|
||||||
- wider coverage
|
|
||||||
- not aimed as library, no grammar specialization?
|
- not aimed as library, no grammar specialization?
|
||||||
|
- wider coverage - parsing real texts
|
||||||
|
|
||||||
|
|
||||||
|
Parsing detached from grammar (Nivre) - grammar detached from parsing
|
||||||
|
|
||||||
|
#NEW
|
||||||
|
==Demo==
|
||||||
|
|
||||||
|
Stoneage grammar, based on the Swadesh word list.
|
||||||
|
|
||||||
|
Implemented as application on top of the resource grammar.
|
||||||
|
|
||||||
|
Illustrate generation and spoken-language parsing.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
%http://www.boost.org/
|
%http://www.boost.org/
|
||||||
|
|||||||