1
0
forked from GitHub/gf-rgl
Files
gf-rgl/tests/german/object-order.README
Hans Leiss 8fb8ddd808 Ger: improved infinitives (and passives); tests with more verbs in testing/german
- NP: added field isLight in order to push negation behind light nps;
  this had been done in gf-3.9 using field isPron, but isPron is now
  used to put accusative pronoun before dative pronoun. Removed field
  adv: adverbial extensions cannot be extracted (todo: also for CN).
  Reduced isLight*isPron to w:Weight with 3 values: WPron, WLight, WHeavy.

- added param Control and field ctrl:Control to classify V2V-verbs into
  subject- and object-contol verbs, use ctrl to make reflexives agree
  with subject resp. object in VPSlash, and refine ComplSlash.

- Verb: new versions of ComplVV, SlashV2V and SlashVV to give better
  (nested) infinitives (extracting infzu and correcting object order).
  a) nested SlashVV doesn't work properly;
  b) SlashV2VNP may have to be commented out to prevent a stack overflow
     when compiling.
  Intended change of SlashV2VNP in tests/german/TestLangGer could not
  be tested due to size problems with the compiler.

- VP: changed field a1 : Polarity => Str to a1:Str to collect the adverbs
  coming before negation, using (negation : Polarity => Str) in mkClause.
  Use objCtrl:Bool instead of missingAdv to let reflexives agree with object.

- ResGer: insertObjNP reorganized, infzuVP added

- DictVerbsGer: some corrections (helft -> hilft, *sprecht -> *spricht)

- Some potential passive rules in tests/german/TestLangGer|Eng

- ExtraGer needs to be cleaned up with repect to the modified mkClause.
2019-09-18 15:16:42 +02:00

282 lines
13 KiB
Plaintext

Implementing pronoun switch e.a. in LangGer HL 13.6.2019 -- 20.6.2019
-------------------------------------------
Ternary verbs v:V3 with two objects of category NP order them
depending on their being (personal) pronouns or not. Basically
non-pronoun order: NonPronNP.dat < NonPronNP.acc
pronoun/nonpronoun: Pron < NonPronNP
pronoun order: Pron.acc < Pron.dat
See also (section II):
http://www.dartmouth.edu/~deutsch/Grammatik/WordOrder/WordOrder.html
What about verbs with other complement cases? Apparently we have
- NP.acc < NP.gen: wir verdächtigen ihn|den Mann ihrer|der Tat
- NP.acc[indir] < NonPronNP.acc[dir]: wir lehren ihn|den Studenten die Kunst
- Pron.acc[dir] < Pron.acc[indir]: wir lehren sie ihn (?)
A collection of relevant example sentences to do some regression tests is
contained in examples.txt. (Definiteness seems to be relevant to order, too.)
============== Main changes made: (cf. discussion below) ======================
1. Categories VP and VPSlash have nn : Agr => Str * Str * Str * Str, where now
nn.p1 contains refl+pron (pron.acc < refl, refl < pron.dat, refl < pron.gen),
nn.p2 contains nonpron NPs (np.dat < np.acc | np.gen) (cf. insertObjNP below)
nn.p3 contains prep NPs
nn.p4 contains predicative A | CN | Adv (inserted by UseComp)
Note: keeping complements in 4 nn-fields may be useful to insert adverbs in between
(not done yet), besides ordering them relative to negation (see below).
Note: become_VA is not treated like a copula (i.e. not using CompAP) (also in Eng):
"bin alt" = (UseComp (CompAP adj)) adds adj to nn.p3 (was:nn.p2),
"werde alt" = (ComplVA become_VA adj), inserts adj into vp.adj
So there is no uniform treatment of copula verbs "sein", "werden", "bleiben".
2. Pronoun switch is done by (insertObjectNP np vp.c vp), such that pron.acc < refl < pron.dat|gen:
(insertObjNP pron acc vp).nn = <pron.acc ++ vp.nn.p1(refl), vp.nn.p2, vp.nn.p3, vp.nn.p4>
(insertObjNP pron case vp).nn = <vp.nn.p1(refl) ++ pron!case, vp.nn.p2, vp.nn.p3, vp.nn.p4>
For other object np's, we enforce np.dat < np.acc|gen: (this doesn't enforce np.acc < np.gen)
(insertObjNP np dat vp).nn = <vp.nn.p1, np!dat ++ vp.nn.p2, vp.nn.p3, vp.nn.p4>
(insertObjNP np case vp).nn = <vp.nn.p1, vp.nn.p2 ++ np!case, vp.nn.p3, vp.nn.p4>
Object pp's are collected in nn.p3:
(insertObjNP np prep vp).nn = <vp.nn.p1, vp.nn.p2, vp.nn.p3 ++ app(prep,np), vp.nn.p4>
Complements (AP|Adv|CN) are collected in vp.nn.p4, using the existing insertObj:
(insertObj compl vp).nn = <vp.nn.p1, vp.nn.p2, vpnn.p3, compl ++ vp.nn.p4>
For verbs v:V3 with 2 acc-arguments, "ich lehre ihn sie|die Mathematik", we can't distinguish
direct object (acc: die Mathematik) from indirect object (acc: ihn), so we get two trees.
Bug: ConstructionGer (mkVP have_V2 (mkNP n:N)) (for "Angst|Recht haben") puts n into nn.p2,
which comes before negation.
(Maybe we need np.isLight to prevent this and put n in nn.p3, or apply UseComp with n.)
3. The ordering of objects, complements and negation in mkClause is changed !!!
The "default" order (if it exists) is subtle, depending on whether objects are
definite, indefinite Sg, indefinitePl, pron, quantified, negated indefinite. We now have
(mkClause subj agr vp) : Clause =
let
obj1 = (vp.nn ! agr).p1 ++ (vp.nn ! agr).p2; -- refl,pronouns < nonpronouns
obj2 = (vp.nn ! agr).p3 ; -- (prep + np)s
compl = (vp.nn ! agr).p4 ++ vp.adj ++ vp.a2 ; -- compl via useComp
in
Main => subj ++ verb.fin ++ obj1 ++ neg ++ obj2 ++ compl ++ vp.infExt ++ infs ++ extra ;
Inv => verb.fin ++ subj ++ obj1 ++ neg ++ obj2 ++ compl ++ vp.infExt ++ infs ++ extra ;
Sub => subj ++ obj1 ++ neg ++ obj2 ++ compl ++ vp.infExt ++ inffin ++ extra
I have *removed* the difference between "light" and "heavy" complements, which gave the
ordering light < neg < heavy. It was using np.isPron=True as "light", but also set
indefinite nps (and DetNPs) as heavy. But this ignored the number, which is also relevant:
ich sehe den Mann nicht ; ich sehe nicht einen Mann
but ich liebe Männer nicht ; * ich liebe nicht Männer
The change now gives: nonpronNP < neg,
ich sehe einen Mann nicht [ ich sehe keinen Mann: via no_Predet ]
ich trinke warmes Bier nicht [ ich trinke nicht Bier: via no_Predet ]
The order accPron < refl < (gen|dat)Pron < neg < nonpronNP sometimes sounds
better, but expresses a different meaning (often available via no_Predet):
sie hat sich nicht alle|viele|?mehrere Namen gemerkt
sie hat sich alle?|viele|mehrere Namen nicht gemerkt
The implemented order nonpronNP < neg gives negation narrow scope relative to the
quantifiers in the objects (a meaning that cannot otherwise be expressed):
einige Lehrer haben jedem Studenten viele Bücher nicht geschickt|empfohlen
=?= some teachers haven't sent|recommended many books to every student
For tests, see examples.txt and how to do regression tests (see below).
Rem.: Having more nn-fields may be useful to put adverbs in between (with
additional scope problems).
4. For reflexive V2's (ich bediene mich einer Sache, ich merke mir eine Sache) or
reflexive V3's (ich entschuldige mich bei dat für acc, ich leihe mir acc bei dat),
some tests are in examples.txt and TestLangGer|Eng. We have enforced refl < neg.
TestLangGer introduces ternary predicates VPSlashSlash. These can be built by
Slash2V4, Slash3V4, Slash4V4 from quaternary verbs v:V4 and a noun phrase np:NP.
(A function SlashV3a : V3 -> VPSlashSlash is omitted to reduce ambiguities.)
SlashV2a turns a (DictVerbsGer-) verb_rV2:V2 into a reflexive VPSlash:
SlashV2a bedienen_gen_rV2 : VPSlash = sich einer Sache bedienen
A reflexive VPSlash can also be built from a V3 by
ReflVPSlash : V3 -> VPSlash,
but maybe this is unnecessary, as
(ComplSlash (ReflVPSlash v3) np) = (ReflVP (Slash3V3 v3 np)).
Todo: Some of this ought to go to ExtraGer.gf.
5. I changed ParadigmsGer.accdatV3 from "mkV3 v dat acc" to "mkV3 v acc dat"
= dirV3 v dat, so that it fits to "dirV3 v p" in Eng and gives the corresponding
trees for sentences with main verb v:V3.
============= Motivating discussion of the situation in gf-3.9 / gf-rgl ==============
LexiconGer has those V3s:
add_V3 = dirV3 (prefixV "hinzu" (regV "fügen")) zu_Prep ;
give_V3 = accdatV3 Irreg.geben_V ;
sell_V3 = accdatV3 (no_geV (regV "verkaufen")) ;
send_V3 = accdatV3 (regV "schicken") ;
talk_V3 = mkV3 (regV "reden") datPrep von_Prep ;
ParadigmsGer defines
mkV3 = overload {
mkV3 : V -> V3 = \v -> lin V3 (v ** {c2 = accPrep ; c3 = datPrep}) ;
mkV3 : V -> Prep -> Prep -> V3 = \v,c,d -> lin V3 (v ** {c2 = c ; c3 = d}) ;
} ;
dirV3 v p = mkV3 v accPrep p ; -- v ** {c2=accPrep; c3=p}
accdatV3 v = mkV3 v datPrep accPrep ; -- v ** {c2=datPrep; c3=accPrep}
LexiconEng says, using, roughly, dirV3 v p = v ** {c2=noPrep ; c3=p}:
give_V3 = mkV3 give_V noPrep noPrep ;
sell_V3 = dirV3 (irregV "sell" "sold" "sold") toP ;
send_V3 = dirV3 (irregV "send" "sent" "sent") toP ;
Apparently, the idea is:
(Ger) direct object = acc = c2; indirect object = dat|gen = c3
(Eng) direct object = noPrep = c2; indirect object = toPrep = c3
BUT then, accdatV3 v should be = dirV3 v datPrep = v**{c2=accPrep,c3=datPrep} !!
Which object is bound "closer" to the verb? Is this regulated by using
Slash2V3 versus Slash3V3, and does this binding strength manifest
itself outside of extraction phenomena?
abstract/Verb.gf says:
ComplSlash : VPSlash -> NP -> VP ; -- love it
SlashV2a : V2 -> VPSlash ; -- love (it)
Slash2V3 : V3 -> NP -> VPSlash ; -- give it (to her)
Slash3V3 : V3 -> NP -> VPSlash ; -- give (it) to her
Roughly, gf-3.9/../VerbGer.gf has:
Slash2V3 v np = insertObjc np!v.c2 (predVc v) ** {c2 = v.c3}
Slash3V3 v np = insertObjc np!v.c3 (predVc v) ** {c2 = v.c2}
So, regardless if any object comes with a preposition,
Slash2V3 v np binds direct object c2 to the verb,
Slash3V3 v np binds indirect object c3 to the verb.
But which is direct, which indirect for acc+acc-verbs:
sie lehrt ihn die Kunst, probably: c2=die Kunst, c3=ihn
And which object is direct, which indirect, for prep+prep-verbs?
sie redet mit ihm über die Kunst: c2=die Kunst, c3=ihm ?
PROBLEM: who tells the user which argument is direct, which not?
Eng: sell_V3 = dirV3 sell_V toP, so c2="", c3="to"
talk_V3 = mkV3 (regV "talk") toP aboutP, so c2="to", c3="about"
(Isn't this inconsistent? Shouldn't we have "mkV3 v dir indir"?)
Ger: sell_V3 = accdatV3 verkaufen_V
= mkV3 verkaufen_V datPrep accPrep, so c2=dat, c3=acc
To get trees with similar meaning, I CHANGED accdatV3 to "mkV3 v acc dat"
in ParadigmsGer (so that it fits to "dirV3 v p" in Eng).
The best would be if mkV3 (with acc-obj) were only available through
dirV3 v p, so one could not use (mkV3 v datPrep accPrep) etc.
---------------- word order in ResGer.mkClause in gf-rgl -------------------
In gf-3.9 resp. gf-rgl, VP.nn : Str*Str collects the nominal (and
adjectival) objects; those object-NPs with flag
isPron = True ; --- means: this is not a heavy NP, but comes before negation
are put before the negation in mkClause:
obj0 = (vp.nn ! agr).p1 ;
obj = (vp.nn ! agr).p2 ;
compl = obj0 ++ neg ++ vp.adj ++ obj ++ vp.a2 ; -- adj added
inf = vp.inf ++ verb.inf.p1 ; -- not used for linearisation of Main/Inv
extra = vp.ext ;
inffin : Str = case <a,vp.isAux> of {
<Anter,True> => verb.fin ++ inf ; -- double inf --# notpresent
_ => inf ++ verb.fin --- or just auxiliary vp
} ;
in case o of {
Main => subj ++ verb.fin ++ compl ++ vp.infExt ++ verb.inf ++ extra ++ vp.inf ;
Inv => verb.fin ++ subj ++ compl ++ vp.infExt ++ verb.inf ++ extra ++ vp.inf ;
Sub => subj ++ compl ++ vp.infExt ++ inffin ++ extra
This is too simple:
DetCN creates an NP with
isPron = det.isDef ; -- ich sehe den Mann nicht vs. ich sehe nicht einen Mann
i.e. (definite article + CN | pronoun) are put in nn.p1, to come before negation
ich sehe ihn nicht, ich sehe den Mann nicht: compl = (nn.p1 + neg + ..)
ich sehe nicht einen Mann : compl = neg ++ nn.p2
=? ich sehe keinen Mann
But: plural indefinite NPs behave different:
I don't see men: ich sehe Männer nicht
ich sehe *(nicht Männer) | keine Männer
Also, singular mass-NPs behave different:
ich trinke nicht *((kaltes) Bier)
ich trinke (kaltes) Bier nicht | ich trinke kein (kaltes) Bier
PROBLEM: do V2 + neg behave the same as V3 + neg? Aren't the relative
scopes of negation and quantifiers fixed (or restricted) by intonation?
------------- Generating some example trees and linearize them (LangGer|Eng) -----------
gr -tr -number=4 UseCl (TTAnt ? ?) ? (PredVP (UsePron i_Pron) (ComplSlash (SlashVV want_VV (SlashV2a see_V2)) (DetCN (DetQuant ? ?) (UseN man_N)))) | l
Pronoun switch with V3 and Slash?V2 works:
l UseCl (TTAnt TPast AAnter) PPos (PredVP (UsePron i_Pron) (ComplSlash (Slash3V3 sell_V3 (DetCN (DetQuant IndefArt NumSg) (UseN woman_N))) (DetCN (DetQuant DefArt NumSg) (UseN book_N))))
I had sold the book to a woman
ich hatte einer Frau das Buch verkauft
l UseCl (TTAnt TCond ASimul) PNeg (PredVP (UsePron i_Pron) (ComplSlash (Slash3V3 sell_V3 (UsePron she_Pron)) (UsePron it_Pron)))
I wouldn't sell it to her
ich würde es ihr nicht verkaufen
------ Regression tests: use gf --run < object-order.gfs or gf> eh object-order.gfs
Form Ger to Eng:
example.txt contains german example sentences marked "positive", "negative", "dubious",
some with two marks. The marks may not always be convincing, as some orderings of negation
and quantified nps afford particular intonation and meaning. (Also, there are incorrect
parse trees due to misuse of MassNP etc., so it needs some inspection to see if the content
of examples.*.out is as it ought to be.)
Part of examples.txt needs TestLangGer|Eng for parsing and translation, in particular
those with reflexive ternary verbs or quaternary verbs (which are not in the RGL).
From Eng to Ger:
examples.eng.txt could also be parsed using LangEng instead of TestLangEng|Ger.
Lang> rf -file=examples.eng.txt -lines | p -lang=LangEng | l -lang="Eng,Ger" -treebank | wf -file=examples.eng.tmp
Lang> rf -file=examples.eng.txt -lines | p -lang=LangEng | l -lang="Eng,Ger" | wf -file=examples.eng2ger.new
Using give_V3 is confusing, as both objects are connected with noPrep. The examples are
repeated using send_V3, which attaches its indirect object with toPrep.
-------------------------------------------------------------------------------------End