mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-22 19:22:50 -06:00
AR 28/3/2013

26/3 Morphology from Kotus.

27/3 Senses from Princeton.

27/3
Designed new paradigms. Filtered problematic/illegal things (PLURNOUN, ILLEGALVERB, POSTPONE, TODO).
Just 9035 lemmas missing now.

28/3
Set up an experiment with 3220 complete trees from Penn prepared by Krasimir. First results:
  561 no linearization
  960 lin with unknowns

around 20 missing syntax constructions, 230 missing words
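A minimal sketch of how the two counts above could be tallied from the output. The log does not describe the output format, so two things are assumed here: a tree with no linearization yields an empty string, and an unknown constant shows up as a bracketed token such as [part_of_N2]. The names classify and tally are hypothetical.

```python
import re
from collections import Counter

# Assumption: an unknown constant appears as a bracketed token, e.g. "[part_of_N2]".
UNKNOWN = re.compile(r"\[[^\]\s]+\]")

def classify(lin: str) -> str:
    """Sort one linearization string into one of three buckets."""
    if not lin.strip():
        # Assumption: a missing linearization is an empty (or blank) string.
        return "no linearization"
    if UNKNOWN.search(lin):
        return "lin with unknowns"
    return "complete"

def tally(lins):
    """Count how many linearizations fall into each bucket."""
    return Counter(classify(lin) for lin in lins)

# Illustrative input only, not taken from the experiment.
sample = ["", "hän näki [part_of_N2] talosta", "hän nukkui"]
for bucket, n in tally(sample).items():
    print(n, bucket)
```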

Tests generated by

  gf -run ~/GF/lib/src/ParseEngFin.pgf <wsj.full >4-eng-fin-wsj.txt

with

  l -treebank -bind PhrUtt NoPConj (UttS (UseCl (TTAnt TPast ASimul)


29/3
Added most missing syntax constructions.
Some new opers in ParadigmsFin, and 230 more words in DictEngFin: out of 3220 Penn trees now 2721
are completely translated (but mostly not so well...)
  317 no lin
  182 lin with unknowns

After implementing GerundN and GerundNP, only 40 lin with unknowns. But the implementations are bad:
- applying them to a run-time V prevents correct vowel harmony
- composite forms with "minen" should use "mis", e.g. hinnoitteleminendetaljit should be hinnoittelemisdetaljit

Counting funs:

  gf ../GF/lib/src/ParseEng.pgf <wsj.full >funs-wsj.txt

with

  pt -funs PhrUtt NoPConj (UttS (UseCl (TTAnt TPast ASimul) ...

From this, with some ghci commands, created freq-wsj.txt, showing

  AdvVP 1174
  AdvNP 1075
  UsePron 749
  PossNP 749
  UseV 675
  in_Prep 671
  and_Conj 659
  UseComp 651
  IIDig 620

and a total of 4512 funs used in the 3220 trees.
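The ghci commands themselves are not recorded here. As a rough equivalent in Python (not the original method), the frequency table can be sketched as follows, assuming each input line holds the function names of one tree, possibly with parentheses left in. The name count_funs and the two sample trees are illustrative only.

```python
import re
from collections import Counter

# GF identifiers: letters, digits, underscores and primes, not starting with a digit.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_']*")

def count_funs(lines):
    """Count every occurrence of every function name across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(IDENT.findall(line))
    return counts

# Two made-up trees standing in for the 3220 Penn trees.
trees = [
    "PhrUtt NoPConj (UttS (UseCl (TTAnt TPast ASimul) PPos "
    "(PredVP (UsePron he_Pron) (UseV sleep_V)))) NoVoc",
    "PhrUtt NoPConj (UttS (UseCl (TTAnt TPast ASimul) PPos "
    "(PredVP (UsePron she_Pron) (UseV walk_V)))) NoVoc",
]

freq = count_funs(trees)
for fun, n in freq.most_common():
    print(fun, n)
print("distinct funs:", len(freq))
```

Here len(freq) plays the role of the total number of funs used, and most_common() gives the ordering of the table.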

Then created a list of missing funs in ParseFin: there are 8820 of them. However, only 80 missing funs appear in the corpus!

  some_Quant 72
  anyPl_Det 44
  part_of_N2 34
  both_Det 32
  most_Det 28
  ComplN2 21
  several_Num 19
  another_Quant 19
  UseN2 16
  neither_Det 11
  CNNumNP 8
  draw_V2 7
  aware_of_A2 7
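The step from "8820 missing" to "80 in the corpus" amounts to filtering the corpus frequency table by the set of funs that ParseFin has. A minimal sketch in Python, where missing_in_corpus is a hypothetical helper and fin_funs is a made-up stand-in for the real set of ParseFin funs (the counts are the ones listed in this log):

```python
def missing_in_corpus(freq, target_funs):
    """Return (fun, count) pairs for corpus funs absent from target_funs,
    most frequent first, ties broken alphabetically."""
    missing = {f: n for f, n in freq.items() if f not in target_funs}
    return sorted(missing.items(), key=lambda kv: (-kv[1], kv[0]))

# Corpus counts taken from the log; fin_funs is hypothetical.
freq = {"AdvVP": 1174, "some_Quant": 72, "anyPl_Det": 44, "UseV": 675}
fin_funs = {"AdvVP", "UseV"}

print(missing_in_corpus(freq, fin_funs))
# → [('some_Quant', 72), ('anyPl_Det', 44)]
```

Sorting by corpus count is what makes it possible to work down the list from the most frequent gaps.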

The next thing is to find out why ComplN2 and UseN2 are missing - they should be there.
It turned out that this happens just because there was no N2 in the lexicon. Strange... adding just
"part of" and "idea of" (as well as "familiar with") changes 35 sentences. Now only 9 with unknown
constants. 314 without lin.

Attacked the first ten missing constructs down to 4 occurrences. Now 13 with unknown, 167 missing.
Thus almost 95% complete.

Defined some more, down to the 34th with 2 occurrences. Now 32 missing, 18 with unknown
(version 7, 7-eng-fin-wsj.txt). Thus over 98% complete. Soon time to fix errors in the things covered!
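The completeness figures can be checked with a few lines of Python. This assumes every tree falls into exactly one of three buckets (complete, no lin, lin with unknowns), which the 29/3 numbers support (2721 + 317 + 182 = 3220), and it reads "missing" as "no lin". The stage labels are paraphrases, not names used in the experiment.

```python
TOTAL = 3220  # Penn trees in the experiment

def coverage(no_lin, with_unknowns, total=TOTAL):
    """Return (number of complete trees, percentage complete)."""
    complete = total - no_lin - with_unknowns
    return complete, 100.0 * complete / total

# (label, no lin, lin with unknowns), all counts taken from this log.
stages = [
    ("first run, 28/3", 561, 960),
    ("after syntax + 230 words", 317, 182),
    ("after adding N2 entries", 314, 9),
    ("after first ten constructs", 167, 13),
    ("version 7", 32, 18),
]

for label, no_lin, unk in stages:
    complete, pct = coverage(no_lin, unk)
    print(f"{label}: {complete} complete ({pct:.1f}%)")
# first run, 28/3: 1699 complete (52.8%)
# after syntax + 230 words: 2721 complete (84.5%)
# after adding N2 entries: 2897 complete (90.0%)
# after first ten constructs: 3040 complete (94.4%)
# version 7: 3170 complete (98.4%)
```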

Fixed obvious errors in "date" (taateli -> päivämäärä) and "force" (polttaminen -> voima). Effect on
24 examples.


30/3

Version 9: Changed subcats of 170 of 230 V2s (the ones with 3 occurrences or more). One hour's work. Changes in 1124
translations.

Also changed the default genitive of symbol (+n) to +in, to be uniform with the other cases. Works for words ending
in a consonant: Inteln -> Intelin. But a proper morphological analysis with dynamic lex extension is what would
really be needed.

Fixed NounFin.IndefArt, which erroneously added "yksi" to the substantival form of numeral determiners. This changed 125
linearizations - but there are some mistaken parses of numbers in the treebank, in particular years. Also fixed the passive
VP in the infinitive form, with better results in 95 sentences - but this structure should be different in Finnish.

Fixing passive past tenses improved 250 sentences! Incredibly, they had been missing in the RGL, as had the correct
form of the compound tenses: "minut ollaan nähty" -> "minut on nähty" ("I have been seen").

Fixed the form for NPossNom and NPossGen. It had mistakenly been the Nom form, which gave "rakkausnsa" ("his love").
The proper form is the tk-2 prefix of the essive case: "rakkautensa"; the tk-1 genitive won't do ("rakkaudensa").
This changed 81 sentences for the better.

Added NCompound, or form nr 10, to nouns. This may differ from Nom Sg, e.g. käteinenvirtaus -> käteisvirtaus. 107 errors
corrected by this.
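Both compound examples in this log (käteinen and hinnoitteleminen) are nouns in -nen, whose compound-initial form ends in -s. A simplified Python illustration of that one pattern, not the GF implementation of NCompound, with hypothetical function names:

```python
def compound_form(noun: str) -> str:
    """Compound-initial form of a noun.

    Simplification: only the -nen -> -s pattern is handled; every other
    noun falls back to its nominative singular.
    """
    if noun.endswith("nen"):
        return noun[:-3] + "s"
    return noun

def compound(first: str, second: str) -> str:
    """Glue two nouns together, using the compound form of the first."""
    return compound_form(first) + second

print(compound("käteinen", "virtaus"))           # → käteisvirtaus
print(compound("hinnoitteleminen", "detaljit"))  # → hinnoittelemisdetaljit
```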