Steps for Extending RGL to a Large Scale Translation Grammar



We will add Dutch to the system of big translation grammars.

=The Translate grammar=

This is where we are:

  $ pwd
  /Users/aarne/GF/lib/src/translator

We start from the files for German:

  $ ls -l *Ger.gf
  -rw-r--r-- 1 aarne staff 1615550 Apr 10 23:38 DictionaryGer.gf
  -rw-r--r-- 1 aarne staff 3042 Jan 22 15:39 ExtensionsGer.gf
  -rw-r--r-- 1 aarne staff 662 Apr 9 11:14 TranslateGer.gf

We make copies of two of them:

  $ cp -p ExtensionsGer.gf ExtensionsDut.gf
  $ cp -p TranslateGer.gf TranslateDut.gf

Then we change Ger->Dut in these files.

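The copy-and-rename step can also be done in one go. The following is only a sketch: rename_lang is a hypothetical helper, not part of GF, and it assumes that the language suffix always ends an identifier (as in ExtensionsGer or CatGer), so that a word like "German" in a comment is left untouched.

```shell
# Sketch only: rename_lang is a hypothetical helper, not part of GF.
# $1 = source file, $2 = target file, $3 = old suffix, $4 = new suffix
rename_lang() {
    # Rewrite the suffix only where it ends an identifier, i.e. where it is
    # followed by a non-identifier character or by the end of the line.
    sed -E "s/$3([^A-Za-z0-9_]|\$)/$4\\1/g" "$1" > "$2"
}

# Usage, mirroring the cp commands above:
#   rename_lang ExtensionsGer.gf ExtensionsDut.gf Ger Dut
#   rename_lang TranslateGer.gf  TranslateDut.gf  Ger Dut
```

Unlike cp -p, this does not preserve the time stamps of the original files, and the result should still be checked by eye.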
We take the common parts of a dictionary; Ger doesn't have them in this form, but Spa does:

  $ grep "L\." DictionarySpa.gf >DictionaryDut.gf
  $ grep "S\." DictionarySpa.gf >>DictionaryDut.gf

Then we add a header, copied from DictionarySpa with Spa changed to Dut, and of course a closing "}" at the end. The header in DictionarySpa reads:

  concrete DictionarySpa of Dictionary = CatSpa
  ** open ParadigmsSpa, MorphoSpa, IrregSpa, (L=LexiconSpa), (S=StructuralSpa), Prelude in {

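The three steps (renamed header, L. and S. entries, closing brace) can be combined as follows. This is a sketch under an assumption that the source does not state: that the header of the existing dictionary ends at the first line containing "in {". The name make_dictionary is hypothetical.

```shell
# Sketch only: make_dictionary is a hypothetical helper, not part of GF.
# $1 = existing dictionary, $2 = old language suffix, $3 = new language suffix
# Assumption: the module header ends at the first line containing "in {".
make_dictionary() {
    # Header, with the language suffix renamed.
    sed -n '1,/in {/p' "$1" | sed "s/$2/$3/g"
    # Entries that just reuse Lexicon (L.) and Structural (S.).
    grep 'L\.' "$1"
    grep 'S\.' "$1"
    # Closing brace of the module.
    echo '}'
}

# Usage:
#   make_dictionary DictionarySpa.gf Spa Dut > DictionaryDut.gf
```

As with the two grep commands above, a line that mentions both L. and S. would be copied twice.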
We can now try to compile this, using -s to suppress 60k warnings about missing linearizations:

  $ gf -s DictionaryDut.gf

This goes fine, but what about the translator itself?

  $ gf -s TranslateDut.gf
  File TenseDut.gf does not exist.

Just change TenseDut to TenseX in TranslateDut, as in many other languages, since Dutch has no special tenses. Try again (in the GF shell):

  > i TranslateDut.gf
  File ConstructionDut.gf does not exist.

Let us just comment this inheritance out from TranslateDut, as in some other languages where this module is not yet available. The same goes for DocumentationDut.

  ---- ConstructionDut,
  ---- DocumentationDut,

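Commenting out a module reference can be scripted. The sketch below assumes that each reference stands on a line of its own, as in the two lines above. The name comment_out is hypothetical.

```shell
# Sketch only: comment_out is a hypothetical helper, not part of GF.
# $1 = file to edit in place, $2 = text identifying the lines to disable
comment_out() {
    # Leave lines that already start with a comment alone; prefix the
    # remaining lines that mention $2 with four dashes.
    sed -e '/^ *--/b' -e "/$2/s/^/---- /" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage:
#   comment_out TranslateDut.gf ConstructionDut
#   comment_out TranslateDut.gf DocumentationDut
```

Since the four dashes mark things "to be fixed soon", a later grep for "----" finds everything that still needs attention.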
I use four dashes for comments meaning "to be fixed soon". Try again:

  > i TranslateDut.gf
  File ChunkDut.gf does not exist.

This is more critical, since we want a robust translator! Let's fix this:

  $ cd ../chunk/
  $ cp -p ChunkGer.gf ChunkDut.gf
  $ cd ../translator/

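Instead of discovering missing files one error at a time, one can list the German modules in a directory that have no Dutch counterpart yet. This is only a sketch: missing_counterparts is a hypothetical helper, and it compares file names within a single directory, so it says nothing about what the grammar actually imports.

```shell
# Sketch only: missing_counterparts is a hypothetical helper, not part of GF.
# $1 = directory, $2 = existing language suffix, $3 = new language suffix
missing_counterparts() {
    for src in "$1"/*"$2".gf; do
        # If nothing matched, the pattern is left as is; skip it.
        [ -e "$src" ] || continue
        new="${src%"$2".gf}$3.gf"
        # Print the name of each counterpart that does not exist yet.
        [ -e "$new" ] || echo "${new##*/}"
    done
}

# Usage:
#   missing_counterparts ../chunk Ger Dut
```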
Again, go to ChunkDut.gf and change Ger->Dut. Also look for double quotes and translate the strings inside them into Dutch. E.g.

  copula_inf_Chunk = ss "sein"   -->   copula_inf_Chunk = ss "zijn"

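To find the strings that need translating, it helps to list every line with a quoted string together with its line number. A minimal sketch, where list_strings is a hypothetical helper name:

```shell
# Sketch only: list_strings is a hypothetical helper, not part of GF.
# $1 = grammar file; prints "line number:line" for each line with a string
list_strings() {
    grep -n '"[^"]*"' "$1"
}

# Usage:
#   list_strings ChunkDut.gf
```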
Now try again (in GF):

  > i TranslateDut.gf
  Warning: In inherited module Extensions,
  ...
  no occurrence of element BaseVPI

Now we notice that ExtraDut is just a dummy module. We comment out all references to it in ExtensionsDut; of course we will fix ExtraDut later. E.g.

  ---- BaseVPI = E.BaseVPI ;

We could continue commenting out things that don't compile. Alternatively, we could just give up and comment out ExtensionsDut from TranslateDut. It doesn't use many functions anyway...

  ---- ExtensionsDut [CompoundCN,AdAdV,UttAdV,ApposNP,MkVPI, MkVPS, PredVPS, PassVPSlash, PassAgentVPSlash],

Unfortunately, ChunkDut also needs it. So let's at least make it compile by commenting out all offending functions. There is not much left, and in ChunkDut we also comment out whatever the compiler complains about, with four dashes. We obtain

  concrete ChunkDut of Chunk = CatDut
  ---- , ExtensionsDut
  **
  ChunkFunctor - [UseVC, VPS_Chunk, emptyNP, VPI_Chunk]
  with (Syntax = SyntaxDut), (Extensions = ExtensionsDut)
  **
  open
  SyntaxDut, (E = ExtensionsDut), Prelude,
  ResDut, (P = ParadigmsDut) in {

Et voilà:

  > i TranslateDut.gf
  linking ... OK

  Languages: TranslateDut

Let us try it:

  > gr | l -treebank
  Translate: ChunkPhr (PlusChunk fullstop_Chunk (OneChunk refl_SgP1_Chunk))
  TranslateDut: * . mij zelf

Let us make it compilable in GF/lib/src/Makefile by adding entries for TranslateDut and Translate11, since we now have 11 languages. Again, we can look for TranslateGer and make a copy beside it, and likewise for Translate10:

  TranslateGer: TranslateGer.pgf
  TranslateDut: TranslateDut.pgf

  TranslateDut.pgf:: ; $(GFMKT) -name=TranslateDut translator/TranslateDut.gf

  # Without dependencies:
  Translate11:
          $(GFMKT) -name=Translate11 $(TRANSLATE11) +RTS -K32M

  # With dependencies:
  Translate11.pgf: $(TRANSLATE11)
          $(GFMKT) -name=Translate11 $(TRANSLATE11) +RTS -K32M

Since we have everything up to date in Translate10, let us just add the necessary new things to include Dut:

  $ pwd
  /Users/aarne/GF/lib/src

  $ make TranslateDut.pgf

  $ make Translate11

We can first try it in the plain C runtime:

  $ pgf-translate Translate11.pgf Phr TranslateEng TranslateDut
  > what is this
  0.07 sec
  [18.070923] ChunkPhr (OneChunk (QS_Chunk (UseQCl (TTAnt TPres ASimul) PPos (QuestIComp (CompIP whatSg_IP)
      (DetNP (DetQuant this_Quant NumSg))))))
  * wat is dit
  wat is dit
  > can we translate now
  0.19 sec
  [35.258053] ChunkPhr (OneChunk (QS_Chunk (UseQCl (TTAnt TPres ASimul) PPos (QuestCl (PredVP (UsePron we_Pron)
      (AdvVP (ComplVV can_1_VV (UseV translate_V)) now_Adv))))))
  * kunnen we nu [translate_V]
  kunnen we nu [translate_V]

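In the output above, translate_V appears in square brackets because the Dutch dictionary has no linearization for it yet. Such names can be collected from saved translator output to see which words to add first. A sketch, where missing_words is a hypothetical helper name and the input is assumed to look like the session above:

```shell
# Sketch only: missing_words is a hypothetical helper, not part of GF.
# Reads translator output on standard input and prints each bracketed
# function name once. The leading probabilities such as [18.070923] are
# skipped, because a name must start with a letter and contain no dot.
missing_words() {
    grep -o '\[[A-Za-z][A-Za-z0-9_]*\]' | sed -e 's/^\[//' -e 's/\]$//' | sort -u
}

# Usage, with the output saved in a file:
#   missing_words < translate-session.txt
```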
What about the web application?

First make the new grammar accessible:

  $ cd GF/src/www/robust/
  $ ls
  App10.pgf Translate10.pgf Translate8.pgf
  $ ln -s /Users/aarne/GF/lib/src/Translate11.pgf

Then update the reference to this grammar by changing Translate10 to Translate11 in one place:

  $ cd ..
  $ grep Translate10 */*.js
  js/gftranslate.js:gftranslate.jsonurl="/robust/Translate10.pgf"

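The one-place change can be made with sed instead of an editor. A sketch, where bump_grammar is a hypothetical helper name; it rewrites only occurrences followed by ".pgf".

```shell
# Sketch only: bump_grammar is a hypothetical helper, not part of GF.
# $1 = file to edit in place, $2 = old grammar name, $3 = new grammar name
bump_grammar() {
    sed "s/$2\\.pgf/$3.pgf/g" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Usage:
#   bump_grammar js/gftranslate.js Translate10 Translate11
```

Running the grep command again afterwards should then print nothing.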
Try starting the GF server:

  $ gf -server --document-root=/Users/aarne/GF/src/www/

Point your browser to http://localhost:41296/wc.html

Wait a bit, and you will see Dutch among the available languages!


=Building the Android app=

Navigate to the App directory and create AppDut, again changing Ger->Dut as before:

  $ pwd
  /Users/aarne/GF/examples/app

  $ cp -p AppGer.gf AppDut.gf

Extend the Makefile as before:

  TRANSLATE11=$(TRANSLATE10) AppDut.pgf
  # Without dependencies:
  App11:
          $(GFMKT) -name=App11 $(TRANSLATE11) +RTS -K200M

Make it:

  $ make AppDut.pgf
  $ make App11

Check that all languages are consistently included:

  $ gf +RTS -K200M App11.pgf
  Languages: AppBul AppChi AppDut AppEng AppFin AppFre AppGer AppHin AppIta AppSpa AppSwe

  App> l house_N
  къща
  房 子
  huis
  house
  talo
  maison
  Haus
  शाला
  casa
  casa
  hus

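With eleven languages the "Languages:" line is easy to misread, so the check can be made mechanical. A sketch, where check_languages is a hypothetical helper that takes the line as copied from the GF shell:

```shell
# Sketch only: check_languages is a hypothetical helper, not part of GF.
# $1 = grammar prefix (e.g. App), $2 = the "Languages:" line,
# remaining arguments = the language suffixes that are expected
check_languages() {
    prefix=$1
    line=$2
    shift 2
    status=0
    for lang in "$@"; do
        case " $line " in
            *" $prefix$lang "*) ;;                       # present
            *) echo "missing: $prefix$lang"; status=1 ;;
        esac
    done
    return $status
}

# Usage:
#   check_languages App "Languages: AppBul AppChi AppDut AppEng" Bul Chi Dut Eng
```

The function prints one line per missing language and returns a non-zero status if any is missing.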
Now follow the instructions in the README in the app/ directory. You also need to add the following line to Translator.java, near the reference to AppGer:

  new Language("nl-NL", "Dutch", "AppDut", R.xml.qwerty),


=The TopDictionary=

Once you have DictionaryDut, go to GF/lib/src/translator/ and do

  $ ghci
  Prelude> :l CheckDict.hs
  *Main> createConcrete "Dut"

This creates the file GF/lib/src/translator/todo/tmp/TopDictionaryDut.gf, which has words in frequency order. Copy this one level up, to GF/lib/src/translator/todo/TopDictionaryDut.gf, and follow the instructions in

  http://www.grammaticalframework.org/lib/src/translator/todo/check-dictionary.html

to improve the dictionary in frequency order.