mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-09 13:09:33 -06:00
689 lines
20 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
<TITLE>MOLTO Multilingual Phrasebook</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>MOLTO Multilingual Phrasebook</H1>
<FONT SIZE="4">
<I>Krasimir Angelov, Olga Caprotti, Ramona Enache, Thomas Hallgren, Inari Listenmaa, Aarne Ranta, Jordi Saludes, Adam Slaski</I><BR>
Showcase for project FP7-ICT-247914, Deliverable D10.2.
</FONT></CENTER>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<UL>
<LI><A HREF="#toc1">Purpose</A>
<LI><A HREF="#toc2">Points illustrated</A>
<UL>
<LI><A HREF="#toc3">From the user perspective</A>
<LI><A HREF="#toc4">From the programmer's perspective</A>
</UL>
<LI><A HREF="#toc5">Files</A>
<UL>
<LI><A HREF="#toc6">Grammars</A>
<LI><A HREF="#toc7">Ontology</A>
<LI><A HREF="#toc8">Run-time system and user interface</A>
</UL>
<LI><A HREF="#toc9">Effort and cost</A>
<LI><A HREF="#toc10">Example-based grammar writing prototype</A>
<LI><A HREF="#toc11">To Do</A>
<LI><A HREF="#toc12">How to contribute</A>
<LI><A HREF="#toc13">Conclusions (tentative)</A>
<LI><A HREF="#toc14">Acknowledgements</A>
</UL>

<P></P>
<HR NOSHADE SIZE=1>
<P></P>
<P>
<HR>
<font size=-1>
</P>
<P>
History
</P>
<UL>
<LI>1 September. Version 1.1: bug fixes, some new constructions.
<LI>2 June. Version 1.0 released!
<LI>29 May. Link to Google translate with the current language pair and phrase.
<LI>27 May. Polish added.
<LI>26 May. Version 0.9:
Catalan added, mass/count noun distinction to reduce overgeneration,
improved web interface.
<LI>20 May. Version 0.8:
Spanish added, Bulgarian complete.
<LI>9 May. Version 0.7:
Danish and Norwegian added (preliminary versions induced from statistical models
and resource grammars).
<LI>3 May. Version 0.6:
Extended API (now final for release), Dutch added; new user interface with text
input enabled.
<LI>10 April. Some additions in API, comments in implementation; regenerated clones.
<LI>8 April. Added German.
<LI>7 April. Added the Clone script, applied to initiate the rest of MOLTO languages.
<LI>6 April. Version 0.4: weekdays, nationalities
<LI>30 March. Version 0.3: disambiguation grammar for English
<LI>28 March. Version 0.2: Swe, Ita; cat Action; small phrases.
<LI>26 March 2010. Version 0.1: Eng, Fin, Fre, Ron; dedicated minibar UI.
</UL>

<P>
<A HREF="missing.txt">Missing constructs</A>
</P>
<P>
<A HREF="http://www.grammaticalframework.org/demos/phrasebook/">Back to the phrasebook</A>
</P>
<P>
</font>
<HR>
</P>
<A NAME="toc1"></A>
<H1>Purpose</H1>
<P>
This phrasebook is a program for translating touristic phrases
between 14 European languages included in the
<A HREF="http://www.molto-project.eu">MOLTO</A> project
(Multilingual On-Line Translation):
</P>
<UL>
<LI>Bulgarian, Catalan, Danish, Dutch, English,
Finnish, French, German, Italian, Norwegian,
Polish, Romanian, Spanish, Swedish
</UL>

<P>
A Russian version is planned but not yet finished. Other languages may also be added.
</P>
<P>
The phrasebook is implemented in the GF programming language
(<A HREF="http://grammaticalframework.org">Grammatical Framework</A>).
It is the first demo for the MOLTO project, released in the third month (by June 2010).
The first version is a very small system, but it will be extended in the course of the project.
</P>
<P>
The phrasebook has the following requirement specification:
</P>
<UL>
<LI>high quality: reliable translations to express yourself in any of the languages
<LI>translation between all pairs of languages
<LI>runnable in web browsers
<LI>runnable on mobile phones (via web browser; Android stand-alone forthcoming)
<LI>easily extensible by new words (forthcoming: semi-automatic extensions by users)
</UL>

<P>
The phrasebook is available as open-source software, licensed under GNU LGPL.
The source code resides in
<A HREF="http://www.grammaticalframework.org/examples/phrasebook/"><CODE>www.grammaticalframework.org/examples/phrasebook/</CODE></A>
</P>
<A NAME="toc2"></A>
<H1>Points illustrated</H1>
<A NAME="toc3"></A>
<H2>From the user perspective</H2>
<P>
Interlingua-based translation
</P>
<UL>
<LI>we translate meanings, rather than words
</UL>
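<P>
As a rough illustration of what translating through an interlingua means, the
following Python sketch maps each phrase to a language-independent meaning and then
renders that meaning in the target language. The meaning names and the small phrase
table are made up for this example and are not taken from the Phrasebook grammars;
the real system parses and linearizes with GF grammars rather than looking up whole strings.
</P>
<PRE>
# Toy interlingua: each meaning is a language-independent key (a stand-in
# for an abstract syntax tree); each language maps meanings to strings.
# The keys and phrases are illustrative, not taken from the Phrasebook.
LINEARIZATIONS = {
    "GThanks": {"Eng": "thank you", "Fre": "merci", "Ita": "grazie"},
    "GGoodbye": {"Eng": "goodbye", "Fre": "au revoir", "Ita": "arrivederci"},
}

def parse(phrase, lang):
    """Return every meaning whose rendering in lang equals the phrase."""
    return [m for m, forms in LINEARIZATIONS.items() if forms.get(lang) == phrase]

def linearize(meaning, lang):
    """Render a meaning in the given language."""
    return LINEARIZATIONS[meaning][lang]

def translate(phrase, source, target):
    """Translate by going through the meaning, never word by word."""
    return [linearize(m, target) for m in parse(phrase, source)]
</PRE>
<P>
With this table, <CODE>translate("thank you", "Eng", "Ita")</CODE> gives <CODE>['grazie']</CODE>,
and a phrase that has no meaning in the table gives an empty list.
</P>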

<P>
Incremental parsing
</P>
<UL>
<LI>the user is at every point guided by the list of possible next words
</UL>
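<P>
The idea of guiding the user by possible next words can be sketched as follows.
This is a toy model under a strong assumption: the fixed list of phrases stands in
for everything the grammar can generate, whereas an incremental parser derives the
continuations from the grammar itself. The phrases and the function name are
invented for the example.
</P>
<PRE>
# Toy "fridge magnet" guidance: the phrases stand in for what a grammar
# can generate; a real incremental parser computes this from the grammar.
PHRASES = [
    "where is the station",
    "where is the hotel",
    "where are the toilets",
    "thank you",
]

def next_words(prefix):
    """Return the sorted list of words that can follow the given prefix."""
    typed = prefix.split()
    options = set()
    for phrase in PHRASES:
        words = phrase.split()
        # Keep phrases that start with the typed words and continue past them.
        if words[:len(typed)] == typed and len(words) != len(typed):
            options.add(words[len(typed)])
    return sorted(options)
</PRE>
<P>
For example, <CODE>next_words("where")</CODE> gives <CODE>['are', 'is']</CODE>, and a complete
phrase such as <CODE>"thank you"</CODE> has no continuations.
</P>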

<P>
Mixed modalities
</P>
<UL>
<LI>selection of words ("fridge magnets") combined with text input
</UL>

<P>
Quasi-incremental translation: many basic types are also used as phrases
</P>
<UL>
<LI>one can translate both words and complete sentences, and get intermediate results
</UL>

<P>
Disambiguation, esp. of politeness distinctions
</P>
<UL>
<LI>if a phrase has many translations, each of them is shown and given an explanation
(currently just in English, later in any source language)
</UL>
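<P>
A minimal sketch of this behaviour, assuming a hand-written table in place of the
disambiguation grammar: when a phrase has several readings, each translation is shown
together with a short explanation. The table entries and the function name are
illustrative only.
</P>
<PRE>
# Toy table: one English phrase, one or more French readings. The readings
# and explanations are illustrative, not taken from the Phrasebook grammars.
READINGS = {
    "how are you": [
        ("comment vas-tu", "familiar"),
        ("comment allez-vous", "polite"),
    ],
    "thank you": [("merci", None)],
}

def explain(phrase):
    """Return each translation, annotated only when the phrase is ambiguous."""
    readings = READINGS.get(phrase, [])
    if len(readings) == 1:
        return [readings[0][0]]
    return ["{} ({})".format(text, note) for text, note in readings]
</PRE>
<P>
Here <CODE>explain("how are you")</CODE> lists both readings with their politeness labels,
while the unambiguous <CODE>explain("thank you")</CODE> gives just <CODE>['merci']</CODE>.
</P>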

<P>
Fall-back to statistical translation
</P>
<UL>
<LI>currently just a link to Google translate (forthcoming: tailor-made statistical models)
</UL>

<P>
Feed-back from users
</P>
<UL>
<LI>users are welcome to send comments, bug reports, and better translation suggestions
</UL>

<A NAME="toc4"></A>
<H2>From the programmer's perspective</H2>
<P>
The use of resource grammars and functors
</P>
<UL>
<LI>the translator was implemented on top of an earlier linguistic knowledge base,
the <A HREF="http://www.grammaticalframework.org/lib">GF Resource Grammar Library</A>
</UL>

<P>
Example-based grammar writing and grammar induction from statistical models
(<A HREF="http://translate.google.com">Google translate</A>)
</P>
<UL>
<LI>many of the grammars were created semi-automatically by generalization from
examples
</UL>

<P>
Compile-time transfer, especially for <CODE>Action</CODE> in <CODE>Words</CODE>
</P>
<UL>
<LI>the structural differences between languages are treated at compile time,
for maximal run-time efficiency
</UL>

<P>
The level of skills involved in grammar development
</P>
<UL>
<LI>testing different configurations (see table below)
</UL>

<P>
Grammar testing
</P>
<UL>
<LI>use of treebanks with guided random generation for initial evaluation and regression testing
</UL>

<A NAME="toc5"></A>
<H1>Files</H1>
<A NAME="toc6"></A>
<H2>Grammars</H2>
<P>
<CODE>Sentences</CODE>: general syntactic structures implementable in a uniform way.
Concrete syntax via the functor <CODE>SentencesI</CODE>.
</P>
<P>
<CODE>Words</CODE>: words and predicates, typically language-dependent.
Separate concrete syntaxes.
</P>
<P>
<CODE>Greetings</CODE>: idiomatic phrases, string-based.
Separate concrete syntaxes.
</P>
<P>
<CODE>Phrasebook</CODE>: the top module putting everything together.
Separate concrete syntaxes.
</P>
<P>
<CODE>DisambPhrasebook</CODE>: disambiguation grammars, which generate feedback phrases when
a phrase in the input language is ambiguous.
</P>
<P>
<CODE>Numeral</CODE>: resource grammar module directly inherited from the library.
</P>
<P>
Here is the module structure, as produced in GF by the following commands:
</P>
<PRE>
> i -retain DisambPhrasebookEng.gf
> dg -only=Phrasebook*,Sentences*,Words*,Greetings*,Numeral,NumeralEng,DisambPhrasebookEng
> ! dot -Tpng _gfdepgraph.dot >pgraph.png
</PRE>
<P></P>
<P>
<IMG ALIGN="middle" SRC="pgraph.png" BORDER="0" ALT="">
</P>
<A NAME="toc7"></A>
<H2>Ontology</H2>
<P>
The abstract syntax defines the <B>ontology</B> behind the phrasebook.
Some explanations can be found in the
<A HREF="Ontology.html">ontology document</A>, which is produced from the
abstract syntax files
<A HREF="http://www.grammaticalframework.org/examples/phrasebook/Sentences.gf"><CODE>Sentences.gf</CODE></A>
and
<A HREF="http://www.grammaticalframework.org/examples/phrasebook/Words.gf"><CODE>Words.gf</CODE></A>
by <CODE>make doc</CODE>.
</P>
<A NAME="toc8"></A>
<H2>Run-time system and user interface</H2>
<P>
The phrasebook uses the
<A HREF="http://code.google.com/p/grammatical-framework/wiki/LaunchWebDemos">PGF server</A>,
written in Haskell, and the
<A HREF="http://www.grammaticalframework.org/demos/minibar/about.html">minibar library</A>,
written in JavaScript. Since the sources of these systems are available, anyone can build the phrasebook
locally on their own computer.
</P>
<A NAME="toc9"></A>
<H1>Effort and cost</H1>
<TABLE BORDER="1" CELLPADDING="4">
<TR>
<TH>Language</TH>
<TH>Grammarian's language skills</TH>
<TH>Grammarian's GF skills</TH>
<TH>Informant used for development</TH>
<TH>Informant used for testing</TH>
<TH>Use of external tools</TH>
<TH>Impact of external tools</TH>
<TH>Changes on the resource grammar</TH>
<TH COLSPAN="2">Development time</TH>
</TR>
<TR>
<TD>Bulgarian</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">?</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">##</TD>
</TR>
<TR>
<TD>Catalan</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">?</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">#</TD>
</TR>
<TR>
<TD>Danish</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">##</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">##</TD>
</TR>
<TR>
<TD>Dutch</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">##</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">##</TD>
</TR>
<TR>
<TD>English</TD>
<TD ALIGN="center">##</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">#</TD>
</TR>
<TR>
<TD>Finnish</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">?</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">##</TD>
</TR>
<TR>
<TD>French</TD>
<TD ALIGN="center">##</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">?</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">#</TD>
</TR>
<TR>
<TD>German</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">##</TD>
<TD ALIGN="center">##</TD>
<TD ALIGN="center">###</TD>
</TR>
<TR>
<TD>Italian</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">?</TD>
<TD ALIGN="center">##</TD>
<TD ALIGN="center">##</TD>
</TR>
<TR>
<TD>Norwegian</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">##</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">##</TD>
</TR>
<TR>
<TD>Polish</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">##</TD>
</TR>
<TR>
<TD>Romanian</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">###</TD>
</TR>
<TR>
<TD>Spanish</TD>
<TD ALIGN="center">##</TD>
<TD ALIGN="center">#</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">?</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">##</TD>
</TR>
<TR>
<TD>Swedish</TD>
<TD ALIGN="center">##</TD>
<TD ALIGN="center">###</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">+</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">?</TD>
<TD ALIGN="center">-</TD>
<TD ALIGN="center">##</TD>
</TR>
</TABLE>

<P>
Explanation of the scores
</P>
<UL>
<LI>Grammarian's language skills
<UL>
<LI>- : no skills
<LI># : passive knowledge
<LI>## : fluent non-native
<LI>### : native speaker
</UL>
</UL>

<UL>
<LI>Grammarian's GF skills
<UL>
<LI>- : no skills
<LI># : basic skills (2-day GF tutorial)
<LI>## : medium skills (previous experience of similar task)
<LI>### : advanced skills (resource grammar writer/substantial contributor)
</UL>
</UL>

<UL>
<LI>Informant used for development/Informant used for testing/Use of external tools
<UL>
<LI>- : no
<LI>+ : yes
</UL>
</UL>

<UL>
<LI>Impact of external tools
<UL>
<LI>? : not investigated
<LI>- : no effect on the Phrasebook
<LI># : small impact (literal translation, simple idioms)
<LI>## : medium effect (translation of more forms of words, contextual preposition)
<LI>### : great effect (no extra work needed, translations are correct)
</UL>
</UL>

<UL>
<LI>Changes on the resource grammars
<UL>
<LI>- : no changes
<LI># : 1-3 minor changes
<LI>## : 4-10 minor changes, 1-3 medium changes
<LI>### : >10 changes of any kind
</UL>
</UL>

<UL>
<LI>Development time (overall effort, including extra work on resource grammars)
<UL>
<LI># : less than 8 person hours
<LI>## : 8-24 person hours
<LI>### : >24 person hours
</UL>
</UL>

<A NAME="toc10"></A>
<H1>Example-based grammar writing prototype</H1>
<P>
The figure presents the process of creating a Phrasebook using an example-based
approach for a language X, where X is one of Danish, Dutch, German, and Norwegian.
</P>
<P>
<IMG ALIGN="middle" SRC="picpic.jpg" BORDER="0" ALT="">
</P>
<UL>
<LI>The first step analyses the resource grammar and extracts the
information needed by the functions that build new lexical entries.
A model is built so that the proper forms of each word can be rendered,
and additional information, such as gender, can be inferred. The script applies
these rules to each entry that is to be translated into the target language, which
yields a set of constructions.
<LI>These constructions are then given to an external translation tool (Google translate)
or to a native speaker for translation. The configuration file is needed even if the
translator is human, because formal knowledge of grammar is not assumed.
<LI>The translations into the target language are then processed further. First, the
information received is decoded to build the linearizations of the categories.
Then, with the words in the lexicon, the translations of
functions can be parsed with the GF parser and generalized from.
<LI>The resulting grammar is tested with the aid of a script that generates
constructions covering all the functions and categories of the grammar, along
with some other constructions that proved to be problematic in some language.
For each construction in the target language, the output of the script contains
its English counterpart and the abstract syntax tree. A native speaker
evaluates the results, and if corrections are needed, the algorithm is run again
with the new examples. Depending on the language skills of the grammar writer,
the changes can also be made directly in the GF files, in which case the correct examples
given by the native informant are kept just for validating the results.
The algorithm is repeated as long as corrections are needed.
</UL>

<P>
The time spent preparing the configuration files for a grammar is a one-time cost,
since the files are reusable for other applications.
The time for the second step can be saved if automatic tools, such as Google translate,
are used. This is only feasible for languages with a simpler morphology and syntax
and with large corpora available.
Good results were obtained for German and Dutch with Google translate, but for
languages like Romanian or Polish, which are both complex and lack sufficient resources,
the results are discouraging.
</P>
<P>
If the statistical oracle works well, the only step where a human
translator is needed is the evaluation and feedback step. For the languages in our
experiment, this step took on average 2 rounds of about 4 hours each.
More effort may be needed for more complex languages.
</P>
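<P>
The evaluation effort quoted above can be worked out explicitly. The sketch below is
plain arithmetic on the two averages given in the text; the constant and function
names are ours, and the figures are averages, not measurements for any single language.
</P>
<PRE>
HOURS_PER_ROUND = 4  # average hours per evaluation round, as reported above
ROUNDS = 2           # average number of rounds, as reported above

def evaluation_hours(hours_per_round, rounds):
    """Total human evaluation time in person hours."""
    return hours_per_round * rounds

total = evaluation_hours(HOURS_PER_ROUND, ROUNDS)
print(total)
</PRE>
<P>
The result is 8 person hours of evaluation per language. This covers the evaluation
and feedback step only, not the preparation of configuration files or any work on
the resource grammars.
</P>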
<A NAME="toc11"></A>
<H1>To Do</H1>
<P>
Disambiguation grammars for languages other than English
</P>
<P>
Extend the abstract lexicon in <CODE>Words</CODE> by hand or (semi)automatically for
</P>
<UL>
<LI>foodstuffs
<LI>places
<LI>actions
</UL>

<P>
Customizable phone distribution: make your own selection among the 2^15 language subsets
when downloading the phrasebook to a phone
</P>
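<P>
To give a sense of the size of that choice, the following sketch counts the subsets
of a set of languages and, for a small set, lists them. The function names are ours,
and the three language codes in the usage note are only an example; the figure 15 is
taken from the text as given.
</P>
<PRE>
from itertools import combinations

def count_subsets(n):
    """Number of subsets of n items, including the empty and the full set."""
    return 2 ** n

def enumerate_subsets(items):
    """List every subset explicitly; only sensible for small inputs."""
    sizes = range(len(items) + 1)
    return [set(c) for r in sizes for c in combinations(items, r)]
</PRE>
<P>
<CODE>count_subsets(15)</CODE> is 32768, so a download page cannot reasonably offer
prebuilt packages for every selection. For a small example,
<CODE>enumerate_subsets(["Eng", "Fin", "Fre"])</CODE> lists all 8 subsets of three languages.
</P>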
<A NAME="toc12"></A>
<H1>How to contribute</H1>
<P>
The basic things "everyone" can do are
</P>
<UL>
<LI>complete <A HREF="missing.txt">missing words</A> in concrete syntaxes
<LI>add new abstract words in <CODE>Words</CODE> and greetings in <CODE>Greetings</CODE>
</UL>

<P>
The missing concrete syntax entries are added to the <CODE>Words</CODE><I>L</I><CODE>.gf</CODE>
files for each language <I>L</I>. The
<A HREF="http://www.grammaticalframework.org/lib/doc/synopsis.html#toc78">morphological paradigms</A>
of the GF resource library should be used. Actions (prefixed with <CODE>A</CODE>, such as <CODE>AWant</CODE>) are
a little more demanding, since they also require syntax constructors. Greetings (prefixed
with <CODE>G</CODE>) are pure strings.
</P>
<P>
Some explanations can be found in the
<A HREF="Implementation.html">implementation document</A>, which is produced from the
concrete syntax files
<A HREF="http://www.grammaticalframework.org/examples/phrasebook/SentencesI.gf"><CODE>SentencesI.gf</CODE></A>
and
<A HREF="http://www.grammaticalframework.org/examples/phrasebook/WordsEng.gf"><CODE>WordsEng.gf</CODE></A>
by <CODE>make doc</CODE>.
</P>
<P>
Here are the steps to follow for contributors:
</P>
<OL>
<LI>Make sure you have the latest sources
from <A HREF="http://www.grammaticalframework.org/doc/gf-developers.html">GF Darcs</A>,
using <CODE>darcs pull</CODE>.
<LI>Also make sure that you have compiled the library by <CODE>make present</CODE> in <CODE>gf/lib/src/</CODE>.
<LI>Work in the directory
<A HREF="http://www.grammaticalframework.org/examples/phrasebook/"><CODE>gf/examples/phrasebook/</CODE></A>.
<LI>After you've finished your contribution, recompile the phrasebook by <CODE>make pgf</CODE>.
<LI>Save your changes with <CODE>darcs record .</CODE> (in the <CODE>phrasebook</CODE> subdirectory).
<LI>Make a patch file with <CODE>darcs send -o my_phrasebook_patch</CODE>, which you can
send to GF maintainers.
<LI>(Recommended:) Test the phrasebook on your local server:
<OL>
<LI>Go to <CODE>gf/src/server/</CODE> and follow the instructions in the
<A HREF="http://code.google.com/p/grammatical-framework/wiki/LaunchWebDemos">project Wiki</A>.
<LI>Make sure that <CODE>Phrasebook.pgf</CODE> is available to your GF server (see project wiki).
<LI>Launch <CODE>lighttpd</CODE> (see project wiki).
<LI>Now you can open <CODE>gf/examples/phrasebook/www/phrasebook.html</CODE> and use your phrasebook!
</OL>
</OL>

<UL>
<LI>Don't delete anything! But you are free to correct incorrect forms.
<LI>Don't change the module structure!
<LI>Don't compromise quality to gain coverage: <I>non multa sed multum!</I>
</UL>

<A NAME="toc13"></A>
<H1>Conclusions (tentative)</H1>
<P>
The grammarian need not be a native speaker of the language.
</P>
<P>
For many languages, the grammarian need not even know the language; native informants are
enough.
</P>
<P>
However, evaluation by native speakers is necessary.
</P>
<P>
Correct and idiomatic translations are possible.
</P>
<P>
A typical development time was 2-3 person working days per language.
</P>
<P>
Google translate helps in bootstrapping grammars, but its output must be checked.
</P>
<UL>
<LI>in particular, it is unreliable for morphologically rich languages
</UL>

<P>
Resource grammars should give some more support
</P>
<UL>
<LI>higher-level access to constructions like negative expressions
<LI>large-scale morphological lexica
</UL>

<A NAME="toc14"></A>
<H1>Acknowledgements</H1>
<P>
The Phrasebook has been built in the MOLTO project funded by the European Commission.
</P>
<P>
The authors are grateful to their native speaker informants, who helped to bootstrap and evaluate
the grammars:
Richard Bubel,
Grégoire Détrez,
Rise Eilert,
Karin Keijzer,
Michał Pałka,
Willard Rafnsson,
Nick Smallbone.
</P>

<!-- html code generated by txt2tags 2.5 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags -thtml -\-toc doc-phrasebook.txt -->
</BODY></HTML>