mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-11 13:59:31 -06:00
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>European Resource Grammar Summer School</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>European Resource Grammar Summer School</H1>
<FONT SIZE="4">
<I>Gothenburg, August 2009</I><BR>
Aarne Ranta (aarne at chalmers.se)
</FONT></CENTER>

<P>
<IMG ALIGN="middle" SRC="eu-langs.png" BORDER="0" ALT="">
</P>
<H3>Executive summary</H3>
<P>
We plan to organize a summer school with the goal of implementing the GF
resource grammar library for 15 new languages, so that the library will
cover all 23 official EU languages as of 2009.
As a test application of the grammars, an extension of
the WebALT mathematical exercise translator will also be built for each
language.
</P>
<P>
Two students per language are selected for the summer school, after a phase of
self-study and on the basis of assignments that consist of parts of the resource
grammars. Travel and accommodation are paid for these participants.
If funding is arranged, the call for participation in the summer school will
be announced in February 2009, and the summer school itself will take place
in August 2009, in Gothenburg.
</P>
<H2>Introduction</H2>
<P>
Since 2007, the EU-27 has had 23 official languages, listed in the diagram at the top of this
document.
There is a growing need for translation between
these languages. The traditional language-to-language method requires 23*22 = 506
translators (humans or computer programs), one for each ordered pair of languages,
to cover all possible translation needs.
</P>
<P>
An alternative to language-to-language translation is the use of an <B>interlingua</B>:
a language-independent representation such that all translation problems can
be reduced to translating to and from the interlingua. With 23 languages,
only 2*23 = 46 translators are needed: one into and one out of the interlingua
for each language.
</P>
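<P>
As a quick check of the arithmetic above, the following Python sketch computes
both counts for any number of languages. The function names are illustrative
assumptions and not part of GF or any existing tool.
</P>

```python
def direct_translators(n):
    """Language-to-language method: one translator per ordered pair of languages."""
    return n * (n - 1)


def interlingua_translators(n):
    """Interlingua method: one translator into and one out of the interlingua per language."""
    return 2 * n


# Compare the two methods for a few language counts, including the 23 EU languages.
for n in (2, 3, 4, 9, 23):
    print(n, direct_translators(n), interlingua_translators(n))
```

<P>
For 23 languages this gives 506 and 46, as stated above. The two methods break
even at three languages (6 translators each); from four languages on, the
interlingua needs fewer translators.
</P>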
<P>
Interlingua sounds too good to be true. In a sense, it is. All attempts to
create an interlingua that would solve all translation problems have failed.
However, interlinguas for restricted applications have been more
successful. For instance, mathematical texts and weather reports can be translated
by using interlinguas tailor-made for the domains of mathematics and weather reports,
respectively.
</P>
<P>
What is required of an interlingua is
</P>
<UL>
<LI>semantic accuracy: correspondence to what you want to say in the application
<LI>language-independence: abstraction from individual languages
</UL>

<P>
Thus, for instance, an interlingua for mathematical texts may be based on
mathematical logic, which at the same time gives semantic accuracy and
language independence. In other domains, something other than mathematical
logic may be needed; the <B>ontologies</B> defined within semantic
web technology are often good starting points for interlinguas.
</P>
<H2>GF: a framework for multilingual grammars</H2>
<P>
The interlingua is just one part of a translation system. We also need
the mappings between the interlingua and the languages involved. As the
number of languages increases, this part grows while the interlingua remains
constant.
</P>
<P>
GF (Grammatical Framework,
<A HREF="http://gf.digitalgrammars.com"><CODE>gf.digitalgrammars.com</CODE></A>)
is a programming language designed to support interlingua-based translation.
A "program" in GF is a <B>multilingual grammar</B>, which consists of an
<B>abstract syntax</B> and a set of <B>concrete syntaxes</B>. A concrete
syntax is a mapping from the abstract syntax to a particular language.
These mappings are <B>reversible</B>, which means that they can be used for
translating in both directions. Creating an interlingua-based
translator for 23 languages thus requires just 1 + 23 = 24 grammar modules (the abstract
syntax and the concrete syntaxes).
</P>
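<P>
To make the idea of one abstract syntax with several reversible concrete
syntaxes more tangible, here is a toy sketch in Python. It is not GF code and
does not reflect how GF works internally: the four-sentence grammar and the names
<CODE>ABSTRACT</CODE>, <CODE>CONCRETE</CODE>, <CODE>linearize</CODE>,
<CODE>parse</CODE> and <CODE>translate</CODE> are all hypothetical. Parsing is
done by naive generate-and-test, which is only feasible because this toy
grammar has finitely many trees.
</P>

```python
from itertools import product

# Abstract syntax (the interlingua): function name -> (argument categories, result category).
ABSTRACT = {
    "Pred":  (("Item", "Quality"), "Comment"),
    "This":  (("Kind",), "Item"),
    "Wine":  ((), "Kind"),
    "Fish":  ((), "Kind"),
    "Good":  ((), "Quality"),
    "Fresh": ((), "Quality"),
}

# Concrete syntaxes: one string template per abstract function and language.
CONCRETE = {
    "Eng": {"Pred": "{0} is {1}", "This": "this {0}",
            "Wine": "wine", "Fish": "fish", "Good": "good", "Fresh": "fresh"},
    "Ger": {"Pred": "{0} ist {1}", "This": "dieser {0}",
            "Wine": "Wein", "Fish": "Fisch", "Good": "gut", "Fresh": "frisch"},
}


def linearize(tree, lang):
    """Map an abstract tree, e.g. ("This", ("Wine",)), to a string in one language."""
    fun, args = tree[0], tree[1:]
    return CONCRETE[lang][fun].format(*(linearize(a, lang) for a in args))


def trees(cat):
    """Enumerate every abstract tree of the given category."""
    for fun, (arg_cats, result) in ABSTRACT.items():
        if result == cat:
            for args in product(*(list(trees(c)) for c in arg_cats)):
                yield (fun, *args)


def parse(sentence, lang, cat="Comment"):
    """Invert linearization by generate-and-test over all trees of the category."""
    return [t for t in trees(cat) if linearize(t, lang) == sentence]


def translate(sentence, src, dst):
    """Translate by parsing into the abstract syntax and linearizing out of it."""
    return [linearize(t, dst) for t in parse(sentence, src)]


print(translate("this wine is good", "Eng", "Ger"))
print(translate("dieser Fisch ist frisch", "Ger", "Eng"))
```

<P>
The same two tables serve both translation directions, and adding a third
language to this sketch would mean adding one more entry to
<CODE>CONCRETE</CODE> while <CODE>ABSTRACT</CODE> stays unchanged, which
mirrors the 1 + 23 module count above.
</P>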
<P>
The diagram at the top of this document shows a system covering the 23 EU languages.
Languages marked in
red are of particular interest for the summer school, since they are those
on which the effort will be concentrated.
</P>
<H2>The GF resource grammar library</H2>
<P>
The GF resource grammar library is a set of grammars used as libraries when
building interlingua-based translation systems. The library currently covers
the 9 languages coloured in green in the diagram above; in addition,
Catalan, Norwegian, and Russian are covered, and there is ongoing work on
Arabic, Hindi/Urdu, and Thai.
</P>
<P>
The purpose of the resource grammar library is to define the "low-level" structure
of a language: inflection, word order, agreement. This structure belongs to what
linguists call morphology and syntax. It can be very complex and requires
a lot of knowledge. Yet, when translating from one language to another, knowing
morphology and syntax is but a part of what is needed. The translator (whether human
or machine) must understand the meaning of what is translated, and must also know
the idiomatic way to express the meaning in the target language. This knowledge
can be very domain-dependent and generally requires an expert in the field to
reach high quality: a mathematician in the field of mathematics, a meteorologist
in the field of weather reports, etc.
</P>
<P>
The problem is to find a person who is an expert both in the domain of translation
and in the low-level linguistic details. It is the rarity of this combination
that has made it difficult to build interlingua-based translation systems.
The GF resource grammar library has the mission of helping in this task. It encapsulates
the low-level linguistics in program modules accessed through easy-to-use interfaces.
Experts in different domains can build translation systems by using the library,
without knowing low-level linguistics. The idea is much the same as when a
programmer builds a graphical user interface (GUI) from high-level elements such as
buttons and menus, without having to care about pixels or geometrical forms.
</P>
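<P>
The division of labour described above can be illustrated with a small Python
sketch. It is only an analogy under simplifying assumptions, not the API of
the GF resource grammar library: <CODE>regular_noun</CODE>,
<CODE>count_phrase</CODE> and <CODE>exercise_summary</CODE> are hypothetical
names, and the English plural rule is deliberately incomplete.
</P>

```python
# "Resource" layer: low-level linguistic details, written once by a linguist.

def regular_noun(singular):
    """Toy inflection paradigm: singular and plural of a regular English noun."""
    if singular.endswith(("s", "x", "z", "ch", "sh")):
        plural = singular + "es"
    elif singular.endswith("y") and singular[-2:-1] not in ("a", "e", "i", "o", "u", ""):
        plural = singular[:-1] + "ies"
    else:
        plural = singular + "s"
    return {"sg": singular, "pl": plural}


def count_phrase(n, noun):
    """Toy agreement rule: pick the noun form that agrees with the number."""
    form = "sg" if n == 1 else "pl"
    return f"{n} {noun[form]}"


# "Application" layer: a domain expert uses the interface without
# thinking about endings or agreement.

def exercise_summary(solved):
    return "You solved " + count_phrase(solved, regular_noun("exercise")) + "."


print(exercise_summary(1))
print(exercise_summary(3))
print(count_phrase(2, regular_noun("city")))
```

<P>
The application function never mentions plural endings; if the paradigm were
later refined to handle irregular nouns, the application code would not have
to change.
</P>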
<H3>Applications of the library</H3>
<P>
In addition to translation, the library is also useful in <B>localization</B>,
that is, porting a piece of software to new languages.
The GF resource grammar library has been used in three major projects that need
interlingua-based translation or localization of systems to new languages:
</P>
<UL>
<LI>in KeY,
<A HREF="http://www.key-project.org/"><CODE>http://www.key-project.org/</CODE></A>,
for writing formal and informal software specifications (3 languages)
<LI>in WebALT,
<A HREF="http://webalt.math.helsinki.fi/content/index_eng.html"><CODE>http://webalt.math.helsinki.fi/content/index_eng.html</CODE></A>,
for translating mathematical exercises to 7 languages
<LI>in TALK, <A HREF="http://www.talk-project.org"><CODE>http://www.talk-project.org</CODE></A>,
where the library was used for localizing spoken dialogue systems to 6 languages
</UL>

<P>
The library is also a generic linguistic resource, which can be used for tasks
such as language teaching and information retrieval. The liberal license (GPL)
makes it usable by anyone and for any task. GF also has tools supporting the
use of grammars in programs written in other programming languages: C, C++, Haskell,
Java, JavaScript, and Prolog. In connection with the TALK project, support has also been
developed for translating GF grammars to language models used in speech
recognition.
</P>
<H3>The structure of the library</H3>
<P>
The library has the following main parts:
</P>
<UL>
<LI><B>Inflection paradigms</B>, covering the inflection of each language.
<LI><B>Common Syntax API</B>, covering a large set of syntax rules that
can be implemented for all languages involved.
<LI><B>Common Test Lexicon</B>, giving ca. 500 common words that can be used for
testing the library.
<LI><B>Language-Specific Syntax Extensions</B>, covering syntax rules that are
not implementable for all languages.
<LI><B>Language-Specific Lexica</B>, word lists for each language, with
accurate morphological and syntactic information.
</UL>

<P>
The goal of the summer school is to implement, for each language, at least
the first three components. The last two are more open-ended in character.
</P>
<H2>The summer school</H2>
<P>
The goal of the summer school is to extend the GF resource grammar library
to cover all 23 EU languages, which means we need 15 new languages.
</P>
<P>
The amount of work and skill required lies between that of a Master's thesis and
that of a PhD thesis.
The Russian implementation was made by Janna Khegai as a part of her
PhD thesis; the thesis contains other material, too.
The Arabic implementation was started by Ali El Dada in his Master's thesis,
but the thesis does not cover the whole API. A realistic estimate of the work is
somewhere around 8 person-months, but this is very much language-dependent.
Dutch, for instance, can profit from previous implementations of German and
Scandinavian languages, and will probably require less work.
Latvian and Lithuanian would be the first languages of the Baltic family in the
library and will probably require much more work.
</P>
<P>
In any case, the proposed staffing is 2 participants per
language. They will have 6 months to work at home, followed
by 2 weeks of summer school. Who are these participants?
</P>
<H3>Selecting participants</H3>
<P>
After the call has been published, persons interested in participating in
the project are expected to learn GF by self-study from the
<A HREF="http://www.cs.chalmers.se/Cs/Research/Language-technology/GF/doc/gf-tutorial.html">tutorial</A>.
This should take a couple of weeks.
</P>
<P>
After, and perhaps in parallel with,
working through the tutorial, the participants should go on to
implement selected parts of the resource grammar, following the advice in
the
<A HREF="http://www.cs.chalmers.se/Cs/Research/Language-technology/GF/doc/Resource-HOWTO.html">Resource-HOWTO document</A>.
Exactly which parts are selected will be announced later.
This work will take another couple of weeks.
</P>
<P>
This sample resource grammar fragment
will be submitted to the Summer School Committee at the beginning of May.
The Committee then decides who is invited to represent which language
in the summer school.
</P>
<P>
After the Committee decision, the participants have around three months
to work on their languages. The work is completed in the summer school itself. It is also
thoroughly tested by using it to add a new language to the WebALT mathematical
exercise translator.
</P>
<P>
Depending on the quality of submitted work, and on the demands of different
languages, the Committee may decide to select a number other than 2 participants
for a language. We will also consider accepting participants who want to
pay their own expenses.
</P>
<P>
Good proposals for non-EU languages will also be considered. Proponents of
such languages should contact the summer school organizers as early as possible.
</P>
<P>
To keep track of who is working on which language, we will establish a web page
(Wiki or similar) soon after the call is published. The participants are encouraged
to contact each other and even to work in groups.
</P>
<H3>Who is qualified</H3>
<P>
Writing a resource grammar implementation requires good general programming
skills, and a good explicit knowledge of the grammar of the target language.
A typical participant could be
</P>
<UL>
<LI>a native or fluent speaker of the target language
<LI>interested in languages on the theoretical level, and preferably familiar
with many languages (to be able to think about them on an abstract level)
<LI>familiar with functional programming languages such as ML or Haskell
(GF itself is a language similar to these)
<LI>at Master's or PhD level in linguistics, computer science, or mathematics
</UL>

<P>
But it is the quality of the assignment that is assessed, not any formal
requirements. The "typical participant" was described to give an idea of
who is likely to succeed in this.
</P>
<H3>Costs</H3>
<P>
Our aim is to make the summer school free of charge for the participants
who are selected on the basis of their assignments. And not only that:
we plan to cover their travel and accommodation costs, up to 1000 EUR
per person.
</P>
<P>
We want to get the funding question settled by mid-February 2009, and make
the final decision on the summer school then.
</P>
<H3>Teachers</H3>
<P>
Krasimir Angelov
</P>
<P>
?Olga Caprotti
</P>
<P>
?Lauri Carlson
</P>
<P>
?Robin Cooper
</P>
<P>
?Björn Bringert
</P>
<P>
Håkan Burden
</P>
<P>
?Elisabet Engdahl
</P>
<P>
?Markus Forsberg
</P>
<P>
?Janna Khegai
</P>
<P>
?Peter Ljunglöf
</P>
<P>
?Wanjiku Ng'ang'a
</P>
<P>
Aarne Ranta
</P>
<P>
?Jordi Saludes
</P>
<P>
In addition, we will look for consultants who can help to assess the results
for each language.
</P>
<H3>The Summer School Committee</H3>
<P>
This committee consists of a number of teachers and consultants,
who will select the participants.
</P>
<H3>Time and Place</H3>
<P>
The summer school will
be organized in Gothenburg in the latter half of August 2009.
</P>
<P>
Time schedule (2009):
</P>
<UL>
<LI>February: announcement of the summer school and of the grammar
writing contest used to select participants
<LI>March-April: work on the contest assignment (ca. 1 month)
<LI>May: submission deadline and notification of acceptance
<LI>June-July: more work on the grammars
<LI>August: summer school
</UL>

<H3>Dissemination and intellectual property</H3>
<P>
The new resource grammars will be released under the GPL just like
the current resource grammars,
with the copyright held by the respective authors.
</P>
<P>
The grammars will be distributed via the GF web site.
</P>
<P>
The WebALT-specific grammars will have special licenses agreed between the
authors and WebALT Inc.
</P>
<H2>Why I should participate</H2>
<P>
Seven reasons:
</P>
<OL>
<LI>free trip and stay in Gothenburg (to be confirmed)
<LI>participation in pioneering language technology work in an enthusiastic atmosphere
<LI>work and fun with people from all over Europe
<LI>job opportunities and business ideas
<LI>credits: the school project will be established as a course worth
15 ECTS points per person, but extensions to a Master's thesis will
also be considered
<LI>merits: the resulting grammar can easily lead to a published paper
<LI>contribution to the multilingual and multicultural development of Europe
</OL>

<!-- html code generated by txt2tags 2.4 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags gf-summerschool.txt -->
</BODY></HTML>