<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<TITLE>European Resource Grammar Summer School</TITLE>
</HEAD><BODY BGCOLOR="white" TEXT="black">
<P ALIGN="center"><CENTER><H1>European Resource Grammar Summer School</H1>
<FONT SIZE="4">
<I>Gothenburg, August 2009</I><BR>
Aarne Ranta (aarne at chalmers.se)
</FONT></CENTER>

<P>
<IMG ALIGN="middle" SRC="eu-langs.png" BORDER="0" ALT="">
</P>
<H3>Executive summary</H3>
<P>
We plan to organize a summer school with the goal of implementing the GF
resource grammar library for 15 new languages, so that the library will
cover all 23 official EU languages as of 2009.
As a test application of the grammars, an extension of
the WebALT mathematical exercise translator will also be built for each
language.
</P>
<P>
Two students per language will be selected for the summer school after a phase of
self-study, on the basis of assignments that consist of parts of the resource
grammars. The travel and accommodation of these participants will be paid.
If funding is secured, the call for participation in the summer school will
be announced in February 2009, and the summer school itself will take place
in August 2009 in Gothenburg.
</P>
<H2>Introduction</H2>
<P>
Since 2007, the EU-27 has had 23 official languages, listed in the diagram at the top of this
document.
There is a growing need for translation between
these languages. The traditional language-to-language method requires 23*22 = 506
translators (humans or computer programs), one for each ordered pair of distinct languages,
to cover all possible translation needs.
</P>
<P>
An alternative to language-to-language translation is the use of an <B>interlingua</B>:
a language-independent representation such that all translation problems can
be reduced to translating to and from the interlingua. With 23 languages,
only 2*23 = 46 translators are needed: for each language, one into the interlingua
and one out of it.
</P>
<P>
Interlingua sounds too good to be true. In a sense, it is. All attempts to
create an interlingua that would solve all translation problems have failed.
However, interlinguas for restricted applications have been more
successful. For instance, mathematical texts and weather reports can be translated
by using interlinguas tailor-made for the domains of mathematics and weather reports,
respectively.
</P>
<P>
An interlingua must provide:
</P>
<UL>
<LI>semantic accuracy: correspondence to what you want to say in the application
<LI>language independence: abstraction from individual languages
</UL>

<P>
Thus, for instance, an interlingua for mathematical texts may be based on
mathematical logic, which gives both semantic accuracy and
language independence. In other domains, something other than mathematical
logic may be needed; the <B>ontologies</B> defined within Semantic
Web technology are often good starting points for interlinguas.
</P>
<H2>GF: a framework for multilingual grammars</H2>
<P>
The interlingua is just one part of a translation system. We also need
the mappings between the interlingua and the languages involved. As the
number of languages increases, this part grows while the interlingua remains
constant.
</P>
<P>
GF (Grammatical Framework,
<A HREF="http://gf.digitalgrammars.com"><CODE>gf.digitalgrammars.com</CODE></A>)
is a programming language designed to support interlingua-based translation.
A "program" in GF is a <B>multilingual grammar</B>, which consists of an
<B>abstract syntax</B> and a set of <B>concrete syntaxes</B>. A concrete
syntax is a mapping from the abstract syntax to a particular language.
These mappings are <B>reversible</B>. The same mapping can be used both to generate
text from the abstract syntax and to parse text into it, so it can be used for
translating in both directions. Consequently, creating an interlingua-based
translator for 23 languages requires just 1 + 23 = 24 grammar modules (the abstract
syntax and the concrete syntaxes).
</P>
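<P>
The following is a rough illustration of this architecture, written as a toy sketch in Python. It is not GF code and not the GF API. One abstract syntax is shared by all languages, and each language adds one concrete syntax. The same concrete syntax is used both to generate sentences and, in reverse, to parse them. The tiny lexicon, the English and French strings, and all function names are hypothetical examples chosen for this sketch. The brute-force parser works only because the toy abstract syntax is finite.
</P>
<PRE>
```python
# Toy model of a multilingual grammar (hypothetical illustration, not GF code):
# one abstract syntax shared by all languages, one concrete syntax per language.
import itertools

# Abstract syntax: every tree has the form ("Pred", item, quality).
ITEMS = ["Wine", "Cheese"]
QUALITIES = ["Good", "Expensive"]

# Concrete syntaxes: for each language, words for the abstract constants and
# a template that puts them in the right order.
CONCRETE = {
    "Eng": {"Wine": "the wine", "Cheese": "the cheese",
            "Good": "good", "Expensive": "expensive",
            "Pred": "{0} is {1}"},
    "Fre": {"Wine": "le vin", "Cheese": "le fromage",
            "Good": "bon", "Expensive": "cher",
            "Pred": "{0} est {1}"},
}


def all_trees():
    """Enumerate every tree of the (finite) toy abstract syntax."""
    return [("Pred", i, q) for i, q in itertools.product(ITEMS, QUALITIES)]


def linearize(lang, tree):
    """Concrete syntax: map an abstract tree to a string of one language."""
    lex = CONCRETE[lang]
    _, item, quality = tree
    return lex["Pred"].format(lex[item], lex[quality])


def parse(lang, sentence):
    """Use the same mapping in reverse: return all trees that linearize to
    the sentence. Brute-force enumeration is only feasible for a finite toy
    grammar; GF itself uses a real parsing algorithm."""
    return [t for t in all_trees() if linearize(lang, t) == sentence]


def translate(src, dst, sentence):
    """Interlingua-based translation: parse into the abstract syntax, then
    linearize into the target language."""
    return [linearize(dst, t) for t in parse(src, sentence)]


print(translate("Eng", "Fre", "the wine is good"))  # prints ['le vin est bon']
```
</PRE>
<P>
In this sketch, adding a language means adding one entry to <CODE>CONCRETE</CODE>, while the abstract syntax stays unchanged. This mirrors the 1 + N module count of a GF translator. Real languages need far more than string templates (inflection, agreement, word order), and that low-level work is what the resource grammar library is meant to take care of.
</P>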
<P>
The diagram at the top of this document shows a system covering the 23 EU languages.
The languages marked in
red are of particular interest for the summer school, since they are
the ones on which the effort will be concentrated.
</P>
<H2>The GF resource grammar library</H2>
<P>
The GF resource grammar library is a set of grammars used as libraries when
building interlingua-based translation systems. The library currently covers
the 9 languages coloured in green in the diagram above; in addition,
Catalan, Norwegian, and Russian are covered, and there is ongoing work on
Arabic, Hindi/Urdu, and Thai.
</P>
<P>
The purpose of the resource grammar library is to define the "low-level" structure
of a language: inflection, word order, agreement. This structure belongs to what
linguists call morphology and syntax. It can be very complex and requires
a lot of knowledge. Yet, when translating from one language to another, knowing
morphology and syntax is only part of what is needed. The translator (whether human
or machine) must understand the meaning of what is translated, and must also know
the idiomatic way to express that meaning in the target language. This knowledge
can be very domain-dependent and in general requires an expert in the field to
reach high quality: a mathematician in the field of mathematics, a meteorologist
in the field of weather reports, etc.
</P>
<P>
The problem is to find a person who is an expert both in the domain of translation
and in the low-level linguistic details. It is the rarity of this combination
that has made it difficult to build interlingua-based translation systems.
The mission of the GF resource grammar library is to help with this task. It encapsulates
the low-level linguistics in program modules accessed through easy-to-use interfaces.
Experts in different domains can build translation systems by using the library,
without knowing low-level linguistics. The idea is much the same as when a
programmer builds a graphical user interface (GUI) from high-level elements such as
buttons and menus, without having to care about pixels or geometric shapes.
</P>
<H3>Applications of the library</H3>
<P>
In addition to translation, the library is also useful in <B>localization</B>,
that is, porting a piece of software to new languages.
The GF resource grammar library has been used in three major projects that need
interlingua-based translation or localization of systems to new languages:
</P>
<UL>
<LI>in KeY,
<A HREF="http://www.key-project.org/"><CODE>http://www.key-project.org/</CODE></A>,
for writing formal and informal software specifications (3 languages)
<LI>in WebALT,
<A HREF="http://webalt.math.helsinki.fi/content/index_eng.html"><CODE>http://webalt.math.helsinki.fi/content/index_eng.html</CODE></A>,
for translating mathematical exercises to 7 languages
<LI>in TALK, <A HREF="http://www.talk-project.org"><CODE>http://www.talk-project.org</CODE></A>,
for localizing spoken dialogue systems to 6 languages
</UL>
<P>
The library is also a generic linguistic resource, which can be used for tasks
such as language teaching and information retrieval. The liberal license (GPL)
makes it usable for anyone and for any task. GF also has tools supporting the
use of grammars in programs written in other programming languages: C, C++, Haskell,
Java, JavaScript, and Prolog. In connection with the TALK project, support has also been
developed for translating GF grammars to language models used in speech
recognition.
</P>
<H3>The structure of the library</H3>
<P>
The library has the following main parts:
</P>
<UL>
<LI><B>Inflection paradigms</B>, covering the inflection of each language.
<LI><B>Common Syntax API</B>, covering a large set of syntax rules that
can be implemented for all languages involved.
<LI><B>Common Test Lexicon</B>, giving ca. 500 common words that can be used for
testing the library.
<LI><B>Language-Specific Syntax Extensions</B>, covering syntax rules that are
not implementable for all languages.
<LI><B>Language-Specific Lexica</B>, word lists for each language, with
accurate morphological and syntactic information.
</UL>

<P>
The goal of the summer school is to implement, for each language, at least
the first three components. The last two are more open-ended in character.
</P>
<H2>The summer school</H2>
<P>
The goal of the summer school is to extend the GF resource grammar library
to cover all 23 EU languages, which means that we need 15 new languages.
</P>
<P>
The amount of work and skill required lies between that of a Master's thesis and a PhD thesis.
The Russian implementation was made by Janna Khegai as a part of her
PhD thesis; the thesis contains other material, too.
The Arabic implementation was started by Ali El Dada in his Master's thesis,
but the thesis does not cover the whole API. The realistic amount of work is
somewhere around 8 person months, but this is very much language-dependent.
Dutch, for instance, can profit from the previous implementations of German and the
Scandinavian languages, and will probably require less work.
Latvian and Lithuanian will be the first languages of the Baltic family in the library, and
they will probably require much more work.
</P>
<P>
In any case, the proposed allocation of effort is 2 participants per
language. They will have 6 months to work at home, followed
by 2 weeks of summer school. Who are these participants?
</P>
<H3>Selecting participants</H3>
<P>
After the call has been published, those interested in participating in
the project are expected to learn GF by self-study from the
<A HREF="http://www.cs.chalmers.se/Cs/Research/Language-technology/GF/doc/gf-tutorial.html">tutorial</A>.
This should take a couple of weeks.
</P>
<P>
After working through the tutorial, and perhaps in parallel with it,
the participants should go on to
implement selected parts of the resource grammar, following the advice in
the
<A HREF="http://www.cs.chalmers.se/Cs/Research/Language-technology/GF/doc/Resource-HOWTO.html">Resource-HOWTO document</A>.
Exactly which parts are selected will be announced later.
This work will take another couple of weeks.
</P>
<P>
This sample resource grammar fragment
will be submitted to the Summer School Committee at the beginning of May.
The Committee then decides who is invited to represent which language
in the summer school.
</P>
<P>
After the Committee's decision, the participants have around three months
to work on their languages. The work is completed in the summer school itself. It is also
tested thoroughly by using it to add a new language to the WebALT mathematical
exercise translator.
</P>
<P>
Depending on the quality of the submitted work and on the demands of different
languages, the Committee may decide to select a number of participants other than 2
for a language. We will also consider accepting participants who want to
pay their own expenses.
</P>
<P>
Good proposals for non-EU languages will also be considered. Proponents of
such languages should contact the summer school organizers as early as possible.
</P>
<P>
To keep track of who is working on which language, we will establish a web page
(a wiki or similar) soon after the call is published. The participants are encouraged
to contact each other and even to work in groups.
</P>
<H3>Who is qualified</H3>
<P>
Writing a resource grammar implementation requires good general programming
skills and a good explicit knowledge of the grammar of the target language.
A typical participant could be
</P>
<UL>
<LI>a native or fluent speaker of the target language
<LI>interested in languages on the theoretical level, and preferably familiar
with many languages (to be able to think about them on an abstract level)
<LI>familiar with functional programming languages such as ML or Haskell
(GF itself is a language similar to these)
<LI>at Master's or PhD level in linguistics, computer science, or mathematics
</UL>

<P>
But it is the quality of the assignment that is assessed, not any formal
requirements. The "typical participant" was described to give an idea of
who is likely to succeed in this work.
</P>
<H3>Costs</H3>
<P>
Our aim is to make the summer school free of charge for the participants
who are selected on the basis of their assignments. And not only that:
we plan to cover their travel and accommodation costs, up to 1000 EUR
per person.
</P>
<P>
We want to get the funding question settled by mid-February 2009, and make
the final decision on the summer school then.
</P>
<H3>Teachers</H3>
<P>
Krasimir Angelov
</P>
<P>
?Olga Caprotti
</P>
<P>
?Lauri Carlson
</P>
<P>
?Robin Cooper
</P>
<P>
?Björn Bringert
</P>
<P>
Håkan Burden
</P>
<P>
?Elisabet Engdahl
</P>
<P>
?Markus Forsberg
</P>
<P>
?Janna Khegai
</P>
<P>
?Peter Ljunglöf
</P>
<P>
?Wanjiku Ng'ang'a
</P>
<P>
Aarne Ranta
</P>
<P>
?Jordi Saludes
</P>
<P>
In addition, we will look for consultants who can help to assess the results
for each language.
</P>
<H3>The Summer School Committee</H3>
<P>
This committee consists of a number of teachers and consultants,
who will select the participants.
</P>
<H3>Time and Place</H3>
<P>
The summer school will
be organized in Gothenburg in the latter half of August 2009.
</P>
<P>
Time schedule (2009):
</P>
<UL>
<LI>February: announcement of the summer school and the grammar
writing contest to get participants
<LI>March-April: work on the contest assignment (ca. 1 month)
<LI>May: submission deadline and notification of acceptance
<LI>June-July: more work on the grammars
<LI>August: summer school
</UL>

<H3>Dissemination and intellectual property</H3>
<P>
The new resource grammars will be released under the GPL, just like
the current resource grammars,
with the copyright held by the respective authors.
</P>
<P>
The grammars will be distributed via the GF web site.
</P>
<P>
The WebALT-specific grammars will have special licenses agreed between the
authors and WebALT Inc.
</P>
<H2>Why should I participate?</H2>
<P>
Seven reasons:
</P>
<OL>
<LI>free trip and stay in Gothenburg (to be confirmed)
<LI>participation in pioneering language technology work in an enthusiastic atmosphere
<LI>work and fun with people from all over Europe
<LI>job opportunities and business ideas
<LI>credits: the school project will be established as a course worth
15 ECTS points per person, but extensions to a Master's thesis will
also be considered
<LI>merits: the resulting grammar can easily lead to a published paper
<LI>contribution to the multilingual and multicultural development of Europe
</OL>


<!-- html code generated by txt2tags 2.4 (http://txt2tags.sf.net) -->
<!-- cmdline: txt2tags gf-summerschool.txt -->
</BODY></HTML>