mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-23 11:42:49 -06:00
new school web page
@@ -3,21 +3,56 @@
|
|||||||
<HEAD>
|
<HEAD>
|
||||||
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
|
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
|
||||||
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
|
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
|
||||||
<TITLE>European Resource Grammar Summer School</TITLE>
|
<TITLE>GF Resource Grammar Summer School</TITLE>
|
||||||
</HEAD><BODY BGCOLOR="white" TEXT="black">
|
</HEAD><BODY BGCOLOR="white" TEXT="black">
|
||||||
<P ALIGN="center"><CENTER><H1>European Resource Grammar Summer School</H1>
|
<P ALIGN="center"><CENTER><H1>GF Resource Grammar Summer School</H1>
|
||||||
<FONT SIZE="4">
|
<FONT SIZE="4">
|
||||||
<I>Gothenburg, 17-28 August 2009</I><BR>
|
<I>Gothenburg, 17-28 August 2009</I><BR>
|
||||||
Aarne Ranta (aarne at chalmers.se)
|
Aarne Ranta (aarne at chalmers.se)
|
||||||
</FONT></CENTER>
|
</FONT></CENTER>
|
||||||
|
|
||||||
|
<P></P>
|
||||||
|
<HR NOSHADE SIZE=1>
|
||||||
|
<P></P>
|
||||||
|
<UL>
|
||||||
|
<LI><A HREF="#toc1">Executive summary</A>
|
||||||
|
<LI><A HREF="#toc2">Introduction</A>
|
||||||
|
<LI><A HREF="#toc3">The GF resource grammar library</A>
|
||||||
|
<UL>
|
||||||
|
<LI><A HREF="#toc4">Applications of the library</A>
|
||||||
|
<LI><A HREF="#toc5">The structure of the library</A>
|
||||||
|
</UL>
|
||||||
|
<LI><A HREF="#toc6">The summer school</A>
|
||||||
|
<UL>
|
||||||
|
<LI><A HREF="#toc7">Selecting participants</A>
|
||||||
|
<LI><A HREF="#toc8">Who is qualified</A>
|
||||||
|
<LI><A HREF="#toc9">Costs</A>
|
||||||
|
<LI><A HREF="#toc10">Teachers</A>
|
||||||
|
<LI><A HREF="#toc11">The Summer School Committee</A>
|
||||||
|
<LI><A HREF="#toc12">Time and Place</A>
|
||||||
|
<LI><A HREF="#toc13">Dissemination and intellectual property</A>
|
||||||
|
</UL>
|
||||||
|
<LI><A HREF="#toc14">Why I should participate</A>
|
||||||
|
<LI><A HREF="#toc15">More information</A>
|
||||||
|
<UL>
|
||||||
|
<LI><A HREF="#toc16">Contact</A>
|
||||||
|
<LI><A HREF="#toc17">Selected publications from earlier resource grammar projects</A>
|
||||||
|
</UL>
|
||||||
|
</UL>
|
||||||
|
|
||||||
|
<P></P>
|
||||||
|
<HR NOSHADE SIZE=1>
|
||||||
|
<P></P>
|
||||||
<P>
|
<P>
|
||||||
<I>preliminary version, 17 November 2008</I>
|
<center>
|
||||||
|
<IMG ALIGN="middle" SRC="school-langs.png" BORDER="0" ALT="">
|
||||||
|
</center>
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
<IMG ALIGN="middle" SRC="eu-langs.png" BORDER="0" ALT="">
|
<I>red=wanted, green=exists, yellow=in-progress, solid=official-eu, dotted=non-eu</I>
|
||||||
</P>
|
</P>
|
||||||
<H3>Executive summary</H3>
|
<A NAME="toc1"></A>
|
||||||
|
<H2>Executive summary</H2>
|
||||||
<P>
|
<P>
|
||||||
We plan to organize a summer school with the goal of implementing the GF
|
We plan to organize a summer school with the goal of implementing the GF
|
||||||
resource grammar library for 15 new languages, so that the library will
|
resource grammar library for 15 new languages, so that the library will
|
||||||
@@ -32,91 +67,76 @@ and also ported to other formats. The library is licensed under LGPL.
|
|||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Each language is implemented by one or two students working together.
|
Each language is implemented by one or two students working together.
|
||||||
Travel grants will be available for students selected on the basis of
|
Travel grants will be available for some students selected on the basis of
|
||||||
pre-conference assignments.
|
pre-conference assignments.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The official announcement will be in January 2009, and the summer school
|
The summer school will be held on 17-28 August 2009, at the campus of
|
||||||
itself on 17-28 August 2009, at the campus of Chalmers University of
|
Chalmers University of Technology in Gothenburg, Sweden.
|
||||||
Technology in Gothenburg, Sweden.
|
|
||||||
</P>
|
</P>
|
||||||
|
<A NAME="toc2"></A>
|
||||||
<H2>Introduction</H2>
|
<H2>Introduction</H2>
|
||||||
<P>
|
<P>
|
||||||
Since 2007, EU-27 has 23 official languages, listed in the diagram on top of this
|
Since 2007, EU-27 has 23 official languages, listed in the diagram on top of this
|
||||||
document.
|
document. There is a growing need for linguistic resources for these
|
||||||
There is a growing need of translation between
|
languages, to help in tasks such as translation and information retrieval.
|
||||||
these languages. The traditional language-to-language method requires 23*22 = 506
|
These resources should be <B>portable</B> and <B>freely accessible</B>.
|
||||||
translators (humans or computer programs) to cover all possible translation needs.
|
Languages marked in red in the diagram are of particular interest for
|
||||||
</P>
|
the summer school, since they are those on which the effort will be concentrated.
|
||||||
<P>
|
|
||||||
An alternative to language-to-language translation is the use of an <B>interlingua</B>:
|
|
||||||
a language-independent representation such that all translation problems can
|
|
||||||
be reduced to translating to and from the interlingua. With 23 languages,
|
|
||||||
only 2*23 = 46 translators are needed.
|
|
||||||
</P>
|
|
||||||
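<P>
The counts quoted above can be checked with a few lines of code. The following is only an illustrative Python sketch, not part of GF, and the function names are hypothetical. It compares direct language-to-language translation, translation through an interlingua, and the 1 + 23 grammar-module count that this page gives for GF.
</P>

```python
def direct_translators(n: int) -> int:
    """One translator per ordered (source, target) pair of distinct languages."""
    return n * (n - 1)


def interlingua_translators(n: int) -> int:
    """One translator into and one out of the interlingua for each language."""
    return 2 * n


def grammar_modules(n: int) -> int:
    """One shared abstract syntax plus one reversible concrete syntax per language."""
    return 1 + n


EU_LANGUAGES = 23  # official EU languages since 2007, as stated in the text

print(direct_translators(EU_LANGUAGES))       # 23 * 22
print(interlingua_translators(EU_LANGUAGES))  # 2 * 23
print(grammar_modules(EU_LANGUAGES))          # 1 + 23
```

<P>
The direct method grows quadratically with the number of languages, while the other two grow linearly, which is why each added language costs only one new module in the interlingua setting.
</P>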
<P>
|
|
||||||
Interlingua sounds too good to be true. In a sense, it is. All attempts to
|
|
||||||
create an interlingua that would solve all translation problems have failed.
|
|
||||||
However, interlinguas for restricted applications have shown more
|
|
||||||
success. For instance, mathematical texts and weather reports can be translated
|
|
||||||
by using interlinguas tailor-made for the domains of mathematics and weather reports,
|
|
||||||
respectively.
|
|
||||||
</P>
|
|
||||||
<P>
|
|
||||||
What is required of an interlingua is
|
|
||||||
</P>
|
|
||||||
<UL>
|
|
||||||
<LI>semantic accuracy: correspondence to what you want to say in the application
|
|
||||||
<LI>language-independence: abstraction from individual languages
|
|
||||||
</UL>
|
|
||||||
|
|
||||||
<P>
|
|
||||||
Thus, for instance, an interlingua for mathematical texts may be based on
|
|
||||||
mathematical logic, which at the same time gives semantic accuracy and
|
|
||||||
language independence. In other domains, something else than mathematical
|
|
||||||
logic may be needed; the <B>ontologies</B> defined within the semantic
|
|
||||||
web technology are often good starting points for interlinguas.
|
|
||||||
</P>
|
|
||||||
<H2>GF: a framework for multilingual grammars</H2>
|
|
||||||
<P>
|
|
||||||
The interlingua is just one part of a translation system. We also need
|
|
||||||
the mappings between the interlingua and the involved languages. As the
|
|
||||||
number of languages increases, this part grows while the interlingua remains
|
|
||||||
constant.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
GF (Grammatical Framework,
|
GF (Grammatical Framework,
|
||||||
<A HREF="http://digitalgrammars.com/gf"><CODE>digitalgrammars.com/gf</CODE></A>)
|
<A HREF="http://digitalgrammars.com/gf"><CODE>digitalgrammars.com/gf</CODE></A>)
|
||||||
is a programming language designed to support interlingua-based translation.
|
is a <B>functional programming language</B> designed for writing natural
|
||||||
A "program" in GF is a <B>multilingual grammar</B>, which consists of an
|
language grammars. It provides an efficient platform for this task, due to
|
||||||
<B>abstract syntax</B> and a set of <B>concrete syntaxes</B>. A concrete
|
its modern characteristics:
|
||||||
syntax is a mapping from the abstract syntax to a particular language.
|
</P>
|
||||||
These mappings are <B>reversible</B>, which means that they can be used for
|
<UL>
|
||||||
translating in both directions. This means that creating an interlingua-based
|
<LI>It is a functional programming language, similar to Haskell and ML.
|
||||||
translator for 23 languages just requires 1 + 23 = 24 grammar modules (the abstract
|
<LI>It has a static type system and type checker.
|
||||||
syntax and the concrete syntaxes).
|
<LI>It has a powerful module system supporting separate compilation
|
||||||
|
and data abstraction.
|
||||||
|
<LI>It has an optimizing compiler to <B>Portable Grammar Format</B> (PGF).
|
||||||
|
<LI>PGF can be further compiled to other formats, such as JavaScript and
|
||||||
|
speech recognition language models.
|
||||||
|
<LI>GF has a <B>resource grammar library</B> giving access to the morphology and
|
||||||
|
basic syntax of 12 languages.
|
||||||
|
</UL>
|
||||||
|
|
||||||
|
<P>
|
||||||
|
In addition to "ordinary" grammars for single languages, GF
|
||||||
|
supports <B>multilingual grammars</B>. A multilingual GF grammar consists of an
|
||||||
|
<B>abstract syntax</B> and a set of <B>concrete syntaxes</B>.
|
||||||
|
An abstract syntax is a system of <B>trees</B>, serving as a semantic
|
||||||
|
model or an ontology. A concrete syntax is a mapping from abstract syntax
|
||||||
|
trees to strings of a particular language.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The diagram first in this document shows an interlingua
|
These mappings defined in concrete syntax are <B>reversible</B>: they
|
||||||
system covering the 23 EU languages.
|
can be used both for <B>generating</B> strings from trees, and for
|
||||||
Languages marked in
|
<B>parsing</B> strings into trees. Combinations of generation and
|
||||||
red are of particular interest for the summer school, since they are those
|
parsing can be used for <B>translation</B>, where the abstract
|
||||||
on which the effort will be concentrated.
|
syntax works as an <B>interlingua</B>. Thus GF has been used as a
|
||||||
|
framework for building translation systems in several areas
|
||||||
|
of application and large sets of languages.
|
||||||
</P>
|
</P>
|
||||||
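<P>
The relation between abstract syntax, concrete syntax, generation, parsing and translation can be illustrated with a toy model. The sketch below is written in Python rather than GF, and every name in it (the tree shape, the lexicons, the functions) is a hypothetical choice made for illustration. It does not show how GF itself parses; parsing is done here by naive enumeration of all trees.
</P>

```python
from itertools import product

# Abstract syntax (hypothetical): trees of the form ("Pred", item, quality).
ITEMS = ["Wine", "Bread"]
QUALITIES = ["Good", "Fresh"]

# Concrete syntaxes (hypothetical): one lexicon per language, mapping
# abstract constants to strings of that language.
LEXICON = {
    "Eng": {"Wine": "the wine", "Bread": "the bread",
            "Good": "good", "Fresh": "fresh", "is": "is"},
    "Ger": {"Wine": "der Wein", "Bread": "das Brot",
            "Good": "gut", "Fresh": "frisch", "is": "ist"},
}


def all_trees():
    """Enumerate every tree of this tiny abstract syntax."""
    return [("Pred", item, quality) for item, quality in product(ITEMS, QUALITIES)]


def linearize(tree, lang):
    """Generation: map an abstract tree to a string of the given language."""
    _, item, quality = tree
    words = LEXICON[lang]
    return f"{words[item]} {words['is']} {words[quality]}"


def parse(sentence, lang):
    """Parsing: find the trees whose linearization equals the sentence."""
    return [tree for tree in all_trees() if linearize(tree, lang) == sentence]


def translate(sentence, source, target):
    """Translation: parse in the source language, generate in the target."""
    return [linearize(tree, target) for tree in parse(sentence, source)]


print(translate("the wine is good", "Eng", "Ger"))
print(translate("das Brot ist frisch", "Ger", "Eng"))
```

<P>
Adding a third language to this sketch only requires one more lexicon entry; the abstract syntax and the translation function stay unchanged, which mirrors the interlingua idea described above.
</P>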
|
<A NAME="toc3"></A>
|
||||||
<H2>The GF resource grammar library</H2>
|
<H2>The GF resource grammar library</H2>
|
||||||
<P>
|
<P>
|
||||||
The GF resource grammar library is a set of grammars used as libraries when
|
The GF resource grammar library is a set of grammars usable as libraries when
|
||||||
building interlingua-based translation systems. The library currently covers
|
building translation systems and other applications.
|
||||||
|
The library currently covers
|
||||||
the 9 languages coloured in green in the diagram above; in addition,
|
the 9 languages coloured in green in the diagram above; in addition,
|
||||||
Catalan, Norwegian, and Russian are covered, and there is ongoing work on
|
Catalan, Norwegian, and Russian are covered, and there is ongoing work on
|
||||||
Arabic, Hindi/Urdu, and Thai.
|
Arabic, Hindi/Urdu, Polish, Romanian, and Thai.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The purpose of the resource grammar library is to define the "low-level" structure
|
The purpose of the resource grammar library is to define the "low-level" structure
|
||||||
of a language: inflection, word order, agreement. This structure belongs to what
|
of a language: inflection, word order, agreement. This structure belongs to what
|
||||||
linguists call morphology and syntax. It can be very complex and requires
|
linguists call morphology and syntax. It can be very complex and requires
|
||||||
a lot of knowledge. Yet, when translating from one language to another, knowing
|
a lot of knowledge. Yet, when translating from one language to
|
||||||
morphology and syntax is but a part of what is needed. The translator (whether human
|
another, knowing morphology and syntax is but a part of what is needed.
|
||||||
|
The translator (whether human
|
||||||
or machine) must understand the meaning of what is translated, and must also know
|
or machine) must understand the meaning of what is translated, and must also know
|
||||||
the idiomatic way to express the meaning in the target language. This knowledge
|
the idiomatic way to express the meaning in the target language. This knowledge
|
||||||
can be very domain-dependent and requires in general an expert in the field to
|
can be very domain-dependent and requires in general an expert in the field to
|
||||||
@@ -127,13 +147,15 @@ in the field of weather reports, etc.
|
|||||||
The problem is to find a person who is an expert in both the domain of translation
|
The problem is to find a person who is an expert in both the domain of translation
|
||||||
and in the low-level linguistic details. It is the rareness of this combination
|
and in the low-level linguistic details. It is the rareness of this combination
|
||||||
that has made it difficult to build interlingua-based translation systems.
|
that has made it difficult to build interlingua-based translation systems.
|
||||||
The GF resource grammar library has the mission of helping in this task. It encapsulates
|
The GF resource grammar library has the mission of helping in this task.
|
||||||
the low-level linguistics in program modules accessed through easy-to-use interfaces.
|
It encapsulates the low-level linguistics in program modules
|
||||||
|
accessed through easy-to-use interfaces.
|
||||||
Experts on different domains can build translation systems by using the library,
|
Experts on different domains can build translation systems by using the library,
|
||||||
without knowing low-level linguistics. The idea is much the same as when a
|
without knowing low-level linguistics. The idea is much the same as when a
|
||||||
programmer builds a graphical user interface (GUI) from high-level elements such as
|
programmer builds a graphical user interface (GUI) from high-level elements such as
|
||||||
buttons and menus, without having to care about pixels or geometrical forms.
|
buttons and menus, without having to care about pixels or geometrical forms.
|
||||||
</P>
|
</P>
|
||||||
|
<A NAME="toc4"></A>
|
||||||
<H3>Applications of the library</H3>
|
<H3>Applications of the library</H3>
|
||||||
<P>
|
<P>
|
||||||
In addition to translation, the library is also useful in <B>localization</B>,
|
In addition to translation, the library is also useful in <B>localization</B>,
|
||||||
@@ -149,25 +171,29 @@ interlingua-based translation or localization of systems to new languages:
|
|||||||
<A HREF="http://webalt.math.helsinki.fi/content/index_eng.html"><CODE>http://webalt.math.helsinki.fi/content/index_eng.html</CODE></A>,
|
<A HREF="http://webalt.math.helsinki.fi/content/index_eng.html"><CODE>http://webalt.math.helsinki.fi/content/index_eng.html</CODE></A>,
|
||||||
for translating mathematical exercises to 7 languages
|
for translating mathematical exercises to 7 languages
|
||||||
<LI>in TALK <A HREF="http://www.talk-project.org"><CODE>http://www.talk-project.org</CODE></A>,
|
<LI>in TALK <A HREF="http://www.talk-project.org"><CODE>http://www.talk-project.org</CODE></A>,
|
||||||
where the library was used for localizing spoken dialogue systems to six languages
|
where the library was used for localizing spoken dialogue systems
|
||||||
|
to six languages
|
||||||
</UL>
|
</UL>
|
||||||
|
|
||||||
<P>
|
<P>
|
||||||
The library is also a generic linguistic resource, which can be used for tasks
|
The library is also a generic linguistic resource, which can be used for tasks
|
||||||
such as language teaching and information retrieval. The liberal license (LGPL)
|
such as language teaching and information retrieval. The liberal license (LGPL)
|
||||||
makes it usable for anyone and for any task. GF also has tools supporting the
|
makes it usable for anyone and for any task. GF also has tools supporting the
|
||||||
use of grammars in programs written in other programming languages: C, C++, Haskell,
|
use of grammars in programs written in other
|
||||||
Java, JavaScript, and Prolog. In connection with the TALK project, support has also been
|
programming languages: C, C++, Haskell,
|
||||||
|
Java, JavaScript, and Prolog. In connection with the TALK project,
|
||||||
|
support has also been
|
||||||
developed for translating GF grammars to language models used in speech
|
developed for translating GF grammars to language models used in speech
|
||||||
recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF).
|
recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF).
|
||||||
</P>
|
</P>
|
||||||
|
<A NAME="toc5"></A>
|
||||||
<H3>The structure of the library</H3>
|
<H3>The structure of the library</H3>
|
||||||
<P>
|
<P>
|
||||||
The library has the following main parts:
|
The library has the following main parts:
|
||||||
</P>
|
</P>
|
||||||
<UL>
|
<UL>
|
||||||
<LI><B>Inflection paradigms</B>, covering the inflection of each language.
|
<LI><B>Inflection paradigms</B>, covering the inflection of each language.
|
||||||
<LI><B>Common Syntax API</B>, covering a large set of syntax rules that
|
<LI><B>Core Syntax</B>, covering a large set of syntax rules that
|
||||||
can be implemented for all languages involved.
|
can be implemented for all languages involved.
|
||||||
<LI><B>Common Test Lexicon</B>, giving ca. 500 common words that can be used for
|
<LI><B>Common Test Lexicon</B>, giving ca. 500 common words that can be used for
|
||||||
testing the library.
|
testing the library.
|
||||||
@@ -181,11 +207,13 @@ The library has the following main parts:
|
|||||||
The goal of the summer school is to implement, for each language, at least
|
The goal of the summer school is to implement, for each language, at least
|
||||||
the first three components. The latter three are more open-ended in character.
|
the first three components. The latter three are more open-ended in character.
|
||||||
</P>
|
</P>
|
||||||
|
<A NAME="toc6"></A>
|
||||||
<H2>The summer school</H2>
|
<H2>The summer school</H2>
|
||||||
<P>
|
<P>
|
||||||
The goal of the summer school is to extend the GF resource grammar library
|
The goal of the summer school is to extend the GF resource grammar library
|
||||||
to cover all 23 EU languages, which means we need 15 new languages.
|
to cover all 23 EU languages, which means we need 15 new languages.
|
||||||
We also welcome other languages, if there are interested participants.
|
We also welcome languages other than these 23,
|
||||||
|
if there are interested participants.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
The amount of work and skill is between a Master's thesis and a PhD thesis.
|
The amount of work and skill is between a Master's thesis and a PhD thesis.
|
||||||
@@ -201,50 +229,52 @@ will probably require more work.
|
|||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
In any case, the proposed allocation of work power is 2 participants per
|
In any case, the proposed allocation of work power is 2 participants per
|
||||||
language. They will have 6 months to work at home, followed
|
language. They will do 2 months' worth of home work, followed
|
||||||
by 2 weeks of summer school. Who are these participants?
|
by 2 weeks of summer school, followed by 4 months' work at home.
|
||||||
|
Who are these participants?
|
||||||
</P>
|
</P>
|
||||||
|
<A NAME="toc7"></A>
|
||||||
<H3>Selecting participants</H3>
|
<H3>Selecting participants</H3>
|
||||||
<P>
|
<P>
|
||||||
After the call has been published, persons interested in participating in
|
Persons interested in participating in the Summer School should sign up in
|
||||||
the project are expected to learn GF by self-study from the
|
the <B>Google Group</B> of the course,
|
||||||
<A HREF="http://digitalgrammars.com/gf/doc/gf-tutorial.html">tutorial</A>.
|
|
||||||
This should take a couple of weeks. Also an on-line course will be
|
|
||||||
arranged to help in getting started with GF.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Participants should continue to
|
<A HREF="http://groups.google.com/group/gf-resource-school-2009/"><CODE>groups.google.com/group/gf-resource-school-2009/</CODE></A>
|
||||||
implement selected parts of the resource grammar, following the advice from
|
</P>
|
||||||
the
|
<P>
|
||||||
<A HREF="http://digitalgrammars.com/gf/doc/Resource-HOWTO.html">Resource-HOWTO document</A>.
|
The participants are expected to learn GF by self-study from the
|
||||||
What parts exactly are selected will be announced later.
|
<A HREF="http://digitalgrammars.com/gf/doc/gf-tutorial.html">tutorial</A>.
|
||||||
This work will take another couple of weeks.
|
This should take a couple of weeks. An <B>on-line course</B> will be
|
||||||
|
arranged in April to help in getting started with GF.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
After the on-line course, a <B>programming assignment</B> will be published.
|
||||||
|
This assignment will test skills required in resource grammar programming.
|
||||||
|
Work on the assignment will take a couple of weeks.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Those who are interested in getting a travel grant will submit
|
Those who are interested in getting a travel grant will submit
|
||||||
their sample resource grammar fragment
|
their sample resource grammar fragment
|
||||||
to the Summer School Committee in the beginning of May.
|
to the Summer School Committee by 12 May.
|
||||||
The Committee then decides who is invited to represent which language
|
The Committee then decides who is invited to represent which language
|
||||||
in the summer school.
|
in the summer school.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
After the Committee decision, the participants have around three months
|
The summer school itself is devoted to working on resource grammars.
|
||||||
to work on their languages. The work is completed in the summer school
|
In addition to grammar writing itself, testing and evaluation are
|
||||||
itself. It is also thoroughly tested by using it to add new languages
|
performed. One way to do this is via adding new languages
|
||||||
to applications - in particular, to the WebALT mathematical
|
to resource grammar applications - in particular, to the WebALT mathematical
|
||||||
exercise translator.
|
exercise translator.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
Depending on the quality of submitted work, and on the demands of different
|
The resource grammars are expected to be completed by December 2009. They will
|
||||||
languages, the Committee may decide to select another number than 2 participants
|
be published on the GF website and licensed under the LGPL.
|
||||||
for a language. We will also consider accepting participants who want to
|
|
||||||
pay their own expenses.
|
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
To keep track of who is working on which language, we will establish a Wiki page
|
The participants are encouraged to contact each other and even work in groups.
|
||||||
soon after the call is published. The participants are encouraged
|
|
||||||
to contact each other and even work in groups.
|
|
||||||
</P>
|
</P>
|
||||||
|
<A NAME="toc8"></A>
|
||||||
<H3>Who is qualified</H3>
|
<H3>Who is qualified</H3>
|
||||||
<P>
|
<P>
|
||||||
Writing a resource grammar implementation requires good general programming
|
Writing a resource grammar implementation requires good general programming
|
||||||
@@ -265,6 +295,7 @@ But it is the quality of the assignment that is assessed, not any formal
|
|||||||
requirements. The "typical participant" was described to give an idea of
|
requirements. The "typical participant" was described to give an idea of
|
||||||
who is likely to succeed in this.
|
who is likely to succeed in this.
|
||||||
</P>
|
</P>
|
||||||
|
<A NAME="toc9"></A>
|
||||||
<H3>Costs</H3>
|
<H3>Costs</H3>
|
||||||
<P>
|
<P>
|
||||||
Our aim is to make the summer school free of charge for the participants
|
Our aim is to make the summer school free of charge for the participants
|
||||||
@@ -273,8 +304,15 @@ we plan to cover their travel and accommodation costs, up to 1000 EUR
|
|||||||
per person.
|
per person.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<P>
|
||||||
We try to get the funding question settled by mid-February 2009.
|
The number of grants will be decided during Spring 2009, so that grant
|
||||||
|
holders can be notified before the beginning of June.
|
||||||
</P>
|
</P>
|
||||||
|
<P>
|
||||||
|
Special terms will apply to students in
|
||||||
|
<A HREF="http://www.gslt.hum.gu.se/">GSLT</A> and
|
||||||
|
<A HREF="http://ngslt.org/">NGSLT</A>.
|
||||||
|
</P>
|
||||||
|
<A NAME="toc10"></A>
|
||||||
<H3>Teachers</H3>
|
<H3>Teachers</H3>
|
||||||
<P>
|
<P>
|
||||||
A list of teachers will be published here later. Some of the local teachers
|
A list of teachers will be published here later. Some of the local teachers
|
||||||
@@ -298,11 +336,13 @@ we can discuss your involvement and travel arrangements.
|
|||||||
In addition to teachers, we will look for consultants who can help to assess
|
In addition to teachers, we will look for consultants who can help to assess
|
||||||
the results for each language. Please contact us!
|
the results for each language. Please contact us!
|
||||||
</P>
|
</P>
|
||||||
|
<A NAME="toc11"></A>
|
||||||
<H3>The Summer School Committee</H3>
|
<H3>The Summer School Committee</H3>
|
||||||
<P>
|
<P>
|
||||||
This committee consists of a number of teachers and consultants,
|
This committee consists of a number of teachers and informants,
|
||||||
who will select the participants. It will be selected by February 2009.
|
who will select the participants. It will be selected by April 2009.
|
||||||
</P>
|
</P>
|
||||||
|
<A NAME="toc12"></A>
|
||||||
<H3>Time and Place</H3>
|
<H3>Time and Place</H3>
|
||||||
<P>
|
<P>
|
||||||
The summer school will
|
The summer school will
|
||||||
@@ -313,15 +353,16 @@ Sweden, on 17-28 August 2009.
|
|||||||
Time schedule:
|
Time schedule:
|
||||||
</P>
|
</P>
|
||||||
<UL>
|
<UL>
|
||||||
<LI>February: announcement of summer school and the grammar
|
<LI>February: announcement of summer school
|
||||||
writing contest to get participants
|
<LI>April: on-line course, work on the contest assignment
|
||||||
<LI>March-April: on-line course, work on the contest assignment (ca 1 month)
|
<LI>12 May: submission deadline for assignment work
|
||||||
<LI>May: submission deadline and notification of acceptance
|
<LI>31 May: review of assignments, notifications of acceptance
|
||||||
<LI>June-July: more work on the grammars
|
<LI>17-28 August: Summer School
|
||||||
<LI>August: summer school
|
<LI>September-December: homework on resource grammars
|
||||||
<LI>September-December: more homework if necessary
|
<LI>December: release of the extended Resource Grammar Library
|
||||||
</UL>
|
</UL>
|
||||||
|
|
||||||
|
<A NAME="toc13"></A>
|
||||||
<H3>Dissemination and intellectual property</H3>
|
<H3>Dissemination and intellectual property</H3>
|
||||||
<P>
|
<P>
|
||||||
The new resource grammars will be released under the LGPL just like
|
The new resource grammars will be released under the LGPL just like
|
||||||
@@ -331,28 +372,137 @@ with the copyright held by respective authors.
|
|||||||
<P>
|
<P>
|
||||||
The grammars will be distributed via the GF web site.
|
The grammars will be distributed via the GF web site.
|
||||||
</P>
|
</P>
|
||||||
<P>
|
<A NAME="toc14"></A>
|
||||||
The WebALT-specific grammars will have special licenses agreed between the
|
|
||||||
authors and WebALT Inc.
|
|
||||||
</P>
|
|
||||||
<H2>Why I should participate</H2>
|
<H2>Why I should participate</H2>
|
||||||
<P>
|
<P>
|
||||||
Seven reasons:
|
Seven reasons:
|
||||||
</P>
|
</P>
|
||||||
<OL>
|
<OL>
|
||||||
<LI>participation in pioneering language technology work in an enthusiastic atmosphere
|
<LI>participation in pioneering language technology work in an
|
||||||
|
enthusiastic atmosphere
|
||||||
<LI>work and fun with people from all over Europe and the world
|
<LI>work and fun with people from all over Europe and the world
|
||||||
<LI>job opportunities and business ideas
|
<LI>job opportunities and business ideas
|
||||||
<LI>credits: the school project will be established as a course at Chalmers worth
|
<LI>credits: the school project will be established as a course at Chalmers worth
|
||||||
15 ECTS points per person, but extensions to Master's thesis will
|
7.5 or 15 ECTS points per person, depending on the work accomplished; also
|
||||||
also be considered
|
extensions to Master's thesis will be considered (special credit arrangements
|
||||||
<LI>merits: the resulting grammar can easily lead to a published paper
|
for <A HREF="http://www.gslt.hum.gu.se/">GSLT</A> and <A HREF="http://ngslt.org/">NGSLT</A>)
|
||||||
|
<LI>merits: the resulting grammar can easily lead to a published paper (see below)
|
||||||
<LI>contribution to the multilingual and multicultural development of Europe and the
|
<LI>contribution to the multilingual and multicultural development of Europe and the
|
||||||
world
|
world
|
||||||
<LI>free trip and stay in Gothenburg (for travel grant students)
|
<LI>free trip and stay in Gothenburg (for travel grant students)
|
||||||
</OL>
|
</OL>
|
||||||
|
|
||||||
|
<A NAME="toc15"></A>
|
||||||
|
<H2>More information</H2>
|
||||||
|
<P>
|
||||||
|
<A HREF="http://groups.google.com/group/gf-resource-school-2009/">Course Google Group</A>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<A HREF="http://digitalgrammars.com/gf/">GF web page</A>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<A HREF="http://digitalgrammars.com/gf/doc/gf-tutorial.html">GF tutorial</A>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
<A HREF="http://digitalgrammars.com/gf/doc/Resource-HOWTO.html">Resource-HOWTO document</A>
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Forthcoming: survey article "The GF Resource Grammar Library"
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Forthcoming: book about GF
|
||||||
|
</P>
|
||||||
|
<A NAME="toc16"></A>
|
||||||
|
<H3>Contact</H3>
|
||||||
|
<P>
|
||||||
|
Håkan Burden: burden at chalmers se
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
Aarne Ranta: aarne at chalmers se
|
||||||
|
</P>
|
||||||
|
<A NAME="toc17"></A>
|
||||||
|
<H3>Selected publications from earlier resource grammar projects</H3>
|
||||||
|
<P>
|
||||||
|
K. Angelov.
|
||||||
|
Type-Theoretical Bulgarian Grammar.
|
||||||
|
In B. Nordström and A. Ranta (eds),
|
||||||
|
<I>Advances in Natural Language Processing (GoTAL 2008)</I>,
|
||||||
|
LNCS/LNAI 5221, Springer,
|
||||||
|
2008.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
A. El Dada and A. Ranta.
|
||||||
|
Implementing an Open Source Arabic Resource Grammar in GF.
|
||||||
|
In M. Mughazy (ed),
|
||||||
|
<I>Perspectives on Arabic Linguistics XX. Papers from the Twentieth Annual Symposium on Arabic Linguistics, Kalamazoo, March 26</I>,
|
||||||
|
John Benjamins Publishing Company.
|
||||||
|
2007.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
A. El Dada.
|
||||||
|
Implementation of the Arabic Numerals and their Syntax in GF.
|
||||||
|
Computational Approaches to Semitic Languages: Common Issues and Resources,
|
||||||
|
ACL-2007 Workshop,
|
||||||
|
June 28, 2007, Prague.
|
||||||
|
2007.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
H. Hammarström and A. Ranta.
|
||||||
|
Cardinal Numerals Revisited in GF.
|
||||||
|
<I>Workshop on Numerals in the World's Languages</I>.
|
||||||
|
Dept. of Linguistics, Max Planck Institute for Evolutionary Anthropology, Leipzig,
|
||||||
|
2004.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
M. Humayoun, H. Hammarström, and A. Ranta.
|
||||||
|
Urdu Morphology, Orthography and Lexicon Extraction.
|
||||||
|
<I>CAASL-2: The Second Workshop on Computational Approaches to Arabic Script-based Languages</I>,
|
||||||
|
July 21-22, 2007, LSA 2007 Linguistic Institute, Stanford University.
|
||||||
|
2007.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
J. Khegai.
|
||||||
|
GF parallel resource grammars and Russian.
|
||||||
|
In proceedings of ACL2006
|
||||||
|
(The joint conference of the International Committee on Computational
|
||||||
|
Linguistics and the Association for Computational Linguistics) (pp. 475-482),
|
||||||
|
Sydney, Australia, July 2006.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
J. Khegai.
|
||||||
|
Language engineering in Grammatical Framework (GF).
|
||||||
|
PhD thesis, Computer Science, Chalmers University of Technology,
|
||||||
|
2006.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
W. Ng'ang'a.
|
||||||
|
Multilingual content development for eLearning in Africa.
|
||||||
|
eLearning Africa: 1st Pan-African Conference on ICT for Development,
|
||||||
|
Education and Training. 24-26 May 2006, Addis Ababa, Ethiopia.
|
||||||
|
2006.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
N. Perera and A. Ranta.
|
||||||
|
Dialogue System Localization with the GF Resource Grammar Library.
|
||||||
|
<I>SPEECHGRAM 2007: ACL Workshop on Grammar-Based Approaches to Spoken Language Processing</I>,
|
||||||
|
June 29, 2007, Prague.
|
||||||
|
2007.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
A. Ranta.
|
||||||
|
Modular Grammar Engineering in GF.
|
||||||
|
<I>Research on Language and Computation</I>,
|
||||||
|
5:133-158, 2007.
|
||||||
|
</P>
|
||||||
|
<P>
|
||||||
|
A. Ranta.
|
||||||
|
How predictable is Finnish morphology? An experiment on lexicon construction.
|
||||||
|
In J. Nivre, M. Dahllöf and B. Megyesi (eds),
|
||||||
|
<I>Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein</I>,
|
||||||
|
University of Uppsala,
|
||||||
|
2008.
|
||||||
|
</P>
|
||||||
|
|
||||||
<!-- html code generated by txt2tags 2.4 (http://txt2tags.sf.net) -->
|
<!-- html code generated by txt2tags 2.4 (http://txt2tags.sf.net) -->
|
||||||
<!-- cmdline: txt2tags gf-summerschool.txt -->
|
<!-- cmdline: txt2tags -\-toc gf-summerschool.txt -->
|
||||||
</BODY></HTML>
|
</BODY></HTML>
|
||||||
|
|||||||
@@ -1,17 +1,22 @@
|
|||||||
European Resource Grammar Summer School
|
GF Resource Grammar Summer School
|
||||||
Gothenburg, 17-28 August 2009
|
Gothenburg, 17-28 August 2009
|
||||||
Aarne Ranta (aarne at chalmers.se)
|
Aarne Ranta (aarne at chalmers.se)
|
||||||
|
|
||||||
%!Encoding : iso-8859-1
|
%!Encoding : iso-8859-1
|
||||||
|
|
||||||
%!target:html
|
%!target:html
|
||||||
|
%!postproc(html): #BECE <center>
|
||||||
|
%!postproc(html): #ENCE </center>
|
||||||
|
|
||||||
//preliminary version, 17 November 2008//
|
#BECE
|
||||||
|
[school-langs.png]
|
||||||
[eu-langs.png]
|
#ENCE
|
||||||
|
|
||||||
|
|
||||||
===Executive summary===
|
//red=wanted, green=exists, yellow=in-progress, solid=official-eu, dotted=non-eu//
|
||||||
|
|
||||||
|
|
||||||
|
==Executive summary==
|
||||||
|
|
||||||
We plan to organize a summer school with the goal of implementing the GF
|
We plan to organize a summer school with the goal of implementing the GF
|
||||||
resource grammar library for 15 new languages, so that the library will
|
resource grammar library for 15 new languages, so that the library will
|
||||||
@@ -24,89 +29,71 @@ and basic syntax of each language. It can be used in GF applications
|
|||||||
and also ported to other formats. The library is licensed under LGPL.
|
and also ported to other formats. The library is licensed under LGPL.
|
||||||
|
|
||||||
Each language is implemented by one or two students working together.
|
Each language is implemented by one or two students working together.
|
||||||
Travel grants will be available for students selected on the basis of
|
Travel grants will be available for some students selected on the basis of
|
||||||
pre-conference assignments.
|
pre-conference assignments.
|
||||||
|
|
||||||
The official announcement will be in January 2009, and the summer school
|
The summer school will be held on 17-28 August 2009, at the campus of
|
||||||
itself on 17-28 August 2009, at the campus of Chalmers University of
|
Chalmers University of Technology in Gothenburg, Sweden.
|
||||||
Technology in Gothenburg, Sweden.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
==Introduction==
|
==Introduction==
|
||||||
|
|
||||||
Since 2007, EU-27 has 23 official languages, listed in the diagram on top of this
|
Since 2007, EU-27 has 23 official languages, listed in the diagram on top of this
|
||||||
document.
|
document. There is a growing need for linguistic resources for these
|
||||||
%[``http://ec.europa.eu/education/policies/lang/languages/index_en.html``
|
languages, to help in tasks such as translation and information retrieval.
|
||||||
%http://ec.europa.eu/education/policies/lang/languages/index_en.html].
|
These resources should be **portable** and **freely accessible**.
|
||||||
There is a growing need of translation between
|
Languages marked in red in the diagram are of particular interest for
|
||||||
these languages. The traditional language-to-language method requires 23*22 = 506
|
the summer school, since they are those on which the effort will be concentrated.
|
||||||
translators (humans or computer programs) to cover all possible translation needs.
|
|
||||||
|
|
||||||
An alternative to language-to-language translation is the use of an **interlingua**:
|
|
||||||
a language-independent representation such that all translation problems can
|
|
||||||
be reduced to translating to and from the interlingua. With 23 languages,
|
|
||||||
only 2*23 = 46 translators are needed.
|
|
||||||
|
|
||||||
Interlingua sounds too good to be true. In a sense, it is. All attempts to
|
|
||||||
create an interlingua that would solve all translation problems have failed.
|
|
||||||
However, interlinguas for restricted applications have shown more
|
|
||||||
success. For instance, mathematical texts and weather reports can be translated
|
|
||||||
by using interlinguas tailor-made for the domains of mathematics and weather reports,
|
|
||||||
respectively.
|
|
||||||
|
|
||||||
What is required of an interlingua is
|
|
||||||
- semantic accuracy: correspondence to what you want to say in the application
|
|
||||||
- language-independence: abstraction from individual languages
|
|
||||||
|
|
||||||
|
|
||||||
Thus, for instance, an interlingua for mathematical texts may be based on
|
|
||||||
mathematical logic, which at the same time gives semantic accuracy and
|
|
||||||
language independence. In other domains, something else than mathematical
|
|
||||||
logic may be needed; the **ontologies** defined within the semantic
|
|
||||||
web technology are often good starting points for interlinguas.
|
|
||||||
|
|
||||||
|
|
||||||
==GF: a framework for multilingual grammars==
|
|
||||||
|
|
||||||
The interlingua is just one part of a translation system. We also need
|
|
||||||
the mappings between the interlingua and the involved languages. As the
|
|
||||||
number of languages increases, this part grows while the interlingua remains
|
|
||||||
constant.
|
|
||||||
|
|
||||||
GF (Grammatical Framework,
|
GF (Grammatical Framework,
|
||||||
[``digitalgrammars.com/gf`` http://digitalgrammars.com/gf])
|
[``digitalgrammars.com/gf`` http://digitalgrammars.com/gf])
|
||||||
is a programming language designed to support interlingua-based translation.
|
is a **functional programming language** designed for writing natural
|
||||||
A "program" in GF is a **multilingual grammar**, which consists of an
|
language grammars. It provides an efficient platform for this task, due to
|
||||||
**abstract syntax** and a set of **concrete syntaxes**. A concrete
|
its modern characteristics:
|
||||||
syntaxes is a mapping from the abstract syntax to a particular language.
|
- It is a functional programming language, similar to Haskell and ML.
|
||||||
These mappings are **reversible**, which means that they can be used for
|
- It has a static type system and type checker.
|
||||||
translating in both directions. This means that creating an interlingua-based
|
- It has a powerful module system supporting separate compilation
|
||||||
translator for 23 languages just requires 1 + 23 = 24 grammar modules (the abstract
|
and data abstraction.
|
||||||
syntax and the concrete syntaxes).
|
- It has an optimizing compiler to **Portable Grammar Format** (PGF).
|
||||||
|
- PGF can be further compiled to other formats, such as JavaScript and
|
||||||
|
speech recognition language models.
|
||||||
|
- GF has a **resource grammar library** giving access to the morphology and
|
||||||
|
basic syntax of 12 languages.
|
||||||
|
|
||||||
The diagram first in this document shows an interlingua
|
|
||||||
system covering the 23 EU languages.
|
|
||||||
Languages marked in
|
|
||||||
red are of particular interest for the summer school, since they are those
|
|
||||||
on which the effort will be concentrated.
|
|
||||||
|
|
||||||
|
In addition to "ordinary" grammars for single languages, GF
|
||||||
|
supports **multilingual grammars**. A multilingual GF grammar consists of an
|
||||||
|
**abstract syntax** and a set of **concrete syntaxes**.
|
||||||
|
An abstract syntax is a system of **trees**, serving as a semantic
|
||||||
|
model or an ontology. A concrete syntax is a mapping from abstract syntax
|
||||||
|
trees to strings of a particular language.
|
||||||
|
|
||||||
|
These mappings defined in concrete syntax are **reversible**: they
|
||||||
|
can be used both for **generating** strings from trees, and for
|
||||||
|
**parsing** strings into trees. Combinations of generation and
|
||||||
|
parsing can be used for **translation**, where the abstract
|
||||||
|
syntax works as an **interlingua**. Thus GF has been used as a
|
||||||
|
framework for building translation systems in several areas
|
||||||
|
of application and large sets of languages.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
==The GF resource grammar library==
|
==The GF resource grammar library==
|
||||||
|
|
||||||
The GF resource grammar library is a set of grammars used as libraries when
|
The GF resource grammar library is a set of grammars usable as libraries when
|
||||||
building interlingua-based translation systems. The library currently covers
|
building translation systems and other applications.
|
||||||
|
The library currently covers
|
||||||
the 9 languages coloured in green in the diagram above; in addition,
|
the 9 languages coloured in green in the diagram above; in addition,
|
||||||
Catalan, Norwegian, and Russian are covered, and there is ongoing work on
|
Catalan, Norwegian, and Russian are covered, and there is ongoing work on
|
||||||
Arabic, Hindi/Urdu, and Thai.
|
Arabic, Hindi/Urdu, Polish, Romanian, and Thai.
|
||||||
|
|
||||||
The purpose of the resource grammar library is to define the "low-level" structure
|
The purpose of the resource grammar library is to define the "low-level" structure
|
||||||
of a language: inflection, word order, agreement. This structure belongs to what
|
of a language: inflection, word order, agreement. This structure belongs to what
|
||||||
linguists call morphology and syntax. It can be very complex and requires
|
linguists call morphology and syntax. It can be very complex and requires
|
||||||
a lot of knowledge. Yet, when translating from one language to another, knowing
|
a lot of knowledge. Yet, when translating from one language to
|
||||||
morphology and syntax is but a part of what is needed. The translator (whether human
|
another, knowing morphology and syntax is but a part of what is needed.
|
||||||
|
The translator (whether human
|
||||||
or machine) must understand the meaning of what is translated, and must also know
|
or machine) must understand the meaning of what is translated, and must also know
|
||||||
the idiomatic way to express the meaning in the target language. This knowledge
|
the idiomatic way to express the meaning in the target language. This knowledge
|
||||||
can be very domain-dependent and requires in general an expert in the field to
|
can be very domain-dependent and requires in general an expert in the field to
|
||||||
@@ -116,8 +103,9 @@ in the field of weather reports, etc.
|
|||||||
The problem is to find a person who is an expert in both the domain of translation
|
The problem is to find a person who is an expert in both the domain of translation
|
||||||
and in the low-level linguistic details. It is the rareness of this combination
|
and in the low-level linguistic details. It is the rareness of this combination
|
||||||
that has made it difficult to build interlingua-based translation systems.
|
that has made it difficult to build interlingua-based translation systems.
|
||||||
The GF resource grammar library has the mission of helping in this task. It encapsulates
|
The GF resource grammar library has the mission of helping in this task.
|
||||||
the low-level linguistics in program modules accessed through easy-to-use interfaces.
|
It encapsulates the low-level linguistics in program modules
|
||||||
|
accessed through easy-to-use interfaces.
|
||||||
Experts on different domains can build translation systems by using the library,
|
Experts on different domains can build translation systems by using the library,
|
||||||
without knowing low-level linguistics. The idea is much the same as when a
|
without knowing low-level linguistics. The idea is much the same as when a
|
||||||
programmer builds a graphical user interface (GUI) from high-level elements such as
|
programmer builds a graphical user interface (GUI) from high-level elements such as
|
||||||
@@ -138,14 +126,17 @@ interlingua-based translation or localization of systems to new languages:
|
|||||||
[``http://webalt.math.helsinki.fi/content/index_eng.html`` http://webalt.math.helsinki.fi/content/index_eng.html],
|
[``http://webalt.math.helsinki.fi/content/index_eng.html`` http://webalt.math.helsinki.fi/content/index_eng.html],
|
||||||
for translating mathematical exercises to 7 languages
|
for translating mathematical exercises to 7 languages
|
||||||
- in TALK [``http://www.talk-project.org`` http://www.talk-project.org],
|
- in TALK [``http://www.talk-project.org`` http://www.talk-project.org],
|
||||||
where the library was used for localizing spoken dialogue systems to six languages
|
where the library was used for localizing spoken dialogue systems
|
||||||
|
to six languages
|
||||||
|
|
||||||
|
|
||||||
The library is also a generic linguistic resource, which can be used for tasks
|
The library is also a generic linguistic resource, which can be used for tasks
|
||||||
such as language teaching and information retrieval. The liberal license (LGPL)
|
such as language teaching and information retrieval. The liberal license (LGPL)
|
||||||
makes it usable for anyone and for any task. GF also has tools supporting the
|
makes it usable for anyone and for any task. GF also has tools supporting the
|
||||||
use of grammars in programs written in other programming languages: C, C++, Haskell,
|
use of grammars in programs written in other
|
||||||
Java, JavaScript, and Prolog. In connection with the TALK project, support has also been
|
programming languages: C, C++, Haskell,
|
||||||
|
Java, JavaScript, and Prolog. In connection with the TALK project,
|
||||||
|
support has also been
|
||||||
developed for translating GF grammars to language models used in speech
|
developed for translating GF grammars to language models used in speech
|
||||||
recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF).
|
recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF).
|
||||||
|
|
||||||
@@ -155,7 +146,7 @@ recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF).
|
|||||||
|
|
||||||
The library has the following main parts:
|
The library has the following main parts:
|
||||||
- **Inflection paradigms**, covering the inflection of each language.
|
- **Inflection paradigms**, covering the inflection of each language.
|
||||||
- **Common Syntax API**, covering a large set of syntax rule that
|
- **Core Syntax**, covering a large set of syntax rules that
|
||||||
can be implemented for all languages involved.
|
can be implemented for all languages involved.
|
||||||
- **Common Test Lexicon**, giving ca. 500 common words that can be used for
|
- **Common Test Lexicon**, giving ca. 500 common words that can be used for
|
||||||
testing the library.
|
testing the library.
|
||||||
@@ -173,7 +164,8 @@ the first three components. The latter three are more open-ended in character.
|
|||||||
|
|
||||||
The goal of the summer school is to extend the GF resource grammar library
|
The goal of the summer school is to extend the GF resource grammar library
|
||||||
to covering all 23 EU languages, which means we need 15 new languages.
|
to covering all 23 EU languages, which means we need 15 new languages.
|
||||||
We also welcome other languages, if there are interested participants.
|
We also welcome languages other than these 23,
|
||||||
|
if there are interested participants.
|
||||||
|
|
||||||
The amount of work and skill is between a Master's thesis and a PhD thesis.
|
The amount of work and skill is between a Master's thesis and a PhD thesis.
|
||||||
The Russian implementation was made by Janna Khegai as a part of her
|
The Russian implementation was made by Janna Khegai as a part of her
|
||||||
@@ -187,45 +179,43 @@ Latvian and Lithuanian are the first languages of the Baltic family and
|
|||||||
will probably require more work.
|
will probably require more work.
|
||||||
|
|
||||||
In any case, the proposed allocation of work power is 2 participants per
|
In any case, the proposed allocation of work power is 2 participants per
|
||||||
language. They will have 6 months to work at home, followed
|
language. They will do 2 months' worth of home work, followed
|
||||||
by 2 weeks of summer school. Who are these participants?
|
by 2 weeks of summer school, followed by 4 months' work at home.
|
||||||
|
Who are these participants?
|
||||||
|
|
||||||
|
|
||||||
===Selecting participants===
|
===Selecting participants===
|
||||||
|
|
||||||
After the call has been published, persons interested to participate in
|
Persons interested in participating in the Summer School should sign up in
|
||||||
the project are expected to learn GF by self-study from the
|
the **Google Group** of the course,
|
||||||
[tutorial http://digitalgrammars.com/gf/doc/gf-tutorial.html].
|
|
||||||
This should take a couple of weeks. Also an on-line course will be
|
|
||||||
arranged to help in getting started with GF.
|
|
||||||
|
|
||||||
Participants should continue to
|
[``groups.google.com/group/gf-resource-school-2009/`` http://groups.google.com/group/gf-resource-school-2009/]
|
||||||
implement selected parts of the resource grammar, following the advice from
|
|
||||||
the
|
The participants are expected to learn GF by self-study from the
|
||||||
[Resource-HOWTO document http://digitalgrammars.com/gf/doc/Resource-HOWTO.html].
|
[tutorial http://digitalgrammars.com/gf/doc/gf-tutorial.html].
|
||||||
What parts exactly are selected will be announced later.
|
This should take a couple of weeks. An **on-line course** will be
|
||||||
This work will take another couple of weeks.
|
arranged in April to help in getting started with GF.
|
||||||
|
|
||||||
|
After the on-line course, a **programming assignment** will be published.
|
||||||
|
This assignment will test skills required in resource grammar programming.
|
||||||
|
Work on the assignment will take a couple of weeks.
|
||||||
|
|
||||||
Those who are interested in getting a travel grant will submit
|
Those who are interested in getting a travel grant will submit
|
||||||
their sample resource grammar fragment
|
their sample resource grammar fragment
|
||||||
to the Summer School Committee in the beginning of May.
|
to the Summer School Committee by 12 May.
|
||||||
The Committee then decides who is invited to represent which language
|
The Committee then decides who is invited to represent which language
|
||||||
in the summer school.
|
in the summer school.
|
||||||
|
|
||||||
After the Committee decision, the participants have around three months
|
The summer school itself is devoted to working on resource grammars.
|
||||||
to work on their languages. The work is completed in the summer school
|
In addition to grammar writing itself, testing and evaluation are
|
||||||
itself. It is also thoroughly tested by using it to add new languages
|
performed. One way to do this is via adding new languages
|
||||||
to applications - in particular, to the WebALT mathematical
|
to resource grammar applications - in particular, to the WebALT mathematical
|
||||||
exercise translator.
|
exercise translator.
|
||||||
|
|
||||||
Depending on the quality of submitted work, and on the demands of different
|
The resource grammars are expected to be completed by December 2009. They will
|
||||||
languages, the Committee may decide to select another number than 2 participants
|
be published on the GF website and licensed under LGPL.
|
||||||
for a language. We will also consider accepting participants who want to
|
|
||||||
pay their own expenses.
|
|
||||||
|
|
||||||
To keep track on who is working on which language, we will establish a Wiki page
|
The participants are encouraged to contact each other and even work in groups.
|
||||||
soon after the call is published. The participants are encouraged
|
|
||||||
to contact each other and even work in groups.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@@ -254,7 +244,14 @@ who are selected on the basis of their assignments. And not only that:
|
|||||||
we plan to cover their travel and accommodation costs, up to 1000 EUR
|
we plan to cover their travel and accommodation costs, up to 1000 EUR
|
||||||
per person.
|
per person.
|
||||||
|
|
||||||
We try to get the funding question settled by mid-February 2009.
|
The number of grants will be decided during Spring 2009, so that grant
|
||||||
|
holders can be notified before the beginning of June.
|
||||||
|
|
||||||
|
Special terms will apply to students in
|
||||||
|
[GSLT http://www.gslt.hum.gu.se/] and
|
||||||
|
[NGSLT http://ngslt.org/].
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@@ -281,8 +278,8 @@ the results for each language. Please contact us!
|
|||||||
|
|
||||||
===The Summer School Committee===
|
===The Summer School Committee===
|
||||||
|
|
||||||
This committee consists of a number of teachers and consultants,
|
This committee consists of a number of teachers and informants,
|
||||||
who will select the participants. It will be selected by February 2009.
|
who will select the participants. It will be selected by April 2009.
|
||||||
|
|
||||||
|
|
||||||
===Time and Place===
|
===Time and Place===
|
||||||
@@ -292,13 +289,13 @@ be organized at the campus of Chalmers University of Technology in Gothenburg,
|
|||||||
Sweden, on 17-28 August 2009.
|
Sweden, on 17-28 August 2009.
|
||||||
|
|
||||||
Time schedule:
|
Time schedule:
|
||||||
- February: announcement of summer school and the grammar
|
- February: announcement of summer school
|
||||||
writing contest to get participants
|
- April: on-line course, work on the contest assignment
|
||||||
- March-April: on-line course, work on the contest assignment (ca 1 month)
|
- 12 May: submission deadline for assignment work
|
||||||
- May: submission deadline and notification of acceptance
|
- 31 May: review of assignments, notifications of acceptance
|
||||||
- June-July: more work on the grammars
|
- 17-28 August: Summer School
|
||||||
- August: summer school
|
- September-December: homework on resource grammars
|
||||||
- September-December: more homework if necessary
|
- December: release of the extended Resource Grammar Library
|
||||||
|
|
||||||
|
|
||||||
===Dissemination and intellectual property===
|
===Dissemination and intellectual property===
|
||||||
@@ -309,22 +306,115 @@ with the copyright held by respective authors.
|
|||||||
|
|
||||||
The grammars will be distributed via the GF web site.
|
The grammars will be distributed via the GF web site.
|
||||||
|
|
||||||
The WebALT-specific grammars will have special licenses agreed between the
|
|
||||||
authors and WebALT Inc.
|
|
||||||
|
|
||||||
|
|
||||||
==Why I should participate==
|
==Why I should participate==
|
||||||
|
|
||||||
Seven reasons:
|
Seven reasons:
|
||||||
+ participation in a pioneering language technology work in an enthusiastic atmosphere
|
+ participation in pioneering language technology work in an
|
||||||
|
enthusiastic atmosphere
|
||||||
+ work and fun with people from all over Europe and the world
|
+ work and fun with people from all over Europe and the world
|
||||||
+ job opportunities and business ideas
|
+ job opportunities and business ideas
|
||||||
+ credits: the school project will be established as a course at Chalmers worth
|
+ credits: the school project will be established as a course at Chalmers worth
|
||||||
15 ETCS points per person, but extensions to Master's thesis will
|
7.5 or 15 ECTS points per person, depending on the work accomplished; also
|
||||||
also be considered
|
extensions to Master's thesis will be considered (special credit arrangements
|
||||||
+ merits: the resulting grammar can easily lead to a published paper
|
for [GSLT http://www.gslt.hum.gu.se/] and [NGSLT http://ngslt.org/])
|
||||||
|
+ merits: the resulting grammar can easily lead to a published paper (see below)
|
||||||
+ contribution to the multilingual and multicultural development of Europe and the
|
+ contribution to the multilingual and multicultural development of Europe and the
|
||||||
world
|
world
|
||||||
+ free trip and stay in Gothenburg (for travel grant students)
|
+ free trip and stay in Gothenburg (for travel grant students)
|
||||||
|
|
||||||
|
|
||||||
|
==More information==
|
||||||
|
|
||||||
|
[Course Google Group http://groups.google.com/group/gf-resource-school-2009/]
|
||||||
|
|
||||||
|
[GF web page http://digitalgrammars.com/gf/]
|
||||||
|
|
||||||
|
[GF tutorial http://digitalgrammars.com/gf/doc/gf-tutorial.html]
|
||||||
|
|
||||||
|
[Resource-HOWTO document http://digitalgrammars.com/gf/doc/Resource-HOWTO.html]
|
||||||
|
|
||||||
|
Forthcoming: survey article "The GF Resource Grammar Library"
|
||||||
|
|
||||||
|
Forthcoming: book about GF
|
||||||
|
|
||||||
|
===Contact===
|
||||||
|
|
||||||
|
Håkan Burden: burden at chalmers se
|
||||||
|
|
||||||
|
Aarne Ranta: aarne at chalmers se
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
===Selected publications from earlier resource grammar projects===
|
||||||
|
|
||||||
|
K. Angelov.
|
||||||
|
Type-Theoretical Bulgarian Grammar.
|
||||||
|
In B. Nordström and A. Ranta (eds),
|
||||||
|
//Advances in Natural Language Processing (GoTAL 2008)//,
|
||||||
|
LNCS/LNAI 5221, Springer,
|
||||||
|
2008.
|
||||||
|
|
||||||
|
A. El Dada and A. Ranta.
|
||||||
|
Implementing an Open Source Arabic Resource Grammar in GF.
|
||||||
|
In M. Mughazy (ed),
|
||||||
|
//Perspectives on Arabic Linguistics XX. Papers from the Twentieth Annual Symposium on Arabic Linguistics, Kalamazoo, March 26//
|
||||||
|
John Benjamins Publishing Company.
|
||||||
|
2007.
|
||||||
|
|
||||||
|
A. El Dada.
|
||||||
|
Implementation of the Arabic Numerals and their Syntax in GF.
|
||||||
|
Computational Approaches to Semitic Languages: Common Issues and Resources,
|
||||||
|
ACL-2007 Workshop,
|
||||||
|
June 28, 2007, Prague.
|
||||||
|
2007.
|
||||||
|
|
||||||
|
H. Hammarström and A. Ranta.
|
||||||
|
Cardinal Numerals Revisited in GF.
|
||||||
|
//Workshop on Numerals in the World's Languages//.
|
||||||
|
Dept. of Linguistics, Max Planck Institute for Evolutionary Anthropology, Leipzig,
|
||||||
|
2004.
|
||||||
|
|
||||||
|
M. Humayoun, H. Hammarström, and A. Ranta.
|
||||||
|
Urdu Morphology, Orthography and Lexicon Extraction.
|
||||||
|
//CAASL-2: The Second Workshop on Computational Approaches to Arabic Script-based Languages//,
|
||||||
|
July 21-22, 2007, LSA 2007 Linguistic Institute, Stanford University.
|
||||||
|
2007.
|
||||||
|
|
||||||
|
J. Khegai.
|
||||||
|
GF parallel resource grammars and Russian.
|
||||||
|
In proceedings of ACL2006
|
||||||
|
(The joint conference of the International Committee on Computational
|
||||||
|
Linguistics and the Association for Computational Linguistics) (pp. 475-482),
|
||||||
|
Sydney, Australia, July 2006.
|
||||||
|
|
||||||
|
J. Khegai.
|
||||||
|
Language engineering in Grammatical Framework (GF).
|
||||||
|
PhD thesis, Computer Science, Chalmers University of Technology,
|
||||||
|
2006.
|
||||||
|
|
||||||
|
W. Ng'ang'a.
|
||||||
|
Multilingual content development for eLearning in Africa.
|
||||||
|
eLearning Africa: 1st Pan-African Conference on ICT for Development,
|
||||||
|
Education and Training. 24-26 May 2006, Addis Ababa, Ethiopia.
|
||||||
|
2006.
|
||||||
|
|
||||||
|
N. Perera and A. Ranta.
|
||||||
|
Dialogue System Localization with the GF Resource Grammar Library.
|
||||||
|
//SPEECHGRAM 2007: ACL Workshop on Grammar-Based Approaches to Spoken Language Processing//,
|
||||||
|
June 29, 2007, Prague.
|
||||||
|
2007.
|
||||||
|
|
||||||
|
A. Ranta.
|
||||||
|
Modular Grammar Engineering in GF.
|
||||||
|
//Research on Language and Computation//,
|
||||||
|
5:133-158, 2007.
|
||||||
|
|
||||||
|
A. Ranta.
|
||||||
|
How predictable is Finnish morphology? An experiment on lexicon construction.
|
||||||
|
In J. Nivre, M. Dahllöf and B. Megyesi (eds),
|
||||||
|
//Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein//,
|
||||||
|
University of Uppsala,
|
||||||
|
2008.
|
||||||
|
|
||||||
|
|||||||
100
doc/school-langs.dot
Normal file
@@ -0,0 +1,100 @@
|
|||||||
|
graph{
|
||||||
|
|
||||||
|
size = "8,8" ;
|
||||||
|
|
||||||
|
overlap = scale ;
|
||||||
|
|
||||||
|
"Abs" [label = "Abstract Syntax", style = "solid", shape = "rectangle"] ;
|
||||||
|
|
||||||
|
"1" [label = "Bulgarian", style = "solid", shape = "ellipse", color = "green"] ;
|
||||||
|
"1" -- "Abs" [style = "solid"];
|
||||||
|
|
||||||
|
"2" [label = "Czech", style = "solid", shape = "ellipse", color = "red"] ;
|
||||||
|
"2" -- "Abs" [style = "solid"];
|
||||||
|
|
||||||
|
"3" [label = "Danish", style = "solid", shape = "ellipse", color = "green"] ;
|
||||||
|
"3" -- "Abs" [style = "solid"];

"4" [label = "German", style = "solid", shape = "ellipse", color = "green"] ;
"4" -- "Abs" [style = "solid"];

"5" [label = "Estonian", style = "solid", shape = "ellipse", color = "red"] ;
"5" -- "Abs" [style = "solid"];

"6" [label = "Greek", style = "solid", shape = "ellipse", color = "red"] ;
"6" -- "Abs" [style = "solid"];

"7" [label = "English", style = "solid", shape = "ellipse", color = "green"] ;
"7" -- "Abs" [style = "solid"];

"8" [label = "Spanish", style = "solid", shape = "ellipse", color = "green"] ;
"8" -- "Abs" [style = "solid"];

"9" [label = "French", style = "solid", shape = "ellipse", color = "green"] ;
"9" -- "Abs" [style = "solid"];

"10" [label = "Italian", style = "solid", shape = "ellipse", color = "green"] ;
"10" -- "Abs" [style = "solid"];

"11" [label = "Latvian", style = "solid", shape = "ellipse", color = "red"] ;
"11" -- "Abs" [style = "solid"];

"12" [label = "Lithuanian", style = "solid", shape = "ellipse", color = "red"] ;
"Abs" -- "12" [style = "solid"];

"13" [label = "Irish", style = "solid", shape = "ellipse", color = "red"] ;
"Abs" -- "13" [style = "solid"];

"14" [label = "Hungarian", style = "solid", shape = "ellipse", color = "red"] ;
"Abs" -- "14" [style = "solid"];

"15" [label = "Maltese", style = "solid", shape = "ellipse", color = "red"] ;
"Abs" -- "15" [style = "solid"];

"16" [label = "Dutch", style = "solid", shape = "ellipse", color = "red"] ;
"Abs" -- "16" [style = "solid"];

"17" [label = "Polish", style = "solid", shape = "ellipse", color = "yellow"] ;
"Abs" -- "17" [style = "solid"];

"18" [label = "Portuguese", style = "solid", shape = "ellipse", color = "red"] ;
"Abs" -- "18" [style = "solid"];

"19" [label = "Slovak", style = "solid", shape = "ellipse", color = "red"] ;
"Abs" -- "19" [style = "solid"];

"20" [label = "Slovene", style = "solid", shape = "ellipse", color = "red"] ;
"Abs" -- "20" [style = "solid"];

"21" [label = "Romanian", style = "solid", shape = "ellipse", color = "yellow"] ;
"Abs" -- "21" [style = "solid"];

"22" [label = "Finnish", style = "solid", shape = "ellipse", color = "green"] ;
"Abs" -- "22" [style = "solid"];

"23" [label = "Swedish", style = "solid", shape = "ellipse", color = "green"] ;
"Abs" -- "23" [style = "solid"];

"24" [label = "Catalan", style = "dotted", shape = "ellipse", color = "green"] ;
"Abs" -- "24" [style = "solid"];

"25" [label = "Norwegian", style = "dotted", shape = "ellipse", color = "green"] ;
"Abs" -- "25" [style = "solid"];

"26" [label = "Russian", style = "dotted", shape = "ellipse", color = "green"] ;
"Abs" -- "26" [style = "solid"];

"27" [label = "Interlingua", style = "dotted", shape = "ellipse", color = "green"] ;
"Abs" -- "27" [style = "solid"];

"28" [label = "Latin", style = "dotted", shape = "ellipse", color = "yellow"] ;
"Abs" -- "28" [style = "solid"];
"29" [label = "Turkish", style = "dotted", shape = "ellipse", color = "yellow"] ;
"Abs" -- "29" [style = "solid"];
"30" [label = "Hindi", style = "dotted", shape = "ellipse", color = "yellow"] ;
"Abs" -- "30" [style = "solid"];
"31" [label = "Thai", style = "dotted", shape = "ellipse", color = "yellow"] ;
"Abs" -- "31" [style = "solid"];


}
BIN doc/school-langs.png (normal file)
Binary file not shown. Size: 134 KiB
@@ -187,17 +187,29 @@ resource ParadigmsAra = open
 -- The definitions should not bother the user of the API. So they are
 -- hidden from the document.

-{-
--- AED's original definition of regV

-regV = \word ->
-case word of {
+----AR AED's original definition of regV
+regV_orig : Str -> V = \wo ->
+case wo of {
 "يَ" + f@_ + c@_ + "ُ" + l@_ => v1 (f+c+l) a u ;
 "يَ" + f@_ + c@_ + "ِ" + l@_ => v1 (f+c+l) a i ;
 "يَ" + f@_ + c@_ + "َ" + l@_ => v1 (f+c+l) a a ;
-f@_ + "َ" + c@_ + "ِ" + l@_ => v1 (f+c+l) i a
+f@_ + "َ" + c@_ + "ِ" + l@_ => v1 (f+c+l) i a ;
+_ => Predef.error "regV not applicable"
 };
--}
+

+regV_o : Str -> Str = \word ->
+case word of {
+"يَ" + f@_ + c@_ + "ُ" + l@_ => "a" ;
+"يَ" + f@_ + c@_ + "ِ" + l@_ => "b" ;
+"يَ" + f@_ + c@_ + "َ" + l@_ => "c" ;
+f@_ + "َ" + c@_ + "ِ" + l@_ => "d" ;
+_ => "q"
+};
+aa = a ; uu = u ; ii = i ;
+----AR for debug end
+
+
 ---- begin workaround for a problem with pattern matching, AR 27/6/2008
