GF Resource Grammar Summer School
Gothenburg, 17-28 August 2009
Aarne Ranta (aarne at chalmers.se)

%!Encoding : iso-8859-1

%!target:html
%!postproc(html): #BECE <center>
%!postproc(html): #ENCE </center>
%!postproc(html): #GRAY <font color="green" size="-1">
%!postproc(html): #EGRAY </font>
%!postproc(html): #RED <font color="red">
%!postproc(html): #YELLOW <font color="orange">
%!postproc(html): #ERED </font>

#BECE
[school-langs.png]
#ENCE


//red=wanted, green=exists, orange=in-progress, solid=official-eu, dotted=non-eu//


==News==

An on-line course //GF for Resource Grammar Writers// will start on
Monday 20 April at 15.30 CEST. The slides and recordings of the five
45-minute lectures will be made available via this web page. If requested,
the course may be repeated at the beginning of the summer school.


==Executive summary==

The GF Resource Grammar Library is an open-source computational grammar resource
that currently covers 12 languages.
The Summer School is a part of a collaborative effort to extend the library
to all of the 23 official EU languages. Other languages
chosen by the participants are also welcome.

The missing EU languages are:
Czech, Dutch, Estonian, Greek, Hungarian, Irish, Latvian, Lithuanian,
Maltese, Portuguese, Slovak, and Slovenian. There is also more work to
be done on Polish and Romanian.

The linguistic coverage of the library includes the inflectional morphology
and basic syntax of each language. It can be used in GF applications
and also ported to other formats. It can also be used for building other
linguistic resources, such as morphological lexica and parsers.
The library is licensed under LGPL.

In the summer school, each language will be implemented by one or two students
working together. A morphology implementation will be credited
as a Chalmers course worth 7.5 ECTS points; adding a syntax implementation
will be worth more. The estimated total workload is 1-2 months for the
morphology, and 3-6 months for the whole grammar.

Participation in the course is free. Registration is done via the course's
Google group, [``groups.google.com/group/gf-resource-school-2009/`` http://groups.google.com/group/gf-resource-school-2009/]. The registration deadline is 15 June 2009.

Some travel grants will be available. They will be awarded on the basis of a
GF programming contest in April and May.

The summer school will be held on 17-28 August 2009, at the campus of
Chalmers University of Technology in Gothenburg, Sweden.


[align6.png]

//Word alignment produced by GF from the resource grammar in Bulgarian, English, Italian, German, Finnish, French, and Swedish.//

==Introduction==

Since 2007, the EU-27 has had 23 official languages, listed in the diagram at the top of this
document. There is a growing need for linguistic resources for these
languages, to help in tasks such as translation and information retrieval.
These resources should be **portable** and **freely accessible**.
Languages marked in red in the diagram are of particular interest for
the summer school, since they are those on which the effort will be concentrated.

GF (Grammatical Framework,
[``digitalgrammars.com/gf`` http://digitalgrammars.com/gf])
is a **functional programming language** designed for writing natural
language grammars. It provides an efficient platform for this task, due to
its modern characteristics:
- It is a functional programming language, similar to Haskell and ML.
- It has a static type system and type checker.
- It has a powerful module system supporting separate compilation
  and data abstraction.
- It has an optimizing compiler to **Portable Grammar Format** (PGF).
- PGF can be further compiled to other formats, such as JavaScript and
  speech recognition language models.
- GF has a **resource grammar library** giving access to the morphology and
  basic syntax of 12 languages.


In addition to "ordinary" grammars for single languages, GF
supports **multilingual grammars**. A multilingual GF grammar consists of an
**abstract syntax** and a set of **concrete syntaxes**.
An abstract syntax is a system of **trees**, serving as a semantic
model or an ontology. A concrete syntax is a mapping from abstract syntax
trees to strings of a particular language.

These mappings defined in concrete syntax are **reversible**: they
can be used both for **generating** strings from trees, and for
**parsing** strings into trees. Combinations of generation and
parsing can be used for **translation**, where the abstract
syntax works as an **interlingua**. Thus GF has been used as a
framework for building translation systems in several areas
of application and for large sets of languages.



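The division of labour between abstract and concrete syntax can be illustrated
outside GF with a minimal Python sketch. This is not GF code, and GF's parser
does not work by enumerating trees as done here. The tree functions (``Is``,
``This``, ``Wine``, and so on), the two toy lexicons, and the helper names
``linearize``, ``parse``, and ``translate`` are all assumptions made up for
this illustration. The example words were chosen so that no agreement is needed,
which hides most of the work a real concrete syntax has to do.

```python
from itertools import product

# Abstract syntax: a tree is a nested tuple (function_name, *children).
# These function names are hypothetical, not taken from any GF grammar.
KINDS = ["Wine", "Cheese"]
QUALITIES = ["Fresh", "Expensive"]


def all_trees():
    """Enumerate every tree of the form Is(This(kind), quality)."""
    for kind, quality in product(KINDS, QUALITIES):
        yield ("Is", ("This", (kind,)), (quality,))


# Concrete syntaxes: one linearization rule per abstract function.
ENGLISH = {
    "Is": lambda item, quality: f"{item} is {quality}",
    "This": lambda kind: f"this {kind}",
    "Wine": lambda: "wine",
    "Cheese": lambda: "cheese",
    "Fresh": lambda: "fresh",
    "Expensive": lambda: "expensive",
}

GERMAN = {
    "Is": lambda item, quality: f"{item} ist {quality}",
    "This": lambda kind: f"dieser {kind}",
    "Wine": lambda: "Wein",
    "Cheese": lambda: "Käse",
    "Fresh": lambda: "frisch",
    "Expensive": lambda: "teuer",
}


def linearize(tree, concrete):
    """Generation: map an abstract tree to a string of one language."""
    fun, *children = tree
    return concrete[fun](*(linearize(child, concrete) for child in children))


def parse(string, concrete):
    """Parsing by brute force: keep the trees whose linearization matches."""
    return [tree for tree in all_trees() if linearize(tree, concrete) == string]


def translate(string, source, target):
    """Translation: parse in the source language, linearize in the target."""
    return [linearize(tree, target) for tree in parse(string, source)]


print(translate("this wine is expensive", ENGLISH, GERMAN))
```

The abstract tree plays the role of the interlingua: ``translate`` never maps
English words to German words directly, so adding a third language to this
sketch would only require one more dictionary of linearization rules.
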
==The GF resource grammar library==

The GF resource grammar library is a set of grammars usable as libraries when
building translation systems and other applications.
The library currently covers
the 9 languages coloured in green in the diagram above; in addition,
Catalan, Norwegian, and Russian are covered, and there is ongoing work on
Arabic, Hindi/Urdu, Polish, Romanian, and Thai.

The purpose of the resource grammar library is to define the "low-level" structure
of a language: inflection, word order, agreement. This structure belongs to what
linguists call morphology and syntax. It can be very complex and requires
a lot of knowledge. Yet, when translating from one language to
another, knowing morphology and syntax is but a part of what is needed.
The translator (whether human
or machine) must understand the meaning of what is translated, and must also know
the idiomatic way to express the meaning in the target language. This knowledge
can be very domain-dependent and in general requires an expert in the field to
reach high quality: a mathematician in the field of mathematics, a meteorologist
in the field of weather reports, etc.

The problem is to find a person who is an expert both in the domain of translation
and in the low-level linguistic details. It is the rarity of this combination
that has made it difficult to build interlingua-based translation systems.
The GF resource grammar library has the mission of helping in this task.
It encapsulates the low-level linguistics in program modules
accessed through easy-to-use interfaces.
Experts on different domains can build translation systems by using the library,
without knowing low-level linguistics. The idea is much the same as when a
programmer builds a graphical user interface (GUI) from high-level elements such as
buttons and menus, without having to care about pixels or geometrical forms.


===Missing EU languages, by family===

Writing a grammar for a language is usually easier if other languages
from the same family already have grammars. The colours have the same
meaning as in the diagram above.

Baltic:
#RED Latvian #ERED
#RED Lithuanian #ERED

Celtic:
#RED Irish #ERED

Fenno-Ugric:
#RED Estonian #ERED
#GRAY Finnish #EGRAY
#RED Hungarian #ERED

Germanic:
#GRAY Danish #EGRAY
#RED Dutch #ERED
#GRAY English #EGRAY
#GRAY German #EGRAY
#GRAY Swedish #EGRAY

Hellenic:
#RED Greek #ERED

Romance:
#GRAY French #EGRAY
#GRAY Italian #EGRAY
#RED Portuguese #ERED
#YELLOW Romanian #ERED
#GRAY Spanish #EGRAY

Semitic:
#RED Maltese #ERED

Slavonic:
#GRAY Bulgarian #EGRAY
#RED Czech #ERED
#YELLOW Polish #ERED
#RED Slovak #ERED
#RED Slovenian #ERED


===Applications of the library===

In addition to translation, the library is also useful in **localization**,
that is, porting a piece of software to new languages.
The GF resource grammar library has been used in three major projects that need
interlingua-based translation or localization of systems to new languages:
- in KeY,
  [``http://www.key-project.org/`` http://www.key-project.org/],
  for writing formal and informal software specifications (3 languages)
- in WebALT,
  [``http://webalt.math.helsinki.fi/content/index_eng.html`` http://webalt.math.helsinki.fi/content/index_eng.html],
  for translating mathematical exercises to 7 languages
- in TALK [``http://www.talk-project.org`` http://www.talk-project.org],
  where the library was used for localizing spoken dialogue systems
  to six languages


The library is also a generic **linguistic resource**,
which can be used for tasks
such as language teaching and information retrieval. The liberal license (LGPL)
makes it usable by anyone and for any task. GF also has tools supporting the
use of grammars in programs written in other
programming languages: C, C++, Haskell,
Java, JavaScript, and Prolog. In connection with the TALK project,
support has also been
developed for translating GF grammars to language models used in speech
recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF).


===The structure of the library===

The library has the following main parts:
- **Inflection paradigms**, covering the inflection of each language.
- **Core Syntax**, covering a large set of syntax rules that
  can be implemented for all languages involved.
- **Common Test Lexicon**, giving ca. 500 common words that can be used for
  testing the library.
- **Language-Specific Syntax Extensions**, covering syntax rules that are
  not implementable for all languages.
- **Language-Specific Lexica**, word lists for each language, with
  accurate morphological and syntactic information.


The goal of the summer school is to implement, for each language, at least
the first three components. The latter two are more open-ended in character.


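To give an idea of what an inflection paradigm is, here is a minimal Python
sketch of a paradigm for regular English nouns. It is not GF code and not the
library's API: the function names ``regular_noun`` and ``noun`` and the
simplified spelling rules are assumptions chosen for illustration, and a real
paradigm set covers many more cases.

```python
VOWELS = "aeiou"


def regular_noun(singular):
    """Build a two-form inflection table from the singular form alone."""
    if singular.endswith(("s", "x", "z", "ch", "sh")):
        # bus -> buses, church -> churches
        plural = singular + "es"
    elif singular.endswith("y") and len(singular) > 1 and singular[-2] not in VOWELS:
        # baby -> babies, but boy -> boys (vowel before the final y)
        plural = singular[:-1] + "ies"
    else:
        plural = singular + "s"
    return {"sg": singular, "pl": plural}


def noun(singular, plural=None):
    """Use the regular paradigm unless an irregular plural is supplied."""
    table = regular_noun(singular)
    if plural is not None:
        table["pl"] = plural
    return table


for word in ["car", "bus", "baby", "boy"]:
    print(regular_noun(word))
print(noun("mouse", "mice"))
```

A lexicon writer using such a paradigm gives one form for regular words and
two for irregular ones, instead of spelling out every table by hand.
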
==The summer school==

The goal of the summer school is to extend the GF resource grammar library
to cover all 23 official EU languages, which means we need 14 new languages.
We also welcome languages other than these 23,
if there are interested participants.

The amount of work and skill required lies between that of a Master's thesis and a PhD thesis.
The Russian implementation was made by Janna Khegai as a part of her
PhD thesis; the thesis contains other material, too.
The Arabic implementation was started by Ali El Dada in his Master's thesis,
but the thesis does not cover the whole API. The realistic amount of work is
somewhere between 3 and 8 person months,
but this is very much language-dependent.
Dutch, for instance, can profit from previous implementations of German and
Scandinavian languages, and will probably require less work.
Latvian and Lithuanian are the first languages of the Baltic family and
will probably require more work.

In any case, the proposed allocation of effort is 2 participants per
language. They will do 1 month's worth of homework, followed
by 2 weeks of summer school, followed by 4 months' work at home.
Who are these participants?


===Selecting participants===

Persons interested in participating in the Summer School should sign up in
the **Google Group** of the course,

[``groups.google.com/group/gf-resource-school-2009/`` http://groups.google.com/group/gf-resource-school-2009/]

The registration deadline is 15 June 2009.

Notice: you can sign up in the Google
group even if you are not planning to attend the summer school, but are
just interested in the topic. There will be a separate registration for the
school itself later.

Participants are advised to learn GF in advance, by self-study from the
[tutorial http://digitalgrammars.com/gf/doc/gf-tutorial.html].
This should take a couple of weeks. An **on-line course** will be
arranged on 20-29 April to help in getting started with GF.

At the end of the on-line course, a **programming assignment** will be published.
This assignment will test skills required in resource grammar programming.
Work on the assignment will take a couple of weeks.
Those who are interested in getting a travel grant will submit
their sample resource grammar fragment
to the Summer School Committee by 12 May.
The Committee then decides who is given a travel grant of up to 1000 EUR.

Notice: you can participate in the summer school without following the on-line
course or participating in the contest. These things are required only if you
want a travel grant. If requested by enough participants, the lectures of
the on-line course will be repeated at the beginning of the summer school.

The summer school itself is devoted to working on resource grammars.
In addition to grammar writing itself, testing and evaluation are
performed. One way to do this is by adding new languages
to resource grammar applications - in particular, to the WebALT mathematical
exercise translator.

The resource grammars are expected to be completed by December 2009. They will
be published on the GF website and licensed under LGPL.

The participants are encouraged to contact each other and even work in groups.



===Who is qualified===

Writing a resource grammar implementation requires good general programming
skills, and a good explicit knowledge of the grammar of the target language.
A typical participant could be
- a native or fluent speaker of the target language
- interested in languages on the theoretical level, and preferably familiar
  with many languages (to be able to think about them on an abstract level)
- familiar with functional programming languages such as ML or Haskell
  (GF itself is a language similar to these)
- at Master's or PhD level in linguistics, computer science, or mathematics


But it is the quality of the assignment that is assessed, not any formal
requirements. The "typical participant" was described to give an idea of
who is likely to succeed in this.


===Costs===

The summer school is free of charge.

Some travel grants are given, on the basis of a programming contest,
to cover travel and accommodation costs up to 1000 EUR
per person.

The number of grants will be decided during Spring 2009, and the grant
holders will be notified before the beginning of June.

Special terms will apply to students in
[GSLT http://www.gslt.hum.gu.se/] and
[NGSLT http://ngslt.org/].


===Teachers===

A list of teachers will be published here later. The local teachers
likely to be involved include the following:
- Krasimir Angelov
- Robin Cooper
- Håkan Burden
- Markus Forsberg
- Harald Hammarström
- Peter Ljunglöf
- Aarne Ranta


More teachers are welcome! If you are interested, please contact us so that
we can discuss your involvement and travel arrangements.

In addition to teachers, we will look for consultants who can help to assess
the results for each language. Please contact us!


===The Summer School Committee===

This committee consists of a number of teachers and informants,
who will select the participants. The committee itself will be appointed by April 2009.


===Time and Place===

The summer school will
be organized at the campus of Chalmers University of Technology in Gothenburg,
Sweden, on 17-28 August 2009.

Time schedule:
- February: announcement of summer school
- 20-29 April: on-line course
- 12 May: submission deadline for assignment work
- 31 May: review of assignments, notifications of acceptance
- 15 June: **registration deadline**
- 17-28 August: Summer School
- September-December: homework on resource grammars
- December: release of the extended Resource Grammar Library


===Dissemination and intellectual property===

The new resource grammars will be released under the LGPL just like
the current resource grammars,
with the copyright held by the respective authors.

The grammars will be distributed via the GF web site.


==Why I should participate==

Seven reasons:
+ participation in pioneering language technology work in an
  enthusiastic atmosphere
+ work and fun with people from all over Europe and the world
+ job opportunities and business ideas
+ credits: the school project will be established as a course at Chalmers worth
  7.5 or 15 ECTS points per person, depending on the work accomplished; also
  extensions to Master's thesis will be considered (special credit arrangements
  for [GSLT http://www.gslt.hum.gu.se/] and [NGSLT http://ngslt.org/])
+ merits: the resulting grammar can easily lead to a published paper (see below)
+ contribution to the multilingual and multicultural development of Europe and the
  world
+ free trip and stay in Gothenburg (for travel grant students)


==More information==

[Course Google Group http://groups.google.com/group/gf-resource-school-2009/]

[GF web page http://digitalgrammars.com/gf/]

[GF tutorial http://digitalgrammars.com/gf/doc/gf-tutorial.html]

[GF resource synopsis http://digitalgrammars.com/gf/lib/resource/doc/synopsis.html]

[Resource-HOWTO document http://digitalgrammars.com/gf/doc/Resource-HOWTO.html]


===Contact===

Håkan Burden: burden at chalmers se

Aarne Ranta: aarne at chalmers se


===Selected publications from earlier resource grammar projects===

K. Angelov.
Type-Theoretical Bulgarian Grammar.
In B. Nordström and A. Ranta (eds),
//Advances in Natural Language Processing (GoTAL 2008)//,
LNCS/LNAI 5221, Springer,
2008.

B. Bringert.
//Programming Language Techniques for Natural Language Applications//.
PhD thesis, Computer Science, University of Gothenburg,
2008.

A. El Dada and A. Ranta.
Implementing an Open Source Arabic Resource Grammar in GF.
In M. Mughazy (ed),
//Perspectives on Arabic Linguistics XX. Papers from the Twentieth Annual Symposium on Arabic Linguistics, Kalamazoo, March 26//,
John Benjamins Publishing Company,
2007.

A. El Dada.
Implementation of the Arabic Numerals and their Syntax in GF.
Computational Approaches to Semitic Languages: Common Issues and Resources,
ACL-2007 Workshop,
June 28, 2007, Prague.
2007.

H. Hammarström and A. Ranta.
Cardinal Numerals Revisited in GF.
//Workshop on Numerals in the World's Languages//.
Dept. of Linguistics, Max Planck Institute for Evolutionary Anthropology, Leipzig,
2004.

M. Humayoun, H. Hammarström, and A. Ranta.
Urdu Morphology, Orthography and Lexicon Extraction.
//CAASL-2: The Second Workshop on Computational Approaches to Arabic Script-based Languages//,
July 21-22, 2007, LSA 2007 Linguistic Institute, Stanford University.
2007.

K. Johannisson.
//Formal and Informal Software Specifications.//
PhD thesis, Computer Science, University of Gothenburg,
2005.

J. Khegai.
GF parallel resource grammars and Russian.
In proceedings of ACL2006
(The joint conference of the International Committee on Computational
Linguistics and the Association for Computational Linguistics) (pp. 475-482),
Sydney, Australia, July 2006.

J. Khegai.
//Language engineering in Grammatical Framework (GF)//.
PhD thesis, Computer Science, Chalmers University of Technology,
2006.

W. Ng'ang'a.
Multilingual content development for eLearning in Africa.
eLearning Africa: 1st Pan-African Conference on ICT for Development,
Education and Training. 24-26 May 2006, Addis Ababa, Ethiopia.
2006.

N. Perera and A. Ranta.
Dialogue System Localization with the GF Resource Grammar Library.
//SPEECHGRAM 2007: ACL Workshop on Grammar-Based Approaches to Spoken Language Processing//,
June 29, 2007, Prague.
2007.

A. Ranta.
Modular Grammar Engineering in GF.
//Research on Language and Computation//,
5:133-158, 2007.

A. Ranta.
How predictable is Finnish morphology? An experiment on lexicon construction.
In J. Nivre, M. Dahllöf and B. Megyesi (eds),
//Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein//,
University of Uppsala,
2008.

A. Ranta. Grammars as Software Libraries.
To appear in
Y. Bertot, G. Huet, J-J. Lévy, and G. Plotkin (eds.),
//From Semantics to Computer Science//,
Cambridge University Press, Cambridge, 2009.

A. Ranta and K. Angelov.
Implementing Controlled Languages in GF.
To appear in the proceedings of //CNL 2009//.