diff --git a/doc/gf-summerschool.html b/doc/gf-summerschool.html index 6fac73612..8f47f3671 100644 --- a/doc/gf-summerschool.html +++ b/doc/gf-summerschool.html @@ -3,21 +3,56 @@ -European Resource Grammar Summer School +GF Resource Grammar Summer School -

European Resource Grammar Summer School

+

GF Resource Grammar Summer School

Gothenburg, 17-28 August 2009
Aarne Ranta (aarne at chalmers.se)
+

+
+

+ + +

+
+

-preliminary version, 17 November 2008 +

+ +

- +red=wanted, green=exists, yellow=in-progress, solid=official-eu, dotted=non-eu

-

Executive summary

+ +

Executive summary

We plan to organize a summer school with the goal of implementing the GF resource grammar library for 15 new languages, so that the library will @@ -32,91 +67,76 @@ and also ported to other formats. The library is licensed under LGPL.

Each language is implemented by one or two students working together. -Travel grants will be available for students selected on the basis of +Travel grants will be available for some students selected on the basis of pre-conference assignments.

-The official announcement will be in January 2009, and the summer school -itself on 17-28 August 2009, at the campus of Chalmers University of -Technology in Gothenburg, Sweden. +The summer school will be held on 17-28 August 2009, at the campus of +Chalmers University of Technology in Gothenburg, Sweden.

+

Introduction

Since 2007, EU-27 has 23 official languages, listed in the diagram on top of this -document. -There is a growing need of translation between -these languages. The traditional language-to-language method requires 23*22 = 506 -translators (humans or computer programs) to cover all possible translation needs. -

-

-An alternative to language-to-language translation is the use of an interlingua: -a language-independent representation such that all translation problems can -be reduced to translating to and from the interlingua. With 23 languages, -only 2*23 = 46 translators are needed. -

-

-Interlingua sounds too good to be true. In a sense, it is. All attempts to -create an interlingua that would solve all translation problems have failed. -However, interlinguas for restricted applications have shown more -success. For instance, mathematical texts and weather reports can be translated -by using interlinguas tailor-made for the domains of mathematics and weather reports, -respectively. -

-

-What is required of an interlingua is -

- - -

-Thus, for instance, an interlingua for mathematical texts may be based on -mathematical logic, which at the same time gives semantic accuracy and -language independence. In other domains, something else than mathematical -logic may be needed; the ontologies defined within the semantic -web technology are often good starting points for interlinguas. -

-

GF: a framework for multilingual grammars

-

-The interlingua is just one part of a translation system. We also need -the mappings between the interlingua and the involved languages. As the -number of languages increases, this part grows while the interlingua remains -constant. +document. There is a growing need for linguistic resources for these +languages, to help in tasks such as translation and information retrieval. +These resources should be portable and freely accessible. +Languages marked in red in the diagram are of particular interest for +the summer school, since they are those on which the effort will be concentrated.

GF (Grammatical Framework, digitalgrammars.com/gf) -is a programming language designed to support interlingua-based translation. -A "program" in GF is a multilingual grammar, which consists of an -abstract syntax and a set of concrete syntaxes. A concrete -syntaxes is a mapping from the abstract syntax to a particular language. -These mappings are reversible, which means that they can be used for -translating in both directions. This means that creating an interlingua-based -translator for 23 languages just requires 1 + 23 = 24 grammar modules (the abstract -syntax and the concrete syntaxes). +is a functional programming language designed for writing natural +language grammars. It provides an efficient platform for this task, due to +its modern characteristics: +

+ + +

+In addition to "ordinary" grammars for single languages, GF +supports multilingual grammars. A multilingual GF grammar consists of an +abstract syntax and a set of concrete syntaxes. +An abstract syntax is a system of trees, serving as a semantic +model or an ontology. A concrete syntax is a mapping from abstract syntax +trees to strings of a particular language.

-The diagram first in this document shows an interlingua -system covering the 23 EU languages. -Languages marked in -red are of particular interest for the summer school, since they are those -on which the effort will be concentrated. +These mappings defined in concrete syntax are reversible: they +can be used both for generating strings from trees, and for +parsing strings into trees. Combinations of generation and +parsing can be used for translation, where the abstract +syntax works as an interlingua. Thus GF has been used as a +framework for building translation systems in several areas +of application and large sets of languages.

+

The GF resource grammar library

-The GF resource grammar library is a set of grammars used as libraries when -building interlingua-based translation systems. The library currently covers +The GF resource grammar library is a set of grammars usable as libraries when +building translation systems and other applications. +The library currently covers the 9 languages coloured in green in the diagram above; in addition, Catalan, Norwegian, and Russian are covered, and there is ongoing work on -Arabic, Hindi/Urdu, and Thai. +Arabic, Hindi/Urdu, Polish, Romanian, and Thai.

The purpose of the resource grammar library is to define the "low-level" structure of a language: inflection, word order, agreement. This structure belongs to what linguists call morphology and syntax. It can be very complex and requires -a lot of knowledge. Yet, when translating from one language to another, knowing -morphology and syntax is but a part of what is needed. The translator (whether human +a lot of knowledge. Yet, when translating from one language to +another, knowing morphology and syntax is but a part of what is needed. +The translator (whether human or machine) must understand the meaning of what is translated, and must also know the idiomatic way to express the meaning in the target language. This knowledge can be very domain-dependent and requires in general an expert in the field to @@ -127,13 +147,15 @@ in the field of weather reports, etc. The problem is to find a person who is an expert in both the domain of translation and in the low-level linguistic details. It is the rareness of this combination that has made it difficult to build interlingua-based translation systems. -The GF resource grammar library has the mission of helping in this task. It encapsulates -the low-level linguistics in program modules accessed through easy-to-use interfaces. +The GF resource grammar library has the mission of helping in this task. +It encapsulates the low-level linguistics in program modules +accessed through easy-to-use interfaces. Experts on different domains can build translation systems by using the library, without knowing low-level linguistics. The idea is much the same as when a programmer builds a graphical user interface (GUI) from high-level elements such as buttons and menus, without having to care about pixels or geometrical forms.

+

Applications of the library

In addition to translation, the library is also useful in localization, @@ -149,25 +171,29 @@ interlingua-based translation or localization of systems to new languages: http://webalt.math.helsinki.fi/content/index_eng.html, for translating mathematical exercises to 7 languages

  • in TALK http://www.talk-project.org, - where the library was used for localizing spoken dialogue systems to six languages + where the library was used for localizing spoken dialogue systems + to six languages

    The library is also a generic linguistic resource, which can be used for tasks such as language teaching and information retrieval. The liberal license (LGPL) makes it usable for anyone and for any task. GF also has tools supporting the -use of grammars in programs written in other programming languages: C, C++, Haskell, -Java, JavaScript, and Prolog. In connection with the TALK project, support has also been +use of grammars in programs written in other +programming languages: C, C++, Haskell, +Java, JavaScript, and Prolog. In connection with the TALK project, +support has also been developed for translating GF grammars to language models used in speech recognition (GSL/Nuance, HTK/ATK, SRGS, JSGF).

    +

    The structure of the library

    The library has the following main parts: