The GF Resource Grammar Library Version 1.0

Author: Aarne Ranta <aarne (at) cs.chalmers.se>
Last update: Sat Mar 4 14:20:07 2006

Plan

Purpose

Background

Coverage

Structure

How to use

How to implement a new language

How to extend the API

Purpose

Library for applications

High-level access to grammatical rules

E.g. "You have k new messages", rendered in any of the ten languages X

    render X (Have (You (Number (k (New Message)))))

Usability for different purposes

Grammar as parser

Often in NLP, a grammar is just high-level code for a parser.

But writing a grammar can be inadequate for parsing:

Moreover, a grammar fine-tuned for parsing may not be reusable

Grammar as language definition

Linguistic ontology: abstract syntax

E.g. adjectival modification

    AdjCN : AP -> CN -> CN ;

Rendering in different languages: concrete syntax

Resource grammars take a generation perspective rather than a parsing perspective
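The split between abstract and concrete syntax can be pictured with a toy model. The sketch below is not GF code and not the library's API: it is a minimal Python illustration, assuming a two-word lexicon, in which one tree built with an AdjCN-like constructor is rendered in two languages. The names Adj, Noun, lin_eng, lin_ita and the lexicon entries are hypothetical.

```python
from dataclasses import dataclass

# Abstract syntax: trees only, no strings (toy model, not GF itself).
@dataclass(frozen=True)
class Adj:            # a lexical AP
    name: str

@dataclass(frozen=True)
class Noun:           # a lexical CN
    name: str

@dataclass(frozen=True)
class AdjCN:          # mirrors  AdjCN : AP -> CN -> CN
    ap: Adj
    cn: object        # a Noun or another AdjCN

# Concrete syntax 1: English. Adjective precedes the noun, no agreement.
ENG_ADJ = {"red": "red"}
ENG_NOUN = {"house": "house", "book": "book"}

def lin_eng(tree):
    if isinstance(tree, Noun):
        return ENG_NOUN[tree.name]
    return ENG_ADJ[tree.ap.name] + " " + lin_eng(tree.cn)

# Concrete syntax 2: Italian. Adjective follows the noun and agrees in gender.
ITA_NOUN = {"house": ("casa", "fem"), "book": ("libro", "masc")}
ITA_ADJ = {"red": {"masc": "rosso", "fem": "rossa"}}

def lin_ita(tree):
    """Return (string, gender); the gender is inherited from the head noun."""
    if isinstance(tree, Noun):
        return ITA_NOUN[tree.name]
    s, gender = lin_ita(tree.cn)
    return (s + " " + ITA_ADJ[tree.ap.name][gender], gender)

def render(lang, tree):
    if lang == "Eng":
        return lin_eng(tree)
    if lang == "Ita":
        return lin_ita(tree)[0]
    raise ValueError("unknown language: " + lang)

tree = AdjCN(Adj("red"), Noun("house"))
print(render("Eng", tree))   # red house
print(render("Ita", tree))   # casa rossa
```

The application programmer only builds the tree; word order and agreement stay inside the language-specific functions, which is the division of labour the resource library aims for.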

Usability by non-linguists

Division of labour: resource grammars hide linguistic details

Presentation: "school grammar" concepts, dictionary-like conventions

API = Application Programmer's Interface

Documentation: gfdoc

IDE = Interactive Development Environment (forthcoming)

Example-based grammar writing

    render Ita (parse Eng "you have k messages")
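The idea behind render Ita (parse Eng ...) can be sketched in a few lines. The following is a toy Python model under strong assumptions, not the GF implementation: the grammar is finite, parsing is done by brute-force search over all trees, and the constructor names UseN and AdjCN as well as the lexicon are chosen only for illustration.

```python
# Toy lexicon shared by both concrete syntaxes (hypothetical entries).
ADJS = ["red", "white"]
NOUNS = ["house", "book"]

ENG = {"red": "red", "white": "white", "house": "house", "book": "book"}
ITA_NOUN = {"house": ("casa", "f"), "book": ("libro", "m")}
ITA_ADJ = {"red": {"m": "rosso", "f": "rossa"},
           "white": {"m": "bianco", "f": "bianca"}}

def trees():
    """Enumerate every abstract tree of this (finite) toy grammar."""
    for n in NOUNS:
        yield ("UseN", n)
        for a in ADJS:
            yield ("AdjCN", a, ("UseN", n))

def _ita(tree):
    """Italian linearization: returns (string, gender)."""
    if tree[0] == "UseN":
        return ITA_NOUN[tree[1]]
    s, g = _ita(tree[2])
    return (s + " " + ITA_ADJ[tree[1]][g], g)

def render(lang, tree):
    """Linearize an abstract tree in the given language."""
    if lang == "Eng":
        if tree[0] == "UseN":
            return ENG[tree[1]]
        return ENG[tree[1]] + " " + render("Eng", tree[2])
    if lang == "Ita":
        return _ita(tree)[0]
    raise ValueError("unknown language: " + lang)

def parse(lang, text):
    """Parse by inverting render: keep the trees whose linearization matches."""
    return [t for t in trees() if render(lang, t) == text]

# An English example is turned into a tree, then rendered in Italian.
for t in parse("Eng", "white house"):
    print(t, "->", render("Ita", t))   # ... -> casa bianca
```

The grammar writer supplies an example in a language they know, and the tree found by parsing is reused for the other languages.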

Scientific interest

Linguistics

Computer science

Background

History

2002: v. 0.2

2003: v. 0.6

2005: v. 0.9

2006: v. 1.0

Authors

Janna Khegai (Russian modules, forthcoming), Björn Bringert (many Swadesh lexica), Carlos Gonzalia (Spanish cardinals), Patrik Jansson (Swedish cardinals), Aarne Ranta.

We are grateful to several other people who have used this and previous versions of the resource library for their contributions and comments, including Ana Bove, David Burke, Lauri Carlson, Gloria Casanellas, Karin Cavallin, Hans-Joachim Daniels, Kristofer Johannisson, Anni Laine, Wanjiku Ng'ang'a, Jordi Saludes.

Related work

CLE (Core Language Engine, Book 1992)

LinGO Grammar Matrix

Pargram

Rosetta Machine Translation (Book 1994)

Coverage

Languages

The current GF Resource Project covers ten languages:

In addition, parts (morphology) of Arabic, Estonian, Latin, and Urdu

API 1.0 not yet implemented for Danish and Russian

Morphology

Complete inflection engine

High-level access via ParadigmsX; e.g. Swedish:

Syntactic structures

Quantitative measures

67 categories

150 abstract syntax combination rules

100 structural words

350 content words in a test lexicon

Lines of source code (4 March 2006):

    abstract     1131
    english      2344
    german       2386
    finnish      3396
    norwegian    1257
    swedish      1465
    scandinavian 1023
    french       3246 -- Besch + Irreg + Morpho 2111
    italian      7797 -- Besch 6512
    spanish      7120 -- Besch 5877
    romance      1066

Structure

Language-independent ground API

Language-dependent paradigm modules

Language-dependent syntax extensions

Special-purpose APIs

How to use as top-level grammar

Parsing

Treebank generation

Treebank-based parsing

Morphology

Syntax editing

Efficient parsing via application grammar

How to use as library

Specialization through parametrized modules

Compile-time transfer

A natural division into modules

Example-based grammar writing
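Specialization through parametrized modules can be pictured as code written once against an interface and instantiated per language. The sketch below is a toy Python analogy, not GF's module system: RomanceParams and make_romance are hypothetical names, the lexicon is invented for illustration, and phenomena such as elision are ignored.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RomanceParams:
    """Language-specific pieces that the shared code depends on (hypothetical)."""
    article: dict   # gender -> definite article
    nouns: dict     # abstract noun -> (string, gender)
    adjs: dict      # abstract adjective -> {gender: string}

def make_romance(p):
    """Shared code, written once: article + noun + agreeing adjective."""
    def def_adj_cn(adj, noun):
        s, g = p.nouns[noun]
        return f"{p.article[g]} {s} {p.adjs[adj][g]}"
    return def_adj_cn

# Instantiations: only the parameters differ between the languages.
FRE = RomanceParams(
    article={"m": "le", "f": "la"},
    nouns={"house": ("maison", "f"), "book": ("livre", "m")},
    adjs={"white": {"m": "blanc", "f": "blanche"}},
)
ITA = RomanceParams(
    article={"m": "il", "f": "la"},
    nouns={"house": ("casa", "f"), "book": ("libro", "m")},
    adjs={"white": {"m": "bianco", "f": "bianca"}},
)
SPA = RomanceParams(
    article={"m": "el", "f": "la"},
    nouns={"house": ("casa", "f"), "book": ("libro", "m")},
    adjs={"white": {"m": "blanco", "f": "blanca"}},
)

fre, ita, spa = make_romance(FRE), make_romance(ITA), make_romance(SPA)
print(fre("white", "house"))   # la maison blanche
print(ita("white", "book"))    # il libro bianco
print(spa("white", "house"))   # la casa blanca
```

The agreement rule lives in one place; adding a further language of the same family means supplying a new parameter record rather than rewriting the rule.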

How to implement a new language

Ordinary modules

Parametrized modules

The kernel of the API

How to proceed

How to extend the API

Extend old modules or add a new one?