mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-11 05:49:31 -06:00
The new logo is in the new directory doc/Logos. The old logo has been moved there too. The documentation has been updated to refer to the new logo. The new logo was provided in SVG format by Krasimir, but PNG files are included as well. There are some new rules in doc/Makefile to convert from SVG to PNG, which you can use if you edit the SVG files.
351 lines
8.4 KiB
HTML
351 lines
8.4 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
|
<HTML>
|
|
<HEAD>
|
|
<META NAME="generator" CONTENT="http://txt2tags.sf.net">
|
|
<TITLE>A Birds-Eye View of GF as a Grammar Formalism</TITLE>
|
|
</HEAD><BODY BGCOLOR="white" TEXT="black">
|
|
<P ALIGN="center"><CENTER><H1>A Birds-Eye View of GF as a Grammar Formalism</H1>
|
|
<FONT SIZE="4">
|
|
<I>Author: Aarne Ranta</I><BR>
|
|
Last update: Thu Feb 2 14:16:01 2006
|
|
</FONT></CENTER>
|
|
|
|
<P></P>
|
|
<HR NOSHADE SIZE=1>
|
|
<P></P>
|
|
<UL>
|
|
<LI><A HREF="#toc1">GF in a few words</A>
|
|
<LI><A HREF="#toc2">History of GF</A>
|
|
<LI><A HREF="#toc3">Some key ingredients of GF in other grammar formalisms</A>
|
|
<LI><A HREF="#toc4">Examples of descriptions in each formalism</A>
|
|
<LI><A HREF="#toc5">Lambda terms and records</A>
|
|
<LI><A HREF="#toc6">The structure of GF formalisms</A>
|
|
<LI><A HREF="#toc7">The expressivity of GF</A>
|
|
<LI><A HREF="#toc8">Grammars and parsing</A>
|
|
<LI><A HREF="#toc9">Grammars as software libraries</A>
|
|
<LI><A HREF="#toc10">Multilinguality</A>
|
|
<LI><A HREF="#toc11">Parametrized modules</A>
|
|
</UL>
|
|
|
|
<P></P>
|
|
<HR NOSHADE SIZE=1>
|
|
<P></P>
|
|
<P>
|
|
<IMG ALIGN="middle" SRC="Logos/gf0.png" BORDER="0" ALT="">
|
|
</P>
|
|
<P>
|
|
<I>Abstract. This document gives a general description of the</I>
|
|
<I>Grammatical Framework (GF), with comparisons to other grammar</I>
|
|
<I>formalisms such as CG, ACG, HPSG, and LFG.</I>
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc1"></A>
|
|
<H2>GF in a few words</H2>
|
|
<P>
|
|
Grammatical Framework (GF) is a grammar formalism
|
|
based on <B>constructive type theory</B>.
|
|
</P>
|
|
<P>
|
|
GF makes a distinction between <B>abstract syntax</B> and <B>concrete syntax</B>.
|
|
</P>
|
|
<P>
|
|
The abstract syntax part of GF is a <B>logical framework</B>, with
|
|
dependent types and higher-order functions.
|
|
</P>
|
|
<P>
|
|
The concrete syntax is a system of <B>records</B> containing strings and features.
|
|
</P>
|
|
<P>
|
|
A GF grammar defines a <B>reversible homomorphism</B> from an abstract syntax to a
|
|
concrete syntax.
|
|
</P>
|
|
<P>
|
|
A <B>multilingual GF grammar</B> is a set of concrete syntaxes associated with
|
|
one abstract syntax.
|
|
</P>
|
|
<P>
|
|
GF grammars are written in a high-level <B>functional programming language</B>,
|
|
which is compiled into a <B>core language</B> (GFC).
|
|
</P>
|
|
<P>
|
|
GF grammars can be used as <B>resources</B>, i.e. as libraries for writing
|
|
new grammars; these are compiled and optimized by the method of
|
|
<B>grammar composition</B>.
|
|
</P>
|
|
<P>
|
|
GF has a <B>module system</B> that supports grammar engineering and separate
|
|
compilation.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc2"></A>
|
|
<H2>History of GF</H2>
|
|
<P>
|
|
1988. Intuitionistic Categorial Grammar; type theory as abstract syntax,
|
|
playing the role of Montague's analysis trees. Grammars implemented in Prolog.
|
|
</P>
|
|
<P>
|
|
1994. Type-Theoretical Grammar. Abstract syntax organized as a system of
|
|
combinators. Grammars implemented in ALF.
|
|
</P>
|
|
<P>
|
|
1996. Multilingual Type-Theoretical Grammar. Rules for generating six
|
|
languages from the same abstract syntax. Grammars implemented in ALF, ML, and
|
|
Haskell.
|
|
</P>
|
|
<P>
|
|
1998. The first implementation of GF as a language of its own.
|
|
</P>
|
|
<P>
|
|
2000. New version of GF: high-level functional source language, records used
|
|
for concrete syntax.
|
|
</P>
|
|
<P>
|
|
2003. The module system.
|
|
</P>
|
|
<P>
|
|
2004. Ljunglöf's thesis <I>Expressivity and Complexity of GF</I>.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc3"></A>
|
|
<H2>Some key ingredients of GF in other grammar formalisms</H2>
|
|
<UL>
|
|
<LI>[GF ]: Grammatical Framework
|
|
<LI>[CG ]: categorial grammar
|
|
<LI>[ACG ]: abstract categorial grammar
|
|
<LI>[HPSG ]: head-driven phrase structure grammar
|
|
<LI>[LFG ]: lexical functional grammar
|
|
</UL>
|
|
|
|
<TABLE CELLPADDING="4" BORDER="1">
|
|
<TR>
|
|
<TD ALIGN="center">/</TD>
|
|
<TD>GF</TD>
|
|
<TD>ACG</TD>
|
|
<TD>LFG</TD>
|
|
<TD>HPSG</TD>
|
|
<TD>CG</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD>abstract vs concrete syntax</TD>
|
|
<TD>X</TD>
|
|
<TD>X</TD>
|
|
<TD>?</TD>
|
|
<TD>-</TD>
|
|
<TD>-</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD>type theory</TD>
|
|
<TD>X</TD>
|
|
<TD>X</TD>
|
|
<TD>-</TD>
|
|
<TD>-</TD>
|
|
<TD>X</TD>
|
|
</TR>
|
|
<TR>
|
|
<TD>records and features</TD>
|
|
<TD>X</TD>
|
|
<TD>-</TD>
|
|
<TD>X</TD>
|
|
<TD>X</TD>
|
|
<TD>-</TD>
|
|
</TR>
|
|
</TABLE>
|
|
|
|
<P></P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc4"></A>
|
|
<H2>Examples of descriptions in each formalism</H2>
|
|
<P>
|
|
To be written...
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc5"></A>
|
|
<H2>Lambda terms and records</H2>
|
|
<P>
|
|
In CS, abstract syntax is trees and concrete syntax is strings.
|
|
This works more or less for programming languages.
|
|
</P>
|
|
<P>
|
|
In CG, all syntax is lambda terms.
|
|
</P>
|
|
<P>
|
|
In Montague grammar, abstract syntax is lambda terms and
|
|
concrete syntax is trees. Abstract syntax as lambda terms
|
|
can be considered well-established.
|
|
</P>
|
|
<P>
|
|
In PATR and HPSG, concrete syntax it records. This can be considered
|
|
well-established for natural languages.
|
|
</P>
|
|
<P>
|
|
In ACG, both are lambda terms. This is more general than GF,
|
|
but reversibility requires linearity restriction, which can be
|
|
unnatural for grammar writing.
|
|
</P>
|
|
<P>
|
|
In GF, linearization from lambda terms to records is reversible,
|
|
and grammar writing is not restricted to linear terms.
|
|
</P>
|
|
<P>
|
|
Grammar composition in ACG is just function composition. In GF,
|
|
it is more restricted...
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc6"></A>
|
|
<H2>The structure of GF formalisms</H2>
|
|
<P>
|
|
The following diagram (to be drawn properly!) describes the
|
|
levels.
|
|
</P>
|
|
<PRE>
|
|
| programming language design
|
|
V
|
|
GF source language
|
|
|
|
|
| type-directed partial evaluation
|
|
V
|
|
GFC assembly language
|
|
|
|
|
| Ljunglöf's translation
|
|
V
|
|
MCFG parser
|
|
</PRE>
|
|
<P>
|
|
The last two phases are nontrivial mathematica properties.
|
|
</P>
|
|
<P>
|
|
In most grammar formalisms, grammarians have to work on the GFC
|
|
(or MCFG) level.
|
|
</P>
|
|
<P>
|
|
Maybe they use macros - they are therefore like macro assemblers. But there
|
|
are no separately compiled library modules, no type checking, etc.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc7"></A>
|
|
<H2>The expressivity of GF</H2>
|
|
<P>
|
|
Parsing complexity is the same as MCFG: polynomial, with
|
|
unrestricted exponent depending on grammar.
|
|
This is between TAG and HPSG.
|
|
</P>
|
|
<P>
|
|
If semantic well-formedness (type theory) is taken into account,
|
|
then arbitrary logic can be expressed. The well-formedness of
|
|
abstract syntax is decidable, but the well-formedness of a
|
|
concrete-syntax string can require an arbitrary proof construction
|
|
and is therefore undecidable.
|
|
</P>
|
|
<P>
|
|
Separability between AS and CS: like TAG (Tree Adjoining Grammar), GF
|
|
has the goal of assigning intended trees for strings. This is
|
|
generalized to shared trees for different languages.
|
|
</P>
|
|
<P>
|
|
The high-level language strives after the properties of
|
|
writability and readability (programming language notions).
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc8"></A>
|
|
<H2>Grammars and parsing</H2>
|
|
<P>
|
|
In many projects, a grammar is just seen as a <B>declarative parsing program</B>.
|
|
</P>
|
|
<P>
|
|
For GF, a grammar is primarily the <B>definition of a language</B>.
|
|
</P>
|
|
<P>
|
|
Detaching grammars from parsers is a good idea, giving
|
|
</P>
|
|
<UL>
|
|
<LI>more efficient and robust parsing (statistical etc)
|
|
<LI>cleaner grammars
|
|
</UL>
|
|
|
|
<P>
|
|
Separating abstract from concrete syntax is a prerequisite for this:
|
|
we want parsers to return abstract syntax objects, and these must exist
|
|
independently of parse trees.
|
|
</P>
|
|
<P>
|
|
A possible radical approach to parsing:
|
|
use a grammar to generate a treebank and machine-learn
|
|
a statistical parser from this.
|
|
</P>
|
|
<P>
|
|
Comparison: Steedman in CCG has done something like this.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc9"></A>
|
|
<H2>Grammars as software libraries</H2>
|
|
<P>
|
|
Reuse for different purposes.
|
|
</P>
|
|
<P>
|
|
Grammar composition.
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc10"></A>
|
|
<H2>Multilinguality</H2>
|
|
<P>
|
|
In <B>application grammars</B>, the AS is a semantic
|
|
model, and a CS covers domain terminology and idioms.
|
|
</P>
|
|
<P>
|
|
This can give publication-quality translation on
|
|
limited domains (e.g. the WebALT project).
|
|
</P>
|
|
<P>
|
|
Resource grammars with grammar composition lead to
|
|
<B>compile-time transfer</B>.
|
|
</P>
|
|
<P>
|
|
When is <B>run-time transfer</B> necessary?
|
|
</P>
|
|
<P>
|
|
Cf. CLE (Core Language Engine).
|
|
</P>
|
|
<P>
|
|
<!-- NEW -->
|
|
</P>
|
|
<A NAME="toc11"></A>
|
|
<H2>Parametrized modules</H2>
|
|
<P>
|
|
This notion comes from the ML language in the 1980's.
|
|
</P>
|
|
<P>
|
|
It can be used for sharing even more code between languages
|
|
than their AS.
|
|
</P>
|
|
<P>
|
|
Especially, for related languages (Scandinavian, Romance).
|
|
</P>
|
|
<P>
|
|
Cf. grammar porting in CLE: what they do with untyped
|
|
macro packages GF does with typable interfaces.
|
|
</P>
|
|
|
|
<!-- html code generated by txt2tags 2.0 (http://txt2tags.sf.net) -->
|
|
<!-- cmdline: txt2tags -thtml -\-toc gf-formalism.txt -->
|
|
</BODY></HTML>
|