mirror of
https://github.com/GrammaticalFramework/gf-core.git
synced 2026-04-22 19:22:50 -06:00
API module titles, resource.txt corrections
@@ -1,6 +1,8 @@
|
|||||||
\documentclass[11pt,a4paper]{article}
|
\documentclass[11pt,a4paper]{article}
|
||||||
\usepackage{amsfonts,graphicx}
|
\usepackage{amsfonts,graphicx}
|
||||||
|
\usepackage{isolatin1}
|
||||||
\usepackage[pdfstartview=FitH,urlcolor=blue,colorlinks=true,bookmarks=true]{hyperref}
|
\usepackage[pdfstartview=FitH,urlcolor=blue,colorlinks=true,bookmarks=true]{hyperref}
|
||||||
|
%%\usepackage[utf8x]{inputenc}
|
||||||
\pagestyle{plain} % do page numbering ('empty' turns off)
|
\pagestyle{plain} % do page numbering ('empty' turns off)
|
||||||
\frenchspacing % no additional spaces after periods
|
\frenchspacing % no additional spaces after periods
|
||||||
\setlength{\parskip}{8pt}\parindent=0pt % no paragraph indentation
|
\setlength{\parskip}{8pt}\parindent=0pt % no paragraph indentation
|
||||||
@@ -28,29 +30,34 @@ in order to use the library.
|
|||||||
How to write one's own resource grammar (i.e. to implement the API for
|
How to write one's own resource grammar (i.e. to implement the API for
|
||||||
a new language), is covered by a separate Resource-HOWTO document.
|
a new language), is covered by a separate Resource-HOWTO document.
|
||||||
|
|
||||||
\section{Motivation}
|
\subsection*{Motivation}
|
||||||
The GF Resource Grammar Library contains grammar rules for
|
The GF Resource Grammar Library contains grammar rules for
|
||||||
10 languages (some more are under construction). Its purpose
|
10 languages (some more are under construction). Its purpose
|
||||||
is to make these rules available for application programmers,
|
is to make these rules available for application programmers,
|
||||||
who can thereby concentrate on the semantic and stylistic
|
who can thereby concentrate on the semantic and stylistic
|
||||||
aspects of their grammars, without having to think about
|
aspects of their grammars, without having to think about
|
||||||
grammaticality. The targeted level of application grammarians
|
grammaticality. The targeted level of application grammarians
|
||||||
is skilled programmer without knowledge linguistics, but with
|
is that of a skilled programmer with
|
||||||
a good knowledge of the target languages. Such a combination of
|
a practical knowledge of the target languages, but without
|
||||||
|
theoretical knowledge about their grammars.
|
||||||
|
Such a combination of
|
||||||
skills is typical of programmers who want to localize
|
skills is typical of programmers who want to localize
|
||||||
software to new languages.
|
software to new languages.
|
||||||
|
|
||||||
The current resource languages are
|
The current resource languages are
|
||||||
-\texttt{Dan}ish
|
|
||||||
-\texttt{Eng}lish
|
\begin{itemize}
|
||||||
-\texttt{Fin}nish
|
\item \texttt{Dan}ish
|
||||||
-\texttt{Fre}nch
|
\item \texttt{Eng}lish
|
||||||
-\texttt{Ger}man
|
\item \texttt{Fin}nish
|
||||||
-\texttt{Ita}lian
|
\item \texttt{Fre}nch
|
||||||
-\texttt{Nor}wegian
|
\item \texttt{Ger}man
|
||||||
-\texttt{Rus}sian
|
\item \texttt{Ita}lian
|
||||||
-\texttt{Spa}nish
|
\item \texttt{Nor}wegian
|
||||||
-\texttt{Swe}dish
|
\item \texttt{Rus}sian
|
||||||
|
\item \texttt{Spa}nish
|
||||||
|
\item \texttt{Swe}dish
|
||||||
|
\end{itemize}
|
||||||
|
|
||||||
The first three letters (\texttt{Dan} etc) are used in grammar module names.
|
The first three letters (\texttt{Dan} etc) are used in grammar module names.
|
||||||
|
|
||||||
@@ -83,7 +90,7 @@ variation is taken care of by the resource grammar function
|
|||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
fun AdjCN : AP -> CN -> CN
|
fun AdjCN : AP -> CN -> CN
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
and the resource grammar implementation of the rule adding properties
|
The resource grammar implementation of the rule adding properties
|
||||||
to kinds is
|
to kinds is
|
||||||
|
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
@@ -95,8 +102,8 @@ given that
|
|||||||
lincat Prop = AP
|
lincat Prop = AP
|
||||||
lincat Kind = CN
|
lincat Kind = CN
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
The resource library API is divided into language-specific and language-independent
|
The resource library API is divided into language-specific
|
||||||
parts. To put it roughly,
|
and language-independent parts. To put it roughly,
|
||||||
|
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item the lexicon API is language-specific
|
\item the lexicon API is language-specific
|
||||||
@@ -111,9 +118,10 @@ pick a different linearization of \texttt{Song},
|
|||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
But to linearize \texttt{PropKind}, we can use the very same rule as in German.
|
But to linearize \texttt{PropKind}, we can use the very same rule as in German.
|
||||||
The resource function \texttt{AdjCN} has different implementations in the two
|
The resource function \texttt{AdjCN} has different implementations in the two
|
||||||
languages, but the application programmer need not care about the difference.
|
languages (e.g. a different word order in French),
|
||||||
|
but the application programmer need not care about the difference.
|
||||||
|
|
||||||
\subsection{A complete example}
|
\subsubsection*{A complete example}
|
||||||
To summarize the example, and also give a template for a programmer to work on,
|
To summarize the example, and also give a template for a programmer to work on,
|
||||||
here is the complete implementation of a small system with songs and properties.
|
here is the complete implementation of a small system with songs and properties.
|
||||||
The abstract syntax defines a "domain ontology":
|
The abstract syntax defines a "domain ontology":
|
||||||
@@ -129,7 +137,8 @@ The abstract syntax defines a "domain ontology":
|
|||||||
American : Property ;
|
American : Property ;
|
||||||
}
|
}
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
The concrete syntax is defined independently of language, by opening
|
The concrete syntax is defined by a functor (parametrized module),
|
||||||
|
independently of language, by opening
|
||||||
two interfaces: the resource \texttt{Grammar} and an application lexicon.
|
two interfaces: the resource \texttt{Grammar} and an application lexicon.
|
||||||
|
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
@@ -153,14 +162,14 @@ the resource category system \texttt{Cat}.
|
|||||||
american_A : A ;
|
american_A : A ;
|
||||||
}
|
}
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
Each language has its own concrete syntax, which opens the inflectional paradigms
|
Each language has its own concrete syntax, which opens the
|
||||||
module for that language:
|
inflectional paradigms module for that language:
|
||||||
|
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
concrete MusicLexGer of MusicLex = CatGer ** open ParadigmsGer in {
|
concrete MusicLexGer of MusicLex = CatGer ** open ParadigmsGer in {
|
||||||
lin
|
lin
|
||||||
song_N = reg2N "Lied" "Lieder" neuter ;
|
song_N = reg2N "Lied" "Lieder" neuter ;
|
||||||
american_A = regA "amerikanisch" ;
|
american_A = regA "Amerikanisch" ;
|
||||||
}
|
}
|
||||||
|
|
||||||
concrete MusicLexFre of MusicLex = CatFre ** open ParadigmsFre in {
|
concrete MusicLexFre of MusicLex = CatFre ** open ParadigmsFre in {
|
||||||
@@ -169,8 +178,8 @@ module for that language:
|
|||||||
american_A = regA "américain" ;
|
american_A = regA "américain" ;
|
||||||
}
|
}
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
The top-level \texttt{Music} grammars are obtained by instantiating the two interfaces
|
The top-level \texttt{Music} grammars are obtained by
|
||||||
of \texttt{MusicI}:
|
instantiating the two interfaces of \texttt{MusicI}:
|
||||||
|
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
concrete MusicGer of Music = MusicI with
|
concrete MusicGer of Music = MusicI with
|
||||||
@@ -189,8 +198,10 @@ Both of these files can use the same \texttt{path}, defined as
|
|||||||
The \texttt{present} category contains the compiled resources, restricted to
|
The \texttt{present} category contains the compiled resources, restricted to
|
||||||
present tense; \texttt{alltenses} has the full resources.
|
present tense; \texttt{alltenses} has the full resources.
|
||||||
|
|
||||||
To localize the music player system to a new language, all that is needed is two modules,
|
To localize the music player system to a new language,
|
||||||
one implementing \texttt{MusicLex} and the other instantiating \texttt{Music}. The latter is
|
all that is needed is two modules,
|
||||||
|
one implementing \texttt{MusicLex} and the other
|
||||||
|
instantiating \texttt{Music}. The latter is
|
||||||
completely trivial, whereas the former one involves the choice of correct
|
completely trivial, whereas the former one involves the choice of correct
|
||||||
vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
|
vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
|
||||||
|
|
||||||
@@ -238,10 +249,10 @@ gives its own definition of this function:
|
|||||||
}
|
}
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
|
|
||||||
\subsection{Parsing with resource grammars?}
|
\subsubsection*{Parsing with resource grammars?}
|
||||||
The intended use of the resource grammar is as a library for writing
|
The intended use of the resource grammar is as a library for writing
|
||||||
application grammars. It is not designed for e.g. parsing newspaper text. There
|
application grammars. It is not designed for parsing e.g. newspaper text. There
|
||||||
are several reasons why this is not so practical:
|
are several reasons why this is not practical:
|
||||||
|
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item Efficiency: the resource grammar uses complex data structures, in
|
\item Efficiency: the resource grammar uses complex data structures, in
|
||||||
@@ -265,15 +276,16 @@ details such as inflection, agreement, and word order.
|
|||||||
It is for the same reasons that resource grammars are not adequate for translation.
|
It is for the same reasons that resource grammars are not adequate for translation.
|
||||||
That the syntax API is implemented for different languages of course makes
|
That the syntax API is implemented for different languages of course makes
|
||||||
it possible to translate via it - but there is no guarantee of translation
|
it possible to translate via it - but there is no guarantee of translation
|
||||||
equivalence. Of course, the use of parametrized implementations such as \texttt{MusicI}
|
equivalence. Of course, the use of functor implementations such as \texttt{MusicI}
|
||||||
above only extends to those cases where the syntax API does give translation
|
above only extends to those cases where the syntax API does give translation
|
||||||
equivalence - but this must be seen as a limiting case, and real applications
|
equivalence - but this must be seen as a limiting case, and bigger applications
|
||||||
will often use only restricted inheritance of \texttt{MusicI}.
|
will often use only restricted inheritance of \texttt{MusicI}.
|
||||||
|
|
||||||
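A restricted inheritance of \texttt{MusicI} might look like the following
sketch. It assumes the \texttt{- [...]} exclusion syntax of later GF versions
and instance modules named \texttt{GrammarGer} and \texttt{MusicLexGer};
the replacement rule for \texttt{PropKind} is hypothetical and only marks
the place where a language-specific linearization would go.

\begin{verbatim}
  -- sketch: inherit everything from MusicI except PropKind
  concrete MusicGer of Music = MusicI - [PropKind] with
    (Grammar = GrammarGer),
    (MusicLex = MusicLexGer) ** open GrammarGer in {
    lin
      -- hypothetical language-specific rule replacing the functor's rule
      PropKind kind prop = AdjCN prop kind ;
  }
\end{verbatim}

The rules that do give translation equivalence are still shared through the
functor; only the excluded ones are written by hand for each language.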
\section{To find rules in the resource grammar library}
|
\subsection*{To find rules in the resource grammar library}
|
||||||
\subsection{Inflection paradigms}
|
\subsubsection*{Inflection paradigms}
|
||||||
Inflection paradigms are defined separately for each language \textit{L}
|
Inflection paradigms are defined separately for each language \textit{L}
|
||||||
in the module \texttt{Paradigms}\textit{L}. To test them, the command \texttt{cc} (= \texttt{compute\_concrete})
|
in the module \texttt{Paradigms}\textit{L}. To test them, the command
|
||||||
|
\texttt{cc} (= \texttt{compute\_concrete})
|
||||||
can be used:
|
can be used:
|
||||||
|
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
@@ -310,15 +322,16 @@ For the sake of convenience, every language implements these four paradigms:
|
|||||||
It is often possible to initialize a lexicon by just using these functions,
|
It is often possible to initialize a lexicon by just using these functions,
|
||||||
and later revise it by using the more involved paradigms. For instance, in
|
and later revise it by using the more involved paradigms. For instance, in
|
||||||
German we cannot use \texttt{regN "Lied"} for \texttt{Song}, because the result would be a
|
German we cannot use \texttt{regN "Lied"} for \texttt{Song}, because the result would be a
|
||||||
Masculine noun with the plural form \texttt{"Liede"}. The individual \texttt{Paradigms} modules
|
Masculine noun with the plural form \texttt{"Liede"}.
|
||||||
|
The individual \texttt{Paradigms} modules
|
||||||
tell what cases are covered by the regular heuristics.
|
tell what cases are covered by the regular heuristics.
|
||||||
|
|
||||||
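The revision step described above can be sketched as follows. It uses only
the German paradigms \texttt{regN} and \texttt{reg2N} that appear in this
document, and the comments restate the example from the text.

\begin{verbatim}
  lin
    -- first draft, regular heuristics: masculine, plural "Liede" (wrong)
    -- song_N = regN "Lied" ;
    -- revised entry: both forms and the gender are given explicitly
    song_N = reg2N "Lied" "Lieder" neuter ;
\end{verbatim}
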
As a limiting case, one could even initialize the lexicon for a new language
|
As a limiting case, one could even initialize the lexicon for a new language
|
||||||
by copying the English (or some other already existing) lexicon. This will
|
by copying the English (or some other already existing) lexicon. This would
|
||||||
produce language with correct grammar but with content words directly borrowed from
|
produce language with correct grammar but with content words directly borrowed from
|
||||||
English.
|
English - maybe not so strange in certain technical domains.
|
||||||
|
|
||||||
\subsection{Syntax rules}
|
\subsubsection*{Syntax rules}
|
||||||
Syntax rules should be looked for in the abstract modules defining the
|
Syntax rules should be looked for in the abstract modules defining the
|
||||||
API. There are around 10 such modules, each defining constructors for
|
API. There are around 10 such modules, each defining constructors for
|
||||||
a group of one or more related categories. For instance, the module
|
a group of one or more related categories. For instance, the module
|
||||||
@@ -326,14 +339,16 @@ a group of one or more related categories. For instance, the module
|
|||||||
Thus the proper place to find out how nouns are modified with adjectives
|
Thus the proper place to find out how nouns are modified with adjectives
|
||||||
is \texttt{Noun}, because the result of the construction is again a common noun.
|
is \texttt{Noun}, because the result of the construction is again a common noun.
|
||||||
|
|
||||||
Browsing the libraries is helped by the gfdoc-generated HTML pages.
|
Browsing the libraries is helped by the gfdoc-generated HTML pages,
|
||||||
|
whose LaTeX versions are included in the present document.
|
||||||
However, this is still not easy, and the most efficient way is
|
However, this is still not easy, and the most efficient way is
|
||||||
probably to use the parser.
|
probably to use the parser.
|
||||||
Even though parsing is not an intended end-user application
|
Even though parsing is not an intended end-user application
|
||||||
of resource grammars, it is a useful technique for application grammarians
|
of resource grammars, it is a useful technique for application grammarians
|
||||||
to browse the library. To find out what resource function does some
|
to browse the library. To find out which resource function implements
|
||||||
particular job, you can just parse a string that exemplifies this job. For
|
a particular structure, one can just parse a string that exemplifies this
|
||||||
instance, to find out how sentences are built using transitive verbs, write
|
structure. For instance, to find out how sentences are built using
|
||||||
|
transitive verbs, write
|
||||||
|
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
> i english/LangEng.gf
|
> i english/LangEng.gf
|
||||||
@@ -369,7 +384,7 @@ This can be built by parsing "I have beer" in LanEng and then writing
|
|||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
which uses ParadigmsIta.regGenN.
|
which uses ParadigmsIta.regGenN.
|
||||||
|
|
||||||
\subsection{Example-based grammar writing}
|
\subsubsection*{Example-based grammar writing}
|
||||||
The technique of parsing with the resource grammar can be used in GF source files,
|
The technique of parsing with the resource grammar can be used in GF source files,
|
||||||
endowed with the suffix \texttt{.gfe} ("GF examples"). The suffix tells GF to preprocess
|
endowed with the suffix \texttt{.gfe} ("GF examples"). The suffix tells GF to preprocess
|
||||||
the file by replacing all expressions of the form
|
the file by replacing all expressions of the form
|
||||||
@@ -402,8 +417,8 @@ However, the technique of example-based grammar writing has some limitations:
|
|||||||
it may not be the intended one. The other parses are shown in a comment, from
|
it may not be the intended one. The other parses are shown in a comment, from
|
||||||
where they must/can be picked manually.
|
where they must/can be picked manually.
|
||||||
\item Lexicality. The arguments of a function must be atomic identifiers, and are thus
|
\item Lexicality. The arguments of a function must be atomic identifiers, and are thus
|
||||||
not available for categories that have no lexical items. For instance, the \texttt{PropKind}
|
not available for categories that have no lexical items.
|
||||||
rule above gives the result
|
For instance, the \texttt{PropKind} rule above gives the result
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
lin
|
lin
|
||||||
PropKind car_N old_A = AdjCN (UseN car_N) (PositA old_A) ;
|
PropKind car_N old_A = AdjCN (UseN car_N) (PositA old_A) ;
|
||||||
@@ -418,9 +433,10 @@ all those categories that can be used as arguments, for instance,
|
|||||||
and then use this lexicon instead of the standard one included in \texttt{Lang}.
|
and then use this lexicon instead of the standard one included in \texttt{Lang}.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
\subsection{Special-purpose APIs}
|
\subsubsection*{Special-purpose APIs}
|
||||||
To give an analogy with a well-known typesetting program, GF can be compared
|
To give an analogy with the well-known typesetting software, GF can be compared
|
||||||
with TeX and the resource grammar library with LaTeX. As TeX frees the author
|
with TeX and the resource grammar library with LaTeX.
|
||||||
|
Just like TeX frees the author
|
||||||
from thinking about low-level problems of page layout, so GF frees the grammarian
|
from thinking about low-level problems of page layout, so GF frees the grammarian
|
||||||
from writing parsing and generation algorithms. But quite a lot of knowledge of
|
from writing parsing and generation algorithms. But quite a lot of knowledge of
|
||||||
\textit{how} to write grammars is still needed, and the resource grammar library helps
|
\textit{how} to write grammars is still needed, and the resource grammar library helps
|
||||||
@@ -457,15 +473,16 @@ The implementation of this module is the functor \texttt{PredicationI}:
|
|||||||
Of course, \texttt{Predication} can be opened together with \texttt{Grammar}, but using
|
Of course, \texttt{Predication} can be opened together with \texttt{Grammar}, but using
|
||||||
the resulting grammar for parsing can be frustrating, since having both
|
the resulting grammar for parsing can be frustrating, since having both
|
||||||
ways of building clauses simultaneously available will produce spurious
|
ways of building clauses simultaneously available will produce spurious
|
||||||
ambiguities. Using \texttt{Predication} without \texttt{Verb} for parsing is a better idea,
|
ambiguities. But using just \texttt{Predication} without \texttt{Verb}
|
||||||
since parsing is also made more efficient without rules for the \texttt{VP} category.
|
for parsing is a good idea,
|
||||||
|
since parsing is more efficient without rules producing verb phrases.
|
||||||
|
|
||||||
The use of special-purpose APIs is to some extent just an alternative
|
The use of special-purpose APIs is to some extent just an alternative
|
||||||
to grammar writing by parsing, and its importance may decrease as parsing
|
to grammar writing by parsing, and its importance may decrease as parsing
|
||||||
with resource grammars gets more efficient.
|
with resource grammars becomes more practical.
|
||||||
|
|
||||||
\section{Overview of syntactic structures}
|
\subsection*{Overview of syntactic structures}
|
||||||
\subsection{Texts, phrases, and utterances}
|
\subsubsection*{Texts, phrases, and utterances}
|
||||||
The outermost linguistic structure is \texttt{Text}. \texttt{Text}s are composed
|
The outermost linguistic structure is \texttt{Text}. \texttt{Text}s are composed
|
||||||
from Phrases (\texttt{Phr}) followed by punctuation marks - either of ".", "?" or
|
from Phrases (\texttt{Phr}) followed by punctuation marks - either of ".", "?" or
|
||||||
"!" (with their proper variants in Spanish and Arabic). Here is an
|
"!" (with their proper variants in Spanish and Arabic). Here is an
|
||||||
@@ -502,7 +519,7 @@ What is the difference between Phrase and Utterance? Just technical:
|
|||||||
a Phrase is an Utterance with an optional leading conjunction ("but")
|
a Phrase is an Utterance with an optional leading conjunction ("but")
|
||||||
and an optional trailing vocative ("John", "please").
|
and an optional trailing vocative ("John", "please").
|
||||||
|
|
||||||
\subsection{Sentences and clauses}
|
\subsubsection*{Sentences and clauses}
|
||||||
The richest of the categories below Utterance is \texttt{S}, Sentence. A Sentence
|
The richest of the categories below Utterance is \texttt{S}, Sentence. A Sentence
|
||||||
is formed from a Clause (\texttt{Cl}), by fixing its Tense, Anteriority, and Polarity.
|
is formed from a Clause (\texttt{Cl}), by fixing its Tense, Anteriority, and Polarity.
|
||||||
The difference between Sentence and Clause is thus also rather technical.
|
The difference between Sentence and Clause is thus also rather technical.
|
||||||
@@ -570,13 +587,15 @@ many constructors:
|
|||||||
ComplV2 love_V2 this_NP John loves this.
|
ComplV2 love_V2 this_NP John loves this.
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
|
|
||||||
\subsection{Parts of sentences}
|
\subsubsection*{Parts of sentences}
|
||||||
The linguistic phenomena mostly discussed in both traditional grammars and modern
|
The linguistic phenomena mostly discussed in both traditional grammars and modern
|
||||||
syntax belong to the level of Clauses, that is, lines 9-13, and occasionally
|
syntax belong to the level of Clauses, that is, lines 9-13, and occasionally
|
||||||
to Sentences, lines 5-13. At this level, the major categories are
|
to Sentences, lines 5-13. At this level, the major categories are
|
||||||
\texttt{NP} (Noun Phrase) and \texttt{VP} (Verb Phrase). A Clause typically consists of just an
|
\texttt{NP} (Noun Phrase) and \texttt{VP} (Verb Phrase). A Clause typically
|
||||||
\texttt{NP} and a \texttt{VP}. The internal structure of both \texttt{NP} and \texttt{VP} can be very complex,
|
consists of just an \texttt{NP} and a \texttt{VP}.
|
||||||
and these categories are mutually recursive: not only can a \texttt{VP} contain an \texttt{NP},
|
The internal structure of both \texttt{NP} and \texttt{VP} can be very complex,
|
||||||
|
and these categories are mutually recursive: not only can a \texttt{VP}
|
||||||
|
contain an \texttt{NP},
|
||||||
|
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
[VP loves [NP Mary]]
|
[VP loves [NP Mary]]
|
||||||
@@ -612,7 +631,8 @@ The most frequent ways are
|
|||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
\textbf{Verb}.
|
\textbf{Verb}.
|
||||||
How to construct VPs. The main mechanism is verbs with their arguments, for instance,
|
How to construct VPs. The main mechanism is verbs with their arguments,
|
||||||
|
for instance,
|
||||||
|
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item one-place verbs: "walks"
|
\item one-place verbs: "walks"
|
||||||
@@ -645,10 +665,12 @@ How to construct \texttt{Adv}s. The main ways are
|
|||||||
|
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item from adjectives: "slowly"
|
\item from adjectives: "slowly"
|
||||||
|
\item as prepositional phrases: "in the car"
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
\subsection{Modules and their names}
|
\subsubsection*{Modules and their names}
|
||||||
The resource modules are named after the kind of phrases that are constructed in them,
|
The resource modules are named after the kind of
|
||||||
|
phrases that are constructed in them,
|
||||||
and they can be roughly classified by the "level" or "size" of expressions that are
|
and they can be roughly classified by the "level" or "size" of expressions that are
|
||||||
formed in them:
|
formed in them:
|
||||||
|
|
||||||
@@ -663,7 +685,8 @@ Because of mutual recursion such as in embedded sentences, this classification i
|
|||||||
not a complete order. However, no mutual dependence is needed between the
|
not a complete order. However, no mutual dependence is needed between the
|
||||||
modules in a formal sense - they can all be compiled separately. This is due
|
modules in a formal sense - they can all be compiled separately. This is due
|
||||||
to the module \texttt{Cat}, which defines the type system common to the other modules.
|
to the module \texttt{Cat}, which defines the type system common to the other modules.
|
||||||
For instance, the types \texttt{NP} and \texttt{VP} are defined in \texttt{Cat}, and the module \texttt{Verb} only
|
For instance, the types \texttt{NP} and \texttt{VP} are defined in \texttt{Cat},
|
||||||
|
and the module \texttt{Verb} only
|
||||||
needs to know what is given in \texttt{Cat}, not what is given in \texttt{Noun}. To implement
|
needs to know what is given in \texttt{Cat}, not what is given in \texttt{Noun}. To implement
|
||||||
a rule such as
|
a rule such as
|
||||||
|
|
||||||
@@ -683,7 +706,7 @@ category-specific modules is as follows:
|
|||||||
abstract Verb = Cat ** {...}
|
abstract Verb = Cat ** {...}
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
|
|
||||||
\subsection{Top-level grammar and lexicon}
|
\subsubsection*{Top-level grammar and lexicon}
|
||||||
The module \texttt{Grammar} collects all the category-specific modules into
|
The module \texttt{Grammar} collects all the category-specific modules into
|
||||||
a complete grammar:
|
a complete grammar:
|
||||||
|
|
||||||
@@ -698,7 +721,8 @@ The module \texttt{Idiom} is a collection of idiomatic structures whose
|
|||||||
implementation is very language-dependent. An example is existential
|
implementation is very language-dependent. An example is existential
|
||||||
structures ("there is", "es gibt", "il y a", etc).
|
structures ("there is", "es gibt", "il y a", etc).
|
||||||
|
|
||||||
The module \texttt{Lang} combines \texttt{Grammar} with a \texttt{Lexicon} of ca. 350 content words:
|
The module \texttt{Lang} combines \texttt{Grammar} with a \texttt{Lexicon} of
|
||||||
|
ca. 350 content words:
|
||||||
|
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
abstract Lang = Grammar, Lexicon
|
abstract Lang = Grammar, Lexicon
|
||||||
@@ -709,7 +733,7 @@ help testing the resource library. It does not seem possible to maintain
|
|||||||
a general-purpose multilingual lexicon, and this is the form that the module
|
a general-purpose multilingual lexicon, and this is the form that the module
|
||||||
\texttt{Lexicon} has.
|
\texttt{Lexicon} has.
|
||||||
|
|
||||||
\subsection{Language-specific syntactic structures}
|
\subsubsection*{Language-specific syntactic structures}
|
||||||
The API collected in \texttt{Grammar} has been designed to be implementable for
|
The API collected in \texttt{Grammar} has been designed to be implementable for
|
||||||
all languages in the resource package. It does contain some rules that
|
all languages in the resource package. It does contain some rules that
|
||||||
are strange or superfluous in some languages; for instance, the distinction
|
are strange or superfluous in some languages; for instance, the distinction
|
||||||
@@ -725,19 +749,23 @@ rules. The top level of each languages looks as follows (with English as example
|
|||||||
abstract English = Grammar, ExtraEngAbs, DictEngAbs
|
abstract English = Grammar, ExtraEngAbs, DictEngAbs
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
where \texttt{ExtraEngAbs} is a collection of syntactic structures specific to English,
|
where \texttt{ExtraEngAbs} is a collection of syntactic structures specific to English,
|
||||||
and \texttt{DictEngAbs} is an English dictionary (at the moment, it consists of \texttt{IrregEngAbs},
|
and \texttt{DictEngAbs} is an English dictionary
|
||||||
|
(at the moment, it consists of \texttt{IrregEngAbs},
|
||||||
the irregular verbs of English). Each of these language-specific grammars has
|
the irregular verbs of English). Each of these language-specific grammars has
|
||||||
the potential to grow into a full-scale grammar of the language. These grammars
|
the potential to grow into a full-scale grammar of the language. These grammars
|
||||||
can also be used as libraries, but the possibility of using functors is lost.
|
can also be used as libraries, but the possibility of using functors is lost.
|
||||||
|
|
||||||
To give a better overview of language-specific structures, modules like \texttt{ExtraEngAbs}
|
To give a better overview of language-specific structures,
|
||||||
are built from a language-independent module \texttt{ExtraAbs} by restricted inheritance:
|
modules like \texttt{ExtraEngAbs}
|
||||||
|
are built from a language-independent module \texttt{ExtraAbs}
|
||||||
|
by restricted inheritance:
|
||||||
|
|
||||||
\begin{verbatim}
|
\begin{verbatim}
|
||||||
abstract ExtraEngAbs = Extra [f,g,...]
|
abstract ExtraEngAbs = Extra [f,g,...]
|
||||||
\end{verbatim}
|
\end{verbatim}
|
||||||
Thus any category and function in \texttt{Extra} may be shared by a subset of all
|
Thus any category and function in \texttt{Extra} may be shared by a subset of all
|
||||||
languages. One can see this set-up as a matrix, which tells what \texttt{Extra} structures
|
languages. One can see this set-up as a matrix, which tells
|
||||||
|
what \texttt{Extra} structures
|
||||||
are implemented in what languages. For the common API in \texttt{Grammar}, the matrix
|
are implemented in what languages. For the common API in \texttt{Grammar}, the matrix
|
||||||
is filled with 1's (everything is implemented in every language).
|
is filled with 1's (everything is implemented in every language).
|
||||||
|
|
||||||
|
|||||||
134
doc/resource.txt
@@ -27,27 +27,28 @@ is to make these rules available for application programmers,
|
|||||||
who can thereby concentrate on the semantic and stylistic
|
who can thereby concentrate on the semantic and stylistic
|
||||||
aspects of their grammars, without having to think about
|
aspects of their grammars, without having to think about
|
||||||
grammaticality. The targeted level of application grammarians
|
grammaticality. The targeted level of application grammarians
|
||||||
is skilled programmer without knowledge linguistics, but with
|
is that of a skilled programmer with
|
||||||
a good knowledge of the target languages. Such a combination of
|
a practical knowledge of the target languages, but without
|
||||||
|
theoretical knowledge about their grammars.
|
||||||
|
Such a combination of
|
||||||
skills is typical of programmers who want to localize
|
skills is typical of programmers who want to localize
|
||||||
software to new languages.
|
software to new languages.
|
||||||
|
|
||||||
The current resource languages are
|
The current resource languages are
|
||||||
-``Dan``ish
|
- ``Dan``ish
|
||||||
-``Eng``lish
|
- ``Eng``lish
|
||||||
-``Fin``nish
|
- ``Fin``nish
|
||||||
-``Fre``nch
|
- ``Fre``nch
|
||||||
-``Ger``man
|
- ``Ger``man
|
||||||
-``Ita``lian
|
- ``Ita``lian
|
||||||
-``Nor``wegian
|
- ``Nor``wegian
|
||||||
-``Rus``sian
|
- ``Rus``sian
|
||||||
-``Spa``nish
|
- ``Spa``nish
|
||||||
-``Swe``dish
|
- ``Swe``dish
|
||||||
|
|
||||||
|
|
||||||
The first three letters (``Dan`` etc) are used in grammar module names.
|
The first three letters (``Dan`` etc) are used in grammar module names.
|
||||||
|
|
||||||
|
|
||||||
To give an example application, consider
|
To give an example application, consider
|
||||||
music playing devices. In the application,
|
music playing devices. In the application,
|
||||||
we may have a semantical category ``Kind``, examples
|
we may have a semantical category ``Kind``, examples
|
||||||
@@ -75,7 +76,7 @@ variation is taken care of by the resource grammar function
|
|||||||
```
|
```
|
||||||
fun AdjCN : AP -> CN -> CN
|
fun AdjCN : AP -> CN -> CN
|
||||||
```
|
```
|
||||||
and the resource grammar implementation of the rule adding properties
|
The resource grammar implementation of the rule adding properties
|
||||||
to kinds is
|
to kinds is
|
||||||
```
|
```
|
||||||
lin PropKind kind prop = AdjCN prop kind
|
lin PropKind kind prop = AdjCN prop kind
|
||||||
@@ -85,8 +86,8 @@ given that
|
|||||||
lincat Prop = AP
|
lincat Prop = AP
|
||||||
lincat Kind = CN
|
lincat Kind = CN
|
||||||
```
|
```
|
||||||
The resource library API is divided into language-specific and language-independent
|
The resource library API is divided into language-specific
|
||||||
parts. To put it roughly,
|
and language-independent parts. To put it roughly,
|
||||||
- the lexicon API is language-specific
|
- the lexicon API is language-specific
|
||||||
- the syntax API is language-independent
|
- the syntax API is language-independent
|
||||||
|
|
||||||
@@ -98,7 +99,8 @@ pick a different linearization of ``Song``,
|
|||||||
```
|
```
|
||||||
But to linearize ``PropKind``, we can use the very same rule as in German.
|
But to linearize ``PropKind``, we can use the very same rule as in German.
|
||||||
The resource function ``AdjCN`` has different implementations in the two
|
The resource function ``AdjCN`` has different implementations in the two
|
||||||
languages, but the application programmer need not care about the difference.
|
languages (e.g. a different word order in French),
|
||||||
|
but the application programmer need not care about the difference.
|
||||||
|
|
||||||
|
|
||||||
===A complete example===
|
===A complete example===
|
||||||
@@ -117,7 +119,8 @@ The abstract syntax defines a "domain ontology":
|
|||||||
American : Property ;
|
American : Property ;
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
The concrete syntax is defined independently of language, by opening
|
The concrete syntax is defined by a functor (parametrized module),
|
||||||
|
independently of language, by opening
|
||||||
two interfaces: the resource ``Grammar`` and an application lexicon.
|
two interfaces: the resource ``Grammar`` and an application lexicon.
|
||||||
```
|
```
|
||||||
incomplete concrete MusicI of Music = open Grammar, MusicLex in {
|
incomplete concrete MusicI of Music = open Grammar, MusicLex in {
|
||||||
@@ -139,13 +142,13 @@ the resource category system ``Cat``.
|
|||||||
american_A : A ;
|
american_A : A ;
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
Each language has its own concrete syntax, which opens the inflectional paradigms
|
Each language has its own concrete syntax, which opens the
|
||||||
module for that language:
|
inflectional paradigms module for that language:
|
||||||
```
|
```
|
||||||
concrete MusicLexGer of MusicLex = CatGer ** open ParadigmsGer in {
|
concrete MusicLexGer of MusicLex = CatGer ** open ParadigmsGer in {
|
||||||
lin
|
lin
|
||||||
song_N = reg2N "Lied" "Lieder" neuter ;
|
song_N = reg2N "Lied" "Lieder" neuter ;
|
||||||
american_A = regA "amerikanisch" ;
|
american_A = regA "Amerikanisch" ;
|
||||||
}
|
}
|
||||||
|
|
||||||
concrete MusicLexFre of MusicLex = CatFre ** open ParadigmsFre in {
|
concrete MusicLexFre of MusicLex = CatFre ** open ParadigmsFre in {
|
||||||
@@ -154,8 +157,8 @@ module for that language:
|
|||||||
american_A = regA "américain" ;
|
american_A = regA "américain" ;
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
The top-level ``Music`` grammars are obtained by instantiating the two interfaces
|
The top-level ``Music`` grammars are obtained by
|
||||||
of ``MusicI``:
|
instantiating the two interfaces of ``MusicI``:
|
||||||
```
|
```
|
||||||
concrete MusicGer of Music = MusicI with
|
concrete MusicGer of Music = MusicI with
|
||||||
(Grammar = GrammarGer),
|
(Grammar = GrammarGer),
|
||||||
@@ -172,8 +175,10 @@ Both of these files can use the same ``path``, defined as
|
|||||||
The ``present`` category contains the compiled resources, restricted to
|
The ``present`` category contains the compiled resources, restricted to
|
||||||
present tense; ``alltenses`` has the full resources.
|
present tense; ``alltenses`` has the full resources.
|
||||||
|
|
||||||
To localize the music player system to a new language, all that is needed is two modules,
|
To localize the music player system to a new language,
|
||||||
one implementing ``MusicLex`` and the other instantiating ``Music``. The latter is
|
all that is needed is two modules,
|
||||||
|
one implementing ``MusicLex`` and the other
|
||||||
|
instantiating ``Music``. The latter is
|
||||||
completely trivial, whereas the former one involves the choice of correct
|
completely trivial, whereas the former one involves the choice of correct
|
||||||
vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
|
vocabulary and inflectional paradigms. For instance, Finnish is added as follows:
|
||||||
```
|
```
|
||||||
@@ -222,8 +227,8 @@ gives its own definition of this function:
|
|||||||
===Parsing with resource grammars?===
|
===Parsing with resource grammars?===
|
||||||
|
|
||||||
The intended use of the resource grammar is as a library for writing
|
The intended use of the resource grammar is as a library for writing
|
||||||
application grammars. It is not designed for e.g. parsing newspaper text. There
|
application grammars. It is not designed for parsing e.g. newspaper text. There
|
||||||
are several reasons why this is not so practical:
|
are several reasons why this is not practical:
|
||||||
- Efficiency: the resource grammar uses complex data structures, in
|
- Efficiency: the resource grammar uses complex data structures, in
|
||||||
particular, discontinuous constituents, which make parsing slow and the
|
particular, discontinuous constituents, which make parsing slow and the
|
||||||
parser size huge.
|
parser size huge.
|
||||||
@@ -245,9 +250,9 @@ details such as inflection, agreement, and word order.
|
|||||||
It is for the same reasons that resource grammars are not adequate for translation.
|
It is for the same reasons that resource grammars are not adequate for translation.
|
||||||
That the syntax API is implemented for different languages of course makes
|
That the syntax API is implemented for different languages of course makes
|
||||||
it possible to translate via it - but there is no guarantee of translation
|
it possible to translate via it - but there is no guarantee of translation
|
||||||
equivalence. Of course, the use of parametrized implementations such as ``MusicI``
|
equivalence. Of course, the use of functor implementations such as ``MusicI``
|
||||||
above only extends to those cases where the syntax API does give translation
|
above only extends to those cases where the syntax API does give translation
|
||||||
equivalence - but this must be seen as a limiting case, and real applications
|
equivalence - but this must be seen as a limiting case, and bigger applications
|
||||||
will often use only restricted inheritance of ``MusicI``.
|
will often use only restricted inheritance of ``MusicI``.
|
||||||
|
|
||||||
|
|
||||||
@@ -257,7 +262,8 @@ will often use only restricted inheritance of ``MusicI``.
|
|||||||
===Inflection paradigms===
|
===Inflection paradigms===
|
||||||
|
|
||||||
Inflection paradigms are defined separately for each language //L//
|
Inflection paradigms are defined separately for each language //L//
|
||||||
in the module ``Paradigms``//L//. To test them, the command ``cc`` (= ``compute_concrete``)
|
in the module ``Paradigms``//L//. To test them, the command
|
||||||
|
``cc`` (= ``compute_concrete``)
|
||||||
can be used:
|
can be used:
|
||||||
```
|
```
|
||||||
> i -retain german/ParadigmsGer.gf
|
> i -retain german/ParadigmsGer.gf
|
||||||
@@ -292,13 +298,14 @@ For the sake of convenience, every language implements these four paradigms:
|
|||||||
It is often possible to initialize a lexicon by just using these functions,
|
It is often possible to initialize a lexicon by just using these functions,
|
||||||
and later revise it by using the more involved paradigms. For instance, in
|
and later revise it by using the more involved paradigms. For instance, in
|
||||||
German we cannot use ``regN "Lied"`` for ``Song``, because the result would be a
|
German we cannot use ``regN "Lied"`` for ``Song``, because the result would be a
|
||||||
Masculine noun with the plural form ``"Liede"``. The individual ``Paradigms`` modules
|
Masculine noun with the plural form ``"Liede"``.
|
||||||
|
The individual ``Paradigms`` modules
|
||||||
tell what cases are covered by the regular heuristics.
|
tell what cases are covered by the regular heuristics.
|
||||||
|
|
||||||
As a limiting case, one could even initialize the lexicon for a new language
|
As a limiting case, one could even initialize the lexicon for a new language
|
||||||
by copying the English (or some other already existing) lexicon. This will
|
by copying the English (or some other already existing) lexicon. This would
|
||||||
produce language with correct grammar but with content words directly borrowed from
|
produce language with correct grammar but with content words directly borrowed from
|
||||||
English.
|
English - maybe not so strange in certain technical domains.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@@ -311,14 +318,16 @@ a group of one or more related categories. For instance, the module
|
|||||||
Thus the proper place to find out how nouns are modified with adjectives
|
Thus the proper place to find out how nouns are modified with adjectives
|
||||||
is ``Noun``, because the result of the construction is again a common noun.
|
is ``Noun``, because the result of the construction is again a common noun.
|
||||||
|
|
||||||
Browsing the libraries is helped by the gfdoc-generated HTML pages.
|
Browsing the libraries is helped by the gfdoc-generated HTML pages,
|
||||||
|
whose LaTeX versions are included in the present document.
|
||||||
However, this is still not easy, and the most efficient way is
|
However, this is still not easy, and the most efficient way is
|
||||||
probably to use the parser.
|
probably to use the parser.
|
||||||
Even though parsing is not an intended end-user application
|
Even though parsing is not an intended end-user application
|
||||||
of resource grammars, it is a useful technique for application grammarians
|
of resource grammars, it is a useful technique for application grammarians
|
||||||
to browse the library. To find out what resource function does some
|
to browse the library. To find out which resource function implements
|
||||||
particular job, you can just parse a string that exemplifies this job. For
|
a particular structure, one can just parse a string that exemplifies this
|
||||||
instance, to find out how sentences are built using transitive verbs, write
|
structure. For instance, to find out how sentences are built using
|
||||||
|
transitive verbs, write
|
||||||
```
|
```
|
||||||
> i english/LangEng.gf
|
> i english/LangEng.gf
|
||||||
|
|
||||||
@@ -381,8 +390,8 @@ However, the technique of example-based grammar writing has some limitations:
|
|||||||
it may not be the intended one. The other parses are shown in a comment, from
|
it may not be the intended one. The other parses are shown in a comment, from
|
||||||
where they must/can be picked manually.
|
where they must/can be picked manually.
|
||||||
- Lexicality. The arguments of a function must be atomic identifiers, and are thus
|
- Lexicality. The arguments of a function must be atomic identifiers, and are thus
|
||||||
not available for categories that have no lexical items. For instance, the ``PropKind``
|
not available for categories that have no lexical items.
|
||||||
rule above gives the result
|
For instance, the ``PropKind`` rule above gives the result
|
||||||
```
|
```
|
||||||
lin
|
lin
|
||||||
PropKind car_N old_A = AdjCN (UseN car_N) (PositA old_A) ;
|
PropKind car_N old_A = AdjCN (UseN car_N) (PositA old_A) ;
|
||||||
@@ -400,8 +409,9 @@ and then use this lexicon instead of the standard one included in ``Lang``.
|
|||||||
|
|
||||||
===Special-purpose APIs===
|
===Special-purpose APIs===
|
||||||
|
|
||||||
To give an analogy with a well-known typesetting program, GF can be compared
|
To give an analogy with the well-known typesetting software, GF can be compared
|
||||||
with TeX and the resource grammar library with LaTeX. As TeX frees the author
|
with TeX and the resource grammar library with LaTeX.
|
||||||
|
Just as TeX frees the author
|
||||||
from thinking about low-level problems of page layout, so GF frees the grammarian
|
from thinking about low-level problems of page layout, so GF frees the grammarian
|
||||||
from writing parsing and generation algorithms. But quite a lot of knowledge of
|
from writing parsing and generation algorithms. But quite a lot of knowledge of
|
||||||
//how// to write grammars is still needed, and the resource grammar library helps
|
//how// to write grammars is still needed, and the resource grammar library helps
|
||||||
@@ -436,12 +446,13 @@ The implementation of this module is the functor ``PredicationI``:
|
|||||||
Of course, ``Predication`` can be opened together with ``Grammar``, but using
|
Of course, ``Predication`` can be opened together with ``Grammar``, but using
|
||||||
the resulting grammar for parsing can be frustrating, since having both
|
the resulting grammar for parsing can be frustrating, since having both
|
||||||
ways of building clauses simultaneously available will produce spurious
|
ways of building clauses simultaneously available will produce spurious
|
||||||
ambiguities. Using ``Predication`` without ``Verb`` for parsing is a better idea,
|
ambiguities. But using just ``Predication`` without ``Verb``
|
||||||
since parsing is also made more efficient without rules for the ``VP`` category.
|
for parsing is a good idea,
|
||||||
|
since parsing is more efficient without rules producing verb phrases.
|
||||||
|
|
||||||
The use of special-purpose APIs is to some extent just an alternative
|
The use of special-purpose APIs is to some extent just an alternative
|
||||||
to grammar writing by parsing, and its importance may decrease as parsing
|
to grammar writing by parsing, and its importance may decrease as parsing
|
||||||
with resource grammars gets more efficient.
|
with resource grammars becomes more practical.
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@@ -556,9 +567,11 @@ many constructors:
|
|||||||
The linguistic phenomena mostly discussed in both traditional grammars and modern
|
The linguistic phenomena mostly discussed in both traditional grammars and modern
|
||||||
syntax belong to the level of Clauses, that is, lines 9-13, and occasionally
|
syntax belong to the level of Clauses, that is, lines 9-13, and occasionally
|
||||||
to Sentences, lines 5-13. At this level, the major categories are
|
to Sentences, lines 5-13. At this level, the major categories are
|
||||||
``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically consists of just an
|
``NP`` (Noun Phrase) and ``VP`` (Verb Phrase). A Clause typically
|
||||||
``NP`` and a ``VP``. The internal structure of both ``NP`` and ``VP`` can be very complex,
|
consists of just an ``NP`` and a ``VP``.
|
||||||
and these categories are mutually recursive: not only can a ``VP`` contain an ``NP``,
|
The internal structure of both ``NP`` and ``VP`` can be very complex,
|
||||||
|
and these categories are mutually recursive: not only can a ``VP``
|
||||||
|
contain an ``NP``,
|
||||||
```
|
```
|
||||||
[VP loves [NP Mary]]
|
[VP loves [NP Mary]]
|
||||||
```
|
```
|
||||||
@@ -588,7 +601,8 @@ The most frequent ways are
|
|||||||
|
|
||||||
|
|
||||||
**Verb**.
|
**Verb**.
|
||||||
How to construct VPs. The main mechanism is verbs with their arguments, for instance,
|
How to construct VPs. The main mechanism is verbs with their arguments,
|
||||||
|
for instance,
|
||||||
- one-place verbs: "walks"
|
- one-place verbs: "walks"
|
||||||
- two-place verbs: "loves Mary"
|
- two-place verbs: "loves Mary"
|
||||||
- three-place verbs: "gives her a kiss"
|
- three-place verbs: "gives her a kiss"
|
||||||
@@ -613,12 +627,13 @@ How to constuct ``AP``s. The main ways are
|
|||||||
**Adverb**.
|
**Adverb**.
|
||||||
How to construct ``Adv``s. The main ways are
|
How to construct ``Adv``s. The main ways are
|
||||||
- from adjectives: "slowly"
|
- from adjectives: "slowly"
|
||||||
|
- as prepositional phrases: "in the car"
|
||||||
|
|
||||||
|
|
||||||
===Modules and their names===
|
===Modules and their names===
|
||||||
|
|
||||||
The resource modules are named after the kind of phrases that are constructed in them,
|
The resource modules are named after the kind of
|
||||||
|
phrases that are constructed in them,
|
||||||
and they can be roughly classified by the "level" or "size" of expressions that are
|
and they can be roughly classified by the "level" or "size" of expressions that are
|
||||||
formed in them:
|
formed in them:
|
||||||
- Larger than sentence: ``Text``, ``Phrase``
|
- Larger than sentence: ``Text``, ``Phrase``
|
||||||
@@ -631,7 +646,8 @@ Because of mutual recursion such as in embedded sentences, this classification i
|
|||||||
not a complete order. However, no mutual dependence is needed between the
|
not a complete order. However, no mutual dependence is needed between the
|
||||||
modules in a formal sense - they can all be compiled separately. This is due
|
modules in a formal sense - they can all be compiled separately. This is due
|
||||||
to the module ``Cat``, which defines the type system common to the other modules.
|
to the module ``Cat``, which defines the type system common to the other modules.
|
||||||
For instance, the types ``NP`` and ``VP`` are defined in ``Cat``, and the module ``Verb`` only
|
For instance, the types ``NP`` and ``VP`` are defined in ``Cat``,
|
||||||
|
and the module ``Verb`` only
|
||||||
needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement
|
needs to know what is given in ``Cat``, not what is given in ``Noun``. To implement
|
||||||
a rule such as
|
a rule such as
|
||||||
```
|
```
|
||||||
@@ -665,7 +681,8 @@ The module ``Idiom`` is a collection of idiomatic structures whose
|
|||||||
implementation is very language-dependent. An example is existential
|
implementation is very language-dependent. An example is existential
|
||||||
structures ("there is", "es gibt", "il y a", etc).
|
structures ("there is", "es gibt", "il y a", etc).
|
||||||
|
|
||||||
The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of ca. 350 content words:
|
The module ``Lang`` combines ``Grammar`` with a ``Lexicon`` of
|
||||||
|
ca. 350 content words:
|
||||||
```
|
```
|
||||||
abstract Lang = Grammar, Lexicon
|
abstract Lang = Grammar, Lexicon
|
||||||
```
|
```
|
||||||
@@ -693,18 +710,22 @@ rules. The top level of each languages looks as follows (with English as example
|
|||||||
abstract English = Grammar, ExtraEngAbs, DictEngAbs
|
abstract English = Grammar, ExtraEngAbs, DictEngAbs
|
||||||
```
|
```
|
||||||
where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
|
where ``ExtraEngAbs`` is a collection of syntactic structures specific to English,
|
||||||
and ``DictEngAbs`` is an English dictionary (at the moment, it consists of ``IrregEngAbs``,
|
and ``DictEngAbs`` is an English dictionary
|
||||||
|
(at the moment, it consists of ``IrregEngAbs``,
|
||||||
the irregular verbs of English). Each of these language-specific grammars has
|
the irregular verbs of English). Each of these language-specific grammars has
|
||||||
the potential to grow into a full-scale grammar of the language. These grammars
|
the potential to grow into a full-scale grammar of the language. These grammars
|
||||||
can also be used as libraries, but the possibility of using functors is lost.
|
can also be used as libraries, but the possibility of using functors is lost.
|
||||||
|
|
||||||
To give a better overview of language-specific structures, modules like ``ExtraEngAbs``
|
To give a better overview of language-specific structures,
|
||||||
are built from a language-independent module ``ExtraAbs`` by restricted inheritance:
|
modules like ``ExtraEngAbs``
|
||||||
|
are built from a language-independent module ``ExtraAbs``
|
||||||
|
by restricted inheritance:
|
||||||
```
|
```
|
||||||
abstract ExtraEngAbs = Extra [f,g,...]
|
abstract ExtraEngAbs = Extra [f,g,...]
|
||||||
```
|
```
|
||||||
Thus any category and function in ``Extra`` may be shared by a subset of all
|
Thus any category and function in ``Extra`` may be shared by a subset of all
|
||||||
languages. One can see this set-up as a matrix, which tells what ``Extra`` structures
|
languages. One can see this set-up as a matrix, which tells
|
||||||
|
what ``Extra`` structures
|
||||||
are implemented in what languages. For the common API in ``Grammar``, the matrix
|
are implemented in what languages. For the common API in ``Grammar``, the matrix
|
||||||
is filled with 1's (everything is implemented in every language).
|
is filled with 1's (everything is implemented in every language).
|
||||||
|
|
||||||
@@ -735,7 +756,6 @@ has only been exploited in a very small scale so far.
|
|||||||
%!include: ../lib/resource-1.0/abstract/Idiom.txt
|
%!include: ../lib/resource-1.0/abstract/Idiom.txt
|
||||||
%!include: ../lib/resource-1.0/abstract/Noun.txt
|
%!include: ../lib/resource-1.0/abstract/Noun.txt
|
||||||
%!include: ../lib/resource-1.0/abstract/Numeral.txt
|
%!include: ../lib/resource-1.0/abstract/Numeral.txt
|
||||||
%!include: ../lib/resource-1.0/abstract/OldLexicon.txt
|
|
||||||
%!include: ../lib/resource-1.0/abstract/Phrase.txt
|
%!include: ../lib/resource-1.0/abstract/Phrase.txt
|
||||||
%!include: ../lib/resource-1.0/abstract/Question.txt
|
%!include: ../lib/resource-1.0/abstract/Question.txt
|
||||||
%!include: ../lib/resource-1.0/abstract/Relative.txt
|
%!include: ../lib/resource-1.0/abstract/Relative.txt
|
||||||
|
|||||||