This bug was introduced sometime between 2013-08-21 and 2013-11-01 and caused
the function convertTerm in GF.Compile.GeneratePMCFG to encounter a EPatt where
it expected Strs. I fixed it by applying the function getPatts (from the old
partial evaluator) to the pattern.
PGF service requests are stateless and can run in parallel, but some other
requests handled by the server are not and might even change the current
working directory temporarily, and this affects all threads, so it is
important that the PGF service requests access PGF files by absolute paths.
If the C run-time library is compiled and installed on your system, you can now
do 'cabal configure -fc-runtime' to get the following extras:
+ The haskell binding to the C run-time library will be included in the
PGF library (so you can import it in Haskell applications).
Documentation on the new modules will be included when you run
'cabal haddock'.
+ The new command 'pgf-shell', implemented on top of haskell binding to
the C run-time system.
+ Three new commands in the web API: c-parse, c-linearize and
c-translate. Their interfaces are similar to the corresponding commands
without the "c-" prefix, but they should be considered preliminary.
When running a command like
gf -make L_1.gf ... L_n.gf
gf now avoids recreating the target PGF file if it already exists and is
up-to-date.
gf still reads all required .gfo files, so significant additional speed
improvements are still possible. This could be done by reading .gfo files
more lazily...
When running a command like
gf -make -name=T L_1.pgf ... L_n.pgf
gf now checks if T.pgf exists and is up-to-date before reading and computing
the union of the L_i.pgf files.
The name (T) of the target PGF file has to be given explicitly for this to work,
since otherwise the name is not known until the union has been computed.
If the functions for reading PGF files and computing the union were lazier,
this would not be necessary...
This means that the -old-comp and -new-comp flags are not recognized anymore.
The only functional difference is that printnames were still normalized with
the old partial evaluator. Now that is done with the new partial evaluator.
This was a fairly simple change thanks to previous work on making the Ident
type abstract and the fact that PGF.CId already uses UTF-8-encoded
ByteStrings.
One potential pitfall is that Data.ByteString.UTF8 uses the same type for
ByteStrings as Data.ByteString. I renamed ident2bs to ident2utf8 and
bsCId to utf8CId, to make it clearer that they work with UTF-8-encoded
ByteStrings.
Since both the compiler input and identifiers are now UTF-8-encoded
ByteStrings, the lexer now creates identifiers without copying any characters.
**END OF DESCRIPTION***
Place the long patch description above the ***END OF DESCRIPTION*** marker.
The first line of this file will be the patch name.
This patch contains the following changes:
M ./src/compiler/GF/Compile/CheckGrammar.hs -3 +3
M ./src/compiler/GF/Compile/GrammarToPGF.hs -2 +2
M ./src/compiler/GF/Grammar/Binary.hs -5 +1
M ./src/compiler/GF/Grammar/Lexer.x -11 +13
M ./src/compiler/GF/Infra/Ident.hs -19 +36
M ./src/runtime/haskell/PGF.hs -1 +1
M ./src/runtime/haskell/PGF/CId.hs -2 +3
1. The default encoding is changed from Latin-1 to UTF-8.
2. Alternate encodings should be specified as "--# -coding=enc", the old
"flags coding=enc" declarations have no effect but are still checked for
consistency.
3. A transitional warning is generated for files that contain non-ASCII
characters without specifying a character encoding:
"Warning: default encoding has changed from Latin-1 to UTF-8"
4. Conversion to Unicode is now done *before* lexing. This makes it possible
to allow arbitrary Unicode characters in identifiers. But identifiers are
still stored as ByteStrings, so they are limited to Latin-1 characters
for now.
5. Lexer.hs is no longer part of the repository. We now generate the lexer
from Lexer.x with alex>=3. Some workarounds for bugs in alex-3.0 were
needed. These bugs might already be fixed in newer versions of alex, but
we should be compatible with what is shipped in the Haskell Platform.
+ Eliminated vairous ad-hoc coersion functions between specific monads
(IO, Err, IOE, Check) in favor of more general lifting functions
(liftIO, liftErr).
+ Generalized many basic monadic operations from specific monads to
arbitrary monads in the appropriate class (MonadIO and/or ErrorMonad),
thereby completely eliminating the need for lifting functions in lots
of places.
This can be considered a small step forward towards a cleaner
compiler API and more malleable compiler code in general.
1. No temporary files are created.
2. The output of a system command is read lazily, making it feasible to
process large or even infinite output, e.g. the following works as
expected:
? "yes" | ? "head -5" | ps -lextext
The system_pipe (aka "?") command creates a temporary file _tmpi containing
the input of the system command. It *both* appends _tmpi as an extra argument
to the system command line *and* adds an input redirection "< _tmpi". (It
also uses and output redirection "> _tmpo" to captures the output of the
command.)
With this patch, the _tmpi argument is no longer appended to the command line.
This allows system_pipe to work with pure filters, such as the "tr" commands,
but it will no longer work with commands that require an input file name.
(It is possible to use write_file instead...)
TODO: it would also be fairly easy to eliminate the creation of the _tmpi and
_tmpo files altogether.
Add proper type checking of course-of-values tables:
+ Make sure that all subterms have the same type.
+ Resolve overloaded operators.
Note though that the GF book states in C.4.12 that the "course-of-values
table [...] format is not recommended for GF source code, since the
ordering of parameter values is not specified and therefore a
compiler-internal decision."
The CF parser in GF.Grammar.CF assigns function names to the rules, but they
are not always unique, causing rules to be dropped in the follwing CF->GF
conversion. So a pass has been added before the CF->GF conversion, to make
sure that function names are unique.
A comment says "rules have an amazingly easy parser", but the parser looks
like quick hack. It is very sloppy and silently ignores many errors, e.g.
- Explicitly given function names should end with '.', but if the do not, the
last character in the function name is silently dropped.
- Everything following a ';' is silently dropped.
Trailing spaces caused the command line parse to be ambiguous, and
ambiguous parses were rejected by function readCommandLine, causing
the cryptic error message "command not parsed".
The only use of PGF.Tree outside the PGF library was in GF.Command.Commands,
and it was eliminated by using PGF.Expr directly instead.
PGF.Paraphrase still uses PGF.Tree.
This module should not be part of the public PGF library API, and it was only
used in GF.CompileToAPI, so the code was moved there. The module defined
constFuncs and syntaxFuncs, but only syntaxFuncs was used.