From dbb0f3f3e464044aef1de3abfc1286569ea6543f Mon Sep 17 00:00:00 2001
From: bjorn <bjorn@bringert.net>
Date: Mon, 15 Sep 2008 12:38:37 +0000
Subject: [PATCH] Temporary fix for the grave accent a encoding problem: change
 compactPrint to id.

The problem is that lower case a with a grave accent (U+00E0) is encoded in
UTF-8 as the two bytes \195\160. Unicode character \160 is non-breaking space,
which isSpace treats as whitespace, so Haskell's words function will break a
UTF-8 encoded string at this byte and drop it.
String literals in the .gfo file are UTF-8 encoded in generateModuleCode,
just before the call to prGrammar (which uses compactPrint, which used words).
The real solution would be to pretty-print the grammar to Unicode, and then
encode as UTF-8. The problem with that is Latin-1 identifiers. They are now
kept in Latin-1 in the .gfo file, since Alex can't handle Unicode.
The real solution to that would be to fix Alex to handle Unicode, but
that is non-trivial. GHC internally uses a very hacky .x file to be
able to lex UTF-8 source files.
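
A minimal sketch of the failure, for illustration only (not part of this
patch). encodeUtf8 is a hypothetical helper written for this example; it is
assumed to store one byte per Char, like the encoded string literals described
above. The names sample, brokenTokens and safeTokens are likewise made up here.

```haskell
import Data.Char (chr, ord)

-- Hypothetical helper: encode Unicode code points as UTF-8,
-- representing each resulting byte as one Char.
encodeUtf8 :: String -> String
encodeUtf8 = concatMap enc
  where
    enc c
      | n < 0x80    = [c]
      | n < 0x800   = map chr [ 0xC0 + n `div` 64
                              , 0x80 + n `mod` 64 ]
      | n < 0x10000 = map chr [ 0xE0 + n `div` 4096
                              , 0x80 + (n `div` 64) `mod` 64
                              , 0x80 + n `mod` 64 ]
      | otherwise   = map chr [ 0xF0 + n `div` 262144
                              , 0x80 + (n `div` 4096) `mod` 64
                              , 0x80 + (n `div` 64) `mod` 64
                              , 0x80 + n `mod` 64 ]
      where n = ord c

-- Two words; the first ends in a with grave accent (U+00E0 = \224).
sample :: String
sample = "voil\224 tout"

-- Encoding first and splitting afterwards: words sees the byte \160
-- as whitespace, so the second byte of the accented letter is lost.
brokenTokens :: [String]
brokenTokens = words (encodeUtf8 sample)

-- Splitting the Unicode string first and encoding each token afterwards
-- keeps the two-byte sequence \195\160 intact.
safeTokens :: [String]
safeTokens = map encodeUtf8 (words sample)
```

Here brokenTokens is ["voil\195", "tout"], while safeTokens is
["voil\195\160", "tout"], which is why any whitespace-based postprocessing
has to happen before the UTF-8 encoding step.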

An alternative solution, which doesn't address the weirdness of using two
different encodings in the same .gfo file as we do now, is to incorporate
compactPrint into the grammar printer, to avoid having to do any postprocessing.
---
 src/GF/Infra/CompactPrint.hs | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/GF/Infra/CompactPrint.hs b/src/GF/Infra/CompactPrint.hs
index 486c9e183..ca2452de6 100644
--- a/src/GF/Infra/CompactPrint.hs
+++ b/src/GF/Infra/CompactPrint.hs
@@ -5,7 +5,8 @@ compactPrint = compactPrintCustom keywordGF (const False)
 
 compactPrintGFCC = compactPrintCustom (const False) keywordGFCC
 
-compactPrintCustom pre post = dps . concat . map (spaceIf pre post) . words 
+-- FIXME: using words is not safe, since this is run on UTF-8 encoded data.
+compactPrintCustom pre post = id -- dps . concat . map (spaceIf pre post) . words 
 
 dps = dropWhile isSpace