Represent identifiers as UTF-8-encoded ByteStrings

This was a fairly simple change thanks to previous work on making the Ident
type abstract and the fact that PGF.CId already uses UTF-8-encoded
ByteStrings.

One potential pitfall is that Data.ByteString.UTF8 uses the same type for
ByteStrings as Data.ByteString. I renamed ident2bs to ident2utf8 and
bsCId to utf8CId, to make it clearer that they work with UTF-8-encoded
ByteStrings.
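
The pitfall above can be illustrated with a minimal sketch. It assumes the
bytestring and utf8-string packages; the Name type and the helper functions
are hypothetical stand-ins for illustration and are not the definitions in
GF.Infra.Ident or PGF.CId.

```haskell
import qualified Data.ByteString       as BS
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.UTF8  as UTF8

-- Hypothetical stand-in for an abstract identifier type (not GF's Ident).
newtype Name = Name BS.ByteString deriving (Eq, Show)

-- Intended constructor: the String is encoded as UTF-8.
nameFromString :: String -> Name
nameFromString = Name . UTF8.fromString

-- The pitfall: this type-checks just as well, because both modules share
-- the same ByteString type, but Char8.pack truncates each Char to 8 bits.
latin1Name :: String -> Name
latin1Name = Name . BC.pack

-- Decode the stored bytes as UTF-8.
nameToString :: Name -> String
nameToString (Name bs) = UTF8.toString bs

-- Number of bytes stored for a name.
nameBytes :: Name -> Int
nameBytes (Name bs) = BS.length bs
```

For the one-character string "\233" (é), nameFromString stores two bytes and
latin1Name stores one, so the two values compare as different even though
the compiler accepts both. Names that say "utf8" make the expected encoding
visible at the call site.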

Since both the compiler input and identifiers are now UTF-8-encoded
ByteStrings, the lexer now creates identifiers without copying any characters.
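
As a rough sketch of why no copying is needed: slicing a strict ByteString
shares the underlying buffer, so an identifier token can be a view into the
input. The code below is not the Alex lexer from Lexer.x; lexIdent, Name and
the byte classification are hypothetical simplifications (every byte >= 0x80
is treated as part of an identifier).

```haskell
import qualified Data.ByteString      as BS
import qualified Data.ByteString.UTF8 as UTF8
import Data.Word (Word8)

-- Hypothetical stand-in for an identifier type holding UTF-8 bytes.
newtype Name = Name BS.ByteString deriving (Eq, Show)

-- Simplified assumption: ASCII letters, digits, '_' and '\'' are
-- identifier bytes, and so is every byte of a multi-byte UTF-8 sequence.
isIdentByte :: Word8 -> Bool
isIdentByte w =
     w >= 0x80
  || (w >= 0x30 && w <= 0x39)   -- 0-9
  || (w >= 0x41 && w <= 0x5A)   -- A-Z
  || (w >= 0x61 && w <= 0x7A)   -- a-z
  || w == 0x5F                  -- '_'
  || w == 0x27                  -- '\''

-- Split a leading identifier off the input. BS.span returns two slices
-- that share the input's buffer, so the token bytes are not copied.
lexIdent :: BS.ByteString -> Maybe (Name, BS.ByteString)
lexIdent input
  | BS.null tok = Nothing
  | otherwise   = Just (Name tok, rest)
  where
    (tok, rest) = BS.span isIdentByte input

-- Decode a name for display.
nameToString :: Name -> String
nameToString (Name bs) = UTF8.toString bs
```

Applied to the UTF-8 bytes of "café = 1", lexIdent yields the name "café"
and the remaining input " = 1". One trade-off of this approach is that each
token keeps the whole input buffer alive for as long as the token is
referenced.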

This patch contains the following changes:

M ./src/compiler/GF/Compile/CheckGrammar.hs -3 +3
M ./src/compiler/GF/Compile/GrammarToPGF.hs -2 +2
M ./src/compiler/GF/Grammar/Binary.hs -5 +1
M ./src/compiler/GF/Grammar/Lexer.x -11 +13
M ./src/compiler/GF/Infra/Ident.hs -19 +36
M ./src/runtime/haskell/PGF.hs -1 +1
M ./src/runtime/haskell/PGF/CId.hs -2 +3
hallgren
2013-11-26 16:12:03 +00:00
parent 9d7fdf7c9a
commit 3f57151cc3
7 changed files with 60 additions and 44 deletions


@@ -22,7 +22,7 @@ module PGF(
CId, mkCId, wildCId,
showCId, readCId,
-- extra
-ppCId, pIdent, bsCId,
+ppCId, pIdent, utf8CId,
-- * Languages
Language,


@@ -3,7 +3,7 @@ module PGF.CId (CId(..),
readCId, showCId,
-- utils
-bsCId, pCId, pIdent, ppCId) where
+utf8CId, pCId, pIdent, ppCId) where
import Control.Monad
import qualified Data.ByteString.Char8 as BS
@@ -24,7 +24,8 @@ wildCId = CId (BS.singleton '_')
mkCId :: String -> CId
mkCId s = CId (UTF8.fromString s)
-bsCId = CId
+-- | Creates an identifier from a UTF-8-encoded 'ByteString'
+utf8CId = CId
-- | Reads an identifier from 'String'. The function returns 'Nothing' if the string is not a valid identifier.
readCId :: String -> Maybe CId