Faster regular expression pattern matching in the grammar compiler.

The sequence operator (x+y) was implemented by splitting the string to be
matched at all positions and trying to match the parts against the two
subpatterns. To reduce the number of splits, we now estimate the minimum and
maximum length of the strings that each subpattern can match, and try only
the splits consistent with those bounds. In common cases where one of the
subpatterns matches strings of a known length, as in (x+"y") or
(x + ("a"|"o"|"u"|"e")+"y"), only one split is tried.
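
A minimal sketch of the idea, using a simplified, hypothetical pattern type
(Pat, bounds, splits and match are illustrative names, not GF's actual Patt
type or the measurePatt function added by this commit). Each pattern gets a
(minimum, maximum) length estimate, and a sequence pattern only tries the
split positions that both estimates allow:

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical, simplified pattern type (an assumption, not GF's Patt).
data Pat
  = PStr String    -- literal string, e.g. "y"
  | PAny           -- matches any string, like a variable x
  | PAlt Pat Pat   -- alternatives (p | q)
  | PSeq Pat Pat   -- sequence (p + q)

-- (minimum length, maximum length); Nothing means "no upper bound".
type Bounds = (Int, Maybe Int)

-- Estimate the lengths of the strings a pattern could match.
bounds :: Pat -> Bounds
bounds (PStr s)   = (n, Just n) where n = length s
bounds PAny       = (0, Nothing)
bounds (PAlt p q) = (min lo1 lo2, max <$> hi1 <*> hi2)
  where (lo1, hi1) = bounds p
        (lo2, hi2) = bounds q
bounds (PSeq p q) = (lo1 + lo2, (+) <$> hi1 <*> hi2)
  where (lo1, hi1) = bounds p
        (lo2, hi2) = bounds q

-- Split positions worth trying when matching (p + q) against a string of
-- length n: the prefix length i must fit p's bounds, and the suffix
-- length (n - i) must fit q's bounds.
splits :: Pat -> Pat -> Int -> [Int]
splits p q n = [lo .. hi]
  where
    (pLo, pHi) = bounds p
    (qLo, qHi) = bounds q
    lo = max pLo (maybe 0 (n -) qHi)
    hi = min (fromMaybe n pHi) (n - qLo)

-- Matcher that only tries the candidate splits for sequences.
match :: Pat -> String -> Bool
match (PStr s)   t = s == t
match PAny       _ = True
match (PAlt p q) t = match p t || match q t
match (PSeq p q) t =
  or [ match p a && match q b
     | i <- splits p q (length t)
     , let (a, b) = splitAt i t ]
```

In this sketch, matching PSeq PAny (PStr "y") against a five-character
string tries the single split at position 4, whereas PSeq PAny PAny still
tries all six positions.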
hallgren
2013-02-27 20:59:43 +00:00
parent 95c4cbb8f5
commit 0feb386691
5 changed files with 50 additions and 10 deletions

@@ -8,7 +8,7 @@ module GF.Compile.Compute.ConcreteNew
import GF.Grammar hiding (Env, VGen, VApp, VRecType)
import GF.Grammar.Lookup(lookupResDefLoc,allParamValues)
import GF.Grammar.Predef(cPredef,cErrorType,cTok,cStr)
-import GF.Grammar.PatternMatch(matchPattern)
+import GF.Grammar.PatternMatch(matchPattern,measurePatt)
import GF.Grammar.Lockfield(unlockRecord,lockLabel,isLockLabel,lockRecType)
import GF.Compile.Compute.Value hiding (Predefined(..))
import GF.Compile.Compute.Predef(predef,predefName,delta)
@@ -320,7 +320,7 @@ valueTable env i cs =
TWild _ -> True
_ -> False
-valueCase (p,t) = do p' <- inlinePattMacro p
+valueCase (p,t) = do p' <- measurePatt # inlinePattMacro p
let pvs = pattVars p'
vt <- value (extend pvs env) t
return (p', \ vs -> Bind $ \ bs -> vt (push' p' bs pvs vs))