Happy parse lex #1

Merged
msydneyslaga merged 7 commits from happy-parse-lex into main 2023-11-20 14:09:33 -07:00
15 changed files with 792 additions and 430 deletions


@@ -0,0 +1,170 @@
Lexing, Parsing, and Layouts
============================
The C-style languages of my previous experience have all had quite trivial
lexical analysis stages, peaking in complexity when I streamed tokens lazily in
C. Tokenising a C-style language is simple to describe: ignore all whitespace
and emit a token for everything you recognise. If you don't recognise
something, check whether it's a literal or an identifier. Should it be neither,
return an error.
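To make that description concrete, here is a minimal sketch of such a
tokeniser in Haskell (the token type, keyword list, and symbol set are invented
purely for illustration):

.. code-block:: haskell

    import Data.Char (isAlpha, isDigit, isSpace)

    data Token = TKeyword String | TIdent String | TLitInt Int | TSym Char
        deriving Show

    keywords :: [String]
    keywords = ["if", "else", "while", "return"]

    tokenise :: String -> Either String [Token]
    tokenise [] = Right []
    tokenise (c:cs)
        | isSpace c = tokenise cs                            -- ignore all whitespace
        | c `elem` "(){};,=+-*/" = (TSym c :) <$> tokenise cs
        | isDigit c =                                        -- literal
            let (ds, rest) = span isDigit (c:cs)
            in (TLitInt (read ds) :) <$> tokenise rest
        | isAlpha c =                                        -- keyword or identifier
            let (w, rest) = span isAlpha (c:cs)
                t | w `elem` keywords = TKeyword w
                  | otherwise         = TIdent w
            in (t :) <$> tokenise rest
        | otherwise = Left ("unrecognised character: " <> [c])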
On paper, both lexing and parsing a Haskell-like language seem to pose a few
greater challenges. Listed by ascending intimidation factor, some of the
potential roadblocks on my mind before making an attempt were:
* Operators; Haskell has not only user-defined infix operators, but user-defined
precedence levels and associativities. I recall using an algorithm that looked
infix, prefix, postfix, and even mixfix operators up in a global table in order
to call their appropriate parser (with each operator's precedence also stored
in the table). I never modified the table at runtime; however, doing so could
be a very nice solution for Haskell.
* Context-sensitive keywords; Haskell allows for some words to be used as identifiers in
appropriate contexts, such as :code:`family`, :code:`role`, :code:`as`.
A note_ found in `GHC's lexer
<https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L1133>`_
suggests that such keywords are only considered in contexts where their use is
relevant, e.g. :code:`family` and :code:`role` in type declarations,
:code:`as` after :code:`case`; :code:`if`, :code:`then`, and :code:`else` in
expressions, etc.
* Whitespace sensitivity; While I was comfortable with the idea of a system
similar to Python's INDENT/DEDENT tokens, Haskell seemed to use whitespace to
section code in a way that *felt* different.
.. _note: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/coding-style#2-using-notes
After a bit of thought and research, whitespace sensitivity in the form of
*layouts*, as both Haskell and I will refer to it, is easily the scariest thing
on this list -- however, it is achievable!
A Lexical Primer: Python
************************
We will compare and contrast with Python's lexical analysis. Much to my dismay,
Python uses newlines and indentation to separate statements and resolve scope
instead of the traditional semicolons and braces found in C-style languages (we
may generally refer to these C-style languages as *explicitly-sectioned*).
Internally during tokenisation, when the Python lexer begins a new line, it
compares the indentation of the new line with that of the previous line and
applies the following rules (sketched in code after the list):
1. If the new line has greater indentation than the previous line, insert an
INDENT token and push the new line's indentation level onto the indentation
stack (the stack is initialised with an indentation level of zero).
2. If the new line has lesser indentation than the previous line, pop the stack
until the top of the stack is less than or equal to the new line's indentation
level, inserting a DEDENT token for each level popped (the new level must
match one already on the stack).
3. If the indentation is equal, insert a NEWLINE token to terminate the previous
line, and leave it at that!
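As a rough sketch of that bookkeeping (written in Haskell, since that is the
language used throughout this project; the token type and error messages are
invented purely for illustration):

.. code-block:: haskell

    data PyToken = Indent | Dedent | Newline
        deriving Show

    -- Compare a new line's indentation against the stack (initially [0]) and
    -- emit the corresponding INDENT/DEDENT/NEWLINE tokens.
    offside :: Int -> [Int] -> Either String ([PyToken], [Int])
    offside col stack@(top : _)
        | col > top = Right ([Indent], col : stack)     -- rule 1
        | col < top = case span (> col) stack of        -- rule 2
            (popped, stack'@(top' : _))
                | top' == col -> Right (Dedent <$ popped, stack')
            _ -> Left "unindent does not match an outer indentation level"
        | otherwise = Right ([Newline], stack)          -- rule 3
    offside _ [] = Left "empty indentation stack"

For example, :code:`offside 0 [8, 4, 0]` yields two DEDENT tokens and leaves
the stack at :code:`[0]`.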
Parsing Python with the INDENT, DEDENT, and NEWLINE tokens is identical to
parsing a language with braces and semicolons. This solution is pretty in line
with Python's philosophy of the "one correct answer" (TODO: this needs a
source). In developing our *layout* rules, we will follow the same pattern of
translating the whitespace-sensitive source language to an explicitly-sectioned
language.
But What About Haskell?
***********************
We saw that Python, the most notable example of an implicitly sectioned
language, is pretty simple to lex. Why then am I so afraid of Haskell's layouts?
To be frank, I'm far less scared after asking myself this -- however there are
certainly some new complexities that Python needn't concern. Haskell has
implicit line *continuation*: forms written over multiple lines; indentation
styles often seen in Haskell are somewhat esoteric compared to Python's
"s/[{};]//".
.. code-block:: haskell

    -- line continuation
    something = this is a
        single expression

    -- an extremely common style found in haskell
    data Python = Users
        { are :: Crying
        , right :: About
        , now :: Sorry
        }

    -- another formatting oddity
    -- note that this is not line continuation!
    -- `look at`, `this`, and `alignment`
    -- are all separate expressions!
    anotherThing = do look at
                      this
                      alignment
But enough fear; let's actually think about implementation. Firstly, some
formality: what do we mean when we say *layout*? We will define layout as the
rules we apply to an implicitly-sectioned language in order to yield one that is
explicitly-sectioned. We will also define the indentation of a lexeme as the
column number of its first character.
Thankfully for us, our entry point is quite clear; layouts only appear after a
select few keywords (with a minor exception; TODO: elaborate), those being
:code:`let` (followed by supercombinators), :code:`where` (followed by
supercombinators), :code:`do` (followed by expressions), and :code:`of`
(followed by alternatives) (TODO: all of these terms need linked glossary
entries). Under this assumption, we give the following rule:
1. If a :code:`let`, :code:`where`, :code:`do`, or :code:`of` keyword is not
followed by the lexeme :code:`{`, the token :math:`\{n\}` is inserted after
the keyword, where :math:`n` is the indentation of the next lexeme if there
is one, or 0 if the end of file has been reached.
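For a tiny example (the column arithmetic is just for illustration), rule 1
fires at the :code:`let` below, whose next lexeme :code:`y` sits at column 9,
so :math:`\{9\}` is inserted:

.. code-block:: haskell

    f = let y = 1 in y
    -- after rule 1 (writing the special token inline):
    -- f = let {9} y = 1 in y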
Henceforth :math:`\{n\}` will denote the token representing the beginning of a
layout; it is similar in function to an opening brace, but it stores the
indentation level for subsequent lines to compare with. We must also introduce
an additional input to the function handling layouts. Such a function obviously
requires the input string, but a helpful book-keeping tool which we will make
good use of is a stack of "layout contexts" describing the current cascade of
layouts. Each element is either :code:`NoLayout`, indicating an explicit layout
(i.e. the programmer inserted semicolons and braces herself), or a
:code:`Layout n`, where :code:`n` is a non-negative integer representing the
indentation level of the enclosing context.
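In Haskell, the context stack is simply a list of the following type; this is
the same definition the lexer itself uses (see :code:`src/Core/Lex.x`):

.. code-block:: haskell

    data LayoutContext = Layout Int   -- implicit layout at the given indentation
                       | NoLayout     -- explicit braces; insertion is suspended
                       deriving Show

Annotating a small example with this stack: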
.. code-block:: haskell

    f x              -- layout stack: []
      = let          -- layout keyword; remember indentation of next token
             y = w * w  -- layout stack: [Layout 10]
             w = x + x
        in do        -- layout keyword; next token is a brace!
          {          -- layout stack: [NoLayout]
          pure }
In the code seen above, notice that :code:`let` allows for multiple definitions,
separated by a newline. We accommodate this with a token :math:`\langle n
\rangle`, which complements :math:`\{n\}`: it marks the start of a new line at
indentation :math:`n`, and comparing :math:`n` against the enclosing layout
decides whether that line is a new item in the layout, the end of the layout,
or a continuation. We give a rule to describe the source of such a token:
2. When the first lexeme on a line is preceded by only whitespace, a
:math:`\langle n \rangle` token is inserted before the lexeme, where
:math:`n` is the indentation of the lexeme, provided that it is not, as a
consequence of rule 1 or rule 3 (as we'll see), already preceded by
:math:`\{n\}`.
Lastly, to handle the top level, we open an implicit layout at the indentation
of the first lexeme:
3. If the first lexeme of a module is not :code:`{` or :code:`module`, then it
is preceded by :math:`\{n\}`, where :math:`n` is the indentation of the lexeme.
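To tie rules 1, 2, and 3 together, here is a rough sketch of the layout pass as
a pure function over a token stream that has already been annotated with
:math:`\{n\}` and :math:`\langle n \rangle` tokens. It is loosely modelled on
the function :math:`L` from the Haskell Report, but the token type is invented
for illustration and many details (explicit braces opened by the programmer,
error handling, :code:`in` closing a :code:`let`, and so on) are elided:

.. code-block:: haskell

    -- the annotated stream produced by rules 1-3
    data LToken = Open Int       -- {n}, inserted by rules 1 and 3
                | Line Int       -- <n>, inserted by rule 2
                | Lexeme String  -- any ordinary lexeme
                deriving Show

    data LayoutContext = Layout Int | NoLayout
        deriving Show

    -- translate the annotated stream into explicit braces and semicolons
    layout :: [LToken] -> [LayoutContext] -> [String]
    layout (Line n : ts) (Layout m : ms)
        | n == m = ";" : layout ts (Layout m : ms)   -- new item in the layout
        | n <  m = "}" : layout (Line n : ts) ms     -- close the layout and retry
    layout (Line _ : ts)   ms = layout ts ms         -- a continuation line
    layout (Open n : ts)   ms = "{" : layout ts (Layout n : ms)
    layout (Lexeme s : ts) ms = s : layout ts ms
    layout [] (Layout _ : ms) = "}" : layout [] ms   -- close anything left at EOF
    layout [] _               = []

Feeding this the annotated stream for the earlier :code:`let` example
reproduces the explicitly-sectioned form :code:`let { y = w * w ; w = x + x }
in ...` that a brace-and-semicolon grammar expects.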
This set of rules is adequate to address our basic concerns about line
continuations and layout lists. For a more pedantic description of the layout
system, see `chapter 10
<https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_ of the
2010 Haskell Report, which I **heavily** referenced here.
References
----------
* `Python's lexical analysis
<https://docs.python.org/3/reference/lexical_analysis.html>`_
* `Haskell Syntax Reference
<https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_


@@ -1,3 +0,0 @@
Parser Combinators
==================

docs/src/glossary.rst Normal file

@@ -0,0 +1,15 @@
Glossary
========
Haskell and Haskell culture are infamous for using scary mathematical terms for
simple ideas. Please excuse us; it's really fun :3.
.. glossary::
supercombinator
An expression with no free variables. For most purposes, just think of a
top-level definition.
case alternative
A possible match in a case expression (TODO: example)


@@ -6,6 +6,12 @@ Contents
.. toctree::
:maxdepth: 2
:caption: Index
glossary.rst
.. toctree::
:maxdepth: 1
:caption: Commentary
:glob:


@@ -22,16 +22,22 @@ library
, TIM
other-modules: Data.Heap
, Data.Pretty
, Control.Parser
, Core.Syntax
, Core.Parse
, Core.TH
, Core.Examples
, Core.Lex
build-tool-depends: happy:happy, alex:alex
-- other-extensions:
build-depends: base ^>=4.18.0.0
, containers
, microlens
, microlens-th
, template-haskell
-- required for happy
, array
hs-source-dirs: src
default-language: GHC2021

src/Compiler/RLPC.hs Normal file

@@ -0,0 +1,10 @@
{-# LANGUAGE GeneralisedNewtypeDeriving #-}
module Compiler.RLPC
( RLPC(..)
)
where
-- TODO: fancy errors
newtype RLPC a = RLPC { runRLPC :: Either String a }
deriving (Functor, Applicative, Monad)


@@ -1,101 +0,0 @@
{-|
Module : Control.Parser
Description : Parser combinators
This module implements an interface for parser *types*, used in lexical analysis
and parsing. For the implementation of the rlp language's parser, see 'Parse'.
-}
{-# LANGUAGE GeneralisedNewtypeDeriving #-}
{-# LANGUAGE BlockArguments, LambdaCase #-}
module Control.Parser
( ParserT
, runParserT
, satisfy
, char
, spaces
, surround
, string
, match
, termMany
, sepSome
-- * Control.Applicative re-exports
, (<|>)
, many
, some
, empty
)
where
----------------------------------------------------------------------------------
import Control.Applicative
import Control.Arrow ((***))
import Control.Monad
import Data.Char
----------------------------------------------------------------------------------
newtype ParserT i m o = ParserT { runParserT :: i -> m (i, o) }
deriving (Functor)
instance (Monad m) => Applicative (ParserT i m) where
pure a = ParserT \i -> pure (i, a)
m <*> k = ParserT \i -> do
(i',f) <- runParserT m i
fmap (id *** f) $ runParserT k i'
instance (MonadPlus m) => Alternative (ParserT i m) where
empty = ParserT $ const empty
ParserT m <|> ParserT k = ParserT $ \i ->
m i <|> k i
instance (MonadPlus m) => MonadPlus (ParserT i m)
instance (Monad m) => Monad (ParserT i m) where
m >>= k = ParserT $ \i -> do
(i',a) <- runParserT m i
runParserT (k a) i'
instance (MonadFail m) => MonadFail (ParserT i m) where
fail s = ParserT $ \i -> fail s
----------------------------------------------------------------------------------
-- TODO: generalise to non-lists
satisfy :: (MonadPlus m) => (a -> Bool) -> ParserT [a] m a
satisfy p = ParserT $ \case
(x:xs) | p x -> pure (xs,x)
_ -> empty
match :: (MonadPlus m) => (a -> Maybe b) -> ParserT [a] m b
match f = ParserT $ \case
(x:xs) -> case f x of
Just b -> pure (xs,b)
Nothing -> empty
[] -> empty
termMany :: (MonadPlus m) => ParserT i m t -> ParserT i m o -> ParserT i m [o]
termMany t a = many (a <* t)
sepSome :: (MonadPlus m) => ParserT i m t -> ParserT i m o -> ParserT i m [o]
sepSome s a = (:) <$> a <*> many (s *> a)
char :: (MonadPlus m, Eq a) => a -> ParserT [a] m a
char c = satisfy (==c)
string :: (MonadPlus m, Eq a) => [a] -> ParserT [a] m [a]
string s = sequenceA $ char <$> s
----------------------------------------------------------------------------------
surround :: (MonadPlus m)
=> ParserT i m l
-> ParserT i m r
-> ParserT i m c
-> ParserT i m c
surround l r c = l *> c <* r
spaces :: (MonadPlus m) => ParserT String m Int
spaces = length <$> many (satisfy (==' '))

src/Core/Examples.hs Normal file

@@ -0,0 +1,22 @@
{-# LANGUAGE QuasiQuotes #-}
module Core.Examples where
----------------------------------------------------------------------------------
import Core.Syntax
import Core.TH
----------------------------------------------------------------------------------
{-
letrecExample :: Program
letrecExample = [core|
pair x y f = f x y;
fst p = p k;
snd p = p k1;
f x y = letrec
{ a = pair x b;
; b = pair y a
} in fst (snd (snd (snd a)));
main = f 3 4;
|]
-}


@@ -1,141 +0,0 @@
{-|
Module : Core.Lex
Description : Core language lexer
-}
module Core.Lex
( CoreToken(..)
, Result(..)
, lexCore
)
where
----------------------------------------------------------------------------------
import Control.Parser
import Control.Applicative
import Control.Monad
import Data.Char
import Data.Functor
import Core.Syntax (Name)
----------------------------------------------------------------------------------
type CoreLexer = ParserT String Result
data Result a = Success a
| Error String Int Int
deriving (Show)
-- TODO: whitespace-sensitive layout
data CoreToken = TokLitInt Int
| TokEquals
| TokLBrace
| TokRBrace
| TokSemicolon
| TokLParen
| TokRParen
| TokLambda
| TokArrow
| TokCase
| TokOf
| TokLet
| TokLetRec
| TokIn
| TokCName Name
| TokName Name
deriving (Show, Eq)
instance Functor Result where
fmap f (Success a) = Success (f a)
fmap _ (Error s l c) = Error s l c
instance Foldable Result where
foldr f z (Success a) = a `f` z
foldr _ z (Error _ _ _) = z
instance Traversable Result where
traverse k (Success a) = fmap Success (k a)
traverse _ (Error s l c) = pure $ Error s l c
instance Applicative Result where
pure = Success
liftA2 f (Success a) (Success b) = Success $ f a b
liftA2 _ (Error s l c) _ = Error s l c
liftA2 _ _ (Error s l c) = Error s l c
instance Alternative Result where
empty = Error "some error! this is a temporary system lol" 0 0
(Success a) <|> _ = Success a
_ <|> b = b
instance Monad Result where
Success a >>= k = k a
Error s l c >>= _ = Error s l c
instance MonadPlus Result
instance MonadFail Result where
fail s = Error s 0 0
----------------------------------------------------------------------------------
lexCore :: String -> Result [CoreToken]
lexCore = fmap snd . runParserT (many (token <* spaces))
token :: CoreLexer CoreToken
token = litInt
<|> lbrace
<|> rbrace
<|> semicolon
<|> lparen
<|> rparen
<|> equals
<|> lambda
<|> arrow
<|> _case
<|> _of
<|> letrec
<|> _let
<|> _in
<|> cName
<|> name
----------------------------------------------------------------------------------
litInt, equals, lparen, rparen, lambda,
arrow, _case, _of, _let, letrec, _in, cName, name :: CoreLexer CoreToken
litInt = TokLitInt . value <$> some (satisfy isDigit)
where
value = foldl (\acc a -> 10*acc + digitToInt a) 0
semicolon = (semis <|> nls) $> TokSemicolon
where
nls = head <$> some (char '\n')
semis = char ';' <* many (char '\n')
equals = char '=' $> TokEquals
lbrace = char '{' $> TokLBrace
rbrace = char '}' $> TokRBrace
lparen = char '(' $> TokLParen
rparen = char ')' $> TokRParen
lambda = (char '\\' <|> char 'λ') $> TokLambda
arrow = string "->" $> TokArrow
_case = string "case" $> TokCase
_of = string "of" $> TokOf
_let = string "let" $> TokLet
letrec = string "letrec" $> TokLetRec
_in = string "in" $> TokIn
cName = TokCName <$> ((:) <$> cNameHead <*> properNameTail)
where cNameHead = satisfy isUpper
name = some (satisfy p) <&> TokName
where p c = not (isSpace c) && c `notElem` ";{}()"
properName :: CoreLexer Name
properName = (:) <$> nameHead <*> properNameTail
where nameHead = satisfy isLetter
properNameTail :: CoreLexer Name
properNameTail = many . satisfy $ \c ->
isLetter c || isDigit c || c == '_'

src/Core/Lex.x Normal file

@@ -0,0 +1,284 @@
{
{-# LANGUAGE TemplateHaskell #-}
module Core.Lex
( lexCore
, lexCore'
, CoreToken(..)
, lexTmp
)
where
import Data.Char (chr)
import Debug.Trace
import Core.Syntax
import Lens.Micro
import Lens.Micro.TH
}
%wrapper "monadUserState"
$whitechar = [ \t\n\r\f\v]
$special = [\(\)\,\;\[\]\{\}]
$ascdigit = 0-9
$unidigit = [] -- TODO
$digit = [$ascdigit $unidigit]
$ascsymbol = [\!\#\$\%\&\*\+\.\/\<\=\>\?\@\\\^\|\-\~]
$unisymbol = [] -- TODO
$symbol = [$ascsymbol $unisymbol] # [$special \_\:\"\']
$large = [A-Z \xc0-\xd6 \xd8-\xde]
$small = [a-z \xdf-\xf6 \xf8-\xff \_]
$alpha = [$small $large]
$graphic = [$small $large $symbol $digit $special \:\"\']
$octit = 0-7
$hexit = [0-9 A-F a-f]
$namechar = [$alpha $digit \']
$symchar = [$symbol \:]
$nl = [\n\r]
$white_no_nl = $white # $nl
@reservedid =
case|data|do|import|in|let|letrec|module|of|where
@reservedop =
"=" | \\ | "->"
@varname = $small $namechar*
@conname = $large $namechar*
@varsym = $symbol $symchar*
@consym = \: $symchar*
@decimal = $digit+
rlp :-
-- everywhere: skip whitespace
$white_no_nl+ { skip }
"--"\-*[^$symbol].* { skip }
"{-" { nestedComment }
-- syntactic symbols
<0>
{
"(" { constTok TokenLParen }
")" { constTok TokenRParen }
"{" { lbrace }
"}" { rbrace }
";" { constTok TokenSemicolon }
"," { constTok TokenComma }
}
-- keywords
-- see commentary on the layout system
<0>
{
"let" { constTok TokenLet `andBegin` layout_keyword }
"letrec" { constTok TokenLet `andBegin` layout_keyword }
"of" { constTok TokenOf `andBegin` layout_keyword }
"case" { constTok TokenCase }
"module" { constTok TokenModule }
"in" { letin }
"where" { constTok TokenWhere }
}
-- reserved symbols
<0>
{
"=" { constTok TokenEquals }
"->" { constTok TokenArrow }
}
-- identifiers
<0>
{
-- TODO: qualified names
@varname { lexWith TokenVarName }
@conname { lexWith TokenConName }
@varsym { lexWith TokenVarSym }
}
<0> \n { begin bol }
<bol>
{
\n { skip }
() { doBol `andBegin` 0 }
}
<layout_keyword>
{
$white { skip }
\{ { lbrace `andBegin` 0 }
() { noBrace `andBegin` 0 }
}
{
data Located a = Located AlexPosn a
deriving Show
constTok :: t -> AlexInput -> Int -> Alex (Located t)
constTok t (p,_,_,_) _ = pure $ Located p t
data CoreToken = TokenLet
| TokenLetrec
| TokenIn
| TokenModule
| TokenWhere
| TokenComma
| TokenCase
| TokenOf
| TokenLambda
| TokenArrow
| TokenLitInt Int
| TokenVarName Name
| TokenConName Name
| TokenVarSym Name
| TokenConSym Name
| TokenEquals
| TokenLParen
| TokenRParen
| TokenLBrace
| TokenRBrace
| TokenIndent Int
| TokenDedent Int
| TokenSemicolon
| TokenEOF
deriving Show
data LayoutContext = Layout Int
| NoLayout
deriving Show
data AlexUserState = AlexUserState
{ _ausContext :: [LayoutContext]
}
ausContext :: Lens' AlexUserState [LayoutContext]
ausContext f (AlexUserState ctx)
= fmap
(\a -> AlexUserState a) (f ctx)
{-# INLINE ausContext #-}
pushContext :: LayoutContext -> Alex ()
pushContext c = do
st <- alexGetUserState
alexSetUserState $ st { _ausContext = c : _ausContext st }
popContext :: Alex ()
popContext = do
st <- alexGetUserState
alexSetUserState $ st { _ausContext = drop 1 (_ausContext st) }
getContext :: Alex [LayoutContext]
getContext = do
st <- alexGetUserState
pure $ _ausContext st
type Lexer = AlexInput -> Int -> Alex (Located CoreToken)
alexEOF :: Alex (Located CoreToken)
alexEOF = Alex $ \ st@(AlexState { alex_pos = p }) -> Right (st, Located p TokenEOF)
alexInitUserState :: AlexUserState
alexInitUserState = AlexUserState [Layout 1]
nestedComment :: Lexer
nestedComment _ _ = undefined
lexStream :: Alex [Located CoreToken]
lexStream = do
l <- alexMonadScan
case l of
Located _ TokenEOF -> pure [l]
_ -> (l:) <$> lexStream
lexCore :: String -> Either String [Located CoreToken]
lexCore s = runAlex s (alexSetStartCode layout_keyword *> lexStream)
-- temp; does not support module header
lexCore' :: String -> Either String [CoreToken]
lexCore' s = fmap f <$> lexCore s
where f (Located _ t) = t
lexWith :: (String -> CoreToken) -> Lexer
lexWith f (p,_,_,s) l = pure $ Located p (f $ take l s)
lexToken :: Alex (Located CoreToken)
lexToken = alexMonadScan
getSrcCol :: Alex Int
getSrcCol = Alex $ \ st ->
let AlexPn _ _ col = alex_pos st
in Right (st, col)
lbrace :: Lexer
lbrace (p,_,_,_) _ = do
pushContext NoLayout
pure $ Located p TokenLBrace
rbrace :: Lexer
rbrace (p,_,_,_) _ = do
popContext
pure $ Located p TokenRBrace
insRBrace :: AlexPosn -> Alex (Located CoreToken)
insRBrace p = do
popContext
pure $ Located p TokenRBrace
insSemi :: AlexPosn -> Alex (Located CoreToken)
insSemi p = do
pure $ Located p TokenSemicolon
modifyUst :: (AlexUserState -> AlexUserState) -> Alex ()
modifyUst f = do
st <- alexGetUserState
alexSetUserState $ f st
getUst :: Alex AlexUserState
getUst = alexGetUserState
newLayoutContext :: Lexer
newLayoutContext (p,_,_,_) _ = do
undefined
noBrace :: Lexer
noBrace (p,_,_,_) l = do
col <- getSrcCol
pushContext (Layout col)
pure $ Located p TokenLBrace
getOffside :: Alex Ordering
getOffside = do
ctx <- getContext
m <- getSrcCol
case ctx of
Layout n : _ -> pure $ m `compare` n
_ -> pure GT
doBol :: Lexer
doBol (p,c,_,s) l = do
off <- getOffside
col <- getSrcCol
case off of
LT -> insRBrace p
EQ -> insSemi p
_ -> lexToken
letin :: Lexer
letin (p,_,_,_) l = do
popContext
pure $ Located p TokenIn
lexTmp :: IO [CoreToken]
lexTmp = do
s <- readFile "/tmp/t.hs"
case lexCore' s of
Left e -> error e
Right a -> pure a
}


@@ -1,95 +0,0 @@
{-# LANGUAGE LambdaCase, BlockArguments #-}
module Core.Parse
( parseCore
)
where
----------------------------------------------------------------------------------
import Control.Parser
import Data.Functor ((<&>), ($>))
import Core.Lex
import Core.Syntax
----------------------------------------------------------------------------------
type CoreParser = ParserT [CoreToken] Result
parseCore :: [CoreToken] -> Result Program
parseCore = fmap snd . runParserT program
program :: CoreParser Program
program = Program <$> termMany (char TokSemicolon) scdef
scdef :: CoreParser ScDef
scdef = ScDef <$> f <*> (xs <* eq) <*> body
where
f = name
xs = many name
eq = char TokEquals
body = expr
expr :: CoreParser Expr
expr = letE
<|> app
<|> lam
<|> atom
atom :: CoreParser Expr
atom = var
<|> con
<|> parenE
<|> lit
where
var = Var <$> name
parenE = surround (char TokLParen) (char TokRParen) expr
lit = IntE <$> litInt
lam :: CoreParser Expr
lam = Lam <$> (l *> bs) <*> (arrow *> expr)
where
l = char TokLambda
arrow = char TokArrow
bs = some name
app :: CoreParser Expr
app = foldl App <$> atom <*> some atom
con :: CoreParser Expr
con = pack *> (Con <$> (l *> tag) <*> (arity <* r))
where
l = char TokLBrace
r = char TokRBrace
tag = litInt
arity = litInt
pack = match \case
TokCName "Pack" -> Just ()
_ -> Nothing
letE :: CoreParser Expr
letE = Let <$> word <*> defs <*> (char TokIn *> expr)
where
word = char TokLet $> NonRec
<|> char TokLetRec $> Rec
defs = surround (char TokLBrace) (char TokRBrace) bindings
bindings :: CoreParser [Binding]
bindings = sepSome (char TokSemicolon) binding
binding :: CoreParser Binding
binding = Binding <$> name <*> (char TokEquals *> expr)
----------------------------------------------------------------------------------
name :: CoreParser Name
name = match \case
TokName n -> Just n
_ -> Nothing
cName :: CoreParser Name
cName = match \case
TokCName n -> Just n
_ -> Nothing
litInt :: CoreParser Int
litInt = match \case
TokLitInt n -> Just n
_ -> Nothing

src/Core/Parse.y Normal file

@@ -0,0 +1,128 @@
-- TODO: resolve shift/reduce conflicts
{
module Core.Parse
( parseCore
, parseCoreExpr
, module Core.Lex -- temp convenience
, parseTmp
)
where
import Data.Foldable (foldl')
import Core.Syntax
import Core.Lex
import Compiler.RLPC
}
%name parseCore Module
%name parseCoreExpr Expr
%tokentype { CoreToken }
%error { parseError }
%monad { RLPC }
%token
let { TokenLet }
letrec { TokenLetrec }
module { TokenModule }
where { TokenWhere }
',' { TokenComma }
in { TokenIn }
litint { TokenLitInt $$ }
varname { TokenVarName $$ }
varsym { TokenVarSym $$ }
conname { TokenConName $$ }
consym { TokenConSym $$ }
'λ' { TokenLambda }
'->' { TokenArrow }
'=' { TokenEquals }
'(' { TokenLParen }
')' { TokenRParen }
'{' { TokenLBrace }
'}' { TokenRBrace }
';' { TokenSemicolon }
eof { TokenEOF }
%%
Module :: { Module }
Module : module conname where Program Eof { Module (Just ($2, [])) $4 }
| Program Eof { Module Nothing $1 }
Eof :: { () }
Eof : eof { () }
| error { () }
Program :: { Program }
Program : '{' ScDefs Close { Program $2 }
ScDefs :: { [ScDef] }
ScDefs : ScDef ';' ScDefs { $1 : $3 }
| {- epsilon -} { [] }
ScDef :: { ScDef }
ScDef : Var ParList '=' Expr { ScDef $1 $2 $4 }
ParList :: { [Name] }
ParList : Var ParList { $1 : $2 }
| {- epsilon -} { [] }
Expr :: { Expr }
Expr : let '{' Bindings Close in Expr { Let NonRec $3 $6 }
| letrec '{' Bindings Close in Expr { Let Rec $3 $6 }
| 'λ' Binders '->' Expr { Lam $2 $4 }
| Application { $1 }
| Expr1 { $1 }
Close :: { () }
Close : '}' { () }
| error { () }
Binders :: { [Name] }
Binders : Var Binders { $1 : $2 }
| Var { [$1] }
Application :: { Expr }
Application : Expr1 AppArgs { foldl' App $1 $2 }
-- TODO: Application can probably be written as a single rule, without AppArgs
AppArgs :: { [Expr] }
AppArgs : Expr1 AppArgs { $1 : $2 }
| Expr1 { [$1] }
Expr1 :: { Expr }
Expr1 : litint { IntE $1 }
| Id { Var $1 }
| '(' Expr ')' { $2 }
Bindings :: { [Binding] }
Bindings : Binding ';' Bindings { $1 : $3 }
| Binding ';' { [$1] }
| Binding { [$1] }
Binding :: { Binding }
Binding : Var '=' Expr { $1 := $3 }
Id :: { Name }
Id : Var { $1 }
| Con { $1 }
Var :: { Name }
Var : '(' varsym ')' { $2 }
| varname { $1 }
Con :: { Name }
Con : '(' consym ')' { $2 }
| conname { $1 }
{
parseError :: [CoreToken] -> a
parseError ts = error $ "parse error at token: " <> show (head ts)
parseTmp :: IO (Module)
parseTmp = do
s <- readFile "/tmp/t.hs"
case lexCore' s >>= runRLPC . parseCore of
Left e -> error e
Right a -> pure a
}


@@ -8,19 +8,22 @@ module Core.Syntax
, Alter(..)
, Name
, ScDef(..)
, Module(..)
, Program(..)
, corePrelude
, bindersOf
, rhssOf
, isAtomic
, insertModule
)
where
----------------------------------------------------------------------------------
import Data.Coerce
import Data.Pretty
import Data.List (intersperse)
import Data.Function ((&))
import Data.List (intersperse)
import Data.Function ((&))
import Data.String
import Language.Haskell.TH.Syntax (Lift)
----------------------------------------------------------------------------------
data Expr = Var Name
@@ -30,14 +33,14 @@ data Expr = Var Name
| Lam [Name] Expr
| App Expr Expr
| IntE Int
deriving Show
deriving (Show, Lift)
infixl 2 :$
pattern (:$) :: Expr -> Expr -> Expr
pattern f :$ x = App f x
data Binding = Binding Name Expr
deriving Show
deriving (Show, Lift)
infixl 1 :=
pattern (:=) :: Name -> Expr -> Binding
@@ -45,24 +48,36 @@ pattern k := v = Binding k v
data Rec = Rec
| NonRec
deriving (Show, Eq)
deriving (Show, Eq, Lift)
data Alter = Alter Int [Name] Expr
deriving Show
deriving (Show, Lift)
type Name = String
data ScDef = ScDef Name [Name] Expr
deriving (Show)
deriving (Show, Lift)
data Module = Module (Maybe (Name, [Name])) Program
deriving (Show, Lift)
newtype Program = Program [ScDef]
deriving (Show)
deriving (Show, Lift)
instance IsString Expr where
fromString = Var
----------------------------------------------------------------------------------
instance Pretty Program where
-- TODO: module header
prettyPrec (Program ss) _ = mconcat $ intersperse "\n\n" $ fmap pretty ss
instance Pretty ScDef where
prettyPrec (ScDef n as e) _ =
mconcat (intersperse " " $ fmap IStr (n:as))
<> " = " <> pretty e <> IBreak
instance Pretty Expr where
prettyPrec (Var k) = withPrec maxBound $ IStr k
prettyPrec (IntE n) = withPrec maxBound $ iShow n
@@ -105,7 +120,7 @@ instance Pretty Binding where
----------------------------------------------------------------------------------
instance Semigroup Program where
(<>) = coerce $ (++) @ScDef
(<>) = coerce $ (<>) @[ScDef]
instance Monoid Program where
mempty = Program []
@@ -124,15 +139,19 @@ isAtomic _ = False
----------------------------------------------------------------------------------
corePrelude :: Program
corePrelude = Program
corePrelude :: Module
corePrelude = Module (Just ("Prelude", [])) $ Program
[ ScDef "id" ["x"] (Var "x")
, ScDef "K" ["x", "y"] (Var "x")
, ScDef "K1" ["x", "y"] (Var "y")
, ScDef "S" ["f", "g", "x"] (Var "f" :$ Var "x" :$ (Var "g" :$ Var "x"))
, ScDef "k" ["x", "y"] (Var "x")
, ScDef "k1" ["x", "y"] (Var "y")
, ScDef "succ" ["f", "g", "x"] (Var "f" :$ Var "x" :$ (Var "g" :$ Var "x"))
, ScDef "compose" ["f", "g", "x"] (Var "f" :$ (Var "g" :$ Var "x"))
, ScDef "twice" ["f", "x"] (Var "f" :$ (Var "f" :$ Var "x"))
, ScDef "False" [] $ Con 0 0
, ScDef "True" [] $ Con 1 0
]
-- TODO: export list awareness
insertModule :: Module -> Program -> Program
insertModule (Module _ m) p = p <> m

src/Core/TH.hs Normal file

@@ -0,0 +1,42 @@
module Core.TH
( coreExpr
, core
)
where
----------------------------------------------------------------------------------
import Language.Haskell.TH
import Language.Haskell.TH.Syntax
import Language.Haskell.TH.Quote
import Core.Parse
import Core.Lex
----------------------------------------------------------------------------------
core :: QuasiQuoter
core = QuasiQuoter
{ quoteExp = qCore
, quotePat = error "core quasiquotes may only be used in expressions"
, quoteType = error "core quasiquotes may only be used in expressions"
, quoteDec = error "core quasiquotes may only be used in expressions"
}
coreExpr :: QuasiQuoter
coreExpr = QuasiQuoter
{ quoteExp = qCoreExpr
, quotePat = error "core quasiquotes may only be used in expressions"
, quoteType = error "core quasiquotes may only be used in expressions"
, quoteDec = error "core quasiquotes may only be used in expressions"
}
qCore = undefined
qCoreExpr = undefined
-- qCore :: String -> Q Exp
-- qCore s = case lexCore s >>= parseCore of
-- Success a -> lift a
-- Error e _ _ -> error e
-- qCoreExpr :: String -> Q Exp
-- qCoreExpr s = case lexCore s >>= parseCoreExpr of
-- Success a -> lift a
-- Error e _ _ -> error e


@@ -71,7 +71,7 @@ compile prog = Just $ TiState s d h g stats
s = [mainAddr]
d = []
(h,g) = buildInitialHeap defs
defs = prog <> corePrelude
defs = insertModule corePrelude prog
stats = Stats 0 0 0
mainAddr = fromJust $ lookup "main" g
@@ -422,91 +422,91 @@ hdbgProg p hio = do
TiState [resAddr] _ h _ sts = last p'
res = hLookupUnsafe resAddr h
letrecExample :: Program
letrecExample = Program
[ ScDef "pair" ["x","y","f"] $ "f" :$ "x" :$ "y"
, ScDef "fst" ["p"] $ "p" :$ "K"
, ScDef "snd" ["p"] $ "p" :$ "K1"
, ScDef "f" ["x","y"] $
Let Rec
[ "a" := "pair" :$ "x" :$ "b"
, "b" := "pair" :$ "y" :$ "a"
]
("fst" :$ ("snd" :$ ("snd" :$ ("snd" :$ "a"))))
, ScDef "main" [] $ "f" :$ IntE 3 :$ IntE 4
]
-- letrecExample :: Program
-- letrecExample = Program
-- [ ScDef "pair" ["x","y","f"] $ "f" :$ "x" :$ "y"
-- , ScDef "fst" ["p"] $ "p" :$ "K"
-- , ScDef "snd" ["p"] $ "p" :$ "K1"
-- , ScDef "f" ["x","y"] $
-- Let Rec
-- [ "a" := "pair" :$ "x" :$ "b"
-- , "b" := "pair" :$ "y" :$ "a"
-- ]
-- ("fst" :$ ("snd" :$ ("snd" :$ ("snd" :$ "a"))))
-- , ScDef "main" [] $ "f" :$ IntE 3 :$ IntE 4
-- ]
idExample :: Program
idExample = Program
[ ScDef "main" [] $ "id" :$ IntE 3
]
-- idExample :: Program
-- idExample = Program
-- [ ScDef "main" [] $ "id" :$ IntE 3
-- ]
indExample1 :: Program
indExample1 = Program
[ ScDef "main" [] $ "twice" :$ "twice" :$ "id" :$ IntE 3
]
-- indExample1 :: Program
-- indExample1 = Program
-- [ ScDef "main" [] $ "twice" :$ "twice" :$ "id" :$ IntE 3
-- ]
indExample2 :: Program
indExample2 = Program
[ ScDef "main" [] $ "twice" :$ "twice" :$ "twice" :$ "id" :$ IntE 3
]
-- indExample2 :: Program
-- indExample2 = Program
-- [ ScDef "main" [] $ "twice" :$ "twice" :$ "twice" :$ "id" :$ IntE 3
-- ]
indExample3 :: Program
indExample3 = Program
[ ScDef "main" [] $
Let Rec
[ "x" := IntE 2
, "y" := "f" :$ "x" :$ "x"
]
("g" :$ "y" :$ "y")
, ScDef "f" ["a","b"] $ "b"
, ScDef "g" ["a","b"] $ "a"
]
-- indExample3 :: Program
-- indExample3 = Program
-- [ ScDef "main" [] $
-- Let Rec
-- [ "x" := IntE 2
-- , "y" := "f" :$ "x" :$ "x"
-- ]
-- ("g" :$ "y" :$ "y")
-- , ScDef "f" ["a","b"] $ "b"
-- , ScDef "g" ["a","b"] $ "a"
-- ]
negExample1 :: Program
negExample1 = Program
[ ScDef "main" [] $
"negate#" :$ ("id" :$ IntE 3)
]
-- negExample1 :: Program
-- negExample1 = Program
-- [ ScDef "main" [] $
-- "negate#" :$ ("id" :$ IntE 3)
-- ]
negExample2 :: Program
negExample2 = Program
[ ScDef "main" [] $
"negate#" :$ IntE 3
]
-- negExample2 :: Program
-- negExample2 = Program
-- [ ScDef "main" [] $
-- "negate#" :$ IntE 3
-- ]
negExample3 :: Program
negExample3 = Program
[ ScDef "main" [] $
"twice" :$ "negate#" :$ IntE 3
]
-- negExample3 :: Program
-- negExample3 = Program
-- [ ScDef "main" [] $
-- "twice" :$ "negate#" :$ IntE 3
-- ]
arithExample1 :: Program
arithExample1 = Program
[ ScDef "main" [] $
"+#" :$ (IntE 3) :$ ("negate#" :$ (IntE 2))
]
-- arithExample1 :: Program
-- arithExample1 = Program
-- [ ScDef "main" [] $
-- "+#" :$ (IntE 3) :$ ("negate#" :$ (IntE 2))
-- ]
arithExample2 :: Program
arithExample2 = Program
[ ScDef "main" [] $
"negate#" :$ ("+#" :$ (IntE 2) :$ ("*#" :$ IntE 5 :$ IntE 3))
]
-- arithExample2 :: Program
-- arithExample2 = Program
-- [ ScDef "main" [] $
-- "negate#" :$ ("+#" :$ (IntE 2) :$ ("*#" :$ IntE 5 :$ IntE 3))
-- ]
ifExample :: Program
ifExample = Program
[ ScDef "main" [] $
"if#" :$ "True" :$ IntE 2 :$ IntE 3
]
-- ifExample :: Program
-- ifExample = Program
-- [ ScDef "main" [] $
-- "if#" :$ "True" :$ IntE 2 :$ IntE 3
-- ]
facExample :: Program
facExample = Program
[ ScDef "fac" ["n"] $
"if#" :$ ("==#" :$ "n" :$ IntE 0)
:$ (IntE 1)
:$ ("*#" :$ "n" :$ ("fac" :$ ("-#" :$ "n" :$ IntE 1)))
, ScDef "main" [] $ "fac" :$ IntE 3
]
-- facExample :: Program
-- facExample = Program
-- [ ScDef "fac" ["n"] $
-- "if#" :$ ("==#" :$ "n" :$ IntE 0)
-- :$ (IntE 1)
-- :$ ("*#" :$ "n" :$ ("fac" :$ ("-#" :$ "n" :$ IntE 1)))
-- , ScDef "main" [] $ "fac" :$ IntE 3
-- ]
----------------------------------------------------------------------------------