Happy parse lex #1

Merged
msydneyslaga merged 7 commits from happy-parse-lex into main 2023-11-20 14:09:33 -07:00
15 changed files with 792 additions and 430 deletions


@@ -0,0 +1,170 @@
Lexing, Parsing, and Layouts
============================
The C-style languages of my previous experience have all had quite trivial
lexical analysis stages, peaking in complexity when I streamed tokens lazily in
C. Tokenising a C-style language is simple to describe: ignore all whitespace
and emit a token for everything you recognise. If you don't recognise
something, check whether it's a literal or an identifier. Should it be neither,
return an error.
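To make that description concrete, here is a minimal sketch of such a
tokeniser in Haskell (the token type, keyword list, and symbol set are invented
purely for illustration):

.. code-block:: haskell

    import Data.Char (isAlpha, isDigit, isSpace)

    data Token = TKeyword String | TIdent String | TLitInt Int | TSym Char
        deriving Show

    keywords :: [String]
    keywords = ["if", "else", "while", "return"]

    tokenise :: String -> Either String [Token]
    tokenise [] = Right []
    tokenise (c:cs)
        | isSpace c = tokenise cs                            -- ignore all whitespace
        | c `elem` "(){};,=+-*/" = (TSym c :) <$> tokenise cs
        | isDigit c =                                        -- literal
            let (ds, rest) = span isDigit (c:cs)
            in (TLitInt (read ds) :) <$> tokenise rest
        | isAlpha c =                                        -- keyword or identifier
            let (w, rest) = span isAlpha (c:cs)
                t | w `elem` keywords = TKeyword w
                  | otherwise         = TIdent w
            in (t :) <$> tokenise rest
        | otherwise = Left ("unrecognised character: " <> [c])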
On paper, both lexing and parsing a Haskell-like language seem to pose a few
greater challenges. Listed by ascending intimidation factor, some of the
potential roadblocks on my mind before making an attempt were:
* Operators; Haskell has not only user-defined infix operators, but user-defined
precedence levels and associativities. I recall using an algorithm that looked
infix, prefix, postfix, and even mixfix operators up in a global table in order
to call their appropriate parser (with each operator's precedence also stored
in the table). I never modified the table at runtime; however, doing so could
be a very nice solution for Haskell.
* Context-sensitive keywords; Haskell allows for some words to be used as identifiers in
appropriate contexts, such as :code:`family`, :code:`role`, :code:`as`.
A note_ found in `GHC's lexer
<https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L1133>`_
suggests that such keywords are only considered in contexts where their use is
relevant, e.g. :code:`family` and :code:`role` in type declarations,
:code:`as` after :code:`case`; :code:`if`, :code:`then`, and :code:`else` in
expressions, etc.
* Whitespace sensitivity; While I was comfortable with the idea of a system
similar to Python's INDENT/DEDENT tokens, Haskell seemed to use whitespace to
section code in a way that *felt* different.
.. _note: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/coding-style#2-using-notes
After a bit of thought and research, whitespace sensitivity in the form of
*layouts*, as both Haskell and I will refer to it, is easily the scariest thing
on this list -- however, it is achievable!
A Lexical Primer: Python
************************
We will compare and contrast with Python's lexical analysis. Much to my dismay,
Python uses newlines and indentation to separate statements and resolve scope
instead of the traditional semicolons and braces found in C-style languages (we
may generally refer to these C-style languages as *explicitly-sectioned*).
Internally during tokenisation, when the Python lexer begins a new line, it
compares the indentation of the new line with that of the previous line and
applies the following rules (sketched in code after the list):
1. If the new line has greater indentation than the previous line, insert an
INDENT token and push the new line's indentation level onto the indentation
stack (the stack is initialised with an indentation level of zero).
2. If the new line has lesser indentation than the previous line, pop the stack
until the top of the stack is less than or equal to the new line's indentation
level, inserting a DEDENT token for each level popped (the new level must
match one already on the stack).
3. If the indentation is equal, insert a NEWLINE token to terminate the previous
line, and leave it at that!
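As a rough sketch of that bookkeeping (written in Haskell, since that is the
language used throughout this project; the token type and error messages are
invented purely for illustration):

.. code-block:: haskell

    data PyToken = Indent | Dedent | Newline
        deriving Show

    -- Compare a new line's indentation against the stack (initially [0]) and
    -- emit the corresponding INDENT/DEDENT/NEWLINE tokens.
    offside :: Int -> [Int] -> Either String ([PyToken], [Int])
    offside col stack@(top : _)
        | col > top = Right ([Indent], col : stack)     -- rule 1
        | col < top = case span (> col) stack of        -- rule 2
            (popped, stack'@(top' : _))
                | top' == col -> Right (Dedent <$ popped, stack')
            _ -> Left "unindent does not match an outer indentation level"
        | otherwise = Right ([Newline], stack)          -- rule 3
    offside _ [] = Left "empty indentation stack"

For example, :code:`offside 0 [8, 4, 0]` yields two DEDENT tokens and leaves
the stack at :code:`[0]`.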
Parsing Python with the INDENT, DEDENT, and NEWLINE tokens is identical to
parsing a language with braces and semicolons. This solution is pretty in line
with Python's philosophy of the "one correct answer" (TODO: this needs a
source). In developing our *layout* rules, we will follow the same pattern of
translating the whitespace-sensitive source language to an explicitly-sectioned
language.
But What About Haskell?
***********************
We saw that Python, the most notable example of an implicitly sectioned
language, is pretty simple to lex. Why then am I so afraid of Haskell's layouts?
To be frank, I'm far less scared after asking myself this -- however there are
certainly some new complexities that Python needn't concern. Haskell has
implicit line *continuation*: forms written over multiple lines; indentation
styles often seen in Haskell are somewhat esoteric compared to Python's
"s/[{};]//".
.. code-block:: haskell

    -- line continuation
    something = this is a
        single expression

    -- an extremely common style found in haskell
    data Python = Users
        { are :: Crying
        , right :: About
        , now :: Sorry
        }

    -- another formatting oddity
    -- note that this is not line continuation!
    -- `look at`, `this`, and `alignment`
    -- are all separate expressions!
    anotherThing = do look at
                      this
                      alignment
But enough fear; let's actually think about implementation. Firstly, some
formality: what do we mean when we say *layout*? We will define layout as the
rules we apply to an implicitly-sectioned language in order to yield one that is
explicitly-sectioned. We will also define the indentation of a lexeme as the
column number of its first character.
Thankfully for us, our entry point is quite clear; layouts only appear after a
select few keywords (with a minor exception; TODO: elaborate), those being
:code:`let` (followed by supercombinators), :code:`where` (followed by
supercombinators), :code:`do` (followed by expressions), and :code:`of`
(followed by alternatives) (TODO: all of these terms need linked glossary
entries). Under this assumption, we give the following rule:
1. If a :code:`let`, :code:`where`, :code:`do`, or :code:`of` keyword is not
followed by the lexeme :code:`{`, the token :math:`\{n\}` is inserted after
the keyword, where :math:`n` is the indentation of the next lexeme if there
is one, or 0 if the end of file has been reached.
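For a tiny example (the column arithmetic is just for illustration), rule 1
fires at the :code:`let` below, whose next lexeme :code:`y` sits at column 9,
so :math:`\{9\}` is inserted:

.. code-block:: haskell

    f = let y = 1 in y
    -- after rule 1 (writing the special token inline):
    -- f = let {9} y = 1 in y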
Henceforth :math:`\{n\}` will denote the token representing the beginning of a
layout; it is similar in function to an opening brace, but it stores the
indentation level for subsequent lines to compare with. We must also introduce
an additional input to the function handling layouts. Such a function obviously
requires the input string, but a helpful book-keeping tool which we will make
good use of is a stack of "layout contexts" describing the current cascade of
layouts. Each element is either :code:`NoLayout`, indicating an explicit layout
(i.e. the programmer inserted semicolons and braces herself), or a
:code:`Layout n`, where :code:`n` is a non-negative integer representing the
indentation level of the enclosing context.
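In Haskell, the context stack is simply a list of the following type; this is
the same definition the lexer itself uses (see :code:`src/Core/Lex.x`):

.. code-block:: haskell

    data LayoutContext = Layout Int   -- implicit layout at the given indentation
                       | NoLayout     -- explicit braces; insertion is suspended
                       deriving Show

Annotating a small example with this stack: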
.. code-block:: haskell

    f x              -- layout stack: []
      = let          -- layout keyword; remember indentation of next token
             y = w * w  -- layout stack: [Layout 10]
             w = x + x
        in do        -- layout keyword; next token is a brace!
          {          -- layout stack: [NoLayout]
          pure }
In the code seen above, notice that :code:`let` allows for multiple definitions,
separated by a newline. We accommodate this with a token :math:`\langle n
\rangle`, which complements :math:`\{n\}`: it marks the start of a new line at
indentation :math:`n`, and comparing :math:`n` against the enclosing layout
decides whether that line is a new item in the layout, the end of the layout,
or a continuation. We give a rule to describe the source of such a token:
2. When the first lexeme on a line is preceded by only whitespace, a
:math:`\langle n \rangle` token is inserted before the lexeme, where
:math:`n` is the indentation of the lexeme, provided that it is not, as a
consequence of rule 1 or rule 3 (as we'll see), already preceded by
:math:`\{n\}`.
Lastly, to handle the top level, we open an implicit layout at the indentation
of the first lexeme:
3. If the first lexeme of a module is not :code:`{` or :code:`module`, then it
is preceded by :math:`\{n\}`, where :math:`n` is the indentation of the lexeme.
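To tie rules 1, 2, and 3 together, here is a rough sketch of the layout pass as
a pure function over a token stream that has already been annotated with
:math:`\{n\}` and :math:`\langle n \rangle` tokens. It is loosely modelled on
the function :math:`L` from the Haskell Report, but the token type is invented
for illustration and many details (explicit braces opened by the programmer,
error handling, :code:`in` closing a :code:`let`, and so on) are elided:

.. code-block:: haskell

    -- the annotated stream produced by rules 1-3
    data LToken = Open Int       -- {n}, inserted by rules 1 and 3
                | Line Int       -- <n>, inserted by rule 2
                | Lexeme String  -- any ordinary lexeme
                deriving Show

    data LayoutContext = Layout Int | NoLayout
        deriving Show

    -- translate the annotated stream into explicit braces and semicolons
    layout :: [LToken] -> [LayoutContext] -> [String]
    layout (Line n : ts) (Layout m : ms)
        | n == m = ";" : layout ts (Layout m : ms)   -- new item in the layout
        | n <  m = "}" : layout (Line n : ts) ms     -- close the layout and retry
    layout (Line _ : ts)   ms = layout ts ms         -- a continuation line
    layout (Open n : ts)   ms = "{" : layout ts (Layout n : ms)
    layout (Lexeme s : ts) ms = s : layout ts ms
    layout [] (Layout _ : ms) = "}" : layout [] ms   -- close anything left at EOF
    layout [] _               = []

Feeding this the annotated stream for the earlier :code:`let` example
reproduces the explicitly-sectioned form :code:`let { y = w * w ; w = x + x }
in ...` that a brace-and-semicolon grammar expects.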
This set of rules is adequate to address our basic concerns about line
continuations and layout lists. For a more pedantic description of the layout
system, see `chapter 10
<https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_ of the
2010 Haskell Report, which I **heavily** referenced here.
References
----------
* `Python's lexical analysis
<https://docs.python.org/3/reference/lexical_analysis.html>`_
* `Haskell Syntax Reference
<https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_


@@ -1,3 +0,0 @@
Parser Combinators
==================

docs/src/glossary.rst Normal file

@@ -0,0 +1,15 @@
Glossary
========
Haskell and Haskell culture are infamous for using scary mathematical terms for
simple ideas. Please excuse us; it's really fun :3.
.. glossary::
supercombinator
An expression with no free variables. For most purposes, just think of a
top-level definition.
case alternative
A possible match in a case expression (TODO: example)


@@ -6,6 +6,12 @@ Contents
.. toctree::
:maxdepth: 2
:caption: Index
glossary.rst
.. toctree::
:maxdepth: 1
:caption: Commentary
:glob:


@@ -22,16 +22,22 @@ library
, TIM
other-modules: Data.Heap
, Data.Pretty
, Control.Parser
, Core.Syntax
, Core.Parse
, Core.TH
, Core.Examples
, Core.Lex
build-tool-depends: happy:happy, alex:alex
-- other-extensions:
build-depends: base ^>=4.18.0.0
, containers
, microlens
, microlens-th
, template-haskell
-- required for happy
, array
hs-source-dirs: src
default-language: GHC2021

src/Compiler/RLPC.hs Normal file

@@ -0,0 +1,10 @@
{-# LANGUAGE GeneralisedNewtypeDeriving #-}
module Compiler.RLPC
( RLPC(..)
)
where
-- TODO: fancy errors
newtype RLPC a = RLPC { runRLPC :: Either String a }
deriving (Functor, Applicative, Monad)


@@ -1,101 +0,0 @@
{-|
Module : Control.Parser
Description : Parser combinators
This module implements an interface for parser *types*, used in lexical analysis
and parsing. For the implementation of the rlp language's parser, see 'Parse'.
-}
{-# LANGUAGE GeneralisedNewtypeDeriving #-}
{-# LANGUAGE BlockArguments, LambdaCase #-}
module Control.Parser
( ParserT
, runParserT
, satisfy
, char
, spaces
, surround
, string
, match
, termMany
, sepSome
-- * Control.Applicative re-exports
, (<|>)
, many
, some
, empty
)
where
----------------------------------------------------------------------------------
import Control.Applicative
import Control.Arrow ((***))
import Control.Monad
import Data.Char
----------------------------------------------------------------------------------
newtype ParserT i m o = ParserT { runParserT :: i -> m (i, o) }
deriving (Functor)
instance (Monad m) => Applicative (ParserT i m) where
pure a = ParserT \i -> pure (i, a)
m <*> k = ParserT \i -> do
(i',f) <- runParserT m i
fmap (id *** f) $ runParserT k i'
instance (MonadPlus m) => Alternative (ParserT i m) where
empty = ParserT $ const empty
ParserT m <|> ParserT k = ParserT $ \i ->
m i <|> k i
instance (MonadPlus m) => MonadPlus (ParserT i m)
instance (Monad m) => Monad (ParserT i m) where
m >>= k = ParserT $ \i -> do
(i',a) <- runParserT m i
runParserT (k a) i'
instance (MonadFail m) => MonadFail (ParserT i m) where
fail s = ParserT $ \i -> fail s
----------------------------------------------------------------------------------
-- TODO: generalise to non-lists
satisfy :: (MonadPlus m) => (a -> Bool) -> ParserT [a] m a
satisfy p = ParserT $ \case
(x:xs) | p x -> pure (xs,x)
_ -> empty
match :: (MonadPlus m) => (a -> Maybe b) -> ParserT [a] m b
match f = ParserT $ \case
(x:xs) -> case f x of
Just b -> pure (xs,b)
Nothing -> empty
[] -> empty
termMany :: (MonadPlus m) => ParserT i m t -> ParserT i m o -> ParserT i m [o]
termMany t a = many (a <* t)
sepSome :: (MonadPlus m) => ParserT i m t -> ParserT i m o -> ParserT i m [o]
sepSome s a = (:) <$> a <*> many (s *> a)
char :: (MonadPlus m, Eq a) => a -> ParserT [a] m a
char c = satisfy (==c)
string :: (MonadPlus m, Eq a) => [a] -> ParserT [a] m [a]
string s = sequenceA $ char <$> s
----------------------------------------------------------------------------------
surround :: (MonadPlus m)
=> ParserT i m l
-> ParserT i m r
-> ParserT i m c
-> ParserT i m c
surround l r c = l *> c <* r
spaces :: (MonadPlus m) => ParserT String m Int
spaces = length <$> many (satisfy (==' '))

src/Core/Examples.hs Normal file

@@ -0,0 +1,22 @@
{-# LANGUAGE QuasiQuotes #-}
module Core.Examples where
----------------------------------------------------------------------------------
import Core.Syntax
import Core.TH
----------------------------------------------------------------------------------
{-
letrecExample :: Program
letrecExample = [core|
pair x y f = f x y;
fst p = p k;
snd p = p k1;
f x y = letrec
{ a = pair x b;
; b = pair y a
} in fst (snd (snd (snd a)));
main = f 3 4;
|]
-}


@@ -1,141 +0,0 @@
{-|
Module : Core.Lex
Description : Core language lexer
-}
module Core.Lex
( CoreToken(..)
, Result(..)
, lexCore
)
where
----------------------------------------------------------------------------------
import Control.Parser
import Control.Applicative
import Control.Monad
import Data.Char
import Data.Functor
import Core.Syntax (Name)
----------------------------------------------------------------------------------
type CoreLexer = ParserT String Result
data Result a = Success a
| Error String Int Int
deriving (Show)
-- TODO: whitespace-sensitive layout
data CoreToken = TokLitInt Int
| TokEquals
| TokLBrace
| TokRBrace
| TokSemicolon
| TokLParen
| TokRParen
| TokLambda
| TokArrow
| TokCase
| TokOf
| TokLet
| TokLetRec
| TokIn
| TokCName Name
| TokName Name
deriving (Show, Eq)
instance Functor Result where
fmap f (Success a) = Success (f a)
fmap _ (Error s l c) = Error s l c
instance Foldable Result where
foldr f z (Success a) = a `f` z
foldr _ z (Error _ _ _) = z
instance Traversable Result where
traverse k (Success a) = fmap Success (k a)
traverse _ (Error s l c) = pure $ Error s l c
instance Applicative Result where
pure = Success
liftA2 f (Success a) (Success b) = Success $ f a b
liftA2 _ (Error s l c) _ = Error s l c
liftA2 _ _ (Error s l c) = Error s l c
instance Alternative Result where
empty = Error "some error! this is a temporary system lol" 0 0
(Success a) <|> _ = Success a
_ <|> b = b
instance Monad Result where
Success a >>= k = k a
Error s l c >>= _ = Error s l c
instance MonadPlus Result
instance MonadFail Result where
fail s = Error s 0 0
----------------------------------------------------------------------------------
lexCore :: String -> Result [CoreToken]
lexCore = fmap snd . runParserT (many (token <* spaces))
token :: CoreLexer CoreToken
token = litInt
<|> lbrace
<|> rbrace
<|> semicolon
<|> lparen
<|> rparen
<|> equals
<|> lambda
<|> arrow
<|> _case
<|> _of
<|> letrec
<|> _let
<|> _in
<|> cName
<|> name
----------------------------------------------------------------------------------
litInt, equals, lparen, rparen, lambda,
arrow, _case, _of, _let, letrec, _in, cName, name :: CoreLexer CoreToken
litInt = TokLitInt . value <$> some (satisfy isDigit)
where
value = foldl (\acc a -> 10*acc + digitToInt a) 0
semicolon = (semis <|> nls) $> TokSemicolon
where
nls = head <$> some (char '\n')
semis = char ';' <* many (char '\n')
equals = char '=' $> TokEquals
lbrace = char '{' $> TokLBrace
rbrace = char '}' $> TokRBrace
lparen = char '(' $> TokLParen
rparen = char ')' $> TokRParen
lambda = (char '\\' <|> char 'λ') $> TokLambda
arrow = string "->" $> TokArrow
_case = string "case" $> TokCase
_of = string "of" $> TokOf
_let = string "let" $> TokLet
letrec = string "letrec" $> TokLetRec
_in = string "in" $> TokIn
cName = TokCName <$> ((:) <$> cNameHead <*> properNameTail)
where cNameHead = satisfy isUpper
name = some (satisfy p) <&> TokName
where p c = not (isSpace c) && c `notElem` ";{}()"
properName :: CoreLexer Name
properName = (:) <$> nameHead <*> properNameTail
where nameHead = satisfy isLetter
properNameTail :: CoreLexer Name
properNameTail = many . satisfy $ \c ->
isLetter c || isDigit c || c == '_'

src/Core/Lex.x Normal file

@@ -0,0 +1,284 @@
{
{-# LANGUAGE TemplateHaskell #-}
module Core.Lex
( lexCore
, lexCore'
, CoreToken(..)
, lexTmp
)
where
import Data.Char (chr)
import Debug.Trace
import Core.Syntax
import Lens.Micro
import Lens.Micro.TH
}
%wrapper "monadUserState"
$whitechar = [ \t\n\r\f\v]
$special = [\(\)\,\;\[\]\{\}]
$ascdigit = 0-9
$unidigit = [] -- TODO
$digit = [$ascdigit $unidigit]
$ascsymbol = [\!\#\$\%\&\*\+\.\/\<\=\>\?\@\\\^\|\-\~]
$unisymbol = [] -- TODO
$symbol = [$ascsymbol $unisymbol] # [$special \_\:\"\']
$large = [A-Z \xc0-\xd6 \xd8-\xde]
$small = [a-z \xdf-\xf6 \xf8-\xff \_]
$alpha = [$small $large]
$graphic = [$small $large $symbol $digit $special \:\"\']
$octit = 0-7
$hexit = [0-9 A-F a-f]
$namechar = [$alpha $digit \']
$symchar = [$symbol \:]
$nl = [\n\r]
$white_no_nl = $white # $nl
@reservedid =
case|data|do|import|in|let|letrec|module|of|where
@reservedop =
"=" | \\ | "->"
@varname = $small $namechar*
@conname = $large $namechar*
@varsym = $symbol $symchar*
@consym = \: $symchar*
@decimal = $digit+
rlp :-
-- everywhere: skip whitespace
$white_no_nl+ { skip }
"--"\-*[^$symbol].* { skip }
"{-" { nestedComment }
-- syntactic symbols
<0>
{
"(" { constTok TokenLParen }
")" { constTok TokenRParen }
"{" { lbrace }
"}" { rbrace }
";" { constTok TokenSemicolon }
"," { constTok TokenComma }
}
-- keywords
-- see commentary on the layout system
<0>
{
"let" { constTok TokenLet `andBegin` layout_keyword }
"letrec" { constTok TokenLet `andBegin` layout_keyword }
"of" { constTok TokenOf `andBegin` layout_keyword }
"case" { constTok TokenCase }
"module" { constTok TokenModule }
"in" { letin }
"where" { constTok TokenWhere }
}
-- reserved symbols
<0>
{
"=" { constTok TokenEquals }
"->" { constTok TokenArrow }
}
-- identifiers
<0>
{
-- TODO: qualified names
@varname { lexWith TokenVarName }
@conname { lexWith TokenConName }
@varsym { lexWith TokenVarSym }
}
<0> \n { begin bol }
<bol>
{
\n { skip }
() { doBol `andBegin` 0 }
}
<layout_keyword>
{
$white { skip }
\{ { lbrace `andBegin` 0 }
() { noBrace `andBegin` 0 }
}
{
data Located a = Located AlexPosn a
deriving Show
constTok :: t -> AlexInput -> Int -> Alex (Located t)
constTok t (p,_,_,_) _ = pure $ Located p t
data CoreToken = TokenLet
| TokenLetrec
| TokenIn
| TokenModule
| TokenWhere
| TokenComma
| TokenCase
| TokenOf
| TokenLambda
| TokenArrow
| TokenLitInt Int
| TokenVarName Name
| TokenConName Name
| TokenVarSym Name
| TokenConSym Name
| TokenEquals
| TokenLParen
| TokenRParen
| TokenLBrace
| TokenRBrace
| TokenIndent Int
| TokenDedent Int
| TokenSemicolon
| TokenEOF
deriving Show
data LayoutContext = Layout Int
| NoLayout
deriving Show
data AlexUserState = AlexUserState
{ _ausContext :: [LayoutContext]
}
ausContext :: Lens' AlexUserState [LayoutContext]
ausContext f (AlexUserState ctx)
= fmap
(\a -> AlexUserState a) (f ctx)
{-# INLINE ausContext #-}
pushContext :: LayoutContext -> Alex ()
pushContext c = do
st <- alexGetUserState
alexSetUserState $ st { _ausContext = c : _ausContext st }
popContext :: Alex ()
popContext = do
st <- alexGetUserState
alexSetUserState $ st { _ausContext = drop 1 (_ausContext st) }
getContext :: Alex [LayoutContext]
getContext = do
st <- alexGetUserState
pure $ _ausContext st
type Lexer = AlexInput -> Int -> Alex (Located CoreToken)
alexEOF :: Alex (Located CoreToken)
alexEOF = Alex $ \ st@(AlexState { alex_pos = p }) -> Right (st, Located p TokenEOF)
alexInitUserState :: AlexUserState
alexInitUserState = AlexUserState [Layout 1]
nestedComment :: Lexer
nestedComment _ _ = undefined
lexStream :: Alex [Located CoreToken]
lexStream = do
l <- alexMonadScan
case l of
Located _ TokenEOF -> pure [l]
_ -> (l:) <$> lexStream
lexCore :: String -> Either String [Located CoreToken]
lexCore s = runAlex s (alexSetStartCode layout_keyword *> lexStream)
-- temp; does not support module header
lexCore' :: String -> Either String [CoreToken]
lexCore' s = fmap f <$> lexCore s
where f (Located _ t) = t
lexWith :: (String -> CoreToken) -> Lexer
lexWith f (p,_,_,s) l = pure $ Located p (f $ take l s)
lexToken :: Alex (Located CoreToken)
lexToken = alexMonadScan
getSrcCol :: Alex Int
getSrcCol = Alex $ \ st ->
let AlexPn _ _ col = alex_pos st
in Right (st, col)
lbrace :: Lexer
lbrace (p,_,_,_) _ = do
pushContext NoLayout
pure $ Located p TokenLBrace
rbrace :: Lexer
rbrace (p,_,_,_) _ = do
popContext
pure $ Located p TokenRBrace
insRBrace :: AlexPosn -> Alex (Located CoreToken)
insRBrace p = do
popContext
pure $ Located p TokenRBrace
insSemi :: AlexPosn -> Alex (Located CoreToken)
insSemi p = do
pure $ Located p TokenSemicolon
modifyUst :: (AlexUserState -> AlexUserState) -> Alex ()
modifyUst f = do
st <- alexGetUserState
alexSetUserState $ f st
getUst :: Alex AlexUserState
getUst = alexGetUserState
newLayoutContext :: Lexer
newLayoutContext (p,_,_,_) _ = do
undefined
noBrace :: Lexer
noBrace (p,_,_,_) l = do
col <- getSrcCol
pushContext (Layout col)
pure $ Located p TokenLBrace
getOffside :: Alex Ordering
getOffside = do
ctx <- getContext
m <- getSrcCol
case ctx of
Layout n : _ -> pure $ m `compare` n
_ -> pure GT
doBol :: Lexer
doBol (p,c,_,s) l = do
off <- getOffside
col <- getSrcCol
case off of
LT -> insRBrace p
EQ -> insSemi p
_ -> lexToken
letin :: Lexer
letin (p,_,_,_) l = do
popContext
pure $ Located p TokenIn
lexTmp :: IO [CoreToken]
lexTmp = do
s <- readFile "/tmp/t.hs"
case lexCore' s of
Left e -> error e
Right a -> pure a
}


@@ -1,95 +0,0 @@
{-# LANGUAGE LambdaCase, BlockArguments #-}
module Core.Parse
( parseCore
)
where
----------------------------------------------------------------------------------
import Control.Parser
import Data.Functor ((<&>), ($>))
import Core.Lex
import Core.Syntax
----------------------------------------------------------------------------------
type CoreParser = ParserT [CoreToken] Result
parseCore :: [CoreToken] -> Result Program
parseCore = fmap snd . runParserT program
program :: CoreParser Program
program = Program <$> termMany (char TokSemicolon) scdef
scdef :: CoreParser ScDef
scdef = ScDef <$> f <*> (xs <* eq) <*> body
where
f = name
xs = many name
eq = char TokEquals
body = expr
expr :: CoreParser Expr
expr = letE
<|> app
<|> lam
<|> atom
atom :: CoreParser Expr
atom = var
<|> con
<|> parenE
<|> lit
where
var = Var <$> name
parenE = surround (char TokLParen) (char TokRParen) expr
lit = IntE <$> litInt
lam :: CoreParser Expr
lam = Lam <$> (l *> bs) <*> (arrow *> expr)
where
l = char TokLambda
arrow = char TokArrow
bs = some name
app :: CoreParser Expr
app = foldl App <$> atom <*> some atom
con :: CoreParser Expr
con = pack *> (Con <$> (l *> tag) <*> (arity <* r))
where
l = char TokLBrace
r = char TokRBrace
tag = litInt
arity = litInt
pack = match \case
TokCName "Pack" -> Just ()
_ -> Nothing
letE :: CoreParser Expr
letE = Let <$> word <*> defs <*> (char TokIn *> expr)
where
word = char TokLet $> NonRec
<|> char TokLetRec $> Rec
defs = surround (char TokLBrace) (char TokRBrace) bindings
bindings :: CoreParser [Binding]
bindings = sepSome (char TokSemicolon) binding
binding :: CoreParser Binding
binding = Binding <$> name <*> (char TokEquals *> expr)
----------------------------------------------------------------------------------
name :: CoreParser Name
name = match \case
TokName n -> Just n
_ -> Nothing
cName :: CoreParser Name
cName = match \case
TokCName n -> Just n
_ -> Nothing
litInt :: CoreParser Int
litInt = match \case
TokLitInt n -> Just n
_ -> Nothing

src/Core/Parse.y Normal file

@@ -0,0 +1,128 @@
-- TODO: resolve shift/reduce conflicts
{
module Core.Parse
( parseCore
, parseCoreExpr
, module Core.Lex -- temp convenience
, parseTmp
)
where
import Data.Foldable (foldl')
import Core.Syntax
import Core.Lex
import Compiler.RLPC
}
%name parseCore Module
%name parseCoreExpr Expr
%tokentype { CoreToken }
%error { parseError }
%monad { RLPC }
%token
let { TokenLet }
letrec { TokenLetrec }
module { TokenModule }
where { TokenWhere }
',' { TokenComma }
in { TokenIn }
litint { TokenLitInt $$ }
varname { TokenVarName $$ }
varsym { TokenVarSym $$ }
conname { TokenConName $$ }
consym { TokenConSym $$ }
'λ' { TokenLambda }
'->' { TokenArrow }
'=' { TokenEquals }
'(' { TokenLParen }
')' { TokenRParen }
'{' { TokenLBrace }
'}' { TokenRBrace }
';' { TokenSemicolon }
eof { TokenEOF }
%%
Module :: { Module }
Module : module conname where Program Eof { Module (Just ($2, [])) $4 }
| Program Eof { Module Nothing $1 }
Eof :: { () }
Eof : eof { () }
| error { () }
Program :: { Program }
Program : '{' ScDefs Close { Program $2 }
ScDefs :: { [ScDef] }
ScDefs : ScDef ';' ScDefs { $1 : $3 }
| {- epsilon -} { [] }
ScDef :: { ScDef }
ScDef : Var ParList '=' Expr { ScDef $1 $2 $4 }
ParList :: { [Name] }
ParList : Var ParList { $1 : $2 }
| {- epsilon -} { [] }
Expr :: { Expr }
Expr : let '{' Bindings Close in Expr { Let NonRec $3 $6 }
| letrec '{' Bindings Close in Expr { Let Rec $3 $6 }
| 'λ' Binders '->' Expr { Lam $2 $4 }
| Application { $1 }
| Expr1 { $1 }
Close :: { () }
Close : '}' { () }
| error { () }
Binders :: { [Name] }
Binders : Var Binders { $1 : $2 }
| Var { [$1] }
Application :: { Expr }
Application : Expr1 AppArgs { foldl' App $1 $2 }
-- TODO: Application can probably be written as a single rule, without AppArgs
AppArgs :: { [Expr] }
AppArgs : Expr1 AppArgs { $1 : $2 }
| Expr1 { [$1] }
Expr1 :: { Expr }
Expr1 : litint { IntE $1 }
| Id { Var $1 }
| '(' Expr ')' { $2 }
Bindings :: { [Binding] }
Bindings : Binding ';' Bindings { $1 : $3 }
| Binding ';' { [$1] }
| Binding { [$1] }
Binding :: { Binding }
Binding : Var '=' Expr { $1 := $3 }
Id :: { Name }
Id : Var { $1 }
| Con { $1 }
Var :: { Name }
Var : '(' varsym ')' { $2 }
| varname { $1 }
Con :: { Name }
Con : '(' consym ')' { $2 }
| conname { $1 }
{
parseError :: [CoreToken] -> a
parseError ts = error $ "parse error at token: " <> show (head ts)
parseTmp :: IO (Module)
parseTmp = do
s <- readFile "/tmp/t.hs"
case lexCore' s >>= runRLPC . parseCore of
Left e -> error e
Right a -> pure a
}


@@ -8,19 +8,22 @@ module Core.Syntax
, Alter(..)
, Name
, ScDef(..)
, Module(..)
, Program(..)
, corePrelude
, bindersOf
, rhssOf
, isAtomic
, insertModule
)
where
----------------------------------------------------------------------------------
import Data.Coerce
import Data.Pretty
import Data.List (intersperse)
import Data.Function ((&))
import Data.List (intersperse)
import Data.Function ((&))
import Data.String
import Language.Haskell.TH.Syntax (Lift)
----------------------------------------------------------------------------------
data Expr = Var Name
@@ -30,14 +33,14 @@ data Expr = Var Name
| Lam [Name] Expr
| App Expr Expr
| IntE Int
deriving Show
deriving (Show, Lift)
infixl 2 :$
pattern (:$) :: Expr -> Expr -> Expr
pattern f :$ x = App f x
data Binding = Binding Name Expr
deriving Show
deriving (Show, Lift)
infixl 1 :=
pattern (:=) :: Name -> Expr -> Binding
@@ -45,24 +48,36 @@ pattern k := v = Binding k v
data Rec = Rec
| NonRec
deriving (Show, Eq)
deriving (Show, Eq, Lift)
data Alter = Alter Int [Name] Expr
deriving Show
deriving (Show, Lift)
type Name = String
data ScDef = ScDef Name [Name] Expr
deriving (Show)
deriving (Show, Lift)
data Module = Module (Maybe (Name, [Name])) Program
deriving (Show, Lift)
newtype Program = Program [ScDef]
deriving (Show)
deriving (Show, Lift)
instance IsString Expr where
fromString = Var
----------------------------------------------------------------------------------
instance Pretty Program where
-- TODO: module header
prettyPrec (Program ss) _ = mconcat $ intersperse "\n\n" $ fmap pretty ss
instance Pretty ScDef where
prettyPrec (ScDef n as e) _ =
mconcat (intersperse " " $ fmap IStr (n:as))
<> " = " <> pretty e <> IBreak
instance Pretty Expr where
prettyPrec (Var k) = withPrec maxBound $ IStr k
prettyPrec (IntE n) = withPrec maxBound $ iShow n
@@ -105,7 +120,7 @@ instance Pretty Binding where
----------------------------------------------------------------------------------
instance Semigroup Program where
(<>) = coerce $ (++) @ScDef
(<>) = coerce $ (<>) @[ScDef]
instance Monoid Program where
mempty = Program []
@@ -124,15 +139,19 @@ isAtomic _ = False
----------------------------------------------------------------------------------
corePrelude :: Program
corePrelude = Program
corePrelude :: Module
corePrelude = Module (Just ("Prelude", [])) $ Program
[ ScDef "id" ["x"] (Var "x")
, ScDef "K" ["x", "y"] (Var "x")
, ScDef "K1" ["x", "y"] (Var "y")
, ScDef "S" ["f", "g", "x"] (Var "f" :$ Var "x" :$ (Var "g" :$ Var "x"))
, ScDef "k" ["x", "y"] (Var "x")
, ScDef "k1" ["x", "y"] (Var "y")
, ScDef "succ" ["f", "g", "x"] (Var "f" :$ Var "x" :$ (Var "g" :$ Var "x"))
, ScDef "compose" ["f", "g", "x"] (Var "f" :$ (Var "g" :$ Var "x"))
, ScDef "twice" ["f", "x"] (Var "f" :$ (Var "f" :$ Var "x"))
, ScDef "False" [] $ Con 0 0
, ScDef "True" [] $ Con 1 0
]
-- TODO: export list awareness
insertModule :: Module -> Program -> Program
insertModule (Module _ m) p = p <> m

src/Core/TH.hs Normal file

@@ -0,0 +1,42 @@
module Core.TH
( coreExpr
, core
)
where
----------------------------------------------------------------------------------
import Language.Haskell.TH
import Language.Haskell.TH.Syntax
import Language.Haskell.TH.Quote
import Core.Parse
import Core.Lex
----------------------------------------------------------------------------------
core :: QuasiQuoter
core = QuasiQuoter
{ quoteExp = qCore
, quotePat = error "core quasiquotes may only be used in expressions"
, quoteType = error "core quasiquotes may only be used in expressions"
, quoteDec = error "core quasiquotes may only be used in expressions"
}
coreExpr :: QuasiQuoter
coreExpr = QuasiQuoter
{ quoteExp = qCoreExpr
, quotePat = error "core quasiquotes may only be used in expressions"
, quoteType = error "core quasiquotes may only be used in expressions"
, quoteDec = error "core quasiquotes may only be used in expressions"
}
qCore = undefined
qCoreExpr = undefined
-- qCore :: String -> Q Exp
-- qCore s = case lexCore s >>= parseCore of
-- Success a -> lift a
-- Error e _ _ -> error e
-- qCoreExpr :: String -> Q Exp
-- qCoreExpr s = case lexCore s >>= parseCoreExpr of
-- Success a -> lift a
-- Error e _ _ -> error e


@@ -71,7 +71,7 @@ compile prog = Just $ TiState s d h g stats
s = [mainAddr]
d = []
(h,g) = buildInitialHeap defs
defs = prog <> corePrelude
defs = insertModule corePrelude prog
stats = Stats 0 0 0
mainAddr = fromJust $ lookup "main" g
@@ -422,91 +422,91 @@ hdbgProg p hio = do
TiState [resAddr] _ h _ sts = last p'
res = hLookupUnsafe resAddr h
letrecExample :: Program
letrecExample = Program
[ ScDef "pair" ["x","y","f"] $ "f" :$ "x" :$ "y"
, ScDef "fst" ["p"] $ "p" :$ "K"
, ScDef "snd" ["p"] $ "p" :$ "K1"
, ScDef "f" ["x","y"] $
Let Rec
[ "a" := "pair" :$ "x" :$ "b"
, "b" := "pair" :$ "y" :$ "a"
]
("fst" :$ ("snd" :$ ("snd" :$ ("snd" :$ "a"))))
, ScDef "main" [] $ "f" :$ IntE 3 :$ IntE 4
]
-- letrecExample :: Program
-- letrecExample = Program
-- [ ScDef "pair" ["x","y","f"] $ "f" :$ "x" :$ "y"
-- , ScDef "fst" ["p"] $ "p" :$ "K"
-- , ScDef "snd" ["p"] $ "p" :$ "K1"
-- , ScDef "f" ["x","y"] $
-- Let Rec
-- [ "a" := "pair" :$ "x" :$ "b"
-- , "b" := "pair" :$ "y" :$ "a"
-- ]
-- ("fst" :$ ("snd" :$ ("snd" :$ ("snd" :$ "a"))))
-- , ScDef "main" [] $ "f" :$ IntE 3 :$ IntE 4
-- ]
idExample :: Program
idExample = Program
[ ScDef "main" [] $ "id" :$ IntE 3
]
-- idExample :: Program
-- idExample = Program
-- [ ScDef "main" [] $ "id" :$ IntE 3
-- ]
indExample1 :: Program
indExample1 = Program
[ ScDef "main" [] $ "twice" :$ "twice" :$ "id" :$ IntE 3
]
-- indExample1 :: Program
-- indExample1 = Program
-- [ ScDef "main" [] $ "twice" :$ "twice" :$ "id" :$ IntE 3
-- ]
indExample2 :: Program
indExample2 = Program
[ ScDef "main" [] $ "twice" :$ "twice" :$ "twice" :$ "id" :$ IntE 3
]
-- indExample2 :: Program
-- indExample2 = Program
-- [ ScDef "main" [] $ "twice" :$ "twice" :$ "twice" :$ "id" :$ IntE 3
-- ]
indExample3 :: Program
indExample3 = Program
[ ScDef "main" [] $
Let Rec
[ "x" := IntE 2
, "y" := "f" :$ "x" :$ "x"
]
("g" :$ "y" :$ "y")
, ScDef "f" ["a","b"] $ "b"
, ScDef "g" ["a","b"] $ "a"
]
-- indExample3 :: Program
-- indExample3 = Program
-- [ ScDef "main" [] $
-- Let Rec
-- [ "x" := IntE 2
-- , "y" := "f" :$ "x" :$ "x"
-- ]
-- ("g" :$ "y" :$ "y")
-- , ScDef "f" ["a","b"] $ "b"
-- , ScDef "g" ["a","b"] $ "a"
-- ]
negExample1 :: Program
negExample1 = Program
[ ScDef "main" [] $
"negate#" :$ ("id" :$ IntE 3)
]
-- negExample1 :: Program
-- negExample1 = Program
-- [ ScDef "main" [] $
-- "negate#" :$ ("id" :$ IntE 3)
-- ]
negExample2 :: Program
negExample2 = Program
[ ScDef "main" [] $
"negate#" :$ IntE 3
]
-- negExample2 :: Program
-- negExample2 = Program
-- [ ScDef "main" [] $
-- "negate#" :$ IntE 3
-- ]
negExample3 :: Program
negExample3 = Program
[ ScDef "main" [] $
"twice" :$ "negate#" :$ IntE 3
]
-- negExample3 :: Program
-- negExample3 = Program
-- [ ScDef "main" [] $
-- "twice" :$ "negate#" :$ IntE 3
-- ]
arithExample1 :: Program
arithExample1 = Program
[ ScDef "main" [] $
"+#" :$ (IntE 3) :$ ("negate#" :$ (IntE 2))
]
-- arithExample1 :: Program
-- arithExample1 = Program
-- [ ScDef "main" [] $
-- "+#" :$ (IntE 3) :$ ("negate#" :$ (IntE 2))
-- ]
arithExample2 :: Program
arithExample2 = Program
[ ScDef "main" [] $
"negate#" :$ ("+#" :$ (IntE 2) :$ ("*#" :$ IntE 5 :$ IntE 3))
]
-- arithExample2 :: Program
-- arithExample2 = Program
-- [ ScDef "main" [] $
-- "negate#" :$ ("+#" :$ (IntE 2) :$ ("*#" :$ IntE 5 :$ IntE 3))
-- ]
ifExample :: Program
ifExample = Program
[ ScDef "main" [] $
"if#" :$ "True" :$ IntE 2 :$ IntE 3
]
-- ifExample :: Program
-- ifExample = Program
-- [ ScDef "main" [] $
-- "if#" :$ "True" :$ IntE 2 :$ IntE 3
-- ]
facExample :: Program
facExample = Program
[ ScDef "fac" ["n"] $
"if#" :$ ("==#" :$ "n" :$ IntE 0)
:$ (IntE 1)
:$ ("*#" :$ "n" :$ ("fac" :$ ("-#" :$ "n" :$ IntE 1)))
, ScDef "main" [] $ "fac" :$ IntE 3
]
-- facExample :: Program
-- facExample = Program
-- [ ScDef "fac" ["n"] $
-- "if#" :$ ("==#" :$ "n" :$ IntE 0)
-- :$ (IntE 1)
-- :$ ("*#" :$ "n" :$ ("fac" :$ ("-#" :$ "n" :$ IntE 1)))
-- , ScDef "main" [] $ "fac" :$ IntE 3
-- ]
----------------------------------------------------------------------------------