diff --git a/doc/src/commentary/gm.rst b/doc/src/commentary/gm.rst
index d1ae166..e471c95 100644
--- a/doc/src/commentary/gm.rst
+++ b/doc/src/commentary/gm.rst
@@ -63,52 +63,13 @@ an assembly target.
 The goal of our new G-Machine is to compile a *linear sequence of instructions*
 which, **when executed**, build up a graph representing the code.
 
-**************************
-Trees and Vines, in Theory
-**************************
-
-Rather than instantiating an expression at runtime -- traversing the AST and
-building a graph -- we want to compile all expressions at compile-time,
-generating a linear sequence of instructions which may be executed to build the
-graph.
-
-**************************
-Evaluation: Slurping Vines
-**************************
-
-WIP.
-
-Laziness
---------
-
-WIP.
-
-* Instead of :code:`Slide (n+1); Unwind`, do :code:`Update n; Pop n; Unwind`
-
-****************************
-Compilation: Squashing Trees
-****************************
-
-WIP.
-
-Notice that we do not keep a (local) environment at run-time. The environment
-only exists at compile-time to map local names to stack indices. When compiling
-a supercombinator, the arguments are enumerated from zero (the top of the
-stack), and passed to :code:`compileR` as an environment.
+*************
+The G-Machine
+*************
 
 .. literalinclude:: /../../src/GM.hs
    :dedent:
-   :start-after: -- >> [ref/compileSc]
-   :end-before: -- << [ref/compileSc]
-   :caption: src/GM.hs
-
-Of course, variables being indexed relative to the top of the stack means that
-they will become inaccurate the moment we push or pop the stack a single time.
-The way around this is quite simple: simply offset the stack when w
-
-.. literalinclude:: /../../src/GM.hs
-   :dedent:
-   :start-after: -- >> [ref/compileC]
-   :end-before: -- << [ref/compileC]
+   :start-after: -- >> [ref/Instr]
+   :end-before: -- << [ref/Instr]
    :caption: src/GM.hs
 
diff --git a/doc/src/commentary/layout-lexing.rst b/doc/src/commentary/layout-lexing.rst
index 2039b35..e000c3a 100644
--- a/doc/src/commentary/layout-lexing.rst
+++ b/doc/src/commentary/layout-lexing.rst
@@ -62,159 +62,6 @@ braces and semicolons. In developing our *layout* rules, we will follow in the
 pattern of translating the whitespace-sensitive source language to an
 explicitly sectioned language.
 
-But What About Haskell?
-***********************
-
-Parsing Haskell -- and thus rl' -- is only slightly more complex than Python,
-but the design is certainly more sensitive.
-
-.. code-block:: haskell
-
-    -- line folds
-    something = this is a
-                single expression
-
-    -- an extremely common style found in haskell
-    data Some = Data
-        { is :: Presented
-        , in :: This
-        , silly :: Style
-        }
-
-    -- another style oddity
-    -- note that this is not a single
-    -- continued line! `look at`,
-    -- `this odd`, and `alignment` are all
-    -- discrete items!
-    anotherThing = do look at
-                      this odd
-                      alignment
-
-But enough fear, lets actually think about implementation. Firstly, some
-formality: what do we mean when we say layout? We will define layout as the
-rules we apply to an implicitly-sectioned language in order to yield one that is
-explicitly-sectioned. We will also define indentation of a lexeme as the column
-number of its first character.
-
-Thankfully for us, our entry point is quite clear; layouts only appear after a
-select few keywords, (with a minor exception; TODO: elaborate) being :code:`let`
-(followed by supercombinators), :code:`where` (followed by supercombinators),
-:code:`do` (followed by expressions), and :code:`of` (followed by alternatives)
-(TODO: all of these terms need linked glossary entries). In order to manage the
-cascade of layout contexts, our lexer will record a stack for which each element
-is either :math:`\varnothing`, denoting an explicit layout written with braces
-and semicolons, or a :math:`\langle n \rangle`, denoting an implicitly laid-out
-layout where the start of each item belonging to the layout is indented
-:math:`n` columns.
-
-.. code-block:: haskell
-
-    -- layout stack: []
-    module M where          -- layout stack: [∅]
-
-    f x = let               -- layout keyword; remember indentation of next token
-             y = w * w      -- layout stack: [∅, <10>]
-             w = x + x
-             -- layout ends here
-        in do               -- layout keyword; next token is a brace!
-        {                   -- layout stack: [∅]
-            print y;
-            print x;
-        }
-
-Finally, we also need the concept of "virtual" brace tokens, which as far as
-we're concerned at this moment are exactly like normal brace tokens, except
-implicitly inserted by the compiler. With the presented ideas in mind, we may
-begin to introduce a small set of informal rules describing the lexer's handling
-of layouts, the first being:
-
-1. If a layout keyword is followed by the token '{', push :math:`\varnothing`
-   onto the layout context stack. Otherwise, push :math:`\langle n \rangle` onto
-   the layout context stack where :math:`n` is the indentation of the token
-   following the layout keyword. Additionally, the lexer is to insert a virtual
-   opening brace after the token representing the layout keyword.
-
-Consider the following observations from that previous code sample:
-
-* Function definitions should belong to a layout, each of which may start at
-  column 1.
-
-* A layout can enclose multiple bodies, as seen in the :code:`let`-bindings and
-  the :code:`do`-expression.
-
-* Semicolons should *terminate* items, rather than *separate* them.
-
-Our current focus is the semicolons. In an implicit layout, items are on
-separate lines each aligned with the previous. A naïve implementation would be
-to insert the semicolon token when the EOL is reached, but this proves unideal
-when you consider the alignment requirement. In our implementation, our lexer
-will wait until the first token on a new line is reached, then compare
-indentation and insert a semicolon if appropriate. This comparison -- the
-nondescript measurement of "more, less, or equal indentation" rather than a
-numeric value -- is referred to as *offside* by myself internally and the
-Haskell report describing layouts. We informally formalise this rule as follows:
-
-2. When the first token on a line is preceeded only by whitespace, if the
-   token's first grapheme resides on a column number :math:`m` equal to the
-   indentation level of the enclosing context -- i.e. the :math:`\langle n
-   \rangle` on top of the layout stack. Should no such context exist on the
-   stack, assume :math:`m > n`.
-
-We have an idea of how to begin layouts, delimit the enclosed items, and last
-we'll need to end layouts. This is where the distinction between virtual and
-non-virtual brace tokens comes into play. The lexer needs only partial concern
-towards closing layouts; the complete responsibility is shared with the parser.
-This will be elaborated on in the next section. For now, we will be content with
-naïvely inserting a virtual closing brace when a token is indented right of the
-layout.
-
-3. Under the same conditions as rule 2., when :math:`m < n` the lexer shall
-   insert a virtual closing brace and pop the layout stack.
-
-This rule covers some cases including the top-level, however, consider
-tokenising the :code:`in` in a :code:`let`-expression. If our lexical analysis
-framework only allows for lexing a single token at a time, we cannot return both
-a virtual right-brace and a :code:`in`. Under this model, the lexer may simply
-pop the layout stack and return the :code:`in` token. As we'll see in the next
-section, as long as the lexer keeps track of its own context (i.e. the stack),
-the parser will cope just fine without the virtual end-brace.
-
-Parsing Lonely Braces
-*********************
-
-When viewed in the abstract, parsing and tokenising are near-identical tasks yet
-the two are very often decomposed into discrete systems with very different
-implementations. Lexers operate on streams of text and tokens, while parsers
-are typically far less linear, using a parse stack or recursing top-down. A
-big reason for this separation is state management: the parser aims to be as
-context-free as possible, while the lexer tends to burden the necessary
-statefulness. Still, the nature of a stream-oriented lexer makes backtracking
-difficult and quite inelegant.
-
-However, simply declaring a parse error to be not an error at all
-counterintuitively proves to be an elegant solution our layout problem which
-minimises backtracking and state in both the lexer and the parser. Consider the
-following definitions found in rlp's BNF:
-
-.. productionlist:: rlp
-   VOpen : `vopen`
-   VClose : `vclose` | `error`
-
-A parse error is recovered and treated as a closing brace. Another point of note
-in the BNF is the difference between virtual and non-virtual braces (TODO: i
-don't like that the BNF is formatted without newlines :/):
-
-.. productionlist:: rlp
-   LetExpr : `let` VOpen Bindings VClose `in` Expr | `let` `{` Bindings `}` `in` Expr
-
-This ensures that non-virtual braces are closed explicitly.
-
-This set of rules is adequete enough to satisfy our basic concerns about line
-continations and layout lists. For a more pedantic description of the layout
-system, see `chapter 10
-`_ of the
-2010 Haskell Report, which I heavily referenced here.
-
 References
 ----------
 
diff --git a/src/GM.hs b/src/GM.hs
index d4493cf..c815e83 100644
--- a/src/GM.hs
+++ b/src/GM.hs
@@ -93,6 +93,7 @@ data Key = NameKey Name
          | ConstrKey Tag Int
          deriving (Show, Eq)
 
+-- >> [ref/Instr]
 data Instr = Unwind
            | PushGlobal Name
            | PushConstr Tag Int
@@ -114,6 +115,7 @@ data Instr = Unwind
            | Print
            | Halt
            deriving (Show, Eq)
+-- << [ref/Instr]
 
 data Node = NNum Int
           | NAp Addr Addr