remove bad, incorrect, outdated docs
@@ -63,52 +63,13 @@ an assembly target. The goal of our new G-Machine is to compile a *linear
 sequence of instructions* which, **when executed**, build up a graph
 representing the code.
 
-**************************
-Trees and Vines, in Theory
-**************************
+*************
+The G-Machine
+*************
 
-Rather than instantiating an expression at runtime -- traversing the AST and
-building a graph -- we want to compile all expressions at compile-time,
-generating a linear sequence of instructions which may be executed to build the
-graph.
-
-**************************
-Evaluation: Slurping Vines
-**************************
-
-WIP.
-
-Laziness
---------
-
-WIP.
-
-* Instead of :code:`Slide (n+1); Unwind`, do :code:`Update n; Pop n; Unwind`
-
-****************************
-Compilation: Squashing Trees
-****************************
-
-WIP.
-
-Notice that we do not keep a (local) environment at run-time. The environment
-only exists at compile-time to map local names to stack indices. When compiling
-a supercombinator, the arguments are enumerated from zero (the top of the
-stack), and passed to :code:`compileR` as an environment.
-
 .. literalinclude:: /../../src/GM.hs
 :dedent:
-:start-after: -- >> [ref/compileSc]
-:end-before: -- << [ref/compileSc]
-:caption: src/GM.hs
-
-Of course, variables being indexed relative to the top of the stack means that
-they will become inaccurate the moment we push or pop the stack a single time.
-The way around this is quite simple: simply offset the stack when w
-
-.. literalinclude:: /../../src/GM.hs
-:dedent:
-:start-after: -- >> [ref/compileC]
-:end-before: -- << [ref/compileC]
+:start-after: -- >> [ref/Instr]
+:end-before: -- << [ref/Instr]
 :caption: src/GM.hs
 
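For readers who want the gist of the removed compilation notes, here is a minimal sketch of a compile-time environment that maps local names to stack offsets and is shifted whenever an intermediate value is pushed. It is not the code from src/GM.hs: the `Expr` type, `scEnv`, `argOffset`, and the `Push`, `PushInt` and `MkAp` constructors are assumptions borrowed from the usual G-machine presentation, and `compileR` follows the `Update n; Pop n; Unwind` variant mentioned in the removed bullet.

```haskell
type Name = String

-- Hypothetical, cut-down expression type (an assumption for this sketch).
data Expr = Var Name
          | IntE Int
          | App Expr Expr
          deriving (Show, Eq)

-- Illustrative instruction subset; names may differ from src/GM.hs.
data Instr = Push Int
           | PushGlobal Name
           | PushInt Int
           | MkAp
           | Update Int
           | Pop Int
           | Unwind
           deriving (Show, Eq)

-- Exists only at compile-time: local name -> offset from the top of the stack.
type Env = [(Name, Int)]

-- Arguments are enumerated from zero, the top of the stack.
scEnv :: [Name] -> Env
scEnv args = zip args [0 ..]

-- Shift every offset by n to account for n extra items on the stack.
argOffset :: Int -> Env -> Env
argOffset n env = [ (v, i + n) | (v, i) <- env ]

-- Build the graph of an expression. After compiling the argument of an
-- application, one more item sits on the stack, so the function part is
-- compiled under an environment offset by one.
compileC :: Env -> Expr -> [Instr]
compileC env (Var v)   = maybe [PushGlobal v] (\i -> [Push i]) (lookup v env)
compileC _   (IntE n)  = [PushInt n]
compileC env (App f x) = compileC env x ++ compileC (argOffset 1 env) f ++ [MkAp]

-- Compile a supercombinator body of arity n, updating the root in place.
compileR :: Env -> Int -> Expr -> [Instr]
compileR env n e = compileC env e ++ [Update n, Pop n, Unwind]

compileSc :: [Name] -> Expr -> [Instr]
compileSc args body = compileR (scEnv args) (length args) body
```

Under these assumptions, compiling `f x y` with `x` at offset 0 and `y` at offset 1 pushes `y` first, after which `x` has moved to offset 1; that shift is what `argOffset` accounts for.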
@@ -62,159 +62,6 @@ braces and semicolons. In developing our *layout* rules, we will follow in the
 pattern of translating the whitespace-sensitive source language to an explicitly
 sectioned language.
 
-But What About Haskell?
-***********************
-
-Parsing Haskell -- and thus rl' -- is only slightly more complex than Python,
-but the design is certainly more sensitive.
-
-.. code-block:: haskell
-
--- line folds
-something = this is a
-single expression
-
--- an extremely common style found in haskell
-data Some = Data
-{ is :: Presented
-, in :: This
-, silly :: Style
-}
-
--- another style oddity
--- note that this is not a single
--- continued line! `look at`,
--- `this odd`, and `alignment` are all
--- discrete items!
-anotherThing = do look at
-this odd
-alignment
-
-But enough fear, lets actually think about implementation. Firstly, some
-formality: what do we mean when we say layout? We will define layout as the
-rules we apply to an implicitly-sectioned language in order to yield one that is
-explicitly-sectioned. We will also define indentation of a lexeme as the column
-number of its first character.
-
-Thankfully for us, our entry point is quite clear; layouts only appear after a
-select few keywords, (with a minor exception; TODO: elaborate) being :code:`let`
-(followed by supercombinators), :code:`where` (followed by supercombinators),
-:code:`do` (followed by expressions), and :code:`of` (followed by alternatives)
-(TODO: all of these terms need linked glossary entries). In order to manage the
-cascade of layout contexts, our lexer will record a stack for which each element
-is either :math:`\varnothing`, denoting an explicit layout written with braces
-and semicolons, or a :math:`\langle n \rangle`, denoting an implicitly laid-out
-layout where the start of each item belonging to the layout is indented
-:math:`n` columns.
-
-.. code-block:: haskell
-
--- layout stack: []
-module M where -- layout stack: [∅]
-
-f x = let -- layout keyword; remember indentation of next token
-y = w * w -- layout stack: [∅, <10>]
-w = x + x
--- layout ends here
-in do -- layout keyword; next token is a brace!
-{ -- layout stack: [∅]
-print y;
-print x;
-}
-
-Finally, we also need the concept of "virtual" brace tokens, which as far as
-we're concerned at this moment are exactly like normal brace tokens, except
-implicitly inserted by the compiler. With the presented ideas in mind, we may
-begin to introduce a small set of informal rules describing the lexer's handling
-of layouts, the first being:
-
-1. If a layout keyword is followed by the token '{', push :math:`\varnothing`
-onto the layout context stack. Otherwise, push :math:`\langle n \rangle` onto
-the layout context stack where :math:`n` is the indentation of the token
-following the layout keyword. Additionally, the lexer is to insert a virtual
-opening brace after the token representing the layout keyword.
-
-Consider the following observations from that previous code sample:
-
-* Function definitions should belong to a layout, each of which may start at
-column 1.
-
-* A layout can enclose multiple bodies, as seen in the :code:`let`-bindings and
-the :code:`do`-expression.
-
-* Semicolons should *terminate* items, rather than *separate* them.
-
-Our current focus is the semicolons. In an implicit layout, items are on
-separate lines each aligned with the previous. A naïve implementation would be
-to insert the semicolon token when the EOL is reached, but this proves unideal
-when you consider the alignment requirement. In our implementation, our lexer
-will wait until the first token on a new line is reached, then compare
-indentation and insert a semicolon if appropriate. This comparison -- the
-nondescript measurement of "more, less, or equal indentation" rather than a
-numeric value -- is referred to as *offside* by myself internally and the
-Haskell report describing layouts. We informally formalise this rule as follows:
-
-2. When the first token on a line is preceeded only by whitespace, if the
-token's first grapheme resides on a column number :math:`m` equal to the
-indentation level of the enclosing context -- i.e. the :math:`\langle n
-\rangle` on top of the layout stack. Should no such context exist on the
-stack, assume :math:`m > n`.
-
-We have an idea of how to begin layouts, delimit the enclosed items, and last
-we'll need to end layouts. This is where the distinction between virtual and
-non-virtual brace tokens comes into play. The lexer needs only partial concern
-towards closing layouts; the complete responsibility is shared with the parser.
-This will be elaborated on in the next section. For now, we will be content with
-naïvely inserting a virtual closing brace when a token is indented right of the
-layout.
-
-3. Under the same conditions as rule 2., when :math:`m < n` the lexer shall
-insert a virtual closing brace and pop the layout stack.
-
-This rule covers some cases including the top-level, however, consider
-tokenising the :code:`in` in a :code:`let`-expression. If our lexical analysis
-framework only allows for lexing a single token at a time, we cannot return both
-a virtual right-brace and a :code:`in`. Under this model, the lexer may simply
-pop the layout stack and return the :code:`in` token. As we'll see in the next
-section, as long as the lexer keeps track of its own context (i.e. the stack),
-the parser will cope just fine without the virtual end-brace.
-
-Parsing Lonely Braces
-*********************
-
-When viewed in the abstract, parsing and tokenising are near-identical tasks yet
-the two are very often decomposed into discrete systems with very different
-implementations. Lexers operate on streams of text and tokens, while parsers
-are typically far less linear, using a parse stack or recursing top-down. A
-big reason for this separation is state management: the parser aims to be as
-context-free as possible, while the lexer tends to burden the necessary
-statefulness. Still, the nature of a stream-oriented lexer makes backtracking
-difficult and quite inelegant.
-
-However, simply declaring a parse error to be not an error at all
-counterintuitively proves to be an elegant solution our layout problem which
-minimises backtracking and state in both the lexer and the parser. Consider the
-following definitions found in rlp's BNF:
-
-.. productionlist:: rlp
-VOpen : `vopen`
-VClose : `vclose` | `error`
-
-A parse error is recovered and treated as a closing brace. Another point of note
-in the BNF is the difference between virtual and non-virtual braces (TODO: i
-don't like that the BNF is formatted without newlines :/):
-
-.. productionlist:: rlp
-LetExpr : `let` VOpen Bindings VClose `in` Expr | `let` `{` Bindings `}` `in` Expr
-
-This ensures that non-virtual braces are closed explicitly.
-
-This set of rules is adequete enough to satisfy our basic concerns about line
-continations and layout lists. For a more pedantic description of the layout
-system, see `chapter 10
-<https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_ of the
-2010 Haskell Report, which I heavily referenced here.
-
 References
 ----------
 
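The three informal lexer rules in the removed layout section can be sketched roughly as a pass over an already-tokenised stream. This is only an illustration under stated assumptions, not rlp's lexer: the `Tok`, `Out` and `Ctx` types and the `layout` function are hypothetical, the input is assumed to carry each token's column and whether it is the first token on its line, and rule 2 is read as "insert a semicolon when the columns are equal", following the paragraph that precedes it. The special handling of `in` and the parser's error-recovery rule are deliberately left out.

```haskell
-- Layout contexts: Explicit stands for the empty-set marker of the text,
-- Implicit n for the <n> context (both names are assumptions).
data Ctx = Explicit | Implicit Int
  deriving (Show, Eq)

-- Hypothetical input token: 1-based column, whether it is the first token
-- on its line, and its text.
data Tok = Tok { tokCol :: Int, tokFirst :: Bool, tokText :: String }
  deriving (Show, Eq)

-- Output stream: real tokens plus the virtual ones the lexer inserts.
data Out = Real String | VOpen | VSemi | VClose
  deriving (Show, Eq)

layoutKeywords :: [String]
layoutKeywords = ["let", "where", "do", "of"]

layout :: [Tok] -> [Out]
layout = go []
  where
    -- End of input: close every implicit context that is still open.
    go stack [] = [ VClose | Implicit _ <- stack ]
    go stack (t : ts)
      | tokFirst t = offside stack t ts
      | otherwise  = emit stack t ts

    -- Rules 2 and 3: compare the column m of a line's first token with n.
    offside stack@(Implicit n : rest) t ts =
      case compare (tokCol t) n of
        EQ -> VSemi : emit stack t ts     -- rule 2: a new item begins
        LT -> VClose : offside rest t ts  -- rule 3: close, pop, re-check
        GT -> emit stack t ts             -- continuation of the current item
    -- Explicit or empty context: treated as m > n, nothing is inserted.
    offside stack t ts = emit stack t ts

    emit stack t ts = Real (tokText t) : next
      where
        next
          | tokText t `elem` layoutKeywords = open stack ts
          | tokText t == "}", Explicit : rest <- stack = go rest ts
          | otherwise = go stack ts

    -- Rule 1: decide the kind of context from the token after the keyword.
    open stack (b : ts)
      | tokText b == "{" = Real "{" : go (Explicit : stack) ts
      | otherwise        = VOpen : emit (Implicit (tokCol b) : stack) b ts
    open stack [] = go stack []
```

The token that opens an implicit layout is emitted without an offside check, so a binding that starts on the line after `let` does not receive a spurious leading semicolon. Note that this sketch ends up separating items with semicolons rather than terminating them.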
@@ -93,6 +93,7 @@ data Key = NameKey Name
 | ConstrKey Tag Int
 deriving (Show, Eq)
 
+-- >> [ref/Instr]
 data Instr = Unwind
 | PushGlobal Name
 | PushConstr Tag Int
@@ -114,6 +115,7 @@ data Instr = Unwind
 | Print
 | Halt
 deriving (Show, Eq)
+-- << [ref/Instr]
 
 data Node = NNum Int
 | NAp Addr Addr