remove bad, incorrct, outdated docs

This commit is contained in:
crumbtoo
2024-02-13 13:20:39 -07:00
parent c57da862ae
commit ccc71a751c
3 changed files with 7 additions and 197 deletions

View File

@@ -63,52 +63,13 @@ an assembly target. The goal of our new G-Machine is to compile a *linear
sequence of instructions* which, **when executed**, build up a graph
representing the code.
**************************
Trees and Vines, in Theory
**************************
Rather than instantiating an expression at runtime -- traversing the AST and
building a graph -- we want to compile all expressions at compile-time,
generating a linear sequence of instructions which may be executed to build the
graph.
**************************
Evaluation: Slurping Vines
**************************
WIP.
Laziness
--------
WIP.
* Instead of :code:`Slide (n+1); Unwind`, do :code:`Update n; Pop n; Unwind`
****************************
Compilation: Squashing Trees
****************************
WIP.
Notice that we do not keep a (local) environment at run-time. The environment
only exists at compile-time to map local names to stack indices. When compiling
a supercombinator, the arguments are enumerated from zero (the top of the
stack), and passed to :code:`compileR` as an environment.
*************
The G-Machine
*************
.. literalinclude:: /../../src/GM.hs
:dedent:
:start-after: -- >> [ref/compileSc]
:end-before: -- << [ref/compileSc]
:caption: src/GM.hs
Of course, variables being indexed relative to the top of the stack means that
they will become inaccurate the moment we push or pop the stack a single time.
The way around this is quite simple: simply offset the stack when w
.. literalinclude:: /../../src/GM.hs
:dedent:
:start-after: -- >> [ref/compileC]
:end-before: -- << [ref/compileC]
:start-after: -- >> [ref/Instr]
:end-before: -- << [ref/Instr]
:caption: src/GM.hs

View File

@@ -62,159 +62,6 @@ braces and semicolons. In developing our *layout* rules, we will follow in the
pattern of translating the whitespace-sensitive source language to an explicitly
sectioned language.
But What About Haskell?
***********************
Parsing Haskell -- and thus rl' -- is only slightly more complex than Python,
but the design is certainly more sensitive.
.. code-block:: haskell
-- line folds
something = this is a
single expression
-- an extremely common style found in haskell
data Some = Data
{ is :: Presented
, in :: This
, silly :: Style
}
-- another style oddity
-- note that this is not a single
-- continued line! `look at`,
-- `this odd`, and `alignment` are all
-- discrete items!
anotherThing = do look at
this odd
alignment
But enough fear, lets actually think about implementation. Firstly, some
formality: what do we mean when we say layout? We will define layout as the
rules we apply to an implicitly-sectioned language in order to yield one that is
explicitly-sectioned. We will also define indentation of a lexeme as the column
number of its first character.
Thankfully for us, our entry point is quite clear; layouts only appear after a
select few keywords, (with a minor exception; TODO: elaborate) being :code:`let`
(followed by supercombinators), :code:`where` (followed by supercombinators),
:code:`do` (followed by expressions), and :code:`of` (followed by alternatives)
(TODO: all of these terms need linked glossary entries). In order to manage the
cascade of layout contexts, our lexer will record a stack for which each element
is either :math:`\varnothing`, denoting an explicit layout written with braces
and semicolons, or a :math:`\langle n \rangle`, denoting an implicitly laid-out
layout where the start of each item belonging to the layout is indented
:math:`n` columns.
.. code-block:: haskell
-- layout stack: []
module M where -- layout stack: [∅]
f x = let -- layout keyword; remember indentation of next token
y = w * w -- layout stack: [∅, <10>]
w = x + x
-- layout ends here
in do -- layout keyword; next token is a brace!
{ -- layout stack: [∅]
print y;
print x;
}
Finally, we also need the concept of "virtual" brace tokens, which as far as
we're concerned at this moment are exactly like normal brace tokens, except
implicitly inserted by the compiler. With the presented ideas in mind, we may
begin to introduce a small set of informal rules describing the lexer's handling
of layouts, the first being:
1. If a layout keyword is followed by the token '{', push :math:`\varnothing`
onto the layout context stack. Otherwise, push :math:`\langle n \rangle` onto
the layout context stack where :math:`n` is the indentation of the token
following the layout keyword. Additionally, the lexer is to insert a virtual
opening brace after the token representing the layout keyword.
Consider the following observations from that previous code sample:
* Function definitions should belong to a layout, each of which may start at
column 1.
* A layout can enclose multiple bodies, as seen in the :code:`let`-bindings and
the :code:`do`-expression.
* Semicolons should *terminate* items, rather than *separate* them.
Our current focus is the semicolons. In an implicit layout, items are on
separate lines each aligned with the previous. A naïve implementation would be
to insert the semicolon token when the EOL is reached, but this proves unideal
when you consider the alignment requirement. In our implementation, our lexer
will wait until the first token on a new line is reached, then compare
indentation and insert a semicolon if appropriate. This comparison -- the
nondescript measurement of "more, less, or equal indentation" rather than a
numeric value -- is referred to as *offside* by myself internally and the
Haskell report describing layouts. We informally formalise this rule as follows:
2. When the first token on a line is preceeded only by whitespace, if the
token's first grapheme resides on a column number :math:`m` equal to the
indentation level of the enclosing context -- i.e. the :math:`\langle n
\rangle` on top of the layout stack. Should no such context exist on the
stack, assume :math:`m > n`.
We have an idea of how to begin layouts, delimit the enclosed items, and last
we'll need to end layouts. This is where the distinction between virtual and
non-virtual brace tokens comes into play. The lexer needs only partial concern
towards closing layouts; the complete responsibility is shared with the parser.
This will be elaborated on in the next section. For now, we will be content with
naïvely inserting a virtual closing brace when a token is indented right of the
layout.
3. Under the same conditions as rule 2., when :math:`m < n` the lexer shall
insert a virtual closing brace and pop the layout stack.
This rule covers some cases including the top-level, however, consider
tokenising the :code:`in` in a :code:`let`-expression. If our lexical analysis
framework only allows for lexing a single token at a time, we cannot return both
a virtual right-brace and a :code:`in`. Under this model, the lexer may simply
pop the layout stack and return the :code:`in` token. As we'll see in the next
section, as long as the lexer keeps track of its own context (i.e. the stack),
the parser will cope just fine without the virtual end-brace.
Parsing Lonely Braces
*********************
When viewed in the abstract, parsing and tokenising are near-identical tasks yet
the two are very often decomposed into discrete systems with very different
implementations. Lexers operate on streams of text and tokens, while parsers
are typically far less linear, using a parse stack or recursing top-down. A
big reason for this separation is state management: the parser aims to be as
context-free as possible, while the lexer tends to burden the necessary
statefulness. Still, the nature of a stream-oriented lexer makes backtracking
difficult and quite inelegant.
However, simply declaring a parse error to be not an error at all
counterintuitively proves to be an elegant solution our layout problem which
minimises backtracking and state in both the lexer and the parser. Consider the
following definitions found in rlp's BNF:
.. productionlist:: rlp
VOpen : `vopen`
VClose : `vclose` | `error`
A parse error is recovered and treated as a closing brace. Another point of note
in the BNF is the difference between virtual and non-virtual braces (TODO: i
don't like that the BNF is formatted without newlines :/):
.. productionlist:: rlp
LetExpr : `let` VOpen Bindings VClose `in` Expr | `let` `{` Bindings `}` `in` Expr
This ensures that non-virtual braces are closed explicitly.
This set of rules is adequete enough to satisfy our basic concerns about line
continations and layout lists. For a more pedantic description of the layout
system, see `chapter 10
<https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_ of the
2010 Haskell Report, which I heavily referenced here.
References
----------

View File

@@ -93,6 +93,7 @@ data Key = NameKey Name
| ConstrKey Tag Int
deriving (Show, Eq)
-- >> [ref/Instr]
data Instr = Unwind
| PushGlobal Name
| PushConstr Tag Int
@@ -114,6 +115,7 @@ data Instr = Unwind
| Print
| Halt
deriving (Show, Eq)
-- << [ref/Instr]
data Node = NNum Int
| NAp Addr Addr