* update readme

* Literal -> Lit, LitE -> Lit

* commentary

* infer

* hindley milner inference :D

* comments and better type errors

* type IsString + test unification error

* infer nonrec let binds

infer nonrec let binds

* small

* LitE -> Lit

* LitE -> Lit

* TyInt -> TyCon "Int#"

* parse type sigs; program type sigs

* parse types

* parse programs (with types :D)

* parse programs (with type sigs :D)

* Name = Text

Name = Text

* RlpcError

* i'm on an airplane rn, my eyelids grow heavy, and i forgot my medication. should this be my final commit (of the week): gootbye

* kinda sorta typechecking

* back and medicated!

* errorful (it's not good)

* type-checked quasiquoters

* fix hm tests

* Compiler.JustRun

* lex \ instead of \\

* grammar reference

* 4:00 AM psychopath code

* oh boy am i going to hate this code in 12 hours

* application and lits

appl

* something

* goofy

* Show1 instances

* fixation fulfilled - back to work!

* works

* labels

* infix decl

* expr fixups

* where

* cool

* aaaaa

* decls fix

* finally in a decent state

* replace uses of many+satisfy with takeWhileP

* layout

layouts

oh my layouts

* i did not realise my fs is case insensitive

* tysigs

* add version bounds

* its fine

* threaded lexer

* decent starting point

* man this sucks

* aagh

* okay layouts kinda

* kitten i'll be honest mommy's about to kill herself

* see previous commit and scale back the part where i'm joking

* version bounds

* we're so back

* fixy

* cool

* FIX REAL

* oh my god

* works

* now we're fucking GETTING SOMEWHERE

* i really need to learn git proper

* infix exprs

* remove debug flags

* renamerlp

* rename rlp

* compiles (kill me)

man

* RlpcError -> IsRlpcError

* when the "Test suite rlp-test: PASS" hits

i'm like atlas and the world is writing two lines of code

* errorful parser

* errorful parser

small

* msgenvelope

* errors!

* allow uppercase sc names in preparation for Rlp2Core

* letrec

* infer letrec expressions

* minor docs

* checklist

* minor docs

* stable enough for a demo hey?

* small fixups

* new tag syntax; preparing for Core patterns

new tag syntax; preparing for data names

* temporary pragma system

* resolve named data in case exprs

* named constr tests

* nearing release :3

* minor changes

putting this on hold; implementing TTG first

* some

* oh my god guys!!! `Located` is a lax semimonoidal endofunctor on the category Hask!!!

![abstractionjak](https://media.discordapp.net/attachments/1101767463579951154/1200248978642567168/3877820-20SoyBooru.png?ex=65c57df8&is=65b308f8&hm=67da3acb61861cab6156df014b397d78fb8815fa163f2e992474d545beb668ba&=&format=webp&quality=lossless&width=880&height=868)

* it's also a comonad. lol.

* idk

* show

* abandon ship

* at long last

more

no more undefineds

* i should've made a lisp man this sucks

* let layout

* ttg boilerplate

* fixup! ttg boilerplate

* fixup! ttg boilerplate

* organisation and cleaning

organisation and tidying

* error messages

* driver progress

* formatting

* *R functions

* -ddump-ast

* debug tags

* -ddump-eval

* core driver

* XRec fix

* rlp2core base

* ccoool

* something

* rlp TH

* sc

* expandableAlt

* expandableAlt

* fix layout_let

* parse case exprs

* case unrolling

* rose

* her light cuts deep time and time again

('her' of course referring to the field of computer science)

* tidying

* NameSupply effect

* tidy

* fix incomplete byTag

* desugar

* WIP associate postproc

corecursive

* sigh i'm gonna have to nuke the ast again in a month

* remove old files

* remove old files

* fix top-level layout

* define datatags

* diagram

* diagram

* Update README.md

* ppr debug flags

ddump-parsed

* ppr typesigs

* ppr datatags

* remove unnecessary comment

* tidying

* .hs -> .cr

update examples

* fix evil parser bug (it was a fucking typo)

* fix evil lexer bug (it was actually quite subtle unlike prev.)

* examples

* examples

* letrec + typechecking core

* Update README.md

* Rlp2Core: simple let binds

* Rlp2Core: pattern let binds

* small core fixes

* update examples

* formatting

* typed coreExpr quoter

* typechecking things

* lt

* decent state!

* constants for bool tags

* print# gm primitive

* bind VarP after pats

* fix: tag nested data names

* gte gm prim

* more nightmare GM fixes

* QuickSort example works i'm going to cry

* remove debug code

* remove debug tracers

* ready?

* update readme

* remove bad, incorrect, outdated docs

---------

Co-authored-by: crumbtoo <crumb@disroot.org>


@@ -1,16 +1,24 @@
The *G-Machine*
===============
The G-Machine (graph machine) is the current heart of rlpc, until we potentially
move onto an STG (spineless tagless G-machine) or a TIM (three-instruction
machine). rl' source code is desugared into Core, a dumbed-down subset of rl',
then compiled to G-Machine code, which is finally translated to the desired
target.
**********
Motivation
**********
Our initial model, the *Template Instantiator* (TI), was a very straightforward
solution to compilation, but its core design has a major Achilles' heel:
compilation is interleaved with evaluation. The heap nodes for supercombinators
hold uninstantiated expressions, i.e. raw ASTs straight from the parser. When a
supercombinator is found on the stack during evaluation, the template expression
is instantiated (compiled) on the spot. This makes translation to an assembly
target difficult, undermining the point of an intermediate language.
.. math::
\transrule
@@ -31,7 +39,7 @@ evaluation, the template expression is instantiated (compiled) on the spot.
\text{where } h' = \mathtt{instantiateU} \; e \; a_n \; h \; g
}
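To make the flaw concrete, here is a toy sketch of a TI-style heap and
instantiation, under assumed names and types (the real definitions differ):

.. code-block:: haskell

    import qualified Data.Map as M

    type Name = String
    type Addr = Int

    data Expr = Var Name | Num Int | Ap Expr Expr

    -- the heap node for a supercombinator holds a raw, uninstantiated AST
    data Node = NAp Addr Addr
              | NSupercomb Name [Name] Expr
              | NNum Int

    type Heap = (Addr, M.Map Addr Node)

    hAlloc :: Heap -> Node -> (Heap, Addr)
    hAlloc (next, m) node = ((next + 1, M.insert next node m), next)

    -- instantiation is a full tree traversal, performed mid-evaluation
    -- every time the supercombinator is reduced
    instantiate :: Expr -> Heap -> M.Map Name Addr -> (Heap, Addr)
    instantiate (Num n)  h _   = hAlloc h (NNum n)
    instantiate (Var x)  h env = (h, env M.! x)
    instantiate (Ap f x) h env =
        let (h1, a1) = instantiate f h  env
            (h2, a2) = instantiate x h1 env
        in  hAlloc h2 (NAp a1 a2)

Nothing about this traversal maps cleanly onto a linear instruction stream,
which is precisely what the G-Machine sets out to fix.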
The process of instantiating a supercombinator goes something like this:
1. Augment the environment with bindings to the arguments.
@@ -52,53 +60,16 @@ The process of instantiating a supercombinator goes something like this
Instantiating the supercombinator's body in this way is the root of our
Achilles' heel. Traversing a tree structure is a very non-linear task unfit for
an assembly target. The goal of our new G-Machine is to compile a *linear
sequence of instructions* which, **when executed**, build up a graph
representing the code.
**************************
Trees and Vines, in Theory
**************************
WIP.
**************************
Evaluation: Slurping Vines
**************************
WIP.
Laziness
--------
WIP.
* Instead of :code:`Slide (n+1); Unwind`, do :code:`Update n; Pop n; Unwind`
****************************
Compilation: Squashing Trees
****************************
WIP.
Notice that we do not keep a (local) environment at run-time. The environment
only exists at compile-time to map local names to stack indices. When compiling
a supercombinator, the arguments are enumerated from zero (the top of the
stack), and passed to :code:`compileR` as an environment.
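As a rough illustration of this scheme, here is a minimal compilation sketch
for a toy Core, in the style of the tutorial G-Machine; the names are
illustrative, and the real definitions live in :code:`src/GM.hs`:

.. code-block:: haskell

    type Name = String

    data Expr = Var Name | IntE Int | App Expr Expr

    data Instr = Push Int | PushInt Int | PushGlobal Name | MkAp
               | Update Int | Pop Int | Unwind
        deriving Show

    -- the environment exists only at compile time, mapping local names
    -- to stack indices
    type Env = [(Name, Int)]

    compileSc :: (Name, [Name], Expr) -> (Name, Int, [Instr])
    compileSc (f, args, body) = (f, d, compileR body (zip args [0..]) d)
        where d = length args

    -- compile code that builds the body's graph, then tidy the stack
    -- and continue unwinding
    compileR :: Expr -> Env -> Int -> [Instr]
    compileR e env d = compileC e env ++ [Update d, Pop d, Unwind]

    -- construct the graph of the expression on top of the stack
    compileC :: Expr -> Env -> [Instr]
    compileC (Var x) env = case lookup x env of
        Just n  -> [Push n]        -- a local: an index from the stack top
        Nothing -> [PushGlobal x]  -- otherwise, a supercombinator
    compileC (IntE n)  _   = [PushInt n]
    compileC (App f x) env =
        compileC x env ++ compileC f (argOffset 1 env) ++ [MkAp]

    -- pushing moves the stack top, so every recorded index must shift
    argOffset :: Int -> Env -> Env
    argOffset n = map (fmap (+ n))

Under this sketch, :code:`f x y = g y x` compiles to :code:`[Push 0, Push 2,
PushGlobal "g", MkAp, MkAp, Update 2, Pop 2, Unwind]` -- a flat list of
instructions, no trees in sight.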
*************
The G-Machine
*************
.. literalinclude:: /../../src/GM.hs
:dedent:
:start-after: -- >> [ref/Instr]
:end-before: -- << [ref/Instr]
:caption: src/GM.hs
Of course, variables being indexed relative to the top of the stack means that
they become inaccurate the moment we push or pop the stack a single time. The
way around this is quite simple: offset every index recorded in the environment
whenever we push or pop.
.. literalinclude:: /../../src/GM.hs
:dedent:
:start-after: -- >> [ref/compileC]
:end-before: -- << [ref/compileC]
:caption: src/GM.hs


@@ -2,16 +2,21 @@ Lexing, Parsing, and Layouts
============================
The C-style languages of my previous experiences have all had quite trivial
lexical analysis stages: you ignore all whitespace and point out the symbols you
recognise. If you don't recognise something, check if it's a literal or an
identifier. Should it be neither, return an error.
In contrast, both lexing and parsing a Haskell-like language pose a number of
greater challenges. Listed by ascending intimidation factor, some of the
potential roadblocks on my mind before making an attempt were:
* Context-sensitive keywords; Haskell allows for some words to be used as
identifiers in appropriate contexts, such as :code:`family`, :code:`role`,
:code:`as`. Reading a note_ found in `GHC's lexer`_, it appears that keywords
are only considered in bodies for which their use is relevant, e.g.
:code:`family` and :code:`role` in type declarations, :code:`as` after
:code:`case`; :code:`if`, :code:`then`, and :code:`else` in expressions, etc.
* Operators; Haskell has not only user-defined infix operators, but user-defined
precedence levels and associativities. I recall using an algorithm that looked
infix, prefix, postfix, and even mixfix operators up in a global table to
@@ -19,17 +24,9 @@ potential roadblocks on my mind before making an attempt were:
stored in the table). I never modified the table at runtime; however, this
could be a very nice solution for Haskell.
* Whitespace sensitivity; While I was comfortable with the idea of a system
similar to Python's INDENT/DEDENT tokens, Haskell's layout system is based on
alignment and is very generous with line-folding.
.. _note: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/coding-style#2-using-notes
.. _GHC's lexer: https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L1133
@@ -45,9 +42,9 @@ We will compare and contrast with Python's lexical analysis. Much to my dismay,
Python uses newlines and indentation to separate statements and resolve scope
instead of the traditional semicolons and braces found in C-style languages (we
may generally refer to these C-style languages as *explicitly-sectioned*).
Internally during tokenisation, when the Python lexer encounters a new line, the
indentation of the new line is compared with that of the previous, and the
following rules are applied (a sketch in code follows the list):
1. If the new line has greater indentation than the previous, insert an INDENT
token and push the new line's indentation level onto the indentation stack
@@ -60,170 +57,10 @@ following rules:
3. If the indentation is equal, insert a NEWLINE token to terminate the previous
line, and leave it at that!
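These rules condense into a small state machine. Below is a sketch with assumed
token names; the middle rule (not shown above) pops the stack, emitting a
DEDENT for each level popped, until the indentation matches:

.. code-block:: haskell

    data PyTok = INDENT | DEDENT | NEWLINE
        deriving Show

    -- given the indentation stack and a new line's indentation,
    -- emit tokens and return the updated stack
    indentRule :: [Int] -> Int -> ([PyTok], [Int])
    indentRule st@(n:_) m
        | m > n     = ([INDENT], m : st)                 -- rule 1
        | m < n     = let (popped, st') = span (> m) st  -- rule 2
                      in (map (const DEDENT) popped, st')
        | otherwise = ([NEWLINE], st)                    -- rule 3
    indentRule [] m = ([INDENT], [m])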
Parsing Python with the INDENT, DEDENT, and NEWLINE tokens is identical to
parsing a language with braces and semicolons. This is a solution very much in
line with Python's philosophy of the "one correct answer" (TODO: this needs a
source). In developing our *layout* rules, we will follow the pattern of
translating the whitespace-sensitive source language to an explicitly-sectioned
language.
But What About Haskell?
***********************
We saw that Python, the most notable example of an implicitly sectioned
language, is pretty simple to lex. Why then am I so afraid of Haskell's layouts?
To be frank, I'm far less scared after asking myself this -- however, there are
certainly some new complexities that Python needn't concern itself with. Haskell
has implicit line *continuation*: forms written over multiple lines; the
indentation styles often seen in Haskell are somewhat esoteric compared to
Python's "s/[{};]//".
.. code-block:: haskell
-- line continuation
something = this is a
single expression
-- an extremely common style found in haskell
data Python = Users
{ are :: Crying
, right :: About
, now :: Sorry
}
-- another formatting oddity
-- note that this is not a single
-- continued line! `look at`,
-- `this`, and `alignment` are all
-- separate expressions!
anotherThing = do look at
this
alignment
But enough fear, let's actually think about implementation. First, some
formality: what do we mean when we say layout? We will define *layout* as the
rules we apply to an implicitly-sectioned language in order to yield one that is
explicitly-sectioned. We will also define the *indentation* of a lexeme as the
column number of its first character.
Thankfully for us, our entry point is quite clear; layouts only appear after a
select few keywords (with a minor exception; TODO: elaborate), being :code:`let`
(followed by supercombinators), :code:`where` (followed by supercombinators),
:code:`do` (followed by expressions), and :code:`of` (followed by alternatives)
(TODO: all of these terms need linked glossary entries). In order to manage the
cascade of layout contexts, our lexer will record a stack in which each element
is either :math:`\varnothing`, denoting an explicit layout written with braces
and semicolons, or :math:`\langle n \rangle`, denoting an implicit layout where
the start of each item belonging to the layout is indented :math:`n` columns.
.. code-block:: haskell
-- layout stack: []
module M where -- layout stack: [∅]
f x = let -- layout keyword; remember indentation of next token
y = w * w -- layout stack: [∅, <10>]
w = x + x
-- layout ends here
in do -- layout keyword; next token is a brace!
{ -- layout stack: [∅]
print y;
print x;
}
Finally, we also need the concept of "virtual" brace tokens, which as far as
we're concerned at this moment are exactly like normal brace tokens, except
implicitly inserted by the compiler. With the presented ideas in mind, we may
begin to introduce a small set of informal rules describing the lexer's handling
of layouts, the first being:
1. If a layout keyword is followed by the token '{', push :math:`\varnothing`
onto the layout context stack. Otherwise, push :math:`\langle n \rangle` onto
the layout context stack where :math:`n` is the indentation of the token
following the layout keyword. Additionally, the lexer is to insert a virtual
opening brace after the token representing the layout keyword.
Consider the following observations from that previous code sample:
* Function definitions should belong to a layout, each of which may start at
column 1.
* A layout can enclose multiple bodies, as seen in the :code:`let`-bindings and
the :code:`do`-expression.
* Semicolons should *terminate* items, rather than *separate* them.
Our current focus is the semicolons. In an implicit layout, items sit on
separate lines, each aligned with the previous. A naïve implementation would be
to insert the semicolon token when the EOL is reached, but this proves less than
ideal when you consider the alignment requirement. In our implementation, the
lexer will wait until the first token on a new line is reached, then compare
indentation and insert a semicolon if appropriate. This comparison -- the
nondescript measurement of "more, less, or equal indentation" rather than a
numeric value -- is referred to as *offside*, both by me internally and by the
Haskell report describing layouts. We informally formalise this rule as follows:
2. When the first token on a line is preceded only by whitespace, and its first
grapheme resides on a column number :math:`m` equal to the indentation level of
the enclosing context -- i.e. the :math:`\langle n \rangle` on top of the
layout stack -- the lexer shall insert a semicolon. Should no such context
exist on the stack, assume :math:`m > n`.
We now have an idea of how to begin layouts and how to delimit the enclosed
items; lastly, we need to end layouts. This is where the distinction between
virtual and non-virtual brace tokens comes into play. The lexer needs only
partial concern towards closing layouts; the responsibility is shared with the
parser, as will be elaborated in the next section. For now, we will be content
with naïvely inserting a virtual closing brace when a token is indented left of
the layout.
3. Under the same conditions as rule 2., when :math:`m < n` the lexer shall
insert a virtual closing brace and pop the layout stack.
This rule covers some cases, including the top level. However, consider
tokenising the :code:`in` of a :code:`let`-expression. If our lexical analysis
framework only allows lexing a single token at a time, we cannot return both a
virtual right-brace and an :code:`in`. Under this model, the lexer may simply
pop the layout stack and return the :code:`in` token. As we'll see in the next
section, as long as the lexer keeps track of its own context (i.e. the stack),
the parser will cope just fine without the virtual end-brace.
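Gathering rules 1 through 3 into code, a speculative sketch follows; the names
are illustrative rather than those of rlpc's actual lexer:

.. code-block:: haskell

    data Layout = Explicit      -- ∅: an explicit { ... } layout
                | Implicit Int  -- ⟨n⟩: items aligned at column n

    data VTok = VOpen | VSemi | VClose
        deriving Show

    -- rule 1: after a layout keyword, examine the next token's text
    -- and column
    openLayout :: String -> Int -> [Layout] -> ([VTok], [Layout])
    openLayout "{" _ st = ([], Explicit : st)
    openLayout _   n st = ([VOpen], Implicit n : st)

    -- rules 2 and 3: at the first token of each line, at column m
    offside :: Int -> [Layout] -> ([VTok], [Layout])
    offside m st@(Implicit n : rest)
        | m == n = ([VSemi], st)     -- rule 2: aligned; terminate the item
        | m <  n = ([VClose], rest)  -- rule 3: offside; close the layout
                                     -- (a real lexer re-checks the new top)
    offside _ st = ([], st)          -- deeper indentation: line continues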
Parsing Lonely Braces
*********************
When viewed in the abstract, parsing and tokenising are near-identical tasks,
yet the two are very often decomposed into discrete systems with very different
implementations. Lexers operate on streams of text and tokens, while parsers
are typically far less linear, using a parse stack or recursing top-down. A
big reason for this separation is state management: the parser aims to be as
context-free as possible, while the lexer tends to bear the necessary
statefulness. Still, the nature of a stream-oriented lexer makes backtracking
difficult and quite inelegant.
However, simply declaring a parse error to be not an error at all
counterintuitively proves to be an elegant solution to our layout problem, one
which minimises backtracking and state in both the lexer and the parser.
Consider the following definitions found in rlp's BNF:
.. productionlist:: rlp
VOpen : `vopen`
VClose : `vclose` | `error`
A parse error is recovered and treated as a closing brace. Another point of note
in the BNF is the difference between virtual and non-virtual braces (TODO: i
don't like that the BNF is formatted without newlines :/):
.. productionlist:: rlp
LetExpr : `let` VOpen Bindings VClose `in` Expr | `let` `{` Bindings `}` `in` Expr
This ensures that non-virtual braces are closed explicitly.
This set of rules is adequate to satisfy our basic concerns about line
continuations and layout lists. For a more pedantic description of the layout
system, see `chapter 10
<https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_ of the
2010 Haskell Report, which I heavily referenced here.
References
----------
@@ -233,3 +70,4 @@ References
* `Haskell syntax reference
<https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_


@@ -1,6 +0,0 @@
The *Template Instantiator*
====================================
WIP. This will hopefully be expanded into a thorough walkthrough of the state
machine.


@@ -0,0 +1,5 @@
Type Inference in rl'
=====================
rl' implements type inference via the Hindley-Milner type system.
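As a taste of what this entails, here is a minimal, self-contained sketch of
the unification underlying Hindley-Milner inference; the types and names are
illustrative, not rlpc's actual definitions:

.. code-block:: haskell

    import qualified Data.Map as M

    data Type = TyVar String
              | TyCon String
              | Type :-> Type
        deriving (Eq, Show)

    type Subst = M.Map String Type

    -- find a substitution making the two types equal, or fail
    unify :: Type -> Type -> Maybe Subst
    unify (TyVar a) t = bind a t
    unify t (TyVar a) = bind a t
    unify (TyCon a) (TyCon b) | a == b = Just M.empty
    unify (a :-> b) (a' :-> b') = do
        s1 <- unify a a'
        s2 <- unify (apply s1 b) (apply s1 b')
        pure (s2 `compose` s1)
    unify _ _ = Nothing  -- e.g. TyCon "Int#" ~ (a :-> b): a type error

    bind :: String -> Type -> Maybe Subst
    bind a t
        | t == TyVar a    = Just M.empty
        | a `elem` free t = Nothing  -- occurs check
        | otherwise       = Just (M.singleton a t)

    free :: Type -> [String]
    free (TyVar a) = [a]
    free (TyCon _) = []
    free (a :-> b) = free a ++ free b

    apply :: Subst -> Type -> Type
    apply s t@(TyVar a) = M.findWithDefault t a s
    apply _ (TyCon c)   = TyCon c
    apply s (a :-> b)   = apply s a :-> apply s b

    compose :: Subst -> Subst -> Subst
    compose s2 s1 = M.map (apply s2) s1 `M.union` s2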


@@ -13,7 +13,7 @@ author = 'madeleine sydney slaga'
# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
extensions = ['sphinx.ext.imgmath', 'sphinx.ext.graphviz']
# templates_path = ['_templates']
exclude_patterns = []
@@ -32,6 +32,7 @@ html_theme = 'alabaster'
imgmath_latex_preamble = r'''
\usepackage{amsmath}
\usepackage{tabularray}
\usepackage{syntax}
\newcommand{\transrule}[2]
{\begin{tblr}{|rrrlc|}


@@ -0,0 +1,67 @@
The Complete Syntax of rl'
==========================
WIP.
Provided is the complete syntax of rl' in (pseudo) EBNF. {A} represents zero or
more A's, [A] means optional A, and terminals are wrapped in 'single-quotes'.
.. math::
:nowrap:
\setlength{\grammarparsep}{20pt plus 1pt minus 1pt}
\setlength{\grammarindent}{12em}
\begin{grammar}
<Decl> ::= <InfixDecl>
\alt <DataDecl>
\alt <TypeSig>
\alt <FunDef>
<InfixDecl> ::= <InfixWord> `litint' <Name>
<InfixWord> ::= `infix'
\alt `infixl'
\alt `infixr'
<DataDecl> ::= `data' `conname' {}
\end{grammar}
.. code-block:: bnf
Decl ::= InfixDecl
| DataDecl
| TypeSig
| FunDef
InfixDecl ::= InfixWord 'litint' Operator
InfixWord ::= 'infix'
| 'infixl'
| 'infixr'
DataDecl ::= 'data' 'conname' {'name'} '=' DataCons
DataCons ::= 'conname' {Type1} ['|' DataCons]
TypeSig ::= Var '::' Type
FunDef ::= Var {Pat1} '=' Expr
Type ::= Type1 {Type1}
-- note that (->) is right-associative,
-- and extends as far as possible
| Type '->' Type
Type1 ::= '(' Type ')'
| 'conname'
Pat ::= 'conname' Pat1 {Pat1}
| Pat 'consym' Pat
Pat1 ::= Literal
| 'conname'
| '(' Pat ')'
Literal ::= 'litint'
Var ::= 'varname'
| '(' 'varsym' ')'
Con ::= 'conname'
| '(' 'consym' ')'
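As a quick sanity check, here is a tiny (illustrative) program built from only
the productions above; note that the grammar as written does not yet cover
variable patterns or type variables:

.. code-block:: haskell

    infixl 6 +

    data T = A Int | B

    f :: T -> Int
    f B = 0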


@@ -0,0 +1,17 @@
rl' Inference Rules
===================
.. rubric:: [Var]
.. math::
\frac{x : \tau \in \Gamma}
{\Gamma \vdash x : \tau}
.. rubric:: [App]
.. math::
\frac{\Gamma \vdash f : \alpha \to \beta \qquad \Gamma \vdash x : \alpha}
{\Gamma \vdash f x : \beta}
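A small worked instance of the two rules above, under an illustrative context
:math:`\Gamma = \{ f : \mathtt{Int} \to \mathtt{Bool},\ x : \mathtt{Int} \}`:

.. math::

    \frac{\dfrac{f : \mathtt{Int} \to \mathtt{Bool} \in \Gamma}
                {\Gamma \vdash f : \mathtt{Int} \to \mathtt{Bool}}
          \qquad
          \dfrac{x : \mathtt{Int} \in \Gamma}
                {\Gamma \vdash x : \mathtt{Int}}}
         {\Gamma \vdash f \; x : \mathtt{Bool}}

[Var] closes both leaves, and [App] concludes the root.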