minor docs

crumbtoo
2024-01-24 09:31:57 -07:00
parent 3d45e12676
commit c8199a9dd1
4 changed files with 50 additions and 43 deletions


@@ -112,5 +112,3 @@ The way around this is quite simple: simply offset the stack when w
 :end-before: -- << [ref/compileC]
 :caption: src/GM.hs


@@ -2,16 +2,21 @@ Lexing, Parsing, and Layouts
 ============================
 The C-style languages of my previous experiences have all had quite trivial
-lexical analysis stages, peaking in complexity when I streamed tokens lazily in
-C. The task of tokenising a C-style language is very simple in description: you
-ignore all whitespace and point out what you recognise. If you don't recognise
-something, check if it's a literal or an identifier. Should it be neither,
-return an error.
+lexical analysis stages: you ignore all whitespace and point out the symbols you
+recognise. If you don't recognise something, check if it's a literal or an
+identifier. Should it be neither, return an error.
-On paper, both lexing and parsing a Haskell-like language seem to pose a few
+In contrast, both lexing and parsing a Haskell-like language poses a number of
 greater challenges. Listed by ascending intimidation factor, some of the
 potential roadblocks on my mind before making an attempt were:
+* Context-sensitive keywords; Haskell allows for some words to be used as
+identifiers in appropriate contexts, such as :code:`family`, :code:`role`,
+:code:`as`. Reading a note_ found in `GHC's lexer`_, it appears that keywords
+are only considered in bodies for which their use is relevant, e.g.
+:code:`family` and :code:`role` in type declarations, :code:`as` after
+:code:`case`; :code:`if`, :code:`then`, and :code:`else` in expressions, etc.
 * Operators; Haskell has not only user-defined infix operators, but user-defined
 precedence levels and associativities. I recall using an algorithm that looked
 up infix, prefix, postfix, and even mixfix operators up in a global table to
@@ -19,17 +24,9 @@ potential roadblocks on my mind before making an attempt were:
 stored in the table). I never modified the table at runtime, however this
 could be a very nice solution for Haskell.
-* Context-sensitive keywords; Haskell allows for some words to be used as identifiers in
-appropriate contexts, such as :code:`family`, :code:`role`, :code:`as`.
-Reading a note_ found in `GHC's lexer`_,
-it appears that keywords are only considered in bodies for which their use is
-relevant, e.g. :code:`family` and :code:`role` in type declarations,
-:code:`as` after :code:`case`; :code:`if`, :code:`then`, and :code:`else` in
-expressions, etc.
 * Whitespace sensitivity; While I was comfortable with the idea of a system
-similar to Python's INDENT/DEDENT tokens, Haskell seemed to use whitespace to
-section code in a way that *felt* different.
+similar to Python's INDENT/DEDENT tokens, Haskell's layout system is based on
+alignment and is very generous with line-folding.
 .. _note: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/coding-style#2-using-notes
 .. _GHC's lexer: https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L1133
@@ -45,9 +42,9 @@ We will compare and contrast with Python's lexical analysis. Much to my dismay,
 Python uses newlines and indentation to separate statements and resolve scope
 instead of the traditional semicolons and braces found in C-style languages (we
 may generally refer to these C-style languages as *explicitly-sectioned*).
-Internally during tokenisation, when the Python lexer begins a new line, they
-compare the indentation of the new line with that of the previous and apply the
-following rules:
+Internally during tokenisation, when the Python lexer encounters a new line, the
+indentation of the new line is compared with that of the previous and the
+following rules are applied:
 1. If the new line has greater indentation than the previous, insert an INDENT
 token and push the new line's indentation level onto the indentation stack
@@ -60,44 +57,37 @@ following rules:
 3. If the indentation is equal, insert a NEWLINE token to terminate the previous
 line, and leave it at that!
-Parsing Python with the INDENT, DEDENT, and NEWLINE tokens is identical to
-parsing a language with braces and semicolons. This is a solution pretty in line
-with Python's philosophy of the "one correct answer" (TODO: this needs a
-source). In developing our *layout* rules, we will follow in the pattern of
-translating the whitespace-sensitive source language to an explicitly sectioned
-language.
+On the parser's end, the INDENT, DEDENT, and NEWLINE tokens are identical to
+braces and semicolons. In developing our *layout* rules, we will follow in the
+pattern of translating the whitespace-sensitive source language to an explicitly
+sectioned language.
 But What About Haskell?
 ***********************
-We saw that Python, the most notable example of an implicitly sectioned
-language, is pretty simple to lex. Why then am I so afraid of Haskell's layouts?
-To be frank, I'm far less scared after asking myself this -- however there are
-certainly some new complexities that Python needn't concern. Haskell has
-implicit line *continuation*: forms written over multiple lines; indentation
-styles often seen in Haskell are somewhat esoteric compared to Python's
-"s/[{};]//".
+Parsing Haskell -- and thus rl' -- is only slightly more complex than Python,
+but the design is certainly more sensitive.
 .. code-block:: haskell
--- line continuation
+-- line folds
 something = this is a
 single expression
 -- an extremely common style found in haskell
-data Python = Users
-{ are :: Crying
-, right :: About
-, now :: Sorry
+data Some = Data
+{ is :: Presented
+, in :: This
+, silly :: Style
 }
--- another formatting oddity
+-- another style oddity
 -- note that this is not a single
 -- continued line! `look at`,
--- `this`, and `alignment` are all
--- separate expressions!
+-- `this odd`, and `alignment` are all
+-- discrete items!
 anotherThing = do look at
-this
+this odd
 alignment
 But enough fear, lets actually think about implementation. Firstly, some
@@ -233,3 +223,4 @@ References
 * `Haskell syntax reference
 <https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_
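The Python-style rules touched by the layout hunks above (compare the new line's indentation with the top of an indentation stack, then emit INDENT, DEDENT, or NEWLINE) could be sketched in Haskell roughly as follows. This is an illustrative sketch only, not code from this commit: `LayoutToken`, `step`, and `layoutTokens` are hypothetical names. Rule 2 falls outside the hunks shown, so its behaviour here (pop until a matching level, one DEDENT per pop, error on a mismatch) is an assumption based on Python's documented lexer. The sketch follows the simplified rules as the document states them rather than CPython's exact token stream.

```haskell
-- Hypothetical sketch of the indentation-stack rules; not taken from the rl' sources.
data LayoutToken = Indent | Dedent | Newline
  deriving (Eq, Show)

-- | Given the indentation stack (innermost level first) and the indentation
-- of the new line, return the tokens to insert and the updated stack.
step :: [Int] -> Int -> Either String ([LayoutToken], [Int])
step [] _ = Left "empty indentation stack"
step stack@(top : _) n
  | n > top   = Right ([Indent], n : stack)   -- rule 1: deeper, so push
  | n == top  = Right ([Newline], stack)      -- rule 3: same level
  | otherwise =                               -- rule 2 (assumed): pop and dedent
      let (popped, rest) = span (> n) stack
      in case rest of
           (m : _) | m == n -> Right (map (const Dedent) popped, rest)
           _ -> Left ("inconsistent dedent to column " ++ show n)

-- | Run 'step' over the indentation of every line after the first,
-- starting from a stack that holds only column 0.
layoutTokens :: [Int] -> Either String [LayoutToken]
layoutTokens = go [0]
  where
    go _ [] = Right []
    go stack (n : ns) = do
      (toks, stack') <- step stack n
      rest <- go stack' ns
      Right (toks ++ rest)
```

For instance, `layoutTokens [4, 4, 0]` evaluates to `Right [Indent, Newline, Dedent]`, while `layoutTokens [4, 2]` yields a `Left` because column 2 matches no level on the stack. Keeping the stack explicit makes it easy to see why a parser downstream can treat these tokens like braces and semicolons.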


@@ -0,0 +1,5 @@
Type Inference in rl'
=====================
rl' implements type inference via the Hindley-Milner type system.


@@ -0,0 +1,13 @@
rl' Inference Rules
===================
.. rubric::
[Var]
.. math::
\frac{x : \tau \in \Gamma}
{\Gamma \vdash x : \tau}
.. rubric::
[App]