minor docs
@@ -112,5 +112,3 @@ The way around this is quite simple: simply offset the stack when w
|
|||||||
:end-before: -- << [ref/compileC]
|
:end-before: -- << [ref/compileC]
|
||||||
:caption: src/GM.hs
|
:caption: src/GM.hs
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -2,16 +2,21 @@ Lexing, Parsing, and Layouts
|
|||||||
============================
|
============================
|
||||||
|
|
||||||
The C-style languages of my previous experiences have all had quite trivial
|
The C-style languages of my previous experiences have all had quite trivial
|
||||||
lexical analysis stages, peaking in complexity when I streamed tokens lazily in
|
lexical analysis stages: you ignore all whitespace and point out the symbols you
|
||||||
C. The task of tokenising a C-style language is very simple in description: you
|
recognise. If you don't recognise something, check if it's a literal or an
|
||||||
ignore all whitespace and point out what you recognise. If you don't recognise
|
identifier. Should it be neither, return an error.
|
||||||
something, check if it's a literal or an identifier. Should it be neither,
|
|
||||||
return an error.
|
|
||||||
|
|
||||||
On paper, both lexing and parsing a Haskell-like language seem to pose a few
|
In contrast, both lexing and parsing a Haskell-like language pose a number of
|
||||||
greater challenges. Listed by ascending intimidation factor, some of the
|
greater challenges. Listed by ascending intimidation factor, some of the
|
||||||
potential roadblocks on my mind before making an attempt were:
|
potential roadblocks on my mind before making an attempt were:
|
||||||
|
|
||||||
|
* Context-sensitive keywords; Haskell allows for some words to be used as
|
||||||
|
identifiers in appropriate contexts, such as :code:`family`, :code:`role`,
|
||||||
|
:code:`as`. Reading a note_ found in `GHC's lexer`_, it appears that keywords
|
||||||
|
are only considered in bodies for which their use is relevant, e.g.
|
||||||
|
:code:`family` and :code:`role` in type declarations, :code:`as` in
|
||||||
|
:code:`import` declarations; :code:`if`, :code:`then`, and :code:`else` in expressions, etc.
|
||||||
|
|
||||||
* Operators; Haskell has not only user-defined infix operators, but user-defined
|
* Operators; Haskell has not only user-defined infix operators, but user-defined
|
||||||
precedence levels and associativities. I recall using an algorithm that looked
|
precedence levels and associativities. I recall using an algorithm that looked
|
||||||
up infix, prefix, postfix, and even mixfix operators in a global table to
|
up infix, prefix, postfix, and even mixfix operators in a global table to
|
||||||
@@ -19,17 +24,9 @@ potential roadblocks on my mind before making an attempt were:
|
|||||||
stored in the table). I never modified the table at runtime; however, this
|
stored in the table). I never modified the table at runtime; however, this
|
||||||
could be a very nice solution for Haskell.
|
could be a very nice solution for Haskell.
|
||||||
|
|
||||||
* Context-sensitive keywords; Haskell allows for some words to be used as identifiers in
|
|
||||||
appropriate contexts, such as :code:`family`, :code:`role`, :code:`as`.
|
|
||||||
Reading a note_ found in `GHC's lexer`_,
|
|
||||||
it appears that keywords are only considered in bodies for which their use is
|
|
||||||
relevant, e.g. :code:`family` and :code:`role` in type declarations,
|
|
||||||
:code:`as` after :code:`case`; :code:`if`, :code:`then`, and :code:`else` in
|
|
||||||
expressions, etc.
|
|
||||||
|
|
||||||
* Whitespace sensitivity; While I was comfortable with the idea of a system
|
* Whitespace sensitivity; While I was comfortable with the idea of a system
|
||||||
similar to Python's INDENT/DEDENT tokens, Haskell seemed to use whitespace to
|
similar to Python's INDENT/DEDENT tokens, Haskell's layout system is based on
|
||||||
section code in a way that *felt* different.
|
alignment and is very generous with line-folding.
|
||||||
|
|
||||||
.. _note: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/coding-style#2-using-notes
|
.. _note: https://gitlab.haskell.org/ghc/ghc/-/wikis/commentary/coding-style#2-using-notes
|
||||||
.. _GHC's lexer: https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L1133
|
.. _GHC's lexer: https://gitlab.haskell.org/ghc/ghc/-/blob/master/compiler/GHC/Parser/Lexer.x#L1133
|
||||||
@@ -45,9 +42,9 @@ We will compare and contrast with Python's lexical analysis. Much to my dismay,
|
|||||||
Python uses newlines and indentation to separate statements and resolve scope
|
Python uses newlines and indentation to separate statements and resolve scope
|
||||||
instead of the traditional semicolons and braces found in C-style languages (we
|
instead of the traditional semicolons and braces found in C-style languages (we
|
||||||
may generally refer to these C-style languages as *explicitly-sectioned*).
|
may generally refer to these C-style languages as *explicitly-sectioned*).
|
||||||
Internally during tokenisation, when the Python lexer begins a new line, they
|
Internally during tokenisation, when the Python lexer encounters a new line, the
|
||||||
compare the indentation of the new line with that of the previous and apply the
|
indentation of the new line is compared with that of the previous and the
|
||||||
following rules:
|
following rules are applied:
|
||||||
|
|
||||||
1. If the new line has greater indentation than the previous, insert an INDENT
|
1. If the new line has greater indentation than the previous, insert an INDENT
|
||||||
token and push the new line's indentation level onto the indentation stack
|
token and push the new line's indentation level onto the indentation stack
|
||||||
@@ -60,44 +57,37 @@ following rules:
|
|||||||
3. If the indentation is equal, insert a NEWLINE token to terminate the previous
|
3. If the indentation is equal, insert a NEWLINE token to terminate the previous
|
||||||
line, and leave it at that!
|
line, and leave it at that!
|
||||||
|
|
||||||
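The indentation rules above can be sketched in a few lines of Haskell. This is only an illustration, not Python's actual implementation: the names ``Token``, ``layoutLine``, and ``layoutLines`` are hypothetical, and since rule 2 is not shown in this excerpt, the dedent case is assumed to pop one stack level per DEDENT and to reject an indentation that matches no enclosing level.

```haskell
-- A sketch of the indentation-stack rules; every name here is hypothetical.
data Token = Indent | Dedent | Newline
  deriving (Eq, Show)

-- | Given the indentation stack (innermost level first) and the indentation
-- of a new line, return the tokens to insert and the updated stack.
-- 'Nothing' signals a dedent that matches no enclosing level (an assumption,
-- since the dedent rule is not shown in the text above).
layoutLine :: [Int] -> Int -> Maybe ([Token], [Int])
layoutLine [] _ = Nothing -- the stack is expected to always hold a base level
layoutLine stack@(top : _) n
  | n > top   = Just ([Indent], n : stack)   -- rule 1: deeper, so push
  | n == top  = Just ([Newline], stack)      -- rule 3: same level
  | otherwise =                              -- assumed rule 2: pop levels
      let (popped, remaining) = span (> n) stack
      in case remaining of
           (t : _) | t == n -> Just (map (const Dedent) popped, remaining)
           _                -> Nothing

-- | Run 'layoutLine' over the indentation of each line that follows a
-- first line at indentation 0.
layoutLines :: [Int] -> Maybe [[Token]]
layoutLines = go [0]
  where
    go _ [] = Just []
    go stack (n : ns) = do
      (toks, stack') <- layoutLine stack n
      rest <- go stack' ns
      Just (toks : rest)
```

For instance, ``layoutLines [4, 4, 8, 0]`` yields one INDENT, one NEWLINE, another INDENT, and finally two DEDENTs as both open levels are closed at once.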
Parsing Python with the INDENT, DEDENT, and NEWLINE tokens is identical to
|
On the parser's end, the INDENT, DEDENT, and NEWLINE tokens are identical to
|
||||||
parsing a language with braces and semicolons. This is a solution pretty in line
|
braces and semicolons. In developing our *layout* rules, we will follow in the
|
||||||
with Python's philosophy of the "one correct answer" (TODO: this needs a
|
pattern of translating the whitespace-sensitive source language to an explicitly
|
||||||
source). In developing our *layout* rules, we will follow in the pattern of
|
sectioned language.
|
||||||
translating the whitespace-sensitive source language to an explicitly sectioned
|
|
||||||
language.
|
|
||||||
|
|
||||||
But What About Haskell?
|
But What About Haskell?
|
||||||
***********************
|
***********************
|
||||||
|
|
||||||
We saw that Python, the most notable example of an implicitly sectioned
|
Parsing Haskell -- and thus rl' -- is only slightly more complex than Python,
|
||||||
language, is pretty simple to lex. Why then am I so afraid of Haskell's layouts?
|
but the design is certainly more sensitive.
|
||||||
To be frank, I'm far less scared after asking myself this -- however there are
|
|
||||||
certainly some new complexities that Python needn't concern. Haskell has
|
|
||||||
implicit line *continuation*: forms written over multiple lines; indentation
|
|
||||||
styles often seen in Haskell are somewhat esoteric compared to Python's
|
|
||||||
"s/[{};]//".
|
|
||||||
|
|
||||||
.. code-block:: haskell
|
.. code-block:: haskell
|
||||||
|
|
||||||
-- line continuation
|
-- line folds
|
||||||
something = this is a
|
something = this is a
|
||||||
single expression
|
single expression
|
||||||
|
|
||||||
-- an extremely common style found in haskell
|
-- an extremely common style found in haskell
|
||||||
data Python = Users
|
data Some = Data
|
||||||
{ are :: Crying
|
{ is :: Presented
|
||||||
, right :: About
|
, in :: This
|
||||||
, now :: Sorry
|
, silly :: Style
|
||||||
}
|
}
|
||||||
|
|
||||||
-- another formatting oddity
|
-- another style oddity
|
||||||
-- note that this is not a single
|
-- note that this is not a single
|
||||||
-- continued line! `look at`,
|
-- continued line! `look at`,
|
||||||
-- `this`, and `alignment` are all
|
-- `this odd`, and `alignment` are all
|
||||||
-- separate expressions!
|
-- discrete items!
|
||||||
anotherThing = do look at
|
anotherThing = do look at
|
||||||
this
|
this odd
|
||||||
alignment
|
alignment
|
||||||
|
|
||||||
But enough fear, let's actually think about implementation. Firstly, some
|
But enough fear, let's actually think about implementation. Firstly, some
|
||||||
@@ -233,3 +223,4 @@ References
|
|||||||
|
|
||||||
* `Haskell syntax reference
|
* `Haskell syntax reference
|
||||||
<https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_
|
<https://www.haskell.org/onlinereport/haskell2010/haskellch10.html>`_
|
||||||
|
|
||||||
|
|||||||
5
doc/src/commentary/type-inference.rst
Normal file
@@ -0,0 +1,5 @@
|
|||||||
|
Type Inference in rl'
|
||||||
|
=====================
|
||||||
|
|
||||||
|
rl' implements type inference via the Hindley-Milner type system.
|
||||||
|
|
||||||
13
doc/src/references/rlp-inference-rules.rst
Normal file
@@ -0,0 +1,13 @@
|
|||||||
|
rl' Inference Rules
|
||||||
|
===================
|
||||||
|
|
||||||
|
.. rubric::
|
||||||
|
[Var]
|
||||||
|
|
||||||
|
.. math::
|
||||||
|
\frac{x : \tau \in \Gamma}
|
||||||
|
{\Gamma \vdash x : \tau}
|
||||||
|
|
||||||
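Read operationally, the [Var] rule is just a lookup in the typing context. A minimal sketch, assuming a context represented as an association list; ``Type``, ``Context``, and ``inferVar`` are hypothetical names for illustration and are not taken from the rl' source:

```haskell
-- Hypothetical, minimal types used only to illustrate the [Var] rule.
data Type = TCon String | TFun Type Type
  deriving (Eq, Show)

-- The typing context Gamma as an association list from names to types.
type Context = [(String, Type)]

-- [Var]: if (x : tau) is in Gamma, conclude Gamma |- x : tau.
-- 'Nothing' means the variable is not bound in the context.
inferVar :: Context -> String -> Maybe Type
inferVar gamma x = lookup x gamma
```

This follows the rule exactly as stated, with a monotype stored in the context; it makes no attempt to instantiate polymorphic types.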
|
.. rubric::
|
||||||
|
[App]
|
||||||
|
|
||||||