Merge branch 'majestic' of github.com:GrammaticalFramework/gf-core into majestic

krangelov
2021-11-04 08:31:43 +01:00
2 changed files with 60 additions and 5 deletions


@@ -0,0 +1,42 @@
The concrete syntax in GF is expressed in a special kind of functional language. Unlike in other functional languages, all GF programs are computed at compile time. The result of the computation is another program in a simplified formalism called Parallel Multiple Context-Free Grammar (PMCFG). More on that later. For now we will only discuss how the computations in a GF program work.
At the heart of the GF compiler is the so-called partial evaluator. It computes GF terms, but it also has the added superpower of being able to work with unknown variables. Consider for instance the term ``\s -> s ++ ""``. A normal evaluator cannot do anything with it, since in order to compute the value of the lambda function, you need to know the value of ``s``. In computer science terminology, the term is already in its normal form. A partial evaluator, on the other hand, will just remember that ``s`` is a variable with an unknown value and will try to compute the expression in the body of the function. After that it will construct a new function whose body is precomputed as far as possible. In this concrete case the result will be ``\s -> s``, since adding an empty string to any other string produces the same string.
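To make the idea concrete, here is a minimal sketch of a partial evaluator for a toy language with string concatenation. The names `Expr` and `simplify` are invented for this illustration; they are not the types or functions used in the GF compiler.

```Haskell
-- Toy expression language; Expr and simplify are illustrative names,
-- not part of the GF compiler.
data Expr
  = Var String        -- a variable whose value may be unknown
  | Str String        -- a string literal
  | Concat Expr Expr  -- s1 ++ s2
  | Lam String Expr   -- \x -> body
  deriving (Eq, Show)

-- Precompute as much as possible, leaving unknown variables in place.
simplify :: Expr -> Expr
simplify (Lam x body) = Lam x (simplify body)  -- evaluate under the binder
simplify (Concat a b) =
  case (simplify a, simplify b) of
    (Str "", b')     -> b'               -- "" ++ b  ==>  b
    (a', Str "")     -> a'               -- a ++ ""  ==>  a
    (Str s1, Str s2) -> Str (s1 ++ s2)   -- both known: concatenate now
    (a', b')         -> Concat a' b'     -- otherwise keep the residual term
simplify e = e
```

With these definitions, `simplify (Lam "s" (Concat (Var "s") (Str "")))` yields `Lam "s" (Var "s")`, which corresponds to ``\s -> s``.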
Another superpower of the partial evaluator is that it can work with meta variables. The syntax for meta variables in GF is ``?0, ?1, ?2, ...``, and they are used as placeholders which mark parts of the program that are not finished yet. The partial evaluator has no problem working with such incomplete programs. Sometimes the result of the computation depends on a yet unfinished part of the program; then the evaluator just suspends the computation. In other cases, the result is completely independent of the existence of metavariables. In the latter case, the evaluator will just return the result.
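The two situations can be illustrated with a small sketch over a toy arithmetic language. The types `MExpr` and `Result` and the function `evalMeta` are hypothetical and serve only to show the difference between a suspended and a completed computation; the real evaluator is organised differently.

```Haskell
-- Toy language with metavariables; all names here are illustrative.
data MExpr
  = Num Int
  | Meta Int            -- ?0, ?1, ...
  | Add MExpr MExpr     -- needs both arguments
  | First MExpr MExpr   -- returns its first argument, ignores the second
  deriving (Eq, Show)

data Result
  = Done Int            -- the computation finished
  | Suspended MExpr     -- blocked on a metavariable; the residual term is kept
  deriving (Eq, Show)

evalMeta :: MExpr -> Result
evalMeta (Num n)     = Done n
evalMeta (Meta i)    = Suspended (Meta i)
evalMeta (First a _) = evalMeta a           -- second argument is never inspected
evalMeta (Add a b)   =
  case (evalMeta a, evalMeta b) of
    (Done x, Done y) -> Done (x + y)
    _                -> Suspended (Add a b) -- wait until the metavariable is known
```

Here `evalMeta (First (Num 1) (Meta 0))` returns `Done 1`, because the result does not depend on `?0`, while `evalMeta (Add (Num 1) (Meta 0))` stays suspended.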
One of the uses of the evaluator is during type checking, where we must enforce certain constraints. The constraints may, for instance, indicate that the only way for them to be satisfied is to assign a fixed value to one or more of the meta variables. The partial evaluator does that as well. Another use case is during compilation to PMCFG. In certain cases, the compiler to PMCFG assigns to a metavariable all possible values that the variable may have, and it then produces a result for each of them.
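The second use case, trying every possible value of a metavariable, can be pictured with a list comprehension. This is only a sketch: the parameter type `Number` and the function `inflect` are made up for the example and do not come from the compiler.

```Haskell
-- A finite parameter type, as one might find in a GF grammar (illustrative).
data Number = Sg | Pl
  deriving (Eq, Show, Enum, Bounded)

-- A computation whose result depends on the value of the metavariable.
inflect :: Number -> String
inflect Sg = "cat"
inflect Pl = "cats"

-- Assign every possible value in turn and collect one result per value.
allResults :: [(Number, String)]
allResults = [ (n, inflect n) | n <- [minBound .. maxBound] ]
```

Evaluating `allResults` gives `[(Sg,"cat"),(Pl,"cats")]`, one result per value of the unknown.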
In the rest of this document we will discuss the implementation of the partial evaluator.
# Simple Lambda Terms
We will start with the simplest possible subset of the GF language, also known as simple lambda calculus. It is defined as an algebraic data type in Haskell, as follows:
```Haskell
data Term
  = Vr Ident       -- e.g. variables: x,y,z ...
  | Cn Ident       -- e.g. constructors: cons, nil, etc.
  | App Term Term  -- e.g. function application: @f x@
  | Abs Ident Term -- e.g. \x -> t
```
```Haskell
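type Env = [(Ident,Value)]  -- values of the variables in scope

data Value
  = VApp Ident [Value]      -- a constructor applied to its arguments
  | VClosure Env Term       -- a lambda term together with its environment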
```
```Haskell
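eval :: Env -> Term -> [Value] -> Value
eval env (Vr x)      vs     = case lookup x env of
                                Just v  -> apply v vs
                                Nothing -> error "unbound variable"
eval env (Cn c)      vs     = VApp c vs
eval env (App t1 t2) vs     = eval env t1 (eval env t2 : vs)
eval env (Abs x t)   []     = VClosure env (Abs x t)
eval env (Abs x t)   (v:vs) = eval ((x,v):env) t vs

apply :: Value -> [Value] -> Value
apply v                        []     = v  -- nothing left to apply
apply (VApp c vs0)             vs     = VApp c (vs0++vs)
apply (VClosure env (Abs x t)) (v:vs) = eval ((x,v):env) t vs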
```
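As a worked example, the following self-contained sketch repeats the definitions above and evaluates two small terms. It assumes `type Ident = String`, which is a simplification made only for this example. It also handles a failed variable lookup and an empty argument list in `apply` explicitly.

```Haskell
type Ident = String  -- assumption for this sketch only

data Term
  = Vr Ident | Cn Ident | App Term Term | Abs Ident Term
  deriving (Eq, Show)

type Env = [(Ident, Value)]

data Value
  = VApp Ident [Value]
  | VClosure Env Term
  deriving (Eq, Show)

-- The third argument is the stack of arguments collected so far.
eval :: Env -> Term -> [Value] -> Value
eval env (Vr x)      vs     = case lookup x env of
                                Just v  -> apply v vs
                                Nothing -> error "unbound variable"
eval env (Cn c)      vs     = VApp c vs
eval env (App t1 t2) vs     = eval env t1 (eval env t2 : vs)
eval env (Abs x t)   []     = VClosure env (Abs x t)
eval env (Abs x t)   (v:vs) = eval ((x, v) : env) t vs

apply :: Value -> [Value] -> Value
apply v                        []     = v
apply (VApp c vs0)             vs     = VApp c (vs0 ++ vs)
apply (VClosure env (Abs x t)) (v:vs) = eval ((x, v) : env) t vs
apply (VClosure _ _)           _      = error "closure without a lambda"

-- (\x -> \y -> x) nil cons
example1 :: Term
example1 = App (App (Abs "x" (Abs "y" (Vr "x"))) (Cn "nil")) (Cn "cons")

-- cons nil
example2 :: Term
example2 = App (Cn "cons") (Cn "nil")
```

Evaluating `eval [] example1 []` gives `VApp "nil" []`, and `eval [] example2 []` gives `VApp "cons" [VApp "nil" []]`.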
# Variants
# Meta Variables


@@ -12,7 +12,7 @@ main = do
-- modify the grammar gr
functionType gr "f" >>= print
```
Here we ask for the type of a function before and after an arbitrary update in the grammar `gr`. Obviously if we allow that then `functionType` would have to be in the IO monad, e.g.:
Here we ask for the type of a function before and after an arbitrary update in the grammar `gr`. Obviously if we allow that, then `functionType` would have to be in the IO monad, e.g.:
```Haskell
functionType :: PGF -> Fun -> IO Type
@@ -29,7 +29,7 @@ main = do
-- do all updates here
print (functionType gr2 "f")
```
Here `modifyPGF` allows us to do updates but the updates are performed on a freshly created clone of the grammar `gr`. The original grammar is never ever modified. After the changes the variable `gr2` is a reference to the new revision. While the transaction is in progress we cannot see the currently changing revision, and therefore all read-only operations can remain pure. Only after the transaction is complete do we get to use `gr2`, which will not change anymore.
Here `modifyPGF` allows us to do updates, but the updates are performed on a freshly created clone of the grammar `gr`. The original grammar is never ever modified. After the changes, the variable `gr2` is a reference to the new revision. While the transaction is in progress we cannot see the currently changing revision, and therefore all read-only operations can remain pure. Only after the transaction is complete do we get to use `gr2`, which will not be allowed to change anymore.
Note also that above `functionType` is used with its usual pure type:
```Haskell
@@ -47,8 +47,6 @@ The last line prints the type of function `"f"` in both the old and the new revi
The API as described so far would have been complete if all updates were happening in a single thread. In reality we can expect that there might be several threads or processes modifying the database. The database ensures a multiple readers/single writer exclusion but this doesn't mean that another process/thread cannot modify the database while the current one is reading an old revision. In a parallel setting, `modifyPGF` first merges the revision which the process is using with the latest revision in the database. On top of that the specified updates are performed. The final revision after the updates is returned as a result.
**TODO: Interprocess synchronization is still not implemented**
**TODO: Merges are still not implemented.**
The process can also ask for the latest revision by calling `checkoutPGF`, see below.
@@ -79,6 +77,9 @@ Here we start with an existing revision, apply a transaction and store the resul
# Implementation
In this section we summarize important design decisions related to the internal implementation.
## API
The low-level API for transactions consists of only four functions:
```C
PgfRevision pgf_clone_revision(PgfDB *db, PgfRevision revision,
@@ -107,6 +108,8 @@ From an imperative point of view, it may sound wasteful that a new copy of the g
- J. Nievergelt and E.M. Reingold, "Binary search trees of bounded balance", SIAM journal of computing 2(1), March 1973.
This is also the same algorithm used by Data.Map in Haskell. There are also other possible implementations (B-Trees for instance), and they may be considered if the current one turns out to be too inefficient.
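Since Data.Map uses the same kind of tree, it can serve as a small illustration of why a new revision does not invalidate the old one. The sketch below is not the runtime's code; the `Revision` type and the stored strings are made up for the example.

```Haskell
import qualified Data.Map as Map

-- Illustrative stand-in for a grammar revision: function name -> type.
type Revision = Map.Map String String

rev1 :: Revision
rev1 = Map.fromList [("f", "A -> B")]

-- Inserting produces a new map; rev1 is left untouched, and the two maps
-- share all subtrees that did not have to be rebuilt.
rev2 :: Revision
rev2 = Map.insert "g" "B -> C" rev1
```

After this, `Map.lookup "g" rev1` is still `Nothing`, while `Map.lookup "g" rev2` is `Just "B -> C"`.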
## Garbage Collection
@@ -114,8 +117,18 @@ We use reference counting to keep track of which objects should be kept alive. F
Clients are supposed to correctly use `pgf_free_revision` to indicate that they don't need a revision any more. Unfortunately, this is not always possible to guarantee. For example many languages with garbage collection will call `pgf_free_revision` from a finalizer method. In some languages, however, the finalizer is not guaranteed to be executed if the process terminates before the garbage collection is done. Haskell is one of those languages. Even in languages with reference counting like Python, the process may get killed by the operating system and then the finalizer may still not be executed.
The solution is that we count on the database clients to correctly report when a revision is not needed. However, on a fresh database restart we explicitly clean all leftover transient revisions. This means that even if a client is killed or if it does not correctly release its revisions, the worst that can happen is a memory leak until the next restart.
The solution is that we count on the database clients to correctly report when a revision is not needed. In addition, to be on the safe side, on a fresh database restart we explicitly clean all leftover transient revisions. This means that even if a client is killed or if it does not correctly release its revisions, the worst that can happen is a memory leak until the next restart.
## Inter-process Communication
One and the same database may be opened by several processes. In that case, each process creates a mapping of the database into its own address space. The mapping is shared. This means that if a page from the database gets loaded in memory, it is loaded in only one place in the physical memory. The physical memory is then assigned possibly different virtual addresses for each process. All processes can read the data simultaneously, but if we let them change it at the same time, all kinds of problems may happen. To avoid that, we currently use a single-writer/multiple-readers lock which is shared between all processes accessing the same database.
Shared locks must be allocated in shared memory. Each time you open a database, the runtime looks for the shared memory object called "/gf-runtime-locks".
If it doesn't exist, then the runtime creates it and allocates 4 KB for it. In that area we keep a table of locks for all databases which are currently open in at least one process. The entries in the table, besides the lock itself, contain the device id and the inode of the database file. This lets us create a lock the first time the file is opened. If another thread or process opens the same database, then we reuse the lock. In this way, all threads and processes accessing the file are synchronised by a shared lock.
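The lookup-or-create behaviour of the lock table can be modelled in a few lines. This is a pure model written for illustration only: the runtime keeps the table in shared memory, and the names `FileKey`, `LockId`, `LockTable` and `acquireEntry` are hypothetical.

```Haskell
import qualified Data.Map as Map

-- A database file is identified by its device id and inode.
type FileKey = (Integer, Integer)

-- Stand-in for a real shared lock.
newtype LockId = LockId Int
  deriving (Eq, Show)

data LockTable = LockTable
  { nextId  :: Int
  , entries :: Map.Map FileKey LockId
  }

emptyTable :: LockTable
emptyTable = LockTable 0 Map.empty

-- Reuse the lock if the file is already open somewhere,
-- otherwise create a new entry for it.
acquireEntry :: FileKey -> LockTable -> (LockId, LockTable)
acquireEntry key tbl =
  case Map.lookup key (entries tbl) of
    Just l  -> (l, tbl)
    Nothing ->
      let l = LockId (nextId tbl)
      in (l, LockTable (nextId tbl + 1) (Map.insert key l (entries tbl)))
```

Opening the same file twice returns the same `LockId`, while a file with a different device id or inode gets its own entry.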
When all processes accessing a given file have released all references to grammar revisions from that file, we must close the database and remove the shared lock. In this way the released entry can be reused for another database file. This is possible by keeping a list of processes that are currently using a given database. When a process doesn't need a database anymore, it removes itself from the list. If that is the last process, then it also takes care of freeing the entry for the shared lock. That would have been enough if we could trust that all processes will close the databases that they don't need. Unfortunately, we can't trust them. What we do instead is that each time we open or close a database, we also go through the list of processes and remove those that are dead.
We check whether a process is alive by looking for a file under the "/proc" folder with a name equal to the process id. This is not a 100% sure check. Since the kernel reuses process ids, it is possible that the process has died and then another one with the same id was created. This is not a big issue. It happens rarely and if it happens, the new process will sooner or later die as well.
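The liveness check and the pruning of the process list could look roughly as follows. This is a sketch under the assumption of a Linux-style "/proc" file system. The names `Pid`, `procPath`, `isAlive` and `pruneDead` are invented for the example and do not correspond to functions in the runtime.

```Haskell
import Control.Monad (filterM)
import System.Directory (doesDirectoryExist)

type Pid = Int

-- Path of the entry that the kernel exposes for a running process.
procPath :: Pid -> FilePath
procPath pid = "/proc/" ++ show pid

-- A process counts as alive if its entry under /proc exists.
-- As noted above, a reused process id can make this report a false positive.
isAlive :: Pid -> IO Bool
isAlive = doesDirectoryExist . procPath

-- Keep only those processes in the list that still appear to be running.
pruneDead :: [Pid] -> IO [Pid]
pruneDead = filterM isAlive
```

Calling `pruneDead` on the list of registered processes each time a database is opened or closed keeps stale entries from accumulating.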
## Atomicity