Update memory_model.md

This commit is contained in:
Krasimir Angelov
2021-08-07 10:36:34 +02:00
committed by GitHub
parent 72c51f4bf9
commit cc8db24a46

View File

@@ -35,3 +35,53 @@ Once an .ngf file exists, it can be mapped back to memory by using this function
create new empty .ngf files. If the file does not exist, then a new one will be created which contains an empty grammar. The grammar could then be extended
by dynamically adding functions and categories.
# The content of an .ngf file
The .ngf file is a memory image but this is not the end of the story. The problem is that there is no way to control at which address the memory image would be
mapped. On Posix systems, `mmap` takes as hint the mapping address but the kernel may choose to ignore it. There is also the flag MAP_FIXED, which makes the hint
into a constraint, but then the kernel may fail to satisfy the constraint. For example that address may already be used for something else. Furthermore, if the
same file is mapped from several processes (if they all load the same grammar), it would be difficult to find an address which is free in all of them.
Last but not least using `MAP_FIXED` is considered a security risk.
Since the start address of the mapping can change, using traditional memory pointers withing the mapped area is not possible. The only option is to use offsets
relative to the beginning of the area. On other words, if normally we would have had a pointer `p`, now we have the offset `o` which is converted to a
pointer by using `current_base+o`.
Writing explicitly the pointer arithmetics and the corresponding typecasts, each time when we dereference a pointer, is too tedious and verbose. Instead,
we use the operator overloading in C++. There is the type `ref<A>` which wraps around a file offset to a data item of type `A`. The operators `->` and `*`
are overloaded for that type and they do the necessary pointer arithmetics and type casts.
This solves the problem with code readability but creates another problem. How do `->` and `*` know the address of the memory mapped area? Obviously,
`current_base` must be a static variable and there must be a way to initialize that variable.
A database (a memory mapped file) in the runtime is represented by the type `DB`. Before any of the data in the database is accessed, the database must
be brought into scope. Bringing into scope means that `current_base` is initialized to point to the mapping area for that database. After that any dereferencing
of a reference will be done relative to the corresponding database. This is how scopes are defined:
```C++
{
DB_scope scope(db, READER_SCOPE);
...
}
```
Here `DB_scope` is a helper type and `db` is a pointer to the database that you want to bring into scope. The constructor for `DB_scope` saves the old value
for `current_base` and then sets it to point to the area of the given database. Conversely the destructor, restores the previous value.
The use of `DB_scope` is reentrant, i.e. you can do this:
```C++
{
DB_scope scope(db1, READER_SCOPE);
...
{
DB_scope scope(db2, READER_SCOPE);
...
}
...
}
```
What you can't do is to have more than one database in scope simultaneously. Fortunately, that is not needed. All API functions start a scope
and the internals of the runtime always work with the current database in scope.
Note the flag `READER_SCOPE`. You can use either `READER_SCOPE` or `WRITER_SCOPE`. In addition to selecting the database, the DB_scope also enforces,
the single writer, multiple readers policy. The main problem is that a writer may have to enlarge the current file, which consequently may mean
that the kernel should relocate the mapping area to a new address. If there are readers at the same time, they way break since they expect that the mapped
area is at a particular location.