Update memory_model.md

This commit is contained in:
Krasimir Angelov
2021-08-07 20:39:09 +02:00
committed by GitHub
parent bfd839b7b0
commit db8843c8bf

View File

@@ -44,12 +44,15 @@ same file is mapped from several processes (if they all load the same grammar),
Last but not least using `MAP_FIXED` is considered a security risk.
Since the start address of the mapping can change, using traditional memory pointers withing the mapped area is not possible. The only option is to use offsets
relative to the beginning of the area. On other words, if normally we would have had a pointer `p`, now we have the offset `o` which is converted to a
pointer by using `current_base+o`.
relative to the beginning of the area. In other words, if normally we would have written `p->x`, now we have the offset `o` which we must use like this:
```C++
((A*) (current_base+o))->x
```
Writing explicitly the pointer arithmetics and the corresponding typecasts, each time when we dereference a pointer, is too tedious and verbose. Instead,
we use the operator overloading in C++. There is the type `ref<A>` which wraps around a file offset to a data item of type `A`. The operators `->` and `*`
are overloaded for that type and they do the necessary pointer arithmetics and type casts.
Writing the explicit pointer arithmetics and typecasts, each time when we dereference a pointer, is not better than Vogon poetry and it
becomes worse when using a chain of arrow operators. The solution is to use the operator overloading in C++.
There is the type `ref<A>` which wraps around a file offset to a data item of type `A`. The operators `->` and `*`
are overloaded for the type and they do the necessary pointer arithmetics and type casts.
This solves the problem with code readability but creates another problem. How do `->` and `*` know the address of the memory mapped area? Obviously,
`current_base` must be a static variable and there must be a way to initialize that variable.
@@ -85,3 +88,39 @@ Note the flag `READER_SCOPE`. You can use either `READER_SCOPE` or `WRITER_SCOPE
the single writer, multiple readers policy. The main problem is that a writer may have to enlarge the current file, which consequently may mean
that the kernel should relocate the mapping area to a new address. If there are readers at the same time, they way break since they expect that the mapped
area is at a particular location.
# Developing Writers
There is one important complication when developing procedures modifying the database. Every call to `DB::malloc` may potentially have to enlarge the mapped area
which sometimes leads to changing `current_base`. That would not have been a problem if GCC was not sometimes caching variables in registers. Look at the following code:
```C++
p->r = foo();
```
Here `p` is a reference which is used to access another reference `r`. On the other hand, `foo()` is a procedure which directly or indirectly calls `DB::malloc`.
GCC compiles assignments by first computing the address to modify, and then it evaluates the right hand side. This means that while `foo()` is beeing evaluated the address computed on the left-hand side is saved in a register or somewhere in the stack. But now, if it happens that the allocation in `foo()` has changed
`current_base`, then the saved address is no longer valid.
That first problem is solved by overloading the assignment operator for `ref<A>`:
```C++
ref<A>& operator= (const ref<A>& r) {
offset = r.offset;
return *this;
}
```
On a first sight, nothing special happens here and it looks like the overloading is redundant. However, now the assignments are compiled in a very different way.
The overloaded operator is inlined, so there is no real method call and we don't get any overhead. The real difference is that now, whatever is on the left-hand side of the assignment becomes the value of the `this` pointer, and `this` is always the last thing to be evaluated in a method call. This solves the problem.
`foo()` is evaluated first and if it changes `current_base`, the change will be taken into account when computing the left-hand side of the assignment.
Unfortunately, this is not the only problem. A similar thing happens when the arguments of a function are calls to other functions. See this:
```C++
foo(p->r,bar(),q->r)
```
Where now `bar()` is the function that do allocation. The compiler is free to keep in a register the value of `current_base` that it needs for the evaluation of
`p->r`, while it evaluates `bar()`. But if `current_base` has changed, then the saved value would be invalid while computing `q->r`. There doesn't seem to be
a work around for this. The only solution is to:
**Never call a function that allocates as an argument to another function**
Instead we call allocating functions on a separate line and we save the result in a temporary variable.