Files
gf-core/doc/hackers-guide/memory_model.md

8.8 KiB

The different storage files

The purpose of the .ngf files is to be used as on-disk databases that store grammars. Their format is platform-dependent and they should not be copied from one platform to another. In contrast the .pgf files are platform-independent and can be moved around. The runtime can import a .pgf file and create an .ngf file. Conversely a .pgf file can be exported from an already existing .ngf file.

The internal relation between the two files is more interesting. The runtime uses its own memory allocator which always allocates memory from a memory mapped file. The file may be explicit or an anonymous one. The .ngf is simply a memory image saved in a file. This means that loading the file is always immediate. You just create a new mapping and the kernel will load memory pages on demand.

On the other hand a .pgf file is a version of the grammar serialized in a platform-independent format. This means that loading this type of file is always slower. Fortunately, you can always create an .ngf file from it to speed up later reloads.

The runtime has three ways to load a grammar:

1. Loading a .pgf

readPGF :: FilePath -> IO PGF

This loads the .pgf into an anonymous memory-mapped file. In practice, this means that instead of allocating memory from an explicit file, the runtime will still use the normal swap file.

2. Loading a .pgf and booting a new .ngf

bootPGF :: FilePath -> FilePath -> IO PGF

The grammar is loaded from a .pgf (the first argument) and the memory is mapped to an explicit .ngf (second argument). The .ngf file is created by the function and a file with the same name should not exist before the call.

3. Loading an existing memory image

readNGF :: FilePath -> IO PGF

Once an .ngf file exists, it can be mapped back to memory by using this function. This call is always guaranteed to be fast. The same function can also create new empty .ngf files. If the file does not exist, then a new one will be created which contains an empty grammar. The grammar could then be extended by dynamically adding functions and categories.

The content of an .ngf file

The .ngf file is a memory image but this is not the end of the story. The problem is that there is no way to control at which address the memory image would be mapped. On Posix systems, mmap takes as hint the mapping address but the kernel may choose to ignore it. There is also the flag MAP_FIXED, which makes the hint into a constraint, but then the kernel may fail to satisfy the constraint. For example that address may already be used for something else. Furthermore, if the same file is mapped from several processes (if they all load the same grammar), it would be difficult to find an address which is free in all of them. Last but not least using MAP_FIXED is considered a security risk.

Since the start address of the mapping can change, using traditional memory pointers withing the mapped area is not possible. The only option is to use offsets relative to the beginning of the area. In other words, if normally we would have written p->x, now we have the offset o which we must use like this:

((A*) (current_base+o))->x

Writing the explicit pointer arithmetics and typecasts, each time when we dereference a pointer, is not better than Vogon poetry and it becomes worse when using a chain of arrow operators. The solution is to use the operator overloading in C++. There is the type ref<A> which wraps around a file offset to a data item of type A. The operators -> and * are overloaded for the type and they do the necessary pointer arithmetics and type casts.

This solves the problem with code readability but creates another problem. How do -> and * know the address of the memory mapped area? Obviously, current_base must be a global variable and there must be a way to initialize it. More specifically it must be thread-local to allow different threads to work without collisions.

A database (a memory-mapped file) in the runtime is represented by the type DB. Before any of the data in the database is accessed, the database must be brought into scope. Bringing into scope means that current_base is initialized to point to the mapping area for that database. After that any dereferencing of a reference will be done relative to the corresponding database. This is how scopes are defined:

{
    DB_scope scope(db, READER_SCOPE);
    ...
}

Here DB_scope is a helper type and db is a pointer to the database that you want to bring into scope. The constructor for DB_scope saves the old value for current_base and then sets it to point to the area of the given database. Conversely, the destructor restores the previous value.

The use of DB_scope is reentrant, i.e. you can do this:

{
    DB_scope scope(db1, READER_SCOPE);
    ...
    {
        DB_scope scope(db2, READER_SCOPE);
        ...
    }
    ...
}

What you can't do is to have more than one database in scope simultaneously. Fortunately, that is not needed. All API functions start a scope and the internals of the runtime always work with the current database in scope.

Note the flag READER_SCOPE. You can use either READER_SCOPE or WRITER_SCOPE. In addition to selecting the database, the DB_scope also enforces the single writer/multiple readers policy. The main problem is that a writer may have to enlarge the current file, which consequently may mean that the kernel should relocate the mapping area to a new address. If there are readers at the same time, they may break since they expect that the mapped area is at a particular location.

Developing writers

There is one important complication when developing procedures modifying the database. Every call to DB::malloc may potentially have to enlarge the mapped area which sometimes leads to changing current_base. That would not have been a problem if GCC was not sometimes caching variables in registers. Look at the following code:

p->r = foo();

Here p is a reference which is used to access another reference r. On the other hand, foo() is a procedure which directly or indirectly calls DB::malloc. GCC compiles assignments by first computing the address to modify, and then it evaluates the right hand side. This means that while foo() is being evaluated the address computed on the left-hand side is saved in a register or somewhere in the stack. But now, if it happens that the allocation in foo() has changed current_base, then the saved address is no longer valid.

That first problem is solved by overloading the assignment operator for ref<A>:

ref<A>& operator= (const ref<A>& r) {
    offset = r.offset;
    return *this;
}

On first sight, nothing special happens here and it looks like the overloading is redundant. However, now the assignments are compiled in a very different way. The overloaded operator is inlined, so there is no real method call and we don't get any overhead. The real difference is that now, whatever is on the left-hand side of the assignment becomes the value of the this pointer, and this is always the last thing to be evaluated in a method call. This solves the problem. foo() is evaluated first and if it changes current_base, the change will be taken into account when computing the left-hand side of the assignment.

Unfortunately, this is not the only problem. A similar thing happens when the arguments of a function are calls to other functions. See this:

foo(p->r,bar(),q->r)

Where now bar() is the function that performs allocation. The compiler is free to keep in a register the value of current_base that it needs for the evaluation of p->r, while it evaluates bar(). But if current_base has changed, then the saved value would be invalid while computing q->r. There doesn't seem to be a work around for this. The only solution is to:

Never call a function that allocates as an argument to another function

Instead we call allocating functions on a separate line and we save the result in a temporary variable.

Thread-local variables

A final remark is the compilation of thread-local variables. When a thread-local variable is compiled in a position-dependent code, i.e. in executables, it is compiled efficiently by using the fs register which points to the thread-local segment. Unfortunately, that is not the case by default for shared libraries like our runtime. In that case, GCC applies the global-dynamic model which means that access to a thread local variable is internally implemented with a call to the function __tls_get_addr. Since current_base is used all the time, this adds overhead.

The solution is to define the variable with the attribute __attribute__((tls_model("initial-exec"))) which says that it should be treated as if it is defined in an executable. This removes the overhead, but adds the limitation that the runtime should not be loaded with dlopen.