This is an experiment to develop **a majestic new GF runtime**. The reason is that there are several features that we want to have and they all require a major rewrite of the existing C runtime. Instead of beating the old code until it starts doing what we want, it is time to start from scratch. # New Features The features that we want are: - We want to support **even bigger grammars that don't fit in the main memory** anymore. Instead, they should reside on the disc and parts will be loaded on demand. The current design is that all memory allocated for the grammars should be from memory-mapped files. In this way the only limit for the grammar size will be the size of the virtual memory, i.e. 2^64 bytes. The swap file is completely circumvented, while all of the available RAM can be used as a cache for loading parts of the grammar. - We want to be able to **update grammars dynamically**. This is a highly desired feature since recompiling large grammars takes hours. Instead, dynamic updates should happen instantly. - We want to be able to **store additional information in the PGF**. For example that could be application specific semantic data. Another example is to store the source code of the different grammar rules, to allow the compiler to recompile individual rules. - We want to **allow a single file to contain slightly different versions of the grammar**. This will be a kind of a version control system, which will allow different users to store their own grammar extensions while still using the same core content. - We want to **avoid the exponential explosion in the size of PMCFG** for some grammars. This happens because PMCFG as a formalism is too low-level. By enriching it with light-weight variables, we can make it more powerful and hopefully avoid the exponential explosion. - We want to finally **ditch the old Haskell runtime** which has long outlived its time. There are also two bugs in the old C runtime whose fixes will require a lot of changes, so instead of fixing the old runtime we do it here: - **Integer literals in the C runtime** are implemented as 32-bit integers, while the Haskell runtime used unlimited integers. Python supports unlimited integers too, so it would be nice to support them in the new runtime as well. - The old C runtime assumed that **String literals are terminated with the NULL character**. None of the modern languages (Haskell, Python, Java, etc) make that assumption, so we should drop it too. # Consequences The desired features will have the following implementation cosequences. - The switch from memory-based to disc-based runtime requires one big change. Before it was easy to just keep a pointer from one object to another. Unfortunately this doesn't work with memory-mapped files, since every time when you map a file into memory it may end up at a different virtual address. Instead we must use file offsets. In order to make programming simpler, the new runtime will be **implemented in C++ instead of C**. This allows us to overload the arrow operator (`->`) which will dynamically convert file offsets to in-memory pointers. - The choice of C++ also allows us to ditch the old `libgu` library and **use STL** instead. - The content of the memory mapped files is platform-specific. For that reason there will be two grammar representations: - **Native Grammar Format** (`.ngf`) - which will be instantly loadable by just mapping it to memory, but will be platform-dependent. - **Portable Grammar Format** (`.pgf`) - which will take longer to load but will be more compact and platform independent. The runtime will be able to load `.pgf` files and convert them to `.ngf`. Conversely `.pgf` can be exported from the current `.ngf`.