We use reference counting to keep track of which objects should be kept alive. For instance, `pgf_free_revision` knows that a transient revision should be removed only when its reference count reaches zero. This means that there is no process or thread using it. The function also checks whether the revision is persistent. Persistent revisions are never removed since they can always be retrieved with `checkoutPGF`.
Clients are supposed to correctly use `pgf_free_revision` to indicate that they don't need a revision any more. Unfortunately, this is not always possible to guarantee. For example, many languages with garbage collection call `pgf_free_revision` from a finalizer method. In some languages, however, the finalizer is not guaranteed to run if the process terminates before garbage collection has happened; Haskell is one of those languages. Even in languages with reference counting, like Python, the process may be killed by the operating system and the finalizer may still never run.
The solution is that we count on the database clients to correctly report when a revision is no longer needed. In addition, to be on the safe side, on a fresh database restart we explicitly clean up all leftover transient revisions. This means that even if a client is killed, or if it does not correctly release its revisions, the worst that can happen is a memory leak until the next restart. By a fresh restart we mean a situation where a process opens a database that is not used by anyone else. To detect that case we maintain a list of processes that currently have access to the file. When a new process is added, we also remove all processes in the list that are no longer alive. If at the end the list contains only one element, then this is a fresh restart.
## Inter-process Communication
The same database may be opened by several processes. In that case, each process creates a mapping of the database into its own address space. The mapping is shared, which means that if a page from the database gets loaded into memory, it is loaded in a single place in physical memory. That physical memory is then assigned possibly different virtual addresses in each process. All processes can read the data simultaneously, but if we let them change it at the same time, all kinds of problems may happen. To avoid that, we store a single-writer/multiple-readers lock in the database file, which the processes use for synchronization.
## Atomicity