This new milestone brings many low-level improvements to Red's memory management and garbage collection. Most of those are long-planned additions needed to complete the internal memory model and make it robust enough for the future stable Red v1.0.
First, here is a simplified overview of the Red memory model (existing parts in green color, new parts in orange, non-Red parts in blue):
All Red values are stored in series. Some Red values require one or more buffers to hold their content. The values can never reference a buffer directly, but only through a node reference, to enable relocation when expanding the series buffer or when moving it around during compaction by the GC.
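The indirection described above can be illustrated with a small model. This is only a sketch in Python, not Red's runtime code: `Node`, `SeriesValue` and `relocate` are hypothetical names chosen for the illustration. It shows why values holding a node reference keep working after the buffer moves.

```python
# Toy model of node indirection; all names are hypothetical, not Red's runtime API.

class Node:
    """A stable slot whose only job is to point at the current buffer."""
    def __init__(self, buffer):
        self.buffer = buffer

class SeriesValue:
    """A value that refers to its content through a node, never directly."""
    def __init__(self, node):
        self.node = node

    def append(self, item):
        self.node.buffer.append(item)

    def items(self):
        return list(self.node.buffer)

def relocate(node):
    """Simulate expansion or compaction: the content moves to a new buffer."""
    node.buffer = list(node.buffer)   # new buffer object, same content

node = Node([1, 2, 3])
a = SeriesValue(node)
b = SeriesValue(node)        # a second value sharing the same series
old_buffer = node.buffer
relocate(node)               # only the node is updated, not the values
b.append(4)                  # both values see the relocated buffer
```

Because only the node is rewritten on relocation, the cost of moving a buffer does not depend on how many values refer to it.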
Now let's dive into the hairy details!
External resources GC
The Red/View engine backends rely on external resources provided by the OS. Among those resources, some are linked to face! or font! objects and require special care when those objects are not reachable anymore. So far, our GC (Garbage Collector) was not able to release such resources (image bitmap buffers and font handles), as unreachable Red aggregate values are seen as simple series during the sweeping GC stage. To improve that, we have added an external resources manager that tracks and frees unused resources, now allowing unrestricted image and font usage!
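As a rough illustration of what such a manager does, here is a minimal Python sketch. The class, its methods and the handle names are all assumptions made for the example; the actual Red/View implementation is not shown in this post.

```python
# Hypothetical sketch: names and structure are assumptions, not Red/View's code.

class ExternalResources:
    def __init__(self):
        self._table = {}          # owner id -> (handle, release function)

    def register(self, owner_id, handle, release):
        """Remember which OS handle belongs to which Red object."""
        self._table[owner_id] = (handle, release)

    def sweep(self, live_owner_ids):
        """Release every resource whose owner was not marked live."""
        dead = [oid for oid in self._table if oid not in live_owner_ids]
        for oid in dead:
            handle, release = self._table.pop(oid)
            release(handle)
        return len(dead)

released = []                                   # stands in for OS release calls
resources = ExternalResources()
resources.register(1, "font-handle-A", released.append)
resources.register(2, "bitmap-B", released.append)
freed = resources.sweep(live_owner_ids={1})     # object 2 is unreachable
```

The key point is that the sweep is driven by reachability of the owning object, so the OS handle is released at the same time the object itself becomes garbage.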
Accurate GC
The Red GC relies on allocated memory walking and native stack scanning to identify live Red values. Scanning the native stack can be challenging. So far, the scanner used a conservative approach, which is simpler but can lead to corruption or crashes in rare cases (e.g. a floating point number being mistaken for a series or node pointer). Moreover, such an approach precluded a node frames GC, as there was no way to accurately identify node pointers on the stack. This is now solved. The plan was always to make the scanner precise when getting closer to Red v1.0, and that is what we did in this release.
In order to achieve that, several key additions were made:
➤ Frame records hints: the R/S compiler now generates a map of hints for arguments and local variables which are series/node pointers, using bit arrays stored in the .data segment and retrieved during scanning. In order to match a call frame on the stack to the right bit array, an offset is now pushed on the stack by each function call as part of its prolog sequence. Only the stack slots corresponding to 1's in the bit array are analyzed further to identify their origin series/node frame, then marked and stored in a list for the collector to later update them if needed. The bit arrays are compressed using our CRUSH algorithm implementation, so that, e.g. for the GUI console executable, all bit arrays add only about 3472 extra bytes to the final executable.
➤ Variadic hints (typed vs untyped): for variadic functions, the bit array is dynamically created. If the typed mode is used, an accurate bitmap is produced. If the generic untyped variadic mode is used, all the argument stack slots will be marked for processing. This could, in theory, create false positives, but in practice, in Red's runtime code, all such cases are safe, referencing only Red values.
➤ Optimized pointer identification performance: each pointer extracted from the stack needs to be confirmed to be a valid series or node pointer. Such checking is now achieved using cached sorted lists and binary search, ensuring vastly faster operation.
➤ Optimized frame walking by skipping non-Red frames: the stack scanning is done by jumping between call frames, relying on the saved frame pointer in each frame to chain the frames. However, when R/S callback functions are invoked by external (mostly OS) code, those external frames should be skipped to avoid false positives and for the sake of performance. Now the scanner identifies which call frames are part of Red's code segment and skips the rest. One last hurdle remains: the dreadful compilation option in C compilers where the frame pointer is omitted in call frames (e.g. -fomit-frame-pointer in gcc). In such cases, walking the stack by dereferencing frame pointers is not an option anymore. The workaround is to save an extra "last known Red frame" pointer before calling any external code, which is then used by the scanner to jump over external code directly into the parent Red frame.
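The first and third points above (bit array hints, then confirmation by binary search) can be modeled in a few lines. This is a simplified Python sketch under stated assumptions: addresses are plain integers, one hint bit covers one stack slot, and `pointer_slots`, `FrameIndex` and `scan_frame` are hypothetical names, not the R/S runtime's.

```python
from bisect import bisect_right

# Hypothetical model: addresses are plain integers, names are illustrative.

def pointer_slots(bitmap, slots):
    """Yield only the slot values whose hint bit is 1 (bit i covers slot i)."""
    for i, value in enumerate(slots):
        if (bitmap >> i) & 1:
            yield value

class FrameIndex:
    """Cached sorted list of (start, end) address ranges of allocated frames."""
    def __init__(self, ranges):
        self.ranges = sorted(ranges)
        self.starts = [start for start, _ in self.ranges]

    def find(self, address):
        """Return the range containing address, or None, via binary search."""
        i = bisect_right(self.starts, address) - 1
        if i >= 0:
            start, end = self.ranges[i]
            if start <= address < end:
                return (start, end)
        return None

def scan_frame(bitmap, slots, index):
    """Collect slots that are hinted as pointers and fall inside a known frame."""
    return [v for v in pointer_slots(bitmap, slots) if index.find(v) is not None]

index = FrameIndex([(0x1000, 0x2000), (0x8000, 0x9000)])
slots = [0x1010, 0x1020, 42, 0x8800]   # slot 1 looks like a pointer but is not hinted
bitmap = 0b1101                        # slots 0, 2 and 3 are hinted
roots = scan_frame(bitmap, slots, index)
```

Slot 1 is never examined even though its value happens to fall inside a frame, which is the kind of false positive a conservative scanner would have reported.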
Node frames compaction
The GC is now capable of reclaiming node frames where the number of used slots is very low. Until now, this was a cause of memory leaks for long-running apps with bursts of high numbers of series allocations, as new node frames were allocated, but mostly unused (yet non-empty) ones were never released.
This is now taken care of through a special GC pass that runs when specific conditions are fulfilled, moving live nodes from emptier frames to fuller frames, then freeing the entirely empty frames. The GC then updates all references to relocated nodes during its marking and stack scanning stages.
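The pass can be sketched as follows. This Python model is an illustration only: the post does not state the triggering conditions or the frame selection policy, so the `sparse_threshold` parameter and the simple sparse/dense split used here are assumptions.

```python
# Hypothetical sketch; the threshold and policy are assumptions for illustration.

def compact(frames, capacity, sparse_threshold):
    """Move live nodes out of sparse frames into fuller ones (mutates frames).

    frames: list of lists of live node ids.
    Returns (surviving frames, {node id: (old frame index, new frame index)}).
    """
    sparse = [i for i, f in enumerate(frames) if len(f) <= sparse_threshold]
    dense = [i for i, f in enumerate(frames) if len(f) > sparse_threshold]
    relocated = {}
    for src in sparse:
        for dst in dense:
            while frames[src] and len(frames[dst]) < capacity:
                node = frames[src].pop()
                frames[dst].append(node)
                relocated[node] = (src, dst)
    survivors = [f for f in frames if f]     # entirely empty frames are freed
    return survivors, relocated

frames = [[1, 2, 3], [4], [5, 6]]
survivors, relocated = compact(frames, capacity=4, sparse_threshold=2)
```

In this run, node 4 moves into the first frame and its old frame is dropped, while the third frame stays because no fuller frame has room left. The `relocated` map plays the role of the information the GC needs to update references to moved nodes.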
In addition to that, the internal structure of node frames was improved. The free slots tracking method was changed from a stack-oriented model to linked lists of free slots, doubling the capacity of node frames while keeping constant-time allocation and freeing.
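A free-slot linked list can be modeled as below. This is a generic Python sketch of the technique, not Red's node frame layout; `NodeFrame` and `END` are hypothetical names. One plausible reading of the capacity gain is that free slots store the link themselves, so no separate stack of free entries is needed, but that is an assumption on my part.

```python
# Hypothetical sketch of an intrusive free list; not Red's actual node frame layout.

END = -1   # sentinel marking the end of the free list

class NodeFrame:
    def __init__(self, size):
        # size must be at least 1; each free slot holds the index of the next one.
        self.slots = list(range(1, size)) + [END]
        self.free_head = 0
        self.used = 0

    def alloc(self, payload):
        """Pop the head of the free list: constant time."""
        if self.free_head == END:
            raise MemoryError("node frame is full")
        index = self.free_head
        self.free_head = self.slots[index]
        self.slots[index] = payload
        self.used += 1
        return index

    def free(self, index):
        """Push the slot back on the free list: constant time."""
        self.slots[index] = self.free_head
        self.free_head = index
        self.used -= 1

frame = NodeFrame(4)
a = frame.alloc("buf-a")
b = frame.alloc("buf-b")
frame.free(a)
c = frame.alloc("buf-c")     # reuses the slot that was just freed
```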
External Red values reference management
Red values can sometimes be referenced by external non-Red code. The View engine relies on that and was storing copies of face object values inside external OS structures in order to retrieve them on OS-generated events that trigger Red callbacks. Such practice is neither reliable nor compatible with the new node GC, as some node pointers could be stored out of the GC's reach. So a new external values management system was introduced to export only a reference (an array index used as ID) to external code and keep all values inside Red series. The View backends were modified accordingly to rely on those references instead of copying the face object values.
That sub-system could in the future also be used for libRed external values management, to replace the ring buffer used there, which is functionally almost identical but now redundant.
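The idea of exporting an array index instead of the value itself can be sketched like this. The Python class below is a hypothetical model: `ExternalRefs` and its methods are names invented for the example, and the recycling of released IDs is an assumption rather than something the post states.

```python
# Hypothetical sketch; names are illustrative, not the View backend's API.

class ExternalRefs:
    def __init__(self):
        self._values = []      # in the real system this would be a Red series
        self._free = []        # released IDs available for reuse (assumption)

    def export(self, value):
        """Store value and return an integer ID safe to hand to external code."""
        if self._free:
            ref = self._free.pop()
            self._values[ref] = value
        else:
            ref = len(self._values)
            self._values.append(value)
        return ref

    def resolve(self, ref):
        """Called from a callback to get the value back from its ID."""
        return self._values[ref]

    def release(self, ref):
        self._values[ref] = None
        self._free.append(ref)

refs = ExternalRefs()
face = {"type": "button", "text": "Hi!"}
ref = refs.export(face)            # the OS structure stores only this integer
in_callback = refs.resolve(ref)    # an OS event later hands the integer back
```

Since external code only ever holds an integer, the GC is free to move nodes and buffers without leaving stale pointers outside its reach.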
Low-level allocations tracking
The Red runtime code sometimes needs to allocate memory regions which last until the end of the Red process or need to be kept away from the GC. For such use-cases, Red relies on malloc, simply imported from the C library. Instead of a direct mapping, it now uses a thin layer on top of it in order to track all allocations, providing extra features:
➤ Freeing of all system allocated memory regions on Red exit. This is not strictly needed for the Red runtime, but makes it possible to track potential leaks (a rare case, as most such allocations are permanent).
➤ Ability to gather stats about such allocations (reported in show/info in "allocated on heap" part).
➤ Buffer overflow detection in debug mode, using guard barriers placed at the tail of allocated buffers and checked for overflows on freeing.
➤ This layer is part of the R/S runtime, so available to R/S code too.
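The features listed above can be modeled together in a short sketch. This is Python standing in for a low-level allocator, so a `bytearray` replaces raw memory; the `TrackedHeap` class, the guard pattern and the stats fields are assumptions made for the illustration, not the R/S runtime's actual layer.

```python
# Hypothetical sketch: a bytearray stands in for raw memory returned by malloc.

GUARD = b"\xDE\xAD\xBE\xEF"     # arbitrary guard pattern chosen for the example

class TrackedHeap:
    def __init__(self):
        self._blocks = {}       # block id -> (bytearray, requested size)
        self._next_id = 0

    def alloc(self, size):
        block = bytearray(size) + GUARD      # guard barrier at the tail
        block_id = self._next_id
        self._next_id += 1
        self._blocks[block_id] = (block, size)
        return block_id, block

    def free(self, block_id):
        """Untrack the block and check its guard for an overflow."""
        block, size = self._blocks.pop(block_id)
        if bytes(block[size:]) != GUARD:
            raise RuntimeError(f"buffer overflow detected in block {block_id}")

    def stats(self):
        return {"count": len(self._blocks),
                "bytes": sum(size for _, size in self._blocks.values())}

    def free_all(self):
        """Release everything still allocated, e.g. at process exit."""
        leaked = list(self._blocks)
        for block_id in leaked:
            self.free(block_id)
        return leaked

heap = TrackedHeap()
block_id, buf = heap.alloc(8)
buf[8] = 0                      # writes one byte past the requested size
try:
    heap.free(block_id)
    overflow_caught = False
except RuntimeError:
    overflow_caught = True
```

Checking the guard only on freeing keeps the cost low, at the price of detecting an overflow later than the moment it happened.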
Other Changes
➤ stats native improvements: /info has been extended to also report the total memory allocated from the OS and the memory allocated from the heap (see above). The /show refinement has been implemented to pretty-print all that information.
➤ Lowered memory allocations in Red runtime at start (about 1MB gained in total).
➤ Memory frames integrity auto-testing in debug mode (only node frames for now).
➤ Handle! values now hold a sub-type, revealed by mold/all (for debugging purposes):
view [b: button "Hi!" [print mold/all b/state/1]]
#[handle! 030A063Eh window]
➤ Now the final buffer is preallocated internally for insert and append calls with /dup refinement, resulting in much lower memory usage.
➤ Using zero? with a point3D value always returned false due to an incomplete copy/paste change. Fixed now.
➤ Updated GPIO definitions for RaspberryPi devices. Pi 4 should be supported, but is untested so far. Pi 5 is not yet supported (should be updated soon).
Red/System changes
➤ Added system/lib-image to support libRedRT image properties.
Great release, as always. Brings solid ground towards the further 1.0 development. Looking forward to the async IO :-)
Great work!
It's really exciting to see the progress made on this fascinating language!
Good job on this release!