Notes on `sharded-slab`'s implementation and design.

# Design

The sharded slab's design is strongly inspired by the ideas presented by
Leijen, Zorn, and de Moura in [Mimalloc: Free List Sharding in
Action][mimalloc]. In this report, the authors present a novel design for a
memory allocator based on a concept of _free list sharding_.

Memory allocators must keep track of what memory regions are not currently
allocated ("free") in order to provide them to future allocation requests.
The term [_free list_][freelist] refers to a technique for performing this
bookkeeping, where each free block stores a pointer to the next free block,
forming a linked list. The memory allocator keeps a pointer to the most
recently freed block, the _head_ of the free list. To allocate more memory,
the allocator pops from the free list by setting the head pointer to the
next free block of the current head block, and returning the previous head.
To deallocate a block, the block is pushed to the free list by setting its
first word to the current head pointer, and the head pointer is set to point
to the deallocated block. Most implementations of slab allocators backed by
arrays or vectors use a similar technique, where pointers are replaced by
indices into the backing array.
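
As a minimal single-threaded sketch of that index-based technique (these
types are illustrative, not sharded-slab's actual implementation), a slab
can thread its free list through the backing vector using indices:

```rust
// A slot either holds a value or links to the next free slot by index.
enum Slot<T> {
    Occupied(T),
    Free { next: Option<usize> }, // index of the next free slot, if any
}

struct Slab<T> {
    slots: Vec<Slot<T>>,
    free_head: Option<usize>, // most recently freed slot, if any
}

impl<T> Slab<T> {
    fn new() -> Self {
        Slab { slots: Vec::new(), free_head: None }
    }

    fn insert(&mut self, value: T) -> usize {
        match self.free_head {
            // Pop the head of the free list and reuse its slot.
            Some(idx) => {
                if let Slot::Free { next } = self.slots[idx] {
                    self.free_head = next;
                }
                self.slots[idx] = Slot::Occupied(value);
                idx
            }
            // Free list is empty: grow the backing vector instead.
            None => {
                self.slots.push(Slot::Occupied(value));
                self.slots.len() - 1
            }
        }
    }

    fn remove(&mut self, idx: usize) -> Option<T> {
        // Push the slot onto the free list by storing the old head in it.
        match std::mem::replace(&mut self.slots[idx], Slot::Free { next: self.free_head }) {
            Slot::Occupied(value) => {
                self.free_head = Some(idx);
                Some(value)
            }
            // The slot was already free: put it back unchanged.
            old => {
                self.slots[idx] = old;
                None
            }
        }
    }
}
```

Note how freeing a slot makes it the new head, so the next insertion reuses
the most recently freed index before the vector grows again.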

When allocations and deallocations can occur concurrently across threads,
they must synchronize accesses to the free list; either by putting the
entire allocator state inside of a lock, or by using atomic operations to
treat the free list as a lock-free structure (such as a [Treiber stack]). In
both cases, there is a significant performance cost — even when the free
list is lock-free, it is likely that a noticeable amount of time will be
spent in compare-and-swap loops. Ideally, the global synchronization point
created by the single global free list could be avoided as much as possible.

The approach presented by Leijen, Zorn, and de Moura is to introduce
sharding and thus increase the granularity of synchronization significantly.
In mimalloc, the heap is _sharded_ so that each thread has its own
thread-local heap. Objects are always allocated from the local heap of the
thread where the allocation is performed. Because allocations are always
done from a thread's local heap, they need not be synchronized.

However, since objects can move between threads before being deallocated,
_deallocations_ may still occur concurrently. Therefore, Leijen et al.
introduce a concept of _local_ and _global_ free lists. When an object is
deallocated on the same thread it was originally allocated on, it is placed
on the local free list; if it is deallocated on another thread, it goes on
the global free list for the heap of the thread from which it originated. To
allocate, the local free list is used first; if it is empty, the entire
global free list is popped onto the local free list. Since the local free
list is only ever accessed by the thread it belongs to, it does not require
synchronization at all, and because the global free list is popped from
infrequently, the cost of synchronization has a reduced impact. A majority
of allocations can occur without any synchronization at all; and
deallocations only require synchronization when an object has left its
parent thread (a relatively uncommon case).
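
The local/global split can be sketched for a single page of slots. This is
an illustrative model, not sharded-slab's actual code: the `NULL` sentinel,
field names, and the use of `Cell` for the local head are all assumptions
made for the sketch (a real implementation would keep the local head in
per-thread state rather than a `Cell`, which makes this type `!Sync`):

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

const NULL: usize = usize::MAX; // sentinel meaning "no next slot"

struct Page {
    next: Vec<AtomicUsize>,   // per-slot links threading both free lists
    local_head: Cell<usize>,  // touched only by the owning thread
    remote_head: AtomicUsize, // other threads push freed slots here
}

impl Page {
    fn new(size: usize) -> Self {
        // Initially every slot sits on the local free list, in order.
        let next = (0..size)
            .map(|i| AtomicUsize::new(if i + 1 < size { i + 1 } else { NULL }))
            .collect();
        Page {
            next,
            local_head: Cell::new(if size > 0 { 0 } else { NULL }),
            remote_head: AtomicUsize::new(NULL),
        }
    }

    /// Allocate a slot index; called only by the owning thread.
    fn alloc(&self) -> Option<usize> {
        let mut head = self.local_head.get();
        if head == NULL {
            // Local list empty: steal the entire remote list in one swap.
            head = self.remote_head.swap(NULL, Ordering::Acquire);
        }
        if head == NULL {
            return None; // page is full
        }
        self.local_head.set(self.next[head].load(Ordering::Relaxed));
        Some(head)
    }

    /// Free a slot from the owning thread: no synchronization needed.
    fn free_local(&self, idx: usize) {
        self.next[idx].store(self.local_head.get(), Ordering::Relaxed);
        self.local_head.set(idx);
    }

    /// Free a slot from another thread: lock-free push onto the remote list.
    fn free_remote(&self, idx: usize) {
        let mut head = self.remote_head.load(Ordering::Relaxed);
        loop {
            self.next[idx].store(head, Ordering::Relaxed);
            match self.remote_head.compare_exchange_weak(
                head, idx, Ordering::Release, Ordering::Relaxed,
            ) {
                Ok(_) => return,
                Err(actual) => head = actual,
            }
        }
    }
}
```

The key property is visible in `alloc`: the common path touches only
`local_head`, and the single `swap` moves the whole remote list over at
once, so the compare-and-swap loop is confined to cross-thread frees.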

[mimalloc]: https://www.microsoft.com/en-us/research/uploads/prod/2019/06/mimalloc-tr-v1.pdf
[freelist]: https://en.wikipedia.org/wiki/Free_list
[Treiber stack]: https://en.wikipedia.org/wiki/Treiber_stack

# Implementation

A slab is represented as an array of [`MAX_THREADS`] _shards_. A shard
consists of a vector of one or more _pages_ plus associated metadata.
Finally, a page consists of an array of _slots_, plus head indices for the
local and remote free lists.

```text
┌─────────────┐
│ shard 1     │
│             │    ┌─────────────┐        ┌────────┐
│ pages───────┼───▶│ page 1      │        │        │
├─────────────┤    ├─────────────┤  ┌────▶│  next──┼─┐
│ shard 2     │    │ page 2      │  │     ├────────┤ │
├─────────────┤    │             │  │     │XXXXXXXX│ │
│ shard 3     │    │ local_head──┼──┘     ├────────┤ │
└─────────────┘    │ remote_head─┼──┐     │        │◀┘
      ...          ├─────────────┤  │     │  next──┼─┐
┌─────────────┐    │ page 3      │  │     ├────────┤ │
│ shard n     │    └─────────────┘  │     │XXXXXXXX│ │
└─────────────┘          ...        │     ├────────┤ │
                   ┌─────────────┐  │     │XXXXXXXX│ │
                   │ page n      │  │     ├────────┤ │
                   └─────────────┘  │     │        │◀┘
                                    └────▶│  next──┼───▶ ...
                                          ├────────┤
                                          │XXXXXXXX│
                                          └────────┘
```
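
The hierarchy in the diagram could be modelled roughly as follows. These
type and field names are illustrative only (they are not sharded-slab's
actual definitions, and the real crate takes `MAX_THREADS` from its
`Config` trait and uses atomics for the remote head):

```rust
const MAX_THREADS: usize = 4; // illustrative; the crate derives this from Config

struct Slab<T> {
    shards: Vec<Shard<T>>, // one shard per thread, MAX_THREADS in total
}

struct Shard<T> {
    pages: Vec<Page<T>>, // grown lazily, each page twice the previous size
}

struct Page<T> {
    slots: Box<[Slot<T>]>,
    local_head: usize,  // head index of the local free list
    remote_head: usize, // head index of the remote free list (atomic in reality)
}

struct Slot<T> {
    value: Option<T>,
    next_free: usize, // links this slot into a free list while vacant
}
```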

The size of the first page in a shard is always a power of two, and every
subsequent page added after the first is twice as large as the page that
precedes it.

```text
 pg.
┌───┐    ┌─┬─┐
│ 0 │───▶│ │ │
├───┤    ├─┼─┼─┬─┐
│ 1 │───▶│ │ │ │ │
├───┤    ├─┼─┼─┼─┼─┬─┬─┬─┐
│ 2 │───▶│ │ │ │ │ │ │ │ │
├───┤    ├─┼─┼─┼─┼─┼─┼─┼─┼─┬─┬─┬─┬─┬─┬─┬─┐
│ 3 │───▶│ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │
└───┘    └─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┘
```

When searching for a free slot, the smallest page is searched first, and if
it is full, the search proceeds to the next page until either a free slot is
found or all available pages have been searched. If all available pages have
been searched and the maximum number of pages has not yet been reached, a
new page is then allocated.
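
That search-and-grow procedure can be sketched as a toy model. The
`MAX_PAGES` cap, `INITIAL_PAGE_SIZE` value, and the per-page `used` counter
are assumptions made for the sketch (the real implementation walks free
lists rather than a counter):

```rust
const MAX_PAGES: usize = 4;         // illustrative cap on pages per shard
const INITIAL_PAGE_SIZE: usize = 2; // smallest page, a power of two

struct Page {
    size: usize,
    used: usize, // stand-in for the real free-list bookkeeping
}

/// Returns `(page index, slot index)` for a newly claimed slot, or `None`
/// when every page is full and no more pages may be added.
fn alloc_slot(pages: &mut Vec<Page>) -> Option<(usize, usize)> {
    // Search the smallest page first, proceeding to larger pages.
    for (i, page) in pages.iter_mut().enumerate() {
        if page.used < page.size {
            page.used += 1;
            return Some((i, page.used - 1));
        }
    }
    // All existing pages are full: allocate a new page twice the size of
    // the last one, if the maximum number of pages has not been reached.
    if pages.len() < MAX_PAGES {
        let size = INITIAL_PAGE_SIZE << pages.len();
        pages.push(Page { size, used: 1 });
        return Some((pages.len() - 1, 0));
    }
    None
}
```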

Since every page is twice as large as the previous page, and all page sizes
are powers of two, we can determine the page index that contains a given
address by shifting the address down by the smallest page size and
looking at how many twos places are necessary to represent that number,
which tells us what power-of-two page size it fits inside of. We can
determine the number of twos places by counting the number of leading
zeros (unused twos places) in the number's binary representation, and
subtracting that count from the total number of bits in a word.

The formula for determining the page number that contains an offset is thus:

```rust,ignore
WIDTH - ((offset + INITIAL_PAGE_SIZE) >> INDEX_SHIFT).leading_zeros()
```

where `WIDTH` is the number of bits in a `usize`, and `INDEX_SHIFT` is:

```rust,ignore
INITIAL_PAGE_SIZE.trailing_zeros() + 1;
```
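
The formula can be checked with a small runnable example. Here
`INITIAL_PAGE_SIZE = 32` is an arbitrary power of two chosen for the
sketch, and the naive loop is a reference implementation written for
comparison, not part of the crate:

```rust
const INITIAL_PAGE_SIZE: usize = 32; // arbitrary power of two for the example
const WIDTH: u32 = usize::BITS;
const INDEX_SHIFT: u32 = INITIAL_PAGE_SIZE.trailing_zeros() + 1;

/// The shift-and-count formula from the text.
fn page_index(offset: usize) -> u32 {
    WIDTH - ((offset + INITIAL_PAGE_SIZE) >> INDEX_SHIFT).leading_zeros()
}

/// Naive reference: walk the pages (32, 64, 128, ...) until the offset
/// falls inside one.
fn page_index_naive(mut offset: usize) -> u32 {
    let mut size = INITIAL_PAGE_SIZE;
    let mut page = 0;
    while offset >= size {
        offset -= size;
        size *= 2;
        page += 1;
    }
    page
}
```

For example, offsets 0–31 land in page 0, 32–95 in page 1, and 96–223 in
page 2, and both functions agree on every offset.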

[`MAX_THREADS`]: https://docs.rs/sharded-slab/latest/sharded_slab/trait.Config.html#associatedconstant.MAX_THREADS