Vendor things

This commit is contained in:
John Doty 2024-03-08 11:03:01 -08:00
parent 5deceec006
commit 977e3c17e5
19434 changed files with 10682014 additions and 0 deletions

@@ -0,0 +1,106 @@
//! Internal details.
//!
//! While the other parts of the documentation are useful to users of the crate, this part is
//! probably helpful only if you want to look into the code or are curious about how it works
//! internally.
//!
//! Also note that any of these details may change in future versions and are not part of the
//! stability guarantees. Don't rely on anything here.
//!
//! # Storing the [`Arc`]
//!
//! The [`Arc`] can be turned into a raw pointer and back. This is abstracted by the [`RefCnt`]
//! trait and it is technically possible to implement it for custom types (this crate also
//! implements it for [`Rc`] and [`Weak`], though the actual usefulness of these is a bit
//! questionable).
//!
//! The raw pointer is stored inside an [`AtomicPtr`].
//!
//! # Protection of reference counts
//!
//! The first idea would be to just use [`AtomicPtr`] with whatever [`Arc::into_raw`] returns.
//! Then replacing it would be fine (there's no need to update ref counts). The load, however,
//! needs to increment the reference count: one reference still stays inside and another is
//! returned to the caller. This is done by re-creating the Arc from the raw pointer and cloning
//! it, then forgetting one instance (without running its destructor, so no count is decremented).
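//!
//! As a sketch (using only `std` and illustrative names, not the crate's real internals), that
//! naive load would look something like this:
//!
//! ```rust
//! use std::mem;
//! use std::sync::Arc;
//! use std::sync::atomic::{AtomicPtr, Ordering};
//!
//! // The storage owns one strong reference, hidden inside the raw pointer.
//! let storage = AtomicPtr::new(Arc::into_raw(Arc::new(42u32)) as *mut u32);
//!
//! // Naive load: re-create the Arc, clone it, then forget the borrowed
//! // instance so the reference owned by the storage is not dropped.
//! let loaded: Arc<u32> = unsafe {
//!     let borrowed = Arc::from_raw(storage.load(Ordering::Acquire));
//!     let result = Arc::clone(&borrowed);
//!     mem::forget(borrowed);
//!     result
//! };
//! assert_eq!(*loaded, 42);
//!
//! // Cleanup: take the reference back out of the storage.
//! drop(unsafe { Arc::from_raw(storage.load(Ordering::Acquire)) });
//! ```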
//!
//! This approach has a problem. There's a short window between the moment we read the raw pointer
//! and the moment we increment the count. If some other thread replaces the stored Arc during that
//! window and throws it away, the ref count could drop to 0 and the data get destroyed, and we
//! would be trying to bump the ref count of a ghost, which would be totally broken.
//!
//! To prevent this, we actually use two approaches in a hybrid manner.
//!
//! The first one is based on the hazard-pointer idea, but slightly modified. There's a global
//! repository of pointers that owe a reference. When someone swaps a pointer out, it walks this
//! list and pays all the debts (taking them out of the repository).
//!
//! For simplicity and performance, storing into the repository is fallible. If the store fails
//! (because the thread used up all its own slots, or because the pointer got replaced at just the
//! wrong moment and the reservation can't be confirmed), we don't retry as the full
//! hazard-pointers approach would, but fall back onto the secondary strategy.
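//!
//! A heavily simplified sketch of such a fallible slot follows; the real `debt` module keeps
//! per-thread groups of slots and more bookkeeping, so the names and layout here are purely
//! illustrative:
//!
//! ```rust
//! use std::sync::atomic::{AtomicUsize, Ordering};
//!
//! const EMPTY: usize = 0; // assuming a null pointer is never stored
//!
//! struct Slot(AtomicUsize);
//!
//! impl Slot {
//!     // Reader: try to write down the borrowed pointer. This is fallible; on
//!     // failure the caller falls back to the slower strategy instead of retrying.
//!     fn try_reserve(&self, ptr: usize) -> bool {
//!         self.0
//!             .compare_exchange(EMPTY, ptr, Ordering::SeqCst, Ordering::Relaxed)
//!             .is_ok()
//!     }
//!
//!     // Writer: pay the debt for `ptr` (the ref count is bumped elsewhere) and
//!     // clear the slot, so the reader knows it now owns a full reference.
//!     fn pay(&self, ptr: usize) -> bool {
//!         self.0
//!             .compare_exchange(ptr, EMPTY, Ordering::AcqRel, Ordering::Relaxed)
//!             .is_ok()
//!     }
//! }
//!
//! let slot = Slot(AtomicUsize::new(EMPTY));
//! assert!(slot.try_reserve(0xdead)); // the first reservation succeeds
//! assert!(!slot.try_reserve(0xbeef)); // the slot is taken, fall back
//! assert!(slot.pay(0xdead)); // the writer pays the debt and frees the slot
//! ```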
//!
//! The secondary strategy is similar, but a bit more complex (and therefore slower, that's why it
//! is only a fallback). We first publish an intent to read a pointer (and where we are reading it
//! from). Then we actually do so and publish the debt, like previously.
//!
//! The writer pays the debts as usual. But also, if it sees the intent to read the value, it helps
//! along, reads it, bumps the reference and passes it to the reader. Therefore, if the reader
//! fails to do the protection itself, because it got interrupted by a writer, it finds a
//! ready-made replacement value it can just use and doesn't have to retry. Also, the writer
//! doesn't have to wait for the reader in any way, because it can just solve its problem and move
//! on.
//!
//! # Unsafety
//!
//! All the uses of the `unsafe` keyword are just to turn the raw pointer back into an Arc. It
//! originated from an Arc in the first place, so the only thing to ensure is that it is still
//! valid, that is, that its ref count never dropped to 0.
//!
//! At the beginning, there's a ref count of 1 stored in the raw pointer (and maybe some others
//! elsewhere, but we can't rely on those). This 1 stays there for the whole time the pointer is
//! stored. When the Arc is replaced, this 1 is returned to the caller, so we just have to make
//! sure no more readers can access it by that time.
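//!
//! A sketch of such a swap (again with illustrative names, not the crate's real code): the one
//! reference hidden inside the storage is handed back to the caller, while the new value deposits
//! its own.
//!
//! ```rust
//! use std::sync::Arc;
//! use std::sync::atomic::{AtomicPtr, Ordering};
//!
//! unsafe fn naive_swap<T>(storage: &AtomicPtr<T>, new: Arc<T>) -> Arc<T> {
//!     // Deposit the one reference carried by `new` into the storage.
//!     let new_ptr = Arc::into_raw(new) as *mut T;
//!     let old_ptr = storage.swap(new_ptr, Ordering::SeqCst);
//!     // The 1 that lived inside the storage now belongs to the caller.
//!     Arc::from_raw(old_ptr)
//! }
//!
//! let storage = AtomicPtr::new(Arc::into_raw(Arc::new(1u32)) as *mut u32);
//! let old = unsafe { naive_swap(&storage, Arc::new(2)) };
//! assert_eq!(*old, 1);
//!
//! // Cleanup: take the remaining reference out of the storage.
//! drop(unsafe { Arc::from_raw(storage.load(Ordering::Acquire)) });
//! ```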
//!
//! # Leases and debts
//!
//! Instead of incrementing the reference count, the pointer reference can be owed. In such a case, it
//! is recorded into a global storage. As each thread has its own storage (the global storage is
//! composed of multiple thread storages), the readers don't contend. When the pointer is no longer
//! in use, the debt is erased.
//!
//! The writer pays all the existing debts, therefore the readers hold a fully counted Arc from
//! that time on. A reader that learns its debt was paid decrements the reference count itself
//! once it is done with the value.
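//!
//! The payment itself just increments the strong count through the raw pointer. The
//! `Arc::increment_strong_count`/`Arc::decrement_strong_count` pair in `std` illustrates the
//! principle (shown here only as an illustration, not necessarily what the crate calls
//! internally):
//!
//! ```rust
//! use std::sync::Arc;
//!
//! let value = Arc::new(7u32);
//! let ptr = Arc::as_ptr(&value);
//! assert_eq!(Arc::strong_count(&value), 1);
//!
//! // Writer side: pay the debt by bumping the count on behalf of the reader.
//! unsafe { Arc::increment_strong_count(ptr) };
//! assert_eq!(Arc::strong_count(&value), 2);
//!
//! // Reader side: made aware the debt was paid, it decrements the count when
//! // it no longer uses the value.
//! unsafe { Arc::decrement_strong_count(ptr) };
//! assert_eq!(Arc::strong_count(&value), 1);
//! ```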
//!
//! # Memory orders
//!
//! ## Synchronizing the data pointed to by the pointer
//!
//! We have AcqRel (well, SeqCst, but that's included) on the swap and Acquire on the loads. In case
//! of the double read around the debt allocation, we do that on the *second* one, because of ABA.
//! That's also why the SeqCst on the allocation of the debt itself is not enough.
//!
//! ## The reference count
//!
//! The destruction of the pointed-to data must happen after the *latest* decrement of the
//! reference count. By making both the increment and decrement AcqRel, we effectively chain the
//! edges together.
//!
//! # Memory orders around debts
//!
//! The linked list of debt nodes only grows. The shape of the list (existence of nodes) is
//! synchronized through Release on creation and Acquire on load on the head pointer.
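//!
//! The same publication pattern can be sketched on a toy grow-only list (illustrative only; the
//! real nodes hold the debt slots and more). The Release on the successful head update pairs with
//! the Acquire on the loads, so whoever sees a node also sees its contents:
//!
//! ```rust
//! use std::ptr;
//! use std::sync::atomic::{AtomicPtr, Ordering};
//!
//! struct Node {
//!     value: u32,
//!     next: *mut Node,
//! }
//!
//! struct List {
//!     head: AtomicPtr<Node>,
//! }
//!
//! impl List {
//!     fn new() -> Self {
//!         List { head: AtomicPtr::new(ptr::null_mut()) }
//!     }
//!
//!     // Publish a new node; Release makes its contents visible to every
//!     // thread that Acquire-loads the head afterwards.
//!     fn push(&self, value: u32) {
//!         let node = Box::into_raw(Box::new(Node { value, next: ptr::null_mut() }));
//!         loop {
//!             let current = self.head.load(Ordering::Acquire);
//!             unsafe { (*node).next = current };
//!             if self
//!                 .head
//!                 .compare_exchange_weak(current, node, Ordering::Release, Ordering::Relaxed)
//!                 .is_ok()
//!             {
//!                 break;
//!             }
//!         }
//!     }
//!
//!     // Walk the list; the Acquire pairs with the Release in push. The nodes
//!     // are intentionally leaked here, the sketch never removes them.
//!     fn values(&self) -> Vec<u32> {
//!         let mut out = Vec::new();
//!         let mut current = self.head.load(Ordering::Acquire);
//!         while !current.is_null() {
//!             unsafe {
//!                 out.push((*current).value);
//!                 current = (*current).next;
//!             }
//!         }
//!         out
//!     }
//! }
//!
//! let list = List::new();
//! list.push(1);
//! list.push(2);
//! assert_eq!(list.values(), vec![2, 1]);
//! ```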
//!
//! The debts work similarly to locks: Acquire and Release bracket all the pointer manipulation
//! within the interval where the debt is written down. However, we use SeqCst on the allocation
//! of the debt, because when we see an empty slot, we need to be sure that seeing it empty
//! happened after we had overwritten the pointer.
//!
//! In case the writer pays the debt, it sees new enough data (for the same reason the stale
//! empties are not seen). The reference count manipulation on the Arc is AcqRel and makes sure it is not
//! destroyed too soon. The writer traverses all the slots, therefore they don't need to synchronize
//! with each other.
//!
//! Further details are inside the internal `debt` module.
//!
//! [`RefCnt`]: crate::RefCnt
//! [`Arc`]: std::sync::Arc
//! [`Arc::into_raw`]: std::sync::Arc::into_raw
//! [`Rc`]: std::rc::Rc
//! [`Weak`]: std::sync::Weak
//! [`AtomicPtr`]: std::sync::atomic::AtomicPtr

@@ -0,0 +1,53 @@
//! Limitations and common pitfalls.
//!
//! # Sized types
//!
//! This currently works only for `Sized` types. Unsized types have „fat pointers“, which are twice
//! as large as the normal ones. The [`AtomicPtr`] doesn't support them. One could use something
//! like an `AtomicU128` for them, but the catch is that such a type doesn't exist, and the size
//! difference would make it really hard to implement the debt storage/stripped-down hazard
//! pointers.
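//!
//! The size difference is easy to check:
//!
//! ```rust
//! use std::mem::size_of;
//!
//! // A pointer to a Sized type is one word; pointers to slices or trait
//! // objects additionally carry a length or a vtable pointer.
//! assert_eq!(size_of::<*const u8>(), size_of::<usize>());
//! assert_eq!(size_of::<*const [u8]>(), 2 * size_of::<usize>());
//! assert_eq!(size_of::<*const dyn std::fmt::Debug>(), 2 * size_of::<usize>());
//! ```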
//!
//! A workaround is to use double indirection:
//!
//! ```rust
//! # use arc_swap::ArcSwap;
//! // This doesn't work:
//! // let data: ArcSwap<[u8]> = ArcSwap::new(Arc::from([1, 2, 3]));
//!
//! // But this does:
//! let data: ArcSwap<Box<[u8]>> = ArcSwap::from_pointee(Box::new([1, 2, 3]));
//! # drop(data);
//! ```
//!
//! It also may be possible to use `ArcSwap` with the [`triomphe::ThinArc`] (that crate needs
//! enabling a feature flag to cooperate with `ArcSwap`).
//!
//! # Too many [`Guard`]s
//!
//! There's only a limited number of "fast" slots for borrowing from [`ArcSwap`] in each single
//! thread (currently 8, but this might change in future versions). If these run out, the
//! algorithm falls back to a slower path.
//!
//! If too many [`Guard`]s are kept around, the performance might be poor. These are not intended
//! to be stored in data structures or used across async yield points.
//!
//! [`ArcSwap`]: crate::ArcSwap
//! [`Guard`]: crate::Guard
//! [`AtomicPtr`]: std::sync::atomic::AtomicPtr
//!
//! # No `Clone` implementation
//!
//! A previous version implemented [`Clone`], but it turned out to be very confusing, since it
//! created a fully independent [`ArcSwap`]. Users expected the instances to be tied to each
//! other, so that a store into one would change the result of future loads from the other.
//!
//! To emulate the original behaviour, one can do something like this:
//!
//! ```rust
//! # use arc_swap::ArcSwap;
//! # let old = ArcSwap::from_pointee(42);
//! let new = ArcSwap::new(old.load_full());
//! # let _ = new;
//! ```
//!
//! [`triomphe::ThinArc`]: https://docs.rs/triomphe/latest/triomphe/struct.ThinArc.html

@@ -0,0 +1,54 @@
//! Additional documentation.
//!
//! Here we have some more general topics that are good to know but just don't fit into the
//! crate-level intro.
//!
//! Also, there were some previous blog posts about the crate which you might find interesting.
//!
//! # Atomic orderings
//!
//! Each operation on [`ArcSwapAny`] with the [`DefaultStrategy`] that can be called concurrently
//! (eg. [`load`], but not [`into_inner`]) contains at least one [`SeqCst`] atomic read-write
//! operation; therefore, even operations on different instances have a defined global order of
//! operations.
//!
//! # Features
//!
//! The `weak` feature adds the ability to use arc-swap with the [`Weak`] pointer too,
//! through the [`ArcSwapWeak`] type. The needed std support was stabilized in Rust 1.45.
//!
//! The `experimental-strategies` feature enables a few more strategies. Note that these
//! **are not** part of the API stability guarantees and they may be changed, renamed or removed at
//! any time.
//!
//! The `experimental-thread-local` feature can be used to build arc-swap for `no_std` targets, by
//! replacing occurrences of [`std::thread_local!`] with the `#[thread_local]` attribute. This
//! requires a nightly Rust compiler as it makes use of the experimental
//! [`thread_local`](https://doc.rust-lang.org/unstable-book/language-features/thread-local.html)
//! feature. Using this feature, thread-local variables are compiled using LLVM built-ins, which
//! have [several underlying modes of
//! operation](https://doc.rust-lang.org/beta/unstable-book/compiler-flags/tls-model.html). To add
//! support for thread-local variables on a platform that does not have OS or linker support, the
//! easiest way is to use `-Ztls-model=emulated` and to implement `__emutls_get_address` by hand,
//! as in [this
//! example](https://opensource.apple.com/source/clang/clang-800.0.38/src/projects/compiler-rt/lib/builtins/emutls.c.auto.html)
//! from Clang.
//!
//! # Minimal compiler version
//!
//! The `1` versions will compile on all compilers supporting the 2018 edition. Note that this
//! applies only if no additional feature flags are enabled and does not apply to compiling or
//! running tests.
//!
//! [`ArcSwapAny`]: crate::ArcSwapAny
//! [`ArcSwapWeak`]: crate::ArcSwapWeak
//! [`load`]: crate::ArcSwapAny::load
//! [`into_inner`]: crate::ArcSwapAny::into_inner
//! [`DefaultStrategy`]: crate::DefaultStrategy
//! [`SeqCst`]: std::sync::atomic::Ordering::SeqCst
//! [`Weak`]: std::sync::Weak
pub mod internal;
pub mod limitations;
pub mod patterns;
pub mod performance;

@@ -0,0 +1,271 @@
//! Common use patterns
//!
//! Here are some common patterns one can use for inspiration. Most of them are covered by
//! examples at the relevant types in the crate, but this page collects them in a single place.
//!
//! # Sharing of configuration data
//!
//! We want to share configuration, with rare updates, from some source to several
//! high-performance worker threads. It can be configuration in its true sense, or a routing
//! table.
//!
//! The idea here is that each new version is allocated in its own [`Arc`]. It is then stored
//! into a *shared* [`ArcSwap`] instance.
//!
//! Each worker then loads the current version before each work chunk. If a new version is stored
//! meanwhile, the worker keeps using the loaded one until it finishes the work chunk and, if it
//! is the last one holding that version, deallocates it automatically by dropping the [`Guard`].
//!
//! Note that the configuration needs to be passed through a *single shared* [`ArcSwap`]. That
//! means we need to share that instance and we do so through an [`Arc`] (one could use a global
//! variable instead).
//!
//! Therefore, what we have is `Arc<ArcSwap<Config>>`.
//!
//! ```rust
//! # use std::sync::Arc;
//! # use std::sync::atomic::{AtomicBool, Ordering};
//! # use std::thread;
//! # use std::time::Duration;
//! #
//! # use arc_swap::ArcSwap;
//! # struct Work;
//! # impl Work { fn fetch() -> Self { Work } fn perform(&self, _: &Config) {} }
//! #
//! #[derive(Debug, Default)]
//! struct Config {
//!     // ... Stuff in here ...
//! }
//!
//! // We wrap the ArcSwap into an Arc, so we can share it between threads.
//! let config = Arc::new(ArcSwap::from_pointee(Config::default()));
//!
//! let terminate = Arc::new(AtomicBool::new(false));
//! let mut threads = Vec::new();
//!
//! // The configuration thread
//! threads.push(thread::spawn({
//!     let config = Arc::clone(&config);
//!     let terminate = Arc::clone(&terminate);
//!     move || {
//!         while !terminate.load(Ordering::Relaxed) {
//!             thread::sleep(Duration::from_secs(6));
//!             // Actually, load it from somewhere
//!             let new_config = Arc::new(Config::default());
//!             config.store(new_config);
//!         }
//!     }
//! }));
//!
//! // The worker threads
//! for _ in 0..10 {
//!     threads.push(thread::spawn({
//!         let config = Arc::clone(&config);
//!         let terminate = Arc::clone(&terminate);
//!         move || {
//!             while !terminate.load(Ordering::Relaxed) {
//!                 let work = Work::fetch();
//!                 let config = config.load();
//!                 work.perform(&config);
//!             }
//!         }
//!     }));
//! }
//!
//! // Terminate gracefully
//! terminate.store(true, Ordering::Relaxed);
//! for thread in threads {
//!     thread.join().unwrap();
//! }
//! ```
//!
//! # Consistent snapshots
//!
//! One probably wants to get a fresh instance for every work chunk, so there would be one
//! [`load`] per chunk. But it is often also important that the configuration doesn't change in
//! the *middle* of processing a single chunk. Therefore, one commonly wants *exactly* one
//! [`load`] per work chunk, not *at least* one. If the processing has multiple phases, one would
//! use something like this:
//!
//! ```rust
//! # use std::sync::Arc;
//! #
//! # use arc_swap::ArcSwap;
//! # struct Config;
//! # struct Work;
//! # impl Work {
//! #     fn fetch() -> Self { Work }
//! #     fn phase_1(&self, _: &Config) {}
//! #     fn phase_2(&self, _: &Config) {}
//! # }
//! # let config = Arc::new(ArcSwap::from_pointee(Config));
//! let work = Work::fetch();
//! let config = config.load();
//! work.phase_1(&config);
//! // We keep the same config value here
//! work.phase_2(&config);
//! ```
//!
//! Over this:
//!
//! ```rust
//! # use std::sync::Arc;
//! #
//! # use arc_swap::ArcSwap;
//! # struct Config;
//! # struct Work;
//! # impl Work {
//! #     fn fetch() -> Self { Work }
//! #     fn phase_1(&self, _: &Config) {}
//! #     fn phase_2(&self, _: &Config) {}
//! # }
//! # let config = Arc::new(ArcSwap::from_pointee(Config));
//! let work = Work::fetch();
//! work.phase_1(&config.load());
//! // WARNING!! This is broken, because in between phase_1 and phase_2, the other thread could
//! // have replaced the config. Then each phase would be performed with a different one and that
//! // could lead to surprises.
//! work.phase_2(&config.load());
//! ```
//!
//! # Caching of the configuration
//!
//! Let's say that the work chunks are really small, but there's *a lot* of them to work on. Maybe
//! we are routing packets and the configuration is the routing table that can sometimes change,
//! but mostly doesn't.
//!
//! There's an overhead to [`load`]. If the work chunks are small enough, that could be
//! measurable. We can reach for [`Cache`]. It makes loads much faster (on the order of accessing
//! local variables) in case nothing has changed. It has two costs: it makes the load slightly
//! slower in case the value *did* change (which is rare), and if the worker is inactive, it
//! keeps the old cached value alive.
//!
//! This is OK for our use case, because the routing table is usually small enough so some stale
//! instances taking a bit of memory isn't an issue.
//!
//! The part that takes care of updates stays the same as above.
//!
//! ```rust
//! # use std::sync::Arc;
//! # use std::thread;
//! # use std::sync::atomic::{AtomicBool, Ordering};
//! # use arc_swap::{ArcSwap, Cache};
//! # struct Packet; impl Packet { fn receive() -> Self { Packet } }
//!
//! #[derive(Debug, Default)]
//! struct RoutingTable {
//!     // ... Stuff in here ...
//! }
//!
//! impl RoutingTable {
//!     fn route(&self, _: Packet) {
//!         // ... Interesting things are done here ...
//!     }
//! }
//!
//! let routing_table = Arc::new(ArcSwap::from_pointee(RoutingTable::default()));
//!
//! let terminate = Arc::new(AtomicBool::new(false));
//! let mut threads = Vec::new();
//!
//! for _ in 0..10 {
//!     let t = thread::spawn({
//!         let routing_table = Arc::clone(&routing_table);
//!         let terminate = Arc::clone(&terminate);
//!         move || {
//!             let mut routing_table = Cache::new(routing_table);
//!             while !terminate.load(Ordering::Relaxed) {
//!                 let packet = Packet::receive();
//!                 // This load is cheaper, because we cache in the thread-private Cache.
//!                 // But if the above receive takes a long time, the Cache will keep the
//!                 // stale value alive until then (when it gets replaced by the up-to-date
//!                 // value).
//!                 let current = routing_table.load();
//!                 current.route(packet);
//!             }
//!         }
//!     });
//!     threads.push(t);
//! }
//!
//! // Shut down properly
//! terminate.store(true, Ordering::Relaxed);
//! for thread in threads {
//!     thread.join().unwrap();
//! }
//! ```
//!
//! # Projecting into configuration field
//!
//! We have a larger application, composed of multiple components. Each component has its own
//! `ComponentConfig` structure. Then, the whole application has a `Config` structure that contains
//! a component config for each component:
//!
//! ```rust
//! # struct ComponentConfig;
//!
//! struct Config {
//!     component: ComponentConfig,
//!     // ... Some other components and things ...
//! }
//! # let c = Config { component: ComponentConfig };
//! # let _ = c.component;
//! ```
//!
//! We would like to use [`ArcSwap`] to push updates to the components. But for various reasons,
//! it's not a good idea to hand the whole `ArcSwap<Config>` to each component:
//!
//! * That would make each component depend on the top level config, which feels reversed.
//! * It doesn't allow reusing the same component in multiple applications, as these would have
//! different `Config` structures.
//! * One needs to build the whole `Config` for tests.
//! * There's a risk of entanglement, that the component would start looking at configuration of
//! different parts of code, which would be hard to debug.
//!
//! We also could have a separate `ArcSwap<ComponentConfig>` for each component, but that doesn't
//! feel right either: we would have to push updates to multiple places, they could be
//! inconsistent for a while, and we would have to decompose the `Config` structure into parts,
//! because the things we put into an [`ArcSwap`] need to live in their own [`Arc`]s.
//!
//! This is where the [`Access`] trait comes into play. The trait abstracts over things that can
//! give access to an up-to-date version of a specific `T`. That can be a [`Constant`] (which is
//! useful mostly for tests, where one doesn't care about updating), it can be an
//! [`ArcSwap<T>`][`ArcSwap`] itself, but it can also be an [`ArcSwap`] paired with a closure that
//! projects onto the specific field. The [`DynAccess`] is similar, but allows type erasure.
//! That's more convenient, but a little bit slower.
//!
//! ```rust
//! # use std::sync::Arc;
//! # use arc_swap::ArcSwap;
//! # use arc_swap::access::{DynAccess, Map};
//!
//! #[derive(Debug, Default)]
//! struct ComponentConfig;
//!
//! struct Component {
//!     config: Box<dyn DynAccess<ComponentConfig>>,
//! }
//!
//! #[derive(Debug, Default)]
//! struct Config {
//!     component: ComponentConfig,
//! }
//!
//! let config = Arc::new(ArcSwap::from_pointee(Config::default()));
//!
//! let component = Component {
//!     config: Box::new(Map::new(Arc::clone(&config), |config: &Config| &config.component)),
//! };
//! # let _ = component.config;
//! ```
//!
//! In unit tests, one would instead use `Box::new(Constant(ComponentConfig))` as the `config` field.
//!
//! The [`Cache`] has its own [`Access`][crate::cache::Access] trait for similar purposes.
//!
//! [`Arc`]: std::sync::Arc
//! [`Guard`]: crate::Guard
//! [`load`]: crate::ArcSwapAny::load
//! [`ArcSwap`]: crate::ArcSwap
//! [`Cache`]: crate::cache::Cache
//! [`Access`]: crate::access::Access
//! [`DynAccess`]: crate::access::DynAccess
//! [`Constant`]: crate::access::Constant

@@ -0,0 +1,87 @@
//! Performance characteristics.
//!
//! There are several performance advantages of [`ArcSwap`] over [`RwLock`].
//!
//! ## Lock-free readers
//!
//! All the read operations are always [lock-free]. Most of the time, they are actually
//! [wait-free]; only once in a while is an operation merely [lock-free], with at least
//! `usize::MAX / 4` [wait-free] accesses between two such occasions.
//!
//! Writers are [lock-free].
//!
//! Whenever the documentation talks about *contention* in the context of [`ArcSwap`], it means
//! contention on the CPU level: multiple cores having to access the same cache line. This slows
//! things down (compared to each core accessing its own cache line), but eventual progress is
//! still guaranteed and the cost is significantly lower than parking threads, as with mutex-style
//! contention.
//!
//! ## Speeds
//!
//! The baseline speed of read operations is similar to using an *uncontended* [`Mutex`].
//! However, [`load`] suffers no contention from any other read operations and only slight
//! ones during updates. The [`load_full`] operation is additionally contended only on
//! the reference count of the [`Arc`] inside so, in general, while [`Mutex`] rapidly
//! loses its performance when being in active use by multiple threads at once and
//! [`RwLock`] is slow to start with, [`ArcSwap`] mostly keeps its performance even when read by
//! many threads in parallel.
//!
//! Write operations are considered expensive. A write operation is more expensive than access to
//! an *uncontended* [`Mutex`] and on some architectures even slower than an uncontended
//! [`RwLock`]. However, it is faster than either of them under contention.
//!
//! There are some (very unscientific) [benchmarks] within the source code of the library, and the
//! [`DefaultStrategy`][crate::DefaultStrategy] has some numbers measured on my computer.
//!
//! The exact numbers are highly dependent on the machine used (both the absolute numbers and the
//! relative ones between different data structures). Not only do architectures have a huge impact
//! (eg. x86 vs. ARM), but so do AMD vs. Intel or even two different Intel processors. Therefore,
//! if speed matters more than the wait-free guarantees, you're advised to do your own
//! measurements.
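//!
//! A trivial (and equally unscientific) starting point for such a measurement, using only `std`;
//! the iteration count and the `Mutex` baseline here are arbitrary choices:
//!
//! ```rust
//! use std::hint::black_box;
//! use std::sync::Mutex;
//! use std::time::{Duration, Instant};
//!
//! // Time `iterations` runs of `op` and return the total duration.
//! fn time_op<F: FnMut()>(mut op: F, iterations: u32) -> Duration {
//!     let start = Instant::now();
//!     for _ in 0..iterations {
//!         op();
//!     }
//!     start.elapsed()
//! }
//!
//! let shared = Mutex::new(0u64);
//! let uncontended = time_op(|| { black_box(*shared.lock().unwrap()); }, 100_000);
//! println!("100k uncontended Mutex reads: {:?}", uncontended);
//! ```
//!
//! One would then time the equivalent `load` loop on an `ArcSwap` (ideally also with several
//! threads running concurrently) and compare the numbers on the actual target machine.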
//!
//! Further speed improvements may be gained by the use of the [`Cache`].
//!
//! ## Consistency
//!
//! The combination of [wait-free] guarantees of readers and no contention between concurrent
//! [`load`]s provides *consistent* performance characteristics of the synchronization mechanism.
//! This might be important for soft-realtime applications (the CPU-level contention caused by a
//! recent update/write operation might be problematic for some hard-realtime cases, though).
//!
//! ## Choosing the right reading operation
//!
//! There are several load operations available. While the general go-to one should be
//! [`load`], there may be situations in which the others are a better match.
//!
//! The [`load`] usually only borrows the instance from the shared [`ArcSwap`]. This makes
//! it faster, because different threads don't contend on the reference count. There are two
//! situations when this borrow isn't possible. If the content gets changed, all existing
//! [`Guard`]s are promoted to contain an owned instance. The promotion is done by the
//! writer, but the readers still need to decrement the reference counts of the old instance when
//! they no longer use it, contending on the count.
//!
//! The other situation derives from internal implementation. The number of borrows each thread can
//! have at each time (across all [`Guard`]s) is limited. If this limit is exceeded, an owned
//! instance is created instead.
//!
//! Therefore, if you intend to hold onto the loaded value for an extended span of time, you may
//! prefer [`load_full`]. It loads the pointer instance ([`Arc`]) without borrowing, which is
//! slower (because of the possible contention on the reference count), but doesn't consume one of
//! the borrow slots, making it more likely for subsequent [`load`]s to have a slot available.
//! Similarly, if some API needs an owned `Arc`, [`load_full`] is more convenient and potentially
//! faster than first [`load`]ing and then cloning that [`Arc`].
//!
//! Additionally, it is possible to use a [`Cache`] to get further speed improvement at the
//! cost of less comfortable API and possibly keeping the older values alive for longer than
//! necessary.
//!
//! [`ArcSwap`]: crate::ArcSwap
//! [`Cache`]: crate::cache::Cache
//! [`Guard`]: crate::Guard
//! [`load`]: crate::ArcSwapAny::load
//! [`load_full`]: crate::ArcSwapAny::load_full
//! [`Arc`]: std::sync::Arc
//! [`Mutex`]: std::sync::Mutex
//! [`RwLock`]: std::sync::RwLock
//! [benchmarks]: https://github.com/vorner/arc-swap/tree/master/benchmarks
//! [lock-free]: https://en.wikipedia.org/wiki/Non-blocking_algorithm#Lock-freedom
//! [wait-free]: https://en.wikipedia.org/wiki/Non-blocking_algorithm#Wait-freedom