GPU and deadlocks
Copyright © Schmied Enterprises LLC, 2025.
The key to large clustered AI virtual computers
Deadlocks occur when two or more processes are blocked indefinitely, each waiting for another to release a resource. This can bring a system to a grinding halt. One common sign of a deadlock is a set of processes stuck at zero CPU usage: each is blocked waiting, often because of a circular dependency.
1. Circular Dependencies: A Recipe for Gridlock
Circular dependency deadlocks happen when a chain of processes each hold a resource that the next process in the chain needs. Imagine Process A holds Resource X and needs Resource Y, Process B holds Resource Y and needs Resource Z, and Process C holds Resource Z and needs Resource X. No process can proceed, creating a standstill. The common solution is acquiring locks in a deterministic order.
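To make the fix concrete, here is a minimal sketch in Go (the language and the account/transfer names are illustrative, not from the article): both locks are always taken in ascending id order, so no two transfers can each hold one lock while waiting on the other.

```go
package main

import (
	"fmt"
	"sync"
)

// account is a hypothetical resource guarded by its own mutex.
type account struct {
	id      int
	mu      sync.Mutex
	balance int
}

// transfer locks both accounts in ascending id order, so two
// concurrent transfers between the same pair can never hold one
// lock each while waiting on the other.
func transfer(from, to *account, amount int) {
	first, second := from, to
	if second.id < first.id {
		first, second = second, first
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()

	from.balance -= amount
	to.balance += amount
}

func main() {
	a := &account{id: 1, balance: 100}
	b := &account{id: 2, balance: 100}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); transfer(a, b, 10) }()
	go func() { defer wg.Done(); transfer(b, a, 5) }()
	wg.Wait()
	fmt.Println(a.balance, b.balance) // 95 105
}
```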
2. Busy Locks: When "Busy" Becomes a Problem
Busy locks (also known as spinlocks) are a low-level locking mechanism in which a process repeatedly checks whether a resource is available. They can be efficient for short-duration locks, but they become problematic when a lock is held too long: a process spinning on a busy lock wastes CPU cycles and can starve other processes. If the lock is never released, the spinning acquire call never returns and hogs a core. Spinlocks were a frequent source of bugs in drivers for legacy Windows and Linux systems.
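Below is a minimal spinlock sketch in Go built on an atomic compare-and-swap. The spinLock type is hypothetical, and the runtime.Gosched() call is one way to soften the busy wait described above.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// spinLock is a minimal busy-waiting lock built on an atomic flag.
type spinLock struct {
	state atomic.Int32 // 0 = unlocked, 1 = locked
}

func (l *spinLock) Lock() {
	for !l.state.CompareAndSwap(0, 1) {
		// Yield instead of spinning hot; a spinner that never
		// yields can starve the goroutine holding the lock.
		runtime.Gosched()
	}
}

func (l *spinLock) Unlock() {
	l.state.Store(0)
}

func main() {
	var l spinLock
	var wg sync.WaitGroup
	n := 0
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				l.Lock() // critical section is tiny, so spinning is tolerable
				n++
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(n) // 4000
}
```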
Concurrency Control Mechanisms: Managing Access
These mechanisms are designed to prevent data corruption and ensure consistency when multiple processes access shared resources.
1. Dijkstra Semaphores: Signaling Availability
Dijkstra semaphores are a classic synchronization tool. They use integer values to signal the availability of resources. Processes can decrement the semaphore to acquire a resource (wait) and increment it to release a resource (signal). Semaphores help manage access to a limited number of resources, preventing race conditions.
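A counting semaphore can be sketched in Go with a buffered channel whose capacity is the number of available resources. The semaphore type and the wait/signal names below mirror Dijkstra's P and V operations and are illustrative.

```go
package main

import (
	"fmt"
	"sync"
)

// semaphore implements counting-semaphore semantics with a buffered
// channel: the buffer capacity is the number of available resources.
type semaphore chan struct{}

func newSemaphore(n int) semaphore { return make(semaphore, n) }

// wait (Dijkstra's P) acquires one unit, blocking while none are free.
func (s semaphore) wait() { s <- struct{}{} }

// signal (Dijkstra's V) releases one unit.
func (s semaphore) signal() { <-s }

func main() {
	sem := newSemaphore(3) // at most 3 workers hold a resource at once
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			sem.wait()
			defer sem.signal()
			fmt.Println("worker", id, "holds a resource")
		}(i)
	}
	wg.Wait()
}
```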
2. Hierarchical Locks: Orderly Access
Hierarchical locking imposes a strict order in which locks can be acquired. This prevents circular dependencies and reduces the risk of deadlocks. Processes must acquire locks in a predefined sequence, ensuring that no process can block another indefinitely.
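Where the transfer example above ordered two peers by id, a lock hierarchy generalizes the idea with explicit levels. One possible sketch in Go: each mutex carries a level, and a helper always acquires a batch in ascending level order. The leveledMutex and lockAll names are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// leveledMutex pairs a mutex with its position in the lock hierarchy.
type leveledMutex struct {
	level int
	mu    sync.Mutex
}

// lockAll acquires a set of leveled mutexes strictly in ascending
// level order, which makes a circular wait impossible: a holder of a
// level-k lock can only ever wait for locks above level k.
func lockAll(locks []*leveledMutex) {
	sort.Slice(locks, func(i, j int) bool { return locks[i].level < locks[j].level })
	for _, l := range locks {
		l.mu.Lock()
	}
}

// unlockAll releases the locks in reverse (descending) order.
func unlockAll(locks []*leveledMutex) {
	for i := len(locks) - 1; i >= 0; i-- {
		locks[i].mu.Unlock()
	}
}

func main() {
	table := &leveledMutex{level: 1}
	row := &leveledMutex{level: 2}
	locks := []*leveledMutex{row, table} // request order doesn't matter
	lockAll(locks)                       // always acquired as table, then row
	fmt.Println("both locks held in hierarchy order")
	unlockAll(locks)
}
```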
3. Two-Phase Locking: A Commitment Strategy
Two-phase locking (2PL) is a concurrency control method used in SQL database systems such as Oracle and Microsoft SQL Server. It has two phases:
- Growing phase: the process acquires all the locks it needs.
- Shrinking phase: the process releases its locks.
Once a process starts releasing locks, it cannot acquire any new ones. 2PL ensures serializability, meaning that the execution of concurrent transactions is equivalent to some serial order.
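The toy transaction below enforces the two phases in Go: acquiring after the first release panics. The txn type is an illustration of the rule, not how any particular database implements 2PL.

```go
package main

import (
	"fmt"
	"sync"
)

// txn is a toy transaction enforcing two-phase locking: every
// acquire must precede the first release.
type txn struct {
	held      []*sync.Mutex
	shrinking bool
}

// acquire is the growing phase; it panics if the transaction has
// already started releasing locks.
func (t *txn) acquire(m *sync.Mutex) {
	if t.shrinking {
		panic("2PL violation: acquiring after release began")
	}
	m.Lock()
	t.held = append(t.held, m)
}

// releaseAll is the shrinking phase; after it begins, no new locks
// may be acquired.
func (t *txn) releaseAll() {
	t.shrinking = true
	for i := len(t.held) - 1; i >= 0; i-- {
		t.held[i].Unlock()
	}
	t.held = nil
}

func main() {
	var a, b sync.Mutex
	t := &txn{}
	t.acquire(&a) // growing phase
	t.acquire(&b)
	// ... read and write under both locks ...
	t.releaseAll() // shrinking phase
	fmt.Println("transaction committed under 2PL")
}
```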
4. Reader-Writer Locks: Balancing Access
Reader-writer locks allow multiple processes to read a shared resource concurrently, but only one process can write to it at a time. This improves performance in scenarios where reads are much more frequent than writes.
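Go's standard library ships a reader-writer lock as sync.RWMutex; the read-mostly cache below is a small usage sketch (the cache type itself is illustrative).

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a read-mostly map guarded by a reader-writer lock: any
// number of readers may hold RLock concurrently, while Lock grants
// one writer exclusive access.
type cache struct {
	mu   sync.RWMutex
	data map[string]string
}

func (c *cache) get(k string) (string, bool) {
	c.mu.RLock() // shared: concurrent gets don't block each other
	defer c.mu.RUnlock()
	v, ok := c.data[k]
	return v, ok
}

func (c *cache) set(k, v string) {
	c.mu.Lock() // exclusive: blocks all readers and writers
	defer c.mu.Unlock()
	c.data[k] = v
}

func main() {
	c := &cache{data: map[string]string{}}
	c.set("region", "us-east")
	if v, ok := c.get("region"); ok {
		fmt.Println(v)
	}
}
```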
5. Shared-Intention-Exclusive (SIX) Locks: Fine-Grained Control
Shared-Intention-Exclusive (SIX) locks are an extension of reader-writer locks that provide more fine-grained control over access to resources. They are often used in database systems to optimize concurrency.
- Shared (S) lock: allows multiple processes to read the resource.
- Intention Shared (IS) lock: signals that a process intends to take a shared lock on a finer-grained resource lower in the hierarchy.
- Exclusive (X) lock: allows only one process to write to the resource.
- Intention Exclusive (IX) lock: signals that a process intends to take an exclusive lock on a finer-grained resource lower in the hierarchy.
- Shared Intention Exclusive (SIX) lock: allows a process to read the whole resource while declaring that it intends to update parts of it later.
SIX locks can improve concurrency by allowing readers to access a resource while a writer prepares to modify it.
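For reference, here is the standard compatibility matrix for these modes, as used in granular locking in databases: a requested mode is granted only if it is compatible with every mode another transaction already holds on the resource.

```
Held \ Requested   IS    IX    S     SIX   X
IS                 yes   yes   yes   yes   no
IX                 yes   yes   no    no    no
S                  yes   no    yes   no    no
SIX                yes   no    no    no    no
X                  no    no    no    no    no
```

For example, an IS holder is compatible with a SIX request, but a second SIX request on the same resource must wait.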
Here's how we tackle the challenge of pinpointing a specific piece of data within a petabyte-scale data store.
Our method for finding data blocks in massive storage systems is designed for speed and reliability. We use hashing to pinpoint the fixed-size set of nodes (N of them) that could potentially hold a given data block, so a lookup only ever needs to check a bounded number of locations. This allows for smooth system upgrades, flexible scaling, efficient use of the PCIe and DDR memory buses, and robust error correction, all while guaranteeing consistently fast lookups.
This approach is also perfect for ensuring data consistency and reliability. Because we know exactly which nodes might contain the data, we can implement a strict, predictable locking order across them. The key is to be deterministic, never random. This prevents data corruption issues that can arise when multiple writes of the same data end up on different servers. Such queries will be eventually consistent, available, and partition tolerant.
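The article doesn't name a specific scheme, but rendezvous (highest-random-weight) hashing is one way to get a deterministic N-node candidate set, sketched below in Go; sorting the chosen node names then fixes the lock order every writer must follow. The candidates function and the FNV hash choice are assumptions for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// candidates returns the n nodes with the highest rendezvous
// (highest-random-weight) score for a block key. The same key always
// maps to the same node set, and adding or removing a node only
// remaps the blocks that scored highest on it.
func candidates(nodes []string, key string, n int) []string {
	type scored struct {
		node  string
		score uint64
	}
	s := make([]scored, 0, len(nodes))
	for _, node := range nodes {
		h := fnv.New64a()
		h.Write([]byte(node + "/" + key))
		s = append(s, scored{node, h.Sum64()})
	}
	sort.Slice(s, func(i, j int) bool { return s[i].score > s[j].score })
	out := make([]string, 0, n)
	for i := 0; i < n && i < len(s); i++ {
		out = append(out, s[i].node)
	}
	// Lock the replicas in sorted-name order, never in score or
	// arrival order, so every writer of this block agrees on the order.
	sort.Strings(out)
	return out
}

func main() {
	nodes := []string{"node-a", "node-b", "node-c", "node-d", "node-e"}
	fmt.Println(candidates(nodes, "block-42", 3))
}
```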
Any distributed option other than a deterministic hashed set would hurt partition tolerance and performance.
Links of the day.
Perplexity Science: Breakthrough in Fusion Energy Stability. Here.
Microsoft rolls out next generation of its AI chips, takes aim at Nvidia's software. Here.