..

Sparse Set - stable pointers

This is a continuation of my last article, ‘Sparse Set - A Flexible, Cache-Friendly Data Structure’.

In the last article, we implemented a basic sparse set. Now, let’s improve our implementation by adding support for stable pointers. To achieve this, we can make a slight adjustment, instead of storing data in the dense array, we’ll store it directly in the sparse array.

For this to work, we would need to change the structure of the sparse array from an array of integers to an array of pages. Each page will hold N dense indices along with N data items. The sparse index would then be a combination of the page index and the page offset (the index of the data within a specific page).

Sparse Array Remove 2

As shown in the diagram, the sparse array is paged and directly holds the data. Since we allocate and deallocate one page at a time, we avoid the need for complete reallocation each time we run out of capacity, as would be required with a dynamic array.

You might have noticed that this implementation of a sparse set is not fully contiguous. However, this isn’t a significant concern in practical use if an optimal page size is chosen. Crossing OS page boundaries often results in a cache miss anyway, so we still benefit from spatial locality while maintaining stable pointers to the elements.

Implementation

Let’s start by redefining our sparse and dense arrays.

struct SparsePage {
    size_t* sparse; // array of dense indices
    T* data;        // array of data
}

std::vector<SparsePage> pages;
std::vector<size_t> dense;

Now, we’ll need some logic to map the sparse index to the page index and page offset, which we can easily implement using bitwise operations.

const unsigned log2_page_size = 6;
const size_t page_size = 1 << 6;

size_t toPageIndex(size_t sparse_index) {
    sparse_index >> log2_page_size;
}

size_t toPageOffset(size_t sparse_index) {
    return sparse_index & (page_size - 1);
}

The page index refers to the index of the pages array, while the page offset is the index of the data & sparse array within the corresponding page.

auto page_index = toPageIndex(sparse_index);
auto page_offset = toPageOffset(sparse_index);

auto data = pages[page_index].data[page_offset];
auto dense_index = pages[page_index].sparse[page_offset];

With this changes, we can move on to a proper implementation of a sparse set with stable pointer to its elements.

const unsigned LG2_PAGE_SIZE = 6;
const size_t PAGE_SIZE = 1 << 6;

size_t toPageIndex(size_t sparse_index) {
    return sparse_index >> LG2_PAGE_SIZE;
}

size_t toPageOffset(size_t sparse_index) {
    return sparse_index & (PAGE_SIZE - 1);
}

template <typename T>
struct SparseSet {
    struct SparsePage {
        size_t* sparse;
        T* data;
    };

    struct SetEntry {
        T *data;
        size_t index;
    };

    size_t size = 0;
    size_t max_sparse_idx = 0;
    std::vector<std::optional<SparsePage>> pages;
    std::vector<size_t> dense;

    ~SparseSet() {
        for(auto page_opt : this->pages) {
            if(auto page = page_opt) {
                delete[] page->sparse;
                delete[] page->data;
            }
        }
    }

    SetEntry add(T item) {
        auto dense_idx = this->size;
        this->size += 1;

        // try to reuse last freed index
        if(dense_idx < this->dense.size()) {
            auto sparse_idx = this->dense[dense_idx];
            auto page_idx= toPageIndex(sparse_idx);
            auto page_offset = toPageOffset(sparse_idx);
            
            if(auto page = this->pages[page_idx]) {
                auto data_ptr = &page->data[page_offset];
                *data_ptr = item;
                return { data_ptr, sparse_idx };
            }
        }

        // allocate new index
        auto sparse_idx = this->max_sparse_idx;
        this->max_sparse_idx += 1;

        auto page_idx = toPageIndex(sparse_idx);
        auto page_offset = toPageOffset(sparse_idx);

        ensurePageIndex(page_idx);
        this->dense.push_back(sparse_idx);

        auto [sparse_ptr, data_ptr] = getSparseEntryPtr(sparse_idx);
        *sparse_ptr = dense_idx;
        *data_ptr = item;

        return {data_ptr, sparse_idx};
    }

    bool remove(size_t idx) {
        if(!contains(idx)) return false;

        this->size -= 1;

        auto [sparse_ptr, data_ptr] = getSparseEntryPtr(idx);

        auto end_dense_idx = this->size;
        auto end_sparse_idx = this->dense[end_dense_idx];
        auto [end_sparse_ptr, _] = getSparseEntryPtr(end_sparse_idx);

        // swap remove
        this->dense[*sparse_ptr] = end_sparse_idx;
        *end_sparse_ptr = *sparse_ptr;

        // update end dense idx for reuse
        *sparse_ptr = end_dense_idx;
        this->dense[end_dense_idx] =  idx;

        return true;
    }

    std::optional<T*const> getPtr(size_t idx) {
        if(!contains(idx)) return std::nullopt;

        auto page_idx = toPageIndex(idx);
        auto page_offset = toPageOffset(idx);

        return { &this->pages[page_idx].value().data[page_offset] };
    }

    bool contains(size_t idx) {
       auto page_idx = toPageIndex(idx);
       auto page_offset = toPageOffset(idx);

       if(page_idx >= this->pages.size()) return false;

       if(auto page = this->pages[page_idx]) {
           auto dense_idx = page->sparse[page_offset];
           auto current_idx = this->dense[dense_idx];
           return dense_idx < this->size && current_idx == dense_idx;
       }
        
       return false; 
    }

    void ensurePageIndex(size_t page_index) {
        auto pages_size = this->pages.size();
        if(page_index >= pages_size) {
            this->pages.resize(page_index + 1, std::nullopt);
        }
        if(!this->pages[page_index]) {
            this->pages[page_index] = { new size_t[PAGE_SIZE], new T[PAGE_SIZE] };
        }
    }

    std::pair<size_t*, T*> getSparseEntryPtr(size_t idx) {
        auto page_idx = toPageIndex(idx);
        auto page_offset = toPageOffset(idx);
        auto page = this->pages[page_idx].value();
        return { &page.sparse[page_offset], &page.data[page_offset] };
    }
};

Complete Implementation on Compiler Explorer

This implementation does not include iterator support, so it is up to the reader to implement their own iterator. Note: iterators can be implemented using the dense array.