Skip to content

Proposal: A revamp of the data structures offered in std #7782

@mrakh

Description

@mrakh

Currently, the standard library's offering of data structures have some shortcomings:

  1. Specialized variants of structs should have their qualifiers appended to the struct name whenever possible.
    Ideally, structs should be named so that, when lexicographically sorted, variants of a given struct are grouped together. This makes it easier to read and scan through lists of struct names in places like autogenerated docs and autocomplete menus, and makes the struct names amenable to prefix based indexing schemes for search functionality. It is my opinion that qualifiers like Auto or Unmanaged should be appended to the struct name (e.g. std.HashMap -> std.HashMapAuto) whenever possible. As an exception, qualifiers that are a fundamental part of the name, like the Hash in std.HashMap or the Singly and Linked in std.SinglyLinkedList, should remain where they are.
  2. Vaguely named data structures should be replaced with clearly named concrete implementations.
    Stacks and queues are extremely general classes of data structure, to the point that I would consider them an abstraction moreso than concrete data structures, simply because there's a hundred different ways to implement them. If a user wants a FILO stack, they can make one with a dynamic array or a linked list. If a user wants a FIFO queue, they can make one with a circular buffer or a linked list. If a user wants a priority queue, they can make one with a red-black tree. Given that Zig is a low level language, I believe it's best to refactor the current stack and queue implementations to provide those data structures directly:
    • std.TailQueue: This is literally just a doubly linked list. Why not call it std.DoublyLinkedList, which makes its operations and performance characteristics more clear to the user?
    • std.SegmentedList: This is basically what an unrolled linked list is, so why not call it something like std.UnrolledLinkedList?
    • std.PriorityQueue: I would remove this and replace it with data structures that can be used to trivially implement a priority queue (e.g. std.RedBlackTree, std.AVLTree, std.BinaryHeap, std.SkipList)
      Which leads me to my final point.
  3. Most importantly, there are not enough data structures.
    A good standard library can make or break a language, and out-of-the-box access to basic data structures is part of that. Right now, the Zig std has hash tables, dynamic arrays, linked lists, a "priority queue" that seems to be implemented as a binary heap, and a "segmented list" that seems to be implemented as an unrolled linked list. Here are some data structures that I think are worth considering adding to Zig:
    • Red-black trees
    • AVL trees
    • B/B+ trees
    • Splay trees
    • Binary heaps (can lift this from existing std.PriorityQueue)
    • Tries (prefix trees)
    • Ropes (for fast editing of large strings)
    • Doubly linked lists (can lift this from existing std.TailQueue)
    • Unrolled linked lists (can lift this from existing std.SegmentedList)
    • Skip lists
    • Ring buffers

I'm willing to create a pull request to implement some of these data structures, but I'd like to hear from you guys what your thoughts are.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions