• dustletter@piefed.blahaj.zone · 24 days ago

    Skew binary trees. They’re an immutable data structure combining the performance characteristics of lists (worst-case, non-amortized O(1) push/pop) and B-trees (O(log n) lookup and update).
    They use a sequence of complete trees, cleverly arranged using skew binary numbers, so that adding an element never causes cascading updates (see the sketch below).
    In practice they’ve been superseded by relaxed radix balanced trees.
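
    A minimal Haskell sketch of the idea (this is what Okasaki calls a skew binary random access list; the function names here are my own):

        import Prelude hiding (head, tail, lookup)

        -- A complete binary tree with an element at every node.
        data Tree a = Leaf a | Node a (Tree a) (Tree a)

        -- The spine is a list of (size, tree) pairs whose sizes are skew
        -- binary digits (2^k - 1), with at most the two smallest equal.
        type SkewList a = [(Int, Tree a)]

        -- Worst-case O(1) push: merge the two smallest trees under a new
        -- root if they're the same size, otherwise prepend a leaf.
        cons :: a -> SkewList a -> SkewList a
        cons x ((s1, t1) : (s2, t2) : rest)
          | s1 == s2 = (1 + s1 + s2, Node x t1 t2) : rest
        cons x ts = (1, Leaf x) : ts

        -- Worst-case O(1) peek and pop: a root either vanishes (Leaf) or
        -- splits back into its two subtrees (Node).
        head :: SkewList a -> a
        head ((_, Leaf x) : _)     = x
        head ((_, Node x _ _) : _) = x
        head []                    = error "empty list"

        tail :: SkewList a -> SkewList a
        tail ((_, Leaf _) : rest)     = rest
        tail ((s, Node _ l r) : rest) = (s `div` 2, l) : (s `div` 2, r) : rest
        tail []                       = error "empty list"

        -- O(log n) random access: skip whole trees, then descend into one.
        lookup :: Int -> SkewList a -> a
        lookup _ [] = error "index out of range"
        lookup i ((s, t) : rest)
          | i < s     = lookupTree s i t
          | otherwise = lookup (i - s) rest

        lookupTree :: Int -> Int -> Tree a -> a
        lookupTree _ 0 (Leaf x)     = x
        lookupTree _ 0 (Node x _ _) = x
        lookupTree s i (Node _ l r)
          | i <= half = lookupTree half (i - 1) l
          | otherwise = lookupTree half (i - 1 - half) r
          where half = s `div` 2
        lookupTree _ _ (Leaf _)     = error "index out of range"

    cons only ever inspects the first two trees, so pushes never cascade; lookup skips whole trees before descending into one, which is where the O(log n) bound comes from.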

  • Marc@sueden.social · 24 days ago

    @protein

    Finger Tree!

    A persistent, purely functional workhorse. Amortized O(1) access at both ends, O(log n) concatenation/splitting.

    It generalizes elegantly to build sequences, priority queues, and more, and it powers Haskell’s standard Data.Sequence. A functional programmer’s secret weapon (example below).
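
    For a taste of that API, here’s a small example using the real Data.Sequence from the containers package:

        import Data.Sequence (Seq, (<|), (|>), (><))
        import qualified Data.Sequence as Seq

        main :: IO ()
        main = do
          let s      = Seq.fromList [1 .. 10 :: Int]
              s'     = 0 <| (s |> 11)        -- amortized O(1) at both ends
              (l, r) = Seq.splitAt 6 s'      -- O(log n) split
              s''    = r >< l                -- O(log n) concatenation
          print (Seq.index s'' 0)            -- O(log n) random access: 6
          print (Seq.length s'')             -- 12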

    • Vorpal@programming.dev · 24 days ago

      On paper they’re efficient. In practice, all pointer-based data structures (linked lists, binary trees, etc.) are slow on modern hardware, because every link followed is a potential cache miss, and for most practical high-performance code that effect matters more than asymptotic complexity.

      You’re far better off with linear access where possible (e.g. vectors, open-addressing hash maps), or, if you must have a tree, with as large a fan-out as possible (e.g. B-trees rather than binary trees).

      Now, I don’t know whether Haskell etc. affords you such control; I mainly code in Rust (and C++ in the past).

      Also see this old thread from 2016 on hacker news about this very topic: https://news.ycombinator.com/item?id=13263275
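
      (For reference, Haskell does have contiguous, pointer-free arrays in the form of unboxed vectors, Data.Vector.Unboxed from the vector package. A tiny illustration:)

          import qualified Data.Vector.Unboxed as VU

          -- Unboxed vectors store raw elements in one contiguous buffer,
          -- with no per-element pointers to chase.
          main :: IO ()
          main = do
            let v = VU.enumFromN (1 :: Double) 1000000  -- 1.0, 2.0, ...
            print (VU.sum (VU.map (* 2) v))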

  • Atlas_@lemmy.world · 23 days ago

    Fibonacci heaps are pretty cool. Not used very often because they’re awful to implement, but they have better amortized complexity than many other heaps (O(1) insert and decrease-key).

    Also, Binary Lifting is closer to an algorithm than a data structure, but it’s used in competitive programming a fair bit and isn’t often taught: https://cp-algorithms.com/graph/lca_binary_lifting.html

    And, again closer to an algo than a data structure, Sum over Subsets DP in O(3^n) also has a cool little bit of math in it (see the snippet below): https://cp-algorithms.com/algebra/all-submasks.html
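
    The core of that last trick, rendered in Haskell (the classic s = (s - 1) & m step from the cp-algorithms page):

        import Data.Bits ((.&.))

        -- All submasks of m, in decreasing order. Over all 2^n masks m of
        -- an n-bit universe this visits 3^n (m, s) pairs in total, since
        -- each bit is in neither, in m only, or in both m and s.
        submasks :: Int -> [Int]
        submasks m = go m
          where
            go 0 = [0]
            go s = s : go ((s - 1) .&. m)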

  • idunnololz@lemmy.world · 24 days ago

    I personally don’t think it’s that obscure, but I’ve never seen it used in production code that I didn’t write: the linked hash map, a.k.a. ordered hash map (sketch below).
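
    (Java’s LinkedHashMap is the standard-library poster child. For a rough idea of the semantics, here’s a hypothetical Haskell sketch, not a library API: a map whose values carry an insertion counter.)

        import Prelude hiding (lookup)
        import Data.List (sortOn)
        import qualified Data.Map.Strict as Map

        -- A map that also remembers insertion order. (Sketch only: toList
        -- pays an O(n log n) sort, where a real linked hash map threads a
        -- list through its entries to iterate in O(n).)
        data OrderedMap k v = OrderedMap !Int (Map.Map k (Int, v))

        empty :: OrderedMap k v
        empty = OrderedMap 0 Map.empty

        -- Re-inserting an existing key keeps its original position,
        -- matching LinkedHashMap's default behavior.
        insert :: Ord k => k -> v -> OrderedMap k v -> OrderedMap k v
        insert k v (OrderedMap n m) =
          OrderedMap (n + 1) (Map.insertWith keepOld k (n, v) m)
          where keepOld (_, newV) (oldN, _) = (oldN, newV)

        lookup :: Ord k => k -> OrderedMap k v -> Maybe v
        lookup k (OrderedMap _ m) = snd <$> Map.lookup k m

        -- Iteration follows insertion order, not key order.
        toList :: OrderedMap k v -> [(k, v)]
        toList (OrderedMap _ m) =
          [ (k, v) | (k, (_, v)) <- sortOn (fst . snd) (Map.toList m) ]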

  • xthexder@l.sw0.com · 22 days ago (edited)

    I came up with a kind of clever data type for storing short strings in a fixed-size struct so they can be stored on the stack or inline without any allocations.
    It’s always null-terminated so it can be passed directly as a C-style string, but it also stores the string length without using any additional space (getting the length would normally require iterating to find the end).
    The trick is to store the number of unused bytes in the last byte of the buffer. When the string is full there are 0 unused bytes, so the size byte doubles as the null terminator.
    (This only works for strings under 256 chars, excluding the null byte.)

    Implementation in C++ here: https://github.com/frustra/strayphotons/blob/master/src/common/common/InlineString.hh

    Edit: Since a couple of people don’t seem to understand the performance impact of this vs. regular std::string, here’s a demo: https://godbolt.org/z/34j7obnbs It generates 10000 strings like “Hello, World! 00001” via concatenation. The effect is huge in debug mode, but there are still performance benefits with optimizations turned on:

    With -O3 optimization
    std::string: 0.949216ms
    char[256] with strlen: 0.88104ms
    char[256] without strlen: 0.684734ms
    
    With no optimization:
    std::string: 3.5501ms
    char[256] with strlen: 0.885888ms
    char[256] without strlen: 0.687733ms
    
    (You may need to run it a few times to get stable numbers due to random server load on godbolt.)
    Changing the buffer size to 32 bytes makes a negligible difference versus 256 bytes in this case, though it could be slightly faster in practice since the whole string then fits in a cache line.
    
      • xthexder@l.sw0.com · 24 days ago

        22 characters is significantly less useful than 255 characters. I use this for resource name keys, asset file paths, and a few other scenarios. The max size is configurable, so I know that nothing I’m going to store will ever require heap allocations (really bad to be doing every frame in a game engine).

        I developed this specifically after benchmarking a simpler version and noticing a significant amount of time being spent in strlen(); the change had real benefits in my case.
        Admittedly, just storing a struct with a static buffer and a separate size field would have worked pretty much the same and eliminated the 255-char limit, but it was fun to build.

  • duckythescientist@sh.itjust.works · 24 days ago

    I’m also not sure if this counts as obscure, but Bloom filters! A Bloom filter is a structure you can add elements to and then ask whether it has seen an element before, with the answer being either “no” or “probably yes”. There’s a trade-off between the confidence of a “probably yes”, how many elements you expect to add, and how big the filter is, but it’s very space- and time-efficient. And it uses hash functions, which always make for a fun time. (Toy sketch below.)
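
    A toy Haskell sketch (the FNV-style hashing and the parameters here are placeholder choices; real implementations size the bit array m and hash count k from the expected element count and the target false-positive rate):

        import Data.Bits (setBit, testBit, xor)
        import Data.Char (ord)
        import Data.List (foldl')

        -- m bits packed into an Integer, probed by k seeded hashes.
        data Bloom = Bloom { bits :: Integer, m :: Int, k :: Int }

        emptyBloom :: Int -> Int -> Bloom
        emptyBloom m' k' = Bloom 0 m' k'

        -- The k bit positions for a key: an FNV-1a-style hash of the
        -- string, run under k different seeds.
        positions :: Bloom -> String -> [Int]
        positions (Bloom _ m' k') key =
          [ fromIntegral (fnv seed key `mod` fromIntegral m') | seed <- [1 .. k'] ]
          where
            fnv :: Int -> String -> Integer
            fnv seed = foldl' step (2166136261 + fromIntegral seed)
            step h c = ((h `xor` fromIntegral (ord c)) * 16777619) `mod` 4294967296

        insert :: String -> Bloom -> Bloom
        insert key b = b { bits = foldl' setBit (bits b) (positions b key) }

        -- False is definitive ("no"); True is only "probably yes".
        member :: String -> Bloom -> Bool
        member key b = all (testBit (bits b)) (positions b key)

    A False can never be wrong, because inserting an element only ever sets bits; a True can be a false positive when other keys happen to have set all k probed bits.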

    • lad@programming.dev · 24 days ago

      Relevant xkcd. In Randall’s words:

      “Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they’re the right one you can never be sure.”