• Hacker News
  • new|
  • comments|
  • show|
  • ask|
  • jobs|
  • aserafini 1 days

    I made an “offline” version of this visualisation a long time ago: https://github.com/adamserafini/suffix-tree

    C++ with output rendered with Graphviz

  • esafak 1 days

    https://gemini.google/overview/canvas/ is great at these visualizations. I used it recently to help me understand chord progressions in specific songs, and compare them to those of other songs.

    abahgat 3 hours

    Author here. It’s a very hopeful time for interactive learning. Coincidentally, it is inspiring that both OpenAI and Anthropic released these improved visualization capabilities just this week.

    Using Gemini's canvas for chord progressions is a great example of this. When I was building this suffix tree visualizer, I kept thinking about how much "spatial" intuition is required to understand the algorithm; having these live, interactive environments available to students is a massive step forward.

    https://openai.com/index/new-ways-to-learn-math-and-science-... https://claude.com/blog/claude-builds-visuals

  • sillywabbit 1 days

    In the event someone is encountering suffix trees for the first time and thinking of using one: the amount of RAM required for suffix trees is obscene.

    vintermann 1 days

    Yes, we get the same speed from suffix arrays these days, and much, much less memory usage.

    But good luck visualizing what those algorithms do :)

    abahgat 3 hours

    I totally fell for the "obscene memory" trap myself. My first encounter with suffix trees outside of a textbook was for an ITA Software 'Instant Search' puzzle. The requirement was sub-0.1ms search on a large string database, I went straight for a generalized suffix tree. Then I realized they had asked for the solution to fit within a 1GB heap. :(

    I wrote up the full 'war story' of how I had to profile the heap and optimize the node representation (shaving bytes off the edge storage) just to get it to boot without an OOM error: https://www.abahgat.com/blog/the-programming-puzzle-that-got...

    It’s the most tangible example I've run into of where theoretical O(n) space complexity meets the reality of object pointer overhead.

    xen0 22 hours

    I've never grokkeed suffix trees, but isn't possible for them to be O(n) in space (n total length of all strings)? Is there just an unacceptable constant factor overhead? I can imagine the pointer overhead being painful.

    sillywabbit 19 hours

    Like the other poster said, the rabbit hole continues with suffix arrays (https://en.wikipedia.org/wiki/Suffix_array#Space_efficiency), then compressed suffix arrays (https://en.wikipedia.org/wiki/Compressed_suffix_array).

    Also explained by the creator of this: https://www.abahgat.com/project/suffix-tree/

    > the human genome can be encoded as a 3GB string constructed out of an alphabet of four characters

    > As of 2019, a suffix tree indexing the human genome using state of the art algorithms can easily occupy tens of gigabytes.