I’ve been working on ZoomIn a lot in the past few weeks. I need to start writing more newsletters about what I’m doing. But anyway, there’s a technical aspect of ZoomIn that I want to highlight, since it’s really great.
Back in 2006, when John, Nick and I were working on the first version of ZoomIn, we were discussing ways of storing the ZoomIn url hierarchy in the database. So for example, we had nodes like:
- /wellington/
- /wellington/newtown/
- /wellington/newtown/coromandel-street/
And in the database, we wanted to store that newtown was a child of wellington, and coromandel street a child of newtown. The easiest way to do that would have been just randomly inserting nodes into the database and building a tree structure where each node has a parent_id. The only downside to this, is that I wanted a really efficient way of finding all the nodes, comments, or photos, under a node in the tree.
eg - we wanted to be able to do a single query (without any crazy joins) that would return all comments on any streets, places or street addresses under /wellington/newtown/
.
We went away for a while, and bounced over a bunch of ideas, then eventually stumbled on the inet6 datatype that is native to postgresql. So after a bit of research, we worked out that we could use an inet6 as a primary key and foreign key, rails worked well with it, and indexes operated fine as well. So what we ended up doing was something like this:
- /wellington/ => 1::
- /wellington/newtown/ => 1::1::
- /wellington/newtown/coromandel-street/ => 1::1::1::
- /wellington/newtown/hanson-street/ => 1::1::2::
- /wellington/newtown/hanson-street/43/ => 1::1::2::1::
- /wellington/berhampore/ => 1::2::
And we wrote two helper methods on each Place (a node in the tree), left
and right
.
- wellington.left => 1::1::
- wellington.right => 1::FFFF::
So - if for example, you want to find all the comments under Newtown (to show the new forum page), you do a query like this:
['select * from comments where place_id between ? and ?', newtown.left, newtown.right]
In the end, we actually used a variable sized structure (so we don’t use a whole subnet per node, but instead a specific bit-depth), but the basic idea is as above.
I’m not sure what kind of data structure this is (I think maybe it’s a fixed tree?), but it’s worked excellently for over 7 years on ZoomIn. Combined with the GIST 2-dimensional index that PostGIS supplies, we’ve been able to do most all the queries you want on a geo-social site without relying on denormalizing the data or super expensive mega-joins or subselects.
Edit Stig from SilverStripe informed me that this technique is called a Nested Set.