Hierarchical data in RDBMSs
Original works:
o Vadim Tropashko introduces the Nested Intervals Model in Trees in SQL:
Nested Sets and Materialized Path. The article has been followed up by articles
discussing different numeric path encodings:
Binary rational number encoding—Relatively easy to understand, but
hard to manage for the DBMS when the tree grows large:
Relocating Subtrees in Nested Intervals Model: How parts of
hierarchies are moved in the Nested Intervals Model.
Integer Labeling in Nested Intervals Model.
Refined encodings—harder to understand(?), but better real-World
performance:
Nested Intervals with Farey Fractions.
Nested Intervals Tree Encoding with Continued Fractions. See
also: A discussion about errata in the article.
See also a message where the author gives some reading-hints.
Review, and introduction of Path Matrices in the June '05 edition of
SIGMOD Record.
In the second half of a usenet message, Mr Tropashko summarizes pros
and cons of the different encoding systems.
o Joe Celko: Trees in SQL: Covers the Nested Set Model; mentions adjacency list
model.
o Dan Hazel proposes an alternative to Celko's and Tropashko's encodings in Using
rational numbers to key nested sets.
o Dong et al: Maintaining Transitive Closure of Graphs in SQL (see upper right
corner for PDF-version).
Books:
o Vadim Tropashko's SQL Design Patterns includes good chapters about trees and
digraphs in SQL databases. The book contains a few errors; unfortunately, the
book's errata page is hard to find. Publicly available chapters:
Chapter 3: Exotic Operators in SQL
Chapter 3: Trees
o Joe Celko: Trees and Hierarchies in SQL for Smarties. Chapter 2, Adjacency List
Model, is available at SQLSummit.
Articles and tutorials:
o Comparison of hierarchical query methods, by Vadim Tropashko.
o Aaron Mackey: Relational Modeling of Biological Data: Trees and Graphs
(adjacency list, nested set, transitive closure tables). Mackey has also co-written a
related slide-set about relational databases, aimed at biologists.
o Lennart Jonsson: Representing Trees in a relational DB (materialized path,
transitive closure tables).
o Anirban Dutta: Trees in SQL: Nested Sets and Materialized Path
o Bill Karwin: Models for hierarchical data
o Joe Celko: Hierarchical SQL (at ONLamp). Covers two variants of the
materialized path approach.
o Tom Haughey: Modeling Hierachies (adjacency lists, nested sets).
o Daniel Aioanei & Adi Malinaru: General trees persisted in relational databases.
o From c2: Doc Query In Sql.
o Gijs Van Tulder: Storing Hierarchical Data in a Database (adjacency model and
nested set model in an PHP+MySQL context).
o Phillip J. Eby: The simplest(?) way to do tree-based queries in SQL; covers
transitive closure tables.
o Stack Overflow overview-Q/A on the subject
Articles for PostgreSQL users:
o Michael Glaesemann: Static Hierarchies and Binary Fractions in PostgreSQL.
o Shawn Deleurme: A Nested Set Implementation in Java and PostgreSQL.
o Quassnoi:
Adjacency list vs. nested sets: PostgreSQL
PostgreSQL: nested sets and R-Tree
Articles for users of Microsoft's SQL Server:
o MSSQL 2008's new hierarchyid data type, for implementing efficient
materialized path hierarchy encodings. More on the subject.
o Quassnoi: Adjacency list vs. nested sets: SQL Server
o Eugene Lepekhin: Trees in SQL databases.
o Dennis W. Forbes: Versatile High Performance Hierarchies in SQL Server: How
to implement the Nested Set Model.
o Kendall Willets: SQL Graph Algorithms.
o From sqlteam.com: More Trees & Hierarchies in SQL (adjacency list,
materialized path).
o Pure Performance, Inc: Managing Hierarchical Data in SQL
Articles for Oracle users
o Philip Greenspun: Trees in Oracle SQL.
o Rene Nyffenegger: START WITH and CONNECT BY in Oracle SQL.
o Quassnoi: Adjacency list vs. nested sets: Oracle
Articles for MySQL users
o Quassnoi: Hierarchical queries in MySQL
o Mike Hillyer: Managing Hierarchical Data in MySQL (adjacency list, nested set).
Articles/libraries for Python programmers
o The Django Treebeard project.
Michael M. David has written several articles and a book about modelling hierarchical
structures—including XML—in standard SQL, using outer joins.
o Advanced Data Access Technologies, Mr. David's company site; contains several
articles about modeling hierarchical structures in standard SQL, including a
general tutorial on the subject.
o TDAN articles:
Part 1, In XML: Using SQL to Link Below the Root
Part 2, In SQL: Processing XML's Complex Hierarchical Structures
o SIGMOD article: ANSI SQL Hierarchical Processing Can Fully Integrate Native
XML
o Mr. David has also written a book about the subject: Advanced ANSI SQL Data
Modeling and Structure Processing.
The book's site has a sample chapter named Data Structure Extraction (DSE)
Technology. — And at Amazon, you may browse another (small) part of the
book.
XML in RDBMSs:
o InfoWorld special report: Relational databases get a grip on XML.
o Cohen et al: Labeling Dynamic XML Trees (postscript document).
o Ronald Bourrets site about XML and databases.
LDAP-related:
o From OpenLDAP's FAQ: Directories vs. Relational Database Management
Systems.
o An IBM paper about the challenges which arise when using DB2 as an LDAP
backend (requires IEEE Xplore subscription).
Implementations:
o developerWorks: Migrating Recursive SQL from Oracle to DB2 UDB (note:
DB2's recursive SQL feature is close to SQL:1999, while Oracle's CONNECT BY is
non-standard).
o Dan Sullivan: Heirarchical Queries in SQL Server 2005.
o Oracle's built-in facilities for hierarchical queries
o ltree PostgreSQL module for materialized path trees.
o Not an RDBMS, but still interesting: OpenLDAP.
Temporal databases
Online book (thesis) by Christian S. Jensen: Temporal Database Management.
The web-site of Richard T. Snodgrass contains multiple relevant resources, including an
online book (see first paragraph in the publications page).
From Steve Barker's excellent lecture notes: Temporal and Active Databases (gzipped
postscript document).
Temporal Access for Users (TAU): Somewhat experimental software products that
implement ideas from temporal database theories.
TimeCenter, an international center for the support of temporal database applications on
traditional and emerging DBMS technologies.
Temporal Database Glossary.
Book by Date, Darwen and Lorentzos: Temporal Data & the Relational Model. A
number of summary slide sets related to the book exist online.
Darwen & Date: An Overview and Analysis of TSQL2 (TSQL2 was a proposed addition
to SQL, but never made it into the standard).
The Chorochronos research network.
On persistent B-trees (AKA multiversion B-trees), a data storage structure which may be
used to implement temporal database software:
o Lars Arge's External Memory Data Structures mentions partially persistent B-
trees (AKA multiversion B-trees).
o Becker, Geschwind, Ohler, Seeger, Widmayer: An asymptotically optimal
multiversion B-tree. An expanded version (also featuring a more readable layout)
is available if you have access to The VLDB Journal.
o TPIE ("Transparent Parallel I/O Environment") has been used by some to
implement persistent B-trees (and many other storage structures) in C++.