Home
Help
Register
Log in

Search

 
   Active Threads  

You are here: Home > LLBLGen Pro > Architecture> Storing hierarchies in a single table, paper
 

Pages: 1
Architecture
Storing hierarchies in a single table, paper
Page:1/1 

  Print all messages in this thread  
Poster Message
Otis
LLBLGen Pro Team



Location:
The Hague, The Netherlands
Joined on:
17-Aug-2003 18:00:36
Posted:
34993 posts
# Posted on: 20-Jun-2008 10:02:31.  
I think a lot of you will be interested in this. Storing a hierarchy or tree in a single table is a requirement seen often in a project. SQL however doesn't have constructs to fetch hierarchies, sort on them and filter on them. Several different approaches have been brought forward, see this link to a thread on our own forum:
http://www.llblgen.com/tinyforum/Messages.aspx?ThreadID=3208

Today I ran into a paper about new research on this subject: http://arxiv.org/abs/0806.3115

It's full of math, so you've been warned Wink, but most of it is used as proof that what's been brought forward actually is correct (so you can just assume that and read about the idea). The basic idea is to use a rational number as the key, based on the left and right key. It's a brilliant idea, and gives very easy SQL to fetch, sort etc. a hierarchy. It also gives math to move subtrees from one node to the other, unfotunately not SQL.
Frans Bouma
LLBLGen Pro / ORM Profiler Lead Developer | Blog | Twitter
 
Top
psandler
User



Location:
Chicago, IL
Joined on:
22-Feb-2005 22:24:13
Posted:
540 posts
# Posted on: 20-Jun-2008 16:50:33.  
The problem I have with the nested set approach is that you are basically bastardizing the relational model by storing your entity relationships as metadata. That means that moving a complex tree from one parent to another could require a massive amount of updates to the metadata that stores all the relationships.

In the adjacency model, to move a subtree you only have to update a single value--the FK (parent) of the child you want to move.

To me the adjacency model is more "correct", in that relationships between nodes are stored as actual relationships in the database, and are enforced with referential integrity. It actually suprised/surprises me that Celko advocates this approach, since he is such a purist about the relational model (read his views on the Entity-Attribute-Value model for example).

All that said, I understand that the nested set approach may be more pragmatic in many cases. The concept in the paper you posted certainly looks like an improvement on the concept. The math is way over my head, of course. Regular Smiley

I do think SQL Server's CTE feature can be used to do adjacency model fetches. A basic example at http://www.4guysfromrolla.com/webtech/071906-1.shtml. I have not tinkered with this at all, so I don't claim that it solves the whole problem (sorting and filtering for example).


My C#/SQL Blog (some LLBL content)
Email: psandler70 (at) yahoo.com
 
Top
Otis
LLBLGen Pro Team



Location:
The Hague, The Netherlands
Joined on:
17-Aug-2003 18:00:36
Posted:
34993 posts
# Posted on: 20-Jun-2008 16:59:43.  
psandler wrote:
The problem I have with the nested set approach is that you are basically bastardizing the relational model by storing your entity relationships as metadata. That means that moving a complex tree from one parent to another could require a massive amount of updates to the metadata that stores all the relationships.

In the adjacency model, to move a subtree you only have to update a single value--the FK (parent) of the child you want to move.

To me the adjacency model is more "correct", in that relationships between nodes are stored as actual relationships in the database, and are enforced with referential integrity. It actually suprised/surprises me that Celko advocates this approach, since he is such a purist about the relational model (read his views on the Entity-Attribute-Value model for example).

I think that having a single table with a 'ParentID' FK pointing to its own PK is still checked by an FK constraint (so it does fall into the boundaries of a relational model), however it's unsuitable for doing set operations. The tree solution Celko put forward does offer the ability to perform set operations on it, so it's more SQL friendly, however it has problems with updates.

So I think we're not talking about the same thing Wink. This paper is about having a hierarchy of data (or better: a semantical hierarchy of data), like the textbook examples: Employees which have an FK to themselves for Manager.). Typically, people store these kind of hierarchies in a single table, but can't do anything with it when it comes to hierarchies: it's a pain to obtain the whole hierarchy below employee X for example.

I think what you're talking about is having a db inside a db, where a couple of tables are defined to be able to add columns etc. on the fly, indeed storing meta-data inside a table, which is a big problem waiting to happen.

Quote:

All that said, I understand that the nested set approach may be more pragmatic in many cases. The concept in the paper you posted certainly looks like an improvement on the concept. The math is way over my head, of course. Regular Smiley

Oh, the math is also way over my head too, the beauty about these proofs is that you can skip them, and just assume a math-nerd has checked it already Wink.Though admitted, the stuff is still in the Math corner and not ready to be applied onto a DB right away without digging into the material.

Quote:
I do think SQL Server's CTE feature can be used to do adjacency model fetches. A basic example at http://www.4guysfromrolla.com/webtech/071906-1.shtml. I have not tinkered with this at all, so I don't claim that it solves the whole problem (sorting and filtering for example).

Yes CTEs can be used to fetch hierarchies as well, but of course it's very complicated to setup. Having special crafted PK/FK values is IMHO a simpler approach and it works everywhere Wink
Frans Bouma
LLBLGen Pro / ORM Profiler Lead Developer | Blog | Twitter
 
Top
psandler
User



Location:
Chicago, IL
Joined on:
22-Feb-2005 22:24:13
Posted:
540 posts
# Posted on: 21-Jun-2008 15:42:47.  
Otis wrote:

I think that having a single table with a 'ParentID' FK pointing to its own PK is still checked by an FK constraint (so it does fall into the boundaries of a relational model), however it's unsuitable for doing set operations. The tree solution Celko put forward does offer the ability to perform set operations on it, so it's more SQL friendly, however it has problems with updates.

So I think we're not talking about the same thing Wink. This paper is about having a hierarchy of data (or better: a semantical hierarchy of data), like the textbook examples: Employees which have an FK to themselves for Manager.). Typically, people store these kind of hierarchies in a single table, but can't do anything with it when it comes to hierarchies: it's a pain to obtain the whole hierarchy below employee X for example.


I'm not clear on where you think we're talking about two different things. I'm guessing from your comment above about PK/FK that we understand the model differently. In your understanding of this model, you would store BOTH the PK/FK and the rational numbers in each row, so the table is actually self-referencing (EDIT: changed this to "self-referencing")?

(If that guess is wrong, you can likely skip the rest of this Regular Smiley ).

The model Celko initially proposed does not use a self-referencing table (at least in the discussion you linked to in the original thread), and this is the design I was basing my comments on. Here is the table he proposed:

Quote:

Personnel
emp         lft    rgt
======================
'Albert'     1    12
'Bert'         2     3
'Chuck'     4    11
'Donna'     5     6
'Eddie'     7     8
'Fred'        9     10


So I was under the impression that the model proposes no PK/FK constraint, ONLY the left/right (or rational) numbers that describe the relationship. Doing it without a PK/FK column, I would consider that "storing the relationship as metadata". If there were a PK/FK column, it's probably closer to "denormalization". And if it's denormalization, it comes with all the potential problems of such--every time a row is updated, the tree has to be checked for cascading updates (disclaimer: I don't understand a word of the "Moving Subtrees" section of that paper Wink ).

Just to be clear, I'm not poo-pooing the paper or the idea. It's obviously an improvement on the nested set model, and (as I said) it's probably a good choice in many scenarios.

Otis wrote:

I think what you're talking about is having a db inside a db, where a couple of tables are defined to be able to add columns etc. on the fly, indeed storing meta-data inside a table, which is a big problem waiting to happen.


Yes, this is what I meant regarding the Entity-Attribute-Value (EAV) model. I was only mentioning that as an example of where Celko gets on his soapbox about bastardizing the relational model.


Phil


My C#/SQL Blog (some LLBL content)
Email: psandler70 (at) yahoo.com
 
Top
Otis
LLBLGen Pro Team



Location:
The Hague, The Netherlands
Joined on:
17-Aug-2003 18:00:36
Posted:
34993 posts
# Posted on: 25-Jun-2008 09:33:50.  
Ah! Yes I now understand what you meant. Regular Smiley Indeed you're right, celko was talking about multiple tables to store a hierarchy of entity instances instead of the classic textbook example of 1 table which references itself.
Frans Bouma
LLBLGen Pro / ORM Profiler Lead Developer | Blog | Twitter
 
Top
Pages: 1  


Powered by HnD ©2002-2007 Solutions Design
HnD uses LLBLGen Pro

Version: 2.1.12172008 Final.