Tagging (web 2.0-style) best practice

Posts   
 
    
bmoeskau
User
Posts: 54
Joined: 15-Jun-2005
# Posted on: 24-Oct-2006 20:15:40   

Hello all,

We are planning on implementing tagging in our app, similar to del.icio.us, etc. The basic concept is simple: add a Tags table (id / value) and add a TagRelations table (TagID, FK ID) that will link TagID to other entity ids.

My question is simply, has anyone else implemented this in LLBL? Did you do it this way or some other way? I'm a little concerned about the number of joins that may be required, as a generic tag search would essentially have to join to every single entity table that supports tags (could be 8-10 or more). Most systems I've seen that use tagging only operate on one or a handful of entity types, but our goal is to be able to link together any of a number of entities arbitrarily by tags and create dynamic "project" views of our data. Anyone have any insights about the best approach for this?

bmoeskau
User
Posts: 54
Joined: 15-Jun-2005
# Posted on: 24-Oct-2006 20:21:44   

PS -- I accidentally clicked the "mark as done" link even though the thread is not done. Too bad I can't unmark it once it's marked done! frowning

[Edit]: OK, so by replying myself, it's now reopened I guess. Learn something new everyday.

bclubb
User
Posts: 934
Joined: 12-Feb-2004
# Posted on: 25-Oct-2006 02:50:09   

This assumes that you would be primarily using this for a search with results.
I would look at using a table to store the data used in displaying results. Such as a summary, keywords, type, and key to the actual data. I would then have this summary joined to the tags so that you search your summary table, but you can also use the tags to categorize them. This way all of the specific types of data can be stored in the link from the summary to the detailed entities.

Kris
User
Posts: 79
Joined: 27-Oct-2005
# Posted on: 25-Oct-2006 09:06:28   

Gavin Joyce used tagging in dotnetkicks. How did he do it with LLBLGen 2.0?

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 25-Oct-2006 19:19:36   

bmoeskau wrote:

Hello all,

We are planning on implementing tagging in our app, similar to del.icio.us, etc. The basic concept is simple: add a Tags table (id / value) and add a TagRelations table (TagID, FK ID) that will link TagID to other entity ids.

My question is simply, has anyone else implemented this in LLBL? Did you do it this way or some other way? I'm a little concerned about the number of joins that may be required, as a generic tag search would essentially have to join to every single entity table that supports tags (could be 8-10 or more). Most systems I've seen that use tagging only operate on one or a handful of entity types, but our goal is to be able to link together any of a number of entities arbitrarily by tags and create dynamic "project" views of our data. Anyone have any insights about the best approach for this?

We use tagging extensively in our app... The problems you describe are well know.

You will need a separate TagRelations table for each type of data being tagged. Using a single table means that you will need a discriminator and this means that you need extra select criteria and you can't define "real" FKs which help the optimizer choose an appropriate query plan based on statistics.

If you want an aggregated resultset from all your TagRelation table you can always UNION the results. This BY FAR more perfromant than a single query with lots of JOINs as the optimiser will select a more appropriate plan for each query.

Depending on the usage profile of your app, i.e. how many concurrent users will be searching and then paging through the tagged data (I presume you are going to have paging) coupled with how the data will be sorted and whether or not there are permission on that data, JOINs will either be okay or a complete "no no".

The TagRelations table needs to have enough information to isolate the set of FKs for your data for a given page. If the sort column is not in the TagRelations table, then SQL will need to JOIN ALL the data and then sort before it can isolate the required page.

If you ensure that each TagRelation table contains exactly the data you need for FKs and sorting criteria etc AND you create an index for each query you are going to run. The JOIN will not be an issue. SQL Server will ONLY join the rows you are planning on fetching anyway using an Index Seek.

Our traffic profile is such that JOIN are a "no no" and therefore we have denormalised all of our tables to eliminate JOINs completely. The denormalisation is managed by triggers and is not for the faint hearted.

We have a seperate stored procedure for every possible fetch and these have been hand tweaked to ensure that SQL Server chooses the correct query plan. The results from the SPs are then "projected" into EntityCollections.

If you have more than 1 web server or business tier server, the other issue you will have is TagIDs. As it is quite possible that more than one user will create the same tag at the same time. You will need to have a unique contraint on the TagName field and ensure that failed inserts check to see if the tag has "since" been created.

I also recommend that you cache TagID/TagName pairs since this information is immutable (doesn't change) and will save you going back to the database to fetch TagIDs when people add tags that you have already in the cache. This means that you will only need to fetch TagIDs for tags that are not already in the cache.

Hope this helps,

Marcus

bmoeskau
User
Posts: 54
Joined: 15-Jun-2005
# Posted on: 25-Oct-2006 21:19:32   

bclubb wrote:

This assumes that you would be primarily using this for a search with results.
I would look at using a table to store the data used in displaying results. Such as a summary, keywords, type, and key to the actual data. I would then have this summary joined to the tags so that you search your summary table, but you can also use the tags to categorize them. This way all of the specific types of data can be stored in the link from the summary to the detailed entities.

I think this approach will probably work well for us. I had not considered adding the summary column directly into the join table in addition to the type, but for my purposes, that would provide enough info for the results and actually eliminate the need for any JOINs. Then the actual entity retrieval would always be by PK.

Thanks for your response too Marcus, as it has given me a lot to think about. If I end up still needing to actually JOIN to all entity tables, I'll definitely keep your advice in mind. I also hadn't specifically thought about caching the tags, but I'll definitely look into that as well.

Our traffic profile is such that JOIN are a "no no" and therefore we have denormalised all of our tables to eliminate JOINs completely. The denormalisation is managed by triggers and is not for the faint hearted.

I don't envy you simple_smile

Thanks for the quick responses guys, I appreciate it.