Marcus wrote:
Frans,
There's an area of my app that I really need to tighten up performance on. Can you comment on the performance implications of the prefetch implementation?
1) I am seeing (via SQL Profiler) that prefeth uses sub-queries for related tables. SELECT * FROM related_table WHERE id IN (SELECT id FROM table). Is this faster than a JOIN?
In most cases it is. The problem with joins is that you get a lot of redundant data. There are cases where joins can be faster, but this has to be specified by the user, which is planned, however these cases is mostly just 2 related entities, i.e. simple graphs.
The redundant data from joins really causes a lot of problems, for example when DISTINCT can't be used due to a blob / text/image field in the resultset. The core then has to filter out duplicates on the client, which is a bit slower than when SqlServer would do it.
2) For fetching single entities (not collections) I have come to the conclusion that using prefetch is not a good idea. Essentially for a single entity I end up:
a) SELECT * FROM table
b) SELECT * FROM related_table1 WHERE id IN (SELECT id FROM table)
c) SELECT * FROM related_table2 WHERE id IN (SELECT id FROM table)
Since I will know the PK ID of the main table after (a) I'd be much better off doing:
a) SELECT * FROM table
b) SELECT * FROM related_table1 WHERE id = @ID
c) SELECT * FROM related_table2 WHERE id = @ID
by using this method:
adapter.FetchEntityCollection(folderEntity.ChildFolderLinkCollection, folderEntity.GetRelationInfoChildFolderLinkCollection());
Or are you relying on some internal SQL Server caching to cache the results of the SELECT id FROM table query (above)
The subquery can use cached results inside the database, but as the query is bigger in b) than it is in a) it needs to recalculate the execution plan anyway. Prefetch paths aren't always the best way to deal with things. They can help when you have to load a lot of data in a graph, as it saves you from a lot of queries (10 customers, each 10 orders with 5 order lines, gives normally 10 + 1010 + 1010*5 queries, with prefetch paths just 3). However when it's not worth the overhead, you can also execute the queries separately, if the amount of queries would be the same anyway (and with a single entity that's the case).
The other thing is that prefetch doesn't keep the connection open for the prefeches by default... is this a bug? (I'd like it to! )
Marcus
No this is not a bug. It has to do some processing in between queries. It therefore closes the connection each time, which is just giving back the connection to the connection pool. For heavy load applications this is best, as every connection has to be fully utilized. If you want it to keep open the connection, set the KeepConnectionOpen flag to true before the prefetchpath fetch. Though getting a connection from the pool takes 20ms tops.