Paging + prefetch path

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 12-May-2005 19:14:43   

O brother, I completely missed this one. I'll try to come up with an answer later today :)

Frans Bouma | Lead developer LLBLGen Pro
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 12-May-2005 19:26:48   

Otis wrote:

O brother, I completely missed this one. I'll try to come up with an answer later today :)

Frans, don't worry... I'm off to Warsaw, Poland for a stag weekend early in the morning :) .

The paging combined with prefetch is working fine here. :)

The new method in DataAccessAdapterBase needs some optimisation and refactoring to clean it up before it's production worthy. It also hasn't been fully tested in every combination yet; I'll try to get around to that when I get back. I'll also send you the code on my return. Have a good weekend.

Marcus

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 20-May-2005 11:41:07   

I like your approach; the downside is that it assumes a page is small. But what if I have a processing routine which chomps at 2000 rows a page through a 1 million row table? Very possible. On Oracle your approach then fails, as the IN clause can have 1000 parameters at most. I'm not certain about other databases' limits, but I wouldn't be surprised if similar ones exist.

:( So I have mixed feelings about this. It's solvable though, with database-specific queries: paging on Oracle is very simple, so the actual paging query is not hard to fabricate and the IN problem goes away. That does require some redesign in the DQE and prefetch path code, though, as the IN clause is only required for SQL Server.
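For illustration, one common workaround for Oracle's 1000-item IN-clause cap is to split the key list into several OR-ed IN chunks. A minimal sketch of that idea (the function name and SQL rendering are hypothetical, not LLBLGen code):

```python
# Hypothetical sketch: split a large key set into IN chunks of at most
# max_in_size items, OR-ed together, to stay under Oracle's 1000-item limit.
def build_in_clauses(column, values, max_in_size=1000):
    """Render `column IN (...) OR column IN (...)` with <= max_in_size items each."""
    chunks = [values[i:i + max_in_size] for i in range(0, len(values), max_in_size)]
    clauses = [
        "%s IN (%s)" % (column, ", ".join(str(v) for v in chunk))
        for chunk in chunks
    ]
    return " OR ".join(clauses)
```

This keeps each IN list legal at the cost of a longer WHERE clause, so it only postpones the "huge query text" problem discussed below; it doesn't remove it.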

Frans Bouma | Lead developer LLBLGen Pro
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 21-May-2005 11:45:35   

Otis wrote:

I like your approach; the downside is that it assumes a page is small.

Yes, this is the basis on which the implementation was done... It assumes that the page size is within reason. I also mentioned that the SubPaths (which, in my implementation, recurse into the same routine) should actually recurse into your original method, as we have no indication of the size of the prefetched set.

After looking at this, a thought occurred to me that there is in fact a huge optimisation to be had here. I personally dislike using PrefetchPaths except when I'm fetching collections, because they issue a minimum of 2 queries for each prefetch. Obviously this is inefficient if you are simply fetching an entity, as you might as well just fetch the related entities yourself, which results in 1 query. Nesting prefetch paths (adding a SubPath) aggravates the problem exponentially...

What I'm thinking of is using a Strategy pattern to determine the best FetchPrefetchPath method to call. Since we already have the root entity collection before we call FetchPrefetchPath, we know the set size in advance and can call your method for sets with a score > n (where score is the set size * PK count, which gives a rough estimate of the size of the eventual WHERE clause).

private void FetchPrefetchPath(
    IEntityCollection2 rootEntities,
    IRelationPredicateBucket filterBucket,
    long maxNumberOfItemsToReturn,
    ISortExpression sortClauses,
    IPrefetchPath2 prefetchPath)
{
    if (rootEntities.Count == 0)
        return;

    int pkCount = rootEntities[0].Fields.PrimaryKeyFields.Count;
    int setCount = rootEntities.Count;

    if ((setCount * pkCount) <= THRESHOLD)
        FetchPrefetchPathFast(rootEntities, filterBucket, maxNumberOfItemsToReturn, sortClauses, prefetchPath);
    else
        FetchPrefetchPathSlow(rootEntities, filterBucket, maxNumberOfItemsToReturn, sortClauses, prefetchPath);
}

Your current implementation of FetchPrefetchPath becomes FetchPrefetchPathSlow and my implementation becomes FetchPrefetchPathFast. (Don't take that the wrong way ;) )

Prefetch with paging needs to impose the same threshold, as it needs to call FetchPrefetchPathFast directly, but sub-prefetches recurse back into the FetchPrefetchPath strategy method. Maybe an exception needs to be thrown if the threshold is exceeded for paging + prefetch fetches.

Otis wrote:

But what if I have a processing routine which chomps at 2000 rows a page through a 1 million row table? Very possible. On Oracle your approach then fails, as the IN clause can have 1000 parameters at most. I'm not certain about other databases' limits, but I wouldn't be surprised if similar ones exist.

I think the beauty of the Strategy pattern is that you can set the threshold on a per-database basis. SQL Server also has a limit on the size of a query (16,777,216 bytes), does it not? So the threshold needs to be set accordingly. The threshold should also take into account the size of the query string being sent to the server, as this will also impact performance. If the WHERE clause is 16 MB then it is obviously not a good candidate to send over the network for the database to process.
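The per-database threshold idea can be sketched very compactly. Everything here is hypothetical (the dictionary, the names, and the threshold values are illustrations, not LLBLGen's API); the point is only that the score check from the earlier code fragment gains a per-database lookup:

```python
# Illustrative per-database strategy selection. Threshold values are
# made-up placeholders; real values would come from testing or config.
THRESHOLDS = {
    "sqlserver": 2000,   # bounded mainly by total query text size
    "oracle": 1000,      # hard cap: at most 1000 items in an IN clause
}

def choose_strategy(db, set_count, pk_count):
    """Score = set size * PK field count, a rough proxy for WHERE-clause size."""
    score = set_count * pk_count
    return "fast" if score <= THRESHOLDS.get(db, 1000) else "slow"
```

So the same 1200-key page would use the flattened-IN path on SQL Server but fall back to the subquery path on Oracle.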

Otis wrote:

disappointed so I've a bit of a mixed feeling about this. It's solveable though, with very specific queries per database, as paging on Oracle is very simple, so the actual paging query is not hard to fabricate, so the IN problem will go away, though that requires some redesign in the DQE and prefetch path code, as the IN clause is only required for Sqlserver.

Have I convinced you? :)

Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 21-May-2005 12:00:24   

Also, I forgot to mention... MergeNormal and MergeManyToMany (in DataAccessAdapterBase) both use a number of Hashtables to assist in merging the prefetched entities, but they do not initialise the new Hashtables with a capacity.

For large sets, this results in the Hashtable having to resize itself (which causes all the entries to be re-hashed) as entities are added.

But you actually know what capacity is required, since you have the entity collections before the Hashtables are created. Just a tip :)

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 21-May-2005 12:27:52   

Marcus wrote:

Otis wrote:

I like your approach; the downside is that it assumes a page is small.

Yes, this is the basis on which the implementation was done... It assumes that the page size is within reason. I also mentioned that the SubPaths (which, in my implementation, recurse into the same routine) should actually recurse into your original method, as we have no indication of the size of the prefetched set.

I'll re-check your code :)

After looking at this, a thought occurred to me that there is in fact a huge optimisation to be had here. I personally dislike using PrefetchPaths except when I'm fetching collections, because they issue a minimum of 2 queries for each prefetch. Obviously this is inefficient if you are simply fetching an entity, as you might as well just fetch the related entities yourself, which results in 1 query. Nesting prefetch paths (adding a SubPath) aggravates the problem exponentially...

I don't see where 2 queries are issued per prefetch. One query per node is executed. Only for m:n relations are 2 queries executed, but that's a requirement to find the real relations between the actual m:n related entities.

Could you elaborate a bit on this?

What I'm thinking of is using a Strategy pattern to determine the best FetchPrefetchPath method to call. Since we already have the root entity collection before we call FetchPrefetchPath, we know the set size in advance and can call your method for sets with a score > n (where score is the set size * PK count, which gives a rough estimate of the size of the eventual WHERE clause).

private void FetchPrefetchPath(
    IEntityCollection2 rootEntities,
    IRelationPredicateBucket filterBucket,
    long maxNumberOfItemsToReturn,
    ISortExpression sortClauses,
    IPrefetchPath2 prefetchPath)
{
    if (rootEntities.Count == 0)
        return;

    int pkCount = rootEntities[0].Fields.PrimaryKeyFields.Count;
    int setCount = rootEntities.Count;

    if ((setCount * pkCount) <= THRESHOLD)
        FetchPrefetchPathFast(rootEntities, filterBucket, maxNumberOfItemsToReturn, sortClauses, prefetchPath);
    else
        FetchPrefetchPathSlow(rootEntities, filterBucket, maxNumberOfItemsToReturn, sortClauses, prefetchPath);
}

Your current implementation of FetchPrefetchPath becomes FetchPrefetchPathSlow and my implementation becomes FetchPrefetchPathFast. (Don't take that the wrong way ;) )

Heh, well, I'd like to know where the 'gain' is in your faster routine :) . Of course my routine is a general-purpose routine, and with large tables there are ways to optimize it using joins, but it takes a lot of logic to get that right.

Could you elaborate a bit on why your routine is faster? Due to the replacement of the subquery with the hardcoded IN clause?

Prefetch with paging needs to impose the same threshold, as it needs to call FetchPrefetchPathFast directly, but sub-prefetches recurse back into the FetchPrefetchPath strategy method. Maybe an exception needs to be thrown if the threshold is exceeded for paging + prefetch fetches.

I don't think that's great, as that exception will likely be thrown during production runs, under load, on a Saturday morning when everybody but you is on holiday ;) so you're called to get it fixed.

Otis wrote:

But what if I have a processing routine which chomps at 2000 rows a page through a 1 million row table? Very possible. On Oracle your approach then fails, as the IN clause can have 1000 parameters at most. I'm not certain about other databases' limits, but I wouldn't be surprised if similar ones exist.

I think the beauty of the Strategy pattern is that you can set the threshold on a per-database basis. SQL Server also has a limit on the size of a query (16,777,216 bytes), does it not?

11,000 or so, then it crashes. A customer ran into that last week.

So the threshold needs to be set accordingly. The threshold should also take into account the size of the query string being sent to the server, as this will also impact performance. If the WHERE clause is 16 MB then it is obviously not a good candidate to send over the network for the database to process.

True, though when is which faster?

I think we're mixing two things here, and that bugs me a little: 1) paging, 2) apparently slow prefetch logic.

Now, 1) is something which can be hacked in with some effort and has limits. 2) is something different, and if it's slow, I'd like to know when and why, so proper fixes can be applied. I see 2) as completely separate from 1); otherwise complex code is added to 'fix' 2) together with 1) without real evidence that it's actually faster. Complex code makes code slower as well; it then depends on whether the extra complexity (and thus bugs and slower code) is worth the extra speed it might create (if it creates extra speed).

Otis wrote:

:( So I have mixed feelings about this. It's solvable though, with database-specific queries: paging on Oracle is very simple, so the actual paging query is not hard to fabricate and the IN problem goes away. That does require some redesign in the DQE and prefetch path code, though, as the IN clause is only required for SQL Server.

Have I convinced you? :)

I think the 2 things shouldn't be mixed. I now read that you run into slow prefetch code; I'd like to know when that happens and what the reason is :) .

Frans Bouma | Lead developer LLBLGen Pro
Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 21-May-2005 12:29:57   

Marcus wrote:

Also, I forgot to mention... MergeNormal and MergeManyToMany (in DataAccessAdapterBase) both use a number of Hashtables to assist in merging the prefetched entities, but they do not initialise the new Hashtables with a capacity.

For large sets, this results in the Hashtable having to resize itself (which causes all the entries to be re-hashed) as entities are added.

But you actually know what capacity is required, since you have the entity collections before the Hashtables are created. Just a tip :)

You're right on that :) I'll file it as an issue :)

Frans Bouma | Lead developer LLBLGen Pro
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 21-May-2005 14:49:31   

Otis wrote:

I think we're mixing two things here, and that bugs me a little: 1) paging 2) apparently slow prefetch logic.

Yes, you are correct... :) This thread started by discussing the ability to combine paging with prefetch. Period. So let's take them separately, as you said.

Otis wrote:

Now, 1) is something which can be hacked in with some effort and has limits.

I agree that there is a limit to combining the ability to fetch a given page of an entity collection and have entities on that page prefetch related entities. The limit is imposed by the fact that you must supply FieldCompareValuePredicates with a set of hardcoded primary key IDs. Therefore an upper bound must be set on the size of the page when using this combination. I suggested throwing an exception and you said:

Otis wrote:

I don't think that's great, as that exception will likely be thrown during production runs, under load, on a Saturday morning when everybody but you is on holiday ;) so you're called to get it fixed.

But I don't agree here... :) The exception would be of type ArgumentOutOfRangeException, which is perfectly acceptable in a production system as it's a design-time restriction. It's up to the developers to ensure that the application restricts page sizes to within the parameter limits when combining paging with a prefetch. Page sizes should not dynamically increase as the load on the system increases... they are independent of the load on the system.

Otis wrote:

Now, 1) is something which can be hacked in with some effort and has limits.

I don't see my implementation as a hack or a lot of effort, just a different strategy, and as you point out, this strategy has limits. Agreed.

On the other point:

Otis wrote:

2) apparently slow prefetch logic.

Okay... this is not meant to be an attack on the internals of LLBLGen (which we all know is fantastic). I am simply making some observations on possible enhancements to the implementation of the prefetch logic.

Otis wrote:

I don't see where 2 queries are issued per prefetch. One query per node is executed. Only for m:n relations are 2 queries executed, but that's a requirement to find the real relations between the actual m:n related entities. Could you elaborate a bit on this?

Sorry, I use the word "query" in this context to mean "a SELECT statement", whether that SELECT originates in the application or as a subquery on the db server. The current implementation of prefetch causes a minimum of 2 SELECTs per prefetch because of the subquery. My implementation removes the need for the subquery, because it creates the prefetch predicate in the application based on the previously fetched parent entities' PK IDs. This predicate can be re-used for all prefetches on the parent entities. For SubPath prefetches, the predicate would need to be re-created. A quick look at the Execution Plan and Server Trace in Query Analyser will confirm that using subqueries is slower.

Here is an example of a prefetch which has a SubPath:

IPrefetchPath2 prefetch = new PrefetchPath2((int)EntityType.PermissionEntity);
prefetch.Add(PermissionEntity.PrefetchPathGroup);

and here is the associated generated SQL for the prefetch (tidied up a little):

SELECT ...
FROM   [Group]
WHERE  [Group].[GroupUID] IN
       (SELECT [Permission].[GroupUID] AS [GroupUID]
        FROM   [Permission]
        WHERE  [Permission].[ResourceUID] = @ResourceUID1)

My prefetch implementation yields the following (because I supply the predicate), which executes faster:

SELECT ...
FROM   [Group]
WHERE  [Group].[GroupUID] = @GroupUID1

I know this is a simple example, but as predicates and relations are added to the parent entities' fetch, the subquery becomes increasingly slower, especially if, god forbid, a table or index scan is involved and the table is large. Adding a SubPath to the prefetch causes the entire parent prefetch query to become the subquery, and so on... I don't think there can be any argument over which query executes faster in the db... Query Analyser can show the execution plans and server traces, which indicate that using the subquery performs twice as many reads as not using it. When SubPaths are added you are looking at a minimum of 4 times as many reads. (I haven't taken into account any caching the db may implement to save the results of previously executed queries.)

All my implementation does is determine whether or not the parent entity set's size is smaller than some threshold. If it is, it creates the desired predicate in advance of hitting the database, thus eliminating the subqueries.
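The predicate-building step described above can be sketched in a few lines. This is a hypothetical illustration of the idea (not LLBLGen code): given the already-fetched parents' key values, emit a flat filter instead of a subquery, collapsing to plain equality when there is only one parent:

```python
# Hypothetical sketch: build the child-fetch filter from the parents'
# already-known PK values, eliminating the correlated subquery.
def prefetch_filter(fk_column, parent_pk_values):
    pks = sorted(set(parent_pk_values))          # de-duplicate FK targets
    if len(pks) == 1:
        return "%s = %s" % (fk_column, pks[0])   # single parent: plain equality
    return "%s IN (%s)" % (fk_column, ", ".join(str(v) for v in pks))
```

For the Permission -> Group example above, a single fetched permission would produce `[Group].[GroupUID] = ...` directly, which is exactly the faster query shown.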

Otis wrote:

1) but no real evidence is there if it's really faster. Complex code makes code slower as well, it then depends if the extra complexity (and thus bugs and slower code) is worth the extra speed it might create (if it creates extra speed).

Good point... Is the creation of the predicates, coupled with the latency of transmission of the larger query to the database over the network and its associated parsing, actually slower than having the database perform the subquery? There may well be some point at which the subquery is faster... which is the reason you need to set the threshold values to something reasonable.

The other point worth mentioning here is that if there is a toss-up between having the database do more work or having the middle tier do more work, I'm much more in favour of having the middle tier take the extra load, as we can always add more middle tiers...

:)

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 23-May-2005 11:07:12   

Marcus wrote:

Otis wrote:

Now, 1) is something which can be hacked in with some effort and has limits.

I agree that there is a limit to combining the ability to fetch a given page of an entity collection and have entities on that page prefetch related entities. The limit is imposed by the fact that you must supply FieldCompareValuePredicates with a set of hardcoded primary key IDs. Therefore an upper bound must be set on the size of the page when using this combination. I suggested throwing an exception and you said:

Otis wrote:

I don't think that's great, as that exception will likely be thrown during production runs, under load, on a Saturday morning when everybody but you is on holiday ;) so you're called to get it fixed.

But I don't agree here... :) The exception would be of type ArgumentOutOfRangeException, which is perfectly acceptable in a production system as it's a design-time restriction. It's up to the developers to ensure that the application restricts page sizes to within the parameter limits when combining paging with a prefetch. Page sizes should not dynamically increase as the load on the system increases... they are independent of the load on the system.

Ok, that's true :) .

And now the 1 million dollar question: when will paging + prefetch paths using a set of hardcoded compare value predicates (or an IN range) be slower than paging + manual fetching?

Of course, it could be left to the developer to find that out.

Otis wrote:

Now, 1) is something which can be hacked in with some effort and has limits.

I don't see my implementation as a hack or a lot of effort, just a different strategy and as you point out, this strategy has limits. Agreed.

hack as in: added to the code without changing the overall structure.

On the other point:

Otis wrote:

2) apparently slow prefetch logic.

Okay... this is not meant to be an attack on the internals of LLBLGen (which we all know is fantastic). I am simply making some observations on possible enhancements to the implementation of the prefetch logic.

Oh, I fully agree on the enhancements thing. For example:


[Test]
public void OptimizingPrefetchPathsTest()
{
    using(DataAccessAdapter adapter = new DataAccessAdapter())
    {
        CustomerEntity c = new CustomerEntity("CHOPS");
        PrefetchPath2 path = new PrefetchPath2((int)EntityType.CustomerEntity);
        path.Add(CustomerEntity.PrefetchPathOrders);

        adapter.FetchEntity(c, path);
    }
}

The path fetch for the orders results in this query:


SELECT  [Northwind].[dbo].[Orders].[OrderID] AS [OrderId],
        ...
FROM    [Northwind].[dbo].[Orders]  
WHERE   ( 
            [Northwind].[dbo].[Orders].[CustomerID] IN 
            (
                SELECT  [Northwind].[dbo].[Customers].[CustomerID] AS [CustomerId] 
                FROM    [Northwind].[dbo].[Customers]  
                WHERE 
                ( 
                    ( [Northwind].[dbo].[Customers].[CustomerID] = @CustomerId1)
                )
            )
        )

I think we all agree on the fact that it can be more efficient ;) . It's caused by the fact that the routine uses the same routine for the root -> first child fetch and all other children: it doesn't take into account a root entity collection of 1. You can also optimize it for root collections with n items, like:


SELECT  [Northwind].[dbo].[Orders].[OrderID] AS [OrderId],
        ...
FROM    [Northwind].[dbo].[Orders]  
WHERE   ( 
            [Northwind].[dbo].[Orders].[CustomerID] IN 
            (
                @CustomerId1, @CustomerId2, ..., @CustomerIdn
            )
        )

The point is, how much faster is it? It doesn't have to read rows from the tables (or better: perform a cross filter on an index), but it does have to set up an in-memory table with the parameter values, which are not indexed (as they're supplied). For example, a WHERE foo IN (1) clause is extremely slow compared to WHERE foo = 1.

Say I use this optimization. Two queries. Q1:


SELECT  *
FROM    Orders
WHERE   CustomerID IN
(
    SELECT  CustomerID
    FROM    Customers
    WHERE   Country = 'Mexico'
)

which gives 5 rows in Customers. I can optimize that to use no query on Customers. Q2:


SELECT  *
FROM    Orders
WHERE   CustomerID IN
(
    'ANATR', 'ANTON', 'CENTC', 'PERIC', 'TORTU'
)

The execution plan is different and the query costs are the same, though the IO costs are a bit higher in Q1 (the query is easier to parse, so it gains a little on that). The CPU costs for Q1 are 0, but for Q2 they have a value. Reads are lower on Q2. If I change the filter to 'France', which has 11 customers, it's about the same. Of course this is totally unscientific measuring, as the tables are relatively small, and I don't know what the impact will be on a very large table.

Though it shouldn't matter: as the code doesn't know how large the table is. The developer doesn't know either, because the table might be small when the project started but it can grow over time.

So, for 1 value, i.e. a single entity fetch and a prefetch path: agreed. For multiple roots, I'm not so sure (as in: I have to test it, it's not obvious. But how to test it?). I'm reluctant to do micro-optimization based on guesses.
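One rough way to test the subquery-vs-flattened-IN question empirically is a small harness like the following sketch. It uses SQLite purely because it is self-contained; SQL Server's and Oracle's planners will of course behave differently, and the table shape and sizes are made up for illustration:

```python
# Sketch: compare "IN (subquery)" vs "IN (flat list)" on synthetic data.
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, Country TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID TEXT)")
# 5000 customers, 100 of them in 'Mexico'; 10 orders per customer.
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [("C%05d" % i, "Mexico" if i % 50 == 0 else "Other") for i in range(5000)])
conn.executemany("INSERT INTO Orders VALUES (?, ?)",
                 [(i, "C%05d" % (i % 5000)) for i in range(50000)])

def timed(sql, params=()):
    t0 = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return len(rows), time.perf_counter() - t0

subq = ("SELECT * FROM Orders WHERE CustomerID IN "
        "(SELECT CustomerID FROM Customers WHERE Country = 'Mexico')")
ids = [r[0] for r in conn.execute("SELECT CustomerID FROM Customers WHERE Country = 'Mexico'")]
flat = "SELECT * FROM Orders WHERE CustomerID IN (%s)" % ", ".join("?" * len(ids))

n1, t1 = timed(subq)        # subquery variant
n2, t2 = timed(flat, ids)   # flattened, parameterised IN variant
assert n1 == n2             # both strategies must return the same rows
```

The timings themselves only become meaningful on realistic data volumes and on the actual target RDBMS, which is exactly the caveat raised above.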

Otis wrote:

I don't see where 2 queries are issued per prefetch. One query per node is executed. Only for m:n relations 2 queries are executed, but that's a requirement to find the real relations between the actual m:n related entities. Could you elaborate a bit on this?

Sorry, I use the word "query" in this context to mean "a SELECT statement", whether that SELECT originates in the application or as a subquery on the db server. The current implementation of prefetch causes a minimum of 2 SELECTs per prefetch because of the subquery. My implementation removes the need for the subquery, because it creates the prefetch predicate in the application based on the previously fetched parent entities' PK IDs. This predicate can be re-used for all prefetches on the parent entities. For SubPath prefetches, the predicate would need to be re-created. A quick look at the Execution Plan and Server Trace in Query Analyser will confirm that using subqueries is slower.

Not in the execution plans, though in the traces indeed (reads), though with more CPU usage (0 vs. 16).

However the RDBMS has to do more work on the parameter aspect, which isn't taken into account when executing the query from QA.

Here is an example of a prefetch which has a SubPath:

IPrefetchPath2 prefetch = new PrefetchPath2((int)EntityType.PermissionEntity);
prefetch.Add(PermissionEntity.PrefetchPathGroup);

and here is the associated generated SQL for the prefetch (tidied up a little):

SELECT ...
FROM   [Group]
WHERE  [Group].[GroupUID] IN
       (SELECT [Permission].[GroupUID] AS [GroupUID]
        FROM   [Permission]
        WHERE  [Permission].[ResourceUID] = @ResourceUID1)

My prefetch implementation yields the following (because I supply the predicate), which executes faster:

SELECT ...
FROM   [Group]
WHERE  [Group].[GroupUID] = @GroupUID1

... because there's 1 permission entity fetched and you use that object's groupid to fetch the group? (so m:1/1:1 relations)

All my implementation does is determine whether or not the parent entity set's size is smaller than some threshold. If it is, it creates the desired predicate in advance of hitting the database, thus eliminating the subqueries.

Ok, I see that; the point is that the route the query text has to travel is longer than what you test with QA. So you have fewer reads, agreed, though you also have a potentially long query text and a potentially long parameter list. I do agree that I/O is probably slower than processing these parameters, though because you're specifying a hardcoded IN query, the query's execution plan is likely to be tossed out every time, as it isn't re-usable when another fetch is done with a different number of root entities.
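The plan-reuse concern can be mitigated by a known technique: parameterise the IN list and pad the parameter count up to a small set of fixed bucket sizes, so fetches with similar root-set sizes share one plan shape. A hypothetical sketch (names and bucket sizes are illustrative, not LLBLGen code):

```python
# Sketch: render a parameterised IN clause whose parameter count is padded
# to the next fixed bucket size, so the plan is reusable across fetches
# with different (but similar) numbers of root entities.
def parameterised_in(column, values, buckets=(5, 10, 50, 200, 1000)):
    n = next(b for b in buckets if b >= len(values))   # assumes len(values) <= max bucket
    params = ["@p%d" % i for i in range(n)]
    padded = list(values) + [values[-1]] * (n - len(values))  # repeat last value as harmless filler
    sql = "%s IN (%s)" % (column, ", ".join(params))
    return sql, dict(zip(params, padded))
```

The filler values are duplicates of a real key, so they don't change the result set, and only five distinct query texts ever reach the plan cache.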

In my queries, if I define an index on 'country', duration and CPU on Q1 are 0 and IO goes down, though for Q2 it doesn't matter.

For a root entity set of 1, fully agreed, for a root entity set larger than 1 element, I'm not sure.

Otis wrote:

1) but no real evidence is there if it's really faster. Complex code makes code slower as well, it then depends if the extra complexity (and thus bugs and slower code) is worth the extra speed it might create (if it creates extra speed).

Good point... Is the creation of the predicates, coupled with the latency of transmission of the larger query to the database over the network and its associated parsing, actually slower than having the database perform the subquery? There may well be some point at which the subquery is faster... which is the reason you need to set the threshold values to something reasonable.

But when is that? I can't do that check in my code. I can try to emit silly queries like the example I gave, where the subquery's field is the PK and there is 1 element to filter on.

The other point worth mentioning here is that if there is a toss-up between having the database do more work or having the middle tier do more work, I'm much more in favour of having the middle tier take the extra load, as we can always add more middle tiers... :)

Agreed, though the burden of the longer query is carried by the RDBMS I think, as that is where the query is parsed and compiled.

I'll see what I can do to optimize it a bit.

Frans Bouma | Lead developer LLBLGen Pro
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 23-May-2005 11:44:03   

Otis wrote:

And now the 1 million dollar question: when will paging + prefetch paths using a set of hardcoded compare value predicates (or an IN range) be slower than paging + manual fetching?

:)

Otis wrote:

Oh, I fully agree on the enhancements thing. For example:


[Test]
public void OptimizingPrefetchPathsTest()
{
    using(DataAccessAdapter adapter = new DataAccessAdapter())
    {
        CustomerEntity c = new CustomerEntity("CHOPS");
        PrefetchPath2 path = new PrefetchPath2((int)EntityType.CustomerEntity);
        path.Add(CustomerEntity.PrefetchPathOrders);

        adapter.FetchEntity(c, path);
    }
}

The path fetch for the orders results in this query:


SELECT  [Northwind].[dbo].[Orders].[OrderID] AS [OrderId],
        ...
FROM    [Northwind].[dbo].[Orders]  
WHERE   ( 
            [Northwind].[dbo].[Orders].[CustomerID] IN 
            (
                SELECT  [Northwind].[dbo].[Customers].[CustomerID] AS [CustomerId] 
                FROM    [Northwind].[dbo].[Customers]  
                WHERE 
                ( 
                    ( [Northwind].[dbo].[Customers].[CustomerID] = @CustomerId1)
                )
            )
        )

I think we all agree on the fact that it can be more efficient ;) . It's caused by the fact that the routine uses the same routine for the root -> first child fetch and all other children: it doesn't take into account a root entity collection of 1. You can also optimize it for root collections with n items, like:


SELECT  [Northwind].[dbo].[Orders].[OrderID] AS [OrderId],
        ...
FROM    [Northwind].[dbo].[Orders]  
WHERE   ( 
            [Northwind].[dbo].[Orders].[CustomerID] IN 
            (
                @CustomerId1, @CustomerId2, ..., @CustomerIdn
            )
        )

All these optimisations are covered in my implementation...

Otis wrote:

The point is, how much faster is it?

...the 1 Million Dollar Question! :)

Otis wrote:

The execution plan is different and the query costs are the same, though the IO costs are a bit higher in Q1 (the query is easier to parse, so it gains a little on that). The CPU costs for Q1 are 0, but for Q2 they have a value. Reads are lower on Q2. If I change the filter to 'France', which has 11 customers, it's about the same. Of course this is totally unscientific measuring, as the tables are relatively small, and I don't know what the impact will be on a very large table.

Some of my tables are > 1,000,000 rows and I see significant improvement using the optimisations...

Otis wrote:

Though it shouldn't matter: as the code doesn't know how large the table is. The developer doesn't know either, because the table might be small when the project started but it can grow over time.

This is why I introduced a "threshold" value which cuts off the optimisations at some point. This point (or a reasonable value for it) can be estimated with a few tests and either put in a config file or hardcoded in LLBLGen.

Otis wrote:

Not in the execution plans, though in the traces indeed (reads), though with more CPU usage (0 vs. 16).

For me, reads are killing the DB more than CPU...

Otis wrote:

I do agree that I/O is probably slower than processing these parameters, though because you're specifying a hardcoded IN query, the query's execution plan is likely to be tossed out every time, as it isn't re-usable when another fetch is done with a different number of root entities.

Good point. :)

Otis wrote:

I'll see what I can do to optimize it a bit.

Great!

I think we've spent enough time discussing this one... it's obviously a topic that has many facets and is certainly not clear cut... What I can suggest is to send you an updated copy of my DataAccessAdapterBase, which contains both the Prefetch + Paging support and the optimisations I described above. I have done some limited performance testing on the system as a whole and have seen a 20% improvement in my tests using Application Center Test: ~200 pages per second vs ~164 pages per second. This figure increases dramatically as the table sizes increase, but this is of course application specific. Maybe you have a prefetch load test already set up and could give my version a bash to see if it does indeed give a performance benefit (especially on large tables... AFTER the Index Tuning Wizard has been run).

If you want to incorporate my changes, please feel free; otherwise maybe you could change the protection level of the private FetchPrefetchPath method and the private methods it calls (MergeNormal and MergeManyToMany) from "private" to "protected virtual"? This way I can simply subclass DataAccessAdapterBase and keep my changes isolated.

Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 09-Jun-2005 18:53:13   

Otis wrote:

Marcus wrote:

Speaking of subclassing DataAccessAdapter did you have any thoughts on changing the protection level of:

private FetchPrefetchPath -> protected virtual
private MergeNormal -> protected
private MergeManyToMany -> protected

... in DataAccessAdapterBase to accommodate my custom paging + prefetch (from our other discussion at http://llblgen.com/tinyforum/Messages.aspx?ThreadID=1517&StartAtMessage=34)

MergeNormal and MergeManyToMany not virtual? I don't mind making them protected virtual simple_smile

Sure why not... simple_smile I was going with a minimalistic approach. wink The only reason they are needed is because they are called from FetchPrefetchPath and I need to be able to call back into them...

While we are on the subject of extensibility... smile My TempDB is getting hit hard due to paging creating temp tables... I haven't had a chance to experiment yet, but that article on CodeProject (the one that examined the paging options: http://www.codeproject.com/aspnet/PagingLarge.asp) suggested that a cursor might be less harsh on the system, and in his tests it seemed to be faster... Given this, would there be a future way for me to override DynamicQueryEngine.ManglePageSelectDQ() (currently static) so that different paging implementations could be chosen at runtime... wink

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 10-Jun-2005 17:33:26   

Marcus wrote:

Otis wrote:

Marcus wrote:

Speaking of subclassing DataAccessAdapter did you have any thoughts on changing the protection level of:

private FetchPrefetchPath -> protected virtual
private MergeNormal -> protected
private MergeManyToMany -> protected

... in DataAccessAdapterBase to accommodate my custom paging + prefetch (from our other discussion at http://llblgen.com/tinyforum/Messages.aspx?ThreadID=1517&StartAtMessage=34)

MergeNormal and MergeManyToMany not virtual? I don't mind making them protected virtual simple_smile

Sure why not... simple_smile I was going with a minimalistic approach. wink The only reason they are needed is because they are called from FetchPrefetchPath and I need to be able to call back into them...

They'll be virtual in the next upgrade simple_smile (1.0.2005.1)

While we are on the subject of extensibility... smile My TempDB is getting hit hard due to paging creating temp tables... I haven't had a chance to experiment yet, but that article on CodeProject (the one that examined the paging options: http://www.codeproject.com/aspnet/PagingLarge.asp) suggested that a cursor might be less harsh on the system, and in his tests it seemed to be faster... Given this, would there be a future way for me to override DynamicQueryEngine.ManglePageSelectDQ() (currently static) so that different paging implementations could be chosen at runtime... wink

Cursors on SqlServer also create a temp table, so that's not going to do you any good. What could help is using a table variable instead, though that's slower in larger resultset situations.
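For context, a minimal sketch of the two approaches being compared here: the classic SqlServer 2000 temp-table paging pattern and the table-variable alternative. The table and column names (Orders, OrderID, OrderDate) are illustrative only, not LLBLGen's actual generated SQL:

```sql
-- Temp-table paging: materialize the ordered keys with an IDENTITY,
-- then join back to fetch just the requested page.
-- (Caveat raised in this thread: SQL Server 2000 does not guarantee
-- that IDENTITY values are assigned in the ORDER BY order.)
CREATE TABLE #PageKeys (RowNo INT IDENTITY(1, 1), OrderID INT)
INSERT INTO #PageKeys (OrderID)
SELECT OrderID FROM Orders ORDER BY OrderDate DESC

SELECT o.*
FROM Orders o JOIN #PageKeys k ON k.OrderID = o.OrderID
WHERE k.RowNo BETWEEN 51 AND 75   -- page 3, page size 25
ORDER BY k.RowNo
DROP TABLE #PageKeys

-- Table-variable alternative: cheaper for small pages (less logging,
-- no recompiles), but it tends to lose out as the resultset grows.
DECLARE @PageKeys TABLE (RowNo INT IDENTITY(1, 1), OrderID INT)
INSERT INTO @PageKeys (OrderID)
SELECT OrderID FROM Orders ORDER BY OrderDate DESC
-- ...then page exactly as above, joining on @PageKeys instead.
```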

Frans Bouma | Lead developer LLBLGen Pro
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 14-Jun-2005 10:46:10   

Otis wrote:

Cursors on SqlServer also create a temp table, so that's not going to do you any good. What could help is using a table variable instead, though that's slower in larger resultset situations.

I didn't know that!! I thought that a FORWARD ONLY, READ ONLY cursor would have no need for a temp table...

[EDIT 2] Removed rubbish link altogether... flushed

If I were to implement a custom paging solution, as far as I can see, the only option is to make changes directly to DynamicQueryEngine.ManglePageSelectDQ(). Do you think there is a more elegant way to introduce the custom implementation?

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 14-Jun-2005 11:18:41   

Marcus wrote:

Otis wrote:

Cursors on SqlServer also create a temp table, so that's not going to do you any good. What could help is using a table variable instead, though that's slower in larger resultset situations.

I didn't know that!! I thought that a FORWARD ONLY, READ ONLY cursor would have no need for a temp table...

Hmm, I'll have to look it up indeed. I remember reading somewhere that cursors need a temp table, so when tuning your SQL you should take care that the tempdb isn't very small (otherwise a lot of file extending is going on in the background).
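As an aside, pre-sizing tempdb so that background file extending doesn't interfere is a one-liner. The logical file name `tempdev` is the SQL Server default, and the sizes below are illustrative:

```sql
-- Give tempdb room up front so autogrow (file extending) doesn't
-- kick in mid-query; 'tempdev' is the default logical data file name.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, SIZE = 512MB, FILEGROWTH = 128MB)
```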

[EDIT 2] Removed rubbish link altogether... flushed

haha I was just about to get some venting going about that erm... well... sql wink

If I were to implement a custom paging solution, as far as I can see, the only option is to make changes directly to DynamicQueryEngine.ManglePageSelectDQ(). Do you think there is a more elegant way to introduce the custom implementation?

No, that's it. But be sure you know what you're doing. Almost all 'fast' algorithms for paging on SqlServer fall apart when doing joins or with multi-field PKs. The only reliable method which ALWAYS works is the temp-table approach.

If you want to optimize paging (and it's already pretty optimized, as it never inserts rows beyond the page requested), first try these: 1) optimize the tempdb: give it enough room to breathe; 2) use proper filters.

Especially 2) is often overlooked. 'Just page through a 10,000 row resultset' is not that handy; users will never reach page 987 manually. So you can apply a limit, for example only fetch the top 300, or apply more filters so the resultset is smaller to page through.

Paging IS expensive, no matter how you look at it or what kind of tricks are pulled. At least on SqlServer that is. Other databases (except Access of course) have had proper paging logic built into their engines for ages.
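A minimal sketch of suggestion 2): cap and filter the set before paging, rather than paging the whole table. Table and column names are illustrative:

```sql
-- Only page through the top 300 rows of a properly filtered set,
-- instead of a resultset the user will never scroll to the end of.
SELECT TOP 300 OrderID, OrderDate
FROM Orders
WHERE OrderDate >= '2005-01-01'   -- a proper filter keeps the set small
ORDER BY OrderDate DESC
```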

Frans Bouma | Lead developer LLBLGen Pro
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 14-Jun-2005 11:33:33   

Otis wrote:

haha I was just about to get some venting going about that erm... well... sql wink

Yes I was afraid of getting "bashed" by you for even quoting it in the first place... wink

Otis wrote:

But be sure you know what you're doing. Almost all 'fast' algorithms for paging on SqlServer fall apart when doing joins or with multi-field PKs. The only reliable method which ALWAYS works is the temp-table approach.

Ok... I'll keep that in mind...

Otis wrote:

If you want to optimize paging (and it's already pretty optimized, as it never inserts rows beyond the page requested), first try these: 1) optimize the tempdb: give it enough room to breathe; 2) use proper filters.

Will start here as you suggest... simple_smile

Otis wrote:

Paging IS expensive, no matter how you look at it or what kind of tricks are pulled. At least on SqlServer that is. Other databases (except Access of course) have had proper paging logic built into their engines for ages.

I believe SQL Server 2005 will improve on this. BTW, did you know that "SQL Server 2000 never guarantees that IDENTITY values are assigned in the order specified in the query's ORDER BY clause"? frowning From http://www.windowsitpro.com/SQLServer/Article/ArticleID/43922/43922.html

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 14-Jun-2005 12:39:10   

Marcus wrote:

Otis wrote:

haha I was just about to get some venting going about that erm... well... sql wink

Yes I was afraid of getting "bashed" by you for even quoting it in the first place... wink

haha simple_smile nah, I'm not that rude stuck_out_tongue_winking_eye

Otis wrote:

Paging IS expensive, no matter how you look at it or what kind of tricks are pulled. At least on SqlServer that is. Other databases (except Access of course) have had proper paging logic built into their engines for ages.

I believe SQL Server 2005 will improve on this. BTW, did you know that "SQL Server 2000 never guarantees that IDENTITY values are assigned in the order specified in the query's ORDER BY clause"? frowning From http://www.windowsitpro.com/SQLServer/Article/ArticleID/43922/43922.html

ack no! frowning I'll check it right away!

SqlServer 2005 will indeed have a ROW_NUMBER() option in select... finally!

(edit): about the identity order thing: great info, but apparently there's no solution... disappointed I.o.w.: no alternative.

(edit2): haha, that person knows about ROW_NUMBER(), but he refuses to use it in a single query, so he still uses a temp table confused ... (as here: http://weblogs.asp.net/Firoz/archive/2005/06/12/411949.aspx)
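For comparison, the single-query ROW_NUMBER() form that the linked post avoids looks roughly like this on SQL Server 2005 (table and column names are illustrative):

```sql
-- Single-query paging: number the rows in a derived table,
-- then keep only the requested page; no temp table needed.
SELECT OrderID, OrderDate
FROM (
    SELECT OrderID, OrderDate,
           ROW_NUMBER() OVER (ORDER BY OrderDate DESC) AS RowNo
    FROM Orders
) AS numbered
WHERE RowNo BETWEEN 51 AND 75   -- page 3, page size 25
ORDER BY RowNo
```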

Frans Bouma | Lead developer LLBLGen Pro
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 29-Jun-2005 20:05:08   

Otis wrote:

SqlServer 2005 will indeed have a ROW_NUMBER() option in select... finally!

There's a TechEd Grok talk on this subject click here

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 30-Jun-2005 11:08:22   

Thanks for the link! simple_smile

Frans Bouma | Lead developer LLBLGen Pro
MacDennis
User
Posts: 50
Joined: 03-May-2005
# Posted on: 01-Jul-2005 15:00:54   

Any idea if the solutions discussed in this thread will be available soon? I also could use the paging and prefetch path functionality.

On a side note, I am playing around with ASP.Net beta2 in combination with LLBLGenPro. The generated code works great with gridviews which are tied to my custom objectdatasources which in turn call the LLBLGenPro DAL. Paging, sorting and even a search argument, works like a charm!

Nice product. sunglasses

Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 01-Jul-2005 16:55:36   

MacDennis wrote:

Any idea if the solutions discussed in this thread will be available soon? I also could use the paging and prefetch path functionality.

Frans has agreed to change the protection level on some of the methods required to provide paging + prefetch. Once this has been released I can release my code which extends the DataAccessAdapterBase if you would like...

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 02-Jul-2005 10:33:17   

MacDennis wrote:

Any idea if the solutions discussed in this thread will be available soon? I also could use the paging and prefetch path functionality.

As Marcus said, I'll open the routines for injection of other code, starting in 1.0.2005.1. I'll try to make it as easy as possible, so adding paging code isn't that hard.

On a side note, I am playing around with ASP.Net beta2 in combination with LLBLGenPro. The generated code works great with gridviews which are tied to my custom objectdatasources which in turn call the LLBLGenPro DAL. Paging, sorting and even a search argument, works like a charm!

Cool! simple_smile I had a hard time yesterday getting an Adapter collection working with BindingSource in WinForms at design time (the designer code is IMHO broken; I can't specify a generic collection this way at design time). You just wrote your own ObjectDataSource object and it also offers design-time support? (I haven't looked at ASP.NET 2.0.)

Nice product. sunglasses

smile

Frans Bouma | Lead developer LLBLGen Pro
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 31-Jul-2005 20:18:07   

Marcus wrote:

Otis wrote:

Marcus wrote:

Speaking of subclassing DataAccessAdapter did you have any thoughts on changing the protection level of:

private FetchPrefetchPath -> protected virtual
private MergeNormal -> protected
private MergeManyToMany -> protected

... in DataAccessAdapterBase to accommodate my custom paging + prefetch (from our other discussion at http://llblgen.com/tinyforum/Messages.aspx?ThreadID=1517&StartAtMessage=34)

MergeNormal and MergeManyToMany not virtual? I don't mind making them protected virtual simple_smile

Sure why not... simple_smile I was going with a minimalistic approach. wink The only reason they are needed is because they are called from FetchPrefetchPath and I need to be able to call back into them...

While we are on the subject of extensibility... smile My TempDB is getting hit hard due to paging creating temp tables... I haven't had a chance to experiment yet, but that article on CodeProject (the one that examined the paging options: http://www.codeproject.com/aspnet/PagingLarge.asp) suggested that a cursor might be less harsh on the system, and in his tests it seemed to be faster... Given this, would there be a future way for me to override DynamicQueryEngine.ManglePageSelectDQ() (currently static) so that different paging implementations could be chosen at runtime... wink

I've just implemented the extended version of DataAccessAdapter, having moved my custom methods out of DataAccessAdapterBase, and have come across another requirement for the new methods to work:

In addition to the protection level of the methods mentioned above, you'll need to change "internal string AliasStartEntity" in EntityRelation.cs to public, and maybe for consistency the AliasEndEntity also...

The other thing is whether you want to make tracing usable from outside the core assemblies... By overriding core methods, you will lose some of the trace functionality, and it might be an idea to make it publicly accessible so that the overriding methods can continue to issue trace commands.

Marcus

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 01-Aug-2005 09:56:02   

Marcus wrote:

Marcus wrote:

Otis wrote:

Marcus wrote:

Speaking of subclassing DataAccessAdapter did you have any thoughts on changing the protection level of:

private FetchPrefetchPath -> protected virtual
private MergeNormal -> protected
private MergeManyToMany -> protected

... in DataAccessAdapterBase to accommodate my custom paging + prefetch (from our other discussion at http://llblgen.com/tinyforum/Messages.aspx?ThreadID=1517&StartAtMessage=34)

MergeNormal and MergeManyToMany not virtual? I don't mind making them protected virtual simple_smile

Sure why not... simple_smile I was going with a minimalistic approach. wink The only reason they are needed is because they are called from FetchPrefetchPath and I need to be able to call back into them...

While we are on the subject of extensibility... smile My TempDB is getting hit hard due to paging creating temp tables... I haven't had a chance to experiment yet, but that article on CodeProject (the one that examined the paging options: http://www.codeproject.com/aspnet/PagingLarge.asp) suggested that a cursor might be less harsh on the system, and in his tests it seemed to be faster... Given this, would there be a future way for me to override DynamicQueryEngine.ManglePageSelectDQ() (currently static) so that different paging implementations could be chosen at runtime... wink

I've just implemented the extended version of DataAccessAdapter, having moved my custom methods out of DataAccessAdapterBase, and have come across another requirement for the new methods to work:

In addition to the protection level of the methods mentioned above, you'll need to change "internal string AliasStartEntity" in EntityRelation.cs to public, and maybe for consistency the AliasEndEntity also...

These are used by the join construction code; could you elaborate a bit on why these have to be public? The reason I don't open up every method is that once I do, I can't change the signature that easily, as it will always break code.

The other thing is whether you want to make tracing usable from outside the core assemblies... By overriding core methods, you will lose some of the trace functionality, and it might be an idea to make it publicly accessible so that the overriding methods can continue to issue trace commands.

Marcus

On one hand I think this is a good idea; on the other hand, it opens up the way to trace-log whatever using a switch for the runtime libs. This can degrade the switch of the runtime libs to the point of becoming unusable, because it will trigger so many lines of output. I'll look into it; at the moment I think it's good to open it up, though I have to see if I will run into problems in the future if something changes/should change (not likely).

Frans Bouma | Lead developer LLBLGen Pro
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 01-Aug-2005 10:20:24   

Otis wrote:

These are used by the join construction code; could you elaborate a bit on why these have to be public? The reason I don't open up every method is that once I do, I can't change the signature that easily, as it will always break code.

My implementation of FetchPrefetchPath (which I call FetchParameterisedPrefetchPath) is almost identical to yours, except where the filters build the predicates... later in the method the following code adds relations:

                // append extra relations and filters, if applicable.
                if (currentElement.FilterRelations != null)
                {
                    for (int j = 0; j < currentElement.FilterRelations.Count; j++)
                    {
                        EntityRelation currentRelation = (EntityRelation)currentElement.FilterRelations[j];
                        elementFilter.Relations.Add(currentRelation, currentRelation.AliasStartEntity, currentRelation.AliasEndEntity, currentRelation.HintForJoins);
                    }
                }

If you don't want to make AliasStartEntity and AliasEndEntity public, that's no problem... we can simply extract the above code into a protected method (in DataAccessAdapterBase) which I can call back into... and it becomes:

        protected static void ExtractedMethod(IPrefetchPathElement2 currentElement, IRelationPredicateBucket elementFilter)
        {
            // append extra relations and filters, if applicable.
            if (currentElement.FilterRelations != null)
            {
                for (int j = 0; j < currentElement.FilterRelations.Count; j++)
                {
                    EntityRelation currentRelation = (EntityRelation)currentElement.FilterRelations[j];
                    elementFilter.Relations.Add(currentRelation, currentRelation.AliasStartEntity, currentRelation.AliasEndEntity, currentRelation.HintForJoins);
                }
            }
        }

Marcus

Otis
LLBLGen Pro Team
Posts: 39617
Joined: 17-Aug-2003
# Posted on: 01-Aug-2005 21:27:32   

The extra method is appealing, though it also fragments the logic, which might not be a good choice (which is also one of the reasons I sometimes keep methods together instead of splitting them up into a lot of small methods).

In the prefetch path code, there are 2 parts which are candidates for replacement with your own code: 1) filter construction for the collection fetch; 2) the merge routines after the fetch has taken place.

If I expose the logic in a way that you can override these 2 parts, it plugs into the existing prefetch framework logic and requires less code, I think.

Do you see any pitfalls with that? (As you have already written part 1, it might require extra info.)

Frans Bouma | Lead developer LLBLGen Pro