Bug with Paging and Projection

Posts   
 
    
MarcoP avatar
MarcoP
User
Posts: 270
Joined: 29-Sep-2004
# Posted on: 06-Jan-2011 21:10:44   

LLBL 2.6 Final v2.0.50727

Sorry if this is a repost, but it is urgent and I just uncovered it. I am using mysql driver and when I perform a projection using paging, the LIMIT X, X is placed at the end of the query, NOT the inner select. for example:


select column1 from (select column1 from table) order by column1 LIMIT 1, 25


var query = MetaData.CapitalRevolving
                .Where(x => x.DoNotContact != DoNotContact.Yes || x.DoNotContact == null)
                .Select(x =>
                    new CapitalRevolvingRecord
                    {
                        DmConstantKey = x.DmConstantKey,
                        CompanyName = x.CompanyName,
                        FirstName = x.FirstName,
                        LastName = x.LastName,
                    }
            );

            // this happens in a base class.
            return query
                .OrderBy(...) 
                .Where(Predicate)
                .TakePage(Page, PageSize)
                .ToList();

The limit should be placed inside the inner select. Now I know why my query against large tables are taking a LONG time!

Thanks!

MTrinder
User
Posts: 1461
Joined: 08-Oct-2008
# Posted on: 06-Jan-2011 21:58:27   

Thanks, we'll take a look at this. I'll close your other post about the performance.

Matt

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39613
Joined: 17-Aug-2003
# Posted on: 06-Jan-2011 23:55:58   

Paging is always against the outer query, not against the inner query (as the paging query could require any given construct), which means you can't page in an inner query.

besides that, this shouldn't matter. The limit is for mysql an indication that it should limit the resultset to return to the client. And you can't limit an unordered list, that doesn't make any sense simple_smile So the ordering always should be in the same query as the paging, i.e. the paging should be ON the ordered set.

To work around this, group the query together so you get 1 query...

Frans Bouma | Lead developer LLBLGen Pro
MarcoP avatar
MarcoP
User
Posts: 270
Joined: 29-Sep-2004
# Posted on: 07-Jan-2011 00:06:03   

Otis wrote:

Paging is always against the outer query, not against the inner query (as the paging query could require any given construct), which means you can't page in an inner query.

besides that, this shouldn't matter. The limit is for mysql an indication that it should limit the resultset to return to the client. And you can't limit an unordered list, that doesn't make any sense simple_smile So the ordering always should be in the same query as the paging, i.e. the paging should be ON the ordered set.

To work around this, group the query together so you get 1 query...

Crap, I see what you mean. Basically I have a base class that lets derived classes build IQueryable (using a projection) and in the base class I handle the basic ordering, where, takepage, etc. So the problem is i am applying the ordering, filtering, and paging AFTER the derived classes construct the query. any thoughts on how to overcome this?

In other words, I use the PredicateBuilder to allow the UI to build their where clause. BUT, I cannot add the expression to the Where method until AFTER the select new Projection construct, thus an inner query is used. Does that make sense?

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 07-Jan-2011 06:02:25   

How your classes looks like? Could you construct the query AFTER you have the 'where' input from the GUI? coz I think that is the only way to go.

David Elizondo | LLBLGen Support Team
MarcoP avatar
MarcoP
User
Posts: 270
Joined: 29-Sep-2004
# Posted on: 07-Jan-2011 06:26:46   

daelmo wrote:

How your classes looks like? Could you construct the query AFTER you have the 'where' input from the GUI? coz I think that is the only way to go.

It's just now the UI uses the entity to build the predicate and gets returned a DTO...little confusing. That's also assuming no joining to other tables.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39613
Joined: 17-Aug-2003
# Posted on: 07-Jan-2011 11:01:03   

Do you have indices defined in the DB on the fields you sort on? It's likely the slowness comes from the fact you order on a field which doesn't have its own index, which means it's a table scan.

Frans Bouma | Lead developer LLBLGen Pro
MarcoP avatar
MarcoP
User
Posts: 270
Joined: 29-Sep-2004
# Posted on: 07-Jan-2011 14:13:40   

Otis wrote:

Do you have indices defined in the DB on the fields you sort on? It's likely the slowness comes from the fact you order on a field which doesn't have its own index, which means it's a table scan.

I do. The problem is I am writing an application (denormalized database) that contains anywhere from 500K to 8 millions records per table, thus my application is built around paging. So when the LIMIT happens on the outer query, I guess that is making it perform poorly.

Any other thoughts, ideas?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39613
Joined: 17-Aug-2003
# Posted on: 07-Jan-2011 14:30:04   

MarcoP wrote:

Otis wrote:

Do you have indices defined in the DB on the fields you sort on? It's likely the slowness comes from the fact you order on a field which doesn't have its own index, which means it's a table scan.

I do. The problem is I am writing an application (denormalized database) that contains anywhere from 500K to 8 millions records per table, thus my application is built around paging. So when the LIMIT happens on the outer query, I guess that is making it perform poorly.

Any other thoughts, ideas?

The limit can't be applied on the inner query, as that would make 1) no sense (it would order the page, not all data) and 2) it's not always possible to do this in all DQEs, and the architecture of the query producer produces 1 query per level. Paging directives are therefore moved to the outer query, as that's exactly what you want: you want to obtain the page X on the resultset to return.

What I think happens is that mysql isn't capable of sorting the resultset fast enough, which causes the slowness. After all, it has to sort 8million rows. The only way to overcome this IMHO is to drop the sorting, but then you get random rows back.

Frans Bouma | Lead developer LLBLGen Pro
MarcoP avatar
MarcoP
User
Posts: 270
Joined: 29-Sep-2004
# Posted on: 07-Jan-2011 14:45:27   

Otis wrote:

MarcoP wrote:

Otis wrote:

Do you have indices defined in the DB on the fields you sort on? It's likely the slowness comes from the fact you order on a field which doesn't have its own index, which means it's a table scan.

I do. The problem is I am writing an application (denormalized database) that contains anywhere from 500K to 8 millions records per table, thus my application is built around paging. So when the LIMIT happens on the outer query, I guess that is making it perform poorly.

Any other thoughts, ideas?

The limit can't be applied on the inner query, as that would make 1) no sense (it would order the page, not all data) and 2) it's not always possible to do this in all DQEs, and the architecture of the query producer produces 1 query per level. Paging directives are therefore moved to the outer query, as that's exactly what you want: you want to obtain the page X on the resultset to return.

What I think happens is that mysql isn't capable of sorting the resultset fast enough, which causes the slowness. After all, it has to sort 8million rows. The only way to overcome this IMHO is to drop the sorting, but then you get random rows back.

It performs great if I remove the inner select and just do the following over the table:


select column1, column2 from table order by column1 where ... LIMIT 1, 25

When I look at the execution plan in the my sql query window, the order by does take like 30% BUT the Sending Data part takes the rest. I would think that would mean the amount of data, but when i run the same query with and without the inner select, performance is night and day different.

Also, MySql does allow you to using a LIMIT on the inner select (also peform an order by here) and thus performance is great again....sig

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39613
Joined: 17-Aug-2003
# Posted on: 08-Jan-2011 16:18:40   

MarcoP wrote:

Otis wrote:

MarcoP wrote:

Otis wrote:

Do you have indices defined in the DB on the fields you sort on? It's likely the slowness comes from the fact you order on a field which doesn't have its own index, which means it's a table scan.

I do. The problem is I am writing an application (denormalized database) that contains anywhere from 500K to 8 millions records per table, thus my application is built around paging. So when the LIMIT happens on the outer query, I guess that is making it perform poorly.

Any other thoughts, ideas?

The limit can't be applied on the inner query, as that would make 1) no sense (it would order the page, not all data) and 2) it's not always possible to do this in all DQEs, and the architecture of the query producer produces 1 query per level. Paging directives are therefore moved to the outer query, as that's exactly what you want: you want to obtain the page X on the resultset to return.

What I think happens is that mysql isn't capable of sorting the resultset fast enough, which causes the slowness. After all, it has to sort 8million rows. The only way to overcome this IMHO is to drop the sorting, but then you get random rows back.

It performs great if I remove the inner select and just do the following over the table:


select column1, column2 from table order by column1 where ... LIMIT 1, 25

that suggests that the optimizer of mysql is pretty stupid. It likely does: - fetch inner resultset in tempdb table - fetch outer resultset from table in tempdb

This causes a slow query if your tempdb is small. It also isn't as efficient, as it does 2 queries instead of one.

When I look at the execution plan in the my sql query window, the order by does take like 30% BUT the Sending Data part takes the rest. I would think that would mean the amount of data, but when i run the same query with and without the inner select, performance is night and day different.

Also, MySql does allow you to using a LIMIT on the inner select (also peform an order by here) and thus performance is great again....sig

but that would not help, the orderby has to be done on the query which is limited. Otherwise you're ordering the page.

So IMHO you should write 1 method for this query, as 1 query isn't doable this way, or use our own query system.

Frans Bouma | Lead developer LLBLGen Pro