SqlServer paging with single sorted column having duplicate values

Posts   
 
    
happyfirst
User
Posts: 215
Joined: 28-Nov-2008
# Posted on: 06-Nov-2012 17:08:30   

LLBL 3.1, Sql Server 2008, Adapter

Having an interesting problem with paging on sql server 2008 when sorting on a single column that has duplicate values.

The problem appears to be that whatever sqlserver is doing behind the scenes to order the rows within that duplicate value (some hidden 2nd sort?), it is not consistently ordering the rows based on TOP, and if that TOP ends up ending in the middle of a group of rows with duplicate sorted values.

My first page of data is ending on let's say account 1001. The next page, the rows are again the same account number 1001. So the account group is broken across pages.

If I log llbl generated queries for the first and second page and run then manually, I can duplicate the problem in sql server.

For the first page, you are just issuing a top 50 query. Looks good. BUT, if I just change that top 50 to top 55, what's interesting is the last few records that all match my last account number, 1001, for that first page, those records now show up in a different order!!!

If I manipulate the query you use to fetch the 2nd page to really return the first page, those ending records are coming back in a different order than the query you normally use to fetch the first page.

So basically, I can't page when sorting on a column that has duplicate values.

I did find that TSQL does have a "WITH TIES" clause that helps to keep duplicate sorted value groups together, but I don't think this is a proper fix. If I had a page end on some "ties" group with thousands of records, that would wreck paging and really slow it down unless maybe you were to use WITH TIES on the inner clause and then still narrow it down to just my page size? I haven't experimented that far yet but then if that does work, how do I even get that WITH TIES into the clause.

Any other ideas on how to fix (besides having to always add a secondary sort now to every query)?

Walaa avatar
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 06-Nov-2012 20:06:22   

I'm not sure if I understand the problem correctly. But there might be an easy solution, if you used a secondary field in your ordering clause.

Say I need to sort Orders on CustomerId. But for the same customer, records can vary in order according to the page (as you describe it), then I might as well find another field to act as a suporting field in the Order Clause. e.g. sort on CustomerId & OrderDate.

happyfirst
User
Posts: 215
Joined: 28-Nov-2008
# Posted on: 06-Nov-2012 20:32:14   

Yeah, a secondary sort fixes the problem, but users don't think that way. I basically would have to find a way behind the scenes to make sure to always add the primary keys (however many) to the end of whatever sort clause the user has chosen. But then potentially I am slowing things down with extra sort columns? With a proper paging/sorting algorithm, should it be my responsiblity to have to add extra sort columns to get the expected results?

We did not have this problem when the app was running against oracle. Just now noticing this problem since rebuilding for sqlserver. I'm thinking Oracle was more consistent in how it returned the sub rows ordered within the duplicate sorted value. I'd think behind the scenes, if I ask for sql server to sort, then great, it will sort, but I guess I kind of expected that even with duplicate values in that sort column, sql server would still consistently return the rows in the same order regardless of whether I did top 50, top 100, or even all the rows, but it's not.

SqlServer will return the rows in a constistent order for a query that is run over and over. BUT, you use a different query style for the first page and subsequent pages. AND, your TOP N changes per page. So across the different paging queries, sql server is returning the rows in different orders (but still sorted properly by the column of choice).

I just want to make sure I'm not missing any other sql server tricks to get it to more consistently returned the sorted rows.

Or is this paging "issue" a side effect of the way you are doing your paging sql?

It would be nice not to have to add extra sort columns to get paging to work.

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 07-Nov-2012 06:34:41   

happyfirst wrote:

Or is this paging "issue" a side effect of the way you are doing your paging sql?

There is no "issue", it's the way SQL standard works where the order is not defined (thus not guaranteed) for possible duplicate rows. It's more elaborated at http://www.llblgen.com/tinyforum/Messages.aspx?ThreadID=13353

happyfirst wrote:

It would be nice not to have to add extra sort columns to get paging to work.

Just to be clear: Paging works, the results are the expected ones given the context (possible duplicate rows and the nature of undefined order on a SQLServer table). LLBLGen won't add that PKs for you, as LLBLGen Framework doesn't make assumptions for you. There are situations where you include sort clauses on related entities, so You can -of course- write your own code that add this for you (i.e. write a generic routine that receive a sort expression and add the corresponding PKs fields to that) then return the new sort expression.

David Elizondo | LLBLGen Support Team
happyfirst
User
Posts: 215
Joined: 28-Nov-2008
# Posted on: 07-Nov-2012 14:17:02   

Thanks for the other link.

I guess I am at the mercy of SqlServer not reliabily doing the same thing with the different queries you are giving it for paging. I guess I just assumed that the unimplied final sort would have been the clustered index or something else consistent.

So that's it. I must add more columns to the sort. I guess for llbl paging, whenever I have sqlserver, I must remmeber to always add the primary keys to the end of any sorts if I want to ensure the rows come back in the same order so as to not break order across pages. Just seems like a silly thing to have to remember or even do. But if it's the only fix, I would think or hope that you could have (or add) a property that I could set statically into the framework for sqlserver such that if I ever load a "page" of data, that llbl could then optionally add those primary keys to the sort for me.

Whether or not there is an issue with the llbl queries used for paging I think is up for debate. To me, it's not about right or wrong, but what is the expected behaviour and what makes life easier for the developer using the library.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 08-Nov-2012 17:58:06   

We don't force an ordering on a paging query, but in theory we should throw an exception if no ordering is specified, as by definition all SQL using databases will return a set without an ordering as unordered, so the ordering of the set is undefined. This is not only with sqlserver, but with any database.

We could sort on the pk value, but we don't know whether the column you order on already has duplicate values. If there are, we in theory could add the pk value as sorting, but if we do, it will alter results compared to how they are now, unordered. However, even if we do, we can't determine whether there are duplicate values: we don't know that.

The only person who does know is the developer who defines the query. Therefore it's key to define the ordering there: specify the ordering till there's no doubt there are duplicate values in your ordering specification so the order is deterministic.

I'm actually reluctant to auto-add the pk fields as ordering mechanism. Not only is it not always possible to do so (only on plain entity fetches this is possible, not on queries which do a projection), it's also an ordering which is perhaps also unwanted.

Frans Bouma | Lead developer LLBLGen Pro
happyfirst
User
Posts: 215
Joined: 28-Nov-2008
# Posted on: 08-Nov-2012 20:53:43   

I'm "told" this problem was not happening with the Oracle version of the app. Maybe behind the scenes oracle just does something that is more consistent? Or maybe they're just wrong and the problem was there too just nobody noticed it.

As for sqlserver, I've always assumed that if no order by is present but there is a primary key (clustered), that sqlserver uses that. But I googled it and found conflicting info. I found one post that mentioned as soon as you add TOP N, that sqlserver no longer does that.

In any event, it just seems ridiculous that now I need to think more deeply about any paging I do and whether or not I need to add all the primary keys to the end. Or just not waste time thinking about it, and just always do it. Who wants to remember or think about that issue? And if the primary keys ever change, I have to go fix up that code.

I only mentioned it because that earlier link there was your comment that you "think" ms appends the primary keys to the linq to sql queries and that llbl doesn't do that.

Otis wrote:

I'm actually reluctant to auto-add the pk fields as ordering mechanism. Not only is it not always possible to do so (only on plain entity fetches this is possible, not on queries which do a projection), it's also an ordering which is perhaps also unwanted.

I can understand with not wanting to break past apps. But I'd think you could find a way to add some global static property, maybe on the DataAccessAdapter class or something, default is off, but if I do turn it on, and llbl internally builds a paging query and there are primary keys available, that it could auto append them for us?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 09-Nov-2012 10:35:41   

happyfirst wrote:

I'm "told" this problem was not happening with the Oracle version of the app. Maybe behind the scenes oracle just does something that is more consistent? Or maybe they're just wrong and the problem was there too just nobody noticed it.

The database will stream rows to the client 1 at a time, and in general it will do that in the order in which the rows are read from the table, i.e. the sequence in which they're stored on disk, which is controlled by the clustered index. This order is altered when you apply a custom ordering through order by. Then, the set is re-ordered before streaming back to the client. It can very well be that within Oracle, it keeps the clustered index order when duplicates are found during ordering in this particular query and in sqlserver, due to different code used inside sqlserver, it might be that there, the duplicates are ordered differently. This can also be due to larger pagesizes in oracle compared to sqlserver (rows are stored on pages) and different page storage mechanisms within oracle compared to sqlserver.

So it is possible you're seeing different behavior, but it's just luck. SQL is as a standard not defining any ordering for SELECT, unless an ordering is specified. simple_smile

As for sqlserver, I've always assumed that if no order by is present but there is a primary key (clustered), that sqlserver uses that. But I googled it and found conflicting info. I found one post that mentioned as soon as you add TOP N, that sqlserver no longer does that.

It won't apply any ordering, if you see the order you want, based on clustered indexing, it's because it reads the rows that way. However it might very well be due to memory mechanisms used internally that rows are given in a different order, if there's no ordering applied.

In any event, it just seems ridiculous that now I need to think more deeply about any paging I do and whether or not I need to add all the primary keys to the end. Or just not waste time thinking about it, and just always do it. Who wants to remember or think about that issue? And if the primary keys ever change, I have to go fix up that code.

It's not ridiculous, it's how things are: SELECT has an undefined order, that's what you start with. If you want ordering, fine, define it simple_smile If the ordering isn't conclusive, you will get results which might not be what you want. To get what you want you have to formulate what you want.

I only mentioned it because that earlier link there was your comment that you "think" ms appends the primary keys to the linq to sql queries and that llbl doesn't do that.

Linq to sql does append pk fields if there's no ordering, EF throws up (IIRC, not sure). However if there's some ordering (e.g. you sort on Country, but not on City, so you get duplicates as multiple rows have the same Country), pk fields aren't added by linq to sql, as they don't know whether that will influence your order or not.

There are other issues with respect to adding fields to a set which is sorted: databases add any column which isn't in the projection but which is in the orderby clause to the resultset, then sorts the resultset simply on the column(s) specified in the orderby, and silently removes the column again. However, there's a caveat: if you have distinct specified, the hidden column(s) silently added by the database to sort on, ARE taken into account when doing distinct filtering on the client. With paging you absolutely want distinct filtering. I.o.w.: it might ruin your distinct filtering as with the PK fields (if they're not in the projection) many rows might look unique while they're not.

Otis wrote:

I'm actually reluctant to auto-add the pk fields as ordering mechanism. Not only is it not always possible to do so (only on plain entity fetches this is possible, not on queries which do a projection), it's also an ordering which is perhaps also unwanted.

I can understand with not wanting to break past apps. But I'd think you could find a way to add some global static property, maybe on the DataAccessAdapter class or something, default is off, but if I do turn it on, and llbl internally builds a paging query and there are primary keys available, that it could auto append them for us?

Which PK fields do I add then, if you specify a projection out of all the entity fields? It will also make things likely fail to produce the proper resultset, see the remark above about distinct filtering.

I'm sorry, but this is one of the areas where we can't do much unless stepping on someone's tows, no matter what we do.

Frans Bouma | Lead developer LLBLGen Pro