Need less requirements on Linq Take functionality

Posts   
 
    
Posts: 1263
Joined: 10-Mar-2006
# Posted on: 20-Aug-2010 20:29:12   

Building a linq query and then using Take(10), should just work.

There is no need for distinct rules/requirements, cannot do projection, etc. If I ask for only 10 records and that includes duplicates - then I need to deal with that, but it should only give 10 records from the database.

Currently, using a linq expression like:

var query= (from data in SomeWhere.MyData where data.x>12 select data).Take(10);

Generally works in llblgen....however if you do this:

var query= (from data in SomeWhere.MyData where data.x>12 select new {data.x, data.y}).Take(10);

It will not...however, doing:

var query= (from data in SomeWhere.MyData where data.x>12 select new {data.x, data.y}).Distinct().Take(10);

Does work.

Why are you enforcing this? If we request a Take(x) that should me you put 'Top X' in the query - nothing else should matter.

EF and Linq2Sql work regardless of the 'distinct' requirement and this causes issues. In my case, I 'think' I am pulling only the data I need - then by doing profiling I discover I am not!

I have read some of your other forum posts around this (ones that I could find - since I cannot search for 'take') wink .

I am sure you have reasons this is done - you always do - so let's discuss and consider the possibility of changing?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39888
Joined: 17-Aug-2003
# Posted on: 20-Aug-2010 21:14:03   

(removed, as it didn't make sense. See my reply below with full info)

Frans Bouma | Lead developer LLBLGen Pro
Posts: 1263
Joined: 10-Mar-2006
# Posted on: 20-Aug-2010 21:46:49   

'it will not', what does that mean?

Sorry, meant it will not server side restrict the rows.

This is because retrieving duplicate rows is not useful, you already obtained the row, why obtain it again? I know you are going to debate this, but I won't change it.

How do you know it is not useful? Always very confrontational when you insist on things at a provider level that put rules on what my data looks like. As an example, I might be doing a report showing details making up transactions - and for the data I am projecting - they are not unique. (They are obviously unique on the server). So, say I am showing date, amount and vendor in a listing. You purchased a pack of gum and got in your car and realized you forgot to buy Bob a pack of gum. You go back in and buy it. Guess what - that looks like duplicate data - but it is not.

var query = from trans in MyData.Transactions where trans.date<=DateTime.Today select new {date, vendor, amount};

query.ToList() This should return all the data - including duplicate rows.
query.Take(10) This should restrict the query to returning top 10 rows - regardless of the fact that I have two transactions that 'look alike'. query.Distinct().Take(10) <---in your provider would give only 10 rows, but would never return the 'duplicate projection' because of your requirements and now my details do not add up to my totals.

I know you are going to debate this, but I won't change it.

You know me well, but I do pick my battles. I am not the first or last one with this issue. Why not just let the provider 'do what we tell it to' and not try to impose 'your database theory' rules on it?

The fact that EF/L2S do it does NOT make it right...but if you could show me one other provider that did it your way - maybe you would have an argument. A linq query should return the same data from the backend source regardless of the provider though - and yours does not. If I showed a class of 1000 people the linq below and asked them how many records should be returned from the server - none of them would say - well, since you did a projection - all of them. None would say - well are the records distinct - or do some of them duplicate?

No, changing is not an option. 'take' is a limitation and that only works on unique rows, otherwise it's not correct.

I just checked and SQL SERVER has no such rule, nor any database I could find. You can put [top x] in front of any query and it will work - regardless of what kinds of rows you have. Again - this is simply you trying to enforce some rule on us - and I cannot figure out why? If I ask for 10 rows - pass that through to the backend. Do not worry about if I used distinct or what the server will return - that is my job.

'Take', or a limitation, can cause duplicate rows to be fetched.

Not sure why you say this. If I have query X without take and then apply take to query X it will not cause duplicate rows... Confused?

'However it's unclear if you wanted that (why would you?) or that you made a mistake. Hence the error.

There is not an error here - just the linq provider retrieving all the data from the datasource and then filtering it client side.

Look at it this way. Let's say you are retrieving 10 out of 1000 records. Your linq provider will retrieve ALL records from the database and after then get to the client, it will pick out the top 10. Why is that more correct than just letting the database pick out the top 10 records and only send 10 across?

MyIQueryable.Take(10).ToList() <--list has only 10 items in it, but query brought back all 1000 records. MyIQueryable.Take(10).Distinct().ToList() <--list has only 10 items in it and it only brought back 10 from database.

Point is - you are taking top 10 somewhere - why not at the database? Wouldn't that be much better?

sunglasses

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39888
Joined: 17-Aug-2003
# Posted on: 20-Aug-2010 22:36:42   

(removed, as it didn't make sense)

Frans Bouma | Lead developer LLBLGen Pro
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39888
Joined: 17-Aug-2003
# Posted on: 21-Aug-2010 10:41:07   

.. After a good night sleep, I thought: wait a minute.... what I said was wrong. So I wrote a test:

[Test]
public void TakeDuplicatesTest()
{
    using(DataAccessAdapter adapter = new DataAccessAdapter())
    {
        LinqMetaData metaData = new LinqMetaData(adapter);
        var q = (from o in metaData.Order
                 orderby o.EmployeeId ascending, o.CustomerId ascending
                 select new { o.EmployeeId, o.CustomerId }).Take(10);

        foreach(var v in q)
        {
            Console.WriteLine("{0}, {1}", v.EmployeeId == null ? "Null" : v.EmployeeId.ToString(), v.CustomerId ==null ? "Null" : v.CustomerId.ToString());
        }
    }
}

the ordering is added to make it more easier to have duplicate rows.

Query: SELECT TOP 10 [LPLA_1].[EmployeeID] AS [EmployeeId], [LPLA_1].[CustomerID] AS [CustomerId] FROM [Northwind].[dbo].[Orders] [LPLA_1] ORDER BY [LPLA_1].[EmployeeID] ASC,[LPLA_1].[CustomerID] ASC

output: 1, ALFKI 1, ALFKI 1, ANTON 1, AROUT 1, AROUT 1, AROUT 1, BERGS 1, BERGS 1, BERGS 1, BONAP

ORMSupportClasses buildnr: 2.6.10.817 DQE: 2.6.9.917 Linq provider: 2.6.10.809

Almost 1 year ago, on August 28th, we released this change: Fix Runtime Libraries 2.6 08282009 28-Aug-2009 Linq: BREAKING CHANGE. AllowDuplicates was always initialized as 'false' in QueryExpression, leading to distinct emits in queries which could cause invalid results. Change is located at line 74 in QueryExpression.cs. Comment out that line to rollback this change. (see changelogs system in customer area)

Perhaps you're using an older linq provider, not sure.

Anyway, I've removed my previous post, as it didn't make sense. I hope this gives more insights in how it works, and with actual proof that it does work, I hope we can all sleep well again wink

Frans Bouma | Lead developer LLBLGen Pro
Posts: 1263
Joined: 10-Mar-2006
# Posted on: 23-Aug-2010 16:15:18   

I knew you would come around smile

It does not issue the 'top' command in my scenario without the distinct() clause. Here are the versions I am running:

Sql DQE: 2.6.9.917 Linq Support: 2.6.10.707 ORM Support: 2.6.10.401

Additionally, my query does some right joins, but that should not have an impact and what I project does not contain any ntext or anything like that.

Ideas/thoughts?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39888
Joined: 17-Aug-2003
# Posted on: 23-Aug-2010 17:51:50   

WayneBrantley wrote:

I knew you would come around smile

It does not issue the 'top' command in my scenario without the distinct() clause. Here are the versions I am running:

Sql DQE: 2.6.9.917 Linq Support: 2.6.10.707 ORM Support: 2.6.10.401

Additionally, my query does some right joins, but that should not have an impact and what I project does not contain any ntext or anything like that.

Ideas/thoughts?

Could you create a repro on northwind? ( I just need the query)

Frans Bouma | Lead developer LLBLGen Pro