'it will not', what does that mean?
Sorry, I meant it will not restrict the rows server-side.
This is because retrieving duplicate rows is not useful, you already obtained the row, why obtain it again? I know you are going to debate this, but I won't change it.
How do you know it is not useful? It always feels very confrontational when you insist on rules at the provider level that dictate what my data looks like. As an example, I might be running a report showing the details that make up transactions, and in the data I am projecting the rows are not unique. (They are obviously unique on the server.) Say I am showing date, amount and vendor in a listing. You purchase a pack of gum, get in your car and realize you forgot to buy Bob a pack of gum. You go back in and buy it. Guess what - that looks like duplicate data, but it is not.
var query = from trans in MyData.Transactions
            where trans.date <= DateTime.Today
            select new { trans.date, trans.vendor, trans.amount };
query.ToList() <-- this should return all the data, including duplicate rows.
query.Take(10) <-- this should restrict the query to returning the top 10 rows, regardless of the fact that I have two transactions that 'look alike'.
query.Distinct().Take(10) <-- in your provider this would give only 10 rows, but it would never return the 'duplicate projection' because of your requirements, and now my details do not add up to my totals.
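To make the gum example concrete, here is a minimal sketch using plain LINQ to Objects over an in-memory list, not your provider. The Transaction record, the GumExample class and all the values are made up for illustration. It only shows that projecting to date, vendor and amount produces two rows that compare equal, and that Distinct() drops one of them so the details no longer sum to the total.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical transaction type, for illustration only.
public record Transaction(int Id, DateTime Date, string Vendor, decimal Amount);

public static class GumExample
{
    // Three transactions; the first two differ only by Id (unique on the "server").
    public static List<Transaction> Sample() => new()
    {
        new Transaction(1, new DateTime(2024, 1, 5), "Corner Store", 1.25m), // gum for me
        new Transaction(2, new DateTime(2024, 1, 5), "Corner Store", 1.25m), // gum for Bob
        new Transaction(3, new DateTime(2024, 1, 5), "Gas Station", 40.00m),
    };

    // Projects away the Id, optionally applies Distinct(), and reports
    // how many rows remain and what their amounts add up to.
    public static (int Rows, decimal Total) Summarize(bool distinct)
    {
        var projected = Sample().Select(t => new { t.Date, t.Vendor, t.Amount });

        // Anonymous types compare by value, so the two gum rows are "equal".
        var rows = distinct ? projected.Distinct().ToList() : projected.ToList();
        return (rows.Count, rows.Sum(r => r.Amount));
    }

    public static void Main()
    {
        var all = Summarize(distinct: false);      // 3 rows
        var deduped = Summarize(distinct: true);   // 2 rows, one gum purchase lost

        Console.WriteLine($"All rows: {all.Rows}, total: {all.Total}");
        Console.WriteLine($"Distinct: {deduped.Rows}, total: {deduped.Total}");
    }
}
```

The second gum purchase is a real transaction, so any report built on the distinct projection comes up 1.25 short.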
I know you are going to debate this, but I won't change it.
You know me well, but I do pick my battles. I am not the first or last one with this issue. Why not just let the provider 'do what we tell it to' and not try to impose 'your database theory' rules on it?
The fact that EF/L2S do it does NOT make it right... but if you could show me one other provider that did it your way, maybe you would have an argument. A LINQ query should return the same data from the backend source regardless of the provider, and yours does not. If I showed a class of 1000 people the LINQ below and asked them how many records should be returned from the server, none of them would say 'well, since you did a projection, all of them.' None would say 'well, are the records distinct, or are some of them duplicates?'
No, changing is not an option. 'take' is a limitation and that only works on unique rows, otherwise it's not correct.
I just checked and SQL Server has no such rule, nor does any database I could find. You can put [TOP x] in front of any query and it will work, regardless of what kinds of rows you have. Again, this is simply you trying to enforce some rule on us, and I cannot figure out why. If I ask for 10 rows, pass that through to the backend. Do not worry about whether I used distinct or what the server will return - that is my job.
'Take', or a limitation, can cause duplicate rows to be fetched.
Not sure why you say this. If I have query X without take and then apply take to query X it will not cause duplicate rows... Confused?
However it's unclear if you wanted that (why would you?) or that you made a mistake. Hence the error.
There is not an error here - just the LINQ provider retrieving all the data from the data source and then filtering it client-side.
Look at it this way. Let's say you are retrieving 10 out of 1000 records. Your LINQ provider will retrieve ALL records from the database and, after they get to the client, it will pick out the top 10. Why is that more correct than just letting the database pick out the top 10 records and only send 10 across?
MyIQueryable.Take(10).ToList() <--list has only 10 items in it, but query brought back all 1000 records.
MyIQueryable.Take(10).Distinct().ToList() <--list has only 10 items in it and it only brought back 10 from database.
Point is - you are taking top 10 somewhere - why not at the database? Wouldn't that be much better?
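Here is a rough sketch of the difference I mean. FakeServer is a hypothetical stand-in for the database that just counts how many rows it hands over; it is not how your provider is implemented, only a way to show the two places a limit can be applied.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a database table; counts every row it sends out.
public class FakeServer
{
    private readonly List<int> _rows = Enumerable.Range(1, 1000).ToList();

    public int RowsSent { get; private set; }

    // Sends every row; any limit has to be applied by the caller afterwards.
    public List<int> SelectAll()
    {
        RowsSent += _rows.Count;
        return _rows.ToList();
    }

    // Applies the limit before sending, like TOP n inside the SQL itself.
    public List<int> SelectTop(int n)
    {
        var page = _rows.Take(n).ToList();
        RowsSent += page.Count;
        return page;
    }
}

public static class TakeDemo
{
    // Limit applied on the client: all rows cross the wire first.
    public static (int Returned, int Transferred) ClientSideTake(int n)
    {
        var server = new FakeServer();
        var result = server.SelectAll().Take(n).ToList();
        return (result.Count, server.RowsSent);
    }

    // Limit applied on the server: only n rows cross the wire.
    public static (int Returned, int Transferred) ServerSideTake(int n)
    {
        var server = new FakeServer();
        var result = server.SelectTop(n);
        return (result.Count, server.RowsSent);
    }

    public static void Main()
    {
        Console.WriteLine(ClientSideTake(10)); // prints (10, 1000)
        Console.WriteLine(ServerSideTake(10)); // prints (10, 10)
    }
}
```

Both calls hand the caller a list of 10 items. The only difference is how many rows had to travel to get there.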