Linq to llblgen 4.2 ignores join
Joined: 18-Oct-2013
Hiya Frans,
Northwind query:
from p in Product
join od in OrderDetail on p.ProductId equals od.ProductId
select p
returns 77 rows (the exact size of Product)
while
(from p in Product
join od in OrderDetail on p.ProductId equals od.ProductId
select p)
.Count()
returns 2155, respecting the inner join.
Also, instead of "select p" in the first example,
select p.Anymember
or
select new { p }
will return the correct 2155 rows; only the top example using p ignores the join.
regards, Bart Krijnen
Please post the generated SQL queries. Are you using the latest build?
(the SQL queries are available if you enable DQE tracing: http://www.llblgen.com/documentation/4.2/LLBLGen%20Pro%20RTF/hh_goto.htm#Using%20the%20generated%20code/gencode_troubleshootingdebugging.htm#conventions )
The object fetch in the first query (the select p) does a SELECT DISTINCT, as it fetches unique rows: it won't return duplicate objects, only unique ones. To avoid duplicates it issues a DISTINCT in the query, and this leads to 77 rows (the rest are duplicates). Other ORMs, like NHibernate and EF, return all duplicate rows, which causes problems: in some cases you can't specify 'distinct', yet you want the unique objects, not the duplicates.
The count, on the other hand, isn't an entity fetch; it's a simple count on the set with all the duplicate rows (so not the product rows, but the joined rows, which is a different set).
The other queries don't do an entity fetch either, but a projection. The new { p } query is perhaps a little confusing, as it does return a set of elements which contain entities, but it is a projection query, which uses a different pipeline and doesn't optimize away duplicates (a projection requires duplicate filtering to be defined explicitly through .Distinct()).
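To make the distinction concrete, here is a small Python toy model (an illustration of the semantics, not LLBLGen internals): an inner join yields one row per matching pair, so a count sees every joined row, while an entity fetch returns unique entities, like SELECT DISTINCT does.

```python
# Toy model of join vs. distinct semantics (illustration only).
products = [{"id": i} for i in range(1, 4)]           # 3 products
order_details = [{"product_id": 1}, {"product_id": 1},
                 {"product_id": 2}]                   # 3 order detail rows

# The inner join produces one row per matching (product, order_detail) pair.
joined = [p for p in products
            for od in order_details
            if p["id"] == od["product_id"]]

# A projection / Count() sees every joined row...
row_count = len(joined)

# ...while an entity fetch returns unique entities, like SELECT DISTINCT.
unique_product_ids = {p["id"] for p in joined}
```

Scaled up to Northwind, `row_count` corresponds to the 2155 joined rows and `unique_product_ids` to the 77 distinct products.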
Hope this helps
Hi Frans, no I'm sorry, this doesn't help at all....
On the one hand I'm glad you understand why this happens, and apparently why it isn't a bug in your eyes, but on the other hand...
LINQ to Objects is a layer to easily iterate over a sequence, widely used in .NET and well understood, and it gives the same results (amount-wise) regardless of which variant I picked in the examples.
SQL itself, a well-known and well-understood layer to fetch data out of a DB, can be written "identically" to the (trivial) examples, and as expected gives the same result (amount-wise).
Other well-known object-relational mapping frameworks (you already mentioned NHibernate and Entity Framework, and I have tested LINQ to SQL as well) all handle this the same way, producing the same result (amount-wise) regardless of projection.
LLBLGEN's (current) behaviour is the odd one out here...
I am a user, a programmer who wants to have a reasonable expectation ahead of time of what will happen, based on common pattern knowledge as in the stated technologies/examples above.
The huge benefit of abstracting technical details away is for the user to be more productive, being able to expect similar behavior in similar situations.
Of course I am well aware that implementation details sometimes bleed through abstraction layers, but surely this trivial example should not reveal such a thing?
Seeing that the current implementation makes LINQ to LLBLGen behave differently from other major ORMs, LINQ to Objects, and even SQL itself, I would highly appreciate it if you would resolve this discrepancy.
With kind regards, Bart Krijnen
MisterBK wrote:
Hi Frans, no I'm sorry, this doesn't help at all....
On the one hand I'm glad you understand why this happens, and apparently why it isn't a bug in your eyes, but on the other hand...
Let's not get carried away. The last time you started a debate you simply didn't respond at the end: http://www.llblgen.com/TinyForum/GotoMessage.aspx?MessageID=129939&ThreadID=22938
LINQ to objects is a layer to easily iterate over a sequence, well used in .NET, and well understood, and will give the same results (amount wise) regardless of whatever variant I picked in the examples.
SQL itself, a well known/understood layer to fetch data out of a db, can "identically" be written as the (trivial) examples, and as expected gives the same result (amount wise)
LINQ to Objects isn't LINQ to databases. If you had written a full LINQ provider for a DB system you'd know they're not equal, simply because you can't map LINQ sequences onto SQL statements 1:1. Comparing the two is therefore pointless.
Other well known object-relational mapping frameworks, you already mentioned NHibernate and the Entity Framework and I have tested it as well for Linq to SQL, all handle this the same way, all resulting in the same result (amount wise) regardless of projection.
Yes, and that's a problem they have and we don't. No one wants a set of entities which contains duplicates, as you have to filter them out manually. You can of course add 'Distinct', but not in all cases (e.g. when the query is generated for you by a service tier or a binding source), and then it gives a lot of problems.
So 12 years ago we decided not to do that and instead always fetch unique entity instances, as that's the right thing to do: you ask for the products matching a query, you get them. A set of duplicate objects is useless: you'd alter one copy of a product entity in the collection while another copy, a duplicate in the same collection, isn't altered. That makes no sense.
LLBLGEN's (current) behaviour is the odd one out here...
And future: this has always been the case and will not change. That they do something different is their problem: there's no use case in which you would want a set of entity objects with duplicates in the same set, as the duplicate entity objects have no semantic value. And having the 'non-odd' behavior of EF/NH isn't preferable in other ways either, as I explained above: there are situations where you can't provide the Distinct call but have to in order to avoid the useless duplicates.
I am a user, a programmer who wants to be able to have a reasonable expectation ahead of time what will happen, based on common pattern knowledge as in the stated technologies/examples above.
We also do prefetch/eager loading differently than EF; should we change that to their (limited) way of fetching too? Funny thing: their 'Include' call creates a massive join. So Customer Include Order Include OrderDetails creates a join with duplicate Customer rows. But, surprise, they filter these out so you don't get duplicates. Isn't that nice of them. Yet according to you, they should add all those duplicates to the resultset, as the generated SQL query suggests that!
Every framework has different characteristics. We're not EF, not NHibernate. And they too have many differences between them.
But besides that, name one use case where you would need duplicate entity objects in the same in-memory set. You can't; there is none.
The huge benefit of abstracting technical details away is for the user to be more productive, being able to expect similar behavior in similar situations.
Of course I am well aware of implementation details that sometimes bleed through abstraction layers, but surely this trivial example should not reveal such a thing ?
What do you mean? Are you simply saying we should materialize all the rows returned from the data reader? Why? We're an ORM framework, not a dumb object materializer. The query is executed through an ORM framework, which performs its own logic on the entities before they're returned. E.g. NHibernate has a uniquing cache: if a row is fetched twice, it returns the same entity class instance which holds that row, not a new entity class instance. This means the duplicates in the returned collection are actually the same objects already in the collection. EF doesn't do this. We have authorization: an entity might not be allowed to be returned/fetched by the current user, so it's not added to the returned collection. And many other scenarios are thinkable.
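The uniquing cache described above is essentially an identity map. A minimal Python sketch (hypothetical names, not NHibernate's or LLBLGen's actual internals): rows fetched twice materialize to the same object instance, keyed by primary key.

```python
# Minimal identity-map sketch (hypothetical, not any framework's real code).
class IdentityMap:
    def __init__(self):
        self._instances = {}

    def materialize(self, row):
        # Key entities by primary key; reuse the existing instance if seen before.
        key = row["id"]
        if key not in self._instances:
            self._instances[key] = dict(row)  # stand-in for an entity object
        return self._instances[key]

imap = IdentityMap()
rows = [{"id": 1, "name": "Chai"}, {"id": 1, "name": "Chai"}]  # duplicate row
a, b = (imap.materialize(r) for r in rows)
assert a is b  # duplicates collapse to the same instance
```

With uniquing, a resultset full of duplicate rows still yields only one instance per entity, which is why the raw SQL row count tells you little about the materialized set.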
Besides, you're making assumptions about what the SQL will look like when you look at the LINQ query. Try to do that with a GroupJoin (join into) statement or a group by with multiple aggregates. Suddenly it becomes a problem. See what I mean?
Seeing that the current implementation results in LINQ to LLBLGEN behave differently from other major ORM's, LINQ to object and even SQL itself, I would highly appreciate it if you would resolve this discrepancy.
They're not the same, so why do they have to behave the same? The LINQ query says: you ask for the product entities matching the query q. You got them. The objects returned match the generated SQL perfectly (it contains a DISTINCT). The SQL query matches what you wanted: the products matching a query.
You seem to expect the framework NOT to emit a distinct in the SQL query, but that's just your expectation: there's no document which describes how LINQ expression trees should result in SQL. It's simply a matter of converting the intent expressed in the expression tree to an equivalent in the underlying query API, which then results in a SQL query. How that conversion is done is up to the framework and its developers; there's no rulebook.
There's no other way to handle these things. A good example is nullability and comparisons, which differ between C#, VB.NET and SQL; there's little else to do than what the framework developer thinks best.
So short answer: NO, this won't be changed. It's also not a problem, but actually a good thing this is the way it is: one less way to create a bug, as the entity sets fetched never contain a duplicate entity.
EF:
var ctx = new Northwind26DataContext();
var q = (from p in ctx.Products
join od in ctx.OrderDetails on p.ProductId equals od.ProductId
select p).Include(p=>p.OrderDetails).ToList();
Returns 2015 product entities. Each entity object has related OD entities; many of them are duplicates (as there are just 2015 OD entities in the DB). The query generated by the LINQ query above (can you guess what it looks like? No, no one can):
SELECT [Project1].[OrderID] AS [OrderID],
[Project1].[ProductID1] AS [ProductID],
[Project1].[ProductID] AS [ProductID1],
[Project1].[Discontinued] AS [Discontinued],
[Project1].[ProductName] AS [ProductName],
[Project1].[QuantityPerUnit] AS [QuantityPerUnit],
[Project1].[ReorderLevel] AS [ReorderLevel],
[Project1].[UnitPrice] AS [UnitPrice],
[Project1].[UnitsInStock] AS [UnitsInStock],
[Project1].[UnitsOnOrder] AS [UnitsOnOrder],
[Project1].[CategoryID] AS [CategoryID],
[Project1].[SupplierID] AS [SupplierID],
[Project1].[C1] AS [C1],
[Project1].[Discount] AS [Discount],
[Project1].[OrderID1] AS [OrderID1],
[Project1].[ProductID2] AS [ProductID2],
[Project1].[Quantity] AS [Quantity],
[Project1].[UnitPrice1] AS [UnitPrice1]
FROM (SELECT [Extent1].[ProductID] AS [ProductID],
[Extent1].[ProductName] AS [ProductName],
[Extent1].[SupplierID] AS [SupplierID],
[Extent1].[CategoryID] AS [CategoryID],
[Extent1].[QuantityPerUnit] AS [QuantityPerUnit],
[Extent1].[UnitPrice] AS [UnitPrice],
[Extent1].[UnitsInStock] AS [UnitsInStock],
[Extent1].[UnitsOnOrder] AS [UnitsOnOrder],
[Extent1].[ReorderLevel] AS [ReorderLevel],
[Extent1].[Discontinued] AS [Discontinued],
[Extent2].[OrderID] AS [OrderID],
[Extent2].[ProductID] AS [ProductID1],
[Extent3].[OrderID] AS [OrderID1],
[Extent3].[ProductID] AS [ProductID2],
[Extent3].[UnitPrice] AS [UnitPrice1],
[Extent3].[Quantity] AS [Quantity],
[Extent3].[Discount] AS [Discount],
CASE
WHEN ([Extent3].[OrderID] IS NULL) THEN CAST(NULL AS int)
ELSE 1
END AS [C1]
FROM [dbo].[Products] AS [Extent1]
INNER JOIN [dbo].[Order Details] AS [Extent2]
ON [Extent1].[ProductID] = [Extent2].[ProductID]
LEFT OUTER JOIN [dbo].[Order Details] AS [Extent3]
ON [Extent1].[ProductID] = [Extent3].[ProductID]) AS [Project1]
ORDER BY [Project1].[OrderID] ASC,
[Project1].[ProductID1] ASC,
[Project1].[ProductID] ASC,
[Project1].[C1] ASC
Executing this monster returns 66967 rows. A tremendous number of them, the vast majority, are duplicates. The materialization of the entities takes 330ms(!) on a 4-core VM on an i7 3.4GHz (that's not DB, query time or other overhead, just entity materialization). Not only that: you would want all 66967 product entities in a set, right? As that's what the SQL dictates. But you got only 2015.
It also has a big problem: As there are product duplicates in the set, which ones are the unique ones and which ones are the duplicates? Do the duplicates refer to the duplicate order details or to the unique ones (where 'unique ones' are the ones you'd want to keep, e.g. for modification, and the duplicate ones are the ones to skip)? The graph returned is completely unusable.
At the same time, our system:
var q = (from p in metaData.Product
join od in metaData.OrderDetail on p.ProductId equals od.ProductId
select p).WithPath(path=>path.Prefetch(p => p.OrderDetails)).ToList();
results in 2 queries:
SELECT DISTINCT [LPA_L1].[CategoryID] AS [CategoryId],
[LPA_L1].[Discontinued],
[LPA_L1].[ProductID] AS [ProductId],
[LPA_L1].[ProductName],
[LPA_L1].[QuantityPerUnit],
[LPA_L1].[ReorderLevel],
[LPA_L1].[SupplierID] AS [SupplierId],
[LPA_L1].[UnitPrice],
[LPA_L1].[UnitsInStock],
[LPA_L1].[UnitsOnOrder]
FROM ( [Northwind].[dbo].[Products] [LPA_L1]
INNER JOIN [Northwind].[dbo].[Order Details] [LPA_L2]
ON [LPA_L1].[ProductID] = [LPA_L2].[ProductID])
and
SELECT [Northwind].[dbo].[Order Details].[Discount],
[Northwind].[dbo].[Order Details].[OrderID] AS [OrderId],
[Northwind].[dbo].[Order Details].[ProductID] AS [ProductId],
[Northwind].[dbo].[Order Details].[Quantity],
[Northwind].[dbo].[Order Details].[UnitPrice]
FROM [Northwind].[dbo].[Order Details]
WHERE ([Northwind].[dbo].[Order Details].[ProductID] IN
(SELECT [LPA_L1].[ProductID] AS [ProductId]
FROM ( [Northwind].[dbo].[Products] [LPA_L1]
INNER JOIN [Northwind].[dbo].[Order Details] [LPA_L2]
ON [LPA_L1].[ProductID] = [LPA_L2].[ProductID])))
The first returns 74 rows, the latter 2015. Materializing the entities (with merging!) takes 19.5ms combined for the 2 commands. No duplicates for you to filter out among the product entities, and no duplicates among the OrderDetail entities related to the products.
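The two-query strategy above (distinct parents, then children filtered with an IN clause, merged in memory) can be sketched roughly as follows in Python; the data and field names are placeholders, not LLBLGen's actual merge code.

```python
# Rough sketch of the prefetch/merge strategy (illustration, not LLBLGen code).
products = [{"id": 1, "name": "Chai"}, {"id": 2, "name": "Chang"}]
order_details = [{"product_id": 1, "qty": 10},
                 {"product_id": 1, "qty": 5},
                 {"product_id": 2, "qty": 7}]

# Query 1: distinct parents (SELECT DISTINCT ... FROM Products JOIN ...).
parent_ids = {p["id"] for p in products}

# Query 2: children filtered with IN (...), each child fetched exactly once.
children = [od for od in order_details if od["product_id"] in parent_ids]

# Merge in memory: attach each child to its unique parent.
by_id = {p["id"]: dict(p, order_details=[]) for p in products}
for od in children:
    by_id[od["product_id"]]["order_details"].append(od)

graph = list(by_id.values())
```

Because parents and children are each fetched once, the resulting graph contains no duplicate entities to filter out, and the row counts of the two queries add up instead of multiplying as in a single joined fetch.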
I have no idea why you'd want EF's system over ours. Just because they do something a certain way doesn't make it 'the' way to do it; on the contrary. Let us do our job, Bart; we know what we're doing, so you can reap the benefits of that.