Please help me understand these inefficient sql queries

Posts   
 
    
Posts: 14
Joined: 12-Dec-2005
# Posted on: 15-Sep-2010 20:20:19   

So, I'm testing on LLBLGen 3.0 on the AdventureWorks database. I want to retrieve the names of the Top 100 customers. So, with AdventureWorks, that means getting the top 100 records from the Customer table, joining it with the Individual table, and then joining that with the Contact table.

This is the sql that I want:

SELECT TOP 100 contact.FirstName, contact.LastName FROM Sales.Customer cust JOIN Sales.Individual i ON i.CustomerID = cust.CustomerID JOIN Person.Contact contact ON contact.ContactID = i.ContactID

My code looks like this:


            var customers = new EntityCollection<CustomerEntity>();
            IPrefetchPath2 path = new PrefetchPath2(EntityType.CustomerEntity);
            path.Add(CustomerEntity.PrefetchPathIndividual).SubPath.Add(IndividualEntity.PrefetchPathContact);

            using (var adapter = new DataAccessAdapter())
            {
                adapter.FetchEntityCollection(customers, null, 100, null, path);
            }

Now, LLBLGen generates three queries to do this.

exec sp_executesql N'SELECT TOP(@p2) [AdventureWorks].[Sales].[Customer].[AccountNumber], [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId], [AdventureWorks].[Sales].[Customer].[CustomerType], [AdventureWorks].[Sales].[Customer].[ModifiedDate], [AdventureWorks].[Sales].[Customer].[rowguid] AS [Rowguid], [AdventureWorks].[Sales].[Customer].[TerritoryID] AS [TerritoryId] FROM [AdventureWorks].[Sales].[Customer]  ',N'@p2 bigint',@p2=100

SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId], [AdventureWorks].[Sales].[Individual].[CustomerID] AS [CustomerId], [AdventureWorks].[Sales].[Individual].[Demographics], [AdventureWorks].[Sales].[Individual].[ModifiedDate] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  ))

SELECT [AdventureWorks].[Person].[Contact].[AdditionalContactInfo], [AdventureWorks].[Person].[Contact].[ContactID] AS [ContactId], [AdventureWorks].[Person].[Contact].[EmailAddress], [AdventureWorks].[Person].[Contact].[EmailPromotion], [AdventureWorks].[Person].[Contact].[FirstName], [AdventureWorks].[Person].[Contact].[LastName], [AdventureWorks].[Person].[Contact].[MiddleName], [AdventureWorks].[Person].[Contact].[ModifiedDate], [AdventureWorks].[Person].[Contact].[NameStyle], [AdventureWorks].[Person].[Contact].[PasswordHash], [AdventureWorks].[Person].[Contact].[PasswordSalt], [AdventureWorks].[Person].[Contact].[Phone], [AdventureWorks].[Person].[Contact].[rowguid] AS [Rowguid], [AdventureWorks].[Person].[Contact].[Suffix], [AdventureWorks].[Person].[Contact].[Title] FROM [AdventureWorks].[Person].[Contact]   WHERE ( [AdventureWorks].[Person].[Contact].[ContactID] IN (SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  )))))

Now, ignoring the fact that I haven't specified that I only want FirstName and LastName in the code... the generated SQL is grossly inefficient. Its bringing back 18000+ rows for the second and third queries. What am I doing wrong?

EDIT:

Upon further experimentation, I wrote the following query with Entity Framework:

            var context = new AdventureWorksEntities();
            var q = from x in context.Customers.Include("Individual.Contact")
                                     select x;

            List<Customer> list = q.ToList();

It produced the following query, which is much more efficient:

SELECT [Extent1].[CustomerID] AS [CustomerID],
       [Extent1].[TerritoryID] AS [TerritoryID],
       [Extent1].[AccountNumber] AS [AccountNumber],
       [Extent1].[CustomerType] AS [CustomerType],
       [Extent1].[rowguid] AS [rowguid],
       [Extent1].[ModifiedDate] AS [ModifiedDate],
       [Extent3].[CustomerID] AS [CustomerID1],
       [Extent3].[ContactID] AS [ContactID],
       [Extent3].[Demographics] AS [Demographics],
       [Extent3].[ModifiedDate] AS [ModifiedDate1],
       [Extent5].[ContactID] AS [ContactID1],
       [Extent5].[NameStyle] AS [NameStyle],
       [Extent5].[Title] AS [Title],
       [Extent5].[FirstName] AS [FirstName],
       [Extent5].[MiddleName] AS [MiddleName],
       [Extent5].[LastName] AS [LastName],
       [Extent5].[Suffix] AS [Suffix],
       [Extent5].[EmailAddress] AS [EmailAddress],
       [Extent5].[EmailPromotion] AS [EmailPromotion],
       [Extent5].[Phone] AS [Phone],
       [Extent5].[PasswordHash] AS [PasswordHash],
       [Extent5].[PasswordSalt] AS [PasswordSalt],
       [Extent5].[AdditionalContactInfo] AS [AdditionalContactInfo],
       [Extent5].[rowguid] AS [rowguid1],
       [Extent5].[ModifiedDate] AS [ModifiedDate2]
FROM   [Sales].[Customer] AS [Extent1]
       LEFT OUTER JOIN [Sales].[Individual] AS [Extent2]
            ON  [Extent1].[CustomerID] = [Extent2].[CustomerID]
       LEFT OUTER JOIN [Sales].[Individual] AS [Extent3]
            ON  [Extent2].[CustomerID] = [Extent3].[CustomerID]
       LEFT OUTER JOIN [Person].[Contact] AS [Extent5]
            ON  [Extent3].[ContactID] = [Extent5].[ContactID]

One recordset is returned, versus the 3 queries that LLBLGen creates, which each return nearly 20,000 rows. How do I get LLBLGen to produce a similar query?

Ultimately, we are evaluating whether to purchase several LLBLGen 3.0 licenses. We have been longtime users of the 1.x version, but I'd like to get this fairly common scenario ironed out before I can recommend the upgrade.

Thanks! Matt

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 16-Sep-2010 07:00:10   

Hi Matt,

Your second and third queries are missing the parameters. Please enable tracing at level 4 or confirm that in you SQL Profiler.

Then, using JOIN instead of separate queries doesn't make it better. There is a lot of discussion out there about that. Maybe you would like to begin with:

David Elizondo | LLBLGen Support Team
Ren
User
Posts: 42
Joined: 01-Jul-2005
# Posted on: 16-Sep-2010 18:03:33   

Daelmo,

I am very interested in the topic as well (I work with Matt).

Why are you having Matt turn on level 4 tracing? What is wrong with the predicate he wrote above against AdventureWorks? It looks simple enough and he doesn't seem to be missing anything.

I watched as he ran the same call from EntityFramework and it was much faster (minus the top 100). In profiler it looks much more efficient as well.

Can you provide a cliff notes version of the link you posted? I am a bit skeptical right now after watching the difference in performance first hand.

Thanks Eric

Posts: 14
Joined: 12-Dec-2005
# Posted on: 16-Sep-2010 19:27:44   

Hi David,

Thanks for the response. Those queries that I pasted were directly from SQL Profiler. I ran the code again just to double check, and the queries captured by profiler match those that I pasted (i.e., there are no parameters passed to the 2nd and 3rd queries).

I ran a level 4 trace (which produced a 25 mb file). Here are the interesting tidbits:


Method Enter: DataAccessAdapterBase.FetchEntityCollection(8)
Method Enter: DataAccessAdapterBase.FetchEntityCollectionInternal(7)
Active Entity Collection Description: 
    EntityCollection: AdventureworksDAL.HelperClasses.EntityCollection`1[[AdventureworksDAL.EntityClasses.CustomerEntity, AdventureworksDAL, Version=1.0.3909.23048, Culture=neutral, PublicKeyToken=null]].    Will contain entities of type: CustomerEntity

Method Enter: CreatePagingSelectDQ
Method Enter: CreateSelectDQ
Method Enter: CreateSelectDQ
Generated Sql query: 
    Query: SELECT TOP(@p2) [AdventureWorks].[Sales].[Customer].[AccountNumber], [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId], [AdventureWorks].[Sales].[Customer].[CustomerType], [AdventureWorks].[Sales].[Customer].[ModifiedDate], [AdventureWorks].[Sales].[Customer].[rowguid] AS [Rowguid], [AdventureWorks].[Sales].[Customer].[TerritoryID] AS [TerritoryId] FROM [AdventureWorks].[Sales].[Customer]  
    Parameter: @p2 : Int64. Length: 0. Precision: 0. Scale: 0. Direction: Input. Value: 100.
Method Exit: CreateSelectDQ
Method Exit: CreatePagingSelectDQ: no paging.
Method Enter: DataAccessAdapterBase.ExecuteMultiRowRetrievalQuery
Method Enter: DataAccessAdapterBase.OpenConnection
: Connection physically opened.
Method Exit: DataAccessAdapterBase.OpenConnection
Executed Sql Query: 
    Query: SELECT TOP(@p2) [AdventureWorks].[Sales].[Customer].[AccountNumber], [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId], [AdventureWorks].[Sales].[Customer].[CustomerType], [AdventureWorks].[Sales].[Customer].[ModifiedDate], [AdventureWorks].[Sales].[Customer].[rowguid] AS [Rowguid], [AdventureWorks].[Sales].[Customer].[TerritoryID] AS [TerritoryId] FROM [AdventureWorks].[Sales].[Customer]  
    Parameter: @p2 : Int64. Length: 0. Precision: 0. Scale: 0. Direction: Input. Value: 100.

Method Enter: CollectionCore.PerformAdd
Entity to Add Description: 
    Entity: AdventureworksDAL.EntityClasses.CustomerEntity. ObjectID: cd62a7c5-c6f3-4698-91a1-f591ca27ed0e
    PrimaryKey field: CustomerId. Type: System.Int32. Value: 1
Index of added entity: 0
Method Exit: CollectionCore.PerformAdd
Method Enter: CollectionCore.PerformAdd
Entity to Add Description: 
    Entity: AdventureworksDAL.EntityClasses.CustomerEntity. ObjectID: 4bd8e0dc-0c15-439b-b460-31022d28f735
    PrimaryKey field: CustomerId. Type: System.Int32. Value: 2
Index of added entity: 1
Method Exit: CollectionCore.PerformAdd

... (repeated for all 100 customer entity objects) ...

then

Method Enter: CreateSelectDQ
Generated Sql query: 
    Query: SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  
Method Exit: CreateSelectDQ
Method Exit: CreateSubQuery
Generated Sql query: 
    Query: SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId], [AdventureWorks].[Sales].[Individual].[CustomerID] AS [CustomerId], [AdventureWorks].[Sales].[Individual].[Demographics], [AdventureWorks].[Sales].[Individual].[ModifiedDate] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  ))
Method Exit: CreateSelectDQ
Method Exit: CreatePagingSelectDQ: no paging.
Method Enter: DataAccessAdapterBase.ExecuteMultiRowRetrievalQuery
Method Enter: DataAccessAdapterBase.OpenConnection
Method Exit: DataAccessAdapterBase.OpenConnection
Executed Sql Query: 
    Query: SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId], [AdventureWorks].[Sales].[Individual].[CustomerID] AS [CustomerId], [AdventureWorks].[Sales].[Individual].[Demographics], [AdventureWorks].[Sales].[Individual].[ModifiedDate] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  ))

Method Enter: CollectionCore.PerformAdd
Entity to Add Description: 
    Entity: AdventureworksDAL.EntityClasses.IndividualEntity. ObjectID: 1d66f621-41c1-4a78-8eca-90ab598304d2
    PrimaryKey field: CustomerId. Type: System.Int32. Value: 11000
Index of added entity: 0
Method Exit: CollectionCore.PerformAdd
Method Enter: CollectionCore.PerformAdd
Entity to Add Description: 
    Entity: AdventureworksDAL.EntityClasses.IndividualEntity. ObjectID: e4c43888-6d15-42fe-8be6-fee38b577dda
    PrimaryKey field: CustomerId. Type: System.Int32. Value: 11001
Index of added entity: 1
Method Exit: CollectionCore.PerformAdd

... (repeated tens of thousands of times for 18483 retrieved Individual records) ...

then

Method Enter: CreateSelectDQ
Generated Sql query: 
    Query: SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  
Method Exit: CreateSelectDQ
Method Exit: CreateSubQuery
Generated Sql query: 
    Query: SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  )))
Method Exit: CreateSelectDQ
Method Exit: CreateSubQuery
Generated Sql query: 
    Query: SELECT [AdventureWorks].[Person].[Contact].[AdditionalContactInfo], [AdventureWorks].[Person].[Contact].[ContactID] AS [ContactId], [AdventureWorks].[Person].[Contact].[EmailAddress], [AdventureWorks].[Person].[Contact].[EmailPromotion], [AdventureWorks].[Person].[Contact].[FirstName], [AdventureWorks].[Person].[Contact].[LastName], [AdventureWorks].[Person].[Contact].[MiddleName], [AdventureWorks].[Person].[Contact].[ModifiedDate], [AdventureWorks].[Person].[Contact].[NameStyle], [AdventureWorks].[Person].[Contact].[PasswordHash], [AdventureWorks].[Person].[Contact].[PasswordSalt], [AdventureWorks].[Person].[Contact].[Phone], [AdventureWorks].[Person].[Contact].[rowguid] AS [Rowguid], [AdventureWorks].[Person].[Contact].[Suffix], [AdventureWorks].[Person].[Contact].[Title] FROM [AdventureWorks].[Person].[Contact]   WHERE ( [AdventureWorks].[Person].[Contact].[ContactID] IN (SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  )))))
Method Exit: CreateSelectDQ
Method Exit: CreatePagingSelectDQ: no paging.
Method Enter: DataAccessAdapterBase.ExecuteMultiRowRetrievalQuery
Method Enter: DataAccessAdapterBase.OpenConnection
Method Exit: DataAccessAdapterBase.OpenConnection
Executed Sql Query: 
    Query: SELECT [AdventureWorks].[Person].[Contact].[AdditionalContactInfo], [AdventureWorks].[Person].[Contact].[ContactID] AS [ContactId], [AdventureWorks].[Person].[Contact].[EmailAddress], [AdventureWorks].[Person].[Contact].[EmailPromotion], [AdventureWorks].[Person].[Contact].[FirstName], [AdventureWorks].[Person].[Contact].[LastName], [AdventureWorks].[Person].[Contact].[MiddleName], [AdventureWorks].[Person].[Contact].[ModifiedDate], [AdventureWorks].[Person].[Contact].[NameStyle], [AdventureWorks].[Person].[Contact].[PasswordHash], [AdventureWorks].[Person].[Contact].[PasswordSalt], [AdventureWorks].[Person].[Contact].[Phone], [AdventureWorks].[Person].[Contact].[rowguid] AS [Rowguid], [AdventureWorks].[Person].[Contact].[Suffix], [AdventureWorks].[Person].[Contact].[Title] FROM [AdventureWorks].[Person].[Contact]   WHERE ( [AdventureWorks].[Person].[Contact].[ContactID] IN (SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  )))))

Method Enter: CollectionCore.PerformAdd
Entity to Add Description: 
    Entity: AdventureworksDAL.EntityClasses.ContactEntity. ObjectID: 0d1ab854-5fc3-48e0-b248-f6e810832caf
    PrimaryKey field: ContactId. Type: System.Int32. Value: 705
Index of added entity: 0
Method Exit: CollectionCore.PerformAdd
Method Enter: CollectionCore.PerformAdd
Entity to Add Description: 
    Entity: AdventureworksDAL.EntityClasses.ContactEntity. ObjectID: 2cfee0dc-63b3-4eb0-ad61-82734970fb24
    PrimaryKey field: ContactId. Type: System.Int32. Value: 706
Index of added entity: 1
Method Exit: CollectionCore.PerformAdd

Again repeated for 18483 retrieved Contact records. Then the trace shows a bunch of EntityBase2.SyncFKFields for all of these superfluous objects.

For a TOP 100, I can't believe that retrieving nearly 40000 extra rows from the database and then assembling them in memory is in any way efficient. What if the related tables had millions of rows? This simple query takes around 10 seconds to complete. The Entity Framework code using joins takes a third that amount of time and uses far less bandwidth and memory.

I'm hoping there is a simple solution to this. I realize that I can set the ParameterisedPrefetchPathThreshold to 100, and then it will parameterize the 2nd and 3rd queries, but this just seems like a hack to me. What if I want the top 1000? Then the ParameterisedPrefetchPathThreshold won't work so well.

Thanks again! Matt

MTrinder
User
Posts: 1461
Joined: 08-Oct-2008
# Posted on: 16-Sep-2010 23:39:27   

Just a thought - have you tried adding an OrderBy to see if this makes any difference....

Matt

Posts: 14
Joined: 12-Dec-2005
# Posted on: 16-Sep-2010 23:50:20   

Thanks for the follow up. As I understand relations... I am able to basically construct filters on related entities. However, I actually want to retrieve those related entities as well.

For example, I am doing this:

            var customers = new EntityCollection<CustomerEntity>();
            RelationPredicateBucket contactFilter = new RelationPredicateBucket();
            contactFilter.Relations.Add(CustomerEntity.Relations.IndividualEntityUsingCustomerId);
            contactFilter.Relations.Add(IndividualEntity.Relations.ContactEntityUsingContactId);

            using (var adapter = new DataAccessAdapter())
            {
                adapter.FetchEntityCollection(customers, contactFilter, 100);
            }

This successfully gives me the top 100 customers that have a related record in the Individual and Contact tables. However, I also want to retrieve the Contact entity so that I can get at the name. Can't I just get the top 100 customers and their names without some grossly inefficient query?

Thanks again, Matt

EDIT: I'm not sure if the forum ate your previous post, or if you deleted it. So, my response doesn't really make sense anymore simple_smile

Nevertheless, I did try adding an "order by" using a SortExpression object. Still no dice. Perhaps this whole thing is just a philosophical issue, but it seems a like it would be a common problem. Is there no way to get llblgen to use joins versus multiple queries?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 17-Sep-2010 10:57:47   

I start with the startpost, the discussion got derailed quickly.

MattHornsby wrote:

So, I'm testing on LLBLGen 3.0 on the AdventureWorks database. I want to retrieve the names of the Top 100 customers. So, with AdventureWorks, that means getting the top 100 records from the Customer table, joining it with the Individual table, and then joining that with the Contact table.

This is the sql that I want:

SELECT TOP 100 contact.FirstName, contact.LastName FROM Sales.Customer cust JOIN Sales.Individual i ON i.CustomerID = cust.CustomerID JOIN Person.Contact contact ON contact.ContactID = i.ContactID

That's a projection of entity fields, not an entity fetch.

My code looks like this:


            var customers = new EntityCollection<CustomerEntity>();
            IPrefetchPath2 path = new PrefetchPath2(EntityType.CustomerEntity);
            path.Add(CustomerEntity.PrefetchPathIndividual).SubPath.Add(IndividualEntity.PrefetchPathContact);

            using (var adapter = new DataAccessAdapter())
            {
                adapter.FetchEntityCollection(customers, null, 100, null, path);
            }

Now, LLBLGen generates three queries to do this.

exec sp_executesql N'SELECT TOP(@p2) [AdventureWorks].[Sales].[Customer].[AccountNumber], [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId], [AdventureWorks].[Sales].[Customer].[CustomerType], [AdventureWorks].[Sales].[Customer].[ModifiedDate], [AdventureWorks].[Sales].[Customer].[rowguid] AS [Rowguid], [AdventureWorks].[Sales].[Customer].[TerritoryID] AS [TerritoryId] FROM [AdventureWorks].[Sales].[Customer]  ',N'@p2 bigint',@p2=100

SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId], [AdventureWorks].[Sales].[Individual].[CustomerID] AS [CustomerId], [AdventureWorks].[Sales].[Individual].[Demographics], [AdventureWorks].[Sales].[Individual].[ModifiedDate] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  ))

SELECT [AdventureWorks].[Person].[Contact].[AdditionalContactInfo], [AdventureWorks].[Person].[Contact].[ContactID] AS [ContactId], [AdventureWorks].[Person].[Contact].[EmailAddress], [AdventureWorks].[Person].[Contact].[EmailPromotion], [AdventureWorks].[Person].[Contact].[FirstName], [AdventureWorks].[Person].[Contact].[LastName], [AdventureWorks].[Person].[Contact].[MiddleName], [AdventureWorks].[Person].[Contact].[ModifiedDate], [AdventureWorks].[Person].[Contact].[NameStyle], [AdventureWorks].[Person].[Contact].[PasswordHash], [AdventureWorks].[Person].[Contact].[PasswordSalt], [AdventureWorks].[Person].[Contact].[Phone], [AdventureWorks].[Person].[Contact].[rowguid] AS [Rowguid], [AdventureWorks].[Person].[Contact].[Suffix], [AdventureWorks].[Person].[Contact].[Title] FROM [AdventureWorks].[Person].[Contact]   WHERE ( [AdventureWorks].[Person].[Contact].[ContactID] IN (SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (SELECT [AdventureWorks].[Sales].[Customer].[CustomerID] AS [CustomerId] FROM [AdventureWorks].[Sales].[Customer]  )))))

Now, ignoring the fact that I haven't specified that I only want FirstName and LastName in the code... the generated SQL is grossly inefficient. Its bringing back 18000+ rows for the second and third queries. What am I doing wrong?

You specified a prefetch path for a graph fetching. This isn't inefficient, it's the way a graph should be fetched. The point is, if you use a join to fetch all entities, you'll get 1) data duplication (so slow roundtripping) and 2) a wide resultset which might also not be consumable in an efficient way.

Btw, you should set the parameterized prefetch path threshold parameter of the adapter to the # of rows you limited the parent set to. So set it to 100 and you'll get more efficient queries. It now can't limit the child queries because it can't filter on the customerid's in the subquery (no order by)

see: http://www.llblgen.com/documentation/3.0/LLBLGen%20Pro%20RTF/hh_goto.htm#Using%20the%20generated%20code/Adapter/gencode_prefetchpaths_adapter.htm#paging

(limiting the parents is equal to paging with prefetch paths. You need the threshold to be set to the window / page size, in your case 100).

EDIT: Upon further experimentation, I wrote the following query with Entity Framework:

            var context = new AdventureWorksEntities();
            var q = from x in context.Customers.Include("Individual.Contact")
                                     select x;

            List<Customer> list = q.ToList();

It produced the following query, which is much more efficient:

SELECT [Extent1].[CustomerID] AS [CustomerID],
       [Extent1].[TerritoryID] AS [TerritoryID],
       [Extent1].[AccountNumber] AS [AccountNumber],
       [Extent1].[CustomerType] AS [CustomerType],
       [Extent1].[rowguid] AS [rowguid],
       [Extent1].[ModifiedDate] AS [ModifiedDate],
       [Extent3].[CustomerID] AS [CustomerID1],
       [Extent3].[ContactID] AS [ContactID],
       [Extent3].[Demographics] AS [Demographics],
       [Extent3].[ModifiedDate] AS [ModifiedDate1],
       [Extent5].[ContactID] AS [ContactID1],
       [Extent5].[NameStyle] AS [NameStyle],
       [Extent5].[Title] AS [Title],
       [Extent5].[FirstName] AS [FirstName],
       [Extent5].[MiddleName] AS [MiddleName],
       [Extent5].[LastName] AS [LastName],
       [Extent5].[Suffix] AS [Suffix],
       [Extent5].[EmailAddress] AS [EmailAddress],
       [Extent5].[EmailPromotion] AS [EmailPromotion],
       [Extent5].[Phone] AS [Phone],
       [Extent5].[PasswordHash] AS [PasswordHash],
       [Extent5].[PasswordSalt] AS [PasswordSalt],
       [Extent5].[AdditionalContactInfo] AS [AdditionalContactInfo],
       [Extent5].[rowguid] AS [rowguid1],
       [Extent5].[ModifiedDate] AS [ModifiedDate2]
FROM   [Sales].[Customer] AS [Extent1]
       LEFT OUTER JOIN [Sales].[Individual] AS [Extent2]
            ON  [Extent1].[CustomerID] = [Extent2].[CustomerID]
       LEFT OUTER JOIN [Sales].[Individual] AS [Extent3]
            ON  [Extent2].[CustomerID] = [Extent3].[CustomerID]
       LEFT OUTER JOIN [Person].[Contact] AS [Extent5]
            ON  [Extent3].[ContactID] = [Extent5].[ContactID]

One recordset is returned, versus the 3 queries that LLBLGen creates, which each return nearly 20,000 rows. How do I get LLBLGen to produce a similar query?

Try multiple includes or a multi-path graph or an include of a 1:n relationship. you'll quickly see this isn't the way to go.

Frans Bouma | Lead developer LLBLGen Pro
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 17-Sep-2010 11:59:35   

MattHornsby wrote:

Thanks for the follow up. As I understand relations... I am able to basically construct filters on related entities. However, I actually want to retrieve those related entities as well.

Please see the fetch of a projected set and entity fetches as two different things: entity fetches have the notion of fetching a graph, as entities can have related entities. A projected set is always flat and doesn't have 'related data'. So if you want to fetch a projection of an entity set, there's no 'related data'.

For example, I am doing this:

            var customers = new EntityCollection<CustomerEntity>();
            RelationPredicateBucket contactFilter = new RelationPredicateBucket();
            contactFilter.Relations.Add(CustomerEntity.Relations.IndividualEntityUsingCustomerId);
            contactFilter.Relations.Add(IndividualEntity.Relations.ContactEntityUsingContactId);

            using (var adapter = new DataAccessAdapter())
            {
                adapter.FetchEntityCollection(customers, contactFilter, 100);
            }

This successfully gives me the top 100 customers that have a related record in the Individual and Contact tables. However, I also want to retrieve the Contact entity so that I can get at the name. Can't I just get the top 100 customers and their names without some grossly inefficient query?

It pains me to read someone calling the queries 'grossly inefficient' as there's no other way to do them efficiently (as otherwise there have to be a gazillion code paths, one for every situation they can be used it.). simple_smile

Anyway, as I said above: if you want to fetch a projection, just define the projection (e.g. with Linq or in a dynamic list), if you want to fetch a graph with entities, it's a different type of query: parent query + prefetch path.

So in your case,you want to fetch customers and their related contacts through the individual table. This is an inheritance hierarchy btw, (Customer <- Individual), which can be created automatically in the designer, and which makes things easier to work with, as you then simply fetch 'individuals' which will make the runtime add the join with customers automatically.

To stay with your query, I'll write the query you want with the efficient tuning below.


using(DataAccessAdapter adapter = new DataAccessAdapter())
{
    adapter.ParameterisedPrefetchPathThreshold = 100;

    var metaData = new LinqMetaData(adapter);
    var q = (from c in metaData.Customer
                join i in metaData.Individual on c.CustomerId equals i.CustomerId
                join co in metaData.Contact on i.ContactId equals co.ContactId
                select c)
        .Take(100)
            .WithPath(cp=>cp.Prefetch<IndividualEntity>(c=>c.Individual)
                        .SubPath(ip=>ip.Prefetch<ContactEntity>(i=>i.Contact)));

        //...
}

This will produce the following 3 queries: customers:

Generated Sql query: 
    Query: SELECT DISTINCT TOP(@p2) [LPA_L1].[AccountNumber], [LPA_L1].[CustomerID] AS [CustomerId], [LPA_L1].[CustomerType], [LPA_L1].[ModifiedDate], [LPA_L1].[rowguid] AS [Rowguid], [LPA_L1].[TerritoryID] AS [TerritoryId] FROM (( [AdventureWorks].[Sales].[Customer] [LPA_L1]  INNER JOIN [AdventureWorks].[Sales].[Individual] [LPA_L2]  ON  [LPA_L1].[CustomerID] = [LPA_L2].[CustomerID]) INNER JOIN [AdventureWorks].[Person].[Contact] [LPA_L3]  ON  [LPA_L2].[ContactID] = [LPA_L3].[ContactID])
    Parameter: @p2 : Int64. Length: 0. Precision: 0. Scale: 0. Direction: Input. Value: 100.

Individuals: Generated Sql query:

    Query: SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId], [AdventureWorks].[Sales].[Individual].[CustomerID] AS [CustomerId], [AdventureWorks].[Sales].[Individual].[Demographics], [AdventureWorks].[Sales].[Individual].[ModifiedDate] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (@p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82, @p83, @p84, @p85, @p86, @p87, @p88, @p89, @p90, @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98, @p99, @p100))
    Parameter: @p1 : Int32. Length: 0. Precision: 10. Scale: 0. Direction: Input. Value: 11001.
    Parameter: @p2 : Int32. Length: 0. Precision: 10. Scale: 0. Direction: Input. Value: 11094.
 etc...

Contact


Generated Sql query: 
    Query: SELECT [AdventureWorks].[Person].[Contact].[AdditionalContactInfo], [AdventureWorks].[Person].[Contact].[ContactID] AS [ContactId], [AdventureWorks].[Person].[Contact].[EmailAddress], [AdventureWorks].[Person].[Contact].[EmailPromotion], [AdventureWorks].[Person].[Contact].[FirstName], [AdventureWorks].[Person].[Contact].[LastName], [AdventureWorks].[Person].[Contact].[MiddleName], [AdventureWorks].[Person].[Contact].[ModifiedDate], [AdventureWorks].[Person].[Contact].[NameStyle], [AdventureWorks].[Person].[Contact].[PasswordHash], [AdventureWorks].[Person].[Contact].[PasswordSalt], [AdventureWorks].[Person].[Contact].[Phone], [AdventureWorks].[Person].[Contact].[rowguid] AS [Rowguid], [AdventureWorks].[Person].[Contact].[Suffix], [AdventureWorks].[Person].[Contact].[Title] FROM [AdventureWorks].[Person].[Contact]   WHERE ( [AdventureWorks].[Person].[Contact].[ContactID] IN (SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (@p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82, @p83, @p84, @p85, @p86, @p87, @p88, @p89, @p90, @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98, @p99, @p100)))))
    Parameter: @p1 : Int32. Length: 0. Precision: 10. Scale: 0. Direction: Input. Value: 11001.
    Parameter: @p2 : Int32. Length: 0. Precision: 10. Scale: 0. Direction: Input. Value: 11094.
etc...

This is very efficient, as it can 1) always deal with any graph form (multi branch, skewed 1:n relationships and 2) doesn't lead to data duplication in the resultset and 3) is easy to consume, as it uses existing code paths

I can tweak this down further, as you are only interested in the FirstName and LastName of contact, not in the other fields. I can use fields exclusion, so my query then becomes in C#:

var q = (from c in metaData.Customer
            join i in metaData.Individual on c.CustomerId equals i.CustomerId
            join co in metaData.Contact on i.ContactId equals co.ContactId
            select c)
    .Take(100)
        .WithPath(cp=>cp.Prefetch<IndividualEntity>(c=>c.Individual)
                    .SubPath(
                        ip=>ip.Prefetch<ContactEntity>(i=>i.Contact)
                            .Include(x=>x.FirstName, x=>x.LastName)
                ));

and the Contact query then becomes:


Generated Sql query: 
    Query: SELECT [AdventureWorks].[Person].[Contact].[ContactID] AS [ContactId], [AdventureWorks].[Person].[Contact].[FirstName], [AdventureWorks].[Person].[Contact].[LastName] FROM [AdventureWorks].[Person].[Contact]   WHERE ( [AdventureWorks].[Person].[Contact].[ContactID] IN (SELECT [AdventureWorks].[Sales].[Individual].[ContactID] AS [ContactId] FROM [AdventureWorks].[Sales].[Individual]   WHERE ( ( [AdventureWorks].[Sales].[Individual].[CustomerID] IN (@p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, @p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, @p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, @p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, @p56, @p57, @p58, @p59, @p60, @p61, @p62, @p63, @p64, @p65, @p66, @p67, @p68, @p69, @p70, @p71, @p72, @p73, @p74, @p75, @p76, @p77, @p78, @p79, @p80, @p81, @p82, @p83, @p84, @p85, @p86, @p87, @p88, @p89, @p90, @p91, @p92, @p93, @p94, @p95, @p96, @p97, @p98, @p99, @p100)))))
    Parameter: @p1 : Int32. Length: 0. Precision: 10. Scale: 0. Direction: Input. Value: 11001.
    Parameter: @p2 : Int32. Length: 0. Precision: 10. Scale: 0. Direction: Input. Value: 11094.
    Parameter: @p3 : Int32. Length: 0. Precision: 10. Scale: 0. Direction: Input. Value: 11070.
    Parameter: @p4 : Int32. Length: 0. Precision: 10. Scale: 0. Direction: Input. Value: 11087.
... etc.

As you can see, it only includes the fields which are vital to the entity (pk, fk and the included fields). I can also do this with individual to limit the resultset returned for individual.

EDIT: I'm not sure if the forum ate your previous post, or if you deleted it. So, my response doesn't really make sense anymore simple_smile

Nevertheless, I did try adding an "order by" using a SortExpression object. Still no dice. Perhaps this whole thing is just a philosophical issue, but it seems a like it would be a common problem. Is there no way to get llblgen to use joins versus multiple queries?

Yes, you can use joins, that's no problem as I showed above. Rules of thumb: - to fetch a graph, use eager loading with prefetch paths - to fetch an individual set of entities and filter on related entities use relations/joins - you can combine those per level of the prefetch path.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 14
Joined: 12-Dec-2005
# Posted on: 17-Sep-2010 19:39:35   

Hi Franz,

Thanks for taking the time to explain things.

Also, I apologize for the "grossly inefficient" comment - I've just been struggling with what I think is a fairly common scenario and was a little cranky after a few days of tinkering. As has been my experience in the last few years working with LLBLGen, if something is not working as I expect it to, its usually because of something dumb that I'm doing.

So, on to a couple of questions/comments about what you wrote.

First, thank you for putting together the query I was looking for - that was part of the battle. The main concern I have is that even with the optimized query, it still takes more time on average than the equivalent Entity Framework query that uses joins. I ran the following two pieces of code 1000 times each:

Entity Framework:

            var context = new AdventureWorksEntities();
            IQueryable<Customer> q =
                from x in
                    context.Customers.Include("Individual.Contact")
                    .Where(x => x.Individual.Contact.FirstName == "David")
                select x;
    
            Stopwatch clock = new Stopwatch();
            clock.Start();
            for (int i = 0; i < 1000; i++)
            {
                q.ToList();
            }
            clock.Stop();

            Console.WriteLine(clock.ElapsedMilliseconds / 1000);

LLBLGen:

            using (var adapter = new DataAccessAdapter())
            {
                var metaData = new LinqMetaData(adapter);

                IQueryable<CustomerEntity> q = (from c in metaData.Customer
                                                join i in metaData.Individual on c.CustomerId equals i.CustomerId
                                                join co in metaData.Contact on i.ContactId equals co.ContactId
                                                select c)
                    .WithPath(cp => cp.Prefetch<IndividualEntity>(c => c.Individual)
                                        .SubPath(
                                            ip => ip.Prefetch<ContactEntity>(i => i.Contact)
                                        )).Where(c=>c.Individual.Contact.FirstName == "David");

                Stopwatch clock = new Stopwatch();
                clock.Start();
                for (int i = 0; i < 1000; i++)
                {
                    q.ToList();
                }
                clock.Stop();
                Console.WriteLine(clock.ElapsedMilliseconds / 1000);
            }

Results: EF: 29ms LLBLGen: 105ms

The EF query is over three times as fast on average, and the code more concise. I'm not trying to get into a religious war, because I love LLBLGen and really don't want to use Entity Framework. I'm just struggling to understand why I'm not getting comparable performance.

Am I just being really dumb about how I'm thinking about the problem? It's okay if you say yes simple_smile

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 17-Sep-2010 21:05:49   

For this particular scenario, their chosen way of doing prefetch paths pays off. However as soon as you add another include, or have a 1:n relationship, it is totally different.

Eager loading is something that can be done in a couple of ways. We chose to use 1 query per graph node, which is on average the best choice. MS chose to use joins which gives lots of problems (see the EF wish list http://data.uservoice.com/forums/72025-ado-net-entity-framework-ef-feature-suggestions for some horror stories about include wink ) in many scenarios, luckily for them, not in this particular case wink .

You can make it a little more efficient for llblgen pro when creating an inheritance hierarchy, and remove the join to contact (although it's not really that much of a problem in the performance).

So in this particular case, our eager loading system is a bit slower than theirs, but as we decided years ago already: this is really the best way to handle this, especially when you think about more complex graphs which give huge problems with joins or multiple nodes (e.g. you fetch 3 or 4 nodes in a graph).

Frans Bouma | Lead developer LLBLGen Pro
Posts: 14
Joined: 12-Dec-2005
# Posted on: 17-Sep-2010 21:27:06   

Thanks Franz, you're awesome.

I know you've probably had to rehash this more times than you'd like, but it's that personal attention that has made and kept me a loyal user.

I also learned a bunch of stuff in this thread, despite years of use, so thanks again for taking the extra time to explain things in detail!

-Matt

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 17-Sep-2010 21:28:05   

No problem, glad we could help out simple_smile

If you have any further questions, just let us know simple_smile

Frans Bouma | Lead developer LLBLGen Pro