Why is Take(100) doing top in memory, not at server?

Posts   
 
    
Gabbo
User
Posts: 56
Joined: 12-Jun-2006
# Posted on: 16-Oct-2009 17:54:02   

Details:

LLBLGen Pro 2.6 Final Builder 2.6 Final, October 9th, 2009 Adapter for .NET 3.5 ORMSupportClasses.NET20 version: 2.6.9.1005 DQE.SqlServer.NET20 version: 2.6.9.917 LinqSupportClasses.NET35 version: 2.6.9.1008 Database: SQL Server 2005

When I do a SQL Profiler trace of the following:

var query = (from slf in metaData.SiteLogFilename join sl in metaData.SiteLog on slf.SiteLogId equals sl.SiteLogId where sl.SiteProcessId == (int)DataEnum.SiteProcessId.Condition1 || sl.SiteProcessId == (int)DataEnum.SiteProcessId.Condition2 || sl.SiteProcessId == (int)DataEnum.SiteProcessId.Condition3 orderby sl.ProcessStart descending select slf).Take(100);

var results = query.ToList();

I get this result:

exec sp_executesql N' SELECT [LPA_L1].[site_log_filename_id] AS [SiteLogFilenameId], [LPA_L1].[site_log_id] AS [SiteLogId], [LPA_L1].[filename] AS [Filename] FROM ( [database_name].[dbo].[site_log_filename] [LPA_L1]
INNER JOIN [database_name].[dbo].[site_log] [LPA_L2] ON [LPA_L1].[site_log_id] = [LPA_L2].[site_log_id]) WHERE ( ( ( ( ( ( ( ( ( [LPA_L2].[site_process_id] = @SiteProcessId1) OR ( [LPA_L2].[site_process_id] = @SiteProcessId2)) OR ( [LPA_L2].[site_process_id] = @SiteProcessId3)))))))) ORDER BY [LPA_L2].[process_start] DESC',N'@SiteProcessId1 int,@SiteProcessId2 int,@SiteProcessId3 int ',@SiteProcessId1=1017,@SiteProcessId2=1001,@SiteProcessId3=1019

The total number of records on the server in this case is 192 rows, and the count of query after the ToList() is correctly 100, so it appears the reduction is happening in code. I had expected to see a TOP 100 (or similar) call on the server.

Is this correct?

Thanks in advance!

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39861
Joined: 17-Aug-2003
# Posted on: 16-Oct-2009 18:49:17   

what are the db types of the 3 fields? Is filename an NText/text field by any chance? Do you also see this happening with other queries?

Frans Bouma | Lead developer LLBLGen Pro
Gabbo
User
Posts: 56
Joined: 12-Jun-2006
# Posted on: 16-Oct-2009 19:07:38   

Otis wrote:

what are the db types of the 3 fields? Is filename an NText/text field by any chance? Do you also see this happening with other queries?

Hi Frans,

Thanks for the quick reply. I had hoped filename was inadvertently NText, but it's varchar(256). So:

site_log_filename_id = int site_log_id = int filename = varchar(256)

I double-checked in the database just in case. I'm currently not running any other queries that use top in LINQ. Up to this point I'd been using either Typed Views or Typed Lists for this kind of query. I've been writing on new queries using LINQ, and I always profile them just in case I got something wrong, which is how I noticed the missing TOP statement.

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 17-Oct-2009 07:58:05   

Could you please attach the DDL script for the involved tables to see whether we can reproduce your scenario?

David Elizondo | LLBLGen Support Team
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39861
Joined: 17-Aug-2003
# Posted on: 19-Oct-2009 09:58:10   

The reason for this is the following: You sort on a field which isn't in the select list (projection) of the query, namely a field in 'sl'. The projection is on the 'm' side of an m:1 relationship, which means the query is marked as a 'query which result could contain duplicates'. When that's the case, the DQE will try to emit DISTINCT because it's otherwise not going to be a correct paging query. DISTINCT is required for 'top' in this case, as otherwise you'll probably get 1 page of all duplicates.

DISTINCT + TOP aren't allowed if there's an ORDER BY in the query on a field which isn't in the projection. So the runtime decides to page on the client. This means that it reads pageno*pagesize rows from the datareader and then closes the connection. it doesn't mean it fetches all data to the client, only the rows which are necessary for the page.

There's no way around this (as it's a DB limitation), other than that you avoid sorting on a field in a related entity.

Frans Bouma | Lead developer LLBLGen Pro
Gabbo
User
Posts: 56
Joined: 12-Jun-2006
# Posted on: 19-Oct-2009 19:32:33   

Otis wrote:

The reason for this is the following: You sort on a field which isn't in the select list (projection) of the query, namely a field in 'sl'. The projection is on the 'm' side of an m:1 relationship, which means the query is marked as a 'query which result could contain duplicates'. When that's the case, the DQE will try to emit DISTINCT because it's otherwise not going to be a correct paging query. DISTINCT is required for 'top' in this case, as otherwise you'll probably get 1 page of all duplicates.

DISTINCT + TOP aren't allowed if there's an ORDER BY in the query on a field which isn't in the projection. So the runtime decides to page on the client. This means that it reads pageno*pagesize rows from the datareader and then closes the connection. it doesn't mean it fetches all data to the client, only the rows which are necessary for the page.

There's no way around this (as it's a DB limitation), other than that you avoid sorting on a field in a related entity.

Yes, when I changed the SORT to the primary key of the MANY table, it did emit SELECT DISTINCT TOP 100.

My usage isn't for paging, but for admin purposes (easily view the last 100 records for testing/debugging reasons). Right now these are SQL scripts that I run in Management Studio, but are a pain because they're more difficult to organize and access efficiently. I thought I could put them in the application and make an easy to use interface over them. But since some of the tables have millions of records, bringing them over to the client-side is clearly problematic.

When you say it's a DB limitation, I'm not sure I understand this. Are you saying that for the purposes of using Take() for paging that it's a limitation? I only ask because with SQL Server you can obviously take the query that's created and add a TOP 100 and at least for my use case, I get the results I want.

Thanks!

Gabbo
User
Posts: 56
Joined: 12-Jun-2006
# Posted on: 19-Oct-2009 20:05:34   

Otis wrote:

The reason for this is the following: You sort on a field which isn't in the select list (projection) of the query, namely a field in 'sl'. The projection is on the 'm' side of an m:1 relationship, which means the query is marked as a 'query which result could contain duplicates'. When that's the case, the DQE will try to emit DISTINCT because it's otherwise not going to be a correct paging query. DISTINCT is required for 'top' in this case, as otherwise you'll probably get 1 page of all duplicates.

DISTINCT + TOP aren't allowed if there's an ORDER BY in the query on a field which isn't in the projection. So the runtime decides to page on the client. This means that it reads pageno*pagesize rows from the datareader and then closes the connection. it doesn't mean it fetches all data to the client, only the rows which are necessary for the page.

There's no way around this (as it's a DB limitation), other than that you avoid sorting on a field in a related entity.

One more related question based on an observation as I was debugging this. It has to do with using this form (the LINQ is identical up to the ellipses and is based on my original example):

... select new { slf.SitLogFilenameId, slf.SiteLogId, slf.FileName }).Take(100);

VERSUS ... select slf).Take(100);

When sorting on the primary key of the many field, I now get the DISTINCT TOP 100 for the second example, but the first example just issues a SELECT. If you DIFF the results the query produced is identical, with the exception of the DISTINCT TOP 100. Is this by design?

Thanks again!

Walaa avatar
Walaa
Support Team
Posts: 14993
Joined: 21-Aug-2005
# Posted on: 20-Oct-2009 12:33:19   

... select new { slf.SitLogFilenameId, slf.SiteLogId, slf.FileName }).Take(100);

Pease post the corresponding generated SQL.

Gabbo
User
Posts: 56
Joined: 12-Jun-2006
# Posted on: 20-Oct-2009 15:38:48   

Walaa wrote:

... select new { slf.SitLogFilenameId, slf.SiteLogId, slf.FileName }).Take(100);

Pease post the corresponding generated SQL.

Hi Walaa,

exec sp_executesql N' SELECT [LPA_L1].[site_log_filename_id] AS [SiteLogFilenameId], [LPA_L1].[site_log_id] AS [SiteLogId], [LPA_L1].[filename] AS [Filename] FROM ( [DATABASE_NAME].[dbo].[site_log_filename] [LPA_L1]
INNER JOIN [DATABASE_NAME].[dbo].[site_log] [LPA_L2] ON [LPA_L1].[site_log_id] = [LPA_L2].[site_log_id]) WHERE ( ( ( ( ( ( ( ( ( [LPA_L2].[site_process_id] = @SiteProcessId1) OR ( [LPA_L2].[site_process_id] = @SiteProcessId2)) OR ( [LPA_L2].[site_process_id] = @SiteProcessId3)))))))) ORDER BY [LPA_L1].[site_log_filename_id] DESC ',N'@SiteProcessId1 int,@SiteProcessId2 int,@SiteProcessId3 int ',@SiteProcessId1=1017,@SiteProcessId2=1001,@SiteProcessId3=1019

Just replace SELECT with SELECT DISTINCT TOP 100 for the "select slf).Take(100)" version.

For the particular admin needs I have I've decided to create a stored procedure and just use the RetrievalProcedures class to pull the TOP'd datasets back.

Thanks!

Gabbo
User
Posts: 56
Joined: 12-Jun-2006
# Posted on: 20-Oct-2009 16:04:58   

Otis wrote:

The reason for this is the following: You sort on a field which isn't in the select list (projection) of the query, namely a field in 'sl'. The projection is on the 'm' side of an m:1 relationship, which means the query is marked as a 'query which result could contain duplicates'. When that's the case, the DQE will try to emit DISTINCT because it's otherwise not going to be a correct paging query. DISTINCT is required for 'top' in this case, as otherwise you'll probably get 1 page of all duplicates.

DISTINCT + TOP aren't allowed if there's an ORDER BY in the query on a field which isn't in the projection. So the runtime decides to page on the client. This means that it reads pageno*pagesize rows from the datareader and then closes the connection. it doesn't mean it fetches all data to the client, only the rows which are necessary for the page.

There's no way around this (as it's a DB limitation), other than that you avoid sorting on a field in a related entity.

A curiosity question for Otis: is the LLBL forum picture or the Twitter "Metal Guitar" picture more recent? simple_smile

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39861
Joined: 17-Aug-2003
# Posted on: 20-Oct-2009 17:01:07   

Gabbo wrote:

Otis wrote:

The reason for this is the following: You sort on a field which isn't in the select list (projection) of the query, namely a field in 'sl'. The projection is on the 'm' side of an m:1 relationship, which means the query is marked as a 'query which result could contain duplicates'. When that's the case, the DQE will try to emit DISTINCT because it's otherwise not going to be a correct paging query. DISTINCT is required for 'top' in this case, as otherwise you'll probably get 1 page of all duplicates.

DISTINCT + TOP aren't allowed if there's an ORDER BY in the query on a field which isn't in the projection. So the runtime decides to page on the client. This means that it reads pageno*pagesize rows from the datareader and then closes the connection. it doesn't mean it fetches all data to the client, only the rows which are necessary for the page.

There's no way around this (as it's a DB limitation), other than that you avoid sorting on a field in a related entity.

Yes, when I changed the SORT to the primary key of the MANY table, it did emit SELECT DISTINCT TOP 100.

My usage isn't for paging, but for admin purposes (easily view the last 100 records for testing/debugging reasons). Right now these are SQL scripts that I run in Management Studio, but are a pain because they're more difficult to organize and access efficiently. I thought I could put them in the application and make an easy to use interface over them. But since some of the tables have millions of records, bringing them over to the client-side is clearly problematic.

When you say it's a DB limitation, I'm not sure I understand this. Are you saying that for the purposes of using Take() for paging that it's a limitation? I only ask because with SQL Server you can obviously take the query that's created and add a TOP 100 and at least for my use case, I get the results I want. Thanks!

Yes, only using TOP 100 will get you 100 rows, but the chances are you will get duplicates in those 100 rows. (as you join a 1:n relation)

This means that DISTINCT is necessary to really get the 100 distinct rows, and distinct isn't allowed in that situation.

Fetching entities will filter out the duplicates, resulting in less than 100 entities in this situation, hence the paging is on the client.

Gabbo wrote:

Otis wrote:

The reason for this is the following: You sort on a field which isn't in the select list (projection) of the query, namely a field in 'sl'. The projection is on the 'm' side of an m:1 relationship, which means the query is marked as a 'query which result could contain duplicates'. When that's the case, the DQE will try to emit DISTINCT because it's otherwise not going to be a correct paging query. DISTINCT is required for 'top' in this case, as otherwise you'll probably get 1 page of all duplicates.

DISTINCT + TOP aren't allowed if there's an ORDER BY in the query on a field which isn't in the projection. So the runtime decides to page on the client. This means that it reads pageno*pagesize rows from the datareader and then closes the connection. it doesn't mean it fetches all data to the client, only the rows which are necessary for the page.

There's no way around this (as it's a DB limitation), other than that you avoid sorting on a field in a related entity.

One more related question based on an observation as I was debugging this. It has to do with using this form (the LINQ is identical up to the ellipses and is based on my original example):

... select new { slf.SitLogFilenameId, slf.SiteLogId, slf.FileName }).Take(100);

VERSUS ... select slf).Take(100);

When sorting on the primary key of the many field, I now get the DISTINCT TOP 100 for the second example, but the first example just issues a SELECT. If you DIFF the results the query produced is identical, with the exception of the DISTINCT TOP 100. Is this by design?

This is because you fetch a custom projection and by default you have to explicitly specify Distinct() if you want unique values simple_smile . So if you add .Distinct() before .Take() you should get the distinct / top combo.

A curiosity question for Otis: is the LLBL forum picture or the Twitter "Metal Guitar" picture more recent?

Forum picture: from 2002, twitter picture: 2009 (metal guitar? wink )

Frans Bouma | Lead developer LLBLGen Pro