Missing LIMIT Clause when Joining (MySQL)

MarcoP avatar
MarcoP
User
Posts: 270
Joined: 29-Sep-2004
# Posted on: 10-Jan-2011 07:47:14   

2.6 Final v2.0.50727

I am using the MySQL driver, and when I run the following query, which joins to another table, the LIMIT clause is not applied. Am I missing something? I use Take when querying a single table and it works great. Also, no text/ntext fields are being returned.

Also, the first table (UCC) contains 6 million rows while the second one contains 100K, so limiting the number of results is essential in this case.


var query = from u in MetaData.UCC
            join d in MetaData.AssignedDealers on u.Fips equals d.Fips into dealers
            from x in dealers.DefaultIfEmpty()
            where (x.Brand == SearchDealerBrand || x.Brand == null)
               || (x.ProductType == SearchProductType || x.ProductType == null)
            orderby u.Id
            select new UCCRecord
            {
                DmConstantKey = u.DmConstantKey,
                CompanyName = u.CompanyName,
                FullName = u.FullName,
                FirstName = u.FirstName,
                LastName = u.LastName,
                AddressLine1 = u.AddressLine1,
                AddressLine2 = u.AddressLine2,
                City = u.City,
                StateCode = u.StateCode,
                ZipCode = u.ZipCode,
                Phone1 = u.Phone1,
                EmailAddress = u.EmailAddress,
                AgCe = u.AgCe
            };

return query.Take(50);

Walaa avatar
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 10-Jan-2011 08:43:47   

v2.0.50727

That's the .NET framework version, not the runtime library (RTL) version/build number — please refer to the forum's guidelines thread.

When you say the "limit clause is not being applied", do you mean you can't see it in the generated SQL query, or that you are getting more records than you specified?

I'm asking because sometimes the engine resorts to limiting on the client side, i.e. it uses a datareader that only reads the number of records you specified. This happens when the engine thinks the resultset might contain duplicate records and for some reason it can't apply DISTINCT.

One reason for that can be sorting on a field that's not in the select list.

You can try adding u.Id to the select list. You can also try:

query.Distinct().Take(50);
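Applied to the query from the first post, the two suggestions could look as follows. This is only a sketch: it assumes UCCRecord has, or can be given, an Id property, which isn't shown in the original projection.

```csharp
// Sketch: include the field used in 'orderby' in the projection, then request
// distinct results before limiting. (Id on UCCRecord is an assumption here.)
var query = from u in MetaData.UCC
            join d in MetaData.AssignedDealers on u.Fips equals d.Fips into dealers
            from x in dealers.DefaultIfEmpty()
            where (x.Brand == SearchDealerBrand || x.Brand == null)
               || (x.ProductType == SearchProductType || x.ProductType == null)
            orderby u.Id
            select new UCCRecord
            {
                Id = u.Id,   // sort field now present in the select list
                DmConstantKey = u.DmConstantKey,
                CompanyName = u.CompanyName
                // ...remaining fields as before...
            };

return query.Distinct().Take(50);
```

With the sort field present in the select list, the engine no longer has to assume duplicates it can't eliminate, so Distinct().Take(50) should be emitted as a server-side DISTINCT ... LIMIT 50.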
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 10-Jan-2011 11:37:32   

Indeed, specify Distinct() to get server-side limiting if duplicates can occur. It otherwise switches to client-side limiting. Distinct is recommended, because duplicates ruin the limiting.

This is actually wrong (as in: not necessary), which we have corrected in v3.1, which goes into beta later this week.

Frans Bouma | Lead developer LLBLGen Pro
MarcoP avatar
MarcoP
User
Posts: 270
Joined: 29-Sep-2004
# Posted on: 10-Jan-2011 14:15:41   

Otis wrote:

Indeed, specify Distinct() to get server-side limiting if duplicates can occur. It otherwise switches to client-side limiting. Distinct is recommended, because duplicates ruin the limiting.

This is actually wrong (as in: not necessary), which we have corrected in v3.1, which goes into beta later this week.

Hmm... given that my tables contain 6 million and 1 million rows, how do you think the client side would perform (I am sure no duplicates come back)? Wouldn't the server-side query be very expensive with this many rows?

Worst case, is there a way I can add this to the generated SQL?

Walaa avatar
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 10-Jan-2011 15:22:03   

Did you try specifying Distinct()?

MarcoP avatar
MarcoP
User
Posts: 270
Joined: 29-Sep-2004
# Posted on: 10-Jan-2011 15:28:36   

Walaa wrote:

Did you try specifying Distinct()?

I took the generated query, added DISTINCT to it, and ran it in the query window; it ran for a very long time before I killed the connection. I am asking because the company supplying the denormalized database and tables has not added any indexes yet, and I am trying to cover my bases in case they do not get to it right away. I know many people have requested this, but WORST-case scenario, is there a way to force the DQE to apply LIMIT when joining WITHOUT distinct (since I know the rows will be distinct)?

I do not want to get burned by this.

Walaa avatar
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 10-Jan-2011 16:29:02   

Just to clarify: you are saying that DISTINCT and LIMIT were emitted, and yet the query took a long time on the server. And now you want to find a way to apply the LIMIT without applying DISTINCT.

Right?

MarcoP avatar
MarcoP
User
Posts: 270
Joined: 29-Sep-2004
# Posted on: 10-Jan-2011 17:04:48   

Walaa wrote:

Just to clarify: you are saying that DISTINCT and LIMIT were emitted, and yet the query took a long time on the server. And now you want to find a way to apply the LIMIT without applying DISTINCT.

Right?

Correct. DISTINCT is causing it to run very slowly, and I know duplicates will not be returned, so getting the LIMIT emitted would be ideal.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 10-Jan-2011 18:05:33   

MarcoP wrote:

Otis wrote:

Indeed, specify Distinct() to get server-side limiting if duplicates can occur. It otherwise switches to client-side limiting. Distinct is recommended, because duplicates ruin the limiting.

This is actually wrong (as in: not necessary), which we have corrected in v3.1, which goes into beta later this week.

Hmm... given that my tables contain 6 million and 1 million rows, how do you think the client side would perform (I am sure no duplicates come back)? Wouldn't the server-side query be very expensive with this many rows?

Worst case, is there a way I can add this to the generated SQL?

Client side is simple: it reads n rows from the datareader and then closes it. This is equivalent to SELECT ... FROM ... LIMIT n, with the difference that the server will 'load into memory' more rows for the client-side version.

That's also why we made the change. Distinct is recommended to get suitable results (fetching 10 rows which are all duplicates makes no sense).

If DISTINCT is slow in your RDBMS: in v3.0 we added a setting to always do client-side distinct filtering: DynamicQueryEngineBase.DistinctFilteringPreferenceDefault = DistinctFilteringPreferenceType.AlwaysClientSide;

(The default is 'System', which means distinct filtering is done on the server, unless that's impossible to do there — in which case it's done on the client, as it would otherwise generate an error.)

This simply means that distinct filtering is done on the client: per-row values are hashed, and if a duplicate row is detected, it's thrown out. This happens per row on the datareader, so it can make things perform better, as it doesn't have to go through the whole resultset first.
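The per-row hashing described here can be sketched in plain C#. This is illustrative only — it is not the actual engine code, and reducing a row to a string key is a simplification of however the engine really hashes row values:

```csharp
using System;
using System.Collections.Generic;

static class ClientSideDistinct
{
    // Illustrative sketch of client-side distinct filtering combined with
    // client-side limiting: each row is reduced to a key, duplicate rows are
    // skipped, and reading stops once 'limit' unique rows have been produced,
    // so the full resultset is never materialized on the client.
    public static IEnumerable<object[]> DistinctLimit(IEnumerable<object[]> rows, int limit)
    {
        var seen = new HashSet<string>();
        foreach (var row in rows)
        {
            // Build a key from the row's values (a real engine hashes more efficiently).
            var parts = new string[row.Length];
            for (int i = 0; i < row.Length; i++)
            {
                parts[i] = row[i] == null ? "<null>" : row[i].ToString();
            }
            var key = string.Join("|", parts);

            if (seen.Add(key))
            {
                yield return row;
                if (seen.Count == limit)
                {
                    yield break; // the equivalent of closing the datareader early
                }
            }
        }
    }
}
```

Because the enumeration stops after the limit is reached, only as many rows as needed are pulled off the datareader, which is why this is comparable to a server-side LIMIT, minus the extra rows the server prepares.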

After posting I saw you are still on v2.6. It's doable but not ideal: you could modify the MySQL DQE a bit to always switch to client-side distinct filtering (by setting the flag on the IQuery object; see the sourcecode).

Frans Bouma | Lead developer LLBLGen Pro
MarcoP avatar
MarcoP
User
Posts: 270
Joined: 29-Sep-2004
# Posted on: 10-Jan-2011 18:21:20   

Otis wrote:

MarcoP wrote:

Otis wrote:

Indeed, specify Distinct() to get server-side limiting if duplicates can occur. It otherwise switches to client-side limiting. Distinct is recommended, because duplicates ruin the limiting.

This is actually wrong (as in: not necessary), which we have corrected in v3.1, which goes into beta later this week.

Hmm... given that my tables contain 6 million and 1 million rows, how do you think the client side would perform (I am sure no duplicates come back)? Wouldn't the server-side query be very expensive with this many rows?

Worst case, is there a way I can add this to the generated SQL?

Client side is simple: it reads n rows from the datareader and then closes it. This is equivalent to SELECT ... FROM ... LIMIT n, with the difference that the server will 'load into memory' more rows for the client-side version.

That's also why we made the change. Distinct is recommended to get suitable results (fetching 10 rows which are all duplicates makes no sense).

If DISTINCT is slow in your RDBMS: in v3.0 we added a setting to always do client-side distinct filtering: DynamicQueryEngineBase.DistinctFilteringPreferenceDefault = DistinctFilteringPreferenceType.AlwaysClientSide;

(The default is 'System', which means distinct filtering is done on the server, unless that's impossible to do there — in which case it's done on the client, as it would otherwise generate an error.)

This simply means that distinct filtering is done on the client: per-row values are hashed, and if a duplicate row is detected, it's thrown out. This happens per row on the datareader, so it can make things perform better, as it doesn't have to go through the whole resultset first.

After posting I saw you are still on v2.6. It's doable but not ideal: you could modify the MySQL DQE a bit to always switch to client-side distinct filtering (by setting the flag on the IQuery object; see the sourcecode).

OK, so just to clarify: right now, using v2.6, if I do NOT specify Distinct in my query, the client-side version will be used and duplicates COULD appear in the results (which they won't, since I know our data). And when you say "with the difference that the server will 'load into memory' more rows for the client-side version", does that mean potentially all 6 million rows could be loaded into server memory?

But from what you are saying, there is no way for me to get the LIMIT clause added to the query (Server-Side) when joining without DISTINCT?

Walaa avatar
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 11-Jan-2011 11:34:19   

But from what you are saying, there is no way for me to get the LIMIT clause added to the query (Server-Side) when joining without DISTINCT?

You can derive from the DataAccessAdapter and override CreateSelectDQ(); in the override, call the base method, then apply your changes to the query before returning it.
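A sketch of that override might look like the following. Note the caveats: the exact CreateSelectDQ signature varies per runtime build, so the parameter list below is an assumption to be matched against your generated DataAccessAdapter and the runtime reference; the Command/CommandText access on IRetrievalQuery and the naive Contains check are likewise assumptions for illustration.

```csharp
// Hedged sketch only — verify the CreateSelectDQ signature against your
// generated DataAccessAdapter before using; parameter names are assumptions.
public class LimitingDataAccessAdapter : DataAccessAdapter
{
    protected override IRetrievalQuery CreateSelectDQ(
        IEntityFields2 fieldsToFetch,
        IFieldPersistenceInfo[] persistenceInfoObjects,
        IPredicateExpression filter,
        long maxNumberOfItemsToReturn,
        ISortExpression sortClauses,
        IRelationCollection relationsToWalk,
        bool allowDuplicates,
        IGroupByCollection groupByClause,
        int pageNumber,
        int pageSize)
    {
        // Let the engine build the query as usual first.
        IRetrievalQuery query = base.CreateSelectDQ(
            fieldsToFetch, persistenceInfoObjects, filter,
            maxNumberOfItemsToReturn, sortClauses, relationsToWalk,
            allowDuplicates, groupByClause, pageNumber, pageSize);

        // If the engine fell back to client-side limiting (no LIMIT emitted),
        // force one anyway — safe only because we know the rows are distinct.
        if (maxNumberOfItemsToReturn > 0 &&
            !query.Command.CommandText.Contains("LIMIT"))
        {
            query.Command.CommandText += " LIMIT " + maxNumberOfItemsToReturn;
        }
        return query;
    }
}
```

Appending the LIMIT after the base call keeps all of the engine's normal SQL generation intact and only patches the one behavior in question.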

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 11-Jan-2011 12:55:50   

That's a possibility indeed. You can also modify the MySQL DQE code: at line 600 in the MySQL DQE's DynamicQueryEngine, change (maxNumberOfItemsToReturn == 1) into allowDuplicates.

Frans Bouma | Lead developer LLBLGen Pro