Feature Enhacement - UNION

Posts   
 
    
bertcord avatar
bertcord
User
Posts: 206
Joined: 01-Dec-2003
# Posted on: 05-Feb-2004 17:08:08   

In my environment we have many instances were we create new tables for reach day. Now before any one flames me and tells me the design sucks I can assure you this was required. Some tables we use can are populated with anywhere from 10million to 100 million records a day. When the new daily tables are created views are created that union a range of days. For example we might have weekly view that UNION the last 7 days. I would like to do the same thing in code

Dim adapter As New DataAccessAdapter() Dim LogTableResults As New EntityCollection(New LogTableUnionEntityFactory()"tblLogTracking20040123,tblLogTracking20040124") adapter.FetchEntityCollection(LogTableResults, Nothing)

but now I think about this .... it would not be good, confused as Entity classes can be used for updates and deletes. A union is read only. Maybe this type of functionality with typedlists.

An overload to the FetchTypedView?

Dim LogTable As New LogTableTypedView() Dim adapter As New DataAccessAdapter() adapter.FetchTypedView(LogTable.GetFieldsInfo(), LogTable,"tblLogTracking20040123,tblLogTracking20040124")))

Or a new method of the adapterobject

Dim LogTable As New LogTableTypedView() Dim adapter As New DataAccessAdapter() adapter.FetchUnionTypedView(LogTable.GetFieldsInfo(), LogTable,"tblLogTracking20040123,tblLogTracking20040124")))

Any input or suggestions would be great

Thanks Bert

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 05-Feb-2004 17:36:36   

If there are views generated, can you use them to fetch the unioned data?

(I have to make a change to the DataAccessAdapter first to make this more possible, to allow you to modify persistence information on the fly I guess. This change is planned)

You can always fetch multiple entities of different tables into a single entitycollection, if you want. This is not that efficient if you want to fetch large amounts of data though.

Frans Bouma | Lead developer LLBLGen Pro
takb
User
Posts: 150
Joined: 12-Mar-2004
# Posted on: 15-Oct-2004 03:07:18   

Any plans on implementing UNION capabilities in typed lists?

We have a few queries that we would normally be implementing using typed lists, however the queries contain UNION statements and so we can't use a typed list. We of course can implement them as a view (which is what we are doing now) but we were limiting our usage of views prefering to stick with typed lists for consistency.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 15-Oct-2004 10:00:13   

You can execute 2 times a fill on a typed list, doing the same. Of course, it's 2 times a query. What is not done then is DISTINCT with union.

There are no plans for a union query at the moment, unless you can give me an example which requires the UNION statement otherwise it will fail. simple_smile

Frans Bouma | Lead developer LLBLGen Pro
Posts: 112
Joined: 09-Aug-2004
# Posted on: 19-Oct-2004 14:57:13   

When binding to a drop down list we like to include a blank element at the top. The easiest way to do this is to just Union the query with 0's and empy strings.

What solution would you recommend which avoids manually putting the element together for each object?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Oct-2004 15:10:26   

Hmm, if you bind typedlists, you can create a generic routine which adds a datarow which is empty, then you fill the typedlist.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 112
Joined: 09-Aug-2004
# Posted on: 19-Oct-2004 15:16:06   

But then you have to generate a typed list for every drop down. In the current project I am working on we have about 6 or so tables which are comprised of an id and a string to describe what the id represents. I extended the classes to add a blank row so its not such a big issue for this project, but for bigger projects it could get tedious. Hmm, Maybe modifing the templates would be appropriate

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Oct-2004 15:46:53   

You could: - add an empty entity to the new collection - set the collection's property SuppressClearInGetMulti to true - call getmulti()

For adapter, you don't have to do anything, collections aren't cleared before a fetch.

Frans Bouma | Lead developer LLBLGen Pro
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 21-May-2005 16:29:52   

Otis wrote:

You can execute 2 times a fill on a typed list, doing the same. Of course, it's 2 times a query. What is not done then is DISTINCT with union. There are no plans for a union query at the moment, unless you can give me an example which requires the UNION statement otherwise it will fail. simple_smile

I'm on a big performance enhancement drive at the moment and need to eliminate all large table or index scans from our system. I have just discovered the ability to use a UNION in place of an OR predicate to eliminate the scan. Obviously this would be nice to have in LLBLGen without having to make 2 queries and hence 2 trips to the DB...

I tell you it would be even nicer if LLBLGen spotted the OR predicate and transformed it into a UNION on your behalf like a pre-optimizer wink Well you can only wish... simple_smile

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 22-May-2005 11:28:05   

Marcus wrote:

Otis wrote:

You can execute 2 times a fill on a typed list, doing the same. Of course, it's 2 times a query. What is not done then is DISTINCT with union. There are no plans for a union query at the moment, unless you can give me an example which requires the UNION statement otherwise it will fail. simple_smile

I'm on a big performance enhancement drive at the moment and need to eliminate all large table or index scans from our system. I have just discovered the ability to use a UNION in place of an OR predicate to eliminate the scan. Obviously this would be nice to have in LLBLGen without having to make 2 queries and hence 2 trips to the DB...

Index scans aren't slow, they HAVE to be done at some point to achieve something wink .

OR queries are always slow. As soon as you see this in your query: WHERE ... OR ...

you know you have a slow query.

I tell you it would be even nicer if LLBLGen spotted the OR predicate and transformed it into a UNION on your behalf like a pre-optimizer wink Well you can only wish... simple_smile

UNIONs are hard to generate. The problem is that the two queries have to be the same, select list wise, (in types). This is a problem with the more advanced usage of UNION, not the two times the same query exection wink .

A union comes down to the fact of executing the same query twice and storing the data in the collection you're already working with, I left it out. That setup isn't slow, as with adapter for example you can keep open the connection (and otherwise it will take <20ms) and the query is already processed on the rdbms thus the execution plan can be re-used.

Now, some tests. Q1:


SELECT * FROM Customers where country = 'France'
UNION 
SELECT * FROM Customers where country = 'Mexico'

Q2:


SELECT * FROM Customers 
where country = 'France'
    OR
    country = 'Mexico'

Q1 subtree costs is 8. Q2 subtree costs is 3. I think that's significant.

(IN query in Q2 doesn't matter).

I think that UNION isn't much faster, on the contrary.

Frans Bouma | Lead developer LLBLGen Pro
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 23-May-2005 10:22:54   

Otis wrote:

UNIONs are hard to generate. The problem is that the two queries have to be the same, select list wise, (in types). This is a problem with the more advanced usage of UNION, not the two times the same query exection wink .

I thought UNIONs always had to have the same number of columns, the first SELECT in the UNION determines the number of columns and their type, the other selects have to match this criteria (or cast results to the appropriate type) which is why I thought it wouldn't present a difficulty in generation... But then there's always something in the detail that makes it more difficult simple_smile

Otis wrote:

A union comes down to the fact of executing the same query twice and storing the data in the collection you're already working with, I left it out. That setup isn't slow, as with adapter for example you can keep open the connection (and otherwise it will take <20ms) and the query is already processed on the rdbms thus the execution plan can be re-used.

Is that figure of <20ms the time taken to fetch a connection from the pool? What about network latency if the DB is on another box? I've always been curious about your insitance that making multiple calls (over an open connection) to the DB is not inefficient. I don't have any data to back up anything to the contrary, but it just seems unlikely. I'd be very interested if anyone has any figures on this.

Otis wrote:

Now, some tests. ... Q1 subtree costs is 8. Q2 subtree costs is 3. I think that's significant.

Ohhhh, I didn't mean that UNIONs are faster than ORs in all cases. It seems that under certain circumstances when using ORs (which I have failed to reproduced for you in Northwind disappointed ) the SQL Server query optimiser ignores column indexes and favors index scans.

In my search for performance bottlenecks the following query which has a subtree cost of 1.21 (97% relative to batch) and uses Index Scans

SELECT ...
FROM ( [Permission] LEFT JOIN [FolderItem] 
ON  [Permission].[ResourceUID]=[FolderItem].[FolderUID]) 
WHERE ( ( [Permission].[ResourceUID] = @ResourceUID1 
Or [FolderItem].[ItemUID] = @ItemUID2) 
And [Permission].[GroupUID] 
IN (@GroupUID3, @GroupUID4, @GroupUID5, @GroupUID6) 
And [Permission].[SkinAccountUID] = @SkinAccountUID7)

can be replaced with the following query which has a subtree cost of 0.03 (3% relative to batch) and uses Index Seeks

SELECT ...
FROM ( [Permission] LEFT JOIN [FolderItem] 
ON  [Permission].[ResourceUID]=[FolderItem].[FolderUID]) 
WHERE ( Permission].[ResourceUID] = @ResourceUID1) 
And [Permission].[GroupUID] 
IN (@GroupUID3, @GroupUID4, @GroupUID5, @GroupUID6) 
And [Permission].[SkinAccountUID] = @SkinAccountUID7)

UNION ALL

SELECT ...
FROM ( [Permission] LEFT JOIN [FolderItem] 
ON  [Permission].[ResourceUID]=[FolderItem].[FolderUID]) 
WHERE ( [FolderItem].[ItemUID] = @ItemUID2
And [Permission].[GroupUID] 
IN (@GroupUID3, @GroupUID4, @GroupUID5, @GroupUID6) 
And [Permission].[SkinAccountUID] = @SkinAccountUID7)

The trick is discussed http://www.windowsitpro.com/Article/ArticleID/37531/37531.html and http://www.sql-server-performance.com/transact_sql.asp

Personally I don't understand why the optimizer doesn't spot this optimisation itself... I guess its not perfect.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 23-May-2005 20:45:10   

Marcus wrote:

Otis wrote:

UNIONs are hard to generate. The problem is that the two queries have to be the same, select list wise, (in types). This is a problem with the more advanced usage of UNION, not the two times the same query exection wink .

I thought UNIONs always had to have the same number of columns, the first SELECT in the UNION determines the number of columns and their type, the other selects have to match this criteria (or cast results to the appropriate type) which is why I thought it wouldn't present a difficulty in generation... But then there's always something in the detail that makes it more difficult simple_smile

Heh, ok that's of course not that hard. The union tricks I meant was that you add more columns which are NULL in the first set and filled in in the second set. simple_smile

Otis wrote:

A union comes down to the fact of executing the same query twice and storing the data in the collection you're already working with, I left it out. That setup isn't slow, as with adapter for example you can keep open the connection (and otherwise it will take <20ms) and the query is already processed on the rdbms thus the execution plan can be re-used.

Is that figure of <20ms the time taken to fetch a connection from the pool?

Yes.

What about network latency if the DB is on another box? I've always been curious about your insitance that making multiple calls (over an open connection) to the DB is not inefficient. I don't have any data to back up anything to the contrary, but it just seems unlikely. I'd be very interested if anyone has any figures on this.

This is an open connection, the one you get from the pool. I wonder, why is that much time so important? I won't believe it would make or break your app, unless it is called a lot of times per second simple_smile (and I mean: A LOT of times, so you then would think: why not cache the processed output for 20 seconds?)

Otis wrote:

Now, some tests. ... Q1 subtree costs is 8. Q2 subtree costs is 3. I think that's significant.

Ohhhh, I didn't mean that UNIONs are faster than ORs in all cases. It seems that under certain circumstances when using ORs (which I have failed to reproduced for you in Northwind disappointed ) the SQL Server query optimiser ignores column indexes and favors index scans.

In my search for performance bottlenecks the following query which has a subtree cost of 1.21 (97% relative to batch) and uses Index Scans

SELECT ...
FROM ( [Permission] LEFT JOIN [FolderItem] 
ON  [Permission].[ResourceUID]=[FolderItem].[FolderUID]) 
WHERE ( ( [Permission].[ResourceUID] = @ResourceUID1 
Or [FolderItem].[ItemUID] = @ItemUID2) 
And [Permission].[GroupUID] 
IN (@GroupUID3, @GroupUID4, @GroupUID5, @GroupUID6) 
And [Permission].[SkinAccountUID] = @SkinAccountUID7)

can be replaced with the following query which has a subtree cost of 0.03 (3% relative to batch) and uses Index Seeks

SELECT ...
FROM ( [Permission] LEFT JOIN [FolderItem] 
ON  [Permission].[ResourceUID]=[FolderItem].[FolderUID]) 
WHERE ( Permission].[ResourceUID] = @ResourceUID1) 
And [Permission].[GroupUID] 
IN (@GroupUID3, @GroupUID4, @GroupUID5, @GroupUID6) 
And [Permission].[SkinAccountUID] = @SkinAccountUID7)

UNION ALL

SELECT ...
FROM ( [Permission] LEFT JOIN [FolderItem] 
ON  [Permission].[ResourceUID]=[FolderItem].[FolderUID]) 
WHERE ( [FolderItem].[ItemUID] = @ItemUID2
And [Permission].[GroupUID] 
IN (@GroupUID3, @GroupUID4, @GroupUID5, @GroupUID6) 
And [Permission].[SkinAccountUID] = @SkinAccountUID7)

The trick is discussed http://www.windowsitpro.com/Article/ArticleID/37531/37531.html and http://www.sql-server-performance.com/transact_sql.asp

Personally I don't understand why the optimizer doesn't spot this optimisation itself... I guess its not perfect.

I see the difference, but it would be very hard for me to optimize that away from a filter which is offered to the framework.

Frans Bouma | Lead developer LLBLGen Pro