Bulk handling of mixed inserts and updates

Posts   
 
    
tomra
User
Posts: 2
Joined: 03-Jun-2008
# Posted on: 03-Jun-2008 15:15:06   

Build: LLBLGen Pro ver 2.0.0.0 Database: SQL Server 2000 Dev Env: VB.NET 1.1, SelfServicing

Hello all

I have a project, part of which fetches data from an external datawarehouse and places these rows in our local database, thus effectively replicating the external source locally. I'm using SelfServicing generated code to handle the local database. The provider of the external database timestamps some tables, but not all - meaning we can't always just fetch the data we don't have.

Where there's no timestamp, a typical case is this: I do a fetch from one of the tables which returns around 200,000 rows. For each of the rows, one of the following is true:

1) The row doesn't exist in our local database 2) The row exists in our database (same PK) and is identical to the fetched row 3) The row exists in our database (same PK) and is different to the fetched row

So, using the SelfServicing code, what I'm currently doing for each row is using the appropriate entity's FetchUsingPK method to see if a row with the same PK exists locally. If the row exists in the local database, it's updated with the new row (i.e. create a new entity of that type, set .IsNew = False, save it) and if the row doesn't exist locally it's inserted (same but with IsNew = True).

Seems fairly logical, but it's slow - for this table alone it's essentially doing 200,000 SELECTS on a table which currently has 16 million rows. There are more than 200 tables in a daily process - I've stated by far the most extreme example here, but you see where it's going!

Is there any quicker way? If I add the entities to a collection, I'm guessing that for each one I still have to determine whether the row exists and set .IsNew accordingly before calling SaveMulti()? This means I'm no better off as the time is taken up with these 200k fetches.

Is there any way around this with a UnitOfWork? My understanding of that is a little hazy.

Any help gratefully appreciated!

goose avatar
goose
User
Posts: 392
Joined: 06-Aug-2007
# Posted on: 03-Jun-2008 20:07:21   

I can only think of fetching the whole rowset at once instead of fetching each row individually: so instead of using FetchUsingPK you could put togheter the whole bunch of PK's and send it over in a IN statement and obtain an EntityCollection. After that you still must compare each row individually and Commit the whole collection for update.

tomra
User
Posts: 2
Joined: 03-Jun-2008
# Posted on: 04-Jun-2008 10:37:42   

That's confirmed what I suspected. The way the data is returned from the remote source makes it a little tricky to compare rowset-to-rowset but it's given me an idea none the less.

Many thanks for your help.