Upserting

Posts   
 
    
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 28-Jun-2017 09:39:56   

I have a Valuation table which stores a Value and a ValuationDate for 'things'.

The structure is this:- int ID (PK) int AssetID (FK, nullable) int LiabilityID (FK, nullable) int ProviderFundID (FK, nullable) int ClientFundID (FK, nullable) byte ValuationSource (nullable - Automatic, Manual etc.) decimal Value datetime ValuationDate (date only, no time)

  • Only one of AssetID, LiabilityID, ProviderFundID, ClientFundID will be not null in a row
  • A 'thing' can only have one valuation for a given date (no constraint currently)
  • This table will have hundreds of thousands if not millions of rows eventually
  • I need to be able to Upsert in a batch of, potentially, hundreds or thousands at a time.
  • I am using LLBLGen 4.2 with Sql Server Express (2014 or higher), potentially Azure
  • LLBLGen 5.x is not an option at the moment but I am able to build special a query in a partial class of DataAccessAdapter - have done a bit of this before.

I am looking for any help/guidance on the best way of optimizing storage and Upserting.

First question: is it worth having a constraint to enforce one valuation per 'thing'/date? My initial thought is that it will just slow Upserts down and I should just enforce this from the software side.

For the Upsert, I am fetching data online from various sources, matching it to the 'things' and then also fetching via LLBLGen the latest Valuation per 'thing'. Typically, the online source will have a Valuation Date later than the one in the database per 'thing'. The process can be run again on the same day of course and since I have the latest Entity I can detect that and arrange to up the entity by ID. The problem is that, in theory, the user can add or import manual valuations and there may be a Valuation in the database which should be overwritten but is not the latest one I have fetched into memory - this would bomb the process if I had a constraint or leave multiple value if not.

For complete safety, I think I want to build the Valuations 'as they need to be' and then use a transaction to Upsert the database to match them.

Pseudo code: If Exists (a db row with matching (AssetID/LiabilityID/ProviderFundID/ClientFundID) AND (date) Then Update (ValuationSource, Value, ValuationDate) Else Insert (new entity)

My initial thought would be to insert the Valuations I build on the client into a SQL Server temp table then somehow process them all on the server side.

Any thoughts or suggestions?

Walaa avatar
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 28-Jun-2017 18:30:56   

First question: is it worth having a constraint to enforce one valuation per 'thing'/date? My initial thought is that it will just slow Upserts down and I should just enforce this from the software side.

IMHO it's needed from a logical and data integrity perspective.

My initial thought would be to insert the Valuations I build on the client into a SQL Server temp table then somehow process them all on the server side.

Any thoughts or suggestions?

Since you are going to use Azure, I'd recommend the client saves the list of things to an Azure Queue, then you'd process the queue from the server side and save each item to the database (whether Insert or Update).

simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 05-Jul-2017 09:35:22   

Any solution has to work on plain SQL Server as well as Azure

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 05-Jul-2017 10:19:34   

Then I think a temptable solution is best: it doesn't need to be a '#table' temptable btw, you can also use a dedicated table for this, as long as you know which rows to process. For this kind of work I think it's best to simply push the data to process as close to the DB as possible, as you want row/read locks while operating the data but at the same time get rid of them as soon as possible. With rows to process in a table (temp or otherwise) the process should be quick and perhaps you can utilize more generic insert/update statements which process multiple rows at a time, instead of one row max.

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 05-Jul-2017 10:49:32   

This code works but is there a better way to reuse the parameters and/or Fields collection?

            public void Upsert(EntityCollection<T> collectionToSave)
            {
                try
                {
                    var catalogName = owner.CatalogNameToUse.Length == 0 ? "TIPS" : owner.CatalogNameToUse;

//                  CreateTempTable(catalogName, collectionToSave[0].Fields);

                    var engine = owner.CreateDynamicQueryEngine();

                    var fieldPersistenceInfos = owner.GetFieldPersistenceInfos(collectionToSave[0].Fields)
                        .Select(o =>
                            new FieldPersistenceInfo(string.Empty, tableName, o.SourceColumnName, o.SourceColumnIsNullable, o.SourceColumnDbType, o.SourceColumnMaxLength, o.SourceColumnScale,
                                o.SourceColumnPrecision, false, null, o.TypeConverterToUse, o.ActualDotNetType))
                        .ToArray();

                    foreach (var entity in collectionToSave)
                    {
                        var entityFields = entity.Fields.GetAsEntityFieldCoreArray()
                            .ToArray();

                        var actionQuery = engine.CreateInsertDQ(entityFields, fieldPersistenceInfos, owner.GetActiveConnection());


                        owner.ExecuteActionQuery(actionQuery);

                    }
                }
                finally
                {
//                      RunSqlFetchCommandDirect($"DROP TABLE [{tableName}];");
                }
            }

Walaa avatar
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 05-Jul-2017 20:58:42   

This code works but is there a better way to reuse the parameters and/or Fields collection?

For what else, would you reuse the fields?

simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 06-Jul-2017 09:09:54   

Let me clarify something:- <entity>.Fields - these are per-entity and read/write IFieldPersistenceInfo[] - these are shared and so cannot be modified <entity>.Fields.GetAsEntityFieldCoreArray() - these are Fields just cast to IEntityFieldCore

I create copies of the FieldPersistenceInfos so I can remove the TableName and hide the PrimaryKey column.

Can you confirm this right?

I now have a child class within DataAccessAdapter and it creates and populates a Temp table. I am now looking at the perform the Upsert bit.

I have arrays containing the Match fields and Update fields. Here is a sample script I found on StackOverflow (not my fields)

UPDATE drug
SET [CDM] = tmp.[CDM]
  , [NDC] = tmp.[NDC]
  , [IP_COST] = tmp.[IP_COST]
  , [OP_COST] = tmp.[OP_COST]
  , [HH_COST] = tmp.[HH_COST]
  , [VAR_COST] = tmp.[VAR_COST]
  , [LSTUPDATE] = tmp.[LSTUPDATE]
FROM [RX_Billing].[dbo].[FS_Drug] drug
INNER JOIN [RX_Billing].[dbo].[FS_Drug_TEMP] tmp  
    ON drug.[TRADENAME] = tmp.[TRADENAME]

-- Insert rows that don't have matching TRADENAME
INSERT INTO drug
SELECT 
    tmp.[TRADENAME]
  , tmp.[CDM]
  , tmp.[NDC]
  , tmp.[IP_COST]
  , tmp.[OP_COST]
  , tmp.[HH_COST]
  , tmp.[VAR_COST]
  , tmp.[LSTUPDATE]
FROM [RX_Billing].[dbo].[FS_Drug] drug
RIGHT OUTER JOIN [RX_Billing].[dbo].[FS_Drug_TEMP] tmp 
    ON  drug.[TRADENAME] = tmp.[TRADENAME]
WHERE drug.[TRADENAME] IS NULL 

Any suggestions on the best way of building a similar query using the Field infos I have?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 06-Jul-2017 11:26:32   

simmotech wrote:

Let me clarify something:- <entity>.Fields - these are per-entity and read/write IFieldPersistenceInfo[] - these are shared and so cannot be modified <entity>.Fields.GetAsEntityFieldCoreArray() - these are Fields just cast to IEntityFieldCore

I create copies of the FieldPersistenceInfos so I can remove the TableName and hide the PrimaryKey column.

Can you confirm this right?

Correct. FieldPersistenceInfo's are static and shared among all usages, so manipulating them for a single query will ruin all queries. You can make copies and change those and that's fine.

EntityField instances are actually simple wrappers around IFieldInfo (which is also static and shared among all instances of the same field) and dynamic values like an alias set for a query. It has copy-on-write code for when you want to change values in the IFieldInfo like the index. But in general you don't need to (and you shouldn't, really)

I now have a child class within DataAccessAdapter and it creates and populates a Temp table. I am now looking at the perform the Upsert bit.

I have arrays containing the Match fields and Update fields. Here is a sample script I found on StackOverflow (not my fields)

UPDATE drug
SET [CDM] = tmp.[CDM]
  , [NDC] = tmp.[NDC]
  , [IP_COST] = tmp.[IP_COST]
  , [OP_COST] = tmp.[OP_COST]
  , [HH_COST] = tmp.[HH_COST]
  , [VAR_COST] = tmp.[VAR_COST]
  , [LSTUPDATE] = tmp.[LSTUPDATE]
FROM [RX_Billing].[dbo].[FS_Drug] drug
INNER JOIN [RX_Billing].[dbo].[FS_Drug_TEMP] tmp  
    ON drug.[TRADENAME] = tmp.[TRADENAME]

-- Insert rows that don't have matching TRADENAME
INSERT INTO drug
SELECT 
    tmp.[TRADENAME]
  , tmp.[CDM]
  , tmp.[NDC]
  , tmp.[IP_COST]
  , tmp.[OP_COST]
  , tmp.[HH_COST]
  , tmp.[VAR_COST]
  , tmp.[LSTUPDATE]
FROM [RX_Billing].[dbo].[FS_Drug] drug
RIGHT OUTER JOIN [RX_Billing].[dbo].[FS_Drug_TEMP] tmp 
    ON  drug.[TRADENAME] = tmp.[TRADENAME]
WHERE drug.[TRADENAME] IS NULL 

Any suggestions on the best way of building a similar query using the Field infos I have?

In your copies you can overwrite the SourceObjectName to 'tmp', then I'd simply try to create a select query using the CreateSelectDQ() method and prefix it with 'insert into <table>'. For the update query you can create expressions, which contain a field and the copy of the fieldpersistence info, on the fields of a dummy entity and use the entity to create the update query (so the fields aren't set to a value, but to an expression with just a field and the copy of the persistenceinfo), using http://www.llblgen.com/Documentation/5.2/LLBLGen%20Pro%20RTF/Using%20the%20generated%20code/gencode_expressionsaggregates.htm#expressions-in-entity-updates

this avoids much SQL, I think.

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 06-Jul-2017 12:41:44   

Hmm. I am getting a problem with different array sizes because Fields has had two fields hidden. If I provide all fields , I get past that problem but the query is trying to read the two fields not in the temp table. Does Field map to FieldProvider Info by Index and not Name?

There is also the issue of the joins I will need to create. I thought I would need some sort of DerivedTable and DerivedRelation configured too.

Maybe it would be easier to build a query using StringBuilder?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 06-Jul-2017 16:00:15   

simmotech wrote:

Hmm. I am getting a problem with different array sizes because Fields has had two fields hidden. If I provide all fields , I get past that problem but the query is trying to read the two fields not in the temp table. Does Field map to FieldProvider Info by Index and not Name?

If you have a field as 'excluded' simply pass 'null' for its fieldpersistence info. So the arrays have to be of the same size, but at the index of the excluded field, pass null.

There is also the issue of the joins I will need to create. I thought I would need some sort of DerivedTable and DerivedRelation configured too. Maybe it would be easier to build a query using StringBuilder?

A dynamicrelation you mean? you can specify a join between two tables by using a dynamicrelation and use SetLeftOperandPersistenceInfo and SetRightOperandPersistenceInfo to pass the IFieldPersistenceInfo's that you've altered.

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 10-Jul-2017 13:04:46   

Walaa was right - I do need data integrity first and foremost because I found my test data already had a few duplicate rows!

So I asked a question on Stack Overflow (stackoverflow.com/questions/45005283/) about the best Constraints and Indexes to use. The two answers I got suggested that instead of nullable FK Columns, I should use a DataType table(s) specific to each 'thing'. I had responded by saying I needed to use LLBLGen PrefetchPaths to bring back associated Valuations for a 'thing'

However, the more I read, the more it sounded a bit like a LLBLGen TargetPerEntityHierachy discriminator column though I can't get my head around how the FK relationships would work Am I barking up the wrong tree?

With regard to the Upsert from Temp table, I think I have this working now (went with StringBuilder)

USE [TIPS]

UPDATE [Valuation]
SET
  [Value] = [##temp].[Value],
  [ValuationSource] = [##temp].[ValuationSource]
FROM [Valuation]
JOIN [##temp] ON 
(([##temp].[AssetID] IS NULL AND [Valuation].[AssetID] IS NULL) OR ([##temp].[AssetID] = [Valuation].[AssetID])) AND
(([##temp].[ClientFundID] IS NULL AND [Valuation].[ClientFundID] IS NULL) OR ([##temp].[ClientFundID] = [Valuation].[ClientFundID])) AND
(([##temp].[LiabilityID] IS NULL AND [Valuation].[LiabilityID] IS NULL) OR ([##temp].[LiabilityID] = [Valuation].[LiabilityID])) AND
(([##temp].[ProviderFundID] IS NULL AND [Valuation].[ProviderFundID] IS NULL) OR ([##temp].[ProviderFundID] = [Valuation].[ProviderFundID])) AND
(([##temp].[ValuationDate] IS NULL AND [Valuation].[ValuationDate] IS NULL) OR ([##temp].[ValuationDate] = [Valuation].[ValuationDate]))
WHERE
[##temp].[Value] <> [Valuation].[Value] AND
[##temp].[ValuationSource] <> [Valuation].[ValuationSource]

SET @UpdateCount = @@ROWCOUNT

INSERT INTO [Valuation]
([Value], [ValuationSource], [AssetID], [ClientFundID], [LiabilityID], [ProviderFundID], [ValuationDate])
SELECT
  [##temp].[Value],
  [##temp].[ValuationSource],
  [##temp].[AssetID],
  [##temp].[ClientFundID],
  [##temp].[LiabilityID],
  [##temp].[ProviderFundID],
  [##temp].[ValuationDate]
FROM [Valuation]
RIGHT JOIN [##temp] ON 
(([##temp].[AssetID] IS NULL AND [Valuation].[AssetID] IS NULL) OR ([##temp].[AssetID] = [Valuation].[AssetID])) AND
(([##temp].[ClientFundID] IS NULL AND [Valuation].[ClientFundID] IS NULL) OR ([##temp].[ClientFundID] = [Valuation].[ClientFundID])) AND
(([##temp].[LiabilityID] IS NULL AND [Valuation].[LiabilityID] IS NULL) OR ([##temp].[LiabilityID] = [Valuation].[LiabilityID])) AND
(([##temp].[ProviderFundID] IS NULL AND [Valuation].[ProviderFundID] IS NULL) OR ([##temp].[ProviderFundID] = [Valuation].[ProviderFundID])) AND
(([##temp].[ValuationDate] IS NULL AND [Valuation].[ValuationDate] IS NULL) OR ([##temp].[ValuationDate] = [Valuation].[ValuationDate]))
WHERE [Valuation].[ID] IS NULL

SET @InsertCount = @@ROWCOUNT

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 11-Jul-2017 15:09:11   

Not really sure what we can add here simple_smile The scenario for TPEH and relationships is not ideal: all relationships are defined on the same table and FK fields are always nullable, if they're in a subtype, and nullability is constraint at the entity level.

I'd be careful with 'I'm the only app writing to this table', as that might change in the future for whatever reason and constraining things at the app level will then be a problem.

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 17-Jul-2017 10:39:51   

Guys

Thanks for all the help. Much appreciated.

I have a working Upserter and I have a attached a copy for anyone interested (there are one or two bits specific to my code but you'll get the gist)

I took your advice re/ TPEH and left the table as-is.

With regard to a constraint to preventing duplicate Valuations, I am going to defer this for now because there are a number of places in the app code that could be causing the problem and Valuations are not saved in isolation but as a part of an entity tree and I don't want to introduce a worse problem where that whole save would fail! The existing code only reads the last Valuation anyway (top1 ordered by date DESC then ID desc) so duplicates aren't actually a problem when reading them.

Instead I have a checker (visible in the GUI) which identifies duplicates and provides a button to remove them. So I can clear them now and then, when I have more time, use this to identify which bits of the code are creating the duplicates and then fix them.

Attachments
Filename File size Added on Approval
DataAccessAdapter_Upserter.zip 2,312 17-Jul-2017 10:40.14 Approved
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Jul-2017 09:32:42   

Thanks for sharing! simple_smile

Frans Bouma | Lead developer LLBLGen Pro