Efficient multiple inserts

Posts   
 
    
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 30-Jan-2009 13:13:58   

I have a table called DOCUMENT_READER which has doc_id and user_id columns. There is a unique index on these columns. (There is also an identity PK field, but we need to keep that to support an existing app that doesn't support multiple-field PKs.)

On the client side, I have a user_id (int) and a list of doc_id (int[]). What I want to do is efficiently insert rows for the combinations of user_id and doc_ids where they don't already exist. There is no need to retrieve the new entities (or the existing ones for that matter).

It may be clearer to describe what this table does rather than just supply vague column descriptions: this table maintains a list, per user, of the documents they have read. Think Outlook and a given user (user_id) selecting a list of messages (giving a list of doc_id) and then choosing Mark All Read from a context menu. It is possible that none, some or all of the messages they have selected are already marked read (i.e. already in this DOCUMENT_READER table).

The easiest way seems to be to fetch all rows for the user_id/doc_ids combination, work out what is missing, and then insert rows for those that are missing. But this involves two trips to the database.

Is there a way, say by pre-creating all the entities in memory and then performing an insert using a unit of work with a predicate of some sort to exclude those already present, so that the database ends up in the correct state in one go, with no items fetched?

Cheers Simon

daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 31-Jan-2009 09:01:50   

Mmm... As far as I know, that's not possible. What about using a stored procedure and calling it from code?

David Elizondo | LLBLGen Support Team
Otis
LLBLGen Pro Team
Posts: 39590
Joined: 17-Aug-2003
# Posted on: 02-Feb-2009 10:34:25   

There's no way to insert by filter. Some databases support insert-or-update constructs, but you only want to insert if the row isn't there yet, so that means either checking beforehand or dealing with exceptions when a row already exists (which I don't think is the recommended way).
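For illustration, a hedged sketch of what those constructs look like in T-SQL (table and column names follow the DOCUMENT_READER example from this thread; the parameter names are placeholders): the first statement is a plain INSERT ... SELECT guarded by NOT EXISTS, which most engines accept; the second uses SQL Server 2008's MERGE with only an insert branch.

```sql
-- Portable variant: insert only the (doc_id, user_id) pairs not already present.
INSERT INTO DOCUMENT_READER (doc_id, user_id)
SELECT d.doc_id, @user_id
FROM DOCUMENT d
WHERE d.doc_id IN (@doc1, @doc2, @doc3)
  AND NOT EXISTS (SELECT 1 FROM DOCUMENT_READER dr
                  WHERE dr.doc_id = d.doc_id AND dr.user_id = @user_id);

-- SQL Server 2008+ variant: MERGE with only a WHEN NOT MATCHED branch.
MERGE DOCUMENT_READER AS tgt
USING (SELECT doc_id, @user_id AS user_id
       FROM DOCUMENT WHERE doc_id IN (@doc1, @doc2, @doc3)) AS src
   ON tgt.doc_id = src.doc_id AND tgt.user_id = src.user_id
WHEN NOT MATCHED THEN
    INSERT (doc_id, user_id) VALUES (src.doc_id, src.user_id);
```

Both forms do the existence check and the inserts in a single round trip, which is exactly the "checking beforehand" folded into the statement itself.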

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 13-Feb-2009 07:01:49   

Otis wrote:

There's no way to insert by filter. Some databases support insert-or-update constructs, but you only want to insert if the row isn't there yet, so that means either checking beforehand or dealing with exceptions when a row already exists (which I don't think is the recommended way).

Isn't INSERT INTO (...) SELECT ... ANSI standard? Anyway, it works in SQL Server so I thought I would have a go. I know my inputs and I know the SQL I want to generate eventually, but I find myself going around in circles with LLCoolJ trying to get the relevant bits out.

Here is the UnitTest (to show the inputs):

        [Test]
        public void CheckJoinTableMultiInsertForMultipleUsers()
        {
            var documentIDs = new[] {2901722, 2901721, 2901632, 2901623};
            var userIDs = new[] { "hewisc", "wayes1" };

            var result = InsertMissingJoinTableEntities<DocumentReaderEntity>(DocumentReaderFields.DocID == documentIDs, DocumentReaderFields.UserID == userIDs);

            Assert.AreEqual(7, result);
        }

Here is the SQL needed to do the business:

DECLARE @UserID1 VarChar(20); SET @UserID1='hewisc'
DECLARE @UserID2 VarChar(20); SET @UserID2='wayes1'
DECLARE @DocID2 Int; SET @DocID2=2901722
DECLARE @DocID3 Int; SET @DocID3=2901721
DECLARE @DocID4 Int; SET @DocID4=2901632
DECLARE @DocID5 Int; SET @DocID5=2901623

INSERT INTO OSS_DOCUMENT_READER (doc_id, user_id, read_dtm)
SELECT
  d.doc_id,
  u.user_id,
  GETDATE()
FROM OSS_DOCUMENT_READER dr
RIGHT JOIN (OSS_DOCUMENT d CROSS JOIN RD_USER u) ON dr.doc_id = d.doc_id AND dr.user_id = u.user_id
WHERE dr.doc_reader_seq IS NULL 
AND d.doc_id IN (@DocID2, @DocID3, @DocID4, @DocID5)
AND u.user_id IN (@UserID1, @UserID2)

and here is the code I have so far:

        static IEntityFieldCore GetEntityFieldFromPredicate(Predicate filter)
        {
            if (filter is FieldCompareValuePredicate) return ((FieldCompareValuePredicate) filter).FieldCore;
            if (filter is FieldCompareRangePredicate) return ((FieldCompareRangePredicate) filter).FieldCore;

            return null;
        }

        public int InsertMissingJoinTableEntities<T>(Predicate filter1, Predicate filter2) where T: EntityBase2, new()
        {
            return InsertMissingJoinTableEntities(new T(), filter1, filter2);
        }

        public int InsertMissingJoinTableEntities(EntityBase2 templateEntity, Predicate filter1, Predicate filter2)
        {
            Guard.NotNull(templateEntity, "templateEntity");
            Guard.NotNull(filter1, "filter1");
            Guard.NotNull(filter2, "filter2");

            var entityName = templateEntity.GetType().Name;
            var filter1Field = GetEntityFieldFromPredicate(filter1);
            var filter2Field = GetEntityFieldFromPredicate(filter2);
            Guard.ValidOperation(entityName == filter1Field.ContainingObjectName && entityName == filter2Field.ContainingObjectName, "templateEntity, filter1 and filter2 must all be from the same entity");

            var emptyPKFilter = ((EntityField2) templateEntity.PrimaryKeyFields[0]) == DBNull.Value;

            var filter = filter1 & filter2 & emptyPKFilter;

            // This is as far as I've got - how do I turn this filter into
            // the INSERT INTO ... SELECT query shown above?
            throw new NotImplementedException();
        }

Any ideas on how I gen the SQL (or just the select part of it) from those inputs.

Cheers Simon

Walaa
Support Team
Posts: 14946
Joined: 21-Aug-2005
# Posted on: 13-Feb-2009 09:38:58   

That's not supported.

Anyway, I think the best thing to do is to loop over the entities to be inserted and insert them one by one inside a try/catch block, so you can catch any database exception caused by the unique index, ignore it, and continue looping and inserting.
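For reference, a sketch of that loop-and-catch approach (the entity and field names are taken from this thread; the exact property names and the exception type to catch are assumptions and may differ in your generated code):

```csharp
// Sketch: insert each combination one by one, ignoring unique-index violations.
foreach (var docID in documentIDs)
{
    var entity = new DocumentReaderEntity { DocId = docID, UserId = userID };
    try
    {
        adapter.SaveEntity(entity);
    }
    catch (ORMQueryExecutionException)
    {
        // Row already exists (unique index on doc_id/user_id) - skip it.
    }
}
```

This issues one round trip per row, which is the performance cost discussed later in this thread.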

simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 13-Feb-2009 10:34:41   

Walaa wrote:

That's not supported.

Anyway, I think the best thing to do is to loop over the entities to be inserted and insert them one by one inside a try/catch block, so you can catch any database exception caused by the unique index, ignore it, and continue looping and inserting.

I know it's not supported currently, though I think it might be a useful addition as an InsertEntitiesDirectly method. I was trying to prototype it.

The solution you suggest works, and is indeed what we currently do, but it isn't desirable for several reasons:

1) If I need to insert 200 rows (such as a user selecting all messages and clicking Mark All Read, as in my current case), it is noticeably slow.

2) There is no need for a try/catch block, because there won't be any exceptions: the query ensures that only the 'missing' records are inserted.

3) Locking. What happens now is a read of the rows that do exist, followed by (up to) 200 individual inserts. For a similar task I have in mind, there are also triggers that fire on each insert. This keeps the table locked far longer than necessary.

With the query above, all the 'missing' records are inserted in one action query to the database and any triggers would work on a single 'inserted' table.

Cheers Simon

Otis
LLBLGen Pro Team
Posts: 39590
Joined: 17-Aug-2003
# Posted on: 13-Feb-2009 11:04:16   

I was confusing the query you proposed with an INSERT OR UPDATE query construct, which isn't supported by all databases. INSERT INTO ... SELECT might be ANSI, but that's no guarantee, though I think most databases we support have it.

I'll see if this can be added to v3.

To get the select query, use the CreateSelectDQ method of the adapter instance. Then modify it to get the INSERT statement up front, but it might be tricky.

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 17-Feb-2009 11:58:05   

Had a go at this and performance is now _very_ acceptable.

Here is the code I added to DataAccessAdapter if anyone is interested....

        void CreateQueryFromElementsCore(IEntityFields2 fieldCollectionToFetch, IRelationPredicateBucket filterBucket,
                                         int maxNumberOfItemsToReturn,
                                         ISortExpression sortClauses, bool allowDuplicates, IGroupByCollection groupByClause,
                                         int pageNumber, int pageSize,
                                         out IFieldPersistenceInfo[] persistenceInfo, out IRetrievalQuery selectQuery)
        {
            var mi = typeof(DataAccessAdapterBase).GetMethod("CreateQueryFromElements", BindingFlags.NonPublic | BindingFlags.Instance | BindingFlags.Static);

            persistenceInfo = null;
            selectQuery = null;

            var parameters = new object[] {
                fieldCollectionToFetch,
                filterBucket,
                maxNumberOfItemsToReturn,
                sortClauses,
                allowDuplicates,
                groupByClause,
                pageNumber,
                pageSize,
                persistenceInfo, 
                selectQuery,
            };

            mi.Invoke(this, parameters);

            persistenceInfo = (IFieldPersistenceInfo[]) parameters[8];
            selectQuery = (IRetrievalQuery) parameters[9];
        }

        public int InsertMissingJoinTableEntitiesDirectly<T>(Predicate filter1, Predicate filter2) where T: IEntity2, new()
        {
            return InsertMissingJoinTableEntitiesDirectly(new T(), filter1, filter2);
        }

        public int InsertMissingJoinTableEntitiesDirectly(IEntity2 templateEntity, Predicate filter1, Predicate filter2)
        {
            if (templateEntity == null) throw new ArgumentNullException("templateEntity");
            if (filter1 == null && filter2 == null) throw new ArgumentException("At least one filter must be specified.");

            IFieldPersistenceInfo[] selectPersistenceInfo;
            IRetrievalQuery selectQuery;

            var relatedFields = new List<EntityField2>();
            var joinFields = new List<EntityField2>();
            var onClause = new PredicateExpression();

            // Parse the relations and build up the on clause and lists of PK fields, FK fields
            foreach(var relation in templateEntity.GetAllRelations())
            {
                if (relation.TypeOfRelation != RelationType.ManyToOne) continue;

                var pkField = (EntityField2) relation.GetPKEntityFieldCore(0);
                var fkField = (EntityField2) relation.GetFKEntityFieldCore(0);

                onClause.AddWithAnd(fkField == pkField);

                relatedFields.Add(pkField);
                joinFields.Add(fkField);
            }

            // Simple validation
            if (relatedFields.Count != 2 || relatedFields[0].ContainingObjectName == relatedFields[1].ContainingObjectName)
            {
                throw new InvalidOperationException("Must be exactly two FK relations each with a single field PK");
            }

            // Create a cross join relation between the non-join table tables
            var crossJoinRelation = new EntityRelation();
            crossJoinRelation.AddEntityFieldPair(relatedFields[0], relatedFields[1]);
            crossJoinRelation.HintForJoins = JoinHint.Cross;

            // Create a relation between the join table and the cross join
            // (we just use the first select field since we have to supply a pair)
            IEntityRelation joinTableRelation = new EntityRelation
                                                    {
                                                        CustomFilter = onClause,
                                                        CustomFilterReplacesOnClause = true,
                                                        StartEntityIsPkSide = true,
                                                        HintForJoins = JoinHint.Left
                                                    };
            joinTableRelation.AddEntityFieldPair(relatedFields[0], templateEntity.PrimaryKeyFields[0]);

            // Build a RelationPredicateBucket with our joins and filter
            var filterBucket = new RelationPredicateBucket(((EntityField2) templateEntity.PrimaryKeyFields[0]) == DBNull.Value & filter1 & filter2);
            filterBucket.Relations.Add(crossJoinRelation);
            filterBucket.Relations.Add(joinTableRelation);


            // Build select list fields (must use aliases, but it doesn't matter what they are)
            var selectFields = new EntityFields2(relatedFields.Count);
            for(var i = 0; i < relatedFields.Count; i++)
            {
                selectFields.DefineField(relatedFields[i], i, "f" + i);
            }

            // Create the select query
            CreateQueryFromElementsCore(selectFields, filterBucket, 0, null, true, null, 0, 0, out selectPersistenceInfo, out selectQuery);

            // Since the DynamicQueryEngine doesn't support INSERT INTO SELECT,
            // we build the query directly here
            var creator = CreateDynamicQueryEngine().Creator;
            var joinFieldPersistenceInfos = GetFieldPersistenceInfos(templateEntity);

            var queryText = new StringBuilder(DynamicQueryEngineBase.InsertQueryBufferLength);
            queryText.AppendFormat("INSERT INTO {0} (", creator.CreateObjectName(joinFieldPersistenceInfos[0]));

            for(var i = 0; i < joinFields.Count; i++)
            {
                IEntityFieldCore field = joinFields[i];
                var joinFieldPersistenceInfo = joinFieldPersistenceInfos[field.FieldIndex];

                if (i > 0) queryText.Append(", ");

                queryText.AppendFormat("{0}", creator.CreateFieldNameSimple(joinFieldPersistenceInfo, field.Name));
            }

            queryText.Append(")\r\n");
            queryText.Append(selectQuery.Command.CommandText);

            selectQuery.Command.CommandText = queryText.ToString();

            return ExecuteActionQuery(new ActionQuery(selectQuery.Connection, selectQuery.Command));
        }

Walaa
Support Team
Posts: 14946
Joined: 21-Aug-2005
# Posted on: 17-Feb-2009 12:04:59   

Thank you very much for the feedback.

psandler
User
Posts: 540
Joined: 22-Feb-2005
# Posted on: 18-Feb-2009 02:54:17   

simmotech wrote:

Had a go at this and performance is now _very_ acceptable.

Here is the code I added to DataAccessAdapter if anyone is interested....

This looks very interesting. Can you post an example of what the calling code looks like? If this were flexible enough to construct dynamically, I would definitely use this on a regular basis.

Currently we have areas where we need to do many inserts from one table to another (staging to "real" tables); we pull everything into memory and then use ADO bulk copy. It's MUCH faster than sending thousands of insert statements, but I'd have to think this would be faster still.

Thanks for posting this.

Phil

simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 18-Feb-2009 07:14:45   

psandler wrote:

simmotech wrote:

Had a go at this and performance is now _very_ acceptable.

Here is the code I added to DataAccessAdapter if anyone is interested....

This looks very interesting. Can you post an example of what the calling code looks like? If this were flexible enough to construct dynamically, I would definitely use this on a regular basis.

Currently we have areas where we need to do many inserts from one table to another (staging to "real" tables); we pull everything into memory and then use ADO bulk copy. It's MUCH faster than sending thousands of insert statements, but I'd have to think this would be faster still.

Thanks for posting this.

Phil

Well, my calling code is very simple: I am only using a simple join table (is that the correct terminology?) and I don't need to get back the inserted rows.

        public void MarkAllRead(int[] documentIDs, string userID)
        {
            if (userID == null) userID = Principal.UserName;

            using(var tsw = new TransactionScopeWrapper("MarkAllRead"))
            {
                tsw.DataAccessAdapter.InsertMissingJoinTableEntitiesDirectly<DocumentReaderEntity>(DocumentFields.ID == documentIDs, UserFields.ID == userID);

                tsw.Complete();
            }
        }

You can ignore the TransactionScopeWrapper stuff - it just ensures that a DataAccessAdapter is available and any transactions are shared with our old O/R mapper - effectively it is a one-liner.

Actually, I've just discovered that SQL Server also has an OUTPUT clause on the INSERT INTO command, which would allow all of the newly inserted row PKs (or any set of columns) to be returned. Not sure if other databases support this though.
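For illustration, the OUTPUT clause would slot into the earlier query like this (a sketch reusing the column names from the query above; note that returning OUTPUT directly to the client is not allowed on a table with enabled triggers - an OUTPUT INTO a table variable is needed in that case):

```sql
INSERT INTO OSS_DOCUMENT_READER (doc_id, user_id, read_dtm)
OUTPUT INSERTED.doc_reader_seq, INSERTED.doc_id, INSERTED.user_id
SELECT d.doc_id, u.user_id, GETDATE()
FROM OSS_DOCUMENT_READER dr
RIGHT JOIN (OSS_DOCUMENT d CROSS JOIN RD_USER u)
       ON dr.doc_id = d.doc_id AND dr.user_id = u.user_id
WHERE dr.doc_reader_seq IS NULL
  AND d.doc_id IN (@DocID2, @DocID3)
  AND u.user_id IN (@UserID1)
```

This returns one result row per row actually inserted, so the client gets exactly the new identity PKs without a second query.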

If I get time, I may split the method into two - one part for auto-generating fields etc. for join tables and a second to create the INSERT INTO SELECT query.

Cheers Simon

simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 18-Feb-2009 07:19:11   

I like the idea of using a 'template' entity for the UpdateDirectly and now InsertDirectly stuff to provide constant values.

One issue I have, though, is with dates. I would like to be able to set a date property on a template entity, say DateTime.MinValue (which would be invalid in SQL anyway), and have the server-specific code automatically substitute GETUTCDATE() in its place.

I don't mind having a look into this but if someone can tell me that it definitely wouldn't be possible then it will save me some time.

Cheers Simon

Otis
LLBLGen Pro Team
Posts: 39590
Joined: 17-Aug-2003
# Posted on: 18-Feb-2009 11:02:34   

Thanks for the code, Simon. I see you need reflection to reach CreateQueryFromElements, which is indeed private. I'm not sure making it protected is a good idea, as I then can't refactor the method easily.

About your last question, I think that's not directly related to this thread? I think the main issue is that INSERT queries don't obey expressions on fields. You could, if you want to go that route, use an insert trigger that looks for a 'magic value' (e.g. 1/1/1753, which I think is the lowest date SQL Server's datetime can hold) and substitutes the value you want. Of course, a default constraint is also possible.

Though, as you already insert the date through code, why not set the UTC date in the entity to insert?

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 18-Feb-2009 12:09:59   

Otis wrote:

Thanks for the code, Simon. I see you need reflection to reach CreateQueryFromElements, which is indeed private. I'm not sure making it protected is a good idea, as I then can't refactor the method easily.

About your last question, I think that's not directly related to this thread? I think the main issue is that INSERT queries don't obey expressions on fields. You could, if you want to go that route, use an insert trigger that looks for a 'magic value' (e.g. 1/1/1753, which I think is the lowest date SQL Server's datetime can hold) and substitutes the value you want. Of course, a default constraint is also possible.

Though, as you already insert the date through code, why not set the UTC date in the entity to insert?

Answering backwards... The problem is using the server's current time or current UTC time. Queries are OK, as there is the DBFunction solution, but for a 'template' entity there is only a DateTime property to set.

In theory the UTC date should be the same on the client as on the server, but typically they can differ by up to 5 minutes. It was a general thing anyway: I was also going to implement that, say, DateTime.MinValue gets translated to GETDATE() and DateTime.MaxValue gets translated to GETUTCDATE() (for SQL Server - I assume the other SQL providers have an equivalent).

Default constraints work for INSERTs but not UPDATEs.

Triggers are overkill and would require setting one up for every date field.

I was hoping there was a place in the SQL Dynamic Query Engine, where it translates fields to SQL parameters, to check for a DateTime field holding one of the magic values and, instead of emitting a parameter, directly use the magic value's replacement text.

Cheers Simon

Otis
LLBLGen Pro Team
Posts: 39590
Joined: 17-Aug-2003
# Posted on: 19-Feb-2009 11:25:22   

Inserts use the same parameter creation routine as all other methods (see line 509 in the SQL Server DQE). There you could check for a magic value, but of course it would then apply in all situations.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 11
Joined: 04-Nov-2013
# Posted on: 07-Nov-2013 18:12:57   

Hello,

Is there a way to use this method for INSERT INTO .. SELECT with LLBLGen 4.0? I have tested the previous code but it doesn't work. The "CreateQueryFromElements" method has probably changed.

Thanks,

Martin

Walaa
Support Team
Posts: 14946
Joined: 21-Aug-2005
# Posted on: 08-Nov-2013 05:06:51   

Hi Martin,

Please create a new thread, as per the forum guidelines.