Sharing Some Duplicate Check Code

Posts   
 
    
psandler
User
Posts: 540
Joined: 22-Feb-2005
# Posted on: 11-Oct-2008 06:09:19   

All,

Wrote this for duplicate checking in the application I am currently working on, and thought I would post it. This is for the purpose of sharing, but also to get thoughts on whether this should work in all cases.

I'm still looking at how I want to wire this into my validation routine. One thought was that CommonEntityBase would have a virtual method with a return type of List<List<IEntityField2>>. The base would just return null, and any entity that required duplicate checking during validation would override the method and return the combinations of fields the would constitute a duplicate. Since an entity could have more than one possible combination that would be considered a duplicate, more than one List<IEntityField2> could be created/returned.

An option to make this easier might be code generation, where any entity with a composite key and/or a unique constraint could get this overriding method added in a partial class automatically. (I say "easier", though I have never used the code generation engine for LLBL, so I have no idea what the level of effort would be).

Thoughts?

Main code:


        /// <summary>
        /// Validates that there are no duplicates of the entity being passed in.
        /// </summary>
        /// <param name="entityToCheck">The entity whose fields should be checked for duplicates.</param>
        /// <param name="fieldsToCheck">The list of fields whose combination constitues a duplicate.</param>
        /// <returns>False if duplicates are found, True if they are not.</returns>
        public static bool ValidateNoDuplicates(IEntity2 entityToCheck, IList<IEntityField2> fieldsToCheck)
        {
            if (fieldsToCheck == null)
            {
                throw new ArgumentNullException("fieldsToCheck");
            }

            bool returnValue = false;

            PredicateExpression pred = new PredicateExpression();

            //add each field from the list
            foreach (IEntityField2 field in fieldsToCheck)
            {
                object value = entityToCheck.Fields[field.Name].CurrentValue;
                pred.Add(new FieldCompareValuePredicate(field, null, ComparisonOperator.Equal, value));

                //filter out the entity that was passed in, only if it's not new--
                //this is because if a modified entity is saved, it's values will cause the count to be one
                if (!entityToCheck.IsNew)
                {
                    //use the DBValue, not the currentValue
                    pred.Add(new FieldCompareValuePredicate(field, null, ComparisonOperator.NotEqual, 
                        entityToCheck.Fields[field.Name].DbValue));
                }
            }

            //field to count.  Can really be any field, but we need to CLONE it so it doesn't change
            //the actual entity, which can screw up a refetch
            IEntityField2 fieldToCount = (IEntityField2)entityToCheck.Fields[0].Clone();            
            fieldToCount.AggregateFunctionToApply = AggregateFunction.Count;
            
            object result = DAOScalarUtility.GetScalar(fieldToCount, pred, null);
                        
            if (GetIntFromObject(result) == 0)
            {
                returnValue = true;
            }
            return returnValue;
        }

My GetScalar wrapper:


        public static object GetScalar(IEntityField2 fieldToCalculate, IPredicateExpression pred, 
            IRelationCollection relation)
        {
            object returnValue;
            using (FlexFetchAdapter adapter = new FlexFetchAdapter())
            {
                ResultsetFields fields = new ResultsetFields(1);
                fields.DefineField(fieldToCalculate, 0);
                returnValue = adapter.GetScalar(fields, pred, null, relation);
            }

            return returnValue;
        }       
    

And a helper function:


        private static int GetIntFromObject(object value)
        {
            int returnValue;

            if (value != DBNull.Value)
            {
                returnValue = (int)value;               
            }
            else
            {
                returnValue = 0;
            }
            return returnValue;
        }

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 11-Oct-2008 07:01:20   

Hi psandler,

Thanks for contribution!

In my own point of view, I would do these changes:

  1. Method name more descriptive. For example, we don't know if the method check in an existing collection or at DB.

  2. Don't include the check for DBValue. I would want to fetch an entity (IsNew = false), do some modifications and check whether some filter would match to some records on DB.

  3. Use adapter.GetDbCount (less code for the same purpose).

So, the code would look like:

static bool EntityAlreadyExistInDB(IEntity2 entityToCheck, IEntityFields2 fieldsToCheck)
{
    if (entityToCheck == null)
    {
        // exception
    }

    // to return
    bool alreadyExists = false;

    // build the predicate
    IRelationPredicateBucket filter = new RelationPredicateBucket();
    foreach (IEntityField2 field in fieldsToCheck)
    {
        filter.PredicateExpression.Add(new FieldCompareValuePredicate(field, null, 
             ComparisonOperator.Equal, entityToCheck.Fields[field.Name].CurrentValue);
    }

    // get the count
    using (DataAccessAdapter adapter = new DataAccessAdapter())
    {
        int count = adapter.GetDbCount(fieldsToCheck, filter);          
        alreadyExists = (count > 0);
    }

    // return result
    return alreadyExists;
}

What do you think?

David Elizondo | LLBLGen Support Team
psandler
User
Posts: 540
Joined: 22-Feb-2005
# Posted on: 11-Oct-2008 22:19:52   

Hey David,

Thanks for your input.

daelmo wrote:

  1. Method name more descriptive. For example, we don't know if the method check in an existing collection or at DB.

Heh, I agree. I've changed the method name about 5 times already. Your suggestion is not bad, but I want to return "true" if the query does not show duplicates, "false" if it does. This is simply because it will be used for validation (so true should indicate that it's valid).

(As an aside, don't you hate when you can't come up with a good method name? I'd love to know how many hours I have wasted in my life on this one task. simple_smile )

daelmo wrote:

  1. Don't include the check for DBValue. I would want to fetch an entity (IsNew = false), do some modifications and check whether some filter would match to some records on DB.

This is actually necessary, otherwise tests like this will fail:


        /// <summary>
        /// Test that an updated Primary Key will pass validation.
        /// </summary>
        [Test]
        public void TestDuplicatePKUpdateSuccess()
        {
            //save an initial department 
            DepartmentEntity dept = new DepartmentEntity();
            dept.DepartmentId = "Department " + _random.Next().ToString();
            dept.SiteId = _mainSite.SiteId;
            dept.DepartmentName = "test dept.";
            _saveManager.SaveEntityOnly(dept);

            //now change the entity and save again.  It should not be seen as a dupe            
            dept.DepartmentName = dept.DepartmentName + "_update";

            //(this calls the validation routine)
            ValidationResult result = _saveManager.SaveEntityOnly(dept);

            Assert.IsTrue(result.IsSuccessful);   
        }

daelmo wrote:

  1. Use adapter.GetDbCount (less code for the same purpose).

Ah, good call. Done. I also liked a few other tweaks you made, so I put those in as well. I may post an updated version if I tweak further . . .

Thanks for the input!

Phil

psandler
User
Posts: 540
Joined: 22-Feb-2005
# Posted on: 13-Oct-2008 22:42:28   

Ok, just wanted to update in case anyone finds this in the future and tries to use it.

The original code did not cover the case described in the following test:

        /// <summary>
        /// Tests that modifying an existing entity to match another existing entity's values will fail.
        /// </summary>
        [Test]
        public void TestDuplicatePKUpdateToExistingValueFailure()
        {
            //save an initial department 
            DepartmentEntity deptExisting = new DepartmentEntity();
            deptExisting.DepartmentId = "Department " + _random.Next().ToString();
            deptExisting.SiteId = _mainSite.SiteId;
            deptExisting.DepartmentName = "test dept.";
            _saveManager.SaveEntityOnly(deptExisting);

            //save a second dept
            DepartmentEntity dept = new DepartmentEntity();
            dept.DepartmentId = "Department " + _random.Next().ToString();
            dept.SiteId = _mainSite.SiteId;
            dept.DepartmentName = "test dept.";
            _saveManager.SaveEntityOnly(dept);

            //now change the second department to match the values of the 1st           
            dept.DepartmentId = deptExisting.DepartmentId;
            dept.SiteId = deptExisting.SiteId;
            ValidationResult result = _saveManager.SaveEntityOnly(dept);

            Assert.IsFalse(result.IsValid);
            Assert.IsTrue(result.ValidationErrors.Contains(ValidationErrorType.DuplicateError));
        }

The problem was my predicate expressions that filtered out the values from the entity currently being checked. Obviously if you have "WHERE site_id == 1 AND site_id != 1", you will never get rows back. simple_smile

Here is the (hopefully) final version:

        /// <summary>
        /// Validates that there are no duplicates of the entity being passed in.
        /// </summary>
        /// <param name="entityToCheck">The entity whose fields should be checked for duplicates.</param>
        /// <param name="fieldsToCheck">The list of fields whose combination constitues a duplicate.</param>
        /// <returns>False if duplicates are found, True if they are not.</returns>
        public static bool ValidateNoDuplicates(IEntity2 entityToCheck, IList<IEntityField2> fieldsToCheck)
        {
            if (fieldsToCheck == null)
            {
                throw new ArgumentNullException("fieldsToCheck");
            }
            
            RelationPredicateBucket bucket = new RelationPredicateBucket();
            PredicateExpression existingRowPred = new PredicateExpression();

            //add each field from the list
            foreach (IEntityField2 field in fieldsToCheck)
            {
                object value = entityToCheck.Fields[field.Name].CurrentValue;
                bucket.PredicateExpression.Add(
                    new FieldCompareValuePredicate(field, null, ComparisonOperator.Equal, value));

                //filter out the entity that was passed in, only if it's not new--
                //this is because if a modified entity is saved, it's values will cause the count to be one
                if (!entityToCheck.IsNew)
                {
                    //use the DBValue, not the currentValue
                    existingRowPred.Add(new FieldCompareValuePredicate(field, null, ComparisonOperator.Equal,
                        entityToCheck.Fields[field.Name].DbValue));
                }
            }

            existingRowPred.Negate = true;
            bucket.PredicateExpression.Add(existingRowPred);

            //field to count.  Can really be any field
            IEntityField2 fieldToCount = (IEntityField2)entityToCheck.Fields[0];
            ResultsetFields fields = new ResultsetFields(1);
            fields.DefineField(fieldToCount, 0);
            int result = DAOScalarUtility.GetDbCount(fields, bucket, null, true);
            
            return (result < 1);            
        }
daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 14-Oct-2008 08:43:34   

Great! Thanks for sharing simple_smile

David Elizondo | LLBLGen Support Team
clint
User
Posts: 145
Joined: 15-Nov-2005
# Posted on: 13-Oct-2014 20:51:53   

Thanks psandler for the code.
I have used your design for a couple years. But I think I might tweak the design because it currently requires you to explicitly set values for entity fields you are going to compare which is kind of annoying if you just want to use the default values that are assigned when you construct an entity.

Here is an example of the issue.

I have a Person table: NamePrefix (NOT NULL, default value is empty string) FirstName (NOT NULL, default value is empty string) MiddleName (NOT NULLdefault value is empty string) LastName (NOT NULL) NameSuffix (NOT NULL, default value is empty string)

I have a corresponding PersonEntity class with these fields: NamePrefix FirstName
MiddleName LastName
NameSuffix

I make a new Person entity.


PersonEntity person = new Person();

At this point all the string fields have been initialized to empty string: person.NamePrefix == "" person.FirstName == "" person.MiddleName == "" person.LastName == "" person.NameSuffix == ""

Now I explicitly set some fields:


person.FirstName = "JOHN";
person.LastName = "SMITH";

I want to prevent duplicate persons from being added to the database. My definition of duplicate in this case is two records with the same NamePrefix, FirstName, MiddleName, LastName, and NameSuffix. To check for duplicates, I call your ValidateNoDuplicates() function where I pass in this list of fields: NamePrefix, FirstName, MiddleName, LastName, and NameSuffix

Since your function is building a predicate based on the CurrentValues of items in the Fields collection, and CurrentValue is null unless I explicitly set the field, I end up with a predicate like this:


NamePrefix IS NULL AND FirstName = "JOHN" AND MiddleName IS NULL AND LastName = "SMITH" AND NameSuffix IS NULL

In my situation, comparing these fields to NULL is useless because they will never be NULL due to how I designed the Person table in the database.

To get around the problem, I have to explicitly set every field, which is kind of annoying.


person.NamePrefix = "";
person.FirstName  = "JOHN";
person.MiddleName = "";
person.LastName   = "SMITH";
person.NameSuffix = "";

My co-worker ran into this "NULL" problem when he was using a form that had PersonEntity fields databound to GUI controls. If a user didn't fill in a field on the form, then that entity field would never get set, so the CurrentValue for that field would be null.

So I think I might tweak your design so that instead of passing in just a list of fields, I will pass in Dictionary of fields and their values to check. The values in this case would come from person.NamePrefix, person.FirstName, etc. NOT from person.Fields["NamePrefix"].CurrentValue, person.Fields["FirstName"].CurrentValue.

daelmo avatar
daelmo
Support Team
Posts: 8245
Joined: 28-Nov-2005
# Posted on: 14-Oct-2014 08:36:29   

Interesting. Although What is a duplicate? is an important question here. If someone doesn't set some field, and he pass that to the fieldsToCheck collection, that could mean that he wants to check for nulls in the duplication check. If he doesn't want to, maybe he should just don't pass that field. So, a possible answer, given the contribution code above, could be: If you want to compare some field, set the corresponding value (some value, null, or empty string) and add it to the collection.

Your workaround do the trick as well. You also could make an additional check inside the routine to check for nulls or default values, like:

if (fieldToCheck.CurrentValue == null)
{
     object defaultValue = .... // get the default value for this type of field
     bucket.Add( fieldToCheck == null || fieldToCheck == defaultValue)
}
David Elizondo | LLBLGen Support Team