Generated code - Resultset Caching

Preface

The LLBLGen Pro Runtime Framework supports a first-level cache for resultsets of queries. In short, this means developers have a way to cache the resultsets of queries for a period of time. The resultsets are cached as sets of object arrays with the values directly from the DataReader. Caching resultsets is implemented around IRetrievalQuery and an easy to use API is available through Linq to LLBLGen Pro and QuerySpec. Users who want to use the low-level API instead of Linq or QuerySpec can use the new overloads which accept a QueryParameters object to set the cache parameters for a query. Below are the specifics of the caching system and how to use it.

What is resultset caching and when to use it

Resultset caching is a mechanism which caches the actual resultset used during the execution of a select query for a given duration. This means that if a given Linq / QuerySpec query is specified to have its resultset cached for e.g. 10 seconds, every execution of the exact same query with the same parameters will pull the results from the cache for the 10 seconds after the query was first executed on the database. The caching of the resultset is done at the lowest level, at the IRetrievalQuery object, and only the object[] arrays obtained from the Data Reader are cached, no entities or other materialized objects.

Caching can lead to better performance in situations where a query is taking a relatively long time in the database and/or is executed a lot of times: by automatically re-using already read rows from the cache instead of from the database, a round trip + query execution is omitted. This is completely transparent and automatic: the only thing the developer has to do is to specify with a query whether it should cache its resultset and for how long.

The caching system is opt-in per query, so by default all queries are executed on the database, every time, and no caching takes place. This is deliberate: a query which is defined to cache its results might pull its resultset from the cache instead of the database, and that data might therefore be out of date with the data in the database. The cached data is 'stale' data, but for many situations this is acceptable for a short period of time, and in those situations it's OK to cache the results. For example, if your website's frontpage data changes once a day, it's OK to cache that data, as subsequent fetches of a query Q will likely return the same data.

Resultset caching is done through cache objects which implement the caching of resultsets, either by delegating it to 3rd party caching systems like the .NET system cache or e.g. Memcached, or by doing the caching themselves, like the built-in ResultsetCache class. It's an easy to use way to tweak performance at the query level. As with all performance tweaks: use them in situations where they're needed. Enabling caching by default is therefore not recommended nor implemented: unforeseen usage of stale data is unavoidable that way and caching might consume more memory than anticipated.

Safe re-use of data

The cached data is re-used as long as the cached set is valid and its duration hasn't expired. This is safe for values of all types except byte[] values. As all values except byte[] values are value types (strings behave as value types here as well), they're copied by the .NET framework when a row is copied from the cached resultset into its final destination (entity, DataTable etc.). Byte[] values are different: each instance which receives a cached row as its set of values will share the same byte[] array, for performance and memory reasons. This isn't a problem in most cases, as a byte[] value is usually treated as a single value: it's replaced by a different byte[] object if it has to change.

It's only a problem when a byte[] array's contents are changed, e.g. the byte at index 10 is changed: in that case all instances which re-use the cached row will see this change. That's not necessarily a bad thing, as the set is after all the same one returned by the same query; however, this behavior might be unwanted. To overcome this, make sure you first copy the byte[] before manipulating it. The framework doesn't copy the byte[] by default because that would mean a performance hit which is likely unnecessary.

For setting entity field values, the following rule is in effect: as cached rows are copied when they're used, setting an entity field's value to a different value will only affect the values in that particular entity class instance, not the entity class instances which received the same row from the cache, as the value inside the entity is replaced/overwritten. Manipulating a byte[] field's value by replacing bytes at indexes inside the array will affect all instances which share that value: byte[] arrays are copied by reference, not by value, so all entities share a reference to the same byte[] array. Setting a field of type byte[] to a different byte[] value will overwrite the reference for that field of that entity with the new byte[], the same as with all other values. So unless you're doing byte manipulation inside a byte[] array value of an entity field while working with cached resultsets, you don't have to worry about this. If you do manipulate bytes in a byte[] array value of an entity field, it's best not to cache the resultset, unless it's OK that all instances which re-use the cached resultset will see the manipulated bytes.
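To illustrate the safe pattern, the minimal sketch below clones the byte[] before manipulating it in place. It assumes an entity ('employee') with a byte[] field 'Photo' which was materialized from a cached resultset; the names are only for illustration.

// 'employee' came from a cached resultset, so its Photo byte[] may be shared with
// other instances materialized from the same cached row.
var photoCopy = (byte[])employee.Photo.Clone();   // private copy, safe to manipulate in place
photoCopy[10] = 0xFF;                             // other instances don't see this change
employee.Photo = photoCopy;                       // replacing the reference is always safe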

Registering a cache: the CacheController

Before caching can take place, at least one cache has to be registered with the CacheController. The CacheController is a static class which keeps track of the registered caches and is consulted by the query object to see whether a cached set of rows is available for that query. It's key that a cache is registered before a query which is specified to have its resultset cached is executed.

Caches are implementations of a simple interface, IResultsetCache. A default implementation is provided in the runtime framework, called ResultsetCache. The default implementation doesn't have any memory restrictions, though it does provide a thread-safe, in-process way to cache resultsets without external dependencies. The interface allows support for 3rd party caches like the .NET system cache introduced in .NET 4, or for example Memcached or Redis. As the .NET system cache requires .NET 4 or higher and the runtime requires .NET 3.5 or higher, we didn't include an implementation for MemoryCache, the default .NET cache class, in the runtime.

Example of registering an instance of the ResultsetCache:

CacheController.RegisterCache(connectionString, new ResultsetCache());

The ResultsetCache class

The ResultsetCache class is a default caching object: it works in-process, caches resultsets under a key and allows auto-purging of resultsets whose expiration date/time has passed. It doesn't monitor memory usage and therefore shouldn't be used when memory pressure will be a problem (e.g. a lot of objects are cached and removal based on memory pressure is required).

Cache per connection string or one cache for all

Registering a cache with the CacheController is simple: call CacheController.RegisterCache as described above and specify the connection string and the IResultsetCache instance which has to be registered. This ties the specified cache object to the specified connection string, which makes sure that resultsets from different databases are cached separately for the same query. It's also possible to specify the empty string as the connection string, which means all sets are cached in the same cache. This is not a problem if there's just one connection string used.

The CacheController also offers a way to obtain a reference to a registered cache object and to manipulate the cache and cached results directly through its public interface. Most of the time, however, the only method needed is RegisterCache. RegisterCache should be called before caching takes place, so it's ideal to register the caches at application startup.
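For example, the sketch below registers caches at application startup. The connection string variables are hypothetical; registering with the empty string is the alternative when one cache should serve all connection strings.

// a dedicated cache per connection string...
CacheController.RegisterCache(salesDbConnectionString, new ResultsetCache());
CacheController.RegisterCache(reportingDbConnectionString, new ResultsetCache());

// ...or alternatively, one cache for all connection strings
CacheController.RegisterCache(string.Empty, new ResultsetCache());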

Catalog / Schema name overwriting and caching

If catalog/schema name overwriting is used, it's not a problem: the key under which the sets are stored in the cache is based on the SQL statement. Overwriting a catalog/schema name results in a different SQL statement and therefore the query will be cached under a different key.

Fetching DataReaders for projections and caching resultsets

When a DataReader is fetched directly using the low-level API and the fetched, open DataReader is passed to a projection method to produce instances from the rows offered by the DataReader, cached resultsets will not be re-used if the overload is used which doesn't accept the IRetrievalQuery used to open the DataReader. This is the case in all code written against v3.5 or earlier. If you want to cache the resultset offered by the reader during the projection of the resultset, be sure to use the overload which accepts the IRetrievalQuery object, and make sure the IRetrievalQuery object's CacheResultset and CacheDuration properties are set.

The CacheKey

Resultsets are cached in a cache under a key, a CacheKey instance. A CacheKey is created from an IRetrievalQuery object which provides the values for the key.

A CacheKey uses the command text (the SQL statement) of the query's command and the names and values of all parameters of that command.

A hashcode is calculated from these values and an Equals implementation makes sure two CacheKey instances are only considered equal if all of these values are indeed equal. This makes sure that two queries which differ only in e.g. parameter values will not return each other's resultsets from the cache. Two pieces of code which are fetching data differently (e.g. a normal entity collection fetch and a projection using all entity fields onto a custom class) but result in the same SQL query and parameters will re-use the same set from the cache, if both queries are defined to be cachable.

CacheKey and 3rd party caches

Third-party caches like Memcached and the .NET system cache require short keys for the values they cache, or a key which is e.g. of type string. To overcome this, an IResultsetCache implementation for a 3rd party cache system should create mappings between a CacheKey object and a value which is used instead of the CacheKey as the real key. A way to do this is by using Guid values for the real key and storing the mappings in a Dictionary inside the IResultsetCache implementation. Guids provide values which are unique across the board and are easily converted to short string keys.
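The sketch below shows this mapping approach with the .NET 4 MemoryCache (System.Runtime.Caching) as the backing store. It only illustrates the CacheKey-to-string mapping; the Add/Get methods merely mirror the kind of members an IResultsetCache implementation has to provide, so consult the IResultsetCache interface in the runtime for the exact members to implement.

// Sketch: map CacheKey instances to short string keys for a store which requires string keys.
public class SystemCacheBackedResultsetCache
{
    private readonly Dictionary<CacheKey, string> _keyMappings = new Dictionary<CacheKey, string>();
    private readonly object _semaphore = new object();

    public void Add(CacheKey key, object toCache, TimeSpan duration)
    {
        string realKey;
        lock(_semaphore)
        {
            if(!_keyMappings.TryGetValue(key, out realKey))
            {
                realKey = Guid.NewGuid().ToString("N");   // short, unique string key
                _keyMappings[key] = realKey;
            }
        }
        MemoryCache.Default.Set(realKey, toCache, DateTimeOffset.Now.Add(duration));
    }

    public object Get(CacheKey key)
    {
        lock(_semaphore)
        {
            string realKey;
            // no mapping means nothing was cached under this CacheKey
            return _keyMappings.TryGetValue(key, out realKey) ? MemoryCache.Default.Get(realKey) : null;
        }
    }
}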

Prefetch Paths / Nested queries in projections and caching

Queries formulated through Linq or QuerySpec which have a prefetch path defined or have nested queries in the projection, and which are specified to have their resultsets cached, will propagate the cache directive to all queries executed with the main query: the prefetch path queries will be cached as well, at the same time and for the same duration. This is true for nested queries in a main query as well: they too inherit the cache specifics of the root query.

The main reason this is done is that the complete graph is fetched at the same time, and as child nodes rely on the parent rows for filtering (child sets are filtered based on the parent nodes present), it's better to keep the whole chain on the same cache strategy and duration: that way child nodes rely on root nodes which are from the same 'snapshot' of data.

Prefetch paths fetched with a single entity as root

As the query to fetch a single entity is not a query which results in a set, there's no method to specify caching for it, so a prefetch path fetched with a single entity as root is never pulled from the cache: it's always fetched from the database.

Database connections and caching

Caching occurs at a low level and is directed by the IRetrievalQuery instance. This means the caching logic isn't in control of the connection the query is associated with: that connection might very well be open or closed. The logic which opens/closes the connection doesn't know when/if caching occurs, so it can't skip opening the connection either. This means that when a set is read from the cache, the connection is already open and will stay open until the set from the cache is consumed.

This behavior is acceptable, because the IRetrievalQuery instance can now decide internally whether to proceed with the cached set or to fetch the set from the database, without the requirement of controlling the connection state: if a set is not available in the cache, e.g. because it has expired, the IRetrievalQuery must fetch the set from the database again. Because the connection state is controlled outside the IRetrievalQuery object, it can simply proceed as it would when the query wasn't a cachable query.

Specifying caching behavior for a query

LLBLGen Pro has three ways to specify a query: Linq, QuerySpec and the Low-level API. We primarily added caching specification code to Linq and QuerySpec, as those two query APIs are the recommended way to formulate queries. Specifying caching with these three ways of querying is described below.

Low-level API

If you want to use the low-level API, you can still specify caching behavior, however you have to call the methods which accept a QueryParameters object. Instead of calling the regular methods, you produce a QueryParameters object, set the caching specification on that object and then call the overload which accepts the QueryParameters object. See the example below.

// v3.5 code
var customers = new EntityCollection<CustomerEntity>();
using(var adapter = new DataAccessAdapter())
{
    var filter = new RelationPredicateBucket(CustomerFields.Country=="USA");
    adapter.FetchEntityCollection(customers, filter);
}

// Same query with caching
var customers = new EntityCollection<CustomerEntity>();
using(var adapter = new DataAccessAdapter())
{
    var parameters = new QueryParameters()
        {
            CollectionToFetch = customers,
            FilterToUse = CustomerFields.Country=="USA",
            CacheResultset = true,
            CacheDuration = new TimeSpan(0, 0, 10)  // cache for 10 seconds
        };
    adapter.FetchEntityCollection(parameters);
}

QuerySpec

As a higher-level query API on top of the low-level API, QuerySpec has an easier way to specify caching for a query: .CacheResultset(int) and .CacheResultset(TimeSpan). You can append this method to any QuerySpec query, be it a DynamicQuery, DynamicQuery<T> or EntityQuery, and the query is automatically marked as a query which caches its resultset for the duration specified and will re-use a cached resultset if present. CacheResultset() should be called on the outside of the query.

There are two overloads: one takes an integer as the number of seconds to cache the resultset, the other takes a TimeSpan as the duration. See the examples below.
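The TimeSpan overload is useful for durations which aren't whole seconds. A small sketch, in the same style as the entity fetch example below:

// entity fetch, cached for 2.5 minutes
var qf = new QueryFactory();
var q = qf.Customer
            .Where(CustomerFields.Country == "USA")
            .CacheResultset(TimeSpan.FromMinutes(2.5));
var customers = new DataAccessAdapter().FetchQuery(q);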

// typed list fetch 
var tl = new OrderCustomerTypedList();
var q = tl.GetQuerySpecQuery(new QueryFactory())
                 .CacheResultset(10);  // cache for 10 seconds
new DataAccessAdapter().FetchAsDataTable(q, tl);

// typed view fetch
var qf = new QueryFactory();
var q = qf.Create()
            .Select(InvoicesFields.CustomerId, InvoicesFields.CustomerName, InvoicesFields.OrderId)
            .Where(InvoicesFields.Country.StartsWith("U"))
            .CacheResultset(10);        // cache for 10 seconds
var tv = new InvoicesTypedView();
new DataAccessAdapter().FetchAsDataTable(q, tv);

// dynamic list / custom projection
var qf = new QueryFactory();
var q = qf.Employee
        .From(QueryTarget.InnerJoin(qf.Order)
                   .On(EmployeeFields.EmployeeId == OrderFields.EmployeeId))
        .OrderBy(EmployeeFields.EmployeeId.Ascending()).Offset(1).Distinct()
        .Select(() => new
            {
                EmployeeId = EmployeeFields.EmployeeId.ToValue<int>(),
                Notes = EmployeeFields.Notes.ToValue<string>()
            })
        .CacheResultset(5);    // cache for 5 seconds
var results = new DataAccessAdapter().FetchQuery(q);

// custom projection with nested set
var qf = new QueryFactory();
var q = qf.Create()
        .Select(() => new
          {
                Key = CustomerFields.Country.ToValue<string>(),
                CustomersInCountry = qf.Customer.TargetAs("C")
                                         .CorrelatedOver(CustomerFields.Country.Source("C") ==
                                                         CustomerFields.Country)
                                         .ToResultset()
          })
        .GroupBy(CustomerFields.Country)
        .CacheResultset(10);     // cache all for 10 seconds
var results = new DataAccessAdapter().FetchQuery(q);

// entity fetch
var qf = new QueryFactory();
var q = qf.Customer
            .Where(CustomerFields.Country == "Germany")
                 .WithPath(CustomerEntity.PrefetchPathOrders
                               .WithSubPath(OrderEntity.PrefetchPathEmployee),
                                      CustomerEntity.PrefetchPathEmployeeCollectionViaOrder)
            .CacheResultset(10);    // cache all for 10 seconds
var customers = new DataAccessAdapter().FetchQuery(q);

Linq to LLBLGen Pro

Linq to LLBLGen Pro follows a similar pattern as QuerySpec in this regard: use the .CacheResultset(int) or .CacheResultset(TimeSpan) extension methods on an IQueryable to specify the caching behavior of the query. As with QuerySpec, specify the CacheResultset call on the outside of the query; it's not migrated to the outer query if you specify it inside a query which is wrapped in another query.

Linq only allows queries to be fetched as entities or projections. This means that caching resultsets for typed views and typed lists requires the usage of QuerySpec. You can write QuerySpec queries alongside Linq queries in the same application, so this should be no problem.

Examples:

// custom projection, with nested set
var metaData = new LinqMetaData(new DataAccessAdapter());
var q = (from c in metaData.Customer
          group c by c.Country into g
          select new
          {
             g.Key,
             CustomersInCountry = (
                  from c2 in metaData.Customer
                  where g.Key == c2.Country
                  select c2)
          })
         .CacheResultset(10);    // cache all for 10 seconds

// entity fetch with prefetch path
var metaData = new LinqMetaData(new DataAccessAdapter());
var q = (from c in metaData.Customer
          where c.Country == "Germany"
          select c).WithPath<CustomerEntity>(cpath => cpath
                 .Prefetch<OrderEntity>(c => c.Orders)
                     .SubPath(opath => opath
                         .Prefetch(o => o.OrderDetails)
                         .Prefetch<EmployeeEntity>(o => o.Employee).Exclude(e => e.Photo, e => e.Notes)))
          .CacheResultset(10);    // cache all for 10 seconds


LLBLGen Pro Runtime Framework v4.0 documentation. ©2013 Solutions Design bv