Resultset Caching

The LLBLGen Pro Runtime Framework supports a first-level cache for resultsets of queries. In short, this means developers have a way to cache resultsets of queries for a period of time. The resultsets are cached as sets of object arrays with the values directly from the datareader.

Caching resultsets is implemented around IRetrievalQuery, and an easy-to-use API is available through Linq to LLBLGen Pro and QuerySpec. Users who want to use the low-level API instead of Linq or QuerySpec can use the overloads which accept a QueryParameters object to set the cache parameters for a query. Below are the specifics of the caching system and how to use it.

What is it and when to use it

Resultset caching is a mechanism which caches the actual resultset used during the execution of a select query for a given duration. This means that if a given Linq / QuerySpec query is specified to have its resultset cached for e.g. 10 seconds, every execution of the exact same query with the same parameters will pull the results from the cache for the 10 seconds after the query was first executed on the database.

The caching of the resultset is done at the lowest level, at the IRetrievalQuery object, and only the object[] arrays obtained from the Data Reader are cached, no entities or other materialized objects.

Caching can lead to better performance in situations where a query is taking a relatively long time in the database and/or is executed a lot of times: by automatically re-using already read rows from the cache instead of from the database, a round trip + query execution is omitted. This is completely transparent and automatic: the only thing the developer has to do is to specify with a query whether it should cache its resultset and for how long.

The caching system is opt-in per query, so by default all queries are executed on the database, every time, and no caching takes place. This default is deliberate: a query which is defined to cache its results might pull its resultset from the cache instead of the database, which means the data might be out-of-date with the data in the database. The cached data is 'stale' data, but for many situations this is acceptable for a short period of time.

In situations where it's OK to have stale data for a short period of time, it's OK to cache the results. For example, if your website's front page data changes once a day, it's OK to cache that data, as subsequent executions of the same query will likely return the same data.

Resultset caching is done through cache objects which implement the caching of resultsets, either by delegating it to a 3rd party cache like the .NET system cache or Memcached, or by doing the caching themselves, like the built-in ResultsetCache class.

It's an easy-to-use way to tweak performance at the query level. As with all performance tweaks: use them in situations where they're needed. Enabling caching by default is therefore neither recommended nor implemented: unforeseen usage of stale data is unavoidable that way, and caching might consume more memory than anticipated.

Safe re-use of data

The cached data is re-used as long as the cached set is valid and its duration hasn't expired. This is safe for values of all types except byte[] values: all other values are value types (strings behave as value types as well), so they're copied by the .NET framework when a row is copied from the cached resultset into its final destination (entity, datatable etc.).

Byte[] values are different: each instance which gets a cached row as its set of values will share the same byte[] array, for performance and memory reasons. This isn't a problem in most cases, as a byte[] value is typically treated as a single value: it's replaced by a different byte[] object if it has to change.

It's only a problem when a byte[] array's contents are changed in place, e.g. the byte at index 10 is changed: in that case all instances which re-use the cached row will see this change. That's not necessarily a bad thing, as the set is after all the same one returned by the same query, but this behavior might be unwanted.

To overcome this, make sure you first copy the byte[] before manipulating it. The framework doesn't copy the byte[] by default because it would mean a performance hit while it's likely to be unnecessary.
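For example, a minimal sketch of copying before manipulating ('employee' and its 'Photo' field are hypothetical; Photo is assumed to be a byte[] entity field):

// 'employee' was materialized from a cached resultset, so employee.Photo may be a byte[]
// shared with other instances. Copy it first, manipulate the copy, then assign it back.
var photoCopy = (byte[])employee.Photo.Clone();
photoCopy[10] = 0xFF;            // change the copy, not the shared cached array
employee.Photo = photoCopy;      // replaces the reference in this entity instance only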

For setting entity field values, the following rule is in effect: as cached rows are copied when they're used, setting an entity field's value to a different value will only affect the values in that particular entity class instance, not the entity class instances which have received the same row from the cache, as the value inside the entity will be replaced/overwritten.

Manipulating a field's value if it's a byte[] by replacing bytes at indexes inside the array will affect all instances which share that value, as byte[] arrays are copied by reference, not by value, so all entities share a reference to the same byte[] array.

Setting a field of type byte[] to a different byte[] value will overwrite the reference for that field of that entity to the new byte[], the same as with all other values. So unless you're doing byte manipulation inside a byte[] array value of an entity field and work with cached resultsets, you don't have to worry about it.

If you do manipulate bytes in a byte[] array value of an entity field, it's best not to cache the resultset, unless it's OK that all instances which will re-use the cached resultset will see the manipulated bytes.

Registering a cache: the CacheController

Before caching can take place, at least one cache has to be registered with the CacheController. The CacheController is a static class which keeps track of the registered caches and is consulted by the query object to determine whether a cached set of rows is available for that query. It's key that a cache is registered before a query which is specified to have its resultset cached is executed.

Caches are implementations of a simple interface, IResultsetCache. A default implementation is provided in the runtime framework called ResultsetCache. The default implementation doesn't have any memory restrictions, though it does provide a thread-safe in-process way to cache resultsets without external dependencies.

The interface allows support for 3rd party caches like the .NET system cache introduced in .NET 4, Memcached or Redis. As the .NET system cache requires .NET 4 or higher and the runtime targets .NET 3.5 or higher, we didn't include an implementation for MemoryCache, the default .NET cache class, in the runtime.

Example of registering an instance of the ResultsetCache:

CacheController.RegisterCache(connectionString, new ResultsetCache()); 

The ResultsetCache class

The ResultsetCache class is a default caching object: it works in-process, caches resultsets under a key and allows auto-purging of resultsets whose expiration date/time has passed. It doesn't monitor memory usage and therefore shouldn't be used when memory pressure will be a problem (e.g. a lot of objects are cached and removal based on memory pressure is required).

Creating your own cache class

If the provided ResultsetCache class isn't suitable for your needs, e.g. because you want to cache in Redis or use the .NET 4+ MemoryCache class, you can create your own. We provide an example of how to do that in the LLBLGen Pro contrib library on GitHub, which implements a cache class utilizing the MemoryCache class provided by .NET 4.0+.
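To illustrate the general shape of such an adapter, below is a minimal sketch of the storage and expiration handling a custom cache class would delegate to System.Runtime.Caching.MemoryCache. The class and member names are assumptions for illustration only and are not the actual IResultsetCache members; consult the contrib example and the reference manual for the real interface.

using System;
using System.Runtime.Caching;

// Sketch only: shows storing a set under a key with an absolute expiration, retrieving
// it, and purging it, the work a custom IResultsetCache implementation delegates to
// MemoryCache. Not the real IResultsetCache interface.
public class MemoryCacheBackedResultsetStore
{
    private readonly MemoryCache _cache = MemoryCache.Default;

    public void Add(string key, object resultset, TimeSpan duration)
    {
        // Absolute expiration: the set becomes invalid after 'duration' has passed.
        _cache.Set(key, resultset, DateTimeOffset.UtcNow.Add(duration));
    }

    public object Get(string key)
    {
        // Returns null when the set isn't present (or has expired), which signals the
        // caller to execute the query against the database instead.
        return _cache.Get(key);
    }

    public void Purge(string key)
    {
        _cache.Remove(key);
    }
}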

Cache per connection string or one cache for all

Registering a cache with the CacheController is simple: call CacheController.RegisterCache as described above and specify the connection string and the IResultsetCache instance which has to be registered. This ties the specified cache object to the specified connection string, which makes sure that resultsets from different databases are cached separately for the same query.

It's also possible to specify the empty string as connection string, which means all sets are cached in the same cache. This is not a problem if there's just 1 connection string used.

CacheController also offers a way to obtain a reference to a registered cache object and to manipulate the cache and cached results directly through its public interface. Most of the time, however, the only method needed is the RegisterCache method. RegisterCache should be called before caching is going to take place, so it's ideal if the caches are registered at application startup.
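A minimal sketch of such startup registration (the connection string parameters are placeholders for your own configured connection strings):

// Call this once at application startup, before any query which caches its resultset runs.
public static void RegisterResultsetCaches(string northwindConnectionString, string salesConnectionString)
{
    // One cache per connection string: resultsets from the two databases are kept separately.
    CacheController.RegisterCache(northwindConnectionString, new ResultsetCache());
    CacheController.RegisterCache(salesConnectionString, new ResultsetCache());

    // Alternatively, register one cache under the empty string, so all sets are cached
    // in the same cache.
    // CacheController.RegisterCache(string.Empty, new ResultsetCache());
}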

Info

The connection string used to look up the registered resultset cache is obtained from the active DbConnection object on the query whose resultset has to be cached. If you specify the password in the connection string, the ADO.NET provider you're using might strip it out of the connection string reported by the active DbConnection object. For instance, if you're using SQL Server, persist security info=false is the default, which means the password will be stripped out of the connection string.

This has consequences for what you have to specify to register the cache with the CacheController. In the situation described above, if you pass a connection string with a password to the CacheController, the connection string obtained from the DbConnection will differ from the one you used to register the cache and no cache will be found. To solve this, don't pass a connection string with the password keyword and the password to the CacheController.

If you specify the 'Password=somepassword' name-value pair at the end of the connection string, some ADO.NET providers will simply strip it off but leave the ';' before it in place; e.g. SqlClient does this. To make sure the connection string you specify when registering the cache matches the one returned by DbConnection.ConnectionString, specify 'Password=somepassword' before the last name-value pair.
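As a minimal sketch of the advice above (the connection strings are placeholders, and the exact string a provider reports back can differ slightly, e.g. in the trailing ';' as noted above, so verify against DbConnection.ConnectionString):

// Registering with the password included won't match what SqlClient reports back when
// persist security info=false (the default), so this lookup would never succeed:
// CacheController.RegisterCache(
//     "data source=myServer;initial catalog=Northwind;user id=myUser;password=secret",
//     new ResultsetCache());

// Register without the password keyword instead, so the registered string matches the
// one obtained from DbConnection.ConnectionString at query time:
CacheController.RegisterCache(
    "data source=myServer;initial catalog=Northwind;user id=myUser",
    new ResultsetCache());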

Catalog / Schema name overwriting and caching

If catalog/schema name overwriting is used, it's not a problem: the key under which the sets are stored in the cache is based on the SQL statement. Overwriting a catalog/schema name will result in a different SQL statement and therefore the query will be cached under a different key.

Fetching DataReaders for projections and caching resultsets

Fetching a DataReader directly using the low-level API and then passing the fetched, open DataReader to a projection method to produce instances from its rows will not re-use cached resultsets if you use the overload which doesn't accept the IRetrievalQuery used to open the DataReader.

This is the case in all code written against v3.5 or earlier. If you want to cache the resultset offered by the reader during the projection of the resultset, be sure to use the overload which accepts the IRetrievalQuery object and make sure the IRetrievalQuery object's CacheResultset and CacheDuration properties are set.

Switching off resultset caching

It's possible to switch off resultset caching, e.g. when you're debugging code and want the code to hit the database every time. To do so, set the property CacheController.CachingEnabled to false (default is true). This is a global setting and will switch off all resultset caching.
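For example, a minimal sketch (the #if DEBUG guard is just one way to scope this to debug builds):

#if DEBUG
// Globally disable resultset caching so every query hits the database while debugging.
CacheController.CachingEnabled = false;
#endif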

The CacheKey

Resultsets are cached in a cache under a key, a CacheKey instance. A CacheKey is created from an IRetrievalQuery object which provides the values for the key.

A CacheKey uses:

  • The SQL statement
  • All parameter values before execution
  • Flags for in-memory distinct/limiting/paging
  • Values for rows to skip/take in in-memory paging scenarios

A hashcode is calculated from these values and an Equals implementation makes sure two CacheKey instances are only considered equal if all of these values are indeed equal. This makes sure that two queries which differ only in e.g. parameter values will not return each other's resultsets from the cache.

Two pieces of code which are fetching data differently (e.g. a normal entity collection fetch and a projection using all entity fields onto a custom class) but result in the same SQL query and parameters will re-use the same set from the cache, if both queries are defined to be cachable.

CacheKey and 3rd party caches

Third-party caches like Memcached and the .NET system cache require a short key for the values they cache, e.g. a key of type string. To overcome this, an IResultsetCache implementation for a 3rd party cache system should create mappings between a CacheKey object and a value which is used instead of the CacheKey as the real key.

A way to do this is by using Guid values as the real key values and storing these in a Dictionary inside the IResultsetCache implementation. Guids are an easy way to provide unique values across the board and make sure the resultset can be stored in a cache which requires such short keys.
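A minimal sketch of such a mapping inside a custom IResultsetCache implementation (the class and member names are hypothetical; CacheKey comes from the runtime, assumed here to live in SD.LLBLGen.Pro.ORMSupportClasses, and its Equals/GetHashCode implementation described above makes it usable as a dictionary key):

using System;
using System.Collections.Concurrent;
using SD.LLBLGen.Pro.ORMSupportClasses;

// Hypothetical helper for an IResultsetCache implementation which stores its data in a
// cache that only accepts string keys (e.g. Memcached). Each distinct CacheKey gets its
// own Guid-based string key, generated once and re-used afterwards.
public class CacheKeyToStringKeyMap
{
    private readonly ConcurrentDictionary<CacheKey, string> _keyMap =
        new ConcurrentDictionary<CacheKey, string>();

    public string GetOrAddStringKey(CacheKey key)
    {
        return _keyMap.GetOrAdd(key, k => Guid.NewGuid().ToString("N"));
    }

    public bool TryGetStringKey(CacheKey key, out string stringKey)
    {
        // Used on retrieval: no mapping means the resultset was never cached.
        return _keyMap.TryGetValue(key, out stringKey);
    }
}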

Prefetch Paths / Nested queries in projections and caching

Queries formulated through Linq or QuerySpec which have a prefetch path defined or have nested queries in the projection, and which are specified to have their resultsets cached, will propagate the cache directive to all queries executed with the main query: the queries of the prefetch path nodes are cached as well, at the same time and for the same duration.

This is true for nested queries in a main query as well: they too will inherit the cache specifics of the root query. You can make an exception on a per-node level, see below.

The main reason this is done is that the complete graph is fetched at the same time, and as child nodes rely on the parent rows for filtering (child sets are filtered based on the parent nodes present), it's better to keep the whole chain on the same cache strategy and duration: that way child nodes are relying on root nodes which are from the same 'snapshot' of data.

Exclude prefetch path nodes from resultset caching

It might be that, for performance reasons, the resultsets of parts of the prefetch path have to be cached while other nodes should not be cached because they have to be read from the database every time. In both Linq and QuerySpec you can specify per node in a prefetch path that its resultset should not be cached. This is called the NoCaching directive.

NoCaching in Linq

We support two prefetch path systems out of the box: using PathEdge objects and using WithPath / lambdas. Both systems have support for NoCaching. Below are two examples which each fetch the prefetch path Customers / Orders / OrderDetails.

Caching is specified on the root query; however, on the Orders node NoCaching is specified, meaning the Order entities are always fetched from the database. The Customer and OrderDetail entities will be read, if present, from the resultset cache.

For PathEdge the method WithNoCaching() should be used as shown in the example below. For the lambda system the method NoCaching() should be used, in line with the other methods like OrderBy().

// Using PathEdge objects
var q = (from c in metaData.Customer
         where c.Country == "Germany"
         orderby c.CustomerId
         select c)
            .WithPath(new PathEdge<OrderEntity>
                          (
                              CustomerEntity.PrefetchPathOrders, null, 
                              new SortExpression(OrderFields.OrderId.Ascending()), 0,
                              new PathEdge<OrderDetailEntity>(OrderEntity.PrefetchPathOrderDetails)
                          ).WithNoCaching())
            .CacheResultset(10);

// Using the lambda-based WithPath
var q = (from c in metaData.Customer
         where c.Country == "Germany"
         orderby c.CustomerId
         select c)
             .WithPath(cp => cp.Prefetch<OrderEntity>(c => c.Orders)
                .OrderBy(o => o.OrderId)
                .NoCaching()
                .SubPath(op => op.Prefetch<OrderDetailEntity>(o => o.OrderDetails)))
             .CacheResultset(10);

NoCaching in QuerySpec

Using NoCaching in QuerySpec is similar to how it's used in Linq: the method WithNoCaching() should be used to specify that the resultset of the node shouldn't be cached.

var qf = new QueryFactory();
var q2 = qf.Customer
        .Where(CustomerFields.Country == "Germany")
        .OrderBy(CustomerFields.CustomerId.Ascending())
        .WithPath(CustomerEntity.PrefetchPathOrders
                .WithOrdering(OrderFields.OrderId.Ascending())
                .WithNoCaching()
                .WithSubPath(OrderEntity.PrefetchPathOrderDetails))
        .CacheResultset(10);

Prefetch paths fetched with a single entity as root

As the query to fetch a single entity is not a query which results in a set, there's no method to specify caching, so a prefetch path fetched with a single entity as root is never pulled from the cache; it's always fetched from the database.

Database connections and caching

Caching occurs at a low level and is directed by the IRetrievalQuery instance. This means the IRetrievalQuery isn't in control of the connection it's associated with: that connection might very well be open or closed, and the query object doesn't control that.

Logic which opens/closes the connection doesn't know when or if caching occurs, so it can't avoid opening the connection either. This means that when a set is read from the cache, the connection is already open, and will stay open until the set from the cache is consumed.

This behavior is OK, because the IRetrievalQuery instance can decide internally whether to proceed with the cached set or to fetch the set from the database, without needing to control the connection state: if a set is not available in the cache, e.g. because it has expired, the IRetrievalQuery must fetch the set from the database again.

Because the connection state is controlled outside the IRetrievalQuery object, it can simply proceed as it would when the query wasn't a cachable query.

Specifying caching behavior for a query

LLBLGen Pro has three ways to specify a query: Linq, QuerySpec and the Low-level API. We primarily added caching specification code to Linq and QuerySpec, as those two query APIs are the recommended way to formulate queries. Specifying caching with these three ways of querying is described below.

Low-level API

If you want to use the low-level API, you can still specify caching behavior; however, you have to call the methods which accept a QueryParameters object. Instead of calling the regular overloads, you create a QueryParameters object, set the caching specification on it and then call the overload which accepts that QueryParameters object. See the example below.

// v3.5 code
var customers = new EntityCollection<CustomerEntity>();
using(var adapter = new DataAccessAdapter())
{
    var filter = new RelationPredicateBucket(CustomerFields.Country=="USA");
    adapter.FetchEntityCollection(customers, filter);
}

// Same query with caching
var customers = new EntityCollection<CustomerEntity>();
using(var adapter = new DataAccessAdapter())
{
    var parameters = new QueryParameters()
        {
                 CollectionToFetch = customers,
                 FilterToUse = CustomerFields.Country=="USA", 
                 CacheResultset = true,
                 CacheDuration = new TimeSpan(0, 0, 10)  // cache for 10 seconds
        };
    adapter.FetchEntityCollection(parameters);
}

QuerySpec

As a higher level query API on top of the low-level API, QuerySpec has an easier way to specify caching for a query: .CacheResultset(int) and .CacheResultset(TimeSpan). You can append this method to any QuerySpec query, be it a DynamicQuery, DynamicQuery<T> or EntityQuery<T>, and the query is automatically marked as a query which caches its resultset for the duration specified and re-uses a cached resultset if present. CacheResultset() should be called on the outside of the query.

There are two overloads: one takes an int as the number of seconds to cache the resultset, the other takes a TimeSpan as the duration. See the examples below.

// typed list fetch 
var tl = new OrderCustomerTypedList();
var q = tl.GetQuerySpecQuery(new QueryFactory())
                .CacheResultset(10);  // cache for 10 seconds
new DataAccessAdapter().FetchAsDataTable(q, tl);

// typed view fetch
var qf = new QueryFactory();
var q = qf.Create()
            .Select(InvoicesFields.CustomerId, InvoicesFields.CustomerName, 
                    InvoicesFields.OrderId)
            .Where(InvoicesFields.Country.StartsWith("U"))
            .CacheResultset(10);        // cache for 10 seconds
var tv = new InvoicesTypedView();
new DataAccessAdapter().FetchAsDataTable(q, tv);

// dynamic list / custom projection
var qf = new QueryFactory();
var q = qf.Employee
            .From(QueryTarget.InnerJoin(qf.Order)
                       .On(EmployeeFields.EmployeeId == OrderFields.EmployeeId))
            .OrderBy(EmployeeFields.EmployeeId.Ascending()).Offset(1).Distinct()
            .Select(() => new
                {
                    EmployeeId = EmployeeFields.EmployeeId.ToValue<int>(),
                    Notes = EmployeeFields.Notes.ToValue<string>()
                })
            .CacheResultset(5);    // cache for 5 seconds
var results = new DataAccessAdapter().FetchQuery(q);

// custom projection with nested set
var qf = new QueryFactory();
var q = qf.Create()
            .Select(() => new
            {
                    Key = CustomerFields.Country.ToValue<string>(),
                    CustomersInCountry = qf.Customer.TargetAs("C")
                                             .CorrelatedOver(CustomerFields.Country.Source("C") ==
                                                             CustomerFields.Country)
                                             .ToResultset()
            })
            .GroupBy(CustomerFields.Country)
            .CacheResultset(10);     // cache all for 10 seconds
var results = new DataAccessAdapter().FetchQuery(q);

// entity fetch
var qf = new QueryFactory();
var q = qf.Customer
            .Where(CustomerFields.Country == "Germany")
            .WithPath(CustomerEntity.PrefetchPathOrders
                          .WithSubPath(OrderEntity.PrefetchPathEmployee),
                      CustomerEntity.PrefetchPathEmployeeCollectionViaOrder)
            .CacheResultset(10);    // cache all for 10 seconds
var customers = new DataAccessAdapter().FetchQuery(q);

// A poco typed view mapped onto a stored procedure resultset.
// This uses the IRetrievalQuery instance to specify the caching directives.
var qf = new QueryFactory();
var query = RetrievalProcedures.GetCustOrdersDetailCallAsQuery(10254);
query.CacheResultset = true;
query.CacheDuration = new TimeSpan(0, 0, 10);
var rows = new DataAccessAdapter()
            .FetchQueryFromSource(qf.GetCustOrdersDetailQsTypedViewProjection(), query);

Linq to LLBLGen Pro

Linq to LLBLGen Pro follows a similar pattern to QuerySpec in this regard: use the .CacheResultset(int) or .CacheResultset(TimeSpan) extension methods on an IQueryable to specify the caching behavior of the query. As with QuerySpec, specify the CacheResultset call on the outside of the query; it's not migrated to the outer query if you specify it inside a query wrapped in another query.

Linq allows queries to be fetched as entities or projections. This means that caching resultsets for typed views and typed lists requires the use of QuerySpec. You can write QuerySpec queries together with Linq queries in the same application, so this should be no problem.

Examples:

// custom projection, with nested set
var metaData = new LinqMetaData(new DataAccessAdapter());
var q = (from c in metaData.Customer
         group c by c.Country into g
         select new
         {
             g.Key,
             CustomersInCountry = (
                  from c2 in metaData.Customer
                  where g.Key == c2.Country
                  select c2)
         })
        .CacheResultset(10);    // cache all for 10 seconds

// entity fetch with prefetch path
LinqMetaData metaData = new LinqMetaData(adapter);
var q = (from c in metaData.Customer
         where c.Country == "Germany"
         select c).WithPath<CustomerEntity>(cpath => cpath
                 .Prefetch<OrderEntity>(c => c.Orders)
                     .SubPath(opath => opath
                         .Prefetch(o => o.OrderDetails)
                         .Prefetch<EmployeeEntity>(o => o.Employee).Exclude(e => e.Photo, e => e.Notes)))
         .CacheResultset(10);    // cache all for 10 seconds

Tagging cached resultsets for cache retrieval and cache purging

Normally a cached resultset is stored under its CacheKey: there's no way to actively purge the resultset from the cache without knowing the CacheKey, nor is it possible to obtain cached resultsets from the CacheController. However, LLBLGen Pro offers a way to tag a cached resultset.

The tag, which is a string, can be used to group cached resultsets from different queries together so they can be purged from the cache in one go by calling the CacheController.PurgeResultsets(tag) method. Additionally, it's possible to obtain the cached resultsets which have the specified tag by calling CacheController.GetCachedResultsets(tag). It's not recommended to alter the cached resultsets.

The tag is a case sensitive string value and can be anything. It's specified with the CacheResultset() method in Linq and QuerySpec by using an overload which accepts a cacheTag string. The main usage for the tags is to be able to purge resultsets from the cache if code modifies data which you know is in the cache: by caching the data using a known tag, you can purge the cached resultsets from the cache by calling the PurgeResultsets method when you change the data.

It's not possible to update the tag on a resultset once it's been cached. The tag isn't part of the cache key, which means that the tag is ignored when checking which resultset should be retrieved from the cache.
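For example, a minimal sketch of tagging and purging (the tag value "FrontPage" is arbitrary, and the exact parameter list of the tag-accepting CacheResultset overload is in the reference manual; a duration-in-seconds plus cacheTag form is assumed here):

// Cache the resultset for 30 seconds under the tag "FrontPage".
var qf = new QueryFactory();
var q = qf.Customer
        .Where(CustomerFields.Country == "Germany")
        .CacheResultset(30, "FrontPage");
var customers = new DataAccessAdapter().FetchQuery(q);

// Later, after code has modified customer data, actively purge every resultset cached
// under the tag so the next fetch reads fresh data from the database.
CacheController.PurgeResultsets("FrontPage");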