Entity Size Estimate in V2

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 27-Jul-2006 19:08:17   

Frans,

I'm trying to calculate cache sizing and decide whether to cache Entities or "simply their data values". I'd prefer to cache the Entities because it would be easier. Do you have any idea how much memory overhead there is in an Entity over and above its fields' data?

Marcus

JimFoye avatar
JimFoye
User
Posts: 656
Joined: 22-Jun-2004
# Posted on: 27-Jul-2006 21:26:49   

I'd like to expand on this theme a little bit. Since I am generating code, it would be nice to know in general what kind of memory footprint my generated classes have, and more specifically, what effect on that footprint turning off unused relations would have. I am tempted to go turn a bunch off, but I don't know if it's really worth the effort. Perhaps there are some other options (especially in V2) that affect size of generated code as well?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 27-Jul-2006 22:03:16   

In .NET, code is shared among all instances of a class, so the only thing that adds to the size of an object in memory is its data footprint. This includes reference pointers (4 bytes per reference on a 32-bit runtime).

Relations are created on the fly, so they don't add to the memory footprint at all, and neither does the generated code.

Entity field data has a shared part (FieldInfo) and a field-data part. The FieldInfo part is shared among all instances of the same field (e.g. CustomerID); the field-data part is specific to each instance, e.g. its value, DB value, IsNull flag and IsChanged flag.

How big the memory footprint is depends largely on what data you store in an entity.
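The shared/per-instance split described here is essentially the flyweight pattern. A toy sketch of the idea (illustrative names only, not LLBLGen's actual classes) might look like:

```csharp
using System;

// One FieldInfo per field definition, shared by every entity instance
// of the type (the "shared part" described above).
sealed class FieldInfo
{
    public readonly string Name;
    public readonly Type DataType;
    public FieldInfo(string name, Type dataType) { Name = name; DataType = dataType; }
}

// Per-instance state: value plus flags (the "field-data part").
sealed class FieldData
{
    public object CurrentValue;
    public object DbValue;
    public bool IsNull = true;
    public bool IsChanged;
}

class CustomerEntity
{
    // Static: a million entities still hold exactly one copy of this metadata.
    static readonly FieldInfo[] SharedInfo =
    {
        new FieldInfo("CustomerID", typeof(int)),
        new FieldInfo("Name", typeof(string))
    };

    // Only this array (and the objects it points to) costs memory per instance.
    readonly FieldData[] data = { new FieldData(), new FieldData() };

    public void SetValue(int index, object value)
    {
        data[index].CurrentValue = value;
        data[index].IsNull = value == null;
        data[index].IsChanged = true;
    }

    public object GetValue(int index) => data[index].CurrentValue;
}
```

Because the metadata is static, only the per-instance FieldData scales with the number of entities, which is why the data you store dominates the footprint.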

Frans Bouma | Lead developer LLBLGen Pro
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 28-Jul-2006 00:44:25   

Otis wrote:

Entity field data has a shared part (FieldInfo) and a field-data part. The FieldInfo part is shared among all instances of the same field (e.g. CustomerID); the field-data part is specific to each instance, e.g. its value, DB value, IsNull flag and IsChanged flag.

How big the memory footprint is depends largely on what data you store in an entity.

I know I'm asking "how long is a piece of string..." but maybe I should re-phrase:

If I had a class which held data values for the following:

ID    int
Name  nvarchar(50)

this would take up about 120 bytes, give or take, in .NET.

If I were to cache an Entity which consisted of these two data types, is there any rule of thumb for estimating how much memory, give or take, it would consume?

Marcus

Walaa avatar
Walaa
Support Team
Posts: 14950
Joined: 21-Aug-2005
# Posted on: 28-Jul-2006 08:55:34   

In such situations, and just for testing, I would load an object or a bunch of them (maybe 1000) and look in Task Manager at the memory utilization of my process before and after.

(edit) You may also use a tool like ANTS Memory Profiler.
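The before/after measurement is more precise with GC.GetTotalMemory than with Task Manager, since the latter includes unrelated process allocations. A minimal sketch (SampleEntity is a placeholder for whatever class you want to size):

```csharp
using System;

class SampleEntity
{
    public int Id;
    public string Name;
}

static class MemoryEstimate
{
    // Rough bytes-per-instance estimate: allocate a batch and compare
    // GC heap size before and after. GC.GetTotalMemory(true) forces a
    // full collection so the baseline is stable.
    public static long BytesPerInstance(int count)
    {
        var keep = new SampleEntity[count];
        long before = GC.GetTotalMemory(true);
        for (int i = 0; i < count; i++)
            keep[i] = new SampleEntity { Id = i, Name = null };
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(keep); // the array must survive until after the measurement
        return (after - before) / count;
    }

    static void Main()
    {
        Console.WriteLine($"~{BytesPerInstance(10000)} bytes per instance");
    }
}
```

Note that if every instance points at the same string literal, the string's cost is shared, not per-instance; to measure string-heavy entities, give each instance its own string value.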

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 28-Jul-2006 09:33:57   

Walaa wrote:

In such situations, and just for testing, I would load an object or a bunch of them (maybe 1000) and look in Task Manager at the memory utilization of my process before and after.

(edit) You may also use a tool like ANTS Memory Profiler.

Good suggestion.

I should also add that for the purpose of the question, I'm not including any relations, just the simple entity.

Also, once an entity is fetched and will only be used in a read-only scenario (where ONLY the field properties are read), are there any fields or internal collections that can be eliminated (set to null) in order to save space?

I'm thinking maybe I could add a method called "ToReadOnly()" to the entity which would return a wrapped read-only version of the entity with unneeded internal fields nulled, so that caching becomes more efficient.
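A sketch of what such a ToReadOnly() method could produce, assuming a hand-written wrapper (ReadOnlyCustomer and the field names are illustrative, not generated LLBLGen code):

```csharp
// Stand-in for a generated mutable entity.
class CustomerEntity
{
    public int CustomerId { get; set; }
    public string Name { get; set; }

    // Returns an immutable snapshot that upper layers cannot mutate.
    public ReadOnlyCustomer ToReadOnly() => new ReadOnlyCustomer(CustomerId, Name);
}

// Immutable view: values are copied, so later changes to the source
// entity are not visible through the snapshot.
sealed class ReadOnlyCustomer
{
    public int CustomerId { get; }
    public string Name { get; }
    public ReadOnlyCustomer(int id, string name) { CustomerId = id; Name = name; }
}
```

A cache could then hold ReadOnlyCustomer instances and hand them to any thread without locking, since there is nothing to write to.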

Marcus

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 28-Jul-2006 10:02:34   

Marcus wrote:

Walaa wrote:

In such situations, and just for testing, I would load an object or a bunch of them (maybe 1000) and look in Task Manager at the memory utilization of my process before and after.

(edit) You may also use a tool like ANTS Memory Profiler.

Good suggestion.

I should also add that for the purpose of the question, I'm not including any relations, just the simple entity.

Also, once an entity is fetched and will only be used in a read-only scenario (where ONLY the field properties are read), are there any fields or internal collections that can be eliminated (set to null) in order to save space?

I'm thinking maybe I could add a method called "ToReadOnly()" to the entity which would return a wrapped read-only version of the entity with unneeded internal fields nulled, so that caching becomes more efficient.

Marcus

In Adapter, internal collections are already set to null unless you read the property. There are some overhead bytes internally, but not that much. Tests showed that when loading 50,000 entities into memory, the footprint was on par with a POCO approach; perhaps 10% bigger, but not much.

You lose way more memory space due to .NET itself :)

Frans Bouma | Lead developer LLBLGen Pro
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 28-Jul-2006 10:20:34   

Otis wrote:

In Adapter, internal collections are already set to null unless you read the property. There are some overhead bytes internally, but not that much. Tests showed that when loading 50,000 entities into memory, the footprint was on par with a POCO approach; perhaps 10% bigger, but not much.

You lose way more memory space due to .NET itself :)

This is what I was hoping to hear... GREAT. :)

What do you think of the ReadOnly wrapper? If anyone is caching Entities in a multi-threaded app like ASP.NET or a web service, they could easily run into big threading problems which might be very difficult to track down.

I have two strategies I'm exploring:

1) Generate ReadOnly wrappers and add a ToReadOnly() method to entities which returns a wrapped entity.

2) Project resultsets directly into new ReadOnly entities (generated) which derive from EntityBase2 but have their setters overridden in some way... This may not be possible, as the framework itself needs to initialise the entity's data values.

Do you think this is a feature that is required in the framework or is it something unusual?

Marcus

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 28-Jul-2006 11:24:51   

Marcus wrote:

Otis wrote:

In adapter, internal collections already are set to null unless you read the property. There are some overhead bytes internally, but not that much. Tests showed that by loading 50,000 entities in memory, the memory footprint was on par with a POCO approach, perhaps 10% bigger, but not much.

You lose way more memory space due to .NET itself :)

This is what I was hoping to hear... GREAT. :)

I tested this against NHibernate; it might be that it creates a lot of internal data as well, not sure. In general, memory is eaten alive with .NET, so if memory pressure is high, consider other options. I mean, a 10-line hello world already eats several megabytes... huh? :)

What do you think of the ReadOnly wrapper? If anyone is caching Entities in a multi-threaded app like ASP.NET or a web service, they could easily run into big threading problems which might be very difficult to track down.

I have 2 strategies I'm exploring:

1) Generate ReadOnly wrappers and add a ToReadOnly() method to entities which returns a wrapped entity.

2) Project resultsets directly into new ReadOnly entities (generated) which derive from EntityBase2 but have their setters overridden in some way... This may not be possible, as the framework itself needs to initialise the entity's data values.

Do you think this is a feature that is required in the framework or is it something unusual?

By design the entities are 'values', i.e. they should be passed by value over boundaries. As the entities contain their own change tracking, cross-thread usage is thus not that important, as in: if thread A changes a value, thread B can be using that change. However, in general it's best not to use shared entities for object graphs built in memory. As in: you have a shared Customer entity, and you create order entities in several threads and do myOrder.Customer = mySharedCustomer; this will add myOrder to the mySharedCustomer.Orders collection.

Use the shared entities for values, not for object references to set, or first clone them. Making them read-only is kind of hard I think, as you can always set the CurrentValue properties.
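The back-reference pitfall described above can be shown with a toy model (these are not LLBLGen's real classes, just a sketch of the graph-sync behaviour): setting Order.Customer also registers the order in the customer's Orders collection, so a shared customer silently accumulates orders from every thread that uses it.

```csharp
using System.Collections.Generic;

class Customer
{
    public List<Order> Orders { get; } = new List<Order>();
}

class Order
{
    Customer customer;
    public Customer Customer
    {
        get => customer;
        set
        {
            customer = value;
            // Mimics the framework keeping the graph consistent: assigning
            // the reference also updates the reverse side of the relation.
            value?.Orders.Add(this);
        }
    }
}
```

With a shared instance, `new Order { Customer = shared }` executed from two requests leaves both orders in `shared.Orders` (and races on the list), and the cache keeps them alive; cloning the customer before assigning avoids both problems.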

Frans Bouma | Lead developer LLBLGen Pro
WayneBrantley
User
Posts: 1255
Joined: 10-Mar-2006
# Posted on: 28-Jul-2006 16:11:05   

Marcus, I like your idea of just generating a read-only class with properties for all the fields of an Entity. You can then project that data onto that class. However, I do not think it needs to inherit from EntityBase. It does not need to inherit from anything (unless you want your own base), as it is just a class with read-only properties.

Or, are you thinking it would need to come from EntityBase to be able to put it in a collection? You probably would need to generate the collection too (assuming you are caching collections of entities too).

Let me know as I am interested in this.

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 28-Jul-2006 16:33:31   

WayneBrantley wrote:

Marcus, I like your idea of just generating a read-only class with properties for all the fields of an Entity. You can then project that data onto that class. However, I do not think it needs to inherit from EntityBase. It does not need to inherit from anything (unless you want your own base), as it is just a class with read-only properties.

Or, are you thinking it would need to come from EntityBase to be able to put it in a collection? You probably would need to generate the collection too (assuming you are caching collections of entities too).

Let me know as I am interested in this.

Unfortunately, EntityCollection<TEntity> requires that items inherit from EntityBase2:

public partial class EntityCollection<TEntity> : EntityCollectionBase2<TEntity> where TEntity : EntityBase2, IEntity2

I'm not sure why this is needed given the specified interface IEntity2, and it does get in the way of some interesting optimisations.

I'm reluctant to generate my own collection class for two reasons: (a) it would give me little benefit over using any of the existing IList<> implementations, and (b) I want this optimisation to be transparent to the upper layers, which use the existing EntityCollection<> class.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 28-Jul-2006 17:00:47   

It's needed because otherwise I can't access internal methods on the base class. I also can't implement an internal interface on it, as I then can't use it in the where clause outside the runtime libraries.

Yeah, the where restrictions are indeed... restrictive :)

Frans Bouma | Lead developer LLBLGen Pro
Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 28-Jul-2006 17:08:13   

Otis wrote:

I tested this against NHibernate; it might be that it creates a lot of internal data as well, not sure. In general, memory is eaten alive with .NET, so if memory pressure is high, consider other options. I mean, a 10-line hello world already eats several megabytes... huh? :)

It's not so much that memory pressure is high, but more about how many entities I can keep cached at any given time.

I have a bank of middle-tier servers which run an instance of a federated web service; each service maintains multiple caches at various layers in order to minimise database calls. I am currently implementing a low-level cache at the Entity layer. Although cached data can become stale, the user will always see a consistent view of the stale data until it expires (a few minutes). On the machine where an entity changes, the cache is invalidated after the transaction successfully commits. This allows subsequent calls on that machine to pull the fresh data from the database again and repopulate the cache with the new data.

Otis wrote:

By design the entities are 'values', i.e. they should be passed by value over boundaries. As the entities contain their own change tracking, cross-thread usage is thus not that important, as in: if thread A changes a value, thread B can be using that change. However, in general it's best not to use shared entities for object graphs built in memory. As in: you have a shared Customer entity, and you create order entities in several threads and do myOrder.Customer = mySharedCustomer; this will add myOrder to the mySharedCustomer.Orders collection.

Use the shared entities for values, not for object references to set, or first clone them. Making them read-only is kind of hard I think, as you can always set the CurrentValue properties.

There are a couple of points here concerning the cachability of Entities:

  • Cached entities should never be written to and are hence read-only. This gives us thread safety.
  • Cached entities must be flat and should not include any relation data. This keeps things manageable, especially when personalisation gets in the way, where different users have a different view of the world! Some entities are not suitable for caching... but some, like immutable data, can be cached indefinitely. Tags are a great example: once a "Tag" is created, its ID never changes, so why would you prefetch Tags every time you fetch (related) Items?

In general I fetch link tables on every request. These are generally narrow tables with a high rate of change. Fetching these rows is generally cheap and does not impact the DB very much.

For example: I fetch ItemIDs and TagIDs from the ItemTagLink table. Then I identify which ItemIDs represent ItemEntities that are not cached, and fetch them. I do the same for TagIDs and fetch uncached Tags.

I then create an EntityCollection of cloned Items and manually add cached Tags to its TagCollection before passing this back to the caller. All this happens in the DAL of the middle tier, and the database is only used for entities that were not found in the cache.

The upper layers are not aware of any caching below.

In practice this gives me a very high cache hit ratio (> 80%) and reduces the load on the database significantly. Without this cache the call volumes on the db can be very high.

There's an interesting article on high call volumes to SQL Server at http://www.sql-server-performance.com/jc_high_call_volume.asp

Being able to make Entities read-only only gives me protection in the upper layers from accidental modifications to Entities which may be shared by other threads.
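The "fetch only the uncached IDs" step in this strategy can be sketched with a plain dictionary cache; the fetch delegate stands in for the real DAL call, and all names here are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ItemCache
{
    readonly Dictionary<int, string> cache = new Dictionary<int, string>();

    // Placeholder for the real database fetch of the missing rows.
    readonly Func<List<int>, Dictionary<int, string>> fetchFromDb;

    public ItemCache(Func<List<int>, Dictionary<int, string>> fetchFromDb)
    {
        this.fetchFromDb = fetchFromDb;
    }

    public List<string> GetItems(IEnumerable<int> ids)
    {
        var wanted = ids.Distinct().ToList();

        // Partition the requested IDs into cached and uncached.
        var missing = wanted.Where(id => !cache.ContainsKey(id)).ToList();

        // Only the uncached IDs hit the database; results repopulate the cache.
        if (missing.Count > 0)
            foreach (var pair in fetchFromDb(missing))
                cache[pair.Key] = pair.Value;

        return wanted.Select(id => cache[id]).ToList();
    }
}
```

A second request for the same IDs never reaches the fetch delegate, which is where the high hit ratio comes from; invalidation after a commit would simply remove the affected keys.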

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 01-Aug-2006 13:04:57   

Marcus wrote:

Otis wrote:

I tested this against NHibernate; it might be that it creates a lot of internal data as well, not sure. In general, memory is eaten alive with .NET, so if memory pressure is high, consider other options. I mean, a 10-line hello world already eats several megabytes... huh? :)

It's not so much that memory pressure is high, but more about how many entities I can keep cached at any given time.

I think the only way to find that out is by testing, for example loading 1000 entities into memory and seeing how much memory it takes vs. 2000 entities.

I have a bank of middle-tier servers which run an instance of a federated web service; each service maintains multiple caches at various layers in order to minimise database calls. I am currently implementing a low-level cache at the Entity layer. Although cached data can become stale, the user will always see a consistent view of the stale data until it expires (a few minutes). On the machine where an entity changes, the cache is invalidated after the transaction successfully commits. This allows subsequent calls on that machine to pull the fresh data from the database again and repopulate the cache with the new data.

Sounds like a cool setup! :)

Otis wrote:

By design the entities are 'values', i.e. they should be passed by value over boundaries. As the entities contain their own change tracking, cross-thread usage is thus not that important, as in: if thread A changes a value, thread B can be using that change. However, in general it's best not to use shared entities for object graphs built in memory. As in: you have a shared Customer entity, and you create order entities in several threads and do myOrder.Customer = mySharedCustomer; this will add myOrder to the mySharedCustomer.Orders collection.

Use the shared entities for values, not for object references to set, or first clone them. Making them read-only is kind of hard I think, as you can always set the CurrentValue properties.

There are a couple of points here concerning the cachability of Entities:

  • Cached entities should never be written to and are hence read-only. This gives us thread safety.
  • Cached entities must be flat and should not include any relation data. This keeps things manageable, especially when personalisation gets in the way, where different users have a different view of the world! Some entities are not suitable for caching... but some, like immutable data, can be cached indefinitely. Tags are a great example: once a "Tag" is created, its ID never changes, so why would you prefetch Tags every time you fetch (related) Items?

In general I fetch link tables on every request. These are generally narrow tables with a high rate of change. Fetching these rows is generally cheap and does not impact the DB very much.

For example: I fetch ItemIDs and TagIDs from the ItemTagLink table. Then I identify which ItemIDs represent ItemEntities that are not cached, and fetch them. I do the same for TagIDs and fetch uncached Tags.

I then create an EntityCollection of cloned Items and manually add cached Tags to its TagCollection before passing this back to the caller. All this happens in the DAL of the middle tier, and the database is only used for entities that were not found in the cache.

The upper layers are not aware of any caching below.

In practice this gives me a very high cache hit ratio (> 80%) and reduces the load on the database significantly. Without this cache the call volumes on the db can be very high.

There's an interesting article on high call volumes to SQL Server at http://www.sql-server-performance.com/jc_high_call_volume.asp

Being able to make Entities read-only only gives me protection in the upper layers from accidental modifications to Entities which may be shared by other threads.

I played with the idea of making a read-only version of the entities, but the thing is that you can always set an entity's values through reflection anyway; the .NET framework itself doesn't offer truly immutable objects.

Frans Bouma | Lead developer LLBLGen Pro
Fabrice
User
Posts: 180
Joined: 25-May-2004
# Posted on: 02-Aug-2006 12:19:32   

I ran into the same problem with cache functionality (but in v1). The main problem I had was a big memory leak when using cached objects, because, as said before, when you do

MyObject.MyParentObject = myParentObjectInCache

then MyObject is added to myParentObjectInCache.MyObjectElements. And so MyObject is never released, because there is a reference to it from the cached object, which is itself referenced by the cache. The only solution I found is that when an entity is requested from the CacheManager, it doesn't return the cached item but a copy of it:


IEntity2 entity = this.entityFactory.Create(originalEntity.Fields);
entity.IsNew = false;
entity.IsDirty = false;
entity.Fields.IsDirty = false;

return entity;

That way the returned item is not referenced by the cache manager and can be released.

Marcus avatar
Marcus
User
Posts: 747
Joined: 23-Apr-2004
# Posted on: 03-Aug-2006 10:09:08   

Good points...