EntityBase2.cs (.Net 1.1) serialization

Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 10-Aug-2006 10:17:54   

simmotech wrote:

Hi Frans. My earlier message had some questions - maybe you missed them. The answers would be helpful in putting my serialization stuff into a more final, usable state.

Sorry, I bookmarked the thread but didn't answer it yesterday. I've answered them above simple_smile

Had another thought last night - using tokens for strings to prevent duplication. Thought this might slow things down too much so a further thought was to have two versions - Fast and Compact.

Strings are already 'unique' inside the CLR at runtime, so a string value isn't duplicated, as they're reference types so 2 references to the same string simply refer to the same object in memory.

However reading:

Tried it this morning and it worked a treat - it even sped things up. Current results are: Original: 13,608,188 bytes; serialize in 92.81 seconds; deserialize in 58.00 seconds

Current Best: 2,590,641 bytes; serialize in 0.29 seconds; deserialize in 1.10 seconds (80.96% reduction in size; 320x faster serialization and 108x faster overall)

suggests it does pay off :) . What exactly did you do with the strings? You gave every string a hash value?

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 10-Aug-2006 13:49:09   

Otis wrote:

Strings are already 'unique' inside the CLR at runtime, so a string value isn't duplicated, as they're reference types so 2 references to the same string simply refer to the same object in memory.

That's what I was thinking, and why I didn't think of this optimization before. (Well, that's my excuse ;) )

That's only true if they refer to the same string object. If there are many strings with identical contents, as is likely from a database query, then they are separate objects and will potentially be written many times.
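A quick check illustrates the point (a hypothetical snippet, not from the thread): only compile-time literals are interned automatically; strings materialized at runtime with identical contents - as when reading rows from a data reader - are distinct objects:

```csharp
using System;

class InternDemo
{
    static void Main()
    {
        string literal1 = "Customer";
        string literal2 = "Customer";
        // Literals are interned by the compiler: both references point at one object.
        Console.WriteLine(ReferenceEquals(literal1, literal2));   // True

        // Strings built at runtime are separate objects even when their contents match.
        string runtime1 = new string('C', 1) + "ustomer";
        string runtime2 = new string('C', 1) + "ustomer";
        Console.WriteLine(runtime1 == runtime2);                  // True (value equality)
        Console.WriteLine(ReferenceEquals(runtime1, runtime2));   // False (distinct objects)
    }
}
```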

Instead of writing a string directly, I get an int token from a StringTokenizer class and write that instead. The StringTokenizer class just looks in a bog-standard Hashtable (so comparing by value, not object reference) and returns the int value associated with the string key. If not found, it just adds a new entry. It involves boxing in .NET 1.1 of course, but who cares! .NET 2.0 would allow a Dictionary<string, int> as a free further optimization.

The boxing and sheer number of strings, many of which may appear only once, make it sound slow, but on the other hand the string would have to be written anyway, and doing it once means the UTF-8 encoding in the BinaryWriter only has to do its work once.

For deserialization, no Hashtable is required - just a string[] of the values, indexed directly by the int token.
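A minimal sketch of the idea (the class name matches the description above, but the implementation details are assumed - the real code isn't shown in the thread). The writer maps each distinct string value to an int token via a Hashtable; the value table is written once and the reader indexes into a string[] by token:

```csharp
using System.Collections;

// Hypothetical sketch of the tokenizer described above (.NET 1.1 style,
// hence the non-generic Hashtable/ArrayList and the int boxing).
class StringTokenizer
{
    readonly Hashtable tokens = new Hashtable();   // string -> boxed int (compared by value)
    readonly ArrayList values = new ArrayList();   // token -> string, written once to the stream

    // Returns the existing token for this string value, or assigns a new one.
    public int GetToken(string value)
    {
        object existing = tokens[value];
        if (existing != null)
            return (int)existing;

        int token = values.Count;
        tokens[value] = token;     // boxes the int on .NET 1.1
        values.Add(value);         // this string's UTF-8 bytes get written only once
        return token;
    }

    // The table serialized alongside the data; deserialization just indexes into it.
    public string[] GetValueTable()
    {
        return (string[])values.ToArray(typeof(string));
    }
}
```

On .NET 2.0, swapping the Hashtable/ArrayList pair for Dictionary<string, int> and List<string> removes the boxing, as noted above.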

Cheers Simon

PS will look over the other answers - thanks very much.

Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 11-Aug-2006 11:55:25   

simmotech wrote:

Otis wrote:

Strings are already 'unique' inside the CLR at runtime, so a string value isn't duplicated, as they're reference types so 2 references to the same string simply refer to the same object in memory.

That's what I was thinking, and why I didn't think of this optimization before. (Well, that's my excuse ;) )

That's only true if they refer to the same string object. If there are many strings with identical contents, as is likely from a database query, then they are separate objects and will potentially be written many times.

Ah, makes sense. In v2.0 I use the same string-preserving mechanism in the persistence info / field info setup code, to avoid having a lot of strings created in the code. I was under the assumption that the CLR would treat two string objects with the same value as the same object, but your code proves that's not the case :)

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 17-Aug-2006 08:30:29   

Hi Frans

I have sent you a private email containing the code. (MikeG - let me know if you are interested in looking at this and I'll send you the Zip file)

Original: 13,608,188 bytes; serialize in 92.81 seconds; deserialize in 58.00 seconds

Current Best: 2,524,673 bytes; serialize in 0.31 seconds; deserialize in 1.11 seconds

I have rejigged the code into classes that aren't nested in EntityBase2 or EntityCollectionBase2 and added some interfaces to make it extensible.

There are four optimization types:

None - uses the original LLBLGen Pro serialization as-is.

UltraSafe - uses flags to determine what information needs to be serialized/deserialized, but stores all information in SerializationInfo with names as normal.

Safe - similar to UltraSafe, but stores all internal information (i.e. anything that cannot be shared with any external objects) using SerializerWriter/Reader. Much faster than UltraSafe (10x for larger numbers of entities) but only a marginally smaller file size.

Fast (default) - the real deal. Takes the top-level entity or collection and serializes it and everything below it recursively using SerializerWriter/Reader. Fully optimized, resulting in the smallest amount of stored data and by far the fastest serialization/deserialization times.

I have two issues for which I would need your help to speed things up further. Issue 1 is a bottleneck (seen by profiling) in putting the object[] data back into a new entity: I have an EntityFactory2 instance, and either 2 x object[] data + 1 BitArray of modified flags (or 1 x object[] data if the data was not dirty). What I need is a special internal constructor (or set of constructors) to take this info and create a usable entity, bypassing all of the checking etc. The object[] doesn't have to be copied at all and can be directly assigned.
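What such a fast-path constructor might look like - purely a sketch with invented names, not LLBLGen Pro's actual API - is a constructor that adopts the deserialized arrays directly instead of pushing each value through the normal field setters:

```csharp
using System.Collections;

// Hypothetical stand-in for an entity with a deserialization fast path.
// None of these member names are real LLBLGen Pro API.
class FastEntity
{
    object[] currentValues;   // adopted directly - no copy, no validation
    object[] dbValues;        // original fetch values; null when the entity wasn't dirty
    BitArray changedFlags;    // per-field IsChanged flags; null when not dirty

    // Fast path: adopt the arrays the reader produced, bypassing field setters,
    // change events and OnInitialized-style hooks.
    internal FastEntity(object[] currentValues, object[] dbValues, BitArray changedFlags)
    {
        this.currentValues = currentValues;
        this.dbValues = dbValues;
        this.changedFlags = changedFlags;
    }

    public bool IsDirty { get { return changedFlags != null; } }
}
```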

Issue 2 possibly isn't a problem at all. In my testing, I was retrieving one set of data and serializing/deserializing it using each of the 4 optimizations in turn - I was then looping this to get average values. However, on the second pass, all of the optimizations except Fast ran much slower. Eventually I also noticed that their size was different from pass one. What is happening is that Fast optimization (which happens to be last in the list of four) looks for non-empty collections within an entity by calling GetMemberEntityCollections(). This has the side effect of creating empty collections if they don't already exist. Therefore, in pass two and higher, the other optimizations also end up indirectly serializing these empty collections (and take 1.5MB to do it!). Since Fast optimization does all the serialization itself, it ignores empty collections and so wasn't affected. Is there a way (or could a good one be added) of getting the used-collection information without actually creating empty instances?

Any help here would be great.

Cheers Simon

PS The code also includes an EntityFactoryCache class to ensure that only one instance of a factory is serialized. This might be a useful optimization in itself for LLBLGen Pro - I just looked in a profiler and saw 34,000+ instances of the same factory class :(

Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 17-Aug-2006 11:52:40   

Thanks for the code! simple_smile

About the constructor: I think that can be arranged, though I have to check the details of your code for when it should be called. The last thing you want is to trigger all kinds of events and other stuff like OnInitialized etc.

About the collections: there's no other way without adding code to the templates, because the generated serialization method GetObjectInfo adds all collection members. However, it's IMHO not a problem, as in practice you won't first run a 'fast' serialization and then run a non-fast serialization after that.

About the factories: they're not a bottleneck in entity creation etc. - perhaps in memory, but not that big, as they don't contain any data, so they eat up a couple of bytes but not much. For serialization/deserialization purposes they can indeed slow things down a lot, so a cache for these is indeed something useful! :)

Frans Bouma | Lead developer LLBLGen Pro
mikeg22
User
Posts: 411
Joined: 30-Jun-2005
# Posted on: 17-Aug-2006 23:17:03   

simmotech wrote:

Hi Frans

I have sent you a private email containing the code. (MikeG - let me know if you are interested in looking at this and I'll send you the Zip file)

[...]

I would love to see that code simple_smile

My email is mgatesNOSPAMMERS@tecolote.comNOSPAMMERS

Remove the NOSPAMMERS from the middle and end...thanks!

tomahawk
User
Posts: 169
Joined: 02-Mar-2005
# Posted on: 19-Sep-2006 04:32:25   

I've been following this post with interest, as we have a smart client we are developing that could greatly benefit from these improvements. Any ideas on when these performance improvements will be available in an update to LLBLGen?

Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 19-Sep-2006 09:17:36   

It requires a lot of changes under the hood, so it's postponed till at least the next upgrade for v2.0. Simon Hewitt (simmotech) wrote the additional code, and he might want to share it with the people who want to use it now; however, it does require changes to the runtime libs that you have to make yourself.

Frans Bouma | Lead developer LLBLGen Pro
mgolubovic
User
Posts: 16
Joined: 25-Oct-2006
# Posted on: 19-Dec-2006 16:14:59   

I was just wondering if there is any update to the status of fitting these optimizations into a version of 2.0.

I realize this thread is a couple of months old and a few versions have been released since then, but I have version 2.0.0.61023 and performance is still slow during serialization/deserialization, so I assume the optimizations have not yet been put into this version.

Thanks, Milos

Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 19-Dec-2006 17:20:07   

Well, it's as fast as I can make it at the moment. The main issue is that the optimizations come from the fact that the serialization code is called by a custom sink during remoting, so an optimizer object can be passed in through the formatter and the entity code can utilize that optimizer at runtime. This is needed to be able to serialize the caches into the main data at the end as well (which is otherwise unknown).

I've made a couple of minor architectural changes for Simon so his optimization code is easily included into an entitybase2 class without any effort or problems when the source is updated.

For now, if you want fast serialization, you have to create your own sink and also add Simon's optimization code to the runtime lib and recompile. I'll leave it to Simon to contact you to share his code.

For v2.1, I will optimize the code a bit so the bottlenecks currently still in the code are weeded out, but don't expect massive improvements, because I can't assume everyone will create his/her own sink.

Frans Bouma | Lead developer LLBLGen Pro
acradyn
User
Posts: 57
Joined: 03-Apr-2004
# Posted on: 22-Dec-2006 06:04:23   

Frans,

Do you expect LLBLGen's v2.0 serialization to perform better than v1? I just did some preliminary testing before we move to 2.0, and it appears the performance is a little worse in 2.0, which surprised me because you mentioned how you improved the memory footprint in 2.0.

Anyway, in my small test I took an existing fetch in our application with a total of 5,000 entities in a graph with prefetches. Using the BinaryFormatter, I serialized it to disk.

Size and FillEntityCollection speed:
Version 1: 24.7 MB, 11 seconds to fill the EntityCollection from the DB
Version 2: 40 MB, 18 seconds to fill the EntityCollection from the DB

public static void SerializeGraph(object obj)
{
    // "HHmmssfff" = hour/minute/second/millisecond; the original "hhMMssmmm"
    // mixed in the month (MM) and an invalid token (mmm).
    string path = @"D:\Temp3\Ver1_" + obj.GetType().Name + "_" + DateTime.Now.ToString("HHmmssfff") + ".bin";
    using(Stream stream = File.Open(path, FileMode.Create))
    {
        BinaryFormatter bformatter = new BinaryFormatter();
        Debug.WriteLine("Begin Writing Info " + DateTime.Now.ToLongTimeString());
        bformatter.Serialize(stream, obj);
        // No explicit stream.Close() needed: the using block disposes the stream.
        Debug.WriteLine("End Writing Info " + DateTime.Now.ToLongTimeString());
    }
}

I wanted to see what you think, and possibly set up a test with AdventureWorks so that we have a more consistent environment to compare the two versions.

Do you have any suggestions? One of the main reasons we were going to move to 2.0 was the performance improvements, but it doesn't look that good now.

Info:
- I compared the SQL being sent to the DB, and all fetches returned the exact same row count.
- I'm using the latest versions of LLBLGen 1 & 2 as of today.
- I added two small interfaces in my entity generation.
- I added two methods to my EntityCollection partial class in v2 (in v1 it's in a template).

Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 22-Dec-2006 08:29:12   

I'm really surprised it's way bigger in size...

I'll check what might be causing that, as it should be the same size... .NET 2.0 serialization is faster than .net 1.x serialization. You used .net 1.x serialization with adapter?

v2.0 has much faster entity fetch logic and a lower memory footprint. The persistence data wasn't serialized into the data on disk anyway.

Frans Bouma | Lead developer LLBLGen Pro
Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 22-Dec-2006 12:22:22   

Ok :(

I created two adapter .NET 2.0 projects: one with v1.0.2005.1, and one with v2.0, all latest release builds of the runtime libs.

I then fetched from Northwind all customers, all orders, and all order details in a prefetch path. I then serialized the big graph to a file using the binary formatter. The v1.0.2005.1 version was 2.86MB; the v2.0 version was 6.56MB :(

Clearly something is wrong here, so I'll look into this. Something gets serialized into the data which shouldn't be there.

(edit) When I fetch just customers, the file is almost the same size (I'm running a non-final debug build of the runtime libs, which has a slightly shorter name (no 'NET20'), so the type references emitted by the formatter are smaller, which makes the v2.0 output a bit smaller). When I fetch the related orders into the graph, I get twice the size of data.

So it's somewhere in the code which adds the collections. I'll check that out. It's hard to debug, as the collections are generic and I can't use the SOAP formatter to see which data is emitted more than once.

(edit) It's something in the .NET 2.0 implementation of the code. On .NET 1.1, v2.0 gives the same (actually smaller) file size as the .NET 1.1 version created with v1.0.2005.1, also with the complete graph.

Something gets serialized more than once I think. I'll now try to hack the code a bit to make it use non-generic collections so I can use the soap stuff.

(edit) it's something in the syncing info. If I exclude that, the file is 600KB instead of 1.9MB. The sync info is stored in dictionary objects (generics).

Frans Bouma | Lead developer LLBLGen Pro
Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 22-Dec-2006 13:31:01   

ack frowning

When I instead of doing this:

info.AddValue("_relatedEntitySyncInfos", _relatedEntitySyncInfos, typeof(Dictionary<Guid, Dictionary<string, EntitySyncInfo<IEntity2>>>));

do:

Hashtable syncInfos = null;
if(_relatedEntitySyncInfos != null)
{
    syncInfos = new Hashtable();
    foreach(KeyValuePair<Guid, Dictionary<string, EntitySyncInfo<IEntity2>>> pair in _relatedEntitySyncInfos)
    {
        Hashtable toAdd = new Hashtable();
        foreach(KeyValuePair<string, EntitySyncInfo<IEntity2>> subPair in pair.Value)
        {
            toAdd.Add(subPair.Key, subPair.Value);
        }
        syncInfos.Add(pair.Key, toAdd);
    }
}
//...
info.AddValue("_relatedEntitySyncInfos", syncInfos);

I get 1.11MB instead of 1.96MB. So instead of serializing Dictionaries using generic types, I serialize Hashtables, which are much more compact, as it seems.

I still have 200KB to win somewhere, which is likely in the next line:


info.AddValue("_field2RelatedEntity", _field2RelatedEntity, typeof(Dictionary<string, Guid>));

also a dictionary.

These two are used for syncing info, and are only filled when related entities are fetched into a collection contained inside an entity, hence the correct size when just the customers were serialized.

I'll test that next.

(edit) BINGO. Rewriting that to a hashtable indeed chops off the 200KB, I'm now at 967KB for v2.0, and 919KB for v1.0.2005.1.

And you know what? It's also faster, even WITH the conversion routine to Hashtable. :(

I'll also try nested arrays, to see if I can chop it off even more. (edit) Looking at the code of Dictionary.GetObjectData(), it creates a KeyValuePair<TKey, TValue>[] array, not two arrays with keys and values like the Hashtable does.

If I create a 'FastDictionary' and serialize the keys and values as generic arrays (TKey[], TValue[]), it doesn't make a lot of difference: I get 1.6MB. So the generic type data makes the serialization output much bigger... because if I simply export object[] arrays instead, the size is 1.09MB (slightly more compact than the Hashtable, as I used the fieldToRelatedEntity sync info as a Dictionary again).
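The 'FastDictionary' itself isn't shown in the thread; the following is a hypothetical sketch of the approach being described - a Dictionary<TKey, TValue> subclass that overrides serialization to store plain object[] arrays for keys and values, so that no KeyValuePair<TKey, TValue>[] (and the generic type names it carries) lands in the formatter output:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Serialization;

// Sketch only (implementation details assumed): replaces Dictionary's default
// serialization, which stores a KeyValuePair<TKey, TValue>[] and thereby embeds
// the full generic type names in the BinaryFormatter stream.
[Serializable]
class FastDictionary<TKey, TValue> : Dictionary<TKey, TValue>
{
    public FastDictionary() { }

    // Deserialization: rebuild from the two plain arrays.
    protected FastDictionary(SerializationInfo info, StreamingContext context)
    {
        object[] keys = (object[])info.GetValue("keys", typeof(object[]));
        object[] values = (object[])info.GetValue("values", typeof(object[]));
        for (int i = 0; i < keys.Length; i++)
            Add((TKey)keys[i], (TValue)values[i]);
    }

    // Serialization: two object[] arrays - compact, no generic type references.
    public override void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        object[] keys = new object[Count];
        object[] values = new object[Count];
        int i = 0;
        foreach (KeyValuePair<TKey, TValue> pair in this)
        {
            keys[i] = pair.Key;
            values[i] = pair.Value;
            i++;
        }
        info.AddValue("keys", keys);
        info.AddValue("values", values);
    }
}
```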

Frans Bouma | Lead developer LLBLGen Pro
Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 22-Dec-2006 16:40:13   

OK, my 'FastDictionary' subclass works OK; my unit tests succeed. It's slightly faster and no longer has the overhead that the Dictionary serialization has. There's still a tiny bit of extra overhead in the v2.0 data because of the generics used in various classes, which apparently (as this thread shows) add extra overhead to the serialized data.

Fixed in build 061222

(edit). I've attached this build to this post. If you want, you can test this build with your v2.0 test code. It should give similar sizes as with v1.0.2005.1 code. (click the paperclip in the post header simple_smile )

(edit2) Refreshed the attached file, as I didn't set the build number in the build I uploaded previously. There was a tiny bug in the SelfServicing code, stay tuned. Squashed :)

Be sure to use the 22-Dec-2006 17:24.18 uploaded version simple_smile

Frans Bouma | Lead developer LLBLGen Pro
acradyn
User
Posts: 57
Joined: 03-Apr-2004
# Posted on: 22-Dec-2006 17:23:58   

If only every software company had the response-time like you guys...

So with this fix, are LLBLGen v1 and v2 roughly equivalent in the speed and size of serialization? Thanks for taking a look into this.

I think I'll still need a custom sink with our remoting. I think that's our best bet on making a significant improvement.

Frans, how can I get simmotech's FastSerialization? Can you pull my email from my profile?

Thanks for your help!

Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 22-Dec-2006 17:28:31   

acradyn wrote:

If only every software company had the response-time like you guys...

simple_smile

Well, this is a big issue, so I really want to have it fixed a.s.a.p. It's not our code that's performing badly (the Dictionary<K, V> is), but still, it's our code you're all working with, so this is major enough to get fixed a.s.a.p. :)

So with this fix, are LLBLGen v1 and v2 roughly equivalent in the speed and size of serialization? Thanks for taking a look into this.

Yes. Speed is a tiny bit faster (a couple of ms) and size is a tiny bit bigger (2.86MB with v1.0.2005.1 vs. 2.87MB with v2.0, due to the fact that there are still generic types in the output).

I just attached the latest version to my previous post, as it had a tiny issue. You can drop that DLL in your app's bin folder and see what it does now instead of the shipped v2.0 runtime.

I think I'll still need a custom sink with our remoting. I think that's our best bet on making a significant improvement.

Yes if you're using a custom sink you can fully utilize Simon's code and have a massive speed / size optimization.

Frans, how can I get a simonTech's FastSerialization? Can you pull my email from my profile?

Thanks for your help!

Yes. I'll mail Simon that you want to get in contact with him and utilize his improvements. simple_smile

Can I mail Simon your emailaddress?

Frans Bouma | Lead developer LLBLGen Pro
mgolubovic
User
Posts: 16
Joined: 25-Oct-2006
# Posted on: 22-Dec-2006 18:18:28   

That's a good catch...

Thanks Frans for the input.

Simmotech, if it is possible that you send me your code for the optimizations you have done it would be greatly appreciated. You can email me at mgolubovic AT thatchertech.com.

Hopefully, you're still aware of this thread.

acradyn
User
Posts: 57
Joined: 03-Apr-2004
# Posted on: 22-Dec-2006 18:22:48   

Yes, please send him my info. Thank you!

jeff.lewisNOSPAM at NOSPAMnsightsoftwareNOSPAM dot com

Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 22-Dec-2006 18:49:20   

I've contacted Simon. As I only have his work email address, I think he'll read the email after Christmas, but just in case :)

Frans Bouma | Lead developer LLBLGen Pro
mgolubovic
User
Posts: 16
Joined: 25-Oct-2006
# Posted on: 22-Dec-2006 18:52:11   

Otis wrote:

I've contacted Simon. As I have only his work email address, I think he reads the email after Christmas, but just in case simple_smile

That's excellent, I don't plan to do much work until after Christmas anyways wink

Thanks for editing my email to avoid spamming, slipped my mind.

Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 22-Dec-2006 19:42:36   

mgolubovic wrote:

Otis wrote:

I've contacted Simon. As I have only his work email address, I think he reads the email after Christmas, but just in case simple_smile

That's excellent, I don't plan to do much work until after Christmas anyways wink

heh, I think you're not alone wink

Thanks for editing my email to avoid spamming, slipped my mind.

No problem simple_smile

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 02-Jan-2007 09:48:54   

Got the message....

acradyn and mgolubovic: I've got a lot of priority stuff since I got back after the holidays, but by the end of the week I'll have something to email you to try out.

The current situation is this:

  • The code I now have is just for .NET 2.0/C#/Adapter.

  • I can't think of a reason it can't be ported back to .NET 1.1 (or VB.NET), but realistically I don't have the time for writing/testing other variations, so I'll just stick to what I will be using for now - hopefully you are using the same combination.

  • I am writing some intensive unit tests to confirm that the code works as expected for all situations. This is being done on a ongoing but low priority basis compared to my other work.

  • The Fast serialization code requires changes to both the runtime library and templates but these are mostly additions and will be fully backwards compatible - only a handful of files need to be modified.

  • Further up this thread I mentioned the Safe/UltraSafe/OwnedData optimizations. I have done away with these, because only the Fast version gives the sort of results I would be interested in anyway, and it was getting too cumbersome to maintain them.

  • Custom serialization sinks/surrogates are no longer necessary (but are still included) - when Fast optimization is switched on, an if statement in GetObjectData for the generated entities/collections effectively does nothing and leaves it all to the code in EntityBase2/EntityCollectionBase2<TEntity>. This passes all my tests so far.

  • I will download the latest version and incorporate the latest changes and then write a small doc on how to modify that version to add Fast Serialization.

  • The major omission so far is for UnitOfWork2 - this won't be a problem but I just haven't written the code yet - maybe next week.

I just tried out the Customers/Orders/OrderDetails test as detailed above. For .NET 2.0 (but without the latest changes), I got 7,033,545 bytes without optimization and 95,599 bytes with Fast optimization.

Cheers Simon

Otis
LLBLGen Pro Team
Posts: 39615
Joined: 17-Aug-2003
# Posted on: 02-Jan-2007 09:55:20   

simmotech wrote:

Got the message....

  • Custom serialization sinks/surrogates are no longer necessary (but are still included) - when Fast optimization is switched on, an if statement in GetObjectData for the generated entities/collections effectively does nothing and leaves it all to the code in EntityBase2/EntityCollectionBase2<TEntity>. This passes all my tests so far.

Could you elaborate a bit on this? As you do extensive string caching, how can you add the cache to the stream on a normal basis if you don't know when the stream will end? Or am I confusing the code I had in mind with yours? (So when you wanted to optimize the serialization, you would simply pass in the optimizer.)

  • The major omission so far is for UnitOfWork2 - this won't be a problem but I just haven't written the code yet - maybe next week.

The main optimization you can get there is by simply serializing the queues without going into more levels, though WITH the entities mentioned in the sync info. These sync info objects drag more entities into the tree than you probably would expect.

Frans Bouma | Lead developer LLBLGen Pro
simmotech
User
Posts: 1024
Joined: 01-Feb-2006
# Posted on: 02-Jan-2007 11:16:26   

Otis wrote:

simmotech wrote:

Got the message....

  • Custom serialization sinks/surrogates are no longer necessary (but are still included) - when Fast optimization is switched on, an if statement in GetObjectData for the generated entities/collections effectively does nothing and leaves it all to the code in EntityBase2/EntityCollectionBase2<TEntity>. This passes all my tests so far.

Could you elaborate a bit on this? As you do extensive string caching, how can you add the cache to the stream on a normal basis if you don't know when the stream will end? Or am I confusing the code I had in mind with yours? (So when you wanted to optimize the serialization, you would simply pass in the optimizer.)

  • The major omission so far is for UnitOfWork2 - this won't be a problem but I just haven't written the code yet - maybe next week.

The main optimization you can get there is by simply serializing the queues without going into more levels, though WITH the entities mentioned in the sync info. These sync info objects drag more entities into the tree than you probably would expect.

Fast Serialization works with a 'root' object, which can be one of the following types:
1) Entity
2) EntityCollection
3) UnitOfWork (soon)

Each of these supported 'root' types is serialized in a 'self contained' manner, so as far as the BinaryFormatter is concerned, it only ever has to serialize this single object, which gives it a byte[] to store and no additional references to process.

Let's take the Northwind Customers/Orders/OrderDetails example: I have an EntityCollection<CustomersEntity> collection containing 91 customers. The Fast Serialization code builds a list of all of the entities involved by looking at the root collection and its entities, then their member collections, then their entities, and so on. The result is a graph of 3,076 entities, of which 919 are referenced by other entities.

The 919 referenced entities are serialized first, in two stages. First, all of their owned data is written, followed by any references they might have to other entities - this includes their member collections.

Once all of the referenced entities are written, the Fast serializer then serializes the root collection. Each entity in the root collection is checked to see whether it is one of the referenced entities; if it is, the ordinal position within the referenced-entity list is written, otherwise the entity's data is serialized 'inline'. Then any references to other entities are serialized - typically member collections - where child entities not referenced anywhere else are serialized 'inline', otherwise an int of the ordinal position is written.

(Deserialization is the reverse: the referenced-entity list is deserialized first, and then the root collection is built up recursively, using entities from the referenced-entity list or creating new entities from 'inline' data as required.)

The result is just a single byte[] and no other references for the BinaryFormatter to process; hence the serialization stream contains about 200-300 bytes for the root type string plus the contents of the byte[].
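A simplified, self-contained sketch of that scheme (all types and names invented for illustration - this is not the actual code): entities referenced more than once are written first as a table, and every other occurrence is written either inline (single use) or as an int ordinal into that table.

```csharp
using System.Collections.Generic;
using System.IO;

// Stand-in for an entity with member collections.
class Entity
{
    public string Name;
    public List<Entity> Children = new List<Entity>();
}

class FastGraphWriter
{
    Dictionary<Entity, int> refCounts = new Dictionary<Entity, int>();
    Dictionary<Entity, int> sharedIndex = new Dictionary<Entity, int>();

    public byte[] Serialize(Entity root)
    {
        CountReferences(root);

        // Entities seen more than once get a slot in the referenced-entity table.
        List<Entity> shared = new List<Entity>();
        foreach (KeyValuePair<Entity, int> pair in refCounts)
            if (pair.Value > 1) { sharedIndex[pair.Key] = shared.Count; shared.Add(pair.Key); }

        MemoryStream stream = new MemoryStream();
        BinaryWriter writer = new BinaryWriter(stream);

        // Stage 1: the table - owned data first, then references.
        writer.Write(shared.Count);
        foreach (Entity e in shared) writer.Write(e.Name);
        foreach (Entity e in shared) WriteChildren(writer, e);

        // Stage 2: the root, recursively - inline or by ordinal.
        WriteEntity(writer, root);
        return stream.ToArray();
    }

    void CountReferences(Entity e)
    {
        int count;
        if (refCounts.TryGetValue(e, out count)) { refCounts[e] = count + 1; return; }
        refCounts[e] = 1;
        foreach (Entity child in e.Children) CountReferences(child);
    }

    void WriteEntity(BinaryWriter writer, Entity e)
    {
        int index;
        if (sharedIndex.TryGetValue(e, out index))
        {
            writer.Write(true);    // token: ordinal position in the table
            writer.Write(index);
        }
        else
        {
            writer.Write(false);   // single use: serialize inline
            writer.Write(e.Name);
            WriteChildren(writer, e);
        }
    }

    void WriteChildren(BinaryWriter writer, Entity e)
    {
        writer.Write(e.Children.Count);
        foreach (Entity child in e.Children) WriteEntity(writer, child);
    }
}
```

Deserialization would read the table first and then resolve ordinals against it while rebuilding the root, mirroring the description above.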

UnitOfWork2 will be written in a similar manner - there will be flags to say which collections have contents and which are null or empty. The graph generator will look at all the populated collections, build up a list of all referenced entities, and then serialize each collection in turn using the same SerializationWriter instance - therefore all the benefits of string caching and entity references accrue across multiple collections within the UnitOfWork2 instance.

The use of the surrogates was only required when we were previously musing about optimizing serialization without changing any existing templates. In effect, it bypassed ISerializable.GetObjectData altogether, since the template-generated code would have added data and references we didn't want. Since we have to modify the standard templates (for other reasons) to get Fast Serialization anyway, we can do the bypass with an if-statement instead.

Cheers Simon
