- Home
- LLBLGen Pro
- Bugs & Issues
Serialization performance
Joined: 01-Apr-2005
Hi!
I use the adapter pattern and encountered a performance problem when I started to host a new service layer in IIS (using the binary formatter) and tried to access it via remoting.
After searching the web for answers, I started to suspect that the performance problem was caused by serialization overhead. To verify this I removed the remoting configuration from my WinForms client and read 2200 entities from the service layer (which now runs in-process). Then I did this:
// Call service to fetch the collection
ResultatPfaAvslutadeService service = new ResultatPfaAvslutadeService();
EntityCollection col = service.ResultatPfaAvslutades(IndataResultatDetailid, out _messageCollection);
// Time serialization/deserialization
int start1 = Environment.TickCount;
MemoryStream outStream = new MemoryStream();
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(outStream, col);
outStream.Position = 0;
EntityCollection col2 = (EntityCollection)formatter.Deserialize(outStream);
int stop1 = Environment.TickCount;
double result1 = (stop1 - start1) / 1000.0;
MessageBox.Show(String.Format("Serialize / Deserialize: {0} s", result1));
To be able to compare the EntityCollection serialization time to something else, I retrieved a DataSet with the same set of records (using the Enterprise Library Data Access block):
Database db=DatabaseFactory.CreateDatabase();
string sql = "SELECT ResultatPfaAvslutadeId, IndataResultatDetailId, KundId, StatusId, BerakningId, PNr, Namn, Yrkeskategori, IntjanadFore1997, Pensionsavgift, KAP, " + "Schablon, SparadAv, SparadDen, UppdateradAv, UppdateradDen FROM ResultatPfaAvslutade WHERE IndataResultatDetailId=" + id;
DataSet ds=db.ExecuteDataSet(CommandType.Text,sql);
int start1 = Environment.TickCount;
MemoryStream outStream = new MemoryStream();
BinaryFormatter formatter = new BinaryFormatter();
formatter.Serialize(outStream, ds);
outStream.Seek(0, SeekOrigin.Begin);
DataSet ds2 = (DataSet)formatter.Deserialize(outStream);
int stop1 = Environment.TickCount;
double result1 = (stop1 - start1) / 1000.0;
MessageBox.Show(String.Format("Serialize / Deserialize: {0} s", result1));
The serialization/deserialization time of the EntityCollection is approximately 7 seconds. The serialization/deserialization time of the DataSet is approximately 0.55 seconds. Do you know why serialization of the EntityCollection is so slow compared to the DataSet? (I have verified that EntityCollection.GetObjectData is only called once => no child collections). I am hoping something can be done to increase serialization performance of the EntityCollection. If you want to try to reproduce this problem, the table ResultatPfaAvslutade has the following definition (indexes and keys excluded):
CREATE TABLE [dbo].[ResultatPfaAvslutade] (
    [ResultatPfaAvslutadeId] [int] IDENTITY (1, 1) NOT NULL ,
    [IndataResultatDetailId] [int] NULL ,
    [KundId] [int] NULL ,
    [StatusId] [smallint] NULL ,
    [BerakningId] [int] NULL ,
    [PNr] [char] (12) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Namn] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [Yrkeskategori] [smallint] NULL ,
    [IntjanadFore1997] [int] NULL ,
    [Pensionsavgift] [int] NULL ,
    [KAP] [int] NULL ,
    [Schablon] [char] (1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [SparadAv] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [SparadDen] [datetime] NULL ,
    [UppdateradAv] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
    [UppdateradDen] [datetime] NULL
) ON [PRIMARY]
I have already tried to rewrite the GetObjectData and "serialization constructors" involved, but so far my versions are just as slow. Maybe you can find a solution.
Regards,
Tomas Sondén
The DataSet is faster because it serializes one set of column definitions for all rows, while every entity object is serialized on its own, and the binary/soap formatters simply put the info for each field in a separate element.
The serialization code in the entities and entity collections is written for 'general purpose' use, i.e. it gets the data across. For speed, the entity collection shouldn't serialize its inner entities individually, but should instead serialize the data of those entities in a packed blob. That would speed up serialization a lot, because you'd avoid having all the object elements inside each entity serialized as well.
Generally though, it's not a good idea to send thousands of objects across a remoting/service wire, simply because it will potentially be a lot of data.
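To illustrate the "packed blob" idea: instead of letting the formatter walk every field object individually (each getting its own type record in the stream), the collection can write all values into a single array in one AddValue call. A minimal sketch, with an invented PackedEntity class purely for illustration:

```csharp
// Hypothetical sketch, not LLBLGen Pro code. Packing all field values
// into a single object[] means the formatter writes one stream entry
// instead of one entry per field object.
[Serializable]
public class PackedEntity : ISerializable
{
    private object[] _fieldValues; // the actual data carriers

    public PackedEntity(object[] fieldValues)
    {
        _fieldValues = fieldValues;
    }

    // Deserialization constructor: unpack the blob in one call.
    protected PackedEntity(SerializationInfo info, StreamingContext context)
    {
        _fieldValues = (object[])info.GetValue("data", typeof(object[]));
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        // One entry in the stream instead of one per field object.
        info.AddValue("data", _fieldValues);
    }
}
```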
Joined: 01-Apr-2005
Thank you for your reply!
Yes, it is usually a **very bad idea** to send thousands of objects across the wire, but in this case it is necessary. To solve the problem I will try to write a specialized EntityCollection class that only accepts entities without object references, and then serialize the entities in a big blob as you suggest.
By the way, after installing hotfix 890929 (http://support.microsoft.com/default.aspx?scid=kb;en-us;890929) we reduced the serialization time from 25 seconds to 7. Maybe someone who has fewer objects than we do can get some performance improvements by installing the hotfix.
Sonden wrote:
Thank you for your reply!
Yes, it is usually a **very bad idea** to send thousands of objects across the wire, but in this case it is necessary. To solve the problem I will try to write a specialized EntityCollection class that only accepts entities without object references, and then serialize the entities in a big blob as you suggest.
Ok, do it this way: create an ArrayList and add all the EntityFields2 objects to it, i.e. collection[index].Fields. These are the actual data carriers. Then serialize the ArrayList and send that to the client.
On the client, you have to know the type of the entities. You then create the entities back using: entityCollection.Add(entityFactory.Create((IEntityFields2)yourDeserializedArrayList[index]));
This should give less overhead while serializing.
With the upcoming 1.0.2005.1 with inheritance, this can lead to problems (with multiple types in a single collection), but for single-typed collections it should work.
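The steps above could be sketched like this (a sketch, not tested code; `entityCollection`, `entityFactory` and the deserialized list name are placeholders for whatever your service and client use):

```csharp
// Server side: collect the EntityFields2 objects, which carry the
// actual data, into an ArrayList and serialize only that list.
ArrayList fieldsList = new ArrayList();
for (int i = 0; i < entityCollection.Count; i++)
{
    fieldsList.Add(entityCollection[i].Fields);
}
// ... serialize fieldsList with the BinaryFormatter and send it across.

// Client side: rebuild the entities from the deserialized fields,
// using the factory for the (single) entity type in the collection.
EntityCollection rebuilt = new EntityCollection(entityFactory);
for (int i = 0; i < deserializedList.Count; i++)
{
    rebuilt.Add(entityFactory.Create((IEntityFields2)deserializedList[i]));
}
```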
By the way, after installing hotfix 890929 (http://support.microsoft.com/default.aspx?scid=kb;en-us;890929) we reduced the serialization time from 25 seconds to 7. Maybe someone who has fewer objects than we do can get some performance improvements by installing the hotfix.
Whoa, great tip, thanks! (especially because the designer deserializes too, and with very large projects this can become slow)
Joined: 01-Apr-2005
Hello!
I took another, slightly more ambitious approach.
First I added a new boolean property UseCustomSerializer on EntityCollection and on our entity classes that inherit from EntityBase2. Then I modified our customized version of the adapter templates to generate deserialization constructors like this:
protected BerakningEntity(SerializationInfo info, StreamingContext context)
{
_useCustomSerializer = info.GetBoolean("_useCustomSerializer");
if (_useCustomSerializer)
{
byte[] entityData=(byte[]) info.GetValue("_customSerializerData",typeof(byte[]));
InitClassEmpty(null, EntityFieldsFactory.CreateEntityFieldsObject(Handelsbanken.Koss.DAL.EntityType.BerakningEntity));
CustomSerializer.Deserialize(this,entityData);
}
else
{
CustomSerializer.DefaultDeserializeEntity(this,info,context);
....
RewireEventHandlers();
}
}
internal void RewireEventHandlers()
{
....
}
And GetObjectData like this:
public override void GetObjectData(SerializationInfo info, StreamingContext context)
{
info.AddValue("_useCustomSerializer",_useCustomSerializer);
if (_useCustomSerializer)
{
info.AddValue("_customSerializerData",CustomSerializer.Serialize(this));
}
else
{
....
base.GetObjectData(info, context);
}
}
The CustomSerializer class serializes and deserializes EntityBase2 and EntityCollection objects approximately 10 times faster than the BinaryFormatter. The serialization time of our EntityCollection with 2200 items was reduced from 7 s to 0.65 s. Before we applied the hotfix mentioned above, the serialization time was 25 seconds. Going from 25 s to 0.65 s is OK.
The CustomSerializer can handle cyclic references, but it has at least these restrictions:
/// <summary>
/// The CustomSerializer serializes (and deserializes) EntityCollection and direct
/// descendants of EntityBase2 with the following restrictions:
/// EntityBase2 descendant fields are only serialized if they are of the types IEntity2,
/// EntityCollection, string or a primitive type (e.g. Int32).
/// The following EntityBase2 fields (or properties) are NOT serialized:
/// EntityValidatorToUse, ConcurrencyPredicateFactoryToUse, _isNewViaDataBinding and _savedFields.
/// Set UseCustomSerializer on an EntityCollection or Entity to use this serializer (instead
/// of the BinaryFormatter or SoapFormatter).
/// </summary>
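The core trick behind such a custom serializer can be sketched as follows. This is illustrative only (the actual CustomSerializer is not shown in this thread); a real implementation needs null markers, DateTime/decimal handling, reference tracking for cycles, and a matching unpack routine:

```csharp
// Illustrative sketch of the packed-blob idea: write field values with
// a BinaryWriter into one byte[] instead of relying on the formatter's
// per-object bookkeeping. Only a few types are handled here.
public static byte[] PackFields(object[] values)
{
    MemoryStream stream = new MemoryStream();
    BinaryWriter writer = new BinaryWriter(stream);
    writer.Write(values.Length);
    foreach (object value in values)
    {
        // A one-byte type tag per value, then the raw value.
        if (value is int)
        {
            writer.Write((byte)1);
            writer.Write((int)value);
        }
        else if (value is string)
        {
            writer.Write((byte)2);
            writer.Write((string)value);
        }
        else
        {
            writer.Write((byte)0); // treated as null / unsupported
        }
    }
    writer.Flush();
    return stream.ToArray();
}
```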
Unfortunately the CustomSerializer has to use reflection to serialize and deserialize many EntityCollectionBase2 and EntityBase2 fields. This tells me the serialization code actually belongs in those classes. Another problem with the CustomSerializer class is that it will probably break when new versions of LLBLGen Pro are released.
When LLBLGen Pro for .NET 2.0 arrives, we can hopefully stop using the CustomSerializer class, which is now generated into the HelperClasses folder by our modified adapter templates. But if the binary serializer in 2.0 is as slow as the 1.1 version (for large object graphs), it would be nice to have some kind of specialized serialization code in upcoming versions of LLBLGen Pro.
Regards,
Tomas Sondén
Joined: 06-Sep-2005
I thought the general performance penalty of serialization had mostly to do with the time required to generate the app-object-specific DLLs at run-time. There are extra bytes when using the XML format, but it always seems that the primary driver of horrific times is the DLL generation, not so much the inefficiency of one format over another.
Doug_Hettinger wrote:
I thought the general performance penalty of serialization had mostly to do with the time required to generate the app-object-specific DLLs at run-time. There are extra bytes when using the XML format, but it always seems that the primary driver of horrific times is the DLL generation, not so much the inefficiency of one format over another.
That's the case with XML serialization using the XmlSerializer. This thread is about binary serialization, which is done by the binary formatter (or your own).
Sonden: great code!
Joined: 06-Sep-2005
That fact makes the times reported in this thread especially atrocious. Hard to believe it could be so slow, especially compared to another high-level mechanism such as a DataReader or whatever it was that was the subject of the comparison.
Doug_Hettinger wrote:
That fact makes the times reported in this thread especially atrocious. Hard to believe it could be so slow, especially compared to another high-level mechanism such as a DataReader or whatever it was that was the subject of the comparison.
Well, the amount of data of a hierarchical object graph is not necessarily bigger in memory, but because there are a lot of different objects per entity, albeit small ones, they eat up a lot of space in the serialized data, as every object gets its own reference. A lot of this is meta-data, which could be re-created when you reconstruct the graph during deserialization, though currently that's not the case.
For 2.0 I'm considering a remoting formatter which takes care of this, or other ways to drastically limit the amount of data serialized during binary serialization. The DataSet does the same thing: it packs all its data into a single block and serializes that. I currently leave it to the binary formatter, but it's perhaps better to do it myself. Though it's not simple.