VARBINARY returned as string?!

Posts   
 
    
Posts: 21
Joined: 18-Sep-2006
# Posted on: 18-Sep-2006 18:04:35   

I'm storing GUIDs as VARBINARY(16) in a MySQL 5 database, and trying to write a TypeConverter to handle the conversion.

However, the LLBLGen application and the generated code seem to disagree about the C# type of a VARBINARY field:

  • The app calls it VARBINARY and claims it's a Byte[] in C#, and ensures that the substitute TypeConverter deals with Byte[] input.
  • The generated code calls it VARCHAR and claims it's a string. This causes the TypeConverter to choke, because it's not expecting a string. Furthermore, the conversion to a Unicode string from an array of raw, unsigned bytes has corrupted said bytes; converting to a char array and back to a byte array (char by char) won't get the original data back.

Having looked through the generated code, I note that the field types are stored in an Enum (MySqlType) which is in the CoreLab assembly, and this enum doesn't contain VARBINARY...

There seems to be some fundamental inconsistency between the way the LLBLGen app sees the database and the way the code it generates sees the database. How is it possible to store GUIDs as binary values when CoreLab's library doesn't seem to recognise them?!

Has anyone else encountered this problem, and are there any workarounds? Or am I missing some obvious configuration option which eliminates the problem entirely stuck_out_tongue_winking_eye

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 18-Sep-2006 21:17:43   

The .NET type that's displayed in the fields grid of the entity in the llblgen pro designer is the .NET type the property which represents the field will get in the generated code.

So if the field has as .NET type byte[], it will have byte[] as .NET type in the generated code.

So I'm confused about:

The app calls it VARBINARY and claims it's a Byte[] in C#, and ensures that the substitute TypeConverter deals with Byte[] input. * The generated code calls it VARCHAR and claims it's a string. This causes the TypeConverter to choke, because it's not expecting a string.

When are which types defined exactly? 'Varchar' isn't a .net type, so I've a bit of a problem with where this is defined in your code.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 17
Joined: 13-Sep-2006
# Posted on: 19-Sep-2006 10:43:28   

OK, to clarify...

In the LLBLGen entity properties dialog, the default type of our VARBINARY(16) fields appear as System.Byte[]. (I also note that BINARY fields only map to System.String... Why this shouldn't map to System.Byte[] also escapes me).

We have then defined a type converter that converts a Guid to / from a byte array. This converter is loaded fine in the entity properties dialog, and we can set our fields to be of return type System.Guid using this converter.

However, the fun starts when we compile and run the generated code. Our converter is being passed System.String objects, which are not defined as an acceptable type in our converter logic. The VarChar mapping Alex talks about is present in the generated PersistenceInfoProvider.cs source file:


/// <summary>Inits Table1Entity's mappings</summary>
        private void InitTable1EntityMappings()
        {
            base.AddElementMapping( "Table1Entity", "test", @"test", "table_1", 2 );
            base.AddElementFieldMapping( "Table1Entity", "Id", "id", false, (int)MySqlType.VarChar, 16, 0, 0, false, "",  new BluewireTechnologies.Data.GuidTypeConverter(), typeof(System.Byte[]), 0 );
            base.AddElementFieldMapping( "Table1Entity", "Data", "data", false, (int)MySqlType.VarChar, 45, 0, 0, false, "", null, typeof(System.String), 1 );
        }

Here, Id is our VARBINARY(16) field, but the generated code indicates the source column type is MySqlType.VarChar...

Hope this further information lets you cast more light on our problem.

Regards, James Jackson.

Posts: 17
Joined: 13-Sep-2006
# Posted on: 19-Sep-2006 10:59:11   

Further clarification of execution. I have written this simple test program:


class Program
    {
        static void Main(string[] args)
        {
            IDataAccessAdapter adapter =
                new DataAccessAdapter("Server=localhost;Database=test;User ID=root;Password=***********;");

            Guid g = Guid.NewGuid();
            Console.WriteLine(g.ToString());
            
            Table1Entity e = new Table1Entity();
            
            e.Id = g;
            e.Data = "Some test data";

            adapter.SaveEntity(e);
            
            Table1Entity e2 = new Table1Entity(g);
            adapter.FetchEntity(e2);
        }
    }

So... On SaveEntity call, our converter gets called successfully, turns the Guid in a Byte array and everything's good. On the FetchEntity call, first the converter gets called to convert the Guid to a ByteArray, which works fine. Then the ConvertFrom method gets called, but with a string being passed in. If each character in the string is taken as a byte, it's almost what we passed in, but with some bytes mangled (we assume due to character encoding being applied - the table uses UTF8 codepage / collation).

So, we have two problems. Firstly we're not getting a byte array from the database driver when fetching entities, and secondly the stored (or retrieved... need to check this) data has been mangled.

Any ideas?

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Sep-2006 11:23:04   

greybadger_19 wrote:

OK, to clarify...

In the LLBLGen entity properties dialog, the default type of our VARBINARY(16) fields appear as System.Byte[]. (I also note that BINARY fields only map to System.String... Why this shouldn't map to System.Byte[] also escapes me).

BINARY is a type overlooked in the driver. I'll add that, although I think it's the same type as CHAR(n) binary.

[quote] We have then defined a type converter that converts a Guid to / from a byte array. This converter is loaded fine in the entity properties dialog, and we can set our fields to be of return type System.Guid using this converter.

However, the fun starts when we compile and run the generated code. Our converter is being passed System.String objects, which are not defined as an acceptable type in our converter logic. The VarChar mapping Alex talks about is present in the generated PersistenceInfoProvider.cs source file:


/// <summary>Inits Table1Entity's mappings</summary>
        private void InitTable1EntityMappings()
        {
            base.AddElementMapping( "Table1Entity", "test", @"test", "table_1", 2 );
            base.AddElementFieldMapping( "Table1Entity", "Id", "id", false, (int)MySqlType.VarChar, 16, 0, 0, false, "",  new BluewireTechnologies.Data.GuidTypeConverter(), typeof(System.Byte[]), 0 );
            base.AddElementFieldMapping( "Table1Entity", "Data", "data", false, (int)MySqlType.VarChar, 45, 0, 0, false, "", null, typeof(System.String), 1 );
        }

Here, Id is our VARBINARY(16) field, but the generated code indicates the source column type is MySqlType.VarChar...

(edit) It seems the corelab provider doesn't think varbinary is returnable as a byte[]. I'll check it out. They don't seem to support varbinary in the provider, just varchar. Strange.

I'll do some testing with the binary field and char (n) binary to see how these turn up in the meta-data of MYSQL (which is always a bit of a gamble).

Frans Bouma | Lead developer LLBLGen Pro
Posts: 17
Joined: 13-Sep-2006
# Posted on: 19-Sep-2006 11:23:23   

Sorry, I was incorrect in stating the characters in the string were mangled. Taking each character in the string as a byte, we are getting back what we passed in.

The question still remains - why is this coming as a string and not as a byte array? We can enumerate through the characters in the string, build a byte array, and then load the Guid from this but considering the data should be coming out of the database as a binary it seems like somewhere we're converting binary --> string --> binary.

Is this a problem with the Core driver? It looks like it could be...

Posts: 17
Joined: 13-Sep-2006
# Posted on: 19-Sep-2006 11:24:17   

Just seen your reply - thanks for looking into it.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Sep-2006 11:31:43   

greybadger_19 wrote:

Sorry, I was incorrect in stating the characters in the string were mangled. Taking each character in the string as a byte, we are getting back what we passed in.

Correct. It seems that mysql stores everything as string data anyway, so a binary(16) is simply a char(16) with all the bytes.

The question still remains - why is this coming as a string and not as a byte array? We can enumerate through the characters in the string, build a byte array, and then load the Guid from this but considering the data should be coming out of the database as a binary it seems like somewhere we're converting binary --> string --> binary. Is this a problem with the Core driver? It looks like it could be...

The corelab provider doesn't support varbinary / binary as a type, so they're returning it as a varchar. disappointed

It's thus not correct what I'm doing as well: varbinary won't resolve to a byte[] at all, it will resolve to a string. I'll do a test.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 17
Joined: 13-Sep-2006
# Posted on: 19-Sep-2006 11:32:11   

Otis wrote:

BINARY is a type overlooked in the driver. I'll add that, although I think it's the same type as CHAR(n) binary.

Nearly...

"The BINARY and VARBINARY data types are distinct from the CHAR BINARY and VARCHAR BINARY data types. For the latter types, the BINARY attribute does not cause the column to be treated as a binary string column. Instead, it causes the binary collation for the column character set to be used, and the column itself contains non-binary character strings rather than binary byte strings. For example, CHAR(5) BINARY is treated as CHAR(5) CHARACTER SET latin1 COLLATE latin1_bin, assuming that the default character set is latin1. This differs from BINARY(5), which stores 5-bytes binary strings that have no character set or collation."

So it may be returned as CHAR(n), it's handled differently internally...

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Sep-2006 11:35:17   

Ah simple_smile I now see it in the docs as well. The problem is the same as with the varbinary: corelab doesn't support it. So it's likely returned as a string as well.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 17
Joined: 13-Sep-2006
# Posted on: 19-Sep-2006 11:36:41   

Otis wrote:

he corelab provider doesn't support varbinary / binary as a type, so they're returning it as a varchar. disappointed

It's thus not correct what I'm doing as well: varbinary won't resolve to a byte[] at all, it will resolve to a string. I'll do a test.

Rrrrubbish stuck_out_tongue_winking_eye I get the impression that CoreLab are a bit slow on the uptake when it comes to updating things like this too? If we have to enumerate every character string in a VARBINARY at runtime to build a byte array (rather than the driver just streaming it straight into a byte array to begin with) that could cause significant performance problems - all our primary keys are GUIDs.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Sep-2006 11:42:52   

Ok, when I create a BINARY field in a table using Mysql administrator, I get a varbinary field on MySql 4 and a VarChar field on MySql 5. MySql 4 converts it to VARBINARY internally, and Mysql 5 stores it as Binary but the driver doesn't recognize that.

Frans Bouma | Lead developer LLBLGen Pro
Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Sep-2006 11:46:12   

greybadger_19 wrote:

Otis wrote:

he corelab provider doesn't support varbinary / binary as a type, so they're returning it as a varchar. disappointed

It's thus not correct what I'm doing as well: varbinary won't resolve to a byte[] at all, it will resolve to a string. I'll do a test.

Rrrrubbish stuck_out_tongue_winking_eye I get the impression that CoreLab are a bit slow on the uptake when it comes to updating things like this too? If we have to enumerate every character string in a VARBINARY at runtime to build a byte array (rather than the driver just streaming it straight into a byte array to begin with) that could cause significant performance problems - all our primary keys are GUIDs.

Well, I build against 3.20.9 and in there, it doesn't have support for binary/varbinary fields in the type enum so you and I can jump up and down for ages, it's not going to change all of a sudden. simple_smile I'm not sure about 3.50 though, it might be supported in that provider version. What provider version are you using? 3.50? Could you check for me please if the Mysqltype enum in their assembly contains values for binary/varbinary? (edit) 3.50 also doesn't have binary/varbinary in the enum of types supported, so I can't say MySqlType.VarBinary, as that's not a valid value, I have to revert to varchar.

Varbinary was already present in mysql 4, so I really doubt this is slowness from their part.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 17
Joined: 13-Sep-2006
# Posted on: 19-Sep-2006 11:47:16   

It seems I was actually telling the truth about mangling too, I was just lucky with the last byte array I looked at.

a GUID I just created had three corrupt bytes - as raw integers they were 151, 138 and 132 in the original, and had values of 8212, 351 and 8222 in the string passed to our converter.

Where do you suggest we go from here? Do we need to try and get the CoreLabs people to sort out their driver?

Posts: 17
Joined: 13-Sep-2006
# Posted on: 19-Sep-2006 11:51:21   

Otis wrote:

I'm not sure about 3.50 though, it might be supported in that provider version. What provider version are you using? 3.50? Could you check for me please if the Mysqltype enum in their assembly contains values for binary/varbinary?

Yes, I have v3.50. The MySqlType enum doesn't have Binary or VarBinary in this version either.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Sep-2006 11:58:27   

greybadger_19 wrote:

It seems I was actually telling the truth about mangling too, I was just lucky with the last byte array I looked at.

a GUID I just created had three corrupt bytes - as raw integers they were 151, 138 and 132 in the original, and had values of 8212, 351 and 8222 in the string passed to our converter.

I can't find anything about binary/varbinary in their docs, only related to blob. The 151 value is likely seen as a character not being part of the standard ascii set so it's converted disappointed Hmm...

Where do you suggest we go from here? Do we need to try and get the CoreLabs people to sort out their driver?

That's an option indeed, though it will likely take a couple of days before you'll get a reply.

The main problem is the value that's returned from the datareader. The value you get passed into the typeconverter is the raw value I get back from the datareader. As that's just the value from a select ... it's not changed because of a varchar specification (that's used for parameters), so if you get a string passed in, datareader.GetValue(i) returns a string for the varbinary field, so it's a corelab issue.

To work around this, you could opt for what people use on oracle as well: use char(32) to store guids and store them as hex strings. In the type converter you pass the string to the Guid constructor to create a real GUID value.

Frans Bouma | Lead developer LLBLGen Pro
Posts: 17
Joined: 13-Sep-2006
# Posted on: 19-Sep-2006 12:06:17   

I notice that other people have commented on this before:

http://crlab.com/forums/viewtopic.php?t=4293

So it looks like the bytes can be got at if using a MySqlDataReader... Is this any use or can this not be applied to LLBLGen's driver code?

I'd like to avoid lengthening the field if at all possible, considering we frequently join across multiple tables using GUIDs as PK / FK relations, shoving another 16 bytes per key could have performance issues, and will definitely impact on our storage footprint.

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Sep-2006 12:17:23   

greybadger_19 wrote:

I notice that other people have commented on this before:

http://crlab.com/forums/viewtopic.php?t=4293

So it looks like the bytes can be got at if using a MySqlDataReader... Is this any use or can this not be applied to LLBLGen's driver code?

That can't be applied to llblgen pro code as the datareader is read in generic code. What they propose is also a silly solution, all they have to do is return a byte[] when the type is (var)binary. That's it. Also with a custom MySqlString class it won't work, as that has the same problem: datareader.GetValues() will then still return varchar.

I'd like to avoid lengthening the field if at all possible, considering we frequently join across multiple tables using GUIDs as PK / FK relations, shoving another 16 bytes per key could have performance issues, and will definitely impact on our storage footprint.

I wonder why you don't use int PK's then? GUIDs for PKs are most of the time a pain, because they're random, which means each insert likely corrupts an index on disk and requires the RDBMS to make room in a stored (clustered) index. I'm not sure if mysql even uses that though..

Frans Bouma | Lead developer LLBLGen Pro
Posts: 17
Joined: 13-Sep-2006
# Posted on: 19-Sep-2006 12:21:41   

We have to use GUIDs for a couple of reasons which I won't go in to, but basically our architecture depends on this from the ground up.

We're currently migrating from SqlServer, so have lots of experience with indexing GUID PKs. Basically, our GUIDs are a hybrid of a 'standard' UUID with high-end bytes replaced with a time-dependant hash. The way SqlServer compares uniqueidentifer fields means that our indexes do not get corrupt, and we can also cluster on just our PKs and still get recent items near the top of each page.

We do need to look into how MySql handles this, although if CoreLabs don't sort out their driver we are rather stuck!

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 19-Sep-2006 16:56:37   

Please let me know if I need to provide info to Corelab. I've looked into ways to convert the bytes in the string but the conversion apparently is already done for you when the value comes out of the datareader as a string disappointed

If NOTHING helps, you could make a change to the runtime lib. At line 714 in DataAccessAdapterBase and at line 4860, the datareader is read (for multi and single entity fetches).

In there you could traverse the fieldset and if a field is a byte[], you could call GetBytes()...

But it's of course a support problem from then on, as you then have to make sure that your code stays in sync with our code. We don't use GetBytes at that level as it slows down the traversal of rows in the datareader.

Frans Bouma | Lead developer LLBLGen Pro
slipserve
User
Posts: 7
Joined: 04-Jul-2007
# Posted on: 04-Jul-2007 19:32:59   

If I followed this thread, I am seeing the same behavior with LLBLGenPro and the Npgsql (included with PostgreSQL 8.2) driver. All columns defined as "bit varying" are recognized in the field mapping as varchar. This is problematic as I plan on storing/retrieving binary serialized objects in this column.

I'm using the June 15th, 2007 (final) release of LLBLGenPro 2.0.

Any advice is appreciated!

Otis avatar
Otis
LLBLGen Pro Team
Posts: 39614
Joined: 17-Aug-2003
# Posted on: 04-Jul-2007 20:46:01   

Shouldn't you define blob fields as byte arrays ?

Frans Bouma | Lead developer LLBLGen Pro
slipserve
User
Posts: 7
Joined: 04-Jul-2007
# Posted on: 05-Jul-2007 00:19:12   

I didn't think that LLBLGenPro supported them based on the documentation. "PostgreSql All features of PostgreSql 7.4 or higher, except array types. "

After reviewing the PostgreSQL doc, however, I now realize that array types and BYTEA are different. I'll try it.

Thanks!