File Encoding Issues

Cadmium
User
Posts: 153
Joined: 19-Sep-2003
# Posted on: 22-Nov-2006 00:02:15   

Disclaimer: I don't know that there is actually a solution to this problem.

Problem: Files that are in an encoding other than the project-specified encoding are read as garbage and lose everything in their user code regions.

This is by design! Yes, I'm not a file encoding wizard and everything I read seems to indicate it is difficult or impossible to accurately detect a file's actual encoding.

Why is this a problem? Visual Studio likes to occasionally change a file's encoding to ANSI (or similar) for the fun of it. I should admit that this is most common with aspx files.

But LLBLGen doesn't generate aspx files! I've written a few custom task performers :)

(Lame) Workaround: Catch a file that has been overwritten and hope you have a backup in svn. Restore the code and change the file to the correct encoding.

Solution? Since it's difficult to detect the proper encoding, and Visual Studio's behavior here seems difficult to predict, I don't know that there is a good solution, but maybe you know something I don't.

*This is all with VS2k3 and LLBLGen v1.

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 22-Nov-2006 08:38:10   

In the designer in project properties, what's the encoding you've specified there? (under Output settings -> EncodingToUse). You can specify ASCII there, if you don't use characters with ascii codes > 127 (like è for example)
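
Whether ASCII is a safe choice can be checked before switching. A minimal Python sketch, not LLBLGen code (the helper name `non_ascii_chars` is made up for illustration), that lists the characters in a piece of source text with codes above 127:

```python
def non_ascii_chars(text):
    """Return the sorted, de-duplicated characters whose code point is above 127."""
    return sorted({ch for ch in text if ord(ch) > 127})


print(non_ascii_chars("public class Customer {}"))  # []
print(non_ascii_chars("// è"))                      # ['è']
```

An empty list suggests the text would survive being written as ASCII; anything else would be lost or replaced.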

Frans Bouma | Lead developer LLBLGen Pro
Cadmium
User
Posts: 153
Joined: 19-Sep-2003
# Posted on: 22-Nov-2006 16:12:56   

Otis wrote:

In the designer in project properties, what's the encoding you've specified there? (under Output settings -> EncodingToUse). You can specify ASCII there, if you don't use characters with ascii codes > 127 (like è for example)

I'm using the default, which I believe is UTF-8. Wouldn't I manually have to convert all of the files before regenerating or risk encoding issues there too?

This might work. Initial tests are positive.

Cadmium
User
Posts: 153
Joined: 19-Sep-2003
# Posted on: 22-Nov-2006 16:16:05   

By the way, thank you for the sourcecode for the task performers. Without being able to step through the code I probably never would have figured out WHY I was having problems.

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 22-Nov-2006 16:44:05   

Cadmium wrote:

Otis wrote:

In the designer in project properties, what's the encoding you've specified there? (under Output settings -> EncodingToUse). You can specify ASCII there, if you don't use characters with ascii codes > 127 (like è for example)

I'm using the default, which I believe is UTF-8. Wouldn't I manually have to convert all of the files before regenerating or risk encoding issues there too?

No, the output is encoded in the encoding specified by the streamwriter object, so it should be taken care of. The text is always converted from .NET strings to the output by the streamwriter so the encoding it's in isn't important for the output.
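
The same idea can be illustrated outside .NET. A small Python sketch (an analogy only, not the generator's code): the in-memory text has no byte encoding of its own, and the bytes are decided at write time by whichever encoding the writer uses.

```python
text = "è"  # in-memory text, comparable to a .NET string

utf8_bytes = text.encode("utf-8")     # what a UTF-8 writer would emit
cp1252_bytes = text.encode("cp1252")  # what an ANSI (code page 1252) writer would emit

print(utf8_bytes)    # b'\xc3\xa8'
print(cp1252_bytes)  # b'\xe8'

# Both byte sequences decode back to the same text when read with the matching encoding.
print(utf8_bytes.decode("utf-8") == cp1252_bytes.decode("cp1252"))  # True
```

So writing is not where the risk lies; a mismatch only shows up when existing bytes are read back with a different encoding than the one they were saved in.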

I'm still interested in what kind of garbage you end up with though. The files are scrambled completely?

About the sourcecode: No problem, glad it's helpful :) It's often useful to have sourcecode for debugging purposes like you needed, or to extend existing code :)

Frans Bouma | Lead developer LLBLGen Pro
Cadmium
User
Posts: 153
Joined: 19-Sep-2003
# Posted on: 22-Nov-2006 16:59:03   

See attached. What seems to happen in the parser is that _originalFileContents gets set to junk, so the regex parsing of _userCodeRegions doesn't find anything (in the junk text). So it regenerates new content for the file (in UTF-8) that has no user code regions any more.

Your suggestion for changing the project encoding to ASCII should be a decent workaround for now (we don't do a lot of non-English projects here).

I wish Visual Studio didn't wet the bed so much though ;)

If you want to recreate this process manually, save a generated file as ANSI/ASCII, add some code to a user code region and regenerate. Notepad++ makes it really easy to switch the file encoding, or you can use File->Advanced Save Options in Visual Studio (2003 at least).

If the project is set to UTF-8 and the file to ANSI/ASCII, it will lose the code every time.

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 23-Nov-2006 12:28:56   

Cadmium wrote:

See attached. What seems to happen in the parser is that _originalFileContents gets set to junk, so the regex parsing of _userCodeRegions doesn't find anything (in the junk text). So it regenerates new content for the file (in UTF-8) that has no user code regions any more.

Your suggestion for changing the project encoding to ASCII should be a decent workaround for now (we don't do a lot of non-English projects here).

I wish Visual Studio didn't wet the bed so much though ;)

If you want to recreate this process manually, save a generated file as ANSI/ASCII, add some code to a user code region and regenerate. Notepad++ makes it really easy to switch the file encoding, or you can use File->Advanced Save Options in Visual Studio (2003 at least).

If the project is set to UTF-8 and the file to ANSI/ASCII, it will lose the code every time.

Hmm, that's indeed a lot of garbage. Could you determine what the encoding is when the file is read? Looks like something completely incompatible with the .net streamreader.

Frans Bouma | Lead developer LLBLGen Pro
Cadmium
User
Posts: 153
Joined: 19-Sep-2003
# Posted on: 30-Nov-2006 00:20:52   

Otis wrote:

Hmm, that's indeed a lot of garbage. Could you determine what the encoding is when the file is read? Looks like something completely incompatible with the .net streamreader.

Sorry for the delay, it was our Thanksgiving Holiday over here and I took some extra time off.

I'm no encoding expert, but Notepad++ autodetects it as "ANSI" and Visual Studio ('03) seems to think it's "Western European (Windows) - Codepage 1252", though it might just be defaulting to that, I'm not sure.

If you have any other procedure for determining the file encoding let me know and I'll try it. I could also send you an example if you wish.
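
One procedure that can be tried is looking at the first bytes of the file for a byte order mark (BOM). A minimal Python sketch of that check follows; the function name `sniff_bom` is made up for illustration, and this is not what Notepad++ or Visual Studio necessarily do internally.

```python
import codecs

# Check the longer BOMs first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


print(sniff_bom(codecs.BOM_UTF8 + b"class A {}"))  # utf-8
print(sniff_bom(b"class A {}"))                    # None
```

A result of `None` only says there is no BOM. An ANSI file and a UTF-8 file saved without a BOM both look like that, so this check cannot tell them apart.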

Hope that helps.

Otis
LLBLGen Pro Team
Posts: 39588
Joined: 17-Aug-2003
# Posted on: 30-Nov-2006 09:46:42   

Very strange, as it looks like the file is simply garbage. I use a normal streamreader with the encoding Unicode. Perhaps that's the problem.

At the top of LptParserEngine, in the constructor _encodingToUse is set to an encoding. Could you set that to Default please? Or change the StreamReader call in TestFileCanBeOverwritten to not use an encoding and try again?
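
Why the reader's encoding could matter here can be sketched in Python. This is an illustration under assumptions, not the LptParserEngine code: the region markers below are hypothetical stand-ins for the real user code region comments, "utf-16-le" stands in for .NET's Unicode encoding, and "cp1252" stands in for the system default ANSI code page.

```python
import re

# Hypothetical marker names standing in for the real user code region comments.
REGION = re.compile(
    r"// USER_CODE_REGION_START (\w+)\n(.*?)// USER_CODE_REGION_END",
    re.DOTALL,
)

source = (
    "// USER_CODE_REGION_START CustomCode\n"
    "int answer = 42;\n"
    "// USER_CODE_REGION_END\n"
)
ansi_bytes = source.encode("cp1252")  # the file as an editor saved it in ANSI

# Reading ANSI bytes as UTF-16 pairs up the bytes into unrelated characters.
as_utf16 = ansi_bytes.decode("utf-16-le", errors="replace")
# Reading them with the matching code page recovers the original text.
as_ansi = ansi_bytes.decode("cp1252")

print(len(REGION.findall(as_utf16)))  # 0
print(len(REGION.findall(as_ansi)))   # 1
```

With the mismatched decoder the pattern matches nothing, which is consistent with the user code regions disappearing on regeneration.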

Frans Bouma | Lead developer LLBLGen Pro