Topic: UTF8 decoding with false exceptions  (Read 5156 times)

KommerszUnicum

  • Newbie
  • Posts: 8
UTF8 decoding with false exceptions
« on: August 01, 2012, 07:46:34 AM »
Hello,

I've run into two different issues with IceChat and finally found that both of them have the same root cause. I usually connect to a custom UTF-8 IRC server. The problems were the following:

1. Sometimes not all users in the channel were listed when I logged in.
2. All users were listed, BUT not all of them had a full name in the Nicklist.

First I started with the debug window and found that the server sent all the information about the nicks, but part of the WHO replies had been decoded as "Windows-1252". I discovered that on the line "strData = enc.GetString(readBuffer)" GetString threw a DecoderFallbackException, and because of that the undecoded readBuffer was decoded as Windows-1252 instead. As a result, the nicks with wrongly decoded characters did not appear in the NickList. The strange thing about this problem is that GetString threw the exception for characters THAT had previously been DECODED correctly...

I examined UTF8Encoding enc and found that the object had been initialized to throw an exception whenever it meets invalid bytes:

UTF8Encoding enc = new UTF8Encoding(false, true);

Because I considered these exceptions false, I changed the code to:

UTF8Encoding enc = new UTF8Encoding(false, false);

After this change the problems disappeared and the nicks are decoded correctly. I know that decoding errors could now go unnoticed, but the problems are solved.
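The second constructor argument of UTF8Encoding is throwOnInvalidBytes. A minimal C# sketch of the difference between the two settings, assuming a buffer that ends partway through a two-byte character (the class name ThrowOnInvalidDemo is only for illustration):

```csharp
using System;
using System.Text;

public static class ThrowOnInvalidDemo
{
    // Returns the decoded text, or null when the encoding throws
    // instead of substituting a replacement character.
    public static string TryDecode(UTF8Encoding enc, byte[] bytes)
    {
        try
        {
            return enc.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // "ab" followed by 0xC3, the first byte of "é" (C3 A9), with its second byte missing.
        byte[] truncated = { 0x61, 0x62, 0xC3 };

        var strict = new UTF8Encoding(false, true);   // throwOnInvalidBytes: true
        var lenient = new UTF8Encoding(false, false); // throwOnInvalidBytes: false

        Console.WriteLine(TryDecode(strict, truncated) == null);        // True: exception thrown
        Console.WriteLine(TryDecode(lenient, truncated) == "ab\uFFFD"); // True: U+FFFD substituted
    }
}
```

With false, invalid bytes are silently replaced by U+FFFD (often drawn as <?>), so the exception goes away but the original character is not recovered.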

Have you had this kind of problem on a UTF-8 server? What I'd like to try next is to compile the source with MonoDevelop and see what happens on Linux with Mono when exception throwing is enabled in enc.

KU

KommerszUnicum

UPDATE: UTF8 decoding with false exceptions
« Reply #1 on: August 01, 2012, 03:10:42 PM »
I compiled the source with Mono, and it does exactly the same thing on Linux.

When I disable exception throwing, most of the problems are gone:
UTF8Encoding enc = new UTF8Encoding(false, false);

I had more time to test, and unfortunately some messages are still decoded incorrectly even with exception throwing disabled, though far fewer than before. The affected characters are common ones, so they should decode without problems.
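One possible explanation for the remaining errors, offered as an assumption rather than a confirmed diagnosis: if a multi-byte character is split between two network reads and each read is decoded on its own, even the lenient encoding cannot put it back together, and each half turns into its own U+FFFD. A sketch using "é" (C3 A9); the class and method names are hypothetical:

```csharp
using System;
using System.Text;

public static class SplitReadDemo
{
    // Decodes each chunk independently, as if each came from a separate read.
    public static string DecodeSeparately(Encoding enc, byte[] first, byte[] second)
    {
        return enc.GetString(first) + enc.GetString(second);
    }

    public static void Main()
    {
        var lenient = new UTF8Encoding(false, false);
        byte[] whole = { 0xC3, 0xA9 };   // "é" in one piece
        byte[] firstRead = { 0xC3 };     // lead byte only
        byte[] secondRead = { 0xA9 };    // continuation byte only

        Console.WriteLine(lenient.GetString(whole) == "\u00E9");                               // True
        Console.WriteLine(DecodeSeparately(lenient, firstRead, secondRead) == "\uFFFD\uFFFD"); // True
    }
}
```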

Do you have any idea what the problem with the decoding could be in this case?

Thank you!

KU

Snerf

  • Administrator
  • Hero Member
  • Posts: 1968
    • IceChat IRC Client
Re: UTF8 decoding with false exceptions
« Reply #2 on: August 01, 2012, 06:03:42 PM »
Which server are you testing this on, so I can see for myself?
This way I can see the sequence of characters that is causing the error.
« Last Edit: August 01, 2012, 06:07:08 PM by Snerf »
The IceChat God

Snerf

Re: UTF8 decoding with false exceptions
« Reply #3 on: August 01, 2012, 07:03:32 PM »
So, leaving this as it was:
Code:
UTF8Encoding enc = new UTF8Encoding(false, true);
I have added this in the catch block, instead of falling back to Windows-1252:
Code:
strData = Encoding.GetEncoding("utf-8").GetString(readBuffer);
This just seems to replace the bad characters with <?> characters, but processing continues.
Looking at the error, it seems to fail on certain bytes, such as D0 and BC, but I am not sure whether the error is truly the problem. At least most of the text is decoded as UTF-8 this way.
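For reference, D0 BC is not invalid UTF-8 by itself: it is the two-byte encoding of the Cyrillic letter "м" (U+043C). A small C# check, assuming only the standard System.Text classes, that the complete pair decodes even with exceptions enabled, and that Encoding.GetEncoding("utf-8") replaces a lone lead byte with U+FFFD instead of throwing:

```csharp
using System;
using System.Text;

public static class ByteCheck
{
    public static void Main()
    {
        var strict = new UTF8Encoding(false, true);

        // D0 BC is a complete, valid two-byte sequence: U+043C, Cyrillic "м".
        string complete = strict.GetString(new byte[] { 0xD0, 0xBC });
        Console.WriteLine(complete == "\u043C"); // True

        // Encoding.GetEncoding("utf-8") uses a replacement fallback, so a lone
        // lead byte becomes U+FFFD (the "<?>" glyph) rather than an exception.
        Encoding replacing = Encoding.GetEncoding("utf-8");
        string lone = replacing.GetString(new byte[] { 0xD0 });
        Console.WriteLine(lone == "\uFFFD"); // True
    }
}
```

Because both the catch-block decoder and UTF8Encoding(false, false) replace rather than throw, they would be expected to give the same output on the same bytes.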

KommerszUnicum

Re: UTF8 decoding with false exceptions
« Reply #4 on: August 02, 2012, 01:49:47 AM »
I've turned exception throwing back on and put the GetEncoding line into the catch block in my code. Of course, this produces exactly the same results.

If you would like to see the remaining encoding issues for yourself, please let me know, and I'll register a nickname for you and send you the server and login details privately.

According to the debug output, during a normal channel "/list" on the server I get exceptions for bytes such as A1, C3 and A9.
« Last Edit: August 02, 2012, 01:57:18 AM by KommerszUnicum »
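C3 A9 is "é" and A1 is a continuation byte (for example the second byte of "á", C3 A1), so these look like valid characters that were cut in two. A hypothetical helper, IncompleteTailLength, sketched here as an assumption about how one might check this in the debug output: it reports whether a read ends with the start of a multi-byte sequence whose remaining bytes have not arrived yet.

```csharp
using System;

public static class Utf8Tail
{
    // Returns how many bytes at the end of buffer[0..count) form the start of a
    // multi-byte UTF-8 sequence that is still missing bytes (0 if none).
    public static int IncompleteTailLength(byte[] buffer, int count)
    {
        // A UTF-8 sequence is at most 4 bytes, so look back at most 3 bytes.
        for (int back = 1; back <= Math.Min(3, count); back++)
        {
            byte b = buffer[count - back];
            if ((b & 0xC0) == 0x80)
                continue; // continuation byte: keep looking for the lead byte

            int expected =
                (b & 0xE0) == 0xC0 ? 2 :
                (b & 0xF0) == 0xE0 ? 3 :
                (b & 0xF8) == 0xF0 ? 4 : 1; // ASCII or invalid lead byte: nothing pending
            return expected > back ? back : 0;
        }
        return 0;
    }

    public static void Main()
    {
        byte[] buf = { 0x61, 0xC3, 0xA9, 0xC3 };         // "a", "é", then a lone C3
        Console.WriteLine(IncompleteTailLength(buf, 4)); // 1
        Console.WriteLine(IncompleteTailLength(buf, 3)); // 0
    }
}
```

If the failing bytes regularly sit at the end of a read, the data is probably valid and the problem lies in how the reads are decoded.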

Snerf

Re: UTF8 decoding with false exceptions
« Reply #5 on: August 02, 2012, 07:27:07 AM »
Well, the decoder error still occurs, but at least it carries on as best it can.
I am not sure whether the data being sent is invalid or the decoding method is the problem.
I have found another UTF-8 server to test on, and I am going to compare some results.

But if you can, send me some login information for your server and I will test there as well.
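If the data is valid but split between reads, the usual .NET remedy is a stateful System.Text.Decoder, which keeps an incomplete sequence and finishes it on the next call. A sketch, assuming the socket code knows how many bytes each read actually returned; StreamingUtf8 and Decode are hypothetical names, not IceChat's code:

```csharp
using System;
using System.Text;

public sealed class StreamingUtf8
{
    // One Decoder per connection: it remembers a partial sequence at the end
    // of a chunk and completes it with the first bytes of the next chunk.
    // Genuinely invalid bytes still throw DecoderFallbackException.
    private readonly Decoder decoder = new UTF8Encoding(false, true).GetDecoder();

    // Decodes only the bytes actually read, not the whole buffer.
    public string Decode(byte[] buffer, int bytesRead)
    {
        int charCount = decoder.GetCharCount(buffer, 0, bytesRead);
        char[] chars = new char[charCount];
        int written = decoder.GetChars(buffer, 0, bytesRead, chars, 0);
        return new string(chars, 0, written);
    }

    public static void Main()
    {
        var utf8 = new StreamingUtf8();
        // "é" (C3 A9) split across two reads.
        string text = utf8.Decode(new byte[] { 0x61, 0xC3 }, 2)
                    + utf8.Decode(new byte[] { 0xA9, 0x62 }, 2);
        Console.WriteLine(text == "a\u00E9b"); // True
    }
}
```

Keeping exceptions enabled here means real encoding errors stay visible, while characters that merely straddle a read boundary decode normally.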

KommerszUnicum

Re: UTF8 decoding with false exceptions
« Reply #6 on: August 02, 2012, 12:20:05 PM »
It looks like IRC connections are not enabled for newly registered users on the server I usually connect to. The main client for this server is a web-based Java client, so other IRC clients are restricted this way. I will send you the account in a few days.

KommerszUnicum

Re: UTF8 decoding with false exceptions
« Reply #7 on: August 04, 2012, 06:13:10 PM »
If I see it correctly, encoding auto-detection, when enabled, tries only two encodings.
It tries UTF-8 first, and when that throws an exception, the code falls back to Windows-1252. This seems a bit strange to me.

And because auto-detection is enabled by default, every packet is parsed with those two encodings, so putting UTF-8 decoding into the catch block breaks the connection immediately:

Code:
catch (ArgumentException)
{
   strData = Encoding.GetEncoding("utf-8").GetString(readBuffer);
}

Because the

Code:
strData = enc.GetString(readBuffer);
will always fail on a Windows-1252 server, and the next step, decoding again with UTF-8 in response to the exception, results in a disconnect.
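One way the same two-step detection could be made less sensitive to packet boundaries, sketched here as an assumption and not as IceChat's implementation: buffer raw bytes until a complete IRC line (terminated by CR LF) has arrived, then try strict UTF-8 on that line and fall back only for that line. Because every byte of a multi-byte UTF-8 sequence is 0x80 or above, splitting on the LF byte can never cut a character in half. LineAutoDetect and Feed are hypothetical names, and ISO-8859-1 stands in for Windows-1252 in the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class LineAutoDetect
{
    private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);
    private readonly Encoding fallback;
    private readonly List<byte> pending = new List<byte>();

    public LineAutoDetect(Encoding fallback) { this.fallback = fallback; }

    // Buffers raw bytes and returns every complete line (terminated by LF),
    // each decoded as UTF-8 or, if that fails, with the fallback encoding.
    public List<string> Feed(byte[] buffer, int bytesRead)
    {
        var lines = new List<string>();
        for (int i = 0; i < bytesRead; i++)
        {
            if (buffer[i] != (byte)'\n') { pending.Add(buffer[i]); continue; }

            byte[] line = pending.ToArray();
            pending.Clear();
            int len = line.Length;
            if (len > 0 && line[len - 1] == (byte)'\r') len--; // drop the CR of CR LF

            try { lines.Add(StrictUtf8.GetString(line, 0, len)); }
            catch (DecoderFallbackException) { lines.Add(fallback.GetString(line, 0, len)); }
        }
        return lines;
    }

    public static void Main()
    {
        // ISO-8859-1 is built into every .NET runtime; Windows-1252 may need
        // an extra encoding provider on .NET Core.
        var detect = new LineAutoDetect(Encoding.GetEncoding("iso-8859-1"));
        var a = detect.Feed(new byte[] { 0x3A, 0xC3 }, 2); // ":" plus half of "é"
        var b = detect.Feed(new byte[] { 0xA9, 0x0D, 0x0A, 0xE9, 0x0D, 0x0A }, 6);
        Console.WriteLine(a.Count == 0);      // True: no complete line yet
        Console.WriteLine(b[0] == ":\u00E9"); // True: decoded as UTF-8
        Console.WriteLine(b[1] == "\u00E9");  // True: lone E9 used the fallback
    }
}
```

With this design, a single line from a Windows-1252 client falls back on its own, without affecting how the surrounding UTF-8 lines are decoded.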

I did not see any way to disable encoding auto-detection in the IceChat UI.

Am I missing something here?

Snerf

Re: UTF8 decoding with false exceptions
« Reply #8 on: August 08, 2012, 03:35:21 PM »
I have made some changes and tested them on a UTF-8 server, and it works a lot better.
You will see them when I release RC 4.1.