Guid.Parse failure with valid Guid - Encoding issue? - c#

So, I have a valid GUID string below
Guid.Parse("e6f85ae0-f479-4e98-9287-98f7e62ba083") // Parses just fine
This parses to a GUID in .NET4.0 / 4.5
However this code
// Copy paste this line in VS Immediate window. Does not parse!!
Guid.Parse("e6f85ae0‐f479‐4e98‐9287‐98f7e62ba083")
Does not parse. The Guid strings are identical, or so I thought. Try it, in the immediate window of Visual Studio, the first does not parse, but second one does!
You can also verify with this code
var string1 = "e6f85ae0-f479-4e98-9287-98f7e62ba083";
var string2 = "e6f85ae0‐f479‐4e98‐9287‐98f7e62ba083";
bool isSame = string1.Equals(string2); // Equals false!! :/
Could this be an encoding issue? Is there any way to detect this issue and correctly parse the GUID?

I'm half convinced this is a trolling question, but in case it is an honest problem:
Those strings are not identical. the second one uses the so called 'narrow hyphen' ('\u2010'), which is a completely different character than the regular hyphen ('\u002D') and as such it is not parsed correctly.

As decPL already said the the strings are not identical due to different hyphens used, So Guid.Parse will definitely fail. To detect that the GUID is computer readable you can use Guid.TryParse Method.
bool isValidGUID = Guid.TryParse(string2, null);
And there is nothing called human readable If you are going to replace 'narrow hyphen' then in case it contains some other symbol your code will fail.
Instead of using workaround I suggest you the debug the cause for getting the 'narrow hyphen'
GUID also supports various formats. If you want to validate a specified format then you can use Guid.TryParseExact Method.

Related

int.Parse Fails On Valid String

I am completely lost with this.
So, I'm trying to use int.TryParse to parse a string, a perfectly valid string containing literally the number "3", that's it. But, it's giving me back false when I try to parse it.
This is how I'm using it - I've debugged it, and int.TryParse is definitely giving back false, as the code in the if statement runs:
if (!int.TryParse(numberSplitParts[0], out int hour))
return false;
And I've looked in the debugger numberSplitParts[0] is definitely the digit 3, that's perfectly valid!
Now, I did some research and people have been saying to use the invariant CultureInfo, so, I did that, here's the new updated line (I also tried NumberStyles.Any and that didn't work either):
if (!int.TryParse(numberSplitParts[0], NumberStyles.Integer, CultureInfo.InvariantCulture, out int hour))
return false;
That also doesn't work - it continues to give me back false, and hour is 0.
I've tried all of the other number types as well - byte.Parse, Int16.Parse etc. All of those gave back false too.
And, I've tried regular int.Parse, and that simply gives me the following exception:
System.FormatException: 'Input string was not in a correct format.'
But then, I tried it in a different project, so, I replicated the string array and everything, and it worked there - both with and without "InvariantCulture".
So, I'm suspecting that the project I'm working in must be configured in such a way that caused int.Parse/int.TryParse to not work. This is in a class library, that is being accessed from a UWP Application - could the fact that this is running under UWP have any effect?
As discussed in the comments, this is due to a couple of LEFT-TO-RIGHT MARK Unicode characters in your input.
When you did your test in a different project, you probably hard-coded the string "3", or got your input from a source which didn't add the same invisible characters.
A better test is to check whether numberSplitParts[0] == "3", either in the watch window or in your code itself. Another is to set numberSplitParts[0] = "3" in your UWP project, and see if that parses correctly.
In my experience, most cases of "this string looks fine but <stuff> fails" is due to invisible Unicode characters sneaking into your input.

Weird behavior C#

Somehow I'm getting a weird result from a GetString(). So, in my project I got this code:
byte[] arrayBytes = System.Convert.FromBase64String(n["spo_fdat"].InnerText);
string str = System.Text.Encoding.UTF8.GetString(arrayBytes);
The InnerText Value and the code is in: https://dotnetfiddle.net/mMUlti
So, my problem is that somehow I'm getting this result on my Visual Studio:
While in the online compiler that I post above the output is as expected.
This output is an output for a printer and this \0 are destroying the format.
Anyone have a clue of what is going on and what should I do/try?
It looks like for some reason every other byte in your input is null. If you strip those out you get something that looks much more plausible as printer commands (though I am no expert). Hopefully you can verify things...
To do this all I did was added this line in:
arrayBytes = arrayBytes.Where((x,i)=>i%2==0).ToArray();
The where command takes the value (x), and index (i) and if the index mode 2 is 0 (ie its even) then the where clause allows it - if its odd it throws it away.
The output I get from this starts:
CT~~CD,~CC^~CT~
^XA~TA000~JSN^LT0^MNW^MTT^PON^PMN^LH0,0^JMA^PR2,2~SD15^JUS^LRN^CI0^XZ
^XA
^MMT
^PW607
^LL0406
There are some non-printing character in there too that look like possible printing commands (eg 16 is the first character that is "data link escape" character.
Edited afterthought:
The problem you have here is obviously a problem with the specification. It seems to be that your input is wrong. You need to talk to whoever generated it find out the specification they are using to generate it, make sure their ode matches that spec and then right your code to accept that spec. With a solid specification you should both be writing compatible code.
Try inspecting the bytes instead. You'll see that what you have encoded in the base-64 string is much closer to what Visual Studio shows to you in comparison to the output from dotnetfiddle. Consoles usually don't escape non-printables (such as \0 - the null character) whereas Visual Studio string inspector does so in attempt to provide as much value to its user as possible.
Looking at your base-64 encoded data, it looks way more like UTF-16 than UTF-8. If you decode it like so, you'll perhaps get rid of the null characters in Visual Studio inspector as well.
Regardless of that, the base-64 data don't make much sense. More semantical context is required to figure out what the issue is.
According to inspection by Chris, it looks like the data is UTF-8 encoded in UTF-16.
You should be able to get proper results with the following:
var xml = //your base-64 input...
var arrayBytes = Convert.FromBase64String(xml);
var utf16 = Encoding.Unicode.GetString(arrayBytes);
var utf8Bytes = utf16.Select(c => (byte)c).ToArray();
var utf8 = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(utf8);
The opposite is probably how your input was created. However, you could also go for Chris' solution of ignoring every odd byte as it is basically the same with less weird encoding things going on (although this may be more explicit to what really goes on: UTF-8 inside UTF-16).

Double.Parse fails for "10.00" retrieved value

The screenshot sums up the problem:
I have no control over the retrieved value. It comes in with some funky format that I can't figure out and the parsing fails even though it looks totally normal. Typing the value in manually works just fine.
How can I "normalize" the retrieved value so Decimal.Parse does not fail?
For reference, here is the string that fails (copied and pasted):
"‎10.00"
First I would check your regional settings to eliminate anything as simple as a difference in expected decimal separator.
If that draws a blank then if the string 10.00 parses successfully then a string that looks like 10.00 but which fails to parse cannot actually be 10.00.
Inspect and determine the character code of each character of the string and confirm that it really is 10.00 and not some exotic Unicode that has the same appearance but which is actually different (which may also include characters which are not even visible when displayed).
You might have some kind of special character hidden in the string you are retrieving.
Try this:
Double.Parse(Regex.Replace(decimalValue, #"[^0-9.,]+", ""))
You might need to add using statement for System.Text.RegularExpressions
I would replace only the one problematic character, that is the safest option:
s = s.Replace("\u200E", "");
As Jeroen Mostert mentioned in a comment, there is a non-printed character in your decimalValue.
This is a similar question which should help you deal with that.
https://stackoverflow.com/a/15259355/7636764
Edit:
Using the the string output = new string(input.Where(c => char.IsLetter(c) || char.IsDigit(c)).ToArray()); part of the solution, but also include in || char.IsPunctuation(c) after IsDigit will get your desired result.

Why are some unintended symbols added to my string?

I wrote a console application which fetches strings from some fields in a Sharepoint list. Then I simply write the strings to console. This works fine for the most fields. There is one MultiLineTextField with RichText enabled where i had to remove all the html-tags, that causes this issue.
Even after all the tags are removed the strings seem to contain question marks which were never added to the string. The most weird thing about this is when I set a breakpoint and look into the string's value there are no question marks, but they suddenly appear on the console output.
The only thing I could think of was to Trim the string. Because sometimes they appear in front of the actual string sometimes they are at the and of it, but never in between.
So this is what I tried:
myString = myString.Trim();
myString = myString.Replace("?",string.Empty);
But this does not solve the issue. Besides this would not be a smart solution in case one of the strings would be supposed to contain question marks. For detailed code please see the link above.
Also Convert.ToBase64String(Encoding.UTF8.GetBytes(myString)) gives me the following output:
4oCLTWVobCwgRWllciwgV2Fzc2VyLCBIYWNrZmxlaXNjaCA=
There are probably some non-printing unicode (or possibly low ASCII) characters in the end of the string. The console has a different encoding, and will often render such as ?. Basically: use the indexer (yourString[n]) or yourString.ToCharArray() to investigate what is actually in the string aroung the location of the ?.
With the edit, we can see that the string has a zero-width space (decimal 8203) at the start:
Sounds like you're maybe having a problem with unicode characters. Chances are you're outputting the string as ASCII instead of Unicode. Take a look at this question as it sounds like you may be experiencing the same problem.

Comparing Strings in .NET

I am running into what must be a HUGE misunderstanding...
I have an object with a string component ID, I am trying to compare this ID to a string in my code in the following way...
if(object.ID == "8jh0086s)
{
//Execute code
}
However, when debugging, I can see that ID is in fact "8jh0086s" but the code is not being executed. I have also tried the following
if(String.Compare(object.ID,"8jh0086s")==0)
{
//Execute code
}
as well as
if(object.ID.Equals("8jh0086s"))
{
//Execute code
}
And I still get nothing...however I do notice that when I am debugging the '0' in the string object.ID does not have a line through it, like the one in the compare string. But I don't know if that is affecting anything. It is not the letter 'o' or 'O', it's a zero but without a line through it.
Any ideas??
I suspect there's something not easily apparent in one of your strings, like a non-printable character for example.
Trying running both strings through this to look at their actual byte values. Both arrays should contain the same numerical values.
var test1 = System.Text.Encoding.UTF8.GetBytes(object.ID);
var test2 = System.Text.Encoding.UTF8.GetBytes("8jh0086s");
==== Update from first comment ====
A very easy way to do this is to use the immediate window or watch statements to execute those statements and view the results without having to modify your code.
Your first example should be correct.
My guess is there is an un-rendered character present in the Object.ID.
You can inspect this further by debugging, copying both values into an editor like Notepad++ and turning on view all symbols.
I suspect you answered your own question. If one string has O and the other has 0, then they will compare differently. I have been in similar situations where strings seem the same but they really aren't. Worst-case, write a loop to compare each individual character one at a time and you might find some subtle difference like that.
Alternatively, if object.ID is not a string, but perhaps something of type "object" then look at this:
http://blog.coverity.com/2014/01/13/inconsistent-equality
The example uses int, not string, but it can give you an idea of the complications with == when dealing with different objects. But I suspect this is not your problem since you explicitly called String.Compare. That was the right thing to do, and it tells you that the strings really are different!

Categories