Consider the following line of code:
string _decoded = System.Web.HttpUtility.UrlDecode(
"There%20should%20be%20text%20after%20this%0022help!");
The encoded line
"There%20should%20be%20text%20after%20this%0022help!"
when decoded via the website urldecoder.org produces
"There should be text after this22help!"
however the value of _decoded as displayed in the debugger is:
Figure 1: Debugger view of problem
What could be causing this problem? Is there a setting or special encoding that will circumvent this in all cases?
EDIT: Yes, I consider this behavior to be an error. I don't want URLDecode to introduce the \0 char to the resultant string, because it would result in an invalid file name (my code is moving around files).
There is a null byte (%00, which decodes to \0) right after the word "this", so the debugger doesn't show the rest of the string.
So the decoded value is correct; it's just a limitation (or bug?) of the debugger.
See here for more information about the null byte from a security perspective. There is also this question posted about it.
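A minimal sketch of how this could be checked and worked around, assuming that dropping the offending characters is acceptable for your file names. StripInvalidFileNameChars is a hypothetical helper, not part of the framework; it relies on Path.GetInvalidFileNameChars(), whose result includes '\0'.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Web; // HttpUtility; on .NET Framework this needs a reference to System.Web.dll

public static class NullByteDemo
{
    // Hypothetical helper: drops every character that is not allowed in a file name.
    public static string StripInvalidFileNameChars(string name)
    {
        char[] invalid = Path.GetInvalidFileNameChars(); // includes '\0'
        return new string(name.Where(c => !invalid.Contains(c)).ToArray());
    }

    public static void Main()
    {
        string decoded = HttpUtility.UrlDecode(
            "There%20should%20be%20text%20after%20this%0022help!");

        // The null character really is in the string, even if a viewer stops at it.
        Console.WriteLine(decoded.Contains('\0'));

        // Safe to use as a file name once the invalid characters are gone.
        Console.WriteLine(StripInvalidFileNameChars(decoded));
    }
}
```

UrlDecode itself has no setting to skip %00, so the clean-up has to happen after decoding. Whether to strip the character or reject the input outright is a design decision; rejecting is usually the safer choice when the value comes from an untrusted source.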
Somehow I'm getting a weird result from a GetString(). So, in my project I got this code:
byte[] arrayBytes = System.Convert.FromBase64String(n["spo_fdat"].InnerText);
string str = System.Text.Encoding.UTF8.GetString(arrayBytes);
The InnerText Value and the code is in: https://dotnetfiddle.net/mMUlti
So, my problem is that somehow I'm getting this result in Visual Studio:
While in the online compiler that I posted above, the output is as expected.
This output is sent to a printer, and these \0 characters are destroying the format.
Does anyone have a clue what is going on and what I should do or try?
It looks like for some reason every other byte in your input is null. If you strip those out you get something that looks much more plausible as printer commands (though I am no expert). Hopefully you can verify things...
To do this, all I did was add this line:
arrayBytes = arrayBytes.Where((x,i)=>i%2==0).ToArray();
The Where call takes the value (x) and the index (i); if the index modulo 2 is 0 (i.e. it is even), the element is kept, and if it is odd, it is thrown away.
The output I get from this starts:
CT~~CD,~CC^~CT~
^XA~TA000~JSN^LT0^MNW^MTT^PON^PMN^LH0,0^JMA^PR2,2~SD15^JUS^LRN^CI0^XZ
^XA
^MMT
^PW607
^LL0406
There are some non-printing characters in there too that look like possible printing commands (e.g. the first character is 16, the "data link escape" character).
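Before throwing bytes away it may be worth confirming that every odd-indexed byte really is zero. The following is a minimal sketch under that assumption; the sample bytes are made up for illustration, and OddBytesAreZero and KeepEvenIndexed are hypothetical helper names.

```csharp
using System;
using System.Linq;
using System.Text;

public static class InterleavedNulls
{
    // Hypothetical helper: true when every odd-indexed byte is zero.
    public static bool OddBytesAreZero(byte[] data) =>
        data.Where((b, i) => i % 2 == 1).All(b => b == 0);

    // Same filter as in the answer: keep only the even-indexed bytes.
    public static byte[] KeepEvenIndexed(byte[] data) =>
        data.Where((b, i) => i % 2 == 0).ToArray();

    public static void Main()
    {
        // Made-up sample: the characters ^XA with a null byte after each one.
        byte[] sample = { 0x5E, 0x00, 0x58, 0x00, 0x41, 0x00 };

        if (OddBytesAreZero(sample))
        {
            Console.WriteLine(Encoding.ASCII.GetString(KeepEvenIndexed(sample))); // ^XA
        }
        else
        {
            Console.WriteLine("Odd-indexed bytes carry data; do not strip them.");
        }
    }
}
```

If the check fails for real input, the odd bytes carry information and stripping them would corrupt the data.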
Edited afterthought:
The problem you have here is obviously a problem with the specification. It seems that your input is wrong. You need to talk to whoever generated it, find out which specification they are using to generate it, make sure their code matches that spec, and then write your code to accept that spec. With a solid specification you should both be writing compatible code.
Try inspecting the bytes instead. You'll see that what you have encoded in the base-64 string is much closer to what Visual Studio shows to you in comparison to the output from dotnetfiddle. Consoles usually don't escape non-printables (such as \0 - the null character) whereas Visual Studio string inspector does so in attempt to provide as much value to its user as possible.
Looking at your base-64 encoded data, it looks much more like UTF-16 than UTF-8. If you decode it as UTF-16 (Encoding.Unicode), you will perhaps get rid of the null characters in the Visual Studio inspector as well.
Regardless of that, the base-64 data doesn't make much sense. More context about what the data means is required to figure out what the issue is.
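As a rough illustration of inspecting the bytes, the sketch below dumps a made-up sample as hex and decodes it both ways. The sample bytes and the helper name Hex are assumptions for illustration only, not the asker's data.

```csharp
using System;
using System.Text;

public static class InspectBytes
{
    // Hypothetical helper: hex dump such as "5E-00-58-00".
    public static string Hex(byte[] data) => BitConverter.ToString(data);

    public static void Main()
    {
        // Made-up sample: "^XA" stored with two bytes per character.
        byte[] sample = { 0x5E, 0x00, 0x58, 0x00, 0x41, 0x00 };

        Console.WriteLine(Hex(sample)); // 5E-00-58-00-41-00

        // Decoded as UTF-8, every 0x00 byte becomes a '\0' character.
        string asUtf8 = Encoding.UTF8.GetString(sample);
        Console.WriteLine(asUtf8.Length); // 6

        // Decoded as UTF-16 (little endian), the nulls disappear.
        string asUtf16 = Encoding.Unicode.GetString(sample);
        Console.WriteLine(asUtf16); // ^XA
    }
}
```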
According to inspection by Chris, it looks like the data is UTF-8 encoded in UTF-16.
You should be able to get proper results with the following:
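using System;
using System.Linq;
using System.Text;

var xml = ""; // put your base-64 input here
var arrayBytes = Convert.FromBase64String(xml);
// Decode as UTF-16: each resulting char holds one byte of the inner UTF-8 data
var utf16 = Encoding.Unicode.GetString(arrayBytes);
// Narrow each char back to a single byte, then decode those bytes as UTF-8
var utf8Bytes = utf16.Select(c => (byte)c).ToArray();
var utf8 = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(utf8);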
The opposite is probably how your input was created. However, you could also go with Chris' solution of ignoring every odd byte, as it is basically the same with fewer weird encoding steps (although the version above is more explicit about what really goes on: UTF-8 inside UTF-16).
I wrote a console application which fetches strings from some fields in a SharePoint list. Then I simply write the strings to the console. This works fine for most fields. The issue is caused by one MultiLineTextField with RichText enabled, where I had to remove all the HTML tags.
Even after all the tags are removed, the strings seem to contain question marks which were never added to the string. The weirdest thing about this is that when I set a breakpoint and look at the string's value there are no question marks, but they suddenly appear in the console output.
The only thing I could think of was to trim the string, because sometimes the question marks appear in front of the actual string and sometimes at the end of it, but never in between.
So this is what I tried:
myString = myString.Trim();
myString = myString.Replace("?",string.Empty);
But this does not solve the issue. Besides, it would not be a smart solution if one of the strings were supposed to contain question marks. For detailed code please see the link above.
Also Convert.ToBase64String(Encoding.UTF8.GetBytes(myString)) gives me the following output:
4oCLTWVobCwgRWllciwgV2Fzc2VyLCBIYWNrZmxlaXNjaCA=
There are probably some non-printing Unicode (or possibly low ASCII) characters at the end of the string. The console has a different encoding, and will often render such characters as ?. Basically: use the indexer (yourString[n]) or yourString.ToCharArray() to investigate what is actually in the string around the location of the ?.
With the edit, we can see that the string has a zero-width space (decimal 8203, U+200B) at the start.
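A short sketch that decodes the Base64 value from the question, prints the first few code points, and removes the invisible character. StripFormatChars is a hypothetical helper; it assumes that dropping every character in the Unicode "Format" category is acceptable for your data. Trim() alone does not help here because current .NET versions do not treat U+200B as whitespace.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class InvisibleChars
{
    // Hypothetical helper: removes characters in the Unicode "Format" category,
    // which is where the zero-width space (U+200B) lives.
    public static string StripFormatChars(string s) =>
        new string(s.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.Format).ToArray());

    public static void Main()
    {
        string s = Encoding.UTF8.GetString(
            Convert.FromBase64String("4oCLTWVobCwgRWllciwgV2Fzc2VyLCBIYWNrZmxlaXNjaCA="));

        // Print the first few characters as code points: U+200B, U+004D, U+0065
        foreach (char c in s.Take(3))
        {
            Console.WriteLine("U+{0:X4}", (int)c);
        }

        // Brackets make leading and trailing characters visible.
        Console.WriteLine("[" + StripFormatChars(s).Trim() + "]");
    }
}
```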
Sounds like you're maybe having a problem with unicode characters. Chances are you're outputting the string as ASCII instead of Unicode. Take a look at this question as it sounds like you may be experiencing the same problem.
I receive this exception during parsing string containing JSON:
Newtonsoft.Json.JsonReaderException: Unterminated string. Expected delimiter: ". Path '[114].var2', line 1, position 431602.
So I went to exactly that (431602) position and found that it's here:
(...)lZmdoaWpzdHV2d3h5eoKDhIWGh4iJipKTlJWWl5iZmqK*jp*KWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW1(...)
So it's simply the "jp" characters, and there shouldn't be anything wrong with them.
What could be a reason for this exception?
EDIT
To be more specific I also put a whole string with few variables around it:
"var1":"946","var2":"\/9j\/4AAQSkZJRgABAQAAAQABAAD\/\/gA7Q1JFQVRPUjogZ2QtanBlZyB2MS4wICh1c2luZyBJSkcgSlBFRyB2NjIpLCBxdWFsaXR5ID0gNjAK\/9sAQwANCQoLCggNCwoLDg4NDxMgFRMSEhMnHB4XIC4pMTAuKS0sMzpKPjM2RjcsLUBXQUZMTlJTUjI+WmFaUGBKUVJP\/9sAQwEODg4TERMmFRUmTzUtNU9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09P\/8AAEQgAeABHAwEiAAIRAQMRAf\/EAB8AAAEFAQEBAQEBAAAAAAAAAAABAgMEBQYHCAkKC\/\/EALUQAAIBAwMCBAMFBQQEAAABfQECAwAEEQUSITFBBhNRYQcicRQygZGhCCNCscEVUtHwJDNicoIJChYXGBkaJSYnKCkqNDU2Nzg5OkNERUZHSElKU1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6g4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2drh4uPk5ebn6Onq8fLz9PX29\/j5+v\/EAB8BAAMBAQEBAQEBAQEAAAAAAAABAgMEBQYHCAkKC\/\/EALURAAIBAgQEAwQHBQQEAAECdwABAgMRBAUhMQYSQVEHYXETIjKBCBRCkaGxwQkjM1LwFWJy0QoWJDThJfEXGBkaJicoKSo1Njc4OTpDREVGR0hJSlNUVVZXWFlaY2RlZmdoaWpzdHV2d3h5eoKDhIWGh4iJipKTlJWWl5iZmqKjpKWmp6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uLj5OXm5+jp6vLz9PX29\/j5+v\/aAAwDAQACEQMRAD8A9OooooAKTNLTaAI7m5jtoWllYKqjOTXmniTxrqMt00elXflQDg4TB\/765P8AKtPxnqdxc67HokZEVuqB55O+OvH4Vz2qalbQCAWiQqpBwyRAuADjC5GAM5GepwT6Z5pVJqpY0UVy3LXh\/wAU65JGySqb0K3LLOfMH4ZyR+FbMXj0218kN5bs0R4f++h\/TP06+9cdZxWWqyyxo0qXuCyFpMlz+X+FQarMb6CByzedANkin7w\/Hrjg9elTG\/tLjaXKe32tzDd20dxbSLJFIMqw7iiuR8B6r51qtm5XfGigj1P+Qf0orohPmRm1Y7OiiirEFIRS0UAecfEbTJYro6tbSNuKKkir2UgjP8qwRbO+mXiwxxvP+6iw+Pkwoyee3J59q7vxaxt7uGaZN9q8ZjlX\/Zzz\/NT+BrkBJZRXzI8jQMmUVuAJY+wOeOnuCPeuKrKSk1Y2gkZmm2lrZ6haJPbOlwSHExlwu0dWwPboKuqqy+JYbmCNSk12UY5yGRdvzfmrnNYpihuL1ks7Se4cnasa5Kg\/zIq3dx30t+mm26CCWNNrFWwIUx8xJBOBj3\/U05Ru736AnpYteFUuJvFSRwybtxZ2Zc4A2HA\/L+dFdJ4PmtBrMOn6YhaCBGaSXH3jjGfzx+Aoram3JXtYzluegUUUVsSFFFFAFa\/tYby2MM65U9D6GvMfEeg32m5CwPc2n8EkQJaP29cV6fdSJDE0sjYVRkmuD1fxHqU8p\/s+wl8kHALxMS3vjtWU7XuUmznbO38S6lbFIL2SK16Zf5N35DJq9b+FobVDJe3ny9X28A\/UmmNe+J7kbUhkQH\/pkF\/nTU0K8um36ret\/uht5\/wFcbk1vJRXlqzVK\/S50\/guazbVZrfTYx5MURLuBwTkADPfvRWv4Q02HT9PdoI9okbgnknHcmiuyilyKxlPc36KKK1JCmuyopZ2CqoySTgAU6ue8TRXOpQfYLW5WCJj++bBJb\/ZHt61FSpGmryY1FvRFW+8V6O0m37aCq9AqMc\/pVCfxppUIIjiuJ
PooA\/U1Vj8D2xOZ72Vv91Qv+NWH8JaVFg7ZpSB\/G\/+GK8qpLDSlzSbZulO1kZVz4xspSfL0yTPr52P6VWSbWtXlSGxs3gWRgA+Mf8Ajx\/pW0H0XSG+Y2sJHpgt\/jXReG5YdQR9QhVvKyUiZlxu9SP5fnV0VGU1yQ07sJNpayNextUsrGG1jJKxIFyep9z9etFT0V6pzhRRUNzOtvEZH+gA6k+lJtJXYGP4o8RQ6FbqCplnk+6gbBA9T\/n+VcHP42vHYmC2hjz\/AHiWP9K67UNGsLqd7u\/Rpp5OTuc4HoAB2FUhDodgculnAR\/e27v8a82riaVR\/C2bxhJLexzaa\/r1\/wDLA8p9oYv\/AK2aVtN8QXn+uFwQevnS4A\/AmuifxTpFvlI5Hl9o0P8AXFY194xIJFrZ\/RpG\/oP8aIyqv+HTS\/r5A1Hqw0vwTJdXiJe3ACnlliGePqf8K9OtreK0to7e3QJFEoVVHYCuf8Epezaa2o6gQHuT+6QLgKg7\/j\/hXS13UVNR996mUmr6BRRRWpIyWRIYnklYIiAszE4AA6mvNtW8Q6rqmoNJpqzrbR8QqkeSf9o8dT+ldfrOrWkbtBJcwKkf+s3uBk9l\/qfwrFm8WaRCNrXW8+iKT\/8AWrjrV3fkjG5rGGl27HJyaf4m1FyZRdFT2kk2j8iafB4Ov2cefPDGPYlj\/n8a2LjxnZKD5FvPJ\/vYUf1rHn8Z3jt+4too\/wDeJY\/0rJSxL+GKRVqa3dzctfB9lHg3E80p9BhR\/X+dalloel\/bkt4LWLcBuct8zBfXmuPW58R6mm5Tc7D\/AHR5a\/nxXoXhHQjounE3BDXlwd0zZzj0XPt\/PNOFCpOXvzv5ITmktEboAVQqjAAwAKWiivQMQooooA8417wlc3OsX00U0McUkvmDOc8gE8Y9SazIfCUCNi5upH9kUL\/PNdZ411S60ia3e3ijdLhSpLg8Ffoff9K8+udc1i7lKwswz2hj\/wAmuKca7k7OyNouCWqOpt9A0mFMm3Lkd3cn\/wCtT2uNF0\/7rWkLDsoG79Oa5BNL1q95m8zB7zSf061p6d4OmnuESa5XJONsa5\/U1g6Uft1LlqT6RO18OXUWsXLSW6u1tbEZkYYDP2A+nX8q6qqek6bb6Tp0VlajCRjknqx7k1crvpUo0o2iYSk5O7CiiitSQooooAoavaLc26ExLI0ThgCufY\/zrEOi39xIf3ccMeeNx7fQUUVzzoRqTvIuM3FaFqLw1ggy3ZI7hEx+pP8AStOy0u2sm3xb2fGNznOKKKuNGnHVITnJ7su0UUVqSJRmiigAzRRRQB\/\/2Q==","var3":"77241"
I noticed that this seems to be a base64 encoded JPEG. In most cases it would be better & easier to have your server send back the jpeg rather than JSON.
public ActionResult ShowImage()
{
var file = Server.MapPath("~/App_Data/UserUpload/asd.png");
return File(file, "image/png", Path.GetFileName(file));
}
To see original source follow this link.
Troubleshooting advice:
If you are unable to post the whole string, you may want to make it smaller to isolate the error.
Try removing a block of characters at a time (I would use a binary search, testing the left half and then the right half) until it stops breaking. Then look closer at the bad half. If you can get the bad half small enough and are still unable to see the error, please post it.
I am guessing the column given by the error isn't counted the same way that you are counting them.
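One way to check that guess is to compare the reported position with the actual length of the string and to print the text around it. The sketch below uses a hypothetical helper, Around, and a made-up input that is cut off inside a string value. If the reported position turns out to be at or beyond the end of your string, one possibility is that the JSON was truncated before it reached the parser.

```csharp
using System;

public static class JsonErrorContext
{
    // Hypothetical helper: returns up to `radius` characters on either side of
    // `position`, clamping the position to the bounds of the text.
    public static string Around(string text, int position, int radius)
    {
        if (string.IsNullOrEmpty(text)) return string.Empty;
        int index = Math.Min(Math.Max(position, 0), text.Length - 1);
        int start = Math.Max(index - radius, 0);
        int end = Math.Min(index + radius, text.Length - 1);
        return text.Substring(start, end - start + 1);
    }

    public static void Main()
    {
        // Made-up input that ends in the middle of a string value.
        string json = "{\"var1\":\"946\",\"var2\":\"abc";
        int reportedPosition = 431602; // position taken from the exception message

        Console.WriteLine("Length: " + json.Length);
        Console.WriteLine("Context: " + Around(json, reportedPosition, 10));
    }
}
```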
I have a problem with using Directory.Exists() on a string that contains an accented character.
This is the directory path: D:\ést_test\scenery. It is coming in as a simple string in a file that I am parsing:
[Area.121]
Title=ést_test
local=D:\AITests\ést_test
Layer=121
Active=FALSE
Required=FALSE
My code is taking the local value and adding \scenery to it. I need to test that this exists (which it does) and am simply using:
if (!Directory.Exists(area.Path))
{
// some handling code
area.AreaIsValid = false;
}
This returns false. It seems that the string handling that I am doing is replacing the accented character. The text visualizer in VS2012 is showing this (directoryManager is just a wrap around System.IO.Directory):
And the warning message as displayed is showing this:
So it seems that the accented character is not being recognized. Searching for this issue does turn up results, but they are mostly about removing or replacing the accented character. I am currently using 'normal' string handling. I tried using FileInfo but the path seems to get mangled anyway.
So my first question is how do I get the path stored into a string so that it will pass the Directory.Exists test?
This raises a wider question of non-Latin characters in path names. I have users all over the world, so I can expect to see Arabic, Russian, Chinese and so on in paths. How can I handle all of these?
The problem is almost certainly that you're loading the file with the wrong encoding. The fact that it's a filename is irrelevant - the screenshots show that you've lost the relevant data before you call Directory.Exists.
You should make sure you know the file encoding (e.g. UTF-8, Cp1252 etc) and then pass that in as an argument into however you're loading the file (e.g. File.ReadAllText). If this isn't enough information to get you going, you'll need to tell us more about the file (to work out what encoding it's in) and more about your code (how you're reading it).
Once you've managed to load the correct data, I'd hope that the file aspect just handles itself automatically.
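To illustrate the point, here is a minimal sketch that writes a line in one encoding and reads it back with two different ones. ISO-8859-1 is only an assumption for the demonstration (the real file's encoding is unknown), and RoundTrip is a hypothetical helper.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReadWithEncoding
{
    // Hypothetical helper: writes text with one encoding and reads it with another.
    public static string RoundTrip(string text, Encoding writeAs, Encoding readAs)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, writeAs);
            return File.ReadAllText(path, readAs);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        // Assumption for illustration: the config file was saved as ISO-8859-1.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string line = "local=D:\\AITests\\\u00E9st_test"; // \u00E9 is the accented e

        // Wrong encoding: the accented character becomes U+FFFD (replacement char).
        string wrong = RoundTrip(line, latin1, Encoding.UTF8);
        Console.WriteLine(wrong.Contains("\uFFFD")); // True

        // Matching encoding: the path survives intact.
        string right = RoundTrip(line, latin1, latin1);
        Console.WriteLine(right == line); // True
    }
}
```

Once the text is decoded correctly, .NET strings are UTF-16 and can hold Arabic, Russian or Chinese characters in a path without special handling.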
I am writing a crawler for a website.
Its response is gzip encoded.
I am not able to parse a particular field correctly, though the decompression is successful.
I am also using htmlagilitypack to parse it,
the parsed value of the field is only a part of the original value
as an example :
I am getting only /wEWAwKc04vTCQKb86mzBwKln/PuCg==
whereas the firebug shows the actual value as much longer:
/wEWBgKj7IuJCgKb86mzBwKln/PuCgLT250qAtC0+8cMAvimiNYD
What does the '==' at the end mean?
I am assuming that it is an error on the decompressor's part?
The character = is added by the Base64 encoding.
Encoding the following sentence
Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure.
you would get
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
The = character can only be present at the end of the Base64 string. If you see it, you have probably received all the characters. The reverse is not true: = is only a padding character, it is not needed for every input length, and it is not mandatory in all Base64 implementations.
You don't have a problem with decompression. The page has obviously been correctly decompressed. Otherwise your software would likely throw an error or you'd see just a bunch of strange characters.
However, what you get is an ASCII string that's obviously in Base64 encoding. The equal signs at the end appear if the length of the original binary data is not a multiple of 3 bytes. So that's all perfectly valid Base64 data.
As to why your crawler gets different data than Firefox with Firebug: I don't know, but I can imagine many reasons. These are two separate browsing sessions, and the web site might just assign them different session IDs or somehow record some history of the session.
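The padding rule can be seen with a few tiny inputs. This is a small sketch; PaddingFor is a hypothetical helper that computes how many = characters a padded Base64 encoder appends for a given input length.

```csharp
using System;

public static class Base64Padding
{
    // Hypothetical helper: number of '=' characters for an input of byteCount bytes.
    public static int PaddingFor(int byteCount) => (3 - byteCount % 3) % 3;

    public static void Main()
    {
        Console.WriteLine(Convert.ToBase64String(new byte[] { 1 }));       // AQ==
        Console.WriteLine(Convert.ToBase64String(new byte[] { 1, 2 }));    // AQI=
        Console.WriteLine(Convert.ToBase64String(new byte[] { 1, 2, 3 })); // AQID

        // The shorter value from the question decodes cleanly; its byte count
        // leaves a remainder of 1 when divided by 3, hence the two '=' signs.
        byte[] field = Convert.FromBase64String("/wEWAwKc04vTCQKb86mzBwKln/PuCg==");
        Console.WriteLine(field.Length % 3);          // 1
        Console.WriteLine(PaddingFor(field.Length));  // 2
    }
}
```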
Anyhow, at the end of the day I don't understand your problem. What exactly are you unable to parse? Do you get some kind of error? What do you mean by field? Are you talking about a field of an HTML form?