JSON Data format to remove escaped characters - c#

I'm having some trouble parsing some JSON data and removing the escaped characters so that I can then assign the values to a List. I've read lots of pages on SO about this very thing, and where other people are having success, I am just not. I was wondering if anyone could run their eyes over my method to see what I am doing wrong?
The API I am fetching the JSON data from is IPStack. It allows me to capture location-based data from website visitors.
Here is how I am building up the API path. The two query string parameters I've added to the URI are the access key that IPStack gives you, and fields=main, which returns the main location-based data (they have a few other blocks of data you can also get).
string api_URI = "http://api.ipstack.com/";
string api_IP = "100.121.126.33";
string api_KEY = "8378273uy12938";
string api_PATH = string.Format("{0}{1}?access_key={2}&fields=main", api_URI, api_IP, api_KEY);
The rest of the code in my method to pull the JSON data in is as follows.
System.Net.WebClient wc = new System.Net.WebClient();
Uri myUri = new Uri(api_PATH, UriKind.Absolute);
var jsonResponse = wc.DownloadString(myUri);
dynamic Data = Json.Decode(jsonResponse);
This gives me a JSON string that looks like this. (I have put each key/value pair on its own line to show the format better.) I have obfuscated the IP and key, but it won't matter for this summary anyway.
"{
\"ip\":\"100.121.126.33\",
\"type\":\"ipv4\",
\"continent_code\":\"OC\",
\"continent_name\":\"Oceania\",
\"country_code\":\"AU\",
\"country_name\":\"Australia\"
}"
This is where I believe the issue lies, in that I cannot remove the escaped characters. I have tried to use Regex.Escape(jsonResponse.ToString()); and whilst this does not throw any errors, it actually doesn't remove the \ characters either. It leaves me with the exact same string that went into it.
The rest of my method is to create a List which has one public string (country_name) just for limiting the scope during the test.
List<IPLookup> List = new List<IPLookup>();
foreach (var x in Data)
{
List.Add(new IPLookup()
{
country_name = x.country_name
});
}
The actual error in Visual Studio is thrown when it tries to add country_name to the List: it complains that the object does not contain country_name, and I'm presuming that is because the name still has its backslash attached?
Any help or pointers on where I can look to fix this one?

Resolved, thanks to the questions posed by Jon and Luke, which got me looking at the problem from another angle.
Rather than finishing my method with a foreach statement and trying to assign via x.something, I simply replaced that block of code with the following.
List<IPLookup> List = new List<IPLookup>();
List.Add(new IPLookup()
{
country_name = Data.country_name,
});
I can now access the key/value pairs from this JSON data without having to try to remove the escaped characters that my debugger was showing me.
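To illustrate why this works, here is a minimal sketch. It assumes System.Text.Json rather than the Json.Decode helper used in the question, and the IpLookupDemo class and its sample payload are made up for the example. The backslashes only exist in the C# literal (and in the debugger's display of it), and since the response is a single JSON object rather than a collection, a property can be read straight off the root.

```csharp
using System;
using System.Text.Json;

public static class IpLookupDemo
{
    // Illustrative payload shaped like the response in the question.
    public const string Sample =
        "{\"ip\":\"100.121.126.33\",\"type\":\"ipv4\",\"country_code\":\"AU\",\"country_name\":\"Australia\"}";

    // Reads one property directly from the root JSON object (no foreach needed).
    public static string ReadCountryName(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        return doc.RootElement.GetProperty("country_name").GetString();
    }
}

public static class Program
{
    public static void Main()
    {
        // The string itself contains no backslash characters.
        Console.WriteLine(IpLookupDemo.Sample.IndexOf('\\'));                 // -1
        Console.WriteLine(IpLookupDemo.ReadCountryName(IpLookupDemo.Sample)); // Australia
    }
}
```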

Related

cleaning JSON for XSS before deserializing

I am using Newtonsoft JSON deserializer. How can one clean JSON for XSS (cross site scripting)? Either cleaning the JSON string before de-serializing or writing some kind of custom converter/sanitizer? If so - I am not 100% sure about the best way to approach this.
Below is an example of JSON that has a dangerous script injected and needs "cleaning." I want a way to manage this before I de-serialize it. But we need to assume all kinds of XSS scenarios, including BASE64 encoded script etc., so the problem is more complex than a simple REGEX string replace.
{ "MyVar" : "hello<script>bad script code</script>world" }
Here is a snapshot of my deserializer ( JSON -> Object ):
public T Deserialize<T>(string json)
{
    T obj;
    var cleaned = cleanJSON(json); // OPTION 1: sanitize here
    var customConverter = new JSONSanitizer(); // OPTION 2: create a custom converter
    // Pass the sanitized string on; the raw json would bypass OPTION 1.
    obj = JsonConvert.DeserializeObject<T>(cleaned, customConverter);
    return obj;
}
JSON is posted from a 3rd party UI interface, so it's fairly exposed, hence the server-side validation. From there, it gets serialized into all kinds of objects and is usually stored in a DB, later to be retrieved and outputted directly in HTML based UI so script injection must be mitigated.
Ok, I am going to try to keep this rather short, because this is a lot of work to write up the whole thing. But, essentially, you need to focus on the context of the data you need to sanitize. From comments on the original post, it sounds like some values in the JSON will be used as HTML that will be rendered, and this HTML comes from an un-trusted source.
The first step is to extract whichever JSON values need to be sanitized as HTML, and for each of those objects you need to run them through an HTML parser and strip away everything that is not in a whitelist. Don't forget that you will also need a whitelist for attributes.
HTML Agility Pack is a good starting place for parsing HTML in C#. How to do this part is a separate question in my opinion - and probably a duplicate of the linked question.
Your worry about base64 strings seems a little over-emphasized in my opinion. It's not like you can simply put aW5zZXJ0IGg0eCBoZXJl into an HTML document and the browser will render it. It can be abused through javascript (which your whitelist will prevent) and, to some extent, through data: urls (but this isn't THAT bad, as javascript will run in the context of the data page. Not good, but you aren't automatically gobbling up cookies with this). If you have to allow a tags, part of the process needs to be validating that the URL is http(s) (or whatever schemes you want to allow).
Ideally, you would avoid this uncomfortable situation, and instead use something like markdown - then you could simply escape the HTML string, but this is not always something we can control. You'd still have to do some URL validation though.
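As a rough sketch of the two simpler pieces mentioned above (escaping a value before it is written into HTML, and checking that a URL uses an allowed scheme), something like the following could be used. The OutputSafety class and its method names are hypothetical, and this only uses the standard System.Net.WebUtility and System.Uri types. It is not a substitute for whitelist-based HTML sanitizing when markup has to be allowed.

```csharp
using System;
using System.Net;

public static class OutputSafety
{
    // Escapes a value so the browser shows it as text instead of parsing it as markup.
    public static string EscapeForHtml(string value) => WebUtility.HtmlEncode(value);

    // Accepts only absolute http/https URLs; javascript: and data: URLs are rejected.
    public static bool IsHttpUrl(string candidate) =>
        Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri)
        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(OutputSafety.EscapeForHtml("hello<script>bad script code</script>world"));
        // hello&lt;script&gt;bad script code&lt;/script&gt;world
        Console.WriteLine(OutputSafety.IsHttpUrl("https://example.com/page")); // True
        Console.WriteLine(OutputSafety.IsHttpUrl("javascript:alert(1)"));      // False
    }
}
```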
Interesting! Thanks for asking. We normally use HttpUtility.UrlEncode for web forms. I have an enterprise web API running that has validations like this, for which we created a custom regex. Please have a look at this MSDN link.
This is a sample model, named KeyValue (say), created to parse the request:
public class KeyValue
{
public string Key { get; set; }
}
Step 1: Trying with a custom regex
var json = @"[{ 'MyVar' : 'hello<script>bad script code</script>world' }]";
JArray readArray = JArray.Parse(json);
IList<KeyValue> blogPost = readArray.Select(p => new KeyValue { Key = (string)p["MyVar"] }).ToList();
// Validate each extracted value; calling ToString() on the list would only test its type name.
if (blogPost.Any(p => !Regex.IsMatch(p.Key, @"^[\p{L}\p{Zs}\p{Lu}\p{Ll}\']{1,40}$")))
    Console.WriteLine("InValid");
// ^ anchors the match at the start of the string.
// \p{..} matches any character in the Unicode category named by {..}.
// {L} matches any letter.
// {Lu} matches uppercase letters.
// {Ll} matches lowercase letters.
// {Zs} matches space separators.
// ' matches an apostrophe.
// {1,40} specifies the number of characters: no fewer than 1 and no more than 40.
// $ anchors the match at the end of the string.
Step 2: Using HttpUtility.UrlEncode - this newtonsoft website link suggests the below implementation.
string json = @"[{ 'MyVar' : 'hello<script>bad script code</script>world' }]";
JArray readArray = JArray.Parse(json);
IList<KeyValue> blogPost = readArray.Select(p => new KeyValue { Key = HttpUtility.UrlEncode((string)p["MyVar"]) }).ToList();

How to select something within an XML attribute?

I am currently attempting to replace a certain string in an XML document. I am doing this through Visual Studio using C#. The exact string I want to replace is Data Source = some-host, which should become Data Source = local-host. The string sits inside the connectionString attribute of an element under my Strings node. However, the connectionString attribute holds many values.
<Strings>
<add name="Cimbrian.Data.ConnectionString" connectionString="Data Source=some-host;Integrated Security=false;pooling=true;Min Pool Size=5;Max Pool Size=400;Connection Timeout=5;"/>
I have managed to select and replace the entire values of both name and connectionString; however, I want to be able to select JUST the Data Source = some-host part to replace.
After loading the document my code currently looks like this,
XmlNode ConnectNode = Incident.SelectSingleNode("//Strings");
XmlNode add1 = ConnectNode.FirstChild;
add1.Attributes[1].Value = "THIS REPLACES ALL OF CONNECTION STRING";
But as the string value suggests, it is replacing far more than I want it to. Any help would be appreciated. Apologies if that is slightly hard to follow.
EDIT - I forgot to mention that if possible I want to do this without searching for the specific string Data Source = some-host due to the fact that the some-host part may change, and I still want to be able to edit the value without having to change my code.
This has really nothing to do with XML - the fact that the value of the attribute is itself a semi-colon-separated list is irrelevant as far as XML is concerned. You'd have the same problem if you had the connection string on its own.
You can use SqlConnectionStringBuilder to help though:
var builder = new SqlConnectionStringBuilder(currentConnectionString);
builder.DataSource = "some other host";
string newConnectionString = builder.ToString();
This means you don't need to rely on the current exact value of some-host (and the spacing), which you would if you just used string.Replace.
If you know exactly what you would be replacing you could use the replace method:
string string2 = string1.Replace("x", "y");
This would find all instances of x and replace them with y in string1
EDIT:
Your specific code would look something like this:
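// The attribute in the question has no spaces around '=', so the search text must match that exactly.
add1.Attributes[1].Value = add1.Attributes[1].Value.Replace("Data Source=some-host", "Data Source=local-host");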
EDIT 2:
Okay, based on your comment, I would split the string on the semi-colon, iterate to find the Data Source entry, modify it, and then concatenate everything back together.
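A minimal sketch of that split-and-rejoin approach might look like the following. The ConnectionStringEditor class and ReplaceDataSource method are hypothetical names, and the code assumes a simple key=value;key=value string with no quoted values that themselves contain semi-colons.

```csharp
using System;
using System.Linq;

public static class ConnectionStringEditor
{
    // Replaces the value of the "Data Source" entry, whatever host it currently holds.
    public static string ReplaceDataSource(string connectionString, string newHost)
    {
        var parts = connectionString
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(part =>
            {
                int eq = part.IndexOf('=');
                string key = eq < 0 ? part : part.Substring(0, eq);
                bool isDataSource = string.Equals(
                    key.Trim(), "Data Source", StringComparison.OrdinalIgnoreCase);
                return isDataSource ? "Data Source=" + newHost : part;
            });

        // Re-join and restore the trailing semi-colon used in the question's string.
        return string.Join(";", parts) + ";";
    }
}

public static class Program
{
    public static void Main()
    {
        string original = "Data Source=some-host;Integrated Security=false;pooling=true;";
        Console.WriteLine(ConnectionStringEditor.ReplaceDataSource(original, "local-host"));
        // Data Source=local-host;Integrated Security=false;pooling=true;
    }
}
```

The result could then be assigned back to add1.Attributes[1].Value as in the question's code.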

How to skip encoding params in ASP.NET Routes

In my ASP.NET WebForms application I have a simple rule:
routes.MapPageRoute("RouteSearchSimple", "search/{SearchText}", "~/SearchTicket.aspx");
As the "SearchText" param I need to use Cyrillic words, so to create the Url I use:
string searchText = "текст";
string url = Page.GetRouteUrl("RouteSearchSimple",
new
{
SearchText = searchText
});
GetRouteUrl automatically encodes the searchText value, and as a result
url = /search/%D1%82%D0%B5%D0%BA%D1%81%D1%82
but I need -> /search/текст
How is it possible to get that from the Page.GetRouteUrl function?
Thanks a lot!
Actually, I believe Alexei Levenkov is close to the answer. Ultimately, a URL may only contain a limited set of ASCII characters, so anything outside that set (spaces and all non-ASCII characters included) will be percent-encoded.
Now, to your point, there are browsers out there that will display non-ASCII characters, but that is up to the implementation of the browser (behind the scenes, it is still performing the encoding). GetRouteUrl, however, will return the ASCII-encoded form every time because that is a requirement for URLs.
(As an aside, that "some 8 year old document" defines URLs. It's written by Tim Berners Lee. He had a bit of an impact on the Internet.)
Update
And because you got me interested, I did a bit more research. It looks as though Internationalized Domain Names do exist. However, from what I understand from the article, underneath the covers, ToASCII or ToUnicode are applied to the names. More can be read in this spec: RFC 3490. So, again, you're still at the same point. More discussion can be found at this Stackoverflow question.
Ok, guys, thank you for the replies, they help a lot. The simple answer is: it's impossible to do that with the Page.GetRouteUrl() function. It's strange that it wasn't designed to leave encoding/decoding of params to developers, as with Request.Params or .QueryString, or at least to offer some alternate routing function where developers could control that.
One way I found is getting the Url from RouteTable and parsing it manually; in my case it would be like:
string url = (System.Web.Routing.RouteTable.Routes["RouteSearchSimple"] as System.Web.Routing.Route).Url.Replace("{SearchText}", "текст");
or the simplest way is just creating the url via string concatenation:
string param = "текст";
string url = "/search/" + param;
which is what I already did, but in that case you will need to change the code everywhere it appears if you change your route url, so it's better to create a static function like GetSearchUrl(string searchText) in one place.
And it works like a charm: the Urls look human-readable and I can read params via RouteData.Values.
The simplest solution is to decode with the UrlDecode method:
string searchText = "текст";
string url = Page.GetRouteUrl("RouteSearchSimple",
new
{
SearchText = searchText
});
string decodedUrl = Server.UrlDecode(url); // => /search/текст
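The encode/decode round trip can be sketched outside of ASP.NET as well. The example below assumes Uri.EscapeDataString and WebUtility.UrlDecode as stand-ins for GetRouteUrl and Server.UrlDecode, and the SearchUrls class is a hypothetical helper, so treat it as an illustration of the behaviour rather than of the routing API itself.

```csharp
using System;
using System.Net;

public static class SearchUrls
{
    // Percent-encodes the search text (UTF-8), similar to what the route URL contains.
    public static string Encoded(string searchText) =>
        "/search/" + Uri.EscapeDataString(searchText);

    // Decodes the percent-encoded form back into a human-readable URL.
    public static string Readable(string encodedUrl) => WebUtility.UrlDecode(encodedUrl);
}

public static class Program
{
    public static void Main()
    {
        string encoded = SearchUrls.Encoded("текст");
        Console.WriteLine(encoded);                      // /search/%D1%82%D0%B5%D0%BA%D1%81%D1%82
        Console.WriteLine(SearchUrls.Readable(encoded)); // /search/текст
    }
}
```

Note that UrlDecode also turns + into a space, so it is best applied only to URLs that were produced by an encoder in the first place.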

.Net 4.0 JSON Serialization: Double quotes are changed to \"

I'm using System.Web.Script.Serialization.JavaScriptSerializer() to serialize a dictionary object into a JSON string. I need to send this JSON string to an API sitting in the cloud. However, when we serialize it, the serializer replaces all the double quotes with \"
For example -
Ideal json_string = {"k":"json", "data":"yeehaw"}
Serializer messed up json_string = {\"k\":\"json\",\"data\":\"yeehaw\" }
Any idea why it is doing so? And I also used external packages like json.net but it still doesn't fix the issues.
Code -
Dictionary<string, string> json_value = new Dictionary<string, string>();
json_value.Add("k", "json");
json_value.Add("data", "yeehaw");
var jsonSerializer = new System.Web.Script.Serialization.JavaScriptSerializer();
string json_string = jsonSerializer.Serialize(json_value);
I'm going to hazard the guess that you're looking in the IDE at a breakpoint. In which case, there is no problem here. What you are seeing is perfectly valid JSON; simply the IDE is using the escaped string notation to display it to you. The contents of the string, however, are your "ideal" string. It uses the escaped version for various reasons:
so that you can correctly see and identify non-text characters like tab, carriage-return, new-line, etc
so that strings with lots of newlines can be displayed in a horizontal-based view
so that it can be clear that it is a string, i.e. "foo with \" a quote in" (the outer-quotes tell you it is a string; if the inner quote wasn't escaped it would be confusing)
so that you can copy/paste the value into the editor or immediate-window (etc) without having to add escaping yourself
Make sure you're not double serializing the object. It happened to me some days ago.
What you're seeing is an escape character.
Your JSON is a string, and when you want to have " in a string you must use one of the following:
string alias = @"My alias is ""Tx3""";
or
string alias = "My alias is \"Tx3\"";
Update
Just to clarify: what I wanted to say here is that your JSON is perfectly valid. You're seeing the special characters in the IDE, and that is perfectly normal, as Jon & Marc are pointing out in their answers and comments. The problem lies somewhere other than those \ characters.
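One quick way to convince yourself is to print the string, or search it for a backslash, instead of inspecting it in the debugger. The sketch below assumes System.Text.Json in place of JavaScriptSerializer so that it runs on current .NET, and the EscapeDemo class is a made-up name. The point about escaping being a display matter is the same either way.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class EscapeDemo
{
    // Serializes the same dictionary as in the question.
    public static string Serialize()
    {
        var jsonValue = new Dictionary<string, string>
        {
            ["k"] = "json",
            ["data"] = "yeehaw"
        };
        return JsonSerializer.Serialize(jsonValue);
    }
}

public static class Program
{
    public static void Main()
    {
        string json = EscapeDemo.Serialize();
        Console.WriteLine(json);               // {"k":"json","data":"yeehaw"}
        Console.WriteLine(json.IndexOf('\\')); // -1, the string holds no backslashes
    }
}
```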

How to avoid the comma in json object?

I am writing an application to get the json object from server.
for example:
{"23423423", [abc, 2009-10-12, hello]}
My problem is:
if abc is a string that contains comma, then how can I parse the content in square brackets?
normally it should be three items in the square brackets. But if abc contains a comma, then I will get four items, which is not right.
Any ideas ?
Thanks in advance !
EDIT:
JSONObject obj = new JSONObject();
List list = new ArrayList();
list.add("abc");
list.add("2009-10");
obj.put("234234", list); // pass the list itself, not list.toString()
Finally I solved it: I should not use list.toString(), otherwise the whole list is converted to a single string.
If abc is a string, then it should be coming from the server quoted, as "abc". If it isn't, then whatever created the JSON is doing it wrong.
A decent JSON parser handles that. Why not just use one of the existing C# JSON parsers out there, such as JSONSharp?
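To show that a quoted string containing commas still counts as one item, here is a small sketch. It assumes System.Text.Json rather than JSONSharp, and the CommaDemo class and sample values are made up for the example.

```csharp
using System;
using System.Text.Json;

public static class CommaDemo
{
    // Returns the number of elements in a JSON array.
    public static int CountItems(string jsonArray)
    {
        using JsonDocument doc = JsonDocument.Parse(jsonArray);
        return doc.RootElement.GetArrayLength();
    }

    // Returns the first element of a JSON array as a string.
    public static string FirstItem(string jsonArray)
    {
        using JsonDocument doc = JsonDocument.Parse(jsonArray);
        return doc.RootElement[0].GetString();
    }
}

public static class Program
{
    public static void Main()
    {
        // The first element contains commas but is quoted, so it stays a single item.
        string json = "[\"a,b,c\", \"2009-10-12\", \"hello\"]";
        Console.WriteLine(CommaDemo.CountItems(json)); // 3
        Console.WriteLine(CommaDemo.FirstItem(json));  // a,b,c
    }
}
```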
