Using JSON with corrupt data in C# - c#

I recently had to parse JSON data like
[
{"firstName":"John", "lastName":"Doe"},
{"firstName":"Anna", "lastName":"Smith"},
{"firstName":"Peter", "lastName":"Jones"}
]
like this:
var reqData = JsonConvert.DeserializeObject<Dictionary<string, object>>("{" + fileData + "}");
which I used in another project where the data was well formatted. Here, however, the data was somewhat corrupt. For instance, "firstName" might appear as ".\"firstName" and so forth. Deserializing it as above throws an exception.
I tried various schemes to "purify" the data, but as I cannot predict the state of other data, I stopped using JSON and just parsed it myself (with heavy use of substrings and counting to isolate the keys and values). That method works OK, but of course using a JSON parser would be much simpler.
Is there a way around this with JSON?

The main problem is defining what counts as corrupt data. If you know that the substring .\" never occurs in valid data, you can replace it with an empty string and parse the result afterwards. That part is easy, but it becomes difficult once the corruption is more complex.
A human can often read corrupt data that lacks a valid format without trouble, but that is almost impossible for simple algorithms.
By the way, ".\"firstName" is a valid JSON string, because the " is escaped by \. See this question too.
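As a minimal sketch of the replace-then-parse idea above: it assumes the only corruption is the stray three-character sequence .\" in front of a key, and that Json.NET (Newtonsoft.Json) is referenced. The helper name ParseAfterCleanup and the sample input are hypothetical, not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical helper: strips the known-bad sequence .\" and then parses.
// Only safe if .\" can never occur in legitimate data.
static List<Dictionary<string, string>> ParseAfterCleanup(string raw)
{
    string cleaned = raw.Replace(".\\\"", "");
    try
    {
        return JsonConvert.DeserializeObject<List<Dictionary<string, string>>>(cleaned);
    }
    catch (JsonException ex)
    {
        // Some other corruption is still present; report it instead of guessing.
        throw new FormatException("Input is still not valid JSON after cleanup.", ex);
    }
}

// Sample input: the first key arrives as ".\"firstName" instead of "firstName".
string raw = "[{\".\\\"firstName\":\"John\", \"lastName\":\"Doe\"}]";
List<Dictionary<string, string>> people = ParseAfterCleanup(raw);
Console.WriteLine(people[0]["firstName"]); // John
```

Catching JsonException keeps the failure visible, so each new kind of corruption can be added to the cleanup step deliberately rather than being silently mangled.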

Related

Reliably fix broken escape sequences in JSON

I'm getting some JSON from an outside source that can't be changed, and apparently they don't understand the rules about escaping characters correctly in JSON string values. So they have a string value that might contain tabs, for example, that should have been escaped, as well as other invalid escape sequences like \$. I'm trying to parse this with JSON.Net but it keeps falling over on these sequences.
For example, the source might look something like this:
{
"someRegularProp": 10,
"aNormalString": "foo bar etc",
"anInvalidString": "foo <tab \$100"
}
and it's parsed with
var obj = JObject.Parse(json);
So I can fix this specific case with something like:
json = json.Replace("\t", "").Replace("\\$", "$"); // note: in this case I'm fine with just stripping the tabs out
But is there a general way to fix these problems and remove invalid escape sequences before parsing? I don't know what other invalid sequences they might put in there.
I don't see a general way. Obviously they are using a buggy library, or no library at all, to generate this output, and unless you explore further, all you can do is test as much of their output as possible to find all the problems.
Perhaps write a script that collects as much output as possible and validates all of it; then you can be at least a bit more sure.
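While a fully general repair may not exist, one partial heuristic for the escape-sequence part can be sketched as follows. It assumes that any backslash followed by a character other than " \ / b f n r t u is a mistake and that the backslash can simply be dropped. The helper name RepairEscapes is hypothetical, and the tab stripping mirrors the question's own choice rather than a recommendation.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: drops the backslash from any escape JSON does not define.
// Matching "backslash + next character" as a pair keeps a valid \\ intact, so the
// character after an escaped backslash is never mistaken for an escape.
// Limitation: \u is accepted without checking that four hex digits follow.
static string RepairEscapes(string json)
{
    const string validAfterBackslash = "\"\\/bfnrtu";
    string repaired = Regex.Replace(
        json,
        @"\\(.)",
        m => validAfterBackslash.IndexOf(m.Groups[1].Value[0]) >= 0
            ? m.Value               // legal escape: keep as-is
            : m.Groups[1].Value,    // illegal escape such as \$: keep only the character
        RegexOptions.Singleline);
    // Assumption carried over from the question: raw tabs may simply be removed.
    return repaired.Replace("\t", "");
}

string broken = "{ \"anInvalidString\": \"foo \\$100\" }";
Console.WriteLine(RepairEscapes(broken)); // { "anInvalidString": "foo $100" }
```

This only covers escape sequences and tabs; other defects in the feed would still need to be discovered by validating real output, as suggested above.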

Maintaining source string format when reading a date-time from a JSON path and writing it to another file

How can I have Newtonsoft.Json read the value of a path without converting or otherwise meddling with values?
This code
jsonObject.SelectToken("path.to.nested.value").ToString()
Returns this string
03/07/2019 00:02:12
From this string in the JSON document
2019-07-03T00:02:12.1542739Z
It's lost its original formatting, ISO 8601 in this case.
I would like all values to come through as strings, verbatim. I'm writing code to reshape JSON into other formats and I don't want to affect the values as they pass through my .NET code.
What do I need to change? I am not wedded to Newtonsoft.Json btw.
I got it, I think.
jsonObject.SelectToken(path).ToString(Newtonsoft.Json.Formatting.None);
The other options were to supply nothing or this.
Newtonsoft.Json.Formatting.Indented
Which is strange logic in this API as you'd think None means not indented but it means not ... I don't know. Hang on....
Okay so None or Indented returns
"2019-07-03T00:02:12.1542739Z"
(including quotes) but using the overload taking no parameters returns
03/07/2019 00:02:12
That's an odd API design ¯\_(ツ)_/¯
Here's a screenshot which shows really simple repro code.
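The behaviour described here is consistent with Json.NET's default date handling: while reading, strings that look like ISO 8601 dates are converted to DateTime values, so the parameterless ToString() formats a DateTime using the current culture, whereas ToString(Formatting.None) serializes the token back to JSON. A sketch of one way to keep the original text is to switch that conversion off with DateParseHandling.None when loading. The helper name LoadWithoutDateParsing and the sample document are assumptions made for illustration.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper: loads JSON without letting Json.NET turn
// date-looking strings into DateTime values.
static JObject LoadWithoutDateParsing(string json)
{
    using var reader = new JsonTextReader(new StringReader(json))
    {
        DateParseHandling = DateParseHandling.None
    };
    return JObject.Load(reader);
}

string json = "{\"path\":{\"to\":{\"nested\":{\"value\":\"2019-07-03T00:02:12.1542739Z\"}}}}";
JObject jsonObject = LoadWithoutDateParsing(json);
JToken token = jsonObject.SelectToken("path.to.nested.value");

Console.WriteLine(token.Type);    // String (not Date)
Console.WriteLine((string)token); // 2019-07-03T00:02:12.1542739Z
```

With the token left as a plain string, both the cast and the parameterless ToString() return the text exactly as it appeared in the document, without the surrounding quotes.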

Finding specific parts of strings containing certain letters/symbols and then creating a reference to those parts

Currently I am storing data in the form of JSON strings in a database. As JSON contains quotation marks, though, and the database I am using is unable to store quotation marks in this form: ", it converts all quotation marks (like this one: ") to &quot;
Unity will therefore not allow me to deserialize the JSON anymore, as it now looks somewhat like this:
{&quot;coins&quot;:0,&quot;level&quot;:0,&quot;kills&quot;:0,&quot;deaths&quot;:0,&quot;xp&quot;:0.0}
instead of like this:
{"coins":0,"level":0,"kills":0,"deaths":0,"xp":0.0}
Obviously a possible solution to this would be to find all the parts of my JSON string containing &quot;, store a reference to these parts and then convert all of those parts to a simple "
Therefore I would ask you how I would go about doing this.
You can use String.Replace("&quot;", "\"") and then String.Split, but maybe you need to think about moving to a database that supports JSON, like MongoDB. Another direction to solve this: have you tried storing the " as \"?
The database is doing a good job by encoding the text for you, which helps prevent hacks. It is simply applying text encoding.
All you have to do is decode the text before using it. If there is a chance that a double quote is part of the data itself, be careful when converting the encoded text back. Refer to the MSDN resource on the Anti-Cross Site Scripting Library for better insight into the topic.
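The decode step can be sketched like this, assuming the database stores each double quote as the HTML entity &quot; and that System.Net.WebUtility is available in the Unity scripting profile in use. The sample string is illustrative.

```csharp
using System;
using System.Net;

// Sample value as it might come back from the database, with each " stored as &quot;.
string stored = "{&quot;coins&quot;:0,&quot;level&quot;:0,&quot;kills&quot;:0,&quot;deaths&quot;:0,&quot;xp&quot;:0.0}";

// WebUtility.HtmlDecode turns HTML entities such as &quot; back into the characters they stand for.
string decoded = WebUtility.HtmlDecode(stored);

Console.WriteLine(decoded); // {"coins":0,"level":0,"kills":0,"deaths":0,"xp":0.0}
```

Decoding the whole string in one call avoids having to locate each encoded quote individually, and it also restores any other entities the database may have introduced (for example &amp;).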

String.Format not taking 4th object

Here is my problem: I want String.Format() to take four objects and format a string, but it throws an "Input string was not in a correct format" error.
Here is my code,
string jsonData = string.Format("{{\"sectionTitle\":\"{0}\",\"strPushMsg\":\"{1}\",\"Language\":\"{2}\",}\",\"articleid\":\"{3}\"}}", urlsectiontitle, formatHeadline, Language, articleid);
\"{2}\",}\"
Looks like you need to escape that closing brace by doubling it:
string.Format("{{\"sectionTitle\":\"{0}\",\"strPushMsg\":\"{1}\",\"Language\":\"{2}\",}}\",\"articleid\":\"{3}\"}}", urlsectiontitle, formatHeadline, Language, articleid);
It appears you are creating JSON. Json.NET will read single-quoted strings (which would avoid all the escaping), although strict JSON requires double quotes. Better still, use a tool like JSON.Net that is designed to create JSON. Your structure here is quite small (and the unmatched } suggests it is only part of a larger one), but as the JSON gets bigger it is much easier to use a tool to get it right.
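As a sketch of the "use a tool" suggestion, the same four values could be serialized with Json.NET from an anonymous object, so that quoting and brace escaping are handled by the library. The property names follow the question's format string; the sample values are placeholders, and the stray } from the original string is deliberately not reproduced.

```csharp
using System;
using Newtonsoft.Json;

// Placeholder values standing in for the question's variables.
string urlsectiontitle = "News";
string formatHeadline = "He said \"hi\"";
string Language = "en";
string articleid = "42";

// Property names become the JSON keys; values are escaped by the serializer.
var payload = new
{
    sectionTitle = urlsectiontitle,
    strPushMsg = formatHeadline,
    Language,
    articleid
};

string jsonData = JsonConvert.SerializeObject(payload);
Console.WriteLine(jsonData);
// {"sectionTitle":"News","strPushMsg":"He said \"hi\"","Language":"en","articleid":"42"}
```

Note that a headline containing a double quote would have produced invalid JSON with the String.Format approach, while the serializer escapes it automatically.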

How to parse JSON results in to a dynamic object

I'm trying to write a C# utility to consume the results returned from the Export API by MailChimp.
The documentation states that the results will be returned as "streamed JSON."
"This means that a call to this API will not return a single valid JSON
object but, rather, a series of valid JSON objects separated by
newline characters."
The results that I'm seeing don't look like normal JSON to me, and aren't what I was expecting to be working with. It looks to me like CSV data wrapped in square brackets, with row headers in the first line.
A snip of the results can be viewed here. I'll paste them below as well.
["Email Address","First Name","Last Name","Company","FirstOrder","LastOrder","CustomerID","SalesRep","ScreenName","PlayerPage","PlayerPDF","Services Purchased","Contests","EMAIL_TYPE","MEMBER_RATING","OPTIN_TIME","OPTIN_IP","CONFIRM_TIME","CONFIRM_IP","LATITUDE","LONGITUDE","GMTOFF","DSTOFF","TIMEZONE","CC","REGION","LAST_CHANGED","LEID","EUID"]
["john#domain.com","John","Doe","ACME Inc","2010-09-07","2010-09-07","ABC123","sally","","","","Service1","","html",2,"",null,"2011-12-23 15:58:44","10.0.1.1","34.0257000","-84.1418000","-5","-4","America\/Kentucky\/Monticello","US","GA","2014-04-11 18:38:39","40830325","82c81e14a"]
["jane#domain2.com","Jane","Doe","XYZ Inc","2011-05-02","2011-05-02","XYZ001","jack","","","","Service2","","html",2,"",null,"2011-12-23 15:58:44","10.0.1.1","34.0257000","-84.1418000","-5","-4","America\/Kentucky\/Monticello","US","GA","2014-04-11 18:38:40","40205835","6c23329a"]
Can you help me understand what is being returned, as it doesn't appear to be normal JSON? And what would be my best approach to parse this stream of data into a C# object?
EDIT: I've confirmed that the data stream is valid JSON using http://www.freeformatter.com/json-validator.html and pasting in the sample lines above. So what I'm hoping for is a way to dynamically create an object based on the first line, then create a list of these objects with the values contained in the subsequent lines.
You are correct, this is not in typical JSON form: each line is a valid JSON array on its own, but the response as a whole is not a single JSON document. What you could do is create a collection of Dictionary<string, string> objects. Use the values in the first line as the keys of each dictionary, and the values in each subsequent line as that dictionary's values.
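The dictionary approach can be sketched as follows, assuming Json.NET is referenced, that the first non-empty line is the header row, and that every later line is one JSON array. The helper names ParseExport and CellToString are hypothetical; the sample input is a shortened version of the data in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Converts one array cell to text; JSON null stays null.
static string CellToString(JToken cell) =>
    cell.Type == JTokenType.Null ? null : cell.ToString();

// Hypothetical helper: reads newline-separated JSON arrays and maps each
// data row onto the column names taken from the first row.
static List<Dictionary<string, string>> ParseExport(TextReader input)
{
    var rows = new List<Dictionary<string, string>>();
    List<string> headers = null;
    string line;
    while ((line = input.ReadLine()) != null)
    {
        if (string.IsNullOrWhiteSpace(line)) continue;

        // Keep date-like strings as plain text instead of DateTime values.
        using var lineReader = new JsonTextReader(new StringReader(line))
        {
            DateParseHandling = DateParseHandling.None
        };
        JArray cells = JArray.Load(lineReader);

        if (headers == null)
        {
            headers = new List<string>();
            foreach (JToken cell in cells) headers.Add(CellToString(cell));
            continue;
        }

        var row = new Dictionary<string, string>();
        for (int i = 0; i < Math.Min(headers.Count, cells.Count); i++)
        {
            row[headers[i]] = CellToString(cells[i]);
        }
        rows.Add(row);
    }
    return rows;
}

string sample =
    "[\"Email Address\",\"First Name\",\"MEMBER_RATING\",\"OPTIN_IP\"]\n" +
    "[\"john#domain.com\",\"John\",2,null]\n";
List<Dictionary<string, string>> parsed = ParseExport(new StringReader(sample));
Console.WriteLine(parsed[0]["First Name"]); // John
```

Reading line by line means the whole export never has to be held as one JSON document, which fits the "streamed JSON" description in the MailChimp documentation quoted above.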
