Converting XML with special characters to JSON in C#

I am trying to convert XML that contains special characters (tabs) to JSON. The input XML is:
<Request>
<HEADER>
<uniqueID>2019111855545921230</uniqueID>
</HEADER>
<DETAIL>
<cmnmGrp>
<coNm>IS XYZ INC.</coNm>
<embossedNm>ANNA ST UART</embossedNm>
<cMNm>ST UART/ANNA K</cMNm>
<cmfirstNm>ANNA</cmfirstNm>
<cmmiddleNm>K</cmmiddleNm>
<cm2NdLastNm>ST UART</cm2NdLastNm>
</cmnmGrp>
</DETAIL>
</Request>
I get the following JSON output:
{
"Request": {
"HEADER": { "uniqueID": "2019111855545921230" },
"DETAIL": {
"cmnmGrp": {
"coNm": "IS XYZ INC.",
"embossedNm": "ANNA ST\t\tUART",
"cMNm": "ST\t\tUART/ANNA K",
"cmfirstNm": "ANNA",
"cmmiddleNm": "K",
"cm2NdLastNm": "ST\t\tUART"
}
}
}
}
The response above contains escaped special characters. How can I remove the \t sequences that are produced for the tab characters? I am using the code below for the XML-to-JSON conversion:
var xml = @"Input xml";
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.LoadXml(xml);
string json = JsonConvert.SerializeXmlNode(xmlDoc, Newtonsoft.Json.Formatting.None);
I expect the final JSON output to be:
{
"Request": {
"HEADER": { "uniqueID": "2019111855545921230" },
"DETAIL": {
"cmnmGrp": {
"coNm": "IS XYZ INC.",
"embossedNm": "ANNA ST UART",
"cMNm": "ST UART/ANNA K",
"cmfirstNm": "ANNA",
"cmmiddleNm": "K",
"cm2NdLastNm": "ST UART"
}
}
}
}
Can anyone help with this?
Thanks.

Don't confuse data and representation!
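ANNA ST\t\tUART is the JSON representation of the string "ANNA ST UART" (with real tab characters between ST and UART).
Now parse the JSON and you will get a string that contains the tab characters themselves rather than the \t escape sequences.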
var obj = JObject.Parse(json);
var value = obj["Request"]["DETAIL"]["cmnmGrp"]["embossedNm"];
Console.WriteLine(value); // ANNA ST UART

\t is not a fixed number of spaces. Its width depends on the position from the start of the line and on the reader's tab size setting (usually 8 columns). If you expect the values to appear in JSON the way they appear in XML, you have to read the XML file as text and programmatically replace tabs with spaces according to their position before converting to JSON. That assumes you know the reader's tab size; it could just as well be 4.
Below are two lines with the same "abc\t" value, assuming a tab size of 8:
<value>abc </value>
<value>abc </value>
Generally, keeping the tabs is correct, even though it is not what you want here.
The JSON spec requires a tab inside a string to be escaped, and \t (two characters) is its escape sequence, so your output is correct. When you retrieve a value containing \t, the JSON parser turns it back into a tab character. It depends on what you need: if you don't care about the original tab positions in the XML file, you may be fine already.
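The position-dependent replacement described above can be sketched as follows. This is only an illustration: ExpandTabs and the TabExpander class are hypothetical names, and the tab size of 8 is an assumption about the reader, not something the XML itself specifies. The method works on one line at a time, since tab stops are counted from the start of the line.

```csharp
using System;
using System.Text;

public static class TabExpander
{
    // Replaces each tab with as many spaces as are needed to reach the next
    // tab stop. tabSize is an assumed reader setting (8 by default here).
    public static string ExpandTabs(string line, int tabSize = 8)
    {
        var sb = new StringBuilder();
        foreach (char c in line)
        {
            if (c == '\t')
            {
                // Distance from the current column to the next multiple of tabSize.
                int spaces = tabSize - (sb.Length % tabSize);
                sb.Append(' ', spaces);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // The same "abc\t" value at two different indentations: the tab
        // expands to a different number of spaces, but "x" lands on column 8.
        Console.WriteLine("[" + ExpandTabs("abc\tx") + "]");   // [abc     x]
        Console.WriteLine("[" + ExpandTabs("  abc\tx") + "]"); // [  abc   x]
    }
}
```

If the original tab positions do not matter, a plain value.Replace("\t", " ") on each text value before serializing is simpler.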

Related

Newtonsoft.Json.JsonReaderException: Invalid JavaScript property identifier character: ,

I have this code
var list = new List<long>();
long id = 202;
list.Add(2000);
list.Add(2001);
list.Add(2002);
var stringOfIds = string.Join(",", list);
var paramList = @"{'ProjectId':" + id + ", 'EntityIDsList': " + stringOfIds + "}";
Console.WriteLine(paramList);
var parameters = JsonConvert.DeserializeObject<Dictionary<string, object>>(paramList);
Console.WriteLine(parameters);
For some reason it doesn't deserialize the object; it crashes. What I'm trying to do is: turn a list of longs into a comma-separated string, construct the paramList string from it, and then deserialize that using Newtonsoft.Json. I believe the error is somewhere in stringOfIds, but I couldn't figure it out. What am I doing wrong and how can I fix it?
Right now your paramList looks like this:
{
"ProjectId": 202,
"EntityIDsList":
2000,
2001,
2002
}
Which is not proper JSON. It should look like this:
{
"ProjectId": 202,
"EntityIDsList": [
2000,
2001,
2002
]
}
So you should change it to:
var paramList = @"{'ProjectId':" + id + ", 'EntityIDsList': [" + stringOfIds + "]}";
Also, Console.WriteLine(parameters); won't tell you anything meaningful at this point, because a Dictionary's ToString() only returns its type name. To see the contents, use Console.WriteLine(JsonConvert.SerializeObject(parameters)); instead.
The string you have, paramList, is not valid JSON. In JSON, keys (and string values) are surrounded by double quotes, not single quotes.
Corrected string with escaped double quotes:
var paramList = @"{""ProjectId"": " + id + @", ""EntityIDsList"": """ + stringOfIds + @"""}";
If the purpose of writing this string is to convert it to an object, you should create the object directly. Also note that you can't print objects meaningfully with Console.WriteLine; you need to convert them to a string first (JsonConvert.SerializeObject) and then print that.
var parameters = new
{
ProjectId = id,
EntityIDsList = stringOfIds
};
Console.WriteLine(JsonConvert.SerializeObject(parameters, Formatting.Indented));
// output:
{
"ProjectId": 202,
"EntityIDsList": "2000,2001,2002"
}
If you want EntityIDsList as a list of numbers, set EntityIDsList to list instead of stringOfIds.
var parameters2 = new
{
ProjectId = id,
EntityIDsList = list
};
Console.WriteLine(JsonConvert.SerializeObject(parameters2, Formatting.Indented));
//output:
{
"ProjectId": 202,
"EntityIDsList": [
2000,
2001,
2002
]
}
You have two "problems":
You need to add extra single quotes around the stringOfIds bit.
Maybe it's actually what you want, but this will give you a dictionary with 2 items with the keys "ProjectId" and "EntityIDsList".
As the list is stringified, you may as well use Dictionary<string, string> (or dynamic, depending on what you're actually trying to do).
I'm guessing you will want to have a collection of "projects"? If so, the structure in the question won't work.
[
{ "1": "1001,1002" },
{ "2": "2001,2002" }
]
is the normal json form for a dictionary of items
[
{ "1": [1001,1002] },
{ "2": [2001,2002] }
]
deserialized into a Dictionary<string, List<int>> would be "better".
I strongly suggest you create classes/records to represent the shapes and serialize those, rather than using string concatenation. If you must concatenate, then try to use StringBuilder.
Also, although Newtonsoft will handle single quotes, they're not actually part of the JSON spec. You should escape double quotes into the string if you actually need to generate JSON this way.
Maybe this is just a cutdown snippet to demo your actual problem and I'm just stating the obvious :D
Just a load of observations.
The extra quotes are the actual "problem" with your sample code.
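The suggestion to model the shape with a class instead of concatenating strings might look like the sketch below. ProjectParameters is a hypothetical type name; its property names simply mirror the keys from the question. It assumes the Newtonsoft.Json (Json.NET) package is referenced, as in the question.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical type; the property names mirror the JSON keys in the question.
public class ProjectParameters
{
    public long ProjectId { get; set; }
    public List<long> EntityIDsList { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var parameters = new ProjectParameters
        {
            ProjectId = 202,
            EntityIDsList = new List<long> { 2000, 2001, 2002 }
        };

        // The serializer takes care of quoting, brackets, and commas.
        string json = JsonConvert.SerializeObject(parameters);
        Console.WriteLine(json); // {"ProjectId":202,"EntityIDsList":[2000,2001,2002]}

        // Reading it back yields a typed object rather than Dictionary<string, object>.
        var roundTripped = JsonConvert.DeserializeObject<ProjectParameters>(json);
        Console.WriteLine(string.Join(",", roundTripped.EntityIDsList)); // 2000,2001,2002
    }
}
```

With a typed class there is no hand-built string to get wrong, and the IDs stay numbers instead of becoming one comma-separated string.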

Retain HTML tags on JSON to XML conversion

I have a JSON object which I convert to XML using the following code:
private string ConvertFileToXml(string file)
{
string fileContent = File.ReadAllText(file);
XmlDocument doc = JsonConvert.DeserializeXmlNode(fileContent, "root");
// Retain html tags.
doc.InnerXml = HttpUtility.HtmlDecode(doc.InnerXml);
return XDocument.Parse(doc.InnerXml).ToString();
}
where the file contains the following JSON object:
{
"id": "2639",
"type": "www.stack.com",
"bodyXML": "\n<body><p>Democrats also want to “reinvigorate and modernise” US <ft-content type=\"http://www.stack.com/ontology/content/Article\" url=\"http://api.stack.com/content/d2c32614-61c6-11e7-91a7-502f7ee26895\">antitrust</ft-content> laws for a broad attack on corporations.</p>\n<p>Mr Schumer said the Democrats’ new look should appeal to groups that backed Mrs Clinton, such as the young and minority groups, and members of the white working-class who deserted Democrats for Mr Trump. </p>\n</body>",
"title": "Democrats seek to reclaim populist mantle from Donald Trump",
"standfirst": "New economic plan is pitched as an assault on growing corporate power",
"byline": "David J Lynch in Washington",
"firstPublishedDate": "2017-07-24T17:51:25Z",
"publishedDate": "2017-07-24T17:50:25Z",
"requestUrl": "http://api.stack.com/content/e8bec6dc-708d-11e7-aca6-c6bd07df1a3c",
"brands": [
"http://api.ft.com/things/dbb0bdae-1f0c-11e4-b0cb-b2227cce2b54"
],
"standout": {
"editorsChoice": false,
"exclusive": false,
"scoop": false
},
"canBeSyndicated": "yes",
"webUrl": "http://www.stack.com/cms/s/e8bec6dc-708d-11e7-aca6-c6bd07df1a3c.html"
}
and the output of the method generates this:
<root>
<id>2639</id>
<type>www.stack.com</type>
<bodyXML>
<p>Democrats also want to “reinvigorate and modernise” US <ft-content type="http://www.stack.com/ontology/content/Article" url="http://api.stack.com/content/d2c32614-61c6-11e7-91a7-502f7ee26895">antitrust</ft-content> laws for a broad attack on corporations.</p>
<p>Mr Schumer said the Democrats’ new look should appeal to groups that backed Mrs Clinton, such as the young and minority groups, and members of the white working-class who deserted Democrats for Mr Trump. </p>
</body></bodyXML>
<title>Democrats seek to reclaim populist mantle from Donald Trump</title>
<standfirst>New economic plan is pitched as an assault on growing corporate power</standfirst>
<byline>David J Lynch in Washington</byline>
<firstPublishedDate>2017-07-24T17:51:25Z</firstPublishedDate>
<publishedDate>2017-07-24T17:50:25Z</publishedDate>
<requestUrl>http://api.stack.com/content/e8bec6dc-708d-11e7-aca6-c6bd07df1a3c</requestUrl>
<brands>http://api.ft.com/things/dbb0bdae-1f0c-11e4-b0cb-b2227cce2b54</brands>
<standout>
<editorsChoice>false</editorsChoice>
<exclusive>false</exclusive>
<scoop>false</scoop>
</standout>
<canBeSyndicated>yes</canBeSyndicated>
<webUrl>http://www.stack.com/cms/s/e8bec6dc-708d-11e7-aca6-c6bd07df1a3c.html</webUrl>
</root>
Within the original "bodyXML" of the JSON, there is HTML text with HTML tags but they get crushed into HTML entities after the conversion. What I want to do is retain these HTML tags after conversion.
How do I do this?
Help would be much appreciated!
I don't think it's possible to keep the HTML tags unencoded in the inner text of an XML node.
But it is possible to HTML-decode the inner text of that XML node after you parse the XmlDocument.
This will get you the text with all the HTML tags intact.
E.g.:
private static string ConvertFileToXml()
{
string fileContent = File.ReadAllText("text.json");
XmlDocument doc = JsonConvert.DeserializeXmlNode(fileContent, "root");
return System.Web.HttpUtility.HtmlDecode(doc.SelectSingleNode("root").SelectSingleNode("bodyXML").InnerText);
}
Namespace required : System.Web
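If the goal is to have the tags appear as real child elements in the resulting XML, rather than getting the decoded text back as a string, one possible approach is to re-parse the node's text as markup. The sketch below is an assumption-laden illustration, not a general solution: PromoteToMarkup is a hypothetical helper, and it only works when the embedded HTML happens to be well-formed XML (as the bodyXML value in the question appears to be). It uses a small hand-written document instead of the Json.NET conversion so that it runs on its own.

```csharp
using System;
using System.Xml;

public static class BodyXmlDemo
{
    // Re-parses the text content of the selected node as XML markup.
    // If the text is not well-formed XML, the node is left as plain text.
    public static string PromoteToMarkup(XmlDocument doc, string xpath)
    {
        XmlNode node = doc.SelectSingleNode(xpath);
        if (node == null)
        {
            return doc.OuterXml;
        }

        string text = node.InnerText; // entities are already decoded here
        try
        {
            node.InnerXml = text;     // parse the decoded text as child nodes
        }
        catch (XmlException)
        {
            node.InnerText = text;    // restore the original text on failure
        }
        return doc.OuterXml;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><bodyXML>&lt;body&gt;&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;&lt;/p&gt;&lt;/body&gt;</bodyXML></root>");
        Console.WriteLine(PromoteToMarkup(doc, "/root/bodyXML"));
        // <root><bodyXML><body><p>Hello <b>world</b></p></body></bodyXML></root>
    }
}
```

Typical real-world HTML (unclosed tags, unquoted attributes, named entities such as &nbsp;) will not parse this way, in which case the node simply stays as text.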

Extract only JSON from string in C#

I have a requirement in C# to extract the JSON error message below and read the title element.
I need to remove all the other characters in the string and keep only the part starting from errors,
i.e.
{
"errors":
[{
"status": "404",
"title": "Not found data",
"detail": "This is a sample line of error detail."
}]
}
Please note that the exception text can be anything, so I just need to extract the JSON message starting from "errors".
Can you please assist me?
Code
string sb = @"{465F6CE7-3DF9-4BAF-8DD0-3E116CDAC9E7}0xc0c0167a0System.Net.WebException: There was no endpoint listening at http://TestData/member that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.
{ ""errors"": [ { ""status"": ""404"", ""title"": ""Not found data"",""detail"": ""This is a sample line of error detail."" } ] }";
If you're asking how to extract a specific sequence of text from an arbitrary string, this sounds like a job for a regular expression.
The lazy man's solution:
If you're just looking to read the title, you could do IndexOf on "title" and then read to the next quotation mark that's not preceded by a backslash.
var pattern = @"\{(\s?)\'errors.*";
string sb = "{465F6CE7-3DF9-4BAF-8DD0-3E116CDAC9E7}0xc0c0167a0System.Net.WebException: There was no endpoint listening at http://TestData/member that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details. { 'errors': [ { 'status': '404', 'title': 'Not found data','detail': 'This is a sample line of error detail.' } ] }";
MatchCollection matches = Regex.Matches(sb, pattern);
I have changed the " to ' in the sample, so adjust the pattern to match " for your real input.
matches is now a collection of all matches; matches[0] will give you what you want.
You can use Json.NET: parse your string into a JObject, i.e.
string sb = #"{ ""errors"": [ { ""status"": ""404"", ""title"": ""Not found data"",""detail"": ""This is a sample line of error detail."" } ] }";
JObject jsonObject = JObject.Parse(sb);
JArray errors = (JArray)jsonObject["errors"];
foreach(var item in errors.Children())
{
int itemStatus = (int)item["status"];
string itemTitle = (string)item["title"];
string itemDetail = (string)item["detail"];
}
In this loop you can get what you want; I have shown all the elements that can be extracted from the JSON.
Hope this helps you :)
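Note that JObject.Parse needs the JSON part on its own, so the exception text in front of it has to be cut off first. A minimal sketch of that step, using only string operations, is below. ExtractErrorsJson is a hypothetical helper name, and the sketch assumes that the JSON object is the last thing in the message and that the first "errors" key belongs to it.

```csharp
using System;

public static class ErrorJsonExtractor
{
    // Returns the substring that starts at the '{' immediately preceding the
    // first "errors" key, or null if no such key or brace is found.
    // Assumption: the JSON object runs to the end of the input.
    public static string ExtractErrorsJson(string text)
    {
        int key = text.IndexOf("\"errors\"", StringComparison.Ordinal);
        if (key < 0)
        {
            return null;
        }

        // Search backwards from the key for the opening brace of the object.
        int start = text.LastIndexOf('{', key);
        if (start < 0)
        {
            return null;
        }

        return text.Substring(start);
    }

    public static void Main()
    {
        string message = "{465F6CE7}0xc0c0167a0System.Net.WebException: no endpoint. "
                       + "{ \"errors\": [ { \"status\": \"404\", \"title\": \"Not found data\" } ] }";
        Console.WriteLine(ExtractErrorsJson(message));
        // { "errors": [ { "status": "404", "title": "Not found data" } ] }
    }
}
```

The returned string can then be handed to JObject.Parse as in the answer above. The leading GUID in braces is skipped because the search for '{' runs backwards from the "errors" key.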

how to normalise json from javascript in c#

Hello, I have the following JavaScript. I am sending the array r to my C# page in JSON format:
var r=['maths','computer','physics']
$.post("Global.aspx", { opt: "postpost", post: w.val(),tags:JSON.stringify(r)
}, function (d) {
});
But in C# I end up with this string:
["Maths""Computer""Physics"]
I want only the words maths, computer, physics, without the [ and " signs. Please help me out.
I have the following C# code:
string[] _tags = Request.Form["tags"].ToString().Split(',');
string asd="";
foreach (string ad in _tags) {
asd += ad;
}
You're looking for JSON deserialization:
List<string> list = new JavaScriptSerializer().Deserialize<List<string>>(Request.Form["tags"]);
As pointed out, you've split your string on the , character leaving you with an array of:
[0] = "[\"Maths\""
[1] = "\"Computer\""
[2] = "\"Physics\"]"
Because JSON is a data format, those square brackets actually have functional meaning. They're not just useless extra characters. As such, you need to parse the data into a form you can actually work with.

Resolving new lines within a json field

I have been using Json.NET to serialize my objects in memory to json. When I call the following lines of code:
string json = JsonConvert.SerializeObject(template, Formatting.Indented);
System.IO.File.WriteAllText(file, json);
I get the following in a text file:
{
"template": {
"title": "_platform",
"description": "Platform",
"queries": [
{
"query": "// *******************************\n// -- Remove from DurationWindow at the end \n// *******************************\t\n"
}
],
"metadata": ""
}
}
A query is an object I pulled out of the database that has a string value. When I use XML and write to a file (using XDocument), the new lines in the string (as well as the \t) are properly resolved into tabs and new lines in the file. Is it possible to get the same effect here with Json.NET?
Raw line-break and tab characters are not valid inside JSON string values, so Json.NET escapes them as \n and \t instead of writing actual line breaks and tabs. To display this nicely, you could do:
var withLineBreaks = json.Replace("\\n", "\n").Replace("\\t", "\t");
However if you do that, the text that you're writing will be invalid JSON, and you'll have to strip out tab and line breaks when you read it back if you want to deserialize it.
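For comparison, here is a small sketch of the round trip without any Replace calls. It assumes the Newtonsoft.Json (Json.NET) package is referenced, and the query text is a shortened stand-in for the one in the question. The escapes exist only in the serialized text; deserializing gives back the original string with real line breaks and tabs.

```csharp
using System;
using Newtonsoft.Json;

public static class Program
{
    public static void Main()
    {
        // Shortened stand-in for the query text from the question.
        string query = "// first line\n// second line\t\n";

        // In the JSON text, the control characters appear as \n and \t escapes.
        string json = JsonConvert.SerializeObject(query);
        Console.WriteLine(json); // "// first line\n// second line\t\n"

        // Deserializing restores the actual newline and tab characters.
        string restored = JsonConvert.DeserializeObject<string>(json);
        Console.WriteLine(restored == query); // True
    }
}
```

So if the file only needs to be read back by a JSON parser, nothing has to be changed. If a human-readable copy of the query is wanted, writing the string value itself to a separate text file keeps the JSON file valid.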
