Regex Option - no recursive regex - c#

I'm trying to use the System.Text.RegularExpressions.Regex class to grab some text from a JSON string. The string is something like
[{"name":"joe","message":"hello","sent":"datetime"}{"name":"steve","message":"bye","sent":"datetime"}]
I'm attempting to use the Matches() method to grab the "message" values. However, specifying a match as something like message":"*","sent as the pattern would return 3 matches:
hello
bye
hello","sent":"datetime"}{"name":"steve","message":"bye
How do I structure the options or modify my pattern to exclude recursive regex checks? I only want matches
hello
bye

The JavaScriptSerializer class (namespace System.Web.Script.Serialization, assembly System.Web.Extensions.dll) is pretty useful for dealing with JSON strings like this.
var json = "[{\"name\":\"joe\",\"message\":\"hello\",\"sent\":\"datetime\"},{\"name\":\"steve\",\"message\":\"bye\",\"sent\":\"datetime\"}]";
var serializer = new JavaScriptSerializer();
var result = serializer.Deserialize<object[]>(json);
// now have an array of objects, each of which happens to be an IDictionary<string, object>
foreach (IDictionary<string, object> map in result)
{
    var messageValue = map["message"].ToString();
    Console.WriteLine("message = {0}", messageValue);
}

JSON is better parsed by a JSON tool.
You can try using the non-greedy syntax .*? for example.
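A minimal sketch of the non-greedy approach, using the sample string from the question. The class and method names (MessageExtractor, ExtractMessages) are hypothetical, and the pattern assumes that "message" is always immediately followed by "sent" and that the message text contains no escaped quotes; a JSON parser remains the more robust choice.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MessageExtractor
{
    // Hypothetical helper: returns the value of every "message" field.
    // The lazy quantifier .*? stops at the first following ","sent"
    // instead of extending to the last one in the input.
    public static List<string> ExtractMessages(string json)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(json, "\"message\":\"(.*?)\",\"sent\""))
        {
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static void Main()
    {
        var json = "[{\"name\":\"joe\",\"message\":\"hello\",\"sent\":\"datetime\"}"
                 + "{\"name\":\"steve\",\"message\":\"bye\",\"sent\":\"datetime\"}]";
        foreach (var message in ExtractMessages(json))
        {
            Console.WriteLine(message); // prints hello, then bye
        }
    }
}
```

Regex.Matches only returns non-overlapping matches, so the long third match described in the question does not appear once the quantifier is lazy.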


Json get array object from JSON Object C#

I have this json string passed to my webapi
string jsonstring = "{\"datamodel\": \"[{\"K1\":\"V1\",\"K2\":\"V2\"}]\"}";
I may have more than one pair of (K,V) inside. How do I parse this in C#?
I thought I could first convert my string to a JObject, get the value for the datamodel key from that, and then use JArray to parse the K,V pairs. But it's throwing a JsonReaderException on the first line of code here
JObject my_obj = JsonConvert.DeserializeObject<JObject>(jsonstring.ToString());
and then do this..
JObject data = my_obj["datamodel"].Value<JObject>();
First of all, the JSON string you are posting is not valid. Given your comment, you can clean up the quotes before and after the square brackets using this snippet:
string jsonstring = "{\"datamodel\": \"[{\"K1\":\"V1\",\"K2\":\"V2\"}]\"}";
string jsonstringCleaned = jsonstring.Replace("\"[", "[").Replace("]\"", "]");
var my_obj = JsonConvert.DeserializeObject<JObject>(jsonstringCleaned);
The code is right, but the exception you are getting is related to the formatting of your JSON string. If you put valid JSON in this code, it should work as expected.
There are \" missing around V1 in your JSON string.
It should look like this:
string jsonstring = "{\"datamodel\": \"[{\"K1\":\"V1\",\"K2\":\"V2\"}]\"}";
First, always make sure that you have a valid JSON string. A simple way to do that is to paste it into a JSON-to-C# converter tool, such as this one: http://json2csharp.com/
It may be simpler and more readable to use single quotes within your JSON string if that is an option, as it avoids the need to escape the double quotes:
string jsonstring = "{'datamodel': [{'K1':'V1','K2':'V2'}]}";
Now we deserialize the object and get the JArray. There is no need to call ToString() on jsonstring, since it is already a string.
var my_obj = JsonConvert.DeserializeObject<JObject>(jsonstring);
var data = (JArray)my_obj["datamodel"];
A better and more concise way to accomplish the same result could be to just use JObject.Parse. We can accomplish the same result with just one line of code.
var data = (JArray)JObject.Parse(jsonstring)["datamodel"];

Newtonsoft JSON.NET and spaces in json key bug?

Take the following valid json:
{
    "universe": {
        "solar system": "sun"
    }
}
and here's the simple C# code:
using Newtonsoft.Json.Linq;
JToken x = JToken.Parse("{\"universe\": {\"solar system\": \"sun\"}}");
string s = x.First.First.First.Path;
At this point s = "universe['solar system']"
However I'm expecting "universe.['solar system']" (notice the '.' after "universe").
If the json key does not have a space ("solar_system") I get "universe.solar_system" which is correct.
The question is: Is this a bug in json.net or do I need to do something else to support spaces in json keys?
Thanks,
PT
This is not a bug. The path returned by JToken.Path is intended to be in JSONPath syntax. As explained in the original JSONPath proposal:
JSONPath expressions can use the dot–notation
$.store.book[0].title
or the bracket–notation
$['store']['book'][0]['title']
So universe['solar system'] is perfectly valid, and if you pass it to SelectToken() you'll get the correct value "sun" back:
JToken x = JToken.Parse("{\"universe\": {\"solar system\": \"sun\"}}");
string path = x.First.First.First.Path;
Console.WriteLine(path); // Prints universe['solar system']
var val = (string)x.SelectToken(path);
Console.WriteLine(val); // Prints "sun"
Debug.Assert(val == "sun"); // No assert
See also Querying JSON with SelectToken and escaped properties.
If you nevertheless want the extra . in the path you can create your own extension method JTokenExtensions.ExpandedPath(this JToken token) based on the reference source.

C# preserve escape sequence when reading JSON content using Json.NET

Given the following json text content:
{ "Pattern": "[0-9]*\t[a-z]+" }
Which is reflected in a simple class:
public class Rule
{
    public string Pattern { get; set; }

    public bool Test(string text)
    {
        return new Regex(Pattern).IsMatch(text);
    }
}
And it's deserialised like this:
var json = System.IO.File.ReadAllText("file.json");
var rule = JsonConvert.DeserializeObject<Rule>(json);
The value of Pattern is supposed to be a regex pattern. The problem is that, once the content is read, the "\t" escape sequence is immediately interpreted as an escape character (a tab), resulting in the string value: [0-9]*	[a-z]+.
What I understand is that the content is somewhat malformed: it should look like [0-9]*\\t[a-z]+ to be valid within the JSON content, escaping the backslash so that it is preserved and results in the actual pattern [0-9]*\t[a-z]+. But the file is user-edited, and I would just like to be able to interpret the content loosely, assuming that backslashes should be preserved (and escape sequences not transformed).
I tried to implement a custom JsonConverter but when looking up the token, the value is already resolved.
I've tried the code below and it works. Maybe I don't understand what the problem is, or you can provide a sample that doesn't work with this:
StreamReader s = new StreamReader(@"test.txt");
string json = s.ReadToEnd();
json=json.Replace("\\","\\\\");
JObject obj = JObject.Parse(json);
string pattern = obj["Pattern"].ToString();
bool test = Regex.IsMatch("1\ta", pattern); // the input contains a tab, which the \t in the pattern matches
test.txt contains just this:
{ "Pattern": "[0-9]*\t[a-z]+" }
Edit
As Thomasjaworsky remarks, instead of json=json.Replace("\\","\\\\"); it is better to use Regex.Replace(json, @"(?<!\\)[\\](?!\\)", @"\\").
It does the same replacement, but only where the backslash is not already escaped. Two backslashes in a row are left untouched.
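The lookaround-based replacement can be sketched on its own, without a JSON library, to show what it does to the raw file text. The class and method names (BackslashPreserver, EscapeLoneBackslashes) are hypothetical. Note one assumption: the text must not contain intentional JSON escapes such as \" inside string values, because those lone backslashes would be doubled as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashPreserver
{
    // Hypothetical helper: doubles every backslash that is neither preceded
    // nor followed by another backslash, so a JSON parser later reads it as
    // a literal backslash instead of the start of an escape sequence.
    public static string EscapeLoneBackslashes(string jsonText)
    {
        return Regex.Replace(jsonText, @"(?<!\\)\\(?!\\)", @"\\");
    }

    public static void Main()
    {
        // Verbatim string: the \t below is a backslash followed by 't', as in the user-edited file.
        var raw = @"{ ""Pattern"": ""[0-9]*\t[a-z]+"" }";
        Console.WriteLine(EscapeLoneBackslashes(raw));
        // prints { "Pattern": "[0-9]*\\t[a-z]+" }

        // An already escaped backslash pair is left alone.
        Console.WriteLine(EscapeLoneBackslashes(@"a\\b")); // prints a\\b
    }
}
```

In a .NET replacement string only $ is special, so @"\\" inserts exactly two backslash characters.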

Remove dynamic substring from string c#

I am currently implementing a system and came across the following problem.
We are making a call to a 3rd party supplier that provides a string of contents which is not in JSON format, therefore we are trying to remove content from the string, in order to build a JSON string and serialize it into our own object.
We are trying to remove the part from {"imgCount to ]", (just before the "images":
An example of the string is the following:
img_CB("imgCount":31,"imgHash":"[6ede94341e1ba423ccc2d4cfd27b9760]","images":{...});
The issue is that, the imgCount and imgHash may not be in that order. It could be in something like the following:
img_CB("images":{....}, "imgHash":"[6ede94341e1ba423ccc2d4cfd27b9760]", "imgCount":31);
Therefore this makes it quite dynamic and hard to determine where to start "replacing" from.
Would anyone help to possibly build a regex expression to replace/remove the imgHash and imgCount tags with their values please?
Thanks
Looks like you're getting a jsonp response. Have a look at this answer on how to parse your json after stripping off the jsonp stuff:
How to parse JSONP using JSON.NET?
Example:
string supposedToBeJson = "img_CB(\"imgCount\":31,\"imgHash\":\"[6ede94341e1ba423ccc2d4cfd27b9760]\",\"images\":{});";
var jobject = JObject.Parse(supposedToBeJson.Replace("img_CB(", "{").Replace(");", "}"));
var images = jobject.SelectToken("images");
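Try this (each pattern removes one field together with its adjacent comma):
str = Regex.Replace(str, "\"imgCount\":[^,)]*,\\s*", "");
str = Regex.Replace(str, "\"imgHash\":\"[^\"]*\",\\s*", "");
// When the field comes last, remove the preceding comma and keep the closing parenthesis.
str = Regex.Replace(str, ",\\s*\"imgCount\":[^,)]*\\)", ")");
str = Regex.Replace(str, ",\\s*\"imgHash\":\"[^\"]*\"\\)", ")");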

how to validate JSON string before converting to XML in C#

I will receive a response in the form of a JSON string.
We have an existing tool developed in C# which takes its input in XML format.
Hence I am converting the JSON string obtained from the server to an XML string using Newtonsoft.Json and passing it to the tool.
Problem:
When converting JSON response to XML, I am getting an error
"Failed to process request. Reason: The ' ' character, hexadecimal
value 0x20, cannot be included in a name."
The above error indicates that a JSON key contains a space (for example: \"POI Items\":[{\"lat\":{\"value\":\"00\"}]), which cannot be converted to an XML element name.
Is there any approach to identify only the JSON keys that contain spaces (such as "POI Items") and remove the spaces from them?
Also, please suggest any alternative solution that avoids changing the existing solution.
Regards,
Sudhir
You can use Json.NET and replace the names while loading the JSON:
JsonSerializer ser = new JsonSerializer();
var jObj = ser.Deserialize(new JReader(new StringReader(json))) as JObject;
var newJson = jObj.ToString(Newtonsoft.Json.Formatting.None);
public class JReader : Newtonsoft.Json.JsonTextReader
{
    public JReader(TextReader r) : base(r)
    {
    }

    public override bool Read()
    {
        bool b = base.Read();
        if (base.CurrentState == State.Property && ((string)base.Value).Contains(' '))
        {
            base.SetToken(JsonToken.PropertyName, ((string)base.Value).Replace(" ", "_"));
        }
        return b;
    }
}
Input : {"POI Items":[{"lat":{"value":"00","ab cd":"de fg"}}]}
Output: {"POI_Items":[{"lat":{"value":"00","ab_cd":"de fg"}}]}
I recommend using some sort of Regex.Replace().
Search the input string for something like:
\"([a-zA-Z0-9]+) ([a-zA-Z0-9]+)\":
and then replace it with something like this (mind the missing space):
\"$1$2\":
The first pair of parentheses captures the first word of the key name and the second pair captures the second word; $1 and $2 refer to those captured groups in the replacement. The : guarantees that the replacement is applied to key names only (not to string data). JSON key names are enclosed in a pair of \"s.
Maybe it's not 100% correct but you can start searching by this.
For details check MSDN, and some Regex examples
http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.replace.aspx
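A minimal sketch of this Regex.Replace idea, run against the sample input from the other answer. The class and method names (JsonKeySpaces, RemoveSpaces) are hypothetical, and the pattern only handles keys made of exactly two alphanumeric words separated by a single space; keys with more words or other characters would need a broader pattern or the reader-based approach.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JsonKeySpaces
{
    // A quoted name made of two alphanumeric words separated by one space,
    // followed by a colon, so ordinary string values are not affected.
    private static readonly Regex TwoWordKey =
        new Regex("\"([a-zA-Z0-9]+) ([a-zA-Z0-9]+)\"\\s*:");

    // Hypothetical helper: joins the two captured words without the space.
    public static string RemoveSpaces(string json)
    {
        return TwoWordKey.Replace(json, "\"$1$2\":");
    }

    public static void Main()
    {
        var input = "{\"POI Items\":[{\"lat\":{\"value\":\"00\",\"ab cd\":\"de fg\"}}]}";
        Console.WriteLine(RemoveSpaces(input));
        // prints {"POIItems":[{"lat":{"value":"00","abcd":"de fg"}}]}
    }
}
```

The value "de fg" keeps its space because it is not followed by a colon.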
