Regular expression groups replacement - c#

I'm working on a regular expression and I just can't figure out what the problem is. I've tried several testing sites like http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx and http://gskinner.com/RegExr/ but somehow, when I put the tested regular expression in C#, it is not processed correctly.
I'm working on a JSON string I can receive from JIRA. The heavily stripped down and beautified version of this JSON string is as follows:
{
"fields": {
"progress": {
"progress": 0,
"total": 0
},
"summary": "Webhook listener is working",
"timetracking": {},
"resolution": null,
"resolutiondate": null,
"timespent": null,
"reporter": {
"self": "http://removed.com/rest/api/2/user?username=removed",
"name": "removed#nothere.com",
"emailAddress": "removed#nothere.com",
"avatarUrls": {
"16x16": "http://www.gravatar.com/avatar/88994b13ab4916972ff1861f9cccd4ed?d=mm&s=16",
"24x24": "http://www.gravatar.com/avatar/88994b13ab4916972ff1861f9cccd4ed?d=mm&s=24",
"32x32": "http://www.gravatar.com/avatar/88994b13ab4916972ff1861f9cccd4ed?d=mm&s=32",
"48x48": "http://www.gravatar.com/avatar/88994b13ab4916972ff1861f9cccd4ed?d=mm&s=48"
},
"displayName": "Wubinator]",
"active": true
},
"updated": "2013-08-20T14:08:00.247+0200",
"created": "2013-07-30T14:41:07.090+0200",
"description": "Say what?",
"customfield_10001": null,
"duedate": null,
"issuelinks": [],
"customfield_10004": "73",
"worklog": {
"startAt": 0,
"maxResults": 0,
"total": 0,
"worklogs": []
},
"project": {
"self": "http://removed.com/rest/api/2/project/EP",
"id": "10000",
"key": "EP",
"name": "EuroPort+ Suite",
"avatarUrls": {
"16x16": "http://removed.com/secure/projectavatar?size=xsmall&pid=10000&avatarId=10208",
"24x24": "http://removed.com/secure/projectavatar?size=small&pid=10000&avatarId=10208",
"32x32": "http://removed.com/secure/projectavatar?size=medium&pid=10000&avatarId=10208",
"48x48": "http://removed.com/secure/projectavatar?pid=10000&avatarId=10208"
}
},
"customfield_10700": null,
"timeestimate": null,
"lastViewed": null,
"timeoriginalestimate": null,
"customfield_10802": null
}
}
I need to convert this JSON to XML. This is not directly possible because of the "16x16", "24x24", "32x32" and "48x48" keys inside the JSON, which would be transformed into <16x16 />, <24x24 />, <32x32 /> and <48x48 /> tags. Those are invalid, because an XML element name cannot start with a digit.
The receiver of the XML doesn't even need those avatar URLs, so I was thinking about stripping out the entire "avatarUrls": { ..... }, bit before handing the JSON over to JSON.NET for converting.
I was thinking about doing this with a regular expression. After some testing on the mentioned websites I came to the following regular expression:
("avatarUrls)(.*?)("displayName")
The Regex.Replace method should remove everything that was matched except the third group (a.k.a. "displayName").
The website http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx shows me the correct groups and matches, and says that the regular expression to use inside C# is:
@"(""avatarUrls)(.*?)(""displayName"")"
So inside C# I wrote the following:
string expression = @"(""avatarUrls)(.*?)(""displayName"")";
string result = Regex.Replace(json, expression, "$3");
return result;
When I look at the result after Regex.Replace, nothing has been replaced. Does anyone see what I did wrong here?

I wouldn't use regular expressions to remove these nodes. I'd instead use Json.NET to remove the nodes you don't want.
I refer to the quote:
Some people, when confronted with a problem, think “I know, I'll use
regular expressions.” Now they have two problems.
Using the answer found here, you could write:
var jsonObject = (JObject)JsonConvert.DeserializeObject(yourJsonString);
removeFields(jsonObject.Root, new[]{"avatarUrls"});
(Note that I was not sure if you wanted to delete both "avatarUrls" nodes.)
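The removeFields helper comes from the linked answer and is not reproduced here. As a rough sketch of the same idea, assuming the Newtonsoft.Json (Json.NET) package is referenced, the properties could be removed through the LINQ-to-JSON API. The class and method names below are made up for illustration:

```csharp
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper, not part of Json.NET itself.
public static class JsonPropertyStripper
{
    // Removes every property called propertyName, at any nesting depth.
    public static string RemoveProperties(string json, string propertyName)
    {
        JObject root = JObject.Parse(json);

        // Copy the matches into a list first: removing tokens while
        // Descendants() is still being enumerated is not safe.
        var matches = root.Descendants()
                          .OfType<JProperty>()
                          .Where(p => p.Name == propertyName)
                          .ToList();

        foreach (JProperty property in matches)
        {
            property.Remove();
        }

        return root.ToString(Formatting.None);
    }
}
```

Calling JsonPropertyStripper.RemoveProperties(json, "avatarUrls") on the JSON from the question should drop both the reporter's and the project's "avatarUrls" objects before the string is handed to the XML conversion.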

There's an overload of Regex.Replace that takes RegexOptions, which you may need to look into. For example, for . to match every character (instead of every character except \n), you need to specify RegexOptions.Singleline. Also, it looks like you're trying to replace every match of @"(""avatarUrls)(.*?)(""displayName"")" with $3. Is that intended? You might be better off doing something like this:
var match = Regex.Match(json, pattern, options);
while (match.Success) {
// Do stuff with match.Groups[1]
match = match.NextMatch();
}
However... I'm not really sure that's going to replace it in the source string.
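For completeness, here is a minimal sketch of the Regex.Replace call with RegexOptions.Singleline, so that . also matches the line breaks between "avatarUrls" and "displayName". The class and method names are only illustrative, and it assumes the input string really does contain "displayName" after the avatar block:

```csharp
using System.Text.RegularExpressions;

// Illustrative wrapper around the pattern from the question.
public static class AvatarRegex
{
    public static string StripAvatarUrls(string json)
    {
        const string pattern = @"(""avatarUrls)(.*?)(""displayName"")";

        // Singleline lets '.' match '\n', so the lazy (.*?) can span
        // the multi-line avatar block. "$3" keeps only "displayName".
        return Regex.Replace(json, pattern, "$3", RegexOptions.Singleline);
    }
}
```

Note that in the sample JSON only the reporter's avatar block is followed by "displayName". The project's "avatarUrls" block would be left untouched by this pattern.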

The problem is something completely different:
Inside the following string:
{"16x16":"http://www.gravatar.com/avatar/88994b13ab4916972ff1861f9cccd4ed?d=mm&s=16, "32.32"
The URL contains an '&', which in a query string or form post marks the start of the next parameter. The JSON is therefore cut off at that point, so it is never read completely and cannot be converted properly. It also explains why the regular expression I used replaced nothing: "displayName" is not inside the truncated string, so nothing matches.

Related

How do I extract a particular value based on another value?

I want to be able to extract an ID based on whether that object has a particular property. I NEED this to be done via Regex. Here is an example of the JSON I am working with:
{
"workspaceid": ws01,
"data": {
"workspacetitle": "My Workspace"
},
"collections": {
"projects": [{
"id": 01,
"data": {
"title": "My Project 01",
"enddateperiod": "2020-02-20T23:59:59",
"profilecomplete": true,
"synced": false
},
"lists": {
"projectcode": [{
"id": pcodered,
"data": {
"code": "myproject123",
"name": "OffshoreProject"
}
}]
}
}, {
"id": 02,
"data": {
"title": "My Project 02",
"enddateperiod": "2020-02-20T23:59:59",
"profilecomplete": false,
"synced": false
},
"lists": {
"projectcode": [{
"id": pcodered,
"data": {
"code": "myproject123",
"name": "OffshoreProject"
}
}]
}
}]
}}
So what I want to extract is the ID of the project whose profile is not complete ("profilecomplete":false). So in this case, I want to select Project 2's id (which is 02).
How can I do this via Regex? I've managed to remove all of the whitespace and new lines as well, so the JSON is essentially one long line. Would it be easier to extract the ID like this? Either way, I could use some help on how to get this ID.
NOTE: The format of the JSON cannot change.
This one works
/"id": ([^,]*?)(?=,[^{]*{[^}]*"profilecomplete": false)/
Explanations :
Read all these chars first "id":[space]
Then read in a group chars that aren't ","
And then a lookahead : you expect "," then chars that aren't "{", then "{"; and finally, before matching the closing "}", you want to read "profilecomplete": false
But I agree that a JSON parser would have been my preferred option!
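Since the question mentions C#-adjacent tooling elsewhere on this page, here is a sketch of how the pattern might be applied with .NET's Regex class. It is a variation rather than the exact pattern above: the braces are escaped and \s* replaces the literal space, so it should also cope with the whitespace-stripped version of the JSON. The class and method names are hypothetical:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the lookahead approach.
public static class ProjectIdFinder
{
    // Returns the first id whose following "data" object contains
    // "profilecomplete": false, or null when there is no such id.
    public static string FindIncompleteProjectId(string json)
    {
        const string pattern =
            @"""id"":\s*([^,]*?)(?=,[^{]*\{[^}]*""profilecomplete"":\s*false)";

        Match match = Regex.Match(json, pattern);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

The approach still depends on "id" appearing directly before the "data" object that holds "profilecomplete", which is the case in the sample but is not guaranteed by JSON in general.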

Extract only JSON from string in C#

I have a requirement in C# to extract the JSON error message below and read the title element.
I need to remove all the other characters in the string and keep only the part starting from errors,
i.e.
{
"errors":
[{
"status": "404",
"title": "Not found data",
"detail": "This is a sample line of error detail."
}]
}
Please note that the exception can be anything, so I just need to extract the JSON message starting from "errors".
Can you please assist me?
Code
string sb="{465F6CE7-3DF9-4BAF-8DD0-3E116CDAC9E7}0xc0c0167a0System.Net.WebException: There was no endpoint listening at http://TestData/member that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details.
{ "errors": [ { "status": "404", "title": "Not found data","detail": "This is a sample line of error detail." } ] }";
If you're asking how to extract a specific sequence of text from an arbitrary string, this sounds like a job for a regular expression.
The lazy man's solution:
If you're just looking to read the title, you could just do IndexOf on "title", and then read to the next quotation mark that's not preceded by a backslash.
var pattern = @"\{(\s?)\'errors.*";
string sb = "{465F6CE7-3DF9-4BAF-8DD0-3E116CDAC9E7}0xc0c0167a0System.Net.WebException: There was no endpoint listening at http://TestData/member that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details. { 'errors': [ { 'status': '404', 'title': 'Not found data','detail': 'This is a sample line of error detail.' } ] }";
MatchCollection matches = Regex.Matches(sb, pattern);
I have changed the " to ' in the sample, so just change the pattern to match " instead.
matches is now a collection of all matches; matches[0] will give you what you want.
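A sketch of the same idea with the original double quotes might look like the following. It assumes the JSON always starts with an opening brace followed by "errors" and runs to the end of the string. The class and method names are invented for the example:

```csharp
using System.Text.RegularExpressions;

// Illustrative helper: pulls the trailing JSON out of an exception message.
public static class ErrorJsonExtractor
{
    // Returns everything from the '{' that precedes "errors" to the end
    // of the text, or null when no such block is present.
    public static string ExtractErrorsJson(string text)
    {
        // Singleline lets '.' run across line breaks inside the JSON.
        Match match = Regex.Match(text, @"\{\s*""errors"".*", RegexOptions.Singleline);
        return match.Success ? match.Value.Trim() : null;
    }
}
```

The extracted string can then be handed to a JSON parser to read the title, rather than picking the value out with further string operations.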
You can use JSON.NET. So, you need to parse your string into JObject i.e.
string sb = @"{ ""errors"": [ { ""status"": ""404"", ""title"": ""Not found data"",""detail"": ""This is a sample line of error detail."" } ] }";
JObject jsonObject = JObject.Parse(sb);
JArray errors = (JArray)jsonObject["errors"];
foreach(var item in errors.Children())
{
int itemStatus = (int)item["status"];
string itemTitle = (string)item["title"];
string itemDetail = (string)item["detail"];
}
In this loop you can get what you want. I have shown all the elements that can be extracted from the JSON.
Hope this helps you :)

Receiving bad JSON format from service

I get the following. How can I make it valid JSON?
{{
"id": "123",
"name": "Kaizen",
"living": {
"city": "Sydney",
"state": "NSW"
},
"Country": {
"name": "Australia",
"region": "APAC"
}
}}
It looks like valid JSON except for the extra opening and closing brace.
You can simply cut it out:
string jsonString = yourServerClient.GetData();
jsonString = jsonString.Trim();
jsonString = jsonString.Substring(1, jsonString.Length - 2);
var jsonObj = JsonConvert.DeserializeObject(jsonString);
However, I would recommend that you refuse to use any incorrect or invalid data sources - it is the road to hell.
You can never predict what they will do next, and you definitely do not want to spend time rewriting (and worsening) your code every time they change their service, just so that it supports their incorrect format.
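If you do go down that road, a slightly more defensive sketch would only cut the braces when the payload really is double-wrapped, so that the code keeps working if the service is ever fixed. The names are made up for illustration:

```csharp
using System;

// Hypothetical guard around the Substring call shown above.
public static class BraceFixer
{
    // Strips one outer brace pair only when the text starts with "{{"
    // and ends with "}}". Valid JSON never starts with "{{".
    public static string UnwrapDoubleBraces(string raw)
    {
        string trimmed = raw.Trim();
        bool doubled = trimmed.StartsWith("{{", StringComparison.Ordinal)
                    && trimmed.EndsWith("}}", StringComparison.Ordinal);

        return doubled ? trimmed.Substring(1, trimmed.Length - 2) : trimmed;
    }
}
```

The result of BraceFixer.UnwrapDoubleBraces(jsonString) can then be passed to JsonConvert.DeserializeObject as before.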

How to handle spaces in JSON keys when serializing to XML?

I'm using Json.NET in a .NET 4.0 application in order to convert a JSON RESTful response into XML. I am running into issues converting JSON into XML if a JSON child key has a space.
So far, I am able to convert most JSON responses.
Here are example responses along with the code which I am using to generate the XML.
{
num_reviews: "2",
page_id: "17816",
merchant_id: 7165
}
And here is the response which is causing an error:
[
{
headline: "ant bully",
created_date: "2010/06/12",
merchant_group_id: 10126,
profile_id: 0,
provider_id: 10000,
locale: "en_US",
helpful_score: 1314,
locale_id: 1,
variant: "",
bottomline: "Yes",
name: "Jessie",
page_id: "17816",
review_tags: [
{
Pros: [
"Easy to Learn",
"Engaging Story Line",
"Graphics",
"Good Audio",
"Multiplayer",
"Gameplay"
]
},
{
Describe Yourself: [
"Casual Gamer"
]
},
{
Best Uses: [
"Multiple Players"
]
},
{
Primary use: [
"Personal"
]
}
],
rating: 4,
merchant_id: 7165,
reviewer_type: "Verified Reviewer",
comments: "fun to play"
},
{
headline: "Ok game, but great price!",
created_date: "2010/02/28",
merchant_group_id: 10126,
profile_id: 0,
provider_id: 10000,
locale: "en_US",
helpful_score: 1918,
locale_id: 1,
variant: "",
bottomline: "Yes",
name: "Alleycatsandconmen",
page_id: "17816",
review_tags: [
{
Pros: [
"Easy to Learn",
"Engaging Story Line"
]
},
{
Describe Yourself: [
"Frequent Player"
]
},
{
Primary use: [
"Personal"
]
},
{
Best Uses: [
"Kids"
]
}
],
rating: 3,
merchant_id: 7165,
reviewer_type: "Verified Reviewer",
comments: "This is a cute game for the kids and at a great price. Just don't expect a whole lot."
}
]
So far, I have been considering creating a mapping of the JSON data to a C# object and generating XML for that class. However, is there a way to keep this dynamic? Or is there a way to treat spaces as %20 encodings?
This question is the same as "How to validate JSON string before converting to XML in C#".
If you have any further queries, please let me know.
You can call XmlConvert.EncodeName, which will escape any invalid characters using _xHHHH_ sequences.
For example, a space would become _x0020_.
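A small sketch of how that could be wrapped, using only System.Xml. The wrapper class and method names are hypothetical, while XmlConvert.EncodeName and XmlConvert.DecodeName are the framework methods:

```csharp
using System.Xml;

// Thin illustrative wrapper around XmlConvert.
public static class XmlNames
{
    // Turns an arbitrary JSON key into a valid XML element name.
    public static string ToXmlName(string jsonKey)
    {
        return XmlConvert.EncodeName(jsonKey);
    }

    // Reverses the encoding to recover the original key.
    public static string FromXmlName(string xmlName)
    {
        return XmlConvert.DecodeName(xmlName);
    }
}
```

XmlNames.ToXmlName("Describe Yourself") gives "Describe_x0020_Yourself", and XmlNames.FromXmlName turns it back into "Describe Yourself", so the receiver can recover the original key if it needs to.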
You cannot have an XML element name with a space in it. You would need to replace the space with an underscore or some other character. If that is not feasible for you, try putting that value in an attribute of that node.
I hope this makes sense.

Remove Unneeded Spaces from JSON Output

I am serializing a JSON.Net object, and it contains many arrays. Here is the output I currently get:
"children": [
{
"children": [
{
},
{
}
}
However, just for the ease of reading and comparing, I would like to remove the line breaks between each brace and bracket and between the comma and next brace, so it looks like this:
"children": [ {
"children": [ {
}, {
}
}
I am already serializing my JSON with the Formatting.Indented argument, so I would like to know if there is another setting I can change so that JSON.Net serializes without the extra line breaks, while retaining the indented formatting.
There is no feature in Json.NET to give you that kind of indentation. You'll either have to do it yourself outside of Json.NET or modify the source code.
Can you split on '{' and then join the array again by spaces?
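As a rough sketch of the "do it yourself outside of Json.NET" route, the indented output could be post-processed with two regular expression replacements. This is only an approximation: it works on the raw text, so a string value that happens to contain "[ {" or "}, {" style sequences with whitespace in between would be altered too. The names are made up for the example:

```csharp
using System.Text.RegularExpressions;

// Hypothetical post-processor for Formatting.Indented output.
public static class JsonCompactor
{
    public static string JoinBraceLines(string indentedJson)
    {
        // Pull an opening brace up onto the line of its opening bracket.
        string result = Regex.Replace(indentedJson, @"\[\s*\{", "[ {");

        // Pull the next opening brace up behind "},".
        return Regex.Replace(result, @"\},\s*\{", "}, {");
    }
}
```

The indentation of all other lines is left as Json.NET produced it.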
