Removing elements from JSON based on a condition in C#

I have a JSON string that I want to be able to amend in C#. I want to be able to delete a set of data when one of its child values is a certain value.
Take the following
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"explainOther":"",
"fl":"*,score",
"indent":"on",
"start":"0",
"q":"*:*",
"hl.fl":"",
"qt":"",
"wt":"json",
"fq":"",
"version":"2.2",
"rows":"2"}
},
"response":{"numFound":2,"start":0,"maxScore":1.0,"docs":
[{
"id":"438500feb7714fbd9504a028883d2860",
"name":"John",
"dateTimeCreated":"2012-02-07T15:00:42Z",
"dateTimeUploaded":"2012-08-09T15:30:57Z",
"score":1.0
},
{
"id":"2f7661ae3c7a42dd9f2eb1946262cd24",
"name":"David",
"dateTimeCreated":"2012-02-07T15:02:37Z",
"dateTimeUploaded":"2012-08-09T15:45:06Z",
"score":1.0
}]
}}
There are two response results shown above. I want to be able to remove the whole parent response result group when its child "id" value is matched, for example if my id was "2f7661ae3c7a42dd9f2eb1946262cd24", I would want the second group to be deleted and thus my result would look as follows.
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"explainOther":"",
"fl":"*,score",
"indent":"on",
"start":"0",
"q":"*:*",
"hl.fl":"",
"qt":"",
"wt":"json",
"fq":"",
"version":"2.2",
"rows":"2"}},
"response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
{
"id":"438500feb7714fbd9504a028883d2860",
"name":"John",
"dateTimeCreated":"2012-02-07T15:00:42Z",
"dateTimeUploaded":"2012-08-09T15:30:57Z",
"score":1.0
}]
}}
I will need to perform multiple delete operations on the JSON file. The JSON file could contain thousands of results, and I really need the most performant way possible.
Any help greatly appreciated.

I've been attempting to compress this into a nicer LINQ statement for the last 10 minutes or so, but the fact that the list of known Ids is inherently changing how each element is evaluated means that I'm probably not going to get that to happen.
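// Assumes knownIds is a HashSet<string> declared elsewhere (it is not shown in this answer).
var jObj = (JObject)JsonConvert.DeserializeObject(json);
// Collect matches first; removing inside the first loop would modify the collection being enumerated.
var docsToRemove = new List<JToken>();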
foreach (var doc in jObj["response"]["docs"])
{
var id = (string)doc["id"];
if (knownIds.Contains(id))
{
docsToRemove.Add(doc);
}
else
{
knownIds.Add(id);
}
}
foreach (var doc in docsToRemove)
doc.Remove();
This seems to work well with the crappy little console app I spun up to test, but my testing was limited to the sample data above, so if there are any problems go ahead and leave a comment so I can fix them.
For what it's worth, this will basically run in linear time with respect to how many elements you feed it, which is likely all the algorithmic performance you're going to get without getting hilarious with this problem. Spinning each page of ~100 records off into its own task using the Task Parallel Library, invoking a worker that handles its own little page and returns the cleaned JSON string, comes to mind. That would certainly make this faster if you ran it on a multi-core machine, and I'd be happy to provide some code to get you started on that, but it's also huge overengineering for the scope of the problem as it's presented.

var jObj = (JObject)JsonConvert.DeserializeObject(json);
HashSet<string> idsToDelete = new HashSet<string>() { "2f7661ae3c7a42dd9f2eb1946262cd24" };
jObj["response"]["docs"]
.Where(x => idsToDelete.Contains((string)x["id"]))
.ToList()
.ForEach(doc=>doc.Remove());
var newJson = jObj.ToString();
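Since the question mentions many deletes on a large file, here is a minimal sketch of the same idea packaged so that the JSON is parsed once, all ids are removed in a single pass, and the result is serialized once. The `DocFilter` class and `RemoveDocs` method are hypothetical names used only for illustration; the sketch assumes Json.NET (Newtonsoft.Json) and the `response.docs` layout from the question.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical helper, not part of any library.
public static class DocFilter
{
    // Removes every entry of response.docs whose "id" is in idsToDelete
    // and returns the resulting JSON as a compact string.
    public static string RemoveDocs(string json, ISet<string> idsToDelete)
    {
        var root = JObject.Parse(json);
        var docs = (JArray)root["response"]["docs"];

        // Walk backwards so RemoveAt does not shift the items still to be visited.
        for (int i = docs.Count - 1; i >= 0; i--)
        {
            if (idsToDelete.Contains((string)docs[i]["id"]))
            {
                docs.RemoveAt(i);
            }
        }

        return root.ToString(Formatting.None);
    }
}
```

Passing a `HashSet<string>` keeps each lookup at roughly constant cost, so the whole pass stays linear in the number of docs.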

None of the answers above worked for me. I had to remove the child from its parent (.Parent.Remove()), not just call Remove() on it. Working code example below:
namespace Engine.Api.Formatters
{
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Formatting;
using System.Net.Http.Headers;
using System.Threading.Tasks;
using System.Web.Script.Serialization;
using System.Xml;
using System.Xml.Serialization;
public class ReducedJson
{
public dynamic WriteToStreamAsync(object value)
{
var json = new JavaScriptSerializer().Serialize(value);
var serializedJson = (JObject)JsonConvert.DeserializeObject(json);
foreach (var response in serializedJson["ProductData"]["Motor"]["QuoteResponses"])
{
// response["NetCommResults"] is the property's value; its Parent is the JProperty to drop.
response["NetCommResults"].Parent.Remove();
foreach (var netCommResult in response["BestPriceQuote"]["NetCommResults"])
{
netCommResult["Scores"].Parent.Remove();
}
}
return serializedJson;
}
}
}
Hope this saves you some time.
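The difference between the two styles seems to come down to what the token's parent is. The following is a small sketch, assuming Json.NET, with a made-up JSON document and a hypothetical `RemoveDemo` class: an array element can be removed directly, while a property value has to be removed through its parent `JProperty` (as far as I know, calling `Remove()` on the value itself throws, because a `JProperty` does not allow its single value to be removed).

```csharp
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical demo class, for illustration only.
public static class RemoveDemo
{
    public static string Run()
    {
        var root = JObject.Parse(
            @"{""docs"":[{""id"":""a"",""score"":1},{""id"":""b"",""score"":2}]}");
        var docs = (JArray)root["docs"];

        // docs[1] is an element of the array, so its parent is the JArray
        // and Remove() takes the whole object out of the array.
        docs[1].Remove();

        // docs[0]["score"] is the value of a property; its parent is the JProperty,
        // so the property is removed from the object via .Parent.Remove().
        docs[0]["score"].Parent.Remove();

        return root.ToString(Formatting.None); // {"docs":[{"id":"a"}]}
    }
}
```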

I just found another approach.
var aJson = JsonConvert.DeserializeObject<JObject>(json);
var doc = aJson["response"]["docs"];
JObject docs = new JObject();
docs["docs"] = doc;
// remove
docs.SelectTokens(string.Format("docs[?(@.id == '{0}')]", "2f7661ae3c7a42dd9f2eb1946262cd24")).ToList().ForEach(i => i.Remove());
// replace
aJson.SelectToken("response.docs").Replace(docs["docs"]);

Related

JObject Not Parsing Values I need

String Contents =
{
"links":[
{
".tag":"file",
"url":"myURL",
"id":"CCCCC",
"name":"CCCC",
"path_lower":"CCCC"
},
{
"url".. and so on.
}
JObject json = JObject.Parse(contents);
Console.WriteLine(json.GetValue("links.url"));
I am trying to get all the URL values and store them into an array. The problem is that this code does not parse anything.
The main json is links and the rest is under it. How can I go about getting all the URL values?
1. Take json["links"] as a JArray.
2. Use LINQ to retrieve url from each element in (1) and cast it to string.
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;
JObject json = JObject.Parse(contents);
JArray array = json["links"] as JArray;
List<string> links = array.Select(x => (string)x["url"]).ToList();
Sample demo on .NET Fiddle
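As an alternative sketch, Json.NET's `SelectTokens` accepts a JSONPath expression, which may be closer to what the `"links.url"` attempt in the question was reaching for. The `LinkReader` class and `GetUrls` method are hypothetical names; the sketch assumes the `links` array shape shown in the question.

```csharp
using System.Linq;
using Newtonsoft.Json.Linq;

// Hypothetical helper, for illustration only.
public static class LinkReader
{
    // Returns the "url" value of every element in the top-level "links" array.
    public static string[] GetUrls(string contents)
    {
        JObject json = JObject.Parse(contents);
        return json.SelectTokens("links[*].url")
                   .Select(token => (string)token)
                   .ToArray();
    }
}
```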

Convert Nested JSON to CSV in C# via ChoETL

Does anyone know how to convert the nested JSON below to CSV via ChoETL (an ETL framework for .NET)? Thank you!
I'm using this code but it will only return the first equipment record.
CODE:
{
using (var json = new ChoJSONReader("./test.json"))
{
csv.Write(json.Cast<dynamic>().Select(i => new
{
EquipmentId = i.GpsLocation.Equipment[0].EquipmentId,
InquiryValue = i.GpsLocation.Equipment[0].InquiryValue,
Timestamp = i.GpsLocation.Equipment[0].Timestamp
}));
}
}
JSON:
{
"GpsLocation": {
"Equipment": [
{
"EquipmentId": "EQ00001",
"InquiryValue": [
"IV00001"
],
"Timestamp": "2020-01-01 01:01:01.01",
},
{
"EquipmentId": "EQ00002",
"InquiryValue": [
"IV00002"
],
"Timestamp": "2020-01-01 01:01:01.01"
}
]
}
}
As others suggest, the issue is you are only looking at the first element of the array.
It appears that the easiest way to control what you serialise into CSV is by correctly defining your source objects from JSON. JSON Path expressions come in pretty handy.
What I ended up doing here is query all JSON to return an array of Equipment objects regardless of where they are in the hierarchy (which means you may need to filter it a bit better depending on your full JSON).
Then it's pretty easy to define each field based on JSON path and just pass the result to CSVWriter.
Also check out some gotchas that I outlined in the respective comment lines.
void Main()
{
var jsonString = "{\"GpsLocation\":{\"Equipment\":[{\"EquipmentId\":\"EQ00001\",\"InquiryValue\":[\"IV00001\"],\"Timestamp\":\"2020-01-01 01:01:01.01\"},{\"EquipmentId\":\"EQ00002\",\"InquiryValue\":[\"IV00002\"],\"Timestamp\":\"2020-01-01 01:01:01.01\"}]}}";
var jsonReader = new StringReader(jsonString);
var csvWriter = new StringWriter(); // outputs to string, comment out if you want file output
//var csvWriter = new StreamWriter(".\\your_output.csv"); // writes to a file of your choice
using (var csv = new ChoCSVWriter(csvWriter))
using (var json = new ChoJSONReader(jsonReader)
.WithJSONPath("$..Equipment[*]", true) // firstly you scope the reader to all Equipment objects. take note of the second parameter. Apparently you need to pass true here as otherwise it just won't return anything
.WithField("EquipmentId", jsonPath: "$.EquipmentId", isArray: false) // then you scope each field in the array to what you want it to be. Since you want scalar values, pass `isArray: false` for better predictability
.WithField("InquiryValue", jsonPath: "$.InquiryValue[0]", isArray: false) // since your InquiryValue is actually an array, you want to obtain first element here. if you don't do this, fields names and values would go askew
.WithField("Timestamp", jsonPath: "$.Timestamp", fieldType: typeof(DateTime), isArray: false)) // you can also supply field type, otherwise it seems to default to `string`
{
csv.WithFirstLineHeader().Write(json);
}
Console.WriteLine(csvWriter.GetStringBuilder().ToString()); // comment this out if writing to file - you won't need it
}
Update summary:
Pivoted to update the code to rely on JSON Path scoping - this seems to allow for field name manipulation with pretty low effort
Looking at your comment, you could probably simplify your file writer a little bit - use StreamWriter instead of StringWriter - see updated code for example
Here is the working sample of producing CSV from your JSON
string json = @"{
""GpsLocation"": {
""Equipment"": [
{
""EquipmentId"": ""EQ00001"",
""InquiryValue"": [
""IV00001""
],
""Timestamp"": ""2020-02-01 01:01:01.01"",
},
{
""EquipmentId"": ""EQ00002"",
""InquiryValue"": [
""IV00002""
],
""Timestamp"": ""2020-01-01 01:01:01.01""
}
]
}
}";
StringBuilder csv = new StringBuilder();
using (var r = ChoJSONReader.LoadText(json)
.WithJSONPath("$.GpsLocation.Equipment")
.WithField("EquipmentId")
.WithField("InquiryValue", jsonPath: "InquiryValue[0]", fieldType: typeof(string))
.WithField("Timestamp", fieldType: typeof(DateTime))
)
{
using (var w = new ChoCSVWriter(csv)
.WithFirstLineHeader())
w.Write(r);
}
Console.WriteLine(csv.ToString());
Output:
EquipmentId,InquiryValue,Timestamp
EQ00001,IV00001,2/1/2020 1:01:01 AM
EQ00002,IV00002,1/1/2020 1:01:01 AM
Sample fiddle: https://dotnetfiddle.net/hJWtqH
Your code is sound, but the issue is that you're only writing the first element of the array by using i.GpsLocation.Equipment[0]. Instead, try looping over every element with a for loop, changing the [0] to your loop variable.
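To illustrate just the looping idea, here is a rough sketch that swaps ChoETL for plain Json.NET and a `StringBuilder`, so it is not the ChoETL way of doing it. The `EquipmentCsv` class is a hypothetical name, the JSON shape is taken from the question, and no CSV escaping is attempted, which a real export would need.

```csharp
using System.Text;
using Newtonsoft.Json.Linq;

// Hypothetical helper, for illustration only (no quoting or escaping of values).
public static class EquipmentCsv
{
    public static string ToCsv(string json)
    {
        var equipment = (JArray)JObject.Parse(json)["GpsLocation"]["Equipment"];
        var csv = new StringBuilder();
        csv.AppendLine("EquipmentId,InquiryValue,Timestamp");

        // Visit every Equipment entry instead of only index 0.
        foreach (JToken item in equipment)
        {
            csv.Append((string)item["EquipmentId"]).Append(',')
               .Append((string)item["InquiryValue"][0]).Append(',') // first inquiry value only
               .AppendLine((string)item["Timestamp"]);
        }

        return csv.ToString();
    }
}
```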

Use LINQ in C# to find MongoDB records when values in a list field match a criteria value from a list

I want to use LINQ to return all records in a MongoDB collection where the field in the record is a list of strings and any string in the list matches any string value in a list of strings used as the search criteria:
Mongo Record in Collection ("Item"):
{
"_id": ...,
"StringList": [
"string1",
"string2",
"string3"
],
...
}
Search Criteria:
var criteria = new List<string> { "string2", "string4" };
My Code:
var foundItems = iMongoDataProvider.FindAll<Item>()
.Where(x =>x.StringList.ContainsAny(criteria)).ToList();
Based on the above, the Mongo record should be returned since one of the StringList values matches one of the values in the search criteria. Nothing is returned even though I can manually peruse the collection and find the matching record. What am I doing wrong? Can someone provide an example that will do what I need? Thanks!
Have you tried something like:
using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;
using MongoDB.Driver.Linq;
using System.Linq;
using System.Linq.Expressions;
var foundItems = _collection.FindAll(x => criteria.Any(cc => x.StringList.Contains(cc))).ToList();
Where _collection is IMongoCollection<TEntity> _collection
What you are looking for is ElemMatch Filter (https://docs.mongodb.com/v3.2/reference/operator/query/elemMatch/) :
var foundItems = collection.Find(Builders<Item>.Filter.ElemMatch(
x => x.StringList,
s=>criteria.Contains(s)));
where collection is your IMongoCollection<Item>
I see that you are using FindAll, which means your MongoDB driver is version 1.x (see here for more about it: FindAll in MongoDB .NET Driver 2.0)
I would suggest updating your driver, because this version is out of date. Or is there an important reason not to?
This filter runs the query on the server. Of course, you could also get your data as an IEnumerable and filter it locally:
var foundItems = collection.Find(x=>true)
.ToEnumerable()
.Where(x => x.StringList.Intersect(criteria).Any());
If your data set is not huge and you are fine with filtering on the client, that's a good way too.
And if you are already calling FindAll, which means you fetch all the data anyway, you could query it with Intersect:
var foundItems = iMongoDataProvider.FindAll<Item>()
.Where(x => x.StringList.Intersect(criteria).Any());
What you want is to know if the intersection of the two lists has any values:
.Where(x =>x.StringList.Intersect(criteria).Any())
I'm not sure what the problem is with your code but here is working code
void Main()
{
List<string> []StringList = new List<string>[] {
new List<string> { "string1", "string2", "string3" },
new List<string> { "string11", "string12", "string13" },
new List<string> { "string21", "string22", "string4" }
};
var criteria = new List<string> { "string2", "string4" };
var foundItems = StringList
.Where(x => x.Intersect(criteria).Any()).ToList();
foundItems.Dump();
}
I tested this using LinqPad (which I recommend to anyone working in Linq and it is free).
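For the server-side variant discussed above, the 2.x driver's filter builder also offers `AnyIn`, which matches documents where an array field contains at least one of the given values. This is a sketch only: the `Item` class shape and the `ItemQueries` helper are assumptions based on the question, and it presumes the MongoDB.Driver 2.x package.

```csharp
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

// Assumed document shape, based on the record shown in the question.
public class Item
{
    public ObjectId Id { get; set; }
    public List<string> StringList { get; set; }
}

// Hypothetical helper, for illustration only.
public static class ItemQueries
{
    // Builds a filter matching items whose StringList shares at least one value with criteria.
    public static FilterDefinition<Item> BuildFilter(IEnumerable<string> criteria)
    {
        return Builders<Item>.Filter.AnyIn(x => x.StringList, criteria);
    }

    // Runs the filter on the server and materializes the matches.
    public static List<Item> FindMatching(IMongoCollection<Item> collection, IEnumerable<string> criteria)
    {
        return collection.Find(BuildFilter(criteria)).ToList();
    }
}
```

With the criteria from the question, `FindMatching(collection, new List<string> { "string2", "string4" })` would be the call, where `collection` is your `IMongoCollection<Item>`.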

JSON to CSV and CSV to JSON in C#

I have been searching and searching for a way to convert a JSON file to CSV and vice versa using C#. I have searched Google and have not come up with anything. Everything I've tried so far from the answers on Stack Overflow just does not work for me. Does anyone know of any tooling or tutorials I could look at to accomplish this with the .NET Framework? Usually I post what I've tried, but I'm clearly far off here, so it is pointless.
Compromises and Problems
You can accomplish this with the .NET Framework but there's not a clear and obvious way to just do this straight-up because of hierarchies and collections. What I mean by that is that CSV data is very flat and unstructured whereas JSON data is very organized and iterative. Let's take a simple chunk of JSON data that could look like this:
{
"Data": [
{
"Name":"Mickey Mouse",
"Friends":[ "Pluto", "Minnie", "Donald" ]
},
{
"Name":"Pluto",
"Friends":[ "Mickey" ]
}
]
}
The most obvious CSV file for that could be:
Name,Friend
Mickey Mouse,Pluto
Mickey Mouse,Minnie
Mickey Mouse,Donald
Pluto,Mickey
That's the easier conversion but let's say you just have that CSV file. It's not so obvious what the JSON should look like. One could argue that the JSON should look like this:
{
"Data": [
{ "Name":"Mickey Mouse", "Friend":"Pluto" },
{ "Name":"Mickey Mouse", "Friend":"Minnie" },
{ "Name":"Mickey Mouse", "Friend":"Donald" },
{ "Name":"Pluto", "Friend":"Mickey" },
]
}
That resulting JSON file is very different than the input JSON file. My point is that this isn't a simple/obvious conversion so any off-the-shelf or copy/paste solution will be imperfect. Whatever your solution is, you're going to have to make compromises or intelligent decisions.
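To make the "easier" direction concrete, here is a minimal sketch of flattening the nested sample into one row per name/friend pair. It assumes Json.NET and the exact `Data`/`Name`/`Friends` shape from the sample above; the `FriendFlattener` class is a hypothetical name, and values are not escaped.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

// Hypothetical helper, for illustration only (no CSV escaping).
public static class FriendFlattener
{
    // Produces a header line plus one "Name,Friend" line per friend of each person.
    public static List<string> ToCsvLines(string json)
    {
        var lines = new List<string> { "Name,Friend" };

        foreach (JToken person in JObject.Parse(json)["Data"])
        {
            foreach (JToken friend in person["Friends"])
            {
                lines.Add((string)person["Name"] + "," + (string)friend);
            }
        }

        return lines;
    }
}
```

Note that this is exactly the lossy step described above: once the rows are written, nothing in the CSV says that the friends originally lived in a nested array.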
.NET Framework Options
Now that we've gotten that out of the way, .NET gives you some capabilities to accomplish this out of the box and there are some good Nuget-supplied options as well. If you want to utilize pure .NET capabilities, you could use a combination of these two SO Answers:
Not perfect but this answer has some great code to get you started in the logic to generate a CSV file
This question and the resulting answers have some good info about generating JSON using just the .NET Framework and without any third-party utilities.
You should be able to apply the concepts in those two links PLUS the compromises and intelligent decisions you need to make from my first "Compromises and Problems" section of this post to accomplish what you need.
Something I've Done Before
I've done something similar where I actually used some functionality in the Microsoft.VisualBasic.FileIO namespace (works great in a C# app) in addition to Web API's serialization functionality to accomplish a CSV->JSON conversion using Dynamic objects (using the dynamic keyword) as an intermediary. The code is provided below. It's not terribly robust and makes some significant compromises but it has worked well for me. If you want to try this, you'll have to create your own version that goes in reverse, but as I mentioned in my first section, that's really the easy part.
using System.Collections.Generic;
using System.Dynamic;
using System.IO;
using System.Linq;
using System.Web.Http;
// NOTE: This is not purely my code. This was put together
// with the help of other SO questions that I wish I had the
// links to so I could credit them. You probably will find
// some chunk(s) of this code elsewhere on SO.
namespace Application1.Controllers
{
public class Foo
{
public string Csv { get; set; }
}
public class JsonController : ApiController
{
[HttpPost]
[Route("~/Csv/ToJson")]
public dynamic[] ConvertCsv([FromBody] Foo input)
{
var data = CsvToDynamicData(input.Csv);
return data.ToArray();
}
internal static List<dynamic> CsvToDynamicData(string csv)
{
var headers = new List<string>();
var dataRows = new List<dynamic>();
using (TextReader reader = new StringReader(csv))
{
using (var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader))
{
parser.Delimiters = new[] {","};
parser.HasFieldsEnclosedInQuotes = true;
parser.TrimWhiteSpace = true;
var rowIdx = 0;
while (!parser.EndOfData)
{
var colIdx = 0;
dynamic rowData = new ExpandoObject();
var rowDataAsDictionary = (IDictionary<string, object>) rowData;
foreach (var field in parser.ReadFields().AsEnumerable())
{
if (rowIdx == 0)
{
// header
headers.Add(field.Replace("\\", "_").Replace("/", "_").Replace(",", "_"));
}
else
{
if (field == "null" || field == "NULL")
{
rowDataAsDictionary.Add(headers[colIdx], null);
}
else
{
rowDataAsDictionary.Add(headers[colIdx], field);
}
}
colIdx++;
}
if (rowDataAsDictionary.Keys.Any())
{
dataRows.Add(rowData);
}
rowIdx++;
}
}
}
return dataRows;
}
}
}
If you want something more robust, then you can always leverage these great projects:
JSON.NET (This works VERY WELL with creating JSON from dynamic objects. Given that you're not using Web API, this would be the first place I would look to take the dynamic[] return value and convert it to JSON.)
CsvHelper
Besides combining multiple libraries to convert JSON to CSV and vice versa, Cinchoo ETL gives you a unified interface for converting between those two formats.
For a sample JSON file:
[
{
"Name" : "Xytrex Co.",
"Description" : "Industrial Cleaning Supply Company",
"AccountNumber" : "ABC15797531"
},
{
"Name" : "Watson and Powell, Inc.",
"Description" : "Law firm. New York Headquarters",
"AccountNumber" : "ABC24689753"
}
]
To produce CSV file:
Name,Description,AccountNumber
Xytrex Co.,Industrial Cleaning Supply Company,ABC15797531
Watson and Powell Inc.,Law firm. New York Headquarters,ABC24689753
JSON to CSV:
using (var p = ChoJSONReader.LoadText(json))
{
using (var w = new ChoCSVWriter(Console.Out)
.WithFirstLineHeader()
)
{
w.Write(p);
}
}
Sample fiddle: https://dotnetfiddle.net/T3u4W2
CSV to JSON:
using (var p = ChoCSVReader.LoadText(csv)
.WithFirstLineHeader()
)
{
using (var w = new ChoJSONWriter(Console.Out))
{
w.Write(p);
}
}
Sample fiddle: https://dotnetfiddle.net/gVlJVX
As Jaxidian mentioned, the problem is that JSON can have a hierarchy and CSV cannot.
So, there are two solutions I could suggest:
Create a hierarchical CSV, which shouldn't be much effort:
"Id";"Name";"Age";"Type"
"FriendId"
1;"Mickey Mouse";20;"mouse"
2
3
4
2;"Pluto";7;"dog"
1
3;"Minnie";20;"mouse"
4;"Donald";22;"duck"
Create multiple files, which could be more effort but is cleaner and more flexible, e.g. when you export from or import into a database. Maybe this link could help you: http://www.snellman.net/blog/archive/2016-01-12-json-to-multicsv
all.csv (store all characters)
"Id";"Name";"Age";"Type"
1;"Mickey Mouse";20;"mouse"
2;"Pluto";7;"dog"
3;"Minnie";20;"mouse"
4;"Donald";22;"duck"
friends.csv (store all relations)
"FriendKey1";"FriendKey2"
1;2
2;1
1;3
1;4

how to return first element of a MongoCursor?

I have a cursor which contains at least one element :
MongoCursor cursor = oColl.FindAs<CMongoCon>(Query.EQ("isAc", "1"));
I would like to return only the first element. Right now I do it this way
foreach (CMongoCon job in cursor)
{
return job;
}
Is there a simpler way, since I know it's the first element?
Does this work?
using System.Linq;
...
var whatYouAreAfter = cursor.FirstOrDefault();
cursor.First() should also work. Just depends what you need.
Inspired heavily from here: https://stackoverflow.com/a/19492292/2524589
Why don't you simply use C# fluent API for querying data from mongo? This example works fine on my setup.
public Doc GetFirstExistingDocument()
{
var client = new MongoClient();
var database = client.GetDatabase("test");
return database.GetCollection<Doc>("docs")
.Find(doc => !doc.Deleted)
.Sort(Builders<Doc>.Sort.Ascending(doc => doc.Date))
.First();
}
