I'm scraping the web, so the JSON I'm analyzing is completely different on each page. In a nutshell, though, I'm looking for the author's name.
I'm currently using:
dynamic results = JsonConvert.DeserializeObject(value);
string Author = results.Author;
The problem is that no two pages are the same.
Here are two examples of schema from different web pages, which I need to deserialize in order to find the author's name.
Example 1:
{
"@context": "https://schema.org",
"@type": "BookSeries",
"author": {
"@type": "Person",
"givenName": "Douglas",
"familyName": "Adams",
"additionalName": "Noel",
"birthDate": "1952-03-11",
"birthPlace": {
"@type": "Place",
"address": "Cambridge, Cambridgeshire, England"
}
}
}
Example 2:
{
"@context": "https://schema.org",
"@type": "WebPage",
"name": "Lecture 12: Graphs, networks, incidence matrices",
"author": "James Beckett",
"description": "These video lectures of Professor Gilbert Strang teaching 18.06 were recorded in Fall 1999 and do not correspond precisely to the current edition of the textbook.",
"publisher": {
"@type": "CollegeOrUniversity",
"name": "MIT OpenCourseWare"
},
"license": "http://creativecommons.org/licenses/by-nc-sa/3.0/us/deed.en_US"
}
Is there a way to be truly dynamic and find such values in a JSON string no matter how they're formatted? With static JSON it's very simple, but here I have no clue, because the JSON can't be mapped to C# classes when its shape is always different.
Any help would be appreciated!
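One possible approach, sketched here under some assumptions: Json.NET (the Newtonsoft.Json package already used above) can walk the parsed tree, so the code can look for an "author" property at any depth and then branch on whether its value is a string or an object. FindAuthor is a hypothetical helper name, and the fallback order (name, then givenName plus familyName) is my own guess at what the pages contain, not something schema.org guarantees.

```csharp
using System;
using System.Linq;
using Newtonsoft.Json.Linq;

// Hypothetical helper (not part of Json.NET): returns a display name for the
// first "author" property found at any depth, or null when there is none.
static string? FindAuthor(string json)
{
    JProperty? author = JObject.Parse(json)
        .Descendants()
        .OfType<JProperty>()
        .FirstOrDefault(p => p.Name == "author");
    if (author == null) return null;

    JToken value = author.Value;

    // Shape 1: "author": "James Beckett"
    if (value.Type == JTokenType.String) return (string?)value;

    // Shape 2: "author": { "name": ... } or { "givenName": ..., "familyName": ... }
    // (assumes these fields hold plain strings when present)
    if (value is JObject person)
    {
        string? name = (string?)person["name"];
        if (!string.IsNullOrEmpty(name)) return name;

        string?[] parts = { (string?)person["givenName"], (string?)person["familyName"] };
        string joined = string.Join(" ", parts.Where(p => !string.IsNullOrEmpty(p)));
        return joined.Length > 0 ? joined : null;
    }

    return null; // arrays of authors and other shapes are not handled in this sketch
}

string page1 = @"{ ""@type"": ""BookSeries"", ""author"": { ""@type"": ""Person"", ""givenName"": ""Douglas"", ""familyName"": ""Adams"" } }";
string page2 = @"{ ""@type"": ""WebPage"", ""author"": ""James Beckett"" }";

Console.WriteLine(FindAuthor(page1)); // Douglas Adams
Console.WriteLine(FindAuthor(page2)); // James Beckett
```

The point of the sketch is that no C# classes are needed: the code inspects the token type at run time instead of relying on a fixed shape.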
I'm using the following from Newtonsoft to deserialize some JSON data into a DataTable (for the ultimate purpose of saving out to a spreadsheet, if it matters):
var dt = (DataTable)JsonConvert.DeserializeObject(returnData, (typeof(DataTable)));
While this works well enough, it has the problem that nested rows are lost. Below is example data of a similar format. In the ratings section only "Internet Movie Database" is saved, "Rotten Tomatoes" & "Metacritic" are lost in the conversion. Is there a deserialize method that would retain these? I'm willing to consider options that would split the results onto multiple rows OR concatenate the ratings section into a single field.
{
"Title": "Guardians of the Galaxy Vol. 2",
"Year": "2017",
"Rated": "PG-13",
"Released": "05 May 2017",
"Runtime": "136 min",
"Genre": "Action, Adventure, Comedy, Sci-Fi",
"Director": "James Gunn",
"Writer": "James Gunn, Dan Abnett (based on the Marvel comics by), Andy Lanning (based on the Marvel comics by), Steve Englehart (Star-Lord created by), Steve Gan (Star-Lord created by), Jim Starlin (Gamora and Drax created by), Stan Lee (Groot created by), Larry Lieber (Groot created by), Jack Kirby (Groot created by), Bill Mantlo (Rocket Raccoon created by), Keith Giffen (Rocket Raccoon created by), Steve Gerber (Howard the Duck created by), Val Mayerik (Howard the Duck created by)",
"Actors": "Chris Pratt, Zoe Saldana, Dave Bautista, Vin Diesel",
"Plot": "The Guardians struggle to keep together as a team while dealing with their personal family issues, notably Star-Lord's encounter with his father the ambitious celestial being Ego.",
"Language": "English",
"Country": "USA",
"Awards": "Nominated for 1 Oscar. Another 12 wins & 42 nominations.",
"Poster": "https://m.media-amazon.com/images/M/MV5BMTg2MzI1MTg3OF5BMl5BanBnXkFtZTgwNTU3NDA2MTI@._V1_SX300.jpg",
"Ratings": [{
"Source": "Internet Movie Database",
"Value": "7.7/10"
}, {
"Source": "Rotten Tomatoes",
"Value": "84%"
}, {
"Source": "Metacritic",
"Value": "67/100"
}
],
"Metascore": "67",
"imdbRating": "7.7",
"imdbVotes": "482,251",
"imdbID": "tt3896198",
"Type": "movie",
"DVD": "22 Aug 2017",
"BoxOffice": "$389,804,217",
"Production": "Walt Disney Pictures",
"Website": "https://marvel.com/guardians",
"Response": "True"
}
UPDATE
Thanks for the solutions, I'm going to try these when I get home. In the meantime, perhaps to be clearer (or maybe even more complicated), I'd settle for concatenating the Ratings section to a single delimited string/field. What would be ideal is something like below.
The DataTable type to which you're de-serializing is unable to handle the one-to-many relationship between the movie and its ratings.
Try de-serializing to a more specific type that better suits your JSON objects.
You can use json2csharp.com to make a C# class out of a JSON object.
Once you have your C# type, you can de-serialize to that and get the C# equivalent of your objects.
var obj = JsonConvert.DeserializeObject<RootObject>(returnData);
or if your JSON data is an array of these objects:
var list = JsonConvert.DeserializeObject<RootObject[]>(returnData);
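As a rough illustration of what such a type could look like, here is a minimal sketch. The class names RootObject and Rating follow the naming that class generators typically produce, but they are assumptions here, and only two of the JSON properties are declared (Json.NET skips undeclared ones by default).

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Trimmed copy of the movie JSON from the question.
string returnData = @"{
  ""Title"": ""Guardians of the Galaxy Vol. 2"",
  ""Ratings"": [
    { ""Source"": ""Internet Movie Database"", ""Value"": ""7.7/10"" },
    { ""Source"": ""Rotten Tomatoes"", ""Value"": ""84%"" },
    { ""Source"": ""Metacritic"", ""Value"": ""67/100"" }
  ]
}";

RootObject movie = JsonConvert.DeserializeObject<RootObject>(returnData)!;

// All three ratings survive because Ratings is a real collection on the type.
foreach (Rating rating in movie.Ratings)
{
    Console.WriteLine($"{rating.Source}: {rating.Value}");
}

// Assumed shape: one class per JSON object, a list for the one-to-many part.
public class Rating
{
    public string Source { get; set; } = "";
    public string Value { get; set; } = "";
}

public class RootObject
{
    public string Title { get; set; } = "";
    public List<Rating> Ratings { get; set; } = new List<Rating>();
}
```

From here the ratings can be written out as extra rows or joined into one field, whichever suits the spreadsheet.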
This works if you don't want to declare a class:
var dict = JsonConvert.DeserializeObject<Dictionary<string, object>>(json);
string rating = Convert.ToString(dict["Ratings"]);
var dtScore = JsonConvert.DeserializeObject<DataTable>(rating);
string MetacriticScore = dtScore.Rows[2]["Value"].ToString();
And there is another simple way
var jsonObj = JsonConvert.DeserializeObject<JObject>(json);
string MetacriticScore = Convert.ToString(jsonObj["Ratings"][2]["Value"]);
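Building on the JObject idea, and since the question mentions settling for a single delimited field, here is a small sketch of that variant. The "; " delimiter and the "Source: Value" layout are arbitrary choices of mine, not something the question specifies.

```csharp
using System;
using System.Linq;
using Newtonsoft.Json.Linq;

// Trimmed copy of the movie JSON from the question.
string json = @"{
  ""Title"": ""Guardians of the Galaxy Vol. 2"",
  ""Ratings"": [
    { ""Source"": ""Internet Movie Database"", ""Value"": ""7.7/10"" },
    { ""Source"": ""Rotten Tomatoes"", ""Value"": ""84%"" },
    { ""Source"": ""Metacritic"", ""Value"": ""67/100"" }
  ]
}";

JObject movie = JObject.Parse(json);

// Treat a missing Ratings property as an empty list.
JArray ratings = (JArray?)movie["Ratings"] ?? new JArray();

// Collapse the nested array into one "Source: Value" string per rating.
string joined = string.Join("; ", ratings.Select(r => $"{r["Source"]}: {r["Value"]}"));

// Replace the array with the flat string so every property is a scalar.
movie["Ratings"] = joined;

Console.WriteLine(joined);
// Internet Movie Database: 7.7/10; Rotten Tomatoes: 84%; Metacritic: 67/100
```

After this step the object contains only scalar values, so it should be easier to flatten into a single row.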
I have documents like this in my CosmosDB database:
{
"id": "12345",
"filename": "foo.txt",
"versions": {
"1": {
"storageAccount": "blob123",
"size": 33
},
"2": {
"storageAccount": "blob123",
"size": 42
}
}
}
(this is a simplified sample)
I need to query on the "storageAccount" property, to check if there are files stored on a given storage account. But I can't find a way to express "for each version".
I tried this, but of course it doesn't work
select top 1 *
from c
join v in c.versions
where v.storageAccount = 'blob123'
Apparently JOIN only works on arrays, not dictionaries. Is there a way to query items in a dictionary?
As a workaround, I can use a UDF, but the performance and cost are terrible (1200 RUs for just 2000 documents when there is no matching document...)
EDIT: updated to more closely reflect actual use case
Unfortunately, this isn't possible today. You cannot iterate over object keys in Cosmos's SQL.
I'd recommend changing the schema to something like:
{
"id": "12345",
"filename": "foo.txt",
"versions": [
{
"id": "1",
"storageAccount": "blob123",
"size": 33
},
{
"id": "2",
"storageAccount": "blob123",
"size": 42
}
]
}
Additionally, you could evaluate a User Defined Function which would return the keys of an object for you, but that will increase your RU costs, though possibly less than sprocs.
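If the documents are written from C#, the schema change could be applied with something like the sketch below before saving. It uses Json.NET's LINQ to JSON types rather than any Cosmos SDK call, and it copies each dictionary key into an "id" field. The query string at the end is only my suggestion for the array shape (with VALUE c instead of *, since I would not expect SELECT * to be accepted once a JOIN adds a second alias); please verify it against your own container.

```csharp
using System;
using System.Linq;
using Newtonsoft.Json.Linq;

// Document in the original dictionary-shaped schema from the question.
string json = @"{
  ""id"": ""12345"",
  ""filename"": ""foo.txt"",
  ""versions"": {
    ""1"": { ""storageAccount"": ""blob123"", ""size"": 33 },
    ""2"": { ""storageAccount"": ""blob123"", ""size"": 42 }
  }
}";

JObject doc = JObject.Parse(json);
JObject versions = (JObject)doc["versions"]!;

// Turn each dictionary entry into an array element that carries its key as "id".
doc["versions"] = new JArray(versions.Properties().Select(p =>
{
    JObject version = (JObject)p.Value.DeepClone();
    version.AddFirst(new JProperty("id", p.Name));
    return version;
}));

Console.WriteLine(doc.ToString());

// Assumed query for the array-shaped schema; not tested against a live account.
const string query =
    "SELECT TOP 1 VALUE c FROM c JOIN v IN c.versions WHERE v.storageAccount = 'blob123'";
Console.WriteLine(query);
```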
I have a JSON document coming from a vendor that looks like this:
{
"content": [{
"name": "Windows 8.1 x64",
"id": "Windows81x64",
"components": {
"Windows81x64": {
"propertyGroups": ["VirtualWindows81x64"],
"dependsOn": [],
"data": {
"provisioning_workflow": {
"fixed": {
"id": "WIMImageWorkflow",
"label": "WIMImageWorkflow"
}
},
"memory": {
"default": 2048,
"min": 2048,
"max": 16384
}
}
}
}
}]
}
Most of this document is fairly easy to deserialize into an object using the typical DataContractSerializer. However, there are a couple of keys/values for which I am not sure what the "best practice" might be.
If you look at the "components" key the first key after that one is titled "Windows81x64". This key can change from document to document and it can be any value. It almost should be a 'Name' property of the collection but I can't control that. Furthermore, inside the 'Windows81x64' key there is another property called 'data'. According to the vendor the value of data is 'anonymous.' So, basically it can be anything.
Any ideas on the best way to deserialize this into a custom object when it comes to those parts of the document? Thank you.
You can deserialize the dynamic parts as Dictionary<string, object>.
Or, if you know the value's type, you can use Dictionary<string, ValueType>, where the key of the dictionary would be the name (in your case Windows81x64).
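To make that concrete, here is a minimal sketch of the dictionary idea. It assumes Json.NET rather than DataContractSerializer, because Json.NET maps a JSON object with arbitrary keys onto a Dictionary directly and can keep the "anonymous" data section as a raw JToken. The class names Document, ContentItem and Component are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Trimmed copy of the vendor document from the question.
string json = @"{
  ""content"": [{
    ""name"": ""Windows 8.1 x64"",
    ""id"": ""Windows81x64"",
    ""components"": {
      ""Windows81x64"": {
        ""propertyGroups"": [""VirtualWindows81x64""],
        ""dependsOn"": [],
        ""data"": { ""memory"": { ""default"": 2048, ""min"": 2048, ""max"": 16384 } }
      }
    }
  }]
}";

Document doc = JsonConvert.DeserializeObject<Document>(json)!;

foreach (KeyValuePair<string, Component> entry in doc.Content[0].Components)
{
    // entry.Key is the variable key, e.g. "Windows81x64".
    JToken? maxMemory = entry.Value.Data?.SelectToken("memory.max");
    Console.WriteLine($"{entry.Key}: max memory = {maxMemory}");
}

// Hypothetical types for illustration; only the needed properties are declared.
public class Document
{
    public List<ContentItem> Content { get; set; } = new List<ContentItem>();
}

public class ContentItem
{
    public string Name { get; set; } = "";
    public string Id { get; set; } = "";
    // The changing key becomes the dictionary key.
    public Dictionary<string, Component> Components { get; set; } = new Dictionary<string, Component>();
}

public class Component
{
    public List<string> PropertyGroups { get; set; } = new List<string>();
    public List<string> DependsOn { get; set; } = new List<string>();
    // "data" can be anything, so it is kept as an untyped token.
    public JToken? Data { get; set; }
}
```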
Relative newbie here with a little question. I have been extracting a JSON string that looks like this (in this case it is a modified return from Facebook OAuth2):
{"id":"555555555555555","name":"Monkey
Man","last_name":"Man","first_name":"Monkey","email":"test\u0040someaccount.com","location":{"id":"555555555555555","name":"Jungle,
North
Carolina"},"gender":"male","work":[{"employer":{"id":"555555555555555","name":"Big
Boss makes me work"}:"projects":{"current":"doing stuff",
"previous":"other
stuff"},"location":{"id":"555555555555555","name":"Jungle, North
Carolina"},"position":{"id":"555555555555555","name":"IT
monkey"},"start_date":"2010-09"}],"picture":"http://profile.ak.fbcdn.net/static-ak/rsrc.php/v1/yo/r/5555555-555.gif"}
I am able to extract everything into a dictionary using the following code:
JavaScriptSerializer ser = new JavaScriptSerializer();
Dictionary<string, object> dict = ser.Deserialize<Dictionary<string,object>>(json);
I then extract the data from the dictionary as follows and store it in an object called contact, which is pretty much just a collection of strings.
if (d.ContainsKey("email"))
{
c.email = d["email"].ToString();
}
else
c.email = "";
I did it this way because I was not guaranteed that all the information fields would be there.
If the value is an object, such as with the address, I use modified code (thanks to the guy who showed me how to do that) like the following.
c.location = (d["location"] as Dictionary<string, object>)["name"].ToString();
Now comes the difficult part that I am stuck on.
I am trying to extract the employer name "Big Boss makes me work" from the following part of the string...
"work":[{"employer":{"id":"555555555555555","name":"Big Boss makes me
work"}:"projects":{"current":"doing stuff", "previous":"other
stuff"},"location":{"id":"555555555555555","name":"Jungle, North
Carolina"},"position":{"id":"555555555555555","name":"IT
monkey"},"start_date":"2010-09"}]
It is storing the data down within an array inside of other objects and I have no idea how to get to the information to extract it, or even how to extract information like this from live oauth2...
"addresses": { "personal": { "street": null, "street_2": null, "city":
"Jungle", "state": "NC", "postal_code": "28677", "region": "United
States" }, "business": { "street": "Tree Street", "street_2": null,
"city": "Jungle", "state": "NC", "postal_code": "28677", "region":
"United States" } }
As you can see this goes three levels deep so my (d["location"] as Dictionary)["name"].ToString(); is pretty useless here. How would you go about getting say the street name from this?
I hope my questions aren't too vague or random. I just need some advice on properly extracting data from the dictionary objects. The ways I come up with involve editing the JSON string, and that causes all sorts of problems, as I just don't understand the dictionary object well enough to figure this out on my own.
Thanks
Running your JSON through jsonlint.com (and correcting it slightly), it looks like this formatted:
{
"id": "555555555555555",
"name": "Monkey Man",
"last_name": "Man",
"first_name": "Monkey",
"email": "test@someaccount.com",
"location": {
"id": "555555555555555",
"name": "Jungle, North Carolina"
},
"gender": "male",
"work": [
{
"employer": {
"id": "555555555555555",
"name": "Big Boss makes me work"
},
"projects": {
"current": "doing stuff",
"previous": "other stuff"
},
"location": {
"id": "555555555555555",
"name": "Jungle, North Carolina"
},
"position": {
"id": "555555555555555",
"name": "IT monkey"
},
"start_date": "2010-09"
}
],
"picture": "http://profile.ak.fbcdn.net/static-ak/rsrc.php/v1/yo/r/5555555-555.gif"
}
Your JSON data in this case just isn't really suitable to be serialized to a straightforward Dictionary object, so that's not really the way to go here.
The easier way is to create a C# class whose properties match those of the JavaScript object you're deserializing. Then deserialize the JSON as that class, and the "Big Boss makes me work" value should be available at objectFromJson.work[0].employer.name.
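A rough sketch of that class-based approach follows. It assumes Json.NET instead of JavaScriptSerializer, and the class names Profile, Job and NamedItem are invented for the example. Json.NET matches property names case-insensitively by default, so the PascalCase properties below still bind to the lowercase JSON keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;

// Trimmed copy of the corrected profile JSON.
string json = @"{
  ""name"": ""Monkey Man"",
  ""location"": { ""id"": ""555555555555555"", ""name"": ""Jungle, North Carolina"" },
  ""work"": [{
    ""employer"": { ""id"": ""555555555555555"", ""name"": ""Big Boss makes me work"" },
    ""start_date"": ""2010-09""
  }]
}";

Profile profile = JsonConvert.DeserializeObject<Profile>(json)!;

// Null-safe navigation replaces the ContainsKey checks from the question.
string employerName = profile.Work.FirstOrDefault()?.Employer?.Name ?? "";
string locationName = profile.Location?.Name ?? "";

Console.WriteLine(employerName); // Big Boss makes me work
Console.WriteLine(locationName); // Jungle, North Carolina

// Hypothetical types for illustration; missing JSON fields simply stay null or empty.
public class NamedItem
{
    public string Id { get; set; } = "";
    public string Name { get; set; } = "";
}

public class Job
{
    public NamedItem? Employer { get; set; }
    public NamedItem? Position { get; set; }
}

public class Profile
{
    public string Name { get; set; } = "";
    public NamedItem? Location { get; set; }
    public List<Job> Work { get; set; } = new List<Job>();
}
```

Deeper structures such as the nested addresses block would follow the same pattern: one more class per level instead of chained dictionary casts.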
I'm having a few problems trying to consume my JSON Data from a web URL and put it into my Class Array.
My class looks something like this;
public class User
{
String Name;
String Serial;
String Email;
}
Where my JSON data looks like
{ "name": "cname", "value": [ "Joe Bloggs"] },
{ "name": "serialnumber", "value": [ "231212312" ] },
{ "name": "gender", "value": [ "male" ] },
{ "name": "email", "value": [ "jbloggs@domain.com" ] },
I want to pop this into a User class array so that it would be something like
User[] myUsers = new User[100];
I have the data downloaded using a StreamReader, but I'm lost as to where to start. I've tried out DataContractJsonSerializer and a few others but can't find any basic guides on the web where to start.
I should note that I want to only grab the values listed in the class and not extras such as Gender etc.
If someone could provide a basic sample of both the class and the program implementation to read the data that would be great.
Thanks,
CM888.
I highly recommend looking into this library:
Json.NET
It has many great features, but the best is that it's designed to mimic LINQ to XML. You can use it very similarly.
Using this library, you could parse your JSON into objects and work with them and LINQ queries to build up your user array.
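Something along these lines might work as a starting point. It is only a sketch: it assumes the name/value entries for one user arrive wrapped in a JSON array (the fragment in the question is not a complete document), and it gives User public properties so they can be set from outside the class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

// Assumed wrapping: the entries for a single user inside one JSON array.
string json = @"[
  { ""name"": ""cname"", ""value"": [ ""Joe Bloggs"" ] },
  { ""name"": ""serialnumber"", ""value"": [ ""231212312"" ] },
  { ""name"": ""gender"", ""value"": [ ""male"" ] },
  { ""name"": ""email"", ""value"": [ ""jbloggs@domain.com"" ] }
]";

// Index every entry by its "name", keeping the first element of "value".
Dictionary<string, string> fields = JArray.Parse(json)
    .ToDictionary(f => (string?)f["name"] ?? "", f => (string?)f["value"]?.First ?? "");

// Returns "" for fields that are absent.
string Get(string key) => fields.TryGetValue(key, out string? v) ? v : "";

// Only the wanted fields are copied; extras such as gender are ignored.
var user = new User
{
    Name = Get("cname"),
    Serial = Get("serialnumber"),
    Email = Get("email")
};

Console.WriteLine($"{user.Name}, {user.Serial}, {user.Email}");
// Joe Bloggs, 231212312, jbloggs@domain.com

public class User
{
    public string Name { get; set; } = "";
    public string Serial { get; set; } = "";
    public string Email { get; set; } = "";
}
```

For many users, the same mapping could be run once per user's array and the results collected into a List<User> or an array.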
To expand on my comment above: (Unrelated to question or answer)
What I meant was that I was curious why your JSON wasn't structured like this:
[
{"cname": "Joe Bloggs", "serialnumber": "231313213", "gender": "male", "email": "jbloggs@domain.com"},
{"cname": "Another Dude", "serialnumber": "345345345", "gender": "male", "email": "another@dude.com"}
]