I'm trying to scrape pages, find their schema.org script, then deserialize it.
I am able to find the script; however, JSON that is valid according to Google/schema.org is reported as invalid by most JSON validator tools.
For example, this is my code:
string Url = "https://www.independent.co.uk/news/health/nhs-pay-health-coronavirus-unions-b1812659.html";
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(Url);
var scripts = doc.DocumentNode.SelectNodes("//script");
foreach (HtmlNode node in scripts)
{
    string value = node.InnerText;
    if (value.Contains("schema.org"))
    {
        dynamic results = JsonConvert.DeserializeObject<dynamic>(value);
        var name = results.name;
    }
}
This finds the following schema (JSON):
{{
"#type": "Organization",
"#context": "https://schema.org",
"name": "The Independent",
"url": "https://www.independent.co.uk",
"logo": {
"#type": "ImageObject",
"url": "https://www.independent.co.uk/img/logo.png",
"width": 504,
"height": 60
},
"sameAs": [
"https://twitter.com/Independent",
"https://www.facebook.com/TheIndependentOnline"
]
}}
#1 The JSON is reported as invalid, even though every website that uses structured data formats it like this.
#2 When I try to get the name value, it returns null.
I assume my problems are because the JSON is invalid. How do I make this work? I'm out of ideas.
You need to get rid of the extra curly brackets at the start and end of the JSON to make it valid JSON.
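A minimal sketch of that fix, assuming the text really does start with `{{` and end with `}}` and that the Newtonsoft.Json package is referenced. `SchemaJson`, `StripOuterBraces`, and `ReadName` are hypothetical names used only for illustration. Indexing the `JObject` by key, rather than using `dynamic`, also works for property names such as `#type` that cannot be written as C# identifiers.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class SchemaJson
{
    // Hypothetical helper: removes one layer of braces when the text is wrapped in "{{ ... }}".
    public static string StripOuterBraces(string text)
    {
        string trimmed = text.Trim();
        if (trimmed.StartsWith("{{", StringComparison.Ordinal) &&
            trimmed.EndsWith("}}", StringComparison.Ordinal))
        {
            // Drop the first and last character, leaving a single "{ ... }" object.
            return trimmed.Substring(1, trimmed.Length - 2);
        }
        return trimmed;
    }

    // Parses the cleaned text and reads the top-level "name" property (null if it is absent).
    public static string ReadName(string text)
    {
        JObject schema = JObject.Parse(StripOuterBraces(text));
        return (string)schema["name"];
    }
}
```

In the loop from the question, `var name = SchemaJson.ReadName(value);` would replace the `dynamic` deserialization.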
I have a simple GraphQL query that I'm sending to a server, and I'm trying to use GraphQL.Client.Serializer.Newtonsoft to deserialize the response. I'm getting a response, but it does not make the NewtonsoftJsonSerializer happy. It complains:
This converter can only parse when the root element is a JSON Object.
Okay, fair enough. The response must not be what was expected. But how can I view the response to see what may be wrong? The GraphQLHttpClient.SendQueryAsync<>() method requires a type to try to deserialize to...which is the part that is failing. I don't see a way to just get the text response so I can see what is wrong with it.
I've included sample code at the bottom, but the only real line of interest (I believe) is the last one:
var graphQLResponse = await graphQLClient.SendQueryAsync<object>(theRequest);
Is there some way to just get the text of the response?
var graphQLClient = new GraphQLHttpClient("https://<TheDomain>.com/graphql/api/v1", new NewtonsoftJsonSerializer());
graphQLClient.HttpClient.DefaultRequestHeaders.Add("Authorization", "Bearer <MyKey>");
graphQLClient.HttpClient.DefaultRequestHeaders.Add("Accept", "application/json");
var theRequest = new GraphQLRequest
{
Query = "{ __schema { queryType { fields { name } } } }"
};
var graphQLResponse = await graphQLClient.SendQueryAsync<object>(theRequest);
Using Fiddler Everywhere, I was able to capture the request and response, both of which seem superficially valid. (I updated the query in the question to match the query I'm currently passing in: a general schema query.)
Request:
{
"query": "{ __schema { queryType { fields { name } } } }"
}
Response:
(I've redacted the names of the endpoints and removed much of the repetition from the middle of the response. But the rest is unchanged, and looks superficially okay to me...in particular, it looks like valid JSON, so I'm unclear why the converter is complaining.)
{
"errors": [],
"data": {
"__schema": {
"queryType": {
"fields": [
{
"name": "getData1"
},
{
"name": "getData2"
},
...
<a bunch more here>
...
{
"name": "getData100"
}
]
}
}
},
"extensions": null,
"dataPresent": true
}
After some trial and error and discussion with the OP, it turned out that when GraphQLHttpClient is initialized with GraphQL.Client.Serializer.Newtonsoft, the serializer fails to deserialize the response even though the response is a valid JSON string.
After switching to GraphQL.Client.Serializer.SystemTextJson, the response is parsed as expected, which suggests that there could be a bug in the GraphQL.Client.Serializer.Newtonsoft serializer.
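For the original question of how to see the raw response text, one option is to bypass the typed client and post the same query with a plain `HttpClient`. The following is only a sketch under assumptions: the endpoint accepts a standard GraphQL POST whose JSON body contains a `query` field (as in the captured request above), and `RawGraphQL`, `BuildBody`, and `PostQueryAsync` are hypothetical names. System.Text.Json is used here only to build the request body.

```csharp
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public static class RawGraphQL
{
    // Builds the JSON request body: {"query":"..."}.
    public static string BuildBody(string query)
    {
        return JsonSerializer.Serialize(new { query });
    }

    // Posts the query and returns the response body as plain text, without deserializing it.
    public static async Task<string> PostQueryAsync(HttpClient client, string endpoint, string query)
    {
        using (var content = new StringContent(BuildBody(query), Encoding.UTF8, "application/json"))
        using (HttpResponseMessage response = await client.PostAsync(endpoint, content))
        {
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Passing `graphQLClient.HttpClient` as the `client` argument would reuse the headers already configured in the question's code, for example `string text = await RawGraphQL.PostQueryAsync(graphQLClient.HttpClient, "https://<TheDomain>.com/graphql/api/v1", theRequest.Query);`.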
I want to be able to print the body of a Google Docs document in the console using C#.
I am able to print the title using this code:
DocumentsResource.GetRequest request = service.Documents.Get(documentId);
Document doc = request.Execute();
Console.WriteLine(doc.Title);
but I am unable to do the same thing with the body of the text using this code:
DocumentsResource.GetRequest request = service.Documents.Get(documentId);
Document doc = request.Execute();
Console.WriteLine(doc.Body);
The output is Google.Apis.Docs.v1.Data.Body.
What is the problem with the code, and what should I change?
Answer:
You need to extract the data out of the body object.
More Information:
As per the documentation on the Document Resource, the body is an object containing more than just a string of data:
{
"documentId": string,
"title": string,
"body": {
"content": [
{
"startIndex": integer,
"endIndex": integer,
"paragraph": {
object (Paragraph)
},
"sectionBreak": {
object (SectionBreak)
},
"table": {
object (Table)
},
"tableOfContents": {
object (TableOfContents)
}
}
]
},
// ...
}
The Document resource goes quite a few layers deep, so depending on what information you are trying to extract from the Body, you will have to reference this directly - something like doc.Body.Content[0].Paragraph.Elements[0].TextRun.Content - but this will highly depend on what your document contains.
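As a rough sketch of walking those layers, the helper below concatenates the text runs of every top-level paragraph. It assumes the Google.Apis.Docs.v1 package is referenced and that only plain paragraphs matter; tables, tables of contents, and section breaks are skipped. `DocText` and `ReadParagraphText` are hypothetical names.

```csharp
using System.Text;
using Google.Apis.Docs.v1.Data;

public static class DocText
{
    // Concatenates the text runs of every top-level paragraph in the document body.
    // Tables, tables of contents and section breaks are ignored in this sketch.
    public static string ReadParagraphText(Document doc)
    {
        var text = new StringBuilder();
        if (doc?.Body?.Content == null) return string.Empty;

        foreach (StructuralElement element in doc.Body.Content)
        {
            // Structural elements that are not paragraphs have a null Paragraph.
            if (element.Paragraph?.Elements == null) continue;

            foreach (ParagraphElement part in element.Paragraph.Elements)
            {
                // Not every paragraph element is a text run.
                if (part.TextRun?.Content != null) text.Append(part.TextRun.Content);
            }
        }
        return text.ToString();
    }
}
```

With the `doc` from the question, `Console.WriteLine(DocText.ReadParagraphText(doc));` would then print text instead of the type name.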
You can also inspect the whole object by serialising it, for example with the JavaScriptSerializer class documented by Microsoft.
References:
REST Resource: documents | Google Docs API | Google Developers
JavaScriptSerializer Class (System.Web.Script.Serialization) | Microsoft Docs
I'm trying to integrate BlueImp jQuery file upload component into my ASP.NET 4 website. I have the file upload working and writing to disk, but the component requires that I return a JSON object from the server as confirmation of success, in a particular format:
{"files": [
{
"name": "picture1.jpg",
"size": 902604,
"url": "http:\/\/example.org\/files\/picture1.jpg",
"thumbnailUrl": "http:\/\/example.org\/files\/thumbnail\/picture1.jpg",
"deleteUrl": "http:\/\/example.org\/files\/picture1.jpg",
"deleteType": "DELETE"
},
{
"name": "picture2.jpg",
"size": 841946,
"url": "http:\/\/example.org\/files\/picture2.jpg",
"thumbnailUrl": "http:\/\/example.org\/files\/thumbnail\/picture2.jpg",
"deleteUrl": "http:\/\/example.org\/files\/picture2.jpg",
"deleteType": "DELETE"
}
]}
I'd like to use the JsonResult class to return this object from my C# code, but I'm not sure how to format the response correctly. I can probably do something like this:
var uploadedFiles = new List<object>();
uploadedFiles.Add(new { name = "picture1.jpg", size = 902604, url = "http://example.org/files/picture1.jpg", thumbnailUrl = "http://example.org/files/thumbnail/picture1.jpg", deleteUrl ="http://example.org/files/picture1.jpg", deleteType = "DELETE" });
uploadedFiles.Add(new { name = "picture2.jpg", size = 902604, url = "http://example.org/files/picture1.jpg", thumbnailUrl = "http://example.org/files/thumbnail/picture1.jpg", deleteUrl ="http://example.org/files/picture1.jpg", deleteType = "DELETE" });
return Json(uploadedFiles);
...but then I'm not sure how to wrap this in the outer 'files' object.
Can anyone point me (a .NET novice trying to learn!) in the right direction here? I've looked at the MSDN documentation, but it doesn't go into detail about formatting or constructing more complex JSON objects.
Many thanks.
Replace:
return Json(uploadedFiles);
with:
return Json(new {files = uploadedFiles});
to create a new anonymous type with property "files", which has your original list as a value.
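To check the shape this produces, the wrapper can be serialized directly. The sketch below uses Newtonsoft.Json purely for illustration and assumes that package is referenced; the `Json(...)` helper may use a different serializer, so details such as escaping can differ. `UploadResponse` and `BuildJson` are hypothetical names, and the file entries are trimmed to two properties.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

public static class UploadResponse
{
    // Wraps the list in an anonymous object so the JSON gets an outer "files" property.
    public static string BuildJson()
    {
        var uploadedFiles = new List<object>
        {
            new { name = "picture1.jpg", size = 902604 },
            new { name = "picture2.jpg", size = 841946 }
        };
        return JsonConvert.SerializeObject(new { files = uploadedFiles });
    }
}
```

The result is a single object whose `files` property holds the array, which is the structure the upload component expects.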
I am new to C#, and did not find an easy piece of code to read a response from a URL. Example:
http://www.somesitehere.com/mysearch
The response is something like this (I do not know what kind of response this is):
{ "response": {
"status": {
"code": "0",
"message": "Success",
"version": "4.2"
},
"start": 0,
"total": 121,
"images": [
{
"url": "www.someimagelinkhere.com/pic.jpg",
"license": {
"url": "",
"attribution": "",
"type": "unknown"
}
}
]
}}
After that I want to save that URL, "www.someimagelinkhere.com/pic.jpg", to a file. I know how to do that part; I just want to separate the URL from the rest.
I saw this topic: Easiest way to read from a URL into a string in .NET
bye
Your response is in JSON format. Use a library (Newtonsoft's Json.NET, but there are others too) to extract the node you want.
You can use something like Json.NET by Newtonsoft, which can be found and installed using the NuGet Package Manager in Visual Studio.
Also you could just do this.
var jSerializer = new JavaScriptSerializer();
var result = jSerializer.DeserializeObject("YOUR JSON RESPONSE STRING");
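DeserializeObject returns an untyped object graph (nested dictionaries and arrays), not a C# object with properties that match your names such as start, total, images, etc. If you need that, you can create a strongly typed class and deserialize into it with Deserialize<T> for ease of use; casting the result of DeserializeObject to that class would fail at runtime.
Strongly typed version:
var jSerializer = new JavaScriptSerializer();
// YourStrongType is a class you define to mirror the JSON, including the outer "response" property.
var result = jSerializer.Deserialize<YourStrongType>("YOUR JSON RESPONSE STRING");
var imgUrl = result.response.images[0].url;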
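The same idea with Json.NET might look like the sketch below. It assumes the Newtonsoft.Json package is referenced and models only the parts of the sample response needed to reach the image URL; properties that are not modelled are ignored by default. All class and method names here are hypothetical.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Mirrors the outer { "response": { ... } } wrapper of the sample JSON.
public class SearchEnvelope
{
    [JsonProperty("response")] public SearchResponse Response { get; set; }
}

public class SearchResponse
{
    [JsonProperty("total")] public int Total { get; set; }
    [JsonProperty("images")] public List<SearchImage> Images { get; set; }
}

public class SearchImage
{
    [JsonProperty("url")] public string Url { get; set; }
}

public static class SearchParser
{
    // Returns the URL of the first image, or null when there are no images.
    public static string FirstImageUrl(string json)
    {
        SearchEnvelope envelope = JsonConvert.DeserializeObject<SearchEnvelope>(json);
        List<SearchImage> images = envelope?.Response?.Images;
        return images != null && images.Count > 0 ? images[0].Url : null;
    }
}
```

The JSON text itself could be fetched first, for example with `WebClient.DownloadString`, and then passed to `SearchParser.FirstImageUrl`.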
Using C# and Visual Studio 2010 (Windows Forms project), with the InstaSharp and Newtonsoft.Json libraries.
I want to get the image URL from the JSON string returned by the Instagram API endpoint when I request a particular hashtag.
So far I can retrieve the JSON string.
I am trying to use Newtonsoft.Json to deserialize the object using the examples, but I probably don't understand the JSON string representation of the object properly.
Below is a simplified sample response, taken from their documentation, for the API call tags/tag-name/media/recent.
{
"data": [{
"type": "image",
"filter": "Earlybird",
"tags": ["snow"],
"comments": {
},
"caption": {
},
"likes": {
},
"created_time": "1296703536",
"images": {
"low_resolution": {
"url": "http://distillery.s3.amazonaws.com/media/2011/02/02/f9443f3443484c40b4792fa7c76214d5_6.jpg",
"width": 306,
"height": 306
},
"thumbnail": {
"url": "http://distillery.s3.amazonaws.com/media/2011/02/02/f9443f3443484c40b4792fa7c76214d5_5.jpg",
"width": 150,
"height": 150
},
"standard_resolution": {
"url": "http://distillery.s3.amazonaws.com/media/2011/02/02/f9443f3443484c40b4792fa7c76214d5_7.jpg",
"width": 612,
"height": 612
}
},
"id": "22699663",
"location": null
},
...
]
}
I want to get specifically the standard_resolution in the images part.
This is the relevant code that I currently have.
//Create the Client Configuration object using Instasharp
var config = new InstaSharp.Endpoints.Tags.Unauthenticated(config);
//Get the recent pictures of a particular hashtag (tagName)
var pictureResult = config.Recent(tagName);
//Deserialize the object to get the "images" part
var pictureResultObject = JsonConvert.DeserializeObject<dynamic>(pictureResult.Json);
consoleTextBox.Text = pictureResult.Json;
var imageUrl = pictureResultObject.Data.Images;
Console.WriteLine(imageUrl);
I get the error: Additional information: Cannot perform runtime binding on a null reference
so imageUrl is indeed null when I debug, hence indicating I am not accessing it the right way.
Can anyone explain to me how to access different parts of this JSON string using Newtonsoft.Json?
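Using Newtonsoft.Json. Note that dynamic member names are case-sensitive (data, not Data) and that data is an array, so it has to be iterated:
dynamic dyn = JsonConvert.DeserializeObject(json);
foreach (var data in dyn.data)
{
    Console.WriteLine("{0} - {1}",
        data.filter,
        data.images.standard_resolution.url);
}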
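A variant without `dynamic`, sketched with LINQ to JSON from the same Newtonsoft.Json package. It assumes the response has the shape shown in the question; `InstagramJson` and `StandardResolutionUrls` are hypothetical names. `SelectToken` returns null when part of the path is missing, so items without a standard-resolution image are skipped rather than causing a runtime binding error.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json.Linq;

public static class InstagramJson
{
    // Collects data[*].images.standard_resolution.url from the response JSON.
    public static List<string> StandardResolutionUrls(string json)
    {
        var urls = new List<string>();
        JObject root = JObject.Parse(json);

        // "data" is an array of media items; bail out if it is missing or not an array.
        JArray items = root["data"] as JArray;
        if (items == null) return urls;

        foreach (JToken item in items)
        {
            // SelectToken returns null when any part of the path is missing.
            string url = (string)item.SelectToken("images.standard_resolution.url");
            if (url != null) urls.Add(url);
        }
        return urls;
    }
}
```

With the variables from the question, this would be called as `InstagramJson.StandardResolutionUrls(pictureResult.Json)`.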
I wrote a plugin for .NET that takes care of deserializing the JSON string and returning a data table. It is still in development, but see if it helps: Instagram.NET on GitHub.