I am trying to bulk index documents into Elasticsearch using BulkDescriptor in C#. I am using Elasticsearch 1.7. Here is my code:
public IBulkResponse IndexBulk(string index, string type, List<string> documents)
{
    BulkDescriptor descriptor = new BulkDescriptor();
    foreach (var doc in documents)
    {
        // Parse the current document (not the whole list) to read its Id
        JObject data = JObject.Parse(doc);
        descriptor.Index<object>(i => i
            .Index(index)
            .Type(type)
            .Id(data["Id"].ToString())
            .Document(doc));
    }
    return _Client.Bulk(descriptor);
}
But it is not inserting the documents. When I checked the response, I saw the following message: MapperParsingException[Malformed content, must start with an object]
Sample JSON document
{
"a" : "abc",
"b": { "c": ["1","2"]}
}
What is going wrong here?
The issue is that you are passing raw JSON strings through the strongly typed fluent bulk method.
What you are actually sending to Elasticsearch is
{"index":{"_index":"test1","_type":"string"}}
"{"a" : "abc","b": { "c": ["1","2"]}}"
which is not correct.
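To see why the document line ends up quoted, it may help to compare how Json.NET serializes a plain string with how it serializes a parsed JObject. This is only a minimal sketch, assuming the client serializes documents with Json.NET; the class and method names (SerializationDemo, AsString, AsObject) are made up for illustration.

```csharp
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class SerializationDemo
{
    // A .NET string is a JSON *value*, so it is written as a quoted,
    // escaped string literal rather than as an object.
    public static string AsString(string rawJson)
    {
        return JsonConvert.SerializeObject(rawJson);
    }

    // Parsing first yields a JObject, which is written as a JSON object
    // and therefore starts with '{'.
    public static string AsObject(string rawJson)
    {
        return JsonConvert.SerializeObject(JObject.Parse(rawJson));
    }
}
```

Calling AsString on a JSON document returns text that begins with a double quote, which is the kind of content Elasticsearch rejects with "must start with an object"; AsObject returns text that begins with an opening brace.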
Here are a few ideas for what you can do about this:
Use JObject to send a correctly serialized object to Elasticsearch:
descriptor.Index<JObject>(i => i
    .Index(index)
    .Type(type)
    .Id(data["Id"].ToString())
    .Document(JObject.Parse(doc)));
Or take advantage of the .Raw client to send raw JSON:
var json = new StringBuilder();
json.AppendLine(@"{""index"":{""_index"":""indexName"",""_type"":""typeName""}}");
json.AppendLine(@"{""a"" : ""abc"",""b"": { ""c"": [""1"",""2""]}}");
_Client.Raw.Bulk(json.ToString());
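For more than one document, the raw body can be assembled in a loop. The sketch below is one possible way to do it, not the library's own helper: BulkBodyBuilder and Build are hypothetical names, and it assumes each document may carry an "Id" property (as in the question) that should become the _id. Each action line and each document line is written as compact single-line JSON followed by a newline, which is what the bulk endpoint expects.

```csharp
using System.Collections.Generic;
using System.Text;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class BulkBodyBuilder
{
    // Builds a newline-delimited bulk body: one action line plus one
    // document line per input document, each terminated by '\n'.
    public static string Build(string index, string type, IEnumerable<string> documents)
    {
        var body = new StringBuilder();
        foreach (var doc in documents)
        {
            JObject data = JObject.Parse(doc);

            var action = new JObject { ["_index"] = index, ["_type"] = type };

            // Assumption: the document id, if present, lives in an "Id" property.
            JToken id = data["Id"];
            if (id != null && id.Type != JTokenType.Null)
            {
                action["_id"] = id.ToString();
            }

            var meta = new JObject { ["index"] = action };

            // Formatting.None keeps each entry on a single line.
            body.Append(meta.ToString(Formatting.None)).Append('\n');
            body.Append(data.ToString(Formatting.None)).Append('\n');
        }
        return body.ToString();
    }
}
```

The resulting string could then be passed to _Client.Raw.Bulk(...) as in the snippet above.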
Hope it helps.
I'm pushing a stream of data to Azure Event Hubs with the following code, which uses Microsoft.Hadoop.Avro. The code runs every 5 seconds and simply sends the same two Avro-serialized items:
var strSchema = File.ReadAllText("schema.json");
var avroSerializer = AvroSerializer.CreateGeneric(strSchema);
var rootSchema = avroSerializer.WriterSchema as RecordSchema;
var itemList = new List<AvroRecord>();
dynamic record_one = new AvroRecord(rootSchema);
record_one.FirstName = "Some";
record_one.LastName = "Guy";
itemList.Add(record_one);
dynamic record_two = new AvroRecord(rootSchema);
record_two.FirstName = "A.";
record_two.LastName = "Person";
itemList.Add(record_two);
using (var buffer = new MemoryStream())
{
using (var writer = AvroContainer.CreateGenericWriter(strSchema, buffer, Codec.Null))
{
using (var streamWriter = new SequentialWriter<object>(writer, itemList.Count))
{
foreach (var item in itemList)
{
streamWriter.Write(item);
}
}
}
eventHubClient.SendAsync(new EventData(buffer.ToArray()));
}
The schema used here is, again, very simple:
{
"type": "record",
"name": "User",
"namespace": "SerDes",
"fields": [
{
"name": "FirstName",
"type": "string"
},
{
"name": "LastName",
"type": "string"
}
]
}
I have validated that this all works with a simple view in Azure Stream Analytics in the portal.
So far so good, but I cannot, for the life of me, correctly deserialize this in Databricks using the from_avro() function in Scala.
Load (the exact same) schema as a string:
val sampleJsonSchema = dbutils.fs.head("/mnt/schemas/schema.json")
Configure EventHub
val connectionString = ConnectionStringBuilder("<CONNECTION_STRING>")
.setEventHubName("<NAME_OF_EVENT_HUB>")
.build
val eventHubsConf = EventHubsConf(connectionString).setStartingPosition(EventPosition.fromEndOfStream)
val eventhubs = spark.readStream.format("eventhubs").options(eventHubsConf.toMap).load()
Read the data..
// this works, and i can see the serialised data
display(eventhubs.select($"body"))
// this fails, and with an exception: org.apache.spark.SparkException: Malformed records are detected in record parsing. Current parse Mode: FAILFAST. To process malformed records as null result, try setting the option 'mode' as 'PERMISSIVE'.
display(eventhubs.select(from_avro($"body", sampleJsonSchema)))
So essentially, what is going on here? I am serializing the data with the same schema I use for deserializing, but something is malformed. The documentation on this is incredibly sparse (very minimal on the Microsoft website).
The issue
After additional investigation (mainly with the help of this article), I found what my problem was: from_avro(data: Column, jsonFormatSchema: String) expects the Spark schema format and not the Avro schema format. The documentation is not very clear on this.
Solution 1
Databricks provides a handy method, from_avro(column: Column, subject: String, schemaRegistryUrl: String), that fetches the needed Avro schema from the Kafka schema registry and automatically converts it to the correct format.
Unfortunately, it is not available in plain Spark, nor is it possible to use it without a Kafka schema registry.
Solution 2
Use the schema conversion provided by Spark:
// define avro deserializer
class AvroDeserializer() extends AbstractKafkaAvroDeserializer {
  override def deserialize(payload: Array[Byte]): String = {
    // call the parent implementation; calling this.deserialize here would recurse forever
    val genericRecord = super.deserialize(payload).asInstanceOf[GenericRecord]
    genericRecord.toString
  }
}
// create deserializer instance
val deserializer = new AvroDeserializer()
// register deserializer
spark.udf.register("deserialize_avro", (bytes: Array[Byte]) =>
deserializer.deserialize(bytes)
)
// get avro schema from registry (but I presume that it should also work with schema read from a local file)
val registryClient = new CachedSchemaRegistryClient(kafkaSchemaRegistryUrl, 128)
val avroSchema = registryClient.getLatestSchemaMetadata(topic + "-value").getSchema
val sparkSchema = SchemaConverters.toSqlType(new Schema.Parser().parse(avroSchema))
// consume data
df.selectExpr("deserialize_avro(value) as data")
.select(from_json(col("data"), sparkSchema.dataType).as("data"))
.select("data.*")
I'm trying to get a List of objects from a CORS API, but all of the List entries come back with null values.
I do succeed in obtaining the List itself (the list type and length are correct).
List<PluginModelDB> result;
using (HttpResponseMessage response = await ApiBroker.ApiClient.GetAsync(""))
{
if (response.IsSuccessStatusCode)
{
result = await response.Content.ReadAsAsync<List<PluginModelDB>>();
}
else
{
throw new Exception(response.ReasonPhrase);
}
}
HomeViewModel.PluginList = new List<PluginModelDB>();
foreach (var p in result)
{
HomeViewModel.PluginList.Add(new PluginModelDB { ID = p.ID, Name = p.Name, Description = p.Description});
}
Try reading the response as a string first instead of deserializing it immediately. Output that string to the console or a simple text file and see whether you can find your items there. If not, the problem is in the API: it is returning empty objects.
It seems that your model does not correspond to the received JSON. You can check your model via http://json2csharp.com/
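A mismatch between the model and the JSON keys can be reproduced in a few lines. The following is a sketch under assumptions: the JSON keys plugin_id and plugin_name and the class names are invented for illustration, since the real payload is not shown in the question. With Json.NET's default settings, JSON properties that do not match any member are ignored, so the list has the right length but its items keep default values.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical model whose property names do not match the JSON keys.
public class PluginMismatch
{
    public int ID { get; set; }
    public string Name { get; set; }
}

// Same shape, but mapped explicitly to the (assumed) JSON keys.
public class PluginMapped
{
    [JsonProperty("plugin_id")]
    public int ID { get; set; }

    [JsonProperty("plugin_name")]
    public string Name { get; set; }
}

public static class ModelCheck
{
    // Assumed sample payload; inspect the real one by reading the body as a string.
    public const string SampleJson = "[{\"plugin_id\":1,\"plugin_name\":\"Alpha\"}]";

    public static List<T> Read<T>(string json)
    {
        return JsonConvert.DeserializeObject<List<T>>(json);
    }
}
```

Read<PluginMismatch>(SampleJson) yields one item whose Name is null, while Read<PluginMapped>(SampleJson) yields one item with the expected values. In the question's code, the raw text could be obtained with response.Content.ReadAsStringAsync() before deciding how the model should be mapped.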
I receive the following JSON string from the client:
{ "Client": { "Name": "John" } }
And in the document I have the following tag:
<<[client.name]>>
and I try to inject the data like this:
var obj = JsonConvert.DeserializeObject(input.DataJson);
var engine = new ReportingEngine();
engine.BuildReport(document, obj);
But it doesn't work.
Can I inject that JSON with case-insensitive matching of property names? Or do I have to modify the JSON so that its property names are lowercase?
How can I do that?
I am afraid the LINQ Reporting Engine currently does not support dynamic objects as data sources. We have logged a new feature request for your scenario; the ID of this issue is WORDSNET-16421. We will inform you via this thread as soon as the requested feature is implemented. In the meantime, you can convert the JSON string to a DataSet to make it work, as in the following example:
// Assume you have following in document
// <<[Client.Name]>>
string json = "{ \"Client\": { \"Name\": \"John\" } }";
XmlDocument Xml = (XmlDocument)JsonConvert.DeserializeXmlNode(json);
DataSet ds = new DataSet();
ds.ReadXml(new MemoryStream(Encoding.UTF8.GetBytes(Xml.InnerXml)));
Document doc = new Document(MyDir + @"in.docx");
ReportingEngine engine = new ReportingEngine();
engine.BuildReport(doc, ds.Tables[0].Rows[0], "Client");
doc.Save(MyDir + @"18.2.docx");
I work with Aspose as Developer Evangelist.
If I have json that looks something like this: (wrote this by hand, so it may have errors)
{
"http://devserver.somesite.com/someendpoint/1/2/3$metadata#Something.Start": [
{
"Title": "start",
"Endpoint": "https://devserver.somesite.com/someendpoint/1/2/3/start"
}
],
"http://devserver.somesite.com/someendpoint/1/2/3$metadata#Something.Stop": [
{
"Title": "stop",
"Endpoint": "https://devserver.somesite.com/someendpoint/1/2/3/stop"
}
]
}
Is there an easy, built-in way (in JSON.NET) to have it understand that there's a namespace in play here? Or is there a way to set a variable or pattern-based JsonProperty via an attribute?
I can't have the URL as part of my business object, because that will change from environment to environment.
I know I can create a custom JSON converter, but before going down that route I'd like to see if there's something more out of the box that handles this. Another option is to get the data as XML and handle that by hand.
Assuming you receive this as a string from a web call, you can do the following in JSON.NET.
var json = "your string here";
var obj = JObject.Parse(json);
foreach (var ns in obj.Properties()) {
    var arr = (JArray)ns.Value;
    foreach (var obj2 in arr) {
        // do your logic here to get the properties
    }
}
Another option, suggested by James Newton-King, seems a little cleaner:
var list = JsonConvert.DeserializeObject<Dictionary<string, List<MyClass>>>(json);
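Once the payload is a dictionary, the environment-specific URL prefix can be ignored by matching only the end of each key. The sketch below shows one way to do that; ActionLink stands in for MyClass, and NamespacedLinks.Find and the suffix convention ("#Something.Start") are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;

// Hypothetical stand-in for MyClass, shaped after the sample JSON.
public class ActionLink
{
    public string Title { get; set; }
    public string Endpoint { get; set; }
}

public static class NamespacedLinks
{
    // Returns the entries stored under any key that ends with the given
    // suffix, so the host part of the key does not matter.
    public static List<ActionLink> Find(string json, string suffix)
    {
        var map = JsonConvert.DeserializeObject<Dictionary<string, List<ActionLink>>>(json);
        return map
            .Where(kv => kv.Key.EndsWith(suffix, StringComparison.Ordinal))
            .SelectMany(kv => kv.Value)
            .ToList();
    }
}
```

For example, NamespacedLinks.Find(json, "#Something.Start") would return the "start" entry regardless of which server name appears in the key.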
How can I use JSON.Net and loop through the following JSON to output one HTML image tag (a string) for each member of the "photos" object?
My goal is to read the below JSON and output this string:
"<img src='/images/foo.jpg' alt='Hello World!'><img src='/images/bar.jpg' alt='Another Photo' />"
JSON is stored in external file "photos.json"
{
"photos": {
"photo1": {
"src": "/images/foo.jpg",
"alt": "Hello World!"
},
"photo2": {
"src": "/images/bar.jpg",
"alt": "Another Photo"
}
}
}
I've started with code similar to what's shown here: http://www.hanselman.com/blog/NuGetPackageOfTheWeek4DeserializingJSONWithJsonNET.aspx
var client = new WebClient();
client.Headers.Add("User-Agent", "Nobody");
var response = client.DownloadString(new Uri("http://www.example.com/photos.json"));
JObject o = JObject.Parse(response);
//Now o is an object I can walk around...
But, I haven't found a way to "walk around o" as shown in the example.
I want to loop through each member of the photos object, read the properties and add html to my string for each photo.
So far, I've tried the examples shown here: http://james.newtonking.com/json/help/index.html?topic=html/QueryJson.htm
But I cannot make them work once I am inside a foreach loop.
Here is how you can "walk around" your JObject to extract the information you need.
string json = @"
{
""photos"": {
""photo1"": {
""src"": ""/images/foo.jpg"",
""alt"": ""Hello World!""
},
""photo2"": {
""src"": ""/images/bar.jpg"",
""alt"": ""Another Photo""
}
}
}";
StringBuilder sb = new StringBuilder();
JObject o = JObject.Parse(json);
foreach (JProperty prop in o["photos"].Children<JProperty>())
{
JObject photo = (JObject)prop.Value;
sb.AppendFormat("<img src='{0}' alt='{1}' />\r\n",
photo["src"], photo["alt"]);
}
Console.WriteLine(sb.ToString());
Output:
<img src='/images/foo.jpg' alt='Hello World!' />
<img src='/images/bar.jpg' alt='Another Photo' />
First you need to define a class that holds your data and also is able to output itself as an HTML tag:
public class Photo
{
public string Src { get; set; }
public string Alt { get; set; }
public string ToHtml()
{
return string.Format(
"<img src='{0}' alt='{1}'/>",
this.Src,
this.Alt);
}
}
To use JSON.Net for creating typed objects, you need to 'normalize' your JSON, because it is not in the usual format that would indicate an array of identical objects. You either have to get rid of the identifiers photo1, photo2, ..., photoN entirely, or you have to make them identical (i.e. they should all simply be photo, without a number). If you control the JSON creation, you can do it right there. Otherwise you must manipulate the web response accordingly (e.g. with string.Replace(...)).
Having done that, you can use JSON.Net to get a typed list, and subsequently you can simply iterate through it to get the required HTML:
var client = new WebClient();
client.Headers.Add("User-Agent", "Nobody");
string response = client.DownloadString(new Uri("http://www.example.com/photos.json"));
// --> 'Normalize' response string here, if necessary
List<Photo> photos = JsonConvert.DeserializeObject<List<Photo>>(response);
// now build the HTML string
var sb = new StringBuilder();
foreach (var photo in photos)
{
sb.Append(photo.ToHtml());
}
string fullHtml = sb.ToString();
...
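If the normalization step cannot be done at the source, one alternative to string replacement is to do it with LINQ to JSON: parse the response, take the values under "photos", and convert each one to a typed object. This is a sketch of that alternative rather than the approach described above; PhotoItem mirrors the Photo class, and PhotoNormalizer.ToList is a made-up helper name.

```csharp
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json.Linq;

// Mirrors the Photo class above (property names match the JSON keys
// case-insensitively).
public class PhotoItem
{
    public string Src { get; set; }
    public string Alt { get; set; }
}

public static class PhotoNormalizer
{
    // Drops the photo1, photo2, ... identifiers and keeps only the values,
    // in the order they appear in the document.
    public static List<PhotoItem> ToList(string json)
    {
        JObject root = JObject.Parse(json);
        var photos = (JObject)root["photos"];
        return photos.Properties()
            .Select(p => p.Value.ToObject<PhotoItem>())
            .ToList();
    }
}
```

The returned list can then be iterated in the same way as the photos list above to build the HTML string.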