Index JSON Document using Elasticsearch NEST C#

I'm very new to Elasticsearch and want to know how to create an index and index the following JSON document into Elasticsearch using NEST in C#.
{
"BookName": "Book1",
"ISBN": "978-3-16-148410-0",
"chapter" : [
{
"chapter_name": "Chapter1",
"chapter_desc": "Before getting into computer programming, let us first understand computer programs and what they..."
},
{
"chapter_name": "Chapter2",
"chapter_desc": "Today computer programs are being used in almost every field, household, agriculture, medical, entertainment, defense.."
},
{
"chapter_name": "Chapter3",
"chapter_desc": "MS Word, MS Excel, Adobe Photoshop, Internet Explorer, Chrome, etc., are..."
},
{
"chapter_name": "Chapter4",
"chapter_desc": "Computer programs are being used to develop graphics and special effects in movie..."
}
]
}

To create an index with NEST is as simple as
var client = new ElasticClient();
client.CreateIndex("index-name");
This will create an index with the default number of shards and replicas defined for the node.
To index a document represented as json into the index would be
var json = @"{
""BookName"": ""Book1"",
""ISBN"": ""978-3-16-148410-0"",
""chapter"" : [
{
""chapter_name"": ""Chapter1"",
""chapter_desc"": ""Before getting into computer programming, let us first understand computer programs and what they...""
},
{
""chapter_name"": ""Chapter2"",
""chapter_desc"": ""Today computer programs are being used in almost every field, household, agriculture, medical, entertainment, defense..""
},
{
""chapter_name"": ""Chapter3"",
""chapter_desc"": ""MS Word, MS Excel, Adobe Photoshop, Internet Explorer, Chrome, etc., are...""
},
{
""chapter_name"": ""Chapter4"",
""chapter_desc"": ""Computer programs are being used to develop graphics and special effects in movie...""
}
]
}";
var indexResponse = client.LowLevel.Index<string>("index-name", "type-name", json);
if (!indexResponse.Success)
Console.WriteLine(indexResponse.DebugInformation);
Here we use the low level client to index json, available in NEST through the .LowLevel property on ElasticClient.
To search the indexed document would be
// refresh the index so that newly indexed documents are available
// for search without waiting for the refresh interval
client.Refresh("index-name");
var searchResponse = client.Search<dynamic>(s => s
.Index("index-name")
.Type("type-name")
.Query(q => q
.Match(m => m
.Query("Photoshop")
.Field("chapter.chapter_desc")
)
)
);
This returns the document indexed. The generic type parameter dynamic used in Search<T>() here means that the resulting documents will be deserialized to Json.Net JObject types.
When we created the index, we didn't specify a mapping for our type, type-name, so Elasticsearch inferred the mapping from the structure of the JSON document. This is dynamic mapping and it is useful in many situations. However, if you know the structure of the documents you're going to send and it's not going to change destructively, you should specify a mapping for the type yourself. In this particular example, the chapter array is inferred as an object type mapping; if you want to search across both the chapter name and the chapter description of an individual chapter, you probably want to map chapter as a nested type instead.
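As a rough sketch of that last point, the snippet below creates the index with chapter mapped as a nested type and then runs a nested query that requires both conditions to match within the same chapter. It assumes the same NEST version as the code above (one that still uses mapping types) and that the low-level client exposes IndicesCreate with a string body; the class name and the query values are made up for illustration, so check the calls against the client version you actually use.

```csharp
using System;
using Nest;

public static class NestedMappingSketch
{
    // Request body for index creation: only "chapter" is mapped explicitly,
    // the remaining fields are still left to dynamic mapping.
    public const string CreateIndexBody = @"{
  ""mappings"": {
    ""type-name"": {
      ""properties"": {
        ""chapter"": { ""type"": ""nested"" }
      }
    }
  }
}";

    public static void Main()
    {
        var client = new ElasticClient();

        // Assumption: IndicesCreate<string> exists on the low-level client in this version.
        var createResponse = client.LowLevel.IndicesCreate<string>("index-name", CreateIndexBody);
        if (!createResponse.Success)
            Console.WriteLine(createResponse.DebugInformation);

        // Both match clauses must be satisfied by one and the same chapter object.
        var searchResponse = client.Search<dynamic>(s => s
            .Index("index-name")
            .Type("type-name")
            .Query(q => q
                .Nested(n => n
                    .Path("chapter")
                    .Query(nq => nq
                        .Bool(b => b
                            .Must(
                                m => m.Match(mm => mm.Field("chapter.chapter_name").Query("Chapter3")),
                                m => m.Match(mm => mm.Field("chapter.chapter_desc").Query("Photoshop"))
                            )
                        )
                    )
                )
            )
        );

        Console.WriteLine(searchResponse.Hits.Count);
    }
}
```

The mapping has to be in place before the first document is indexed, because an existing object mapping cannot be switched to nested afterwards.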

Related

Query all documents in the collection to find if certain document contains value

Let's say that I have thousands of documents in a MongoDB collection. Any given document contains an ownersData object with two properties (ownerId, ownerRef):
{
"_id": {....},
"name": "abc",
"ownersData": { "ownerId":"1", "ownerRef":"qwer" }
}
What would be the fastest way to query all documents to find out whether any document contains a certain ownerRef value?
Assuming you have C# objects representing the data in your database, you could achieve this using LINQ.
public IEnumerable<Document> GetDocumentByOwnerRef(string ownerReference)
    => dbContext.Documents.Where(d =>
        d.ownersData is not null && d.ownersData.ownerRef.Equals(ownerReference)
    ).ToList();
EDIT: As @SJFJ correctly pointed out in the comments, while this does work, it will scan your entire collection, which makes it very slow on a large collection. Creating an index on the ownerRef field in MongoDB will improve performance and load time.
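A minimal sketch of the index suggestion, using the official MongoDB C# driver directly instead of the dbContext from the answer above. The connection string, database name, and collection name are placeholders, and the class and method names are hypothetical. It assumes a 2.x driver recent enough to have CreateIndexModel.

```csharp
using System;
using MongoDB.Bson;
using MongoDB.Driver;

public static class OwnerRefLookup
{
    // Dotted path to the nested field from the sample document in the question.
    public const string FieldPath = "ownersData.ownerRef";

    // Returns true as soon as one matching document is found.
    public static bool AnyDocumentHasOwnerRef(IMongoCollection<BsonDocument> collection, string ownerRef)
    {
        var filter = Builders<BsonDocument>.Filter.Eq(FieldPath, ownerRef);
        return collection.Find(filter).Limit(1).Any();
    }

    public static void Main()
    {
        // Placeholder connection details, adjust to your environment.
        var client = new MongoClient("mongodb://localhost:27017");
        var collection = client.GetDatabase("test").GetCollection<BsonDocument>("documents");

        // Ascending single-field index on the nested property, so the lookup
        // does not have to scan the whole collection.
        var keys = Builders<BsonDocument>.IndexKeys.Ascending(FieldPath);
        collection.Indexes.CreateOne(new CreateIndexModel<BsonDocument>(keys));

        Console.WriteLine(AnyDocumentHasOwnerRef(collection, "qwer"));
    }
}
```

Limiting the result to one document keeps the existence check cheap even when many documents share the same ownerRef.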

Query a dynamic-key nested dictionary structure in Cosmos DB with LINQ to SQL in C#

I want to query for values in a dynamic-key nested dictionary structure in order to apply a permission filter using LINQ to SQL (preferably utilizing index queries and not causing scans).
The JSON structure I have contains a Permissions property which is basically just a C# dictionary<string, dictionary<string, dictionary<string, string>>>
JSON example:
{
"Permissions": {
"Account 1 GUID": {
"Location": "Location 1 GUID"
},
"Account 2 GUID": {
"Location": "Location 2 GUID"
},
"Account 3 GUID": {
"Location": "Location 3 GUID"
}
}
}
The account key properties are dynamic, meaning they are in fact the GUIDs of the accounts owning the nested location.
When a query is performed I have a list of location and account IDs that should be used to perform the permission filtering by giving access only to those locations.
The filtering query I want to apply is SIMILAR TO SOMETHING like this (THIS IS WHAT I WANT TO ACHIEVE):
var allowedLocations = new List<string> { "Location 1 GUID", "Location 2 GUID" };
var result = AllDocsIQueryable.Where(model =>
    model.Permissions.Values.Any(secondLevel =>
        secondLevel.Values.Any(thirdLevel =>
            allowedLocations.Contains(thirdLevel)))).ToList();
As you can see, I want to filter the docs a user can see to those the user has defined an access to, by querying for docs that contains any of the location IDs in my access list.
I have no idea how to go about constructing the query for a structure like this. I have tried many combinations of .Select and .SelectMany queries, but nothing succeeds; everything I have tried yields no results or fails.
A query like the following yields the correct results, but I would really like to avoid having to build this query dynamically with predicate builder for example:
var result = AllDocsIQueryable.Where(model =>
(model.Permissions["Account 1 GUID"]["Location"].IsDefined() &&
model.Permissions["Account 1 GUID"]["Location"] == "Location 1 GUID")
||
(model.Permissions["Account 2 GUID"]["Location"].IsDefined() &&
model.Permissions["Account 2 GUID"]["Location"] == "Location 2 GUID")
// .. etc ..
).ToList();
I hope my question is clear and there is a LINQ to SQL/Cosmos/NoSQL/C# super wiz that can help me construct this query? Maybe someone knows that I can't construct this query at all, and I need to change the JSON structure (which I know would be preferable)? I need to query using LINQ and not SQL query string!
Conclusion:
It seems, as I was afraid of, that there's just no way of building the query.
As far as I know, and have tried, and read, there's just no way to do it without building the dynamic account keys into the query.
I've changed the schema to include arrays instead.
"Permissions": [
{
"LocationId": "Loc1",
"AccountId": "Acc1"
},
{
"LocationId": "Loc2",
"AccountId": "Acc2"
}
]
I'm afraid there is no way you can achieve that using LINQ alone. Here are your options:
Use a UDF to retrieve an array of locations from a permissions object. Unfortunately, you can't call a UDF from LINQ (at least not in SDK v3; you can in SDK v2), so you need to use raw SQL. (Actually, there is a hacky way.) This solution is bad because it's really slow.
Keep your schema but introduce an additional property that contains a list of the locations that occur nested within your permissions object. Now you can easily use LINQ, and the query is fast.
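The second option could look roughly like the sketch below. The class, property, and method names (PermissionedDoc, LocationIds, SyncLocationIds, VisibleTo) are hypothetical, and the demo runs against an in-memory IQueryable; whether the Cosmos LINQ provider translates this exact predicate is an assumption you would need to verify against your SDK version.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical document shape: the nested Permissions dictionary is kept as is,
// and LocationIds is the additional flat property.
public class PermissionedDoc
{
    public string Id { get; set; } = "";
    public Dictionary<string, Dictionary<string, string>> Permissions { get; set; }
        = new Dictionary<string, Dictionary<string, string>>();
    public List<string> LocationIds { get; set; } = new List<string>();

    // Rebuilds the flat list from the nested dictionary; call it before saving.
    public void SyncLocationIds()
    {
        LocationIds = Permissions.Values
            .SelectMany(account => account.Values)
            .Distinct()
            .ToList();
    }
}

public static class PermissionFilter
{
    // The predicate only touches the flat list, so no account keys appear in the query.
    public static IQueryable<PermissionedDoc> VisibleTo(
        IQueryable<PermissionedDoc> docs, List<string> allowedLocations)
    {
        return docs.Where(d => d.LocationIds.Any(id => allowedLocations.Contains(id)));
    }
}

public static class PermissionFilterDemo
{
    public static void Main()
    {
        var doc = new PermissionedDoc { Id = "doc-1" };
        doc.Permissions["Account 1 GUID"] =
            new Dictionary<string, string> { ["Location"] = "Location 1 GUID" };
        doc.SyncLocationIds();

        var allowed = new List<string> { "Location 1 GUID", "Location 2 GUID" };
        var visible = PermissionFilter.VisibleTo(new[] { doc }.AsQueryable(), allowed);

        Console.WriteLine(visible.Count()); // prints 1
    }
}
```

The price of this design is that the flat list has to be kept in sync on every write, which is why the sync step lives on the document class itself.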

Picking Out Simple Properties from Hierarchical JSON Part II

I thought this question might be ill-formed, so I created a "sister" question that's far more to the point, with specific output cited. Please see:
Querying JSON Nested Arrays with Linq, JSON.NET, C#
If that question gets answered before this one, I'll try to answer this one myself using the information from the other question... :) Thanks!
In a previous question (Picking Out Simple Properties from Hierarchical JSON), I asked how to pick up simple properties from hierarchical JSON. The answer there [pasted as a Linq query at the end of this post] is extremely useful (and, since posting that, I've studied quite a bit about Linq and JSON.NET). So I'm not using this forum because I'm lazy; I use it when I'm really stuck and can't seem to find the answers in the books I have access to.
I'm stumped on how to continue with the example provided by my previous question. This question builds on the previous one, so here (as succinctly as I could express) is what I'd love to learn how to do in a single Linq query.
To recap: I'm working with dynamic JSON like this (it is more complex than the JSON presented in my earlier Part I question because it contains arrays):
{
"Branch1": {
"Prop1A": "1A",
"Prop1B": "1B",
"Prop1C": "1C",
"Branch2": {
"Prop2A": "2A",
"Prop2B": "2B",
"Prop2C": "2C",
"Branch3": [{
"Prop3A": "3A",
"Prop3B": "3B",
"Prop3C": "3C"
}, {
"Prop3D": "3D",
"Prop3E": "3E",
"Prop3F": "3F"
}, {
"Prop3G": "3G",
"Prop3H": "3H",
"Prop3I": "3I"
}]
},
"Branch4": [{
"Prop4A": "4A",
"Prop4B": "4B",
"Prop4C": "4C"
}, {
"Prop4E": "4E",
"Prop4F": "4F",
"Prop4G": "4G"
}, {
"Prop4H": "4H",
"Prop4I": "4I",
"Prop4I": "4I"
}]
}
}
As you can see, the dynamic JSON is composed of hierarchical objects, and these objects are JSON objects, JSON arrays, and JSON properties.
Ultimately, I want to convert this JSON to a List object I can work with in C#. I plan to use that List object to effectively process each JSON branch in document order from the top, down.
Each item in the List collection would be a JObject; each of these objects would have a synthetic "parent" string property that would point back to the branch under which that JObject appears in the original JSON (my examples below explain what I mean by "parent"). [The previous question correctly came up with a solution for this "parent" value, so that's not too relevant to this question... What's new/relevant here is dealing with JArray objects in the JSON...]
The key is that I want each List item object to contain only the string-value properties for the object. For example, Branch1 has string properties Prop1A, 1B, and 1C. Thus, I would want query[0] to contain:
{"Prop1A":"1A","Prop1B":"1B","Prop1C":"1C","Parent":""}
Next, I would want query[1] to contain the string-value properties for Branch2:
{"Prop2A":"2A","Prop2B":"2B","Prop2C":"2C","Parent":"Branch1"}
Next, I would want query[2] to contain the string properties for only Branch3, but because Branch3 is an array of objects, I would want that array to end up together in query[2]:
[
{"Prop3A": "3A","Prop3B": "3B","Prop3C": "3C"},
{"Prop3D": "3D","Prop3E": "3E","Prop3F": "3F"},
{"Prop3G": "3G","Prop3H": "3H","Prop3I": "3I"}
]
Notice that this Branch doesn't yet have a reference to its "Parent"... I'd be happy getting an array in query[2] that looks like the above. (I plan to use dbc's logic to add a "Parent" property to each array element or figure out a way to create a new JObject that contains the array and cites the Parent only once):
[{"Prop3A": "3A","Prop3B": "3B","Prop3C": "3C","Parent":"Branch2"},
{"Prop3D": "3D","Prop3E": "3E","Prop3F": "3F","Parent":"Branch2"},
{"Prop3G": "3G","Prop3H": "3H","Prop3I": "3I","Parent":"Branch2"}
]
So, as you can see:
I would want any JSON branch that is not an array to be inserted as a new JObject in the query result along with only its string properties and a reference to its parent branch.
I would want any JSON branch that is an array to be inserted as a new JObject array in the query result along with only its string properties and a reference to its parent branch.
The trouble I had solving this on my own has to do with figuring out how to create the "if myJsonObject is JArray" condition in Linq and assigning just the string-property properties when the branch is not an array and iterating the array's elements when it is a JArray. I suspect I need to somehow leverage the ? : ternary expression, but I don't exactly know how to do that.
The query from the previous question is here:
var query3 = from o in root.DescendantsAndSelf().OfType<JObject>() // Find objects
let l = o.Properties().Where(p => p.Value is JValue) // Select their primitive properties
where l.Any() // Skip objects with no properties
// Add synthetic "Parent" property
let l2 = l.Concat(new[] { new JProperty("Parent", o.Ancestors().OfType<JProperty>().Skip(1).Select(a => a.Name).FirstOrDefault() ?? "") })
select new JObject(l2); // And return a JObject.
var list3 = query3.ToList();
That code doesn't handle arrays in the way described above.
A suitable and more general answer to this question effectively appears in my related post:
Picking Out Simple Properties from Hierarchical JSON
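For illustration only, here is one way the array handling described in the question could be sketched with JSON.NET. It is not the answer from the linked post, and it uses a plain loop rather than the single Linq query the question asks for. The class and helper names are hypothetical. Non-array objects become a JObject with their primitive properties plus "Parent"; each JArray becomes a new JArray whose elements are projected the same way.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

public static class BranchFlattener
{
    // Name of the property one level above the token's own property, or "" if there is none.
    static string ParentName(JToken token) =>
        token.Ancestors().OfType<JProperty>().Skip(1).Select(p => p.Name).FirstOrDefault() ?? "";

    // Properties whose value is a primitive (string, number, bool, null).
    static IEnumerable<JProperty> Primitives(JObject o) =>
        o.Properties().Where(p => p.Value is JValue);

    // Copies the primitive properties into a new object and adds the synthetic "Parent".
    static JObject Project(JObject source, string parent)
    {
        var projected = new JObject(Primitives(source));
        projected["Parent"] = parent;
        return projected;
    }

    public static List<JToken> Flatten(JContainer root)
    {
        var result = new List<JToken>();
        foreach (var token in root.DescendantsAndSelf())
        {
            if (token is JArray array)
            {
                // An array branch stays together as one entry in the result.
                var parent = ParentName(array);
                result.Add(new JArray(array.OfType<JObject>().Select(o => Project(o, parent))));
            }
            else if (token is JObject obj && !(obj.Parent is JArray) && Primitives(obj).Any())
            {
                // A plain object branch; array elements are handled by the case above.
                result.Add(Project(obj, ParentName(obj)));
            }
        }
        return result;
    }

    public static void Main()
    {
        var root = JObject.Parse(@"{
          ""Branch1"": {
            ""Prop1A"": ""1A"",
            ""Branch2"": {
              ""Prop2A"": ""2A"",
              ""Branch3"": [ { ""Prop3A"": ""3A"" }, { ""Prop3D"": ""3D"" } ]
            }
          }
        }");

        foreach (var item in Flatten(root))
            Console.WriteLine(item.ToString(Formatting.None));
    }
}
```

For the small sample in Main this yields three entries in document order: the Branch1 properties, the Branch2 properties, and one array for Branch3 whose elements each carry "Parent":"Branch2".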

Working with index in Neo4j

I've been going through Neo4j and the Neo4j C# client.
The neo4jclient wiki helped me with node CRUD operations; however, the wiki ends there abruptly.
I poked around the test methods in source code and managed to understand about relationships and searched online to understand how indexing works.
So far, here's what I have, roughly:
//create indexing on user and car
client.CreateIndex("User", new IndexConfiguration() { Provider = IndexProvider.lucene, Type = IndexType.fulltext }, IndexFor.Node);
client.CreateIndex("Car", new IndexConfiguration() { Provider = IndexProvider.lucene, Type = IndexType.fulltext }, IndexFor.Node);
//create user
client.Create(new User() { Name = "Dovakiin", Job = "Dragon Slayer" });
client.Create(new User() { Name = "Ulfric stormcloak", Job = "Imperial Slayer" });
//create Car
client.Create(new Car() { Name = "Paarthurnax", Modal = 212 });
//User owns car relationship
client.CreateRelationship(userRef, new Owns_CarRelationship(CarRef));
This is where I am stuck now. When I try to look for the user by name, my Cypher query returns zero results:
start u=node:User(Name="Dovakiin") return u;
and I don't quite understand why it returns zero nodes when clearly
start n=node(*) return n;
shows all nodes.
Am I missing something else while indexing? Or is this not index related at all? Do I not need to add each node to the index?
All I am trying to do, is select the node with a given property: Name = "Dovakiin" in this case.. How do I select this please?
Just to expand on ulkas' answer, if you want to enable auto indexing and found the documentation a little confusing (like I did the first time I read it), this is how you set it up.
Let's say you want to automatically index some node properties; say, "name" and "job". Open up the /conf/neo4j.properties file and you should see something like this:
# Autoindexing
# Enable auto-indexing for nodes, default is false
#node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
#node_keys_indexable=name,age
You then have to edit the file to the following:
# Autoindexing
# Enable auto-indexing for nodes, default is false
node_auto_indexing=true
# The node property keys to be auto-indexed, if enabled
node_keys_indexable=name,job
Once this is done, in order for auto indexing to take effect, you'll have to restart neo4j. Also, as a side note, any currently existing nodes won't be auto indexed, which means you'll have to recreate them. If you don't want to start from scratch, here's some documentation on how to update them: http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-update-removal (I've never tried it).
Then you can start finding nodes like this:
start n=node:node_auto_index(name="Dovakiin"), or
start n=node:node_auto_index(job="Dragon Slayer")
Or, like this with the C# client:
Node<User> myNode = client.QueryIndex<User>("node_auto_index", IndexFor.Node, "name:Dovakiin").First();, or
Node<User> myNode = client.QueryIndex<User>("node_auto_index", IndexFor.Node, "job:Dragon Slayer").First();
You can do the same thing with relationships as well, as soon as you set it up in the /conf/neo4j.properties file. You do it exactly the same way as with nodes.
You must manually add the nodes to the index, something like:
client.indexRef1.addToIndex(nodeRef, 'name', 'Dovakiin')
client.indexRef2.addToIndex(nodeRef, 'job', 'Dragon Slayer')
There is also an automatic indexing feature in Neo4j in case you want the nodes to be added to the index automatically.

With mongodb and guids for the Id of documents what is efficient way to store the Guids to easily retrieve the actual Guid?

I'm running version 2.06 of Mongodb and version (1.5) of the C# driver supplied by 10Gen.
Each of my entities has an Id property setup as such...
[BsonId(IdGenerator = typeof(GuidGenerator))]
public Guid Id { get; set; }
The Id field is stored as Binary - 3:UuidLegacy. Because of how it is stored when I call ToJson() on an entity it returns the following javascript object for the Id.
_id : Object
$binary: "some values here"
$type: "03"
This is obviously because the data is being stored as Binary = 3:UuidLegacy. This makes sense.
I want to use the actual Guid in my Javascript code. How efficient would it be for MongoDB if I made my Id properties look like the following?
[BsonId(IdGenerator = typeof(GuidGenerator)),MongoDB.Bson.Serialization.Attributes.BsonRepresentation(BsonType.String)]
public Guid Id { get; set; }
This makes mongodb store my Id as a string. But how efficient is this really? I'm guessing the Binary format for my Id is better, but I really need the Guid.
How can I go from Binary - 3:uuidLegacy to the Guid I need in my json?
I guess another thought would be could I just use the $binary value that is sent to me? I use the Id to perform lookups and such as part of my query strings.
Working with GUIDs has a few pitfalls, mostly related to how to work with the binary representation in the mongo shell and also to historical accidents which resulted in different drivers storing GUIDs using different byte orders.
I used the following code to illustrate the issues:
var document = new BsonDocument { { "_id", Guid.NewGuid() }, { "x", 1 } };
collection.Drop();
collection.Insert(document);
Console.WriteLine("Inserted GUID: {0}", document["_id"].AsGuid);
which when I ran it output:
Inserted GUID: 2d25b9c6-6d30-4441-a360-47e7804c62be
when I display this in the mongo shell I get:
> var doc = db.test.findOne()
> doc
{ "_id" : BinData(3,"xrklLTBtQUSjYEfngExivg=="), "x" : 1 }
> doc._id.hex()
c6b9252d306d4144a36047e7804c62be
>
Notice that even when displayed as hex the byte order doesn't match the original GUID. That's the historical accident I was talking about. All the bytes are there, they're just in an unusual order thanks to Microsoft's implementation of Guid.ToByteArray().
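The byte order can be reproduced without a database. The sketch below uses only the .NET base class library and the GUID from the session above. The helper names are made up, and the round trip only applies to values written in the C# driver's legacy representation (binary subtype 3), which is what this answer describes.

```csharp
using System;
using System.Linq;

public static class LegacyGuidBytes
{
    // Hex dump of the bytes in the order Guid.ToByteArray() returns them.
    public static string ToHex(Guid guid) =>
        string.Concat(guid.ToByteArray().Select(b => b.ToString("x2")));

    // Base64 of the same bytes, which is what the shell shows inside BinData(3, ...).
    public static string ToBase64(Guid guid) =>
        Convert.ToBase64String(guid.ToByteArray());

    // Reverses the process: decode the base64 payload and rebuild the Guid.
    public static Guid FromBase64(string base64) =>
        new Guid(Convert.FromBase64String(base64));

    public static void Main()
    {
        var guid = Guid.Parse("2d25b9c6-6d30-4441-a360-47e7804c62be");

        Console.WriteLine(ToHex(guid));    // c6b9252d306d4144a36047e7804c62be
        Console.WriteLine(ToBase64(guid)); // xrklLTBtQUSjYEfngExivg==
        Console.WriteLine(FromBase64("xrklLTBtQUSjYEfngExivg=="));
        // 2d25b9c6-6d30-4441-a360-47e7804c62be
    }
}
```

The first three groups of the GUID come out byte-reversed while the last two groups keep their order, which is why the hex string in the shell does not read like the original GUID.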
To help you work with GUIDs in the mongo shell you could copy the following file of helper functions to the directory where mongo.exe is stored:
https://github.com/rstam/mongo-csharp-driver/blob/master/uuidhelpers.js
The file has some brief documentation comments at the top that you might find helpful. To make these functions available in the mongo shell you need to tell the mongo shell to read this file as it starts up. See the following sample session:
C:\mongodb\mongodb-win32-x86_64-2.0.6\bin>mongo --shell uuidhelpers.js
MongoDB shell version: 2.0.6
connecting to: test
type "help" for help
> var doc = db.test.findOne()
> doc
{ "_id" : BinData(3,"xrklLTBtQUSjYEfngExivg=="), "x" : 1 }
> doc._id.hex()
c6b9252d306d4144a36047e7804c62be
> doc._id.toCSUUID()
CSUUID("2d25b9c6-6d30-4441-a360-47e7804c62be")
>
You could also use another of the helper functions to query for the GUIDs:
> db.test.find({_id : CSUUID("2d25b9c6-6d30-4441-a360-47e7804c62be")})
{ "_id" : BinData(3,"xrklLTBtQUSjYEfngExivg=="), "x" : 1 }
>
As far as storing your GUIDs as strings, that's not an unheard of thing to do and it definitely makes viewing and querying the data in the mongo shell easier and avoids all the issues with different byte orders. The only disadvantage is that it uses more space (roughly double).
