Query items in a dictionary in a CosmosDB document - c#

I have documents like this in my CosmosDB database:
{
"id": "12345",
"filename": "foo.txt",
"versions": {
"1": {
"storageAccount": "blob123",
"size": 33
},
"2": {
"storageAccount": "blob123",
"size": 42
}
}
}
(this is a simplified sample)
I need to query on the "storageAccount" property, to check if there are files stored on a given storage account. But I can't find a way to express "for each version".
I tried this, but of course it doesn't work
select top 1 *
from c
join v in c.versions
where v.storageAccount = 'blob123'
Apparently JOIN only works on arrays, not dictionaries. Is there a way to query items in a dictionary?
As a workaround, I can use a UDF, but the performance and cost are terrible (1200 RUs for just 2000 documents when there is no matching document...)
EDIT: updated to more closely reflect actual use case

Unfortunately, this isn't possible today. You cannot iterate over object keys in Cosmos's SQL.
I'd recommend changing the schema to something like:
{
"id": "12345",
"filename": "foo.txt",
"versions": [
{
"id": "1",
"storageAccount": "blob123",
"size": 33
},
{
"id": "2",
"storageAccount": "blob123",
"size": 42
}
]
}
Alternatively, you could use a User Defined Function that returns the keys of an object for you, but that will increase your RU costs, though possibly less than a stored procedure would.
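To make the suggested schema concrete, here is a minimal sketch, assuming "versions" has been converted to an array as shown above. The type and member names (FileVersion, FileDocument, VersionQueries) are hypothetical and only illustrate the shape. The query text projects c.id rather than *, since, as far as I know, Cosmos DB rejects SELECT * once a JOIN is present. The LINQ method is just an in-memory equivalent for checking the intended semantics, not a call into the SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical types mirroring the restructured document (illustrative names).
public record FileVersion(string Id, string StorageAccount, long Size);
public record FileDocument(string Id, string Filename, List<FileVersion> Versions);

public static class VersionQueries
{
    // With "versions" stored as an array, JOIN can iterate over it.
    // @account is a query parameter supplied by the caller.
    public const string FirstDocOnAccountSql =
        "SELECT TOP 1 c.id FROM c JOIN v IN c.versions WHERE v.storageAccount = @account";

    // In-memory equivalent of the query above: the first document that has
    // at least one version stored on the given account, or null if none.
    public static FileDocument? FirstOnAccount(IEnumerable<FileDocument> docs, string account) =>
        docs.FirstOrDefault(d => d.Versions.Any(v => v.StorageAccount == account));
}
```

The query string would be passed to whichever SDK query API you already use, with @account bound as a parameter.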

Related

How to search in dynamodb with a complex value using .net sdk

I have just started with DynamoDB. I have a background in MongoDB and relational databases, and I am structuring my JSON more like a graph than a flat structure. For example,
[
{
"id": "1",
"title": "Castle on the hill",
"lyrics": "when i was six years old I broke my leg",
"artists": [
{
"name": "Ed Sheeran",
"sex": "male"
}
]
}
]
Suppose I would like to search for the item by 'Ed Sheeran'. The closest I have got is this, and it does not match any value.
var request = new ScanRequest
{
TableName = "Table",
ProjectionExpression = "Id, Title, Artists",
ExpressionAttributeValues = new Dictionary<string,AttributeValue>
{
{ ":artist", new AttributeValue { M = new Dictionary<string, AttributeValue>
{
{ "Name", new AttributeValue { S = "Ed Sheeran" }}
}
}
}
},
ExpressionAttributeNames = new Dictionary<string, string>
{
{ "#artist", "Artists" },
},
FilterExpression = "#artist = :artist",
};
var result = await client.ScanAsync(request);
Most of the examples and tutorials I have seen so far treat DynamoDB like a table in a normal relational database, with a very flat design. Am I doing it wrong to structure the JSON as above? Should Artists be in a separate table?
And if it can be done, how do I search by some value in a complex type like in the above example?
First of all, you should not be using the scan operation in DynamoDB. I would strongly recommend using query instead. Have a look at this Stack Overflow question first.
If you want to search on any attribute, you can either mark them as the primary key (either hash_key or hash_key + sort_key) or create an index on the field you want to query on.
Depending on how the id attribute is used in your schema, if you never query on id, I would recommend a structure something like this:
[
{
"artist_name": "Ed Sheeran", // Hash Key
"id": "1", // Sort Key (Assuming id is unique and combination of HK+SK is unique)
"title": "Castle on the hill",
"lyrics": "when i was six years old I broke my leg",
"artists": [
{
"name": "Ed Sheeran",
"sex": "male"
}
]
}
]
Alternatively, if you also need to query on id and it has to be the hash key, you can create an index on the artist_name attribute and then query it.
[
{
"artist_name": "Ed Sheeran", // GSI Hash key
"id": "1", // Table Hash key
"title": "Castle on the hill",
"lyrics": "when i was six years old I broke my leg",
"artists": [
{
"name": "Ed Sheeran",
"sex": "male"
}
]
}
]
In either case, it is not possible to query inside a nested object without using a scan operation and then iterating over the results in code, which is what you have already tried.

DocumentDB Stored Procedure Continuation

I'm trying out DocumentDB as a possible data store for a new application. The app has to handle a lot of data so I used the Data Migration tool to put a lot of documents into a collection.
Most of the queries from my app will be aggregating and summing. So I'm using documentdb-lumenize. The code sample for calling that stored procedure from C# has me doing something like this:
var configString = @"{
cubeConfig: {
groupBy: 'year',
field: 'Amount',
f: 'sum'
},
filterQuery: 'SELECT * FROM TestLargeData t'
}";
var config = JsonConvert.DeserializeObject<object>(configString);
var result = await _client.ExecuteStoredProcedureAsync<dynamic>("my/sproc/link", config);
The result I get back looks like this:
{
"cubeConfig": {
"groupBy": "year",
"field": "Amount",
"f": "sum"
},
"filterQuery": "SELECT * FROM TestLargeData t",
"continuation": "-RID:rOtjAPc4TgBxFwAAAAAAAA==#RT:6#TRC:6000",
"stillQueueing": false,
"savedCube": {
"config": {
"groupBy": "year",
"field": "Amount",
"f": "sum"
},
"cellsAsCSVStyleArray": [
[
"year",
"_count",
"Amount_sum"
],
[
2006,
4825,
1391399555.74
],
[
2007,
1175,
693886378
]
],
"summaryMetrics": {}
},
"example": {
"year": 2007,
"SomeOtherField1": "SomeOtherValue1",
"SomeOtherField2": "SomeOtherValue2",
"Amount": 12000,
"id": "0ee80b66-7fa7-40c1-9124-292c01059562",
"_rid": "...",
"_self": "...",
"_etag": "\"...\"",
"_attachments": "attachments/",
"_ts": ...
}
}
The _count values indicate that I got back 6,000 documents worth of aggregated data. There are a million documents in the collection (I wanted to test big!)
I see the "continuation" value in the result. But StoredProcedureResponse doesn't have an ExecuteNextAsync method like the DocumentQuery class does. How would I use the DocumentDB API to request the next part of the data?
I'm the author of documentdb-lumenize. If you just send back in what's returned as the only parameter, then the documentdb-lumenize sproc will know how to deal with the continuation token. You'll have to keep calling it until the continuation token comes back empty.
That said, I'm really surprised it only did 6000 in one round trip. I generally get 20-50K per round trip. Maybe you have a lower spec'd collection? Maybe it's doing an index-less full-scan?
Submit an issue in the GitHub repo if you want more 1:1 help with this.
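The "keep calling it until the continuation token comes back empty" advice can be sketched as a small loop. This is only a sketch under stated assumptions: executeSproc is a hypothetical delegate standing in for the real call (for example a wrapper around ExecuteStoredProcedureAsync), and the result is modelled with System.Text.Json.Nodes rather than dynamic so the example stays self-contained.

```csharp
using System;
using System.Text.Json.Nodes;
using System.Threading.Tasks;

public static class SprocPaging
{
    // executeSproc is a hypothetical stand-in for the actual stored
    // procedure call; it takes the sproc's single parameter and returns
    // the sproc's response body.
    public static async Task<JsonNode> RunToCompletionAsync(
        JsonNode config, Func<JsonNode, Task<JsonNode>> executeSproc)
    {
        JsonNode state = config;
        while (true)
        {
            // Send back whatever the previous call returned, unchanged.
            state = await executeSproc(state);

            // A missing, null, or empty token means there is nothing left to process.
            string? continuation = state["continuation"]?.GetValue<string>();
            if (string.IsNullOrEmpty(continuation))
                return state;
        }
    }
}
```

Each round trip consumes RUs, so in practice you may also want a cap on the number of iterations or a delay between calls.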

Getting distinct values using NEST ElasticSearch client

I'm building a product search engine with Elasticsearch in my .NET application, using the NEST client, and there is one thing I'm having trouble with: getting a distinct set of values.
I'm searching for products, of which there are many thousands, but of course I can only return 10 or 20 at a time to the user. Paging works fine for this. But besides this primary result, I want to show my users a list of the brands found within the complete search, to present these for filtering.
I have read that I should use Terms Aggregations for this, but I couldn't get anything better than the following. It still doesn't really give me what I want, because it splits values like "20th Century Fox" into 3 separate values.
var brandResults = client.Search<Product>(s => s
.Query(query)
.Aggregations(a => a.Terms("my_terms_agg", t => t.Field(p => p.BrandName).Size(250))
)
);
var agg = brandResults.Aggs.Terms("my_terms_agg");
Is this even the right approach? Or should I use something totally different? And how can I get the correct, complete values? (Not split by space... but I guess that is what you get when you ask for a list of 'Terms'?)
What I'm looking for is what you would get if you did this in MS SQL:
SELECT DISTINCT BrandName FROM [Table To Search] WHERE [Where clause without paging]
You are correct that what you want is a terms aggregation. The problem you're running into is that ES is splitting the field "BrandName" in the results it is returning. This is the expected default behavior of a field in ES.
What I recommend is that you change BrandName into a "Multifield". This will allow you to search on all the various parts, as well as run a terms aggregation on the "Not Analyzed" (i.e. the full "20th Century Fox") term.
Here is the documentation from ES.
https://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html
[UPDATE]
If you are using ES version 1.4 or newer the syntax for multi-fields is a little different now.
https://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_multi_fields.html#_multi_fields
Here is a full working sample that illustrates the point in ES 1.4.4. Note that the mapping specifies a "not_analyzed" version of the field.
PUT hilden1
PUT hilden1/type1/_mapping
{
"properties": {
"brandName": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
POST hilden1/type1
{
"brandName": "foo"
}
POST hilden1/type1
{
"brandName": "bar"
}
POST hilden1/type1
{
"brandName": "20th Century Fox"
}
POST hilden1/type1
{
"brandName": "20th Century Fox"
}
POST hilden1/type1
{
"brandName": "foo bar"
}
GET hilden1/type1/_search
{
"size": 0,
"aggs": {
"analyzed_field": {
"terms": {
"field": "brandName",
"size": 10
}
},
"non_analyzed_field": {
"terms": {
"field": "brandName.raw",
"size": 10
}
}
}
}
Results of the last query:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 5,
"max_score": 0,
"hits": []
},
"aggregations": {
"non_analyzed_field": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "20th Century Fox",
"doc_count": 2
},
{
"key": "bar",
"doc_count": 1
},
{
"key": "foo",
"doc_count": 1
},
{
"key": "foo bar",
"doc_count": 1
}
]
},
"analyzed_field": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "20th",
"doc_count": 2
},
{
"key": "bar",
"doc_count": 2
},
{
"key": "century",
"doc_count": 2
},
{
"key": "foo",
"doc_count": 2
},
{
"key": "fox",
"doc_count": 2
}
]
}
}
}
Notice that the not-analyzed field keeps "20th Century Fox" and "foo bar" together, whereas the analyzed field breaks them up.
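To see why the two aggregations produce different buckets, here is a small in-memory imitation of the behaviour. It is only a sketch: the Analyze helper is a rough approximation of the default analyzer (lowercasing and splitting on spaces), not what Elasticsearch actually runs, and TermsDemo is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TermsDemo
{
    // Rough imitation of an analyzed string field: lowercase, split on spaces.
    static IEnumerable<string> Analyze(string value) =>
        value.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries);

    // Counts documents per term, similar to doc_count in a terms aggregation.
    // analyzed = true  -> one bucket per token (a document counts once per token)
    // analyzed = false -> one bucket per whole field value
    public static Dictionary<string, int> TermsAgg(IEnumerable<string> docs, bool analyzed) =>
        docs.SelectMany(d => analyzed ? Analyze(d).Distinct() : new[] { d })
            .GroupBy(t => t)
            .ToDictionary(g => g.Key, g => g.Count());
}
```

Fed the five sample brand names from the example above, the analyzed variant yields buckets such as "20th", "century" and "fox", while the non-analyzed variant keeps "20th Century Fox" as a single bucket.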
I had a similar issue. I was displaying search results and wanted to show counts on the category and sub category.
You're right to use aggregations. I also had the issue with the strings being tokenised (i.e. 20th century fox being split) - this happens because the fields are analysed. For me, I added the following mappings (i.e. tell ES not to analyse that field):
"category": {
"type": "nested",
"properties": {
"CategoryNameAndSlug": {
"type": "string",
"index": "not_analyzed"
},
"SubCategoryNameAndSlug": {
"type": "string",
"index": "not_analyzed"
}
}
}
As jhilden suggested, if you use this field for more than one purpose (e.g. search and aggregation) you can set it up as a multifield. That way one version is analysed and used for searching, and the other is left unanalysed and used for aggregation.

Asp.Net Extracting data from deep within a dictionary<string,object>-

Relative newbie here with a small question. I have been extracting a JSON string that looks like this (in this case, a modified response from Facebook OAuth2):
{"id":"555555555555555","name":"Monkey
Man","last_name":"Man","first_name":"Monkey","email":"test\u0040someaccount.com","location":{"id":"555555555555555","name":"Jungle,
North
Carolina"},"gender":"male","work":[{"employer":{"id":"555555555555555","name":"Big
Boss makes me work"}:"projects":{"current":"doing stuff",
"previous":"other
stuff"},"location":{"id":"555555555555555","name":"Jungle, North
Carolina"},"position":{"id":"555555555555555","name":"IT
monkey"},"start_date":"2010-09"}],"picture":"http://profile.ak.fbcdn.net/static-ak/rsrc.php/v1/yo/r/5555555-555.gif"}
I am able to extract everything into a dictionary by using the following code:
JavaScriptSerializer ser = new JavaScriptSerializer();
Dictionary<string, object> dict = ser.Deserialize<Dictionary<string,object>>(json);
I then extract the data from the dictionary as follows and store it in an object called contact, which is pretty much just a collection of strings.
if (d.ContainsKey("email"))
{
c.email = d["email"].ToString();
}
else
c.email = "";
I did it this way because I am not guaranteed that all the fields will be there.
If the value is itself an object, such as with the location, I use a modified version of the code (thanks to the guy who showed me how to do that), like the following.
c.location = (d["location"] as Dictionary<string, object>)["name"].ToString();
Now comes the difficult part that I am stuck on.
I am trying to extract the employer name "Big Boss makes me work" from the following part of the string...
"work":[{"employer":{"id":"555555555555555","name":"Big Boss makes me
work"}:"projects":{"current":"doing stuff", "previous":"other
stuff"},"location":{"id":"555555555555555","name":"Jungle, North
Carolina"},"position":{"id":"555555555555555","name":"IT
monkey"},"start_date":"2010-09"}]
The data is stored in an array nested inside other objects, and I have no idea how to get to the information to extract it, or even how to extract information like this from live OAuth2...
"addresses": { "personal": { "street": null, "street_2": null, "city":
"Jungle", "state": "NC", "postal_code": "28677", "region": "United
States" }, "business": { "street": "Tree Street", "street_2": null,
"city": "Jungle", "state": "NC", "postal_code": "28677", "region":
"United States" } }
As you can see, this goes three levels deep, so my (d["location"] as Dictionary<string, object>)["name"].ToString(); is pretty useless here. How would you go about getting, say, the street name from this?
I hope my questions aren't too vague or random. I just need some advice on properly extracting data from the dictionary objects. The ways I have come up with involve editing the JSON string, and that causes all sorts of problems, as I just don't understand the dictionary object well enough to figure this out on my own.
Thanks
Running your JSON through jsonlint.com (and correcting it slightly), it looks like this formatted:
{
"id": "555555555555555",
"name": "Monkey Man",
"last_name": "Man",
"first_name": "Monkey",
"email": "test@someaccount.com",
"location": {
"id": "555555555555555",
"name": "Jungle, North Carolina"
},
"gender": "male",
"work": [
{
"employer": {
"id": "555555555555555",
"name": "Big Boss makes me work"
},
"projects": {
"current": "doing stuff",
"previous": "other stuff"
},
"location": {
"id": "555555555555555",
"name": "Jungle, North Carolina"
},
"position": {
"id": "555555555555555",
"name": "IT monkey"
},
"start_date": "2010-09"
}
],
"picture": "http://profile.ak.fbcdn.net/static-ak/rsrc.php/v1/yo/r/5555555-555.gif"
}
Your JSON data in this case just isn't really suitable to be serialized to a straightforward Dictionary object, so that's not really the way to go here.
The easier way is to create a C# class whose properties match those of the JavaScript object you're deserializing. Then deserialize the JSON into that class, and the "Big Boss makes me work" value should be available at objectFromJson.work[0].employer.name.
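A minimal sketch of that typed approach follows. Note the assumptions: it uses System.Text.Json instead of the JavaScriptSerializer from the question, only a handful of fields are modelled, and the class names (NamedRef, WorkEntry, Profile, ProfileParser) are hypothetical. Fields that are absent from the JSON are simply left null, which covers the "not guaranteed to be there" concern.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical classes covering only part of the profile JSON.
public class NamedRef
{
    public string? Id { get; set; }
    public string? Name { get; set; }
}

public class WorkEntry
{
    public NamedRef? Employer { get; set; }
    public NamedRef? Position { get; set; }
}

public class Profile
{
    public string? Email { get; set; }
    public NamedRef? Location { get; set; }
    public List<WorkEntry>? Work { get; set; }
}

public static class ProfileParser
{
    // JSON keys are lower-case ("employer"), C# properties are PascalCase.
    static readonly JsonSerializerOptions Options =
        new JsonSerializerOptions { PropertyNameCaseInsensitive = true };

    public static Profile Parse(string json) =>
        JsonSerializer.Deserialize<Profile>(json, Options) ?? new Profile();

    // Null-safe: returns "" when "work" or "employer" is missing.
    public static string FirstEmployerName(Profile p) =>
        p.Work?.FirstOrDefault()?.Employer?.Name ?? "";
}
```

The nested "addresses" object from the question could be modelled the same way, with one class per level, so that the street becomes something like profile.Addresses.Business.Street.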

Couchbase View equivalent of Select count(doc.id) from doc where doc.productId in (select productId from doc where doc.lastupdated between x and y)

I'm trying to count the number of 'comments' related to a product in a Couchbase bucket. That part is easy for a "full" set of data: it's just a simple map/reduce. Things get tricky when I want to limit it to only products that have had changes within a date range. I can do this as two different Views in CB: one that gets the product IDs where dateCreated falls within my range, and one that I pass these IDs to and that calculates my stats. The performance of this approach is horrible, though. The keys for the second query aren't necessarily contiguous, so I can't do a start/end on them. I'm using the .NET 2.2 client for Couchbase 4.x.
I'm open to any options, i.e. a super-awesome-do-it-all-in-one-call View, or keeping the two-view approach if the client has some capacity for bulk gets against non-contiguous keys in a View (I can't find anything on this topic).
Here's my simplified example schema:
{
"comment": {
"key": "key1",
"title": "yay",
"productId": "product1",
"dateCreated": "2016,11,30"
},
"comment": {
"key": "key2",
"title": "booo",
"productId": "product1",
"dateCreated": "2016,12,30"
}
}
Not sure if this is what you want (also not sure about how to translate this to C#), but say you have two documents with ids comment::1 and comment::2 and a Couchbase document for each in this format.
{
"key": "key2",
"title": "booo",
"productId": "product1",
"dateCreated": "2016,12,30"
}
You can define a view (let's call it comments_by_time)
Map
function (doc, meta) {
if (doc.dateCreated) {
var dateParts = doc.dateCreated.split(",");
dateParts = dateParts.map(Number);
emit(dateParts, doc.productId);
}
}
Reduce
_count
Then, you can use the View Query API to do a startKey and endKey range on your documents.
End point
http://<couchbase>:8092/<bucket>/_design/<view>/_view/comments_by_time
Get count of all comments
?reduce=true
{"rows":[ {"key":null,"value":2} ] }
Get documents before a date
?reduce=false&endkey=[2016,12,1]
{"total_rows":2,"rows":[
{"id":"comment::1","key":[2016,11,30],"value":"product1"}
]
}
Between dates
?reduce=false&startkey=[2016,12,1]&endkey=[2017,1,1]
{"total_rows":2,"rows":[
{"id":"comment::2","key":[2016,12,30],"value":"product1"}
]
}
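Since the answer leaves the C# side open, here is a small sketch of how the startkey/endkey query strings above could be built from DateTime values. It only assembles the URL; sending the request is left to whichever HTTP or Couchbase client you use. ViewUrls and its method names are hypothetical, and viewUrl is assumed to be the full view endpoint shown above.

```csharp
using System;

public static class ViewUrls
{
    // Formats a date as the [year,month,day] array key emitted by the map function.
    public static string DateKey(DateTime d) => $"[{d.Year},{d.Month},{d.Day}]";

    // Builds the "between dates" query against the view endpoint.
    // The keys are percent-encoded so that brackets and commas survive in a URL.
    public static string CommentsBetween(string viewUrl, DateTime start, DateTime end) =>
        $"{viewUrl}?reduce=false" +
        $"&startkey={Uri.EscapeDataString(DateKey(start))}" +
        $"&endkey={Uri.EscapeDataString(DateKey(end))}";
}
```

Because the emitted keys are numeric arrays, the range comparison happens element by element, which is what makes the year, month, day ordering work.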
