I have just started with DynamoDB. I have a background in MongoDB and relational databases, and I am structuring my JSON more like a graph than a flat structure. For example:
[
{
"id": "1",
"title": "Castle on the hill",
"lyrics": "when i was six years old I broke my leg",
"artists": [
{
"name": "Ed Sheeran",
"sex": "male"
}
]
}
]
Suppose I want to find the item by the artist name 'Ed Sheeran'. The closest I have got is the following, and it does not match anything:
var request = new ScanRequest
{
TableName = "Table",
ProjectionExpression = "Id, Title, Artists",
ExpressionAttributeValues = new Dictionary<string,AttributeValue>
{
{ ":artist", new AttributeValue { M = new Dictionary<string, AttributeValue>
{
{ "Name", new AttributeValue { S = "Ed Sheeran" }}
}
}
}
},
ExpressionAttributeNames = new Dictionary<string, string>
{
{ "#artist", "Artists" },
},
FilterExpression = "#artist = :artist",
};
var result = await client.ScanAsync(request);
Most of the examples and tutorials I have seen so far treat DynamoDB like a table in a relational database, with a very flat design. Am I wrong to structure the JSON as above? Should Artists be in a separate table?
And if it can be done, how do I search by a value inside a complex type, as in the example above?
First of all, you should avoid the Scan operation in DynamoDB where you can, because it reads every item in the table. I would strongly recommend using Query instead. Have a look at this Stack Overflow question first.
If you want to search on an attribute, you can either make it part of the primary key (hash key, or hash key + sort key) or create an index on the attribute you want to query.
Depending on how you use the id attribute in your schema: if you never query on id, I would recommend a structure something like this:
[
{
"artist_name": "Ed Sheeran", // Hash Key
"id": "1", // Sort Key (Assuming id is unique and combination of HK+SK is unique)
"title": "Castle on the hill",
"lyrics": "when i was six years old I broke my leg",
"artists": [
{
"name": "Ed Sheeran",
"sex": "male"
}
]
}
]
Alternatively, if you also need to query on id and it has to be the hash key, you can create an index on the artist_name attribute and then query that index.
[
{
"artist_name": "Ed Sheeran", // GSI Hash key
"id": "1", // Table Hash key
"title": "Castle on the hill",
"lyrics": "when i was six years old I broke my leg",
"artists": [
{
"name": "Ed Sheeran",
"sex": "male"
}
]
}
]
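As a rough sketch of the index approach, the input for a Query against the GSI might look like the following. It is written in JavaScript as a plain object in the shape the low-level DynamoDB client accepts. The table name "Table" comes from the question; the index name "artist_name-index" and the helper name buildArtistQuery are assumptions made up for illustration.

```javascript
// Build the input for a DynamoDB Query against a global secondary index.
// Assumption: a GSI named "artist_name-index" exists with artist_name as its hash key.
function buildArtistQuery(artistName) {
  return {
    TableName: "Table",
    IndexName: "artist_name-index", // hypothetical index name
    KeyConditionExpression: "#artist = :artist",
    ExpressionAttributeNames: { "#artist": "artist_name" },
    ExpressionAttributeValues: { ":artist": { S: artistName } },
  };
}

console.log(JSON.stringify(buildArtistQuery("Ed Sheeran"), null, 2));
```

The same fields (IndexName, KeyConditionExpression and the two expression dictionaries) exist on the C# QueryRequest, so the object should translate fairly directly.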
In either case, it is not possible to query inside a nested object without using the Scan operation and then iterating over the results in code, which is what you have already tried.
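For completeness, the "iterate in code" step could look roughly like this once the items have been fetched. This is only a sketch in JavaScript over already-deserialized items; the second song in the sample data is made up for the example, and findByArtist is a hypothetical helper name.

```javascript
// Keep only the items whose nested "artists" list contains the given name.
// Items without an "artists" attribute are treated as having no artists.
function findByArtist(items, artistName) {
  return items.filter((item) =>
    (item.artists || []).some((artist) => artist.name === artistName)
  );
}

const songs = [
  {
    id: "1",
    title: "Castle on the hill",
    artists: [{ name: "Ed Sheeran", sex: "male" }],
  },
  {
    // made-up second item, only here to show that non-matches are dropped
    id: "2",
    title: "Some other song",
    artists: [{ name: "Someone Else", sex: "female" }],
  },
];

// Logs the titles of the matching songs.
console.log(findByArtist(songs, "Ed Sheeran").map((song) => song.title));
```

Keep in mind that this filtering happens after the read, so the scan still touches every item in the table.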
I am learning C# and I am trying to parse json/xml responses and check each and every key and value pair. For xml I am converting to json so I have only one function/script to work with both cases. My issue is that I am working with a wide range of json responses which are not similar and there may be arrays in some of the json response. I have tried accessing the "Count" of the json object as a way to check for arrays.
Note: The responses will vary. This example is for Products > Product > name, quantity and category. The next response will change and can be like Country > State > Cities and so on. I cannot rely on creating classes since all responses are going to be different. Plus I am working on automating it so it should be able to handle anything thrown at it.
Sample Json I am working with:
{
"products": {
"product": [
{
"name": "Dom quixote de La Mancha",
"quantity": "12",
"category": "Book"
},
{
"name": "Hamlet",
"quantity": "3",
"category": "Book"
},
{
"name": "War and Peace",
"quantity": "7",
"category": "Book"
},
{
"name": "Moby Dick",
"quantity": "14",
"category": "Book"
},
{
"name": "Forrest Gump",
"quantity": "16",
"category": "DVD"
}
]
}
}
The way I am accessing the count, name and value is as follows:
dynamic dyn = JsonConvert.DeserializeObject<dynamic>(jsonText);
foreach (JProperty property in dyn.Properties())
{
string propname = property.Name;
var propvalue = property.Value;
int count = property.Count;
}
Is there a way to access these without going through the foreach loop, like int count = dyn.Count? All I am getting from this is null instead of the actual values.
For the above example my end result will be like:
This response contains products> product> 5 x (name, quantity, category)
The QuickWatch for the object: [image: QuickWatch for dyn object]
Try to deserialize your JSON into JObject like below:
var jObject = JsonConvert.DeserializeObject<JObject>(jsonText);
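To illustrate the kind of traversal the question is after, here is a small sketch in JavaScript that walks an already-parsed JSON value and builds a summary string in the "products> product> 5 x (name, quantity, category)" style. It is a simplification under stated assumptions: describeShape is a hypothetical helper, only the first array element is inspected, and an object with several keys is reported as a flat list of its key names.

```javascript
// Summarize the shape of a parsed JSON value.
// Assumption: all elements of an array share the shape of the first element.
function describeShape(value) {
  if (Array.isArray(value)) {
    if (value.length === 0) return "0 x (empty)";
    const first = value[0];
    const inner =
      first !== null && typeof first === "object" ? describeShape(first) : typeof first;
    return `${value.length} x ${inner}`;
  }
  if (value !== null && typeof value === "object") {
    const keys = Object.keys(value);
    const only = keys.length === 1 ? value[keys[0]] : undefined;
    // A single key wrapping an object or array is treated as a path segment.
    if (only !== undefined && only !== null && typeof only === "object") {
      return `${keys[0]}> ${describeShape(only)}`;
    }
    return `(${keys.join(", ")})`;
  }
  return typeof value;
}

const response = JSON.parse(`{
  "products": {
    "product": [
      { "name": "Hamlet", "quantity": "3", "category": "Book" },
      { "name": "Moby Dick", "quantity": "14", "category": "Book" }
    ]
  }
}`);

console.log(describeShape(response));
// prints: products> product> 2 x (name, quantity, category)
```

With Json.NET the same recursion can be written over JToken, branching on JObject and JArray instead of the typeof checks used here.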
I need to update several thousand items every several minutes in Elastic and unfortunately reindexing is not an option for me. From my research the best way to update an item is using _update_by_query - I have had success updating single documents like so -
{
"query": {
"match": {
"itemId": {
"query": "1"
}
}
},
"script": {
"source": "ctx._source.field = params.updateValue",
"lang": "painless",
"params": {
"updateValue": "test"
}
}
}
var response = await Client.UpdateByQueryAsync<dynamic>(q => q
.Index("masterproducts")
.Query(q => x.MatchQuery)
.Script(s => s.Source(x.Script).Lang("painless").Params(x.Params))
.Conflicts(Elasticsearch.Net.Conflicts.Proceed)
);
Although this works it is extremely inefficient as it generates thousands of requests - is there a way in which I can update multiple documents with a matching ID in a single request? I have already tried Multiple search API which it would seem cannot be used for this purpose. Any help would be appreciated!
If possible, try to generalize your query.
Instead of targeting a single itemId, perhaps try using a terms query:
{
"query": {
"terms": {
"itemId": [
"1", "2", ...
]
}
},
"script": {
...
}
}
From the looks of it, your (seemingly simplified) script sets the same value regardless of the document ID / itemId, so a single terms query is enough.
If the script does indeed set different values based on the doc IDs / itemIds, you could make the params multi-value:
"params": {
"updateValue1": "test1",
"updateValue2": "test2",
...
}
and then dynamically access them:
...
def value_to_set = params['updateValue' + ctx._source['itemId']];
...
so the target doc is updated with the corresponding value.
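Putting the two pieces together, the request body could be generated from a map of itemId to new value. The following is a sketch in JavaScript that only builds the body; buildUpdateByQuery is a hypothetical helper, and the field name "field" and the "updateValue" prefix are taken from the snippets above.

```javascript
// Build one _update_by_query body that covers many itemIds at once.
// valuesByItemId maps each itemId to the value that document should receive.
function buildUpdateByQuery(valuesByItemId) {
  const itemIds = Object.keys(valuesByItemId);
  const params = {};
  for (const id of itemIds) {
    params["updateValue" + id] = valuesByItemId[id];
  }
  return {
    query: { terms: { itemId: itemIds } },
    script: {
      lang: "painless",
      // Each document looks up its own value by its itemId.
      source: "ctx._source.field = params['updateValue' + ctx._source['itemId']]",
      params,
    },
  };
}

const body = buildUpdateByQuery({ "1": "test1", "2": "test2" });
console.log(JSON.stringify(body, null, 2));
```

This assumes itemId is stored as a string, or at least as a value that concatenates cleanly into the parameter name.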
There is a series of JSON objects called "issues", which each have one or more "issue links", which have the following format:
// an issue link
{
"id": "000000",
"self": "some link",
"type": {
"id": "0000",
"name": "some name",
"inward": "is met by",
"outward": "meets",
"self": "some link"
},
"outwardIssue": {
"id": "000000",
"key": "the required key",
"self": "some link",
"fields": {
// the remainder is not applicable
}
}
}
These "issue links" have been extracted as follows. Create a JArray for the JSON for the "issue" itself, and extract the child JObjects:
public void Deserialize(dynamic jsonObject)
{
// get the issue links
if (jsonObject["fields"]["issuelinks"]!=null)
{
JArray issueLinksArray = jsonObject["fields"]["issuelinks"];
var issueLinkObjects = issueLinksArray.Children();
foreach (var issueLink in issueLinkObjects)
{
// now need the "key" in the "outwardIssue" for this object, if the value of "inward" is "is met by".
}
}
}
How to go about extracting the value of the second property "key" of "outwardIssue"?
I am not sure whether I fully understand, but the following excerpt gets you the value (or null if the condition is not met):
var key = issueLink["type"]["inward"].ToString()=="is met by" ? issueLink["outwardIssue"]["key"]: null;
Hint: Try to avoid dynamic.
Nowadays loops can, in some situations, be considered old-school. Think LINQ: the problem can be divided into smaller problems and spread across multiple lines (one thinking step per line).
The additional variables can improve readability. As a project grows, for loops tend to span more and more lines. So if you just need the requested values, the following might be of interest:
var inwardLinks = issueLinkObjects.Where(i=>i["type"]["inward"].ToString()=="is met by");
var keys = inwardLinks.Select(iwl=>iwl["outwardIssue"]["key"]);
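The same filter-then-project idea, sketched in JavaScript over plain parsed objects, with a guard for links that have no outwardIssue. The sample keys "ABC-1" and "ABC-2" and the helper name keysMetBy are made up for illustration.

```javascript
// Collect the outwardIssue keys of all links whose inward type is "is met by".
function keysMetBy(issueLinks) {
  return issueLinks
    .filter((link) => link.type && link.type.inward === "is met by")
    .map((link) => link.outwardIssue?.key)
    .filter((key) => key !== undefined && key !== null);
}

const issueLinks = [
  { type: { inward: "is met by" }, outwardIssue: { key: "ABC-1" } }, // hypothetical key
  { type: { inward: "is blocked by" }, outwardIssue: { key: "ABC-2" } },
  { type: { inward: "is met by" } }, // no outwardIssue, so it is skipped
];

console.log(keysMetBy(issueLinks));
```

The final filter matters because a link may carry an inwardIssue instead of an outwardIssue, in which case indexing straight into it would fail.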
I have documents that look something like this, with a unique index on bars.name:
{ name: 'foo', bars: [ { name: 'qux', somefield: 1 } ] }
I want to either update the sub-document matching { name: 'foo', 'bars.name': 'qux' } with $set: { 'bars.$.somefield': 2 }, or create a new sub-document { name: 'qux', somefield: 2 } under { name: 'foo' }.
Is it possible to do this using a single query with upsert, or will I have to issue two separate ones?
Related: 'upsert' in an embedded document (suggests to change the schema to have the sub-document identifier as the key, but this is from two years ago and I'm wondering if there are better solutions now.)
No, there isn't really a better solution to this, so perhaps an explanation will help.
Suppose you have a document in place that has the structure as you show:
{
"name": "foo",
"bars": [{
"name": "qux",
"somefield": 1
}]
}
If you do an update like this
db.foo.update(
{ "name": "foo", "bars.name": "qux" },
{ "$set": { "bars.$.somefield": 2 } },
{ "upsert": true }
)
Then all is fine, because a matching document was found. But if you change the value of "bars.name":
db.foo.update(
{ "name": "foo", "bars.name": "xyz" },
{ "$set": { "bars.$.somefield": 2 } },
{ "upsert": true }
)
Then you will get a failure. The only thing that has really changed here is that in MongoDB 2.6 and above the error is a little more succinct:
WriteResult({
"nMatched" : 0,
"nUpserted" : 0,
"nModified" : 0,
"writeError" : {
"code" : 16836,
"errmsg" : "The positional operator did not find the match needed from the query. Unexpanded update: bars.$.somefield"
}
})
That is better in some ways, but you really do not want to "upsert" anyway. What you want to do is add the element to the array where the "name" does not currently exist.
So what you really want is the "result" from the update attempt without the "upsert" flag to see if any documents were affected:
db.foo.update(
{ "name": "foo", "bars.name": "xyz" },
{ "$set": { "bars.$.somefield": 2 } }
)
Yielding in response:
WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })
So when no documents were matched (nMatched is 0), you know you need to issue the following update:
db.foo.update(
{ "name": "foo" },
{ "$push": { "bars": {
"name": "xyz",
"somefield": 2
}}}
)
There really is no other way to do exactly what you want. As the additions to the array are not strictly a "set" type of operation, you cannot use $addToSet combined with the "bulk update" functionality there, so that you can "cascade" your update requests.
In this case it seems like you need to check the result, or otherwise accept reading the whole document and checking whether to update or insert a new array element in code.
If you don't mind changing the schema a bit and having a structure like this:
{ "name": "foo", "bars": { "qux": { "somefield": 1 },
"xyz": { "somefield": 2 }
}
}
You can perform your operations in one go.
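With the keyed structure, the single operation would presumably be a $set on a dotted path combined with upsert. Here is a sketch in JavaScript that only builds the three arguments for such an update; buildKeyedUpsert is a hypothetical helper, and it assumes the sub-document name contains no dots and does not start with a dollar sign.

```javascript
// Build filter, update and options for upserting bars.<barName> in one call.
// Assumption: barName is safe to use as a field name (no "." and no leading "$").
function buildKeyedUpsert(docName, barName, fields) {
  const set = {};
  for (const [field, value] of Object.entries(fields)) {
    set[`bars.${barName}.${field}`] = value;
  }
  return {
    filter: { name: docName },
    update: { $set: set },
    options: { upsert: true },
  };
}

const spec = buildKeyedUpsert("foo", "xyz", { somefield: 2 });
console.log(JSON.stringify(spec));
// In the shell this would be passed as:
// db.foo.update(spec.filter, spec.update, spec.options)
```

Because $set creates missing intermediate fields, the same call covers both the "update existing" and the "create new" case.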
Reiterating 'upsert' in an embedded document for completeness
I was digging for the same feature, and found that in version 4.2 or above, MongoDB provides a new feature called Update with aggregation pipeline.
This feature, if used with some other techniques, makes it possible to achieve an upsert subdocument operation with a single query.
It's a very verbose query, but I believe that if you know you won't have too many records in the sub-collection, it's viable. Here's an example of how to achieve this:
const documentQuery = { _id: '123' }
const subDocumentToUpsert = { name: 'xyz', id: '1' }
collection.update(documentQuery, [
{
$set: {
sub_documents: {
$cond: {
if: { $not: ['$sub_documents'] },
then: [subDocumentToUpsert],
else: {
$cond: {
if: { $in: [subDocumentToUpsert.id, '$sub_documents.id'] },
then: {
$map: {
input: '$sub_documents',
as: 'sub_document',
in: {
$cond: {
if: { $eq: ['$$sub_document.id', subDocumentToUpsert.id] },
then: subDocumentToUpsert,
else: '$$sub_document',
},
},
},
},
else: { $concatArrays: ['$sub_documents', [subDocumentToUpsert]] },
},
},
},
},
},
},
])
There's a way to do it in two queries - but it will still work in a bulkWrite.
This is relevant because in my case not being able to batch it is the biggest hangup. With this solution, you don't need to collect the result of the first query, which allows you to do bulk operations if you need to.
Here are the two successive queries to run for your example:
// Update subdocument if existing
collection.updateMany({
name: 'foo', 'bars.name': 'qux'
}, {
$set: {
'bars.$.somefield': 2
}
})
// Insert subdocument otherwise
collection.updateMany({
name: 'foo', 'bars.name': { $ne: 'qux' }
}, {
$push: {
bars: {
somefield: 2, name: 'qux'
}
}
})
This also has the added benefit of not having corrupted data / race conditions if multiple applications are writing to the database concurrently. You won't risk ending up with two bars: {somefield: 2, name: 'qux'} subdocuments in your document if two applications run the same queries at the same time.
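Since the point of this approach is batching, here is a sketch of how the two steps might be generated as bulkWrite operations. The helper only builds the operations array and uses $ne for the "sub-document not present" test; buildSubdocUpsertOps is a hypothetical name, and the sub-document is assumed to have at least one field besides its key.

```javascript
// Build the two ordered updateOne operations for one sub-document upsert.
// Assumption: subdoc has at least one field other than keyField, otherwise $set would be empty.
function buildSubdocUpsertOps(parentFilter, arrayField, keyField, subdoc) {
  const key = subdoc[keyField];
  const keyPath = `${arrayField}.${keyField}`;
  const set = {};
  for (const [field, value] of Object.entries(subdoc)) {
    if (field !== keyField) set[`${arrayField}.$.${field}`] = value;
  }
  return [
    // 1. Update the sub-document in place if it already exists.
    { updateOne: { filter: { ...parentFilter, [keyPath]: key }, update: { $set: set } } },
    // 2. Otherwise push it; the $ne filter makes this a no-op after step 1 matched.
    {
      updateOne: {
        filter: { ...parentFilter, [keyPath]: { $ne: key } },
        update: { $push: { [arrayField]: subdoc } },
      },
    },
  ];
}

const ops = buildSubdocUpsertOps({ name: "foo" }, "bars", "name", { name: "qux", somefield: 2 });
console.log(JSON.stringify(ops, null, 2));
// Usage with the Node driver would be: await collection.bulkWrite(ops)
```

The operations rely on running in order, which is the default for bulkWrite; operations for several sub-documents can be concatenated into one array.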
Starting from a JObject, I can get the array that interests me:
JArray partial = (JArray)rssAlbumMetadata["tracks"]["items"];
First question: "partial" contains a lot of attributes I'm not interested in.
How can I get only what I need?
Second question: once I have succeeded in the first task, I'll get a JArray of duplicated items. How can I get only the unique ones?
The result should be something like
{
'composer': [
{
'id': '51523',
'name': 'Modest Mussorgsky'
},
{
'id': '228918',
'name': 'Sergey Prokofiev'
}
]
}
Let me start from something like:
[
{
"id": 32837732,
"composer": {
"id": 245,
"name": "George Gershwin"
},
"title": "Of Thee I Sing: Overture (radio version)"
},
{
"id": 32837735,
"composer": {
"id": 245,
"name": "George Gershwin"
},
"title": "Concerto in F : I. Allegro"
},
{
"id": 32837739,
"composer": {
"id": 245,
"name": "George Gershwin"
},
"title": "Concerto in F : II. Adagio"
}
]
First question:
How can I get only what I need?
There is no magic: you need to read the whole JSON string and then query the resulting object to find what you are looking for. It is not possible to read only part of the JSON, if that is what you need. You have not provided an example of what the data looks like, so it is not possible to say exactly how to query it.
The second question, which I guess is: how do I de-duplicate the contents of an array of objects?
Again, I do not have a full view of your objects, but this example should show the idea, using LINQ as you requested:
var items = new []{new {id=1, name="ali"}, new {id=2, name="ostad"}, new {id=1, name="ali"}};
var dedup = items.GroupBy(x=> x.id).Select(y => y.First()).ToList();
foreach (var item in dedup) Console.WriteLine(item); // print each unique item, not the list's type name
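Applied to the track list from the question, both steps (picking out only the composer and keeping each composer once) could be combined as follows. This is a sketch in JavaScript over the parsed array rather than Json.NET code; uniqueComposers is a hypothetical helper, and it assumes composers with the same id are the same composer.

```javascript
// Project each track to its composer and keep the first occurrence of each composer id.
function uniqueComposers(tracks) {
  const byId = new Map();
  for (const track of tracks) {
    const composer = track.composer;
    if (composer && !byId.has(composer.id)) {
      byId.set(composer.id, { id: composer.id, name: composer.name });
    }
  }
  return { composer: [...byId.values()] };
}

const tracks = [
  { id: 32837732, composer: { id: 245, name: "George Gershwin" }, title: "Of Thee I Sing: Overture (radio version)" },
  { id: 32837735, composer: { id: 245, name: "George Gershwin" }, title: "Concerto in F : I. Allegro" },
  { id: 32837739, composer: { id: 245, name: "George Gershwin" }, title: "Concerto in F : II. Adagio" },
];

console.log(JSON.stringify(uniqueComposers(tracks)));
// prints: {"composer":[{"id":245,"name":"George Gershwin"}]}
```

In C# the equivalent shape is a Select on the "composer" token followed by the GroupBy and First combination shown above.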