Read Azure DocumentDB document that might not exist - C#

I can query a single document from the Azure DocumentDB like this:
var response = await client.ReadDocumentAsync( documentUri );
If the document does not exist, this will throw a DocumentClientException. In my program I have a situation where the document may or may not exist. Is there any way to query for the document without using try-catch and without doing two round trips to the server, first to query for the document and second to retrieve the document should it exist?

Sadly there is no other way: either you handle the exception, or you make two calls. If you pick the second path, here is one performance-minded way of checking for document existence:
public bool ExistsDocument(string id)
{
    var client = new DocumentClient(DatabaseUri, DatabaseKey);
    var collectionUri = UriFactory.CreateDocumentCollectionUri("dbName", "collectionName");
    var query = client.CreateDocumentQuery<Microsoft.Azure.Documents.Document>(collectionUri, new FeedOptions { MaxItemCount = 1 });
    return query.Where(x => x.Id == id).Select(x => x.Id).AsEnumerable().Any(); // using LINQ
}
The client should be shared among all your DB-accessing methods, but I created it there to have a self-contained example.
The new FeedOptions { MaxItemCount = 1 } will make sure the query is optimized for a single result (we don't really need more).
The Select(x => x.Id) will make sure no other data is returned; if you don't specify it and the document exists, the query will return all of its data.
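For completeness, the first path (catching the exception) looks roughly like this; a minimal sketch assuming the client and documentUri from the question, with HttpStatusCode from System.Net:
try
{
    var response = await client.ReadDocumentAsync(documentUri);
    return response.Resource; // the Document
}
catch (DocumentClientException ex) when (ex.StatusCode == HttpStatusCode.NotFound)
{
    return null; // document does not exist
}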

You're specifically querying for a given document, and ReadDocumentAsync will throw that DocumentClientException when it can't find the specific document (returning a 404 in the status code). This is documented here. By catching the exception (and seeing that it's a 404), you wouldn't need two round trips.
To get around dealing with this exception, you'd need to make a query instead of a discrete read, by using CreateDocumentQuery(). Then, you'll simply get a result set you can enumerate through (even if that result set is empty). For example:
var collLink = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
var querySpec = new SqlQuerySpec { <querytext> };
var itr = client.CreateDocumentQuery(collLink, querySpec).AsDocumentQuery();
var response = await itr.ExecuteNextAsync<Document>();
foreach (var doc in response.AsEnumerable())
{
    // ...
}
With this approach, you'll simply get back an empty result set instead of an exception. In your specific case, where you'll be adding a WHERE clause to query a specific document by its id, you'll get either zero results or one result.
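For illustration, here is a hedged, concrete version of that query, looking up a single document by its id with a parameterized SqlQuerySpec (docId is an assumed variable):
var querySpec = new SqlQuerySpec
{
    QueryText = "SELECT * FROM c WHERE c.id = @id",
    Parameters = new SqlParameterCollection { new SqlParameter("@id", docId) }
};
var itr = client.CreateDocumentQuery(collLink, querySpec).AsDocumentQuery();
var response = await itr.ExecuteNextAsync<Document>();
var doc = response.AsEnumerable().FirstOrDefault(); // null when no match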

With the Cosmos DB SDK v3 it's possible. You can check whether an item exists in a container, and read it, by calling Container.ReadItemStreamAsync(string id, PartitionKey partitionKey) (note the stream variant is not generic) and checking response.StatusCode:
using var response = await container.ReadItemStreamAsync(id, new PartitionKey(key));

if (response.StatusCode == HttpStatusCode.NotFound)
{
    return null;
}

if (!response.IsSuccessStatusCode)
{
    throw new Exception(response.ErrorMessage);
}

using var streamReader = new StreamReader(response.Content);
var content = await streamReader.ReadToEndAsync();
var item = JsonConvert.DeserializeObject(content, stateType); // stateType is the item's runtime Type
This approach has a drawback, however. You need to deserialize the item by hand.
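One way to keep that manual deserialization in a single place is a small generic helper; the following is a minimal sketch (the helper name is mine, and it assumes Newtonsoft.Json plus the System.Net and System.IO namespaces):
public async Task<T> ReadItemOrDefaultAsync<T>(Container container, string id, string partitionKey)
    where T : class
{
    using var response = await container.ReadItemStreamAsync(id, new PartitionKey(partitionKey));
    if (response.StatusCode == HttpStatusCode.NotFound)
    {
        return null; // item does not exist
    }
    if (!response.IsSuccessStatusCode)
    {
        throw new Exception(response.ErrorMessage);
    }
    using var streamReader = new StreamReader(response.Content);
    var content = await streamReader.ReadToEndAsync();
    return JsonConvert.DeserializeObject<T>(content);
}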

Related

Elasticsearch ingest attachment plugin blocks

I am using NEST (C#) and the ingest attachment plugin to ingest tens of thousands of documents into an Elasticsearch instance. Unfortunately, after a while everything just stands still - i.e. no more documents are ingested. The log shows:
[2019-02-20T17:35:07,528][INFO ][o.e.m.j.JvmGcMonitorService] [BwAAiDl] [gc][7412] overhead, spent [326ms] collecting in the last [1s]
Not sure if this tells anyone anything? Btw, are there more efficient ways to ingest many documents (rather than using thousands of REST requests)?
I am using this kind of code:
client.Index(new Document
{
    Id = Guid.NewGuid(),
    Path = somePath,
    Content = Convert.ToBase64String(File.ReadAllBytes(somePath))
}, i => i.Pipeline("attachments"));
Define the pipeline:
client.PutPipeline("attachments", p => p
    .Description("Document attachment pipeline")
    .Processors(pr => pr
        .Attachment<Document>(a => a
            .Field(f => f.Content)
            .TargetField(f => f.Attachment)
        )
        .Remove<Document>(r => r
            .Field(f => f.Content)
        )
    )
);
The log indicates that a considerable amount of time is being spent performing garbage collection on the Elasticsearch server side; this is very likely the cause of the long pauses you are seeing. If you have monitoring enabled on the cluster (ideally exporting such data to a separate cluster), I would analyse that data to see if it sheds some light on why large GCs are happening.
are there more efficient ways to ingest many documents (rather than using thousands of REST requests)?
Yes, you are indexing each attachment in a separate index request. Depending on the size of each attachment once base64 encoded, you may want to send several in one bulk request:
// Your collection of documents
var documents = new[]
{
    new Document
    {
        Id = Guid.NewGuid(),
        Path = "path",
        Content = "content" // base64 encoded bytes
    },
    new Document
    {
        Id = Guid.NewGuid(),
        Path = "path",
        Content = "content" // base64 encoded bytes
    }
};

var client = new ElasticClient();

var bulkResponse = client.Bulk(b => b
    .Pipeline("attachments")
    .IndexMany(documents)
);
If you're reading documents from the filesystem, you probably want to lazily enumerate them and send bulk requests. Here, you can make use of the BulkAll helper method too.
First, have a lazily enumerated collection of documents:
public static IEnumerable<Document> GetDocuments()
{
    var count = 0;
    while (count++ < 20)
    {
        yield return new Document
        {
            Id = Guid.NewGuid(),
            Path = "path",
            Content = "content" // base64 encoded bytes
        };
    }
}
Then configure the BulkAll call:
var client = new ElasticClient();

// set up the observable configuration
var bulkAllObservable = client.BulkAll(GetDocuments(), ba => ba
    .Pipeline("attachments")
    .Size(10)
);

var waitHandle = new ManualResetEvent(false);
Exception exception = null;

// set up what to do in response to next bulk call, exception and completion
var bulkAllObserver = new BulkAllObserver(
    onNext: response =>
    {
        // perform some action, e.g. incrementing a counter
        // to indicate how many have been indexed
    },
    onError: e =>
    {
        exception = e;
        waitHandle.Set();
    },
    onCompleted: () =>
    {
        waitHandle.Set();
    });

// start the observable process
bulkAllObservable.Subscribe(bulkAllObserver);

// wait for indexing to finish, either forever,
// or set a max timeout as here.
waitHandle.WaitOne(TimeSpan.FromHours(1));

if (exception != null)
    throw exception;
Size dictates how many documents to send in each request. There are no hard and fast rules for how big this can be for your cluster, because it can depend on a number of factors including ingest pipeline, the mapping of documents, the byte size of documents, the cluster hardware etc. You can configure the observable to retry documents that fail to be indexed, and if you see es_rejected_execution_exception, you are at the limits of what your cluster can concurrently handle.
Another recommendation concerns document ids. I see you're using new Guids for the ids of documents, which implies to me that you don't care what the value is for each document. If that is the case, I would recommend not sending an Id value, and instead allowing Elasticsearch to generate an id for each document. This is very likely to result in a performance improvement (I believe the implementation has changed slightly in Elasticsearch and Lucene since that post was written, but the point still stands).
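For example, here is a hedged sketch of the earlier BulkAll configuration without client-side ids and with basic back-off retries; the settings are placeholders to tune for your cluster, and it assumes your Document type either has no Id property or has id inference disabled, so that Elasticsearch assigns its own ids:
var bulkAllObservable = client.BulkAll(GetDocuments(), ba => ba
    .Pipeline("attachments")
    .Size(10)           // documents per bulk request
    .BackOffRetries(2)  // retry a failed bulk request up to two times
    .BackOffTime("30s") // wait between retries
);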

DynamoDb.net: Adding multiple scan conditions

I am using DynamoDB.net here. How can I add multiple scan conditions so that the data is filtered based on those conditions?
I am using the below code:
var creds = new BasicAWSCredentials(awsId, awsPassword);
var dynamoClient = new AmazonDynamoDBClient(creds, awsDynamoDbRegion);
var context = new DynamoDBContext(dynamoClient);
List<ScanCondition> conditions = new List<ScanCondition>();
// conditions.Add(new ScanCondition("Id", ScanOperator.Equal, myId));
conditions.Add(new ScanCondition("name", ScanOperator.Equal, myName));
var response = await context.ScanAsync<Data>(conditions).GetRemainingAsync();
return response;
In my code above, if I add two scan conditions it does not work, but it does work with one condition. Not sure what I am doing wrong here.
Your code looks OK, with one caveat: scan conditions are for non-key attributes.
I'm going to go out on a limb and assume that Id is the partition key (or perhaps the sort key) of your table. If that's true, then that's why you can't use it in a scan condition. You can add multiple scan conditions, but they must all be on non-key attributes.
In order to specify key conditions, you must use a Query operation, not a Scan.
Assuming your table only has a partition key and no sort key, the example below should work. However, if the table has a sort key as well, then your query must include that too, so the example would need to be modified slightly; see the sketch after the code.
var creds = new BasicAWSCredentials(awsId, awsPassword);
var dynamoClient = new AmazonDynamoDBClient(creds, awsDynamoDbRegion);
var context = new DynamoDBContext(dynamoClient);
var opConfig = new DynamoDBOperationConfig();
opConfig.QueryFilter = new List<ScanCondition>();
opConfig.QueryFilter.Add(new ScanCondition("name", ScanOperator.Equal, myName));
var response = await context.QueryAsync<Data>(myId, opConfig).GetRemainingAsync();
return response;
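If the table does have a sort key, a hedged sketch of the overload that adds a sort key condition looks like this (mySortKey is an assumed variable, and equality is just one of the available operators):
var response = await context
    .QueryAsync<Data>(myId, QueryOperator.Equal, new object[] { mySortKey }, opConfig)
    .GetRemainingAsync();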

MongoDB C# Driver collection.Find(filter, "wanted to choose not to return _id").ToListAsync()

var filter = new BsonDocument("filename", filename);
var list = await col.Find(filter).ToListAsync();
Here is my code; I can't figure out the proper syntax to perform the task I want to do.
Not sure what you mean. I guess you want the returned documents to contain no _id field. To do so, you need a projection.
Try
var projection = Builders<YourClass>.Projection.Exclude(p => p.Id);
var list = await col.Find(filter).Project(projection).ToListAsync();
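Alternatively, if you don't have a typed class handy, the same projection can be expressed as a JSON string, which the driver converts implicitly; a small sketch:
var list = await col.Find(filter)
    .Project<BsonDocument>("{ _id: 0 }")
    .ToListAsync();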

IPP .NET c# SDK allows query filters, but this does not reduce payload?

Referencing the following IPP Documentation:
https://developer.intuit.com/docs/0025_quickbooksapi/0055_devkits/0150_ipp_.net_devkit_3.0/query_filters
I made the assumption that the following code, using the LINQ extensions' projection, would alter the request and reduce the payload of the response by querying only for the requested fields and including only those fields (a narrow result set) in the response:
public List<ShortAccount> GetFullShortAccountList(bool logRequestResponse)
{
    var accounts = new List<ShortAccount>();
    var accountQueryService = new QueryService<Account>(GetIppServiceContext(logRequestResponse));
    var selected = accountQueryService.Select(a => new { a.Id, a.Name });

    foreach (var account in selected)
    {
        accounts.Add(new ShortAccount { Id = account.Id, Name = account.Name });
    }

    return accounts;
}
Now the behavior of this method is as expected, but if I look at the request/response logs (or the actual request and response using Fiddler) the request doesn't change -- it is still "Select * from Account", and the response still includes all the other properties in the Account entity.
In other words, the payload is not reduced one iota.
Am I doing something wrong here? Or do I just understand this incorrectly?
How can I use the SDK to generate a query that would look more like "Select Id, Name from Account", and only return that result set?
Related question -- if this mode of query filtering does not reduce the payload, what is its purpose? You might as well get the whole shebang and just take the fields you need?
Thanks in advance.
That's right, @Barrick. The implementation of our query providers is not exactly the same as standard LINQ. So, Stephan, that's the issue.
If you just want to get specific fields, I would suggest using an IDS query via ExecuteIdsQuery, like:
QueryService<Account> AccQueryService22 = new QueryService<Account>(context);
var t13 = AccQueryService22.ExecuteIdsQuery("Select Id, Name From Account Where Active in (true, false)");
I will forward the feedback to our team.
Thanks!

MongoDB: update only specific fields

I am trying to update a row in a (typed) MongoDB collection with the C# driver. When handling data of that particular collection of type MongoCollection<User>, I tend to avoid retrieving sensitive data from the collection (salt, password hash, etc.)
Now I am trying to update a User instance. However, I never actually retrieved sensitive data in the first place, so I guess this data would be default(byte[]) in the retrieved model instance (as far as I can tell) before I apply modifications and submit the new data to the collection.
Maybe I am overlooking something trivial in the MongoDB C# driver: how can I use MongoCollection<T>.Save(T item) without updating specific properties such as User.PasswordHash or User.PasswordSalt? Should I retrieve the full record first, update the "safe" properties there, and write it back? Or is there a fancy option to exclude certain fields from the update?
Thanks in advance
Save(someValue) is for the case where you want the resulting record to be or become the full object (someValue) you passed in.
You can use the FindAndModify method:
var query = Query.EQ("_id", "123");
var sortBy = SortBy.Null;
var update = Update.Inc("LoginCount", 1).Set("LastLogin", DateTime.UtcNow); // some update; you can chain a series of update commands here
collection.FindAndModify(query, sortBy, update); // collection is your MongoCollection<User> instance
Using FindAndModify you can specify exactly which fields in an existing record to change and leave the rest alone.
You can see an example here.
The only thing you need from the existing record is its _id; the two secret fields need not be loaded or ever mapped back into your POCO object.
It's possible to add more criteria in the Where statement, like this:
var db = ReferenceTreeDb.Database;
var packageCol = db.GetCollection<Package>("dotnetpackage");
var filter = Builders<Package>.Filter.Where(_ => _.packageName == packageItem.PackageName.ToLower() && _.isLatestVersion);
var update = Builders<Package>.Update.Set(_ => _.isLatestVersion, false);
var options = new FindOneAndUpdateOptions<Package>();
packageCol.FindOneAndUpdate(filter, update, options);
Had the same problem, and since I wanted one generic method for all types and didn't want to create my own implementation using reflection, I ended up with the following generic solution (simplified to show everything in one method):
async Task<bool> Update(string id, T item)
{
    var serializerSettings = new JsonSerializerSettings()
    {
        NullValueHandling = NullValueHandling.Ignore,
        DefaultValueHandling = DefaultValueHandling.Ignore
    };

    // serialize only the non-default properties and wrap them in a $set
    var bson = new BsonDocument { { "$set", BsonDocument.Parse(JsonConvert.SerializeObject(item, serializerSettings)) } };
    var result = await database.GetCollection<T>(collectionName).UpdateOneAsync(Builders<T>.Filter.Eq("Id", id), bson);
    return result.ModifiedCount > 0;
}
Notes:
Make sure all fields that must not be updated are set to their default value.
If you need to set a field to its default value, you need to either use DefaultValueHandling.Include, or write a custom method for that update.
When performance matters, write custom update methods using Builders<T>.Update.
P.S.: This obviously should have been implemented by the MongoDB .NET driver; however, I couldn't find it anywhere in the docs -- maybe I just looked in the wrong place.
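For instance, here is a minimal sketch of such a targeted update using Builders<T>.Update, assuming a User type with LoginCount and LastLogin properties (the names are illustrative, not from the original post):
var filter = Builders<User>.Filter.Eq(u => u.Id, id);
var update = Builders<User>.Update
    .Inc(u => u.LoginCount, 1)
    .Set(u => u.LastLogin, DateTime.UtcNow);
await collection.UpdateOneAsync(filter, update);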
Well, there are many ways to update a value in MongoDB.
Below is one of the simplest ways I chose to update a field value in a MongoDB collection.
public string UpdateData()
{
    string data = string.Empty;
    string param = "{ $set: { name: 'Developerrr New' } }";
    string filter = "{ 'name' : 'Developerrr ' }";
    try
    {
        // get connection values from the web.config file
        var connectionString = ConfigurationManager.AppSettings["connectionString"];
        var databaseName = ConfigurationManager.AppSettings["database"];
        var tableName = ConfigurationManager.AppSettings["table"];

        // connect to MongoDB
        var client = new MongoClient(connectionString);
        var database = client.GetDatabase(databaseName);
        var dataCollection = database.GetCollection<BsonDocument>(tableName);

        // convert the filter and the update value to BsonDocument
        BsonDocument filterDoc = BsonDocument.Parse(filter);
        BsonDocument document = BsonDocument.Parse(param);

        // update the value using UpdateOne
        dataCollection.UpdateOne(filterDoc, document);
        data = "Success";
    }
    catch (Exception err)
    {
        data = "Failed - " + err;
    }
    return data;
}
Hoping this will help you :)
