DocumentDb insert performance - c#

I am using DocumentDB to store my data, and this is the sample code I am using to insert a record into DocumentDB.
I am calling the method like this:
var result = ProcessRequestAsync(() => Client.CreateDocumentAsync("collection link", data)).Result;
and the method logic is like this
public static async Task<ResourceResponse<T>> ProcessRequestAsync<T>(Func<Task<ResourceResponse<T>>> request)
    where T : Resource, new()
{
    var delay = TimeSpan.Zero;
    var minDelayTime = new TimeSpan(0, 0, 1);
    for (; ; )
    {
        try
        {
            await Task.Delay(delay);
            return await request();
        }
        catch (DocumentClientException documentClientException)
        {
            var statusCode = (int)documentClientException.StatusCode;
            if (statusCode == 429 || statusCode == 503)
            {
                delay = TimeSpan.Compare(documentClientException.RetryAfter, minDelayTime) >= 0 ? documentClientException.RetryAfter : minDelayTime;
            }
            else
            {
                throw;
            }
        }
    }
}
It is taking 2 seconds to insert a record into DocumentDB.
However, when I repeat the insertion in a loop, only the first record takes 2 seconds; the remaining inserts take around 400 ms each.
Is there anything I can do to improve the insertion speed?
Thanks in advance.

Have you followed the performance tips listed here: http://azure.microsoft.com/blog/2015/01/20/performance-tips-for-azure-documentdb-part-1-2/ and http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/? You should see write latencies of under 10 ms with DocumentDB, aside from the network latency of your connection.
If you could post a complete sample, we can help further. As Ryan mentioned, the longer time for the first call is likely the one-time initialization of the client. The blog posts above also explain how to avoid that.
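In practice that usually means creating a single DocumentClient for the lifetime of the process rather than one per request, and using direct/TCP connectivity where possible. A minimal sketch, where the factory class name, endpoint, and key are placeholders:
using System;
using Microsoft.Azure.Documents.Client;

public static class DocumentDbClientFactory
{
    // Reuse one client for the whole process; constructing it is the expensive
    // step that shows up as the slow first insert.
    private static readonly Lazy<DocumentClient> _client = new Lazy<DocumentClient>(() =>
        new DocumentClient(
            new Uri("https://YOUR-ACCOUNT.documents.azure.com:443/"), // placeholder endpoint
            "YOUR-AUTH-KEY",                                          // placeholder key
            new ConnectionPolicy
            {
                ConnectionMode = ConnectionMode.Direct, // skip the HTTP gateway where possible
                ConnectionProtocol = Protocol.Tcp
            }));

    public static DocumentClient Client => _client.Value;
}
Calling Client.OpenAsync() once at startup warms up the connection so the first real insert does not pay the initialization cost.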

Related

How to get data from sql database using dapper in parallel/multiple threads?

I am trying to get data from SQL Server using Dapper. I have a requirement to export 460K records stored in an Azure SQL database. I decided to fetch the data in batches, so I get 10K records in each batch. I planned to fetch the batches in parallel, so I added the async calls to a list of tasks and did Task.WhenAll. The code works fine when I run it locally, but after deploying to a k8s cluster I get a data-read error for some records. I am new to multi-threading and I don't know how to handle this issue. I tried putting a lock inside the method, but then the system crashes. Below is my code; it might be clumsy because I was trying many solutions to fix the issue.
for (int i = 0; i < numberOfPages; i++)
{
    tableviewWithCondition.startRow = startRow;
    resultData.Add(_tableviewRepository.GetTableviewRowsByPagination(tableviewExportCondition.TableviewName, modelMappingGroups, tableviewWithCondition.startRow, builder, pageSize, appName, i));
    startRow += tableviewWithCondition.pageSize;
}
foreach (var task in resultData)
{
    if (task != null)
    {
        dataToExport.AddRange(task.Result);
    }
}
This is the method I implemented to get data from the Azure SQL database using Dapper:
public async Task<(IEnumerable<int> unprocessedData, IEnumerable<dynamic> rowData)> GetTableviewRowsByPagination(string tableName, IEnumerable<MappingGroup> tableviewAttributeDetails,
    int startRow, SqlBuilder builder, int pageSize = 100, AppNameEnum appName = AppNameEnum.OptiSoil, int taskNumber = 1)
{
    var _unitOfWork = _unitOfWorkServices.Build(appName.ToString());
    List<int> unprocessedData = new List<int>();
    try
    {
        var columns = tableviewAttributeDetails.Select(c => { return $"{c.mapping_group_value} [{c.attribute}]"; });
        var joinedColumn = string.Join(",", columns);
        builder.Select(joinedColumn);
        var selector = builder.AddTemplate($"SELECT /**select**/ FROM {tableName} with (nolock) /**innerjoin**/ /**where**/ /**orderby**/ OFFSET {startRow} ROWS FETCH NEXT {(pageSize == 0 ? 100 : pageSize)} ROWS ONLY");
        using (var connection = _unitOfWork.Connection)
        {
            connection.Open();
            var data = await connection.QueryAsync(selector.RawSql, selector.Parameters);
            Console.WriteLine($"data completed for task{taskNumber}");
            return (unprocessedData, data);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception: {ex.Message}");
        if (ex.InnerException != null)
            Console.WriteLine($"InnerException: {ex.InnerException.Message}");
        Console.WriteLine($"Error in fetching from row {startRow}");
        unprocessedData.Add(startRow);
        return (unprocessedData, null);
    }
    finally
    {
        _unitOfWork.Dispose();
    }
}
The above code works fine locally, but on the server I get the following error:
Exception: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 35 - An internal exception was caught).
InnerException: The WriteAsync method cannot be called when another write operation is pending.
How can I avoid this issue when fetching data in parallel tasks?
You're using the same connection and trying to execute multiple commands over it (I'm assuming this based on the naming). Also, should you be disposing the unit of work?
Rather than:
using (var connection = _unitOfWork.Connection)
{
    connection.Open();
    var data = await connection.QueryAsync(selector.RawSql, selector.Parameters);
    Console.WriteLine($"data completed for task{taskNumber}");
    return (unprocessedData, data);
}
create a new connection for each call, if running these in parallel is what you truly want to do. I imagine (and this is an educated guess) that it only works locally because of timing.
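For example, something along these lines (a sketch only; it assumes System.Data.SqlClient and Dapper are in scope, and that the unit of work exposes a ConnectionString property, which is a guess):
// Open a dedicated connection per call instead of sharing _unitOfWork.Connection
// across parallel tasks. ConnectionString is an assumed property name.
using (var connection = new SqlConnection(_unitOfWork.ConnectionString))
{
    await connection.OpenAsync();
    var data = await connection.QueryAsync(selector.RawSql, selector.Parameters);
    Console.WriteLine($"data completed for task{taskNumber}");
    return (unprocessedData, data);
}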
Also look into Task.WhenAll; it's a better way to collect all the results. Rather than:
foreach (var task in resultData)
{
    if (task != null)
    {
        dataToExport.AddRange(task.Result);
    }
}
calling .Result on a task is usually bad practice.
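A rough sketch of the Task.WhenAll version, assuming resultData is the list of tasks built in the loop above:
// Await all the batch tasks at once instead of blocking on .Result one by one.
var results = await Task.WhenAll(resultData);

foreach (var (unprocessedData, rowData) in results)
{
    if (rowData != null)
    {
        dataToExport.AddRange(rowData);
    }
}
Awaiting the tasks keeps the calling thread free and surfaces any exception from the batches in one place.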

Avoid fast post on webapi c#

I have a problem when the user posts data. Sometimes the post reaches the server twice in quick succession, and this causes a problem on my website.
The user wants to register a form for about $100 and has a $120 balance.
When the post (save) button is pressed, sometimes two posts arrive at the server almost simultaneously, like:
2018-01-31 19:34:43.660 Register Form 5760$
2018-01-31 19:34:43.663 Register Form 5760$
Therefore my client's balance becomes negative.
I use an if statement in my code to check the balance, but the requests come in so fast that I think both checks run together and I miss the second one.
Therefore I made a lock-controller class to avoid concurrency per user, but it does not work well.
I made a global action filter to control the users. This is my code:
public void OnActionExecuting(ActionExecutingContext context)
{
    try
    {
        var controller = (Controller)context.Controller;
        if (controller.User.Identity.IsAuthenticated)
        {
            bool jobDone = false;
            int delay = 0;
            int counter = 0;
            do
            {
                delay = LockControllers.IsRequested(controller.User.Identity.Name);
                if (delay == 0)
                {
                    LockControllers.AddUser(controller.User.Identity.Name);
                    jobDone = true;
                }
                else
                {
                    counter++;
                    System.Threading.Thread.Sleep(delay);
                }
                if (counter >= 10000)
                {
                    context.HttpContext.Response.StatusCode = 400;
                    jobDone = true;
                    context.Result = new ContentResult()
                    {
                        Content = "Attack Detected"
                    };
                }
            } while (!jobDone);
        }
    }
    catch (System.Exception)
    {
    }
}
public void OnActionExecuted(ActionExecutedContext context)
{
    try
    {
        var controller = (Controller)context.Controller;
        if (controller.User.Identity.IsAuthenticated)
        {
            LockControllers.RemoveUser(controller.User.Identity.Name);
        }
    }
    catch (System.Exception)
    {
    }
}
I keep a static list of users and make each subsequent request's thread sleep until the previous one finishes.
Is there a better way to manage this problem?
The original question has since been edited, so parts of this answer no longer apply.
So the issue isn't that the code runs too fast. Fast is always good :) The issue is that the account is going into negative funds. If the client decides to post a form twice, that is the client's fault. It may be that you only want the client to pay once, which is a different problem.
For the first problem, I would recommend using transactions (https://en.wikipedia.org/wiki/Database_transaction) to lock your table. This means you apply the update/insert (or set of changes) inside a transaction and force other calls to that table to wait until those operations have been committed. You can begin your transaction, check that the account has sufficient funds, and only then apply the charge.
If they are only ever meant to pay once, then have a separate table that records whether the user has paid (again checked within a transaction) before processing the update/insert.
http://www.entityframeworktutorial.net/entityframework6/transaction-in-entity-framework.aspx
(Edit: fixing link)
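A minimal sketch of that idea with Entity Framework; the Account and Registration entities and the field names are assumptions, not your actual model:
// Sketch only: Account, Registration and dbContext are hypothetical types/names.
using (var transaction = dbContext.Database.BeginTransaction())
{
    var account = dbContext.Accounts.Single(a => a.UserId == userId);

    if (account.Balance < formCost)
    {
        transaction.Rollback();
        throw new InvalidOperationException("Insufficient funds.");
    }

    account.Balance -= formCost;
    dbContext.Registrations.Add(new Registration { UserId = userId, Amount = formCost });

    dbContext.SaveChanges();
    transaction.Commit();
}
Depending on the isolation level, you may also want a serializable transaction or an UPDLOCK hint so two concurrent requests cannot both read the old balance before either one commits.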
You have a few options here
You can implement ETag functionality in your app, which you can use for optimistic concurrency. This works well when you are working with records, i.e. you have a data record in the database, return it to the user, and the user then changes it.
You could add a required Guid field to your view model, pass it to your app, add it to an in-memory cache, and check it on each request.
public class RegisterViewModel
{
    [Required]
    public Guid Id { get; set; }

    /* other properties here */
    ...
}
and then use IMemoryCache or IDistributedCache (see the ASP.NET Core docs) to put this Id into the cache and validate it on each request:
public async Task<IActionResult> Register(RegisterViewModel register)
{
    if (!ModelState.IsValid)
        return BadRequest(ModelState);

    var userId = ...; /* get userId */

    if (_cache.TryGetValue($"Registration-{userId}", out Guid _))
    {
        return BadRequest(new { ErrorMessage = "Command already received by this user" });
    }

    // Set cache options.
    var cacheEntryOptions = new MemoryCacheEntryOptions()
        // Keep in cache for 5 minutes, reset time if accessed.
        .SetSlidingExpiration(TimeSpan.FromMinutes(5));

    // When we're here, the command wasn't executed before, so we save the key in the cache.
    _cache.Set($"Registration-{userId}", register.Id, cacheEntryOptions);

    // Call your service here to process it.
    registrationService.Register(...);

    return Ok();
}
When the second request arrives, the value will already be in the (distributed) memory cache and the operation will fail.
If the caller does not set the Id, model validation will fail.
Of course, everything Jonathan Hickey listed in his answer still applies: you should always validate that there is enough balance, and use EF Core's optimistic or pessimistic concurrency.
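For the concurrency part, a minimal sketch of an EF Core optimistic concurrency token on a hypothetical account entity:
public class Account
{
    public int Id { get; set; }
    public decimal Balance { get; set; }

    // Maps to a rowversion column; EF Core compares it on SaveChanges and throws
    // DbUpdateConcurrencyException if another request changed the row first.
    [Timestamp]
    public byte[] RowVersion { get; set; }
}
Catching DbUpdateConcurrencyException then lets you reload the balance and either retry or reject the second request.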

Handle multiple similar requests in webapi

In my WebApi controller I have the following (pseudo) code that receives update notifications from Instagrams real-time API:
[HttpPost]
public void Post(InstagramUpdate instagramUpdate)
{
    var subscriptionId = instagramUpdate.SubscriptionId;
    var lastUpdate = GetLastUpdate(subscriptionId);

    // To avoid breaking my Instagram request limit, do not fetch new images too often.
    if (lastUpdate.AddSeconds(5) < DateTime.UtcNow)
    {
        // More than 5 seconds ago since last update for this subscription. Get new images
        GetNewImagesFromInstagram(subscriptionId);
        UpdateLastUpdate(subscriptionId, DateTime.UtcNow);
    }
}
This won't work very well if I receive two update notifications for the same subscription almost simultaneously, since lastUpdate won't have been updated until after the first request has been processed.
What would be the best way to tackle this problem? I'm thinking of using some kind of cache, but I'm not sure how. Are there any best practices for this kind of thing? I'm guessing it's a common problem: "receive a notification, do something if it hasn't been done recently..."
Thanks to this answer, I went with the following approach using MemoryCache:
[HttpPost]
public void Post(IEnumerable<InstagramUpdate> instagramUpdates)
{
    foreach (var instagramUpdate in instagramUpdates)
    {
        if (WaitingToProcessSubscriptionUpdate(instagramUpdate.Subscription_id))
        {
            // Ongoing request, do nothing
        }
        else
        {
            // Process update
        }
    }
}

private bool WaitingToProcessSubscriptionUpdate(string subscriptionId)
{
    // Check in the in memory cache if this subscription is in queue to be processed. Add it otherwise
    var queuedRequest = _cache.AddOrGetExisting(subscriptionId, string.Empty, new CacheItemPolicy
    {
        // Automatically expire this item after 1 minute (if update failed for example)
        AbsoluteExpiration = DateTime.Now.AddMinutes(1)
    });

    return queuedRequest != null;
}
I am afraid this is an awful idea, but... maybe it is worth adding a lock to this method? Something like:
private List<object> subscriptions = new List<object>();
and then
int subscriptionId = 1; // add calculation here
int subscriptionIdIndex = subscriptions.IndexOf(subscriptionId);
lock (subscriptions[subscriptionIdIndex]) // lock needs a reference type, hence List<object>
{
    // your method code
}
Feel free to criticize this approach )

Proper way to filter BufferBlock.ReceiveAsync

Good day.
I have a TPL Dataflow mesh for RPC calls.
It has two unlinked flows, which in simplified form look like this:
Output flow:
BufferBlock to store output
ActionBlock to send output to the server and produce a sent id
And input flow:
while loop to receive data
TransformBlock to parse data
BufferBlock to save the answer with the sent id
There is a problem: when I make calls from separate threads, I can get the answers mixed up, so I need to filter them.
My RPC call:
public async Task<RpcAnswer> PerformRpcCall(Call rpccall)
{
    ...
    _outputRpcCalls.Post(rpccall);
    long uniqueId = GetUniq(); // call unique id
    ...
    var sent = new Tuple<long, long>(uniqueId, 0);
    while (_sentRpcCalls.TryReceive(u => u.Item1 == uniqueId, out sent)) ; // get generated id from send function

    return await _inputAnswers.ReceiveAsync(TimeSpan.FromSeconds(30));
}
As you can see, I have a uniqueId that can help me determine the answer for this call, but how can I filter on it and await it?
Is it a good approach to have some array of buffers (WriteOnceBlock maybe?) that is created in the RPC call and LinkedTo with a filter?
OK, I didn't find any proper way, so I made a dirty workaround:
while (true)
{
    answer = await _inputAnswers.ReceiveAsync(TimeSpan.FromSeconds(5));
    if (answer.Success)
    {
        if (answer.Answer.Combinator.ValueType.Equals(rpccall.Combinator.ValueType))
        {
            break;
        }
        else
        {
            // wrong answer - post it back
            _inputAnswers.Post(answer.Answer);
        }
    }
    else
    {
        // answer fail - return it
        break;
    }
}
One way to do this would be to create a new block for each id, and link it to the answers block with a predicate checking the id and MaxMessages set to 1:
Task<Answer> ReceiveAnswerAsync(int uniqueId)
{
    var block = new BufferBlock<Answer>();
    _inputAnswers.LinkTo(
        block,
        new DataflowLinkOptions { MaxMessages = 1, PropagateCompletion = true },
        answer => answer.Id == uniqueId);
    return block.ReceiveAsync();
}

Need to know if my threading lock does what it is supposed to in .Net?

I have an application that, before it creates a thread, calls the database to pull X records. When the records are retrieved from the database, a locked flag is set so those records are not pulled again.
Once a thread has completed, it pulls some more records from the database. When I call the database from a thread, should I put a lock on that section of code so it is called by only one thread at a time? Here is an example of my code (I added a comment in the area where I have the lock):
private void CreateThreads()
{
    for (var i = 1; i <= _threadCount; i++)
    {
        var adapter = new Dystopia.DataAdapter();
        var records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
        if (records != null && records.Count > 0)
        {
            var paramss = new ArrayList { i, records };
            ThreadPool.QueueUserWorkItem(ThreadWorker, paramss);
        }
        this.Update();
    }
}

private void ThreadWorker(object paramList)
{
    try
    {
        var parms = (ArrayList)paramList;
        var stopThread = false;
        var threadCount = (int)parms[0];
        var records = (List<Candidates>)parms[1];
        var runOnce = false;
        var adapter = new Dystopia.DataAdapter();
        var lastCount = records.Count;
        var runningCount = 0;
        while (_stopThreads == false)
        {
            if (records.Count > 0)
            {
                foreach (var record in records)
                {
                    var rec = record;
                    var proc = new ProcRecords();
                    proc.Validate(ref rec);
                    adapter.Update(rec);
                    if (_stopThreads)
                    {
                        break;
                    }
                }
                //This is where I think I may need to sync the threads.
                //Is this correct?
                lock (this)
                {
                    records = adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
                }
            }
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}
SQL to pull records:
WITH cte AS (
    SELECT TOP (@topCount) *
    FROM Candidates WITH (READPAST)
    WHERE
        isLocked = 0 AND
        isTested = 0 AND
        validated = 0
)
UPDATE cte
SET
    isLocked = 1,
    validationID = @validationId,
    validationDateTime = @validationDateTime
OUTPUT INSERTED.*;
You shouldn't need to lock your threads as the database should be doing this on the request for you.
I see a few issues.
First, you are testing _stopThreads == false, but you have not revealed whether this is a volatile read. Read the second half of this answer for a good description of what I am talking about.
Second, the lock is pointless because adapter is a local reference to a non-shared object and records is a local reference that is just being replaced. I am assuming the adapter makes a separate connection to the database, but if it shares an existing connection then some type of synchronization may need to take place, since ADO.NET connection objects are typically not thread-safe.
Now, you probably will need locking somewhere to publish the results from the work item. I do not see where the results are being published to the main thread, so I cannot offer any guidance here.
By the way, I would avoid showing a message box from a ThreadPool thread, because it will hang that thread until the message box is closed.
You shouldn't lock(this), since it makes it really easy to create deadlocks; you should create a separate lock object. If you search for "lock(this)" you can find numerous articles on why.
Here's an SO question on lock(this)
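A minimal sketch of that pattern, using a dedicated private lock object instead of this (FetchNextBatch is a hypothetical helper name; the adapter call is taken from the question):
// A private object that only this class can lock on, so outside code
// cannot accidentally take the same lock and cause a deadlock.
private static readonly object _syncRoot = new object();

private List<Candidates> FetchNextBatch(Dystopia.DataAdapter adapter)
{
    lock (_syncRoot)
    {
        return adapter.FindAllWithLocking(_recordsPerThread, _validationId, _validationDateTime);
    }
}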
