Is it possible to modify Table Controllers in the Azure Mobile Services .NET backend to handle multiple inserts per HTTP request?
After coming back online it takes 2+ minutes for my app to sync its data, and over 70% of those 2 minutes is wasted on network handshaking overhead.
I had to do something similar. My app creates around 10,000 new rows every time a user creates a new project, so I made a custom controller in my Mobile Service to accept this. After the 10,000 entities are inserted, I pull all of them back down to the local sync database.
I first created a custom controller.
public class BatchInsertController : ApiController
{
    DbContext context;

    protected override void Initialize(HttpControllerContext controllerContext)
    {
        base.Initialize(controllerContext);
        // Use your Mobile Service's own DbContext subclass here.
        this.context = new DbContext();
    }

    [HttpPost]
    public async Task<bool> BatchInsert(List<Entity> entities)
    {
        try
        {
            // One SaveChanges call persists the whole batch in a single round trip.
            this.context.Entities.AddRange(entities);
            await this.context.SaveChangesAsync();
            return true;
        }
        catch (System.Exception ex)
        {
            Trace.WriteLine("Error: " + ex);
            return false;
        }
    }
}
Then I would call this custom controller method from my client code.
var entities = new List<Entity>();
// Add a bunch of entities to the list...
foreach (List<Entity> chunkedEntities in entities.ChunksOf(1000))
{
    var response = await _client.InvokeApiAsync<List<Entity>, bool>("batchinsert", chunkedEntities);
}
I would have over 10,000 records at a time, so I created an extension method to chunk the list and send 1,000 records at a time.
public static IEnumerable<IList<T>> ChunksOf<T>(this IEnumerable<T> sequence, int size)
{
    List<T> chunk = new List<T>(size);
    foreach (T element in sequence)
    {
        chunk.Add(element);
        if (chunk.Count == size)
        {
            yield return chunk;
            chunk = new List<T>(size);
        }
    }
    if (chunk.Any())
    {
        yield return chunk;
    }
}
After that I just did a PullAsync() on my local db. I came up with this after reading through this article.
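For reference, the pull back into the local store after the batch insert looked roughly like this (a sketch using the standard IMobileServiceSyncTable API; the query id is just an illustrative name):
// Pull the newly inserted rows down into the offline sync store.
var syncTable = _client.GetSyncTable<Entity>();
await syncTable.PullAsync("allEntities", syncTable.CreateQuery());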
Related
I am trying to get data from SQL Server using Dapper. I have a requirement to export 460K records stored in an Azure SQL database. I decided to get the data in batches, so I am getting 10K records in each batch. I planned to fetch the batches in parallel, so I added the async methods to a list of tasks and did Task.WhenAll. The code works fine when I run it locally, but after deploying to a k8s cluster I am getting a data read issue for some records. I am new to multithreading and I don't know how to handle this issue. I tried to take a lock inside the method, but then the system crashes. Below is my code; it might be clumsy because I was trying many solutions to fix the issue.
for (int i = 0; i < numberOfPages; i++)
{
    tableviewWithCondition.startRow = startRow;
    resultData.Add(_tableviewRepository.GetTableviewRowsByPagination(tableviewExportCondition.TableviewName, modelMappingGroups, tableviewWithCondition.startRow, builder, pageSize, appName, i));
    startRow += tableviewWithCondition.pageSize;
}
foreach (var task in resultData)
{
    if (task != null)
    {
        dataToExport.AddRange(task.Result);
    }
}
This is the method I implemented to get data from the Azure SQL database using Dapper.
public async Task<(IEnumerable<int> unprocessedData, IEnumerable<dynamic> rowData)> GetTableviewRowsByPagination(string tableName, IEnumerable<MappingGroup> tableviewAttributeDetails,
    int startRow, SqlBuilder builder, int pageSize = 100, AppNameEnum appName = AppNameEnum.OptiSoil, int taskNumber = 1)
{
    var _unitOfWork = _unitOfWorkServices.Build(appName.ToString());
    List<int> unprocessedData = new List<int>();
    try
    {
        var columns = tableviewAttributeDetails.Select(c => { return $"{c.mapping_group_value} [{c.attribute}]"; });
        var joinedColumn = string.Join(",", columns);
        builder.Select(joinedColumn);
        var selector = builder.AddTemplate($"SELECT /**select**/ FROM {tableName} with (nolock) /**innerjoin**/ /**where**/ /**orderby**/ OFFSET {startRow} ROWS FETCH NEXT {(pageSize == 0 ? 100 : pageSize)} ROWS ONLY");
        using (var connection = _unitOfWork.Connection)
        {
            connection.Open();
            var data = await connection.QueryAsync(selector.RawSql, selector.Parameters);
            Console.WriteLine($"data completed for task{taskNumber}");
            return (unprocessedData, data);
        }
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Exception: {ex.Message}");
        if (ex.InnerException != null)
            Console.WriteLine($"InnerException: {ex.InnerException.Message}");
        Console.WriteLine($"Error in fetching from row {startRow}");
        unprocessedData.Add(startRow);
        return (unprocessedData, null);
    }
    finally
    {
        _unitOfWork.Dispose();
    }
}
The above code works fine locally, but on the server I am getting the issue below.
Exception: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 35 - An internal exception was caught).
InnerException: The WriteAsync method cannot be called when another write operation is pending.
How can I avoid this issue when fetching data in parallel tasks?
You're using the same connection and trying to execute multiple commands over it (I'm assuming this because of the naming). Also, should you be disposing the unit of work?
Rather than:
using (var connection = _unitOfWork.Connection)
{
    connection.Open();
    var data = await connection.QueryAsync(selector.RawSql, selector.Parameters);
    Console.WriteLine($"data completed for task{taskNumber}");
    return (unprocessedData, data);
}
Create a new connection for each item, if this is what you truly want to do. I imagine, and this is an educated guess, that it is working locally because of timing.
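As a sketch, each parallel task would open and dispose its own connection instead of sharing the unit of work's connection (the connection string field here is an assumption, not from the original code):
using (var connection = new SqlConnection(_connectionString)) // one connection per parallel task
{
    await connection.OpenAsync();
    var data = await connection.QueryAsync(selector.RawSql, selector.Parameters);
    Console.WriteLine($"data completed for task{taskNumber}");
    return (unprocessedData, data);
}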
Also look into Task.WhenAll; it's a better way to collect all the results. Rather than:
foreach (var task in resultData)
{
    if (task != null)
    {
        dataToExport.AddRange(task.Result);
    }
}
calling .Result on a task is usually bad practice.
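A sketch of the Task.WhenAll approach, reusing the task list from the question:
// Await all page queries together instead of blocking on .Result one by one.
var results = await Task.WhenAll(resultData);

foreach (var (unprocessedData, rowData) in results)
{
    if (rowData != null)
    {
        dataToExport.AddRange(rowData);
    }
}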
I'm working on the MongoDB C# driver version 2.0.1.27 and MongoDB version 3.0.
Our aim is to insert a huge number of documents into the MongoDB collection using the "Insert" method.
Our architecture calls this Add method multiple times on each thread.
Below is the Add method:
public bool Add(CallContext context, FileQueueEntity entity)
{
    bool bResult = false;
    // This logic is to prevent duplicate file.
    // Consider new algorithm if supporting other files types
    bResult = Delete(context, entity);
    if (context.ErrorList.Count == 0)
    {
        var server = GetMongoServer();
        try
        {
            var database = GetMongoDatabase(server);
            var collection = database.GetCollection<FileQueueEntity>("QueueCollection");
            entity.BaseMeta = null;
            entity.IsNew = false;
            collection.Insert(entity);
            context.AddToUpdatedList(entity);
        }
        catch (Exception ex)
        {
            bResult = false;
            context.AddError(ErrorSeverity.System, "DataAccess.AddFileQueue", GetThreadExceptionMessage(ex));
        }
        finally
        {
        }
    }
    return bResult;
}
Below is the GetMongoDatabase method:
private MongoDatabase GetMongoDatabase(MongoServer mongoServer)
{
    return mongoServer.GetDatabase(mConnectionBuilder.InitialCatalog);
}
Below is the one for GetMongoServer:
private MongoServer GetMongoServer()
{
    System.Threading.Monitor.Enter(_lock);
    try
    {
        if (_mongoServer != null)
        {
            return _mongoServer;
        }
        DatabaseProviderFactory factory = new DatabaseProviderFactory();
        var aDatabase = factory.Create("ConnectionStringName");
        mConnectionBuilder = new SqlConnectionStringBuilder(aDatabase.ConnectionString);
        var credential = MongoCredential.CreateCredential(mConnectionBuilder.InitialCatalog, mConnectionBuilder.UserID, mConnectionBuilder.Password);
        MongoServerSettings databaseSettings = new MongoServerSettings();
        var connectionStrings = mConnectionBuilder.DataSource.Split(',');
        if (connectionStrings != null && connectionStrings.Count() > 1)
        {
            string ipAddress = connectionStrings[0];
            int portNumber = Convert.ToInt32(connectionStrings[1], CultureInfo.InvariantCulture);
            databaseSettings.Credentials = new[] { credential };
            databaseSettings.Server = new MongoServerAddress(ipAddress, portNumber);
        }
        _mongoServer = new MongoServer(databaseSettings);
        return _mongoServer;
    }
    finally
    {
        System.Threading.Monitor.Exit(_lock);
    }
}
And it is called in this way:
foreach (var n in entities)
{
    Add(n);
}
The foreach loop is called for every instance separately.
The problem is that not all the files are reaching the database; every time, random files are missing from the collection.
The entity that we are sending is very light (hardly 400-500 bytes).
The number of files will be 2,000-5,000 max, and they are cleared on a daily basis, so the maximum storage will not be exceeded in this case.
For Example:
Thread 1: 50 files - Random 48 files are inserted
Thread 2: 80 files - Random 75 files are inserted
Thread 3: 70 files - Random 60 files are inserted
Thread 4: 60 files - Random 59 files are inserted
Are we missing any Mongo configuration? It does not throw any exception and silently fails to insert the records, which is a bit strange.
The response we are getting during insert is:
Response: { "ok" : 1, "n" : NumberLong(0) }
We observe that random files from each thread fail every time.
Can anyone help me with this? Are we missing any MongoDB configuration?
A couple of points for consideration:
The IMongoCollection can be retrieved once and stored as a static/singleton; that is the recommended pattern.
The deletion is redundant; use collection.ReplaceOne(e => e.Id == entity.Id, entity); (with upsert enabled, so new entities are still inserted) instead.
Better yet, use bulk replace with batches of about 50 to 100 entities in each iteration or thread, as in the sketch below.
Try to update to the latest server and driver; many good changes have occurred since v2.0.
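A rough sketch of the bulk approach with the current driver API (here collection is assumed to be an IMongoCollection<FileQueueEntity> and batch one chunk of 50-100 entities; neither name is from the original code):
// Build one ReplaceOneModel per entity; IsUpsert makes the separate Delete unnecessary.
var models = batch.Select(e =>
    new ReplaceOneModel<FileQueueEntity>(
        Builders<FileQueueEntity>.Filter.Eq(x => x.Id, e.Id), e)
    { IsUpsert = true });

// One round trip per batch instead of one per document.
await collection.BulkWriteAsync(models, new BulkWriteOptions { IsOrdered = false });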
I have a problem when a user posts data. Sometimes the posts arrive so fast that they cause a problem on my website.
The user wants to register a form for about $100 and has a $120 balance.
When the post (save) button is pressed, sometimes two posts reach the server almost simultaneously, like:
2018-01-31 19:34:43.660 Register Form 5760$
2018-01-31 19:34:43.663 Register Form 5760$
Therefore my client's balance becomes negative.
I use an if statement in my code to check the balance, but the requests come in so fast that I think both checks happen together and I miss them.
Therefore I made a LockControllers class to avoid concurrency per user, but it does not work well.
I made a global action filter to control the users. This is my code:
public void OnActionExecuting(ActionExecutingContext context)
{
    try
    {
        var controller = (Controller)context.Controller;
        if (controller.User.Identity.IsAuthenticated)
        {
            bool jobDone = false;
            int delay = 0;
            int counter = 0;
            do
            {
                delay = LockControllers.IsRequested(controller.User.Identity.Name);
                if (delay == 0)
                {
                    LockControllers.AddUser(controller.User.Identity.Name);
                    jobDone = true;
                }
                else
                {
                    counter++;
                    System.Threading.Thread.Sleep(delay);
                }
                if (counter >= 10000)
                {
                    context.HttpContext.Response.StatusCode = 400;
                    jobDone = true;
                    context.Result = new ContentResult()
                    {
                        Content = "Attack Detected"
                    };
                }
            } while (!jobDone);
        }
    }
    catch (System.Exception)
    {
    }
}

public void OnActionExecuted(ActionExecutedContext context)
{
    try
    {
        var controller = (Controller)context.Controller;
        if (controller.User.Identity.IsAuthenticated)
        {
            LockControllers.RemoveUser(controller.User.Identity.Name);
        }
    }
    catch (System.Exception)
    {
    }
}
I made a static list of users and I sleep their threads until the previous request completes.
Is there any better way to manage this problem?
So the original question has been edited so this answer is invalid.
So the issue isn't that the code runs too fast; fast is always good :) The issue is that the account is going into negative funds. If the client decides to post a form twice, that is the client's fault. It may be that you only want the client to pay once, which is a different problem.
For the first problem, I would recommend using transactions (https://en.wikipedia.org/wiki/Database_transaction) to lock your table. That means you apply a change (or set of changes) and force other calls to that table to wait until those operations have completed. You can begin your transaction and check that the account has the correct amount of funds before applying the charge.
If they are only ever meant to pay once, then have a separate table that records whether the user has paid (again within a transaction) before processing the update/add.
http://www.entityframeworktutorial.net/entityframework6/transaction-in-entity-framework.aspx
(Edit: fixing link)
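A minimal sketch of the transaction approach with EF6; the Accounts/RegisteredForms names and the price variables are illustrative assumptions, not from the question:
using (var transaction = context.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
{
    // Re-read the balance inside the transaction so a concurrent post cannot
    // slip in between the check and the charge.
    var account = context.Accounts.Single(a => a.UserId == userId);
    if (account.Balance < formPrice)
    {
        transaction.Rollback();
        throw new InvalidOperationException("Insufficient funds");
    }

    account.Balance -= formPrice;
    context.RegisteredForms.Add(newForm);
    context.SaveChanges();
    transaction.Commit();
}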
You have a few options here
You can implement ETag functionality in your app, which you can use for optimistic concurrency. This works well when you are working with records, i.e. you have a database record, return it to the user, and then the user changes it.
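A rough sketch of the ETag check on an update; the RowVersion concurrency token, entity, and DTO names are assumptions for illustration:
[HttpPut("{id}")]
public async Task<IActionResult> Update(int id, RegistrationDto dto)
{
    var record = await _context.Registrations.FindAsync(id);
    if (record == null)
        return NotFound();

    // Reject the update if the client's ETag no longer matches the stored row version.
    var ifMatch = Request.Headers["If-Match"].ToString();
    if (ifMatch != Convert.ToBase64String(record.RowVersion))
        return StatusCode(StatusCodes.Status412PreconditionFailed);

    record.Amount = dto.Amount;
    await _context.SaveChangesAsync();
    return NoContent();
}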
You could add a required field with a GUID to your view model, pass it to your app, add it to an in-memory cache, and check it on each request.
public class RegisterViewModel
{
    [Required]
    public Guid Id { get; set; }

    /* other properties here */
    ...
}
and then use IMemoryCache or IDistributedCache (see the ASP.NET Core docs) to put this Id into the memory cache and validate it on each request:
public IActionResult Register(RegisterViewModel register)
{
    if (!ModelState.IsValid)
        return BadRequest(ModelState);

    var userId = ...; /* get userId */

    // If this command id is already cached for the user, it is a duplicate post.
    if (_cache.TryGetValue($"Registration-{userId}", out Guid cachedId) && cachedId == register.Id)
    {
        return BadRequest(new { ErrorMessage = "Command already received by this user" });
    }

    // Set cache options.
    var cacheEntryOptions = new MemoryCacheEntryOptions()
        // Keep in cache for 5 minutes, reset time if accessed.
        .SetSlidingExpiration(TimeSpan.FromMinutes(5));

    // When we're here, the command wasn't executed before, so we save the key in the cache.
    _cache.Set($"Registration-{userId}", register.Id, cacheEntryOptions);

    // Call your service here to process it.
    registrationService.Register(...);

    return Ok();
}
When the second request arrives, the value will already be in the (distributed) memory cache and the operation will fail.
If the caller does not set the Id, validation will fail.
Of course, everything that Jonathan Hickey listed in his answer applies too: you should always validate that there is enough balance, and use EF Core's optimistic or pessimistic concurrency.
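A minimal sketch of EF Core optimistic concurrency, assuming an illustrative Account entity with a rowversion token (none of these names are from the question):
public class Account
{
    public int Id { get; set; }
    public decimal Balance { get; set; }

    [Timestamp] // maps to a SQL Server rowversion column used as the concurrency token
    public byte[] RowVersion { get; set; }
}

// Inside the action that charges the account:
var account = await _context.Accounts.SingleAsync(a => a.Id == accountId);
if (account.Balance < price)
    return BadRequest("Insufficient funds");

account.Balance -= price;
try
{
    await _context.SaveChangesAsync(); // throws if RowVersion changed since it was read
}
catch (DbUpdateConcurrencyException)
{
    // Another request updated the same account first; reject or retry.
    return Conflict();
}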
I am currently working on an API where a record should only be allowed to be pulled once. It's basically a queue where once a client pulls the record, the Retrieved field on the record is marked true. The Get calls only pull records where the Retrieved field is false.
Controller:
[HttpGet]
public virtual IActionResult GetAll([FromQuery] int? limit)
{
    try
    {
        return Ok(_repository.Get(limit));
    }
    catch
    {
        return new StatusCodeResult(StatusCodes.Status500InternalServerError);
    }
}
Repository:
public IQueryable<Report> Get(int? limit)
{
    IQueryable<Report> reports;
    if (limit == null)
    {
        reports = _context.Reports.Where(r => r.Retrieved == false);
    }
    else
    {
        reports = _context.Reports.Where(r => r.Retrieved == false).Take((int)limit);
    }
    return reports;
}
What would be the best way to modify the records that have been pulled by the Get call? If I do the modification before returning results from the repository code, then when the controller actually converts the IQueryable to real data, the field has changed and it won't pull any results, but the Controller seems like the wrong place to be doing this sort of modification to the database.
I would split this functionality away from the retrieval. Let the caller/client indicate that the report has been successfully retrieved and read with a second call. It is a little more overhead, but it adds resilience. Example: if there is a failure in the retrieval after the server call (maybe in the network, the browser, or the client app), then the client has another opportunity to retrieve the data.
Controller:
[HttpPut]
public virtual async Task<IActionResult> MarkAsRetrieved(IEnumerable<int> reportIds, CancellationToken token)
{
    await _repository.MarkRetrievedAsync(reportIds, token).ConfigureAwait(true);
    return Ok();
}
Repository:
public Task MarkRetrievedAsync(IEnumerable<int> reportIds, CancellationToken token)
{
    foreach (Report report in reportIds.Select(x => new Report { ReportId = x, Retrieved = false }))
    {
        _context.Reports.Attach(report);
        report.Retrieved = true;
    }
    return _context.SaveChangesAsync(token);
}
Notes
It is only necessary to send over the identifier for a Report instance. You can then attach an empty instance with that same identifier and update the Retrieved property to true; only that change will be sent in the corresponding store update statement.
I would change the Retrieved bit in the database to a handle of some kind -- a Guid, a record id into another table that records the fetch, or some other unique value. Then I would generate the handle, update the records I am about to fetch with that handle, and then retrieve the records that match it. If you fail at any point, you can set the handle back to NULL for the handle value you started with.
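A rough sketch of that claim-by-handle idea using EF Core raw SQL; the RetrievedBy column and batch size are assumptions for illustration:
var handle = Guid.NewGuid();

// Claim a batch of unretrieved rows by stamping them with this caller's handle.
await _context.Database.ExecuteSqlInterpolatedAsync(
    $"UPDATE TOP (100) Reports SET RetrievedBy = {handle} WHERE RetrievedBy IS NULL");

// Read back exactly the rows this caller claimed.
var reports = await _context.Reports
    .Where(r => r.RetrievedBy == handle)
    .ToListAsync();

// If anything fails afterwards, release the claim so the rows can be fetched again:
// UPDATE Reports SET RetrievedBy = NULL WHERE RetrievedBy = @handle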
I have a data processing job that consists of about 20 sequential steps. The steps all fall under one of three categories:
do some file manipulation
import / export data from a database
make a call to a 3rd party web API
I've refactored the code from one long, awful-looking method to a pipeline pattern, using examples here and here. All of the steps are TransformBlocks, such as:
var stepThirteenPostToWebApi = new TransformBlock<FileInfo, System.Guid>(async csv =>
{
    dynamic task = await ApiUtils.SubmitData(csv.FullName);
    return task.guid;
});
The code works most of the time, but occasionally a step in the pipeline fails for whatever reason - let's say a corrupt file can't be read in step 6 of 20 (just an example - any step could fail). The pipeline stops running further tasks, as it should.
However, the 3rd party web API introduces a challenge - we are charged for each job we initiate whether we execute all 20 steps or just the first one.
I would like to be able to fix whatever went wrong in the problem step (again, for our example let's say I fix the corrupt file in step 6 of 20), then pick back up at step 6. The 3rd party web API has a GUID for each job, and is asynchronous, so that should be fine - after the problem is fixed, it will happily let a job resume with remaining steps.
My question: Is it possible (and if so advisable?) to design a pipeline that could begin at any step, assuming the pre-requisites for that step were valid?
It would look something like:
job fails on step 6 and logs step 5 as the last successful step
a human comes along and fixes whatever caused step 6 to fail
a new pipeline is started at step 6
I realize a brute-force way would be to have StartAtStep2(), StartAtStep3(), StartAtStep4() methods. That doesn't seem like a good design, but I'm a bit new at this pattern so maybe that's acceptable.
The brute-force way is not that bad; for example, your above code would just need to be:
bool StartAtStepThirteen(FileInfo csv)
{
    return stepThirteenPostToWebApi.Post(csv);
}
The setup of the chain should be a separate method from the execution of the chain. You should save stepThirteenPostToWebApi in a class-level variable in a class that represents the entire chain; the setup of the chain could be done in the class's constructor.
Here is a simple 3-step version of the process. When an error happens, instead of faulting the task chain I log the error and pass null along the chain for invalid entries. You could make that log method raise an event, and then the user can decide what to do with the bad entry.
public class WorkChain
{
    private readonly TransformBlock<string, FileInfo> stepOneGetFileInfo;
    private readonly TransformBlock<FileInfo, System.Guid?> stepTwoPostToWebApi;
    private readonly ActionBlock<System.Guid?> stepThreeDisplayIdToUser;

    public WorkChain()
    {
        stepOneGetFileInfo = new TransformBlock<string, FileInfo>(new Func<string, FileInfo>(GetFileInfo));
        stepTwoPostToWebApi = new TransformBlock<FileInfo, System.Guid?>(new Func<FileInfo, Task<Guid?>>(PostToWebApi));
        stepThreeDisplayIdToUser = new ActionBlock<System.Guid?>(new Action<Guid?>(DisplayIdToUser));

        stepOneGetFileInfo.LinkTo(stepTwoPostToWebApi, new DataflowLinkOptions() { PropagateCompletion = true });
        stepTwoPostToWebApi.LinkTo(stepThreeDisplayIdToUser, new DataflowLinkOptions() { PropagateCompletion = true });
    }

    public void PostToStepOne(string path)
    {
        bool result = stepOneGetFileInfo.Post(path);
        if (!result)
        {
            throw new InvalidOperationException("Failed to post to stepOneGetFileInfo");
        }
    }

    public void PostToStepTwo(FileInfo csv)
    {
        bool result = stepTwoPostToWebApi.Post(csv);
        if (!result)
        {
            throw new InvalidOperationException("Failed to post to stepTwoPostToWebApi");
        }
    }

    public void PostToStepThree(Guid id)
    {
        bool result = stepThreeDisplayIdToUser.Post(id);
        if (!result)
        {
            throw new InvalidOperationException("Failed to post to stepThreeDisplayIdToUser");
        }
    }

    public void CompleteAdding()
    {
        stepOneGetFileInfo.Complete();
    }

    public Task Completion { get { return stepThreeDisplayIdToUser.Completion; } }

    private FileInfo GetFileInfo(string path)
    {
        try
        {
            return new FileInfo(path);
        }
        catch (Exception ex)
        {
            LogGetFileInfoError(ex, path);
            return null;
        }
    }

    private async Task<Guid?> PostToWebApi(FileInfo csv)
    {
        if (csv == null)
            return null;

        try
        {
            dynamic task = await ApiUtils.SubmitData(csv.FullName);
            return task.guid;
        }
        catch (Exception ex)
        {
            LogPostToWebApiError(ex, csv);
            return null;
        }
    }

    private void DisplayIdToUser(Guid? obj)
    {
        if (obj == null)
            return;

        Console.WriteLine(obj.Value);
    }
}