ASP.NET Core API response takes too much time - C#

I have a SQL database table with 9000 rows and 97 columns. Its primary key is a composite of two columns: Color and Name.
I have an ASP.NET Core API listening at the URL api/color/{colorName}; it reads the table to get the color information. Currently I have 3 colors, with about 3000 rows each.
It takes too much time. Reading the table takes 2383 ms and mapping to DTOs takes 14 ms. After that I immediately return the DTOs to the consumer, yet somehow the API call takes 4135.422 ms. I don't understand why. I would expect it to take about 2407.863 ms, but it doesn't; it takes almost twice as long.
You can see my code and logs below. Do you have any ideas on how I can improve the response time?
I am using Entity Framework Core 3.1, AutoMapper and ASP.NET Core 3.1.
Service:
public async Task<IEnumerable<ColorDTO>> GetColors(string requestedColor)
{
    var watch = System.Diagnostics.Stopwatch.StartNew();
    var colors = await _dbContext.Colors.Where(color => color.color == requestedColor).ToListAsync();
    watch.Stop();
    _logger.LogError("Color of:{requestedColor} Reading takes:{elapsedMs}", requestedColor, watch.ElapsedMilliseconds);

    var watch2 = System.Diagnostics.Stopwatch.StartNew();
    var colorDtos = _mapper.Map<IEnumerable<ColorDTO>>(colors);
    watch2.Stop();
    _logger.LogError("Color of:{requestedColor} Mapping takes:{elapsedMs}", requestedColor, watch2.ElapsedMilliseconds);

    return colorDtos;
}
Controller:
public async Task<ActionResult<IEnumerable<ColorDTO>>> GetBlocksOfPanel(string requestedColor)
{
    return Ok(await _colorService.GetColors(requestedColor));
}
And the logs:
2020-04-27 15:21:54.8793||0HLVAKLTJO59T:00000003|MyProject.Api.Services.IColorService|INF|Color of Purple Reading takes:2383ms
2020-04-27 15:21:54.8994||0HLVAKLTJO59T:00000003|MyProject.Api.Services.IColorService|INF|Color of Purple Mapping takes:14ms
2020-04-27 15:21:54.9032||0HLVAKLTJO59T:00000003|Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker|INF|Executed action method MyProject.Api.Web.Controllers.ColorsController.GetColors (MyProject.Api.Web), returned result Microsoft.AspNetCore.Mvc.OkObjectResult in 2407.863ms.
2020-04-27 15:21:54.9081||0HLVAKLTJO59T:00000003|Microsoft.AspNetCore.Mvc.Infrastructure.ObjectResultExecutor|INF|Executing ObjectResult, writing value of type 'System.Collections.Generic.List`1[[MyProject.Api.Contracts.Dtos.ColorDTO, MyProject.Api.Contracts, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null]]'.
2020-04-27 15:21:56.4895||0HLVAKLTJO59T:00000003|Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker|INF|Executed action MyProject.Api.Web.Controllers.ColorsController.GetColors (MyProject.Api.Web) in 4003.8022ms
2020-04-27 15:21:56.4927||0HLVAKLTJO59T:00000003|Microsoft.AspNetCore.Routing.EndpointMiddleware|INF|Executed endpoint 'MyProject.Api.Web.Controllers.ColorsController.GetColors (MyProject.Api.Web)'
2020-04-27 15:21:56.4972||0HLVAKLTJO59T:00000003|Microsoft.AspNetCore.Hosting.Diagnostics|INF|Request finished in 4135.422ms 200 application/json; charset=utf-8

As @ejwill mentioned in his comment, you need to consider the latency of the entire operation. Fetching from the database and mapping to DTOs is only part of what happens during the round trip of the request and response to your API.
You can probably reduce the query time against your database table through some optimizations there. You don't indicate which database you're using, but a composite key based on two string/varchar values may not be the most performant, and indexes on the values you're filtering on may also help; there are tradeoffs there depending on whether you're optimizing for writes or for reads. That being said, 97 columns is not trivial either way. Do you need to query and return all 97 columns over the API? Is pagination an option?
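For example, if the endpoint only needs a handful of those columns, projecting in the query keeps EF Core from materializing all 97 of them. A minimal sketch; the ColorSummaryDto type and its property names are assumptions, not from your code:

// Hypothetical trimmed DTO containing only the columns the API actually needs.
public class ColorSummaryDto
{
    public string Color { get; set; }
    public string Name { get; set; }
}

// Projecting in the query means SQL selects only these columns and
// EF Core never tracks or materializes the full 97-column entity.
public async Task<List<ColorSummaryDto>> GetColorSummaries(string requestedColor)
{
    return await _dbContext.Colors
        .AsNoTracking()
        .Where(c => c.color == requestedColor)
        .Select(c => new ColorSummaryDto { Color = c.color, Name = c.Name })
        .ToListAsync();
}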
If you must return all the data for all 97 columns at once and you're querying the API frequently, you can also consider using an in-memory cache, especially if the table does not change often; instead of making the round trip to the database every time, you keep a copy of the data in memory so it can be returned much more quickly. You can look at an implementation of an in-memory cache that supports a generational model, so it keeps serving up data while new versions are fetched.
https://github.com/jfbosch/recache
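If a dedicated caching library is more than you need, a minimal sketch using ASP.NET Core's built-in IMemoryCache could look like the following; the AppDbContext name, the entity type name, the cache key format and the 5-minute lifetime are all assumptions:

public class CachedColorService
{
    private readonly IMemoryCache _cache;       // from Microsoft.Extensions.Caching.Memory
    private readonly AppDbContext _dbContext;   // hypothetical DbContext name

    public CachedColorService(IMemoryCache cache, AppDbContext dbContext)
    {
        _cache = cache;
        _dbContext = dbContext;
    }

    public Task<List<Color>> GetColorsAsync(string requestedColor)
    {
        // One cache entry per color; the database is only hit on a cache miss.
        return _cache.GetOrCreateAsync($"colors:{requestedColor}", entry =>
        {
            entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5);
            return _dbContext.Colors
                .Where(c => c.color == requestedColor)
                .ToListAsync();
        });
    }
}

This assumes services.AddMemoryCache() has been registered in Startup.ConfigureServices.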

Serialization of the result could be taking a large share of the time.
The first thing is serialization itself: if you return 3000 records, serializing them to JSON or XML takes significant time. Consider moving to a more compact binary format.
The second thing is memory and GC. If the amount of serialized data exceeds 85,000 bytes, the memory for that data is allocated on the Large Object Heap (LOH) in one chunk, which can take time. You might inspect your LOH and look for response data stored there. A possible workaround is to respond with chunks of data, using some kind of paging with an offset and a page size.
You can easily check whether serialization is causing the performance trouble: leave the call to the database as it is, but return only 100-200 rows to the client instead of the whole result, or return fewer object fields (for example, only 3). The time should drop.
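A quick sketch of that experiment inside the controller action; the DTO property names here are assumptions:

// Temporary diagnostic: same query, but only a slice of the data reaches the serializer.
var colors = await _colorService.GetColors(requestedColor);
var sample = colors
    .Take(100)                                 // only 100 of the ~3000 rows
    .Select(c => new { c.Color, c.Name });     // only a couple of fields instead of 97
return Ok(sample);
// If the total request time drops sharply, serialization (and the large-object
// allocations behind it) is where the missing time goes.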

Your problem may relate to the SQL side. Check the indexing of your columns and run your query with the execution plan to find the bottleneck. Also, for better scalability, I suggest making sure the data-access code is asynchronous.

Related

Large HTTPResponseMessage causes .NET Core server process to run out of memory

I have a C# .NET Core 2.2 web server process that exposes an API. When a request comes in, the server needs to make its own HTTP request to a database API. Depending on the query, the response from the database can be very large, and in some cases it is large enough that my .NET process crashes with (Memory quota exceeded) in the logs.
The code that sends off the request looks like this:
string endpoint_url = "<database service url>";
var request_body = new StringContent(query, Encoding.UTF8, "<content type>");
request_body.Headers.ContentType.CharSet = "";
try
{
    var request_task = Http.client.PostAsync(endpoint_url, request_body);
    if (await Task.WhenAny(request_task, Task.Delay(timeoutSeconds * 1000)) == request_task)
    {
        request_task.Result.EnsureSuccessStatusCode();
        var response = await request_task.Result.Content.ReadAsStringAsync();
        JObject json_result = JObject.Parse(response);
        if (json_result["errors"] is null)
        {
            return json_result;
        }
        else
        {
            // return error
        }
    }
    else
    {
        // return timeout error
    }
}
catch (Exception e)
{
    // return error
}
My question is: what is the best way to protect my web service from going down when a query returns a large response like this? The .NET Core best practices suggest that I shouldn't load the response body into a string wholesale, but they don't really suggest an alternative.
I want to fail gracefully and return an error to the client rather than cause an outage of the .NET service, so some kind of limit on the response size would work. Unfortunately, the database service in question does not return a Content-Length header, so I can't just check that.
My web server currently has 512 MB of memory available, which I know is not much, but I'm concerned that this error could happen for a large enough response regardless of the amount of memory I have available. My main concern is guaranteeing that my .NET service won't crash regardless of the size of the response from the database service.
If Http.client is an HttpClient, you can restrict the maximum amount of data it will read before aborting the operation and throwing an exception via its MaxResponseContentBufferSize property. By default it is set to 2 GB, which explains why it takes your server down when it only has 512 MB of RAM, so set it to something like 10-20 MB and handle the exception when the limit is exceeded.
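A sketch of that idea; the 20 MB cap, the static client and the method shape are assumptions, and HttpRequestException is the exception I would expect when the buffer limit is exceeded:

private static readonly HttpClient Client = new HttpClient
{
    // ~20 MB cap; responses larger than this abort with an exception
    // instead of being buffered into memory.
    MaxResponseContentBufferSize = 20 * 1024 * 1024
};

public static async Task<JObject> QueryDatabaseApi(string endpointUrl, StringContent requestBody)
{
    try
    {
        var response = await Client.PostAsync(endpointUrl, requestBody);
        response.EnsureSuccessStatusCode();
        var body = await response.Content.ReadAsStringAsync();
        return JObject.Parse(body);
    }
    catch (HttpRequestException)
    {
        // Thrown when the response exceeds MaxResponseContentBufferSize
        // (or when the request itself fails); surface an error to the caller.
        return null;
    }
}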
The simplest approach you could use is to make the decision based on the returned row count.
If you are using ExecuteReader, it will not return the number of affected rows, but you can overcome this limitation by simply returning two result sets. The first result set would contain a single row with a single column that tells you the row count; based on that you can decide whether to call NextResult and process the requested data.
If you are using stored procedures, you can use an output parameter to indicate the retrieved row count, via either the @@ROWCOUNT variable or the ROWCOUNT_BIG() function. Again, you can branch on that value.
The advantage of these solutions is that you don't have to read a single record if the result would outgrow your available space.
The disadvantage is that determining the threshold can be hard, because it could depend on the query itself, on one or more of its parameters, on the table size, and so on.
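A rough sketch of the two-result-set variant with plain ADO.NET; the table name, column names and the 10,000-row threshold are made up for illustration:

// Runs the real query only if a cheap COUNT says the result will fit.
public static void ReadIfSmallEnough(string connectionString, int batchId)
{
    const long maxRows = 10000;

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(
        @"SELECT COUNT_BIG(*) FROM dbo.Measurements WHERE BatchId = @batchId;
          SELECT * FROM dbo.Measurements WHERE BatchId = @batchId;", connection))
    {
        command.Parameters.AddWithValue("@batchId", batchId);
        connection.Open();

        using (var reader = command.ExecuteReader())
        {
            reader.Read();
            long rowCount = reader.GetInt64(0);   // first result set: just the count

            if (rowCount > maxRows)
                throw new InvalidOperationException($"Result too large: {rowCount} rows.");

            reader.NextResult();                  // second result set: the actual data
            while (reader.Read())
            {
                // process the row
            }
        }
    }
}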
Well, you definitely shouldn't be creating an unbounded string that could be larger than your heap size, but it's more complicated than just that advice. As others have pointed out, the entire system needs to work together to be able to return large results with a limited memory footprint.
The simplest answer to your direct question (how can I send back an error if the response won't fit in memory?) is to create a buffer of some limited "max" size and read only that much data from the response. If it doesn't fit in your buffer, it's too large and you can return an error.
But in general that's a poor design, because the "max" is impossible to derive statically; it depends on server load.
The better answer is to avoid buffering the entire result before sending it to the client and instead stream the results to the client: read in a buffer full of data and write that buffer, or some processed form of it, out to the client. But that requires some synergy between the back-end API, your service and possibly the client.
If your service has to parse a complete object, as you're doing with JObject.Parse, then you'll likely need to rethink your design in general.
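For the limited-buffer idea mentioned above, a sketch could look like this; the 20 MB cap and the helper name are assumptions:

private const int MaxResponseBytes = 20 * 1024 * 1024; // arbitrary cap

// Reads the response body in chunks and fails fast once the cap is exceeded,
// so the full payload is never held in memory unbounded.
public static async Task<string> ReadBodyWithLimitAsync(HttpClient client, HttpRequestMessage request)
{
    // ResponseHeadersRead: don't let HttpClient buffer the whole body up front.
    using (var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead))
    {
        response.EnsureSuccessStatusCode();
        using (var stream = await response.Content.ReadAsStreamAsync())
        using (var buffer = new MemoryStream())
        {
            var chunk = new byte[81920];
            int read;
            while ((read = await stream.ReadAsync(chunk, 0, chunk.Length)) > 0)
            {
                if (buffer.Length + read > MaxResponseBytes)
                    throw new InvalidOperationException("Response from database service is too large.");
                buffer.Write(chunk, 0, read);
            }
            return Encoding.UTF8.GetString(buffer.ToArray());
        }
    }
}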

How to insert into DocumentDB from an Excel file containing 5000 records?

I have an Excel file that originally had about 200 rows. I was able to convert the Excel file to a data table, and everything was inserted into DocumentDB correctly.
The Excel file now has 5000 rows, and it stops inserting after 30-40 records; the rest of the rows are not inserted into DocumentDB.
I get the exception below.
Microsoft.Azure.Documents.DocumentClientException: Exception:
Microsoft.Azure.Documents.RequestRateTooLargeException, message:
{"Errors":["Request rate is large"]}
My code is :
Service service = new Service();
var students = new List<Student>();

foreach (var data in exceldata) // exceldata contains the set of rows
{
    var student = new Student();
    student.id = "";
    student.name = data.name;
    student.age = data.age;
    student.@class = data.@class; // 'class' is a reserved word, hence the @ prefix
    // collectionLink is a string stored in web.config
    student.id = await service.AddDocument(collectionLink, student);
    students.Add(student);
}

class Service
{
    public async Task<string> AddDocument(string collectionLink, Student data)
    {
        this.DeserializePayload(data);
        var result = await Client.CreateDocumentAsync(collectionLink, data);
        return result.Resource.Id;
    }
}
Am I doing anything wrong?
Any help would be greatly appreciated.
Update:
As of 4/8/15, DocumentDB has released a data import tool, which supports JSON files, MongoDB, SQL Server, and CSV files. You can find it here: http://www.microsoft.com/en-us/download/details.aspx?id=46436
In this case, you can save your Excel file as a CSV and then bulk-import records using the data import tool.
Original Answer:
DocumentDB collections are provisioned with 2,000 request units (RUs) per second. It's important to note that the limits are expressed in terms of request units, not requests; writing larger documents costs more than smaller documents, and scans are more expensive than index seeks.
You can measure the overhead of any operation (CRUD) by inspecting the x-ms-request-charge HTTP response header or the RequestCharge property in the ResourceResponse/FeedResponse objects returned by the SDK.
A RequestRateTooLargeException is thrown when you exhaust the provisioned throughput. Some solutions include:
Back off with a short delay and retry whenever you encounter the exception. A recommended retry delay is included in the x-ms-retry-after-ms HTTP response header. Alternatively, you could simply batch requests with a short delay (a sketch of this follows below the list).
Use lazy indexing for a faster ingestion rate. DocumentDB allows you to specify indexing policies at the collection level. By default, the index is updated synchronously on each write to the collection. This enables queries to honor the same consistency level as document reads, without any delay for the index to "catch up". Lazy indexing can be used to amortize the work required to index content over a longer period of time. It is important to note, however, that when lazy indexing is enabled, query results will be eventually consistent regardless of the consistency level configured for the DocumentDB account.
As mentioned, each collection has a limit of 2,000 RUs; you can increase throughput by sharding / partitioning your data across multiple collections and capacity units.
Delete empty collections to utilize all provisioned throughput. Every document collection created in a DocumentDB account is allocated reserved throughput capacity based on the number of Capacity Units (CUs) provisioned and the number of collections created. A single CU makes available 2,000 request units (RUs) and supports up to 3 collections. If only one collection is created for the CU, the entire CU throughput is available to that collection. Once a second collection is created, the throughput of the first collection is halved and given to the second collection, and so on. To maximize the throughput available per collection, I'd recommend keeping the ratio of capacity units to collections at 1:1.
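A sketch of the back-off-and-retry idea from the first item, using the DocumentClientException thrown by the SDK; the retry count is arbitrary, and the Student type and DocumentClient come from the question:

// Retries a DocumentDB write when the provisioned throughput is exhausted (HTTP 429),
// waiting for the interval the service suggests before trying again.
public static async Task<string> CreateWithRetryAsync(
    DocumentClient client, string collectionLink, Student student, int maxRetries = 5)
{
    for (var attempt = 0; ; attempt++)
    {
        try
        {
            var result = await client.CreateDocumentAsync(collectionLink, student);
            return result.Resource.Id;
        }
        catch (DocumentClientException ex) when ((int?)ex.StatusCode == 429 && attempt < maxRetries)
        {
            // RetryAfter mirrors the x-ms-retry-after-ms response header.
            await Task.Delay(ex.RetryAfter);
        }
    }
}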
References:
DocumentDB Performance Tips:
http://azure.microsoft.com/blog/2015/01/27/performance-tips-for-azure-documentdb-part-2/
DocumentDB Limits:
http://azure.microsoft.com/en-us/documentation/articles/documentdb-limits/

'Streaming' data into Sql server

I'm working on a project where we're receiving data from multiple sources, and it needs to be saved into various tables in our database.
Fast.
I've played with various methods, and the fastest I've found so far is using a collection of table-valued parameters, filling them up and periodically sending them to the database via a corresponding collection of stored procedures.
The results are quite satisfying. However, looking at disk usage (% Idle Time in PerfMon), I can see that the disk is getting periodically 'thrashed' (a spike down to 0% every 13-18 seconds), while in between the % Idle Time is around 90%. I've tried varying the batch size, but it doesn't have an enormous influence.
Should I be able to get better throughput by (somehow) avoiding the spikes while decreasing the overall idle time?
What are some things I should be looking out to work out where the spiking is happening? (The database is in Simple recovery mode, and pre-sized to 'big', so it's not the log file growing)
Bonus: I've seen other questions referring to 'streaming' data into the database, but this seems to involve having a Stream from another database (last section here). Is there any way I could shoe-horn 'pushed' data into that?
A very easy way of inserting loads of data into SQL Server is, as mentioned, the bulk insert method. ADO.NET offers a very easy way of doing this without the need for external files. Here's the code:
var bulkCopy = new SqlBulkCopy(myConnection);
bulkCopy.DestinationTableName = "MyTable";
bulkCopy.WriteToServer(myDataTable); // a DataTable (or DataRow[] / IDataReader); WriteToServer has no DataSet overload
That's easy.
But: myDataTable needs to have exactly the same structure as MyTable, i.e. names, field types and order of fields must be exactly the same. If not, well, there's a solution to that. It's column mapping. And this is even easier to do:
bulkCopy.ColumnMappings.Add("ColumnNameOfDataSet", "ColumnNameOfTable");
That's still easy.
But: myDataTable needs to fit into memory. If it doesn't, things become a bit trickier, as we then need an IDataReader derivative which allows us to instantiate it with an IEnumerable.
You might find all the information you need in this article.
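If you'd rather not hand-roll an IDataReader, one option (an assumption on my part, not something from the article, is that pulling in the FastMember NuGet package is acceptable) is its ObjectReader, which wraps an IEnumerable<T>:

using FastMember; // NuGet: FastMember

// Streams an IEnumerable<Person> into SqlBulkCopy without building a DataTable,
// so the whole sequence never has to fit in memory at once.
public static void BulkInsertPeople(SqlConnection connection, IEnumerable<Person> people)
{
    using (var bulkCopy = new SqlBulkCopy(connection) { DestinationTableName = "Person" })
    using (var reader = ObjectReader.Create(people, "Name", "DateOfBirth"))
    {
        bulkCopy.ColumnMappings.Add("Name", "Name");
        bulkCopy.ColumnMappings.Add("DateOfBirth", "DateOfBirth");
        bulkCopy.WriteToServer(reader); // assumes the connection is already open
    }
}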
Building on the code referred to in alzaimar's answer, I've got a proof of concept working with IObservable (just to see if I could). It seems to work OK. I just need to put together some tidier code to see if this is actually any faster than what I already have.
(The following code only really makes sense in the context of the test program in code download in the aforementioned article.)
Warning: NSFW, copy/paste at your peril!
private static void InsertDataUsingObservableBulkCopy(IEnumerable<Person> people,
                                                      SqlConnection connection)
{
    var sub = new Subject<Person>();

    var bulkCopy = new SqlBulkCopy(connection);
    bulkCopy.DestinationTableName = "Person";
    bulkCopy.ColumnMappings.Add("Name", "Name");
    bulkCopy.ColumnMappings.Add("DateOfBirth", "DateOfBirth");

    using (var dataReader = new ObjectDataReader<Person>(people))
    {
        var task = Task.Factory.StartNew(() =>
        {
            bulkCopy.WriteToServer(dataReader);
        });

        var stopwatch = Stopwatch.StartNew();
        foreach (var person in people) sub.OnNext(person);
        sub.OnCompleted();
        task.Wait();

        Console.WriteLine("Observable Bulk copy: {0}ms",
            stopwatch.ElapsedMilliseconds);
    }
}
It's difficult to comment without knowing the specifics, but one of the fastest ways to get data into SQL Server is Bulk Insert from a file.
You could write the incoming data to a temp file and periodically bulk insert it.
Streaming data into a SQL Server table-valued parameter also looks like a good solution for fast inserts, as the rows are held in memory. In answer to your question: yes, you could use this, you just need to turn your data into an IDataReader. There are various ways to do this, from a DataTable for example; see here.
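A sketch of sending a batch through a table-valued parameter; the table type and procedure names here are made up:

// Assumes a user-defined table type dbo.PersonTableType(Name nvarchar(100), DateOfBirth date)
// and a stored procedure dbo.InsertPeople(@People dbo.PersonTableType READONLY).
public static void InsertBatch(SqlConnection connection, IEnumerable<Person> people)
{
    var table = new DataTable();
    table.Columns.Add("Name", typeof(string));
    table.Columns.Add("DateOfBirth", typeof(DateTime));
    foreach (var p in people)
        table.Rows.Add(p.Name, p.DateOfBirth);

    using (var command = new SqlCommand("dbo.InsertPeople", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        var parameter = command.Parameters.AddWithValue("@People", table);
        parameter.SqlDbType = SqlDbType.Structured;
        parameter.TypeName = "dbo.PersonTableType";
        command.ExecuteNonQuery(); // assumes the connection is already open
    }
}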
If your disk is a bottleneck you could always optimise your infrastructure; put the database on a RAM disk or SSD, for example.

SQL - Better two queries instead of one big one

I am working on a C# application which loads data from an MS SQL 2008 or 2008 R2 database. The table looks something like this:
ID | binary_data | Timestamp
I need to get only the last entry, and only the binary data. Entries are added to this table irregularly by another program, so I have no way of knowing whether there is a new entry.
Which version is better (performance etc.) and why?
//Always a query, which might not be needed
public void ProcessData()
{
    byte[] data = "query code get latest binary data from db"
}
vs
//Always a smaller check-query, and sometimes two queries
public void ProcessData()
{
    DateTime timestamp = "query code get latest timestamp from db"
    if (timestamp > old_timestamp)
        data = "query code get latest binary data from db"
}
The binary_data field will be around 30 kB in size. The ProcessData function will be called several times per minute, but sometimes it can be called every 1-2 seconds. This is only a small part of a bigger program with lots of threading/database access, so I want the "lightest" solution. Thanks.
Luckily, you can have both:
SELECT TOP 1 binary_data
FROM myTable
WHERE Timestamp > @last_timestamp
ORDER BY Timestamp DESC
If there is no record newer than @last_timestamp, no rows are returned and thus no data transmission takes place (= fast). If there are new records, the binary data of the newest one is returned immediately (= no need for a second query).
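For completeness, a sketch of calling that query from C#, with connection handling kept minimal and names taken from the question:

// Returns the newest binary_data if anything is newer than lastTimestamp, otherwise null.
public static byte[] GetLatestBinaryData(string connectionString, DateTime lastTimestamp)
{
    const string sql = @"SELECT TOP 1 binary_data
                         FROM myTable
                         WHERE Timestamp > @last_timestamp
                         ORDER BY Timestamp DESC";

    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand(sql, connection))
    {
        command.Parameters.AddWithValue("@last_timestamp", lastTimestamp);
        connection.Open();
        return command.ExecuteScalar() as byte[]; // null when no newer row exists
    }
}

If the caller also needs the new timestamp to remember for the next call, select it alongside binary_data.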
I would suggest you test both methods, as the answer depends on your usage. Simulate some expected behaviour.
I would say, though, that you are probably okay just doing the first query. Do what works; don't prematurely optimise. If the single query is too slow, try your second, two-query approach.
A two-step approach is more efficient from the point of view of the overall system workload:
Get informed that you need to query new data
Query the new data
There are several ways to implement this approach. Here are a couple of them:
Using Query Notifications, which is built-in SQL Server functionality supported in .NET.
Using an implied method of getting informed of a database table update, e.g. the one described in this article on the SQL Authority blog.
I think the better path is a stored procedure that keeps the logic inside the database: something with an output parameter carrying the required data and a return value (e.g. TRUE/FALSE) to signal the presence of new data.
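A sketch of what the calling side of such a stored procedure could look like; the procedure and parameter names are hypothetical:

// Assumes a procedure dbo.GetLatestBinaryData with an OUTPUT parameter @data varbinary(max)
// that returns 1 when new data exists and 0 otherwise.
public static bool TryGetLatestBinaryData(SqlConnection connection, DateTime lastTimestamp, out byte[] data)
{
    using (var command = new SqlCommand("dbo.GetLatestBinaryData", connection))
    {
        command.CommandType = CommandType.StoredProcedure;
        command.Parameters.AddWithValue("@last_timestamp", lastTimestamp);

        var dataParam = command.Parameters.Add("@data", SqlDbType.VarBinary, -1); // -1 = varbinary(max)
        dataParam.Direction = ParameterDirection.Output;

        var returnParam = command.Parameters.Add("@return", SqlDbType.Int);
        returnParam.Direction = ParameterDirection.ReturnValue;

        command.ExecuteNonQuery(); // assumes the connection is already open

        data = dataParam.Value as byte[]; // DBNull (so null here) when there is no new data
        return (int)returnParam.Value == 1;
    }
}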

Can I use LINQ to skip a collection and just return 100 records?

I have the following call, which returns a collection from Azure Table Storage, where Skip is not implemented. The number of rows returned is approximately 500.
ICollection<City> a = cityService.Get("0001I");
What I would like to do is, depending on an argument, have just the following ranges returned:
records 1-100 passing in 0 as an argument to a LINQ expression
records 101-200 passing in 100 as an argument to a LINQ expression
records 201-300 passing in 200 as an argument to a LINQ expression
records 301-400 passing in 300 as an argument to a LINQ expression
etc
Is there some way I can add to the above and use LINQ to get these ranges of records returned?
As you already stated in your question, the Skip method is not implemented in Windows Azure Table storage. This means you have 2 options left:
Option 1
Download all data from table storage (by using ToList, see abatishchev's answer) and execute the Skip and Take methods on this complete list. In your question you're talking about 500 records, so if the number of records doesn't grow too much this solution should be OK for you; just make sure that all records have the same partition key.
If the data grows you can still use this approach, but I suggest you evaluate a caching solution to store all the records instead of loading them from table storage over and over again (this will improve performance, but don't expect it to work with very large amounts of data). Caching is possible in Windows Azure using:
Windows Azure Caching (Preview)
Windows Azure Shared Caching
Option 2
The CloudTableQuery class allows you to query for data and, more importantly, to receive a continuation token to build a paging implementation. This allows you to detect whether you can query for more data; the pagination example in Scott's blog post (see nemensv's comment) uses this.
For more information on continuation tokens, I suggest you take a look at Jim's blog post: Azure#home Part 7: Asynchronous Table Storage Pagination. By using continuation tokens you only download the data for the current page, meaning it will work correctly even if you have millions of records. But you have to know the downsides of using continuation tokens:
This won't work with the Skip method out of the box, so it might not be a solution for you.
No page 'numbers', because you only know whether there's more data (not how much).
No way to count all records.
If paging is not supported by the underlying engine, the only way to implement it is to load all the data into memory and then perform the paging:
var list = cityService.Get("0001I").ToList(); // materialize
var result = list.Skip(x).Take(y);
Try something like this:
cityService.Get("0001I").ToList().Skip(n).Take(100);
This should return records 201-300:
cityService.Get("0001I").ToList().Skip(200).Take(100);
a.AsEnumerable().Skip(m).Take(n)
