I need a bit of advice when it comes to making efficient use of database resources.
At the moment, I'm building an ordering system that takes an uploaded file, and runs through that file adding each line to an order.
At the same time this is done, the app checks that the product code requested is available to sell to that customer.
Given that the file can contain upwards of 200 lines (and thus that many requests to the database to check), I'm eager to know whether it's more efficient to make a single request to the database for all the product codes available and then run the check against that list, even though there will be roughly 2000 codes in that list.
So: either 200 sequential one-result requests, or a single 2000-result request.
The site will be handling about 130 uploads within a 4-5 hour period, and must traverse a VPN from Azure to our database server.
This looks like another case of Premature Optimization (tam tam taaaaam).
You don't know that you have a problem, and yet you're trying to solve it. The first thing you should do is check whether there's a real performance problem here. My guess is that there isn't. You're going to read 2000 records and write 200 records once every few minutes. That's really not something to worry about.
But don't take my word for it, try it out. See how long it takes you to load those 2000 records and write those 200 records. If there's a problem, try to optimize.
By the way, optimizing this by breaking the request into 200 smaller requests is unlikely to work. But let's cross that bridge when you get there.
It will definitely be more efficient to make a single query that returns 2000 rows than to make 200 queries that each return a single row. For the single-row queries the actual data would be a minor part of the traffic; it would be mostly overhead.
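As an illustration, here is a minimal sketch of the single-request approach, assuming a table such as AllowedProducts with CustomerId and ProductCode columns (these names are not from the original question):

using System;
using System.Collections.Generic;
using System.Data.SqlClient;

// Sketch: load all product codes the customer may buy in one round trip,
// then validate each of the ~200 uploaded lines against the in-memory set.
// Table/column names (AllowedProducts, CustomerId, ProductCode) are assumed.
public static HashSet<string> LoadAllowedCodes(string connectionString, int customerId)
{
    var codes = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        "SELECT ProductCode FROM AllowedProducts WHERE CustomerId = @CustomerId", conn))
    {
        cmd.Parameters.AddWithValue("@CustomerId", customerId);
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                codes.Add(reader.GetString(0));
        }
    }
    return codes;
}

// Usage: one query over the VPN, then ~200 in-memory checks.
// var allowed = LoadAllowedCodes(connStr, customerId);
// bool sellable = allowed.Contains(line.ProductCode);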
Another alternative would be to put that check in the query that adds the line to the order; that way you don't need a separate query to check the product first. If the product can't be sold to the customer, the query simply won't insert a record, and it can return the number of records added so that the calling code can determine whether the line was added or not.
Example:
create procedure AddOrderLine
    @OrderId int,
    @ProductId int,
    @Quantity int
as
set nocount on

insert into OrderLines (OrderId, ProductId, Quantity)
select
    o.OrderId,
    @ProductId,
    @Quantity
from
    Orders o
    inner join AllowedProducts a on a.CustomerId = o.CustomerId and a.ProductId = @ProductId
where
    o.OrderId = @OrderId

return @@rowcount
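And a hedged sketch of the calling side, showing how the procedure's return value could be read from ADO.NET (the connection handling here is illustrative):

using System.Data;
using System.Data.SqlClient;

// Returns true if the line was inserted, i.e. the product is allowed for that customer.
public static bool AddOrderLine(string connectionString, int orderId, int productId, int quantity)
{
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("AddOrderLine", conn) { CommandType = CommandType.StoredProcedure })
    {
        cmd.Parameters.AddWithValue("@OrderId", orderId);
        cmd.Parameters.AddWithValue("@ProductId", productId);
        cmd.Parameters.AddWithValue("@Quantity", quantity);

        // Capture the procedure's RETURN value (@@rowcount from the insert).
        var returnValue = cmd.Parameters.Add("@ReturnValue", SqlDbType.Int);
        returnValue.Direction = ParameterDirection.ReturnValue;

        conn.Open();
        cmd.ExecuteNonQuery();
        return (int)returnValue.Value > 0;
    }
}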
I am developing a C# Windows desktop application in Visual Studio and am literally stuck on how I should phrase my code so that, when an item is sold in my supermarket system, it reduces the available stock in the database. Any assistance rendered will be much appreciated.
I tried with this SQL but it failed miserably, because I didn't even know the C# to add to it:
string query = "update ProductTable set Product_Quantity = (Product_Quantity-'{OrderProductQuantity}') where Product_Name ='{productName}'";
The answer?
You don't write any code at all to reduce the stock.
A simple query will get you the current stock:
WITH MyStock AS
(
    SELECT StockID,
        (SELECT SUM(StockSold)  FROM Invoices WHERE Invoices.StockID = Stock.StockID) AS ItemsSold,
        (SELECT SUM(StockAdded) FROM tblStock WHERE tblStock.StockID = Stock.StockID) AS ItemsAdded
    FROM Stock
)
SELECT *, (ItemsAdded - ItemsSold) AS InventoryLevel
FROM MyStock
So, as you can see, you don't have to write any code at all. You simply sum the items added, less the items sold, and you are done.
Now, the above is SQL Server syntax, and you could use, say, a MySQL view instead - I don't know how (or if) aliased columns can be re-used there, but the above gives you the basic idea.
Really nice? Maybe you have to edit a sales transaction, or even delete a row - the inventory value will update automatically, since you compute it on the fly as required.
As a result, your UI code becomes VERY easy to write. You never have to write ANY code to update the inventory amounts, since you are free to calculate the in-stock value any time you want based on the above.
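For completeness, a minimal C# sketch of reading that computed value, using the same assumed table names as the query above:

using System.Data.SqlClient;

// Sketch: compute the on-the-fly inventory level for one stock item.
// No inventory column is ever updated; the value is derived each time it is needed.
public static int GetInventoryLevel(string connectionString, int stockId)
{
    const string sql = @"
        SELECT ISNULL((SELECT SUM(StockAdded) FROM tblStock WHERE StockID = @StockID), 0)
             - ISNULL((SELECT SUM(StockSold)  FROM Invoices WHERE StockID = @StockID), 0)";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        cmd.Parameters.AddWithValue("@StockID", stockId);
        conn.Open();
        return (int)cmd.ExecuteScalar();
    }
}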
I am working on a console app (C#, asp-core 2.1, Entity Framework Core) which is connected to a local SQL Server database, the default (localdb)\MSSQLLocalDB (SQL Server 2016 v13.0) provided with Visual Studio.
The problem I am facing is that it takes quite a long time to insert data into a table. The table has 400,000 rows and 6 columns, and I insert them 200 at a time.
Right now, the request takes 20 seconds to execute, and this execution time keeps increasing. Considering the fact that I still have 20,000 x 200 rows to insert, it's worth figuring out where this problem comes from!
A couple of facts :
There is no Index on the table
My computer is not new, but I have quite good hardware (i7, 16 GB RAM) and I don't hit 100% CPU while inserting
So, my questions are :
Is 400k rows considered to be a 'large' table? I've never worked with a table that big before, but I thought it was common to have a dataset like this.
How can I investigate where the insert time comes from? I only have Visual Studio installed so far (but I am open to other options).
Here is the SQL code of the table in question :
CREATE TABLE [dbo].[KfStatDatas]
(
[Id] INT IDENTITY (1, 1) NOT NULL,
[DistrictId] INT NOT NULL,
[StatId] INT NOT NULL,
[DataSourceId] INT NOT NULL,
[Value] NVARCHAR(300) NULL,
[SnapshotDate] DATETIME2(7) NOT NULL
);
EDIT
I ran SQL Server Management Studio and found the request that is slowing down the whole process: it is the insert request.
But by looking at the SQL request created by Entity Framework, it looks like it's doing an inner join and going through the whole table, which would explain why the processing time increases with the table size.
I may be missing a point, but why would you need to enumerate the whole table to add rows?
Raw request being executed :
SELECT [t].[Id]
FROM [KfStatDatas] t
INNER JOIN #inserted0 i ON ([t].[Id] = [i].[Id])
ORDER BY [i].[_Position]
EDIT and SOLUTION
I eventually found the issue, and it was a stupid mistake: my Id field was not declared as a primary key! So the system had to go through the whole table for every inserted row. I added the PK and it now takes... 100 ms for 200 rows, and this duration is stable.
Thanks for your time!
I think you may simply be missing a primary key. You've declared to EF that Id is the entity key, but you don't have a unique index on the table to enforce that.
And when EF wants to fetch the inserted IDs without an index, it's expensive. So this query
SELECT t.id from KfStatDatas t
inner join #inserted0 i
on t.id = i.id
order by i._Position
performs 38K logical reads, and takes 16sec on average.
So try:
ALTER TABLE [dbo].[KfStatDatas]
ADD CONSTRAINT PK_KfStatDatas
PRIMARY KEY (id)
BTW are you sure this is EF6? This looks more like EF Core batch insert.
No, 400K rows is not large.
The most efficient way to insert a large number of rows from .NET is with SqlBulkCopy. This should take seconds rather than minutes for 400K rows.
When batching individual inserts, execute the entire batch in a single transaction to improve throughput. Otherwise each insert is committed individually, requiring a synchronous flush of the log buffer to disk for every insert to harden the transaction.
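A hedged sketch of that single-transaction batching with plain ADO.NET (KfStatData here is an assumed DTO matching the table, not a type from the original question):

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public class KfStatData   // assumed simple DTO mirroring the table's columns
{
    public int DistrictId { get; set; }
    public int StatId { get; set; }
    public int DataSourceId { get; set; }
    public string Value { get; set; }
    public DateTime SnapshotDate { get; set; }
}

// Sketch: insert a batch of rows inside one transaction so the log buffer is
// flushed once per batch instead of once per row.
public static void InsertBatch(string connectionString, IEnumerable<KfStatData> batch)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        using (var tx = conn.BeginTransaction())
        using (var cmd = new SqlCommand(
            @"INSERT INTO KfStatDatas (DistrictId, StatId, DataSourceId, Value, SnapshotDate)
              VALUES (@DistrictId, @StatId, @DataSourceId, @Value, @SnapshotDate)", conn, tx))
        {
            cmd.Parameters.Add("@DistrictId", SqlDbType.Int);
            cmd.Parameters.Add("@StatId", SqlDbType.Int);
            cmd.Parameters.Add("@DataSourceId", SqlDbType.Int);
            cmd.Parameters.Add("@Value", SqlDbType.NVarChar, 300);
            cmd.Parameters.Add("@SnapshotDate", SqlDbType.DateTime2);

            foreach (var row in batch)
            {
                cmd.Parameters["@DistrictId"].Value = row.DistrictId;
                cmd.Parameters["@StatId"].Value = row.StatId;
                cmd.Parameters["@DataSourceId"].Value = row.DataSourceId;
                cmd.Parameters["@Value"].Value = (object)row.Value ?? DBNull.Value;
                cmd.Parameters["@SnapshotDate"].Value = row.SnapshotDate;
                cmd.ExecuteNonQuery();
            }
            tx.Commit();   // one log flush for the whole batch
        }
    }
}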
EDIT:
I see from your comment that you are using Entity Framework. This answer may help you use SqlBulkCopy with EF.
I am creating an application that takes data from a text file which has sales data from the Amazon marketplace. The marketplace has items with different names compared to the data in our main database. The application accepts the text file as input, and it needs to check whether each item exists in our database. If it is not present, I should offer the option to save the item to a Master table, or to a Sub-item table and map it to a master item. My question is: if the text file has 100+ items, should I hit the database each time to check if the data exists there? Is there any better way of doing that so that we can minimize the database hits?
Here are two options that I have used earlier:
Hit the database and check if the item exists in the table.
Fill the data in a DataTable and use DataTable.Select to check if it exists.
Can someone tell me the best way to do this? I have to check two tables (the master table and the sub-item table), maybe one at a time. Thanks.
Update:
@Downvoters, add a comment.
I am not asking how to check whether an item exists in the database; I just want to know the best way of doing that. Should I be hitting the database 1000 times if a file has 1000 items? That's my question.
The current query I use:
if exists (select * from [table] where itemname= [itemname] )
select 'True'
else
select 'False'
return
(From Chat)
I would create a stored procedure which takes a table-valued parameter containing all the items that you want to check. You can then use a join (a couple of options here)* to return a result set of items and whether each one exists or not. You can use TVPs from ADO.NET like this.
It will certainly handle the 100 to 1000 row range mentioned in your post. To be honest, I haven't used it in the 1M+ range.
In newer versions of SQL Server, I would prefer TVPs over an XML input parameter, as it is really quite cumbersome to pack the XML in your .NET code and then unpack it again in your stored procedure.
(*) Re joins: with the result set, you can either just inner join the TVP to your items/product table and check in .NET whether a row is missing, or you can do a left outer join with the TVP as the left table and, e.g., ISNULL() missing items to 0 / 'false', etc.
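A hedged sketch of what that could look like from C# (the table type, procedure, and column names below are illustrative assumptions, not from the original):

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Assumes something like this already exists on the server (illustrative names):
//   CREATE TYPE dbo.ItemNameList AS TABLE (ItemName nvarchar(200) NOT NULL);
//   CREATE PROCEDURE dbo.CheckItemsExist @Items dbo.ItemNameList READONLY AS
//       SELECT i.ItemName,
//              CASE WHEN m.ItemName IS NULL THEN 0 ELSE 1 END AS ExistsInMaster
//       FROM @Items i
//       LEFT JOIN MasterTable m ON m.ItemName = i.ItemName;
public static DataTable CheckItemsExist(string connectionString, IEnumerable<string> itemNames)
{
    var tvp = new DataTable();
    tvp.Columns.Add("ItemName", typeof(string));
    foreach (var name in itemNames)
        tvp.Rows.Add(name);

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.CheckItemsExist", conn) { CommandType = CommandType.StoredProcedure })
    {
        var p = cmd.Parameters.AddWithValue("@Items", tvp);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.ItemNameList";      // must match the server-side table type

        var results = new DataTable();
        conn.Open();
        using (var reader = cmd.ExecuteReader())
            results.Load(reader);
        return results;                        // one row per input item, with an ExistsInMaster flag
    }
}

Either way it is a single round trip, whether the file has 100 items or 1000.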
Send the items to the database in batches of 100. A stored procedure will probably help, since repetitive queries have to be fired. If the data does not change frequently, you can consider caching. I assume you will be making service calls from your .NET application, so send the items to the back end in batches (e.g. as XML). Consider increasing the batch size based on the file size.
If your entire application is local, the batch size may be very high, as there is no network overhead, but still don't make 100 calls to the database.
Try it like this:
SELECT EXISTS(SELECT * FROM table1 WHERE itemname= [itemname])
SELECT EXISTS(SELECT 1 FROM table1 WHERE itemname= [itemname])
I'm looking for an efficient way of inserting records into SQL server for my C#/MVC application. Anyone know what the best method would be?
Normally I've just done a while loop and insert statement within, but then again I've not had quite so many records to deal with. I need to insert around half a million, and at 300 rows a minute with the while loop, I'll be here all day!
What I'm doing is looping through a large holding table, and using its rows to create records in a different table. I've set up some functions for lookup data which is necessary for the new table, and this is no doubt adding to the drain.
So here is the query I have. Extremely inefficient for large amounts of data!
Declare @HoldingID int
Set @HoldingID = (Select min(HoldingID) From Holding)

While @HoldingID IS NOT NULL
Begin
    Insert Into Journeys (DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
    Select
        dbo.GetHubIDFromName(StartHubName),
        dbo.GetHubIDFromName(EndHubName),
        dbo.GetBusIDFromName(CompanyName),
        JourneyNo, 1
    From Holding
    Where HoldingID = @HoldingID

    Set @HoldingID = (Select MIN(HoldingID) From Holding Where HoldingID > @HoldingID)
End
I've heard about set-based approaches - is there anything that might work for the above problem?
If you want to insert a lot of data into MS SQL Server then you should use bulk inserts - there is a command line tool called the bcp utility for this, and also a C# wrapper (SqlBulkCopy) for performing bulk copy operations, but under the covers they all use the same bulk-load mechanism.
Depending on your application you may want to insert your data into a staging table first, and then either MERGE or INSERT INTO SELECT... to transfer those rows from the staging table(s) to the target table(s) - if you have a lot of data then this will take some time, however will be a lot quicker than performing the inserts individually.
If you want to speed this up then there are various things you can do, such as changing the recovery model or tweaking/removing triggers and indexes (depending on whether or not this is a live database). If it's still really slow then you should look into doing this process in batches (e.g. 1000 rows at a time).
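A hedged sketch of the staging-table route with SqlBulkCopy (the staging table name Holding_Staging is an assumption; the final INSERT reuses the columns from the question):

using System.Data;
using System.Data.SqlClient;

// Sketch: bulk copy the raw rows into a staging table, then move them with
// one set-based statement instead of half a million single-row inserts.
public static void BulkLoadAndTransfer(string connectionString, DataTable rows)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        using (var bulk = new SqlBulkCopy(conn))
        {
            bulk.DestinationTableName = "dbo.Holding_Staging";   // assumed staging table
            bulk.BatchSize = 10000;        // send in chunks rather than one huge batch
            bulk.BulkCopyTimeout = 0;      // no timeout for large loads
            bulk.WriteToServer(rows);      // column names/order must match the staging table
        }

        using (var cmd = new SqlCommand(
            @"INSERT INTO Journeys (DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
              SELECT dbo.GetHubIDFromName(StartHubName),
                     dbo.GetHubIDFromName(EndHubName),
                     dbo.GetBusIDFromName(CompanyName),
                     JourneyNo, 1
              FROM dbo.Holding_Staging", conn))
        {
            cmd.CommandTimeout = 0;
            cmd.ExecuteNonQuery();
        }
    }
}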
This should do exactly what you are doing now, but as a single set-based statement:
Insert Into Journeys(DepartureID, ArrivalID, ProviderID, JourneyNumber, Active)
Select
dbo.GetHubIDFromName(StartHubName),
dbo.GetHubIDFromName(EndHubName),
dbo.GetBusIDFromName(CompanyName),
JourneyNo, 1
From Holding
ORDER BY HoldingID ASC
You can (probably) do it in one statement of the form:
INSERT INTO JOURNEYS
SELECT * FROM HOLDING;
Without more information about your schema it is difficult to be absolutely sure.
SQL Server 2008 introduced table-valued parameters. These allow you to insert multiple rows in a single trip to the database (sent as one large blob), without using a temporary table. This article describes how it works (step four in the article):
http://www.altdevblogaday.com/2012/05/16/sql-server-high-performance-inserts/
It differs from bulk inserts in that you do not need special utilities and that all constraints and foreign keys are checked.
I quadrupled my throughput by using this and parallelizing the inserts: now at 15,000 inserts/second sustained into the same table - a regular table with indexes and over a billion rows.
I am using VSTS 2008 + C# + .Net 3.0 + ADO.Net + SQL Server 2008. And from ADO.Net I am invoking a stored procedure from SQL Server side. The stored procedure is like this,
SELECT Table1.col2
FROM Table1
LEFT JOIN Table2 ON Table1.col1 = Table2.col1
WHERE Table2.col1 IS NULL
My question is, how to retrieve the returned rows (Table1.col2 in my sample) efficiently? My result may return up to 5,000 rows and the data type for Table1.col2 is nvarchar (4000).
thanks in advance,
George
You CANNOT - you can NEVER retrieve that much data efficiently....
The whole point of being efficient is to limit the data you retrieve - only those columns that you really need (no SELECT *, but SELECT (list of fields), which you already do), and only as many rows as you can handle easily.
For instance, you don't want to fill a drop down or listbox where the user needs to pick a single value with thousands of entries - that's just not feasible.
So I guess my point really is: if you really, truly need to return 5000 rows or more, it'll just take its time. There's not much you can do about that (if you transmit 5000 rows at 5000 bytes per row, that's 25,000,000 bytes, or 25 megabytes - no magic will make that go fast).
It'll only go really fast if you find a way to limit the number of rows returned to 10, 20, 50 or so. Think: server-side paging!! :-)
Marc
You don't say what you want to do with the data. However, assuming you need to process the results in .NET, reading them with a SqlDataReader would be the most efficient way.
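For example, a minimal sketch of streaming that single column with SqlDataReader (the procedure name here is a placeholder):

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Sketch: stream the single nvarchar column without buffering a DataSet.
// "MyStoredProc" is a placeholder for the actual procedure name.
public static List<string> ReadCol2(string connectionString)
{
    var values = new List<string>();
    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("MyStoredProc", conn) { CommandType = CommandType.StoredProcedure })
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                values.Add(reader.IsDBNull(0) ? null : reader.GetString(0));
        }
    }
    return values;
}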
I'd use EXISTS, for one:
SELECT Table1.col2
FROM Table1
WHERE NOT EXISTS (SELECT *
                  FROM Table2
                  WHERE Table2.col1 = Table1.col1)
The query can be efficient (assuming col1 is indexed; a covering index would help but would be very wide given col2), but you still have to shovel a lot of data over the network.
It depends on what you mean by performance. 5000 rows isn't much for a report, but it's a lot for a combo box.