I'm new to programming and databases. I've started learning MySQL and C#, so I created a really simple test program in C# to see how many inserts it can do in a minute (just a simple infinite loop that inserts a short text into a column). I am watching the dashboard in MySQL Workbench, and the problem is that the program only reaches about 1000 queries/second. If I run 2-3 instances of the program at the same time, I can see 2-3 times 1000 queries/second.
Is there any limit in MySQL?
There's no built-in limit to insert rates.
Lots of things come into play when you're trying for high insert rates. For example:
How big is each row you're inserting?
How complex are the indexes on your target table? Index updates take time during INSERT operations.
Which access method (storage engine) does your table use? MyISAM is transactionless, so a naive program can push more rows/sec. InnoDB has transactions, so doing your inserts in batches of 1000 or so, wrapped in BEGIN / COMMIT statements, can speed things up (see the sketch after this list).
How fast are the disks / ssds on your server? How much RAM does it have?
How fast are your client machines and the network between them and the MySQL server?
Are other programs trying to read the target table at the same time you're doing inserts?
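On the InnoDB batching point above, here is a minimal sketch, assuming the MySql.Data connector and a made-up single-column table, of committing roughly 1000 inserts per transaction from C#:

using (var conn = new MySqlConnection(connectionString))
{
    conn.Open();
    using (var tx = conn.BeginTransaction())
    {
        var cmd = new MySqlCommand("INSERT INTO tbl (txt) VALUES (@txt)", conn, tx);
        cmd.Parameters.Add("@txt", MySqlDbType.VarChar);
        for (int i = 0; i < 1000; i++)              // ~1000 rows per COMMIT
        {
            cmd.Parameters["@txt"].Value = "test row " + i;
            cmd.ExecuteNonQuery();
        }
        tx.Commit();                                // pay the transaction cost once per batch, not once per row
    }
}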
You've mentioned that your total insert rate scales up approximately linearly for 2-3 instances of your insert program. That means the bottleneck is in your insert program, not the server, at that scale.
C#, like many language frameworks, offers prepared statements. They are a way to write a query once and use it over and over with different data values. It's faster. It's also safer if your data comes from an untrusted source (look up SQL injection).
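Here is a minimal sketch of that, assuming the MySql.Data connector, a made-up single-column table, and a values collection supplied by the caller:

static void InsertAll(MySqlConnection conn, IEnumerable<string> values)
{
    using (var cmd = new MySqlCommand("INSERT INTO tbl (txt) VALUES (@txt)", conn))
    {
        cmd.Parameters.Add("@txt", MySqlDbType.VarChar);
        cmd.Prepare();                              // parse/plan the statement once
        foreach (var value in values)               // reuse it with different data values
        {
            cmd.Parameters["@txt"].Value = value;
            cmd.ExecuteNonQuery();
        }
    }
}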
MySQL lets you insert multiple rows with a single INSERT operation. Faster.
INSERT INTO tbl (a, b, c)
VALUES (1,2,3),(4,5,6),(7,8,9);
MySQL also offers the LOAD DATA INFILE statement. You can get astonishingly high bulk load rates with it if you need them.
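A sketch of firing that off from C# with the MySql.Data connector (the file path, table name, and CSV layout are illustrative; LOCAL loads may also need the AllowLoadLocalInfile connection-string option and local_infile enabled on the server):

var cmd = new MySqlCommand(
    @"LOAD DATA LOCAL INFILE '/tmp/rows.csv'
      INTO TABLE tbl
      FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n'
      (a, b, c)", conn);
cmd.CommandTimeout = 0;            // bulk loads can run long
cmd.ExecuteNonQuery();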
Related
I have to insert 40 million records from a .csv file into a database and below is the process I followed.
Windows Service 1:
Reading the CSV
Validating the records
Inserting valid records into the success table (an intermediate table) using SqlBulkCopy (a rough sketch follows below).
Windows Service 2:
Getting 10,000 records from the success table at a time
Running a foreach over those 10,000 records
Sending each record to the database to be inserted into the main table.
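For reference, the SqlBulkCopy step in Windows Service 1 is roughly this shape (System.Data.SqlClient; the table name and the DataTable of validated rows are placeholders):

using (var conn = new SqlConnection(connectionString))
{
    conn.Open();
    using (var bulk = new SqlBulkCopy(conn))
    {
        bulk.DestinationTableName = "dbo.SuccessTable";   // the intermediate "success" table
        bulk.BatchSize = 10000;                           // rows per round trip
        bulk.BulkCopyTimeout = 0;                         // no timeout for long loads
        bulk.WriteToServer(validatedRows);                // a DataTable of validated CSV records
    }
}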
Windows Service 1 takes about 30-40 minutes, but Windows Service 2 takes about 5 hours to complete the task (minimum time). I have two ways to do this but cannot decide which is better, and I am open to suggestions:
Creating 4 separate Windows Services and processing 40,000 records simultaneously
Using a job watcher, we can use a while loop
Calling the procedure asynchronously from the Windows Service
My biggest doubt here is that we are using transactions in the procedure. Will async work with that? My assumption is that a transaction locks the table while other processes need to work on it.
I think you're using the wrong tool for this job.
C# apps might do the trick, but there is a much more powerful way to do this using SQL Server Integration Services (SSIS).
I am guessing here, but are these tables in the middle there to transform or check the data, or maybe to batch the import down?
SSIS can do all of these things using its log limit and SQL bulk import tools. I currently do hospital data imports of around 8,000,000 records each night, and it takes a matter of minutes, not hours.
A good read, too, on how SQL Server deals with such large data inputs is this article.
I am working with an AWS SQS queue. The queue may have a massive number of messages, i.e. if I do not process them, there will be more than a million messages per hour.
I am processing all the messages and putting them into a MySQL table (InnoDB, 22 columns) using INSERT ... ON DUPLICATE KEY UPDATE. I have a primary key and a unique key.
I am working in C#, where I run 80 threads in order to pull messages from SQS.
I apply a transaction in C# and run the query as INSERT ... ON DUPLICATE KEY UPDATE.
At the same time I am using a lock in C# so that only a single thread can update the table. If I do not use the C# lock, then MySQL throws an exception: deadlock occurred.
The problem is that I can see a lot of threads waiting on the C# lock, and this wait time is gradually increasing. Can anybody suggest the best way to do this?
Note: I have 8 GB RAM and an Intel Xeon 2.53 GHz, with a 1 GbE network connection. Please advise.
If I were to do it, the C# program would primarily be creating the CSV file to empty your SQS queue, or at least a significant chunk of it. The file would then be used for a bulk insert into an empty worktable with no indexes of any kind. I would steer toward a non-temporary table, but whatever; I see no reason to add TEMPORARY to the mix when this is recurring, and when done the worktable is truncated anyway.
The bulk insert would be achieved through LOAD DATA INFILE fired off from the C# program. Alternatively, a row could be written to some other table with an incrementing value saying file2 is ready, file3 is ready, and the LOAD happens in an event triggered, say, every n minutes; an event put together with MySQL's CREATE EVENT. Six of one, half a dozen of the other.
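A sketch of the C# side of that idea: drain a chunk of SQS into a CSV, then signal readiness by writing a row to a control table. The queue helpers, file naming, and table names are all made up for illustration, and conn is assumed to be an open MySqlConnection.

int fileNo = GetNextFileNumber();                          // hypothetical: next file counter
string path = "/var/feeds/sqs_batch_" + fileNo + ".csv";
using (var writer = new StreamWriter(path))
{
    foreach (var message in ReceiveSqsChunk())             // hypothetical SQS drain helper
        writer.WriteLine(ToCsvLine(message));              // hypothetical CSV formatter
}
var signal = new MySqlCommand(
    "INSERT INTO load_control (file_no, status) VALUES (@f, 'ready')", conn);
signal.Parameters.AddWithValue("@f", fileNo);
signal.ExecuteNonQuery();                                  // the scheduled LOAD picks this row up later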
But a sentinel, a mutex, might be of value, as this whole thing happens in batches, and the next batch(es) to be processed need to be suspended while this occurs. Let's call this concept The Blocker, and say the batch being worked on is row N.
OK, now your data is in the worktable, and it is safe from being stomped on until processed. Let's say you have 250k rows, with other batches shortly to follow. If you have special processing that needs to happen, you may wish to create indexes. But at this moment there are none.
You perform a normal INSERT ... ON DUPLICATE KEY UPDATE (IODKU) into the REAL table using this worktable. That IODKU follows a normal insert-into-select pattern, where the SELECT part comes from the worktable.
At the end of that statement, the worktable is truncated, any indexes dropped, row N has its status set to complete, and The Blocker is free to work on row N+1 when it appears.
The indexes are dropped to facilitate the next round of bulk insert, where maintaining indexes matters least. And indexes on the worktable may very well be unnecessary overhead during the IODKU.
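Sketched with placeholder table and column names, that IODKU step and the cleanup look like this when fired as plain commands (conn is an open MySqlConnection):

var iodku = @"
    INSERT INTO real_table (k, col1, col2)
    SELECT k, col1, col2
    FROM worktable
    ON DUPLICATE KEY UPDATE
        col1 = VALUES(col1),
        col2 = VALUES(col2)";
new MySqlCommand(iodku, conn).ExecuteNonQuery();
new MySqlCommand("TRUNCATE TABLE worktable", conn).ExecuteNonQuery();   // ready for the next batch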
In this manner, you get the best of both worlds:
LOAD DATA INFILE
IODKU
And the focus is taken off of multi-threading, a good thing to take one's focus off of.
Here is a nice article on performance and strategies titled Testing the Fastest Way to Import a Table into MySQL. Don't let the MySQL version in the title or inside the article scare you away. Jumping to the bottom and picking up some conclusions:
The fastest way you can import a table into MySQL without using raw files is the LOAD DATA syntax. Use parallelization for InnoDB for better results, and remember to tune basic parameters like your transaction log size and buffer pool. Careful programming and importing can make a >2-hour problem become a 2-minute process. You can temporarily disable some security features for extra performance.
I would separate the C# routine entirely from the actual LOAD DATA and IODKU update effort and leave that to the event created with CREATE EVENT, for several reasons, mainly better design. That way the C# program is only dealing with SQS and writing out files with incrementing file numbers.
I am looking for advice on a design to effectively allow scalability and reduce contention for my requirement:
My current environment uses a messaging architecture with queues, Windows Services, SQL Server 2014, C# .NET, etc. We have a requirement to persist data into a staging table, do some validation on this data, and then, once validation is complete, load the data into a final table and do some more calculations on the data there. Once this workflow is complete, we clear those specific data rows from the staging table. The load into the staging table can be a few thousand rows per load. Each message in the queue initiates this process, and we expect bursts of around 1000 to 2000 messages in short periods throughout the day.
Due to the nature of using queues and trying to build a scalable architecture, within our Windows Service we want to read a batch of messages off a queue, say 10 per read, and run a couple of threads in parallel, one per message, with each thread owning a transaction that is responsible for loading the staging table, validating the staging data in process (imagine mapping, enriching, etc.), then shifting the data to the final target table, doing some calculations on the final table, and then deleting the relevant rows in the staging table. Sounds like lots of contention!
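Roughly, the flow per message would look like this (proc names and helpers are placeholders, not real code yet; one connection and one transaction per thread so nothing escalates to MSDTC):

var messages = ReadBatchFromQueue(10);                     // placeholder: 10 messages per read
Parallel.ForEach(messages, message =>
{
    using (var conn = new SqlConnection(connectionString)) // one connection per thread,
    {                                                      // so nothing escalates to MSDTC
        conn.Open();
        using (var tx = conn.BeginTransaction())
        {
            BulkCopyToStaging(conn, tx, message);          // placeholder: SqlBulkCopy into staging
            Exec(conn, tx, "dbo.ValidateAndEnrichStaging");// placeholder proc: mapping, enriching
            Exec(conn, tx, "dbo.MoveToFinalAndCalc");      // placeholder proc: shift + calculations
            Exec(conn, tx, "dbo.DeleteFromStaging");       // clear the relevant staging rows
            tx.Commit();
        }
    }
});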
Both tables will have a unique index across 3 columns and a PK, but not necessarily a 1-to-1 relationship. Around 20 columns in staging and 15 in the final table, so not very wide... I would also prefer not to use MSDTC, as we will be using multiple SQL connections. I realize that in 2008 Microsoft introduced lightweight SQL transactions to reduce the burden.
I have not looked into locking on columnstore tables and would prefer not to use dirty reads as a hack.
If you have successfully done something similar to this before, I would really like to hear your comments! PS: please note SQL Server 2014, C#, TPL.
Thanks
We are fighting this problem:
We have a big table in SQL Server 2008 R2 (R2 !) (millions of rows)
We need to walk through this table, and each row needs to be recomputed by C# code (so the table is loaded from the C# application)
How to do that if performance is crucial? Some kind of batching?
If performance is crucial, use SQL CLR to compute the new values. You can just use an update statement that way. SQL CLR places restrictions on what code you can use so this might not be an easy option.
There is another way if SQL CLR is not an option:
I assume there is an integer primary key. I'd do it like this:
request a batch of rows (10k or so)
compute the updates
send the updates
repeat
The trick is to keep track of which rows you already processed. Keep the ID value of the last row processed. Request a batch like this:
select top 10000 *
from T
where ID > @lastIDProcessed
order by ID
That will guarantee fast and reliable execution.
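A minimal sketch of that loop in C# (System.Data.SqlClient, usings omitted; the table layout, the int ID key, and the Recompute() helper are assumptions):

int lastId = 0;
while (true)
{
    var batch = new List<(int Id, decimal NewValue)>();
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();
        var select = new SqlCommand(
            "SELECT TOP 10000 ID, Payload FROM T WHERE ID > @last ORDER BY ID", conn);
        select.Parameters.AddWithValue("@last", lastId);
        using (var reader = select.ExecuteReader())
        {
            while (reader.Read())
                batch.Add((reader.GetInt32(0), Recompute(reader.GetString(1))));   // recompute in C#
        }
        if (batch.Count == 0) break;                       // whole table processed

        using (var tx = conn.BeginTransaction())
        {
            var update = new SqlCommand("UPDATE T SET Value = @v WHERE ID = @id", conn, tx);
            update.Parameters.Add("@v", SqlDbType.Decimal);
            update.Parameters.Add("@id", SqlDbType.Int);
            foreach (var row in batch)
            {
                update.Parameters["@v"].Value = row.NewValue;
                update.Parameters["@id"].Value = row.Id;
                update.ExecuteNonQuery();
            }
            tx.Commit();
        }
        lastId = batch[batch.Count - 1].Id;                // remember the last row processed
    }
}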
We have an application (written in C#) to store live stock market prices in the database (SQL Server 2005). It inserts about 1 million records in a single day. Now we are adding some more market segments to it, and the number of records will double (2 million/day).
Currently the average insertion rate is about 50 records per second; the maximum is 450 and the minimum is 0.
To check certain conditions I have used Service Broker (an asynchronous trigger) on my price table. It is running fine at this time (about 35% CPU utilization).
Now I am planning to create an in-memory dataset of current stock prices. We would like to do some simple calculations on it.
Currently I am using an XML batch insertion method (OPENXML in a stored proc).
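The client side of that XML batch path is roughly this shape (the proc name, parameter, and XML layout here are guesses; the stored proc is assumed to shred the document with OPENXML and insert all rows in one statement):

var xmlBatch = "<prices><p symbol=\"ABC\" bid=\"10.15\" ask=\"10.17\" /></prices>";  // sample payload
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("dbo.InsertPriceBatch", conn))
{
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Parameters.Add("@priceXml", SqlDbType.Xml).Value = xmlBatch;
    conn.Open();
    cmd.ExecuteNonQuery();                                  // one round trip per batch instead of one per tick
}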
I want to know members' different views on this.
Please share your way of dealing with such a situation.
Your question is about reading, but the title implies writing?
When reading, consider (but don't blindly use) temporary tables to cache data if you're going to do some processing. However, by simple calculations I assume you mean aggregates like AVG, MAX, etc.?
It would generally be inane to drag data around, cache it in the client and aggregate it there.
If batch uploads:
SQLBulkCopy or similar to a staging table
Single write from staging to the final table (see the sketch after this list)
If single upload, just insert it
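A sketch of that staging-to-final single write, run from C# (table and column names are placeholders):

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(@"
    INSERT INTO dbo.Prices (Symbol, Price, TickTime)
    SELECT Symbol, Price, TickTime FROM dbo.PricesStaging;

    TRUNCATE TABLE dbo.PricesStaging;", conn))
{
    conn.Open();
    cmd.ExecuteNonQuery();                                  // one set-based write instead of row-by-row inserts
}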
A million rows a day is a rounding error for what SQL Server (or Oracle, MySQL, DB2, etc.) is capable of.
Example: 35k transactions (not rows) per second