Multi Connections to DB - c#

I have a project where at pick times the site will get 1000 calls per secs this calls need to be saved in the DB ( MS-SQL DB ) .
What is the best practice to manage this large scale connections.
I am using .net C# .
Currently building this as a site that get all the calls in post way.
Thanks For your answers.

From my experience the fastest way to insert a large amount of data is to use a GUID as PK. If you use an autoincrement the sql will write the rows one by one to get the next ID.
But if there is a GUID, the server will write all your data at the "same" time.
Depending on the ressources of your production machine there are several approaches:
1fst: Store all the requests in one big list and your application writes all the requests synchronously into your database. Pro: pretty easy to code, Con: could be a real bottleneck
2nd: Store all the requests in one big list and your application starts a thread for every 1000 requests that you received, these threads write the data into your database asynchronously. Pro: Should be faster, Con: hard to code and hard to maintain.
3rd: Store the request inside the memory of the server and write the data into the database when the application is not busy.

Related

Handling limitations in multithreaded server

In my client-server architecture I have few API functions which usage need to be limited.
Server is written in .net C# and it is running on IIS.
Until now I didn't need to perform any synchronization. Code was written in a way that even if client would send same request multiple times (e.g. create sth request) one call will end with success and all others with error (because of server code + db structure).
What is the best way to perform such limitations? For example I want no more that 1 call of API method: foo() per user per minute.
I thought about some SynchronizationTable which would have just one column unique_text and before computing foo() call I'll write something like foo{userId}{date}{HH:mm} to this table. If call end with success I know that there wasn't foo call from that user in current minute.
I think there is much better way, probably in server code, without using db for that. Of course, there could be thousands of users calling foo.
To clarify what I need: I think it could be some light DictionaryMutex.
For example:
private static DictionaryMutex FooLock = new DictionaryMutex();
FooLock.lock(User.GUID);
try
{
...
}
finally
{
FooLock.unlock(User.GUID);
}
EDIT:
Solution in which one user cannot call foo twice at the same time is also sufficient for me. By "at the same time" I mean that server started to handle second call before returning result for first call.
Note, that keeping this state in memory in an IIS worker process opens the possibility to lose all this data at any instant in time. Worker processes can restart for any number of reasons.
Also, you probably want to have two web servers for high availability. Keeping the state inside of worker processes makes the application no longer clustering-ready. This is often a no-go.
Web apps really should be stateless. Many reasons for that. If you can help it, don't manage your own data structures like suggested in the question and comments.
Depending on how big the call volume is, I'd consider these options:
SQL Server. Your queries are extremely simple and easy to optimize for. Expect 1000s of such queries per seconds per CPU core. This can bear a lot of load. You can use a SQL Express for free.
A specialized store like Redis. Stack Overflow is using Redis as a persistent, clustering-enabled cache. A good idea.
A distributed cache, like Microsoft Velocity. Or others.
This storage problem is rather easy because it fits a key/value store model well. And the data is near worthless so you don't even need to backup.
I think you're overestimating how costly this rate limitation will be. Your web-service is probably doing a lot more costly things than a single UPDATE by primary key to a simple table.

Scalability and availability

I am quite confused on which approach to take and what is best practice.
Lets say i have a C# application which does the following:
sends emails from a queue. Emails to send and all the content is stored in the DB.
Now, I know how to make my C# application almost scalable but I need to go somewhat further.
I want some form of responsibility of being able to distribute the tasks across say X servers. So it is not just 1 server doing all the processing but to share it amoungst the servers.
If one server goes down, then the load is shared between the other servers. I know NLB does this but im not looking for an NLB here.
Sure, you could add a column of some kind in the DB table to indicate which server should be assigned to process that record, and each of the applications on the servers would have an ID of some kind that matches the value in the DB and they would only pull their own records - but this I consider to be cheap, bad practice and unrealistic.
Having a DB table row lock as well, is not something I would do due to potential deadlocks and other possible issues.
I am also NOT indicating using threading "to the extreme" here but yes, there will be threading per item to process or batching them up per thread for x amount of threads.
How should I approach and what do you recommend on making a C# application which is scalable and has high availability? The aim is to have X servers, each with the same application and for each to be able to get records and process them but have the level of processing/items to process shared amoungst the servers so incase if one server or service fails, the other can take on that load until another server is put back.
Sorry for my lack of understanding or knowledge but have been thinking about this quite alot and had lack of sleep trying to think of a good robust solution.
I would be thinking of batching up the work, so each app only pulled back x number of records at a time, marking those retrieved records as taken with a bool field in the table. I'd amend the the SELECT statement to pull only records not marked as taken/done. Table locks would be ok in this instance for very short periods to ensure there is no overlap of apps processing the same records.
EDIT: It's not very elegant, but you could have a datestamp and a status for each entry (instead of a bool field as above). Then you could run a periodic Agent job which runs a sproc to reset the status of any records which have a status of In Progress but which have gone beyond a time threshold without being set to complete. They would be ready for reprocessing by another app later on.
This may not be enterprise-y enough for your tastes, but I'd bet my hide that there are plenty of apps out there in the enterprise which are just as un-sophisticated and work just fine. The best things work with the least complexity.

Recommended database/storage for thousands of transactions per second

I need to be able to insert thousands of transactions per second into some form of storage, and be able to query it quickly.
What I would like to do is log all TCP requests (the key being the source or dest IP Address).
Then when another requests comes in check if a prior request by the IP is in the store/database/cache and perform the appropriate action.
SQLExpress, SQLCompact, SQLIte, MongoDB, Firebird ? Are any of these fast enough ?
Should I be just use an in memory data structure of some description ?
The store needs to be accessible by multiple threads concurrently...
Any suggestions please...
UPDATE: RaptorDB ? Is it any good ? Worth considering along with the other Databases/noSQL above ?
http://www.codeproject.com/KB/database/RaptorDB.aspx
At these speeds, your bottleneck is likely to be your disk; a standard SATA drive can do only a few tens of transactions per second. To supplement the disk, throw lots of RAM at it so that the database is stored mostly in memory.
Alternatively, just do it all in memory by rolling your own logger that uses an associative array. Do you need to log ALL data being sent, or just the fact that PC A sent data to PC B?

SQL Design: Big table, thread access serialization

I have one BIG table(90k rows, size cca 60mb) which holds info about free rooms capacities for about 50 hotels. This table has very few updates/inserts per hour.
My application sends async requests to this(and joined tables) at max 30 times per sec.
When i start 30 threads(with default AppPool class at .NET 3.5 C#) at one time(with random valid sql query string), only few(cca 4) are processed asynchronously and other threads waits. Why?
Is it becouse of SQL SERVER 2008 table locking, or becouse of .NET core? Or something else?
If it is a SQL problem, can help if i split this big table into one table per each hotel model?
My goal is to have at least 10 threads servet at a time.
This table is tiny. It's doesn't even qualify as a "medium sized" table. It's trivial.
You can be full table scanning it 30 times per second, you can be copying the whole thing in ram, no server is going to be the slightest bit bothered.
If your data fits in ram, databases are fast. If you don't find this, you're doing something REALLY WRONG. Therefore I also think the problems are all on the client side.
It is more than likely on the .NET side. If it were table locking more threads would be processing, but they would be waiting on their queries to return. If I remember correctly there's a property for thread pools that controls how many actual threads they create at once. If there are more pending threads than that number, then they get in line and wait for running threads to finish. Check that.
Have you tried changing the transaction isolation level?
Even when reading from a table Sql Server will lock the table
try setting the isolation level to read uncommitted and see if that improves the situation,
but be advised that its feasible that you will read 'dirty' data make sure you understand the ramifications if this is the solution
this link explains what it is.
link text
Rather than ask, measure. Each of your SQL queries that is actually submitted by your application will create a request on the server, and the sys.dm_exec_requests DMV shows the state of each request. When the request is blocked the wait_type column shows a non-empty value. You can judge from this whether your requests are blocked are not. If they are blocked you'll also know the reason why they are blocked.

Is there a fast and scalable solution to save data?

I'm developing a service that needs to be scalable in Windows platform.
Initially it will receive aproximately 50 connections by second (each connection will send proximately 5kb data), but it needs to be scalable to receive more than 500 future.
It's impracticable (I guess) to save the received data to a common database like Microsoft SQL Server.
Is there another solution to save the data? Considering that it will receive more than 6 millions "records" per day.
There are 5 steps:
Receive the data via http handler (c#);
Save the received data; <- HERE
Request the saved data to be processed;
Process the requested data;
Save the processed data. <- HERE
My pre-solution is:
Receive the data via http handler (c#);
Save the received data to Message Queue;
Request from MSQ the saved data to be processed using a windows services;
Process the requested data;
Save the processed data to Microsoft SQL Server (here's the bottleneck);
6 million records per day doesn't sound particularly huge. In particular, that's not 500 per second for 24 hours a day - do you expect traffic to be "bursty"?
I wouldn't personally use message queue - I've been bitten by instability and general difficulties before now. I'd probably just write straight to disk. In memory, use a producer/consumer queue with a single thread writing to disk. Producers will just dump records to be written into the queue.
Have a separate batch task which will insert a bunch of records into the database at a time.
Benchmark the optimal (or at least a "good" number of records to batch upload) at a time. You may well want to have one thread reading from disk and a separate one writing to the database (with the file thread blocking if the database thread has a big backlog) so that you don't wait for both file access and the database at the same time.
I suggest that you do some tests nice and early, to see what the database can cope with (and letting you test various different configurations). Work out where the bottlenecks are, and how much they're going to hurt you.
I think that you're prematurely optimizing. If you need to send everything into a database, then see if the database can handle it before assuming that the database is the bottleneck.
If the database can't handle it, then maybe turn to a disk-based queue like Jon Skeet is describing.
Why not do this:
1.) Receive data
2.) Process data
3.) Save original and processsed data at once
That would save you the trouble of requesting it again if you already have it. I'd be more worried about your table structure and your database machine then the actual flow though. I'd be sure to make sure that your inserts are as cheap as possible. If that isn't possible then queuing up the work makes some sense. I wouldn't use message queue myself. Assuming you have a decent SQL Server machine 6 million records a day should be fine assuming you're not writing a ton of data in each record.

Categories