How to directly scan a SQL indexed table - c#

I have an indexed SQL table containing leads to call. About 30 users will be calling these leads. To be sure that no two users call the same lead, the system has to be instant.
So I would like to go this way:
- Set the table to the right index
- Scan the table for a lead I can call (there are conditions), following the index
- When I have a lead to call, indicate that the record is "in use"
Here are my issues:
- I can't find any way to set a table to an index from C# code
- LINQ requires a DataContext (not instant) and ADO requires a DataSet
I have not found any resource to help me on that. If you have any, they are more than welcome.
Sorry if I may sound ignorant, I'm new to SQL databases.
Thank you very much in advance!
Mathieu

I've worked on similar systems before. The tack we took was to have a distribution routine that handled passing out the leads to the call center people. Typically we had a time limit on how long a lead was allowed to sit in any one user's queue before it was yanked away and given to someone else.
This allowed us to do some pretty complicated things like giving preference based on details about the lead as well as productivity of the individual call center person.
We had a very high volume of leads that came in and had our distribution routine set to run once a minute. The SLA was set so that a lead was contacted within 2 minutes of us knowing about them.
To support this, your leads table should have an AssignedUserId column and probably a date/time stamp of when it was assigned. Write a proc or some C# code which grabs all the records from that table which aren't assigned, runs the assignment routine, and saves the changes back to the table. This routine should probably take into account how many leads each person is currently working and the acceptable number of open leads per person in order to give preference in a round-robin distribution.
When the user refreshes they will have their leads. You can control the refresh rate in the UI.
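A minimal sketch of such an assignment routine in C# with ADO.NET might look like the following. The table and column names (Leads, LeadId, AssignedUserId, AssignedAtUtc, CreatedAtUtc) are assumptions, not your actual schema, and the capacity/preference logic is reduced to a plain round-robin:
using System.Collections.Generic;
using System.Data.SqlClient;

static class LeadDistributor
{
    // Assign every unassigned lead to a user, round-robin.
    public static void DistributeLeads(string connectionString, IList<int> userIds)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Grab all the records that aren't assigned yet, oldest first.
            var leadIds = new List<int>();
            using (var cmd = new SqlCommand(
                "SELECT LeadId FROM Leads WHERE AssignedUserId IS NULL ORDER BY CreatedAtUtc", conn))
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    leadIds.Add(reader.GetInt32(0));
            }

            // Hand them out and stamp the assignment time.
            int next = 0;
            foreach (int leadId in leadIds)
            {
                using (var update = new SqlCommand(
                    "UPDATE Leads SET AssignedUserId = @userId, AssignedAtUtc = GETUTCDATE() " +
                    "WHERE LeadId = @leadId AND AssignedUserId IS NULL", conn))
                {
                    update.Parameters.AddWithValue("@userId", userIds[next]);
                    update.Parameters.AddWithValue("@leadId", leadId);
                    update.ExecuteNonQuery();
                }
                next = (next + 1) % userIds.Count;
            }
        }
    }
}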

I don't see how your requirement of being "instant" relates to the use of an index. Accessing a table by index is not instantaneous either.
To solve your problem, I would suggest locking the whole table while a lead is being claimed. This will limit performance, but it will also ensure that the same lead is never called by two users.
Example code:
Begin Transaction
Lock Table
Search for Lead
Update Lead to indicate that it is in use
Commit Transaction (removes the lock)
Locking a table in SQL Server until the end of the transaction can be done by using SELECT * FROM table WITH (HOLDLOCK, TABLOCKX) WHERE 1=0.
Disclaimer: Yes, I'm aware that cleaner solutions with less locking are possible. The advantage of the above solution is that it is simple (no worrying about the correct transaction isolation level etc.) and it is usually performant enough (if you remember to keep the "locked part" short and there is not too much concurrent access).
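For illustration, a rough ADO.NET version of that outline could look like this. It is only a sketch: the table name (Leads), the columns (LeadId, InUse) and the search condition are placeholders for whatever your schema and business rules actually are:
using System.Data.SqlClient;

static class LeadClaimer
{
    // Returns the id of the claimed lead, or null if none was available.
    public static int? ClaimNextLead(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                // Take an exclusive table lock that is held until the transaction ends.
                using (var lockCmd = new SqlCommand(
                    "SELECT * FROM Leads WITH (HOLDLOCK, TABLOCKX) WHERE 1 = 0", conn, tx))
                {
                    lockCmd.ExecuteNonQuery();
                }

                // Search for a lead that can be called (your conditions go in the WHERE clause).
                int? leadId;
                using (var findCmd = new SqlCommand(
                    "SELECT TOP (1) LeadId FROM Leads WHERE InUse = 0 ORDER BY LeadId", conn, tx))
                {
                    object found = findCmd.ExecuteScalar();
                    leadId = found == null ? (int?)null : (int)found;
                }

                // Mark it as in use, then commit, which releases the lock.
                if (leadId.HasValue)
                {
                    using (var updateCmd = new SqlCommand(
                        "UPDATE Leads SET InUse = 1 WHERE LeadId = @id", conn, tx))
                    {
                        updateCmd.Parameters.AddWithValue("@id", leadId.Value);
                        updateCmd.ExecuteNonQuery();
                    }
                }

                tx.Commit();
                return leadId;
            }
        }
    }
}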


Deadlock on transaction with multiple tables

My scenario is common:
I have a stored procedure that needs to update multiple tables.
If one of the updates fails, all the updates should be rolled back.
The straightforward answer is to include all the updates in one transaction and just roll that back. However, in a system like ours, this causes concurrency issues.
When we break the updates into multiple short transactions, we get a throughput of ~30 concurrent executions per second before deadlocking issues start to emerge.
If we put it all into one transaction which spans all of them, we get ~2 concurrent executions per second before deadlocks show up.
In our case, we place a try-catch block after every short transaction, and manually DELETE/UPDATE back the changes from the previous ones. So essentially we mimic the transaction behavior in a very expensive way...
It is working all right since it's well written and doesn't get many "rollbacks"...
One thing this approach cannot resolve at all is the case of a command timeout from the web server/client.
I have read extensively in many forums and blogs and scanned through MSDN and cannot find a good solution. Many have presented the problem, but I am yet to see a good solution.
The question is this: is there ANY solution to this issue that allows a stable rollback of updates to multiple tables, without requiring an exclusive lock on all of the rows for the entire duration of the long transaction?
Assume that it is not an optimization issue. The tables are probably close to maximally optimized and can give very high throughput as long as deadlocks don't hit. There are no table locks/page locks etc., only row locks on updates - but when you have so many concurrent sessions, some of them need to update the same row...
It can be via SQL, client-side C#, or server-side C# (extend the SQL Server?).
Is there such a solution in any book/blog that I have not found?
We are using SQL Server 2008 R2, with a .NET client/web server connecting to it.
Code example:
CREATE PROCEDURE sptest AS
BEGIN TRANSACTION
    UPDATE table1 ...
    UPDATE table2 ...
COMMIT TRANSACTION
In this case, if sptest is run twice, the second instance cannot update table1 until instance 1 has committed.
Compared to this:
CREATE PROCEDURE sptest2 AS
    UPDATE table1 ...
    UPDATE table2 ...
sptest2 has a much higher throughput - but it has a chance of corrupting the data.
This is what we are trying to solve. Is there even a theoretical solution to this?
Thanks,
JS
I would say that you should dig deeper to find out why the deadlocks occur. Possibly you should change the order of the updates to avoid them. Maybe some index is "guilty".
You cannot roll back changes if other transactions can change the data, so you need to hold an update lock on the affected rows. But you can use the snapshot isolation level to allow consistent reads before the update commits.
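As a hedged illustration: snapshot isolation has to be enabled on the database first (ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON); after that, a reader can opt in from C# like this. The query and column names are placeholders, not your schema:
using System.Data;
using System.Data.SqlClient;

static class SnapshotReadExample
{
    public static object ReadWithoutBlocking(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            // Readers under snapshot isolation see the last committed version of each row,
            // so they are not blocked by concurrent updates (and do not block them).
            using (var tx = conn.BeginTransaction(IsolationLevel.Snapshot))
            using (var cmd = new SqlCommand("SELECT SomeColumn FROM table1 WHERE Id = 42", conn, tx))
            {
                object value = cmd.ExecuteScalar();
                tx.Commit();
                return value;
            }
        }
    }
}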
For all inner-joined tables that are mostly static, or where there is a high probability that dirty data will not affect the query, you can apply:
INNER JOIN LookupTable lut WITH (NOLOCK) ON lut.ID = SomeOtherTableID
This tells the query that I do not care about in-flight updates made to LookupTable.
This can reduce your issue in most cases. For more difficult deadlocks I have implemented a deadlock graph that is generated and emailed when a deadlock occurs; it contains all the detailed info for the deadlock.

Is Data Reader better or Data Set for application where we may have Concurrency issue

I know the difference between Data Reader and Data Set.
The DataReader is a better choice for applications that require optimized read-only, fast and forward-only data access.
The DataSet is better for applications where you get all the data, update it according to your needs at the application level, and submit the changes back to the database.
Please correct me if there is anything wrong in my understanding.
Now, in an interview, a person asked me: is a DataReader (connected architecture) good for an application like a ticketing system? Basically she meant a case where many users might be trying to update the same table, which is where the concept of concurrency comes in.
We can use the disconnected architecture to check for concurrency and let only one user update the table at a time. But I don't know how it happens in terms of the connected architecture. Does the connection to the database, and particularly to the table concerned, make only one user do the update while others who try later can't?
Won't it affect performance if all the users have opened a connection, since the database will hit a bottleneck?
I hope I will get an answer that helps me understand this.
I think it's not a matter of which one is better, since the data is already old/invalid once it reaches the client. Showing a table of reservations can be useful to get a rough view of what reservations are made, but it might be totally different within the next second. You want to eliminate race conditions. A good architecture is necessary to start with.
One way to do this is to 'reserve' the ticket [1]. The application asks to get a ticket that is available given the matched criteria. At this point it's a known fact whether the ticket is available or not. If it was available, it was already reserved as well. This avoids multiple reservations for one ticket. The next reservation (same operation/action) will result in a different ticket being reserved. You can always add information to this ticket later (such as the owner of the ticket and his/her information) if required. Tickets that do not have information attached to them will time out after a certain number of minutes and return to the pool. These tickets can be 'reserved' again [1].
[1] To avoid multiple assignments, use optimistic locking.
To answer the question, I would say DataReader. It keeps the database communication to a minimum (load and locks), so it can handle updates as fast as possible. Just keep in mind picking one over another doesn't solve concurrency problems. It's the total solution that matters.
Example
I don't know the requirements, but since it's an interview question I'll give an example. Don't take this as a golden rule, but off the top of my head it would be something like this:
(if required) First the user is shown a screen indicating that there are tickets left in the system that can be reserved. Open a connection and a reader to read the number of tickets available for reservation. Close the reader and connection. The user proceeds to the next screen.
SELECT COUNT(*)
FROM [Tickets]
WHERE ([LastReserved] IS NULL OR [LastReserved] <= DATEADD(MINUTE, -@ticketTimeout, GETDATE()))
AND [TickedAssignedToUserId] IS NULL;
The user requests an x amount of tickets and proceeds to the next screen. At this moment the system checks, with optimistic locking, whether there are enough tickets available. Simply open a connection (with a transaction!) and execute the following query:
UPDATE TOP(@numberOfTicketsRequested) [Tickets]
SET [LastReserved] = GETDATE()
WHERE ([LastReserved] IS NULL OR [LastReserved] <= DATEADD(MINUTE, -@ticketTimeout, GETDATE()))
AND [TickedAssignedToUserId] IS NULL;
The number of rows affected should be the same as @numberOfTicketsRequested. If this is the case, commit the transaction; otherwise roll back and tell the user that there are no tickets available anymore. At this point we need the record information, so you might want to get the ticket identifiers as well (for example with an OUTPUT INSERTED.[TicketId] clause on the update).
At this point, the user gets @ticketTimeout minutes to enter their user details. If done correctly, the following query can be executed:
UPDATE TOP(@numberOfTicketsRequested) [Tickets]
SET [TickedAssignedToUserId] = @userId
WHERE [TicketId] = @Id AND [LastReserved] = @lastReserved AND [TickedAssignedToUserId] IS NULL;
If the user took longer than, say 10 minutes, and somebody else requested the same ticket again, then the LastReserved timestamp has changed. When the first user tried to reserve the ticket with their details, the update does not match the original LastReserved timestamp anymore, and the update will show not enough rows affected (=rollback). If it matches the number of rows affected, the user successfully reserved the tickets (=commit).
Note that no ticket information except for ticket identifiers has reached the application. Nor have I included user registration. No full tables are being passed around, and locks are used minimally (just for two short updates).
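To tie the reservation step back to C#, a rough ADO.NET sketch of the "update, count rows, commit or roll back" logic could look like this (the query is the one above; the surrounding names are placeholders):
using System.Data.SqlClient;

static class TicketReservation
{
    // Returns true if the requested number of tickets could be reserved.
    public static bool TryReserve(string connectionString, int numberOfTicketsRequested, int ticketTimeoutMinutes)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            using (var cmd = new SqlCommand(
                "UPDATE TOP(@numberOfTicketsRequested) [Tickets] " +
                "SET [LastReserved] = GETDATE() " +
                "WHERE ([LastReserved] IS NULL OR [LastReserved] <= DATEADD(MINUTE, -@ticketTimeout, GETDATE())) " +
                "AND [TickedAssignedToUserId] IS NULL;", conn, tx))
            {
                cmd.Parameters.AddWithValue("@numberOfTicketsRequested", numberOfTicketsRequested);
                cmd.Parameters.AddWithValue("@ticketTimeout", ticketTimeoutMinutes);

                int rowsAffected = cmd.ExecuteNonQuery();
                if (rowsAffected == numberOfTicketsRequested)
                {
                    tx.Commit();        // enough tickets were reserved
                    return true;
                }

                tx.Rollback();          // not enough tickets: undo the partial reservation
                return false;
            }
        }
    }
}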

Calling same method with different parameters - How to use multithreading?

Here is the scenario:
I have a DLL which has a method that gets data from the DB; depending on the parameters passed, it does various checks and gives me the required data.
GetGOS_ForBill(AgencyCode)
In a Windows application, I have a listbox which lists 500+ agencies.
I retrieve the GOS for each agency and append it to a generic list.
If the user has selected all agencies (500+ for now), it takes about 10 minutes to return the data from the DLL.
We thought about background processing, but that doesn't reduce the time; it only lets the user do other things on the screen. We are considering multithreading.
Can anybody help me with this? What would be right approach and how can we accomplish with multithreading?
From the way you ask, I think you don't have much experience with multithreading, and multithreading is not a topic to just improvise and throw together via a Stack Overflow question. I would strongly advise against using multithreading if you don't know what you're doing... instead of one problem you'll have two.
In your case the performance problem does not have to do with using threading to get a parallel workload but with correctly structuring the problem.
Right now you're querying each agency separately, which works fine for a couple of agencies but degrades quickly. The query itself is probably fast; the problem is you're running that query 500 times. Instead of that, why don't you try to get all the GOS for all the agencies in a single query (which is probably going to be fast) and store that in memory (say, in a Dictionary)? Then just retrieve the appropriate set of GOS when needed.
If the most usual case is a user selecting just a couple of them, you can always establish a threshold... if the selected number is less than, say, 30, do the individual queries; otherwise run the general query and retrieve from memory.
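A minimal sketch of that threshold idea, assuming GetGOS_ForBill(string) returns an IEnumerable<GOS> (the existing DLL method) and assuming a new, hypothetical DLL method GetGOS_ForAllAgencies() that fetches everything in one query keyed by agency code:
using System.Collections.Generic;

// Sketch: falls back to per-agency calls below the threshold, otherwise one bulk query.
public List<GOS> GetGosForSelection(IList<string> selectedAgencyCodes)
{
    const int threshold = 30;                       // switch-over point suggested above
    var result = new List<GOS>();

    if (selectedAgencyCodes.Count < threshold)
    {
        // Few agencies selected: keep the existing per-agency DLL calls.
        foreach (string code in selectedAgencyCodes)
            result.AddRange(GetGOS_ForBill(code));
    }
    else
    {
        // Many agencies selected: one round trip, then look everything up in memory.
        Dictionary<string, List<GOS>> byAgency = GetGOS_ForAllAgencies();   // hypothetical bulk method
        foreach (string code in selectedAgencyCodes)
        {
            List<GOS> gos;
            if (byAgency.TryGetValue(code, out gos))
                result.AddRange(gos);
        }
    }
    return result;
}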

Entity Framework timeout error due to database block

I have a project that uses Entity Framework (v1 with .NET 3.5). It's been in use for a few years, but it's now being used by more people. We started getting timeout errors, and I have tracked it down to a few things. For simplicity's sake, let's say my database has three tables: product, part, and product_part. There are ~1400 parts and a handful of products.
The user has the ability to add any number of parts to a product. My problem is that when there are many parts added to the product the inserts take a long time. I think it's mostly due to network traffic/delay, but to insert all 1400 takes around a minute. If someone goes in and tries to view the details of a part while those records are being inserted I get a timeout and can see a block in the Activity Monitor of SQL Server.
What can I do to avoid this? My apologies if this has been asked before and I missed it.
Thanks,
Nick
I think the root problem is that your write transaction is taking so long. EF is not good at executing mass DML. It executes each insert in a separate network roundtrip and separate statement.
If you want to insert 1400 rows and performance matters, do the insert in one single statement using TVPs (INSERT ... SELECT * FROM @tvp). Or switch to bulk copy, but I don't think that will be advantageous at only 1400 rows.
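As a hedged sketch of the TVP approach from ADO.NET (bypassing EF just for the bulk insert): the table type dbo.PartIdList and the product_part column names are assumptions you would adapt to your schema, and the type has to exist first, e.g. CREATE TYPE dbo.PartIdList AS TABLE (PartId INT NOT NULL).
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

static class ProductPartBulkInsert
{
    public static void InsertProductParts(string connectionString, int productId, IEnumerable<int> partIds)
    {
        // Build the rows of the table-valued parameter in memory.
        var rows = new DataTable();
        rows.Columns.Add("PartId", typeof(int));
        foreach (int partId in partIds)
            rows.Rows.Add(partId);

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(
            "INSERT INTO product_part (product_id, part_id) " +
            "SELECT @productId, PartId FROM @parts", conn))
        {
            cmd.Parameters.AddWithValue("@productId", productId);
            SqlParameter tvp = cmd.Parameters.AddWithValue("@parts", rows);
            tvp.SqlDbType = SqlDbType.Structured;
            tvp.TypeName = "dbo.PartIdList";   // must match the CREATE TYPE above

            conn.Open();
            cmd.ExecuteNonQuery();             // one round trip instead of ~1400
        }
    }
}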
If your read transactions are getting blocked, and this is a problem, switch on snapshot isolation. That takes care of the readers 100% as they never block under snapshot isolation.

Scalability and availability

I am quite confused about which approach to take and what is best practice.
Let's say I have a C# application which does the following:
It sends emails from a queue. The emails to send and all their content are stored in the DB.
Now, I know how to make my C# application almost scalable, but I need to go somewhat further.
I want some way of distributing the tasks across, say, X servers, so it is not just one server doing all the processing; the work is shared amongst the servers.
If one server goes down, then the load is shared between the other servers. I know NLB does this, but I'm not looking for an NLB here.
Sure, you could add a column of some kind in the DB table to indicate which server should be assigned to process that record, and each of the applications on the servers would have an ID of some kind that matches the value in the DB and they would only pull their own records - but this I consider to be cheap, bad practice and unrealistic.
Having a DB table row lock as well, is not something I would do due to potential deadlocks and other possible issues.
I am also NOT suggesting using threading "to the extreme" here, but yes, there will be threading per item to process, or batching items up per thread for x number of threads.
How should I approach this, and what do you recommend for making a C# application which is scalable and has high availability? The aim is to have X servers, each with the same application, each able to get records and process them, but with the processing load shared amongst the servers so that if one server or service fails, the others can take on that load until another server is put back.
Sorry for my lack of understanding or knowledge, but I have been thinking about this quite a lot and have lost sleep trying to think of a good, robust solution.
I would be thinking of batching up the work, so each app only pulled back x number of records at a time, marking those retrieved records as taken with a bool field in the table. I'd amend the SELECT statement to pull only records not marked as taken/done. Table locks would be OK in this instance for very short periods to ensure there is no overlap of apps processing the same records.
EDIT: It's not very elegant, but you could have a datestamp and a status for each entry (instead of a bool field as above). Then you could run a periodic Agent job which runs a sproc to reset the status of any records which have a status of In Progress but which have gone beyond a time threshold without being set to complete. They would be ready for reprocessing by another app later on.
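As a hedged sketch of that claim-a-batch idea with ADO.NET: the table and column names (EmailQueue, EmailId, Status, ClaimedBy, ClaimedAtUtc) are assumptions, and this variant uses UPDLOCK/READPAST hints instead of a short table lock so that several servers can each grab a disjoint batch without blocking one another.
using System.Collections.Generic;
using System.Data.SqlClient;

static class EmailQueueWorker
{
    // Claims up to batchSize pending records for this server and returns their ids.
    public static List<int> ClaimBatch(string connectionString, string serverName, int batchSize)
    {
        const string sql = @"
            UPDATE TOP (@batchSize) q
            SET    Status = 'InProgress',
                   ClaimedBy = @server,
                   ClaimedAtUtc = GETUTCDATE()
            OUTPUT inserted.EmailId
            FROM   EmailQueue q WITH (UPDLOCK, READPAST)
            WHERE  q.Status = 'Pending';";

        var claimedIds = new List<int>();
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@batchSize", batchSize);
            cmd.Parameters.AddWithValue("@server", serverName);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    claimedIds.Add(reader.GetInt32(0));   // records this server now owns
            }
        }
        return claimedIds;
    }
}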
This may not be enterprise-y enough for your tastes, but I'd bet my hide that there are plenty of apps out there in the enterprise which are just as un-sophisticated and work just fine. The best things work with the least complexity.
