C# - how to implement a database which is queried by threads

Hello there. I have a situation: the entities are customerManager, warehouse, customer and suppliers.
My goals are:
The warehouse is a singleton and opens the DB at runtime.
The customerManager manages customers as threads that query the warehouse and update it (after buying stuff).
When one of the items in the warehouse runs out, we ask a supplier, in a different thread, to supply it for us. While the supplier does his thing (let's assume it takes something like 5 seconds) the customer waits (in a queue) and is woken up when the supplier method returns true (let's assume it always returns true).
So my questions are about 3 things:
Design - should the customerManager hold the warehouse and the customers inside it? It seems like the best solution; does someone recommend otherwise? (C# design topic)
How many threads can go to the DB at once? Can the DB handle it by itself so I won't need to manage it myself? Should I hold SqlCommand(s) for them? Should I use a DataSet or a DataReader? In other words, can someone advise me how to do it?
Should I do this for 10 threads:
for (int i = 0; i < 10; i++)
{
    SqlConnection sqlConnection = new SqlConnection(r_ConnectionString);
    sqlConnection.Open();
    sqlConnection.Close();
}
...so the connection pool would be kept open with 10 connections?
(database / ADO.NET topic)
How should the threads wait in a queue (in order to wait for the supplier method to wake them up)? How do I wake them? Is there a good solution in C# for that? (C# threads topic)
I think the question is too long, but otherwise it would be out of context, so I would appreciate it if you would state in the title which question you are referencing.
Thank you.

Your worker threads could be fed work via BlockingCollection or ConcurrentQueue.
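For the wait/wake part of your question, here is a minimal producer/consumer sketch using BlockingCollection; the PurchaseRequest type and the warehouse work are made-up placeholders for your own entities:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PurchaseRequest { public int CustomerId; public string Item; }

class WorkQueueDemo
{
    static void Main()
    {
        // Bounded queue: producers block when it is full, consumers block when empty.
        var queue = new BlockingCollection<PurchaseRequest>(boundedCapacity: 100);

        // One consumer draining the queue; GetConsumingEnumerable blocks
        // until an item arrives or CompleteAdding is called.
        var consumer = Task.Factory.StartNew(() =>
        {
            foreach (var request in queue.GetConsumingEnumerable())
            {
                Console.WriteLine("Processing {0} for customer {1}",
                                  request.Item, request.CustomerId);
                // ... query/update the warehouse here ...
            }
        });

        // Producers (your customer threads) just add work items.
        queue.Add(new PurchaseRequest { CustomerId = 1, Item = "widget" });
        queue.Add(new PurchaseRequest { CustomerId = 2, Item = "gadget" });

        queue.CompleteAdding();   // signal no more work
        consumer.Wait();
    }
}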
For connection management you are better off doing this:
using (SqlConnection conn = new SqlConnection(...))
{
}
since this ensures Dispose() gets called for you. As noted in other feedback, you can do this without worrying about the actual connection count to the DB, since ADO.NET manages a pool of physical connections behind the scenes.
Nobody can tell you whether DataSet or DataReader works best; it depends on how you use the data once it's loaded. A DataReader provides a sequential read of each record in turn, while a DataSet provides an in-memory cache of the underlying DB data and in that sense is a 'higher-level' abstraction.
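To make that difference concrete, here are rough sketches of both (the connection string, table and column names are made up):

using System.Data;
using System.Data.SqlClient;

// DataReader: forward-only, one row in memory at a time.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("SELECT Id, Name FROM Items", conn))
{
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            int id = reader.GetInt32(0);
            string name = reader.GetString(1);
        }
    }
}

// DataSet: whole result set buffered in memory; the adapter
// opens and closes the connection for you.
var ds = new DataSet();
using (var adapter = new SqlDataAdapter("SELECT Id, Name FROM Items", connectionString))
{
    adapter.Fill(ds);   // can be re-read, filtered and bound after the connection closes
}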

SqlConnections are implemented in .NET using a connection pool. You do not need to worry about managing the connections themselves; the only requirement on you is that after you open a connection you call .Close(). .NET will manage the rest for you in an efficient way.
If you want to run multiple queries simultaneously, you can call the SqlCommand with the Begin/End pattern, i.e. BeginExecuteReader and EndExecuteReader.
By using both of these you can work at a level that does not require you to manage the threads yourself, whilst still getting multi-threaded behaviour.
However, you should read up on ADO.NET, because a lot of what you are talking about is unnecessary once you know how it works.
As for DataSet or DataReader: that depends on your problem. A DataSet is a very heavy object, though; DataReaders are lightweight and fast and let you populate a collection fairly easily.
I prefer using LINQ to SQL or Entity Framework, though. Raw ADO.NET is kind of fragile because you have to do a lot of casting and manual mapping onto objects, which is prone to errors at run time rather than compile time.
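A rough sketch of that Begin/End pattern (note that on .NET Framework versions before 4.5 the connection string needs Asynchronous Processing=true for this to work):

using System;
using System.Data.SqlClient;

class AsyncQueryDemo
{
    static void Main()
    {
        // "Asynchronous Processing=true" is required on .NET Framework < 4.5.
        string connStr = "...;Asynchronous Processing=true";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand("SELECT COUNT(*) FROM Items", conn))
        {
            conn.Open();
            IAsyncResult ar = cmd.BeginExecuteReader();

            // ... do other work on this thread while the query runs ...

            using (SqlDataReader reader = cmd.EndExecuteReader(ar))
            {
                while (reader.Read())
                    Console.WriteLine(reader.GetInt32(0));
            }
        }
    }
}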

Related

Transactions with multiple connections (MySql, C#)

I'd like to ask a question. I've been trying to find some information regarding transactions with multiple connections, but I haven't been able to find any good source of information.
Now for what I'm trying to do. I have code that looks like this:
using (var Connection1 = m_Db.CreateConnection())
using (var Connection2 = m_Db.CreateConnection())
{
    Connection1.DoRead(..., (IDataReader Reader) =>
    {
        // Do stuff
        Connection2.DoWrite(...);
        Connection2.DoRead(..., (IDataReader Reader) =>
        {
            // Do more stuff
            using (var Connection3 = m_Db.CreateConnection())
            {
                Connection3.DoWrite(...);
                Connection3.Commit(); // Is this even right?
            }
        });
    });
    Connection1.DoRead(..., (IDataReader) =>
    {
        // Do yet more stuff
    });
    Connection1.Commit();
    Connection2.Commit();
}
Each CreateConnection creates a new transaction using MySqlConnection.BeginTransaction. The CreateConnection method creates a Connection object which wraps a MySqlConnection. The DoRead function executes some SQL and disposes the IDataReader when done.
Every Connection does a Rollback when disposed.
Now for some notes:
I have ONE server with multiple databases.
I am running MySql server with InnoDB databases.
I am doing both reads and writes to these databases.
For performance reasons and not to mess up the database, I am using transactions.
The code is (at least, for now) entirely serial. There are NO concurrent threads. All inserts and queries are done in serial fashion.
I use multiple connections to the database because a read or write is not allowed while another read is in progress (basically the reader object has not yet been disposed).
I basically want every connection to see all changes. So for example, after Connection 3 does some writes, Connection 1 should see those. But the data should be in the transaction and not written to the database (yet).
Now, as for my questions:
Does this work? Will everything be committed only once the last Commit function is called? Should I use another approach?
Is this right? Is my approach completely and utterly wrong and silly?
Any drawbacks? Especially regarding performance.
Thanks.
Welp, it seems no one knows. But that's okay.
For now, I just went with the method of using one connection and reading all the results into a List<>, then closing the reader, thereby avoiding the problem of having to use multiple connections.
Might there be performance problems? Maybe, but it's better than having to deal with uncertainty and deadlocks.
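In code, that single-connection workaround might look roughly like this (using the MySql.Data client; the SQL text and row handling are placeholders):

using System.Collections.Generic;
using MySql.Data.MySqlClient;

using (var conn = new MySqlConnection(connectionString))
{
    conn.Open();
    using (MySqlTransaction tx = conn.BeginTransaction())
    {
        // 1) Read everything into memory first, so the reader is fully
        //    disposed before any write is issued on the same connection.
        var rows = new List<object[]>();
        using (var cmd = new MySqlCommand("SELECT ... FROM ...", conn, tx))
        using (MySqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                var values = new object[reader.FieldCount];
                reader.GetValues(values);
                rows.Add(values);
            }
        }

        // 2) Now the connection is free for writes inside the same transaction.
        foreach (var row in rows)
        {
            using (var write = new MySqlCommand("INSERT INTO ... VALUES (...)", conn, tx))
            {
                // write.Parameters.AddWithValue(...);
                write.ExecuteNonQuery();
            }
        }

        tx.Commit();   // everything becomes visible at once
    }
}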

Multi Threading with LINQ to SQL

I am writing a WinForms application. I am pulling data from my database, performing some actions on that data set and then plan to save it back to the database. I am using LINQ to SQL to perform the query to the database because I am only concerned with 1 table in our database so I didn't want to implement an entire ORM for this.
I have it pulling the dataset from the DB. However, the dataset is rather large. So currently what I am trying to do is separate the dataset into 4 relatively equal sized lists (List<object>).
Then I have a separate background worker to run through each of those lists, perform the action and report its progress while doing so. I have it planned to consolidate those sections into one big list once all 4 background workers have finished processing their section.
But I keep getting an error while the background workers are processing their unique list. Do the objects maintain their tie to the DataContext for the LINQ to SQL even though they have been converted to List objects? Any ideas how to fix this? I have minimal experience with multi-threading so if I am going at this completely wrong, please tell me.
Thanks guys. If you need any code snippets or any other information just ask.
Edit: Oops. I completely forgot to give the error message. In the DataContext designer.cs it gives the error "An item with the same key has already been added." in the SendPropertyChanging function.
private List<MyObject> quarter1;   // slice assigned in Setup, used by the worker

private void Setup()
{
    quarter1 = _listFromDB.Take(5000).ToList();
    bgw1.RunWorkerAsync();
}

private void bgw1_DoWork(object sender, DoWorkEventArgs e)
{
    e.Result = functionToExecute(bgw1, quarter1);
}

private List<MyObject> functionToExecute(BackgroundWorker caller, List<MyObject> myList)
{
    int progress = 0;
    foreach (MyObject obj in myList)
    {
        string newString1 = createString();
        obj.strText = newString1;
        // report progress here
        caller.ReportProgress(progress++);
    }
    return myList;
}
This same function is called by all four workers; each worker passes a different list as myList.
Because a real answer has yet to be posted, I'll give it a shot.
Given that you haven't shown any LINQ-to-SQL code (no usage of DataContext) - I'll take an educated guess that the DataContext is shared between the threads, for example:
using (MyDataContext context = new MyDataContext())
{
    // Just some random query. Without ToList() the execution is
    // deferred: listFromDB is an IQueryable<>, not a materialized list.
    var listFromDB = context.SomeTable.Where(st => st.Something == true);

    System.Threading.Tasks.Task.Factory.StartNew(() =>
    {
        var list1 = listFromDB.Take(5000).ToList(); // runs the SQL query
        // call some function on list1
    });

    System.Threading.Tasks.Task.Factory.StartNew(() =>
    {
        var list2 = listFromDB.Take(5000).ToList(); // runs the SQL query
        // call some function on list2
    });
}
Now the error you got - An item with the same key has already been added. - happens because the DataContext object is not thread safe! A lot of stuff happens in the background: the DataContext has to load objects from SQL, track their states, etc. This background work is what throws the error (because each thread runs the query, the DataContext gets accessed concurrently).
At least this is my own personal experience, having come across the same error while sharing a DataContext between multiple threads. You only have two options in this scenario:
1) Before starting the threads, call .ToList() on the query, making listFromDB not an IQueryable<>, but an actual List<>. This means the query has already run and the threads operate on an actual List, not on the DataContext (see the sketch at the end of this answer).
2) Move the DataContext definition into each thread. Because the DataContext is no longer shared, no more errors.
The third option would be to re-write the scenario into something else, like you did (for example, make everything sequential on a single background thread)...
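A sketch of option 1, following the shape of the snippet above; the Skip is added so the two slices don't overlap:

using (MyDataContext context = new MyDataContext())
{
    // ToList() runs the query here, once; the threads below work on
    // plain in-memory objects and never touch the context.
    var listFromDB = context.SomeTable.Where(st => st.Something == true).ToList();

    var t1 = System.Threading.Tasks.Task.Factory.StartNew(() =>
    {
        var list1 = listFromDB.Take(5000).ToList();
        // call some function on list1
    });
    var t2 = System.Threading.Tasks.Task.Factory.StartNew(() =>
    {
        var list2 = listFromDB.Skip(5000).Take(5000).ToList();
        // call some function on list2
    });
    System.Threading.Tasks.Task.WaitAll(t1, t2);
}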
First of all, I don't really see why you'd need multiple worker threads at all. Are these lists in separate databases/tables/servers? And do you really want to show 4 progress bars for 4 lists, or are you somehow merging the progress reports into one weird progress bar? :D
Also, you're trying to speed up the processing of updates to your database, but you never send LINQ to SQL any saves (SubmitChanges), so you're not really batching transactions; you'll just save everything at the end in one big transaction. Is that really what you're aiming for? The progress bar will just stop at 100% and then spend a lot of time on the SQL side.
Just create one background thread and process everything sequentially, but batch a save transaction every couple of rows (I'd suggest something like every 1000 rows, but you should experiment with this). It'll be fast, even with millions of rows.
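A rough sketch of that batched approach, reusing names from the question (MyDataContext and SomeTable are assumed, and WorkerReportsProgress must be enabled on the BackgroundWorker):

private void bgw1_DoWork(object sender, DoWorkEventArgs e)
{
    using (var context = new MyDataContext())
    {
        var items = context.SomeTable.ToList();
        int processed = 0;
        foreach (var item in items)
        {
            item.strText = createString();

            // flush a batch instead of one giant commit at the very end
            if (++processed % 1000 == 0)
            {
                context.SubmitChanges();
                bgw1.ReportProgress(processed * 100 / items.Count);
            }
        }
        context.SubmitChanges();   // remaining rows
    }
}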
If you really need this multithreaded solution:
The "another blabla with the same key has been added" error suggests that you are adding the same item to multiple "mylists", or adding the same item to the same list twice, otherwise how would there be any errors at all?
Using Parallel LINQ (PLINQ), you can benefit from multiple CPU cores when processing your data. But if your application is going to run on a single-core CPU, then splitting the data into pieces won't give you any performance benefit; instead it will incur some context-switching overhead.
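A minimal PLINQ sketch (requires System.Linq), assuming the data has already been materialized into a plain list so no DataContext is shared between threads:

// listFromDB is a List<MyObject> already pulled out of LINQ to SQL via ToList()
listFromDB
    .AsParallel()
    .ForAll(obj =>
    {
        obj.strText = createString();   // CPU-bound per-item work
    });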
Hope it Helps

LINQ-To-SQL NOLOCK (NOT ReadUncommitted)

I've been searching for some time now, here and in other places, and can't find a good answer to why LINQ to SQL with NOLOCK is not possible.
Every time I search for how to apply the WITH (NOLOCK) hint to a LINQ to SQL context (applied to a single SQL statement), people answer that you should force a transaction (TransactionScope) with IsolationLevel set to ReadUncommitted. They rarely mention that this causes the connection to open a transaction (which, as I've also read somewhere, must be ensured closed manually).
Using ReadUncommitted in my application as-is is really not that good. Right now I've got nested using-context statements for the same connection, like:
using (var ctx1 = new Context())
{
    // ... some code here ...
    using (var ctx2 = new Context())
    {
        // ... some code here ...
        using (var ctx3 = new Context())
        {
            // ... some code here ...
        }
        // ... some code here ...
    }
    // ... some code here ...
}
With a total execution time of 1 second and many simultaneous users, changing the isolation level will cause the contexts to wait for each other to release a connection, because all the connections in the connection pool are being used.
So one (of many) reasons for changing to NOLOCK is to avoid deadlocks (right now we have 1 customer deadlock per day). The consequence of the above is just another kind of deadlock, and it doesn't really solve my issue.
So what I know I could do is:
Avoid nested usage of same connection
Increase the connection pool size at the server
But my problem is:
This is not possible in the near future because of the many lines of code that would need refactoring, and it would conflict with the architecture (without even starting to discuss whether that architecture is good or bad).
Even though this will of course work, it is what I would call "symptomatic treatment" - I don't know how much the application will grow, and whether this is a reliable solution for the future (I might end up with an even worse situation, with a lot more users being affected).
My thoughts are:
Can it really be true that NOLOCK is not possible (per statement, without starting transactions)?
If 1 is true - can it really be true that no one else has had this problem and solved it in a generic LINQ to SQL modification?
If 2 is true - why is this not an issue for others?
Is there another workaround I haven't looked at, maybe?
Is the nested use of the same connection so bad a practice that no one else has this issue?
1: LINQ-to-SQL does indeed not allow you to indicate hints like NOLOCK; it is possible to write your own TSQL, though, and use ExecuteQuery<T> etc (there is a sketch at the end of this answer).
2: to solve this in an elegant way would be pretty complicated, frankly; and there's a strong chance that you would be using it inappropriately. For example, in the "deadlock" scenario, I would wager that it is actually UPDLOCK that you should be using (during the first read), to ensure that the first read takes a write lock; this prevents a second, later query from getting a read lock, so you generally get blocking instead of deadlock.
3: using the same connection isn't necessarily a big problem (although note that new Context() won't generally share a connection; to share a connection you would use new Context(connection)). If you are seeing this issue, there are three likely solutions (if we exclude "use an ORM with hint support"):
using an explicit transaction (which doesn't have to be TransactionScope - it can be a connection level transaction) to specify the isolation level
write your own TSQL with hints
use a connection-level isolation level (noting the caveat I added as a comment)
IIRC there is also a way to subclass the data-context and override some of the transaction-creation code to control the isolation-level for the transactions that it creates internally.
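For point 1, a rough sketch of hand-written TSQL with a hint via ExecuteQuery<T> (the table, columns and Customer class are made up; LINQ to SQL turns the {0} placeholder into a proper parameter):

using (var ctx = new Context())
{
    // the NOLOCK (or UPDLOCK, per point 2) hint lives in the raw TSQL
    var customers = ctx.ExecuteQuery<Customer>(
        @"SELECT CustomerId, Name
          FROM dbo.Customers WITH (NOLOCK)
          WHERE Name LIKE {0}", "A%").ToList();
}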

new objects added during long loop

We currently have a production application that runs as a windows service. Many times this application will end up in a loop that can take several hours to complete. We are using Entity Framework for .net 4.0 for our data access.
I'm looking for confirmation that if we load new data into the system after this loop is initialized, it will not result in items being added to the loop itself. When the loop is initialized, we are looking for data "as of" that moment. Although I'm relatively certain this will work exactly like using ADO and looping over the data (the loop only cycles through data that was present at the time of initialization), I am looking for confirmation for co-workers.
Thanks in advance for your help.
// update: here's some sample code in C#. The question is the same: will the enumeration change if new items are added to the table that EF is querying?
IEnumerable<myobject> myobjects = (from o in db.theobjects where o.id == myID select o);
foreach (myobject obj in myobjects)
{
    // perform action on obj here
}
It depends on your precise implementation.
Once a query has been executed against the database, the results of the query will not change (assuming you aren't using lazy loading). To ensure this you can dispose of the context after retrieving the query results; this effectively "cuts the cord" between the retrieved data and the database.
Lazy loading can result in a mix of "initial" and "new" data; however once the data has been retrieved it will become a fixed snapshot and not susceptible to updates.
You mention this is a long running process; which implies that there may be a very large amount of data involved. If you aren't able to fully retrieve all data to be processed (due to memory limitations, or other bottlenecks) then you likely can't ensure that you are working against the original data. The results are not fixed until a query is executed, and any updates prior to query execution will appear in results.
I think your best bet is to change the logic of your application so that when the "loop" logic is deciding whether to do another iteration or exit, you take the opportunity to load the newly added items into the list. See the pseudo code below:
var repo = new Repository();
while (repo.HasMoreItemsToProcess())
{
    var entity = repo.GetNextItem();
}
Let me know if this makes sense.
The easiest way to ensure this - if the data itself isn't too big - is to convert the data you retrieve from the database to a List<>, e.g. something like this (pulled at random from my current project):
var sessionIds = room.Sessions.Select(s => s.SessionId).ToList();
And then iterate through the list, not through the IEnumerable<> that would otherwise be returned. Converting it to a list triggers the enumeration, and then throws all the results into memory.
If there's too much data to fit into memory, and you need to stick with an IEnumerable<>, then the answer to your question depends on various database and connection settings.
I'd take a snapshot of ID's to be processed -- quickly and as a transaction -- then work that list in the fashion you're doing today.
In addition to accomplishing the goal of not changing the sample mid-stream, this also gives you the ability to extend your solution to track status on each item as it's processed. For a long-running process, this can be very helpful for progress reporting restart / retry capabilities, etc.
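A sketch of that snapshot approach using the names from the question (assumes id is an int; the status tracking is left out):

// 1) snapshot the keys up front - quick, and effectively a point-in-time sample
List<int> idsToProcess = db.theobjects
                           .Where(o => o.id == myID)   // existing filter from the question
                           .Select(o => o.id)
                           .ToList();

// 2) work the fixed list; rows added to the table later are never picked up
foreach (int id in idsToProcess)
{
    var obj = db.theobjects.First(o => o.id == id);
    // perform action on obj here (optionally recording per-item status)
}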

Is it advisable to use generics for large amounts of data?

I have, let's say, thousands of Customer records that I have to show on a web form. I also have one CustomerEntity which has 10 properties, so when I fetch the data using a DataReader and convert it into a List<CustomerEntity>, I am required to loop through the data two times.
So is the use of generics advisable in such a scenario? If yes, what will my application's performance be?
For example: in the CustomerEntity class I have CustomerId and CustomerName properties, and I'm getting 100 records from the Customer table.
To prepare the List I wrote the following code:
while (dr.Read())
{
    // create a new CustomerEntity object
    // set each property of the CustomerEntity via reflection
    for (var index = 0; index < MyProperties.Count; index++)
    {
        MyProperties[index].SetValue(CustEntityObject, dr.GetValue(index), null);
    }
    // add CustEntityObject to the List<CustomerEntity>
}
How can I avoid these two loops? Is there any other mechanism?
I'm not really sure how generics tie into data volume; they are unrelated concepts... and it also isn't clear to me why this requires you to read everything twice. But yes: generics are fine when used in volume (why wouldn't they be?). Of course, the best way to find a problem is profiling (either server performance or bandwidth - perhaps more the latter in this case).
Of course the better approach is: don't show thousands of records on a web form; what is the user going to do with that? Use paging, searching, filtering, ajax, etc - every trick imaginable - but don't send thousands of records to the client.
Re the updated question: the loop for setting properties isn't necessarily bad - it is an entirely appropriate inner loop. Before doing anything, profile to see whether it is actually a problem. I suspect that sheer bandwidth (between server and client, or server and database) is the bigger issue. If you can prove that this loop is a problem, there are things you can do to optimise it:
switch to using PropertyDescriptor (rather than PropertyInfo), and use HyperDescriptor to make it a lot faster
write code with DynamicMethod to do the job - requires some understanding of IL, but very fast
write a .NET 3.5 / LINQ Expression to do the same and use .Compile() - like the second point, but (IMO) a bit easier
I can add examples for the first and third bullets; I don't really want to write an example for the second, simply because I wouldn't write that code myself that way any more (I'd use the 3rd option where available, else the 1st).
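For the third bullet, a rough sketch of a compiled Expression mapper (assumes the column order matches the property order and that all public properties are writable; no null/DBNull handling shown):

using System;
using System.Data;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

static class RecordMapper
{
    // Builds (IDataRecord r) => new T { Prop0 = (T0)r.GetValue(0), ... } once;
    // after that every row is mapped by a plain delegate - no per-row reflection.
    public static Func<IDataRecord, T> Build<T>() where T : new()
    {
        ParameterExpression record = Expression.Parameter(typeof(IDataRecord), "r");
        MethodInfo getValue = typeof(IDataRecord).GetMethod("GetValue");

        PropertyInfo[] props = typeof(T).GetProperties();
        var bindings = props.Select((p, i) =>
            (MemberBinding)Expression.Bind(
                p,
                Expression.Convert(
                    Expression.Call(record, getValue, Expression.Constant(i)),
                    p.PropertyType)));

        var body = Expression.MemberInit(Expression.New(typeof(T)), bindings);
        return Expression.Lambda<Func<IDataRecord, T>>(body, record).Compile();
    }
}

// usage: compile once, reuse inside the reader loop
// var map = RecordMapper.Build<CustomerEntity>();
// while (dr.Read()) list.Add(map(dr));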
It is very difficult to say what the performance will be, but consider these things:
Generics provide type safety.
If you're going to display 10,000 records on the page, your application will probably be unusable. If records are being paged, consider returning only those records actually needed for the page you are on.
You shouldn't need to loop through the data twice. What are you doing with the data?
