I have a huge amount of data to process, which takes an awful amount of time, so I thought threading might get the job done more quickly.
What I do: I call SQL stored procedures from an ASP.NET front end; the processing takes place there and takes almost 30 hours.
What I need: I have split the data into batches and created a stored procedure (SP) for each. Now I need all the SPs to run at the same time on a single button click.
Please help!
I used the code below, but it doesn't seem to run in parallel.
protected void Button3_Click(object sender, EventArgs e)
{
    Thread t1 = new Thread(Method1);
    Thread t2 = new Thread(Method2);
    t1.Start();
    t2.Start();
    t1.Join();
    t2.Join();
}
void Method1()
{
    for (int i = 0; i < 10000; i++)
    {
        Response.Write("hello1"+i);
        Response.Write("<br>");
    }
}
void Method2()
{
    for (int i = 0; i < 10000; i++)
    {
        Response.Write("hello2" + i);
    }
}
You probably don't want to do this directly in ASP.NET, for a variety of reasons; for one, the worker process has a limited execution time.
Also note that SqlConnection etc. have their own time limits.
What you should really do is queue up the work to do (using IPC or another database table etc) and have something like a Windows service or external process in a scheduled task pick up and process through the queue.
Hell, you could even kick off a job within SQL Server and have that directly do the work.
Threading doesn't magically speed up your process.
If you don't know what you are doing, server-side threading is generally not a good idea.
A 30-hour call to SQL Server will probably time out :)
ASP.NET is not the way to go for a 30-hour job. This is a big process and you shouldn't handle it within ASP.NET. As an alternative, you might want to write a Windows service. Pass your parameters to it (maybe with MSMQ or some other messaging system), do the processing there, and send progress back to the web application, showing it with SignalR or AJAX polling.
Narendran, start here:
http://www.albahari.com/threading/
This is the best threading tutorial I have seen online, and the corresponding book is also very good.
Make sure you spend enough time to go through the whole tutorial (I have done it and, believe me, it is worth it!).
As said above, using the Join method of the Thread class in this case defeats the purpose of using threads. Instead of using Join, use lock (see Basic Synchronization in the above tutorial) to make sure the threads are synchronized.
Also, as mentioned, before doing any multithreading, run those stored procedures on SQL Server directly, all at the same time. If they still take 30 hours to execute, then threading won't help. If they take less than 30 hours, then you may benefit from multithreading.
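If that test does show a gain, the batches could be started concurrently from a console program or Windows service rather than from the web page. The following is only a sketch under stated assumptions: the procedure names and the connection string in the comment are hypothetical, and `BatchRunner`, `RunAll` and `ExecuteProcedure` are names made up for this illustration. `Main` uses a fake delay in place of the database call so the overlap can be seen without a server.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BatchRunner
{
    // Starts one long-running task per procedure name and blocks until all finish.
    public static void RunAll(IEnumerable<string> procedureNames, Action<string> execute)
    {
        Task[] tasks = procedureNames
            .Select(name => Task.Factory.StartNew(() => execute(name),
                                                  TaskCreationOptions.LongRunning))
            .ToArray();
        Task.WaitAll(tasks);
    }

    // Runs a single stored procedure on its own connection.
    public static void ExecuteProcedure(string connectionString, string procedureName)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(procedureName, connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.CommandTimeout = 0; // 0 means no command timeout
            connection.Open();
            command.ExecuteNonQuery();
        }
    }

    public static void Main()
    {
        // Hypothetical procedure names, one per batch.
        string[] procedures = { "ProcessBatch1", "ProcessBatch2", "ProcessBatch3" };

        // Fake work so the sketch runs without a database. With a real server:
        // RunAll(procedures, name => ExecuteProcedure("<your connection string>", name));
        Stopwatch watch = Stopwatch.StartNew();
        RunAll(procedures, name => Thread.Sleep(500));
        Console.WriteLine("All batches done in " + watch.ElapsedMilliseconds + " ms");
    }
}
```

Each batch gets its own connection because a SqlConnection should not be shared between threads. Whether the batches actually overlap on the server still depends on locking and I/O there, which is why running them side by side on SQL Server first is the cheaper experiment.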
Related
I have an application that runs on a machine with multiple processors. When I run the code in visual studio on my development machine it runs fairly quickly. When I run the published version on the server with the same inputs, it runs more slowly. I'm working on a theory, here. My development laptop has a single processor with a higher clock speed than the multiple processors on the server. Since the application is single-threaded, it seems logical that it would run faster locally. So, if I can add some multithreading to the application to make use of the additional processors on the server, I might be able to improve performance. That's the theory, anyway. And my experience with multithreaded applications is limited.
The crux of the application is a loop which calls several methods. A simplified version would look something like:
public DataTable MyMethod()
{
    DataTable MyDataTable = new DataTable();
    <add columns to the data table>
    for (int counter = 1; counter <= MaxCounter; counter++)
    {
        <<generate some values>>
        ComputeOutputsByRecipe(id, ref MyDataTable);
    }
    return MyDataTable;
}
private void ComputeOutputsByRecipe(int RecipeID, ref DataTable Results)
{
    switch (RecipeID)
    {
        case 1:
            ProcessRecipe_1(ref Results);
            break;
        case 2:
            <repeat for supported recipe IDs>
    }
}
private void ProcessRecipe_1(ref DataTable Results)
{
    <do some processing>
    DataRow dr = Results.NewRow();
    <populate the new data row>
    Results.Rows.Add(dr);
}
So what I'm looking to do is replace the "for" with "Parallel.For" to take advantage of multiple threads running on multiple processors. But since each iteration of the loop writes to a reference parameter, I'm concerned about thread safety. Now... the order in which the data gets written to this data table is not important. And I don't read from the data table until after the looping has completed. So I don't think that this is a problem. But since Add() is an instance method, I'm concerned about what would happen.
So the question is: is it safe to add rows to a DataTable like this if the for loop in my example is replaced with Parallel.For() and I don't read from the data table until after the loop has completed?
No, it is not safe. DataTable is not designed to be mutated from multiple concurrent threads.
You'll need to synchronize access to it in order to ensure it works properly.
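One way to synchronize is to keep the expensive computation outside a lock and serialize only the calls that touch the table. This is a minimal sketch, not the asker's code: the column names, the `Math.Sqrt` stand-in for the recipe processing, and the `ParallelTableDemo` class are assumptions made for illustration.

```csharp
using System;
using System.Data;
using System.Threading.Tasks;

public static class ParallelTableDemo
{
    public static DataTable BuildTable(int maxCounter)
    {
        var table = new DataTable();
        table.Columns.Add("RecipeId", typeof(int));
        table.Columns.Add("Output", typeof(double));
        var gate = new object(); // private lock object guarding the table

        Parallel.For(1, maxCounter + 1, counter =>
        {
            // Thread-independent work runs in parallel, outside the lock.
            double output = Math.Sqrt(counter);

            // Every call that touches the shared table is serialized.
            lock (gate)
            {
                DataRow row = table.NewRow();
                row["RecipeId"] = counter;
                row["Output"] = output;
                table.Rows.Add(row);
            }
        });
        return table;
    }

    public static void Main()
    {
        DataTable result = BuildTable(1000);
        Console.WriteLine(result.Rows.Count); // prints 1000
    }
}
```

If the per-row computation is cheap compared with the locked section, the loop will run little faster than the sequential version, so it is worth measuring before committing to this design.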
I am writing a WinForms application. I am pulling data from my database, performing some actions on that data set and then plan to save it back to the database. I am using LINQ to SQL to perform the query to the database because I am only concerned with 1 table in our database so I didn't want to implement an entire ORM for this.
I have it pulling the dataset from the DB. However, the dataset is rather large. So currently what I am trying to do is separate the dataset into 4 relatively equal sized lists (List<object>).
Then I have a separate background worker to run through each of those lists, perform the action and report its progress while doing so. I have it planned to consolidate those sections into one big list once all 4 background workers have finished processing their section.
But I keep getting an error while the background workers are processing their unique list. Do the objects maintain their tie to the DataContext for the LINQ to SQL even though they have been converted to List objects? Any ideas how to fix this? I have minimal experience with multi-threading so if I am going at this completely wrong, please tell me.
Thanks guys. If you need any code snippets or any other information just ask.
Edit: Oops. I completely forgot to give the error message. In the DataContext designer.cs it gives the error "An item with the same key has already been added." in the SendPropertyChanging function.
private List<MyObject> quarter1; // field, so that the DoWork handler can see it

private void Setup(){
    quarter1 = _listFromDB.Take(5000).ToList();
    bgw1.RunWorkerAsync();
}
private void bgw1_DoWork(object sender, DoWorkEventArgs e){
    e.Result = functionToExecute(bgw1, quarter1);
}
private List<MyObject> functionToExecute(BackgroundWorker caller, List<MyObject> myList)
{
    int progress = 0;
    foreach (MyObject obj in myList)
    {
        string newString1 = createString();
        obj.strText = newString1;
        //report progress here
        caller.ReportProgress(progress++);
    }
    return myList;
}
This same function is called by all four workers and is given a different list for myList depending on which worker has called the function.
Because a real answer has yet to be posted, I'll give it a shot.
Given that you haven't shown any LINQ-to-SQL code (no usage of DataContext) - I'll take an educated guess that the DataContext is shared between the threads, for example:
using (MyDataContext context = new MyDataContext())
{
    // This is just some random query on which ToList() has not been called,
    // so query execution is deferred. listFromDB is an IQueryable<>.
    var listFromDB = context.SomeTable.Where(st => st.Something == true);
    System.Threading.Tasks.Task.Factory.StartNew(() =>
    {
        var list1 = listFromDB.Take(5000).ToList(); // runs the SQL query
        // call some function on list1
    });
    System.Threading.Tasks.Task.Factory.StartNew(() =>
    {
        var list2 = listFromDB.Take(5000).ToList(); // runs the SQL query
        // call some function on list2
    });
}
Now the error you got - An item with the same key has already been added. - was because the DataContext object is not thread safe! A lot of stuff happens in the background - DataContext has to load objects from SQL, track their states, etc. This background work is what throws the error (because each thread is running the query, the DataContext gets accessed).
At least, this is my own experience, having come across the same error while sharing the DataContext between multiple threads. You have two options in this scenario:
1) Before starting the threads, call .ToList() on the query, making listFromDB not an IQueryable<>, but an actual List<>. This means that the query has already run and the threads operate on an actual List, not on the DataContext.
2) Move the DataContext definition into each thread. Because the DataContext is no longer shared, no more errors.
The third option would be to re-write the scenario into something else, like you did (for example, make everything sequential on a single background thread)...
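The first option can be sketched without a database. The example below is an assumption-laden stand-in: it uses a plain LINQ to Objects sequence in place of the LINQ to SQL query, and `MaterializeFirstDemo` and `SumInChunks` are made-up names. It shows only the ordering: materialize once on the calling thread, then hand each task its own list. Entities loaded from a DataContext may still notify its change tracker when their properties are set, which plain values like these do not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class MaterializeFirstDemo
{
    // Materializes the query once, then processes fixed-size chunks in parallel.
    public static int[] SumInChunks(IEnumerable<int> query, int chunkSize)
    {
        // The deferred query runs here, once, on the calling thread.
        List<int> all = query.ToList();

        var tasks = new List<Task<int>>();
        for (int start = 0; start < all.Count; start += chunkSize)
        {
            // Each task gets its own list; nothing deferred crosses threads.
            List<int> chunk = all.Skip(start).Take(chunkSize).ToList();
            tasks.Add(Task.Factory.StartNew(() => chunk.Sum()));
        }

        Task.WaitAll(tasks.ToArray());
        return tasks.Select(t => t.Result).ToArray();
    }

    public static void Main()
    {
        // Stand-in for the deferred database query.
        IEnumerable<int> query = Enumerable.Range(1, 10).Where(n => n > 0);
        int[] sums = SumInChunks(query, 5);
        Console.WriteLine(string.Join(", ", sums)); // prints 15, 40
    }
}
```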
First of all, I don't really see why you'd need multiple worker threads at all. (Are these lists in separate databases, tables, or servers? Do you really want to show 4 progress bars if you have 4 lists, or are you somehow merging the progress reports into one weird progress bar? :D)
Also, you're trying to speed up processing updates to your database, but you don't send LINQ to SQL any saves, so you're not really batching transactions; you'll just save everything at the end in one big transaction. Is that really what you're aiming for? The progress bar will just stop at 100% and then a lot of time will be spent on the SQL side.
Just create one background thread and process everything synchronously, but batch a save transaction every so many rows (I'd suggest something like every 1000 rows, but you should experiment with this). It'll be fast, even with millions of rows.
If you really need this multithreaded solution:
The "another blabla with the same key has been added" error suggests that you are adding the same item to multiple "mylists", or adding the same item to the same list twice, otherwise how would there be any errors at all?
Using Parallel LINQ (PLINQ), you can take advantage of multiple CPU cores for processing your data. But if your application is going to run on a single-core CPU, then splitting the data into pieces won't give you any performance benefit; instead, it will incur some context-switching overhead.
Hope it Helps
Ok so I am not very familiar with databases so there may be a simple solution that I am not aware of.
I have a SQL database that is to be managed by a class in my c# application. What I want the class to do is to constantly check the database to see if there is new data. If there is new data, I want it to trigger an event that another class will be listening to. Now I'm guessing that I need to implement a thread that will check the database at every other ms or something. However, what would I need to look for in order to fire my event? Can the database notify the class when there is a new entry?
If you are using MS SQLServer, you can use the SqlDependency class from the .NET Framework to get notifications about database changes.
Maybe other database systems have similar mechanisms in their database driver packages.
If you cannot use that for whatever reason, you will need a Thread to poll the database periodically.
1. If you want the database to inform your application about a change, you can use Service Broker (first enable Service Broker on your database, then write some code to "attach" to it). On the application side you will need the SqlDependency class.
Helpful links:
Enable Broker
Query Notifications in SQL Server
If you want to watch multiple queries, be aware that Service Broker is a little heavy.
2. If you want your application to do all the work, you have to create a function that computes the CHECKSUM for the selected table. Each time, you keep the last checksum, and if you find any difference you "hit" the database to get the new data.
You have to decide who is going to do all your job!
Hope it helps.
Other than using SqlDependency, you can use a Timer, or SqlCacheDependency if you are using ASP.NET or MVC with the Cache object. 1ms intervals are not recommended, though, as you probably won't complete your check before the next one starts, and your database load will be very high as a result. You should also set the Timer.AutoReset property to false so you don't have calls tripping over each other.
Edit 2: This MSDN example shows how you can use SqlDependency, including having to Enable Query Notifications (MSDN). There are many considerations for using SqlDependency, for example it was really designed for web servers where limited watchers would be created, not so much for desktop applications, so keep that in mind. There is a good article on BOL on this called Planning for Notifications which emphasises that Query notifications are useful
if the data in the query changes relatively infrequently, if the application does not require an instantaneous update when the data changes, and if the query meets the requirements and restrictions outlined in Creating a Query for Notification
In your sample you suggest the need for 1ms latency, so maybe the Dependency classes are not the best way for you (also see my later comment on your latency requirement).
EDIT: For example (using the timer):
using System;
using System.Timers;

class Program
{
    static void Main(string[] args)
    {
        Timer timer = new Timer(1);   // interval in milliseconds
        timer.Elapsed += timer_Elapsed;
        timer.AutoReset = false;      // fire once; the handler re-arms the timer
        timer.Enabled = true;
        Console.ReadLine();           // keep the process alive while the timer runs
    }

    static void timer_Elapsed(object sender, ElapsedEventArgs e)
    {
        Timer timer = (Timer)sender;
        try
        {
            // do the checks here
        }
        finally
        {
            // re-enable the timer to check again very soon
            timer.Enabled = true;
        }
    }
}
As for what to check, it depends on what changes you are actually looking to detect. Here are some ideas:
table row count (but dangerous if a row is added and deleted since the last check)
max value of the table id column (only works if you have a numeric identity field that is increasing, and only works to check for new rows)
check individual columns for changes in specific rows you want to watch
use a row CHECKSUM in a column to check for changes on individual rows
ask writers to update a separate table with a change reference id that you can check
use audit tables to record changes, and check for new audit records
You need to better define the scope of your change monitoring before you can get a good answer to this.
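As an illustration of the "max value of the id column" idea from the list above, the check itself can be kept separate from the timer and from the database. This is a sketch under assumptions: `ChangePoller`, `CheckOnce` and the `Changed` event are names invented here, and the `Func<long>` stands in for whatever query returns the current marker (for example a MAX over an identity column).

```csharp
using System;

public sealed class ChangePoller
{
    private readonly Func<long> _readMarker; // e.g. wraps a "max id" query
    private long _lastMarker;

    // Raised with the new marker value whenever it differs from the last one seen.
    public event Action<long> Changed;

    public ChangePoller(Func<long> readMarker)
    {
        _readMarker = readMarker;
        _lastMarker = readMarker(); // baseline, so the first check is quiet
    }

    // Performs one check; returns true if a change was detected.
    public bool CheckOnce()
    {
        long current = _readMarker();
        if (current == _lastMarker)
        {
            return false;
        }
        _lastMarker = current;
        Action<long> handler = Changed;
        if (handler != null)
        {
            handler(current);
        }
        return true;
    }
}

public static class PollerDemo
{
    public static void Main()
    {
        long fakeMaxId = 10; // stands in for the value read from the database
        var poller = new ChangePoller(() => fakeMaxId);
        poller.Changed += id => Console.WriteLine("New max id: " + id);

        poller.CheckOnce();  // nothing changed, no event
        fakeMaxId = 11;      // simulate an inserted row
        poller.CheckOnce();  // prints "New max id: 11"
    }
}
```

A timer's Elapsed handler would then only need to call `CheckOnce()`, and the listening class subscribes to `Changed`.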
Latency
Also ask yourself if you really need 1ms latency on change updates. If you do, a different approach entirely might be better. For example you may need to use a notification mechanism by the data writers to the parts of your application that need to know an update has occurred right now.
I know I'm having a massive derp moment here and this is probably quite easy to actually do - I have had a search around and read a few articles but i'm still struggling a little, so any feedback or pointers to useful resources would be greatly appreciated!
Anyway I have a class called PopulateDatagridViews which I have various functions in, one of which is called ExecuteSqlStatement, this function is simple enough, it initializes an SQL connection and returns a DataTable populated with the results of the SQL query. Within the same class I also have various functions that use string builders to build up SQL statements. (Not ideal, I know.)
I create a PopulateDatagridViews object in my GUI thread and use it to populate various DataGridViews with the returned DataTables. For example:
dataGridViewVar.DataSource = populateDgv.GetCustomers();
Naturally, a problem I'm having is that the more data there is to read from the database, the longer the UI is unresponsive. I would like to shift the process of retrieving data via PopulateDatagridViews to a separate thread or BackgroundWorker so as to prevent the main GUI thread from locking up whilst this is processed.
I realise I can create a BackgroundWorker to do this and place in the DoWork handler a call to the appropriate function within my PopulateDatagridViews.
I figure I could create a BackgroundWorker for each individual function inside my PopulateDatagridViews class, but surely there is a more efficient way to do this? I'd very much appreciate a point in the right direction on this as it's driving me around the bend!
Additional Info: I use version 4.0 of the .Net framework.
I strongly suggest that you use the TPL (Task Parallel Library): http://msdn.microsoft.com/en-us/library/dd537609.aspx
In your case you would create a first task to pull the data, and then start a second task, after the first has completed, to update the UI.
I'll try to find the code that I wrote for a similar problem.
Edit: Adding code
Task<return_type> t1 = new Task<return_type>(() =>
{
    // do the work that produces a result
    return some_result; // return it
});
t1.Start();
// ContinueWith guarantees that t2 starts AFTER t1 has finished.
Task t2 = t1.ContinueWith((some_arg_that_represent_previous_task_obj) =>
{
    // Update your GUI here.
    // If you need the result from the previous task: some_arg_that_represent_previous_task_obj.Result (your dataset or whatever)
},
// VERY important: you must update the GUI from the same thread that created it.
// You will get a cross-thread exception if you don't pass TaskScheduler.FromCurrentSynchronizationContext().
TaskScheduler.FromCurrentSynchronizationContext());
Hope it helps.
Well in that case I recommend reading this msdn article to get some ideas. Afterwards you should look for some tutorials, because the msdn is not the best source to learn things. ;o)
I've been programming console apps for 1 year and I think it's time to start something with forms. I don't really know how to make 2 loops work at the same time.
Could anyone help me and give me an example of 2 loops working together (one counting from 1 to 100 and the second counting from 100 to 200, both at the same time, in, let's say, 2 message boxes)? I've been looking for something like that on the net, but without success.
I'd also like to know if infinite while loops have to be written like while (5>2) or if there's a better way to do that.
Thanks in advance !
I don't really know how to make 2 loops work at the same time.
This is a simple question with an enormous answer, but I'll try to break it down for you.
The problem you're describing at its basic level is "I have two different hunks of code that both interact with the user in some way. I would like to give the user the impression that both hunks of code are running at the same time, smoothly responding to user input."
Obviously the easiest way to do that is to write two programs. That is, make the operating system solve the problem. The operating system somehow manages to have dozens of different processes running "at the same time", all interacting smoothly (we hope) with the user.
But having two processes imposes a high cost. Processes are heavyweight, and it is expensive for the two hunks of code to talk to each other. Suppose you therefore want to have the two hunks of code in the same program. Now what do you do?
One way is to put the two hunks of code each on their own thread within the same process. This seems like a good idea, but it creates a lot of problems of its own. Now you have to worry about thread safety and deadlocks and all of that. And, unfortunately, only one thread is allowed to communicate with the user. Every forms application has a "UI" thread. If you have two "worker" threads running your hunks of code, they have to use cross-thread communication to communicate with the UI thread.
Another way is to break up each hunk of code into tiny little pieces, and then schedule all the pieces to run in order, on the UI thread. The scheduler can give priority to user interaction, and any particular tiny piece of work is not going to block and make the UI thread unresponsive.
It is this last technique that I would suggest you explore. We are doing a lot of work in C# 5 to make it easier to write programs in this style.
See http://msdn.microsoft.com/en-us/async for more information about this new feature.
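For the two counters in the question, that style might look roughly like the sketch below. It assumes C# 5 with .NET 4.5 (for async/await and Task.Delay); `CounterForm`, `CountAsync` and the label layout are invented for the illustration. Both loops run on the UI thread, and each await hands the thread back so the other loop and the user's input get a turn.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

public class CounterForm : Form
{
    private readonly Label _first = new Label { Top = 10, Left = 10 };
    private readonly Label _second = new Label { Top = 40, Left = 10 };

    public CounterForm()
    {
        Controls.Add(_first);
        Controls.Add(_second);

        Shown += async (sender, e) =>
        {
            // Start both loops; neither blocks the UI thread.
            Task a = CountAsync(1, 100, i => _first.Text = i.ToString(), 20);
            Task b = CountAsync(100, 200, i => _second.Text = i.ToString(), 20);
            await Task.WhenAll(a, b);
        };
    }

    // Counts from 'from' to 'to', reporting each value and pausing between steps.
    public static async Task CountAsync(int from, int to, Action<int> report, int delayMs)
    {
        for (int i = from; i <= to; i++)
        {
            report(i);
            await Task.Delay(delayMs); // yields; the loop resumes after the delay
        }
    }

    [STAThread]
    public static void Main()
    {
        Application.Run(new CounterForm());
    }
}
```

Because the continuations resume on the UI thread when started from the form, the labels can be updated directly, without any cross-thread calls.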
Not sure if this is what you mean about the two loops.
An infinite loop is any loop of the form while (expression) where the expression always evaluates to true, as 5>2 does, and where nothing inside the loop terminates it, i.e. there is no return; or break;. The usual way to write one is while (true).
Drop two labels on the form in Designer view. And then add this in Code view:
public Form1()
{
    InitializeComponent();
    Shown += new EventHandler(Form1_Shown);
}
void Form1_Shown(object sender, EventArgs e)
{
    for (int i = 1; i <= 100; i++)
    {
        label1.Text = i.ToString();
        // "Second loop"
        label2.Text = (i + 100).ToString();
        Update();
        System.Threading.Thread.Sleep(10);
    }
}
You'll get two numbers counting simultaneously. One from 1-100. The other from 101-200.
This?
for (int i = 1; i <= 100; i++)
{
    //..
    for (int i2 = 100; i2 <= 200; i2++)
    {
        //..
    }
}