This question has probably been asked in various ways before, but here is what I want to do. I am going to have a Windows form with many tabs. Each tab will contain a grid object. For each tab/grid that the user creates, I would like to spawn off a dedicated thread to populate the contents of that grid with constantly arriving information. Could anyone provide an example of how to do this safely?
Thanks.
Inside the initialization for the tab (assuming WinForms until I see otherwise):
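Thread newThread = new Thread(() =>
{
// Get your data (this runs on the background thread)
// Marshal the grid update back to the UI thread that owns the control
dataGridView1.Invoke(new Action(() => { /* add data to the grid here */ }));
});
newThread.IsBackground = true; // don't keep the process alive after the form closes
newThread.Start();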
That is obviously the simplest example. You could also queue the work to the ThreadPool (which is more commonly done in server-side applications).
If you're using .NET 4.0, the Task Parallel Library (TPL) could help as well.
There are two basic approaches you can use. Choose the one that makes the most sense in your situation. Often there is no right or wrong choice; both can work equally well in many situations, and each has its own advantages and disadvantages. Oddly, the community seems to overlook the pull method too often, and I am not really sure why. I recently stumbled upon a question in which everyone recommended the push approach despite it being the perfect situation for the pull method (one poor soul did go against the herd, got downvoted, and eventually deleted his answer, leaving me as the lone dissenter).
Push Method
Have the worker thread push the data to the form. You will need to use the ISynchronizeInvoke.Invoke method to accomplish this. The advantage here is that as each data item arrives it will immediately be added to the grid. The disadvantage is that you have to use an expensive marshaling operation and the UI could bog down if the worker thread acquires the data too fast.
void WorkerThread()
{
while (true)
{
object data = GetNewData();
yourForm.Invoke(
(Action)(() =>
{
// Add data to your grid here.
}));
}
}
Pull Method
Have the UI thread pull the data from the worker thread. The worker thread enqueues new data items into a shared queue, and the UI thread dequeues the items periodically. The advantage here is that you can throttle the amount of work each thread performs independently. The queue is your buffer that will shrink and grow as CPU usage ebbs and flows. It also decouples the logic of the worker thread from the UI thread. The disadvantage is that if your UI thread does not poll fast enough or cannot keep up, the worker thread could overrun the queue. And, of course, the data items would not appear on your grid in real time. However, if you set the System.Windows.Forms.Timer interval short enough, that might not be an issue for you.
private Queue<object> m_Data = new Queue<object>();
private void YourTimer_Tick(object sender, EventArgs args)
{
lock (m_Data)
{
while (m_Data.Count > 0)
{
object data = m_Data.Dequeue();
// Add data to your grid here.
}
}
}
void WorkerThread()
{
while (true)
{
object data = GetNewData();
lock (m_Data)
{
m_Data.Enqueue(data);
}
}
}
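For comparison, here is a minimal sketch of the same pull idea using ConcurrentQueue<T> (.NET 4.0+) instead of Queue<T> plus lock. The class name PullBuffer and the per-tick cap maxPerTick are illustrative assumptions, not part of the answer above; capping each drain is one way to keep a single Tick from stalling the UI when the worker has built up a large backlog.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical helper for the pull method: the worker thread enqueues,
// the UI timer drains at most maxPerTick items per Tick.
public sealed class PullBuffer<T>
{
    private readonly ConcurrentQueue<T> _pending = new ConcurrentQueue<T>();

    // Called from the worker thread; ConcurrentQueue needs no explicit lock.
    public void Enqueue(T item) => _pending.Enqueue(item);

    // Called from the UI thread (e.g. a System.Windows.Forms.Timer Tick handler).
    // Returns the items to add to the grid during this tick, oldest first.
    public List<T> Drain(int maxPerTick)
    {
        var batch = new List<T>();
        while (batch.Count < maxPerTick && _pending.TryDequeue(out T item))
        {
            batch.Add(item);
        }
        return batch;
    }

    public int PendingCount => _pending.Count;
}
```

In a Tick handler you would loop over `buffer.Drain(500)` and add each item to the grid; the worker thread only ever calls `Enqueue`.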
You should keep a list of threads so that you can control them:
List<Thread> tabs = new List<Thread>();
...
To add a new one:
tabs.Add(new Thread(new ThreadStart(TabRefreshHandler)));
//Now start it:
tabs[tabs.Count - 1].Start();
Finally, in TabRefreshHandler, check which entry of tabs is the calling thread (compare against Thread.CurrentThread); its index tells you which tab should be refreshed!
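As an alternative to working out which thread is calling, the tab's identity can be captured by the thread's lambda, so each thread already knows which tab it serves. This is a hypothetical sketch: TabWorkers, refresh and intervalMs are illustrative names, and refresh stands in for fetching data and marshaling it to that tab's grid (for example with grid.BeginInvoke).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: one dedicated background thread per tab.
// The tab key is captured by the lambda, so no thread-ID lookup is needed.
public sealed class TabWorkers
{
    private readonly Dictionary<string, Thread> _threads = new Dictionary<string, Thread>();
    private volatile bool _stopping;

    public void AddTab(string tabKey, Action<string> refresh, int intervalMs)
    {
        var thread = new Thread(() =>
        {
            while (!_stopping)
            {
                refresh(tabKey);          // tabKey identifies the tab this thread serves
                Thread.Sleep(intervalMs);
            }
        });
        thread.IsBackground = true;       // don't keep the process alive on exit
        thread.Name = "TabWorker:" + tabKey;
        lock (_threads) { _threads.Add(tabKey, thread); }
        thread.Start();
    }

    // Asks every tab thread to stop and waits for each to finish its current pass.
    public void StopAll()
    {
        _stopping = true;
        List<Thread> snapshot;
        lock (_threads) { snapshot = new List<Thread>(_threads.Values); }
        foreach (var t in snapshot) t.Join();
    }
}
```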
I have a main form called ProxyTesterForm, which has a child form ProxyScraperForm. When ProxyScraperForm scrapes a new proxy, ProxyTesterForm handles the event by testing the scraped proxy asynchronously, and after testing adds the proxy to a BindingList which is the datasource of a DataGridView.
Because I am adding to a data-bound list that was created on the UI thread, I am calling BeginInvoke on the DataGridView so that the update happens on the appropriate thread.
Without the BeginInvoke call in the method posted below, I can drag the form around the screen during processing and it moves smoothly without stuttering. With the BeginInvoke call, it stutters.
I have a few ideas on how to fix it, but wanted to hear from people smarter than me here on SO so that I solve this properly:
Use a SemaphoreSlim to limit the number of simultaneous updates.
Add asynchronously processed items to a list outside the scope of the method posted below, and iterate over that list in a Timer_Tick event handler every second, calling BeginInvoke for each item, then clear the list; wash, rinse, repeat until the job is done.
Give up the convenience of data binding and switch to virtual mode.
Anything else someone might suggest.
private void Site_ProxyScraped(object sender, Proxy proxy)
{
Task.Run(async () =>
{
proxy.IsValid = await proxy.TestValidityAsync(judges[0]);
proxiesDataGridView.BeginInvoke(new Action(() => { proxies.Add(proxy); }));
});
}
In Windows, every thread that has UI has a message queue. This queue is used to deliver UI messages to the windows owned by that thread; those messages include things like mouse moves, mouse button up/down, etc.
Somewhere in every UI framework there is a loop that reads a message from the queue, processes it, and then waits for the next message.
Some messages are lower priority; for example, the mouse-move message is generated only when the thread is ready to process it (because the mouse tends to move a lot).
BeginInvoke also uses this mechanism: it posts a message telling the loop there's code it needs to run.
What you are doing is flooding the queue with your BeginInvoke messages and not letting it handle UI events.
The standard solution is to limit the number of BeginInvoke calls: for example, collect all the items you need to add and use one BeginInvoke call to add them all.
Or add them in batches: if you make just one BeginInvoke call per second for all the objects found in that second, you probably won't affect UI responsiveness and the user won't be able to tell the difference.
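A minimal sketch of the batching idea, assuming items are produced on background threads: they are collected under a lock, and a System.Threading.Timer hands everything gathered since the last tick to a single flush call. Batcher and its members are illustrative names; in the question's code, flush would make one proxiesDataGridView.BeginInvoke call that adds the whole batch to the BindingList, instead of one call per proxy.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical batcher: background code calls Add() per item; the timer delivers
// all pending items to flush in ONE call per interval.
public sealed class Batcher<T> : IDisposable
{
    private readonly object _gate = new object();
    private readonly Action<IReadOnlyList<T>> _flush;
    private readonly Timer _timer;
    private List<T> _buffer = new List<T>();

    public Batcher(Action<IReadOnlyList<T>> flush, TimeSpan interval)
    {
        _flush = flush;
        _timer = new Timer(_ => Flush(), null, interval, interval);
    }

    public void Add(T item)
    {
        lock (_gate) { _buffer.Add(item); }
    }

    // Swap the buffer under the lock, then call flush outside it,
    // so producers are never blocked while the UI update is scheduled.
    public void Flush()
    {
        List<T> batch;
        lock (_gate)
        {
            if (_buffer.Count == 0) return;
            batch = _buffer;
            _buffer = new List<T>();
        }
        _flush(batch);
    }

    public void Dispose()
    {
        using (var stopped = new ManualResetEvent(false))
        {
            // Dispose(WaitHandle) signals once any in-flight timer callback has finished.
            if (_timer.Dispose(stopped)) stopped.WaitOne();
        }
        Flush(); // deliver anything still pending
    }
}
```

Swapping the list rather than copying it keeps the time spent holding the lock constant, regardless of batch size.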
Note: For the actual explanation of why this is happening, see Nir's answer. This is only meant to overcome some problems and give some direction. It's not flawless, but it follows the conversation in the comments.
A quick prototype that adds some separation of layers (minimal attempt):
//member field which contains all the actual data (guarded by _proxiesLock)
List<Proxy> _proxies = new List<Proxy>();
readonly object _proxiesLock = new object();
//this is some trigger: it might be the Elapsed event of a timer or something similar
private void OnSomeTimerOrOtherTrigger()
{
UIupdate();
}
//just a helper function
private void UIupdate()
{
List<Proxy> local;
lock (_proxiesLock)
{
local = _proxies.ToList(); //take a snapshot so the UI works on a stable copy
}
proxiesDataGridView.BeginInvoke(new Action(() =>
{
//some way to add *new ones* to the UI
//perform actions on the local copy
}));
}
private void Site_ProxyScraped(object sender, Proxy proxy)
{
Task.Run(async () =>
{
proxy.IsValid = await proxy.TestValidityAsync(judges[0]);
//add to list; List<T> is not thread-safe, so lock around it
lock (_proxiesLock)
{
_proxies.Add(proxy);
}
});
}
I have a Window in WPF and user can start very long operations on it. User must be able to cancel those operations.
All of my operations are in separate threads. So my question is:
Can I terminate all threads that are started from that Window, without killing UI thread obviously, at any time?
In places where I need to do long operations, threads were created and started like this:
Thread thread =
new Thread(
new ThreadStart(
delegate
{...}));
thread.Start();
How can I pass that object to it? Is it possible? If it matters at all, I do not care about closing the threads gracefully; they can be killed and it would still be a solution. Is the window object aware of the threads for which it is the parent?
Thank you in advance.
Typically you won't want to create and destroy threads yourself. Creating a new Thread every time you need one carries much more overhead than using the thread pool or Tasks (this applies, as noted, when you need to create a significant number of threads over the lifetime of your process).
The preferred approach (especially if you're using .NET 4.0, or even better 4.5) is to use Tasks.
There is plenty of documentation on how to use Tasks and how to cancel them; xxbbcc posted a link in a comment on your question.
However, if you still think that dealing with Threads is your best choice, you could keep track of all the threads. Then whenever you (as a developer) or your user decide to kill them, you can iterate through the threads and call Abort() on each one.
public class MyExampleClass
{
private List<Thread> MyThreads { get; set; }
public MyExampleClass()
{
MyThreads = new List<Thread>();
InstanciateThreadsWithSomeSuperImportantOperations();
}
private void InstanciateThreadsWithSomeSuperImportantOperations()
{
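// Thread has no parameterless constructor, so the work is passed in as a delegate
var thread = new Thread(() =>
{
// some code here
});
MyThreads.Add(thread);
thread.Start();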
}
public void KillAllThreads()
{
foreach (var t in MyThreads)
{
if (t.IsAlive)
t.Abort(); // Note this isn't guaranteed to stop the thread, and on .NET Core / .NET 5+ Thread.Abort throws PlatformNotSupportedException.
}
}
}
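Although the question says the threads may simply be killed, cooperative cancellation with a CancellationTokenSource is usually safer than Abort(). The sketch below assumes .NET 4.5 (for Task.Run); OperationHost and LongOperation are hypothetical names, and each operation must check the token itself for cancellation to take effect.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch: the window owns one CancellationTokenSource; every long
// operation it starts receives the token; a Cancel button calls CancelAll().
public sealed class OperationHost
{
    private CancellationTokenSource _cts = new CancellationTokenSource();
    private readonly List<Task> _running = new List<Task>();

    public Task Start(Action<CancellationToken> work)
    {
        CancellationToken token = _cts.Token;
        Task task = Task.Run(() => work(token), token);
        lock (_running) { _running.Add(task); }
        return task;
    }

    // Signals every operation started so far, then waits for them to wind down.
    public void CancelAll()
    {
        _cts.Cancel();
        Task[] tasks;
        lock (_running) { tasks = _running.ToArray(); _running.Clear(); }
        try { Task.WaitAll(tasks); }
        catch (AggregateException ex)
        {
            // Tasks that stopped via the token end up Canceled; anything else is rethrown.
            ex.Handle(e => e is OperationCanceledException);
        }
        _cts.Dispose();
        _cts = new CancellationTokenSource(); // ready for new operations
    }

    // Hypothetical long operation that checks the token at safe points.
    public static void LongOperation(CancellationToken token)
    {
        for (int step = 0; step < 1000; step++)
        {
            token.ThrowIfCancellationRequested();
            Thread.Sleep(10); // stand-in for one unit of real work
        }
    }
}
```

CancelAll waits for the tasks, which briefly blocks the calling thread; in a WPF window you might prefer to call Cancel and let the tasks finish on their own.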
I'm currently developing a system in C# / WPF which accesses an SQL database, retrieves some data (around 10000 items) and then should update a collection of data points that is used as data for a WPF chart I'm using in my application (Visifire charting solution, in case anyone was wondering).
When I wrote the straight-forward, single-threaded solution, the system would, as you might expect, hang for the period of time it took the application to query the database, retrieve the data and render the charts. However, I wanted to make this task quicker by adding a wait animation to the user while the data was being fetched and processed using multithreading. However, two problems arise:
I'm having trouble updating my collections and keeping them synchronized when using multithreading. I'm not very familiar with the Dispatcher class, so I'm not very sure what to do.
Since I'm obviously not handling the multi-threading very well, the wait animation won't show up (since the UI is frozen).
I'm trying to figure out if there's a good way to use multi-threading effectively with collections. Microsoft provides thread-safe collections, but none of them seems to fit my needs.
Also, if anyone has a good reference for learning and understanding the Dispatcher, I would highly appreciate it.
EDIT:
Here's a code snippet of what I'm trying to do, maybe it can shed some more light on my question:
private List<DataPoint> InitializeDataSeries(RecentlyPrintedItemViewModel item)
{
var localDataPoints = new List<DataPoint>();
// Stopping condition for recursion - if we've hit a childless (roll) item
if (item.Children.Count == 0)
{
// Populate DataPoints and return it as one DataSeries
_dataPoints.AddRange(InitializeDataPoints(item));
}
else
{
// Iterate through all children and activate this function on them (recursion)
var datapointsCollection = new List<DataPoint>();
Parallel.ForEach(item.Children, child => datapointsCollection = (InitializeDataSeries((RecentlyPrintedItemViewModel)child)));
foreach (var child in item.Children)
{
localDataPoints.AddRange(InitializeDataSeries((RecentlyPrintedItemViewModel)child));
}
}
RaisePropertyChanged("DataPoints");
AreDataPointsInitialized = true;
return localDataPoints;
}
Thanks
The Dispatcher is an object used to manage multiple queues of work items on a single thread; each queue has a different priority that determines when its work items execute.
The Dispatcher usually references WPF's main application thread, and is used to schedule code at different DispatcherPriorities so they run in a specific order.
For example, suppose you want to show a loading graphic, load some data, then hide the graphic.
IsLoading = true;
LoadData();
IsLoading = false;
If you do this all at once, it will lock up your application and you won't ever see the loading graphic. This is because all the code runs by default in the DispatcherPriority.Normal queue, so by the time it's finished running the loading graphic will be hidden again.
Instead, you could use the Dispatcher to load the data and hide the graphic at a lower dispatcher priority than DispatcherPriority.Render, such as DispatcherPriority.Background, so all tasks in the other queues get completed before the loading occurs, including rendering the loading graphic.
IsLoading = true;
Dispatcher.BeginInvoke(DispatcherPriority.Background,
new Action(delegate() {
LoadData();
IsLoading = false;
}));
But this still isn't ideal because the Dispatcher references the single UI thread of the application, so you will still be locking up the thread while your long running process occurs.
A better solution is to use a separate thread for your long running process. My personal preference is to use the Task Parallel Library because it's simple and easy to use.
IsLoading = true;
Task.Factory.StartNew(() =>
{
LoadData();
IsLoading = false;
});
But this can still give you problems because WPF objects can only be modified from the thread that created them.
So if you create an ObservableCollection<DataItem> on a background thread, you cannot modify that collection from anywhere in your code other than that background thread.
The typical solution is to obtain your data on a background thread and return it to the main thread in a temp variable, and have the main UI thread create the object and fill it with data obtained from the background thread.
So your code often ends up looking something like this:
IsLoading = true;
Task.Factory.StartNew(() =>
{
// run long process and return results in temp variable
return LoadData();
})
.ContinueWith((t) =>
{
// this block runs once the background code finishes; the scheduler argument
// (captured here on the UI thread) makes it run back on the UI thread
// update with results from temp variable
UpdateData(t.Result);
// reset loading flag
IsLoading = false;
}, TaskScheduler.FromCurrentSynchronizationContext());
Main question: how can I run the code within TestingButton_Click on several threads in the background (similar to BackgroundWorker) so that I am able to:
1. Get all the raw data to the methods
2. Cancel test for all threads simultaneously
3. Report progress
4. Retrieve all the result tables to main thread.
The following code is within TestingButton_Click
List<Thread> threads = new List<Thread>();
//Testing for each pair
foreach (InterfaceWithClassName aCompound in Group1)
{
foreach (InterfaceWithClassName bCompound in Group2)
{
InstancePair pair = new InstancePair();
//some code
if (testModeParallel)
{
Thread thr = new Thread(TestPairParallel);
thr.Start(pair);
threads.Add(thr);
}
else
{
Thread thr = new Thread(TestPairSerial);
thr.Start(pair);
threads.Add(thr);
}
}
}
while (true)
{
int i = 0;
foreach (Thread thread in threads)
{
if (thread.IsAlive)
break;
i++;
}
if (i == threads.Count)
break;
Thread.Sleep(1000);
}
pairsResultsDataGrid.ItemsSource = tab.DefaultView;
The user can choose which compounds to test, so the number of pairs to test differs every time.
I made two different methods, TestPairSerial() and TestPairParallel(), just in case.
TestPairSerial() structure is
do
{
do
{
} while (isSetbCompaundParams);
} while (isSetaCompaundParams);
//filling up results into tables (main window variables) later to be connected to DataGrids
TestPairParallel() is implemented with InfinitePartitioner and using similar structure only with Parallel.ForEach(new InfinitePartitioner(),...
Thank you for your help.
Use .NET 4.0 Tasks instead of creating new Threads yourself. Tasks give you finer granularity of control, make it easy to pass data into the background operation, and provide excellent support for waiting for results across multiple concurrent tasks and for cancellation of everything in one fell swoop if needed. Highly recommended.
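A sketch of that suggestion applied to the question's nested loops: one Task per (a, b) pair, with input passed through the closure, a shared CancellationToken for stopping everything, and Task.WhenAll for gathering results. It assumes .NET 4.5 (Task.Run, Task.WhenAll; on 4.0, Task.Factory.StartNew and Task.Factory.ContinueWhenAll play the same roles), and PairResult, PairTester and TestPair are hypothetical stand-ins for the question's TestPairSerial and its result tables.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical result type standing in for one row of the question's result tables.
public sealed class PairResult
{
    public string A { get; }
    public string B { get; }
    public int Score { get; }
    public PairResult(string a, string b, int score) { A = a; B = b; Score = score; }
}

public static class PairTester
{
    // Hypothetical per-pair test; a real one would check the token inside its loops.
    public static PairResult TestPair(string a, string b, CancellationToken token)
    {
        token.ThrowIfCancellationRequested();
        return new PairResult(a, b, a.Length * b.Length); // placeholder computation
    }

    // One task per pair; the result array keeps the same order as the pairs.
    public static Task<PairResult[]> TestAllAsync(
        IEnumerable<string> group1, IEnumerable<string> group2, CancellationToken token)
    {
        List<Task<PairResult>> tasks =
            (from a in group1
             from b in group2
             select Task.Run(() => TestPair(a, b, token), token)).ToList();
        // Completes when every pair is done; is canceled or faulted if any pair is.
        return Task.WhenAll(tasks);
    }
}
```

From the button handler you could await TestAllAsync (or attach a continuation on the UI scheduler) and build the DataTable from the returned array, instead of polling IsAlive in a Sleep loop on the UI thread.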
How to run the code within TestingButton_Click on several threads in background.
I would use Tasks, as they were designed to do exactly what you want.
The only other question I will answer until you get closer to the actual solution is the following:
Report progress
There are lots of ways to report progress from a given thread: you would have to subscribe to an event and write code that reports the thread's progress. Updating a control on the form requires you to Invoke the change, which is not trivial.
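One way to avoid writing the Invoke plumbing by hand (assuming .NET 4.5) is IProgress<T> with Progress<T>: a Progress<T> created on the UI thread captures that thread's SynchronizationContext and runs its handler there. ProgressDemo and RunSteps are illustrative names in this sketch.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical worker that reports percent complete through IProgress<int>.
// If the Progress<int> passed in was created on the UI thread, its handler runs
// on the UI thread, so it can touch controls without an explicit Invoke.
public static class ProgressDemo
{
    public static Task RunSteps(int totalSteps, IProgress<int> progress, CancellationToken token)
    {
        return Task.Run(() =>
        {
            for (int i = 1; i <= totalSteps; i++)
            {
                token.ThrowIfCancellationRequested();
                // ... one unit of work ...
                progress?.Report(i * 100 / totalSteps); // percent complete
            }
        }, token);
    }
}
```

For example, `var progress = new Progress<int>(p => progressBar.Value = p);` created in the button handler, where progressBar is a hypothetical ProgressBar on the form.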
I post a lot here about multithreading, and the great Stack Overflow community has helped me a lot in understanding it.
All the examples I have seen online only deal with one thread.
My application is a scraper for an insurance company (a family company ... all free of charge). Anyway, the user is able to select how many threads they want to run. So let's say, for example, the user wants the application to scrape 5 sites at one time, and then later in the day he chooses 20 threads because his computer isn't doing anything else and has the resources to spare.
Basically, the application builds a list of, say, 1000 sites to scrape. A thread goes off, builds the list, and updates the UI.
When that's finished, another thread is started to do the scraping. Depending on the number of threads the user has chosen, it will create x threads.
What's the best way to create these threads? Should I create 1000 threads in a list and loop through them, so that if the user has set 5 threads, it loops through 5 at a time?
I understand threading, but it's the application logic which is catching me out.
Any ideas or resources on the web that can help me out?
You could consider using a thread pool for that:
using System;
using System.Threading;
public class Example
{
public static void Main()
{
ThreadPool.SetMaxThreads(100, 10);
// Queue the task.
ThreadPool.QueueUserWorkItem(new WaitCallback(ThreadProc));
Console.WriteLine("Main thread does some work, then sleeps.");
Thread.Sleep(1000);
Console.WriteLine("Main thread exits.");
}
// This thread procedure performs the task.
static void ThreadProc(Object stateInfo)
{
Console.WriteLine("Hello from the thread pool.");
}
}
Does this scraper use a lot of CPU when it's running?
If it does a lot of communication with these 1000 remote sites, downloading their pages, that may take more time than the actual analysis of the pages.
And how many CPU cores does your user have? If they have 2 (which is common these days), then beyond two simultaneous threads performing analysis they aren't going to see any speedup.
So you probably need to parallelize the downloading of the pages; I doubt you need to do the same for the analysis.
Take a look at asynchronous I/O instead of explicit multi-threading. It lets you launch a bunch of downloads in parallel and then get called back when each one completes.
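A minimal sketch of that advice, assuming .NET 4.5 async/await: every download is started asynchronously, and a SemaphoreSlim caps how many are in flight at once at the number the user picked. ThrottledDownloader and the injected download delegate are assumptions made so the throttling logic is testable on its own; in a real scraper, download could be something like `url => httpClient.GetStringAsync(url)`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: async downloads with a user-chosen concurrency limit.
public static class ThrottledDownloader
{
    public static async Task<string[]> DownloadAllAsync(
        IEnumerable<string> urls, Func<string, Task<string>> download, int maxConcurrent)
    {
        using (var gate = new SemaphoreSlim(maxConcurrent))
        {
            IEnumerable<Task<string>> tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();          // at most maxConcurrent downloads in flight
                try { return await download(url); }
                finally { gate.Release(); }
            });
            // No thread sits blocked while waiting on the network; results keep URL order.
            return await Task.WhenAll(tasks);
        }
    }
}
```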
If you really just want the application, use something someone else already spent time developing and perfecting:
http://arachnode.net/
arachnode.net is a complete and comprehensive .NET web crawler for downloading, indexing and storing Internet content including e-mail addresses, files, hyperlinks, images, and Web pages.
Whether interested or involved in screen scraping, data mining, text mining, research or any other application where a high-performance crawling application is key to the success of your endeavors, arachnode.net provides the solution you need for success.
If you also want to write one yourself because it's a fun thing to write (I wrote one not long ago, and yes, it is a lot of fun), you can refer to this PDF provided by arachnode.net, which explains in detail the theory behind a good web crawler:
http://arachnode.net/media/Default.aspx?Sort=Downloads&PageIndex=1
Download the PDF entitled "Crawling the Web" (second link from the top) and scroll to Section 2.6, "Multi-threaded Crawlers". That's what I used to build my crawler, and I think it works quite well.
I think this example is basically what you need.
public class WebScraper
{
private readonly int totalThreads;
private readonly List<System.Threading.Thread> threads;
private readonly List<Exception> exceptions;
private readonly object locker = new object();
private volatile bool stop;
public WebScraper(int totalThreads)
{
this.totalThreads = totalThreads;
threads = new List<System.Threading.Thread>(totalThreads);
exceptions = new List<Exception>();
for (int i = 0; i < totalThreads; i++)
{
var thread = new System.Threading.Thread(Execute);
thread.IsBackground = true;
threads.Add(thread);
}
}
public void Start()
{
foreach (var thread in threads)
{
thread.Start();
}
}
public void Stop()
{
stop = true;
foreach (var thread in threads)
{
if (thread.IsAlive)
{
thread.Join();
}
}
}
private void Execute()
{
try
{
while (!stop)
{
// Scrape away!
}
}
catch (Exception ex)
{
lock (locker)
{
// You could have a thread checking this collection and
// reporting it as you see fit.
exceptions.Add(ex);
}
}
}
}
The basic logic is:
You have a single queue into which you put the URLs to scrape. Then you create your threads, all of which have access to that queue object. Let each thread run a loop:
1. Lock the queue.
2. Check whether there are items in the queue; if not, unlock the queue and end the thread.
3. Dequeue the first item in the queue.
4. Unlock the queue.
5. Process the item.
6. Invoke an event that updates the UI (remember to lock the UI controller).
7. Return to step 1.
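The steps above can be sketched with a Queue<string> guarded by lock. QueueScraper and process are hypothetical names; process stands in for scraping a site and raising the UI-update event (which must marshal to the UI thread itself).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical sketch: threadCount workers share one queue and pull URLs until it is empty.
public static class QueueScraper
{
    public static void Run(IEnumerable<string> urls, int threadCount, Action<string> process)
    {
        var queue = new Queue<string>(urls);
        var threads = new List<Thread>();
        for (int i = 0; i < threadCount; i++)
        {
            var t = new Thread(() =>
            {
                while (true)
                {
                    string url;
                    lock (queue)                          // 1. lock the queue
                    {
                        if (queue.Count == 0) return;     // 2. empty: leaving the lock unlocks it; thread ends
                        url = queue.Dequeue();            // 3. dequeue the first item
                    }                                     // 4. unlock
                    process(url);                         // 5-6. process it and notify the UI
                }                                         // 7. back to step 1
            });
            t.IsBackground = true;
            threads.Add(t);
            t.Start();
        }
        foreach (var t in threads) t.Join();              // wait until every worker has finished
    }
}
```

Because Run joins the workers, call it from a background thread rather than from the UI thread.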
Just let the threads do the "get stuff from the queue" part (pulling the jobs) instead of handing them the URLs (pushing the jobs); that way you just say
YourThreadManager.StartThreads(numberOfThreadsTheUserWants);
and everything else happens automagically. See the other replies to find out how to create and manage the threads.
I solved a similar problem by creating a worker class that uses a callback to signal the main app that a worker is done. I create a queue of 1000 threads and then call a method that launches threads until the running-thread limit is reached, keeping track of the active threads in a dictionary keyed by each thread's ManagedThreadId. As each thread completes, the callback removes its thread from the dictionary and calls the thread launcher.
If a connection is dropped or times out, the callback reinserts the thread into the queue. Lock around both the queue and the dictionary. I create threads rather than using the thread pool because the overhead of creating a thread is insignificant compared to the connection time, and it lets me have many more threads in flight. The callback also provides a convenient place to update the user interface, and even lets you change the thread limit while it's running. I've had over 50 open connections at one time. Remember to increase the maxconnection setting (in the connectionManagement section) of your app.config (the default is two).
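A rough sketch of that design, under stated assumptions: jobs are URL strings, fetch is a hypothetical delegate that returns false when a connection drops or times out, and ThreadLauncher is an illustrative name. The completion callback removes the finished thread from the dictionary keyed by ManagedThreadId, re-queues failed jobs, and launches more threads. If fetch throws, this sketch never removes that thread, so real code needs error handling there.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical callback-driven launcher with a changeable running-thread limit.
public sealed class ThreadLauncher
{
    private readonly object _gate = new object();
    private readonly Queue<string> _pending;
    private readonly Dictionary<int, Thread> _active = new Dictionary<int, Thread>();
    private readonly Func<string, bool> _fetch;
    private readonly ManualResetEventSlim _allDone = new ManualResetEventSlim(false);
    private int _maxRunning;

    public ThreadLauncher(IEnumerable<string> jobs, Func<string, bool> fetch, int maxRunning)
    {
        _pending = new Queue<string>(jobs);
        _fetch = fetch;
        _maxRunning = maxRunning;
    }

    // The limit can be changed while jobs are running.
    public int MaxRunning
    {
        get { lock (_gate) return _maxRunning; }
        set { lock (_gate) _maxRunning = value; LaunchMore(); }
    }

    public void Start() => LaunchMore();
    public bool Wait(int timeoutMs) => _allDone.Wait(timeoutMs);

    private void LaunchMore()
    {
        lock (_gate)
        {
            while (_active.Count < _maxRunning && _pending.Count > 0)
            {
                string job = _pending.Dequeue();
                var thread = new Thread(() => Run(job));
                thread.IsBackground = true;
                _active.Add(thread.ManagedThreadId, thread);
                thread.Start();
            }
            if (_active.Count == 0 && _pending.Count == 0) _allDone.Set();
        }
    }

    // Runs on the worker thread; OnCompleted plays the role of the callback.
    private void Run(string job) => OnCompleted(job, _fetch(job));

    private void OnCompleted(string job, bool succeeded)
    {
        lock (_gate)
        {
            _active.Remove(Thread.CurrentThread.ManagedThreadId);
            if (!succeeded) _pending.Enqueue(job); // dropped or timed out: queue it again
        }
        LaunchMore(); // a UI update could also be raised here
    }
}
```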
I would use a queue and a condition variable and mutex, and start just the requested number of threads, for example, 5 or 20 (and not start 1,000).
Each thread blocks on the condition variable. When woken up, it dequeues the first item, unlocks the queue, works with the item, locks the queue and checks for more items. If the queue is empty, sleep on the condition variable. If not, unlock, work, repeat.
While the mutex is locked, it can also check if the user has requested the count of threads to be reduced. Just check if count > max_count, and if so, the thread terminates itself.
Any time you have more sites to queue, just lock the mutex and add them to the queue, then broadcast on the condition variable. Any threads that are not already working will wake up and take new work.
Any time the user increases the requested thread count, just start them up and they will lock the queue, check for work, and either sleep on the condition variable or get going.
Each thread will be continually pulling more work from the queue, or sleeping. You don't need more than 5 or 20.
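In .NET, the lock statement plus Monitor.Wait and Monitor.PulseAll give you the mutex and condition variable described above. This sketch assumes string work items and a caller-supplied work delegate; WorkerPool, SetThreadCount and RunningCount are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical pool: lock(_mutex) is the mutex, Monitor.Wait/PulseAll the condition variable.
public sealed class WorkerPool
{
    private readonly object _mutex = new object();
    private readonly Queue<string> _queue = new Queue<string>();
    private readonly Action<string> _work;
    private int _maxCount;
    private int _count;

    public WorkerPool(Action<string> work) { _work = work; }

    public void Enqueue(IEnumerable<string> items)
    {
        lock (_mutex)
        {
            foreach (var item in items) _queue.Enqueue(item);
            Monitor.PulseAll(_mutex);                  // broadcast: wake idle workers
        }
    }

    public void SetThreadCount(int max)
    {
        lock (_mutex)
        {
            _maxCount = max;
            while (_count < _maxCount)                 // raised: start extra workers
            {
                _count++;
                var worker = new Thread(WorkerLoop);
                worker.IsBackground = true;
                worker.Start();
            }
            Monitor.PulseAll(_mutex);                  // lowered: let surplus workers notice and exit
        }
    }

    public int RunningCount { get { lock (_mutex) return _count; } }

    private void WorkerLoop()
    {
        while (true)
        {
            string item;
            lock (_mutex)
            {
                while (true)
                {
                    if (_count > _maxCount) { _count--; return; } // too many threads: terminate self
                    if (_queue.Count > 0) break;
                    Monitor.Wait(_mutex);                        // releases the lock while sleeping
                }
                item = _queue.Dequeue();
            }
            _work(item);                                         // do the work without holding the lock
        }
    }
}
```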
Consider using the event-based asynchronous pattern (the AsyncOperation and AsyncOperationManager classes).
You might want to take a look at the ProcessQueue article on CodeProject.
Essentially, you'll want to create (and start) an appropriate number of threads; in your case that number comes from the user. Each thread should process a site, then find the next site that needs processing. Even if you don't use the object itself (though it sounds like it would suit your purposes pretty well, but I'm obviously biased!), it should give you some good insight into how this sort of thing is done.