I'm trying to make a tool that gets the source string from many URLs I provide. I use this code for multithreading:
new Thread(() =>
{
while (stop != true)
{
if (nowworker >= threads)
{
Thread.Sleep(50);
}
else
{
if (i <= urllist.Count - 1)
{
var thread = new Thread(() =>
{
string source = GetSource(urllist[i]);
SaveToFile(source, i + ".txt");
});
thread.Start();
i++;
nowworker += 1;
}
else
{
stop = true;
}
}
}
}).Start();
It runs very smoothly, but when I check the results, some are duplicated and some of the URLs I provided are missing. This happens when I use fewer threads than URLs (10 threads for 20 URLs); there's no problem when using 20 threads for 20 URLs.
Please help me. Thank you.
if (i <= urllist.Count - 1)
{
var thread = new Thread(() =>
{
string source = GetSource(urllist[i]);
SaveToFile(source, i + ".txt");
});
thread.Start();
i++;
nowworker += 1;
}
The method you're passing to the thread is not guaranteed to execute before i is updated (the i++). In fact, it's very unlikely that it will. This means that multiple threads may use the same value of i, and some values of i will not have any thread processing them.
Even worse, GetSource may use a different value of i than SaveToFile.
Have a read here: http://jonskeet.uk/csharp/csharp2/delegates.html
This will fix it:
if (i <= urllist.Count - 1)
{
var currentIndex = i;
var thread = new Thread(() =>
{
string source = GetSource(urllist[currentIndex]);
SaveToFile(source, currentIndex + ".txt");
});
thread.Start();
i++;
nowworker += 1;
}
Even better, you can replace the entire block of code with this:
Parallel.For(0, urllist.Count, // the upper bound is exclusive, so this covers every index
new ParallelOptions { MaxDegreeOfParallelism = threads },
i =>
{
string source = GetSource(urllist[i]);
SaveToFile(source, i + ".txt");
}
);
This gets rid of the code-smelly Thread.Sleep() and lets .NET manage spinning up threads for you.
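One detail worth checking in a Parallel.For call like this: the second argument is an exclusive upper bound, so passing the list's Count visits every index exactly once. The following is a minimal sketch rather than the asker's real code; FetchAll and the example.com URLs are hypothetical stand-ins, and the body only records the index instead of calling GetSource and SaveToFile.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelForSketch
{
    // Hypothetical helper: records which indices were processed instead of downloading anything.
    public static int[] FetchAll(IList<string> urls, int maxThreads)
    {
        var processed = new ConcurrentBag<int>();
        Parallel.For(0, urls.Count, // exclusive upper bound: indices 0 .. Count - 1 are visited
            new ParallelOptions { MaxDegreeOfParallelism = maxThreads },
            i =>
            {
                // Each invocation receives its own i, so no index is shared or skipped.
                processed.Add(i);
            });
        return processed.OrderBy(x => x).ToArray();
    }

    public static void Main()
    {
        List<string> urls = Enumerable.Range(0, 20)
            .Select(n => "http://example.com/" + n)
            .ToList();
        int[] indices = FetchAll(urls, 10);
        Console.WriteLine(indices.Length + " indices processed");
    }
}
```

With 20 URLs and a limit of 10 threads, every index from 0 to 19 should appear exactly once, which is the case that went wrong in the hand-rolled version.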
I am calling a VB 6.0 DLL inside Parallel.ForEach and expect all calls to start simultaneously, or at least two of them, depending on my PC's cores or the threads available in the thread pool.
VB6 dll
Public Function DoJunk(ByVal counter As Long, ByVal data As String) As Integer
Dim i As Long
Dim j As Long
Dim s As String
Dim fno As Integer
fno = FreeFile
Open "E:\JunkVB6Dll\" & data & ".txt" For Output Access Write As #fno
Print #fno, "Starting loop with counter = " & counter
For i = 0 To counter
Print #fno, "counting " & i
Next
Close #fno
DoJunk = 1
End Function
counter is passed from the caller to control the execution time of the call, and a file is written to make it an I/O-bound process.
C# caller
private void ReportProgress(int value)
{
progressBar.Value = value;
//progressBar.Value++;
}
private void button1_Click(object sender, EventArgs e)
{
progressBar.Value = 0;
counter = 0;
Stopwatch watch = new Stopwatch();
watch.Start();
//var range = Enumerable.Range(0, 100);
var range = Enumerable.Range(0, 20);
bool finished = false;
Task.Factory.StartNew(() =>
{
Parallel.ForEach(range, i =>
{
#region COM CALL
JunkProject.JunkClass junk = new JunkProject.JunkClass();
try
{
Random rnd = new Random();
int dice = rnd.Next(10, 40);
int val = 0;
if (i == 2)
val = junk.DoJunk(9000000, i.ToString());
else
val = junk.DoJunk(dice * 10000, i.ToString());
System.Diagnostics.Debug.Print(junk.GetHashCode().ToString());
if (val == 1)
{
Interlocked.Increment(ref counter);
progressBar.Invoke((Action)delegate { ReportProgress(counter); });
}
junk = null;
}
catch (Exception excep)
{
i = i;
}
finally { junk = null; }
#endregion
});
}).ContinueWith(t =>
{
watch.Stop();
MessageBox.Show(watch.ElapsedMilliseconds.ToString());
});
}
This line is making a specific call longer than the others.
val = junk.DoJunk(9000000, i.ToString());
This long call blocks all the other calls inside Parallel.ForEach, i.e. no other file is created until it completes.
Is this expected behavior, or am I doing something wrong?
As @John Wu suggested, you can create an AppDomain so that each COM call runs in a different AppDomain. I believe you could run your parallel loop like this:
Parallel.ForEach(range, i =>
{
AppDomain otherDomain = AppDomain.CreateDomain(i.ToString());
otherDomain.DoCallBack(delegate
{
//Your COM call
});
});
EDIT
Right. I am not sure how you can mark a VB 6.0 class as serializable. You can try the other way (marshaling objects by reference). Note: I haven't actually tested this, but I would like to know whether it works.
Parallel.ForEach(range, i =>
{
AppDomain otherDomain = AppDomain.CreateDomain(i.ToString());
var comCall = (ComCall) otherDomain.CreateInstanceFromAndUnwrap(Assembly.GetExecutingAssembly().Location, typeof(ComCall).ToString());
comCall.Run();
AppDomain.Unload(otherDomain);
});
and the class
public class ComCall : MarshalByRefObject
{
public void Run()
{
//Your COM Call
}
}
Here is also additional reference regarding the topic.
https://www.codeproject.com/Articles/14791/NET-Remoting-with-an-easy-example
I have written the code below in C# to create multiple threads (for example, 10 here):
ThreadStart MainThread = new ThreadStart(CallThread);
for (int i = 1; i <= 10; i++)
{
Thread ChildThread + Convert.ToString(i) = new Thread(MainThread);
ChildThread + Convert.ToString(i).Start();
}
It gives the errors "Cannot resolve symbol ChildThread" on the 4th line and "Cannot resolve symbol Start" on the 5th line.
Could someone help me resolve this issue?
You can't concatenate data to make a variable name in C#.
You could simulate the same behavior using a Dictionary:
Dictionary<String, Thread> _threads = new Dictionary<String, Thread>(10);
for (int i = 1; i <= 10; i++)
{
_threads.Add("ChildThread" + Convert.ToString(i), new Thread(MainThread));
_threads["ChildThread" + Convert.ToString(i)].Start();
}
Edit
For your second requirement, you could pass a state object to your thread, which could be a list of files:
List<FileInfo> listOfFiles = //some way to get your files
for (int i = 1; i <= 10; i++)
{
// Start(object) needs a thread built from a ParameterizedThreadStart
_threads.Add("ChildThread" + Convert.ToString(i), new Thread(new ParameterizedThreadStart(CallThread)));
// (i - 1) * 5 so that the first thread gets files 0-4
_threads["ChildThread" + Convert.ToString(i)].Start(listOfFiles.Skip((i - 1) * 5).Take(5).ToList());
}
And in your CallThread function:
private void CallThread(Object state) {
List<FileInfo> filesToProcess = state as List<FileInfo>;
if(filesToProcess == null) return;
foreach(FileInfo f in filesToProcess) {
//do something
}
}
If you need to keep references to your child threads, you can just store them in a list variable:
ThreadStart MainThread = new ThreadStart(CallThread);
List<Thread> threads = new List<Thread>();
for (int i = 1; i <= 10; i++)
{
Thread childThread = new Thread(MainThread);
threads.Add(childThread);
childThread.Start();
}
You cannot create dynamic names for variables. That approach does not work in C#, although it can be valid in a scripting language (e.g. JavaScript). Here is a small example of how you can create the threads instead.
ThreadStart MainThread = new ThreadStart(CallThread);
List<Thread> thList = new List<Thread>(); // It will contain all the threads. If you don't need buffering threads, don't use it.
for (int i = 1; i <= 10; i++)
{
Thread th = new Thread(MainThread);
thList.Add(th);
th.Start();
}
I think naming Threads is what you want.
Threads can have a Name property. Example below:
var thread = new Thread(() =>
{
TheMethodYouWantToExecute(passedVariable);
});
thread.Name = "nameThatIsAString";
thread.Start();
The code will create a thread using a lambda expression (doesn't have to be), and will start the thread you've just created.
You can add the "thread" variable to a List<> of Thread objects (Let's name the List<> "threadList"), and later access them by this code below:
var wantedThread = threadList.Where(t => t.Name == "nameThatIsAString").Single();
Have a nice day! If something is unclear, I'll try to explain it better.
I am building a web scraper in C# that deals with proxies and a large volume of requests. The pages are loaded through a ConnectionManager class that grabs a proxy and retries loading that page with random proxies until the page is correctly loaded.
On average, a single task will take somewhere between 100 and 300 requests, and to speed up the process, I have designed the method to use multithreading to simultaneously download the webpages.
public Review[] getReviewsMultithreaded(int reviewCount)
{
ArrayList reviewList = new ArrayList();
int currentIndex = 0;
int currentPage = 1;
int totalPages = (reviewCount / 10) + 1;
bool threadHasMoreWork = true;
Object pageLock = new Object();
Thread[] threads = new Thread[Program.maxScraperThreads];
for(int i = 0; i < Program.maxScraperThreads; i++)
{
threads[i] = (new Thread(() =>
{
while (threadHasMoreWork)
{
HtmlDocument doc;
lock(pageLock)
{
if (currentPage <= totalPages)
{
string builtString = "http://www.example.com/reviews/" + _ID + "?pageNumber=" + currentPage;
//Log.WriteLine(builtString);
currentPage++;
doc = Program.conManager.loadDocument(builtString);
}
else
{
threadHasMoreWork = false;
continue;
}
}
try
{
//Get info from page and add to list
reviewList.Add(cRev);
Log.WriteLine(_asin + " reviews scraped: " + reviewList.Count);
}
catch (Exception ex) { continue; }
}
}));
threads[i].Start();
}
bool threadsAreRunning = true;
while(threadsAreRunning) //this is in a separate thread itself, so as not to interrupt the GUI
{
threadsAreRunning = false;
foreach (Thread t in threads)
if (t.IsAlive)
{
threadsAreRunning = true;
Thread.Sleep(2000);
}
}
//flatten the arraylist to a primitive
return reviewArray;
}
However, I have noticed that the requests are still largely being handled one at a time, and as a result the method isn't much faster than it was before. Is the lock causing problems? Or is it the fact that the ConnectionManager is instantiated once and every thread calls loadDocument on that same object?
Ah, nevermind. I noticed the lock included the call to the method that loads the pages, and because of that only one page was loading at a time.
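For anyone hitting the same thing, the fix can be sketched roughly as follows: hold the lock only long enough to claim the next page number, then do the slow download outside it. This is an illustration under assumptions, not the original scraper; LoadDocument is a hypothetical stand-in for Program.conManager.loadDocument that just sleeps, and the URL is simplified.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class PageScraperSketch
{
    // Hypothetical stand-in for the real page loader: simulates a slow request.
    static string LoadDocument(string url)
    {
        Thread.Sleep(100);
        return "<html>" + url + "</html>";
    }

    public static List<string> LoadAllPages(int totalPages, int threadCount)
    {
        var results = new List<string>();
        var pageLock = new object();
        var resultsLock = new object();
        int currentPage = 1;
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                while (true)
                {
                    int page;
                    lock (pageLock)
                    {
                        // Only the bookkeeping is done under the lock.
                        if (currentPage > totalPages) return;
                        page = currentPage++;
                    }
                    // The slow call runs outside the lock, so threads overlap here.
                    string doc = LoadDocument("http://www.example.com/reviews/?pageNumber=" + page);
                    lock (resultsLock) { results.Add(doc); }
                }
            });
            threads[i].Start();
        }
        foreach (Thread t in threads) t.Join();
        return results;
    }

    public static void Main()
    {
        List<string> pages = LoadAllPages(20, 5);
        Console.WriteLine(pages.Count + " pages loaded");
    }
}
```

Joining the threads at the end also replaces the polling loop with Thread.Sleep(2000), and the shared result list gets its own lock because List<T> is not safe for concurrent writes.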
I am trying to dynamically create X threads (the number is specified by the user) and then have all of them execute some code at exactly the same time, at intervals of 1 second.
The issue I am having is that the task relies on a loop to determine whether the current IP is equal to the last one (it scans hosts). Because this loop is inside the thread body, the first thread runs through it while the other threads are not yet created and so never get to execute the code. I would like them all to start at the same time and then wait 1 second (using a timer or something else that doesn't block the thread, since the code being executed waits on its own timeout). Can anyone help me out? Here is my current code:
int threads = Convert.ToInt32(txtThreads.Text);
List<Thread> workerThreads = new List<Thread>();
string from = txtStart.Text, to = txtEnd.Text;
uint current = from.ToUInt(), last = to.ToUInt();
ulong total = last - current;
for (int i = 0; i < threads; i++)
{
Thread thread = new Thread(() =>
{
for (int t = 0; t < Convert.ToInt32(total); t += i)
{
while (current <= last)
{
current = Convert.ToUInt32(current + t);
var ip = current.ToIPAddress();
doSomething(ip);
}
}
});
workerThreads.Add(thread);
thread.Start();
}
Don't use a lambda as the body of your thread, otherwise the i value isn't doing what you think it's doing. Instead pass the value into a method.
As for starting all of the threads at the same time do something like the following:
private readonly object syncObj = new object();
private bool go; // set under the lock before pulsing, so a thread that arrives late never waits forever
void ThreadBody(object boxed)
{
Params p = (Params)boxed; // "params" is a reserved word in C#, so use another name
lock (syncObj)
{
while (!go)
{
Monitor.Wait(syncObj);
}
}
// do work here
}
struct Params
{
// passed values here
}
void InitializeThreads()
{
int threads = Convert.ToInt32(txtThreads.Text);
List<Thread> workerThreads = new List<Thread>();
string from = txtStart.Text, to = txtEnd.Text;
uint current = from.ToUInt(), last = to.ToUInt();
ulong total = last - current;
for (int i = 0; i < threads; i++)
{
Thread thread = new Thread(new ParameterizedThreadStart(this.ThreadBody));
workerThreads.Add(thread);
// the argument is passed to Start, not to the delegate constructor
thread.Start(new Params { /* initialize values here */ });
}
lock (syncObj)
{
go = true;
Monitor.PulseAll(syncObj);
}
}
You're running into closure problems. There's another question that somewhat addresses this, here.
Basically you need to capture the value of i as you create each task. What's happening is by the time the task gets around to actually running, the value of i across all your tasks is the same -- the value at the end of the loop.
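The effect can be reproduced without any threads at all. The sketch below is only an illustration (the class and method names are made up for this example): it builds the lambdas inside a for loop and runs them after the loop has finished, first sharing the loop variable and then capturing a per-iteration copy.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureCaptureSketch
{
    // Every lambda captures the same loop variable i.
    public static List<int> RunShared()
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            actions.Add(() => i);
        }
        var seen = new List<int>();
        foreach (Func<int> a in actions) seen.Add(a()); // runs after the loop, when i == 3
        return seen;
    }

    // Every lambda captures its own copy made inside the loop body.
    public static List<int> RunCopied()
    {
        var actions = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i; // a fresh variable per iteration
            actions.Add(() => copy);
        }
        var seen = new List<int>();
        foreach (Func<int> a in actions) seen.Add(a());
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(",", RunShared())); // prints 3,3,3
        Console.WriteLine(string.Join(",", RunCopied())); // prints 0,1,2
    }
}
```

With threads the shared case is less predictable than this, because each thread reads i at whatever moment it happens to run, but the cause is the same single captured variable.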
How can I run each call in the for loop on another thread, while the continuation of ExternalMethod waits until the last worker thread from the loop has finished (and synchronizes with it)?
ExternalMethod()
{
//some calculations
for (int i = 0; i < 10; i++)
{
SomeMethod(i);
}
//continuation ExternalMethod
}
One approach would be to use a ManualResetEvent.
Consider the following code (note that this should not be taken as a working example, stuck on OSX so don't have VS nor a C# compiler to hand to check this over):
static ManualResetEvent mre = new ManualResetEvent(false);
static int DoneCount = 0;
static int DoneRequired = 10; // one per thread started below
void ExternalMethod() {
DoneCount = 0;
mre.Reset();
for (int i = 0; i < DoneRequired; i++) {
new Thread(new ThreadStart(ThreadVoid)).Start();
}
mre.WaitOne();
}
void ThreadVoid() {
// Compare the value returned by Increment; reading DoneCount separately would race.
if (Interlocked.Increment(ref DoneCount) == DoneRequired) {
mre.Set();
}
}
IMPORTANT - This possibly isn't the best way to do it, just an example of using ManualResetEvent, and it will suit your needs perfectly fine.
If you're on .NET 4.0 you can use a Parallel.For loop - explained here.
System.Threading.Tasks.Parallel.For(0, 10, (i) => SomeMethod(i));
One approach is to use a CountdownEvent.
ExternalMethod()
{
//some calculations
var finished = new CountdownEvent(1);
for (int i = 0; i < 10; i++)
{
int capture = i; // This is needed to capture the loop variable correctly.
finished.AddCount();
ThreadPool.QueueUserWorkItem(
(state) =>
{
try
{
SomeMethod(capture);
}
finally
{
finished.Signal();
}
}, null);
}
finished.Signal();
finished.Wait();
//continuation ExternalMethod
}
If CountdownEvent is not available then here is an alternate approach.
ExternalMethod()
{
//some calculations
var finished = new ManualResetEvent(false);
int pending = 1;
for (int i = 0; i < 10; i++)
{
int capture = i; // This is needed to capture the loop variable correctly.
Interlocked.Increment(ref pending);
ThreadPool.QueueUserWorkItem(
(state) =>
{
try
{
SomeMethod(capture);
}
finally
{
if (Interlocked.Decrement(ref pending) == 0) finished.Set();
}
}, null);
}
if (Interlocked.Decrement(ref pending) == 0) finished.Set();
finished.WaitOne();
//continuation ExternalMethod
}
Note that in both examples the for loop itself is treated as a parallel work item (it is on a separate thread from the other work items, after all). This avoids a really subtle race condition that could occur if the first work item signals the event before the next work item is queued.
For .NET 3.5, maybe something like this:
Thread[] threads = new Thread[10];
for (int x = 0; x < 10; x++)
{
threads[x] = new Thread(new ParameterizedThreadStart(ThreadFun));
threads[x].Start(x);
}
foreach (Thread thread in threads) thread.Join();
It may seem counterintuitive to use the Join() method, but since you are effectively doing a WaitAll-type pattern, it doesn't matter what order the joins are executed.
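The snippet above leaves ThreadFun undefined, so here is one way the whole thing might look when filled in. This is a sketch under assumptions: the body of ThreadFun (storing the square of its index) is invented purely so there is something to observe after the joins.

```csharp
using System;
using System.Threading;

public static class JoinSketch
{
    static readonly int[] results = new int[10];

    // Hypothetical worker: the state object is the boxed loop index passed to Start.
    static void ThreadFun(object state)
    {
        int index = (int)state;
        results[index] = index * index;
    }

    public static int[] RunAll()
    {
        var threads = new Thread[10];
        for (int x = 0; x < 10; x++)
        {
            threads[x] = new Thread(new ParameterizedThreadStart(ThreadFun));
            threads[x].Start(x); // x is boxed at call time, so each thread gets its own value
        }
        // Returns only once every thread has finished, whatever order they finish in.
        foreach (Thread thread in threads) thread.Join();
        return results;
    }

    public static void Main()
    {
        int[] squares = RunAll();
        Console.WriteLine(string.Join(",", squares));
    }
}
```

Because the index is passed through Start rather than captured by a lambda, this version also sidesteps the loop-variable capture issue.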