I am executing/processing very big files in multi threaded mode in a console app.
When I don't update/write to the console from threads, for testing the whole process take about 1 minute.
But when I try to update/write to console from threads to show the progress, the process stuck and it never finishes (waited several minutes even hours). And also console text/window does not updated as it should.
Update-1: As requested by few kind responder, i added minimal code that can reproduce the same error/problem
Here is the code from the thread function/method:
using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;
namespace Large_Text_To_Small_Text
{
class Program
{
static string sAppPath;
static ArrayList objThreadList;
private struct ThreadFileInfo
{
public string sBaseDir, sRFile;
public int iCurFile, iTFile;
public bool bIncludesExtension;
}
static void Main(string[] args)
{
string sFileDir;
DateTime dtStart;
Console.Clear();
sAppPath = Path.GetDirectoryName(System.Reflection.Assembly.GetExecutingAssembly().Location);
sFileDir = #"d:\Test";
dtStart = DateTime.Now;
///process in multi threaded mode
List<string> lFiles;
lFiles = new List<string>();
lFiles.AddRange(Directory.GetFiles(sFileDir, "*.*", SearchOption.AllDirectories));
if (Directory.Exists(sFileDir + "-Processed") == true)
{
Directory.Delete(sFileDir + "-Processed", true);
}
Directory.CreateDirectory(sFileDir + "-Processed");
sPrepareThreading();
for (int iFLoop = 0; iFLoop < lFiles.Count; iFLoop++)
{
//Console.WriteLine(string.Format("{0}/{1}", (iFLoop + 1), lFiles.Count));
sThreadProcessFile(sFileDir + "-Processed", lFiles[iFLoop], (iFLoop + 1), lFiles.Count, Convert.ToBoolean(args[3]));
}
sFinishThreading();
Console.WriteLine(DateTime.Now.Subtract(dtStart).ToString());
Console.ReadKey();
return;
}
private static void sProcSO(object oThreadInfo)
{
var inputLines = new BlockingCollection<string>();
long lACounter, lCCounter;
ThreadFileInfo oProcInfo;
lACounter = 0;
lCCounter = 0;
oProcInfo = (ThreadFileInfo)oThreadInfo;
var readLines = Task.Factory.StartNew(() =>
{
foreach (var line in File.ReadLines(oProcInfo.sRFile))
{
inputLines.Add(line);
lACounter++;
}
inputLines.CompleteAdding();
});
var processLines = Task.Factory.StartNew(() =>
{
Parallel.ForEach(inputLines.GetConsumingEnumerable(), line =>
{
lCCounter++;
/*
some process goes here
*/
/*If i Comment out these lines program get stuck!*/
//Console.SetCursorPosition(0, oProcInfo.iCurFile);
//Console.Write(oProcInfo.iCurFile + " = " + lCCounter.ToString());
});
});
Task.WaitAll(readLines, processLines);
}
private static void sPrepareThreading()
{
objThreadList = new ArrayList();
for (var iTLoop = 0; iTLoop < 5; iTLoop++)
{
objThreadList.Add(null);
}
}
private static void sThreadProcessFile(string sBaseDir, string sRFile, int iCurFile, int iTFile, bool bIncludesExtension)
{
Boolean bMatched;
Thread oCurThread;
ThreadFileInfo oProcInfo;
Salma_RecheckThread:
bMatched = false;
for (int iTLoop = 0; iTLoop < 5; iTLoop++)
{
if (objThreadList[iTLoop] == null || ((System.Threading.Thread)(objThreadList[iTLoop])).IsAlive == false)
{
oProcInfo = new ThreadFileInfo()
{
sBaseDir = sBaseDir,
sRFile = sRFile,
iCurFile = iCurFile,
iTFile = iTFile,
bIncludesExtension = bIncludesExtension
};
oCurThread = new Thread(sProcSO);
oCurThread.IsBackground = true;
oCurThread.Start(oProcInfo);
objThreadList[iTLoop] = oCurThread;
bMatched = true;
break;
}
}
if (bMatched == false)
{
System.Threading.Thread.Sleep(250);
goto Salma_RecheckThread;
}
}
private static void sFinishThreading()
{
Boolean bRunning;
Salma_RecheckThread:
bRunning = false;
for (int iTLoop = 0; iTLoop < 5; iTLoop++)
{
if (objThreadList[iTLoop] != null && ((System.Threading.Thread)(objThreadList[iTLoop])).IsAlive == true)
{
bRunning = true;
}
}
if (bRunning == true)
{
System.Threading.Thread.Sleep(250);
goto Salma_RecheckThread;
}
}
}
}
And here is the screenshot, if I try to update console window:
You see? Nor the line number (oProcInfo.iCurFile) or the whole line is correct!
It should be like this:
1 = xxxxx
2 = xxxxx
3 = xxxxx
4 = xxxxx
5 = xxxxx
Update-1: To test just change the sFileDir to any folder that has some big text file or if you like you can download some big text files from following link:
https://wetransfer.com/downloads/8aecfe05bb44e35582fc338f623ad43b20210602005845/bcdbb5
Am I missing any function/method to update console text from threads?
I can't reproduce it. In my tests the process always runs to completion, without getting stuck. The output is all over the place though, because the two lines below are not synchronized:
Console.SetCursorPosition(0, oProcInfo.iCurFile);
Console.Write(oProcInfo.iCurFile + " = " + lCCounter.ToString());
Each thread of the many threads involved in the computation invokes these two statements concurrently with the other threads. This makes it possible for one thread to preempt another, and move the cursor before the first thread has the chance to write in the console. To solve this problem you must add proper synchronization, and the easiest way to do it is to use the lock statement:
class Program
{
static object _locker = new object();
And in the sProcSO method:
lock (_locker)
{
Console.SetCursorPosition(0, oProcInfo.iCurFile);
Console.Write(oProcInfo.iCurFile + " = " + lCCounter.ToString());
}
If you want to know more about thread synchronization, I recommend this online resource: Threading in C# - Part 2: Basic Synchronization
If you would like to hear my opinion about the code in the question, and you don't mind receiving criticism, my opinion is that honestly the code is so much riddled with problems that the best course of action would be to throw it away and restart from scratch. Use of archaic data structures (ArrayList???), liberal use of casting from object to specific types, liberal use of the goto statement, use of hungarian notation in public type members, all make the code difficult to follow, and easy for bugs to creep in. I found particularly problematic that each file is processed concurrently with all other files using a dedicated thread, and then each dedicated thread uses a ThreadPool thread (Task.Factory.StartNew) to starts a parallel loop (Parallel.ForEach) with unconfigured MaxDegreeOfParallelism. This setup ensures that the ThreadPool will be saturated so badly, that there is no hope that the availability of threads will ever match the demand. Most probably it will also result to a highly inefficient use of the storage device, especially if the hardware is a classic hard disk.
Your freezing problem may not be C# or code related
on the top left of your console window, on the icon .. right click
select Properties
remove the option of Quick Edit Mode and Insert Mode
you can google that feature, but essentially manifests in the problem you describe above
The formatting problem on the other hand does seem to be, here you need to create a class that serializes writes to the console window from a singe thread. a consumer/producer pattern would work (you could use a BlockingCollection to implement this quite easily)
Related
I've been working on a hobby project being developed in C# + Xamarin Forms + Prism + EF Core + Sqlite, debugging in UWP app.
I've written the following code to store tick data received from broker to Sqlite.
First, the OnTick call back that receives the ticks (approx. 1 tick per sec per instrument):
private void OnTick(Tick tickData)
{
foreach (var instrument in IntradayInstruments.Where(i => i.InstrumentToken == tickData.InstrumentToken))
{
instrument.UpdateIntradayCandle(tickData);
}
}
And the UpdateIntradayCandle method is:
public void UpdateIntradayCandle(Tick tick)
{
if (LastIntradayCandle != null)
{
if (LastIntradayCandle.Open == 0m)
{
LastIntradayCandle.Open = tick.LastPrice;
}
if (LastIntradayCandle.High < tick.LastPrice)
{
LastIntradayCandle.High = tick.LastPrice;
}
if (LastIntradayCandle.Low == 0m)
{
LastIntradayCandle.Low = tick.LastPrice;
}
else if (LastIntradayCandle.Low > tick.LastPrice)
{
LastIntradayCandle.Low = tick.LastPrice;
}
LastIntradayCandle.Close = tick.LastPrice;
}
}
The LastIntradayCandle is a property:
object _sync = new object();
private volatile IntradayCandle _lastIntradayCandle;
public IntradayCandle LastIntradayCandle
{
get
{
lock (_sync)
{
return _lastIntradayCandle;
}
}
set
{
lock (_sync)
{
_lastIntradayCandle = value;
}
}
}
Now, the LastIntradayCandle is changed periodically, say, 5 minutes, and a new candle is put in place for updating, from a different thread coming from a System.Threading.Timer which is scheduled to run every 5m.
public void AddNewIntradayCandle()
{
if (LastIntradayCandle != null)
{
LastIntradayCandle.IsClosed = true;
}
var newIntradayCandle = new IntradayCandle { Open = 0m, High = 0m, Low = 0m, Close = 0m };
LastIntradayCandle = newIntradayCandle;
IntradayCandles.Add(newIntradayCandle);
}
Now, the problem is, I'm getting 0s in those Open, High or Low but not in Close, Open having the most number of zeroes. This is happening very randomly.
I'm thinking that if any of the Open, High, Low or Close values is getting updated, it means the tick is having a value to be grabbed, but somehow one or more assignments in UpdateIntradayCandle method are not running. Having zeroes is a strict NO for the purpose of the app.
I'm neither formally trained as a programmer nor an expert, but a self-learning hobbyist and definitely never attempted at multi-threading before.
So, I request you to please point me what I am doing wrong, or better still, what should I be doing to make it work.
Multithreading and EF Core is not compatible things. EF Core context is not a thread safe. You have to create new context for each thread. Also making your object thread safe is wasting time.
So, schematically you have to do the following and you can remove locks from your object.
private void OnTick(Tick tickData)
{
using var ctx = new MyDbContext(...);
foreach (var instrument in ctx.IntradayInstruments.Where(i => i.InstrumentToken == tickData.InstrumentToken))
{
instrument.UpdateIntradayCandle(tickData);
}
ctx.SaveChanges();
}
I'm still somewhat new to threads and was wondering if I have an array of threads that get passed data from a loop. Is there any chance that the initial data passed to that thread could change before the thread actually starts processing it?
const int TOTAL_THREADS = 32;
Thread[] _threadList = new Thread[TOTAL_THREADS];
List<class> OIDS = new List<class>();
OIDS.Add(new class());
...
...
for (int i = 0; i < OIDS.Count; i++)
{
threadWait = true;
while (threadWait == true)
{
for (int t = 0; t < TOTAL_THREADS; t++)
{
if (_threadList[t] == null || _threadList[t].IsAlive == false)
{
class oid = OIDS[i];
_threadList[t] = new Thread(() => Worder.ProcessData(oid);
_threadList[t].Start();
threadWait = false;
break;
}
}
if (threadWait == true)
{
Thread.Sleep(1000);
}
}
}
Your code is trying to replicate thread-coordination functionality that already exists in the platform, in various ways and forms. For example you could use the Parallel class:
Parallel.ForEach(OIDS,
new ParallelOptions() { MaxDegreeOfParallelism = 32 },
Worder.ProcessData);
This could be a valid approach if your workload is CPU based (you are doing calculations, not requests to a database or to the web), and the available cores of your machine are at least 32¹. It is also required that the processing of each element of the OIDS list is independent, or if it's not that you are synchronizing the dependencies inside the Worder.ProcessData by using locks or other means to prevent concurrent access to shared state.
(¹ This is an advice regarding the general case, not a hard assertion)
I've been trying to solve this issue for quite some time now. I've written some example code showcasing the usage of lock in C#. Running my code manually I can see that it works the way it should, but of course I would like to write a unit test that confirms my code.
I have the following ObjectStack.cs class:
enum ExitCode
{
Success = 0,
Error = 1
}
public class ObjectStack
{
private readonly Stack<Object> _objects = new Stack<object>();
private readonly Object _lockObject = new Object();
private const int NumOfPopIterations = 1000;
public ObjectStack(IEnumerable<object> objects)
{
foreach (var anObject in objects) {
Push(anObject);
}
}
public void Push(object anObject)
{
_objects.Push(anObject);
}
public void Pop()
{
_objects.Pop();
}
public void ThreadSafeMultiPop()
{
for (var i = 0; i < NumOfPopIterations; i++) {
lock (_lockObject) {
try {
Pop();
}
//Because of lock, the stack will be emptied safely and no exception is ever caught
catch (InvalidOperationException) {
Environment.Exit((int)ExitCode.Error);
}
if (_objects.Count == 0) {
Environment.Exit((int)ExitCode.Success);
}
}
}
}
public void ThreadUnsafeMultiPop()
{
for (var i = 0; i < NumOfPopIterations; i++) {
try {
Pop();
}
//Because there is no lock, an exception is caught when popping an already empty stack
catch (InvalidOperationException) {
Environment.Exit((int)ExitCode.Error);
}
if (_objects.Count == 0) {
Environment.Exit((int)ExitCode.Success);
}
}
}
}
And Program.cs:
public class Program
{
private const int NumOfObjects = 100;
private const int NumOfThreads = 10000;
public static void Main(string[] args)
{
var objects = new List<Object>();
for (var i = 0; i < NumOfObjects; i++) {
objects.Add(new object());
}
var objectStack = new ObjectStack(objects);
Parallel.For(0, NumOfThreads, x => objectStack.ThreadUnsafeMultiPop());
}
}
I'm trying to write a unit that tests the thread unsafe method, by checking the exit code value (0 = success, 1 = error) of the executable.
I tried to start and run the application executable as a process in my test, a couple of 100 times, and checked the exit code value each time in the test. Unfortunately, it was 0 every single time.
Any ideas are greatly appreciated!
Logically, there is one, very small, piece of code where this problem can happen. Once one of the threads enters the block of code that pops a single element, then either the pop will work in which case the next line of code in that thread will Exit with success OR the pop will fail in which case the next line of code will catch the exception and Exit.
This means that no matter how much parallelization you put into the program, there is still only one single point in the whole program execution stack where the issue can occur and that is directly before the program exits.
The code is genuinely unsafe, but the probability of an issue happening in any single execution of the code is extremely low as it requires the scheduler to decide not to execute the line of code that will exit the environment cleanly and instead let one of the other Threads raise an exception and exit with an error.
It is extremely difficult to "prove" that a concurrency bug exists, except for really obvious ones, because you are completely dependent on what the scheduler decides to do.
Looking up some other posts I see this post which is written related to Java but references C#: How should I unit test threaded code?
It includes a link to this which might be useful to you: http://research.microsoft.com/en-us/projects/chess/
Hope this is useful and apologies if it is not. Testing concurrency is inherently unpredictable as is writing example code to cause it.
Thanks for all the input! Although I do agree that this is a concurrency issue quite hard to detect due to the scheduler execution among other things, I seem to have found an acceptable solution to my problem.
I wrote the following unit test:
[TestMethod]
public void Executable_Process_Is_Thread_Safe()
{
const string executablePath = "Thread.Locking.exe";
for (var i = 0; i < 1000; i++) {
var process = new Process() {StartInfo = {FileName = executablePath}};
process.Start();
process.WaitForExit();
if (process.ExitCode == 1) {
Assert.Fail();
}
}
}
When I ran the unit test, it seemed that the Parallel.For execution in Program.cs threw strange exceptions at times, so I had to change that to traditional for-loops:
public class Program
{
private const int NumOfObjects = 100;
private const int NumOfThreads = 10000;
public static void Main(string[] args)
{
var objects = new List<Object>();
for (var i = 0; i < NumOfObjects; i++) {
objects.Add(new object());
}
var tasks = new Task[NumOfThreads];
var objectStack = new ObjectStack(objects);
for (var i = 0; i < NumOfThreads; i++)
{
var task = new Task(objectStack.ThreadUnsafeMultiPop);
tasks[i] = task;
}
for (var i = 0; i < NumOfThreads; i++)
{
tasks[i].Start();
}
//Using this seems to throw exceptions from unit test context
//Parallel.For(0, NumOfThreads, x => objectStack.ThreadUnsafeMultiPop());
}
Of course, the unit test is quite dependent on the machine you're running it on (a fast processor may be able to empty the stack and exit safely before reaching the critical section in all cases).
1.) You could inject IL Inject Context switches on a post build of your code in the form of Thread.Sleep(0) using ILGenerator which would most likely help these issues to arise.
2.) I would recommend you take a look at the CHESS project by Microsoft research team.
I'm doing what amounts to a glorified mail merge and then file conversion to PDF... Based on .Net 4.5 I see a couple ways I can do the threading. The one using a thread safe queue seems interesting (Plan A), but I can see a potential problem. What do you think? I'll try to keep it short, but put in what is needed.
This works on the assumption that it will take far more time to do the database processing than the PDF conversion.
In both cases, the database processing for each file is done in its own thread/task, but PDF conversion could be done in many single threads/tasks (Plan B) or it can be done in a single long running thread (Plan A). It is that PDF conversion I am wondering about. It is all in a try/catch statement, but that thread must not fail or all fails (Plan A). Do you think that is a good idea? Any suggestions would be appreciated.
/* A class to process a file: */
public class c_FileToConvert
{
public string InFileName { get; set; }
public int FileProcessingState { get; set; }
public string ErrorMessage { get; set; }
public List<string> listData = null;
c_FileToConvert(string inFileName)
{
InFileName = inFileName;
FileProcessingState = 0;
ErrorMessage = ""; // yah, yah, yah - String.Empty
listData = new List<string>();
}
public void doDbProcessing()
{
// get the data from database and put strings in this.listData
DAL.getDataForFile(this.InFileName, this.ErrorMessage); // static function
if(this.ErrorMessage != "")
this.FileProcessingState = -1; //fatal error
else // Open file and append strings to it
{
foreach(string s in this.listData}
...
FileProcessingState = 1; // enum DB_WORK_COMPLETE ...
}
}
public void doPDFProcessing()
{
PDFConverter cPDFConverter = new PDFConverter();
cPDFConverter.convertToPDF(InFileName, InFileName + ".PDF");
FileProcessingState = 2; // enum PDF_WORK_COMPLETE ...
}
}
/*** These only for Plan A ***/
public ConcurrentQueue<c_FileToConvert> ConncurrentQueueFiles = new ConcurrentQueue<c_FileToConvert>();
public bool bProcessPDFs;
public void doProcessing() // This is the main thread of the Windows Service
{
List<c_FileToConvert> listcFileToConvert = new List<c_FileToConvert>();
/*** Only for Plan A ***/
bProcessPDFs = true;
Task task1 = new Task(new Action(startProcessingPDFs)); // Start it and forget it
task1.Start();
while(1 == 1)
{
List<string> listFileNamesToProcess = new List<string>();
DAL.getFileNamesToProcessFromDb(listFileNamesToProcess);
foreach(string s in listFileNamesToProcess)
{
c_FileToConvert cFileToConvert = new c_FileToConvert(s);
listcFileToConvert.Add(cFileToConvert);
}
foreach(c_FileToConvert c in listcFileToConvert)
if(c.FileProcessingState == 0)
Thread t = new Thread(new ParameterizedThreadStart(c.doDbProcessing));
/** This is Plan A - throw it on single long running PDF processing thread **/
foreach(c_FileToConvert c in listcFileToConvert)
if(c.FileProcessingState == 1)
ConncurrentQueueFiles.Enqueue(c);
/*** This is Plan B - traditional thread for each file conversion ***/
foreach(c_FileToConvert c in listcFileToConvert)
if(c.FileProcessingState == 1)
Thread t = new Thread(new ParameterizedThreadStart(c.doPDFProcessing));
int iCount = 0;
for(int iCount = 0; iCount < c_FileToConvert.Count; iCount++;)
{
if((c.FileProcessingState == -1) || (c.FileProcessingState == 2))
{
DAL.updateProcessingState(c.FileProcessingState)
listcFileToConvert.RemoveAt(iCount);
}
}
sleep(1000);
}
}
public void startProcessingPDFs() /*** Only for Plan A ***/
{
while (bProcessPDFs == true)
{
if (ConncurrentQueueFiles.IsEmpty == false)
{
try
{
c_FileToConvert cFileToConvert = null;
if (ConncurrentQueueFiles.TryDequeue(out cFileToConvert) == true)
cFileToConvert.doPDFProcessing();
}
catch(Exception e)
{
cFileToConvert.FileProcessingState = -1;
cFileToConvert.ErrorMessage = e.message;
}
}
}
}
Plan A seems like a nice solution, but what if the Task fails somehow? Yes, the PDF conversion can be done with individual threads, but I want to reserve them for the database processing.
This was written in a text editor as the simplest code I could, so there may be something, but I think I got the idea across.
How many files are you working with? 10? 100,000? If the number is very large, using 1 thread to run the DB queries for each file is not a good idea.
Threads are a very low-level control flow construct, and I advise you try to avoid a lot of messy and detailed thread spawning, joining, synchronizing, etc. etc. in your application code. Keep it stupidly simple if you can.
How about this: put the data you need for each file in a thread-safe queue. Create another thread-safe queue for results. Spawn some number of threads which repeatedly pull items from the input queue, run the queries, convert to PDF, then push the output into the output queue. The threads should share absolutely nothing but the input and output queues.
You can pick any number of worker threads which you like, or experiment to see what a good number is. Don't create 1 thread for each file -- just pick a number which allows for good CPU and disk utilization.
OR, if your language/libraries have a parallel map operator, use that. It will save you a lot of messing around.
This situation might seem strange but this is what i have to do:
Situation, i have a sharepoint portal and there was such an issue that there might be a problem while retrieving user profiles that there might be too slow when a lot of people online and perfrom that kind of action, so there was made a descision to make a console application to test it out.
The console application needs to simulate behavior for retrieving the user profiles with as if many different users are doing that.
And there must be a log written.
The first question is this kind of testing a good way to really know where exactly the problme is?
And the other question is about my application, i have a strange behavior:
public class Program
{
static void Main(string[] args)
{
string filePath = #"C:\Users\User\Desktop\logfile.txt";
string siteUrl = #"http://siteurl";
int threads = 1;
//Multiplicator multiplicator = new Multiplicator(filePath, siteUrl, threads);
//Console.ReadLine();
for (int i = 0; i < 100; i++)
{
Thread t = new Thread(Execute);
t.Start();
}
Console.WriteLine("Main thread: " + Thread.CurrentThread.Name);
// Simultaneously, do something on the main thread.
}
static void Execute()
{
for (int i = 0; i < 100; i++)
{
using (SPSite ospSite = new SPSite(#"http://siteurl"))
{
SPServiceContext serviceContext = SPServiceContext.GetContext(ospSite);
UserProfileManager profileManager = new UserProfileManager(serviceContext);
UserProfile userProfile = profileManager.GetUserProfile("User Name");
string message = "Retrieved: " + userProfile.DisplayName + " on " +DateTime.Now + "by " Thread.CurrentThread.Name;
Console.WriteLine(message);
}
}
}
}
So the problem is i never get the name of the thread written why?
Thread.CurrentThread.Name is empty, is it normal, maybe i initialize the threading wrong? Altho many sources said that it is done like this?
You have not set the name of the thread. You should do so before you start it, and you can incorporate the iteration number in the name, if you like:
Thread t = new Thread(Execute);
t.Name = "My Thread" + i.ToString();
t.Start();
They are not given names automatically. The name can only be set once, after which you would get an InvalidOperationException
MSDN Reference: Thread.Name
Incidentally, creating 100 threads is probably not a good idea under normal circumstances.
You need to give it a name. Just when you create the thread, just name it based on i
Thread t = new Thread(Execute) { Name = i.ToString() };
Ok, I will go for your first question. No, there is a better way to do this.
Put the site under load. Maybe a friend can hit F5 all the time or you run a batch file with 1000 lines of a curl get
Attach the Visual Studio debugger to the webserver process
Hit break 10 times and see where it stops most of the time. That is your hotspot/problem.
This is called the poor mans profiler. It is built into every Visual Studio ;-)
In general, it is easy to find such problems by doing profiling. There are even sophisticated tools for this.