Writing a sync application using Windows Service to process files in Parallel

I have a folder on my Windows server, C:\Uploads, where people will upload CSV files.
I want to write a simple Windows service application that scans this uploads folder every 5 seconds, collects the files in it, and processes them in parallel (one thread per file?). However, the main scanning process should not overlap, i.e. locking is required.
So, I was experimenting with it like this:
I am aware this is not Windows service code; it's a console app to test ideas...
Updated code, based on dcastro's reply:
    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;

    class Program
    {
        static Timer _InternalTimer;
        static Object _SyncLock = new Object();

        static void Main(string[] args)
        {
            _InternalTimer = new Timer(InitProcess, null, 0, 5000); // Sync cycle is every 5 sec
            Console.ReadKey();
        }

        private static void InitProcess(Object state)
        {
            ConsoleLog("Starting Process");
            StartProcess();
        }

        static void StartProcess()
        {
            bool lockTaken = false;
            try
            {
                // Returns immediately; lockTaken tells us whether we got the lock
                Monitor.TryEnter(_SyncLock, ref lockTaken);
                if (lockTaken)
                {
                    ConsoleLog("Lock Acquired. Doing some dummy work...");
                    List<string> fileList = new List<string>()
                    {
                        "fileA.csv",
                        "fileB.csv"
                    };
                    Parallel.ForEach(fileList, (string fileName) =>
                    {
                        ConsoleLog("Processing File: " + fileName);
                        Thread.Sleep(10000); // 10 sec to process each file
                    });
                    GC.Collect();
                }
                else
                    ConsoleLog("Sync Is Busy, Skipping Cycle");
            }
            finally
            {
                if (lockTaken)
                    Monitor.Exit(_SyncLock);
            }
        }

        static void ConsoleLog(String Message)
        {
            Console.WriteLine("[{0}]: {1}",
                DateTime.UtcNow.ToString("HH:mm:ss tt"),
                Message);
        }
    }
When it runs, it looks like this:
Does this look right? Any help/tips on improving this will be much appreciated.

It seems fine to me, apart from the fact that you don't need to start a task with Task.Factory.StartNew. The System.Threading.Timer already executes your callback on the ThreadPool, so there's no need to launch yet another task that will also be run on the thread pool.
Also, if your timer ticks every 5 seconds, and you expect it to take about 10 secs to process the files, then your threads will begin to queue up waiting for the lock to be released. That is what happened in the example you posted.
If this is the case, I would either increase the timer's period to more than 10 secs, or use Monitor.TryEnter instead of a regular lock. TryEnter will try to acquire the lock, and return immediately regardless of whether or not the lock was taken. If the lock is currently taken by another thread, you just skip this tick entirely.
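To make the "skip this tick" behaviour concrete, here is a minimal sketch of the Monitor.TryEnter pattern on its own. The TickGuard class and the TryRunTick method are hypothetical names used only for illustration; they are not part of the question's code.

```csharp
using System;
using System.Threading;

public static class TickGuard
{
    private static readonly object SyncLock = new object();

    // Runs work only if no other thread currently holds the lock.
    // Returns true if the work ran, false if this tick was skipped.
    public static bool TryRunTick(Action work)
    {
        bool lockTaken = false;
        try
        {
            // Does not block: lockTaken is false if another thread owns the lock.
            Monitor.TryEnter(SyncLock, ref lockTaken);
            if (!lockTaken)
            {
                return false;
            }
            work();
            return true;
        }
        finally
        {
            if (lockTaken)
            {
                Monitor.Exit(SyncLock);
            }
        }
    }
}
```

A timer callback could call TickGuard.TryRunTick(ProcessFiles) and simply log when it returns false, so overlapping ticks are dropped instead of queuing up behind the lock.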

Related

Alternative to ParallelForEach That can allow me to kill parallel processes immediately on Application Exit? [duplicate]

This question already has answers here:
Kill child process when parent process is killed
(16 answers)
Closed 4 months ago.
I am writing a simple console application that loads files from a database into a HashSet. These files are then processed in a Parallel.ForEach loop. The application launches a new Process object for each file it needs to process, so it opens new console windows with the parser running. I am doing it this way because of logging issues: if I run the parsing inside the application, logs from different threads write into each other.
The issue is that when I close the application, the Parallel.ForEach loop still tries to process more files before exiting. I want all tasks in the code to stop immediately when I kill the application. Here are code excerpts:
My cancel is borrowed from: Capture console exit C#
Essentially, the program performs some cleanup duties when it receives a cancel command such as CTRL+C or closing the window with the X button.
The code I am trying to cancel is here:
    class Program
    {
        private static bool _isFileLoadingDone;
        static ConcurrentDictionary<int, Tuple<Tdx2KlarfParserProcInfo, string>> _currentProcessesConcurrentDict =
            new ConcurrentDictionary<int, Tuple<Tdx2KlarfParserProcInfo, string>>();

        static void Main(string[] args)
        {
            try
            {
                if (args.Length == 0)
                {
                    // Some boilerplate to react to close window event, CTRL-C, kill, etc
                    LaunchFolderMode();
                }
            }
            catch (Exception)
            {
                // The handler is not shown in the original excerpt; rethrow as a placeholder.
                throw;
            }
        }
    }
Which calls:
    private static void LaunchFolderMode()
    {
        // Some function launched from Task
        ParseFilesUntilEmpty();
    }
And this calls:
    private static void ParseFilesUntilEmpty()
    {
        while (!_isFileLoadingDone)
        {
            ParseFiles();
        }
        ParseFiles();
    }
Which calls:
    private static void ParseFiles()
    {
        // I actually get the files from a db; this is just an example
        filesToProcess = new HashSet<string>() { @"file1", "file2", "file3", "file4" };
        //_fileStack = new ConcurrentStack<string>(filesToProcess);
        int parallelCount = 2;
        Parallel.ForEach(filesToProcess, new ParallelOptions { MaxDegreeOfParallelism = parallelCount },
            tdxFile =>
            {
                ConfigureAndStartProcess(tdxFile);
            });
    }
Which finally calls:
    public static void ConfigureAndStartProcess(object fileName)
    {
        string fileFullPath = fileName.ToString();
        Process proc = new Process();
        string fileFullPathArg1 = fileFullPath;
        string appName = @".\TDXXMLParser.exe";
        if (fileFullPathArg1.Contains(".gz"))
        {
            // I set up the arguments and launch the exes, and add the processes to _currentProcessesConcurrentDict
            StartExe(appName, proc, fileFullPathArg1);
            proc.WaitForExit();
            _currentProcessesConcurrentDict.TryRemove(proc.Id, out Tuple<Tdx2KlarfParserProcInfo, string> procFileTypePair);
            proc.Dispose();
        }
    }
The concurrent dictionary to monitor processes uses the following class in the tuple:
    public class Tdx2KlarfParserProcInfo
    {
        public int ProcId { get; set; }
        public List<long> MemoryAtIntervalList { get; set; } = new List<long>();
    }
To keep these code excerpts short, I omitted the 'StartExe()' function. All it does is set up the arguments and start the Process object.
Why does Parallel.ForEach insist on running even after I close the program? Is there a better parallel processing method that would let me immediately kill whatever files I am currently processing, without trying to start new processes the way Parallel.ForEach does?
I have tried stopping it with the ParallelLoopState.Stop method, but it still tries to process more files before finally exiting.

Unless I'm mistaken, your code seems to do no work on its own; it just launches executables and waits for them to end. And yet you're starving your thread pool with code that's just sitting there waiting for the external processes to end. Now, again if I understand correctly, this part works. It's very wasteful and utterly non-scalable, but it works.
The only thing you seem to be missing is closing the processes early when your own process ends. This is rather trivial: CancellationToken. You simply create a CancellationTokenSource in your main function and pass it down to every worker object, and when your program is meant to end you set it. That only leaves you to respond to it, and that's as easy as replacing your proc.WaitForExit(); with something like
    // this is how we coded in .Net 1.0, released in Feb. 2002.
    while (!proc.HasExited && !ct.IsCancellationRequested)
        Thread.Sleep(1000);
    if (ct.IsCancellationRequested)
        proc.Kill();
Now, if you also want to fix your first problem, start writing async code. Process.WaitForExitAsync(CancellationToken) returns an awaitable task that you can await with a cancellation token, so the work is done for you. Stop using Parallel.ForEach, this isn't the 90s, you have Task.WhenAll to do the collection. And at the end of all this, you'll see that your code will boil down to perhaps 10 good lines of code, instead of the mess you made for yourself.
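As a rough illustration of that async shape, here is a minimal sketch, assuming .NET 5 or later (where Process.WaitForExitAsync is available). ParserLauncher, RunParserAsync and RunAllAsync are hypothetical names, the SemaphoreSlim stands in for MaxDegreeOfParallelism, and passing the file name as the only argument is an assumption about how the parser is invoked.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParserLauncher
{
    // Runs one external process and kills it if the token is cancelled.
    // Assumption: the executable takes the file path as its single argument.
    public static async Task RunParserAsync(string exePath, string file, CancellationToken ct)
    {
        using var proc = Process.Start(new ProcessStartInfo(exePath, file))
            ?? throw new InvalidOperationException("Process could not be started.");
        try
        {
            await proc.WaitForExitAsync(ct); // no thread is blocked while waiting
        }
        catch (OperationCanceledException)
        {
            if (!proc.HasExited)
            {
                proc.Kill();
            }
            throw;
        }
    }

    // Calls runOne for every file, at most maxParallel at a time.
    // Once the token is cancelled, no further files are started.
    public static async Task RunAllAsync(
        IEnumerable<string> files,
        int maxParallel,
        Func<string, CancellationToken, Task> runOne,
        CancellationToken ct)
    {
        using var gate = new SemaphoreSlim(maxParallel);
        var tasks = files.Select(async file =>
        {
            await gate.WaitAsync(ct);
            try
            {
                ct.ThrowIfCancellationRequested();
                await runOne(file, ct);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();
        await Task.WhenAll(tasks);
    }
}
```

A caller would create one CancellationTokenSource in Main, cancel it from the console-exit handler, and pass something like (file, ct) => ParserLauncher.RunParserAsync(@".\TDXXMLParser.exe", file, ct) as runOne.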

Load Test using C# Async Await

I am creating a console program that tests reads and writes to a cache by simulating multiple clients, and I have written the following code. Please help me understand:
Is this the correct way to simulate multiple clients?
What more can I do to make it a genuine load test?
    void Main()
    {
        List<Task<long>> taskList = new List<Task<long>>();
        for (int i = 0; i < 500; i++)
        {
            taskList.Add(TestAsync());
        }
        Task.WaitAll(taskList.ToArray());
        // Average() returns a double, so it cannot be assigned to a long directly
        double averageTime = taskList.Average(t => t.Result);
    }

    public static async Task<long> TestAsync()
    {
        // Returns the total time taken, using a Stopwatch in the same module
        return await Task.Factory.StartNew(() =>
        {
            var watch = Stopwatch.StartNew();
            // Call Cache Read / Write here
            return watch.ElapsedMilliseconds;
        });
    }

Adjusted your code slightly to see how many threads we have at a particular time.
    static volatile int currentExecutionCount = 0;

    static void Main(string[] args)
    {
        List<Task<long>> taskList = new List<Task<long>>();
        var timer = new Timer(Print, null, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(1));
        for (int i = 0; i < 1000; i++)
        {
            taskList.Add(DoMagic());
        }
        Task.WaitAll(taskList.ToArray());
        timer.Change(Timeout.Infinite, Timeout.Infinite);
        timer = null;
        // to check that all the tasks have executed
        Console.WriteLine("Done " + taskList.Sum(t => t.Result));
        Console.ReadLine();
    }

    static void Print(object state)
    {
        Console.WriteLine(currentExecutionCount);
    }

    static async Task<long> DoMagic()
    {
        return await Task.Factory.StartNew(() =>
            {
                Interlocked.Increment(ref currentExecutionCount);
                // place your code here
                Thread.Sleep(TimeSpan.FromMilliseconds(1000));
                Interlocked.Decrement(ref currentExecutionCount);
                return 4;
            }
            // this option hints to the scheduler to use new threads rather than queued thread-pool threads
            , TaskCreationOptions.LongRunning
        );
    }
The result: inside a virtual machine I see 2 to 10 threads running simultaneously if I don't use the hint, and up to 100 with the hint. On a real machine I can see 1000 threads at once. Process Explorer confirms this. Some details on the hint would be helpful.

If the system is very busy, then your clients apparently have to wait a while before their requests are serviced. Your program does not measure this, because your stopwatch starts running when the service request starts.
If you also want to measure the average time until a request is finished, you should start your stopwatch when the request is made, not when the request is serviced.
Your program only takes threads from the thread pool. If you start more tasks than there are threads, some tasks will have to wait before TestAsync starts running. This wait time would be measured if you recorded the time at which Task.Run (here, Task.Factory.StartNew) is called.
Besides the flaw in the time measurements, how many simultaneous service requests do you expect? Are there enough free threads in your thread pool to simulate this? If you expect about 50 service requests at the same time, and the size of your thread pool is only 20 threads, then you'll never run 50 service requests at the same time. Vice versa: if your thread pool is far bigger than your number of expected simultaneous service requests, then you'll measure longer times than is actually the case.
Consider changing the number of threads in your thread pool, and make sure no one else uses any threads of the pool.
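The difference between the two measurements can be sketched as follows. LoadProbe and MeasureAsync are hypothetical names, and cacheCall stands in for the cache read/write that the question elides; this is an illustration of where the two stopwatches start, not a complete load test.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class LoadProbe
{
    // TotalMs is measured from the moment the request is made (includes
    // time spent waiting for a thread-pool thread); ServiceMs is measured
    // only from the moment a thread actually starts servicing the request.
    public static Task<(long TotalMs, long ServiceMs)> MeasureAsync(Action cacheCall)
    {
        var total = Stopwatch.StartNew();       // started when the request is made
        return Task.Run(() =>
        {
            var service = Stopwatch.StartNew(); // started when the request is serviced
            cacheCall();
            service.Stop();
            total.Stop();
            return (total.ElapsedMilliseconds, service.ElapsedMilliseconds);
        });
    }
}
```

Under load, the gap between TotalMs and ServiceMs is the queueing delay described above. If the pool size itself needs adjusting, ThreadPool.SetMinThreads is one knob to look at.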

how to run a background thread properly for a MVC app on shared host?

I need to run a background thread for my MVC 4 app, where the thread wakes up every hour or so to delete old files in database, then goes back to sleep. This method is below:
//delete old files from database
public void CleanDB()
{
while (true)
{
using (UserZipDBContext db = new UserZipDBContext())
{
//delete old files
DateTime timePoint = DateTime.Now.AddHours(-24);
foreach (UserZip file in db.UserFiles.Where(f => f.UploadTime < timePoint))
{
db.UserFiles.Remove(file);
}
db.SaveChanges();
}
//sleep for 1 hour
Thread.Sleep(new TimeSpan(1, 0, 0));
}
}
But where should I start this thread? The answer in this question creates a new Thread and starts it in Global.asax, but this post also mentions that "ASP.NET is not designed for long running tasks". My app would run on a shared host where I don't have admin privileges, so I don't think I can install a separate program for this task.
In short:
Is it okay to start the thread in Global.asax, given that my thread doesn't do much (it sleeps most of the time and the db is small)?
I read that the risk of this approach is that the thread might get killed (though I'm not sure why). How can I detect when the thread is killed, and what can I do about it?
If this is a VERY bad idea, what else can I do on a shared host?
Thanks!
UPDATE
@usr mentioned that methods in Application_Start can be called more than once and suggested using Lazy. Before I read up on that topic, I thought of this approach. Calling SimplePrint.startSingletonThread() multiple times would only instantiate a single thread (I think). Is that correct?
public class SimplePrint
{
private static Thread tInstance = null;
private SimplePrint()
{
}
public static void startSingletonThread()
{
if (tInstance == null)
{
tInstance = new Thread(new ThreadStart(new SimplePrint().printstuff));
tInstance.Start();
}
}
private void printstuff()
{
DateTime d = DateTime.Now;
while (true)
{
Console.WriteLine("thread started at " + d);
Thread.Sleep(2000);
}
}
}

I think you should try Hangfire.
Incredibly easy way to perform fire-and-forget, delayed and recurring
tasks inside ASP.NET applications. No Windows Service required.
Backed by Redis, SQL Server, SQL Azure, MSMQ, RabbitMQ.
So you don't need admin privileges.
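    // Note: Hangfire takes the job as an expression tree, so in practice this body
    // may need to live in a named method that is called here, e.g. () => CleanDB().
    RecurringJob.AddOrUpdate(
        () =>
        {
            using (UserZipDBContext db = new UserZipDBContext())
            {
                //delete old files
                DateTime timePoint = DateTime.Now.AddHours(-24);
                foreach (UserZip file in db.UserFiles.Where(f => f.UploadTime < timePoint))
                {
                    db.UserFiles.Remove(file);
                }
                db.SaveChanges();
            }
        },
        Cron.Hourly);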

ASP.NET is not designed for long-running tasks, yes. But only because their work and data can be lost at any time when the worker process restarts.
You do not keep any state between iterations of your task. The task can safely abort at any time. This is safe to run in ASP.NET.
Starting the thread in Application_Start is a problem because that function can be called multiple times (surprisingly). I suggest you make sure to only start the deletion task once, for example by using Lazy<T> and accessing its Value property in Application_Start.
    static readonly Lazy<object> workerFactory =
        new Lazy<object>(() => { StartThread(); return null; });

    // In Application_Start:
    var dummy = workerFactory.Value;
For some reason I cannot think of a better init-once pattern right now. Nothing without locks, volatile or Interlocked which are solutions of last resort.
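To illustrate why Lazy<T> works as an init-once guard, here is a small self-contained sketch. CleanupWorker, EnsureStarted and StartCount are hypothetical names, and the counter stands in for actually starting the background thread.

```csharp
using System;
using System.Threading;

public static class CleanupWorker
{
    // Counts how many times the start routine actually ran.
    public static int StartCount;

    // Lazy<T> defaults to LazyThreadSafetyMode.ExecutionAndPublication,
    // so the factory runs at most once even under concurrent access.
    private static readonly Lazy<bool> Starter = new Lazy<bool>(() =>
    {
        Interlocked.Increment(ref StartCount);
        // A real application would start the background thread here.
        return true;
    });

    // Safe to call any number of times, from any thread.
    public static void EnsureStarted()
    {
        _ = Starter.Value;
    }
}
```

Calling CleanupWorker.EnsureStarted() from Application_Start then has the same effect no matter how many times Application_Start runs.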

Sporadic memory bloat using Toub's thread pool for long running tasks?

I have read that Toub's thread pool is a good solution for longer-running tasks, so I implemented it in the following code. I'm not even sure if my implementation is a good one, because I seem to have sporadic memory bloat. The process runs at around 50 MB most of the time, then spikes to almost a GB and stays there.
The thread pool implementation is as follows (should I even be doing this?):
private void Run()
{
while (!_stop)
{
// Create new threads if we have room in the pool
while (ManagedThreadPool.ActiveThreads < _runningMax)
{
ManagedThreadPool.QueueUserWorkItem(new WaitCallback(FindWork));
}
// Pause for a second so we don't run the CPU to death
Thread.Sleep(1000);
}
}
The method FindWork looks like this:
private void FindWork(object stateInfo)
{
bool result = false;
bool process = false;
bool queueResult = false;
Work_Work work = null;
try
{
using (Queue workQueue = new Queue(_workQueue))
{
// Look for work on the work queue
workQueue.Open(Queue.Mode.Consume);
work = workQueue.ConsumeWithBlocking<Work_Work>();
// Do some work with the message from the queue ...
return;
The ConsumeWithBlocking method blocks if there is nothing in the queue. Then we call return to exit the thread if we successfully retrieve a message and process it.
Typically we run 10 threads with them typically in the blocking state (WaitSleepJoin). The whole point of this is to have 10 threads running at all times.
Am I going about this all wrong?

AutoResetEvent Reset Method

Super simple question, but I just wanted some clarification. I want to be able to restart a thread using AutoResetEvent, so I call the following sequence of methods on my AutoResetEvent.
setupEvent.Reset();
setupEvent.Set();
I know it's really obvious, but MSDN doesn't state in their documentation that the Reset method restarts the thread, just that it sets the state of the event to non-signaled.
UPDATE:
Yes, the other thread is waiting at WaitOne(). I'm assuming that when it is signaled it will resume at the exact point where it left off, which is what I don't want; I want it to restart from the beginning. The following example from this valuable resource illustrates this:
static void Main()
{
new Thread (Work).Start();
_ready.WaitOne(); // First wait until worker is ready
lock (_locker) _message = "ooo";
_go.Set(); // Tell worker to go
_ready.WaitOne();
lock (_locker) _message = "ahhh"; // Give the worker another message
_go.Set();
_ready.WaitOne();
lock (_locker) _message = null; // Signal the worker to exit
_go.Set();
}
static void Work()
{
while (true)
{
_ready.Set(); // Indicate that we're ready
_go.WaitOne(); // Wait to be kicked off...
lock (_locker)
{
if (_message == null) return; // Gracefully exit
Console.WriteLine (_message);
}
}
}
If I understand this example correctly, notice how the Main thread will resume where it left off when the Work thread signals it, but in my case, I would want the Main thread to restart from the beginning.
UPDATE 2:
@Jaroslav Jandek - It's quite involved, but basically I have a CopyDetection thread that runs a FileSystemWatcher to monitor a folder for any new files that are moved or copied into it. My second thread is responsible for replicating the structure of that particular folder into another folder. So my CopyDetection thread has to block that thread from working while a copy/move operation is in progress. When the operation completes, the CopyDetection thread restarts the second thread so it can re-duplicate the folder structure with the newly added files.
UPDATE 3:
@SwDevMan81 - I actually didn't think about that, and it would work save for one caveat. In my program, the source folder that is being duplicated is emptied once the duplication process is complete. That's why I have to block and restart the second thread when new items are added to the source folder, so it has a chance to re-parse the folder's new structure properly.
To address this, I'm thinking of adding a flag that signals that it is safe to delete the source folder's contents. I guess I could put the delete operation on its own Cleanup thread.
@Jaroslav Jandek - My apologies, I thought it would be a simple matter to restart a thread on a whim. To answer your questions: I'm not deleting the source folder, only its contents. It's a requirement from my employer that unfortunately I cannot change. Files in the source folder are getting moved, but not all of them, only files that are properly validated by another process; the rest must be purged, i.e. the source folder is emptied. Also, the reason for replicating the source folder structure is that some of the files are contained within a very strict sub-folder hierarchy that must be preserved in the destination directory. Again, sorry for making it complicated. All of these mechanisms are in place, have been tested and are working, which is why I didn't feel the need to elaborate on them. I only need to detect when new files are added so I can properly halt the other processes while the copy/move operation is in progress; then I can safely replicate the source folder structure and resume processing.

So thread 1 monitors and thread 2 replicates while other processes modify the monitored files.
Concurrent file access aside, you can't continue replicating after a change. So a successful replication only occurs when there is a long enough delay between modifications. Replication cannot be stopped immediately, since you replicate in chunks.
So the result of monitoring should be a command (file copy, file delete, file move, etc.).
The result of a successful replication should be an execution of a command.
Considering multiple operations can occur, you need a queue (or queued dictionary - to only perform 1 command on a file) of commands.
// T1:
somethingChanged(string path, CT commandType)
{
commandQueue.AddCommand(path, commandType);
}
// T2:
while (whatever)
{
var command = commandQueue.Peek();
if (command.Execute()) commandQueue.Remove();
else // operation failed, do what you like.
}
Now you may ask how to create a thread-safe queue, but that probably belongs in another question (there are many implementations on the web).
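For what it's worth, one possible shape for such a queue is sketched below, built on ConcurrentQueue<T> from the base class library. CommandQueue and TryExecuteNext are hypothetical names, a command is modelled as a Func<bool> that reports success, and the sketch assumes a single consumer thread (T2), as in the pseudocode above.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class CommandQueue
{
    private readonly ConcurrentQueue<Func<bool>> _commands = new ConcurrentQueue<Func<bool>>();

    public int Count => _commands.Count;

    // T1 (monitoring thread): record a command for later execution.
    public void AddCommand(Func<bool> command) => _commands.Enqueue(command);

    // T2 (replicating thread): peek at the oldest command, execute it, and
    // remove it only if it succeeded. Returns true if a command was executed
    // and removed; false if the queue was empty or the command failed.
    public bool TryExecuteNext()
    {
        if (!_commands.TryPeek(out var command))
        {
            return false;
        }
        if (!command())
        {
            return false; // leave the failed command at the head for a retry
        }
        return _commands.TryDequeue(out _);
    }
}
```

Leaving a failed command at the head mirrors the Peek/Remove split in the pseudocode; how to handle repeated failures is left open, as in the original.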
EDIT (queue-less version with whole-directory replication; can also be used with a queue):
If you do not need multiple operations (e.g. you always replicate the whole directory) and expect the replication to always either finish or fail and cancel, you can do:
private volatile bool shouldStop = true;
// T1:
directoryChanged()
{
// StopReplicating
shouldStop = true;
workerReady.WaitOne(); // Wait for the worker to stop replicating.
// StartReplicating
shouldStop = false;
replicationStarter.Set();
}
// T2:
while (whatever)
{
replicationStarter.WaitOne();
... // prepare, throw some shouldStops so worker does not have to work too much.
if (!shouldStop)
{
foreach (var file in files)
{
if (shouldStop) break;
// Copy the file or whatever.
}
}
workerReady.Set();
}

I think this example clarifies (to me anyway) how reset events work:
var resetEvent = new ManualResetEvent(false);
var myclass = new MyAsyncClass();
myclass.MethodFinished += delegate
{
resetEvent.Set();
};
myclass.StartAsyncMethod();
resetEvent.WaitOne(); //We want to wait until the event fires to go on
Assume that MyAsyncClass runs the method on another thread and fires the event when complete.
This basically turns the asynchronous "StartAsyncMethod" into a synchronous one. Many times I find a real-life example more useful.
The main difference between AutoResetEvent and ManualResetEvent is that AutoResetEvent doesn't require you to call Reset(): it automatically returns to the non-signaled state ("false") after releasing a single waiting thread, whereas ManualResetEvent stays signaled until Reset() is called. A call to WaitOne() blocks while the state is non-signaled.
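The difference can be observed directly with two non-blocking waits after a single Set(). The sketch below uses a hypothetical helper, ResetEventDemo.SetThenWaitTwice; WaitOne(0) tests the state of the handle without blocking.

```csharp
using System.Threading;

public static class ResetEventDemo
{
    // Signals the handle once, then performs two non-blocking waits
    // and reports whether each of them succeeded.
    public static (bool First, bool Second) SetThenWaitTwice(EventWaitHandle handle)
    {
        handle.Set();
        bool first = handle.WaitOne(0);  // consumes the signal on an AutoResetEvent
        bool second = handle.WaitOne(0); // still signaled only on a ManualResetEvent
        return (first, second);
    }
}
```

With an AutoResetEvent the second wait fails because the first one reset the event; with a ManualResetEvent both waits succeed until Reset() is called.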

You just need to make it loop like the other Thread does. Is this what you are looking for?
class Program
{
static AutoResetEvent _ready = new AutoResetEvent(false);
static AutoResetEvent _go = new AutoResetEvent(false);
static Object _locker = new Object();
static string _message = "Start";
static AutoResetEvent _exitClient = new AutoResetEvent(false);
static AutoResetEvent _exitWork = new AutoResetEvent(false);
static void Main()
{
new Thread(Work).Start();
new Thread(Client).Start();
Thread.Sleep(3000); // Run for 3 seconds then finish up
_exitClient.Set();
_exitWork.Set();
_ready.Set(); // Make sure we're not still blocking
_go.Set();
}
static void Client()
{
List<string> messages = new List<string>() { "ooo", "ahhh", null };
int i = 0;
while (!_exitClient.WaitOne(0)) // Gracefully exit if triggered
{
_ready.WaitOne(); // First wait until worker is ready
lock (_locker) _message = messages[i++];
_go.Set(); // Tell worker to go
if (i == 3) { i = 0; }
}
}
static void Work()
{
while (!_exitWork.WaitOne(0)) // Gracefully exit if triggered
{
_ready.Set(); // Indicate that we're ready
_go.WaitOne(); // Wait to be kicked off...
lock (_locker)
{
if (_message != null)
{
Console.WriteLine(_message);
}
}
}
}
}
