I have an application which downloads a single file in 4 different segments (each segment is a separate long-running task) and serializes the progress at periodic intervals.
When the serialization takes place on a 2-core machine, the process blocks for 3-10 seconds and then completes. Once this blocking behavior has happened the first time, it never happens again: the 2nd through nth calls to Serialize execute immediately without a hitch. It appears that during the first, blocking call the framework did some Task/threading optimization that prevents the blocking behavior from occurring again.
Does anyone have any insight into this optimization, and is it possible to do this type of optimization at initialization to avoid the blocking behavior altogether?
Here is some pseudo code to help describe the situation:
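using System.Threading;
using System.Threading.Tasks;
class Download
{
    private readonly DownloadSegment[] _downloadSegments = new DownloadSegment[4];
    public void StartDownload()
    {
        for (int i = 0; i < _downloadSegments.Length; i++)
        {
            _downloadSegments[i] = new DownloadSegment();
            _downloadSegments[i].Start();
        }
    }
    public void Serialize()
    {
        // Number of segment locks actually acquired; only those are released below
        // (Monitor.Exit on a lock that is not held throws SynchronizationLockException).
        int acquired = 0;
        try
        {
            // Enter every segment's lock so the progress is not serialized
            // while a segment is writing newly downloaded data at the same time.
            foreach (DownloadSegment segment in _downloadSegments)
            {
                bool lockTaken = false;
                Monitor.Enter(segment.SyncRoot, ref lockTaken);
                if (lockTaken) acquired++;
            }
            // code to serialize the progress
        }
        finally
        {
            for (int i = 0; i < acquired; i++)
            {
                Monitor.Exit(_downloadSegments[i].SyncRoot);
            }
        }
    }
}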
class DownloadSegment
{
    public readonly object SyncRoot = new object();
    public void Start()
    {
        Task downloadTask = new Task(
            () =>
            {
                // download code
                lock (SyncRoot)
                {
                    // write-to-disk code
                }
            },
            CancellationToken.None,
            TaskCreationOptions.LongRunning);
        downloadTask.Start(TaskScheduler.Default);
    }
}
Thank you for your suggestions, usr. It turns out the FileStream write was causing the problem.
It turns out that when you use FileStream.SetLength() to allocate space for a large file and then seek to a position deep inside that file, the first write at that position causes the allocated but not-yet-written space up to that position to be "backfilled" (zero-filled), and this can be slow. The locks in my code simply expose the problem by making the unblocked segments wait on the ones that are busy.
I will be posting a separate question to see if there is a more efficient way to write/create these files.
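If the goal is to pay the backfill cost once, at initialization, one option is to write the whole file sequentially before the segments start, so later writes deep inside the file land on space that already holds written data. This is a minimal sketch under that assumption; the `FilePreallocation.PreallocateWithZeros` helper and the 1 MiB chunk size are illustrative choices, not something from the question, and how much it helps depends on the file system.

```csharp
using System;
using System.IO;

static class FilePreallocation
{
    // Hypothetical helper: writes zeros from offset 0 up to 'length' so that
    // every byte has been written before the download segments start seeking
    // and writing deep inside the file.
    public static void PreallocateWithZeros(string path, long length, int chunkSize = 1 << 20)
    {
        var zeros = new byte[chunkSize];
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            long remaining = length;
            while (remaining > 0)
            {
                int count = (int)Math.Min(chunkSize, remaining);
                stream.Write(zeros, 0, count);
                remaining -= count;
            }
            stream.Flush(true); // also flush OS buffers to disk
        }
    }
}
```

This does not make the total I/O cheaper; it only moves the slow part to a point where no segment is holding a lock that Serialize needs.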
I have the following use case. Multiple threads are creating data points which are collected in a ConcurrentBag. Every x ms a single consumer thread looks at the data points that came in since the last time and processes them (e.g. counts them and calculates the average).
The following code more or less represents the solution that I came up with:
private static ConcurrentBag<long> _bag = new ConcurrentBag<long>();
static void Main()
{
    Task.Run(() => Consume());
    var producerTasks = Enumerable.Range(0, 8).Select(i => Task.Run(() => Produce()));
    Task.WaitAll(producerTasks.ToArray());
}
private static void Produce()
{
    for (int i = 0; i < 100000000; i++)
    {
        _bag.Add(i);
    }
}
private static void Consume()
{
    while (true)
    {
        var oldBag = _bag;
        _bag = new ConcurrentBag<long>();
        var average = oldBag.DefaultIfEmpty().Average();
        var count = oldBag.Count;
        Console.WriteLine($"Avg = {average}, Count = {count}");
        // Wait x ms
    }
}
Is a ConcurrentBag the right tool for the job here?
Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?
Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?
Should I use Interlocked.Exchange() for switching the variables?
EDIT
I guess the above code was not really a good representation of what I'm trying to achieve. So here is some more code to show the problem:
public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
    private readonly List<string> _logMessageBuffer;
    public LogCollectorTarget()
    {
        _logMessageBuffer = new List<string>();
    }
    protected override void Write(LogEventInfo logEvent)
    {
        var logMessage = Layout.Render(logEvent);
        lock (_logMessageBuffer)
        {
            _logMessageBuffer.Add(logMessage);
        }
    }
    public string GetBuffer()
    {
        lock (_logMessageBuffer)
        {
            var messages = string.Join(Environment.NewLine, _logMessageBuffer);
            _logMessageBuffer.Clear();
            return messages;
        }
    }
}
The class's purpose is to collect logs so they can be sent to a server in batches. Every x seconds GetBuffer is called. This should get the current log messages and clear the buffer for new messages. It works with locks, but as they are quite expensive, I don't want to lock on every logging operation in my program. That's why I wanted to use a ConcurrentBag as a buffer. But then I still need to switch or clear it when I call GetBuffer, without losing any log messages that arrive during the switch.
Since you have a single consumer, you can get by with a simple ConcurrentQueue, without swapping collections:
public class LogCollectorTarget : TargetWithLayout, ILogCollector
{
    private readonly ConcurrentQueue<string> _logMessageBuffer;
    public LogCollectorTarget()
    {
        _logMessageBuffer = new ConcurrentQueue<string>();
    }
    protected override void Write(LogEventInfo logEvent)
    {
        var logMessage = Layout.Render(logEvent);
        _logMessageBuffer.Enqueue(logMessage);
    }
    public string GetBuffer()
    {
        // How many messages should we dequeue?
        var count = _logMessageBuffer.Count;
        var messages = new StringBuilder();
        while (count > 0 && _logMessageBuffer.TryDequeue(out var message))
        {
            messages.AppendLine(message);
            count--;
        }
        return messages.ToString();
    }
}
If memory allocations become an issue, you can instead dequeue them to a fixed-size array and call string.Join on it. This way, you're guaranteed to do only two allocations (whereas the StringBuilder could do many more if the initial buffer isn't properly sized):
public string GetBuffer()
{
    // How many messages should we dequeue?
    var count = _logMessageBuffer.Count;
    var buffer = new string[count];
    for (int i = 0; i < count; i++)
    {
        // Single consumer: the queue can only grow after Count was read,
        // so each of these TryDequeue calls succeeds.
        _logMessageBuffer.TryDequeue(out var message);
        buffer[i] = message;
    }
    return string.Join(Environment.NewLine, buffer);
}
Is a ConcurrentBag the right tool for the job here?
Whether it's the right tool really depends on what you are trying to do, and why. The example you have given is very simplistic and lacks context, so it's hard to tell.
Is switching the bags the right way to achieve clearing the list for new data points and then processing the old ones?
The answer is no, probably for many reasons. What happens if a thread writes to it while you are switching it?
Is it safe to operate on oldBag or could I run into trouble when I iterate over oldBag and a thread is still adding an item?
No. You have just copied the reference; that alone achieves nothing.
Should I use Interlocked.Exchange() for switching the variables?
Interlocked methods are great things; however, they will not help you with your current problem. Interlocked.Exchange can atomically swap the reference, but it cannot tell you whether another thread is still adding to the old bag. You are really confused, and you need to look up more thread-safe examples.
However, let's point you in the right direction. Forget about ConcurrentBag and those fancy classes. My advice is to start simple and use locking so you understand the nature of the problem.
If you want multiple tasks/threads to access a list, you can easily use the lock statement and guard access to the list/array so other nasty threads aren't modifying it.
Obviously the code you have written is a nonsensical example: you are just adding consecutive numbers to a list and getting another thread to average them. This hardly needs to be producer/consumer at all, and it would make more sense to just be synchronous.
At this point I would point you to better architectures that would allow you to implement this pattern, e.g. TPL Dataflow, but I fear this is just a learning exercise, and unfortunately you really need to do more reading on multithreading and try more examples before we can truly help you with a problem.
It works with locks, but they are quite expensive. I don't want to lock on every logging operation in my program.
Acquiring an uncontended lock is actually quite cheap. Quoting from Joseph Albahari's book:
You can expect to acquire and release a lock in as little as 20 nanoseconds on a 2010-era computer if the lock is uncontended.
Locking becomes expensive when it is contended. You can minimize the contention by reducing the work inside the critical region to the absolute minimum. In other words, don't do anything inside the lock that can be done outside the lock. In your second example the method GetBuffer does a String.Join inside the lock, delaying the release of the lock and increasing the chances of blocking other threads. You can improve it like this:
public string GetBuffer()
{
    string[] messages;
    lock (_logMessageBuffer)
    {
        messages = _logMessageBuffer.ToArray();
        _logMessageBuffer.Clear();
    }
    return String.Join(Environment.NewLine, messages);
}
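But it can be optimized even further. You could use the technique of your first example, and instead of clearing the existing List<string>, just swap it with a new list. For this to work, the _logMessageBuffer field can no longer be readonly, and both Write and GetBuffer must lock on a separate, dedicated object (here called _syncRoot). Locking on the list itself is unsafe once the reference changes, because a writer could lock the old list and then add to the new one without holding its lock:
public string GetBuffer()
{
    List<string> oldList;
    // Lock on a dedicated object, not on the list: the list reference changes below.
    lock (_syncRoot)
    {
        oldList = _logMessageBuffer;
        _logMessageBuffer = new();
    }
    return String.Join(Environment.NewLine, oldList);
}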
Starting from .NET Core 3.0, the Monitor class has the property Monitor.LockContentionCount, that returns the number of times there was contention at the entry point of a lock. You could watch the delta of this property every second, and see if the number is concerning. If you get single-digit numbers, there is nothing to worry about.
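A minimal sketch of that monitoring idea, assuming .NET Core 3.0 or later: a timer samples Monitor.LockContentionCount once per interval and reports the difference. The LockContentionMonitor class name and its report callback are illustrative, not an existing API.

```csharp
using System;
using System.Threading;

static class LockContentionMonitor
{
    // Hypothetical helper: every 'interval', reports how many contended lock
    // acquisitions happened process-wide since the previous sample.
    public static Timer Start(TimeSpan interval, Action<long> report)
    {
        long previous = Monitor.LockContentionCount;
        return new Timer(_ =>
        {
            long current = Monitor.LockContentionCount;
            // Swap in the new sample and compute the delta against the old one.
            long delta = current - Interlocked.Exchange(ref previous, current);
            report(delta);
        }, null, interval, interval);
    }
}
```

Keep a reference to the returned Timer (for example `var timer = LockContentionMonitor.Start(TimeSpan.FromSeconds(1), d => Console.WriteLine($"Contentions/s: {d}"));`), otherwise it can be garbage collected and stop firing.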
Touching some of your questions:
Is a ConcurrentBag the right tool for the job here?
No. The ConcurrentBag<T> is a very specialized collection intended for mixed producer-consumer scenarios (where the same threads both add and take items), mainly object pools. You don't have such a scenario here. A ConcurrentQueue<T> is preferable to a ConcurrentBag<T> in almost all scenarios.
Should I use Interlocked.Exchange() for switching the variables?
Only if the collection was immutable. If the _logMessageBuffer was an ImmutableQueue<T>, then it would be excellent to swap it with Interlocked.Exchange. With mutable types you have no idea if the old collection is still in use by another thread, and for how long. The operating system can suspend any thread at any time for a duration of 10-30 milliseconds or even more (demo). So it's not safe to use lock-free techniques. You have to lock.
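To illustrate the immutable variant mentioned above, here is a minimal sketch (the ImmutableLogBuffer class is hypothetical) where writers enqueue lock-free with ImmutableInterlocked.Enqueue and GetBuffer swaps in an empty queue with Interlocked.Exchange. Because every queue instance is immutable, once the swap has happened no other thread can still be modifying the old one.

```csharp
using System;
using System.Collections.Immutable;
using System.Threading;

public class ImmutableLogBuffer
{
    // Each value stored in this field is an immutable snapshot.
    private ImmutableQueue<string> _messages = ImmutableQueue<string>.Empty;

    public void Write(string message)
    {
        // Lock-free enqueue: retries a compare-and-swap until it succeeds.
        ImmutableInterlocked.Enqueue(ref _messages, message);
    }

    public string GetBuffer()
    {
        // Atomically take the current queue and leave an empty one behind.
        ImmutableQueue<string> old = Interlocked.Exchange(ref _messages, ImmutableQueue<string>.Empty);
        return string.Join(Environment.NewLine, old); // enumerates in FIFO order
    }
}
```

The trade-off is an allocation per enqueue and possible compare-and-swap retries when many threads log at once, so it is worth measuring against the plain lock.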
My program needs to write messages to several files very often. As this is very time-consuming, I need to optimize it. Below, you can find an extract from my program where I try to write to the files asynchronously in the background. It seems to work, but I am not sure if it is best practice, as I do not dispose the tasks (this part is commented out). I don't do it because I don't want my program to wait for those tasks to complete. I simply want my message to be written to a few files in the background as quickly as possible. As those files could be accessed by several threads, I added locks.
I use static methods because these methods are used everywhere in my code and I do not want to instantiate this class just to write one line of a message to a file everywhere (maybe that's wrong).
================== Class ==============================================
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;
namespace test
{
    public static class MessageLOG
    {
        private static string debugFileName = Settings.DebugLOGfilename;
        private static string logFileName = Settings.LOGfilename;
        private static readonly object DebuglockOn = new object();
        private static readonly object LoglockOn = new object();
        private static StreamWriter DebugSW;
        private static StreamWriter LogSW;
        // Line counters must be static fields: a local would be reset to 0 on every call.
        private static uint debugLinesCount;
        private static uint logLinesCount;
        private static void DebugFile(string message)
        {
            lock (DebuglockOn)
            {
                // Create the writer inside the lock so two threads cannot both create it.
                if (DebugSW == null && !string.IsNullOrEmpty(debugFileName))
                    DebugSW = new StreamWriter(debugFileName);
                if (DebugSW != null)
                {
                    DebugSW.WriteLine(message);
                    if (++debugLinesCount > 10)
                    {
                        DebugSW.Flush();
                        debugLinesCount = 0;
                    }
                }
            }
        }
        private static void LogFile(string message)
        {
            lock (LoglockOn)
            {
                if (LogSW == null && !string.IsNullOrEmpty(logFileName))
                    LogSW = new StreamWriter(logFileName);
                if (LogSW != null)
                {
                    LogSW.WriteLine(string.Format("{0} ({1}): {2}", DateTime.Now.ToShortDateString(), DateTime.Now.ToShortTimeString(), message));
                    if (++logLinesCount > 10)
                    {
                        LogSW.Flush();
                        logLinesCount = 0;
                    }
                }
            }
        }
        public static void LogUpdate(string message)
        {
            ThreadPool.QueueUserWorkItem(x => LogFile(message));
            ThreadPool.QueueUserWorkItem(x => DebugFile(message));
            ThreadPool.QueueUserWorkItem(x => Debug.WriteLine(message));
        }
        // This method will be called when the main thread is being closed
        public static void CloseAllStreams()
        {
            lock (DebuglockOn)
            {
                if (DebugSW != null)
                {
                    DebugSW.Flush();
                    DebugSW.Close();
                }
            }
            lock (LoglockOn)
            {
                if (LogSW != null)
                {
                    LogSW.Flush();
                    LogSW.Close();
                }
            }
        }
    }
}
=============== main window ===========
void MainWindow()
{
    // ... some code ...
    MessageLOG.LogUpdate("Message text");
    // ... code cont ...
    MessageLOG.CloseAllStreams();
}
You should re-think your design. Your locks should not be local variables in your method. This is redundant because each method call creates a new object and locks on it. This will not force synchronization across multiple threads (https://msdn.microsoft.com/en-us/library/c5kehkcz(v=vs.80).aspx). Since your methods are static, the locks need to be static variables, and you should have a different lock per file. You can use ThreadPool.QueueUserWorkItem (https://msdn.microsoft.com/en-us/library/kbf0f1ct(v=vs.110).aspx) instead of Tasks. ThreadPool is an internal .NET class that re-uses threads to run async operations. This is perfect for your use case because you don't need control over each thread. You just need some async operation to execute and finish on its own.
A better approach would be to create a logger class that runs on its own thread. You can have a queue and enqueue messages from multiple threads and then have the LoggerThread handle writing to the file. This will ensure that only one thread is ever writing to the file. This will also maintain logging order if you use a FIFO queue. You will no longer need to lock writing to the file, but you will need to lock your queue. You can use the .NET Monitor (https://msdn.microsoft.com/en-us/library/system.threading.monitor(v=vs.110).aspx) class to block the LoggerThread until a message is queued (look at methods Enter/Wait/Pulse). To optimize it even more, you can now keep a stream open to the file and push data to it as it gets queued. Since only one thread ever accesses the file, this will be OK. Just remember to close the stream to the file when you are done. You can also set up a timer that goes off once in a while to flush the content. Keeping the stream open is not always recommended, especially if you anticipate other applications attempting to lock the file. However, in this case, it might be OK. This will be a design decision you need to make that fits best with your application.
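A minimal sketch of that dedicated logger thread, using Monitor.Wait/Pulse as suggested. QueuedLogger is a hypothetical name, and it writes to any TextWriter (for example a StreamWriter opened once on the log file); it does not dispose the writer, so the caller still closes the stream when done.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

// One background thread owns the writer; other threads only enqueue messages.
public sealed class QueuedLogger : IDisposable
{
    private readonly Queue<string> _queue = new Queue<string>();
    private readonly object _sync = new object();
    private readonly TextWriter _writer;
    private readonly Thread _thread;
    private bool _stopping;

    public QueuedLogger(TextWriter writer)
    {
        _writer = writer;
        _thread = new Thread(Run) { IsBackground = true, Name = "LoggerThread" };
        _thread.Start();
    }

    public void Log(string message)
    {
        lock (_sync)
        {
            _queue.Enqueue(message);
            Monitor.Pulse(_sync); // wake the logger thread if it is waiting
        }
    }

    private void Run()
    {
        while (true)
        {
            string message;
            lock (_sync)
            {
                // Wait releases the lock while blocked and re-acquires it on wake-up.
                while (_queue.Count == 0 && !_stopping)
                    Monitor.Wait(_sync);
                if (_queue.Count == 0)
                    break; // stopping and fully drained
                message = _queue.Dequeue();
            }
            // Only this thread touches the writer, so no lock is needed here.
            _writer.WriteLine(message);
        }
        _writer.Flush();
    }

    public void Dispose()
    {
        lock (_sync)
        {
            _stopping = true;
            Monitor.Pulse(_sync);
        }
        _thread.Join(); // wait until the remaining messages have been written
    }
}
```

Because the queue is FIFO and only one thread writes, messages reach the file in the order Log was called.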
You're opening a new stream and committing writes for each write action: very poor performance.
My recommendation is to use only one StreamWriter per file; this instance must be a class field, and you still need the lock to ensure it is thread-safe.
Also, this means you must not use a using statement in each write method.
Also, periodically (maybe every X writes) you could call Stream.Flush to commit the writes to disk. This Flush must be protected by the lock.
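A minimal sketch of that recommendation: one long-lived writer per file, a lock around every write, and a flush every N writes under the same lock. SharedFileLog and the flushEvery parameter are illustrative names; it takes a TextWriter so you can pass, for example, `new StreamWriter("debug.log", append: true)`.

```csharp
using System;
using System.IO;

// One long-lived writer per file, shared by all threads.
public sealed class SharedFileLog : IDisposable
{
    private readonly object _sync = new object();
    private readonly TextWriter _writer;
    private readonly int _flushEvery;
    private int _writesSinceFlush; // a field, so the count survives between calls

    public SharedFileLog(TextWriter writer, int flushEvery = 10)
    {
        _writer = writer;
        _flushEvery = flushEvery;
    }

    public void WriteLine(string message)
    {
        lock (_sync)
        {
            _writer.WriteLine(message);
            if (++_writesSinceFlush >= _flushEvery)
            {
                _writer.Flush(); // flushed under the same lock as the writes
                _writesSinceFlush = 0;
            }
        }
    }

    public void Dispose()
    {
        lock (_sync)
        {
            _writer.Flush();
            _writer.Dispose();
        }
    }
}
```

You would keep one static instance per file and call Dispose on shutdown so the last partial batch is flushed.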
I need to run a background thread for my MVC 4 app, where the thread wakes up every hour or so to delete old files in database, then goes back to sleep. This method is below:
//delete old files from database
public void CleanDB()
{
    while (true)
    {
        using (UserZipDBContext db = new UserZipDBContext())
        {
            //delete old files
            DateTime timePoint = DateTime.Now.AddHours(-24);
            foreach (UserZip file in db.UserFiles.Where(f => f.UploadTime < timePoint))
            {
                db.UserFiles.Remove(file);
            }
            db.SaveChanges();
        }
        //sleep for 1 hour
        Thread.Sleep(new TimeSpan(1, 0, 0));
    }
}
But where should I start this thread? The answer in this question creates a new Thread and starts it in Global.asax, but this post also mentions that "ASP.NET is not designed for long running tasks". My app would run on a shared host where I don't have admin privileges, so I don't think I can install a separate program for this task.
In short:
Is it okay to start the thread in Global.asax, given that my thread doesn't do much (it sleeps most of the time and the database is small)?
I read that the risk of this approach is that the thread might get killed (though I'm not sure why). How can I detect when the thread is killed, and what can I do about it?
If this is a VERY bad idea, what else can I do on a shared host?
Thanks!
UPDATE
@usr mentioned that methods in Application_Start can be called more than once and suggested using Lazy. Before I read up on that topic, I thought of this approach. Calling SimplePrint.startSingletonThread() multiple times would only instantiate a single thread (I think). Is that correct?
public class SimplePrint
{
    private static Thread tInstance = null;
    private SimplePrint()
    {
    }
    public static void startSingletonThread()
    {
        if (tInstance == null)
        {
            tInstance = new Thread(new ThreadStart(new SimplePrint().printstuff));
            tInstance.Start();
        }
    }
    private void printstuff()
    {
        DateTime d = DateTime.Now;
        while (true)
        {
            Console.WriteLine("thread started at " + d);
            Thread.Sleep(2000);
        }
    }
}
I think you should try Hangfire.
Incredibly easy way to perform fire-and-forget, delayed and recurring tasks inside ASP.NET applications. No Windows Service required.
Backed by Redis, SQL Server, SQL Azure, MSMQ, RabbitMQ.
So you don't need admin privileges.
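// Hangfire stores the job as an expression tree, so the lambda must be a single
// method call; a statement-bodied lambda does not compile here.
// "delete-old-files" is an arbitrary job id; FileCleanup is an illustrative class name.
RecurringJob.AddOrUpdate(
    "delete-old-files",
    () => FileCleanup.DeleteOldFiles(),
    Cron.Hourly());
public static class FileCleanup
{
    public static void DeleteOldFiles()
    {
        using (UserZipDBContext db = new UserZipDBContext())
        {
            //delete old files
            DateTime timePoint = DateTime.Now.AddHours(-24);
            foreach (UserZip file in db.UserFiles.Where(f => f.UploadTime < timePoint))
            {
                db.UserFiles.Remove(file);
            }
            db.SaveChanges();
        }
    }
}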
ASP.NET is not designed for long-running tasks, yes. But only because their work and data can be lost at any time when the worker process restarts.
You do not keep any state between iterations of your task. The task can safely abort at any time. This is safe to run in ASP.NET.
Starting the thread in Application_Start is a problem because that function can be called multiple times (surprisingly). I suggest you make sure to only start the deletion task once, for example by using Lazy<T> and accessing its Value property in Application_Start.
static readonly Lazy<object> workerFactory =
    new Lazy<object>(() => { StartThread(); return null; });
// In Application_Start:
var dummy = workerFactory.Value;
For some reason I cannot think of a better init-once pattern right now. Nothing without locks, volatile or Interlocked which are solutions of last resort.
I am working on a WPF project with C# (.NET 4.0) to capture a sequence of 300 video frames from a high-speed camera that need to be saved to disk (BMP format). The video frames need to be captured in near-exact time intervals, so I can't save the frames to disk as they're being captured -- the disk I/O is unpredictable and it throws off the time intervals between frames. The capture card has about 60 frame buffers available.
I'm not sure what the best approach is for implementing a solution to this problem. My initial thoughts are to create a "BufferToDisk" thread that saves the images from the frame buffers as they become available. In this scenario, the main thread captures a frame buffer and then signals the thread to indicate that it is OK to save the frame. The problem is that the frames are being captured quicker than the thread can save the files, so there needs to be some kind of synchronization to deal with this. I was thinking a Semaphore would be a good tool for this job. I have never used a Semaphore in this way, though, so I'm not sure how to proceed.
Is this a reasonable approach to this problem? If so, can someone post some code to get me started?
Any help is much appreciated.
Edit:
After looking over the linked "Threading in C# - Part 2" book excerpt, I decided to implement the solution by adapting the "ProducerConsumerQueue" class example. Here is my adapted code:
class ProducerConsumerQueue : IDisposable
{
    EventWaitHandle _wh = new AutoResetEvent(false);
    Thread _worker;
    readonly object _locker = new object();
    Queue<string> _tasks = new Queue<string>();
    public ProducerConsumerQueue()
    {
        _worker = new Thread(Work);
        _worker.Start();
    }
    public void EnqueueTask(string task)
    {
        lock (_locker) _tasks.Enqueue(task);
        _wh.Set();
    }
    public void Dispose()
    {
        EnqueueTask(null);  // Signal the consumer to exit.
        _worker.Join();     // Wait for the consumer's thread to finish.
        _wh.Close();        // Release any OS resources.
    }
    void Work()
    {
        while (true)
        {
            string task = null;
            lock (_locker)
                if (_tasks.Count > 0)
                {
                    task = _tasks.Dequeue();
                    if (task == null)
                    {
                        return;
                    }
                }
            if (task != null)
            {
                // parse the parameters from the input queue item
                string[] indexVals = task.Split(',');
                int frameNum = Convert.ToInt32(indexVals[0]);
                int fileNum = Convert.ToInt32(indexVals[1]);
                string path = indexVals[2];
                // build the file name
                string newFileName = String.Format("img{0:d3}.bmp", fileNum);
                string fqfn = System.IO.Path.Combine(path, newFileName);
                // save the captured image to disk
                int ret = pxd_saveBmp(1, fqfn, frameNum, 0, 0, -1, -1, 0, 0);
            }
            else
            {
                _wh.WaitOne(); // No more tasks - wait for a signal
            }
        }
    }
}
Using the class in the main routine:
// capture bitmap images and save them to disk
using (ProducerConsumerQueue q = new ProducerConsumerQueue())
{
    for (int i = 0; i < 300; i++)
    {
        if (curFrmBuf > numFrmBufs)
        {
            curFrmBuf = 1; // wrap around to the first frame buffer
        }
        // snap an image to the image buffer
        int ret = pxd_doSnap(1, curFrmBuf, 0);
        // build the parameters for saving the frame to image file (for the queue)
        string fileSaveParams = curFrmBuf + "," + (i + 1) + "," + newPath;
        q.EnqueueTask(fileSaveParams);
        curFrmBuf++;
    }
}
Pretty slick class -- a small amount of code for this functionality.
Thanks so much for the suggestions, guys.
Sure, sounds reasonable. You can use semaphores or other thread synchronization primitives. This sounds like a standard producer/consumer problem. Take a look here for some pseudo-code.
What happens if the disk is so slow (e.g. some other process pegs it) that 60 frame buffers are not enough? Maybe you'll need a BufferToMemory and BufferToDisk thread or some sort of combination. You'll want the main thread (capture to buffer) to have the highest priority, BufferToMemory medium, and BufferToDisk the lowest.
Anyway, back to Semaphores, I recommend you read this: http://www.albahari.com/threading/part2.aspx#_Semaphore. Semaphores should do the trick for you, though I would recommend SemaphoreSlim (.NET 4).
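A minimal sketch of the SemaphoreSlim idea, assuming the frame buffers are numbered 1..bufferCount and reused round-robin as in the question. `snapIntoBuffer` and `saveBuffer` are hypothetical stand-ins for the capture card calls (pxd_doSnap / pxd_saveBmp). One semaphore counts free buffers, so capture only blocks when every buffer is still waiting to be saved; a second one signals the saver that a frame is ready.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

static class FrameBufferThrottle
{
    public static void CaptureAll(int frameCount, int bufferCount,
                                  Action<int> snapIntoBuffer, Action<int> saveBuffer)
    {
        var toSave = new ConcurrentQueue<int>();
        using (var freeBuffers = new SemaphoreSlim(bufferCount, bufferCount))
        using (var frameReady = new SemaphoreSlim(0))
        {
            Task saver = Task.Factory.StartNew(() =>
            {
                for (int saved = 0; saved < frameCount; saved++)
                {
                    frameReady.Wait();              // wait until a frame has been captured
                    int savedBuffer;
                    toSave.TryDequeue(out savedBuffer);
                    saveBuffer(savedBuffer);        // slow, unpredictable disk I/O
                    freeBuffers.Release();          // this buffer may now be reused
                }
            }, TaskCreationOptions.LongRunning);

            for (int i = 0; i < frameCount; i++)
            {
                freeBuffers.Wait();                 // blocks only if all buffers are unsaved
                int buffer = i % bufferCount + 1;   // 1-based, wraps like curFrmBuf
                snapIntoBuffer(buffer);
                toSave.Enqueue(buffer);
                frameReady.Release();
            }
            saver.Wait();
        }
    }
}
```

Frames are saved in capture order, so when the capture loop gets a free slot, the buffer it is about to reuse is always one that has already been saved. If saveBuffer throws, the capture loop would block forever, so real code needs error handling there.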
Since you're treating this as a producer/consumer problem (judging by your reply to @siz's answer), you might want to look at BlockingCollection<T>, which is designed for precisely this sort of scenario.
It allows any number of producer threads to push data into the collection, and any number of consumer threads to pull it out again. In this case, you probably want just one producer and one consumer thread.
The BlockingCollection<T> does all the work of making sure the consumer thread only wakes up and processes work once the producing thread has said that there's more work to do. And it also takes care of allowing a queue of work to build up.
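A minimal sketch of that one-producer, one-consumer setup with a bounded BlockingCollection<T>. The `captureFrame` and `saveFrame` delegates are hypothetical placeholders for the capture and save calls, and the string items mirror the "frame,file,path" strings used in the question. The bounded capacity makes Add block when too many frames are waiting, which is where the frame-buffer limit would come in.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class CapturePipeline
{
    public static void Run(int frameCount, int capacity,
                           Func<int, string> captureFrame, Action<string> saveFrame)
    {
        using (var queue = new BlockingCollection<string>(boundedCapacity: capacity))
        {
            Task consumer = Task.Factory.StartNew(() =>
            {
                // Blocks while the queue is empty; ends once CompleteAdding()
                // has been called and the remaining items are drained.
                foreach (string item in queue.GetConsumingEnumerable())
                    saveFrame(item);
            }, TaskCreationOptions.LongRunning);

            for (int i = 0; i < frameCount; i++)
                queue.Add(captureFrame(i)); // blocks only when 'capacity' items are pending

            queue.CompleteAdding();
            consumer.Wait();
        }
    }
}
```

Compared with the hand-written AutoResetEvent queue, the null sentinel and the wait handle are replaced by CompleteAdding and GetConsumingEnumerable.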
Current implementation: Waits until parallelCount values are collected, uses ThreadPool to process the values, waits until all threads complete, re-collect another set of values and so on...
Code:
private static int parallelCount = 5;
private int taskIndex;
private object[] paramObjects = new object[parallelCount];
// Each ThreadPool thread should access only one item of the array,
// release object when done, to be used by another thread
private object[] reusableObjects = new object[parallelCount];
private readonly ManualResetEvent resetEvent = new ManualResetEvent(false);
private void MultiThreadedGenerate(object paramObject)
{
    paramObjects[taskIndex] = paramObject;
    taskIndex++;
    if (taskIndex == parallelCount)
    {
        MultiThreadedGenerate();
        // Reset
        taskIndex = 0;
    }
}
/*
 * Called when 'paramObjects' array gets filled
 */
private void MultiThreadedGenerate()
{
    int remainingToGenerate = paramObjects.Length;
    resetEvent.Reset();
    for (int i = 0; i < paramObjects.Length; i++)
    {
        ThreadPool.QueueUserWorkItem(delegate(object obj)
        {
            try
            {
                int currentIndex = (int)obj;
                Generate(currentIndex, paramObjects[currentIndex], reusableObjects[currentIndex]);
            }
            finally
            {
                if (Interlocked.Decrement(ref remainingToGenerate) == 0)
                {
                    resetEvent.Set();
                }
            }
        }, i);
    }
    resetEvent.WaitOne();
}
I've seen significant performance improvements with this approach, however there are a number of issues to consider:
[1] Collecting values in paramObjects and synchronization using resetEvent can be avoided as there is no dependency between the threads (or current set of values with the next set of values). I'm only doing this to manage access to reusableObjects (when a set paramObjects is done processing, I know that all objects in reusableObjects are free, so taskIndex is reset and each new task of the next set of values will have its unique 'reusableObj' to work with).
[2] There is no real connection between the size of reusableObjects and the number of threads the ThreadPool uses. I might initialize reusableObjects to have 10 objects, and say due to some limitations, ThreadPool can run only 3 threads for my MultiThreadedGenerate() method, then I'm wasting memory.
So, by getting rid of paramObjects, how can the above code be refined so that as soon as one thread completes its job, it returns the taskIndex (or the reusableObj) it used and no longer needs, making it available to the next value? Also, the code should create a reusable object and add it to some collection only when there is demand for it. Is using a Queue here a good idea?
Thank you.
There's really no reason to do your own manual threading and task management any more. You could restructure this to a more loosely-coupled model using Task Parallel Library (and possibly System.Collections.Concurrent for result collation).
Performance could be further improved if you don't need to wait for a full complement of work before handing off each Task for processing.
TPL came along in .NET 4.0 but was back-ported to .NET 3.5.
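As one possible TPL restructuring, here is a minimal sketch using the Parallel.ForEach overload with localInit/localFinally: each worker task gets its own reusable object, created only when that worker starts, so the number of objects follows the number of workers actually used rather than a fixed array size. GenerateAll, createReusable and generate are hypothetical names standing in for the question's Generate method and reusable objects.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelGenerator
{
    public static void GenerateAll<TParam, TReusable>(
        IEnumerable<TParam> parameters,
        Func<TReusable> createReusable,
        Action<TParam, TReusable> generate)
    {
        Parallel.ForEach(
            parameters,
            () => createReusable(),            // localInit: once per worker task
            (param, loopState, reusable) =>
            {
                generate(param, reusable);     // reuse this worker's object
                return reusable;               // hand it to the next item on this worker
            },
            reusable => { });                  // localFinally: nothing to release here
    }
}
```

No batching or reset event is needed, because items are handed out as soon as a worker is free; if you want to cap the number of workers (and therefore objects), pass a ParallelOptions with MaxDegreeOfParallelism.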