I'm looking for texts or advice on implementing stop and restart in file stream transfer.
The goal is for the application to use a single read source, and output to multiple write sources, and be able to restart from a recorded position if a transfer fails.
The application is being written in C# .NET.
Pseudocode:
while (reader.Read())
{
    foreach (var writer in writers)
    {
        writer.WriteToStream();
    }
}
I need to be able to implement stop or pause, which could work like this: to stop, Continue is set to false:
while (reader.Read() && Continue)
{
    foreach (var writer in writers)
    {
        writer.WriteToStream();
    }
}
Clearly at this stage I need to record the number of bytes read, and the number of bytes written to each write source.
My questions are:
If I were to record only the read bytes and use that for restarts, one or more writers could have written while others have not; simply restarting from a measure of read progress might corrupt the written data. So I need a 'bytes written per writer' record as my new start position. How can I be sure the bytes were actually written? (I may not have the ability to read the file back from the write source to check its length.)
Can anyone advise, or point me in the direction of a text on this kind of issue?
Use a thread synchronization event.
(pseudocode):
ManualResetEvent _canReadEvent = new ManualResetEvent(true);

public void WriterThreadFunc()
{
    // WaitOne() blocks while the event is reset (paused) and returns
    // true as soon as the event is signaled again.
    while (_canReadEvent.WaitOne() && reader.Read())
    {
        foreach (var writer in writers)
        {
            writer.WriteToStream();
        }
    }
}

public void Pause()
{
    _canReadEvent.Reset();
}

public void Continue()
{
    _canReadEvent.Set();
}
The good thing is that the writer thread won't consume any CPU while it's paused, and it will continue as soon as it's signaled (as opposed to using a flag and Thread.Sleep()).
The other note is that the pause check should be the first operand in the while condition; otherwise reader.Read() would still consume data from the stream while paused (the content would be ignored, since the flag prevents the while body from executing).
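To tie this back to the restart requirement in the question: below is a minimal checkpointing sketch, assuming each writer wraps a FileStream (the WriterCheckpoint type and Capture method are hypothetical names, not an established API). Flushing before recording the position is what lets you trust the checkpoint even when you cannot read the target file back.

using System.Collections.Generic;
using System.IO;

// Hypothetical checkpoint record: one entry per write target.
public class WriterCheckpoint
{
    public string WriterId; // identifies the write target
    public long Position;   // byte offset to resume from
}

public static class TransferCheckpoints
{
    public static List<WriterCheckpoint> Capture(IDictionary<string, FileStream> writers)
    {
        var checkpoints = new List<WriterCheckpoint>();
        foreach (var pair in writers)
        {
            // Flush() pushes buffered bytes to the OS, so Position is a
            // trustworthy restart point afterwards; on .NET 4+,
            // Flush(true) additionally forces a write-through to disk.
            pair.Value.Flush();
            checkpoints.Add(new WriterCheckpoint { WriterId = pair.Key, Position = pair.Value.Position });
        }
        return checkpoints;
    }
}

On restart, each writer seeks to its own recorded position, and the reader seeks to the smallest of those positions, rewriting the overlap for any writer that was already ahead.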
Related
I'm writing software that records data from a number of sensors. A user should be able to press a button to start streaming, and then another to start recording this data to a file. Each device has its own thread, so pressing the stream button starts a streaming thread per device, and pressing record should make all these threads start writing to files.
I've attempted to implement this by creating a new thread to start pulling samples and then using a volatile bool to tell the threads when to start writing the samples to a file.
Here is the code running inside the threads:
public void streamData(CancellationToken ct, liblsl.StreamInlet inlet)
{
    while (true)
    {
        ct.ThrowIfCancellationRequested();
        pullSampleFromLSL(inlet);

        // start writing to file if requested
        //if (_isRecording)
        //{
        //    writeToFile();
        //}
    }
}
This method hasn't provided the accuracy I was hoping for, as each file records a different timestamp for when recording started (measured using Stopwatch.ElapsedMilliseconds from a set starting point). Is there a way to do this so that all the files begin at (as close as possible to) the exact same timestamp?
cheers
Actually, I would use a monitor-style class that, through a condition variable, notifies all the threads to start at once.
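A minimal sketch of that idea, using a ManualResetEventSlim as the shared start gate (pullSampleFromLSL and writeToFile are the placeholders from the question; the StartRecording signature is an assumption):

using System.Diagnostics;
using System.Threading;

// Shared start gate: every streaming thread checks it, and the record
// button releases them all with a single call.
private static readonly ManualResetEventSlim _startRecording = new ManualResetEventSlim(false);
private static long _recordingStartMs; // captured once, shared by every file

public void streamData(CancellationToken ct, liblsl.StreamInlet inlet)
{
    while (true)
    {
        ct.ThrowIfCancellationRequested();
        pullSampleFromLSL(inlet);
        if (_startRecording.IsSet)
        {
            writeToFile(); // stamp the file with _recordingStartMs, not a per-thread clock read
        }
    }
}

// Record button handler: one timestamp, one signal, all threads see both.
public void StartRecording(Stopwatch clock)
{
    _recordingStartMs = clock.ElapsedMilliseconds;
    _startRecording.Set();
}

The key point is to capture a single Stopwatch value when Set() is called and have every thread write that shared value, rather than letting each thread read the clock independently.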
Context:
I am implementing a logging mechanism for a Web API project that writes serialized objects to a file from multiple methods, which in turn is read by an external process (nxLog, to be more accurate). The application is hosted on IIS and uses 18 worker processes. The app pool is recycled once a day. The expected load on the services that will incorporate the logging methods is 10,000 req/s. In short, this is a classic producer/consumer problem with multiple producers (the methods that produce logs) and one consumer (the external process that reads from the log files). Update: Each process uses multiple threads as well.
I used BlockingCollection to store data (and solve the race condition) and a long running task that writes the data from the collection to the disk.
To write to the disk I am using a StreamWriter and a FileStream.
Because the write frequency is almost constant (as I said, 10,000 writes per second), I decided to keep the streams open for the entire lifetime of the application pool and write logs to the disk periodically. I rely on the app pool recycle and my DI framework to dispose my logger daily. Also note that this class will be a singleton, because I didn't want more than one thread dedicated to writing from my thread pool.
Apparently the FileStream object will not write to the disk until it is disposed. I don't want the FileStream to wait an entire day until it writes to the disk: the memory required to hold all those serialized objects would be tremendous, not to mention that any crash of the application or the server would cause data loss or a corrupted file.
Now my question:
How can I have the underlying streams (FileStream and StreamWriter) write to the disk periodically without disposing them? My initial assumption was that the FileStream would write to the disk once it exceeds its buffer size, which is 4K by default.
UPDATE: The inconsistencies mentioned in the answer have been fixed.
Code:
public class EventLogger : IDisposable, ILogger
{
    private readonly BlockingCollection<List<string>> _queue;
    private readonly Task _consumerTask;
    private string _logFilePath;
    private FileStream _fs;
    private StreamWriter _sw;

    public EventLogger()
    {
        OpenFile();
        _queue = new BlockingCollection<List<string>>(50);
        _consumerTask = Task.Factory.StartNew(Write, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }

    private void OpenFile()
    {
        _fs?.Dispose();
        _sw?.Dispose();
        _logFilePath = $"D:\\Log\\log{DateTime.Now.ToString("yyyyMMdd")}{System.Diagnostics.Process.GetCurrentProcess().Id}.txt";
        _fs = new FileStream(_logFilePath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite);
        _sw = new StreamWriter(_fs);
    }

    public void Dispose()
    {
        _queue?.CompleteAdding();
        _consumerTask?.Wait();
        _sw?.Dispose();
        _fs?.Dispose();
        _queue?.Dispose();
    }

    public void Log(List<string> list)
    {
        try
        {
            _queue.TryAdd(list, 100);
        }
        catch (Exception e)
        {
            LogError(LogLevel.Error, e);
        }
    }

    private void Write()
    {
        foreach (List<string> items in _queue.GetConsumingEnumerable())
        {
            items.ForEach(item =>
            {
                _sw?.WriteLine(item);
            });
        }
    }
}
There are a few "inconsistencies" with your question.
The application is hosted on IIS and uses 18 worker processes

_logFilePath = $"D:\Log\log{DateTime.Now.ToString(yyyyMMdd)}{System.Diagnostic.Process.GetCurrentProcess().Id}.txt";

writes serialized objects to a file from multiple methods
Putting all of this together, you seem to have a single threaded situation as opposed to a multi-threaded one. And since there is a separate log per process, there is no contention problem or need for synchronization. What I mean to say is, I don't see why the BlockingCollection is needed at all. It's possible that you forgot to mention that there are multiple threads within your web process. I will make that assumption here.
Another problem is that your code does not compile:
- the class name is Logger but the EventLogger function looks like a constructor
- some more incorrect syntax with strings, etc.
Putting all that aside, if you really have a contention situation and want to write to the same log via multiple threads or processes, your class seems to have most of what you need. I have modified your class to do a few more things. The chief items to note are:
- Fixed all the syntax errors, making assumptions where necessary
- Added a timer, which will call Flush() periodically. This needs a lock object so as not to interrupt a write operation in progress
- Used an explicit buffer size in the StreamWriter constructor. You should heuristically determine what size works best for you. Also, disable AutoFlush on the StreamWriter so your writes hit the buffer instead of the file, giving better performance.
Below is the code with the changes
public class EventLogger : IDisposable, ILogger {
    private readonly BlockingCollection<List<string>> _queue;
    private readonly Task _consumerTask;
    private FileStream _fs;
    private StreamWriter _sw;
    private System.Timers.Timer _timer;
    private object streamLock = new object();

    private const int MAX_BUFFER = 16 * 1024; // 16K
    private const int FLUSH_INTERVAL = 10 * 1000; // 10 seconds

    public EventLogger() {
        OpenFile();
        SetupFlushTimer(); // without this call the timer is never created or started
        _queue = new BlockingCollection<List<string>>(50);
        _consumerTask = Task.Factory.StartNew(Write, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }

    void SetupFlushTimer() {
        _timer = new System.Timers.Timer(FLUSH_INTERVAL);
        _timer.AutoReset = true;
        _timer.Elapsed += TimedFlush;
        _timer.Start();
    }

    void TimedFlush(Object source, System.Timers.ElapsedEventArgs e) {
        lock (streamLock) { // don't flush in the middle of a write
            _sw?.Flush();
        }
    }

    private void OpenFile() {
        _fs?.Dispose();
        _sw?.Dispose();
        var _logFilePath = $"D:\\Log\\log{DateTime.Now.ToString("yyyyMMdd")}{System.Diagnostics.Process.GetCurrentProcess().Id}.txt";
        _fs = new FileStream(_logFilePath, FileMode.Append, FileAccess.Write, FileShare.ReadWrite);
        _sw = new StreamWriter(_fs, Encoding.Default, MAX_BUFFER); // TODO: use the correct encoding here
        _sw.AutoFlush = false;
    }

    public void Dispose() {
        _timer.Elapsed -= TimedFlush;
        _timer.Dispose();
        _queue?.CompleteAdding();
        _consumerTask?.Wait();
        _sw?.Dispose();
        _fs?.Dispose();
        _queue?.Dispose();
    }

    public void Log(List<string> list) {
        try {
            _queue.TryAdd(list, 100);
        } catch (Exception e) {
            LogError(LogLevel.Error, e);
        }
    }

    private void Write() {
        foreach (List<string> items in _queue.GetConsumingEnumerable()) {
            lock (streamLock) {
                items.ForEach(item => {
                    _sw?.WriteLine(item);
                });
            }
        }
    }
}
EDIT:
There are four factors controlling the performance of this mechanism, and it is important to understand their relationship. The example below will hopefully make it clear.
Let's say:
- the average size of a List<string> is 50 bytes
- calls/sec is 10,000
- MAX_BUFFER is 1024 * 1024 bytes (1 MB)
You are producing 500,000 bytes of data per second, so a 1 MB buffer can hold only about 2 seconds' worth of data. That is, even if FLUSH_INTERVAL is set to 10 seconds, the buffer will auto-flush every ~2 seconds (on average) when it runs out of buffer space.
Also remember that blindly increasing MAX_BUFFER will not help, since the actual flush operation will take longer due to the bigger buffer size.
The main thing to understand is that when there is a mismatch between the incoming data rate (to your EventLogger class) and the outgoing data rate (to the disk), you will either need an infinitely sized buffer (assuming a continuously running process) or you will have to slow your average incoming rate to match the average outgoing rate.
Maybe my answer won't address your concrete concern, but I believe that your scenario could be a good use case for memory-mapped files.
Persisted files are memory-mapped files that are associated with a source file on a disk. When the last process has finished working with the file, the data is saved to the source file on the disk. These memory-mapped files are suitable for working with extremely large source files.
This could be very interesting because you'll be able to do logging from different processes (i.e. the IIS worker processes) without locking issues. See the MemoryMappedFile.OpenExisting method.
Also, you can log to a non-persisted shared memory-mapped file and, using a task scheduler or a Windows service, move pending logs to their final destination using a persisted memory-mapped file.
I see a lot of potential on using this approach because of your multi/inter-process scenario.
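As a rough sketch of the non-persisted shared-memory idea (the map name and capacity are assumptions, and cross-process coordination of writers is deliberately omitted):

using System;
using System.IO;
using System.IO.MemoryMappedFiles;

class MmfLogSketch
{
    static void Main()
    {
        // Open (or create) a named, non-persisted memory-mapped file that
        // every worker process on the machine can reach by name.
        // NOTE: concurrent writers still need coordination, e.g. a named
        // Mutex plus a shared write offset; that part is omitted here.
        using (var mmf = MemoryMappedFile.CreateOrOpen("MyAppLogBuffer", 1024 * 1024))
        using (var stream = mmf.CreateViewStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(DateTime.UtcNow.Ticks);  // sample payload
            writer.Write("serialized log entry"); // written as a length-prefixed string
        }
    }
}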
Approach #2
If you don't want to reinvent the wheel, I would go for a reliable message queue like MSMQ (very basic, but still useful in your scenario) or RabbitMQ. Enqueue logs in persistent queues, and have a background process consume those queues and write the logs to the file system.
This way, you can create log files once or twice a day, or whenever you want, and you're not tied to the file system when logging actions within your system.
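For MSMQ, the producer side could look roughly like this (a sketch; it requires a reference to the System.Messaging assembly, and the queue path is an assumption):

using System.Messaging; // reference the System.Messaging assembly

public static class LogQueue
{
    private const string QueuePath = @".\private$\applog"; // hypothetical queue name

    public static void Enqueue(string entry)
    {
        if (!MessageQueue.Exists(QueuePath))
            MessageQueue.Create(QueuePath);

        using (var queue = new MessageQueue(QueuePath))
        {
            // Recoverable = true makes MSMQ persist the message to disk,
            // so an app-pool recycle or crash doesn't lose pending entries.
            queue.Send(new Message(entry) { Recoverable = true });
        }
    }
}

A background consumer would then call Receive() on the same queue and append the message bodies to the log file at whatever pace the disk allows.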
Use the FileStream.Flush() method; you might do this after each call to .Write(). It clears the buffers for the stream and causes any buffered data to be written to the file.
https://msdn.microsoft.com/en-us/library/2bw4h516(v=vs.110).aspx
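For example (a trivial sketch; the path is an assumption):

using System;
using System.IO;
using System.Text;

class FlushSketch
{
    static void Main()
    {
        using (var fs = new FileStream(@"D:\Log\log.txt", FileMode.Append, FileAccess.Write))
        {
            byte[] payload = Encoding.UTF8.GetBytes("entry" + Environment.NewLine);
            fs.Write(payload, 0, payload.Length);
            fs.Flush(); // pushes buffered bytes to the OS now
            // fs.Flush(true) additionally asks the OS to write through
            // to the physical disk (.NET 4 and later).
        }
    }
}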
I am pretty new to coding, with some experience in ASM and C for PIC. I am still learning high-level programming with C#.
Question
I have a serial-port data reception and processing program in C#. To avoid losing data and to know when it was arriving, I set up a DataReceived event and looped in the handler method until there were no more bytes to read.
When I attempted this, the loop continued endlessly and blocked my program from other tasks (such as processing the retrieved data) while data kept arriving.
After reading about threading in C#, I created a thread that constantly checks the SerialPort.BytesToRead property so it knows when data is available to retrieve.
I also created a second thread that can process data while new data is still being read. Even while ReadSerial() has more bytes to read and its timeout (restarted every time a new byte arrives from the serial port) has not expired, the bytes already read can be processed and assembled into frames by a method named DataProcessing(), which reads from the same variable that ReadSerial() fills.
This gave me the desired results, but I noticed that with my solution (both the ReadSerial() and DataProcessing() threads alive), CPU usage skyrocketed all the way to 100%!
How do you approach this problem without causing such high CPU usage?
public static void ReadSerial() // Method that handles serial reception
{
    while (KeepAlive) // Bool used to keep the thread alive; set to false when the program ends.
    {
        if (Port.BytesToRead != 0)
        {
            for (int i = 0; i < 5000; i++)
            {
                // I don't know any other way to implement a timeout while
                // waiting for additional characters, so I took what I knew
                // from PIC serial data handling.
                if (Port.BytesToRead != 0)
                {
                    RxList.Add(Convert.ToByte(Port.ReadByte()));
                    i = 0;
                    if (RxList.Count > 20)  // In case the method is stuck still reading,
                        BufferReady = true; // signal the data-processing thread to work
                }                           // with that chunk of data.
                BufferReady = true; // Signals DataProcessing() to work with the current data in RxList.
            }
        }
    }
}
I cannot completely understand what you mean by the "DataReceived" and the "loop". I also work a lot with serial ports, as well as other interfaces. In my application I attach to the DataReceived event and also read based on BytesToRead, but I don't use a loop there:
int bytesToRead = this._port.BytesToRead;
var data = new byte[bytesToRead];
this._port.BaseStream.Read(data, 0, bytesToRead);
If you are reading the bytes in a polling loop, I recommend at least yielding inside it with something like:
System.Threading.Thread.Sleep(...);
Otherwise the thread you are using to read the bytes is busy all the time, which means other threads cannot be processed and your CPU sits at 100%.
But I don't think you need a polling loop at all if you use the DataReceived event. If my understanding is incorrect or you need further information, please ask.
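In code, the event-driven approach could look roughly like this (a sketch; the port name and baud rate are assumptions):

using System.IO.Ports;

public static class SerialReader
{
    public static SerialPort Open()
    {
        // Event-driven reception: SerialPort raises DataReceived on a
        // thread-pool thread when bytes arrive, so no polling loop burns CPU.
        var port = new SerialPort("COM1", 115200);
        port.DataReceived += (sender, e) =>
        {
            int bytesToRead = port.BytesToRead;
            var buffer = new byte[bytesToRead];
            port.BaseStream.Read(buffer, 0, bytesToRead);
            // hand `buffer` off to the processing thread here,
            // e.g. via a BlockingCollection<byte[]>
        };
        port.Open();
        return port;
    }
}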
I have a process that needs to read and write to a file. The application has a specific order for its reads and writes, and I want to preserve this order. What I would like is something that lets the first operation start and makes the second operation wait until the first is done, with a first-come, first-served queue for access to the file. From what I have read, file locking seems like it might be what I'm looking for, but I have not been able to find a good example. Can anyone provide one?
Currently I am using a TextReader/TextWriter with .Synchronized, but this is not doing what I hoped it would.
Sorry if this is a very basic question, threading gives me a headache :S
It should be as simple as this:
public static readonly object LockObj = new object();

public void AnOperation()
{
    lock (LockObj)
    {
        using (var fs = File.Open("yourfile.bin", FileMode.OpenOrCreate))
        {
            // do something with the file
        }
    }
}

public void SomeOperation()
{
    lock (LockObj)
    {
        using (var fs = File.Open("yourfile.bin", FileMode.OpenOrCreate))
        {
            // do something else with the file
        }
    }
}
Basically, define a lock object, then whenever you need to do something with your file, make sure you acquire the lock using the C# lock keyword. On reaching the lock statement, execution will block until the lock has been obtained.
There are other constructs you can use for locking, but I find the lock keyword to be the most straightforward.
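For example, a SemaphoreSlim(1, 1) behaves much like a lock but also supports timeouts and async waits (a sketch, not a drop-in replacement for the code above; note that neither lock nor SemaphoreSlim guarantees strict first-come, first-served ordering, though waiters are served roughly in arrival order in practice):

using System.IO;
using System.Threading;
using System.Threading.Tasks;

public class FileWorker
{
    private static readonly SemaphoreSlim FileGate = new SemaphoreSlim(1, 1);

    public async Task AnOperationAsync()
    {
        await FileGate.WaitAsync(); // supports async callers, unlike lock
        try
        {
            using (var fs = File.Open("yourfile.bin", FileMode.OpenOrCreate))
            {
                // do something with the file
            }
        }
        finally
        {
            FileGate.Release(); // always release, even if an exception is thrown
        }
    }
}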
If you're using a current version of the .NET Framework (4.0 or later), you can benefit from Task.ContinueWith.
If your units of work are logically always, "read some, then write some", the following expresses that intent succinctly and should scale:
string path = "file.dat";
// Start a reader task
var task = Task.Factory.StartNew(() => ReadFromFile(path));
// Continue with a writer task
task.ContinueWith(tt => WriteToFile(path));
// We're guaranteed that the read will occur before the write
// and that the write will occur once the read completes.
// We also can check the antecedent task's result (tt.Result in our
// example) for any special error logic we need.
I'm using FileSystemWatcher to check when a file is modified or deleted, but I'm wondering if there is any way to check when a file is read by another application.
Example:
I have the file C:\test.txt on my harddrive and am watching it using FileSystemWatcher. Another program (not under my control) goes to read that file; I would like to catch that event and, if possible, check what program is reading the file then modify the contents of the file accordingly.
It sounds like you want to write to your log file when your log file is read externally, or something to that effect. If that is the case, there is a NotifyFilters value, LastAccess. Make sure this is set as one of the flags in your FileSystemWatcher.NotifyFilter property. A change in the last access time will then fire the Changed event on FileSystemWatcher.
Currently, FileSystemWatcher does not allow you to directly differentiate between a read and a change; they both fire the Changed event based on the "change" to LastAccess. So, it would be infeasible to watch for reads to a large number of files. However, you seem to know which file you're watching, so if you had a FileInfo object for that file, and FileSystemWatcher fired its Changed event, you could get a new one and compare LastAccessTime values. If the access time changed, and LastWriteTime didn't, your file is only being read.
Now, in simplest terms, changes you make to the file while it is being read are not going to immediately show up in the other app, nor are you going to be able to "get there first", lock the file and write to it before they see it. So, you cannot use FileSystemWatcher to "intercept" a read request and show the content you want that app to see. The only way the user of another application can see what you just wrote is if the application is also watching the file and re-loads the file. That will fire another Changed event, causing an infinite loop as long as the other application continues to reload the file.
You will also get a Changed event for a read and a write. Opening a file in a text editor (virtually any will do), making some changes, then saving will fire two Changed events if you're looking for changes to Last Access Time. The first one will go off when the file is opened by the editor; at that time, you may not be able to tell that a write will happen, so if you are looking for pure read-only accesses to the file then you're SOL.
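Putting the above together, a rough sketch of detecting reads (bear in mind that NTFS last-access updates can be disabled system-wide, in which case the LastAccess filter will never fire):

using System;
using System.IO;

class ReadDetector
{
    static void Main()
    {
        const string path = @"C:\test.txt";
        var watcher = new FileSystemWatcher(Path.GetDirectoryName(path), Path.GetFileName(path));
        watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite;

        var lastWrite = File.GetLastWriteTime(path);
        watcher.Changed += (s, e) =>
        {
            var newWrite = File.GetLastWriteTime(path);
            // Access time moved but write time didn't: treat it as a read.
            if (newWrite == lastWrite)
                Console.WriteLine("File was read (or at least opened).");
            lastWrite = newWrite;
        };
        watcher.EnableRaisingEvents = true;
        Console.ReadLine(); // keep the watcher alive
    }
}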
The easiest way I can think of to do this would be with a timer (System.Threading.Timer) whose callback checks and stores the last
System.IO.File.GetLastAccessTime(path)
Something like (maybe with a bit more locking...)
public class FileAccessWatcher
{
    // Needs System.Linq for ToList(); EventArgs<T> here is a custom
    // wrapper type (not part of the BCL) carrying the file path.
    public Dictionary<string, DateTime> _trackedFiles = new Dictionary<string, DateTime>();
    private Timer _timer;

    public event EventHandler<EventArgs<string>> FileAccessed = delegate { };

    public FileAccessWatcher()
    {
        _timer = new Timer(OnTimerTick, null, 500, Timeout.Infinite);
    }

    public void Watch(string path)
    {
        _trackedFiles[path] = File.GetLastAccessTime(path);
    }

    public void OnTimerTick(object state)
    {
        foreach (var pair in _trackedFiles.ToList())
        {
            var accessed = File.GetLastAccessTime(pair.Key);
            if (pair.Value != accessed)
            {
                _trackedFiles[pair.Key] = accessed;
                FileAccessed(this, new EventArgs<string>(pair.Key));
            }
        }
        // Re-arm as a one-shot timer so ticks never overlap.
        _timer.Change(500, Timeout.Infinite);
    }
}
There is Sysinternals' FileMon... it can trace every file access in the system. If you can find its sources and understand which Win32 hooks it uses, you could marshal those functions in C# and get what you want.
You could use FileInfo.LastAccessTime and FileInfo.Refresh() in a polling loop.
http://msdn.microsoft.com/en-us/library/system.io.fileinfo_members.aspx
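A minimal polling sketch along those lines (the path and interval are assumptions):

using System;
using System.IO;
using System.Threading;

static class AccessPoller
{
    public static void PollForAccess(string path)
    {
        // FileInfo caches its property values, so Refresh() must be
        // called before each check or LastAccessTime will never change.
        var info = new FileInfo(path);
        var lastAccess = info.LastAccessTime;

        while (true)
        {
            Thread.Sleep(500); // arbitrary poll interval
            info.Refresh();
            if (info.LastAccessTime != lastAccess)
            {
                lastAccess = info.LastAccessTime;
                Console.WriteLine("File was accessed.");
            }
        }
    }
}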
Yes, using a file system filter driver you can catch all read requests, analyze them, and even substitute the data being read. Developing such a driver yourself is possible, but very time-consuming and complicated. We offer a product called CallbackFilter, which includes a ready-to-use driver and lets you implement your filtering business logic in user mode.
A little snippet that I found useful for detecting when another process has a lock:
static bool IsFileUsedByAnotherProcess(string filename)
{
    try
    {
        // Opening with FileShare.None fails if any other process
        // already has the file open.
        using (var file = File.Open(filename, FileMode.Open, FileAccess.Read, FileShare.None))
        {
        }
    }
    catch (System.IO.IOException)
    {
        // NOTE: FileNotFoundException also derives from IOException,
        // so a missing file is reported as "in use" here.
        return true;
    }
    return false;
}