This is a continuation of part 3, "Write file needs to be optimised for heavy traffic".
As my code has changed somewhat, I think it is better to open a new thread.
using System;
using System.IO;
using System.Text;
using System.Threading;

public class memoryStreamClass
{
    static MemoryStream ms1 = new MemoryStream();
    static MemoryStream ms2 = new MemoryStream();
    static int c = 1;

    public void fillBuffer(string outputString)
    {
        byte[] outputByte = Encoding.ASCII.GetBytes(outputString);
        if (c == 1)
        {
            ms1.Write(outputByte, 0, outputByte.Length);
            if (ms1.Length > 8100)
            {
                // switch the active buffer, then flush the full one on a new thread
                c = 2;
                Thread thread1 = new Thread(() => emptyBuffer(ref ms1));
                thread1.Start();
            }
        }
        else
        {
            ms2.Write(outputByte, 0, outputByte.Length);
            if (ms2.Length > 8100)
            {
                c = 1;
                Thread thread2 = new Thread(() => emptyBuffer(ref ms2));
                thread2.Start();
            }
        }
    }

    void emptyBuffer(ref MemoryStream ms)
    {
        FileStream outStream = new FileStream("c:\\output.txt", FileMode.Append);
        ms.WriteTo(outStream);
        outStream.Flush();
        outStream.Close();
        ms.SetLength(0);
        ms.Position = 0;
        Console.WriteLine(ms.Position);
    }
}
There are two things I have changed from the code in part 3.
The class and methods are now non-static, though the fields are still static.
I have moved the MemoryStream length reset into the emptyBuffer method, and I pass the stream with a ref parameter so the method works on the original reference rather than a copy.
The code compiles fine and runs OK. However, I ran it side by side with my single-threaded program on two computers on the same network: one computer ran the single-threaded version and the other ran the multithreaded version, for around 5 minutes. The single-threaded version collected 8333 KB of data while the multithreaded version collected only 8222 KB (98.6% of the single-threaded version).
This is the first time I have done any performance comparison between the two versions, and maybe I should run more tests to confirm it, but just from looking at the code, can any masters out there point out any problem?
I haven't put in any locking or thread pooling yet. Maybe I should, but if the code runs fine I don't want to change it and break it. The only thing I will change is the buffer size, to eliminate any chance of one buffer filling up before the other is emptied.
Any comments on my code?
The problem is still the static state. You're clearing buffers that could hold data that was never written to disk.
I imagine this scenario is happening about 1.4% of the time:
ms1 fills up, the empty-buffer thread for ms1 is started, and writing switches to ms2.
The empty-buffer thread for ms1 is writing to disk.
ms2 fills up, the empty-buffer thread for ms2 is started, and writing switches back to ms1.
The empty-buffer thread for ms1 finishes writing to disk.
ms1 is cleared while it is the active stream, and whatever was written to it in the meantime is lost.
When doing multi-threaded programming, static classes are fine but static state is not. Ideally you have no shared memory between threads at all, so that no thread's correctness depends on what another thread does to it.
Think of it this way: if you're expecting a value to change constantly, it's not exactly static, is it?
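One way to avoid clearing an active buffer, as a minimal sketch rather than a drop-in fix (the class and method names are made up; the c:\output.txt path and the 8100-byte threshold are just carried over from the question), is to swap the full stream out for a fresh one under a lock, so the flushing thread is the only thing still holding a reference to it:

using System.IO;
using System.Text;
using System.Threading;

public class SwappingBufferWriter
{
    private static readonly object fileGate = new object();
    private readonly object gate = new object();
    private MemoryStream active = new MemoryStream();

    public void FillBuffer(string outputString)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(outputString);
        MemoryStream full = null;

        lock (gate)
        {
            active.Write(bytes, 0, bytes.Length);
            if (active.Length > 8100)
            {
                full = active;               // hand the full stream off...
                active = new MemoryStream(); // ...and keep writing to a fresh one
            }
        }

        if (full != null)
        {
            // Only this thread holds a reference to 'full' now, so flushing and
            // disposing it cannot race with FillBuffer clearing the active stream.
            new Thread(() => FlushToDisk(full)).Start();
        }
    }

    private static void FlushToDisk(MemoryStream full)
    {
        lock (fileGate) // serialize appends so two flush threads never open the file at once
        {
            using (var outStream = new FileStream("c:\\output.txt", FileMode.Append))
            {
                full.WriteTo(outStream);
            }
        }
        full.Dispose();
    }
}

The second lock around the file append is there so two flush threads cannot try to open the output file at the same time.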
Related
I have a class that needs to keep an instance of a BinaryWriter open over several write function calls (data is packet based). It also has to create a new file once it has written a certain amount of data/packets.
Normally I would just close the BinaryWriter and re-instantiate it with a new file path, but the overhead associated with that operation is too great for my application. I tried closing the writer in a separate thread, but that interferes with the new instance I create later.
My last-ditch attempt was to not close the writer (and stream) at all, and simply create a new instance every time I'd written the required packets. This seems to work and doesn't cause any memory leaks, but I'd really like to know what actually happens when you do this.
Here is my (simplified) code to illustrate:
class Writer
{
    BinaryWriter binWriter;
    string filepath;
    long bytesWritten;
    int filesWritten;
    const long maxFileSize = 10000000000; // roughly 10 GB per file

    public Writer(string filepath)
    {
        this.filepath = filepath;
        binWriter = new BinaryWriter(File.Open(filepath, FileMode.Create));
        bytesWritten = 0;
        filesWritten = 0;
    }

    public void WritePacket(byte[] packet)
    {
        if (bytesWritten < maxFileSize)
        {
            binWriter.Write(packet);
            bytesWritten += packet.Length;
        }
        else
        {
            // this is where I'd normally call Dispose(), but the overhead
            // is too high, and disposing the stream in a separate thread
            // interferes with the new one.
            // What actually happens here? It's the only thing I've found to work...
            filesWritten++;
            bytesWritten = 0;
            binWriter = new BinaryWriter(File.Open(filepath + filesWritten, FileMode.Create));
        }
    }
}
It feels bad, but this is the only solution that works so far. Any insight would be great!
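One thought on the earlier attempt: closing the writer on a separate thread probably interfered with the new instance because both threads were working through the same binWriter field. A minimal sketch (a hypothetical helper for the Writer class above, not the original code) that captures the old writer in a local before swapping in the new one avoids that; if you never dispose it at all, the abandoned stream simply stays open until the garbage collector finalizes it, which happens at a nondeterministic time:

void RollToNextFile()
{
    BinaryWriter oldWriter = binWriter;   // keep a private reference to the writer for the full file
    filesWritten++;
    bytesWritten = 0;
    binWriter = new BinaryWriter(File.Open(filepath + filesWritten, FileMode.Create));

    // Dispose the old writer off the hot path. Because binWriter already points
    // at the new instance, the background Dispose cannot touch it.
    System.Threading.ThreadPool.QueueUserWorkItem(_ => oldWriter.Dispose());
}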
I'm running into an issue where I'm getting significant (10+ second) delays when performing file write operations. It seems to happen only once, and always during the 2nd (or sometimes 3rd?) call to the WriteToFile() function.
I've written out three different WriteToFile functions to show some of the variations I've tried so far, and shown additional lines in OpenFileIfNecessary that I've also tried.
The code never throws an error, and the offsets/counts are all valid. Once the delay occurs a single time, there are no further delays.
This has been a pain in my side for 2+ days and I'm definitely at the point where I need a second opinion.
private void WriteToFile(byte[] data, long offset, int count)
{
    lock (this.monitor)
    {
        this.OpenFileIfNecessary();
        this.fileStream.Seek(offset, SeekOrigin.Begin); // <- takes 10+ seconds for THIS line to execute
        this.fileStream.Write(data, 0, count);
    }
}

private void WriteToFile2(byte[] data, long offset, int count)
{
    lock (this.monitor)
    {
        this.OpenFileIfNecessary();
        this.fileStream.Position = offset; // <- takes 10+ seconds for THIS line to execute
        this.fileStream.Write(data, 0, count);
    }
}

private void WriteToFile3(byte[] data, long offset, int count)
{
    lock (this.monitor)
    {
        var fileName = this.file.FullName;
        using (Stream fileStream = new FileStream(fileName, FileMode.OpenOrCreate))
        {
            fileStream.Position = offset; // (instant execution of this line)
            fileStream.Write(data, 0, count);
            // Getting from HERE ->
        }
        // To HERE <- takes 10+ seconds
    }
}

private System.IO.FileStream fileStream = null;
private System.IO.FileInfo file; // value set during construction

private void OpenFileIfNecessary()
{
    lock (this.monitor)
    {
        if (this.fileStream == null)
        {
            // The following 3 lines all result in the same behavior described in this post:
            //this.fileStream = this.file.Open(FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.ReadWrite);
            //this.fileStream = this.file.Open(FileMode.OpenOrCreate, FileAccess.Write, FileShare.Write);
            //this.fileStream = this.file.OpenWrite();
            this.fileStream = this.file.Open(FileMode.OpenOrCreate);
        }
    }
}
Found the issue. It's worth mentioning that we had previously been testing with smaller files (<1 GB) until late last week. With that in mind:
We write to the file at different positions; that is, we don't simply start at position 0 and write to the end. What that means (especially for larger files) is that every time we first seek to a position deep into the file, there is apparently a wait while the newly extended region is allocated.
The way FileStream hides a lot of the under-the-hood work made it a little difficult to spot the pattern, but once we did some deeper profiling and discovered smaller delays with smaller files (we had never noticed those delays before) it became clear what was happening.
The plan going forward is to do some multithreading so the space for the file can be allocated fully before we write to disk; we can buffer in memory during that wait period.
Example code for preallocating the entire file:
fileStream.Seek(size - 1, SeekOrigin.Begin);
fileStream.WriteByte(0);
fileStream.Flush();
That is happening because when you set a file position to some large value, the underlying storage system has to zero out the contents of the newly allocated blocks. I don't believe the BCL will let you bypass that, but there is actually a way in Win32 to skip that step; roughly speaking, it requires the running program to have administrator privileges.
Search for the SetFileValidData() documentation.
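For completeness, a hedged sketch of what that could look like from C# (the P/Invoke signature matches the documented SetFileValidData, but the privilege handling is simplified; the process needs the SE_MANAGE_VOLUME_NAME privilege, which is normally only available to elevated processes):

using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

static class FastPreallocate
{
    // Marks the first validDataLength bytes of the file as valid so the file
    // system does not zero-fill them on the first deep seek. Note that this
    // exposes whatever stale data was previously in those clusters, which is
    // exactly why the SE_MANAGE_VOLUME_NAME privilege is required.
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool SetFileValidData(SafeFileHandle hFile, long validDataLength);

    public static void Preallocate(FileStream fs, long size)
    {
        fs.SetLength(size); // reserve the space first
        if (!SetFileValidData(fs.SafeFileHandle, size))
            throw new IOException("SetFileValidData failed", Marshal.GetLastWin32Error());
    }
}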
Let's simplify the model.
class Container
{
    // other members
    public byte[] PNG;
}

class Producer
{
    public byte[] Produce(byte[] ImageOutside)
    {
        using (MemoryStream bmpStream = new MemoryStream(ImageOutside),
                            pngStream = new MemoryStream())
        {
            System.Drawing.Bitmap bitmap = new System.Drawing.Bitmap(bmpStream);
            bitmap.Save(pngStream, System.Drawing.Imaging.ImageFormat.Png);
            pngStream.Seek(0, System.IO.SeekOrigin.Begin);
            byte[] PNG = new byte[pngStream.Length];
            pngStream.Read(PNG, 0, (int)pngStream.Length);
            bitmap.Dispose();
            GC.Collect();
            GC.WaitForPendingFinalizers();
            return PNG;
        }
    }
}
The main function keeps doing Container container = new Container();, producing a PNG for container.PNG, and then Queue.Enqueue(container).
Using a using() clause doesn't help at all.
After this repeats about 40+ times (it varies), it throws an exception. Sometimes it's an OutOfMemoryException and sometimes it's something like "A generic error occurred in GDI+" (I'm not sure of the exact English wording; I translated it).
But if I catch the exception and simply ignore it, it can still continue producing more, though not indefinitely, just somewhat further.
The memory usage shown in Task Manager is only about 600-700 MB when the first exception is thrown, and it finally stops at about 1.2 GB. I have tried this:
while (true)
{
    Bitmap b = new Bitmap(4500, 5000);
    list.Add(b);
    Invoke((MethodInvoker)delegate { textBox1.Text = list.Count.ToString(); });
}
It never throws any exception even though 99% of the memory (about 11 GB) has been allocated to the program; all that happens is that the number in textBox1 stops rising.
The way to avoid this may simply be not to produce so many things, but I still want to know the underlying principle and reason. Thank you for your help.
With byte[] PNG = new byte[pngStream.Length]; you allocate a large block of memory to store the image.
The following call is useless; you have already disposed the stream:
GC.Collect();
GC.WaitForPendingFinalizers();
The memory used by the PNG array cannot be released, because the function's return value still holds an active reference to it.
I suggest returning a stream instead of an array of bytes.
Otherwise, after you call the Produce method, remember to remove the reference to PNG before calling it again.
sample:
while (true)
{
    Byte[] b = new Byte[1000];
    b = this.Produce(b);

    // Use your array as you need, but don't assign it to an external property,
    // otherwise the memory cannot be released.

    b = null; // remove the reference (in this example assigning null is not strictly
              // necessary, because b will be overwritten on the next loop iteration)

    GC.Collect(); // force the garbage collector; probably not necessary, but can be useful
    GC.WaitForPendingFinalizers();
}
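As for returning a stream instead of a byte array, a minimal sketch of what a stream-returning Produce variant could look like is below (hypothetical; the method name is made up). The caller takes ownership of the returned MemoryStream and disposes it once the PNG has been consumed:

public MemoryStream ProduceStream(byte[] imageOutside)
{
    // The source stream must stay open for the Bitmap's lifetime, so both are
    // scoped to the using blocks; the encoded PNG is handed back as a stream.
    using (var bmpStream = new MemoryStream(imageOutside))
    using (var bitmap = new System.Drawing.Bitmap(bmpStream))
    {
        var pngStream = new MemoryStream();
        bitmap.Save(pngStream, System.Drawing.Imaging.ImageFormat.Png);
        pngStream.Position = 0;
        return pngStream;
    }
}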
The platform target can also affect the maximum available memory:
In a 32-bit application, you have a maximum of 2 GiB of available memory.
In a 64-bit application you have up to 2 TiB of available memory, but a single object cannot exceed 2 GiB.
In a UWP application there are other limitations depending on the device.
An AnyCPU build is compiled just in time when you launch the application, and can run as either 32-bit or 64-bit depending on the machine architecture and system configuration.
This is code that executes 4 threads at 15-minute intervals. The last time I ran it, the first 15-minute batch copied fast (20 files in 6 minutes), but the second 15-minute batch was much slower. It's sporadic, and I want to make certain that, if there is a bottleneck, it's a bandwidth limitation with the remote server.
EDIT: I'm monitoring the latest run: the 15:00 and :45 copied in under 8 minutes each. The :15 hasn't finished and neither has the :30, and both began at least 10 minutes before the :45.
Here's my code:
static void Main(string[] args)
{
    Timer t0 = new Timer((s) =>
    {
        Class myClass0 = new Class();
        myClass0.DownloadFilesByPeriod(taskRunDateTime, 0, cts0.Token);
        Copy0Done.Set();
    }, null, TimeSpan.FromMinutes(20), TimeSpan.FromMilliseconds(-1));

    Timer t1 = new Timer((s) =>
    {
        Class myClass1 = new Class();
        myClass1.DownloadFilesByPeriod(taskRunDateTime, 1, cts1.Token);
        Copy1Done.Set();
    }, null, TimeSpan.FromMinutes(35), TimeSpan.FromMilliseconds(-1));

    Timer t2 = new Timer((s) =>
    {
        Class myClass2 = new Class();
        myClass2.DownloadFilesByPeriod(taskRunDateTime, 2, cts2.Token);
        Copy2Done.Set();
    }, null, TimeSpan.FromMinutes(50), TimeSpan.FromMilliseconds(-1));

    Timer t3 = new Timer((s) =>
    {
        Class myClass3 = new Class();
        myClass3.DownloadFilesByPeriod(taskRunDateTime, 3, cts3.Token);
        Copy3Done.Set();
    }, null, TimeSpan.FromMinutes(65), TimeSpan.FromMilliseconds(-1));
}
public struct FilesStruct
{
    public string RemoteFilePath;
    public string LocalFilePath;
}

private void DownloadFilesByPeriod(DateTime TaskRunDateTime, int Period, Object obj)
{
    FilesStruct[] Array = GetAllFiles(TaskRunDateTime, Period);
    // Array has 20 files for the specific period.
    using (Session session = new Session())
    {
        // Connect
        session.Open(sessionOptions);
        TransferOperationResult transferResult;
        foreach (FilesStruct u in Array)
        {
            if (session.FileExists(u.RemoteFilePath)) // File exists remotely
            {
                if (!File.Exists(u.LocalFilePath)) // File does not exist locally
                {
                    transferResult = session.GetFiles(u.RemoteFilePath, u.LocalFilePath);
                    transferResult.Check();
                    foreach (TransferEventArgs transfer in transferResult.Transfers)
                    {
                        // Log that the file has been transferred
                    }
                }
                else
                {
                    using (StreamWriter w = File.AppendText(Logger._LogName))
                    {
                        // Log that the file already exists locally
                    }
                }
            }
            else
            {
                using (StreamWriter w = File.AppendText(Logger._LogName))
                {
                    // Log that the file does not exist remotely
                }
            }
            if (token.IsCancellationRequested)
            {
                break;
            }
        }
    }
}
Something is not quite right here. First, you're setting up 4 timers to run in parallel. If you think about it, there is no need: you don't need 4 threads running in parallel all the time, you just need to initiate tasks at specific intervals. So how many timers do you need? ONE.
The second problem: why TimeSpan.FromMilliseconds(-1)? What is the purpose of that? I can't figure out why you put it there, but I wouldn't.
The third problem, not related to multithreading but worth pointing out anyway, is that you create a new instance of Class each time, which is unnecessary. It would be necessary if your class needed constructor setup, or if your logic accessed different methods or fields of the class in some order. In your case, all you want to do is call the method, so you don't need a new instance every time; just make the method you're calling static.
Here is what I would do:
1. Store the files you need to download in an array / List<>. Can't you see that you're doing the same thing every time? There is no need to write 4 different versions of that code. Store the items in an array, then just change the index in the call!
2. Set up the timer at, say, a 5-second interval. When it reaches the 20-min / 35-min / etc. mark, spawn a new thread to do the task (see the sketch after this answer). That way a new task can start even if the previous one has not finished.
3. Wait for all threads to complete (terminate). When they do, check whether they threw exceptions, and handle/log them if necessary.
4. After everything is done, terminate the program.
For step 2, you have the option of using the new async keyword if you're on .NET 4.5, but it won't make a noticeable difference over using threads manually.
And as for why it is so slow: why don't you check your system status in Task Manager? Is the CPU maxed out, or is the network throughput being used by something else? You can easily tell from there.
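A minimal sketch of the one-timer approach described above (the period list, the 5-second tick, and DownloadPeriod are placeholders, not the original poster's code):

using System;
using System.Threading;
using System.Threading.Tasks;

static class Scheduler
{
    static readonly int[] startMinutes = { 20, 35, 50, 65 }; // when each period begins
    static readonly Task[] periodTasks = new Task[4];
    static readonly DateTime startedAt = DateTime.UtcNow;
    static int nextPeriod;

    static void Main()
    {
        // A single timer ticking every 5 seconds; each tick only checks whether
        // the next period's start time has been reached.
        using (var timer = new Timer(_ => Tick(), null,
                                     TimeSpan.Zero, TimeSpan.FromSeconds(5)))
        {
            // Poll until every period's task has been created, then wait for them all.
            while (true)
            {
                lock (periodTasks)
                {
                    if (nextPeriod == startMinutes.Length) break;
                }
                Thread.Sleep(1000);
            }
            Task.WaitAll(periodTasks);
        }
        Console.WriteLine("All periods done.");
    }

    static void Tick()
    {
        var elapsed = DateTime.UtcNow - startedAt;
        lock (periodTasks)
        {
            // Spawn the work for any period whose start time has passed, so a
            // new period can start even if the previous one is still running.
            while (nextPeriod < startMinutes.Length &&
                   elapsed >= TimeSpan.FromMinutes(startMinutes[nextPeriod]))
            {
                int period = nextPeriod++;
                periodTasks[period] = Task.Run(() => DownloadPeriod(period));
            }
        }
    }

    static void DownloadPeriod(int period)
    {
        // Hypothetical stand-in for the poster's DownloadFilesByPeriod call.
        Console.WriteLine($"Downloading files for period {period}...");
    }
}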
The problem was the SFTP client.
The purpose of the console application was to loop through a List<> and download the files. I tried WinSCP and, even though it did the job, it was very slow. I also tested SharpSSH and it was even slower than WinSCP.
I finally ended up using SSH.NET which, at least in my particular case, was much faster than both WinSCP and SharpSSH. I think the problem with WinSCP was that there was no obvious way of disconnecting after I was done. With SSH.NET I could connect and disconnect around every file download, something I couldn't do with WinSCP.
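For reference, a minimal SSH.NET sketch of the connect/download/disconnect-per-file pattern described above (the host, credentials, and the FilesStruct list are placeholders):

using System.Collections.Generic;
using System.IO;
using Renci.SshNet;

static class SftpDownloader
{
    // Connect, download one file, and disconnect again for each entry,
    // mirroring the per-file connect/disconnect pattern mentioned above.
    public static void DownloadAll(IEnumerable<FilesStruct> files,
                                   string host, string user, string password)
    {
        foreach (var f in files)
        {
            using (var sftp = new SftpClient(host, user, password))
            {
                sftp.Connect();
                using (var local = File.Create(f.LocalFilePath))
                {
                    sftp.DownloadFile(f.RemoteFilePath, local);
                }
                sftp.Disconnect();
            }
        }
    }
}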
I am currently using System.Collections.Concurrent.BlockingCollection, and it's very good at what it does.
However, it seems that it only keeps a reference to an object.
So if I have one byte[] object which is written to and added to the queue 100 times,
and after it reaches 100 I want to read them all, I will only get 100 copies of whatever the byte[] currently holds.
I hope that explains it; at least that's what it seems to be doing in my tests.
So if that's the case, is there another collection that can keep copies of the data, and just keep adding them until I read them?
For example, I could take 100 byte[] buffers, write them to a MemoryStream in the correct order, and then read them back in that order.
Though a MemoryStream isn't what I would prefer to use; it just works as an example.
Here is my code:
try
{
    Thread.Sleep(100);
    for (int i = Queue.Count; i <= Queue.Count; i++)
    {
        if (Queue.TryTake(out AudioData, 300))
        {
            if (Record)
                waveWriter.Write(AudioData, 0, AudioData.Length);
        }
    }
}
catch (Exception e)
{
    if (e is ArgumentNullException)
        return;
}
Here is the part which receives the data
using (ms = new MemoryStream(TcpSize))
using (var tt1 = tcplisten.AcceptTcpClient())
{
    ReceiveData = new byte[TcpSize];
    tt1.NoDelay = true;
    using (var tcpstream = tt1.GetStream())
    {
        while (connect)
        {
            if (Record)
                Queue.Add(ReceiveData);
            tcpstream.Read(ReceiveData, 0, TcpSize);
            waveProvider.AddSamples(ReceiveData, 0, TcpSize);
        }
    }
}
You may wonder why I use a for loop and all that for writing; it's just there for debugging purposes. I wanted to test whether the objects in the queue were copies, because if so, it shouldn't matter when I write them. But it does matter, which means they must be references.
Thanks
If you want to queue copies of the data, just make a copy and then queue the copy.
Queue.Add((byte[])ReceiveData.Clone());
But I think you also need to sort out the fact that you're writing the data to the queue before filling the buffer...
Alternatively, create a new buffer on each iteration and queue that instead:
while (connect)
{
    ReceiveData = new byte[TcpSize];
    tcpstream.Read(ReceiveData, 0, TcpSize);
    waveProvider.AddSamples(ReceiveData, 0, TcpSize);
    if (Record)
        Queue.Add(ReceiveData);
}