Write file out of memory in C#

I have a problem with a C# Windows Forms application.
My goal is to split a big file (possibly larger than 5 GB) into smaller files, each containing one million lines.
I have no idea why the code below runs out of memory.
Thanks.
StreamReader readfile = new StreamReader(...);
StreamWriter writefile = new StreamWriter(...);
string content;
while ((content = readfile.ReadLine()) != null)
{
writefile.Write(content + "\r\n");
i++;
if (i % 1000000 == 0)
{
index++;
writefile.Close();
writefile.Dispose();
writefile = new StreamWriter(...);
}
label5.Text = i.ToString();
label5.Update();
}

The error is probably in these two lines:
label5.Text = i.ToString();
label5.Update();
As a test, I wrote something like:
for (int i = 0; i < int.MaxValue; i++)
{
label1.Text = i.ToString();
label1.Update();
}
The app freezes at around 16000-18000 iterations (Windows 7 Pro SP1 x64, running the app as both x86 and x64).
What probably happens is that, by running your long operation on the main thread of the app, you stall the window's message queue, and at a certain point it freezes. You can confirm that this is the problem by calling
Application.DoEvents();
instead of the
label5.Update();
But even this is a false solution. The correct solution is to move the copying to another thread and update the control every x milliseconds, using the Invoke method (because you are on a secondary thread).
For example:
public void Copy(string source, string dest)
{
const int updateMilliseconds = 100;
int index = 0;
int i = 0;
StreamWriter writefile = null;
try
{
using (StreamReader readfile = new StreamReader(source))
{
writefile = new StreamWriter(dest + index);
// Initial value "back in time". Forces initial update
int milliseconds = unchecked(Environment.TickCount - updateMilliseconds);
string content;
while ((content = readfile.ReadLine()) != null)
{
writefile.Write(content);
writefile.Write("\r\n"); // Split into two writes to avoid a string concatenation
i++;
if (i % 1000000 == 0)
{
index++;
writefile.Dispose();
writefile = new StreamWriter(dest + index);
// Force update
milliseconds = unchecked(milliseconds - updateMilliseconds);
}
int milliseconds2 = Environment.TickCount;
int diff = unchecked(milliseconds2 - milliseconds);
if (diff >= updateMilliseconds)
{
milliseconds = milliseconds2;
Invoke((Action)(() => label5.Text = string.Format("File {0}, line {1}", index, i)));
}
}
}
}
finally
{
if (writefile != null)
{
writefile.Dispose();
}
}
// Last update
Invoke((Action)(() => label5.Text = string.Format("File {0}, line {1} Finished", index, i)));
}
and call it with:
var thread = new Thread(() => Copy(@"C:\Temp\lst.txt", @"C:\Temp\output"));
thread.Start();
Note how it updates label5 every 100 milliseconds, plus once at the beginning (by setting the initial value of milliseconds "back in time"), once each time the output file changes (again by moving milliseconds "back in time"), and once after everything has been disposed.
An even more correct example can be written with the BackgroundWorker class, which exists explicitly for this scenario. It has a ProgressChanged event that you can subscribe to in order to update the window.
Something like this:
private void button1_Click(object sender, EventArgs e)
{
BackgroundWorker backgroundWorker = new BackgroundWorker();
backgroundWorker.WorkerReportsProgress = true;
backgroundWorker.ProgressChanged += backgroundWorker_ProgressChanged;
backgroundWorker.RunWorkerCompleted += backgroundWorker_RunWorkerCompleted;
backgroundWorker.DoWork += backgroundWorker_DoWork;
backgroundWorker.RunWorkerAsync(new string[] { @"C:\Temp\lst.txt", @"C:\Temp\output" });
}
private void backgroundWorker_DoWork(object sender, DoWorkEventArgs e)
{
BackgroundWorker worker = sender as BackgroundWorker;
string[] arguments = (string[])e.Argument;
string source = arguments[0];
string dest = arguments[1];
const int updateMilliseconds = 100;
int index = 0;
int i = 0;
StreamWriter writefile = null;
try
{
using (StreamReader readfile = new StreamReader(source))
{
writefile = new StreamWriter(dest + index);
// Initial value "back in time". Forces initial update
int milliseconds = unchecked(Environment.TickCount - updateMilliseconds);
string content;
while ((content = readfile.ReadLine()) != null)
{
writefile.Write(content);
writefile.Write("\r\n"); // Split into two writes to avoid a string concatenation
i++;
if (i % 1000000 == 0)
{
index++;
writefile.Dispose();
writefile = new StreamWriter(dest + index);
// Force update
milliseconds = unchecked(milliseconds - updateMilliseconds);
}
int milliseconds2 = Environment.TickCount;
int diff = unchecked(milliseconds2 - milliseconds);
if (diff >= updateMilliseconds)
{
milliseconds = milliseconds2;
worker.ReportProgress(0, new int[] { index, i });
}
}
}
}
finally
{
if (writefile != null)
{
writefile.Dispose();
}
}
// For the RunWorkerCompleted
e.Result = new int[] { index, i };
}
void backgroundWorker_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
int[] state = (int[])e.UserState;
label5.Text = string.Format("File {0}, line {1}", state[0], state[1]);
}
void backgroundWorker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
int[] state = (int[])e.Result;
label5.Text = string.Format("File {0}, line {1} Finished", state[0], state[1]);
}
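On .NET 4.5 or later, the same split can also be driven with Task.Run and IProgress<T> instead of BackgroundWorker. The following is only a minimal sketch under that assumption, not part of the original answer: LineSplitter and Split are hypothetical names, and unlike the code above it opens each output file lazily (so no empty trailing file is created) and reports progress only when an output file is completed.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only.
public static class LineSplitter
{
    // Splits 'source' into files named dest0, dest1, ... holding at most
    // 'linesPerFile' lines each. Returns the total number of lines copied.
    public static int Split(string source, string dest, int linesPerFile,
                            IProgress<int> progress = null)
    {
        int index = 0;
        int lines = 0;
        StreamWriter writer = null;
        try
        {
            using (var reader = new StreamReader(source))
            {
                string content;
                while ((content = reader.ReadLine()) != null)
                {
                    // Open the next output file only when there is a line to write.
                    if (writer == null)
                    {
                        writer = new StreamWriter(dest + index);
                        index++;
                    }
                    writer.Write(content);
                    writer.Write("\r\n");
                    lines++;
                    if (lines % linesPerFile == 0)
                    {
                        // The current output file is full: close it and report.
                        writer.Dispose();
                        writer = null;
                        if (progress != null) progress.Report(lines);
                    }
                }
            }
        }
        finally
        {
            if (writer != null) writer.Dispose();
        }
        return lines;
    }
}
```

From an async button handler, this could be called as `var progress = new Progress<int>(n => label5.Text = n.ToString());` followed by `await Task.Run(() => LineSplitter.Split(source, dest, 1000000, progress));`. A Progress<int> created on the UI thread invokes its callback on that thread, so no explicit Invoke is needed.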

Related

loop use async or thread instead of timer

Sorry for my limited English.
I have a problem: instead of using a timer, I need to run the code below in a loop.
Can you help me with this?
My main goal is to use a thread or async.
if (NumbersIndexCounter != PhoneNumbersList.Count)
{
SendMessage(PhoneNumbersList[NumbersIndexCounter++]);
}
lblFailedProcess.Text = FailedProcess.ToString();
lblSuccedProcess.Text = SuccedProcess.ToString();
//metroLabel2.Text = toplamprocess.ToString();
Thread.Sleep(2000);
dataGridİnfos.DataSource = "";
dataGridİnfos.DataSource = InfoList;
Thread.Sleep(Convert.ToInt16(txtWaitBeforeEveryMessage.Text)*1000);
ProcessCounter++;
if (NumbersIndexCounter == PhoneNumbersList.Count)
{
int s1, s2;
s1 = Convert.ToInt32(lblFailedProcess.Text);
s2 = Convert.ToInt32(lblSuccedProcess.Text);
int toplam = s1 + s2;
metroLabel2.Text = "Toplam = " + toplam;
timer1.Stop();
Driver.Quit();
MessageBox.Show("Bitti!");
NumbersIndexCounter = 0;
grpBxMessage.Enabled = true;
grpBxPhoneNumbers.Enabled = true;
grpBxSettings.Enabled = true;
PhoneNumbersList.Clear();
}
if (ProcessCounter == Convert.ToInt32(txtMessageCountForWait.Text))
{
Thread.Sleep(Convert.ToInt16(txtWait.Text) * 1000);
ProcessCounter = 0;
}
But when I put this code in a private void method, it runs once and does not continue. When I use a timer, it works without any problems, but the form stops responding.

Wpf async await ui is frozen

I am writing a WPF desktop application and I use async/await to keep my UI updated.
It works fine for 5 or 6 seconds, but after that the UI freezes while the background code keeps running normally.
await Task.Run(() =>
{
result = index.lucene_index(filepath, filename, fileContent);
if (result) {
updateResultTextBox(filename);
Task.Delay(1000);
}
});
and updateResultTextBox is
private void updateResultTextBox(string _filename)
{
sync.Post(new SendOrPostCallback(o =>
{
result_tbx.Text += "Indexed \t" + (string)o + "\n";
result_tbx.ScrollToEnd();
}), _filename);
}
Your question is less than clear, so I have to guess. My only guess at this time: GUI write overhead.
Writing to the GUI is not cheap. If you only do it once per user-triggered event, you do not notice it. But once you do it in a loop, even one that runs in a separate task or thread, you will notice it. I wrote this simple Windows Forms example to showcase the difference:
using System;
using System.Windows.Forms;
namespace UIWriteOverhead
{
public partial class Form1 : Form
{
public Form1()
{
InitializeComponent();
}
int[] getNumbers(int upperLimit)
{
int[] ReturnValue = new int[upperLimit];
for (int i = 0; i < ReturnValue.Length; i++)
ReturnValue[i] = i;
return ReturnValue;
}
void printWithBuffer(int[] Values)
{
textBox1.Text = "";
string buffer = "";
foreach (int Number in Values)
buffer += Number.ToString() + Environment.NewLine;
textBox1.Text = buffer;
}
void printDirectly(int[] Values){
textBox1.Text = "";
foreach (int Number in Values)
textBox1.Text += Number.ToString() + Environment.NewLine;
}
private void btnPrintBuffer_Click(object sender, EventArgs e)
{
MessageBox.Show("Generating Numbers");
int[] temp = getNumbers(10000);
MessageBox.Show("Printing with buffer");
printWithBuffer(temp);
MessageBox.Show("Printing done");
}
private void btnPrintDirect_Click(object sender, EventArgs e)
{
MessageBox.Show("Generating Numbers");
int[] temp = getNumbers(1000);
MessageBox.Show("Printing directly");
printDirectly(temp);
MessageBox.Show("Printing done");
}
}
}
If you start a lot of those tasks and they all suddenly return 5-6 seconds into the process, you might simply overload the GUI thread with the sheer number of write operations.
I actually had that issue with my very first attempt at multithreading. I did the multithreading properly, but I still overloaded the GUI thread, which made it appear that I had failed.
There is something very strange in this code. Anyway, here are my two cents:
var text = await Task.Run(() =>
{
result = index.lucene_index(filepath, filename, fileContent);
if (result) {
return filename;
}
return string.Empty;
});
if (!string.IsNullOrEmpty(text)) {
result_tbx.Text += $"Indexed \t {text} {Environment.NewLine}";
result_tbx.ScrollToEnd();
}
Still a code smell...

Huffman Coding Issue with file size after compression

I am using Huffman coding to build a compression algorithm for any sort of file, but the compressed size is almost the same as the original size. E.g. a 25 MB video occupies 24 MB after compression, and a 606 KB image occupies 605 KB after compression. Below is my entire code. Kindly let me know if I am doing anything wrong.
public static class ByteValues
{
public static Dictionary<byte, string> ByteDictionary;
public static void AddValues(byte b, string values)
{
if (ByteDictionary == null)
{
ByteDictionary = new Dictionary<byte, string>();
}
ByteDictionary.Add(b, values);
}
public static List<List<T>> Split<T>(this List<T> list, int parts)
{
int i = 0;
var splits = from item in list
group item by i++ % parts into part
select part.ToList();
return splits.ToList();
}
}
public class Node
{
public byte value;
public long freq;
public Node LeftNode;
public Node RightNode;
public void Traverse(string path)
{
if (LeftNode == null)
{
ByteValues.AddValues(value, path);
}
else
{
LeftNode.Traverse(path + "0");
RightNode.Traverse(path + "1");
}
}
}
public partial class MainWindow : Window
{
Dictionary<byte, long> Bytefreq = new Dictionary<byte, long>();
string filename;
List<Node> Nodes = new List<Node>();
public MainWindow()
{
InitializeComponent();
}
private void Button_Click_1(object sender, RoutedEventArgs e)
{
OpenFileDialog dialog = new OpenFileDialog();
dialog.ShowDialog();
filename = dialog.FileName;
if (!string.IsNullOrEmpty(filename))
{
for (int i = 0; i <= byte.MaxValue; i++)
{
Bytefreq.Add((byte)i, 0);
}
BackgroundWorker worker = new BackgroundWorker();
worker.WorkerReportsProgress = true;
worker.DoWork += worker_DoWork;
worker.ProgressChanged += worker_ProgressChanged;
worker.RunWorkerCompleted += worker_RunWorkerCompleted;
worker.RunWorkerAsync();
}
}
void worker_DoWork(object sender, DoWorkEventArgs e)
{
BackgroundWorker worker = sender as BackgroundWorker;
using (BinaryReader reader = new BinaryReader(File.OpenRead(filename)))
{
long length = reader.BaseStream.Length;
int pos = 0;
System.Windows.Application.Current.Dispatcher.Invoke(() =>
{
pbProgress.Maximum = length;
});
while (pos < length)
{
byte[] inputbytes = reader.ReadBytes(1000000);
Bytefreq = inputbytes.OrderBy(x => x).GroupBy(x => x).ToDictionary(x => x.Key, x => (long)(Bytefreq[x.Key] + x.Select(l => l).ToList().Count));
pos = pos + inputbytes.Length;
worker.ReportProgress(pos);
}
}
}
void worker_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
pbProgress.Value = e.ProgressPercentage;
}
void worker1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
System.Windows.MessageBox.Show("DONE");
System.Windows.Application.Current.Shutdown();
}
void worker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
pbProgress.Value = 0;
foreach (KeyValuePair<byte, long> kv in Bytefreq)
{
Nodes.Add(new Node() { value = kv.Key, freq = kv.Value });
}
while (Nodes.Count > 1)
{
Nodes = Nodes.OrderBy(x => x.freq).ThenBy(x => x.value).ToList();
Node left = Nodes[0];
Node right = Nodes[1];
Node newnode = new Node() { LeftNode = left, RightNode = right, freq = left.freq + right.freq };
Nodes.Remove(left);
Nodes.Remove(right);
Nodes.Add(newnode);
}
Nodes[0].Traverse(string.Empty);
BackgroundWorker worker1 = new BackgroundWorker();
worker1.WorkerReportsProgress = true;
worker1.DoWork += worker1_DoWork;
worker1.ProgressChanged += worker_ProgressChanged;
worker1.RunWorkerCompleted += worker1_RunWorkerCompleted;
worker1.RunWorkerAsync();
}
void worker1_DoWork(object sender, DoWorkEventArgs e)
{
BackgroundWorker worker = sender as BackgroundWorker;
Dictionary<byte, string> bytelookup = ByteValues.ByteDictionary;
using (BinaryWriter writer = new BinaryWriter(File.Create(Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + "\\Test.txt")))
{
using (BinaryReader reader = new BinaryReader(File.OpenRead(filename)))
{
long length = reader.BaseStream.Length;
int pos = 0;
while (pos < length)
{
byte[] inputbytes = reader.ReadBytes(1000000);
StringBuilder builder = new StringBuilder();
List<string> outputbytelist = inputbytes.Select(b => bytelookup[b]).ToList();
outputbytelist.ForEach(x => builder.Append(x));
int numOfBytes = builder.ToString().Length / 8;
var bytesAsStrings = builder.ToString().Select((c, i) => new { Char = c, Index = i })
.GroupBy(x => x.Index / 8)
.Select(g => new string(g.Select(x => x.Char).ToArray()));
byte[] finalbytes = bytesAsStrings.Select(s => Convert.ToByte(s, 2)).ToArray();
writer.BaseStream.Write(finalbytes, 0, finalbytes.Length);
pos = pos + inputbytes.Length;
worker.ReportProgress(pos);
}
}
}
}
}
The problem is in the type of data you are trying to compress. When you say that a 25 MB video occupies 24 MB after compression, the key word is "video". Video data is notoriously hard to compress further (much like other media, such as music or images), because these formats are usually already compressed.
If you need to compress video, I'd look for dedicated codecs (MP4, MPEG, H.264), but some may not be free to use, so watch for licensing costs. Note that most codecs are lossy: they try to preserve visible quality but discard other information from the video. Most of them are good enough, but at some point you may notice artifacts.
You can also attempt lossless compression (Huffman, gzip, LZ, LZMA, 7z; most are available from the 7-Zip SDK), but it won't compress your data well due to its nature. The basic idea is: the more the data resembles random noise, the harder it is to compress. Bonus point: no lossless compression can shrink truly random data on average, not even by 1 bit.
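One way to check how close a file is to "random noise" before compressing it is to measure the Shannon entropy of its byte histogram. A byte-wise Huffman code like the one in the question cannot average fewer bits per byte than this value, so a result near 8 means almost no gain is possible. The following is a minimal sketch; EntropyEstimate and BitsPerByte are hypothetical names introduced here for illustration.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class EntropyEstimate
{
    // Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0).
    public static double BitsPerByte(byte[] data)
    {
        if (data.Length == 0) return 0.0;

        // Count how often each byte value occurs.
        long[] counts = new long[256];
        foreach (byte b in data) counts[b]++;

        // Sum -p * log2(p) over all byte values that actually occur.
        double entropy = 0.0;
        foreach (long c in counts)
        {
            if (c == 0) continue;
            double p = (double)c / data.Length;
            entropy -= p * Math.Log(p, 2);
        }
        return entropy;
    }
}
```

For example, `EntropyEstimate.BitsPerByte(File.ReadAllBytes(filename))` on an already compressed video or image will typically come out very close to 8, which matches the near-zero savings reported in the question.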

Thread program, 1st and last loop never has a ThreadedState other than 'Running'

I have a program that loops x times (10) and uses a specified number of threads (2). I'm using a thread array:
Thread[] myThreadArray = new Thread[2];
My loop counter, I believe, starts the first 2 threads just fine, but when it gets to loop 3, which goes back to thread 0 (zero-based), it hangs. The weird thing is, if I throw a MessageBox.Show() in there to check the ThreadState (which shows thread 0 is still running), it will continue on through 9 of the 10 loops. But if no MessageBox.Show() is there, it hangs when starting the 3rd loop.
I'm using .NET Framework 3.5 (I noticed that .NET 4.0 uses something called continuations...)
Here are some code examples:
Thread[] threads = new Thread[2];
int threadCounter = 0;
for (int counter = 0; counter < 10; counter++)
{
if (chkUseThreading.Checked)
{
TestRunResult runResult = new TestRunResult(counter + 1);
TestInfo tInfo = new TestInfo(conn, comm, runResult);
if (threads[threadCounter] != null)
{
// If this is here, then it will continue looping....otherwise, it hangs on the 3rd loop
MessageBox.Show(threads[threadCounter].ThreadState.ToString());
while (threads[threadCounter].IsAlive || threads[threadCounter].ThreadState == ThreadState.Running)
Thread.Sleep(1);
threads[threadCounter] = null;
}
// ExecuteTest is a non-static method
threads[threadCounter] = new Thread(new ThreadStart(delegate { ExecuteTest(tInfo); }));
threads[threadCounter].Name = "PerformanceTest" + (counter + 1);
try
{
threads[threadCounter].Start();
if ((threadCounter + 1) == threadCount)
threadCounter = 0;
else
threadCounter++;
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
Application.DoEvents();
}
}
while (true)
{
int threadsFinished = 0;
for (int counter = 0; counter < threadCount; counter++)
{
if (!threads[counter].IsAlive || threads[counter].ThreadState == ThreadState.Stopped)
threadsFinished++;
}
if (threadsFinished == threadCount)
break;
else
Thread.Sleep(1);
}
Obviously the problem has something to do with how I'm checking whether thread #1 or #2 is done. IsAlive always returns true, and ThreadState is always "Running" for the threads of loops 1 and 10.
Where am I going wrong with this?
Update: here's the ExecuteTest() method:
private void ExecuteTest(object tInfo)
{
TestInfo testInfo = tInfo as TestInfo;
Exception error = null;
DateTime endTime;
TimeSpan duration;
DateTime startTime = DateTime.Now;
try
{
if (testInfo.Connection.State != ConnectionState.Open)
{
testInfo.Connection.ConnectionString = connString;
testInfo.Connection.Open();
}
testInfo.Command.ExecuteScalar();
}
catch (Exception ex)
{
error = ex;
failedCounter++;
//if (chkCancelOnError.Checked)
// break;
}
finally
{
endTime = DateTime.Now;
duration = endTime - startTime;
RunTimes.Add(duration);
testInfo.Result.StartTime = startTime;
testInfo.Result.EndTime = endTime;
testInfo.Result.Duration = duration;
testInfo.Result.Error = error;
TestResults.Add(testInfo.Result);
// This part must be threadsafe...
if (lvResults.InvokeRequired)
{
SetTextCallback d = new SetTextCallback(ExecuteTest);
this.Invoke(d, new object[] { tInfo });
}
else
{
lvResults.Items.Add(testInfo.Result.ConvertToListViewItem());
#region Update Results - This wouldn't work in its own method in the threaded version
const string msPrefix = "ms";
// ShortestRun
TimeSpan shortest = GetShortestRun(RunTimes);
tbShortestRun.Text = shortest.TotalMilliseconds + msPrefix;
// AverageRun
TimeSpan average = GetAverageRun(RunTimes);
tbAverageRun.Text = average.TotalMilliseconds + msPrefix;
// MeanRun
TimeSpan mean = GetMeanRun(RunTimes);
tbMeanRun.Text = mean.TotalMilliseconds + msPrefix;
// LongestRun
TimeSpan longest = GetLongestRun(RunTimes);
tbLongestRun.Text = longest.TotalMilliseconds + msPrefix;
// ErrorCount
int errorCount = GetErrorCount(TestResults);
tbErrorCount.Text = errorCount.ToString();
#endregion
}
testInfo.Command.Dispose();
Application.DoEvents();
}
}
Can you post a snippet of run()? Doesn't Thread.currentThread().notifyAll() help? Maybe each thread is waiting for another thread to do something, resulting in a deadlock?

Read/Write text file progressbar in C#

I have a function that reads a big text file, extracts a part of it (between a given start and end), and saves the extracted data to another text file. Since the file is very big, I need a progress bar while reading the stream and another one while writing the extracted text to the other file. P.S. start and end are given as datetimes.
using (StreamReader sr = new StreamReader(file,System.Text.Encoding.ASCII))
{
while (sr.EndOfStream == false)
{
line = sr.ReadLine();
if (line.IndexOf(start) != -1)
{
using (StreamWriter sw = new StreamWriter(DateTime.Now.ToString().Replace("/", "-").Replace(":", "-") + "cut"))
{
sw.WriteLine(line);
while (sr.EndOfStream == false && line.IndexOf(end) == -1)
{
line = sr.ReadLine();
sw.WriteLine(line);
}
}
richTextBox1.Text += "done ..." + "\n";
break;
}
}
}
The first thing to do would be to work out how long the file is, using FileInfo:
http://msdn.microsoft.com/en-us/library/system.io.fileinfo.aspx
FileInfo fileInfo = new FileInfo(file);
long length = fileInfo.Length;
I would suggest you do it like this:
private long currentPosition = 0;
private void UpdateProgressBar(int lineLength)
{
currentPosition += lineLength; // add 2 more if you need to count the carriage return and line feed
progressBar.Value = (int)(((decimal)currentPosition / (decimal)length) * (decimal)100);
}
private void CopyFile()
{
progressBar.Minimum = 0;
progressBar.Maximum = 100;
currentPosition = 0;
using (StreamReader sr = new StreamReader(file,System.Text.Encoding.ASCII))
{
while (sr.EndOfStream == false)
{
line = sr.ReadLine();
UpdateProgressBar(line.Length);
if (line.IndexOf(start) != -1)
{
using (StreamWriter sw = new StreamWriter(DateTime.Now.ToString().Replace("/", "-").Replace(":", "-") + "cut"))
{
sw.WriteLine(line);
while (sr.EndOfStream == false && line.IndexOf(end) == -1)
{
line = sr.ReadLine();
UpdateProgressBar(line.Length);
sw.WriteLine(line);
}
}
richTextBox1.Text += "done ..." + "\n";
break;
}
}
}
}
This calculates the percentage of the file that has been read and sets the progress bar to that value. That way you don't have to worry that the length is a long while the progress bar uses int.
If you would rather round than truncate (the cast to int above always drops the decimals, and thus rounds down), do this:
progressBar.Value = (int)Math.Round(((decimal)currentPosition / (decimal)length) * (decimal)100, 0);
Is this on a background thread? Don't forget that you will have to call this.Invoke to update the progress bar or else you will get a cross thread exception.
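As an alternative to summing line lengths, the percentage can be derived from the position of the underlying stream, which is already measured in bytes. The following is a minimal sketch, not the original answer's method: ReadProgress, CountLines and reportPercent are hypothetical names. Note that StreamReader reads ahead in buffered blocks, so the reported value can run ahead of the line actually being processed by up to one buffer.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper for illustration only.
public static class ReadProgress
{
    // Reads 'path' line by line and calls 'reportPercent' whenever the
    // whole-number percentage of bytes consumed changes. Returns the line count.
    public static int CountLines(string path, Action<int> reportPercent)
    {
        int lines = 0;
        int lastPercent = -1;
        using (var stream = File.OpenRead(path))
        using (var reader = new StreamReader(stream, Encoding.ASCII))
        {
            long length = stream.Length;
            while (reader.ReadLine() != null)
            {
                lines++;
                // The loop body only runs for non-empty files, so length > 0 here.
                int percent = (int)(stream.Position * 100 / length);
                if (percent != lastPercent)
                {
                    lastPercent = percent;
                    reportPercent(percent);
                }
            }
        }
        return lines;
    }
}
```

Reporting only when the percentage changes keeps the number of UI updates at 101 or fewer, however many lines the file has. When this runs on a background thread, the callback still has to marshal to the UI thread, for example `p => this.Invoke((Action)(() => progressBar.Value = p))`.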
