Huffman coding issue with file size after compression - C#

I am using Huffman coding to create a compression algorithm for compressing any sort of file, but I can see that the compressed size is almost the same as the original size. For example, a 25 MB video occupies 24 MB after compression, and a 606 KB image occupies 605 KB after compression. Below is my entire code. Kindly let me know if I am doing anything wrong.
public static class ByteValues
{
public static Dictionary<byte, string> ByteDictionary;
public static void AddValues(byte b, string values)
{
if (ByteDictionary == null)
{
ByteDictionary = new Dictionary<byte, string>();
}
ByteDictionary.Add(b, values);
}
public static List<List<T>> Split<T>(this List<T> list, int parts)
{
int i = 0;
var splits = from item in list
group item by i++ % parts into part
select part.ToList();
return splits.ToList();
}
}
public class Node
{
public byte value;
public long freq;
public Node LeftNode;
public Node RightNode;
public void Traverse(string path)
{
if (LeftNode == null)
{
ByteValues.AddValues(value, path);
}
else
{
LeftNode.Traverse(path + "0");
RightNode.Traverse(path + "1");
}
}
}
public partial class MainWindow : Window
{
Dictionary<byte, long> Bytefreq = new Dictionary<byte, long>();
string filename;
List<Node> Nodes = new List<Node>();
public MainWindow()
{
InitializeComponent();
}
private void Button_Click_1(object sender, RoutedEventArgs e)
{
OpenFileDialog dialog = new OpenFileDialog();
dialog.ShowDialog();
filename = dialog.FileName;
if (!string.IsNullOrEmpty(filename))
{
for (int i = 0; i <= byte.MaxValue; i++)
{
Bytefreq.Add((byte)i, 0);
}
BackgroundWorker worker = new BackgroundWorker();
worker.WorkerReportsProgress = true;
worker.DoWork += worker_DoWork;
worker.ProgressChanged += worker_ProgressChanged;
worker.RunWorkerCompleted += worker_RunWorkerCompleted;
worker.RunWorkerAsync();
}
}
void worker_DoWork(object sender, DoWorkEventArgs e)
{
BackgroundWorker worker = sender as BackgroundWorker;
using (BinaryReader reader = new BinaryReader(File.OpenRead(filename)))
{
long length = reader.BaseStream.Length;
int pos = 0;
System.Windows.Application.Current.Dispatcher.Invoke(() =>
{
pbProgress.Maximum = length;
});
while (pos < length)
{
byte[] inputbytes = reader.ReadBytes(1000000);
Bytefreq = inputbytes.OrderBy(x => x).GroupBy(x => x).ToDictionary(x => x.Key, x => (long)(Bytefreq[x.Key] + x.Select(l => l).ToList().Count));
pos = pos + inputbytes.Length;
worker.ReportProgress(pos);
}
}
}
void worker_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
pbProgress.Value = e.ProgressPercentage;
}
void worker1_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
System.Windows.MessageBox.Show("DONE");
System.Windows.Application.Current.Shutdown();
}
void worker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
pbProgress.Value = 0;
foreach (KeyValuePair<byte, long> kv in Bytefreq)
{
Nodes.Add(new Node() { value = kv.Key, freq = kv.Value });
}
while (Nodes.Count > 1)
{
Nodes = Nodes.OrderBy(x => x.freq).ThenBy(x => x.value).ToList();
Node left = Nodes[0];
Node right = Nodes[1];
Node newnode = new Node() { LeftNode = left, RightNode = right, freq = left.freq + right.freq };
Nodes.Remove(left);
Nodes.Remove(right);
Nodes.Add(newnode);
}
Nodes[0].Traverse(string.Empty);
BackgroundWorker worker1 = new BackgroundWorker();
worker1.WorkerReportsProgress = true;
worker1.DoWork += worker1_DoWork;
worker1.ProgressChanged += worker_ProgressChanged;
worker1.RunWorkerCompleted += worker1_RunWorkerCompleted;
worker1.RunWorkerAsync();
}
void worker1_DoWork(object sender, DoWorkEventArgs e)
{
BackgroundWorker worker = sender as BackgroundWorker;
Dictionary<byte, string> bytelookup = ByteValues.ByteDictionary;
using (BinaryWriter writer = new BinaryWriter(File.Create(Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + "\\Test.txt")))
{
using (BinaryReader reader = new BinaryReader(File.OpenRead(filename)))
{
long length = reader.BaseStream.Length;
int pos = 0;
while (pos < length)
{
byte[] inputbytes = reader.ReadBytes(1000000);
StringBuilder builder = new StringBuilder();
List<string> outputbytelist = inputbytes.Select(b => bytelookup[b]).ToList();
outputbytelist.ForEach(x => builder.Append(x));
int numOfBytes = builder.ToString().Length / 8;
var bytesAsStrings = builder.ToString().Select((c, i) => new { Char = c, Index = i })
.GroupBy(x => x.Index / 8)
.Select(g => new string(g.Select(x => x.Char).ToArray()));
byte[] finalbytes = bytesAsStrings.Select(s => Convert.ToByte(s, 2)).ToArray();
writer.BaseStream.Write(finalbytes, 0, finalbytes.Length);
pos = pos + inputbytes.Length;
worker.ReportProgress(pos);
}
}
}
}
}

The problem is in the type of data you are trying to compress. When you say "e.g. a 25 MB video occupies 24 MB after compression", the key word here is video. Video data is notoriously hard to compress (much like other types of binary data, such as music or images).
If you need to compress video, I'd look for dedicated codecs (MP4, MPEG, H.264), but some may not be free to use, so watch out for licensing costs. Note that most codecs are lossy: they try to preserve visible quality while discarding other information from the video. Most of this is good enough, but at some point you may notice artifacts.
You can also attempt to use lossless compression (like Huffman, gzip, LZ, LZMA, 7z; most are available in the 7-Zip SDK, etc.), but this won't compress your data well because of its nature. The basic idea is: the more the data resembles random noise, the harder it is to compress. Bonus point: you cannot physically compress random data with any lossless compression, not even by 1 bit (read about this here).
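If you want a quick check of how compressible a file even is before encoding it, you can estimate the best case directly from the byte frequencies. The sketch below (illustrative names, not part of the code above) computes the Shannon entropy in bits per byte; for video, JPEG or ZIP payloads it comes out close to 8.0, which is why a byte-wise Huffman coder barely shrinks them, while plain text usually lands around 4-5 bits per byte:
using System;
using System.IO;
static class CompressibilityCheck
{
    // Lower bound, in bits per byte, for any byte-wise prefix code such as Huffman.
    public static double BitsPerByte(string path)
    {
        var counts = new long[256];
        long total = 0;
        using (var stream = File.OpenRead(path))
        {
            var buffer = new byte[1 << 20];
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++) counts[buffer[i]]++;
                total += read;
            }
        }
        double bits = 0;
        foreach (long c in counts)
        {
            if (c == 0) continue;
            double p = (double)c / total;
            bits += p * -Math.Log(p, 2); // -p * log2(p)
        }
        return bits; // close to 8.0 means essentially incompressible for a byte-wise coder
    }
}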

Related

Simulating message latency accurately

I have written a simple "latency simulator" which works, but at times, messages are delayed for longer than the time specified. I need help to ensure that messages are delayed for the correct amount of time.
The main problem, I believe, is that I am using Thread.Sleep(x), whose accuracy depends on various factors but mainly on the clock interrupt rate, which gives Thread.Sleep() a resolution of roughly 15 ms. Further, intensive tasks will demand more CPU time and will occasionally result in a delay greater than the one requested. If you are not familiar with the resolution issues of Thread.Sleep, you can read these SO posts: here, here and here.
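If you want to see that granularity yourself, a tiny illustrative check is to request a 1 ms sleep repeatedly and measure what you actually get:
var sw = new System.Diagnostics.Stopwatch();
for (int i = 0; i < 10; i++)
{
    sw.Restart();
    System.Threading.Thread.Sleep(1);          // asks for 1 ms
    sw.Stop();
    Console.WriteLine(sw.ElapsedMilliseconds); // often prints ~15 when the default 15.6 ms timer period is in effect
}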
This is my LatencySimulator:
public class LatencySimulatorResult: EventArgs
{
public int messageNumber { get; set; }
public byte[] message { get; set; }
}
public class LatencySimulator
{
private int messageNumber;
private int latency = 0;
private int processedMessageCount = 0;
public event EventHandler messageReady;
public void Delay(byte[] message, int delay)
{
latency = delay;
var result = new LatencySimulatorResult();
result.message = message;
result.messageNumber = messageNumber;
if (latency == 0)
{
if (messageReady != null)
messageReady(this, result);
}
else
{
ThreadPool.QueueUserWorkItem(ThreadPoolCallback, result);
}
Interlocked.Increment(ref messageNumber);
}
private void ThreadPoolCallback(object threadContext)
{
Thread.Sleep(latency);
var next = (LatencySimulatorResult)threadContext;
var ready = next.messageNumber == processedMessageCount + 1;
while (ready == false)
{
ready = next.messageNumber == processedMessageCount + 1;
}
if (messageReady != null)
messageReady(this, next);
Interlocked.Increment(ref processedMessageCount);
}
}
To use it, you create a new instance and bind to the event handler:
var latencySimulator = new LatencySimulator();
latencySimulator.messageReady += MessageReady;
You then call latencySimulator.Delay(someBytes, someDelay);
When a message has finished being delayed, the event is fired and you can then process the delayed message.
It is important that the order in which messages are added is maintained. I cannot have them coming out the other end of the latency simulator in some random order.
Here is a test program to use the latency simulator and to see how long messages have been delayed for:
private static LatencySimulator latencySimulator;
private static ConcurrentDictionary<int, PendingMessage> pendingMessages;
private static List<long> measurements;
static void Main(string[] args)
{
var results = TestLatencySimulator();
var anomalies = results.Result.Where(x=>x > 32).ToList();
foreach (var result in anomalies)
{
Console.WriteLine(result);
}
Console.ReadLine();
}
static async Task<List<long>> TestLatencySimulator()
{
latencySimulator = new LatencySimulator();
latencySimulator.messageReady += MessageReady;
var numberOfMeasurementsMax = 1000;
pendingMessages = new ConcurrentDictionary<int, PendingMessage>();
measurements = new List<long>();
var sendTask = Task.Factory.StartNew(() =>
{
for (var i = 0; i < numberOfMeasurementsMax; i++)
{
var message = new Message { Id = i };
pendingMessages.TryAdd(i, new PendingMessage() { Id = i });
latencySimulator.Delay(Serialize(message), 30);
Thread.Sleep(50);
}
});
//Spin some tasks up to simulate high CPU usage
Task.Factory.StartNew(() => { FindPrimeNumber(100000); });
Task.Factory.StartNew(() => { FindPrimeNumber(100000); });
Task.Factory.StartNew(() => { FindPrimeNumber(100000); });
sendTask.Wait();
return measurements;
}
static long FindPrimeNumber(int n)
{
int count = 0;
long a = 2;
while (count < n)
{
long b = 2;
int prime = 1;// to check if found a prime
while (b * b <= a)
{
if (a % b == 0)
{
prime = 0;
break;
}
b++;
}
if (prime > 0)
{
count++;
}
a++;
}
return (--a);
}
private static void MessageReady(object sender, EventArgs e)
{
LatencySimulatorResult result = (LatencySimulatorResult)e;
var message = (Message)Deserialize(result.message);
if (pendingMessages.ContainsKey(message.Id) != true) return;
pendingMessages[message.Id].stopwatch.Stop();
measurements.Add(pendingMessages[message.Id].stopwatch.ElapsedMilliseconds);
}
static object Deserialize(byte[] arrBytes)
{
using (var memStream = new MemoryStream())
{
var binForm = new BinaryFormatter();
memStream.Write(arrBytes, 0, arrBytes.Length);
memStream.Seek(0, SeekOrigin.Begin);
var obj = binForm.Deserialize(memStream);
return obj;
}
}
static byte[] Serialize<T>(T obj)
{
BinaryFormatter bf = new BinaryFormatter();
using (var ms = new MemoryStream())
{
bf.Serialize(ms, obj);
return ms.ToArray();
}
}
If you run this code, you will see that about 5% of the messages are delayed for more than the expected 30ms. In fact, some are as high as 60ms. Without any background tasks or high CPU usage, the simulator behaves as expected.
I need them all to be 30 ms (or as close as possible) - I do not want arbitrary 50-60 ms delays.
Can anyone suggest how I can refactor this code so that I can achieve the desired result, but without the use of Thread.Sleep() and with as little CPU overhead as possible?
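Not from the original thread, but as a hedged sketch of one direction: a single consumer thread draining a FIFO queue keeps the ordering guarantee without the busy-wait loop in ThreadPoolCallback; the remaining Sleep imprecision would still need a higher-resolution system timer, which is outside this sketch.
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
// Sketch only: ordered delayed delivery via one consumer thread and a deadline per message.
public class QueuedLatencySimulator
{
    private readonly BlockingCollection<Tuple<long, byte[]>> _queue =
        new BlockingCollection<Tuple<long, byte[]>>();
    private readonly Stopwatch _clock = Stopwatch.StartNew();
    public event Action<byte[]> MessageReady;
    public QueuedLatencySimulator()
    {
        new Thread(Consume) { IsBackground = true }.Start();
    }
    public void Delay(byte[] message, int delayMs)
    {
        // FIFO order is preserved because exactly one thread consumes the queue.
        _queue.Add(Tuple.Create(_clock.ElapsedMilliseconds + delayMs, message));
    }
    private void Consume()
    {
        foreach (var item in _queue.GetConsumingEnumerable())
        {
            long remaining = item.Item1 - _clock.ElapsedMilliseconds;
            if (remaining > 0)
                Thread.Sleep((int)remaining); // still limited by the system timer resolution
            var handler = MessageReady;
            if (handler != null)
                handler(item.Item2);
        }
    }
}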

How to use BiQuadFilter of NAudio?

I use NAudio to record sound from the microphone and save it to a file. For this I use:
public WaveFileWriter m_WaveFile = null;
m_WaveFile = new WaveFileWriter(strFile, m_WaveSource.WaveFormat);
void DataAvailable(object sender, WaveInEventArgs e)
{
if (m_WaveFile != null)
{
m_WaveFile.Write(e.Buffer, 0, e.BytesRecorded);
}
}
Now I would like to apply a high-pass filter and a low-pass filter to the recorded sound. So far I have found that BiQuadFilter should do this for me, but I don't understand how to use it.
The examples I found look all like this:
var r = BiQuadFilter.LowPassFilter(44100, 1500, 1);
var reader = new WaveFileReader(File.OpenRead(strFile));
var filter = new MyWaveProvider(reader, r); // reader is the source for filter
var waveOut = new WaveOut();
waveOut.Init(filter); // filter is the source for waveOut
waveOut.Play();
If I understand this correctly, the filter is applied to the class that is playing the sound. But I don't want to play the sound; I want the high- and low-pass filters applied to the file and the result saved to a new file. How can I do that?
Edit:
This is the MyWaveProvider class:
class MyWaveProvider : ISampleProvider
{
private ISampleProvider sourceProvider;
private float cutOffFreq;
private int channels;
private BiQuadFilter[] filters;
public MyWaveProvider (ISampleProvider sourceProvider, int cutOffFreq)
{
this.sourceProvider = sourceProvider;
this.cutOffFreq = cutOffFreq;
channels = sourceProvider.WaveFormat.Channels;
filters = new BiQuadFilter[channels];
CreateFilters();
}
private void CreateFilters()
{
for (int n = 0; n < channels; n++)
if (filters[n] == null)
filters[n] = BiQuadFilter.LowPassFilter(44100, cutOffFreq, 1);
else
filters[n].SetLowPassFilter(44100, cutOffFreq, 1);
}
public WaveFormat WaveFormat { get { return sourceProvider.WaveFormat; } }
public int Read(float[] buffer, int offset, int count)
{
int samplesRead = sourceProvider.Read(buffer, offset, count);
for (int i = 0; i < samplesRead; i++)
buffer[offset + i] = filters[(i % channels)].Transform(buffer[offset + i]);
return samplesRead;
}
}
The following code should satisfy what you are trying to do:
using (var reader = new WaveFileReader(File.OpenRead(strFile)))
{
var r = BiQuadFilter.LowPassFilter(44100, 1500, 1);
// reader is the source for filter
using (var filter = new MyWaveProvider(reader, r))
{
WaveFileWriter.CreateWaveFile("filteroutput.wav", filter);
}
}
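Note that the MyWaveProvider shown in the question takes an ISampleProvider and an int cut-off frequency rather than a BiQuadFilter, so the call has to match that signature; a sketch that compiles against it (assuming NAudio's ToSampleProvider() extension and WaveFileWriter.CreateWaveFile16) could look like this:
using (var reader = new WaveFileReader(File.OpenRead(strFile)))
{
    // Convert the PCM wave stream to floating-point samples for the BiQuad filters.
    var filtered = new MyWaveProvider(reader.ToSampleProvider(), 1500);
    // Pull every sample through the filter and write the result as a 16-bit WAV file.
    WaveFileWriter.CreateWaveFile16("filteroutput.wav", filtered);
}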
There's an example of using the BiQuadFilter to make a multi-band equalizer in the NAudio WPF demo code. You can see the Equalizer code here

Importing and removing duplicates from a massive amount of text files using C# and Redis

This is a bit of a doozy and it's been a while since I worked with C#, so bear with me:
I'm running a JRuby script to iterate through 900 files (5 MB to 1,500 MB in size) to figure out how many dupes STILL exist within these (already uniq'd) files. I had little luck with awk.
My latest idea was to insert them into a local MongoDB instance like so:
db.collection('hashes').update({ :_id => hash}, { $inc: { count: 1} }, { upsert: true)
... so that later I could just query it like db.collection.where({ count: { $gt: 1 } }) to get all the dupes.
This is working great except it's been over 24 hours and at the time of writing I'm at 72,532,927 Mongo entries and growing.
I think Ruby's .each_line is badly bottlenecking the I/O.
So what I'm thinking now is compiling a C# program which fires up a thread PER FILE and inserts each line (an MD5 hash) into a Redis list.
From there, I could have another compiled C# program simply pop the values off and ignore the save if the count is 1.
So the questions are:
Will using a compiled file reader and multithreading the file reads significantly improve performance?
Is using Redis even necessary? With a tremendous amount of AWS memory, could I not just use the threads to fill some sort of a list atomically and proceed from there?
Thanks in advance.
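For reference, the Redis plan described in the question (increment a counter per hash, then look for counts above 1) is only a few lines with a client such as StackExchange.Redis; a hedged sketch, assuming a local Redis instance and a hypothetical folder of hash files:
using System.IO;
using System.Threading.Tasks;
using StackExchange.Redis;
class RedisDupeCounter
{
    static void Main()
    {
        var redis = ConnectionMultiplexer.Connect("localhost");
        var db = redis.GetDatabase();
        // One parallel worker per file, one INCR per md5 line, as the question proposes.
        Parallel.ForEach(Directory.EnumerateFiles(@"c:\hashfiles"), file =>
        {
            foreach (var line in File.ReadLines(file))
                db.StringIncrement("count:" + line);
        });
        // Any key whose value ends up greater than 1 is a duplicate; note that scanning
        // hundreds of millions of keys afterwards is itself a heavy operation.
    }
}
Whether this beats a purely in-memory approach depends mostly on whether the distinct hashes fit in RAM, which is essentially what the answer below explores.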
Updated with a new solution. The main idea is to calculate a dummy hash (just the sum of all chars in the string) for each line and store it in Dictionary<ulong, List<LinePosition>> _hash2LinePositions. Multiple lines in the same stream can produce the same hash; that case is handled by the List in the dictionary value. When two hashes are equal, we read back and compare the actual lines from the streams. LinePosition stores the information about a line: its position in the stream and its length. I don't have files as huge as yours, but my tests show that it works. Here is the full code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
public class Solution
{
struct LinePosition
{
public long Start;
public long Length;
public LinePosition(long start, long count)
{
Start = start;
Length = count;
}
public override string ToString()
{
return string.Format("Start: {0}, Length: {1}", Start, Length);
}
}
class TextFileHasher : IDisposable
{
readonly Dictionary<ulong, List<LinePosition>> _hash2LinePositions;
readonly Stream _stream;
bool _isDisposed;
public HashSet<ulong> Hashes { get; private set; }
public string Name { get; private set; }
public TextFileHasher(string name, Stream stream)
{
Name = name;
_stream = stream;
_hash2LinePositions = new Dictionary<ulong, List<LinePosition>>();
Hashes = new HashSet<ulong>();
}
public override string ToString()
{
return Name;
}
public void CalculateFileHash()
{
int readByte = -1;
ulong dummyLineHash = 0;
// Line start position in file
long startPosition = 0;
while ((readByte = _stream.ReadByte()) != -1) {
// Read until new line
if (readByte == '\r' || readByte == '\n') {
// If there was data
if (dummyLineHash != 0) {
// Add line hash and line position to the dict
AddToDictAndHash(dummyLineHash, startPosition, _stream.Position - 1 - startPosition);
// Reset line hash
dummyLineHash = 0;
}
}
else {
// Was it new line ?
if (dummyLineHash == 0)
startPosition = _stream.Position - 1;
// Calculate dummy hash
dummyLineHash += (uint)readByte;
}
}
if (dummyLineHash != 0) {
// Add line hash and line position to the dict
AddToDictAndHash(dummyLineHash, startPosition, _stream.Position - startPosition);
// Reset line hash
dummyLineHash = 0;
}
}
public List<LinePosition> GetLinePositions(ulong hash)
{
return _hash2LinePositions[hash];
}
public List<string> GetDuplicates()
{
List<string> duplicates = new List<string>();
foreach (var key in _hash2LinePositions.Keys) {
List<LinePosition> linesPos = _hash2LinePositions[key];
if (linesPos.Count > 1) {
duplicates.AddRange(FindExactDuplicates(linesPos));
}
}
return duplicates;
}
public void Dispose()
{
if (_isDisposed)
return;
_stream.Dispose();
_isDisposed = true;
}
private void AddToDictAndHash(ulong hash, long start, long count)
{
List<LinePosition> linesPosition;
if (!_hash2LinePositions.TryGetValue(hash, out linesPosition)) {
linesPosition = new List<LinePosition>() { new LinePosition(start, count) };
_hash2LinePositions.Add(hash, linesPosition);
}
else {
linesPosition.Add(new LinePosition(start, count));
}
Hashes.Add(hash);
}
public byte[] GetLineAsByteArray(LinePosition prevPos)
{
long len = prevPos.Length;
byte[] lineBytes = new byte[len];
_stream.Seek(prevPos.Start, SeekOrigin.Begin);
_stream.Read(lineBytes, 0, (int)len);
return lineBytes;
}
private List<string> FindExactDuplicates(List<LinePosition> linesPos)
{
List<string> duplicates = new List<string>();
linesPos.Sort((x, y) => x.Length.CompareTo(y.Length));
LinePosition prevPos = linesPos[0];
for (int i = 1; i < linesPos.Count; i++) {
if (prevPos.Length == linesPos[i].Length) {
var prevLineArray = GetLineAsByteArray(prevPos);
var thisLineArray = GetLineAsByteArray(linesPos[i]);
if (prevLineArray.SequenceEqual(thisLineArray)) {
var line = System.Text.Encoding.Default.GetString(prevLineArray);
duplicates.Add(line);
}
#if false
string prevLine = System.Text.Encoding.Default.GetString(prevLineArray);
string thisLine = System.Text.Encoding.Default.GetString(thisLineArray);
Console.WriteLine("PrevLine: {0}\r\nThisLine: {1}", prevLine, thisLine);
StringBuilder sb = new StringBuilder();
sb.Append(prevPos);
sb.Append(" is '");
sb.Append(prevLine);
sb.Append("'. ");
sb.AppendLine();
sb.Append(linesPos[i]);
sb.Append(" is '");
sb.Append(thisLine);
sb.AppendLine("'. ");
sb.Append("Equals => ");
sb.Append(prevLine.CompareTo(thisLine) == 0);
Console.WriteLine(sb.ToString());
#endif
}
else {
prevPos = linesPos[i];
}
}
return duplicates;
}
}
public static void Main(String[] args)
{
List<TextFileHasher> textFileHashers = new List<TextFileHasher>();
string text1 = "abc\r\ncba\r\nabc";
TextFileHasher tfh1 = new TextFileHasher("Text1", new MemoryStream(System.Text.Encoding.Default.GetBytes(text1)));
tfh1.CalculateFileHash();
textFileHashers.Add(tfh1);
string text2 = "def\r\ncba\r\nwet";
TextFileHasher tfh2 = new TextFileHasher("Text2", new MemoryStream(System.Text.Encoding.Default.GetBytes(text2)));
tfh2.CalculateFileHash();
textFileHashers.Add(tfh2);
string text3 = "def\r\nbla\r\nwat";
TextFileHasher tfh3 = new TextFileHasher("Text3", new MemoryStream(System.Text.Encoding.Default.GetBytes(text3)));
tfh3.CalculateFileHash();
textFileHashers.Add(tfh3);
List<string> totalDuplicates = new List<string>();
Dictionary<ulong, Dictionary<TextFileHasher, List<LinePosition>>> totalHashes = new Dictionary<ulong, Dictionary<TextFileHasher, List<LinePosition>>>();
textFileHashers.ForEach(tfh => {
foreach(var dummyHash in tfh.Hashes) {
Dictionary<TextFileHasher, List<LinePosition>> tfh2LinePositions = null;
if (!totalHashes.TryGetValue(dummyHash, out tfh2LinePositions))
totalHashes[dummyHash] = new Dictionary<TextFileHasher, List<LinePosition>>() { { tfh, tfh.GetLinePositions(dummyHash) } };
else {
List<LinePosition> linePositions = null;
if (!tfh2LinePositions.TryGetValue(tfh, out linePositions))
tfh2LinePositions[tfh] = tfh.GetLinePositions(dummyHash);
else
linePositions.AddRange(tfh.GetLinePositions(dummyHash));
}
}
});
HashSet<TextFileHasher> alreadyGotDuplicates = new HashSet<TextFileHasher>();
foreach(var hash in totalHashes.Keys) {
var tfh2LinePositions = totalHashes[hash];
var tfh = tfh2LinePositions.Keys.FirstOrDefault();
// Get duplicates in the TextFileHasher itself
if (tfh != null && !alreadyGotDuplicates.Contains(tfh)) {
totalDuplicates.AddRange(tfh.GetDuplicates());
alreadyGotDuplicates.Add(tfh);
}
if (tfh2LinePositions.Count <= 1) {
continue;
}
// Algo to get duplicates in more than 1 TextFileHashers
var tfhs = tfh2LinePositions.Keys.ToArray();
for (int i = 0; i < tfhs.Length; i++) {
var tfh1Positions = tfhs[i].GetLinePositions(hash);
for (int j = i + 1; j < tfhs.Length; j++) {
var tfh2Positions = tfhs[j].GetLinePositions(hash);
for (int k = 0; k < tfh1Positions.Count; k++) {
var tfh1Pos = tfh1Positions[k];
var tfh1ByteArray = tfhs[i].GetLineAsByteArray(tfh1Pos);
for (int m = 0; m < tfh2Positions.Count; m++) {
var tfh2Pos = tfh2Positions[m];
if (tfh1Pos.Length != tfh2Pos.Length)
continue;
var tfh2ByteArray = tfhs[j].GetLineAsByteArray(tfh2Pos);
if (tfh1ByteArray.SequenceEqual(tfh2ByteArray)) {
var line = System.Text.Encoding.Default.GetString(tfh1ByteArray);
totalDuplicates.Add(line);
}
}
}
}
}
}
Console.WriteLine();
if (totalDuplicates.Count > 0) {
Console.WriteLine("Total number of duplicates: {0}", totalDuplicates.Count);
Console.WriteLine("#######################");
totalDuplicates.ForEach(x => Console.WriteLine("{0}", x));
Console.WriteLine("#######################");
}
// Free resources
foreach (var tfh in textFileHashers)
tfh.Dispose();
}
}
If you have tons of RAM... you guys are overthinking it:
var fileLines = File.ReadAllLines(@"c:\file.csv").Distinct();
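To extend that idea across many files while keeping only the distinct lines and their counts in memory, a hedged sketch (hypothetical folder path) is a single dictionary of counts:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
class InMemoryDupeCount
{
    static void Main()
    {
        var counts = new Dictionary<string, long>();
        foreach (var path in Directory.EnumerateFiles(@"c:\data", "*.txt"))
        {
            // File.ReadLines streams the file instead of loading it whole.
            foreach (var line in File.ReadLines(path))
            {
                long n;
                counts.TryGetValue(line, out n);
                counts[line] = n + 1;
            }
        }
        Console.WriteLine("Lines appearing more than once: {0}",
            counts.Count(kv => kv.Value > 1));
    }
}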

Writing a file runs out of memory - C#

I am having some problems with a C# Windows Forms app.
My goal is to slice a big file (maybe > 5 GB) into files that each contain a million lines.
Looking at the code below, I have no idea why it runs out of memory.
Thanks.
StreamReader readfile = new StreamReader(...);
StreamWriter writefile = new StreamWriter(...);
string content;
while ((content = readfile.ReadLine()) != null)
{
writefile.Write(content + "\r\n");
i++;
if (i % 1000000 == 0)
{
index++;
writefile.Close();
writefile.Dispose();
writefile = new StreamWriter(...);
}
label5.Text = i.ToString();
label5.Update();
}
The error is probably in the
label5.Text = i.ToString();
label5.Update();
Just to make a test, I've written something like:
for (int i = 0; i < int.MaxValue; i++)
{
label1.Text = i.ToString();
label1.Update();
}
The app freezes at around 16,000-18,000 iterations (Windows 7 Pro SP1 x64, with the app running as both x86 and x64).
What probably happens is that by running your long operation in the main thread of the app, you stall the message queue of the window, and at a certain point it freezes. You can see that this is the problem by adding a
Application.DoEvents();
instead of the
label5.Update();
But even this is a false solution. The correct solution is to move the copying onto another thread and update the control every x milliseconds using the Invoke method (because you are on a secondary thread).
For example:
public void Copy(string source, string dest)
{
const int updateMilliseconds = 100;
int index = 0;
int i = 0;
StreamWriter writefile = null;
try
{
using (StreamReader readfile = new StreamReader(source))
{
writefile = new StreamWriter(dest + index);
// Initial value "back in time". Forces initial update
int milliseconds = unchecked(Environment.TickCount - updateMilliseconds);
string content;
while ((content = readfile.ReadLine()) != null)
{
writefile.Write(content);
writefile.Write("\r\n"); // Splitted to remove a string concatenation
i++;
if (i % 1000000 == 0)
{
index++;
writefile.Dispose();
writefile = new StreamWriter(dest + index);
// Force update
milliseconds = unchecked(milliseconds - updateMilliseconds);
}
int milliseconds2 = Environment.TickCount;
int diff = unchecked(milliseconds2 - milliseconds);
if (diff >= updateMilliseconds)
{
milliseconds = milliseconds2;
Invoke((Action)(() => label5.Text = string.Format("File {0}, line {1}", index, i)));
}
}
}
}
finally
{
if (writefile != null)
{
writefile.Dispose();
}
}
// Last update
Invoke((Action)(() => label5.Text = string.Format("File {0}, line {1} Finished", index, i)));
}
and call it with:
var thread = new Thread(() => Copy(@"C:\Temp\lst.txt", @"C:\Temp\output"));
thread.Start();
Note how it updates label5 every 100 milliseconds, plus once at the beginning (by setting the initial value of milliseconds "back in time"), each time the output file is changed (again by pushing milliseconds "back in time"), and once after everything has been disposed.
An even more correct example can be written using the BackgroundWorker class, which exists explicitly for this scenario. It has an event, ProgressChanged, that can be subscribed to in order to update the window.
Something like this:
private void button1_Click(object sender, EventArgs e)
{
BackgroundWorker backgroundWorker = new BackgroundWorker();
backgroundWorker.WorkerReportsProgress = true;
backgroundWorker.ProgressChanged += backgroundWorker_ProgressChanged;
backgroundWorker.RunWorkerCompleted += backgroundWorker_RunWorkerCompleted;
backgroundWorker.DoWork += backgroundWorker_DoWork;
backgroundWorker.RunWorkerAsync(new string[] { @"C:\Temp\lst.txt", @"C:\Temp\output" });
}
private void backgroundWorker_DoWork(object sender, DoWorkEventArgs e)
{
BackgroundWorker worker = sender as BackgroundWorker;
string[] arguments = (string[])e.Argument;
string source = arguments[0];
string dest = arguments[1];
const int updateMilliseconds = 100;
int index = 0;
int i = 0;
StreamWriter writefile = null;
try
{
using (StreamReader readfile = new StreamReader(source))
{
writefile = new StreamWriter(dest + index);
// Initial value "back in time". Forces initial update
int milliseconds = unchecked(Environment.TickCount - updateMilliseconds);
string content;
while ((content = readfile.ReadLine()) != null)
{
writefile.Write(content);
writefile.Write("\r\n"); // Splitted to remove a string concatenation
i++;
if (i % 1000000 == 0)
{
index++;
writefile.Dispose();
writefile = new StreamWriter(dest + index);
// Force update
milliseconds = unchecked(milliseconds - updateMilliseconds);
}
int milliseconds2 = Environment.TickCount;
int diff = unchecked(milliseconds2 - milliseconds);
if (diff >= updateMilliseconds)
{
milliseconds = milliseconds2;
worker.ReportProgress(0, new int[] { index, i });
}
}
}
}
finally
{
if (writefile != null)
{
writefile.Dispose();
}
}
// For the RunWorkerCompleted
e.Result = new int[] { index, i };
}
void backgroundWorker_ProgressChanged(object sender, ProgressChangedEventArgs e)
{
int[] state = (int[])e.UserState;
label5.Text = string.Format("File {0}, line {1}", state[0], state[1]);
}
void backgroundWorker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
int[] state = (int[])e.Result;
label5.Text = string.Format("File {0}, line {1} Finished", state[0], state[1]);
}

Converting Random Number/Letter Code to Read In Order?

So this is a noobish question, but I have this code and it generates random numbers and letters.
Using this code:
private readonly Random _rng = new Random();
private const string charaters = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz1234567890";
private string RandomString(int size)
{
char[] buffer = new char[size];
for (int i = 0; i < size; i++)
{
buffer[i] = charaters[_rng.Next(charaters.Length)];
}
return new string(buffer);
}
private void timer1_Tick(object sender, EventArgs e)
{
textBox3.Text = RandomString(5);
}
And I want it to read through all the characters in order, not randomly.
How would I do this?
I'm not sure what you mean exactly, but this returns "AaBbC":
characters.Substring(0,5);
Piecing together your comments, I think what you're looking for is this:
string s = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz1234567890";
foreach (char c in s)
{
textBox3.Text += c.ToString() + Environment.NewLine; // or perhaps textBox2?
}
Will this do the trick?
class Program
{
private static readonly Timer Timer = new Timer(100);
public static int CurrentValue;
private const string Charaters = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz1234567890";
static void Main()
{
Timer.Elapsed += delegate
{
if (CurrentValue < Charaters.Length)
{
Console.Write(string.Format("{0}{1}", Charaters[CurrentValue], Environment.NewLine));
CurrentValue++;
}
else
{
Timer.Stop();
}
};
Timer.Enabled = true;
Console.ReadLine();
}
}
Simple solution using LINQ:
string test = RandomString(5);
string result = string.Join(string.Empty, test.OrderBy(z => z));
This works because a string is a sequence of chars (it implements IEnumerable<char>), so OrderBy can sort its characters directly.
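For instance (illustrative input), sorting a random five-character string this way puts digits before uppercase before lowercase letters, because OrderBy compares the chars by their ordinal values:
string test = "bA3zQ"; // example of what RandomString(5) might return
string result = string.Join(string.Empty, test.OrderBy(z => z));
Console.WriteLine(result); // prints "3AQbz"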
