Well, I am currently trying to implement a compression algorithm in my project; it has to be LZ77, as a matter of fact. I am already able to decompress data, but I can't imagine where to start in terms of compressing. I thought it would be best to pass in a byte array with the data, but that's about it...
My problem is that all the descriptions of the algorithm are quite cryptic to me. I would appreciate a clear description of how the algorithm works and what I have to watch out for.
Also: am I obliged to use unsafe methods and pointers when coding in C#? I'd rather avoid that, I suppose.
Here is what I've got so far, thanks to the information you gave me:
private const int searchWindow = 4095;
private const byte lookaheadWindow = 15;
public static byte[] lzCompressData(byte[] input)
int position = 0;
List<byte> tempInput = input.ToList();
List<byte> output = new List<byte>();
MemoryStream init = new MemoryStream();
BinaryWriter inbw = new BinaryWriter(init);
inbw.Write((uint)(((input.Length << 8) & 0xFFFFFF00) | 0x10)); // cast: int & uint is a long, which would write an 8-byte header instead of 4 bytes
while (position < input.Length)
byte decoder = 0;
List<byte> tempOutput = new List<byte>();
for (int i = 0; i < 8; ++i)
List<byte> eligible;
if (position < searchWindow) // was 255; with 255, positions 255..4094 give GetRange a negative index
eligible = tempInput.GetRange(0, position);
else
eligible = tempInput.GetRange(position - searchWindow, searchWindow);
if (!(position > input.Length - 8))
MemoryStream ms = new MemoryStream(eligible.ToArray());
List<byte> currentSequence = new List<byte>();
int offset = 0;
int length = 0;
long tempoffset = StreamHelper.FindPosition(ms, currentSequence.ToArray());
while ((tempoffset != -1) && (length < lookaheadWindow) && position < input.Length - 8)
offset = (int)tempoffset;
length = currentSequence.Count;
if (length >= 3)
decoder = (byte)(decoder | (byte)(1 << i));
byte b1 = (byte)((length << 4) | (offset >> 8));
byte b2 = (byte)(offset & 0xFF);
if (position < input.Length)
return output.ToArray();
I would appreciate a clear description of how the algorithm works and what I have to watch for
It is very well explained here. If you have trouble understanding something specific, please ask about that.
Am I obliged to use unsafe methods and pointers when coding in C#?
You don't have to worry about that; there's no need to reinvent the wheel. It has already been implemented (see its implementation).
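To give a concrete picture of the compression side, here is a minimal greedy LZ77 sketch in plain C# on a byte[], with no unsafe code or pointers. The token layout is an assumption of mine, loosely modelled on the code in the question and not guaranteed to match your decompressor: a flag byte precedes each group of up to 8 tokens (bit 7 = first token); a 1 flag means a 2-byte back-reference (high nibble = length - 3, remaining 12 bits = distance - 1); a 0 flag means one literal byte. The names Lz77Sketch, Compress and Decompress are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: naive greedy LZ77 with an assumed token format (see lead-in).
public static class Lz77Sketch
{
    const int MaxDistance = 4096; // 12-bit distance field stores distance - 1
    const int MinMatch = 3;       // shorter matches are cheaper as literals
    const int MaxMatch = 18;      // 4-bit length field stores length - 3

    public static byte[] Compress(byte[] input)
    {
        var output = new List<byte>();
        int pos = 0;
        while (pos < input.Length)
        {
            int flagIndex = output.Count;
            output.Add(0); // placeholder for this group's flag byte
            byte flags = 0;
            for (int bit = 7; bit >= 0 && pos < input.Length; bit--)
            {
                // Brute-force search of the sliding window for the longest match.
                int bestLen = 0, bestDist = 0;
                for (int cand = Math.Max(0, pos - MaxDistance); cand < pos; cand++)
                {
                    int len = 0;
                    while (len < MaxMatch && pos + len < input.Length
                           && input[cand + len] == input[pos + len])
                        len++;
                    if (len > bestLen) { bestLen = len; bestDist = pos - cand; }
                }

                if (bestLen >= MinMatch)
                {
                    flags |= (byte)(1 << bit); // mark this token as a back-reference
                    int d = bestDist - 1;
                    output.Add((byte)(((bestLen - MinMatch) << 4) | (d >> 8)));
                    output.Add((byte)(d & 0xFF));
                    pos += bestLen;
                }
                else
                {
                    output.Add(input[pos++]); // literal byte, flag bit stays 0
                }
            }
            output[flagIndex] = flags;
        }
        return output.ToArray();
    }

    // The caller must know the original length, because the last group may be partial.
    public static byte[] Decompress(byte[] data, int originalLength)
    {
        var output = new List<byte>(originalLength);
        int pos = 0;
        while (output.Count < originalLength)
        {
            byte flags = data[pos++];
            for (int bit = 7; bit >= 0 && output.Count < originalLength; bit--)
            {
                if ((flags & (1 << bit)) != 0)
                {
                    int len = (data[pos] >> 4) + MinMatch;
                    int dist = (((data[pos] & 0x0F) << 8) | data[pos + 1]) + 1;
                    pos += 2;
                    int start = output.Count - dist;
                    // Copy one byte at a time so overlapping matches (dist < len) work.
                    for (int k = 0; k < len; k++)
                        output.Add(output[start + k]);
                }
                else
                {
                    output.Add(data[pos++]);
                }
            }
        }
        return output.ToArray();
    }
}
```

The window search is brute force (every earlier position within 4096 bytes is tried), which is fine for small inputs; faster encoders usually index the window by a hash of the next few bytes. The things to watch for are the details your decompressor depends on: the same flag-bit order, the same length and distance bias, and matches that overlap the current position, which the byte-by-byte copy handles.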
I am iterating through an array of bytes and adding the values of another array of bytes to it in a for loop.
var random = new Random();
byte[] bytes = new byte[20_000_000];
byte[] bytes2 = new byte[20_000_000];
for (int i = 0; i < bytes.Length; i++)
bytes[i] = (byte)random.Next(255);
for (int i = 0; i < bytes.Length; i++)
bytes2[i] = (byte)random.Next(255);
//how to optimize the part below
for (int i = 0; i < bytes.Length; i++)
bytes[i] += bytes2[i];
Is there any way to speed up the process so that it is faster than linear?
You could use Vector:
using System;
using System.Numerics;
using System.Runtime.InteropServices;
static void Add(Span<byte> dst, ReadOnlySpan<byte> src)
{
    Span<Vector<byte>> dstVec = MemoryMarshal.Cast<byte, Vector<byte>>(dst);
    ReadOnlySpan<Vector<byte>> srcVec = MemoryMarshal.Cast<byte, Vector<byte>>(src);
    for (int i = 0; i < dstVec.Length; ++i)
        dstVec[i] += srcVec[i];   // one SIMD add per Vector<byte>.Count bytes
    for (int i = dstVec.Length * Vector<byte>.Count; i < dst.Length; ++i)
        dst[i] += src[i];         // scalar loop for the leftover bytes
}
It will go even faster if you use a pointer here to align one of your arrays.
Pad the array length to the next multiple of 8 (it already is one in your example).
Use an unsafe context to create two ulong pointers to the start of the existing byte arrays. Use a for loop to iterate bytes.Length / 8 times, adding 8 bytes at a time.
On my system this runs in under 13 milliseconds, compared to 105 milliseconds for the original code.
You must enable the /unsafe option to use this code: open the project properties and select "Allow unsafe code".
var random = new Random();
byte[] bytes = new byte[20_000_000];
byte[] bytes2 = new byte[20_000_000];
int Len = bytes.Length >> 3; // >>3 is the same as / 8
ulong MASK = 0x8080808080808080;     // top bit of each byte
ulong MASKINV = 0x7f7f7f7f7f7f7f7f;  // low 7 bits of each byte
//Sanity check
if ((bytes.Length & 7) != 0) throw new Exception("bytes.Length is not a multiple of 8");
if ((bytes2.Length & 7) != 0) throw new Exception("bytes2.Length is not a multiple of 8");
unsafe
{
    //Add 8 bytes at a time, taking into account overflow between bytes:
    //summing only the low 7 bits keeps every carry inside its own byte,
    //and the XOR then restores each byte's top bit
    fixed (byte* pbBytes = &bytes[0])
    fixed (byte* pbBytes2 = &bytes2[0])
    {
        ulong* pBytes = (ulong*)pbBytes;
        ulong* pBytes2 = (ulong*)pbBytes2;
        for (int i = 0; i < Len; i++)
            pBytes[i] = ((pBytes2[i] & MASKINV) + (pBytes[i] & MASKINV)) ^ ((pBytes[i] ^ pBytes2[i]) & MASK);
    }
}
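If you'd rather not turn on unsafe code, the same masked 8-bytes-at-a-time addition can be written over a Span<ulong> view of the arrays using MemoryMarshal.Cast. This is a sketch assuming .NET Core 2.1+ (or the System.Memory package); SwarAdd and Add are hypothetical names, and a scalar tail loop removes the multiple-of-8 requirement.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper: byte-wise wrapping addition, 8 bytes per ulong,
// using the same masks as the unsafe version but without pointers.
public static class SwarAdd
{
    const ulong Mask = 0x8080808080808080;    // top bit of every byte lane
    const ulong MaskInv = 0x7f7f7f7f7f7f7f7f; // low 7 bits of every byte lane

    public static void Add(byte[] dst, byte[] src)
    {
        if (dst.Length != src.Length)
            throw new ArgumentException("Arrays must have the same length.");

        // Reinterpret the byte arrays as ulongs; trailing 0-7 bytes are not included.
        Span<ulong> d = MemoryMarshal.Cast<byte, ulong>(dst.AsSpan());
        ReadOnlySpan<ulong> s = MemoryMarshal.Cast<byte, ulong>(new ReadOnlySpan<byte>(src));

        for (int i = 0; i < d.Length; i++)
        {
            // Summing only the low 7 bits keeps every carry inside its own lane;
            // the XOR then sets each lane's top bit to a7 ^ b7 ^ carry-in.
            d[i] = ((d[i] & MaskInv) + (s[i] & MaskInv)) ^ ((d[i] ^ s[i]) & Mask);
        }

        // Scalar tail for lengths that are not a multiple of 8.
        for (int i = d.Length * sizeof(ulong); i < dst.Length; i++)
            dst[i] += src[i];
    }
}
```

MemoryMarshal.Cast does not require the arrays to be 8-byte aligned, so the alignment trick mentioned earlier is an optional extra rather than a requirement.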
You can utilize all your processors/cores, assuming that your machine has more than one.
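using System.Collections.Concurrent; // Partitioner
using System.Threading.Tasks;        // Parallel
Parallel.ForEach(Partitioner.Create(0, bytes.Length), range => {
    for (int i = range.Item1; i < range.Item2; i++)
        bytes[i] += bytes2[i];
});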
Update: The Vector<T> class can also be used in .NET Framework; it requires the System.Numerics.Vectors package. It offers parallelism within a single core by issuing Single Instruction, Multiple Data (SIMD) instructions, which most current processors support. It is only hardware-accelerated in 64-bit processes, so the [Prefer 32-bit] flag must be unchecked. In 32-bit processes the property Vector.IsHardwareAccelerated returns false, and performance is poor.
using System.Numerics;

/// <summary>Adds each pair of elements in two arrays, and replaces the
/// left array element with the result.</summary>
public static void Add_UsingVector(byte[] left, byte[] right, int start, int length)
{
    int i = start;
    int step = Vector<byte>.Count; // typically 16 or 32, depending on the CPU's SIMD width
    int end = start + length - step + 1;
    for (; i < end; i += step)
    {
        // Vectorize 'step' bytes from each array
        var vector1 = new Vector<byte>(left, i);
        var vector2 = new Vector<byte>(right, i);
        vector1 += vector2; // Vector arithmetic is unchecked only
        vector1.CopyTo(left, i);
    }
    for (; i < start + length; i++) // Process the last few elements
        unchecked { left[i] += right[i]; }
}
This runs 4-5 times faster than a simple loop, without using more than one thread (25% CPU usage on a 4-core PC).
Below is code that shows what needs to be done. I am looking for a faster solution. One option is to sum two arrays using bit manipulation (https://stackoverflow.com/a/55945544/4791668). I wonder whether there is a way to do it as described in the link and compute the average at the same time.
var random = new Random();
byte[] bytes = new byte[20_000_000];
byte[] bytes2 = new byte[20_000_000];
for (int i = 0; i < bytes.Length; i++)
bytes[i] = (byte)random.Next(255);
for (int i = 0; i < bytes.Length; i++)
bytes2[i] = (byte)random.Next(255);
//how to optimize the part below
for (int i = 0; i < bytes.Length; i++)
bytes[i] = (byte)((bytes[i] + bytes2[i]) / 2);
/////////// Solution that needs to be improved. It doesn't do the average part.
var random = new Random();
byte[] bytes = new byte[20_000_000];
byte[] bytes2 = new byte[20_000_000];
int Len = bytes.Length >> 3; // >>3 is the same as / 8
ulong MASK = 0x8080808080808080;
ulong MASKINV = 0x7f7f7f7f7f7f7f7f;
//Sanity check
if ((bytes.Length & 7) != 0) throw new Exception("bytes.Length is not a multiple of 8");
if ((bytes2.Length & 7) != 0) throw new Exception("bytes2.Length is not a multiple of 8");
unsafe
{
    //Add 8 bytes at a time, taking into account overflow between bytes
    fixed (byte* pbBytes = &bytes[0])
    fixed (byte* pbBytes2 = &bytes2[0])
    {
        ulong* pBytes = (ulong*)pbBytes;
        ulong* pBytes2 = (ulong*)pbBytes2;
        for (int i = 0; i < Len; i++)
            pBytes[i] = ((pBytes2[i] & MASKINV) + (pBytes[i] & MASKINV)) ^ ((pBytes[i] ^ pBytes2[i]) & MASK);
    }
}
Using bit manipulation, you can compute the average of the bytes in parallel:
ulong NOLOW = 0xfefefefefefefefe;
byte[] ans2 = new byte[bytes.Length]; // separate result array; Len, bytes and bytes2 come from the question's code
unsafe {
    //Average 8 bytes at a time; masking bit 0 of every byte keeps the shift from crossing byte boundaries
    fixed (byte* pbBytes = &bytes[0])
    fixed (byte* pbBytes2 = &bytes2[0])
    fixed (byte* pbAns2 = &ans2[0]) {
        ulong* pBytes = (ulong*)pbBytes;
        ulong* pBytes2 = (ulong*)pbBytes2;
        ulong* pAns2 = (ulong*)pbAns2;
        for (int i = 0; i < Len; i++) {
            pAns2[i] = (pBytes2[i] & pBytes[i]) + (((pBytes[i] ^ pBytes2[i]) & NOLOW) >> 1);
        }
    }
}
I modified the code to store the result in a separate ans2 byte array, since I needed the source arrays to compare the two methods. Obviously you could store it back into the original bytes[] if desired.
This is based on the identity x + y == (x & y) + (x | y) == 2 * (x & y) + (x ^ y) == ((x & y) << 1) + (x ^ y), which means you can compute (x + y) / 2 == (x & y) + ((x ^ y) >> 1) without ever needing a ninth bit. Since we are computing 8 bytes at a time, we mask the low-order bit out of every byte (NOLOW) before the shift, so that the high-order bit of every byte receives a 0 instead of the low bit of the next byte up.
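A quick way to convince yourself of the identity and of the masking step is a brute-force check. The sketch below (SwarAverage is a hypothetical name) verifies the scalar formula for all 65,536 byte pairs and applies the 8-lane version to two ulongs, without unsafe code. Both forms round down, matching (x + y) / 2 in the question.

```csharp
using System;

// Hypothetical helper illustrating the (x & y) + ((x ^ y) >> 1) average.
public static class SwarAverage
{
    const ulong NoLow = 0xfefefefefefefefe; // clears bit 0 of every byte lane

    // Scalar form: floor((x + y) / 2) without an intermediate 9-bit sum.
    public static byte Average(byte x, byte y) => (byte)((x & y) + ((x ^ y) >> 1));

    // Eight lanes at once: clearing bit 0 of each lane before the shift stops
    // that bit from landing in the top bit of the lane below it. Each lane's
    // result is at most 255, so the final addition never carries across lanes.
    public static ulong Average8(ulong x, ulong y) => (x & y) + (((x ^ y) & NoLow) >> 1);

    // Brute-force check of the scalar identity over every pair of bytes.
    public static bool CheckAllPairs()
    {
        for (int x = 0; x < 256; x++)
            for (int y = 0; y < 256; y++)
                if (Average((byte)x, (byte)y) != (x + y) / 2)
                    return false;
        return true;
    }
}
```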
On my PC this runs 2x to 3x faster (trending to 2x for longer arrays) than the (byte) sum.
I'm trying to write code to decompress a byte-oriented RLE image from a PostScript file. I've already tried solutions found around the web and also tried to build my own, but none of them produced the result I need.
After decompressing the RLE image, I should have a RAW image I can open in Photoshop (specifying width, height and number of channels). However, when I try to open the extracted image it doesn't work; only a black output is shown.
My inputs are a binary ASCII-encoded file (encoded as a hexadecimal string) and a binary file, both compressed with byte-oriented RLE (in the hex file case, it's just a matter of converting it to bytes before attempting the RLE decompression).
I've posted samples here.
WorkingSample.raw -> Image sample I got using other software, along with its dimensions.
MySample.raw -> Image sample I built using my code, along with its dimensions.
OriginalFile.ppf -> File containing the original image data and everything else.
ExtractedBinary.bin -> Only the binary portion of OriginalFile.ppf; it makes the data easier to read and work with.
This code was provided by the user nyerguds, who is part of the SO community.
Original Source: http://www.shikadi.net/moddingwiki/RLE_Compression#Types_of_RLE
It's the one I tried to use, but the results weren't correct. To be honest, I had difficulty understanding his code (he told me to change a few things to get it working for my case, but I was unable to).
And here's what I tried to do, following the PostScript Red Book:
Book: https://www.adobe.com/content/dam/acom/en/devnet/actionscript/articles/PLRM.pdf
The part:
"The RunLengthEncode filter encodes data in a simple byte-oriented format based on run length.
The compressed data format is a sequence of runs, where each run consists of a length byte followed by 1 to 128 bytes of data. If the length byte is in the range 0 to 127, the following length + 1 bytes (1 to 128 bytes) are to be copied literally upon decompression. If length is in the range of 129 to 255, the following single byte is to be replicated 257 - length times (2 to 128 times) upon decompression."
Page 142, RunLengthEncode Filter.
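Following that description, a decoder has to read one length byte at a time rather than fixed 2-byte pairs, because a literal run carries length + 1 data bytes after its length byte. The sketch below also treats a length byte of 128 as end-of-data (EOD), which the PLRM specifies for this format but which the quoted passage leaves out. RunLengthDecoder and Decode are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical decoder for the RunLengthEncode format described above.
public static class RunLengthDecoder
{
    public static byte[] Decode(byte[] data)
    {
        var output = new List<byte>();
        int i = 0;
        while (i < data.Length)
        {
            int length = data[i++];
            if (length == 128)
                break; // EOD marker: stop decoding

            if (length < 128)
            {
                // Literal run: copy the next length + 1 bytes unchanged.
                int count = length + 1;
                if (i + count > data.Length)
                    throw new ArgumentException("Truncated literal run.");
                for (int k = 0; k < count; k++)
                    output.Add(data[i + k]);
                i += count;
            }
            else
            {
                // Repeat run: the single next byte appears 257 - length times.
                if (i >= data.Length)
                    throw new ArgumentException("Truncated repeat run.");
                byte value = data[i++];
                for (int k = 0; k < 257 - length; k++)
                    output.Add(value);
            }
        }
        return output.ToArray();
    }
}
```

For an 8-bit single-channel image, the decoded length should equal width × height; comparing the two is a quick sanity check before writing the RAW or TIFF file.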
List<byte> final = new List<byte>();
var split01 = ArraySplit(bytefile, 2);
foreach (var binPart in split01)
if (binPart.ElementAt(0) <= 127)
int currLen = binPart[0] + 1;
for (int i = 0; i <= binPart[0]; i++)
else if (binPart[0] >= 128)
int currLen = 257 - binPart[0];
for (int i = 0; i < currLen; i++)
// Console.WriteLine(binPart[1]);
File.WriteAllBytes(@"C:\test\again.raw", final.ToArray());
private static IEnumerable<byte[]> ArraySplit(byte[] bArray, int intBufforLengt)
{
    int bArrayLenght = bArray.Length;
    byte[] bReturn = null;
    int i = 0;
    for (; bArrayLenght > (i + 1) * intBufforLengt; i++)
    {
        bReturn = new byte[intBufforLengt];
        Array.Copy(bArray, i * intBufforLengt, bReturn, 0, intBufforLengt);
        yield return bReturn;
    }
    int intBufforLeft = bArrayLenght - i * intBufforLengt;
    if (intBufforLeft > 0)
    {
        bReturn = new byte[intBufforLeft];
        Array.Copy(bArray, i * intBufforLengt, bReturn, 0, intBufforLeft);
        yield return bReturn;
    }
}
private static byte[] StringToByteArray(String hex)
int iValue = 0;
int NumberChars = hex.Length;
if (NumberChars % 2 != 0)
string m = string.Empty;
byte[] bytes = new byte[NumberChars / 2];
for (int i = 0; i < NumberChars; i += 2)
bytes[i / 2] = Convert.ToByte(hex.Substring(i, 2), 16);
iValue = i;
catch (Exception e)
var value = iValue;
return bytes;
The desired output is a grayscale TIFF. However, I can also deal with PNGs.
I've already managed to extract uncompressed data from this kind of file; with Emgu (an OpenCV wrapper) I was able to create a viewable image and apply my logic to it.
My actual results from the RLE-compressed data are only invalid RAW files that can't be viewed even in Photoshop or IrfanView.
Any input is appreciated. Thanks.
EDIT 1: I'm stuck on this part:
for(int i=0; i < bytefile.Length; i+=2)
var lengthByte = bytefile[i];
if (lengthByte <= 127)
int currLen = lengthByte + 1;
for (int j = 0; j < currLen; j++)
if (bytefile[i] >= 128)
int currLen = 257 - bytefile[i];
for (int k = 0; k < currLen; k++)
final.Add(bytefile[i + 1]);
This is the logic I'm following. Before, it was raising an exception, but I figured it out (I had forgotten to add the ending byte; it makes no difference to the final result).
Try this basic outline:
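int i = 0;
while (i < bytefile.Length) {
    var lengthByte = bytefile[i++];
    if (lengthByte == 128) break; // 128 marks end of data (EOD)
    if (lengthByte <= 127) {
        int currLen = lengthByte + 1; // literal run: copy the next currLen bytes
        for (int j = 0; j < currLen; j++)
            final.Add(bytefile[i++]);
    } else {
        int currLen = 257 - lengthByte; // repeat run: the next byte, currLen times
        byte byteToCopy = bytefile[i++];
        for (int j = 0; j < currLen; j++)
            final.Add(byteToCopy);
    }
}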
This is how I understand what's specified above, anyway.
Although not explicitly stated, I believe you are attempting to extract a run-length-encoded image from a PostScript file and save it as a grayscale TIFF.
As a starting point, have you tried simply saving an uncompressed image from a PostScript file as a grayscale TIFF, to make sure the part of your application that builds the TIFF image data works as you expect? I'd suggest that as a first step before moving on to decompressing RLE data and turning it into a TIFF.
I think that's important because your problem may have nothing to do with how you're decompressing the RLE data, but rather with how you're creating the output TIFF from (presumably) correctly decoded data.
I am trying to send a UDP packet of bytes corresponding to the numbers 1-1000 in sequence. How do I convert each number (1,2,3,4,...,998,999,1000) into the minimum number of bytes required and put them in a sequence that I can send as a UDP packet?
I've tried the following with no success. Any help would be greatly appreciated!
List<byte> byteList = new List<byte>();
for (int i = 1; i <= 255; i++)
byte[] nByte = BitConverter.GetBytes((byte)i);
foreach (byte b in nByte)
for (int g = 256; g <= 1000; g++)
UInt16 st = Convert.ToUInt16(g);
byte[] xByte = BitConverter.GetBytes(st);
foreach (byte c in xByte)
byte[] sendMsg = byteList.ToArray();
Thank you.
You need to use :
Think about how you are going to be able to tell the difference between:
260, 1 -> 0x1, 0x4, 0x1
1, 4, 1 -> 0x1, 0x4, 0x1
If you use one byte for numbers up to 255 and two bytes for the numbers 256-1000, you won't be able to work out at the other end which number corresponds to what.
If you just need to encode them as described without worrying about how they are decoded, it smacks to me of a contrived homework assignment or test, and I'm uninclined to solve it for you.
I think you are looking for something along the lines of a 7-bit encoded integer:
protected void Write7BitEncodedInt(int value)
{
    uint num = (uint) value;
    while (num >= 0x80)
    {
        this.Write((byte) (num | 0x80));
        num = num >> 7;
    }
    this.Write((byte) num);
}
(taken from System.IO.BinaryWriter.Write(String)).
The reverse is found in the System.IO.BinaryReader class and looks something like this:
protected internal int Read7BitEncodedInt()
{
    byte num3;
    int num = 0;
    int num2 = 0;
    do
    {
        if (num2 == 0x23)
            // Environment.GetResourceString is framework-internal; use a plain message in your own code
            throw new FormatException(Environment.GetResourceString("Format_Bad7BitInt32"));
        num3 = this.ReadByte();
        num |= (num3 & 0x7f) << num2;
        num2 += 7;
    }
    while ((num3 & 0x80) != 0);
    return num;
}
I do hope this is not homework, even though it really smells like it.
Ok, so to put it all together for you:
using System;
using System.IO;

namespace EncodedNumbers
{
    class Program
    {
        protected static void Write7BitEncodedInt(BinaryWriter bin, int value)
        {
            uint num = (uint)value;
            while (num >= 0x80)
            {
                bin.Write((byte)(num | 0x80));
                num = num >> 7;
            }
            bin.Write((byte)num);
        }

        static void Main(string[] args)
        {
            MemoryStream ms = new MemoryStream();
            BinaryWriter bin = new BinaryWriter(ms);
            for(int i = 1; i < 1000; i++)
                Write7BitEncodedInt(bin, i);
            bin.Flush();
            byte[] data = ms.ToArray();
            int size = data.Length;
            Console.WriteLine("Total # of Bytes = " + size);
        }
    }
}
The total size I get is 1871 bytes. Note that the loop as written (i < 1000) covers 1-999; with i <= 1000 it covers 1-1000 and the total is 1873 bytes.
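To show why this form, unlike plain minimum-byte packing, can be decoded, here is a self-contained round trip. It reimplements the encode and decode loops above over a List<byte> rather than calling the framework methods, whose accessibility varies by .NET version; SevenBit, Write and Read are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical 7-bit varint helpers: low 7 bits per byte, least significant
// group first; the high bit of each byte means "another byte follows".
public static class SevenBit
{
    public static void Write(List<byte> output, uint value)
    {
        while (value >= 0x80)
        {
            output.Add((byte)(value | 0x80)); // 7 payload bits + continuation bit
            value >>= 7;
        }
        output.Add((byte)value); // last byte: continuation bit clear
    }

    // Reads one value starting at pos and advances pos past it.
    public static uint Read(IReadOnlyList<byte> input, ref int pos)
    {
        uint result = 0;
        int shift = 0;
        byte b;
        do
        {
            if (shift > 28)
                throw new FormatException("Too many bytes for a 32-bit value.");
            b = input[pos++];
            result |= (uint)(b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return result;
    }
}
```

For 1 through 1000 inclusive this comes to 1873 bytes: 127 one-byte values (1-127) plus 873 two-byte values (128-1000). The continuation bit is what lets the reader tell where each number ends.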
Btw, could you simply state whether or not this is homework? Obviously, we will still help either way. But we would much rather you try a little harder so you can actually learn for yourself.
EDIT #2:
If you just want to pack them in, without being able to decode them afterwards, you can do something like this:
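protected static void WriteMinimumInt(BinaryWriter bin, int value)
{
    byte[] bytes = BitConverter.GetBytes(value); // little-endian on typical machines: bytes[3] is the MSB
    int skip = bytes.Length - 1;
    while (skip > 0 && bytes[skip] == 0) // drop high-order zero bytes, but always keep one byte
        skip--;
    for (int i = 0; i <= skip; i++)
        bin.Write(bytes[i]);
}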
This drops the zero bytes at the high end (scanning from the MSB toward the LSB), so values 0-255 take one byte and 256-65535 take two.
As stated elsewhere, this will not allow you to decode the data back, since the stream is now ambiguous. As a side note, this approach crams it down to 1743 bytes (as opposed to 1871 using 7-bit encoding).
A byte can only hold 256 distinct values, so you cannot store numbers above 255 in one byte. The easiest way would be to use short, which is 16 bits. If you really need to conserve space, you can use 10-bit numbers and pack them into a byte array (10 bits give 2^10 = 1024 possible values).
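A sketch of the 10-bit idea, assuming every value fits in 0..1023 and the receiver knows how many values to expect: 1000 values × 10 bits = 10,000 bits = 1,250 bytes. TenBitPacker, Pack and Unpack are hypothetical names, and the least-significant-bit-first layout is an arbitrary choice that both sides must agree on.

```csharp
using System;

// Hypothetical packer: each value occupies exactly 10 consecutive bits,
// least significant bit first, with no separators between values.
public static class TenBitPacker
{
    public static byte[] Pack(ushort[] values)
    {
        var packed = new byte[(values.Length * 10 + 7) / 8]; // round up to whole bytes
        int bitPos = 0;
        foreach (ushort v in values)
        {
            if (v > 1023)
                throw new ArgumentOutOfRangeException(nameof(values), "Values must fit in 10 bits.");
            for (int b = 0; b < 10; b++, bitPos++)
                if ((v & (1 << b)) != 0)
                    packed[bitPos >> 3] |= (byte)(1 << (bitPos & 7));
        }
        return packed;
    }

    public static ushort[] Unpack(byte[] packed, int count)
    {
        var values = new ushort[count];
        int bitPos = 0;
        for (int n = 0; n < count; n++)
            for (int b = 0; b < 10; b++, bitPos++)
                if ((packed[bitPos >> 3] & (1 << (bitPos & 7))) != 0)
                    values[n] |= (ushort)(1 << b);
        return values;
    }
}
```

Because every value has the same width, the receiver can split the stream without any length markers, which is what makes this decodable where variable-width packing is not.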
Naively (also, untested):
List<byte> bytes = new List<byte>();
for (int i = 1; i <= 1000; i++)
{
    byte[] nByte = BitConverter.GetBytes(i);
    foreach (byte b in nByte) bytes.Add(b);
}
byte[] byteStream = bytes.ToArray();
This gives you a stream of bytes where each group of 4 bytes is a number in [1, 1000].
You might be tempted to do some work so that i < 256 takes a single byte, i < 65536 takes two bytes, etc. However, if you do this you can't read the values back out of the stream; you'd have to add length encoding, sentinel bits, or something similar.
I'd say, don't. Just compress the stream, either using a built-in class, or gin up a Huffman encoding implementation using an agreed-upon set of frequencies.
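As an illustration of the "just compress the stream" route with a built-in class, the sketch below takes the 4-bytes-per-number stream from the naive approach and runs it through System.IO.Compression.DeflateStream. DeflatePacking is a hypothetical name; the compressed size depends on the data and the runtime's compressor, so no particular byte count is claimed here, only that it shrinks and round-trips.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical wrapper around the framework's DeflateStream.
public static class DeflatePacking
{
    public static byte[] Compress(byte[] raw)
    {
        using (var output = new MemoryStream())
        {
            // leaveOpen: true so 'output' is still usable after the compressor is disposed
            using (var deflate = new DeflateStream(output, CompressionLevel.Optimal, leaveOpen: true))
            {
                deflate.Write(raw, 0, raw.Length);
            } // disposing the DeflateStream writes the final compressed block
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new DeflateStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            input.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

The sending and receiving sides only need to agree on the 4-byte integer layout; the compressor takes care of the many zero bytes in small numbers.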
I have a BitArray with the length of 8, and I need a function to convert it to a byte. How to do it?
Specifically, I need a correct implementation of ConvertToByte:
BitArray bit = new BitArray(new bool[]
{
    false, false, false, false,
    false, false, false, true
});
//How to write ConvertToByte
byte myByte = ConvertToByte(bit);
var recoveredBit = new BitArray(new[] { myByte });
Assert.AreEqual(bit, recoveredBit);
This should work:
byte ConvertToByte(BitArray bits)
{
    if (bits.Count != 8)
    {
        throw new ArgumentException("bits");
    }
    byte[] bytes = new byte[1];
    bits.CopyTo(bytes, 0);
    return bytes[0];
}
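A small check of the bit order CopyTo uses: element 0 of the BitArray becomes the least significant bit of the byte, so the question's array (only the last element set) converts to 0x80 and round-trips through new BitArray(new[] { myByte }). One caveat about the question's test: BitArray does not override Equals, so Assert.AreEqual on two separate instances compares references and fails even when the bits match; the sketch compares element by element instead. BitArrayCheck and SameBits are hypothetical names.

```csharp
using System;
using System.Collections;

public static class BitArrayCheck
{
    // Same approach as the answer above: CopyTo maps element 0 to bit 0 (LSB).
    public static byte ConvertToByte(BitArray bits)
    {
        if (bits.Count != 8)
            throw new ArgumentException("Expected exactly 8 bits.", nameof(bits));
        byte[] bytes = new byte[1];
        bits.CopyTo(bytes, 0);
        return bytes[0];
    }

    // BitArray uses reference equality, so compare the bits one by one.
    public static bool SameBits(BitArray a, BitArray b)
    {
        if (a.Count != b.Count)
            return false;
        for (int i = 0; i < a.Count; i++)
            if (a[i] != b[i])
                return false;
        return true;
    }
}
```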
A bit of a late post, but this works for me:
public static byte[] BitArrayToByteArray(BitArray bits)
{
    byte[] ret = new byte[(bits.Length - 1) / 8 + 1];
    bits.CopyTo(ret, 0);
    return ret;
}
Works with:
string text = "Test";
byte[] bytes = System.Text.Encoding.ASCII.GetBytes(text);
BitArray bits = new BitArray(bytes);
byte[] bytesBack = BitArrayToByteArray(bits);
string textBack = System.Text.Encoding.ASCII.GetString(bytesBack);
// bytes == bytesBack (element by element)
// text == textBack
A poor man's solution:
protected byte ConvertToByte(BitArray bits)
{
    if (bits.Count != 8)
    {
        throw new ArgumentException("illegal number of bits");
    }
    // Note: element 0 becomes the most significant bit here (the reverse of BitArray.CopyTo)
    byte b = 0;
    if (bits.Get(7)) b++;
    if (bits.Get(6)) b += 2;
    if (bits.Get(5)) b += 4;
    if (bits.Get(4)) b += 8;
    if (bits.Get(3)) b += 16;
    if (bits.Get(2)) b += 32;
    if (bits.Get(1)) b += 64;
    if (bits.Get(0)) b += 128;
    return b;
}
Unfortunately, the BitArray class is only partially implemented in .NET Core (UWP). For example, BitArray does not provide the CopyTo() and Count() methods there. I wrote this extension to fill the gap:
public static IEnumerable<byte> ToBytes(this BitArray bits, bool MSB = false)
{
    int bitCount = 7;
    int outByte = 0;
    foreach (bool bitValue in bits)
    {
        if (bitValue)
            outByte |= MSB ? 1 << bitCount : 1 << (7 - bitCount);
        if (bitCount == 0)
        {
            yield return (byte) outByte;
            bitCount = 8;
            outByte = 0;
        }
        bitCount--;
    }
    // Last partially decoded byte
    if (bitCount < 7)
        yield return (byte) outByte;
}
The method converts the BitArray to a byte sequence using LSB-first (least significant bit first) ordering, which is the same ordering the BitArray class uses. Calling the method with the MSB parameter set to true produces an MSB-first byte sequence; in that case, remember that you may also need to reverse the final output byte collection.
This should do the trick. However, the previous answer is quite likely the better option.
public byte ConvertToByte(BitArray bits)
{
    if (bits.Count > 8)
        throw new ArgumentException("ConvertToByte can only work with a BitArray containing a maximum of 8 values");
    byte result = 0;
    for (byte i = 0; i < bits.Count; i++)
    {
        if (bits[i])
            result |= (byte)(1 << i);
    }
    return result;
}
In the example you posted, the resulting byte will be 0x80. In other words, the first value in the BitArray corresponds to the first (least significant) bit in the returned byte.
This should be the ultimate one; it works with any array length.
private List<byte> BoolList2ByteList(List<bool> values)
{
    List<byte> ret = new List<byte>();
    int count = 0;
    byte currentByte = 0;
    foreach (bool b in values)
    {
        if (b) currentByte |= (byte)(1 << count);
        if (count == 7) { ret.Add(currentByte); currentByte = 0; count = 0; }
        else count++;
    }
    if (count > 0) ret.Add(currentByte); // flush a final partial byte
    return ret;
}
In addition to Jon Skeet's answer, you can use an extension method as below:
public static byte ToByte(this BitArray bits)
{
    if (bits.Count != 8)
    {
        throw new ArgumentException("bits");
    }
    byte[] bytes = new byte[1];
    bits.CopyTo(bytes, 0);
    return bytes[0];
}
And use it like this:
BitArray foo = new BitArray(new bool[]
{
    false, false, false, false, false, false, false, true
});
byte b = foo.ToByte();
byte GetByte(BitArray input)
{
    int len = input.Length;
    if (len > 8)
        len = 8;
    int output = 0;
    for (int i = 0; i < len; i++)
    {
        if (input.Get(i))
            output += (1 << (len - 1 - i)); // MSB-first: element 0 becomes the most significant bit
            //output += (1 << i);           // LSB-first: the same ordering BitArray itself uses
    }
    return (byte)output;
}
Little-endian byte array converter: the first bit (index 0) in the BitArray is assumed to represent the least significant bit (the rightmost bit in the octet), interpreted as binary "zero" or "one".
public static class BitArrayExtender {

    public static byte[] ToByteArray( this BitArray bits ) {
        const int BYTE = 8;
        int length = ( bits.Count / BYTE ) + ( (bits.Count % BYTE == 0) ? 0 : 1 );
        var bytes = new byte[ length ];
        for ( int i = 0; i < bits.Length; i++ ) {
            int bitIndex = i % BYTE;
            int byteIndex = i / BYTE;
            int mask = (bits[ i ] ? 1 : 0) << bitIndex;
            bytes[ byteIndex ] |= (byte)mask;
        }
        return bytes;
    }
}