Encoding an array of bytes similar to Base64, but with arbitrary radix - c#

Does the procedure have a name, where you take a stream of 8-bit bytes and slice them into n-bit snippets stored in 8-bit containers?
The idea is very similar to Base64 encoding, where you split the stream of 1's and 0's into 6-bit chunks (instead of 8), meaning each chunk can have a decimal value of 0 - 63, each of which is assigned a unique human-readable character. In my case, I'm not looking to assign each chunk a specific character.
For example, the input 8-bit bytes:
11100101 01101100 01010011 00001100 11000000 10111101
become the 6-bit snippets:
111001 010110 110001 010011 000011 001100 000010 111101
which are subsequently stored as:
00111001 00010110 00110001 00010011 00000011 00001100 00000010 00111101
or, optionally, with an offset of 1 bit:
01110010 00101100 01100010 00100110 00000110 00011000 00000100 01111010
or an offset of 2 bits:
11100100 01011000 11000100 01001100 00001100 00110000 00001000 11110100
I was looking to write an algorithm in C# to encode a byte array to an arbitrary length with arbitrary offset, and another algorithm to convert it back again.
After quite a lot of headache, I thought I had successfully written the forward algorithm to encode an array of bytes. It worked for all my test cases, but when I started writing the reverse algorithm I realised the whole problem was a lot more complicated than I had thought and, in fact, my forward algorithm didn't work where n < 4.
I wanted to write the algorithms with bitwise operators, which is the more proper and elegant solution. The other way would have been to dump the byte array as a long string of 1's and 0's to slice, but that would have been much, much slower.
Here is my forward algorithm that works for cases where n >= 4:
public static byte[] EncodeForward(byte[] input, int n, int offset = 0)
{
byte[] output = new byte[(int)Math.Ceiling(input.Length * 8.0 / n)];
output[0] = (byte)(input[0] >> (8 - n));
int p = 1;
int r = 8 - n;
for (int i = 1; i < input.Length; i++)
{
output[p++] = (byte)((byte)((byte)(input[i - 1] << (8 - r)) | (byte)(input[i] >> r)) >> (8 - n - offset));
if ((r += (8 - n)) == n)
{
output[p++] = (byte)(input[i] & (byte)(0xFF >> (8 - n)));
r = 0;
}
}
return output;
}
I originally conceived it for just the case of n = 7, so each output byte would be composed of parts of at most 7 input bytes. However, in the case where n < 4, each output byte would be composed of up to, I think, ceil(8/n) input bytes, so the process is a little more complex than above.
I was hoping to write the forward and reverse algorithms myself, but, honestly, after all this time debugging and testing what I've written and now finding this approach will never work for n < 4, I'm just looking for something that works. These two algorithms are just a very small piece of the project I'm working on.
Does this encoding/decoding procedure have a name, and is there either a built-in way to do it in C# or is there a library that will do it?

You are almost there. You just need an intermediate 16-bit buffer and a counter of unprocessed bits. Disclaimer: I don't know C#. The (pseudo)code below is written with C in mind; you may need some tweaking.
For encoding,
uint16_t mask = 0xffff << (16 - width);
uint16_t buffer = (input[0] << 8) | input[1];
i += 2;
int remaining = 16;
while (i < input.Length) {
while (remaining >= width) {
output[p++] = (buffer & mask) >> (16 - width);
buffer <<= width;
remaining -= width;
}
// Refill the buffer. Since it is 16 bits wide, there is room
// for an _entire_ input byte.
buffer |= input[i++] << (8 - remaining);
remaining += 8;
}
emit_remaining_bits(buffer, remaining);
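If it helps, here is a self-contained C# sketch of the same buffered idea (my own translation, so treat it as a starting point rather than a drop-in). It covers offset = 0; from your examples, the offset variant is just a final << offset applied to each output byte. I've assumed leftover bits at the end are left-aligned in the last chunk and zero-padded on the right:
public static byte[] EncodeChunks(byte[] input, int width) // width = n, from 1 to 8
{
    byte[] output = new byte[(input.Length * 8 + width - 1) / width];
    int buffer = 0;  // bit accumulator, newest bits at the bottom
    int filled = 0;  // number of valid bits currently buffered
    int p = 0;
    for (int i = 0; i < input.Length; i++)
    {
        buffer = ((buffer << 8) | input[i]) & 0xFFFF; // append the next input byte
        filled += 8;
        while (filled >= width)
        {
            filled -= width;
            output[p++] = (byte)((buffer >> filled) & ((1 << width) - 1));
        }
    }
    if (filled > 0) // assumption: flush leftovers left-aligned in the final chunk
        output[p] = (byte)((buffer << (width - filled)) & ((1 << width) - 1));
    return output;
}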
For decoding:
uint16_t buffer = 0;
int remaining = 16;
while (i < input.Length) {
while (remaining > 8 && i < input.Length) {
buffer |= input[i++] << (remaining - width);
remaining -= width;
}
output[p++] = (buffer >> 8) & 0x00ff;
buffer <<= 8;
remaining += 8;
}
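And a matching C# sketch for the decode direction (again my own translation; trailing padding bits that don't fill a whole output byte are dropped):
public static byte[] DecodeChunks(byte[] encoded, int width)
{
    byte[] output = new byte[encoded.Length * width / 8];
    int buffer = 0, filled = 0, p = 0;
    for (int i = 0; i < encoded.Length && p < output.Length; i++)
    {
        // append the next width-bit chunk at the bottom of the accumulator
        buffer = ((buffer << width) | (encoded[i] & ((1 << width) - 1))) & 0xFFFF;
        filled += width;
        if (filled >= 8)
        {
            filled -= 8;
            output[p++] = (byte)(buffer >> filled); // emit the top 8 valid bits
        }
    }
    return output;
}
Round-tripping the 6-bit example from the question through these two gives back the original six bytes.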

Related

How would you compress a 256-byte string consisting of only "F" and "G"?

Theoretically, how much can you compress this 256-byte string containing only "F" and "G"?
FGFFFFFFGFFFFGGGGGGGGGGGGGFFFFFGGGGGGGGGGGGFFGFGGGFFFGGGGGGGGFFFFFFFFFFFFFFFFFFFFFGGGGGGFFFGFGGFGFFFFGFFGFGGFFFGFGGFGFFFGFGGGGFGGGGGGGGGFFFFFFFFGGGGGGGFFFFFGFFGGGGGGGFFFGGGFFGGGGGGFFGGGGGGGGGFFGFFGFGFFGFFGFFFFGGGGFGGFGGGFFFGGGFFFGGGFFGGFFGGGGFFGFGGFFFGFGGF
While I don't see a real world application, it is intriguing that compression algorithms like gz, bzip2 and deflate have a disadvantage in this case.
Well, I have this answer and the C# code to demonstrate:
using System;
public class Program
{
public static void Main()
{
string testCase = "FGFFFFFFGFFFFGGGGGGGGGGGGGFFFFFGGGGGGGGGGGGFFGFGGGFFFGGGGGGGGFFFFFFFFFFFFFFFFFFFFFGGGGGGFFFGFGGFGFFFFGFFGFGGFFFGFGGFGFFFGFGGGGFGGGGGGGGGFFFFFFFFGGGGGGGFFFFFGFFGGGGGGGFFFGGGFFGGGGGGFFGGGGGGGGGFFGFFGFGFFGFFGFFFFGGGGFGGFGGGFFFGGGFFFGGGFFGGFFGGGGFFGFGGFFFGFGGF";
uint[] G = new uint[8]; // 256 bit
for (int i = 0; i < testCase.Length; i++)
G[(i / 32)] += (uint)(((testCase[i] & 1)) << (i % 32));
for (int i = 0; i < 8; i++)
Console.WriteLine(G[i]);
string gTestCase = string.Empty;
//G 71 0100 0111
//F 70 0100 0110
for (int i = 0; i < 256; i++)
gTestCase += (char)((((uint)G[i / 32] & (1 << (i % 32))) >> (i % 32)) | 70);
Console.WriteLine(testCase);
Console.WriteLine(gTestCase);
if (testCase == gTestCase)
Console.WriteLine("OK.");
}
}
It may sound silly, but I have the following idea for how the algorithm could be improved so that this 256-bit number can be compressed further:
(Note: the following is a separate topic of discussion, but related to compressing the 256-byte string further.)
From my understanding of Microsoft's implementation of Decimal,
96-bit + 96-bit = 128-bit decimal.
Which implies that a 192-byte string consisting of any two distinct characters can be encoded as a 128-bit number instead of a 192-bit number. Correct?
My questions are:
Can I do the same with 256-byte strings?
(by splitting each of them into a pair of two numbers before adding those two as a Decimal shorter than 256-bit)?
How do I decode the above-mentioned 128-bit Decimal back into a pair of 96-bit numbers, while keeping the compressed data smaller than 192 bits?
Sorry for my previous rather vague question.
The following code demonstrates how to add two 96-char "binary" strings as a 128-char binary string.
public static string AddBinary(string a, string b) // 96-char binary strings
{
int[] x = { 0, 0, 0 };
int[] y = { 0, 0, 0 };
string c = String.Empty;
for (int z = 0; z < a.Length; z++)
x[(z / 32)] |= ((byte)(a[a.Length - z - 1]) & 1) << (z % 32);
for (int z = 0; z < b.Length; z++)
y[(z / 32)] |= ((byte)(b[b.Length - z - 1]) & 1) << (z % 32);
decimal m = new decimal(x[0], x[1], x[2], false, 0); //96-bit
decimal n = new decimal(y[0], y[1], y[2], false, 0); //96-bit
decimal k = decimal.Add(m, n);
int[] l = decimal.GetBits(k); //128-bit
Console.WriteLine(k);
for (int z = 127; z >= 0; z--)
c += (char)(((l[(z / 32)] & (1 << (z % 32))) >> (z % 32)) | 48);
return c.Contains("1") ? c.TrimStart('0') : "0";
}
96-bit + 96-bit = 128-bit decimal.
That is a misunderstanding. Decimal is a 96-bit integer mantissa, a sign, and an exponent from 0 to 28 (~5 bits) that forms a scaling factor for the mantissa.
Addition is from 2×(1+5+96) bits to 1×(1+5+96) bits, including inevitable rounding errors and overflow.
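You can see that layout directly with decimal.GetBits; a small illustration:
// The four ints are the lo/mid/hi words of the 96-bit integer, followed by a
// word holding the scale (bits 16-23) and the sign (bit 31).
int[] parts = decimal.GetBits(1.5m);         // mantissa 15, scale 1
Console.WriteLine(string.Join(", ", parts)); // prints: 15, 0, 0, 65536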
You can't get summands from a sum easily - for starters, addition is symmetrical, there is no earthly way of knowing which of two summands has been the first and which the second.
Paul Hankin mentioned the programmer's variant of compressibility: Kolmogorov complexity.
In all fairness, you'd have to add to the 256 bits of your recoding of the input string the size of a program to turn those bits into the original string.
(As would gz, bzip2, deflate, LZW - decoders for "pure LZ" can be very small. The usual escape is to define a file format, including a recognisable header.)
Lasse V. Karlsen mentioned one consequence of the pigeonhole principle: to tell each combination of 192 bits from every other one, you need no fewer than 2^192 codes.

fast way to convert integer array to byte array (11 bit)

I have an integer array and I need to convert it to a byte array,
but I need to take only the first 11 bits of each element of the integer array
and then convert it to a byte array.
I tried this code
// ***********convert integer values to byte values
//***********to avoid the left zero padding on the byte array
// *********** first step : convert to binary string
// ***********second step : convert binary string to byte array
// *********** first step
string ByteString = Convert.ToString(IntArray[0], 2).PadLeft(11,'0');
for (int i = 1; i < IntArray.Length; i++)
ByteString = ByteString + Convert.ToString(IntArray[i], 2).PadLeft(11, '0');
// ***********second step
int numOfBytes = ByteString.Length / 8;
byte[] bytes = new byte[numOfBytes];
for (int i = 0; i < numOfBytes; ++i)
{
bytes[i] = Convert.ToByte(ByteString.Substring(8 * i, 8), 2);
}
But it takes too long (if the file is large, the code takes more than a minute).
I need very fast code (only a few milliseconds).
Can anyone help me?
Basically, you're going to be doing a lot of shifting and masking. The exact nature of that depends on the layout you want. If we assume that we pack little-endian from each int, appending on the left, so two 11-bit integers with positions:
abcdefghijk lmnopqrstuv
become the 8-bit chunks:
defghijk rstuvabc 00lmnopq
(i.e. take the lowest 8 bits of the first integer, which leaves 3 left over, so pack those into the low 3 bits of the next byte, then take the lowest 5 bits of the second integer, then finally the remaining 6 bits, padding with zero), then something like this should work:
using System;
using System.Linq;
static class Program
{
static string AsBinary(int val) => Convert.ToString(val, 2).PadLeft(11, '0');
static string AsBinary(byte val) => Convert.ToString(val, 2).PadLeft(8, '0');
static void Main()
{
int[] source = new int[1432];
var rand = new Random(123456);
for (int i = 0; i < source.Length; i++)
source[i] = rand.Next(0, 2047); // 11 bits
// Console.WriteLine(string.Join(" ", source.Take(5).Select(AsBinary)));
var raw = Encode(source);
// Console.WriteLine(string.Join(" ", raw.Take(6).Select(AsBinary)));
var clone = Decode(raw);
// now prove that it worked OK
if (source.Length != clone.Length)
{
Console.WriteLine($"Length: {source.Length} vs {clone.Length}");
}
else
{
int failCount = 0;
for (int i = 0; i < source.Length; i++)
{
if (source[i] != clone[i] && failCount++ == 0)
{
Console.WriteLine($"{i}: {source[i]} vs {clone[i]}");
}
}
Console.WriteLine($"Errors: {failCount}");
}
}
static byte[] Encode(int[] source)
{
long bits = source.Length * 11;
int len = (int)(bits / 8);
if ((bits % 8) != 0) len++;
byte[] arr = new byte[len];
int bitOffset = 0, index = 0;
for (int i = 0; i < source.Length; i++)
{
// note: this encodes little-endian
int val = source[i] & 2047;
int bitsLeft = 11;
if(bitOffset != 0)
{
val = val << bitOffset;
arr[index++] |= (byte)val;
bitsLeft -= (8 - bitOffset);
val >>= 8;
}
if(bitsLeft >= 8)
{
arr[index++] = (byte)val;
bitsLeft -= 8;
val >>= 8;
}
if(bitsLeft != 0)
{
arr[index] = (byte)val;
}
bitOffset = bitsLeft;
}
return arr;
}
private static int[] Decode(byte[] source)
{
int bits = source.Length * 8;
int len = (int)(bits / 11);
// note no need to worry about remaining chunks - no ambiguity since 11 > 8
int[] arr = new int[len];
int bitOffset = 0, index = 0;
for(int i = 0; i < source.Length; i++)
{
int val = source[i] << bitOffset;
int bitsLeftInVal = 11 - bitOffset;
if(bitsLeftInVal > 8)
{
arr[index] |= val;
bitOffset += 8;
}
else if(bitsLeftInVal == 8)
{
arr[index++] |= val;
bitOffset = 0;
}
else
{
arr[index++] |= (val & 2047);
if(index != arr.Length) arr[index] = val >> 11;
bitOffset = 8 - bitsLeftInVal;
}
}
return arr;
}
}
If you need a different layout you'll need to tweak it.
This encodes 512 MiB in just over a second on my machine.
Overview to the Encode method:
The first thing it does is pre-calculate the amount of space that is going to be required, and allocate the output buffer; since each input contributes 11 bits to the output, this is just some modulo math:
long bits = source.Length * 11;
int len = (int)(bits / 8);
if ((bits % 8) != 0) len++;
byte[] arr = new byte[len];
We know the output position won't match the input, and we know we're going to be starting each 11-bit chunk at different positions in bytes each time, so allocate variables for those, and loop over the input:
int bitOffset = 0, index = 0;
for (int i = 0; i < source.Length; i++)
{
...
}
return arr;
So: taking each input in turn (where the input is the value at position i), take the low 11 bits of the value - and observe that we have 11 bits (of this value) still to write:
int val = source[i] & 2047;
int bitsLeft = 11;
Now, if the current output value is partially written (i.e. bitOffset != 0), we should deal with that first. The amount of space left in the current output is 8 - bitOffset. Since we always have 11 input bits we don't need to worry about having more space than values to fill, so: left-shift our value by bitOffset (pads on the right with bitOffset zeros, as a binary operation), and "or" the lowest 8 bits of this with the output byte. Essentially this says "if bitOffset is 3, write the 5 low bits of val into the 5 high bits of the output buffer"; finally, fixup the values: increment our write position, record that we have fewer bits of the current value still to write, and use right-shift to discard the 8 low bits of val (which is made of bitOffset zeros and 8 - bitOffset "real" bits):
if(bitOffset != 0)
{
val = val << bitOffset;
arr[index++] |= (byte)val;
bitsLeft -= (8 - bitOffset);
val >>= 8;
}
The next question is: do we have (at least) an entire byte of data left? We might not, if bitOffset was 1 for example (so we'll have written 7 bits already, leaving just 4). If we do, we can just stamp that down and increment the write position - then once again track how many are left and throw away the low 8 bits:
if(bitsLeft >= 8)
{
arr[index++] = (byte)val;
bitsLeft -= 8;
val >>= 8;
}
And it is possible that we've still got some left-over; for example, if bitOffset was 7 we'll have written 1 bit in the first chunk, 8 bits in the second, leaving 2 more to write - or if bitOffset was 0 we won't have written anything in the first chunk, 8 in the second, leaving 3 left to write. So, stamp down whatever is left, but do not increment the write position - we've written to the low bits, but the next value might need to write to the high bits. Finally, update bitOffset to be however many low bits we wrote in the last step (which could be zero):
if(bitsLeft != 0)
{
arr[index] = (byte)val;
}
bitOffset = bitsLeft;
The Decode operation is the reverse of this logic - again, calculate the sizes and prepare the state:
int bits = source.Length * 8;
int len = (int)(bits / 11);
int[] arr = new int[len];
int bitOffset = 0, index = 0;
Now loop over the input:
for(int i = 0; i < source.Length; i++)
{
...
}
return arr;
Now, bitOffset is the start position that we want to write to in the current 11-bit value, so if we start at the start, it will be 0 on the first byte, then 8; 3 bits of the second byte join with the first 11-bit integer, so the 5 bits become part of the second - so bitOffset is 5 on the 3rd byte, etc. We can calculate the number of bits left in the current integer by subtracting from 11:
int val = source[i] << bitOffset;
int bitsLeftInVal = 11 - bitOffset;
Now we have 3 possible scenarios:
1) if we have more than 8 bits left in the current value, we can stamp down our input (as a bitwise "or") but do not increment the write position (as we have more to write for this value), and note that we're 8-bits further along:
if(bitsLeftInVal > 8)
{
arr[index] |= val;
bitOffset += 8;
}
2) if we have exactly 8 bits left in the current value, we can stamp down our input (as a bitwise "or") and increment the write position; the next loop can start at zero:
else if(bitsLeftInVal == 8)
{
arr[index++] |= val;
bitOffset = 0;
}
3) otherwise, we have less than 8 bits left in the current value; so we need to write the first bitsLeftInVal bits to the current output position (incrementing the output position), and whatever is left to the next output position. Since we already left-shifted by bitOffset, what this really means is simply: stamp down (as a bitwise "or") the low 11 bits (val & 2047) to the current position, and whatever is left (val >> 11) to the next if that wouldn't exceed our output buffer (padding zeros). Then calculate our new bitOffset:
else
{
arr[index++] |= (val & 2047);
if(index != arr.Length) arr[index] = val >> 11;
bitOffset = 8 - bitsLeftInVal;
}
And that's basically it. Lots of bitwise operations - shifts (<< / >>), masks (&) and combinations (|).
If you wanted to store the least significant 11 bits of an int into two bytes such that the least significant byte has bits 1-8 inclusive and the most significant byte has 9-11:
int toStore = 123456789;
byte msb = (byte) ((toStore >> 8) & 7); //or 0b111
byte lsb = (byte) (toStore & 255); //or 0b11111111
To check this, 123456789 in binary is:
0b111010110111100110100010101
                  MMMLLLLLLLL
The bits above L are lsb, and have a value of 21, above M are msb and have a value of 5
Doing the work is the shift operator >> where all the binary digits are slid to the right 8 places (8 of them disappear from the right hand side - they're gone, into oblivion):
0b111010110111100110100010101 >> 8 =
0b1110101101111001101
And the mask operator & (the mask operator works by only keeping bits where, in each position, they're 1 in the value and also 1 in the mask) :
0b111010110111100110100010101 &
0b000000000000000000011111111 (255) =
0b000000000000000000000010101
If you're processing an int array, just do this in a loop:
byte[] bs = new byte[ intarray.Length*2 ];
for(int x = 0, b=0; x < intarray.Length; x++){
int toStore = intarray[x];
bs[b++] = (byte) ((toStore >> 8) & 7);
bs[b++] = (byte) (toStore & 255);
}
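Going back the other way is symmetric; a sketch assuming the same two-byte layout (it recovers only the low 11 bits of each int, which is all that was stored):
int[] intsBack = new int[bs.Length / 2];
for (int x = 0, b = 0; x < intsBack.Length; x++)
{
    int msb = bs[b++];              // the high 3 bits
    int lsb = bs[b++];              // the low 8 bits
    intsBack[x] = (msb << 8) | lsb;
}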

Reading 24-bit samples from a .WAV file

I understand how to read 8-bit, 16-bit & 32-bit samples (PCM & floating-point) from a .wav file, since (conveniently) the .Net Framework has an in-built integral type for those exact sizes. But, I don't know how to read (and store) 24-bit (3 byte) samples.
How can I read 24-bit audio? Is there maybe some way I can alter my current method (below) for reading 32-bit audio to solve my problem?
private List<float> Read32BitSamples(FileStream stream, int sampleStartIndex, int sampleEndIndex)
{
var samples = new List<float>();
var bytes = ReadChannelBytes(stream, Channels.Left, sampleStartIndex, sampleEndIndex); // Reads bytes of a single channel.
if (audioFormat == WavFormat.PCM) // audioFormat determines whether to process sample bytes as PCM or floating point.
{
for (var i = 0; i < bytes.Length / 4; i++)
{
samples.Add(BitConverter.ToInt32(bytes, i * 4) / 2147483648f);
}
}
else
{
for (var i = 0; i < bytes.Length / 4; i++)
{
samples.Add(BitConverter.ToSingle(bytes, i * 4));
}
}
return samples;
}
Reading (and storing) 24-bit samples is very simple. Now, as you've rightly said, a 3-byte integral type does not exist within the framework, which means you're left with two choices: either create your own type, or pad your 24-bit samples by inserting an empty byte (0) at the start of each sample's byte array, thereby making them 32-bit samples (so you can then use an int to store/manipulate them).
I will explain and demonstrate how to do the latter (which is also, in my opinion, the simpler approach).
First we must look at how a 24-bit sample would be stored within an int,
               MSB      2ndMSB   2ndLSB   LSB
24-bit sample: 11001101 01101001 01011100 00000000
32-bit sample: 11001101 01101001 01011100 00101001
MSB = Most Significant Byte, LSB = Least Significant Byte.
As you can see, the LSB of the 24-bit sample is 0, therefore all you have to do is declare a byte[] with 4 elements, then read the 3 bytes of the sample into the array (starting at element 1) so that your array looks like below (effectively bit shifting by 8 places to the left),
myArray[0]: 00000000
myArray[1]: 01011100
myArray[2]: 01101001
myArray[3]: 11001101
Once you have your byte array full you can pass it to BitConverter.ToInt32(myArray, 0); you will then need to shift the sample 8 places to the right to get it into its proper 24-bit integral representation (from -8388608 to 8388607), then divide by 8388608 to have it as a floating-point value.
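In code, that last step looks roughly like this (a sketch using the myArray from above):
int padded = BitConverter.ToInt32(myArray, 0); // sample sits in the top 24 bits, LSB is 0
float sample = (padded >> 8) / 8388608f;       // same result as padded / 2147483648f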
So, putting that all together you should end up with something like this,
Note: I wrote the following code with the intention of it being easy to follow, so it will not be the most performant method; for a faster solution see the code after this one.
private List<float> Read24BitSamples(FileStream stream, int startIndex, int endIndex)
{
var samples = new List<float>();
var bytes = ReadChannelBytes(stream, Channels.Left, startIndex, endIndex);
var temp = new List<byte>();
var paddedBytes = new byte[bytes.Length / 3 * 4];
// Right align our samples to 32-bit (effectively bit shifting 8 places to the left).
for (var i = 0; i < bytes.Length; i += 3)
{
temp.Add(0); // LSB
temp.Add(bytes[i]); // 2nd LSB
temp.Add(bytes[i + 1]); // 2nd MSB
temp.Add(bytes[i + 2]); // MSB
}
// BitConverter requires collection to be an array.
paddedBytes = temp.ToArray();
temp = null;
bytes = null;
for (var i = 0; i < paddedBytes.Length / 4; i++)
{
samples.Add(BitConverter.ToInt32(paddedBytes, i * 4) / 2147483648f); // Skip the bit shift and just divide: since our sample has been shifted 8 places to the left, we need to divide by 2147483648, not 8388608.
}
return samples;
}
For a faster1 implementation you can do the following instead,
private List<float> Read24BitSamples(FileStream stream, int startIndex, int endIndex)
{
var bytes = ReadChannelBytes(stream, Channels.Left, startIndex, endIndex);
var samples = new float[bytes.Length / 3];
for (var i = 0; i < bytes.Length; i += 3)
{
samples[i / 3] = (bytes[i] << 8 | bytes[i + 1] << 16 | bytes[i + 2] << 24) / 2147483648f;
}
return samples.ToList();
}
1 After benchmarking the above code against the previous method, this solution is approximately 450% to 550% faster.

How is PNG CRC calculated exactly?

For the past 4 hours I've been studying the CRC algorithm. I'm pretty sure I got the hang of it already.
I'm trying to write a png encoder, and I don't wish to use external libraries for the CRC calculation, nor for the png encoding itself.
My program has been able to get the same CRC's as the examples on tutorials. Like on Wikipedia:
Using the same polynomial and message as in the example, I was able to produce the same result in both of the cases. I was able to do this for several other examples as well.
However, I can't seem to properly calculate the CRC of png files. I tested this by creating a blank, one pixel big .png file in Paint, and using its CRC as a comparison. I copied the data (and chunk name) from the IDAT chunk of the png (which the CRC is calculated from), and calculated its CRC using the polynomial provided in the png specification.
The polynomial provided in the png specification is the following:
x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1
Which should translate to:
1 00000100 11000001 00011101 10110111
Using that polynomial, I tried to get the CRC of the following data:
01001001 01000100 01000001 01010100
00011000 01010111 01100011 11101000
11101100 11101100 00000100 00000000
00000011 00111010 00000001 10011100
This is what I get:
01011111 11000101 01100001 01101000 (MSB First)
10111011 00010011 00101010 11001100 (LSB First)
This is what is the actual CRC:
11111010 00010110 10110110 11110111
I'm not exactly sure how to fix this, but my guess would be I'm doing this part from the specification wrong:
In PNG, the 32-bit CRC is initialized to all 1's, and then the data from each byte is processed from the least significant bit (1) to the most significant bit (128). After all the data bytes are processed, the CRC is inverted (its ones complement is taken). This value is transmitted (stored in the datastream) MSB first. For the purpose of separating into bytes and ordering, the least significant bit of the 32-bit CRC is defined to be the coefficient of the x^31 term.
I'm not completely sure I can understand all of that.
Also, here is the code I use to get the CRC:
public BitArray GetCRC(BitArray data)
{
// Prepare the divident; Append the proper amount of zeros to the end
BitArray divident = new BitArray(data.Length + polynom.Length - 1);
for (int i = 0; i < divident.Length; i++)
{
if (i < data.Length)
{
divident[i] = data[i];
}
else
{
divident[i] = false;
}
}
// Calculate CRC
for (int i = 0; i < divident.Length - polynom.Length + 1; i++)
{
if (divident[i] && polynom[0])
{
for (int j = 0; j < polynom.Length; j++)
{
if ((divident[i + j] && polynom[j]) || (!divident[i + j] && !polynom[j]))
{
divident[i + j] = false;
}
else
{
divident[i + j] = true;
}
}
}
}
// Strip the CRC off the divident
BitArray crc = new BitArray(polynom.Length - 1);
for (int i = data.Length, j = 0; i < divident.Length; i++, j++)
{
crc[j] = divident[i];
}
return crc;
}
So, how do I fix this to match the PNG specification?
You can find a complete implementation of the CRC calculation (and PNG encoding in general) in this public domain code:
static uint[] crcTable;
// Stores a running CRC (initialized with the CRC of "IDAT" string). When
// you write this to the PNG, write as a big-endian value
static uint idatCrc = Crc32(new byte[] { (byte)'I', (byte)'D', (byte)'A', (byte)'T' }, 0, 4, 0);
// Call this function with the compressed image bytes,
// passing in idatCrc as the last parameter
private static uint Crc32(byte[] stream, int offset, int length, uint crc)
{
uint c;
if(crcTable==null){
crcTable=new uint[256];
for(uint n=0;n<=255;n++){
c = n;
for(var k=0;k<=7;k++){
if((c & 1) == 1)
c = 0xEDB88320^((c>>1)&0x7FFFFFFF);
else
c = ((c>>1)&0x7FFFFFFF);
}
crcTable[n] = c;
}
}
c = crc^0xffffffff;
var endOffset=offset+length;
for(var i=offset;i<endOffset;i++){
c = crcTable[(c^stream[i]) & 255]^((c>>8)&0xFFFFFF);
}
return c^0xffffffff;
}
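Usage would look roughly like this (GetCompressedImageData is a hypothetical stand-in for however you produce the compressed image bytes):
byte[] idatData = GetCompressedImageData();                   // hypothetical helper
uint chunkCrc = Crc32(idatData, 0, idatData.Length, idatCrc); // continue from the "IDAT" CRC
// Write chunkCrc to the file big-endian, immediately after the chunk data.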
1 https://web.archive.org/web/20150825201508/http://upokecenter.dreamhosters.com/articles/png-image-encoder-in-c/
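A note on the constant: 0xEDB88320 used to build the table is simply the PNG polynomial with its 32 bits reversed, which is what the "least significant bit first" wording in the specification amounts to. A quick illustrative check:
uint poly = 0x04C11DB7, reflected = 0;
for (int i = 0; i < 32; i++)
    reflected |= ((poly >> i) & 1u) << (31 - i); // reverse the bit order
Console.WriteLine(reflected.ToString("X8"));     // prints EDB88320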

Please help figure out what this C# method is doing?

I am a little confused as to what is being accomplished by this method. It seems to be attempting to break bytes into nibbles and reassemble the nibbles with nibbles from other bytes to form new bytes and then return a new sequence of bytes.
However, I didn't think you could take nibbles from a byte using modulus, subtraction and division, nor reassemble them with simple multiplication and addition.
I want to better understand how this method works and what it is doing, so I can get some comments around it, and then see if it can be converted to make more sense using more standard methods of splitting bytes into nibbles, and even take advantage of .NET 4.0 if possible.
private static byte[] Process(byte[] bytes)
{
Queue<byte> newBytes = new Queue<byte>();
int phase = 0;
byte nibble1 = 0;
byte nibble2 = 0;
byte nibble3 = 0;
int length = bytes.Length-1;
for (int i = 0; i < length; i++)
{
switch (phase)
{
case 0:
nibble1 = (byte)((bytes[i] - (bytes[i] % 4)) / 4);
nibble2 = (byte)(bytes[i] % 4);
nibble3 = 0;
break;
case 1:
nibble2 = (byte)((nibble2 * 4) + (bytes[i] - (bytes[i] % 16))/16);
nibble3 = (byte)(bytes[i] % 16);
if (i < 4)
{
newBytes.Clear();
newBytes.Enqueue((byte)((16 * nibble1) + nibble2));
}
else
newBytes.Enqueue((byte)((16 * nibble1) + nibble2));
break;
case 2:
nibble1 = nibble3;
nibble2 = (byte)((bytes[i] - (bytes[i] % 4)) / 4);
nibble3 = (byte)(bytes[i] % 4);
newBytes.Enqueue((byte)((16 * nibble1) + nibble2));
break;
case 3:
nibble1 = (byte)((nibble3 * 4) + (bytes[i] - (bytes[i] % 16))/16);
nibble2 = (byte)(bytes[i] % 16);
newBytes.Enqueue((byte)((16 * nibble1) + nibble2));
break;
}
phase = (phase + 1) % 4;
}
return newBytes.ToArray();
}
Multiplication by 2 is the same as shifting bits one place to the left. (So multiply by 4 is shifting 2 places, and so on).
Division by 2 is the same as shifting bits one place to the right.
The modulus operator is being used to mask parts of the values. Modulus by N, where N = 2^p, gives you the value contained in the low p bits of the original value. So
value % 4
would be the same as
value & 3 // 3 is the largest value you can make with 2 bits (4-1): 2 + 1.
Addition and subtraction can be used to combine the values. For instance if you know n and z to be 4-bit values, then both the following statements would combine them into one byte, with n placed in the upper 4 bits:
value = (n * 16) + z;
Versus
value = (n << 4) | z;
I am not entirely sure, but the code appears to be rearranging the nibbles in each byte and flipping them (so 0xF0 becomes 0x0F). It may be trying to compress or encrypt the bytes - difficult to tell without representative input.
In regards to the different things happening in the function:
Dividing by 4 is the same as right-shifting twice (>> 2)
Dividing by 16 is the same as right-shifting four times (>> 4)
Multiplying by 4 is the same as left-shifting twice (<< 2)
Multiplying by 16 is the same as left-shifting four times (<< 4)
These parts reconstruct a byte from nibbles, the first nibble is placed in the higher order part, the second in the lower order:
(byte)((16 * nibble1) + nibble2)
So if nibble1 is 0x0F and nibble2 is 0x0C, the operation results in a leftshift of the nibble1 by 4, resulting in 0xF0 then nibble2 is added, resulting in 0xFF.
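For reference, here is how the arithmetic in the original method maps onto shifts and masks (a sketch with illustrative names; b stands for one input byte):
byte b = 0xE5;                            // example value
byte top6 = (byte)(b >> 2);               // (b - (b % 4)) / 4
byte low2 = (byte)(b & 0x03);             // b % 4
byte top4 = (byte)(b >> 4);               // (b - (b % 16)) / 16
byte low4 = (byte)(b & 0x0F);             // b % 16
byte packed = (byte)((top4 << 4) | low4); // (16 * nibble1) + nibble2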
