I have a class in a DLL which parses a file and returns a Stream that represents a FAT image (or another image type).
My problem is that when it is any other image, the class produces about 3702 null bytes (on average) at the beginning of the stream.
So I have to trim the stream first and then save it to a file.
I already have code for this, but it is slow.
[Note: fts is the returned FileStream.]
BufferedStream bfs = new BufferedStream(fts);
BinaryReader bbr = new BinaryReader(bfs);
byte[] all_bytes = bbr.ReadBytes((int)fts.Length);
List<byte> nls = new List<byte>();
int index = 0;
foreach (byte bbrs in all_bytes)
{
    if (bbrs == 0x00)
    {
        index++;
        nls.Add(bbrs);
    }
    else
    {
        break;
    }
}
byte[] nulls = nls.ToArray();
//File.WriteAllBytes(outputDir + "Nulls.bin", nulls);
long siz = fts.Length - index;
bbr.BaseStream.Position = index;
byte[] file = bbr.ReadBytes((int)siz);
bbr.Close();
bfs.Close();
fts.Close();
bfs = null;
fts = null;
fts = new FileStream(outputDir + "Image.bin", FileMode.Create, FileAccess.Write);
bfs = new BufferedStream(fts);
bfs.Write(file, 0, (int)siz);
bfs.Close();
fts.Close();
Now, my question is: how can I remove the nulls more efficiently and faster than with the code above?
Instead of pushing bytes onto a List you could simply loop through your stream until you find the first non-null byte and then just copy the array from there using Array.Copy.
I would think about something like this (untested code):
int index = 0;
int currByte = 0;
while ((currByte = bbr.ReadByte()) == 0x00)
{
    index++;
}
// now currByte and everything to the end of the stream are the bytes you want.
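If you want to avoid buffering the whole file at all, you could also just skip past the nulls and let the stream copy itself to the output. A rough sketch (untested; assumes fts is seekable and .NET 4 or later for CopyTo; SaveTrimmed is just an illustrative name):
static void SaveTrimmed(FileStream fts, string outputPath)
{
    int b;
    // Stream.ReadByte returns -1 at end of stream, so this also handles an all-null file.
    while ((b = fts.ReadByte()) == 0x00)
    {
    }
    if (b == -1)
        return; // nothing but nulls

    fts.Position -= 1; // step back so the first data byte is included

    using (FileStream outStream = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
    {
        fts.CopyTo(outStream); // copies in internal chunks; no full in-memory buffer
    }
}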
I have a method Value() whose output is used as an input to other classes and eventually in Main.
In Main some logic is performed and output is produced for the first 512 bits. I want my program to return to Value() to continue with the next 512 bits of file.txt. How can I do that?
public static byte[] Value()
{
    byte[] numbers = new byte[9999];
    using (FileStream fs = File.Open(@"C:\Users\file.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (BufferedStream bs = new BufferedStream(fs))
    using (StreamReader sr = new StreamReader(bs))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            for (int i = 0; i < 512; i++)
            {
                numbers[i] = Byte.Parse(line[i].ToString());
            }
        }
    }
    return numbers;
}
What can be done is to pass Value() an offset and a length parameter.
But there is a problem with your method: you are taking the first bytes of each line in the file (overwriting the same positions each time), which I doubt is what you want to do. So I corrected this to make sure you return only length bytes.
Using System.Linq's Skip and Take methods, you may find things easier as well:
public static byte[] Value(int startOffset, int length)
{
    byte[] allBytes = File.ReadAllBytes(@"C:\Users\file.txt");
    return allBytes.Skip(startOffset).Take(length).ToArray();
}
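Note that File.ReadAllBytes loads the whole file on every call just to slice a window out of it. If the file gets large, a sketch like this (untested; ReadSlice is a hypothetical helper) reads only the bytes you ask for:
public static byte[] ReadSlice(string path, long startOffset, int length)
{
    using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    {
        fs.Seek(startOffset, SeekOrigin.Begin);
        byte[] buffer = new byte[length];
        int total = 0, read;
        // Read may return fewer bytes than requested, so loop until done.
        while (total < length && (read = fs.Read(buffer, total, length - total)) > 0)
        {
            total += read;
        }
        if (total < length)
            Array.Resize(ref buffer, total); // hit the end of the file early
        return buffer;
    }
}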
It seems like what you are trying to do is use a recursive call on Value(). This is based on your comment, but it is not entirely clear, so I am going to make that assumption.
One problem I see is that in your scenario you're returning a byte[], so I modified your code a little bit to keep it as close to yours as possible.
/// <summary>
/// This method will call your `Value` method and return the bytes; it is the entry point for the logic.
/// </summary>
/// <returns></returns>
public static byte[] ByteValueCaller()
{
    byte[] numbers = new byte[9999];
    Value(0, numbers);
    return numbers;
}

public static void Value(int startingByte, byte[] numbers)
{
    using (FileStream fs = File.Open(@"C:\Users\file.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (BinaryReader br = new BinaryReader(fs))
    {
        // 64 bytes == 512 bits
        // Determines if the last position to use is inside the stream,
        // or if the last position is the end of the stream.
        int bytesToRead = startingByte + 64 > br.BaseStream.Length ? (int)br.BaseStream.Length - startingByte : 64;
        // Move the stream to the given position.
        br.BaseStream.Seek(startingByte, SeekOrigin.Begin);
        // Populate dataBuffer with the given bytes.
        byte[] dataBuffer = br.ReadBytes(bytesToRead);
        // Copy from the temporary dataBuffer into the numbers array.
        TransformBufferArrayToNumbers(startingByte, dataBuffer, numbers);
        // Recursive call for the next chunk.
        if (startingByte + bytesToRead < fs.Length)
            Value(startingByte + bytesToRead, numbers);
    }

    static void TransformBufferArrayToNumbers(int startingByte, byte[] dataBuffer, byte[] numbers)
    {
        for (var i = 0; i < dataBuffer.Length; i++)
        {
            numbers[startingByte + i] = dataBuffer[i];
        }
    }
}
Also, be careful with the byte[9999], as it limits how much data you can get. If that's a hard limit, I would also add that information to the if that guards the recursive call.
@TiGreX
public static List<byte> ByteValueCaller()
{
    List<byte> numbers = new List<byte>();
    GetValue(0, numbers);
    return numbers;
}

public static void GetValue(int startingByte, List<byte> numbers)
{
    using (FileStream fs = File.Open(@"C:\Users\file1.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (BinaryReader br = new BinaryReader(fs))
    {
        // 64 bytes == 512 bits
        // Determines if the last position to use is inside the stream,
        // or if the last position is the end of the stream.
        int bytesToRead = startingByte + 64 > br.BaseStream.Length ? (int)br.BaseStream.Length - startingByte : 64;
        // Move the stream to the given position.
        br.BaseStream.Seek(startingByte, SeekOrigin.Begin);
        // Populate dataBuffer with the given bytes.
        byte[] dataBuffer = br.ReadBytes(bytesToRead);
        numbers.AddRange(dataBuffer);
        // Recursive call for the next chunk.
        if (startingByte + bytesToRead < fs.Length)
            GetValue(startingByte + bytesToRead, numbers);
    }
}
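Note that the recursive version reopens the file for every 64-byte chunk. If that becomes a bottleneck, an iterative sketch along these lines (untested; ReadChunks is a hypothetical helper) opens the file once and hands Main one 512-bit block at a time:
public static IEnumerable<byte[]> ReadChunks(string path, int chunkSize)
{
    using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
    using (BinaryReader br = new BinaryReader(fs))
    {
        while (true)
        {
            byte[] chunk = br.ReadBytes(chunkSize); // returns fewer bytes at end of file
            if (chunk.Length == 0)
                yield break;
            yield return chunk;
        }
    }
}

// Usage: each iteration is the next 512 bits (64 bytes) of the file.
// foreach (byte[] block in ReadChunks(@"C:\Users\file.txt", 64)) { /* process block */ }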
I have a recording file (a binary file) of more than 5 GB. I have to read the file and filter out the data that needs to be sent to the server.
The problem is that a byte[] array only holds up to 2 GB of file data, so I need help from anyone who has already dealt with this kind of situation.
using (FileStream str = File.OpenRead(textBox2.Text))
{
    int itemSectionStart = 0x00000000;
    BinaryReader breader = new BinaryReader(str);
    breader.BaseStream.Position = itemSectionStart;
    int length = (int)breader.BaseStream.Length;
    byte[] itemSection = breader.ReadBytes(length); // first frame data
}
Issues:
1. The length exceeds the range of an int.
2. I tried using long and uint, but ReadBytes (and the array size here) only accepts an int.
Edit:
Another approach I want to try: read the data on a frame-buffer basis. Suppose my frame buffer size is 24000; the byte array would store that many frames' worth of data, I'd process that frame data, then flush the array and store the next 24000 frames' data, continuing until the end of the binary file.
You cannot read that big a file in one go, so you have to either split the file into small portions and process each portion,
OR
read the file using a buffer, and once you are done with that buffer's data, flush it and refill it.
I faced the same issue, so I tried the buffer-based approach, and it worked for me.
FileStream inputTempFile = new FileStream(Path, FileMode.OpenOrCreate, FileAccess.Read);
int Buffer_value = 1024;
int bytesRead;
byte[] Array_buffer = new byte[Buffer_value];
while ((bytesRead = inputTempFile.Read(Array_buffer, 0, Buffer_value)) > 0)
{
    // Use bytesRead, not Array_buffer.Length, so a short final read
    // does not reprocess stale bytes from the previous iteration.
    for (int z = 0; z < bytesRead; z = z + 4)
    {
        string temp_id = BitConverter.ToString(Array_buffer, z, 4);
        string[] temp_strArrayID = temp_id.Split(new char[] { '-' });
        string temp_ArraydataID = temp_strArrayID[0] + temp_strArrayID[1] + temp_strArrayID[2] + temp_strArrayID[3];
    }
}
This way you can process your data.
In my case I was trying to store the buffered data in a List; it works fine up to 2 GB of data, after which it throws an out-of-memory exception.
The approach I followed: read the data from the buffer, apply the needed filters, write the filtered data to a text file, and then process that file.
// Text file approach
FileStream inputTempFile = new FileStream(Path, FileMode.OpenOrCreate, FileAccess.Read);
int Buffer_value = 1024;
int bytesRead;
// Write the filtered output to a different file than the one being read
// (outputPath here is illustrative).
StreamWriter writer = new StreamWriter(outputPath, true);
byte[] Array_buffer = new byte[Buffer_value];
while ((bytesRead = inputTempFile.Read(Array_buffer, 0, Buffer_value)) > 0)
{
    for (int z = 0; z < bytesRead; z = z + 4)
    {
        string temp_id = BitConverter.ToString(Array_buffer, z, 4);
        string[] temp_strArrayID = temp_id.Split(new char[] { '-' });
        string temp_ArraydataID = temp_strArrayID[0] + temp_strArrayID[1] + temp_strArrayID[2] + temp_strArrayID[3];
        if (temp_ArraydataID == "XYZ Condition")
        {
            writer.WriteLine(temp_ArraydataID);
        }
    }
}
writer.Close();
As said in comments, I think you have to read your file with a stream. Here is how you can do this:
int nbRead = 0;
var step = 10000;
byte[] buffer = new byte[step];
var hugeArray = new List<byte[]>(); // holds the chunks read so far
do
{
    // breader is the BinaryReader over your file
    nbRead = breader.Read(buffer, 0, step);
    if (nbRead > 0)
    {
        // Copy out only the bytes actually read; reusing the same buffer
        // instance would overwrite the chunks stored earlier.
        var chunk = new byte[nbRead];
        Array.Copy(buffer, chunk, nbRead);
        hugeArray.Add(chunk);
    }
    foreach (var oneByte in hugeArray.SelectMany(part => part))
    {
        // Here you can read byte by byte this subpart
    }
}
while (nbRead > 0);
If I understand your needs correctly, you are looking for a specific pattern in your file?
I think you can do it by looking for the start of your pattern byte by byte. Once you find it, you can start reading the important bytes. If the whole of the important data is greater than 2 GB, as said in the comments, you will have to send it to your server in several parts.
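One way to sketch that byte-by-byte search (untested; assumes a seekable stream such as a FileStream and a pattern shorter than the buffer; FindPattern is just an illustrative name):
static long FindPattern(Stream stream, byte[] pattern)
{
    const int bufferSize = 64 * 1024;
    byte[] buffer = new byte[bufferSize];
    long chunkStart = 0;
    int read;
    while ((read = stream.Read(buffer, 0, bufferSize)) > 0)
    {
        for (int i = 0; i <= read - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && buffer[i + j] == pattern[j])
                j++;
            if (j == pattern.Length)
                return chunkStart + i; // absolute offset of the match
        }
        // Step back so a match straddling two chunks is not missed.
        if (read >= pattern.Length)
            stream.Position -= pattern.Length - 1;
        chunkStart = stream.Position;
    }
    return -1; // not found
}

Once FindPattern returns an offset, you can Seek there and stream the important bytes out in chunks, as in the buffered examples above.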
I'm working with large files, starting from 10 GB. I'm loading parts of the file into memory for processing. The following code works fine for smaller files (700 MB):
byte[] byteArr = new byte[layerPixelCount];
using (FileStream fs = File.OpenRead(recFileName))
{
    using (BinaryReader br = new BinaryReader(fs))
    {
        fs.Seek(offset, SeekOrigin.Begin);
        for (int i = 0; i < byteArr.Length; i++)
        {
            byteArr[i] = (byte)(br.ReadUInt16() / 256);
        }
    }
}
After opening a 10 GB file, the first run of this function is OK, but the second Seek() throws an IO exception:
An attempt was made to move the file pointer before the beginning of the file.
The numbers are:
fs.Length = 11998628352
offset = 4252580352
byteArr.Length = 7746048
I assumed that GC didn't collect the closed fs reference before the second call and tried
GC.Collect();
GC.WaitForPendingFinalizers();
but no luck.
Any help is appreciated.
I'm guessing it's because either your signed integer indexer or offset is rolling over to negative values. Try declaring offset and i as long.
// offset is now a long
long offset = 4252580352;
byte[] byteArr = new byte[layerPixelCount];
using (FileStream fs = File.OpenRead(recFileName))
{
    using (BinaryReader br = new BinaryReader(fs))
    {
        fs.Seek(offset, SeekOrigin.Begin);
        for (long i = 0; i < byteArr.Length; i++)
        {
            byteArr[i] = (byte)(br.ReadUInt16() / 256);
        }
    }
}
The following code logic works for large files beyond 4 GB. The key point is the long data type used with the Seek method, since a long can address beyond the 2^32 boundary. The code first processes the file in whole 1 GB chunks; after those are processed, the leftover (< 1 GB) bytes are processed. I use this code to calculate the CRC of files beyond 4 GB in size (using https://crc32c.machinezoo.com/ for the CRC32C calculation in this example).
private uint Crc32CAlgorithmBigCrc(string fileName)
{
    uint hash = 0;
    byte[] buffer = null;
    FileInfo fileInfo = new FileInfo(fileName);
    long fileLength = fileInfo.Length;
    int blockSize = 1024000000;
    int blocks = (int)(fileLength / blockSize);
    // Cast to long so the multiplication does not overflow for files beyond 2 GB.
    int restBytes = (int)(fileLength - (blocks * (long)blockSize));
    long offsetFile = 0;
    bool firstBlock = true;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    {
        buffer = new byte[blockSize];
        using (BinaryReader br = new BinaryReader(fs))
        {
            while (blocks > 0)
            {
                blocks -= 1;
                fs.Seek(offsetFile, SeekOrigin.Begin);
                buffer = br.ReadBytes(blockSize);
                if (firstBlock)
                {
                    firstBlock = false;
                    hash = Crc32CAlgorithm.Compute(buffer);
                }
                else
                {
                    // Chain off the running hash, not the first block's hash.
                    hash = Crc32CAlgorithm.Append(hash, buffer);
                }
                offsetFile += blockSize;
            }
            if (restBytes > 0)
            {
                Array.Resize(ref buffer, restBytes);
                fs.Seek(offsetFile, SeekOrigin.Begin);
                buffer = br.ReadBytes(restBytes);
                hash = firstBlock ? Crc32CAlgorithm.Compute(buffer) : Crc32CAlgorithm.Append(hash, buffer);
            }
            buffer = null;
        }
    }
    //MessageBox.Show(hash.ToString());
    //MessageBox.Show(hash.ToString("X"));
    return hash;
}
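A quick usage sketch (the path and the stored checksum here are made-up values):
uint storedCrc = 0x1C291CA3; // hypothetical value from the database
uint actualCrc = Crc32CAlgorithmBigCrc(@"C:\data\recording.bin");
Console.WriteLine(actualCrc == storedCrc ? "Checksums match." : "Checksum mismatch.");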
I am trying to write some code using BinaryWriter and then BinaryReader.
When I want to write, I use the Write() method.
But the problem is that between two Write calls a new byte appears whose decimal value is 31 (sometimes 24).
You can see it in this image:
You can see that the byte at index 4 (the 5th byte) has decimal ASCII value 31. I didn't insert it there. As you can see, the first 4 bytes are reserved for a number (Int32), and the rest is other data (mostly some text; this is not important now).
As you can see from the code, I write:
- on the 1st line, the number 10
- on the 2nd line, the text "This is some text..."
How did that 5th byte (decimal 31) get in between?
And this is the code I have:
static void Main(string[] args)
{
    //
    //// SEND - RECEIVE:
    //
    SendingData();
    Console.ReadLine();
}

private static void SendingData()
{
    int[] commandNumbers = { 1, 5, 10 }; // 10 is for the users (when they send some text)!
    for (int i = 0; i < commandNumbers.Length; i++)
    {
        // convert to byte[]
        byte[] allBytes;
        using (MemoryStream ms = new MemoryStream())
        {
            using (BinaryWriter bw = new BinaryWriter(ms))
            {
                bw.Write(commandNumbers[i]); // allocates 1st 4 bytes - FOR MAIN COMMANDS!
                if (commandNumbers[i] == 10)
                    bw.Write("This is some text at command " + commandNumbers[i]); // HERE ON THIS LINE IS MY QUESTION!!!
            }
            allBytes = ms.ToArray();
        }

        // convert back:
        int valueA = 0;
        StringBuilder sb = new StringBuilder();
        foreach (var b in GetData(allBytes).Select((a, b) => new { Value = a, Index = b }))
        {
            if (b.Index == 0) // 1st num
                valueA = BitConverter.ToInt32(b.Value, 0);
            else // other text
            {
                foreach (byte _byte in b.Value)
                    sb.Append(Convert.ToChar(_byte));
            }
        }
        if (sb.ToString().Length == 0)
            sb.Append("ONLY COMMAND");
        Console.WriteLine("Command = {0} and Text is \"{1}\".", valueA, sb.ToString());
    }
}

private static IEnumerable<byte[]> GetData(byte[] data)
{
    using (MemoryStream ms = new MemoryStream(data))
    {
        using (BinaryReader br = new BinaryReader(ms))
        {
            int j = 0;
            byte[] buffer = new byte[4];
            for (int i = 0; i < data.Length; i++)
            {
                buffer[j++] = data[i];
                if (i == 3) // SENDING COMMAND DATA
                {
                    yield return buffer;
                    buffer = new byte[1];
                    j = 0;
                }
                else if (i > 3) // SENDING TEXT
                {
                    yield return buffer;
                    j = 0;
                }
            }
        }
    }
}
If you look at the documentation for Write(string), you'll see that it writes a length-prefixed string. So the 31 is the number of characters in your string -- perfectly normal.
You should probably use Encoding.GetBytes and then write the bytes instead of writing a string.
For example:
bw.Write(
    Encoding.UTF8.GetBytes("This is some text at command " + commandNumbers[i])
);
When a string is written to a binary stream, the first thing it does is write the length of the string. The string "This is some text at command 10" has 31 characters, which is the value you're seeing.
You should check the documentation of methods you use before asking questions about them:
A length-prefixed string represents the string length by prefixing to the string a single byte or word that contains the length of that string. This method first writes the length of the string as a UTF-7 encoded unsigned integer, and then writes that many characters to the stream by using the BinaryWriter instance's current encoding.
;-)
(Though in fact it is LEB128 and not UTF-7, according to Wikipedia.)
The reason this byte is there is that you're adding a variable amount of information, so the length is needed. If you were to write two strings, how would you know where the first ended and the second began?
If you really don't want or need that length byte, you can always convert the string to a byte array and use that.
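A small sketch of the round trip (nothing beyond the BCL): BinaryReader.ReadString consumes the same length prefix that BinaryWriter.Write(string) emits, so the pair stays in sync.
var ms = new MemoryStream();
var bw = new BinaryWriter(ms);
bw.Write(10);                                // 4 bytes
bw.Write("This is some text at command 10"); // 1 length byte (31) + 31 chars
bw.Flush();

ms.Position = 0;
var br = new BinaryReader(ms);
int command = br.ReadInt32();
string text = br.ReadString(); // reads the prefix, then exactly that many chars
Console.WriteLine("{0}: {1}", command, text);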
Ok, here is my edited code. I removed BinaryWriter (while BinaryReader is still there!!), and now it works very well - no more extra bytes.
What do you think? Is there anything I could do better, to make it run faster?
I'm especially interested in that foreach loop, which reads from another method that uses yield return!
New Code:
static void Main(string[] args)
{
    //
    //// SEND - RECEIVE:
    //
    SendingData();
    Console.ReadLine();
}

private static void SendingData()
{
    int[] commands = { 1, 2, 3 };
    // 1 - user text
    // 2 - new game
    // 3 - join game
    // ...
    for (int i = 0; i < commands.Length; i++)
    {
        // convert to byte[]
        byte[] allBytes;
        using (MemoryStream ms = new MemoryStream())
        {
            // 1st - write a command:
            ms.Write(BitConverter.GetBytes(commands[i]), 0, 4);
            // 2nd - write a text:
            if (commands[i] == 1)
            {
                // some example text (like a user would send):
                string myText = "This is some text at command " + commands[i];
                byte[] myBytes = Encoding.UTF8.GetBytes(myText);
                ms.Write(myBytes, 0, myBytes.Length);
            }
            allBytes = ms.ToArray();
        }

        // convert back:
        int valueA = 0;
        StringBuilder sb = new StringBuilder();
        foreach (var b in ReadingData(allBytes).Select((a, b) => new { Value = a, Index = b }))
        {
            if (b.Index == 0)
            {
                valueA = BitConverter.ToInt32(b.Value, 0);
            }
            else
            {
                sb.Append(Convert.ToChar(b.Value[0]));
            }
        }
        if (sb.ToString().Length == 0)
            sb.Append("ONLY COMMAND");
        Console.WriteLine("Command = {0} and Text is \"{1}\".", valueA, sb.ToString());
    }
}

private static IEnumerable<byte[]> ReadingData(byte[] data)
{
    using (MemoryStream ms = new MemoryStream(data))
    {
        using (BinaryReader br = new BinaryReader(ms))
        {
            int j = 0;
            byte[] buffer = new byte[4];
            for (int i = 0; i < data.Length; i++)
            {
                buffer[j++] = data[i];
                if (i == 3) // SENDING COMMAND DATA
                {
                    yield return buffer;
                    buffer = new byte[1];
                    j = 0;
                }
                else if (i > 3) // SENDING TEXT
                {
                    yield return buffer;
                    j = 0;
                }
            }
        }
    }
}
I want to compare two binary files. One of them is already stored on the server with a pre-calculated CRC32 in the database from when I stored it originally.
I know that if the CRC is different, then the files are definitely different. However, if the CRC is the same, I can't be sure the files are identical. So, I'm looking for a nice efficient way of comparing the two streams: one from the posted file and one from the file system.
I'm not an expert on streams, but I'm well aware that I could easily shoot myself in the foot here as far as memory usage is concerned.
static bool FileEquals(string fileName1, string fileName2)
{
    // Check the file size and CRC equality here... if they are equal...
    using (var file1 = new FileStream(fileName1, FileMode.Open))
    using (var file2 = new FileStream(fileName2, FileMode.Open))
        return FileStreamEquals(file1, file2);
}

static bool FileStreamEquals(Stream stream1, Stream stream2)
{
    const int bufferSize = 2048;
    byte[] buffer1 = new byte[bufferSize];
    byte[] buffer2 = new byte[bufferSize];
    while (true)
    {
        int count1 = stream1.Read(buffer1, 0, bufferSize);
        int count2 = stream2.Read(buffer2, 0, bufferSize);
        if (count1 != count2)
            return false;
        if (count1 == 0)
            return true;
        // You might replace the following with an efficient "memcmp"
        if (!buffer1.Take(count1).SequenceEqual(buffer2.Take(count2)))
            return false;
    }
}
I sped up the "memcmp" by using an Int64 compare in a loop over the read stream chunks. This reduced the time to about a quarter.
private static bool StreamsContentsAreEqual(Stream stream1, Stream stream2)
{
    const int bufferSize = 2048 * 2;
    var buffer1 = new byte[bufferSize];
    var buffer2 = new byte[bufferSize];
    while (true)
    {
        int count1 = stream1.Read(buffer1, 0, bufferSize);
        int count2 = stream2.Read(buffer2, 0, bufferSize);
        if (count1 != count2)
        {
            return false;
        }
        if (count1 == 0)
        {
            return true;
        }
        int iterations = (int)Math.Ceiling((double)count1 / sizeof(Int64));
        for (int i = 0; i < iterations; i++)
        {
            if (BitConverter.ToInt64(buffer1, i * sizeof(Int64)) != BitConverter.ToInt64(buffer2, i * sizeof(Int64)))
            {
                return false;
            }
        }
    }
}
This is how I would do it if you didn't want to rely on crc:
/// <summary>
/// Binary comparison of two files
/// </summary>
/// <param name="fileName1">the file to compare</param>
/// <param name="fileName2">the other file to compare</param>
/// <returns>a value indicating whether the files are identical</returns>
public static bool CompareFiles(string fileName1, string fileName2)
{
    FileInfo info1 = new FileInfo(fileName1);
    FileInfo info2 = new FileInfo(fileName2);
    bool same = info1.Length == info2.Length;
    if (same)
    {
        using (FileStream fs1 = info1.OpenRead())
        using (FileStream fs2 = info2.OpenRead())
        using (BufferedStream bs1 = new BufferedStream(fs1))
        using (BufferedStream bs2 = new BufferedStream(fs2))
        {
            for (long i = 0; i < info1.Length; i++)
            {
                if (bs1.ReadByte() != bs2.ReadByte())
                {
                    same = false;
                    break;
                }
            }
        }
    }
    return same;
}
The accepted answer had an error that was pointed out, but never corrected: stream read calls are not guaranteed to return all bytes requested.
BinaryReader ReadBytes calls are guaranteed to return as many bytes as requested unless the end of the stream is reached first.
The following code takes advantage of BinaryReader to do the comparison:
static private bool FileEquals(string file1, string file2)
{
    using (FileStream s1 = new FileStream(file1, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (FileStream s2 = new FileStream(file2, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (BinaryReader b1 = new BinaryReader(s1))
    using (BinaryReader b2 = new BinaryReader(s2))
    {
        while (true)
        {
            byte[] data1 = b1.ReadBytes(64 * 1024);
            byte[] data2 = b2.ReadBytes(64 * 1024);
            if (data1.Length != data2.Length)
                return false;
            if (data1.Length == 0)
                return true;
            if (!data1.SequenceEqual(data2))
                return false;
        }
    }
}
If you change that CRC to a SHA-1 signature, the chances of the files being different but having the same signature are astronomically small.
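A minimal sketch of that idea (SHA1.Create, ComputeHash, and SequenceEqual are all standard BCL/LINQ APIs; FilesHaveSameSha1 is just an illustrative name):
static bool FilesHaveSameSha1(string path1, string path2)
{
    using (var sha1 = SHA1.Create())
    using (var fs1 = File.OpenRead(path1))
    using (var fs2 = File.OpenRead(path2))
    {
        byte[] hash1 = sha1.ComputeHash(fs1); // ComputeHash resets the instance afterwards
        byte[] hash2 = sha1.ComputeHash(fs2);
        return hash1.SequenceEqual(hash2);
    }
}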
You can check the length and dates of the two files even before checking the CRC to possibly avoid the CRC check.
But if you have to compare the entire file contents, one neat trick I've seen is reading the bytes in strides equal to the bitness of the CPU. For example, on a 32-bit PC, read 4 bytes at a time and compare them as Int32s. On a 64-bit PC you can read 8 bytes at a time. This is roughly 4 or 8 times as fast as doing it byte by byte. You would also probably want to use an unsafe code block so that you could use pointers instead of doing a bunch of bit shifting and OR'ing to get the bytes into the native int sizes.
You can use IntPtr.Size to determine the ideal size for the current processor architecture.
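A hedged sketch of that word-at-a-time idea (requires compiling with /unsafe; BuffersEqual is just an illustrative name). On a 64-bit process IntPtr.Size is 8, matching the sizeof(long) stride used here; any tail bytes that don't fill a whole word are compared individually:
static unsafe bool BuffersEqual(byte[] a, byte[] b, int count)
{
    fixed (byte* pa = a, pb = b)
    {
        long* la = (long*)pa;
        long* lb = (long*)pb;
        int words = count / sizeof(long);
        // Compare 8 bytes per iteration instead of 1.
        for (int i = 0; i < words; i++)
        {
            if (la[i] != lb[i])
                return false;
        }
        // Compare any trailing bytes that don't fill a whole word.
        for (int i = words * sizeof(long); i < count; i++)
        {
            if (a[i] != b[i])
                return false;
        }
        return true;
    }
}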