how to remove few first positions from binary file - c#

I've used an HEX editor to figure out that a encrypting made to a .WAV file is adding 16 blocks of empty 00's.
I understood that the first 64 positions should be deleted and then the file is de-crypted.
After searching the site i couldn't find an example that will match my case,
I just need to open the file and write it to another file without those first 64 positions.
Appreciate your help

If you use 64byte buffer to copy the file, than you can skip the first one:
using(var originalFile = File.OpenRead("some file"))
using(var newFile = File.OpenWrite("some file"))
{
byte[] buffer = new byte[64];
int readBytes= 0;
int currentReaded = 0;
do
{
currentReaded = originalFile.Read(buffer, 0, buffer.Length);
readBytes += currentReaded;
if(readBytes > 64)
{
newFile.Write(buffer, 0, currentReaded);
}
} while (currentReaded == buffer.Length);
}

Related

Read a large binary file(5GB) into a byte array in C#?

I have a recording file (Binary file) more than 5 GB, i have to read that file and filter out the data needed to be send to server.
Problem is byte[] array supports till 2GB of file data . so just need help if someone had already dealt with this type of situation.
using (FileStream str = File.OpenRead(textBox2.Text))
{
int itemSectionStart = 0x00000000;
BinaryReader breader = new BinaryReader(str);
breader.BaseStream.Position = itemSectionStart;
int length = (int)breader.BaseStream.Length;
byte[] itemSection = breader.ReadBytes(length ); //first frame data
}
issues:
1: Length is crossing the range of integer.
2: tried using long and unint but byte[] only supports integer
Edit.
Another approach i want to give try, Read data on frame buffer basis, suppose my frame buffer size is 24000 . so byte array store that many frames data and then process the frame data and then flush out the byte array and store another 24000 frame data. till keep on going till end of binary file..
See you can not read that much big file at once, so you have to either split the file in small portions and then process the file.
OR
Read file using buffer concept and once you are done with that buffer data then flush out that buffer.
I faced the same issue, so i tried the buffer based approach and it worked for me.
FileStream inputTempFile = new FileStream(Path, FileMode.OpenOrCreate, FileAccess.Read);
Buffer_value = 1024;
byte[] Array_buffer = new byte[Buffer_value];
while ((bytesRead = inputTempFile.Read(Array_buffer, 0, Buffer_value)) > 0)
{
for (int z = 0; z < Array_buffer.Length; z = z + 4)
{
string temp_id = BitConverter.ToString(Array_buffer, z, 4);
string[] temp_strArrayID = temp_id.Split(new char[] { '-' });
string temp_ArraydataID = temp_strArrayID[0] + temp_strArrayID[1] + temp_strArrayID[2] + temp_strArrayID[3];
}
}
this way you can process your data.
For my case i was trying to store buffer read data in to a List, it will work fine till 2GB data after that it will throw memory exception.
The approach i followed, read the data from buffer and apply needed filters and write filter data in to a text file and then process that file.
//text file approach
FileStream inputTempFile = new FileStream(Path, FileMode.OpenOrCreate, FileAccess.Read);
Buffer_value = 1024;
StreamWriter writer = new StreamWriter(Path, true);
byte[] Array_buffer = new byte[Buffer_value];
while ((bytesRead = inputTempFile.Read(Array_buffer, 0, Buffer_value)) > 0)
{
for (int z = 0; z < Array_buffer.Length; z = z + 4)
{
string temp_id = BitConverter.ToString(Array_buffer, z, 4);
string[] temp_strArrayID = temp_id.Split(new char[] { '-' });
string temp_ArraydataID = temp_strArrayID[0] + temp_strArrayID[1] + temp_strArrayID[2] + temp_strArrayID[3];
if(temp_ArraydataID =="XYZ Condition")
{
writer.WriteLine(temp_ArraydataID);
}
}
}
writer.Close();
As said in comments, I think you have to read your file with a stream. Here is how you can do this:
int nbRead = 0;
var step = 10000;
byte[] buffer = new byte[step];
do
{
nbRead = breader.Read(buffer, 0, step);
hugeArray.Add(buffer);
foreach(var oneByte in hugeArray.SelectMany(part => part))
{
// Here you can read byte by byte this subpart
}
}
while (nbRead > 0);
If I well understand your needs, you are looking for a specific pattern into your file?
I think you can do it by looking for the start of your pattern byte by byte. Once you find it, you can start reading the important bytes. If the whole important data is greater than 2GB, as said in the comments, you will have to send it to your server in several parts.

Get Estimate of Line Count in a text file

I would like to get an estimate of the number of lines in a csv/text file so that I can use that number for a progress bar. The file could be extremely large so getting the exact number of lines will take too long for this purpose.
What I have come up with is below (read in a portion of the file and count the number of lines and use the file size to estimate the total number of lines):
public static int GetLineCountEstimate(string file)
{
double count = 0;
using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read))
{
long byteCount = fs.Length;
int maxByteCount = 524288;
if (byteCount > maxByteCount)
{
var buf = new byte[maxByteCount];
fs.Read(buf, 0, maxByteCount);
string s = System.Text.Encoding.UTF8.GetString(buf, 0, buf.Length);
count = s.Split('\n').Length * byteCount / maxByteCount;
}
else
{
var buf = new byte[byteCount];
fs.Read(buf, 0, (int)byteCount);
string s = System.Text.Encoding.UTF8.GetString(buf, 0, buf.Length);
count = s.Split('\n').Length;
}
}
return Convert.ToInt32(count);
}
This seems to work ok, but I have some concerns:
1) I would like to have my parameter simply as Stream (as opposed to a filename) since I may also be reading from the clipboard (MemoryStream). However Stream doesn't seem to be able to read n bytes at once into a buffer or get the total length of the Stream in bytes, like FileStream can. Stream is the parent class to both MemoryStream and FileStream.
2) I don't want to assume an encoding such as UTF8
3) I don't want to assume an end of line character (it should work for CR, CRLF, and LF)
I would appreciate any help to make this function more robust.
Here is what I came up with as a more robust solution for estimating line count.
public static int EstimateLineCount(string file)
{
using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read))
{
return EstimateLineCount(fs);
}
}
public static int EstimateLineCount(Stream s)
{
//if file is larger than 10MB estimate the line count, otherwise get the exact line count
const int maxBytes = 10485760; //10MB = 1024*1024*10 bytes
s.Position = 0;
using (var sr = new StreamReader(s, Encoding.UTF8))
{
int lineCount = 0;
if (s.Length > maxBytes)
{
while (s.Position < maxBytes && sr.ReadLine() != null)
lineCount++;
return Convert.ToInt32((double)lineCount * s.Length / s.Position);
}
while (sr.ReadLine() != null)
lineCount++;
return lineCount;
}
}
var lineCount = File.ReadLines(#"C:\file.txt").Count();
An other way:
var lineCount = 0;
using (var reader = File.OpenText(#"C:\file.txt"))
{
while (reader.ReadLine() != null)
{
lineCount++;
}
}
You're cheating! You're asking more than one question... I'll try to help you anyway :P
No, you can't use Stream, but you can use StreamReader. This should provide the flexibility you need.
Test for encoding, since I deduce you'll be working with various. Keep in mind however that it's usually hard to cater for ALL scenarios, so pick a few important ones first, and extend your program later.
Don't - let me show you how:
First, consider your source. Whether it's a file or memory stream, you should have an idea about it's size. I've done the file bit because I'm lazy and it's easy, so you'll have to figure out the memory stream bit yourself. What I've done is much simpler but less accurate: Read the first line of the file, and use it as a percentage of the size of the file. Note I multiplied the length of the string by 2 as that is the delta, in other words number of extra bytes used per extra character in a string. Obviously this isn't very accurate, so you can extend it to x number of lines, just keep in mind that you'll have to change the formula as well.
static void Main(string[] args)
{
FileInfo fileInfo = new FileInfo((#"C:\Muckabout\StringCounter\test.txt"));
using (var stream = new StreamReader(fileInfo.FullName))
{
var firstLine = stream.ReadLine(); // Read the first line.
Console.WriteLine("First line read. This is roughly " + (firstLine.Length * 2.0) / fileInfo.Length * 100 + " per cent of the file.");
}
Console.ReadKey();
}

Writing and reading streams with offset to reduce disc seeks

I am reading a file and writing stream of that to a file, what I want to do is writing multiple files in a single file and reading them by their offset.
While writing the files, I understand that i need to know the file offset and length of the stream to read back the file.
var file = #"d:\foo.pdf";
var stream = File.ReadAllBytes(file);
// here i have the length of the
Console.WriteLine(stream.LongLength);
using (var br = new BinaryWriter(File.Open(#"d:\foo.bin", FileMode.OpenOrCreate)))
{
br.Write(stream);
}
I need to find the offset while writing multiple files.
Also while reading back the files, how do I start from an offset and read forwards as long as the length?
Finally, Does this method reduces number of disc seeks?
To read back various fragments, you will need to store the individual file lengths. For example:
using(var dest = File.Open(#"d:\foo.bin", FileMode.OpenOrCreate))
{
Append(dest, file);
Append(dest, anotherFile);
}
...
static void AppendFile(Stream dest, string path)
{
using(var source = File.OpenRead(path))
{
var lenHeader = BitConverter.GetBytes(source.Length);
dest.Write(lenHeader, 0, 4);
source.CopyTo(dest);
}
}
Then to read back you can do things like:
using(var source = File.OpenRead(...))
{
int len = ReadLength(source);
stream.Seek(len, SeekOrigin.Current); // skip the first file
len = ReadLength(source);
// TODO: now read len-many bytes from the second file
}
static int ReadLength(Stream stream)
{
byte[] buffer = new byte[4];
int count = 4, offset = 0, read;
while(count != 0 && (read = stream.Read(buffer, offset, count)) > 0)
{
count -= read;
offset += read;
}
if (count != 0) throw new EndOfStreamException();
return BitConverter.ToInt32(buffer, 0);
}
As for reading len-many bytes; you can either just keep track of it and decrement it while reading, or you can create a length-limited Stream wrapper. Either works.

loading/streaming a file into a buffer/buffers

I have been trying for a couple of days now to load a file in chunks to allow the user to use very large (GB) files and still keep the speed of the program. Currently i have the following code:
using (FileStream filereader = new FileStream(filename, FileMode.Open, FileAccess.Read))
{
using (StreamReader reader = new StreamReader(filereader))
{
while (toRead > 0 && (bytesread = reader.Read(buffer, offset, toRead)) > 0)
{
toRead -= bytesread;
offset += bytesread;
}
if (toRead > 0) throw new EndOfStreamException();
foreach (var item in buffer)
{
temporary = temporary += item.ToString();
}
temporary.Replace("\n", "\n" + System.Environment.NewLine);
Below are the declarations to avoid any confusion (hopefully):
const int Max_Buffer = 5000;
char[] buffer = new char[Max_Buffer];
int bytesread;
int toRead = 5000;
int offset = 0;
At the moment the program reads in 5000 bytes of the text file, then processes the bytes into a string which i then pass into a stringreader so i can take the information i want.
My problem at the moment is that the buffer can stop halfway through a line so when I take my data in the stringreader class it brings up index/length errors.
What i need is to know how to either seek back in the array to find a certain set of characters that signify the start of a line and then only return the data before that point for processing to a string.
Another issue after sorting the seeking back problem is how would i keep the data i didnt want to process and bring in more data to fill the buffer.
I hope this is explained well, i know i can sometimes be confusing hope someone can help.
I would suggest the use of reader.ReadLine() instead of reader.Read() in your loop
buffer=reader.ReadLine();
bytesread = buffer.Length*2;//Each charcter is unicode and equal to 2 bytes
You can then check on the whether (toRead - bytesread)<0.

GZIP file Total length in C#

I have a zipped file having size of several GBs, I want to get the size of Unzipped contents but don't want to actually unzip the file in C#, What might be the Library I can use? When I right click on the .gz file and go to Properties then under the Archive Tab there is a property name TotalLength which is showing this value. But I want to get it Programmatically using C#.. Any idea?
The last 4 bytes of the gz file contains the length.
So it should be something like:
using(var fs = File.OpenRead(path))
{
fs.Position = fs.Length - 4;
var b = new byte[4];
fs.Read(b, 0, 4);
uint length = BitConverter.ToUInt32(b, 0);
Console.WriteLine(length);
}
The last for bytes of a .gz file are the uncompressed input size modulo 2^32. If your uncompressed file isn't larger than 4GB, just read the last 4 bytes of the file. If you have a larger file, I'm not sure that it's possible to get without uncompressing the stream.
EDIT: See the answers by Leppie and Gabe; the only reason I'm keeping this (rather than deleting it) is that it may be necessary if you suspect the length is > 4GB
For gzip, that data doesn't seem to be directly available - I've looked at GZipStream and the SharpZipLib equivalent - neither works. The best I can suggest is to run it locally:
long length = 0;
using(var fs = File.OpenRead(path))
using (var gzip = new GZipStream(fs, CompressionMode.Decompress)) {
var buffer = new byte[10240];
int count;
while ((count = gzip.Read(buffer, 0, buffer.Length)) > 0) {
length += count;
}
}
If it was a zip, then SharpZipLib:
long size = 0;
using(var zip = new ZipFile(path)) {
foreach (ZipEntry entry in zip) {
size += entry.Size;
}
}
public static long mGetFileLength(string strFilePath)
{
if (!string.IsNullOrEmpty(strFilePath))
{
System.IO.FileInfo info = new System.IO.FileInfo(strFilePath);
return info.Length;
}
return 0;
}

Categories