Read a large binary file (5 GB) into a byte array in C#?

I have a recording file (a binary file) larger than 5 GB. I have to read that file and filter out the data that needs to be sent to a server.
The problem is that a byte[] array can hold only about 2 GB of file data, so I would like help from anyone who has already dealt with this kind of situation.
using (FileStream str = File.OpenRead(textBox2.Text))
{
int itemSectionStart = 0x00000000;
BinaryReader breader = new BinaryReader(str);
breader.BaseStream.Position = itemSectionStart;
int length = (int)breader.BaseStream.Length;
byte[] itemSection = breader.ReadBytes(length ); //first frame data
}
Issues:
1: The length exceeds the range of int.
2: I tried using long and uint, but byte[] only supports int sizes.
Edit:
Another approach I want to try is to read the data one frame buffer at a time. Suppose my frame buffer size is 24000: the byte array stores that much frame data, I process it, then clear the byte array and load the next 24000, and so on until the end of the binary file.

You cannot read a file that big all at once, so you have to either split the file into smaller portions and process each one,
OR
read the file through a buffer and, once you are done with the buffered data, clear the buffer and refill it.
I faced the same issue, tried the buffer-based approach, and it worked for me.
// Open the existing input file read-only; Path is the input file path.
FileStream inputTempFile = new FileStream(Path, FileMode.Open, FileAccess.Read);
int Buffer_value = 1024;
int bytesRead;
byte[] Array_buffer = new byte[Buffer_value];
while ((bytesRead = inputTempFile.Read(Array_buffer, 0, Buffer_value)) > 0)
{
    // Only the first bytesRead bytes are valid in this pass; walk them 4 bytes at a time.
    for (int z = 0; z + 4 <= bytesRead; z = z + 4)
    {
        string temp_id = BitConverter.ToString(Array_buffer, z, 4);
        string[] temp_strArrayID = temp_id.Split(new char[] { '-' });
        string temp_ArraydataID = temp_strArrayID[0] + temp_strArrayID[1] + temp_strArrayID[2] + temp_strArrayID[3];
    }
}
inputTempFile.Close();
This way you can process your data chunk by chunk.
In my case I first tried to store the data read from the buffer in a List. That works up to about 2 GB of data; after that it throws an out-of-memory exception.
The approach I followed instead: read the data from the buffer, apply the needed filters, write the filtered data to a text file, and then process that file.
//text file approach
// OutputPath is an assumed name for a separate output text file; the input Path must not be reused for the writer.
FileStream inputTempFile = new FileStream(Path, FileMode.Open, FileAccess.Read);
int Buffer_value = 1024;
int bytesRead;
StreamWriter writer = new StreamWriter(OutputPath, true);
byte[] Array_buffer = new byte[Buffer_value];
while ((bytesRead = inputTempFile.Read(Array_buffer, 0, Buffer_value)) > 0)
{
    // Only the first bytesRead bytes are valid in this pass.
    for (int z = 0; z + 4 <= bytesRead; z = z + 4)
    {
        string temp_id = BitConverter.ToString(Array_buffer, z, 4);
        string[] temp_strArrayID = temp_id.Split(new char[] { '-' });
        string temp_ArraydataID = temp_strArrayID[0] + temp_strArrayID[1] + temp_strArrayID[2] + temp_strArrayID[3];
        if(temp_ArraydataID =="XYZ Condition")
        {
            writer.WriteLine(temp_ArraydataID);
        }
    }
}
writer.Close();
inputTempFile.Close();

As said in the comments, I think you have to read your file with a stream. Here is how you can do this:
int nbRead;
var step = 10000;
byte[] buffer = new byte[step];
// breader is the BinaryReader from the question; the same buffer is reused for every chunk,
// so nothing accumulates in memory.
while ((nbRead = breader.Read(buffer, 0, step)) > 0)
{
    // Only the first nbRead bytes of the buffer are valid for this chunk.
    for (int i = 0; i < nbRead; i++)
    {
        byte oneByte = buffer[i];
        // Here you can read this subpart byte by byte
    }
}
If I understand your needs correctly, you are looking for a specific pattern in your file?
I think you can do it by looking for the start of your pattern byte by byte. Once you find it, you can start reading the important bytes. If the important data as a whole is larger than 2 GB, as said in the comments, you will have to send it to your server in several parts.
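The byte-by-byte pattern search described above can be sketched as follows. This is only an illustration under assumptions: the class name PatternScanner, the method FindAll, and the default chunk size are made up for the example and do not come from the question. The one subtle point it shows is that a pattern may straddle two chunks, so the last pattern.Length - 1 bytes of each chunk are carried over into the next pass.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class PatternScanner
{
    // Returns the absolute offsets at which `pattern` starts in `input`,
    // reading the stream in fixed-size chunks so memory use stays bounded.
    public static List<long> FindAll(Stream input, byte[] pattern, int chunkSize = 64 * 1024)
    {
        if (pattern == null || pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var offsets = new List<long>();
        int overlap = pattern.Length - 1;          // bytes carried between chunks
        byte[] buffer = new byte[chunkSize + overlap];
        int carried = 0;                           // valid bytes kept from the previous pass
        long bufferStart = 0;                      // absolute stream offset of buffer[0]

        int read;
        while ((read = input.Read(buffer, carried, chunkSize)) > 0)
        {
            int valid = carried + read;
            int lastStart = valid - pattern.Length;
            for (int i = 0; i <= lastStart; i++)
            {
                if (MatchesAt(buffer, i, pattern))
                    offsets.Add(bufferStart + i);
            }

            // Keep the tail so a match that spans two chunks is still found.
            int keep = Math.Min(overlap, valid);
            Array.Copy(buffer, valid - keep, buffer, 0, keep);
            bufferStart += valid - keep;
            carried = keep;
        }
        return offsets;
    }

    private static bool MatchesAt(byte[] buffer, int index, byte[] pattern)
    {
        for (int j = 0; j < pattern.Length; j++)
        {
            if (buffer[index + j] != pattern[j])
                return false;
        }
        return true;
    }
}
```

A call such as PatternScanner.FindAll(File.OpenRead(path), marker) returns the offsets only; the caller can then seek to each offset and stream the interesting bytes to the server in parts rather than collecting them in one array.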

Related

Get Estimate of Line Count in a text file

I would like to get an estimate of the number of lines in a csv/text file so that I can use that number for a progress bar. The file could be extremely large so getting the exact number of lines will take too long for this purpose.
What I have come up with is below (read in a portion of the file and count the number of lines and use the file size to estimate the total number of lines):
public static int GetLineCountEstimate(string file)
{
double count = 0;
using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read))
{
long byteCount = fs.Length;
int maxByteCount = 524288;
if (byteCount > maxByteCount)
{
var buf = new byte[maxByteCount];
fs.Read(buf, 0, maxByteCount);
string s = System.Text.Encoding.UTF8.GetString(buf, 0, buf.Length);
count = s.Split('\n').Length * byteCount / maxByteCount;
}
else
{
var buf = new byte[byteCount];
fs.Read(buf, 0, (int)byteCount);
string s = System.Text.Encoding.UTF8.GetString(buf, 0, buf.Length);
count = s.Split('\n').Length;
}
}
return Convert.ToInt32(count);
}
This seems to work ok, but I have some concerns:
1) I would like to have my parameter simply as Stream (as opposed to a filename) since I may also be reading from the clipboard (MemoryStream). However Stream doesn't seem to be able to read n bytes at once into a buffer or get the total length of the Stream in bytes, like FileStream can. Stream is the parent class to both MemoryStream and FileStream.
2) I don't want to assume an encoding such as UTF8
3) I don't want to assume an end of line character (it should work for CR, CRLF, and LF)
I would appreciate any help to make this function more robust.
Here is what I came up with as a more robust solution for estimating line count.
public static int EstimateLineCount(string file)
{
using (var fs = new FileStream(file, FileMode.Open, FileAccess.Read))
{
return EstimateLineCount(fs);
}
}
public static int EstimateLineCount(Stream s)
{
//if file is larger than 10MB estimate the line count, otherwise get the exact line count
const int maxBytes = 10485760; //10MB = 1024*1024*10 bytes
s.Position = 0;
using (var sr = new StreamReader(s, Encoding.UTF8))
{
int lineCount = 0;
if (s.Length > maxBytes)
{
while (s.Position < maxBytes && sr.ReadLine() != null)
lineCount++;
return Convert.ToInt32((double)lineCount * s.Length / s.Position);
}
while (sr.ReadLine() != null)
lineCount++;
return lineCount;
}
}
var lineCount = File.ReadLines(@"C:\file.txt").Count();
Another way:
var lineCount = 0;
using (var reader = File.OpenText(@"C:\file.txt"))
{
while (reader.ReadLine() != null)
{
lineCount++;
}
}
You're cheating! You're asking more than one question... I'll try to help you anyway :P
1) No, you can't use Stream, but you can use StreamReader. This should provide the flexibility you need.
2) Test for the encoding, since I deduce you'll be working with several. Keep in mind, however, that it's usually hard to cater for ALL scenarios, so pick a few important ones first and extend your program later.
3) Don't - let me show you how:
First, consider your source. Whether it's a file or a memory stream, you should have an idea about its size. I've done the file bit because I'm lazy and it's easy, so you'll have to figure out the memory stream bit yourself. What I've done is much simpler but less accurate: read the first line of the file, and express it as a percentage of the size of the file. Note that I multiplied the length of the string by 2, as that is the delta, in other words the number of extra bytes used per extra character in a string. Obviously this isn't very accurate, so you can extend it to x number of lines; just keep in mind that you'll have to change the formula as well.
static void Main(string[] args)
{
FileInfo fileInfo = new FileInfo(@"C:\Muckabout\StringCounter\test.txt");
using (var stream = new StreamReader(fileInfo.FullName))
{
var firstLine = stream.ReadLine(); // Read the first line.
Console.WriteLine("First line read. This is roughly " + (firstLine.Length * 2.0) / fileInfo.Length * 100 + " per cent of the file.");
}
Console.ReadKey();
}
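For the question's concerns about taking a Stream and not assuming a line terminator, one possible sketch is to count line breaks directly in a byte sample and scale by the stream length. The names LineEstimator and Estimate and the 512 KB sample size are assumptions made for this example. It treats CR, LF, and CRLF each as one break, and it assumes a seekable stream in an ASCII-compatible encoding such as UTF-8 or Latin-1; it would miscount UTF-16 text.

```csharp
using System;
using System.IO;

public static class LineEstimator
{
    // Estimates the number of lines by counting CR, LF and CRLF breaks
    // in the first `sampleBytes` bytes and scaling to the full length.
    public static long Estimate(Stream s, int sampleBytes = 512 * 1024)
    {
        if (!s.CanSeek)
            throw new NotSupportedException("A seekable stream is required.");
        long length = s.Length;
        if (length == 0)
            return 0;

        s.Position = 0;
        byte[] buf = new byte[(int)Math.Min(sampleBytes, length)];
        int filled = 0, n;
        // Read may return fewer bytes than requested, so loop until the sample is full.
        while (filled < buf.Length && (n = s.Read(buf, filled, buf.Length - filled)) > 0)
            filled += n;

        long breaks = 0;
        for (int i = 0; i < filled; i++)
        {
            if (buf[i] == (byte)'\n')
                breaks++;                               // LF, or the LF of a CRLF pair
            else if (buf[i] == (byte)'\r' && (i + 1 >= filled || buf[i + 1] != (byte)'\n'))
                breaks++;                               // lone CR
        }

        if (filled >= length)
        {
            // Whole stream was sampled: exact count, plus an unterminated last line.
            byte last = buf[filled - 1];
            bool endsWithBreak = last == (byte)'\n' || last == (byte)'\r';
            return endsWithBreak ? breaks : breaks + 1;
        }
        return (long)Math.Round((double)breaks * length / filled);
    }
}
```

Because the estimate is only meant to drive a progress bar, the sketch makes no attempt to correct for lines whose length varies strongly between the start and the rest of the file.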

how to remove few first positions from binary file

I used a hex editor to figure out that an encryption applied to a .WAV file adds 16 blocks of empty 00 bytes.
As I understand it, once the first 64 positions are deleted the file is decrypted.
After searching the site I couldn't find an example that matches my case.
I just need to open the file and write it to another file without those first 64 positions.
I appreciate your help.
If you use a 64-byte buffer to copy the file, then you can skip the first one:
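// "source file" and "target file" are placeholders; they must be two different paths.
using(var originalFile = File.OpenRead("source file"))
using(var newFile = File.Create("target file"))
{
    byte[] buffer = new byte[64];
    long readBytes = 0;
    int currentRead;
    // Loop until Read reports the end of the file (0 bytes).
    while ((currentRead = originalFile.Read(buffer, 0, buffer.Length)) > 0)
    {
        readBytes += currentRead;
        // Skip the first 64-byte buffer. This relies on the first Read filling the buffer,
        // which is usual for local files but not guaranteed.
        if(readBytes > 64)
        {
            newFile.Write(buffer, 0, currentRead);
        }
    }
}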
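An alternative that does not depend on how many bytes each Read call returns is to seek past the header and copy the rest. The following is a minimal sketch; the names HeaderStripper and CopyWithoutHeader are assumptions for the example, and the 64-byte default comes from the question.

```csharp
using System;
using System.IO;

public static class HeaderStripper
{
    // Copies everything after the first `headerLength` bytes of the source
    // file into a new destination file.
    public static void CopyWithoutHeader(string sourcePath, string destinationPath, int headerLength = 64)
    {
        using (FileStream source = File.OpenRead(sourcePath))
        using (FileStream destination = File.Create(destinationPath))
        {
            if (source.Length < headerLength)
                throw new InvalidDataException("The file is shorter than the header to skip.");

            // Position the read cursor just past the header, then stream the rest.
            source.Seek(headerLength, SeekOrigin.Begin);
            source.CopyTo(destination);
        }
    }
}
```

File.Create is used here rather than File.OpenWrite because OpenWrite does not truncate an existing file, which could leave stale bytes at the end of the output.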

Array size based on available physical memory

I am trying to make an encryption algorithm.
I can read a file and convert it to bytes without any problems, and am saving the bytes in a byteArray.
The problem is I am currently creating the array size like this:
byte[] FileArray =new byte[10000000];
FileStream TheFileStream = new FileStream(FilePath.Text, FileMode.Open);
BinaryReader TheFileBinary = new BinaryReader(TheFileStream);
for (int i = 0; i < TheFileStream.Length; i++) {
FileArray = TheFileBinary.ReadBytes(10000000);
// I call a function here
if (TheFileStream.Position == TheFileStream.Length)
break;
}
However, I don't want the array size to be fixed, because if I make it 1000000 (as an example), other machines with small memory size might face a problem.
I need to find the Idle size of a memory size for each machine, how can I set the array size dynamically based on the free unallocated memory space, to be used where I can put it in the byteArray?
I have noticed the larger the Arraysize the faster it reads, so I don't want to make it too small either.
I would really appreciate the help.
The FileStream keeps track of how many bytes are in the file. Just use the Length property.
FileStream TheFileStream = new FileStream(FilePath.Text, FileMode.Open);
BinaryReader TheFileBinary = new BinaryReader(TheFileStream);
// ReadBytes takes an int count, so this only works for files smaller than 2 GB.
byte[] FileArray = TheFileBinary.ReadBytes((int)TheFileStream.Length);
Okay, I reread the question and finally found the part of it that was a question: "how can I know the free unallocated memory space so I can put it in the byteArray". Anyway, I suggest you take a look at this question along with its highest rated comment.
If you're really worried about space, then use a simple List, read in chunks of the stream at a time (say 1024 bytes), and call the AddRange method on the list. After you're done, call ToArray on the List, and now you have a properly sized byte array.
List<byte> byteArr = new List<byte>();
byte[] buffer = new byte[1024];
int bytesRead = 0;
using(FileStream TheFileStream = new FileStream(FilePath.Text, FileMode.Open))
{
    while((bytesRead = TheFileStream.Read(buffer, 0, buffer.Length)) > 0)
        // Add only the bytes actually read; Take requires "using System.Linq;".
        byteArr.AddRange(buffer.Take(bytesRead));
}
buffer = byteArr.ToArray();
// call your method here.
Edit: It's still preferable to read it in chunks for larger files. You can of course play with the buffer size however you want, but 1024 is usually a good starting point. Doing a read of the entire file will ultimately DOUBLE the memory, as you also have to deal with the internal read buffer being the size of the stream (on top of your own buffer). Breaking up the reads into chunks only takes FileStream.Length + <buffer size> memory as opposed to FileStream.Length * 2. Just something to keep in mind...
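byte[] buffer = null;
using(FileStream TheFileStream = new FileStream(FilePath.Text, FileMode.Open))
{
    buffer = new byte[TheFileStream.Length];
    int offset = 0;
    int bytesRead;
    // Never request more than the space left in the buffer, or Read throws an ArgumentException.
    while(offset < buffer.Length &&
          (bytesRead = TheFileStream.Read(buffer, offset, Math.Min(1024, buffer.Length - offset))) > 0)
        offset += bytesRead;
    // Or just TheFileStream.Read(buffer, 0, buffer.Length) if it's small enough.
}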
You can use WMI to retrieve the instance of the Win32_OperatingSystem class and base your memory calculations off of the FreePhysicalMemory or TotalVisibleMemorySize properties:
static ulong GetAvailableMemoryKilobytes()
{
const string memoryPropertyName = "FreePhysicalMemory";
using (ManagementObject operatingSystem = new ManagementObject("Win32_OperatingSystem=@"))
return (ulong) operatingSystem[memoryPropertyName];
}
static ulong GetTotalMemoryKilobytes()
{
const string memoryPropertyName = "TotalVisibleMemorySize";
using (ManagementObject operatingSystem = new ManagementObject("Win32_OperatingSystem=@"))
return (ulong) operatingSystem[memoryPropertyName];
}
Then pass the result of either method to a method like this to scale the size of your read buffer to the memory of the local machine:
static int GetBufferSize(ulong memoryKilobytes)
{
const int bufferStepSize = 256; // 256 kilobytes of buffer...
const int memoryStepSize = 128 * 1024;// ...for every 128 megabytes of memory...
const int minBufferSize = 512; // ...no less than 512 kilobytes...
const int maxBufferSize = 10 * 1024; // ...no more than 10 megabytes
int bufferSize = bufferStepSize * ((int) memoryKilobytes / memoryStepSize);
bufferSize = Math.Max(bufferSize, minBufferSize);
bufferSize = Math.Min(bufferSize, maxBufferSize);
return bufferSize;
}
Obviously, increasing your buffer size by 256 KB for every 128 MB of RAM seems a little silly, but these numbers are just examples of how you might scale your buffer size if you really wanted to do that. Unless you're reading many, many files at once, worrying about a buffer that's a few hundred kilobytes or a few megabytes might be more trouble than it's worth. You might be better off just benchmarking to see which buffer size gives the best performance (it might not need to be as large as you think) and using that.
Now you can simply update your code like this:
ulong memoryKilobytes =
    GetAvailableMemoryKilobytes();
    // ...or GetTotalMemoryKilobytes();
// GetBufferSize returns a size in kilobytes, so convert it to bytes before allocating.
int bufferSize = GetBufferSize(memoryKilobytes) * 1024;
using (FileStream TheFileStream = new FileStream(FilePath.Text, FileMode.Open))
{
    byte[] FileArray = new byte[bufferSize];
    int readCount;
    while ((readCount = TheFileStream.Read(FileArray, 0, bufferSize)) > 0)
    {
        // Call a method here, passing FileArray and readCount
        // (only the first readCount bytes are valid in each pass)
    }
}
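The benchmarking suggestion made earlier in this answer could look roughly like the sketch below. ReadBenchmark and TimeRead are names invented for this example, and the approach (one timed sequential pass per buffer size) is an assumption about how one might measure it, not a method taken from the answer.

```csharp
using System;
using System.Diagnostics;
using System.IO;

public static class ReadBenchmark
{
    // Reads the whole file once with the given buffer size and reports
    // how many bytes were read and how long the pass took.
    public static (long BytesRead, TimeSpan Elapsed) TimeRead(string path, int bufferSize)
    {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        Stopwatch watch = Stopwatch.StartNew();
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, FileOptions.SequentialScan))
        {
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                total += read;
        }
        watch.Stop();
        return (total, watch.Elapsed);
    }
}
```

Calling TimeRead for a handful of sizes (for example 4 KB, 64 KB, 1 MB, 10 MB) and comparing Elapsed gives a rough picture. The operating system's file cache usually makes later passes over the same file faster, so it is worth repeating the runs in a different order before trusting the numbers.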

C# equivalent of fread

I am in the process of converting some C++ code to C#, and I am trying to figure out how to write the following C++ code in my C# app so that it does the same thing:
fread(&Start, 1, 4, ReadMunge); //Read File position
I have tried multiple ways such as using FileStream:
using (FileStream fs = File.OpenRead("File-0027.AFS"))
{
//Read amount of files from offset 4
fs.Seek(4, SeekOrigin.Begin);
FileAmount = fs.ReadByte();
string strNumber = Convert.ToString(FileAmount);
fileamountStatus.Text = strNumber;
//Seek to beginning of LBA table
fs.Seek(8, SeekOrigin.Begin);
CurrentOffset = fs.Position;
int numBytesRead = 0;
while (Loop < FileAmount) //We want this to loop till it reachs our FileAmount number
{
Loop = Loop + 1;
//fread(&Start, 1, 4, ReadMunge); //Read File position
//Start = fs.ReadByte();
//Size = fs.ReadByte();
CurrentOffset = fs.Position;
int CurrentOffsetINT = unchecked((int)CurrentOffset);
//Start = fs.Read(bytes,0, 4);
Start = fs.Read(bytes, CurrentOffsetINT, 4);
Size = fs.Read(bytes, CurrentOffsetINT, 4);
Start = fs.ReadByte();
}
}
The problem I keep running into is that Start/Size do not hold the 4 bytes of data that I need.
If you're reading a binary file, you probably should look into using BinaryReader. That way you don't have to worry about converting byte arrays to integers or whatever. You can simply call reader.ReadInt32, for example, to read an int.
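As a rough illustration of the BinaryReader suggestion, the sketch below reads pairs of 4-byte integers, which is what each fread(&Start, 1, 4, ...) call does in the C++ code. The layout (a table of start/size pairs at a given offset) and the names TableReader and ReadEntries are assumptions made for the example, not the actual AFS format. BinaryReader.ReadInt32 always reads little-endian values.

```csharp
using System;
using System.IO;
using System.Text;

public static class TableReader
{
    // Reads `entryCount` (start, size) pairs of 32-bit little-endian integers
    // beginning at `tableOffset`. The layout is hypothetical.
    public static (int Start, int Size)[] ReadEntries(Stream input, long tableOffset, int entryCount)
    {
        var entries = new (int Start, int Size)[entryCount];
        input.Seek(tableOffset, SeekOrigin.Begin);

        // leaveOpen: true so that disposing the reader does not close the caller's stream.
        using (var reader = new BinaryReader(input, Encoding.UTF8, leaveOpen: true))
        {
            for (int i = 0; i < entryCount; i++)
            {
                int start = reader.ReadInt32();   // 4 bytes, like fread(&Start, 1, 4, f)
                int size = reader.ReadInt32();    // next 4 bytes
                entries[i] = (start, size);
            }
        }
        return entries;
    }
}
```

By contrast, Stream.Read returns the number of bytes it read, not the value stored in them, which is why Start and Size in the question never held the expected data.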

GZIP file Total length in C#

I have a gzipped file several GB in size. I want to get the size of the unzipped contents without actually unzipping the file in C#. What library can I use? When I right-click the .gz file and go to Properties, the Archive tab shows a property named TotalLength with this value, but I want to get it programmatically using C#. Any idea?
The last 4 bytes of the gz file contain the length.
So it should be something like:
using(var fs = File.OpenRead(path))
{
fs.Position = fs.Length - 4;
var b = new byte[4];
fs.Read(b, 0, 4);
uint length = BitConverter.ToUInt32(b, 0);
Console.WriteLine(length);
}
The last four bytes of a .gz file are the uncompressed input size modulo 2^32. If your uncompressed file isn't larger than 4 GB, just read the last 4 bytes of the file. If you have a larger file, I'm not sure that it's possible to get the size without uncompressing the stream.
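A small sketch of reading that trailer field, with the bytes assembled explicitly as little-endian so the result does not depend on the machine's byte order. The names GzipTrailer and ReadUncompressedSizeMod32 are made up for this example. Two caveats apply: the value wraps around at 4 GB, as noted above, and for a file made of several concatenated gzip members it describes only the last member.

```csharp
using System;
using System.IO;

public static class GzipTrailer
{
    // Returns the ISIZE field: the uncompressed size modulo 2^32,
    // stored little-endian in the last 4 bytes of a gzip stream.
    public static uint ReadUncompressedSizeMod32(Stream gz)
    {
        // A gzip member has at least a 10-byte header and an 8-byte trailer.
        if (gz.Length < 18)
            throw new InvalidDataException("Too short to be a gzip file.");

        gz.Seek(-4, SeekOrigin.End);
        byte[] t = new byte[4];
        int total = 0;
        while (total < 4)
        {
            int n = gz.Read(t, total, 4 - total);
            if (n == 0)
                throw new EndOfStreamException();
            total += n;
        }
        return (uint)t[0] | ((uint)t[1] << 8) | ((uint)t[2] << 16) | ((uint)t[3] << 24);
    }
}
```

With a file, the call would be GzipTrailer.ReadUncompressedSizeMod32(File.OpenRead(path)), wrapped in a using block so the stream is closed afterwards.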
EDIT: See the answers by Leppie and Gabe; the only reason I'm keeping this (rather than deleting it) is that it may be necessary if you suspect the length is > 4GB
For gzip, that data doesn't seem to be directly available - I've looked at GZipStream and the SharpZipLib equivalent - neither works. The best I can suggest is to run it locally:
long length = 0;
using(var fs = File.OpenRead(path))
using (var gzip = new GZipStream(fs, CompressionMode.Decompress)) {
var buffer = new byte[10240];
int count;
while ((count = gzip.Read(buffer, 0, buffer.Length)) > 0) {
length += count;
}
}
If it was a zip, then SharpZipLib:
long size = 0;
using(var zip = new ZipFile(path)) {
foreach (ZipEntry entry in zip) {
size += entry.Size;
}
}
public static long mGetFileLength(string strFilePath)
{
if (!string.IsNullOrEmpty(strFilePath))
{
System.IO.FileInfo info = new System.IO.FileInfo(strFilePath);
return info.Length;
}
return 0;
}
