There's a big text file that I cannot read with the File.ReadAllText() method. How can I read it piece by piece in C#?
Try something like this.
Here I read the file in blocks of 2 megabytes.
Reading in fixed-size blocks keeps memory usage low, because only one block is loaded at a time instead of the whole file.
You can change the block size from 2 megabytes to whatever suits you.
const int BlockSize = 2 * 1024 * 1024; // 2 megabytes

using (Stream objStream = File.OpenRead(FilePath))
{
    byte[] arrData;
    // Keep reading until the read position reaches the end of the file
    while (objStream.Position != objStream.Length)
    {
        // Number of bytes still left to read
        long lRemainingBytes = objStream.Length - objStream.Position;
        // Allocate a full block, or a smaller array for the final partial block
        if (lRemainingBytes > BlockSize)
        {
            arrData = new byte[BlockSize];
        }
        else
        {
            arrData = new byte[lRemainingBytes];
        }
        // Read may return fewer bytes than requested, so capture the actual count
        int bytesRead = objStream.Read(arrData, 0, arrData.Length);
        // Process the first bytesRead bytes of arrData here
    }
}
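If the file is line-oriented text and you are on .NET 4 or later, a simpler option is File.ReadLines, which streams the lines lazily instead of loading the whole file. A minimal sketch (the path is just a placeholder):
// Enumerates the file lazily, one line at a time, instead of loading it all at once
foreach (string line in File.ReadLines(@"C:\temp\bigfile.txt"))
{
    // Process each line here
}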
How can I read multiple byte arrays from a file? These byte arrays are images and have the potential to be large.
This is how I'm adding them to the file:
using (var stream = new FileStream(tempFile, FileMode.Append))
{
//convertedImage = one byte array
stream.Write(convertedImage, 0, convertedImage.Length);
}
So, now they're in tempFile and I don't know how to retrieve them as individual arrays. Ideally, I'd like to get them as an IEnumerable<byte[]>. Is there a way to split these, maybe?
To retrieve multiple sets of byte arrays, you will need to know the length when reading. The easiest way to do this (if you can change the writing code) is to add a length value:
using (var stream = new FileStream(tempFile, FileMode.Append))
{
//convertedImage = one byte array
// All ints are 4-bytes
stream.Write(BitConverter.GetBytes(convertedImage.Length), 0, 4);
// now, we can write the buffer
stream.Write(convertedImage, 0, convertedImage.Length);
}
Reading the data is then
// listOfArrays collects the results; List<byte[]> implements IEnumerable<byte[]>
var listOfArrays = new List<byte[]>();
using (var stream = new FileStream(tempFile, FileMode.Open))
{
    // loop until we can't read any more
    while (true)
    {
        // All ints are 4 bytes
        byte[] sizeBytes = new byte[4];
        // Read the length prefix
        int numRead = stream.Read(sizeBytes, 0, 4);
        if (numRead < 4)
        {
            break; // end of file
        }
        // Convert to int
        int size = BitConverter.ToInt32(sizeBytes, 0);
        // Allocate the buffer and fill it; Read may return fewer bytes than
        // requested, so keep reading until the buffer is full
        byte[] convertedImage = new byte[size];
        int offset = 0;
        while (offset < size)
        {
            int read = stream.Read(convertedImage, offset, size - offset);
            if (read <= 0) break;
            offset += read;
        }
        // Do what you will with the array
        listOfArrays.Add(convertedImage);
    } // end while
}
If all saved images are the same size, then you can eliminate the first read and write call from each, and hard-code size to the size of the arrays.
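Since the question asks for an IEnumerable<byte[]>, the same read loop can also be written as an iterator. A sketch, assuming the same 4-byte length prefix; ReadImages is just an illustrative name:
static IEnumerable<byte[]> ReadImages(string path)
{
    using (var stream = new FileStream(path, FileMode.Open))
    {
        byte[] sizeBytes = new byte[4];
        // Keep yielding arrays until a full 4-byte length prefix can no longer be read
        while (stream.Read(sizeBytes, 0, 4) == 4)
        {
            int size = BitConverter.ToInt32(sizeBytes, 0);
            byte[] image = new byte[size];
            int offset = 0;
            // Read may return fewer bytes than asked for, so fill the buffer in a loop
            while (offset < size)
            {
                int read = stream.Read(image, offset, size - offset);
                if (read <= 0) break;
                offset += read;
            }
            yield return image;
        }
    }
}
Then foreach (byte[] image in ReadImages(tempFile)) { ... } gives you the IEnumerable<byte[]> directly.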
Unless you can work out the number of bytes taken by each individual array from the content of those bytes themselves, you need to store the number of images and their individual lengths in the file.
There are many ways to do it: you could write the length of each individual array immediately before that array, or you could write a "header" describing the rest of the content before writing the "payload" data to the file.
Header may look as follows:
Byte offset Description
----------- -------------------
0000...0003 - Number of files, N
0004...0007 - Length of file 1
0008...000B - Length of file 2
...
XXXX...XXXX - Length of file N
XXXX...XXXX - Content of file 1
XXXX...XXXX - Content of file 2
...
XXXX...XXXX - Content of file N
You can use BitConverter methods to produce byte arrays to be written to the header, or you could use BinaryWriter.
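For example, a rough sketch of writing and reading such a header with BinaryWriter/BinaryReader (assuming images is a List<byte[]> you already hold; the variable names are just placeholders):
// Write: header first (count + lengths), then the payloads in the same order
using (var writer = new BinaryWriter(File.Create(tempFile)))
{
    writer.Write(images.Count);
    foreach (byte[] image in images)
    {
        writer.Write(image.Length);
    }
    foreach (byte[] image in images)
    {
        writer.Write(image);
    }
}
// Read: the lengths from the header tell us how many bytes each payload has
using (var reader = new BinaryReader(File.OpenRead(tempFile)))
{
    int count = reader.ReadInt32();
    var lengths = new int[count];
    for (int i = 0; i < count; i++)
    {
        lengths[i] = reader.ReadInt32();
    }
    var result = new List<byte[]>(count);
    for (int i = 0; i < count; i++)
    {
        result.Add(reader.ReadBytes(lengths[i]));
    }
}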
When you read back, how do you get the number of bytes per image/byte array to read?
You will need to store the length too (i.e. the first 4 bytes are a 32-bit int byte count, followed by the data bytes).
To read back, read the first four bytes, decode them back to an int, then read that number of bytes; repeat until end of file.
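BinaryReader makes that loop particularly compact. A sketch, assuming each array was written with a 4-byte length prefix as described above:
var arrays = new List<byte[]>();
using (var reader = new BinaryReader(File.OpenRead(tempFile)))
{
    // Stop once fewer than 4 bytes remain for the next length prefix
    while (reader.BaseStream.Position <= reader.BaseStream.Length - 4)
    {
        int length = reader.ReadInt32();
        arrays.Add(reader.ReadBytes(length));
    }
}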
I'm trying to create a simple hex editor with C#.
For this I'm reading the file into a byte array, which works fine. But as soon as I output the bytes to a TextBox as a string, the overall performance of the program becomes pretty bad. For example, a 190 KB file takes about 40 seconds until it is displayed in the textbox, and during that time the program is not responding.
The function:
void open()
{
fullstring = "";
OpenFileDialog op = new OpenFileDialog();
op.ShowDialog();
file = op.FileName;
byte[] fileB = File.ReadAllBytes(file);
long b = fileB.Length;
for (int i = 0; i < fileB.Length; i++)
{
fullstring = fullstring + fileB[i].ToString("X") + " ";
}
textBox9.Text = fullstring;
}
Is there a way to improve performance in this function?
Take a look at this post How do you convert Byte Array to Hexadecimal String, and vice versa?
You can use the code there to convert your byte array to a hex string. One problem in your code is that you are building the string with repeated concatenation instead of a StringBuilder; every concatenation copies the entire string built so far, so the cost grows quadratically with the file size.
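A minimal sketch of the loop rewritten with StringBuilder, keeping the fileB and textBox9 names from the question:
var sb = new StringBuilder(fileB.Length * 3); // 2 hex digits + 1 space per byte
for (int i = 0; i < fileB.Length; i++)
{
    sb.Append(fileB[i].ToString("X2"));
    sb.Append(' ');
}
textBox9.Text = sb.ToString();
Note that "X2" pads single-digit bytes with a leading zero, which the original "X" format did not; for a hex editor that is usually what you want.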
The file is only 14kb (14,000 bytes). I have read that the varbinary(max) column type (which is what I am using) only supports 8,000 bytes. Is that correct? How can I upload my file into the database?
if (file.ContentLength < (3 * 1048576))
{
// extract only the fielname
var fileName = Path.GetFileName(file.FileName);
using (MemoryStream ms = new MemoryStream())
{
file.InputStream.CopyTo(ms);
byte[] array = ms.GetBuffer();
adj.resumeFile = array;
adj.resumeFileContentType = file.ContentType;
}
}
The error:
String or binary data would be truncated. The statement has been terminated.
Check your other columns that you are inserting into during this process. I would especially check the ContentType column as this will be something like image/jpeg and not simply image or jpeg.
Here is a list of possible content types so that you can create enough space in your ContentType column accordingly.
varbinary [ ( n | max) ]
Variable-length binary data. n can be a value from 1 through 8,000.
max indicates that the maximum storage size is 2^31-1 bytes.
http://msdn.microsoft.com/en-us/library/ms188362.aspx
So that is 2GB.
If you defined your column as VARBINARY(MAX) in the table definition, then you should have up to 2 GB of storage space. If you specified the maximum column size as a number then you can only explicitly ask for up to VARBINARY(8000).
See this question for more details
AFAIK VARBINARY(MAX) only appeared in SQL Server 2005, so if your database pre-dates that version you might need to upgrade it.
I know this isn't the answer to your question, but ms.GetBuffer() will get the underlying buffer which probably isn't the exact size of your data. The MemoryStream allocates extra room for writing and you are probably inserting extra bytes from the unused buffer. Here you can see that GetBuffer() returns a 256 byte array even though the file is only 5 bytes long:
using (MemoryStream ms = new MemoryStream())
{
using (FileStream fs = File.OpenRead("C:\\t\\hello.txt"))
{
fs.CopyTo(ms);
byte[] results = ms.GetBuffer();
Console.WriteLine("Size: {0}", results.Length); // 256
byte[] justdata = new byte[ms.Length];
Array.Copy(results, justdata, ms.Length);
Console.WriteLine("Size: {0}", justdata.Length); // 5
}
}
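Applied to the code in the question, the fix is to use ToArray(), which copies only the bytes actually written (a sketch reusing the question's names):
using (MemoryStream ms = new MemoryStream())
{
    file.InputStream.CopyTo(ms);
    // ToArray() returns exactly ms.Length bytes, unlike GetBuffer()
    adj.resumeFile = ms.ToArray();
    adj.resumeFileContentType = file.ContentType;
}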
I have a test program that demonstrates the end result that I am hoping for (even though in this test program the steps may seem unnecessary).
The program compresses data to a file using GZipStream. The resulting compressed file is C:\mydata.dat.
I then read this file, and write it to a new file.
//Read original file
string compressedFile = String.Empty;
using (StreamReader reader = new StreamReader(#"C:\mydata.dat"))
{
compressedFile = reader.ReadToEnd();
reader.Close();
reader.Dispose();
}
//Write to a new file
using (StreamWriter file = new StreamWriter(#"C:\mynewdata.dat"))
{
file.WriteLine(compressedFile);
}
When I try to decompress the two files, the original one decompresses perfectly, but the new file throws an InvalidDataException with message The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.
Why are these files different?
StreamReader is for reading a sequence of characters, not bytes. The same applies to StreamWriter. Since treating compressed files as a stream of characters doesn't make any sense, you should use some implementation of Stream. If you want to get the stream as an array of bytes, you can use MemoryStream.
The exact reason why using character streams doesn't work is that they assume the UTF-8 encoding by default. If some byte is not valid UTF-8 (like the second byte of the header, 0x8B), it's represented as Unicode “replacement character” (U+FFFD). When the string is written back, that character is encoded using UTF-8 into something completely different than what was in the source.
For example, to read a file from a stream, get it as an array of bytes and then write it to another file as a stream:
byte[] bytes;
using (var fileStream = new FileStream(#"C:\mydata.dat", FileMode.Open))
using (var memoryStream = new MemoryStream())
{
fileStream.CopyTo(memoryStream);
bytes = memoryStream.ToArray();
}
using (var memoryStream = new MemoryStream(bytes))
using (var fileStream = new FileStream(#"C:\mynewdata.dat", FileMode.Create))
{
memoryStream.CopyTo(fileStream);
}
The CopyTo() method is only available in .Net 4, but you can write your own if you use older versions.
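For example, a minimal stand-in for CopyTo() on older framework versions might look like this (CopyStream is just an illustrative name):
static void CopyStream(Stream source, Stream destination)
{
    byte[] buffer = new byte[4096];
    int bytesRead;
    // Copy until Read signals end of stream by returning 0
    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        destination.Write(buffer, 0, bytesRead);
    }
}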
Of course, for this simple example, there is no need to use streams. You can simply do:
byte[] bytes = File.ReadAllBytes(#"C:\mydata.dat");
File.WriteAllBytes(#"C:\mynewdata.dat", bytes);
EDIT: Apparently, my suggestions are wrong/invalid/whatever... please use one of the others, which have no doubt been highly refactored to the point where no extra performance could possibly be achieved (else, that would mean they are just as invalid as mine).
File copy:
using (System.IO.StreamReader sr = new System.IO.StreamReader(#"C:\mydata.dat"))
{
using (System.IO.StreamWriter sw = new System.IO.StreamWriter(#"C:\mynewdata.dat"))
{
byte[] bytes = new byte[1024];
int count = 0;
while((count = sr.BaseStream.Read(bytes, 0, bytes.Length)) > 0){
sw.BaseStream.Write(bytes, 0, count);
}
}
}
Read all bytes
byte[] bytes = null;
using (System.IO.StreamReader sr = new System.IO.StreamReader(#"C:\mydata.dat"))
{
bytes = new byte[sr.BaseStream.Length];
int index = 0;
int count = 0;
while((count = sr.BaseStream.Read(bytes, index, 1024)) > 0){
index += count;
}
}
Read all bytes/write all bytes (from svick's answer):
byte[] bytes = File.ReadAllBytes(#"C:\mydata.dat");
File.WriteAllBytes(#"C:\mynewdata.dat", bytes);
PERFORMANCE TESTING WITH OTHER ANSWERS:
Just did a quick test between my answer (StreamReader) (first part above, file copy) and svick's answer (FileStream/MemoryStream) (the first one). The test is 1000 iterations of the code; here are the results from 4 tests (results are in whole seconds, all actual results were slightly over these values):
My Code | svick code
--------------------
9 | 12
9 | 14
8 | 13
8 | 14
As you can see, in my test at least, my code performed better. One thing perhaps to note with mine is I am not reading a character stream, I am in fact accessing the BaseStream which is providing a byte stream. Perhaps svick's answer is slow because he is using two streams for reading, then two for writing. Of course, there is a lot of optimisation that could be done to svick's answer to improve the performance (and he also provided an alternative for simple file copy)
Testing with third option (ReadAllBytes/WriteAllBytes)
My Code | svick code | 3rd
----------------------------
8 | 14 | 7
9 | 18 | 9
9 | 17 | 8
9 | 17 | 9
Note: measured in milliseconds, the 3rd option was always better
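A timing harness along these lines is enough to reproduce this kind of comparison (a sketch, not the exact code behind the numbers above; the file paths are the ones used throughout the examples):
var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
    // Swap this body for whichever copy variant is being measured
    byte[] bytes = File.ReadAllBytes(@"C:\mydata.dat");
    File.WriteAllBytes(@"C:\mynewdata.dat", bytes);
}
sw.Stop();
Console.WriteLine("Elapsed: {0} ms", sw.ElapsedMilliseconds);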
I am trying to implement this protocol (http://developer.valvesoftware.com/wiki/Source_RCON_Protocol) from a C# .NET application. The part applicable to the code I am implementing is under the heading "Receiving". However, I am not positive I have the byte sizes correct when constructing the packet.
Here is my function to construct a packet...
private static byte[] ConstructPacket(int request_id, int cmdtype, string cmd)
{
MemoryStream stream = new MemoryStream();
using (BinaryWriter writer = new BinaryWriter(stream))
{
byte[] cmdBytes = ConvertStringToByteArray(cmd);
int packetSize = 12 + cmdBytes.Length;
// Packet Contents
writer.Write((int)packetSize); // Byte size of Packet not including This
writer.Write((int)request_id); // 4 Bytes
writer.Write((int)cmdtype); // 4 Bytes
writer.Write(cmdBytes); // 8 Bytes ??
// NULL String 1
writer.Write((byte)0x00);
writer.Write((byte)0x00);
// NULL String 2
writer.Write((byte)0x00);
writer.Write((byte)0x00);
// Memory Stream to Byte Array
byte[] buffer = stream.ToArray();
return buffer;
}
}
According to the Protocol specifications, packetSize is the byte size of the packet not including itself.
The first 2 (int) would make it 8 bytes...
The "cmdBytes", which in this paticular instance is "testpass" would be 8 bytes I believe...
Then the final 2 null delimited strings (If I set these up right) would be 4 bytes.
So by my calculations, the packet should be 20 bytes, but it doesn't seem to be working properly. Are the sizes I am assuming correct, and am I setting the NULL-delimited strings properly for C#/.NET?
You write two zeros too many. Pretty easy to see in the examples, they all end with two zeros, not four.
You should probably call writer.Flush() after the last writer.Write(). Otherwise you run the risk of disposing the writer before it's finished writing everything to the stream.
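Putting both suggestions together, a corrected sketch of the method might look like this (the command string and the empty second string each get a single null terminator, so the size field becomes 10 + cmdBytes.Length; ConvertStringToByteArray is the helper from the question):
private static byte[] ConstructPacket(int request_id, int cmdtype, string cmd)
{
    using (MemoryStream stream = new MemoryStream())
    using (BinaryWriter writer = new BinaryWriter(stream))
    {
        byte[] cmdBytes = ConvertStringToByteArray(cmd);
        // Size field = everything after itself:
        // 4 (request id) + 4 (type) + cmdBytes.Length + 2 (two null terminators)
        int packetSize = 10 + cmdBytes.Length;
        writer.Write(packetSize);
        writer.Write(request_id);
        writer.Write(cmdtype);
        writer.Write(cmdBytes);
        writer.Write((byte)0x00); // terminator for the command string
        writer.Write((byte)0x00); // terminator for the empty second string
        writer.Flush();           // make sure everything is in the MemoryStream
        return stream.ToArray();
    }
}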