I have a binary file. It consists of 4 messages, each 100 bytes in size.
I want to read the last 2 messages again. I am using a BinaryReader object.
I seek to position 200 and then I read: BinaryReaderObject.Read(charBuffer, 0, 10000),
where charBuffer is big enough.
The amount read is always short by 1: instead of 200 I get 199, and instead of 400 I get 399.
I checked and saw the size of the file is correct and the data that I get starts at the right place.
Thanks,
Try this code and see what happens with your file.
String message = "Read {0} bytes into the buffer.";
String fileName = "TEST.DAT";
Int32 recordSize = 100;
Byte[] buffer = new Byte[recordSize];
using (BinaryReader br = new BinaryReader(File.OpenRead(fileName)))
{
    br.BaseStream.Seek(2 * recordSize, SeekOrigin.Begin);
    Console.WriteLine(message, br.Read(buffer, 0, recordSize));
    Console.WriteLine(message, br.Read(buffer, 0, recordSize));
}
Console.ReadLine();
I get the following output with a 400 byte test file.
Read 100 bytes into the buffer.
Read 100 bytes into the buffer.
If I seek to 2 * recordSize + 1 or use a 399 byte file, I get the following output.
Read 100 bytes into the buffer.
Read 99 bytes into the buffer.
So it works as expected.
Hint: zero-based array indexes, and zero-based positions ...
First byte will start at position zero.
Seek to the end and print position. Is it as expected?
Print the position after reading the 199 bytes -- is it as expected?
Try to read 1 more byte from the position after you get 199 -- do you get EOF?
How are you checking the size of the file?
Diff the 199 bytes with the expected ones -- what is different?
Two things I would check:
CR/LF transformations
That the size is what you think it is.
The problem was that I used a wrapper around the BinaryReader object.
The Read method has several overloads, and the wrapper was using the char[] signature instead of byte[]. Until now it worked fine because the data was all UTF-8, but when I put real binary data at the beginning of each message, the character decoding caused the problem.
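For reference, a minimal sketch of the difference between the two overloads (the file name and byte values are made up): Read(byte[], ...) counts raw bytes, while Read(char[], ...) decodes them, so a 2-byte UTF-8 sequence comes back as a single character and lowers the count.

```csharp
using System;
using System.IO;

class OverloadDemo
{
    static void Main()
    {
        // Hypothetical test file: 4 bytes, starting with 0xC3 0xA9,
        // which is "é" in UTF-8 (two bytes, one character).
        File.WriteAllBytes("demo.bin", new byte[] { 0xC3, 0xA9, 0x41, 0x42 });

        using (var br = new BinaryReader(File.OpenRead("demo.bin")))
        {
            var bytes = new byte[4];
            // Read(byte[], ...) counts raw bytes: prints 4.
            Console.WriteLine(br.Read(bytes, 0, 4));
        }

        using (var br = new BinaryReader(File.OpenRead("demo.bin")))
        {
            var chars = new char[4];
            // Read(char[], ...) decodes via UTF-8: the 2-byte sequence
            // becomes one char, so it prints 3.
            Console.WriteLine(br.Read(chars, 0, 4));
        }
    }
}
```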
Related
I want to use C# to read a CSV file of about 10GB. I can't read the file one line at a time and have a limitation of reading a maximum chunk of 32MB at a time.
How can I limit the size of the data I'm reading BUT also make sure I'm reading only full lines? That means that if a full 32MB would mean reading, for example, 100.5 lines, then I want to read only the 100 full lines and leave out the half line, even if that means reading less than 32MB.
This is the skeleton code I was thinking about (the comments there hold more questions):
const int MAX_BUFFER = 33554432; //32MB
byte[] buffer = new byte[MAX_BUFFER];
int bytesRead;
using (System.IO.FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
{
    while ((bytesRead = fileStream.Read(buffer, 0, MAX_BUFFER)) != 0)
    {
        // should I somehow analyze here whether what I'm reading contains only full lines?
        // and if so, how can I know when I've read less than 32MB, meaning bytesRead is
        // smaller and the rest of the current line will arrive in the next iteration?
    }
}
You don't need to ensure you're reading full lines.
Read the file by chunks into a buffer.
Process your buffer character by character until you reach a newline character. If you reach the end of the buffer while still inside a line, keep that partial line around, read the next chunk, and concatenate everything from the new read up to the first newline with the leftovers from the previous read.
If the very last byte of the buffer is a newline, you have a whole line and can simply move on to the next chunk. If not, read the next chunk: either its first byte will be a newline, or there will be other characters before it. Either way, concatenate everything up to that newline (even if that means 0 characters) with the leftovers and start on the next line.
If you hit the end of the file right after a newline, you're done. If you hit the end of the file while processing non-newline characters, it's up to you whether to keep them as a valid line or discard them.
This is very similar to a circular buffer.
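Concretely, the carry-over logic above could be sketched like this (the 32MB limit and UTF-8 encoding are assumptions from the question; a production version would also handle multi-byte characters split across chunk boundaries):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class ChunkedLineReader
{
    const int ChunkSize = 32 * 1024 * 1024; // never read more than 32MB at once

    public static List<string> ReadFullLines(string path)
    {
        var lines = new List<string>();
        var leftover = new List<byte>(); // partial line carried between chunks
        var buffer = new byte[ChunkSize];

        using (var fs = File.OpenRead(path))
        {
            int bytesRead;
            while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) != 0)
            {
                int lineStart = 0;
                for (int i = 0; i < bytesRead; i++)
                {
                    if (buffer[i] == (byte)'\n')
                    {
                        // complete line = leftover from the previous chunk + bytes up to '\n'
                        leftover.AddRange(new ArraySegment<byte>(buffer, lineStart, i - lineStart));
                        lines.Add(Encoding.UTF8.GetString(leftover.ToArray()));
                        leftover.Clear();
                        lineStart = i + 1;
                    }
                }
                // anything after the last newline is carried into the next chunk
                leftover.AddRange(new ArraySegment<byte>(buffer, lineStart, bytesRead - lineStart));
            }
        }
        if (leftover.Count > 0)
            lines.Add(Encoding.UTF8.GetString(leftover.ToArray())); // final unterminated line
        return lines;
    }
}
```

For a 10GB file you would of course process each line as you find it rather than collecting them all; returning a list just keeps the sketch short.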
Another solution might be to use a BufferedStream and specify the buffer size. Then just read byte by byte to each newline or EOF.
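A minimal sketch of that alternative (the buffer size is an assumption; note that the (char)b cast only works for single-byte characters, so this assumes ASCII-compatible data):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class BufferedByteReader
{
    public static List<string> ReadLines(string path)
    {
        var lines = new List<string>();
        using (var fs = File.OpenRead(path))
        using (var bs = new BufferedStream(fs, 32 * 1024 * 1024)) // 32MB internal buffer
        {
            var line = new StringBuilder();
            int b;
            // ReadByte is cheap here: it hits BufferedStream's in-memory buffer,
            // not the disk, on almost every call.
            while ((b = bs.ReadByte()) != -1)
            {
                if (b == '\n')
                {
                    lines.Add(line.ToString());
                    line.Clear();
                }
                else if (b != '\r') // ignore CR so CRLF and LF both work
                {
                    line.Append((char)b);
                }
            }
            if (line.Length > 0)
                lines.Add(line.ToString()); // final line without trailing newline
        }
        return lines;
    }
}
```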
I am trying to get the valid size of a BMP file.
Of course the best way is just to get the Length property of the loaded stream.
But BMP header format DOES include the information about its size and I want to try to get it exactly from BMP header.
From Wikipedia or another source:
http://en.wikipedia.org/wiki/BMP_file_format
offset: 0002h | 4 bytes | the size of the BMP file in bytes
So the size value is stored in the BMP header in a region of 4 bytes (from offset 2 to 5: bytes 2, 3, 4, 5).
So first of all I thought to take all the byte values and sum them:
1).
int BMPGetFileSize(ref byte[] headerPart)
{
    int fileSize = 0;
    for (int i = 0; i < headerPart.Length; i++)
    {
        fileSize += headerPart[i];
    }
    return (fileSize > 0) ? fileSize : -1;
}
I got a very small size... For my file, the actual size is 901 KB,
but from the code I got: 84.
I checked that I was using the correct region; I thought I might be getting the wrong values, BUT I got them correctly (from the 2nd to the 5th byte of the BMP's byte[] data).
2). Then I thought that I must not sum them, but instead write all the values into one string, convert it to System.Int32, and divide by 1024 to get the size in kilobytes, but again... it doesn't equal the 901 KB value.
You may think that I've confused the region and selected the wrong values when you look at the watch dialog and compare it with the function code, but as you can see the byte[] array in the function is headerPart, not data, so I didn't confuse anything; data[] is the stream of the whole BMP file.
So, how could I get file size from BMP header, not from the property of stream in C#?
The BMP file format is a binary format, which means you cannot read it using a StreamReader or TextReader (which are both used only for text) or decode it with a UTF-8 or ANSI decoder. (Encodings are also used only for text.) You have to read it using a BinaryReader.
The documentation states:
offset: 0002h | 4 bytes | the size of the BMP file in bytes
So you need to read four bytes and combine them into an integer value.
With the BinaryReader class you can call the ReadUInt32() method to read 4 bytes that form an unsigned 32-bit integer.
If you do that, you'll see it reads:
921654
...which is 900 KiB and then some.
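As a sketch (the file name is a placeholder), skip the 2-byte "BM" signature and let ReadUInt32 combine the 4 little-endian bytes for you:

```csharp
using System;
using System.IO;

class BmpSize
{
    public static uint GetBmpFileSize(string path)
    {
        using (var br = new BinaryReader(File.OpenRead(path)))
        {
            br.ReadBytes(2);        // skip the "BM" signature at offset 0
            return br.ReadUInt32(); // 4 bytes at offset 2: file size, little-endian
        }
    }

    static void Main()
    {
        Console.WriteLine(GetBmpFileSize("image.bmp")); // placeholder path
    }
}
```

The equivalent manual combination from a header byte array would be `header[2] | (header[3] << 8) | (header[4] << 16) | (header[5] << 24)`: each byte is shifted into its place value rather than summed.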
I want to read/write a binary file which has the following structure:
The file is composed by "RECORDS". Each "RECORD" has the following structure:
I will use the first record as example
START byte: 0x5A (always 1 byte, fixed value 0x5A)
LENGTH bytes: 0x00 0x16 (always 2 bytes, value can range from 0x00 0x02 to 0xFF 0xFF)
CONTENT: the number of bytes indicated by the decimal value of the LENGTH field, minus 2. In this case the LENGTH field value is 22 (0x00 0x16 converted to decimal), so the CONTENT will contain 20 (22 - 2) bytes.
My goal is to read each record one by one, and write it to an output file.
Currently I have a read function and a write function (some pseudocode):
private void Read(BinaryReader binaryReader, BinaryWriter binaryWriter)
{
    const byte START = 0x5A;
    byte[] content = null;
    byte[] length = new byte[2];
    while (binaryReader.PeekChar() != -1)
    {
        // Check the first byte, which should equal 0x5A
        if (binaryReader.ReadByte() != START)
        {
            throw new Exception("0x5A Expected");
        }
        // Extract the length field value
        length = binaryReader.ReadBytes(2);
        // Convert the length field to a decimal value
        int decimalLength = GetLength(length);
        // Extract the content field value
        content = binaryReader.ReadBytes(decimalLength - 2);
        // DO WORK
        // modifying the content
        // Write the record
        Write(binaryWriter, content, length, START);
    }
}
private void Write(BinaryWriter binaryWriter, byte[] content, byte[] length, byte START)
{
    binaryWriter.Write(START);
    binaryWriter.Write(length);
    binaryWriter.Write(content);
}
This way is actually working.
However, since I am dealing with very large files, I find it performs poorly, because I read and write 3 times for each record. I would like to read big chunks of data instead of small amounts of bytes, and perhaps work in memory, but my experience with streams stops at BinaryReader and BinaryWriter. Thanks in advance.
FileStream is already buffered, so I'd expect it to work pretty well. You could always create a BufferedStream around the original stream to add extra buffering if you really need to, but I doubt it would make a significant difference.
You say it's "not performing at all" - how fast is it working? How sure are you that the IO is where your time is going? Have you performed any profiling of the code?
I might also suggest that you read 3 (or 6?) bytes initially, instead of 2 separate reads. Put the initial bytes in a small array, check the 0x5A check byte, then the 2-byte length indicator, then the 3-byte AFP op-code, THEN read the remainder of the AFP record.
It's a small difference, but it gets rid of one of your read calls.
I'm no Jon Skeet, but I did work at one of the biggest print & mail shops in the country for quite a while, and we did mostly AFP output :-)
(usually in C, though)
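A rough sketch of that single-header-read idea, assuming the LENGTH field is big-endian as in the example (0x00 0x16 = 22) and working directly on plain Streams:

```csharp
using System;
using System.IO;

class RecordCopier
{
    // Copies records of the form [0x5A][2-byte length][content] from input to
    // output, reading the 3 header bytes in a single call instead of two.
    public static void CopyRecords(Stream input, Stream output)
    {
        var header = new byte[3];
        while (true)
        {
            int got = input.Read(header, 0, 3);
            if (got == 0) break;                       // clean end of stream
            if (got < 3 || header[0] != 0x5A)
                throw new InvalidDataException("0x5A Expected");

            // LENGTH is big-endian and counts itself, so CONTENT is length - 2 bytes.
            int length = (header[1] << 8) | header[2];
            var content = new byte[length - 2];
            int read = 0;
            while (read < content.Length)              // Read may return fewer bytes than asked
            {
                int n = input.Read(content, read, content.Length - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }

            // ... modify content here ...

            output.Write(header, 0, 3);
            output.Write(content, 0, content.Length);
        }
    }
}
```

This also avoids BinaryReader.PeekChar, which tries to decode bytes as characters and can misbehave on arbitrary binary data.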
I have a text file called message.txt which has "abcdef" as text in it.
Now, code below outputs:
a if I seek with offset 0
? if I seek with offset 1 or 2
a (again) if I seek with offset 3
b if I seek with offset 4
c if I seek with offset 5
and so on.
static void Main(string[] args)
{
    StreamReader sr = new StreamReader("Message.txt");
    sr.BaseStream.Seek(2, SeekOrigin.Begin);
    Console.WriteLine((char)sr.Read());
}
QUESTION
From offset 3 onward it behaves as expected, but ideally the same output should have started at offset 1. Hence,
Q1. Why does the same output a appear for offsets 0 and 3?
Q2. Why do I get a ? for offsets 1 and 2?
Thanks
You have a BOM at the start of your file: the byte order mark, the Unicode header.
Watch your file in some hex editor. (Rename to .bin and open in Visual Studio.) This particular BOM tells the computer that this is a UTF-8 file.
There are three likely factors here:
encoding: in most encodings, bytes != characters
buffers: if you Seek a base stream, you must tell the reader to drop any buffers it may have, or it will get badly confused; to do this call sr.DiscardBufferedData()
byte order marks at the start of the file
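A small sketch showing both points together (writing the test file with a BOM here is just for the demo): a UTF-8 BOM is the 3 bytes EF BB BF, so 'a' actually sits at offset 3, and after seeking the base stream you must call DiscardBufferedData.

```csharp
using System;
using System.IO;
using System.Text;

class BomDemo
{
    static void Main()
    {
        // Write "abcdef" with a UTF-8 BOM: the file starts EF BB BF 61 62 ...
        File.WriteAllText("Message.txt", "abcdef", new UTF8Encoding(true));

        using (var sr = new StreamReader("Message.txt"))
        {
            sr.BaseStream.Seek(4, SeekOrigin.Begin); // skip BOM (3 bytes) + 'a'
            sr.DiscardBufferedData();                // drop anything the reader pre-buffered
            Console.WriteLine((char)sr.Read());      // prints: b
        }
    }
}
```

This explains the question's output: offsets 1 and 2 land in the middle of the BOM (hence the ? replacement characters), and offset 3 is where 'a' really begins.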
I am trying to read data from a file stream as shown below:
fileStream.Read(byteArray, offset, length);
The problem is that my offset and length are Unsigned Ints and above function accepts only ints. If I typecast to int, I am getting a negative value for offset which is meaningless and not acceptable by the function.
The offset and length are originally taken from another byte array as shown below:
BitConverter.ToUInt32(length, 0); //length is a 4 byte long byte-array
What is the right way to read from arbitrary locations of a file stream?
I am not sure if this is the best way to handle it, but you can set the position of the stream and pass offset 0 to Read. Position is of type long, so any uint value fits without going negative.
fileStream.Position = (long)offset;
fileStream.Read(byteArray, 0, sizeToRead);
For such a file size you should read your file in small blocks, process each block, and read the next. int.MaxValue is about ~2GB, uint.MaxValue ~4GB. Such a size doesn't fit in most computers' RAM ;)
If you are having problems with the conversion, something like this might help:
uint myUInt;
int i = (int)myUInt;
// or
int i = Convert.ToInt32(myUInt);
Note that the cast wraps silently for values above int.MaxValue, while Convert.ToInt32 throws an OverflowException.
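A sketch of the Position-based approach (path and sizes are placeholders); it sidesteps the int offset parameter entirely, and it loops because Read may return fewer bytes than requested:

```csharp
using System;
using System.IO;

class LargeOffsetRead
{
    // Reads up to 'count' bytes starting at byte 'offset' of the file.
    // Position is a long, so a uint offset always fits without going negative.
    public static byte[] ReadAt(string path, uint offset, int count)
    {
        using (var fs = File.OpenRead(path))
        {
            fs.Position = offset;            // implicit uint -> long conversion is lossless
            var buffer = new byte[count];
            int read = 0;
            while (read < count)             // Read may return fewer bytes than requested
            {
                int n = fs.Read(buffer, read, count - read);
                if (n == 0) break;           // end of file
                read += n;
            }
            Array.Resize(ref buffer, read);  // trim if we hit EOF early
            return buffer;
        }
    }
}
```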