Problem in splitting a file - c#

int bufferlength = 12488;
int pointer = 1;
int offset = 0;
int length = 0;
FileStream fstwrite = new FileStream("D:\\Movie.wmv", FileMode.Create);
while (pointer != 0)
{
    byte[] buff = new byte[bufferlength];
    FileStream fst = new FileStream("E:\\Movie.wmv", FileMode.Open);
    pointer = fst.Read(buff, 0, bufferlength);
    fst.Close();
    fstwrite.Write(buff, offset, pointer);
    offset += pointer;
}
I used the above code to split a file and recreate it on another drive. I'm not able to set the correct offset and length for this routine. Can anyone help me fix this?
By splitting I mean reading the file in chunks of "x" KB, passing each chunk somewhere, and rebuilding the same file in another location.
I found it at last. Thanks to everyone who gave their valuable responses.

Currently you're always reading from the start of the file... and even if you weren't you'd just be copying the whole file.
Here's some code which will actually split a single file into multiple files:
public static void SplitFile(string inputFile,
                             string outputPrefix,
                             int chunkSize)
{
    byte[] buffer = new byte[chunkSize];

    using (Stream input = File.OpenRead(inputFile))
    {
        int index = 0;
        while (input.Position < input.Length)
        {
            using (Stream output = File.Create(outputPrefix + index))
            {
                int chunkBytesRead = 0;
                while (chunkBytesRead < chunkSize)
                {
                    int bytesRead = input.Read(buffer,
                                               chunkBytesRead,
                                               chunkSize - chunkBytesRead);
                    // End of input
                    if (bytesRead == 0)
                    {
                        break;
                    }
                    chunkBytesRead += bytesRead;
                }
                output.Write(buffer, 0, chunkBytesRead);
            }
            index++;
        }
    }
}
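For example, splitting the file from the question into 1 MB pieces might look like this (the paths and chunk size are just placeholders):
// Produces D:\Movie.part0, D:\Movie.part1, ... each up to 1 MB in size.
SplitFile(@"E:\Movie.wmv", @"D:\Movie.part", 1024 * 1024);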

You're reading bufferlength bytes. Shouldn't you then set the offset like this?
offset += bufferlength;

Don't open your source file inside the loop, or you'll always read the first chunk.
Open it before the loop, then make sure your offset is applied to the read.
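A minimal sketch of that idea, keeping the variable names from the question: once the source stream stays open, its position advances on every Read, so you can drop the manual file offset and always write from the start of the buffer:
int bufferlength = 12488;
byte[] buff = new byte[bufferlength];
using (FileStream fst = new FileStream("E:\\Movie.wmv", FileMode.Open))
using (FileStream fstwrite = new FileStream("D:\\Movie.wmv", FileMode.Create))
{
    int bytesRead;
    // Read advances fst.Position automatically, so no manual file offset is needed.
    while ((bytesRead = fst.Read(buff, 0, bufferlength)) > 0)
    {
        // The 0 here is an offset into buff, not into the file.
        fstwrite.Write(buff, 0, bytesRead);
    }
}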

C# FileStream.Read doesn't read last block

I read a binary file to hex, block by block.
The result is different when I use FileStream.Read and File.ReadAllBytes.
FileStream.Read:
int limit = 0;
if (openFileDlg.FileName.Length > 0)
{
    fileName = openFileDlg.FileName;
    FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read);
    fsLen = (int)fs.Length;
    int count = 0;
    limit = 100;
    byte[] read_buff = new byte[limit];
    StringBuilder sb = new StringBuilder();
    while ((count = fs.Read(read_buff, 0, limit)) > 0)
    {
        foreach (byte b in read_buff)
        {
            sb.Append(Convert.ToString(b, 16).PadLeft(2, '0'));
        }
    }
    rtxb_bin.AppendText(sb.ToString() + "\n");
}
File.ReadAllBytes:
if (openFileDlg.FileName.Length > 0)
{
    fileName = openFileDlg.FileName;
    byte[] fileBytes = File.ReadAllBytes(fileName);
    StringBuilder sb2 = new StringBuilder();
    foreach (byte b2 in fileBytes)
    {
        sb2.Append(Convert.ToString(b2, 16).PadLeft(2, '0'));
    }
    rtxb_allbin.AppendText(sb2.ToString());
}
Case 1, the result is ...
........04c0020f00452a00421346108129844f2138448500208020250405250043188510812e0
and case 2 is
.......04c0020f00452a00421346108129844f2138448500208020250405250043188510812e044f212cc48120c24125404f2069c2c0008bff35f8f401efbd17047
FileStream.Read doesn't read after '12e0'
'44f212cc48120c24125404f2069c2c0008bff35f8f401efbd17047' is missing
How can I read all bytes using FileStream.Read?
Why doesn't FileStream.Read read the last block?
Most likely it only appears to you that it does not read the last block. Suppose you have a file of length 102. The first iteration of your loop reads the first 100 bytes; all is fine. But what happens on the second (last) one? You read two bytes into read_buff, which is of length 100. Now that buffer contains 2 bytes of the last block and 98 bytes of the previous (first) block, because Read doesn't clear the buffer. Then you proceed with:
foreach (byte b in read_buff)
{
    sb.Append(Convert.ToString(b, 16).PadLeft(2, '0'));
}
As a result, sb has 100 bytes of the first block, then 2 bytes of the last block, and then again 98 bytes of the first block. If you don't look too closely, it might appear that it just skipped the last block, while in reality it duplicated part of the previous one.
To fix it, use count (which tells you how many bytes were actually read into the buffer) and work only with the valid part of read_buff:
for (int i = 0; i < count; i++) {
    sb.Append(Convert.ToString(read_buff[i], 16).PadLeft(2, '0'));
}
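Dropped back into the original loop, the corrected version looks roughly like this:
while ((count = fs.Read(read_buff, 0, limit)) > 0)
{
    // Only the first 'count' bytes of read_buff are valid on this iteration.
    for (int i = 0; i < count; i++)
    {
        sb.Append(Convert.ToString(read_buff[i], 16).PadLeft(2, '0'));
    }
}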
You need to update the offset and count.
Syntax:
public override int Read(
    byte[] array,
    int offset,
    int count
)
Example
public static byte[] ReadFile(string filePath)
{
    byte[] buffer;
    FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read);
    try
    {
        int length = (int)fileStream.Length;  // get file length
        buffer = new byte[length];            // create buffer
        int count;                            // actual number of bytes read
        int sum = 0;                          // total number of bytes read
        // read until Read returns 0 (end of the stream has been reached)
        while ((count = fileStream.Read(buffer, sum, length - sum)) > 0)
            sum += count;  // sum is the buffer offset for the next read
    }
    finally
    {
        fileStream.Close();
    }
    return buffer;
}
public static void ReadAndProcessLargeFile(string theFilename, long whereToStartReading = 0)
{
    const int megabyte = 1024 * 1024;  // 1 MB chunk size (the original references a 'megabyte' constant defined elsewhere)
    FileInfo info = new FileInfo(theFilename);
    long fileLength = info.Length;
    long timesToRead = (fileLength / megabyte);
    long ctr = 0;
    long timesRead = 0;
    FileStream fileStream = new FileStream(theFilename, FileMode.Open, FileAccess.Read);
    using (fileStream)
    {
        byte[] buffer = new byte[megabyte];
        fileStream.Seek(whereToStartReading, SeekOrigin.Begin);
        int bytesRead = 0;
        while ((bytesRead = fileStream.Read(buffer, 0, megabyte)) > 0)
        {
            ProcessChunk(buffer, bytesRead);
            buffer = new byte[megabyte]; // re-allocate so stale bytes from the previous read can't leak into the next chunk
        }
    }
}
private static void ProcessChunk(byte[] buffer, int bytesRead)
{
    // Do the processing here
    string utfString = Encoding.UTF8.GetString(buffer, 0, bytesRead);
    Console.Write(utfString);
}
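It might be called like this (the path is just an illustration):
// Process D:\big.log in 1 MB chunks, starting at the beginning of the file.
ReadAndProcessLargeFile(@"D:\big.log");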

Seek through FileStream then using StreamReader to read from there

So I want to be able to seek to a point in a fileStream, then read forward using a StreamReader. Then seek forward again, and use the StreamReader to read another chunk of data.
const int BufferSize = 4096;
var buffer = new char[BufferSize];
var endpoints = new List<long>();
using (var fileStream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
{
    var fileLength = fileStream.Length;
    var seekPositionCount = fileLength / concurrentReads;
    long currentOffset = 0;
    for (var i = 0; i < concurrentReads; i++)
    {
        var seekPosition = seekPositionCount + currentOffset;
        // seek the file forward
        fileStream.Seek(seekPosition, SeekOrigin.Current);
        // setting true at the end is very important, keeps the underlying fileStream open.
        using (var streamReader = new StreamReader(fileStream, Encoding.UTF8, true, BufferSize, true))
        {
            // this also seeks the file forward the amount in the buffer...
            int bytesRead;
            var totalBytesRead = 0;
            while ((bytesRead = await streamReader.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                totalBytesRead += bytesRead;
                var found = false;
                var gotR = false;
                for (var j = 0; j < buffer.Length; j++)
                {
                    if (buffer[j] == '\r')
                    {
                        gotR = true;
                        continue;
                    }
                    if (buffer[j] == '\n' && gotR)
                    {
                        // so we add the total bytes read, minus the current buffer amount read, then add how far into the buffer we actually read.
                        seekPosition += totalBytesRead - BufferSize + j;
                        endpoints.Add(seekPosition);
                        found = true;
                        break;
                    }
                }
                if (found) break;
            }
        }
        // we need to seek to the position we got to in the StreamReader (but not going by how much was read).
        fileStream.Seek(seekPosition, SeekOrigin.Current);
        currentOffset += seekPosition;
    }
}
return endpoints;
However, I only get two entries in endpoints and then it exits.
(bytesRead = await streamReader.ReadAsync(buffer, 0, buffer.Length)) > 0
I thought the arguments you pass to ReadAsync relate solely to the buffer, so the index argument says "start filling the buffer at this index".
I can't make out from Reference Source how this value is used.
I assumed (and can't find the evidence to back it up) that when you open a StreamReader it uses the underlying Stream as its guide, so when you ask to read some bytes, it starts from the position the underlying Stream is at...
But my results aren't showing that; they seem to show the StreamReader starting at the beginning of the Stream each time. However, I can't find evidence to support that this is how it works either...
Seeking
Is my understanding of seeking correct? If I call:
fileStream.Seek(seekPosition, SeekOrigin.Current);
and the file is currently at position 300 and I want to seek to 600, should the seekPosition variable above be 600?
ReferenceSource would say otherwise:
else if (origin == SeekOrigin.Current) {
    // Don't call FlushRead here, which would have caused an infinite
    // loop. Simply adjust the seek origin. This isn't necessary
    // if we're seeking relative to the beginning or end of the stream.
    offset -= (_readLen - _readPos);
}
So thanks to Hans Passant, I have got the answer:
var buffer = new char[BufferSize];
var endpoints = new List<long>();
using (var fileStream = this.CreateMultipleReadAccessFileStream(fileName))
{
    var fileLength = fileStream.Length;
    var seekPositionCount = fileLength / concurrentReads;
    long currentOffset = 0;
    for (var i = 0; i < concurrentReads; i++)
    {
        var seekPosition = seekPositionCount + currentOffset;
        // seek the file forward
        // fileStream.Seek(seekPosition, SeekOrigin.Current);
        // setting true at the end is very important, keeps the underlying fileStream open.
        using (var streamReader = this.CreateTemporaryStreamReader(fileStream))
        {
            // this is poor on performance, hence why you split the file here and read in new threads.
            streamReader.DiscardBufferedData();
            // you have to advance the fileStream here, because of the previous line
            streamReader.BaseStream.Seek(seekPosition, SeekOrigin.Begin);
            // this also seeks the file forward the amount in the buffer...
            int bytesRead;
            var totalBytesRead = 0;
            while ((bytesRead = await streamReader.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                totalBytesRead += bytesRead;
                var found = false;
                var gotR = false;
                for (var j = 0; j < buffer.Length; j++)
                {
                    if (buffer[j] == '\r')
                    {
                        gotR = true;
                        continue;
                    }
                    if (buffer[j] == '\n' && gotR)
                    {
                        // so we add the total bytes read, minus the current buffer amount read, then add how far into the buffer we actually read.
                        seekPosition += totalBytesRead - BufferSize + j;
                        endpoints.Add(seekPosition);
                        found = true;
                        break;
                    }
                    // if we have found new line then move the position to
                }
                if (found) break;
            }
        }
        currentOffset = seekPosition;
    }
}
return endpoints;
Note the new part: rather than calling this twice:
fileStream.Seek(seekPosition, SeekOrigin.Current);
I now use SeekOrigin.Begin and use the StreamReader to progress the underlying base stream:
// this is poor on performance, hence why you split the file here and read in new threads.
streamReader.DiscardBufferedData();
// you have to advance the fileStream here, because of the previous line
streamReader.BaseStream.Seek(seekPosition, SeekOrigin.Begin);
The DiscardBufferedData will mean that I'm always using the underlying stream position.
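A small sketch of why that matters (the exact numbers depend on the buffer size, so treat them as illustrative): StreamReader reads ahead into its own buffer, so BaseStream.Position reflects how much has been buffered, not how much you have consumed.
using (var fileStream = File.OpenRead(fileName))
using (var reader = new StreamReader(fileStream, Encoding.UTF8, true, 4096, true))
{
    reader.Read(); // consumes a single character...
    // ...but the underlying stream has already been read ahead by up to the
    // reader's internal buffer size, so this prints something like 4096
    // (or the file length, if it is shorter), not 1.
    Console.WriteLine(fileStream.Position);

    // DiscardBufferedData throws that read-ahead away, so the next Read picks up
    // from wherever you explicitly position the underlying stream.
    reader.DiscardBufferedData();
    fileStream.Seek(0, SeekOrigin.Begin);
}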

Processing C# filestream input in WHILE loop causing execution time error

I have a C# console app that I'm trying to create that processes all the files in a given directory and writes output to another given directory. I want to process the input files X bytes at a time.
namespace FileConverter
{
    class Program
    {
        static void Main(string[] args)
        {
            string srcFolder = args[0];
            string destFolder = args[1];
            string[] srcFiles = Directory.GetFiles(srcFolder);
            for (int s = 0; s < srcFiles.Length; s++)
            {
                byte[] fileBuffer;
                int numBytesRead = 0;
                int readBuffer = 10000;
                FileStream srcStream = new FileStream(srcFiles[s], FileMode.Open, FileAccess.Read);
                int fileLength = (int)srcStream.Length;
                string destFile = destFolder + "\\" + Path.GetFileName(srcFiles[s]) + "-processed";
                FileStream destStream = new FileStream(destFile, FileMode.OpenOrCreate, FileAccess.Write);
                // Read and process the source file by some chunk of bytes at a time
                while (numBytesRead < fileLength)
                {
                    fileBuffer = new byte[readBuffer];
                    // Read some bytes into the fileBuffer
                    // TODO: This doesn't work on subsequent blocks
                    int n = srcStream.Read(fileBuffer, numBytesRead, readBuffer);
                    // If we didn't read anything, there's no more to process
                    if (n == 0)
                        break;
                    // Process the fileBuffer
                    for (int i = 0; i < fileBuffer.Length; i++)
                    {
                        // Process each byte in the array here
                    }
                    // Write data
                    destStream.Write(fileBuffer, numBytesRead, readBuffer);
                    numBytesRead += readBuffer;
                }
                srcStream.Close();
                destStream.Close();
            }
        }
    }
}
I'm running into an error at execution time at:
//Read some bytes into the fileBuffer
//TODO: This doesn't work on subsequent blocks
int n = srcStream.Read(fileBuffer, numBytesRead, readBuffer);
I don't want to load the entire file into memory, as it could possibly be many gigabytes in size. I really want to be able to read some number of bytes, process them, write them out to a file, and then read in the next X bytes and repeat.
It gets through one iteration of the loop, and then dies on the second. The error I get is:
"Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection."
The sample file I'm working with is about 32k.
Can anyone tell me what I'm doing wrong here?
The second parameter to Read is not the offset into the file - it is the offset into the buffer at which to start writing data. So just pass 0.
Also, don't assume the buffer is filled each time: you should only process "n" bytes from the buffer. And the buffer should be reused between iterations.
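Roughly, the loop from the question with those fixes applied (buffer allocated once, offset 0, and only n bytes processed and written):
byte[] fileBuffer = new byte[readBuffer]; // allocate once, outside the loop
int n;
while ((n = srcStream.Read(fileBuffer, 0, readBuffer)) > 0)
{
    // Only the first n bytes are valid; the rest of the buffer is stale.
    for (int i = 0; i < n; i++)
    {
        // Process fileBuffer[i] here
    }
    destStream.Write(fileBuffer, 0, n);
}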
If you need to read exactly a number of bytes:
static void ReadOrThrow(Stream source, byte[] buffer, int count) {
    int read, offset = 0;
    while (count > 0 && (read = source.Read(buffer, offset, count)) > 0) {
        offset += read;
        count -= read;
    }
    if (count != 0) throw new EndOfStreamException();
}
Note that Write works similarly, so you need to pass 0 as the offset and n as the count.
It should be
destStream.Write(fileBuffer, 0, n);
numBytesRead += n;
because n is the actual number of bytes that were read, and the offset passed to Write is an offset into the buffer, not into the file.

Remove last x lines from a streamreader

I need to read in all but the last x lines from a file to a streamreader in C#. What is the best way to do this?
Many Thanks!
If it's a large file, is it possible to just seek to the end of the file and examine the bytes in reverse for the '\n' character? I am aware that both \n and \r\n exist. I whipped up the following code and tested it on a fairly trivial file. Can you try testing it on the files that you have? I know my solution looks long, but I think you'll find it's faster than reading from the beginning and rewriting the whole file.
public static void Truncate(string file, int linesToTruncate)
{
    using (FileStream fs = File.Open(file, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None))
    {
        fs.Position = fs.Length;
        // \n or \r\n (both use \n to end a line)
        const int BUFFER_SIZE = 2048;
        // Start at the end until # lines have been encountered, record the position, then truncate the file
        long currentPosition = fs.Position;
        int linesProcessed = 0;
        byte[] buffer = new byte[BUFFER_SIZE];
        while (linesProcessed < linesToTruncate && currentPosition > 0)
        {
            int bytesRead = FillBuffer(buffer, fs);
            // We now have a buffer containing the later contents of the file
            for (int i = bytesRead - 1; i >= 0; i--)
            {
                currentPosition--;
                if (buffer[i] == '\n')
                {
                    linesProcessed++;
                    if (linesProcessed == linesToTruncate)
                        break;
                }
            }
        }
        // Truncate the file
        fs.SetLength(currentPosition);
    }
}
private static int FillBuffer(byte[] buffer, FileStream fs)
{
    if (fs.Position == 0)
        return 0;
    int bytesRead = 0;
    // Calculate how many bytes of the buffer can be filled (remember that we're going in reverse)
    int expectedBytesToRead = (fs.Position < buffer.Length) ? (int)fs.Position : buffer.Length;
    fs.Position -= expectedBytesToRead;
    while (bytesRead < expectedBytesToRead)
    {
        int read = fs.Read(buffer, bytesRead, expectedBytesToRead - bytesRead);
        if (read == 0)
            break;
        bytesRead += read;
    }
    // We have to reset the position again because reading moved it forward
    fs.Position -= bytesRead;
    return bytesRead;
}
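Used, for instance, like this (the path is just an example):
// Removes the last 5 lines of C:\test.txt in place.
Truncate(@"C:\test.txt", 5);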
Since you are only planning on deleting the end of the file, it seems wasteful to rewrite everything, especially if it's a large file and small N. Of course, one can make the argument that if someone wanted to eliminate all lines, then going from the beginning to the end is more efficient.
Since you are referring to lines in a file, I'm assuming it's a text file. If you just want to get the lines you can read them into an array of strings like so:
string[] lines = File.ReadAllLines(@"C:\test.txt");
Or if you really need to work with StreamReaders:
using (StreamReader reader = new StreamReader(@"C:\test.txt"))
{
    while (!reader.EndOfStream)
    {
        Console.WriteLine(reader.ReadLine());
    }
}
You don't really read INTO a StreamReader. In fact, for the pattern you're asking for you don't need the StreamReader at all. System.IO.File has the useful static method 'ReadLines' that you can leverage instead:
IEnumerable<string> allBut = File.ReadLines(path).Reverse().Skip(5).Reverse();
The previous, flawed version, kept here because of the comment thread:
List<string> allLines = File.ReadLines(path).ToList();
IEnumerable<string> allBut = allLines.Take(allLines.Count - 5);

ArgumentOutOfRangeException when reading bytes from stream

I'm trying to read the response stream from an HttpWebResponse object. I know the length of the stream (_response.ContentLength); however, I keep getting the following exception:
Specified argument was out of the range of valid values.
Parameter name: size
While debugging, I noticed that at the time of the error, the values were as such:
length = 15032 //the length of the stream as defined by _response.ContentLength
bytesToRead = 7680 //the number of bytes in the stream that still need to be read
bytesRead = 7680 //the number of bytes that have been read (offset)
body.length = 15032 //the size of the byte[] the stream is being copied to
The peculiar thing is that the bytesToRead and bytesRead variables are ALWAYS 7680, regardless of the size of the stream (contained in the length variable). Any ideas?
Code:
int length = (int)_response.ContentLength;
byte[] body = null;
if (length > 0)
{
    int bytesToRead = length;
    int bytesRead = 0;
    try
    {
        body = new byte[length];
        using (Stream stream = _response.GetResponseStream())
        {
            while (bytesToRead > 0)
            {
                // Read may return anything from 0 to length.
                int n = stream.Read(body, bytesRead, length);
                // The end of the file is reached.
                if (n == 0)
                    break;
                bytesRead += n;
                bytesToRead -= n;
            }
            stream.Close();
        }
    }
    catch (Exception exception)
    {
        throw;
    }
}
else
{
    body = new byte[0];
}
_responseBody = body;
You want this line:
int n = stream.Read(body, bytesRead, length);
to be this:
int n = stream.Read(body, bytesRead, bytesToRead);
You are asking for up to the stream's full length on every read, but once the offset has been applied only length - bytesRead bytes of space remain in the buffer (and only bytesToRead bytes remain to be read), which is what triggers the exception.
You also shouldn't need this part:
if (n == 0)
    break;
The while condition should end the reading correctly on its own, and it is possible that a read returns no bytes before you have finished the whole download (if the stream is filling more slowly than you are taking data out of it), in which case breaking early would leave the body incomplete.
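As an aside, if you don't actually need to pre-size the array from ContentLength, copying the response stream into a MemoryStream avoids the offset/count bookkeeping entirely. A sketch, assuming .NET 4.0 or later for Stream.CopyTo:
byte[] body;
using (Stream stream = _response.GetResponseStream())
using (var memory = new MemoryStream())
{
    // CopyTo loops internally until the end of the response stream.
    stream.CopyTo(memory);
    body = memory.ToArray();
}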
