Reading a stream that may have non-ASCII characters - c#

I have an application that reads string data in from a stream. The string data is typically in English but on occasion it encounters something like 'Jalapeño' and the 'ñ' comes out as '?'. In my implementation I'd prefer to read the stream contents into a byte array but I could get by reading the contents into a string. Any idea what I can do to make this work right?
Current code is as follows:
byte[] data = new byte[len]; // len is known a priori
byte[] temp = new byte[2];
StreamReader sr = new StreamReader(input_stream);
int position = 0;
while (!sr.EndOfStream)
{
int c = sr.Read();
temp = System.BitConverter.GetBytes(c);
data[position] = temp[0];
position++;
}
input_stream.Close();
sr.Close();

You can pass the encoding to the StreamReader as in:
StreamReader sr = new StreamReader(input_stream, Encoding.UTF8);
However, I understand that Encoding.UTF8 is used by default according to the documentation.
Update
The following reads 'Jalapeño' fine:
byte[] bytes;
using (var stream = new FileStream("input.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
{
var index = 0;
var count = (int) stream.Length;
bytes = new byte[count];
while (count > 0)
{
int n = stream.Read(bytes, index, count);
if (n == 0)
throw new EndOfStreamException();
index += n;
count -= n;
}
}
// test
string s = Encoding.UTF8.GetString(bytes);
Console.WriteLine(s);
As does this:
byte[] bytes;
using (var stream = new FileStream("input.txt", FileMode.Open, FileAccess.Read, FileShare.Read))
{
var reader = new StreamReader(stream);
string text = reader.ReadToEnd();
bytes = Encoding.UTF8.GetBytes(text);
}
// test
string s = Encoding.UTF8.GetString(bytes);
Console.WriteLine(s);
From what I understand, the 'ñ' character is stored as the two bytes 0xC3 0xB1 when the text is saved as UTF-8. If you keep only one byte per character, you'll lose data.
I'd suggest reading the whole stream into a byte array (the first example) and decoding it afterwards, or using StreamReader to do the work for you.
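To make the byte-level picture concrete, here is a minimal sketch. The class and method names (EncodingDemo, Utf8Hex, LowBytes) are made up for illustration; LowBytes only imitates what the loop in the question does, assuming a little-endian machine where BitConverter.GetBytes(c)[0] is the low byte.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Returns the UTF-8 bytes of a string as space-separated hex.
    public static string Utf8Hex(string text)
    {
        return BitConverter.ToString(Encoding.UTF8.GetBytes(text)).Replace("-", " ");
    }

    // Imitates the question's loop: keep only the low byte of each decoded char.
    public static byte[] LowBytes(string text)
    {
        var result = new byte[text.Length];
        for (int i = 0; i < text.Length; i++)
        {
            result[i] = (byte)(text[i] & 0xFF);
        }
        return result;
    }
}
```

With this, EncodingDemo.Utf8Hex("\u00F1") gives "C3 B1", while EncodingDemo.LowBytes("\u00F1") gives the single byte 0xF1. That lone byte is neither valid ASCII nor valid UTF-8, which would explain the '?' when it is decoded later.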

Since you're trying to fill the contents into a byte-array, don't bother with the reader - it isn't helping you. Use just the stream:
byte[] data = new byte[len];
int read, offset = 0;
while(len > 0 &&
(read = input_stream.Read(data, offset, len)) > 0)
{
len -= read;
offset += read;
}
if(len != 0) throw new EndOfStreamException();

Related

Read Chunk of data from file based on offset and length

int n = 0;
string encodeString = string.Empty;
using (FileStream fsSource = new FileStream("test.pdf", FileMode.Open, FileAccess.Read))
{
byte[] bytes = new byte[count];
n = fsSource.Read(bytes, offset, count);
encodeString = System.Convert.ToBase64String(bytes);
}
The above code works fine if I provide offset 0 and length 1024, but the second time, if I provide offset 1024 and length 1024, it returns an error.
My requirement is to get the byte array data starting at a given offset, for a given length.
1st chunk = 0-1024
2nd chunk = 1024-2048
..
Last chunk = SomeValue -Filesize.
Example in Node.js using readChunk.sync(file_path, Number(offset), Number(size)); - this code is able to get the byte array of data from offset to length.
public static string ReadFileStreamInChunks()
{
const int readChunkBufferLength = 1024;
string filePath = "test.pdf";
string encodeString = string.Empty;
var readChunk = new char[readChunkBufferLength];
int readChunkLength;
using (StringWriter sw = new StringWriter())
using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
using (StreamReader sr = new StreamReader(fs))
{
do
{
readChunkLength = sr.ReadBlock(readChunk, 0, readChunkBufferLength);
sw.Write(readChunk, 0, readChunkLength);
} while (readChunkLength > 0);
return sw.ToString();
}
}
I think the problem is a misunderstanding of the parameters of Read. count is your chunk size, and offset is the position in the byte array at which the bytes are stored, not a position in the file. So (1) if you want to read from some point in the file to the end, move the stream's position forward by the number of bytes you want to skip, and (2) if you want to read a part from the middle of the file, don't change count, which is your chunk size; keep track of how many bytes you have copied so far instead. Usually this is a do-while loop like:
long position = 0;
do
{
// read bytes from input stream
int bytesRead = request.FileByteStream.Read(buffer, 0, chunkSize);
if (bytesRead == 0)
{
break;
}
// write bytes to output stream
writeStream.Write(buffer, 0, bytesRead);
position += bytesRead;
if (position >= bytesWanted) // bytesWanted: the number of bytes you want to copy
break;
} while (true);
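For the question as asked (a chunk of the file starting at a file offset), a minimal sketch might look like the following. ChunkReader and ReadChunk are names I made up; the sketch assumes you want a shorter array when the chunk runs past the end of the file, and an empty one when the offset is at or beyond the end.

```csharp
using System;
using System.IO;

static class ChunkReader
{
    // Reads up to 'count' bytes starting at file position 'offset'.
    public static byte[] ReadChunk(string path, long offset, int count)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            if (offset >= fs.Length)
                return new byte[0];

            // Position the stream in the file; the buffer offset passed to Read stays at 0.
            fs.Seek(offset, SeekOrigin.Begin);

            int toRead = (int)Math.Min(count, fs.Length - offset);
            var buffer = new byte[toRead];
            int total = 0;
            while (total < toRead)
            {
                // Read may return fewer bytes than requested, so loop until done.
                int n = fs.Read(buffer, total, toRead - total);
                if (n == 0)
                    break;
                total += n;
            }
            if (total < toRead)
                Array.Resize(ref buffer, total);
            return buffer;
        }
    }
}
```

Usage would then be something like Convert.ToBase64String(ChunkReader.ReadChunk("test.pdf", 1024, 1024)) for the second chunk.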

Read from "start" to "stop" as hex values from raw file using FileStream

I want to read from a "start" to a "stop" from a raw image file that I've created with FTK Imager.
I have code that works, but I don't know if it's the best way of doing it.
// Read file, byte at the time (example 00, 5A)
int start = 512;
int stop = 3345332;
FileStream fs = new FileStream("file.001", FileMode.Open, FileAccess.Read);
int hexIn;
String hex;
String data = "";
fs.Position = start;
for (int i = 0; i < stop; i++) { // i = offset in bytes
hexIn = fs.ReadByte();
hex = string.Format("{0:X2}", hexIn);
data = data + hex;
} //for
fs.Close();
Console.WriteLine("data=" + data);
You want to read a range of bytes from within a file. Why not read all the bytes in one go into an array and then do the transformation?
private string ReadFile(string filename, int offset, int length)
{
byte[] data = new byte[length];
using (FileStream fs = new FileStream(filename, FileMode.Open))
{
fs.Position = offset;
fs.Read(data, 0, length);
}
return string.Join("", data.Select(x => x.ToString("X2")));
}
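A variant of the same idea, as a sketch only: it takes start and stop like the question does, treats stop as exclusive (my assumption), loops because Read can return fewer bytes than requested, and uses BitConverter.ToString for the hex conversion. HexRange and ReadHex are hypothetical names.

```csharp
using System;
using System.IO;

static class HexRange
{
    // Returns the bytes in [start, stop) as an uppercase hex string, e.g. "005A".
    public static string ReadHex(string path, long start, long stop)
    {
        if (start < 0 || stop < start)
            throw new ArgumentOutOfRangeException("stop", "stop must not precede start");

        var data = new byte[checked((int)(stop - start))];
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            fs.Position = start;
            int total = 0;
            while (total < data.Length)
            {
                int n = fs.Read(data, total, data.Length - total);
                if (n == 0)
                    throw new EndOfStreamException("File ends before 'stop'.");
                total += n;
            }
        }
        // BitConverter.ToString yields "00-5A-..."; strip the dashes.
        return BitConverter.ToString(data).Replace("-", "");
    }
}
```

Building the string in one go also avoids the repeated concatenation in the question's loop, which gets slow for a range of a few megabytes.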

How to read file by chunks

I'm a little confused about how I should read a large file (> 8 GB) in chunks when each chunk has its own size.
If I know the chunk size, it looks like the code below:
using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, ProgramOptions.BufferSizeForChunkProcessing))
{
using (BufferedStream bs = new BufferedStream(fs, ProgramOptions.BufferSizeForChunkProcessing))
{
byte[] buffer = new byte[ProgramOptions.BufferSizeForChunkProcessing];
int byteRead;
while ((byteRead = bs.Read(buffer, 0, ProgramOptions.BufferSizeForChunkProcessing)) > 0)
{
byte[] originalBytes;
using (MemoryStream mStream = new MemoryStream())
{
mStream.Write(buffer, 0, byteRead);
originalBytes = mStream.ToArray();
}
}
}
}
But imagine that I've read a large file in chunks, applied some encoding to each chunk (which changed the chunk's size), and written all the processed chunks to a new file. Now I need to do the reverse operation, but I don't know the exact chunk sizes. I have an idea: after each chunk has been processed, I write the new chunk size before the chunk bytes, like this:
Number of block bytes
Block bytes
Number of block bytes
Block bytes
So in that case the first thing I need to do is read the chunk's header to learn the exact chunk size. I read and write only byte arrays. But I have a question: what should the chunk's header look like? Should the header contain some kind of boundary marker?
If the file is rigidly structured so that each block of data is preceded by a 32-bit length value, then it is easy to read. The "header" for each block is just the 32-bit length value.
If you want to read such a file, the easiest way is probably to encapsulate the reading into a method that returns IEnumerable<byte[]> like so:
public static IEnumerable<byte[]> ReadChunks(string path)
{
var lengthBytes = new byte[sizeof(int)];
using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
{
while (true) // One iteration per length-prefixed block.
{
int n = fs.Read(lengthBytes, 0, sizeof (int)); // Read block size.
if (n == 0) // End of file.
yield break;
if (n != sizeof(int))
throw new InvalidOperationException("Invalid header");
int blockLength = BitConverter.ToInt32(lengthBytes, 0);
var buffer = new byte[blockLength];
n = fs.Read(buffer, 0, blockLength);
if (n != blockLength)
throw new InvalidOperationException("Missing data");
yield return buffer;
}
}
}
Then you can use it simply:
foreach (var block in ReadChunks("MyFileName"))
{
// Process block.
}
Note that you don't need to provide your own buffering.
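For completeness, here is a rough sketch of both sides of such a format using BinaryWriter and BinaryReader, which write and read the 32-bit length as 4 little-endian bytes. The class and method names (LengthPrefixed, WriteChunks, ReadLengthPrefixed) are made up for illustration, and no boundary marker is used: the length prefix alone delimits the blocks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LengthPrefixed
{
    // Writes each chunk as: 4-byte length, then the chunk bytes.
    public static void WriteChunks(string path, IEnumerable<byte[]> chunks)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var writer = new BinaryWriter(fs))
        {
            foreach (byte[] chunk in chunks)
            {
                writer.Write(chunk.Length); // header
                writer.Write(chunk);        // payload
            }
        }
    }

    // Reads the chunks back one at a time.
    public static IEnumerable<byte[]> ReadLengthPrefixed(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var reader = new BinaryReader(fs))
        {
            while (fs.Position < fs.Length)
            {
                int length = reader.ReadInt32(); // throws EndOfStreamException on a truncated header
                byte[] chunk = reader.ReadBytes(length);
                if (chunk.Length != length)
                    throw new InvalidDataException("Missing data");
                yield return chunk;
            }
        }
    }
}
```

Because the reader yields one chunk at a time, only a single chunk needs to be in memory at once, which matters for a file larger than 8 GB.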
try this
public static IEnumerable<byte[]> ReadChunks(string fileName)
{
const int MAX_BUFFER = 1048576;// 1MB
byte[] filechunk = new byte[MAX_BUFFER];
int numBytes;
using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
{
long remainBytes = fs.Length;
int bufferBytes = MAX_BUFFER;
while (true)
{
if (remainBytes <= MAX_BUFFER)
{
filechunk = new byte[remainBytes];
bufferBytes = (int)remainBytes;
}
if ((numBytes = fs.Read(filechunk, 0, bufferBytes)) > 0)
{
remainBytes -= bufferBytes;
yield return filechunk;
}
else
{
break;
}
}
}
}

Missing bytes when reading from WebClient stream

Why am I missing bytes when reading from a WebClient stream as follows?
const int chuckDim = 80;
System.Net.WebClient client = new System.Net.WebClient();
Stream stream = client.OpenRead("http://media-cdn.tripadvisor.com/media/photo-s/01/70/3e/a9/needed-backup-lol.jpg");
//Stream stream = client.OpenRead("file:///C:/Users/Tanganello/Downloads/needed-backup-lol.jpg");
//searching file length
WebHeaderCollection whc = client.ResponseHeaders;
int totalLength = (Int32.Parse(whc["Content-Length"]));
byte[] buffer = new byte[totalLength];
//reading and writing
FileStream filestream = new FileStream("C:\\Users\\Tanganello\\Downloads\\clone1.jpg", FileMode.Create, FileAccess.ReadWrite);
int accumulator = 0;
while (accumulator + chuckDim < totalLength) {
stream.Read(buffer, accumulator, chuckDim);
filestream.Write(buffer, accumulator, chuckDim);
accumulator += chuckDim;
}
stream.Read(buffer, accumulator, totalLength - accumulator);
filestream.Write(buffer, accumulator, totalLength - accumulator);
stream.Close();
filestream.Flush();
filestream.Close();
this is what I get with the first stream:
http://img839.imageshack.us/img839/830/clone1h.jpg
The problem is that you are ignoring the return value of the Stream.Read Method:
count
The maximum number of bytes to be read from the current stream.
Return Value
The total number of bytes read into the buffer. This can be less than the number of bytes requested
You can avoid the whole business of reading and writing streams by simply using the WebClient.DownloadFile Method:
using (var client = new WebClient())
{
client.DownloadFile(
"http://media-cdn.tripadvisor.com/media/photo-s/01/70/3e/a9/needed-backup-lol.jpg",
"C:\\Users\\Tanganello\\Downloads\\clone1.jpg");
}
Alternatively, if you really want to use streams, you can simply use the Stream.CopyTo Method:
using (var client = new WebClient())
using (var stream = client.OpenRead("http://..."))
using (var file = File.OpenWrite("C:\\..."))
{
stream.CopyTo(file);
}
If you insist on really copying the bytes yourself, the correct way to do this would be as follows:
using (var client = new WebClient())
using (var stream = client.OpenRead("http://..."))
using (var file = File.OpenWrite("C:\\..."))
{
var buffer = new byte[512];
int bytesReceived;
while ((bytesReceived = stream.Read(buffer, 0, buffer.Length)) != 0)
{
file.Write(buffer, 0, bytesReceived);
}
}

Using Streams in C#

I'm interested in pulling a .txt file from online.
The txt file stores:
filename
md5 hash
filename
md5 hash
I am interested in getting the data from online then comparing the data to local files.
byte[] buffer = new byte[512];
WebRequest test = WebRequest.Create("http://www.domain.com/file.txt");
Stream something = test.GetRequestStream();
something.Read(buffer,0,20);
I don't quite understand streams and how to go about reading just one line from the file. I do not want to download the file first then retrieve the data. I'm interested in just pulling it from online. How different are "streams" vs normal IO, with StreamWriter and StreamReader?
EDIT--
WebRequest myWebRequest = WebRequest.Create("http://www.domain.com/file.txt");
WebResponse myReponse = myWebRequest.GetResponse();
Stream recStream = myReponse.GetResponseStream();
StreamReader reader = new StreamReader(recStream);
txt_status.Text = reader.ReadLine();
GetRequestStream provides a stream for writing to. To read the data that comes back, use GetResponseStream:
...
Stream ReceiveStream = myWebResponse.GetResponseStream();
Encoding encode = System.Text.Encoding.GetEncoding("utf-8");
// Pipe the stream to a higher level stream reader with the required encoding format.
StreamReader readStream = new StreamReader( ReceiveStream, encode );
Console.WriteLine("\nResponse stream received");
Char[] read = new Char[256];
// Read 256 characters at a time.
int count = readStream.Read( read, 0, 256 );
Console.WriteLine("HTML...\r\n");
while (count > 0)
{
// Dump the 256 characters on a string and display the string onto the console.
String str = new String(read, 0, count);
Console.Write(str);
count = readStream.Read(read, 0, 256);
}
...
If you're reading text, try using a TextReader:
WebRequest test = WebRequest.Create("http://www.domain.com/file.txt");
// GetRequestStream is for writing the request body; the data comes back on the response stream.
Stream something = test.GetResponse().GetResponseStream();
TextReader reader = new StreamReader(something);
string textfile = reader.ReadToEnd();
All a stream is, is a sequence of bytes. MemoryStreams, FileStream, etc. all inherit from System.IO.Stream
If you are simply attempting to compare the MD5 hash against a local file, you could do something such as the following (not tested):
// Download File
WebClient wc = new WebClient();
byte[] bytes = wc.DownloadData("http://www.domain.com/file.txt");
MD5 md5 = System.Security.Cryptography.MD5.Create();
byte[] onlineHash = md5.ComputeHash(bytes);
StringBuilder onlineFile = new StringBuilder();
for (int i = 0; i < onlineHash.Length; i++)
{
onlineFile.Append(onlineHash[i].ToString("X2"));
}
// Load Local File
FileStream fs = new FileStream(@"c:\yourfile.txt",FileMode.Open);
byte[] fileBytes = new byte[fs.Length];
fs.Read(fileBytes, 0, fileBytes.Length);
fs.Close();
byte[] localHash = md5.ComputeHash(fileBytes);
StringBuilder localFile = new StringBuilder();
for (int i = 0; i < localHash.Length; i++)
{
localFile.Append(localHash[i].ToString("X2"));
}
if(localFile.ToString() == onlineFile.ToString())
{
// Match
}
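Since the online file in the question is a list of filename / MD5 hash pairs rather than a file to be hashed itself, something along these lines may be closer to what was asked. This is only a sketch: HashManifest, Parse and Md5Hex are hypothetical names, and it assumes the lines strictly alternate between a filename and its hash.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

static class HashManifest
{
    // Reads alternating "filename" / "md5 hash" lines into a dictionary.
    public static Dictionary<string, string> Parse(TextReader reader)
    {
        var result = new Dictionary<string, string>();
        string name;
        while ((name = reader.ReadLine()) != null)
        {
            string hash = reader.ReadLine();
            if (hash == null)
                break; // dangling filename without a hash line
            result[name.Trim()] = hash.Trim();
        }
        return result;
    }

    // Computes the MD5 of a stream as an uppercase hex string.
    public static string Md5Hex(Stream stream)
    {
        using (MD5 md5 = MD5.Create())
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }
}
```

Parse accepts any TextReader, so the StreamReader wrapped around GetResponseStream from the EDIT in the question can be passed in directly without saving the file first. When comparing against Md5Hex of a local file, string.Equals(a, b, StringComparison.OrdinalIgnoreCase) is probably safer than ==, since the hashes in the list may be lowercase.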
