Reading String from Stream - c#

I am encrypting data to a stream. If, for example, my data is of type Int32, I will use BitConverter.GetBytes(myInt) to get the bytes and then write those bytes to the stream.
To read the data back, I read sizeof(Int32) to determine the number of bytes to read, read those bytes, and then use BitConverter.ToInt32(byteArray, 0) to convert the bytes back to an Int32.
So how would I do this with a string? Writing the string is no problem. But the trick when reading the string is knowing how many bytes to read before I can then convert it back to a string.
I have found similar questions, but they seem to assume the string occupies the entire stream and just read to the end of the stream. But here, I can have any number of other items before and after the string.
Note that StringReader is not an option here since I want the option of handling file data that may be larger than I want to load into memory.

You would normally send a content length header, and then read the length determined by that information.
Here is some sample code:
public async Task ContinuouslyReadFromStream(NetworkStream sourceStream, CancellationToken ct)
{
    while (!ct.IsCancellationRequested && sourceStream.CanRead)
    {
        while (sourceStream.CanRead && !sourceStream.DataAvailable)
        {
            // Avoid potential high CPU usage when doing stream.ReadAsync
            // while waiting for data; Task.Delay does not block the thread
            await Task.Delay(10, ct);
        }
        // The first 4 bytes hold the length of the message that follows
        var lengthOfMessage = BitConverter.ToInt32(await ReadExactBytesAsync(sourceStream, 4, ct), 0);
        var content = await ReadExactBytesAsync(sourceStream, lengthOfMessage, ct);
        // Assuming you use UTF8 encoding
        var stringContent = Encoding.UTF8.GetString(content);
    }
}
protected static async Task<byte[]> ReadExactBytesAsync(Stream stream, int count, CancellationToken ct)
{
    var buffer = new byte[count];
    var totalBytesRemaining = count;
    var totalBytesRead = 0;
    while (totalBytesRemaining != 0)
    {
        var bytesRead = await stream.ReadAsync(buffer, totalBytesRead, totalBytesRemaining, ct);
        ct.ThrowIfCancellationRequested();
        // ReadAsync returns 0 at the end of the stream; without this check the loop would never exit
        if (bytesRead == 0) throw new EndOfStreamException();
        totalBytesRead += bytesRead;
        totalBytesRemaining -= bytesRead;
    }
    return buffer;
}
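The sample above only covers the reading side. As a minimal sketch of both sides, assuming a 4-byte length prefix in BitConverter byte order and UTF-8 text, the pair below writes a string and reads it back from any position in a stream. The class and method names (LengthPrefixedStrings, WriteString, ReadString, ReadExact) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixedStrings
{
    // Writes a 4-byte length followed by the UTF-8 bytes of the string.
    public static void WriteString(Stream stream, string value)
    {
        byte[] payload = Encoding.UTF8.GetBytes(value);
        byte[] prefix = BitConverter.GetBytes(payload.Length);
        stream.Write(prefix, 0, prefix.Length);
        stream.Write(payload, 0, payload.Length);
    }

    // Reads the 4-byte length, then exactly that many bytes, and decodes them.
    public static string ReadString(Stream stream)
    {
        int length = BitConverter.ToInt32(ReadExact(stream, sizeof(int)), 0);
        return Encoding.UTF8.GetString(ReadExact(stream, length));
    }

    // Loops until count bytes have arrived, because a single Read may return fewer.
    private static byte[] ReadExact(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return buffer;
    }
}
```

Other items can be written before and after the string, since the reader consumes only the prefix and the announced number of bytes. BinaryWriter.Write(string) and BinaryReader.ReadString apply the same idea with a variable-length prefix, which may be enough if you control both ends.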

The solutions that come to mind are to either provide a predetermined sentinel value to signal the end of the string (ASM uses a 0 byte for this, for example), or to provide a fixed-length block of metadata ahead of each new datatype. In that block of metadata would be the type and the length, plus whatever other information you found it useful to include.
For compactness I would use the sentinel value if it will work in your system.
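A rough sketch of the sentinel approach, assuming UTF-8 text terminated by a single 0 byte (UTF-8 only produces a 0 byte for the character U+0000, so the string itself must not contain '\0'). The names SentinelStrings and ReadNullTerminated are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class SentinelStrings
{
    // Reads bytes up to (not including) the first 0 byte and decodes them as UTF-8.
    public static string ReadNullTerminated(Stream stream)
    {
        using (var collected = new MemoryStream())
        {
            while (true)
            {
                int b = stream.ReadByte();
                // ReadByte returns -1 when the stream ends
                if (b == -1) throw new EndOfStreamException("Stream ended before the terminator.");
                if (b == 0) break;
                collected.WriteByte((byte)b);
            }
            return Encoding.UTF8.GetString(collected.ToArray());
        }
    }
}
```

Reading one byte at a time can be slow on an unbuffered stream, so wrapping the source in a BufferedStream is probably worthwhile in practice.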

Related

.NET Native incredibly slower than Debug build with ReadAsync calls

I just found a really weird issue in my app, and it turns out it was caused by the .NET Native compiler for some reason.
I have a method that compares the content of two files, and it works fine. With two 400 KB files, it takes about 0.4 seconds to run on my Lumia 930 in Debug mode. But in Release mode, it takes up to 17 seconds for no apparent reason. Here's the code:
// Compares the content of the two streams
private static async Task<bool> ContentEquals(ulong size, [NotNull] Stream fileStream, [NotNull] Stream testStream)
{
// Initialization
const int bytes = 8;
int iterations = (int)Math.Ceiling((double)size / bytes);
byte[] one = new byte[bytes];
byte[] two = new byte[bytes];
// Read all the bytes and compare them 8 at a time
for (int i = 0; i < iterations; i++)
{
await fileStream.ReadAsync(one, 0, bytes);
await testStream.ReadAsync(two, 0, bytes);
if (BitConverter.ToUInt64(one, 0) != BitConverter.ToUInt64(two, 0)) return false;
}
return true;
}
/// <summary>
/// Checks if the content of two files is the same
/// </summary>
/// <param name="file">The source file</param>
/// <param name="test">The file to test</param>
public static async Task<bool> ContentEquals([NotNull] this StorageFile file, [NotNull] StorageFile test)
{
// If the two files have a different size, just stop here
ulong size = await file.GetFileSizeAsync();
if (size != await test.GetFileSizeAsync()) return false;
// Open the two files to read them
try
{
// Direct streams
using (Stream fileStream = await file.OpenStreamForReadAsync())
using (Stream testStream = await test.OpenStreamForReadAsync())
{
return await ContentEquals(size, fileStream, testStream);
}
}
catch (UnauthorizedAccessException)
{
// Copy streams
StorageFile fileCopy = await file.CreateCopyAsync(ApplicationData.Current.TemporaryFolder);
StorageFile testCopy = await test.CreateCopyAsync(ApplicationData.Current.TemporaryFolder);
using (Stream fileStream = await fileCopy.OpenStreamForReadAsync())
using (Stream testStream = await testCopy.OpenStreamForReadAsync())
{
// Compare the files
bool result = await ContentEquals(size, fileStream, testStream);
// Delete the temp files at the end of the operation
Task.Run(() =>
{
fileCopy.DeleteAsync(StorageDeleteOption.PermanentDelete).Forget();
testCopy.DeleteAsync(StorageDeleteOption.PermanentDelete).Forget();
}).Forget();
return result;
}
}
}
Now, I have absolutely no idea why this exact same method goes from 0.4 seconds all the way up to more than 15 seconds when compiled with the .NET Native toolchain.
I fixed this issue using a single ReadAsync call to read the entire files, then I generated two MD5 hashes from the results and compared the two. This approach worked in around 0.4 seconds on my Lumia 930 even in Release mode.
Still, I'm curious about this issue and I'd like to know why it was happening.
Thank you in advance for your help!
EDIT: I've tweaked my method to reduce the number of actual I/O operations. This is the result, and it looks like it's working fine so far.
private static async Task<bool> ContentEquals(ulong size, [NotNull] Stream fileStream, [NotNull] Stream testStream)
{
// Initialization
const int bytes = 102400;
int iterations = (int)Math.Ceiling((double)size / bytes);
byte[] first = new byte[bytes], second = new byte[bytes];
// Read the files in chunks of up to 102400 bytes and compare each pair of chunks
for (int i = 0; i < iterations; i++)
{
// Read the next data chunk
int[] counts = await Task.WhenAll(fileStream.ReadAsync(first, 0, bytes), testStream.ReadAsync(second, 0, bytes));
if (counts[0] != counts[1]) return false;
int target = counts[0];
// Compare the complete 8-byte groups first, stopping before any partial group
int j;
for (j = 0; j + 8 <= target; j += 8)
{
if (BitConverter.ToUInt64(first, j) != BitConverter.ToUInt64(second, j)) return false;
}
// Compare the bytes in the last chunk if necessary
while (j < target)
{
if (first[j] != second[j]) return false;
j++;
}
}
return true;
}
Reading eight bytes at a time from an I/O device is a performance disaster. That's why we are using buffered reading (and writing) in the first place. It takes time for an I/O request to be submitted, processed, executed and finally returned.
OpenStreamForReadAsync appears to not be using a buffered stream. So your 8-byte requests are actually requesting 8 bytes at a time. Even with the solid-state drive, this is very slow.
You don't need to read the whole file at once, though. The usual approach is to find a reasonable buffer size to pre-read; something like reading 1 KiB at a time should fix your whole issue without requiring you to load the whole file in memory at once. You can use BufferedStream between the file and your reading to handle this for you. And if you're feeling adventurous, you could issue the next read request before the CPU processing is done - though it's very likely that this isn't going to help your performance much, given how much of the work is just I/O.
It also seems that .NET native has a lot bigger overhead than managed .NET for asynchronous I/O in the first place, which would make those tiny asynchronous calls all the more of a problem. Fewer requests of larger data will help.
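As a minimal sketch of the "fewer, larger requests" idea, the helper below compares two streams chunk by chunk and keeps reading until each chunk buffer is full, so a short read on one stream is not mistaken for a difference. The 64 KiB chunk size and the names StreamComparer, ContentEqualsAsync and FillAsync are assumptions for illustration, not something taken from the question.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class StreamComparer
{
    private const int ChunkSize = 64 * 1024; // assumed chunk size; tune as needed

    public static async Task<bool> ContentEqualsAsync(Stream first, Stream second)
    {
        var a = new byte[ChunkSize];
        var b = new byte[ChunkSize];
        while (true)
        {
            int readA = await FillAsync(first, a);
            int readB = await FillAsync(second, b);
            if (readA != readB) return false; // one stream ended earlier than the other
            if (readA == 0) return true;      // both streams ended without a mismatch
            for (int i = 0; i < readA; i++)
            {
                if (a[i] != b[i]) return false;
            }
        }
    }

    // Keeps reading until the buffer is full or the stream ends.
    private static async Task<int> FillAsync(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = await stream.ReadAsync(buffer, total, buffer.Length - total);
            if (read == 0) break;
            total += read;
        }
        return total;
    }
}
```

With 400 KB files this issues only a handful of read requests per file instead of roughly fifty thousand 8-byte ones.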

Implementing IRandomAccessStream, not copying buffers

I'm a bit confused about what I'm supposed to do with targetBuffer in a ReadAsync() implementation (Universal Store application for Windows 8.1).
public IAsyncOperationWithProgress<IBuffer, uint> ReadAsync(IBuffer targetBuffer, uint count, InputStreamOptions options)
The problem is, I can't find a way to write to targetBuffer and to change its Length given my specific implementation requirements.
What I have inside is an encrypted stream with some block cipher. I want to wrap it with IRandomAccessStream, so it can be used with xaml framework components (such as passing encrypted images/video to Image or MediaElement objects). Inside the class I have an array of bytes which I reuse for every block, passing it to encryption library which fills it and reports chunk size.
So, when IRandomAccessStream.ReadAsync() is called, I need to somehow get my bytes into the targetBuffer and set its Length to proper value... Which I don't seem to manage.
I tried this:
var stream = targetBuffer.AsStream();
while(count > 0) {
/* doing something to get next chunk of data decrypted */
// byte[] chunk is the array used to hold decrypted data
// int chunkLength is the length of data (<= chunk.Length)
count -= chunkLength;
await stream.WriteAsync(chunk, 0, chunkLength);
}
return targetBuffer;
And targetBuffer.Length remains zero, yet if I try to print its content, the data is there!
Debug.WriteLine(targetBuffer.GetByte(0..N));
I now have a naïve implementation that uses a memory stream (in addition to byte array buffer), collects data there and reads back from it to targetBuffer. This works, but looks bad. Managed streams write to byte[] and WinRT streams write to IBuffer, and I just can't find a way around, so that I don't waste memory and performance.
I'd appreciate any ideas.
This is what it looks like now. I end up using a byte array as a decryption buffer and a resizeable memory stream as a proxy.
public IAsyncOperationWithProgress<IBuffer, uint> ReadAsync(IBuffer targetBuffer, uint count, InputStreamOptions options)
{
return AsyncInfo.Run<IBuffer, uint>(async (token, progress) => {
Transport.Seek(0); // Transport is InMemoryRandomAccessStream
var remaining = count;
while(remaining > 0) {
/*
ReadAsync() overload reads & decrypts data,
result length is <= remaining bytes,
deals with block cipher alignment and the like
*/
IBuffer chunk = await ReadAsync(remaining);
await Transport.WriteAsync(chunk);
remaining -= chunk.Length;
}
Transport.Seek(0);
// copy resulting bytes to target buffer
await Transport.ReadAsync(targetBuffer, count, InputStreamOptions.None);
return targetBuffer;
});
}
UPDATE: I've tested the solution above with an encrypted image of 7.9 MB. I fed it to an Image instance like this:
var image = new BitmapImage();
await image.SetSourceAsync(myCustomStream);
Img.Source = image; // Img is <Image> in xaml
All is OK until execution reaches await Transport.ReadAsync(targetBuffer, count, InputStreamOptions.None); at that point memory consumption skyrockets (from around 33 MB to 300+ MB), which effectively crashes the phone emulator (the desktop version shows the image all right, though just as much memory is consumed). What the hell is going on there?!
SOLVED in March 2017
First, I somehow did not realize I could just set the Length directly after writing data to the buffer. Second, if you do just about anything wrong in my case (a custom IRandomAccessStream implementation is the source for a XAML Image element), the app crashes without leaving any logs or showing any errors, so it's really hard to figure out what has gone awry.
This is what the code looks like now:
public IAsyncOperationWithProgress<IBuffer, uint> ReadAsync(IBuffer targetBuffer, uint count, InputStreamOptions options)
{
return AsyncInfo.Run<IBuffer, uint>(async (token, progress) => {
var output = targetBuffer.AsStream();
while (count > 0) {
//
// do all the decryption stuff and get decrypted data
// to a reusable buffer byte array
//
int bytes = Math.Min((int) count, BufferLength - BufferPosition);
output.Write(decrypted, BufferPosition, bytes);
// Tell the IBuffer how many valid bytes it now holds
targetBuffer.Length += (uint)bytes;
BufferPosition += bytes;
progress.Report((uint)bytes);
count -= (uint)bytes;
}
return targetBuffer;
});
}
Add this using directive to get the buffer extension methods:
using System.Runtime.InteropServices.WindowsRuntime;
Then you can copy your byte array straight into the target buffer:
(your byte array).CopyTo(targetBuffer);
The Length property of IBuffer has a setter, so the following code is perfectly valid:
targetBuffer.Length = (your integer here)
There are more overloads of CopyTo to choose from. Have a look at this one:
public static void CopyTo(this byte[] source, int sourceIndex, IBuffer destination, uint destinationIndex, int count);

Understanding the NetworkStream.EndRead()-example from MSDN

I tried to understand the MSDN example for NetworkStream.EndRead(). There are some parts that I do not understand.
So here is the example (copied from MSDN):
// Example of EndRead, DataAvailable and BeginRead.
public static void myReadCallBack(IAsyncResult ar ){
NetworkStream myNetworkStream = (NetworkStream)ar.AsyncState;
byte[] myReadBuffer = new byte[1024];
String myCompleteMessage = "";
int numberOfBytesRead;
numberOfBytesRead = myNetworkStream.EndRead(ar);
myCompleteMessage =
String.Concat(myCompleteMessage, Encoding.ASCII.GetString(myReadBuffer, 0, numberOfBytesRead));
// message received may be larger than buffer size so loop through until you have it all.
while(myNetworkStream.DataAvailable){
myNetworkStream.BeginRead(myReadBuffer, 0, myReadBuffer.Length,
new AsyncCallback(NetworkStream_ASync_Send_Receive.myReadCallBack),
myNetworkStream);
}
// Print out the received message to the console.
Console.WriteLine("You received the following message : " +
myCompleteMessage);
}
It uses BeginRead() and EndRead() to read asynchronously from the network stream.
The whole thing is invoked by calling
myNetworkStream.BeginRead(someBuffer, 0, someBuffer.Length, new AsyncCallback(NetworkStream_ASync_Send_Receive.myReadCallBack), myNetworkStream);
somewhere else (not displayed in the example).
What I think it should do is print the whole message received from the NetworkStream in a single WriteLine (the one at the end of the example). Notice that the string is called myCompleteMessage.
Now when I look at the implementation some problems arise for my understanding.
First of all: the example allocates a new method-local buffer, myReadBuffer. Then EndRead() is called, which writes the received message into the buffer that was supplied to BeginRead(). This is NOT the myReadBuffer that was just allocated. How would the network stream know about it? So in the next line, numberOfBytesRead bytes from the empty buffer are appended to myCompleteMessage, which currently has the value "". In the last line, this message consisting of a lot of '\0's is printed with Console.WriteLine.
This doesn't make any sense to me.
The second thing I do not understand is the while-loop.
BeginRead is an asynchronous call. So no data is immediately read. So as I understand it, the while loop should run quite a while until some asynchronous call is actually executed and reads from the stream so that there is no data available any more. The documentation doesn't say that BeginRead immediately marks some part of the available data as being read, so I do not expect it to do so.
This example does not improve my understanding of those methods. Is this example wrong or is my understanding wrong (I expect the latter)? How does this example work?
I think the while loop around the BeginRead shouldn't be there. You don't want to call BeginRead more than once before the EndRead is done. Also, the buffer needs to be declared outside the callback, because you may need more than one read per packet/buffer.
There are some things you need to think about, like how long your messages/blocks are. If they have a fixed size, you know how many bytes to expect. If they have a variable size, you can prefix each one with its length: <datalength><data><datalength><data>
Don't forget that this is a streaming connection, so multiple or partial messages/packets can arrive in one read.
Pseudo example:
int bytesNeeded;
int bytesRead;
public void Start()
{
bytesNeeded = 40; // you need to know how many bytes you are expecting
bytesRead = 0;
BeginReading();
}
public void BeginReading()
{
myNetworkStream.BeginRead(
someBuffer, bytesRead, bytesNeeded - bytesRead,
new AsyncCallback(EndReading),
myNetworkStream);
}
public void EndReading(IAsyncResult ar)
{
int numberOfBytesRead = myNetworkStream.EndRead(ar);
if(numberOfBytesRead == 0)
{
// disconnected
return;
}
bytesRead += numberOfBytesRead;
if(bytesRead == bytesNeeded)
{
// Handle buffer
Start();
}
else
BeginReading();
}

Infinite Do...while loop using Async Await

I have the following code:
public static async Task<string> ReadLineAsync(this Stream stream, Encoding encoding)
{
byte[] byteArray = null;
using (MemoryStream ms = new MemoryStream())
{
int bytesRead= 0;
do
{
byte[] buf = new byte[1024];
try
{
bytesRead = await stream.ReadAsync(buf, 0, 1024);
await ms.WriteAsync(buf, 0, bytesRead);
}
catch (Exception e)
{
Console.WriteLine(e.Message + e.StackTrace);
}
} while (stream.CanRead && bytesRead> 0);
byteArray = ms.ToArray();
return encoding.GetString(ms.ToArray());
}
}
I am trying to read a Stream and write it into a MemoryStream asynchronously, but the do...while loop never exits; it is an infinite loop. How can I solve this?
First, in an exceptional situation, your loop would continue indefinitely. You shouldn't catch and ignore exceptions.
Secondly, if the stream doesn't actually end, then bytesRead would never be zero. I suspect this is the case because the name of the method (ReadLineAsync) doesn't imply to me that it will read until the end of the stream.
P.S. CanRead does not ever change for a specific stream. It's whether it makes semantic sense for a stream to do a read operation, not whether it can read right now.
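A small illustration of that last point, using a MemoryStream as a stand-in for any finite stream (the CanReadDemo class and Describe method are hypothetical names): the second read returns 0 because the end has been reached, yet CanRead is still true.

```csharp
using System;
using System.IO;

public static class CanReadDemo
{
    // Returns "<first read> <second read> <CanRead after reaching the end>".
    public static string Describe()
    {
        using (var stream = new MemoryStream(new byte[] { 1, 2, 3 }))
        {
            var buffer = new byte[16];
            int first = stream.Read(buffer, 0, buffer.Length);  // reads the 3 available bytes
            int second = stream.Read(buffer, 0, buffer.Length); // end of stream, returns 0
            return $"{first} {second} {stream.CanRead}";
        }
    }
}
```

Calling CanReadDemo.Describe() returns "3 0 True", so only the return value of the read is a usable loop condition.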
You have your loop condition set to run as long as CanRead is true and bytesRead is greater than 0. CanRead will always be true if your file is readable. This means as long as you start reading your bytes will always be greater than zero. You need to have a maximum number of bytes to be read as well as a minimum or set some other control to break out.
So, you are taking a stream from IMAP and this method is for converting that stream into text?
Why not construct a StreamReader around the stream and call either its ReadToEndAsync or just ReadToEnd? I doubt the need for making this an async operation; if the stream is something like an e-mail, then it is unlikely to be so big that a user will notice the UI blocking while it reads.
If, as one of your comments suggests, this isn't a UI app at all then it is probably even less of an issue.
If my assumptions are wrong then could I ask you to update your question with some more information about how this function is being used. The more information you can tell us, the better our answers can be.
EDIT:
I just noticed that your method is called ReadLineAsync, although I can't see anywhere in the code that you are looking for a line ending. If your intention is to read a line of text, then StreamReader also provides ReadLine and ReadLineAsync.
I took your method and modified it just a tad by shortening the read buffer size and adding some debug statements
public static async Task<string> ReadLineAsync(this Stream stream, Encoding encoding)
{
const int count = 2;
byte[] byteArray = Enumerable.Empty<byte>().ToArray();
using (MemoryStream ms = new MemoryStream())
{
int bytesRead = 0;
do
{
byte[] buf = new byte[count];
try
{
bytesRead = await stream.ReadAsync(buf, 0, count);
await ms.WriteAsync(buf, 0, bytesRead);
Console.WriteLine("{0:ffffff}:{1}:{2}",DateTime.Now, stream.CanRead, bytesRead);
}
catch (Exception e)
{
Console.WriteLine(e.Message + e.StackTrace);
}
} while (stream.CanRead && bytesRead > 0);
byteArray = ms.ToArray();
return encoding.GetString(byteArray);
}
}
but basically it worked as expected with the following call:
private static void Main(string[] args)
{
FileStream stream = File.OpenRead(@"C:\in.txt");
Encoding encoding = Encoding.GetEncoding(1252);
Task<string> result = stream.ReadLineAsync(encoding);
result.ContinueWith(o =>
{
Console.Write(o.Result);
stream.Dispose();
});
Console.WriteLine("Press ENTER to continue...");
Console.ReadLine();
}
so I'm wondering could it be something with your input file? Mine was (encoded in Windows-1252 in Notepad++)
one
two
three
and my output was
Press ENTER to continue...
869993:True:2
875993:True:2
875993:True:2
875993:True:2
875993:True:2
875993:True:2
875993:True:2
875993:True:1
875993:True:0
one
two
three
Note how "Press ENTER to continue..." was printed first, as expected, because the read method was invoked asynchronously, and CanRead is always true because it means the file is readable. It's the state of how the file was opened, not an indication that the cursor is at EOF.
From my POV, looks like your code is trying to do the following:
read an entire stream as a sequence of 1024-octet chunks,
concatenate all those chunks into a MemoryStream (which uses a byte array as its backing store),
convert the MemoryStream to a string using the specified encoding
return that string to the caller.
This seems...complicated to me. Maybe I'm missing something, but to use async and await, you've got to be using VS2012 and .NET 4.5, or VS2010 with .NET 4.0 and the Async CTP, right? If so, why wouldn't you simply use a StreamReader and its StreamReader.ReadToEndAsync() method?
public static async Task<string> MyReadLineAsync(this Stream stream, Encoding encoding)
{
using ( StreamReader reader = new StreamReader( stream , encoding ) )
{
return await reader.ReadToEndAsync() ;
}
}
The overlapping I/O idea is nice, but the time required to write to a memory stream is, to say the least, not enough to make one whit of difference with respect to the time required to perform actual I/O (presumably your input stream is doing disk or network I/O).

How to ensure that all data are read from a NetworkStream

Is it certain that all data has been read from a NetworkStream when DataAvailable is false?
Or does the sender of the data have to send the length of the data first,
so that I have to read until I have received the number of bytes specified by the sender?
Sample:
private Byte[] ReadStream(NetworkStream ns)
{
var bl = new List<Byte>();
var receivedBytes = new Byte[128];
while (ns.DataAvailable)
{
var bytesRead = ns.Read(receivedBytes, 0, receivedBytes.Length);
if (bytesRead == receivedBytes.Length)
bl.AddRange(receivedBytes);
else
bl.AddRange(receivedBytes.Take(bytesRead));
}
return bl.ToArray();
}
DataAvailable just tells you what is buffered and available locally. It means exactly nothing in terms of what is likely to arrive. The most common use of DataAvailable is to decide between a sync read and an async read.
If you are expecting the inbound stream to close after the send, then you can just keep using Read until a non-positive result is achieved, which tells you it has reached the end. If they are sending multiple frames, or just aren't closing - then yes: you'll need some way of detecting the end of a frame (=logical message). That can be via a length-prefix and counting, but it can also be via sentinel values. For example, in text-based protocols, \n or \r are often interpreted as "end of message".
So: it depends entirely on your protocol.
The easiest way would be to have a start/end character, so the message would be:
string message = "Hello";
string messageToSend = (char)2 + message + (char)3;
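On the receiving side, a rough sketch of how such framed messages could be pulled out of the incoming text, assuming the same (char)2 and (char)3 markers and that neither character occurs inside a message. FrameParser and Feed are hypothetical names. The parser keeps state between calls, because one read may deliver half a message or several messages at once.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public sealed class FrameParser
{
    private const char Start = (char)2; // start-of-message marker
    private const char End = (char)3;   // end-of-message marker
    private readonly StringBuilder pending = new StringBuilder();
    private bool inFrame;

    // Feeds newly received text and returns any messages completed by it.
    public List<string> Feed(string received)
    {
        var messages = new List<string>();
        foreach (char c in received)
        {
            if (c == Start)
            {
                // A start marker always begins a fresh message
                pending.Clear();
                inFrame = true;
            }
            else if (c == End && inFrame)
            {
                messages.Add(pending.ToString());
                inFrame = false;
            }
            else if (inFrame)
            {
                pending.Append(c);
            }
            // Characters outside a frame are ignored
        }
        return messages;
    }
}
```

Each chunk decoded from the NetworkStream would be passed to Feed, and only the completed messages it returns would be handled.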
