How to write a huge string to a NetworkStream? - c#

From the internet I got the way to read a huge string from a NetworkStream.
static NetworkStream ns = null;
static StringBuilder sb = null;
static byte[] buffer = null;
static int position = 0;
//.......................................
//other codes skipped for simplicity
//.......................................
private static string Read()
{
    if (ns.CanRead)
    {
        sb.Clear();
        position = 0;
        while (ns.DataAvailable)
        {
            position = ns.Read(buffer, 0, buffer.Length);
            sb.Append(Encoding.Unicode.GetString(buffer, 0, position));
        }
        return sb.ToString().Trim();
    }
    else
    {
        return null;
    }
}
However, I cannot find an example of how to write a huge string to a NetworkStream.
Is there a "symmetrical" pattern for writing as we do for reading?
Thank you in advance.

That reading code is dangerously wrong in many ways:
By using static variables in this way, it's hopelessly unsuitable for multi-threaded tasks. (I hope that's just due to you simplifying it...)
It never initializes the variables to non-null values. Again, hopefully that's not the real code.
It uses the DataAvailable property to decide when it should be "done" - that's incredibly dangerous as it means if a packet is delayed in the stream, you could read half as much data as you expected to
It uses Encoding.Unicode always, which is rarely the best choice of encoding
It assumes that it will always read a whole number of characters. What if one character is split between reads? That's what the Encoder/Decoder classes are for... but you don't really need to use them here anyway - see below.
I would strongly suggest that you wrap the NetworkStream in a StreamReader for reading, and a StreamWriter for writing. That's what they're for. You can then read a line at a time, or just a char[] buffer, or to the end of the stream (which means "until the socket is closed"). This is fine for a text-only protocol.
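As a rough sketch of that suggestion (not the asker's actual protocol): the example below assumes a line-based text protocol in UTF-8, and uses a MemoryStream as a stand-in for the NetworkStream so it can run without a socket. The class and method names are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class TextProtocolSketch
{
    // Sends one message as a single line of UTF-8 text.
    // Assumption: the message itself contains no line breaks.
    public static void Send(Stream stream, string message)
    {
        // leaveOpen: true keeps the underlying stream usable after the writer is disposed.
        using (var writer = new StreamWriter(stream, new UTF8Encoding(false), 4096, leaveOpen: true))
        {
            writer.WriteLine(message); // the writer encodes and writes in chunks internally
        } // Dispose flushes any buffered text to the stream
    }

    // Reads one line. On a real socket, keep ONE StreamReader for the whole
    // connection: a reader buffers ahead, so a fresh reader per message can lose data.
    public static string Receive(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8, false, 4096, leaveOpen: true))
        {
            return reader.ReadLine();
        }
    }

    public static void Main()
    {
        string huge = new string('x', 1000000);
        using (var stream = new MemoryStream()) // stand-in for a NetworkStream
        {
            Send(stream, huge);
            stream.Position = 0;
            string received = Receive(stream);
            Console.WriteLine(received == huge); // True
        }
    }
}
```

The size of the string does not change the code: the writer takes care of encoding and chunking, which is the "symmetrical" counterpart to reading through a StreamReader.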
If you've got a protocol which mixes text and binary data, life becomes a lot harder. Personally I like protocols which length-prefix messages - that way you can read only the data you're meant to, and then perform whatever conversion you want.
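A minimal sketch of what a length-prefixed message could look like, assuming a 4-byte length in the machine's byte order followed by a UTF-8 payload. That wire format and all names here are illustrative choices, not something the question specifies. A MemoryStream again stands in for the NetworkStream.

```csharp
using System;
using System.IO;
using System.Text;

public static class LengthPrefixSketch
{
    public static void WriteMessage(Stream stream, string text)
    {
        byte[] payload = Encoding.UTF8.GetBytes(text);
        byte[] prefix = BitConverter.GetBytes(payload.Length); // 4 bytes
        stream.Write(prefix, 0, prefix.Length);
        stream.Write(payload, 0, payload.Length);
    }

    public static string ReadMessage(Stream stream)
    {
        int length = BitConverter.ToInt32(ReadExactly(stream, 4), 0);
        if (length < 0) throw new InvalidDataException("Negative message length.");
        return Encoding.UTF8.GetString(ReadExactly(stream, length));
    }

    // Read may return fewer bytes than requested, so loop until we have them all.
    private static byte[] ReadExactly(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0) throw new EndOfStreamException("Stream ended mid-message.");
            offset += read;
        }
        return buffer;
    }

    public static void Main()
    {
        using (var stream = new MemoryStream()) // stand-in for a NetworkStream
        {
            WriteMessage(stream, "first message");
            WriteMessage(stream, new string('y', 500000));
            stream.Position = 0;
            Console.WriteLine(ReadMessage(stream));        // first message
            Console.WriteLine(ReadMessage(stream).Length); // 500000
        }
    }
}
```

Because the payload is decoded only after all of its bytes have arrived, a character split across two reads is no longer a problem.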
Anyway, I hope this random selection of thoughts helps... if you want more detailed assistance, please provide details of what protocol you're using.

Related

C# Converting object to byte array

I'm making a chat system thing with tcp which requires to send things in byte arrays, but when I convert an image into a byte array, send it and then convert back it gives this error: 'End of Stream encountered before parsing was completed.'. With strings it works just fine.
public byte[] ObjectToByteArray(object obj)
{
    BinaryFormatter formatter = new BinaryFormatter();
    using (var stream = new MemoryStream())
    {
        formatter.Serialize(stream, obj);
        return stream.ToArray();
    }
}
public object ByteArrayToObject(byte[] bytes)
{
    using (var stream = new MemoryStream())
    {
        var binForm = new BinaryFormatter();
        stream.Write(bytes, 0, bytes.Length);
        stream.Position = 0;
        var obj = binForm.Deserialize(stream);
        return obj;
    }
}
There are two separate things here. Firstly, and I cannot emphasize this enough: do not use BinaryFormatter. Ever. It will hurt you. Lots of serializers exist, and BinaryFormatter (and its cousin NetDataContractSerializer) is literally the absolute last one you should use. I can expand on that if you like, or I can suggest alternatives if you like.
Now, as for the actual problem: I strongly suspect that it isn't what you think it is. I have a hunch, based on decades of working on network code, that the real problem here is "framing". By which I mean: TCP is a stream protocol, not a message/packet protocol. I strongly suspect that you have not correctly deframed the exact bytes that were sent. I can't say this for sure without seeing your socket code, but, as I say, it is a hunch based on lots of experience. To investigate this: note the length of the bytes you send, and note the length of the bytes you've received. I'm pretty sure you'll find they are different. If there's still doubt: get the base-64 or hex string of the sent payload and the received payload (Convert.ToBase64String, for example), and compare those strings. I'm pretty sure they'll turn out to be different.
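The check described above could look roughly like the following. The helper names are hypothetical, and the "received" array is a hand-made truncated copy that merely imitates a badly deframed message.

```csharp
using System;
using System.Linq;

public static class PayloadDiagnostics
{
    // Length plus base-64 text, suitable for logging on both ends of the connection.
    public static string Describe(byte[] payload)
    {
        return payload.Length + " bytes: " + Convert.ToBase64String(payload);
    }

    public static bool SamePayload(byte[] sent, byte[] received)
    {
        return sent.SequenceEqual(received);
    }

    public static void Main()
    {
        byte[] sent = { 1, 2, 3, 4, 5, 6 };
        byte[] received = sent.Take(4).ToArray(); // simulated partial read

        Console.WriteLine("sent:     " + Describe(sent));
        Console.WriteLine("received: " + Describe(received));
        Console.WriteLine("identical: " + SamePayload(sent, received));
    }
}
```

If the two logged lines differ on real traffic, the deserializer is not at fault; the bytes handed to it are not the bytes that were sent.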
Ultimately, network code is hard; I could try and explain individual points, but "how to correctly send messages over a network" could fill a book. IMO, if you're not interested in specializing in writing network code for the next 5 years: use an existing tool that will do the job for you, for example gRPC. Lots and lots of other messaging RPC tools exist.

How to import and read large binary file data in c#?

I have a large binary file that contains different data types. I can access single records in the file, but I am not sure how to loop over the binary values and load them into a memory stream byte by byte.
I have been using BinaryReader:
BinaryReader binReader = new BinaryReader(File.Open(fileName, FileMode.Open));
Encoding ascii = Encoding.ASCII;
string authorName = binReader.ReadString();
Console.WriteLine(authorName);
Console.ReadLine();
But this won't work, since I have a large file with different data types.
Simply put, I need to read the file byte by byte and then interpret the data, whether it is a string or anything else.
I would appreciate any thoughts that can help.
This will very much depend on what format the file is in. Each byte in the file might represent different things, or it might just represent values from a large array, or some mix of the two.
You need to know what the format looks like to be able to read it, since binary files are not self-descriptive. Reading a simple object might look like
var authorName = binReader.ReadString();
var publishDate = DateTime.FromBinary(binReader.ReadInt64());
...
If you have a list of items it is common to use a length prefix. Something like
var numItems = binReader.ReadInt32();
for (int i = 0; i < numItems; i++)
{
    var title = binReader.ReadString();
    ...
}
You would then typically create one or more objects from the data that can be used in the rest of the application. For example:
new Bibliography(authorName, publishDate, books);
If this is a format you do not control I hope you have a detailed specification. Otherwise this is kind of a lost cause for anything but the kludgiest solutions.
If there is more data than can fit in memory you need some kind of streaming mechanism. I.e. read one item, do some processing of the item, save the result, read the next item, etc.
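One way to sketch such a streaming mechanism is an iterator that yields one item at a time. The file layout assumed here (an Int32 count followed by that many length-prefixed strings) and the names WriteTitles/ReadTitles are invented for the example; a real file needs its own specification.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class StreamingReadSketch
{
    // Writes a count followed by the items, in the format BinaryReader expects.
    public static void WriteTitles(Stream stream, IReadOnlyList<string> titles)
    {
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(titles.Count);
            foreach (var title in titles)
                writer.Write(title); // length-prefixed string
        }
    }

    // Yields items one by one, so only the current item has to be in memory.
    public static IEnumerable<string> ReadTitles(Stream stream)
    {
        using (var reader = new BinaryReader(stream, Encoding.UTF8, leaveOpen: true))
        {
            int numItems = reader.ReadInt32();
            for (int i = 0; i < numItems; i++)
                yield return reader.ReadString();
        }
    }

    public static void Main()
    {
        var titles = new[] { "First", "Second", "Third" };
        using (var stream = new MemoryStream()) // a FileStream would work the same way
        {
            WriteTitles(stream, titles);
            stream.Position = 0;
            foreach (var title in ReadTitles(stream))
                Console.WriteLine(title); // process each item as it is read
        }
    }
}
```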
If you do control the format I would suggest alternatives that are easier to manage. I have used protobuf-net, and I find it quite easy to use, but there are other alternatives. The common way to use these kinds of libraries is to create a class for the data, and add attributes for the fields that should be stored. The library can manage serialization/deserialization automatically, and usually handles things like inheritance and changes to the format in an easy way.
Here's a simple bit of code that shows the most basic way of doing it.
using System;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
namespace binary_read
{
    class Program
    {
        private static readonly int bufferSize = 1024;
        static async Task Main(string[] args)
        {
            var bytesRead = 0;
            // long, so the count cannot overflow on files larger than 2 GB
            long totalBytes = 0;
            // Allocate the buffer once and reuse it for every read
            var buffer = new byte[bufferSize];
            using (var stream = File.OpenRead(args.First()))
            {
                do
                {
                    bytesRead = await stream.ReadAsync(buffer, 0, bufferSize);
                    totalBytes += bytesRead;
                    // Process the first bytesRead bytes of buffer
                } while (bytesRead > 0);
                Console.WriteLine($"Processed {totalBytes} bytes.");
            }
        }
    }
}
The main bit to take note of is within the using block.
Firstly, when working with files/streams/sockets it's best to use using if possible to deterministically clean up after yourself.
Then it's really just a matter of calling Read/ReadAsync on the stream if you're just after the raw data. However there are various 'readers' that provide an abstraction to make working with certain formats easier.
So if you know that you're going to be reading ints and doubles and strings, then you can use the BinaryReader and its ReadIntXX/ReadDouble/ReadString methods.
If you're reading into a struct, then you can read the properties in a loop as suggested by @JonasH above. Or use the method in this answer.

Why write to Stream in chunks?

I am wondering why so many examples read byte arrays into streams in chunks and not all at once... I know this is a soft question, but I am interested.
I understand a bit about hardware and filling buffers can be very size dependent and you wouldn't want to write to the buffer again until it has been flushed to wherever it needs to go etc... but with the .NET platform (and other modern languages) I see examples of both. So when should I use which, or is the second an absolute no-no?
Here is the thing (code) I mean:
var buffer = new byte[4096];
while (true)
{
    var read = this.InputStream.Read(buffer, 0, buffer.Length);
    if (read == 0)
        break;
    OutputStream.Write(buffer, 0, read);
}
rather than:
var buffer = new byte[InputStream.Length];
var read = this.InputStream.Read(buffer, 0, buffer.Length);
OutputStream.Write(buffer, 0, read);
I believe both are legal? So why go through all the fuss of the while loop (in whatever form you decide to structure it)?
I am playing devil's advocate here as I want to learn as much as I can :)
In the first case, all you need is 4kB of memory. In the second case, you need as much memory as the input stream data takes. If the input stream is 4GB, you need 4GB.
Do you think it would be good if a file copy operation required 4GB of RAM? What if you were to prepare a disk image that's 20GB?
There is also this thing with pipes. You don't often use them on Windows, but a similar case is often seen on other operating systems. The second case waits for all data to be read, and only then writes them to the output. However, sometimes it is advisable to write data as soon as possible—the first case will start writing to the output stream as soon as the first 4kB of input is read. Think of serving web pages: it is advisable for a web server to send data as soon as possible, so that client's web browser will start rendering headers and first part of the content, not waiting for the whole body.
However, if you know that the input stream won't be bigger than 4kB, then both cases are equivalent.
Sometimes InputStream.Length is not valid for a given source, e.g. a network transport, or the buffer may be huge, e.g. when reading from a huge file. IMO.
It protects you from the situation where your input stream is several gigabytes long.
You have no idea how much data Read might return. This could create major performance problems if you're reading a very large file.
If you have control over the input, and are sure the size is reasonable, then you can certainly read the whole array in at once. But be especially careful if the user can supply an arbitrary input.
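The point that a single Read call may return less than was asked for can be seen in a small experiment. TrickleStream below is a hypothetical helper written for this illustration: it hands out at most a few bytes per call, roughly the way a socket or pipe can.

```csharp
using System;
using System.IO;

// Hypothetical wrapper that returns at most maxPerRead bytes per Read call.
public sealed class TrickleStream : Stream
{
    private readonly Stream inner;
    private readonly int maxPerRead;

    public TrickleStream(Stream inner, int maxPerRead)
    {
        this.inner = inner;
        this.maxPerRead = maxPerRead;
    }

    public override int Read(byte[] buffer, int offset, int count)
        => inner.Read(buffer, offset, Math.Min(count, maxPerRead));

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => inner.Length;
    public override long Position
    {
        get => inner.Position;
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();
}

public static class ShortReadDemo
{
    // The chunked loop: keeps reading until Read reports end of stream.
    public static long CopyInChunks(Stream input, Stream output, int bufferSize = 4096)
    {
        var buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }

    public static void Main()
    {
        byte[] data = new byte[10];

        // "All at once": one Read call, which stops after 3 bytes here.
        var single = new TrickleStream(new MemoryStream(data), 3);
        var whole = new byte[data.Length];
        int got = single.Read(whole, 0, whole.Length);
        Console.WriteLine($"Single Read returned {got} of {data.Length} bytes");

        // Chunked loop: gets everything regardless of how Read slices it.
        var looped = new TrickleStream(new MemoryStream(data), 3);
        long copied = CopyInChunks(looped, new MemoryStream());
        Console.WriteLine($"Loop copied {copied} bytes");
    }
}
```

So even when memory is not a concern, the single-call version is only safe for sources that are known to fill the buffer in one go.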

MemoryStream.WriteTo(Stream destinationStream) versus Stream.CopyTo(Stream destinationStream)

Which one is better : MemoryStream.WriteTo(Stream destinationStream) or Stream.CopyTo(Stream destinationStream)??
I am talking about the comparison of these two methods without Buffer as I am doing like this :
Stream str = File.OpenRead("SomeFile.file");
MemoryStream mstr = new MemoryStream(File.ReadAllBytes("SomeFile.file"));
using (var Ms = File.Create("NewFile.file", 8 * 1024))
{
    str.CopyTo(Ms); // or mstr.WriteTo(Ms); which one will be better?
}
Update
Here is what I want to Do :
Open File [ Say "X" Type File]
Parse the Contents
From here I get a Bunch of new Streams [ 3 ~ 4 Files ]
Parse One Stream
Extract Thousands of files [ The Stream is an Image File ]
Save the Other Streams To Files
Editing all the Files
Generate a New "X" Type File.
I have written every bit of code which is actually working correctly..
But Now I am optimizing the code to make the most efficient.
It is an historical accident that there are two ways to do the same thing. MemoryStream always had the WriteTo() method, Stream didn't acquire the CopyTo() method until .NET 4.
The MemoryStream.WriteTo() version looks like this:
public virtual void WriteTo(Stream stream)
{
    // Exception throwing code elided...
    stream.Write(this._buffer, this._origin, this._length - this._origin);
}
The Stream.CopyTo() implementation looks like this:
private void InternalCopyTo(Stream destination, int bufferSize)
{
    int num;
    byte[] buffer = new byte[bufferSize];
    while ((num = this.Read(buffer, 0, buffer.Length)) != 0)
    {
        destination.Write(buffer, 0, num);
    }
}
Stream.CopyTo() is more universal, it works for any stream. And helps programmers that fumble copying data from, say, a NetworkStream. Forgetting to pay attention to the return value from Read() was a very common bug. But it of course copies the bytes twice and allocates that temporary buffer, MemoryStream doesn't need it since it can write directly from its own buffer. So you'd still prefer WriteTo(). Noticing the difference isn't very likely.
MemoryStream.WriteTo: Writes the entire contents of this memory stream to another stream.
Stream.CopyTo: Reads the bytes from the current stream and writes them to the destination stream. Copying begins at the current position in the current stream.
You'll need to seek back to 0, to get the whole source stream copied.
So I think MemoryStream.WriteTo is the better option for this situation.
If you use Stream.CopyTo, you don't need to read all the bytes into memory to start with. However:
This code would be simpler if you just used File.Copy
If you are going to load all the data into memory, you can just use:
byte[] data = File.ReadAllBytes("input");
File.WriteAllBytes("output", data);
You should have a using statement for the input as well as the output stream
If you really need processing so can't use File.Copy, using Stream.CopyTo will cope with larger files than loading everything into memory. You may not need that, of course, or you may need to load the whole file into memory for other reasons.
If you have got a MemoryStream, I'd probably use MemoryStream.WriteTo rather than Stream.CopyTo, but it probably won't make much difference which you use, except that you need to make sure you're at the start of the stream when using CopyTo.
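The position difference is easy to check with a few lines. This is only a demonstration of the documented behaviour quoted above; the class and method names are made up.

```csharp
using System;
using System.IO;

public static class WriteToVsCopyTo
{
    // Returns the destination lengths for: CopyTo at end of stream,
    // WriteTo at end of stream, and CopyTo after rewinding.
    public static long[] CopyLengths()
    {
        var source = new MemoryStream();
        source.Write(new byte[] { 1, 2, 3, 4, 5 }, 0, 5); // Position is now 5

        var viaCopyTo = new MemoryStream();
        source.CopyTo(viaCopyTo);   // starts at the current position: copies nothing

        var viaWriteTo = new MemoryStream();
        source.WriteTo(viaWriteTo); // ignores Position: writes the whole content

        source.Position = 0;        // seek back to the start
        var afterRewind = new MemoryStream();
        source.CopyTo(afterRewind); // now copies everything

        return new[] { viaCopyTo.Length, viaWriteTo.Length, afterRewind.Length };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CopyLengths())); // 0, 5, 5
    }
}
```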
I think Hans Passant's claim of a bug in MemoryStream.WriteTo() is wrong; it does not "ignore the return value of Write()". Stream.Write() returns void, which implies to me that the entire count bytes are written, which implies that Stream.Write() will block as necessary to complete the operation to, e.g., a NetworkStream, or throw if it ultimately fails.
That is indeed different from the write() system call in *nix, and its many emulations in libc and so forth, which can return a "short write". I suspect Hans leaped to the conclusion that Stream.Write() followed that, which I would have expected, too, but apparently it does not.
It is conceivable that Stream.Write() could perform a "short write", without returning any indication of that, requiring the caller to check that the Position property of the Stream has actually been advanced by count. That would be a very error-prone API, and I doubt that it does that, but I have not thoroughly tested it. (Testing it would be a bit tricky: I think you would need to hook up a TCP NetworkStream with a reader on the other end that blocked forever, and write enough to fill up the wire buffers. Or something like that...)
The comments for Stream.Write() are not quite unambiguous:
Summary:
When overridden in a derived class, writes a sequence of bytes to the current
stream and advances the current position within this stream by the number
of bytes written.
Parameters: buffer:
An array of bytes. This method copies count bytes from buffer to the current stream.
Compare that to the Linux man page for write(2):
write() writes up to count bytes from the buffer pointed buf to the file referred to by the file descriptor fd.
Note the crucial "up to". That sentence is followed by explanation of some of the conditions under which a "short write" might occur, making it very explicit that it can occur.
This is really a critical issue: we need to know how Stream.Write() behaves, beyond all doubt.
The CopyTo method creates a buffer, populates it with data from the original stream and then calls the Write method, passing the created buffer as a parameter. WriteTo uses the MemoryStream's internal buffer to write. That is the difference. Which is better is up to you to decide.
Creating a MemoryStream from a HttpInputStream in VB.NET:
Dim filename As String = MyFile.PostedFile.FileName
Dim fileData As Byte() = Nothing
Using binaryReader = New BinaryReader(MyFile.PostedFile.InputStream)
    binaryReader.BaseStream.Position = 0
    fileData = binaryReader.ReadBytes(MyFile.PostedFile.ContentLength)
End Using
Dim memoryStream As MemoryStream = New MemoryStream(fileData)

Why reading my tcp inputstream lead a byte array fill in with null character only?

I am not used to C# (I do C++ usually) and try to debug an application that is not mine, at all.
My application tries to read a big line from a TCP socket. Let say around 140 000 characters. And it fails. Let me explain how.
My code is here (inside a loop actually )
System.IO.Stream inputStream;
//...
// Loop code:
buffer = new byte[2];
readByteForLength = inputStream.Read(buffer, 0, 2);
It turns out that Read() may fill in the buffer array correctly up to a point, where it fills it with NULL characters instead of valid values. And it returns 2 as it would in a correct case.
Do you have an idea why such NULL characters?
Is the TCP packet still on the network when I try to read more of my data?
Is there a limit for inputStream before it behaves wrongly ?
Update:
By the way, doing the following leads to the same kind of issue:
System.IO.StreamReader sr = new StreamReader(inputStream);
string s = sr.ReadToEnd();
File.WriteAllText(@"c:\temp\toto.txt", s);
Actually the toto file stops exactly where I encounter an issue in the first version of my code while it is a little bit longer because the rest of the line is then filled up with NULL characters, nearly up to 400 000!
The only reasonable explanation is that you do indeed have zeroes in the incoming data.
Try sniffing the communication with Ethereal (now called Wireshark).
By the way: allocating RAM for every received data piece may be a wrong practice.
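Short of a packet capture, a quick in-process check is to count the zero bytes that actually arrive, while reusing a single buffer instead of allocating one per read. This is only a sketch; CountZeroBytes is a made-up helper, and the MemoryStream stands in for the TCP input stream.

```csharp
using System;
using System.IO;

public static class ZeroByteCheck
{
    // Counts NUL bytes in everything the stream delivers until it ends.
    public static int CountZeroBytes(Stream input)
    {
        var buffer = new byte[4096]; // allocated once, reused for every read
        int zeros = 0;
        int read;
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Only the first 'read' bytes are fresh data from this call.
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] == 0)
                    zeros++;
            }
        }
        return zeros;
    }

    public static void Main()
    {
        var sample = new MemoryStream(new byte[] { 65, 66, 0, 0, 67 });
        Console.WriteLine(CountZeroBytes(sample)); // 2
    }
}
```

A non-zero count here means the NUL characters are coming from the sender, not from the reading code.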
