Streaming programming - c#

I need to learn more about streaming techniques.
I am using biztalk and want to develop some custom pipelinecomponents.
For performance factors everything has to be in a stream based fashion.
I receive a streamed message, but I want to do some replacements in the text,
what I do now is:
string msg = "";
using(StreamReader r = new StreamReader(stream)){
msg = r.readToEnd();
}
//do replacements
//send stream away
StreamWriter...
As you see I break the stream when I execute r.readToEnd().
How can I edit the message in stream?
thx

You cannot. You can read parts of the message from the stream, replace what you want in each part and finally write processed part to another stream.
Using ReadToEnd is opposite to streaming concept. What I can suggest is that you should use:
using (StreamReader r = new StreamReader(stream))
using (StreamWriter w = new StreamWriter(someOutputStream))
{
string line = null;
while ((line = r.ReadLine()) != null)
{
line = DoReplacements(line);
w.WriteLine(line);
}
}

Related

C# - Read bytes from file from a specific string

I'm trying to parse a crg-file in C#. The file is mixed with plain text and binary data. The first section of the file contains plain text while the rest of the file is binary (lots of floats), here's an example:
$
$ROAD_CRG
reference_line_start_u = 100
reference_line_end_u = 120
$
$KD_DEFINITION
#:KRBI
U:reference line u,m,730.000,0.010
D:reference line phi,rad
D:long section 1,m
D:long section 2,m
D:long section 3,m
...
$
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
�#z����RA����\�l
...
I know I can read bytes starting at a specific offset but how do I find out which byte to start from? The last row before the binary section will always contain at least four dollar signs "$$$$". Here's what I've got so far:
using var fs = new FileStream(#"crg_sample.crg", FileMode.Open, FileAccess.Read);
var startByte = ??; // How to find out where to start?
using (BinaryReader reader = new BinaryReader(fs))
{
reader.BaseStream.Seek(startByte, SeekOrigin.Begin);
var f = reader.ReadSingle();
Debug.WriteLine(f);
}
When you have a mixture of text data and binary data, you need to treat everything as binary. This means you should be using raw Stream access, or something similar, and using binary APIs to look through the text data (often looking for cr/lf/crlf at bytes as sentinels, although it sounds like in your case you could just look for the $$$$ using binary APIs, then decode the entire block before, and scan forwards). When you think you have an entire line, then you can use Encoding to parse each line - the most convenient API being encoding.GetString(). When you've finished looking through the text data as binary, then you can continue parsing the binary data, again using the binary API. I would usually recommend against BinaryReader here too, because frankly it doesn't gain you much over more direct API. The other problem you might want to think about is CPU endianness, but assuming that isn't a problem: BitConverter.ToSingle() may be your friend.
If the data is modest in size, you may find it easiest to use byte[] for the data; either via File.ReadAllBytes, or by renting an oversized byte[] from the array-pool, and loading it from a FileStream. The Stream API is awkward for this kind of scenario, because once you've looked at data: it has gone - so you need to maintain your own back-buffers. The pipelines API is ideal for this, when dealing with large data, but is an advanced topic.
UPDATE: This code may not work as expected. Please review the valuable information in the comments.
using (var fs = new FileStream(#"crg_sample.crg", FileMode.Open, FileAccess.Read))
{
using (StreamReader sr = new StreamReader(fs, Encoding.ASCII, true, 1, true))
{
var line = sr.ReadLine();
while (!string.IsNullOrWhiteSpace(line) && !line.Contains("$$$$"))
{
line = sr.ReadLine();
}
}
using (BinaryReader reader = new BinaryReader(fs))
{
// TODO: Start reading the binary data
}
}
Solution
I know this is far from the most optimized solution but in my case it did the trick and since the plain text section of the file was known to be fairly small this didn't cause any noticable performance issues. Here's the code:
using var fileStream = new FileStream(#"crg_sample.crg", FileMode.Open, FileAccess.Read);
using var reader = new BinaryReader(fileStream);
var newLine = '\n';
var markerString = "$$$$";
var currentString = "";
var foundMarker = false;
var foundNewLine = false;
while (!foundNewLine)
{
var c = reader.ReadChar();
if (!foundMarker)
{
currentString += c;
if (currentString.Length > markerString.Length)
currentString = currentString.Substring(1);
if (currentString == markerString)
foundMarker = true;
}
else
{
if (c == newLine)
foundNewLine = true;
}
}
if (foundNewLine)
{
// Read binary
}
Note: If you're dealing with larger or more complex files you should probably take a look at Mark Gravell's answer and the comment sections.

How to read uploaded CSV UTF-8 for processing with CsvHelper?

My WebAPI allows a user to upload a CSV file and then parses the file. I use CsvHelper to do the heavy lifting of reading the CSV and mapping it to domain objects.
However, I have one customer who's files are in CSV UTF-8 format. The code that works for "vanilla" (ASCII) CSV files hurls when it tries to deal with CSV UTF-8.
Is there a way to import the CSV UTF-8 data and convert it to ASCII CSV so that my code will continue to work?
My current code looks like this:
//In my WebAPI Controller
//fileToProcess is IFormFile
byte[] fileBytes = new byte[fileToProcess.Length];
using(var stream = fileToProcess.OpenReadStream())
{
await stream.ReadAsync(fileBytes);
stream.Close();
}
var result = await ProcessFileAsync(fileBytes);
return OK(result);
...
//In a Parsing Class
public async Task<List<Client>> ProcessFileAsync(byte[] fileBytes)
{
List<Client> result = null;
var fileText = Encoding.Default.GetString(fileBytes);
using(var reader = new StringReader(fileText))
{
using(var csv = new CsvReader(reader))
{
csv.RegisterClassMap<ClientMap>();
result = csv.GetRecords<T>().ToList();
await PostProcess(result);
}
}
return result;
}
The problem is that CSV UTF-8 has the BOM so when CsvHelper tries to process a mapping that references the first column header
Map(c => c.ClientId).Name("CLIENT ID");
it fails because the column name includes the BOM.
So, my questions are:
How can I tell if the file coming in is UTF-8 or ASCII.
How do I convert the UTF-8 to ASCII so it can be processed normally?
NOTE
I did try the following:
fileBytes = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, fileBytes);
However, this replaced the BOM with a ? which still causes CsvHelper to fail.
By doing this:
var fileText = Encoding.Default.GetString(fileBytes);
using(var reader = new StringReader(fileText))
... you're locking yourself into a specific encoding at the point of converting it to a string. Encoding.Default is can vary by platform and CLR implementation.
The StreamReader class is designed to read text from a stream (which you can wrap around the raw bytes with a MemoryStream) and is capable of detecting the encoding for you if you let it. Try this instead:
using (var stream = new MemoryStream(fileBytes))
using (var reader = new StreamReader(stream))
In your case, you could use the incoming stream directly by changing ProcessFileAsync to accept the stream.
using (var stream = fileToProcess.OpenReadStream())
{
var result = await ProcessFileAsync(stream);
return OK(result);
}
public async Task<List<Client>> ProcessFileAsync(Stream stream)
{
using (var reader = new StreamReader(stream))
{
using (var csv = new CsvReader(reader))
{
csv.RegisterClassMap<ClientMap>();
List<Client> result = csv.GetRecords<Client>().ToList();
await PostProcess(result);
return result;
}
}
}
As long as the BOM is present, this will also support UTF16-encoded and UTF32-encoded files (and pretty much anything else that can be detected) because it'll see the U+FEFF code point in whichever encoding it uses.

How to handle StreamReader?

I use StreamReader to read my csv file.
The problem is : i need to read this file twice, and in second time then i use StreamReader
StreamReader.EndOfStream is true and reading not executed.
using (var csvReader = new StreamReader(file.InputStream))
{
string inputLine = "";
var values = new List<string>();
while ((inputLine = csvReader.ReadLine()) != null)...
Can enybody help
Try file.InputStream.Seek(0, SeekOrigin.Begin); before you open the second StreamReader to reset the Stream to the starting point.
A much better approach(if possible) would be to store the file contents in memory, and re-use it from there.

Sending a multiline string over NamedPipe?

How can I send a multiline string with blank lines over a NamedPipe?
If I send a string
string text= #"line 1
line2
line four
";
StreamWriter sw = new StreamWriter(client);
sw.Write(text);
I get on the server side only "line 1":
StreamReader sr = new StreamReader(server);
string message = sr.ReadLine();
When I try something like this
StringBuilder message = new StringBuilder();
string line;
while ((line = sr.ReadLine()) != null)
{
message.Append(line + Environment.NewLine);
}
It hangs in the loop while the client is connected and only releases when the client disconnects.
Any ideas how I can get the whole string without hanging in this loop?
I need to to process the string and return it on the same way to the client.
It's important that I keep the original formatting of the string including blank lines and whitespace.
StreamReader is a line-oriented reader. It will read the first line (terminated by a newline). If you want the rest of the text, you have to issue multiple readlines. That is:
StreamReader sr = new StreamReader(server);
string message = sr.ReadLine(); // will get "line1"
string message2 = sr.ReadLine(); // will get "line2"
You don't want to "read to end" on a network stream, because that's going to hang the reader until the server closes the connection. That might be a very long time and could overflow a buffer.
Typically, you'll see this:
NetworkStream stream = CreateNetworkStream(); // however you're creating the stream
using (StreamReader reader = new StreamReader(stream))
{
string line;
while ((line = reader.ReadLine()) != null)
{
// process line received from stream
}
}
That gives you each line as it's received, and will terminate when the server closes the stream.
If you want the reader to process the entire multi-line string as a single entity, you can't reliably do it with StreamReader. You'll probably want to use a BinaryWriter on the server and a BinaryReader on the client.
Why not just call ReadToEnd() on StreamReader?
StreamReader sr = new StreamReader(server);
string message = sr.ReadToEnd();
I accomplished sending multiple line messages over a named pipe by using token substitution to replace all \r and \n in the message with tokens like {cr} and {lf} thus converting multiple lines messages into 1 line.
Of course, I needed to do the reverse conversion of the receiver side.
The assumption is that the substituted tokens will never be found in the source text.
You can make sure that this happens by also substituting all "{" with {lp}. That way the "{" becomes your escape character.
You can use Regex to make these Replacements.
That worked great and allowed me to use StreamReader.

Why does FileStream.Position increment in multiples of 1024?

I have a text file that I want to read line by line and record the position in the text file as I go. After reading any line of the file the program can exit, and I need to resume reading the file at the next line when it resumes.
Here is some sample code:
using (FileStream fileStream = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
fileStream.Seek(GetLastPositionInFile(), SeekOrigin.Begin);
using (StreamReader streamReader = new StreamReader(fileStream))
{
while (!streamReader.EndOfStream)
{
string line = streamReader.ReadLine();
DoSomethingInteresting(line);
SaveLastPositionInFile(fileStream.Position);
if (CheckSomeCondition())
{
break;
}
}
}
}
When I run this code, the value of fileStream.Position does not change after reading each line, it only advances after reading a couple of lines. When it does change, it increases in multiples of 1024. Now I assume that there is some buffering going on under the covers, but how can I record the exact position in the file?
It's not FileStream that's responsible - it's StreamReader. It's reading 1K at a time for efficiency.
Keeping track of the effective position of the stream as far as the StreamReader is concerned is tricky... particularly as ReadLine will discard the line ending, so you can't accurately reconstruct the original data (it could have ended with "\n" or "\r\n"). It would be nice if StreamReader exposed something to make this easier (I'm pretty sure it could do so without too much difficulty) but I don't think there's anything in the current API to help you :(
By the way, I would suggest that instead of using EndOfStream, you keep reading until ReadLine returns null. It just feels simpler to me:
string line;
while ((line = reader.ReadLine()) != null)
{
// Process the line
}
I would agree with Stefan M., it is probably the buffering which is causing the Position to be incorrect. If it is just the number of characters that you have read that you want to track than I suggest you do it yourself, as in:
using(FileStream fileStream = new FileStream("Sample.txt", FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
{
fileStream.Seek(GetLastPositionInFile(), SeekOrigin.Begin);
/**Int32 position = 0;**/
using(StreamReader streamReader = new StreamReader(fileStream))
{
while(!streamReader.EndOfStream)
{
string line = streamReader.ReadLine();
/**position += line.Length;**/
DoSomethingInteresting(line);
/**SaveLastPositionInFile(position);**/
if(CheckSomeCondition())
{
break;
}
}
}
}
Provide that your file is not too big, why not read the whole thing in big chuncks and then manipulate the string - probably faster than the stop and go i/o.
For example,
//load entire file
StreamReader srFile = new StreamReader(strFileName);
StringBuilder sbFileContents = new StringBuilder();
char[] acBuffer = new char[32768];
while (srFile.ReadBlock(acBuffer, 0, acBuffer.Length)
> 0)
{
sbFileContents.Append(acBuffer);
acBuffer = new char[32768];
}
srFile.Close();

Categories