I have a request that calls a POST method. It posts XML in the request content (but sends it as raw text). In testing, the XML is 106,880 characters long.
In the Web API POST method, I process the request body to pull out the XML and store each element/value in a dictionary using the following:
var stream = new System.IO.StreamReader(Request.Body);
XmlReaderSettings settings = new XmlReaderSettings() { Async = true };
using (XmlReader r = XmlReader.Create(stream, settings))
{
    bool rowsExist = true;
    while (rowsExist && await r.ReadAsync())
    {
        // nodeType (presumably XmlNodeType.Element) and xmlDic are declared earlier in the method
        if (nodeType == r.NodeType)
        {
            var name = r.Name;
            rowsExist = await r.ReadAsync();
            if (r.NodeType == XmlNodeType.Text)
            {
                xmlDic[name] = r.Value;
            }
        }
    }
}
This works fine with small XML. However, when a text value is relatively large, the data is truncated on the second ReadAsync call and the XmlReader throws an exception saying:
"Synchronous operations are disallowed. Call ReadAsync or set AllowSynchronousIO to true instead."
The exception makes no sense, because ReadAsync is being called; it appears to be related to the size of the data, as it didn't happen with a smaller set of XML.
I tested a workaround, which is to read the entire request body into a string and then run the XmlReader over that string. However, that uses more memory, as it loads the entire request into memory first, something that shouldn't be necessary.
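A minimal sketch of that workaround (assuming an async action method and the same xmlDic extraction loop as above):
// Workaround sketch: buffer the whole request body first, then parse from the string.
// This avoids any further reads on Request.Body at the cost of extra memory.
string body;
using (var bodyReader = new System.IO.StreamReader(Request.Body))
{
    body = await bodyReader.ReadToEndAsync();
}
using (XmlReader r = XmlReader.Create(new System.IO.StringReader(body)))
{
    // ... same element/value extraction loop as above ...
}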
I wondered if there might be a default max size/limit that the stream or XmlReader uses, and I see that the XmlReaderSettings class has two properties that control the maximum characters:
settings.MaxCharactersFromEntities
settings.MaxCharactersInDocument
However, the first defaults to 10,000,000, which is far more than I am posting, and the second defaults to zero, which means no limit. As a result, these don't appear to make any difference.
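For reference, this is how I set them (with the defaults described above, so they should be a non-factor):
// Settings sketch: both limits set explicitly to their documented defaults.
var settings = new XmlReaderSettings
{
    Async = true,
    MaxCharactersFromEntities = 10_000_000, // default: guards against entity expansion
    MaxCharactersInDocument = 0             // 0 means no document-size limit
};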
What could be causing this to fail when reading the body using a StreamReader?
Greetings, good people,
I have a rather strange issue (for me, at least) regarding WebClient and reading data from a continuous data stream, and I'm not really sure where the issue is. The stream receives data almost as expected, except for the last row. But when new data arrives, the unprinted row prints above the new data.
For example, a set of lines is retrieved and could look like this:
<batch name="home">
<event id="1"/>
And when the next set arrives, it contains the missing end block from the above set:
</batch>
<batch name="home">
<event id="2"/>
The code presented is simplified, but hopefully is enough for getting a clearer picture.
WebClient _client = new WebClient();
_client.OpenReadCompleted += (sender, args) =>
{
    using (var reader = new StreamReader(args.Result))
    {
        while (!reader.EndOfStream)
        {
            Console.WriteLine(reader.ReadLine());
        }
    }
};
_client.OpenReadAsync(new Uri("http://localhost:1234/testdata?keep=true"));
In this setup, reader.EndOfStream never becomes true because the stream doesn't end.
Does anyone have a suggestion on how to retrieve the last line? Am I missing something, or could the fault be with the API?
Kind regards :)
It seems there's simply no newline character after the batch element. In XML, whitespace, including newlines, isn't significant, so no newlines are required. XML doesn't allow multiple root elements, though, which makes this scenario a bit weird.
In streaming scenarios it's common to send each message unindented (i.e. on a single line) and send either a newline or another uncommon character to mark the end of the message. One would expect either no newlines at all, or a newline after each batch, e.g.:
<batch name="home"><event id="1"/>...</batch>
<batch name="home"><event id="2"/>...</batch>
<batch name="home"><event id="3"/>...</batch>
In that case you could use just a ReadLine to read each message:
var client = new HttpClient();
using var stream = await client.GetStreamAsync(serviceUrl);
using var reader = new StreamReader(stream);
while (true)
{
    var msg = reader.ReadLine();
    var doc = XDocument.Parse(msg);
    ...
}
Without another way to identify each message, though, you'll have to read each element from the stream. Luckily, LINQ-to-XML makes it a bit easier to read elements:
using var reader = XmlReader.Create(stream, new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment });
while (reader.Read())
{
    switch (reader.NodeType)
    {
        case XmlNodeType.Element:
            if (reader.Name == "batch")
            {
                XElement el = XElement.ReadFrom(reader) as XElement;
                // Process the batch!
            }
            break;
    }
}
I'm trying to get data from a CSV file served by a web service.
If I paste the URL into my browser, the CSV is downloaded and looks like the following example:
"ID","ProductName","Company"
"1","Apples","Alfreds futterkiste"
"2","Oranges","Alfreds futterkiste"
"3","Bananas","Alfreds futterkiste"
"4","Salad","Alfreds futterkiste"
...next 96 rows
However, I don't want to download the CSV file first and then extract data from it afterwards.
The web service uses pagination and returns 100 rows (determined by the &number parameter, which has a max of 100). After the first request, I can use the &next parameter to fetch the next 100 rows based on ID. For instance, the URL
http://testWebservice123.com/Example.csv?auth=abc&number=100&next=100
will get me rows from ID 101 to 200. So if there are a lot of rows, I would end up downloading a lot of CSV files and saving them to the hard drive. Instead of downloading and saving the CSV files first, I want to get the data directly from the web service so I can write it straight to a database.
After a bit of searching, I came up with the following solution:
static void Main(string[] args)
{
    string startUrl = "http://testWebservice123.com/Example.csv?auth=abc&number=100";
    string url = "";
    string deltaRequestParameter = "";
    string lastLine;
    int numberOfLines = 0;
    do
    {
        url = startUrl + deltaRequestParameter;
        WebClient myWebClient = new WebClient();
        using (Stream myStream = myWebClient.OpenRead(url))
        {
            using (StreamReader sr = new StreamReader(myStream))
            {
                numberOfLines = 0;
                while (!sr.EndOfStream)
                {
                    var row = sr.ReadLine();
                    var values = row.Split(',');
                    // do whatever with the rows by now - i.e. write to console
                    Console.WriteLine(values[0] + " " + values[1]);
                    lastLine = values[0].Replace("\"", ""); // last line in the loop - get the last ID
                    numberOfLines++;
                    deltaRequestParameter = "&next=" + lastLine;
                }
            }
        }
    } while (numberOfLines == 101); // since the header is returned each time, the number of rows will be 101 until we get to the last request
}
but I'm not sure if this is an up-to-date way of doing this, or if there is a better (easier/simpler) way. In other words, I'm unsure whether using WebClient and StreamReader is the right way to go.
In this thread: how to read a csv file from a url?
WebClient.DownloadString is mentioned, as well as WebRequest. But if I want to write to a database without saving the CSV to the HDD, which is the best option?
Furthermore, will the approach I have taken save data to temporary disk storage behind the scenes, or will all the data be read into memory and then disposed when the loop completes?
I have read the following documentation but can't seem to find out what it does behind the scenes:
StreamReader: https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader?view=netframework-4.7.2
Stream: https://learn.microsoft.com/en-us/dotnet/api/system.io.stream?view=netframework-4.7.2
Edit:
I guess I could also use the following TextFieldParser (from the Microsoft.VisualBasic assembly), but my question is really still the same:
using (Stream myStream = myWebClient.OpenRead(url))
{
    using (TextFieldParser parser = new TextFieldParser(myStream))
    {
        numberOfLines = 0;
        parser.TrimWhiteSpace = true; // if you want
        parser.Delimiters = new[] { "," };
        parser.HasFieldsEnclosedInQuotes = true;
        while (!parser.EndOfData)
        {
            string[] line = parser.ReadFields();
            Console.WriteLine(line[0].ToString() + " " + line[1].ToString());
            numberOfLines++;
            deltaRequestParameter = "&next=" + line[0].ToString();
        }
    }
}
The HttpClient class in System.Net.Http is available as of .NET 4.5. You have to work with async code, but it's not a bad idea to get into it if you're dealing with the web.
As sample data, I'll use jsonplaceholder's "todos" list. It provides JSON data, not CSV data, but it gives a simple enough structure to serve our purpose in the example below.
This is the core function, which fetches from jsonplaceholder in a similar way to your "testWebService123" site, although I'm just getting the first 3 todos, as opposed to testing for when I've hit the last page (you would probably keep your do-while logic for that part).
async void DownloadPagesAsync()
{
    for (var i = 1; i <= 3; i++)
    {
        var pageToGet = $"https://jsonplaceholder.typicode.com/todos/{i}";
        using (var client = new HttpClient())
        using (HttpResponseMessage response = await client.GetAsync(pageToGet))
        using (HttpContent content = response.Content)
        using (var stream = (MemoryStream) await content.ReadAsStreamAsync())
        using (var sr = new StreamReader(stream))
            while (!sr.EndOfStream)
            {
                var row = sr.ReadLine()
                    .Replace(@"""", "")
                    .Replace(",", "");
                if (row.IndexOf(":") == -1)
                    continue;
                var values = row.Split(':');
                Console.WriteLine($"{values[0]}, {values[1]}");
            }
    }
}
This is how you would call the function, such as you would in a Main() method:
Task t = new Task(DownloadPagesAsync);
t.Start();
The new Task here takes an "action", or, in other words, a function that returns void, as a parameter. Then you start the task. Be careful: it is asynchronous, so any code you have after t.Start() may very well run before your task completes.
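An alternative sketch, assuming C# 7.1+ and that you change DownloadPagesAsync to return Task instead of void, is to await it directly, so completion is observable:
// Variant sketch: with a Task return type, an async Main can await the call,
// and nothing after the await runs until all pages are processed.
static async Task Main(string[] args)
{
    await DownloadPagesAsync(); // requires DownloadPagesAsync to be declared "async Task"
}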
As to your question of whether the stream reads "in memory" or not: running GetType() on stream in the code above returned a MemoryStream type, though it is only recognized as a Stream object at compile time. A MemoryStream is definitely in-memory. I'm not really sure whether any of the other kinds of stream objects save temporary files behind the scenes, but I'm leaning towards not.
But looking into the inner workings of a class, though commendable, is not usually necessary to address concerns about disposal. For any class, just see if it implements IDisposable. If it does, put it in a using statement, as you have done in your code. Whether the program terminates as expected or via an error, the proper disposals are performed after control passes out of the using block.
HttpClient is in fact the newer approach. From what I understand, it does not replace all of the functionality of WebClient, but it is stronger in many respects. See this SO post for more details comparing the two classes.
Also, something to know about WebClient is that it can be simple, but limiting. If you run into issues, you will need to look into the HttpWebRequest class, which is a "lower level" class that gives you greater access to the nuts and bolts of things (such as working with cookies).
I need to read only the mode segment from the request body below.
grant_type=password&username=demouser&password=test123&client_id=500DWCSFS-D3C0-4135-A188-17894BABBCCF&mode=device
I used the function below to read the HTTP body, and it gives me the entire body. How can I extract the mode segment without using Substring or changing the value in Seek(): bodyStream.BaseStream.Seek(3, SeekOrigin.Begin);
var bodyStream = new StreamReader(HttpContext.Current.Request.InputStream);
bodyStream.BaseStream.Seek(0, SeekOrigin.Begin);
var bodyText = bodyStream.ReadToEnd();
You can't. HTTP uses TCP, which requires you to read the entire body anyway; you can't "seek" into a TCP stream. Well, you can, but that still reads the entire body and discards the unused pieces.
So you have to read the entire stream, and you have to meaningfully parse it, because another parameter could also contain the string "mode", and mode could also be at the start, so you can't simply search for &mode either.
Given this is a form post, you can simply access Request.Form["mode"]. If you do want to parse it yourself:
string formData;
using (var reader = new StreamReader(HttpContext.Current.Request.InputStream))
{
    formData = reader.ReadToEnd();
}
var queryString = HttpUtility.ParseQueryString(formData);
var mode = queryString["mode"];
I'm writing a Windows service to listen for and process messages from MSMQ. The listener has various error-handling steps, but if all else fails, I want to save the body of the message to a text file so that I can look at it. However, I can't seem to extract the content of my messages when this condition is hit. The following code is a simple representation of the sections in question, and it always produces an empty text file, even though I know the message I'm testing with is not empty. HOWEVER, if I comment out the initial attempt to deserialize the XML, the fail-safe does work and produces a text file with the message body. So I think the problem has something to do with how the deserialization attempt leaves the underlying stream. Just to clarify: when the message contains valid XML that CAN be deserialized, the service works fine and the fail-safe never comes into action.
MyClass myClass = null;
try
{
    XmlSerializer serializer = new XmlSerializer(typeof(MyClass));
    // Comment the following out and the fail-safe works.
    // Let this run and fail, and the text file below is always empty.
    myClass = (MyClass)serializer.Deserialize(m.BodyStream);
}
catch (Exception ex)
{
}
if (myClass == null)
{
    string filePath = @"D:\path\file.txt";
    m.Formatter = new ActiveXMessageFormatter();
    StreamReader reader = new StreamReader(m.BodyStream);
    File.WriteAllText(filePath, reader.ReadToEnd());
}
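If my hypothesis is right, I assume the failed Deserialize call has already advanced m.BodyStream to (or near) the end, so the fallback reads nothing. A sketch of what I mean by rewinding it first (assuming BodyStream is seekable):
// Sketch: reset the body stream's position before the fallback read,
// since the failed Deserialize has already consumed the stream.
m.BodyStream.Position = 0;
m.Formatter = new ActiveXMessageFormatter();
using (StreamReader reader = new StreamReader(m.BodyStream))
{
    File.WriteAllText(filePath, reader.ReadToEnd());
}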
Depending on the formatter you are using:
For Windows binary messages:
File.WriteAllText(<path>, (new UTF8Encoding()).GetString((byte[])msg.Body)); // for binary
For XML messages, try this:
msg.Formatter = new XmlMessageFormatter(new String[] { "System.String, mscorlib" });
var text = msg.Body.ToString();
// write to file..
For messages that are neither binary nor XML, use the native formatter:
msg.Formatter = new ActiveXMessageFormatter();
var reader = new StreamReader(msg.BodyStream);
var msgBody = reader.ReadToEnd();
// write to file..
Using the MVC model, I would like to write a JsonResult that streams the JSON string to the client, rather than converting all the data into a JSON string at once and then streaming it back to the client.
I have actions that need to send very large responses (over 300,000 records) as JSON transfers, and I think the basic JsonResult implementation is not scalable.
I am using Json.NET, and I am wondering if there is a way to stream the chunks of the JSON string as they are produced.
//Current implementation:
response.Write(Newtonsoft.Json.JsonConvert.SerializeObject(Data, formatting));
response.End();
//I know I can use the JsonSerializer instead
Newtonsoft.Json.JsonSerializer serializer = new Newtonsoft.Json.JsonSerializer();
serializer.Serialize(textWriter, Data);
However, I am not sure how I can get the chunks written into textWriter, write them into the response, and call response.Flush() until all 300,000 records are converted to JSON.
Is this possible at all?
Assuming your final output is a JSON array and each "chunk" is one item in that array, you could try something like the following JsonStreamingResult class. It uses a JsonTextWriter to write the JSON to the output stream, and uses a JObject as a means to serialize each item individually before writing it to the writer. You could pass the JsonStreamingResult an IEnumerable implementation which can read items individually from your data source so that you don't have them all in memory at once. I haven't tested this extensively, but it should get you going in the right direction.
public class JsonStreamingResult : ActionResult
{
    private IEnumerable itemsToSerialize;

    public JsonStreamingResult(IEnumerable itemsToSerialize)
    {
        this.itemsToSerialize = itemsToSerialize;
    }

    public override void ExecuteResult(ControllerContext context)
    {
        var response = context.HttpContext.Response;
        response.ContentType = "application/json";
        response.ContentEncoding = Encoding.UTF8;

        JsonSerializer serializer = new JsonSerializer();
        using (StreamWriter sw = new StreamWriter(response.OutputStream))
        using (JsonTextWriter writer = new JsonTextWriter(sw))
        {
            writer.WriteStartArray();
            foreach (object item in itemsToSerialize)
            {
                JObject obj = JObject.FromObject(item, serializer);
                obj.WriteTo(writer);
                writer.Flush();
            }
            writer.WriteEndArray();
        }
    }
}
Leaving it up to .NET to flush only when the buffer is full has problems of its own.
For example, some of the JSON content can get cut off, causing parsing issues on the frontend.
The best approach so far is to flush the batch on each iteration if you batch the output, or to flush per single item if that's what your design calls for.
Currently I use SSE (Server-Sent Events) to push the data to the browser, plus a delimiter message on message end to indicate to the browser that the connection can be closed. I know the SSE use case is for continuous streams, but we can also use it to help with chunking and batching responses, as in the sketch below.
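A minimal sketch of that idea, assuming classic ASP.NET with Json.NET; the data: framing is standard SSE, while the [END] delimiter and the batches variable are just illustrative conventions:
// Sketch: SSE-style push with an explicit flush per batch, plus a
// delimiter message so the browser knows the stream is complete.
response.ContentType = "text/event-stream";
foreach (var batch in batches) // batches: whatever IEnumerable you are serializing
{
    response.Write("data: " + JsonConvert.SerializeObject(batch) + "\n\n");
    response.Flush(); // push this chunk now instead of waiting for a full buffer
}
response.Write("data: [END]\n\n"); // illustrative end-of-stream delimiter
response.Flush();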