Greetings good people,
I have a rather strange issue (for me at least) with WebClient when reading from a continuous data stream, and I’m not really sure where the problem lies. The stream delivers data almost as expected, except for the last row of each set. When new data arrives, the previously unprinted row is printed above the new data.
For example, a set of lines is retrieved and it could look like this:
<batch name="home">
<event id="1"/>
And when the next set arrives, it contains the missing end block from the above set:
</batch>
<batch name="home">
<event id="2"/>
The code presented is simplified, but hopefully is enough for getting a clearer picture.
WebClient _client = new WebClient();
_client.OpenReadCompleted += (sender, args) =>
{
    using (var reader = new StreamReader(args.Result))
    {
        while (!reader.EndOfStream)
        {
            Console.WriteLine(reader.ReadLine());
        }
    }
};
_client.OpenReadAsync(new Uri("http://localhost:1234/testdata?keep=true"));
In this setup reader.EndOfStream never becomes true because the stream doesn't end.
Anyone have a suggestion on how to retrieve the last line? Am I missing something or could the fault be with the API?
Kind regards :)
It seems there's simply no newline character after the batch element. ReadLine only returns once it sees a line terminator or the end of the stream, so the closing tag stays pending until more data arrives. In XML, whitespace, including newlines, isn't significant, so no newlines are required. XML doesn't allow multiple root elements though, which makes this scenario a bit weird.
In streaming scenarios it's common to send each message unindented (i.e. on a single line) and send either a newline or another uncommon character to mark the end of the message. One would expect either no newlines at all, or a newline after each batch, e.g.:
<batch name="home"><event id="1"/>...</batch>
<batch name="home"><event id="2"/>...</batch>
<batch name="home"><event id="3"/>...</batch>
In that case you could use just a ReadLine to read each message:
var client = new HttpClient();
using var stream = await client.GetStreamAsync(serviceUrl);
using var reader = new StreamReader(stream);
while (true)
{
    var msg = reader.ReadLine();
    if (msg == null) break; // the server closed the stream
    var doc = XDocument.Parse(msg);
    ...
}
Without another way to identify each message though, you'll have to read each element from the stream. Luckily, LINQ-to-XML makes it a bit easier to read elements:
using var reader = XmlReader.Create(stream, new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment });
while (!reader.EOF)
{
    if (reader.NodeType == XmlNodeType.Element && reader.Name == "batch")
    {
        // ReadFrom consumes the whole element and leaves the reader on the node after it,
        // so calling Read() again here would skip that node
        XElement el = XElement.ReadFrom(reader) as XElement;
        //Process the batch!
    }
    else
    {
        reader.Read();
    }
}
I have a request that calls a post method. It is posting XML in the request content (but sending it as raw text). In testing, the length of the xml is 106880 characters.
In the Web API post method, I process the request body to pull out the XML and store each element/value in a dictionary using the following:
var stream = new System.IO.StreamReader(Request.Body);
XmlReaderSettings settings = new XmlReaderSettings() { Async = true };
using (XmlReader r = XmlReader.Create(stream, settings))
{
bool rowsExist = true;
while (rowsExist && await r.ReadAsync())
{
if (nodeType == r.NodeType)
{
var name = r.Name;
rowsExist = await r.ReadAsync();
if (r.NodeType == XmlNodeType.Text)
{
xmlDic[name] = r.Value;
}
}
}
}
This works fine with small XML. However, when the text value is relatively large, the data is truncated on the second ReadAsync call and the XmlReader throws an exception saying
"Synchronous operations are disallowed. Call ReadAsync or set AllowSynchronousIO to true instead."
The exception makes no sense to me because ReadAsync is being called, but it appears to be related to the size of the data, as it doesn't happen with a smaller set of XML.
I tested a workaround which is to read the entire request body into a string, and then run the XmlReader using the entire body. However, that does use up more memory as it is loading the entire request into memory first, something that shouldn't be necessary.
I wondered if there might be a default max size/limit that stream or XmlReader uses, and see that the XmlReader settings class has 2 properties that control the Max characters:
settings.MaxCharactersFromEntities
settings.MaxCharactersInDocument
However, the first has a default of 10000000, which is way more than I am posting, and the second is set to zero, which means no limit. As a result, these don't appear to make any difference.
What could be causing this to fail when reading the body using a StreamReader?
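One possible explanation, offered as an assumption rather than a confirmed diagnosis: ReadAsync is asynchronous, but the Value property is not. When a text node is too large to already sit in the reader's buffer, getting Value may have to pull the rest from Request.Body synchronously, which ASP.NET Core disallows by default. XmlReader provides GetValueAsync for that case. The sketch below uses hypothetical names (XmlBodyReader, ReadElementsAsync) and takes any Stream, so Request.Body could be passed in directly:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using System.Xml;

public static class XmlBodyReader
{
    // Maps each element name to its text content, reading values asynchronously.
    public static async Task<Dictionary<string, string>> ReadElementsAsync(Stream body)
    {
        var values = new Dictionary<string, string>();
        var settings = new XmlReaderSettings { Async = true };
        using (var reader = XmlReader.Create(body, settings))
        {
            string currentElement = "";
            while (await reader.ReadAsync())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    currentElement = reader.Name;
                }
                else if (reader.NodeType == XmlNodeType.EndElement)
                {
                    currentElement = "";
                }
                else if (reader.NodeType == XmlNodeType.Text && currentElement.Length > 0)
                {
                    // GetValueAsync instead of Value, so a large text node
                    // does not force a synchronous read on the stream
                    values[currentElement] = await reader.GetValueAsync();
                }
            }
        }
        return values;
    }
}
```

In a controller action this would be called as `var xmlDic = await XmlBodyReader.ReadElementsAsync(Request.Body);`, without wrapping the body in a StreamReader first.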
I am writing a program about job interviews. Everything is working properly, except one thing. When I use an outside method TotalLines (where I have a separate StreamReader), it works properly, but when I calculate totalLines inside the program, I get a question mark at the beginning of the first question. So it looks like this:
?What is your name?
but in the text file I am reading from, I have just: What is your name?
I have no idea why that is. Maybe the problem is that I am returning the StreamReader to the beginning? I checked my encoding and everything else, but nothing worked. Thanks for your help :)
PotentialEmployee potentialEmployee = new PotentialEmployee();
using (StreamReader InterviewQuestions = new StreamReader(text, Encoding.Unicode))
{
int totalLines = 0;
while (InterviewQuestions.ReadLine() != null)
{
totalLines++;
}
InterviewQuestions.DiscardBufferedData();
InterviewQuestions.BaseStream.Seek(0, SeekOrigin.Begin);
for (int numberOfQuestions = 0; numberOfQuestions < totalLines; numberOfQuestions++)
{
string question = InterviewQuestions.ReadLine();
Console.WriteLine(question);
string response = Console.ReadLine();
potentialEmployee.Responses.Add(question, response);
}
}
But when I do the TotalLines calculation in the outside method, the question mark does not show. Any ideas, please?
It's very likely that the file starts with a byte order mark (BOM) which is being ignored by the reader initially, but then not when you "rewind" the stream.
While you could create a new reader, or even just replace it after reading it, I think it would be better to just avoid reading the file twice to start with:
foreach (var question in File.ReadLines(text, Encoding.Unicode))
{
Console.WriteLine(question);
string response = Console.ReadLine();
potentialEmployee.Responses.Add(question, response);
}
That's shorter, simpler, more efficient code that also won't display the problem you asked about.
If you want to make sure you can read the whole file before asking any questions, that's easy too:
string[] questions = File.ReadAllLines(text, Encoding.Unicode);
foreach (var question in questions)
{
Console.WriteLine(question);
string response = Console.ReadLine();
potentialEmployee.Responses.Add(question, response);
}
Whenever you seek your stream back to the beginning, the Byte Order Mark (BOM) is not checked for again; that only happens on the first read after you create a stream reader with an Encoding specified.
In order for the BOM to be read correctly again, you need to create a new stream reader. However, you can reuse the stream if you instruct the stream reader to keep the stream open after the reader is disposed, but be sure to seek before you create a new reader.
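A minimal sketch of that idea, assuming a seekable stream and the StreamReader constructor overload that takes a leaveOpen flag. The names RewindExample and CountThenReadFirstLine are hypothetical, as is the buffer size of 1024:

```csharp
using System;
using System.IO;
using System.Text;

public static class RewindExample
{
    // Counts the lines with one reader, then rewinds and reads the first line with a fresh reader.
    public static string CountThenReadFirstLine(Stream stream, out int totalLines)
    {
        totalLines = 0;
        // leaveOpen: true keeps the underlying stream usable after this reader is disposed
        using (var counter = new StreamReader(stream, Encoding.Unicode, true, 1024, leaveOpen: true))
        {
            while (counter.ReadLine() != null)
            {
                totalLines++;
            }
        }

        // Seek first, then create the new reader, so it sees the BOM at position 0
        stream.Seek(0, SeekOrigin.Begin);
        using (var reader = new StreamReader(stream, Encoding.Unicode, true, 1024, leaveOpen: true))
        {
            return reader.ReadLine();
        }
    }
}
```

Because the second reader starts fresh at position 0, it checks for the BOM again and strips it, so the first question comes back without the stray leading character.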
string s = "aasddd??dsfas?df";
s = s.Replace('?', '\0'); // strings are immutable, so the result has to be assigned
I'm trying to get data from a CSV file served by a web service.
If I paste the URL in my browser, the CSV is downloaded and looks like the following example:
"ID","ProductName","Company"
"1","Apples","Alfreds futterkiste"
"2","Oranges","Alfreds futterkiste"
"3","Bananas","Alfreds futterkiste"
"4","Salad","Alfreds futterkiste"
...next 96 rows
However, I don't want to download the CSV file first and then extract data from it afterwards.
The web service uses pagination and returns 100 rows per request (determined by the &number parameter, with a max of 100). After the first request I can use the &next parameter to fetch the next 100 rows based on ID. For instance the URL
http://testWebservice123.com/Example.csv?auth=abc&number=100&next=100
will get me the rows with IDs 101 to 200. So if there are a lot of rows I would end up downloading a lot of CSV files and saving them to the hard drive. Instead, I want to get the data directly from the web service so that I can write it straight to a database without saving the CSV files.
After a bit of search I came up with the following solution
static void Main(string[] args)
{
string startUrl = "http://testWebservice123.com/Example.csv?auth=abc&number=100";
string url = "";
string deltaRequestParameter = "";
string lastLine;
int numberOfLines = 0;
do
{
url = startUrl + deltaRequestParameter;
WebClient myWebClient = new WebClient();
using (Stream myStream = myWebClient.OpenRead(url))
{
using (StreamReader sr = new StreamReader(myStream))
{
numberOfLines = 0;
while (!sr.EndOfStream)
{
var row = sr.ReadLine();
var values = row.Split(',');
//do whatever with the rows by now - i.e. write to console
Console.WriteLine(values[0] + " " + values[1]);
lastLine = values[0].Replace("\"", ""); //last line in the loop - get the last ID.
numberOfLines++;
deltaRequestParameter = "&next=" + lastLine;
}
}
}
} while (numberOfLines == 101); //since the header is returned each time the number of rows will be 101 until we get to the last request
}
but I'm not sure if this is an "up to date" way of doing this, or if there is a better (easier/simpler) way. In other words, I'm unsure whether using WebClient and StreamReader is the right way to go.
In this thread: how to read a csv file from a url?
WebClient.DownloadString is mentioned as well as WebRequest. But if I want to write to a database without saving the CSV to the hard drive, which is the best option?
Furthermore, will the approach I have taken save data to temporary disk storage behind the scenes, or will all data be read into memory and then disposed of when the loop completes?
I have read the following documentation but can't seem to find out what it does behind the scenes:
StreamReader: https://learn.microsoft.com/en-us/dotnet/api/system.io.streamreader?view=netframework-4.7.2
Stream: https://learn.microsoft.com/en-us/dotnet/api/system.io.stream?view=netframework-4.7.2
Edit:
I guess I could also be using the following "TextFieldParser", but my question is really still the same:
(using the Assembly Microsoft.VisualBasic)
using (Stream myStream = myWebClient.OpenRead(url))
{
using (TextFieldParser parser = new TextFieldParser(myStream))
{
numberOfLines = 0;
parser.TrimWhiteSpace = true; // if you want
parser.Delimiters = new[] { "," };
parser.HasFieldsEnclosedInQuotes = true;
while (!parser.EndOfData)
{
string[] line = parser.ReadFields();
Console.WriteLine(line[0].ToString() + " " + line[1].ToString());
numberOfLines++;
deltaRequestParameter = "&next=" + line[0].ToString();
}
}
}
The HttpClient class in System.Net.Http is available as of .NET 4.5. You have to work with async code, but it's not a bad idea to get into it if you're dealing with the web.
As sample data, I'll use jsonplaceholder's "todo" list. It provides JSON data, not CSV data, but it gives a simple enough structure that can serve our purpose in the example below.
This is the core function, which fetches from jsonplaceholder in a similar way to your "testWebService123" site, although I'm just getting the first 3 todos, as opposed to testing for when I've hit the last page (you would probably keep your do-while logic for that).
async void DownloadPagesAsync() {
    for (var i = 1; i <= 3; i++) {
        var pageToGet = $"https://jsonplaceholder.typicode.com/todos/{i}";
        using (var client = new HttpClient())
        using (HttpResponseMessage response = await client.GetAsync(pageToGet))
        using (HttpContent content = response.Content)
        // no cast needed: the concrete stream type is an implementation detail
        using (var stream = await content.ReadAsStreamAsync())
        using (var sr = new StreamReader(stream))
            while (!sr.EndOfStream) {
                var row =
                    sr.ReadLine()
                        .Replace(@"""", "")
                        .Replace(",", "");
                if (row.IndexOf(":") == -1)
                    continue;
                var values = row.Split(':');
                Console.WriteLine($"{values[0]}, {values[1]}");
            }
    }
}
This is how you would call the function, such as you would in a Main() method:
Task t = new Task(DownloadPagesAsync);
t.Start();
The new Task here takes an "action", in other words a function that returns void, as a parameter. Then you start the task. Be careful: it is asynchronous, so any code you have after t.Start() may very well run before your task completes.
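An alternative sketch, not the approach used above: if the method returns Task instead of void, the caller can await it and knows when the download has finished. The names PagedDownloader, ReadLinesAsync and DownloadLinesAsync are hypothetical, and the parsing of each row is left out:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class PagedDownloader
{
    // Reads a response stream line by line; nothing is written to disk by this code.
    public static async Task<List<string>> ReadLinesAsync(Stream stream)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(stream))
        {
            string line;
            while ((line = await reader.ReadLineAsync()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    // Fetches one page and returns its lines; the caller decides whether to request another page.
    public static async Task<List<string>> DownloadLinesAsync(HttpClient client, string url)
    {
        // ResponseHeadersRead lets the body be consumed as a stream as it arrives
        using (var response = await client.GetAsync(url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            var stream = await response.Content.ReadAsStreamAsync();
            return await ReadLinesAsync(stream);
        }
    }
}
```

From an async Main this could be called as `var lines = await PagedDownloader.DownloadLinesAsync(client, url);`, reusing a single HttpClient across pages.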
As to your question as to whether the stream reads "in memory" or not, running GetType() on "stream" in the code resulted in a "MemoryStream" type, though it seems to only be recognized as a "Stream" object at compile time. A MemoryStream is definitely in-memory. I'm not really sure if any of the other kinds of stream objects save temporary files behind the scenes, but I'm leaning towards not.
But looking into the inner workings of a class, though commendable, is not usually required to settle your worries about disposing. For any class, just see if it implements IDisposable. If it does, then put it in a "using" statement, as you have done in your code. Once control passes out of the "using" block, whether as expected or via an error, the object is disposed of properly.
HttpClient is in fact the newer approach. From what I understand, it does not replace all of the functionality for WebClient, but is stronger in many respects. See this SO site for more details comparing the two classes.
Also, something to know about WebClient is that it can be simple, but limiting. If you run into issues, you will need to look into the HttpWebRequest class, which is a "lower level" class that gives you greater access to the nuts and bolts of things (such as working with cookies).
So I'm doing a project where I am reading in a config file. The config file is just a list of strings like "D 1 1", "C 2 2", etc. Now I haven't ever done a read/write in C#, so I looked it up online expecting to find some sort of rendition of C/C++'s .eof(). I couldn't find one.
So what I have is...
TextReader tr = new StreamReader("/mypath");
Of all the examples I found online of how to read to the end of a file, the two that kept occurring were
while ((line = tr.ReadLine()) != null)
or
while (tr.Peek() >= 0)
I noticed that StreamReader has a bool EndOfStream but no one was suggesting it which led me to believe something was wrong with that solution. I ended up trying it like this...
while (!(tr as StreamReader).EndOfStream)
and it seems to work just fine.
So I guess my question is would I experience issues with casting a TextReader as a StreamReader and checking EndOfStream?
One obvious downside is that it makes your code StreamReader specific. Given that you can easily write the code using just TextReader, why not do so? That way if you need to use a StringReader (or something similar) for unit tests etc, there won't be any difficulties.
Personally I always use the "read a line until it's null" approach - sometimes via an extension method so that I can use
foreach (string line in reader.EnumerateLines())
{
}
EnumerateLines would then be an extension method on TextReader using an iterator block. (This means you can also use it for LINQ etc easily.)
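The EnumerateLines extension is described above but not shown. A minimal sketch of what it might look like, assuming a container class named TextReaderExtensions (a hypothetical name):

```csharp
using System.Collections.Generic;
using System.IO;

public static class TextReaderExtensions
{
    // Lazily yields lines until ReadLine returns null, which marks the end of the input.
    public static IEnumerable<string> EnumerateLines(this TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}
```

Since it is written against TextReader, the same call works for a StreamReader over a file and for a StringReader in a unit test.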
Or you could use ReadAllLines, to simplify your code:
http://msdn.microsoft.com/en-us/library/s2tte0y1.aspx
This way, you let .NET take care of all the EOF/EOL management, and you focus on your content.
No, you won't experience any issues. If you look at the implementation of EndOfStream, you'll find that it just checks whether there is still data in the buffer and, if not, whether it can read more data from the underlying stream:
public bool EndOfStream
{
get
{
if (this.stream == null)
{
__Error.ReaderClosed();
}
if (this.charPos < this.charLen)
{
return false;
}
int num = this.ReadBuffer();
return num == 0;
}
}
Of course, casting in your code like that makes it dependent on StreamReader being the actual type of your reader, which isn't pretty to begin with.
Maybe read it all into a string and then parse it: StreamReader.ReadToEnd()
using (StreamReader sr = new StreamReader(path))
{
    //This allows you to do one Read operation.
    string contents = sr.ReadToEnd();
}
Well, StreamReader is a specialisation of TextReader, in the sense that StreamReader inherits from TextReader. So there shouldn't be a problem. :)
var arpStream = ExecuteCommandLine(cmd, arg);
arpStream.ReadLine(); // Read entries
while (!arpStream.EndOfStream)
{
var line1 = arpStream.ReadLine().Trim();
// TeststandInt.SendLogPrint(line, true);
}
I am sending mails (in ASP.NET, C#), using a template in a text file (.txt) like the one below
User Name :<User Name>
Address : <Address>.
I used to replace the words within the angle brackets in the text file using the code below
StreamReader sr;
sr = File.OpenText(HttpContext.Current.Server.MapPath(txt));
copy = sr.ReadToEnd();
sr.Close(); //close the reader
copy = copy.Replace(word.ToUpper(),"#" + word.ToUpper()); //remove the word specified UC
//save new copy into existing text file
FileInfo newText = new FileInfo(HttpContext.Current.Server.MapPath(txt));
StreamWriter newCopy = newText.CreateText();
newCopy.WriteLine(copy);
newCopy.Write(newCopy.NewLine);
newCopy.Close();
Now I have a new problem:
the user will be adding new words within angle brackets, e.g. they will be adding <Salary>.
In that case I have to read the file and find the word <Salary>.
In other words, I have to find all the words that are located within angle brackets (<>).
How do I do that?
Having a stream for your file, you can build something similar to a typical tokenizer.
In general terms, this works as a finite state machine: you need an enumeration for the states (in this case could be simplified down to a boolean, but I'll give you the general approach so you can reuse it on similar tasks); and a function implementing the logic. C#'s iterators are quite a fit for this problem, so I'll be using them on the snippet below. Your function will take the stream as an argument, will use an enumerated value and a char buffer internally, and will yield the strings one by one. You'll need this near the start of your code file:
using System.Collections.Generic;
using System.IO;
using System.Text;
And then, inside your class, something like this:
enum States {
    OUT,
    IN,
}
IEnumerable<string> GetStrings(TextReader reader) {
    States state = States.OUT;
    // initialized up front so the compiler can prove it is assigned before use
    StringBuilder buffer = new StringBuilder();
    int ch;
    while ((ch = reader.Read()) >= 0) {
        switch (state) {
            case States.OUT:
                if (ch == '<') {
                    state = States.IN;
                    buffer = new StringBuilder();
                }
                break;
            case States.IN:
                if (ch == '>') {
                    state = States.OUT;
                    yield return buffer.ToString();
                } else {
                    // Read() returns a UTF-16 code unit, so a plain cast is enough
                    buffer.Append((char)ch);
                }
                break;
        }
    }
}
The finite-state machine model always has the same layout: while(READ_INPUT) { switch(STATE) {...}}: inside each case of the switch, you may be producing output and/or altering the state. Beyond that, the algorithm is defined in terms of states and state changes: for any given state and input combination, there is an exact new state and output combination (the output can be "nothing" on those states that trigger no output; and the state may be the same old state if no state change is triggered).
Hope this helps.
EDIT: forgot to mention a couple of things:
1) You get a TextReader to pass to the function by creating a StreamReader for a file, or a StringReader if you already have the file contents in a string.
2) The memory and time costs of this approach are O(n), with n being the length of the file. They seem quite reasonable for this kind of task.
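The answer notes that the two states could be simplified down to a boolean. A minimal sketch of that variant, with hypothetical names (PlaceholderScanner, FindPlaceholders) and a StringReader standing in for the template file:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class PlaceholderScanner
{
    // Yields the text found between each '<' and the next '>'.
    public static IEnumerable<string> FindPlaceholders(TextReader reader)
    {
        bool inside = false;            // true while between '<' and '>'
        var buffer = new StringBuilder();
        int ch;
        while ((ch = reader.Read()) >= 0)
        {
            if (!inside)
            {
                if (ch == '<')
                {
                    inside = true;
                    buffer.Clear();     // start collecting a new word
                }
            }
            else if (ch == '>')
            {
                inside = false;
                yield return buffer.ToString();
            }
            else
            {
                buffer.Append((char)ch);
            }
        }
    }
}
```

For the template in the question, `PlaceholderScanner.FindPlaceholders(new StringReader(template))` would yield "User Name" and "Address", and a newly added <Salary> would be picked up without any code change.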
Using regex.
var matches = Regex.Matches(text, "<(.*?)>");
List<string> words = new List<string>();
for (int i = 0; i < matches.Count; i++)
{
words.Add(matches[i].Groups[1].Value);
}
Of course, this assumes you already have the file's text in a variable. Since you have to read the entire file to achieve that, you could look for the words as you are reading the stream, but I don't know what the performance trade off would be.
This is not an answer, but comments can't do this:
You should place some of your objects into using blocks. Something like this:
using(StreamReader sr = File.OpenText(HttpContext.Current.Server.MapPath(txt)))
{
copy = sr.ReadToEnd();
} // reader is closed by the end of the using block
//remove the word specified UC
copy = copy.Replace(word.ToUpper(), "#" + word.ToUpper());
//save new copy into existing text file
FileInfo newText = new FileInfo(HttpContext.Current.Server.MapPath(txt));
using(var newCopy = newText.CreateText())
{
newCopy.WriteLine(copy);
newCopy.Write(newCopy.NewLine);
}
The using block ensures that resources are cleaned up even if an exception is thrown.