In HttpContent, why is ReadFromJsonAsync an async method? - c#

I'm having a doubt about HttpContent.ReadFromJsonAsync (link)
A common way of making a request to an endpoint looks like this:
var response = await Http.GetAsync(path...);
if (!response.IsSuccessStatusCode)
{
[do something]
}
else
{
myObject = (await response.Content.ReadFromJsonAsync<MyObject>())!;
}
I am having a hard time understanding why, when I want to get the object, it is necessary to perform another await operation. In my head, I already got the response in the GetAsync method, and all that is missing is to deserialize the object. I understand that the await is not related to converting JSON to an object but is a network thing.
I tried to find out the reason for this behaviour in the official MS doc, but I couldn't find anything.
Searching on Google, I found that even though the content of the response has already been received by the time ReadFromJsonAsync is called, the method still needs to read the content of the response from the network and parse it in order to deserialize it into the specified object type.
But I cannot figure out why this is necessary, nor "where" the content is "waiting to be read". I know that response.Content.ReadFromJsonAsync doesn't make a new network request, so what's going on in the back?
Is it temporarily stored in some socket? (Or is it nonsense to think so?) Is there a time limit for reading it?
Thanks!

There are, for example, overloads of HttpClient.GetAsync that accept an HttpCompletionOption parameter, which allows GetAsync to complete as soon as the response headers have been read, while the content of the response has not yet been completely received.
Therefore, ReadFromJsonAsync being async makes sense: reading HttpResponseMessage.Content can be an I/O-bound operation that includes waiting for and receiving the rest of the response body.
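As a rough sketch (the path argument and the MyObject type are placeholders, not from the question), the headers-only completion option and the second await look like this:

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text;
using System.Threading.Tasks;

public record MyObject(int Id, string Name); // placeholder DTO, not from the question

public static class Example
{
    public static async Task<MyObject?> GetAsync(HttpClient http, string path)
    {
        // Complete as soon as the status line and headers arrive;
        // the body may still be in flight on the network.
        using var response = await http.GetAsync(path, HttpCompletionOption.ResponseHeadersRead);
        if (!response.IsSuccessStatusCode)
            return null;

        // This await can genuinely wait on the network, because the
        // body was not fully buffered when GetAsync returned.
        return await response.Content.ReadFromJsonAsync<MyObject>();
    }
}
```

With the default HttpCompletionOption.ResponseContentRead, the body is already buffered by the time ReadFromJsonAsync runs, so the await completes synchronously; the method is async so it also covers the headers-only case.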

HTTP breaks the response up into multiple frames, and makes a clear distinction between the metadata and the response data. The first set of frames contain the status code and headers, and you can quite often decide what to do with the response based on this information alone.
Mozilla documentation for HTTP Messages
In C#, the response object can be returned from a request as soon as all of the header frames have been received.
By default, the GetAsync method will wait for all the response data to be returned. However there are overloads that allow you to start processing the response as soon as the headers are received.
Why not just wait for all the data in the first place?
The response content could be massive! Imagine you want to download a 4 GB image and save it to a file on the local PC. If the HTTP implementation waited for all the data frames to be received, you would end up using at least 4 GB of RAM to buffer the data.
Instead of waiting for the data frames, the content is exposed through a Stream. Data is appended to a buffer, and is read from the buffer by the application on demand. Reading from the stream is an asynchronous operation, because you may be waiting for more frames to be received. The key difference here is that the buffer can have a relatively small size limit. If the response contains more data than can fit into the buffer, you'll have to read from the stream multiple times - which is normal use of the Stream API.
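That on-demand reading can be sketched with any Stream; below, a MemoryStream stands in for the network buffer, and the tiny 8-byte buffer is only there to make the repeated reads visible:

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class StreamDemo
{
    // Consumes an entire stream through a small fixed-size buffer,
    // the same way response content is drained from the network buffer.
    public static async Task<byte[]> ReadAllAsync(Stream source, int bufferSize = 8)
    {
        var buffer = new byte[bufferSize];
        using var collected = new MemoryStream();
        int read;
        // Each ReadAsync may complete immediately (data already buffered)
        // or asynchronously (waiting for more frames to arrive) - hence await.
        while ((read = await source.ReadAsync(buffer, 0, buffer.Length)) > 0)
            collected.Write(buffer, 0, read);
        return collected.ToArray();
    }
}
```

With a real response you would pass response.Content.ReadAsStreamAsync() as the source; the loop shape is identical.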

Related

Does Kafka allow reading a message content asynchronously?

Does anybody know whether the Kafka clients allow sending and reading the content of a message in an asynchronous way?
I am currently using the Confluent.Kafka producers and consumers in C#, which allow making an async call containing the whole message payload. However, it would be interesting to publish the value of a message with content of several MBs asynchronously, and to be able to read it asynchronously as well, instead of just receiving the message in one shot.
using (var producer = new ProducerBuilder<string, string>(config).Build())
{
await producer.ProduceAsync(_topic, new Message<string, string> { Key = _file, Value = <pass async content here> });
}
Is there any way of achieving this?
Thanks
The producer needs to flush the event and send it to the broker, which writes it to disk and (optionally) acks the entire record before consumers can read it.
If you'd like to stream chunks of files, then you should send them as binary, but you will need to chunk them yourself and deal with potential ordering problems in the consumer (e.g. two clients streaming the same filename, your key, at the same time, with interwoven values).
The recommendation for dealing with files (i.e. large binary content) is to not send them through Kafka, but rather upload them to a shared filesystem, then send the URI as a string through an event.
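If you do go the chunking route, the producer-side split can be sketched as a pure function (the Chunk record and its field names are hypothetical, not part of Confluent.Kafka); each chunk would then be sent with ProduceAsync, and the sequence numbers let the consumer reassemble the file and detect interleaving:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical chunk shape: the key carries the file name, and the
// sequence/total numbers let the consumer reorder and reassemble.
public record Chunk(string Key, int Sequence, int Total, byte[] Value);

public static class FileChunker
{
    public static List<Chunk> Split(string fileKey, byte[] payload, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        int total = (payload.Length + chunkSize - 1) / chunkSize; // ceiling division
        var chunks = new List<Chunk>();
        for (int i = 0; i < total; i++)
        {
            int offset = i * chunkSize;
            int length = Math.Min(chunkSize, payload.Length - offset);
            var slice = new byte[length];
            Array.Copy(payload, offset, slice, 0, length);
            chunks.Add(new Chunk(fileKey, i, total, slice));
        }
        return chunks;
    }
}
```

Each Chunk would map to one Message&lt;string, byte[]&gt; in the producer loop; the shared-filesystem-plus-URI approach above avoids all of this bookkeeping.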

How to crawl XML(s) very fast — considering the below networking limitations?

I have a .Net crawler that's running when the user makes a request (so, it needs to be fast). It crawls 400+ links in real time. (This is the business ask.)
The problem: I need to detect if a link is xml (think of rss or atom feeds) or html. If the link is xml then I continue with processing, but if the link is html I can skip it. Usually, I have 2 xml(s) and 398+ html(s). Currently, I have multiple threads going but the processing is still slow, usually 75 seconds running with 10 threads for 400+ links, or 280 seconds running with 1 thread. (I want to add more threads but see below..)
The challenge that I am facing is that I read the streams as follows:
var request = WebRequest.Create(requestUriString: uri.AbsoluteUri);
// ....
var response = await request.GetResponseAsync();
//....
using (var reader = new StreamReader(stream: response.GetResponseStream(), encoding: encoding)) {
char[] buffer = new char[1024];
int charsRead = await reader.ReadAsync(buffer: buffer, index: 0, count: 1024);
responseText = new string(value: buffer, startIndex: 0, length: charsRead);
}
// parse first bytes of responseText to check if xml
The problem is that my optimization to read only 1024 characters is quite useless, because as far as I can see GetResponseAsync downloads the entire stream anyway.
(The other option that I have is to look at the Content-Type header, but that's quite similar AFAIK because I get the content anyway - in case you don't recommend using OPTIONS, which I have not used so far - and in addition the XML might have an incorrectly marked content type (?) and I would miss some content.)
If there is any optimization that I am missing please help, as I am running out of ideas.
(I do consider optimizing this design by spreading the load across multiple servers, so that I balance the network with the parallelism, but that's a bit of a change from the current architecture that I cannot afford at this point in time.)
Using HEAD requests could speed up the requests significantly, IF you can rely on the Content-Type.
e.g.
HttpClient client = new HttpClient();
HttpResponseMessage response = await client.SendAsync(new HttpRequestMessage() { Method = HttpMethod.Head, RequestUri = new Uri(url) });
Just showing basic usage; url is the link being checked. Obviously you need to add anything else required to the request.
Also just to note that even with 10 threads, 400 requests will likely always take quite a while. 400/10 means 40 requests sequentially. Unless the requests are to servers close by, 200ms would be a good response time, meaning a minimum of 8 seconds. Overseas servers that may be slow could easily push this out to 30-40 seconds of unavoidable delay, unless you increase the number of threads to parallelize more of the requests.
Dataflow (Task Parallel Library) can be very helpful for writing parallel pipelines, with a convenient MaxDegreeOfParallelism property for easily adjusting how many parallel instances can run.
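As a sketch (this assumes the System.Threading.Tasks.Dataflow NuGet package; checkLink is a stand-in for whatever HEAD request or content sniffing you use, injected here so the pipeline is testable without a network), an ActionBlock caps the number of in-flight requests:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CrawlPipeline
{
    // Runs checkLink over many URLs with a bounded degree of parallelism
    // and collects the URLs that pass the check (e.g. "looks like XML").
    public static async Task<ConcurrentBag<string>> FilterAsync(
        IEnumerable<string> urls,
        Func<string, Task<bool>> checkLink,
        int maxParallel = 10)
    {
        var hits = new ConcurrentBag<string>();
        var block = new ActionBlock<string>(async url =>
        {
            if (await checkLink(url)) hits.Add(url);
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        foreach (var url in urls) block.Post(url);
        block.Complete();           // no more input
        await block.Completion;     // wait for all in-flight checks
        return hits;
    }
}
```

Because the work is I/O-bound, MaxDegreeOfParallelism can be set far above the thread count (e.g. 50-100) without needing 50-100 threads, which is where most of the speedup over a fixed thread pool comes from.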

C#, clear HttpListenerContext.Response.OutputStream

I have a web-service using HttpListener.
I have noticed this thing:
HttpListenerContext context = listener.GetContext();
...
context.Response.StatusCode = 200;
context.Response.OutputStream.Write(buffer, 0, bufferSize);
context.Response.StatusCode = 500;
context.Response.OutputStream.Close();
A client in this case receives status code 200, so once I have written some data to the output network stream I can't change the status code, as, I suppose, it has already been written to the response stream.
What I want: after I have started writing a response to the output stream, in some cases I want to "abort and reset" the response, clear the output stream (so the client won't receive any data in the HTTP response body), and change the status code.
I have no idea how to clear the output stream and change the status code. These two lines below won't help, they throw exceptions.
context.Response.OutputStream.SetLength(0);
context.Response.OutputStream.Position = 0;
I suppose that the program writes the buffered data to the network device after I call context.Response.OutputStream.Close(), and until then the data is stored in RAM and we can reset it, can't we?
EDIT: It seems that writing into context.Response.OutputStream sometimes takes too much time, in some cases from 100 to 1000 ms... That's why I would like to just interrupt the writing, if possible.
You could use a MemoryStream to cache the answer, and once you are sure it is complete, set the status to 200 and return it (e.g. with Stream.CopyTo).
You can't "clear" the OutputStream, since it isn't stored (for long), instead it is sent right away to the client, so you can't edit it anymore.
Apart from that, HTTP does not offer a way to gracefully say "DATADATADATA... oh, forget that, this was wrong, use status code 500 instead." You can only try to kill the TCP connection (TCP RST instead of TCP FIN) and hope that the client handles the failure to continue reading in a suitable way, after it has probably already started to process the data you've sent.
Try context.Response.Abort() before closing, this won't allow you to set a status code, but will at least communicate that something went wrong.
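The MemoryStream approach can be sketched like this; the output parameter stands in for context.Response.OutputStream, and the returned int is the status code you would assign before copying (both are illustrative simplifications, not HttpListener API):

```csharp
using System;
using System.IO;

public static class BufferedResponse
{
    // Build the whole body in memory first; only if that succeeds is
    // anything copied to the real output stream. On failure the status
    // code has not been committed to the wire yet, so it can still change.
    public static int Send(Stream output, Action<Stream> writeBody)
    {
        var buffer = new MemoryStream();
        try
        {
            writeBody(buffer); // the slow / failure-prone part runs here
        }
        catch
        {
            return 500; // nothing reached the client, so 500 is still possible
        }
        buffer.Position = 0;
        buffer.CopyTo(output); // commit: from here on the response is final
        return 200;
    }
}
```

With a real HttpListener you would set context.Response.StatusCode (and ContentLength64) from the buffered result, and only then copy the MemoryStream into context.Response.OutputStream.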

How to determine if an HTTP response is complete

I am working on building a simple proxy which will log certain requests which are passed through it. The proxy does not need to interfere with the traffic being passed through it (at this point in the project), and so I am trying to do as little parsing of the raw request/response as possible during the process (the request and response are pushed off to a queue to be logged outside of the proxy).
My sample works fine, except that I cannot reliably tell when the "response" is complete, so I have connections left open for longer than needed. The relevant code is below:
var request = getRequest(url);
byte[] buffer;
int bytesRead = 1;
var dataSent = false;
var timeoutTicks = DateTime.Now.AddMinutes(1).Ticks;
Console.WriteLine(" Sending data to address: {0}", url);
Console.WriteLine(" Waiting for response from host...");
using (var outboundStream = request.GetStream()) {
while (request.Connected && (DateTime.Now.Ticks < timeoutTicks)) {
while (outboundStream.DataAvailable) {
dataSent = true;
buffer = new byte[OUTPUT_BUFFER_SIZE];
bytesRead = outboundStream.Read(buffer, 0, OUTPUT_BUFFER_SIZE);
if (bytesRead > 0) { _clientSocket.Send(buffer, bytesRead, SocketFlags.None); }
Console.WriteLine(" pushed {0} bytes to requesting host...", bytesRead);
}
if (request.Connected) { Thread.Sleep(0); }
}
}
Console.WriteLine(" Finished with response from host...");
Console.WriteLine(" Disconnecting socket");
_clientSocket.Shutdown(SocketShutdown.Both);
My question is whether there is an easy way to tell that the response is complete without parsing headers. Given that this response could be anything (encoded, encrypted, gzip'ed, etc.), I don't want to have to decode the actual response to get the length and determine whether I can disconnect my socket.
As David pointed out, connections should remain open for a period of time. You should not close connections unless the client side does that (or the keep-alive interval expires).
Changing to HTTP/1.0 will not work since you are a server, and it's the client that specifies HTTP/1.1 in the request. Sure, you can send an error message with HTTP/1.0 as the version and hope that the client changes to 1.0, but it seems inefficient.
HTTP messages look like this:
REQUEST LINE
HEADERS
(empty line)
BODY
The only way to know when a response is done is to look for the Content-Length header. Simply search for "Content-Length:" in the response buffer and extract everything up to the line feed (but trim the found value before converting it to int).
The other alternative is to use the parser in my webserver to get all headers. It should be quite easy to use just the parser and nothing more from the library.
Update: There is a better parser here: HttpParser.cs
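The search-and-trim step described above can be sketched as follows (GetContentLength is an illustrative name; it returns -1 when the header is absent, which is what you would see for e.g. chunked responses, where this approach alone is not enough):

```csharp
using System;

public static class HeaderSniffer
{
    // Finds Content-Length in a raw header block: locate the header name,
    // take everything up to the line feed, trim, then convert to a number.
    public static long GetContentLength(string rawHeaders)
    {
        const string name = "Content-Length:";
        int start = rawHeaders.IndexOf(name, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return -1; // header absent (e.g. chunked transfer)
        start += name.Length;
        int end = rawHeaders.IndexOf('\n', start);
        if (end < 0) end = rawHeaders.Length;
        return long.Parse(rawHeaders.Substring(start, end - start).Trim());
    }
}
```

The proxy would then count body bytes after the blank line and close (or recycle) the connection once that many have been relayed.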
If you make an HTTP/1.0 request instead of 1.1, the server should close the connection as soon as it's through, since it doesn't need to keep the connection open for another request.
Other than that, you really need to parse the content length header in the response to get the best value.
Using blocking IO and multiple threads might be your answer. Specifically
using (var response = request.GetResponse())
using (var stream = response.GetResponseStream())
using (var reader = new StreamReader(stream))
    data = reader.ReadToEnd();
This is for textual data, however binary handling is similar.

SL 4: simple file upload, why is this not working?

I am not getting any exception in the following code; however, I also don't see the file which is supposed to be uploaded to the server (in this case localhost) - could someone please point out the mistake?
As an add-on, I need a simple Silverlight file uploader with a progress bar, but I am having a really hard time trying to use the ones on CodePlex. Does anyone here have a good one for SL4?
public FileStream MyFS { get; set; }
private void UploadFile()
{
FileStream _data; // The file stream to be read
_data = MyFS;
string uploadUri;
uploadUri = @"http://localhost/MyApplication/Upload/Images/testXRay.gif";
byte[] fileContent = new byte[_data.Length]; // Read the contents of the stream into a byte array
int dataLength = (int)_data.Length;
_data.Read(fileContent, 0, dataLength);
WebClient wc = new WebClient();
wc.OpenWriteCompleted += new OpenWriteCompletedEventHandler(wc_OpenWriteCompleted);
Uri u = new Uri(uploadUri);
wc.OpenWriteAsync(u, null, fileContent); // Upload the file to the server
}
void wc_OpenWriteCompleted(object sender, OpenWriteCompletedEventArgs e) // The upload completed
{
if (e.Error == null)
{
// Upload completed without error
}
}
Thanks,
Voodoo
You are trying to write to a server URL that is an image, not a service:
uploadUri = @"http://localhost/MyApplication/Upload/Images/testXRay.gif";
...
Uri u = new Uri(uploadUri);
wc.OpenWriteAsync(u, null, fileContent);
You can't just write a file (via HTTP) to a webserver like that. The receiving URL needs to be a web service designed to accept the incoming byte stream.
I am sure there are better examples about, but try this link first.
Another problem with your code is that you haven't tried to write the file at all.
This line doesn't do what you think:
wc.OpenWriteAsync(u, null, fileContent); // Upload the file to the server
The call signature is OpenWriteAsync(URI, HTTPMETHOD, UserToken).
Let me break that down a little. URI I think you have. The HTTPMETHOD lets you set whether you are doing a POST or a GET; probably you want a POST. Finally, the last item isn't for pushing the file content. It is more of a state variable so you can keep track of the request (more on this in a moment).
The way the HTTP stack works in Silverlight is that everything is asynchronous. So in your case you are setting up a request and then telling the runtime that you want to write some data to it. That is what your call does: it sets up the request (which may all happen on a background thread, not the thread where the UI gets updated). Once this is set up, it calls your callback event with a stream which you can write to. One of the things it sends back to you is that state variable (the UserToken), which lets you know which request it responded to (which means that you could send multiple files to the server at the same time).
It also exposes a few other events that you can use to see if everything worked OK (for example, you can get the response from your call and see what the status code was, which will tell you whether everything was successful). BTW, with every callback it sends that UserToken variable, so your app can keep track of which request is being responded to (if more than one is going on at once).
The links that the last guy provided should help you out some. He is right, too: you need something on the server set up to respond to the request, or rather you typically want to do this. You can set up a folder to allow data to be pushed directly to it, but honestly you don't want to do this, as you would be opening up your server for hackers to exploit.
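To tie the signature back to the question's code, here is a sketch of the corrected call, assuming serviceUri points at a server-side handler that actually accepts the POST (WebClient is shown because that's what the question uses; in modern .NET it is superseded by HttpClient):

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using System.Threading.Tasks;

public static class Uploader
{
    // fileContent travels through the userToken only so the callback can
    // find it again; the actual bytes are written to e.Result, the stream
    // the runtime hands back once the request is set up.
    public static void Upload(Uri serviceUri, byte[] fileContent)
    {
        var wc = new WebClient();
        wc.OpenWriteCompleted += (sender, e) =>
        {
            if (e.Error != null) return;      // inspect/log the error in real code
            var data = (byte[])e.UserState!;  // recover our token
            using (Stream body = e.Result)    // the request body stream
                body.Write(data, 0, data.Length);
            // closing the stream sends the request
        };
        wc.OpenWriteAsync(serviceUri, "POST", fileContent); // (uri, method, token)
    }
}
```

Note the difference from the question: fileContent is passed as the UserToken and written inside OpenWriteCompleted, rather than being passed where the HTTP method string belongs.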
