Assume I have the following Observable. (Note that the parsing logic lives in a different layer, and should be testable, so it must remain a separate method. Note also that the real loop is parsing XML and has various branching and exception handling).
IObservable<string> GetLinesAsync(StreamReader r)
{
return Observable.Create<string>(subscribeAsync: async (observer, ct) =>
{
//essentially force a continuation/callback to illustrate my problem
await Task.Delay(5);
while (!ct.IsCancellationRequested)
{
string readLine = await r.ReadLineAsync();
if (readLine == null)
break;
observer.OnNext(readLine);
}
});
}
I would like to use this, for example with another Observable that produces the StreamReader, as in the below, but in any case I cannot get the disposal to work.
[TestMethod]
public async Task ReactiveTest()
{
var filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Windows), "win.ini");
var source1 = Observable.Using(
() => File.OpenRead(filePath),
readFile => Observable.Using(() => new StreamReader(readFile),
reader => Observable.Return(reader)
)
);
//"GetLinesAsync" already exists. How can I use it?
var combined = source1
.SelectMany(GetLinesAsync);
int count = await combined.Count();
}
If you run this a few times (e.g. with breakpoints, etc), you should see that it blows up because the TextReader is closed. (In my actual problem it happens sporadically on ReadLineAsync but the Task.Delay makes it happen much more easily). Apparently the asynchronous nature causes the first observable to dispose the stream, and only after that does the continuation occur, and of course at that point the stream is already closed.
So:
Is the first disposable, with the nested Usings, set up right? I tried it other ways (see below*).
Is that the right way to do an async Observable (i.e. GetLinesAsync)? Is there anything else I need to do for that?
Is this a proper way to chain the observables together? Assume GetLinesAsync already exists and, if possible, its signature shouldn't be changed (e.g. to take in IObservable<StreamReader>).
If this is the right way to glue together the observables, is there any way to get it working with async usage?
*This was another way I set up the first observable:
var source3 = Observable.Create<StreamReader>(observer =>
{
FileStream readFile = File.OpenRead(filePath);
StreamReader reader = new StreamReader(readFile);
observer.OnNext(reader);
observer.OnCompleted();
return new CompositeDisposable(readFile, reader);
});
You really need to make good use of the Defer and Using operators here.
Using is specifically for the case where you have a disposable resource that you would like to have created and finally disposed of when the subscription starts and completes respectively.
Defer is a way to ensure that you always create a new pipeline whenever you have a new subscription (read more on MSDN)
Your second approach is the way to go. You got this 100% right:
Observable.Using(
() => File.OpenRead(filePath),
readFile =>
Observable.Using(
() => new StreamReader(readFile),
reader =>
This will open and dispose of the resources at the correct time for each.
It's what goes before this block of code and what's after the reader => that you need to fix.
After the reader => is this:
Observable
.Defer(() => Observable.FromAsync(() => reader.ReadLineAsync()))
.Repeat()
.TakeWhile(x => x != null)));
That's the idiomatic way for Rx to read from a stream until completion.
The "before" block is just another Defer to ensure that you compute Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Windows), "win.ini") with each new subscriber. It's not necessary in this case because we know that the filePath won't change, but it's good practice and quite probably crucial when this value can change.
Here's the full code:
public async Task ReactiveTest()
{
IObservable<string> combined =
Observable.Defer(() =>
{
var filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Windows), "win.ini");
return
Observable.Using(
() => File.OpenRead(filePath),
readFile =>
Observable.Using(
() => new StreamReader(readFile),
reader =>
Observable
.Defer(() => Observable.FromAsync(() => reader.ReadLineAsync()))
.Repeat()
.TakeWhile(x => x != null)));
});
int count = await combined.Count();
}
I've tested it and it works superbly.
Given that you have a fixed signature for GetLines you can do this:
public IObservable<string> GetLines(StreamReader reader)
{
return Observable
.Defer(() => Observable.FromAsync(() => reader.ReadLineAsync()))
.Repeat()
.TakeWhile(x => x != null);
}
public async Task ReactiveTest()
{
IObservable<string> combined =
Observable.Defer(() =>
{
var filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Windows), "win.ini");
return
Observable.Using(
() => File.OpenRead(filePath),
readFile =>
Observable.Using(
() => new StreamReader(readFile),
GetLines));
});
int count = await combined.Count();
}
It also works and was tested.
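Because GetLines only needs a StreamReader, it can also be exercised without touching the file system, which helps with the layering and testability concern in the question. The following is a minimal sketch, assuming the System.Reactive package is referenced; GetLinesSketch and ReadAllAsync are hypothetical names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Reactive.Linq;
using System.Text;
using System.Threading.Tasks;

public static class GetLinesSketch
{
    // Same shape as GetLines above: one ReadLineAsync call per subscription,
    // re-subscribed by Repeat until the reader returns null.
    public static IObservable<string> GetLines(StreamReader reader) =>
        Observable
            .Defer(() => Observable.FromAsync(() => reader.ReadLineAsync()))
            .Repeat()
            .TakeWhile(x => x != null);

    // Hypothetical helper: feeds GetLines from an in-memory stream instead of a file.
    public static async Task<IList<string>> ReadAllAsync(string text)
    {
        using var stream = new MemoryStream(Encoding.UTF8.GetBytes(text));
        using var reader = new StreamReader(stream);
        // The reader stays open until the awaited sequence has completed.
        return await GetLines(reader).ToList();
    }
}
```

Calling GetLinesSketch.ReadAllAsync("a\nb\nc") should yield the three lines in order, since the reader is disposed only after the awaited sequence completes.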
The problem you are having is that your sequences return a single item, the reader. Making use of the reader requires the file stream to be open. Unfortunately, the file stream is closed immediately after the stream reader is created:
StreamReader reader is created
OnNext(reader) is called
using block exits, disposing of stream
OnCompleted is called, terminating the subscription
Oops!
To fix this, you must tie the lifetime of the StreamReader to the lifetime of the consumer rather than the producer. The original fault occurs because Observable.Using disposes the resource as soon as OnCompleted is called upon the source.
// Do not dispose of the reader when it is created
var readerSequence = Observable.Return(new StreamReader(ms));
var combined = readerSequence
.Select(reader =>
{
return Observable.Using(() => reader, resource => GetLines(resource));
})
.Concat();
I'm not a massive fan of this, as you now rely on your consumer cleaning up each StreamReader, but I have yet to formulate a better way!
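The ordering described in this answer can be observed without any file at all. The following is a rough sketch, assuming the System.Reactive package; UsingLifetimeSketch is a hypothetical name, and the logging disposable is only a stand-in for the FileStream/StreamReader pair.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Disposables;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class UsingLifetimeSketch
{
    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();

        // Stand-in resource: it only records the moment it is disposed.
        var source = Observable.Using(
            () => Disposable.Create(() => log.Add("resource disposed")),
            _ => Observable.Return("resource"));

        await source
            .SelectMany(r => Observable.FromAsync(async () =>
            {
                // Forces a continuation, like Task.Delay(5) in the question.
                await Task.Delay(50);
                log.Add("resource used");
                return r;
            }))
            .Count();

        return log;
    }
}
```

In this sketch the log should read "resource disposed" followed by "resource used": Observable.Return completes as soon as it has emitted, so Using releases the resource before the asynchronous consumer resumes.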
So far this is the only thing that has worked while allowing me to keep using GetLinesAsync:
//"GetLinesAsync" already exists. How can I use it?
[TestMethod]
public async Task ReactiveTest2()
{
var combined2 = Observable.Create<string>(async observer =>
{
var filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Windows), "win.ini");
using (FileStream readFile = File.OpenRead(filePath))
{
using (StreamReader reader = new StreamReader(readFile))
{
await GetLinesAsync(reader)
.ForEachAsync(result => observer.OnNext(result));
}
}
});
int count = await combined2.Count();
}
This does not work reliably:
[TestMethod]
public async Task ReactiveTest3()
{
var filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Windows), "win.ini");
var source1 = Observable.Defer(() => Observable.Using(
() => File.OpenRead(filePath),
readFile => Observable.Using(() => new StreamReader(readFile),
reader => Observable.Return(reader)
)
));
//"GetLines" already exists. How can I use it?
var combined = source1
.SelectMany(reader => Observable.Defer(() => GetLinesAsync(reader)));
int count = await combined.Count();
}
It only seems to work if there's one Observable, as per Enigmativity's solution:
[TestMethod]
public async Task ReactiveTest4()
{
var filePath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Windows), "win.ini");
var source1 = Observable.Using(
() => File.OpenRead(filePath),
readFile => Observable.Using(() => new StreamReader(readFile),
reader => GetLinesAsync(reader)
)
);
int count = await source1.Count();
}
But I haven't found a way to preserve the separation between the two Observables (which I do for layering and unit test purposes) and have it work right, so I don't consider the question answered.
I have gone through many Stack Overflow threads but I am still not sure why this is taking so long. I am using VS 2022; it's a console app targeting .NET 6.0. There is just one file, Program.cs, which contains both the function and the call to it.
I am making a GET call to an external API. Since that API returns 10000+ rows, I am trying to call my method that calls this API 2-3 times in parallel. I also update the concurrent collection declared at the top, which I then query with LINQ to show some summaries on the UI.
The same external GET call takes less than 30 seconds in Postman, but my app takes forever.
Here is my code. Right now this entire code is in Program.cs of a Console application. I plan to move the GetAPIData() method to a class library after this works.
static async Task GetAPIData(string url, ConcurrentBag<Stats> bag, int taskNumber)
{
var client = new HttpClient();
var serializer = new JsonSerializer();
client.Timeout = TimeSpan.FromMilliseconds(Timeout.Infinite);
var bearerToken = "xxxxxxxxxxxxxx";
client.DefaultRequestHeaders.Add("Authorization", $"Bearer {bearerToken}");
using (var stream = await client.GetStreamAsync(url).ConfigureAwait(false))
using (var sr = new StreamReader(stream))
using (JsonTextReader reader = new JsonTextReader(sr))
{
reader.SupportMultipleContent = true;
while (reader.Read())
{
if (reader.TokenType == JsonToken.StartObject)
{
bag.Add(serializer.Deserialize<Stats?>(reader));
}
}
}
}
My calling code:
var taskNumber=1;
var url = "https://extrenalsite/api/stream";
var bag = new ConcurrentBag<Stats>();
var task1 = Task.Run(() => GetAPIData(url, bag, taskNumber++));
var task2 = Task.Run(() => GetAPIData(url, bag, taskNumber++));
var task3 = Task.Run(() => GetAPIData(url, bag, taskNumber++));
await Task.WhenAll(task1, task2, task3);
Please let me know why this takes so long to execute even though I have spawned 3 threads.
Thanks.
If it is not necessary to immediately write to the collection, you can improve performance by collecting the results locally in the task method. This does not need to do thread synchronisation and will therefore be faster:
static async Task<List<Stats>> GetAPIData(string url, int taskNumber)
{
var list = new List<Stats>();
// your code, but instead of writing to `bag`:
list.Add(serializer.Deserialize<Stats>(reader));
// ...
return list;
}
var task1 = Task.Run(() => GetAPIData(url, taskNumber++));
var task2 = Task.Run(() => GetAPIData(url, taskNumber++));
var task3 = Task.Run(() => GetAPIData(url, taskNumber++));
await Task.WhenAll(task1, task2, task3);
var allResults = task1.Result.Concat(task2.Result).Concat(task3.Result);
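As a variation on the same idea, Task.WhenAll can hand back the per-task results directly, which avoids reading .Result on each task. This is only a sketch: LocalCollectSketch, FetchAsync and FetchAllAsync are hypothetical names, and FetchAsync stands in for GetAPIData by producing a few integers instead of calling the API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LocalCollectSketch
{
    // Hypothetical stand-in for GetAPIData: each call fills its own list,
    // so no synchronisation between tasks is needed.
    static async Task<List<int>> FetchAsync(int taskNumber)
    {
        var list = new List<int>();
        for (var i = 0; i < 3; i++)
        {
            await Task.Yield();
            list.Add(taskNumber * 10 + i);
        }
        return list;
    }

    public static async Task<List<int>> FetchAllAsync()
    {
        // Task.WhenAll returns one result per task, in the order the tasks were passed in.
        List<int>[] perTask = await Task.WhenAll(FetchAsync(1), FetchAsync(2), FetchAsync(3));
        return perTask.SelectMany(x => x).ToList();
    }
}
```

The merged list keeps the items grouped by task, because the combined array follows the order of the arguments rather than the order of completion.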
I'm monitoring a directory with the following setup:
var folder = new Subject<string>();
folder.OnNext("somepath");
folder.SelectMany(FileMonitor)
.Subscribe(x => Console.WriteLine($"Found: {x}"));
public IObservable<string> FileMonitor(string pathToWatch){
return Observable.Create<string>(obs => {
var dfs = CreateAndStartFileWatcher(pathToWatch,obs);
return () => dfs.Dispose();
});
}
This works, but if I pass a new path to the subject, the previous FileMonitor is not disposed.
Is there a way to cancel/dispose the previously generated Observable?
It looks like I need: http://reactivex.io/documentation/operators/switch.html but this is not implemented in c#?
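Sometimes, asking a question gives you new insights.
The solution is to use Switch, which is available in Rx.NET but only works on an observable of observables (IObservable<IObservable<T>>), so SelectMany has to become Select followed by Switch.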
So it should be:
var folder = new Subject<string>();
folder.Select(FileMonitor)
    .Switch()
    .Subscribe(x => Console.WriteLine($"Found: {x}"));
// Push paths after subscribing: a plain Subject does not replay earlier values.
folder.OnNext("somepath");
public IObservable<string> FileMonitor(string pathToWatch){
return Observable.Create<string>(obs => {
var dfs = CreateAndStartFileWatcher(pathToWatch,obs);
return () => dfs.Dispose();
});
}
Leaving this question for reference instead of removing it.
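To see the effect of Switch on disposal, the file watcher can be replaced by something that just logs. This is a sketch under assumptions: it requires the System.Reactive package, and SwitchSketch and Monitor are hypothetical names (Monitor stands in for FileMonitor and never emits anything).

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Disposables;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class SwitchSketch
{
    public static List<string> Run()
    {
        var log = new List<string>();

        // Hypothetical stand-in for FileMonitor: records when a watcher starts and stops.
        IObservable<string> Monitor(string path) =>
            Observable.Create<string>(obs =>
            {
                log.Add($"start {path}");
                return Disposable.Create(() => log.Add($"stop {path}"));
            });

        var folder = new Subject<string>();
        using (folder.Select(path => Monitor(path)).Switch().Subscribe())
        {
            folder.OnNext("a");
            folder.OnNext("b"); // Switch unsubscribes from the "a" watcher here
        }
        // Leaving the using block disposes the subscription, and with it the "b" watcher.
        return log;
    }
}
```

Running SwitchSketch.Run() should log that the watcher for "a" is stopped once "b" arrives, and that the watcher for "b" is stopped when the outer subscription is disposed. With SelectMany instead of Switch, both watchers would stay active until the outer subscription ends.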
I'm looking to process the results of a long-lived HTTP connection from a server I am integrating with as they happen. This server returns one line of JSON (\n delimited) per "event" I wish to process.
Given an instance of Stream assigned to the variable changeStream that represents bytes from the open HTTP connection, here's an extracted example of what I'm doing:
(request is an instance of WebRequest, configured to open a connection to the server I am integrating with.)
var response = request.GetResponse();
var changeStream = response.GetResponseStream();
var lineByLine = Observable.Using(
() => new StreamReader(changeStream),
(streamReader) => streamReader.ReadLineAsync().ToObservable()
);
lineByLine.Subscribe((string line) =>
{
System.Console.WriteLine($"JSON! ---------=> {line}");
});
Using the code above, what ends up happening is I receive the first line that the server sends, but then none after that. Neither ones from the initial response, nor new ones generated by real time activity.
For the purposes of my question, assume this connection will remain open indefinitely.
How do I go about having System.Reactive trigger a callback for each line as I encounter it?
Please note: This scenario is not a candidate for switching to SignalR
Even though this looks more complicated, it is better to use the built-in operators to make this work.
IObservable<string> query =
Observable
.FromAsync(() => request.GetResponseAsync())
.SelectMany(
r => Observable.Using(
() => r,
r2 => Observable.Using(
() => r2.GetResponseStream(),
rs => Observable.Using(
() => new StreamReader(rs),
sr =>
Observable
.Defer(() => Observable.Start(() => sr.ReadLine()))
.Repeat()
.TakeWhile(w => w != null)))));
It's untested, but it should be close.
Your attempt will only observe a single ReadLineAsync call. Instead, you need to emit every line as it is read. Probably something like this:
Observable.Create<string>(async o => {
var response = await request.GetResponseAsync();
var changeStream = response.GetResponseStream();
using var streamReader = new StreamReader(changeStream);
while (true)
{
var line = await streamReader.ReadLineAsync();
if (line == null)
break;
o.OnNext(line);
}
o.OnCompleted();
});
What's the best way to do parallel processing in C# with some async methods?
Let me explain with some simple code.
Example scenario: we have a person and 1000 text files from them. We want to check that their text files do not contain sensitive keywords; if any one file does, we mark the person as untrusted. The method that checks this is async, and as soon as we find one sensitive keyword no further processing is required, so the checking loop must be broken for that person.
For the best performance, we must use parallel processing.
Simple pseudocode:
bool sensitivedetected = false;
Parallel.ForEach(textfilecollection, async (textfile, parallelloopstate) =>
{
    if (await hassensitiveasync(textfile))
    {
        sensitivedetected = true;
        parallelloopstate.Break();
    }
});
if (sensitivedetected)
    markuntrusted(person);
The problem is that Parallel.ForEach doesn't wait for the async tasks to complete, so the statement if (sensitivedetected) runs as soon as the tasks have been created.
I read other questions such as "write parallel.for with async" and "async/await and Parallel.For", and lots of other pages.
These topics are useful when you need the results of async methods to be collected and used later, but in my scenario the loop should end as soon as possible.
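The behaviour described above can be reproduced in isolation. In the sketch below (AsyncLambdaSketch is a hypothetical name, and the delays are arbitrary), the async lambda is converted to an Action, so Parallel.ForEach treats each iteration as finished at the first await.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLambdaSketch
{
    // Returns the flag as seen right after Parallel.ForEach returns,
    // and again after giving the async continuations time to run.
    public static (bool Immediately, bool Later) Run()
    {
        var detected = 0;

        Parallel.ForEach(new[] { 1, 2, 3 }, async item =>
        {
            await Task.Delay(300);              // the lambda returns to the loop here
            Interlocked.Exchange(ref detected, 1);
        });

        var immediately = Volatile.Read(ref detected) == 1;
        Thread.Sleep(1000);                     // arbitrary wait, for demonstration only
        var later = Volatile.Read(ref detected) == 1;
        return (immediately, later);
    }
}
```

With these timings the first value is expected to be false and the second true, which mirrors the problem in the question: the check after the loop runs before any of the awaited work has finished.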
Update: Sample code:
Boolean detected=false;
Parallel.ForEach(UrlList, async (url, pls) =>
{
using (HttpClient hc = new HttpClient())
{
var result = await hc.GetAsync(url);
if ((await result.Content.ReadAsStringAsync()).Contains("sensitive"))
{
detected = true;
pls.Break();
}
}
});
if (detected)
Console.WriteLine("WARNING");
The simplest way to achieve what you need (and not what you want, because threading is evil) is to use Reactive Extensions.
var firstSensitive = await UrlList
    .ToObservable() // UrlList is an IEnumerable, so lift it into Rx first
    .Select(async url =>
    {
        using (var http = new HttpClient())
        {
            var result = await http.GetAsync(url);
            return await result.Content.ReadAsStringAsync();
        }
    })
    .SelectMany(downloadTask => downloadTask.ToObservable())
    .Where(result => result.Contains("sensitive"))
    .FirstOrDefaultAsync();
if (firstSensitive != null)
    Console.WriteLine("WARNING");
To limit the number of concurrent HTTP queries:
const int concurrentRequestLimit = 4;
var semaphore = new SemaphoreSlim(concurrentRequestLimit);
var firstSensitive = await UrlList
    .ToObservable() // UrlList is an IEnumerable, so lift it into Rx first
    .Select(async url =>
    {
        // At most concurrentRequestLimit downloads run at the same time.
        await semaphore.WaitAsync();
        try
        {
            using (var http = new HttpClient())
            {
                var result = await http.GetAsync(url);
                return await result.Content.ReadAsStringAsync();
            }
        }
        finally
        {
            semaphore.Release();
        }
    })
    .SelectMany(downloadTask => downloadTask.ToObservable())
    .Where(result => result.Contains("sensitive"))
    .FirstOrDefaultAsync();
if (firstSensitive != null)
    Console.WriteLine("WARNING");
I'm using Rx to read from a NetworkStream and provide the results as a Hot Observable.
Although the query works well, I'm not sure that the condition I use to complete the sequence based on the NetworkStream is the most appropriate. I have cases where the sequence completes although the TcpListener on the other side has not finished or closed the connection.
Here is the query. I would appreciate some suggestions about the right condition to safely terminate the sequence:
private IDisposable GetStreamSubscription(TcpClient client)
{
return Observable.Defer(() => {
var buffer = new byte[client.ReceiveBufferSize];
return Observable.FromAsync<int>(() => {
return client.GetStream ().ReadAsync (buffer, 0, buffer.Length);
})
.SubscribeOn(NewThreadScheduler.Default)
.Select(x => buffer.Take(x).ToArray());
})
.Repeat()
.TakeWhile(bytes => bytes.Any()) //This is the condition to review
.Subscribe(bytes => {
//OnNext Logic
}, ex => {
//OnError logic
}, () => {
//OnCompleted Logic
});
}
Just to be clear about my question, I need to know the best way to detect when a Network Stream is completed on the other side (because of a disconnect, an error, or whatever). Right now I'm doing it by invoking ReadAsync until no bytes are returned, but I don't know if this is completely safe.
Does this do what you want?
private IDisposable GetStreamSubscription(TcpClient client)
{
return Observable
.Defer(() =>
{
var buffer = new byte[client.ReceiveBufferSize];
return Observable.Using(
() => client.GetStream(),
st => Observable.While(
() => st.DataAvailable,
Observable.Start(() =>
{
var bytes = st.Read(buffer, 0, buffer.Length);
return buffer.Take(bytes).ToArray();
})));
})
.SubscribeOn(NewThreadScheduler.Default)
.Subscribe(bytes => {
//OnNext Logic
}, ex => {
//OnError logic
}, () => {
//OnCompleted Logic
});
}
Note that this safely disposes of the stream and concludes when there is no longer any data available.
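For comparison, the read-until-zero condition from the question can be checked against a loopback connection. This is a sketch only: ZeroReadSketch is a hypothetical name, and the five-byte payload is arbitrary. It relies on the documented behaviour that a read on a NetworkStream returns 0 once the remote side has closed the connection gracefully and all data has been consumed.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class ZeroReadSketch
{
    public static async Task<(int Total, int LastRead)> RunAsync()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: let the OS pick a free port
        listener.Start();
        var port = ((IPEndPoint)listener.LocalEndpoint).Port;

        // Server side: send five bytes, then close the connection.
        var server = Task.Run(async () =>
        {
            using var peer = await listener.AcceptTcpClientAsync();
            await peer.GetStream().WriteAsync(new byte[] { 1, 2, 3, 4, 5 }, 0, 5);
        });

        using var client = new TcpClient();
        await client.ConnectAsync(IPAddress.Loopback, port);
        var stream = client.GetStream();
        var buffer = new byte[client.ReceiveBufferSize];

        // Keep reading until the stream reports end-of-stream with a zero-length read.
        int total = 0, read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            total += read;

        await server;
        listener.Stop();
        return (total, read);
    }
}
```

A zero-length read only signals a graceful close. If the connection is reset or fails, ReadAsync throws an IOException instead, which in the Rx queries above would surface through the OnError handler rather than OnCompleted.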