Search on TextChanged with Reactive Extensions - c#

I was trying to implement instant search on a database table with 10000+ records.
The search starts when the text inside the search text box changes. When the search box becomes empty, I want to call a different method that loads all the data.
Also, if the user changes the search string while results for another search are being loaded, the loading of those results should stop in favor of the new search.
I implemented it as in the following code, but I was wondering whether there is a better or cleaner way to do it using Rx (Reactive Extensions) operators. Creating a second observable inside the Subscribe method of the first observable feels more imperative than declarative, and the same goes for the empty-string check.
var searchStream = Observable.FromEventPattern(s => txtSearch.TextChanged += s, s => txtSearch.TextChanged -= s)
    .Throttle(TimeSpan.FromMilliseconds(300))
    .Select(evt =>
    {
        var txtbox = evt.Sender as TextBox;
        return txtbox.Text;
    }
    );
searchStream
    .DistinctUntilChanged()
    .ObserveOn(SynchronizationContext.Current)
    .Subscribe(searchTerm =>
    {
        this.parties.Clear();
        this.partyBindingSource.ResetBindings(false);
        long partyCount;
        var foundParties = string.IsNullOrEmpty(searchTerm) ? partyRepository.GetAll(out partyCount) : partyRepository.SearchByNameAndNotes(searchTerm);
        foundParties
            .ToObservable(Scheduler.Default)
            .TakeUntil(searchStream)
            .Buffer(500)
            .ObserveOn(SynchronizationContext.Current)
            .Subscribe(searchResults =>
            {
                this.parties.AddRange(searchResults);
                this.partyBindingSource.ResetBindings(false);
            }
            , innerEx =>
            {
            }
            , () => { }
            );
    }
    , ex =>
    {
    }
    , () =>
    {
    }
    );
The SearchByNameAndNotes method just returns an IEnumerable<Party> using SQLite by reading data from a data reader.

I think you want something like this. EDIT: From your comments, I see you have a synchronous repository API - I'll leave the asynchronous version in, and add a synchronous version afterwards. Notes inline:
Asynchronous Repository Version
An asynchronous repository interface could be something like this:
public interface IPartyRepository
{
Task<IEnumerable<Party>> GetAllAsync(out long partyCount);
Task<IEnumerable<Party>> SearchByNameAndNotesAsync(string searchTerm);
}
Then I refactor the query as:
var searchStream = Observable.FromEventPattern(
        s => txtSearch.TextChanged += s,
        s => txtSearch.TextChanged -= s)
    .Select(evt => txtSearch.Text) // better to select on the UI thread
    .Throttle(TimeSpan.FromMilliseconds(300))
    .DistinctUntilChanged()
    // placement of this is important to avoid races updating the UI
    .ObserveOn(SynchronizationContext.Current)
    .Do(_ =>
    {
        // I like to use Do to make in-stream side-effects explicit
        this.parties.Clear();
        this.partyBindingSource.ResetBindings(false);
    })
    // This is "the money" part of the answer:
    // Don't subscribe, just project the search term
    // into the query...
    .Select(searchTerm =>
    {
        long partyCount;
        var foundParties = string.IsNullOrEmpty(searchTerm)
            ? partyRepository.GetAllAsync(out partyCount)
            : partyRepository.SearchByNameAndNotesAsync(searchTerm);
        // I assume the intention of the Buffer was to load
        // the data into the UI in batches. If so, you can use Buffer from nuget
        // package Ix-Main like this to get IEnumerable<T> batched up
        // without splitting it up into unit sized pieces first
        return foundParties
            // this ToObs gets us into the monad
            // and returns IObservable<IEnumerable<Party>>
            .ToObservable()
            // the ToObs here gets us into the monad from
            // the IEnum<IList<Party>> returned by Buffer
            // and the SelectMany flattens so the output
            // is IObservable<IList<Party>>
            .SelectMany(x => x.Buffer(500).ToObservable())
            // placement of this is again important to avoid races updating the UI
            // erroneously putting it after the Switch is a very common bug
            .ObserveOn(SynchronizationContext.Current);
    })
    // At this point we have IObservable<IObservable<IList<Party>>>
    // Switch flattens and returns the most recent inner IObservable,
    // cancelling any previous pending set of batched results
    // superseded due to a textbox change
    // i.e. the previous inner IObservable<...> if it was incomplete
    // - it's the equivalent of your TakeUntil, but a bit neater
    .Switch()
    .Subscribe(searchResults =>
    {
        this.parties.AddRange(searchResults);
        this.partyBindingSource.ResetBindings(false);
    },
    ex => { },
    () => { });
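To see the cancelling behaviour of Switch in isolation, here is a minimal sketch that assumes the System.Reactive NuGet package. The subjects `searchTerms`, `resultsForA` and `resultsForAb` are hypothetical stand-ins for the text box stream and two repository queries; they are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class SwitchDemo
{
    // Returns the values the subscriber actually received, in order.
    public static List<string> Run()
    {
        var seen = new List<string>();
        var searchTerms = new Subject<string>();   // hypothetical: stands in for the text box stream
        var resultsForA = new Subject<string>();   // hypothetical: results of the search for "a"
        var resultsForAb = new Subject<string>();  // hypothetical: results of the search for "ab"

        using (searchTerms
            // project each term into its (inner) result stream
            .Select(term => (IObservable<string>)(term == "a" ? resultsForA : resultsForAb))
            // only the most recent inner stream stays subscribed
            .Switch()
            .Subscribe(value => seen.Add(value)))
        {
            searchTerms.OnNext("a");
            resultsForA.OnNext("a:1");    // delivered: "a" is the current search
            searchTerms.OnNext("ab");     // Switch unsubscribes from resultsForA here
            resultsForA.OnNext("a:2");    // dropped: this search was superseded
            resultsForAb.OnNext("ab:1");  // delivered
        }

        return seen;
    }
}
```

Calling `SwitchDemo.Run()` should yield only "a:1" and "ab:1"; the late result from the first search never reaches the subscriber.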
Synchronous Repository Version
A synchronous repository interface could be something like this:
public interface IPartyRepository
{
IEnumerable<Party> GetAll(out long partyCount);
IEnumerable<Party> SearchByNameAndNotes(string searchTerm);
}
Personally, I don't recommend a synchronous repository interface like this. Why? It will typically do I/O, so you will wastefully block a thread.
You might say the client could call it from a background thread, or that you could wrap the call in a task, but I don't think this is the right way to go:
The client doesn't "know" the call is going to block; that isn't expressed in the contract.
The repository should handle the asynchronous aspect of the implementation; after all, only the repository implementer knows how this is best achieved.
Anyway, accepting the above, one way to implement it is like this (of course it's mostly similar to the async version, so I've only annotated the differences):
var searchStream = Observable.FromEventPattern(
        s => txtSearch.TextChanged += s,
        s => txtSearch.TextChanged -= s)
    .Select(evt => txtSearch.Text)
    .Throttle(TimeSpan.FromMilliseconds(300))
    .DistinctUntilChanged()
    .ObserveOn(SynchronizationContext.Current)
    .Do(_ =>
    {
        this.parties.Clear();
        this.partyBindingSource.ResetBindings(false);
    })
    .Select(searchTerm =>
        // Here we wrap the synchronous repository into an
        // async call. Note it's simply not enough to call
        // ToObservable(Scheduler.Default) on the enumerable
        // because this can actually still block up to the point that the
        // first result is yielded. Doing as we have here,
        // we guarantee the UI stays responsive
        Observable.Start(() =>
        {
            long partyCount;
            var foundParties = string.IsNullOrEmpty(searchTerm)
                ? partyRepository.GetAll(out partyCount)
                : partyRepository.SearchByNameAndNotes(searchTerm);
            return foundParties;
        }) // Note you can supply a scheduler, default is Scheduler.Default
        .SelectMany(x => x.Buffer(500).ToObservable())
        .ObserveOn(SynchronizationContext.Current))
    .Switch()
    .Subscribe(searchResults =>
    {
        this.parties.AddRange(searchResults);
        this.partyBindingSource.ResetBindings(false);
    },
    ex => { },
    () => { });

Related

Check time taken by a thread inside Parallel.Foreach and exit if it is taking more time

I have code with Parallel.Foreach which is processing files and doing some operation on each file in parallel.
Parallel.ForEach(lstFiles, file=>
{
// Doing some operation on file
// Skip file and move to next if it is taking too long
});
I want to skip a file and move to next file (but don't want to exit the Parallel.Foreach) if a particular file is taking too long (say 2 mins). Is there any way in Parallel.Foreach to check the time taken by thread to process a single file.
Thanks
I'd suggest you don't use Parallel.ForEach and instead use Microsoft's far more powerful Reactive Framework (Rx). Then you can do this:
var query =
    from file in lstFiles.ToObservable()
    from result in Observable.Amb(
        Observable.Start(() => SomeOperation(file)).Select(_ => true),
        Observable.Timer(TimeSpan.FromMinutes(2.0)).Select(_ => false))
    select new { file, result };
IDisposable subscription =
    query
        .Subscribe(x =>
        {
            /* do something with each `new { file, result }`
               as they arrive. */
        }, ex =>
        {
            /* do something if an error is encountered */
            /* (stops processing on first error) */
        }, () =>
        {
            /* do something if they have all finished successfully */
        });
This is all done in parallel. The Observable.Amb operator subscribes to the two observables in its argument list and takes the values from whichever of the two produces a value first: if it's the Start observable, your file has been processed; if it's the Timer observable, 2 minutes have elapsed without a result for the file.
If you want to stop the processing when it is half-way through then just call subscription.Dispose().
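The race between the work and the timer can be tried on its own with a small sketch, assuming the System.Reactive package. `AmbTimeoutDemo` and `FinishedInTime` are hypothetical names introduced for illustration, and the blocking `Wait()` call is there only to make the demo easy to run. Note that when the timer wins, the delegate passed to Observable.Start is not aborted; it keeps running in the background and its result is simply ignored.

```csharp
using System;
using System.Reactive.Linq;
using System.Threading;

public static class AmbTimeoutDemo
{
    // Returns true if the work finished before the timeout, false if the timer won.
    public static bool FinishedInTime(Action work, TimeSpan timeout) =>
        Observable.Amb(
                Observable.Start(work).Select(_ => true),       // the work, run on the default scheduler
                Observable.Timer(timeout).Select(_ => false))   // the deadline
            .Wait(); // blocks until the winning sequence completes (demo only)

    public static void Main()
    {
        // Short work against a generous deadline: the Start observable wins.
        Console.WriteLine(FinishedInTime(() => Thread.Sleep(10), TimeSpan.FromSeconds(5)));
        // Long work against a short deadline: the Timer observable wins.
        Console.WriteLine(FinishedInTime(() => Thread.Sleep(1000), TimeSpan.FromMilliseconds(50)));
    }
}
```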
Use NuGet "System.Reactive" to get the bits.
The query in lambda form as per request in comments:
var query =
    lstFiles
        .ToObservable()
        .SelectMany(
            file =>
                Observable.Amb(
                    Observable.Start(() => SomeOperation(file)).Select(_ => true),
                    Observable.Timer(TimeSpan.FromMinutes(2.0)).Select(_ => false)),
            (file, result) => new { file, result });

How can I bind to an Entry's "focused" event with Reactive?

I'm trying to bind to a text Entry field's "focused" event using Reactive, but my code is failing to compile.
Here's what I'm doing now, which works fine:
Entry _qty; // at class level
_qty.Focused += (s, e) => { /* do stuff */ };
Attempt
But I'd like to do something like this instead:
// class level
IObservable<string> _qtyFocusObservable;
Entry _qty;
// in a setup function
_qtyFocusObservable =
Observable
.FromEventPattern<EventHandler<FocusEventArgs>>(
x => _qty.Focused += x,
x => _qty.Focused -= x
);
Problem
I've tried quite a few variations of the code above and I get compiler errors saying that the compiler can't implicitly convert from whatever type I specify to System.EventHandler<System.EventHandler<Xamarin.Forms.FocusEventArgs>>, even if the type I specify is indeed System.EventHandler<System.EventHandler<Xamarin.Forms.FocusEventArgs>>.
Question
How do I bind to my Entry's Focused event using reactive?
So to get a basic observable working from an event, I usually structure it like so:
var focusObservable = Observable.FromEventPattern<EventHandler, FocusEventArgs>(
x => _qty.Focused += x.Invoke,
x => _qty.Focused -= x.Invoke);
Then when I need to do something from that observable event I link a command to it like so:
var doStuffCommand = ReactiveCommand.CreateAsyncTask(DoStuffAsync);
focusObservable.InvokeCommand(doStuffCommand);
With a DoStuffAsync implementation of something like this:
public async Task DoStuffAsync(object value, CancellationToken token = default(CancellationToken))
{
// Do stuff here
}
I'm still fairly new to Reactive as well but this (should?) get you going in the right direction.
Cheers, and happy coding!
So, after a year of using ReactiveUI, this is how I fire an event when focusing an input.
var focusedObservable =
Observable
.FromEventPattern<FocusEventArgs>(
x => _totalBirds.Focused += x,
x => _totalBirds.Focused -= x)
.Select(x => x.EventArgs.IsFocused);
// fires when focused
focusedObservable
.WhenIsTrue() // extension method, basically .Where(x => x == true)
.ObserveOn(RxApp.MainThreadScheduler)
.InvokeCommand(this, x => DoSomething)
.DisposeWith(ControlBindings); // extension that uses composite disposable
// fires when changing state back to unfocused
focusedObservable
.WhenIsFalse() // extension method, basically .Where(x => x == false)
.ObserveOn(RxApp.MainThreadScheduler)
.InvokeCommand(this, x => x.ViewModel.DoSomethingElse)
.DisposeWith(ControlBindings); // extension that uses composite disposable
This is pretty straightforward; if you need to see any additional code, let me know. Also, if you want to snag the .DisposeWith extension you can grab it here.
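The WhenIsTrue and WhenIsFalse helpers used above are not shown in the answer. Going only by the inline comments ("basically .Where(x => x == true)"), a minimal sketch could look like the following; the class name `BooleanObservableExtensions` is an assumption, and the author's real extensions may differ.

```csharp
using System;
using System.Reactive.Linq;

public static class BooleanObservableExtensions
{
    // Lets through only the 'true' notifications (e.g. the control gained focus).
    public static IObservable<bool> WhenIsTrue(this IObservable<bool> source) =>
        source.Where(x => x);

    // Lets through only the 'false' notifications (e.g. the control lost focus).
    public static IObservable<bool> WhenIsFalse(this IObservable<bool> source) =>
        source.Where(x => !x);
}
```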

Reactive Extensions SelectMany with large objects

I have this little piece of code that simulates a flow that uses large objects (that huge byte[]). For each item in the sequence, an async method is invoked to get some result. The problem? As it is, it throws OutOfMemoryException.
Code compatible with LINQPad (C# Program):
void Main()
{
var selectMany = Enumerable.Range(1, 100)
.Select(i => new LargeObject(i))
.ToObservable()
.SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)));
selectMany
.Subscribe(r => Console.WriteLine(r));
}
private static async Task<int> DoSomethingAsync(LargeObject lo)
{
await Task.Delay(10000);
return lo.Id;
}
internal class LargeObject
{
public int Id { get; }
public LargeObject(int id)
{
this.Id = id;
}
public byte[] Data { get; } = new byte[10000000];
}
It seems that it creates all the objects at the same time. How can I do it the right way?
The underlying idea is to invoke DoSomethingAsync in order to get some result for each object, so that's why I use SelectMany. To simplify, I just have introduced a Task.Delay, but in real life it is a service that can process some items concurrently, so I want to introduce some concurrency mechanism to get advantage of it.
Please, notice that, theoretically, processing a little number of items at time shouldn't fill the memory. In fact, we only need each "large object" to get the results of the DoSomethingAsync method. After that point, the large object isn't used anymore.
I feel like I'm repeating myself. As with your last question and my last answer, what you need to do is limit the number of bigObjects™ that are created concurrently.
To do so, you need to combine object creation and processing and put them on the same thread pool. The problem is that we use async methods to allow threads to do other things while the async method runs. Since your slow network call is async, your (fast) object creation code will keep creating large objects too quickly.
Instead, we can let Rx keep count of the number of concurrently running observables by combining the object creation with the async call, and use .Merge(maxConcurrent) to limit concurrency.
As a bonus, we can also set a minimum time for each query to execute: just Zip with something that takes a minimum delay.
static void Main()
{
var selectMany = Enumerable.Range(1, 100)
.ToObservable()
.Select(i => Observable.Defer(() => Observable.Return(new LargeObject(i)))
.SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)))
.Zip(Observable.Timer(TimeSpan.FromMilliseconds(400)), (el, _) => el)
).Merge(4);
selectMany
.Subscribe(r => Console.WriteLine(r));
Console.ReadLine();
}
private static async Task<int> DoSomethingAsync(LargeObject lo)
{
await Task.Delay(10000);
return lo.Id;
}
internal class LargeObject
{
public int Id { get; }
public LargeObject(int id)
{
this.Id = id;
Console.WriteLine(id + "!");
}
public byte[] Data { get; } = new byte[10000000];
}
"It seems that it creates all the objects at the same time."
Yes, because you are creating them all at once.
If I simplify your code I can show you why:
void Main()
{
var selectMany =
Enumerable
.Range(1, 5)
.Do(x => Console.WriteLine($"{x}!"))
.ToObservable()
.SelectMany(i => Observable.FromAsync(() => DoSomethingAsync(i)));
selectMany
.Subscribe(r => Console.WriteLine(r));
}
private static async Task<int> DoSomethingAsync(int i)
{
await Task.Delay(1);
return i;
}
Running this produces:
1!
2!
3!
4!
5!
4
3
5
2
1
Because of the Observable.FromAsync you are allowing the source to run to completion before any of the results return. In other words you are quickly building all of the large objects, but slowly processing them.
You should allow Rx to run synchronously, but on the default scheduler so that your main thread is not blocked. The code will then run without any memory issues and your program will remain responsive on the main thread.
Here's the code for this:
var selectMany =
Observable
.Range(1, 100, Scheduler.Default)
.Select(i => new LargeObject(i))
.Select(o => DoSomethingAsync(o))
.Select(t => t.Result);
(I've effectively replaced Enumerable.Range(1, 100).ToObservable() with Observable.Range(1, 100) as that will also help with some issues.)
I've tried testing other options, but so far anything that allows DoSomethingAsync to run asynchronously runs into the out of memory error.
ConcatMap supports this out of the box. I know this operator is not available in .NET, but you can get the same behavior with the Concat operator, which defers subscribing to each inner source until the previous one completes.
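As a rough sketch of that idea (assuming the System.Reactive package), each item can be wrapped in Observable.Defer so that the expensive object is only built when Concat subscribes to that inner sequence. `ConcatDemo`, `ProcessAsync` and the log messages are hypothetical stand-ins for the question's DoSomethingAsync and LargeObject; the blocking `Wait()` is for the demo only.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class ConcatDemo
{
    // Returns a log showing when each item was created and when it finished.
    public static List<string> Run()
    {
        var log = new List<string>();

        // Hypothetical stand-in for the question's DoSomethingAsync.
        async Task<int> ProcessAsync(int id)
        {
            await Task.Delay(10);
            return id;
        }

        Observable.Range(1, 3)
            .Select(i => Observable.Defer(() =>
            {
                // Runs only when Concat subscribes to this inner sequence,
                // so this is where the large object would be allocated.
                lock (log) log.Add($"create {i}");
                return Observable.FromAsync(() => ProcessAsync(i));
            }))
            .Concat() // subscribes to the next inner sequence only after the previous one completes
            .Do(r => { lock (log) log.Add($"done {r}"); })
            .Wait();  // blocks until the whole pipeline completes (demo only)

        return log;
    }
}
```

With this shape each "create" entry should follow the previous item's "done" entry, so only one object is alive at a time, at the cost of giving up concurrency altogether.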
You can introduce a time interval delay this way:
var source = Enumerable.Range(1, 100)
.ToObservable()
.Zip(Observable.Interval(TimeSpan.FromSeconds(1)), (i, ts) => i)
.Select(i => new LargeObject(i))
.SelectMany(o => Observable.FromAsync(() => DoSomethingAsync(o)));
So instead of pulling all 100 integers at once, immediately converting them to the LargeObject then calling DoSomethingAsync on all 100, it drips the integers out one-by-one spaced out one second each.
This is what a TPL+Rx solution would look like. Needless to say it is less elegant than Rx alone, or TPL alone. However, I don't think this problem is well suited for Rx:
void Main()
{
var source = Observable.Range(1, 100);
const int MaxParallelism = 5;
var transformBlock = new TransformBlock<int, int>(async i => await DoSomethingAsync(new LargeObject(i)),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = MaxParallelism });
source.Subscribe(transformBlock.AsObserver());
var selectMany = transformBlock.AsObservable();
selectMany
.Subscribe(r => Console.WriteLine(r));
}

How to use Rx.NET in a pipeline mixing network and file system IO?

I have the following requirement:
1. Collect certain information from multiple remote sites.
2. Serialize the information to disk.
3. Contact the same sites and acknowledge the data was collected successfully.
This is a very simplified flow, the real flow must also deal with faults and has other aspects, which I think are irrelevant to my question or so it seems for the moment.
Anyway, here is how I implement the described flow:
var data = await GetSitesSource()
.Select(site => Observable
.FromAsync(() => GetInformationFromSiteAsync(site))
.Select(site.MakeKeyValuePair))
.Merge(maxConcurrentSiteRequests)
.ToList();
if (data.Count > 0)
{
var filePath = GetFilePath();
using (var w = new StreamWriter(filePath))
{
await w.WriteAsync(YieldLines(data));
}
var tsUTC = DateTime.UtcNow;
await data.ToObservable()
.Select(o => Observable.FromAsync(() => AckInformationFromSiteAsync(o.Key, tsUTC, o.Value.InformationId)))
.Merge(maxConcurrentSiteRequests);
}
Where:
- MakeKeyValuePair is an extension method that returns a KeyValuePair<K,V> instance
- YieldLines transforms data into an IEnumerable<string>
- WriteAsync is a fictional extension method writing a series of strings to its StreamWriter
It does not seem a good implementation, because I do not leverage the fact that I could have started writing out the records as they come out of the first Merge operator.
I could use SelectMany + Merge(1) operator to asynchronously write out the chunks to the file (the order does not matter), but how do I make sure the respective StreamWriter is initialized only when needed and is properly disposed of? Because if there is no data, I do not even want to initialize the StreamWriter.
My question: how can this code be rewritten so that the Observable pipeline is not interrupted in the middle to write out the file? It should include all three phases:
1. Get the data from multiple sites
2. Write the data in chunks one by one (the order does not matter)
3. Acknowledge the data once all the data is written
I haven't tested this but none of your code precludes joining it together. So you could do something like this:
//The ToObservable extension for Task is only available through
using System.Reactive.Threading.Tasks;
GetSitesSource()
.Select(site => Observable
.FromAsync(() => GetInformationFromSiteAsync(site))
.Select(site.MakeKeyValuePair))
.Merge(maxConcurrentSiteRequests)
.ToList()
//Only proceed if we received data
.Where(data => data.Count > 0)
.SelectMany(data =>
//Gives the StreamWriter the same lifetime as this Observable once it subscribes
Observable.Using(
() => new StreamWriter(GetFilePath()),
(w) => w.WriteAsync(YieldLines(data)).ToObservable()),
//We are interested in the original data value, not the write result
(data, _) => data)
//Attach a timestamp of when data passed through here
.Timestamp()
.SelectMany(o=> {
var ts = o.Timestamp;
var data= o.Value;
//This is actually returning IEnumerable<IObservable<T>> but merge
//will implicitly handle it.
return data.Select(i => Observable.FromAsync(() =>
AckInformationFromSiteAsync(i.Key, ts,
i.Value.InformationId)))
.Merge(maxConcurrentSiteRequests);
})
//Handle the return values, fatal errors and the completion of the stream.
.Subscribe();
To more fully answer your question
The Using operator ties a resource, which must implement IDisposable, to the lifetime of the Observable. The first argument is a factory function that gets called once for each subscription to the Observable; the resource it returns is disposed when that subscription ends.
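A small sketch of that lifetime, assuming the System.Reactive package. Here a logging disposable created with Disposable.Create stands in for the StreamWriter, and `UsingDemo` and the log messages are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Disposables;
using System.Reactive.Linq;

public static class UsingDemo
{
    // Returns a log of what happened and in which order.
    public static List<string> Run()
    {
        var log = new List<string>();

        var source = Observable.Using(
            // Resource factory: not called until someone subscribes.
            () =>
            {
                log.Add("resource created");
                return Disposable.Create(() => log.Add("resource disposed"));
            },
            // Observable factory: receives the resource created above.
            _ => Observable.Range(1, 2));

        log.Add("before subscribe");
        source.Subscribe(
            x => log.Add($"value {x}"),
            () => log.Add("completed"));

        return log;
    }
}
```

The log should show "before subscribe" ahead of "resource created", which is the property the question asks for: no StreamWriter exists until data actually flows into the Using block.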

Are Parallel.Invoke and Parallel.ForEach essentially the same thing?

And by "same thing" I mean do these two operations basically do the same work, and it just boils down to which one is more convenient to call based on what you have to work with? (i.e. a list of delegates or a list of things to iterate over)? I've been searching MSDN, StackOverflow, and various random articles but I have yet to find a clear answer for this.
EDIT: I should have been clearer; I am asking if the two methods do the same thing because if they do not, I would like to understand which would be more efficient.
Example: I have a list of 500 key values. Currently I use a foreach loop that iterates through the list (serially) and performs work for each item. If I want to take advantage of multiple cores, should I simply use Parallel.ForEach instead?
Let's say for argument's sake that I had an array of 500 delegates for those 500 tasks. Would the net effect be any different if I called Parallel.Invoke and gave it a list of 500 delegates?
Parallel.ForEach goes through the list of elements and can perform some task on the elements of the array.
eg.
Parallel.ForEach(val, (array) => Sum(array));
Parallel.Invoke can invoke many functions in parallel.
eg.
Parallel.Invoke(
() => doSum(array),
() => doAvg(array),
() => doMedian(array));
As the examples above show, they differ in functionality. ForEach iterates through a collection of elements and performs one task on each element in parallel, while Invoke can perform many different tasks in parallel on a single element.
Parallel.Invoke and Parallel.ForEach (when used to execute Actions) function the same, although Parallel.Invoke specifically wants the collection to be an array. Consider the following sample:
List<Action> actionsList = new List<Action>
{
() => Console.WriteLine("0"),
() => Console.WriteLine("1"),
() => Console.WriteLine("2"),
() => Console.WriteLine("3"),
() => Console.WriteLine("4"),
() => Console.WriteLine("5"),
() => Console.WriteLine("6"),
() => Console.WriteLine("7"),
() => Console.WriteLine("8"),
() => Console.WriteLine("9"),
};
Parallel.ForEach<Action>(actionsList, ( o => o() ));
Console.WriteLine();
Action[] actionsArray = new Action[]
{
() => Console.WriteLine("0"),
() => Console.WriteLine("1"),
() => Console.WriteLine("2"),
() => Console.WriteLine("3"),
() => Console.WriteLine("4"),
() => Console.WriteLine("5"),
() => Console.WriteLine("6"),
() => Console.WriteLine("7"),
() => Console.WriteLine("8"),
() => Console.WriteLine("9"),
};
Parallel.Invoke(actionsArray);
Console.ReadKey();
This code produced the following output on one run. Its output is generally in a different order every time.
0 5 1 6 2 7 3 8 4 9
0 1 2 4 5 6 7 8 9 3
Surprisingly, no, they are not the same thing. Their fundamental difference is on how they behave in case of exceptions:
The Parallel.ForEach (as well as the Parallel.For and the upcoming Parallel.ForEachAsync) fails fast. After an exception has occurred, it does not start any new parallel work, and will return as soon as all the currently running delegates complete.
The Parallel.Invoke invokes invariably all the actions, regardless of whether some (or all) of them failed.
To demonstrate this behavior, let's run 1,000 actions in parallel, with one in every three actions failing:
int c = 0;
Action[] actions = Enumerable.Range(1, 1000).Select(n => new Action(() =>
{
Interlocked.Increment(ref c);
if (n % 3 == 0) throw new ApplicationException();
})).ToArray();
try { c = 0; Parallel.For(0, actions.Length, i => actions[i]()); }
catch (AggregateException aex)
{ Console.WriteLine($"Parallel.For, Exceptions: {aex.InnerExceptions.Count}/{c}"); }
try { c = 0; Parallel.ForEach(actions, action => action()); }
catch (AggregateException aex)
{ Console.WriteLine($"Parallel.ForEach, Exceptions: {aex.InnerExceptions.Count}/{c}"); }
try { c = 0; Parallel.Invoke(actions); }
catch (AggregateException aex)
{ Console.WriteLine($"Parallel.Invoke, Exceptions: {aex.InnerExceptions.Count}/{c}"); }
Output (on my PC, .NET 5, Release build):
Parallel.For, Exceptions: 5/12
Parallel.ForEach, Exceptions: 5/11
Parallel.Invoke, Exceptions: 333/1000
I'm trying to find a good way of phrasing it; but they are not the same thing.
The reason is, Invoke works on an Array of Action and ForEach works on a List (specifically an IEnumerable) of Action; Lists are significantly different to Arrays in mechanics although they expose the same sort of basic behaviour.
I can't say what the difference actually entails because I don't know, so please don't accept this as an answer (unless you really want to!), but I hope it jogs someone's memory as to the mechanisms.
+1 for a good question too.
Edit; It just occurred to me that there is another answer too; Invoke can work on a dynamic list of Actions; but Foreach can work with a Generic IEnumerable of Actions and gives you the ability to use conditional logic, Action by Action; so you could test a condition before saying Action.Invoke() in each Foreach iteration.
