Constructing an IEnumerable based on a collection of collections - c#

Good afternoon,
I'm having a highly entertaining time trying to convince the 'yield' keyword to function in a way that I can understand, but unfortunately I'm not having much luck. Here's the scenario:
Based on the property of a user, I want to look up a set of RSS feed addresses and display the seven most recent articles from all of those feeds, not each of those feeds. To do so, I'm trying to build a collection of the five most recent articles from each feed, then take the 7 most recent from those. The (very dull) pseudo-code-type-process goes something like:
Look up the member
Get the relevant property (a group name) for the member
Look up the addresses of the RSS feeds for that group
For each address in the collection, get the five most recent articles and place them in another collection
Take the seven most recent articles from that subsequent collection and display them.
I've done some research and been able to produce the following:
public static class RSSHelper
{
public static IEnumerable<SyndicationItem> GetLatestArticlesFromFeeds(List<string> feedList, short articlesToTake)
{
foreach (string Feed in feedList)
{
yield return GetLatestArticlesFromFeed(Feed).OrderByDescending(o => o.PublishDate).Take(articlesToTake).First();
}
yield return null;
}
private static IEnumerable<SyndicationItem> GetLatestArticlesFromFeed(string feedURL)
{
// We're only accepting XML based feeds, so create an XML reader:
SyndicationItem Result = new SyndicationItem();
int SkipCount = 0;
for (int Curr = 1; Curr <= 5; Curr++)
{
try
{
XmlReader Reader = XmlReader.Create(feedURL);
SyndicationFeed Feed = SyndicationFeed.Load(Reader);
Reader.Close();
Result = Feed.Items.OrderByDescending(o => o.PublishDate).Skip(SkipCount).Take(1).Single();
SkipCount++;
}
catch (Exception ex)
{
// Do nothing, else the Yield will fail.
}
yield return Result;
}
}
}
What seems to be happening is that I get five results (articlesToTake is 7, not 5), and occasionally either the whole SyndicationItem is null, or properties of it are null. I'm also convinced that this is a really, really poorly performing approach to tackling this problem, but I can't find much direction on using the yield keyword in this context.
I did find this question but it's not quite helping me understand anything.
Is what I'm trying to do achievable in this way, or do I just need to bite the bullet and use a couple of foreach loops?

Load your RSS feeds into memory using async and await, then order the items by date and just Take the first 7.
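A minimal sketch of that suggestion. The feed loading is hidden behind a hypothetical `loadFeedAsync` delegate (in practice it would wrap `XmlReader.Create` and `SyndicationFeed.Load`), and `publishDate` picks the date to sort on, so the merge-and-take logic can run without network access:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class LatestItems
{
    // loadFeedAsync is a hypothetical stand-in for "download and parse one feed".
    public static async Task<List<T>> GetLatestAsync<T>(
        IEnumerable<string> feedUrls,
        Func<string, Task<IEnumerable<T>>> loadFeedAsync,
        Func<T, DateTimeOffset> publishDate,
        int count)
    {
        // Start every load before awaiting any of them, so the feeds load concurrently
        Task<IEnumerable<T>>[] loads = feedUrls.Select(loadFeedAsync).ToArray();
        IEnumerable<T>[] perFeed = await Task.WhenAll(loads).ConfigureAwait(false);

        // Flatten once, order by date across all feeds, then keep the newest items
        return perFeed.SelectMany(items => items)
                      .OrderByDescending(publishDate)
                      .Take(count)
                      .ToList();
    }
}
```

Because every task is started before the first `await`, slow feeds overlap instead of running one after another, and the ordering happens only once, after all feeds have been flattened.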

Considering all you want to do in GetLatestArticlesFromFeed is to get the 5 latest items, wouldn't it be easier to order the list only once and then take the first 5 items? It would look like this (together with a SelectMany-based approach for the first method):
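using System.Collections.Generic;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Xml;
public static class RSSHelper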
{
public static IEnumerable<SyndicationItem> GetLatestArticlesFromFeeds(List<string> feedList, short articlesToTake)
{
return feedList.SelectMany(f => GetLatestArticlesFromFeed(f)).OrderByDescending(a => a.PublishDate).Take(articlesToTake);
}
private static IEnumerable<SyndicationItem> GetLatestArticlesFromFeed(string feedURL)
{
// We're only accepting XML based feeds, so create an XML reader:
SyndicationFeed feed = null;
try
{
using (XmlReader reader = XmlReader.Create(feedURL))
{
feed = SyndicationFeed.Load(reader);
}
return feed.Items.OrderByDescending(o => o.PublishDate).Take(5);
}
catch
{
return Enumerable.Empty<SyndicationItem>();
}
}
}
Let me know if this doesn't work!

Morning,
Now that I don't feel like death warmed up, I've got it working! Thanks to #Rodrigo and #McKabue for their help in finding the eventual answer, and #NPSF3000 for pointing out my original stupidity!
I've settled on this as a result:
using System;
using System.Collections.Generic;
using System.Linq;
using System.ServiceModel.Syndication;
using System.Threading.Tasks;
using System.Xml;
public static class RSSHelper
{
public static IEnumerable<SyndicationItem> GetLatestArticlesFromFeeds(List<string> feedList, short articlesToTake)
{
// Blocks until every feed has loaded; the ConfigureAwait(false) calls below keep this from deadlocking under a UI or classic ASP.NET synchronization context
return GetLatestArticlesFromFeedsAsync(feedList, articlesToTake).Result;
}
private async static Task<IEnumerable<SyndicationItem>> GetLatestArticlesFromFeedsAsync(List<string> feedList, short articlesToTake)
{
List<Task<IEnumerable<SyndicationItem>>> TaskList = new List<Task<IEnumerable<SyndicationItem>>>();
foreach (string Feed in feedList)
{
// Start one task per feed so that the feeds are fetched concurrently
TaskList.Add(GetLatestArticlesFromFeed(Feed));
}
IEnumerable<SyndicationItem>[] Results = await Task.WhenAll(TaskList).ConfigureAwait(false);
// Drop the feeds that failed (null), flatten the rest once, then take the newest articles across all feeds
var ReturnList = Results.Where(r => r != null).SelectMany(r => r).OrderByDescending(o => o.PublishDate).Take(articlesToTake);
return ReturnList;
}
private async static Task<IEnumerable<SyndicationItem>> GetLatestArticlesFromFeed(string feedURL)
{
try
{
// SyndicationFeed.Load is synchronous, so run it on the thread pool to let the feeds load in parallel
return await Task.Run(() =>
{
// We're only accepting XML based feeds, so create an XML reader (disposed once the feed is loaded)
using (XmlReader Reader = XmlReader.Create(feedURL))
{
SyndicationFeed Feed = SyndicationFeed.Load(Reader);
return Feed.Items.OrderByDescending(o => o.PublishDate).Take(5).ToList();
}
}).ConfigureAwait(false);
}
catch (Exception)
{
// Treat an unreachable or malformed feed as having no articles
return null;
}
}
}
It took me a while to wrap my head around this, as I kept forgetting to define the result type of the task I was kicking off, but thankfully I stumbled across this question this morning, which helped everything fall nicely into place.
I feel a bit cheeky answering my own question, but on the whole I think this strikes a nice, tidy balance between the proposed answers, and it certainly seems functional. I'll add some hardening and comments, and of course if anyone has feedback I'll receive it gratefully.
Thanks!
Ash


Realm .NET Where query with Contains() throws System.NotSupportedException

I'm using Realm for .NET v10.1.3, and I've got a method that deletes some objects. Going by the documentation, which indicates that Contains is supported, I have the following snippet:
var results = realm.All<DeviceEventEntity>()
.Where(entity => ids.Contains(entity.Id));
realm.RemoveRange(results);
But when realm.RemoveRange(results) is executed, Realm throws a System.NotSupportedException. What am I doing wrong here? Or does Realm not support Contains?
Here's the stacktrace:
System.NotSupportedException
The method 'Contains' is not supported
at Realms.RealmResultsVisitor.VisitMethodCall(MethodCallExpression node) in C:\jenkins\workspace\realm_realm-dotnet_PR-2362#2\Realm\Realm\Linq\RealmResultsVisitor.cs:line 378
at Realms.RealmResultsVisitor.VisitMethodCall(MethodCallExpression node) in C:\jenkins\workspace\realm_realm-dotnet_PR-2362#2\Realm\Realm\Linq\RealmResultsVisitor.cs:line 164
at Realms.RealmResults`1.CreateHandle() in C:\jenkins\workspace\realm_realm-dotnet_PR-2362#2\Realm\Realm\Linq\RealmResults.cs:line 65
at System.Lazy`1.CreateValue()
at System.Lazy`1.LazyInitValue()
at Realms.RealmResults`1.get_ResultsHandle() in C:\jenkins\workspace\realm_realm-dotnet_PR-2362#2\Realm\Realm\Linq\RealmResults.cs:line 30
at Realms.Realm.RemoveRange[T](IQueryable`1 range) in C:\jenkins\workspace\realm_realm-dotnet_PR-2362#2\Realm\Realm\Realm.cs:line 1279
at DocLink.Client.Storage.Service.Event.DeviceEventService.<>c__DisplayClass2_0.<DeleteEvents>b__0() in
Here's a more complete example:
public Task DeleteEvents(List<ObjectId> ids) {
return Task.Run(() => {
using (var realm = GetRealm()) {
using (var transaction = realm.BeginWrite()) {
try {
var results = realm.All<DeviceEventEntity>().Where(entity => ids.Contains(entity.Id));
realm.RemoveRange(results);
transaction.Commit();
}
catch (Exception exception) {
transaction.Rollback();
throw new ServiceException("Unable to delete events. Transaction has been rolled back.", exception);
}
}
}
});
}
Also, it seems a little odd that the library is referencing paths such as C:\jenkins\workspace\realm_realm-dotnet_PR-2362#2\Realm\Realm\Linq\RealmResultsVisitor.cs. Nothing like that exists on my system; the library is pulled in through NuGet.
The docs say to use Filter when you encounter a NotSupportedException. Read the comments on the method for a link to the NSPredicate cheat sheet; there is quite a lot you can do with it :)
Update to question. First and foremost, thank you to all who participated and helped point me in the right direction. The final answer ended up being a combination of a couple of things, but in short it was this previous post that ended up solving the issue.
The current version of Realm has support for Mongo ObjectId; however, using ObjectId in the Filter() method didn't really work. So the fix was to use a string as the PK but keep ObjectId in the DTO: converting to ObjectId on the way out, and calling ToString() on the way into Realm.
public static class IQueryableExtensions {
public static IQueryable<T> In<T>(this IQueryable<T> source, string propertyName, IList<ObjectId> objList)
where T : RealmObject {
var query = string.Join(" OR ", objList.Select(i => $"{propertyName} == '{i.ToString()}'"));
var results = source.Filter(query);
return results;
}
}
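One gap in the extension above is an empty id list: string.Join then produces an empty string, which is not a meaningful predicate to hand to Filter(). A minimal, Realm-free sketch of the same predicate building with an explicit guard; `RealmPredicates` and `BuildInPredicate` are hypothetical names, not part of Realm:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RealmPredicates
{
    // Hypothetical helper: builds the same "Id == 'a' OR Id == 'b'" string
    // that the In() extension passes to Filter().
    public static string BuildInPredicate(string propertyName, IEnumerable<string> values)
    {
        List<string> clauses = values
            .Select(v => $"{propertyName} == '{v}'")
            .ToList();

        // With no values the joined string would be empty, so fail loudly
        // and let the caller decide to skip the query instead.
        if (clauses.Count == 0)
        {
            throw new ArgumentException("At least one value is required.", nameof(values));
        }

        return string.Join(" OR ", clauses);
    }
}
```

In DeleteEvents, the caller could return early when `ids.Count == 0` and otherwise pass `BuildInPredicate("Id", ids.Select(i => i.ToString()))` to Filter().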
My code using the extension:
public Task DeleteEvents(List<ObjectId> ids) {
return Task.Run(() => {
using (var realm = GetRealm())
{
using (var transaction = realm.BeginWrite())
{
try {
// In order to support this with the current version of Realm we had to write an extension In()
// that explodes the list into a Filter() expression of OR comparisons. This also required us
// to use string as the underlying PK type instead of ObjectId. In this way, our domain object
// still expects ObjectId, so we ToString() on the way into realm and ObjectId.Parse() on the
// way out to our DTO.
var results = realm.All<DeviceEventEntity>().In("Id", ids);
realm.RemoveRange(results);
transaction.Commit();
}
catch (Exception exception)
{
transaction.Rollback();
throw new ServiceException("Unable to delete events. Transaction has been rolled back.", exception);
}
}
}
});
}

How to seed an observable from a database

I'm trying to expose an observable sequence that gives observers all existing records in a database table plus any future items. For the sake of argument, let's say it's log entries. Therefore, I'd have something like this:
public class LogService
{
private readonly Subject<LogEntry> entries;
public LogService()
{
this.entries = new Subject<LogEntry>();
this.entries
.Buffer(...)
.Subscribe(async x => await this.WriteLogEntriesToDatabaseAsync(x));
}
public IObservable<LogEntry> Entries
{
get { return this.entries; }
}
public IObservable<LogEntry> AllLogEntries
{
get
{
// how the heck?
}
}
public void Log(string message)
{
this.entries.OnNext(new LogEntry(message));
}
private async Task<IEnumerable<LogEntry>> GetLogEntriesAsync()
{
// reads existing entries from DB table and returns them
}
private async Task WriteLogEntriesToDatabaseAsync(IList<LogEntry> entries)
{
// writes entries to the database
}
}
My initial thought for the implementation of AllLogEntries was something like this:
return Observable.Create<LogEntry>(
async observer =>
{
var existingEntries = await this.GetLogEntriesAsync();
foreach (var existingEntry in existingEntries)
{
observer.OnNext(existingEntry);
}
return this.entries.Subscribe(observer);
});
But the problem with this is that there could be log entries that have been buffered and not yet written to the database. Hence, those entries will be missed, because they are not in the database and have already passed through the entries observable.
My next thought was to separate the buffered entries from the non-buffered and use the buffered when implementing AllLogEntries:
return Observable.Create<LogEntry>(
async observer =>
{
var existingEntries = await this.GetLogEntriesAsync();
foreach (var existingEntry in existingEntries)
{
observer.OnNext(existingEntry);
}
return this.bufferedEntries
.SelectMany(x => x)
.Subscribe(observer);
});
There are two problems with this:
It means clients of AllLogEntries also have to wait for the buffer timespan to pass before they receive their log entries. I want them to see log entries instantaneously.
There is still a race condition in that log entries could be written to the database between the point at which I finish reading the existing ones and the point at which I return the future entries.
So my question is: how would I actually go about achieving my requirements here with no possibility of race conditions, and avoiding any major performance penalties?
To do this via the client code, you will probably have to implement a solution using polling and then look for differences between calls. Combining Observable.Interval() (http://rxwiki.wikidot.com/101samples#toc28) with Observable.DistinctUntilChanged() may well give you a sufficient solution.
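The "look for differences between calls" step can be sketched without Rx. This sketch assumes (hypothetically) that every log row carries a unique, increasing `Id`, and it swaps DistinctUntilChanged for a simple high-water mark: each poll returns only rows whose `Id` is above the largest one already emitted. In an Rx pipeline, a timer such as Observable.Interval would drive one call to `NewSince` per tick.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape: assumes every log row has a unique Id that only ever increases.
public sealed class LogRow
{
    public long Id { get; }
    public string Message { get; }
    public LogRow(long id, string message) { Id = id; Message = message; }
}

public sealed class PollingDiff
{
    // Highest Id handed out so far; rows at or below it were already emitted.
    private long lastSeenId = long.MinValue;

    // Call once per poll with the full query result; returns only the rows
    // that earlier polls have not returned, in Id order.
    public List<LogRow> NewSince(IEnumerable<LogRow> snapshot)
    {
        List<LogRow> fresh = snapshot
            .Where(r => r.Id > lastSeenId)
            .OrderBy(r => r.Id)
            .ToList();
        if (fresh.Count > 0)
        {
            lastSeenId = fresh[fresh.Count - 1].Id;
        }
        return fresh;
    }
}
```

A high-water mark only works if Ids become visible in increasing order; a row committed late with a lower Id would be missed, so that assumption matters.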
Alternatively, I'd suggest you try to find a solution where the clients are notified when the DB/table is updated. In a web application, you could use something like SignalR to do this.
For example: http://techbrij.com/database-change-notifications-asp-net-signalr-sqldependency
If it's not a web application, a similar update mechanism via sockets may work.
See these links (these came from the accepted answer of SignalR polling database for updates):
http://xsockets.net/api/net-c#snippet61
https://github.com/codeplanner/XSocketsPollingLegacyDB

How can I combine two streams ordered then grouped by timestamp?

I have two streams of objects that each have a Timestamp value. Both streams are in order, so for example the timestamps might be Ta = 1,3,6,6,7 in one stream and Tb = 1,2,5,5,6,8 in the other. Objects in both streams are of the same type.
What I'd like to be able to do is to put each of these events on the bus in order of timestamp, i.e., put A1, then B1, B2, A3 and so on. Furthermore, since some streams have several (sequential) elements with the same timestamp, I want those elements grouped so that each new event is an array. So we would put [A3] on the bus, followed by [A15,A25] and so on.
I've tried to implement this by making two ConcurrentQueue structures, putting each event at the back of the queue, then looking at each front of the queue, choosing first the earlier event and then traversing the queue such that all events with this timestamp are present.
However, I've encountered two problems:
If I leave these queues unbounded, I quickly run out of memory as the read op is a lot faster than the handlers receiving the events. (I've got a few gigabytes of data).
I sometimes end up with a situation where I handle the event, say, A15 before A25 has arrived. I somehow need to guard against this.
I'm thinking that Rx can help in this regard but I don't see an obvious combinator(s) to make this possible. Thus, any advice is much appreciated.
Rx is indeed a good fit for this problem IMO.
IObservables can't 'OrderBy' for obvious reasons (you would have to observe the entire stream first to guarantee the correct output order), so my answer below makes the assumption (that you stated) that your 2 source event streams are in order.
It was an interesting problem in the end. The standard Rx operators are missing a GroupByUntilChanged that would have solved this easily, as long as it called OnCompleted on the previous group observable when the first element of the next group was observed. However, looking at the implementation of DistinctUntilChanged, it doesn't follow this pattern and only calls OnCompleted when the source observable completes (even though it knows there will be no more elements after the first non-distinct element... weird???). Anyway, for those reasons, I decided against a GroupByUntilChanged method (so as not to break Rx conventions) and went instead for a ToEnumerableUntilChanged.
Disclaimer: This is my first Rx extension, so I would appreciate feedback on the choices I've made. Also, one main concern of mine is the anonymous observable holding the distinctElements list.
Firstly, your application code is quite simple:
public class Event
{
public DateTime Timestamp { get; set; }
}
private IObservable<Event> eventStream1;
private IObservable<Event> eventStream2;
public IObservable<IEnumerable<Event>> CombineAndGroup()
{
return eventStream1.CombineLatest(eventStream2, (e1, e2) => e1.Timestamp < e2.Timestamp ? e1 : e2)
.ToEnumerableUntilChanged(e => e.Timestamp);
}
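One caveat about the application code above: CombineLatest emits whenever either stream produces a value, pairing it with the latest value from the other stream, so depending on arrival order an event can be emitted more than once while others are skipped. If the two sources can be read as pull-based sequences, the merge the question describes (compare the two fronts, take the earlier one, gather equal timestamps) can be sketched with plain IEnumerable. Because the consumer drives the reading, only the current front of each input and the group being built are held in memory. Like ToEnumerableUntilChanged on the merged stream, it groups equal timestamps from both inputs together; `SortedMerge.MergeAndGroup` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

public static class SortedMerge
{
    // Merges two sequences that are each already sorted by key, then groups
    // adjacent items that share the same key.
    public static IEnumerable<List<T>> MergeAndGroup<T, TKey>(
        IEnumerable<T> first, IEnumerable<T> second, Func<T, TKey> keySelector)
    {
        var comparer = Comparer<TKey>.Default;
        using (var a = first.GetEnumerator())
        using (var b = second.GetEnumerator())
        {
            bool hasA = a.MoveNext(), hasB = b.MoveNext();
            List<T> group = null;
            TKey groupKey = default(TKey);
            while (hasA || hasB)
            {
                // Take from whichever front is earlier; on a tie, take from the first sequence.
                T next;
                if (hasA && (!hasB || comparer.Compare(keySelector(a.Current), keySelector(b.Current)) <= 0))
                {
                    next = a.Current;
                    hasA = a.MoveNext();
                }
                else
                {
                    next = b.Current;
                    hasB = b.MoveNext();
                }

                TKey key = keySelector(next);
                if (group != null && comparer.Compare(key, groupKey) == 0)
                {
                    group.Add(next);
                    continue;
                }

                // The key changed; both inputs are sorted, so the previous group is complete.
                if (group != null) yield return group;
                group = new List<T> { next };
                groupKey = key;
            }
            if (group != null) yield return group;
        }
    }
}
```

With the question's timestamps (1,3,6,6,7 and 1,2,5,5,6,8) this yields the groups [1,1], [2], [3], [5,5], [6,6,6], [7], [8].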
Now for the ToEnumerableUntilChanged implementation (wall of code warning):
public static IObservable<IEnumerable<TSource>> ToEnumerableUntilChanged<TSource,TKey>(this IObservable<TSource> source, Func<TSource,TKey> keySelector)
{
// TODO: Follow Rx conventions and create a superset overload that takes an IEqualityComparer<TKey> as a parameter
var comparer = EqualityComparer<TKey>.Default;
return Observable.Create<IEnumerable<TSource>>(observer =>
{
var currentKey = default(TKey);
var hasCurrentKey = false;
var distinctElements = new List<TSource>();
return source.Subscribe((value =>
{
TKey elementKey;
try
{
elementKey = keySelector(value);
}
catch (Exception ex)
{
observer.OnError(ex);
return;
}
if (!hasCurrentKey)
{
hasCurrentKey = true;
currentKey = elementKey;
distinctElements.Add(value);
return;
}
bool keysMatch;
try
{
keysMatch = comparer.Equals(currentKey, elementKey);
}
catch (Exception ex)
{
observer.OnError(ex);
return;
}
if (keysMatch)
{
distinctElements.Add(value);
return;
}
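// Hand the finished group over and start a new list, so an observer that keeps the group does not see it cleared and refilled
observer.OnNext(distinctElements);
distinctElements = new List<TSource> { value };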
currentKey = elementKey;
}), observer.OnError, () =>
{
if (distinctElements.Count > 0)
observer.OnNext(distinctElements);
observer.OnCompleted();
});
});
}

XmlReader best practices

I've had a good read through MSDN and the XmlReader-related questions on StackOverflow, and I haven't yet come across a decent "best practices" example.
I've tried various combinations and each seems to have downsides, but the best I can come up with is as follows:
The XML:
<properties>
<actions:name>start</actions:name>
<actions:value type="System.DateTime">06/08/2011 01:26:49</actions:value>
</properties>
The code:
// Reads past the initial/root element
reader.ReadStartElement();
// Check we haven't hit the end
while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement) {
if (reader.IsStartElement("name", NamespaceUri)) {
// Read the name element
this.name = reader.ReadElementContentAsString();
} else if (reader.IsStartElement("value", NamespaceUri)) {
// Read the value element
string valueTypeName = reader["type"] ?? typeof(string).Name;
Type valueType = Type.GetType(valueTypeName);
string valueString = reader.ReadElementContentAsString();
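// Other stuff here that doesn't matter to the XML parsing
} else if (reader.NodeType == XmlNodeType.Element) {
// We can't do anything with this element, so skip over it and all of its children
reader.Skip();
} else {
// Not an element (e.g. stray text): move on to the next node
reader.Read();
}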
}
This is being passed into my class from a .ReadSubtree() call, and each class reads its own information. I would prefer it NOT to rely on the elements being in a specific order.
Before this, I did try several variations.
1) while(reader.Read())
This was taken from various examples, but I found that it "missed" some elements: when .ReadContent*() on element 1 left the reader on the start of element 2, .Read() then moved over it to element 3.
2) Removing the .Read() caused it to just get stuck after the first element I read.
3) Several others I long consigned to "failed".
As far as I can see, the code I've settled on seems to be the most accepting and stable but is there anything obvious I'm missing?
(Note the C# 2.0 tag, so LINQ/XNode/XElement aren't options.)
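A self-contained sketch of the same order-independent loop, kept to C# 2.0 syntax and runnable against the sample XML. It assumes a placeholder namespace URI (`urn:example:actions`), because the sample omits the `xmlns:actions` declaration and the real NamespaceUri isn't shown; `ActionProperties` and its members are hypothetical. It also exercises the ReadSubtree() hand-off. Two details differ from the question's loop: MoveToContent() runs at the top of each iteration, so whitespace before the closing tag can't cause the end tag to be stepped over, and unknown elements are passed over with Skip(), so their children aren't mistaken for siblings.

```csharp
using System.IO;
using System.Xml;

public class ActionProperties
{
    // Hypothetical namespace URI: the real NamespaceUri value isn't shown in the question.
    public const string NamespaceUri = "urn:example:actions";

    public string Name;
    public string Value;
    public string ValueTypeName;

    // Expects the reader to be positioned on <properties>; children may come in any order.
    public void ReadXml(XmlReader reader)
    {
        reader.ReadStartElement(); // step past <properties>
        while (!reader.EOF)
        {
            // MoveToContent skips whitespace and comments, so the end tag is detected reliably
            XmlNodeType type = reader.MoveToContent();
            if (type == XmlNodeType.EndElement || type == XmlNodeType.None)
            {
                break;
            }
            if (reader.IsStartElement("name", NamespaceUri))
            {
                Name = reader.ReadElementContentAsString();
            }
            else if (reader.IsStartElement("value", NamespaceUri))
            {
                // Read the attribute first: ReadElementContentAsString moves past the element
                ValueTypeName = reader.GetAttribute("type");
                Value = reader.ReadElementContentAsString();
            }
            else if (type == XmlNodeType.Element)
            {
                reader.Skip(); // unknown element: skip it along with its children
            }
            else
            {
                reader.Read(); // text or other content we don't use
            }
        }
    }

    public static ActionProperties Parse(string xml)
    {
        using (XmlReader outer = XmlReader.Create(new StringReader(xml)))
        {
            outer.MoveToContent(); // position on the root element
            using (XmlReader sub = outer.ReadSubtree())
            {
                sub.Read(); // a subtree reader starts before its first node
                ActionProperties result = new ActionProperties();
                result.ReadXml(sub);
                return result;
            }
        }
    }
}
```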
One approach is to use a custom XmlReader. XmlReader is abstract and XmlReaders can be chained, giving a powerful mechanism to do some domain specific processing in a reader.
Example: XamlXmlReader
Help on XmlWrappingReader
Here's a sample of how it could be implemented (See inline comments):
/// <summary>
/// Depending on the complexity of the Xml structure, a complex state machine could be required here.
/// Such a reader nicely separates the structure of the Xml from the business logic dependent on the data in the Xml.
/// </summary>
public class CustomXmlReader: XmlWrappingReader
{
public CustomXmlReader(XmlReader xmlReader)
:base(XmlReader.Create(xmlReader, xmlReader.Settings))
{
}
public override bool Read()
{
var b = base.Read();
if (!b)
return false;
_myEnum = MyEnum.None;
// Match start elements only, on LocalName so that a prefixed element such as <actions:name> also matches
if(this.NodeType == XmlNodeType.Element && "name".Equals(this.LocalName))
{
_myEnum = MyEnum.Name;
//custom logic to read the entire element and set the enum, name and any other properties relevant to your domain
//Use base.Read() until you've read the complete "logical" chunk of Xml. The "logical" chunk could be more than one element.
}
if(this.NodeType == XmlNodeType.Element && "value".Equals(this.LocalName))
{
_myEnum = MyEnum.Value;
//custom logic to read the entire element and set the enum, value and any other properties relevant to your domain
//Use base.Read() until you've read the complete "logical" chunk of Xml. The "logical" chunk could be more than one element.
}
return true;
}
//These properties could return some domain specific values
#region domain specific reader properties.
private MyEnum _myEnum;
public MyEnum MyEnum
{
get { return _myEnum; }
}
#endregion
}
public enum MyEnum
{
Name,
Value,
None
}
public class MyBusinessAppClass
{
public void DoSomething(XmlReader passedInReader)
{
var myReader = new CustomXmlReader(passedInReader);
while(myReader.Read())
{
switch(myReader.MyEnum)
{
case MyEnum.Name:
//Do something here;
break;
case MyEnum.Value:
//Do something here;
break;
}
}
}
}
A word of caution: this might be over-engineering for the simple XML processing you've shown here. Unless you have more than two elements that need custom processing, this approach is not advised.

Yield multiple IEnumerables

I have a piece of code that does calculations on assets. There are many millions of those, so I want to compute everything in streams. My current 'pipeline' looks like this:
I have a query that is executed as a DataReader.
Then my Asset class has a constructor that accepts an IDataReader:
public Asset(IDataReader rdr){
// logic that initiates fields
}
and a method that converts the IDataReader to an IEnumerable<Asset>
public static IEnumerable<Asset> ToAssets(IDataReader rdr) {
// make sure the reader is in the right format
CheckReaderFormat(rdr);
// project reader into IEnumerable<Asset>
while (rdr.Read()) yield return new Asset(rdr);
}
That then gets passed into a function that does the actual calculations and projects it into an IEnumerable<Answer>.
That in turn gets a wrapper that exposes the Answers as an IDataReader, which is passed to an OracleBulkCopy, and the stream is written to the DB.
So far it works like a charm. Because of the setup I can swap the DataReader for an IEnumerable that reads from a file, or have the results written to a file etc. All depending on how I string the classes/ functions together.
Now: there are several things I can compute. For instance, besides the normal Answer I could have a DebugAnswer class that also outputs some intermediate numbers for debugging. So what I would like to do is project the IEnumerable into several output streams so I can put 'listeners' on those. That way I won't have to go over the data multiple times. How can I do that? Kind of like having several events and then only firing certain code if there are listeners attached.
Also, sometimes I write to the DB but also to a zip file, just to keep a backup of the results. So then I would like to have 2 'listeners' on the IEnumerable: one that projects it as an IDataReader and another one that writes straight to the file.
How do I output multiple output streams and how can I put multiple listeners on one outputstream? What lets me compose streams of data like that?
edit
so some pseudocode of what I would like to do:
foreach(Asset in Assets){
if(DebugListener != null){
// compute
DebugAnswer da = new DebugAnswer {result = 100};
yield da to DebugListener; // so instead of yield return yield to that stream
}
if(AnswerListener != null){
// compute basic stuff
Answer a = new Answer { bla = 200 };
yield a to AnswerListener;
}
}
Thanks in advance,
Gert-Jan
What you're describing sounds sort of like what the Reactive framework provides via the IObservable interface, but I don't know for sure whether it allows multiple subscribers to a single subscription stream.
Update
If you take a look at the documentation for IObservable, it has a pretty good example of how to do the sort of thing you're doing, with multiple subscribers to a single object.
Your example rewritten using Rx:
// The stream of assets
IObservable<Asset> assets = ...
// The stream of each asset projected to a DebugAnswer
IObservable<DebugAnswer> debugAnswers = from asset in assets
select new DebugAnswer { result = 100 };
// Subscribe the DebugListener to receive the debugAnswers
debugAnswers.Subscribe(DebugListener);
// The stream of each asset projected to an Answer
IObservable<Answer> answers = from asset in assets
select new Answer { bla = 200 };
// Subscribe the AnswerListener to receive the answers
answers.Subscribe(AnswerListener);
This is exactly the job for Reactive Extensions (the IObservable<T> and IObserver<T> interfaces have been part of .NET since 4.0; the Rx operators themselves ship as a separate library, which was also available for 3.5).
You don't need multiple "listeners"; you just need pipeline components that neither consume the stream nor necessarily transform it.
IEnumerable<T> PassThroughEnumerable<T>(IEnumerable<T> source, Action<T> action) {
foreach (T t in source) {
// Let the side channel (debug output, backup file, ...) see the item, then pass it on unchanged
action(t);
yield return t;
}
}
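A related point about the Rx version above: if `assets` is a cold observable (for example one built over the data reader), every Subscribe call starts its own pass over the source; Rx's Publish() and Connect() exist to share a single pass between subscribers. Without Rx, a single-pass fan-out to several optional listeners can be sketched as follows. `Fanout.Broadcast` is a hypothetical name, and each listener can do its own projection (to a DebugAnswer, an Answer, a backup-file write) inside its delegate:

```csharp
using System;
using System.Collections.Generic;

public static class Fanout
{
    // Enumerates the source exactly once and offers every item to each listener.
    // A null listener is simply skipped, like an event with no subscribers.
    public static void Broadcast<T>(IEnumerable<T> source, params Action<T>[] listeners)
    {
        foreach (T item in source)
        {
            foreach (Action<T> listener in listeners)
            {
                if (listener != null)
                {
                    listener(item);
                }
            }
        }
    }
}
```

Unlike chaining several PassThroughEnumerable calls, nothing downstream has to enumerate the result to make the listeners fire; the trade-off is that Broadcast itself is the terminal step of the pipeline.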
Or, as you're processing in the pipeline, just raise some events to be consumed. You can make the handlers asynchronous if you want:
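IEnumerable<Asset> ToAssets(IDataReader rdr) {
CheckReaderFormat(rdr);
// Read the delegate once when enumeration starts, so an unsubscribe can't null it between the check and the call
var h = this.DebugAsset;
while (rdr.Read()) {
var a = new Asset(rdr);
if (h != null) h(a);
yield return a;
}
}
// Action<Asset> rather than EventHandler<Asset>, so handlers receive the asset directly, matching h(a) above
public event Action<Asset> DebugAsset;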
If I got you right, it should be possible to replace or decorate the wrapper. The WrapperDecorator may forward calls to the normal OracleBulkCopy (or whatever you're using) and add some custom debug code.
Does that help you?
Matthias
