Skipping null values in a Scan method of an Observable collection


I have an IObservable of items with a timestamp.
I use the Scan method to wrap each item and add a reference to the last valid wrapped item which was received.
IObservable<IWrappedItem> wrappedItems =
    allItems.Scan(seedItem,
        (lastWrappedItem, currentItem) =>
            WrapItem(currentItem, lastWrappedItem));
This is the signature of WrapItem:
IWrappedItem WrapItem(Item currentItem, IWrappedItem lastItem);
We needed to change the WrapItem method so that it returns null for invalid (non-null) items, which should then be skipped.
The seedItem will most probably be null, and the WrapItem method can handle it.
I need to update the way I use Scan with something like this:
IObservable<IWrappedItem> wrappedItems = allItems.Scan(seedItem, (lastWrappedItem, currentItem) =>
{
    IWrappedItem wrappedItem = WrapItem(currentItem, lastWrappedItem);
    if (wrappedItem == null)
    {
        // Do something clever to skip this invalid item
        // Next item should have a reference to lastWrappedItem
    }
    return wrappedItem;
});
How can I implement this behavior without returning null values to the new collection, while keeping the Observable pattern?
Is there a different method that I should use instead of "Scan"?

You should be able to simply use the Where method: https://learn.microsoft.com/en-us/dotnet/api/system.linq.enumerable.where?view=net-7.0
IObservable<IWrappedItem> wrappedItems = allItems.Where(item => item != null).Scan(seedItem, (lastWrappedItem, currentItem) =>
{
    IWrappedItem wrappedItem = WrapItem(currentItem, lastWrappedItem);
    if (wrappedItem == null)
    {
        // Do something clever to skip this invalid item
        // Next item should have a reference to lastWrappedItem
    }
    return wrappedItem;
});

I found the answer to my question: I implemented a custom generic Scan method that receives the WrapItem function and ignores null values returned from it. It implements Scan using the Select and Where methods.
This is my implementation:
public static IObservable<TAccumulate> ScanObservableAndFilterNulls<TSource, TAccumulate>(this IObservable<TSource> items, TAccumulate seed, Func<TSource, TAccumulate, TAccumulate> wrapItemFunc)
{
    // use the seed before beginning the scan implementation
    TAccumulate lastWrappedItem = seed;
    // implement the custom Scan method
    return items.Select(item => wrapItemFunc(item, lastWrappedItem))
        .Where(wrappedItem =>
        {
            if (wrappedItem != null)
            {
                // update the lastWrappedItem only when the wrapped item is valid
                lastWrappedItem = wrappedItem;
                return true;
            }
            // skip invalid wrapped items, but keep the reference to the last valid item
            return false;
        });
}
This method can be used like this:
public static IObservable<IWrappedItem> ScanAndWrapItems(IObservable<Item> allItems, IWrappedItem seedItem)
{
return allItems.ScanObservableAndFilterNulls(seedItem, WrapItem);
}
I didn't benchmark the new method to assess the performance penalty, but I believe it would be slower than the original Scan method...
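One thing to watch in the Select/Where version above: lastWrappedItem is captured once per call to the extension method, so two subscribers to the returned observable would share (and overwrite) the same variable. A possible alternative, sketched here on the assumption that the System.Reactive package is referenced, keeps Scan and carries the last valid item in the accumulator together with a flag saying whether the current step produced something to emit. ScanSkippingNulls, Last and Emit are hypothetical names, and negative numbers stand in for invalid items in the demo.

```csharp
using System;
using System.Reactive.Linq; // assumption: System.Reactive NuGet package is referenced

public static class ObservableScanExtensions
{
    // Hypothetical helper: Scan keeps (last valid item, emit flag) as its state,
    // so the state lives inside Scan and is created fresh for every subscription.
    public static IObservable<TAccumulate> ScanSkippingNulls<TSource, TAccumulate>(
        this IObservable<TSource> source,
        TAccumulate seed,
        Func<TSource, TAccumulate, TAccumulate> wrapItemFunc)
        where TAccumulate : class
    {
        return source
            .Scan(
                (Last: seed, Emit: false),
                (state, item) =>
                {
                    TAccumulate wrapped = wrapItemFunc(item, state.Last);
                    // A null result keeps the previous valid item and suppresses output.
                    return wrapped != null
                        ? (Last: wrapped, Emit: true)
                        : (Last: state.Last, Emit: false);
                })
            .Where(state => state.Emit)
            .Select(state => state.Last);
    }
}

public static class Program
{
    public static void Main()
    {
        // Negative numbers play the role of invalid items here.
        IObservable<string> wrapped = new[] { 1, -1, 2 }
            .ToObservable()
            .ScanSkippingNulls<int, string>(
                null,
                (item, last) => item < 0 ? null : $"{last ?? "seed"}>{item}");

        wrapped.Subscribe(Console.WriteLine); // prints "seed>1", then "seed>1>2"
    }
}
```

With the types from the question, the call would look like allItems.ScanSkippingNulls(seedItem, WrapItem). The extra tuple per item is the price for not sharing state between subscribers.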

Related

Looking for alternative LINQ expression(s)

I'm working on a code generator that validates objects based on certain business rules. As an example, I'm curious to find out the various ways the logic below can be written as a LINQ expression.
The assertion should evaluate to true when the collection is null OR when the count of "TrueAndCorrect" items is anything but 1. One possible solution is:
bool assertion = report.DeclarationOfTrusteeCollection == null
    || report.DeclarationOfTrusteeCollection.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1;
Are there other, perhaps more compact, ways to express this in LINQ, such as using Any or inverting the operators?

The original code is:
bool assertion =
report.DeclarationOfTrusteeCollection == null ||
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
There are some problems here.
First, the intention of the null check seems to be "a null collection has the same semantics as an empty collection". This is a worst-practice in C#. Never do this! If you want to represent an empty collection, make an empty collection. There's even an Enumerable.Empty helper method for you.
So, start with that; the code should be:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
or
Debug.Assert(report.DeclarationOfTrusteeCollection != null);
if the condition is impossible.
That leaves us with
bool assertion =
report.DeclarationOfTrusteeCollection.Count(
f => f.FTER99.Equals("TrueAndCorrect")) != 1;
This is bad. Suppose I show you a jar that contains some number of pennies and I ask you "is there exactly one penny in the jar?" How many pennies do you have to count before you know the answer? Your code here is counting all of them, but you could stop after two.
Enumerable gives you a method which throws if a sequence is not a singleton, but no method that tests it. Fortunately it is easy to write. The best practice here is to write a helper method that has the exact semantics you want:
static class Extensions
{
    public static bool IsSingleton<T>(this IEnumerable<T> items)
    {
        bool seenOne = false;
        foreach (T item in items)
        {
            if (seenOne) return false;
            seenOne = true;
        }
        return seenOne;
    }
    public static bool IsSingleton<T>(
        this IEnumerable<T> items, Func<T, bool> predicate) =>
        items.Where(predicate).IsSingleton();
}
Done. And now your code is:
if (report.DeclarationOfTrusteeCollection == null)
throw some appropriate exception
bool assertion =
report.DeclarationOfTrusteeCollection.IsSingleton(f => ...);
Write the code so that it reads like what it is logically doing. That's the beauty and power of LINQ sequence operators.

You could use the null-propagation operator:
bool assertion = report.DeclarationOfTrusteeCollection?.Count(f => f.FTER99.Equals("TrueAndCorrect")) != 1;
Since null is not 1, this is also true if the collection is null.
It would be nice not to have to count the whole collection, since you already know the result once there is more than one matching element. I don't know of a built-in method for that, but you could write your own extension:
public static class MyExtensions
{
    public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        if (source == null) return true;
        bool found = false;
        foreach (T element in source)
        {
            if (!predicate(element)) continue;
            if (found) return true; // this is the second match!
            found = true;
        }
        return !found; // one match found (or not)
    }
}
And use it:
bool assertion = report.DeclarationOfTrusteeCollection.IsNullOrHasNotExactlyOneMatching(f => f.FTER99.Equals("TrueAndCorrect"));
As mentioned by Rawling you could shorten the extension using Take():
public static bool IsNullOrHasNotExactlyOneMatching<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
return source?.Where(predicate).Take(2).Count() != 1;
}
or do this directly:
bool assertion = report.DeclarationOfTrusteeCollection?.Where(f => f.FTER99.Equals("TrueAndCorrect"))
.Take(2).Count() != 1;
Both versions only iterate until a second match is found (or until the end if there are fewer than two matches).
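To make the early exit visible, here is a small sketch that counts how many elements are actually pulled from the source. TakeTwoDemo, Statuses, Pulled and NotExactlyOne are hypothetical names, and plain strings stand in for the FTER99 values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeTwoDemo
{
    // Number of elements the consumer has pulled from Statuses().
    public static int Pulled;

    // Hypothetical source with three matching entries out of five.
    public static IEnumerable<string> Statuses()
    {
        string[] values = { "TrueAndCorrect", "Other", "TrueAndCorrect", "Other", "TrueAndCorrect" };
        foreach (string value in values)
        {
            Pulled++;
            yield return value;
        }
    }

    // Same shape as the one-liner above: null or "not exactly one match" gives true.
    public static bool NotExactlyOne(IEnumerable<string> source) =>
        source?.Where(s => s == "TrueAndCorrect").Take(2).Count() != 1;

    public static void Main()
    {
        Pulled = 0;
        bool assertion = NotExactlyOne(Statuses());
        // Enumeration stops at the second match, which is the third element.
        Console.WriteLine($"{assertion}, pulled {Pulled} of 5"); // True, pulled 3 of 5
    }
}
```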

How do I verify a collection of values is unique (contains no duplicates) in C#

Surely there is an easy way to verify a collection of values has no duplicates [using the default Comparison of the collection's Type] in C#/.NET ? Doesn't have to be directly built in but should be short and efficient.
I've looked a lot but I keep hitting examples of using collection.Count() == collection.Distinct().Count() which for me is inefficient. I'm not interested in the result and want to bail out as soon as I detect a duplicate, should that be the case.
(I'd love to delete this question and/or its answer if someone can point out the duplicates)

Okay, if you just want to get out as soon as the duplicate is found, it's simple:
// TODO: add an overload taking an IEqualityComparer<T>
// Note: extension methods must be static and declared in a static class.
public static bool AllUnique<T>(this IEnumerable<T> source)
{
    if (source == null)
    {
        throw new ArgumentNullException("source");
    }
    var distinctItems = new HashSet<T>();
    foreach (var item in source)
    {
        if (!distinctItems.Add(item))
        {
            return false;
        }
    }
    return true;
}
... or use All, as you've already shown. I'd argue that this is slightly simpler to understand in this case... or if you do want to use All, I'd at least separate the creation of the set from the method group conversion, for clarity:
public static bool IsUnique<T>(this IEnumerable<T> source)
{
    // TODO: validation
    var distinctItems = new HashSet<T>();
    // Add will return false if the element already exists. If
    // every element is actually added, then they must all be unique.
    return source.All(distinctItems.Add);
}
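The TODO about a comparer overload could be filled in roughly as follows. This is only a sketch: the class name UniquenessExtensions and the sample data are made up, and the parameterless overload simply forwards to the comparer-taking one.

```csharp
using System;
using System.Collections.Generic;

public static class UniquenessExtensions
{
    // Uses the default equality comparer for T.
    public static bool AllUnique<T>(this IEnumerable<T> source) =>
        source.AllUnique(EqualityComparer<T>.Default);

    // Overload taking an IEqualityComparer<T>; bails out on the first duplicate.
    public static bool AllUnique<T>(this IEnumerable<T> source, IEqualityComparer<T> comparer)
    {
        if (source == null)
        {
            throw new ArgumentNullException(nameof(source));
        }
        var seen = new HashSet<T>(comparer);
        foreach (T item in source)
        {
            if (!seen.Add(item))
            {
                return false; // Add returns false when the item is already present
            }
        }
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        string[] names = { "alpha", "Beta", "ALPHA" };
        Console.WriteLine(names.AllUnique());                                 // True
        Console.WriteLine(names.AllUnique(StringComparer.OrdinalIgnoreCase)); // False
    }
}
```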

Doing it inline, you can replace:
collection.Count() == collection.Distinct().Count()
with
collection.All( new HashSet<T>().Add );
(where T is the type of your collection's elements)
Or you can extract the above to a helper extension method[1] so you can say:
collection.IsUnique()
[1]
static class EnumerableUniquenessExtensions
{
    public static bool IsUnique<T>(this IEnumerable<T> that)
    {
        return that.All( new HashSet<T>().Add );
    }
}
(and as Jon has pointed out in his answer, one really should separate and comment the two lines as such 'cuteness' is generally Not A Good Idea)

Is this achievable with a single LINQ query?

Suppose I have a given object of type IEnumerable<string> which is the return value of method SomeMethod(), and which contains no repeated elements. I would like to be able to "zip" the following lines in a single LINQ query:
IEnumerable<string> someList = SomeMethod();
if (someList.Contains(givenString))
{
    return someList.Where(s => s == givenString);
}
else
{
    return someList;
}
Edit: I mistakenly used Single instead of First. Corrected now.
I know I can "zip" this by using the ternary operator, but that's just not the point. I would just like to be able to achieve this with a single line. Is that possible?

This will return items with given string or all items if given is not present in the list:
someList.Where(i => i == givenString || !someList.Contains(givenString))
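Note that in the expression above, someList.Contains(givenString) runs again for every element that does not match, so the source can be scanned many times. A variant that checks once and then filters might look like the sketch below. It is no longer a single expression, and the Contains call runs eagerly when the method is called rather than when the result is enumerated. MatchesOrAll is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterOrAll
{
    // Hypothetical helper: returns the matching items, or every item if none match.
    public static IEnumerable<string> MatchesOrAll(IEnumerable<string> someList, string givenString)
    {
        // Evaluate the membership test once instead of once per non-matching element.
        bool present = someList.Contains(givenString);
        return someList.Where(s => !present || s == givenString);
    }

    public static void Main()
    {
        var list = new List<string> { "a", "b", "c" };
        Console.WriteLine(string.Join(",", MatchesOrAll(list, "b"))); // b
        Console.WriteLine(string.Join(",", MatchesOrAll(list, "x"))); // a,b,c
    }
}
```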

The nature of your desired output requires that you either make two requests for the data, like you are now, or buffer the non-matches to return if no matches are found. The latter would be especially useful in cases where actually getting the data is a relatively expensive call (e.g. a database query or WCF service). The buffering method would look like this:
static IEnumerable<T> AllIfNone<T>(this IEnumerable<T> source,
    Func<T, bool> predicate)
{
    //argument checking ignored for sample purposes
    var buffer = new List<T>();
    bool foundFirst = false;
    foreach (var item in source)
    {
        if (predicate(item))
        {
            foundFirst = true;
            yield return item;
        }
        else if (!foundFirst)
        {
            buffer.Add(item);
        }
    }
    if (!foundFirst)
    {
        foreach (var item in buffer)
        {
            yield return item;
        }
    }
}
The laziness of this method is either that of Where or ToList depending on if the collection contains a match or not. If it does, you should get execution similar to Where. If not, you will get roughly the execution of calling ToList (with the overhead of all the failed filter checks) and iterating the result.

What is wrong with the ternary operator?
someList.Any(s => s == givenString) ? someList.Where(s => s == givenString) : someList;
It would be better to do the Where followed by the Any but I can't think of how to one-line that.
var reducedEnumerable = someList.Where(s => s == givenString);
return reducedEnumerable.Any() ? reducedEnumerable : someList;

It is not possible to change the return type on the method, which is what you're asking. The first condition returns a string and the second condition returns a collection of strings.
Just return the IEnumerable<string> collection, and call Single on the return value like this:
string test = ReturnCollectionOfStrings().Single(x => x == "test");

Why does this linq extension method hit the database twice?

I have an extension method called ToListIfNotNullOrEmpty(), which is hitting the DB twice, instead of once. The first time it returns one result, the second time it returns all the correct results.
I'm pretty sure the first time it hits the database is when the .Any() method is called.
Here's the code.
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
    if (value.IsNullOrEmpty())
    {
        return null;
    }
    if (value is IList<T>)
    {
        return (value as IList<T>);
    }
    return new List<T>(value);
}
public static bool IsNullOrEmpty<T>(this IEnumerable<T> value)
{
    if (value != null)
    {
        return !value.Any();
    }
    return true;
}
I'm hoping to refactor it so that, before the .Any() method is called, it actually enumerates through the entire list.
If I do the following, only one DB call is made, because the list is already enumerated.
var pewPew = (from x in whatever
              select x)
    .ToList() // This enumerates.
    .ToListIfNotNullOrEmpty(); // This checks the enumerated result.
I sorta don't really want to call ToList() then my extension method.
Any ideas, folks?

I confess that I see little point in this method. Surely if you simply do a ToList(), a check to see if the list is empty suffices as well. It's arguably harder to handle the null result when you expect a list because then you always have to check for null before you iterate over it.
I think that:
var query = (from ...).ToList();
if (query.Count == 0) {
...
}
works as well and is less burdensome than
var query = (from ...).ToListIfNotNullOrEmpty();
if (query == null) {
...
}
and you don't have to implement (and maintain) any code.

How about something like this?
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
    if (value == null)
        return null;
    var list = value.ToList();
    return (list.Count > 0) ? list : null;
}

To actually answer your question:
This method hits the database twice because the extension methods provided by the System.Linq.Enumerable class exhibit what is called deferred execution. Essentially, this is to eliminate the need for constructing a string of temporarily cached collections for every part of a query. To understand this, consider the following example:
var firstMaleTom = people
    .Where(p => p.Gender == Gender.Male)
    .Where(p => p.FirstName == "Tom")
    .FirstOrDefault();
Without deferred execution, the above code might require that the entire collection people be enumerated over, populating a temporary buffer array with all the individuals whose Gender is Male. Then it would need to be enumerated over again, populating another buffer array with all of the individuals from the first buffer whose first name is Tom. After all that work, the last part would return the first item from the resulting array.
That's a lot of pointless work. The idea with deferred execution is that the above code really just sets up the firstMaleTom variable with the information it needs to return what's being requested with the minimal amount of work.
Now, there's a flip side to this: in the case of querying a database, deferred execution means that the database gets queried when the return value is evaluated. So, in your IsNullOrEmpty method, when you call Any, the value parameter is actually being evaluated right then and there -- hence a database query. After this, in your ToListIfNotNullOrEmpty method, the line return new List<T>(value) also evaluates the value parameter -- because it's enumerating over the values and adding them to the newly created List<T>.
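The double evaluation can be reproduced without a database. In this sketch, FakeQuery is a hypothetical stand-in for the query: its body runs once per enumeration, so the counter shows how often the "database" would be hit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Counts how many times the sequence has been enumerated ("database hits").
    public static int QueryCount;

    // Hypothetical stand-in for a deferred database query.
    public static IEnumerable<int> FakeQuery()
    {
        QueryCount++; // runs when enumeration starts, not when FakeQuery() is called
        yield return 1;
        yield return 2;
        yield return 3;
    }

    public static void Main()
    {
        IEnumerable<int> value = FakeQuery();  // nothing has executed yet
        bool any = value.Any();                // first enumeration
        var copy = new List<int>(value);       // second enumeration
        Console.WriteLine($"{any}, {copy.Count} items, {QueryCount} hits"); // True, 3 items, 2 hits

        QueryCount = 0;
        List<int> once = FakeQuery().ToList(); // single enumeration
        bool empty = once.Count == 0;          // checks the materialized list only
        Console.WriteLine($"{empty}, {QueryCount} hits");                   // False, 1 hits
    }
}
```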

You could stick the .ToList() call inside the extension. The effect is slightly different, but does this still work in the cases you have?
public static IList<T> ToListIfNotNullOrEmpty<T>(this IEnumerable<T> value)
{
    if (value == null)
    {
        return null;
    }
    var result = value.ToList();
    return result.IsNullOrEmpty() ? null : result;
}

intro to lambda/anonymous functions

I have this function from a plugin (from a previous post)
// This method implements the test condition for
// finding the ResolutionInfo.
private static bool IsResolutionInfo(ImageResource res)
{
    return res.ID == (int)ResourceIDs.ResolutionInfo;
}
And the line that's calling this function:
get
{
    return (ResolutionInfo)m_imageResources.Find(IsResolutionInfo);
}
So basically I'd like to get rid of the IsResolutionInfo function. It's only called twice (once in the get and once in the set). And it could possibly help me to understand inline functions in C#.

get
{
return (ResolutionInfo)m_imageResources.Find(res => res.ID == (int)ResourceIDs.ResolutionInfo);
}
Does that clear it up at all?
Just to further clear things up, looking at reflector, this is what the Find method looks like:
public T Find(Predicate<T> match)
{
    if (match == null)
    {
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
    }
    for (int i = 0; i < this._size; i++)
    {
        if (match(this._items[i]))
        {
            return this._items[i];
        }
    }
    return default(T);
}
So as you can see, it loops through the collection, and for every item in the collection, it passes the item at that index to the Predicate that you passed in (through your lambda). Thus, since we're dealing with generics, it automatically knows the type you're dealing with. It'll be Type T which is whatever type that is in your collection. Makes sense?
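As a self-contained illustration, the sketch below calls List<T>.Find once with a named method and once with an equivalent lambda. Resource, HasId1005 and the ID values are made up for the example and are not the plugin's real types.

```csharp
using System;
using System.Collections.Generic;

public static class FindDemo
{
    // Hypothetical stand-in for the plugin's ImageResource type.
    public class Resource
    {
        public int ID;
        public string Name;
    }

    // Named predicate, comparable to IsResolutionInfo in the question.
    public static bool HasId1005(Resource res)
    {
        return res.ID == 1005;
    }

    public static List<Resource> Sample() => new List<Resource>
    {
        new Resource { ID = 1000, Name = "first" },
        new Resource { ID = 1005, Name = "resolution" },
    };

    public static void Main()
    {
        List<Resource> resources = Sample();

        // Find invokes the predicate for each element in turn until one returns true.
        Resource viaMethod = resources.Find(HasId1005);
        Resource viaLambda = resources.Find(res => res.ID == 1005);
        Console.WriteLine(ReferenceEquals(viaMethod, viaLambda)); // True

        // No match: Find returns default(T), which is null for a class.
        Console.WriteLine(resources.Find(res => res.ID == 42) == null); // True
    }
}
```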

Just to add: does the Find function on a list (which is what m_imageResources is) automatically pass the parameter to the IsResolutionInfo function?
Also, what happens first, the cast or the function call?
