How to select last value from each run of similar items?

I have a list. I'd like to take the last value from each run of similar elements.
What do I mean? Let me give a simple example. Given the list of words
['golf', 'hip', 'hop', 'hotel', 'grass', 'world', 'wee']
And the similarity function 'starting with the same letter', the function would return the shorter list
['golf', 'hotel', 'grass', 'wee']
Why? The original list has a 1-run of G words, a 3-run of H words, a 1-run of G words, and a 2-run of W words. The function returns the last word from each run.
How can I do this?
Hypothetical C# syntax (in reality I'm working with customer objects but I wanted to share something you could run and test yourself)
> var words = new List<string>{"golf", "hip", "hop", "hotel", "grass", "world", "wee"};
> words.LastDistinct(x => x[0])
["golf", "hotel", "grass", "wee"]
Edit: I tried .GroupBy(x => x[0]).Select(g => g.Last()) but that gives ['grass', 'hotel', 'wee'], which is not what I want. Read the example carefully.
Edit: Another example.
['apples', 'armies', 'black', 'beer', 'bastion', 'cat', 'cart', 'able', 'art', 'bark']
Here there are 5 runs (a run of A's, a run of B's, a run of C's, a new run of A's, a new run of B's). The last word from each run would be:
['armies', 'bastion', 'cart', 'art', 'bark']
The important thing to understand is that each run is independent. Don't mix up the run of A's at the start with the run of A's near the end.

There's nothing too complicated with just doing it the old-fashioned way:
Func<string, object> groupingFunction = s => s.Substring(0, 1);
IEnumerable<string> input = new List<string>() {"golf", "hip", "..." };
var output = new List<string>();
if (!input.Any())
{
return output;
}
var lastItem = input.First();
var lastKey = groupingFunction(lastItem);
foreach (var currentItem in input.Skip(1))
{
var currentKey = groupingFunction(currentItem);
if (!currentKey.Equals(lastKey))
{
output.Add(lastItem);
}
lastKey = currentKey;
lastItem = currentItem;
}
output.Add(lastItem);
You could also turn this into a generic extension method, as Tim Schmelter has done; I have already taken a couple of steps to generalize the code on purpose (using object as the key type and IEnumerable<T> as the input type).

You could use this extension that can group by adjacent/consecutive elements:
public static IEnumerable<IGrouping<TKey, TSource>> GroupAdjacent<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
TKey last = default(TKey);
bool haveLast = false;
List<TSource> list = new List<TSource>();
foreach (TSource s in source)
{
TKey k = keySelector(s);
if (haveLast)
{
if (!k.Equals(last))
{
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
list = new List<TSource>();
list.Add(s);
last = k;
}
else
{
list.Add(s);
last = k;
}
}
else
{
list.Add(s);
last = k;
haveLast = true;
}
}
if (haveLast)
yield return new GroupOfAdjacent<TSource, TKey>(list, last);
}
public class GroupOfAdjacent<TSource, TKey> : IEnumerable<TSource>, IGrouping<TKey, TSource>
{
public TKey Key { get; set; }
private List<TSource> GroupList { get; set; }
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return ((System.Collections.Generic.IEnumerable<TSource>)this).GetEnumerator();
}
System.Collections.Generic.IEnumerator<TSource> System.Collections.Generic.IEnumerable<TSource>.GetEnumerator()
{
foreach (var s in GroupList)
yield return s;
}
public GroupOfAdjacent(List<TSource> source, TKey key)
{
GroupList = source;
Key = key;
}
}
Then it's easy:
var words = new List<string>{"golf", "hip", "hop", "hotel", "grass", "world", "wee"};
IEnumerable<string> lastWordOfConsecutiveFirstCharGroups = words
.GroupAdjacent(str => str[0])
.Select(g => g.Last());
Output:
string.Join(",", lastWordOfConsecutiveFirstCharGroups); // golf,hotel,grass,wee
Your other sample:
words=new List<string>{"apples", "armies", "black", "beer", "bastion", "cat", "cart", "able", "art", "bark"};
lastWordOfConsecutiveFirstCharGroups = words
.GroupAdjacent(str => str[0])
.Select(g => g.Last());
Output:
string.Join(",", lastWordOfConsecutiveFirstCharGroups); // armies,bastion,cart,art,bark

Try this algorithm:
var words = new List<string> { "golf", "hip", "hop", "hotel", "grass", "world", "wee" };
var newList = new List<string>();
int i = 0;
// Loop over every index so that a final single-item run is not skipped.
while (i < words.Count)
{
if (i == words.Count - 1 || words[i][0] != words[i+1][0])
{
newList.Add(words[i]);
i++;
}
else
{
var j = i;
while ( j < words.Count - 1 && words[j][0] == words[j + 1][0])
{
j++;
}
newList.Add(words[j]);
i = j+1;
}
}

You can use the following extension method to split your sequence into groups (i.e. sub-sequences) by some condition:
public static IEnumerable<IEnumerable<T>> Split<T, TKey>(
this IEnumerable<T> source, Func<T, TKey> keySelector)
{
var group = new List<T>();
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
else
{
TKey currentKey = keySelector(iterator.Current);
// EqualityComparer also works for key types that do not implement IComparable.
var keyComparer = EqualityComparer<TKey>.Default;
group.Add(iterator.Current);
while (iterator.MoveNext())
{
var key = keySelector(iterator.Current);
if (!keyComparer.Equals(currentKey, key))
{
yield return group;
currentKey = key;
group = new List<T>();
}
group.Add(iterator.Current);
}
}
}
if (group.Any())
yield return group;
}
And getting your expected results looks like:
string[] words = { "golf", "hip", "hop", "hotel", "grass", "world", "wee" };
var result = words.Split(w => w[0])
.Select(g => g.Last());
Result:
golf
hotel
grass
wee

Because your input is a List<>, I think this should work for you with acceptable performance, and it is very concise:
var result = words.Where((x, i) => i == words.Count - 1 ||
words[i][0] != words[i + 1][0]);
You can append ToList() on the result to get a List<string> if you want.
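The same index-based idea can be generalized beyond the first letter. What follows is only a sketch: the name LastOfEachRun and the key selector parameter are made up for illustration, and it assumes the input supports cheap indexing (IList<T>).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RunExtensions
{
    // Hypothetical helper: keeps an item when it is the last item in the list
    // or when the next item has a different key.
    public static IEnumerable<T> LastOfEachRun<T, TKey>(
        this IList<T> source, Func<T, TKey> keySelector)
    {
        var comparer = EqualityComparer<TKey>.Default;
        return source.Where((item, i) =>
            i == source.Count - 1 ||
            !comparer.Equals(keySelector(item), keySelector(source[i + 1])));
    }
}

public static class Program
{
    public static void Main()
    {
        var words = new List<string> { "golf", "hip", "hop", "hotel", "grass", "world", "wee" };
        // prints golf,hotel,grass,wee
        Console.WriteLine(string.Join(",", words.LastOfEachRun(w => w[0])));
    }
}
```

Note that the key selector runs twice for most items (once as "current", once as "next"), so this trades a little extra work for brevity.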

I went with
/// <summary>
/// Given a list, return the last value from each run of similar items.
/// </summary>
public static IEnumerable<T> WithoutDuplicates<T>(this IEnumerable<T> source, Func<T, T, bool> similar)
{
Contract.Requires(source != null);
Contract.Requires(similar != null);
Contract.Ensures(Contract.Result<IEnumerable<T>>().Count() <= source.Count(), "Result should be at most as long as original list");
T last = default(T);
bool first = true;
foreach (var item in source)
{
if (!first && !similar(item, last))
yield return last;
last = item;
first = false;
}
if (!first)
yield return last;
}

Related

Find object index in binding list? [duplicate]

Given a data source like this:
var c = new Car[]
{
new Car{ Color="Blue", Price=28000},
new Car{ Color="Red", Price=54000},
new Car{ Color="Pink", Price=9999},
// ..
};
How can I find the index of the first car satisfying a certain condition with LINQ?
EDIT:
I could think of something like this but it looks horrible:
int firstItem = someItems.Select((item, index) => new
{
ItemName = item.Color,
Position = index
}).Where(i => i.ItemName == "purple")
.First()
.Position;
Would it be best to solve this with a plain old loop?

myCars.Select((v, i) => new {car = v, index = i}).First(myCondition).index;
or the slightly shorter
myCars.Select((car, index) => new {car, index}).First(myCondition).index;
or, with value tuples (C# 7 and later), the even shorter
myCars.Select((car, index) => (car, index)).First(myCondition).index;

Simply do :
int index = List.FindIndex(your condition);
E.g.
int index = cars.FindIndex(c => c.ID == 150);
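Since the question uses an array rather than a List<T>, here is a small sketch covering both cases. The Car class is reconstructed from the question's initializer (Color and Price only), and the search values are just examples. Array.FindIndex and List<T>.FindIndex both return -1 when nothing matches.

```csharp
using System;
using System.Collections.Generic;

public class Car
{
    public string Color { get; set; }
    public int Price { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var cars = new[]
        {
            new Car { Color = "Blue", Price = 28000 },
            new Car { Color = "Red", Price = 54000 },
            new Car { Color = "Pink", Price = 9999 },
        };

        // Arrays use the static Array.FindIndex method.
        int redIndex = Array.FindIndex(cars, c => c.Color == "Red");
        Console.WriteLine(redIndex); // prints 1

        // List<T> has an instance FindIndex method; no match gives -1.
        var carList = new List<Car>(cars);
        int purpleIndex = carList.FindIndex(c => c.Color == "Purple");
        Console.WriteLine(purpleIndex); // prints -1
    }
}
```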

An IEnumerable is not an ordered set.
Although most IEnumerables are ordered, some (such as Dictionary or HashSet) are not.
Therefore, LINQ does not have an IndexOf method.
However, you can write one yourself:
///<summary>Finds the index of the first item matching an expression in an enumerable.</summary>
///<param name="items">The enumerable to search.</param>
///<param name="predicate">The expression to test the items against.</param>
///<returns>The index of the first matching item, or -1 if no items match.</returns>
public static int FindIndex<T>(this IEnumerable<T> items, Func<T, bool> predicate) {
if (items == null) throw new ArgumentNullException("items");
if (predicate == null) throw new ArgumentNullException("predicate");
int retVal = 0;
foreach (var item in items) {
if (predicate(item)) return retVal;
retVal++;
}
return -1;
}
///<summary>Finds the index of the first occurrence of an item in an enumerable.</summary>
///<param name="items">The enumerable to search.</param>
///<param name="item">The item to find.</param>
///<returns>The index of the first matching item, or -1 if the item was not found.</returns>
public static int IndexOf<T>(this IEnumerable<T> items, T item) { return items.FindIndex(i => EqualityComparer<T>.Default.Equals(item, i)); }

myCars.TakeWhile(car => !myCondition(car)).Count();
It works! Think about it. The index of the first matching item equals the number of (non-matching) items before it.
Story time
I too dislike the horrible standard solution you already suggested in your question. Like the accepted answer I went for a plain old loop although with a slight modification:
public static int FindIndex<T>(this IEnumerable<T> items, Predicate<T> predicate) {
int index = 0;
foreach (var item in items) {
if (predicate(item)) break;
index++;
}
return index;
}
Note that it will return the number of items instead of -1 when there is no match. But let's ignore this minor annoyance for now. In fact the horrible standard solution crashes in that case and I consider returning an index that is out-of-bounds superior.
ReSharper then told me "Loop can be converted into LINQ-expression". Most of the time that feature worsens readability, but this time the result was awe-inspiring. So kudos to JetBrains.
Analysis
Pros
Concise
Combinable with other LINQ
Avoids newing anonymous objects
Only evaluates the enumerable until the predicate matches for the first time
Therefore I consider it optimal in time and space while remaining readable.
Cons
Not quite obvious at first
Does not return -1 when there is no match
Of course you can always hide it behind an extension method. And what to do best when there is no match heavily depends on the context.
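To make the "no match" behaviour concrete, here is a minimal sketch. The helper name IndexViaTakeWhile is invented for this example, and it assumes an IList<T> so the count can be compared without enumerating the source a second time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TakeWhileIndex
{
    // Hypothetical wrapper: counts the items before the first match and
    // maps the "ran off the end" case to -1.
    public static int IndexViaTakeWhile<T>(IList<T> items, Func<T, bool> predicate)
    {
        int skipped = items.TakeWhile(item => !predicate(item)).Count();
        return skipped < items.Count ? skipped : -1;
    }

    public static void Main()
    {
        var colors = new[] { "Blue", "Red", "Pink" };

        // Raw TakeWhile: the index when found, the item count when not found.
        Console.WriteLine(colors.TakeWhile(c => c != "Red").Count());    // prints 1
        Console.WriteLine(colors.TakeWhile(c => c != "Purple").Count()); // prints 3

        // Wrapped: -1 when not found.
        Console.WriteLine(IndexViaTakeWhile(colors, c => c == "Purple")); // prints -1
    }
}
```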

I will make my contribution here... why? Just because :p It's a different implementation, based on the Any LINQ extension and a delegate. Here it is:
public static class Extensions
{
public static int IndexOf<T>(
this IEnumerable<T> list,
Predicate<T> condition) {
int i = -1;
return list.Any(x => { i++; return condition(x); }) ? i : -1;
}
}
void Main()
{
TestGetsFirstItem();
TestGetsLastItem();
TestGetsMinusOneOnNotFound();
TestGetsMiddleItem();
TestGetsMinusOneOnEmptyList();
}
void TestGetsFirstItem()
{
// Arrange
var list = new string[] { "a", "b", "c", "d" };
// Act
int index = list.IndexOf(item => item.Equals("a"));
// Assert
if(index != 0)
{
throw new Exception("Index should be 0 but is: " + index);
}
"Test Successful".Dump();
}
void TestGetsLastItem()
{
// Arrange
var list = new string[] { "a", "b", "c", "d" };
// Act
int index = list.IndexOf(item => item.Equals("d"));
// Assert
if(index != 3)
{
throw new Exception("Index should be 3 but is: " + index);
}
"Test Successful".Dump();
}
void TestGetsMinusOneOnNotFound()
{
// Arrange
var list = new string[] { "a", "b", "c", "d" };
// Act
int index = list.IndexOf(item => item.Equals("e"));
// Assert
if(index != -1)
{
throw new Exception("Index should be -1 but is: " + index);
}
"Test Successful".Dump();
}
void TestGetsMinusOneOnEmptyList()
{
// Arrange
var list = new string[] { };
// Act
int index = list.IndexOf(item => item.Equals("e"));
// Assert
if(index != -1)
{
throw new Exception("Index should be -1 but is: " + index);
}
"Test Successful".Dump();
}
void TestGetsMiddleItem()
{
// Arrange
var list = new string[] { "a", "b", "c", "d", "e" };
// Act
int index = list.IndexOf(item => item.Equals("c"));
// Assert
if(index != 2)
{
throw new Exception("Index should be 2 but is: " + index);
}
"Test Successful".Dump();
}

Here is a little extension I just put together.
public static class PositionsExtension
{
public static Int32 Position<TSource>(this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
return Positions<TSource>(source, predicate).FirstOrDefault();
}
public static IEnumerable<Int32> Positions<TSource>(this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
if (source is System.Collections.IDictionary)
{
throw new NotSupportedException("Dictionaries aren't supported");
}
if (source == null)
{
throw new ArgumentNullException("source");
}
if (predicate == null)
{
throw new ArgumentNullException("predicate");
}
// Pair each item with its index, keep the matches, and project the indices.
var query = source.Select((item, index) => new
{
Found = predicate(item),
Index = index
}).Where( it => it.Found).Select( it => it.Index);
return query;
}
}
Then you can call it like this.
IEnumerable<Int32> indicesWhereConditionIsMet =
ListItems.Positions(item => item == this);
Int32 firstWelcomeMessage = ListItems.Position(msg =>
msg.WelcomeMessage.Contains("Hello"));

Here's an implementation of the highest-voted answer that returns -1 when the item is not found:
public static int FindIndex<T>(this IEnumerable<T> items, Func<T, bool> predicate)
{
var itemsWithIndices = items.Select((item, index) => new { Item = item, Index = index });
var matchingIndices =
from itemWithIndex in itemsWithIndices
where predicate(itemWithIndex.Item)
select (int?)itemWithIndex.Index;
return matchingIndices.FirstOrDefault() ?? -1;
}

c# faster n-ary cartesian product for

I have searched for a way to compute the product of multiple lists and used the popular answer based on Aggregate + SelectMany. The trouble is that my example runs very slowly: I have 4 lists with 3K entries each, and I need to enumerate every possible combination.
Does anyone know a faster way in C#?
I made a fiddle here, which currently runs out of memory.
The code from the fiddle follows:
public static void Main()
{
var sources = new[]
{
Enumerable.Range(1, 3000),
Enumerable.Range(1, 3000),
Enumerable.Range(1, 3000),
Enumerable.Range(1, 3000),
};
var sw = new System.Diagnostics.Stopwatch();
sw.Start();
Console.Write("linq way...");
foreach(var l in NCartesian(sources))
{
// just enumerate
}
Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
}
public static IEnumerable<IEnumerable<T>> NCartesian<T>(
IEnumerable<IEnumerable<T>> sequences)
{
if (sequences == null)
{
return null;
}
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) => accumulator.SelectMany(
accseq => sequence,
(accseq, item) => accseq.Concat(new[] { item })));
}

I made one which has less memory usage than the above, still slow though:
public static IEnumerable<IEnumerable<T>> NCartesian<T>(
IEnumerable<IEnumerable<T>> sequences)
{
if (sequences == null)
{
throw new ArgumentNullException(nameof(sequences));
}
var enumerators = new List<IEnumerator<T>>();
foreach (IEnumerator<T> enumerator in sequences
.Select(s => s.GetEnumerator()))
{
enumerator.MoveNext(); // move to the first position
enumerators.Add(enumerator);
}
bool done = false;
while (!done)
{
IList<T> result = enumerators.Select(e => e.Current).ToList();
yield return result;
for (int idx = enumerators.Count - 1; idx >= 0; idx--)
{
bool hasNext = enumerators[idx].MoveNext();
if (hasNext)
{
break;
}
if (idx == 0)
{
// the first enumerator is done
done = true;
break;
}
enumerators[idx].Reset();
enumerators[idx].MoveNext();
}
}
}
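For comparison, here is a sketch of the same odometer idea over arrays, with no enumerators and no per-result allocation. The name NCartesianReuse is made up, and the design choices are assumptions: the inputs are materialized as arrays, and the same buffer is yielded on every step, so callers must copy it if they want to keep a result. Bear in mind that 4 lists of 3,000 entries give 3000^4 = 8.1 × 10^13 combinations, so any approach that visits every combination will take a very long time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CartesianSketch
{
    // Hypothetical helper. Yields the SAME array instance each time;
    // copy it (e.g. with ToArray()) before storing a result.
    // Yields nothing when there are no sources or any source is empty.
    public static IEnumerable<T[]> NCartesianReuse<T>(IReadOnlyList<T[]> sources)
    {
        int n = sources.Count;
        if (n == 0 || sources.Any(s => s.Length == 0))
            yield break;

        var indices = new int[n];   // current position in each source
        var current = new T[n];     // reused result buffer
        for (int i = 0; i < n; i++)
            current[i] = sources[i][0];

        while (true)
        {
            yield return current;

            // Advance like an odometer: bump the last digit, carry leftwards.
            int pos = n - 1;
            while (pos >= 0 && ++indices[pos] == sources[pos].Length)
            {
                indices[pos] = 0;
                current[pos] = sources[pos][0];
                pos--;
            }
            if (pos < 0)
                yield break;        // the first digit overflowed: done
            current[pos] = sources[pos][indices[pos]];
        }
    }

    public static void Main()
    {
        var sources = new[]
        {
            Enumerable.Range(1, 100).ToArray(),
            Enumerable.Range(1, 100).ToArray(),
            Enumerable.Range(1, 100).ToArray(),
        };
        long count = 0;
        foreach (var tuple in NCartesianReuse(sources))
            count++;
        Console.WriteLine(count); // prints 1000000
    }
}
```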

Getting grouped date range using linq

I have this data that I have to group by price, checking the range and continuity of the dates:
date price
2014-01-01 10
2014-01-02 10
2014-01-03 10
2014-01-05 20
2014-01-07 30
2014-01-08 40
2014-01-09 50
2014-01-10 30
and the output should look like this
2014-01-01 2014-01-03 10
2014-01-05 2014-01-05 20
2014-01-07 2014-01-07 30
2014-01-08 2014-01-08 40
2014-01-09 2014-01-09 50
2014-01-10 2014-01-10 30
I tried so far
var result = list
.OrderBy(a => a.Date)
.GroupBy(a => a.Price)
.Select(x => new
{
DateMax = x.Max(a => a.Date),
DateMin = x.Min(a => a.Date),
Count = x.Count()
})
.ToList()
.Where(a => a.DateMax.Subtract(a.DateMin).Days == a.Count)
.ToList();
I am not really sure this takes care of continuous dates. All dates are unique!

First, we'll use a helper method to group consecutive items. It takes a function that is given the "previous" and "current" items and decides whether the current item belongs in the current group or should start a new one.
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
this IEnumerable<T> source, Func<T, T, bool> predicate)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
List<T> list = new List<T>() { iterator.Current };
T previous = iterator.Current;
while (iterator.MoveNext())
{
if (predicate(previous, iterator.Current))
{
list.Add(iterator.Current);
}
else
{
yield return list;
list = new List<T>() { iterator.Current };
}
previous = iterator.Current;
}
yield return list;
}
}
Now we're able to use that method to group the items and then select out the information that we need:
var query = data.OrderBy(item => item.Date)
.GroupWhile((previous, current) =>
previous.Date.AddDays(1) == current.Date
&& previous.Price == current.Price)
.Select(group => new
{
DateMin = group.First().Date,
DateMax = group.Last().Date,
Count = group.Count(),
Price = group.First().Price,
});
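To check this against the sample data from the question, here is a self-contained sketch. GroupWhile is repeated so the sample compiles on its own; the Summarize method, the tuple-based rows, and the output format are assumptions made for this demo.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class GroupingDemo
{
    // Same logic as the GroupWhile helper above.
    public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
        this IEnumerable<T> source, Func<T, T, bool> predicate)
    {
        using (var iterator = source.GetEnumerator())
        {
            if (!iterator.MoveNext())
                yield break;
            var list = new List<T> { iterator.Current };
            T previous = iterator.Current;
            while (iterator.MoveNext())
            {
                if (predicate(previous, iterator.Current))
                {
                    list.Add(iterator.Current);
                }
                else
                {
                    yield return list;
                    list = new List<T> { iterator.Current };
                }
                previous = iterator.Current;
            }
            yield return list;
        }
    }

    // Hypothetical helper: one "start end price" line per contiguous period.
    public static List<string> Summarize(IEnumerable<(DateTime Date, int Price)> data)
    {
        return data.OrderBy(item => item.Date)
            .GroupWhile((previous, current) =>
                previous.Date.AddDays(1) == current.Date
                && previous.Price == current.Price)
            .Select(group => string.Format(CultureInfo.InvariantCulture,
                "{0:yyyy-MM-dd} {1:yyyy-MM-dd} {2}",
                group.First().Date, group.Last().Date, group.First().Price))
            .ToList();
    }

    public static void Main()
    {
        var data = new (DateTime Date, int Price)[]
        {
            (new DateTime(2014, 1, 1), 10), (new DateTime(2014, 1, 2), 10),
            (new DateTime(2014, 1, 3), 10), (new DateTime(2014, 1, 5), 20),
            (new DateTime(2014, 1, 7), 30), (new DateTime(2014, 1, 8), 40),
            (new DateTime(2014, 1, 9), 50), (new DateTime(2014, 1, 10), 30),
        };
        // Prints six lines, the first being "2014-01-01 2014-01-03 10".
        foreach (var line in Summarize(data))
            Console.WriteLine(line);
    }
}
```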

As an alternative to Servy's answer, which I find more elegant and obviously much more reusable, you could do something more bespoke in one sweep (after ordering):
public class ContiguousValuePeriod<TValue>
{
private readonly DateTime start;
private readonly DateTime end;
private readonly TValue value;
public ContiguousValuePeriod(
DateTime start,
DateTime end,
TValue value)
{
this.start = start;
this.end = end;
this.value = value;
}
public DateTime Start { get { return this.start; } }
public DateTime End { get { return this.end; } }
public TValue Value { get { return this.value; } }
}
public static IEnumerable<ContiguousValuePeriod<TValue>>
GetContiguousValuePeriods<TValue, TItem>(
this IEnumerable<TItem> source,
Func<TItem, DateTime> dateSelector,
Func<TItem, TValue> valueSelector)
{
using (var iterator = source
.OrderBy(t => valueSelector(t))
.ThenBy(t => dateSelector(t))
.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
var periodValue = valueSelector(iterator.Current);
var periodStart = dateSelector(iterator.Current);
var periodLast = periodStart;
while (iterator.MoveNext())
{
var thisValue = valueSelector(iterator.Current);
var thisDate = dateSelector(iterator.Current);
if (!thisValue.Equals(periodValue) ||
thisDate.Subtract(periodLast).TotalDays > 1.0)
{
// Period change
yield return new ContiguousValuePeriod<TValue>(
periodStart,
periodLast,
periodValue);
periodStart = thisDate;
periodValue = thisValue;
}
periodLast = thisDate;
}
// The final period is always still open here, so emit it inside the
// using block, where the period variables are in scope.
yield return new ContiguousValuePeriod<TValue>(
periodStart,
periodLast,
periodValue);
}
}
which you use like,
var result = yourList.GetContiguousValuePeriods(
a => a.Date,
a => a.Price);

How to split list into all the cases sublists using LINQ?

I would like to split a list into all of its prefix sublists using LINQ.
For example :
List contains : {"a", "b", "c"}
I would like to make list of lists where the result is : {"a", "ab", "abc"}
public List<List<Alphabet>> ListofLists (Stack<String> Pile)
{
var listoflists = new List<List<Alphabet>>();
var list = new List<Alphabet>();
foreach (var temp in from value in Pile where value != "#" select new Alphabet(value))
{
list.Add(temp);
listoflists.Add(list);
}
return listoflists;
}

This method will allow you to do this.
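static IEnumerable<IEnumerable<T>> SublistSplit<T>(this IEnumerable<T> source)
{
// An iterator cannot return null, so a null source yields nothing.
if (source == null) yield break;
var list = source.ToArray();
// Prefix lengths run from 1 to Length so that {"a"} comes first and the full list last.
for (int i = 1; i <= list.Length; i++)
{
yield return new ArraySegment<T>(list, 0, i);
}
}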
In case of strings:
static IEnumerable<string> SublistSplit(this IEnumerable<string> source)
{
if (source == null) yield break;
var sb = new StringBuilder();
foreach (var x in source)
{
sb.Append(x);
yield return sb.ToString();
}
}

If you want to yield the intermediate values of an accumulation you could define your own extension method:
public static IEnumerable<TAcc> Scan<T, TAcc>(this IEnumerable<T> seq, TAcc init, Func<T, TAcc, TAcc> acc)
{
TAcc current = init;
foreach(T item in seq)
{
current = acc(item, current);
yield return current;
}
}
then your example would be:
var strings = new[] {"a", "b", "c"}.Scan("", (str, acc) => acc + str);
for lists, you'll have to copy them each time:
List<Alphabet> input = //
List<List<Alphabet>> output = input.Scan(new List<Alphabet>(), (a, acc) => new List<Alphabet>(acc) { a }).ToList();
Note that copying the intermediate List<T> instances could be inefficient, so you may want to consider using an immutable structure instead.
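As one possible take on the immutable suggestion, here is a sketch using ImmutableList<T> from System.Collections.Immutable (part of modern .NET; a NuGet package on .NET Framework). Plain strings stand in for the question's Alphabet type, which is an assumption made to keep the sample short. Each Add returns a new list that shares structure with the previous one, so no full copy is made per step.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

public static class ScanDemo
{
    // Same Scan as above: yields every intermediate accumulator value.
    public static IEnumerable<TAcc> Scan<T, TAcc>(
        this IEnumerable<T> seq, TAcc init, Func<T, TAcc, TAcc> acc)
    {
        TAcc current = init;
        foreach (T item in seq)
        {
            current = acc(item, current);
            yield return current;
        }
    }

    public static void Main()
    {
        List<ImmutableList<string>> prefixes = new[] { "a", "b", "c" }
            .Scan(ImmutableList<string>.Empty, (item, list) => list.Add(item))
            .ToList();

        // Prints a, ab and abc on separate lines.
        foreach (var prefix in prefixes)
            Console.WriteLine(string.Join("", prefix));
    }
}
```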

finding the count in two lists

I have two lists that I am getting from the database as follows:
List<myobject1> frstList = ClientManager.Get_FirstList( PostCode.Text, PhoneNumber.Text);
List<myobject2> secondList = new List<myobject2>();
foreach (var c in frstList )
{
secondList.Add( ClaimManager.GetSecondList(c.ID));
}
Now my lists will contain data like this:
frstList: id = 1, id = 2
secondList: id = 1 parentid = 1, id = 2 parentid = 1, and id = 3 parentid = 2
I want to count these individually and return the one with the highest count. In the example above it should return id = 1 from frstList, and id = 1 and id = 2 from secondList.
I tried this, but it does not work:
var numbers = (from c in frstList where c.Parent.ID == secondList.Select(cl=> cl.ID) select c).Count();
Can someone please help me do this, either in LINQ or with a normal foreach?
Thanks

Looking at the question it appears that what you want is to determine which of the parent nodes has the most children, and you want the output to be that parent node along with all of its child nodes.
The query is fairly straightforward:
var largestGroup = secondList.GroupBy(item => item.ParentID)
.MaxBy(group => group.Count());
var mostFrequentParent = largestGroup.Key;
var childrenOfMostFrequentParent = largestGroup.AsEnumerable();
We'll just need this helper function, MaxBy:
public static TSource MaxBy<TSource, TKey>(this IEnumerable<TSource> source
, Func<TSource, TKey> selector
, IComparer<TKey> comparer = null)
{
if (comparer == null)
{
comparer = Comparer<TKey>.Default;
}
using (IEnumerator<TSource> iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
throw new ArgumentException("Source was empty");
}
TSource maxItem = iterator.Current;
TKey maxValue = selector(maxItem);
while (iterator.MoveNext())
{
TKey nextValue = selector(iterator.Current);
if (comparer.Compare(nextValue, maxValue) > 0)
{
maxValue = nextValue;
maxItem = iterator.Current;
}
}
return maxItem;
}
}
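Here is a small runnable sketch using the sample data from the question. The Claim class and its property names are guesses at the shape of myobject2, and OrderByDescending(...).First() stands in for MaxBy so the sample does not depend on the helper. On .NET 6 or later, Enumerable.MaxBy is built in and can be used instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's myobject2.
public class Claim
{
    public int ID { get; set; }
    public int ParentID { get; set; }
}

public static class LargestGroupDemo
{
    // Groups children by parent and returns the group with the most members.
    // Throws InvalidOperationException when the input is empty.
    public static IGrouping<int, Claim> LargestGroup(IEnumerable<Claim> claims)
    {
        return claims.GroupBy(item => item.ParentID)
            .OrderByDescending(group => group.Count())
            .First();
    }

    public static void Main()
    {
        var secondList = new List<Claim>
        {
            new Claim { ID = 1, ParentID = 1 },
            new Claim { ID = 2, ParentID = 1 },
            new Claim { ID = 3, ParentID = 2 },
        };

        var largestGroup = LargestGroup(secondList);
        Console.WriteLine(largestGroup.Key);                                   // prints 1
        Console.WriteLine(string.Join(",", largestGroup.Select(c => c.ID)));   // prints 1,2
    }
}
```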
