Align multiple sorted lists - c#

If I have, for example, the following List<int>s:
{ 1, 2, 3, 4 } //list1
{ 2, 3, 5, 6 } //list2
...
{ 3, 4, 5 } //listN
What is the best way to retrieve the following corresponding List<int?>s?
{ 1, 2, 3, 4, null, null } //list1
{ null, 2, 3, null, 5, 6 } //list2
...
{ null, null, 3, 4, 5, null } //listN

I'm posting the solution we discussed in chat. I had an unoptimized version using Linq for all things loopy/filtering:
http://ideone.com/H4gCoE (live demo)
However, I suspect it won't be too performant because of all the enumerator classes created, and the collections being instantiated/modified along the way.
So I took the time to optimize it into handwritten loops, with some bookkeeping to track which iterators are still active instead of modifying the iters collection. Here it is:
See http://ideone.com/FuZIDy for full live demo.
Note: I assume the lists are pre-ordered according to Comparer<T>.Default, since I use LINQ's Min() extension method without a custom comparer.
public static IEnumerable<IEnumerable<T>> AlignSequences<T>(this IEnumerable<IEnumerable<T>> sequences)
{
var iters = sequences
.Select((s, index) => new { index, enumerator = s.GetEnumerator() })
.ToArray();
var isActive = iters.Select(it => it.enumerator.MoveNext()).ToArray();
var numactive = isActive.Count(flag => flag);
try
{
while (numactive > 0)
{
T min = iters
.Where(it => isActive[it.index])
.Min(it => it.enumerator.Current);
var row = new T[iters.Length];
for (int j = 0; j < isActive.Length; j++)
{
if (!isActive[j] || !Equals(iters[j].enumerator.Current, min))
continue;
row[j] = min;
if (!iters[j].enumerator.MoveNext())
{
isActive[j] = false;
numactive -= 1;
}
}
yield return row;
}
}
finally
{
foreach (var iter in iters) iter.enumerator.Dispose();
}
}
Use it like this:
public static void Main(string[] args)
{
var list1 = new int?[] { 1, 2, 3, 4, 5 };
var list2 = new int?[] { 3, 4, 5, 6, 7 };
var list3 = new int?[] { 6, 9, 9 };
var lockstep = AlignSequences(new[] { list1, list2, list3 });
foreach (var step in lockstep)
Console.WriteLine(string.Join("\t", step.Select(i => i.HasValue ? i.Value.ToString() : "null").ToArray()));
}
It prints (for demo purposes I print the results sideways):
1 null null
2 null null
3 3 null
4 4 null
5 5 null
null 6 6
null 7 null
null null 9
null null 9
Note: You might like to change the interface to accept an arbitrary number of lists, instead of a single sequence of sequences:
public static IEnumerable<IEnumerable<T>> AlignSequences<T>(params IEnumerable<T>[] sequences)
That way you could just call
var lockstep = AlignSequences(list1, list2, list3);
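For comparison, here is a minimal index-based sketch of the same alignment idea, specialised to pre-sorted int lists and using a params signature as suggested above. AlignSorted and AlignSketch are hypothetical names, not part of the answer; the sketch assumes each input is sorted ascending.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AlignSketch
{
    // Hypothetical helper: aligns pre-sorted int lists row by row,
    // leaving null where a list does not contain the current value.
    public static IEnumerable<int?[]> AlignSorted(params IList<int>[] lists)
    {
        var pos = new int[lists.Length]; // next unread index in each list
        while (true)
        {
            // Smallest unread value across all lists; stays null once all are exhausted.
            int? min = null;
            for (int j = 0; j < lists.Length; j++)
                if (pos[j] < lists[j].Count && (min == null || lists[j][pos[j]] < min))
                    min = lists[j][pos[j]];
            if (min == null) yield break;

            // Every list whose next value equals the minimum contributes it and advances.
            var row = new int?[lists.Length];
            for (int j = 0; j < lists.Length; j++)
                if (pos[j] < lists[j].Count && lists[j][pos[j]] == min)
                    row[j] = lists[j][pos[j]++];
            yield return row;
        }
    }

    static void Main()
    {
        var rows = AlignSorted(
            new List<int> { 1, 2, 3, 4 },
            new List<int> { 2, 3, 5, 6 },
            new List<int> { 3, 4, 5 }).ToList();

        // Read the rows column-wise to get one padded list per input.
        for (int j = 0; j < 3; j++)
            Console.WriteLine(string.Join(", ", rows.Select(r => r[j]?.ToString() ?? "null")));
    }
}
```

With the lists from the question this should reproduce the three padded lists asked for. Duplicates inside one list (such as 9, 9) end up on separate rows, as in the enumerator-based version.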

Here's another approach using List.BinarySearch.
sample data:
var list1 = new List<int>() { 1, 2, 3, 4 };
var list2 = new List<int>() { 2, 3, 5, 6, 7, 8 };
var list3 = new List<int>() { 3, 4, 5 };
var all = new List<List<int>>() { list1, list2, list3 };
calculate min/max and all nullable-lists:
int min = all.Min(l => l.Min());
int max = all.Max(l => l.Max());
// start from smallest number and end with highest, fill all between
int count = max - min + 1;
List<int?> l1Result = new List<int?>(count);
List<int?> l2Result = new List<int?>(count);
List<int?> l3Result = new List<int?>(count);
foreach (int val in Enumerable.Range(min, count))
{
if (list1.BinarySearch(val) >= 0)
l1Result.Add(val);
else
l1Result.Add(new Nullable<int>());
if (list2.BinarySearch(val) >= 0)
l2Result.Add(val);
else
l2Result.Add(new Nullable<int>());
if (list3.BinarySearch(val) >= 0)
l3Result.Add(val);
else
l3Result.Add(new Nullable<int>());
}
output:
Console.WriteLine(string.Join(",", l1Result.Select(i => !i.HasValue ? "NULL" : i.Value.ToString())));
Console.WriteLine(string.Join(",", l2Result.Select(i => !i.HasValue ? "NULL" : i.Value.ToString())));
Console.WriteLine(string.Join(",", l3Result.Select(i => !i.HasValue ? "NULL" : i.Value.ToString())));
1, 2, 3, 4, NULL, NULL, NULL, NULL
NULL, 2, 3, NULL, 5, 6, 7, 8
NULL, NULL, 3, 4, 5, NULL, NULL, NULL
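The three copy-pasted branches above can be folded into one loop over any number of lists. The following is only a sketch under the same assumptions (every list is sorted and non-empty); AlignByRange and RangeAlignSketch are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RangeAlignSketch
{
    // Hypothetical generalization of the BinarySearch approach to N lists.
    // Assumes every list is sorted ascending and non-empty (Min/Max throw on empty lists).
    public static List<List<int?>> AlignByRange(List<List<int>> all)
    {
        int min = all.Min(l => l.Min());
        int max = all.Max(l => l.Max());
        var values = Enumerable.Range(min, max - min + 1).ToList();

        // For each list, keep the value where BinarySearch finds it, otherwise null.
        return all
            .Select(l => values.Select(v => l.BinarySearch(v) >= 0 ? (int?)v : null).ToList())
            .ToList();
    }

    static void Main()
    {
        var all = new List<List<int>>
        {
            new List<int> { 1, 2, 3, 4 },
            new List<int> { 2, 3, 5, 6, 7, 8 },
            new List<int> { 3, 4, 5 }
        };
        foreach (var padded in AlignByRange(all))
            Console.WriteLine(string.Join(", ", padded.Select(i => i.HasValue ? i.Value.ToString() : "NULL")));
    }
}
```

One design consequence to be aware of: because the range covers every integer between the minimum and maximum, a value that appears in none of the lists still gets a slot (all null), which differs from the enumerator-based alignment.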


Split a list into multiple lists at increasing sequence broken

I've a List of int and I want to create multiple List after splitting the original list when a lower or same number is found. Numbers are not in sorted order.
List<int> data = new List<int> { 1, 2, 1, 2, 3, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6 };
I want the result to be as following lists:
{ 1, 2 }
{ 1, 2, 3 }
{ 3 }
{ 1, 2, 3, 4 }
{ 1, 2, 3, 4, 5, 6 }
Currently, I'm using the following LINQ, but it doesn't produce the result I want:
List<int> data = new List<int> { 1, 2, 1, 2, 3, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6 };
List<List<int>> resultLists = new List<List<int>>();
var res = data.Where((p, i) =>
{
int count = 0;
resultLists.Add(new List<int>());
if (p < data[(i + 1) >= data.Count ? i - 1 : i + 1])
{
resultLists[count].Add(p);
}
else
{
count++;
resultLists.Add(new List<int>());
}
return true;
}).ToList();
I'd just go for something simple:
public static IEnumerable<List<int>> SplitWhenNotIncreasing(List<int> numbers)
{
for (int i = 1, start = 0; i <= numbers.Count; ++i)
{
if (i != numbers.Count && numbers[i] > numbers[i - 1])
continue;
yield return numbers.GetRange(start, i - start);
start = i;
}
}
Which you'd use like so:
List<int> data = new List<int> { 1, 2, 1, 2, 3, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6 };
foreach (var subset in SplitWhenNotIncreasing(data))
Console.WriteLine(string.Join(", ", subset));
If you really did need to work with IEnumerable<T>, then the simplest way I can think of is like this:
public sealed class IncreasingSubsetFinder<T> where T: IComparable<T>
{
public static IEnumerable<IEnumerable<T>> Find(IEnumerable<T> numbers)
{
return new IncreasingSubsetFinder<T>().find(numbers.GetEnumerator());
}
IEnumerable<IEnumerable<T>> find(IEnumerator<T> iter)
{
if (!iter.MoveNext())
yield break;
while (!done)
yield return increasingSubset(iter);
}
IEnumerable<T> increasingSubset(IEnumerator<T> iter)
{
while (!done)
{
T prev = iter.Current;
yield return prev;
if ((done = !iter.MoveNext()) || iter.Current.CompareTo(prev) <= 0)
yield break;
}
}
bool done;
}
Which you would call like this:
List<int> data = new List<int> { 1, 2, 1, 2, 3, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6 };
foreach (var subset in IncreasingSubsetFinder<int>.Find(data))
Console.WriteLine(string.Join(", ", subset));
This is not a typical LINQ operation, so as usual in such cases (when one insists on using LINQ) I would suggest using the Aggregate method:
var result = data.Aggregate(new List<List<int>>(), (r, n) =>
{
if (r.Count == 0 || n <= r.Last().Last()) r.Add(new List<int>());
r.Last().Add(n);
return r;
});
You can use the index to get the previous item and calculate the group id out of comparing the values. Then group on the group ids and get the values out:
List<int> data = new List<int> { 1, 2, 1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6 };
int groupId = 0;
var groups = data.Select
( (item, index)
=> new
{ Item = item
, Group = index > 0 && item <= data[index - 1] ? ++groupId : groupId
}
);
List<List<int>> list = groups.GroupBy(g => g.Group)
.Select(x => x.Select(y => y.Item).ToList())
.ToList();
I really like Matthew Watson's solution. If however you do not want to rely on List<T>, here is my simple generic approach enumerating the enumerable once at most and still retaining the capability for lazy evaluation.
public static IEnumerable<IEnumerable<T>> AscendingSubsets<T>(this IEnumerable<T> superset) where T :IComparable<T>
{
var supersetEnumerator = superset.GetEnumerator();
if (!supersetEnumerator.MoveNext())
{
yield break;
}
T oldItem = supersetEnumerator.Current;
List<T> subset = new List<T>() { oldItem };
while (supersetEnumerator.MoveNext())
{
T currentItem = supersetEnumerator.Current;
if (currentItem.CompareTo(oldItem) > 0)
{
subset.Add(currentItem);
}
else
{
yield return subset;
subset = new List<T>() { currentItem };
}
oldItem = supersetEnumerator.Current;
}
yield return subset;
}
Edit: Simplified the solution further to only use one enumerator.
I have modified your code, and it now works fine:
List<int> data = new List<int> { 1, 2, 1, 2, 3,3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6 };
List<List<int>> resultLists = new List<List<int>>();
int last = 0;
int count = 0;
var res = data.Where((p, i) =>
{
if (i > 0)
{
if (p > last)
{
resultLists[count].Add(p);
}
else
{
count++;
resultLists.Add(new List<int>());
resultLists[count].Add(p);
}
}
else
{
resultLists.Add(new List<int>());
resultLists[count].Add(p);
}
last = p;
return true;
}).ToList();
For things like this, I'm generally not a fan of solutions that use GroupBy or other methods that materialize the results. The reason is that you never know how long the input sequence will be, and materializations of these sub-sequences can be very costly.
I prefer to stream the results as they are pulled. This allows implementations of IEnumerable<T> that stream results to continue streaming through your transformation of that stream.
Note, this solution won't work if you break out of iterating through the sub-sequence and want to continue to the next sequence; if this is an issue, then one of the solutions that materialize the sub-sequences would probably be better.
However, for forward-only iterations of the entire sequence (which is the most typical use case), this will work just fine.
First, let's set up some helpers for our test classes:
private static IEnumerable<T> CreateEnumerable<T>(IEnumerable<T> enumerable)
{
// Validate parameters.
if (enumerable == null) throw new ArgumentNullException("enumerable");
// Cycle through and yield.
foreach (T t in enumerable)
yield return t;
}
private static void EnumerateAndPrintResults<T>(IEnumerable<T> data,
[CallerMemberName] string name = "") where T : IComparable<T>
{
// Write the name.
Debug.WriteLine("Case: " + name);
// Cycle through the chunks.
foreach (IEnumerable<T> chunk in data.
ChunkWhenNextSequenceElementIsNotGreater())
{
// Print opening brackets.
Debug.Write("{ ");
// Is this the first iteration?
bool firstIteration = true;
// Print the items.
foreach (T t in chunk)
{
// If not the first iteration, write a comma.
if (!firstIteration)
{
// Write the comma.
Debug.Write(", ");
}
// Write the item.
Debug.Write(t);
// Flip the flag.
firstIteration = false;
}
// Write the closing bracket.
Debug.WriteLine(" }");
}
}
CreateEnumerable is used for creating a streaming implementation, and EnumerateAndPrintResults will take the sequence, call ChunkWhenNextSequenceElementIsNotGreater (this is coming up and does the work) and output the results.
Here's the implementation. Note, I've chosen to implement them as extension methods on IEnumerable<T>; this is the first benefit, as it doesn't require a materialized sequence (technically, none of the other solutions do either, but it's better to explicitly state it like this).
First, the entry points:
public static IEnumerable<IEnumerable<T>>
ChunkWhenNextSequenceElementIsNotGreater<T>(
this IEnumerable<T> source)
where T : IComparable<T>
{
// Validate parameters.
if (source == null) throw new ArgumentNullException("source");
// Call the overload.
return source.
ChunkWhenNextSequenceElementIsNotGreater(
Comparer<T>.Default.Compare);
}
public static IEnumerable<IEnumerable<T>>
ChunkWhenNextSequenceElementIsNotGreater<T>(
this IEnumerable<T> source,
Comparison<T> comparer)
{
// Validate parameters.
if (source == null) throw new ArgumentNullException("source");
if (comparer == null) throw new ArgumentNullException("comparer");
// Call the implementation.
return source.
ChunkWhenNextSequenceElementIsNotGreaterImplementation(
comparer);
}
Note that this works on anything that implements IComparable<T> or where you provide a Comparison<T> delegate; this allows for any type and any kind of rules you want for performing the comparison.
Here's the implementation:
private static IEnumerable<IEnumerable<T>>
ChunkWhenNextSequenceElementIsNotGreaterImplementation<T>(
this IEnumerable<T> source, Comparison<T> comparer)
{
// Validate parameters.
Debug.Assert(source != null);
Debug.Assert(comparer != null);
// Get the enumerator.
using (IEnumerator<T> enumerator = source.GetEnumerator())
{
// Move to the first element. If one can't, then get out.
if (!enumerator.MoveNext()) yield break;
// While true.
while (true)
{
// The new enumerator.
var chunkEnumerator = new
ChunkWhenNextSequenceElementIsNotGreaterEnumerable<T>(
enumerator, comparer);
// Yield.
yield return chunkEnumerator;
// If the last move next returned false, then get out.
if (!chunkEnumerator.LastMoveNext) yield break;
}
}
}
Of note: this uses another class ChunkWhenNextSequenceElementIsNotGreaterEnumerable<T> to handle enumerating the sub-sequences. This class will iterate each of the items from the IEnumerator<T> that is obtained from the original IEnumerable<T>.GetEnumerator() call, but store the results of the last call to IEnumerator<T>.MoveNext().
This sub-sequence generator is stored, and the value of the last call to MoveNext is checked to see if the end of the sequence has or hasn't been hit. If it has, then it simply breaks, otherwise, it moves to the next chunk.
Here's the implementation of ChunkWhenNextSequenceElementIsNotGreaterEnumerable<T>:
internal class
ChunkWhenNextSequenceElementIsNotGreaterEnumerable<T> :
IEnumerable<T>
{
#region Constructor.
internal ChunkWhenNextSequenceElementIsNotGreaterEnumerable(
IEnumerator<T> enumerator, Comparison<T> comparer)
{
// Validate parameters.
if (enumerator == null)
throw new ArgumentNullException("enumerator");
if (comparer == null)
throw new ArgumentNullException("comparer");
// Assign values.
_enumerator = enumerator;
_comparer = comparer;
}
#endregion
#region Instance state.
private readonly IEnumerator<T> _enumerator;
private readonly Comparison<T> _comparer;
internal bool LastMoveNext { get; private set; }
#endregion
#region IEnumerable implementation.
public IEnumerator<T> GetEnumerator()
{
// The assumption is that a call to MoveNext
// that returned true has already
// occurred. Store as the previous value.
T previous = _enumerator.Current;
// Yield it.
yield return previous;
// While we can move to the next item, and the previous
// item is strictly less than the current item.
while ((LastMoveNext = _enumerator.MoveNext()) &&
_comparer(previous, _enumerator.Current) < 0)
{
// Yield.
yield return _enumerator.Current;
// Store the previous.
previous = _enumerator.Current;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
#endregion
}
Here's the test for the original condition in the question, along with the output:
[TestMethod]
public void TestStackOverflowCondition()
{
var data = new List<int> {
1, 2, 1, 2, 3, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6
};
EnumerateAndPrintResults(data);
}
Output:
Case: TestStackOverflowCondition
{ 1, 2 }
{ 1, 2, 3 }
{ 3 }
{ 1, 2, 3, 4 }
{ 1, 2, 3, 4, 5, 6 }
Here's the same input, but streamed as an enumerable:
[TestMethod]
public void TestStackOverflowConditionEnumerable()
{
var data = new List<int> {
1, 2, 1, 2, 3, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6
};
EnumerateAndPrintResults(CreateEnumerable(data));
}
Output:
Case: TestStackOverflowConditionEnumerable
{ 1, 2 }
{ 1, 2, 3 }
{ 3 }
{ 1, 2, 3, 4 }
{ 1, 2, 3, 4, 5, 6 }
Here's a test with non-sequential elements:
[TestMethod]
public void TestNonSequentialElements()
{
var data = new List<int> {
1, 3, 5, 7, 6, 8, 10, 2, 5, 8, 11, 11, 13
};
EnumerateAndPrintResults(data);
}
Output:
Case: TestNonSequentialElements
{ 1, 3, 5, 7 }
{ 6, 8, 10 }
{ 2, 5, 8, 11 }
{ 11, 13 }
Finally, here's a test with characters instead of numbers:
[TestMethod]
public void TestNonSequentialCharacters()
{
var data = new List<char> {
'1', '3', '5', '7', '6', '8', 'a', '2', '5', '8', 'b', 'c', 'a'
};
EnumerateAndPrintResults(data);
}
Output:
Case: TestNonSequentialCharacters
{ 1, 3, 5, 7 }
{ 6, 8, a }
{ 2, 5, 8, b, c }
{ a }
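You can do it with LINQ using the index to calculate the group (note that this key only groups correctly when each increasing run goes up in steps of exactly 1, as in the sample data):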
var result = data.Select((n, i) => new { N = n, G = (i > 0 && n > data[i - 1] ? data[i - 1] + 1 : n) - i })
.GroupBy(a => a.G)
.Select(g => g.Select(n => n.N).ToArray())
.ToArray();
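To see how far the index formula stretches, here is a small self-contained check. The wrapper name GroupByIndexFormula is hypothetical; the query inside is the one shown above. It suggests the key works for runs that rise by exactly 1 but splits a run that rises in larger steps.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class IndexGroupingCheck
{
    // Hypothetical wrapper around the index-based grouping query.
    public static int[][] GroupByIndexFormula(IList<int> data) =>
        data.Select((n, i) => new { N = n, G = (i > 0 && n > data[i - 1] ? data[i - 1] + 1 : n) - i })
            .GroupBy(a => a.G)
            .Select(g => g.Select(a => a.N).ToArray())
            .ToArray();

    static void Print(IList<int> data) =>
        Console.WriteLine(string.Join(" | ",
            GroupByIndexFormula(data).Select(g => string.Join(", ", g))));

    static void Main()
    {
        Print(new[] { 1, 2, 1, 2, 3 }); // steps of 1: prints "1, 2 | 1, 2, 3"
        Print(new[] { 1, 3, 5 });       // steps of 2: prints "1, 3 | 5", although the run is increasing
    }
}
```

If the input can rise by more than 1, one of the loop-based or Aggregate-based answers is the safer choice.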
This is my simple loop approach using yield:
static IEnumerable<IList<int>> Split(IList<int> data)
{
if (data.Count == 0) yield break;
List<int> curr = new List<int>();
curr.Add(data[0]);
int last = data[0];
for (int i = 1; i < data.Count; i++)
{
if (data[i] <= last)
{
yield return curr;
curr = new List<int>();
}
curr.Add(data[i]);
last = data[i];
}
yield return curr;
}
I use a dictionary to get the 5 different lists, as below:
static void Main(string[] args)
{
List<int> data = new List<int> { 1, 2, 1, 2, 3, 3, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6 };
Dictionary<int, List<int>> listDict = new Dictionary<int, List<int>>();
int listCnt = 1;
//as initial value get first value from list
listDict.Add(listCnt, new List<int>());
listDict[listCnt].Add(data[0]);
for (int i = 1; i < data.Count; i++)
{
if (data[i] > listDict[listCnt].Last())
{
listDict[listCnt].Add(data[i]);
}
else
{
//increase list count and add a new list to dictionary
listCnt++;
listDict.Add(listCnt, new List<int>());
listDict[listCnt].Add(data[i]);
}
}
//to use new lists
foreach (var dic in listDict)
{
Console.WriteLine( $"List {dic.Key} : " + string.Join(",", dic.Value.Select(x => x.ToString()).ToArray()));
}
}
Output :
List 1 : 1,2
List 2 : 1,2,3
List 3 : 3
List 4 : 1,2,3,4
List 5 : 1,2,3,4,5,6

The union of the intersects of the 2 set combinations of a sequence of sequences

How can I find the set of items that occur in 2 or more sequences in a sequence of sequences?
In other words, I want the distinct values that occur in at least 2 of the passed in sequences.
Note:
This is not the intersect of all sequences but rather, the union of the intersect of all pairs of sequences.
Note 2:
This does not include the pair, or 2-combination, of a sequence with itself. That would be silly.
I have made an attempt myself,
public static IEnumerable<T> UnionOfIntersects<T>(
this IEnumerable<IEnumerable<T>> source)
{
var pairs =
from s1 in source
from s2 in source
select new { s1 , s2 };
var intersects = pairs
.Where(p => p.s1 != p.s2)
.Select(p => p.s1.Intersect(p.s2));
return intersects.SelectMany(i => i).Distinct();
}
but I'm concerned that this might be sub-optimal, I think it includes intersects of pair A, B and pair B, A which seems inefficient. I also think there might be a more efficient way to compound the sets as they are iterated.
I include some example input and output below:
{ { 1, 1, 2, 3, 4, 5, 7 }, { 5, 6, 7 }, { 2, 6, 7, 9 } , { 4 } }
returns
{ 2, 4, 5, 6, 7 }
and
{ { 1, 2, 3} } or { {} } or { }
returns
{ }
I'm looking for the best combination of readability and potential performance.
EDIT
I've performed some initial testing of the current answers, my code is here. Output below.
Original valid:True
DoomerOneLine valid:True
DoomerSqlLike valid:True
Svinja valid:True
Adricadar valid:True
Schmelter valid:True
Original 100000 iterations in 82ms
DoomerOneLine 100000 iterations in 58ms
DoomerSqlLike 100000 iterations in 82ms
Svinja 100000 iterations in 1039ms
Adricadar 100000 iterations in 879ms
Schmelter 100000 iterations in 9ms
At the moment, it looks as if Tim Schmelter's answer performs better by at least an order of magnitude.
// init sequences
var sequences = new int[][]
{
new int[] { 1, 2, 3, 4, 5, 7 },
new int[] { 5, 6, 7 },
new int[] { 2, 6, 7, 9 },
new int[] { 4 }
};
One-line way:
var result = sequences
.SelectMany(e => e.Distinct())
.GroupBy(e => e)
.Where(e => e.Count() > 1)
.Select(e => e.Key);
// result is { 2 4 5 7 6 }
Sql-like way (with ordering):
var result = (
from e in sequences.SelectMany(e => e.Distinct())
group e by e into g
where g.Count() > 1
orderby g.Key
select g.Key);
// result is { 2 4 5 6 7 }
Possibly the fastest code (though less readable), with O(N) complexity:
var dic = new Dictionary<int, int>();
var subHash = new HashSet<int>();
int length = sequences.Length;
for (int i = 0; i < length; i++)
{
subHash.Clear();
int subLength = sequences[i].Length;
for (int j = 0; j < subLength; j++)
{
int n = sequences[i][j];
// Add returns false when n was already seen in this sub-array,
// so duplicates within one sub-array are counted only once
if (subHash.Add(n))
{
int counter;
if (dic.TryGetValue(n, out counter))
{
// already seen in an earlier sub-array
dic[n] = counter + 1;
}
else
{
// first occurrence
dic[n] = 1;
}
}
}
}
// values counted in two or more sub-arrays
var result = dic.Where(p => p.Value > 1).Select(p => p.Key);
This should be very close to optimal - how "readable" it is depends on your taste. In my opinion it is also the most readable solution.
var seenElements = new HashSet<T>();
var repeatedElements = new HashSet<T>();
foreach (var list in source)
{
foreach (var element in list.Distinct())
{
if (seenElements.Contains(element))
{
repeatedElements.Add(element);
}
else
{
seenElements.Add(element);
}
}
}
return repeatedElements;
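Wrapped into a complete method, the two-set idea above might look like the sketch below. The names InAtLeastTwo and RepeatedElementsSketch are hypothetical, and the sketch leans on HashSet<T>.Add returning false for an element that is already present, which replaces the explicit Contains check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RepeatedElementsSketch
{
    // Hypothetical wrapper: returns the values that occur in two or more sequences.
    public static HashSet<T> InAtLeastTwo<T>(IEnumerable<IEnumerable<T>> source)
    {
        var seen = new HashSet<T>();
        var repeated = new HashSet<T>();
        foreach (var sequence in source)
            foreach (var element in sequence.Distinct()) // ignore repeats inside one sequence
                if (!seen.Add(element))                  // false: an earlier sequence had it
                    repeated.Add(element);
        return repeated;
    }

    static void Main()
    {
        var input = new[]
        {
            new[] { 1, 1, 2, 3, 4, 5, 7 },
            new[] { 5, 6, 7 },
            new[] { 2, 6, 7, 9 },
            new[] { 4 }
        };
        var result = InAtLeastTwo<int>(input);
        Console.WriteLine(string.Join(", ", result.OrderBy(x => x))); // 2, 4, 5, 6, 7
    }
}
```

The result is sorted only for display; the set itself carries no ordering guarantee.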
You can skip sequences that have already been intersected; this way it will be a little faster.
public static IEnumerable<T> UnionOfIntersects<T>(this IEnumerable<IEnumerable<T>> source)
{
var result = new List<T>();
var sequences = source.ToList();
for (int sequenceIdx = 0; sequenceIdx < sequences.Count(); sequenceIdx++)
{
var sequence = sequences[sequenceIdx];
for (int targetSequenceIdx = sequenceIdx + 1; targetSequenceIdx < sequences.Count; targetSequenceIdx++)
{
var targetSequence = sequences[targetSequenceIdx];
var intersections = sequence.Intersect(targetSequence);
result.AddRange(intersections);
}
}
return result.Distinct();
}
How does it work?
Input: {/*0*/ { 1, 2, 3, 4, 5, 7 } ,/*1*/ { 5, 6, 7 },/*2*/ { 2, 6, 7, 9 } , /*3*/{ 4 } }
Step 0: Intersect 0 with 1..3
Step 1: Intersect 1 with 2..3 (0 with 1 has already been intersected)
Step 2: Intersect 2 with 3 (0 with 2 and 1 with 2 have already been intersected)
Return: Distinct elements.
Result: { 2, 4, 5, 6, 7 }
You can test it with the below code
var lists = new List<List<int>>
{
new List<int> {1, 2, 3, 4, 5, 7},
new List<int> {5, 6, 7},
new List<int> {2, 6, 7, 9},
new List<int> {4 }
};
var result = lists.UnionOfIntersects();
You can try this approach; it might be more efficient, and it also lets you specify the minimum intersection count and the comparer used:
public static IEnumerable<T> UnionOfIntersects<T>(this IEnumerable<IEnumerable<T>> source
, int minIntersectionCount
, IEqualityComparer<T> comparer = null)
{
if (comparer == null) comparer = EqualityComparer<T>.Default;
foreach (T item in source.SelectMany(s => s).Distinct(comparer))
{
int containedInHowManySequences = 0;
foreach (IEnumerable<T> seq in source)
{
bool contained = seq.Contains(item, comparer);
if (contained) containedInHowManySequences++;
if (containedInHowManySequences == minIntersectionCount)
{
yield return item;
break;
}
}
}
}
Some explaining words:
It enumerates all unique items across all sequences. Since Distinct uses a set internally, this should be pretty efficient, and it helps when the sequences contain many duplicates.
The inner loop simply checks, for each sequence, whether it contains the unique item. It uses Enumerable.Contains, which stops as soon as the item is found (so duplicates are not an issue).
Once the intersection count reaches the minimum intersection count, the item is yielded and the next (unique) item is checked.
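As a usage sketch, the method can be called with different minimum counts on the sample input from the question. The method body is repeated from the answer only so the example compiles on its own; the class name MinCountSketch is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MinCountSketch
{
    // Same logic as the extension method above, repeated for a self-contained example.
    public static IEnumerable<T> UnionOfIntersects<T>(
        this IEnumerable<IEnumerable<T>> source,
        int minIntersectionCount,
        IEqualityComparer<T> comparer = null)
    {
        if (comparer == null) comparer = EqualityComparer<T>.Default;
        foreach (T item in source.SelectMany(s => s).Distinct(comparer))
        {
            int containedInHowManySequences = 0;
            foreach (IEnumerable<T> seq in source)
            {
                if (seq.Contains(item, comparer)) containedInHowManySequences++;
                if (containedInHowManySequences == minIntersectionCount)
                {
                    yield return item;
                    break; // enough sequences contain it; move on to the next item
                }
            }
        }
    }

    static void Main()
    {
        var sequences = new[]
        {
            new[] { 1, 2, 3, 4, 5, 7 },
            new[] { 5, 6, 7 },
            new[] { 2, 6, 7, 9 },
            new[] { 4 }
        };
        // Items found in at least two sequences, in first-seen order.
        Console.WriteLine(string.Join(", ", UnionOfIntersects<int>(sequences, 2))); // 2, 4, 5, 7, 6
        // Items found in at least three sequences.
        Console.WriteLine(string.Join(", ", UnionOfIntersects<int>(sequences, 3))); // 7
    }
}
```

Note that source is enumerated more than once, so this shape suits in-memory collections better than one-shot streams.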
That should nail it:
int[][] test = { new int[] { 1, 2, 3, 4, 5, 7 }, new int[] { 5, 6, 7 }, new int[] { 2, 6, 7, 9 }, new int[] { 4 } };
var result = test.SelectMany(a => a.Distinct()).GroupBy(x => x).Where(g => g.Count() > 1).Select(y => y.Key).ToList();
First you make sure there are no duplicates within each sequence. Then you join all sequences into a single sequence and look for duplicates.

Split array with LINQ

Assuming I have a list
var listOfInt = new List<int> {1, 2, 3, 4, 7, 8, 12, 13, 14};
How can I use LINQ to obtain a list of lists as follows:
{{1, 2, 3, 4}, {7, 8}, {12, 13, 14}}
So, I have to take the consecutive values and group them into lists.
You can create an extension method (I omitted the source check here) that iterates the source and builds groups of consecutive items. If the next item in the source is not consecutive, the current group is yielded:
public static IEnumerable<List<int>> ToConsecutiveGroups(
this IEnumerable<int> source)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
else
{
int current = iterator.Current;
List<int> group = new List<int> { current };
while (iterator.MoveNext())
{
int next = iterator.Current;
if (next < current || current + 1 < next)
{
yield return group;
group = new List<int>();
}
current = next;
group.Add(current);
}
if (group.Any())
yield return group;
}
}
}
Usage is simple:
var listOfInt = new List<int> { 1, 2, 3, 4, 7, 8, 12, 13, 14 };
var groups = listOfInt.ToConsecutiveGroups();
Result:
[
[ 1, 2, 3, 4 ],
[ 7, 8 ],
[ 12, 13, 14 ]
]
UPDATE: Here is generic version of this extension method, which accepts predicate for verifying if two values should be considered consecutive:
public static IEnumerable<List<T>> ToConsecutiveGroups<T>(
this IEnumerable<T> source, Func<T,T, bool> isConsequtive)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
else
{
T current = iterator.Current;
List<T> group = new List<T> { current };
while (iterator.MoveNext())
{
T next = iterator.Current;
if (!isConsequtive(current, next))
{
yield return group;
group = new List<T>();
}
current = next;
group.Add(current);
}
if (group.Any())
yield return group;
}
}
}
Usage is simple:
var result = listOfInt.ToConsecutiveGroups((x,y) => (x == y) || (x == y - 1));
This works for both sorted and unsorted lists:
var listOfInt = new List<int> { 1, 2, 3, 4, 7, 8, 12, 13 };
int index = 0;
var result = listOfInt.Zip(listOfInt
.Concat(listOfInt.Reverse<int>().Take(1))
.Skip(1),
(v1, v2) =>
new
{
V = v1,
G = (v2 - v1) != 1 ? index++ : index
})
.GroupBy(x => x.G, x => x.V, (k, l) => l.ToList())
.ToList();
External index is building an index of consecutive groups that have value difference of 1. Then you can simply GroupBy with respect to this index.
To clarify solution, here is how this collection looks without grouping (GroupBy commented):
Assuming your input is in order, the following will work:
var grouped = input.Select((n, i) => new { n, d = n - i }).GroupBy(p => p.d, p => p.n);
It won't work if your input is e.g. { 1, 2, 3, 999, 5, 6, 7 }.
You'd get { { 1, 2, 3, 5, 6, 7 }, { 999 } }.
This works:
var results =
listOfInt
.Skip(1)
.Aggregate(
new List<List<int>>(new [] { listOfInt.Take(1).ToList() }),
(a, x) =>
{
if (a.Last().Last() + 1 == x)
{
a.Last().Add(x);
}
else
{
a.Add(new List<int>(new [] { x }));
}
return a;
});
I get this result:

What's the best way to apply a "Join" method generically similar to how String.Join(...) works?

If I have a string array, for example: var array = new[] { "the", "cat", "in", "the", "hat" }, and I want to join them with a space between each word I can simply call String.Join(" ", array).
But, say I had an array of integer arrays (just like I can have an array of character arrays). I want to combine them into one large array (flatten them), but at the same time insert a value between each array.
var arrays = new[] { new[] { 1, 2, 3 }, new[] { 4, 5, 6 }, new[] { 7, 8, 9 }};
var result = SomeJoin(0, arrays); // result = { 1, 2, 3, 0, 4, 5, 6, 0, 7, 8, 9 }
I wrote something up, but it is very ugly, and I'm sure that there is a better, cleaner way. Maybe more efficient?
var result = new int[arrays.Sum(a => a.Length) + arrays.Length - 1];
int offset = 0;
foreach (var array in arrays)
{
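// Buffer.BlockCopy takes its offsets and count in bytes, not elements
Buffer.BlockCopy(array, 0, result, offset * sizeof(int), array.Length * sizeof(int));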
offset += array.Length;
if (offset < result.Length)
{
result[offset++] = 0;
}
}
Perhaps this is the most efficient? I don't know... just seeing if there is a better way. I thought maybe LINQ would solve this, but sadly I don't see anything that is what I need.
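One detail worth keeping in mind with the preallocate-and-copy idea: Buffer.BlockCopy measures its offsets and count in bytes, whereas Array.Copy counts in elements. A minimal sketch of the same approach using Array.Copy follows; JoinArrays and ArrayJoinSketch are hypothetical names.

```csharp
using System;
using System.Linq;

static class ArrayJoinSketch
{
    // Hypothetical helper: flattens the arrays into one preallocated array,
    // writing the separator between consecutive arrays.
    public static int[] JoinArrays(int separator, int[][] arrays)
    {
        if (arrays.Length == 0) return new int[0];

        var result = new int[arrays.Sum(a => a.Length) + arrays.Length - 1];
        int offset = 0;
        for (int k = 0; k < arrays.Length; k++)
        {
            if (k > 0) result[offset++] = separator;                    // separator before every array but the first
            Array.Copy(arrays[k], 0, result, offset, arrays[k].Length); // length is in elements
            offset += arrays[k].Length;
        }
        return result;
    }

    static void Main()
    {
        var arrays = new[] { new[] { 1, 2, 3 }, new[] { 4, 5, 6 }, new[] { 7, 8, 9 } };
        Console.WriteLine(string.Join(", ", JoinArrays(0, arrays)));
        // 1, 2, 3, 0, 4, 5, 6, 0, 7, 8, 9
    }
}
```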
You can generically "join" sequences via:
public static IEnumerable<T> Join<T>(T separator, IEnumerable<IEnumerable<T>> items)
{
var sep = new[] {separator};
var first = items.FirstOrDefault();
if (first == null)
return Enumerable.Empty<T>();
else
return first.Concat(items.Skip(1).SelectMany(i => sep.Concat(i)));
}
This works with your code:
var arrays = new[] { new[] { 1, 2, 3 }, new[] { 4, 5, 6 }, new[] { 7, 8, 9 }};
var result = Join(0, arrays); // result = { 1, 2, 3, 0, 4, 5, 6, 0, 7, 8, 9 }
The advantage here is that this will work with any IEnumerable<IEnumerable<T>>, and isn't restricted to lists or arrays. Note that this will insert a separator in between two empty sequences, but that behavior could be modified if desired.
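To make the empty-sequence behavior concrete, here is a small sketch. Join follows the method above; JoinNonEmpty is a hypothetical variant, not part of the answer, that filters out empty sequences first so separators do not pile up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class JoinSketch
{
    // Same approach as the Join method above.
    public static IEnumerable<T> Join<T>(T separator, IEnumerable<IEnumerable<T>> items)
    {
        var sep = new[] { separator };
        var first = items.FirstOrDefault();
        if (first == null)
            return Enumerable.Empty<T>();
        return first.Concat(items.Skip(1).SelectMany(i => sep.Concat(i)));
    }

    // Hypothetical variant: skip empty sequences before joining.
    public static IEnumerable<T> JoinNonEmpty<T>(T separator, IEnumerable<IEnumerable<T>> items) =>
        Join(separator, items.Where(s => s.Any()));

    static void Main()
    {
        var input = new[] { new[] { 1 }, new int[0], new int[0], new[] { 2 } };
        Console.WriteLine(string.Join(", ", Join<int>(0, input)));         // 1, 0, 0, 0, 2
        Console.WriteLine(string.Join(", ", JoinNonEmpty<int>(0, input))); // 1, 0, 2
    }
}
```

Which behavior is right depends on the use case: String.Join also emits a separator around empty strings, so the unfiltered version is the closer analogue.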
public T[] SomeJoin<T>(T a, T[][] arrays){
return arrays.SelectMany((x,i)=> i == arrays.Length-1 ? x : x.Concat(new[]{a}))
.ToArray();
}
NOTE: This works smoothly because the input is an array, so its Length is available directly; with other collection types, getting the count of the input may cost some performance.
This may not be the most efficient, but it is quite extensible:
public static IEnumerable<T> Join<T>(this IEnumerable<IEnumerable<T>> source, T separator)
{
bool firstTime = true;
foreach (var collection in source)
{
if (!firstTime)
yield return separator;
foreach (var value in collection)
yield return value;
firstTime = false;
}
}
...
var arrays = new[] { new[] { 1, 2, 3 }, new[] { 4, 5, 6 }, new[] { 7, 8, 9 }};
var result = arrays.Join(0).ToArray();
// result = { 1, 2, 3, 0, 4, 5, 6, 0, 7, 8, 9 }

How to select the last set of concatenated sequence from multiple similar sequence

I have an int array, it's a concatenated array from multiple similar arrays all starting at 1.
1, 2, 3, 4
1, 2
1, 2, 3
1, 2
int[] list = { 1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 2 };
What I am trying to achieve is to get the "last set" of the result which is {1, 2}.
Attempt:
int[] list = { 1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 2 };
List<int> lastSet = new List<int>();
var totalSets = list.Count(x => x == 1);
int encounter = 0;
foreach (var i in list)
{
if (i == 1)
encounter += 1;
if (encounter == totalSets)
lastSet.Add(i);
}
lastSet.ToList().ForEach(x => Console.WriteLine(x));
Is there a better way to achieve this using LINQ, perhaps SkipWhile, GroupBy, Aggregate?
If you can either make your list be an actual List<int> or if it doesn't bother you to create a copy of the list via .ToList(), you can do this:
var list = new[]{ 1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 2 }.ToList();
var lastSet = list.Skip(list.LastIndexOf(1)).ToList();
Otherwise, Aggregate can work, but it's a little ugly:
var lastSet = list.Aggregate(new List<int>{1}, (seed, i) => {
if(i == 1) {seed.Clear(); }
seed.Add(i);
return seed;
});
Update
As dtb points out, you can use Array.LastIndexOf rather than creating a List:
var list = new[]{ 1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 2 };
var lastSet = list.Skip(Array.LastIndexOf(list, 1)).ToList();
Works for any IEnumerable (but is slower than direct List versions)
var sub = list.Reverse<int>()
.TakeWhile(i => i != 1)
.Concat(new[]{1})
.Reverse<int>();
Run a ToArray() on the result if you like.
Using the GroupAdjacent Extension Method below, you can split the list into the sequences beginning with 1 and then take the last sequence:
var result = list.GroupAdjacent((g, x) => x != 1)
.Last()
.ToList();
with
public static IEnumerable<IEnumerable<T>> GroupAdjacent<T>(
this IEnumerable<T> source, Func<IEnumerable<T>, T, bool> adjacent)
{
var g = new List<T>();
foreach (var x in source)
{
if (g.Count != 0 && !adjacent(g, x))
{
yield return g;
g = new List<T>();
}
g.Add(x);
}
yield return g;
}
LINQ is overrated:
int[] list = { 1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 2 };
int pos = Array.LastIndexOf(list, 1);
int[] result = new int[list.Length - pos];
Array.Copy(list, pos, result, 0, result.Length);
// result == { 1, 2 }
Now with 100% more readable:
int[] list = { 1, 2, 3, 4, 1, 2, 1, 2, 3, 1, 2 };
int[] result = list.Slice(list.LastIndexOf(1));
// result == { 1, 2 }
where
static int LastIndexOf<T>(this T[] array, T value)
{
return Array.LastIndexOf<T>(array, value);
}
static T[] Slice<T>(this T[] array, int offset)
{
return Slice(array, offset, array.Length - offset);
}
static T[] Slice<T>(this T[] array, int offset, int length)
{
T[] result = new T[length];
Array.Copy(array, offset, result, 0, length);
return result;
}
