Is there a better way to use Lambda with groups of N? - c#

I have the method Process(IEnumerable<Record> records) which can take UP TO but NO MORE THAN 3 records at a time. I have hundreds of records, so I need to pass in groups. I do this:
var _Records = Enumerable.Range(1, 16).ToArray();
for (int i = 0; i < int.MaxValue; i += 3)
{
var _ShortList = _Records.Skip(i).Take(3);
if (!_ShortList.Any())
break;
Process(_ShortList);
}
It works, but... is there a better way?

You can use MoreLinq's Batch:
var result = Enumerable.Range(1, 16).Batch(3);
or
var arrayOfArrays = Enumerable.Range(1, 16).Batch(3).Select(x => x.ToArray()).ToArray();
The source is in the MoreLINQ repository on GitHub if you want to take a look at it.
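If you are on .NET 6 or later, LINQ has a built-in Enumerable.Chunk that does the same job without an extra package. A minimal sketch, assuming .NET 6+; the Process method below is a hypothetical stand-in for the one in the question that just records what it receives:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkDemo
{
    // Hypothetical stand-in for the question's Process method: it stores
    // each batch it receives so the batches can be inspected afterwards.
    public static readonly List<int[]> Received = new List<int[]>();

    public static void Process(IEnumerable<int> records) => Received.Add(records.ToArray());

    public static void Run()
    {
        // Enumerable.Chunk (.NET 6+) yields arrays of at most 3 elements;
        // the last chunk holds whatever is left over.
        foreach (int[] chunk in Enumerable.Range(1, 16).Chunk(3))
        {
            Process(chunk);
        }
    }
}
```

For the question's 16 records this produces five chunks of 3 followed by a final chunk containing only 16.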

You may use this extension method:
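public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> source, int chunkSize)
{
    return source
        .Select((value, i) => new { Index = i, Value = value })
        // Integer division keeps consecutive items together (0-2 -> 0, 3-5 -> 1, ...);
        // Index % chunkSize would instead deal items round-robin into chunkSize groups.
        .GroupBy(item => item.Index / chunkSize)
        .Select(chunk => chunk.Select(item => item.Value));
}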
It splits a source collection of items into several chunks of the given size (the last chunk may be shorter).
So your code would look like this:
foreach (var chunk in Enumerable.Range(1, 16).Split(3))
{
Process(chunk);
}
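Whether the grouping key is index / chunkSize or index % chunkSize changes the result completely: division puts consecutive elements together, while modulo deals them out round-robin into chunkSize interleaved groups. A small sketch comparing the two keys on the question's 16 records (the GroupKeyDemo names are hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class GroupKeyDemo
{
    // Consecutive chunks: indices 0-2 share key 0, 3-5 share key 1, and so on.
    public static List<List<int>> ByDivision(IEnumerable<int> source, int chunkSize) =>
        source.Select((value, i) => new { Index = i, Value = value })
              .GroupBy(item => item.Index / chunkSize)
              .Select(g => g.Select(item => item.Value).ToList())
              .ToList();

    // Interleaved groups: every chunkSize-th element shares a key.
    public static List<List<int>> ByModulo(IEnumerable<int> source, int chunkSize) =>
        source.Select((value, i) => new { Index = i, Value = value })
              .GroupBy(item => item.Index % chunkSize)
              .Select(g => g.Select(item => item.Value).ToList())
              .ToList();
}
```

With Enumerable.Range(1, 16) and a size of 3, ByDivision gives six groups ({1, 2, 3} through {16}), while ByModulo gives only three: {1, 4, 7, 10, 13, 16}, {2, 5, 8, 11, 14} and {3, 6, 9, 12, 15}.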

Here's another LINQ-y way to do it:
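var batchSize = 3;
// (Length + batchSize - 1) / batchSize is the number of batches, rounded up (0 for an empty array)
Enumerable.Range(0, (_Records.Length + batchSize - 1) / batchSize)
    .ToList()
    .ForEach(i => Process(_Records.Skip(i * batchSize).Take(batchSize)));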

In case you need "pagination" multiple times in your solution, you may consider using an extension method.
I hacked one together in LINQPad; it does the trick:
public static class MyExtensions {
    public static IEnumerable<IEnumerable<T>> Paginate<T>(this IEnumerable<T> source, int pageSize) {
        T[] buffer = new T[pageSize];
        int index = 0;
        foreach (var item in source) {
            buffer[index++] = item;
            if (index >= pageSize) {
                // Hand the full buffer to the caller and start a new one, so pages
                // that were already returned are not overwritten by later items.
                yield return buffer;
                buffer = new T[pageSize];
                index = 0;
            }
        }
        if (index > 0) {
            yield return buffer.Take(index);
        }
    }
}
Basically, it fills a buffer of size pageSize and yields it as soon as it is full. If fewer than pageSize elements are left at the end, they are yielded as a final, shorter page. So,
Enumerable.Range(1, 10).Paginate(3).Dump(); // Dump is a LINQPad extension
will yield
{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}, {10}}
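One subtlety when paging through a single reused buffer: buffer.Take(n) is evaluated lazily, so a caller that materializes the outer sequence first (for example with ToList()) sees only the buffer's final contents in every page. The sketch below (method names are hypothetical) contrasts that risky pattern with allocating a fresh array per page:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class BufferDemo
{
    // Risky pattern: reuses one buffer and yields lazy views of it.
    public static IEnumerable<IEnumerable<T>> SharedBufferPages<T>(IEnumerable<T> source, int pageSize)
    {
        var buffer = new T[pageSize];
        int index = 0;
        foreach (var item in source)
        {
            buffer[index++] = item;
            if (index >= pageSize)
            {
                yield return buffer.Take(pageSize); // a lazy view, not a copy
                index = 0;
            }
        }
        if (index > 0)
            yield return buffer.Take(index);
    }

    // Safe pattern: each full buffer goes to the caller and a new one is started.
    public static IEnumerable<T[]> FreshBufferPages<T>(IEnumerable<T> source, int pageSize)
    {
        var buffer = new T[pageSize];
        int index = 0;
        foreach (var item in source)
        {
            buffer[index++] = item;
            if (index >= pageSize)
            {
                yield return buffer;
                buffer = new T[pageSize];
                index = 0;
            }
        }
        if (index > 0)
            yield return buffer.Take(index).ToArray();
    }
}
```

For 1 to 6 in pages of 3, calling ToList() on SharedBufferPages and then reading the pages gives two pages that both contain 4, 5, 6, whereas FreshBufferPages keeps {1, 2, 3} and {4, 5, 6} intact.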

You can create your own extension method:
static class Extensions {
public static IEnumerable<IEnumerable<T>> ToBlocks<T>(this IEnumerable<T> source, int blockSize) {
var count = 0;
T[] block = null;
foreach (var item in source) {
if (block == null)
block = new T[blockSize];
block[count++] = item;
if (count == blockSize) {
yield return block;
block = null;
count = 0;
}
}
if (count > 0)
yield return block.Take(count);
}
}

public static void ChunkProcess<T>(IEnumerable<T> source, int size, Action<IEnumerable<T>> action)
{
var chunk = source.Take(size);
while (chunk.Any())
{
action(chunk);
source = source.Skip(size);
chunk = source.Take(size);
}
}
And your code would be:
ChunkProcess(_Records, 3, Process);

var _Records = Enumerable.Range(1, 16).ToArray();
int index = 0;
foreach (var group in _Records.GroupBy(element => index++ / 3))
Process(group);
NOTE: The code above is short and relatively efficient, but not as efficient as it could be (GroupBy essentially builds a hash table behind the scenes). A slightly more cumbersome but faster way would be:
var _Records = Enumerable.Range(1, 16).ToArray();
var buff = new int[3];
int index = 0;
foreach (var element in _Records) {
if (index == buff.Length) {
Process(buff);
index = 0;
}
buff[index++] = element;
}
if (index > 0)
Process(buff.Take(index));
Or, pack it to a more reusable form:
public static class EnumerableEx {
    // processPage must consume each page before returning, because the buffer is reused.
    public static void Paginate<T>(this IEnumerable<T> elements, int pageSize, Action<IEnumerable<T>> processPage) {
        var buff = new T[pageSize]; // honor the requested page size instead of a hard-coded 3
        int index = 0;
        foreach (var element in elements) {
            if (index == buff.Length) {
                processPage(buff);
                index = 0;
            }
            buff[index++] = element;
        }
        if (index > 0)
            processPage(buff.Take(index));
    }
}
// ...
var _Records = Enumerable.Range(1, 16).ToArray();
_Records.Paginate(3, Process);

This extension method works correctly:
public static class EnumerableExtensions
{
    public static IEnumerable<IEnumerable<T>> Chunks<T>(this IEnumerable<T> items, int size)
    {
        return items
            .Select((member, index) => new { Index = index, Value = member })
            .GroupBy(item => item.Index / size) // consecutive items share a key
            .Select(chunk => chunk.Select(item => item.Value));
    }
}

Related

c# faster n-ary cartesian product for

I have searched around for a way to compute the Cartesian product of multiple lists, and I have used the popular answer that uses Aggregate + SelectMany. The trouble is that my example runs very slowly: I have 4 lists with 3K entries each, and I need to enumerate every possible combination.
Does anyone know a faster way in C#?
I made a fiddle here, which currently runs out of memory.
Here is the code from the fiddle:
public static void Main()
{
var sources = new[]
{
Enumerable.Range(1, 3000),
Enumerable.Range(1, 3000),
Enumerable.Range(1, 3000),
Enumerable.Range(1, 3000),
};
var sw = new System.Diagnostics.Stopwatch();
sw.Start();
Console.Write("linq way...");
foreach(var l in NCartesian(sources))
{
// just enumerate
}
Console.WriteLine("{0}ms", sw.ElapsedMilliseconds);
}
public static IEnumerable<IEnumerable<T>> NCartesian<T>(
IEnumerable<IEnumerable<T>> sequences)
{
if (sequences == null)
{
return null;
}
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) => accumulator.SelectMany(
accseq => sequence,
(accseq, item) => accseq.Concat(new[] { item })));
}
I wrote one that uses less memory than the above, but it is still slow:
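public static IEnumerable<IEnumerable<T>> NCartesian<T>(
    IEnumerable<IEnumerable<T>> sequences)
{
    if (sequences == null)
    {
        throw new ArgumentNullException(nameof(sequences));
    }
    // Keep the sequences so an exhausted enumerator can be recreated:
    // LINQ and iterator-block enumerators throw NotSupportedException on Reset().
    var sequenceList = sequences.ToList();
    var enumerators = new List<IEnumerator<T>>();
    foreach (var sequence in sequenceList)
    {
        var enumerator = sequence.GetEnumerator();
        if (!enumerator.MoveNext()) // move to the first position; an empty input means an empty product
        {
            yield break;
        }
        enumerators.Add(enumerator);
    }
    bool done = false;
    while (!done)
    {
        IList<T> result = enumerators.Select(e => e.Current).ToList();
        yield return result;
        if (enumerators.Count == 0)
        {
            yield break; // zero inputs: the only combination is the empty one
        }
        for (int idx = enumerators.Count - 1; idx >= 0; idx--)
        {
            if (enumerators[idx].MoveNext())
            {
                break;
            }
            if (idx == 0)
            {
                // the first enumerator is done
                done = true;
                break;
            }
            // restart this position from its first element
            enumerators[idx].Dispose();
            enumerators[idx] = sequenceList[idx].GetEnumerator();
            enumerators[idx].MoveNext();
        }
    }
}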
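Independently of the implementation, 4 lists of 3,000 entries have 3000^4 = 8.1 × 10^13 combinations; even at a billion combinations per second, enumerating them takes about 22 hours, so the realistic goal is cutting the per-combination cost or shrinking the search space. Much of the cost in both versions above is allocation: a new list or Concat chain per combination. A sketch under that assumption, with a hypothetical Product method, copies each input into an array once and advances an index "odometer", yielding the same buffer every time:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CartesianSketch
{
    // Enumerates the n-ary Cartesian product with an "odometer" of indices.
    // The SAME array instance is yielded for every combination to avoid
    // per-combination allocations: copy it (e.g. ToArray()) if you keep it.
    public static IEnumerable<T[]> Product<T>(IEnumerable<IEnumerable<T>> sequences)
    {
        if (sequences == null) throw new ArgumentNullException(nameof(sequences));
        // Materialize each input once so it can be indexed and revisited cheaply.
        return ProductIterator(sequences.Select(s => s.ToArray()).ToArray());
    }

    private static IEnumerable<T[]> ProductIterator<T>(T[][] arrays)
    {
        // Any empty input makes the whole product empty.
        if (arrays.Any(a => a.Length == 0))
            yield break;

        var indices = new int[arrays.Length];
        var current = new T[arrays.Length];
        for (int i = 0; i < arrays.Length; i++)
            current[i] = arrays[i][0];

        while (true)
        {
            yield return current;

            // Advance the rightmost position; on wrap-around, reset it and carry left.
            int pos = arrays.Length - 1;
            while (pos >= 0)
            {
                if (++indices[pos] < arrays[pos].Length)
                {
                    current[pos] = arrays[pos][indices[pos]];
                    break;
                }
                indices[pos] = 0;
                current[pos] = arrays[pos][0];
                pos--;
            }
            if (pos < 0)
                yield break; // every position wrapped: all combinations were produced
        }
    }
}
```

With the question's sources, foreach (var combo in CartesianSketch.Product(sources)) avoids the allocations, but the 8.1 × 10^13 iterations remain, so pruning combinations early (if your real problem allows it) matters more than any enumeration trick.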

C# Algorithm for Combinations/Permutations of a defined range of integers

I am trying to generate a list of unique combinations/permutations of 3 allowable values for 20 different participants (each of the 20 participants can be assigned a value of either 1, 2, or 3).
An example of one combination would be an array of length 20 with all ones, like this:
{ 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1 }
...and everything possible all the way up to
{ 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3 }
where each value in the array can be 1, 2 or 3.
I am stuck on writing my GetAllCombinations() function. I have looked at some articles on permutations, but everything I have found just confuses me more. I am not even sure permutation is what I need here.
So far I have this:
public List<int[]> GetValidCombinations()
{
const int positions = 20;
int[] acceptableValues = new int[3] { 1, 2, 3 };
//DO I USE PERMUTATION HERE TO BUILD MY ARRAY LIST OF ALL POSSIBLE COMBINATIONS?
var allPossibleCombinations = GetAllCombinations(positions, acceptableValues);
List<int[]> validList = new List<int[]>();
foreach (var combination in allPossibleCombinations)
{
//omitted for brevity, but I would
//do some other validations here...
if (true)
{
validList.Add(combination);
}
}
return validList;
}
public List<int[]> GetAllCombinations(int positions, int[] acceptableValues)
{
//For now returning null because I
//don't know How the heck to do this...
return null;
}
I have looked at some examples of permutation and I tried to use something like this below, but it did not produce what I was looking for:
static IEnumerable<IEnumerable<T>>
GetPermutations<T>(IEnumerable<T> list, int length)
{
if (length == 1) return list.Select(t => new T[] { t });
return GetPermutations(list, length - 1)
.SelectMany(t => list.Where(o => !t.Contains(o)),
(t1, t2) => t1.Concat(new T[] { t2 }));
}
public void Test()
{
const int k = 20;
var n = new[] { 1, 2, 3 };
var combinations = GetPermutations(n, k);
//DOES NOT WORK FOR WHAT I NEED
}
Running Test() worked when k was 3 or less but returned nothing when k was greater than 3.
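The behavior comes from the filter list.Where(o => !t.Contains(o)): it only appends values that are not already in the prefix, so it generates permutations without repetition. With only three distinct values, no sequence longer than 3 can be built. Dropping the filter gives permutations with repetition, i.e. every length-k sequence over the values. A sketch of that change, with a hypothetical method name:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class RepetitionDemo
{
    // Same recursive shape as GetPermutations, but every value may be appended
    // to every prefix, so repeats such as {1, 1, 2} are allowed.
    public static IEnumerable<IEnumerable<T>> GetPermutationsWithRepetition<T>(IEnumerable<T> list, int length)
    {
        if (length == 1) return list.Select(t => new T[] { t });
        return GetPermutationsWithRepetition(list, length - 1)
            .SelectMany(t => list, (t1, t2) => t1.Concat(new T[] { t2 }));
    }
}
```

For three values and k = 4 this yields 3^4 = 81 sequences; for k = 20 it would yield 3^20, which is far too many to collect into a list.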
Try this:
public static List<int[]> GetAllCombinations(int position, int[] acceptableValues)
{
    List<int[]> result = new List<int[]>();
    int[] parent = new int[] { };
    result = AddAPosition(parent, acceptableValues);
    while (position > 1)
    {
        var tmpResult = new List<int[]>();
        foreach (var _parent in result)
        {
            tmpResult.AddRange(AddAPosition(_parent, acceptableValues));
        }
        position--;
        result = tmpResult;
    }
    return result;
}
public static List<int[]> AddAPosition(int[] parent, int[] acceptableValues)
{
List<int[]> result = new List<int[]>();
for (int i = 0; i< acceptableValues.Length; i++)
{
var anArray = new int[parent.Length + 1];
for (int j = 0; j< parent.Length; j++)
{
anArray[j] = parent[j];
}
anArray[parent.Length] = acceptableValues[i];
result.Add(anArray);
}
return result;
}
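Note that 20 positions with 3 values give 3^20 = 3,486,784,401 arrays, which is more than a List<int[]> can even hold (its Count is an int, capped at 2,147,483,647), so collecting everything before validating cannot work at that size. A lazy generator that yields one array at a time lets you validate each combination and keep only the ones you want. A sketch, with a hypothetical AllAssignments method; the order matches the answer above (the last position changes fastest):

```csharp
using System;
using System.Collections.Generic;

public static class CombinationSketch
{
    // Lazily yields every array of length `positions` whose entries come from
    // `values`, repeats allowed: values.Length^positions arrays in total.
    // Each yielded array is new, so callers may keep the ones they want.
    public static IEnumerable<int[]> AllAssignments(int positions, int[] values)
    {
        if (values == null) throw new ArgumentNullException(nameof(values));
        if (positions < 0) throw new ArgumentOutOfRangeException(nameof(positions));
        if (values.Length == 0 && positions > 0) yield break;

        // digits[i] is the index into `values` for position i (a base-N counter).
        var digits = new int[positions];
        while (true)
        {
            var combination = new int[positions];
            for (int i = 0; i < positions; i++)
                combination[i] = values[digits[i]];
            yield return combination;

            // Increment the counter; the last position changes fastest.
            int pos = positions - 1;
            while (pos >= 0 && ++digits[pos] == values.Length)
            {
                digits[pos] = 0;
                pos--;
            }
            if (pos < 0) yield break;
        }
    }
}
```

In GetValidCombinations you could then loop over CombinationSketch.AllAssignments(positions, acceptableValues) and add only the combinations that pass your validations, so memory grows with the valid results rather than with all 3^20 candidates.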

How to convert a multiple rank array using ConvertAll()?

I want to use ConvertAll like this:
var sou = new[,] { { true, false, false }, { true, true, true } };
var tar = Array.ConvertAll<bool, int>(sou, x => (x ? 1 : 0));
but I got a compiler error:
cannot implicitly convert type bool[,] to bool[]
You could write a straightforward conversion extension:
public static class ArrayExtensions
{
public static TResult[,] ConvertAll<TSource, TResult>(this TSource[,] source, Func<TSource, TResult> projection)
{
if (source == null)
throw new ArgumentNullException("source");
if (projection == null)
throw new ArgumentNullException("projection");
var result = new TResult[source.GetLength(0), source.GetLength(1)];
for (int x = 0; x < source.GetLength(0); x++)
for (int y = 0; y < source.GetLength(1); y++)
result[x, y] = projection(source[x, y]);
return result;
}
}
Sample usage would look like this:
var tar = sou.ConvertAll(x => x ? 1 : 0);
The downside is that this method only performs a projection; if you wanted to apply any other transform, you would need yet another custom method.
Alternatively, if you want to be able to use LINQ operators on the sequence, you can do that easily with regular LINQ methods. However, you would still need a custom implementation to turn the sequence back into a 2D array:
public static T[,] To2DArray<T>(this IEnumerable<T> source, int rows, int columns)
{
if (source == null)
throw new ArgumentNullException("source");
if (rows < 0 || columns < 0)
    throw new ArgumentException("rows and columns must be non-negative integers.");
var result = new T[rows, columns];
if (columns == 0 || rows == 0)
return result;
int column = 0, row = 0;
foreach (T element in source)
{
if (column >= columns)
{
column = 0;
if (++row >= rows)
throw new InvalidOperationException("Sequence elements do not fit the array.");
}
result[row, column++] = element;
}
return result;
}
This allows a great deal more flexibility, because you can operate on your source array as an IEnumerable<T> sequence.
Sample usage:
var tar = sou.Cast<bool>().Select(x => x ? 1 : 0).To2DArray(sou.GetLength(0), sou.GetLength(1));
Note that the initial Cast<bool>() is required because a multidimensional array implements only the non-generic IEnumerable interface, not IEnumerable<T>, and most LINQ operators work only on IEnumerable<T>.
If your array is of unknown rank, you can use this extension method (which depends on the MoreLinq NuGet package). I'm sure it can be optimized a lot, but it works for me.
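The round trip works because enumerating a multidimensional array visits its elements in row-major order (the rightmost index changes fastest), which is the same order To2DArray fills the result in. A compact sketch of the same idea using index arithmetic instead of explicit row and column counters (the Flatten2D/Unflatten2D names are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RowMajorDemo
{
    // Enumerates a 2D array in row-major order: [0,0], [0,1], ..., [1,0], ...
    public static IEnumerable<T> Flatten2D<T>(T[,] source) => source.Cast<T>();

    // Writes element i to [i / columns, i % columns]; throws if the sequence
    // has more than rows * columns elements (shorter sequences leave defaults).
    public static T[,] Unflatten2D<T>(IEnumerable<T> source, int rows, int columns)
    {
        var result = new T[rows, columns];
        int i = 0;
        foreach (T element in source)
        {
            if (i >= rows * columns)
                throw new InvalidOperationException("Sequence elements do not fit the array.");
            result[i / columns, i % columns] = element;
            i++;
        }
        return result;
    }
}
```

Usage mirrors the sample above: RowMajorDemo.Unflatten2D(RowMajorDemo.Flatten2D(sou).Select(x => x ? 1 : 0), sou.GetLength(0), sou.GetLength(1)).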
using MoreLinq;
using System;
using System.Collections.Generic;
using System.Linq;
public static class ArrayExtensions
{
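public static Array ConvertAll<TOutput>(this Array array, Converter<object, TOutput> converter)
{
    // Write into a new TOutput array of the same shape: the source's element type
    // usually cannot hold TOutput values (e.g. bool[,] -> int[,]).
    var lengths = Enumerable.Range(0, array.Rank).Select(d => array.GetLength(d)).ToArray();
    var result = Array.CreateInstance(typeof(TOutput), lengths);
    if (array.Length == 0)
        return result; // nothing to convert (and no indices to generate for empty dimensions)
    foreach (int[] indices in GenerateIndices(array))
    {
        result.SetValue(converter.Invoke(array.GetValue(indices)), indices);
    }
    return result;
}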
private static IEnumerable<int[]> GenerateCartesianProductOfUpperBounds(IEnumerable<int> upperBounds, IEnumerable<int[]> existingCartesianProduct)
{
if (!upperBounds.Any())
return existingCartesianProduct;
var slice = upperBounds.Slice(0, upperBounds.Count() - 1);
var rangeOfIndices = Enumerable.Range(0, upperBounds.Last() + 1);
IEnumerable<int[]> newCartesianProduct;
if (existingCartesianProduct.Any())
newCartesianProduct = rangeOfIndices.Cartesian(existingCartesianProduct, (i, p1) => new[] { i }.Concat(p1).ToArray()).ToArray();
else
newCartesianProduct = rangeOfIndices.Select(i => new int[] { i }).ToArray();
return GenerateCartesianProductOfUpperBounds(slice, newCartesianProduct);
}
private static IEnumerable<int[]> GenerateIndices(Array array)
{
var upperBounds = Enumerable.Range(0, array.Rank).Select(r => array.GetUpperBound(r));
return GenerateCartesianProductOfUpperBounds(upperBounds, Array.Empty<int[]>());
}
}
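If you would rather not depend on MoreLinq, the index tuples can also be generated with a simple odometer. A sketch with a hypothetical ConvertAllAnyRank method, assuming zero-based dimensions (the normal case for C# arrays):

```csharp
using System;

public static class AnyRankDemo
{
    // Converts every element of an array of any rank into a new TOutput array
    // of the same shape. Assumes every dimension has a lower bound of 0.
    public static Array ConvertAllAnyRank<TOutput>(Array source, Converter<object, TOutput> converter)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (converter == null) throw new ArgumentNullException(nameof(converter));

        int rank = source.Rank;
        var lengths = new int[rank];
        for (int d = 0; d < rank; d++)
            lengths[d] = source.GetLength(d);

        var result = Array.CreateInstance(typeof(TOutput), lengths);
        if (source.Length == 0)
            return result;

        // Odometer over the indices: the last dimension advances fastest.
        var indices = new int[rank];
        while (true)
        {
            result.SetValue(converter(source.GetValue(indices)), indices);
            int dim = rank - 1;
            while (dim >= 0 && ++indices[dim] == lengths[dim])
            {
                indices[dim] = 0;
                dim--;
            }
            if (dim < 0)
                return result;
        }
    }
}
```

The result can be cast back to the concrete type, for example (int[,])AnyRankDemo.ConvertAllAnyRank<int>(sou, x => (bool)x ? 1 : 0).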

linq to sql remove URLS that starts with same domain

I have a list of URLs in a DataTable. I want to remove rows that start with the same domain. Right now I have this code:
// A HashSet gives O(1) Contains lookups, unlike List<int>.Contains.
var toRemove = new HashSet<int>();
string initialDomain;
string compareDomainName;
for (int i = 0; i < UrlList.Rows.Count - 1; i++)
{
    if (toRemove.Contains(i))
        continue;
    initialDomain = new Uri(UrlList.Rows[i][0] as String).Host;
    for (int j = i + 1; j < UrlList.Rows.Count; j++)
    {
        compareDomainName = new Uri(UrlList.Rows[j][0] as String).Host;
        if (String.Compare(initialDomain, compareDomainName, true) == 0)
        {
            toRemove.Add(j);
        }
    }
    percent = i * 100 / total;
    if (percent > lastPercent)
    {
        progress.EditValue = percent;
        Application.DoEvents();
        lastPercent = percent;
    }
}
// Remove from the highest index down; the indices were not collected in sorted
// order, and removing a lower index first would shift the later ones.
foreach (int index in toRemove.OrderByDescending(x => x))
{
    UrlList.Rows.RemoveAt(index);
}
It works well for a small amount of data, but when I load a long list of URLs it is very slow. I want to move to LINQ, but I do not know how to do this with LINQ. Any help?
Update:
I do not need to remove duplicate rows; I already know how to do that. My problem is this:
I have a simple list of URLs:
http://centroid.steven.centricagency.com/forms/contact-us?page=1544
http://chirp.wildcenter.org/poll
http://itdiscover.com/links/
http://itdiscover.com/links/?page=132
http://itdiscover.com/links/?page=2
http://itdiscover.com/links/?page=3
http://itdiscover.com/links/?page=4
http://itdiscover.com/links/?page=6
http://itdiscover.com/links/?page=8
http://www.foreignpolicy.com/articles/2010/06/21/la_vie_en
http://www.foreignpolicy.com/articles/2010/06/21/the_worst_of_the_worst
http://www.foreignpolicy.com/articles/2011/04/25/think_again_dictators
http://www.foreignpolicy.com/articles/2011/08/22/the_dictators_survival_guide
http://www.gsioutdoors.com/activities/pdp/glacier_ss_nesting_wine_glass/gourmet_backpacking/
http://www.gsioutdoors.com/products/pdp/telescoping_foon_orange/
http://www.gsioutdoors.com/products/pdp/telescoping_spoon_blue/
and I want to end up with this list:
http://centroid.steven.centricagency.com/forms/contact-us?page=1544
http://chirp.wildcenter.org/poll
http://itdiscover.com/links/
http://www.foreignpolicy.com/articles/2010/06/21/la_vie_en
http://www.gsioutdoors.com/activities/pdp/glacier_ss_nesting_wine_glass/gourmet_backpacking/
var result = urls.Distinct(new UrlComparer());
public class UrlComparer : IEqualityComparer<string>
{
public bool Equals(string x, string y)
{
return new Uri(x).Host == new Uri(y).Host;
}
public int GetHashCode(string obj)
{
return new Uri(obj).Host.GetHashCode();
}
}
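You can also implement an extension method DistinctBy (on .NET 6 and later, Enumerable.DistinctBy is built in):
public static partial class MyExtensions
{
    public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        // The set is created when enumeration starts, so enumerating the result
        // a second time starts from an empty set instead of filtering everything out.
        var knownKeys = new HashSet<TKey>();
        foreach (var item in source)
        {
            if (knownKeys.Add(keySelector(item)))
                yield return item;
        }
    }
}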
var result = urls.DistinctBy(url => new Uri(url).Host);
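Plain LINQ can do the same with GroupBy plus First, because GroupBy yields groups in the order their keys first appear and keeps each group's elements in source order. Run against the sample list from the question, this keeps exactly the first URL of each host (the FirstPerHost name is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UrlDemo
{
    // Keeps the first URL seen for each host, preserving the original order.
    // StringComparer.OrdinalIgnoreCase makes the host comparison case-insensitive.
    public static List<string> FirstPerHost(IEnumerable<string> urls) =>
        urls.GroupBy(url => new Uri(url).Host, StringComparer.OrdinalIgnoreCase)
            .Select(g => g.First())
            .ToList();
}
```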
Try this:
IEnumerable<string> DeleteDuplicates(IEnumerable<string> source)
{
var hosts = new HashSet<string>();
foreach (var s in source)
{
var host = new Uri(s).Host.ToLower();
if (hosts.Contains(host))
continue;
hosts.Add(host);
yield return s;
}
}
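Hi, you can implement this function to remove the duplicate rows. It assumes the table has a Host column and an integer ID column, and it keeps the row with the lowest ID for each host: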
public DataTable FilterURLS(DataTable urllist)
{
return
(from urlrow in urllist.Rows.OfType<DataRow>()
group urlrow by urlrow.Field<string>("Host") into g
select g
.OrderBy(r => r.Field<int>("ID"))
.First()).CopyToDataTable();
}
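Since the question's data lives in a DataTable, here is a single-pass sketch that removes rows in place: a HashSet of hosts replaces the nested loop (roughly linear instead of quadratic time), and the rows to delete are collected first so the table is not modified while it is being read. It assumes, as in the question, that the URL is in column 0; the RemoveRepeatedHosts name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DataTableDemo
{
    // Removes every row whose URL host (column 0) has already been seen,
    // keeping the first row for each host.
    public static void RemoveRepeatedHosts(DataTable urlList)
    {
        var seenHosts = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var rowsToRemove = new List<DataRow>();
        foreach (DataRow row in urlList.Rows)
        {
            string host = new Uri((string)row[0]).Host;
            if (!seenHosts.Add(host)) // Add returns false when the host is already present
                rowsToRemove.Add(row);
        }
        foreach (DataRow row in rowsToRemove)
            urlList.Rows.Remove(row);
    }
}
```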

Creating multiple arrays using a foreach loop and a select statement

I have a database table whose entire contents I select; it has 18,000+ items. I have a method that calls a web service which accepts an array of up to ten elements. Right now I call it item by item instead of with an array. I want to create an array of ten and then call the function. I could make an array of ten and call the function, but what if I have an extra three records?
public static void Main()
{
inventoryBLL inv = new inventoryBLL();
DataSet1.sDataTable dtsku = inv.SelectEverything();
foreach (DataSet1.Row row in dtsku)
{
webservicefunction(row.item);
}
}
My question is: how would I transform this?
A generic solution to your problem could look like this:
static class LinqHelper
{
    public static IEnumerable<T[]> SplitIntoGroups<T>(this IEnumerable<T> items, int N)
    {
        if (items == null || N < 1)
            yield break;
        T[] group = new T[N];
        int size = 0;
        // foreach disposes the underlying enumerator, unlike a manual GetEnumerator() loop
        foreach (var item in items)
        {
            group[size++] = item;
            if (size == N)
            {
                yield return group;
                size = 0;
                group = new T[N];
            }
        }
        if (size > 0)
            yield return group.Take(size).ToArray();
    }
}
So your Main function becomes:
public static void Main()
{
inventoryBLL inv = new inventoryBLL();
DataSet1.sDataTable dtsku = inv.SelectEverything();
foreach (var items in dtsku.Select(r => r.item).SplitIntoGroups(10))
{
webservicefunction(items);
}
}
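var taken = 0;
var takecount = 10;
var total = list.Count(); // count once instead of on every iteration
while (taken < total)     // '<' avoids a final empty batch when total is a multiple of takecount
{
    callWebService(list.Skip(taken).Take(takecount));
    taken += takecount;
}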
Generic Extension Method version:
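public static void AtATime<T>(this IEnumerable<T> list, int eachTime, Action<IEnumerable<T>> action)
{
    var taken = 0;
    var total = list.Count(); // count once instead of on every iteration
    while (taken < total)     // '<' avoids a final empty batch when total is a multiple of eachTime
    {
        action(list.Skip(taken).Take(eachTime));
        taken += eachTime;
    }
}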
Usage:
inv.SelectEverything().AtATime<Row>(10, webservicefunction);
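To make the leftover case from the question concrete: with n items and batches of size k you need ceil(n / k) calls, which in integer arithmetic is (n + k - 1) / k, and only the last call is short. For 18,003 items and k = 10 that is 1,801 calls, the last one with 3 items. A sketch over an IList<T> that slices by index instead of calling Count() and Skip() on every iteration (BatchCount and ForEachBatch are hypothetical names):

```csharp
using System;
using System.Collections.Generic;

public static class BatchDemo
{
    // Number of batches needed for `count` items, `size` at a time (ceiling division).
    public static int BatchCount(int count, int size) => (count + size - 1) / size;

    // Calls `action` once per batch; every batch has `size` items except possibly the last.
    public static void ForEachBatch<T>(IList<T> items, int size, Action<T[]> action)
    {
        if (size < 1) throw new ArgumentOutOfRangeException(nameof(size));
        for (int start = 0; start < items.Count; start += size)
        {
            int length = Math.Min(size, items.Count - start);
            var batch = new T[length];
            for (int i = 0; i < length; i++)
                batch[i] = items[start + i];
            action(batch);
        }
    }
}
```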
