I'm trying to loop through a string to find each character, its ASCII value, and the number of times the character occurs. So far, I have found each unique character and ASCII value using foreach statements: if the value is already in the list I don't add it, otherwise I add it. However, I'm struggling with the count portion. I was thinking the logic would be "if I am already in the list, don't add me again, but increment my frequency".
I've tried a few different things, such as finding the index of the character and adding to that specific index, but I'm lost.
string String = "hello my name is lauren";
char[] String1 = String.ToCharArray();
// int [] frequency = new int[String1.Length]; //array of frequency counter
int length = 0;
List<char> letters = new List<char>();
List<int> ascii = new List<int>();
List<int> frequency = new List<int>();
foreach (int ASCII in String1)
{
bool exists = ascii.Contains(ASCII);
if (exists)
{
//add to frequency at same index
//ascii.Insert(1, ascii);
//get { ASCII[index]; }
}
else
{
ascii.Add(ASCII);
//add to frequency at new index
}
}
foreach (char letter in String1)
{
bool exists = letters.Contains(letter);
if (exists)
{
//add to frequency at same index
}
else
{
letters.Add(letter);
//add to frequency at new index
}
}
length = letters.Count;
for (int j = 0; j<length; ++j)
{
Console.WriteLine($"{letters[j].ToString(),3} {"(" + ascii[j] + ")"}\t");
}
Console.ReadLine();
I'm not sure I understand your question, but what you are looking for may be Dictionary<TKey, TValue> instead of List<T>. Here are example solutions to the problems I think you are trying to solve.
Counting how often each character appears
Dictionary<int, int> frequency = new Dictionary<int, int>();
foreach (int j in String)
{
if (frequency.ContainsKey(j))
{
frequency[j] += 1;
}
else
{
frequency.Add(j, 1);
}
}
Linking characters to their ASCII codes
Dictionary<char, int> ASCIIofCharacters = new Dictionary<char, int>();
foreach (char i in String)
{
    // Only add a character the first time it is seen
    if (!ASCIIofCharacters.ContainsKey(i))
    {
        ASCIIofCharacters.Add(i, (int)i);
    }
}
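If you would rather keep the three parallel lists from the question, the "increment at the same index" step can be done with List<T>.IndexOf. This is only a minimal sketch; the CharStats class and its Count method are names I made up for illustration, not something from the question.

```csharp
using System;
using System.Collections.Generic;

public static class CharStats
{
    // Hypothetical helper: fills three parallel lists so that letters[i],
    // ascii[i] and frequency[i] all describe the same character.
    public static void Count(string text, List<char> letters, List<int> ascii, List<int> frequency)
    {
        foreach (char c in text)
        {
            int index = letters.IndexOf(c);   // -1 when the character is new
            if (index >= 0)
            {
                frequency[index]++;           // already listed: bump its counter
            }
            else
            {
                letters.Add(c);               // new entry lands at the same index in all three lists
                ascii.Add((int)c);
                frequency.Add(1);
            }
        }
    }
}
```

Note that IndexOf is a linear scan, just like Contains, so this is fine for short strings but slower than a dictionary for long inputs.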
A simple LINQ approach is to do this:
string String = "hello my name is lauren";
var results =
String
.GroupBy(x => x)
.Select(x => new { character = x.Key, ascii = (int)x.Key, frequency = x.Count() })
.ToArray();
If I understood your question, you want to map each char in the provided string to the count of times it appears in the string, right?
If that is the case, there are tons of ways to do that, and you also need to choose in which data structure you want to store the result.
Assuming you want to use LINQ and store the result in a Dictionary<char, int>, you could do something like this:
static IDictionary<char, int> getAsciiAndFrequencies(string str) {
return (
from c in str
group c by Convert.ToChar(c)
).ToDictionary(c => c.Key, c => c.Count());
}
And use it like this:
var f = getAsciiAndFrequencies("hello my name is lauren");
// result: { h: 1, e: 3, l: 3, o: 1, ... }
You are creating a histogram. But you should not use List.Contains, as it becomes inefficient as the list grows: it has to go through the list one item after another. Better to use a Dictionary, which is based on hashing and goes directly to the item. The code may look like this:
string str = "hello my name is lauren";
var dict = new Dictionary<char, int>();
foreach (char c in str)
{
dict.TryGetValue(c, out int count);
dict[c] = ++count;
}
foreach (var pair in dict.OrderBy(r => r.Key))
{
Console.WriteLine(pair.Value + "x " + pair.Key + " (" + (int)pair.Key + ")");
}
which gives
4x (32)
2x a (97)
3x e (101)
1x h (104)
1x i (105)
3x l (108)
2x m (109)
2x n (110)
1x o (111)
1x r (114)
1x s (115)
1x u (117)
1x y (121)
I have a ConcurrentDictionary with 500,000 items.
Keys are integers, values are Single (float).
For instance:
1, 8.65
2, 7.65
3, 8.89
4, 8.90
5, 7.95
...
500000, 7.68
How can I retrieve the min and max values within a specified key range of this dictionary, along with their respective keys?
Example: finding min/max data value between key=25 and key=477 and returning their keys.
I found some LINQ examples but the author warned it's potentially slower than foreach, and not doing exactly what I would like.
https://social.msdn.microsoft.com/Forums/vstudio/en-US/774aa579-2bc9-4458-93f4-af4b94169e7c/get-min-and-max-values-in-dictionary?forum=csharpgeneral
Performance is critical in my application.
Update 1:
I want to know the keys corresponding to the max/min.
The dictionary contains a time series. The values (Single) are ordered in time by their key: the higher the key, the more recent the data.
Update 2: benchmarks
I made a few benchmarks filling a concurrent dictionary with 929,452 records.
My CPU is i7-8550U, that means it has boost on single thread (3.8GHz) and lowers its frequency when the 4 cores (8 threads) run, roughly 2.6 GHz. So, I never expect multithread to be 4 times faster than single thread.
For each item of the dictionary, I look backward for the maximum of the previous 800 records.
Release build mode, x64:
Single thread, for loop: 14149 ms
Multithread, parallelfor loop: 4731 ms
Single thread, linq ONLY 1000 records: 17609 ms. Sorry LINQ.
LINQ is out. I will definitely go for the "for loop". Now I'd like to compare ConcurrentDictionary and List with the for loop.
Update 3: simplification and benchmarks
Modifying my code using other containers. All are thread-safe for reading (if no modification by other thread at the same time).
Concurrent dictionary 1-thread of my objects (datetime, 2D-single): 14682 ms
List of my objects (datetime, 2D-single): 2071 ms
Concurrent dictionary 4-threads: 4611 ms
Array of objects (datetime, 2D-single): 1030 ms
Array of 1D-single (x4) and array 1D-datetime: 784 ms
Array of 1D-single (x4) and array 1D-datetime 4 threads: 229 ms.
In order to keep my input objects read-only and as fast as possible, I will have to write the processing results in another object. It's another theme now.
I'm not sure a dictionary knows how to optimize based on any relationship the keys may have.
As such, I think you're going to have to do the optimizing yourself. With one pass through the dictionary, you should be able to do it:
float max = float.MinValue;
float min = float.MaxValue;
foreach (var k in dictionary.Keys) {
    // minIndex and maxIndex are the inclusive bounds of the key range
    if (k < minIndex || k > maxIndex) continue;
    max = Math.Max(max, dictionary[k]);
    min = Math.Min(min, dictionary[k]);
}
Now if your dictionary is sorted ahead of time, meaning key '50' will always come before key '60', you can start as late as possible and abort as soon as possible.
In fact, you should look at SortedDictionary.
Since you updated your description:
Use a SortedList, where the key is the index number and the value is your data value.
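Because the keys in the question are consecutive integers, one option is to loop over the key range itself rather than over the whole dictionary, and to track the key alongside each extreme. The following is a sketch under that assumption. RangeStats.MinMaxInRange is a made-up name, and skipping missing keys is my own choice, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;

public static class RangeStats
{
    // Hypothetical helper: scans keys fromKey..toKey (inclusive) once and
    // returns the smallest and largest value together with their keys.
    // Accepts Dictionary or ConcurrentDictionary (both implement IDictionary).
    public static (int MinKey, float Min, int MaxKey, float Max) MinMaxInRange(
        IDictionary<int, float> data, int fromKey, int toKey)
    {
        bool found = false;
        int minKey = 0, maxKey = 0;
        float min = 0f, max = 0f;

        for (int key = fromKey; key <= toKey; key++)
        {
            if (!data.TryGetValue(key, out float value))
                continue;                      // assumption: gaps in the key range are ignored

            if (!found || value < min) { min = value; minKey = key; }
            if (!found || value > max) { max = value; maxKey = key; }
            found = true;
        }

        if (!found)
            throw new InvalidOperationException("No keys found in the requested range.");

        return (minKey, min, maxKey, max);
    }
}
```

The cost depends on the width of the range (for example 25..477), not on the 500,000 entries in the dictionary. On ties, the earliest key wins because the comparisons are strict.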
The Where will return all elements with keys in your range, and then the Max() and Min() methods will return the corresponding max and min values in the range.
var data = new Dictionary<int, double>();
for (int i = 1; i <= 10; i++)
{
data.Add(i, i * 1.1);
}
var minKey = 3;
var maxKey = 7;
var max = data.Where(x => x.Key >= minKey && x.Key <= maxKey).Max(y => y.Value);
var min = data.Where(x => x.Key >= minKey && x.Key <= maxKey).Min(y => y.Value);
Edit: Extension Method
If you're going to be using this a lot, you could turn it into an extension method so you can call it easily on any dictionary of type Dictionary<int, double>.
public static class Extensions
{
public static double GetMaxInRange(this Dictionary<int, double> data, int minKey, int maxKey)
{
return data.Where(x => x.Key >= minKey && x.Key <= maxKey).Max(y => y.Value);
}
public static double GetMinInRange(this Dictionary<int, double> data, int minKey, int maxKey)
{
return data.Where(x => x.Key >= minKey && x.Key <= maxKey).Min(y => y.Value);
}
}
Call it like this:
var max = data.GetMaxInRange(3, 7);
var min = data.GetMinInRange(3, 7);
Edit2:
If you want the KeyValuePair<int, double>, then this would be an option.
public static class Extensions
{
public static KeyValuePair<int, double> GetMaxInRange(this Dictionary<int, double> data, int minKey, int maxKey)
{
return data.Where(x => x.Key >= minKey && x.Key <= maxKey).OrderByDescending(y => y.Value).FirstOrDefault();
}
public static KeyValuePair<int, double> GetMinInRange(this Dictionary<int, double> data, int minKey, int maxKey)
{
return data.Where(x => x.Key >= minKey && x.Key <= maxKey).OrderBy(y => y.Value).FirstOrDefault();
}
}
Following is a LinqPad5 example, but don't you want something like this?
var inst = new Dictionary<int, double>();
inst.Add(1, 82.65);
inst.Add(2, 8.65);
inst.Add(3, 8.89);
inst.Add(4, 84.90);
inst.Add(5, 7.95);
var min = inst.Where(x => x.Value > 8).Min(x => x.Value);
Console.WriteLine(min);
var max = inst.Where(x => x.Value < 80).Max(x => x.Value);
Console.WriteLine(max);
Or if you're looking for the key you could do something like this:
var min = inst.Where(x => x.Value > 8).OrderBy(x => x.Value).First();
Console.WriteLine(min.Key);
var max = inst.Where(x => x.Value < 80).OrderByDescending(x => x.Value).First();
Console.WriteLine(max.Key);
However... there is a catch with the latter. How can you be sure that the first key is the one you need? (But that's not my issue, just a side question.)
If you only need the max and min, you can use:
Dim myResult = Aggregate order In myDict Into Max(order.Value), Min(order.Value)
'myResult.max for max and myResult.min as min
If you want the full entry (key and value) for the min and max, maybe you can try this:
Dim myMinResult = From dic In myDic Where dic.Value = (Aggregate dicAgg In myDic Into Min(dicAgg.Value))
Dim myMaxResult = From dic In myDic Where dic.Value = (Aggregate dicAgg In myDic Into Max(dicAgg.Value))
MessageBox.Show("Min = key : " & myMinResult(0).Key.ToString & ", Value : " & myMinResult(0).Value.ToString)
MessageBox.Show("Max = key : " & myMaxResult(0).Key.ToString & ", Value : " & myMaxResult(0).Value.ToString)
I think this extension method for Dictionary can help you
static class DctExt {
public static void GetKeysByValueInRange(this Dictionary<int,float> baseDct, int start, int end, out List<int> byMinValue, out List<int> byMaxValue) {
byMinValue = new List<int>();
byMaxValue = new List<int>();
float max = GetMaxValue(baseDct, start, end);
float min = GetMinValue(baseDct, start, end);
foreach (KeyValuePair<int, float> kvp in baseDct) {
if(kvp.Value == min) {
byMinValue.Add(kvp.Key);
}
else if(kvp.Value == max) {
byMaxValue.Add(kvp.Key);
}
}
}
private static float GetMaxValue(Dictionary<int,float> baseDct, int start, int end) {
List<float> valuesOnRange = GetSpecificRange(baseDct, start, end);
return valuesOnRange.Max();
}
private static float GetMinValue(Dictionary<int,float> baseDct, int start, int end) {
List<float> valuesOnRange = GetSpecificRange(baseDct, start, end);
return valuesOnRange.Min();
}
private static List<float> GetSpecificRange(Dictionary<int,float> dct, int start, int end) {
List<float> res = new List<float>();
for (int i = start; i < end; i++) {
res.Add(dct.ElementAt(i).Value);
}
return res;
}
}
Here is the usage below
private static void Main() {
Dictionary<int, float> dct = new Dictionary<int, float> {
{1, 8.65f},
{2, 7.65f},
{3, 7.65f},
{4, 8.90f},
{5, 7.95f}
};
List<int> keysByMax = new List<int>();
List<int> keysByMin = new List<int>();
dct.GetKeysByValueInRange(1, 4, out keysByMin, out keysByMax);
foreach (var item in keysByMin) {
Console.Write($"min {item} ");
// prints min 2 min 3
}
Console.WriteLine();
foreach (var item in keysByMax) {
Console.Write($"max {item} ");
//prints max 4
}
Console.ReadLine();
}
Here is a class that encapsulates a List<T> and a ReaderWriterLock. It is thread-safe to use, and will perform much better than a ConcurrentDictionary for ranged queries. It will perform even better if single-element operations are avoided, so that the ReaderWriterLock is not acquired multiple times during a search or bulk update. For example, instead of:
for (int i = 25; i < 477; i++)
{
if (list[i] > maxValue)
{
maxValue = list[i];
maxIndex = i;
}
}
...it is preferable to do it like this:
foreach (var entry in list.GetRange(25, 477))
{
if (entry.Value > maxValue)
{
maxValue = entry.Value;
maxIndex = entry.Index;
}
}
...because the method GetRange acquires and releases the lock only once. Not only is this faster, but the results will also be more consistent, because it is guaranteed that no updates will happen during the enumeration of the range.
public class ConcurrentList<T> : IEnumerable<T>
{
private readonly List<T> _list;
private readonly ReaderWriterLock _lock = new ReaderWriterLock();
public ConcurrentList()
{
_list = new List<T>();
}
public ConcurrentList(IEnumerable<T> collection)
{
_list = new List<T>(collection);
}
public int Count => ReadSafe(list => list.Count);
public T this[int index]
{
get => ReadSafe(list => list[index]);
set => WriteSafe(list => list[index] = value);
}
public IEnumerable<(int Index, T Value)> GetRange(int from, int to)
{
using (new DisposableReader(_lock))
{
for (int i = from; i < to; i++)
{
yield return (i, _list[i]);
}
}
}
public void Add(T item) => WriteSafe(list => list.Add(item));
public void AddRange(IEnumerable<T> r) => WriteSafe(list => list.AddRange(r));
public void Clear() => WriteSafe(list => list.Clear());
public void UpdateRange(IEnumerable<(int Index, T Value)> changes)
{
WriteSafe(list =>
{
foreach (var change in changes)
{
list[change.Index] = change.Value;
}
});
}
public IEnumerator<T> GetEnumerator()
{
using (new DisposableReader(_lock))
{
foreach (var item in _list)
{
yield return item;
}
}
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
public TResult ReadSafe<TResult>(Func<List<T>, TResult> function)
{
_lock.AcquireReaderLock(Timeout.Infinite);
try
{
return function(_list);
}
finally
{
_lock.ReleaseReaderLock();
}
}
public void WriteSafe(Action<List<T>> action)
{
_lock.AcquireWriterLock(Timeout.Infinite);
try
{
action(_list);
}
finally
{
_lock.ReleaseWriterLock();
}
}
private struct DisposableReader : IDisposable
{
private readonly ReaderWriterLock _lock;
public DisposableReader(ReaderWriterLock obj)
{
_lock = obj;
_lock.AcquireReaderLock(Timeout.Infinite);
}
public void Dispose() => _lock.ReleaseReaderLock();
}
}
I have used helper methods for acquiring and releasing the lock, to avoid repeating the try - finally block in every property and method. Of course this is not necessary, it is just a matter of style.
Say I have the following array of strings as an input:
foo-139875913
foo-aeuefhaiu
foo-95hw9ghes
barbazabejgoiagjaegioea
barbaz8gs98ghsgh9es8h
9a8efa098fea0
barbaza98fyae9fghaefag
bazfa90eufa0e9u
bazgeajga8ugae89u
bazguea9guae
aifeaufhiuafhe
There are 3 different prefixes used here, "foo-", "barbaz" and "baz" - however these prefixes are not known ahead of time (they could be something completely different).
How could you establish what the different common prefixes are so that they could then be grouped by? This is made a bit tricky since in the data I've provided there's two that start with "bazg" and one that starts "bazf" where of course "baz" is the prefix.
What I've tried so far is sorting them into alphabetical order, and then looping through them in order and counting how many characters in a row are identical to the previous. If the number is different or when 0 characters are identical, it starts a new group. The problem with this is it falls over at the "bazg" and "bazf" problem I mentioned earlier and separates those into two different groups (one with just one element in it)
Edit: Alright, let's throw a few more rules in:
Longer potential groups should generally be preferred over shorter ones, unless there is a closely matching group of less than X characters difference in length. (So where X is 2, baz would be preferred over bazg)
A group must have at least Y elements in it or not be a group at all
It's okay to simply throw away elements that don't match any of the 'groups' to within the rules above.
To clarify the first rule in relation to the second, if X was 0 and Y was 2, then the two 'bazg' entries would be in a group, and the 'bazf' would be thrown away because it's on its own.
Well, here's a quick hack, probably O(something_bad):
IEnumerable<Tuple<String, IEnumerable<string>>> GuessGroups(IEnumerable<string> source, int minNameLength=0, int minGroupSize=1)
{
// TODO: error checking
return InnerGuessGroups(new Stack<string>(source.OrderByDescending(x => x)), minNameLength, minGroupSize);
}
IEnumerable<Tuple<String, IEnumerable<string>>> InnerGuessGroups(Stack<string> source, int minNameLength, int minGroupSize)
{
if(source.Any())
{
var tuple = ExtractTuple(GetBestGroup(source, minNameLength), source);
if (tuple.Item2.Count() >= minGroupSize)
yield return tuple;
foreach (var element in GuessGroups(source, minNameLength, minGroupSize))
yield return element;
}
}
Tuple<String, IEnumerable<string>> ExtractTuple(string prefix, Stack<string> source)
{
return Tuple.Create(prefix, PopWithPrefix(prefix, source).ToList().AsEnumerable());
}
IEnumerable<string> PopWithPrefix(string prefix, Stack<string> source)
{
while (source.Any() && source.Peek().StartsWith(prefix))
yield return source.Pop();
}
string GetBestGroup(IEnumerable<string> source, int minNameLength)
{
var s = new Stack<string>(source);
var counter = new DictionaryWithDefault<string, int>(0);
while(s.Any())
{
var g = GetCommonPrefix(s);
if(!string.IsNullOrEmpty(g) && g.Length >= minNameLength)
counter[g]++;
s.Pop();
}
return counter.OrderBy(c => c.Value).Last().Key;
}
string GetCommonPrefix(IEnumerable<string> coll)
{
return (from len in Enumerable.Range(0, coll.Min(s => s.Length)).Reverse()
let possibleMatch = coll.First().Substring(0, len)
where coll.All(f => f.StartsWith(possibleMatch))
select possibleMatch).FirstOrDefault();
}
public class DictionaryWithDefault<TKey, TValue> : Dictionary<TKey, TValue>
{
TValue _default;
public TValue DefaultValue {
get { return _default; }
set { _default = value; }
}
public DictionaryWithDefault() : base() { }
public DictionaryWithDefault(TValue defaultValue) : base() {
_default = defaultValue;
}
public new TValue this[TKey key]
{
get { return base.ContainsKey(key) ? base[key] : _default; }
set { base[key] = value; }
}
}
Example usage:
string[] input = {
"foo-139875913",
"foo-aeuefhaiu",
"foo-95hw9ghes",
"barbazabejgoiagjaegioea",
"barbaz8gs98ghsgh9es8h",
"barbaza98fyae9fghaefag",
"bazfa90eufa0e9u",
"bazgeajga8ugae89u",
"bazguea9guae",
"9a8efa098fea0",
"aifeaufhiuafhe"
};
GuessGroups(input, 3, 2).Dump();
Ok, well as discussed, the problem wasn't initially well defined, but here is how I'd go about it.
Create a tree T
Parse the list, for each element:
for each letter in that element
if a branch labeled with that letter exists then
Increment the counter on that branch
Descend that branch
else
Create a branch labelled with that letter
Set its counter to 1
Descend that branch
This gives you a tree where each of the leaves represents a word in your input. Each of the non-leaf nodes has a counter representing how many leaves are (eventually) attached to that node. Now you need a formula to weight the length of the prefix (the depth of the node) against the size of the prefix group. For now:
S = (a * d) + (b * q) // d = depth, q = quantity, a, b coefficients you'll tweak to get desired behaviour
So now you can iterate over each of the non-leaf node and assign them a score S. Then, to work out your groups you would
For each non-leaf node
Assign score S
Insertion sort the node in to a list, so the head is the highest scoring node
Starting at the root of the tree, traverse the nodes
If the node is the highest scoring node in the list
Mark it as a prefix
Remove all nodes from the list that are a descendant of it
Pop itself off the front of the list
Return up the tree
This should give you a list of prefixes. The last part feels like some clever data structures or algorithms could speed it up (removing all the children feels particularly weak, but if your input size is small, I guess speed isn't too important).
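The tree-building and scoring steps above can be sketched as a counted trie. This is only an illustration of the first half (build and score). The type and member names are mine, the coefficients a and b are placeholders to tune, and the final selection pass that removes descendants of a chosen prefix is left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node type: one branch per character, with a counter of
// how many input strings passed through that branch.
public sealed class PrefixNode
{
    public int Count;
    public readonly Dictionary<char, PrefixNode> Children = new Dictionary<char, PrefixNode>();
}

public static class PrefixTree
{
    public static PrefixNode Build(IEnumerable<string> words)
    {
        var root = new PrefixNode();
        foreach (string word in words)
        {
            PrefixNode node = root;
            foreach (char c in word)
            {
                if (!node.Children.TryGetValue(c, out PrefixNode child))
                {
                    child = new PrefixNode();   // create the branch (counter starts at 0)
                    node.Children[c] = child;
                }
                child.Count++;                  // one more word goes through this branch
                node = child;                   // descend
            }
        }
        return root;
    }

    // Collects (prefix, score) for every node shared by at least two words,
    // using S = a * depth + b * quantity.
    public static List<(string Prefix, double Score)> Score(PrefixNode root, double a, double b)
    {
        var result = new List<(string Prefix, double Score)>();
        Walk(root, "", a, b, result);
        return result;
    }

    private static void Walk(PrefixNode node, string prefix, double a, double b,
                             List<(string Prefix, double Score)> result)
    {
        foreach (KeyValuePair<char, PrefixNode> pair in node.Children)
        {
            // Assumption: a branch used by a single word is not a candidate group,
            // and its descendants cannot be either, so the walk stops there.
            if (pair.Value.Count < 2) continue;

            string next = prefix + pair.Key;
            result.Add((next, a * next.Length + b * pair.Value.Count));
            Walk(pair.Value, next, a, b, result);
        }
    }
}
```

Sorting the returned list by Score in descending order gives the candidate list that the selection pass would then work through.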
I'm wondering if your requirements aren't off. It seems as if you are looking for a specific group size as opposed to specific key size requirements. Below is a program that, based on a specified group size, breaks the strings up into the largest possible groups, up to and including the specified group size. So if you specify a group size of 5, it will group items on the smallest key possible to make a group of size 5. In your example it would group foo- as f, since there is no need for a more complex key as an identifier.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication2
{
class Program
{
/// <remarks><c>true</c> in returned dictionary key are groups over <paramref name="maxGroupSize"/></remarks>
public static Dictionary<bool,Dictionary<string, List<string>>> Split(int maxGroupSize, int keySize, IEnumerable<string> items)
{
var smallItems = from item in items
where item.Length < keySize
select item;
var largeItems = from item in items
where keySize < item.Length
select item;
var largeItemsq = (from item in largeItems
let key = item.Substring(0, keySize)
group item by key into x
select new { Key = x.Key, Items = x.ToList() } into aGrouping
group aGrouping by aGrouping.Items.Count() > maxGroupSize into x2
select x2).ToDictionary(a => a.Key, a => a.ToDictionary(a_ => a_.Key, a_ => a_.Items));
if (smallItems.Any())
{
var smallestLength = items.Aggregate(int.MaxValue, (acc, item) => Math.Min(acc, item.Length));
var smallItemsq = (from item in smallItems
let key = item.Substring(0, smallestLength)
group item by key into x
select new { Key = x.Key, Items = x.ToList() } into aGrouping
group aGrouping by aGrouping.Items.Count() > maxGroupSize into x2
select x2).ToDictionary(a => a.Key, a => a.ToDictionary(a_ => a_.Key, a_ => a_.Items));
return Combine(smallItemsq, largeItemsq);
}
return largeItemsq;
}
static Dictionary<bool, Dictionary<string,List<string>>> Combine(Dictionary<bool, Dictionary<string,List<string>>> a, Dictionary<bool, Dictionary<string,List<string>>> b) {
var x = new Dictionary<bool,Dictionary<string,List<string>>> {
{ true, null },
{ false, null }
};
foreach(var condition in new bool[] { true, false }) {
var hasA = a.ContainsKey(condition);
var hasB = b.ContainsKey(condition);
x[condition] = hasA && hasB ? a[condition].Concat(b[condition]).ToDictionary(c => c.Key, c => c.Value)
: hasA ? a[condition]
: hasB ? b[condition]
: new Dictionary<string, List<string>>();
}
return x;
}
public static Dictionary<string, List<string>> Group(int maxGroupSize, IEnumerable<string> items, int keySize)
{
var toReturn = new Dictionary<string, List<string>>();
var both = Split(maxGroupSize, keySize, items);
if (both.ContainsKey(false))
foreach (var key in both[false].Keys)
toReturn.Add(key, both[false][key]);
if (both.ContainsKey(true))
{
var keySize_ = keySize + 1;
var xs = from needsFix in both[true]
select needsFix;
foreach (var x in xs)
{
var fixedGroup = Group(maxGroupSize, x.Value, keySize_);
toReturn = toReturn.Concat(fixedGroup).ToDictionary(a => a.Key, a => a.Value);
}
}
return toReturn;
}
static Random rand = new Random(unchecked((int)DateTime.Now.Ticks));
const string allowedChars = "aaabbbbccccc"; // "aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ";
static readonly int maxAllowed = allowedChars.Length - 1;
static IEnumerable<string> GenerateText()
{
var list = new List<string>();
for (int i = 0; i < 100; i++)
{
var stringLength = rand.Next(3,25);
var chars = new List<char>(stringLength);
for (int j = stringLength; j > 0; j--)
chars.Add(allowedChars[rand.Next(0, maxAllowed)]);
var newString = chars.Aggregate(new StringBuilder(), (acc, item) => acc.Append(item)).ToString();
list.Add(newString);
}
return list;
}
static void Main(string[] args)
{
// runs 1000 times over autogenerated groups of sample text.
for (int i = 0; i < 1000; i++)
{
var s = GenerateText();
Go(s);
}
Console.WriteLine();
Console.WriteLine("DONE");
Console.ReadLine();
}
static void Go(IEnumerable<string> items)
{
var dict = Group(3, items, 1);
foreach (var key in dict.Keys)
{
Console.WriteLine(key);
foreach (var item in dict[key])
Console.WriteLine("\t{0}", item);
}
}
}
}
Anyone know of a way to add a value to a range of generic lists in c#?
I'm currently building up a large List<List<int>> and the whole process is taking too long and I'm trying to avoid using foreach loops and nested foreach loops in order to shave some time off.
Let's say I had 600 rows in a generic list. For each of the first 200 rows, I'd like to add a "1". For the next 200, I'd like to add a "2". For the next 200, I'd like to add a "3".
The way I'm doing that now, I have to loop through it 600 times and add each one individually, whereas what I'd like to do is loop through it 3 times and add the entries in bulk.
The code I was hoping for would be something like:
List<List<int>> idList = GetFullList(); //list contains 600 rows
int[] newItems = {1, 3, 5};
int count = 0;
int amountToAmend = 200;
foreach (int i in newItems)
{
//List<int> newID = new List<int>();
//newID.Add(i);
(idList.GetRange(count, amountToAmend)).Add(i);
count += amountToAmend;
}
Obviously this doesn't work, but hopefully you can see the kind of thing I'm going for. In my application I'm currently needing to do tens of thousands of unnecessary loops, when often less than 10 could feasibly do the job if the code exists!
UPDATE: I'm not sure I've explained this well, so just to clarify, here are the results I'm looking for here
If I have a list with 6 rows like so:
[6,7,8]
[5,6,7]
[6,4,8]
[2,4,7]
[5,1,7]
[9,3,5]
I know that I'd like to add a 1 to the first 3 rows and a 2 to the next 3 rows, so they would become:
[6,7,8,1]
[5,6,7,1]
[6,4,8,1]
[2,4,7,2]
[5,1,7,2]
[9,3,5,2]
This is easy to do with foreach loops and is how I currently do it, but because of the sheer volume of data involved, I'm looking for ways to cut the time taken on specific functions. I'm not sure if a way exists tbh, but if anyone knows, then it'll be the good people of Stack Overflow :)
You can use the Skip and Take methods from LINQ.
For example, idList.Skip(0).Take(200) gives you the first 200 items from your list, which you can then update.
To update, you could write:
int increment=2;
list.Select(intVal=> intVal+increment).ToList();
How about this:
foreach (int i in newItems)
{
foreach (var row in idList.Skip(count).Take(amountToAmend))
{
row.Add(i);
}
count += amountToAmend;
}
Or with a for-loop:
foreach (int i in newItems)
{
for (int j = 0; j < amountToAmend; j++)
{
idList[count + j].Add(i);
}
count += amountToAmend;
}
Do you want each item in newItems repeated amountToAmend times?
Like:
200 times 1
200 times 3
200 times 5
If so, you can try :
int amountToAmend = 200;
List<int> newItems = new List<int>(){ 1, 3, 5 };
List<List<int>> idList = new List<List<int>>();
newItems.ForEach(i => idList.Add(new List<int>(Enumerable.Repeat(i, amountToAmend))));
List<List<int>> idList = GetFullList(); //list contains 600 rows
var iterator = idList.Begin();
int[] newItems = {1, 3, 5};
int count = 0;
int amountToAmend = 200;
foreach (var item in newItems)
{
iterator = iterator.AddItem(item);
iterator = iterator.MoveForward(amountToAmend);
}
public struct NestedListIterator<T>
{
public NestedListIterator(List<List<T>> lists, int listIndex, int itemIndex)
{
this.lists = lists;
this.ListIndex = listIndex;
this.ItemIndex = itemIndex;
}
public readonly int ListIndex;
public readonly int ItemIndex;
public readonly List<List<T>> lists;
public NestedListIterator<T> AddItem(T item)
{
var list = lists.ElementAtOrDefault(ListIndex);
if (list == null || list.Count < ItemIndex)
return this;//or throw new Exception(...)
list.Insert(ItemIndex, item);
return new NestedListIterator<T>(this.lists, this.ListIndex, this.ItemIndex + 1);
}
public NestedListIterator<T> MoveForward(int index)
{
//if (index < 0) throw new Exception(..)
var listIndex = this.ListIndex;
var itemIndex = this.ItemIndex + index;
for (; ; )
{
// Use the local listIndex so the loop actually advances through the lists
var list = lists.ElementAtOrDefault(listIndex);
if (list == null)
return new NestedListIterator<T>(lists, listIndex, itemIndex);//or throw new Exception(...)
if (itemIndex <= list.Count)
return new NestedListIterator<T>(lists, listIndex, itemIndex);
itemIndex -= list.Count;
listIndex++;
}
}
public static int Compare(NestedListIterator<T> left, NestedListIterator<T> right)
{
var cmp = left.ListIndex.CompareTo(right.ListIndex);
if (cmp != 0)
return cmp;
return left.ItemIndex.CompareTo(right.ItemIndex);
}
public static bool operator <(NestedListIterator<T> left, NestedListIterator<T> right)
{
return Compare(left, right) < 0;
}
public static bool operator >(NestedListIterator<T> left, NestedListIterator<T> right)
{
return Compare(left, right) > 0;
}
}
public static class NestedListIteratorExtension
{
public static NestedListIterator<T> Begin<T>(this List<List<T>> lists)
{
return new NestedListIterator<T>(lists, 0, 0);
}
public static NestedListIterator<T> End<T>(this List<List<T>> lists)
{
return new NestedListIterator<T>(lists, lists.Count, 0);
}
}
There is no built-in function, and you cannot avoid looping (explicit or implicit) entirely, since you want to add a new element to every list.
You could combine List.GetRange with List.ForEach:
var newItems = new[] { 1, 2 };
int numPerGroup = (int)(idList.Count / newItems.Length);
for (int i = 0; i < newItems.Length; i++)
idList.GetRange(i * numPerGroup, numPerGroup)
.ForEach(l => l.Add(newItems[i]));
Note that the above is not LINQ and would work even in .NET 2.0.
This is my old approach which was not what you needed:
You can use Linq and Enumerable.GroupBy to redistribute a flat list into nested lists:
int amountToAmend = 200;
// create sample data with 600 integers
List<int> flattened = Enumerable.Range(1, 600).ToList();
// group these 600 numbers into 3 nested lists with each 200 integers
List<List<int>> unflattened = flattened
.Select((i, index) => new { i, index })
.GroupBy(x => x.index / amountToAmend)
.Select(g => g.Select(x => x.i).ToList())
.ToList();
Here's the demo: http://ideone.com/LlEe2
I have a list of numbers e.g. 21,4,7,9,12,22,17,8,2,20,23
I want to be able to pick out sequences of sequential numbers (minimum 3 items in length), so from the example above it would be 7,8,9 and 20,21,22,23.
I have played around with a few ugly sprawling functions but I am wondering if there is a neat LINQ-ish way to do it.
Any suggestions?
UPDATE:
Many thanks for all the responses, much appreciated. I am currently having a play with them all to see which would best integrate into our project.
It strikes me that the first thing you should do is order the list. Then it's just a matter of walking through it, remembering the length of your current sequence and detecting when it's ended. To be honest, I suspect that a simple foreach loop is going to be the simplest way of doing that - I can't immediately think of any wonderfully neat LINQ-like ways of doing it. You could certainly do it in an iterator block if you really wanted to, but bear in mind that ordering the list to start with means you've got a reasonably "up-front" cost anyway. So my solution would look something like this:
var ordered = list.OrderBy(x => x);
int count = 0;
int firstItem = 0; // Irrelevant to start with
foreach (int x in ordered)
{
// First value in the ordered list: start of a sequence
if (count == 0)
{
firstItem = x;
count = 1;
}
// Skip duplicate values
else if (x == firstItem + count - 1)
{
// No need to do anything
}
// New value contributes to sequence
else if (x == firstItem + count)
{
count++;
}
// End of one sequence, start of another
else
{
if (count >= 3)
{
Console.WriteLine("Found sequence of length {0} starting at {1}",
count, firstItem);
}
count = 1;
firstItem = x;
}
}
if (count >= 3)
{
Console.WriteLine("Found sequence of length {0} starting at {1}",
count, firstItem);
}
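For comparison, a commonly used LINQ idiom for this problem (a different technique from the loop above) is to sort the distinct values and group them by value minus position: consecutive numbers all share the same difference. A small sketch, where Runs.FindRuns is an illustrative name of my own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Runs
{
    // Hypothetical helper: returns every run of consecutive integers
    // that is at least minLength long, in ascending order.
    public static List<List<int>> FindRuns(IEnumerable<int> source, int minLength)
    {
        return source
            .Distinct()                                   // duplicates would break the value-minus-index trick
            .OrderBy(x => x)
            .Select((value, index) => new { value, key = value - index })
            .GroupBy(p => p.key, p => p.value)            // same key means same consecutive run
            .Where(g => g.Count() >= minLength)
            .Select(g => g.ToList())
            .ToList();
    }
}
```

For the list in the question (21,4,7,9,12,22,17,8,2,20,23) with minLength 3, this yields the two runs 7,8,9 and 20,21,22,23. It still pays the up-front sorting cost, and it is arguably less obvious to read than the plain foreach.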
EDIT: Okay, I've just thought of a rather more LINQ-ish way of doing things. I don't have the time to fully implement it now, but:
Order the sequence
Use something like SelectWithPrevious (probably better named SelectConsecutive) to get consecutive pairs of elements
Use the overload of Select which includes the index to get tuples of (index, current, previous)
Filter out any items where (current = previous + 1) to get anywhere that counts as the start of a sequence (special-case index=0)
Use SelectWithPrevious on the result to get the length of the sequence between two starting points (subtract one index from the previous)
Filter out any sequence with length less than 3
I suspect you need to concat int.MinValue on the ordered sequence, to guarantee the final item is used properly.
EDIT: Okay, I've implemented this. It's about the LINQiest way I can think of to do this... I used null values as "sentinel" values to force start and end sequences - see comments for more details.
Overall, I wouldn't recommend this solution. It's hard to get your head round, and although I'm reasonably confident it's correct, it took me a while thinking of possible off-by-one errors etc. It's an interesting voyage into what you can do with LINQ... and also what you probably shouldn't.
Oh, and note that I've pushed the "minimum length of 3" part up to the caller - when you have a sequence of tuples like this, it's cleaner to filter it out separately, IMO.
using System;
using System.Collections.Generic;
using System.Linq;
static class Extensions
{
public static IEnumerable<TResult> SelectConsecutive<TSource, TResult>
(this IEnumerable<TSource> source,
Func<TSource, TSource, TResult> selector)
{
using (IEnumerator<TSource> iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
TSource prev = iterator.Current;
while (iterator.MoveNext())
{
TSource current = iterator.Current;
yield return selector(prev, current);
prev = current;
}
}
}
}
class Test
{
static void Main()
{
var list = new List<int> { 21,4,7,9,12,22,17,8,2,20,23 };
foreach (var sequence in FindSequences(list).Where(x => x.Item1 >= 3))
{
Console.WriteLine("Found sequence of length {0} starting at {1}",
sequence.Item1, sequence.Item2);
}
}
private static readonly int?[] End = { null };
// Each tuple in the returned sequence is (length, first element)
public static IEnumerable<Tuple<int, int>> FindSequences
(IEnumerable<int> input)
{
// Use null values at the start and end of the ordered sequence
// so that the first pair always starts a new sequence starting
// with the lowest actual element, and the final pair always
// starts a new one starting with null. That "sequence at the end"
// is used to compute the length of the *real* final element.
return End.Concat(input.OrderBy(x => x)
.Select(x => (int?) x))
.Concat(End)
// Work out consecutive pairs of items
.SelectConsecutive((x, y) => Tuple.Create(x, y))
// Remove duplicates
.Where(z => z.Item1 != z.Item2)
// Keep the index so we can tell sequence length
.Select((z, index) => new { z, index })
// Find sequence starting points
.Where(both => both.z.Item2 != both.z.Item1 + 1)
.SelectConsecutive((start1, start2) =>
Tuple.Create(start2.index - start1.index,
start1.z.Item2.Value));
}
}
Jon Skeet's / Timwi's solutions are the way to go.
For fun, here's a LINQ query that does the job (very inefficiently):
var sequences = input.Distinct()
.GroupBy(num => Enumerable.Range(num, int.MaxValue - num + 1)
.TakeWhile(input.Contains)
.Last()) //use the last member of the consecutive sequence as the key
.Where(seq => seq.Count() >= 3)
.Select(seq => seq.OrderBy(num => num)); // not necessary unless ordering is desirable inside each sequence.
The query's performance can be improved slightly by loading the input into a HashSet (to improve Contains), but that will still not produce a solution that is anywhere close to efficient.
The only bug I am aware of is an arithmetic overflow when the sequence contains zero or negative numbers: the count parameter for Range, int.MaxValue - num + 1, then no longer fits in an int. This would be easy to fix with a custom static IEnumerable<int> To(this int start, int end) extension-method. If anyone can think of any other simple technique of dodging the overflow, please let me know.
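For illustration, here is a minimal sketch of what such a To extension might look like, combined with the HashSet suggestion. The names RangeExtensions and ConsecutiveRuns are made up for this example, and the query is still far from efficient; it only shows how the count parameter of Range can be avoided.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class RangeExtensions
{
    // Hypothetical helper: yields start, start + 1, ..., end (inclusive).
    // Works for negative starts because no count has to be computed.
    public static IEnumerable<int> To(this int start, int end)
    {
        for (int i = start; i <= end; i++)
        {
            yield return i;
            if (i == int.MaxValue)
            {
                yield break; // stop before i++ would wrap around
            }
        }
    }

    // Hypothetical wrapper around the query, using a HashSet for Contains.
    public static List<List<int>> ConsecutiveRuns(IEnumerable<int> input, int minLength)
    {
        var set = new HashSet<int>(input);
        return set
            // Key each number by the last member of the run it belongs to
            .GroupBy(num => num.To(int.MaxValue).TakeWhile(n => set.Contains(n)).Last())
            .Where(seq => seq.Count() >= minLength)
            .Select(seq => seq.OrderBy(n => n).ToList())
            .ToList();
    }
}
```

The order in which the runs come back depends on the enumeration order of the set, so sort the result if the order matters.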
EDIT:
Here's a slightly more verbose (but equally inefficient) variant without the overflow issue.
var sequences = input.GroupBy(num => input.Where(candidate => candidate >= num)
.OrderBy(candidate => candidate)
.TakeWhile((candidate, index) => candidate == num + index)
.Last())
.Where(seq => seq.Count() >= 3)
.Select(seq => seq.OrderBy(num => num));
I think my solution is more elegant and simple, and therefore easier to verify as correct:
/// <summary>Returns a collection containing all consecutive sequences of
/// integers in the input collection.</summary>
/// <param name="input">The collection of integers in which to find
/// consecutive sequences.</param>
/// <param name="minLength">Minimum length that a sequence should have
/// to be returned.</param>
static IEnumerable<IEnumerable<int>> ConsecutiveSequences(
IEnumerable<int> input, int minLength = 1)
{
var results = new List<List<int>>();
foreach (var i in input.OrderBy(x => x))
{
var existing = results.FirstOrDefault(lst => lst.Last() + 1 == i);
if (existing == null)
results.Add(new List<int> { i });
else
existing.Add(i);
}
return minLength <= 1 ? results :
results.Where(lst => lst.Count >= minLength);
}
Benefits over the other solutions:
It can find sequences that overlap.
It’s properly reusable and documented.
I have not found any bugs ;-)
Here is how to solve the problem in a "LINQish" way:
int[] arr = new int[]{ 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
IOrderedEnumerable<int> sorted = arr.OrderBy(x => x);
int cnt = sorted.Count();
int[] sortedArr = sorted.ToArray();
IEnumerable<int> selected = sortedArr.Where((x, idx) =>
idx <= cnt - 3 && sortedArr[idx + 1] == x + 1 && sortedArr[idx + 2] == x + 2);
IEnumerable<int> result = selected.SelectMany(x => new int[] { x, x + 1, x + 2 }).Distinct();
Console.WriteLine(string.Join(",", result.Select(x=>x.ToString()).ToArray()));
Due to the array copying and reconstruction, this solution - of course - is not as efficient as the traditional solution with loops.
Not 100% Linq but here's a generic variant:
static IEnumerable<IEnumerable<TItem>> GetSequences<TItem>(
int minSequenceLength,
Func<TItem, TItem, bool> areSequential,
IEnumerable<TItem> items)
where TItem : IComparable<TItem>
{
items = items
.OrderBy(n => n)
.Distinct().ToArray();
var lastSelected = default(TItem);
var sequences =
from startItem in items
where startItem.Equals(items.First())
|| startItem.CompareTo(lastSelected) > 0
let sequence =
from item in items
where item.Equals(startItem) || areSequential(lastSelected, item)
select (lastSelected = item)
where sequence.Count() >= minSequenceLength
select sequence;
return sequences;
}
static void UsageInt()
{
var sequences = GetSequences(
3,
(a, b) => a + 1 == b,
new[] { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 });
foreach (var sequence in sequences)
Console.WriteLine(string.Join(", ", sequence.ToArray()));
}
static void UsageChar()
{
var list = new List<char>(
"abcdefghijklmnopqrstuvwxyz".ToCharArray());
var sequences = GetSequences(
3,
(a, b) => (list.IndexOf(a) + 1 == list.IndexOf(b)),
"PleaseBeGentleWithMe".ToLower().ToCharArray());
foreach (var sequence in sequences)
Console.WriteLine(string.Join(", ", sequence.ToArray()));
}
Here's my shot at it:
public static class SequenceDetector
{
public static IEnumerable<IEnumerable<T>> DetectSequenceWhere<T>(this IEnumerable<T> sequence, Func<T, T, bool> inSequenceSelector)
{
List<T> subsequence = null;
// We can only have a sequence with 2 or more items
T last = sequence.FirstOrDefault();
foreach (var item in sequence.Skip(1))
{
if (inSequenceSelector(last, item))
{
// These form part of a sequence
if (subsequence == null)
{
subsequence = new List<T>();
subsequence.Add(last);
}
subsequence.Add(item);
}
else if (subsequence != null)
{
// We have a previous seq to return
yield return subsequence;
subsequence = null;
}
last = item;
}
if (subsequence != null)
{
// Return any trailing seq
yield return subsequence;
}
}
}
public class test
{
public static void run()
{
var list = new List<int> { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
foreach (var subsequence in list
.OrderBy(i => i)
.Distinct()
.DetectSequenceWhere((first, second) => first + 1 == second)
.Where(seq => seq.Count() >= 3))
{
Console.WriteLine("Found subsequence {0}",
string.Join(", ", subsequence.Select(i => i.ToString()).ToArray()));
}
}
}
This returns the specific items that form the sub-sequences and permits any type of item and any definition of criteria so long as it can be determined by comparing adjacent items.
What about sorting the array and then creating another array that holds the difference between each element and the previous one?
sortedArray = 8, 9, 10, 21, 22, 23, 24, 27, 30, 31, 32
diffArray = 1, 1, 11, 1, 1, 1, 3, 3, 1, 1
Now iterate through the difference array; if the difference equals 1, increase a counter variable, sequenceLength, by 1. If the difference is > 1, check sequenceLength: if it is >= 2, you have a sequence of at least 3 consecutive elements. Then reset sequenceLength to 0 and continue the loop over the difference array. Check sequenceLength once more after the loop, so that a sequence ending at the last element is not missed.
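A rough C# sketch of this difference-array idea follows. The names DiffScan and FindRuns are invented for the example, and the Distinct() call is an assumption on my part (the description above does not say how duplicates should be handled).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DiffScan
{
    // Returns (first element, length) for every run of consecutive values
    // that has at least minLength items.
    public static List<Tuple<int, int>> FindRuns(IEnumerable<int> input, int minLength)
    {
        // Assumption: duplicates are dropped before computing differences
        int[] sorted = input.Distinct().OrderBy(x => x).ToArray();
        var runs = new List<Tuple<int, int>>();
        if (sorted.Length == 0)
        {
            return runs;
        }

        // diffs[i] = sorted[i + 1] - sorted[i]
        int[] diffs = sorted.Zip(sorted.Skip(1), (a, b) => b - a).ToArray();

        int sequenceLength = 0; // number of consecutive differences equal to 1
        for (int i = 0; i < diffs.Length; i++)
        {
            if (diffs[i] == 1)
            {
                sequenceLength++;
                continue;
            }
            // A run of k differences of 1 covers k + 1 elements
            if (sequenceLength + 1 >= minLength)
            {
                runs.Add(Tuple.Create(sorted[i - sequenceLength], sequenceLength + 1));
            }
            sequenceLength = 0;
        }

        // Final check for a run that reaches the end of the array
        if (sequenceLength + 1 >= minLength)
        {
            runs.Add(Tuple.Create(sorted[sorted.Length - 1 - sequenceLength], sequenceLength + 1));
        }
        return runs;
    }
}
```

With the sortedArray shown above and a minimum length of 3, this reports runs starting at 8 (length 3), 21 (length 4) and 30 (length 3).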
Here is a solution I knocked up in F#, it should be fairly easy to translate this into a C# LINQ query since fold is pretty much equivalent to the LINQ aggregate operator.
#light
let nums = [21;4;7;9;12;22;17;8;2;20;23]
let scanFunc (mainSeqLength, mainCounter, lastNum:int, subSequenceCounter:int, subSequence:'a list, foundSequences:'a list list) (num:'a) =
(mainSeqLength, mainCounter + 1,
num,
(if num <> lastNum + 1 then 1 else subSequenceCounter+1),
(if num <> lastNum + 1 then [num] else subSequence@[num]),
if subSequenceCounter >= 3 then
if mainSeqLength = mainCounter+1
then foundSequences @ [subSequence@[num]]
elif num <> lastNum + 1
then foundSequences @ [subSequence]
else foundSequences
else foundSequences)
let subSequences = nums |> Seq.sort |> Seq.fold scanFunc (nums |> Seq.length, 0, 0, 0, [], []) |> fun (_,_,_,_,_,results) -> results
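As a hedged illustration of the fold-to-Aggregate correspondence, here is a C# sketch. It is not a line-by-line translation of the F# above: it uses a mutable list of runs as the accumulator instead of a tuple, drops duplicates first (my assumption), and the names FoldSketch and FindRuns are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class FoldSketch
{
    public static List<List<int>> FindRuns(IEnumerable<int> input, int minLength)
    {
        return input
            .Distinct()
            .OrderBy(x => x)
            // The accumulator holds all runs found so far;
            // the last entry is the run currently being extended.
            .Aggregate(new List<List<int>>(), (runs, num) =>
            {
                List<int> current = runs.Count > 0 ? runs[runs.Count - 1] : null;
                if (current != null && current[current.Count - 1] + 1 == num)
                {
                    current.Add(num); // num continues the current run
                }
                else
                {
                    runs.Add(new List<int> { num }); // num starts a new run
                }
                return runs;
            })
            .Where(run => run.Count >= minLength)
            .ToList();
    }
}
```

Calling FoldSketch.FindRuns(nums, 3) on the list from the question yields the runs 7, 8, 9 and 20, 21, 22, 23.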
Linq isn't the solution for everything; sometimes you're better off with a simple loop. Here's a solution with just a bit of Linq to order the original sequence and filter the results:
void Main()
{
var numbers = new[] { 21,4,7,9,12,22,17,8,2,20,23 };
var sequences =
GetSequences(numbers, (prev, curr) => curr == prev + 1)
.Where(s => s.Count() >= 3);
sequences.Dump();
}
public static IEnumerable<IEnumerable<T>> GetSequences<T>(
IEnumerable<T> source,
Func<T, T, bool> areConsecutive)
{
bool first = true;
T prev = default(T);
List<T> seq = new List<T>();
foreach (var i in source.OrderBy(i => i))
{
if (!first && !areConsecutive(prev, i))
{
yield return seq.ToArray();
seq.Clear();
}
first = false;
seq.Add(i);
prev = i;
}
if (seq.Any())
yield return seq.ToArray();
}
I thought of the same thing as Jon: to represent a range of consecutive integers all you really need are two measly integers! So I'd start there:
struct Range : IEnumerable<int>
{
readonly int _start;
readonly int _count;
public Range(int start, int count)
{
_start = start;
_count = count;
}
public int Start
{
get { return _start; }
}
public int Count
{
get { return _count; }
}
public int End
{
get { return _start + _count - 1; }
}
public IEnumerator<int> GetEnumerator()
{
for (int i = 0; i < _count; ++i)
{
yield return _start + i;
}
}
// Heck, why not?
public static Range operator +(Range x, int y)
{
return new Range(x.Start, x.Count + y);
}
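// Explicit non-generic implementation, required by IEnumerable<int>
System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
{
return GetEnumerator();
}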
}
From there, you can write a static method to return a bunch of these Range values corresponding to the consecutive numbers of your sequence.
static IEnumerable<Range> FindRanges(IEnumerable<int> source, int minCount)
{
// throw exceptions on invalid arguments, maybe...
var ordered = source.OrderBy(x => x);
Range r = default(Range);
foreach (int value in ordered)
{
// In "real" code I would've overridden the Equals method
// and overloaded the == operator to write something like
// if (r == Range.Empty) here... but this works well enough
// for now, since the only time r.Count will be 0 is on the
// first item.
if (r.Count == 0)
{
r = new Range(value, 1);
continue;
}
if (value == r.End)
{
// skip duplicates
continue;
}
else if (value == r.End + 1)
{
// "append" consecutive values to the range
r += 1;
}
else
{
// return what we've got so far
if (r.Count >= minCount)
{
yield return r;
}
// start over
r = new Range(value, 1);
}
}
// return whatever we ended up with
if (r.Count >= minCount)
{
yield return r;
}
}
Demo:
int[] numbers = new[] { 21, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
foreach (Range r in FindRanges(numbers, 3))
{
// Using .NET 3.5 here, don't have the much nicer string.Join overloads.
Console.WriteLine(string.Join(", ", r.Select(x => x.ToString()).ToArray()));
}
Output:
7, 8, 9
20, 21, 22, 23
Here's my LINQ-y take on the problem:
static IEnumerable<IEnumerable<int>>
ConsecutiveSequences(this IEnumerable<int> input, int minLength = 3)
{
int order = 0;
var inorder = new SortedSet<int>(input);
return from item in new[] { new { order = 0, val = inorder.First() } }
.Concat(
inorder.Zip(inorder.Skip(1), (x, val) =>
new { order = x + 1 == val ? order : ++order, val }))
group item.val by item.order into list
where list.Count() >= minLength
select list;
}
uses no explicit loops, but should still be O(n lg n)
uses SortedSet instead of .OrderBy().Distinct()
combines consecutive elements with list.Zip(list.Skip(1))
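To make the Zip/Skip(1) pairing concrete, here is a small standalone sketch. AdjacentPairsSketch and AdjacentPairs are names made up for this example and are not part of the answer's code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AdjacentPairsSketch
{
    // Pairs each distinct value with its successor in sorted order.
    public static List<Tuple<int, int>> AdjacentPairs(IEnumerable<int> input)
    {
        // SortedSet sorts and removes duplicates in one step
        var inorder = new SortedSet<int>(input);
        return inorder
            // Zip stops at the shorter sequence, so the last element
            // only ever appears as the second item of a pair
            .Zip(inorder.Skip(1), (prev, next) => Tuple.Create(prev, next))
            .ToList();
    }
}
```

For the input 7, 9, 8, 8 this gives the pairs (7, 8) and (8, 9). In the query above, a pair whose second item is not the first item plus one is what bumps the order counter and starts a new group.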
Here's a solution using a Dictionary instead of a sort...
It adds the items to a Dictionary, and then for each value increments above and below to find the longest sequence.
It is not strictly LINQ, though it does make use of some LINQ functions, and I think it is more readable than a pure LINQ solution.
static void Main(string[] args)
{
var items = new[] { -1, 0, 1, 21, -2, 4, 7, 9, 12, 22, 17, 8, 2, 20, 23 };
IEnumerable<IEnumerable<int>> sequences = FindSequences(items, 3);
foreach (var sequence in sequences)
{ //print results to console
Console.Out.WriteLine(sequence.Select(num => num.ToString()).Aggregate((a, b) => a + "," + b));
}
Console.ReadLine();
}
private static IEnumerable<IEnumerable<int>> FindSequences(IEnumerable<int> items, int minSequenceLength)
{
//Convert item list to dictionary
var itemDict = new Dictionary<int, int>();
foreach (int val in items)
{
itemDict[val] = val;
}
var allSequences = new List<List<int>>();
//for each val in items, find longest sequence including that value
foreach (var item in items)
{
var sequence = FindLongestSequenceIncludingValue(itemDict, item);
allSequences.Add(sequence);
//remove items from dict to prevent duplicate sequences
sequence.ForEach(i => itemDict.Remove(i));
}
//return only sequences with at least minSequenceLength items
return allSequences.Where(sequence => sequence.Count >= minSequenceLength).ToList();
}
//Find sequence around start param value
private static List<int> FindLongestSequenceIncludingValue(Dictionary<int, int> itemDict, int value)
{
var result = new List<int>();
//check if num exists in dictionary
if (!itemDict.ContainsKey(value))
return result;
//initialize sequence list
result.Add(value);
//find values greater than starting value
//and add to end of sequence
var indexUp = value + 1;
while (itemDict.ContainsKey(indexUp))
{
result.Add(itemDict[indexUp]);
indexUp++;
}
//find values lower than starting value
//and add to start of sequence
var indexDown = value - 1;
while (itemDict.ContainsKey(indexDown))
{
result.Insert(0, itemDict[indexDown]);
indexDown--;
}
return result;
}