Does Linq provide a way to easily spot gaps in a sequence? - c#

I am managing a directory of files. Each file will be named similarly to Image_000000.png, with the numeric portion being incremented for each file that is stored.
Files can also be deleted, leaving gaps in the number sequence. The reason I am asking is that I recognize that at some point in the future, the user could use up the number sequence unless I take steps to reuse numbers when they become available. I realize that it is a million, and that's a lot, but we have 20-plus year users, so "someday" is not out of the question.
So, I am specifically asking whether or not there exists a way to easily determine the gaps in the sequence without simply looping. I realize that because it's a fixed range, I could simply loop over the expected range.
And I will unless there is a better/cleaner/easier/faster alternative. If so, I'd like to know about it.
This method is called to obtain the next available file name:
public static String GetNextImageFileName()
{
String retFile = null;
DirectoryInfo di = new DirectoryInfo(userVars.ImageDirectory);
FileInfo[] fia = di.GetFiles("*.*", SearchOption.TopDirectoryOnly);
String lastFile = fia.Where(i => i.Name.StartsWith("Image_") && i.Name.Substring(6, 6).ContainsOnlyDigits()).OrderBy(i => i.Name).Last().Name;
if (!String.IsNullOrEmpty(lastFile))
{
Int32 num;
String strNum = lastFile.Substring(6, 6);
String strExt = lastFile.Substring(13);
if (!String.IsNullOrEmpty(strNum) &&
!String.IsNullOrEmpty(strExt) &&
strNum.ContainsOnlyDigits() &&
Int32.TryParse(strNum, out num))
{
num++;
retFile = String.Format("Image_{0:D6}.{1}", num, strExt);
while (num <= 999999 && File.Exists(retFile))
{
num++;
retFile = String.Format("Image_{0:D6}.{1}", num, strExt);
}
}
}
return retFile;
}
EDIT: in case it helps anyone, here is the final method, incorporating Daniel Hilgarth's answer:
public static String GetNextImageFileName()
{
DirectoryInfo di = new DirectoryInfo(userVars.ImageDirectory);
FileInfo[] fia = di.GetFiles("Image_*.*", SearchOption.TopDirectoryOnly);
List<Int32> fileNums = new List<Int32>();
foreach (FileInfo fi in fia)
{
Int32 i;
if (Int32.TryParse(fi.Name.Substring(6, 6), out i))
fileNums.Add(i);
}
fileNums.Sort(); // GetFiles does not guarantee order; the index comparison below needs ascending numbers
var result = fileNums.Select((x, i) => new { Index = i, Value = x })
.Where(x => x.Index != x.Value)
.Select(x => (Int32?)x.Index)
.FirstOrDefault();
// With a gap, its index is the first free number; without one, the next free number equals the count.
Int32 nextNumber = result ?? fileNums.Count;
if (nextNumber >= 0 && nextNumber <= 999999)
return String.Format("Image_{0:D6}", nextNumber);
return null;
}

A very simple approach to find the first number of the first gap would be the following:
int[] existingNumbers = /* extract all numbers from all filenames and order them */
var allNumbers = Enumerable.Range(0, 1000000);
var result = allNumbers.Where(x => !existingNumbers.Contains(x)).First();
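Note that Enumerable.Range(0, 1000000) yields 0 through 999,999, so if all of those numbers have been used and no gaps exist, First() throws an InvalidOperationException because nothing is left to return.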
This approach has the drawback that it performs rather badly, as it iterates existingNumbers multiple times.
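One way to avoid the repeated scans, assuming the existing numbers fit comfortably in memory, is to put them into a HashSet<int> first so that each lookup is roughly constant time. The sketch below is not from the answer; the class and method names are hypothetical, and it returns null instead of throwing when no number is free.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GapFinder
{
    // Hypothetical helper: lowest unused number in [0, maxExclusive), or null if every number is taken.
    public static int? FirstFreeNumber(IEnumerable<int> existingNumbers, int maxExclusive)
    {
        // The set makes each Contains call cheap; input order does not matter.
        var used = new HashSet<int>(existingNumbers);
        return Enumerable.Range(0, maxExclusive)
            .Where(n => !used.Contains(n))
            .Select(n => (int?)n)   // nullable so "nothing free" is distinguishable from 0
            .FirstOrDefault();
    }
}
```

Because the range is evaluated lazily, the search stops at the first free number rather than walking all one million candidates.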
A somewhat better approach would be to use Zip:
allNumbers.Zip(existingNumbers, (a, e) => new { Number = a, ExistingNumber = e })
.Where(x => x.Number != x.ExistingNumber)
.Select(x => x.Number)
.First();
An improved version of DuckMaestro's answer that actually returns the first value of the first gap - and not the first value after the first gap - would look like this:
var tmp = existingNumbers.Select((x, i) => new { Index = i, Value = x })
.Where(x => x.Index != x.Value)
.Select(x => (int?)x.Index)
.FirstOrDefault();
int index;
if(tmp == null)
index = existingNumbers.Length - 1;
else
index = tmp.Value - 1;
var nextNumber = existingNumbers[index] + 1;

Improving on the other answer, use the overload of Where whose predicate also receives the element's index.
int[] existingNumbers = ...
var result = existingNumbers.Where( (x,i) => x != i ).FirstOrDefault();
The value i is a counter starting at 0.
This overload of Where is supported in .NET 3.5 (http://msdn.microsoft.com/en-us/library/bb549418(v=vs.90).aspx).
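The difference between "the first value after the gap" and "the first missing value" is easy to miss, so here is a small side-by-side sketch. It assumes the input is sorted ascending, contains no duplicates, and is meant to start at 0; the class and method names are made up for illustration.

```csharp
using System;
using System.Linq;

public static class GapDemo
{
    // Value of the first out-of-place element, i.e. the number AFTER the gap.
    // Returns 0 when there is no gap, which is ambiguous with a real value of 0.
    public static int FirstValueAfterGap(int[] sorted) =>
        sorted.Where((x, i) => x != i).FirstOrDefault();

    // Index of the first out-of-place element, i.e. the first MISSING number.
    // Returns null when there is no gap.
    public static int? FirstMissing(int[] sorted) =>
        sorted.Select((x, i) => new { Index = i, Value = x })
              .Where(p => p.Index != p.Value)
              .Select(p => (int?)p.Index)
              .FirstOrDefault();
}
```

For the input { 0, 1, 2, 4, 5, 6 }, FirstValueAfterGap gives 4 while FirstMissing gives 3.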

var firstnonexistingfile = Enumerable.Range(0,1000000).Select(x => String.Format("Image_{0:D6}.{1}", x, strExt)).FirstOrDefault(x => !File.Exists(x));
This will iterate from 0 to 999999, then output the result of the String.Format() as an IEnumerable<string> and then find the first string out of that sequence that returns false for File.Exists().

It's an old question, but it has been suggested (in the comments) that you could use .Except() instead. I tend to like this solution a little better since it will give you the first missing number (the gap) or the next smallest number in the sequence. Here's an example:
var allNumbers = Enumerable.Range(0, 999999); //999999 is arbitrary. You could use int.MaxValue, but it would degrade performance
var existingNumbers = new int[] { 0, 1, 2, 4, 5, 6 };
int result;
var missingNumbers = allNumbers.Except(existingNumbers);
if (missingNumbers.Any())
result = missingNumbers.First();
else //no missing numbers -- you've reached the max
result = -1;
Running the above code would set result to:
3
Additionally, if you changed existingNumbers to:
var existingNumbers = new int[] { 0, 1, 3, 2, 4, 5, 6 };
So there isn't a gap, you would get 7 back.
Anyway, that's why I prefer Except over the Zip solution -- just my two cents.
Thanks!

Related

C# LINQ - SkipWhile() in reverse, without calling Reverse()?

In this code:
for (e = 0; e <= collection.Count - 2; e++)
{
var itm = collection.Read();
var itm_price = itm.Price;
var forwards_satisfied_row = collection
.Skip(e + 1)
.SkipWhile(x => x.Price < ex_price)
.FirstOrDefault();
var backwards_satisfied_row = collection
.Reverse()
.Skip(collection.Count - e)
.SkipWhile(x => x.Price < ex_price)
.FirstOrDefault();
}
Suppose the collection contains millions of items and a Reverse() is too expensive, what would be the best way to achieve the same outcome as 'backwards_satisfied_row' ?
Edit:
For each item in the collection, it should find the first preceding item that matches the SkipWhile predicate.
For context, I'm finding the distance between a price extremum (a minimum or maximum) and the nearest horizontal clash with the price. This gives a 'strength' value for each minimum and maximum, which determines its importance and helps to marry it up with extrema of a similar strength.
Edit 2
This chart shows the data in the repro code below. Note the dip in the middle at item #22; this item has a distance of 18.
Bear in mind this operation will be iterated millions of times.
So I'm trying not to read into memory, and to only evaluate the items needed.
When I run this on a large dataset r_ex takes 5 ms per row, whereas l_ex takes up to a second.
It might be tempting to iterate backwards and check that way, but there could be millions of previous records, being read from a binary file.
Many types of searches like Binary search wouldn't be practical here, since the values aren't ordered.
static void Main(string[] args)
{
var dict_dists = new Dictionary<Int32, Int32>();
var dict = new Dictionary<Int32, decimal> {
{1, 410},{2, 474},{3, 431},
{4, 503},{5, 461},{6, 535},
{7, 488},{8, 562},{9, 508},
{10, 582},{11, 522},{12, 593},
{13, 529},{14, 597},{15, 529},
{16, 593},{17, 522},{18, 582},
{19, 510},{20, 565},{21, 492},
{22, 544},{23, 483},{24, 557},
{25, 506},{26, 580},{27, 524},
{28, 598},{29, 537},{30, 609},
{31, 543},{32, 612},{33, 542},
{34, 607},{35, 534},{36, 594},
{37, 518},{38, 572},{39, 496},
{40, 544},{41, 469},{42, 511},
{43, 437},{44, 474},{45, 404},
{46, 462},{47, 427},{48, 485},
{49, 441},{50, 507}};
var i = 0;
for (i = 0; i <= dict.Count - 2; i++)
{
var ele = dict.ElementAt(i);
var current_time = ele.Key;
var current_price = ele.Value;
var is_maxima = current_price > dict.ElementAt(i + 1).Value;
//' If ele.Key = 23 Then here = True
var shortest_dist = Int32.MaxValue;
var l_ex = new KeyValuePair<int, decimal>();
var r_ex = new KeyValuePair<int, decimal>();
if (is_maxima)
{
l_ex = dict.Reverse().Skip(dict.Count - 1 - i + 1).SkipWhile(x => x.Value < current_price).FirstOrDefault();
r_ex = dict.Skip(i + 1).SkipWhile(x => x.Value < current_price).FirstOrDefault();
}
else
{ // 'Is Minima
l_ex = dict.Reverse().Skip(dict.Count - 1 - i + 1).SkipWhile(x => x.Value > current_price).FirstOrDefault();
r_ex = dict.Skip(i + 1).SkipWhile(x => x.Value > current_price).FirstOrDefault();
}
if (l_ex.Key > 0)
{
var l_dist = (current_time - l_ex.Key);
if ( l_dist < shortest_dist ) {
shortest_dist = l_dist;
};
}
if (r_ex.Key > 0)
{
var r_dist = (r_ex.Key - current_time);
if ( r_dist < shortest_dist ) {
shortest_dist = r_dist;
};
}
dict_dists.Add(current_time, shortest_dist);
}
var dist = dict_dists[23];
}
Edit: As a workaround I'm writing a reversed temp file for the left-seekers.
for (i = file.count - 1; i >= 0; i += -1)
{
file.SetPointerToItem(i);
temp_file.Write(file.Read());
}
You could make it more efficient by selecting the precedent of each item in one pass. Let's make an extension method for enumerables that selects a precedent for each element:
public static IEnumerable<T> SelectPrecedent<T>(this IEnumerable<T> source,
Func<T, bool> selector)
{
T selectedPrecedent = default;
foreach (var item in source)
{
if (selector(item)) selectedPrecedent = item;
yield return selectedPrecedent;
}
}
You could then use this method, and select the precedent and the subsequent of each element by doing only two Reverse operations in total:
var precedentArray = collection.SelectPrecedent(x => x.Price < ex_price).ToArray();
var subsequentArray = collection.Reverse()
.SelectPrecedent(x => x.Price < ex_price).Reverse().ToArray();
for (int i = 0; i < collection.Count; i++)
{
var current = collection[i];
var precedent = precedentArray[i];
var subsequent = subsequentArray[i];
// Do something with the current, precedent and subsequent
}
There is no need to call .Reverse() and then FirstOrDefault(); just use LastOrDefault(). Instead of Skip(collection.Count - e), use .Take(e).
var backwards_satisfied_row = collection
.SkipWhile(x => x.Price < ex_price) //Skip while x.Price < ex_price
.Skip(e+1) //Skip the next e+1 elements
.LastOrDefault(); //Get the last element or the default value
You can make your code more efficient by materializing the filtered collection once and then calling FirstOrDefault() and LastOrDefault() on it for forwards_satisfied_row and backwards_satisfied_row respectively, like this:
for (e = 0; e <= collection.Count - 2; e++)
{
var itm = collection.Read();
var itm_price = itm.Price;
var satisfied_rows = collection
.SkipWhile(x => x.Price < ex_price)
.Skip(e + 1)
.ToList();
var forwards_satisfied_row = satisfied_rows.FirstOrDefault();
var backwards_satisfied_row = satisfied_rows.LastOrDefault();
}
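A different technique from the answers in this thread: when the prices can be read in one forward pass, a monotonic stack finds, for every item, the nearest preceding item whose price is at least as high, without Reverse() and without rescanning earlier items. This is a sketch under the assumption that the predicate is "price >= current price" (the maxima case from the question); the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class NearestPreceding
{
    // For each index i, returns the index of the closest earlier element with
    // prices[j] >= prices[i], or -1 if there is none.
    public static int[] PreviousAtLeast(IReadOnlyList<decimal> prices)
    {
        var result = new int[prices.Count];
        // Holds indices whose prices never increase from bottom to top.
        var stack = new Stack<int>();
        for (int i = 0; i < prices.Count; i++)
        {
            // Anything lower than the current price can never be the answer
            // for this or any later item, so discard it.
            while (stack.Count > 0 && prices[stack.Peek()] < prices[i])
                stack.Pop();
            result[i] = stack.Count > 0 ? stack.Peek() : -1;
            stack.Push(i);
        }
        return result;
    }
}
```

Each index is pushed and popped at most once, so the whole pass is linear in the number of items. For the minima case, flipping the comparison to > gives the nearest preceding item that is at most as high. Running the same method over the data in reverse order would give the right-hand neighbours.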

C# array for finding all the 2's

Basically, I have to find a certain number, which in this case is 2, and count how many times it appears in my array. I assumed that I would have to use .GetValue(42), but it's not working. The code I am using is:
static int count2(int[] input)
{
return input.GetValue(2);
}
input is from a separate method, but it contains the values that I'm working with which is
int [] input = {1,2,3,4,5};
I'm not sure whether you want to count specifically the number 2, or any number that contains the digit 2.
For the latter, here's the easy way:
public int count2(int[] input) {
int counter = 0;
foreach(var i in input) {
if (i.ToString().Contains("2"))
{
++counter;
}
}
return counter;
}
You can do it with LINQ
input.Count(x=>x==2);
Array.GetValue() "gets the value at the specified position in the one-dimensional Array" which is not what you want. (in your example it will return 3 because that's the value at index 2 of your array).
You want to count the number of times a specific item is in the array. That's a matter of looping and checking each item:
var counter = 0;
foreach(var item in input)
{
if(item == 2)
{
counter++;
}
}
return counter;
To get a count, do this:
int [] inputDupes = {1,2,3,4,5,2};
var count = inputDupes
.GroupBy(q => q)
.Where(gb => gb.Key == 2)
.Select(gb => gb.Count())
.FirstOrDefault(); //returns an Int32 value: how often 2 occurs, or 0 if it is absent
To see if there are duplicates of the number 2, do the following:
int [] inputDupes = {1,2,3,4,5,2};
var hasDuplicates = inputDupes
.GroupBy(q => q)
.Any(gb => gb.Key == 2 && gb.Count() > 1); //returns true | false
If you want to do this for any number, create a method and pass the number in as a parameter instead of hard-coding 2.
If you want to capture user input from the Console, you can do it this way as well:
int [] inputDupes = {1,2,3,4,5,2};
Console.WriteLine("Enter a number to check for duplicates: ");
string input = Console.ReadLine();
int number;
Int32.TryParse(input, out number);
var dupeCount = inputDupes.Count(x => x == number);
Console.WriteLine(dupeCount);
Console.Read();
Yields 2 for the duplicate Count
static int count2(int[] input)
{
return input.Count(i => i == 2);
}
You could use a Func like this:
public Func<int[], int, int> GetNumberCount =
(numbers, numberToSearchFor) =>
numbers.Count(num => num.Equals(numberToSearchFor));
...
var count = GetNumberCount(input, 2);
Gotta' love a Func :)

Select interval linq

Is there some way with LINQ to select certain numbers using shortcut criteria?
Like this:
I have numbers from 1 to 10000.
My criteria is (4012..4190|4229), meaning take numbers between 4012 to 4190 and number 4229:
static int[] test(string criteria)
{
// criteria is 4012..4190|4229
// select numbers from lab where criteria is met
int[] lab = Enumerable.Range(0, 10000).ToArray();
return lab;
}
This should be enough for your case:
return lab.Where((int1) => (int1 >= 4012 && int1 <= 4190) || int1 == 4229).ToArray();
Also a quick way of parsing your criteria would be to use RegEx:
Regex r = new Regex(@"\d+");
MatchCollection m = r.Matches(criteria);
int start = int.Parse(m[0].Value);
int end = int.Parse(m[1].Value);
int specific = int.Parse(m[2].Value);
return lab.Where((int1) => (int1 >= start && int1 <= end) || int1 == specific).ToArray();
If your criteria is always a string, you need some way to parse it into a Func<int, bool>, but that's not LINQ specific. In the end you'll need something like this:
Func<int, bool> predicate = Parse(criteria);
return lab.Where(predicate).ToArray();
where a very basic implementation of Parse might look as follows:
public static Func<int, bool> Parse(string criteria)
{
var alternatives = criteria
.Split('|')
.Select<string, Func<int, bool>>(
token =>
{
if (token.Contains(".."))
{
var between = token.Split(new[] {".."}, StringSplitOptions.RemoveEmptyEntries);
int lo = int.Parse(between[0]);
int hi = int.Parse(between[1]);
return x => lo <= x && x <= hi;
}
else
{
int exact = int.Parse(token);
return x => x == exact;
}
})
.ToArray();
return x => alternatives.Any(alt => alt(x));
}
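You can concatenate two sequences. Note that the second argument of Enumerable.Range is a count, so it needs +1 to include 4190:
int[] lab = Enumerable.Range(4012, 4190 - 4012 + 1).Concat(Enumerable.Range(4229, 1)).ToArray();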
Update:
you need to parse incoming criteria first
static int[] test(string criteria)
{
// criteria is 4012..4190|4229
// select numbers from lab where criteria is met
// assume you parsed your criteria into a jagged array of { start, count } pairs
// I used a count for the second element for convenience (4190 - 4012 + 1 = 179)
int[][] criteriaArray = { new int[]{ 4012, 179 }, new int[]{ 4229, 1 } };
var seq = Enumerable.Range(criteriaArray[0][0], criteriaArray[0][1]);
for (int i = 1; i < criteriaArray.Length; i++)
{
int start = criteriaArray[i][0];
int count = criteriaArray[i][1];
seq = seq.Concat(Enumerable.Range(start, count));
}
return seq.ToArray();
}
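The code above assumes the criteria string has already been parsed. A minimal sketch of that step, expanding each token straight into the numbers it denotes, could look like the following. The class and method names are hypothetical, and it assumes well-formed input where the lower bound of a range does not exceed the upper bound.

```csharp
using System;
using System.Linq;

public static class CriteriaParser
{
    // Hypothetical helper: expands e.g. "4012..4190|4229" into the numbers it denotes.
    public static int[] Expand(string criteria)
    {
        return criteria
            .Split('|')
            .SelectMany(token =>
            {
                // "a..b" is an inclusive range; a bare number is a range of one.
                var parts = token.Split(new[] { ".." }, StringSplitOptions.None);
                int lo = int.Parse(parts[0]);
                int hi = parts.Length > 1 ? int.Parse(parts[1]) : lo;
                // Enumerable.Range takes a count, hence the +1 for an inclusive upper bound.
                return Enumerable.Range(lo, hi - lo + 1);
            })
            .ToArray();
    }
}
```

Malformed tokens make int.Parse throw, and a reversed range makes Enumerable.Range throw, so real input would need validation first.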
You could do this:
Flatten[{Range[4012, 4190], 4229}]
In a sense, the criteria string 4012..4190|4229 expresses the same thing: the answer is exactly that, the list of items from 4012 to 4190 plus the item 4229.
Lambdas just imitate pure functions. Unless you have a free Wolfram kernel, this approach might not be the most cost-effective, but you do not need to write boilerplate code.

How do I access the previous item in a list using LINQ?

I have:
List<int> A = new List<int>(){1,2,3,4,5,6};
List<int> m=new List<int>();
for(int i=1;i<A.Count;i++)
{
int j=A[i]+A[i-1];
m.Add(j);
}
How can I do this same operation using LINQ?
Well, a straightforward translation would be:
var m = Enumerable.Range(1, A.Count - 1)
.Select(i => A[i] + A[i - 1])
.ToList();
But also consider:
var m = A.Skip(1)
.Zip(A, (curr, prev) => curr + prev)
.ToList();
Or using Jon Skeet's extension here:
var m = A.SelectWithPrevious((prev, curr) => prev + curr)
.ToList();
But as Jason Evans points out in a comment, this doesn't help all that much with readability or brevity, considering your existing code is perfectly understandable (and short) and you want to materialize all of the results into a list anyway.
There's nothing really wrong with:
var sumsOfConsecutives = new List<int>();
for(int i = 1; i < A.Count; i++)
sumsOfConsecutives.Add(A[i] + A[i - 1]);
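The SelectWithPrevious extension mentioned above is not reproduced in this thread. As a rough sketch of how such an operator could be written: the name and call shape are taken from the usage above, while the body is an assumption and not necessarily the linked implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PairwiseExtensions
{
    // Projects each adjacent pair (previous, current) of the source sequence.
    // Yields nothing for sequences with fewer than two elements.
    public static IEnumerable<TResult> SelectWithPrevious<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, TSource, TResult> projection)
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext())
                yield break;
            TSource previous = e.Current;
            while (e.MoveNext())
            {
                yield return projection(previous, e.Current);
                previous = e.Current;
            }
        }
    }
}
```

Unlike the Skip(1).Zip(...) version, this enumerates the source only once, which matters for sequences that are expensive or impossible to enumerate twice.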
Ok so getting the next item in the list you can use:
A.SkipWhile(x => x != value).Skip(1).FirstOrDefault();
So to get the previous item use:
var B = A.ToList();
B.Reverse();
B.SkipWhile(x => x != value).Skip(1).FirstOrDefault();
How about something like
var l = A.Skip(1).Select((x, index) => x + A[index]).ToList();
Some of the other answers assume that the elements of A are always going to be 1, 2, 3, 4, 5, 6. If those values ever change then the solution would break, such as the values changing to 2, 3, 6, 7, 10.
Here's my solution that will work with any values of A.
List<int> m = A.Skip(1).Select((element, index) => element + A.ElementAt(index)).ToList();
It is worth noting that sticking with a loop would probably be better than hacking together a Linq solution for this.
In case you only need the end value, you can Aggregate it, i.e. you need the previous value but don't need to store each individual value in a new list.
int last = 0;
var r = m.Aggregate(last, (acc, it) => (last += it), (acc) => (last));
Another option is to implement your own Buffer operator and use that to make a simple LINQ statement.
public static IEnumerable<IEnumerable<T>> Buffer<T>(this IEnumerable<T> source, int size)
{
var buffer = new List<T>();
foreach (var t in source)
{
buffer.Add(t);
if (buffer.Count() == size)
{
yield return buffer.AsEnumerable();
buffer = buffer.Skip(1).ToList();
}
}
}
That allows this code:
List<int> B = A.Buffer(2).Select(x => x.Sum()).ToList();

How do I get the index of the highest value in an array using LINQ?

I have an array of doubles and I want the index of the highest value. These are the solutions that I've come up with so far but I think that there must be a more elegant solution. Ideas?
double[] score = new double[] { 12.2, 13.3, 5, 17.2, 2.2, 4.5 };
int topScoreIndex = score.Select((item, indx) => new {Item = item, Index = indx}).OrderByDescending(x => x.Item).Select(x => x.Index).First();
topScoreIndex = score.Select((item, indx) => new {Item = item, Index = indx}).OrderBy(x => x.Item).Select(x => x.Index).Last();
double maxVal = score.Max();
topScoreIndex = score.Select((item, indx) => new {Item = item, Index = indx}).Where(x => x.Item == maxVal).Select(x => x.Index).Single();
Meh, why make it overcomplicated? This is the simplest way.
var indexAtMax = scores.ToList().IndexOf(scores.Max());
Yeah, you could make an extension method to use less memory, but unless you're dealing with huge arrays, you will never notice the difference.
I suggest writing your own extension method (edited to be generic with an IComparable<T> constraint.)
public static int MaxIndex<T>(this IEnumerable<T> sequence)
where T : IComparable<T>
{
int maxIndex = -1;
T maxValue = default(T); // Immediately overwritten anyway
int index = 0;
foreach (T value in sequence)
{
if (value.CompareTo(maxValue) > 0 || maxIndex == -1)
{
maxIndex = index;
maxValue = value;
}
index++;
}
return maxIndex;
}
Note that this returns -1 if the sequence is empty.
A word on the characteristics:
This works with a sequence which can only be enumerated once - this can sometimes be very important, and is generally a desirable feature IMO.
The memory complexity is O(1) (as opposed to O(n) for sorting)
The runtime complexity is O(n) (as opposed to O(n log n) for sorting)
As for whether this "is LINQ" or not: if it had been included as one of the standard LINQ query operators, would you count it as LINQ? Does it feel particularly alien or unlike other LINQ operators? If MS were to include it in .NET 4.0 as a new operator, would it be LINQ?
EDIT: If you're really, really hell-bent on using LINQ (rather than just getting an elegant solution) then here's one which is still O(n) and only evaluates the sequence once:
int maxIndex = -1;
int index=0;
double maxValue = 0;
int urgh = sequence.Select(value => {
if (maxIndex == -1 || value > maxValue)
{
maxIndex = index;
maxValue = value;
}
index++;
return maxIndex;
}).Last();
It's hideous, and I don't suggest you use it at all - but it will work.
var scoreList = score.ToList();
int topIndex =
(
from x
in score
orderby x
select scoreList.IndexOf(x)
).Last();
If score wasn't an array this wouldn't be half bad...
Try this one which is completely LINQ and has the best performance:
var indexAtMax = scores.Select((x, i) => new { x, i })
.Aggregate((a, a1) => a.x > a1.x ? a : a1).i;
This isn't the only Aggregate based solution, but this is really just a single line solution.
double[] score = new double[] { 12.2, 13.3, 5, 17.2, 2.2, 4.5 };
var max = score.Select((val,ix)=>new{val,ix})
.Aggregate(new{val=Double.MinValue,ix=-1},(z,last)=>z.val>=last.val?z:last);
Console.WriteLine ("maximum value is {0}", max.val );
Console.WriteLine ("index of maximum value is {0}", max.ix );
I had this problem today (getting the index of the user with the highest age in a users array), and I did it this way:
var position = users.TakeWhile(u => u.Age != users.Max(x=>x.Age)).Count();
It was for a C# class, so it's a beginner's solution; I'm sure yours are better :)
System.Linq.Enumerable.Select with index and System.Linq.Enumerable.Aggregate would do it in one line
public static int IndexOfMax<TSource>(this IEnumerable<TSource> source)
where TSource : IComparable<TSource> => source.Select((value, idx) => (value, idx))
.Aggregate((aggr, next) => next.value.CompareTo(aggr.value) > 0 ? next : aggr).idx;
The worst possible complexity of this is O(2N) ~= O(N), but it needs to enumerate the collection two times.
void Main()
{
IEnumerable<int> numbers = new int[] { 1, 2, 3, 4, 5 };
int max = numbers.Max ();
int index = -1;
numbers.Any (number => { index++; return number == max; });
if(index != 4) {
throw new Exception("The result should have been 4, but " + index + " was found.");
}
"Simple test successful.".Dump();
}
If you want something that looks LINQy, in that it's purely functional, then Jon Skeet's answer above can be recast as:
public static int MaxIndex<T>(this IEnumerable<T> sequence) where T : IComparable<T>
{
return sequence.Aggregate(
new { maxIndex = -1, maxValue = default(T), thisIndex = 0 },
((agg, value) => (value.CompareTo(agg.maxValue) > 0 || agg.maxIndex == -1) ?
new {maxIndex = agg.thisIndex, maxValue = value, thisIndex = agg.thisIndex + 1} :
new {maxIndex = agg.maxIndex, maxValue = agg.maxValue, thisIndex = agg.thisIndex + 1 })).
maxIndex;
}
This has the same computational complexity as the other answer, but is more profligate with memory, creating an intermediate answer for each element of the enumerable.
Using other answers, I came up with this one; writing an extension:
public static int MaxIndex<T, R>(this IEnumerable<T> sequence, Func<T, R> evaluate) where R : IComparable<R>
{
var maxIndex = -1;
var maxValue = default(R);
var index = 0;
foreach (var value in sequence)
{
if (evaluate(value).CompareTo(maxValue) > 0 || maxIndex == -1)
{
maxIndex = index;
maxValue = evaluate(value);
}
index++;
}
return maxIndex;
}
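None of the answers above could use it at the time, but if you are on .NET 6 or later (an assumption about your target framework), Enumerable.MaxBy makes the index lookup fairly compact. The class and method names here are hypothetical, and an empty array is not handled.

```csharp
using System;
using System.Linq;

public static class MaxIndexDemo
{
    // Assumes .NET 6+, where Enumerable.MaxBy is available.
    // Pairs each value with its index, picks the pair with the largest value,
    // and returns that pair's index.
    public static int IndexOfMax(double[] values) =>
        values.Select((value, index) => (value, index))
              .MaxBy(pair => pair.value)
              .index;
}
```

With the score array from the question, IndexOfMax returns 3, the position of 17.2.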
