LINQ to find array indexes of a value - c#

Assuming I have the following string array:
string[] str = new string[] {"max", "min", "avg", "max", "avg", "min"};
Is it possible to use LINQ to get a list of indexes that match one string?
As an example, I would like to search for the string "avg" and get a list containing
2, 4
meaning that "avg" can be found at str[2] and str[4].

.Select has a seldom-used overload that produces an index. You can use it like this:
str.Select((s, i) => new {i, s})
.Where(t => t.s == "avg")
.Select(t => t.i)
.ToList()
The result will be a list containing 2 and 4.
See the documentation for the Enumerable.Select overload whose selector receives both the element and its index.

You can do it like this:
str.Select((v,i) => new {Index = i, Value = v}) // Pair up values and indexes
.Where(p => p.Value == "avg") // Do the filtering
.Select(p => p.Index); // Keep the index and drop the value
The key step is using the overload of Select that supplies the current index to your functor.

You can use the overload of Enumerable.Select that passes the index and then use Enumerable.Where on an anonymous type:
List<int> result = str.Select((s, index) => new { s, index })
.Where(x => x.s== "avg")
.Select(x => x.index)
.ToList();
If you just want to find the first/last index, there are also the built-in methods List.IndexOf and List.LastIndexOf; for an array like the one in the question, use the static Array.IndexOf and Array.LastIndexOf:
int firstIndex = Array.IndexOf(str, "avg");
int lastIndex = Array.LastIndexOf(str, "avg");
(Both also have overloads that take a start index to specify where the search begins.)

First off, your code doesn't actually iterate over the list twice, it only iterates it once.
That said, your Select is really just getting a sequence of all of the indexes; that is more easily done with Enumerable.Range:
var result = Enumerable.Range(0, str.Length) // use str.Count if str is a List<string>
.Where(i => str[i] == "avg")
.ToList();
Understanding why the list isn't actually iterated twice will take some getting used to. I'll try to give a basic explanation.
You should think of most of the LINQ methods, such as Select and Where, as a pipeline. Each method does some tiny bit of work. In the case of Select you give it a method, and it essentially says, "Whenever someone asks me for my next item I'll first ask my input sequence for an item, then use the method I have to convert it into something else, and then give that item to whoever is using me." Where, more or less, is saying, "Whenever someone asks me for an item I'll ask my input sequence for an item; if the function says it's good I'll pass it on, if not I'll keep asking for items until I get one that passes."
So when you chain them, what happens is this: ToList asks Where for its first item, Where asks Select for its first item, and Select asks the list for its first item. The list then provides its first item. Select transforms that item into what it needs to spit out (in this case, just the int 0) and gives it to Where. Where takes that item and runs its function, which determines that it's true, and so passes 0 on to ToList, which adds it to the list. That whole thing then happens 9 more times. This means that Select will end up asking for each item from the list exactly once, and it will feed each of its results directly to Where, which will feed the results that "pass the test" directly to ToList, which stores them in a list. All of the LINQ methods are carefully designed to only ever iterate the source sequence once (when they are iterated once).
Note that, while this seems complicated at first to you, it's actually pretty easy for the computer to do all of this. It's not actually as performance intensive as it may seem at first.
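The pipeline behaviour described above can be observed directly. The following is a minimal sketch, not part of the original answer: Traced and FindIndexes are hypothetical helper names, and the array is the one from the question. The log records every time the source hands out an item, so its length shows how often the source was read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PipelineDemo
{
    // Hypothetical helper: passes items through unchanged and logs each read of the source.
    public static IEnumerable<string> Traced(IEnumerable<string> source, List<string> log)
    {
        foreach (var s in source)
        {
            log.Add("source yields " + s);
            yield return s;
        }
    }

    // Same Select/Where/Select/ToList chain as in the answers above, fed by the traced source.
    public static List<int> FindIndexes(string[] items, string target, List<string> log)
    {
        return Traced(items, log)
            .Select((s, i) => new { s, i })
            .Where(p => p.s == target)
            .Select(p => p.i)
            .ToList();
    }

    public static void Main()
    {
        var log = new List<string>();
        var str = new[] { "max", "min", "avg", "max", "avg", "min" };

        var indexes = FindIndexes(str, "avg", log);

        Console.WriteLine(string.Join(", ", indexes)); // 2, 4
        Console.WriteLine(log.Count);                  // 6: each element was read exactly once
    }
}
```

Even though three LINQ operators are chained, the log holds one entry per element, which is what the "asked for one item at a time" description predicts.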

While you could use a combination of Select and Where, this is likely a good candidate for making your own function:
public static IEnumerable<int> Indexes<T>(IEnumerable<T> source, T itemToFind)
{
if (source == null)
throw new ArgumentNullException("source");
int i = 0;
foreach (T item in source)
{
if (object.Equals(itemToFind, item))
{
yield return i;
}
i++;
}
}

You need a combined select-and-where operator. Compared to the accepted answer, this is cheaper because it doesn't require intermediate objects:
public static IEnumerable<TResult> SelectWhere<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, bool> filter, Func<TSource, int, TResult> selector)
{
int index = -1;
foreach (var s in source)
{
checked{ ++index; }
if (filter(s))
yield return selector(s, index);
}
}
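The answer above gives no call site, so here is a possible usage sketch. The operator is repeated so the example compiles on its own; the wrapper class SelectWhereDemo and the helper IndexesOf are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectWhereDemo
{
    // Copy of the SelectWhere operator from the answer above.
    public static IEnumerable<TResult> SelectWhere<TSource, TResult>(
        this IEnumerable<TSource> source,
        Func<TSource, bool> filter,
        Func<TSource, int, TResult> selector)
    {
        int index = -1;
        foreach (var s in source)
        {
            checked { ++index; }
            if (filter(s))
                yield return selector(s, index);
        }
    }

    // Hypothetical helper: keep matching items and project each one to its index.
    public static List<int> IndexesOf(string[] items, string target)
    {
        return items.SelectWhere(s => s == target, (s, i) => i).ToList();
    }

    public static void Main()
    {
        var str = new[] { "max", "min", "avg", "max", "avg", "min" };
        Console.WriteLine(string.Join(", ", IndexesOf(str, "avg"))); // 2, 4
    }
}
```

Note that the filter sees the element before it is projected, so no anonymous pair object is allocated per element.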

Related

Can I use LINQ to find out if a given property in a list repeats itself?

I have a list of objects that have a name field on them.
I want to know if there's a way to tell if all the name fields are unique in the list.
I could just do two loops and iterate over the list for each value, but I wanted to know if there's a cleaner way to do this using LINQ?
I've found a few examples where they compare each item of the list to a hard coded value but in my case I want to compare the name field on each object between each other and obtain a boolean value.
A common "trick" to check for uniqueness is to compare the length of a list with duplicates removed with the length of the original list:
bool allNamesAreUnique = myList.Select(x => x.Name).Distinct().Count() == myList.Count();
Select(x => x.Name) transforms your list into a list of just the names, and
Distinct() removes the duplicates.
The performance should be close to O(n), which is better than the O(n²) nested-loop solution.
Another option is to group your list by the name and check the size of those groups. This has the additional advantage of telling you which values are not unique:
var duplicates = myList.GroupBy(x => x.Name).Where(g => g.Count() > 1);
bool hasDuplicates = duplicates.Any(); // or
List<string> duplicateNames = duplicates.Select(g => g.Key).ToList();
While you can use LINQ to group or create a distinct list, and then compare item-wise with the original list, that incurs a bit of overhead you might not want, especially for a very large list. A more efficient solution would store the keys in a HashSet, which has better lookup capability, and check for duplicates in a single loop. This solution still uses a little bit of LINQ so it satisfies your requirements.
static public class ExtensionMethods
{
static public bool HasDuplicates<TItem,TKey>(this IEnumerable<TItem> source, Func<TItem,TKey> func)
{
var found = new HashSet<TKey>();
foreach (var key in source.Select(func))
{
if (found.Contains(key)) return true;
found.Add(key);
}
return false;
}
}
If you are looking for duplicates in a field named Name, use it like this:
var hasDuplicates = list.HasDuplicates( item => item.Name );
If you want case-insensitivity:
var hasDuplicates = list.HasDuplicates( item => item.Name.ToUpper() );
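Since ToUpper() depends on the current culture, one alternative worth considering is to let the HashSet do the case-insensitive comparison. The following is a sketch under that assumption, not the original author's code: the extra comparer parameter and the class name DuplicateCheck are additions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateCheck
{
    // Variant of HasDuplicates that takes an explicit key comparer (an assumed extension of the answer above).
    public static bool HasDuplicates<TItem, TKey>(
        this IEnumerable<TItem> source,
        Func<TItem, TKey> keySelector,
        IEqualityComparer<TKey> comparer)
    {
        var found = new HashSet<TKey>(comparer);
        foreach (var item in source)
        {
            // HashSet.Add returns false when an equal key is already present.
            if (!found.Add(keySelector(item)))
                return true;
        }
        return false;
    }

    public static void Main()
    {
        var names = new[] { "Ann", "bob", "ANN" };

        Console.WriteLine(names.HasDuplicates(n => n, StringComparer.OrdinalIgnoreCase)); // True
        Console.WriteLine(names.HasDuplicates(n => n, StringComparer.Ordinal));           // False
    }
}
```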

Check array for duplicates, return only items which appear more than once

I have a text document of emails such as
Google12#gmail.com,
MyUSERNAME#me.com,
ME#you.com,
ratonabat#co.co,
iamcool#asd.com,
ratonabat#co.co,
I need to check said document for duplicates and create a unique array from that (so if "ratonabat#co.co" appears 500 times in the new array he'll only appear once.)
Edit:
For an example:
username1#hotmail.com
username2#hotmail.com
username1#hotmail.com
username1#hotmail.com
username1#hotmail.com
username1#hotmail.com
This is my "data" (either in an array or text document, I can handle that)
I want to be able to see if there's a duplicate in that, and move the duplicate ONCE to another array. So the output would be
username1#hotmail.com
You can simply use Linq's Distinct extension method:
var input = new string[] { ... };
var output = input.Distinct().ToArray();
You may also want to consider refactoring your code to use a HashSet<string> instead of a simple array, as it will gracefully handle duplicates.
To get an array containing only those records which are duplicates, it's a little more complex, but you can still do it with a little Linq:
var output = input.GroupBy(x => x)
.Where(g => g.Skip(1).Any())
.Select(g => g.Key)
.ToArray();
Explanation:
.GroupBy group identical strings together
.Where filter the groups by the following criteria
.Skip(1).Any() return true if there are 2 or more items in the group. This is equivalent to .Count() > 1, but it's slightly more efficient because it stops counting after it finds a second item.
.Select return a set consisting only of a single string (rather than the group)
.ToArray convert the result set to an array.
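The steps listed above can be tried on the sample data from the question's edit. This is a small sketch; the wrapper class DuplicatesDemo and the method name FindDuplicates are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicatesDemo
{
    // Returns each value that occurs at least twice, listed once.
    public static string[] FindDuplicates(IEnumerable<string> input)
    {
        return input.GroupBy(x => x)              // one group per distinct string
                    .Where(g => g.Skip(1).Any())  // keep groups that have a second item
                    .Select(g => g.Key)           // the string itself, not the group
                    .ToArray();
    }

    public static void Main()
    {
        var input = new[]
        {
            "username1#hotmail.com",
            "username2#hotmail.com",
            "username1#hotmail.com",
            "username1#hotmail.com",
            "username1#hotmail.com",
            "username1#hotmail.com",
        };

        foreach (var dup in FindDuplicates(input))
            Console.WriteLine(dup); // username1#hotmail.com
    }
}
```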
Here's another solution using a custom extension method:
public static class MyExtensions
{
public static IEnumerable<T> Duplicates<T>(this IEnumerable<T> input)
{
var a = new HashSet<T>();
var b = new HashSet<T>();
foreach(var x in input)
{
if (!a.Add(x) && b.Add(x))
yield return x;
}
}
}
And then you can call this method like this:
var output = input.Duplicates().ToArray();
I haven't benchmarked this, but it should be more efficient than the previous method.
You can use the built-in .Distinct() method. By default the comparisons are case-sensitive; if you want them to be case-insensitive, use the overload that takes a comparer and pass a case-insensitive string comparer.
List<string> emailAddresses = GetListOfEmailAddresses();
string[] uniqueEmailAddresses = emailAddresses.Distinct(StringComparer.OrdinalIgnoreCase).ToArray();
EDIT: Now I see after you made your clarification you only want to list the duplicates.
string[] duplicateAddresses = emailAddresses.GroupBy(address => address,
(key, rows) => new {Key = key, Count = rows.Count()},
StringComparer.OrdinalIgnoreCase)
.Where(row => row.Count > 1)
.Select(row => row.Key)
.ToArray();
To select emails which occur more than once:
var dupEmails = from emails in File.ReadAllText(path).Split(',')
.Select(x => x.Trim()).Where(x => x.Length > 0) // strip line breaks, skip empty entries
.GroupBy(x => x)
where emails.Count() > 1
select emails.Key;

Determine if an int value exists once in an array

I know you can use Any, Exists, and Single with LINQ, but I can't quite get this to work. I need to do a lookup based on an ID to see if it's in the array and make sure that there is only ONE match on that value, because two matches would cause an issue. The requirement I'm checking is that the array has one and only one of each ID.
Here's what I tried
if(someIntArray.Single(item => item = 3)
//... we found the value 8 in the array only once so now we can be confident and do something
Here's how I would solve this:
if (someIntArray.Count(item => item == 3) == 1)
{
//only one '3' found in the array
...
}
I created a One() extension method set for just this situation:
public static bool One<T>(this IEnumerable<T> sequence)
{
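using (var enumerator = sequence.GetEnumerator()) // dispose the enumerator when done
{
// True only if there is a first element and no second one
return enumerator.MoveNext() && !enumerator.MoveNext();
}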
}
public static bool One<T>(this IEnumerable<T> sequence, Func<T, bool> predicate)
{
return sequence.Where(predicate).One();
}
//usage
if (someIntArray.One(item => item == 3)) ...
The problem with Single() is that it throws an exception if there isn't exactly one element. You can wrap it in a try-catch, but these are cleaner, and more efficient than Count() in most cases where there's more than one matching element. Unfortunately, there's no way around having to check the entire array to verify that there are either no elements or only one that matches a predicate, but this will at least "fail fast" if there are two or more, where Count() will always evaluate the entire Enumerable whether there's one matching element or fifty.
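The "fail fast" claim can be checked with a counting wrapper. The sketch below repeats the One() methods so it compiles on its own; the Counted helper and the class name OneDemo are hypothetical and exist only to count how many elements are read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OneDemo
{
    public static bool One<T>(this IEnumerable<T> sequence)
    {
        using (var enumerator = sequence.GetEnumerator())
        {
            // True only if there is a first element and no second one.
            return enumerator.MoveNext() && !enumerator.MoveNext();
        }
    }

    public static bool One<T>(this IEnumerable<T> sequence, Func<T, bool> predicate)
    {
        return sequence.Where(predicate).One();
    }

    // Hypothetical helper: counts how many elements the consumer actually pulls.
    public static IEnumerable<int> Counted(IEnumerable<int> source, int[] reads)
    {
        foreach (var x in source)
        {
            reads[0]++;
            yield return x;
        }
    }

    public static void Main()
    {
        var data = new[] { 3, 1, 3, 7, 9 };

        var readsOne = new int[1];
        bool exactlyOne = Counted(data, readsOne).One(x => x == 3);
        Console.WriteLine(exactlyOne + " after " + readsOne[0] + " reads");   // False after 3 reads

        var readsCount = new int[1];
        bool viaCount = Counted(data, readsCount).Count(x => x == 3) == 1;
        Console.WriteLine(viaCount + " after " + readsCount[0] + " reads");   // False after 5 reads
    }
}
```

One() stops as soon as the second match is seen, whereas Count() always walks the whole sequence.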
I think you're overthinking this.
var targetNumber = 3;
var hasExactlyOne = someIntArray.Count(i => i == targetNumber) == 1;
Using LINQ expression:
var duplicates = from i in new int[] { 2,3,4,4,5,5 }
group i by i into g
where g.Count() > 1
select g.Key;
Results:
{4,5}
And of course you could check duplicates.Count() > 0 or log the ones that are a problem or whatever you need to do.
got it working:
if(someIntArray.Single(item => item == 3) > 0) // note: Single throws InvalidOperationException unless exactly one element matches
doh

Sort one list by another

I have two list objects: one is just a list of ints, the other is a list of objects that have an ID property.
What I want to do is sort the list of objects by ID in the same order as the list of ints.
I've been playing around for a while now trying to get it working, so far no joy.
Here is what i have so far...
//**************************
//*** Randomize the list ***
//**************************
if (Session["SearchResultsOrder"] != null)
{
// save the session as a int list
List<int> IDList = new List<int>((List<int>)Session["SearchResultsOrder"]);
// the saved list session exists, make sure the list is ordered by this
foreach(var i in IDList)
{
SearchData.ReturnedSearchedMembers.OrderBy(x => x.ID == i);
}
}
else
{
// before any sorts randomize the results - this mixes it up a bit as before it would order the results by member registration date
List<Member> RandomList = new List<Member>(SearchData.ReturnedSearchedMembers);
SearchData.ReturnedSearchedMembers = GloballyAvailableMethods.RandomizeGenericList<Member>(RandomList, RandomList.Count).ToList();
// save the order of these results so they can be restored back during postback
List<int> SearchResultsOrder = new List<int>();
SearchData.ReturnedSearchedMembers.ForEach(x => SearchResultsOrder.Add(x.ID));
Session["SearchResultsOrder"] = SearchResultsOrder;
}
The whole point of this is so when a user searches for members, initially they display in a random order, then if they click page 2, they remain in that order and the next 20 results display.
I have been reading about the IComparer I can use as a parameter in the LINQ OrderBy clause, but I can’t find any simple examples.
I’m hoping for an elegant, very simple LINQ style solution, well I can always hope.
Any help is most appreciated.
Another LINQ-approach:
var orderedByIDList = from i in ids
join o in objectsWithIDs
on i equals o.ID
select o;
One way of doing it:
List<int> order = ....;
List<Item> items = ....;
Dictionary<int,Item> d = items.ToDictionary(x => x.ID);
List<Item> ordered = order.Select(i => d[i]).ToList();
Not an answer to this exact question, but if you have two arrays, there is an overload of Array.Sort that takes the array to sort, and an array to use as the 'key'
https://msdn.microsoft.com/en-us/library/85y6y2d3.aspx
Array.Sort Method (Array, Array)
Sorts a pair of one-dimensional Array objects (one contains the keys
and the other contains the corresponding items) based on the keys in
the first Array using the IComparable implementation of each key.
Join is the best candidate if you want to match on the exact integer (if no match is found you get an empty sequence). If you want to merely get the sort order of the other list (and provided the number of elements in both lists are equal), you can use Zip.
var result = objects.Zip(ints, (o, i) => new { o, i})
.OrderBy(x => x.i)
.Select(x => x.o);
Pretty readable.
Here is an extension method which encapsulates Simon D.'s response for lists of any type.
public static IEnumerable<TResult> SortBy<TResult, TKey>(this IEnumerable<TResult> sortItems,
IEnumerable<TKey> sortKeys,
Func<TResult, TKey> matchFunc)
{
return sortKeys.Join(sortItems,
k => k,
matchFunc,
(k, i) => i);
}
Usage is something like:
var sorted = toSort.SortBy(sortKeys, i => i.Key);
One possible solution:
myList = myList.OrderBy(x => Ids.IndexOf(x.Id)).ToList();
Note: use this only when working with in-memory lists. It doesn't work for the IQueryable type, as IQueryable does not contain a definition for IndexOf.
docs = docs.OrderBy(d => docsIds.IndexOf(d.Id)).ToList();
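Because IndexOf scans the ID list for every element, a variant that looks positions up in a dictionary may scale better for long lists. This is a sketch under stated assumptions, not the answerer's code: Item, OrderLike, and OrderByIdListDemo are hypothetical names, the IDs in the list are assumed to be unique, and every item's Id is assumed to appear in the list (otherwise the dictionary lookup throws KeyNotFoundException, unlike IndexOf, which returns -1).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderByIdListDemo
{
    // Hypothetical item type; only the Id property matters for the ordering.
    public class Item
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }

    public static List<Item> OrderLike(IEnumerable<Item> items, IList<int> ids)
    {
        // Record the position of every ID once, so each later lookup is a dictionary hit.
        var position = new Dictionary<int, int>();
        for (int i = 0; i < ids.Count; i++)
            position[ids[i]] = i;

        // Assumption: every item's Id is present in ids.
        return items.OrderBy(x => position[x.Id]).ToList();
    }

    public static void Main()
    {
        var ids = new List<int> { 3, 1, 2 };
        var items = new List<Item>
        {
            new Item { Id = 1, Name = "a" },
            new Item { Id = 2, Name = "b" },
            new Item { Id = 3, Name = "c" },
        };

        var ordered = OrderLike(items, ids);
        Console.WriteLine(string.Join(", ", ordered.Select(x => x.Name))); // c, a, b
    }
}
```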

Return Modal Average in LINQ (Mode)

I am not sure if CopyMost is the correct term to use here, but it's the term my client used ("CopyMost Data Protocol"). Sounds like he wants the mode? I have a set of data:
Increment Value
.02 1
.04 1
.06 1
.08 2
.10 2
I need to return which Value occurs the most "CopyMost". In this case, the value is 1. Right now I had planned on writing an Extension Method for IEnumerable to do this for integer values. Is there something built into Linq that already does this easily? Or is it best for me to write an extension method that would look something like this
records.CopyMost(x => x.Value);
EDIT
Looks like I am looking for the modal average. I've provided an updated answer that allows for a tiebreaker condition. It's meant to be used like this, and is generic.
records.CopyMost(x => x.Value, x => x == 0);
In this case x.Value would be an int, and if the count of 0s was the same as the counts of 1s and 3s, it would tiebreak on 0.
Well, here's one option:
var query = (from item in data
group 1 by item.Value into g
orderby g.Count() descending
select g.Key).First();
Basically we're using GroupBy to group by the value - but all we're interested in for each group is the size of the group and the key (which is the original value). We sort the groups by size, and take the first element (the one with the most elements).
Does that help?
Jon beat me to it, but the term you're looking for is Modal Average.
Edit:
If I'm right in thinking that it's the modal average you need, then the following should do the trick:
var i = (from t in data
group t by t.Value into aggr
orderby aggr.Count() descending
select aggr.Key).First();
This method has been updated several times in my code over the years. It's become a very important method, and is much different than it used to be. I wanted to provide the most up-to-date version in case anyone was looking to add CopyMost or a Modal Average as a LINQ extension.
One thing I did not think I would need was a tiebreaker of some sort. I have now overloaded the method to include a tiebreaker.
public static K CopyMost<T, K>(this IEnumerable<T> records, Func<T, K> propertySelector, Func<K, bool> tieBreaker)
{
var grouped = records.GroupBy(x => propertySelector(x)).Select(x => new { Group = x, Count = x.Count() });
var maxCount = grouped.Max(x => x.Count);
var subGroup = grouped.Where(x => x.Count == maxCount);
if (subGroup.Count() == 1)
return subGroup.Single().Group.Key;
else
return subGroup.Where(x => tieBreaker(x.Group.Key)).Single().Group.Key;
}
The above assumes the user enters a legitimate tiebreaker condition. You may want to check and see if the tiebreaker returns a valid value, and if not, throw an exception. And here's my normal method.
public static K CopyMost<T, K>(this IEnumerable<T> records, Func<T, K> propertySelector)
{
return records.GroupBy(x => propertySelector(x)).OrderByDescending(x => x.Count()).Select(x => x.Key).First();
}
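As a quick check, the single-argument method can be run against the table from the question. This is a usage sketch only: the Record type, SampleData, and the class name ModeDemo are hypothetical, and CopyMost is repeated so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModeDemo
{
    // Copy of the single-argument CopyMost from the answer above.
    public static K CopyMost<T, K>(this IEnumerable<T> records, Func<T, K> propertySelector)
    {
        return records.GroupBy(x => propertySelector(x))
                      .OrderByDescending(g => g.Count())
                      .Select(g => g.Key)
                      .First();
    }

    // Hypothetical record type mirroring the Increment/Value columns in the question.
    public class Record
    {
        public decimal Increment { get; set; }
        public int Value { get; set; }
    }

    public static List<Record> SampleData()
    {
        return new List<Record>
        {
            new Record { Increment = 0.02m, Value = 1 },
            new Record { Increment = 0.04m, Value = 1 },
            new Record { Increment = 0.06m, Value = 1 },
            new Record { Increment = 0.08m, Value = 2 },
            new Record { Increment = 0.10m, Value = 2 },
        };
    }

    public static void Main()
    {
        Console.WriteLine(SampleData().CopyMost(r => r.Value)); // 1
    }
}
```

Two behaviours are worth knowing: First() throws on an empty sequence, and because OrderByDescending is a stable sort, a tie between groups resolves to the value that appears first in the input.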
