LINQ query that combines grouping and sorting - c#

I am relatively new to LINQ and currently working on a query that combines grouping and sorting. I am going to start with an example here. Basically I have an arbitrary sequence of numbers represented as strings:
List<string> sNumbers = new List<string> {"34521", "38450", "138477", "38451", "28384", "13841", "12345"};
I need to find all sNumbers in this list that contain a search pattern (say "384"), then return the filtered sequence such that the sNumbers that start with the search pattern ("384") are sorted first, followed by the remaining sNumbers that contain the search pattern somewhere. So it will be like this (please also notice the alphabetical sort within the groups):
{"38450", "38451", "13841", "28384", "138477"}
Here is how I have started:
outputlist = (from n in sNumbers
where n.Contains(searchPattern)
select n).ToList();
So now we have all numbers that contain the search pattern, and this is where I am stuck. I know that at this point I need to 'group' the results into two sequences: one that starts with the search pattern and one that doesn't. Then I need to apply a secondary alphabetical sort within each group. How do I write a query that combines all that?

I don't think you need any grouping or list splitting to get your desired result, so instead of an answer about combining and grouping, I will post what I would do to get the desired result:
sNumbers.Where(x => x.Contains(pattern))
.OrderByDescending(x => x.StartsWith(pattern)) // first criterion: strings that start with the pattern come first
.ThenBy(x => Convert.ToInt32(x)) // this does the trick instead of GroupBy: sorts numerically within each group
.ToList();
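A self-contained sketch of this approach, assuming the sample list and the pattern "384" from the question. The class and method names are illustrative, and it assumes every entry parses as an int.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PatternSort
{
    // Keeps strings that contain the pattern, puts those that start with it first,
    // then orders each group by numeric value (assumes every entry parses as an int).
    public static List<string> StartsWithFirstNumeric(IEnumerable<string> source, string pattern)
    {
        return source
            .Where(x => x.Contains(pattern))
            // bool keys: false < true, so descending order puts StartsWith == true first
            .OrderByDescending(x => x.StartsWith(pattern))
            // secondary key: numeric value within each group
            .ThenBy(x => int.Parse(x))
            .ToList();
    }
}
```

With the question's data, PatternSort.StartsWithFirstNumeric(sNumbers, "384") gives {"38450", "38451", "13841", "28384", "138477"}, the order originally asked for.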

This seems fairly straightforward, unless I've misunderstood something:
List<string> outputlist =
sNumbers
.Where(n => n.Contains("384"))
.OrderBy(n => int.Parse(n))
.OrderByDescending(n => n.StartsWith("384"))
.ToList();
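With the sample data this gives the expected order. It works because OrderByDescending is a stable sort: it keeps the numeric order produced by the preceding OrderBy within each group. Writing OrderByDescending(...).ThenBy(...) expresses the same intent more directly.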

var result = sNumbers
.Where(e => e.StartsWith("384"))
.OrderBy(e => Int32.Parse(e))
.Union(sNumbers
.Where(e => e.Contains("384"))
.OrderBy(e => Int32.Parse(e)));

Here is an optimized version that needs only one LINQ statement:
string match = "384";
List<string> sNumbers = new List<string> {"34521", "38450", "138477", "38451", "28384", "13841", "12345"};
// That's all it is
var result =
(from x in sNumbers
group x by new { Start = x.StartsWith(match), Contain = x.Contains(match)}
into g
where g.Key.Start || g.Key.Contain
orderby !g.Key.Start
select g.OrderBy(Convert.ToInt32)).SelectMany(x => x);
result.ToList().ForEach(x => Console.Write(x + " "));
Steps:
1.) Group into group g based on StartsWith and Contains
2.) Just select those groups which contain the match
3.) Order by the inverse of the StartsWith key (So that StartsWith = true comes before StartsWith = false)
4.) Select the sorted list of elements of both groups
5.) Do a flatMap (SelectMany) over both lists to receive one final result list
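The five steps above map one-to-one onto method syntax. A sketch, assuming the same sample data (the class and method names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupedPatternSort
{
    public static List<string> Run(IEnumerable<string> source, string match)
    {
        return source
            // 1. Group by (StartsWith, Contains); anonymous types compare by value
            .GroupBy(x => new { Start = x.StartsWith(match), Contain = x.Contains(match) })
            // 2. Keep only the groups whose members contain the match
            .Where(g => g.Key.Start || g.Key.Contain)
            // 3. false sorts before true, so ordering by !Start puts the StartsWith group first
            .OrderBy(g => !g.Key.Start)
            // 4 + 5. Sort each group numerically and flatten the groups into one list
            .SelectMany(g => g.OrderBy(x => int.Parse(x)))
            .ToList();
    }
}
```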
Here is an unoptimized version:
string match = "384";
List<string> sNumbers = new List<string> {"34521", "38450", "138477", "38451", "28384", "13841", "12345"};
var matching = from x in sNumbers
where x.StartsWith(match)
orderby Convert.ToInt32(x)
select x;
var nonMatching = from x in sNumbers
where !x.StartsWith(match) && x.Contains(match)
orderby Convert.ToInt32(x)
select x;
var result = matching.Concat(nonMatching);
result.ToList().ForEach(x => Console.Write(x + " "));

LINQ has an OrderBy overload that lets you pass a custom IComparer<TKey> that decides how items are sorted. Look here: https://msdn.microsoft.com/en-us/library/bb549422(v=vs.100).aspx
Then you can write your own IComparer class that takes the preferred value in its constructor and whose Compare method ranks values that start with that value first.
Something like this maybe:
using System.Collections.Generic; // for IComparer<T>
public class CompareStringsWithPreference : IComparer<string> {
private readonly string _valueToPrefer;
public CompareStringsWithPreference(string valueToPrefer) {
_valueToPrefer = valueToPrefer;
}
public int Compare(string s1, string s2) {
bool s1Preferred = s1.StartsWith(_valueToPrefer);
bool s2Preferred = s2.StartsWith(_valueToPrefer);
// Both or neither start with the preferred value: compare alphabetically, ignoring case
if (s1Preferred == s2Preferred)
return string.Compare(s1, s2, true);
// Exactly one starts with it: that one sorts first
return s1Preferred ? -1 : 1;
}
}
Then use it like this:
outputlist = (from n in sNumbers
where n.Contains(searchPattern)
select n).OrderBy(n => n, new CompareStringsWithPreference(searchPattern)).ToList();
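If you are on .NET Framework 4.5 or later (or .NET Core), Comparer<T>.Create can build an equivalent comparer from a lambda, without a named class. A sketch with the same preference logic as the class above; the class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PreferenceSort
{
    // Comparer that ranks strings starting with 'prefix' first,
    // then falls back to a case-insensitive string comparison.
    public static IComparer<string> PreferPrefix(string prefix) =>
        Comparer<string>.Create((s1, s2) =>
        {
            bool p1 = s1.StartsWith(prefix);
            bool p2 = s2.StartsWith(prefix);
            if (p1 != p2)
                return p1 ? -1 : 1;              // exactly one has the prefix: it comes first
            return string.Compare(s1, s2, true); // same preference: compare ignoring case
        });

    public static List<string> Filter(IEnumerable<string> source, string pattern) =>
        source
            .Where(n => n.Contains(pattern))
            .OrderBy(n => n, PreferPrefix(pattern))
            .ToList();
}
```

Note that this comparer sorts alphabetically, not numerically, so "138477" comes before "28384".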

You can create one list of strings that start with searchPattern and another of strings that contain searchPattern but don't start with it (so that no element appears in both lists):
string searchPattern = "384";
List<string> sNumbers = new List<string> { "34521", "38450", "138477", "38451", "28384", "13841", "12345" };
var list1 = sNumbers.Where(s => s.StartsWith(searchPattern)).OrderBy(s => s).ToList();
var list2 = sNumbers.Where(s => !s.StartsWith(searchPattern) && s.Contains(searchPattern)).OrderBy(s => s).ToList();
var outputList = new List<string>();
outputList.AddRange(list1);
outputList.AddRange(list2);

Sorry guys, after reading through the responses, I realize that I made a mistake in my question. The correct answer would be as follows (sort by "starts with" first and then alphabetically, not numerically):
// output: {"38450", "38451", "13841", "138477", "28384"}
I was able to achieve that with the following query:
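string searchPattern = "384";
List<string> result =
sNumbers
.Where(n => n.Contains(searchPattern))
.OrderBy(s => !s.StartsWith(searchPattern)) // false sorts before true, so "starts with" entries come first
.ThenBy(s => s) // then alphabetically within each group
.ToList();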
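For reference, a self-contained version of this final query. One deliberate change, labeled as an assumption: it uses ordinal comparisons for StartsWith and ThenBy so the result does not depend on the current culture (for the sample data the order is the same either way). The class and method names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FinalQuery
{
    public static List<string> Run(IEnumerable<string> sNumbers, string searchPattern) =>
        sNumbers
            .Where(n => n.Contains(searchPattern))
            // false sorts before true, so "starts with" entries come first
            .OrderBy(s => !s.StartsWith(searchPattern, StringComparison.Ordinal))
            // then character-by-character (alphabetical), not numeric
            .ThenBy(s => s, StringComparer.Ordinal)
            .ToList();
}
```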
Thanks

Related

Select list elements contained in another list in linq

I have a string with "|" separators:
string s = "item1|item2|item3|item4";
a list of objects that each have a name and value:
//object
List<ItemObject> itemList = new List<ItemObject>();
itemList.Add(new ItemObject{Name="item0",Value=0});
itemList.Add(new ItemObject{Name="item1",Value=1});
//class
public class ItemObject {
public string Name {get;set;}
public int Value {get;set;}
}
How could the following code be done in one line in linq?
var newList = new List<object>();
foreach (var item in s.Split("|"))
{
newList.Add(itemList.FirstOrDefault(x => x.Name == item));
}
// Result: newList
// {Name="item1",Value=1}
I would suggest splitting the string once, at the beginning, so that it isn't split again on every iteration:
List<ItemObject> newList = s
.Split("|")
.SelectMany(x => itemList.Where(i => i.Name == x))
.ToList();
Or even better:
List<ItemObject> newList = s
.Split("|") // we can also pass a second argument: StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries (TrimEntries needs .NET 5+)
.Distinct() // remove possible duplicates; we can also specify a comparer, e.g. StringComparer.CurrentCulture
.SelectMany(x => itemList
.Where(i => string.Equals(i.Name, x))) // it is better to use string.Equals; we can pass a comparison as the third argument, e.g. StringComparison.CurrentCulture
.ToList();
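A self-contained sketch of the split-first approach, assuming the ItemObject class from the question. It uses Split('|') with a char separator, which compiles on both .NET Framework and .NET Core, and an ordinal name comparison; both are choices, not requirements. The helper name FindByNames is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class ItemObject
{
    public string Name { get; set; }
    public int Value { get; set; }
}

public static class ItemLookup
{
    // Splits once, drops duplicate names, then collects every item whose Name
    // equals a requested name (ordinal comparison), in the order the names appear in 's'.
    // Names with no matching item simply contribute nothing (no nulls in the result).
    public static List<ItemObject> FindByNames(string s, IEnumerable<ItemObject> itemList) =>
        s.Split('|')
         .Distinct()
         .SelectMany(name => itemList.Where(i => string.Equals(i.Name, name, StringComparison.Ordinal)))
         .ToList();
}
```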
Try this:
var newList = itemList.Where(item => s.Split('|').Contains(item.Name));
The proposed solution also avoids populating newList with nulls for items that are not present. You may also want to consider a stricter string equality check.
string s = "item1|item2|item3|item4";
I don't see a need to split the string s, so you could simply do:
var newList = itemList.Where(i => s.Contains(i.Name));
To guard against partial matches (for example, a name "item1" also matching inside "item10"), you can also do:
s = "|" + s + "|";
var newList = itemList.Where(o => s.Contains("|" + o.Name + '|')).ToList();
List<object> newList = itemList.Where(item => s.Split("|").Contains(item.Name)).ToList<object>();

Compare two lists from user

I have a predefined list, List<string> words. Say it has 7 elements:
List<string> resourceList = new List<string> {"xyz","dfgabr","asxy", "abec","def","geh","mnbj"};
Say the user gives the input "xy+ ab", i.e. they want to search for "xy" or "ab":
string searchword="xy+ ab";
Then I have to find all the words in the predefined list that contain "xy" or "ab", i.e. any of the terms separated by '+'.
So, the output will have:
{"xyz","dfgabr","abec",""}
I am trying something like:
resourceList.Where(s => s.Name.ToLower().Contains(searchWords.Any().ToString().ToLower())).ToList()
But I am unable to write the LINQ query because there are two arrays. One approach I saw was to concatenate the two arrays and then try, but since my second array only contains parts of the strings in the first array, my LINQ does not work.
You first need to split your search pattern on the + sign; then you can easily find the items in the list that contain any of the search terms:
var result = resourceList.Where(x => searchword.Split('+').Any(y => x.Contains(y.Trim()))).ToList();
Where:
Your resourceList is
List<string> resourceList = new List<string> { "xyz", "dfgabr", "asxy", "abec", "def", "geh", "mnbj" };
And search pattern is,
string searchword = "xy+ ab";
Try the following, which doesn't need Regex:
List<string> resourceList= new List<string>() {"xyz","dfgabr","asxy","abec","def","geh","mnbj"};
List<string> searchPattern = new List<string>() {"xy","ab"};
List<string> results = resourceList.Where(r => searchPattern.Any(s => r.Contains(s))).ToList();
You can try querying with the help of LINQ:
List<string> resourceList = new List<string> {
"xyz", "dfgabr", "asxy", "abec", "def", "geh", "mnbj"
};
string input = "xy+ ab";
string[] toFind = input
.Split('+')
.Select(item => item.Trim()) // we are looking for "ab", not for " ab"
.ToArray();
// {"xyz", "dfgabr", "asxy", "abec"}
string[] result = resourceList
.Where(item => toFind
.Any(find => item.IndexOf(find) >= 0))
.ToArray();
// Let's have a look at the array
Console.Write(string.Join(", ", result));
Outcome:
xyz, dfgabr, asxy, abec
If you want to ignore case, add the StringComparison.OrdinalIgnoreCase parameter to IndexOf:
string[] result = resourceList
.Where(item => toFind
.Any(find => item.IndexOf(find, StringComparison.OrdinalIgnoreCase) >= 0))
.ToArray();
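One gotcha worth guarding against with the split-and-trim approach: if the input contains an empty term (e.g. "xy++ab" or a trailing '+'), the trimmed term is "", and IndexOf("") returns 0, so every word would match. A sketch that drops empty terms; the class and method names are illustrative.

```csharp
using System;
using System.Linq;

public static class TermSearch
{
    // Splits the user input on '+', trims each term and drops empty ones,
    // then keeps the words that contain any remaining term (case-insensitive).
    public static string[] Find(string[] resourceList, string input)
    {
        string[] toFind = input
            .Split('+')
            .Select(term => term.Trim())
            .Where(term => term.Length > 0) // an empty term would match every word
            .ToArray();

        return resourceList
            .Where(item => toFind.Any(find => item.IndexOf(find, StringComparison.OrdinalIgnoreCase) >= 0))
            .ToArray();
    }
}
```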

C# Compare one List string with substring of other list string

I have two lists
List<string> ingoreEducationKeywords= new List<string>(){"Uni", "School", "College",};
List<string> userEducation= new List<string>(){"MCS", "BCS", "School of Arts","College of Medicine"};
Now I want to get the entries that do not contain any of the keywords in the ignore list as a substring.
Required list: {"MCS", "BCS"}
That's a relatively straightforward query that can be constructed with Any or All, depending on your preferences:
var res = userEducation
.Where(s => !ingoreEducationKeywords.Any(ignored => s.Contains(ignored)))
.ToList();
or
var res = userEducation
.Where(s => ingoreEducationKeywords.All(ignored => !s.Contains(ignored)))
.ToList();
If the lists are very large, you could improve performance by using regex to match all words simultaneously:
var regex = new Regex(
string.Join("|", ingoreEducationKeywords.Select(Regex.Escape))
);
var res = userEducation.Where(s => !regex.IsMatch(s)).ToList();
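A self-contained sketch putting the Any-based filter and the regex alternative side by side, so you can check that they agree on the sample data (the names are illustrative). One difference to be aware of: if the keyword list is empty, the joined pattern is the empty regex, which matches every string, so the regex version would drop everything while the Any version keeps everything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordFilter
{
    // Keeps entries that contain none of the keywords ("none" == "not any").
    public static List<string> WithAny(IEnumerable<string> entries, IEnumerable<string> keywords) =>
        entries.Where(s => !keywords.Any(k => s.Contains(k))).ToList();

    // Same filter via one alternation pattern; Regex.Escape keeps each keyword literal.
    public static List<string> WithRegex(IEnumerable<string> entries, IEnumerable<string> keywords)
    {
        var regex = new Regex(string.Join("|", keywords.Select(Regex.Escape)));
        return entries.Where(s => !regex.IsMatch(s)).ToList();
    }
}
```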
It's a matter of phrasing what you want in a way that leads to a natural translation into LINQ:
You want items from userEducation (that suggests you'll start with userEducation)
Where none of ignoreEducationKeywords are substrings.
"None" is equivalent to "not any"
To check for substrings you can use Contains
That leads to:
var query = userEducation
.Where(candidate => !ignoredKeyWords.Any(ignore => candidate.Contains(ignore)));
The same thought process can help in many other queries.
Another option would be to create your own None extension method, assuming you're using LINQ to Objects:
using System;
using System.Collections.Generic;
using System.Linq;
public static class Extensions
{
public static bool None<T>(this IEnumerable<T> source, Func<T, bool> predicate)
=> !source.Any(predicate);
}
Then you could rewrite the query without the negation:
var query = userEducation
.Where(candidate => ignoredKeyWords.None(ignore => candidate.Contains(ignore)));
You can use Where, Any and Contains:
var list = userEducation.Where(ed => !ingoreEducationKeywords.Any(ik => ed.Contains(ik)));
It finds all entries in userEducation that have no match in ingoreEducationKeywords.
List<string> ingoreEducationKeywords = new List<string>() { "Uni", "School", "College", };
List<string> userEducation = new List<string>() { "MCS", "BCS", "School of Arts", "College of Medicine" };
var result = userEducation.Where(r => !ingoreEducationKeywords.Any(t => r.Contains(t))).ToList();

LINQ Query to find string of multidimensional array with most duplicates

I have written a function that gives me a jagged array (FileCheck[][]) with the results of matching file names against multiple regex strings:
FileCheck[0] // This string[] contains all the filenames
FileCheck[1] // This string[] is "0" or "1", depending on whether a Regex match is found.
FileCheck[2] // This string[] contains the index of the first matching Regex.
foreach (string File in InputFolder)
{
int j = 0;
FileCheck[0][k] = Path.GetFileName(File);
Console.WriteLine(FileCheck[0][k]);
foreach (Regex Filemask in Filemasks)
{
if (string.IsNullOrEmpty(FileCheck[1][k]) || FileCheck[1][k] == "0")
{
if (Filemask.IsMatch(FileCheck[0][k]))
{
FileCheck[1][k] = "1";
FileCheck[2][k] = j.ToString(); // This is the Index of the Regex thats Valid
}
else
{
FileCheck[1][k] = "0";
}
j++;
}
Console.WriteLine(FileCheck[1][k]);
}
k++;
}
Console.ReadLine();
// I need the Index of the Regex with the most valid hits
I'm trying to write a function that gives me the string of the RegexIndex that has the most duplicates.
This is what I tried, but it did not work :( (I only get the count of the string with the most duplicates, not the string itself)
// I need the Index of the Regex with the most valid hits
var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
.Where(x => FileCheck[1][x] == "1")
.GroupBy(x => FileCheck[2][x])
.OrderByDescending(x => x.Count())
.First().ToList();
Console.WriteLine(LINQ[1]);
Example Data
string[][] FileCheck = new string[3][];
FileCheck[0] = new string[]{ "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt"};
FileCheck[1] = new string[]{ "0","1","1","0","1","1"};
FileCheck[2] = new string[]{ null, "3", "3", null,"1","2"};
In this example, I need the LINQ query to return:
string result = "3";
With your current code, substituting 'ToList()' with 'Key' would do the trick.
var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
.Where(x => FileCheck[1][x] == "1")
.GroupBy(x => FileCheck[2][x])
.OrderByDescending(x => x.Count())
.First().Key;
Since the index is null for values that are not found, you could also filter out null values and skip looking at the FileCheck[1] array. For example:
var maxOccurringIndex = FileCheck[2].Where(ind => ind != null)
.GroupBy(ind=>ind)
.OrderByDescending(x => x.Count())
.First().Key;
However, just a suggestion, you can use classes instead of a nested array, e.g.:
class FileCheckInfo
{
public string File{get;set;}
public bool Match => Index.HasValue;
public int? Index{get;set;}
public override string ToString() => $"{File} [{(Match ? Index.ToString() : "no match")}]";
}
Assuming InputFolder is an enumerable of string and Filemasks an enumerable of 'Regex', an array can be filled with:
FileCheckInfo[] FileCheck = InputFolder.Select(f=>
new FileCheckInfo{
File = f,
Index = Filemasks.Select((rx,ind) => new {ind, IsMatch = rx.IsMatch(f)}).FirstOrDefault(r=>r.IsMatch)?.ind
}).ToArray();
Getting the max occurring would be much the same:
var maxOccurringIndex = FileCheck.Where(f=>f.Match).GroupBy(f=>f.Index).OrderByDescending(gr=>gr.Count()).First().Key;
Edit/PS: the above assumes you need to reuse the results. If you only have to find the maximum occurrence, you're much better off with an approach such as Martin suggested!
If the goal is only to get the max occurrence, you can use:
var maxOccurringIndex = Filemasks.Select((rx,ind) => new {ind, Count = InputFolder.Count(f=>rx.IsMatch(f))})
.OrderByDescending(m=>m.Count).FirstOrDefault()?.ind;
Your question and code seem very convoluted. I am guessing that you have a list of file names and another list of file masks (regular expressions), and you want to find the file mask that matches the most file names. Here is a way to do that:
var fileNames = new[] { "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt" };
var fileMasks = new[] { @"\.txt$", @"\.xml$", "valid" };
var fileMaskWithMostMatches = fileMasks
.Select(
fileMask => new {
FileMask = fileMask,
FileNamesMatched = fileNames.Count(
fileName => Regex.Match(
fileName,
fileMask,
RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
)
.Success
)
}
)
.OrderByDescending(x => x.FileNamesMatched)
.First()
.FileMask;
With the sample data, the value of fileMaskWithMostMatches is "valid".
Note that the Regex class does some caching of regular expressions, but if you have many regular expressions it is more efficient to create them outside the implied fileNames.Count for-each loop, to avoid recreating the same regular expression again and again (creating a regular expression can take a non-trivial amount of time, depending on its complexity).
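A sketch of that suggestion, assuming the same fileNames and fileMasks arrays as above: each Regex object is constructed once per mask, before counting, instead of passing the pattern string to the static Regex.Match for every file name. The class and method names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class MaskStats
{
    public static string MaskWithMostMatches(string[] fileNames, string[] fileMasks)
    {
        var options = RegexOptions.IgnoreCase | RegexOptions.CultureInvariant;

        // Build each Regex exactly once, up front
        var compiled = fileMasks
            .Select(mask => new { Mask = mask, Regex = new Regex(mask, options) })
            .ToArray();

        // Reuse the prebuilt Regex for every file name while counting
        return compiled
            .Select(c => new { c.Mask, Count = fileNames.Count(f => c.Regex.IsMatch(f)) })
            .OrderByDescending(x => x.Count)
            .First()
            .Mask;
    }
}
```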
As an alternative to Martin's answer, here's a simpler version of your existing LINQ query that gives the desired result:
var LINQ = FileCheck[2]
.ToLookup(x => x) // Makes a lookup table
.OrderByDescending(x => x.Count()) // Sorts by count, descending
.Select(x => x.Key) // Extract the key
.FirstOrDefault(x => x != null); // Return the first non null key
// or null if none found.
Isn't this much easier?
string result = FileCheck[2]
.Where(x => x != null)
.GroupBy(x => x)
.OrderByDescending(x => x.Count())
.FirstOrDefault()?.Key; // ?. returns null instead of throwing when there are no non-null entries

Modifying an IEnumerable type

I have a string IEnumerable that I get from the code below. The var groups is an IEnumerable that holds some string values. Say there are 4 values in groups and the value in the second position is just an empty string "". How can I move it to the 4th, i.e. the last, position? I do not want to sort or change any other order; I just want to move the empty "" value, wherever it occurs, to the last position.
List<Item> Items = somefunction();
var groups = Items.Select(g => g.Category).Distinct();
Simply order the results by their string value:
List<Item> Items = somefunction();
var groups = Items.Select(g => g.Category).Distinct().OrderByDescending(s => s);
Edit (following OP edit):
List<Item> Items = somefunction();
var groups = Items.Select(g => g.Category).Distinct();
groups = groups.Where(s => !String.IsNullOrEmpty(s))
.Concat(groups.Where(s => String.IsNullOrEmpty(s)));
You can't directly modify the IEnumerable<> instance, but you can create a new one:
var list = groups.Where(x => x != "").Concat(groups.Where(x => x == ""));
Note that in this query, groups is iterated twice. This is usually not a good practice for a deferred IEnumerable<>, so you should call ToList() after the Distinct() to eagerly evaluate your LINQ query:
var groups = Items.Select(g => g.Category).Distinct().ToList();
EDIT:
On second thought, there's a much easier way to do this:
var groups = Items.Select(g => g.Category).Distinct().OrderBy(x => x == "");
Note that this doesn't touch the order of the non-empty elements since OrderBy is stable.
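A small self-contained check of the stable-sort approach. It uses string.IsNullOrEmpty rather than x == "" (an assumption, not part of the answer), so a null category would also move to the end; the helper name MoveEmptyToEnd is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EmptyLast
{
    // OrderBy on a bool key puts false (non-empty) before true (empty or null).
    // OrderBy is a stable sort, so elements with equal keys keep their
    // original relative order; nothing else is reordered.
    public static List<string> MoveEmptyToEnd(IEnumerable<string> values) =>
        values.OrderBy(v => string.IsNullOrEmpty(v)).ToList();
}
```

For example, {"b", "", "a", "c"} becomes {"b", "a", "c", ""}.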
var groups = Items.Select(g => g.Category).Distinct().OrderByDescending(s =>s);
I don't like my query, but it should do the job. It selects all items that are not empty and unions them with the items that are empty.
var groups = Items.Select(g => g.Category).Distinct()
.Where(s => !string.IsNullOrEmpty(s))
.Union(Items.Select(g => g.Category).Distinct()
.Where(s => string.IsNullOrEmpty(s)));
Try something like
var temp = groups.Where(item => ! String.IsNullOrEmpty(item)).ToList<string>();
while (temp.Count < groups.Count()) temp.Add(""); // groups is an IEnumerable, so use the Count() extension method
