LINQ Query to find string of multidimensional array with most duplicates - c#

I have written a function that gives me an multidimensional array of an Match with multiple regex strings. (FileCheck[][])
FileCheck[0] // This string[] contains all the filenames
FileCheck[1] // This string[] is 0 or 1 depending on a Regex match is found.
FileCheck[2] // This string[] contains the Index of the first found Regex.
foreach (string File in InputFolder)
{
int j = 0;
FileCheck[0][k] = Path.GetFileName(File);
Console.WriteLine(FileCheck[0][k]);
foreach (Regex Filemask in Filemasks)
{
if (string.IsNullOrEmpty(FileCheck[1][k]) || FileCheck[1][k] == "0")
{
if (Filemask.IsMatch(FileCheck[0][k]))
{
FileCheck[1][k] = "1";
FileCheck[2][k] = j.ToString(); // This is the Index of the Regex thats Valid
}
else
{
FileCheck[1][k] = "0";
}
j++;
}
Console.WriteLine(FileCheck[1][k]);
}
k++;
}
Console.ReadLine();
// I need the Index of the Regex with the most valid hits
I'm trying to write a function that gives me the string of the RegexIndex that has the most duplicates.
This is what I tried but did not work :( (I only get the count of the string the the most duplicates but not the string itself)
// I need the Index of the Regex with the most valid hits
var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
.Where(x => FileCheck[1][x] == "1")
.GroupBy(x => FileCheck[2][x])
.OrderByDescending(x => x.Count())
.First().ToList();
Console.WriteLine(LINQ[1]);
Example Data
string[][] FileCheck = new string[3][];
FileCheck[0] = new string[]{ "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt"};
FileCheck[1] = new string[]{ "0","1","1","0","1","1"};
FileCheck[2] = new string[]{ null, "3", "3", null,"1","2"};
In this example I need as result of the Linq query:
string result = "3";

With your current code, substituting 'ToList()' with 'Key' would do the trick.
var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
.Where(x => FileCheck[1][x] == "1")
.GroupBy(x => FileCheck[2][x])
.OrderByDescending(x => x.Count())
.First().Key;
Since the index is null for values that are not found, you could also filter out null values and skip looking at the FileCheck[1] array. For example:
var maxOccurringIndex = FileCheck[2].Where(ind => ind != null)
.GroupBy(ind=>ind)
.OrderByDescending(x => x.Count())
.First().Key;
However, just a suggestion, you can use classes instead of a nested array, e.g.:
class FileCheckInfo
{
public string File{get;set;}
public bool Match => Index.HasValue;
public int? Index{get;set;}
public override string ToString() => $"{File} [{(Match ? Index.ToString() : "no match")}]";
}
Assuming InputFolder is an enumerable of string and Filemasks an enumerable of 'Regex', an array can be filled with:
FileCheckInfo[] FileCheck = InputFolder.Select(f=>
new FileCheckInfo{
File = f,
Index = Filemasks.Select((rx,ind) => new {ind, IsMatch = rx.IsMatch(f)}).FirstOrDefault(r=>r.IsMatch)?.ind
}).ToArray();
Getting the max occurring would be much the same:
var maxOccurringIndex = FileCheck.Where(f=>f.Match).GroupBy(f=>f.Index).OrderByDescending(gr=>gr.Count()).First().Key;
edit PS, the above is all assuming you need to reuse the results, if you only have to find the maximum occurrence you're much better of with an approach such as Martin suggested!
If the goal is only to get the max occurrence, you can use:
var maxOccurringIndex = Filemasks.Select((rx,ind) => new {ind, Count = InputFolder.Count(f=>rx.IsMatch(f))})
.OrderByDescending(m=>m.Count).FirstOrDefault()?.ind;

Your question and code seems very convoluted. I am guessing that you have a list of file names and another list of file masks (regular expressions) and you want to find the file mask that matches most file names. Here is a way to do that:
var fileNames = new[] { "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt" };
var fileMasks = new[] { #"\.txt$", #"\.xml$", "valid" };
var fileMaskWithMostMatches = fileMasks
.Select(
fileMask => new {
FileMask = fileMask,
FileNamesMatched = fileNames.Count(
fileName => Regex.Match(
fileName,
fileMask,
RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
)
.Success
)
}
)
.OrderByDescending(x => x.FileNamesMatched)
.First()
.FileMask;
With the sample data the value of fileMaskWithMostMatches is valid.
Note that the Regex class will do some caching of regular expressions but if you have many regular expressions it will be more effecient to create the regular expressions outside the implied fileNames.Count for-each loop to avoid recreating the same regular expression again and again (creating a regular expression may take a non-trivial amount of time depending on the complexity).

As an alternative to Martin's answer, here's a simpler version to your existing Linq query that gives the desired result;
var LINQ = FileCheck[2]
.ToLookup(x => x) // Makes a lookup table
.OrderByDescending(x => x.Count()) // Sorts by count, descending
.Select(x => x.Key) // Extract the key
.FirstOrDefault(x => x != null); // Return the first non null key
// or null if none found.

Isn't this much more easier?
string result = FileCheck[2]
.Where(x => x != null)
.GroupBy(x => x)
.OrderByDescending(x => x.Count())
.FirstOrDefault().Key;

Related

Select list elements contained in another list in linq

I have a string with "|" seperators:
string s = "item1|item2|item3|item4";
a list of objects that each have a name and value:
//object
List<ItemObject> itemList = new List<ItemObject>();
itemList.Add(new ItemObject{Name="item0",Value=0});
itemList.Add(new ItemObject{Name="item1",Value=1});
//class
public class ItemObject(){
public string Name {get;set;}
public int Value {get;set;}
}
How could the following code be done in one line in linq?
var newList = new List<object>();
foreach (var item in s.Split("|"))
{
newList.Add(itemList.FirstOrDefault(x => x.Name == item));
}
// Result: newList
// {Name="item1",Value=1}
I would suggest to start from splitting the string in the beginning. By doing so we won't split it during each iteration:
List<ItemObject> newList = s
.Split("|")
.SelectMany(x => itemList.Where(i => i.Name == x))
.ToList();
Or even better:
List<ItemObject> newList = s
.Split("|") // we can also pass second argument: StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries
.Distinct() // remove possible duplicates, we can also specify comparer f.e. StringComparer.CurrentCulture
.SelectMany(x => itemList
.Where(i => string.Equals(i.Name, x))) // it is better to use string.Equals, we can pass comparison as third argument f.e. StringComparison.CurrentCulture
.ToList();
Try this:
var newList = itemList.Where(item => s.Split('|').Contains(item.Name));
The proposed solution also prevents from populating newList with nulls from nonpresent items. You may also consider a more strict string equality check.
string s = "item1|item2|item3|item4";
I don't see a need for splitting this string s. So you could simply do
var newList = itemList.Where(i => s.Contains(i.Name));
For different buggy input you can also do
s = "|" + s + "|";
var newList = itemList.Where(o => s.Contains("|" + o.Name + '|')).ToList();
List<object> newList = itemList.Where(item => s.Split("|").Contains(item.Name)).ToList<object>();

C# Array of strings contains string part from another array of strings

Is there a way using LINQ, to find if string from one array of strings contains (partial) string from another array of strings? Something like this:
string[] fullStrings = { "full_xxx_part_name", "full_ccc_part_name", "full_zzz_part_name" };
string[] stringParts = { "a_part", "b_part", "c_part", "e_part" };
// compare fullStrings array with stringParts array
// full_ccc_part_name contains c_part (first match is OK, no need to find all)
// return index 1 (index 1 from fullStrings array)
This is asked rather for educational purpose.
I'm aware that Linq does not magically avoid the loop, instead does it in the background.
You can use Where + Any with string methods:
string[] matches = fullStrings
.Where(s => stringParts.Any(s.Contains))
.ToArray();
If you want to compare in a case insensitive way use IndexOf:
string[] matches = fullStrings
.Where(s => stringParts.Any(part => s.IndexOf(part, StringComparison.OrdinalIgnoreCase) >= 0))
.ToArray();
In case you want the indexes:
int[] matches = fullStrings
.Select((s, index) => (String: s, Index: index))
.Where(x => stringParts.Any(x.String.Contains))
.Select(x => x.Index)
.ToArray();
You would of course need to use some type of loop to find the index. Here is a solution using Linq.
This will return the first index if a match is found or -1 if none is found:
var index = fullStrings
.Select((s,i) => (s, i))
.Where(x => stringParts.Any(x.s.Contains))
.Select(x => x.i)
.DefaultIfEmpty(-1)
.First();

Compare and combine strings to get duplicates

I will try to describe my question in the best way I can.
I have a list with X strings ("NOTION", "CATION", "COIN", "NOON").
I am trying to compare them and find the most times each character (letter) was used, use that to get the number of that character, arrange them in alphabetical order, and create a string.
So the result string should be: "ACINNOOT"
Hope is clear what I am describing.
EDIT
So far:
for (int i = 0; i < currentWord.Length; i++)
{
string letter = word.Substring(i, 1);
tempDuplicatedLetterList.Add(letter);
}
// Which letters are repeated and how many times
var duplicatedQuery = tempDuplicatedLetterList.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(y => new { Element = y.Key, Counter = y.Count() })
.ToList();
I came to this, although I think there might be a cleaner way to do it:
var characterSets = new string[] { "NOTION", "CATION", "COIN", "NOON" }
.SelectMany(c => c.GroupBy(cc => cc)) // create character groups for each string, and flatten the groups
.GroupBy(c => c.Key) // group the groups
.OrderBy(cg => cg.Key) // order by the character (alphabetical)
.Select(cg => new string(cg.Key, cg.Max(v => v.Count()))) // create a string for each group, using the maximum count for that character
.ToArray(); // make an array
var result = string.Concat(characterSets);

Using C# Lambda to split string and search value

I have a string with the following value:
0:12211,90:33221,23:09011
In each pair, the first value (before the : (colon)) is an employee id, the second value after is a payroll id.
So If I want to get the payroll id for employee id 23 right now I have to do:
var arrayValues=mystring.split(',');
and then for each arrayValues do the same:
var employeeData = arrayValue.split(':');
That way I will get the key and the value.
Is there a way to get the Payroll ID by a given employee id using lambda?
If the employeeId is not in the string then by default it should return the payrollid for employeeid 0 zero.
Using a Linq pipeline and anonymous objects:
"0:12211,90:33221,23:09011"
.Split(',')
.Select(x => x.Split(':'))
.Select(x => new { employeeId = x[0], payrollId = x[1] })
.Where(x=> x.employeeId == "23")
Results in this:
{
employeeId = "23",
payrollId = "09011"
}
These three lines represent your data processing and projection logic:
.Split(',')
.Select(x => x.Split(':'))
.Select(x => new { employeeId = x[0], payrollId = x[1] })
Then you can add any filtering logic with Where after this the second Select
You can try something like that
"0:12211,90:33221,23:09011"
.Split(new char[] { ',' })
.Select(c => {
var pair = c.Split(new char[] { ':' });
return new KeyValuePair<string, string>(pair[0], pair[1]);
})
.ToList();
You have to be aware of validations of data
If I were you, I'd use a dictionary. Especially if you're going to do more than one lookup.
Dictionary<int, int> employeeIDToPayrollID = "0:12211,90:33221,23:09011"
.Split(',') //Split on comma into ["0:12211", "90:33221", "23:09011"]
.Select(x => x.Split(':')) //Split each string on colon into [ ["0", "12211"]... ]
.ToDictionary(int.Parse(x => x[0]), int.Parse(x => x[1]))
and now, you just have to write employeeIDtoPayrollID[0] to get 12211 back. Notice that int.Parse will throw an exception if your IDs aren't integers. You can remove those calls if you want to have a Dictionary<string, string>.
You can use string.Split along with string.Substring.
var result =
str.Split(',')
.Where(s => s.Substring(0,s.IndexOf(":",StringComparison.Ordinal)) == "23")
.Select(s => s.Substring(s.IndexOf(":",StringComparison.Ordinal) + 1))
.FirstOrDefault();
if this logic will be used more than once then I'd put it to a method:
public string GetPayrollIdByEmployeeId(string source, string employeeId){
return source.Split(',')
.Where(s => s.Substring(0, s.IndexOf(":", StringComparison.Ordinal)) == employeeId)
.Select(s => s.Substring(s.IndexOf(":", StringComparison.Ordinal) + 1))
.FirstOrDefault();
}
Assuming you have more than three pairs in the string (how long is that string, anyway?) you can convert it to a Dictionary and use that going forward.
First, split on the comma and then on the colon and put in a Dictionary:
var empInfo = src.Split(',').Select(p => p.Split(':')).ToDictionary(pa => pa[0], pa => pa[1]);
Now, you can write a function to lookup payroll IDs from employee IDs:
string LookupPayrollID(Dictionary<string, string> empInfo, string empID) => empInfo.TryGetValue(empID, out var prID) ? prID : empInfo["0"];
And you can call it to get the answer:
var emp23prid = LookupPayrollID(empInfo, "23");
var emp32prid = LookupPayrollID(empInfo, "32");
If you just have three employees in the string, creating a Dictionary is probably overkill and a simpler answer may be appropriate, such as searching the string.

LINQ query that combines grouping and sorting

I am relatively new to LINQ and currently working on a query that combines grouping and sorting. I am going to start with an example here. Basically I have an arbitrary sequence of numbers represented as strings:
List<string> sNumbers = new List<string> {"34521", "38450", "138477", "38451", "28384", "13841", "12345"}
I need to find all sNumbers in this list that contain a search pattern (say "384")
then return the filtered sequence such that the sNumbers that start with the search pattern ("384") are sorted first followed by the remaining sNumbers that contain the search pattern somewhere. So it will be like this (please also notice the alphabetical sort with in the groups):
{"38450", "38451", "13841", "28384", "138477"}
Here is how I have started:
outputlist = (from n in sNumbers
where n.Contains(searchPattern
select n).ToList();
So now we have all number that contain the search pattern. And this is where I am stuck. I know that at this point I need to 'group' the results into two sequences. One that start with the search pattern and other that don't. Then apply a secondary sort in each group alphabetically. How do I write a query that combines all that?
I think you don't need any grouping nor list splitting for getting your desired result, so instead of answer about combining and grouping I will post what I would do to get desired result:
sNumbers.Where(x=>x.Contains(pattern))
.OrderByDescending(x => x.StartsWith(pattern)) // first criteria
.ThenBy(x=>Convert.ToInt32(x)) //this do the trick instead of GroupBy
.ToList();
This seems fairly straight forward, unless I've misunderstood something:
List<string> outputlist =
sNumbers
.Where(n => n.Contains("384"))
.OrderBy(n => int.Parse(n))
.OrderByDescending(n => n.StartsWith("384"))
.ToList();
I get this:
var result = sNumbers
.Where(e => e.StartsWith("384"))
.OrderBy(e => Int32.Parse(e))
.Union(sNumbers
.Where(e => e.Contains("384"))
.OrderBy(e => Int32.Parse(e)));
Here the optimized version which only needs one LINQ statement:
string match = "384";
List<string> sNumbers = new List<string> {"34521", "38450", "138477", "38451", "28384", "13841", "12345"};
// That's all it is
var result =
(from x in sNumbers
group x by new { Start = x.StartsWith(match), Contain = x.Contains(match)}
into g
where g.Key.Start || g.Key.Contain
orderby !g.Key.Start
select g.OrderBy(Convert.ToInt32)).SelectMany(x => x);
result.ToList().ForEach(x => Console.Write(x + " "));
Steps:
1.) Group into group g based on StartsWith and Contains
2.) Just select those groups which contain the match
3.) Order by the inverse of the StartsWith key (So that StartsWith = true comes before StartsWith = false)
4.) Select the sorted list of elements of both groups
5.) Do a flatMap (SelectMany) over both lists to receive one final result list
Here an unoptimized version:
string match = "384";
List<string> sNumbers = new List<string> {"34521", "38450", "138477", "38451", "28384", "13841", "12345"};
var matching = from x in sNumbers
where x.StartsWith(match)
orderby Convert.ToInt32(x)
select x;
var nonMatching = from x in sNumbers
where !x.StartsWith(match) && x.Contains(match)
orderby Convert.ToInt32(x)
select x;
var result = matching.Concat(nonMatching);
result.ToList().ForEach(x => Console.Write(x + " "));
Linq has an OrderBy method that allows you give a custom class for deciding how things should be sorted. Look here: https://msdn.microsoft.com/en-us/library/bb549422(v=vs.100).aspx
Then you can write your IComparer class that takes a value in the constructor, then a Compare method that prefers values that start with that value.
Something like this maybe:
public class CompareStringsWithPreference : IComparer<string> {
private _valueToPrefer;
public CompareStringsWithPreference(string valueToPrefer) {
_valueToPrefer = valueToPrefer;
}
public int Compare(string s1, string s2) {
if ((s1.StartsWith(_valueToPrefer) && s2.StartsWith(_valueToPrefer)) ||
(!s1.StartsWith(_valueToPrefer) && !s2.StartsWith(_valueToPrefer)))
return string.Compare(s1, s2, true);
if (s1.StartsWith(_valueToPrefer)) return -1;
if (s2.StartsWith(_valueToPrefer)) return 1;
}
}
Then use it like this:
outputlist = (from n in sNumbers
where n.Contains(searchPattern)
select n).OrderBy(n, new CompareStringsWithPreference(searchPattern))ToList();
You can create a list with strings starting with searchPattern variable and another containing searchPattern but not starting with (to avoid repeating elements in both lists):
string searchPattern = "384";
List<string> sNumbers = new List<string> { "34521", "38450", "138477", "38451", "28384", "13841", "12345" };
var list1 = sNumbers.Where(s => s.StartsWith(searchPattern)).OrderBy(s => s).ToList();
var list2 = sNumbers.Where(s => !s.StartsWith(searchPattern) && s.Contains(searchPattern)).OrderBy(s => s).ToList();
var outputList = new List<string>();
outputList.AddRange(list1);
outputList.AddRange(list2);
Sorry guys, after reading through the responses, I realize that I made a mistake in my question. The correct answer would be as follows: (sort by "starts with" first and then alphabetically (not numerically)
// output: {"38450", "38451", "13841", "138477", "28384"}
I was able to achieve that with the following query:
string searchPattern = "384";
List<string> result =
sNumbers
.Where(n => n.Contains(searchpattern))
.OrderBy(s => !s.StartsWith(searchpattern))
.ThenBy(s => s)
.ToList();
Thanks

Categories