C# Array of strings contains string part from another array of strings

Is there a way, using LINQ, to find whether a string from one array of strings contains a (partial) string from another array of strings? Something like this:
string[] fullStrings = { "full_xxx_part_name", "full_ccc_part_name", "full_zzz_part_name" };
string[] stringParts = { "a_part", "b_part", "c_part", "e_part" };
// compare fullStrings array with stringParts array
// full_ccc_part_name contains c_part (first match is OK, no need to find all)
// return index 1 (index 1 from fullStrings array)
I am asking mostly for educational purposes.
I'm aware that LINQ does not magically avoid the loop; it just runs it behind the scenes.

You can use Where + Any with string methods:
string[] matches = fullStrings
.Where(s => stringParts.Any(s.Contains))
.ToArray();
If you want to compare in a case-insensitive way, use IndexOf:
string[] matches = fullStrings
.Where(s => stringParts.Any(part => s.IndexOf(part, StringComparison.OrdinalIgnoreCase) >= 0))
.ToArray();
In case you want the indexes:
int[] matches = fullStrings
.Select((s, index) => (String: s, Index: index))
.Where(x => stringParts.Any(x.String.Contains))
.Select(x => x.Index)
.ToArray();

You would of course need to use some type of loop to find the index. Here is a solution using Linq.
This will return the first index if a match is found or -1 if none is found:
var index = fullStrings
.Select((s,i) => (s, i))
.Where(x => stringParts.Any(x.s.Contains))
.Select(x => x.i)
.DefaultIfEmpty(-1)
.First();
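For the single-index case, a non-LINQ alternative is Array.FindIndex, which already returns -1 when nothing matches. A minimal sketch using the question's sample data (the variable name firstMatch is illustrative, and the top-level-statement layout assumes C# 9 or later):

```csharp
using System;
using System.Linq;

string[] fullStrings = { "full_xxx_part_name", "full_ccc_part_name", "full_zzz_part_name" };
string[] stringParts = { "a_part", "b_part", "c_part", "e_part" };

// Index of the first full string that contains any of the parts, or -1 if none does.
int firstMatch = Array.FindIndex(
    fullStrings,
    s => stringParts.Any(part => s.Contains(part)));

Console.WriteLine(firstMatch); // 1 for the sample data
```

Like the LINQ version, this stops at the first matching element, so later strings are never inspected.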

Related

Compare and combine strings to get duplicates

I will try to describe my question in the best way I can.
I have a list with X strings ("NOTION", "CATION", "COIN", "NOON").
For each character (letter), I want to find the highest number of times it occurs in any single word, repeat the character that many times, arrange the characters in alphabetical order, and build a string.
So the result string should be: "ACINNOOT"
I hope it is clear what I am describing.
EDIT
So far:
for (int i = 0; i < currentWord.Length; i++)
{
string letter = currentWord.Substring(i, 1);
tempDuplicatedLetterList.Add(letter);
}
// Which letters are repeated and how many times
var duplicatedQuery = tempDuplicatedLetterList.GroupBy(x => x)
.Where(g => g.Count() > 1)
.Select(y => new { Element = y.Key, Counter = y.Count() })
.ToList();

I came to this, although I think there might be a cleaner way to do it:
var characterSets = new string[] { "NOTION", "CATION", "COIN", "NOON" }
.SelectMany(c => c.GroupBy(cc => cc)) // create character groups for each string, and flatten the groups
.GroupBy(c => c.Key) // group the groups
.OrderBy(cg => cg.Key) // order by the character (alphabetical)
.Select(cg => new string(cg.Key, cg.Max(v => v.Count()))) // create a string for each group, using the maximum count for that character
.ToArray(); // make an array
var result = string.Concat(characterSets);
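The same idea can also be sketched with an explicit dictionary of per-letter maxima, which may be easier to step through in a debugger. The names words and maxCounts are assumptions for illustration; the input is the question's sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] words = { "NOTION", "CATION", "COIN", "NOON" };

// For every letter, remember the highest count seen in any single word.
var maxCounts = new SortedDictionary<char, int>();
foreach (string word in words)
{
    foreach (var group in word.GroupBy(ch => ch))
    {
        int count = group.Count();
        if (!maxCounts.TryGetValue(group.Key, out int best) || count > best)
        {
            maxCounts[group.Key] = count;
        }
    }
}

// SortedDictionary enumerates its keys in ascending order, so the letters come out alphabetically.
string combined = string.Concat(maxCounts.Select(kv => new string(kv.Key, kv.Value)));
Console.WriteLine(combined); // ACINNOOT
```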

How to compare 2 comma separated string values and update in existing list at same position?

I have a list of strings, and I want to compare the values at two positions in the list and remove the matching items.
Code:
var list = new List<string>();
list.Add("Employee1");
list.Add("Account");
list.Add("100.5600,A+ ,John");
list.Add("1.00000,A+ ,John");
list.Add("USA");
Now I want to compare the entries at index 2 and index 3:
list.Add("100.5600,A+ ,John");
list.Add("1.00000,A+ ,John");
Compare the two records above and remove the matching parts, as below:
Expected output:
list.Add("100.5600");
list.Add("1.00000");
This is how I am trying to do it:
var source = list[2].Split(',').Select(p => p.Trim());
var target = list[3].Split(',').Select(p => p.Trim());
var result = source.Except(target);
But the problem is that I am only getting 100.5600 as output.
Is it possible to compare the records and write the non-matching parts back into the existing list?

How about this "beauty"
var list = new List<string>();
list.Add("Employee1");
list.Add("Account");
list.Add("100.5600,A+ ,John");
list.Add("1.00000,A+ ,John");
list.Add("USA");
//prepare the list: pair each original string with its split array in a tuple
var preparedItems = list.Select(x => (x, x.Split(',')));
//group the prepared items by everything after the first part of the split (hence .Skip(1)), so items whose remaining parts match land in the same group
var groupedItems = preparedItems.GroupBy(x => string.Join(",", x.Item2.Skip(1).Select(y => y.Trim())));
//evaluate each group: if it holds more than one item, keep only the first part of each split array; otherwise keep the original string
var evaluatedItems = groupedItems.SelectMany(x => x.Count() > 1 ? x.Select(y => y.Item2[0]) : x.Select(y => y.Item1));
//replace the original list with the new result
list = evaluatedItems.ToList();
Edit - preserve original order:
//extended the prepare step with a third element, the index, to keep track of the ordering of the original list
//so the tuple now consists of 3 parts instead of 2: ([item], [index], [splitArray])
var preparedItems = list.Select((x, i) => (x, i, x.Split(',')));
//changed to use Item3 instead of Item2, since the array is now in third position
var groupedItems = preparedItems.GroupBy(x => string.Join(",", x.Item3.Skip(1).Select(y => y.Trim())));
//instead of returning the plain string here already, return a tuple with the index (y.Item2) and the correct string
var evaluatedItems = groupedItems.SelectMany(x => x.Count() > 1 ? x.Select(y => (y.Item2, y.Item3[0])) : x.Select(y => (y.Item2, y.Item1)));
//now order by the new tuple's Item1 (the index) and return only Item2 (the string)
var orderedItems = evaluatedItems.OrderBy(x => x.Item1).Select(x => x.Item2);
list = orderedItems.ToList();
//one-liner - isn't that a beauty
list = list.Select((x, i) => (x, i, x.Split(','))).GroupBy(x => string.Join(",", x.Item3.Skip(1).Select(y => y.Trim()))).SelectMany(x => x.Count() > 1 ? x.Select(y => (y.Item2, y.Item3[0])) : x.Select(y => (y.Item2, y.Item1))).OrderBy(x => x.Item1).Select(x => x.Item2).ToList();

You can get it easily by checking which items in one are not contained in the other:
var result = source.Where(x => !target.Contains(x));
To update your old list (source is already declared, so assign the joined string back to the list entry):
list[2] = string.Join(",", source.Where(x => !target.Contains(x)));
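Filtering in one direction only yields the parts unique to the first record, which is why the question's attempt returned just 100.5600. A minimal sketch of the symmetric version, assuming the two records sit at index 2 and 3 as in the question, applies the same filter in both directions and writes each result back:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string>
{
    "Employee1", "Account", "100.5600,A+ ,John", "1.00000,A+ ,John", "USA"
};

// Materialize both sides first so each Except call sees the original values.
var source = list[2].Split(',').Select(p => p.Trim()).ToList();
var target = list[3].Split(',').Select(p => p.Trim()).ToList();

list[2] = string.Join(",", source.Except(target)); // parts found only in the first record
list[3] = string.Join(",", target.Except(source)); // parts found only in the second record

Console.WriteLine(string.Join(" | ", list));
// Employee1 | Account | 100.5600 | 1.00000 | USA
```

Note that Except also collapses duplicate parts within a single record, which may or may not be wanted.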

Order list of objects by a list of ids

So, I have a list of objects (let's say there are 20) and they have an id. Then I have another list (which is ordered correctly).
I had this linq to sort the object list by the id list:
var outcomeIds = outcomeRequestModels
.OrderByDescending(m => m.Score)
.Select(m => m.Id)
.ToList();
groupResponseModel.Outcomes = groupOutcomes
.OrderBy(m => outcomeIds.IndexOf(m.Id))
.ToList();
Now, this "would" work, but the problem is that outcomeIds only contains a selection of the ids. I would have thought that IndexOf would return -1 for any id that was not found and that those items would be placed below the matched ids. Instead they appear first in the list.
How can I modify my code to get the matching ids at the top and the rest at the bottom? I can't reverse the result, because then the matching ids would be in reverse order too.

Sounds like you want to order by the result of IndexOf, but have the -1 values go to the end instead of the start. In that case, you could map a -1 result to, say, int.MaxValue so that those items sort last.
I've tidied up your code a bit to make it more readable - only the OrderBy is different to your original code.
var outcomeIds = outcomeRequestModels
.OrderByDescending(m => m.Score)
.Select(m => m.Id)
.ToList();
groupResponseModel.Outcomes = groupOutcomes
.OrderBy(m => outcomeIds.IndexOf(m.Id) == -1 ? int.MaxValue : outcomeIds.IndexOf(m.Id))
.ToList();
Or, if you don't want to call IndexOf multiple times, you could extract the conditional expression into a method:
var outcomeIds = outcomeRequestModels
.OrderByDescending(m => m.Score)
.Select(m => m.Id)
.ToList();
groupResponseModel.Outcomes = groupOutcomes
.OrderBy(m => orderByKeySelector(outcomeIds, m.Id))
.ToList();
where orderByKeySelector is
private static int orderByKeySelector<T>(List<T> source, T value)
{
var indexOfValue = source.IndexOf(value);
return indexOfValue == -1 ? int.MaxValue : indexOfValue;
}

var outcomeIds = outcomeRequestModels
.OrderByDescending(m => m.Score)
.Select(m => m.Id)
.ToList();
groupResponseModel.Outcomes = groupOutcomes
.OrderBy(m => outcomeIds.IndexOf(m.Id) != -1
? outcomeIds.IndexOf(m.Id)
: int.MaxValue) // ids missing from outcomeIds sort last
.ToList();

I prefer keeping it simple:
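//orderedList: the items in the desired order
//unorderedList: a List<T> holding the items to sort (it is modified below)
//outcomeList: an initially empty List<T> that receives the result
//check all elements of the ordered list in order
foreach(var item in orderedList)
{
//if your unordered list has this item
if(unorderedList.Contains(item))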
{
//add this item to the final list
outcomeList.Add(item);
//and remove it from unordered
unorderedList.Remove(item);
}
}
//at this point, you added all your matching entities in order, the rest is the remainder:
outcomeList.AddRange(unorderedList);
You can even turn this into an extension method for reusability.
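As a rough sketch of what such an extension method could look like: the names OrderingExtensions and OrderLike are hypothetical, and the sample uses plain ints where real code would compare ids or supply an equality comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ordered = new List<int> { 30, 10 };
var unordered = new List<int> { 10, 20, 30, 40 };

var arranged = unordered.OrderLike(ordered).ToList();
Console.WriteLine(string.Join(", ", arranged)); // 30, 10, 20, 40

// Hypothetical helper; the names are illustrative only.
public static class OrderingExtensions
{
    public static IEnumerable<T> OrderLike<T>(this IEnumerable<T> items, IEnumerable<T> desiredOrder)
    {
        var remaining = items.ToList();      // work on a copy so the caller's collection is untouched
        foreach (var wanted in desiredOrder)
        {
            if (remaining.Remove(wanted))    // Remove returns true only when the item was present
            {
                yield return wanted;
            }
        }
        foreach (var rest in remaining)      // items not mentioned keep their original relative order
        {
            yield return rest;
        }
    }
}
```

Unlike the loop above, this version copies the input instead of mutating it, at the cost of one extra list allocation.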

Why not use a mapping (say, id 5 corresponds to 0, id 123 to 1, etc.) with the help of a dictionary? It will be efficient for long lists:
var order = outcomeRequestModels
.OrderByDescending(m => m.Score)
.Select((m, index) => new {
id = m.Id,
index = index })
.ToDictionary(item => item.id, // id
item => item.index); // corresponding index
Now let's sort the 2nd list:
groupResponseModel.Outcomes = groupOutcomes
.OrderBy(m => order.TryGetValue(m.Id, out var index)
? index // if we have a corresponding index, use it
: int.MaxValue) // otherwise, put the item at the bottom
.ToList();
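A self-contained sketch of the dictionary approach, using made-up sample data in place of the question's model classes (the tuple shape and the names position and sortedIds are assumptions):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: the ids in the desired order, and the items to sort.
var outcomeIds = new List<int> { 30, 10 };
var groupOutcomes = new[]
{
    (Id: 10, Name: "ten"),
    (Id: 20, Name: "twenty"),
    (Id: 30, Name: "thirty"),
    (Id: 40, Name: "forty"),
};

// Map each id to its position once, instead of calling IndexOf for every element.
var position = outcomeIds
    .Select((id, index) => (id, index))
    .ToDictionary(x => x.id, x => x.index);

var sortedIds = groupOutcomes
    .OrderBy(o => position.TryGetValue(o.Id, out var index) ? index : int.MaxValue)
    .Select(o => o.Id)
    .ToList();

Console.WriteLine(string.Join(", ", sortedIds)); // 30, 10, 20, 40
```

Because OrderBy is a stable sort, the items that share the int.MaxValue key keep their original relative order.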

LINQ Query to find string of multidimensional array with most duplicates

I have written a function that gives me a jagged array (FileCheck[][]) holding the results of matching file names against multiple regular expressions.
FileCheck[0] // This string[] contains all the filenames
FileCheck[1] // This string[] is 0 or 1 depending on a Regex match is found.
FileCheck[2] // This string[] contains the Index of the first found Regex.
foreach (string File in InputFolder)
{
int j = 0;
FileCheck[0][k] = Path.GetFileName(File);
Console.WriteLine(FileCheck[0][k]);
foreach (Regex Filemask in Filemasks)
{
if (string.IsNullOrEmpty(FileCheck[1][k]) || FileCheck[1][k] == "0")
{
if (Filemask.IsMatch(FileCheck[0][k]))
{
FileCheck[1][k] = "1";
FileCheck[2][k] = j.ToString(); // This is the Index of the Regex thats Valid
}
else
{
FileCheck[1][k] = "0";
}
j++;
}
Console.WriteLine(FileCheck[1][k]);
}
k++;
}
Console.ReadLine();
// I need the Index of the Regex with the most valid hits
I'm trying to write a function that gives me the regex index string that occurs most often.
This is what I tried, but it did not work :( (I only get the count of the string with the most duplicates, not the string itself)
// I need the Index of the Regex with the most valid hits
var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
.Where(x => FileCheck[1][x] == "1")
.GroupBy(x => FileCheck[2][x])
.OrderByDescending(x => x.Count())
.First().ToList();
Console.WriteLine(LINQ[1]);
Example Data
string[][] FileCheck = new string[3][];
FileCheck[0] = new string[]{ "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt"};
FileCheck[1] = new string[]{ "0","1","1","0","1","1"};
FileCheck[2] = new string[]{ null, "3", "3", null,"1","2"};
In this example I need as result of the Linq query:
string result = "3";

With your current code, substituting 'ToList()' with 'Key' would do the trick.
var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
.Where(x => FileCheck[1][x] == "1")
.GroupBy(x => FileCheck[2][x])
.OrderByDescending(x => x.Count())
.First().Key;
Since the index is null for values that are not found, you could also filter out null values and skip looking at the FileCheck[1] array. For example:
var maxOccurringIndex = FileCheck[2].Where(ind => ind != null)
.GroupBy(ind=>ind)
.OrderByDescending(x => x.Count())
.First().Key;
However, just a suggestion, you can use classes instead of a nested array, e.g.:
class FileCheckInfo
{
public string File{get;set;}
public bool Match => Index.HasValue;
public int? Index{get;set;}
public override string ToString() => $"{File} [{(Match ? Index.ToString() : "no match")}]";
}
Assuming InputFolder is an enumerable of string and Filemasks an enumerable of 'Regex', an array can be filled with:
FileCheckInfo[] FileCheck = InputFolder.Select(f=>
new FileCheckInfo{
File = f,
Index = Filemasks.Select((rx,ind) => new {ind, IsMatch = rx.IsMatch(f)}).FirstOrDefault(r=>r.IsMatch)?.ind
}).ToArray();
Getting the max occurring would be much the same:
var maxOccurringIndex = FileCheck.Where(f=>f.Match).GroupBy(f=>f.Index).OrderByDescending(gr=>gr.Count()).First().Key;
Edit, PS: the above all assumes you need to reuse the results. If you only have to find the maximum occurrence, you're much better off with an approach such as the one Martin suggested!
If the goal is only to get the max occurrence, you can use:
var maxOccurringIndex = Filemasks.Select((rx,ind) => new {ind, Count = InputFolder.Count(f=>rx.IsMatch(f))})
.OrderByDescending(m=>m.Count).FirstOrDefault()?.ind;

Your question and code seem very convoluted. I am guessing that you have a list of file names and another list of file masks (regular expressions), and you want to find the file mask that matches the most file names. Here is a way to do that:
var fileNames = new[] { "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt" };
var fileMasks = new[] { @"\.txt$", @"\.xml$", "valid" };
var fileMaskWithMostMatches = fileMasks
.Select(
fileMask => new {
FileMask = fileMask,
FileNamesMatched = fileNames.Count(
fileName => Regex.Match(
fileName,
fileMask,
RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
)
.Success
)
}
)
.OrderByDescending(x => x.FileNamesMatched)
.First()
.FileMask;
With the sample data, the value of fileMaskWithMostMatches is "valid".
Note that the Regex class does some caching of regular expressions, but if you have many regular expressions it will be more efficient to create them outside the implied fileNames.Count for-each loop, to avoid recreating the same regular expression again and again (creating a regular expression may take a non-trivial amount of time, depending on its complexity).
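A minimal sketch of that optimization, assuming the same sample data: each mask is turned into a Regex instance once, and the instances are reused for every file name. The names compiledMasks and best are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var fileNames = new[] { "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt" };
var fileMasks = new[] { @"\.txt$", @"\.xml$", "valid" };

// Build each Regex once, outside the per-file counting loop.
var compiledMasks = fileMasks
    .Select(mask => new Regex(mask, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant))
    .ToList();

var best = compiledMasks
    .Select(regex => new
    {
        Mask = regex.ToString(),                             // Regex.ToString() returns the pattern
        Count = fileNames.Count(name => regex.IsMatch(name)) // how many file names this mask matches
    })
    .OrderByDescending(x => x.Count)
    .First();

Console.WriteLine($"{best.Mask} matched {best.Count} file names"); // valid matched 4 file names
```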

As an alternative to Martin's answer, here's a simpler version to your existing Linq query that gives the desired result;
var LINQ = FileCheck[2]
.ToLookup(x => x) // Makes a lookup table
.OrderByDescending(x => x.Count()) // Sorts by count, descending
.Select(x => x.Key) // Extract the key
.FirstOrDefault(x => x != null); // Return the first non null key
// or null if none found.

Isn't this much easier?
string result = FileCheck[2]
.Where(x => x != null)
.GroupBy(x => x)
.OrderByDescending(x => x.Count())
.FirstOrDefault()?.Key; // null when there are no non-null entries

Find index from an array using regex with linq in c# .net

I have an array which contains elements such as the following:
dec.02
Novemeber-2
Oct-6
.
.
.
Now suppose I want to find the index of dec.02.
The array might contain december-02 in place of dec.02.
So I want to use LINQ with a regex to find the index.
The date can be in any format.
The regex will be (dec|december)\W*02
Can anyone tell me how to use a regex with LINQ to find an index in an array?

Are you looking for something like that?
String[] data = new String[] {
"dec.02",
"Novemeber-2",
"Oct-6",
...
};
// All the indexes
int[] indice = data
.Select((line, index) => new {
line = line,
index = index})
.Where(item => Regex.IsMatch(item.line, "Your regular expression"))
.Select(item => item.index)
.ToArray();
In case you want 1st such index only (-1 if no index found):
// First index (or -1 if there's no such index)
int result = data
.Select((line, index) => new {
line = line,
index = index})
.Where(item => Regex.IsMatch(item.line, "Your regular expression"))
.Select(item => item.index + 1)
.FirstOrDefault() - 1;

If I understand your question correctly, this can be a solution:
Regex regex = new Regex(@"(dec|december)\W*02");
string[] dates = { "dec.02", "Novemeber-2", "Oct-6" };
int i = dates.Length - dates.SkipWhile(s => !regex.IsMatch(s)).Count(); // zero-based index; equals dates.Length when nothing matches
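If LINQ is not a hard requirement, Array.FindIndex expresses the same search directly and returns -1 when no element matches. A minimal sketch with the question's data; RegexOptions.IgnoreCase is an addition of this example, not part of the original pattern.

```csharp
using System;
using System.Text.RegularExpressions;

// IgnoreCase is an assumption added here so that "Dec.02" would match as well.
var regex = new Regex(@"(dec|december)\W*02", RegexOptions.IgnoreCase);
string[] dates = { "dec.02", "Novemeber-2", "Oct-6" };

// Zero-based index of the first element matching the pattern, or -1 if there is none.
int matchIndex = Array.FindIndex(dates, s => regex.IsMatch(s));

Console.WriteLine(matchIndex); // 0
```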
