Compare and combine strings to get duplicates

I will try to describe my question as clearly as I can.
I have a list of X strings ("NOTION", "CATION", "COIN", "NOON").
For each character (letter), I want to find the largest number of times it occurs in any single string, repeat the character that many times, arrange the characters in alphabetical order, and build one string from them.
So the result string should be: "ACINNOOT"
I hope it is clear what I am describing.
EDIT
So far:
for (int i = 0; i < currentWord.Length; i++)
{
    // Read from the same variable the loop condition uses (currentWord)
    string letter = currentWord.Substring(i, 1);
    tempDuplicatedLetterList.Add(letter);
}
// Which letters are repeated and how many times
var duplicatedQuery = tempDuplicatedLetterList.GroupBy(x => x)
    .Where(g => g.Count() > 1)
    .Select(y => new { Element = y.Key, Counter = y.Count() })
    .ToList();

I came up with this, although I think there might be a cleaner way to do it:
var characterSets = new string[] { "NOTION", "CATION", "COIN", "NOON" }
    .SelectMany(c => c.GroupBy(cc => cc)) // create character groups for each string, and flatten the groups
    .GroupBy(c => c.Key) // group the groups
    .OrderBy(cg => cg.Key) // order by the character (alphabetical)
    .Select(cg => new string(cg.Key, cg.Max(v => v.Count()))) // create a string for each group, using the maximum count for that character
    .ToArray(); // make an array
var result = string.Concat(characterSets);
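For comparison, here is a minimal sketch of the same idea written with explicit loops. MergeByMaxCount is a hypothetical helper name, not something from the question, and the sketch assumes C# 9 or later so that it can run as top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: for every character, keep the highest count seen in any single word.
static string MergeByMaxCount(IEnumerable<string> words)
{
    // SortedDictionary enumerates its keys in ascending order,
    // which is alphabetical for uppercase Latin letters.
    var maxCounts = new SortedDictionary<char, int>();

    foreach (var word in words)
    {
        foreach (var group in word.GroupBy(c => c))
        {
            int count = group.Count();
            // Store the count if the character is new or this word uses it more often.
            if (!maxCounts.TryGetValue(group.Key, out int best) || count > best)
            {
                maxCounts[group.Key] = count;
            }
        }
    }

    // Repeat each character as many times as its maximum count.
    return string.Concat(maxCounts.Select(kv => new string(kv.Key, kv.Value)));
}

var result = MergeByMaxCount(new[] { "NOTION", "CATION", "COIN", "NOON" });
Console.WriteLine(result); // ACINNOOT
```

Whether this reads better than the LINQ chain is a matter of taste. The loop makes the "keep the maximum per character" step explicit, at the cost of a few more lines.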

Related

LINQ query to group strings by first letter and determine total length

A sequence of non-empty strings stringList is given, containing only uppercase letters of the Latin alphabet. For all strings starting with the same letter, determine their total length and obtain a sequence of strings of the form "S-C", where S is the total length of all strings from stringList that begin with the character C.
var stringList = new[] { "YELLOW", "GREEN", "YIELD" };
var expected = new[] { "11-Y", "5-G" };
I tried this:
var groups =
    from word in stringList
    orderby word ascending
    group word by word[0] into groupedByFirstLetter
    orderby groupedByFirstLetter.Key descending
    select new { key = groupedByFirstLetter.Key, Words = groupedByFirstLetter.Select(x => x.Length) };
But the output of this query is Y 6 5 G 5 instead of 11-Y 5-G.
What I would like to know is how to sum the lengths when a group contains more than one word, and how to format the result as expected.

This should do it:
var results = stringList.OrderByDescending(x => x[0])
    .ThenBy(x => x)
    .GroupBy(x => x[0])
    .Select(g => $"{g.Sum(x => x.Length)}-{g.Key}")
    .ToArray();

var result = stringList.GroupBy(e => e[0]).Select(e => $"{e.Sum(o => o.Length)}-{e.Key}").ToArray();
I am not sure how to rewrite it in the query-syntax form you used.
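For what it's worth, a query-syntax version might look like the sketch below. It reuses the sample data from the question; the range variable name byFirstLetter is my own choice, and the snippet assumes C# 9 or later so that it can run as top-level statements.

```csharp
using System;
using System.Linq;

var stringList = new[] { "YELLOW", "GREEN", "YIELD" };

var results = (from word in stringList
               group word by word[0] into byFirstLetter
               orderby byFirstLetter.Key descending
               // Sum the lengths of all words in the group and format as "S-C".
               select $"{byFirstLetter.Sum(w => w.Length)}-{byFirstLetter.Key}")
              .ToArray();

Console.WriteLine(string.Join(", ", results)); // 11-Y, 5-G
```

The only real change from the original attempt is the select clause: it sums the lengths with Sum instead of projecting each length separately.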

C# Array of strings contains string part from another array of strings

Is there a way, using LINQ, to find whether a string from one array of strings contains a (partial) string from another array of strings? Something like this:
string[] fullStrings = { "full_xxx_part_name", "full_ccc_part_name", "full_zzz_part_name" };
string[] stringParts = { "a_part", "b_part", "c_part", "e_part" };
// compare fullStrings array with stringParts array
// full_ccc_part_name contains c_part (first match is OK, no need to find all)
// return index 1 (index 1 from fullStrings array)
This is asked mostly for educational purposes.
I'm aware that LINQ does not magically avoid the loop; it just runs it behind the scenes.

You can use Where + Any with string methods:
string[] matches = fullStrings
    .Where(s => stringParts.Any(s.Contains))
    .ToArray();
If you want to compare in a case-insensitive way, use IndexOf:
string[] matches = fullStrings
    .Where(s => stringParts.Any(part => s.IndexOf(part, StringComparison.OrdinalIgnoreCase) >= 0))
    .ToArray();
In case you want the indexes:
int[] matches = fullStrings
    .Select((s, index) => (String: s, Index: index))
    .Where(x => stringParts.Any(x.String.Contains))
    .Select(x => x.Index)
    .ToArray();

You would of course need to use some type of loop to find the index. Here is a solution using Linq.
This will return the first index if a match is found or -1 if none is found:
var index = fullStrings
    .Select((s, i) => (s, i))
    .Where(x => stringParts.Any(x.s.Contains))
    .Select(x => x.i)
    .DefaultIfEmpty(-1)
    .First();
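If only the first matching index is needed, Array.FindIndex may be a shorter option. The sketch below uses the arrays from the question and combines FindIndex with Any. It is an alternative I am suggesting, not part of the answers above, and it assumes C# 9 or later so that it can run as top-level statements.

```csharp
using System;
using System.Linq;

string[] fullStrings = { "full_xxx_part_name", "full_ccc_part_name", "full_zzz_part_name" };
string[] stringParts = { "a_part", "b_part", "c_part", "e_part" };

// Array.FindIndex stops at the first element that satisfies the predicate
// and returns -1 when no element does.
int index = Array.FindIndex(fullStrings, s => stringParts.Any(part => s.Contains(part)));

Console.WriteLine(index); // 1
```

As with the LINQ versions, this still loops internally; it just hides the loop and the index bookkeeping.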

C# .Where & .Select

I was looking into how to check for duplicate characters and came across this method. It works, but I am trying to understand how. If anyone could explain it so I can better understand what is occurring, I would greatly appreciate it. Thank you.
static int duplicateAmount(string word)
{
    var duplicates = word.GroupBy(a => a)
        .Where(g => g.Count() > 1)
        .Select(i => new { Number = i.Key, Count = i.Count() });
    return duplicates.Count();
}

The idea is to group the characters in the string and check whether any group contains more than one element, which signifies a duplicated character. For example, for a word in which the characters t, i, and s each occur more than once, word.GroupBy produces one group per distinct character, and the groups for t, i, and s contain more than one element.
The Where condition keeps only the groups that have more than one element, and the Count method counts the filtered groups.
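To make the grouping concrete, here is a small sketch that prints each group and then lists the repeated characters. The sample word "statistics" is my own assumption, chosen because t, i, and s repeat in it; it is not necessarily the word the answer had in mind. The snippet assumes C# 9 or later so that it can run as top-level statements.

```csharp
using System;
using System.Linq;

// "statistics" is an assumed sample word, not taken from the original answer.
string word = "statistics";

foreach (var group in word.GroupBy(c => c))
{
    // Key is the character; Count() is how often it occurs in the word.
    Console.WriteLine($"{group.Key}: {group.Count()}");
}

// Keep only the characters whose group has more than one element.
char[] repeated = word.GroupBy(c => c)
    .Where(g => g.Count() > 1)
    .Select(g => g.Key)
    .ToArray();

Console.WriteLine(string.Join(", ", repeated)); // s, t, i
```

GroupBy yields groups in the order in which each key first appears, which is why s comes before t and i here.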
In your case, if you are interested only in counting the number of characters that are duplicated, you could refactor the method further as
static int duplicateAmount(string word)
{
    return word.GroupBy(a => a)
        .Count(g => g.Count() > 1);
}
This avoids creating intermediate types, which are not needed if you are interested only in the count.

When you iterate a string, you do so by iterating all its characters.
Therefore:
static int duplicateAmount(string word)
{
    var duplicates = word.GroupBy(a => a) // Groups all the unique chars
        .Where(g => g.Count() > 1) // filters the groups with more than one entry
        // Maps the query result to an anonymous object containing the char
        // and their amount of occurrences
        .Select(i => new { Number = i.Key, Count = i.Count() });
    // return the count of elements in the resulting collection
    return duplicates.Count();
}
Now that you have understood that, you can probably tell that the last step (the mapping) is unnecessary, since we're creating a structure we never use: { Number, Count }.
The code could simply be
static int duplicateAmount(string word)
{
    return word.GroupBy(a => a) // Groups all the unique chars
        // Counts the amount of groups with more than one occurrence.
        .Count(g => g.Count() > 1);
}
Edited: Removed the where clause as noted in the comments. Thanks @DrkDeveloper

C#: Rename/replace duplicates in list with an added number

I have a List<string> where I would want to replace all duplicates with an added number to them. An example would be:
{"Ply0", "Ply+45", "Ply-45", "Ply0"}
I would like each "Ply0" to have a unique name, so replace them with "Ply0_1" and "Ply0_2". It is important that the order of the list stays the same. Afterwards the list should look like this:
{"Ply0_1", "Ply+45", "Ply-45", "Ply0_2"}
I have tried first finding the duplicates with LINQ but I am new to it and also have trouble replacing them with the added number while keeping the order of the original list.
Any help would be greatly appreciated!

Using LINQ, it can be done like this, but I don't think it is very readable:
var listx = new List<string>() { "Ply0", "Ply+45", "Ply-45", "Ply0" };
var res = listx.Select((s, i) => new { orgstr = s, index = i })
    .GroupBy(x => x.orgstr)
    .SelectMany(g => g.Select((x, j) => new { item = x, suffix = j + 1, count = g.Count() }))
    .OrderBy(x => x.item.index)
    .Select(x => x.count == 1 ? x.item.orgstr : x.item.orgstr + "_" + x.suffix)
    .ToList();
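If readability is the concern, a two-pass loop may be easier to follow. This is only a sketch under the assumptions of the question (duplicates get "_1", "_2", and so on, while unique names stay untouched); the variable names are mine, and the snippet assumes C# 9 or later so that it can run as top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var names = new List<string> { "Ply0", "Ply+45", "Ply-45", "Ply0" };

// First pass: how often does each name occur in total?
var totals = names.GroupBy(n => n).ToDictionary(g => g.Key, g => g.Count());

// Second pass: walk the list in order and number only the repeated names.
var seen = new Dictionary<string, int>();
var renamed = new List<string>();
foreach (var name in names)
{
    if (totals[name] == 1)
    {
        renamed.Add(name); // unique names stay unchanged
        continue;
    }

    seen.TryGetValue(name, out int used); // used is 0 if the name has not been seen yet
    seen[name] = used + 1;
    renamed.Add($"{name}_{used + 1}");
}

Console.WriteLine(string.Join(", ", renamed)); // Ply0_1, Ply+45, Ply-45, Ply0_2
```

Note that neither version checks whether a generated name such as "Ply0_1" already exists elsewhere in the list; if that can happen, an extra uniqueness check would be needed.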

LINQ Query to find string of multidimensional array with most duplicates

I have written a function that fills a jagged array (FileCheck[][]) with the results of matching file names against multiple regex strings.
FileCheck[0] // This string[] contains all the filenames
FileCheck[1] // This string[] is 0 or 1 depending on whether a Regex match is found.
FileCheck[2] // This string[] contains the Index of the first found Regex.
foreach (string File in InputFolder)
{
    int j = 0;
    FileCheck[0][k] = Path.GetFileName(File);
    Console.WriteLine(FileCheck[0][k]);
    foreach (Regex Filemask in Filemasks)
    {
        if (string.IsNullOrEmpty(FileCheck[1][k]) || FileCheck[1][k] == "0")
        {
            if (Filemask.IsMatch(FileCheck[0][k]))
            {
                FileCheck[1][k] = "1";
                FileCheck[2][k] = j.ToString(); // This is the Index of the Regex thats Valid
            }
            else
            {
                FileCheck[1][k] = "0";
            }
            j++;
        }
        Console.WriteLine(FileCheck[1][k]);
    }
    k++;
}
Console.ReadLine();
// I need the Index of the Regex with the most valid hits
I'm trying to write a function that gives me the string of the RegexIndex that has the most duplicates.
This is what I tried, but it did not work :( (I only get the count of the string with the most duplicates, not the string itself)
// I need the Index of the Regex with the most valid hits
var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
    .Where(x => FileCheck[1][x] == "1")
    .GroupBy(x => FileCheck[2][x])
    .OrderByDescending(x => x.Count())
    .First().ToList();
Console.WriteLine(LINQ[1]);
Example Data
string[][] FileCheck = new string[3][];
FileCheck[0] = new string[]{ "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt"};
FileCheck[1] = new string[]{ "0","1","1","0","1","1"};
FileCheck[2] = new string[]{ null, "3", "3", null,"1","2"};
In this example I need as result of the Linq query:
string result = "3";

With your current code, substituting 'ToList()' with 'Key' would do the trick.
var LINQ = Enumerable.Range(0, FileCheck[0].GetLength(0))
    .Where(x => FileCheck[1][x] == "1")
    .GroupBy(x => FileCheck[2][x])
    .OrderByDescending(x => x.Count())
    .First().Key;
Since the index is null for values that are not found, you could also filter out null values and skip looking at the FileCheck[1] array. For example:
var maxOccurringIndex = FileCheck[2].Where(ind => ind != null)
    .GroupBy(ind => ind)
    .OrderByDescending(x => x.Count())
    .First().Key;
However, just a suggestion, you can use classes instead of a nested array, e.g.:
class FileCheckInfo
{
    public string File { get; set; }
    public bool Match => Index.HasValue;
    public int? Index { get; set; }
    public override string ToString() => $"{File} [{(Match ? Index.ToString() : "no match")}]";
}
Assuming InputFolder is an enumerable of string and Filemasks an enumerable of 'Regex', an array can be filled with:
FileCheckInfo[] FileCheck = InputFolder.Select(f =>
    new FileCheckInfo
    {
        File = f,
        Index = Filemasks.Select((rx, ind) => new { ind, IsMatch = rx.IsMatch(f) }).FirstOrDefault(r => r.IsMatch)?.ind
    }).ToArray();
Getting the max occurring would be much the same:
var maxOccurringIndex = FileCheck.Where(f=>f.Match).GroupBy(f=>f.Index).OrderByDescending(gr=>gr.Count()).First().Key;
Edit: PS, all of the above assumes you need to reuse the results. If you only have to find the maximum occurrence, you're much better off with an approach such as the one Martin suggested!
If the goal is only to get the max occurrence, you can use:
var maxOccurringIndex = Filemasks.Select((rx, ind) => new { ind, Count = InputFolder.Count(f => rx.IsMatch(f)) })
    .OrderByDescending(m => m.Count).FirstOrDefault()?.ind;

Your question and code seem very convoluted. I am guessing that you have a list of file names and another list of file masks (regular expressions) and you want to find the file mask that matches the most file names. Here is a way to do that:
var fileNames = new[] { "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt" };
var fileMasks = new[] { @"\.txt$", @"\.xml$", "valid" };
var fileMaskWithMostMatches = fileMasks
    .Select(
        fileMask => new {
            FileMask = fileMask,
            FileNamesMatched = fileNames.Count(
                fileName => Regex.Match(
                    fileName,
                    fileMask,
                    RegexOptions.IgnoreCase | RegexOptions.CultureInvariant
                )
                .Success
            )
        }
    )
    .OrderByDescending(x => x.FileNamesMatched)
    .First()
    .FileMask;
With the sample data, the value of fileMaskWithMostMatches is "valid".
Note that the Regex class does some caching of regular expressions, but if you have many regular expressions it will be more efficient to create them outside the implied fileNames.Count for-each loop, so that the same regular expression is not recreated again and again (creating a regular expression may take a non-trivial amount of time, depending on its complexity).
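A minimal sketch of that suggestion, using the same sample data, could look like this. The names compiledMasks and bestMask are illustrative only, and the snippet assumes C# 9 or later so that it can run as top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

var fileNames = new[] { "1.csv", "TestValid1.txt", "TestValid2.txt", "2.xml", "TestAlsoValid.xml", "TestValid3.txt" };
var fileMasks = new[] { @"\.txt$", @"\.xml$", "valid" };

// Build each Regex once, up front, instead of once per file name.
var compiledMasks = fileMasks
    .Select(mask => new Regex(mask, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant))
    .ToArray();

var bestMask = compiledMasks
    .Select(regex => new { Mask = regex, Matches = fileNames.Count(name => regex.IsMatch(name)) })
    .OrderByDescending(x => x.Matches)
    .First()
    .Mask;

// Regex.ToString() returns the pattern the Regex was created with.
Console.WriteLine(bestMask.ToString()); // valid
```

Whether this is measurably faster depends on how many masks and file names there are; for a handful of each, the difference is probably negligible.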

As an alternative to Martin's answer, here's a simpler version of your existing LINQ query that gives the desired result:
var LINQ = FileCheck[2]
    .ToLookup(x => x) // Makes a lookup table
    .OrderByDescending(x => x.Count()) // Sorts by count, descending
    .Select(x => x.Key) // Extract the key
    .FirstOrDefault(x => x != null); // Return the first non null key
                                     // or null if none found.

Isn't this much easier?
string result = FileCheck[2]
    .Where(x => x != null)
    .GroupBy(x => x)
    .OrderByDescending(x => x.Count())
    .FirstOrDefault()?.Key; // ?. avoids a NullReferenceException when there are no non-null entries
