Modify duplicate values with duplication index suffix (using Linq)


I have a list:
List<string> myList = new List<string>{ "dog", "cat", "dog", "bird" };
I want the output to be list of:
"dog (1)", "cat", "dog (2)", "bird"
I've already looked through this question, but it only covers counting the duplicates; my output should include each duplicate's index, like:
duplicate (index)
I've tried this code:
var q = list.GroupBy(x => x)
.Where(y => y.Count()>1)
.Select(g => new {Value = g.Key + "(" + g.Index + ")"})
but it does not seem to work because:
I need to return the whole list (or just modify the existing one), and my attempt returns only the duplicated values.
For duplicate values I need to add a suffix based on their "duplicate index".
How to do this in C#? Is there a way using Linq?

The accepted solution works but is extremely inefficient when the size of the list grows large.
What you want to do is first get the information you need in an efficient data structure. Can you implement a class:
sealed class Counter<T>
{
public void Add(T item) { }
public int Count(T item) { }
}
where Count returns the number of times (possibly zero) that Add has been called with that item. (Hint: you could use a Dictionary<T, int> to good effect.)
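One possible way to fill in that exercise, following the Dictionary<T, int> hint, is sketched below. The private field name counts is an assumption of this sketch, and a null item would throw because Dictionary does not accept null keys.

```csharp
using System.Collections.Generic;

// Sketch of the Counter<T> helper, backed by a Dictionary<T, int>.
public sealed class Counter<T>
{
    // Assumed field name; maps each item to the number of times it was added.
    private readonly Dictionary<T, int> counts = new Dictionary<T, int>();

    // Records one more occurrence of item.
    public void Add(T item)
    {
        // TryGetValue leaves current at 0 when the key is missing.
        counts.TryGetValue(item, out int current);
        counts[item] = current + 1;
    }

    // Returns how many times Add has been called with item (zero if never).
    public int Count(T item)
    {
        return counts.TryGetValue(item, out int current) ? current : 0;
    }
}
```

Both operations are expected O(1), so each pass over the list that uses the counter stays linear.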
All right. Now that we have our useful helper we can:
var c1 = new Counter<string>();
foreach(string item in myList)
c1.Add(item);
Great. Now we can construct our new list by making use of a second counter:
var result = new List<String>();
var c2 = new Counter<String>();
foreach(string item in myList)
{
c2.Add(item);
if (c1.Count(item) == 1)
result.Add(item);
else
result.Add($"{item} ({c2.Count(item)})");
}
And we're done. Or, if you want to modify the list in place:
var c2 = new Counter<String>();
// It's a bad practice to mutate a list in a foreach, so
// we'll be sticklers and use a for.
for (int i = 0; i < myList.Count; i = i + 1)
{
var item = myList[i];
c2.Add(item);
if (c1.Count(item) != 1)
myList[i] = $"{item} ({c2.Count(item)})";
}
The lesson here is: create a useful helper class that solves one problem extremely well, and then use that helper class to make the solution to your actual problem more elegant. You need to count things to solve a problem? Make a thing-counter class.

This solution is not quadratic with respect to size of list, and it modifies the list in place as preferred in OP.
Any efficient solution will involve a pre-pass in order to find and count the duplicates.
List<string> myList = new List<string>{ "dog", "cat", "dog", "bird" };
//map out a count of all the duplicate words in dictionary.
var counts = myList
.GroupBy(s => s)
.Where(p => p.Count() > 1)
.ToDictionary(p => p.Key, p => p.Count());
//modify the list, going backwards so we can take advantage of our counts.
for (int i = myList.Count - 1; i >= 0; i--)
{
string s = myList[i];
if (counts.ContainsKey(s))
{
//add the suffix and decrement the number of duplicates left to tag.
myList[i] += $" ({counts[s]--})";
}
}
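Since the question asks specifically about LINQ, here is a rough LINQ-only sketch of the same idea that builds a new list instead of modifying the original. The class and method names (DuplicateSuffix, WithDuplicateIndex) are made up for illustration, and the final OrderBy makes this O(n log n) rather than linear.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateSuffix
{
    // Hypothetical helper: returns a new list in which every value that occurs
    // more than once gets a " (k)" suffix, k being its 1-based occurrence number.
    public static List<string> WithDuplicateIndex(IEnumerable<string> source)
    {
        return source
            .Select((value, index) => new { value, index })   // remember original positions
            .GroupBy(x => x.value)                            // elements in a group keep source order
            .SelectMany(g => g.Count() == 1
                ? g.Select(x => new { x.index, text = x.value })
                : g.Select((x, n) => new { x.index, text = $"{x.value} ({n + 1})" }))
            .OrderBy(x => x.index)                            // restore the original order
            .Select(x => x.text)
            .ToList();
    }
}
```

Called with the list from the question, DuplicateSuffix.WithDuplicateIndex(myList) yields "dog (1)", "cat", "dog (2)", "bird".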

One way to do this is to simply create a new list that contains the additional text for each item that appears more than once. When we find these items, we can create our formatted string using a counter variable, and increment the counter if the list of formatted strings contains that counter already.
Note that this is NOT a good performing solution. It was just the first thing that came to my head. But it's a place to start...
private static void Main()
{
var myList = new List<string> { "dog", "cat", "dog", "bird" };
var formattedItems = new List<string>();
foreach (var item in myList)
{
if (myList.Count(i => i == item) > 1)
{
int counter = 1;
while (formattedItems.Contains($"{item} ({counter})")) counter++;
formattedItems.Add($"{item} ({counter})");
}
else
{
formattedItems.Add(item);
}
}
Console.WriteLine(string.Join(", ", formattedItems));
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();
}

Ok, @EricLippert challenged me and I couldn't let it go. Here's my second attempt, which I believe performs much better and modifies the original list as requested. Basically we create a second list that contains all the duplicate entries in the first. Then we walk backwards through the first list, modifying any entries that have a counterpart in the duplicates list, and remove the item from the duplicates list each time we encounter one:
private static void Main()
{
var myList = new List<string> {"dog", "cat", "dog", "bird"};
var duplicates = myList.Where(item => myList.Count(i => i == item) > 1).ToList();
for (var i = myList.Count - 1; i >= 0; i--)
{
var numDupes = duplicates.Count(item => item == myList[i]);
if (numDupes <= 0) continue;
duplicates.Remove(myList[i]);
myList[i] += $" ({numDupes})";
}
Console.WriteLine(string.Join(", ", myList));
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();
}

Related

Comparing Names in array to themselves

I have an array of names that I sorted alphabetically. Many names repeat, and I track the number of occurrences of each as its popularity. I have been trying to figure out how to compare each index with the one next to it to see whether it holds the same name. Each time the same name appears, a counter ticks up; when a different name is reached, the code checks the counter against "foundNamePop", stores it in a separate variable, and resets. The problem is that some input arrays have the same name repeating at the end (e.g. Lane, Lane, Lane \0); the loop ends without entering my if block, so that last count is never stored because only "nameCounter++" has run. I can't find a way to make sure every name is read and stored, whether the array ends with a repeated name or with single, different names (e.g. Lane, Dane, Bane \0).
Let me also add that these .txt files can contain ~50 thousand names, and I have no idea which names are in there.
Why does that ending if statement not work? It just enters like normal. I ran it with debugging and watched it slip right into the block even when .ElementAt(i).Value is greater than foundNamePop (5 in this instance).
var dict = new ConcurrentDictionary<string,int>(StringComparer.OrdinalIgnoreCase);
foreach (var name in updatedName)
{
dict.AddOrUpdate(name, 1, (_, count) => ++count);
}
for (int i = 0; i < dict.Count; i++)
{
if (dict.ElementAt(i).Value <= foundNamePop);
{
lessPopNameSum += dict.ElementAt(i).Value;
}
}

The simple solution is to add a check after the loop
if (foundNamePop >= nameCounter)
{
lessPopNameSum += nameCounter;
}
But it is not clear to me what you are actually computing. It looks like you are summing the duplicate names that have more duplicates than foundNamePop, but it is not clear what value this has, nor what meaning the result will have.
You should be able to use LINQ to get something similar with less code:
var lessPopNameSum = sameLengthName
.GroupBy(n => n)
.Select(group => group.Count())
.Where(c => c >= foundNamePop)
.Sum();
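The check after the loop mentioned at the start of this answer could look roughly like the sketch below. The question's original loop is not shown, so the surrounding method (NameRuns.SumLessPopular) is a reconstruction under assumptions: the input is already sorted so that equal names are adjacent, and a run counts toward the sum when foundNamePop >= nameCounter, as in the check above.

```csharp
using System.Collections.Generic;

public static class NameRuns
{
    // Hypothetical reconstruction: sums the sizes of all runs of equal adjacent
    // names whose size does not exceed foundNamePop.
    public static int SumLessPopular(IReadOnlyList<string> sortedNames, int foundNamePop)
    {
        if (sortedNames.Count == 0) return 0;

        int lessPopNameSum = 0;
        int nameCounter = 1; // size of the run that is currently open

        for (int i = 1; i < sortedNames.Count; i++)
        {
            if (sortedNames[i] == sortedNames[i - 1])
            {
                nameCounter++;
            }
            else
            {
                // A different name closes the previous run.
                if (foundNamePop >= nameCounter) lessPopNameSum += nameCounter;
                nameCounter = 1;
            }
        }

        // The last run is never closed by a different name, so handle it here.
        if (foundNamePop >= nameCounter) lessPopNameSum += nameCounter;
        return lessPopNameSum;
    }
}
```

Without the final check, an input ending in Lane, Lane, Lane would silently drop that last run, which is the symptom described in the question.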

Although I like the elegance of the other posted solution another alternative could be to use a Dictionary to store a count of each of the names.
const int FoundNamePop = 2;
var names = new string[] { "Bill", "Jane", "Jeff", "Rebecca", "Bill" };
var count = FindPopularNames(names)
.Where(kvp => kvp.Value < FoundNamePop)
.Sum(kvp => kvp.Value);
// With 'FoundNamePop' set to two, the below line will print '3'.
Console.WriteLine($"Count: {count}");
static IDictionary<string, int> FindPopularNames(IEnumerable<string> names)
{
var dict = new ConcurrentDictionary<string, int>
(StringComparer.OrdinalIgnoreCase);
foreach (var name in names)
{
dict.AddOrUpdate(name, 1, (_, count) => ++count);
}
return dict;
}

C#: Count occurrences of a string in a list which is in another list using LINQ?

I am trying to count occurrences of a string in dynamically added lists in a main list. This is the main list:
public static List<string>[] tables = new List<string>[30];
This is how I add items to it:
public static int takenTablesDayTotal;
public static void AddProductToTable()
{
int tableNum = int.Parse(Console.ReadLine());
if (tableNum < 1 || tableNum > 30) { throw new Exception(); }
choiceName = Console.ReadLine();
if (tables[tableNum] is null)
{
tables[tableNum] = new List<string>();
takenTablesDayTotal++;
}
tables[tableNum].Add(choiceName);
}
And this is how I have tried to do the counting, but it doesn't seem to work right for some reason (starts at 1 and stops counting there when the required string is detected)
salesProductDayTotal = tables.Where(s => s != null && s.Contains("string")).Count();
I'm not sure how to make this work, so any help will be appreciated!
Thanks in advance!

You can use SelectMany to flatten the two-level nested structure.
Then use Count to get what you want.
For example - count the daily apple sales number
List<string>[] tables = new List<string>[30];
tables[0] = new List<string>{
"Apple", "Banana", "Cherry"
};
tables[1] = new List<string>{
"Peach", "Apple", "Watermelon"
};
tables[2] = new List<string>{
"Mango", "Grape", "Apple"
};
//the daily sales count of Apple.
var dailyAppleSalesCount = tables.Where(x => x != null)
.SelectMany(s => s).Count(x => x == "Apple");

You can use SelectMany to flatten the List<List<string>> into one large List<string>, and then count the products.
You don't need to use Contains, IMO ("Chicken soup" is probably a different product on the menu than "Spicy Chicken Soup"), so it simplifies the condition a bit.
salesProductDayTotal = tables
.Where(t => t != null)
.SelectMany(products => products)
.Count(p => p == "string");
You could also use a GroupBy clause to do this calculation for all the products at once.
Explanation of your problem:
You were using the Count on the outer list, the list of tables. So you had just "one match" for each table that contains the product at least once.
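The GroupBy variant mentioned above might look like this sketch. The class and method names (TableSales, CountAllProducts) are invented for the example; the tables array has the same shape as in the question.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TableSales
{
    // Hypothetical helper: counts every product across all non-null tables.
    public static Dictionary<string, int> CountAllProducts(List<string>[] tables)
    {
        return tables
            .Where(t => t != null)              // skip tables that were never used
            .SelectMany(products => products)   // flatten into one sequence of product names
            .GroupBy(p => p)                    // one group per distinct product
            .ToDictionary(g => g.Key, g => g.Count());
    }
}
```

The result can then be queried per product, for example counts["Apple"], without rescanning the tables for each name.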

Loop to check for duplicate strings

I want to create a loop to check a list of titles for duplicates.
I currently have this:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var x in productTitles)
{
var title = x.Text;
productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var y in productTitles.Skip(productTitles.IndexOf(x) + 1))
{
if (title == y.Text)
{
Assert.Fail("Found duplicate product in the table");
}
}
}
But this seems to take the item I skip out of the array for the next loop, so item 2 is never checked against item 1; it moves straight to item 3.
I was under the impression that Skip just passes over the index you pass in rather than removing it from the list.

You can use GroupBy:
var anyDuplicates = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.GroupBy(p => p.Text, p => p)
.Any(g => g.Count() > 1);
Assert.That(anyDuplicates, Is.False);
or Distinct:
var productTitles = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text)
.ToArray();
var distinctProductTitles = productTitles.Distinct().ToArray();
Assert.AreEqual(productTitles.Length, distinctProductTitles.Length);
Or, if it is enough to find a first duplicate without counting all of them it's better to use a HashSet<T>:
var titles = new HashSet<string>();
foreach (var title in SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text))
{
if (!titles.Add(title))
{
Assert.Fail("Found duplicate product in the table");
}
}
All approaches are better in terms of computational complexity (O(n)) than what you propose (O(n²)).

You don't need a loop. Simply use the Where() function to find all same titles, and if there is more than one, then they're duplicates:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach(var x in productTitles) {
if (productTitles.Where(y => x.Text == y.Text).Count() > 1) {
Assert.Fail("Found duplicate product in the table");
}
}

I would try a slightly different way since you only need to check for duplicates in a one-dimensional array.
You only have to check the previous element with the next element within the array/collection so using Linq to iterate through all of the items seems a bit unnecessary.
Here's a piece of code to better understand:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
for ( int i = 0; i < productTitles.Count; i++ )
{
// Compare each title only with the titles that come after it.
var currentObject = productTitles[i];
for ( int j = i + 1; j < productTitles.Count; j++ )
{
if ( currentObject.Text == productTitles[j].Text )
{
// here's your duplicate
}
}
}
Since you've checked that item at index 0 is not the same as item placed at index 3 there's no need to check that again when you're at index 3. The items will remain the same.

The Skip(IEnumerable, n) method returns an IEnumerable that doesn't "contain" the first n elements of the IEnumerable it's called on.
Also I don't know what sort of behaviour could arise from this, but I wouldn't assign a new IEnumerable to the variable over which the foreach is being executed.
Here's another possible solution with LINQ:
int i = 0;
foreach (var x in productTitles)
{
var possibleDuplicate = productTitles.Skip(++i).FirstOrDefault(y => y.Text == x.Text);
//if possibleDuplicate is not default value of type
//do stuff here
}
This goes without saying, but the best solution for you will depend on what you are trying to do. Also, I think the Skip method call is more trouble than it's worth, as I'm pretty sure it will make the search less efficient.
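To illustrate the point about Skip made at the top of this answer: it produces a new sequence and leaves the source collection untouched. The helper name below (SkipDemo.ItemsAfter) is made up for this small sketch.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SkipDemo
{
    // Hypothetical helper: returns the items that come after position index.
    // Skip only changes what the returned sequence yields; source is not modified.
    public static List<string> ItemsAfter(List<string> source, int index)
    {
        return source.Skip(index + 1).ToList();
    }
}
```

After calling ItemsAfter(list, 0) on a three-item list, the result holds the last two items and the original list still has all three.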

C# Delete one of two successive and same lines in a list

How can I delete one of two identical successive lines in a list?
For example:
load
testtest
cd /abc
cd /abc
testtest
exit
cd /abc
In this case ONLY line three OR four. The lists have about 50000 lines, so it is also about speed.
Do you have an idea?
Thank you!
Homeros

You just have to look at the last added element in the second list:
var secondList = new List<string>(firstList.Count){ firstList[0] };
foreach(string next in firstList.Skip(1))
if(secondList.Last() != next)
secondList.Add(next);
Since you wanted to delete the duplicates you have to assign this new list to the old variable:
firstList = secondList;
This approach is more efficient than deleting from a list.
Side-note: since Enumerable.Last is optimized for collections with an indexer (IList<T>), it is as efficient as secondList[secondList.Count-1], but more readable.

Use a reverse for-loop and check the adjacent elements:
List<string> list = new List<string>();
for (int i = list.Count-1; i > 0 ; i--)
{
if (list[i] == list[i-1])
{
list.RemoveAt(i);
}
}
The reverse version is advantageous here, because the list might shrink in size with every removed element.
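A variation on this, not part of the answer above: each RemoveAt shifts every later element, so with many duplicates in a 50000-line list the reverse loop can do a lot of copying. A two-index compaction pass avoids that. The names (ListCleanup, RemoveSuccessiveDuplicates) are made up for this sketch.

```csharp
using System.Collections.Generic;

public static class ListCleanup
{
    // Hypothetical helper: collapses each run of equal adjacent entries to one
    // entry, in place, in a single pass.
    public static void RemoveSuccessiveDuplicates(List<string> list)
    {
        if (list.Count < 2) return;

        int write = 1; // next slot to fill with a kept element
        for (int read = 1; read < list.Count; read++)
        {
            // Keep the element only if it differs from the last kept one.
            if (list[read] != list[write - 1])
            {
                list[write] = list[read];
                write++;
            }
        }

        // Everything from write onward is leftover; drop it in one call.
        list.RemoveRange(write, list.Count - write);
    }
}
```

Like the reverse loop, this reduces a run of three or more equal lines to a single line.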

I would first split the list, then use LINQ to only select items that don't have the same previous item:
string[] source = text.Split(Environment.NewLine);
var list = source.Select((l, idx) => new { Line = l, Index = idx } )
.Where(x => x.Index == 0 || source[x.Index - 1] != x.Line)
.Select(x => x.Line)
.ToList() // materialize
;

O(n) as extension method
public static IEnumerable<string> RemoveSameSuccessiveItems(this IEnumerable<string> items)
{
string previousItem = null;
foreach(var item in items)
{
if (item.Equals(previousItem) == false)
{
previousItem = item;
yield return item;
}
}
}
Then use it
lines = lines.RemoveSameSuccessiveItems();

C# count consecutive duplicates in List<string>

I have a List of strings and want to count the duplicates in it, to work with this information later. To simply count the duplicates would be very easy, but unfortunately I just want to count the consecutive duplicates.
Let us say we have a list with these string items in it:
"1A","3B","5X","7Q","2W","2G","2J","1A","2A"
Now I want to count the duplicates in this list. I just will look at the first char of each string, the other characters in the string can be ignored!
What we get is 2x "1%" and 3x "2%". What I actually want to get is consecutive duplicates, so my result should look like 3x "2%". The 2x "1A" has to be ignored, because they are not in a row.
(% = place holder)
I wrote a code that loops through the list and compares one string with the next one
int counter = 0;
for (int i = 0; i < list.Count; i++)
{
char first = list[i][0];
if ((i + 1) == list.Count) break;
char second = list[(i + 1)][0];
if (first == second)
{
counter++;
}
}
I guess you can imagine that this code is a very ugly way to do this, especially if you want to work with the output. Also, my code can't handle the features I need.
The code I am looking for must be able to deal with two features I want to implement. First, a row of duplicates does not end at the end of the list if the last element of my list is equal to the first element of the list.
For example:
"1A","1B","5X","7Q","2J","1I"
The "1%" has to be detected as duplicate, because of the "1I" and "1A" which are "in a row". If you would loop through the list you just break up at the end of the list if the first and last element are not equal.
pseudo code:
if(list.First()[0] != list.Last()[0])
The second feature I want to implement is that, when a row of duplicates has a "duplicate count" over 4, the items in the list that are not part of that row are deleted. If there is no row of duplicates with a "duplicate count" (length) over 4, I want to return.
For example:
"1A","1B","5X","3Q","1J","1I"
duplicate count == 4 so return
"1A","1B","1X","3Q","1J","1I"
duplicate count == 5, save this five items, delete any other item in the list.
"1A","1B","1X","3Q","1I","1Z","1Z"
duplicate count == 6, save this six items, delete any other item in the list.
Notice:
Just the first char of each string matters. The input list will have 7 items, not a single item more or less. There is no result list, the old one has to be updated. If the duplicate count is under or equal to 4, then there is no work to do, simply return.
There will not be more than 5 duplicates in a row. I have to check billions of lists, so performance is really important.
As they don't teach any better English in German schools, I hope someone understands what my problem is and is willing to help me out.
This is not part of any homework.

What you can use here is a method that is capable of grouping consecutive items while a condition is met:
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
this IEnumerable<T> source, Func<T, T, bool> predicate)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
List<T> list = new List<T>() { iterator.Current };
T previous = iterator.Current;
while (iterator.MoveNext())
{
if (!predicate(previous, iterator.Current))
{
yield return list;
list = new List<T>();
}
list.Add(iterator.Current);
previous = iterator.Current;
}
yield return list;
}
}
Once we have this helper method we can write your query in a reasonably straightforward manner:
var query = data.GroupWhile((prev, current) => prev[0] == current[0])
.Where(group => group.Count() > 1)
.Select(group => new
{
Character = group.First()[0],
Count = group.Count(),
});
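The query above treats the list as linear, so it does not cover the wrap-around requirement from the question (the last item counting as adjacent to the first). One way that could be handled is sketched below. The names (CircularRuns, LongestCircularRun) are made up, and the sketch assumes every string is non-empty.

```csharp
using System.Collections.Generic;

public static class CircularRuns
{
    // Hypothetical helper: length of the longest run of items sharing a first
    // character, where the last item is treated as adjacent to the first.
    public static int LongestCircularRun(IReadOnlyList<string> items)
    {
        int n = items.Count;
        if (n == 0) return 0;

        // Find a position where a run starts, i.e. its predecessor differs.
        int start = -1;
        for (int i = 0; i < n; i++)
        {
            int prev = (i + n - 1) % n;
            if (items[i][0] != items[prev][0]) { start = i; break; }
        }
        if (start == -1) return n; // every item shares the same first character

        // Walk once around the circle beginning at that run start.
        int best = 1, current = 1;
        for (int k = 1; k < n; k++)
        {
            int i = (start + k) % n;
            int prev = (i + n - 1) % n;
            current = items[i][0] == items[prev][0] ? current + 1 : 1;
            if (current > best) best = current;
        }
        return best;
    }
}
```

Starting the walk at a run boundary means no run is split across the end of the list, so a single pass is enough. A caller could compare the result against 4 to decide whether any items need to be deleted.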

I suggest grouping items that begin with the same char into lists. The result of this grouping will be a List<List<string>>. This makes it easier to work with groups.
var list = new List<string> {
"1A", "3B", "5X", "7Q", "2W", "2G", "2J", "1B", "1C", "1D", "1E"
};
var groups = new List<List<string>>();
char lastChar = (char)0; // We assume that NUL will never be used as first char.
List<string> group = null;
foreach (string s in list) {
if (s[0] != lastChar) {
group = new List<string>();
groups.Add(group);
lastChar = s[0];
}
group.Add(s);
}
// Join the first and the last group if their first char is equal
int lastIndex = groups.Count - 1;
if (groups.Count > 2 && groups[0][0][0] == groups[lastIndex][0][0]) {
// Insert the elements of the last group to the first group
groups[0].InsertRange(0, groups[lastIndex]);
// and delete the last group
groups.RemoveAt(lastIndex);
}
//TODO: Remove test
foreach (List<string> g in groups) {
Console.WriteLine(g[0][0]);
foreach (string s in g) {
Console.WriteLine(" " + s);
}
}
// Now create a list with items of groups having more than 4 duplicates
var result = new List<string>();
foreach (List<string> g in groups) {
if (g.Count > 4) {
result.AddRange(g);
}
}
//TODO: Remove test
Console.WriteLine("--------");
foreach (string s in result) {
Console.Write(s);
Console.Write(" ");
}
Console.WriteLine();
Console.ReadKey();
