Loop to check for duplicate strings

I want to create a loop to check a list of titles for duplicates.
I currently have this:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var x in productTitles)
{
var title = x.Text;
productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var y in productTitles.Skip(productTitles.IndexOf(x) + 1))
{
if (title == y.Text)
{
Assert.Fail("Found duplicate product in the table");
}
}
}
But this seems to take the item I skip out of the collection for the next iteration, so item 2 is never compared with item 1; the check moves straight on to item 3.
I was under the impression that Skip just passes over the number of elements you give it rather than removing them from the list.

You can use GroupBy:
var anyDuplicates = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.GroupBy(p => p.Text, p => p)
.Any(g => g.Count() > 1);
Assert.That(anyDuplicates, Is.False);
or Distinct:
var productTitles = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text)
.ToArray();
var distinctProductTitles = productTitles.Distinct().ToArray();
Assert.AreEqual(productTitles.Length, distinctProductTitles.Length);
Or, if it is enough to find a first duplicate without counting all of them it's better to use a HashSet<T>:
var titles = new HashSet<string>();
foreach (var title in SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text))
{
if (!titles.Add(title))
{
Assert.Fail("Found duplicate product in the table");
}
}
All of these approaches have better computational complexity (O(n)) than the nested loop you propose (O(n²)).
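As a minimal sketch of the HashSet<T> idea on plain strings (no Selenium involved), the helper below collects every title that occurs more than once. The class and method names (DuplicateCheck, FindDuplicates) are made up for illustration and are not part of any library.

```csharp
using System.Collections.Generic;

// Hypothetical helper, shown only to illustrate the HashSet<T> technique.
public static class DuplicateCheck
{
    // Returns each title that occurs more than once, in the order in which
    // its first repetition is encountered. Runs in a single pass.
    public static List<string> FindDuplicates(IEnumerable<string> titles)
    {
        var seen = new HashSet<string>();
        var reported = new HashSet<string>();
        var duplicates = new List<string>();

        foreach (var title in titles)
        {
            // HashSet<T>.Add returns false when the value is already in the set.
            if (!seen.Add(title) && reported.Add(title))
            {
                duplicates.Add(title);
            }
        }

        return duplicates;
    }
}
```

In a test you could then assert that the returned list is empty and include its contents in the failure message, which makes the failing product titles visible.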

You don't need a nested loop. Simply use the Where() function to find all elements with the same title; if there is more than one, they're duplicates. (Where() still scans the whole collection for each element, so this remains O(n²).)
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach(var x in productTitles) {
if (productTitles.Where(y => x.Text == y.Text).Count() > 1) {
Assert.Fail("Found duplicate product in the table");
}
}

I would try a slightly different way since you only need to check for duplicates in a one-dimensional array.
You only have to compare each element with the elements that come after it in the array/collection, so using LINQ to iterate through all of the items seems a bit unnecessary.
Here's a piece of code to better understand:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
for (int i = 0; i < productTitles.Count; i++)
{
    var currentTitle = productTitles[i].Text;
    // Only look at the elements after index i; earlier pairs were already compared.
    for (int j = i + 1; j < productTitles.Count; j++)
    {
        if (currentTitle == productTitles[j].Text)
        {
            // here's your duplicate
        }
    }
}
Since you've checked that item at index 0 is not the same as item placed at index 3 there's no need to check that again when you're at index 3. The items will remain the same.

Enumerable.Skip(source, n) returns a new IEnumerable that omits the first n elements of the sequence it is called on; the source itself is not modified.
Also, I don't know what sort of behaviour could arise from this, but I wouldn't assign a new collection to the variable that the foreach is iterating over.
Here's another possible solution with LINQ:
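int i = 0;
foreach (var x in productTitles)
{
    // Skip(++i) starts the search at the element after x, so x is not compared with itself.
    var possibleDuplicate = productTitles.Skip(++i).FirstOrDefault(y => y.Text == x.Text);
    // if possibleDuplicate is not null, a duplicate was found
    // do stuff here
}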
This goes without saying, but the best solution for you will depend on what you are trying to do. Also, I think the Skip call is more trouble than it's worth, as it will probably make the search less efficient.
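To illustrate the point about Skip, here is a small sketch on plain strings. It assumes the titles have already been read into a list; SkipDemo and HasDuplicate are hypothetical names. Skip produces a lazy view over the remaining elements, and the list it is called on keeps all of its items.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, shown only to illustrate how Skip behaves.
public static class SkipDemo
{
    // Compares each title only with the titles that follow it.
    public static bool HasDuplicate(IReadOnlyList<string> titles)
    {
        for (int i = 0; i < titles.Count; i++)
        {
            // Skip(i + 1) yields a new sequence; 'titles' itself is unchanged.
            if (titles.Skip(i + 1).Contains(titles[i]))
            {
                return true;
            }
        }

        return false;
    }
}
```

This is still quadratic in the worst case, so it mainly serves to show that Skip does not remove anything from the source.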

Related

Modify duplicate values with duplication index suffix (using Linq)

I have a list:
List<string> myList = new List<string>{ "dog", "cat", "dog", "bird" };
I want the output to be list of:
"dog (1)", "cat", "dog (2)", "bird"
I've already looked through this question, but it only talks about counting the duplicates; my output should include each duplicate's index, like:
duplicate (index)
I've tried this code:
var q = list.GroupBy(x => x)
.Where(y => y.Count()>1)
.Select(g => new {Value = g.Key + "(" + g.Index + ")"})
but it does not seem to work because:
I need to return the whole list (or modify the existing one), whereas my attempt returns only the duplicated values.
Duplicate values need a suffix based on their "duplicate index".
How to do this in C#? Is there a way using Linq?

The accepted solution works but is extremely inefficient when the size of the list grows large.
What you want to do is first get the information you need in an efficient data structure. Can you implement a class:
sealed class Counter<T>
{
public void Add(T item) { }
public int Count(T item) { }
}
where Count returns the number of times (possibly zero) that Add has been called with that item. (Hint: you could use a Dictionary<T, int> to good effect.)
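One possible way to fill in the skeleton, following the Dictionary<T, int> hint, is sketched below. This is only one implementation of the exercise, not necessarily the one the answer had in mind.

```csharp
using System.Collections.Generic;

// A sketch of the Counter<T> helper, backed by a Dictionary<T, int>.
public sealed class Counter<T>
{
    private readonly Dictionary<T, int> counts = new Dictionary<T, int>();

    // Records one more occurrence of the item.
    public void Add(T item)
    {
        // TryGetValue leaves 'current' at 0 when the item has not been seen yet.
        counts.TryGetValue(item, out int current);
        counts[item] = current + 1;
    }

    // Returns how many times Add has been called with this item (possibly zero).
    public int Count(T item)
    {
        return counts.TryGetValue(item, out int count) ? count : 0;
    }
}
```

Both operations are close to O(1), so filling the counter for a list of n items takes roughly O(n) time.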
All right. Now that we have our useful helper we can:
var c1 = new Counter<string>();
foreach(string item in myList)
c1.Add(item);
Great. Now we can construct our new list by making use of a second counter:
var result = new List<String>();
var c2 = new Counter<String>();
foreach(string item in myList)
{
c2.Add(item);
if (c1.Count(item) == 1)
result.Add(item);
else
result.Add($"{item} ({c2.Count(item)})");
}
And we're done. Or, if you want to modify the list in place:
var c2 = new Counter<String>();
// It's a bad practice to mutate a list in a foreach, so
// we'll be sticklers and use a for.
for (int i = 0; i < myList.Count; i = i + 1)
{
var item = myList[i];
c2.Add(item);
if (c1.Count(item) != 1)
myList[i] = $"{item} ({c2.Count(item)})";
}
The lesson here is: create a useful helper class that solves one problem extremely well, and then use that helper class to make the solution to your actual problem more elegant. You need to count things to solve a problem? Make a thing-counter class.

This solution is not quadratic with respect to size of list, and it modifies the list in place as preferred in OP.
Any efficient solution will involve a pre-pass in order to find and count the duplicates.
List<string> myList = new List<string>{ "dog", "cat", "dog", "bird" };
//map out a count of all the duplicate words in dictionary.
var counts = myList
.GroupBy(s => s)
.Where(p => p.Count() > 1)
.ToDictionary(p => p.Key, p => p.Count());
//modify the list, going backwards so we can take advantage of our counts.
for (int i = myList.Count - 1; i >= 0; i--)
{
string s = myList[i];
if (counts.ContainsKey(s))
{
//add the suffix and decrement the number of duplicates left to tag.
myList[i] += $" ({counts[s]--})";
}
}

One way to do this is to simply create a new list that contains the additional text for each item that appears more than once. When we find these items, we can create our formatted string using a counter variable, and increment the counter if the list of formatted strings contains that counter already.
Note that this is NOT a good performing solution. It was just the first thing that came to my head. But it's a place to start...
private static void Main()
{
var myList = new List<string> { "dog", "cat", "dog", "bird" };
var formattedItems = new List<string>();
foreach (var item in myList)
{
if (myList.Count(i => i == item) > 1)
{
int counter = 1;
while (formattedItems.Contains($"{item} ({counter})")) counter++;
formattedItems.Add($"{item} ({counter})");
}
else
{
formattedItems.Add(item);
}
}
Console.WriteLine(string.Join(", ", formattedItems));
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();
}

Ok, @EricLippert challenged me and I couldn't let it go. Here's my second attempt, which I believe is much better performing and modifies the original list as requested. Basically we create a second list that contains all the duplicate entries in the first. Then we walk backwards through the first list, modifying any entries that have a counterpart in the duplicates list, and remove the item from the duplicates list each time we encounter one:
private static void Main()
{
var myList = new List<string> {"dog", "cat", "dog", "bird"};
var duplicates = myList.Where(item => myList.Count(i => i == item) > 1).ToList();
for (var i = myList.Count - 1; i >= 0; i--)
{
var numDupes = duplicates.Count(item => item == myList[i]);
if (numDupes <= 0) continue;
duplicates.Remove(myList[i]);
myList[i] += $" ({numDupes})";
}
Console.WriteLine(string.Join(", ", myList));
Console.Write("\nDone!\nPress any key to exit...");
Console.ReadKey();
}

C# Delete one of two successive and same lines in a list

How can I delete one of two identical successive lines in a list?
For example:
load
testtest
cd /abc
cd /abc
testtest
exit
cd /abc
In this case ONLY line three OR four should be deleted. The lists have about 50,000 lines, so speed also matters.
Do you have an idea?
Thank you!
Homeros

You just have to look at the last added element in the second list:
var secondList = new List<string>(firstList.Count){ firstList[0] };
foreach(string next in firstList.Skip(1))
if(secondList.Last() != next)
secondList.Add(next);
Since you wanted to delete the duplicates you have to assign this new list to the old variable:
firstList = secondList;
This approach is more efficient than deleting from a list.
Side note: since Enumerable.Last is optimized for collections with an indexer (IList<T>), it is as efficient as secondList[secondList.Count-1], but more readable.

Use a reverse for-loop and check the adjacent elements:
List<string> list = new List<string>();
for (int i = list.Count-1; i > 0 ; i--)
{
if (list[i] == list[i-1])
{
list.RemoveAt(i);
}
}
The reverse version is advantageous here because the list may shrink with every removed element, and removing at index i only shifts elements that have already been checked.

I would first split the list, then use LINQ to only select items that don't have the same previous item:
string[] source = text.Split(Environment.NewLine);
var list = source.Select((l, idx) => new { Line = l, Index = idx } )
.Where(x => x.Index == 0 || source[x.Index - 1] != x.Line)
.Select(x => x.Line)
.ToList() // materialize
;

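An O(n) solution as an extension method (it must be declared inside a static class):
public static IEnumerable<string> RemoveSameSuccessiveItems(this IEnumerable<string> items)
{
    string previousItem = null;
    foreach (var item in items)
    {
        // string.Equals(a, b) is null-safe, unlike item.Equals(...)
        if (!string.Equals(item, previousItem))
        {
            previousItem = item;
            yield return item;
        }
    }
}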
Then use it
lines = lines.RemoveSameSuccessiveItems();
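The same idea can be sketched generically so that it works for any element type. The names SequenceExtensions and RemoveConsecutiveDuplicates are hypothetical, and the sketch assumes the default equality comparer is the comparison you want.

```csharp
using System.Collections.Generic;

// Hypothetical generic variant of the extension method above.
public static class SequenceExtensions
{
    // Yields each item unless it equals the item immediately before it.
    public static IEnumerable<T> RemoveConsecutiveDuplicates<T>(this IEnumerable<T> source)
    {
        var comparer = EqualityComparer<T>.Default;
        bool hasPrevious = false;
        T previous = default(T);

        foreach (var item in source)
        {
            // The first item is always kept; later items are kept only when they differ.
            if (!hasPrevious || !comparer.Equals(previous, item))
            {
                yield return item;
            }

            previous = item;
            hasPrevious = true;
        }
    }
}
```

Applied to the sample lines from the question, this keeps "cd /abc" once where it appears twice in a row and keeps the later, non-adjacent "cd /abc" as well. Because the method is lazy, call ToList() on the result if you need a materialized list.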

Remove duplicates with lambda leaving last item (from dupes) alive

I'm trying to refactor some old "for-bubbled" (nested for loop) code that removes duplicates from a collection of Items: if properties X, Y and Z match those of a previously inserted Item, only the last item inserted should be preserved in the collection:
private void RemoveDuplicates()
{
//Remove duplicated items.
int endloop = Items.Count;
for (int i = 0; i < endloop - 1; i++)
{
var item = Items[i];
for (int j = i + 1; j < endloop; j++)
{
if (!item.HasSamePropertiesThan(Items[j]))
{
continue;
}
AllItems.Remove(item);
break;
}
}
}
where HasSamePropertiesThan() is an extension method for Item and does something similar to:
public static bool HasSamePropertiesThan(this Item i1, Item i2)
{
    return string.Equals(i1.X, i2.X, StringComparison.InvariantCulture)
        && string.Equals(i1.Y, i2.Y, StringComparison.InvariantCulture)
        && string.Equals(i1.Z, i2.Z, StringComparison.InvariantCulture);
}
so if I have a collection like:
[0]A
[1]A
[2]A
[3]B
[4]A
[5]A
I want to be able to delete all duplicates, leaving only [3]B and [5]A alive.
so far, I've managed to craft these lambdas:
var query = items.GroupBy(i => new {i.X, i.Y, i.Z}).Select(i => i.Last()); // Retrieves entities to not delete
var dupes = Items.Except(query);
dupes.ToList().ForEach(d => Items.Remove(d));
based on these examples:
Remove duplicates in the list using linq
Delete duplicates using Lambda
Which don't seem to work quite well... (The removed items are incorrect, some items are left in the collection and should've been removed) what am I doing wrong?

A quick question: is the result of "query" supposed to be the result that you are looking for? As I read it, you first query for the items to keep, then work out the remaining items, and finally remove those from the original list.
Correct me if I'm wrong, but isn't that the same as doing something like this:
items = items.GroupBy(i => new {i.X, i.Y, i.Z}).Select(i => i.Last()).ToList();
If "query" is not returning the right elements, then the problem is in how you are doing the query, or you probably need to order the list before applying the query.
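A small self-contained sketch of the GroupBy/Last approach follows. The Item class here is a stand-in with three string properties, since the real type is not shown in the question, and ItemDeduplication is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-in for the Item type from the question (assumed shape).
public sealed class Item
{
    public string X { get; set; }
    public string Y { get; set; }
    public string Z { get; set; }
}

public static class ItemDeduplication
{
    // Keeps only the last item of each (X, Y, Z) group.
    public static List<Item> KeepLastOfEachGroup(IEnumerable<Item> items)
    {
        return items
            // Anonymous types compare by value, so equal X/Y/Z land in one group.
            .GroupBy(i => new { i.X, i.Y, i.Z })
            // Within a group, elements keep their source order, so Last() is the newest.
            .Select(g => g.Last())
            .ToList();
    }
}
```

Two things to keep in mind: groups come out in the order in which each key first appears, so for the A, A, A, B, A, A example the result is the item at [5] followed by the item at [3]; and the anonymous-type key compares strings ordinally, which is not identical to the StringComparison.InvariantCulture comparison used in the question.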

You could either use a HashSet, or using linq do something like this:
var dups = new string[]{"A","A","B","B"};
var nonDupe = dups.Distinct().ToArray();

Remove an object from a collection

I want to remove an object from a collection if the object doesn't satisfy some condition:
foreach (var data in infData)
{
if(data.Id==0)
{
//Remove this object from the collection
}
}
How do I do this?
EDIT: This is the complete code
foreach (var data in infData)
{
//Validate Offence Code
IQueryable<Ref_OffenceCode> allRows = dbContext.Ref_OffenceCode;
if (allRows.Where(p => p.Code == data.offenceCode && p.StartDate<=data.offenceDate ).Count() == 0)
{
invalidCount += 1;
}
//Validate Location Code
//IQueryable<Ref_OffenceCode> allRows = dbContext.Ref_OffenceCode;
if (invalidCount != 0)
{
infData.Remove(data);
}
}

Instead of removing the object from the collection you could create a new filtered collection:
var filteredList = infData.Where(x => x.Id != 0);
and let the GC take care of the old collection when it falls out of scope. Also, you mentioned ArrayList in your post. Unless you are using .NET 1.1 or older there's absolutely no reason to use ArrayList. A generic collection would be more appropriate.

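For a List<T>, do this (RemoveAll modifies the list in place and returns the number of removed items, so don't assign its result back to the list):
infData.RemoveAll(p => p.Id == 0);
and in general, for any enumerable, you can do this (note that Except also drops duplicate elements):
enumerable = enumerable.Except(enumerable.Where(p => p.Id == 0));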

Don't use foreach if you want to remove an item from a collection (since you are modifying the collection while iterating over it).
You can use an index based approach, but recall that the collection size will change. If you only need to remove one item, you can do this:
for (int i = 0; i < infData.Count; i++)
{
if(infData[i].Id==0)
{
infData.RemoveAt(i);
break;
}
}
As @Stefano comments, you can iterate backwards and then you don't need to break (and can remove multiple items):
for (int i = infData.Count - 1; i >= 0 ; i--)
{
if(infData[i].Id==0)
{
infData.RemoveAt(i);
}
}
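For comparison, here is a sketch of two in-place options on a plain List<int>: the built-in List<T>.RemoveAll and the backward loop shown above. RemovalDemo and its method names are hypothetical. Both avoid the InvalidOperationException that a List<T> throws when it is modified inside a foreach.

```csharp
using System.Collections.Generic;

// Hypothetical helpers that remove every zero from a list in place.
public static class RemovalDemo
{
    // List<T>.RemoveAll removes all matches and returns how many were removed.
    public static int RemoveZerosWithRemoveAll(List<int> ids)
    {
        return ids.RemoveAll(id => id == 0);
    }

    // Walking backwards means RemoveAt only shifts elements that were already visited.
    public static int RemoveZerosBackwards(List<int> ids)
    {
        int removed = 0;
        for (int i = ids.Count - 1; i >= 0; i--)
        {
            if (ids[i] == 0)
            {
                ids.RemoveAt(i);
                removed++;
            }
        }

        return removed;
    }
}
```

RemoveAll is usually the simpler choice when many items may match, since it compacts the list in a single pass instead of shifting elements once per removal.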

fastest way to remove an item in a list

I have a list of User objects, and I have to remove ONE item from the list with a specific UserID.
This method has to be as fast as possible. Currently I am looping through each item and checking whether the ID matches the UserID; if not, I add the item to my filteredList collection.
List allItems = GetItems();
for(int x = 0; x < allItems.Count; x++)
{
if(specialUserID == allItems[x].ID)
continue;
else
filteredItems.Add( allItems[x] );
}

If it really has to be as fast as possible, use a different data structure. List isn't known for efficiency of deletion. How about a Dictionary that maps ID to User?

Well, if you want to create a new collection to leave the original untouched, you have to loop through all the items.
Create the new list with the right capacity from the start, that minimises allocations.
Your program logic with the continue seems a bit backwards... just use the != operator instead of the == operator:
List<User> allItems = GetItems();
List<User> filteredItems = new List<User>(allItems.Count - 1);
foreach (User u in allItems) {
if(u.ID != specialUserID) {
filteredItems.Add(u);
}
}
If you want to change the original collection instead of creating a new, storing the items in a Dictionary<int, User> would be the fastest option. Both locating the item and removing it are close to O(1) operations, so that would make the whole operation close to an O(1) operation instead of an O(n) operation.
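A minimal sketch of the dictionary approach is shown below. The User class is an assumed shape (only the ID property matters here), and UserIndex.Build is a hypothetical helper that indexes users by ID.

```csharp
using System.Collections.Generic;

// Assumed shape of the User type from the question.
public sealed class User
{
    public int ID { get; set; }
    public string Name { get; set; }
}

public static class UserIndex
{
    // Builds an ID -> User lookup. Building it is O(n), but it is done once.
    public static Dictionary<int, User> Build(IEnumerable<User> users)
    {
        var byId = new Dictionary<int, User>();
        foreach (var user in users)
        {
            // If two users share an ID, the later one overwrites the earlier one.
            byId[user.ID] = user;
        }

        return byId;
    }
}
```

Once the lookup exists, byId.Remove(specialUserID) removes the entry without scanning and returns true if something was removed. The trade-off is that a Dictionary does not guarantee any enumeration order, so this only fits if the order of users does not matter.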

Use a hashtable. Lookup time is O(1) for everything, assuming a good hash algorithm with minimal collision potential. I would recommend something that implements IDictionary.

If you must transfer from one list to another, here is the fastest result I've found:
var filtered = new List<SomeClass>(allItems);
for (int i = 0; i < filtered.Count; i++)
if (filtered[i].id == 9999)
filtered.RemoveAt(i);
I tried comparing your method, the method above, and a linq "where" statement:
var allItems = new List<SomeClass>();
for (int i = 0; i < 10000000; i++)
allItems.Add(new SomeClass() { id = i });
Console.WriteLine("Tests Started");
var timer = new Stopwatch();
timer.Start();
var filtered = new List<SomeClass>();
foreach (var item in allItems)
if (item.id != 9999)
filtered.Add(item);
var y = filtered.Last();
timer.Stop();
Console.WriteLine("Transfer to filtered list: {0}", timer.Elapsed.TotalMilliseconds);
timer.Reset();
timer.Start();
filtered = new List<SomeClass>(allItems);
for (int i = 0; i < filtered.Count; i++)
if (filtered[i].id == 9999)
filtered.RemoveAt(i);
var s = filtered.Last();
timer.Stop();
Console.WriteLine("Removal from filtered list: {0}", timer.Elapsed.TotalMilliseconds);
timer.Reset();
timer.Start();
var linqresults = allItems.Where(x => (x.id != 9999));
var m = linqresults.Last();
timer.Stop();
Console.WriteLine("linq list: {0}", timer.Elapsed.TotalMilliseconds);
The results were as follows:
Tests Started
Transfer to filtered list: 610.5473
Removal from filtered list: 207.5675
linq list: 379.4382
Copying the list with the List<T>(IEnumerable<T>) constructor and then using RemoveAt was a good deal faster.
Also, subsequent .RemoveAt calls are pretty cheap.

I know it's not the fastest, but what about a generic list and Remove()? (MSDN). Does anybody know how it performs compared to, e.g., the example in the question?

Here's a thought, how about you don't remove it per se. What I mean is something like this:
public static IEnumerable<T> LoopWithExclusion<T>(this IEnumerable<T> list, Func<T,bool> excludePredicate)
{
foreach(var item in list)
{
if(excludePredicate(item))
{
continue;
}
yield return item;
}
}
The point being, whenever you need a "filtered" list, just call this extension method, which loops through the original list, returns all of the items, EXCEPT the ones you don't want.
Something like this:
List<User> users = GetUsers();
//later in the code when you need the filtered list:
foreach(var user in users.LoopWithExclusion(u => u.Id == myIdToExclude))
{
//do what you gotta do
}

Assuming the count of the list is even, I would:
(a) get the number of processors
(b) divide your list into equal chunks, one per processor
(c) spawn a thread for each processor to search its chunk, terminating as soon as the predicate returns true for an item.

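public static void RemoveSingle<T>(this List<T> items, Predicate<T> match)
{
    // Find the index of the first matching item (items.Count if there is none).
    int i = 0;
    while (i < items.Count && !match(items[i])) i++;
    if (i < items.Count)
    {
        // Overwrite the match with the last item, then drop the last slot.
        // This avoids shifting elements but does not preserve the order.
        items[i] = items[items.Count - 1];
        items.RemoveAt(items.Count - 1);
    }
}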

I cannot understand why the easiest, most straightforward and obvious solution (also the fastest among the List-based ones) wasn't given by anyone.
This code removes ONE item with a matching ID.
for(int i = 0; i < items.Count; i++) {
if(items[i].ID == specialUserID) {
items.RemoveAt(i);
break;
}
}

If you have a list and you want to mutate it in place to remove an item matching a condition the following is faster than any of the alternatives posted so far:
for (int i = allItems.Count - 1; i >= 0; i--)
if (allItems[i].id == 9999)
allItems.RemoveAt(i);
A Dictionary may be faster for some uses, but don't discount a List. For small collections, it will likely be faster, and for large collections, it may save memory, which may in turn make your application faster overall. Profiling is the only way to determine which is faster in a real application.

Here is some code that is efficient if you have hundreds or thousands of items:
List allItems = GetItems();
//Choose the correct loop here: the unrolled loop below needs a count divisible by 5
if((allItems.Count % 5) == 0 && (allItems.Count >= 5))
{
for(int x = 0; x < allItems.Count; x = x + 5)
{
if(specialUserID != allItems[x].ID)
filteredItems.Add( allItems[x] );
if(specialUserID != allItems[x+1].ID)
filteredItems.Add( allItems[x+1] );
if(specialUserID != allItems[x+2].ID)
filteredItems.Add( allItems[x+2] );
if(specialUserID != allItems[x+3].ID)
filteredItems.Add( allItems[x+3] );
if(specialUserID != allItems[x+4].ID)
filteredItems.Add( allItems[x+4] );
}
}
Test whether the size of the list is divisible by the unroll factor, starting from the largest factor and working down to the smallest. If you want 10 if statements in the loop, test whether the size of the list is at least ten and divisible by ten, then go down from there. For example, if you have 99 items, you can use 9 if statements in the loop; the loop will iterate 11 times instead of 99 times.
"if" statements are cheap and fast.
