fastest way to remove an item in a list - c#

I have a list of User objects, and I have to remove ONE item from the list with a specific UserID.
This method has to be as fast as possible. Currently I am looping through each item and checking whether the ID matches the UserID; if it does not, I add the row to my filteredItems collection.
List<User> allItems = GetItems();
List<User> filteredItems = new List<User>();
for(int x = 0; x < allItems.Count; x++)
{
if(specialUserID == allItems[x].ID)
continue;
else
filteredItems.Add( allItems[x] );
}

If it really has to be as fast as possible, use a different data structure. List isn't known for efficiency of deletion. How about a Dictionary that maps ID to User?

Well, if you want to create a new collection to leave the original untouched, you have to loop through all the items.
Create the new list with the right capacity from the start, that minimises allocations.
Your program logic with the continue seems a bit backwards... just use the != operator instead of the == operator:
List<User> allItems = GetItems();
List<User> filteredItems = new List<User>(allItems.Count - 1);
foreach (User u in allItems) {
if(u.ID != specialUserID) {
filteredItems.Add(u);
}
}
If you want to change the original collection instead of creating a new, storing the items in a Dictionary<int, User> would be the fastest option. Both locating the item and removing it are close to O(1) operations, so that would make the whole operation close to an O(1) operation instead of an O(n) operation.
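A minimal sketch of the dictionary approach, assuming IDs are unique. The tuple type is a hypothetical stand-in for the User class, and the sample data is made up for illustration:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the User class: a tuple with an ID and a name.
var users = new List<(int ID, string Name)>
{
    (1, "alice"), (2, "bob"), (3, "carol")
};

// Building the index is O(n); it only pays off across many lookups or removals.
var usersById = new Dictionary<int, (int ID, string Name)>(users.Count);
foreach (var u in users)
{
    usersById[u.ID] = u;
}

int specialUserID = 2;

// Remove returns true if the key existed; no scan over the other entries is needed.
bool removed = usersById.Remove(specialUserID);

Console.WriteLine(removed);          // True
Console.WriteLine(usersById.Count);  // 2
```

Note that a Dictionary does not keep the items in list order, so this only fits if the order of the users does not matter.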

Use a hashtable. Lookup time is O(1) for everything assuming a good hash algorithm with minimal collision potential. I would recommend something that implements IDictionary

If you must transfer from one list to another, here is the fastest approach I've found:
var filtered = new List<SomeClass>(allItems);
for (int i = 0; i < filtered.Count; i++)
if (filtered[i].id == 9999)
filtered.RemoveAt(i);
I compared your method, the method above, and a LINQ "Where" statement:
var allItems = new List<SomeClass>();
for (int i = 0; i < 10000000; i++)
allItems.Add(new SomeClass() { id = i });
Console.WriteLine("Tests Started");
var timer = new Stopwatch();
timer.Start();
var filtered = new List<SomeClass>();
foreach (var item in allItems)
if (item.id != 9999)
filtered.Add(item);
var y = filtered.Last();
timer.Stop();
Console.WriteLine("Transfer to filtered list: {0}", timer.Elapsed.TotalMilliseconds);
timer.Reset();
timer.Start();
filtered = new List<SomeClass>(allItems);
for (int i = 0; i < filtered.Count; i++)
if (filtered[i].id == 9999)
filtered.RemoveAt(i);
var s = filtered.Last();
timer.Stop();
Console.WriteLine("Removal from filtered list: {0}", timer.Elapsed.TotalMilliseconds);
timer.Reset();
timer.Start();
var linqresults = allItems.Where(x => (x.id != 9999));
var m = linqresults.Last();
timer.Stop();
Console.WriteLine("linq list: {0}", timer.Elapsed.TotalMilliseconds);
The results were as follows:
Tests Started
Transfer to filtered list: 610.5473
Removal from filtered list: 207.5675
linq list: 379.4382
Copying the whole collection with the List<T>(collection) constructor and then calling RemoveAt was a good deal faster than adding the items one by one.
Also, subsequent .RemoveAt calls are pretty cheap.

I know it's not the fastest, but what about the generic List<T> and its Remove() method (see MSDN)? Does anybody know how it performs compared to, e.g., the example in the question?
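For reference, List<T>.Remove(item) does a linear search for the item and then removes it, so it is O(n) like the loops above. When you only know the ID, FindIndex plus RemoveAt does the same job. A small sketch, with a list of int IDs standing in for the User objects:

```csharp
using System;
using System.Collections.Generic;

var ids = new List<int> { 10, 20, 30, 40 };  // stand-in for users, one ID each
int specialUserID = 30;

// FindIndex stops at the first match and returns -1 when nothing matches.
int index = ids.FindIndex(id => id == specialUserID);
if (index >= 0)
{
    ids.RemoveAt(index);   // shifts the items after index one slot down
}

Console.WriteLine(string.Join(", ", ids));  // 10, 20, 40
```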

Here's a thought, how about you don't remove it per se. What I mean is something like this:
public static IEnumerable<T> LoopWithExclusion<T>(this IEnumerable<T> list, Func<T,bool> excludePredicate)
{
foreach(var item in list)
{
if(excludePredicate(item))
{
continue;
}
yield return item;
}
}
The point being, whenever you need a "filtered" list, just call this extension method, which loops through the original list, returns all of the items, EXCEPT the ones you don't want.
Something like this:
List<User> users = GetUsers();
//later in the code when you need the filtered list:
foreach(var user in users.LoopWithExclusion(u => u.Id == myIdToExclude))
{
//do what you gotta do
}

Assuming the list divides evenly, I would:
(a) get the number of processors
(b) divide the list into equal chunks, one per processor
(c) spawn a thread per processor for its chunk, and stop all threads as soon as the predicate matches and sets a shared boolean flag.

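// Removes the first matching item in O(1) after the search by overwriting it
// with the last item. Note: this does not preserve the order of the list.
public static void RemoveSingle<T>(this List<T> items, Predicate<T> match)
{
int i = 0;
// Stop at the first match, or at items.Count if nothing matches.
while (i < items.Count && !match(items[i])) i++;
if (i < items.Count)
{
items[i] = items[items.Count - 1];
items.RemoveAt(items.Count - 1);
}
}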

I cannot understand why the easiest, most straightforward and obvious solution (also the fastest among the List-based ones) wasn't given by anyone.
This code removes ONE item with a matching ID.
for(int i = 0; i < items.Count; i++) {
if(items[i].ID == specialUserID) {
items.RemoveAt(i);
break;
}
}

If you have a list and you want to mutate it in place to remove an item matching a condition the following is faster than any of the alternatives posted so far:
for (int i = allItems.Count - 1; i >= 0; i--)
if (allItems[i].id == 9999)
allItems.RemoveAt(i);
A Dictionary may be faster for some uses, but don't discount a List. For small collections, it will likely be faster, and for large collections, it may save memory, which may in turn make your application faster overall. Profiling is the only way to determine which is faster in a real application.

Here is some code that is efficient if you have hundreds or thousands of items:
List<User> allItems = GetItems();
//Choose the correct loop here: this one requires the count to be a multiple of 5
if((allItems.Count % 5) == 0 && (allItems.Count >= 5))
{
for(int x = 0; x < allItems.Count; x = x + 5)
{
if(specialUserID != allItems[x].ID)
filteredItems.Add( allItems[x] );
if(specialUserID != allItems[x+1].ID)
filteredItems.Add( allItems[x+1] );
if(specialUserID != allItems[x+2].ID)
filteredItems.Add( allItems[x+2] );
if(specialUserID != allItems[x+3].ID)
filteredItems.Add( allItems[x+3] );
if(specialUserID != allItems[x+4].ID)
filteredItems.Add( allItems[x+4] );
}
}
Test whether the size of the list is divisible by a candidate number of if statements, starting with the largest number and working down to the smallest. If you want 10 if statements in the loop, test whether the list has at least ten items and its size is divisible by ten, then go down from there. For example, if you have 99 items, you can use 9 if statements in the loop. The loop will iterate 11 times instead of 99 times.
"if" statements are cheap and fast

Related

Loop to check for duplicate strings

I want to create a loop to check a list of titles for duplicates.
I currently have this:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var x in productTitles)
{
var title = x.Text;
productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach (var y in productTitles.Skip(productTitles.IndexOf(x) + 1))
{
if (title == y.Text)
{
Assert.Fail("Found duplicate product in the table");
}
}
}
But this seems to take the item I skip out of the array for the next loop, so item 2 never gets checked against item 1; it moves straight to item 3.
I was under the impression that Skip just passes over the number of items you pass in rather than removing them from the list.
You can use GroupBy:
var anyDuplicates = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.GroupBy(p => p.Text, p => p)
.Any(g => g.Count() > 1);
Assert.That(anyDuplicates, Is.False);
or Distinct:
var productTitles = SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text)
.ToArray();
var distinctProductTitles = productTitles.Distinct().ToArray();
Assert.AreEqual(productTitles.Length, distinctProductTitles.Length);
Or, if it is enough to find a first duplicate without counting all of them it's better to use a HashSet<T>:
var titles = new HashSet<string>();
foreach (var title in SeleniumContext
.Driver
.FindElements(By.XPath(ComparisonTableElements.ProductTitle))
.Select(p => p.Text))
{
if (!titles.Add(title))
{
Assert.Fail("Found duplicate product in the table");
}
}
All of these approaches have better computational complexity (O(n)) than what you propose (O(n^2)).
You don't need to write the inner loop yourself. Use the Where() function to find all items with the same title; if there is more than one, they're duplicates:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
foreach(var x in productTitles) {
if (productTitles.Where(y => x.Text == y.Text).Count() > 1) {
Assert.Fail("Found duplicate product in the table");
}
}
I would try a slightly different way, since you only need to check for duplicates in a one-dimensional array.
Each element only has to be compared with the elements that come after it, so using LINQ to iterate through all of the items for every element seems a bit unnecessary.
Here's a piece of code to illustrate:
var productTitles = SeleniumContext.Driver.FindElements(By.XPath(ComparisonTableElements.ProductTitle));
for ( int i = 0; i < productTitles.Count; i++ )
{
var currentObject = productTitles[i];
// Only look at the elements after index i; earlier pairs were already compared.
for ( int j = i + 1; j < productTitles.Count; j++ )
{
if ( currentObject.Text == productTitles[j].Text )
{
// here's your duplicate
}
}
}
Since you've already checked that the item at index 0 is not the same as the item at index 3, there's no need to check that again when you're at index 3. The items will remain the same.
The Skip(IEnumerable, n) method returns an IEnumerable that doesn't "contain" the n first element of the IEnumerable it's called on.
Also I don't know what sort of behaviour could arise from this, but I wouldn't assign a new IEnumerable to the variable over which the foreach is being executed.
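A quick way to convince yourself that Skip leaves the source untouched. This is a small sketch with a plain list of strings standing in for the product titles:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var titles = new List<string> { "a", "b", "c", "d" };

// Skip(2) yields a lazy view that starts at the third element.
IEnumerable<string> tail = titles.Skip(2);

Console.WriteLine(string.Join(",", tail));  // c,d
Console.WriteLine(titles.Count);            // 4, the source list is unchanged
```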
Here's another possible solution with LINQ:
int i = 0;
foreach (var x in productTitles)
{
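// ++i skips x itself and everything before it, so only later items are searched.
var possibleDuplicate = productTitles.Skip(++i).FirstOrDefault(y => y.Text == x.Text);
//if possibleDuplicate is not null, x has a duplicate later in the collection
//do stuff here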
}
This goes without saying, but the best solution for you will depend on what you are trying to do. Also, I think the Skip method call is more trouble than it's worth, as I suspect it makes the search less efficient.

correct way of looping through a list and remove items

I wrote a function to go through a list and remove list items if some conditions were met. My program crashed on it, and after a while I concluded that the outer for loop goes through all items in the list
while, in the same routine, the list of items can get shorter.
// Lijst is a list of a struct that contains a value .scanned and .price
for (int i = 0; i < Lijst.Count; i++)
{
if (Lijst[i].scanned == false)
{
// (removed deletion of list item i here)
if (Lijst[i].price > (int)nudMinimum.Value)
{
Totaal++;
lblDebug.Text = Totaal.ToString();
}
Lijst.RemoveAt(i); //<-moved to here
}
}
Now I wonder what the correct way to do this is, without getting index out of range errors.
Why not direct List<T>.RemoveAll()?
https://msdn.microsoft.com/en-us/library/wdka673a(v=vs.110).aspx
In your case
Lijst.RemoveAll(item => some condition);
E.g.
// Count all the not-scanned items whose price exceeds nudMinimum.Value
lblDebug.Text = Lijst
.Where(item => !item.scanned && item.price > (int)nudMinimum.Value)
.Count()
.ToString();
// Remove all not scanned items
Lijst.RemoveAll(item => !item.scanned);
You might be looking for this
for (int i = Lijst.Count - 1 ; i >= 0 ; i--)
{
if (Lijst[i].scanned == false)
{
if (Lijst[i].price > (int)nudMinimum.Value)
{
Totaal++;
lblDebug.Text = Totaal.ToString();
}
Lijst.RemoveAt(i);
}
}
Question in the comment:
why would the other direction for loop work ?
Because when the loop runs from zero up to Count, every removal shifts the remaining items one position down. The item that followed the removed one now sits at index i, but the loop increments i and skips it.
For example, if you have 10 items in the list and all of them should be removed, the forward loop removes the items originally at positions 0, 2, 4, 6 and 8 and leaves the other five behind. If the code also reads Lijst[i] after the removal, as the first version did, i can equal Lijst.Count and you get an index out of range error. Looping backwards avoids both problems, because a removal only shifts items that were already visited.
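A small experiment shows what each loop direction does when every item qualifies for removal. This is only a sketch; a list of ten integers stands in for Lijst:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var forward = Enumerable.Range(0, 10).ToList();
for (int i = 0; i < forward.Count; i++)
{
    forward.RemoveAt(i);   // the next item slides into slot i and is never visited
}

var backward = Enumerable.Range(0, 10).ToList();
for (int i = backward.Count - 1; i >= 0; i--)
{
    backward.RemoveAt(i);  // only slots that were already visited shift
}

Console.WriteLine(string.Join(",", forward));  // 1,3,5,7,9
Console.WriteLine(backward.Count);             // 0
```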
here you go
// 1. Count items
lblDebug.Text = Lijst.Count(x => x.price > (int)nudMinimum.Value && !x.scanned).ToString();
//2. Remove items
Lijst.RemoveAll(x => !x.scanned);
The problem is that when you remove element number 5, the list gets shorter and element number 6 is now 5th, number 7 becomes 6th, etc. However, if you run the loop backwards, the numbering is kept as expected.
for(int i = donkeys.Count - 1; i >= 0; i--)
if(donkeys[i] == some condition here)
donkeys.RemoveAt(i);
However, that's a rather heavy-handed approach. There are better ways. You've got the answer, but I'd like to suggest a LINQ-based approach.
int Totaal = Lijst
.Where(item => !item.scanned)
.Where(item => item.price > (int)nudMinimum.Value)
.Count();
Lijst = Lijst.Where(item => item.scanned).ToList();
Also, as a side note, I wonder if you find the below more readable. Consider the following different naming (both regarding the language and the capitalization).
List<Item> items = ...;
int minimum = (int)nudMinimum.Value;
int total = items
.Where(item => !item.scanned)
.Where(item => item.price > minimum)
.Count();
items = items
.Where(item => item.scanned)
.ToList();
You are removing the element at index i first and then using it. You need to process the element at index i first and then remove it. Your code will look like this:
for (int i = 0; i < Lijst.Count; i++)
{
if (Lijst[i].scanned == false)
{
if (Lijst[i].price > (int)nudMinimum.Value)
{
Totaal++;
lblDebug.Text = Totaal.ToString();
}
Lijst.RemoveAt(i);
i--; // step back so the item that slid into slot i is checked too
}
}
Normally if you want to remove from a list all items that match a predicate, you'd use List<T>.RemoveAll(), for example:
List<int> test = Enumerable.Range(0, 10).ToList();
test.RemoveAll(value => value%2 == 0); // Remove all even numbers.
Console.WriteLine(string.Join(", ", test));
However, it seems you need to do some additional processing. You have two choices:
Do it in two steps; first use RemoveAll() to remove unwanted items, then loop over the list to process the remaining items separately.
Loop backwards from List.Count-1 to 0 instead.
Your code does things in the wrong order.
First you delete the list item, and then you try to read the price of that deleted item.
That cannot work.
You can write it this way instead:
for (int i = 0; i < Lijst.Count; i++)
{
if (Lijst[i].scanned == false)
{
if (Lijst[i].price > (int)nudMinimum.Value)
{
Totaal++;
lblDebug.Text = Totaal.ToString();
}
Lijst.RemoveAt(i);
i--; // step back so the item that slid into slot i is checked too
}
}
List<string> list = new List<string>();
list.Add("sasa");
list.Add("sames");
list.Add("samu");
list.Add("james");
for (int i = list.Count - 1; i >= 0; i--)
{
list.RemoveAt(i);
}
How to Delete Items from List

C# count consecutive duplicates in List<string>

I have a List of strings and want to count the duplicates in it, to work with this information later. To simply count the duplicates would be very easy, but unfortunately I just want to count the consecutive duplicates.
Let us say we have a list with these string items in it:
"1A","3B","5X","7Q","2W","2G","2J","1A","2A"
Now I want to count the duplicates in this list. I only look at the first char of each string; the other characters in the string can be ignored!
Counting all duplicates, we get 2x "1%" and 3x "2%", but what I actually want is consecutive duplicates, so my result should look like 3x "2%". The 2x "1A" has to be ignored, because they are not in a row.
(% = place holder)
I wrote a code that loops through the list and compares one string with the next one
int counter = 0;
for (int i = 0; i < list.Count; i++)
{
char first = list[i][0];
if ((i + 1) == list.Count) break;
char second = list[(i + 1)][0];
if (first == second)
{
counter++;
}
}
I guess you can imagine that this code is a very ugly way to do this, especially if you want to work with the output. Also, my code can't handle the features I need.
The code I am looking for must be able to deal with two features I want to implement. First, a row of duplicates does not end at the end of the list if the last element of my list is equal to the first element of the list.
For example:
"1A","1B","5X","7Q","2J","1I"
The "1%" has to be detected as a duplicate, because "1I" and "1A" are "in a row". When looping through the list, you may only stop at the end of the list if the first and last elements are not equal.
pseudo code:
if(list.First()[0] != list.Last()[0])
The second feature I want to implement is that, if a row of duplicates has a "duplicate count" over 4, all items in the list that are not part of that row are deleted. If there is not a single row of duplicates with a "duplicate count" (length) over 4, I simply want to return.
For example:
"1A","1B","5X","3Q","1J","1I"
duplicate count == 4 so return
"1A","1B","1X","3Q","1J","1I"
duplicate count == 5, save this five items, delete any other item in the list.
"1A","1B","1X","3Q","1I","1Z","1Z"
duplicate count == 6, save this six items, delete any other item in the list.
Notice:
Just the first char of each string matters. The input list will have 7 items, not a single item more or less. There is no result list, the old one has to be updated. If the duplicate count is under or equal to 4, then there is no work to do, simply return.
There will not be more than 5 duplicates in a row. I have to check billions of lists, so performance is really important.
As they don't teach any better English in German schools, I hope someone understands what my problem is and is willing to help me out.
This is not part of any homework.
What you can use here is a method that is capable of grouping consecutive items while a condition is met:
public static IEnumerable<IEnumerable<T>> GroupWhile<T>(
this IEnumerable<T> source, Func<T, T, bool> predicate)
{
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
yield break;
List<T> list = new List<T>() { iterator.Current };
T previous = iterator.Current;
while (iterator.MoveNext())
{
if (!predicate(previous, iterator.Current))
{
yield return list;
list = new List<T>();
}
list.Add(iterator.Current);
previous = iterator.Current;
}
yield return list;
}
}
Once we have this helper method we can write your query in a reasonably straightforward manner:
var query = data.GroupWhile((prev, current) => prev[0] == current[0])
.Where(group => group.Count() > 1)
.Select(group => new
{
Character = group.First()[0],
Count = group.Count(),
});
I suggest grouping items that begin with the same char into lists. The result of this grouping will be a List<List<string>>, which makes it easier to work with the groups.
var list = new List<string> {
"1A", "3B", "5X", "7Q", "2W", "2G", "2J", "1B", "1C", "1D", "1E"
};
var groups = new List<List<string>>();
char lastChar = (char)0; // We assume that NUL will never be used as first char.
List<string> group = null;
foreach (string s in list) {
if (s[0] != lastChar) {
group = new List<string>();
groups.Add(group);
lastChar = s[0];
}
group.Add(s);
}
// Join the first and the last group if their first char is equal
int lastIndex = groups.Count - 1;
if (groups.Count > 2 && groups[0][0][0] == groups[lastIndex][0][0]) {
// Insert the elements of the last group to the first group
groups[0].InsertRange(0, groups[lastIndex]);
// and delete the last group
groups.RemoveAt(lastIndex);
}
//TODO: Remove test
foreach (List<string> g in groups) {
Console.WriteLine(g[0][0]);
foreach (string s in g) {
Console.WriteLine(" " + s);
}
}
// Now create a list with items of groups having more than 4 duplicates
var result = new List<string>();
foreach (List<string> g in groups) {
if (g.Count > 4) {
result.AddRange(g);
}
}
//TODO: Remove test
Console.WriteLine("--------");
foreach (string s in result) {
Console.Write(s);
Console.Write(" ");
}
Console.WriteLine();
Console.ReadKey();
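Since the question stresses performance and asks for the old list to be updated, here is an alternative sketch that avoids building groups. It scans the list once as a ring, finds the longest run of equal first chars, and removes everything else in place. The helper names LongestCircularRun and KeepLongRun are made up for illustration, and the sketch assumes every string is non-empty and that at most one run can exceed the minimum (true for 7 items and a minimum of 4):

```csharp
using System;
using System.Collections.Generic;

// Returns the start index and length of the longest run of items that share
// a first character, treating the list as circular.
static (int Start, int Length) LongestCircularRun(List<string> items)
{
    int n = items.Count;
    if (n == 0) return (0, 0);

    // Find an index where a run begins (its predecessor differs),
    // so that no run straddles the starting point of the scan.
    int begin = -1;
    for (int i = 0; i < n; i++)
    {
        if (items[i][0] != items[(i + n - 1) % n][0]) { begin = i; break; }
    }
    if (begin < 0) return (0, n);   // every item has the same first character

    int bestStart = begin, bestLength = 0;
    int runStart = begin, runLength = 0;
    for (int k = 0; k < n; k++)
    {
        int i = (begin + k) % n;
        if (runLength > 0 && items[i][0] == items[runStart][0])
        {
            runLength++;
        }
        else
        {
            runStart = i;
            runLength = 1;
        }
        if (runLength > bestLength) { bestStart = runStart; bestLength = runLength; }
    }
    return (bestStart, bestLength);
}

// Keeps only the longest run if it is longer than minimum; otherwise does nothing.
static void KeepLongRun(List<string> items, int minimum)
{
    var (start, length) = LongestCircularRun(items);
    if (length <= minimum) return;

    int n = items.Count;
    // Walk backwards so removals do not shift the indices still to be visited.
    for (int i = n - 1; i >= 0; i--)
    {
        if ((i - start + n) % n >= length) items.RemoveAt(i);
    }
}

var list = new List<string> { "1A", "1B", "1X", "3Q", "1J", "1I" };
KeepLongRun(list, 4);
Console.WriteLine(string.Join(",", list));  // 1A,1B,1X,1J,1I
```

The surviving items keep their original positions relative to each other; whether the result should instead start at the beginning of the run is not stated in the question.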

Remove duplicates with lambda leaving last item (from dupes) alive

I'm trying to refactor some old nested-for code that removes duplicates inside a collection of Items: if properties X, Y and Z match the ones from a previously inserted Item, only the last item to be inserted should be preserved in the collection:
private void RemoveDuplicates()
{
//Remove duplicated items.
int endloop = Items.Count;
for (int i = 0; i < endloop - 1; i++)
{
var item = Items[i];
for (int j = i + 1; j < endloop; j++)
{
if (!item.HasSamePropertiesThan(Items[j]))
{
continue;
}
AllItems.Remove(item);
break;
}
}
}
where HasSamePropertiesThan() is an extension method for Item and does something similar to:
public static bool HasSamePropertiesThan(this Item i1, Item i2)
{
return string.Equals(i1.X, i2.X, StringComparison.InvariantCulture)
&& string.Equals(i1.Y, i2.Y, StringComparison.InvariantCulture)
&& string.Equals(i1.Z, i2.Z, StringComparison.InvariantCulture);
}
so if I have a collection like:
[0]A
[1]A
[2]A
[3]B
[4]A
[5]A
I want to be able to delete all duplicates, leaving only [3]B and [5]A alive.
so far, I've managed to craft these lambdas:
var query = Items.GroupBy(i => new {i.X, i.Y, i.Z}).Select(i => i.Last()); // Retrieves entities to not delete
var dupes = Items.Except(query);
dupes.ToList().ForEach(d => Items.Remove(d));
based on these examples:
Remove duplicates in the list using linq
Delete duplicates using Lambda
Which don't seem to work quite well... (The removed items are incorrect, some items are left in the collection and should've been removed) what am I doing wrong?
A quick question: is the result of "query" supposed to be the result you are looking for? As I read it, you first compute the list of items to keep, then compute the items that are not in it, and finally remove those from the original list.
Correct me if I'm wrong, but isn't that the same as doing something like this:
items = items.GroupBy(i => new {i.X, i.Y, i.Z}).Select(i => i.Last()).ToList();
If "query" is not returning the right elements, then the problem is in how you are doing the query, or you probably need to order the list before applying the query.
You could either use a HashSet, or using linq do something like this:
var dups = new string[]{"A","A","B","B"};
var nonDupe = dups.Distinct().ToArray();
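Note that Distinct() keeps the first occurrence of each value, while the question asks for the last one. One way to keep the last occurrence is to walk the list backwards with a HashSet of keys. A sketch under assumptions: the tuple is a hypothetical stand-in for Item, the Tag field only exists to label the output, and the key uses ordinal string equality rather than the InvariantCulture comparison from the question:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Item: the three compared properties plus a tag.
var items = new List<(string X, string Y, string Z, int Tag)>
{
    ("A", "a", "a", 0), ("A", "a", "a", 1), ("A", "a", "a", 2),
    ("B", "b", "b", 3), ("A", "a", "a", 4), ("A", "a", "a", 5),
};

var seen = new HashSet<(string, string, string)>();

// Walking backwards, the first time a key shows up is its last occurrence.
for (int i = items.Count - 1; i >= 0; i--)
{
    var key = (items[i].X, items[i].Y, items[i].Z);
    if (!seen.Add(key))
    {
        items.RemoveAt(i);   // an item with the same key exists later in the list
    }
}

foreach (var item in items)
{
    Console.WriteLine($"[{item.Tag}]{item.X}");
}
// prints [3]B and then [5]A
```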

How can I divide each element in a list by an integer? C#

So I have created a list which holds doubles, is it possible to divide every element in this list by an integer variable?
List<Double> amount = new List<Double>();
Just create a new list with the modified contents:
var newAmounts = amount.Select(x => x / 10).ToList();
Creating new data is less error-prone than modifying existing data.
foreach
You can iterate over each item with foreach:
foreach(var item in amount)
{
var result = item / 3;
}
If you want to store the results in a new list you can do it inside the loop...
var newList = new List<double>(amount.Count); //<-- set capacity for performance
foreach(var item in amount)
{
newList.Add(item / 3);
}
LINQ
... or use Linq to get an IEnumerable<double>:
var newList = from item in amount select item / 3;
You can also use Linq extension methods:
var newList = amount.Select(item => item / 3);
Or if you want a List<double> from Linq, you can do it with ToList():
var newList = (from item in amount select item / 3).ToList();
... or ...
var newList = amount.Select(item => item / 3).ToList();
for
As an alternative you can use a simple for:
for (int index = 0; index < amount.Count; index++)
{
var result = amount[index] / 3;
}
This approach will allow you to do the modifications in place:
for (int index = 0; index < amount.Count; index++)
{
amount[index] = amount[index] / 3;
}
PLINQ
You may also consider using Parallel LINQ (with AsParallel):
var newList = amount.AsParallel().Select(item => item / 3).ToList();
Warning: The result may be out of order.
This will take advantage of multicore CPU, by running the operations for each item in parallel. This is particularly good for large lists, and for operations that are independent for each item.
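If the order of the results matters, PLINQ's AsOrdered() can be added to the query. A short sketch, with sample values and a divisor variable chosen for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var amount = new List<double> { 3.0, 6.0, 9.0, 12.0 };
int divisor = 3;

// AsOrdered asks PLINQ to preserve the source order in the result,
// at some cost in throughput.
List<double> newList = amount
    .AsParallel()
    .AsOrdered()
    .Select(item => item / divisor)
    .ToList();

Console.WriteLine(string.Join(", ", newList));  // 1, 2, 3, 4
```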
Comparison
foreach: Easy to read and write, easy to remember. Also allows for some optimizations.
Linq: Better if you are used to SQL, also allows for lazy execution.
for: Doing the operation in place requires less memory. Allows for more control.
PLinq: All you love from Linq, optimized for multiple cores. Although some caution is needed.
In case you want to modify the same instance (rather than creating a new collection), do:
for (int i = 0; i < amount.Count; ++i)
amount[i] /= yourInt32Divisor;
Of course the simple way is to iterate the list and divide each number:
foreach(var d in amount) {
var result = d / 3;
}
You can store the result in a new list.
