I'm trying to write a function that works on a list.
It is supposed to sort the list and delete duplicates.
It sorts fine, but it doesn't delete the duplicates.
What's the problem?
void sort_del(List<double> slist){
    // here I sort slist
    // now it is sorted, but still has duplicates
    List<double> rlist = new List<double>();
    int new_i = 0;
    rlist.Add(slist[0]);
    for (int i = 0; i < slist.Count; i++)
    {
        // compare against the last element added to rlist;
        // works because slist is sorted, so duplicates are adjacent
        if (slist[i] != rlist[new_i])
        {
            rlist.Add(slist[i]);
            new_i++;
        }
    }
    slist = new List<double>(rlist);
    // here slist should be without duplicates
}
It does not work because the list reference in slist is passed by value: assigning rlist to it changes only the local variable, not the caller's list. Your algorithm for detecting duplicates is fine. If you don't want to use the more elegant LINQ approach suggested in the other answer, change the method to return your list:
List<double> sort_del(List<double> slist){
// Do your stuff
return rlist;
}
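On the calling side you then reassign the result, e.g. (myList is a hypothetical caller variable):

myList = sort_del(myList);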
With double you can just use LINQ's Distinct():
slist = new List<double>(rlist.Distinct());
Or sort and de-duplicate in one expression. Note that Distinct() returns an IEnumerable<double>, which has no Sort() method, so order the result with OrderBy and materialize it with ToList():
slist = slist.Distinct().OrderBy(x => x).ToList();
You're not modifying the underlying list: you add to a new collection, and the reassignment at the end never reaches the caller.
If you're required to do this for homework (which seems likely, as there are data structures and easy LINQ ways to do this that others have pointed out), you should break the sorting and the removal of duplicates into two separate methods, as sketched below. The method that removes duplicates should accept a list as a parameter (as this one does) and return a new list instance without duplicates.
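A minimal sketch of that split, assuming the same List<double> input as in the question (the method names are made up):

static List<double> Sorted(List<double> input)
{
    List<double> copy = new List<double>(input);
    copy.Sort();
    return copy;
}

static List<double> WithoutDuplicates(List<double> sorted)
{
    List<double> result = new List<double>();
    foreach (double d in sorted)
    {
        // relies on the input being sorted: duplicates are always adjacent
        if (result.Count == 0 || result[result.Count - 1] != d)
            result.Add(d);
    }
    return result;
}

The caller then writes List<double> clean = WithoutDuplicates(Sorted(myList));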
Related
I got a List<string> named Test:
List<string> Test = new List<string>();
I want to add a string to it using Test.Add();, but first I want to check if it already exists in the list.
I thought of something like this:
if (Test.Find("Teststring") != true)
{
Test.Add("Teststring");
}
However, this returns an error.
I assume that you don't want to add the item if it is already in the list. (Your version doesn't compile because List<T>.Find expects a Predicate<T>, not a string, and returns the found element, not a bool.)
Try this:
if (!Test.Contains("Teststring"))
{
Test.Add("Teststring");
}
Any receives a predicate. It determines whether any element in a collection matches a condition. You could do this imperatively with a loop construct, but the Any extension method (it needs using System.Linq) provides another way.
See this:
bool b1 = Test.Any(item => item == "Teststring");
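Used as the guard before adding, that becomes (same list and string as above):

if (!Test.Any(item => item == "Teststring"))
{
    Test.Add("Teststring");
}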
Also you can use:
if (!Test.Contains("Teststring"))
{
...
}
If you don't want to add an item twice, that's a good indicator that you should use a HashSet<T> instead. It is more efficient for lookups and doesn't allow duplicates (like a Dictionary with only keys).
HashSet<string> Test = new HashSet<string>();
bool newString = Test.Add("Teststring"); // returns false if the string was already in the set
If you need to use the list use List.Contains to check if the string is already in the list.
What is the difference between HashSet and List in C#?
But your code suggests that you only want to add duplicates. I assume that this is not intended.
In my opinion you are using the wrong data structure here. You should use a HashSet<T> to avoid duplicates.
The lookup time for a HashSet is O(1), whereas for a List it is O(n).
The HashSet class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
This is how your code could look:
HashSet<string> Test = new HashSet<string>();
Test.Add("Teststring");
Use Test.Contains("Teststring"); to check whether the string is already in the set.
My C# program generates random strings from a given pattern. These strings are stored in a list. As no duplicates are allowed I'm doing it like this:
List<string> myList = new List<string>();
for (int i = 0; i < total; i++) {
string random_string = GetRandomString(pattern);
if (!myList.Contains(random_string)) myList.Add(random_string);
}
As you can imagine, this works fine for several hundred entries. But I'm facing the situation of generating several million strings, and with each added string, checking for duplicates gets slower and slower.
Are there any faster ways to avoid duplicates?
Use a data structure that can much more efficiently determine if an item exists, namely a HashSet. It can determine if an item is in the set in constant time, regardless of the number of items in the set.
If you really need the items in a List instead, or you need the items in the resulting list to be in the order they were generated, then you can store the data in both a List and a HashSet, adding the item to both collections if it isn't already in the HashSet, as sketched below.
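A minimal sketch of that combination, reusing GetRandomString, pattern, and total from the question:

HashSet<string> seen = new HashSet<string>();
List<string> myList = new List<string>();
for (int i = 0; i < total; i++)
{
    string random_string = GetRandomString(pattern);
    // HashSet<T>.Add returns false if the item is already in the set
    if (seen.Add(random_string))
        myList.Add(random_string);
}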
The easiest way is to use this:
myList = myList.Distinct().ToList();
Although this requires building the list once and then creating a new one from it. A better way might be to write the generation as an iterator up front:
public IEnumerable<string> GetRandomStrings(int total, string pattern)
{
for (int i = 0; i < total; i++)
{
yield return GetRandomString(pattern);
}
}
...
myList = GetRandomStrings(total, pattern).Distinct().ToList();
Of course, if you don't need to access items by index, you could probably improve efficiency even more by dropping the ToList and just using an IEnumerable.
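For example (Distinct still tracks the seen items internally, but no second list is materialized):

foreach (string s in GetRandomStrings(total, pattern).Distinct())
{
    Console.WriteLine(s);
}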
Don't use List<>. Use Dictionary<> or HashSet<> instead!
You could use a HashSet<string> if order is not important:
HashSet<string> myHashSet = new HashSet<string>();
for (int i = 0; i < total; i++)
{
string random_string = GetRandomString(pattern);
myHashSet.Add(random_string);
}
The HashSet class provides high-performance set operations. A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
MSDN
Or, if you need the items kept in sorted order, I'd recommend using a SortedSet<T> (available since .NET 4).
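For example (note that a SortedSet keeps elements in sorted order, not insertion order):

SortedSet<string> sortedStrings = new SortedSet<string>();
for (int i = 0; i < total; i++)
{
    sortedStrings.Add(GetRandomString(pattern)); // duplicates are silently ignored
}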
Not a good way, but a kind of quick fix: use a bool to check whether the list already contains the entry.

bool containsKey;

public void addKey(string newKey)
{
    foreach (string key in MyKeys)
    {
        if (key == newKey)
        {
            containsKey = true;
            break; // no need to scan the rest of the list
        }
    }
    if (!containsKey)
    {
        MyKeys.Add(newKey);
    }
    else
    {
        containsKey = false;
    }
}
A Hashtable would be a faster way to check if an item exists than a list.
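For example, a sketch with the non-generic Hashtable (needs using System.Collections; only the keys matter, so the values are null):

Hashtable seen = new Hashtable();
string random_string = GetRandomString(pattern);
if (!seen.ContainsKey(random_string))
{
    seen.Add(random_string, null);
    myList.Add(random_string);
}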
Have you tried:
myList = myList.Distinct().ToList();
Distinct() alone returns a lazy IEnumerable<string>, so you need ToList() before assigning it back to the list variable.
This is how I remove items from a List. Is this the right way? Is there any cleaner/faster way to achieve this?
List<ItemClass> itemsToErase = new List<ItemClass>();
foreach (ItemClass itm in DS)
{
    if (itm.ToBeRemoved)
        itemsToErase.Add(itm);
}
foreach (ItemClass eraseItem in itemsToErase)
{
    DS.Remove(eraseItem);
}
EDIT: DS is of type List<ItemClass>
EDIT: Have one more doubt. What if DS is a LinkedList<ItemClass>? There is no RemoveAll() for that. (See the sketch at the end of the answers.)
There is List.RemoveAll(), which takes a delegate where you can put your comparison. It works on the list in place, so no second list is needed.
E.g.:
DS.RemoveAll(itm => itm.ToBeRemoved);
You can use the RemoveAll() method:
DS.RemoveAll(x => x.ToBeRemoved);
This is an O(n) operation; your code is O(n^2).
This method avoids a lot of element shifting in the original List, but it consumes more memory.
List<ItemClass> newList = new List<ItemClass>(originalList.Count);
foreach(var item in originalList) {
if (!item.ToBeRemoved)
newList.Add(item);
}
originalList = newList;
Not really, the logic remains the same no matter how you do it. You cannot iterate over and modify a collection at the same time. It looks cleaner with LINQ:
var list = new List<int> { 1, 2, 3, 4, 5 };
var except = new List<int> { 3, 4 };
var result = list.Except(except);
Hope this helps.
edit: Even list.RemoveAll(...) has to maintain two lists internally to do it.
edit2: Actually svick is right; after looking at the implementation, RemoveAll is fastest.
use this method
DS.RemoveAll(x => x.ToBeRemoved);
Your code is a common solution to the problem and is fine. Especially if there are only a few items to be removed.
As others have suggested you can also create a new list containing the items you want to keep and then discard the old list. This works better if most items are going to be removed and only a small number kept.
When picking either of these methods keep in mind that both require the allocation of a new list object. This extra memory allocation probably isn't an issue but might be, depending on what the rest of the code is doing.
As others have mentioned there is also the RemoveAll method. This is what I would use, it is neat, clear and as efficient as anything using a list can be.
The last option is to use an index to loop through the collection, e.g.:
(Sorry for the VB, I use it more often than C# and didn't want to confuse by getting the syntax wrong)
Dim i As Integer = 0
Do While i < DS.Count
    If DS.Item(i).ToBeRemoved Then
        DS.RemoveAt(i)
    Else
        i += 1
    End If
Loop
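Regarding the LinkedList<ItemClass> edit: LinkedList<T> has no RemoveAll, but you can unlink nodes in a single O(n) pass, as long as you capture Next before removing the current node. A minimal sketch, reusing DS and ToBeRemoved from the question:

LinkedListNode<ItemClass> node = DS.First;
while (node != null)
{
    LinkedListNode<ItemClass> next = node.Next; // capture before unlinking
    if (node.Value.ToBeRemoved)
        DS.Remove(node);                        // O(1) when given a node reference
    node = next;
}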
Hi, I'm working on some legacy code that goes something along the lines of:
for(int i = results.Count-1; i >= 0; i--)
{
if(someCondition)
{
results.Remove(results[i]);
}
}
To me it seems like bad practice to be removing the elements while still iterating through the loop because you'll be modifying the indexes.
Is this a correct assumption?
Is there a better way of doing this? I would like to use LINQ but I'm in 2.0 Framework
The removal is actually OK since you are going downwards to zero: a removal only shifts the indexes you have already passed. Just be sure the loop starts at results.Count - 1, not results.Count, since indexes start at 0. Also prefer RemoveAt(i) over Remove(results[i]): you already know the index, and Remove does a linear search for the item first.
for(int i = results.Count-1; i >= 0; i--)
{
if(someCondition)
{
results.RemoveAt(i);
}
}
Edit:
As was pointed out, you must actually be dealing with a List of some sort in your pseudo-code. Conceptually they are the same (a List uses an array internally), but with an array you have a Length property (instead of Count) and you cannot add or remove items.
Using a list, the solution above is certainly concise, but it might not be easy to understand for someone who has to maintain the code (especially the backwards iteration). An alternative solution is to first identify the items to remove and then remove them in a second pass.
Just substitute MyType with the actual type you are dealing with:
List<MyType> removeItems = new List<MyType>();
foreach(MyType item in results)
{
if(someCondition)
{
removeItems.Add(item);
}
}
foreach (MyType item in removeItems)
results.Remove(item);
It doesn't seem like the Remove should work at all. The IList implementation should fail if we're dealing with a fixed-size array, see here.
That being said, if you're dealing with a resizable list (e.g. List<T>), why call Remove instead of RemoveAt? Since you're already navigating the indices in reverse, you don't need to "re-find" the item.
May I suggest a somewhat more functional alternative to your current code:
Instead of modifying the existing array one item at a time, you could derive a new one from it and then replace the whole array as an "atomic" operation once you're done:
The easy way (no LINQ, but very similar):
Predicate<T> filter = delegate(T item) { return !someCondition; };
results = Array.FindAll(results, filter);
// with LINQ, you'd have written: results = results.Where(filter);
where T is the type of the items in your results array.
A somewhat more explicit alternative:
var newResults = new List<T>();
foreach (T item in results)
{
if (!someCondition)
{
newResults.Add(item);
}
}
results = newResults.ToArray();
Usually you wouldn't remove elements as such, you would create a new array from the old without the unwanted elements.
If you do go the route of removing elements from an array/list your loop should count down rather than up. (as yours does)
a couple of options:

List<int> indexesToRemove = new List<int>();
for (int i = results.Count - 1; i >= 0; i--)
{
    if (someCondition)
    {
        indexesToRemove.Add(i);
    }
}
// the indexes were collected in descending order, so RemoveAt is safe here:
// removing a higher index never shifts a lower one
foreach (int i in indexesToRemove)
{
    results.RemoveAt(i);
}
or alternatively, you could make a copy of the existing list, iterate over the copy, and remove from the original list:

// temp is a copy of results
for (int i = temp.Count - 1; i >= 0; i--)
{
    if (someCondition)
    {
        results.Remove(temp[i]);
    }
}
I'm having issues finding the most efficient way to remove duplicates from a list of strings (List<string>).
My current implementation is a dual foreach loop that checks whether each object occurs exactly once and removes the second occurrence otherwise.
I know there are MANY other questions out there, but all the best solutions require something newer than .NET 2.0, which is the current build environment I'm working in. (GM and Chrysler are very resistant to change... :) )
This rules out LINQ and HashSet.
The code I'm using is Visual C++, but a C# solution will work just fine as well.
Thanks!
This probably isn't what you're looking for, but if you have control over this, the most efficient way would be to not add them in the first place...
Do you have control over this? If so, all you'd need to do is a myList.Contains(currentItem) call before you add the item and you're set
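For instance (currentItem stands for whatever you are about to insert):

if (!myList.Contains(currentItem))
{
    myList.Add(currentItem);
}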
You could do the following.
List<string> list = GetTheList();
Dictionary<string,object> map = new Dictionary<string,object>();
int i = 0;
while ( i < list.Count ) {
string current = list[i];
if ( map.ContainsKey(current) ) {
list.RemoveAt(i);
} else {
i++;
map.Add(current,null);
}
}
This has the overhead of building a Dictionary<TKey,TValue>, which duplicates the set of unique values in the list, but it's fairly efficient speed-wise.
I'm no Comp Sci PhD, but I'd imagine using a dictionary, with the items in your list as the keys would be fast.
Since a dictionary doesn't allow duplicate keys, you'd only have unique strings at the end of iteration.
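A sketch of that idea, .NET 2.0-friendly (the value type is irrelevant, only the keys are used; note that key order is not guaranteed to match insertion order):

Dictionary<string, bool> unique = new Dictionary<string, bool>();
foreach (string s in list)
{
    unique[s] = true; // indexer assignment silently overwrites duplicate keys
}
List<string> result = new List<string>(unique.Keys);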
Just remember, when storing a custom class, to override the Equals() method in order for Contains() to work as required.
Example:

List<CustomClass> clz = new List<CustomClass>();

public class CustomClass
{
    // Contains() compares with Equals, so "override" is essential here
    public override bool Equals(object obj)
    {
        // Put equality code here...
    }
    // when you override Equals, override GetHashCode() as well
}
If you're going the route of "just don't add duplicates", then checking List.Contains before adding an item works, but it's O(n^2), where n is the number of strings you want to add. It's no different from your current solution using two nested loops.
You'll have better luck using a hash set to store items you've already added, and since you're on .NET 2.0, a Dictionary can substitute for a HashSet:
static List<T> RemoveDuplicates<T>(List<T> input)
{
List<T> result = new List<T>(input.Count);
Dictionary<T, object> hashSet = new Dictionary<T, object>();
foreach (T s in input)
{
if (!hashSet.ContainsKey(s))
{
result.Add(s);
hashSet.Add(s, null);
}
}
return result;
}
This runs in O(n) time and O(n) extra space, and it will generally work very well for up to 100K items. Actual performance depends on the average length of the strings. If you really need maximum performance, you can exploit more powerful data structures like tries to make inserts even faster.