exclude items of one list from another with C#

I have a rather specific question about how to exclude items of one list from another. Common approaches such as Except() won't do, and here is why:
If the duplicate within the list has an "even" index, I need to remove THIS element and the NEXT element after it.
If the duplicate within the list has an "odd" index, I need to remove THIS element and the one element BEFORE it.
There might be many occurrences of the same duplicate within a list, i.e. one might have an "odd" index and another an "even" one.
I'm not asking for a solution since I've created one myself. However, after running this method many times, "ANTS performance profiler" shows that the method takes 75% of the whole execution time (30 seconds out of 40). The question is: is there a faster way to perform the same operation? I've tried to optimize my current code but it still lacks performance. Here it is:
private void removedoubles(List<int> exclude, List<int> listcopy)
{
    for (int j = 0; j < exclude.Count(); j++)
    {
        for (int i = 0; i < listcopy.Count(); i++)
        {
            if (listcopy[i] == exclude[j])
            {
                if (i % 2 == 0) // even
                {
                    //listcopy.RemoveRange(i, i + 1);
                    listcopy.RemoveAt(i);
                    listcopy.RemoveAt(i);
                    i = i - 1;
                }
                else //odd
                {
                    //listcopy.RemoveRange(i - 1, i);
                    listcopy.RemoveAt(i - 1);
                    listcopy.RemoveAt(i - 1);
                    i = i - 2;
                }
            }
        }
    }
}
where:
exclude - the list that contains only the duplicates. This list might contain up to 30 elements.
listcopy - the list that should be checked for duplicates. If a duplicate from "exclude" is found, the removal is performed. This list might contain up to 2000 elements.
I think LINQ might be of some help, but I don't understand its syntax well.

A faster way (O(n)) would be to do the following:
Go through the exclude list and turn it into a HashSet (linear in the size of exclude).
In the checks, test whether the element being examined is in the set. The whole pass over listcopy is again O(n), since a test for presence in a HashSet is O(1) on average.
Maybe you can even change your algorithm so that the exclude collection is a HashSet from the very beginning; this way you can omit step 1 and gain even more speed.
(Your current way compares every element of listcopy with every element of exclude, and each RemoveAt on a List additionally shifts all the elements after it, so it is at least quadratic.)
Edit:
Another idea is the following: are you perhaps creating a copy of some list and making this method modify it? (A guess based on the parameter name.) If so, you can change it as follows: pass the original list to the method, and make the method allocate a new list and return it (your method signature should then be something like private List<int> getWithoutDoubles(HashSet<int> exclude, List<int> original)).
Edit:
It could be even faster if you reorganized the input data in the following way: as the items are always removed in pairs (even index + the following odd index), you should pair them in advance, so that your list of ints becomes a list of pairs of ints. This way your method might be something like this:
private List<Tuple<int, int>> getWithoutDoubles(
    HashSet<int> exclude, List<Tuple<int, int>> original)
{
    return original.Where(xy => (!exclude.Contains(xy.Item1) &&
                                 !exclude.Contains(xy.Item2)))
                   .ToList();
}
(you remove the pairs where either the first or the second item is in the exclude collection). Instead of Tuple, perhaps you can pack the items into your custom type.
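If the data has to stay a flat List<int>, the same pairing idea can be sketched as a single pass that copies the surviving pairs into a new list. This is only a sketch under a few assumptions: the name WithoutExcludedPairs and the sample values are made up for illustration, the code is written as C# 9 top-level statements, and a trailing unpaired element (which the question does not cover) is kept unless it is itself excluded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: walks the pairs (0,1), (2,3), ... and keeps a pair
// only when neither of its two values is in the exclude set.
static List<int> WithoutExcludedPairs(HashSet<int> exclude, List<int> original)
{
    var result = new List<int>(original.Count);
    for (int i = 0; i + 1 < original.Count; i += 2)
    {
        int first = original[i];
        int second = original[i + 1];
        if (!exclude.Contains(first) && !exclude.Contains(second))
        {
            result.Add(first);
            result.Add(second);
        }
    }
    // Assumption: an unpaired last element survives unless it is excluded.
    if (original.Count % 2 == 1 && !exclude.Contains(original[original.Count - 1]))
    {
        result.Add(original[original.Count - 1]);
    }
    return result;
}

var excludeSet = new HashSet<int> { 3, 8 };
var numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8 };
List<int> filtered = WithoutExcludedPairs(excludeSet, numbers);
Console.WriteLine(string.Join(", ", filtered)); // 1, 2, 5, 6
```

Because nothing is removed in place, no elements are shifted, and each element costs only a HashSet lookup.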

Here is yet another way to get the results.
var a = new List<int> {1, 2, 3, 4, 5};
var b = new List<int> {1, 2, 3};
var c = (from i in a let found = b.Any(j => j == i) where !found select i).ToList();
c will contain 4,5

Reverse your loops so they start at .Count - 1 and go to 0, so you don't have to change i in one of the cases and Count is only evaluated once per collection.

Could you convert the List to a LinkedList and give it a try? List.RemoveAt() has to shift every element after the removed one, whereas removing a node from a LinkedList does not.

Related

How do you do this in C# without using List?

I am new to C#. The following code is a solution I came up with to solve a challenge. I am unsure how to do this without using List, since my understanding is that you can't push to an array in C# because arrays are of fixed size.
Is my understanding of what I said so far correct?
Is there a way to do this that doesn't involve creating a new array every time I need to add to an array? If there is no other way, how would I create a new array when the size of the array is unknown before my loop begins?
Return a sorted array of all non-negative numbers less than the given n which are divisible both by 3 and 4. For n = 30, the output should be
threeAndFour(n) = [0, 12, 24].
int[] threeAndFour(int n) {
    List<int> l = new List<int>(){ 0 };
    for (int i = 12; i < n; ++i)
        if (i % 12 == 0)
            l.Add(i);
    return l.ToArray();
}
EDIT: I have since refactored this code to be..
int[] threeAndFour(int n) {
    List<int> l = new List<int>(){ 0 };
    for (int i = 12; i < n; i += 12)
        l.Add(i);
    return l.ToArray();
}

A. A List is OK
If you want to use a for loop to find the numbers, then List is the appropriate data structure for collecting the numbers as you discover them.
B. Use more maths
static int[] threeAndFour(int n) {
    // For n >= 1 there are (n - 1) / 12 + 1 multiples of 12 in [0, n).
    var a = new int[(n - 1) / 12 + 1];
    for (int i = 12; i < n; i += 12) a[i / 12] = i;
    return a;
}
C. Generator pattern with IEnumerable<int>
I know that this doesn't return an array, but it does avoid a list.
static IEnumerable<int> threeAndFour(int n) {
    yield return 0;
    for (int i = 12; i < n; i += 12)
        yield return i;
}
D. Twist and turn to avoid a list
The code could loop twice: first to figure out the size of the array, and then to fill it.
int[] threeAndFour(int n) {
    // Version: A list is really undesirable, arrays are great.
    int size = 1;
    for (int i = 12; i < n; i += 12)
        size++;
    var a = new int[size];
    a[0] = 0;
    int counter = 1;
    for (int i = 12; i < n; i += 12) a[counter++] = i;
    return a;
}

if (i % 12 == 0)
So you have figured out that the numbers divisible by both 3 and 4 are precisely the numbers divisible by 12.
Can you figure out how many such numbers there are below a given n? Can you do so without counting them? If so, there is no need for a dynamically growing container; you can just initialize the container to the correct size.
Once you have your array, just keep track of the next index to fill.
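One way to follow this hint is sketched below. It assumes the count is (n - 1) / 12 + 1 for n >= 1 (the multiples 0, 12, ..., up to the largest one below n) and that nothing qualifies for n <= 0; the PascalCase name ThreeAndFour and the top-level-statement layout (C# 9 or later) are choices made for this example.

```csharp
using System;

// Sketch: all non-negative multiples of 12 below n, with the array
// sized up front instead of grown element by element.
static int[] ThreeAndFour(int n)
{
    if (n <= 0) return Array.Empty<int>(); // assumption: no valid numbers below 0
    int count = (n - 1) / 12 + 1;          // 0, 12, ..., largest multiple < n
    var result = new int[count];
    for (int k = 0; k < count; k++)
        result[k] = 12 * k;                // k is also the next index to fill
    return result;
}

Console.WriteLine(string.Join(", ", ThreeAndFour(30))); // 0, 12, 24
Console.WriteLine(string.Join(", ", ThreeAndFour(24))); // 0, 12
```

The second call shows why the count uses n - 1: 24 itself is not less than 24, so it must not appear in the result.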

You could use Linq and Enumerable.Range method for the purpose. For example,
int[] threeAndFour(int n)
{
    return Enumerable.Range(0, n).Where(x => x % 12 == 0).ToArray();
}
Enumerable.Range generates a sequence of integral numbers within a specified range, which is then filtered on the condition (x%12==0) to retrieve the desired result.

Since you know this goes in steps of 12 and you know how many there are before you start, you can do (for n >= 1):
Enumerable.Range(0, (n - 1) / 12 + 1).Select(x => x * 12).ToArray();

I am unsure how to do this without using List since my understanding is that you can't push to an array in C# since they are of fixed size.
It is correct that arrays cannot grow. List was invented as a wrapper around an array that automagically grows whenever needed. Note that you can pass List an integer via the constructor, which tells it the initial capacity it should expect. It will allocate at least that much the first time. This can limit growth-related overhead.
And dictionaries are just a variation of the list mechanics, with hash table key search speed.
There is only one other collection I know of that can grow. However, it is rarely mentioned outside of theory and some very specific cases:
Linked lists. The linked list has unbeatable growth performance and the lowest risk of running into OutOfMemory exceptions due to fragmentation. Unfortunately, its random access times are the worst as a result. Unless you can process the collection exclusively sequentially from the start (or sometimes the end), performance will be abysmal. Only stacks and queues are likely to use them. There is however still an implementation you could use in .NET: https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.linkedlist-1
Your code holds some potential too:
for (int i = 12; i < n; ++i)
    if (i % 12 == 0)
        l.Add(i);
It would be way more effective to count up by 12 every iteration; you are only interested in every 12th number after all. You may have to change the loop, but I think a do...while would do. Also the array/minimum List size is easily predicted: for n >= 1, divide n - 1 by 12 (integer division) and add 1. But I assume that is mostly mock-up code and it is not actually that deterministic.

List generally works pretty well. As I understand your question, you have challenged yourself to solve a problem without using the List class. An array (or List) uses a contiguous block of memory to store elements. Arrays are of fixed size. List will dynamically expand to accept new elements but still keeps everything in a single block of memory.
You can use a linked list https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.linkedlist-1?view=netframework-4.8 to produce a simulation of an array. A linked list allocates additional memory for each element (node) that is used to point to the next (and possibly the previous) node. This allows you to add elements without large block allocations, but you pay a space cost (increased use of memory) for each element added. The other problem with linked lists is that you can't quickly access random elements. To get to element 5, you have to go through elements 0 through 4. There's a reason arrays and array-like structures are favored for many tasks, but it's always interesting to try to do common things in a different way.

C# why does binarysearch have to be made on sorted arrays and lists?

C# why does binarysearch have to be made on sorted arrays and lists?
Is there any other method that does not require me to sort the list?
It kinda messes with my program in a way that I cannot sort the list for it to work as I want to.

A binary search works by repeatedly dividing the list of candidates in half, comparing the target with the middle element. Imagine the following set:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
We can also represent this as a binary tree, to make it easier to visualise:
[Figure: the values 1 to 15 arranged as a balanced binary search tree with 8 at the root]
Now, say we want to find the number 3. We can do it like so:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3.
Is 3 smaller than 2? No. OK, now we're looking at 3.
We found it!
Now, if your list isn't sorted, how will we divide the list in half? The simple answer is: we can't. If we swap 3 and 15 in the example above, it would work like this:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3 (except we swapped it with 15).
Is 3 smaller than 2? No. OK, now we're looking at 15.
Huh? There's no more items to check but we didn't find it. I guess it's not in the list.
The solution is to use an appropriate data type instead. For fast lookups of key/value pairs, I'll use a Dictionary. For fast checks if something already exists, I'll use a HashSet. For general storage I'll use a List or an array.
Dictionary example:
var values = new Dictionary<int, string>();
values[1] = "hello";
values[2] = "goodbye";
var value2 = values[2]; // this lookup is fast because a Dictionary partitions its keys' hash codes into buckets.
HashSet example:
var mySet = new HashSet<int>();
mySet.Add(1);
mySet.Add(2);
if (mySet.Contains(2)) // this lookup is fast for the same reason as a dictionary.
{
    // do something
}
List example:
var list = new List<int>();
list.Add(1);
list.Add(2);
if (list.Contains(2)) // this isn't fast because it has to visit each item in the list, but it works OK for small sets or places where performance isn't so important
{
}
var idx2 = list.IndexOf(2);
If you have multiple values with the same key, you could store a list in a Dictionary like this:
var values = new Dictionary<int, List<string>>();
if (!values.ContainsKey(key))
{
    values[key] = new List<string>();
}
values[key].Add("value1");
values[key].Add("value2");
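When the list has to keep its original order but is searched many times, the Dictionary idea above can be combined with the list itself. The following is a minimal sketch, not the answer's own code: BuildIndex is a hypothetical helper that records the first position of each value, and the sample data is made up. It is written as C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: maps each value to the index of its first
// occurrence, so later lookups do not need the list to be sorted.
static Dictionary<int, int> BuildIndex(IReadOnlyList<int> items)
{
    var index = new Dictionary<int, int>();
    for (int i = 0; i < items.Count; i++)
    {
        if (!index.ContainsKey(items[i]))
            index[items[i]] = i;
    }
    return index;
}

var unsorted = new List<int> { 42, 1, 54, 1 };
Dictionary<int, int> positions = BuildIndex(unsorted);

int found = positions.TryGetValue(54, out int at) ? at : -1;
int missing = positions.TryGetValue(7, out int none) ? none : -1;
Console.WriteLine($"{found} {missing}"); // 2 -1
```

The trade-off is that the index has to be rebuilt or updated whenever the list changes, so this only pays off when lookups clearly outnumber modifications.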

There is no way to use binary search on unordered collections. A sorted collection is the core requirement of binary search. The key is that on every step you take the middle index m between l and r. Initially they are 0 and size - 1; after every step one of them moves just past the middle index. If x > arr[m] then l becomes m + 1, otherwise r becomes m - 1. Basically, on every step you keep half of the range you had and, of course, it remains sorted. This code is recursive; recursion is very important in programming, so it is worth learning if you don't know it yet.
// C# implementation of recursive Binary Search
using System;
class GFG {
    // Returns index of x if it is present in
    // arr[l..r], else return -1
    static int binarySearch(int[] arr, int l,
                            int r, int x)
    {
        if (r >= l) {
            int mid = l + (r - l) / 2;
            // If the element is present at the
            // middle itself
            if (arr[mid] == x)
                return mid;
            // If element is smaller than mid, then
            // it can only be present in left subarray
            if (arr[mid] > x)
                return binarySearch(arr, l, mid - 1, x);
            // Else the element can only be present
            // in right subarray
            return binarySearch(arr, mid + 1, r, x);
        }
        // We reach here when element is not present
        // in array
        return -1;
    }
    // Driver method to test above
    public static void Main()
    {
        int[] arr = { 2, 3, 4, 10, 40 };
        int n = arr.Length;
        int x = 10;
        int result = binarySearch(arr, 0, n - 1, x);
        if (result == -1)
            Console.WriteLine("Element not present");
        else
            Console.WriteLine("Element found at index "
                              + result);
    }
}
Output:
Element found at index 3
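For comparison, .NET ships a binary search for sorted arrays, Array.BinarySearch, so the recursion does not have to be written by hand. The sketch below reuses the same sample array; the variable names are made up, and the code is written as C# 9 top-level statements. When the value is absent, the method returns a negative number whose bitwise complement is the index where the value would have to be inserted.

```csharp
using System;

int[] sorted = { 2, 3, 4, 10, 40 };

int hit = Array.BinarySearch(sorted, 10);  // value is present
int miss = Array.BinarySearch(sorted, 5);  // value is absent

Console.WriteLine(hit);    // 3
Console.WriteLine(miss);   // -4
Console.WriteLine(~miss);  // 3, the index of the first element larger than 5
```

As with the hand-written version, the result is only meaningful if the array is already sorted.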

Sure there is.
var list = new List<int>();
list.Add(42);
list.Add(1);
list.Add(54);
var index = list.IndexOf(1); //TADA!!!!
EDIT: OK, I hoped the irony was obvious. But strictly speaking, if your array is not sorted, you are pretty much stuck with a linear search, readily available by means of IndexOf() or Enumerable.First().

Appropriate way to compare two lists and generate error message indicating any indexes where they are different with corresponding difference?

I have two lists of doubles that I need to compare for equality. There are obviously a million ways to do this, the simplest probably being list1.Equals(list2). However I want to have some sort of error message indicating precisely every list index and value for both lists wherever there is a difference. This error message would hopefully be something like
list1 and list2 are not equal.
list1 has value 0.1 at index 2, list2 has value 0.05 at index 2
etc. etc. for every difference
I also have a Utilities method already called AreEqual that basically just compares the values.
My first thought was evidently to loop through the lists and use AreEqual (I already know the lists are the same length)
for (int index = 0; index < list1.Count; index++)
{
    check.AreEqual(list1[index], list2[index]);
}
but this doesn't help much for generating a useful error message unless in the case they're not equal I call some method to generate an error message like this
public string ErrorMessage(List<double> oldList, List<double> newList)
{
    // build some error message here by taking the list difference
    // and using IndexOf or whatnot
}
This seems super overkill, though. I can think of a million ways to do this but I can't determine what an appropriate way to do it is.
Is looping over the values and calling an error-message generating method reasonable?
Or is using something like
list3 = list1.Except(list2)
and then checking whether list3 is empty, and if it is not, using IndexOf to get the differing values in both lists, appropriate?
Or am I losing my mind and there's a much more straightforward way to do this?

You can use the following LINQ query:
string sizeMsg = "";
if (list1.Count != list2.Count)
    sizeMsg = String.Format("They have a different size, list1.Count:{0} list2.Count:{1}", list1.Count, list2.Count);
int count = Math.Min(list1.Count, list2.Count);
var differences = Enumerable.Range(0, count)
    .Select(index => new { index, d1 = list1[index], d2 = list2[index] })
    .Where(x => x.d1 != x.d2)
    .Select(x => String.Format("list1 has value {0} at index {1}, list2 has value {2} at index {1}"
        , x.d1, x.index, x.d2));
string differenceMessage = String.Join(Environment.NewLine, differences);

I think that using Linq here just makes it more complicated, when you can just do something like this:
public static IEnumerable<string> DifferenceErrors(List<double> list1, List<double> list2)
{
    // I recommend defining a minimum difference below which you consider the values to be identical:
    const double EPSILON = 0.00001;
    for (int i = 0; i < list1.Count; ++i)
        if (Math.Abs(list1[i] - list2[i]) >= EPSILON)
            yield return $"At index {i}, list1 has value {list1[i]} and list2 has value {list2[i]}";
}
If you are using a version prior to C# 6, change the yield to this:
yield return string.Format("At index {0} list1 has value {1} and list2 has value {2}", i, list1[i], list2[i]);
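The two answers above can be combined: a plain loop with a tolerance, plus the size check from the LINQ version. The following is only a sketch; DescribeDifferences, its tolerance parameter and the sample values are made up for illustration, and it is written as C# 9 top-level statements. It formats with the invariant culture so the output does not depend on the machine's regional settings.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper: one message per index where the values differ by
// more than the tolerance, plus a note if the lists have different sizes.
static List<string> DescribeDifferences(
    IReadOnlyList<double> list1, IReadOnlyList<double> list2, double tolerance)
{
    var messages = new List<string>();
    if (list1.Count != list2.Count)
        messages.Add($"Different sizes: list1 has {list1.Count} items, list2 has {list2.Count}");

    // Only the indexes present in both lists can be compared.
    int shared = Math.Min(list1.Count, list2.Count);
    for (int i = 0; i < shared; i++)
    {
        if (Math.Abs(list1[i] - list2[i]) > tolerance)
        {
            messages.Add(string.Format(CultureInfo.InvariantCulture,
                "list1 has value {0} at index {2}, list2 has value {1} at index {2}",
                list1[i], list2[i], i));
        }
    }
    return messages;
}

var oldValues = new List<double> { 1.0, 2.0, 0.1 };
var newValues = new List<double> { 1.0, 2.0, 0.05 };
List<string> report = DescribeDifferences(oldValues, newValues, 1e-9);
foreach (string line in report)
    Console.WriteLine(line);
// list1 has value 0.1 at index 2, list2 has value 0.05 at index 2
```

An empty result means the lists are equal within the tolerance, so the caller can decide whether to print "list1 and list2 are not equal." before the details.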

For an equality test I would use this and check whether list3 is empty:
list3 = list1.Except(list2)
If list3 is not empty and the values are unique, we can loop through list3 and provide meaningful feedback.
This seems the easiest to me.
But using LINQPad, I ran a small test (6 entries are different):
var list1 = new List<double>{1,2,3,4,7,8,9,10,11};
var list2 = new List<double>{1,2,3,5,6,7,8,19,20};
var list3 = list1.Except(list2).Dump();
var list4 = list2.Except(list1).Dump();
IEnumerable (4 items) 4 9 10 11
IEnumerable (4 items) 5 6 19 20
but the result shows only four entries as different.
If you care about order, you need a loop; if not, go with Except.

Remove oldest n Items from List using C#

I am working on a dynamic listing of scores which is frequently updated. Ultimately this is used to produce an overall rating, so older entries (based on some parameters, not time) need to be removed to prevent heavy +/- weighting on the overall. It will be adding multiple values at once from a separate enumeration.
List<int> scoreList = new List<int>();
foreach(Item x in Items)
{
    scoreList.Add(x.score);
}
//what I need help with:
if(scoreList.Count() > (Items.Count() * 3))
{
    //I need to remove the last set (first in, first out) of values size
    //Items.Count() from the list
}
If anyone can help it would be much appreciated :) I had to make the code a bit generic because it is written rather cryptically (didn't write the methods).

Use List<T>.RemoveRange - something like this:
// number to remove is the difference between the current length
// and the maximum length you want to allow.
var count = scoreList.Count - (Items.Count() * 3);
if (count > 0) {
    // remove that number of items from the start of the list
    scoreList.RemoveRange(0, count);
}
You remove from the start of the list, because when you Add items they go to the end - so the oldest are at the start.

Try this
scoreList.RemoveAt(scoreList.Count-1);
And here is the MSDN Article

Instead of using a List<int> I would recommend using a Queue<int>. That will give you the FIFO behavior you're looking for.
See http://msdn.microsoft.com/en-us/library/7977ey2c.aspx for more information on Queues.
Queue<int> scoreList = new Queue<int>();
foreach(Item x in Items)
{
    scoreList.Enqueue(x.score);
}
//Or you can eliminate the foreach by doing the following
//Queue<int> scoreList = new Queue<int>(Items.Select(i => i.score).ToList());
//Note that Count is a property for a Queue
while (scoreList.Count > (Items.Count() * 3))
{
    scoreList.Dequeue();
}
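Since the question adds scores in batches, the enqueue and trim steps can be wrapped together. The sketch below is one possible shape, not the asker's actual code: AddBatch, the made-up score values and the limit of three batches are assumptions for illustration, and it is written as C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: appends a batch of scores, then drops the oldest
// entries until at most maxCount remain.
static void AddBatch(Queue<int> scores, IEnumerable<int> batch, int maxCount)
{
    foreach (int score in batch)
        scores.Enqueue(score);
    while (scores.Count > maxCount)
        scores.Dequeue(); // Dequeue always removes the oldest entry
}

var scoreQueue = new Queue<int>();
int batchSize = 3;
for (int round = 0; round < 4; round++)
{
    // Stand-in for Items.Select(x => x.score): three made-up scores per round.
    var roundScores = new[] { round * 10 + 1, round * 10 + 2, round * 10 + 3 };
    AddBatch(scoreQueue, roundScores, batchSize * 3);
}
Console.WriteLine(string.Join(", ", scoreQueue));
// 11, 12, 13, 21, 22, 23, 31, 32, 33
```

After the fourth round the first batch (1, 2, 3) has been dropped, which is the first-in, first-out behaviour the question asks for.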

I didn't understand your question very well; I hope this is what you want.
scoreList.RemoveRange(Items.Count()*3, scoreList.Count()-Items.Count()*3);

A simple way to get the last N elements from a list with LINQ:
scoreList.Skip(Math.Max(0, scoreList.Count() - N)).Take(N)

I toyed around and looked at the method suggested above ( scoresList.RemoveAt() ), but it wasn't suited to the situation. What did end up working:
if (...)
{
    scoresList.RemoveRange(0, scores.Count);
}
Thanks for the help guys

Determining the first available value in a list of integers

I got a simple List of ints.
List<int> myInts = new List<int>();
myInts.Add(0);
myInts.Add(1);
myInts.Add(4);
myInts.Add(6);
myInts.Add(24);
My goal is to get the first unused (available) value from the List.
(the first non-negative value that's not already present in the collection)
In this case, the answer would be 2.
Here's my current code :
int GetFirstFreeInt()
{
    for (int i = 0; i < int.MaxValue; ++i)
    {
        if(!myInts.Contains(i))
            return i;
    }
    throw new InvalidOperationException("All integers are already used.");
}
Is there a better way? Maybe using LINQ? How would you do this?
Of course here I used ints for simplicity but my question could apply to any type.

You basically want the first element from the sequence 0..int.MaxValue that is not contained in myInts:
int? firstAvailable = Enumerable.Range(0, int.MaxValue)
    .Except(myInts)
    .Select(i => (int?)i) // so that "nothing available" yields null rather than 0
    .FirstOrDefault();
Edit in response to comment:
There is no performance penalty here for iterating up to int.MaxValue. What LINQ does internally is create a hash set for myInts and then begin iterating over the sequence created by Enumerable.Range(). Once the first item not contained in the hash set is found, that integer is yielded by the Except() method and returned by FirstOrDefault(), after which the iteration stops. This means the overall effort is O(n) for creating the hash set and then worst case O(n) for iterating over the sequence, where n is the number of integers in myInts.
For more on Except(), see e.g. Jon Skeet's EduLinq series: Reimplementing LINQ to Objects: Part 17 - Except

Well, if the list is sorted from smallest to largest, contains no duplicates, and holds only non-negative values, you could simply compare the i-th element with i: if (myInts[i] != i) return i; This is essentially the same, but doesn't require iterating through the list for each and every Contains check (the Contains method iterates through the list, turning your algorithm into O(n^2) rather than O(n)).
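If the list is not sorted, the repeated Contains scans can still be avoided by copying the values into a HashSet first, which is roughly what Except() does internally. The following is a minimal sketch of that idea with the loop from the question; FirstFreeInt is a made-up name, and the code is written as C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Sketch: same loop as in the question, but the membership test runs
// against a HashSet instead of scanning the list each time.
static int FirstFreeInt(IEnumerable<int> used)
{
    var taken = new HashSet<int>(used);
    for (int candidate = 0; candidate < int.MaxValue; candidate++)
    {
        if (!taken.Contains(candidate))
            return candidate;
    }
    throw new InvalidOperationException("All integers are already used.");
}

var myInts = new List<int> { 0, 1, 4, 6, 24 };
Console.WriteLine(FirstFreeInt(myInts)); // 2
```

The loop stops after at most one more iteration than there are distinct values in the input, so the int.MaxValue bound is only a safety net.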
