C# Similarities of two arrays - c#

There must be an better way to do this, I'm sure...
// Simplified code
var a = new List<int>() { 1, 2, 3, 4, 5, 6 };
var b = new List<int>() { 2, 3, 5, 7, 11 };
var z = new List<int>();
for (int i = 0; i < a.Count; i++)
if (b.Contains(a[i]))
z.Add(a[i]);
// (z) contains all of the numbers that are in BOTH (a) and (b), i.e. { 2, 3, 5 }
I don't mind using the above technique, but I want something fast and efficient (I need to compare very large Lists<> multiple times), and this appears to be neither! Any thoughts?
Edit: As it makes a difference - I'm using .NET 4.0, the initial arrays are already sorted and don't contain duplicates.

You could use IEnumerable.Intersect.
var z = a.Intersect(b);
which will probably be more efficient than your current solution.
note you left out one important piece of information - whether the lists happen to be ordered or not. If they are then a couple of nested loops that pass over each input array exactly once each may be faster - and a little more fun to write.
Edit
In response to your comment on ordering:
first stab at looping - it will need a little tweaking on your behalf but works for your initial data.
int j = 0;
foreach (var i in a)
{
int x = b[j];
while (x < i)
{
if (x == i)
{
z.Add(b[j]);
}
j++;
x = b[j];
}
}
this is where you need to add some unit tests ;)
Edit
final point - it may well be that Linq can use SortedList to perform this intersection very efficiently, if performance is a concern it is worth testing the various solutions. Dont forget to take the sorting into account if you load your data in an un-ordered manner.
One Final Edit because there has been some to and fro on this and people may be using the above without properly debugging it I am posting a later version here:
int j = 0;
int b1 = b[j];
foreach (var a1 in a)
{
while (b1 <= a1)
{
if (b1 == a1)
z1.Add(b[j]);
j++;
if (j >= b.Count)
break;
b1 = b[j];
}
}

There's IEnumerable.Intersect, but since this is an extension method, I doubt it will be very efficient.
If you want efficiency, take one list and turn it into a Set, then go over the second list and see which elements are in the set. Note that I preallocate z, just to make sure you don't suffer from any reallocations.
var set = new HashSet<int>(a);
var z = new List<int>(Math.Min(set.Count, b.Count));
foreach(int i in b)
{
if(set.Contains(i))
a.Add(i);
}
This is guaranteed to run in O(N+M) (N and M being the sizes of the two lists).
Now, you could use set.IntersectWith(b), and I believe it will be just as efficient, but I'm not 100% sure.

The Intersect() method does just that. From MSDN:
Produces the set intersection of two sequences by using the default
equality comparer to compare values.
So in your case:
var z = a.Intersect(b);

Use SortedSet<T> in System.Collections.Generic namespace:
SortedSet<int> a = new SortedSet<int>() { 1, 2, 3, 4, 5, 6 };
SortedSet<int> b = new SortedSet<int>() { 2, 3, 5, 7, 11 };
b.IntersectWith(s2);
But surely you have no duplicates!
Although your second list needs not to be a SortedSet. It can be any collection (IEnumerable<T>), but internally the method act in a way that if the second list also is SortedSet<T>, the operation is an O(n) operation.

If you can use LINQ, you could use the Enumerable.Intersect() extension method.

Related

Most efficient way to remove element of certain value everywhere from List? C#

EDIT: Benchmarks for different techniques published at the bottom of this question.
I have a very large List<int> full of integers. I want to remove every occurrence of "3" from the List<int>. Which technique would be most efficient to do this? I would normally use the .Remove(3) extension until it returns false, but I fear that each call to .Remove(3) internally loops through the entire List<int> unnecessarily.
EDIT: It was recommended in the comments to try
TheList = TheList.Where(x => x != 3).ToList();
but I need to remove the elements without instantiating a new List.
var TheList = new List<int> { 5, 7, 8, 2, 8, 3, 1, 0, 6, 3, 9, 3, 5, 2, 7, 9, 3, 5, 5, 1, 0, 4, 5, 3, 5, 8, 2, 3 };
//technique 1
//this technique has the shortest amount of code,
//but I fear that every time the Remove() method is called,
//the entire list is internally looped over again starting at index 0
while (TheList.Remove(3)) { }
//technique 2
//this technique is an attempt to keep the keep the list from
//being looped over every time an element is removed
for (var i = 0; i < TheList.Count; i++)
{
if (TheList[i] == 3)
{
TheList.RemoveAt(i);
i--;
}
}
Are there any better ways to do this?
Benchmarks
I tested three techniques to remove 10,138 from an array with 100,000 elements: the two shown above, and one recommended by Serg in an answer. These are the results:
'while' loop: 179.6808ms
'for' loop: 65.5099ms
'RemoveAll' predicate: 0.5982ms
Benchmark Code:
var RNG = new Random();
//inclusive min and max random number
Func<int, int, int> RandomInt = delegate (int min, int max) { return RNG.Next(min - 1, max) + 1; };
var TheList = new List<int>();
var ThreeCount = 0;
for (var i = 0; i < 100000; i++)
{
var TheInteger = RandomInt(0, 9);
if (TheInteger == 3) { ThreeCount++; }
TheList.Add(TheInteger);
}
var Technique1List = TheList.ToList();
var Technique2List = TheList.ToList();
var Technique3List = TheList.ToList();
<div style="background-color:aquamarine;color:#000000;">Time to remove #ThreeCount items</div>
//technique 1
var Technique1Stopwatch = Stopwatch.StartNew();
while (Technique1List.Remove(3)) { }
var Technique1Time = Technique1Stopwatch.Elapsed.TotalMilliseconds;
<div style="background-color:#ffffff;color:#000000;">Technique 1: #(Technique1Time)ms ('while' loop)</div>
//technique 2
var Technique2Stopwatch = Stopwatch.StartNew();
for (var i = 0; i < Technique2List.Count; i++)
{
if (Technique2List[i] == 3)
{
Technique2List.RemoveAt(i);
i--;
}
}
var Technique2Time = Technique2Stopwatch.Elapsed.TotalMilliseconds;
<div style="background-color:#ffffff;color:#000000;">Technique 2: #(Technique2Time)ms ('for' loop)</div>
//technique 3
var Technique3Stopwatch = Stopwatch.StartNew();
var RemovedCount = Technique3List.RemoveAll(x => x == 3);
var Technique3Time = Technique3Stopwatch.Elapsed.TotalMilliseconds;
<div style="background-color:#ffffff;color:#000000;">Technique 3: #(Technique3Time)ms ('RemoveAll' predicate)</div>
You can just use List<T>.RemoveAll and pass your predicate - https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.list-1.removeall?view=net-6.0#System_Collections_Generic_List_1_RemoveAll_System_Predicate__0__ . This guaranteed to be linear complexity O(list.Count)
TheList.RemoveAll(x=>x==3);
Additionally, RemoveAll performs some GC-specific things internally, so I think in some cases this may provide some additional performance advantages against the simple hand-made loop implementation (but I'm unsure here).
If you want to do it all yourself, you can check out the implementation of RemoveAll here. Generally, it is just a while loop as in your question.
Additionally, as we can see from GitHub implementation (and as Jon Skeet mentioned in the comment) the remove operation causes the rest of list (all items after the first removed items) to be copied (shifted) on the free space, intorduced by deletion. So, if you have really huge list and/or want to remove something frequently, you may consider to switching to some other data structure, such as linked list.

C# why does binarysearch have to be made on sorted arrays and lists?

C# why does binarysearch have to be made on sorted arrays and lists?
Is there any other method that does not require me to sort the list?
It kinda messes with my program in a way that I cannot sort the list for it to work as I want to.
A binary search works by dividing the list of candidates in half using equality. Imagine the following set:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
We can also represent this as a binary tree, to make it easier to visualise:
Source
Now, say we want to find the number 3. We can do it like so:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3.
Is 3 smaller than 2? No. OK, now we're looking at 3.
We found it!
Now, if your list isn't sorted, how will we divide the list in half? The simple answer is: we can't. If we swap 3 and 15 in the example above, it would work like this:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3 (except we swapped it with 15).
Is 3 smaller than 2? No. OK, now we're looking at 15.
Huh? There's no more items to check but we didn't find it. I guess it's not in the list.
The solution is to use an appropriate data type instead. For fast lookups of key/value pairs, I'll use a Dictionary. For fast checks if something already exists, I'll use a HashSet. For general storage I'll use a List or an array.
Dictionary example:
var values = new Dictionary<int, string>();
values[1] = "hello";
values[2] = "goodbye";
var value2 = values[2]; // this lookup will be fast because Dictionaries are internally optimised inside and partition keys' hash codes into buckets.
HashSet example:
var mySet = new HashSet<int>();
mySet.Add(1);
mySet.Add(2);
if (mySet.Contains(2)) // this lookup is fast for the same reason as a dictionary.
{
// do something
}
List exmaple:
var list = new List<int>();
list.Add(1);
list.Add(2);
if (list.Contains(2)) // this isn't fast because it has to visit each item in the list, but it works OK for small sets or places where performance isn't so important
{
}
var idx2 = list.IndexOf(2);
If you have multiple values with the same key, you could store a list in a Dictionary like this:
var values = new Dictionary<int, List<string>>();
if (!values.ContainsKey(key))
{
values[key] = new List<string>();
}
values[key].Add("value1");
values[key].Add("value2");
There is no way you use binary search on unordered collections. Sorting collection is the main concept of the binary search. The key is that on every move u take the middle index between l and r. On first step they are 0 and size - 1, after every step one of them becomes middle index between them. If x > arr[m] then l becomes m + 1, otherwise r becomes m - 1. Basically, on every step you take half of the array you had and, of course, it remains sorted. This code is recursive, if you don't know what recursion is(which is very important in programming), you can review and learn here.
// C# implementation of recursive Binary Search
using System;
class GFG {
// Returns index of x if it is present in
// arr[l..r], else return -1
static int binarySearch(int[] arr, int l,
int r, int x)
{
if (r >= l) {
int mid = l + (r - l) / 2;
// If the element is present at the
// middle itself
if (arr[mid] == x)
return mid;
// If element is smaller than mid, then
// it can only be present in left subarray
if (arr[mid] > x)
return binarySearch(arr, l, mid - 1, x);
// Else the element can only be present
// in right subarray
return binarySearch(arr, mid + 1, r, x);
}
// We reach here when element is not present
// in array
return -1;
}
// Driver method to test above
public static void Main()
{
int[] arr = { 2, 3, 4, 10, 40 };
int n = arr.Length;
int x = 10;
int result = binarySearch(arr, 0, n - 1, x);
if (result == -1)
Console.WriteLine("Element not present");
else
Console.WriteLine("Element found at index "
+ result);
}
}
Output:
Element is present at index 3
Sure there is.
var list = new List<int>();
list.Add(42);
list.Add(1);
list.Add(54);
var index = list.IndexOf(1); //TADA!!!!
EDIT: Ok, I hoped the irony was obvious. But strictly speaking, if your array is not sorted, you are pretty much stuck with the linear search, readily available by means of IndexOf() or IEnumerable.First().

Find a match without IF statements?

Came across this question / quiz as something that might be asked in an interview. Not seeing how to do this...
You have two arrays full of random numbers and each array has a number that they share. Find the number and output the number. (NOTE: Do not use IF Statements)
Use Intersect. I suppose it is a LINQ question.
What they are examining is your ability to do functional programming (as opposed to procedural). As stated by several answers, you can use LINQ to intersect two lists.
The other answers aren't quite complete; you were also told
there is exactly one common number
you should output this number
In the spirit of quasi-functional programming you ought to do this in one statement without loops or explicit conditionals:
int[] a = { 1, 2, 3 };
int[] b = { 3, 4, 5 };
Console.WriteLine(a.Intersect(b).Single());
This could be more robust, eg
Console.WriteLine(a.Intersect(b).FirstOrDefault());
This won't barf when there are zero or multiple elements in the intersection,
but strictly speaking these fail to report violation of preconditions - there should be exactly one match, anything else should produce an exception.
Well you may want to take a look at Intersect extension method.
A little bit example here:
int[] array1 = { 1, 2, 3 };
int[] array2 = { 3, 4, 5 };
// get the shared number(s)
var intersect = array1.Intersect(array2);
foreach (int val in intersect)
{
Console.WriteLine(val);
}
I had similar experience but I had to use procedural programming to find out if I can think of more than one way to solve the puzzle.
Here is code achievable with while loop:
int[] array1 = { 1, 2, 3 };
int[] array2 = { 3, 4, 5 };
int x = 0;
int y = 0;
while (x < array1.Length)
{
y=0;
while (y < array2.Length)
{
while (array1[x] == array2[y])
{
Console.WriteLine(String.Format("Matching number is {0}", array1[x]));
break;
}
y++;
}
x++;
}
Above code will print all matches. To get only first match you can use goto to get out of these loop.
Best advise, learn if you have any idea what you can expect learn all possible ways to do something. You never can know too much.

C# Collection - Order by an element (Rotate)

I have an IEnumerable<Point> collection. Lets say it contains 5 points (in reality it is more like 2000)
I want to order this collection so that a specifc point in the collection becomes the first element, so it's basically chopping a collection at a specific point and rejoining them together.
So my list of 5 points:
{0,0}, {10,0}, {10,10}, {5,5}, {0,10}
Reordered with respect to element at index 3 would become:
{5,5}, {0,10}, {0,0}, {10,0}, {10,10}
What is the most computationally efficient way of resolving this problem, or is there an inbuilt method that already exists... If so I can't seem to find one!
var list = new[] { 1, 2, 3, 4, 5 };
var rotated = list.Skip(3).Concat(list.Take(3));
// rotated is now {4, 5, 1, 2, 3}
A simple array copy is O(n) in this case, which should be good enough for almost all real-world purposes. However, I will grant you that in certain cases - if this is a part deep inside a multi-level algorithm - this may be relevant. Also, do you simply need to iterate through this collection in an ordered fashion or create a copy?
Linked lists are very easy to reorganize like this, although accessing random elements will be more costly. Overall, the computational efficiency will also depend on how exactly you access this collection of items (and also, what sort of items they are - value types or reference types?).
The standard .NET linked list does not seem to support such manual manipulation but in general, if you have a linked list, you can easily move around sections of the list in the way you describe, just by assigning new "next" and "previous" pointers to the endpoints.
The collection library available here supports this functionality: http://www.itu.dk/research/c5/.
Specifically, you are looking for LinkedList<T>.Slide() the method which you can use on the object returned by LinkedList<T>.View().
Version without enumerating list two times, but higher memory consumption because of the T[]:
public static IEnumerable<T> Rotate<T>(IEnumerable<T> source, int count)
{
int i = 0;
T[] temp = new T[count];
foreach (var item in source)
{
if (i < count)
{
temp[i] = item;
}
else
{
yield return item;
}
i++;
}
foreach (var item in temp)
{
yield return item;
}
}
[Test]
public void TestRotate()
{
var list = new[] { 1, 2, 3, 4, 5 };
var rotated = Rotate(list, 3);
Assert.That(rotated, Is.EqualTo(new[] { 4, 5, 1, 2, 3 }));
}
Note: Add argument checks.
Another alternative to the Linq method shown by ulrichb would be to use the Queue Class (a fifo collection) dequeue to your index, and enqueue the ones you have taken out.
The naive implementation using linq would be:
IEnumerable x = new[] { 1, 2, 3, 4 };
var tail = x.TakeWhile(i => i != 3);
var head = x.SkipWhile(i => i != 3);
var combined = head.Concat(tail); // is now 3, 4, 1, 2
What happens here is that you perform twice the comparisons needed to get to your first element in the combined sequence.
The solution is readable and compact but not very efficient.
The solutions described by the other contributors may be more efficient since they use special data structures as arrays or lists.
You can write a user defined extension of List that does the rotation by using List.Reverse(). I took the basic idea from the C++ Standard Template Library which basically uses Reverse in three steps: Reverse(first, mid) Reverse(mid, last) Reverse(first, last)
As far as I know, this is the most efficient and fastest way. I tested with 1 billion elements and the rotation Rotate(0, 50000, 800000) takes 0.00097 seconds. (By the way: adding 1 billion ints to the List already takes 7.3 seconds)
Here's the extension you can use:
public static class Extensions
{
public static void Rotate(this List<int> me, int first, int mid, int last)
{
//indexes are zero based!
if (first >= mid || mid >= lastIndex)
return;
me.Reverse(first, mid - first + 1);
me.Reverse(mid + 1, last - mid);
me.Reverse(first, last - first + 1);
}
}
The usage is like:
static void Main(string[] args)
{
List<int> iList = new List<int>{0,1,2,3,4,5};
Console.WriteLine("Before rotate:");
foreach (var item in iList)
{
Console.Write(item + " ");
}
Console.WriteLine();
int firstIndex = 0, midIndex = 2, lastIndex = 4;
iList.Rotate(firstIndex, midIndex, lastIndex);
Console.WriteLine($"After rotate {firstIndex}, {midIndex}, {lastIndex}:");
foreach (var item in iList)
{
Console.Write(item + " ");
}
Console.ReadKey();
}

What is the use of the ArraySegment<T> class?

I just came across the ArraySegment<byte> type while subclassing the MessageEncoder class.
I now understand that it's a segment of a given array, takes an offset, is not enumerable, and does not have an indexer, but I still fail to understand its usage. Can someone please explain with an example?
ArraySegment<T> has become a lot more useful in .NET 4.5+ and .NET Core as it now implements:
IList<T>
ICollection<T>
IEnumerable<T>
IEnumerable
IReadOnlyList<T>
IReadOnlyCollection<T>
as opposed to the .NET 4 version which implemented no interfaces whatsoever.
The class is now able to take part in the wonderful world of LINQ so we can do the usual LINQ things like query the contents, reverse the contents without affecting the original array, get the first item, and so on:
var array = new byte[] { 5, 8, 9, 20, 70, 44, 2, 4 };
array.Dump();
var segment = new ArraySegment<byte>(array, 2, 3);
segment.Dump(); // output: 9, 20, 70
segment.Reverse().Dump(); // output 70, 20, 9
segment.Any(s => s == 99).Dump(); // output false
segment.First().Dump(); // output 9
array.Dump(); // no change
It is a puny little soldier struct that does nothing but keep a reference to an array and stores an index range. A little dangerous, beware that it does not make a copy of the array data and does not in any way make the array immutable or express the need for immutability. The more typical programming pattern is to just keep or pass the array and a length variable or parameter, like it is done in the .NET BeginRead() methods, String.SubString(), Encoding.GetString(), etc, etc.
It does not get much use inside the .NET Framework, except for what seems like one particular Microsoft programmer that worked on web sockets and WCF liking it. Which is probably the proper guidance, if you like it then use it. It did do a peek-a-boo in .NET 4.6, the added MemoryStream.TryGetBuffer() method uses it. Preferred over having two out arguments I assume.
In general, the more universal notion of slices is high on the wishlist of principal .NET engineers like Mads Torgersen and Stephen Toub. The latter kicked off the array[:] syntax proposal a while ago, you can see what they've been thinking about in this Roslyn page. I'd assume that getting CLR support is what this ultimately hinges on. This is actively being thought about for C# version 7 afaik, keep your eye on System.Slices.
Update: dead link, this shipped in version 7.2 as Span.
Update2: more support in C# version 8.0 with Range and Index types and a Slice() method.
Buffer partioning for IO classes - Use the same buffer for simultaneous
read and write operations and have a
single structure you can pass around
the describes your entire operation.
Set Functions - Mathematically speaking you can represent any
contiguous subsets using this new
structure. That basically means you
can create partitions of the array,
but you can't represent say all odds
and all evens. Note that the phone
teaser proposed by The1 could have
been elegantly solved using
ArraySegment partitioning and a tree
structure. The final numbers could
have been written out by traversing
the tree depth first. This would have
been an ideal scenario in terms of
memory and speed I believe.
Multithreading - You can now spawn multiple threads to operate over the
same data source while using segmented
arrays as the control gate. Loops
that use discrete calculations can now
be farmed out quite easily, something
that the latest C++ compilers are
starting to do as a code optimization
step.
UI Segmentation - Constrain your UI displays using segmented
structures. You can now store
structures representing pages of data
that can quickly be applied to the
display functions. Single contiguous
arrays can be used in order to display
discrete views, or even hierarchical
structures such as the nodes in a
TreeView by segmenting a linear data
store into node collection segments.
In this example, we look at how you can use the original array, the Offset and Count properties, and also how you can loop through the elements specified in the ArraySegment.
using System;
class Program
{
static void Main()
{
// Create an ArraySegment from this array.
int[] array = { 10, 20, 30 };
ArraySegment<int> segment = new ArraySegment<int>(array, 1, 2);
// Write the array.
Console.WriteLine("-- Array --");
int[] original = segment.Array;
foreach (int value in original)
{
Console.WriteLine(value);
}
// Write the offset.
Console.WriteLine("-- Offset --");
Console.WriteLine(segment.Offset);
// Write the count.
Console.WriteLine("-- Count --");
Console.WriteLine(segment.Count);
// Write the elements in the range specified in the ArraySegment.
Console.WriteLine("-- Range --");
for (int i = segment.Offset; i < segment.Count+segment.Offset; i++)
{
Console.WriteLine(segment.Array[i]);
}
}
}
ArraySegment Structure - what were they thinking?
What's about a wrapper class? Just to avoid copy data to temporal buffers.
public class SubArray<T> {
private ArraySegment<T> segment;
public SubArray(T[] array, int offset, int count) {
segment = new ArraySegment<T>(array, offset, count);
}
public int Count {
get { return segment.Count; }
}
public T this[int index] {
get {
return segment.Array[segment.Offset + index];
}
}
public T[] ToArray() {
T[] temp = new T[segment.Count];
Array.Copy(segment.Array, segment.Offset, temp, 0, segment.Count);
return temp;
}
public IEnumerator<T> GetEnumerator() {
for (int i = segment.Offset; i < segment.Offset + segment.Count; i++) {
yield return segment.Array[i];
}
}
} //end of the class
Example:
byte[] pp = new byte[] { 1, 2, 3, 4 };
SubArray<byte> sa = new SubArray<byte>(pp, 2, 2);
Console.WriteLine(sa[0]);
Console.WriteLine(sa[1]);
//Console.WriteLine(b[2]); exception
Console.WriteLine();
foreach (byte b in sa) {
Console.WriteLine(b);
}
Ouput:
3
4
3
4
The ArraySegment is MUCH more useful than you might think. Try running the following unit test and prepare to be amazed!
[TestMethod]
public void ArraySegmentMagic()
{
var arr = new[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
var arrSegs = new ArraySegment<int>[3];
arrSegs[0] = new ArraySegment<int>(arr, 0, 3);
arrSegs[1] = new ArraySegment<int>(arr, 3, 3);
arrSegs[2] = new ArraySegment<int>(arr, 6, 3);
for (var i = 0; i < 3; i++)
{
var seg = arrSegs[i] as IList<int>;
Console.Write(seg.GetType().Name.Substring(0, 12) + i);
Console.Write(" {");
for (var j = 0; j < seg.Count; j++)
{
Console.Write("{0},", seg[j]);
}
Console.WriteLine("}");
}
}
You see, all you have to do is cast an ArraySegment to IList and it will do all of the things you probably expected it to do in the first place. Notice that the type is still ArraySegment, even though it is behaving like a normal list.
OUTPUT:
ArraySegment0 {0,1,2,}
ArraySegment1 {3,4,5,}
ArraySegment2 {6,7,8,}
In simple words: it keeps reference to an array, allowing you to have multiple references to a single array variable, each one with a different range.
In fact it helps you to use and pass sections of an array in a more structured way, instead of having multiple variables, for holding start index and length. Also it provides collection interfaces to work more easily with array sections.
For example the following two code examples do the same thing, one with ArraySegment and one without:
byte[] arr1 = new byte[] { 1, 2, 3, 4, 5, 6 };
ArraySegment<byte> seg1 = new ArraySegment<byte>(arr1, 2, 2);
MessageBox.Show((seg1 as IList<byte>)[0].ToString());
and,
byte[] arr1 = new byte[] { 1, 2, 3, 4, 5, 6 };
int offset = 2;
int length = 2;
byte[] arr2 = arr1;
MessageBox.Show(arr2[offset + 0].ToString());
Obviously first code snippet is more preferred, specially when you want to pass array segments to a function.

Categories