Optimizing this C# algorithm (K Difference) - c#

This is the problem I'm solving (it's a sample problem, not a real problem):
Given N numbers , [N<=10^5] we need to count the total pairs of
numbers that have a difference of K. [K>0 and K<1e9]
Input Format: 1st line contains N & K (integers). 2nd line contains N
numbers of the set. All the N numbers are assured to be distinct.
Output Format: One integer saying the no of pairs of numbers that have
a diff K.
Sample Input #00:
5 2
1 5 3 4 2
Sample Output #00:
3
Sample Input #01:
10 1
363374326 364147530 61825163 1073065718 1281246024 1399469912 428047635 491595254 879792181 1069262793
Sample Output #01:
0
I already have a solution (and I haven't been able to optimize it as well as I had hoped). Currently my solution gets a score of 12/15 when it is run, and I'm wondering why I can't get 15/15 (my solution to another problem wasn't nearly as efficient, but got all of the points). Apparently, the code is run using "Mono 2.10.1, C# 4".
So can anyone think of a better way to optimize this further? The VS profiler says to avoid calling String.Split and Int32.Parse. The calls to Int32.Parse can't be avoided, although I guess I could optimize tokenizing the array.
My current solution:
using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;
namespace KDifference
{
class Solution
{
static void Main(string[] args)
{
char[] space = { ' ' };
string[] NK = Console.ReadLine().Split(space);
int N = Int32.Parse(NK[0]), K = Int32.Parse(NK[1]);
int[] nums = Console.ReadLine().Split(space, N).Select(x => Int32.Parse(x)).OrderBy(x => x).ToArray();
int KHits = 0;
for (int i = nums.Length - 1, j, k; i >= 1; i--)
{
for (j = 0; j < i; j++)
{
k = nums[i] - nums[j];
if (k == K)
{
KHits++;
}
else if (k < K)
{
break;
}
}
}
Console.Write(KHits);
}
}
}

Your algorithm is still O(n^2), even with the sorting and the early-out. And even if you eliminated the O(n^2) bit, the sort is still O(n lg n). You can use an O(n) algorithm to solve this problem. Here's one way to do it:
Suppose the set you have is S1 = { 1, 7, 4, 6, 3 } and the difference is 2.
Construct the set S2 = { 1 + 2, 7 + 2, 4 + 2, 6 + 2, 3 + 2 } = { 3, 9, 6, 8, 5 }.
The answer you seek is the cardinality of the intersection of S1 and S2. The intersection is {6, 3}, which has two elements, so the answer is 2.
You can implement this solution in a single line of code, provided that you have a sequence of integers named sequence and an integer named difference:
int result = sequence.Intersect(from item in sequence select item + difference).Count();
The Intersect method builds a hash set internally, so computing the intersection takes O(n) expected time.
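For comparison, here is a minimal sketch of the same idea written with an explicit HashSet instead of Intersect. The names KDifference and CountPairs are made up for illustration. It assumes the numbers are distinct, as the problem guarantees, and it stores values as long because value + K can exceed int.MaxValue for inputs like Sample Input #01.

```csharp
using System.Collections.Generic;

public static class KDifference
{
    // Counts pairs (x, y) in numbers with y - x == k.
    // Assumption: all values are distinct, as the problem statement promises.
    public static int CountPairs(IEnumerable<int> numbers, int k)
    {
        // Stored as long so that n + k below cannot overflow.
        var seen = new HashSet<long>();
        foreach (int n in numbers)
        {
            seen.Add(n);
        }

        int count = 0;
        foreach (long n in seen)
        {
            // Only look upward, so each pair is counted exactly once.
            if (seen.Contains(n + k))
            {
                count++;
            }
        }
        return count;
    }
}
```

For the first sample (1 5 3 4 2 with K = 2) the matching pairs are (1, 3), (3, 5) and (2, 4), so CountPairs returns 3.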

Try this (note, untested):
1. Sort the array
2. Start two indexes at 0
3. If the difference between the numbers at those two positions is equal to K, increase the count, and increase one of the two indexes (if numbers aren't duplicated, increase both)
4. If the difference is larger than K, increase index #1
5. If the difference is less than K, increase index #2; if that would place it outside the array, you're done
6. Otherwise, go back to 3 and keep going
Basically, try to keep the two indexes apart by K value difference.
You should write up a series of unit-tests for your algorithm, and try to come up with edge cases.
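The steps above could be sketched roughly as follows. This is an illustration only: the class and method names are hypothetical, it assumes distinct values and K > 0, and it starts the upper index at 1 rather than 0 so the two indexes never point at the same element.

```csharp
using System;

public static class KDifferenceTwoPointer
{
    // Counts pairs with difference k using a sort plus two moving indexes.
    // Assumptions: values are distinct and k > 0.
    public static int CountPairs(int[] numbers, int k)
    {
        int[] sorted = (int[])numbers.Clone(); // leave the caller's array untouched
        Array.Sort(sorted);

        int count = 0;
        int low = 0;   // "index #1" in the steps above
        int high = 1;  // "index #2" in the steps above
        while (high < sorted.Length)
        {
            long diff = (long)sorted[high] - sorted[low];
            if (diff == k)
            {
                count++;
                low++;   // values are distinct, so both indexes can advance
                high++;
            }
            else if (diff > k)
            {
                low++;
                if (low == high)
                {
                    high++; // keep low strictly below high
                }
            }
            else
            {
                high++;
            }
        }
        return count;
    }
}
```

The scan itself is linear, but the sort still makes the whole thing O(n log n), so this mainly removes the nested loop from the original solution.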

This would allow you to do it in a single pass. Using hash sets is beneficial if there are many values to parse/check. You might also want to use a bloom filter in combination with hash sets to reduce lookups.
1. Initialize. Let A and B be two empty hash sets. Let c be zero.
2. Parse loop. Parse the next value v. If there are no more values the algorithm is done and the result is in c.
3. Back check. If v exists in A then increment c and jump back to 2.
4. Low match. If v - K > 0 then:
   - insert v - K into A
   - if v - K exists in B then increment c (and optionally remove v - K from B).
5. High match. If v + K < 1e9 then:
   - insert v + K into A
   - if v + K exists in B then increment c (and optionally remove v + K from B).
6. Remember. Insert v into B.
7. Jump back to 2.
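Here is a simplified single-pass sketch. It is not a transcription of the two-set procedure above: it keeps only one set of values seen so far and always records v. That is deliberate, because as I read step 3, jumping back early skips step 6, which seems to miss a pair for an input such as 1 3 5 with K = 2. The class and method names are hypothetical, and distinct values are assumed.

```csharp
using System.Collections.Generic;

public static class KDifferenceSinglePass
{
    // Counts pairs with difference k while reading the values once.
    // Assumption: values are distinct, so each pair is found exactly once,
    // at the moment its second member arrives.
    public static int CountPairs(IEnumerable<int> numbers, int k)
    {
        var seen = new HashSet<long>(); // long avoids overflow in v + k
        int count = 0;
        foreach (int v in numbers)
        {
            if (seen.Contains((long)v - k)) count++; // partner below v already seen
            if (seen.Contains((long)v + k)) count++; // partner above v already seen
            seen.Add(v);                             // always remember v
        }
        return count;
    }
}
```

Because the counting happens while the values stream in, this can be combined with parsing so the input never needs to be stored as an array.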

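// PHP solution for the K difference problem
function countKDifferencePairs($header, $line) {
    // $header is "N K"; $line holds the N space-separated numbers
    $parts = explode(' ', trim($header));
    $k = (int) $parts[1];
    $numbers = array_map('intval', explode(' ', trim($line)));
    // Shift every number by K, then count the values present in both lists
    $shifted = array();
    foreach ($numbers as $number) {
        $shifted[] = $number + $k;
    }
    return count(array_intersect($numbers, $shifted));
}
echo countKDifferencePairs("5 2", "1 3 5 4 2");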

Actually, that's trivial to solve with a hashmap:
First put each number into a hashmap: dict((x, x) for x in numbers) in "pythony" pseudo code ;)
Now just iterate through every number in the hashmap and check whether number + K is in the hashmap. If yes, increase the count by one.
The obvious improvement over the naive solution is to check ONLY the higher (or lower) bound; otherwise you count every pair twice and have to divide by 2 afterwards, which is wasted work.
This is O(N) for creating the hashmap while reading the values in and O(N) for iterating through it, i.e. O(N) overall, and about 8 lines of code in Python (and it is correct, I just solved it ;-) )

Following Eric's answer, here is the implementation of the Intersect method, pasted below; it is O(n):
private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
    Set<TSource> set = new Set<TSource>(comparer);
    foreach (TSource current in second)
    {
        set.Add(current);
    }
    foreach (TSource current2 in first)
    {
        if (set.Remove(current2))
        {
            yield return current2;
        }
    }
    yield break;
}

Related

C# why does binarysearch have to be made on sorted arrays and lists?

C# why does binarysearch have to be made on sorted arrays and lists?
Is there any other method that does not require me to sort the list?
It kinda messes with my program in a way that I cannot sort the list for it to work as I want to.
A binary search works by repeatedly dividing the list of candidates in half using an ordering comparison. Imagine the following set:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
We can also represent this as a binary tree, to make it easier to visualise: 8 is the root, 4 and 12 are its children, and so on down to the leaves.
Now, say we want to find the number 3. We can do it like so:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3.
Is 3 smaller than 2? No. OK, now we're looking at 3.
We found it!
Now, if your list isn't sorted, how will we divide the list in half? The simple answer is: we can't. If we swap 3 and 15 in the example above, it would work like this:
Is 3 smaller than 8? Yes. OK, now we're looking at everything between 1 and 7.
Is 3 smaller than 4? Yes. OK, now we're looking at everything between 1 and 3 (except we swapped it with 15).
Is 3 smaller than 2? No. OK, now we're looking at 15.
Huh? There's no more items to check but we didn't find it. I guess it's not in the list.
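The walk-through above can be reproduced with a small hand-written binary search. This is only a sketch to make the failure visible; BinarySearchDemo, Find and OneToFifteen are names invented for the example.

```csharp
public static class BinarySearchDemo
{
    // Iterative binary search: returns the index of target, or -1 if the
    // search runs out of candidates. Only meaningful on sorted input.
    public static int Find(int[] items, int target)
    {
        int low = 0;
        int high = items.Length - 1;
        while (low <= high)
        {
            int mid = low + (high - low) / 2;
            if (items[mid] == target)
            {
                return mid;
            }
            if (items[mid] > target)
            {
                high = mid - 1; // target can only be in the left half
            }
            else
            {
                low = mid + 1;  // target can only be in the right half
            }
        }
        return -1;
    }

    // Builds the sorted array 1, 2, ..., 15 used in the example.
    public static int[] OneToFifteen()
    {
        var items = new int[15];
        for (int i = 0; i < items.Length; i++)
        {
            items[i] = i + 1;
        }
        return items;
    }
}
```

With the sorted array, Find(items, 3) returns index 2. After swapping the 3 and the 15, the same call returns -1 even though 3 is still in the array, because the comparisons against 8, 4 and 2 steer the search to the slot that now holds 15.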
The solution is to use an appropriate data type instead. For fast lookups of key/value pairs, I'll use a Dictionary. For fast checks if something already exists, I'll use a HashSet. For general storage I'll use a List or an array.
Dictionary example:
var values = new Dictionary<int, string>();
values[1] = "hello";
values[2] = "goodbye";
var value2 = values[2]; // this lookup is fast because a Dictionary partitions its keys' hash codes into buckets internally.
HashSet example:
var mySet = new HashSet<int>();
mySet.Add(1);
mySet.Add(2);
if (mySet.Contains(2)) // this lookup is fast for the same reason as a dictionary.
{
// do something
}
List example:
var list = new List<int>();
list.Add(1);
list.Add(2);
if (list.Contains(2)) // this isn't fast because it has to visit each item in the list, but it works OK for small sets or places where performance isn't so important
{
}
var idx2 = list.IndexOf(2);
If you have multiple values with the same key, you could store a list in a Dictionary like this:
var values = new Dictionary<int, List<string>>();
if (!values.ContainsKey(key))
{
values[key] = new List<string>();
}
values[key].Add("value1");
values[key].Add("value2");
You cannot use binary search on an unordered collection; sorted input is the core requirement of the algorithm. On every step you take the middle index m between l and r. At the start they are 0 and size - 1, and after every step one of them moves past the middle: if x > arr[m] then l becomes m + 1, otherwise r becomes m - 1. In other words, each step keeps half of the range you had, and that half is of course still sorted. The code below is recursive; if you don't know what recursion is (which is very important in programming), you can review and learn here.
// C# implementation of recursive Binary Search
using System;
class GFG {
// Returns index of x if it is present in
// arr[l..r], else return -1
static int binarySearch(int[] arr, int l,
int r, int x)
{
if (r >= l) {
int mid = l + (r - l) / 2;
// If the element is present at the
// middle itself
if (arr[mid] == x)
return mid;
// If element is smaller than mid, then
// it can only be present in left subarray
if (arr[mid] > x)
return binarySearch(arr, l, mid - 1, x);
// Else the element can only be present
// in right subarray
return binarySearch(arr, mid + 1, r, x);
}
// We reach here when element is not present
// in array
return -1;
}
// Driver method to test above
public static void Main()
{
int[] arr = { 2, 3, 4, 10, 40 };
int n = arr.Length;
int x = 10;
int result = binarySearch(arr, 0, n - 1, x);
if (result == -1)
Console.WriteLine("Element not present");
else
Console.WriteLine("Element found at index "
+ result);
}
}
Output:
Element found at index 3
Sure there is.
var list = new List<int>();
list.Add(42);
list.Add(1);
list.Add(54);
var index = list.IndexOf(1); //TADA!!!!
EDIT: Ok, I hoped the irony was obvious. But strictly speaking, if your array is not sorted, you are pretty much stuck with the linear search, readily available by means of IndexOf() or IEnumerable.First().

Given an array of integers, how can I find all common multiples up to a maximum number?

This is my first question on this site. I am practicing on a problem on Hackerrank that asks to find numbers "Between two Sets". Given two arrays of integers, I must find the number(s) that fit the following two criteria:
1) The elements in the first array must all be factors of the number(s)
2) The number(s) must factor into all elements of the second array
I know that I need to find all common multiples of every element in the first array, but those multiples need to be less than or equal to the minimum value of the second array. I first sort the first array then find all the multiples of ONLY the largest number in the first array (again, up to a max of the second array's minimum) and store those multiples in a list. Then, I move on to the second largest element in the first array and test it against the array of existing multiples. All elements in the list of existing multiples that isn't also a multiple of the second largest element of the first array is removed. I then test the third largest value of the first array, all the way to the minimum value. The list of existing multiples should be getting trimmed as I iterate through the first array in descending order. I've written a solution which passes only 5 out of the 9 test cases on the site, see code below. My task was to edit the getTotalX function and I created the getCommonMultiples function myself as a helper. I did not create nor edit the main function. I am not sure why I am not passing the other 4 test cases as I can't see what any of the test cases are.
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
class Solution {
/*
* Complete the getTotalX function below.
*/
static int getTotalX(int[] a, int[] b) {
//get minimum value of second array
int b_min = b.Min();
//create List to hold multiples
List<int> multiples = getCommonMultiples(a, b_min);
//create List to hold number of ints which are in solution
List<int> solutions = new List<int>();
foreach(int x in multiples)
{
foreach(int y in b)
{
if (y % x == 0 && !solutions.Contains(x))
{
solutions.Add(x);
}
else
{
break;
}
}
}
return solutions.Count;
}
static List<int> getCommonMultiples(int[] array, int max)
{
//make sure array is sorted
Array.Sort(array);
int x = array.Length - 1; //x will be the last # in array -- the max
int y = 1;
//find all multiples of largest number first and store in a list
int z = array[x] * y;
List<int> commonMultiples = new List<int>();
while(z <= max)
{
commonMultiples.Add(z);
y++;
z = array[x] * y;
}
//all multiples of largest number are now added to the list
//go through the smaller numbers in query array
//only keep elements in list if they are also multiples of smaller
//numbers
int xx = array.Length - 2;
for(int a = array[xx]; xx >= 0; xx--)
{
foreach(int b in commonMultiples.ToList())
{
if (b % a != 0)
{
commonMultiples.Remove(b);
}
else
{
continue;
}
}
}
return commonMultiples;
}
static void Main(string[] args) {
TextWriter tw = new StreamWriter(@System.Environment.GetEnvironmentVariable("OUTPUT_PATH"), true);
string[] nm = Console.ReadLine().Split(' ');
int n = Convert.ToInt32(nm[0]);
int m = Convert.ToInt32(nm[1]);
int[] a = Array.ConvertAll(Console.ReadLine().Split(' '), aTemp => Convert.ToInt32(aTemp))
;
int[] b = Array.ConvertAll(Console.ReadLine().Split(' '), bTemp => Convert.ToInt32(bTemp))
;
int total = getTotalX(a, b);
tw.WriteLine(total);
tw.Flush();
tw.Close();
}
}
Again, I can't see the test cases so I do not know what exactly the issue is. I went through the code line by line and can't find any OutOfBoundExceptions or things of that sort so it has to be a logic issue. Thanks for the help!
A typical sample involves 3 lines of input. The first line has 2 integers which give the lengths of the first array and the second array, respectively. The second line gives the integers in the first array. The third line gives the integers in the second array. The output needs to be the total number of integers "in between" the two arrays. It will look like this:
Sample Input
2 3
2 4
16 32 96
Sample Output
3
Explanation: 2 and 4 divide evenly into 4, 8, 12 and 16.
4, 8 and 16 divide evenly into 16, 32, 96.
4, 8 and 16 are the only three numbers for which each element of the first array is a factor and each is a factor of all elements of the second array.
I see two issues with the code you posted.
Firstly, as @Hans Kesting pointed out, a = array[xx] is not being updated each time in the for loop. Since the variable a is only used in one spot, I recommend just replacing that use with array[xx] and being done with it, as follows:
for(int xx = array.Length - 2; xx >= 0; xx--)
{
foreach(int b in commonMultiples.ToList())
{
if (b % array[xx] != 0)
{
commonMultiples.Remove(b);
For your understanding of for loops: to properly increment a each time you'd write the for loop like this:
for(int xx = array.Length - 2, a = array[xx]; xx >= 0; xx--, a = array[xx])
The first part of the for loop (up until the first ;) is the initialization stage, which runs only once, before entering the loop the first time. The second part is the loop condition, which is checked before each pass through the loop (including the first); if it ever evaluates to false, the loop stops. The third part is the increment stage, which runs after each completed pass.
Because of that, in order to keep a up to date in the for loop head, it must appear twice. Be aware that the increment stage runs before the condition is re-checked, so after the last pass this form evaluates array[xx] with xx equal to -1 and throws an IndexOutOfRangeException; that is another reason to prefer using array[xx] inside the loop body.
Secondly, your solutions list in getTotalX is additive: a multiple x is added as soon as it divides the first value in b, and the inner loop then breaks on the next value (because solutions already contains x), so x is kept even if it doesn't divide the other values in b. To get it to work the way that you want, use a Remove loop rather than an Add loop.
List<int> multiples = getCommonMultiples(a, b_min);
//create List to hold number of ints which are in solution
List<int> solutions = multiples.ToList();
foreach(int x in multiples)
{
foreach(int y in b)
{
if (y % x != 0)
{
solutions.Remove(x);
break;
}
}
}
You could also use LINQ to perform an additive solution where it takes into account All members of b:
//create List to hold number of ints which are in solution
List<int> solutions = multiples.Where((x) => b.All((y) => y % x == 0)).ToList();
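Putting both conditions side by side, a deliberately simple brute-force sketch looks like the one below. It is not the getCommonMultiples approach from the question: it just tests every candidate from 1 up to the minimum of b. BetweenTwoSets and CountBetween are hypothetical names, and both arrays are assumed to be non-empty and to contain only positive integers.

```csharp
using System.Linq;

public static class BetweenTwoSets
{
    // Counts the integers x such that every element of a divides x
    // and x divides every element of b.
    // Assumptions: a and b are non-empty and hold positive integers.
    public static int CountBetween(int[] a, int[] b)
    {
        int max = b.Min(); // x cannot exceed the smallest element of b
        int count = 0;
        for (int x = 1; x <= max; x++)
        {
            bool allOfADivideX = a.All(f => x % f == 0);
            bool xDividesAllOfB = b.All(y => y % x == 0);
            if (allOfADivideX && xDividesAllOfB)
            {
                count++;
            }
        }
        return count;
    }
}
```

For the sample input (a = 2 4, b = 16 32 96) the candidates that pass both checks are 4, 8 and 16, so the result is 3.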

How to find all subset which is the summation of its elements equals to a constant number?

I need to find all sequences of numbers that sum to a given number N. I don't know how to approach this type of combination problem. In this combination, order matters for different numbers.
example for the number N=4
1 + 1 + 1 + 1
2 + 1 + 1
1 + 2 + 1
1 + 1 + 2
2 + 2
3 + 1
1 + 3
Zeros are not important for me. So how can I get such number sets as an array for an exact number?
What you're looking for are called integer compositions, or ordered partitions.
Compositions can be generated recursively (in lexicographic order, if I'm not mistaken) as follows:
public static IEnumerable<List<int>> Compositions(int n)
{
if (n < 0)
throw new ArgumentOutOfRangeException(nameof(n));
return GenerateCompositions(n, new List<int>());
}
private static IEnumerable<List<int>> GenerateCompositions(int n, List<int> comp)
{
if (n == 0)
{
yield return new List<int>(comp); // important: must make a copy here
}
else
{
for (int k = 1; k <= n; k++)
{
comp.Add(k);
foreach (var c in GenerateCompositions(n - k, comp))
yield return c;
comp.RemoveAt(comp.Count - 1);
}
}
}
Not tested! This was transcribed from a Python implementation. If anyone would like to make corrections or update the code with more idiomatic C#, feel free.
Also, as @aah noted, the number of compositions of n is 2^(n-1), so this becomes unwieldy even for modest n.
If order matters, there are simply 2^(N-1) possibilities. (Your example doesn't list 4 on its own.)
You can then represent any sequence by its binary representation. To generate, imagine N 1's in a row, so there are N-1 "spaces" between them. Choosing any subset of spaces, you merge any 1's that are adjacent via a chosen space. You can verify this is 1-1 to all possible sets by expanding any such sequence and inserting these spaces.
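The "choose a subset of the spaces" description can be sketched with a bitmask, where bit i says whether the space after the (i+1)-th 1 is chosen for merging. The names below are hypothetical, and the sketch assumes n is small enough (say n <= 30) that holding all 2^(n-1) results in memory is reasonable.

```csharp
using System.Collections.Generic;

public static class CompositionsByBitmask
{
    // Returns every composition (ordered sum) of n, one per subset of the
    // n - 1 spaces between n ones. Assumption: n is small, since the result
    // has 2^(n-1) entries.
    public static List<List<int>> Generate(int n)
    {
        var result = new List<List<int>>();
        if (n <= 0)
        {
            return result;
        }

        int gaps = n - 1;
        for (long mask = 0; mask < (1L << gaps); mask++)
        {
            var parts = new List<int>();
            int current = 1; // size of the part being built
            for (int gap = 0; gap < gaps; gap++)
            {
                if ((mask & (1L << gap)) != 0)
                {
                    current++;          // chosen space: merge the next 1 into this part
                }
                else
                {
                    parts.Add(current); // unchosen space: close this part
                    current = 1;
                }
            }
            parts.Add(current);         // close the final part
            result.Add(parts);
        }
        return result;
    }
}
```

For n = 4 this yields 8 lists, from 1 + 1 + 1 + 1 (no spaces chosen) to the single part 4 (all spaces chosen).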

How to make this function process in constant time?

I need to find the n-th term of this infinite series: 1,2,2,3,3,3,4,4,4,4...
Can you give me a constant time function for this task?
int i = 1;
while(true)
{
if(i = n)
//do things and exit the loop
i++;
}
I think this isn't going to be a constant time function...
Edit
After reading more comments, it appears I misunderstood the question.
If you want to find the item at the nth position of an array in constant time, then the answer is trivial: x[n], because array access is constant time. However, if for some reason you were using a container where access time is not constant (e.g. a linked list), or did not want to look the value up in the array, you'd have to use the arithmetic series formulas to find the answer.
Arithmetic series tells us that the position n of the ith unique item would be
n = i * (i - 1) / 2
So we just need to solve for i. Using quadratic formula, and discarding the nonsensical negative option, we get:
i = Math.Floor( (1 + Math.Sqrt(1 + 8 * n)) / 2)
Original Response
I'm assuming you're looking for the position of the nth unique term, because otherwise the problem is trivial.
Sounds like the first occurrence of the nth unique term should follow arithmetic series. I.e. the position of nth unique term would be:
n * (n - 1) / 2
Given my understanding of the problem, this is more of a math problem than a programming one.
If the problem is:
Given an infinite series that consists of 1 copy of 1, 2 copies of 2, 3 copies of 3... n copies of n, what is the kth value in this series?
Now the first clue when approaching this problem is that there are 1 + 2 + 3 + ... + (n - 1) values before the first occurrence of n. Specifically, there are (sum of the first n - 1 numbers) values before n, or (n)(n-1)/2.
Now set (n)(n-1)/2 = k. Multiply out and rearrange to n^2 - n - 2k = 0. Solve using the quadratic formula and you get n = (1 + sqrt(1+8k))/2. The floor of this gives the largest n whose first occurrence is at or before position k and, happily, given zero-based indexing, that is the value at the kth point in the array.
That means your final answer in c# is
return (int) Math.Floor((1 + Math.Sqrt(1 + 8 * k)) / 2);
Given non zero based indexing,
return (int) Math.Floor((1 + Math.Sqrt(-7 + 8 * k)) / 2);
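As a self-contained sketch, the one-based formula can be wrapped as shown below. The names RepeatedSeries and Term are invented for the example, and the two correction loops are my own addition: they only matter if Math.Sqrt rounds the wrong way for very large k, and each is expected to run at most a step or two.

```csharp
using System;

public static class RepeatedSeries
{
    // Returns the k-th term (one-based) of 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, ...
    public static long Term(long k)
    {
        if (k < 1)
        {
            throw new ArgumentOutOfRangeException("k");
        }

        // Closed form from the quadratic formula (one-based variant).
        long n = (long)Math.Floor((1 + Math.Sqrt(8.0 * k - 7)) / 2);

        // The value n occupies positions n(n-1)/2 + 1 through n(n+1)/2.
        // Nudge n if floating-point rounding put it just outside that range.
        while (n * (n - 1) / 2 >= k) n--;
        while (n * (n + 1) / 2 < k) n++;
        return n;
    }
}
```

Calling Term for k = 1 through 10 gives 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, matching the series in the question.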
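public static long Foo(long index)
{
    // index is zero-based: Foo(0) == 1, Foo(1) == 2, Foo(2) == 2, Foo(3) == 3, ...
    // Note: this loops roughly sqrt(2 * index) times, so it is not constant time.
    if (index < 0)
    {
        throw new IndexOutOfRangeException();
    }
    long nowNum = 1;    // value occupying the current run
    long nextIndex = 1; // index at which nowNum + 1 first appears
    while (nextIndex <= index)
    {
        nowNum++;
        nextIndex += nowNum;
    }
    return nowNum;
}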

Bubble sort worst case example is O(n*n), how?

I am trying Bubble sort. There are 5 elements and the array is unsorted. The worst case for bubble sort should be O(n^2).
As an example I am using
A = {5, 4, 3, 2, 1}
In this case the comparison count should be 5^2 = 25.
Using manual verification and code, I am getting a comparison count of 20.
Following is the bubble sort implementation code
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace SortingAlgo
{
class Program
{
public static int[] bubbleSort(int[] A)
{
bool sorted = false;
int temp;
int count = 0;
int j = 0;
while (!sorted)
{
j++;
sorted = true;
for (int i = 0; i < (A.Length - 1); i++)
{
count++;
if(A[i] > A[i+1])
{
temp = A[i];
A[i] = A[i+1];
A[i+1] = temp;
sorted = false;
}
Console.Write(count + ". -> ");
for(int k=0; k< A.Length; k++)
{
Console.Write(A[k]);
}
Console.Write("\n");
}
}
return A;
}
static void Main(string[] args)
{
int[] A = {5, 4, 3, 2, 1};
int[] B = bubbleSort(A);
Console.ReadKey();
}
}
}
Output is following
1. -> 45321
2. -> 43521
3. -> 43251
4. -> 43215
5. -> 34215
6. -> 32415
7. -> 32145
8. -> 32145
9. -> 23145
10. -> 21345
11. -> 21345
12. -> 21345
13. -> 12345
14. -> 12345
15. -> 12345
16. -> 12345
17. -> 12345
18. -> 12345
19. -> 12345
20. -> 12345
Any idea why the maths is not coming out to be 25?
Big-O notation doesn't tell you anything about how many iterations (or how long) an algorithm will take. It is an indication of the growth rate of a function as the number of elements increases (usually towards infinity).
So, in your case, O(n^2) simply means that the bubble sort's computational cost grows with the square of the number of elements. So, if you have twice as many elements, you can expect it to take (worst case) 4 times as long (as an upper bound). If you have 4 times as many elements, the cost increases by a factor of 16, and so on.
For an algorithm with O(n^2) complexity, five elements could take 25 iterations, or 25,000 iterations. There's no way to tell without analyzing the algorithm. In the same vein, a function with O(1) complexity (constant time) could take 0.000001 seconds to execute or two weeks to execute.
If an algorithm takes n^2 - n operations, that's still simplified to O(n^2). Big-O notation is only an approximation of how the algorithm scales, not an exact measurement of how many operations it will need for a specific input.
Consider: Your example, bubble-sorting 5 elements, takes 5x4 = 20 comparisons. That generalizes to bubble-sorting N elements takes N x (N-1) = N^2 - N comparisons, and N^2 very quickly gets a LOT bigger than N. That's where O(N^2) comes from. (For example, for 20 elements, you are looking at 380 comparisons.)
Bubble sort is a specific case, and its full complexity is (n*(n-1)) - which gives you the correct number: 5 elements leads to 5*(5-1) operations, which is 20, and is what you found in the worst case.
The simplified Big O notation, however, removes the constants and the least significantly growing terms, and just gives O(n^2). This makes it easy to compare it to other implementations and algorithms which may not have exactly (n*(n-1)), but when simplified show how the work increases with greater input.
It's much easier to compare the Big O notation, and for large datasets the constants and lesser terms are negligible.
Remember that O(N^2) is simplified from the actual expression of C * N^2; that is, there is a bounded constant. For bubble sort, for example, C would be roughly 1/2 (not exactly, but close).
Your comparison count is off too, I think, it should be 10 pairwise comparisons. But I guess you could consider swapping of elements to be another. Either way, all that does is change the constant, not the more important part.
for (int i=4; i>0; i--) {
for (int j=0; j<i;j++) {
if (A[j]>A[j+1]){
swapValues(A[j],A[j+1]);
................
Comparison count for 5 (0:4) elements should be 10.
i=4 - {(j[0] j[1]) (j[1] j[2]) (j[2] j[3]) (j[3] j[4])} - 4 comparisons
i=3 - {(j[0] j[1]) (j[1] j[2]) (j[2] j[3])} - 3 comparisons
i=2 - {(j[0] j[1]) (j[1] j[2])} - 2 comparisons
i=1 - {(j[0] j[1])} - 1 comparison
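To see both counts side by side, here is a small sketch that counts comparisons for the two loop shapes discussed in this thread: full passes repeated until no swap happens (the question's version) and passes that shrink by one each time (the version just above). The class and method names are made up for the example, and both methods sort a copy so the input array is left alone.

```csharp
public static class BubbleSortCounts
{
    // Question's shape: scan all adjacent pairs, repeat until a pass has no swap.
    public static int FullPasses(int[] input)
    {
        int[] a = (int[])input.Clone();
        int comparisons = 0;
        bool sorted = false;
        while (!sorted)
        {
            sorted = true;
            for (int i = 0; i < a.Length - 1; i++)
            {
                comparisons++;
                if (a[i] > a[i + 1])
                {
                    int temp = a[i];
                    a[i] = a[i + 1];
                    a[i + 1] = temp;
                    sorted = false;
                }
            }
        }
        return comparisons;
    }

    // Shrinking shape: after each pass the largest remaining element is in
    // place, so the next pass can stop one position earlier.
    public static int ShrinkingPasses(int[] input)
    {
        int[] a = (int[])input.Clone();
        int comparisons = 0;
        for (int i = a.Length - 1; i > 0; i--)
        {
            for (int j = 0; j < i; j++)
            {
                comparisons++;
                if (a[j] > a[j + 1])
                {
                    int temp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = temp;
                }
            }
        }
        return comparisons;
    }
}
```

For {5, 4, 3, 2, 1} the first method makes 20 comparisons (5 passes of 4) and the second makes 10 (4 + 3 + 2 + 1). Both are quadratic in N; only the constant differs.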
