I have a piece of code that in principle looks like the one below. The issue is that I am triggering this code tens of thousands of times and need it to be more optimized. Any suggestions would be welcome.
//This array is in reality enormous and needs to be triggered loads of times in my code
int[] someArray = { 1, 631, 632, 800, 801, 1600, 1601, 2211, 2212, 2601, 2602 };
//I need to know where in the array a certain value is located
//806 is located between entry 801 and 1600 so I want the array ID of 801 to be returned (4).
int id = 806;
//Since my arrays are very large, this operation takes far too long
for (int i = 0; i < someArray.Length; i++)
{
if (someArray[i] <= id)
return i;
}
Edit: Sorry, I got the condition wrong. It should return the index when 806 is greater than 801. Hope you can make sense out of it.
The array values look sorted. If that’s indeed the case, use binary search:
int result = Array.BinarySearch(someArray, id);
return result < 0 ? (~result - 1) : result;
If the searched value does not appear in the array, Array.BinarySearch returns the bitwise complement of the index of the next greater value. That is why I am testing for negative numbers and using the bitwise complement operator in the code above. The result should then be the same as in your code.
Binary search runs in logarithmic rather than linear time. That is, in the worst case only log2 n entries have to be examined instead of n (where n is the array's size).
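For comparison, the same "last index not greater than the value" lookup can be sketched in Python with the standard bisect module (a sketch of the technique, not the C# code above):

```python
import bisect

some_array = [1, 631, 632, 800, 801, 1600, 1601, 2211, 2212, 2601, 2602]

def last_index_not_greater(arr, value):
    # bisect_right returns the insertion point after any equal entries,
    # so subtracting 1 gives the index of the last element <= value
    return bisect.bisect_right(arr, value) - 1

print(last_index_not_greater(some_array, 806))  # 4 (the index of 801)
```

bisect_right performs the same logarithmic search that Array.BinarySearch does, just with the bitwise-complement handling folded away.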
Provided someArray's content is sorted, use binary search - see also Array.BinarySearch.
Note: In your example the condition in if (someArray[i] <= id) return i; will trigger whenever id >= 1. I doubt that's what you want to do.
I am taking a class in C# programming and cannot figure out how to do the following. I have spent hours researching, and nobody else seems to have the same issue.
The question is:
Write a function named, evenOrOdd, that will accept three parameters. The first parameter is the integer array used in the above function. The second parameter is the string array from step 2 and the third parameter is an integer indicating the size of the two arrays. This function will perform the following tasks:
This function will loop through the first array, checking each value to see if it is even or odd.
For each item in the integer array, the function will then place the appropriate value, “even” or “odd”, in the corresponding position of the string array.
Hint: Using the modulus operator, (also called the modulo), to divide the number by 2 will result in either a remainder of 0 or 1. A remainder of 0 indicates an even number and a remainder of 1 indicates an odd number. The modulus operator for all of the languages in this class is %.
After calling both functions, print the maximum number determined by findMax as well as the array index position of the largest number. Next, the program will loop through the two arrays and print the integer from integer array followed by the corresponding “even” or “odd” value from the string array
I do not understand how to populate the second string array with "even" or "odd".
The two arrays should be something like this:
array1 [1,2,3,4,5,6,7,8,9,10]
then run through a loop to determine if the values are even or odd and then assign the values to the second array so it is something like this:
array2 [odd,even,odd,even,odd,even,odd,even,odd,even]
I am confused about how I "link" these two arrays together so that the index of array 1 corresponds to the index of array 2.
You don't have to "link" the arrays together. You can use a variable which contains the current index and use it for both arrays. Like this:
for (int i = 0; i < array1.Length; i++)
{
    // array1[i] ...
    // array2[i] = ...
}
This way, you can check if the number at index i in array 1 is even or odd and then modify the index i of array 2 accordingly.
Instead of array1.Length, you can also use the third argument of the method.
It looks like a code challenge. I highly recommend you find your own way to understand and solve this kind of problem and the fundamental concepts behind it as well.
One way to code the EvenOrOdd method:
public void EvenOrOdd(int[] numbers, string[] natures, int size)
{
    for (int i = 0; i < size; i++)
    {
        if (numbers[i] % 2 == 0)
            natures[i] = "even";
        else
            natures[i] = "odd";
    }
}
One way to code the FindMax function:
public static (int MaxValue, int MaxIndex) FindMax(int[] numbers)
{
    int major = int.MinValue;
    int majorIndex = -1;
    for (int i = 0; i < numbers.Length; i++)
    {
        if (numbers[i] > major)
        {
            major = numbers[i];
            majorIndex = i;
        }
    }
    return (major, majorIndex);
}
Check it in dotnetFiddle
I am new to C#. The following code was a solution I came up to solve a challenge. I am unsure how to do this without using List since my understanding is that you can't push to an array in C# since they are of fixed size.
Is my understanding of what I said so far correct?
Is there a way to do this that doesn't involve creating a new array every time I need to add to an array? If there is no other way, how would I create a new array when the size of the array is unknown before my loop begins?
Return a sorted array of all non-negative numbers less than the given n which are divisible both by 3 and 4. For n = 30, the output should be
threeAndFour(n) = [0, 12, 24].
int[] threeAndFour(int n) {
List<int> l = new List<int>(){ 0 };
for (int i = 12; i < n; ++i)
if (i % 12 == 0)
l.Add(i);
return l.ToArray();
}
EDIT: I have since refactored this code to be..
int[] threeAndFour(int n) {
List<int> l = new List<int>(){ 0 };
for (int i = 12; i < n; i += 12)
l.Add(i);
return l.ToArray();
}
A. List is OK
If you want to use a for loop to find the numbers, then List is the appropriate data structure for collecting them as you discover them.
B. Use more maths
static int[] threeAndFour(int n) {
    var a = new int[(n + 11) / 12]; // exact count of multiples of 12 below n
    for (int i = 12; i < n; i += 12) a[i / 12] = i; // a[0] is already 0 by default
    return a;
}
C. Generator pattern with IEnumerable<int>
I know that this doesn't return an array, but it does avoid a list.
static IEnumerable<int> threeAndFour(int n) {
yield return 0;
for (int i = 12; i < n; i += 12)
yield return i;
}
D. Twist and turn to avoid a list
The code could loop twice: first to figure out the size of the array, and then to fill it.
int[] threeAndFour(int n) {
    // Version: a List is really undesirable, arrays are great.
    int size = 1;
    for (int i = 12; i < n; i += 12)
        size++;
    var a = new int[size];
    a[0] = 0;
    int counter = 1;
    for (int i = 12; i < n; i += 12) a[counter++] = i;
    return a;
}
if (i % 12 == 0)
So you have figured out that the numbers divisible by both 3 and 4 are precisely the numbers divisible by 12.
Can you figure out how many such numbers there are below a given n? Can you do so without counting them one by one? If so, there is no need for a dynamically growing container; you can just initialize the container to the correct size.
Once you have your array just keep track of the next index to fill.
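Concretely, that counting idea might look like this Python sketch (the function name is mine): compute the exact count up front, allocate once, and fill by index:

```python
def three_and_four(n):
    # multiples of 12 below n: 0, 12, 24, ... -> exactly (n - 1) // 12 + 1 of them (for n >= 1)
    size = (n - 1) // 12 + 1
    result = [0] * size          # allocate the exact size up front
    for k in range(size):
        result[k] = 12 * k       # fill by index, no growing container needed
    return result

print(three_and_four(30))  # [0, 12, 24]
```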
You could use Linq and Enumerable.Range method for the purpose. For example,
int[] threeAndFour(int n)
{
return Enumerable.Range(0,n).Where(x=>x%12==0).ToArray();
}
Enumerable.Range generates a sequence of integral numbers within a specified range, which is then filtered on the condition (x%12==0) to retrieve the desired result.
Since you know this goes in steps of 12 and you know how many there are before you start, you can do:
Enumerable.Range(0, (n + 11) / 12).Select(x => x * 12).ToArray(); // (n + 11) / 12 is the exact count of multiples of 12 below n
I am unsure how to do this without using List since my understanding is that you can't push to an array in C# since they are of fixed size.
It is correct that arrays cannot grow. List was invented as a wrapper around an array that automatically grows whenever needed. Note that you can give List an integer via the constructor, which tells it the minimum capacity it should expect; it will allocate at least that much the first time. This can limit growth-related overhead.
Dictionaries are a variation of the same mechanics, with hash-table key lookup speed on top.
There is only one other collection I know of that can grow. However, it is rarely mentioned outside of theory and some very specific cases:
Linked lists. A linked list has unbeatable growth performance and the lowest risk of running into OutOfMemory exceptions due to fragmentation. Unfortunately, its random access times are the worst as a result. Unless you can process the collection strictly sequentially from the start (or sometimes the end), its performance will be abysmal. Only stacks and queues are likely to use one. There is, however, an implementation you could use in .NET: https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.linkedlist-1
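To make that trade-off concrete, here is a minimal Python sketch of a singly linked list: appending at the tail costs O(1) and never copies, but reaching element i means walking i nodes:

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        # O(1): no reallocation, no copying of existing elements
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            self.tail.next = node
            self.tail = node

    def get(self, index):
        # O(n): must walk node by node from the head
        node = self.head
        for _ in range(index):
            node = node.next
        return node.value

items = LinkedList()
for v in (10, 20, 30):
    items.append(v)
print(items.get(2))  # 30
```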
Your code has some potential for improvement too:
for (int i = 12; i < n; ++i)
if (i % 12 == 0)
l.Add(i);
It would be far more efficient to count up by 12 every iteration; you are only interested in every 12th number, after all. You may have to change the loop, but I think a do...while would do. Also, the array/minimum List size is easily predicted: just divide n by 12 and add 1. But I assume that is mostly mock-up code and it is not actually that deterministic.
List generally works pretty well; as I understand your question, you have challenged yourself to solve the problem without using the List class. An array (or List) uses a contiguous block of memory to store elements. Arrays are of fixed size; List will dynamically expand to accept new elements, but still keeps everything in a single block of memory.
You can use a linked list (https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.linkedlist-1?view=netframework-4.8) to simulate an array. A linked list allocates additional memory for each element (node) that is used to point to the next (and possibly the previous) one. This lets you add elements without large block allocations, but you pay a space cost (increased memory use) for each element added. The other problem with linked lists is that you can't quickly access random elements: to get to element 5, you have to go through elements 0 through 4. There's a reason arrays and array-like structures are favored for many tasks, but it's always interesting to try to do common things in a different way.
We have a number of payments (Transaction) that come into our business each day. Each Transaction has an ID and an Amount. We have the requirement to match a number of these transactions to a specific amount. Example:
Transaction Amount
1 100
2 200
3 300
4 400
5 500
If we wanted to find the transactions that add up to 600 you would have a number of sets (1,2,3),(2,4),(1,5).
I found an algorithm that I have adapted, which works as defined below. For 30 transactions it takes 15 ms, but the number of transactions averages around 740 and can reach close to 6000. Is there a more efficient way to perform this search?
sum_up(TransactionList, remittanceValue, ref MatchedLists);
private static void sum_up(List<Transaction> transactions, decimal target, ref List<List<Transaction>> matchedLists)
{
sum_up_recursive(transactions, target, new List<Transaction>(), ref matchedLists);
}
private static void sum_up_recursive(List<Transaction> transactions, decimal target, List<Transaction> partial, ref List<List<Transaction>> matchedLists)
{
decimal s = 0;
foreach (Transaction x in partial) s += x.Amount;
if (s == target)
{
matchedLists.Add(partial);
}
if (s > target)
return;
for (int i = 0; i < transactions.Count; i++)
{
List<Transaction> remaining = new List<Transaction>();
Transaction n = new Transaction(0, transactions[i].ID, transactions[i].Amount);
for (int j = i + 1; j < transactions.Count; j++) remaining.Add(transactions[j]);
List<Transaction> partial_rec = new List<Transaction>(partial);
partial_rec.Add(new Transaction(n.MatchNumber, n.ID, n.Amount));
sum_up_recursive(remaining, target, partial_rec, ref matchedLists);
}
}
With Transaction defined as:
class Transaction
{
public int ID;
public decimal Amount;
public int MatchNumber;
public Transaction(int matchNumber, int id, decimal amount)
{
ID = id;
Amount = amount;
MatchNumber = matchNumber;
}
}
As already mentioned, your problem can be solved by a pseudo-polynomial algorithm in O(n*G), with n the number of items and G your target sum.
The first part of the question is: can the target sum G be achieved at all? The following Python code solves it (I have no C# on my machine):
def subsum(values, target):
    reached = [False] * (target + 1)  # initially no sums are reachable at all
    reached[0] = True                 # with 0 elements we can only achieve the sum 0
    for val in values:
        for s in reversed(range(target + 1)):  # for target, target-1, ..., 0
            # if the subsum s is reachable, we can add the current value to it and build a new sum
            if reached[s] and s + val <= target:
                reached[s + val] = True
    return reached[target]
What is the idea? Let's consider values [1,2,3,6] and target sum 7:
We start with an empty set - the possible sum is obviously 0.
Now we look at the first element 1 and have two options, to take it or not to take it. That leaves us with the possible sums {0,1}.
Now looking at the next element 2: leads to possible sets {0,1} (not taking)+{2,3} (taking).
Until now, not much difference from your approach, but now for element 3 we have the possible sets a. for not taking, {0,1,2,3}, and b. for taking, {3,4,5,6}, resulting in {0,1,2,3,4,5,6} as possible sums. The difference from your approach is that there are two ways to get to 3, and your recursion would be started twice from there (which is not needed). Calculating essentially the same stuff over and over again is the problem with your approach, and why the proposed algorithm is better.
As last step we consider 6 and get {0,1,2,3,4,5,6,7} as possible sums.
But you also need the subset which leads to the target sum; for this we just remember which element was taken to achieve each subsum. This version returns a subset which results in the target sum, or None otherwise:
def subsum(values, target):
reached=[False]*(target+1)
val_ids=[-1]*(target+1)
reached[0]=True # with 0 elements we can only achieve the sum=0
for (val_id,val) in enumerate(values):
for s in reversed(range(target+1)): #for target, target-1,...,0
if reached[s] and s+val<=target:
reached[s+val]=True
val_ids[s+val]=val_id
#reconstruct the subset for target:
if not reached[target]:
return None # means not possible
else:
result=[]
current=target
while current!=0:# search backwards jumping from predecessor to predecessor
val_id=val_ids[current]
result.append(val_id)
current-=values[val_id]
return result
As another approach, you could use memoization to speed up your current solution, remembering for each state (subsum, number_of_elements_not_considered) whether the target sum is achievable. But I would say standard dynamic programming is the less error-prone option here.
Yes.
I can't provide full code at the moment, but instead of iterating over the list of transactions twice until finding matches (O(n²)), try this concept:
set up a hashtable with the existing transaction amounts as entries, as well as the sum of each pair of transactions, assuming each total is made of at most two transactions (weekend credit card processing).
for each total, look it up in the hashtable - the sets of transactions in that slot are the matching transactions.
Instead of O(n²), you can get it down to linear time (a few passes over the data), which would make a noticeable difference in speed.
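The core lookup trick can be sketched in Python: store amounts seen so far in a hashtable and look up each amount's complement in O(1) instead of rescanning the list (the function name and the two-transaction restriction follow the assumption above):

```python
def find_pairs(amounts, target):
    # map each amount to its index; for each new amount, check whether
    # the complement (target - amount) has already been seen
    seen = {}
    pairs = []
    for i, amount in enumerate(amounts):
        complement = target - amount
        if complement in seen:
            pairs.append((seen[complement], i))
        seen[amount] = i
    return pairs

print(find_pairs([100, 200, 300, 400, 500], 600))  # [(1, 3), (0, 4)]
```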
Good luck!
Dynamic programming can solve this problem efficiently:
Assume you have n transactions and the maximum target amount is m.
We can solve it with complexity O(n*m).
Learn about it under the Knapsack problem.
For this problem we can define, for the first i transactions, the number of subsets adding up to sum: dp[i][sum].
The recurrence:
for i = 1 to n:
dp[i][sum] = dp[i - 1][sum] + dp[i - 1][sum - amount_i]
dp[n][sum] is the count you need, and you need some extra bookkeeping to recover what the subsets actually are.
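In Python, the counting DP can be sketched with a one-dimensional table, iterating sums downwards so each transaction is used at most once (a sketch of the counting part only):

```python
def count_subsets(amounts, target):
    dp = [0] * (target + 1)
    dp[0] = 1                      # the empty subset sums to 0
    for amount in amounts:
        # iterate downwards so each amount is used at most once
        for s in range(target, amount - 1, -1):
            dp[s] += dp[s - amount]
    return dp[target]

print(count_subsets([100, 200, 300, 400, 500], 600))  # 3: (100,200,300), (200,400), (100,500)
```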
You have a couple of practical assumptions here that would make brute force with smartish branch pruning feasible:
items are unique, hence you wouldn't be getting combinatorial blow up of valid subsets (i.e. (1,1,1,1,1,1,1,1,1,1,1,1,1) adding up to 3)
if the number of resulting feasible sets is still huge, you would run out of memory collecting them before running into total runtime issues.
ordering input ascending would allow an easy early-stop check - if your remaining sum is smaller than the current element, then none of the yet unexamined items could possibly be in a result (as current and subsequent items only get bigger)
keeping running sums would speed up each step, as you wouldn't be recalculating it over and over again
Here's a bit of code:
public static List<T[]> SubsetSums<T>(T[] items, int target, Func<T, int> amountGetter)
{
Stack<T> unusedItems = new Stack<T>(items.OrderByDescending(amountGetter));
Stack<T> usedItems = new Stack<T>();
List<T[]> results = new List<T[]>();
SubsetSumsRec(unusedItems, usedItems, target, results, amountGetter);
return results;
}
public static void SubsetSumsRec<T>(Stack<T> unusedItems, Stack<T> usedItems, int targetSum, List<T[]> results, Func<T,int> amountGetter)
{
if (targetSum == 0)
results.Add(usedItems.ToArray());
if (targetSum < 0 || unusedItems.Count == 0)
return;
var item = unusedItems.Pop();
int currentAmount = amountGetter(item);
if (targetSum >= currentAmount)
{
// case 1: use current element
usedItems.Push(item);
SubsetSumsRec(unusedItems, usedItems, targetSum - currentAmount, results, amountGetter);
usedItems.Pop();
// case 2: skip current element
SubsetSumsRec(unusedItems, usedItems, targetSum, results, amountGetter);
}
unusedItems.Push(item);
}
I've run it against 100k inputs yielding around 1k results in under 25 ms, so it should handle your 740 case with ease.
I'm running a little experiment to increase my knowledge, and I have reached a part where I feel I could really optimize it, but am not quite sure how to do so.
I have many arrays of numbers. (for simplicity, lets say each array has 4 numbers: 1, 2, 3, and 4)
The target is to have all of the numbers in ascending order (ie,
1-2-3-4), but the numbers are all scrambled in the different arrays.
A higher weight is placed upon larger numbers.
I need to sort all of these arrays in order of how close they are to
the target.
Ie, 4-3-2-1 would be the worst possible case.
Some example cases:
3-4-2-1 is better than 4-3-2-1
2-3-4-1 is better than 1-4-3-2 (even though two numbers match (1 and 3), the biggest number is closer to its spot).
So the big numbers always take precedence over the smaller numbers. Here is my attempt:
var tmp = from m in moves
let mx = m.Max()
let ranking = m.IndexOf(s => s == mx)
orderby ranking descending
select m;
return tmp.ToArray();
P.S. IndexOf in the above example is an extension I wrote that takes an array and a predicate and returns the index of the element satisfying it. It is needed because the real situation is a little more complicated; I'm simplifying it for this example.
The problem with my attempt here, though, is that it only sorts by the biggest number and ignores all of the other numbers. It SHOULD rank by biggest number first, then by second largest, then by third.
Also, since it will be doing this operation over and over again for several minutes, it should be as efficient as possible.
You could implement a bubble sort and count the number of times you have to swap data around. The number of swaps will be large for arrays that are far from the sorted ideal.
int GetUnorderedness<T>(T[] data) where T : IComparable<T>
{
    data = (T[])data.Clone(); // don't modify the input data,
                              // we weren't asked to actually sort.
    int swapCount = 0;
    bool isSorted;
    do
    {
        isSorted = true;
        for (int i = 1; i < data.Length; i++)
        {
            if (data[i - 1].CompareTo(data[i]) > 0)
            {
                T temp = data[i];
                data[i] = data[i - 1];
                data[i - 1] = temp;
                swapCount++;
                isSorted = false;
            }
        }
    } while (!isSorted);
    return swapCount; // the bubble-sort swap count, i.e. the number of inversions
}
From your sample data, this will give slightly different results than you specified.
Some example cases:
3-4-2-1 is better than 4-3-2-1
2-3-4-1 is better than 1-4-3-2
3-4-2-1 will take 5 swaps to sort, 4-3-2-1 will take 6, so that works.
2-3-4-1 will take 3, 1-4-3-2 will also take 3, so this doesn't match up with your expected results.
This algorithm doesn't treat the largest number as the most important, which it seems you want; all numbers are treated equally. From your description, you'd consider 2-1-3-4 as much better than 1-2-4-3, because the first one has both the largest and second largest numbers in their proper place. This algorithm would consider those two equal, because each requires only 1 swap to sort the array.
This algorithm does have the advantage that it's not just a comparison algorithm, each input has a discrete output, so you only need to run the algorithm once for each input array.
I hope this helps
var i = 0;
var temp = (from m in moves select m).ToArray();
do
{
    temp = (from m in temp
            orderby m[i] descending
            select m).ToArray();
} while (++i < moves[0].Length);
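Another way to express the "rank by biggest number first, then by second largest" requirement from the question is a composite sort key: for each array take the position of the largest value, then of the second largest, and so on, and compare those tuples. A Python sketch of the idea (assuming distinct values per array):

```python
def closeness_key(arr):
    # positions of the values from largest to smallest; a later position
    # for a bigger number means it sits nearer its sorted-order slot
    return tuple(arr.index(v) for v in sorted(arr, reverse=True))

moves = [[4, 3, 2, 1], [3, 4, 2, 1], [2, 3, 4, 1], [1, 4, 3, 2]]
# sort best-first: the key tuples compare position of the biggest number
# first, then the second biggest, and so on
ranked = sorted(moves, key=closeness_key, reverse=True)
print(ranked[0])  # [2, 3, 4, 1] -- its 4 is closest to the last slot
```

Ties on the biggest number fall through to the second biggest automatically, which is exactly the precedence the question asks for.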
I have some random integers like
99 20 30 1 100 400 5 10
I have to find a sum of any combination of these integers that is closest (equal or more, but not less) to a given number like
183
What is the fastest and most accurate way of doing this?
If your numbers are small, you can use a simple dynamic programming (DP) technique. Don't let the name scare you - the technique is fairly understandable: you break the larger problem into subproblems.
Here we define the problem to be can[number]. If the number can be constructed from the integers in your file, then can[number] is true, otherwise it is false. It is obvious that 0 is constructable by not using any numbers at all, so can[0] is true. Now you try to use every number from the input file. We try to see if the sum j is achievable. If an already achieved sum + current number we try == j, then j is clearly achievable. If you want to keep track of what numbers made a particular sum, use an additional prev array, which stores the last used number to make the sum. See the code below for an implementation of this idea:
int UPPER_BOUND = number1 + number2 + ... + numbern //The largest number you can construct
bool can[UPPER_BOUND + 1]; //can[number] is true if number can be constructed
can[0] = true; //0 is achievable always by not using any number
int prev[UPPER_BOUND + 1]; //prev[number] is the last number used to achieve sum "number"
for (int i = 0; i < N; i++) //Try to use every number(numbers[i]) from the input file
{
for (int j = UPPER_BOUND; j >= 1; j--) //Try to see if j is an achievable sum
{
if (can[j]) continue; //It is already an achieved sum, so go to the next j
if (j - numbers[i] >= 0 && can[j - numbers[i]]) //If an (already achievable sum) + (numbers[i]) == j, then j is obviously achievable
{
can[j] = true;
prev[j] = numbers[i]; //To achieve j we used numbers[i]
}
}
}
int CLOSEST_SUM = -1;
for (int i = SUM; i <= UPPER_BOUND; i++) //SUM is the given target sum
if (can[i])
{
//the closest number to SUM(larger than SUM) is i
CLOSEST_SUM = i;
break;
}
int currentSum = CLOSEST_SUM;
do
{
int usedNumber = prev[currentSum];
Console.WriteLine(usedNumber);
currentSum -= usedNumber;
} while (currentSum > 0);
This seems to be a knapsack-like problem, where the values of your integers are the "weights" of the items, each item's "profit" is 1, and you are looking for the fewest items that sum exactly to the knapsack's maximum allowable weight.
This is a variant of the SUBSET-SUM problem, and is also NP-Hard like SUBSET-SUM.
But if the numbers involved are small, pseudo-polynomial time algorithms exist. Check out:
http://en.wikipedia.org/wiki/Subset_sum_problem
OK, more details.
The following problem is NP-Hard:
Given an array of integers, and integers a, b: is there some subset whose sum lies in the interval [a, b]?
This is so because we can solve subset-sum by choosing a=b=0.
Now this problem easily reduces to your problem and so your problem is NP-Hard too.
Now you can use the polynomial time approximation algorithm mentioned in the wiki link above.
Given an array of N integers, a target S, and an approximation threshold c, there is a polynomial-time approximation algorithm (polynomial in 1/c) which tells whether there is a subset sum in the interval [(1-c)S, S].
You can use this repeatedly (with some form of binary search) to find the best approximation to S that you need. Note that you can also use it on intervals of the form [S, (1+c)S], while the knapsack approach will only give you a solution <= S.
Of course there might be better algorithms - in fact, I'd bet on it. There should be plenty of literature on the web. Some search terms you can use: approximation algorithms for subset-sum, pseudo-polynomial time algorithms, dynamic programming algorithms.
A simple brute-force method would be to read the text in, parse it into numbers, and then go through all combinations until you find the required sum.
A quicker solution would be to sort the numbers, then...
Add the largest number to your sum. Is it too big? If so, take it off and try the next smallest.
If the sum is too small, add the next largest number and repeat.
Continue adding numbers without letting the sum exceed the target. Finish when you hit the target.
Note that when you backtrack, you may need to backtrack more than one level. Sounds like a good case for recursion...
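A Python sketch of that recursive backtracking (names are mine): try numbers largest-first, stop a branch as soon as it reaches the target, and keep the smallest sum found at or above the target:

```python
def closest_sum_at_least(numbers, target):
    numbers = sorted(numbers, reverse=True)
    best = [None]  # smallest sum >= target found so far

    def search(i, current):
        if current >= target:
            # adding more positive numbers only moves further away, so stop here
            if best[0] is None or current < best[0]:
                best[0] = current
            return
        if i == len(numbers):
            return
        search(i + 1, current + numbers[i])  # take numbers[i]
        search(i + 1, current)               # skip numbers[i]

    search(0, 0)
    return best[0]

print(closest_sum_at_least([99, 20, 30, 1, 100, 400, 5, 10], 183))  # 199
```

Without 400 or both 99 and 100 the small numbers only reach 166, so 99 + 100 = 199 is the closest sum not less than 183.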
If the numbers are large you can turn this into an integer programme. Using Mathematica's solver, it might look something like this:
nums = {99, 20, 30 , 1, 100, 400, 5, 10};
vars = a /@ Range@Length@nums;
Minimize[(vars.nums - 183)^2, vars, Integers]
You can sort the list of values, find the first value that's greater than the target, and start concentrating on the values that are less than the target. Find the sum that's closest to the target without going over, then compare that to the first value greater than the target. If the difference between the closest sum and the target is less than the difference between the first value greater than the target and the target, then you have the sum that's closest.
Kinda hokey, but I think the logic hangs together.