Fastest way to check each neighbor in a 2D array - C#

I am working on a random dungeon generator just for fun / as a side project to learn some new things. I have written a function that returns an integer hash value for any given cell, which gives you information about what type of gameobject it should be, i.e. whether it is a wall, what direction it faces, whether it is a corner, etc. Here is what the function currently looks like:
private int CellHashValue(int xIndex, int yIndex, char centerTile)
{
    int hashValue = 0;

    if (dungeon2DArray[xIndex - 1, yIndex + 1] == centerTile)
    {
        hashValue += 1;
    }
    if (dungeon2DArray[xIndex, yIndex + 1] == centerTile)
    {
        hashValue += 2;
    }
    if (dungeon2DArray[xIndex + 1, yIndex + 1] == centerTile)
    {
        hashValue += 4;
    }
    if (dungeon2DArray[xIndex - 1, yIndex] == centerTile)
    {
        hashValue += 8;
    }
    if (dungeon2DArray[xIndex + 1, yIndex] == centerTile)
    {
        hashValue += 16;
    }
    if (dungeon2DArray[xIndex - 1, yIndex - 1] == centerTile)
    {
        hashValue += 32;
    }
    if (dungeon2DArray[xIndex, yIndex - 1] == centerTile)
    {
        hashValue += 64;
    }
    if (dungeon2DArray[xIndex + 1, yIndex - 1] == centerTile)
    {
        hashValue += 128;
    }
    return hashValue;
}
My question is: is there a more efficient and faster way to do these checks that perhaps I am not thinking of? The dungeon array ranges in size from 100x100 to 1000x1000, though the function is not called on each cell. I have a separate List that contains rooms and their start and end indexes for each direction, which I iterate over to instantiate objects.

What you're doing is essentially applying a form of convolution. Without more context as to how your method is being called or how you're using the returned hash value, what you have seems close to the most efficient way to iterate a 3x3 neighborhood. Assuming your dungeon2DArray is a char[,] field on the object, the following is a bit clearer and more concise (you'll have to adjust how you interpret the resulting sum based on the order of iteration):
private int CellHashValue(int x, int y)
{
    int hashSum = 0;                        // Hash sum to return
    int hashValue = 1;                      // Increases as a power of 2 (1, 2, 4, ... 128)
    char centerTile = dungeon2DArray[x, y]; // Cache the center tile

    for (int r = -1; r <= 1; r++)
    {
        for (int c = -1; c <= 1; c++)
        {
            if (r == 0 && c == 0) continue; // Skip the center tile
            if (dungeon2DArray[x + c, y + r] == centerTile)
            {
                hashSum += hashValue;
            }
            hashValue *= 2; // Next power of two (could also bit shift: hashValue <<= 1)
        }
    }
    return hashSum;
}
Note: this method doesn't do any boundary checking, so if x or y lies on an edge of the array, the indexing will throw an IndexOutOfRangeException.
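If edge cells must be handled too, a minimal bounds-checked variant might look like this sketch (it treats out-of-bounds neighbors as non-matching, which is an assumption; you might prefer to treat them as walls):

private int CellHashValueChecked(int x, int y)
{
    int hashSum = 0;
    int hashValue = 1;
    char centerTile = dungeon2DArray[x, y];
    int width = dungeon2DArray.GetLength(0);
    int height = dungeon2DArray.GetLength(1);

    for (int r = -1; r <= 1; r++)
    {
        for (int c = -1; c <= 1; c++)
        {
            if (r == 0 && c == 0) continue; // Skip the center tile
            int nx = x + c, ny = y + r;
            // Out-of-bounds neighbors simply don't match (assumption).
            if (nx >= 0 && nx < width && ny >= 0 && ny < height
                && dungeon2DArray[nx, ny] == centerTile)
            {
                hashSum += hashValue;
            }
            hashValue <<= 1; // Next power of two
        }
    }
    return hashSum;
}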
Each array access is O(1), and iterating over the entire dungeon is O(n^2) for an n-by-n grid, so the only way to gain efficiency would be to combine per-cell method calls. That is still only a constant-factor improvement, so it is not asymptotically more efficient, but depending on the calculation it could boost performance a little.

Since you are using an array to build the map, access time is constant thanks to direct indexing, so checking each array index is already fast.
There are several minor things that can speed the function up:
Return the hash value from within the corresponding if statement. This removes a few lines of code.
By removing the hashValue variable and returning a hard-coded value, a variable initialization is removed from the process. Creating and discarding objects takes time, which adds up at scale (though for a local int the effect is small).
xIndex and yIndex can be made fields of the object. Be careful implementing this idea. Since xIndex and yIndex do not change while checking these conditions, they can be stored on the object, which reduces the number of parameters passed in. Value-type arguments are copied on every call; a simple int copy won't impact speed much, but a struct that contains many variables takes more time to copy.
Each check can be moved to a separate function. This primarily helps with readability and debugging later on. There can be speed advantages, but they're project dependent: by observing how objects are initialized and manipulated, certain conditions can often be guaranteed true, and when logic doesn't need to be checked, less time is needed.
Just a few ideas. If you have some time to research, the second and third points use concepts from low-latency/high-frequency programming. Also, be aware that some of these concepts are not considered best practice.

Related

Sorting array using BubbleSort fails

The following algorithm works fine in C#:
public int[] Sortieren(int[] array, int decide)
{
    bool sorted;
    int temp;
    for (int i = 0; i < array.Length; i++)
    {
        do
        {
            sorted = true;
            for (int j = 0; j < array.Length - 1; j++)
            {
                if (decide == 1)
                {
                    if (array[j] < array[j + 1])
                    {
                        temp = array[j];
                        array[j] = array[j + 1];
                        array[j + 1] = temp;
                        sorted = false;
                    }
                }
                else if (decide == 0)
                {
                    if (array[j] > array[j + 1])
                    {
                        temp = array[j];
                        array[j] = array[j + 1];
                        array[j + 1] = temp;
                        sorted = false;
                    }
                }
                else
                {
                    Console.WriteLine("Incorrect sorting parameter!");
                    break;
                }
            }
        } while (!sorted);
    }
    return array;
}
The same thing in C fails: I only get the first two numbers of the array sorted; the rest of the numbers stay the same. The code also seems to change the array's contents instead of only sorting it. Any ideas where the bugs are?
#include <stdio.h>
#include <stdbool.h>

#define MAX 10

void main(void)
{
    int random_numbers[MAX], temp, Array_length;
    bool sorted;
    srand(time(NULL));
    for (int i = 0; i <= MAX; i++) {
        random_numbers[i] = rand() % 1000;
    }
    Array_length = sizeof(random_numbers) / sizeof(int);
    printf("List of (unsorted) numbers:\n");
    for (int i = 0; i < MAX; i++) {
        if (i == MAX - 1)
            printf("%i", random_numbers[i]);
        else
            printf("%i,", random_numbers[i]);
    }
    // Sorting algorithm
    for (int i = 0; i < Array_length; i++) {
        do {
            sorted = true;
            for (int j = 0; j < Array_length - 1; j++) {
                if (random_numbers[j] > random_numbers[j + 1]) {
                    temp = random_numbers[j];
                    random_numbers[j] == random_numbers[j + 1];
                    random_numbers[j + 1] = temp;
                    sorted = false;
                }
            }
        } while (!sorted);
    }
    printf("\n");
    for (int i = 0; i < Array_length; i++) {
        if (i == Array_length - 1)
            printf("%i", random_numbers[i]);
        else
            printf("%i,", random_numbers[i]);
    }
}
You have an error in your swap algorithm:
if (random_numbers[j] > random_numbers[j + 1]) {
    temp = random_numbers[j];
    random_numbers[j] == random_numbers[j + 1]; // here
    random_numbers[j + 1] = temp;
    sorted = false;
}
In the line after you assign to temp, the double equals sign performs an equality comparison rather than an assignment. This is still legal code (== is an operator, and expressions that use operators evaluate to something), and the expression evaluates to either 1 or 0 depending on its truth value. It is legal even though you discard the result, where normally such a boolean value would be used for control flow.
Note that this is true for other operators as well. For example, the = operator assigns the value on the right to the variable on the left, so a mistake like if (x = 0) means that branch will never be taken, since x = 0 evaluates to 0 (false) every time, when you probably meant to branch when x == 0. (C# largely protects you from this particular mistake, because an if condition there must be a bool.)
Also, why are you using a boolean flag to check whether the array is sorted? Bubble sort is a simple algorithm, so it should be trivial to implement, and by the definition of an algorithm it's guaranteed to both finish and be correct. If you were optimizing for performance, for example choosing between merge sort and insertion sort based on whether the data was already sorted, I would understand, but you're checking whether the data is sorted as you're sorting it, which doesn't really make sense: the algorithm tells you when it's sorted by finishing. The boolean checking only adds overhead and nets you nothing.
Also, note how in your C# implementation you repeated the sort logic. This is a good sign your design is wrong. You take in an integer as well as the actual int[] array in your C# code, and you use that integer to branch. Then, from what I can gather, you sort using either < or >, depending on the value passed in. I'm pretty confused by this, since either would work; you gain nothing from adding this functionality, so I'm not sure why you added it.
Also, why do you repeat the printf statements? If it were if/else if I might understand, but you're doing if/else, which is logically equivalent to P V ~P and always takes one of the branches; since the branches differ only by a trailing comma, you might as well restructure to a single printf statement.
Below is an implementation of your bubble sort program, and I want to point out a few things. First, it's generally frowned upon to declare main as void (see What should main() return in C and C++?).
I also quickly want to point out that even though we declare the maximum length of the array as a macro, all of the array functions I defined explicitly take a size_t size argument, for referential transparency.
Last but not least, I would recommend not declaring all your variables at the start of your program/functions. This is a contested topic among developers, especially because it used to be required: compilers needed to know up front exactly which variables had to be allocated. As compilers improved, they could accept variable declarations within code (and even optimize some variables away altogether), so some developers recommend declaring variables where you need them, so that each declaration makes sense in context (i.e., you know why you need it) and code noise is reduced.
That being said, some developers do prefer declaring all their variables at the beginning of the program/function. You'll especially see this:
int i, j, k;
or some variation of that, because the developer pre-declared all of their loop counters. Again, I think it's just code noise (and when you work with C++, some of the language syntax itself is code noise, in my opinion), but be aware of this style.
So for example, rather than declaring everything like this:
int zufallszahlen[MAX], temp, Array_length;
You would declare the variables like this:
int zufallszahlen[MAX];
int Array_length = sizeof (zufallszahlen) / sizeof (int);
The temp variable is then put off for as long as possible, so that it's obvious when and where it's useful. In my implementation, you'll notice I declared it inside BubbleSortArray, right where the swap happens.
For pedagogical purposes I would also like to add that you don't have to use a swap variable when sorting integers because you can do the following:
a = a + b;
b = a - b;
a = a - b;
I will say, however, that the temporary swap variable makes the swap instantly recognizable (and the arithmetic trick above can overflow for large values), so I would leave it in, but this is my own personal preference.
I do recommend using size_t for Array_length, however, because that's the type the sizeof operator returns. It makes sense, too: the size of an array will never be negative.
Here are the include statements and functions. Note that I do not include <stdbool.h>, because the bool checking you were doing did nothing for the algorithm.
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define MAX 10

void PrintArray(int arr[], size_t n) {
    for (size_t i = 0; i < n; ++i) {
        printf("%d ", arr[i]);
    }
    printf("\n");
}

void PopulateArray(int arr[], size_t n) {
    for (size_t i = 0; i < n; ++i) {
        arr[i] = rand() % 1000 + 1;
    }
}

void BubbleSortArray(int arr[], size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n - 1; ++j) {
            if (arr[j] > arr[j + 1]) {
                int temp = arr[j + 1];
                arr[j + 1] = arr[j];
                arr[j] = temp;
            }
        }
    }
}
To implement the bubble sort algorithm, the only thing you have to do now is initialize the random number generator like you did, create your array and populate it, and finally sort the array.
int main(void)
{
    srand(time(NULL));

    int arr[MAX];
    size_t array_length = sizeof(arr) / sizeof(int);

    PopulateArray(arr, array_length);
    PrintArray(arr, array_length);
    BubbleSortArray(arr, array_length);
    PrintArray(arr, array_length);

    return 0;
}
I hope this helps, let me know if you have any questions.

Cheapest cost through an array

Question
Given an array of length N, you must start at array[0] and traverse to the end. You are allowed to move forward one position or two, i.e. array[0] -> array[1] or array[0] -> array[2], choosing whichever yields the lower sum. This repeats all the way to the end and must include the last element, array[N-1].
[1, 10, 3, 8, 4]
The cheapest way to navigate is 8, via array[0] + array[2] + array[4] (1 + 3 + 4).
My current solution:
int totalCost = 0;
totalCost += array[0];
int i = 1;
while (i < array.Length)
{
    if (i + 1 < array.Length)
    {
        int sum1 = totalCost + array[i];
        int sum2 = totalCost + array[i + 1];
        if (sum1 < sum2)
        {
            totalCost += array[i];
            i++;
        }
        else
        {
            totalCost += array[i + 1];
            i += 2;
        }
    }
    else
    {
        totalCost += array[i];
        i++;
    }
}
This seems to work for most arrays. The issue arises when an early jump onto a bigger number allows for a better jump further through the array, ultimately resulting in a lower total. I have no clue how to approach that.
var c = new List<int>();
for (int i = 0; i < a.Length - 1;)
{
    c.Add(i);
    if (i < a.Length - 2 && a[i + 2] < a[i + 1])
        i += 2;
    else
        i += 1;
}
c.Add(a.Length - 1); // last index
Your approach is not working because you try to decide on the next move while standing at some element. The point is, you generally can't decide on your next move until you know all the element values up to the second element from the end of the array.
Not only in computer science: it is generally easier to look into the past and learn than to look into the future and predict. So whenever a problem or sub-problem can be solved from historic data, don't try to solve it with future expectations.
Now, let's look at how this can be applied to your problem.
If you are at position i in your array, you are trying to predict the right way forward by looking at the possible next steps and deciding on one of them; this does not work reliably. So instead, suppose you are at position i and, rather than asking where to go next, you ask: "What is the best (most cost-efficient) way to reach the current position, and how much does it cost?"
For the first position, i=0, this is trivial. You reach this position by starting the algorithm, and the cost equals the value of a[0].
For the second position, i=1, it is also trivial. You can move in steps of 1 (small step) or 2 (big step), but in this specific case only a small step is possible, so you reach i=1 by coming from position i=0, and the cost equals the cost of reaching i=0 plus the value at position i=1.
For all following positions i>1, a decision needs to be made whether the current position is reached by a small step or a big step. If the current position is reached by a small step, its cost is the cost of reaching i-1 plus the value at position i. Otherwise, if the current position is reached by a big step, its cost is the cost of reaching i-2 plus the value at position i. To reach position i at the lowest cost, decide between the small and big step by comparing their associated costs and choosing the minimum.
When the end of the array is reached, the cost of reaching each position has been computed, and the cost of reaching the last position can be returned as the result.
If you only need the minimum cost and not the actual path, then the following should work (C-like pseudocode):
costs[array.size] = { 0 };
for (i = 0; i < array.size; ++i)
{
    if (i > 1)      costs[i] = array[i] + min(costs[i-1], costs[i-2]);
    else if (i > 0) costs[i] = array[i] + costs[i-1];
    else            costs[i] = array[i];
}
result = costs[array.size - 1];
It basically says: a point i can be reached either from the previous point or from two points before. If the costs for those two points are already computed, the decision is as easy as taking the minimum of the two and adding the current point's cost.
Instead of an array with all sub-costs, you could even make do with a total of 3 variables (discarding sub-costs for indices below i-2), but then the path can't be reconstructed from the sub-costs.
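For illustration, a minimal C# sketch of the three-variable version described above (the method name and structure are my own):

static int MinTraversalCost(int[] a)
{
    // costPrev2 = cost of reaching i-2, costPrev1 = cost of reaching i-1
    int costPrev2 = a[0];
    if (a.Length == 1) return costPrev2;
    int costPrev1 = a[0] + a[1]; // i=1 is only reachable by a small step

    for (int i = 2; i < a.Length; i++)
    {
        int current = a[i] + Math.Min(costPrev1, costPrev2);
        costPrev2 = costPrev1;
        costPrev1 = current;
    }
    return costPrev1;
}

For the example array [1, 10, 3, 8, 4] this returns 8, matching the expected path array[0] + array[2] + array[4].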
I ignored the possibility of future sums being affected by earlier choices. I agree that is at some level of AI.
The solution I revised to:
int total = 0;
total += array[0];
int i = 0;
while (i < array.Length - 1)
{
    if (((array[i + 1] + total) < (array[i + 2] + total)) && (i != array.Length - 3))
    {
        total += array[i + 1];
        i++;
    }
    else
    {
        total += array[i + 2];
        i += 2;
    }
}
I had to add the second-to-last-element check because if the iterator landed on the second-to-last position, there was no reason to check which value was lower, as I had to pick the last one regardless.

Sudoku Backtracking - Order of walking through fields by amount of possible values

I've created a backtracking algorithm for solving Sudoku puzzles which basically walks through all the empty fields from left to right, top down. Now I need to make an extended version in which the order in which the algorithm visits the fields is defined by the number of possibilities (calculated once, at initialization) for each field. E.g. empty fields which initially have the fewest possible values should be visited first, and only the initially possible values should be tried (both to reduce the number of iterations needed). I'm not sure how to implement this without increasing the number of iterations needed to determine these values for each field and then obtain the next field with the fewest possibilities.
For my backtracking algorithm I have a nextPosition method which determines the next empty field in the sudoku to visit. Right now it looks like this:
protected virtual int[] nextPosition(int[] position)
{
    int[] nextPosition = new int[2];
    if (position[1] == (n * n) - 1)
    {
        nextPosition[0] = position[0] + 1;
        nextPosition[1] = 0;
    }
    else
    {
        nextPosition[0] = position[0];
        nextPosition[1] = position[1] + 1;
    }
    return nextPosition;
}
So it basically walks through the sudoku left-to-right, top-down. Now I need to alter this for the new version to walk through the fields ordered by the fewest possible values (and to try only the possible values for each field in my backtracking algorithm). I figured I'd keep a list of invalid values for each field:
public void getInvalidValues(int x, int y)
{
    // Values already present in row y
    for (int i = 0; i < n * n; i++)
        if (grid[y, i] != 0)
            this.invalidValues[y, x].Add(grid[y, i]);

    // Values already present in column x
    for (int i = 0; i < n * n; i++)
        if (grid[i, x] != 0)
            this.invalidValues[y, x].Add(grid[i, x]);

    // Values already present in the n x n block containing (x, y)
    int nX = (int)Math.Floor((double)x / n);
    int nY = (int)Math.Floor((double)y / n);
    for (int bx = 0; bx < n; bx++)
        for (int by = 0; by < n; by++)
            if (grid[nY * n + by, nX * n + bx] != 0)
                this.invalidValues[y, x].Add(grid[nY * n + by, nX * n + bx]);
}
I call this method for every empty field in the sudoku (represented in this.grid as a 2D array of size [n*n, n*n]). However, this causes even more iterations, since in order to determine the number of different invalid values for each field, it has to walk through each list again.
So my question is whether someone knows a way to efficiently walk through the fields of the sudoku ordered by the number of possible values per field, while at the same time keeping track of these possible values for each field (they are needed for the backtracking algorithm). If anyone could help me out with this, it'd be much appreciated.
Thanks in advance!
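For illustration, a minimal sketch of the ordering idea described in the question: compute each empty cell's candidate set once, then sort the cells by candidate count and let the backtracker visit them in that order (names and the tuple layout are assumptions, not from the question's code):

// Hypothetical sketch: order empty cells by their initial number of candidates.
var cellCandidates = new List<(int row, int col, List<int> candidates)>();

for (int row = 0; row < n * n; row++)
{
    for (int col = 0; col < n * n; col++)
    {
        if (grid[row, col] != 0) continue; // only empty fields

        var candidates = new List<int>();
        for (int v = 1; v <= n * n; v++)
            if (!invalidValues[row, col].Contains(v))
                candidates.Add(v);

        cellCandidates.Add((row, col, candidates));
    }
}

// Visit the most constrained cells first; the backtracker then only
// tries values from each cell's precomputed candidate list.
cellCandidates.Sort((a, b) => a.candidates.Count.CompareTo(b.candidates.Count));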

Video rate image construction from binary data performance

First things first:
I have a git repo over here that holds the code of my current efforts and an example data set
Background
The example data set holds a bunch of records in Int32 format. Each record is composed of several bit fields that basically hold info on events where an event is either:
The detection of a photon
The arrival of a synchronizing signal
Each Int32 record can be treated like the following C-style struct:
struct {
    unsigned TimeTag  : 16;
    unsigned Channel  : 12;
    unsigned Route    : 2;
    unsigned Valid    : 1;
    unsigned Reserved : 1;
} TTTRrecord;
Whether we are dealing with a photon record or a sync event, TimeTag will always hold the time of the event relative to the start of the experiment (macro-time).
If a record is a photon, valid == 1.
If a record is a sync signal or something else, valid == 0.
If a record is a sync signal, sync type = Channel & 7 gives a value indicating either the start of a frame or the end of a scan line in a frame.
The last relevant bit of info is that TimeTag is 16 bits and thus obviously limited. If the TimeTag counter rolls over, a rollover counter is incremented. This rollover (overflow) marker can easily be obtained from overflow = Channel & 2048.
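To make the layout concrete, here is a small C# decoding sketch that follows the bit layout above (the helper itself is not from the question; the shift/mask values match the struct):

// Decode one Int32 TTTR record into its bit fields.
static void DecodeRecord(int record,
    out int timeTag, out int channel, out int route, out bool valid)
{
    timeTag = record & 0xFFFF;           // bits 0..15  : TimeTag
    channel = (record >> 16) & 0xFFF;    // bits 16..27 : Channel
    route   = (record >> 28) & 0x3;      // bits 28..29 : Route
    valid   = ((record >> 30) & 1) == 1; // bit 30      : Valid
}

// Flags encoded inside Channel for non-photon records:
// bool isOverflow = (channel & 2048) != 0;
// int  syncType   = channel & 7;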
My Goal
These records come in from a high speed scanning microscope and I would like to use these records to reconstruct images from the recorded photon data, preferably at 60 FPS.
To do so, I obviously have all the info:
I can look over all available data and find all overflows, which allows me to reconstruct the sequential macro-time for each record (photon or sync).
I also know when the frame started and when each line composing the frame ended (and thus also how many lines there are).
Therefore, to reconstruct a bitmap of size noOfLines * noOfLines, I can process the bulk array of records line by line, each time essentially building a "histogram" of the photon events with bin edges at the time boundary of each pixel in the line.
Put another way: if I know Tstart and Tend of a line, and I know the number of pixels I want to spread my photons over, I can walk through all records of the line and check whether the macro-time of each photon falls within the time boundaries of the current pixel. If so, I add one to the value of that pixel.
This approach works; the current code in the repo gives me the image I expect, but it is too slow (several tens of ms to calculate a frame).
What I tried already:
The magic happens in the function int[] Renderline (see repo).
public static int[] RenderlineV(int[] someRecords, int pixelduration, int pixelCount)
{
    // Will hold the pixels obviously
    int[] linePixels = new int[pixelCount];

    // Calculate everything (sync, overflow, ...) from the raw records
    int[] timeTag = someRecords.Select(x => Convert.ToInt32(x & 65535)).ToArray();
    int[] channel = someRecords.Select(x => Convert.ToInt32((x >> 16) & 4095)).ToArray();
    int[] valid = someRecords.Select(x => Convert.ToInt32((x >> 30) & 1)).ToArray();
    int[] overflow = channel.Select(x => (x & 2048) >> 11).ToArray();

    int[] absTime = new int[overflow.Length];
    absTime[0] = 0;
    Buffer.BlockCopy(overflow, 0, absTime, 4, (overflow.Length - 1) * 4);
    absTime = absTime.Cumsum(0, (prev, next) => prev * 65536 + next).Zip(timeTag, (o, tt) => o + tt).ToArray();

    long lineStartTime = absTime[0];
    int tempIdx = 0;

    for (int j = 0; j < linePixels.Length; j++)
    {
        int count = 0;
        for (int i = tempIdx; i < someRecords.Length; i++)
        {
            if (valid[i] == 1 && lineStartTime + (j + 1) * pixelduration >= absTime[i])
            {
                count++;
            }
        }
        // Avoid checking records in the raw data that were already binned to a pixel.
        linePixels[j] = count;
        tempIdx += count;
    }
    return linePixels;
}
Treating photon records in my data set as an array of structs and addressing struct members in a loop was a bad idea. I could increase speed significantly (2x) by dumping all bit fields into separate arrays and addressing those. This version of the render function is already in the repo.
I also realised I could improve loop speed by referring to the .Length property of the array I am iterating over, as this supposedly lets the runtime eliminate bounds checking.
The major speed loss is in the inner loop of this nested set of loops:
for (int j = 0; j < linePixels.Length; j++)
{
    int count = 0;
    lineStartTime += pixelduration;
    for (int i = tempIdx; i < absTime.Length; i++)
    {
        //if (lineStartTime + (j + 1) * pixelduration >= absTime[i] && valid[i] == 1)
        // Seems quicker to calculate the boundary before...
        //if (valid[i] == 1 && lineStartTime >= absTime[i])
        // Quicker still...
        if (lineStartTime > absTime[i] && valid[i] == 1)
        {
            // Slow... looking into linePixels[] each iteration is a bad idea.
            //linePixels[j]++;
            count++;
        }
    }
    // Doing it here is faster.
    linePixels[j] = count;
    tempIdx += count;
}
Rendering 400 lines like this in a for loop takes roughly 150 ms in a VM (I do not have a dedicated Windows machine right now and I run a Mac myself, I know, I know...).
I just installed Win10CTP on a 6-core machine, and replacing the normal loops with Parallel.For() increases the speed by almost exactly 6x.
Oddly enough, the non-parallel for loop runs at almost the same speed in the VM and on the physical 6-core machine...
Regardless, I cannot imagine that this function cannot be made quicker. I would like to eke out every bit of efficiency from the line render, and optimise it to the maximum, before I start thinking about other things.
Outlook
Until now my programming has dealt with rather trivial things, so I lack some experience, but here are things I think I might consider:
Matlab is/seems very efficient with vectorized operations. Could I achieve similar things in C#, i.e. by using Microsoft.Bcl.Simd? Is my case suited for something like this? Would I see gains even in my VM, or should I definitely move to real hardware?
Could I gain from pointer arithmetic/unsafe code to run through my arrays?
...
Any help would be greatly, greatly appreciated.
I apologize beforehand for the quality of the code in the repo; I am still in the quick-and-dirty testing stage... Nonetheless, criticism is welcome if it is constructive :)
Update
As some mentioned, absTime is already ordered. Therefore, once a record is hit that no longer belongs to the current pixel or bin, there is no need to continue the inner loop.
A 5x speed gain by adding a break:
for (int i = tempIdx; i < absTime.Length; i++)
{
    //if (lineStartTime + (j + 1) * pixelduration >= absTime[i] && valid[i] == 1)
    // Seems quicker to calculate the boundary before...
    //if (valid[i] == 1 && lineStartTime >= absTime[i])
    // Quicker still...
    if (lineStartTime > absTime[i] && valid[i] == 1)
    {
        // Slow... looking into linePixels[] each iteration is a bad idea.
        //linePixels[j]++;
        count++;
    }
    else
    {
        break;
    }
}
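Since absTime is sorted and the pixels are contiguous, equal-width time bins, the scan can in principle be replaced by computing each record's bin index directly in a single pass over the line's records. A minimal sketch of that idea, assuming the same absTime/valid arrays as above (this is not the repo's code):

int[] linePixels = new int[pixelCount];
long lineStartTime = absTime[0];

// One pass: each valid record lands directly in its bin.
for (int i = 0; i < absTime.Length; i++)
{
    if (valid[i] != 1) continue;          // skip sync/other records
    long offset = absTime[i] - lineStartTime;
    if (offset < 0) continue;             // before the line starts
    long pixel = offset / pixelduration;
    if (pixel >= pixelCount) break;       // past the line; absTime is sorted
    linePixels[(int)pixel]++;
}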

Interpolation in c# - performance problem

I need to resample big sets of data (a few hundred spectra, each containing a few thousand points) using simple linear interpolation.
I have created an interpolation method in C#, but it seems to be really slow for huge datasets.
How can I improve the performance of this code?
public static List<double> interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    double[] interpolated = new double[breaks.Count];
    int id = 1;
    int x = 0;

    // left border case - uphold the value
    while (breaks[x] < xItems[0])
    {
        interpolated[x] = yItems[0];
        x++;
    }

    double p, w;
    for (int i = x; i < breaks.Count; i++)
    {
        while (breaks[i] > xItems[id])
        {
            id++;
            if (id > xItems.Count - 1)
            {
                id = xItems.Count - 1;
                break;
            }
        }
        System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));
        if (id <= xItems.Count - 1)
        {
            if (id == xItems.Count - 1 && breaks[i] > xItems[id])
            {
                interpolated[i] = yItems[yItems.Count - 1];
            }
            else
            {
                w = xItems[id] - xItems[id - 1];
                p = (breaks[i] - xItems[id - 1]) / w;
                interpolated[i] = yItems[id - 1] + p * (yItems[id] - yItems[id - 1]);
            }
        }
        else // right border case - uphold the value
        {
            interpolated[i] = yItems[yItems.Count - 1];
        }
    }
    return interpolated.ToList();
}
Edit
Thanks, guys, for all your responses. What I wanted, when I wrote this question, were some general ideas about where I could improve performance. I wasn't expecting ready solutions, only some ideas, and you gave me what I wanted, thanks!
Before writing this question I thought about rewriting this code in C++, but after reading the comments to Will's answer it seems the gain would be smaller than I expected.
Also, the code is so simple that there are no mighty code tricks to use here. Thanks to Petar for his attempt to optimize the code.
It seems it all reduces to finding a good profiler, checking every line and subroutine, and trying to optimize from there.
Thank you again for all the responses and for taking part in this discussion!
public static List<double> Interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    var a = xItems.ToArray();
    var b = yItems.ToArray();
    var aLimit = a.Length - 1;
    var bLimit = b.Length - 1;
    var interpolated = new double[breaks.Count];
    var total = 0;
    var initialValue = a[0];

    // left border case - uphold the first value
    while (breaks[total] < initialValue)
    {
        interpolated[total] = b[0];
        total++;
    }

    int id = 1;
    for (int i = total; i < breaks.Count; i++)
    {
        var breakValue = breaks[i];
        while (breakValue > a[id])
        {
            id++;
            if (id > aLimit)
            {
                id = aLimit;
                break;
            }
        }
        double value = b[bLimit];
        if (id <= aLimit)
        {
            var currentValue = a[id];
            var previousValue = a[id - 1];
            if (id != aLimit || breakValue <= currentValue)
            {
                var w = currentValue - previousValue;
                var p = (breakValue - previousValue) / w;
                value = b[id - 1] + p * (b[id] - b[id - 1]);
            }
        }
        interpolated[i] = value;
    }
    return interpolated.ToList();
}
I've cached some (const) values and filled the left-border values directly in the first loop, but I think these are micro-optimizations that the compiler already makes in Release mode. However, you can try this version and see if it beats the original code.
Instead of interpolated.ToList(), which copies the whole array, compute the interpolated values directly in the final list (or return the array instead). This matters especially if the array/List is big enough to qualify for the large object heap.
Unlike the ordinary heap, the LOH is not compacted by the GC, which means that short-lived large objects are far more harmful than small ones.
Then again: 7000 doubles are approx. 56,000 bytes, which is below the large object threshold of 85,000 bytes.
It looks to me like you've created an O(n^2) algorithm: you search for the interval, which is O(n), and then probably apply it n times. You'll get a quick and cheap speed-up by taking advantage of the fact that the items are already ordered in the list. Use BinarySearch(), which is O(log n).
If that's still not enough, you should be able to do something speedier with the outer loop: whatever interval you found previously should make it easier to find the next one. But that code isn't in your snippet.
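Building on that suggestion, a minimal sketch of locating the bracketing interval (this assumes xItems is a List<double>, since IList<double> itself has no BinarySearch; alternatively, copy to an array and use Array.BinarySearch):

// Find the first index in xItems whose value is >= breakValue.
int id = xItems.BinarySearch(breakValue);
if (id < 0) id = ~id;              // no exact hit: ~id is the insertion point
if (id == 0) id = 1;               // left border: use the first segment
if (id > xItems.Count - 1) id = xItems.Count - 1; // right border clamp

// Then interpolate between xItems[id - 1] and xItems[id] as in the original code.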
I'd say profile the code and see where it spends its time, then you have somewhere to focus on.
ANTS is popular, but EQATEC is free, I think.
A few suggestions:
As others suggested, use a profiler to understand better where the time is spent.
The loop
while (breaks[x] < xItems[0])
could cause an exception if x grows bigger than the number of items in the breaks list. You should use something like
while (x < breaks.Count && breaks[x] < xItems[0])
But you might not need that loop at all. Why treat the first item as a special case? Just start with id = 0 and handle the first point in the for (i) loop. I understand that id might start from 0 in this case, and [id - 1] would be a negative index, but see if you can do something there.
If you optimize for speed, you usually sacrifice memory, and vice versa; you rarely get both unless you find a genuinely cleverer algorithm. In this case, it means calculating as much as you can outside the loops, storing those values in variables (extra memory), and using them later. For example, instead of always saying:
id = xItems.Count - 1;
You could say:
int lastXItemsIndex = xItems.Count-1;
...
id = lastXItemsIndex;
This is the same suggestion Petar Petrov made with aLimit, bLimit, and so on.
Next point: your loop (or the one Petar Petrov suggested):
while (breaks[i] > xItems[id])
{
    id++;
    if (id > xItems.Count - 1)
    {
        id = xItems.Count - 1;
        break;
    }
}
could probably be reduced to:
double currentBreak = breaks[i];
while (id <= lastXIndex && currentBreak > xItems[id]) id++;
And the last point I would add: check whether your samples have some property that is special to your problem. For example, if xItems represents time and you are sampling at regular intervals, then
w = xItems[id] - xItems[id - 1];
is constant, and you do not have to calculate it every time in the loop.
This is probably not often the case, but maybe your problem has some other property which you could use to improve performance.
Another idea: maybe you do not need double precision; float may be faster because it is smaller.
Good luck
System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));
I hope it's a Release build without DEBUG defined?
Other than that, it might depend on what exactly those IList parameters are. It may be useful to store the Count value in a local instead of accessing the property every time.
This is the kind of problem where you need to move over to native code.
