C#: Finding the most similar image to a given example

I'm trying to optimize the performance of my small program, whose functionality relies on detecting the image most similar to a given example. The problem is that the method I use is really slow and could use some reworking.
I also can't use Parallel.For to compute the similarity value, because the function you'll see below is already being called from a Parallel.ForEach loop.
My similarity method:
public static double isItSame(Bitmap source, Color[,] example)
{
    double rez = 0;
    for (int x = 20; x < 130; x += 3)
    {
        for (int y = 10; y < 140; y += 3)
        {
            Color color1 = source.GetPixel(x, y);
            rez += Math.Abs(color1.R - example[x, y].R) + Math.Abs(color1.G - example[x, y].G) + Math.Abs(color1.B - example[x, y].B);
        }
    }
    return rez;
}
I will greatly appreciate any help optimizing this solution. My own attempt was to step by x += 3 instead of x++ (and the same for y), but that hurts the quality of the results.
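The usual first fix for this kind of code is to replace GetPixel with LockBits and read the pixel bytes directly. Here is a rough sketch of that idea (not the original method): it assumes source uses PixelFormat.Format24bppRgb and requires compiling with /unsafe; the sampling bounds are copied from the question.

// Requires: using System; using System.Drawing; using System.Drawing.Imaging;
public static unsafe double Difference(Bitmap source, Color[,] example)
{
    Rectangle rect = new Rectangle(0, 0, source.Width, source.Height);
    BitmapData data = source.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);
    double rez = 0;
    byte* scan0 = (byte*)data.Scan0;
    for (int x = 20; x < 130; x += 3)
    {
        for (int y = 10; y < 140; y += 3)
        {
            byte* px = scan0 + y * data.Stride + x * 3; // 24bpp layout: B, G, R
            Color ex = example[x, y];
            rez += Math.Abs(px[2] - ex.R) + Math.Abs(px[1] - ex.G) + Math.Abs(px[0] - ex.B);
        }
    }
    source.UnlockBits(data);
    return rez;
}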

Related

Why is the following code wrong? (binomial coefficient)

I am doing Project Euler and I am at problem 15 now, here is a link:
https://projecteuler.net/problem=15 . I am trying to solve this with the binomial coefficient. Here is a site that explains it: http://www.mathblog.dk/project-euler-15/ . You can find it at the bottom.
My question is, why is the following code wrong? I think it follows the mathematical algorithm: (n - k + i) / i.
int grid = 20;
long paths = 1;
for (int i = 0; i < grid; i++)
{
    paths *= (grid * 2) - (grid + i);
    paths /= (i + 1);
}
Console.WriteLine(paths);
Console.ReadKey();
And why is this code wrong? It is exactly like the mathblog site, but in one line.
int grid = 20;
long paths = 1;
for (int i = 0; i < grid; i++)
{
    paths *= ((grid * 2) - i) / (i + 1);
}
Console.WriteLine(paths);
Console.ReadKey();
But why is this code right then? Isn't it the same as the previous code? And it doesn't exactly follow the mathematical algorithm, does it? The formula is (n - k + i) / i, and this code does (n - i) / i:
int grid = 20;
long paths = 1;
for (int i = 0; i < grid; i++)
{
    paths *= ((grid * 2) - i);
    paths /= (i + 1);
}
Console.WriteLine(paths);
Console.ReadKey();
Thnx guys!
If you want to combine the calculation, it should be like this:
paths = (paths * ((grid * 2) - i)) / (i + 1);
By convention in many programming languages, int/int gives an int,* not a floating-point number. Your method implies that paths should take values that are not integers. In fact, none of the three methods should work, but by a happy coincidence the last one does: essentially because all intermediate values of paths happen to be binomial coefficients too.
Advice for debugging: make your program output the intermediate values. This helps a lot.
*: As a mathematician, I almost never need that feature. In fact, the other convention (int/int -> double) would have made my life as a programmer easier on average.
I had a look at the blog you mention. It makes your message much more understandable.
The blog mentions a formula: the product, for i from 1 to k, of (n - k + i) / i.
So to mimic it you would need to write
for (int i = 1; i <= grid; i++) // bounds!
{
    paths *= (grid * 2) - (grid - i); // minus sign!
    paths /= i; // divide by i, not i + 1, now that i starts at 1
}
About the fact that this works with ints: it is an accident, due to the fact that the intermediate values of the product (at the end of each loop iteration) are binomial coefficients all along the computation. If you computed the products and divisions in another order, you could very well get non-integers, so the computation would fail with an integer variable type for paths because of the int/int -> int convention. The blog is not very helpful in not mentioning that.
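To see this concretely, here is a small check (not from the original posts) that prints the intermediate values of paths; each one is itself a binomial coefficient, C(40, i + 1):

int grid = 20;
long paths = 1;
for (int i = 0; i < grid; i++)
{
    paths *= (grid * 2) - i; // multiply first...
    paths /= (i + 1);        // ...then divide; the intermediate result stays integral
    Console.WriteLine("i = {0}: paths = {1}", i, paths); // equals C(40, i + 1)
}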

Speed up nested for loops and improve performance

I am working on a program and it has a pretty long execution time. I'm trying to improve performance where I can; however, my knowledge is limited in this area. Can anyone recommend a way to speed up the method below?
public static double DistanceBetween2Points(double[,] p1, double[,] p2, int patchSize)
{
    double sum = 0;
    for (int i = 0; i < patchSize; i++)
    {
        for (int j = 0; j < patchSize; j++)
        {
            sum += Math.Sqrt(Math.Pow(p1[i, j] - p2[i, j], 2));
        }
    }
    return sum;
}
The method calculates the distance between two images by calculating the sum of all the distances between two points on the two images.
Think about your algorithm. A per-pixel distance probably isn't the best thing to get an accurate image distance.
Replace sqrt(x^2) by abs(x), or even faster:
if (x < 0) x = -x;
Rename your routine to OverallImageDistance or similar (will not improve performance ;))
Use unsafe pointers, and calculate your distance in a single loop using these pointers:
unsafe
{
    sum = 0.0;
    int numPixels = patchSize * patchSize; // assumes both arrays are exactly patchSize x patchSize
    fixed (double* pointer1 = &p1[0, 0])
    fixed (double* pointer2 = &p2[0, 0])
    {
        // fixed pointers are read-only, so copy them before incrementing
        double* a = pointer1;
        double* b = pointer2;
        while (numPixels-- > 0)
        {
            double dist = *a++ - *b++;
            if (dist < 0) dist = -dist;
            sum += dist;
        }
    }
}
This should be several times faster than your original.
Well, this method is really weird and does not look like a distance between pixels at all. But you would certainly want to use linear algebra instead of straightforward array calculations.
Image recognition, natural language processing and machine learning algorithms all use matrices, because matrix libraries are highly optimized for this kind of situation, where you need batch processing.
There is a plethora of matrix libraries in the wild; look here: Recommendation for C# Matrix Library
EDIT: Ok, thanks for feedback, trying to improve the answer...
You can use the Math.NET Numerics open-source library (install the MathNet.Numerics NuGet package) and rewrite your method like this:
using MathNet.Numerics.LinearAlgebra;

public static double DistanceBetween2Points(double[,] p1, double[,] p2, int patchSize)
{
    var A = Matrix<double>.Build.DenseOfArray(p1).SubMatrix(0, patchSize, 0, patchSize);
    var B = Matrix<double>.Build.DenseOfArray(p2).SubMatrix(0, patchSize, 0, patchSize);
    return (A - B).RowAbsoluteSums().Sum();
}
Essentially, explicit loops in your own code are what slow it down here. When doing batch processing, you ideally want to push the loops into an optimized library rather than writing them yourself.
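For example, a quick sanity check (these values are made up for illustration):

double[,] a = { { 1.0, 2.0 }, { 3.0, 4.0 } };
double[,] b = { { 1.5, 2.0 }, { 2.0, 4.5 } };
double d = DistanceBetween2Points(a, b, 2); // |-0.5| + |0| + |1| + |-0.5| = 2.0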

C#: convert an ArrayList of ints into a Point[]

First of all, I am running this on a mobile device using CE 6.5. I have an ArrayList of ints, and for graphing purposes I want to convert each entry's position in the list and its int value into the x and y values of a Point, put all those points into a Point array, then use bufferedGraphics.DrawLines to draw them onto the form. I have a way of doing this that seems to work pretty fast, but I'm not sure it's the best way. Any suggestions or improvements to this code?
Oh yeah, dataList usually holds about 450 entries or more, depending on screen size and rotation.
public Point[] toPointArray(int w, int h)
{
    Point[] p;
    int val;
    p = new Point[dataList.Count];
    for (int i = 0; i < dataList.Count; i++)
    {
        val = (int)dataList[i];
        if (i < p.Length)
            p[i] = new Point(i, h - (val * h) / range + (min * h) / range);
    }
    return p;
}
Some of my incoming data is updating dataList 256 times per second, so there are concerns about overwriting, but so far this seems to work even at these speeds.
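One hedged sketch of a way to guard against the writer updating dataList mid-copy (it assumes the writer thread takes the same lock; _sync and ToPointArraySafe are hypothetical names, and Cast<int> needs using System.Linq):

private readonly object _sync = new object();

public Point[] ToPointArraySafe(int w, int h)
{
    int[] snapshot;
    lock (_sync)
    {
        snapshot = dataList.Cast<int>().ToArray(); // copy while the writer is blocked
    }
    Point[] p = new Point[snapshot.Length];
    for (int i = 0; i < snapshot.Length; i++)
    {
        p[i] = new Point(i, h - (snapshot[i] * h) / range + (min * h) / range);
    }
    return p;
}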
Here are some measurements of how fast this currently is (time in seconds):
Time to complete 0.000588
Time to complete 0.0005886154
Time to complete 0.0005846154
Time to complete 0.0005870769
Time to complete 0.0005830769
Time to complete 0.0005806154
Time to complete 0.0005981539
Time to complete 0.0007206154
Time to complete 0.0005836923
Time to complete 0.001039077
After taking suggestions from the answers below, here is what my code looks like now, and I am getting 0.00047 secs average execution time. dataList is a global that is now a List<int> instead of an ArrayList; the (int) cast was taking up a quarter of the original processing time.
List<int> dataList = new List<int>();

public Point[] toPointArray(int w, int h)
{
    Point[] p = new Point[dataList.Count];
    for (int i = 0; i < dataList.Count; i++)
    {
        p[i] = new Point(i, h - (dataList[i] * h) / range + (min * h) / range);
    }
    return p;
}
EDIT - after the comment about the index and Dictionary:
Point[] p = (from i in Enumerable.Range(0, dataList.Count)
             select new Point(
                 i,
                 h - (((int)dataList[i]) * h) / range + (min * h) / range))
            .OrderBy(pp => pp.X)
            .ToArray();
This works with the ArrayList etc.
1) Declare variables at the point you use them. There's no need to declare Point[] p separately (it's created right after that), and no reason to declare int val separately (it's not used outside the loop, and there's no performance penalty for declaring it where you use it).
2) You know that i < p.Length (assuming a single-threaded application), so remove the check.
3) Use LINQ.
4) Use Microsoft's suggested naming standards (ProperCase for methods).
5) Use better variable names.
6) Use better method names - this method isn't just converting to an array (or at least, that's not its main purpose).
7) Use a typed datastore. We can't see what dataList is, but you shouldn't have to cast its elements to int (unless it's a long that you know fits in an int, or some such).
If you can use LINQ, I haven't looked over Yahia's solution but it should work. If you can't/don't want to use LINQ, try:
public Point[] ToPointGrid(int width, int height)
{
    Point[] points = new Point[dataList.Count];
    for (int index = 0; index < dataList.Count; ++index)
    {
        points[index] = new Point(index, height - (dataList[index] * height) / range + (minimum * height) / range);
    }
    return points;
}

C#: Average From Series of Images

Suppose I have a series of black and white images, NOT grayscale. I'm trying to calculate the average of all the images. I have some sample code that should work, but I'm wondering if there is a better way?
Bitmap[] Images = ReadAndScale(Width: 50, Height: 50);
int Width = 50;
int Height = 50;
double[,] Result = new double[50, 50];
for (int i = 0; i < Images.Length; i++)
{
    for (int j = 0; j < Width; j++)
    {
        for (int k = 0; k < Height; k++)
        {
            // Compare ARGB values: Color's == operator also compares the
            // "known color" state, so GetPixel(...) == Color.White is never true.
            Result[j, k] += Images[i].GetPixel(j, k).ToArgb() == Color.White.ToArgb()
                ? 0
                : 1.0 / (double)Images.Length;
        }
    }
}
At the end of these loops you have an array Result that contains the average value for each pixel; > .5 means the average is black, otherwise it is white.
Well, I think your general algorithm is fine; there isn't really a way to make it "better".
The only way I see you could make it "better" would be to squeeze more performance out of it, but do you need that?
The main perf issue I see is the use of GetPixel(), as it is a relatively slow method. Here is an example using unsafe code that should run much faster: Unsafe Bitmap
Don't let the word "unsafe" scare you; it is just the keyword for enabling true pointers in C#.
Well, you could make the images the inner loop and break out as soon as you've guaranteed the average will be less than or greater than .5. Not sure if that is really "better".
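A rough sketch of that early-exit idea, reusing the names from the question above (the majority test follows the "> .5 is black" rule):

bool[,] IsBlack = new bool[Width, Height];
for (int j = 0; j < Width; j++)
{
    for (int k = 0; k < Height; k++)
    {
        int black = 0;
        for (int i = 0; i < Images.Length; i++)
        {
            if (Images[i].GetPixel(j, k).ToArgb() != Color.White.ToArgb())
                black++;
            if (black > Images.Length / 2)
                break; // already a black majority, no need to look further
            if (black + (Images.Length - 1 - i) <= Images.Length / 2)
                break; // even if every remaining image is black here, white wins
        }
        IsBlack[j, k] = black > Images.Length / 2;
    }
}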

Calculating vs. lookup tables for sine value performance?

Let's say you had to calculate the sine (cosine or tangent - whatever), where the domain is between 0.01 and 360.01 (using C#).
What would be more performant?
Using Math.Sin
Using a lookup array with precalculated values
I would anticipate that, given the domain, option 2 would be much faster. At what precision of the domain (0.0000n) does the performance of the calculation exceed that of the lookup?
Update: read through to the end. It looks like the lookup table is faster than Math.Sin after all.
I would guess that the lookup approach would be faster than Math.Sin. I would also say that it would be a lot faster, but Robert's answer made me think that I would still want to benchmark this to be sure. I do a lot of audio buffer processing, and I've noticed that a method like this:
for (int i = 0; i < audiodata.Length; i++)
{
    audiodata[i] *= 0.5;
}
will execute significantly faster than
for (int i = 0; i < audiodata.Length; i++)
{
    audiodata[i] = Math.Sin(audiodata[i]);
}
If the difference between Math.Sin and a simple multiplication is substantial, I would guess that the difference between Math.Sin and a lookup would also be substantial.
I don't know, though, and my computer with Visual Studio is in the basement, and I'm too tired to take the 2 minutes it would take to determine this.
Update: OK, it took more than 2 minutes (more like 20) to test this, but it looks like Math.Sin is at least twice as fast as a lookup table (using a Dictionary). Here's the class that does Sin using Math.Sin or a lookup table:
public class SinBuddy
{
    private Dictionary<double, double> _cachedSins
        = new Dictionary<double, double>();
    private const double _cacheStep = 0.01;
    private double _factor = Math.PI / 180.0;

    public SinBuddy()
    {
        for (double angleDegrees = 0; angleDegrees <= 360.0;
            angleDegrees += _cacheStep)
        {
            double angleRadians = angleDegrees * _factor;
            _cachedSins.Add(angleDegrees, Math.Sin(angleRadians));
        }
    }

    public double CacheStep
    {
        get
        {
            return _cacheStep;
        }
    }

    public double SinLookup(double angleDegrees)
    {
        double value;
        if (_cachedSins.TryGetValue(angleDegrees, out value))
        {
            return value;
        }
        else
        {
            throw new ArgumentException(
                String.Format("No cached Sin value for {0} degrees",
                angleDegrees));
        }
    }

    public double Sin(double angleDegrees)
    {
        double angleRadians = angleDegrees * _factor;
        return Math.Sin(angleRadians);
    }
}
And here's the test/timing code:
SinBuddy buddy = new SinBuddy();
System.Diagnostics.Stopwatch timer = new System.Diagnostics.Stopwatch();
int loops = 200;

// Math.Sin
timer.Start();
for (int i = 0; i < loops; i++)
{
    for (double angleDegrees = 0; angleDegrees <= 360.0;
        angleDegrees += buddy.CacheStep)
    {
        double d = buddy.Sin(angleDegrees);
    }
}
timer.Stop();
MessageBox.Show(timer.ElapsedMilliseconds.ToString());

// lookup
timer.Start();
for (int i = 0; i < loops; i++)
{
    for (double angleDegrees = 0; angleDegrees <= 360.0;
        angleDegrees += buddy.CacheStep)
    {
        double d = buddy.SinLookup(angleDegrees);
    }
}
timer.Stop();
MessageBox.Show(timer.ElapsedMilliseconds.ToString());
Using a step value of 0.01 degrees and looping through the full range of values 200 times (as in this code) takes about 1.4 seconds using Math.Sin, and about 3.2 seconds using a Dictionary lookup table. Lowering the step value to 0.001 or 0.0001 makes the lookup perform even worse against Math.Sin. Also, this result is even more in favor of using Math.Sin, since SinBuddy.Sin does a multiplication to turn the angle in degrees into the angle in radians on every call, while SinBuddy.SinLookup just does a straight lookup.
This is on a cheap laptop (no dual cores or anything fancy). Robert, you da man! (But I still think I should get the check, coz I did the work).
Update 2: It turns out stopping and restarting the Stopwatch doesn't reset the elapsed milliseconds, so the lookup only seemed half as fast because its time included the time for the Math.Sin calls. Also, I reread the question and realized you were talking about caching the values in a simple array rather than using a Dictionary. Here is my modified code (I'm leaving the old code up as a warning to future generations):
public class SinBuddy
{
    private Dictionary<double, double> _cachedSins
        = new Dictionary<double, double>();
    private const double _cacheStep = 0.01;
    private double _factor = Math.PI / 180.0;
    private double[] _arrayedSins;

    public SinBuddy()
    {
        // set up dictionary
        for (double angleDegrees = 0; angleDegrees <= 360.0;
            angleDegrees += _cacheStep)
        {
            double angleRadians = angleDegrees * _factor;
            _cachedSins.Add(angleDegrees, Math.Sin(angleRadians));
        }
        // set up array
        int elements = (int)(360.0 / _cacheStep) + 1;
        _arrayedSins = new double[elements];
        int i = 0;
        for (double angleDegrees = 0; angleDegrees <= 360.0;
            angleDegrees += _cacheStep)
        {
            double angleRadians = angleDegrees * _factor;
            _arrayedSins[i] = Math.Sin(angleRadians);
            i++;
        }
    }

    public double CacheStep
    {
        get
        {
            return _cacheStep;
        }
    }

    public double SinArrayed(double angleDegrees)
    {
        int index = (int)(angleDegrees / _cacheStep);
        return _arrayedSins[index];
    }

    public double SinLookup(double angleDegrees)
    {
        double value;
        if (_cachedSins.TryGetValue(angleDegrees, out value))
        {
            return value;
        }
        else
        {
            throw new ArgumentException(
                String.Format("No cached Sin value for {0} degrees",
                angleDegrees));
        }
    }

    public double Sin(double angleDegrees)
    {
        double angleRadians = angleDegrees * _factor;
        return Math.Sin(angleRadians);
    }
}
And the test/timing code:
SinBuddy buddy = new SinBuddy();
System.Diagnostics.Stopwatch timer = new System.Diagnostics.Stopwatch();
int loops = 200;

// Math.Sin
timer.Start();
for (int i = 0; i < loops; i++)
{
    for (double angleDegrees = 0; angleDegrees <= 360.0;
        angleDegrees += buddy.CacheStep)
    {
        double d = buddy.Sin(angleDegrees);
    }
}
timer.Stop();
MessageBox.Show(timer.ElapsedMilliseconds.ToString());

// lookup
timer = new System.Diagnostics.Stopwatch();
timer.Start();
for (int i = 0; i < loops; i++)
{
    for (double angleDegrees = 0; angleDegrees <= 360.0;
        angleDegrees += buddy.CacheStep)
    {
        double d = buddy.SinLookup(angleDegrees);
    }
}
timer.Stop();
MessageBox.Show(timer.ElapsedMilliseconds.ToString());

// arrayed
timer = new System.Diagnostics.Stopwatch();
timer.Start();
for (int i = 0; i < loops; i++)
{
    for (double angleDegrees = 0; angleDegrees <= 360.0;
        angleDegrees += buddy.CacheStep)
    {
        double d = buddy.SinArrayed(angleDegrees);
    }
}
timer.Stop();
MessageBox.Show(timer.ElapsedMilliseconds.ToString());
These results are quite different. Using Math.Sin takes about 850 milliseconds, the Dictionary lookup table takes about 1300 milliseconds, and the array-based lookup table takes about 600 milliseconds. So it appears that a (properly-written [gulp]) lookup table is actually a bit faster than using Math.Sin, but not by much.
Please verify these results yourself, since I have already demonstrated my incompetence.
It used to be that an array lookup was a good optimization to perform fast trig calculations.
But with cache hits, built-in math coprocessors (which use table lookups) and other performance improvements, it might be best to time your specific code yourself to determine which will perform better.
For performance questions, the only right answer is the one you reach after testing. But, before you test, you need to determine whether the effort of the test is worth your time - meaning that you've identified a performance issue.
If you're just curious, you can easily write a test to compare the speeds. However, you'll need to remember that using memory for the lookup table can affect paging in larger apps. So, even if paging is faster in your small test, it could slow things down in a larger app that uses more memory.
The answer to this depends entirely on how many values are in your lookup table. You say "the domain is between 0.01 and 360.01", but you don't say how many values in that range might be used, or how accurate you need the answers to be. Forgive me for not expecting to see significant digits used to convey implicit meaning in a non-scientific context.
More information is still needed to answer this question. What is the expected distribution of values between 0.01 and 360.01? Are you processing a lot of data other than the simple sin( ) computation?
36,000 double-precision values take over 256 KB in memory; the lookup table is too large to fit in L1 cache on most machines. If you're running straight through the table, you'll miss L1 once per sizeof(cacheline)/sizeof(double) accesses and probably hit L2. If, on the other hand, your table accesses are more or less random, you will miss L1 almost every time you do a lookup.
It also depends a lot on the math library of the platform you're on. Common i386 implementations of the sin function, for example, range from ~40 cycles up to 400 cycles or even more, depending on your exact microarchitecture and library vendor. I haven't timed the Microsoft library, so I don't know exactly where the C# Math.Sin implementation would fall.
Since loads from L2 are generally faster than 40 cycles on a sane platform, one reasonably expects the lookup table to be faster considered in isolation. However, I doubt you're computing sin( ) in isolation; if your arguments to sin( ) jump all over the table, you will be blowing other data needed for other steps of your computation out of the cache; although the sin( ) computation gets faster, the slowdown to other parts of your computation may more than outweigh the speedup. Only careful measurement can really answer this question.
Am I to understand from your other comments that you're doing this as part of a FFT computation? Is there a reason that you need to roll your own FFT instead of using one of the numerous extremely high quality implementations that already exist?
Since you mention Fourier transforms as an application, you might also consider to compute your sines/cosines using the equations
sin(x+y) = sin(x)cos(y) + cos(x)sin(y)
cos(x+y) = cos(x)cos(y) - sin(x)sin(y)
I.e. you can compute sin(n * x), cos(n * x) for n = 0, 1, 2 ... iteratively from sin((n-1) * x), cos((n-1) * x) and the constants sin(x), cos(x) with 4 multiplications.
Of course that only works if you have to evaluate sin(x), cos(x) on an arithmetic sequence.
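A small sketch of that recurrence (the step and count here are made up for illustration): it generates sin(n*x) and cos(n*x) over an arithmetic sequence with four multiplications per step.

double x = 2 * Math.PI / 1024; // hypothetical step, e.g. for a 1024-point transform
double sinX = Math.Sin(x), cosX = Math.Cos(x);
double s = 0.0, c = 1.0; // sin(0), cos(0)
for (int n = 1; n <= 1024; n++)
{
    double sNext = s * cosX + c * sinX; // sin((n-1)*x + x)
    double cNext = c * cosX - s * sinX; // cos((n-1)*x + x)
    s = sNext;
    c = cNext;
    // here s == sin(n*x) and c == cos(n*x), up to accumulated rounding error
}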
Comparing the approaches without the actual implementation is difficult. It depends a lot on how well your tables fit into the caches.
Sorry for grave digging, but there is a good solution for how to make quick indexing of lookup tables:
https://jvm-gaming.org/t/fast-math-sin-cos-lookup-tables/36660
It's in Java, but it takes only a few minutes to port it to C#.
I did tests and got the following results with 100000 iterations:
Math.Sin: 0.043 sec
Mathf.Sin: 0.06 sec (Unity's Mathf lib)
MathTools.Sin: 0.026 sec (lookup-table static class)
Probably in Java it would give a 50x boost (or it did back in 2011, lol), but in C# in 2021 the difference is only about 2x.
Math.Sin is faster. The people who wrote it are smart: they use table lookups when those are accurate and faster, and use the math when that is faster. And there's nothing about that domain that makes it particularly faster; the first thing most trig function implementations do is map down to a favorable domain anyway.
As you may have thousands of values in your lookup table, what you may want to do instead is have a dictionary: when you calculate a value, put it in the dictionary, so each value is calculated only once, and use the C# function to do the calculating.
But, there is no reason to recalculate the same value over and over.
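A minimal sketch of that memoization idea (note that the timings earlier in this thread suggest a Dictionary lookup can cost more than Math.Sin itself, so this only pays off if the same angles really do recur):

Dictionary<double, double> cache = new Dictionary<double, double>();

double CachedSin(double angleRadians)
{
    double value;
    if (!cache.TryGetValue(angleRadians, out value))
    {
        value = Math.Sin(angleRadians);
        cache[angleRadians] = value;
    }
    return value;
}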
