Which is faster, for loop or LINQ - c#

I have two lists of double values, like these:
List<double> list1 = something;
List<double> list2 = somethingElse;
I want to subtract these two lists. Lists are huge and I want to do it as fast as possible. Which of the below methods is faster:
public double CalculateDistance(List<double> list1, List<double> list2)
{
double dist = 0;
for (int i = 0; i != list1.Count; i++)
{
dist += Math.Pow(list1 [i] - list2 [i], 2.0);
}
return dist;
}
or this:
public double CalculateDistance(List<double> list1, List<double> list2)
{
double dist = list1.Zip(list2, (v1, v2) => Math.Pow(v1 - v2, 2.0)).Sum();
return dist;
}
I don't know what LINQ does under the hood, so I'm not sure which one is better.

Neither. Using a loop has less overhead than LINQ, so that is a good start.
Use the < operator in the loop condition, that is the standard way of writing such a loop, so it's more likely that the compiler will recognise it and optimise it properly.
Using Math.Pow to square a number is inefficient. Multiplying the number by itself is somewhere in the ballpark of 100 times faster:
public double CalculateDistance(List<double> list1, List<double> list2) {
double dist = 0;
for (int i = 0; i < list1.Count; i++) {
double n = list1[i] - list2[i];
dist += n * n;
}
return dist;
}
Edit:
With PLINQ you would get better performance for large sets. Testing on my computer, I found that with fewer than 10,000 items PLINQ is not faster. For lists with 10 million items, I got about 40% shorter execution time.
I also found that using a projection I got about 30% shorter execution time than using Zip:
public double CalculateDistance(List<double> list1, List<double> list2) {
return
ParallelEnumerable.Range(0, list1.Count).Select(i => {
double n = list1[i] - list2[i];
return n * n;
}).Sum();
}
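If you want to check these numbers on your own machine, here is a minimal timing sketch. The class and method names (DistanceTiming, LoopSum, ZipSum, ParallelSum, TimeMs) are made up for illustration, and a single Stopwatch run after one warm-up call is only a rough measurement, not a rigorous benchmark.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper class, not part of the original answer.
public static class DistanceTiming
{
    // Plain loop, squaring by multiplication.
    public static double LoopSum(List<double> a, List<double> b)
    {
        double dist = 0;
        for (int i = 0; i < a.Count; i++)
        {
            double n = a[i] - b[i];
            dist += n * n;
        }
        return dist;
    }

    // LINQ Zip version, also squaring by multiplication.
    public static double ZipSum(List<double> a, List<double> b)
    {
        return a.Zip(b, (x, y) => (x - y) * (x - y)).Sum();
    }

    // PLINQ projection over the index range.
    public static double ParallelSum(List<double> a, List<double> b)
    {
        return ParallelEnumerable.Range(0, a.Count)
            .Select(i => { double n = a[i] - b[i]; return n * n; })
            .Sum();
    }

    // Calls f once to warm up (JIT), then times a second call in milliseconds.
    public static double TimeMs(Func<double> f)
    {
        f();
        var sw = Stopwatch.StartNew();
        f();
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
```

For example, DistanceTiming.TimeMs(() => DistanceTiming.LoopSum(list1, list2)) times the loop version. Note that the parallel sum may differ from the sequential one in the last few bits, because floating-point additions happen in a different order.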

Related

Getting a List<int> from an integer which modulo result is equal to 0 without using loop [duplicate]

All numbers that divide evenly into x.
I put in 4 it returns: 4, 2, 1
edit: I know it sounds homeworky. I'm writing a little app to populate some product tables with semi random test data. Two of the properties are ItemMaximum and Item Multiplier. I need to make sure that the multiplier does not create an illogical situation where buying 1 more item would put the order over the maximum allowed. Thus the factors will give a list of valid values for my test data.
edit++:
This is what I went with after all the help from everyone. Thanks again!
edit#: I wrote 3 different versions to see which I liked better and tested them against factoring small numbers and very large numbers. I'll paste the results.
static IEnumerable<int> GetFactors2(int n)
{
return from a in Enumerable.Range(1, n)
where n % a == 0
select a;
}
private IEnumerable<int> GetFactors3(int x)
{
for (int factor = 1; factor * factor <= x; factor++)
{
if (x % factor == 0)
{
yield return factor;
if (factor * factor != x)
yield return x / factor;
}
}
}
private IEnumerable<int> GetFactors1(int x)
{
int max = (int)Math.Sqrt(x); // round down, so perfect squares include their root
for (int factor = 1; factor <= max; factor++)
{
if(x % factor == 0)
{
yield return factor;
if(factor != x / factor) // don't return the square root twice
yield return x / factor;
}
}
}
Results, in ticks.
When factoring the number 20, 5 times each:
GetFactors1: 5,445,881
GetFactors2: 4,308,234
GetFactors3: 2,913,659
When factoring the number 20000, 5 times each:
GetFactors1: 5,644,457
GetFactors2: 12,117,938
GetFactors3: 3,108,182
pseudocode:
Loop from 1 to the square root of the number, call the index "i".
if number mod i is 0, add i and number / i to the list of factors.
Real code:
public List<int> Factor(int number)
{
var factors = new List<int>();
int max = (int)Math.Sqrt(number); // Round down
for (int factor = 1; factor <= max; ++factor) // Test from 1 to the square root, or the int below it, inclusive.
{
if (number % factor == 0)
{
factors.Add(factor);
if (factor != number/factor) // Don't add the square root twice! Thanks Jon
factors.Add(number/factor);
}
}
return factors;
}
As Jon Skeet mentioned, you could implement this as an IEnumerable<int> as well - use yield instead of adding to a list. The advantage with List<int> is that it could be sorted before return if required. Then again, you could get a sorted enumerator with a hybrid approach, yielding the first factor and storing the second one in each iteration of the loop, then yielding each value that was stored in reverse order.
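The hybrid approach described above could look something like this. This is only a sketch: the SortedFactors class and its Get method are names I made up, and it assumes a positive input (it yields nothing for zero or negative numbers).

```csharp
using System.Collections.Generic;

// Hypothetical helper, not from the original answer.
public static class SortedFactors
{
    // Yields the factors of a positive number in ascending order.
    public static IEnumerable<int> Get(int number)
    {
        // The larger member of each factor pair, collected in descending order.
        var large = new List<int>();

        // factor <= number / factor is the same bound as factor <= sqrt(number),
        // but uses only integer arithmetic.
        for (int factor = 1; factor <= number / factor; factor++)
        {
            if (number % factor == 0)
            {
                yield return factor;
                int paired = number / factor;
                if (paired != factor) // don't store the square root twice
                    large.Add(paired);
            }
        }

        // Replay the stored factors in reverse so the output stays sorted.
        for (int i = large.Count - 1; i >= 0; i--)
            yield return large[i];
    }
}
```

The small factors come out of the loop already in ascending order, and the stored partners are in descending order, so walking the stored list backwards gives a fully sorted sequence without an explicit sort.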
You will also want to do something to handle the case where a negative number is passed into the function.
The % (remainder) operator is the one to use here. If x % y == 0 then x is divisible by y. (Assuming 0 < y <= x)
I'd personally implement this as a method returning an IEnumerable<int> using an iterator block.
Very late, but the accepted answer (a while back) did not give the correct results.
Thanks to Merlyn, I now understand the reason for using the square as the 'max'. The corrected sample is below, although the answer from Echostorm seems more complete.
public static IEnumerable<uint> GetFactors(uint x)
{
for (uint i = 1; i * i <= x; i++)
{
if (x % i == 0)
{
yield return i;
if (i != x / i)
yield return x / i;
}
}
}
As extension methods:
public static bool Divides(this int potentialFactor, int i)
{
return i % potentialFactor == 0;
}
public static IEnumerable<int> Factors(this int i)
{
return from potentialFactor in Enumerable.Range(1, i)
where potentialFactor.Divides(i)
select potentialFactor;
}
Here's an example of usage:
foreach (int i in 4.Factors())
{
Console.WriteLine(i);
}
Note that I have optimized for clarity, not for performance. For large values of i this algorithm can take a long time.
Another LINQ-style version, trying to keep the O(sqrt(n)) complexity:
static IEnumerable<int> GetFactors(int n)
{
Debug.Assert(n >= 1);
var pairList = from i in Enumerable.Range(1, (int)Math.Sqrt(n)) // stop at the square root, rounded down
where n % i == 0
select new { A = i, B = n / i };
foreach(var pair in pairList)
{
yield return pair.A;
if (pair.A != pair.B) // don't return the square root twice
yield return pair.B;
}
}
Here it is again, only counting to the square root, as others mentioned. I suppose that people are attracted to that idea if you're hoping to improve performance. I'd rather write elegant code first, and optimize for performance later, after testing my software.
Still, for reference, here it is:
public static bool Divides(this int potentialFactor, int i)
{
return i % potentialFactor == 0;
}
public static IEnumerable<int> Factors(this int i)
{
foreach (int result in from potentialFactor in Enumerable.Range(1, (int)Math.Sqrt(i))
where potentialFactor.Divides(i)
select potentialFactor)
{
yield return result;
if (i / result != result)
{
yield return i / result;
}
}
}
Not only is the result considerably less readable, but the factors come out of order this way, too.
I did it the lazy way. I don't know much, but I've been told that simplicity can sometimes imply elegance. This is one possible way to do it:
public static IEnumerable<int> GetDivisors(int number)
{
var searched = Enumerable.Range(1, number)
.Where((x) => number % x == 0)
.Select(x => number / x);
foreach (var s in searched)
yield return s;
}
EDIT: As Kraang Prime pointed out, this function cannot exceed the limit of an integer and is (admittedly) not the most efficient way to handle this problem.
Wouldn't it also make sense to start at 2 and head towards an upper limit value that's continuously being recalculated based on the number you've just checked? See N/i (where N is the Number you're trying to find the factor of and i is the current number to check...) Ideally, instead of mod, you would use a divide function that returns N/i as well as any remainder it might have. That way you're performing one divide operation to recreate your upper bound as well as the remainder you'll check for even division.
Math.DivRem
http://msdn.microsoft.com/en-us/library/wwc1t3y1.aspx
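A rough sketch of that idea follows, assuming a positive input. The DivRemFactors class name is made up. Math.DivRem(int, int, out int) returns the quotient and hands back the remainder through the out parameter, so one call gives both the divisibility test and the new upper bound.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the DivRem idea.
public static class DivRemFactors
{
    public static IEnumerable<int> Get(int number)
    {
        if (number < 1)
            yield break; // only positive numbers are handled in this sketch

        // The trivial factors.
        yield return 1;
        if (number > 1)
            yield return number;

        for (int i = 2; ; i++)
        {
            int remainder;
            int quotient = Math.DivRem(number, i, out remainder);

            // The quotient is the recalculated upper bound: once i passes it,
            // every remaining pair has already been seen.
            if (i > quotient)
                yield break;

            if (remainder == 0)
            {
                yield return i;
                if (quotient != i) // don't return the square root twice
                    yield return quotient;
            }
        }
    }
}
```

The factors come out in pairs rather than sorted, for example 1, 20, 2, 10, 4, 5 for the number 20.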
If you use doubles, the following works: use a for loop iterating from 1 up to the number you want to factor. In each iteration, divide the number to be factored by i. If (number / i) % 1 == 0, then i is a factor, as is the quotient of number / i. Put one or both of these in a list, and you have all of the factors.
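And one more solution. Note that this returns the prime factorization of n (with repeated primes) rather than every divisor. Not sure if it has any advantages other than being readable: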
List<int> GetFactors(int n)
{
var f = new List<int>() { 1 }; // adding trivial factor, optional
int m = n;
int i = 2;
while (m > 1)
{
if (m % i == 0)
{
f.Add(i);
m /= i;
}
else i++;
}
// f.Add(n); // adding trivial factor, optional
return f;
}
I came here just looking for a solution to this problem for myself. After examining the previous replies I figured it would be fair to toss out an answer of my own even if I might be a bit late to the party.
No factor of a number, other than the number itself, can be larger than one half of that number. There is no need to deal with floating-point values or operations like a square root. Additionally, finding one factor of a number automatically finds another: just find one and you can return both by dividing the original number by the one you found.
I doubt I'll need to use checks for my own implementation but I'm including them just for completeness (at least partially).
public static IEnumerable<int> Factors(int Num)
{
int ToFactor = Num;
if(ToFactor == 0)
{ // Zero has only itself and one as factors but this can't be discovered through division
// obviously.
yield return 0;
yield return 1;
yield break; // A plain "return value;" is not allowed inside an iterator block.
}
if(ToFactor < 0)
{// Negative numbers are simply being treated here as just adding -1 to the list of possible
// factors. In practice it can be argued that the factors of a number can be both positive
// and negative, i.e. 4 factors into the following pairings of factors:
// (-4, -1), (-2, -2), (1, 4), (2, 2) but normally when you factor numbers you are only
// asking for the positive factors. By adding a -1 to the list it allows flagging the
// series as originating with a negative value and the implementer can use that
// information as needed.
ToFactor = -ToFactor;
yield return -1;
}
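int FactorLimit = Math.Max(1, ToFactor / 2); // No factor other than the number itself exceeds half of it.
for(int PossibleFactor = 1; PossibleFactor <= FactorLimit; PossibleFactor++)
{
int PairedFactor = ToFactor / PossibleFactor;
if(PossibleFactor > PairedFactor)
yield break; // Every remaining pair has already been returned.
if(ToFactor % PossibleFactor == 0)
{
yield return PossibleFactor;
if(PossibleFactor != PairedFactor) // Don't return a square root twice.
yield return PairedFactor;
}
}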
}
A program to get the prime factors of whole numbers, in JavaScript:
function getFactors(num1){
var factors = [];
var divider = 2;
while(num1 != 1){
if(num1 % divider == 0){
num1 = num1 / divider;
factors.push(divider);
}
else{
divider++;
}
}
console.log(factors);
return factors;
}
getFactors(20);
In fact, the accepted answer (proposed by chris, fixed by Jon) does not need to check in each iteration whether the factor is the square root. That check adds an unnecessary Boolean test and a division, which could slow the method down when the integer is large. Just keep max as a double (don't cast it to an int), handle the square root once before the loop, and change the loop to be exclusive rather than inclusive.
private static List<int> Factor(int number)
{
var factors = new List<int>();
var max = Math.Sqrt(number); // Keep as a double; do not cast to an int.
if (max % 1 == 0) // A whole-number square root is a factor; add it once.
factors.Add((int)max);
for (int factor = 1; factor < max; ++factor) // Exclusive: test from 1 up to, but not including, the square root.
{
if (number % factor == 0)
{
factors.Add(factor);
//if (factor != number / factor) // (Don't need check anymore) - Don't add the square root twice! Thanks Jon
factors.Add(number / factor);
}
}
return factors;
}
Usage
Factor(16)
// 4 1 16 2 8
Factor(20)
//1 20 2 10 4 5
And this is the extension version of the method for int type:
public static class IntExtensions
{
public static IEnumerable<int> Factors(this int value)
{
// Return 2 obvious factors
yield return 1;
if (value == 1)
yield break; // 1 has no other factors
yield return value;
// Return square root if number is a perfect square
var max = Math.Sqrt(value);
if (max % 1 == 0)
yield return (int)max;
// Return rest of the factors
for (int i = 2; i < max; i++)
{
if (value % i == 0)
{
yield return i;
yield return value / i;
}
}
}
}
Usage
16.Factors()
// 1 16 4 2 8
20.Factors()
//1 20 2 10 4 5
Linq solution:
IEnumerable<int> GetFactors(int n)
{
Debug.Assert(n >= 1);
return from i in Enumerable.Range(1, n)
where n % i == 0
select i;
}

List<int> separated into thirds

Assuming I have a list of numbers, which could be any amount, realistically over 15.
I want to separate that list of numbers into three groups depending on their size, small, medium, and large for instance.
What is the best way of achieving this?
I've written out the below, is it necessary to make my own function as per below, or is there anything existing that I can utilise in .NET?
public static List<int> OrderByThree (List<int> list)
{
list.Sort();
int n = list.Count();
int small = n / 3;
int medium = (2 * n) / 3;
int large = n;
// depending if the number is lower/higher than s/m/l,
// chuck into group via series of if statements
return list;
}
Example
Say I have a list of numbers, 1-15 for instance. I want 1-5 in small, 6-10 in medium and 11-15 in large. However, I won't know how many numbers there are at the start. No dramas; I was hoping to use list.Count to work out the divisions in my own function.
Since you have the list sorted already, you can use some LINQ to get the results. I'm assuming a right-closed interval here.
list.Sort();
int n = list.Count();
var smallGroup = list.TakeWhile(x => (x <= n / 3)).ToList();
var middleGroup = list.Skip(smallGroup.Count).TakeWhile(x => (x <= (2 * n) / 3)).ToList();
var largeGroup = list.Skip(smallGroup.Count + middleGroup.Count).ToList();
EDIT
As Steve Padmore commented, you probably will want to return a list of lists (List<List<int>>) from your method, rather than just List<int>.
return new List<List<int>> { smallGroup, middleGroup, largeGroup };
This would be a simple way of doing it:
var result = list.GroupBy (x =>
{
if(x <= small) return 1;
if(x <= medium) return 2;
return 3;
});
Or:
var result = list.GroupBy (x => x <= small ? 1 : x <= medium ? 2 : 3);
(This does not require the list to be sorted)
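One thing to watch: the snippets above compare the values themselves against n / 3 and 2 * n / 3, which works for the 1-15 example only because each value equals its position. If the numbers are arbitrary, you probably want to split the sorted list by position instead. A minimal sketch, with a made-up Thirds.Split helper:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: splits by position in the sorted order, not by value.
public static class Thirds
{
    public static List<List<int>> Split(List<int> list)
    {
        // OrderBy leaves the caller's list untouched.
        var sorted = list.OrderBy(x => x).ToList();
        int n = sorted.Count;
        int firstCut = n / 3;        // index where the medium group starts
        int secondCut = (2 * n) / 3; // index where the large group starts

        return new List<List<int>>
        {
            sorted.Take(firstCut).ToList(),                            // small
            sorted.Skip(firstCut).Take(secondCut - firstCut).ToList(), // medium
            sorted.Skip(secondCut).ToList()                            // large
        };
    }
}
```

When the count is not a multiple of three, the integer division means the later groups pick up the extra items.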

How to multiply all elements in a list of doubles?

How do I multiply the contents of a List<double>?
List<double> mult=new List<double>{3, 5, 10};
So far I have:
double r=0.0;
for(int i=0;i<mult.Count;i++)
{
r=mult[i]*mult[(i+1)];
}
To fix your loop, start with 1.0 and multiply each item in the list, like this:
double r = 1.0;
for(int i = 0; i < mult.Count; i++)
{
r = r * mult[i]; // or equivalently r *= mult[i];
}
But for simplicity, you could use a little Linq with the Aggregate extension method:
double r = mult.Aggregate((a, x) => a * x);
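One caveat: Aggregate without a seed throws an InvalidOperationException when the list is empty. If that can happen, the overload that takes a seed avoids it. A small sketch (the ProductExample class name is just for illustration):

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper around the seeded Aggregate overload.
public static class ProductExample
{
    // Seeded with 1.0, so an empty sequence yields 1.0 instead of throwing.
    public static double Product(IEnumerable<double> values)
    {
        return values.Aggregate(1.0, (acc, x) => acc * x);
    }
}
```

With the list from the question, ProductExample.Product(mult) gives 150.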
What do you mean by multiply? If you want to calculate the product, then your code is wrong and the correct code is:
double r=1.0;
for(int i=0;i<mult.Count;i++)
{
r *= mult[i];
}

Math.Log vs multiplication complexity in terms of geometric average which is better?

I want to find the geometric average of some data, and performance does matter.
Which one should I pick:
1. Keep multiplying into a single variable and take the Nth root at the end of the calculation.
X = MUL(x[i])^(1/N)
Thus, O(N) x multiplication complexity + O(1) x Nth root.
2. Use logarithms.
X = e ^ { 1/N * SUM(log(x[i])) }
Thus, O(N) x logarithm complexity + O(1) x division by N + O(1) x exponential.
3. A specialized algorithm for the geometric average. Please tell me if there is one.
I thought I would try to benchmark this and get a comparison, here is my attempt.
Comparing was difficult since the list of numbers needed to be large enough to make timing it reasonable, so N is large. In my test N = 50,000,000 elements.
However, multiplying together lots of numbers that are greater than 1 overflows the double storing the product, while multiplying together numbers less than 1 gives a product that underflows towards zero, so taking the Nth root gives zero as well.
Just a couple more things: Make sure none of your elements are zero, and the Log approach doesn't work for negative elements.
(The multiply would work without overflow if C# had a BigDecimal class with an Nth root function.)
Anyway, in my code each element is between 1 and 1.00001
On the other hand, the log approach had no problems with overflows, or underflows.
Here's the code:
using System;
using System.Collections.Generic;
using System.Diagnostics;
class Program
{
static void Main(string[] args)
{
Console.WriteLine("Starting...");
Console.WriteLine("");
Stopwatch watch1 = new Stopwatch();
Stopwatch watch2 = new Stopwatch();
List<double> list = getList();
double prod = 1;
double mean1 = -1;
for (int c = 0; c < 2; c++)
{
watch1.Reset();
watch1.Start();
prod = 1;
foreach (double d in list)
{
prod *= d;
}
mean1 = Math.Pow(prod, 1.0 / (double)list.Count);
watch1.Stop();
}
double mean2 = -1;
for (int c = 0; c < 2; c++)
{
watch2.Reset();
watch2.Start();
double sum = 0;
foreach (double d in list)
{
double logged = Math.Log(d, 2);
sum += logged;
}
sum *= 1.0 / (double)list.Count;
mean2 = Math.Pow(2.0, sum);
watch2.Stop();
}
Console.WriteLine("First way gave: " + mean1);
Console.WriteLine("Other way gave: " + mean2);
Console.WriteLine("First way took: " + watch1.ElapsedMilliseconds + " milliseconds.");
Console.WriteLine("Other way took: " + watch2.ElapsedMilliseconds + " milliseconds.");
Console.WriteLine("");
Console.WriteLine("Press enter to exit");
Console.ReadLine();
}
private static List<double> getList()
{
List<double> result = new List<double>();
Random rand = new Random();
for (int i = 0; i < 50000000; i++)
{
result.Add( rand.NextDouble() / 100000.0 + 1);
}
return result;
}
}
On my computer, the output shows that both geometric means are the same, but:
Multiply way took: 466 milliseconds
Logarithm way took: 3245 milliseconds
So, the multiply appears to be faster.
But multiply is very problematic with overflow and underflow, so I would recommend the Log approach, unless you can guarantee the product won't overflow and that the product won't get too close to zero.
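For reference, the log approach pulled out into a standalone function might look like this. It is a sketch only: the GeometricMean class and ViaLogs method are names I picked, it uses the natural logarithm instead of base 2 (any base works as long as the exponential matches), and it rejects zero or negative elements, since the log approach cannot handle them.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for the logarithm-based geometric mean.
public static class GeometricMean
{
    public static double ViaLogs(IReadOnlyList<double> values)
    {
        if (values.Count == 0)
            throw new ArgumentException("At least one element is required.", nameof(values));

        double sum = 0;
        foreach (double v in values)
        {
            // Math.Log is only meaningful here for strictly positive input.
            if (v <= 0)
                throw new ArgumentOutOfRangeException(nameof(values), "All elements must be positive.");
            sum += Math.Log(v);
        }

        // exp(mean of logs) equals the Nth root of the product,
        // without ever forming the product itself.
        return Math.Exp(sum / values.Count);
    }
}
```

Because only the sum of the logs is accumulated, this works even where the direct product would overflow, for example a thousand elements each equal to 1e300.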

How can I convert this divide and conquer code to compare one point to a list of points?

I found this code on the website http://rosettacode.org/wiki/Closest-pair_problem and I adopted the C# version of the divide and conquer method of finding the closest pair of points but what I am trying to do is adapt it for use to only find the closest point to one specific point. I have googled quite a bit and searched this website to find examples but none quite like this. I am not entirely sure what to change so that it only checks the list against one point rather than checking the list to find the two closest. I'd like to make my program operate as fast as possible because it could be searching a list of several thousand Points to find the closest to my current coordinate Point.
public class Segment
{
public Segment(PointF p1, PointF p2)
{
P1 = p1;
P2 = p2;
}
public readonly PointF P1;
public readonly PointF P2;
public float Length()
{
return (float)Math.Sqrt(LengthSquared());
}
public float LengthSquared()
{
return (P1.X - P2.X) * (P1.X - P2.X)
+ (P1.Y - P2.Y) * (P1.Y - P2.Y);
}
}
public static Segment Closest_BruteForce(List<PointF> points)
{
int n = points.Count;
var result = Enumerable.Range(0, n - 1)
.SelectMany(i => Enumerable.Range(i + 1, n - (i + 1))
.Select(j => new Segment(points[i], points[j])))
.OrderBy(seg => seg.LengthSquared())
.First();
return result;
}
public static Segment MyClosestDivide(List<PointF> points)
{
return MyClosestRec(points.OrderBy(p => p.X).ToList());
}
private static Segment MyClosestRec(List<PointF> pointsByX)
{
int count = pointsByX.Count;
if (count <= 4)
return Closest_BruteForce(pointsByX);
// left and right lists sorted by X, as order retained from full list
var leftByX = pointsByX.Take(count / 2).ToList();
var leftResult = MyClosestRec(leftByX);
var rightByX = pointsByX.Skip(count / 2).ToList();
var rightResult = MyClosestRec(rightByX);
var result = rightResult.Length() < leftResult.Length() ? rightResult : leftResult;
// There may be a shorter distance that crosses the divider
// Thus, extract all the points within result.Length either side
var midX = leftByX.Last().X;
var bandWidth = result.Length();
var inBandByX = pointsByX.Where(p => Math.Abs(midX - p.X) <= bandWidth);
// Sort by Y, so we can efficiently check for closer pairs
var inBandByY = inBandByX.OrderBy(p => p.Y).ToArray();
int iLast = inBandByY.Length - 1;
for (int i = 0; i < iLast; i++)
{
var pLower = inBandByY[i];
for (int j = i + 1; j <= iLast; j++)
{
var pUpper = inBandByY[j];
// Comparing each point to successively increasing Y values
// Thus, can terminate as soon as deltaY is greater than best result
if ((pUpper.Y - pLower.Y) >= result.Length())
break;
Segment segment = new Segment(pLower, pUpper);
if (segment.Length() < result.Length())
result = segment;// new Segment(pLower, pUpper);
}
}
return result;
}
I used this code in my program to see the actual difference in speed and divide and conquer easily wins.
var randomizer = new Random(10);
var points = Enumerable.Range(0, 10000).Select(i => new PointF((float)randomizer.NextDouble(), (float)randomizer.NextDouble())).ToList();
Stopwatch sw = Stopwatch.StartNew();
var r1 = Closest_BruteForce(points);
sw.Stop();
//Debugger.Log(1, "", string.Format("Time used (Brute force) (float): {0} ms", sw.Elapsed.TotalMilliseconds));
richTextBox.AppendText(string.Format("Time used (Brute force) (float): {0} ms", sw.Elapsed.TotalMilliseconds));
Stopwatch sw2 = Stopwatch.StartNew();
var result2 = MyClosestDivide(points);
sw2.Stop();
//Debugger.Log(1, "", string.Format("Time used (Divide & Conquer): {0} ms", sw2.Elapsed.TotalMilliseconds));
richTextBox.AppendText(string.Format("Time used (Divide & Conquer): {0} ms", sw2.Elapsed.TotalMilliseconds));
//Assert.Equal(r1.Length(), result2.Length());
You can store the points in a better data structure that takes advantage of their position. Something like a quadtree.
The divide and conquer algorithm that you are trying to use doesn't really apply to this problem.
Don't use this algorithm at all, just go through the list one at a time comparing the distance to your reference point and at the end return the point that was the closest. This will be O(n).
You can probably add some extra speed ups but this should be good enough.
I can write some example code if you want.
You're mixing up two different problems. The only reason divide and conquer for the closest pair problem is faster than brute force is that it avoids comparing every point to every other point, so that it gets O(n log n) instead of O(n * n). But finding the closest point to just one point is just O(n). How can you find the closest point in a list of n points, while examining less than n points? What you're trying to do doesn't even make sense.
I can't say why your divide and conquer runs in less time than your brute force; maybe the linq implementation runs slower. But I think you'll find two things: 1) Even if, in absolute terms, your implementation of divide and conquer for 1 point runs in less time than your implementation of brute force for 1 point, they still have the same O(n). 2) If you just try a simple foreach loop and record the lowest distance squared, you'll get even better absolute time than your divide and conquer - and, it will still be O(n).
public static float LengthSquared(PointF P1, PointF P2)
{
return (P1.X - P2.X) * (P1.X - P2.X)
+ (P1.Y - P2.Y) * (P1.Y - P2.Y);
}
If, as your question states, you want to compare 1 (known) point to a list of points to find the closest then use this code.
public static Segment Closest_BruteForce(PointF P1, List<PointF> points)
{
PointF closest = P1; // PointF is a struct and cannot be null; stays P1 if no other point is found
float minDist = float.MaxValue;
foreach(PointF P2 in points)
{
if(P1 != P2)
{
float temp = LengthSquared(P1, P2);
if(temp < minDist)
{
minDist = temp;
closest = P2;
}
}
}
return new Segment(P1, closest);
}
However, if as your example shows, you want to find the closest 2 points from a list of points try the below.
public static Segment Closest_BruteForce(List<PointF> points)
{
PointF closest1 = default(PointF); // initialised so the compiler accepts the use after the loops
PointF closest2 = default(PointF);
float minDist = float.MaxValue;
for(int x=0; x<points.Count; x++)
{
PointF P1 = points[x];
for(int y = x + 1; y<points.Count; y++)
{
PointF P2 = points[y];
float temp = LengthSquared(P1, P2);
if(temp < minDist)
{
minDist = temp;
closest1 = P1;
closest2 = P2;
}
}
}
return new Segment(closest1, closest2);
}
Note: the code above was written in the browser and may have some syntax errors.
EDIT Odd... is this an acceptable answer or not? Down-votes without explanation, oh well.
