Efficiency of LINQ in simple operations - c#

I have recently discovered LINQ and I find it very interesting to use. I currently have the following function, and I am not sure whether a LINQ version would be more efficient while still producing the same output.
Can you please tell me your opinion about this?
The function simply removes punctuation in a very simple manner:
private static byte[] FilterText(byte[] arr)
{
    List<byte> filteredBytes = new List<byte>();
    int j = 0; //index for filteredArray
    for (int i = 0; i < arr.Length; i++)
    {
        if ((arr[i] >= 65 && arr[i] <= 90) || (arr[i] >= 97 && arr[i] <= 122) || arr[i] == 10 || arr[i] == 13 || arr[i] == 32)
        {
            filteredBytes.Insert(j, arr[i]);
            j++;
        }
    }
    //return the filtered content of the buffer
    return filteredBytes.ToArray();
}
The LINQ alternative:
private static byte[] FilterText2(byte[] arr)
{
    var x = from a in arr
            where ((a >= 65 && a <= 90) || (a >= 97 && a <= 122) || a == 10 || a == 13 || a == 32)
            select a;
    return x.ToArray();
}

LINQ is usually slightly less efficient than simple loops and procedural code, but the difference is typically small, and the conciseness and ease of reading usually make it worth converting simple projections and filtering to LINQ.
If the performance really matters, measure it and decide for yourself if the performance of the LINQ code is adequate.
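As a rough illustration of "measure it", here is a minimal timing sketch using Stopwatch. The class and method names (FilterBenchmark, FilterWithLoop, FilterWithLinq, TimeIt) are made up for this example, and a dedicated benchmarking library would give more trustworthy numbers than this hand-rolled loop.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class FilterBenchmark
{
    // Same acceptance rule as the question: ASCII letters, LF, CR and space.
    public static bool IsAccepted(byte b) =>
        (b >= 65 && b <= 90) || (b >= 97 && b <= 122) || b == 10 || b == 13 || b == 32;

    // Procedural version: one pass, appending accepted bytes to a list.
    public static byte[] FilterWithLoop(byte[] arr)
    {
        var filtered = new List<byte>(arr.Length);
        foreach (byte b in arr)
        {
            if (IsAccepted(b)) filtered.Add(b);
        }
        return filtered.ToArray();
    }

    // LINQ version of the same filter.
    public static byte[] FilterWithLinq(byte[] arr) => arr.Where(IsAccepted).ToArray();

    // Runs the filter repeatedly and returns the elapsed milliseconds.
    public static long TimeIt(Func<byte[], byte[]> filter, byte[] input, int iterations)
    {
        filter(input); // warm-up call so JIT compilation is not part of the timing
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) filter(input);
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }
}
```

Calling FilterBenchmark.TimeIt(FilterBenchmark.FilterWithLoop, data, 1000) and the same with FilterWithLinq on your own data gives two numbers to compare; the results depend heavily on input size and runtime version.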

LINQ is great for keeping things simple. Performance-wise, it can really become a problem if you start doing a lot of conversions to lists, arrays, and so on:
MyObject.Where(...).ToList().Something().ToList().SomethingElse().ToList();
This is well known to be a killer. Try to convert to a final list as late as possible.

Screw performance, LINQ is awesome because of this:
private static bool IsAccepted(byte b)
{
    return (65 <= b && b <= 90) ||
           (97 <= b && b <= 122) ||
           b == 10 || b == 13 || b == 32;
}
arr.Where(IsAccepted).ToArray(); // equivalent to FilterText(arr)
I.e. you do not write the how, but just the what. Also, it's about as fast (slow) as the other method which you presented: Where(..) gets evaluated lazily in ToArray() which internally creates a List and converts that to an Array iirc.
And by the way, strings are Unicode in C#, so don't use this to do some simple string formatting (there are far nicer alternatives for that).

For the most part, I agree with @MarkByers. LINQ will be a little less efficient than procedural code. For LINQ to Objects, the overhead generally comes from delegate invocations and iterator allocations; compiling an expression tree only comes into play with IQueryable providers. Nevertheless, the readability and time savings are worth the hit in 99% of cases. When you encounter a performance issue, benchmark, modify, and re-benchmark.
With that said, LINQ is closely related to lambdas and anonymous delegates. These features are often talked about as if they were the same thing. There are cases where these constructs can be faster than procedural code. It looks like your example can be one of those cases. I would rewrite your code as follows:
private static byte[] FilterText2(byte[] arr) {
    return arr.Where(a => (a >= 65 && a <= 90) ||
                          (a >= 97 && a <= 122) ||
                          a == 10 || a == 13 || a == 32
                    ).ToArray();
}
Again, do some benchmarks for your specific scenario, as YMMV. A lot of ink has been spilled on which is faster and under what scenarios. Here is some of that ink:
http://blog.jerrynixon.com/2010/02/revisiting-c-loop-performance.html
http://blog.thijssen.ch/2009/02/linq-vs-lambda-vs-loop-performance-test.html
In .NET, which loop runs faster, 'for' or 'foreach'?
When not to use lambda expressions

Many LINQ statements are easily parallelizable. Just add AsParallel() to the beginning of a query. You can also add AsOrdered() if you want the original order to be preserved at the expense of some performance. For example, the following LINQ statement:
arr.Where(IsAccepted).ToArray();
can be written as:
arr.AsParallel().AsOrdered().Where(IsAccepted).ToArray();
You just have to make sure its overhead doesn't outweigh its benefits:
var queryA = from num in numberList.AsParallel()
             select ExpensiveFunction(num); //good for PLINQ
var queryB = from num in numberList.AsParallel()
             where num % 2 > 0
             select num; //not as good for PLINQ

Well-written imperative code will generally be more time- and space-efficient than well-written declarative code, because the declarative code must be translated into imperative code (unless you own a Prolog machine ... which you probably don't, because you are asking about .NET :-) ).
But if you can solve a problem using LINQ in a simpler and more readable way than using loops, it's worth it. When you see something like
var actualPrices = allPrices
    .Where(price => price.ValidFrom <= today && price.ValidTo >= today)
    .Select(price => price.PriceInUSD)
    .ToList();
it's "one line" whose purpose is obvious at first sight. Declaring a new collection, looping through the old one, writing an if, and adding something to the new one is not. So it's a win if you don't need to save every millisecond (which you probably don't, because you are using .NET and not C with embedded ASM). And LINQ is highly optimized, with several implementations (one for collections, one for XML, one for SQL ...), so it is generally not much slower. No reason NOT to use it.
Some LINQ expressions can be easily parallelized using Parallel LINQ, almost "for free" (= no more code, but the parallelism overhead is still there, so account for it).

Related

Range of Integers

C#. I would like to compare a random number to a guess.
If the guess is 3 more or 3 less than the random number , the program should show the statement
Console.WriteLine("Almost right");
Can I write like this?
If (randomnumber < guess+3 | randomnumber> guess-3);
Console.Writeln ("Almost right")
I am not using array.
Is there a more efficient way to write the code?
You are on the right track.
When you post code here, you can and should format it as code; read the Markdown spec or get accustomed to the editor here at Stack Overflow. Formatted code looks like:
If (randomnumber < guess+3 | randomnumber> guess-3); Console.Writeln ("Almost right")
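You should then write real code, because what you posted is closer to C#-like pseudocode. Written correctly, it is:
if (randomnumber < guess + 3 && randomnumber > guess - 3)
{
    Console.WriteLine("Almost right");
}
Check the logical operators in C#: both bounds must hold, so you need the conditional AND operator &&. With | or || (OR) the condition is true for every number. Also drop the semicolon after the if condition, which would end the statement early.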
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/boolean-logical-operators
Performance-wise this is fine. You could write a specific method like
static bool IsRoughly(int x, int y)
{
    return x < y + 3 && y < x + 3;
}
This brings the essence of your logic into the light. Finally, you have to think about the corner cases: y + 3 overflows when y is within 3 of int.MaxValue, so values near the limits of int can be misclassified.
With C# 9 you can express this in an easily readable way using relational patterns. This is just an alternative way to do the same thing. Patterns only accept constants, so compare the difference against constant bounds:
if ((randomnumber - guess) is > -3 and < 3)
{
    Console.WriteLine("Almost right");
}
Another alternative is to use Enumerable.Range, whose second argument is a count rather than an end value:
if (Enumerable.Range(guess - 2, 5).Contains(randomnumber))
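The range check can also be written as a single distance comparison. The sketch below is one possible shape, not the only one: GuessCheck, IsAlmostRight and the tolerance parameter are hypothetical names, and the strict bound (difference smaller than 3) follows the asker's original comparison. Widening to long sidesteps the overflow corner case near int.MaxValue.

```csharp
using System;

public static class GuessCheck
{
    // Hypothetical helper: true when guess lies strictly within
    // `tolerance` of target (difference of 0, 1 or 2 for tolerance 3).
    public static bool IsAlmostRight(int target, int guess, int tolerance = 3)
    {
        // Subtract as long so that extreme int values cannot overflow.
        long difference = (long)target - guess;
        return Math.Abs(difference) < tolerance;
    }
}
```

For example, GuessCheck.IsAlmostRight(10, 12) is true, while GuessCheck.IsAlmostRight(10, 13) is false.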

Fermat primality test

I have tried to write a code for Fermat primality test, but apparently failed.
So if I understood well: if p is prime then ((a^p)-a)%p=0 where p%a!=0.
My code seems to be OK, therefore most likely I misunderstood the basics. What am I missing here?
private bool IsPrime(int candidate)
{
//checking if candidate = 0 || 1 || 2
int a = candidate + 1; //candidate can't be divisor of candidate+1
if ((Math.Pow(a, candidate) - a) % candidate == 0) return true;
return false;
}
According to the Wikipedia article on the Fermat primality test, you must choose an a that is less than the candidate you are testing, not more.
Furthermore, as MattW commented, testing only a single a won't give you a conclusive answer as to whether the candidate is prime. You must test many possible values of a before you can decide that a number is probably prime. And even then, some numbers may appear to be prime but actually be composite.
Your basic algorithm is correct, though you will have to use a larger data type than int if you want to do this for non-trivial numbers.
You should not implement the modular exponentiation in the way that you did, because the intermediate result is huge. Here is the square-and-multiply algorithm for modular exponentiation:
function powerMod(b, e, m)
    x := 1
    while e > 0
        if e%2 == 1
            x, e := (x*b)%m, e-1
        else
            b, e := (b*b)%m, e//2
    return x
As an example, 437^13 (mod 1741) = 819. If you use the algorithm shown above, no intermediate result will be greater than 1740 * 1740 = 3027600. But if you perform the exponentiation first, the intermediate result of 437^13 is 21196232792890476235164446315006597, which you probably want to avoid.
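For reference, here is how the square-and-multiply idea might look in C#. This is a sketch under stated assumptions: the names ModMath, PowerMod and PassesFermat are invented for the example, the loop uses the common variant that squares on every step rather than the exact branching of the pseudocode, and it assumes a non-negative exponent and a modulus small enough (about 3 billion or less) that products fit in a long.

```csharp
using System;

public static class ModMath
{
    // Square-and-multiply: computes (b^e) mod m without huge intermediates.
    // Assumes e >= 0 and 0 < m <= ~3e9 so that every product fits in a long.
    public static long PowerMod(long b, long e, long m)
    {
        if (m == 1) return 0;
        long x = 1;
        b %= m;
        while (e > 0)
        {
            if ((e & 1) == 1) x = (x * b) % m; // multiply when the low bit is set
            b = (b * b) % m;                   // square for the next bit
            e >>= 1;
        }
        return x;
    }

    // Fermat check for a single base a (1 < a < n): true means "possibly prime".
    public static bool PassesFermat(long n, long a) => PowerMod(a, n - 1, n) == 1;
}
```

ModMath.PowerMod(437, 13, 1741) returns 819, matching the example above. For larger moduli, System.Numerics.BigInteger.ModPow does the same job without the overflow limit.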
Even with all of that, the Fermat test is imperfect. There are some composite numbers, the Carmichael numbers, that will always report prime no matter what witness you choose. Look for the Miller-Rabin test if you want something that will work better. I modestly recommend this essay on Programming with Prime Numbers at my blog.
You are dealing with very large numbers and trying to store them in a double, which is only 64 bits.
The double will do the best it can to hold your number, but you are going to lose some accuracy.
An alternative approach:
Remember that the mod operator can be applied multiple times, and still give the same result.
So, to avoid getting massive numbers you could apply the mod operator during the calculation of your power.
Something like:
private bool IsPrime(int candidate)
{
    //checking if candidate = 0 || 1 || 2
    int a = candidate - 1; //candidate can't be divisor of candidate - 1
    //Caveat: a = candidate - 1 is congruent to -1, so every odd candidate
    //passes this check; test smaller values of a as well.
    long result = 1; //long, so that result * a cannot overflow
    for (int i = 0; i < candidate; i++)
    {
        result = result * a;
        //Notice that without the following line,
        //this method is essentially the same as your own.
        //All this line does is keep the numbers small and manageable.
        result = result % candidate;
    }
    result -= a;
    return result == 0;
}

Evaluating six conditions efficiently and clearly

I have the following (simplified) conditions that need to be validated for a form I am writing:
a > b
a > c
a > d
b > c
b > d
c > d
Graphically, this can be seen as: (diagram not preserved)
The user has freedom to enter values for a, b, c, and d, which is why they need to be validated to make sure they obey those rules. The problem I am having is writing something that clearly and efficiently evaluates each statement. Of course, the most obvious way would be to evaluate each statement separately as an if-statement. Unfortunately, this takes up a lot of lines of code, and I'd rather avoid cluttering up the method with a bunch of if-blocks that do almost the same thing. I did come up with the following nested for-loop solution, using arrays. I built an array of the values, and looped over it twice (demonstrated in Python-like pseudo-code):
A = [a, b, c, d]
for i in range(3):
    for j in range(i, 4):
        if i > j and A[i] >= A[j]:
            print("A[i] must be less than A[j]")
        else if i < j and A[i] <= A[j]:
            print("A[j] must be greater than A[i]")
The problem I have with this solution is it is hard to read and understand - the solution just isn't clear.
I have this nagging feeling that there is a better, clearer answer out there, but I can't think of it for the life of me. This isn't homework or anything - I am actually working on a project and this problem (or subtle variations of it) arose more than once, so I would like to make a reusable solution that is clear and efficient. Any input would be appreciated. Thanks!
if a > b > c > d:
    do ok
else:
    for i in range(3):
        if A[i] <= A[i+1]:
            print(A[i], 'not greater than', A[i+1])
If you can't assume transitivity of comparison:
from itertools import combinations
for x, y in combinations(A, 2):
    if x <= y:
        print("{} not greater than {}".format(x, y))
Otherwise, f p's solution is optimal.
You could create delegates for each check and add these to an array. Create appropriate delegates, like bool LargerCheck(a, b, string error), and add them to an array which you can loop through...
That is a lot of work, though, and more complex, even if more readable. I think I would just hide the messy checks in a single validation block and have an easily readable single check in the normal code. Something like this:
// Simple readable check in normal code
if (!ValidationOk(a, b, c, d, response))
{
    ShowResponse(response);
    return; // stop processing the form
}
// messy routine (int is assumed here; use the form's actual value type)
private bool ValidationOk(int a, int b, int c, int d, List<string> valerrors)
{
    valerrors.Clear();
    if (a <= b) valerrors.Add("a must be greater than b");
    if (a <= c) valerrors.Add("a must be greater than c");
    ....
    return valerrors.Count == 0;
}
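If the values are plain numbers, where greater-than is transitive, the six checks collapse into three neighbour comparisons. The following is only a sketch of that idea: OrderValidator, Validate and the names parameter are hypothetical, and int is assumed as the value type.

```csharp
using System;
using System.Collections.Generic;

public static class OrderValidator
{
    // Checks that the values are strictly descending. Because > is transitive
    // for numbers, comparing neighbours covers every pairwise condition.
    public static List<string> Validate(IReadOnlyList<int> values, IReadOnlyList<string> names)
    {
        var errors = new List<string>();
        for (int i = 0; i < values.Count - 1; i++)
        {
            if (values[i] <= values[i + 1])
                errors.Add($"{names[i]} must be greater than {names[i + 1]}");
        }
        return errors;
    }
}
```

Calling OrderValidator.Validate(new[] { a, b, c, d }, new[] { "a", "b", "c", "d" }) returns an empty list when the input is valid, and one message per offending neighbour pair otherwise.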

C# ?: Expression

I have a function with multiple if's (THIS IS NOT THE ACTUAL CODE)
if(n == 1)
m = 1;
if(n == 2)
m = 2;
if(n == 3)
m = 3;
Instead, I wanted to turn them all into ?: expressions:
(n == 1) ? m = 1;
But it says that it's expecting a ':'
I am familiar with the ?: expression from C++ where you can simply write:
(n == 1) ? m = 1 : 0;
But 0 doesn't take here. This is a ridiculous question and I couldn't even find an answer in google since it ignores '?:' as a word.
ANSWER : too bad the answer was in the comments. There is no way to "do nothing" in this expression and I should use if-else or switch. thanks.
It looks like you're looking for:
m = (n == 1) ? 1 : 0;
Which you could then cascade to:
m = (n == 1) ? 1 : (n == 2) ? 2 : (n == 3) ? 3 : 0;
An important (to me, anyway), aside:
Why are you asking this? If it's because you think that this form will be more efficient than a series of if statements or a switch, don't. The C# compiler and the .NET JIT compiler are really quite clever, and they'll transform your code (hopefully!) into its most optimal form. Write your code so that it's as understandable as it can be to you, or to the developer who has to maintain it after you. If the performance you get isn't acceptable, then try changing it around, but measure to determine what works best (bearing in mind that newer compilers/.NET frameworks could well change what happens).
Searching for "ternary operator" in C# will give you relevant results.
An example usage would be
var m = n == 1 ? 1 : 0;
Maybe:
m = (n == 1) ? 1 : (n == 2) ? 2 : (n == 3) ? 3 : m;
or
m = n
Edit:
Simplified:
variable2 = (variable1 == value) ?
variable1 :
variable2;
You want this:
m = (n == 1) ? 1 : 0;
To nest them all it would look like this:
m = (n == 1) ? 1 : (n == 2) ? 2 : (n == 3) ? 3 : 0;
But as you can see, this is really a lot less easy to read and understand. It can help to add extra parentheses, but I think you're better off using an if-else tree.
m = (n == 1) ? 1 : m;
means
m equals 1 if n == 1; otherwise m keeps its value.
FYI, ?: is called the ternary (conditional) operator. You can find its usage on MSDN.
You could write:
m = (n==1) ? 1 : m;
But IMO that's harder to read and uglier than the original code.
(n == 1) ? m = 1 : 0;
This isn't allowed because C# doesn't allow arbitrary expressions as a statement. Method calls and assignments are allowed, most other expressions aren't.
A statement is executed for its side effects, an expression for its value. So it's only natural that the outermost part of a statement has a side effect. The ?: operator itself only produces a value, so it's not allowed as a statement.
Try this :
m = (n == 1) ? 1 : 0;
This is not a problem to be solved with a ternary if/else operator - it is clearly an ideal candidate for a switch statement (and using a switch is likely to be much more efficient than using a sequence of ternary operators)
If you wish to transliterate an if statement into ?:, then it's quite simple:
if ({condition}) {then-code}; else {else-code};
becomes
{condition} ? {then-code} : {else-code};
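The only restriction is that the then/else code must each be a single expression that produces a value; in C#, the result must also be used (assigned, returned, or passed as an argument), because ?: cannot stand alone as a statement.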
The primary benefit of ?: (with modern compilers) is that it can be embedded within a statement to significantly compress the source code - sometimes this can aid readability, and sometimes it just serves to obfuscate the meaning of the code - use it with care.
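Since several answers point towards a switch, here is a small sketch of what that could look like as a C# 8 switch expression. Mapper and Map are hypothetical names; the discard arm returns the current value of m, which plays the role of "do nothing" from the question.

```csharp
public static class Mapper
{
    // Hypothetical helper: returns the new value of m for a given n,
    // leaving m unchanged when n matches none of the cases.
    public static int Map(int n, int m) => n switch
    {
        1 => 1,
        2 => 2,
        3 => 3,
        _ => m
    };
}
```

A call such as m = Mapper.Map(n, m); then replaces the chain of if statements or nested ?: operators.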

Can I use infinite range and operate over it?

Enumerable.Range(0, int.MaxValue)
.Select(n => Math.Pow(n, 2))
.Where(squared => squared % 2 != 0)
.TakeWhile(squared => squared < 10000).Sum()
Will this code iterate over all of the integer values from 0 to max-range or just through the integer values to satisfy the take-while, where, and select operators?
Can somebody clarify?
EDIT: My first attempt to check that it works as expected was a dumb one. I retract it :)
int.MaxValue + 5 overflows to be a negative number. Try it yourself:
unchecked
{
    int count = int.MaxValue + 5;
    Console.WriteLine(count); // Prints -2147483644
}
The second argument for Enumerable.Range has to be non-negative - hence the exception.
You can certainly use infinite sequences in LINQ though. Here's an example of such a sequence:
public IEnumerable<int> InfiniteCounter()
{
    int counter = 0;
    while (true)
    {
        unchecked
        {
            yield return counter;
            counter++;
        }
    }
}
That will overflow as well, of course, but it'll keep going...
Note that some LINQ operators (e.g. Reverse) need to read all the data before they can yield their first result. Others (like Select) can just keep streaming results as they read them from the input. See my Edulinq blog posts for details of the behaviour of each operator (in LINQ to Objects).
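To make the streaming behaviour concrete, here is a small sketch that consumes an endless sequence with streaming operators only. InfiniteDemo, Naturals and FirstOddSquares are names invented for this example; Naturals follows the same idea as the InfiniteCounter method shown earlier.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class InfiniteDemo
{
    // An endless, lazily produced sequence: 0, 1, 2, ...
    public static IEnumerable<int> Naturals()
    {
        int n = 0;
        while (true)
        {
            yield return n;
            unchecked { n++; }
        }
    }

    // Where, Select and Take stream their input, so this returns even though
    // the source never ends. A buffering operator such as Reverse would not.
    public static List<int> FirstOddSquares(int count) =>
        Naturals().Where(n => n % 2 == 1).Select(n => n * n).Take(count).ToList();
}
```

InfiniteDemo.FirstOddSquares(3) yields 1, 9 and 25, after pulling only the first six numbers from the source.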
The way to solve this sort of question in general is to think about what's going on in steps.
LINQ turns the LINQ code into something that'll be executed by a query provider. This could be something like producing SQL code, or all manner of things. In the case of LINQ to Objects, it produces some equivalent .NET code. Thinking about what that .NET code will be lets us reason about what will happen.*
With your code you have:
Enumerable.Range(0, int.MaxValue)
.Select(n => Math.Pow(n, 2))
.Where(squared => squared % 2 != 0)
.TakeWhile(squared => squared < 10000).Sum()
Enumerable.Range is slightly more complicated than:
for(int i = start; i != start + count; ++i)
    yield return i;
...but that's close enough for argument's sake.
Select is close enough to:
foreach(T item in source)
    yield return func(item);
Where is close enough to:
foreach(T item in source)
    if(func(item))
        yield return item;
TakeWhile is close enough to:
foreach(T item in source)
    if(func(item))
        yield return item;
    else
        yield break;
Sum is close enough to:
T tmp = 0;//must be numeric type
foreach(T x in source)
    tmp += x;
return tmp;
This simplifies a few optimisations and so on, but is close enough to reason with. Taking each of these in turn, your code is equivalent to:
double ret = 0; // part of equivalent of sum
for(int i = 0; i != int.MaxValue; ++i) // equivalent of Range
{
    double j = Math.Pow(i, 2); // equivalent of Select(n => Math.Pow(n, 2))
    if(j % 2 != 0) //equivalent of Where(squared => squared %2 != 0)
    {
        if(j < 10000) //equivalent of TakeWhile(squared => squared < 10000)
        {
            ret += j; //equivalent of Sum()
        }
        else //TakeWhile stopping further iteration
        {
            break;
        }
    }
}
return ret; //end of equivalent of Sum()
Now, in some ways the code above is simpler, and in some ways it's more complicated. The whole point of using LINQ is that in many ways it's simpler. Still, to answer your question "Will this code iterate over all of the integer values from 0 to max-range or just through the integer values to satisfy the take-while, where, and select operators?": we can look at the above and see that values that don't satisfy the Where are still iterated over to find that out, but no more work is done with them, and once the TakeWhile condition fails, all further work stops (the break in my non-LINQ rewrite).
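One way to confirm this reasoning is to count how many numbers actually flow through the pipeline. The sketch below adds a counter inside the Select lambda; LazyDemo, Run and the pulled counter are names made up for the illustration, and the query is otherwise the one from the question.

```csharp
using System;
using System.Linq;

public static class LazyDemo
{
    // Runs the question's query and counts how many numbers Select sees.
    public static (double Sum, int Pulled) Run()
    {
        int pulled = 0;
        double sum = Enumerable.Range(0, int.MaxValue)
            .Select(n => { pulled++; return Math.Pow(n, 2); })
            .Where(squared => squared % 2 != 0)
            .TakeWhile(squared => squared < 10000)
            .Sum();
        return (sum, pulled);
    }
}
```

Run() returns a sum of 166650 with only 102 numbers pulled (0 through 101): 100 squared is even and is dropped by Where, and 101 squared is the first odd square that fails the TakeWhile test.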
Of course it's only the TakeWhile() in this case that means the call will return in a reasonable length of time, but we also need to think briefly about the others to make sure they yield as they go. Consider the following variant of your code:
Enumerable.Range(0, int.MaxValue)
.Select(n => Math.Pow(n, 2))
.Where(squared => squared % 2 != 0)
.ToList()
.TakeWhile(squared => squared < 10000).Sum()
Theoretically, this will give exactly the same answer, but it will take far longer and far more memory to do so (probably enough to cause an out of memory exception). The equivalent non-linq code here though is:
List<double> tmpList = new List<double>(); // part of ToList equivalent
for(int i = 0; i != int.MaxValue; ++i) // equivalent of Range
{
    double j = Math.Pow(i, 2); // equivalent of Select(n => Math.Pow(n, 2))
    if(j % 2 != 0) //equivalent of Where(squared => squared %2 != 0)
    {
        tmpList.Add(j);//part of equivalent to ToList()
    }
}
double ret = 0; // part of equivalent of sum
foreach(double k in tmpList)
{
    if(k < 10000) //equivalent of TakeWhile(squared => squared < 10000)
    {
        ret += k; //equivalent of Sum()
    }
    else //TakeWhile stopping further iteration
    {
        break;
    }
}
return ret; //end of equivalent of Sum()
Here we can see how adding ToList() to the Linq query vastly affects the query so that every item produced by the Range() call must be dealt with. Methods like ToList() and ToArray() break up the chaining so that non-linq equivalents no longer fit "inside" each other and none can therefore stop the operation of those that come before. (Sum() is another example, but since it's after your TakeWhile() in your example, that isn't an issue).
Another thing that would make it go through every iteration of the range is if you had Where(x => false), because no item would ever reach TakeWhile, so its test would never run and could never stop the iteration.
*Though there may be further optimisations, especially in the case of SQL code. Also, while conceptually e.g. Count() is equivalent to:
int c = 0;
foreach(var item in src)
    ++c;
return c;
the fact that this will be turned into a call to the Count property of an ICollection or the Length property of an array means the O(n) above is replaced by an O(1) call (for most ICollection implementations), which is a massive gain for large sequences.
Your first code will only iterate as long as the TakeWhile condition is met. It will not iterate until int.MaxValue.
int.MaxValue + 5 will result in a negative integer. Enumerable.Range throws an ArgumentOutOfRangeException if its second argument is negative. So that's why you get the exception (before any iteration takes place).
