I have two arrays:
Vector3[] positions;
Matrix4x4[] transforms;
And a point in space:
Vector3 point;
For each position I get the distance from the point:
float distance = GetDistance(point, transforms[i] * positions[i]);
I'm comfortable enough with using delegates to sort a single array, but how can I sort the two arrays at the same time?
I need the operation to be as fast as possible, so I'd like to avoid packing into a temporary array and then unpacking the result.
I'm using .NET 2.0, so no LINQ.
Instead of managing two parallel arrays, you should use a model that binds the data together:
List<Tuple<Vector3, Matrix4x4>> posTransforms = new List<Tuple<Vector3, Matrix4x4>>();
// add like this
posTransforms.Add(new Tuple<Vector3, Matrix4x4>(vec, matrix));
// order by the Y coordinate of the vectors, for example
posTransforms = posTransforms.OrderBy(x => x.Item1.Y).ToList();
(Note that Tuple and OrderBy need .NET 4 and LINQ; the sketch below shows the same idea with .NET 2.0-era tools.)
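Since the question is on .NET 2.0 (no Tuple, no LINQ), here is a minimal sketch of the same idea using a small struct and a Comparison<T> delegate; it reuses the question's point and GetDistance:

// Sketch only: bind each pair in a struct and sort once by distance.
struct PosTransform
{
    public Vector3 Position;
    public Matrix4x4 Transform;
}

List<PosTransform> posTransforms = new List<PosTransform>();
// ... fill the list ...

// List<T>.Sort with an anonymous delegate works on .NET 2.0.
posTransforms.Sort(delegate(PosTransform a, PosTransform b)
{
    float da = GetDistance(point, a.Transform * a.Position);
    float db = GetDistance(point, b.Transform * b.Position);
    return da.CompareTo(db);
});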
The simplest solution is to do something like what #evanmcdonnal suggests and just package the corresponding elements together (they are obviously related and should be part of the same data structure), then sort them using a built-in sort function.
If you are really opposed to doing that for whatever reason, you will need to write your own sort method that will move the elements of both arrays at the same time.
Uncompiled and untested example (but should give you a decent idea of how to proceed):
bool isSorted;
do
{
    isSorted = true;
    for (int i = 0; i < positions.Length - 1; i++)
    {
        float distance = GetDistance(point, transforms[i] * positions[i]);
        float distanceNext = GetDistance(point, transforms[i + 1] * positions[i + 1]);
        if (distanceNext < distance)
        {
            // Swap the elements of both arrays in lockstep so they stay aligned.
            var swapTransform = transforms[i];
            transforms[i] = transforms[i + 1];
            transforms[i + 1] = swapTransform;

            var swapPosition = positions[i];
            positions[i] = positions[i + 1];
            positions[i + 1] = swapPosition;

            isSorted = false;
        }
    }
} while (!isSorted);
Note that I used a bubble sort here (which is in no way efficient, just really easy to write). I suggest finding a much more efficient algorithm to use if you decide to go this route.
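If you do go this route, one way to avoid both the O(n^2) comparisons and the repeated GetDistance calls (a sketch, assuming GetDistance is pure) is to compute each distance once, let the built-in O(n log n) sort order an index permutation, and then reorder both arrays in one pass:

// Sketch: sort both arrays by precomputed distance, at the cost of a few
// small temporary arrays.
float[] distances = new float[positions.Length];
int[] order = new int[positions.Length];
for (int i = 0; i < positions.Length; i++)
{
    distances[i] = GetDistance(point, transforms[i] * positions[i]);
    order[i] = i;
}

// Array.Sort(keys, items) reorders 'order' in lockstep with 'distances' (.NET 2.0+).
Array.Sort(distances, order);

Vector3[] sortedPositions = new Vector3[positions.Length];
Matrix4x4[] sortedTransforms = new Matrix4x4[transforms.Length];
for (int i = 0; i < order.Length; i++)
{
    sortedPositions[i] = positions[order[i]];
    sortedTransforms[i] = transforms[order[i]];
}
positions = sortedPositions;
transforms = sortedTransforms;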
So I have this for loop:
for (int i = 0; i < meshes.Count; i++)
{
    for (int j = 0; j < meshes.Count; j++)
    {
        for (int m = 0; m < meshes[i].vertices.Length; m++)
        {
            for (int n = 0; n < meshes[i].vertices.Length; n++)
            {
                if ((meshes[i].vertices[m].x == meshes[j].vertices[n].x) && (meshes[i].vertices[m].z == meshes[j].vertices[n].z))
                {
                    if (meshes[i].vertices[m] != meshes[j].vertices[n])
                    {
                        meshes[i].vertices[m].y = meshes[j].vertices[n].y;
                    }
                }
            }
        }
    }
}
Which goes through a few million vectors and compares them to all other vectors, and then modifies some of their y values. I think it works; however, after hitting play it takes an unbelievably long time to load (I've currently been waiting for 15 minutes, and it's still going). Is there a way to make it more efficient? Thanks for the help!
As I read this, what you're basically doing, is that for all vertices with the same x and z, you set their y value to the same.
A more optimized way would be to use the LINQ method GroupBy, which internally uses hash mapping to avoid the quadratic time complexity of your current approach:
var vGroups = meshes.SelectMany(mesh => mesh.vertices)
                    .GroupBy(vertex => new { vertex.x, vertex.z });

foreach (var vGroup in vGroups)
{
    vGroup.Aggregate((prev, curr) =>
    {
        // If prev is null (i.e. first iteration of the "loop")
        // don't change the 'y' value
        curr.y = prev?.y ?? curr.y;
        return curr;
    });
}
// All vertices should now be updated in the 'meshes'
Note that the final y value of the vertices depends on the order of the meshes and vertices in your original list. The first vertex in each vGroup is the deciding vertex. I believe it'll be the opposite of your approach, where the last vertex is the deciding one, but it doesn't sound like that's important for you.
Furthermore, be aware that in this (and your) approach you are possibly merging two vertices in the same mesh if two vertices have the same x and z values. I don't know if that's intended but I wanted to point it out.
An additional performance optimization would be to parallelize this. Just start out with a call to AsParallel:
var vGroups = meshes.AsParallel()
                    .SelectMany(mesh => mesh.vertices)
                    .GroupBy(vertex => new { vertex.x, vertex.z });
// ...
Be aware that parallelization does not always speed things up: if the computation you are trying to parallelize is not computationally expensive enough, the overhead of parallelizing may outweigh the benefits. I'm not sure whether the GroupBy operation is heavy enough for it to be beneficial; you'll have to test that for yourself. Try without it first.
For a simplified example, see this fiddle.
You want to make Y equal for all vertices with the same X and Z. Let's do just that:
var yForXZDict = new Dictionary<(int, int), int>();

foreach (var mesh in meshes)
{
    foreach (var vertex in mesh.vertices)
    {
        var xz = (vertex.x, vertex.z);
        if (yForXZDict.TryGetValue(xz, out var y))
        {
            vertex.y = y;
        }
        else
        {
            yForXZDict[xz] = vertex.y;
        }
    }
}
You should replace int with the exact type you use for coordinates.
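If your vertices are Unity-style Vector3 structs rather than class objects, note that the foreach above would only mutate copies. A variant of the same idea with index-based writes and an explicit write-back (a sketch, assuming meshes holds UnityEngine.Mesh objects and float coordinates):

var yForXZ = new Dictionary<(float, float), float>();
foreach (var mesh in meshes)
{
    Vector3[] verts = mesh.vertices;      // Mesh.vertices returns a copy: fetch once
    for (int i = 0; i < verts.Length; i++)
    {
        var xz = (verts[i].x, verts[i].z);
        if (yForXZ.TryGetValue(xz, out var y))
        {
            verts[i].y = y;               // write into our local array
        }
        else
        {
            yForXZ[xz] = verts[i].y;
        }
    }
    mesh.vertices = verts;                // push the modified array back to the mesh
}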
You are comparing twice unnecessarily.
Here a short example of what I mean:
Let's say we have meshes A, B, C.
You are comparing
A, A
A, B
A, C
B, A
B, B
B, C
C, A
C, B
C, C
This checks e.g. the combination A and B twice.
One first easy improvement would be to use e.g.
for (int i = 0; i < meshes.Count; i++)
{
    // only check the current and following meshes
    for (int j = i; j < meshes.Count; j++)
    {
        ...
Do you even want to compare a mesh with itself? Otherwise you can use j = i + 1 and only compare the current mesh to the following meshes.
For the vertices it depends: if you do want to check a mesh against itself, you at least want int n = m + 1 in the case that i == j.
It makes no sense to check a vertex with itself since the condition will always be true.
The next point is to minimize accesses.
You are accessing e.g.
meshes[i].vertices
five times!
Rather get and store it once, e.g.:
// To minimize GC it sometimes makes sense to reuse variables outside of a loop
Mesh meshA;
Mesh meshB;
Vector3[] vertsA;
Vector3[] vertsB;
Vector3 vA;
Vector3 vB;

for (int i = 0; i < meshes.Count; i++)
{
    meshA = meshes[i];
    vertsA = meshA.vertices;
    for (int j = i; j < meshes.Count; j++)
    {
        meshB = meshes[j];
        vertsB = meshB.vertices;
        for (int m = 0; m < vertsA.Length; m++)
        {
            vA = vertsA[m];
            ...
Also note that a line like
meshes[i].vertices[m].y = meshes[j].vertices[n].y;
doesn't actually do what you expect!
The vertices are Vector3, which is a struct, and in Unity Mesh.vertices is a property that returns a copy of the vertex array. So assigning to
meshes[i].vertices[m].y
only changes the value inside a temporary array copy and doesn't in any way change the content of the mesh.
You would rather work with vA as mentioned before and at the end assign it back via
vertsA[m] = vA;
and then at the end of the loop assign the entire array back once via
meshA.vertices = vertsA;
And finally: I would move this work into a Thread, or use Unity's Job System with the Burst compiler, and meanwhile display a progress bar or some user feedback instead of freezing the entire application.
Yet another point is floating point precision
You are directly comparing two float values using ==. Due to floating point imprecision this might fail even when it shouldn't, e.g.
10f * 0.1f == 1f
is not necessarily true. It might be 0.99999999 or 1.0000000001.
Therefore Unity uses only a precision of 0.00001 for Vector3 == Vector3.
You should either do the same and use
if (Mathf.Abs(vA.x - vB.x) <= 0.00001f)
or use
if(Mathf.Approximately(vA.x, vB.x))
which equals
if (Mathf.Abs(vA.x - vB.x) <= Mathf.Epsilon)
where Mathf.Epsilon is the smallest value by which two floats can differ.
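Pulling these points together, a rough sketch of the whole loop (assuming meshes is a List<Mesh> and that you do not want to compare a mesh with itself):

for (int i = 0; i < meshes.Count; i++)
{
    Mesh meshA = meshes[i];
    Vector3[] vertsA = meshA.vertices;               // fetch the copy once

    for (int j = i + 1; j < meshes.Count; j++)       // skip self and duplicate pairs
    {
        Vector3[] vertsB = meshes[j].vertices;

        for (int m = 0; m < vertsA.Length; m++)
        {
            Vector3 vA = vertsA[m];
            for (int n = 0; n < vertsB.Length; n++)
            {
                Vector3 vB = vertsB[n];
                if (Mathf.Abs(vA.x - vB.x) <= 0.00001f && Mathf.Abs(vA.z - vB.z) <= 0.00001f)
                {
                    vA.y = vB.y;
                }
            }
            vertsA[m] = vA;                          // write the modified copy back
        }
    }
    meshA.vertices = vertsA;                         // assign the array back once per mesh
}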
I am working on a random dungeon generator just for fun / as a side project to learn some new things. I have written a function that returns an integer hash value for any given cell, which gives you information about what type of gameobject it should be. i.e. if it is a wall, what direction to face, is it a corner, etc. Here is what the function currently looks like.
private int CellHashValue(int xIndex, int yIndex, char centerTile)
{
    int hashValue = 0;

    if (dungeon2DArray[xIndex - 1, yIndex + 1] == centerTile)
    {
        hashValue += 1;
    }
    if (dungeon2DArray[xIndex, yIndex + 1] == centerTile)
    {
        hashValue += 2;
    }
    if (dungeon2DArray[xIndex + 1, yIndex + 1] == centerTile)
    {
        hashValue += 4;
    }
    if (dungeon2DArray[xIndex - 1, yIndex] == centerTile)
    {
        hashValue += 8;
    }
    if (dungeon2DArray[xIndex + 1, yIndex] == centerTile)
    {
        hashValue += 16;
    }
    if (dungeon2DArray[xIndex - 1, yIndex - 1] == centerTile)
    {
        hashValue += 32;
    }
    if (dungeon2DArray[xIndex, yIndex - 1] == centerTile)
    {
        hashValue += 64;
    }
    if (dungeon2DArray[xIndex + 1, yIndex - 1] == centerTile)
    {
        hashValue += 128;
    }

    return hashValue;
}
My question is: is there a more efficient and faster way to do these checks that I am perhaps not thinking of? The dungeon array ranges in size from 100x100 to 1000x1000, though the function is not called on each cell. I have a separate List that contains rooms and their start and end indexes for each direction, which I iterate over to instantiate objects.
What you're doing is essentially applying a form of convolution. Without more context as to how your method is being called or how you're using the returned hash value, what you're doing seems to be close to the most efficient way to iterate a 3x3 grid. Assuming your dungeon2DArray is a char[,] and is a field, this is what I believe to be a bit clearer and more concise (you'll have to adjust how you interpret the resulting sum based on the order of iteration).
private int CellHashValue(int x, int y) {
    int hashSum = 0;                        // Hash sum to return
    int hashValue = 1;                      // Increases as a power of 2 (1, 2, 4, ... 128)
    char centerTile = dungeon2DArray[x, y]; // Cache center tile
    for (int r = -1; r <= 1; r++) {
        for (int c = -1; c <= 1; c++) {
            if (r == 0 && c == 0) continue; // Skip center tile
            if (dungeon2DArray[x + c, y + r] == centerTile) {
                hashSum += hashValue;
            }
            hashValue *= 2;                 // Next power of two
            // could also bit shift here instead:
            // hashValue <<= 1;
        }
    }
    return hashSum;
}
Note: This method doesn't do any boundary checking, so if x or y is along an edge, the indexing will throw.
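If you need it, one possible guard (an assumption about the desired behavior: out-of-bounds neighbors simply count as "no match") is a small helper that the loop calls instead of indexing directly:

// Hypothetical helper: treat reads outside the array as a failed match.
private bool NeighborMatches(int x, int y, char centerTile) {
    if (x < 0 || y < 0 ||
        x >= dungeon2DArray.GetLength(0) ||
        y >= dungeon2DArray.GetLength(1)) {
        return false;
    }
    return dungeon2DArray[x, y] == centerTile;
}

The loop body would then test NeighborMatches(x + c, y + r, centerTile) instead of indexing the array directly.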
Each of the array accesses is O(1), and iterating over your entire dungeon array is O(n^2) for an n x n grid, so the only way to get better efficiency would be to combine per-cell method calls. That is still only a constant factor, though, so it's not really more efficient, but depending on the calculation it could boost performance a little.
Since you are using an array to build the map, access time is constant thanks to direct indexing, so checking each array index is fast.
There are several minor things to speed up the function.
Return the hashValue within the corresponding if statement. This will remove a few lines of code.
By removing the hashValue variable and returning a hard-coded value, a variable initialization will be removed from the process. This is more significant than it may seem. To create and remove an object takes time, lots of time when at scale.
xIndex and yIndex can be made fields of the object. Be careful implementing this idea, but since xIndex and yIndex do not change while the conditions are checked, they can be stored on the object, which reduces the number of parameters passed in. Parameters are passed by value, so a copy is made for each call; a simple int won't impact the speed much, but a type that contains many variables takes more time to copy.
Each check can be moved to a separate function. This primarily helps with readability and debugging later on. There are some speed advantages but they're project dependent. By observing how objects are initialized/manipulated then certain conditions can typically be forced to be true. When logic doesn't need to be checked and conclusions can be reached without checks, then less time is needed.
Just a few ideas. If you have some time to research, the 2nd and 3rd points use concepts from low-latency/high-frequency programming. Also, be aware that some of these concepts are not considered best practice.
For some reason my code doesn't properly detect when an instantiated object is being overlapped.
What I want to do is generate random platforms with different positions and scales (X).
Since it's random, overlapping can happen. To solve this problem, I've tried to compare each and every platform to see if it overlaps; when it does, it should delete itself and instantiate another one.
An addition to this question:
if I solve the overlapping problem, is it possible to make the platforms keep a certain distance from each other, for X, Y and Z?
So:
What have I done wrong?
What can I do?
void Platform_Position_Scale_Generator(int i) {
    posX[i] = Random.Range(minPosRange, maxPosRange + 1);
    posY[i] = Random.Range(minPosRange, maxPosRange + 1);
    posZ[i] = 0;

    scaleX[i] = Random.Range(minScaleRange, maxScaleRange + 1);
    scaleY[i] = 1;
    scaleZ[i] = 1;
}

void Platform_Generator(int i) {
    platformPrefabPosition[i].x = posX[i];
    platformPrefabPosition[i].y = posY[i];
    platformPrefabPosition[i].z = posZ[i];
    Instantiate(platformPrefab, platformPrefabPosition[i], Quaternion.identity);
    platformPrefab.transform.localScale = new Vector3(scaleX[i], 1, 1);
}
// Error with this
void Detect_Collision(int i) {
    for (int f = 0; f < i; f++) {
        for (int s = f + 1; s < i; s++) {
            bool xOverlap = (posX[s] > posX[f] && posX[s] < posX[f] + scaleX[i]) || (posX[f] > posX[s] && posX[f] < posX[s] + scaleX[i]);
            bool yOverlap = (posY[s] > posY[f] && posY[s] < posY[f] + scaleY[i]) || (posY[f] > posY[s] && posY[f] < posY[s] + scaleY[i]);
            if (xOverlap && yOverlap) {
                Debug.Log("xOverlap: " + xOverlap + " yOverlap: " + yOverlap);
            }
            else {
                //Debug.Log("xOverlap: " + xOverlap + " yOverlap: " + yOverlap);
            }
        }
    }
}
I wouldn't recommend using completely random generation for something like this, as it can easily create something totally unplayable, and making it playable can be more difficult than trying a more methodical approach.
One interesting approach could be the one shown in this video:
https://www.youtube.com/watch?v=VkGG9Umag0M
That approach is using pre-built chunks of levels manually generated to be playable, that are later randomly chosen in run-time, to create infinite levels.
Another approach could be dynamically generating a sequence of "viable" platforms.
I'm assuming this is a 2D platform game, but the same logic could apply to other types.
For example, the following sequence:
Add a platform on the left edge (random Y position and size, if desired).
Determine viable positions of the next platform to the right and vertically, taking into account both the end position of the previous platform and other things, such as jumping distance. That gives you a max position at which you can place things and still have the player make it. You can use that max, and possibly a min distance, to choose a random value between them and still get a viable platform without overlaps.
Repeat step 2 until you reach some end condition, such as size of level, amount of platforms, etc.
You can also add a more complex logic, such as allowing overlaps on one axis as long as there isn't any on the other axis, or a minimum separation between both. That way you could get two nearly parallel platforms, and things like that.
Other rules could be more complex, expecting an actual specific solution from the player, such as double-jumps, bouncing off walls, etc. In that scenario you could have item 2 just be one of many generation strategies to choose from.
This type of generation would also be much less expensive than actual instantiation and deletion in case of collisions.
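As an illustration only (platformCount, minGap and maxJumpDistance here are hypothetical tuning values; the range variables are the ones from the question), steps 1-3 could look something like:

// Hypothetical sketch of the "viable sequence" idea: each platform starts
// within jumpable range of the previous one, so no overlap test is needed.
float x = 0f;
for (int i = 0; i < platformCount; i++) {
    float width = Random.Range(minScaleRange, maxScaleRange);
    float gap = Random.Range(minGap, maxJumpDistance);   // reachable, never overlapping
    float y = Random.Range(minPosRange, maxPosRange);

    GameObject p = (GameObject)Instantiate(platformPrefab, new Vector3(x, y, 0f), Quaternion.identity);
    p.transform.localScale = new Vector3(width, 1f, 1f);

    x += width + gap;                                    // next platform starts past this one
}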
NOTE:
If you still want to stick to 100% random generation but guarantee gaps between platforms, just assume an "imaginary" border surrounding the actual platform: instead of taking only the current real points into account, add offsets to them when testing collisions, as sketched below.
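Expressed in code, the "imaginary border" idea is just a padded version of the usual axis-aligned overlap test (a sketch; gapX/gapY are whatever minimum separation you want to enforce):

// Two axis-aligned rectangles count as overlapping once they come
// closer than gapX/gapY on both axes.
static bool Overlaps(float ax, float ay, float aw, float ah,
                     float bx, float by, float bw, float bh,
                     float gapX, float gapY) {
    bool xOverlap = ax < bx + bw + gapX && bx < ax + aw + gapX;
    bool yOverlap = ay < by + bh + gapY && by < ay + ah + gapY;
    return xOverlap && yOverlap;
}

With gapX = gapY = 0 this is the plain overlap test; positive values enforce a minimum distance between platforms.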
You should be able to test intersection without physics, using something similar to what is show here:
http://answers.unity3d.com/questions/581014/2d-collision-detection-box-intersection-without-ph.html
if (object1.renderer.bounds.Intersects(object2.renderer.bounds)) {
// Do some stuff
}
I need to resample big sets of data (a few hundred spectra, each containing a few thousand points) using simple linear interpolation.
I have created an interpolation method in C#, but it seems to be really slow for huge datasets.
How can I improve the performance of this code?
public static List<double> interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    double[] interpolated = new double[breaks.Count];
    int id = 1;
    int x = 0;

    // left border case - uphold the value
    while (breaks[x] < xItems[0])
    {
        interpolated[x] = yItems[0];
        x++;
    }

    double p, w;
    for (int i = x; i < breaks.Count; i++)
    {
        while (breaks[i] > xItems[id])
        {
            id++;
            if (id > xItems.Count - 1)
            {
                id = xItems.Count - 1;
                break;
            }
        }

        System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));

        if (id <= xItems.Count - 1)
        {
            if (id == xItems.Count - 1 && breaks[i] > xItems[id])
            {
                interpolated[i] = yItems[yItems.Count - 1];
            }
            else
            {
                w = xItems[id] - xItems[id - 1];
                p = (breaks[i] - xItems[id - 1]) / w;
                interpolated[i] = yItems[id - 1] + p * (yItems[id] - yItems[id - 1]);
            }
        }
        else // right border case - uphold the value
        {
            interpolated[i] = yItems[yItems.Count - 1];
        }
    }
    return interpolated.ToList();
}
Edit
Thanks, guys, for all your responses. What I wanted when I wrote this question were some general ideas of where I could improve performance. I didn't expect ready solutions, only ideas, and you gave me what I wanted, thanks!
Before writing this question I thought about rewriting this code in C++, but after reading the comments to Will's answer it seems that the gain would be smaller than I expected.
Also, the code is so simple that there are no mighty code tricks to use here. Thanks to Petar for his attempt to optimize the code.
It seems that it all comes down to finding a good profiler, checking every line and subroutine, and trying to optimize it.
Thank you again for all responses and taking your part in this discussion!
public static List<double> Interpolate(IList<double> xItems, IList<double> yItems, IList<double> breaks)
{
    var a = xItems.ToArray();
    var b = yItems.ToArray();
    var aLimit = a.Length - 1;
    var bLimit = b.Length - 1;
    var interpolated = new double[breaks.Count];

    // left border case - uphold the first value
    var total = 0;
    var initialValue = a[0];
    while (breaks[total] < initialValue)
    {
        interpolated[total] = b[0];
        total++;
    }

    int id = 1;
    for (int i = total; i < breaks.Count; i++)
    {
        var breakValue = breaks[i];
        while (breakValue > a[id])
        {
            id++;
            if (id > aLimit)
            {
                id = aLimit;
                break;
            }
        }

        double value = b[bLimit];
        if (id <= aLimit)
        {
            var currentValue = a[id];
            var previousValue = a[id - 1];
            if (id != aLimit || breakValue <= currentValue)
            {
                var w = currentValue - previousValue;
                var p = (breakValue - previousValue) / w;
                value = b[id - 1] + p * (b[id] - b[id - 1]);
            }
        }
        interpolated[i] = value;
    }
    return interpolated.ToList();
}
I've cached some (const) values and hoisted repeated work out of the loop, but I think these are micro-optimizations that are already made by the compiler in Release mode. However, you can try this version and see if it beats the original version of the code.
Instead of
interpolated.ToList()
which copies the whole array, you could compute the interpolated values directly into the final list (or return the array instead), especially if the array/list is big enough to qualify for the large object heap.
Unlike the ordinary heap, the LOH is not compacted by the GC, which means that short lived large objects are far more harmful than small ones.
Then again: 7000 doubles are approx. 56'000 bytes which is below the large object threshold of 85'000 bytes (1).
Looks to me like you've created an O(n^2) algorithm: you search for the interval, which is O(n), and you probably apply it n times. You'll get a quick and cheap speed-up by taking advantage of the fact that the items are already ordered in the list: use BinarySearch(), which is O(log n).
If that's still not enough, you should be able to do something speedier with the outer loop: whatever interval you found previously should make it easier to find the next one. But that code isn't in your snippet.
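For illustration, a sketch of the binary-search lookup for one break value (breakValue stands for breaks[i] in the question's loop; assumes xItems is sorted ascending, as it must be for the interpolation to make sense):

// Array.BinarySearch returns the index on an exact hit; otherwise it returns
// the bitwise complement of the index of the first element larger than the value.
double[] xs = xItems.ToArray();
int id = Array.BinarySearch(xs, breakValue);
if (id < 0)
{
    id = ~id;
}
// Clamp so that xs[id - 1] <= breakValue holds away from the borders.
if (id < 1) id = 1;
if (id > xs.Length - 1) id = xs.Length - 1;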
I'd say profile the code and see where it spends its time; then you have somewhere to focus on.
ANTS is popular, but EQATEC is free, I think.
A few suggestions:
As others said, use a profiler to understand better where the time is spent.
The loop
while (breaks[x] < xItems[0])
could throw an exception if x grows bigger than the number of items in the breaks list. You should use something like
while (x < breaks.Count && breaks[x] < xItems[0])
But you might not need that loop at all. Why treat the first item as a special case? Just start with id = 0 and handle the first point in the for (i) loop. I understand that id might start from 0 in this case and [id - 1] would be a negative index, but see if you can do something there.
If you want to optimize for speed, you usually sacrifice memory size, and vice versa; you rarely get both unless you find a really clever algorithm. In this case, that means calculating as much as you can outside the loops, storing those values in variables (extra memory) and using them later. For example, instead of always saying:
id = xItems.Count - 1;
You could say:
int lastXItemsIndex = xItems.Count-1;
...
id = lastXItemsIndex;
This is the same suggestion Petar Petrov made with aLimit, bLimit, ...
Next point: your loop (or the one Petar Petrov suggested):
while (breaks[i] > xItems[id])
{
    id++;
    if (id > xItems.Count - 1)
    {
        id = xItems.Count - 1;
        break;
    }
}
could probably be reduced to:
double currentBreak = breaks[i];
while (id <= lastXItemsIndex && currentBreak > xItems[id]) id++;
And the last point I would add: check whether your samples have some special property you can exploit. For example, if xItems represent time and you are sampling at regular intervals, then
w = xItems[id] - xItems[id - 1];
is constant, and you do not have to calculate it every time in the loop.
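In that uniform case you can even go one step further and compute the bracketing interval directly, removing the inner while loop entirely (a sketch; assumes strictly regular spacing):

// With constant spacing w, the interval for breaks[i] follows from arithmetic.
double x0 = xItems[0];
double w = xItems[1] - xItems[0];
int id = (int)((breaks[i] - x0) / w) + 1;         // bracketing interval [id - 1, id]
if (id < 1) id = 1;                               // clamp the left border
if (id > xItems.Count - 1) id = xItems.Count - 1; // clamp the right border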
This is probably not often the case, but maybe your problem has some other property which you could use to improve performance.
Another idea is this: maybe you do not need double precision, "float" is probably faster because it is smaller.
Good luck
System.Diagnostics.Debug.WriteLine(string.Format("i: {0}, id {1}", i, id));
I hope it's a release build without DEBUG defined?
Otherwise, it might depend on what exactly those IList parameters are. It may be useful to store the Count value in a local instead of accessing the property every time.
This is the kind of problem where you need to move over to native code.
I use a custom Matrix class in my application, and I frequently add multiple matrices:
Matrix result = a + b + c + d; // a, b, c and d are also Matrices
However, this creates an intermediate matrix for each addition operation. Since this is simple addition, it is possible to avoid the intermediate objects and create the result by adding the elements of all 4 matrices at once. How can I accomplish this?
NOTE: I know I can define multiple functions like Add3Matrices(a, b, c), Add4Matrices(a, b, c, d), etc., but I want to keep the elegance of result = a + b + c + d.
You could limit yourself to a single small intermediate by using lazy evaluation. Something like
public class LazyMatrix
{
    public List<Matrix> Pending = new List<Matrix>();

    public static implicit operator Matrix(LazyMatrix l)
    {
        Matrix m = new Matrix();
        foreach (Matrix x in l.Pending)
        {
            for (int i = 0; i < 2; ++i)
                for (int j = 0; j < 2; ++j)
                    m.Contents[i, j] += x.Contents[i, j];
        }
        return m;
    }

    // '+' is left-associative, so (a + b) + c arrives here; without this
    // overload the implicit conversion to Matrix would materialize the
    // intermediate sum and defeat the lazy evaluation.
    public static LazyMatrix operator +(LazyMatrix a, Matrix b)
    {
        a.Pending.Add(b);
        return a;
    }
}

public class Matrix
{
    public int[,] Contents = { { 0, 0 }, { 0, 0 } };

    public static LazyMatrix operator +(Matrix a, Matrix b)
    {
        LazyMatrix l = new LazyMatrix();
        l.Pending.Add(a);
        l.Pending.Add(b);
        return l;
    }

    public static LazyMatrix operator +(Matrix a, LazyMatrix b)
    {
        b.Pending.Add(a);
        return b;
    }
}
class Program
{
    static void Main(string[] args)
    {
        Matrix a = new Matrix();
        Matrix b = new Matrix();
        Matrix c = new Matrix();
        Matrix d = new Matrix();

        a.Contents[0, 0] = 1;
        b.Contents[1, 0] = 4;
        c.Contents[0, 1] = 9;
        d.Contents[1, 1] = 16;

        Matrix m = a + b + c + d;

        for (int i = 0; i < 2; ++i)
        {
            for (int j = 0; j < 2; ++j)
            {
                System.Console.Write(m.Contents[i, j]);
                System.Console.Write(" ");
            }
            System.Console.WriteLine();
        }
        System.Console.ReadLine();
    }
}
Something that would at least avoid the pain of
Matrix Add3Matrices(a,b,c) //and so on
would be
Matrix AddMatrices(Matrix[] matrices)
In C++ it is possible to use template metaprogramming, specifically expression templates, to do exactly this. However, template programming is non-trivial, and I don't know if a similar technique is available in C#; quite possibly not.
This technique, in C++, does exactly what you want. The disadvantage is that if something is not quite right, the compiler error messages tend to run to several pages and are almost impossible to decipher.
Without such techniques I suspect you are limited to functions such as Add3Matrices.
But for C# this link might be exactly what you need: Efficient Matrix Programming in C# although it seems to work slightly differently to C++ template expressions.
You can't avoid creating intermediate objects.
However, you can use expression templates as described here to minimise them and do fancy lazy evaluation of the templates.
At the simplest level, the expression template could be an object that stores references to several matrices and calls an appropriate function like Add3Matrices() upon assignment. At the most advanced level, the expression templates will do things like calculate the minimum amount of information in a lazy fashion upon request.
This is not the cleanest solution, but if you know the evaluation order, you could do something like this:
result = MatrixAdditionCollector() << a + b + c + d
(or the same thing with different names). The MatrixAdditionCollector then implements + as +=; that is, it starts with a 0-matrix of undefined size, takes a size once the first + is evaluated, and adds everything together (or copies the first matrix). This reduces the number of intermediate objects to 1 (or even 0, if you implement assignment well, because the MatrixAdditionCollector might be/contain the result immediately).
I am not entirely sure if this is ugly as hell or one of the nicer hacks one might do. A certain advantage is that it is kind of obvious what's happening.
Might I suggest a MatrixAdder that behaves much like a StringBuilder: you add matrices to the MatrixAdder and then call a ToMatrix() method that does the additions for you in a lazy implementation. This would get you the result you want, could be extended to any sort of lazy evaluation, and wouldn't introduce any clever implementations that could confuse other maintainers of the code.
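A minimal sketch of that idea, reusing the 2x2 Matrix from the LazyMatrix answer above:

// Collects matrices and performs one combined addition in ToMatrix().
public class MatrixAdder
{
    private readonly List<Matrix> pending = new List<Matrix>();

    public MatrixAdder Add(Matrix m)
    {
        pending.Add(m);
        return this;                  // allow chaining, StringBuilder-style
    }

    public Matrix ToMatrix()
    {
        Matrix result = new Matrix(); // zero-initialized
        foreach (Matrix m in pending)
            for (int i = 0; i < 2; ++i)
                for (int j = 0; j < 2; ++j)
                    result.Contents[i, j] += m.Contents[i, j];
        return result;
    }
}

// Usage: Matrix sum = new MatrixAdder().Add(a).Add(b).Add(c).Add(d).ToMatrix();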
I thought that you could just make the desired add-in-place behavior explicit:
Matrix result = a;
result += b;
result += c;
result += d;
But as pointed out by Doug in the Comments on this post, this code is treated by the compiler as if I had written:
Matrix result = a;
result = result + b;
result = result + c;
result = result + d;
so temporaries are still created.
I'd just delete this answer, but it seems others might have the same misconception, so consider this a counter example.
Bjarne Stroustrup has a short paper called Abstraction, libraries, and efficiency in C++ where he mentions techniques used to achieve what you're looking for. Specifically, he mentions the library Blitz++, a library for scientific calculations that also has efficient operations for matrices, along with some other interesting libraries. Also, I recommend reading a conversation with Bjarne Stroustrup on artima.com on that subject.
It is not possible using operators alone.
My first solution would be something along these lines (to add in the Matrix class if possible):
static Matrix AddMatrices(Matrix[] lMatrices) // or List<Matrix> lMatrices
{
    // Check consistency of matrices

    Matrix m = new Matrix(n, p);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < p; j++)
            foreach (Matrix mat in lMatrices)
                m[i, j] += mat[i, j];

    return m;
}
I'd have it in the Matrix class because you can rely on the private methods and properties that could be useful for your function in case the implementation of the matrix changes (a linked list of non-empty nodes instead of a big double array, for example).
Of course, you would loose the elegance of result = a + b + c + d. But you would have something along the lines of result = Matrix.AddMatrices(new Matrix[] { a, b, c, d });.
There are several ways to implement lazy evaluation to achieve that. But it's important to remember that your compiler will not always generate the best code for all of them.
I have made implementations that worked great in GCC and even exceeded the performance of the traditional unreadable several-for-loops code, because they led the compiler to observe that there were no aliases between the data segments (something hard to establish with arrays coming out of nowhere). But some of those were a complete failure in MSVC, and vice versa for other implementations. Unfortunately they are too long to post here (several thousand lines of code won't fit).
A very complex library with great embedded knowledge in the area is the Blitz++ library for scientific computation.