I have a 2D string array in C# and I need to shift that array to the left in one dimension. How can I do that efficiently?
I don't want to use nested for loops, and I want an algorithm that runs in O(n), not O(n²).
for (int i = 50; i < 300; i++)
{
for (int j = 0; j < 300; j++)
{
numbers[i-50, j] = numbers[i, j];
}
}
If you want to shift large amounts of data around quickly, use Array.Copy rather than a loop that copies individual characters.
If you swap to a byte array and use Array.Copy or Buffer.BlockCopy you will probably improve the performance a bit more (but if you have to convert to/from character arrays you may lose everything you've gained).
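For example, here is a sketch of doing the whole shift with a single Array.Copy call on a rectangular array (the array name, element type, and sizes below are illustrative, not taken from the question):
int rows = 300, cols = 300, shift = 50;   // illustrative sizes
string[,] data = new string[rows, cols];
// Array.Copy treats a rectangular array as one flat block, so row r starts at
// flat offset r * cols; the rows being moved are contiguous, so one call suffices.
Array.Copy(data, shift * cols, data, 0, (rows - shift) * cols);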
(Edit, now that you've posted example code: if you use references to the array rows, e.g. a jagged array, then you may be able to shift the references rather than having to move the data itself. And you can still shift those references using Array.Copy.)
But if you change your approach so you don't need to shift the data, you'll gain considerably better performance - not doing the work at all if you can avoid it is always faster! Chances are you can wrap the data in an accessor layer that keeps track of how much the data has been shifted and modifies your indexes to return the data you are after. (This will slightly slow down access to the data, but saves you shifting the data, so may result in a net win - depending on how much you access relative to how much you shift)
The most efficient way would be to not shift it at all, but instead change how you access the array. For example, keep an offset that tells you where in the dimension the first column is.
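A minimal sketch of that accessor idea (the class and member names are illustrative, not from the question):
// Sketch: wrap the array and track an offset instead of moving the data.
class ShiftedGrid
{
    private readonly string[,] data;
    private int rowOffset;                      // how far the view has been shifted

    public ShiftedGrid(string[,] data) { this.data = data; }

    // "Shift left" in the first dimension in O(1) by moving the offset.
    public void ShiftRows(int count) { rowOffset += count; }

    public string this[int row, int col]
    {
        get { return data[row + rowOffset, col]; }
        set { data[row + rowOffset, col] = value; }
    }
}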
I want to shuffle a big dataset (of type List<Record>), then iterate over it many times. Typically, shuffling a list only shuffles the references, not the data. My algorithm's performance suffers tremendously (3x) because of frequent cache misses. I can do a deep copy of the shuffled data to make it cache friendly. However, that would double the memory usage.
Is there a more memory-efficient way to shuffle or re-order data so that the shuffled data is cache friendly?
Option 1:
Make Record a struct so the List<Record> holds contiguous data in memory.
Then either sort it directly, or (if the records are large) instead of sorting the list directly, make an array of indices (initially just {0, 1, ..., n - 1}) and then sort the indices by making the comparator compare the elements they refer to. Finally if you need the sorted array you can copy the elements in the shuffled order by looking at the indices.
Note that this may be more cache-unfriendly than directly sorting the structs, but at least it'll be a single pass through the data, so it is more likely to be faster, depending on the struct size. You can't really avoid it if the struct is large, so if you're not sure whether Record is large, you'll have to try both approaches and see whether sorting the records directly is more efficient.
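A rough sketch of the index approach (shuffling rather than sorting the indices, since the question is about shuffling; records is assumed to be the List<Record>, and the other names are illustrative):
// Sketch: shuffle an array of indices, then materialize the new order in one pass.
int n = records.Count;
int[] indices = new int[n];
for (int i = 0; i < n; i++)
    indices[i] = i;

var rnd = new Random();
for (int i = 0; i < n - 1; i++)          // Fisher-Yates shuffle of the indices
{
    int j = rnd.Next(i, n);
    int tmp = indices[i]; indices[i] = indices[j]; indices[j] = tmp;
}

// Single pass over the data; contiguous in memory if Record is a struct.
var shuffled = new Record[n];
for (int i = 0; i < n; i++)
    shuffled[i] = records[indices[i]];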
If you can't change the type, then your only solution is to somehow make them contiguous in memory. The only realistic way of doing that is to perform an initial garbage collection, then allocate them in order, and keep your fingers crossed hoping that the runtime will allocate them contiguously. I can't think of any other way that could work if you can't make it a struct.
If you think another garbage collection run in the middle might mess up the order, you can try making a second array of GCHandle with pinned references to these objects. I don't recommend this, but it might be your only solution at that point.
Option 2:
Are you really using the entire record for sorting? That's unlikely. If not, then just extract the portion of each record that is relevant, sort those, and then re-shuffle the original data.
It is better not to touch the List at all. Instead, create an accessor method for your list. First, create an array of the n indices in a random order, e.g. something like var arr = [2, 5, ..., n-1, 0];
Then you create an access method:
Record get(List<Record> list, int i) {
return list[arr[i]];
}
By doing so the list remains untouched, but you get a random Record at every index.
Edit: to create a random order array:
int[] arr = new int[n];
// Fill the array with the index values 0 to n-1:
for (int i = 0; i < arr.Length; i++)
    arr[i] = i;
// Switch pairs of values for unbiased uniform random distribution:
Random rnd = new Random();
for (int i = 0; i < arr.Length - 1; i++) {
int j = rnd.Next(i, arr.Length);
int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
I have three nested loops from zero to n. n is a large number, around 12,000. The three loops operate on a 2D List; it is actually the Floyd algorithm. With data this large it takes a long time. Could you advise me how to improve it? Thank you. (Sorry for my English.)
List<List<int>> distance = new List<List<int>>();
...
for (int i = 0; i < n; i++)
for (int v = 0; v < n; v++)
for (int w = 0; w < n; w++)
{
if (distance[v][i] != int.MaxValue &&
distance[i][w] != int.MaxValue)
{
int d = distance[v][i] + distance[i][w];
if (distance[v][w] > d)
distance[v][w] = d;
}
}
The first part of your if statement, distance[v][i] != int.MaxValue, can be moved outside of the iteration over w to reduce overhead in some cases. However, I have no idea how often your values are at int.MaxValue, so I can't say how much it will save.
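A sketch of that rearrangement, keeping the question's List<List<int>> structure:
for (int i = 0; i < n; i++)
    for (int v = 0; v < n; v++)
    {
        // Hoisted: skip the whole inner loop when distance[v][i] is "infinite".
        int dvi = distance[v][i];
        if (dvi == int.MaxValue)
            continue;

        for (int w = 0; w < n; w++)
        {
            if (distance[i][w] != int.MaxValue)
            {
                int d = dvi + distance[i][w];
                if (distance[v][w] > d)
                    distance[v][w] = d;
            }
        }
    }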
You cannot change Floyd's algorithm; its complexity is fixed (and it's provably the most efficient solution to the general problem of finding all pairwise shortest path distances in a graph with negative edge weights).
You can only improve the runtime by making the problem more specific or the data set smaller. For a general solution you’re stuck with what you have.
Normally I would suggest using Parallel LINQ - for example the Ray Tracer sample - however, this assumes that the items you're operating on are independent. In your example you are using results from a previous iteration in the current one, making it impossible to parallelize.
As your code is quite simple and there isn't really any overhead, there's not really anything you can do to speed that up. As mentioned you could switch the Lists to arrays. You might also want to compare Double arithmetic to Integer arithmetic on your target machine.
After a quick look at your code, it seems you might be heading for an overflow, which the condition check cannot prevent.
In your code, the condition below does not guard against overflow, since we can have distance[v][i] < int.MaxValue && distance[i][w] < int.MaxValue but still distance[v][i] + distance[i][w] > int.MaxValue (which wraps around to a negative value).
if (distance[v][i] != int.MaxValue && distance[i][w] != int.MaxValue)
As the others have mentioned, the complexity is fixed, so you don't exactly have many options there. However, you can:
Use arrays instead of lists, if possible (see the sketch after this list).
Use an "unsafe" block with pointer semantics; this should decrease the time required to access your array data.
Check if you can parallelize your algorithm. In your case you could use multiple copies of your data (multiple copies to get rid of the need for synchronisation) and have several threads work on it, e.g. by splitting the range of the outer loop into subranges (1-1000, 1001-2000, etc.).
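A sketch of the array-based suggestion, converting to a jagged int[][] once and caching the row references (illustrative, not a drop-in replacement; results end up in dist, so copy them back if you still need the List):
// Sketch: copy the List<List<int>> into a jagged array once, then run the loops on it.
int[][] dist = new int[n][];
for (int v = 0; v < n; v++)
    dist[v] = distance[v].ToArray();

for (int i = 0; i < n; i++)
{
    int[] rowI = dist[i];              // row used by every inner iteration
    for (int v = 0; v < n; v++)
    {
        int[] rowV = dist[v];
        int dvi = rowV[i];
        if (dvi == int.MaxValue) continue;

        for (int w = 0; w < n; w++)
        {
            if (rowI[w] != int.MaxValue)
            {
                int d = dvi + rowI[w];
                if (rowV[w] > d)
                    rowV[w] = d;
            }
        }
    }
}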
I've got an array of integers we're getting from a third-party provider. These are meant to be sequential, but for some reason they miss a number (something throws an exception, it's swallowed, and the loop continues, skipping that index). This causes our system some grief, and I'm trying to ensure that the array we're getting is indeed sequential.
The numbers start from varying offsets (sometimes 1000, sometimes 5820, others 0), but whatever the start, they're meant to increase by one from there.
What's the fastest method to verify the array is sequential? Even though it now seems to be a required step, I also have to make sure it doesn't take too long to verify. I am currently starting at the first index, picking up the number, adding one, and making sure the next index contains that, and so on.
EDIT:
The reason the system fails is that, because of the way people use it, the tokens may not always be returned in the order they were picked initially - long story. Unfortunately, the data can't be corrected until it gets to our layer.
If you're sure that the array is sorted and has no duplicates, you can just check:
array[array.Length - 1] == array[0] + array.Length - 1
I think it's worth addressing the bigger issue here: what are you going to do if the data doesn't meet your requirements (sequential, no gaps)?
If you're still going to process the data, then you should probably invest your time in making your system more resilient to gaps or missing entries in the data.
If you need to process the data and it must be clean, you should work with the vendor to make sure they send you well-formed data.
If you're going to skip processing and report an error, then asserting the precondition of no gaps may be the way to go. In C# there's a number of different things you could do:
If the data is sorted and has no dups, just check if LastValue == FirstValue + ArraySize - 1.
If the data is not sorted but dup free, just sort it and do the above.
If the data is not sorted, has dups and you actually want to detect the gaps, I would use LINQ.
List<int> gaps = Enumerable.Range(array.Min(), array.Length).Except(array).ToList();
or better yet (since the high-end value may be out of range):
int minVal = array.Min();
int maxVal = array.Max();
List<int> gaps = Enumerable.Range(minVal, maxVal-minVal+1).Except(array).ToList();
By the way, the whole concept of being passed a dense, gapless array of integers is a bit odd for an interface between two parties, unless there's some additional data associated with them. If there's no other data, why not just send a range {min, max} instead?
for (int i = a.Length - 2; 0 <= i; --i)
{
    if (a[i] + 1 != a[i + 1]) return false; // gap or out of order: not in sequence
}
return true; // in sequence
Gabe's way is definitely the fastest if the array is sorted. If the array is not sorted, then it would probably be best to sort the array (with merge/shell sort (or something of similar speed)) and then use Gabe's way.
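A rough sketch of that combination, sorting a copy so the original array is left untouched (assumes there are no duplicates):
// Sketch: sort a copy, then apply the "last == first + length - 1" check.
int[] sorted = (int[])array.Clone();
Array.Sort(sorted);
bool isSequential = sorted[sorted.Length - 1] == sorted[0] + sorted.Length - 1;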
I have two multi-dimensional arrays declared like this:
bool?[,] biggie = new bool?[500, 500];
bool?[,] small = new bool?[100, 100];
I want to copy part of the biggie one into the small. Let’s say I want from the index 100 to 199 horizontally and 100 to 199 vertically.
I have written a simple for statement that goes like this:
for (int x = 0; x < 100; x++)
{
    for (int y = 0; y < 100; y++)
    {
        small[x, y] = biggie[x + 100, y + 100];
    }
}
I do this A LOT in my code, and it has proven to be a major performance bottleneck.
Array.Copy only copies single-dimensional ranges; with multi-dimensional arrays it treats the whole matrix as one flat array, with each row laid end to end, which won't let me cut a square out of the middle of my array.
Is there a more efficient way to do this?
P.S.: I am considering refactoring my code so I don't do this at all and instead work directly on the bigger array. Copying matrices just isn't painless; the point is that I have stumbled on this before, looked for an answer, and found none.
In my experience, there are two ways to do this efficiently:
Use unsafe code and work directly with pointers.
Convert the 2D array to a 1D array and do the necessary arithmetic when you need to access it as a 2D array.
The first approach is ugly and relies on potentially invalid assumptions, since 2D arrays are not guaranteed to be laid out contiguously in memory. The upside of the first approach is that you don't have to change the code that already uses 2D arrays. The second approach is as efficient as the first, doesn't make invalid assumptions, but does require updating your code.
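A sketch of the second approach applied to the copy in the question (sizes taken from the question, variable names illustrative; with flat arrays, each row of the sub-square becomes one Array.Copy call):
// Sketch: flat 1D arrays indexed as [y * width + x].
const int bigW = 500;
const int smallW = 100;
bool?[] biggie = new bool?[bigW * bigW];
bool?[] small = new bool?[smallW * smallW];

// Copy the 100x100 block starting at (100, 100), row by row.
for (int y = 0; y < smallW; y++)
{
    Array.Copy(biggie, (y + 100) * bigW + 100,   // source offset
               small, y * smallW,                // destination offset
               smallW);                          // elements per row
}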
I'm looking for resources that can help me determine which approach to use in creating a 2d data structure with C#.
Do you mean multidimensional array? It's simple:
<type>[,] <name> = new <type>[<first dimension>, <second dimension>];
Here is MSDN reference:
Multidimensional Arrays (C#)
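A small concrete instance of that syntax (the names are just illustrative):
// A 3x4 rectangular (multidimensional) array of strings.
string[,] grid = new string[3, 4];
grid[1, 2] = "hello";
int rows = grid.GetLength(0);   // 3
int cols = grid.GetLength(1);   // 4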
@Traumapony - I'd actually say that the real performance gain comes from using one giant flat array, but that may just be my C++ image processing roots showing.
It depends on what you need the 2D structure to do. If it's storing something where each set of items in the second dimension is the same size, then you want to use something like a large 1D array, because the seek times are faster and the data management is easier. Like:
for (y = 0; y < ysize; y++){
for (x = 0; x < xsize; x++){
theArray[y*xsize + x] = //some stuff!
}
}
And then you can do operations that ignore neighboring pixels in a single pass:
totalsize = xsize*ysize;
for (x = 0; x < totalsize; x++){
theArray[x] = //some stuff!
}
Except that in C# you probably want to call a C++ library to do this kind of processing; C++ tends to be faster for it, especially if you use the Intel compiler.
If you have the second dimension having multiple different sizes, then nothing I said applies, and you should look at some of the other solutions. You really need to know what your functional requirements are in order to be able to answer the question.
Depending on the type of the data, you could look at using a straight two-dimensional array:
int[][] intGrid;
If you need to get tricky, you could always go the generics approach:
Dictionary<KeyValuePair<int,int>,string>;
That allows you to put complex types in the value part of the dictionary, although makes indexing into the elements more difficult.
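A brief sketch of that generics approach (the variable name and keys are just for illustration):
// Sparse 2D grid keyed by (x, y) coordinates.
var grid = new Dictionary<KeyValuePair<int, int>, string>();
grid[new KeyValuePair<int, int>(3, 7)] = "treasure";

string value;
if (grid.TryGetValue(new KeyValuePair<int, int>(3, 7), out value))
{
    Console.WriteLine(value); // "treasure"
}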
If you're looking to store spatial 2d point data, System.Drawing has a lot of support for points in 2d space.
For performance, it's best not to use multi-dimensional arrays ([,]); instead, use jagged arrays. e.g.:
<type>[][] <name> = new <type>[<first dimension>][];
for (int i = 0; i < <first dimension>; i++)
{
<name>[i] = new <type>[<second dimension>];
}
To access:
<type> item = <name>[<first index>][<second index>];
Data Structures in C#
Seriously, I'm not trying to be critical of the question, but I got tons of useful results right at the top of my search when I Googled for:
data structures c#
If you have specific questions about specific data structures, we might have more specific answers...