How to skip first few elements from array? - c#

I have an array with 5000 elements in total, and in one part of my code I only need the last 3000 elements to proceed further.
For that, I have tried the following solution:
//skipping first 2000 elements
list = list.Skip(5000 - 3000).ToArray();
This solution does give me the desired result, but when I ran a profiler on my code, it showed a huge amount of memory allocation on this line.
I have to use an array because of legacy code, and calling ToArray() this frequently doesn't seem to be good for performance.
Another possible solution is:
//reversing whole list
Array.Reverse(list);
//restricting the size of the array to 3000,
//so only the first 3000 remain (after the reverse, these are the original last 3000 elements)
Array.Resize(ref list, 3000);
//reversing again to restore the original order
Array.Reverse(list);
But this is even worse in terms of time complexity.
Is there a better solution that doesn't need converting a list or sequence back to an array?

If you absolutely have to use an array, then Array.Copy is probably your friend:
int[] smallerArray = new int[array.Length - 2000];
Array.Copy(array, 2000, smallerArray, 0, smallerArray.Length);
I'd expect that to be a bit more efficient than using Skip followed by ToArray.
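A minimal generic sketch of the same Array.Copy idea, wrapped in a helper. The name `SkipToArray` is hypothetical (it is not a framework method); it validates the count and then copies everything after the first `count` elements with a single Array.Copy call.

```csharp
using System;

// Hypothetical helper: returns a new array holding every element of
// `source` except the first `count`, using a single Array.Copy call.
T[] SkipToArray<T>(T[] source, int count)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (count < 0 || count > source.Length)
        throw new ArgumentOutOfRangeException(nameof(count));

    var result = new T[source.Length - count];
    Array.Copy(source, count, result, 0, result.Length);
    return result;
}

// Usage: 5000 elements, keep the last 3000 by skipping the first 2000.
int[] data = new int[5000];
for (int i = 0; i < data.Length; i++) data[i] = i;

int[] lastPart = SkipToArray(data, 2000);
Console.WriteLine($"{lastPart.Length} elements, first = {lastPart[0]}"); // 3000 elements, first = 2000
```

This still allocates one new array, but only once and with exactly the right size, so there is no intermediate buffer growth.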

If list is a List<> you can use List.GetRange:
int lastN = 3000;
var sublist = list.GetRange(list.Count - lastN, lastN);
var array = sublist.ToArray();
This is more efficient because List.ToArray uses Array.Copy.
If list is an int[] (as you commented), it's even more efficient:
int lastN = 3000;
int[] result = new int[lastN];
Array.Copy(list, list.Length - lastN, result, 0, lastN);
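A small sketch, assuming `lastN` is no larger than the collection, that runs both variants above side by side and checks they produce the same last 3000 elements. The sizes are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int lastN = 3000;

// Source as a List<int> (first variant).
var list = Enumerable.Range(0, 5000).ToList();
// GetRange returns a shallow copy of the slice; ToArray copies it once more.
int[] fromList = list.GetRange(list.Count - lastN, lastN).ToArray();

// Source as an int[] (second variant): one allocation and one copy.
int[] array = list.ToArray();
int[] fromArray = new int[lastN];
Array.Copy(array, array.Length - lastN, fromArray, 0, lastN);

Console.WriteLine(fromList.SequenceEqual(fromArray)); // True
Console.WriteLine(fromArray[0]); // 2000
```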

You can use Skip(n).ToArray(), where n is the number of elements you want to exclude.
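For completeness, a sketch of the Skip one-liner with `n` computed from the array length, plus a check that it matches the Array.Copy result. Note that ToArray() still allocates a new array, which is the allocation the question's profiler flagged.

```csharp
using System;
using System.Linq;

int[] data = Enumerable.Range(0, 5000).ToArray();
int keep = 3000;
int exclude = data.Length - keep; // number of leading elements to skip

int[] viaSkip = data.Skip(exclude).ToArray();

int[] viaCopy = new int[keep];
Array.Copy(data, exclude, viaCopy, 0, keep);

Console.WriteLine(viaSkip.SequenceEqual(viaCopy)); // True
```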

Related

Random access on .NET lists is slow, but what if I always reference the first element?

I know that in general, .NET Lists are not good for random access. I've always been told that an array would be best for that. I have a program that needs to continually (like more than a billion times) access the first element of a .NET list, and I am wondering if this will slow anything down, or it won't matter because it's the first element in the list. I'm also doing a lot of other things like adding and removing items from the list as I go along, but the List is never empty.
I'm using F#, but I think this applies to any .NET language (I am using .NET Lists, not F# Lists). My list is about 100 elements long.
In F#, the .NET list (System.Collections.Generic.List) is aptly aliased as ResizeArray, which leaves little doubt as to what to expect. It's an array that can resize itself, and not really a list in the CS-classroom understanding of the term. Any performance differences between it and a simple array most likely come from the fact that the compiler can be more aggressive about optimizing array usage.
Back to your question. If you only access the first element of a list, it doesn't matter what you choose. Both a ResizeArray and a list (using F# lingo) have O(1) access to the first element (head).
A list would be a preferable choice if your other operations also work on the head element, i.e. you only add elements at the head. If you want to append elements to the end of the list, or mutate some elements that are already in it, you'd get better mileage out of a ResizeArray.
That said, a ResizeArray in idiomatic F# code is a rare sight. The usual approach favors (and doesn't suffer from using) immutable data structures, so seeing one would usually be a minor red flag for me.
There is not much difference between the performance of random access for an array and a list. Here's a test on my machine.
using System;
using System.Diagnostics;
using System.Linq;

var list = Enumerable.Range(1, 100).ToList();
var array = Enumerable.Range(1, 100).ToArray();
int total = 0;
var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000000000; i++) {
total ^= list[0];
}
Console.WriteLine("Time for list: {0}", sw.Elapsed);
sw.Restart();
for (int i = 0; i < 1000000000; i++) {
total ^= array[0];
}
Console.WriteLine("Time for array: {0}", sw.Elapsed);
This produces this output:
Time for list: 00:00:05.2002620
Time for array: 00:00:03.0159816
If you know the list has a fixed size, it makes sense to use an array; otherwise, there's not much cost to using the list (but see the update below).
Update!
I found some pretty significant new information. After executing the script in release mode, the story changes quite a bit.
Time for list: 00:00:02.3048339
Time for array: 00:00:00.0805705
In this case, the performance of the array totally dominates the list. I'm pretty surprised, but the numbers don't lie.
Go with the array.

How to re-order data in memory to optimize cache access?

I want to shuffle a big dataset (of type List<Record>), then iterate over it many times. Typically, shuffling a list only shuffles the references, not the data. My algorithm's performance suffers tremendously (3x) because of frequent cache misses. I can do a deep copy of the shuffled data to make it cache friendly. However, that would double the memory usage.
Is there a more memory-efficient way to shuffle or re-order data so that the shuffled data is cache friendly?
Option 1:
Make Record a struct so the List<Record> holds contiguous data in memory.
Then either sort it directly or, if the records are large, make an array of indices (initially just {0, 1, ..., n - 1}) and sort the indices with a comparator that compares the elements they refer to. Finally, if you need the sorted array, you can copy the elements into a new array in the order given by the indices.
Note that this may be more cache-unfriendly than directly sorting the structs, but at least it'll be a single pass through the data, so it is more likely to be faster, depending on the struct size. You can't really avoid it if the struct is large, so if you're not sure whether Record is large, you'll have to try both approaches and see whether sorting the records directly is more efficient.
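A rough sketch of the index-array variant of Option 1. Assumptions: `Record` is modelled here as a value tuple `(int Key, double Payload)` (value tuples are structs, so an array of them is contiguous), and records are ordered by `Key`; substitute your own struct and comparison. Only the `int` indices move during the sort, and the records are then copied once into their final order.

```csharp
using System;

// Stand-in for a struct Record: value tuples are structs, so this array is contiguous.
var records = new (int Key, double Payload)[]
{
    (3, 0.3), (1, 0.1), (4, 0.4), (2, 0.2)
};

// 1. Indices 0..n-1.
int[] order = new int[records.Length];
for (int i = 0; i < order.Length; i++) order[i] = i;

// 2. Sort the indices by comparing the records they refer to.
Array.Sort(order, (a, b) => records[a].Key.CompareTo(records[b].Key));

// 3. One pass to materialize the records contiguously in the new order.
var reordered = new (int Key, double Payload)[records.Length];
for (int i = 0; i < order.Length; i++) reordered[i] = records[order[i]];

Console.WriteLine(string.Join(" ", Array.ConvertAll(reordered, r => r.Key))); // 1 2 3 4
```

For a shuffle rather than a sort, step 2 would instead apply a random permutation to `order`; the materializing pass in step 3 stays the same.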
If you can't change the type, then your only solution is to somehow make them contiguous in memory. The only realistic way of doing that is to perform an initial garbage collection, then allocate them in order, and keep your fingers crossed hoping that the runtime will allocate them contiguously. I can't think of any other way that could work if you can't make it a struct.
If you think another garbage collection run in the middle might mess up the order, you can try making a second array of GCHandle with pinned references to these objects. I don't recommend this, but it might be your only solution at that point.
Option 2:
Are you really using the entire record for sorting? That's unlikely. If not, then just extract the portion of each record that is relevant, sort those, and then re-shuffle the original data.
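A minimal sketch of Option 2 using the `Array.Sort(keys, items)` overload, which sorts the `keys` array and applies the same rearrangement to `items`. Assumptions: the relevant part of each record is a single `int` key, and records are again modelled as a value-tuple struct; only the extracted keys are compared.

```csharp
using System;

// Stand-in for a struct Record with a sort-relevant Key and other data.
var records = new (int Key, int Value)[]
{
    (30, 300), (10, 100), (20, 200)
};

// Extract only the part of each record that the ordering depends on.
int[] keys = Array.ConvertAll(records, r => r.Key);

// Sorts `keys` and applies the same rearrangement to `records`,
// so the structs themselves end up stored in key order.
Array.Sort(keys, records);

Console.WriteLine(string.Join(",", Array.ConvertAll(records, r => r.Value))); // 100,200,300
```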
It is better for you not to touch the List. Instead, create an accessor method for your list. First, create an array of the n indices in a random order, e.g. something like {2, 5, ..., n - 1, 0}.
Then you create an access method:
Record get(List<Record> list, int i) {
return list[arr[i]];
}
By doing so the list remains untouched, but you get a random Record at every index.
Edit: to create a random order array:
int[] arr = new int[n];
// Fill the array with the indices 0 to n - 1:
for (int i = 0; i < arr.Length; i++)
arr[i] = i;
// Fisher-Yates shuffle: swap each slot with a random slot at or after it (unbiased uniform permutation):
Random rnd = new Random();
for (int i = 0; i < arr.Length - 1; i++) {
int j = rnd.Next(i, arr.Length);
int temp = arr[i];
arr[i] = arr[j];
arr[j] = temp;
}
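A self-contained sketch putting the two pieces of this answer together: a Fisher-Yates shuffle of the indices 0..n-1 and an accessor that reads through them. `Record` is replaced by `int` for brevity, and the fixed seed only makes runs repeatable; both are assumptions for illustration. The underlying list keeps its original order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<int> { 10, 20, 30, 40, 50 };
int n = list.Count;

// Indices 0..n-1 (they are used directly as list positions).
int[] arr = new int[n];
for (int i = 0; i < n; i++) arr[i] = i;

// Fisher-Yates: swap each slot with a random slot at or after it.
var rnd = new Random(42);
for (int i = 0; i < n - 1; i++)
{
    int j = rnd.Next(i, n); // upper bound is exclusive
    (arr[i], arr[j]) = (arr[j], arr[i]);
}

// Accessor: the i-th element of the shuffled view.
int Get(List<int> source, int i) => source[arr[i]];

var shuffled = Enumerable.Range(0, n).Select(i => Get(list, i)).ToList();
Console.WriteLine(string.Join(", ", shuffled));
```

Keep in mind that reading through `arr` still jumps around in memory, so this saves memory but does not by itself fix the cache misses described in the question.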

Building int[] of increasing size

What is the easiest way to build an array of integers starting at 0 and increasing until a given point?
Background:
I have a struct that holds an int[] representing indexes of other arrays.
I would like to signify I want to use all indexes by filling this array with ints starting at 0 and increasing until int numTotalIndexes; I am sure there is a better way to do this than using a for loop.
Someone here showed me this little Linq trick
int[] numContacts = new int[]{ 32, 48, 24, 12};
String[][] descriptions = numContacts.Select(c => new string[c]).ToArray();
to build a jagged 2D array without loops (well it does, but it hides them and makes my code pretty) and I think there might be a nice little trick to accomplish what I want above.
You can use Enumerable.Range:
int[] intArray = Enumerable.Range(0, numTotalIndexes).ToArray();
You:
I am sure there is a better way to do this than using a for loop
Note that LINQ also uses loops; you simply don't see them. It may also not be the most efficient way, since ToArray may not know in advance how large the array must be. However, it is a readable and short way.
So here is the (possibly premature-)optimized, classic way to initialize the array:
int[] intArray = new int[numTotalIndexes];
for(int i=0; i < numTotalIndexes; i++)
intArray[i] = i;
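A quick sketch comparing the two ways shown above of building 0, 1, ..., numTotalIndexes - 1; the value of `numTotalIndexes` is arbitrary here.

```csharp
using System;
using System.Linq;

int numTotalIndexes = 8;

// LINQ: short and readable.
int[] viaRange = Enumerable.Range(0, numTotalIndexes).ToArray();

// Classic loop: the size is known up front, so there is exactly one allocation.
int[] viaLoop = new int[numTotalIndexes];
for (int i = 0; i < numTotalIndexes; i++)
    viaLoop[i] = i;

Console.WriteLine(string.Join(",", viaLoop)); // 0,1,2,3,4,5,6,7
Console.WriteLine(viaRange.SequenceEqual(viaLoop)); // True
```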
I'm not sure I understand your question at all, but going by your first line, what you want is:
var mySequentialArray = Enumerable.Range(0, howManyYouWant).ToArray();

Prepend to a C# Array

Given a populated byte[] values in C#, I want to prepend the value (byte)0x00 to the array. I assume this will require making a new array and adding the contents of the old array. Speed is an important aspect of my application. What is the best way to do this?
-- EDIT --
The byte[] is used to store DSA (Digital Signature Algorithm) parameters. The operation will only need to be performed once per array, but speed is important because I am potentially performing this operation on many different byte[]s.
If you are only going to perform this operation once then there isn't a whole lot of choices. The code provided by Monroe's answer should do just fine.
byte[] newValues = new byte[values.Length + 1];
newValues[0] = 0x00; // set the prepended value
Array.Copy(values, 0, newValues, 1, values.Length); // copy the old values
If, however, you're going to be performing this operation multiple times, you have some more choices. The fundamental problem is that prepending data to an array isn't an efficient operation, so you could choose to use an alternate data structure.
A LinkedList can efficiently prepend data, but it's less efficient in general for most tasks, as it involves a lot more memory allocation/deallocation and also loses memory locality, so it may not be a net win.
A double ended queue (known as a deque) would be a fantastic data structure for you. You can efficiently add to the start or the end, and efficiently access data anywhere in the structure (but you can't efficiently insert somewhere other than the start or end). The major problem here is that .NET doesn't provide an implementation of a deque. You'd need to find a 3rd party library with an implementation.
You can also save yourself a lot when copying by keeping track of "data that I need to prepend" (using a List/Queue/etc.) and then waiting to actually prepend the data as long as possible, so that you minimize the creation of new arrays as much as possible, as well as limiting the number of copies of existing elements.
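A rough sketch of the deferred-prepend idea: bytes to be prepended are collected in a `List<byte>` and applied with a single allocation when the array is actually needed. The helper name `ApplyPending` is hypothetical. Because each new prefix goes in front of the earlier ones, the pending values are written out in reverse.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: applies all pending prepends in one go.
// `pending` holds values in the order they were prepended, so the most
// recent one must end up first.
byte[] ApplyPending(List<byte> pending, byte[] values)
{
    var result = new byte[pending.Count + values.Length];
    for (int i = 0; i < pending.Count; i++)
        result[i] = pending[pending.Count - 1 - i];
    Array.Copy(values, 0, result, pending.Count, values.Length);
    pending.Clear();
    return result;
}

byte[] values = { 1, 2, 3 };
var pendingPrepends = new List<byte>();
pendingPrepends.Add(0x00); // prepend 0x00
pendingPrepends.Add(0xFF); // then prepend 0xFF in front of it

values = ApplyPending(pendingPrepends, values);
Console.WriteLine(BitConverter.ToString(values)); // FF-00-01-02-03
```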
You could also consider whether you could adjust the structure so that you're adding to the end rather than the start (even if you know that you'll need to reverse it later). If you are appending a lot in a short space of time, it may be worth storing the data in a List (which can efficiently add to the end) and adding to the end. Depending on your needs, it may even be worth making a class that wraps a List and hides the fact that it is reversed. You could make an indexer that maps i to Count - 1 - i, etc., so that from the outside it appears as though your data is stored normally, even though the internal List actually holds the data backwards.
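A minimal sketch of the reversed-storage idea, using local functions over a `List<byte>` instead of the wrapper class suggested above; the names `PrependByte`, `At` and `ToLogicalArray` are hypothetical. Logical index `i` maps to `Count - 1 - i` in the internal list, so a prepend is just an amortized O(1) `Add` at the end.

```csharp
using System;
using System.Collections.Generic;

// Internal storage holds the logical sequence backwards.
var backing = new List<byte>();

void PrependByte(byte value) => backing.Add(value);   // amortized O(1)
byte At(int i) => backing[backing.Count - 1 - i];     // logical index i
byte[] ToLogicalArray()
{
    var result = backing.ToArray();
    Array.Reverse(result); // restore logical order in one pass
    return result;
}

// Build the logical sequence 00, 01, 02 by prepending in reverse order.
PrependByte(0x02);
PrependByte(0x01);
PrependByte(0x00);

Console.WriteLine(At(0)); // 0
Console.WriteLine(BitConverter.ToString(ToLogicalArray())); // 00-01-02
```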
Ok guys, let's take a look at the performance issue regarding this question.
This is not an answer, just a microbenchmark to see which option is more efficient.
So, let's set the scenario:
A byte array of 1,000,000 items, randomly populated
We need to prepend item 0x00
We have 3 options:
Manually creating and populating the new array
Manually creating the new array and using Array.Copy (#Monroe)
Creating a list, loading the array, inserting the item and converting the list to an array
Here's the code:
using System;
using System.Collections.Generic;
using System.Diagnostics;

byte[] byteArray = new byte[1000000];
// Populate the array with random bytes
new Random().NextBytes(byteArray);
Stopwatch stopWatch = new Stopwatch();
//#1 Manually creating and populating a new array;
stopWatch.Start();
byte[] extendedByteArray1 = new byte[byteArray.Length + 1];
extendedByteArray1[0] = 0x00;
for (int i = 0; i < byteArray.Length; i++)
{
extendedByteArray1[i + 1] = byteArray[i];
}
stopWatch.Stop();
Console.WriteLine(string.Format("#1: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();
//#2 Using a new array and Array.Copy
stopWatch.Start();
byte[] extendedByteArray2 = new byte[byteArray.Length + 1];
extendedByteArray2[0] = 0x00;
Array.Copy(byteArray, 0, extendedByteArray2, 1, byteArray.Length);
stopWatch.Stop();
Console.WriteLine(string.Format("#2: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();
//#3 Using a List
stopWatch.Start();
List<byte> byteList = new List<byte>();
byteList.AddRange(byteArray);
byteList.Insert(0, 0x00);
byte[] extendedByteArray3 = byteList.ToArray();
stopWatch.Stop();
Console.WriteLine(string.Format("#3: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();
Console.ReadLine();
And the results are:
#1: 9 ms
#2: 1 ms
#3: 6 ms
I've run it multiple times and I got different numbers, but the proportion is always the same: #2 is always the most efficient choice.
My conclusion: arrays are more efficient than Lists (although they provide less functionality), and somehow Array.Copy is really optimized (I would like to understand why, though).
Any feedback will be appreciated.
Best regards.
PS: this is not a swordfight post, we are at a Q&A site to learn and teach. And learn.
The easiest and cleanest way for .NET 4.7.1 and above is to use the side-effect free Prepend().
Adds a value to the beginning of the sequence.
Example
// Creating an array of numbers
var numbers = new[] { 1, 2, 3 };
// Trying to prepend any value of the same type
var results = numbers.Prepend(0);
// output is 0, 1, 2, 3
Console.WriteLine(string.Join(", ", results ));
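Applied to the question's byte[], a sketch: Prepend returns a lazy IEnumerable<byte>, so ToArray() is needed to get an array back, and the cast keeps the element type as byte rather than int.

```csharp
using System;
using System.Linq;

byte[] values = { 0x10, 0x20, 0x30 };

// Prepend is lazy; ToArray() materializes a new byte[] of length values.Length + 1.
byte[] newValues = values.Prepend((byte)0x00).ToArray();

Console.WriteLine(BitConverter.ToString(newValues)); // 00-10-20-30
```

The original array is left unchanged, which is what "side-effect free" means here.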
As you surmised, the fastest way to do this is to create new array of length + 1 and copy all the old values over.
If you are going to be doing this many times, then I suggest using a List<byte> instead of byte[], as the cost of reallocating and copying while growing the underlying storage is amortized more effectively; in the usual case, the underlying vector in the List is grown by a factor of two each time an addition or insertion is made to the List that would exceed its current capacity.
...
byte[] newValues = new byte[values.Length + 1];
newValues[0] = 0x00; // set the prepended value
Array.Copy(values, 0, newValues, 1, values.Length); // copy the old values
When I need to append data frequently but also want O(1) random access to individual elements, I'll use an array that is over-allocated by some amount of padding for quickly adding new values. This means you need to store the actual content length in another variable, as array.Length will indicate the length plus the padding. A new value gets appended by using one slot of the padding; no allocation and copy are necessary until you run out of padding. In effect, allocation is amortized over several append operations. There are speed/space trade-offs: if you have many of these data structures, you could have a fair amount of padding in use at any one time in the program.
This same technique can be used in prepending. Just as with appending, you can introduce an interface or abstraction between the users and the implementation: you can have several slots of padding so that new memory allocation is only necessary occasionally. As some above suggested, you can also implement a prepending interface with an appending data structure that reverses the indexes.
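A minimal sketch of the padding technique applied to prepending. Assumptions: the state is a value tuple `(byte[] Buffer, int Start)` rather than a proper collection class, the helper names are hypothetical, and the growth policy (as much free space in front as there is data) is one reasonable choice, not the only one. The live data is `Buffer[Start..]`, and the slots before `Start` are the padding.

```csharp
using System;

// Hypothetical front-padded buffer: live data is Buffer[Start..Buffer.Length),
// and the slots before Start are spare room for cheap prepends.
(byte[] Buffer, int Start) CreatePadded(byte[] initial, int padding)
{
    var buffer = new byte[padding + initial.Length];
    Array.Copy(initial, 0, buffer, padding, initial.Length);
    return (buffer, padding);
}

void PrependByte(ref (byte[] Buffer, int Start) state, byte value)
{
    if (state.Start == 0)
    {
        // Out of padding: reallocate with as much free space in front as there
        // is data, so the copy cost is amortized over many prepends.
        int length = state.Buffer.Length;
        int newPadding = Math.Max(1, length);
        var bigger = new byte[newPadding + length];
        Array.Copy(state.Buffer, 0, bigger, newPadding, length);
        state = (bigger, newPadding);
    }
    state.Start--;
    state.Buffer[state.Start] = value;
}

byte[] Contents((byte[] Buffer, int Start) state)
{
    var result = new byte[state.Buffer.Length - state.Start];
    Array.Copy(state.Buffer, state.Start, result, 0, result.Length);
    return result;
}

var buf = CreatePadded(new byte[] { 1, 2, 3 }, 2);
PrependByte(ref buf, 0);
PrependByte(ref buf, 9);
PrependByte(ref buf, 8); // padding exhausted: triggers one reallocation

Console.WriteLine(BitConverter.ToString(Contents(buf))); // 08-09-00-01-02-03
```

Only the third prepend allocates; the first two just write into the existing padding.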
I'd package the data structure as an implementation of some generic collection interface, so that the interface appears fairly normal to the user (such as an array list or something).
(Also, if removal is supported, it's probably useful to clear elements as soon as they are removed to help reduce gc load.)
The main point is to consider the implementation and the interface separately, as decoupling them gives you the flexibility to choose varied implementations or to hide implementation details using a minimal interface.
There are many other data structures you could use, depending on their applicability to your domain: ropes or gap buffers (see "What is best data structure suitable to implement editor like notepad?"); tries do some useful things, too.
I know this is a VERY old post, but I actually like using LINQ one-liners. Sure, my code may NOT be the most efficient way, but it's readable and in one line. I use a combination of .Concat and ArraySegment.
string[] originalStringArray = new string[] { "1", "2", "3", "5", "6" };
int firstElementZero = 0;
int insertAtPositionZeroBased = 3;
string stringToPrepend = "0";
string stringToInsert = "FOUR"; // Deliberate !!!
originalStringArray = new string[] { stringToPrepend }
.Concat(originalStringArray).ToArray();
insertAtPositionZeroBased += 1; // BECAUSE we prepended !!
originalStringArray = new ArraySegment<string>(originalStringArray, firstElementZero, insertAtPositionZeroBased)
.Concat(new string[] { stringToInsert })
.Concat(new ArraySegment<string>(originalStringArray, insertAtPositionZeroBased, originalStringArray.Length - insertAtPositionZeroBased)).ToArray();
The best choice depends on what you're going to be doing with this collection later on down the line. If that's the only length-changing edit that will ever be made, then your best bet is to create a new array with one additional slot and use Array.Copy() to do the rest. No need to initialize the first value, since new C# arrays are always zeroed out:
byte[] PrependWithZero(byte[] input)
{
var newArray = new byte[input.Length + 1];
Array.Copy(input, 0, newArray, 1, input.Length);
return newArray;
}
If there are going to be other length-changing edits that might happen, the most performant option might be to use a List<byte> all along, as long as the additions aren't always to the beginning. (If that's the case, even a linked list might not be an option that you can dismiss out of hand.):
var list = new List<byte>(input);
list.Insert(0, 0);
I am aware this is an accepted post that is over four years old, but for those to whom it might be relevant, Buffer.BlockCopy can be faster.
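A sketch of the same prepend using Buffer.BlockCopy instead of Array.Copy. Its offsets and count are in bytes, which for a byte[] coincide with element indices. Whether it is measurably faster than Array.Copy depends on the runtime, so it is worth benchmarking on your own target.

```csharp
using System;

byte[] PrependZero(byte[] values)
{
    // New arrays are zero-initialized, so index 0 already holds 0x00.
    var result = new byte[values.Length + 1];
    // Offsets and count are in bytes; for byte[] they equal element indices.
    Buffer.BlockCopy(values, 0, result, 1, values.Length);
    return result;
}

byte[] output = PrependZero(new byte[] { 0xAB, 0xCD });
Console.WriteLine(BitConverter.ToString(output)); // 00-AB-CD
```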

combination question

i have an array like below
int[] array = new int[n]; // n may be 2, 3, 4
Example for n = 4:
int[] array = new int[4];
array[0] = 2;
array[1] = 4;
array[2] = 6;
array[3] = 8;
How can I calculate all unrepeated combinations of this array, possibly without using LINQ?
2,4,6,8
2,4,8,6
2,8,6,4
2,6,4,8
8,6,4,2
2,4,6,8
.......
.......
.......
Here's a pretty flexible C# implementation using iterators.
Well, given that you are looking for all unrepeated combinations, that means there will be N! such combinations... (so, in your case, N! = 4! = 24 such combinations).
As I'm in the middle of posting this, dommer has pointed out a good implementation.
Just be warned that it is going to get really slow for large values of N (since there are N! permutations).
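Since the linked implementation is not reproduced here, a minimal recursive sketch using an iterator (`yield return`). It assumes "unrepeated" means distinct orderings: at each position a given value is only tried once, so arrays containing duplicates do not produce the same ordering twice. Each result is a fresh copy, and the number of results still grows as N!.

```csharp
using System;
using System.Collections.Generic;

// Yields every distinct ordering of `items`, one new array per ordering.
IEnumerable<int[]> Permutations(int[] items)
{
    var work = (int[])items.Clone();
    return Permute(work, 0);
}

IEnumerable<int[]> Permute(int[] work, int start)
{
    if (start >= work.Length - 1)
    {
        yield return (int[])work.Clone();
        yield break;
    }
    var tried = new HashSet<int>(); // values already placed at `start`
    for (int i = start; i < work.Length; i++)
    {
        if (!tried.Add(work[i])) continue; // skip duplicate values
        (work[start], work[i]) = (work[i], work[start]);
        foreach (var p in Permute(work, start + 1))
            yield return p;
        (work[start], work[i]) = (work[i], work[start]); // undo the swap
    }
}

int count = 0;
foreach (var p in Permutations(new[] { 2, 4, 6, 8 }))
{
    if (count < 3) Console.WriteLine(string.Join(",", p));
    count++;
}
Console.WriteLine(count); // 24
```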
Think about the two possible states of the world to see if that sheds any light.
1) There are no dupes in my array (i.e. each number in the array is unique). In this case, how many possible permutations are there?
2) There is one single dupe in the array. So, of the number of permutations that you calculated in part one, how many are just duplicates?
Hmmm, let's take a three-element array for simplicity.
1,3,5 has how many permutations?
1,3,5
1,5,3
3,1,5
3,5,1
5,1,3
5,3,1
So six permutations
Now what happens if we change the list to say 1,5,5?
We get
1,5,5
5,1,5
5,5,1
My question to you would be, how can you express this via factorials?
Perhaps try writing out all the permutations with a four element array and see if the light bulb goes off?
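For reference, the standard counting result these hints lead towards: for n elements in which the distinct values occur k_1, ..., k_m times, the number of distinct orderings is n! divided by the product of the k_i!. The worked cases below match the lists above.

```latex
\frac{n!}{k_1!\,k_2!\cdots k_m!}, \qquad
\{1,3,5\}:\ \frac{3!}{1!\,1!\,1!} = 6, \qquad
\{1,5,5\}:\ \frac{3!}{1!\,2!} = 3, \qquad
\{2,4,6,8\}:\ \frac{4!}{1!\,1!\,1!\,1!} = 24
```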
