Given a populated byte[] values in C#, I want to prepend the value (byte)0x00 to the array. I assume this will require making a new array and adding the contents of the old array. Speed is an important aspect of my application. What is the best way to do this?
-- EDIT --
The byte[] is used to store DSA (Digital Signature Algorithm) parameters. The operation will only need to be performed once per array, but speed is important because I am potentially performing this operation on many different byte[]s.
If you are only going to perform this operation once, then there aren't a whole lot of choices. The code provided in Monroe's answer should do just fine:
byte[] newValues = new byte[values.Length + 1];
newValues[0] = 0x00; // set the prepended value
Array.Copy(values, 0, newValues, 1, values.Length); // copy the old values
If, however, you're going to be performing this operation multiple times you have some more choices. There is a fundamental problem that prepending data to an array isn't an efficient operation, so you could choose to use an alternate data structure.
A LinkedList can efficiently prepend data, but it's less efficient for most other tasks: it involves far more memory allocation/deallocation and also loses memory locality, so it may not be a net win.
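For illustration, a minimal sketch of the LinkedList option, assuming the values array from the question; AddFirst is O(1), but converting back to an array costs a full copy again:
using System.Collections.Generic;

// O(1) prepend, at the cost of per-node allocations and lost locality
LinkedList<byte> linked = new LinkedList<byte>(values);
linked.AddFirst((byte)0x00);

// converting back to an array is O(n) again
byte[] result = new byte[linked.Count];
linked.CopyTo(result, 0);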
A double ended queue (known as a deque) would be a fantastic data structure for you. You can efficiently add to the start or the end, and efficiently access data anywhere in the structure (but you can't efficiently insert somewhere other than the start or end). The major problem here is that .NET doesn't provide an implementation of a deque. You'd need to find a 3rd party library with an implementation.
You can also save yourself a lot of copying by keeping track of "data that I need to prepend" (using a List/Queue/etc.) and then waiting as long as possible to actually prepend the data, so that you minimize the creation of new arrays and limit the number of copies of existing elements.
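A minimal sketch of that batching idea (the class name and members are mine, not anything from the question): queue up pending prepends and pay for a single allocation and copy only when the combined array is actually needed.
using System;
using System.Collections.Generic;

class PrependBuffer
{
    private readonly List<byte> _pending = new List<byte>(); // bytes waiting to be prepended
    private byte[] _data;

    public PrependBuffer(byte[] data) { _data = data; }

    // O(1) amortized: just remember the byte for later
    public void Prepend(byte value) { _pending.Add(value); }

    // one allocation and one copy, no matter how many prepends were queued
    public byte[] ToArray()
    {
        if (_pending.Count == 0) return _data;
        byte[] result = new byte[_pending.Count + _data.Length];
        for (int i = 0; i < _pending.Count; i++)
            result[_pending.Count - 1 - i] = _pending[i]; // last prepend ends up first
        Array.Copy(_data, 0, result, _pending.Count, _data.Length);
        _pending.Clear();
        _data = result;
        return result;
    }
}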
You could also consider whether you could adjust the structure so that you're adding to the end rather than the start (even if you know that you'll need to reverse it later). If you are appending a lot in a short space of time, it may be worth storing the data in a List (which can efficiently add to the end). Depending on your needs, it may even be worth making a class that wraps a List and hides the fact that it is reversed: you could write an indexer that maps i to Count - 1 - i, and so on, so that from the outside the data appears to be stored normally even though the internal List actually holds it backwards.
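A minimal sketch of that reversed wrapper (names are mine): the inner List holds the data backwards, so a logical prepend becomes an amortized O(1) Add at the physical end.
using System.Collections.Generic;

class ReversedList<T>
{
    private readonly List<T> _inner = new List<T>();

    public int Count { get { return _inner.Count; } }

    // logical prepend == physical append
    public void Prepend(T item) { _inner.Add(item); }

    // indexer maps logical index i to physical index Count - 1 - i
    public T this[int i]
    {
        get { return _inner[_inner.Count - 1 - i]; }
        set { _inner[_inner.Count - 1 - i] = value; }
    }
}

// usage: after Prepend(0x01) then Prepend(0x00), r[0] == 0x00 and r[1] == 0x01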
OK guys, let's take a look at the performance issue regarding this question.
This is not an answer, just a microbenchmark to see which option is more efficient.
So, let's set the scenario:
A byte array of 1,000,000 items, randomly populated
We need to prepend item 0x00
We have 3 options:
Manually creating and populating the new array
Manually creating the new array and using Array.Copy (per Monroe's answer)
Creating a list, loading the array, inserting the item and converting the list to an array
Here's the code:
byte[] byteArray = new byte[1000000];
for (int i = 0; i < byteArray.Length; i++)
{
byteArray[i] = Convert.ToByte(DateTime.Now.Second);
}
Stopwatch stopWatch = new Stopwatch();
//#1 Manually creating and populating a new array;
stopWatch.Start();
byte[] extendedByteArray1 = new byte[byteArray.Length + 1];
extendedByteArray1[0] = 0x00;
for (int i = 0; i < byteArray.Length; i++)
{
extendedByteArray1[i + 1] = byteArray[i];
}
stopWatch.Stop();
Console.WriteLine(string.Format("#1: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();
//#2 Using a new array and Array.Copy
stopWatch.Start();
byte[] extendedByteArray2 = new byte[byteArray.Length + 1];
extendedByteArray2[0] = 0x00;
Array.Copy(byteArray, 0, extendedByteArray2, 1, byteArray.Length);
stopWatch.Stop();
Console.WriteLine(string.Format("#2: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();
//#3 Using a List
stopWatch.Start();
List<byte> byteList = new List<byte>();
byteList.AddRange(byteArray);
byteList.Insert(0, 0x00);
byte[] extendedByteArray3 = byteList.ToArray();
stopWatch.Stop();
Console.WriteLine(string.Format("#3: {0} ms", stopWatch.ElapsedMilliseconds));
stopWatch.Reset();
Console.ReadLine();
And the results are:
#1: 9 ms
#2: 1 ms
#3: 6 ms
I've run it multiple times and I got different numbers, but the proportion is always the same: #2 is always the most efficient choice.
My conclusion: arrays are more efficient than Lists (although they provide less functionality), and Array.Copy is heavily optimized (I would like to understand why, though).
Any feedback will be appreciated.
Best regards.
PS: this is not a swordfight post, we are at a Q&A site to learn and teach. And learn.
The easiest and cleanest way for .NET 4.7.1 and above is to use the side-effect-free Prepend().
Adds a value to the beginning of the sequence.
Example
// Creating an array of numbers
var numbers = new[] { 1, 2, 3 };
// Trying to prepend any value of the same type
var results = numbers.Prepend(0);
// results is a lazily evaluated IEnumerable<int>; call ToArray() if you need an array
// output is 0, 1, 2, 3
Console.WriteLine(string.Join(", ", results));
As you surmised, the fastest way to do this is to create a new array of length + 1 and copy all the old values over.
If you are going to be doing this many times, then I suggest using a List<byte> instead of byte[], as the cost of reallocating and copying while growing the underlying storage is amortized more effectively; in the usual case, the List's underlying array is grown by a factor of two whenever an addition or insertion would exceed its current capacity.
...
byte[] newValues = new byte[values.Length + 1];
newValues[0] = 0x00; // set the prepended value
Array.Copy(values, 0, newValues, 1, values.Length); // copy the old values
When I need to append data frequently but also want O(1) random access to individual elements, I'll use an array that is over-allocated by some amount of padding for quickly adding new values. This means you need to store the actual content length in a separate variable, as array.Length will report the length plus the padding. A new value gets appended by using one slot of the padding; no allocation and copy are necessary until you run out of padding. In effect, the allocation is amortized over several append operations. There are speed/space trade-offs: if you have many of these data structures, you could have a fair amount of padding in use at any one time in the program.
The same technique can be used for prepending. Just as with appending, you can introduce an interface or abstraction between the users and the implementation: you can keep several slots of padding at the front so that new memory allocation is only necessary occasionally. As some of the answers above suggested, you can also implement a prepending interface with an appending data structure that reverses the indexes.
I'd package the data structure as an implementation of some generic collection interface, so that the interface appears fairly normal to the user (such as an array list or something).
(Also, if removal is supported, it's probably useful to clear elements as soon as they are removed to help reduce gc load.)
The main point is to consider the implementation and the interface separately, as decoupling them gives you the flexibility to choose varied implementations or to hide implementation details using a minimal interface.
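A minimal sketch of the prepend variant of this idea (all names are mine): the content grows leftwards from the end of the buffer, and a reallocation happens only when the front padding is used up, amortizing the copy over many prepends.
using System;

class FrontPaddedBuffer
{
    private byte[] _buffer;
    private int _start; // index of the first real element; slots before it are padding

    public FrontPaddedBuffer(byte[] initial, int padding)
    {
        _buffer = new byte[padding + initial.Length];
        _start = padding;
        Array.Copy(initial, 0, _buffer, _start, initial.Length);
    }

    public int Count { get { return _buffer.Length - _start; } }

    public void Prepend(byte value)
    {
        if (_start == 0) // out of padding: grow and re-center, amortizing the copy
        {
            byte[] bigger = new byte[Math.Max(4, _buffer.Length * 2)];
            int newStart = bigger.Length - Count;
            Array.Copy(_buffer, _start, bigger, newStart, Count);
            _buffer = bigger;
            _start = newStart;
        }
        _buffer[--_start] = value;
    }

    // O(1) random access is preserved
    public byte this[int i] { get { return _buffer[_start + i]; } }
}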
There are many other data structures you could use, depending on their applicability to your domain: ropes and gap buffers, for instance (see What is best data structure suitable to implement editor like notepad?); tries do some useful things, too.
I know this is a VERY old post, but I actually like using lambdas. Sure, my code may NOT be the most efficient way, but it's readable and in one line. I use a combination of .Concat and ArraySegment.
string[] originalStringArray = new string[] { "1", "2", "3", "5", "6" };
int firstElementZero = 0;
int insertAtPositionZeroBased = 3;
string stringToPrepend = "0";
string stringToInsert = "FOUR"; // Deliberate !!!
originalStringArray = new string[] { stringToPrepend }
.Concat(originalStringArray).ToArray();
insertAtPositionZeroBased += 1; // BECAUSE we prepended !!
originalStringArray = new ArraySegment<string>(originalStringArray, firstElementZero, insertAtPositionZeroBased)
.Concat(new string[] { stringToInsert })
.Concat(new ArraySegment<string>(originalStringArray, insertAtPositionZeroBased, originalStringArray.Length - insertAtPositionZeroBased)).ToArray();
The best choice depends on what you're going to be doing with this collection later on down the line. If that's the only length-changing edit that will ever be made, then your best bet is to create a new array with one additional slot and use Array.Copy() to do the rest. No need to initialize the first value, since new C# arrays are always zeroed out:
byte[] PrependWithZero(byte[] input)
{
var newArray = new byte[input.Length + 1];
Array.Copy(input, 0, newArray, 1, input.Length);
return newArray;
}
If there are going to be other length-changing edits that might happen, the most performant option might be to use a List<byte> all along, as long as the additions aren't always to the beginning. (If that's the case, even a linked list might not be an option that you can dismiss out of hand.):
var list = new List<byte>(input);
list.Insert(0, 0);
I am aware the accepted post is over four years old, but for those to whom this might be relevant, Buffer.BlockCopy would be faster.
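For reference, a sketch of the same prepend with Buffer.BlockCopy; its offsets are byte offsets rather than element indexes, which for a byte[] are the same thing. Whether it actually beats Array.Copy at your sizes is worth measuring:
byte[] newValues = new byte[values.Length + 1];
// newValues[0] is already 0x00, since new arrays are zeroed
Buffer.BlockCopy(values, 0, newValues, 1, values.Length);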
Related
I have an array with 5,000 elements in total, and in one piece of functionality I only need the last 3,000 elements to proceed further.
For that I have tried the following solution.
//skipping first 2000 elements
list = list.Skip(5000 - 3000).ToArray();
This solution actually gives me the desired result, but when I ran a profiler on my code, it showed a huge amount of memory allocation on this line.
I have to use an Array due to legacy code that was carried over, and such frequent ToArray() calls don't seem to be good for performance.
There is also another possible solution:
//reversing whole list
Array.Reverse(list);
//restricting the size of the array to 3000,
//keeping the first 3000 (which, since the list is reversed, are the original last 3000 elements)
Array.Resize(ref list, 3000);
//again reversing list to make it proper order
Array.Reverse(list);
but this is even worse in time complexity.
Is there any better solution for this, one which doesn't need a conversion from List to Array?
If you absolutely have to use an array, then Array.Copy is probably your friend:
int[] smallerArray = new int[array.Length - 2000];
Array.Copy(array, 2000, smallerArray, 0, smallerArray.Length);
I'd expect that to be a bit more efficient than using Take followed by ToArray.
If list is a List<> you can use List.GetRange:
int lastN = 3000;
var sublist = list.GetRange(list.Count - lastN, lastN);
var array = sublist.ToArray();
This is more efficient because List.ToArray uses Array.Copy.
If list is an int[] as commented it's even more efficient:
int lastN = 3000;
int[] result = new int[lastN];
Array.Copy(list, list.Length - lastN, result, 0, lastN);
You can use Skip(n).ToArray(), where n is the number of elements you want to exclude from the front.
Say I have a large byte[] and I'm not only looking to see if, but also where, a smaller byte[] is in the larger array. For example:
byte[] large = new byte[100];
for (byte i = 0; i < 100; i++) {
large[i] = i;
}
byte[] small = new byte[] { 23, 24, 25 };
int loc = large.IndexOf(small); // this is what I want to write
I guess I'm asking about looking for a sequence of any type (primitive or otherwise) within a larger sequence.
I faintly remember reading about a specific approach to this in strings, but I don't remember the name of the algorithm. I could easily write some way to do this, but I know there's a good solution and it's on the tip of my tongue. If there's some .Net method that does this, I'll take that too (although I'd still appreciate the name of the searching algorithm for education's sake).
You can do it with LINQ, like this:
var res = Enumerable.Range(0, large.Length - small.Length + 1)
.Cast<int?>()
.FirstOrDefault(n => large.Skip(n.Value).Take(small.Length).SequenceEqual(small));
if (res != null) {
Console.WriteLine("Found at {0}", res.Value);
} else {
Console.WriteLine("Not found");
}
The approach is self-explanatory except for the Cast<int?> part: you need it to decide between finding the result at the initial location of the large array, when zero is returned, and not finding the result at all, when the return is null.
Here is a demo on ideone.
The complexity of the above is O(M*N), where M and N are the lengths of the large and small arrays. If the large array is very long, and contains a significant number of "almost right" sub-sequences that match long prefixes of small, you may be better off implementing an advanced sequence-searching algorithm, such as the Knuth–Morris–Pratt (KMP) algorithm. KMP speeds up the search by exploiting the observation that when a mismatch occurs, the small sequence itself contains enough information to determine how far ahead you can move in the large sequence, based on where in the small sequence the first mismatch occurred. A look-up table is prepared for the small sequence, and that table is then used throughout the search to decide how to advance the search point. The complexity of KMP is O(N+M). See the Wikipedia article linked above for pseudocode of the KMP algorithm.
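For reference, here is a sketch of KMP specialized to byte arrays; the method name matches the IndexOf the question wants to write, and the body is the standard algorithm from the Wikipedia pseudocode:
static int IndexOf(byte[] large, byte[] small)
{
    if (small.Length == 0) return 0;

    // failure table: fail[i] = length of the longest proper prefix of
    // small[0..i] that is also a suffix of small[0..i]
    int[] fail = new int[small.Length];
    for (int i = 1, k = 0; i < small.Length; i++)
    {
        while (k > 0 && small[i] != small[k]) k = fail[k - 1];
        if (small[i] == small[k]) k++;
        fail[i] = k;
    }

    // scan: on a mismatch, consult the table instead of backing up in large
    for (int i = 0, k = 0; i < large.Length; i++)
    {
        while (k > 0 && large[i] != small[k]) k = fail[k - 1];
        if (large[i] == small[k]) k++;
        if (k == small.Length) return i - small.Length + 1;
    }
    return -1; // not found
}

On the example above, IndexOf(large, small) returns 23.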
Are you thinking of lambda expressions? That is what came to my mind when you mentioned a more specific approach for strings.
http://www.dotnetperls.com/array-find
I'm writing a high-performance data structure. One problem I came across is there doesn't seem to be any way to copy only a portion of an array to another array (preferably as quickly as possible). I also use generics, so I'm not really sure how I'd use Buffer.BlockCopy, since it demands byte addresses and it appears to be impossible to objectively determine the size of an object. I know Buffer.BlockCopy works at the byte level, but does it also count padding as a byte?
Example:
var tmp = new T[5];
var source = new T[10];
for (int i = 5; i < source.Length; i++)
{
tmp[i - 5] = source[i];
}
How would I do this in a faster way like Array.CopyTo?
You can use Array.Copy().
Array.Copy(source, 5, tmp, 0, tmp.Length);
As noted in this question: Randomize a List<T>, you can implement a shuffle method on a list, like one of the answers mentions:
using System.Security.Cryptography;
...
public static void Shuffle<T>(this IList<T> list)
{
RNGCryptoServiceProvider provider = new RNGCryptoServiceProvider();
int n = list.Count;
while (n > 1)
{
byte[] box = new byte[1];
do provider.GetBytes(box);
while (!(box[0] < n * (Byte.MaxValue / n)));
int k = (box[0] % n);
n--;
T value = list[k];
list[k] = list[n];
list[n] = value;
}
}
Does anyone know if it's possible to "Parallel-ize" this using some of the new features in C# 4?
Just a curiosity.
Thanks all,
-R.
Your question is ambiguous, but you can use the Parallel Framework to help do some operations in parallel. It depends on whether you want to get the bytes in parallel and then shuffle them, or whether you want to shuffle multiple lists at one time.
If it is the former, I would suggest that you first break your code into smaller functions so you can do some analysis to see where the bottlenecks are; if the getting of the bytes is the bottleneck, then doing it in parallel may make a difference. By having tests in place you can try new functions and decide whether the added complexity is worth it.
static RNGCryptoServiceProvider provider = new RNGCryptoServiceProvider();
private static byte[] GetBytes(int n) {
byte[] box = new byte[1];
do provider.GetBytes(box);
while (!(box[0] < n * (Byte.MaxValue / n)));
return box;
}
private static IList<T> InnerLoop<T>(int n, IList<T> list) {
var box = GetBytes(n);
int k = (box[0] % n);
T value = list[k];
list[k] = list[n - 1];
list[n - 1] = value;
return list;
}
public static void Shuffle<T>(this IList<T> list)
{
int n = list.Count;
while (n > 1)
{
list = InnerLoop(n, list);
n--;
}
}
This is a rough idea as to how to split your function, so you can replace the GetBytes function, but you may need to make some other changes to test it.
Getting some numbers is crucial to make certain that you are getting enough of a benefit to warrant adding complexity though.
You may want to move the lines in InnerLoop that deal with the list into a separate function, so you can see if that part is slow and perhaps swap out algorithms to improve it; but you need to have an idea how fast the entire shuffle operation needs to be.
But if you want to just do multiple lists then it will be easy, you may want to perhaps look at PLINQ for that.
UPDATE
The code above is meant just to show an example of how the function can be broken into smaller ones, not to be a working solution. If it is necessary to move the provider that I put into a static variable into a function and pass it as a parameter, then that may need to be done. I didn't test the code, but my suggestion is based on profiling first and then looking at how to improve the performance, especially since I am not certain which way it was meant to be done in parallel. It may be necessary to just build up an array in order, in parallel, and then shuffle it, but first see what the time needed for each operation is; then you can see if doing it in parallel is warranted.
There may be a need to use concurrent data structures also, if using multiple threads, in order to not pay a penalty by having to synchronize yourself, but, again, that may not be needed, depending on where the bottleneck is.
UPDATE:
Based on the answer to my comment, you may want to look at the various functions in the parallel library, but this page may help. http://msdn.microsoft.com/en-us/library/dd537609.aspx.
You can create a Func version of your function and pass that in as a parameter. There are multiple ways to use this library to make this function parallel; since you already don't have any global variables, just lose the static modifier.
You will want to gather numbers as you add more threads, though, to see where performance starts to decrease; you won't see a 1:1 improvement, so adding 2 threads won't make it go twice as fast. Just test and see at what point more threads become a problem. Since your function is CPU-bound, you may want only one thread per core as a rough starting point.
Any in-place shuffle is not very well suited for parallelization. Especially since a shuffle requires a random component (over the range) so there is no way to localize parts of the problem.
Well, you could fairly easily parallelize the code which is generating the random numbers - which could be the bottleneck when using a cryptographically secure random number generator. You could then use that sequence of random numbers (which would need to have its order preserved) within a single thread performing the swaps.
One problem which has just occurred to me though, is that RNGCryptoServiceProvider isn't thread-safe (and neither is System.Random). You'd need as many random number generators as threads to make this work. Basically it becomes a bit ugly :(
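To make the idea concrete, here is a rough sketch of that split (the structure and names are mine; only the rejection-sampling loop comes from the original code). Phase 1 precomputes all the swap indices in parallel with one RNG per worker thread; phase 2 applies the swaps sequentially, preserving the distribution of the single-threaded shuffle:
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Threading.Tasks;

static class ParallelShuffle
{
    // Phase 1 (parallel): precompute the swap index for each Fisher-Yates step,
    // using a separate RNG per worker thread to avoid the thread-safety issue.
    static int[] GenerateSwapIndices(int count)
    {
        int[] k = new int[Math.Max(0, count - 1)];
        Parallel.For(0, count - 1,
            () => new RNGCryptoServiceProvider(),
            (i, state, rng) =>
            {
                int n = count - i; // the shrinking "n" of the original loop
                byte[] box = new byte[1];
                // note: like the original, the single-byte sample limits n to Byte.MaxValue
                do rng.GetBytes(box);
                while (!(box[0] < n * (Byte.MaxValue / n)));
                k[i] = box[0] % n;
                return rng;
            },
            rng => rng.Dispose());
        return k;
    }

    // Phase 2 (sequential): apply the swaps in order.
    public static void Shuffle<T>(IList<T> list)
    {
        int count = list.Count;
        int[] k = GenerateSwapIndices(count);
        for (int i = 0; i < count - 1; i++)
        {
            int last = count - i - 1; // position fixed at this step
            T value = list[k[i]];
            list[k[i]] = list[last];
            list[last] = value;
        }
    }
}

Whether this wins depends on the RNG calls actually dominating the cost; as the answer says, measure first.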
I'm downloading some files asynchronously into a large byte array, and I have a callback that fires off periodically whenever some data is added to that array. If I want to give developers the ability to use the last chunk of data that was added to array, then... well how would I do that? In C++ I could give them a pointer to somewhere in the middle, and then perhaps tell them the number of bytes that were added in the last operation so they at least know the chunk they should be looking at... I don't really want to give them a 2nd copy of that data, that's just wasteful.
I'm just thinking if people want to process this data before the file has completed downloading. Would anyone actually want to do that? Or is it a useless feature anyway? I already have a callback for when the buffer (entire byte array) is full, and then they can dump the whole thing without worrying about start and end points...
.NET has a struct that does exactly what you want:
System.ArraySegment<T>.
In any case, it's easy to implement it yourself too - just make a constructor that takes a base array, an offset, and a length. Then implement an indexer that offsets indexes behind the scenes, so your ArraySegment can be seamlessly used in the place of an array.
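A minimal sketch of that do-it-yourself version (the class name Slice is mine):
using System;

public class Slice<T>
{
    private readonly T[] _array;
    private readonly int _offset;

    public Slice(T[] array, int offset, int count)
    {
        if (offset < 0 || count < 0 || offset + count > array.Length)
            throw new ArgumentOutOfRangeException();
        _array = array;
        _offset = offset;
        Count = count;
    }

    public int Count { get; private set; }

    // the indexer offsets indexes behind the scenes, so callers can use
    // the slice as if it were a free-standing array (no data is copied)
    public T this[int index]
    {
        get { return _array[_offset + index]; }
        set { _array[_offset + index] = value; }
    }
}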
You can't give them a pointer into the array, but you could give them the array and start index and length of the new data.
But I have to wonder what someone would use this for. Is this a known need, or are you just guessing that someone might want it someday? If so, is there any reason why you couldn't wait to add the capability until someone actually needs it?
Whether this is needed or not depends on whether you can afford to accumulate all the data from a file before processing it, or whether you need to provide a streaming mode where you process each chunk as it arrives. That depends on two things: how much data there is (you probably would not want to accumulate a multi-gigabyte file), and how long it takes the file to completely arrive (if you are getting the data over a slow link, you might not want your client to wait until it has all arrived). So it is a reasonable feature to add, depending on how the library is to be used; streaming mode is usually a desirable attribute.
However, the idea of putting the data into an array seems wrong, because it fundamentally implies a non-streaming design and requires an additional copy. What you could do instead is to keep each chunk of arriving data as a discrete piece. These could be stored in a container for which adding at the end and removing from the front is efficient.
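In .NET terms, Queue<T> fits that description. A minimal sketch, with names of my own choosing (a real async downloader would need a lock around the queue, or a ConcurrentQueue<byte[]>):
using System;
using System.Collections.Generic;

class ChunkPipeline
{
    // each arriving chunk is stored as-is; Enqueue and Dequeue are both O(1)
    private readonly Queue<byte[]> _chunks = new Queue<byte[]>();

    // producer side: called from the download callback
    public void OnChunkArrived(byte[] chunk)
    {
        _chunks.Enqueue(chunk);
    }

    // consumer side: hand each chunk to a client-supplied handler, no copying
    public void Drain(Action<byte[]> process)
    {
        while (_chunks.Count > 0)
            process(_chunks.Dequeue());
    }
}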
Copying a chunk of a byte array may seem "wasteful," but then again, object-oriented languages like C# tend to be a little more wasteful than procedural languages anyway. A few extra CPU cycles and a little extra memory consumption can greatly reduce complexity and increase flexibility in the development process. In fact, copying bytes to a new location in memory to me sounds like good design, as opposed to the pointer approach which will give other classes access to private data.
But if you do want to use pointers, C# does support them. Here is a decent-looking tutorial. The author is correct when he states, "...pointers are only really needed in C# where execution speed is highly important."
I agree with the OP: sometimes you just plain need to pay some attention to efficiency. I don't think the example of providing an API is the best, because that certainly calls for leaning toward safety and simplicity over efficiency.
However, a simple example is when processing large numbers of huge binary files that have zillions of records in them, such as when writing a parser. Without using a mechanism such as System.ArraySegment, the parser becomes a big memory hog, and is greatly slowed down by creating a zillion new data elements, copying all the memory over, and fragmenting the heck out of the heap. It's a very real performance issue. I write these kinds of parsers all the time for telecommunications stuff which generate millions of records per day in each of several categories from each of many switches with variable length binary structures that need to be parsed into databases.
Using the System.ArraySegment mechanism versus creating new structure copies for each record tremendously speeds up the parsing, and greatly reduces the peak memory consumption of the parser. These are very real advantages because the servers run multiple parsers, run them frequently, and speed and memory conservation = very real cost savings in not having to have so many processors dedicated to the parsing.
System.ArraySegment is very easy to use. Here's a simple example of providing a base way to track the individual records in a typical big binary file full of records with a fixed-length header and a variable-length record size (obvious exception control deleted):
public struct MyRecord
{
public ArraySegment<byte> header;
public ArraySegment<byte> data;
}
public class Parser
{
const int HEADER_SIZE = 10;
const int HDR_OFS_REC_TYPE = 0;
const int HDR_OFS_REC_LEN = 4;
byte[] m_fileData;
List<MyRecord> records = new List<MyRecord>();
bool Parse(FileStream fs)
{
int fileLen = (int)fs.Length;
m_fileData = new byte[fileLen];
fs.Read(m_fileData, 0, fileLen);
fs.Close();
fs.Dispose();
int offset = 0;
while (offset + HEADER_SIZE < fileLen)
{
int recType = (int)m_fileData[offset];
switch (recType) { /*puke if not a recognized type*/ }
int varDataLen = ((int)m_fileData[offset + HDR_OFS_REC_LEN]) * 256
+ (int)m_fileData[offset + HDR_OFS_REC_LEN + 1];
if (offset + HEADER_SIZE + varDataLen > fileLen) { /*puke as file has odd bytes at end*/ }
MyRecord rec = new MyRecord();
rec.header = new ArraySegment<byte>(m_fileData, offset, HEADER_SIZE);
rec.data = new ArraySegment<byte>(m_fileData, offset + HEADER_SIZE,
varDataLen);
records.Add(rec);
offset += HEADER_SIZE + varDataLen;
}
return true;
}
}
The above example gives you a list with an ArraySegment for each record in the file, while leaving all the actual data in place in one big array per file. The only overhead is the two array segments in the MyRecord struct per record. When processing the records, the MyRecord.header and MyRecord.data segments let you operate on the elements of each record as if they were their own byte[] copies.
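For example, to read a field out of a record without copying, a consumer (hypothetical code inside the Parser class above) can index into the shared array via the segment's Offset:
MyRecord rec = records[0];

// the record type byte lives at HDR_OFS_REC_TYPE within the header segment
int recType = rec.header.Array[rec.header.Offset + HDR_OFS_REC_TYPE];

// first byte of the variable-length payload, still with no copying
byte firstDataByte = rec.data.Array[rec.data.Offset];

(On .NET 4.5 and later, ArraySegment<byte> also implements IList<byte>, so the segments can be indexed directly.)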
I think you shouldn't bother.
Why on earth would anyone want to use it?
That sounds like you want an event.
public class ArrayChangedEventArgs : EventArgs {
public ArrayChangedEventArgs(byte[] array, int start, int length) {
Array = array;
Start = start;
Length = length;
}
public byte[] Array { get; private set; }
public int Start { get; private set; }
public int Length { get; private set; }
}
// ...
// and in your class:
public event EventHandler<ArrayChangedEventArgs> ArrayChanged;
protected virtual void OnArrayChanged(ArrayChangedEventArgs e)
{
// using a temporary variable avoids a common potential multithreading issue
// where the multicast delegate changes midstream.
// Best practice is to grab a copy first, then test for null
EventHandler<ArrayChangedEventArgs> handler = ArrayChanged;
if (handler != null)
{
handler(this, e);
}
}
// finally, your code that downloads a chunk just needs to call OnArrayChanged()
// with the appropriate args
Clients hook into the event and get called when things change. This is what most client code in .NET expects to have in an API ("call me when something happens"). They can hook into the code with something as simple as:
yourDownloader.ArrayChanged += (sender, e) =>
Console.WriteLine(String.Format("Just downloaded {0} byte{1} at position {2}.",
e.Length, e.Length == 1 ? "" : "s", e.Start));