Encryption Project: Need advice on how to eliminate method overhead - c#

I am looking for advice. I have developed my own encryption algorithms because I enjoy it and I can. Now, I am looking to try a new idea.
My idea involves consolidating a number of my algorithms into a larger one. For instance, you call X.Encrypt() and it in turn uses A.Encrypt(), B.Encrypt(), C.Encrypt(), etc. When this kind of operation is performed one byte per A, B, C method call, the method overhead becomes a killer, going from a few ms to several minutes. So, any questions?
I am merely looking for code design tips and tricks to maybe lessen the issue.
Thanks ahead of time.
Update
Code example of the issue:
//fast
moduleA.Transform(true, buffer, 0, buffer.Length);
moduleB.Transform(true, buffer, 0, buffer.Length);
//slow
for (int L = 0; L < buffer.Length; )
{
moduleA.Transform(true, buffer, L++, 1);
moduleB.Transform(true, buffer, L++, 1);
}
I know this problem is inherent to how the methods are being called; my goal is to change how I am doing it. I know there is also room for improvement inside the Transform methods. The fast version runs in about 24s while the slow one takes many minutes. Clearly the overhead comes from the method calls, no profiler needed :)
I do have an idea I am going to try. I am thinking about using "run-modes": instead of looping outside the Transform methods, I change how each method runs internally to fit my needs. For example, I could do every-other-byte encryption inside the Transform methods, as a batch. I believe this would eliminate the overhead I am getting.
FINAL UPDATE (Solved my own issue, still open to ideas!)
Increasing the loop increment inside the Transform method has worked!
What I've done is the following and it seems to work well:
ITransformationModule moduleA = TransformationFactory.GetModuleInstance("Subspace28");
ITransformationModule moduleB = TransformationFactory.GetModuleInstance("Ataxia");
moduleA.IncrementInterval = 2;
moduleB.IncrementInterval = 2;
moduleA.Transform(true, buffer, 0, buffer.Length);
moduleB.Transform(true, buffer, 1, buffer.Length);
This runs at about 12s for 100MB on my work VM. Thank you all who contributed! It was a combination of responses that helped lead me to try it this way. I appreciate you all greatly!
This is just proof of concept at the moment. It is building towards greater things! :)
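For anyone curious, here is a rough sketch of the shape the loop inside such a Transform method takes with this approach. The EncryptByte call and the parameter names are placeholders rather than the real algorithm; only the Transform signature and the IncrementInterval idea come from the code above.
// Simplified sketch: the module walks the buffer itself, stepping by
// IncrementInterval, so the caller makes one Transform call per module
// rather than one call per byte.
public void Transform(bool encrypt, byte[] buffer, int offset, int count)
{
    int end = Math.Min(offset + count, buffer.Length);
    for (int i = offset; i < end; i += IncrementInterval)
    {
        buffer[i] = EncryptByte(buffer[i], encrypt); // stand-in for the real per-byte work
    }
}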

Are you encrypting the data by calling methods on a byte-by-byte basis? Why not call the method on a chunk of data and loop within that method? Also, while it is definitely fun to try out your own encryption methods, you should pretty much always use a known, tested, and secure algorithm if security is at all a concern.

You could try to implement your algorithm such that your code makes chunky calls rather than chatty calls. That is, instead of calling functions hundreds of times, you could have fewer function calls, each doing more work. That is one piece of advice; you might have to make your algorithm efficient as well, so that it's not processor intensive. Hope this helps.

You want to have class X call methods from classes A, B, C, D, E, F, G, etc. without the method call overhead. At first, that seems absurd. You might be able to find a way to do it using System.Reflection.Emit. That is, dynamically create a method that does A+B+C+D+E+F+G, then call that.
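As a rough illustration of that idea, here is a sketch using compiled expression trees rather than raw Reflection.Emit. The ByteTransform delegate and the Compose helper are hypothetical; the real modules' signatures may differ.
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical per-byte transform signature.
delegate void ByteTransform(byte[] buffer, int index);

static class TransformComposer
{
    // Builds one delegate that runs all steps in sequence, so the outer
    // loop makes a single call per byte instead of one call per module.
    public static ByteTransform Compose(params ByteTransform[] steps)
    {
        var buffer = Expression.Parameter(typeof(byte[]), "buffer");
        var index = Expression.Parameter(typeof(int), "index");
        var calls = steps
            .Select(s => (Expression)Expression.Invoke(Expression.Constant(s), buffer, index))
            .ToList();
        var body = Expression.Block(calls);
        return Expression.Lambda<ByteTransform>(body, buffer, index).Compile();
    }
}
Each step is still a delegate invocation, so this mainly collapses the outer dispatch; whether it saves enough would need measuring against the real modules.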

First, profile your code so you know where you should optimize first, then ask again :)

Would something like this work? Of course you would have to modify it to fit your encryption arguments and return types....
static class Encryptor
{
delegate void Transform(bool b, byte[] buffer, int index, int length);
static Transform[] transformers = new Transform[3];
static Encryptor()
{
transformers[0] = (b, buffer, index, length) => { /*Method A*/ };
transformers[1] = (b, buffer, index, length) => { /*Method B*/ };
transformers[2] = (b, buffer, index, length) => { /*Method C*/ };
}
public static void Encrypt(bool b, byte[] buffer)
{
int length = buffer.Length;
int nTransforms = transformers.Length;
for (int i = 0; i < length;)
{
for (int j = 0; j < nTransforms && i < length; j++) // guard in case length is not a multiple of nTransforms
{
transformers[i % nTransforms](b, buffer, i++, 1);
}
}
}
}
Edit: So this would do the equivalent of your second example:
Encryptor.Encrypt(yourBoolean, yourBuffer);
I don't know the specifics of your implementation, but this shouldn't have overhead issues.


Safely access data in MemoryStream

Assume that I have a MemoryStream and function that operates on bytes.
Current code is something like this:
void caller()
{
MemoryStream ms = // not important
func(ms.GetBuffer(), 0, (int)ms.Length);
}
void func(byte[] buffer, int offset, int length)
{
// not important
}
I cannot change func, but I would like to minimize the possibility of the stream data being changed from within func.
How could / should I rewrite the code to be sure that stream data won't be changed?
Or this can't be done?
EDIT:
I am sorry, I didn't mention that I would like to avoid making copies of the data.
Call ms.ToArray():
func(ms.ToArray(), 0, (int)ms.Length);
From MSDN (emphasis mine):
Note that the buffer contains allocated bytes which might be unused.
For example, if the string "test" is written into the MemoryStream
object, the length of the buffer returned from GetBuffer is 256, not
4, with 252 bytes unused. To obtain only the data in the buffer, use
the ToArray method; however, ToArray creates a copy of the data in
memory.
Ideally you would change func to take an IEnumerable<byte>. Once a method has the array, you're trusting that it won't modify the data if you don't want it to. If the contract were to provide an IEnumerable<byte>, the implementer would have to decide whether it needs a copy to edit or not.
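A minimal sketch of that idea, assuming you can change the callee to accept a sequence; the helper name is made up. It exposes the live buffer for reading without copying, so the consumer has no array to write into:
using System.Collections.Generic;
using System.IO;

static class StreamBytes
{
    // Hypothetical helper: wraps the MemoryStream's internal buffer as a
    // lazily enumerated, read-only sequence without copying it.
    public static IEnumerable<byte> AsReadOnlyBytes(MemoryStream ms)
    {
        byte[] buffer = ms.GetBuffer();  // no copy, but includes unused capacity
        int length = (int)ms.Length;     // only the bytes actually written
        for (int i = 0; i < length; i++)
        {
            yield return buffer[i];
        }
    }
}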
If you can't make a copy (ToArray, as suggested in other answers) and can't change the signature of the func function, the only thing left is to try to validate that the function did not change the data.
You may compute some sort of hash before/after the call and check whether it is the same. It will not guarantee that func did not change the underlying data (due to hash collisions), but at least it gives you a good chance of knowing whether it happened. May be useful for non-production code...
The real solution is to either provide a copy of the data to the untrusted code OR pass some wrapper interface/object that does not allow data changes (which requires signature changes/a rewrite for func).
Copy the data out of the stream by using ms.ToArray(). Obviously, there'll be a performance hit.
You cannot pass only a 'slice' of an array to a method. Either you pass a copy of the array to the method and copy the result back:
byte[] slice = new byte[length];
Buffer.BlockCopy(bytes, offset, slice, 0, length);
func(slice, 0, length);
Buffer.BlockCopy(slice, 0, bytes, offset, length);
or, if you can change the method, you pass some kind of proxy object that wraps the array and checks for each access if it's within the allowed range:
class ArrayView<T>
{
private T[] array;
private int offset;
private int length;
public ArrayView(T[] array, int offset, int length)
{
this.array = array;
this.offset = offset;
this.length = length;
}
public T this[int index]
{
get
{
if (index < offset || index >= offset + length)
throw new ArgumentOutOfRangeException("index");
return array[index];
}
set
{
if (index < offset || index >= offset + length)
throw new ArgumentOutOfRangeException("index");
array[index] = value;
}
}
}
Are you trying to make sure that func() is never actually able to change the memory stream, or is it enough if your code can throw an exception if something is changed? Sounds like you want to do something like:
void caller()
{
MemoryStream ms = // not important
var checksum = CalculateMyChecksum(ms);
func(ms.GetBuffer(), 0, (int)ms.Length);
if(checksum != CalculateMyChecksum(ms)){
throw new Exception("Hey! Someone has been fiddling with my memory!");
}
}
I would not feel comfortable recommending this for anything important / critical though. Could you give some more information? Maybe there is a better solution to your problem, and a way to avoid this issue completely.

c# to c++ dictionary to unordered_map results

I've done a few years of c# now, and I'm trying to learn some new stuff. So I decided to have a look at c++, to get to know programming in a different way.
I've been doing loads of reading, but I just started writing some code today.
On my Windows 7/64 bit machine, running VS2010, I created two projects:
1) A c# project that lets me write things the way I'm used to.
2) A c++ "makefile" project that let's me play around, trying to implement the same thing. From what I understand, this ISN'T a .NET project.
I got to trying to populate a dictionary with 10K values. For some reason, the c++ is orders of magnitude slower.
Here's the c# below. Note I put in a function after the time measurement to ensure it wasn't "optimized" away by the compiler:
var freq = System.Diagnostics.Stopwatch.Frequency;
int i;
Dictionary<int, int> dict = new Dictionary<int, int>();
var clock = System.Diagnostics.Stopwatch.StartNew();
for (i = 0; i < 10000; i++)
dict[i] = i;
clock.Stop();
Console.WriteLine(clock.ElapsedTicks / (decimal)freq * 1000M);
Console.WriteLine(dict.Average(x=>x.Value));
Console.ReadKey(); //Don't want results to vanish off screen
Here's the c++, not much thought has gone into it (trying to learn, right?)
LARGE_INTEGER frequency; // ticks per second
LARGE_INTEGER t1, t2; // ticks
double elapsedTime;
// get ticks per second
QueryPerformanceFrequency(&frequency);
int i;
boost::unordered_map<int, int> dict;
// start timer
QueryPerformanceCounter(&t1);
for (i=0;i<10000;i++)
dict[i]=i;
// stop timer
QueryPerformanceCounter(&t2);
// compute and print the elapsed time in millisec
elapsedTime = (t2.QuadPart - t1.QuadPart) * 1000.0 / frequency.QuadPart;
cout << elapsedTime << " ms insert time\n";
int input;
cin >> input; //don't want console to disappear
Now, some caveats. I managed to find this related SO question. One of the guys wrote a long answer mentioning WOW64 skewing the results. I've set the project to release and gone through the "properties" tab of the c++ project, enabling everything that sounded like it would make it fast. Changed the platform to x64, though I'm not sure whether that addresses his wow64 issue. I'm not that experienced with the compiler options, perhaps you guys have more of a clue?
Oh, and the results: c#: 0.32ms, c++: 8.26ms. This is a bit strange. Have I misinterpreted something about what .QuadPart means? I copied the c++ timer code from someplace on the web, going through all the boost installation and include/libfile rigmarole. Or perhaps I am actually using different instruments unwittingly? Or there's some critical compile option that I haven't used? Or maybe the c# code is optimized because the average is a constant?
Here's the c++ command line, from the Property page->C/C++->Command Line:
/I"C:\Users\Carlos\Desktop\boost_1_47_0" /Zi /nologo /W3 /WX- /MP /Ox /Oi /Ot /GL /D "_MBCS" /Gm- /EHsc /GS- /Gy- /arch:SSE2 /fp:fast /Zc:wchar_t /Zc:forScope /Fp"x64\Release\MakeTest.pch" /Fa"x64\Release\" /Fo"x64\Release\" /Fd"x64\Release\vc100.pdb" /Gd /errorReport:queue
Any help would be appreciated, thanks.
A simple allocator change will cut that time down a lot.
boost::unordered_map<int, int, boost::hash<int>, std::equal_to<int>, boost::fast_pool_allocator<std::pair<const int, int>>> dict;
0.9ms on my system (from 10ms before). This suggests to me that actually, the vast, vast majority of your time is not spent in the hash table at all, but in the allocator. The reason that this is an unfair comparison is because your GC will never collect in such a trivial program, giving it an undue performance advantage, and native allocators do significant caching of free memory, but that'll never come into play in such a trivial example, because you've never allocated or deallocated anything and so there's nothing to cache.
Finally, the Boost pool implementation is thread-safe, whereas you never play with threads so the GC can just fall back to a single-threaded implementation, which will be much faster.
I resorted to a hand-rolled, non-freeing, non-thread-safe pool allocator and got down to 0.525ms for C++ versus 0.45ms for C# (on my machine). Conclusion: your original results were very skewed because of the different memory allocation schemes of the two languages, and once that was resolved, the difference became relatively minimal.
A custom hasher (as described in Alexandre's answer) dropped my C++ time to 0.34ms, which is now faster than C#.
static const int MaxMemorySize = 800000;
static int FreedMemory = 0;
static int AllocatorCalls = 0;
static int DeallocatorCalls = 0;
template <typename T>
class LocalAllocator
{
public:
std::vector<char>* memory;
int* CurrentUsed;
typedef T value_type;
typedef value_type * pointer;
typedef const value_type * const_pointer;
typedef value_type & reference;
typedef const value_type & const_reference;
typedef std::size_t size_type;
typedef std::ptrdiff_t difference_type;
template <typename U> struct rebind { typedef LocalAllocator<U> other; };
template <typename U>
LocalAllocator(const LocalAllocator<U>& other) {
CurrentUsed = other.CurrentUsed;
memory = other.memory;
}
LocalAllocator(std::vector<char>* ptr, int* used) {
CurrentUsed = used;
memory = ptr;
}
template<typename U> LocalAllocator(LocalAllocator<U>&& other) {
CurrentUsed = other.CurrentUsed;
memory = other.memory;
}
pointer address(reference r) { return &r; }
const_pointer address(const_reference s) { return &s; }
size_type max_size() const { return MaxMemorySize; }
void construct(pointer ptr, value_type&& t) { new (ptr) T(std::move(t)); }
void construct(pointer ptr, const value_type & t) { new (ptr) T(t); }
void destroy(pointer ptr) { static_cast<T*>(ptr)->~T(); }
bool operator==(const LocalAllocator& other) const { return memory == other.memory; }
bool operator!=(const LocalAllocator&) const { return false; }
pointer allocate(size_type count) {
AllocatorCalls++;
if (*CurrentUsed + (count * sizeof(T)) > MaxMemorySize)
throw std::bad_alloc();
if (*CurrentUsed % std::alignment_of<T>::value) {
*CurrentUsed += (std::alignment_of<T>::value - *CurrentUsed % std::alignment_of<T>::value);
}
auto val = &((*memory)[*CurrentUsed]);
*CurrentUsed += (count * sizeof(T));
return reinterpret_cast<pointer>(val);
}
void deallocate(pointer ptr, size_type n) {
DeallocatorCalls++;
FreedMemory += (n * sizeof(T));
}
pointer allocate() {
return allocate(1); // allocate(count) already multiplies by sizeof(T)
}
void deallocate(pointer ptr) {
return deallocate(ptr, 1);
}
};
int main() {
LARGE_INTEGER frequency; // ticks per second
LARGE_INTEGER t1, t2; // ticks
double elapsedTime;
// get ticks per second
QueryPerformanceFrequency(&frequency);
std::vector<char> memory;
int CurrentUsed = 0;
memory.resize(MaxMemorySize);
struct custom_hash {
size_t operator()(int x) const { return x; }
};
boost::unordered_map<int, int, custom_hash, std::equal_to<int>, LocalAllocator<std::pair<const int, int>>> dict(
std::unordered_map<int, int>().bucket_count(),
custom_hash(),
std::equal_to<int>(),
LocalAllocator<std::pair<const int, int>>(&memory, &CurrentUsed)
);
// start timer
std::string str;
QueryPerformanceCounter(&t1);
for (int i=0;i<10000;i++)
dict[i]=i;
// stop timer
QueryPerformanceCounter(&t2);
// compute and print the elapsed time in millisec
elapsedTime = ((t2.QuadPart - t1.QuadPart) * 1000.0) / frequency.QuadPart;
std::cout << elapsedTime << " ms insert time\n";
int input;
std::cin >> input; //don't want console to disappear
}
Storing a consecutive sequence of numeric integral keys added in ascending order is definitely NOT what hash tables are optimized for.
Use an array, or else generate random values.
And do some retrievals. Hash tables are highly optimized for retrieval.
You can try dict.rehash(n) with different (large) values of n before inserting elements, and see how this impacts performance. Memory allocations (they take place when the container fills buckets) are generally more expensive in C++ than in C#, and rehashing is also heavy. For std::vector and std::deque, the analog member function is reserve.
Different rehash policies and load factor threshold (have a look at the max_load_factor member function) will also greatly impact unordered_map's performance.
Next, since you're using VS2010, I suggest you use std::unordered_map from the <unordered_map> header. Don't use boost when you can use the standard library.
The actual hash function used may greatly impact performance. You may try with the following:
struct custom_hash { size_t operator()(int x) const { return x; } };
and use std::unordered_map<int, int, custom_hash>.
Finally, I agree that this is a poor usage of hash tables. Use random values for insertion, you'll get a more precise picture of what is going on. Testing insertion speeds of hash tables isn't stupid at all, but hash tables are not meant to store consecutive integers. Use a vector for this.
Visual Studio TR1 unordered_map is the same as stdext::hash_map:
There is another thread asking why it performs slowly; see my answer there, with links to others who have discovered the same issue. The conclusion is to use another hash_map implementation when in C++:
Alternative to stdext::hash_map for performance reasons
Btw, remember that in C++ there is a big difference between an optimized Release build and a non-optimized Debug build, compared to C#.

Should variable declarations always be placed outside of a loop?

Is it better to declare a variable used in a loop outside of the loop rather than inside? Sometimes I see examples where a variable is declared inside the loop. Does this effectively cause the program to allocate memory for a new variable each time the loop runs? Or is .NET smart enough to know that it's really the same variable?
For example see the code below from this answer.
public static void CopyStream(Stream input, Stream output)
{
byte[] buffer = new byte[32768];
while (true)
{
int read = input.Read (buffer, 0, buffer.Length);
if (read <= 0)
return;
output.Write (buffer, 0, read);
}
}
Would this modified version be any more efficient?
public static void CopyStream(Stream input, Stream output)
{
int read; //OUTSIDE LOOP
byte[] buffer = new byte[32768];
while (true)
{
read = input.Read (buffer, 0, buffer.Length);
if (read <= 0)
return;
output.Write (buffer, 0, read);
}
}
No, it wouldn't be more efficient. However, I'd rewrite it this way which happens to declare it outside the loop anyway:
byte[] buffer = new byte[32768];
int read;
while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
{
output.Write(buffer, 0, read);
}
I'm not generally a fan of using side-effects in conditions, but effectively the Read method is giving you two bits of data: whether or not you've reached the end of the stream, and how much you've read. The while loop is now saying, "While we've managed to read some data... copy it."
It's a little bit like using int.TryParse:
if (int.TryParse(text, out value))
{
// Use value
}
Again you're using a side-effect of calling the method in the condition. As I say, I don't make a habit out of doing this except for this particular pattern, when you're dealing with a method returning two bits of data.
The same thing comes up reading lines from a TextReader:
string line;
while ((line = reader.ReadLine()) != null)
{
...
}
To go back to your original question: if a variable is going to be initialized in every iteration of a loop and it's only used within the body of the loop, I'd almost always declare it within the loop. One minor exception here is if the variable is being captured by an anonymous function - at that point it will make a difference in behaviour, and I'd pick whichever form gave me the desired behaviour... but that's almost always the "declare inside" form anyway.
EDIT: When it comes to scoping, the code above does indeed leave the variable in a larger scope than it needs to be... but I believe it makes the loop clearer. You can always address this by introducing a new scope if you care to:
{
int read;
while (...)
{
}
}
In the unlikely environment where the compiler doesn't help you with this, it would still be a micro-optimization. Factors like clarity and proper scoping are much more important than the edge case where this might make next to no difference.
You should give your variables proper scope without thinking about performance. Of course, complex initializations are a different beast, so if something should only be initialized once but is only used within a loop, you'd still want to declare it outside.
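For example, a small sketch of that distinction; the regex and the lines collection here are just placeholders:
using System.Text.RegularExpressions;

// The expensive object is built once, outside the loop; the cheap
// per-iteration variable is declared inside, where it is used.
var numberPattern = new Regex(@"\d+", RegexOptions.Compiled);
foreach (string line in lines)
{
    Match match = numberPattern.Match(line);
    // use 'match' only within this iteration
}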
I am going to agree with most of these other answers, with a caveat: if you are using lambda expressions, you must be careful with captured variables.
static void Main(string[] args)
{
var a = Enumerable.Range(1, 3);
var b = a.GetEnumerator();
int x;
while(b.MoveNext())
{
x = b.Current;
Task.Factory.StartNew(() => Console.WriteLine(x));
}
Console.ReadLine();
}
will give the result
3
3
3
Where
static void Main(string[] args)
{
var a = Enumerable.Range(1, 3);
var b = a.GetEnumerator();
while(b.MoveNext())
{
int x = b.Current;
Task.Factory.StartNew(() => Console.WriteLine(x));
}
Console.ReadLine();
}
will give the result
1
2
3
or some order thereof. This is because when the task finally starts, it will check the current value of its reference to x. In the first example all 3 iterations captured the same variable, whereas in the second example they each captured a different one.
As is the case with lots of simple optimizations like this, the compiler takes care of it for you. If you try both of these and look at the assemblies' IL in ildasm you can see that they both declare a single int32 read variable, although it does reorder the declarations:
.locals init ([0] int32 read,
[1] uint8[] buffer,
[2] bool CS$4$0000)
.locals init ([0] uint8[] buffer,
[1] int32 read,
[2] bool CS$4$0000)
It really doesn't matter, and if I was reviewing the code for that particular example, I wouldn't care either way.
However, be aware that the two can mean very different things if you end up capturing the 'read' variable in a closure.
See this excellent post from Eric Lippert where this issue comes up regarding foreach loops - Link
I've generally preferred the latter as a matter of personal habit because, even if .NET is smart enough, other environments in which I might work later may not be smart enough. It could be nothing more than compiling down to an extra line of code inside the loop to re-initialize the variable, but it's still overhead.
Even if they're identical for all measurable purposes in any given example, I would say the latter has less of a chance of causing problems in the long run.

Is it possible to combine hash codes for private members to generate a new hash code?

I have an object for which I want to generate a unique hash (override GetHashCode()) but I want to avoid overflows or something unpredictable.
The code should be the result of combining the hash codes of a small collection of strings.
The hash codes will be part of generating a cache key, so ideally they should be unique however the number of possible values that are being hashed is small so I THINK probability is in my favour here.
Would something like this be sufficient AND is there a better way of doing this?
int hash = 0;
foreach(string item in collection){
hash += (item.GetHashCode() / collection.Count);
}
return hash;
EDIT: Thanks for answers so far.
@Jon Skeet: No, order is not important.
I guess this is almost another question, but since I am using the result to generate a cache key (string), would it make sense to use a cryptographic hash function like MD5, or just use the string representation of this int?
The fundamentals pointed out by Marc and Jon are not bad, but they are far from optimal in terms of the evenness of distribution of the results. Sadly the 'multiply by primes' approach copied by so many people from Knuth is not the best choice; in many cases better distribution can be achieved by cheaper-to-calculate functions (though the difference is very slight on modern hardware). In fact throwing primes into many aspects of hashing is no panacea.
If this data is used for significantly sized hash tables I recommend reading Bret Mulvey's excellent study and explanation of various modern (and not so modern) hashing techniques, handily done in C#.
Note that the behaviour of various hash functions on strings is heavily biased towards whether the strings are short (roughly speaking, how many characters are hashed before the bits begin to overflow) or long.
One of the simplest and easiest to implement is also one of the best, the Jenkins One at a time hash.
private static unsafe void Hash(byte* d, int len, ref uint h)
{
for (int i = 0; i < len; i++)
{
h += d[i];
h += (h << 10);
h ^= (h >> 6);
}
}
public unsafe static void Hash(ref uint h, string s)
{
fixed (char* c = s)
{
byte* b = (byte*)(void*)c;
Hash(b, s.Length * 2, ref h);
}
}
public unsafe static int Avalanche(uint h)
{
h += (h<< 3);
h ^= (h>> 11);
h += (h<< 15);
return *((int*)(void*)&h);
}
you can then use this like so:
uint h = 0;
foreach(string item in collection)
{
Hash(ref h, item);
}
return Avalanche(h);
you can merge multiple different types like so:
public unsafe static void Hash(ref uint h, int data)
{
byte* d = (byte*)(void*)&data;
Hash(d, sizeof(int), ref h);
}
public unsafe static void Hash(ref uint h, long data)
{
byte* d= (byte*)(void*)&data;
Hash(d, sizeof(long), ref h);
}
If you only have access to the field as an object with no knowledge of the internals you can simply call GetHashCode() on each one and combine that value like so:
uint h = 0;
foreach(var item in collection)
{
Hash(ref h, item.GetHashCode());
}
return Avalanche(h);
Sadly you can't do sizeof(T) so you must do each struct individually.
If you wish to use reflection you can construct on a per type basis a function which does structural identity and hashing on all fields.
If you wish to avoid unsafe code then you can use bit masking techniques to pull out individual bits from ints (and chars if dealing with strings) with not too much extra hassle.
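For instance, here is a possible safe variant of the int overload above, feeding each byte of the value through the same mixing steps using shifts and masks instead of pointers (a sketch, not a drop-in replacement for the ref-based overloads):
// Safe (no unsafe/pointers) variant: extract each byte of the int with
// shifts and masks and run it through the same one-at-a-time mixing.
static uint Hash(uint h, int data)
{
    for (int shift = 0; shift < 32; shift += 8)
    {
        h += (uint)((data >> shift) & 0xFF);
        h += h << 10;
        h ^= h >> 6;
    }
    return h;
}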
Hashes aren't meant to be unique - they're just meant to be well distributed in most situations. They're just meant to be consistent. Note that overflows shouldn't be a problem.
Just adding isn't generally a good idea, and dividing certainly isn't. Here's the approach I usually use:
int result = 17;
foreach (string item in collection)
{
result = result * 31 + item.GetHashCode();
}
return result;
If you're otherwise in a checked context, you might want to deliberately make it unchecked.
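For example, the same loop with the wrap-around made explicit:
int result = 17;
unchecked
{
    // Overflow simply wraps around here, which is fine for hash mixing.
    foreach (string item in collection)
    {
        result = result * 31 + item.GetHashCode();
    }
}
return result;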
Note that this assumes that order is important, i.e. that { "a", "b" } should be different from { "b", "a" }. Please let us know if that's not the case.
There is nothing wrong with this approach as long as the members whose hashcodes you are combining follow the rules of hash codes. In short ...
The hash code of the private members should not change for the lifetime of the object
The container must not change the object the private members point to lest it in turn change the hash code of the container
If the order of the items is not important (i.e. {"a","b"} is the same as {"b","a"}) then you can use exclusive or to combine the hash codes:
hash ^= item.GetHashCode();
[Edit: As Mark pointed out in a comment to a different answer, this has the drawback of also giving collections like {"a"} and {"a","b","b"} the same hash code.]
If the order is important, you can instead multiply by a prime number and add:
hash *= 11;
hash += item.GetHashCode();
(When you multiply you will sometimes get an overflow that is ignored, but by multiplying with a prime number you lose a minimum of information. If you instead multiplied with a number like 16, you would lose four bits of information each time, so after eight items the hash code from the first item would be completely gone.)

How can I compound byte[] buffers into a List<byte>?

So I'm receiving data over a socket using a buffer (byte[]) of size 1024, and I want to combine the reads together to form the entire packet in the event that they're bigger than 1024 bytes. I chose a List to store the entire packet, and what I want to do is add each buffer read to it as it comes in. I'd want to do:
List.AddRange(Buffer);
But in the event that the buffer isn't full a bunch of empty bytes would get padded to the end. So naturally what I would want to do is add only a certain range of bytes to the List, but there is no such method. I could always create a temporary byte array of exactly the number of bytes that were received and then use AddRange() and get the result I want, but it just seems stupid to me. Not to mention it would be creating then throwing away an array on each read of data, which wouldn't be good for performance on a scalable multiuser server.
Is there a way to do this with a List? Or is there some other data structure I can use?
If you're using .NET 3.5 (LINQ):
list.AddRange(buffer.Take(count));
Do you actually need the result to be a List<byte>? What are you going to do with it afterwards? If you really only need an IEnumerable<byte> I'd suggest creating something like this:
using System;
using System.Collections;
using System.Collections.Generic;
public class ArraySegmentConcatenator<T> : IEnumerable<T>
{
private readonly List<ArraySegment<T>> segments =
new List<ArraySegment<T>>();
public IEnumerator<T> GetEnumerator()
{
foreach (ArraySegment<T> segment in segments)
{
for (int i=0; i < segment.Count; i++)
{
yield return segment.Array[i+segment.Offset];
}
}
}
public void Add(ArraySegment<T> segment)
{
segments.Add(segment);
}
public void Add(T[] array)
{
segments.Add(new ArraySegment<T>(array));
}
public void Add(T[] array, int count)
{
segments.Add(new ArraySegment<T>(array, 0, count));
}
public void Add(T[] array, int offset, int count)
{
segments.Add(new ArraySegment<T>(array, offset, count));
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
Then you can just add the relevant segments each time. Of course, you could end up with a lot of wasted memory, and you'd have to be careful to create a new buffer each time (instead of reading over the original again) but it would be efficient in other ways.
For .NET 3.5 you can use the .Take() extension method to return only the actual number of bytes you received.
I don't know what protocol you are using, or if you are implementing a custom protocol, but if you identify the size you can use Buffer.BlockCopy to directly copy the bytes to a new array to add to your list.
It's hard to be more concise when you don't have specifics.
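Roughly what that might look like; bytesReceived, buffer, and packet here are placeholders for whatever your receive loop uses:
// Copy only the bytes actually received into a right-sized array,
// then append that to the accumulated packet.
byte[] chunk = new byte[bytesReceived];
Buffer.BlockCopy(buffer, 0, chunk, 0, bytesReceived);
packet.AddRange(chunk);   // packet is the List<byte> holding the whole message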
You could implement your own IEnumerable implementation which retrieves only the bytes you want from the array. Then you could do:
List.AddRange(new BufferEnumerator(Buffer));
Edit
You can also look at:
new System.ArraySegment<byte>(Buffer, 0, numBytesReceived)
I'm not positive whether ArraySegment would work; I remember reading about some downsides of it, but I don't remember the specifics.
You can use Array.Copy() and use only arrays to build your target buffer:
byte[] recvBuffer = new byte[1024];
byte[] message = new byte[0];
int nReaded;
while ((nReaded = ....Read(recvBuffer, 1024)) > 0)
{
byte[] tmp = new byte[message.Length + nReaded];
Buffer.BlockCopy(message, 0, tmp, 0, message.Length);
Buffer.BlockCopy(recvBuffer, 0, tmp, message.Length, nReaded);
message = tmp;
}
EDIT: Replaced Array.Copy() with Buffer.BlockCopy() as suggested by Quintin Robinson in the comments.
