I saw this while coding and thought it seemed odd:
Sure enough, MSDN says that RNGCryptoServiceProvider(byte[] rgb) and RNGCryptoServiceProvider(string str) both ignore their parameters.
As far as I can tell, there is no practical difference between either of those two and RNGCryptoServiceProvider(). What is going on? I suspect I'm missing something weird to do with cryptography.
It's probably a leftover from an older version, maybe going back as far as 1.x, as even the 2.0 API contains the same description. It could, however, well be that the 2.0 and 2.1 APIs have changed in the meantime.
If you look at the Mono source then you find
_handle = RngInitialize (rgb);
and
_handle = RngInitialize (Encoding.UTF8.GetBytes (str));
so I presume that the data was used as an additional or initial seed for a platform-provided random number generator. That would also make the most sense. The Mono source usually follows the MS implementation as much as possible.
If the platform-provided RNG is secure, there may be little need to seed it from an application. Using an RNG as a method to generate the same stream over and over again (i.e. when the parameter is used as an initial seed) is fraught with danger, especially if the underlying implementation is unknown and may differ between platforms and system updates. So that would be a good reason to deprecate the constructors.
Obviously, if they were just deleted, old sources would no longer compile. So it is more logical to leave the implementation empty, as the resulting instance should be generating random data anyway.
In the end this is just a (very) educated guess, though; the reason is not specified in the current API documentation. The constructors are not marked obsolete either, it seems. Everything I've come to expect from Microsoft's crypto API documentation, in other words.
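The practical upshot is easy to demonstrate. A quick sketch (assuming the classic .NET Framework constructors, which are documented to ignore their parameters): even when two instances are constructed with identical "seeds", they produce independent random streams.

```csharp
using System;
using System.Security.Cryptography;

class Program
{
    static void Main()
    {
        // All three constructors behave identically today:
        // the byte[] and string arguments are documented as ignored.
        var a = new RNGCryptoServiceProvider();
        var b = new RNGCryptoServiceProvider(new byte[] { 1, 2, 3 });
        var c = new RNGCryptoServiceProvider("same seed");

        var buf1 = new byte[16];
        var buf2 = new byte[16];
        b.GetBytes(buf1);
        c.GetBytes(buf2);

        // Identical "seeds", yet independent random output: matching
        // buffers would be astronomically unlikely.
        Console.WriteLine(BitConverter.ToString(buf1));
        Console.WriteLine(BitConverter.ToString(buf2));
    }
}
```

If the parameters really were seeds, the two printed lines would match; they don't.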
Edit
After digging deeper, I learned that my problem is caused by an improperly truncated protobuf stream (once again). I thus reformulated this question to focus on aspects that have not already been answered in the linked questions (see "Refined question").
Background
So, I've inherited this large set of applications that use protobuf-net for just about everything (which is not a bad thing :-)). Customers have tons of serialized data lying around, so I'm bound to staying backwards compatible with the protobuf-net version these applications are currently using (at the very least, I have to be able to deserialize the data they have). And here comes the catch: to this date, all parts of the application are based on protobuf-net 1.0.0.282, and I dearly want/need to upgrade. And, as you can imagine, my initial attempts at swapping in newer versions (starting with v2) failed miserably (I get various exceptions, like "Invalid field in source data: 0").
The question(s)
Note: This part of the question has already been answered in the comments - yes, the protocol is compatible
Staying away from getting into the details of the individual exceptions, is my goal of upgrading to non-primeval versions of protobuf-net while keeping backwards compatibility even feasible? If yes, what would be a good starting point? Are there any resources on how to start such an undertaking? I only found this document in the github repository, but I did not achieve much playing around with the mentioned CompatibilityLevel.
Refined question
Turns out my real problem is that the original author of the code decided to serialize the protobuf data into files that are zero-padded at the end. Don't ask me why, but there's nothing I can do about it - this data lives on customer machines. Protobuf-net 1.0.0.282 is perfectly happy with these files and deserializes them correctly, whereas newer versions (correctly) barf on them. Now, is there anything I can do to make newer versions of protobuf-net accept the zero-padded files? How can I upgrade without breaking my customers' data archives?
As far as I can recall, there were no fundamental configuration changes around the time of 1.x to 2.x (which was a very long time ago), so this should - as far as I know - be a very low effort upgrade. If something unexpected is happening, an example type and payload would be useful, although this may be a better topic for GitHub than Stack Overflow.
The actual data protocol hasn't changed at any point, so there is no fundamental reason this shouldn't work, and to be honest I'm surprised you've hit any snags at all. I'd be happy to help further, but: I'd really need to see the problem in front of me.
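One pragmatic workaround for the zero-padding specifically (a sketch, not an official protobuf-net feature) is to strip the trailing zero bytes before handing the data to the deserializer. A caveat worth stating loudly: a valid protobuf message *can* legitimately end with a 0x00 byte (e.g. a varint whose value is zero), so only do this if you know the writer padded the files with zeros.

```csharp
using System;
using System.IO;

class Program
{
    // Hypothetical helper: wraps 'data' minus any trailing zero bytes in a
    // read-only MemoryStream suitable for Serializer.Deserialize<T>().
    // Caveat: a valid message can end with 0x00, so this assumes the
    // trailing zeros are padding added by the writer.
    static MemoryStream TrimTrailingZeros(byte[] data)
    {
        int end = data.Length;
        while (end > 0 && data[end - 1] == 0)
        {
            end--;
        }
        return new MemoryStream(data, 0, end, writable: false);
    }

    static void Main()
    {
        // A field-1 length-delimited "ABC" record followed by zero padding.
        var padded = new byte[] { 0x0A, 0x03, 0x41, 0x42, 0x43, 0x00, 0x00, 0x00 };
        using (var ms = TrimTrailingZeros(padded))
        {
            Console.WriteLine(ms.Length);
        }
    }
}
```

You would then deserialize from the trimmed stream, e.g. `Serializer.Deserialize<MyType>(ms)` (where `MyType` stands in for your contract type).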
I'm currently stuck on the random generator. The requirement specification shows a sample like this:
Random rand = new Random(3412);
The rand result is not output directly, but is used in further computations.
I wrote the same code as above to generate a random number with the seed 3412.
However, the result of the subsequent computation is totally different from the sample.
The generated value on my machine is 518435373. I tried the same code on an online C# compiler and got a different value, 11688046, and the result of the subsequent computation also differed from the sample.
So I'm just wondering: is it supposed to be different on different machines?
BTW, could anyone post the result from your machine, just to see if it's the same as mine?
I would expect any one implementation to give the same sequence for the same seed, but there may well be different implementations involved. For example, an "online C# compiler" may well end up using Mono, which I'd expect to have a different implementation to the one in .NET.
I don't know whether the implementations have changed between versions of .NET, but again, that seems entirely possible.
The documentation for the Random(int) constructor states:
Providing an identical seed value to different Random objects causes each instance to produce identical sequences of random numbers.
... but it doesn't specify the implications of different versions etc. Heck, it doesn't even state whether the x86 and x64 versions will give the same results. I'd expect the same results within any one specific CLR instance (i.e. one process, and not two CLRs running side by side).
If you need anything more stable, I'd start off with a specified algorithm - I bet there are implementations of the Mersenne Twister etc. available.
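To make that concrete: if you implement the algorithm yourself, the output depends only on the seed and your code, never on the runtime. A minimal sketch using xorshift32 (chosen here purely as an example of a fully specified generator, not as the algorithm from the question's spec):

```csharp
using System;

// A tiny, fully specified PRNG (Marsaglia's xorshift32). Because the
// algorithm lives in your own code, the sequence for a given seed is
// identical on .NET, Mono, x86, x64, and any future runtime.
class Xorshift32
{
    private uint _state;

    public Xorshift32(uint seed)
    {
        // xorshift must not start from an all-zero state.
        _state = seed == 0 ? 2463534242u : seed;
    }

    public uint Next()
    {
        uint x = _state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        _state = x;
        return x;
    }
}

class Program
{
    static void Main()
    {
        // Two generators with the same seed always agree, by construction.
        var a = new Xorshift32(3412);
        var b = new Xorshift32(3412);
        bool same = true;
        for (int i = 0; i < 5; i++)
        {
            if (a.Next() != b.Next()) same = false;
        }
        Console.WriteLine(same);
    }
}
```

The same guarantee holds for any algorithm you pin down explicitly (Mersenne Twister, PCG, etc.); the point is that the spec must name the algorithm, not just the seed.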
It isn't specified as making such a promise, so you should assume that it does not.
A good rule with any specification is not to make promises that aren't required for reasonable use, so you are freer to improve things later on.
Indeed, Random's documentation says:
The current implementation of the Random class is based on Donald E. Knuth's subtractive random number generator algorithm.
Note the phrase "current implementation", implying it may change in the future. This very strongly suggests that not only is there no promise to be consistent between versions, but there is no intention to either.
If a spec requires consistent pseudo-random numbers, then it must specify the algorithm as well as the seed value. Indeed, even if Random was specified as making such a promise, what if you need a non-.NET implementation of all or part of your specification - or something that interoperates with it - in the future?
This is probably due to different framework versions. Have a look at this
The online provider you tried might use the Mono implementation of the CLR, which is different from the one Microsoft provides. So their Random class implementation is probably a bit different.
If one takes a look at the decompiled source of the .NET Framework code, most of the APIs have checks like these
if (source == null)
throw Error.ArgumentNull("source");
on the method arguments instead of using a more generic class like
Guard.IsNotNull(source);
Is there a reason behind doing this explicitly every time, or is this just legacy code that has been around since the framework was developed, with the newer classes moving towards this - or are there any inherent advantages to having explicit checks? One reason that I could think of is probably to avoid overloading the stack with function pointers.
Adding to Matthew's answer:
Your proposed syntax of Guard.IsNotNull(source); is not directly equivalent to the first code snippet. It only passes the value of the parameter but not its name, so the thrown exception can't report the name of the offending parameter. It just knows that one of the parameters is null.
You could use expression trees - like so: Guard.IsNotNull(() => source); - but analyzing this expression tree has a rather large performance impact at runtime, so this isn't an option either.
Your proposed syntax could only be used in conjunction with a static weaver. That's basically a post-compiler that changes the generated IL. That's the approach Code Contracts are using. But this comes with its own cost, namely:
That static weaver needs to be written by someone in the first place
It increases build time
The weaver also needs to patch the debug symbols
It causes all sorts of problems with Edit and Continue
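As an aside not covered by the original answer: since C# 6, `nameof` narrows this gap without expression trees or a weaver. The caller still has to pass the name, but it is resolved at compile time, so the exception keeps the parameter name at zero runtime cost. A sketch (the `Guard` class here is illustrative, not a framework type):

```csharp
using System;

static class Guard
{
    // Minimal guard helper; the caller supplies the parameter name via
    // nameof, so renaming the parameter is caught by the compiler.
    public static void IsNotNull(object value, string parameterName)
    {
        if (value == null)
        {
            throw new ArgumentNullException(parameterName);
        }
    }
}

class Program
{
    static string Shout(string source)
    {
        Guard.IsNotNull(source, nameof(source));
        return source.ToUpper();
    }

    static void Main()
    {
        Console.WriteLine(Shout("hello"));
        try
        {
            Shout(null);
        }
        catch (ArgumentNullException ex)
        {
            // The exception still reports the offending parameter's name.
            Console.WriteLine(ex.ParamName);
        }
    }
}
```

It's still two tokens at each call site rather than one, but it avoids both the lost-name problem and the expression-tree performance hit.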
Nowadays we can do this with Code Contracts so we can use:
Contract.Requires(source != null);
Contract.Ensures(Contract.Result<MyType>() != null);
and so on, but currently we can only do this in our own code because this isn't built in to the CLR yet (it's a separate download).
The Code Contracts classes themselves have been a part of .Net since version 4, but by themselves they don't generate any checking code. To do that we need the code contracts rewriter which will be called by the C# compiler when generating your code. That's the thing that needs a separate download.
So yes we have better ways to do this now, but it hasn't been released as part of the CLR (yet) and so the CLR is currently using what you think of as a "legacy" approach.
It's certainly nothing to do with "overloading the stack with function pointers".
Even with Code Contracts, we are still doing a check of course. There's no IL command that I know of that checks an argument for null and throws if it is, so such work has to be done using several IL instructions (in all CLR languages). However, the Code Contracts code rewriter does generate inline code to check the Code Contract's predicate (e.g. value != null) rather than calling a method to do so, so it is very efficient.
There's no Guard class in the .NET Framework, so your proposed alternative is not feasible. Later additions to the framework do use code contracts, but rather sparingly. Not every .NET programmer at Microsoft seems that convinced that contracts are that useful; I share the sentiment.
You are otherwise seeing the way Microsoft works. Code in the .NET Framework is contributed by lots of small teams within the company. A typical team size is about 10 programmers. This is a nod to what everybody in the software dev business knows: large teams don't work. There's a critical mass where the amount of time spent getting everybody to communicate starts to overwhelm the amount of time that can be spent actually producing code.
Such teams are also constantly created and disbanded. Lots of parts of the framework no longer have an active team that maintains it. Typically just one guy that still knows the internals well enough to provide critical security updates and, maybe, bug fixes when necessary. The code that such a disbanded team wrote is very much in maintenance mode, changes are only made when absolutely necessary. Not just because there's no benefit to making minor stylistic changes but to reduce the odds that breaking changes are unknowingly added.
Which is a liability for the .NET framework, there are plenty of internals that have a knack for becoming externally visible, even if that code lives inside private methods. Like exceptions. And programmers using Reflection to hack around framework limitations. And the really subtle stuff, a great example is the bug in an email app widely used inside Microsoft, written by an intern. Which crashed and left everybody without email when they updated their machine from .NET 1.1 to .NET 2.0. The bug in that email app was a latent threading race that never triggered when running with .NET 1.1. But became visible by a very slight change in the timing of .NET 2.0 framework code.
It might not be a part of .NET Framework but Microsoft developers seem to be embracing the concept (notice the use of JetBrains annotations instead of Code Contracts as well):
https://github.com/aspnet/EntityFramework/blob/master/src/Shared/Check.cs
// Copyright (c) Microsoft Open Technologies, Inc. All rights reserved.
// Licensed under the Apache License, Version 2.0. See License.txt in the project root for license information.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using JetBrains.Annotations;

namespace Microsoft.Data.Entity.Utilities
{
    [DebuggerStepThrough]
    internal static class Check
    {
        [ContractAnnotation("value:null => halt")]
        public static T NotNull<T>([NoEnumeration] T value, [InvokerParameterName] [NotNull] string parameterName)
        {
            NotEmpty(parameterName, "parameterName");

            if (ReferenceEquals(value, null))
            {
                throw new ArgumentNullException(parameterName);
            }

            return value;
        }

        ...
The only thing I can think of, is that if you have a Guard class, then in the exception stacktrace, it will look as if the problem is in Guard when it's actually in the method that called Guard. You could get around this by catching and rethrowing, but then you've got boilerplate in your production code again.
I need to provide a copy of the source code to a third party, but given it's a nifty extensible framework that could be easily repurposed, I'd rather provide a less OO version (a 'procedural' version for want of a better term) that would allow minor tweaks to values etc but not reimplementation using the full flexibility of how it is currently structured.
The code makes use of the usual stuff: classes, constructors, etc. Is there a tool or method for 'simplifying' this into what is still the 'source', but using only plain variables etc.?
For example, if I had a class instance 'myclass' which initialised this.blah in the constructor, the same could be done with a variable called myclass_blah which would then be manipulated in a more 'flat' way. I realise some things like polymorphism would probably not be possible in such a situation. Perhaps an obfuscator, set to a 'super mild' setting would achieve it?
Thanks
My experience with nifty extensible frameworks has been that most shops have their own nifty extensible frameworks (usually more than one) and are not likely to steal them from vendor-provided source code. If you are under obligation to provide source code (due to some business relationship), then, at least in my mind, there's an ethical obligation to provide the actual source code, in a maintainable form. How you protect the source code is a legal matter and I can't offer legal advice, but really you should be including some license with your release and dealing with clients who are not going to outright steal your IP (assuming it's actually yours under the terms you're developing it.)
As has already been said, if this is a requirement based on contractual restrictions, then don't do it. In short, providing a version of the source that differs from what they're actually running becomes a liability, and I doubt that it is one your company should be willing to take. Proving that the code provided does not match the code they are running is simple. This is also true if you're trying to avoid license restrictions of libraries your application uses (e.g. GPL).
If that isn't the case then why not provide a limited version of your extensibility framework that only works with internal types and statically compile any required extensions in your application? This will allow the application to continue to function as what they currently run while remaining maintainable without giving up your sacred framework. I've never done it myself but this sounds like something ILMerge could help with.
If you don't want to give out framework - just don't. Provide only source you think is required. Otherwise most likely you'll need to either support both versions in the future OR never work/interact with these people (and people they know) again.
Don't forget that non-obfuscated .NET assemblies carry IL in easily decompilable form. It is often easier to use ILSpy/Reflector to read someone else's code than to look at the sources.
If the reason to provide code is some sort of inspection (even simply looking at the code), you'd better have semi-decent code. I would seriously consider throwing away a tool if its code looks like it was written in FORTRAN style using C# ( http://www.nikhef.nl/~templon/fortran/fortran_style ).
Side note: I believe "nifty extensible frameworks" are one of the roots of "not invented here" syndrome - I'd be more worried about comments on the framework (like "this code is ##### because it does not use YYY pattern and spacing is wrong") than reuse.
What are the technical reasons behind the difference between the 32-bit and 64-bit versions of string.GetHashCode()?
More importantly, why does the 64-bit version seem to terminate its algorithm when it encounters the NUL character? For example, the following expressions all return true when run under the 64-bit CLR.
"\0123456789".GetHashCode() == "\0987654321".GetHashCode()
"\0AAAAAAAAA".GetHashCode() == "\0BBBBBBBBB".GetHashCode()
"\0The".GetHashCode() == "\0Game".GetHashCode()
This behavior (bug?) manifested as a performance issue when we used such strings as keys in a Dictionary.
This looks like a known issue which Microsoft would not fix:
As you have mentioned this would be a breaking change for some programs (even though they shouldn't really be relying on this), the risk of this was deemed too high to fix this in the current release.
I agree that the rate of collisions that this will cause in the default Dictionary<String, Object> will be inflated by this. If this is adversely affecting your application's performance, I would suggest trying to work around it by using one of the Dictionary constructors that takes an IEqualityComparer so you can provide a more appropriate GetHashCode implementation. I know this isn't ideal and would like to get this fixed in a future version of the .NET Framework.
Source: Microsoft Connect - String.GetHashCode ignores any characters in the string beyond the first null byte in x64 runtime
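A sketch of the workaround Microsoft suggests: supply the Dictionary with an IEqualityComparer<string> whose hash visits every character instead of stopping at the first NUL. The FNV-1a constants below are the standard published ones; this is illustrative and is not the CLR's own hash algorithm.

```csharp
using System;
using System.Collections.Generic;

// Comparer whose GetHashCode does not terminate at '\0', avoiding the
// x64 string.GetHashCode collision problem for NUL-prefixed keys.
class NulSafeStringComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
    {
        unchecked
        {
            uint hash = 2166136261; // FNV-1a offset basis
            foreach (char c in s)
            {
                hash = (hash ^ c) * 16777619; // FNV-1a prime; visits every char
            }
            return (int)hash;
        }
    }
}

class Program
{
    static void Main()
    {
        var cmp = new NulSafeStringComparer();
        var dict = new Dictionary<string, int>(cmp);
        dict["\0The"] = 1;
        dict["\0Game"] = 2;

        // Unlike 64-bit string.GetHashCode, these keys hash differently,
        // so the dictionary's buckets no longer degenerate into one chain.
        Console.WriteLine(cmp.GetHashCode("\0The") != cmp.GetHashCode("\0Game"));
        Console.WriteLine(dict.Count);
    }
}
```

Any hash that consumes the full string would do; FNV-1a is just a small, well-known choice.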
Eric Lippert has written a wonderful blog post on this:
Curious property in String
Curious property Revealed