Am I correct in saying that this:
public static void MethodName(bool first, bool second, bool third)
{
//Do something
}
Is more efficient than this:
public static void MethodName(bool[] boolArray)
{
bool first = boolArray[0];
bool second = boolArray[1];
bool third = boolArray[2];
//Do something
}
My thoughts are that for both they would have to declare first, second and third - just in different places. But for the second one it has to add it into an array and then unpack it again.
Unless you declared the array like this:
MethodName(new[] { true, true, true });
In which case I am not sure which is faster?
I ask because I am thinking of using the second one but wanted to know if/what the implications are on performance.
In this case performance is not particularly important, but it would be helpful for me to clarify this point.
Also, the second one has the advantage that you can pass as many values as you like, and I think it is also easier to read.
The reason I am thinking of using this is because there are already about 30 parameters being passed into the method and I feel it is becoming confusing to keep adding more. All these bools are closely related so I thought it may make the code more manageable to package them up.
I am working on existing code and it is not in my project scope to spend time reworking the method to decrease the number of parameters that are passed into the method, but I thought it would be good practice to understand the implications of this change.
In terms of performance, there is really just one answer to your question:
"Programmers waste enormous amounts of time thinking about, or
worrying about, the speed of noncritical parts of their programs, and
these attempts at efficiency actually have a strong negative impact
when debugging and maintenance are considered. We should forget about
small efficiencies, say about 97% of the time: premature optimization
is the root of all evil. Yet we should not pass up our opportunities
in that critical 3%."
In terms of productivity, parameters > arrays.
Side note
Everyone should know that this was said by Donald Knuth in 1974. More than 40 years after this statement, we still fall into premature optimization (or even pointless optimization) very often!
Further reading
I would take a look at this other Q&A on Software Engineering
Am I correct in saying that this:
Is more efficient than this:
In isolation, yes. Unless the caller already has that array, in which case the second is the same or even (for larger argument types or more arguments) minutely faster.
I ask because I am thinking of using the second one but wanted to know if/what the implications are on performance.
Why are you thinking about the second one? If it is more natural at the point of the call, then whatever makes it more natural will probably also have a performance effect in the wider context that outweighs the difference here, making the second the better choice overall.
If you're starting off with three separate bools and you're wrapping them just to unwrap them again then I don't see what this offers in practice except for more typing.
So your reason for considering this at all is the more important thing here.
In this case performance is not particularly important
Then really don't worry about it. It's certainly known for hot-path code that hits params to offer overloads that take set numbers of individual parameters, but it really does only make a difference in hot paths. If you aren't in a hot path, the lifetime saving in computing time from picking whichever of the two is more efficient is unlikely to add up to the amount of time it took you to write your post here.
If you are in a hot path and really need to shave off every nanosecond you can because you're looping so much that it will add up to something real, then you have to measure. Isolated changes have non-isolated effects when it comes to performance, so it doesn't matter whether the people on the Internet tell you A is faster than B if the wider context means the code calling A is slower than B. Measure. Measurement number one is "can I even notice?", if the answer to that measurement is "no" then leave it alone and find somewhere where the performance impact is noticeable to optimise instead.
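If you do get as far as measuring, a rough micro-benchmark sketch is below; the class, method names and iteration count are placeholders rather than anything from your codebase, and in a Release build the JIT may optimise the empty bodies away, so put the real work inside them before trusting the numbers.
using System;
using System.Diagnostics;
static class ParameterStyleBenchmark
{
    // Placeholder stand-ins for the two variants being compared.
    static void WithBools(bool first, bool second, bool third) { }
    static void WithArray(bool[] flags) { }
    static void Main()
    {
        const int iterations = 10000000;
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            WithBools(true, false, true);
        sw.Stop();
        Console.WriteLine("separate bools: {0} ms", sw.ElapsedMilliseconds);
        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            WithArray(new[] { true, false, true }); // allocates a small array on every call
        sw.Stop();
        Console.WriteLine("bool[] argument: {0} ms", sw.ElapsedMilliseconds);
    }
}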
Write "natural" code to start with, before seeing if little tweaks can have a performance impact in the bits that are actually hurting you. This isn't just because of the importance of readability and so on, but also because:
The more "natural" code in a given language very often is the more efficient. Even if you think it can't be, it's more likely to benefit from some compiler optimisation behind the scenes.
The more "natural" code is a lot easier to tweak for performance when it is necessary than code doing a bunch of strange things.
I don't think this would affect the performance of your app at all.
Personally
I'd go with the first option for two reasons:
Naming each parameter: helpful if the project is large, with a lot of code, or likely to see future edits and enhancements.
Usability: if you are sending a list of similar parameters then an array or a list makes sense; if it is just a couple of parameters that happen to be of the same type, then you should send them separately.
A third way would be to use params (see params - MSDN).
In the end I don't think it will change much in performance.
bool[] does, however, inherit from the abstract Array class, which implements IEnumerable and IEnumerable<T> (plus ICloneable, IList, ICollection, IStructuralComparable, IStructuralEquatable). This means the object carries more baggage than three value-type parameters, which will make it somewhat slower.
Array - MSDN
You could test performance differences on both, but I doubt there would be much difference.
You have to consider maintainability: is another programmer, or even yourself, going to understand why you did it that way in a few weeks' or a few months' time, when it's time for review? Is it easily extended? Can you pass different object types through to your method?
If you're passing a collection of items, then certainly packing them into an array would be quicker than specifying a new parameter for each additional item?
If you have to, you can do it that way, but have you considered a params array?
Why use the params keyword?
public static void MethodName(params bool[] boolArray)
{
//extract data here
}
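With params the caller gets to choose; a quick sketch of the call sites this allows, assuming the signature above:
MethodName();                        // the compiler passes an empty bool[]
MethodName(true);                    // one element
MethodName(true, false, true);       // three elements
MethodName(new[] { true, true });    // an existing array also works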
Agreed with Matias' answer.
I also want to add that you need error checking: you are passed an array, and nowhere is it stated how many elements you will receive. So you must first check that you have three elements in your array. This will offset the small perf gain that you may have earned.
Also, if you ever want to make this method available to other developers (as part of an API, public or private), IntelliSense will not help them at all with which parameters they're supposed to set...
While using three parameters, you can do this:
///<summary>
///This method does something
///</summary>
///<param name="first">The first parameter</param>
///<param name="second">The second parameter</param>
///<param name="third">The third parameter</param>
public static void MethodName(bool first, bool second, bool third)
{
//Do something
}
And it will be displayed nicely and helpfully to others...
I would take a different approach and use flags:
const int FIRST = 1; // bit flag; further flags would be 2, 4, 8, ...

public static void MethodName(int flag)
{
    if ((flag & FIRST) != 0) { }
}
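A slightly more self-documenting variant of the same idea, sketched with a hypothetical [Flags] enum (the names are made up for illustration):
[Flags]
public enum Options
{
    None   = 0,
    First  = 1,
    Second = 2,
    Third  = 4
}

public static void MethodName(Options options)
{
    if ((options & Options.First) != 0) { /* ... */ }
}

// Call site:
MethodName(Options.First | Options.Third);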
Chances are the compiler will do its own optimizations;
Check http://rextester.com/QRFL3116 (with the method from Jamiec's comment added):
M1 took 5ms
M2 took 23ms
M3 took 4ms
I would love to write code like this:
class Zebra
{
public lazy int StripeCount
{
get { return ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce(); }
}
}
EDIT: Why? I think it looks better than:
class Zebra
{
private Lazy<int> _StripeCount;
public Zebra()
{
this._StripeCount = new Lazy<int>(() => ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce());
}
public int StripeCount
{
get { return this._StripeCount.Value; }
}
}
The first time you call the property, it would run the code in the get block, and afterward would just return the value from it.
My questions:
What costs would be involved with adding this kind of keyword to the language?
What situations would this be problematic in?
Would you find this useful?
I'm not starting a crusade to get this into the next version of the language, but I am curious what kind of considerations a feature such as this should have to go through.
I am curious what kind of considerations a feature such as this should have to go through.
First off, I write a blog about this subject, amongst others. See my old blog:
http://blogs.msdn.com/b/ericlippert/
and my new blog:
http://ericlippert.com
for many articles on various aspects of language design.
Second, the C# design process is now open for view to the public, so you can see for yourself what the language design team considers when vetting new feature suggestions. See https://github.com/dotnet/roslyn/ for details.
What costs would be involved with adding this kind of keyword to the language?
It depends on a lot of things. There are, of course, no cheap, easy features. There are only less expensive, less difficult features. In general, the costs are those involving designing, specifying, implementing, testing, documenting and maintaining the feature. There are more exotic costs as well, like the opportunity cost of not doing a better feature, or the cost of choosing a feature that interacts poorly with future features we might want to add.
In this case the feature would probably be simply making the "lazy" keyword syntactic sugar for using Lazy<T>. That's a pretty straightforward feature, not requiring a lot of fancy syntactic or semantic analysis.
What situations would this be problematic in?
I can think of a number of factors that would cause me to push back on the feature.
First off, it is not necessary; it's merely a convenient sugar. It doesn't really add new power to the language. The benefits don't seem to be worth the costs.
Second, and more importantly, it enshrines a particular kind of laziness into the language. There is more than one kind of laziness, and we might choose wrong.
How is there more than one kind of laziness? Well, think about how it would be implemented. Properties are already "lazy" in that their values are not calculated until the property is called, but you want more than that; you want a property that is called once, and then the value is cached for the next time. By "lazy" essentially you mean a memoized property. What guarantees do we need to put in place? There are many possibilities:
Possibility #1: Not threadsafe at all. If you call the property for the "first" time on two different threads, anything can happen. If you want to avoid race conditions, you have to add synchronization yourself.
Possibility #2: Threadsafe, such that two calls to the property on two different threads both call the initialization function, and then race to see who fills in the actual value in the cache. Presumably the function will return the same value on both threads, so the extra cost here is merely in the wasted extra call. But the cache is threadsafe, and doesn't block any thread. (Because the threadsafe cache can be written with low-lock or no-lock code.)
Code to implement thread safety comes at a cost, even if it is low-lock code. Is that cost acceptable? Most people write what are effectively single-threaded programs; does it seem right to add the overhead of thread safety to every single lazy property call whether it's needed or not?
Possibility #3: Threadsafe such that there is a strong guarantee that the initialization function will only be called once; there is no race on the cache. The user might have an implicit expectation that the initialization function is only called once; it might be very expensive and two calls on two different threads might be unacceptable. Implementing this kind of laziness requires full-on synchronization where it is possible that one thread blocks indefinitely while the lazy method is running on another thread. It also means there could be deadlocks if there's a lock-ordering problem with the lazy method.
That adds even more cost to the feature, a cost that is borne equally by people who do not take advantage of it (because they are writing single-threaded programs).
So how do we deal with this? We could add three features: "lazy not threadsafe", "lazy threadsafe with races" and "lazy threadsafe with blocking and maybe deadlocks". And now the feature just got a whole lot more expensive and way harder to document. This produces an enormous user education problem. Every time you give a developer a choice like this, you present them with an opportunity to write terrible bugs.
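For what it's worth, the existing System.Lazy<T> already exposes exactly these three flavours through LazyThreadSafetyMode; a minimal sketch mapping them to the possibilities above (Compute is a stand-in for the expensive call):
using System;
using System.Threading;

class LazinessFlavours
{
    static int Compute() { return 42; } // stand-in for an expensive, run-once computation

    // Possibility #1: no thread safety at all.
    static readonly Lazy<int> NotThreadSafe =
        new Lazy<int>(Compute, LazyThreadSafetyMode.None);

    // Possibility #2: initializers may race, but the cache is safe and only one result is published.
    static readonly Lazy<int> PublicationOnly =
        new Lazy<int>(Compute, LazyThreadSafetyMode.PublicationOnly);

    // Possibility #3: full locking; Compute runs exactly once, other threads block.
    static readonly Lazy<int> ExecutionAndPublication =
        new Lazy<int>(Compute, LazyThreadSafetyMode.ExecutionAndPublication);
}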
Third, the feature seems weak as stated. Why should laziness be applied merely to properties? It seems like this could be applied generally through the type system:
lazy int x = M(); // doesn't call M()
lazy int y = x + x; // doesn't add x + x
int z = y * y; // now M() is called once and cached.
// x + x is computed and cached
// y * y is computed
We try to not do small, weak features if there is a more general feature that is a natural extension of it. But now we're talking about really serious design and implementation costs.
Would you find this useful?
Personally? Not really useful. I write lots of simple low-lock lazy code mostly using Interlocked.Exchange. (I don't care if the lazy method gets run twice and one of the results discarded; my lazy methods are never that expensive.) The pattern is straightforward, I know it to be safe, there are never extra objects allocated for the delegate or the locks, and if I have something a little more complex I can always use Lazy<T> to do the work for me. It would be a small convenience.
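One common shape of that Interlocked-based pattern, sketched with a hypothetical reference-typed value (a null check serves as the "not yet computed" marker, so this doesn't work as-is for an int like StripeCount):
using System.Threading;

class Zebra
{
    private StripeData cachedStripes; // null until first computed

    public StripeData Stripes
    {
        get
        {
            if (cachedStripes == null)
            {
                StripeData computed = ComputeStripes();
                // If another thread got here first, its value wins and ours is discarded.
                Interlocked.CompareExchange(ref cachedStripes, computed, null);
            }
            return cachedStripes;
        }
    }

    private StripeData ComputeStripes() { return new StripeData(); } // placeholder

    public class StripeData { }
}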
The system library already has a class that does what you want: System.Lazy<T>
I'm sure it could be integrated into the language, but as Eric Lippert will tell you adding features to a language is not something to take lightly. Many things have to be considered, and the benefit/cost ratio needs to be very good. Since System.Lazy already handles this pretty well, I doubt we will see this anytime soon.
Do you know about the Lazy<T> class that was added in .Net 4.0?
http://sankarsan.wordpress.com/2009/10/04/laziness-in-c-4-0-lazyt/
Have you tried / do you mean this?
private Lazy<int> MyExpensiveCountingValue = new Lazy<int>(() => ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce());
public int StripeCount
{
get
{
return MyExpensiveCountingValue.Value;
}
}
EDIT:
After your post edit I would add that your idea is definitely more elegant, but it still has the same functionality.
This is unlikely to be added to the C# language because you can easily do it yourself, even without Lazy<T>.
A simple, but not thread-safe, example:
class Zebra
{
private int? stripeCount;
public int StripeCount
{
get
{
if (this.stripeCount == null)
{
this.stripeCount = ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce();
}
return this.stripeCount.Value;
}
}
}
If you don't mind using a post-compiler, CciSharp has this feature:
class Zebra {
[Lazy] public int StripeCount {
get { return ExpensiveCountingMethodThatReallyOnlyNeedsToBeRunOnce(); }
}
}
Have a look at the Lazy<T> type. Also ask Eric Lippert about adding things like this to the language, he would no doubt have a view.
I'm new to C# and any form of programming, and I have a question that seems to divide those in the know in my university faculty. That question is simply: do I always have to declare a variable? As a basic example of what I'm talking about: if I have int pounds and int pence, do I need to declare int money into which to put the answer, or is it OK to just have:
textbox1.Text = (pounds + pence).ToString();
I know both work, but I'm thinking in terms of best practice.
Thanks in advance.
In my opinion the answer is "no". You should, however, use variables in some cases:
Whenever a value is used multiple times
When a call to an expensive function is done, or one that has side-effects
When the expression needs to be made more self-explaining, variables (with meaningful names) do help
Basically, follow your common sense. The code should be self-explaining and clear, if introducing variables helps with that then use them.
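Reusing the asker's own example, the distinction might look like this (the second use of the value is hypothetical):
// Fine inline: used once, and the expression explains itself.
textbox1.Text = (pounds + pence).ToString();

// Worth a named variable: the value is reused, and the name documents intent.
int money = pounds + pence;
textbox1.Text = money.ToString();
label1.Text = money.ToString(); // hypothetical second use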
Maintenance is king. It's the most expensive part of software development and anything you can do to make it easier is good.
Variables are good because when debugging, you can inspect the results of functions and calculations.
Absolutely not. The only time I would create a variable for a single use like this is if it significantly increases the readability of my code.
In my opinion if you do something like
int a = SomeFunc();
int b = SomeFunc2();
int c = a + b;
SomeFunc3(c);
it is better to just do
int a = SomeFunc();
int b = SomeFunc2();
SomeFunc3(a + b);
or even just
SomeFunc3(SomeFunc() + SomeFunc2());
If I am not manipulating the variable after it's calculated, then I think it's better not to declare it; you just get more lines of code and more room to make a mistake later on as your code grows.
Variables come to serve the following two purposes above all else:
Place holder for data (and information)
Readability enhancer
of course more can be said about the job of a variable, but those other tasks are less important here, and are far less relevant.
The above two points have the same importance as far as I'm concerned.
If you think that declaring a variable will enhance readability, or if you think that the data stored in that variable will be needed many times (and in which case, storing it in a well name var will again increase readability), then by all means create a new variable.
The only time I strictly advise against creating more variables is when the clutter of too many variables harms readability more than it aids it, and this cannot be undone by method extraction.
I would suggest that logging frequently makes variable declaration worthwhile, when you need to know what something specific is and to track that specific value. And you are logging, aren't you? Logging is good. Logging is right. Logging is freedom and unicorns and happy things.
I don't always use a variable. As an example, if I have a method evaluating something and returning true/false, I typically return the expression directly. The results are logged elsewhere, and I have the inputs logged, so I always know what happened.
Localisation and scope
For a programmer, knowledge of local variables - their content and scopes - is an essential part of the mental effort in comprehending and evolving the code. When you reduce the number of concurrent variables, you "free up" the programmer to consider other factors. Minimising scope is a part of this. Every little decision implies something about your program.
void f()
{
int x = ...; // "we need x (or side effect) in next scope AND
// thereafter..."
{
int n = ...; // "n isn't needed at function scope..."
...
} // can "pop" that n mentally...
...
}
The smallest scope is a literal or temporary result. If a value is only used once I prefer to use comments rather than a variable (they're not restricted to A-Za-z0-9_ either :-)):
x = employees.find("*", // name
retirement.qualify_at(), // age
num_wives() + num_kids()); // # dependents
Concision
Keeping focused on what your program is achieving is important. If a lot of screen real estate (i.e. lines of code) is spent getting fields into variables, you have less code on screen that's actually responsible for getting algorithmic-level work done, and hence it's less tangible to the programmer. That's another reason to keep the code concise, so:
keep useful documentation targeted and concise
It depends on the situation. There is no one practice that would be best for all. For something this simple, you can skip creating a new variable but the best thing to do is step back and see how readable your expression is and see if introducing an intermediate variable will help the situation.
There are two objectives in making this decision:
Readability - the code is readable and self-explanatory
Code Optimization - the code doesn't have any unnecessary calculation
If you look at this as an optimization problem it might seem less subjective
Rate readability on a scale from 1 to 10, with 10 being the easiest. Using sensible variable names may give you a 2; showing the calculation inline may give you a 3 (since the reader doesn't have to look up what "money" is; it's just there in that line of code), etc. This piece is subjective: you and the companies you work for define what is readable, and you can build this cost model from that experience.
Most optimal execution is not subjective. If you write "pounds + pence" everywhere you want the money calculation to go, you are wasting processor time. Yes I know addition is a bad example, but it still holds true. Say minimum execution of a process is simplified to memory allocation of variables, assignments, and calculations. Maybe one or two of these additions in the code will be ok for readability, but at some point it becomes a complete waste on the processor. This is why variables exist, allocate some space to store the value, name it money so the user knows what it is, and reference that variable "money" everywhere it's needed.
This makes more sense as you look at loops. Let's say you want to sum 1000 values in the following calculation:
money = factor[1] + factor[2] + ... + factor[n]
You could repeat this everywhere you want to use the value money, so that anyone who reads your code knows what money consists of. Instead, just do it once, and write in some comments when you first calculate money so any programmer can come back and reference that.
Long story short, if you only use money once and it's clear what the inline calculation means, then of course, don't make a variable. If you plan on using it throughout your code and its meaning becomes remotely confusing, then declare a variable, save the processor, and be done with it!
Note: partially kidding about this approach, just thought it was funny to answer something like this in a cost model format :) still useful I'd say
I don't recall ever seeing anything like this, and I think it's more tied to different "styles" of programming. Some styles, such as Spartan programming, actually attempt to declare as few variables as possible. If you aren't trying to follow a particular style, then it's best to go by readability. In your example I wouldn't declare a special variable to hold it. If you were calculating taxes based on some percentage of the total, then I might, or at the very least I would comment on what I was calculating.
OK, so the title may have been confusing, so I have posted two code snippets to illustrate what I mean.
NOTE: allUsers is just a collection.
RegularUser regUser = new RegularUser(userName, password, name, emailAddress);
allUsers.Add(regUser);
VS
allUsers.Add(new RegularUser(userName, password, name, emailAddress));
Which snippet A or B is better and why?
What are the advantages or disadvantages?
The example I wrote was C#, but does the language (C#, Java, etc.) make a difference?
As far as C# is concerned, both of your code examples are practically identical at the IL level. The second example still creates a reference to the created object and pushes it onto the stack; you just don't have a local variable hooked up to it. This will not create any performance problems at all.
1) Which snippet A or B is better and why?
They're really identical. The compiled code will be nearly identical, since a temporary object is pushed onto the stack, then used in the method call.
2) What are the advantages or disadvantages?
The main advantages and disadvantages to the approach are really just readability.
Your first example has the advantage of keeping a single "operation" per line of code, which, in many ways, is more maintainable.
The second example removes the unnecessary variable declaration, which may be more maintainable.
Personally, I feel that the number of parameters in your RegularUser constructor would probably push me, in this instance, towards your first option. I typically find that, when a line of code gets to be more than about half a screen width on an average monitor, it's easier to read and understand if it's split up. Splitting this up by introducing the temporary and calling Add separately makes this more clear.
However, if you're just adding an integer or a class that's very small, I'd probably vote to skip the unnecessary variable. This is completely a personal preference, however - your mileage may (and probably will) vary.
3) The example I wrote was C#, but does the language (C#, Java, etc.) make a difference?
No, for the most part. This is really language/implementation dependent, but most languages will have the same basic behavior and performance in both cases. It is possible (and highly likely) that some languages may treat this differently, but most mainstream languages will not.
I really like to create them the first way unless I really really know what is going on. It is much harder to do debugging if you don't create the object first...
The compiler will just turn the 2nd version into the 1st for you, anyway, so there isn't a net negative effect.
Pros of #1:
easier to debug (!)
theoretically easier to read, clearer
can use the object later
Cons:
more verbose
can be unnecessary, especially for trivial objects
Result:
1 for anything complex to create, or that may need to be inspected easily at debug time
2 for lots of annoying little stuff, like the following.
var list = new List<NameValuePair>(3);
list.Add(new NameValuePair("name", "valuable"));
list.Add(new NameValuePair("age", "valuable"));
list.Add(new NameValuePair("height", "not valuable"));

var dates = new List<DateTime>();
dates.Add(DateTime.Now);
dates.Add(DateTime.Now.Date.AddYears(-2));
As far as I know there isn't a real difference between languages when it comes to this. Some may not allow it, though.
Both are equal in terms of performance.
In terms of maintainability, the second case is a nightmare; it is (nearly) impossible to trace in a debugger. So I tend to prefer the first one. In my early OOP days I always wrote the second, because "I knew that they were objects and I was sooo good at grasping objects that I ... blah blah blah", but that wore off with time, and especially with maintenance time.
Also, suppose that someone wants you to
FilterClass.FilterUser(regUser)
or
Database.AddToDatabase(regUser)
because it is the right place to do so, the first scenario is better.
Finally, when do you stop?
allUsers.Add(new RegularUser(new ReadFromInput(new EscapedName(new Name(new String(userName)))), password, name, emailAddress));
I've been poking around mscorlib to see how the generic collections optimize their enumerators and I stumbled on this:
// in List<T>.Enumerator
public bool MoveNext()
{
List<T> list = this.list;
if ((this.version == list._version) && (this.index < list._size))
{
this.current = list._items[this.index];
this.index++;
return true;
}
return this.MoveNextRare();
}
The stack size is 3, and the size of the bytecode should be 80 bytes. The naming of the MoveNextRare method piqued my curiosity, and it contains an error case as well as an empty-collection case, so obviously this is breaching separation of concerns.
I assume the MoveNext method is split this way to optimize stack space and help the JIT, and I'd like to do the same for some of my perf bottlenecks, but without hard data, I don't want my voodoo programming turning into cargo-cult ;)
Thanks!
Florian
If you're going to think about ways in which List<T>.Enumerator is "odd" for the sake of performance, consider this first: it's a mutable struct. Feel free to recoil with horror; I know I do.
Ultimately, I wouldn't start mimicking optimisations from the BCL without benchmarking/profiling what difference they make in your specific application. It may well be appropriate for the BCL but not for you; don't forget that the BCL goes through the whole NGEN-alike service on install. The only way to find out what's appropriate for your application is to measure it.
You say you want to try the same kind of thing for your performance bottlenecks: that suggests you already know the bottlenecks, which suggests you've got some sort of measurement in place. So, try this optimisation and measure it, then see whether the gain in performance is worth the pain of readability/maintenance which goes with it.
There's nothing cargo-culty about trying something and measuring it, then making decisions based on that evidence.
Separating it into two functions has some advantages:
If the method were to be inlined, only the fast path would be inlined and the error handling would still be a function call. This prevents inlining from costing too much extra space. But 80 bytes of IL is probably still above the threshold for inlining (it was once documented as 32 bytes, don't know if it's changed since .NET 2.0).
Even if it isn't inlined, the function will be smaller and fit within the CPU's instruction cache more easily, and since the slow path is separate, it won't have to be fetched into cache every time the fast path is.
It may help the CPU branch predictor optimize for the more common path (returning true).
I think that MoveNextRare is always going to return false, but by structuring it like this it becomes a tail call, and if it's private and can only be called from here then the JIT could theoretically build a custom calling convention between these two methods that consists of just a jmp instruction with no prologue and no duplication of epilogue.
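If you do try the same trick on one of your own hot paths, the general shape is a tiny fast-path method that defers everything uncommon to a separate, rarely called method. An illustrative sketch (not the actual BCL source; the type and fields are invented):
public struct SimpleEnumerator<T>
{
    private readonly T[] _items;
    private int _index;
    private T _current;

    public SimpleEnumerator(T[] items)
    {
        _items = items;
        _index = 0;
        _current = default(T);
    }

    public T Current { get { return _current; } }

    public bool MoveNext()
    {
        // Fast path: kept small so it can be inlined and stays hot in the instruction cache.
        if (_index < _items.Length)
        {
            _current = _items[_index];
            _index++;
            return true;
        }
        // Everything uncommon (end of sequence, error handling) lives elsewhere.
        return MoveNextRare();
    }

    private bool MoveNextRare()
    {
        _index = _items.Length + 1;
        _current = default(T);
        return false;
    }
}
And, as above, measure whether the split actually buys you anything before keeping it.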
I am trying to optimize a piece of .NET 2.0 C# code that looks like this:
Dictionary<myType, string> myDictionary = new Dictionary<myType, string>();
// some other stuff
// inside a loop check if key is there and if not add element
if(!myDictionary.ContainsKey(currentKey))
{
myDictionary.Add(currentKey, "");
}
It looks like the Dictionary has been used by whoever wrote this piece of code even though it's not needed (only the key is being used, to store a set of unique values), because it is faster to search than a List of myType objects.
This seems obviously wrong, as only the key of the dictionary is used, but I am trying to understand the best way to fix it.
Questions:
1) I seem to understand that I would get a good performance boost just by using the .NET 3.5 HashSet. Is this correct?
2) What would be the best way to optimize the code above in .NET 2.0 and why?
EDIT:
This is existing code I am trying to optimize; it's looping through tens of thousands of items, and for each one of them it calls ContainsKey. There's gotta be a better way of doing it (even in .NET 2.0)! :)
I think you need to break this down into 2 questions
Is Dictionary<myType,string> the best available type for this scenario
No. Based on your breakdown, HashSet<myType> is clearly the better choice because its usage pattern more accurately fits the scenario.
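For reference, the .NET 3.5 replacement could be as small as this (myType and currentKey are the names from your question):
HashSet<myType> uniqueKeys = new HashSet<myType>();

// inside the loop:
uniqueKeys.Add(currentKey); // returns false, and does nothing, if the key is already present
Note that Add doubles as the membership test, so the separate ContainsKey call disappears.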
Will switching to HashSet<myType> give me a performance boost?
This is really subjective and only a profiler can give you the answer to this question. Likely you'll see a very minor memory size improvement per element in the collection. But in terms of raw computing power I doubt you'll see a huge difference. Only a profiler can tell you if there is one.
Before you ever make a performance related change to your code remember the golden rule.
Don't make any performance related changes until a profiler has told you precisely what is wrong with your code.
Making changes which violate this rule is just guessing. A profiler is the only way to measure the success of a performance fix.
1) No. A dictionary hashes the key, so your lookup should already be O(1). A HashSet should need less memory, though. But honestly, the difference isn't big enough that you will really see a performance boost.
2) Give us some more detail as to what you are trying to accomplish. The code you posted is pretty simple. Have you measured yet? Are you seeing that this method is slow? Don't forget "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." -- Donald Knuth
Depending on the size of your keys, you may actually see performance degrade.
One way in 2.0 would be to try to insert it and catch the exception (of course, this depends on how many duplicate keys you plan on having):
foreach(string key in keysToAdd)
{
try
{
dictionary.Add(key, "myvalue");
}
catch(ArgumentException)
{
// do something about extra key
}
}
The obvious mistake (if we discuss performance) I can see is the double work done by calling ContainsKey and then adding the key-value pair. When the pair is added using the Add method, the key is again internally checked for presence. The whole if block can be safely replaced by this:
...
myDictionary[currentKey] = "";
...
If the key already exists, the value will just be replaced and no exception will be thrown. Moreover, if the value is not used at all, I would personally use null to fill it; I can see no reason for using any string constant there.
The possible performance degradation mentioned by scottm is not for simple lookups; it is for calculating the intersection between two sets. HashSet does have slightly faster lookups than Dictionary. The performance difference really is going to be very small, though, as everyone says -- the lookup takes most of the time and creating the KeyValuePair takes very little.
For 2.0, you could make the "Value" object one of these:
public struct Empty {}
It may do slightly better than the "".
Or you could try making a reference to System.Core.dll in your 2.0 project, so you can use the HashSet.
Also, make sure that GetHashCode and Equals are as efficient as possible for MyType. I've been bitten by using a dictionary on something with a really slow GetHashCode (I believe we tried to use a delegate as a key or something like that.)
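As a sketch of what that means in practice (the fields are invented; the real members of the type aren't shown in the question), an equality implementation that is cheap and allocation-free looks roughly like this:
public sealed class MyType : IEquatable<MyType>
{
    private readonly int id;
    private readonly string name;

    public MyType(int id, string name) { this.id = id; this.name = name; }

    public bool Equals(MyType other)
    {
        if (other == null) return false;
        return id == other.id && name == other.name;
    }

    public override bool Equals(object obj) { return Equals(obj as MyType); }

    // Cheap to compute: no allocations, no walking large object graphs.
    public override int GetHashCode()
    {
        return (id * 397) ^ (name == null ? 0 : name.GetHashCode());
    }
}
Implementing IEquatable<T> also lets Dictionary and HashSet avoid the object-typed Equals path.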