Is there a case when a hashcode collision would be beneficial?
(Other than when the objects are identical, of course.)
EDIT: 'beneficial' meaning that the hash code can be calculated in fewer CPU cycles, or with less memory used in the calculation.
I guess a clarification would be: if a certain GetHashCode() is 10 times faster but also causes, for example, twice as many collisions, is it worth it?
'Beneficial' is a difficult term to quantify, especially in this case. It depends on your definition of beneficial.
If you're checking for object equality and they collide but the objects are not the same, then that would not be beneficial.
If you're building a hashmap, then you might have specific mechanisms built into your implementation to handle these cases. I'm fairly certain most (if not all) modern hashmap implementations do this.
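To make that concrete, here is a minimal sketch (the CollidingKey type is hypothetical, not from the question): a .NET Dictionary<TKey, TValue> still returns the right value even when every key has the same hash code, because colliding keys are disambiguated with Equals; the collisions only cost extra comparisons.

    using System;
    using System.Collections.Generic;

    // Hypothetical key type whose GetHashCode always collides. Lookups still work
    // because Dictionary falls back to Equals for keys that land in the same bucket;
    // the collisions only make lookups slower.
    sealed class CollidingKey : IEquatable<CollidingKey>
    {
        public string Name { get; }
        public CollidingKey(string name) => Name = name;

        public override int GetHashCode() => 42;   // every instance collides
        public bool Equals(CollidingKey other) => other != null && other.Name == Name;
        public override bool Equals(object obj) => Equals(obj as CollidingKey);
    }

    static class Program
    {
        static void Main()
        {
            var map = new Dictionary<CollidingKey, int>
            {
                [new CollidingKey("a")] = 1,
                [new CollidingKey("b")] = 2,
            };

            // Prints 2 despite the identical hash codes.
            Console.WriteLine(map[new CollidingKey("b")]);
        }
    }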
You could also argue there are a bunch of fringe benefits, like maybe you're a mathematician or a security researcher and you're looking to show the strength (or lack thereof) of the algorithm used in GetHashCode(). Or maybe you want to give an excellent proof-of-concept for why Microsoft should hire you for the .NET team.
Overall, your question is pretty vague. If there's something specific you're wondering about, you should rethink/edit your question.
To answer your question you first need to understand what a hash code is used for. A hash code is a fast "pre-test" for checking the equality of two objects.
So is there a case where a collision is beneficial?
Yes. If, in the process of generating the hash code, you spend a relatively large amount of time to create a more unique hash code, the overhead of that generation may be greater than the benefit you get from having fewer collisions.
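As a rough sketch of that trade-off (the Point3 type and both hash functions are made up for illustration): a cheap hash that reads a single field is almost free to compute but collides for every pair of values sharing that field, while a hash that mixes every field costs a few more operations per call and collides far less.

    // Hypothetical type showing a cheap hash versus a more thorough one.
    readonly struct Point3
    {
        public readonly int X, Y, Z;
        public Point3(int x, int y, int z) { X = x; Y = y; Z = z; }

        // Cheap: one field, almost free, but collides for all points sharing an X.
        public int CheapHash() => X;

        // Thorough: mixes all fields, costs a few more operations, collides far less.
        public int ThoroughHash()
        {
            unchecked
            {
                int h = 17;
                h = h * 31 + X;
                h = h * 31 + Y;
                h = h * 31 + Z;
                return h;
            }
        }
    }

    static class Demo
    {
        static void Main()
        {
            var p = new Point3(1, 2, 3);
            System.Console.WriteLine($"{p.CheapHash()} vs {p.ThoroughHash()}");
        }
    }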
To address your latest edit, the only way to tell if it is worth it is to try both methods in place with your real data and see how the two compare. Doing an artificial head-to-head benchmark is not going to give you any meaningful information; things like hash code lookups depend too much on the data they are working with.
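Here is a sketch of what "try both methods with your real data" could look like, using IEqualityComparer<T> to swap hashing strategies (the comparers, key shape, and sizes are assumptions made for illustration; substitute your actual keys):

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;

    // Collision-heavy comparer: hashing is nearly free, but many keys share a bucket.
    sealed class CheapComparer : IEqualityComparer<string>
    {
        public bool Equals(string a, string b) => a == b;
        public int GetHashCode(string s) => s.Length;
    }

    // More thorough comparer: slower per hash, far fewer collisions.
    sealed class ThoroughComparer : IEqualityComparer<string>
    {
        public bool Equals(string a, string b) => a == b;
        public int GetHashCode(string s) => StringComparer.Ordinal.GetHashCode(s);
    }

    static class HashBenchmark
    {
        static long Time(IEqualityComparer<string> comparer, string[] keys)
        {
            var sw = Stopwatch.StartNew();
            var map = new Dictionary<string, int>(comparer);
            for (int i = 0; i < keys.Length; i++) map[keys[i]] = i;
            int hits = 0;
            foreach (var k in keys) if (map.ContainsKey(k)) hits++;
            sw.Stop();
            return sw.ElapsedMilliseconds;
        }

        static void Main()
        {
            // Replace this generated data with your real keys for a meaningful comparison.
            string[] keys = Enumerable.Range(0, 20_000).Select(i => "key-" + i).ToArray();

            Console.WriteLine($"cheap hash:    {Time(new CheapComparer(), keys)} ms");
            Console.WriteLine($"thorough hash: {Time(new ThoroughComparer(), keys)} ms");
        }
    }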
First, please no spamming, because I am not necessarily an OOP devotee. That said, I have been a programmer on and off for almost 30 years and have created a lot of pretty cool production code systems/solutions in several industries. I've also done my share of break/fix, database development, etc., and even about 10 years as a web programmer (not developer), so I am not so much a newbie as someone trying to get an answer about something that frankly is eluding me.
I started as a "C" programmer in the early 1980s, and "C" served me well into the early 2000s (even today most scripting and higher-level languages use "C" syntactical elements).
That said, overloading seems to violate every principle of what I was taught were "good coding practices": it increases ambiguity, creating opportunities to omit code that was intended to run for a given condition, or to actually run a routine you didn't expect because some condition fell through the cracks. It also generally seems to create LOTS of confusion for learners.
I am not saying overloading is bad per se; I just want to better understand its practical application to real problems, other than simply as a way to provide input validation, or perhaps to handle inputs from other sources that you have no control over, such as an API or something else whose type you don't necessarily know (again, I'm not clear on how or why that could actually happen either). C# has a lot of Parse and try/catch to handle this, as do most OOP languages.
In over a decade, I have yet to get a straight, non-judgmental and, dare I say, unsnarky answer to this question. Surely there is someone who can offer a reasonable explanation of why it is used.
So I pose the question to you, the Stack Overflow gurus: is having a method/function that is potentially callable in multiple different ways, with multiple exclusive code segments, really a good thing, or does it just suggest a lack of good planning when designing software? Again, I'm not knocking, judging, or disparaging; I just don't get it. Please enlighten me!
I'd say std::to_string is a pretty good example of good use of overloading. Why would you want to have different functions for converting different types to std::string? You don't. You just want one - std::to_string and you want it to behave sensibly whatever type of argument you give it - and it does just that. Using overloading keeps the client code simple.
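For a C# analogue (a hypothetical sketch, since the question is about C#): overloading lets callers use one name, Describe, for every type, and the compiler picks the most specific body, which is exactly the simplicity std::to_string buys in C++.

    using System;
    using System.Globalization;

    // One logical operation, "Describe", overloaded per type. The caller never has
    // to remember type-specific names like DescribeInt or DescribeDate.
    static class Describer
    {
        public static string Describe(int value) =>
            value.ToString(CultureInfo.InvariantCulture);

        public static string Describe(double value) =>
            value.ToString("G17", CultureInfo.InvariantCulture);

        public static string Describe(DateTime value) =>
            value.ToString("O");                       // round-trip format

        public static string Describe(object value) =>
            value?.ToString() ?? "<null>";             // fallback for anything else
    }

    static class Program
    {
        static void Main()
        {
            // The compiler resolves each call to the most specific overload.
            Console.WriteLine(Describer.Describe(42));
            Console.WriteLine(Describer.Describe(3.14));
            Console.WriteLine(Describer.Describe(DateTime.UtcNow));
        }
    }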
I want to write a C# application that compares the performance of two PCs and determines which PC will perform a given task faster.
So is there an algorithm for doing this?
For example: ((NumberOfProcessesCurrentlyRunning * AvailableRAM) + CPUUsage).
Assuming that we have 2 computers with the same computing and hardware power.
While I agree that there is no general-purpose algorithm to determine the overall performance of a computer, there are some algorithms that scientists use to create more reliable benchmarks for their papers. So if you implement another solution targeting the same problem, you can tell whether it's better than previously discovered ones, even though you are working on a different machine than the previous teams.
One example is the benchmark algorithm dfmax. In a short time it will give you a rough idea of how fast the current machine is, but it won't take the available RAM into account. Still, I think it could be a start for you.
No, there isn't an algorithm to determine overall computer performance. There are lots of things that can affect overall performance.
You should decide what to compare: memory access, the efficiency of the CPU cache, the efficiency of the GC; then implement a function for each of them.
If you need a blind test and you don't care about any specific metric, you can run the same function (e.g., quicksort) on both machines, log the time, and compare the milliseconds.
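A minimal sketch of that blind test (the workload size and seed are arbitrary choices, not from the answer): run the identical sort on both machines and compare the elapsed milliseconds.

    using System;
    using System.Diagnostics;
    using System.Linq;

    static class MachineBenchmark
    {
        // A plain in-place quicksort used purely as a deterministic CPU workload.
        static void QuickSort(int[] a, int lo, int hi)
        {
            if (lo >= hi) return;
            int pivot = a[(lo + hi) / 2];
            int i = lo, j = hi;
            while (i <= j)
            {
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j)
                {
                    (a[i], a[j]) = (a[j], a[i]);
                    i++; j--;
                }
            }
            QuickSort(a, lo, j);
            QuickSort(a, i, hi);
        }

        static void Main()
        {
            // Fixed seed so both PCs (on the same runtime) sort identical data.
            var rng = new Random(12345);
            int[] data = Enumerable.Range(0, 5_000_000)
                                   .Select(_ => rng.Next())
                                   .ToArray();

            var sw = Stopwatch.StartNew();
            QuickSort(data, 0, data.Length - 1);
            sw.Stop();

            Console.WriteLine($"Sorted {data.Length} ints in {sw.ElapsedMilliseconds} ms");
        }
    }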
I've been learning C# recently, coming from a strong C++ background, and there is something about C# that I don't quite understand, given my understanding of and experiences with C++.
In C++, people care a lot about uniformity; otherwise it would be impossible to write generic code using template meta-programming. In C#, however, people seem to care little about uniformity. For example, while array types have a Length property, List<T> uses Count. While IndexOf, LastIndexOf, and the like are static methods for array types, their counterparts for List<T> are not. This gives me the impression that instead of being uniform, C# is actually trying hard to be nonuniform. This doesn't make sense to me. Since C# doesn't support template meta-programming, uniformity is not as important as it is in C++. But still, being uniform can be beneficial in many other ways. For example, it would be easier for humans to learn and master: when things are highly uniform, you master one, and you master them all. Please note that I'm not a C++ fanatic or diehard; I just don't really understand.
You've got a conceptual issue here.
List<T>, and the other collection classes with it, aren't C# constructs. They are classes in the BCL. Essentially, you can use any BCL class in any .NET Language, not just C#. If you're asking why the BCL classes differ in certain ways, it's not because the designers disrespected or didn't want uniformity. It's probably for one of (at least two) reasons:
1) The BCL and FCL evolved over time. You're likely to see very significant differences between classes that were introduced before and after generics were added. One example, DataColumnCollection, is an IEnumerable (but not an IEnumerable<DataColumn>), which forces you to cast in order to perform some operations.
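A small sketch of the casting this forces (the table and columns are made up for illustration): because DataColumnCollection only implements the non-generic IEnumerable, you have to name the element type yourself, either in foreach or via Cast<T>() before LINQ operators apply.

    using System;
    using System.Data;
    using System.Linq;

    static class Program
    {
        static void Main()
        {
            var table = new DataTable("People");
            table.Columns.Add("Id", typeof(int));
            table.Columns.Add("Name", typeof(string));

            // foreach performs an implicit cast from object to DataColumn per element.
            foreach (DataColumn column in table.Columns)
                Console.WriteLine(column.ColumnName);

            // LINQ needs an explicit Cast<DataColumn>() because the collection
            // is not an IEnumerable<DataColumn>.
            var names = table.Columns.Cast<DataColumn>()
                                     .Select(c => c.ColumnName)
                                     .ToList();
            Console.WriteLine(string.Join(", ", names));
        }
    }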
2) There's a subtle difference in the meaning of the members. .Length, I believe, is meant to imply that there's a fixed number stored somewhere, whereas .Count implies that some operation might be performed to get the number of items in the list.
EDIT: I'm not asking for an opinion here. I'm not saying, "Let's debate what's evil and what's not." I'm asking: if someone decides to use static classes for the majority of their code, what performance gains does that give them, and what maintainability challenges does that represent?
I was reading an article today on how StackOverflow is able to be so fast using limited hardware: http://highscalability.com/blog/2014/7/21/stackoverflow-update-560m-pageviews-a-month-25-servers-and-i.html
One thing that caught my eye is that they employ:
Heavy usage of static classes and methods, for simplicity and better performance.
I've read opinions here and other places suggesting that Singletons and Static classes are "evil" -- that anything global is "evil". I've also learned that people who call things "evil" can be right in a situation -- but that they usually don't know what they're talking about.
So I'm turning to the community to understand this -- what advantages does StackExchange benefit from by using mostly static classes and methods? And what are the software maintainability or performance costs of their choices?
Using static methods and classes has two effects.
Fewer null checks because a static method invocation does not require a null check on the receiver.
Fewer allocations which leads to less memory pressure and time spent in GC.
Of course, this comes at the cost of having fewer seams where you can break apart the system and do unit testing, so it is probably more costly to maintain. It is not about what is good or "evil". They know the trade-offs, and they've decided that they want the performance more than they want the maintainability.
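A toy sketch of that trade-off (the SlugHelper/ISlugger names are invented for illustration): the static version allocates nothing and has no receiver to null-check, while the instance version costs an allocation and an interface dispatch but leaves a seam you can fake in a unit test.

    using System;

    // Static: no instance to allocate, no receiver that could be null,
    // but also no seam to substitute a fake in a unit test.
    public static class SlugHelper
    {
        public static string ToSlug(string title) =>
            title.Trim().ToLowerInvariant().Replace(' ', '-');
    }

    // Instance behind an interface: a small allocation and virtual dispatch,
    // but the dependency is replaceable when testing.
    public interface ISlugger
    {
        string ToSlug(string title);
    }

    public sealed class Slugger : ISlugger
    {
        public string ToSlug(string title) =>
            title.Trim().ToLowerInvariant().Replace(' ', '-');
    }

    public static class Program
    {
        public static void Main()
        {
            Console.WriteLine(SlugHelper.ToSlug("Hello World"));

            ISlugger slugger = new Slugger();
            Console.WriteLine(slugger.ToSlug("Hello World"));
        }
    }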
This blog post (also part 2) has a nice discussion of some of the performance trade-offs they've made, and it specifically discusses points in the article you've quoted.
I'm not sure if this kind of question is appropriate, but it's been suggested that I ask here, so here goes.
For a subject at university this semester our assignment is to take some existing code and parallelize it. We've got a bit of an open end on it, but open source is really the only way we are going to get existing code. I could write some code and then parallelize it, but existing code (and perhaps one where I could make a genuine contribution) would be best, to avoid doubling my workload for little benefit.
I was hoping to use C# and make use of the new Task parallel library, but I'm struggling to find some C# open source projects that are computationally expensive enough to make use of parallelization (and don't already have it).
Does anyone have some suggestions of where to look? Or is C# just not going to have enough of that kind of thing as open source (should I perhaps try C++)?
I don't know if they already use parallel tasks, but good candidates are image manipulation programs, such as paint.net or pinta.
I don't know the scope of this project (whether it's just a weekly assignment or your final project), but a process that benefits from parallelization does not have to be "embarrassingly parallel" as Hans' linked article describes. A problem will benefit from being parallelized if:
The solution to the problem can be expressed as the "sum" of a repetitive series of smaller operations,
The smaller operations have minimal effect on each other, and
The scale of the problem is sufficient to make the benefits of parallelization greater than the loss due to the added overhead of creating and supervising multiple worker processes.
Examples of problems that are often solved linearly, but can benefit from parallelization include:
Sorting. Some algorithms like MergeSort are atomic enough to parallelize; others like QuickSort are not.
Searching. BinarySearch cannot be parallelized, but if you're searching unordered data like a document for one or more occurrences of words, linear searches can use "divide and conquer" optimizations.
Data transformation workflows. Open a file, read its raw data, carve it up into domain fields, turn those domain fields into true domain objects, validate them, and persist them. Each data file is often totally independent of all others, and the process of transformation (which is everything between reading the file and persisting it) is often a bottleneck that benefits from having more processors thrown at it.
Constraint satisfaction problems. Given a series of business rules defining relationships and constraints among multiple variables in a problem space, find a set of values for those variables that meets all constraints, or determine that there are none. Common applications include transportation route scheduling and business process optimization. This is an evolving sector of computational algorithms, of relatively high academic interest, so you may find published public-domain code of a basic CSP algorithm you can multithread. It may be described as embarrassingly parallel, as the best-known solution is "intelligent brute force", but nonetheless, each possible solution can be evaluated independently of the others and so each can be given to a worker thread.
Processes described as "embarrassingly parallel" are generally problems sufficiently large in scale, yet atomic and repetitive enough, that parallel processing is the only feasible solution. The Wiki article Hans links to mentions common applications; in general, they usually boil down to applying a relatively simple computation to each element of a very large domain of data.
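As a rough Task Parallel Library sketch of the data-transformation case listed above (the directory layout, file format, and transformation are assumptions, not part of the question): each file is independent, so Parallel.ForEach can fan the per-file work out across cores.

    using System;
    using System.IO;
    using System.Linq;
    using System.Threading.Tasks;

    static class Program
    {
        static void Main()
        {
            string inputDir = "data/incoming";     // hypothetical paths; inputDir must exist
            string outputDir = "data/processed";
            Directory.CreateDirectory(outputDir);

            // Each CSV file is processed independently on whatever thread the
            // scheduler assigns, so the loop scales with the number of cores.
            Parallel.ForEach(Directory.EnumerateFiles(inputDir, "*.csv"), path =>
            {
                var transformed = File.ReadLines(path)
                                      .Select(line => line.Split(','))
                                      .Select(fields => string.Join("|",
                                          fields.Select(f => f.Trim().ToUpperInvariant())));

                File.WriteAllLines(Path.Combine(outputDir, Path.GetFileName(path)), transformed);
            });

            Console.WriteLine("Done.");
        }
    }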
Check out Alglib, especially the open-source C# edition. It contains a lot of matrix and array manipulations that are nicely suited to the TPL.
Project Bouncycastle implements several encryption algorithms in C# and Java. Perhaps some of them are not as parallelized as they could be.