C# Open Source software that would benefit from parallelization? [closed] - c#

I'm not sure if this kind of question is appropriate, but it's been suggested I ask here, so here goes.
For a subject at university this semester our assignment is to take some existing code and parallelize it. The brief is fairly open-ended, but open source is really the only way we are going to get existing code. I could write some code and then parallelize it, but existing code (and perhaps a project where I could make a genuine contribution) would be best, to avoid doubling my workload for little benefit.
I was hoping to use C# and make use of the new Task Parallel Library, but I'm struggling to find C# open source projects that are computationally expensive enough to benefit from parallelization (and don't already have it).
Does anyone have suggestions of where to look? Or is C# just not going to have enough of that kind of thing in open source (should I perhaps try C++)?

I don't know if they already use parallel tasks, but good candidates are image manipulation programs, such as Paint.NET or Pinta.

I don't know the scope of this project (if it's just a weekly assignment or your final project), but a process that benefits from parallelization does not have to be "embarrassingly parallel" as Hans' linked article describes. A problem will benefit from being parallelized if:
The solution to the problem can be expressed as the "sum" of a repetitive series of smaller operations,
The smaller operations have minimal effect on each other, and
The scale of the problem is sufficient to make the benefits of parallelization greater than the loss due to the added overhead of creating and supervising multiple worker processes.
Examples of problems that are often solved linearly, but can benefit from parallelization include:
Sorting. Divide-and-conquer sorts like MergeSort parallelize naturally; QuickSort can be parallelized too, though its in-place partitioning and uneven splits make the gains harder to predict.
Searching. BinarySearch gains little from parallelization, but if you're searching unordered data, such as a set of documents, for one or more occurrences of words, linear searches can use "divide and conquer" optimizations (see the PLINQ sketch at the end of this answer).
Data transformation workflows. Open a file, read its raw data, carve it up into domain fields, turn those domain fields into true domain objects, validate them, and persist them. Each data file is often totally independent of all others, and the process of transformation (which is everything between reading the file and persisting it) is often a bottleneck that benefits from having more processors thrown at it.
Constraint satisfaction problems. Given a series of business rules defining relationships and constraints over multiple variables in a problem space, find an assignment of those variables that meets all constraints, or determine that there is none. Common applications include transportation route scheduling and business process optimization. This is an evolving area of computational algorithms with relatively high academic interest, so you may find published public-domain code of a basic CSP algorithm that you can multithread. It may be described as embarrassingly parallel, since the best-known solution is "intelligent brute force", but nonetheless each candidate solution can be evaluated independently of the others, so each can be given to a worker thread.
Processes described as "embarrassingly parallel" are generally problems sufficiently large in scale, yet atomic and repetitive, that parallel processing is the only feasible solution. The Wikipedia article Hans links to mentions common applications; they usually boil down to applying a relatively simple computation to each element of a very large domain of data.
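To make the "divide and conquer" search point concrete, here is a minimal sketch (the folder path and search term are made up for illustration) that uses PLINQ to count word occurrences across many independent documents in parallel:

    using System;
    using System.IO;
    using System.Linq;

    class WordSearchExample
    {
        static void Main()
        {
            // Hypothetical input: a folder full of text documents to scan.
            string[] files = Directory.GetFiles(@"C:\docs", "*.txt");

            // Each file is independent, so PLINQ can hand files out to worker threads.
            int totalHits = files
                .AsParallel()
                .Select(path => File.ReadAllText(path))
                .Sum(text => text.Split(' ', '\n', '\r', '\t')
                                 .Count(word => word.Equals("invoice",
                                     StringComparison.OrdinalIgnoreCase)));

            Console.WriteLine("Occurrences found: " + totalHits);
        }
    }

Because the per-file work is self-contained, this scales with the number of cores without any explicit thread management.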

Check out Alglib, especially the open source C# edition. It contains a lot of matrix and array manipulation that is nicely suited to the TPL.
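For example, a dense matrix-vector multiply of the kind such a library is full of parallelizes naturally row by row. A minimal sketch (this is not Alglib's actual API, just an illustration of the shape of the work) using Parallel.For:

    using System;
    using System.Threading.Tasks;

    class MatrixExample
    {
        // Multiply an n-by-m matrix by a vector of length m, one row per iteration.
        static double[] Multiply(double[,] a, double[] x)
        {
            int rows = a.GetLength(0);
            int cols = a.GetLength(1);
            var result = new double[rows];

            // Rows are independent, so iterations can run on separate cores.
            Parallel.For(0, rows, i =>
            {
                double sum = 0.0;
                for (int j = 0; j < cols; j++)
                    sum += a[i, j] * x[j];
                result[i] = sum;   // each i is written by exactly one thread
            });

            return result;
        }
    }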

The Bouncy Castle project implements several encryption algorithms in C# and Java. Perhaps some of them are not as parallelized as they could be.

Related

Conceptual Query -- C# 4.0, ASPX.NET, GUI [closed]

I am looking for honest / constructive feedback.
I hear a lot of my peers who have been using .NET for a while now, say how easily they built their GUI interfaces. On closer inspection they have used 3rd party tools such as Infragistics.
As a new .NET programmer (certified, I may add), I wanted to know if anyone has actually created interfaces using nothing but whatever happens to be available by default with the framework...
I am guessing it shouldn't be too difficult to create a good, aesthetic-looking GUI without using 3rd party add-ons.
Yes, we've done it (Windows).
It depends on where you put the emphasis in your guess. No, it's not TOO difficult, but it's definitely not easy, unless your requirements are truly trivial, as opposed to apparently trivial.
It all depends on what you need or want to do. My advice: don't tell your boss this will be easy, not unless you want help getting out of the door for the last time.
For instance, take a plain textbox.
They want to enter currency in it.
Multiple rounding algorithms.
Enter raw value, display it formatted: currency symbol, thousand separators.
Optional pounds or pence.
Optional blank or zero
Optional treatment of negatives.
Optional display formatting of negatives.
Alignment on decimal point.
Auto change of font on resizing.
And break none of the standard behaviours.
Trust me, it's not simple at all, especially if you do something Infragistics did not and go for a good developer interface as well as the end-user behaviours.
I'm not trying to put you off. It's challenging and rewarding, but when the entire application is stuck behind some irritating bug in the UI, bosses lose patience very quickly, and you haven't got the get-out-of-jail-free card of shrugging and saying "that's how X works".
NB: just buying a suite won't fix all these problems; you can spend a lot of time producing a totally crap UI with them as well, you just don't have to write the code...
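To give a flavour of how quickly even one of those bullet points grows, here is a minimal sketch (WinForms assumed; the class name and behaviour are invented for illustration) of just the "enter raw value, display formatted" requirement:

    using System;
    using System.Globalization;
    using System.Windows.Forms;

    // Illustrative only: shows the raw number while editing and a formatted
    // currency value once the control loses focus.
    public class CurrencyTextBox : TextBox
    {
        private decimal amount;

        protected override void OnEnter(EventArgs e)
        {
            base.OnEnter(e);
            // Show the raw value for editing.
            Text = amount.ToString(CultureInfo.CurrentCulture);
        }

        protected override void OnLeave(EventArgs e)
        {
            base.OnLeave(e);
            decimal parsed;
            if (decimal.TryParse(Text, NumberStyles.Currency,
                                 CultureInfo.CurrentCulture, out parsed))
            {
                amount = parsed;
                Text = amount.ToString("C", CultureInfo.CurrentCulture);
            }
            // Rounding rules, negatives, blank-vs-zero, decimal alignment and
            // the rest of the list above are all still missing -- which is
            // exactly the point being made.
        }
    }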
The answer to that is a lot of hard work. :(
Can your current suite be upgraded?
If you have the source, could it be fixed? If you have the source and it's been twiddled with, are those "improvements" interfering?
This needs some hard-headed, realistic analysis. Which components are broken? How much are they used? How much of the extra behaviour in the suite do you really need?
Most important: how good is the separation of concerns in the current code, and how comprehensive are your unit tests and automation tests?
Would compatibility mode sort it out?
You need to get to a point where the number of questions doesn't significantly outweigh the number of answers.
I've been where you are, though it was another suite in another environment. The people looking for the cheap, quick and painless way of dealing with a mess like this were hugely disappointed, but it can be attacked in parts as long as everybody takes a heavy dose of pragmatism.
As a for instance,
Someone had bought a Windows component that looked like an HTML link and was heavily dependent on file associations and API calls. It was very visible and used all over the place. I knocked up a much better and far less fragile one in a few days and swapped it in; a lot of perceived problems disappeared, confidence increased, and the remaining problems started to look less horrible.
Think of it like going into triage mode on bugs at the end of a struggling release.

Steps to open source a small project [closed]

I've been working for a couple of years on a small project, almost by myself, with the eventual help of some colleagues. The project is getting out of my hands, because the size of the code is growing (around 20K lines now) and the initial expectations I had for it have outgrown my own ability and time. So now I want to open source it, with the hope to attract some contributors. My motivations for going open source are these:
The project is rather academic (a library of algorithms for scientific computing), and I don't really have any economic interest in it.
The project is getting too big for me to handle it by myself, and the number of features I've planned are enough to keep a small team motivated (I think).
It needs a lot of testing, not just unit testing, but testing in the real world to see if the API is easy to use, the performance is as expected, etc.
I'm sure it has a lot of bugs, but I can only find a few, since it's just me testing it.
It needs proper documentation, because the API is getting a bit complex.
Other than that, I think the project could benefit from a community in terms of deciding which features are most needed, and creating a set of guidelines for future development.
I'm using Git, so my first thought was to publish it on GitHub and/or CodePlex. Besides that, what would be the steps to slowly grow a community of users and perhaps developers around it? Do I need a domain of my own, or should I stick to GitHub/CodePlex? How do I set up a platform for collaboration between developers who are potentially geographically separated? Should I set up a mailing list? And most importantly, how do I attract people to use it and collaborate on it?
The project is a .NET library for optimization and machine learning, written in C#.
There is only one piece of advice I can give here: use GitHub. It is common, (pretty much) everyone knows about it, it is easy to use, and the community you are trying to attract is already on it. It has a ton of tools which you may not have even thought about, but which may come in handy. It is pretty much the perfect solution for what you're looking to do, so don't overthink it.
As for attracting people to use it and contribute: if it is something useful and good, people will find it. I have found a ton of obscure projects with a simple Google search. If someone googles for something related to your project (and it is appropriately named and such) they will likely find it. There isn't really much you can do to force demand, though; just let it happen. As for contributors, people who are using it will likely contribute their additions back. Just be sure to stay actively involved in managing it (monitoring pull requests, etc.). If no one is accepting requests or managing versions, contributors will likely start to give up on your project.

Advantages/disadvantages of combining C# and C [closed]

I am thinking about developing a WinForms application that would use a C library to perform all computations. I want to use C# because developing a GUI with it is really easy, and C to increase performance. Are there other advantages, or any disadvantages, of combining those two languages?
Edit: By computations I mean mainly (but not limited to) graph algorithms, like coloring, Dijkstra's shortest path, and maximum flow; I expect the graphs to be huge, and performance is really crucial.
and C to increase performance
You can write pretty well-performing applications using pure managed code. Take, for example, this very site. It's pretty darn fast, don't you think?
The way I see things is the following: mix those two technologies only if you are really some kind of performance maniac (someone trying to solve the scaling problems Google has, for example) or if you have some existing C codebase that you cannot port to .NET immediately and have to interoperate with.
In other cases, managed code is safer. If you are obsessive about performance, don't forget that in many cases the cost of marshaling between managed and unmanaged code will be higher than the performance you would gain from pure unmanaged code. Also, the JITter gets smarter with each version of the framework, and maybe one day it will generate code almost as efficient as unmanaged code.
So, yeah, go for managed and leave C to the maniacs :-)
I would not be so sure that C will be faster than C# for what you are going to do. The JITted code is extremely fast. Also, if you are doing lots of calculations, by using C#/.NET you could potentially take advantage of the Task Parallel Library, which greatly eases parallelizing your calculations. And really, with all of the multi-core machines out there, I think you will get a lot more bang for your buck if you can take advantage of multiple cores (of course, straight C can thread as well, but the TPL in .NET 4 is much more productive than the raw Win32 API).
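As a rough illustration of that last point, a CPU-bound loop over independent items can often be spread across cores with a one-line change and no C at all (the Work method here is just a stand-in for your real calculation):

    using System;
    using System.Diagnostics;
    using System.Threading.Tasks;

    class TplSketch
    {
        // Stand-in for an expensive, independent computation on one item.
        static double Work(int i)
        {
            double acc = 0;
            for (int k = 1; k < 100000; k++)
                acc += Math.Sqrt(i * k);
            return acc;
        }

        static void Main()
        {
            var results = new double[1000];
            var sw = Stopwatch.StartNew();

            // Sequential version:
            //   for (int i = 0; i < results.Length; i++) results[i] = Work(i);
            // Parallel version -- same loop body, spread over the available cores:
            Parallel.For(0, results.Length, i => results[i] = Work(i));

            Console.WriteLine("Elapsed: " + sw.Elapsed);
        }
    }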
One other thing that you might consider, and I am speaking from outside of my own experience, so definitely do your own research: I have heard that F# can be a very good choice for writing scientific sets/calculation libraries. I haven't written a line of F# myself, but from what I understand, the language design supports writing the calculations in a way that makes them extremely parallelizable.
(of course, now you have a new problem -- learning F# ;)
There is a post here that you might check out: F# performance in scientific computing
Pinning memory locations with fixed and converting your indexes to raw C-style pointers can have negative side effects. You can certainly try both safe and unsafe code and compare the results, but my guess is that the safe code will perform better in the end. Of course, the nature of your computation is also an important factor.
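For reference, this is the kind of safe-versus-unsafe comparison being described; which one wins is something you would have to measure for your own workload (the project must be compiled with /unsafe for the second method):

    static class SumComparison
    {
        static double SumSafe(double[] data)
        {
            double sum = 0;
            for (int i = 0; i < data.Length; i++)   // bounds-checked, but JIT-friendly
                sum += data[i];
            return sum;
        }

        static unsafe double SumUnsafe(double[] data)
        {
            double sum = 0;
            fixed (double* p = data)                // pin the array, take a raw pointer
            {
                for (int i = 0; i < data.Length; i++)
                    sum += p[i];                    // no bounds check, but pinning has costs
            }
            return sum;
        }
    }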
I have done this in the past. As others mentioned, the cost of marshalling is not free, so you need a pretty specialised requirement before this is a valid option.
My only advice would be to start in the most suitable language, which would be C# if you want managed code and the other features C# provides. Then, if performance is an issue, do some profiling and optimise carefully, adding code in C only after weighing the pros and cons and considering whether you can get the same gain by reworking the high-level algorithm rather than optimising at a lower level.
For example, if you were writing scientific code, you would make your code multithreadable, or check whether you can reuse previous computations, before attempting other kinds of lower-level optimisation; this will have a much larger impact on efficiency than chasing tiny problems.

Multi Threading [closed]

I'm learning Multi Threading at the moment, in C#, but as with all learning I like to learn best practices. At the moment the area seems fuzzy. I understand the basics, and I can create threads.
What should I look out for when creating multi-threaded applications? Are there any set rules or best practices that I should know about? Or anything to remember in order to avoid slip-ups down the line?
Thanks for the responses.
In addition to the MSDN Best Practices, I'll add:
Don't make your own threads. Prefer to use the ThreadPool (or the new Task Parallel Library Tasks). Managing your own thread is rarely, if ever, the correct design decision.
Take extra care with UI-related issues: use Control.Invoke (Windows Forms) or Dispatcher.Invoke (WPF), or use SynchronizationContext.Current with Post/Send.
Favor using the BackgroundWorker class when appropriate.
Try to keep synchronization via locks to a minimum
Make sure to synchronize everything that requires synchronization
Favor the methods in the Interlocked class when possible over locking
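A small sketch of two of those points side by side: queueing work as Tasks rather than creating raw threads, and using Interlocked for a simple shared counter instead of a lock:

    using System;
    using System.Threading;
    using System.Threading.Tasks;

    class ThreadingBasics
    {
        static int completed;   // shared counter, updated from many tasks

        static void Main()
        {
            var tasks = new Task[10];
            for (int i = 0; i < tasks.Length; i++)
            {
                tasks[i] = Task.Factory.StartNew(() =>
                {
                    // ... do some real work here ...
                    Interlocked.Increment(ref completed);   // no lock needed for a counter
                });
            }

            Task.WaitAll(tasks);
            Console.WriteLine("Completed: " + completed);
        }
    }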
Once you get more advanced, and are trying to optimize, other things to look for:
Watch out for false sharing. This is especially problematic when working with arrays, since every write to any element of an array includes a bounds check in .NET, which in effect causes an access near element 0 of the array (just prior to element 0 in memory). This can cause performance to go downhill dramatically.
Beware of closure issues, especially when working in loops. Nasty bugs can occur if you're closing over a variable in the wrong scope when making a delegate (see the sketch below).
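The closure point bites almost everyone at least once; a minimal sketch of the classic loop-variable capture bug and its fix:

    using System;
    using System.Threading.Tasks;

    class ClosureGotcha
    {
        static void Main()
        {
            var tasks = new Task[5];

            for (int i = 0; i < 5; i++)
            {
                // BUG: every delegate captures the same variable 'i', so by the
                // time the tasks actually run they may all print 5 (or a mix).
                // tasks[i] = Task.Factory.StartNew(() => Console.WriteLine(i));

                // FIX: copy the loop variable into a local scoped to the loop body,
                // so each delegate captures its own value.
                int copy = i;
                tasks[i] = Task.Factory.StartNew(() => Console.WriteLine(copy));
            }

            Task.WaitAll(tasks);
        }
    }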
MSDN - Managed Threading Best Practices
That MSDN Article does a really good job at touching on the danger areas and giving the best practices for managing/working around those areas.
Since I have this open on another tab...
http://www.yoda.arachsys.com/csharp/threads/
I'm only a couple of chapters in, but it was written by Jon Skeet.
Are there things you should know about?
Certainly.
Dangers:
Race Conditions
Deadlocks
A useful and interesting article can be found here.
I highly recommend that you start by digesting this: http://www.albahari.com/threading/.

Automatic generation of Unit test cases for .NET and Java [closed]

Is there a good tool that, given a .NET or Java project, generates unit test cases achieving close to 100% code coverage? The number of test cases could be directly proportional to the cyclomatic complexity of the code (the deeper the nesting of loops and conditions, the higher the cyclomatic complexity), so that more complex code gets a larger set of generated test cases. I'm not expecting the output to be fully functional (I would refine the generated tests and then run them), but it could follow a template style where you modify each case to suit your intended needs. It should also produce proper setup and teardown methods, and be smart enough to detect whether mock objects should be used for any dependencies. Does such a tool exist?
For .NET, Microsoft has Pex which will hopefully go mainstream for .NET 4.0, along with Code Contracts. I highly recommend watching the Channel 9 video.
It strikes me that this sort of thing is very good for very data-driven classes - parsers etc. I can't see that I'd very often start off with it, but a useful tool to have in your armoury nonetheless.
For C# (or .NET in general), PEX might be that tool. It works at the IL level, and attempts to force its way into every branch. It has successfully uncovered a wide range of bugs (in the BCL etc).
Although it seems counter-intuitive, you may also be interested in random test generation frameworks. Research has shown that they can be just as effective at finding bugs as the systematic, coverage-based approaches you suggest.
Check out Randoop both for .NET and Java. It works by generating a more or less random sequence of method calls, and checks contracts, crashes etc. It is fully automatic.
You may also want to check out other random testing tools based on QuickCheck, e.g. for Java, Scala, and F#, that are more similar to Pex: you give a specification, or parametrized unit test, and the tool checks it for a number of generated input arguments.
I've found that this "parametrized" way of writing unit tests is actually a lot more natural in at least 60% of the cases, and finds lots more bugs.
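To make "parametrized unit test" concrete, the idea is roughly the following (the attribute or entry point differs per tool, so none is shown here; the Sort method is just a stand-in for your code under test). The test states a property that must hold for all inputs, and the tool generates the inputs:

    using System;

    public static class SortProperties
    {
        // Stand-in for the real method under test.
        public static int[] Sort(int[] input)
        {
            var copy = (int[])input.Clone();
            Array.Sort(copy);
            return copy;
        }

        // A Pex- or QuickCheck-style tool would call this with many generated arrays.
        public static void SortedOutputIsOrdered(int[] input)
        {
            int[] output = Sort(input);
            for (int i = 1; i < output.Length; i++)
                if (output[i - 1] > output[i])
                    throw new Exception("Output is not sorted for a generated input.");
        }
    }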
For Java, you can check EvoSuite, which is open source and currently active (disclaimer: I am one of its contributors). Also see this related question for a list of more tools.
For Java, try JUnit-Tools. It has its own Eclipse plugin, along with good documentation.
