I am given three arrays of equal length. Each array holds the distances to attractions of one type on a road trip I am taking (i.e. the first array is only theme parks, the second is only museums, the third is only beaches). I want to determine all possible trips that stop at one attraction of each type, never drive backwards, and never visit the same attraction twice.
For example, if I have the following three arrays:
[29 50]
[61 37]
[37 70]
The function would return 3 because the possible combinations would be: (29,61,70)(29,37,70)(50,61,70)
What I've got so far:
public int test(int[] A, int[] B, int[] C) {
    int firstStop = 0;
    int secondStop = 0;
    int thirdStop = 0;
    List<List<int>> possibleCombinations = new List<List<int>>();
    for(int i = 0; i < A.Length; i++)
    {
        firstStop = A[i];
        for(int j = 0; j < B.Length; j++)
        {
            if(firstStop < B[j])
            {
                secondStop = B[j];
                for(int k = 0; k < C.Length; k++)
                {
                    if(secondStop < C[k])
                    {
                        thirdStop = C[k];
                        possibleCombinations.Add(new List<int>{firstStop, secondStop, thirdStop});
                    }
                }
            }
        }
    }
    return possibleCombinations.Count();
}
Results for the following test cases:
Example test: ([29, 50], [61, 37], [37, 70])
OK Returns 3
Example test: ([5], [5], [5])
OK Returns 0
Example test: ([61, 62], [37, 38], [29, 30])
FAIL Returns 0
What is the correct algorithm for this?
What is the best-performing algorithm?
How can I determine this algorithm's time complexity (i.e. is it O(N*log(N)))?
UPDATE: The question has been rewritten with new details and still is completely unclear and self-contradictory; attempts to clarify the problem with the original poster have been unsuccessful, and the original poster admits to having started coding before understanding the problem themselves. The solution below is correct for the problem as it was originally stated; what the solution to the real problem looks like, no one can say, because no one can say what the real problem is. I'll leave this here for historical purposes.
Let's re-state the problem:
We are given three arrays of distances to attractions along a road.
We wish to enumerate all sequences of possible stops at attractions that do not backtrack. (NOTE: The statement of the problem is to enumerate them; the wrong algorithm given counts them. These are completely different problems. Counting them can be extremely fast. Enumerating them is extremely slow! If the problem is to count them then clarify the problem.)
No other constraints are given in the problem. (For example, it is not given in the problem that we stop at no more than one beach, or that we must stop at one of every kind, or that we must go to a beach before we go to a museum. If those are constraints then they must be stated in the problem)
Suppose there are a total of n attractions. For each attraction either we visit it or we do not. It might seem that there are 2^n possibilities. However, there's a problem. Suppose we have two museums, M1 and M2 both 5 km down the road. The possible routes are:
(Start, End) -- visit no attractions on your road trip
(Start, M1, End)
(Start, M2, End)
(Start, M1, M2, End)
(Start, M2, M1, End)
There are five non-backtracking possibilities, not four.
The algorithm you want is:
Partition the attractions by distance, so that all the partitions contain the attractions that are at the same distance.
For each partition, generate a set of all the possible orderings of all the subsets within that partition. Do not forget that "skip all of them" is a possible ordering.
The combinations you want are the Cartesian product of all the partition ordering sets.
That should give you enough hints to make progress. You have several problems to solve here: partitioning, permuting within a partition, and then taking the cross product of arbitrarily many sets. I and many others have written articles on all of these subjects, so do some research if you do not know how to solve these sub-problems yourself.
As for the asymptotic performance: As noted above, the problem given is to enumerate the solutions. The best possible case is, as noted before, 2^n for cases where there are no attractions at the same distance, so we are at least exponential. If there are collisions then it becomes a product of many factorials; I leave it to you to work it out, but it's big.
Again: if the problem is to work out the number of solutions, that's much easier. You don't have to enumerate them to know how many solutions there are! Just figure out the number of orderings at each partition and then multiply all the counts together. I leave figuring out the asymptotic performance of partitioning, working out the number of orderings, and multiplying them together as an exercise.
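To make the counting shortcut concrete, here is a minimal sketch under the restated problem (any subset of attractions may be visited, no backtracking, and attractions at the same distance may be visited in any order). The class and method names are hypothetical. A partition holding m attractions contributes the sum over k = 0..m of m!/(m-k)! orderings, and the total is the product over all partitions.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class RouteCounter
{
    // Number of ordered selections of any size from m distinct items:
    // sum over k = 0..m of m!/(m-k)!, including the empty selection.
    public static BigInteger OrderingsOfSubsets(int m)
    {
        BigInteger total = 1; // k = 0: skip every attraction in the partition
        BigInteger term = 1;  // running value of m!/(m-k)!
        for (int k = 1; k <= m; k++)
        {
            term *= (m - k + 1);
            total += term;
        }
        return total;
    }

    // Partition by distance, then multiply the per-partition counts.
    public static BigInteger CountRoutes(IEnumerable<int> distances)
    {
        return distances
            .GroupBy(d => d)
            .Aggregate(BigInteger.One,
                       (acc, group) => acc * OrderingsOfSubsets(group.Count()));
    }
}
```

For the two museums at 5 km, CountRoutes(new[] { 5, 5 }) gives 5, matching the list above; for three attractions at distinct distances it gives 2^3 = 8.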
Your solution runs in O(n^3). But if you need to generate all possible combinations and the distances are sorted both row-wise and column-wise, i.e.
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
every solution degrades to O(n^3), because all n^3 combinations are valid and each one has to be produced.
If the input is large and the distances are relatively far apart, then a sort + binary search + recursive solution might be faster.
static List<List<int>> answer = new List<List<int>>();

// Each row of distances must already be sorted in ascending order.
static void findPaths(List<List<int>> distances, List<int> path, int rowIndex = 0, int previousValue = -1)
{
    if(rowIndex == distances.Count)
    {
        answer.Add(path);
        return;
    }
    previousValue = previousValue == -1 ? distances[0][0] : previousValue;
    int startIndex = distances[rowIndex].BinarySearch(previousValue);
    // A negative result is the bitwise complement of the insertion point.
    startIndex = startIndex < 0 ? Math.Abs(startIndex) - 1 : startIndex;
    // No further destination can be added
    if (startIndex == distances[rowIndex].Count)
        return;
    for(int i=startIndex; i < distances[rowIndex].Count; ++i)
    {
        var temp = new List<int>(path);
        int currentValue = distances[rowIndex][i];
        temp.Add(currentValue);
        findPaths(distances, temp, rowIndex + 1, currentValue);
    }
}
Most of the savings in this solution come from the fact that, since the data is already sorted, we need not look at stops in the next row whose distance is less than the previous value.
For smaller and more closely spaced distances this might be overkill: the additional sorting and binary search overhead can make it slower than the straightforward brute-force approach.
Ultimately I think this comes down to what your data looks like; try both approaches and measure which one is faster for you.
Note: This solution does not assume strictly increasing distances, i.e. [29, 37, 37] is valid here. If you do not want such solutions, you'll have to change the binary search to find an upper bound rather than a lower bound.
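If only the count is needed (which is what the asker's function returns), the same sort + binary search idea avoids enumeration entirely. The following is a minimal sketch, assuming the stops must be taken in the fixed order A, B, C with strictly increasing distances, as in the asker's code; the class and helper names are hypothetical. For each middle stop b it multiplies the number of A values below b by the number of C values above b, for O(n log n) overall.

```csharp
using System;

public static class TripCounter
{
    public static long CountTrips(int[] a, int[] b, int[] c)
    {
        // Sort copies so the caller's arrays are left untouched.
        int[] sortedA = (int[])a.Clone();
        int[] sortedC = (int[])c.Clone();
        Array.Sort(sortedA);
        Array.Sort(sortedC);

        long total = 0;
        foreach (int stop in b)
        {
            long before = LowerBound(sortedA, stop);                 // count of a < stop
            long after = sortedC.Length - UpperBound(sortedC, stop); // count of c > stop
            total += before * after;
        }
        return total;
    }

    // First index whose element is >= value (sorted.Length if none).
    private static int LowerBound(int[] sorted, int value)
    {
        int lo = 0, hi = sorted.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] < value) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // First index whose element is > value (sorted.Length if none).
    private static int UpperBound(int[] sorted, int value)
    {
        int lo = 0, hi = sorted.Length;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] <= value) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

Under these assumptions the three test cases from the question give 3, 0 and 0, the same results as the triple loop.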
Use dynamic programming with state. As there are only 3 arrays, there are only 2*2*2 = 8 states.
Combine the arrays and sort the result: [29, 37, 37, 50, 61, 70]. Then make a 2D array dp[0..6][0..7]. The 8 states are bitmasks:
001 means we have chosen 1st array.
010 means we have chosen 2nd array.
011 means we have chosen 1st and 2nd array.
.....
111 means we have chosen 1st, 2nd, 3rd array.
The complexity is O(n*8) = O(n) once the combined array is sorted (the sort itself is O(n log n)).
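A minimal sketch of this bitmask idea follows. It rests on assumptions that the question leaves open: the attraction types may be visited in any order, and distances along a trip must be strictly increasing, so stops at the same distance are processed as one group against a snapshot of the table. It also keeps a single rolling row of 8 states rather than the full 2D array. The class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnyOrderTripCounter
{
    public static long CountTrips(params int[][] arrays)
    {
        int fullMask = (1 << arrays.Length) - 1;

        // Tag every distance with the index of the array (type) it came from.
        var stops = arrays
            .SelectMany((arr, type) => arr.Select(d => (Distance: d, Type: type)))
            .OrderBy(s => s.Distance)
            .ToList();

        // dp[mask] = number of partial trips that have visited exactly
        // the types in mask, using only the stops processed so far.
        var dp = new long[fullMask + 1];
        dp[0] = 1;

        int i = 0;
        while (i < stops.Count)
        {
            // Find the group of stops sharing the same distance.
            int j = i;
            while (j < stops.Count && stops[j].Distance == stops[i].Distance) j++;

            // Extend only trips that ended strictly before this distance.
            var before = (long[])dp.Clone();
            for (int s = i; s < j; s++)
            {
                int bit = 1 << stops[s].Type;
                for (int mask = 0; mask <= fullMask; mask++)
                    if ((mask & bit) == 0)
                        dp[mask | bit] += before[mask];
            }
            i = j;
        }
        return dp[fullMask];
    }
}
```

Note that under the any-order assumption the first example from the question yields 6 rather than 3, and the third example yields 8, so whether this is the intended count depends on how the problem is meant to be read.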
I have three nested loops from zero to n, where n is a large number, around 12000. The three loops work on a 2D list; it is actually the Floyd algorithm. With data this large it takes a long time. Could you advise me how to improve it? Thank you (sorry for my English :) )
List<List<int>> distance = new List<List<int>>();
...
for (int i = 0; i < n; i++)
    for (int v = 0; v < n; v++)
        for (int w = 0; w < n; w++)
        {
            if (distance[v][i] != int.MaxValue &&
                distance[i][w] != int.MaxValue)
            {
                int d = distance[v][i] + distance[i][w];
                if (distance[v][w] > d)
                    distance[v][w] = d;
            }
        }
The first part of your if statement distance[v][i] != int.MaxValue can be moved outside of the iteration over w to reduce overhead in some cases. However, I have no idea how often your values are at int.MaxValue
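As a rough illustration of that hoisting, here is a minimal sketch. It assumes the data has been copied into a jagged int array, that int.MaxValue marks "no path", and that edge weights are non-negative; the class and constant names are hypothetical. The sum is computed as a long so that two large finite distances cannot overflow.

```csharp
public static class ShortestPaths
{
    // Assumed sentinel for "no path", as in the question.
    public const int NoEdge = int.MaxValue;

    public static void FloydWarshall(int[][] distance)
    {
        int n = distance.Length;
        for (int i = 0; i < n; i++)
        {
            int[] rowI = distance[i];
            for (int v = 0; v < n; v++)
            {
                int[] rowV = distance[v];
                int dvi = rowV[i];
                // Hoisted check: if v cannot reach i, no w can improve via i.
                if (dvi == NoEdge) continue;

                for (int w = 0; w < n; w++)
                {
                    int diw = rowI[w];
                    if (diw == NoEdge) continue;

                    long d = (long)dvi + diw; // long avoids int overflow
                    if (d < rowV[w]) rowV[w] = (int)d;
                }
            }
        }
    }
}
```

How much this helps depends on how sparse the matrix is; it does not change the O(n^3) worst case.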
You cannot change Floyd’s algorithm; its complexity is fixed at O(n^3), and no substantially faster algorithm is known for the general problem of finding all pairwise shortest path distances in a graph with negative edge weights.
You can only improve the runtime by making the problem more specific or the data set smaller. For a general solution you’re stuck with what you have.
Normally I would suggest using Parallel Linq - for example the Ray Tracer example, however this assumes that the items you're operating on are independent. In your example you are using results from a previous iteration, in the current one, making it impossible to parallelize.
As your code is quite simple and there isn't really any overhead, there's not really anything you can do to speed that up. As mentioned you could switch the Lists to arrays. You might also want to compare Double arithmetic to Integer arithmetic on your target machine.
After a quick look at your code, it seems that you might be heading for an overflow, as the condition check cannot block it.
In your code, the condition below does not guard against overflow, since we can have distance[v][i] < int.MaxValue and distance[i][w] < int.MaxValue but distance[v][i] + distance[i][w] > int.MaxValue.
if (distance[v][i] != int.MaxValue && distance[i][w] != int.MaxValue)
As the others have mentioned, the complexity is fixed so you don't exactly have many options there. However, you can try the following:
Use arrays instead of lists, if possible.
Use an "unsafe" block with pointer semantics; this should decrease the time required to access your array data.
Check if you can parallelize your algorithm. In your case you could use multiple copies of your data (multiple copies to get rid of the need for synchronisation) and have several threads work on it, e.g. by splitting the range of the outer loop into subranges (1-1000, 1001-2000, etc.).
I've had quite a bit of experience with programming (three semesters teaching VBasic, C++, and Java), and now I'm in college and I'm taking a C# class, which is quite boring (the teacher knows less than I do).
Anyways, for one of our exercises, we're creating a number guessing/lottery game. It works kind of like this:
User inputs three integers from 1-4 and clicks submit (I have them storing into an array)
Program generates three numbers from 1-4 (also in an array)
Function that checks matching runs and checks the two arrays
If all three match in order (i.e. 1,2,3 = 1,2,3 and NOT 1,2,3 = 1,3,2), matching = 4
If all three match NOT in order, matching = 3
If only two match, matching = 2
I want to make sure that only one match counts as one (i.e. [1,1,2][1,2,3] only gives one match to the user).
If only one matches, matching = 1
If no matches, matching stays at 0 (it's instantiated at submit_click)
I've got all of the code and GUI working except for the matching logic. I know I could do it with a LARGE amount of if statements, and I know cases would probably work, but I'm not as experienced with cases.
I'm not expecting my 'homework' to be done here, but I just want to know what method would be most effective to get this to correctly work (if it's easier to exclude the one match per item, then that's fine), and to possibly see some working code.
Thanks!
EDIT
I apologize if I come across as arrogant, I didn't mean to come across as a know-it-all (I definitely do not).
I have NOT taught classes; I've just taken classes from a teacher who's primarily a programming teacher, and now I'm at a community college where my professor isn't primarily a programming teacher.
I didn't take time to write a ton of if statements because I know that it would just get shot down as ineffective. I currently don't have the resources to test the answers, but as soon as I can I'll check them out and post back.
Again, I apologize for coming across as rude and arrogant, and I appreciate your answers more than you know.
Thanks again!
You can use a loop to achieve this functionality. I've used a list simply for ease of use, performing remove operations and the like. Something like this should work:
public static int getNumberOfMatches(List<int> userGuesses, List<int> machineGuesses) {
    // Determine list equality.
    bool matchedAll = true;
    for (int i = 0; i < userGuesses.Count; i++) {
        if (userGuesses[i] != machineGuesses[i]) {
            matchedAll = false;
            break;
        }
    }
    // The lists were equal; return numberOfGuesses + 1 [which equals 4 in this case].
    if (matchedAll) {
        return userGuesses.Count + 1;
    }
    // Work on a copy so the caller's machineGuesses list is not modified.
    var unmatched = new List<int>(machineGuesses);
    // Remove each matched guess from the copy; Remove does nothing if the value is absent.
    foreach (int userGuess in userGuesses) {
        unmatched.Remove(userGuess);
    }
    // Determine number of matches made.
    return machineGuesses.Count - unmatched.Count;
}
I think for the first case, for all matches in order, you would scan the arrays together and maybe increment a counter. Since you mentioned you know C++, this would be
int userGuesses[3];
int randomGen[3];
int matches = 0;
for(int i=0; i < 3; i++) if(userGuesses[i] == randomGen[i]) matches++;
if(matches == 3) //set highest score here.
if(matches == 2) // next score for ordered matches etc.
For the not-in-order case, you will need to look up each user guess in the generated array to see if it has that value.
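Since the class is in C#, the lookup for the not-in-order case could be sketched as below. This is only one possible approach: it assumes both arrays have the same length and hold values from 1 to 4, and the class and method names are hypothetical. It counts how often each number was drawn, so that each drawn number can be matched at most once.

```csharp
public static class Lottery
{
    // Returns 4 for an exact in-order match of three numbers,
    // otherwise the number of guesses found among the drawn numbers.
    public static int Score(int[] guesses, int[] drawn)
    {
        bool sameOrder = true;
        var remaining = new int[5]; // index = drawn number (1..4), value = times drawn

        for (int i = 0; i < drawn.Length; i++)
        {
            if (guesses[i] != drawn[i]) sameOrder = false;
            remaining[drawn[i]]++;
        }
        if (sameOrder) return drawn.Length + 1;

        int matches = 0;
        foreach (int guess in guesses)
        {
            // Use up one drawn number per matching guess.
            if (remaining[guess] > 0)
            {
                remaining[guess]--;
                matches++;
            }
        }
        return matches;
    }
}
```

With this sketch, Score(new[] { 1, 2, 3 }, new[] { 1, 3, 2 }) is 3, and Score(new[] { 1, 1, 2 }, new[] { 1, 2, 3 }) is 2 because the second 1 has nothing left to match.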
I have a program that needs to repeatedly compute the approximate percentile (order statistic) of a dataset in order to remove outliers before further processing. I'm currently doing so by sorting the array of values and picking the appropriate element; this is doable, but it's a noticeable blip on the profiles despite being a fairly minor part of the program.
More info:
The data set contains on the order of up to 100000 floating point numbers, and is assumed to be "reasonably" distributed: there are unlikely to be duplicates or huge spikes in density near particular values, and if for some odd reason the distribution is odd, it's OK for an approximation to be less accurate since the data is probably messed up anyhow and further processing dubious. However, the data isn't necessarily uniformly or normally distributed; it's just very unlikely to be degenerate.
An approximate solution would be fine, but I do need to understand how the approximation introduces error to ensure it's valid.
Since the aim is to remove outliers, I'm computing two percentiles over the same data at all times: e.g. one at 95% and one at 5%.
The app is in C# with bits of heavy lifting in C++; pseudocode or a preexisting library in either would be fine.
An entirely different way of removing outliers would be fine too, as long as it's reasonable.
Update: It seems I'm looking for an approximate selection algorithm.
Although this is all done in a loop, the data is (slightly) different every time, so it's not easy to reuse a datastructure as was done for this question.
Implemented Solution
Using the Wikipedia selection algorithm as suggested by Gronim reduced this part of the run-time by about a factor of 20.
Since I couldn't find a C# implementation, here's what I came up with. It's faster even for small inputs than Array.Sort; and at 1000 elements it's 25 times faster.
public static double QuickSelect(double[] list, int k) {
    return QuickSelect(list, k, 0, list.Length);
}
public static double QuickSelect(double[] list, int k, int startI, int endI) {
    while (true) {
        // Assume startI <= k < endI
        int pivotI = (startI + endI) / 2; //arbitrary, but good if sorted
        int splitI = partition(list, startI, endI, pivotI);
        if (k < splitI)
            endI = splitI;
        else if (k > splitI)
            startI = splitI + 1;
        else //if (k == splitI)
            return list[k];
    }
    //when this returns, all elements of list[i] <= list[k] iff i <= k
}
static int partition(double[] list, int startI, int endI, int pivotI) {
    double pivotValue = list[pivotI];
    list[pivotI] = list[startI];
    list[startI] = pivotValue;
    int storeI = startI + 1;//no need to store # pivot item, it's good already.
    //Invariant: startI < storeI <= endI
    while (storeI < endI && list[storeI] <= pivotValue) ++storeI; //fast if sorted
    //now storeI == endI || list[storeI] > pivotValue
    //so elem #storeI is either irrelevant or too large.
    for (int i = storeI + 1; i < endI; ++i)
        if (list[i] <= pivotValue) {
            list.swap_elems(i, storeI);
            ++storeI;
        }
    int newPivotI = storeI - 1;
    list[startI] = list[newPivotI];
    list[newPivotI] = pivotValue;
    //now [startI, newPivotI] are <= to pivotValue && list[newPivotI] == pivotValue.
    return newPivotI;
}
static void swap_elems(this double[] list, int i, int j) {
    double tmp = list[i];
    list[i] = list[j];
    list[j] = tmp;
}
Thanks, Gronim, for pointing me in the right direction!
The histogram solution from Henrik will work. You can also use a selection algorithm to efficiently find the k largest or smallest elements in an array of n elements in O(n). To use this for the 95th percentile set k=0.05n and find the k largest elements.
Reference:
http://en.wikipedia.org/wiki/Selection_algorithm#Selecting_k_smallest_or_largest_elements
According to its creator, a SoftHeap can be used to "compute exact or approximate medians and percentiles optimally. It is also useful for approximate sorting..."
I used to identify outliers by calculating the standard deviation. Everything more than 2 (or 3) standard deviations away from the average is an outlier. 2 standard deviations covers about 95% of normally distributed data.
Since you are calculating the average anyway, it is also easy and fast to calculate the standard deviation.
You could also use only a subset of your data to calculate the numbers.
You could estimate your percentiles from just a part of your dataset, like the first few thousand points.
The Glivenko–Cantelli theorem ensures that this would be a fairly good estimate, if you can assume your data points to be independent.
Divide the interval between minimum and maximum of your data into (say) 1000 bins and calculate a histogram. Then build partial sums and see where they first exceed 5000 or 95000 (that is, 5% and 95% of 100000 data points).
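A minimal sketch of this histogram approach is shown below; the class name, the parameter names and the choice to return the midpoint of the bin are assumptions for illustration. Because the element at the requested rank falls into the returned bin, the estimate should be off from that element by no more than about one bin width, i.e. (max - min) / binCount, which gives a handle on the error the question asks about.

```csharp
using System;

public static class HistogramPercentile
{
    // fraction is in [0, 1], e.g. 0.05 or 0.95.
    public static double Estimate(double[] data, double fraction, int binCount = 1000)
    {
        // Pass 1: find the range.
        double min = double.MaxValue, max = double.MinValue;
        foreach (double x in data)
        {
            if (x < min) min = x;
            if (x > max) max = x;
        }
        if (max == min) return min; // all values identical

        // Pass 2: fill the histogram.
        double width = (max - min) / binCount;
        var counts = new int[binCount];
        foreach (double x in data)
        {
            int bin = Math.Min((int)((x - min) / width), binCount - 1);
            counts[bin]++;
        }

        // Partial sums: stop at the first bin where the running count
        // reaches the requested share of the data.
        double target = fraction * data.Length;
        long running = 0;
        for (int b = 0; b < binCount; b++)
        {
            running += counts[b];
            if (running >= target) return min + (b + 0.5) * width;
        }
        return max;
    }
}
```

Both percentiles can share the same histogram if the two passes are factored out; they are kept in one method here for brevity.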
There are a couple basic approaches I can think of. First is to compute the range (by finding the highest and lowest values), project each element to a percentile ((x - min) / range) and throw out any that evaluate to lower than .05 or higher than .95.
The second is to compute the mean and standard deviation. A span of 2 standard deviations from the mean (in both directions) will enclose about 95% of a normally-distributed sample space, meaning your outliers would be in the <2.5 and >97.5 percentiles. Calculating the mean of a series is linear, as is the standard deviation (the square root of the mean of the squared differences between each element and the mean). Then, subtract 2 sigmas from the mean, and add 2 sigmas to the mean, and you've got your outlier limits.
Both of these will compute in roughly linear time; the first one requires two passes, the second one takes three (once you have your limits you still have to discard the outliers). Since this is a list-based operation, I do not think you will find anything with logarithmic or constant complexity; any further performance gains would require either optimizing the iteration and calculation, or introducing error by performing the calculations on a sub-sample (such as every third element).
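The second approach could be sketched roughly as follows. The class and method names are hypothetical, and the sketch uses the population standard deviation (dividing by n); whether the resulting limits correspond to particular percentiles depends on the data actually being close to normally distributed.

```csharp
using System;

public static class SigmaLimits
{
    // Returns the limits mean - sigmas * stdDev and mean + sigmas * stdDev.
    public static (double Lower, double Upper) Compute(double[] data, double sigmas = 2.0)
    {
        // Pass 1: mean.
        double sum = 0;
        foreach (double x in data) sum += x;
        double mean = sum / data.Length;

        // Pass 2: population standard deviation.
        double sumOfSquares = 0;
        foreach (double x in data)
        {
            double diff = x - mean;
            sumOfSquares += diff * diff;
        }
        double stdDev = Math.Sqrt(sumOfSquares / data.Length);

        return (mean - sigmas * stdDev, mean + sigmas * stdDev);
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 the mean is 5 and the standard deviation is 2, so the 2-sigma limits are 1 and 9; a third pass then discards everything outside them.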
A good general answer to your problem seems to be RANSAC.
Given a model, and some noisy data, the algorithm efficiently recovers the parameters of the model.
You will have to choose a simple model that can map your data. Anything smooth should be fine, say a mixture of a few Gaussians. RANSAC will set the parameters of your model and estimate a set of inliers at the same time. Then throw away whatever doesn't fit the model properly.
You could filter out anything beyond 2 or 3 standard deviations even if the data is not normally distributed; at least it will be done in a consistent manner, which should be important.
As you remove the outliers, the std dev will change; you could do this in a loop until the change in std dev is minimal. Whether or not you want to do this depends upon why you are manipulating the data this way. Some statisticians have major reservations about removing outliers, but some remove the outliers to prove that the data is fairly normally distributed.
Not an expert, but my memory suggests:
to determine percentile points exactly you need to sort and count
taking a sample from the data and calculating the percentile values sounds like a good plan for decent approximation if you can get a good sample
if not, as suggested by Henrik, you can avoid the full sort if you do the buckets and count them
One set of data of 100k elements takes almost no time to sort, so I assume you have to do this repeatedly. If the data set is the same set just updated slightly, you're best off building a tree (O(N log N)) and then removing and adding new points as they come in (O(K log N) where K is the number of points changed). Otherwise, the kth largest element solution already mentioned gives you O(N) for each dataset.
I'm new to C#, and I would like to write a program that displays prime numbers in a listbox when the user enters an integer in a textbox (that is, if they enter 10, it displays the prime numbers from 0 to 10; if 20, from 0 to 20, etc.).
What should I consider first, before I start programming?
I know there are many examples on the internet, but first I would like to know what I will need.
Thanks for the tip;-)
===
Thanks guys. So you're suggesting that it's better to do it first in the Console application?
I did a very simple "for loop" example in a console application, but when I tried to do it in a Windows Forms application, I wasn't sure how to implement it.
I'm afraid that if I keep doing examples in the console, I'll have difficulty doing them in Windows Forms apps.
What do you think?
======
Hello again,
I need some feedback with my code:
Console.WriteLine("Please enter your integer: ");
long yourInteger;
yourInteger = Int32.Parse(Console.ReadLine());
//displaying the first prime number and comparing it to the given integer
for (long i = 2; i <= yourInteger; i = i + 1)
{
    //Controls i if its prime number or not
    if ((i % 2 != 0) || (i == 2))
    {
        Console.Write("{0} ", i);
    }
}
Well, first of all I'd think about how to find prime numbers, and write that in a console app that reads a line, does the math, and writes the results (purely because that is the simplest thing you can do, and covers the same parsing etc logic you'll need later).
When you are happy with the prime number generation, then look at how to do winforms - how to put a listbox, textbox and button on a form; how to handle the click event (of the button), and how to read from the textbox and write values into the listbox. Your prime code should be fairly OK to take "as is"...
If you don't already have an IDE, then note that C# Express is free and will cover all of the above.
You'll need to know:
How to read user input from a Windows application
How to generate prime numbers within a range
How to write output in the way that you want
I strongly suggest that you separate these tasks. Once you've got each of them working separately, you can put them together. (Marc suggests writing a console app for the prime number section - that's a good suggestion if you don't want to get into unit testing yet. If you've used unit testing in other languages, it's reasonably easy to get up and running with NUnit. A console app will certainly be quicker to get started with though.)
In theory, for a potentially long-running task (e.g. the user inputs 1000000 as the first number) you should usually use a background thread to keep the UI responsive. However, I would ignore that to start with. Be aware that while you're computing the primes, your application will appear to be "hung", but get it working at all first. Once you're confident with the simple version, you can look at BackgroundWorker and the like if you're feeling adventurous.
I discussed creating prime numbers using the Sieve of Eratosthenes on my blog here:
http://blogs.msdn.com/mpeck/archive/2009/03/03/Solving-Problems-in-CSharp-and-FSharp-Part-1.aspx
The code looks like this...
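// Static so it can be called as SieveOfEratosthenes.GetPrimes(...).
public static IEnumerable<long> GetPrimes(int max)
{
    // nonprimes[n] becomes true once n is known to be composite.
    var nonprimes = new bool[max + 1];
    for (long i = 2; i <= max; i++)
    {
        if (nonprimes[i] == false)
        {
            // Mark every multiple of i, starting at i * i, as composite.
            for (var j = i * i; j <= max; j += i)
            {
                nonprimes[j] = true;
            }
            yield return i;
        }
    }
}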
With this code you can write statements like this...
var primes = SieveOfEratosthenes.GetPrimes(2000);
... to get an IEnumerable of primes up to 2000.
All the code can be found on CodePlex at http://FSharpCSharp.codeplex.com.
The code is "as is" and so you should look at it to determine whether it suits your needs, whether you need to add error checking etc, so treat it as a sample.
Here's a great "naive" prime number algorithm, that would be perfect for your needs:
http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
Here is a response to the edit:
Thanks guys. So you're suggesting that it's better to do it first in the Console application? I did an example of "For Loop" using Console Application a very simple one, but then when I tried to do it in the Windows Form Application, I'm not sure how to implement it. I'm afraid that if I keep doing examples in the Console, then I'll have difficulty to do it in Windows Form Apps. What do you think?
If you want to present the prime numbers as a windows forms application then you need to design the user interface for it as well. That is a bit overkill for such a small problem to be solved. The easiest design you can do is to fill up a ListBox in your form (example).
If you're really keen on learning Windows Forms or WPF then there are several resources for this.
I was recently writing a routine to implement Sieve Of Eratosthenes and came across this thread. Just for the archives, here is my implementation:
static List<int> GetPrimeNumbers(int maxNumber)
{
    // seed the master list with 2
    var list = new List<int>() {2};
    // start at 3 and build the complete list
    var next = 3;
    while (next <= maxNumber)
    {
        // since even numbers > 2 are never prime, ignore evens
        if (next % 2 != 0)
            list.Add(next);
        next++;
    }
    // create copy of list to avoid reindexing
    var primes = new List<int>(list);
    // index starts at 1 since the 2's were never removed
    for (int i = 1; i < list.Count; i++)
    {
        var multiplier = list[i];
        // FindAll Lambda removes duplicate processing;
        // >= (not >) so that squares such as 9 = 3 * 3 are removed too
        list.FindAll(a => primes.Contains(a) && a >= multiplier)
            .ForEach(a => primes.Remove(a * multiplier));
    }
    return primes;
}
You could always seed it with "1, 2" if you needed 1 in your list of primes.
using System;
class demo
{
    static void Main()
    {
        Console.WriteLine("Enter the number to check whether it is prime or not");
        int number = Int32.Parse(Console.ReadLine());
        // Numbers below 2 are not prime.
        bool isPrime = number >= 2;
        // Trial division. The loop condition was lost in the original post;
        // i < number is assumed here.
        for (int i = 2; i < number; i++)
        {
            if (number % i == 0)
            {
                isPrime = false;
                break;
            }
        }
        if (isPrime)
        {
            Console.WriteLine("Entered Number is Prime");
        }
        else
        {
            Console.WriteLine("Entered number is not Prime");
        }
        Console.ReadLine();
    }
}
Your approach is entirely wrong. Prime numbers are absolute and will never change. Your best bet is to pre-generate a long list of prime numbers. Then come up with an algorithm to quickly look up that number to determine if it is on the list. Then in your case (since you want to list all in the given range just do so). This solution will be much faster than any prime number finding algorithm implemented during run-time. If the integer entered is greater than your list then you can always implement the algorithm starting at that point.