What kind of algorithm is this? I know pretty much nothing, but this is what I'm trying to do in code... I have a class 'Item' with properties int A and int B. I have multiple List<Item> lists with a random number of Items in each, inconsistent with any other list. I must choose 1 item from each list to get the highest possible value of the sum of Item.A, while ensuring that the sum of Item.B is at least a certain number. In the future there might also be another property, Item.C, with the constraint that its sum must be equal to a certain number. I have no idea how to write this :(
So to put it this way;
class Item
int A
int B
int C
I have 10 different List<Item> collections, each with a random number of Items inside
We must find exactly the best combination so that we have
a) Highest sum of Item.A
b) Constraint that the sum of Item.B must be higher than X
c) Constraint that the sum of Item.C must be equal to X
I have no idea how to code this to be fast and efficient. :(
As mentioned in my comment, this is a Binary Programming problem, which can be cast as a multi-dimensional Knapsack problem. I would first try to solve it with an off-the-shelf Mixed Integer Programming (MIP) solver like the one suggested by Lieven in one of his comments (lpSolve), given that you "only" have some 100-200 binary variables. You might have to play around a little with the parameters. Some MIP solvers allow you to add search heuristics, which might be helpful. Given your constraints, I must admit I don't have a feeling for how long a standard MIP solver will take, but I wouldn't hold my breath.
If a mixed-integer programming solver is not fast enough for you, you want to look at some more specialised algorithms. For your problem, the ones presented in Knapsack Problems, chapter 11.10 on the multiple-choice Knapsack problem (almost exactly your problem) and chapter 9 are relevant.
Edit: based on your comments, the good news is that your data ranges are pretty good and the problem seems solvable in a reasonable time. This paper (DOI in case the link vanishes) presents an algorithm that according to the authors solves problems of your size within seconds (see section 4.4 and 5.1). The bad news is that it contains a lot of math...
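If you would rather avoid a solver and your B and C sums stay in a small range (which your comments suggest), a plain dynamic program over the lists is a reasonable starting point. This is only a sketch under those assumptions; the names minB and targetC are mine, not from your post, and it assumes B and C are non-negative:

using System;
using System.Collections.Generic;
using System.Linq;

class Item { public int A; public int B; public int C; }

static class Chooser
{
    // Picks exactly one Item per list, maximising the sum of A,
    // subject to sum(B) >= minB and sum(C) == targetC.
    public static int? MaxSumA(List<List<Item>> lists, int minB, int targetC)
    {
        // State: (sum of B capped at minB, sum of C) -> best sum of A seen so far.
        var states = new Dictionary<(int b, int c), int> { [(0, 0)] = 0 };

        foreach (var list in lists)
        {
            var next = new Dictionary<(int b, int c), int>();
            foreach (var kv in states)
            foreach (var item in list)
            {
                int b = Math.Min(minB, kv.Key.b + item.B); // capping keeps the state space small
                int c = kv.Key.c + item.C;
                if (c > targetC) continue;                 // C can never come back down (non-negative)
                var key = (b, c);
                int a = kv.Value + item.A;
                if (!next.TryGetValue(key, out int best) || a > best)
                    next[key] = a;
            }
            states = next;
        }

        var feasible = states.Where(kv => kv.Key.b >= minB && kv.Key.c == targetC)
                             .Select(kv => kv.Value)
                             .ToList();
        return feasible.Count == 0 ? (int?)null : feasible.Max();
    }
}

If the B or C ranges are large, the state space blows up; that is exactly when the MIP solver or the specialised algorithm from the paper becomes worthwhile.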
I posted this question as an unregistered user and after clicking register, it didn't associate my unregistered user with my registered user, nice =/
In regards to the comment by van:
Typically there will be about 14 lists or so
Within each list there will usually be around 5-15 'Items'
Each item has those 3 properties.
We must choose exactly 1 item from each list.
We are looking for the maximum sum of PropertyA after choosing one item from each list.
The constraints are on PropertyB and PropertyC, which the chosen combination must conform to, once again using the sum of the values across the combination.
It must also be the optimal solution, not an approximation.
I want some logic for my requirement below.
I have a number, say 600. I have another set of numbers (say 100, 200, 500, etc.). I need to implement logic that finds the combinations of these numbers (100, 200, 500, etc.) that sum up to 600 or less.
I am nowhere near being able to add any code here. Please shed some light on this for me.
As per your comment, if you are looking for an idea to start with, then take a look at the Knapsack problem (1).
This is exactly what you are looking for.
The knapsack problem or rucksack problem is a problem in combinatorial optimization: given a set of items, each with a mass and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items.
The link above also has examples of implementing an algorithm for solving the problem.
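If you just want something to start experimenting with, here is a minimal C# sketch that enumerates every combination (each number used at most once, which is an assumption on my part) whose sum stays at or below the limit:

using System;
using System.Collections.Generic;

class CombinationFinder
{
    // Prints every combination whose sum is <= limit.
    static void Find(int[] numbers, int limit, int start, List<int> current, int sum)
    {
        if (current.Count > 0)
            Console.WriteLine($"{string.Join(" + ", current)} = {sum}");

        for (int i = start; i < numbers.Length; i++)
        {
            if (sum + numbers[i] > limit) continue; // would exceed the limit, skip
            current.Add(numbers[i]);
            Find(numbers, limit, i + 1, current, sum + numbers[i]);
            current.RemoveAt(current.Count - 1);    // backtrack
        }
    }

    static void Main()
    {
        // With {100, 200, 500} and limit 600 this prints 100, 100 + 200, 100 + 500, 200, 500.
        Find(new[] { 100, 200, 500 }, 600, 0, new List<int>(), 0);
    }
}

If you only need the best combination rather than all of them, the knapsack dynamic programming approach described in the link is the way to go.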
One of my clients wants to use a unique code for his items (long story..) and he asked me for a solution. The code will consist of 4 parts: the first one is the zip code the item is sent from, the second one is the supplier registration number, the third one is the year the item is sent, and the last part is a unique three-character alphanumeric code.
As you can see, the first three parts are static fields which will never change for the same sender in the same year. So we can say that the last part is the identifier part for that year. This part is 3 alphanumeric characters, which means it runs from 000 to ZZZ.
The problem is that my client, for some reasonable reasons, wants this part to be not sequential. For example this is not what he wants:
06450-05-2012-000
06450-05-2012-001
06450-05-2012-002
...
06450-05-2012-ZZY
06450-05-2012-ZZZ
The last part should be produced randomly, like:
06450-05-2012-A17
06450-05-2012-0BF
06450-05-2012-002
...
06450-05-2012-T7W
06450-05-2012-22C
But it should also be non-repetitive. So once a possible ID is generated, that possibility should be discarded from the selection pool.
I am looking for an effective way to do this.
If I only record the selected possibilities and check each newly created one against them, there is always a worst-case possibility that it keeps producing already-selected ones, especially near the end.
If I create all the possibilities at once and record them in a table or a file, it may take a while after every item creation because it will have to look up a non-selected record. By the way, 26 letters + 10 digits means 36³ = 46,656 possible combinations, and there is a chance that a 4th character may be added, which means 36⁴ = 1,679,616 possible combinations.
Is there a more effective way you can suggest? I will use C# for coding and MS SQL for the database.
If it doesn't have to be random, you could simply choose a fixed but "unpredictable" addend which is relatively prime to the modulus. Since 46,656 = 36³ and 36 = 2²·3², that just means choosing a fixed addend divisible by neither 2 nor 3.
Then keep adding this fixed number to your previous serial number every time you need a new serial number, modulo 46,656 (or 1,679,616) of course.
Mathematics guarantees you won't get the same number twice (before no more "free" numbers are left).
As the addend, you could use const int addend = 26075 since it's 5 modulo 6.
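A minimal sketch of that idea in C# (the base-36 alphabet and the hard-coded prefix are my own illustrative choices; in practice you would load the last used serial from the database):

using System;
using System.Text;

class SerialGenerator
{
    const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"; // 36 characters
    const int Modulus = 36 * 36 * 36;   // 46,656 for a 3-character code
    const int Addend = 26075;           // divisible by neither 2 nor 3, so coprime to the modulus

    // Given the previous serial (0..Modulus-1), returns the next one.
    public static int Next(int previous) => (previous + Addend) % Modulus;

    // Encodes a serial as a fixed-width base-36 string, e.g. 0 -> "000".
    public static string Encode(int value, int width = 3)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < width; i++)
        {
            sb.Insert(0, Alphabet[value % 36]);
            value /= 36;
        }
        return sb.ToString();
    }

    static void Main()
    {
        int serial = 0; // in practice, load the last used serial from the database
        for (int i = 0; i < 5; i++)
        {
            serial = Next(serial);
            Console.WriteLine($"06450-05-2012-{Encode(serial)}");
        }
    }
}

Because the addend is coprime to the modulus, the sequence only repeats after all 46,656 values have been used.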
If you expect to create far less than 36^3 entries for each zip-supplier-year tuple, you should probably just pick a random value for the last field and then check to see if it exists, repeating if it does.
Even if you create half of the maximum number of possible entries, new entries still have an expected value of only one failure. Assuming your database is indexed on the overall identifier, this isn't too great a price to pay.
That said, if you expect to use all but a few possible identifiers, then you should probably create all the possible records in advance. It may sound like a high cost, but each space storing an unused record will eventually store a real record.
I'd expect the first situation is more likely, but if not, or if there's some other combination of the two, please add a comment with some more information and I'll revise my answer.
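A sketch of the random-and-check approach in C# (the existence check is abstracted behind a delegate here; against MS SQL it would be a SELECT on the indexed identifier column, and the alphabet and length are just examples):

using System;
using System.Linq;

class RandomCodePicker
{
    const string Alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static readonly Random Rng = new Random();

    // Generates random 3-character codes until one passes the existence check.
    public static string NextCode(Func<string, bool> alreadyExists, int length = 3)
    {
        while (true)
        {
            var code = new string(Enumerable.Range(0, length)
                                            .Select(_ => Alphabet[Rng.Next(Alphabet.Length)])
                                            .ToArray());
            if (!alreadyExists(code))
                return code; // the caller should insert it under a unique constraint
        }
    }
}

For a quick test you can pass a HashSet<string>'s Contains method as the check; in the database you would still want a unique constraint on the full identifier so that a race between two concurrent inserts fails cleanly.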
I think options depend on the amount of the codes that are going to be used:
If you expect to use most of them within a year, then it is better to pre-generate. If done right, lookup should be really fast. And you are going to have 1,679,616 items per year in your DB anyway, so you will have to do such things right.
On the other hand, is it good that you are expecting to use most of them? It may leave you without codes if there are suddenly more items than expected.
If you expect to use only a small amount, then random+existence check might be a way to go, however it is unclear what amount it should be for that to be best (I am pretty sure it is possible to calculate that though).
I'm trying to see if a specific algorithm can be translated to the kind of map-reduce index RavenDB/CouchDB uses, ie, "pre-computed" map-reduce (which means the indexes are refreshed on insertion and updates, not when performing the actual query).
Let's say we have a typical online store with 50,000 products, grouped in categories. Every product has a collection of "Attribute Values", ie, something like "[Red, Round, Metal]".
Since we have so many products on our website, and there are probably a lot of items in each of the categories, we want to give the user another way to "filter" the products he's currently seeing.
For example, if a category is "Less than $20", there's a whole bunch of products in this category. But our user only need to see products which are less than $20 and Red. Unfortunately, there's no sub-category "Red" in the "Less than $20" category.
Our algorithm would take the current list of products, and generate a list of "interesting" Attributes and Attribute Values, ie, given a list of products, it would output something like:
Color
Red (40)
Blue (32)
Yellow (17)
Material
Metal (37)
Plastic (36)
Wood (23)
Shape
Square (56)
Round (17)
Cylinder (12)
Could this sort of algorithm be somehow pre-computed à la RavenDB/CouchDB map-reduce index? If not, why exactly (so I can identify that kind of algorithm in the future) and if yes, how?
A C# 4.0 Visual Studio Test Solution is available that demonstrates the potential data structures and sample data, as well as a try at a map-reduce implementation (which doesn't seem to be pre-computable).
General case: It's always possible to use a CouchDB-style map-reduce view, but it's not necessarily practical.
In the end, it's mostly a counting-based argument: if you need to ask the question for any subset of your 500,000 products, then your database must be able to provide a distinct answer to each of 2^500,000 different possible questions, which uses a prohibitive amount of memory if you have to emit a B-tree leaf for every one of them (and you need to emit data unless the answer to most of these queries is zero, false, an empty set or a similar null value).
CouchDB provides a first small optimization through the existence of range queries (meaning that in an ideal case, it can use as little as N B-tree leaves to answer N² questions). However, in your example, this would only reduce the number of leaves down to 2^250,000 (and that's a theoretical lower bound).
CouchDB provides a second small optimization through key prefix queries, meaning that you can compress [A], [A,B] and [A,B,C] queries into a single [A,B,C] key. So, instead of your 2^250,000 possibilities, you're down to a "mere" 2^249,999 ...
So, while you could think up an emitting strategy for answering the question for any subset, it would take more storage space than is actually available on our planet. In the general case, to answer N different questions you need to emit at least sqrt(N/2) B-tree leaves, so count your questions and determine if that lower bound on the number of leaves is acceptable.
Only for categories and subcategories: if you give up on arbitrary lists of products and only ask questions of the form "give me the significant attributes in category A filtered by attributes B and C", then your number of emits drops to:
AvgCategories * AvgAttr * 2 ^ (AvgAttr - 1) * 500,000
You're basically emitting for each product the keys [Category,Attr,Attr,...] for all categories of the product and all combinations of attributes of the product, which lets you query by category + attributes. If you have on average 1 category and 3 attributes per product, this works out to about 6 million entries, which is fairly acceptable.
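To make the emission pattern concrete, here is a hedged C# sketch of enumerating a product's attribute subsets, which is the combinatorial core of this strategy (the Product shape is assumed; in CouchDB the same loop would live in the JavaScript map function, and the exact key layout depends on how you want to query it):

using System;
using System.Collections.Generic;
using System.Linq;

class Product
{
    public string Category;          // e.g. "Less than $20"
    public List<string> Attributes;  // e.g. ["Red", "Round", "Metal"]
}

static class FacetEmitter
{
    // Yields one key per subset of the product's attributes, prefixed by its category:
    // ["Less than $20"], ["Less than $20","Metal"], ["Less than $20","Metal","Red"], ...
    // Attributes are kept sorted so that queries can rebuild the same key.
    public static IEnumerable<string[]> Keys(Product p)
    {
        var attrs = p.Attributes.OrderBy(a => a).ToList();
        int subsets = 1 << attrs.Count;                 // 2^n subsets of n attributes
        for (int mask = 0; mask < subsets; mask++)
        {
            var key = new List<string> { p.Category };
            for (int i = 0; i < attrs.Count; i++)
                if ((mask & (1 << i)) != 0)
                    key.Add(attrs[i]);
            yield return key.ToArray();
        }
    }
}

A product with one category and three attributes yields 8 keys here; multiplied over 500,000 products, that is the same order of magnitude as the estimate above.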
This should be quite straightforward to implement in something like CouchDB. Have the map phase of your index output one key, value pair for each attribute the object has, with the value simply being '1'. Then, have the reduce phase sum up all input values and output the sum. The end result will be an index of the form you describe.
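For illustration only, the same aggregation expressed in C#/LINQ (the real CouchDB view would be a JavaScript map emitting the attribute as the key with value 1, plus a summing reduce):

using System.Collections.Generic;
using System.Linq;

static class AttributeCounts
{
    // Given each product's attribute list, counts how many products carry each attribute value.
    // This is exactly what the (emit 1, sum) map/reduce pair computes, just done in memory.
    public static Dictionary<string, int> Count(IEnumerable<IEnumerable<string>> productAttributes) =>
        productAttributes.SelectMany(attrs => attrs)
                         .GroupBy(a => a)
                         .ToDictionary(g => g.Key, g => g.Count());
}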
I was asked this question in an interview. Although the interview was for a .NET position, he asked me this question in the context of Java, because I had also mentioned Java in my resume.
How to find the index of an element having value X in an array ?
I said that iterating from the first element to the last and checking whether the value is X would give the result. He asked about a method involving fewer iterations; I said binary search, but that is only possible for a sorted array. I tried suggesting the IndexOf function in the Array class. But nothing from my side answered the question.
Is there any fast way of getting the index of an element having value X in an array ?
As long as there is no knowledge about the array (is it sorted? ascending or descending? etc etc), there is no way of finding an element without inspecting each one.
Also, that is exactly what indexOf does (when using lists).
How to find the index of an element having value X in an array ?
This would be fast:
int getXIndex(int x) {
    myArray[0] = x; // "finds" x by overwriting the first element with it (assumes myArray is a field in scope)
    return 0;
}
A practical way of finding it faster is parallel processing.
Just divide the array into N parts and assign each part to a thread that iterates through the elements of its part until the value is found. N should preferably be the number of processor cores.
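A hedged sketch of that idea in C# using PLINQ (it still does O(n) work in total, it just spreads the work across cores, and for small arrays the overhead outweighs the gain):

using System.Linq;

static class ParallelSearch
{
    // Returns the index of some occurrence of x, or -1 if it is absent.
    // Note: an unordered parallel query is not guaranteed to return the first occurrence.
    public static int IndexOf(int[] array, int x) =>
        ParallelEnumerable.Range(0, array.Length)
                          .Where(i => array[i] == x)
                          .DefaultIfEmpty(-1)
                          .First();
}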
If a binary search isn't possible (because the array isn't sorted) and you don't have some kind of advanced search index, the only way I can think of that isn't O(n) is if the item's position in the array is a function of the item itself (like, if the array is [10, 20, 30, 40], the position of an element n is (n / 10) - 1).
Maybe he wanted to test your knowledge of Java.
There is a utility class called Arrays; this class contains various methods for manipulating arrays (such as sorting and searching).
http://download.oracle.com/javase/6/docs/api/java/util/Arrays.html
In 2 lines you can have an O(n log n) result:
Arrays.sort(list);                     // O(n log n)
int i = Arrays.binarySearch(list, 88); // O(log n); note the index refers to the sorted array
Puneet - in .NET it's:
string[] testArray = {"fred", "bill"};
var indexOffset = Array.IndexOf(testArray, "fred");
[edit] - having read the question properly now, :) an alternative in linq would be:
string[] testArray = { "cat", "dog", "banana", "orange" };
int firstItem = testArray.Select((item, index) => new
{
ItemName = item,
Position = index
}).Where(i => i.ItemName == "banana")
.First()
.Position;
This, of course, would find the FIRST occurrence of the string; subsequent duplicates would require additional logic, but then so would a looped approach.
jim
It's a question about data structures and algorithms (although a very simple data structure). It goes beyond the language you are using.
If the array is ordered, you can get O(log n) using binary search, or a modified version of it for edge cases (not always using (a+b)/2 as the pivot point, but that's a pretty sophisticated tweak).
If the array is not ordered then... good luck.
He may have been asking about what methods you have available to find an item in Java. But anyway, they're not faster; they can only be simpler to use (than a for-each / compare / return loop).
There's another solution: creating an auxiliary structure for faster searches (like a hashmap), but, OF COURSE, it's more expensive to build it and use it once than to do a simple linear search.
Take a perfectly unsorted array, just a list of numbers in memory. All the machine can do is look at individual numbers in memory, and check if they are the right number. This is the "password cracker problem". There is no faster way than to search from the beginning until the correct value is hit.
Are you sure about the question? I once got a question somewhat similar to yours.
Given a sorted array in which there is one element "x" whose value is the same as its index, find the index of that element.
For example:
// index: 0  1  2  3  4  5  6  7  8  9  10
int a[11] = {1, 3, 5, 5, 6, 6, 6, 8, 9, 10, 11};
At index 6, the value and the index are the same.
For this array a, the answer should be 6.
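For that variant, when the values are strictly increasing, a[i] - i is non-decreasing and a binary search applies; here is a hedged C# sketch (note the example array above contains duplicates, for which this simple version is not guaranteed to work and you would need a scan or a two-sided divide and conquer):

// Finds an index i with a[i] == i in a strictly increasing sorted array, or -1 if none exists.
static int FixedPoint(int[] a)
{
    int lo = 0, hi = a.Length - 1;
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == mid) return mid;
        if (a[mid] < mid) lo = mid + 1;   // a fixed point, if any, must be to the right
        else hi = mid - 1;                // a[mid] > mid: look to the left
    }
    return -1;
}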
This is not an answer, in case there was something missed in the original question this would clarify that.
If the only information you have is that it's an unsorted array, with no relationship between the index and value, and with no auxiliary data structures, then you have to potentially examine every element to see if it holds the information you want.
However, interviews are meant to separate the wheat from the chaff so it's important to realise that they want to see how you approach problems. Hence the idea is to ask questions to see if any more information is (or could be made) available, information that can make your search more efficient.
Questions like:
1/ Does the data change very often?
If not, then you can use an extra data structure.
For example, maintain a dirty flag which is initially true. When you want to find an item and it's true, build that extra structure (sorted array, tree, hash or whatever) which will greatly speed up searches, then set the dirty flag to false, then use that structure to find the item.
If you want to find an item and the dirty flag is false, just use the structure, no need to rebuild it.
Of course, any changes to the data should set the dirty flag to true so that the next search rebuilds the structure.
This will greatly speed up (through amortisation) queries for data that's read far more often than written.
In other words, the first search after a change will be relatively slow but subsequent searches can be much faster.
You'll probably want to wrap the array inside a class so that you can control the dirty flag correctly (see the sketch after this list).
2/ Are we allowed to use a different data structure than a raw array?
This will be similar to the first point given above. If we modify the data structure from an array into an arbitrary class containing the array, you can still get all the advantages such as quick random access to each element.
But we gain the ability to update extra information within the data structure whenever the data changes.
So, rather than using a dirty flag and doing a large update on the next search, we can make small changes to the extra information whenever the array is changed.
This gets rid of the slow response of the first search after a change by amortising the cost across all changes (each change having a small cost).
3/ How many items will typically be in the list?
This is actually more important than most people realise.
All talk of optimisation tends to be useless unless your data sets are relatively large and performance is actually important.
For example, if you have a 100-item array, it's quite acceptable to use even the brain-dead bubble sort since the difference in timings between that and the fastest sort you can find tend to be irrelevant (unless you need to do it thousands of times per second of course).
For this case, finding the first index for a given value, it's probably perfectly acceptable to do a sequential search as long as your array stays under a certain size.
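A minimal sketch of the dirty-flag idea from question 1 above (the value-to-first-index dictionary is just one possible choice of "extra structure"):

using System.Collections.Generic;

class IndexedArray
{
    private readonly List<int> _items = new List<int>();
    private Dictionary<int, int> _firstIndexByValue;   // the "extra structure"
    private bool _dirty = true;                        // true whenever the data has changed

    public void Add(int value)
    {
        _items.Add(value);
        _dirty = true;                                 // any change invalidates the index
    }

    // Returns the first index holding 'value', or -1 if it is absent.
    public int IndexOf(int value)
    {
        if (_dirty)
        {
            // Rebuild the lookup once; subsequent searches are O(1) until the next change.
            _firstIndexByValue = new Dictionary<int, int>();
            for (int i = 0; i < _items.Count; i++)
                if (!_firstIndexByValue.ContainsKey(_items[i]))
                    _firstIndexByValue[_items[i]] = i;
            _dirty = false;
        }
        return _firstIndexByValue.TryGetValue(value, out int index) ? index : -1;
    }
}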
The bottom line is that you're there to prove your worth, and the interviewer is (usually) there to guide you. Unless they're sadistic, they're quite happy for you to ask them questions to try and narrow down the scope of the problem.
Ask the questions (as you have for the possibility that the data may be sorted). They should be impressed with your approach even if you can't come up with a solution.
In fact (and I've done this in the past), they may reject all your possible approaches (no, it's not sorted, no, no other data structures are allowed, and so on) just to see how far you get.
And maybe, just maybe, like the Kobayashi Maru, it may not be about winning, it may be how you deal with failure :-)
A SO post about generating all the permutations got me thinking about a few alternative approaches. I was thinking about using space/run-time trade-offs and was wondering if people could critique this approach, and point out possible hiccups, while trying to implement it in C#.
The steps go as follows:
Given a data-structure of homogeneous elements, count the number of elements in the structure.
Assuming the permutation consists of all the elements of the structure, calculate the factorial of the value from Step 1.
Instantiate a new structure (Dictionary) of type <key (some hash of the collection), Collection<data-structure of homogeneous elements>> and initialize a counter.
Hash(???) the seed structure from step 1, and insert the key/value pair of hash and collection into the Dictionary. Increment the counter by 1.
Randomly shuffle(???) the order of the seed structure, hash it and then try to insert it into the Dictionary from step 3.
If there is a conflict in hashes, repeat step 5 again to get a new order and hash and check for a conflict. Upon successful insertion, increment the counter by 1.
Repeat steps 5 & 6 until the counter equals the factorial calculated in step 2.
It seems like doing it this way, using some sort of randomizer (which is a black box to me at the moment), might help with getting all the permutations within a decent time frame for datasets of obscene sizes.
It will be great to get some feedback from the great minds of SO to further analyze this approach whose objective is to deviate from the traditional brute-force approach prevalent in algorithms of such nature and also the repercussions of implementing such an algorithm using C#.
Thanks
This method of generating all permutations does not fare well as compared to the standard known methods.
Say you had n items and M=n! permutations.
This method of generation is expected to generate about M·ln M permutations before discovering all M (the coupon collector effect).
(See this answer for a possible explanation: Programing Pearls - Random Select algorithm)
Also, what would the hash function be? For a reasonable hash function, we might have to start dealing with very large integer issues pretty soon (any n > 50 for sure, don't remember that exact cut-off point).
This method uses up a lot of memory too (the hashtable of all permutations).
Even assuming the hash is perfect, this method would take expected Omega(nM log M) operations and guaranteed Omega(nM) space, while standard well-known methods can do it in O(M) time and O(n) space.
As a starting point I suggest reading: Systematic Generation of All Permutations, which I believe is O(nM) time and O(n) space and still much better than this method.
Note that if one has to generate all permutations, any algorithm will necessarily take Omega(M) steps, and so the method I refer to above is optimal!
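For comparison, a minimal C# sketch of one standard in-place method (Heap's algorithm); this is not the algorithm from the linked paper, just an illustration of generating every permutation with O(n) extra space:

using System;

static class Permutations
{
    // Heap's algorithm: visits every permutation of 'items', mutating the array in place.
    public static void ForEach<T>(T[] items, Action<T[]> visit) =>
        Generate(items, items.Length, visit);

    private static void Generate<T>(T[] items, int k, Action<T[]> visit)
    {
        if (k == 1) { visit(items); return; }

        Generate(items, k - 1, visit);            // permutations with the k-th element fixed
        for (int i = 0; i < k - 1; i++)
        {
            // Which element gets swapped with the last one depends on the parity of k.
            int j = (k % 2 == 0) ? i : 0;
            (items[j], items[k - 1]) = (items[k - 1], items[j]);
            Generate(items, k - 1, visit);
        }
    }
}

Usage: Permutations.ForEach(new[] { 1, 2, 3 }, p => Console.WriteLine(string.Join(",", p))); prints all 6 orderings.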
It seems like a complicated way to randomise the order of the generated permutations. In terms of time efficiency, you can't do much better than the 'brute force' approach.