How to improve my team creation logic? - c#

I had one day to create a program that, when given a .csv file with the columns: [Name] | [Elo Rating], would output another .csv file with balanced teams. The user can choose how many players they want per team.
This is probably the programming project I enjoyed the most, but it is highly inefficient given the rushed time frame and my limited knowledge of how best to implement such a task. It is similar to this question, but I would prefer not to simplify the process into just placing the players into teams based on the rating order (faster, but less accurate than what I already have). Can you give suggestions on how I could make this run faster without sacrificing accuracy?
The logic I went with is this:
Find all the possible team combinations.
Find the average rating of all the team combinations & find the closest team rating to that average
Select the first team that has the average rating from step 2.
Remove all the teams that have any of these players in them.
Repeat #2, #3, and #4 until all the teams are made.
Step 1 was accomplished by this answer and works excellently.
Step 2 might be more of a math question. I'm not sure whether the average of the players' ratings is equivalent to the average rating of all the team combinations. I'm using a slightly modified version of this answer, because I am using a dictionary to hold the name & rating. I wouldn't need this dictionary if just averaging the players' ratings were just as accurate.
float average = teamScoreDict.Average(s => s.Value);
var closestScoreToAvg = teamScoreDict.Values.Aggregate((x, y) =>
    Math.Abs(x - average) < Math.Abs(y - average) ? x : y);
Step 3
static Team FindTeam(float targetScore)
{
    var selectedTeam = new Team();
    // grabbing the first team with the proper score and adding it to final teams
    // (not perfect since there could be multiple teams with the same score)
    for (int i = 0; i < teams.Count; i++)
    {
        if (teams[i].TeamRating == targetScore)
        {
            selectedTeam = teams[i];
            // add these players to the list of players who have already been selected
            foreach (var player in teams[i].Player)
                if (!usedPlayers.Contains(player.Key))
                    usedPlayers.Add(player.Key);
            // remove the score and team then break
            teamScoreDict.Remove(teams[i].Id);
            teams.Remove(teams[i]);
            break;
        }
    }
    return selectedTeam;
}
Step 4 is what I believe to be the slowest part of this task. I thought removing the teams each iteration would make the subsequent team searches faster, which it does, but the removal process is slow.
static void RemoveTeamsWithUsedPlayers()
{
    for (int i = teams.Count - 1; i >= 0; i--)
    {
        if (teams[i].Player.Any(kvp => usedPlayers.Contains(kvp.Key)))
        {
            teamScoreDict.Remove(teams[i].Id);
            teams.Remove(teams[i]);
        }
    }
}
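(For comparison, a single-pass version with List<T>.RemoveAll would look roughly like this; it uses the same teams, teamScoreDict, and usedPlayers fields and avoids a separate Remove call per team.)
static void RemoveTeamsWithUsedPlayersSinglePass()
{
    // RemoveAll walks the list once and compacts it in place; the predicate's side
    // effect keeps teamScoreDict in sync with the surviving teams.
    teams.RemoveAll(team =>
    {
        if (!team.Player.Any(kvp => usedPlayers.Contains(kvp.Key)))
            return false;
        teamScoreDict.Remove(team.Id);
        return true;
    });
}
If usedPlayers is a List<string>, making it a HashSet<string> should also make the Contains checks cheap.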
The results are excellent (my last run with 40 players in 5-man teams, with Elo ratings from 1300-2200, gave me 8 teams with only a 19-point difference between the highest and lowest team's total score [8498 vs 8517]).
My problem is that it is extremely slow for larger team sizes. 12 players in 3-man teams is instant. 32 players in 4-man teams takes a few seconds. 40 players in 5-man teams takes many minutes, because the number of possible k-combinations grows drastically and I'm looping through them so many times.
I hope this question is not too broad, and thank you for any suggestions.
EDIT: I went ahead and used the stopwatch class to get the times for each step as suggested. For 12 players with 3 per team (220 k-combinations).
00:00:00.0014493
00:00:00.0083637
00:00:00.0015608
00:00:00.0015930
There is also another step that I forgot. Between step 1 and step 2 I take the IEnumerable of all possible team combinations and put it into a class that I made, and calculate the total team score. This step takes 00:00:00.0042700
foreach (var team in teamList)
{
    Team teamClass = new Team();
    teamScore = 0.0F;
    foreach (var player in team)
    {
        if (participants.TryGetValue(player, out tempInt))
        {
            teamScore += tempInt;
            teamClass.Player.Add(player, tempInt);
        }
    }
    teamClass.TeamRating = teamScore;
    teamClass.Id = count;
    teamScoreDict.Add(count, teamScore);
    teams.Add(teamClass);
    count++;
}
EDIT 2 (Improved Logic): I'm certain I could still improve this a lot, but based on the answer I marked and the comments I was able to drastically speed things up while still maintaining accuracy. The 3-person teams with 12 players take about the same time, but 4-person teams with 32 players went from 4.5229176s to 0.4067160s. Here are the adjustments I made (a rough sketch follows the list):
I use the average of just the players; it is the same as the average of all the team combinations.
I remove the highest rated player and then find the combinations with one fewer player than I normally would.
While I am placing all these teams into the class I use, I simultaneously look for the (team rating + highest player rating) closest to the (overall average * players per team).
Then I add the highest player back to that team and remove all of its players from the pool.
Repeat steps 2-4 until everyone is used up.
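Roughly, the whole loop now looks like the sketch below. It uses a plain (name, rating) tuple list and its own small combination generator instead of my actual Team/dictionary classes, so the names are illustrative rather than my real code:
using System;
using System.Collections.Generic;
using System.Linq;

static List<List<(string Name, float Rating)>> BuildTeams(
    List<(string Name, float Rating)> players, int teamSize)
{
    // Every team should total roughly the overall average times the team size.
    float targetTotal = players.Average(p => p.Rating) * teamSize;
    var result = new List<List<(string Name, float Rating)>>();
    var pool = new List<(string Name, float Rating)>(players);

    while (pool.Count >= teamSize)
    {
        // Fix the highest rated remaining player on this team...
        var highest = pool.OrderByDescending(p => p.Rating).First();
        pool.Remove(highest);

        // ...and only enumerate (teamSize - 1)-combinations of the rest.
        var best = Combinations(pool, teamSize - 1)
            .OrderBy(c => Math.Abs(c.Sum(p => p.Rating) + highest.Rating - targetTotal))
            .First();

        var team = new List<(string Name, float Rating)>(best) { highest };
        result.Add(team);
        foreach (var p in best)
            pool.Remove(p);
    }
    return result;
}

// Minimal recursive k-combination generator (illustrative, not the linked answer's version).
static IEnumerable<List<T>> Combinations<T>(IReadOnlyList<T> items, int k, int start = 0)
{
    if (k == 0)
    {
        yield return new List<T>();
        yield break;
    }
    for (int i = start; i <= items.Count - k; i++)
        foreach (var rest in Combinations(items, k - 1, i + 1))
        {
            rest.Insert(0, items[i]);
            yield return rest;
        }
}
Fixing the highest remaining player each round is what shrinks the search from n-choose-m to (n-1)-choose-(m-1) combinations per team.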

For your average-of-averages side question, the answer is that you don't need your dictionary: every player appears in the same number of team combinations, so the average team rating is just the team size times the average player rating, and picking the team total closest to it gives the same result.
As for the algorithm, the easiest speedup is to say that the person with the highest rating will have to be on SOME team. So build only teams with that person, take the best. Then build teams again with some other person. And repeat. The result will be different than what you are doing currently, because the person with the highest rating is not always on the team with the rating closest to average, but the result is no less accurate. This turns your current O(n^m) algorithm for building a list of all possible teams into an O(n^(m-1)) algorithm.
But for a real speedup, what you need to do is follow the strategy in https://en.wikipedia.org/wiki/Subset_sum_problem#Pseudo-polynomial_time_dynamic_programming_solution and use dynamic programming to eliminate looking at teams whose scores will come out to be the same. That is, you build a data structure keyed by the number of players on the partial team so far and by the total rating, storing the last player added. The second time you find, for instance, a combined score of 3500 for 3 players, you know that the teams it leads to will duplicate scores you have already covered, so you throw it out.
Throwing away all of those duplicates in the team generation process reduces finding the next team from O(n^m) to O(n*m*k), where k is the range between the lowest and highest rating. Doing that n/m times therefore takes O(n^2*k) time, which should be acceptable with 5-person teams up to several hundred people.
So you build up this data structure, then sort the scores for a full team, and take the best one. Now walk the data structure backwards and you can find the selected team. Then throw it away and do this again.
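To make that concrete, here is a rough sketch assuming integer ratings (the pseudo-polynomial bound needs them). The structure is the one described: indexed by the number of players so far and by the total rating, remembering the last player added so the chosen team can be reconstructed by walking backwards. All names are illustrative:
using System;
using System.Collections.Generic;
using System.Linq;

static List<int> ClosestTeam(IReadOnlyList<int> ratings, int teamSize, int targetTotal)
{
    // dp[c][total] = index of the last player added to reach `total` with c players.
    // A (c, total) state is only stored once, which is what prunes the duplicates.
    var dp = new Dictionary<int, int>[teamSize + 1];
    for (int c = 0; c <= teamSize; c++)
        dp[c] = new Dictionary<int, int>();
    dp[0][0] = -1; // the empty partial team

    // Classic 0/1 knapsack sweep: players in order, counts from high to low so the
    // same player is never used twice on one partial team.
    for (int i = 0; i < ratings.Count; i++)
        for (int c = teamSize - 1; c >= 0; c--)
            foreach (var total in dp[c].Keys.ToList())
            {
                int newTotal = total + ratings[i];
                if (!dp[c + 1].ContainsKey(newTotal))
                    dp[c + 1][newTotal] = i;
            }

    // Take the full-team total closest to the target...
    int bestTotal = dp[teamSize].Keys.OrderBy(t => Math.Abs(t - targetTotal)).First();

    // ...and walk the structure backwards to recover the players.
    var team = new List<int>();
    for (int c = teamSize, total = bestTotal; c > 0; c--)
    {
        int player = dp[c][total];
        team.Add(player);
        total -= ratings[player];
    }
    return team; // indices into ratings
}
You would call this once per team with targetTotal set to the average rating times the team size, remove the returned players from the pool, and run it again, as described above.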

Related

Best way to search/filter (potentially several billion) combinations of items from multiple lists or arrays

I'm trying to write a program to optimize equipment configurations for a game. The problem is as follows:
There are six different equipment slots in which a character can place an item. In the code this is represented by 6 lists of items, one per slot, which together contain all of the equipment the player owns.
The program will calculate the total stats of the character for each possible combination of equipment (1 from each list). These calculated stats can be filtered by specific stat min/max values and then also sorted by a specific stat to pinpoint a certain target set of stats for their character.
The program should be able to perform these queries without running out of memory or taking hours, and of course, the main problem is sifting through several billion possible combinations.
I'm not sure what supporting data structures or search algorithms would be needed to accomplish this, or what they are called (so I can do more research towards a solution). I have come up with the following idea, but I'm not sure if I'm on the right track or if someone can point me in a more effective direction.
The idea I'm pursuing is to use recursion, where each list (for each possible equipment slot) is set into a tree structure, with each successive list acting as a child of the last, e.g.:
Weapons List
|
-----Armor List
|
------Helm List... etc
Each layer of the tree would keep a dictionary of every child path it can take, containing the IDs of 1 item from each list, and would progressively calculate the stats given to the character (simple addition of stats from weapon + armor + helm and so on as it traverses the tree).
When any stat with a min/max filter applied hits its boundary (namely, if the stat goes over the maximum before reaching the bottom layer of the tree), that "path" is eliminated from the dictionary, removing that entire leg of possible results from being traversed.
The main goal here is to reduce the tree paths the search algorithm has to traverse and to remove as many invalid results as possible before the tree needs to calculate them, making the search as fast as possible and avoiding wasteful cycles. This seems pretty straightforward when removing items based on a "maximum" filter, since when adding each item's stats progressively we can quickly tell when a stat has crossed its maximum -- however, when it comes to stopping paths based on a minimum total stat, I can't wrap my head around how to predict and remove the paths that won't end up above the minimum by the sixth item.
To simplify the idea, think of it like this:
I have 3 arrays of numbers
[X][0][1][2]
[0] 5 3 2
[1] 1 0 8
[2] 3 2 7
[3] 2 1 0
I want to find all combinations from the 3 arrays (sums) that are minimum of 9 and maximum of 11 total.
Exactly one item must be selected from each array, and the sum of the selected values is what is being searched. This would need to scale up to searching 6+ arrays of 40+ values each. Is the above approach on the right track, or what is the best way to go about this (mainly using C#)?
You should be able to filter out a lot of items by using a lower and upper bound for each slot:
var minimum = slots.Sum(slot => slot.Minimum);
var maximum = slots.Sum(slot => slot.Maximum);
foreach (var slot in slots)
{
    var maxAvailable = maximum - slot.Maximum;
    var minAvailable = minimum - slot.Minimum;
    var filtered = slot.Items
        // keep the item only if, together with the strongest items in all the
        // other slots, it can still reach the requested minimum...
        .Where(item => item.Value + maxAvailable >= request.MinimumValue)
        // ...and, together with the weakest items in all the other slots, it
        // does not already exceed the requested maximum
        .Where(item => item.Value + minAvailable <= request.MaximumValue);
}
After doing this, every item you keep can still appear in at least one combination inside the requested range; individual combinations can still fall outside it, so combine this with the logic you have so far and I think you should get pretty good performance.
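One way to then combine the bounds with the recursive traversal from the question is to precompute, for each depth, the best and worst total still obtainable from the remaining slots, which lets you prune on the minimum as well as the maximum. A rough sketch, with each slot reduced to an int[] of values for the single stat being filtered (all names are illustrative, not from the original code):
using System;
using System.Collections.Generic;
using System.Linq;

static List<int[]> FindCombinations(int[][] slots, int min, int max)
{
    int k = slots.Length;
    // suffixMax[i] / suffixMin[i] = best / worst total obtainable from slots i..k-1
    var suffixMax = new int[k + 1];
    var suffixMin = new int[k + 1];
    for (int i = k - 1; i >= 0; i--)
    {
        suffixMax[i] = suffixMax[i + 1] + slots[i].Max();
        suffixMin[i] = suffixMin[i + 1] + slots[i].Min();
    }

    var results = new List<int[]>();
    var picked = new int[k];

    void Recurse(int slot, int total)
    {
        if (slot == k)
        {
            if (total >= min && total <= max)
                results.Add((int[])picked.Clone());
            return;
        }
        foreach (var value in slots[slot])
        {
            int t = total + value;
            // Prune: even the strongest remaining picks can't reach the minimum...
            if (t + suffixMax[slot + 1] < min) continue;
            // ...or even the weakest remaining picks already exceed the maximum.
            if (t + suffixMin[slot + 1] > max) continue;
            picked[slot] = value;
            Recurse(slot + 1, t);
        }
    }

    Recurse(0, 0);
    return results;
}
Reading each column of the little table in the question as one array, this would be called as FindCombinations(new[] { new[] { 5, 1, 3, 2 }, new[] { 3, 0, 2, 1 }, new[] { 2, 8, 7, 0 } }, 9, 11).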

Possible to do this in better than O(n^2) time?

The problem I'm trying to solve gives me a matrix like
10101
11100
11010
00101
where the rows represent the topics that a person knows; e.g. Person 1, represented by 10101, knows topics 1, 3 and 5, but not 2 or 4. I need to find the maximum number of topics that a 2-person team could know; e.g. the team of Person 1 and Person 3 knows all the topics, because between 10101 and 11010 there are 1s at every index.
I have an O(n^2) solution
string[] topic = new string[n];
for (int topic_i = 0; topic_i < n; topic_i++)
{
    topic[topic_i] = Console.ReadLine();
}
IEnumerable<int> teamTopics =
    from t1 in topic
    from t2 in topic
    where !Object.ReferenceEquals(t1, t2)
    select t1.Zip(t2, (c1, c2) => c1 == '1' || c2 == '1').Sum(b => b ? 1 : 0);
int max = teamTopics.Max();
Console.WriteLine(max);
which is passing all the test cases it doesn't time out on. I suspect the reason it's not fast enough has to do with the time complexity rather than the overhead of the LINQ machinery. But I can't think of a better way to do it.
I thought that maybe I could map the indices of topics to the persons who know them, like
1 -> {1,2,3}
2 -> {2,3}
3 -> {1,2,4}
4 -> {3}
5 -> {1,4}
but I can't think of where to go from there.
Can you supply me with a "hint"?
Let's say we have n people and m topics.
I would argue that your algorithm is O(n^2 * m), where n is the number of people, because:
from t1 in topic gets you O(n)
from t2 in topic gets you to O(n^2)
t1.Zip(t2 ... gets you to O(n^2 * m)
An optimisation that I see is first to modify strings a bit:
s1 = '0101', where i-th element shows whether a person i knows 1st topic
s2 = '1111', where i-th element shows whether a person i knows 2nd topic.
etc...
Then you analyse string s1. You pick all possible pairs of 1s (O(n^2) elements) that show pairs of people that together know 1st topic. Then go pick a pair from that list and check whether they know 2nd topic as well and so on. When they don't, delete it from the list and move on to another pair.
Unfortunately this looks to be O(n^2 * m) as well, but it should be quicker in practice. For a very sparse matrix it should be close to O(n^2), and for dense matrices it should find a pair pretty quickly.
Thoughts:
as a speculative optimization: you could do an O(n) sweep to find the individual with the highest number of skills (largest hamming weight); note them, and stop if they have everything: pair them with anyone, it doesn't matter
you can exclude, without testing, anyone who only has skills shared with the "best" individual - we already know about everything they can offer and have tested against everyone; so only test if (newSkills & ~bestSkills) != 0 - meaning: the person being tested has something that the "best" worker didn't have; this leaves m workers with complementary skills plus the "best" worker (you must include them explicitly, as the ~/!=0 test above will fail for them)
now do another O(m) sweep of possible partners - checking to see if the "most skilled" plus any other gives you all the skills (obviously stop earlier if a single member has all the skills); but either way: keep track of best combination for later reference
you can further halve the time by only considering the triangle, not the square - meaning: you compare row 0 to rows 1-(m-1), but row 1 to rows 2-(m-1), row 5 to rows 6-(m-1), etc
you can significantly improve things by using integer bit math along with an efficient "hamming weight" algorithm (popcount, to count the set bits) rather than strings and summing - a rough sketch follows below
get rid of the LINQ
short-circuit if you get all ones (compare to ~((~0)<<k), where k is the number of bits being tested for)
remember to compare any result to the "best" combination we found against the most skilled worker
This is still O(n) + O(m^2) where m <= n is the number of people with skills different to the most skilled worker
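Not the exact approach above, but a rough sketch of the bit-math parts (integer masks, popcount, triangle-only comparison, early exit). It assumes at most 64 topics so a single ulong per person is enough; BitOperations.PopCount needs .NET Core 3.0+:
using System;
using System.Numerics;

static int MaxTopicsForAnyPair(string[] rows)
{
    int n = rows.Length;
    int topicCount = rows[0].Length;
    var masks = new ulong[n];
    for (int i = 0; i < n; i++)
        masks[i] = Convert.ToUInt64(rows[i], 2); // parse the 0/1 string as a bit mask

    int best = 0;
    for (int i = 0; i < n; i++)
        for (int j = i + 1; j < n; j++) // triangle only, not the full square
        {
            int known = BitOperations.PopCount(masks[i] | masks[j]);
            if (known > best)
            {
                best = known;
                if (best == topicCount)
                    return best; // short-circuit: this pair already knows everything
            }
        }
    return best;
}
For more than 64 topics you would switch to a ulong[] per person, but the shape stays the same.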
Pathological but technically correct answer:
insert a Thread.Sleep(FourYears) - all solutions are now essentially O(1)
Your solution is asymptotically as efficient as it gets, because you need to examine all pairs to arrive at the maximum. You can make your code more efficient by replacing strings with BitArray objects, like this:
var topic = new List<BitArray>();
string line;
while ((line = Console.ReadLine()) != null)
{
    topic.Add(new BitArray(line.Select(c => c == '1').ToArray()));
}
var res =
    (from t1 in topic
     from t2 in topic
     // copy t1 first so Or() does not mutate it, then count the set bits
     // (BitArray.Count is the total number of bits, not the number of 1s)
     select new BitArray(t1).Or(t2).Cast<bool>().Count(b => b)).Max();
Console.WriteLine(res);

Big O Analysis with sub loop of varying size

I've been doing some practice problems for some interviews coming up, and I was curious about something. Say, for example, in the following algorithm
foreach (User friend in friends)
{
    foreach (Purchase purchase in friend.Purchases)
    {
        allFriendsPurchases.Add(purchase);
    }
}
So, going through each friend is O(n), because we're iterating through all the friends. But what about the sub loop? There are some friends that may not have purchased anything, and some that have purchased a lot. How would you describe the run time in Big O Notation?
Thanks
This is one of those cases where it's important to specify what n is. This algorithm can be defined as being O(n) where n represents the total number of purchases, not the total number of friends.
If you want to define n as the number of friends, then n alone isn't enough. The number of iterations depends on more than just how many friends there are. There are different ways in which you could describe the number of iterations; one way would be to say that this algorithm is O(n*m), where n is the number of friends and m is the average number of purchases per friend. (If m were known to be small, say less than some fixed value, then you could transform that to O(n), claiming that m is constant, but that isn't true in the general case.)
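For intuition, the nested loops do exactly as much work as enumerating the flattened purchase list, which is why the total number of purchases is the natural measure (a sketch, assuming the same Friend/Purchase types as above and System.Linq):
// Visits each purchase exactly once, however the purchases are distributed across
// friends, so the cost is proportional to the total purchase count.
var allFriendsPurchases = friends.SelectMany(friend => friend.Purchases).ToList();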

Complex permutations without repetition

I am trying to create a tool for a game called Monster Hunter (for personal use). I have worked with permutations before, but nothing this complex, so I am totally stuck.
In the game you wear 5 pieces of armor. Each piece has skill points for one of many different skills. If you have 10+ skill points in a particular skill after calculating the whole set, you earn that skill.
Example:
Foo Head: Attack +2, Guard + 2
Foo Chest: Defense + 5
Foo Body: Guard + 2, Attack + 5, Defense +2
Foo Arm: Attack + 3, Speed + 4
Foo Legs: Attack + 5, Guard + 6, Defense + 3
The above set would result in 10+ in Attack, Defense, and Guard (not speed).
I would like to figure out how to find all combinations of armor pieces given 2-3 user-specified skills. So if you selected "Attack" and "Speed", it would give you all possible combinations of 5 pieces of armor that will result in +10 in both "Attack" and "Speed". There are about 60 different items for each of the 5 categories.
I know I can use LINQ to filter each of the 5 categories of armor parts so that I only get back a list of all the items that include one of the 2 specified skills, but I am lost on how to do the permutations since I am juggling 2-3 user-specified skills...
I wish I had working code to show, but I am so lost at this point I don't know where to start. I am not looking for the answer, per se, but advice on how to get there. Thanks.
1) I would first search for just one skill, then filter that item set for the second/third.
2) To avoid taking too much time/memory/recursion: I would sort the 5 * 60 items by that one skill. Then I would build combinations by seeking the ones that add up to 10 or more, starting from the highest-scoring items and stopping either when 10 is reached or when it can no longer be reached.
The function that builds all combinations would look like this:
1: if the total skill so far is 10+, every combination with the remaining items is OK. Stop.
2: if the current skill count is < 10, seek the next biggest item in the array for a piece type not yet worn.
If we reach 0 in the array, or a value such that (current count + value * number of piece types left) < 10, then it's time to stop :-)
Otherwise add its skill count, mark that piece-of-armor type as used, then call the function again for all items that might match.
I may not be precise enough, but you see the idea (a rough sketch follows below): use conditions on the recursive call to avoid exploding recursion, because 60*60*60*60*60 is a lot, while (quick)sorting 5*60 = 300 items is nothing.
To store your combinations, you might want to add the 'anything goes' case, to avoid storing/computing too many combinations for nothing. (Ex: if you have Carmak's Magical Hat, you have +100 in Coding and you can dress any way you want, the bugs will die! :-) )
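A very rough sketch of that recursion for a single skill, assuming each of the 5 slots has already been reduced to an int[] of point values for that skill and sorted in descending order (all names are illustrative):
using System;
using System.Collections.Generic;

static void FindSets(int[][] slots, int slot, int points, int[] picked, List<int[]> results)
{
    // Step 1: once we're already at 10+, every way of filling the remaining slots
    // works, so record the partial set (-1 = "anything goes") and stop descending.
    if (points >= 10 || slot == slots.Length)
    {
        if (points >= 10)
        {
            var set = (int[])picked.Clone();
            for (int s = slot; s < slots.Length; s++)
                set[s] = -1;
            results.Add(set);
        }
        return;
    }

    // Best total still obtainable from the later slots (each is sorted descending).
    int bestLater = 0;
    for (int s = slot + 1; s < slots.Length; s++)
        bestLater += slots[s][0];

    for (int i = 0; i < slots[slot].Length; i++)
    {
        // This slot is sorted descending too, so once this item is too weak to
        // reach 10 even with the best later items, the remaining items are as well.
        if (points + slots[slot][i] + bestLater < 10)
            break;
        picked[slot] = i;
        FindSets(slots, slot + 1, points + slots[slot][i], picked, results);
    }
}
Called as FindSets(slots, 0, 0, new int[slots.Length], results); the weak branches terminate almost immediately, which is where the speedup over the full 60^5 enumeration comes from.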

Algorithm for Team Scheduling - Quiz design

I've got a weird problem to solve - this is to be used in designing a quiz, but it's easiest to explain using teams.
There're 16 teams, and 24 matches. 4 teams play in every match. Each team has to appear once against 12/16 teams and twice against the remaining 3/16, and has to appear exactly 6 times. Any ideas on how to do this? If there's a software that can do this, that'd be great as well.
UPDATE:
I'm not sure if the above is even possible. Here is the minimum we're trying to accomplish:
Number of games is not set.
Each Game has 4 teams.
Each team gets an equal number of games.
Is this possible?
Check this ...
http://en.wikipedia.org/wiki/Round-robin_tournament
I think someone could generalize the algorithm so that it applies to matches with more than 2 teams ...
I know this doesn't answer the question, but it provides some tips ...
This also may help a little ...
http://en.wikipedia.org/wiki/Tournament_(graph_theory)
Note that each team plays 3 others per match, so it takes at least 5 matches for a team to play all 15 other teams. We hope, then, that there is a solution for 20 matches where each team plays 5 matches and meets every other team exactly once.
With 16 teams it's possible to construct a solution by hand in the following way...
Divide the 20 matches into 5 rounds
Number the teams 1 to 16
For each match in turn, for each of the 4 places in that match, allocate the first team which
is still available to play in that round
has not yet played any of the teams already allocated to that match
You can narrow the search for available teams somewhat by noting that each match must contain exactly one team from each match of the previous round, so for place n you need only consider the teams which played match n in the previous round.
If we want 24 matches then any random choice of matches will suffice in the sixth round to fit the original requirements. However, to also ensure that no exact matches are repeated we can switch pairs of teams between the matches in some previous round. That is, if {1,2,3,4} and {5,6,7,8} were matches in some round then in round 6 we'll have {1,2,7,8} and {3,4,5,6}. Since 1 and 2 played each other exactly once in rounds 1-5, in the match {1,2,3,4}, we certainly haven't played match {1,2,7,8} yet.
The choice of data structures to implement that efficiently is left as an exercise for the reader.
Pull out your combinatorics book. I remember questions like this being within its scope.
"Combinatorial Designs and Tournaments" was a textbook I had for a course about Combinatorial Designs that had this type of problem. One of my majors back in university was Combinatorics & Optimization, so I do remember a little about this kind of thing.
A little more clarity identifying the problem would be helpful. What type of sport are you trying to schedule? It sounds like you have a 16-person tennis league where, each week, four players show up on each of four courts to play a doubles match (players A&B vs C&D on one court, and similarly for players E through P on the other three). Is this what you're looking for? If so, the answer is easy. If not, I still don't understand what you're looking for.
