Consider the table below. It has 400,000 rows and 40 columns, with values that can range from 0 to 4,000.
MeasureValue1  MeasureValue2  MeasureValue3  ...  MeasureValue40
1              5              7              ...  2740
2              5              7              ...  2749
2              6              7              ...  2703
4              6              8              ...  2721
Conditions are then given per column; the other columns in the row must satisfy them before the value in that specific column is counted. These conditions are only known at runtime. Essentially, a group-by is performed across every column where the other columns in the row satisfy the given conditions. For example, the conditions could be as follows.
Count MeasureValue1 if MeasureValue2 equals 5
Count MeasureValue2 if MeasureValue1 equals 2
Count MeasureValue3 if MeasureValue1 equals 2 and MeasureValue2 equals 5
...
Count MeasureValue40 if MeasureValue1 equals 2 and MeasureValue2 equals 5
The final result would be a table with counts per value. Given the example conditions above, that table would look as follows.
Value  1  2  3  4  5  6  7  8  9  10  ...  2749
Count  1  1  0  0  1  1  1  0  0  0   ...  1
To tackle this problem I have written something akin to the code below. It is definitely faster than performing a LINQ GroupBy across every column, and faster even than multithreaded LINQ. It takes in an array of 400,000 × 40 values. You can imagine the 40 values as a single row in a table as described above, of which there are 400,000 rows. Those 40 values can range from 0 to 4,000.
Since both the number of rows and the number of values can change dynamically, a single flat array was chosen to store everything. Jagged arrays are not being used because they are clunky to work with, on top of my having read that they negatively affect performance.
The code then counts the values present in the array if some combination of the other 39 values meets a specific condition. In the example below, the first value is counted if the second value is a 5, the second value is counted if the first value is a 2, and the third value is counted if the first value is a 2 and the second value is a 5. These conditions are only known at runtime. The combinations of conditions, each with their own range of hundreds of possible values, quickly produce an array too long to store, which means I cannot precompute the counts into some array and be done with it.
This piece of code will be called billions of times per day, maybe more, so it is imperative that it is as quick as possible. My current implementation, which resembles the example below (there is an additional optimization which only evaluates the conditions once), has it down to an average of 40 milliseconds; I need to reduce this to at most 4 milliseconds by any means possible (except parallelism; obviously throwing more cores at it will make it faster). I briefly looked at SIMD but couldn't figure out how to apply it to this problem. How/where can I find the fastest algorithm to tackle this problem?
void Main()
{
    var values = new ushort[400_000 * 40]; // 400,000 rows of 40 values, flattened row by row
    var stopwatch = new Stopwatch();
    stopwatch.Start();
    var totals = new int[4_001]; // values range from 0 to 4,000 inclusive; int avoids overflowing a ushort at 400,000 rows
    for (var i = 0; i < values.Length; i += 40)
    {
        if (values[i + 1] == 5)
        {
            totals[values[i]]++;
        }
        if (values[i] == 2)
        {
            totals[values[i + 1]]++;
        }
        if (values[i] == 2 && values[i + 1] == 5)
        {
            totals[values[i + 2]]++;
        }
        // More ifs which count...
    }
    Console.WriteLine(stopwatch.ElapsedMilliseconds);
}
Related
I have data like that:
Time(seconds from start)
Value
15
2
16
4
19
2
25
9
There are a lot of entries (10,000+), and I need a fast way to find the sum over any time range, like the sum of range 16-25 seconds (which would be 4+2+9=15). This data will be changed dynamically many times (new entries are always appended at the bottom of the list).
I am thinking about using a sorted list plus binary search to determine positions and then just summing the values, but that could take too much time to calculate. Is there a more appropriate way to do this? NuGet packages or algorithm references would be appreciated.
Just calculate cumulative sum:
Time  Value  CumulativeSum
15    2      2
16    4      6
19    2      8
25    9      17
Then for the range [16, 25] it becomes a task of binary searching for the last entry below 16 (cumulative sum 2) and for the entry at exactly 25 (cumulative sum 17), which turns into 17 - 2 = 15.
Complexity: O(log(n)), where n is the size of the list.
Binary search implementation for lower/upper bound can be found in my repo - https://github.com/eocron/Algorithm/blob/master/Algorithm/Sorted/BinarySearchExtensions.cs
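For illustration, here is a minimal C# sketch of this approach. The class shape and method names are my own, not from the linked repo, and List<T>.BinarySearch is used in place of the repo's lower/upper bound helpers; times are assumed strictly increasing.

using System;
using System.Collections.Generic;

class PrefixSums
{
    private readonly List<int> times = new List<int>();
    private readonly List<long> cumulative = new List<long>(); // cumulative[i] = sum of values up to and including times[i]

    // Entries arrive in increasing time order, appended at the bottom.
    public void Add(int time, int value)
    {
        long previous = cumulative.Count > 0 ? cumulative[cumulative.Count - 1] : 0;
        times.Add(time);
        cumulative.Add(previous + value);
    }

    // Sum of all values with time in [from, to], O(log n) per query.
    public long RangeSum(int from, int to)
    {
        int hi = UpperBound(to) - 1;   // last index with times[hi] <= to
        int lo = UpperBound(from - 1); // first index with times[lo] >= from
        if (hi < lo) return 0;
        return cumulative[hi] - (lo > 0 ? cumulative[lo - 1] : 0);
    }

    // Index of the first entry with a time greater than the given one
    // (times are strictly increasing, so no duplicate handling is needed).
    private int UpperBound(int time)
    {
        int index = times.BinarySearch(time);
        return index >= 0 ? index + 1 : ~index;
    }
}

With the example data, RangeSum(16, 25) finds the cumulative sums 17 and 2 and returns 15.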
I have 9 numbers which I want to divide into two lists, and both lists need to reach a certain amount when summed up. For example, I have this list of ints:
List<int> test = new List<int>
{
1963000, 1963000, 393000, 86000,
393000, 393000, 176000, 420000,
3193000
};
And I want to have 2 lists of numbers that when you sum them up, they both reach over 4 million.
It doesn't matter if the 2 lists don't have the same amount of numbers. If one list reaches 4 million with only 2 numbers, and the other reaches 7 million with the remaining 7 numbers, that is fine.
As long as both lists summed up are equal to 4 million or higher.
Is this certain sum low enough to be reached easily?
If yes, then your algorithm may be as simple as: iterate i from 1 to the number of items, summing up the first i numbers. If the sum is higher than your certain sum (e.g. 4 million), then you are finished; otherwise, increment i.
BUT: if your certain sums are high and the partition is not so trivial to find, then you have the famous Partition Problem (https://en.wikipedia.org/wiki/Partition_problem). This is not that simple, but there are algorithms for it. Read the Wikipedia article or try to google "Partition problem solution" or similar.
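As a rough sketch of the simple case described above (the method name, the out parameters, and the use of long sums are my own additions):

using System.Collections.Generic;
using System.Linq;

// Greedy split: fill the first list until it reaches the target, put the
// rest in the second list, then check the second list also reaches the target.
static bool TrySplit(List<int> numbers, long target, out List<int> first, out List<int> second)
{
    first = new List<int>();
    second = new List<int>();
    long sum = 0;
    foreach (var n in numbers)
    {
        if (sum < target) { first.Add(n); sum += n; }
        else second.Add(n);
    }
    return sum >= target && second.Sum(x => (long)x) >= target;
}

For the list above with a target of 4,000,000 this puts the first three numbers (summing to 4,319,000) in one list and the remaining six (summing to 4,661,000) in the other.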
I have a list of entities, and for the purpose of analysis, an entity can be in one of three states. Of course I wish it was only two states, then I could represent that with a bool.
In most cases there will be a list of entities where the size of the list is usually 100 < n < 500.
I am working on analyzing the effects of the combinations of the entities and the states.
So if I have 1 entity, then I can have 3 combinations. If I have two entities, I can have nine combinations, and so on.
Because of the amount of combinations, brute forcing this will be impractical (it needs to run on a single system). My task is to find good-but-not-necessarily-optimal solutions that could work. I don't need to test all possible permutations, I just need to find one that works. That is an implementation detail.
What I do need to do is to register the combinations possible for my current data set - this is basically to avoid duplicating the work of analyzing each combination. Every time a process arrives at a certain configuration of combinations, it needs to check if that combo is already being worked at or if it was resolved in the past.
So if I have x amount of tri-state values, what is an efficient way of storing and comparing this in memory? I realize there will be limitations here. Just trying to be as efficient as possible.
I can't think of a more effective unit of storage than two bits, where one of the four "bit states" is not used. But I don't know how to make this efficient. Do I need to choose between optimizing for storage size and optimizing for performance?
How can something like this be modeled in C# in a way that wastes the least amount of resources and still performs relatively well when a process needs to ask "Has this particular combination of tri-state values already been tested?"?
Edit: As an example, say I have just 3 entities, and the state is represented by a simple integer, 1, 2 or 3. We would then have this list of combinations:
111
112
113
121
122
123
131
132
133
211
212
213
221
222
223
231
232
233
311
312
313
321
322
323
331
332
333
I think you can break this down as follows:
You have a set of N entities, each of which can have one of three different states.
Given one particular permutation of states for those N entities, you
want to remember that you have processed that permutation.
It therefore seems that you can treat the N entities as a base-3 number with N digits.
When considering one particular set of states for the N entities, you can store that as an array of N bytes where each byte can have the value 0, 1 or 2, corresponding to the three possible states.
That isn't a memory-efficient way of storing the states for one particular permutation, but that's OK because you don't need to store that array. You just need to store a single bit somewhere corresponding to that permutation.
So what you can do is to convert the byte array into a base 10 number that you can use as an index into a BitArray. You then use the BitArray to remember whether a particular permutation of states has been processed.
To convert a byte array representing a base three number to a decimal number, you can use this code:
public static int ToBase10(byte[] entityStates) // Each state can be 0, 1 or 2.
{
    int result = 0;
    for (int i = 0, n = 1; i < entityStates.Length; n *= 3, ++i)
        result += n * entityStates[i];
    return result;
}
Given that you have numEntities different entities, you can then create a BitArray like so:
int numEntities = 4;
int numPerms = (int)Math.Pow(3, numEntities); // 3 states raised to the number of entities
BitArray states = new BitArray(numPerms);
Then states can store a bit for each possible permutation of states for all the entities.
Let's suppose that you have 4 entities A, B, C and D, and you have a permutation of states (which will be 0, 1 or 2) as follows: A2 B1 C0 D1. That is, entity A has state 2, B has state 1, C has state 0 and D has state 1.
You would represent that as a byte array like so:
byte[] permutation = { 2, 1, 0, 1 };
Then you can convert that to a base 10 number like so:
int asBase10 = ToBase10(permutation);
Then you can check if that permutation has been processed like so:
if (!states[asBase10])
{
    // Not processed, so process it.
    process(permutation);
    states[asBase10] = true; // Remember that we processed it.
}
Without getting overly fancy with algorithms and data structures, and assuming your tri-state values can be represented as strings and don't have an easily determined fixed maximum count, i.e. "111", "112", etc. (or even "1:1:1", "1:1:2"), a simple SortedSet may end up being fairly efficient.
As a bonus, it doesn't care about the number of values in your set.
SortedSet<string> alreadyTried = new SortedSet<string>();

if (!HasSetBeenTried("1:1:1"))
{
    // do whatever
}

if (!HasSetBeenTried("500:212:100"))
{
    // do whatever
}

public bool HasSetBeenTried(string set)
{
    // True if this set was seen before; otherwise record it and report false.
    if (alreadyTried.Contains(set)) return true;
    alreadyTried.Add(set);
    return false;
}
Simple math says:
3 entities in 3 states make 27 combinations.
So you need exactly log(27)/log(2) ≈ 4.75 bits to store that information.
Because a PC can only make use of whole bits, you have to "waste" ~0.25 bits and use 5 bits per combination.
The more data you gather, the better you can pack that information, but in the end a compression algorithm could help even more.
Again: you only asked for memory efficiency, not performance.
In general you can calculate the bits you need by Math.Ceiling(Math.Log(noCombinations, 2)).
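A quick sketch of that formula (the helper name is mine):

using System;

// Bits needed to distinguish all combinations of n tri-state entities.
static int BitsNeeded(int numEntities)
{
    // Equivalent to Math.Ceiling(Math.Log(Math.Pow(3, numEntities), 2)),
    // but written this way to avoid overflowing a double for large n.
    return (int)Math.Ceiling(numEntities * Math.Log(3, 2));
}

BitsNeeded(3) returns 5, matching the ~4.75 bits above.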
This isn't a complicated problem, but I can't for whatever reason think of a simple way to do this with the modulus operator. Basically I have a collection of N items and I want to display them in a grid.
I can display a maximum of 3 entries across and infinitely many vertically; they are not fixed width. So if I have 2 items they get displayed like this: [1][2]. If I have 4 items they get displayed stacked like this:
[1][2]
[3][4]
If I have 5 items it should look like this:
[ 1 ][ 2 ]
[3][4][5]
Seven items is slightly more complicated:
[ 1 ][ 2 ]
[ 3 ][ 4 ]
[5][6][7]
This is one of those things where if I slept on it, it would be brain dead obvious in the morning, but all I can think about doing involves complicated loops and state variables. There has to be an easier way.
I'm doing this in C# but I doubt the language matters.
By maximizing the number of rows that have three items, you can minimize the total number of rows. Thus six items would be grouped as two rows of 3 rather than three rows of 2:
[1][2][3]
[4][5][6]
and ten items would be grouped as two rows of 2 and two rows of 3 rather than five rows of 2:
[ 1 ][ 2 ]
[ 3 ][ 4 ]
[5][6][7 ]
[8][9][10]
If you want rows with two items first, then you keep peeling off two items until the remaining items are divisible by 3. As you go through the loop, you need to keep track of the number of remaining items using an index or whatnot.
In your loop to populate each row, you can check these conditions:
// logic within each loop iteration
if (remaining % 3 == 0)  // take the remaining items in threes; break the loop
else if (remaining >= 4) // take two items, leaving two or more remaining
else                     // take the remaining items, which will be one or two; break the loop
If we walk through the example of 10 items, the process would go as follows:
10 items remaining. 10 % 3 != 0. Since 10 > 4, take two items.
8 items remaining. 8 % 3 != 0. Since 8 > 4, take two items.
6 items remaining. 6 % 3 = 0. Take those 6 items in groups of three.
To take your example of 7 items:
7 items remaining. 7 % 3 != 0. Since 7 > 4, take two items.
5 items remaining. 5 % 3 != 0. Since 5 > 4, take two items.
3 items remaining. 3 % 3 = 0. Take those 3 items as a group.
And here's the result for 4 items:
4 items remaining. 4 % 3 != 0. Since remaining = 4, take two items.
2 items remaining. 2 % 3 != 0. 2 < 4. Fall to else condition, take remaining items.
I think that'll work. At least, at 12:30 a.m. it seems like it should work.
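For illustration, here is a minimal sketch of that peeling loop. The method shape is my own; it only returns how many items go in each row, top to bottom.

using System.Collections.Generic;

// Peel off rows of two until the remainder divides by three, as walked through above.
static List<int> RowSizes(int remaining)
{
    var rows = new List<int>();
    while (remaining > 0)
    {
        if (remaining % 3 == 0) { rows.Add(3); remaining -= 3; }  // take the remaining items in threes
        else if (remaining >= 4) { rows.Add(2); remaining -= 2; } // take two, leaving two or more
        else { rows.Add(remaining); remaining = 0; }              // take what's left (one or two)
    }
    return rows;
}

RowSizes(10) yields 2, 2, 3, 3 and RowSizes(7) yields 2, 2, 3, matching the walkthroughs above.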
if ((list.Count % 2) == 0)
{
    // Display all items as pairs: [][]
}
else
{
    // Display all but the last three as pairs: [][]
    // Display the last three as [][][]
}
How about this pseudo-code:
if n mod 3 = 1
    first 2 rows have 2 items each (assuming n >= 4)
    all remaining rows have 3 items
else if n mod 3 = 2
    first row has 2 items
    all remaining rows have 3 items
else
    all rows have 3 items
So, given that: a) the objective is to minimize the number of rows, b) a row cannot have more than 3 items, c) a row should have 3 items if possible, and d) you cannot have a row with a single item unless it is the only item, I would say the algorithm goes as follows:
If there is only one item, it will be alone in its own row; done.
Calculate the 'tentative' number of rows by dividing the number of items by 3.
If the remainder (N % 3) is 0, then all rows will have 3 items.
If the remainder is 1, then there will be an additional row, and the last 2 rows will only have 2 items each.
If the remainder is 2, then there will be an additional row, and it will only have 2 items.
This algorithm will produce a slightly different format from the one you were envisioning, (the 3-item rows will be at the top, the 2-item rows will be at the bottom,) but it satisfies the constraints. If you need the 2-item rows to be at the top, you can modify it.
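A sketch of this variant (the tuple shape and naming are mine); it returns how many 3-item and 2-item rows to render:

// Row counts from the remainder; 3-item rows at the top, 2-item rows at the bottom.
static (int threeItemRows, int twoItemRows) RowCounts(int n)
{
    if (n == 1) return (0, 0); // a single item is alone in its own row (step 1 above)
    switch (n % 3)
    {
        case 0: return (n / 3, 0);     // all rows have 3 items
        case 1: return (n / 3 - 1, 2); // additional row; the last 2 rows have 2 items each
        default: return (n / 3, 1);    // additional row with only 2 items
    }
}

RowCounts(7) gives one 3-item row and two 2-item rows, the same grouping as the walkthrough above, only with the 3-item rows first.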
I have been looking for some time now at how a programmer can simulate an AI decision based on percentages of actions for Final Fantasy Tactics-like games (strategy games).
Say for example that the AI character has the following actions:
Attack 1: 10%
Attack 2: 9%
Magic : 4%
Move : 1%
All of this is far from adding up to 100%.
Now at first I thought about having an array with 100 slots: attack 1 would have 10 slots, attack 2 would have 9 slots, and so on. Combined with a random index I could then get the action to perform. My problem here is that it is not really efficient, or doesn't seem to be. Another important question: what do I do if I land on an empty slot? Do I have to scale each character's actions up to 100%, or maybe define a "default" action for everyone?
Or maybe there is a more efficient way to approach all of this? I think percentages are the easiest way to implement an AI.
The best answer I can come up with is to make a list of all the possible moves you want the character to have, give each a relative value, then scale all of them to total 100%.
EDIT:
For example, here are three moves I have. I want attack and magic to be equally likely, and fleeing to be half as likely as attacking or using magic:
attack = 20
magic = 20
flee = 10
This adds up to 50, so dividing each by this total gives me a fractional value (multiply by 100 for percentage):
attack = 0.4
magic = 0.4
flee = 0.2
Then I would build from this a list of cumulative values (i.e. each entry is the sum of that entry and all that came before it):
attack = 0.4
magic = 0.8
flee = 1
Now, generate a random number between 0 and 1 and find the first entry in the list that is greater than or equal to that number. That is the move you make.
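A minimal sketch of this selection (the method and tuple names are mine; scaling the roll by the total means the weights need not sum to 1):

using System;
using System.Collections.Generic;

// Walk the cumulative weights until the roll falls inside an entry.
static string ChooseMove(IReadOnlyList<(string Move, double Weight)> moves, Random rng)
{
    double total = 0;
    foreach (var m in moves) total += m.Weight;

    double roll = rng.NextDouble() * total; // uniform in [0, total)
    double cumulative = 0;
    foreach (var m in moves)
    {
        cumulative += m.Weight;
        if (roll < cumulative) return m.Move;
    }
    return moves[moves.Count - 1].Move; // guard against floating-point rounding
}

With weights attack = 20, magic = 20, flee = 10, this picks attack and magic 40% of the time each and flee 20% of the time.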
No, you just create thresholds. One simple way is:
0 - 9 -> Attack1
10 - 18 -> Attack 2
19 - 22 -> Magic
23 -> Move
24 - 99 -> Something else (the thresholds need to add up to 100)
Now create a random number and mod it by 100 (so num = randomNumber % 100) to determine your action. The better the random number generator, the closer you will get to a proper distribution. You then take the result and see which category it falls into. You can actually make this even more efficient, but it is a good start.
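As a sketch, the threshold table above maps onto a chained conditional (the action labels are taken straight from the table):

using System;

var random = new Random();
int num = random.Next(100); // uniform 0-99, equivalent to randomNumber % 100
string action =
    num <= 9  ? "Attack 1" :
    num <= 18 ? "Attack 2" :
    num <= 22 ? "Magic"    :
    num == 23 ? "Move"     :
                "Something else"; // 24-99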
Well, if they don't all add up to 100 they aren't really percentages. This doesn't matter, though; you just need to figure out the relative probability of each action. To do this, use the following formula:
prob = value_of_action / total_value_of_all_actions
This gives you a number between 0 and 1. If you really want a percentage rather than a fraction, multiply it by 100.
Here is an example:
prob_attack = 10 / (10 + 9 + 4 + 1)
            = 10 / 24
            = 0.4167
This equates to attack being chosen 41.67% of the time.
You can then generate thresholds as mentioned in other answers, and use a random number between 0 and 1 to choose your action.