Complicated Logic for Variation of the Same Rhythm in C#

I am working on checking the variation of some integers and filtering them. Say we start entering integers such as 700, 768, 820, 320, 790, 260, etc. I want to check whether each integer is within the same rhythm and harmony, so 320 would be removed or ignored.
Let us say the new value must not be less than 75% of the lower number and must not be more than 75% higher.
Actually the problem is not checking whether the new entry is higher or lower; the problem is when the rhythm or harmony of the entries suddenly becomes higher. Let's take an example:
768, 799, 890, 320, 380, 799, 820, 1230, 1300, 1340, 1342, 1400, 680, 1340, 1280, 1490
In that case we started in a range of 700 ~ 890, so 320 is filtered out; then the range suddenly becomes 1230 ~ 1400, so 680 is filtered out as well. We cannot predict in advance what the range will be.
So how do I build logic that can filter values out and set the upper and lower limits?
No need for any code; I just need a logical explanation.
Regards...
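Purely as an illustration of the logic (the question asks for the idea, not code), here is a minimal C# sketch. It assumes "in rhythm" means between 75% and 125% of the median of recently accepted values, and it lets the rhythm shift once a few consecutive rejected values agree with each other; the window size and shift threshold are made-up knobs, not something stated in the question.
using System;
using System.Collections.Generic;
using System.Linq;

class RhythmFilter
{
    private readonly List<int> _rhythm = new List<int>();    // recently accepted values
    private readonly List<int> _outliers = new List<int>();  // recently rejected values
    private const double LowerFactor = 0.75;                 // assumed lower edge of the band
    private const double UpperFactor = 1.25;                 // assumed upper edge of the band
    private const int RhythmWindow = 5;                      // how much history defines the rhythm
    private const int ShiftCount = 3;                        // rejects needed to accept a new rhythm

    public bool Accept(int value)
    {
        if (_rhythm.Count == 0) { _rhythm.Add(value); return true; }

        double median = Median(_rhythm);
        if (value >= median * LowerFactor && value <= median * UpperFactor)
        {
            _rhythm.Add(value);                               // the value moves the rhythm along
            if (_rhythm.Count > RhythmWindow) _rhythm.RemoveAt(0);
            _outliers.Clear();
            return true;
        }

        // Rejected: remember it. If several recent rejects agree with each other,
        // treat them as the new rhythm, because the range has genuinely shifted.
        _outliers.Add(value);
        double outlierMedian = Median(_outliers);
        _outliers.RemoveAll(v => v < outlierMedian * LowerFactor || v > outlierMedian * UpperFactor);
        if (_outliers.Count >= ShiftCount)
        {
            _rhythm.Clear();
            _rhythm.AddRange(_outliers);
            _outliers.Clear();
            return true;                                      // this value opens the new range
        }
        return false;                                         // filtered out
    }

    private static double Median(List<int> values)
    {
        var sorted = values.OrderBy(v => v).ToList();
        return sorted[sorted.Count / 2];
    }

    static void Main()
    {
        var filter = new RhythmFilter();
        int[] input = { 768, 799, 890, 320, 380, 799, 820, 1230, 1300, 1340, 1342, 1400, 680 };
        foreach (int n in input)
            Console.WriteLine($"{n}: {(filter.Accept(n) ? "kept" : "filtered out")}");
    }
}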

Related

ML.net FastForestRegressor output not as expected

Using the standard FastForestRegressor like this:
pipeline.Add(new FastForestRegressor());
The result I get is more of an 'average' than a 'this must be it' prediction.
The following image contains time (HH:mm:ss) slots, with a number:
The higher the number, the more likely the prediction should give me that hour. Now, in bold at the bottom you can see the predictions, which are indeed some kind of average of all the given values; it predicts a time that does not even have an entry. What I expect:
Column 1: 9:00:00 has 140 values, so it should return a prediction close to this one.
Column 2: 14:00:00 has 152 values, and the two below it are also high, so something in the range between 14:00:00 and 16:00:00.
I tried to tweak the parameters of the FastForestRegressor but that doesn't seem to change anything at all.
My data is stored as:
time,day
480,1
480,1
..etc.
Now, for the upper-left one in the image (8:00:00, 110), 110 lines with the value (480,1) are stored in a file. Maybe I should add an extra column with the count?
To me it seems like I have to adjust some grouping, or smoothing, so that it takes the highest possible candidate and not an average, but I can't seem to find how.
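This is not an ML.NET answer, just a hedged sketch of the "highest candidate" idea in plain C#/LINQ: group the raw (time, day) rows, count them, and return the most frequent time slot per day instead of a regression average. The file name below is an assumption; the layout follows the question's "time,day" sample.
using System;
using System.IO;
using System.Linq;

class ModePerDay
{
    static void Main()
    {
        var rows = File.ReadLines("times.csv")                // assumed file name
            .Skip(1)                                          // skip the "time,day" header
            .Select(line => line.Split(','))
            .Select(parts => (Time: int.Parse(parts[0]), Day: int.Parse(parts[1])));

        var mostLikely = rows
            .GroupBy(r => r.Day)
            .Select(dayGroup => new
            {
                Day = dayGroup.Key,
                Time = dayGroup.GroupBy(r => r.Time)          // count each time slot
                               .OrderByDescending(g => g.Count())
                               .First().Key
            });

        foreach (var entry in mostLikely)
            Console.WriteLine($"Day {entry.Day}: most frequent time = {entry.Time} minutes");
    }
}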

How to identify 5 numbers within 5% of each other out of 6 consecutive numbers

Using C# I have to:
Take user-entered numbers and decide whether 5 out of 6 of them are within 5% of each other. Also identify when it is impossible to reach success before getting all 6 numbers. An example of this would be in the first 3 numbers: if 2 of them were not within 5% of each other, there would be no use continuing.
I've tried taking the first 2 numbers into an array and setting a Min and Max, assuming they are within 5%. By the time I get to the 3rd and 4th numbers, which can also reset my Min and Max, it has turned into a bowl of spaghetti :)
Please help!
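One way to avoid the spaghetti, sketched under the assumption that "within 5% of each other" means the group's maximum is at most 5% above its minimum: drop each number in turn and test the remaining five. Running the same kind of test on the numbers entered so far (allowing one outlier) gives the early bail-out.
using System;
using System.Linq;

class FiveOfSix
{
    // True if some 5 of the 6 numbers satisfy max <= min * 1.05.
    static bool HasFiveWithinFivePercent(double[] numbers)
    {
        if (numbers.Length != 6) throw new ArgumentException("Expected 6 numbers.");

        for (int skip = 0; skip < numbers.Length; skip++)
        {
            var group = numbers.Where((_, i) => i != skip).ToArray();   // every group of 5
            if (group.Max() <= group.Min() * 1.05)
                return true;
        }
        return false;
    }

    static void Main()
    {
        double[] sample = { 100, 101, 102, 103, 104, 150 };
        Console.WriteLine(HasFiveWithinFivePercent(sample));   // True: 150 is the single outlier
    }
}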

How does String.Format work in this situation?

I have a website where you can buy stuff, and we want to format the order ID that goes to our portal in a certain way. I am using the string.Format method to format it like this:
Portal.OrderID = string.Format("{0}{1:0000000}-{2:000}",
    "Z",
    this.Order.OrderID,
    "000");
So we want it to look basically like Z0545698-001. My question is: if I am using string.Format, will it blow up if this.Order.OrderID is longer than 7 digits?
If so, how can I keep the same formatting (i.e. Z 1234567 - 000), but have the first set of numbers be a minimum of 7 digits (with anything shorter than 7 digits getting leading 0's), and have anything longer than 7 digits just extend the formatting, so I could get an order number like Z12345678-001?
Use exactly the code that you have, because that's what it does.
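A quick check of that behavior (the values here are only illustrative): the custom "0000000" format pads shorter numbers up to 7 digits and simply grows for longer ones rather than truncating; the third placeholder receives the literal string "000", so its format specifier has no effect.
using System;

class FormatDemo
{
    static void Main()
    {
        Console.WriteLine(string.Format("{0}{1:0000000}-{2:000}", "Z", 545698, "000"));
        // -> Z0545698-000   (6 digits padded up to 7)

        Console.WriteLine(string.Format("{0}{1:0000000}-{2:000}", "Z", 12345678, "000"));
        // -> Z12345678-000  (8 digits, the format expands instead of truncating)
    }
}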

Create a summary description of a schedule given a list of shifts

Assuming I have a list of shifts for an event (in the format start date/time, end date/time), is there some sort of algorithm I could use to create a generalized summary of the schedule? It is quite common for most of the shifts to fall into some sort of common recurrence pattern (i.e. Mondays from 9:00 am to 1:00 pm, Tuesdays from 10:00 am to 3:00 pm, etc.). However, there can (and will) be exceptions to this rule (e.g. one of the shifts fell on a holiday and was rescheduled for the next day). It would be fine to exclude those from my "summary", as I'm looking to provide a more general answer about when this event usually occurs.
I guess I'm looking for some sort of statistical method to determine the day and time occurrences and create a description based on the most frequent occurrences found in the list. Is there some sort of general algorithm for something like this? Has anyone created something similar?
Ideally I'm looking for a solution in C# or VB.NET, but don't mind porting from any other language.
Thanks in advance!
You may use Cluster Analysis.
Clustering is a way to segregate a set of data into similar components (subsets). The "similarity" concept involves some definition of "distance" between points. Many formulas for the distance exist, among others the usual Euclidean distance.
Practical Case
Before pointing you to the quirks of the trade, let's show a practical case for your problem, so you may get involved in the algorithms and packages, or discard them upfront.
For simplicity, I modelled the problem in Mathematica, because cluster analysis is included in the software and is very straightforward to set up.
First, generate the data. The format is { DAY, START TIME, END TIME }.
The start and end times have a random variable added (+half hour, zero, -half hour) to show the capability of the algorithm to cope with "noise".
There are three days, two shifts per day and one extra (the last one) "anomalous" shift, which starts at 7 AM and ends at 9 AM (poor guys!).
There are 150 events in each "normal" shift and only two in the exceptional one.
As you can see, some shifts are not very far apart from each other.
I include the code in Mathematica, in case you have access to the software. I'm trying to avoid using the functional syntax, to make the code easier to read for "foreigners".
Here is the data generation code:
Rn[] := 0.5 * RandomInteger[{-1, 1}];
monshft1 = Table[{1, 10 + Rn[], 15 + Rn[]}, {150}];  (* shift 1 *)
monshft2 = Table[{1, 12 + Rn[], 17 + Rn[]}, {150}];  (* shift 2 *)
wedshft1 = Table[{3, 10 + Rn[], 15 + Rn[]}, {150}];  (* shift 3 *)
wedshft2 = Table[{3, 14 + Rn[], 17 + Rn[]}, {150}];  (* shift 4 *)
frishft1 = Table[{5, 10 + Rn[], 15 + Rn[]}, {150}];  (* shift 5 *)
frishft2 = Table[{5, 11 + Rn[], 15 + Rn[]}, {150}];  (* shift 6 *)
monexcp  = Table[{1,  7 + Rn[],  9 + Rn[]}, {2}];    (* shift 7, the anomalous one *)
Now we join the data, obtaining one big dataset:
data = Join[monshft1, monshft2, wedshft1, wedshft2, frishft1, frishft2, monexcp];
Let's run a cluster analysis for the data:
clusters = FindClusters[data, 7, Method->{"Agglomerate","Linkage"->"Complete"}]
"Agglomerate" and "Linkage" -> "Complete" are two fine tuning options of the clustering methods implemented in Mathematica. They just specify we are trying to find very compact clusters.
I specified to try to detect 7 clusters. If the right number of shifts is unknown, you can try several reasonable values and see the results, or let the algorithm select the more proper value.
We can get a chart with the results, each cluster in a different color (don't mind the code)
ListPointPlot3D[ clusters,
PlotStyle->{{PointSize[Large], Pink}, {PointSize[Large], Green},
{PointSize[Large], Yellow}, {PointSize[Large], Red},
{PointSize[Large], Black}, {PointSize[Large], Blue},
{PointSize[Large], Purple}, {PointSize[Large], Brown}},
AxesLabel -> {"DAY", "START TIME", "END TIME"}]
In the resulting plot you can see our seven clusters clearly apart.
That solves part of your problem: identifying the data. Now you also want to be able to label it.
So, we'll get each cluster and take means (rounded):
Table[Round[Mean[clusters[[i]]]], {i, 7}]
The result is:
Day  Start  End
 1     10    15
 1     12    17
 3     10    15
 3     14    17
 5     10    15
 5     11    15
 1      7     9
And with that you get again your seven classes.
Now, perhaps you want to classify the shifts regardless of the day. If the same people do the same task at the same time every day, it's not useful to call it the "Monday shift from 10 to 15", because it also happens on Wednesdays and Fridays (as in our example).
Let's analyze the data disregarding the first column:
clusters = FindClusters[Take[data, All, -2], Method -> {"Agglomerate", "Linkage" -> "Complete"}];
In this case, we are not selecting the number of clusters to retrieve, leaving the decision to the package.
In the resulting plot you can see that five clusters have been identified.
Let's try to "label" them as before:
Grid[Table[Round[Mean[clusters[[i]]]], {i, 5}]]
The result is:
Start  End
  10    15
  12    17
  14    17
  11    15
   7     9
Which is exactly what we "suspected": there are repeated events each day at the same time that could be grouped together.
Edit: Overnight Shifts and Normalization
If you have (or plan to have) shifts that start one day and end on the following day, it's better to model
{Start-Day Start-Hour Length} // Correct!
than
{Start-Day Start-Hour End-Day End-Hour} // Incorrect!
That's because, as with any statistical method, the correlation between the variables must be made explicit, or the method fails miserably. The principle could be stated as something like "keep your candidate data normalized"; both ideas are almost the same (the attributes should be independent).
--- Edit end ---
By now I guess you understand pretty well what kind of things you can do with this kind of analysis.
Some references
Of course, Wikipedia, its "references" and "further reading" sections are a good guide.
There is a nice video here showing the capabilities of Statsoft, and you can also pick up many ideas there about other things you can do with the algorithm.
Here is a basic explanation of the algorithms involved.
Here you can find the impressive functionality of R for cluster analysis (R is a VERY good option).
Finally, here you can find a long list of free and commercial software for statistics in general, including clustering.
HTH!
I don't think any ready-made algorithm exists, so unfortunately you need to come up with something yourself. Because the problem is not really well defined (from a mathematical perspective), it will require testing on some "real" data that is reasonably representative, and a fair bit of tweaking.
I would start by dividing your shifts into weekdays (because, if I understand correctly, you are after a weekly view), so for each weekday we have the shifts that fall on that day. Then, for each day, I would group the shifts that happen at the same time (or "roughly" at the same time; here you need to come up with some heuristic, e.g. both start and end times deviate from the group's average by no more than 15 or 30 minutes). Now we need another heuristic to decide whether a group is relevant: if a 1pm-3pm shift on a Monday happened only once it is probably not relevant, but if it happened on at least 70% of the Mondays covered by the data then it is. Your relevant groups for each day of the week then form the schedule you are after.
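A minimal C# sketch of that heuristic, assuming an hourly "roughly the same time" bucket and a 70% relevance threshold (the Shift record, the bucketing and the thresholds are all assumptions, not something fixed by the answer above):
using System;
using System.Collections.Generic;
using System.Linq;

record Shift(DateTime Start, DateTime End);

static class ScheduleSummary
{
    public static IEnumerable<string> Summarize(IReadOnlyList<Shift> shifts, double minShare = 0.7)
    {
        // Rough number of weeks spanned by the data, used by the relevance heuristic.
        int weeksCovered = Math.Max(1, (int)Math.Ceiling(
            (shifts.Max(s => s.Start) - shifts.Min(s => s.Start)).TotalDays / 7.0));

        return shifts
            .GroupBy(s => (Day: s.Start.DayOfWeek,
                           StartHour: s.Start.Hour,               // crude "same time" bucket
                           EndHour: s.End.Hour))
            .Where(g => g.Count() >= minShare * weeksCovered)     // keep only recurring groups
            .OrderBy(g => g.Key.Day).ThenBy(g => g.Key.StartHour)
            .Select(g => $"{g.Key.Day}s from {g.Key.StartHour}:00 to {g.Key.EndHour}:00");
    }
}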
Could we see an example data set? If it is really "clean" data then you could simply find the mode of the start and end times.
One option would be to label all the start times as +1 and the end times as -1, then create a three-column table of times (both starts and ends), labels (+1 or -1), and the number of staff at that time (starting from zero and adding or subtracting staff using the label), and sort the whole thing in time order.
This time series is now a summary descriptor of your staff levels, and the labels form a series as well. Now you can apply time-series statistics to both to look for daily, weekly or monthly patterns.
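A small C# sketch of that table, assuming shifts arrive as (start, end) pairs (the types and names here are illustrative only):
using System;
using System.Collections.Generic;
using System.Linq;

static class StaffLevels
{
    public static List<(DateTime Time, int Label, int Staff)> Build(
        IEnumerable<(DateTime Start, DateTime End)> shifts)
    {
        // Turn every shift boundary into a labelled event, sorted in time order.
        var events = shifts
            .SelectMany(s => new[] { (Time: s.Start, Label: +1), (Time: s.End, Label: -1) })
            .OrderBy(e => e.Time);

        var rows = new List<(DateTime Time, int Label, int Staff)>();
        int staff = 0;
        foreach (var e in events)
        {
            staff += e.Label;                     // running headcount
            rows.Add((e.Time, e.Label, staff));
        }
        return rows;
    }
}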

How to manage AI actions based on percentages

I have been looking for some time at how a programmer can simulate an AI decision based on percentages of actions, as in Final Fantasy Tactics-like games (strategy games).
Say for example that the AI character has the following actions:
Attack 1: 10%
Attack 2: 9%
Magic : 4%
Move : 1%
All of this is far from adding up to 100%.
Now, at first I thought about having an array with 100 empty slots: Attack 1 would have 10 slots, Attack 2 would have 9 slots, and so on. Combined with a random number, I could then get the action to perform. My problem is that this doesn't seem very efficient. Also, importantly, what do I do if I land on an empty slot? Do I have to scale every character's actions up to 100%, or maybe define a "default" action for everyone?
Or maybe there is a more efficient way to look at all of this? I think percentages are the easiest way to implement an AI.
The best answer I can come up with is to make a list of all the possible moves you want the character to have, give each a relative value, then scale all of them to total 100%.
EDIT:
For example, here are three moves I have. I want attack and magic to be equally likely, and fleeing to be half as likely as attacking or using magic:
attack = 20
magic = 20
flee = 10
This adds up to 50, so dividing each by this total gives me a fractional value (multiply by 100 for percentage):
attack = 0.4
magic = 0.4
flee = 0.2
Then, I would make from this a list of cumulative values (i.e. each entry is a sum of that entry and all that came before it):
attack = 0.4
magic = 0.8
flee = 1
Now, generate a random number between 0 and 1 and find the first entry in the list that is greater than or equal to that number. That is the move you make.
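A short C# sketch of that cumulative approach (the move names and weights are the ones from the example above; the helper itself is illustrative):
using System;
using System.Collections.Generic;
using System.Linq;

static class MovePicker
{
    private static readonly Random Rng = new Random();

    public static string Pick(IReadOnlyList<(string Move, double Weight)> moves)
    {
        double total = moves.Sum(m => m.Weight);
        double roll = Rng.NextDouble() * total;       // 0..total, so no pre-normalisation needed

        double cumulative = 0;
        foreach (var (move, weight) in moves)
        {
            cumulative += weight;                     // walk the cumulative list
            if (roll <= cumulative)
                return move;
        }
        return moves[moves.Count - 1].Move;           // guard against floating-point edge cases
    }
}

// Usage: MovePicker.Pick(new[] { ("attack", 20.0), ("magic", 20.0), ("flee", 10.0) });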
No, you just create thresholds. One simple way is:
0 - 9   -> Attack 1
10 - 18 -> Attack 2
19 - 22 -> Magic
23      -> Move
24 - 99 -> Something else (the ranges need to add up to 100)
Now create a random number between 0 and 99 (for example num = randomNumber % 100) to determine your action: take the result and see which category it falls into. The better the random number generator, the closer you will get to the intended distribution. You can make this even more efficient, but it is a good start.
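For example, a minimal version of that threshold table in C# (the "default" branch for the remaining 24-99 range is an assumption about what the empty slots should do):
using System;

static class ThresholdAi
{
    private static readonly Random Rng = new Random();

    public static string ChooseAction()
    {
        int roll = Rng.Next(100);            // uniform 0..99

        if (roll <= 9)  return "Attack 1";   //  0 -  9
        if (roll <= 18) return "Attack 2";   // 10 - 18
        if (roll <= 22) return "Magic";      // 19 - 22
        if (roll == 23) return "Move";       // 23
        return "Default action";             // 24 - 99
    }
}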
Well, if they don't all add up to 100 they aren't really percentages. That doesn't matter, though; you just need to figure out the relative probability of each action. To do this, use the following formula:
prob = value_of_action / total_value_of_all_actions
This gives you a number between 0 and 1. if you really want a percentage rather than a fraction, multiply it by 100.
here is an example:
prob_attack = 10 / (10 + 9 + 4 + 1)
= 10 / 24
= 0.4167
This equates to attack being chosen 41.67% of the time.
You can then generate thresholds as mentioned in the other answers, and use a random number between 0 and 1 to choose your action.
