I am trying to change the affinity of a program so that it uses cores 1, 2, 3 and 4 of a CPU, and not the rest of them. I have searched around a bit and found this: How Can I Set Processor Affinity in .NET?
But it didn't help me out.
I have a way to get the number of cores the CPU has, so the code can adjust how many cores it sets the affinity to and won't try to use more cores than the CPU actually has.
Is there any easy way to do this?
I have successfully used the following to put my process on the first CPU:
Console.WriteLine("Press Enter to put the process onto Core 1");
Console.ReadLine();
Process Proc = Process.GetCurrentProcess(); // requires using System.Diagnostics
long AffinityMask = (long)Proc.ProcessorAffinity;
AffinityMask &= 0x0001; // Put my process on the First Core
Proc.ProcessorAffinity = (IntPtr)AffinityMask;
Console.WriteLine("Process is now on Core 1");
Console.WriteLine("Press Enter to exit");
Console.ReadLine();
You can check the before and after affinity in Task Manager.
Update:
ProcessorAffinity represents each processor as a bit. Bit 0 represents processor one, bit 1 represents processor two, and so on.
The following table shows a subset of the possible ProcessorAffinity values for a four-processor system.
Property value (hex)    Valid processors
0x0001                  1
0x0002                  2
0x0003                  1 or 2
0x0004                  3
0x0005                  1 or 3
0x0007                  1, 2, or 3
0x000F                  1, 2, 3, or 4
Just as an extension to the answer by @Rowan Smith: there is an additional way of writing binary numbers in C# 7.0 and higher - binary literals.
To specify a hex literal one writes 0x at the beginning of the number. For binary literals one writes 0b, like this:
0b0000_0000_0000_0001 -> 1
0b0000_0000_0000_0010 -> 2
0b0000_0000_0000_1111 -> 1, 2, 3, or 4
You write them like:
int value = 0b0000_0000_0000_0001;
Some people might find it easier to write binary literals instead of converting between hex and binary representations, although the number itself is longer.
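Putting the two together, a minimal sketch (assuming a machine with at least four cores) that pins the current process to cores 1-4 with a binary literal:

// assumes: using System; using System.Diagnostics;
Process proc = Process.GetCurrentProcess();
// Bits 0-3 set => processors 1, 2, 3 and 4.
proc.ProcessorAffinity = (IntPtr)0b0000_0000_0000_1111;
Console.WriteLine("Affinity is now cores 1-4");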
Related
I'm doing some entry-level programming challenges at codefights.com and I came across the following question. The link is to a blog that has the answer, but it includes the question in it as well. If only it had an explanation...
https://codefightssolver.wordpress.com/2016/10/19/swap-adjacent-bits/
My concern is with the line of code (it is the only line of code) below.
return (((n & 0x2AAAAAAA) >> 1) | ((n & 0x15555555) << 1)) ;
Specifically, I'm struggling to find decent info on how "0x2AAAAAAA" and "0x15555555" work, so I have a few dumb questions. I know they represent the binary values 10101010... and 01010101... respectively.
1. I've messed around some and found that the number of 5s and As loosely corresponds, as far as I can tell, to the bit size, but how?
2. Why As? Why 5s?
3. Why the 2 and the 1 before the As and 5s?
4. Anything else I should know about this? Does anyone know a cool blog post or website that explains some of this in more detail?
0x2AAAAAAA is 00101010101010101010101010101010 in 32-bit binary,
0x15555555 is 00010101010101010101010101010101 in 32-bit binary.
Note that the problem specifies Constraints: 0 ≤ n < 2^30. For this reason the highest two bits are always 00.
The two hex numbers have been "built" starting from their binary representation, that has a particular property (that we will see in the next paragraph).
Now... We can say that, given the constraint, x & 0x2AAAAAAA will return the even bits of x (if we count the bits as first, second, third... the second bit is even), while x & 0x15555555 will return the odd bits of x. By using << 1 and >> 1 you shift them by one position. By using | (or) you merge them back together.
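A quick worked example, taking n = 0b1011 (decimal 11):

int n = 0b1011;                    // decimal 11
int even = (n & 0x2AAAAAAA) >> 1;  // bits at index 1, 3, 5, ...: 0b1010 >> 1 == 0b0101
int odd  = (n & 0x15555555) << 1;  // bits at index 0, 2, 4, ...: 0b0001 << 1 == 0b0010
int swapped = even | odd;          // 0b0111 == 7: every adjacent pair of bits swapped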
0x2AAAAAAA is used to get 30 bits, which matches the constraint:
0 ≤ n < 2^30.
0x15555555 also represents 30 bits, with its bits the opposite of the other number's.
I would start with the binary number (101010101010101010101010101010) in the calculator and select hex in programmer mode to show the number in hex.
You can also use 0b101010101010101010101010101010 directly, if you like, depending on the language.
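If you'd rather skip the calculator app, a quick C# sketch does the same conversions:

Console.WriteLine(Convert.ToString(0x2AAAAAAA, 2));  // 101010101010101010101010101010
Console.WriteLine(Convert.ToInt32("101010101010101010101010101010", 2).ToString("X")); // 2AAAAAAA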
I have a list of entities, and for the purpose of analysis, an entity can be in one of three states. Of course I wish it was only two states, then I could represent that with a bool.
In most cases there will be a list of entities where the size of the list is usually 100 < n < 500.
I am working on analyzing the effects of the combinations of the entities and the states.
So if I have 1 entity, then I can have 3 combinations. If I have two entities, I can have 9 combinations (3^n in general), and so on.
Because of the amount of combinations, brute forcing this will be impractical (it needs to run on a single system). My task is to find good-but-not-necessarily-optimal solutions that could work. I don't need to test all possible permutations, I just need to find one that works. That is an implementation detail.
What I do need to do is to register the combinations possible for my current data set - this is basically to avoid duplicating the work of analyzing each combination. Every time a process arrives at a certain configuration of combinations, it needs to check if that combo is already being worked at or if it was resolved in the past.
So if I have x amount of tri-state values, what is an efficient way of storing and comparing this in memory? I realize there will be limitations here. Just trying to be as efficient as possible.
I can't think of a more effective unit of storage than two bits, where one of the four "bit states" is not used. But I don't know how to make this efficient. Do I need to choose between optimizing for storage size and optimizing for performance?
How can something like this be modeled in C# in a way that wastes the least amount of resources and still performs relatively well when a process needs to ask "Has this particular combination of tri-state values already been tested?"?
Edit: As an example, say I have just 3 entities, and the state is represented by a simple integer, 1, 2 or 3. We would then have this list of combinations:
111
112
113
121
122
123
131
132
133
211
212
213
221
222
223
231
232
233
311
312
313
321
322
323
331
332
333
I think you can break this down as follows:
You have a set of N entities, each of which can have one of three different states.
Given one particular permutation of states for those N entities, you
want to remember that you have processed that permutation.
It therefore seems that you can treat the N entities as a base-3 number with N digits.
When considering one particular set of states for the N entities, you can store that as an array of N bytes where each byte can have the value 0, 1 or 2, corresponding to the three possible states.
That isn't a memory-efficient way of storing the states for one particular permutation, but that's OK because you don't need to store that array. You just need to store a single bit somewhere corresponding to that permutation.
So what you can do is to convert the byte array into a base 10 number that you can use as an index into a BitArray. You then use the BitArray to remember whether a particular permutation of states has been processed.
To convert a byte array representing a base three number to a decimal number, you can use this code:
public static int ToBase10(byte[] entityStates) // Each state can be 0, 1 or 2.
{
int result = 0;
for (int i = 0, n = 1; i < entityStates.Length; n *= 3, ++i)
result += n * entityStates[i];
return result;
}
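For example, the byte array { 2, 1, 0, 1 } (used again below) converts as 2*1 + 1*3 + 0*9 + 1*27 = 32:

byte[] example = { 2, 1, 0, 1 }; // A=2, B=1, C=0, D=1
int index = ToBase10(example);   // 32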
Given that you have numEntities different entities, you can then create a BitArray like so:
int numEntities = 4;
int numPerms = (int)Math.Pow(3, numEntities); // 3^N possible permutations, here 3^4 = 81
BitArray states = new BitArray(numPerms);
Then states can store a bit for each possible permutation of states for all the entities.
Let's suppose that you have 4 entities A, B, C and D, and you have a permutation of states (which will be 0, 1 or 2) as follows: A2 B1 C0 D1. That is, entity A has state 2, B has state 1, C has state 0 and D has state 1.
You would represent that as a byte array like so:
byte[] permutation = { 2, 1, 0, 1 };
Then you can convert that to a base 10 number like so:
int asBase10 = ToBase10(permutation);
Then you can check if that permutation has been processed like so:
if (!states[asBase10])
{
    // Not processed, so process it.
    process(permutation);
    states[asBase10] = true; // Remember that we processed it.
}
Without getting overly fancy with algorithms and data structures, and assuming your tri-state values can be represented as strings and don't have an easily determined fixed maximum count, i.e. "111", "112", etc. (or even "1:1:1", "1:1:2"), a simple SortedSet may end up being fairly efficient.
As a bonus, it doesn't care about the number of values in your set.
SortedSet<string> alreadyTried = new SortedSet<string>();

if (!HasSetBeenTried("1:1:1"))
{
    // do whatever
}

if (!HasSetBeenTried("500:212:100"))
{
    // do whatever
}

public bool HasSetBeenTried(string set)
{
    if (alreadyTried.Contains(set)) return true; // seen it before
    alreadyTried.Add(set); // record it so future calls know it was tried
    return false;
}
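As a side note, SortedSet<string>.Add already returns false when the item is present, so the method body could collapse to return !alreadyTried.Add(set);. And if you don't need the ordering a SortedSet maintains, a HashSet<string> offers the same Contains/Add contract with O(1) average lookups instead of O(log n).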
Simple mathematics says:
3 entities with 3 states each make 27 combinations.
So you need exactly log(27)/log(2) ≈ 4.75 bits to store that information.
Because a PC can only make use of whole bits, you have to "waste" ~0.25 bits and use 5 bits per combination.
The more data you gather, the better you can pack that information, but in the end a compression algorithm could help even more.
Again: you only asked for memory efficiency, not performance.
In general you can calculate the bits you need by Math.Ceiling(Math.Log(noCombinations, 2)).
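A minimal sketch of that calculation:

int entities = 3;
double combinations = Math.Pow(3, entities);                   // 27
int bitsNeeded = (int)Math.Ceiling(Math.Log(combinations, 2)); // 5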
I was throwing together a quickie program to take mainframe output blocked in 133-byte lengths, all ending with a CRLF, and it was working except for my calculated number of lines in the output. Because the output size was X lines of 133 bytes with 2 bytes (CRLF) at the end, I was calculating the line count as:
lineCount = fileLength - 2 / 133;
For a file length of 3194, that works out to 24 lines. Take 3194, subtract 2 for the CRLF and you get 3192, and that is divided by 133 to come up with 24. Simple! The crazy thing is, I was getting the lineCount equal to the fileLength!
What could I be doing wrong?
After examining this several times, I finally hit on it! It's a matter of the infamous Order of Operations!
lineCount = fileLength - 2 / 133;
If I evaluate this from left to right, according to my description above, it works fine, but I happen to be a human, not a CPU. The computer processor has to use a different rule: MiDAS - multiplications, divisions, additions and subtractions.
My code was calculating 2 / 133, which for integers equals 0. It was then subtracting that 0 from fileLength, and of course set lineCount to that value. I am ancient of days, sort of, and should have known better from the start, but I guess I was in a hurry. The correct code?
int lineCount = ((fileLength - 2) / 133);
So, remember MiDAS and you will be Golden!
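A quick demonstration with the numbers from the example above:

int fileLength = 3194;
int wrong = fileLength - 2 / 133;   // 2 / 133 == 0 in integer division, so wrong == 3194
int right = (fileLength - 2) / 133; // (3194 - 2) / 133 == 24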
NOTE: it's more complicated than this, actually. The full rule encompasses parentheses and exponentiation. For an expanded look at this check Wikipedia for Order of Operations.
In the US the mnemonic is more like: PEMDAS - Please Excuse My Dear Aunt Sally - and refers to Parentheses, Exponents, Multiplications, Divisions, Additions and Subtractions.
This is most probably the dumbest question anyone would ask, but regardless I hope I will find a clear answer for this.
My question is - How is an integer stored in computer memory?
In C# an integer is 32 bits in size. MSDN says we can store numbers from -2,147,483,648 to 2,147,483,647 inside an integer variable.
As per my understanding a bit can store only 2 values i.e 0 & 1. If I can store only 0 or 1 in a bit, how will I be able to store numbers 2 to 9 inside a bit?
More precisely, say I have this code int x = 5; How will this be represented in memory or in other words how is 5 converted into 0's and 1's, and what is the convention behind it?
It's represented in binary (base 2). Read more about number bases. In base 2 you only need 2 different symbols to represent a number. We usually use the symbols 0 and 1. In our usual base we use 10 different symbols to represent all the numbers, 0, 1, 2, ... 8, and 9.
For comparison, think about a number that doesn't fit in our usual system. Like 14. We don't have a symbol for 14, so how do we represent it? Easy, we just combine two of our symbols, 1 and 4. 14 in base 10 means 1*10^1 + 4*10^0.
1110 in base 2 (binary) means 1*2^3 + 1*2^2 + 1*2^1 + 0*2^0 = 8 + 4 + 2 + 0 = 14. So despite not having enough symbols in either base to represent 14 with a single symbol, we can still represent it in both bases.
In another commonly used base, base 16, which is also known as hexadecimal, we have enough symbols to represent 14 using only one of them. You'll usually see 14 written using the symbol e in hexadecimal.
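A quick C# sketch confirming the three notations for 14:

Console.WriteLine(Convert.ToString(14, 2));  // "1110"
Console.WriteLine(Convert.ToString(14, 16)); // "e"
Console.WriteLine(0b1110 == 0xe);            // True - same number, different notation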
For negative integers we use a convenient representation called two's complement: take the bitwise complement (all 1s flipped to 0s and all 0s flipped to 1s) and add one to it.
There are two main reasons this is so convenient:
We know immediately if a number is positive or negative by looking at a single bit, the most significant bit of the 32 we use.
It's mathematically correct in that x - y = x + (-y) using regular addition, the same way you learnt in grade school. This means that processors don't need to do anything special to implement subtraction if they already have addition. They can simply find the two's complement of y (recall, flip the bits and add one) and then add x and that value using the addition circuit they already have, rather than having a special circuit for subtraction.
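A quick sketch of the flip-the-bits-and-add-one rule:

int x = 5;
int negX = ~x + 1;                            // two's complement: flip bits, add one
Console.WriteLine(negX);                      // -5
Console.WriteLine(Convert.ToString(negX, 2)); // 11111111111111111111111111111011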
This is not a dumb question at all.
Let's start with uint because it's slightly easier. The convention is:
You have 32 bits in a uint. Each bit is assigned a number ranging from 0 to 31. By convention the rightmost bit is 0 and the leftmost bit is 31.
Take each bit number and raise 2 to that power, and then multiply it by the value of the bit. So if bit number three is one, that's 1 x 2^3. If bit number twelve is zero, that's 0 x 2^12.
Add up all those numbers. That's the value.
So five would be 00000000000000000000000000000101, because 5 = 1 x 2^0 + 0 x 2^1 + 1 x 2^2 + ... the rest are all zero.
That's a uint. The convention for ints is:
Compute the value as a uint.
If the value is greater than or equal to 0 and strictly less than 2^31 then you're done. The int and uint values are the same.
Otherwise, subtract 2^32 from the uint value and that's the int value.
This might seem like an odd convention. We use it because it turns out that it is easy to build chips that perform arithmetic in this format extremely quickly.
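A small sketch of that uint-to-int convention:

uint u = 0xFFFFFFFB;        // 4294967291 as a uint
int i = unchecked((int)u);  // the same 32 bits, reinterpreted
Console.WriteLine(i);       // -5, i.e. 4294967291 - 2^32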
Binary works as follows (for your 32 bits):

bit index:  31  30  29  28 ... 3  2  1  0
weight:    2^31 2^30 2^29 2^28 ... 2^3 2^2 2^1 2^0

Bit 31 is the sign bit (if 1 then the number is negative, if 0 then positive).
So the highest number is 0111111111............1 (all ones except the sign bit), which is 2^30 + 2^29 + 2^28 + ... + 2^1 + 2^0 = 2,147,483,647.
The lowest is 1000000.........0, meaning -2^31 = -2,147,483,648.
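These limits are exposed directly in C#:

Console.WriteLine(int.MaxValue); // 2147483647
Console.WriteLine(int.MinValue); // -2147483648
Console.WriteLine(Convert.ToString(int.MinValue, 2)); // 1 followed by 31 zeros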
Is this what high level languages lead to!? Eeek!
As other people have said, it's a base-2 counting system. Humans are mostly natural base-10 counters, though time for some reason is base 60, and 6 x 9 = 42 in base 13. Alan Turing was apparently adept at base-17 mental arithmetic.
Computers operate in base 2 because it's easy for the electronics to be either on or off - representing 1 and 0, which is all you need for base 2. You could build the electronics in such a way that they were on, off or somewhere in between. That'd be 3 states, allowing you to do ternary math (as opposed to binary math). However the reliability is reduced because it's harder to tell the difference between those three states, and the electronics are much more complicated. Even more levels lead to worse reliability.
Despite that, it is done in multi-level cell flash memory. In these, each memory cell represents on, off and a number of intermediate values. This improves the capacity (each cell can store several bits), but it is bad news for reliability. This sort of chip is used in solid state drives, and these operate on the very edge of total unreliability in order to maximise capacity.
Given a time series of sensor state intervals, how do I implement a classifier which learns from supervised training data to detect an incident based on a sequence of state intervals? To simplify the problem, sensor states are reduced to either true or false.
Update: I've found this paper (PDF) on Mining Sequences of Temporal Intervals which addresses a similar problem. Another paper (Google Docs) on Mining Hierarchical Temporal Patterns in Multivariate Time Series takes a novel approach, but deals with hierarchical data.
Example Training Data
The following data is a training example for an incident, represented as a graph over time, where /¯¯¯\ represents a true state interval and \___/ a false state interval for a sensor.
Sensor | Sensor State over time
| 0....5....10...15...20...25... // timestamp
---------|--------------------------------
A | ¯¯¯¯¯¯¯¯¯¯¯¯\________/¯¯¯¯¯¯¯¯
B | ¯¯¯¯¯\___________________/¯¯¯¯
C | ______________________________ // no state change
D | /¯\_/¯\_/¯\_/¯\_/¯\_/¯\_/¯\_/¯
E | _________________/¯¯¯¯¯¯¯¯\___
Incident Detection vs Sequence Labeling vs Classification
I initially generalised my problem as a two-category sequence labeling problem, but my categories really represented "normal operation" and a rare "alarm event" so I have rephrased my question as incident detection. Training data is available for "normal operation" and "alarm incident".
To reduce problem complexity, I have discretized sensor events to boolean values, but this need not be the case.
Possible Algorithms
A hidden Markov model seems to be a possible solution, but would it be able to use the state intervals? If a sequence labeler is not the best approach for this problem, alternative suggestions would be appreciated.
Bayesian Probabilistic Approach
Sensor activity will vary significantly by time of day (busy in mornings, quiet at night). My initial approach would have been to measure normal sensor state over a few days and calculate state probability by time of day (hour). The combined probability of sensor states at an unlikely hour surpassing an "unlikelihood threshold" would indicate an incident. But this seemed like it would raise a false alarm if the sensors were noisy. I have not yet implemented this, but I believe that approach has merit.
Feature Extraction
Vector states could be represented as state interval changes occurring at a specific time and lasting a specific duration.
struct StateInterval
{
    public int sensorID;       // which sensor this interval belongs to
    public bool state;         // the sensor's state during the interval
    public DateTime timeStamp; // when the interval started
    public TimeSpan duration;  // how long the state lasted
}
e.g. some state intervals from the process table:
[ {D, true, 0, 3} ]; [ {D, false, 4, 1} ]; ...
[ {A, true, 0, 12} ]; [ {B, true, 0, 6} ]; [ {D, true, 0, 3} ]; etc.
A good classifier would take into account state-value intervals and recent state changes to determine if a combination of state changes closely matches training data for a category.
Edit: Some ideas, after sleeping on it, about how to extract features from multiple sensors' alarm data and how to compare it to previous data...
Start by calculating the following data for each sensor for each hour of the day (a sketch follows this list):
Average state interval length (for true and false states)
Average time between state changes
Number of state changes over time
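A minimal sketch of those per-sensor features, assuming the StateInterval struct above with its fields made public; Summarize is a hypothetical helper, and the intervals are assumed to be pre-bucketed by hour:

// assumes: using System; using System.Collections.Generic; using System.Linq;
public static (double avgTrueLen, double avgFalseLen, int changes)
    Summarize(IReadOnlyList<StateInterval> intervals)
{
    double avgTrue = intervals.Where(i => i.state)
                              .Select(i => i.duration.TotalSeconds)
                              .DefaultIfEmpty(0).Average();
    double avgFalse = intervals.Where(i => !i.state)
                               .Select(i => i.duration.TotalSeconds)
                               .DefaultIfEmpty(0).Average();
    // Every boundary between consecutive intervals is one state change.
    int changes = Math.Max(0, intervals.Count - 1);
    return (avgTrue, avgFalse, changes);
}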
Each sensor could then be compared to every other sensor in a matrix with data like the following:
Average time taken for sensor B to change to a true state after sensor A did. If an average value is 60 seconds, then a 1-second wait would be more interesting than a 120-second wait.
Average number of state changes sensor B underwent while sensor A was in one state
Given two sets of training data, the classifier should be able to determine from these feature sets which is the most likely category for classification.
Is this a sensible approach and what would be a good algorithm to compare these features?
Edit: the direction of a state change (false->true vs. true->false) is significant, so any features should take that into account.
A simple solution would be to collapse the time aspect of your data and treat each timestamp as one instance. In this case, the values of the sensors are considered your feature vector, where each time step is labeled with a class value of category A or B (at least for the labeled training data):
sensors | class
A B C D E |
-------------------------
1 1 1 0 0 | catA
1 0 0 0 0 | catB
1 1 0 1 0 | catB
1 1 0 0 0 | catA
..
This input data is fed to the usual classification algorithms (ANN, SVM, ...), and the goal is to predict the class of unlabeled time series:
sensors | class
A B C D E |
-------------------------
0 1 1 1 1 | ?
1 1 0 0 0 | ?
..
An intermediary step of dimensionality reduction / feature extraction could improve the results.
Obviously this may not be as good as modeling the time dynamics of the sequences, especially since techniques such as Hidden Markov Models (HMM) take into account the transitions between the various states.
EDIT
Based on your comment below, it seems that the best way to get less transitory predictions of the target class is to apply a post-processing rule at the end of the prediction phase, treating the classification output as a sequence of consecutive predictions.
The way this works is that you compute the class posterior probabilities (i.e. the probability distribution that an instance belongs to each class label, which in the case of a binary SVM is easily derived from the decision function). Then, given a specified threshold, you check whether the probability of the predicted class is above that threshold: if it is, you use that class to predict the current timestamp; if not, you keep the previous prediction, and the same goes for future instances. This has the effect of adding a certain inertia to the current prediction.
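A minimal sketch of that smoothing rule, assuming you already have per-timestamp posteriors from the classifier (Smooth and ArgMax are hypothetical helpers):

public static string[] Smooth(double[][] posteriors, string[] classLabels, double threshold)
{
    var output = new string[posteriors.Length];
    string current = classLabels[ArgMax(posteriors[0])]; // seed with the first prediction
    for (int t = 0; t < posteriors.Length; t++)
    {
        int best = ArgMax(posteriors[t]);
        if (posteriors[t][best] >= threshold)
            current = classLabels[best]; // confident enough: switch classes
        output[t] = current;             // otherwise keep the previous label
    }
    return output;
}

static int ArgMax(double[] row)
{
    int best = 0;
    for (int i = 1; i < row.Length; i++)
        if (row[i] > row[best]) best = i;
    return best;
}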
This doesn't sound like a classification problem. Classifiers aren't really meant to take into account "a combination of state changes." It sounds like a sequence labeling problem. Look into using a Hidden Markov Model or a Conditional Random Field. You can find an efficient implementation of the latter at http://leon.bottou.org/projects/sgd.
Edit:
I've read through your question in a little more detail, and I don't think an HMM is the best model given what you want to do with features. It's going to blow up your state space and could make inference intractable. You need a more expressive model. You could look at Dynamic Bayesian Networks. They generalize HMMs by allowing the state space to be represented in factored form. Kevin Murphy's dissertation is the most thorough resource for them I've come across.
I still like CRFs, though. Just as an easy place to start, define one with the time of day and each of the sensor readings as the features for each observation, and use bigram feature functions. You can see how it performs and increase the complexity of your features from there. I would start simple, though. I think you're underestimating how difficult some of your ideas will be to implement.
Why reinvent the wheel? Check out TClass
If that doesn't cut it for you, you can also find a number of pointers there. I hope this helps.