How do I generate random numbers using a Markov model in C#? I noticed that almost all of the applications of the Markov algorithm here are for randomly generating text. Is there source code or a tutorial somewhere where I can fully understand how this works? My actual goal is to generate random numbers to simulate solar energy harvesting.
First, decide how deep your Markov model is going to be. Do you look at the previous number? The previous two numbers? The previous three numbers? Perhaps deeper?
Second, look through some actual solar energy data and extract the probabilities for what follows each group of 1, 2 or 3 numbers. Ideally you will get complete coverage, but there may well be gaps. For those, either extrapolate or put in some average/random value.
All this so far is data.
Third, generate the first 1, 2 or 3 numbers. From your database, pick the matching combination and randomly select one of its possible followers. When I do this, I also allow a small probability of picking a completely random value, so things don't get stuck in a rut.
Drop the earliest element of your 1, 2 or 3 numbers. Shift the others down and add the new number at the end. Repeat until you have enough data.
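A minimal Python sketch of the steps above, assuming the harvesting data has already been discretised into a small set of levels (for example, hypothetical integer levels 0-5 per hour). `build_table`, `generate` and `jump_prob` are illustrative names, not from any library. It counts which value follows each run of `order` values, then walks the chain by weighted choice, occasionally jumping to a fully random value.

```python
import random
from collections import defaultdict

def build_table(samples, order=2):
    """Count how often each value follows every run of `order` consecutive values."""
    table = defaultdict(lambda: defaultdict(int))
    for i in range(len(samples) - order):
        state = tuple(samples[i:i + order])
        table[state][samples[i + order]] += 1
    return table

def generate(table, seed_state, count, all_values, jump_prob=0.02, rng=random):
    """Walk the chain from seed_state (length == order) until `count` values exist."""
    state = tuple(seed_state)
    out = list(state)
    while len(out) < count:
        followers = table.get(state)   # .get does not create an entry for unseen states
        if not followers or rng.random() < jump_prob:
            # Unseen state (a gap in the data) or the occasional random jump.
            nxt = rng.choice(all_values)
        else:
            values = list(followers)
            weights = [followers[v] for v in values]
            nxt = rng.choices(values, weights=weights, k=1)[0]
        out.append(nxt)
        state = state[1:] + (nxt,)   # drop the oldest value, append the newest
    return out[:count]
```

The counts are never normalised: `random.choices` accepts raw counts as relative weights.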
Here is a short extract from my 1-deep Markov word generator showing part of the data table:
// The line addEntry('h', "e 50 a 23 i 12 o 7 # 100") shows that the letter
// 'h' is followed by 'e' 50% of the time, 'a' 23% of the time, 'i' 12% of
// the time, 'o' 7% of the time and otherwise some other letter, '#'.
//
// Figures are taken from Gaines and tweaked. (see 'q')
private void initMarkovTable() {
mMarkovTable = new HashMap<Character, List<CFPair>>(26);
addEntry('a', "n 21 t 17 s 12 r 10 l 8 d 5 c 4 m 4 # 100");
addEntry('b', "e 34 l 17 u 11 o 9 a 7 y 5 b 4 r 4 # 100");
addEntry('c', "h 19 o 19 e 17 a 13 i 7 t 6 r 4 l 4 k 4 # 100");
addEntry('d', "e 16 i 14 a 14 o 10 y 8 s 6 u 5 # 100");
addEntry('e', "r 15 d 10 s 9 n 8 a 7 t 6 m 5 e 4 c 4 o 4 w 4 # 100");
addEntry('f', "t 22 o 21 e 10 i 9 a 7 r 5 f 5 u 4 # 100");
addEntry('g', "e 14 h 14 o 12 r 10 a 8 t 6 f 5 w 4 i 4 s 4 # 100");
addEntry('h', "e 50 a 23 i 12 o 7 # 100");
// ...
}
The data is organised as letter-frequency pairs. I used the '#' character to indicate "pick any letter here". Your data will be number-frequency pairs instead.
To pick an output, I read the appropriate line of the data and generate a random percentage. Scan along the data, accumulating the frequencies until the accumulated frequency exceeds the random percentage. That is the letter (or number, in your case) you pick.
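The picking step can be sketched in Python with number-frequency pairs. This is an illustration, not the Java `addEntry` code: `pick_follower` is a hypothetical name, there is no '#' wildcard entry, and the random draw is scaled to the total of the frequencies instead of assuming they add up to 100.

```python
import random

def pick_follower(pairs, rng=random):
    """Pick a value from (value, frequency) pairs by a cumulative scan."""
    total = sum(freq for _, freq in pairs)
    if total <= 0:
        raise ValueError("frequencies must have a positive total")
    r = rng.random() * total          # random point in [0, total)
    running = 0
    for value, freq in pairs:
        running += freq               # accumulate frequencies
        if r < running:               # first entry whose running total exceeds r
            return value
    return pairs[-1][0]               # only reachable through float rounding
```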
I have data like this:
Time (seconds from start) Value
15 2
16 4
19 2
25 9
There are a lot of entries (10000+), and I need a fast way to find the sum of values over any time range, e.g. the range 16-25 seconds (4 + 2 + 9 = 15). This data will change many times (new entries are always added at the bottom of the list).
I am thinking about using a sorted list + binary search to determine the positions and then summing the values in between, but that could take too much time. Is there a more appropriate way to do this? NuGet packages or algorithm references would be appreciated.
Just calculate cumulative sum:
Time Value CumulativeSum
15 2 2
16 4 6
19 2 8
25 9 17
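Then, for the range [16, 25], binary-search for the last entry before 16 (time 15, cumulative sum 2) and for 25 exactly (cumulative sum 17); the range sum is 17 - 2 = 15.
Complexity: O(log(n)) per query, where n is the size of the list. Appending a new entry stays O(1), because its cumulative sum is just the previous total plus its value.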
Binary search implementation for lower/upper bound can be found in my repo - https://github.com/eocron/Algorithm/blob/master/Algorithm/Sorted/BinarySearchExtensions.cs
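A Python sketch of the cumulative-sum approach, using the standard `bisect` module for the lower/upper-bound searches. `RangeSumLog` is a hypothetical name; it assumes entries arrive in non-decreasing time order, as described in the question.

```python
import bisect

class RangeSumLog:
    def __init__(self):
        self.times = []
        self.prefix = [0]   # prefix[k] = sum of the first k values

    def append(self, time, value):
        if self.times and time < self.times[-1]:
            raise ValueError("times must be appended in non-decreasing order")
        self.times.append(time)
        self.prefix.append(self.prefix[-1] + value)

    def range_sum(self, lo, hi):
        """Sum of values whose time lies in [lo, hi]."""
        left = bisect.bisect_left(self.times, lo)    # first index with time >= lo
        right = bisect.bisect_right(self.times, hi)  # first index with time > hi
        return self.prefix[right] - self.prefix[left]
```

The extra leading 0 in `prefix` means a range starting at the very first entry needs no special case.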
I was going through the Training Data RASA Format as detailed here.
{
"text": "show me chinese restaurants",
"intent": "restaurant_search",
"entities": [
{
"start": 8,
"end": 15,
"value": "chinese",
"entity": "cuisine"
}
]
}
The substring "chinese" is marked as an entity from index 8 to index 15 of the utterance.
I have written a small C# program to verify the correctness of the index of the characters in the utterance.
using System;

public class Program
{
public static void Main(string[] args)
{
string s = "show me chinese restaurants";
int i = 0;
foreach(var item in s.ToCharArray())
Console.WriteLine("{0} - {1}", item, i++);
}
}
But when I run the program I get the following output:
s - 0
h - 1
o - 2
w - 3
- 4
m - 5
e - 6
- 7
c - 8
h - 9
i - 10
n - 11
e - 12
s - 13
e - 14
- 15
r - 16
e - 17
s - 18
t - 19
a - 20
u - 21
r - 22
a - 23
n - 24
t - 25
s - 26
Notice the bizarre behavior of the annotation: the substring "chinese" starts at index 8 and ends at index 15, which is a whitespace.
But the substring "chinese" should start at index 8 and end at position 14.
When I train the same text with the "chinese" entity starting at index 8 and ending at 14, I get a Misaligned Entity Annotation warning from RASA, as detailed here.
Can someone explain this strange behavior?
Thanks
Reading the linked documentation, I may have come up with a possible explanation:
which together make a python style range to apply to the string, e.g. in the example below, with text="show me chinese restaurants", then text[8:15] == 'chinese'
This led me to think: hmm, that is weird, I wonder if Python does indexing weirdly.
I spun up a quick app to prove this:
text = "show me chinese restaurants"
print(text[8:15])
This may not seem to make sense at first, because the character at index 15 is in fact a space, which led me to this article:
https://www.pythoncentral.io/how-to-slice-listsarrays-and-tuples-in-python/
It seems that the expression text[8:15] in the example slices the string; the article uses this example:
a = [1, 2, 3, 4, 5, 6, 7, 8]
a[1:4] which outputs: [2, 3, 4]
and explains it as follows:
Let me explain it. The 1 means to start at second element in the list (note that the slicing index starts at 0). The 4 means to end at the fifth element in the list, but not include it. The colon in the middle is how Python's lists recognize that we want to use slicing to get objects in the list.
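So the second parameter of a slice is exclusive: text[8:15] covers indices 8 through 14, which is exactly "chinese". RASA's end value follows the same convention, which is why an end of 14 produces the misalignment warning.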
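To avoid computing the indices by hand, they can be derived from the substring itself. `entity_annotation` is a hypothetical helper, not part of Rasa; it simply builds a dict with an exclusive end index.

```python
def entity_annotation(text, value, entity):
    """Build a Rasa-style entity dict; `end` is exclusive, like a Python slice."""
    start = text.find(value)
    if start == -1:
        raise ValueError(f"{value!r} not found in {text!r}")
    return {"start": start, "end": start + len(value), "value": value, "entity": entity}
```

From C#, the same span is `s.Substring(start, end - start)`, since `Substring` takes a start index and a length.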
Hope this helps
P.S. I had to learn and set up some Python stuff :D
I have a SQL Server Compact Framework database and I want to query an int column.
The column can contain values from 1 to 99999999, and I am interested in the rightmost 4 digits.
Examples:
1 -> 1
12 -> 12
123 -> 123
1234 -> 1234
12345 -> 2345
123456 -> 3456
I could convert the result to a string and use a substring, but there is probably a better solution.
Use Modulo
select 123456 % 10000
SQLFiddle demo
If you
SELECT WhateverField % 10000
you will get the 4 rightmost digits.
SELECT RIGHT(column_name, n)
-- n is the number of digits; RIGHT returns a string, so 10012 gives '0012' rather than 12
You can use modulo 10 to get the last digit of any number.
Example: 12345 % 10 = 5
So if you want the last n digits, use 1 followed by n zeros as the modulus.
Example: for the last 4 digits, use 10000.
SQL command: SELECT column_name % 10000 FROM table_name;
Example: 123456 -> 123456 % 10000 = 3456
Conversely, you can use integer division by a power of 10 to get the leading digits (e.g. 123456 / 10000 = 12).
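The same arithmetic can be checked quickly in Python (not SQL), comparing modulo against a string-based "last n characters" approach for the question's examples. The function names are hypothetical. This assumes non-negative values, as in the question; Python's % and SQL Server's % differ for negative operands.

```python
def last_digits(x, n=4):
    """Rightmost n digits of a non-negative integer, as a number."""
    return x % 10 ** n

def leading_digits(x, n=4):
    """Drop the rightmost n digits with integer division."""
    return x // 10 ** n

def right_as_string(x, n=4):
    """What a string-based RIGHT(x, n) yields: the last n characters."""
    return str(x)[-n:]
```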
This is most probably the dumbest question anyone would ask, but regardless I hope I will find a clear answer for this.
My question is - How is an integer stored in computer memory?
In C#, an int is 32 bits in size. MSDN says we can store numbers from -2,147,483,648 to 2,147,483,647 in an int variable.
As I understand it, a bit can store only 2 values, i.e. 0 and 1. If I can store only 0 or 1 in a bit, how will I be able to store the numbers 2 to 9 inside a bit?
More precisely, say I have this code int x = 5; How will this be represented in memory or in other words how is 5 converted into 0's and 1's, and what is the convention behind it?
It's represented in binary (base 2). Read more about number bases. In base 2 you only need 2 different symbols to represent a number. We usually use the symbols 0 and 1. In our usual base we use 10 different symbols to represent all the numbers, 0, 1, 2, ... 8, and 9.
For comparison, think about a number that doesn't fit in our usual system with a single symbol, like 14. We don't have a symbol for 14, so how do we represent it? Easy, we just combine two of our symbols, 1 and 4. 14 in base 10 means 1*10^1 + 4*10^0.
1110 in base 2 (binary) means 1*2^3 + 1*2^2 + 1*2^1 + 0*2^0 = 8 + 4 + 2 + 0 = 14. So despite not having enough symbols in either base to represent 14 with a single symbol, we can still represent it in both bases.
In another commonly used base, base 16, which is also known as hexadecimal, we have enough symbols to represent 14 using only one of them. You'll usually see 14 written using the symbol e in hexadecimal.
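The positional idea can be sketched in Python by repeated division by the base: each remainder is the next digit, from least to most significant. `to_base` is an illustrative name, limited here to bases 2 to 16.

```python
DIGITS = "0123456789abcdef"

def to_base(n, base):
    """Digits of a non-negative integer n in the given base (2..16)."""
    if not 2 <= base <= 16:
        raise ValueError("base must be between 2 and 16")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)   # peel off the least significant digit
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))
```

For example, `to_base(14, 2)` gives "1110" and `to_base(14, 16)` gives "e", matching the text.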
For negative integers we use a convenient representation called two's complement, which is the complement (all 1s flipped to 0s and all 0s flipped to 1s) with one added to it.
There are two main reasons this is so convenient:
We know immediately whether a number is positive or negative by looking at a single bit, the most significant of the 32 bits we use.
It's mathematically correct in that x - y = x + -y using regular addition, the same way you learnt in grade school. This means that processors don't need to do anything special to implement subtraction if they already have addition. They can simply find the two's complement of y (recall: flip the bits and add one) and then add x and -y using the addition circuit they already have, rather than having a special circuit for subtraction.
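A small Python sketch of the two points above. Python integers are unbounded, so masking with 0xFFFFFFFF is used here to mimic a 32-bit int; `to_bits`, `negate` and `subtract_via_add` are illustrative names.

```python
MASK = 0xFFFFFFFF  # keep only the low 32 bits, like a C# int

def to_bits(x):
    """32-bit two's complement bit pattern of x, as a string."""
    return format(x & MASK, "032b")

def negate(pattern):
    """Two's complement negation of a 32-bit pattern: flip all bits, add one."""
    return ((~pattern) + 1) & MASK

def subtract_via_add(x, y):
    """x - y computed as x + (-y) with plain 32-bit addition."""
    return (x + negate(y & MASK)) & MASK
```

`to_bits(-5)` starts with a 1 (the sign bit), and `subtract_via_add(5, 7)` yields the same bit pattern as -2.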
This is not a dumb question at all.
Let's start with uint because it's slightly easier. The convention is:
You have 32 bits in a uint. Each bit is assigned a number ranging from 0 to 31. By convention the rightmost bit is 0 and the leftmost bit is 31.
Take each bit number and raise 2 to that power, and then multiply it by the value of the bit. So if bit number three is one, that's 1 x 2^3. If bit number twelve is zero, that's 0 x 2^12.
Add up all those numbers. That's the value.
So five would be 00000000000000000000000000000101, because 5 = 1 x 2^0 + 0 x 2^1 + 1 x 2^2 + ... the rest are all zero.
That's a uint. The convention for ints is:
Compute the value as a uint.
If the value is greater than or equal to 0 and strictly less than 2^31, then you're done. The int and uint values are the same.
Otherwise, subtract 2^32 from the uint value and that's the int value.
This might seem like an odd convention. We use it because it turns out that it is easy to build chips that perform arithmetic in this format extremely quickly.
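The convention can be written out almost word for word in Python. The input is a hypothetical 32-character string of '0'/'1' with bit 31 on the left; the function names are illustrative.

```python
def uint_value(bits):
    """Sum of bit * 2**position, with the rightmost character as bit 0."""
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    total = 0
    for position, ch in enumerate(reversed(bits)):
        total += int(ch) * 2 ** position
    return total

def int_value(bits):
    """Same bits read as an int: values of 2**31 and above wrap by subtracting 2**32."""
    u = uint_value(bits)
    return u if u < 2 ** 31 else u - 2 ** 32
```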
Binary works as follows (for your 32 bits):
1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1
Bit positions run from 31 (leftmost) down to 0 (rightmost); bits 30 down to 0 carry the place values 2^30 down to 2^0.
x = bit 31, the leftmost bit, is the sign bit (if 1 then the number is negative, if 0 then it is positive).
So the highest number is 0111111111............1 (all ones except the sign bit), which is 2^30 + 2^29 + 2^28 + ... + 2^1 + 2^0, or 2,147,483,647.
The lowest is 1000000.........0, meaning -2^31 or -2147483648.
Is this what high level languages lead to!? Eeek!
As other people have said it's a base 2 counting system. Humans are naturally base 10 counters mostly, though time for some reason is base 60, and 6 x 9 = 42 in base 13. Alan Turing was apparently adept at base 17 mental arithmetic.
Computers operate in base 2 because it's easy for the electronics to be either on or off, representing 1 and 0, which is all you need for base 2. You could build the electronics in such a way that it was on, off or somewhere in between. That'd be 3 states, allowing you to do ternary math (as opposed to binary math). However, reliability is reduced because it's harder to tell the difference between those three states, and the electronics are much more complicated. Even more levels lead to worse reliability.
Despite that, it is done in multi-level cell (MLC) flash memory. There, each memory cell represents on, off and a number of intermediate values. This improves capacity (each cell can store several bits), but it is bad news for reliability. This sort of chip is used in solid state drives, and these operate on the very edge of total unreliability in order to maximise capacity.
Is there a freely available implementation of finding a maximum weight clique in weighted graph in C#?
You could read the paper "A fast algorithm for the maximum clique problem", where you will find an effective maximum clique algorithm. In addition, a maximum weighted clique algorithm can be found in "A new algorithm for the maximum weighted clique problem". Here is the pseudocode:
1 **FUNCTION CLIQUE(U, size)**
2 if U = ∅ then
3 if size > max then
4 max ← size
5 New record; save it.
6 end
7 return
8 end
9 while U ≠ ∅ do
10 if size + weight(U) <= max then
11 return
12 end
13 i ← min{ j | vj ∈ U }
14 if size + c[i] <= max then
15 return
16 end
17 U ← U ∖ {vi}
18 CLIQUE(U ∩ N(vi), size + weight(vi))
19 end
20 return
21 **FUNCTION NEW()**
22 max ← 0
23 for i ← n downto 1 do
24 CLIQUE(Si ∩ N(vi), weight(vi))
25 c[i] ← max
26 end
27 return
Si denotes the vertices whose index is at least i, i.e. {vi, vi+1, ..., vn}. N(vi) is the set of vertices adjacent to vi, and weight(U) is the total weight of the vertices in U. The global variable max holds the weight of the heaviest clique found so far, and the array c[] records the maximum clique weight within Si. size is the weight of the clique being built in the current recursion.
The unweighted version of this algorithm also keeps a global flag found and returns as soon as a larger clique is recorded. That shortcut is only safe when c[i] can exceed c[i+1] by at most 1 (unit weights); with arbitrary weights the first improvement found for vi need not be the best one, so the flag is left out here.
Lines 10 and 14 are the pruning steps that avoid most of the useless search.
You could use hash sets to represent the vertex sets (U and N(vi)) when implementing this algorithm.
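A Python sketch of the CLIQUE/NEW pseudocode above, assuming 0-indexed vertices, non-negative weights, and neighbour sets as the hash-set adjacency. `max_weight_clique` is a hypothetical name. It deliberately has no early "found" exit: with arbitrary weights, stopping at the first improvement for a vertex can leave c[i] below the true maximum for Si.

```python
def max_weight_clique(n, weight, adj):
    """Return (best_weight, best_clique) for vertices 0..n-1.

    weight[v] is a non-negative vertex weight; adj[v] is the set of v's neighbours.
    """
    c = [0] * n            # c[i] = max clique weight using only vertices i..n-1
    best_weight = 0
    best_clique = []

    def clique(U, size, current):
        nonlocal best_weight, best_clique
        if not U:                          # nothing left to add
            if size > best_weight:
                best_weight = size
                best_clique = current[:]   # new record; save it
            return
        U = list(U)                        # candidates, in increasing index order
        remaining = sum(weight[v] for v in U)
        while U:
            if size + remaining <= best_weight:   # prune: even all of U is not enough
                return
            i = U[0]                               # smallest index left in U
            if size + c[i] <= best_weight:         # prune: U is a subset of S_i
                return
            U.pop(0)
            remaining -= weight[i]
            current.append(i)
            clique([v for v in U if v in adj[i]], size + weight[i], current)
            current.pop()

    for i in range(n - 1, -1, -1):
        candidates = [v for v in range(i + 1, n) if v in adj[i]]   # S_i and N(v_i)
        clique(candidates, weight[i], [i])
        c[i] = best_weight
    return best_weight, best_clique
```

Processing i from n-1 down to 0 guarantees every c[j] used in a prune has already been computed, since all candidates have larger indices than the outer vertex.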
Finding a maximum clique is an NP-hard problem. The Wikipedia article "Clique problem" has more useful background.