Entity annotation has whitespaces in RASA NLU - c#

I was going through the Training Data RASA Format as detailed here.
{
  "text": "show me chinese restaurants",
  "intent": "restaurant_search",
  "entities": [
    {
      "start": 8,
      "end": 15,
      "value": "chinese",
      "entity": "cuisine"
    }
  ]
}
The substring Chinese is marked as an entity from the 8th to 15th index of the utterance.
I have written a small C# program to verify the correctness of the index of the characters in the utterance.
using System;

public class Program
{
    public static void Main(string[] args)
    {
        string s = "show me chinese restaurants";
        int i = 0;
        // Print each character next to its zero-based index.
        foreach (var item in s.ToCharArray())
            Console.WriteLine("{0} - {1}", item, i++);
    }
}
But when I run the program I get the following output:
s - 0
h - 1
o - 2
w - 3
- 4
m - 5
e - 6
- 7
c - 8
h - 9
i - 10
n - 11
e - 12
s - 13
e - 14
- 15
r - 16
e - 17
s - 18
t - 19
a - 20
u - 21
r - 22
a - 23
n - 24
t - 25
s - 26
Notice the odd annotation: according to the training data, the substring "chinese" starts at index 8 and ends at index 15, which is a whitespace character.
But the substring "chinese" should start at index 8 and end at index 14.
When I train with the same text using a start of 8 and an end of 14, I get a Misaligned Entity Annotation warning from RASA, as detailed here.
Can someone explain this strange behavior?
Thanks

Reading the link provided I may have come up with a possible explanation:
which together make a python style range to apply to the string, e.g. in the example below, with text="show me chinese restaurants", then text[8:15] == 'chinese'
This led me to wonder whether Python indexes strings in some unusual way.
I spun up a quick app to check:
text = "show me chinese restaurants"
print(text[8:15])
This prints chinese, which may not seem to make sense because the character at index 15 of the string is in fact a space. That led me to this article:
https://www.pythoncentral.io/how-to-slice-listsarrays-and-tuples-in-python/
It seems that the operator used in the example here, text[8:15], slices the array. The article uses this example:
a = [1, 2, 3, 4, 5, 6, 7, 8]
a[1:4] which outputs: [2, 3, 4]
and explains it as such
Let me explain it. The 1 means to start at second element in the list (note that the slicing index starts at 0). The 4 means to end at the fifth element in the list, but not include it. The colon in the middle is how Python's lists recognize that we want to use slicing to get objects in the list.
So it seems that the second parameter of the slicing is exclusive.
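To make the exclusive end index concrete, here is a minimal Python sketch that derives the annotation offsets from the text itself. The helper name entity_span is hypothetical and is not part of RASA; it only illustrates that end = start + len(value), which is why the documented end is 15 rather than 14.

```python
def entity_span(text, value):
    """Return (start, end) for the first occurrence of value in text.

    start is inclusive and end is exclusive, matching Python slicing.
    """
    start = text.find(value)
    if start == -1:
        raise ValueError("value not found in text")
    end = start + len(value)  # one past the last character of value
    return start, end


text = "show me chinese restaurants"
start, end = entity_span(text, "chinese")
print(start, end)        # 8 15
print(text[start:end])   # chinese
print(text[8:14])        # chines (an end of 14 cuts off the last letter)
```

In C#, the same substring would be taken with s.Substring(8, 15 - 8), since Substring takes a start index and a length rather than an end index.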
Hope this helps
p.s. Had to learn and setup some python stuff :D

Coin change problem: limited coins, interested in number of unique ways to make change

I have a task for uni, the requirements of which are as follows:
There is a collection of coins. For each non-negative integer k, there are two coins with the value 2^k, i.e. the collection of coins is {1, 1, 2, 2, 4, 4, 8, 8, ... }
For a given number, I need to write a method that returns the unique number of ways to make change for that amount, given the collection of coins.
For example, if the number passed to the algorithm is 6, the relevant collection of coins would be {1, 1, 2, 2, 4, 4}, the subsets that add up to 6 are {1, 1, 2, 2}, {1, 1, 4}, {2, 4}, {2, 4}, {2, 4} and {2, 4}, the unique subsets are {1, 1, 2, 2}, {1, 1, 4} and {2, 4}, and therefore the total unique ways is 3.
The numbers (and potential combinations) can be very large: the largest number in the tester class is 999,999,999,999,999,999 (10^18 - 1), for which the expected result is 29,665,503.
It's apparent to me that the approach should involve dynamic programming. I've used DP once before (for another task where we had to maximise our returns in a 'coin game'), and I've watched lots of videos (such as MIT OCW) on dynamic programming to try and understand how we could solve this particular problem, but I'm quite stuck, with my current confusion as follows:
I'm struggling to understand how we can frame this problem in terms of minimising or maximising something, and therefore how to structure the recurrence relationship. As opposed to trying to determine the minimum number of coins, we're interested in all combinations that work.
There's also the issue that we (I think?) need to keep track of the solutions themselves, otherwise we won't be able to filter out duplicates.
Although it may become apparent as I work out the recurrence relation and how it should be memoized, I feel like space will be an issue: wouldn't we need something like a Z*|C| (where |C| is the size of the array of coins) sized array to store our memoized results? For a Z of 10^18, that array would be huge.
At the risk of making this post too long, I've tried to sketch out a few approaches, but always come down to the problem that the recurrence seems like an OR relationship. Something like:
Let z = desired amount
Let A be array of coins, and i be the index in that array
Recurrence relation: DP(i, z) = OR ( DP(i + 1, z), DP(i + 1, z - A[i]) )
// Unsure how to deal with this OR in actual code. We're not saying,
// "Return one of these, whichever is smaller/bigger". We're saying,
// "We want to know if either case works."
Or another approach where you don't actually have an array of coins, but just start at the largest power of 2 less than Z and work down:
Let z = desired amount
Let largestCoin = largestCoinLessThanZ(z) // e.g. for z = 6, largestCoin = 4
findChange(desiredSum, runningTotal, coin):
    if runningTotal + coin = desiredSum:
        [add path to pile of valid paths]
    return ( findChange(desiredSum, runningTotal + coin, coin / 2) // using coin of this denomination once
          or findChange(desiredSum, runningTotal + coin + coin, coin / 2) // using coin of this denomination twice
          or findChange(desiredSum, runningTotal, coin / 2) // not using coin of this denomination at all
    )
Main:
    findChange(z, 0, largestCoin)
Sorry for the janky pseudocode - just trying to convey how I've approached it in my head.
In summary, I'm hoping for help understanding the recurrence relationship to solve this problem, and how to deal with potential space constraints. I'm working with C#, but I don't expect code - any math or pseudocode would be greatly appreciated.
I think you can find a solution without going over the array multiple times, and without O(n^2) complexity.
If you find the largest 2^k that does not exceed your desired amount, you can treat the remainder as a new amount and again find the largest 2^k that fits. Repeating this tells you which 2^k numbers make up the amount.
Let me give an example: your number is 290, and the largest 2^k that fits is 256 (2^8).
The remainder is 34, for which the largest 2^k is 32 (2^5).
The remainder is then 2, which is 2^1.
Once you have these, [2^8, 2^5, 2^1], you can find the possible combinations that make up each of them.
{1,1,2,2,4,4,8,8,16,16,32,32,64,64,128,128}(numbers)
{1,2,3,4,5,6,7,8, 9 ,10,11,12,13,14, 15 ,16}(indexes)
So if you want to find the combinations for 256 (2^8), there are multiple possibilities:
15 + 16
15 + (13 + 14)
15 + (13 + (11 + 12))
15 + (14 + (11 + 12))
16 + (13 + 14)
16 + (13 + 14)
16 + (13 + (11 + 12))
16 + (14 + (11 + 12))
Notice that the first 12 elements don't add up to 128, and similarly the first 14 elements add up to 254, which falls short of 256. So the combinations are limited.
You just have to ensure that the same element isn't used for both 2^8 and 2^5.
Hope this helps.
Numbers: 1 1 2 2 4 4 8 8 16 16 32 32 64 64 128 128
Indexes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
Number  Count  Combinations
1       2      [1] [2]
2       3      [4] [3] [2,1]
4       5      [6] [5] [4,3] [4,2,1] [3,2,1]
8       9      [8] [7] [6,5] [6,4,3] [5,4,3] [6,4,2,1] [6,3,2,1] [5,4,2,1] [5,3,2,1]
16      17     [10] [9] [8,7] [8,6,5] [8,6,4,3] [8,5,4,3]
               [8,6,4,2,1] [8,6,3,2,1] [8,5,4,2,1] [8,5,3,2,1] [7,6,5] [7,6,4,3] [7,5,4,3]
               [7,6,4,2,1] [7,6,3,2,1] [7,5,4,2,1] [7,5,3,2,1]
So, the formula is 2^(k) + 1
Try 16 for example, 16 is 2^4, so 2^(4) + 1 = 17
Or simply add 1 to the number you want to find the combinations for. If the number is 256, it has 257 combinations.
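The combinations listed above are counted by coin index, so a subset such as {2, 4} is counted more than once. If subsets are counted by value instead, as in the question's example for 6, one possible recurrence is sketched below in Python. It is not taken from either answer, and count_ways is a hypothetical name. The idea: an odd amount must use exactly one 1-coin, while an even amount uses either none or both of them; after that, the remaining coins {2, 2, 4, 4, ...} are the original collection with every value doubled, so the remaining amount can be halved.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_ways(n):
    """Count value-distinct ways to pay n using at most two coins of each power of two."""
    if n == 0:
        return 1  # exactly one way to pay nothing: use no coins
    if n % 2 == 1:
        # Odd amount: exactly one 1-coin is used, then halve what is left.
        return count_ways((n - 1) // 2)
    # Even amount: use no 1-coins, or use both of them.
    return count_ways(n // 2) + count_ways(n // 2 - 1)


print(count_ways(6))  # 3, matching {1, 1, 2, 2}, {1, 1, 4} and {2, 4}
```

Each halving level only ever needs about two distinct amounts, so the memo table grows with log2(n) rather than with n, which sidesteps the space concern raised in the question. The figure of 29,665,503 quoted in the question for 10^18 - 1 can serve as a check on this sketch.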

How to match x chars from y total in groups of z with each char matching at least w others

Programming up a pattern matching game.
We have 135 symbols. Of those 135, a subset of 108 symbols is used. From the subset of 108, either 18, 21, or 24 symbols are chosen at random. For simplicity, let's stick with 18.
When a symbol is chosen, it can't be used again.
Using groups of 27 symbols at a time, we need to generate the minimum number of groups of 27, making sure that when 18 symbols are chosen at random from the subset of 108, we are guaranteed that 1 of them will match at least 12 other symbols from the 18 randoms in at least 1 of the groups of 27.
Question is, what is the programming logic (we're using C#) to generate the groups of 27 making sure to meet the symbol matching requirements?
If we didn't care about having to match things up, it would be a straight combination/factorial calculation.
Eg, along the lines of:
(135 * 134 * 133 * ... * 109) / (27 * 26 * ... * 1)
But am totally stumped on the best approach to fulfil the matching requirements.
Pseudo logic and/or sample code would be greatly appreciated!
EDIT: trying this example as requested. Hopefully it clears things up. Am going to use numbers since it wouldn't be practical to try and upload 135 image symbols.
So let's say our 135 symbols are the numbers 1-135 inclusive.
Of those 135 numbers, a subset of 108 is chosen. For simplicity, let's use the numbers 1-108.
Pick 18 random numbers from the subset 1-108: let's use 1-18 inclusive in place of symbols.
We need to come up with the minimum number of groups of 27 symbols (numbers in this example) so that at least one group of 27 (from amongst all our groups of 27), will have at least 12 of the 18 random numbers (symbols).
That is, one group will possibly look like:
1,2,3,5,6,7,77,9,10,13,15,30,40,50,60,70,56,43,100,4,103,99,66,8,78,44,55
as it matches 12 of the 18 random symbols (numbers).
Note that the 18 random symbols are chosen after the groups of 27 are chosen. There can be as many groups of 27 as needed.
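As a starting point, the matching requirement itself can be checked mechanically. The Python sketch below only verifies a candidate set of groups against one random pick; it does not construct a minimal set of groups, which is the hard part of the question. The names best_match and is_covered are hypothetical, and the data is the numeric example above.

```python
def best_match(groups, chosen):
    """Return the largest number of chosen symbols found in any single group."""
    chosen = set(chosen)
    return max(len(chosen & set(group)) for group in groups)


def is_covered(groups, chosen, needed=12):
    """True if at least one group contains at least `needed` of the chosen symbols."""
    return best_match(groups, chosen) >= needed


# The example group of 27 from the question, and the 18 picked numbers 1-18.
example_group = [1, 2, 3, 5, 6, 7, 77, 9, 10, 13, 15, 30, 40, 50, 60, 70,
                 56, 43, 100, 4, 103, 99, 66, 8, 78, 44, 55]
chosen = range(1, 19)

print(best_match([example_group], chosen))  # 12
print(is_covered([example_group], chosen))  # True
```

A guarantee would require is_covered to hold for every possible pick of 18 from the 108, not just for one pick.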

How is an integer stored in memory?

This is most probably the dumbest question anyone would ask, but regardless I hope I will find a clear answer for this.
My question is - How is an integer stored in computer memory?
In c# an integer is of size 32 bit. MSDN says we can store numbers from -2,147,483,648 to 2,147,483,647 inside an integer variable.
As per my understanding a bit can store only 2 values i.e 0 & 1. If I can store only 0 or 1 in a bit, how will I be able to store numbers 2 to 9 inside a bit?
More precisely, say I have this code int x = 5; How will this be represented in memory or in other words how is 5 converted into 0's and 1's, and what is the convention behind it?
It's represented in binary (base 2). Read more about number bases. In base 2 you only need 2 different symbols to represent a number. We usually use the symbols 0 and 1. In our usual base we use 10 different symbols to represent all the numbers, 0, 1, 2, ... 8, and 9.
For comparison, think about a number that doesn't fit in our usual system, like 14. We don't have a symbol for 14, so how do we represent it? Easy, we just combine two of our symbols, 1 and 4. 14 in base 10 means 1*10^1 + 4*10^0.
1110 in base 2 (binary) means 1*2^3 + 1*2^2 + 1*2^1 + 0*2^0 = 8 + 4 + 2 + 0 = 14. So despite not having enough symbols in either base to represent 14 with a single symbol, we can still represent it in both bases.
In another commonly used base, base 16, which is also known as hexadecimal, we have enough symbols to represent 14 using only one of them. You'll usually see 14 written using the symbol e in hexadecimal.
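The positional idea above can be sketched in a few lines of Python. The helper to_base is a hypothetical name used only for illustration: it repeatedly divides by the base and collects the remainders as digits.

```python
DIGITS = "0123456789abcdef"


def to_base(n, base):
    """Write a non-negative integer n using the digits of the given base (2 to 16)."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        out.append(DIGITS[n % base])  # lowest-order digit first
        n //= base
    return "".join(reversed(out))


print(to_base(14, 10))  # 14
print(to_base(14, 2))   # 1110
print(to_base(14, 16))  # e
print(to_base(5, 2))    # 101
```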
For negative integers we use a convenient representation called two's complement: the negative of a number is its complement (all 1s flipped to 0s and all 0s flipped to 1s) with one added to it.
There are two main reasons this is so convenient:
We know immediately if a number is positive or negative by looking at a single bit, the most significant bit out of the 32 we use.
It's mathematically correct in that x - y = x + -y using regular addition the same way you learnt in grade school. This means that processors don't need to do anything special to implement subtraction if they already have addition. They can simply find the two's complement of y (recall, flip the bits and add one) and then add it to x using the addition circuit they already have, rather than having a special circuit for subtraction.
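A small Python sketch of the flip-and-add-one rule, assuming a 32-bit width as in a C# int. Python integers are unbounded, so the mask is what simulates the fixed width; the function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # keeps only the low 32 bits


def twos_complement(y):
    """Flip all 32 bits of y, then add one."""
    return ((y ^ MASK) + 1) & MASK


def subtract(x, y):
    """Compute x - y using only addition, as a 32-bit adder would."""
    return (x + twos_complement(y)) & MASK


print(format(5, "032b"))                   # 00000000000000000000000000000101
print(format(twos_complement(5), "032b"))  # 11111111111111111111111111111011
print(subtract(9, 5))                      # 4
```

The carry out of the top bit is simply discarded by the mask, which is why plain addition gives the right answer.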
This is not a dumb question at all.
Let's start with uint because it's slightly easier. The convention is:
You have 32 bits in a uint. Each bit is assigned a number ranging from 0 to 31. By convention the rightmost bit is 0 and the leftmost bit is 31.
Take each bit number and raise 2 to that power, and then multiply it by the value of the bit. So if bit number three is one, that's 1 x 2^3. If bit number twelve is zero, that's 0 x 2^12.
Add up all those numbers. That's the value.
So five would be 00000000000000000000000000000101, because 5 = 1 x 2^0 + 0 x 2^1 + 1 x 2^2 + ... the rest are all zero.
That's a uint. The convention for ints is:
Compute the value as a uint.
If the value is greater than or equal to 0 and strictly less than 2^31 then you're done. The int and uint values are the same.
Otherwise, subtract 2^32 from the uint value and that's the int value.
This might seem like an odd convention. We use it because it turns out that it is easy to build chips that perform arithmetic in this format extremely quickly.
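The two conventions above can be written out directly. This is an illustrative Python sketch with hypothetical function names; it takes a string of 32 bits, computes the uint value by summing powers of two, and then applies the int rule.

```python
def bits_to_uint(bits):
    """Sum bit * 2^position, where the rightmost bit is position 0."""
    return sum(int(b) << position for position, b in enumerate(reversed(bits)))


def uint_to_int(u):
    """Apply the int convention: values of 2^31 and above wrap to negative."""
    return u if u < 2**31 else u - 2**32


five = "0" * 29 + "101"
minus_five = "1" * 29 + "011"

print(uint_to_int(bits_to_uint(five)))        # 5
print(bits_to_uint(minus_five))               # 4294967291
print(uint_to_int(bits_to_uint(minus_five)))  # -5
```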
Binary works as follows (as your 32 bits).
1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1 | 1 1 1 1
2^ 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16......................................0
x = sign bit, the leftmost bit (position 31): if 1 then the number is negative, if 0 then it is positive
So the highest number is 0111111111............1 (all ones except the sign bit), which is 2^30 + 2^29 + 2^28 + ... + 2^1 + 2^0, or 2,147,483,647.
The lowest is 1000000.........0, meaning -2^31 or -2147483648.
Is this what high level languages lead to!? Eeek!
As other people have said it's a base 2 counting system. Humans are naturally base 10 counters mostly, though time for some reason is base 60, and 6 x 9 = 42 in base 13. Alan Turing was apparently adept at base 17 mental arithmetic.
Computers operate in base 2 because it's easy for the electronics to be either on or off, representing 1 and 0, which is all you need for base 2. You could build the electronics in such a way that it was on, off or somewhere in between. That'd be 3 states, allowing you to do ternary math (as opposed to binary math). However the reliability is reduced because it's harder to tell the difference between those three states, and the electronics are much more complicated. Even more levels lead to worse reliability.
Despite that it is done in multi level cell flash memory. In these each memory cell represents on, off and a number of intermediate values. This improves the capacity (each cell can store several bits), but it is bad news for reliability. This sort of chip is used in solid state drives, and these operate on the very edge of total unreliability in order to maximise capacity.

Generating random numbers for solar energy harvesting using Markov models

How do I generate random numbers using a Markov model in C#? I noticed here that almost all of the applications of the Markov algorithm are for randomly writing text. Is there source code somewhere, or a tutorial, where I can fully understand how this works? My goal is to generate random numbers to simulate solar energy harvesting.
First decide how deep your Markov model is going. Do you look at the previous number? The previous two numbers? The previous three numbers? Perhaps deeper?
Second, look through some actual solar energy data and extract the probabilities for what follows a group of 1, 2 or 3 numbers. Ideally you will be able to get complete coverage, but there may well be gaps. For those either extrapolate, or put in some average/random value.
All this so far is data.
Third, generate the first 1, 2 or 3 numbers. From your database pick the correct combination and randomly select one of the possible followers. When I do this, I also allow a low-probability random element so things don't get stuck in a rut.
Drop the earliest element of your 1, 2 or 3 numbers. Shift the others down and add the new number at the end. Repeat until you have enough data.
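The steps above can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not a model of real solar data: the observed list is made-up placeholder data standing in for discretised energy readings, and build_table, generate and jump_prob are hypothetical names. Followers are stored with repeats, so picking uniformly from the list reproduces the observed probabilities.

```python
import random
from collections import defaultdict


def build_table(series, depth=1):
    """Map each run of `depth` values to the list of values that followed it."""
    followers = defaultdict(list)
    for i in range(len(series) - depth):
        key = tuple(series[i:i + depth])
        followers[key].append(series[i + depth])
    return followers


def generate(followers, seed, length, rng, jump_prob=0.02):
    """Extend `seed` to `length` values by sampling followers of the current state."""
    all_values = sorted({v for vs in followers.values() for v in vs})
    state = tuple(seed)
    out = list(seed)
    while len(out) < length:
        options = followers.get(state)
        # Unseen state (a gap in the data) or a rare random jump: pick any value.
        if not options or rng.random() < jump_prob:
            nxt = rng.choice(all_values)
        else:
            nxt = rng.choice(options)
        out.append(nxt)
        state = state[1:] + (nxt,)  # drop the earliest element, append the new one
    return out


# Placeholder data: energy levels bucketed into 0..3 (an assumption for illustration).
observed = [0, 1, 2, 3, 3, 2, 1, 0, 0, 1, 2, 2, 3, 2, 1, 0]
table = build_table(observed, depth=2)
simulated = generate(table, observed[:2], 24, random.Random(42))
print(simulated)
```

Passing in a seeded random.Random makes a run repeatable, which helps when comparing simulations.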
Here is a short extract from my 1-deep Markov word generator showing part of the data table:
// The line addEntry('h', "e 50 a 23 i 12 o 7 # 100") shows that the letter
// 'h' is followed by 'e' 50% of the time, 'a' 23% of the time, 'i' 12% of
// the time, 'o' 7% of the time and otherwise some other letter, '#'.
//
// Figures are taken from Gaines and tweaked. (see 'q')
private void initMarkovTable() {
    mMarkovTable = new HashMap<Character, List<CFPair>>(26);
    addEntry('a', "n 21 t 17 s 12 r 10 l 8 d 5 c 4 m 4 # 100");
    addEntry('b', "e 34 l 17 u 11 o 9 a 7 y 5 b 4 r 4 # 100");
    addEntry('c', "h 19 o 19 e 17 a 13 i 7 t 6 r 4 l 4 k 4 # 100");
    addEntry('d', "e 16 i 14 a 14 o 10 y 8 s 6 u 5 # 100");
    addEntry('e', "r 15 d 10 s 9 n 8 a 7 t 6 m 5 e 4 c 4 o 4 w 4 # 100");
    addEntry('f', "t 22 o 21 e 10 i 9 a 7 r 5 f 5 u 4 # 100");
    addEntry('g', "e 14 h 14 o 12 r 10 a 8 t 6 f 5 w 4 i 4 s 4 # 100");
    addEntry('h', "e 50 a 23 i 12 o 7 # 100");
    // ...
}
The data is organised as letter-frequency pairs. I used the '#' character to indicate "pick any letter here". Your data will be number-frequency pairs instead.
To pick an output, I read the appropriate line of the data and generate a random percentage. Scan along the data, accumulating the frequencies until the accumulated frequency exceeds the random percentage. That is the letter (or number in your case) you pick.
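The accumulate-and-compare step can be sketched like this in Python. It is an illustration only: pick is a hypothetical name, the pairs are the 'h' row from the table above, and treating '#' as a fallback returned when no listed entry is reached is an assumption about how that marker is meant to work.

```python
import random


def pick(pairs, fallback, roll):
    """Return the first value whose accumulated frequency exceeds roll (0 <= roll < 100)."""
    accumulated = 0
    for value, percent in pairs:
        accumulated += percent
        if accumulated > roll:
            return value
    return fallback  # roll landed beyond every listed frequency


h_row = [("e", 50), ("a", 23), ("i", 12), ("o", 7)]

print(pick(h_row, "#", 10))  # e
print(pick(h_row, "#", 60))  # a
print(pick(h_row, "#", 95))  # #

# In real use the roll comes from a random number generator.
print(pick(h_row, "#", random.uniform(0, 100)))
```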

Using mod operator in C#

Items:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
In a Repeater control, I want to place a class on the highlighted item numbers.
So I have written the following code.
if ((DL_NewProducts.Items.Count) % 3 == 0)
{
    var libox = e.Item.FindControl("libox") as HtmlGenericControl;
    if (libox != null)
        libox.Attributes["class"] = "last";
}
The problem is that on the first pass it finds three items, the mod works fine, and it places the class on the 4th item. But on the second pass it matches again at the 6th item and places the class on the 7th item, while I want it placed on the 8th. What would be the correct logic for this?
You are looking for (DL_NewProducts.Items.Count % 4) == 0.
The question isn't completely clear - you have marked the sequence 4, 8, 12, ... in bold but appear to actually want the numbers in the sequence 3, 7, 11... to pass the test.
So I think you're looking for the expression:
DL_NewProducts.Items.Count % 4 == 3
But it's hard to tell since it isn't clear if those numbers at the top represent counts, zero-based indices or one-based indices. If you can clarify exactly what they represent and how they relate to the collection's count, we might be able to provide more appropriate answers.
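The difference between the two expressions is easiest to see by listing which values pass each test. A small Python sketch, assuming 18 items as in the question; the function names are hypothetical:

```python
def one_based_matches(total, step=4):
    """One-based positions n where n % step == 0."""
    return [n for n in range(1, total + 1) if n % step == 0]


def zero_based_matches(total, step=4):
    """Zero-based indices i where i % step == step - 1."""
    return [i for i in range(total) if i % step == step - 1]


print(one_based_matches(18))   # [4, 8, 12, 16]
print(zero_based_matches(18))  # [3, 7, 11, 15]
```

Both lists refer to the same items: index 3 is the 4th item, index 7 is the 8th, and so on. Which expression is right depends on whether the value being tested is a one-based position or a zero-based count of items bound so far.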
