MathDotNet: Sampling from distributions without casting

MathDotNet: Sampling from distributions without casting - c#

How does one sample from a distribution in MathDotNet without having to cast to the specific distribution?
I have a distribution d which could be any random variable, being passed around as an IDistribution. Now, I wish to sample from it. I want to do this with having to do as few casts as possible on the actual distribution itself (I don't want a giant case statement with a ton of casts to really specific distribution types like Bernoulli, Normal, etc.
I have tried the following code, for an IDistribution d who is of type Bernoulli, with a mean of around 0.99:
Console.WriteLine("Mean is " + ((Bernoulli)d).Mean);
Console.WriteLine("Casted sample is " + ((Bernoulli)d).Sample());
Console.WriteLine("Sample is " + d.RandomSource.NextDouble());
The first print statement prints 0.99, as expected.
The second print statement tends to return 1, as expected, since 99% of the time it should return 1.
The third print statement seems to be giving me what looks like a uniform random variable between 0 or 1 (NB: It might not be uniform, that's just with a quick eyeball test on print statements, but it's definitely NOT Bernoulli with mean 0.99).
How can I sample generally from the appropriate distribution?

This is what I am currently doing. I would like to avoid the if statement, but for now, this works. If someone has a better answer, it would be preferable:
if (distribution is IContinuousDistribution){
value = (double)((IContinuousDistribution)distribution).Sample();
}else{
value = (double)((IDiscreteDistribution)distribution).Sample();
}

Related

C# Check if variable isn't same and set or just set [duplicate]

I'm doing a bit of coding, where I have to write this sort of code:
if( array[i]==false )
array[i]=true;
I wonder if it should be re-written as
array[i]=true;
This raises the question: are comparisions faster than assignments?
What about differences from language to language? (contrast between java & cpp, eg.)
NOTE: I've heard that "premature optimization is the root of all evil." I don't think that applies here :)

This isn't just premature optimization, this is micro-optimization, which is an irrelevant distraction.
Assuming your array is of boolean type then your comparison is unnecessary, which is the only relevant observation.

Well, since you say you're sure that this matters you should just write a test program and measure to find the difference.
Comparison can be faster if this code is executed on multiple variables allocated at scattered addresses in memory. With comparison you will only read data from memory to the processor cache, and if you don't change the variable value when the cache decides to to flush the line it will see that the line was not changed and there's no need to write it back to the memory. This can speed up execution.

Edit: I wrote a script in PHP. I just noticed that there was a glaring error in it meaning the best-case runtime was being calculated incorrectly (scary that nobody else noticed!)
Best case just beats outright assignment but worst case is a lot worse than plain assignment. Assignment is likely fastest in terms of real-world data.
Output:
assignment in 0.0119960308075 seconds
worst case comparison in 0.0188510417938 seconds
best case comparison in 0.0116770267487 seconds
Code:
<?php
$arr = array();
$mtime = explode(" ", microtime());
$starttime = $mtime[1] + $mtime[0];
reset_arr($arr);
for ($i=0;$i<10000;$i++)
$arr[i] = true;
$mtime = explode(" ", microtime());
$firsttime = $mtime[1] + $mtime[0];
$totaltime = ($firsttime - $starttime);
echo "assignment in ".$totaltime." seconds<br />";
reset_arr($arr);
for ($i=0;$i<10000;$i++)
if ($arr[i])
$arr[i] = true;
$mtime = explode(" ", microtime());
$secondtime = $mtime[1] + $mtime[0];
$totaltime = ($secondtime - $firsttime);
echo "worst case comparison in ".$totaltime." seconds<br />";
reset_arr($arr);
for ($i=0;$i<10000;$i++)
if (!$arr[i])
$arr[i] = false;
$mtime = explode(" ", microtime());
$thirdtime = $mtime[1] + $mtime[0];
$totaltime = ($thirdtime - $secondtime);
echo "best case comparison in ".$totaltime." seconds<br />";
function reset_arr($arr) {
for ($i=0;$i<10000;$i++)
$arr[$i] = false;
}

I believe if comparison and assignment statements are both atomic(ie one processor instruction) and the loop executes n times, then in the worst-case comparing then assigning would require n+1(comparing on every iteration plus setting the assignement) executions whereas constantly asssigning the bool would require n executions. Therefore the second one is more efficient.

Depends on the language. However looping through arrays can be costly as well. If the array is in consecutive memory, the fastest is to write 1 bits (255s) across the entire array with memcpy assuming your language/compiler can do this.
Thus performing 0 reads-1 write total, no reading/writing the loop variable/array variable (2 reads/2 writes each loop) several hundred times.

I really wouldn't expect there to be any kind of noticeable performance difference for something as trivial as this so surely it comes down to what gives you clear, more readable code. I my opinion that would be always assigning true.

Might give this a try:
if(!array[i])
array[i]=true;
But really the only way to know for sure is to profile, I'm sure pretty much any compiler would see the comparison to false as unnecessary and optimize it out.

It all depends on the data type. Assigning booleans is faster than first comparing them. But that may not be true for larger value-based datatypes.

As others have noted, this is micro-optimization.
(In politics or journalism, this is known as navel-gazing ;-)
Is the program large enough to have more than a couple layers of function/method/subroutine calls?
If so, it probably had some avoidable calls, and those can waste hundreds as much time as low-level inefficiencies.
On the assumption that you have removed those (which few people do), then by all means run it 10^9 times under a stopwatch, and see which is faster.

Why would you even write the first version? What's the benefit of checking to see if something is false before setting it true. If you always are going to set it true, then always set it true.
When you have a performance bottleneck that you've traced back to setting a single boolean value unnecessarily, come back and talk to us.

I remember in one book about assembly language the author claimed that if condition should be avoided, if possible.
It is much slower if the condition is false and execution has to jump to another line, considerably slowing down performance. Also since programs are executed in machine code, I think 'if' is slower in every (compiled) language, unless its condition is true almost all the time.

If you just want to flip the values, then do:
array[i] = !array[i];
Performance using this is actually worse though, as instead of only having to do a single check for a true false value then setting, it checks twice.
If you declare a 1000000 element array of true,false, true,false pattern comparision is slower. (var b = !b) essentially does a check twice instead of once

.NET query getting inappropriate answer

I was at career fair and I was asked a question "what does this query do and in which language it is".
I told that it is in .NET LINQ, but was unable to predict what it does.
Can any one help me
I wrote in .NET and tried .
var youShould = from c
in "3%.$#9/52#2%35-%#4/#./3,!#+%23 !2#526%N#/-"
select (char)(c ^ 3 << 5);
Label1.Text = youShould.ToString();
And got this output :
System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Char,System.Char]

First of all, don't feel bad that you didn't get the answer. I know exactly what's going on here and I'd have probably just laughed and walked away if someone asked me what this did.
There's a number of things going on here, but start with the output:
var youShould = from c in "3%.$#9/52#2%35-%#4/#./3,!#+%23 !2#526%N#/-"
select (char)(c ^ 3 << 5);
Label1.Text = youShould.ToString();
>>> System.Linq.Enumerable+WhereSelectEnumerableIterator`2[System.Char,System.Char]
When you run a LINQ query, or use any of the equivalent methods like Select() that return sets, what you get back is a special, internal type of object called an "iterator", specifically, an object that implements the IEnumerable interface. .NET uses these objects all over the place; for example, the foreach loop's purpose is to iterate over iterators.
One of the most important things to know about these kind of objects is that just creating one doesn't actually "do" anything. The iterator doesn't actually contain a set of things; rather, it contains the instructions needed to produce a set of things. If you try to do something like, e.g. ToString on it, the result you get won't be very useful.
However, it does tell us one thing: it tells us that this particular iterator takes a source list of type char and returns a new set, also of type char. (I know that because I know that's what the two generic parameters of a "select iterator" do.) To get the actual results out of this iterator you just need to loop over it somehow, e.g.:
foreach (var c in youShould)
{
myLabel.Text += c;
}
or, slightly easier,
myLabel.Text = new string(youShould.ToArray());
To actually figure out what it does, you have to also recognize the second fact: LINQ treats a string as a "set of characters". It is going to process each character in that string, one at a time, and perform the bit-wise operations on the value.
The long-form equivalent of that query is something like this:
var input= "3%.$#9/52#2%35-%#4/#./3,!#+%23 !2#526%N#/-";
var output = string.Empty;
foreach (var c in input)
{
var i = (int)c;
var i2 = i ^ (3 << 5);
var c2= (char)i2;
output += c2;
}
If you did the math by hand you'd get the correct output message. To save you the brain-numbing exercise, I'll just tell you: it toggles bits 5 and 6 of the ASCII value, changing each character to one further up the ASCII table. The resulting string is:
SEND YOUR RESUME TO [an email address]
Demostrative .NET Fiddle: https://dotnetfiddle.net/x7UvYA

For each character in the string, project the character by xor-ing it with (3 left shifted by 5), then cast the numeric value back to a char.
You could generate your own code strings by running the query again over an uncoded string, because if you XOR a number twice by the same value, you'll be left with the same number you started with. (e.g. x ^ y ^ y = x)
I'll leave it as an exercise to the reader to figure out what the following is:
4()3#)3#!#$5-"#4%34
I suppose it tests:
Linq to objects
Understanding of IEnumerable<T> interface and how it relates to strings
Casting
Bitwise operations
Bitwise operator precedence
Personally, I think this is a useless test that doesn't really reflect real world problems.

Named numbers as variables [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Improve this question
I've seen this a couple of times recently in high profile code, where constant values are defined as variables, named after the value, then used only once. I wondered why it gets done?
E.g. Linux Source (resize.c)
unsigned five = 5;
unsigned seven = 7;
E.g. C#.NET Source (Quaternion.cs)
double zero = 0;
double one = 1;

Naming numbers is terrible practice, one day something will need to change, and you'll end up with unsigned five = 7.
If it has some meaning, give it a meaningful name. The 'magic number' five is no improvement over the magic number 5, it's worse because it might not actually equal 5.
This kind of thing generally arises from some cargo-cult style programming style guidelines where someone heard that "magic numbers are bad" and forbade their use without fully understanding why.

Well named variables
Giving proper names to variables can dramatically clarify code, such as
constant int MAXIMUM_PRESSURE_VALUE=2;
This gives two key advantages:
The value MAXIMUM_PRESSURE_VALUE may be used in many different places, if for whatever reason that value changes you need to change it in only one place.
Where used it immediately shows what the function is doing, for example the following code obviously checks if the pressure is dangerously high:
if (pressure>MAXIMUM_PRESSURE_VALUE){
//without me telling you you can guess there'll be some safety protection in here
}
Poorly named variables
However, everything has a counter argument and what you have shown looks very like a good idea taken so far that it makes no sense. Defining TWO as 2 doesn't add any value
constant int TWO=2;
The value TWO may be used in many different places, perhaps to double things, perhaps to access an index. If in the future you need to change the index you cannot just change to int TWO=3; because that would affect all the other (completely unrelated) ways you've used TWO, now you'd be tripling instead of doubling etc
Where used it gives you no more information than if you just used "2". Compare the following two pieces of code:
if (pressure>2){
//2 might be good, I have no idea what happens here
}
or
if (pressure>TWO){
//TWO means 2, 2 might be good, I still have no idea what happens here
}
Worse still (as seems to be the case here) TWO may not equal 2, if so this is a form of obfuscation where the intention is to make the code less clear: obviously it achieves that.
The usual reason for this is a coding standard which forbids magic numbers but doesn't count TWO as a magic number; which of course it is! 99% of the time you want to use a meaningful variable name but in that 1% of the time using TWO instead of 2 gains you nothing (Sorry, I mean ZERO).
this code is inspired by Java but is intended to be language agnostic

Short version:
A constant five that just holds the number five is pretty useless. Don't go around making these for no reason (sometimes you have to because of syntax or typing rules, though).
The named variables in Quaternion.cs aren't strictly necessary, but you can make the case for the code being significantly more readable with them than without.
The named variables in ext4/resize.c aren't constants at all. They're tersely-named counters. Their names obscure their function a bit, but this code actually does correctly follow the project's specialized coding standards.
What's going on with Quaternion.cs?
This one's pretty easy.
Right after this:
double zero = 0;
double one = 1;
The code does this:
return zero.GetHashCode() ^ one.GetHashCode();
Without the local variables, what does the alternative look like?
return 0.0.GetHashCode() ^ 1.0.GetHashCode(); // doubles, not ints!
What a mess! Readability is definitely on the side of creating the locals here. Moreover, I think explicitly naming the variables indicates "We've thought about this carefully" much more clearly than just writing a single confusing return statement would.
What's going on with resize.c?
In the case of ext4/resize.c, these numbers aren't actually constants at all. If you follow the code, you'll see that they're counters and their values actually change over multiple iterations of a while loop.
Note how they're initialized:
unsigned three = 1;
unsigned five = 5;
unsigned seven = 7;
Three equals one, huh? What's that about?
See, what actually happens is that update_backups passes these variables by reference to the function ext4_list_backups:
/*
* Iterate through the groups which hold BACKUP superblock/GDT copies in an
* ext4 filesystem. The counters should be initialized to 1, 5, and 7 before
* calling this for the first time. In a sparse filesystem it will be the
* sequence of powers of 3, 5, and 7: 1, 3, 5, 7, 9, 25, 27, 49, 81, ...
* For a non-sparse filesystem it will be every group: 1, 2, 3, 4, ...
*/
static unsigned ext4_list_backups(struct super_block *sb, unsigned *three,
unsigned *five, unsigned *seven)
They're counters that are preserved over the course of multiple calls. If you look at the function body, you'll see that it's juggling the counters to find the next power of 3, 5, or 7, creating the sequence you see in the comment: 1, 3, 5, 7, 9, 25, 27, &c.
Now, for the weirdest part: the variable three is initialized to 1 because 30 = 1. The power 0 is a special case, though, because it's the only time 3x = 5x = 7x. Try your hand at rewriting ext4_list_backups to work with all three counters initialized to 1 (30, 50, 70) and you'll see how much more cumbersome the code becomes. Sometimes it's easier to just tell the caller to do something funky (initialize the list to 1, 5, 7) in the comments.
So, is five = 5 good coding style?
Is "five" a good name for the thing that the variable five represents in resize.c? In my opinion, it's not a style you should emulate in just any random project you take on. The simple name five doesn't communicate much about the purpose of the variable. If you're working on a web application or rapidly prototyping a video chat client or something and decide to name a variable five, you're probably going to create headaches and annoyance for anyone else who needs to maintain and modify your code.
However, this is one example where generalities about programming don't paint the full picture. Take a look at the kernel's coding style document, particularly the chapter on naming.
GLOBAL variables (to be used only if you really need them) need to
have descriptive names, as do global functions. If you have a function
that counts the number of active users, you should call that
"count_active_users()" or similar, you should not call it "cntusr()".
...
LOCAL variable names should be short, and to the point. If you have
some random integer loop counter, it should probably be called "i".
Calling it "loop_counter" is non-productive, if there is no chance of it
being mis-understood. Similarly, "tmp" can be just about any type of
variable that is used to hold a temporary value.
If you are afraid to mix up your local variable names, you have another
problem, which is called the function-growth-hormone-imbalance syndrome.
See chapter 6 (Functions).
Part of this is C-style coding tradition. Part of it is purposeful social engineering. A lot of kernel code is sensitive stuff, and it's been revised and tested many times. Since Linux is a big open-source project, it's not really hurting for contributions — in most ways, the bigger challenge is checking those contributions for quality.
Calling that variable five instead of something like nextPowerOfFive is a way to discourage contributors from meddling in code they don't understand. It's an attempt to force you to really read the code you're modifying in detail, line by line, before you try to make any changes.
Did the kernel maintainers make the right decision? I can't say. But it's clearly a purposeful move.

My organisation have certain programming guidelines, one of which is the use of magic numbers...
eg:
if (input == 3) //3 what? Elephants?....3 really is the magic number here...
This would be changed to:
#define INPUT_1_VOLTAGE_THRESHOLD 3u
if (input == INPUT_1_VOLTAGE_THRESHOLD) //Not elephants :(
We also have a source file with -200,000 -> 200,000 #defined in the format:
#define MINUS_TWO_ZERO_ZERO_ZERO_ZERO_ZERO -200000
which can be used in place of magic numbers, for example when referencing a specific index of an array.
I imagine this has been done for "Readability".

The numbers 0, 1, ... are integers. Here, the 'named variables' give the integer a different type. It might be more reasonable to specify these constant (const unsigned five = 5;)

I've used something akin to that a couple times to write values to files:
const int32_t zero = 0 ;
fwrite( &zero, sizeof(zero), 1, myfile );
fwrite accepts a const pointer, but if some function needs a non const pointer, you'll end up using a non const variable.
P.S.: That always keeps me wondering what may be the sizeof zero .

How do you come to a conslusion that it is used only once? It is public, it could be used any number of times from any assembly.
public static readonly Quaternion Zero = new Quaternion();
public static readonly Quaternion One = new Quaternion(1.0f, 1.0f, 1.0f, 1.0f);
Same thing applies to .Net framework decimal class. which also exposes public constants like this.
public const decimal One = 1m;
public const decimal Zero = 0m;

Numbers are often given a name when these numbers have special meaning.
For example in the Quaternion case the identity quaternion and unit length quaternion have special meaning and are frequently used in a special context. Namely Quaternion with (0,0,0,1) is an identity quaternion so it's a common practice to define them instead of using magic numbers.
For example
// define as static
static Quaternion Identity = new Quaternion(0,0,0,1);
Quaternion Q1 = Quaternion.Identity;
//or
if ( Q1.Length == Unit ) // not considering floating point error

One of my first programming jobs was on a PDP 11 using Basic. The Basic interpreter allocated memory to every number required, so every time the program mentioned 0, a byte or two would be used to store the number 0. Of course back in those days memory was a lot more limited than today and so it was important to conserve.
Every program in that work place started with:
10 U0%=0
20 U1%=1
That is, for those who have forgotten their Basic:
Line number 10: create an integer variable called U0 and assign it the number 0
Line number 20: create an integer variable called U1 and assign it the number 1
These variables, by local convention, never held any other value, so they were effectively constants. They allowed 0 and 1 to be used throughout the program without wasting any memory.
Aaaaah, the good old days!

some times it's more readable to write:
double pi=3.14; //Constant or even not constant
...
CircleArea=pi*r*r;
instead of:
CircleArea=3.14*r*r;
and may be you would use pi more again (you are not sure but you think it's possible later or in other classes if they are public)
and then if you want to change pi=3.14 into pi=3.141596 it's easier.
and some other like e=2.71, Avogadro and etc.

Replace values located close to themselves with the mean value

I'm not sure if SO is the proper place for asking this, if it's not, I will remove the question and try it in some other place. Said that I'm trying to do the following:
I have a List<double> and want to replace the block of values whose values are situated very close (say 0.75 in this example) to a single value, representing the mean of the replaced values.
The values that are isolated, or alone, should not be modified.
Also the replaced block can't be longer than 5.
Computing the mean value for each interval from 0, 5, 10.. would not provide the expected results.
It happened many times that LINQ power surprised me gladly and I would be happy if someone could guide me in the creation of this little method.
What I've thought is to first find the closest values for each one, calculate the distance, if the distance is less than minimum (0.75 in this example) then assign those values to the same block.
When all values are assigned to their blocks run a second loop that replaces each block (with one value, or many values) to its mean.
The problem that I have in this approach is to assign the "block": if several values are together, I need to check if the evaluating value is contained in another block, and if it so, the new value should be in that block too.
I don't know if this is the right way of doing this or I'm over complicating it.
EDIT: the expected result:
Although you see two axes only one is used, the List is 1D, I should have drawn only the X axis.
The length of the lines that are represented is irrelevant. It's just to mark on the axis where the value is situated.

It turns out that MSDN has already done this, and provided an in-depth example application with code:
Data Clustering - Detecting Abnormal Data Using k-Means Clustering

C# - Suggestions of control statement needed

I'm a student and I got a homework i need some minor help with =)
Here is my task:
Write an application that prompts the user to enter the size of a square and display a square of asterisks with the sides equal with entered integer. Your application works for side’s size from 2 to 16. If the user enters a number less than 2 or greater then 16, your application should display a square of size 2 or 16, respectively, and an error message.
This is how far I've come:
start:
int x;
string input;
Console.Write("Enter a number between 2-16: ");
input = Console.ReadLine();
x = Int32.Parse(input);
Console.WriteLine("\n");
if (x <= 16 & x >= 2)
{
control statement
code
code
code
}
else
{
Console.WriteLine("You must enter a number between 2 and 16");
goto start;
}
I need help with...
... what control statment(if, for, while, do-while, case, boolean) to use inside the "if" control.
My ideas are like...
do I write a code that writes out the boxes for every type of number entered? That's a lot of code...
..there must be a code containing some "variable++" that could do the task for me, but then what control statement suits the task best?
But if I use a "variable++" how am I supposed to write the spaces in the output, because after all, it has to be a SQUARE?!?! =)
I'd love some suggestions on what type of statements to use, or maybe just a hint, of course not the whole solution as I am a student!

It's not the answer you're looking for, but I do have a few suggestions for clean code:
Your use of Int32.Parse is a potential exception that can crash the application. Look into Int32.TryParse (or just int.TryParse, which I personally think looks cleaner) instead. You'll pass it what it's parsing and an "out" parameter of the variable into which the value should be placed (in this case, x).
Try not to declare your variables until you actually use them. Getting into the habit of declaring them all up front (especially without instantiated values) can later lead to difficult to follow code. For my first suggestions, x will need to be declared ahead of time (look into default in C# for default instantiation... it's, well, by default, but it's good information to understand), but the string doesn't need to be.
Try to avoid using goto when programming :) For this code, it would be better to break out the code which handles the value and returns what needs to be drawn into a separate method and have the main method just sit around and wait for input. Watch for hard infinite loops, though.
It's never too early to write clean and maintainable code, even if it's just for a homework assignment that will never need to be maintained :)

You do not have to write code for every type of number entered. Instead, you have to use loops (for keyword).
Probably I must stop here and let you do the work, but I would just give a hint: you may want to do it with two loops, one embedded in another.
I have also noted some things I want to comment in your code:
Int32.Parse: do not use Int32, but int. It will not change the meaning of your code. I will not explain why you must use int instead: it is quite difficult to explain, and you would understand it later for sure.
Avoid using goto statement, except if you were told to use it in the current case by your teacher.
Console.WriteLine("\n");: avoid "\n". It is platform dependent (here, Linux/Unix; on Windows it's "\r\n", and on MacOS - "\n\r"). Use Environment.NewLine instead.
x <= 16 & x >= 2: why & and not ||?
You can write string input = Console.ReadLine(); instead of string input; followed by input = Console.ReadLine();.

Since it's homework, we can't give you the answer. But here are some hints (assuming solid *'s, not white space in-between):
You're going to want to iterate from 1 to N. See for (int...
There's a String constructor that will allow you to avoid the second loop. Look at all of the various constructors.
Your current error checking does not meet the specifications. Read the spec again.
You're going to throw an exception if somebody enters a non-parsable integer.
goto's went out of style before bell-bottoms. You actually don't need any outer control for the spec you were given, because it's "one shot and go". Normally, you would write a simple console app like this to look for a special value (e.g., -1) and exit when you see that value. In that case you would use while (!<end of input>) as the outer control flow.

If x is greater or equal to 16, why not assign 16 to it (since you'll eventually need to draw a square with a side of length 16) (and add an appropriate message)?

the control statement is:
for (int i = 0; i < x; i++)
{
for ( int j = 0; j < x; j++ )
{
Console.Write("*");
}
Console.WriteLine();
}
This should print a X by X square of asterisks!
I'ma teacher and I left the same task to my students a while ago, I hope you're not one of them! :)

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.