R, python or octave: empirical quantile (inverse cdf) with confidence intervals?

I'm looking for a built-in function that returns the sample quantile and an estimated confidence interval in something other than MATLAB (MATLAB's ecdf does this).
I'm guessing R has this built-in and I just haven't found it yet.
If you have any standalone code to do this, you could also point to it here, though I hope to find something that is included as part of a larger open code base.
(Trying to get away from MATLAB.)

The survfit function can be used to get the survival function with confidence intervals. Since it is just 1-ecdf, there is a direct relationship between the quantiles. To use this you have to create a variable that says that each of your observations is complete (not censored):
library(survival)
x <- rnorm(10)  # standard normal sample; matches the example output below
ev <- rep(1, length(x))
sf <- survfit(Surv(x,ev)~1)
With output:
>summary(sf)
Call: survfit(formula = Surv(x, ev) ~ 1)
    time n.risk n.event survival std.err lower 95% CI upper 95% CI
 -1.4143     10       1      0.9  0.0949       0.7320        1.000
 -1.1229      9       1      0.8  0.1265       0.5868        1.000
 -0.9396      8       1      0.7  0.1449       0.4665        1.000
 -0.4413      7       1      0.6  0.1549       0.3617        0.995
 -0.2408      6       1      0.5  0.1581       0.2690        0.929
 -0.1698      5       1      0.4  0.1549       0.1872        0.855
  0.0613      4       1      0.3  0.1449       0.1164        0.773
  0.1983      3       1      0.2  0.1265       0.0579        0.691
  0.5199      2       1      0.1  0.0949       0.0156        0.642
  0.8067      1       1      0.0     NaN           NA           NA
In fact, survfit does calculate the median and its confidence interval, but not the other quantiles:
>sf
Call: survfit(formula = Surv(x, ev) ~ 1)
records   n.max  n.start  events  median  0.95LCL  0.95UCL
 10.000  10.000   10.000  10.000  -0.205   -0.940       NA
The actual work of calculating the confidence interval for the median is well hidden in the survival:::survmean function, which you could probably use as a starting point to generalize to other quantiles.

Related

What is Exactly AverageTimer32 in Performance monitoring background?

I have read lots of articles, but I am still confused. They say that, in the background, the system calculates AverageTimer32 with this formula: ((N1 - N0) / F) / (B1 - B0)
Searching for this formula, I found:
N1: current reading at time t (provided to the AverageTimer32/64)
N0: reading before, at t - 1 (provided to the AverageTimer32/64)
B1: current counter at time t (provided to the AverageBase)
B0: counter before, at t - 1 (provided to the AverageBase)
F: factor to convert ticks to seconds
My questions:
1. What is F, and how is it calculated? Where does it come from?
2. As stated above, N0 is the reading before, at t - 1. What is t - 1? If the current time is 01:14:44, how do I get t - 1? Is it measured in seconds?
3. According to this formula, AverageTimer32 does not give a total average. For example, if method A was called 4 times and took (in order) 2 s, 4 s, 3 s and 2 s, I would expect the average to be (2+4+3+2)/4, and after a fifth call of 3 s, (2+4+3+2+3)/5, but it does not work like that. Why?
If possible, please also explain the AverageTimer32 formula in more detail.
Thanks in advance.
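A minimal sketch of what the formula boils down to, assuming the raw readings N are high-resolution timer ticks (Stopwatch.GetTimestamp) accumulated per operation, B counts completed operations, and F is Stopwatch.Frequency (ticks per second); the sample values below are made up:
using System;
using System.Diagnostics;

class AverageTimerFormulaSketch
{
    static void Main()
    {
        // Two hypothetical samples of the counter pair (N = accumulated ticks, B = operations).
        long n0 = 0, b0 = 0;                        // earlier sample, "t - 1"
        long n1 = 3 * Stopwatch.Frequency, b1 = 2;  // later sample, "t": 3 seconds worth of ticks, 2 operations

        double f = Stopwatch.Frequency;             // F: ticks per second of the high-resolution timer

        // ((N1 - N0) / F) / (B1 - B0)
        //   = seconds spent inside the operations between the two samples
        //     divided by the number of operations between the two samples
        double averageSecondsPerOperation = ((n1 - n0) / f) / (b1 - b0);

        Console.WriteLine(averageSecondsPerOperation); // 1.5 (average per operation over this sample interval only)
    }
}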

Find key or value in Dictionary<double, double> with linear interpolation

I am developing an application with .NET 4.5 / C# 6.0 which gets measurement data from a hardware device that is also configured by my application. E.g. I configure my device with the following:
start frequency: 10 kHz
stop frequency: 20 kHz
measurement points: 11
and at the end of the processing pipeline of my raw data I get a dictionary like the following, where the key is the frequency and the value e.g. the magnitude in dB:
key => value
10k => -3
11k => -3
12k => -3
13k => -3.5
14k => -4
15k => -5
16k => -6
17k => -7
18k => -8
19k => -10
20k => -12
This dictionary is updated "on the fly" as the device is continuously sweeping these 11 points in a loop.
These values are, for one thing, displayed in a chart, which is no problem as I simply update the chart data whenever a new point is ready (I get an event for each new point). But I also have a data grid where the user can display the values for manually entered points ([value] means manually entered), e.g.:
        | A: f       | B: mag (dB)
Point 1 | [11 kHz]   | -3 dB
Point 2 | [16.5 kHz] | -6.5 dB
Point 3 | 18.5 kHz   | [-9.5 dB]
Point 1 is easy as it hits exactly on one measured point, but for e.g. Point 2 the user manually enters 16.5 kHz in column A, which means I need to interpolate the value of column B for the two measurement points next to it.
About the same, but the other way around for Point 3: The user manually entered -9.5 dB in column B and so I need to find the interpolated frequency where this value would be the interpolated result.
Note - the following constraints apply for Point 3:
If the value occurs twice or more, the first one is used, e.g. -3 dB should return 10 kHz.
The frequency for the entered value is only searched once; after that it behaves the same as Point 2.
If the given value is not found, the closest one is returned (this also holds for Point 2), e.g. values > -3 dB return 10 kHz and values < -12 dB return 20 kHz.
Is there some fast/optimized way to get these values for Point 2 and Point 3?
I could only think of iterating over all points every time a value is manually entered, interpolating between every two neighboring points until the given value is found, and then updating column B whenever one of the neighboring frequencies is updated.
Note: The device delivers up to several hundred points per second.
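A minimal sketch of both lookups, assuming the sweep is first copied into an array of (frequency, magnitude) pairs sorted by frequency; the method names MagnitudeAt and FrequencyFor are made up for illustration:
using System;
using System.Collections.Generic;
using System.Linq;

static class SweepInterpolation
{
    // e.g. var points = dict.OrderBy(p => p.Key).ToArray();

    // Point 2: linearly interpolate the magnitude at an arbitrary frequency.
    public static double MagnitudeAt(KeyValuePair<double, double>[] points, double f)
    {
        if (f <= points[0].Key) return points[0].Value;                                  // clamp below the sweep
        if (f >= points[points.Length - 1].Key) return points[points.Length - 1].Value; // clamp above the sweep

        for (int i = 1; i < points.Length; i++)
        {
            if (f <= points[i].Key)
            {
                double f0 = points[i - 1].Key, f1 = points[i].Key;
                double m0 = points[i - 1].Value, m1 = points[i].Value;
                return m0 + (m1 - m0) * (f - f0) / (f1 - f0);
            }
        }
        return points[points.Length - 1].Value; // unreachable for sorted input
    }

    // Point 3: find the first frequency whose (interpolated) magnitude equals mag.
    public static double FrequencyFor(KeyValuePair<double, double>[] points, double mag)
    {
        for (int i = 1; i < points.Length; i++)
        {
            double m0 = points[i - 1].Value, m1 = points[i].Value;
            if (m0 == mag) return points[i - 1].Key;                      // exact hit: first occurrence wins
            if (mag >= Math.Min(m0, m1) && mag <= Math.Max(m0, m1) && m0 != m1)
            {
                double f0 = points[i - 1].Key, f1 = points[i].Key;
                return f0 + (f1 - f0) * (mag - m0) / (m1 - m0);           // inverse interpolation on this segment
            }
        }
        // Value never bracketed: fall back to the frequency of the closest magnitude.
        return points.OrderBy(p => Math.Abs(p.Value - mag)).First().Key;
    }
}
With only a handful of points per sweep a linear scan like this is cheap; for larger sweeps, Array.BinarySearch over the frequencies would reduce the forward lookup to O(log n).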

How is an integer stored in memory?

This is most probably the dumbest question anyone would ask, but regardless I hope I will find a clear answer for this.
My question is - How is an integer stored in computer memory?
In C#, an integer is 32 bits. MSDN says we can store numbers from -2,147,483,648 to 2,147,483,647 in an integer variable.
As per my understanding, a bit can store only 2 values, i.e. 0 and 1. If I can store only 0 or 1 in a bit, how will I be able to store numbers 2 to 9 inside a bit?
More precisely, say I have this code: int x = 5; How will this be represented in memory? In other words, how is 5 converted into 0's and 1's, and what is the convention behind it?
It's represented in binary (base 2). Read more about number bases. In base 2 you only need 2 different symbols to represent a number. We usually use the symbols 0 and 1. In our usual base we use 10 different symbols to represent all the numbers, 0, 1, 2, ... 8, and 9.
For comparison, think about a number that doesn't fit in our usual system with a single symbol. Like 14. We don't have a symbol for 14, so how do we represent it? Easy, we just combine two of our symbols, 1 and 4. 14 in base 10 means 1*10^1 + 4*10^0.
1110 in base 2 (binary) means 1*2^3 + 1*2^2 + 1*2^1 + 0*2^0 = 8 + 4 + 2 + 0 = 14. So despite not having enough symbols in either base to represent 14 with a single symbol, we can still represent it in both bases.
In another commonly used base, base 16, which is also known as hexadecimal, we have enough symbols to represent 14 using only one of them. You'll usually see 14 written using the symbol e in hexadecimal.
For negative integers we use a convenient representation called two's complement, which is the bitwise complement (all 1s flipped to 0s and all 0s flipped to 1s) with one added to it.
There are two main reasons this is so convenient:
We know immediately whether a number is positive or negative by looking at a single bit, the most significant bit of the 32 we use.
It's mathematically correct in that x - y = x + -y using regular addition the same way you learnt in grade school. This means that processors don't need to do anything special to implement subtraction if they already have addition. They can simply find the twos-complement of y (recall, flip the bits and add one) and then add x and y using the addition circuit they already have, rather than having a special circuit for subtraction.
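For example, a quick C# check of both points:
using System;

class TwosComplementDemo
{
    static void Main()
    {
        int x = 7, y = 5;

        // Two's complement of y: flip every bit, then add one.
        int negY = ~y + 1;

        Console.WriteLine(negY);     // -5
        Console.WriteLine(x - y);    // 2
        Console.WriteLine(x + negY); // 2: subtraction is just addition of the complement

        // The sign lives in bit 31, the most significant bit.
        Console.WriteLine(Convert.ToString(-5, 2)); // 11111111111111111111111111111011
    }
}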
This is not a dumb question at all.
Let's start with uint because it's slightly easier. The convention is:
You have 32 bits in a uint. Each bit is assigned a number ranging from 0 to 31. By convention the rightmost bit is 0 and the leftmost bit is 31.
Take each bit number and raise 2 to that power, and then multiply it by the value of the bit. So if bit number three is one, that's 1 x 2^3. If bit number twelve is zero, that's 0 x 2^12.
Add up all those numbers. That's the value.
So five would be 00000000000000000000000000000101, because 5 = 1 x 2^0 + 0 x 2^1 + 1 x 2^2 + ... the rest are all zero.
That's a uint. The convention for ints is:
Compute the value as a uint.
If the value is greater than or equal to 0 and strictly less than 2^31, then you're done. The int and uint values are the same.
Otherwise, subtract 2^32 from the uint value and that's the int value.
This might seem like an odd convention. We use it because it turns out that it is easy to build chips that perform arithmetic in this format extremely quickly.
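A short illustration of that convention (the binary literal syntax needs C# 7 or later):
using System;

class IntConventionDemo
{
    static void Main()
    {
        // 5 as a uint: ...0101. Values below 2^31 map to the same int.
        uint five = 0b0000_0000_0000_0000_0000_0000_0000_0101;
        Console.WriteLine((int)five);           // 5

        // A uint at or above 2^31 maps to (value - 2^32) when viewed as an int.
        uint big = 4294967291;                  // 2^32 - 5
        Console.WriteLine(unchecked((int)big)); // -5
        Console.WriteLine(big - 4294967296.0);  // -5, computed with the subtraction rule above
    }
}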
Binary works as follows (for your 32 bits). Bit 31 is the sign bit (1 means negative, 0 means non-negative), and bits 30 down to 0 carry the weights 2^30 down to 2^0:
bit:    31    30    29    28   ...   2    1    0
weight: sign  2^30  2^29  2^28 ...  2^2  2^1  2^0
So the highest number is 0111...1 (all ones except the sign bit), which is 2^30 + 2^29 + 2^28 + ... + 2^1 + 2^0, or 2,147,483,647.
The lowest is 1000...0, meaning -2^31, or -2,147,483,648.
Is this what high level languages lead to!? Eeek!
As other people have said it's a base 2 counting system. Humans are naturally base 10 counters mostly, though time for some reason is base 60, and 6 x 9 = 42 in base 13. Alan Turing was apparently adept at base 17 mental arithmetic.
Computers operate in base 2 because it's easy for the electronics to be either on or off, representing 1 and 0, which is all you need for base 2. You could build the electronics so that each element was on, off or somewhere in between. That would be 3 states, allowing you to do ternary math (as opposed to binary math). However, the reliability is reduced because it's harder to tell the difference between those three states, and the electronics are much more complicated. Even more levels lead to worse reliability.
Despite that, it is done in multi-level cell flash memory. In these, each memory cell represents on, off and a number of intermediate values. This improves the capacity (each cell can store several bits), but it is bad news for reliability. This sort of chip is used in solid state drives, and these operate on the very edge of total unreliability in order to maximise capacity.

Cache-friendly optimization: Object oriented matrix multiplication and in-function tiled matrix multiplication

After writing a matrix class that represents the whole matrix in two 1D buffers using this implementation, I've reached the matrix multiplication part of my project and am now inclined towards some cache-friendly optimizations. I stumbled upon two options (the question is in the lower part of this page):
1) Selecting blocked/tiled sub-matrices only at multiplication time.
Done in a C++ DLL function, so no function-call overhead.
Since the code will be more complex, additional optimizations will be harder to apply.
2) Building a matrix class from sub-matrix classes (smaller patches), so multiplication is generally done on the sub-matrix classes.
The object-oriented approach leaves room for additional optimizations of the sub-matrices.
Object headers and the padding behavior of C# could help overcome critical strides?
Function-call overhead can become a problem when the function is called many times instead of a few.
Example matrix multiplication, C = A.B, with the 4x4 matrices split into 2x2 tiles:

A = | 1 2 | 3 4 |        B = | 1 1 | 1 1 |
    | 4 3 | 4 2 |            | 1 1 | 1 1 |
    |-----+-----|            |-----+-----|
    | 1 3 | 1 2 |            | 1 1 | 1 1 |
    | 1 1 | 1 2 |            | 1 1 | 1 1 |

Upper-left tile of the result:  | 1 2 | * | 1 1 |  +  | 3 4 | * | 1 1 |
                                | 4 3 |   | 1 1 |     | 4 2 |   | 1 1 |
(same pattern for the upper-right tile)

Lower-left tile of the result:  | 1 3 | * | 1 1 |  +  | 1 2 | * | 1 1 |
                                | 1 1 |   | 1 1 |     | 1 2 |   | 1 1 |
(same pattern for the lower-right tile)
Multiplication is O(n³) but summation is O(n²).
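For reference, a minimal C# sketch of option 1, tiling inside the multiply itself over flat row-major buffers; the tile size and the assumption that n is a multiple of it are illustrative only:
// C += A * B on flat row-major float[] buffers, processed tile by tile so that
// the working set of each inner block fits in cache. 'tile' would be tuned
// per CPU (e.g. 32 or 64); c is assumed to be zero-initialized.
static void MultiplyTiled(float[] a, float[] b, float[] c, int n, int tile)
{
    for (int i0 = 0; i0 < n; i0 += tile)
        for (int k0 = 0; k0 < n; k0 += tile)
            for (int j0 = 0; j0 < n; j0 += tile)
                for (int i = i0; i < i0 + tile; i++)
                    for (int k = k0; k < k0 + tile; k++)
                    {
                        float aik = a[i * n + k];
                        for (int j = j0; j < j0 + tile; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
}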
Question: Has anyone tried both (the in-function tiled and the object-oriented approach) and made performance comparisons? Right now, my naive multiplication without any of these cache-targeted optimizations takes:
Matrix Size      Single-Threaded Time   Multithreaded Time
 128x128    :            5 ms           1 ms - 5 ms (timing sample error is larger)
 256x256    :           25 ms                7 ms
 512x512    :          140 ms               35 ms
 1024x1024  :           1.3 s              260 ms
 2048x2048  :          11.3 s             2700 ms
 4096x4096  :          88.1 s               24 s
 8192x8192  :           710 s              177 s
Giga-multiplications of elements per second:
Matrix Size      Single-Threaded   Multithreaded   Multi/Single Ratio
 128x128    :        0.42           2.0 - 0.4            ?
 256x256    :        0.67           2.39               3.67x
 512x512    :        0.96           3.84               4.00x
 1024x1024  :        0.83           3.47               4.18x
 2048x2048  :        0.76           3.18               4.18x
 4096x4096  :        0.78           2.86               3.67x
 8192x8192  :        0.77           3.09               4.01x
(Average results for a 1.4 GHz FX-8150 with AVX-optimized code using 32-bit floats; C++ AVX intrinsics in DLL functions, called from Parallel.For() in Visual Studio C#.)
Which of the matrix sizes above could be suffering from cache misses, critical strides and other bad things? Do you know how I can get performance counters for those using intrinsics?
Thanks for your time.
Edit: Inlining optimization within DLL:
Matrix Size      Single-Threaded Time     Multithreaded Time                              Multi/Single Ratio
 128x128    :      1 ms (400%)            390 us average over 10k iterations (6G mult/s)
 256x256    :     12 ms (108% faster)       2 ms (250% faster)                                  6.0x
 512x512    :     73 ms (92% faster)       15 ms (133% faster)                                  4.9x
 1024x1024  :   1060 ms (22% faster)      176 ms (48% faster)                                   6.0x
 2048x2048  :  10070 ms (12% faster)     2350 ms (15% faster)                                   4.3x
 4096x4096  :   82.0 s (7% faster)         22 s (9% faster)                                     3.7x
 8192x8192  :    676 s (5% faster)        174 s (2% faster)                                     4.1x
After the inlining, the previously hidden performance of the smaller multiplications becomes visible.
There is still DLL-function-to-C# overhead. The 1024x1024 case seems to be the starting point of cache misses: while the work increases only about sevenfold, the execution time increases about fifteenfold.
Edit: Going to try Strassen's algorithm, 3 layers deep, with the object-oriented approach this week. The main matrix will be composed of 4 sub-matrices, those of 4 sub-subs each, and those of 4 sub-sub-subs each. This should give nearly (8/7)*(8/7)*(8/7) = +50% speedup. If it works, I will convert the DLL function to a tile-optimized one that uses the cache better.
Applying Strassen's algorithm for just one layer (e.g. four 256x256 sub-matrices making up a 512x512), as an object-oriented approach (the superclass is the Strassen class and the sub-matrices are matrix classes):
Matrix Size      Single-Threaded Time     Multithreaded Time      Multi/Single Ratio
 128x128    :     50% slowdown            slowdown
 256x256    :     30% slowdown            slowdown
 512x512    :     10% slowdown            slowdown
 1024x1024  :    540 ms (96% faster)      130 ms (35% faster)          4.15
 2048x2048  :   7500 ms (34% faster)     1310 ms (79% faster)          5.72
 4096x4096  :   70.2 s (17% faster)        17 s (29% faster)           4.13
 6144x6144  :        x                     68 s
 8192x8192  :   OutOfMemoryException      OutOfMemoryException
The overhead between the DLL function and C# is still in effect, so the small matrices could not get any faster. But when there is a speedup, it is always more than 8/7 (14%), because using smaller chunks is better for cache usage.
I will write a benchmark class that repeatedly tests different leaf sizes of Strassen's algorithm against the naive one to find the critical size (for my system, it is 512x512).
The superclass will recursively build the sub-matrix tree until it reaches the 512x512 size and will use the naive algorithm for the 512x512 nodes. Then, in the DLL function, a tiled/blocking algorithm (will add this next week) will make it somewhat faster again. But I don't know how to select a proper tile size, because I don't know how to get the cache-line size of the CPU. I will look into that after the recursive Strassen is done.
My implementation of Strassen's algorithm needs five times more memory (working on it).
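A rough sketch of the recursive structure described above, on flat row-major double[] buffers; the leaf cutoff and the helper names are illustrative, not the actual implementation:
static class StrassenSketch
{
    // Recursive Strassen multiply of n x n matrices stored as flat row-major double[].
    // At or below 'leaf' (e.g. 512) it falls back to the naive algorithm.
    public static double[] Multiply(double[] a, double[] b, int n, int leaf)
    {
        if (n <= leaf) return Naive(a, b, n);

        int h = n / 2; // assumes n is the leaf size times a power of two

        double[] a11 = Quad(a, n, 0, 0), a12 = Quad(a, n, 0, h),
                 a21 = Quad(a, n, h, 0), a22 = Quad(a, n, h, h);
        double[] b11 = Quad(b, n, 0, 0), b12 = Quad(b, n, 0, h),
                 b21 = Quad(b, n, h, 0), b22 = Quad(b, n, h, h);

        // The seven Strassen products: one multiplication saved per layer (7 instead of 8).
        double[] m1 = Multiply(Add(a11, a22), Add(b11, b22), h, leaf);
        double[] m2 = Multiply(Add(a21, a22), b11, h, leaf);
        double[] m3 = Multiply(a11, Sub(b12, b22), h, leaf);
        double[] m4 = Multiply(a22, Sub(b21, b11), h, leaf);
        double[] m5 = Multiply(Add(a11, a12), b22, h, leaf);
        double[] m6 = Multiply(Sub(a21, a11), Add(b11, b12), h, leaf);
        double[] m7 = Multiply(Sub(a12, a22), Add(b21, b22), h, leaf);

        double[] c = new double[n * n];
        Place(c, n, 0, 0, Add(Sub(Add(m1, m4), m5), m7)); // C11 = M1 + M4 - M5 + M7
        Place(c, n, 0, h, Add(m3, m5));                   // C12 = M3 + M5
        Place(c, n, h, 0, Add(m2, m4));                   // C21 = M2 + M4
        Place(c, n, h, h, Add(Sub(Add(m1, m3), m2), m6)); // C22 = M1 - M2 + M3 + M6
        return c;
    }

    static double[] Naive(double[] a, double[] b, int n)
    {
        var c = new double[n * n];
        for (int i = 0; i < n; i++)
            for (int k = 0; k < n; k++)
            {
                double aik = a[i * n + k];
                for (int j = 0; j < n; j++) c[i * n + j] += aik * b[k * n + j];
            }
        return c;
    }

    static double[] Add(double[] x, double[] y)
    {
        var r = new double[x.Length];
        for (int i = 0; i < r.Length; i++) r[i] = x[i] + y[i];
        return r;
    }

    static double[] Sub(double[] x, double[] y)
    {
        var r = new double[x.Length];
        for (int i = 0; i < r.Length; i++) r[i] = x[i] - y[i];
        return r;
    }

    // Copy the h x h quadrant whose top-left corner is at (row r0, column c0).
    static double[] Quad(double[] m, int n, int r0, int c0)
    {
        int h = n / 2;
        var q = new double[h * h];
        for (int i = 0; i < h; i++)
            for (int j = 0; j < h; j++) q[i * h + j] = m[(r0 + i) * n + (c0 + j)];
        return q;
    }

    // Write an h x h quadrant back into the n x n result at (row r0, column c0).
    static void Place(double[] m, int n, int r0, int c0, double[] q)
    {
        int h = n / 2;
        for (int i = 0; i < h; i++)
            for (int j = 0; j < h; j++) m[(r0 + i) * n + (c0 + j)] = q[i * h + j];
    }
}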
Edit: some of the recursion is done; updating the table as results come in.
Matrix Size      Single-Threaded Time   Multithreaded Time
 2048x2048  :           x                872 ms average (double layer)
 2560x2560  :           x               1527 ms average (double layer)
Parallelized the tree parsing, decreased the memory footprint and introduced full recursion:
Matrix Size          Single-Threaded Time   Multithreaded Time               Multi/Single
 1024x1024  :            547 ms              123 ms average (single layer)       4.45x
 2048x2048  :           3790 ms              790 ms average (double layer)       4.79x
 4096x4096  :           26.4 s              5440 ms average (triple layer)       4.85x
 8192x8192  :            185 s                38 s average (quad layer)          4.87x
 8192x8192 (4 GHz):        x                  15 s average (quad layer)          4.87x
Multiplications per second (x10^9):
Matrix Size          Single-Threaded   Multithreaded
 1024x1024  :            1.71            7.64 (single layer)
 2048x2048  :            1.73            8.31 (double layer)
 4096x4096  :            1.74            8.45 (triple layer)
 8192x8192  :            1.74            8.48 (quad layer)
 8192x8192 (4 GHz):        x            21.49 (quad layer)
Strassen multiplies the CPU flop count by 7/8 for each layer.
Just found out that a similarly priced GPU can do 8k x 8k in under 1 second using OpenCL.

Why does -2 % 360 give -2 instead of 358 in c#

Microsoft Mathematics and Google's calculator give me 358 for -2 % 360, but C# and the Windows calculator output -2. Which is the right answer?
The C# compiler is doing the right thing according to the C# specification, which states that for integers:
The result of x % y is the value produced by x – (x / y) * y.
Note that (x/y) always rounds towards zero.
For the details of how remainder is computed for binary and decimal floating point numbers, see section 7.8.3 of the specification.
Whether this is the "right answer" for you depends on how you view the remainder operation. The remainder must satisfy the identity that:
dividend = quotient * divisor + remainder
I say that clearly -2 % 360 is -2. Why? Well, first ask yourself what the quotient is. How many times does 360 go into -2? Clearly zero times! 360 doesn't go into -2 at all. If the quotient is zero then the remainder must be -2 in order to satisfy the identity. It would be strange to say that 360 goes into -2 a total of -1 times, with a remainder of 358, don't you think?
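A quick check of this, plus a small helper for when the always-non-negative ("mathematical") result is wanted; Mod here is a made-up name, not a framework method:
using System;

class RemainderDemo
{
    // Canonical non-negative remainder for a positive modulus m.
    static int Mod(int x, int m) => ((x % m) + m) % m;

    static void Main()
    {
        Console.WriteLine(-2 % 360);              // -2   (C#'s remainder)
        Console.WriteLine(-2 - (-2 / 360) * 360); // -2   (the x - (x / y) * y identity; -2 / 360 rounds toward zero, giving 0)
        Console.WriteLine(Mod(-2, 360));          // 358  (what Google's calculator reports)
    }
}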
Which is the right answer?
Both answers are correct. It's merely a matter of convention which value is returned.
Both, see Modulo operation on Wikipedia.
I found this very easy-to-understand explanation at http://mathforum.org/library/drmath/view/52343.html:
There are different ways of thinking about remainders when you deal
with negative numbers, and he is probably confusing two of them. The
mod function is defined as the amount by which a number exceeds the
largest integer multiple of the divisor that is not greater than that
number. In this case, -340 lies between -360 and -300, so -360 is the
greatest multiple LESS than -340; we subtract 60 * -6 = -360 from -340
and get 20:
[Number line marked at multiples of 60 from -420 to 360, showing -340 sitting 20 to the right of -360, and 340 sitting 40 to the right of 300.]
Working with a positive number like 340, the multiple we subtract is
smaller in absolute value, giving us 40; but with negative numbers, we
subtract a number with a LARGER absolute value, so that the mod
function returns a positive value. This is not always what people
expect, but it is consistent.
If you want the remainder, ignoring the sign, you have to take the
absolute value before using the mod function.
Doctor Peterson, The Math Forum
http://mathforum.org/dr.math/
IMO, -2 is much easier to understand and code with. If you divide -2 by 360, your answer is 0 remainder -2 ... just as dividing 2 by 360 is 0 remainder 2. It's not as natural to consider that 358 is also the remainder of -2 mod 360.
From wikipedia:
if the remainder is nonzero, there are two possible choices for the
remainder, one negative and the other positive, and there are also two
possible choices for the quotient. Usually, in number theory, the
positive remainder is always chosen, but programming languages choose
depending on the language and the signs of a and n.[2] However, Pascal
and Algol68 do not satisfy these conditions for negative divisors, and
some programming languages, such as C89, don't even define a result if
either of n or a is negative.
