I'm performing a large number of calculations. Each calculation is independent of every other; in other words, the task could be parallelized, and I'd like to offload the job to the GPU.
Specifically, I'm creating light/shadow maps for an OpenGL application, and the calculations are a bunch of vector math, dot products, square roots, etc.
What are my options here? Does OpenGL natively support anything like this, or should I be looking for an external library/module?
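Compute shaders are the generic, vendor-neutral counterpart to CUDA, which is essentially an enhanced compute API specific to NVIDIA hardware. OpenGL has supported compute shaders natively since version 4.3. Note that you don't need to use either: you can do calculations using a vertex -> geometry stream (captured with transform feedback), or by rendering into a texture with a pixel shader. So long as you can represent the results as a collection of values (a vertex buffer or texture), you can use the rendering pipeline to do your maths.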
What I'm trying to do: I want to compress a 2D grey-scale map (2D array of float values between 0 and 1) into a DFT. I then want to be able to sample the value of points in continuous coordinates (i.e. arbitrary points in between the data points in the original 2D map).
What I've tried: So far I've looked at Exocortex and some similar libraries, but they seem to be missing functions for sampling a single point or performing lossy compression. Though the math is a bit above my level, I might be able to derive methods to do these things. Ideally someone can point me to a C# library that already has this functionality. I'm also concerned that libraries that use the row-column FFT algorithm don't produce sinusoid functions that can be easily sampled this way, since they unwind the 2D array into a 1D array.
More detail on what I'm trying to do: The intended application for all this is an experiment in efficiently pre-computing, storing, and querying line of sight information. This is similar to the way spherical harmonic light probes are used to approximate lighting on dynamic objects. A grid of visibility probes stores compressed visibility data using a small number of float values each. From this grid, an observer position can calculate an interpolated probe, then use that probe to sample the estimated visibility of nearby positions. The results don't have to be perfectly accurate; this is intended as a first pass that can cheaply identify objects that are almost certainly visible or obscured, and then maybe perform more expensive ray-casting on the few on-the-fence objects.
In Accord.net framework, two classes are used to construct a Gabor filter:
Accord.Math.Gabor
Accord.Imaging.Filters.GaborFilter
There are various implementations of Gabor filter elsewhere:
How to convolve an image with different Gabor filters adjusted according to the local orientation and density using FFT?
Gabor Filter – Image processing for scientists and engineers, Part 6
https://github.com/clumsy/gabor-filter/blob/master/src/main/java/GaborFilter.java
https://github.com/dominiklessel/opencv-gabor-filter/blob/master/gaborFilter.cpp
https://github.com/adriaant/Gabor-API
but the source code in Accord.net looks very strange to me. It distinguishes 4 types of kernels:
Real
Imaginary
Magnitude
SquaredMagnitude
Can anyone either explain the latter 3 (Real is self-explanatory) types or refer me to some materials where I can study them?
The Gabor kernel g(t) is complex-valued. It is a quadrature filter, meaning that, in the frequency domain (G(f)), it has no negative frequencies. Thus, the even and odd parts of this frequency response are related by even(G(f)) = odd(G(f)) * sign(f). That is, the even and odd parts have the same values for positive frequencies, but opposite (negated) values for negative frequencies. Adding up the even and odd parts thus leads to the negative frequencies canceling out, and the positive frequencies reinforcing each other.
The even part of the (real-valued) frequency response corresponds to an even and real-valued kernel. The odd part corresponds to an odd and imaginary-valued kernel. The even kernel is a windowed cosine, the odd kernel is a windowed sine.
The Gabor filter is applied by convolving the image with these two components, then taking the magnitude of the result.
The magnitude of the filter itself is just a Gaussian smoothing kernel (it's the window over the sine and cosine). Note that cos^2+sin^2=1, so the magnitude doesn't show the wave component of the kernel. The code you linked that computes the magnitude of the Gabor kernel does a whole lot of pointless computations... :)
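To make the four kernel types concrete, here is a minimal 1D sketch in Python. The function name gabor_kernel and the parameter values are made up for illustration and are not Accord.net's API; the point is only that the real part is a windowed cosine, the imaginary part a windowed sine, and the magnitude is nothing but the Gaussian window.

```python
import cmath
import math

def gabor_kernel(radius, sigma, frequency):
    """Return a 1D complex Gabor kernel sampled at t = -radius..radius."""
    kernel = []
    for t in range(-radius, radius + 1):
        window = math.exp(-t * t / (2.0 * sigma * sigma))  # Gaussian envelope
        carrier = cmath.exp(2j * math.pi * frequency * t)  # cos(.) + i*sin(.)
        kernel.append(window * carrier)
    return kernel

# Illustrative parameters only.
kernel = gabor_kernel(radius=8, sigma=3.0, frequency=0.125)

real_part = [g.real for g in kernel]            # even kernel: windowed cosine
imaginary_part = [g.imag for g in kernel]       # odd kernel: windowed sine
magnitude = [abs(g) for g in kernel]            # the Gaussian window alone
squared_magnitude = [m * m for m in magnitude]  # Gaussian squared
```

Because cos^2+sin^2=1, computing the magnitude kernel this way is equivalent to evaluating the Gaussian directly, which is why the wave component never shows up in it.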
I am working on writing an application that contains line plots of large datasets.
My current strategy is to load up my data for each channel into 1D vertex buffers.
I then use a vertex shader when drawing to assemble my buffers into vertices (so I can reuse one of my buffers for multiple sets of data).
This is working pretty well, and I can draw a few hundred million data-points, without slowing down too much.
To stretch things a bit further I would like to reduce the number of points that actually get drawn, through simple reduction (i.e. draw every nth point), as there is not much point plotting 1000 points that are all represented by a single pixel.
One way I can think of doing this is to use a geometry shader and only emit every N points but I am not sure if this is the best plan of attack.
Would this be the recommended way of doing this?
You can do this much more simply by setting the stride of every vertex attribute to N times its normal value, and drawing correspondingly fewer vertices.
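As a rough illustration of why a larger stride decimates the data, the Python sketch below mimics how an attribute is fetched from a packed buffer: vertex i is read at byte offset i * stride. The helper fetch_with_stride is hypothetical and only models the addressing; in OpenGL the equivalent knob would be the stride argument of glVertexAttribPointer.

```python
import struct

FLOAT_SIZE = struct.calcsize("f")  # 4 bytes per sample

def fetch_with_stride(buffer, stride_bytes, vertex_count):
    """Read one float per vertex, advancing stride_bytes between vertices."""
    return [struct.unpack_from("f", buffer, i * stride_bytes)[0]
            for i in range(vertex_count)]

# A tightly packed 1D buffer of 12 samples, like one channel of plot data.
samples = [float(i) for i in range(12)]
buffer = struct.pack("12f", *samples)

n = 4  # keep every 4th sample
decimated = fetch_with_stride(buffer, n * FLOAT_SIZE, len(samples) // n)
```

Note that the vertex count has to shrink by the same factor N, otherwise the fetch runs past the end of the buffer. Plain decimation like this can also skip narrow peaks, so it may be worth checking whether that matters for your data.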
I have a small C# project that involves matrices. I am processing large amounts of data by splitting it into n-length chunks, treating the chunks as vectors, and multiplying by a Vandermonde** matrix. The problem is, depending on the conditions, the size of the chunks and corresponding Vandermonde** matrix can vary. I have a general solution which is easy to read, but way too slow:
public byte[] addBlockRedundancy(byte[] data) {
    if (data.Length!=numGood) D.error("Expecting data to be just "+numGood+" bytes long");
    aMatrix d=aMatrix.newColumnMatrix(this.mod, data);
    var r=vandermonde.multiplyBy(d);
    return r.ToByteArray();
}//method
This can process about a quarter of a megabyte per second on my i5 U470 @ 1.33GHz. I can make this faster by manually inlining the matrix multiplication:
int o=0;
int d=0;
for (d=0; d<data.Length-numGood; d+=numGood) {
    for (int r=0; r<numGood+numRedundant; r++) {
        Byte value=0;
        for (int c=0; c<numGood; c++) {
            value=mod.Add(value, mod.Multiply(vandermonde.get(r, c), data[d+c]));
        }//for
        output[r][o]=value;
    }//for
    o++;
}//for
This can process about 1 meg a second.
(Please note the "mod" is performing operations over GF(2^8) modulo my favorite irreducible polynomial.)
I know this can get a lot faster: after all, the Vandermonde** matrix is mostly zeros. I should be able to make a routine, or find a routine, that can take my matrix and return an optimized method which will effectively multiply vectors by the given matrix, but faster. Then, when I give this routine a 5x5 Vandermonde matrix (the identity matrix), there is simply no arithmetic to perform, and the original data is just copied.
** Please note: When I use the term "Vandermonde", I actually mean an identity matrix with some number of rows from the Vandermonde matrix appended (see comments). This matrix is wonderful because of all the zeros, and because if you remove enough rows (of your choosing) to make it square, it is an invertible matrix. And, of course, I would like to use this same routine to convert any one of those inverted matrices into an optimized series of instructions.
How can I make this matrix multiplication faster?
Thanks!
(edited to correct my mistake with Vandermonde matrix)
Maybe you can define a matrix interface and build implementations at runtime using Reflection.Emit.
IMatrix m = MatrixGenerator.CreateMatrix(data);
m.multiplyBy(...)
Here, MatrixGenerator.CreateMatrix will create a tailored IMatrix implementation, with full loop unrolling, and further code pruning (0 cell, identity, etc). MatrixGenerator.CreateMatrix may cache matrices to avoid recreating it later for the same set of data.
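The pruning idea can be sketched without Reflection.Emit. The Python below is only an illustration of the principle, not generated IL: compile_matrix (a made-up name) inspects a fixed matrix once, drops zero cells, treats a coefficient of 1 as a plain copy, and precomputes a 256-entry lookup table for every other coefficient. The reduction polynomial 0x11B is an assumption; the question does not say which irreducible polynomial is in use.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8); poly is an assumed reduction polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def compile_matrix(matrix):
    """Turn a fixed matrix into a multiply function specialised for its entries."""
    plan = []
    for row in matrix:
        terms = []
        for col, coeff in enumerate(row):
            if coeff == 0:
                continue  # zero cells cost nothing at run time
            # Coefficient 1 is a plain copy; anything else gets a lookup table.
            table = None if coeff == 1 else bytes(gf_mul(coeff, x) for x in range(256))
            terms.append((col, table))
        plan.append(terms)

    def multiply(vector):
        out = bytearray(len(plan))
        for r, terms in enumerate(plan):
            value = 0
            for col, table in terms:
                # Addition in GF(2^8) is XOR.
                value ^= vector[col] if table is None else table[vector[col]]
            out[r] = value
        return bytes(out)

    return multiply

# Identity rows followed by one redundancy row, as described in the question.
encode = compile_matrix([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 2, 4]])
encoded = encode(bytes([5, 0x80, 7]))
```

For the identity rows the inner loop degenerates to a single copy per output byte, which is the behaviour the question asks for. A Reflection.Emit version would emit straight-line code for the same plan instead of walking it at run time.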
I've seen solutions using Reflection.Emit, and I've seen solutions which involve TPL. The real answer here is, for most situations, that you want to use an existing unmanaged library such as Intel MKL via P/Invoke. Alternatively, if you are using the GPU, you can go with the GPGPU approach which would go a lot faster.
And yes, SSE together with multi-core processing is the fastest way to do it on a CPU. But I wouldn't recommend writing your own algorithm - instead, go look for something that's already out there. Most likely, it will end up being a C++ library, possibly with a C# wrapper.
While it won't speed up the math, you could at least use all your cores with Parallel.For in .NET 4.0.
From the math perspective
You could look at eigenspaces, eigenvectors and eigenvalues. I'm not sure what your application does or whether they will help.
You could look at LU decomposition.
All of the above topics can be found on Wikipedia.
From a programming perspective
You could try SIMD, though many SIMD math libraries are designed around 4x4 matrices for homogeneous transformations of 3D space, mostly for computer graphics.
You could write special algorithms for your most common dimensions.
Using SSE in c# is it possible?
I want to smooth an image in the frequency domain. When I searched Google for articles, it only turned up some MATLAB code, which I don't need. I can compute the FFT of an image, but I don't know how to implement any of the smoothing techniques (ILPF, BLPF, IHPF, BHPF) in the frequency domain. If you can provide code samples for any of the above techniques WITHOUT using any image processing libraries, it would be really helpful. C# is preferred.
Thanks,
Could you define what you mean by 'smoothing in the frequency domain'? You can generate a spectrum image using the FFT, multiply the spectrum by some function to attenuate particular frequencies, then convert the spectrum back to an image using the inverse FFT. However, for this kind of filtering (multiplication by some scaling function in frequency), you can achieve the same result by convolving with the dual function (the inverse transform of the scaling function) in the spatial domain, which is often quicker when that kernel is small.
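The equivalence between the two routes can be checked on a tiny 1D signal. This is a sketch with a naive O(N^2) DFT written out by hand, purely for illustration; the signal and kernel values are arbitrary, and a real implementation would use an FFT.

```python
import cmath
import math

def dft(signal, inverse=False):
    """Naive 1D DFT; O(N^2), for illustration only."""
    n = len(signal)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(signal[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out

def circular_convolve(signal, kernel):
    """Direct circular convolution in the spatial domain."""
    n = len(signal)
    return [sum(signal[(i - j) % n] * kernel[j] for j in range(n))
            for i in range(n)]

signal = [0.0, 1.0, 0.0, 0.0, 4.0, 0.0, 0.0, 2.0]
# Small smoothing kernel centred on index 0 (it wraps around the end).
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]

spatial = circular_convolve(signal, kernel)
product = [s * k for s, k in zip(dft(signal), dft(kernel))]
via_frequency = [value.real for value in dft(product, inverse=True)]
```

Both lists agree up to rounding error. Note that the DFT route implies circular (wrap-around) convolution, so images usually need padding if the wrap-around at the borders is unwanted.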
In any case, if you wish to implement this yourself, read up on FFT (the fast Fourier transform) and convolution. You might also check out a signal processing textbook, if you're interested, as the theory behind discrete filtering is fairly deep. The algorithms won't make a whole lot of sense without that theory, though you can certainly apply them without understanding them.
If you want to implement your own DSP algorithms, check out this book online. In particular, Ch 33 describes the math and algorithm behind Butterworth filter design. Ch 12 describes how to implement FFT.
There is a great series on Code Project by Christian Graus which you might find useful, especially part 2 which deals amongst others with smoothing filters:
Image Processing for Dummies with C# and GDI+ Part 1 - Per Pixel Filters
Image Processing for Dummies with C# and GDI+ Part 2 - Convolution Filters
Image Processing for Dummies with C# and GDI+ Part 3 - Edge Detection Filters
Image Processing for Dummies with C# and GDI+ Part 4 - Bilinear Filters and Resizing
Image Processing for Dummies with C# and GDI+ Part 5 - Displacement filters, including swirl
Image Processing for Dummies with C# and GDI+ Part 6 - The HSL color space
Keshan, it is simple. Imagine the FFT as another two pictures (the real and imaginary parts) where low frequencies lie in the middle and high frequencies away from the middle. If the pixels are numbered from -w/2 to w/2 and -h/2 to h/2, you can simply measure the distance from the middle as a(x,y)=sqrt(x^2+y^2). Then take some arbitrary monotonically decreasing function like f(x)=1/(1+x) and multiply each point in the FFT by f(a(x,y)). Then transform back using the inverse FFT.
There are different choices for f(x) which will look different. For example a Gaussian function or Bessel or whatever. I did this for my undergrad and it was great fun. If you send me a mail I will send you my program :-).
One big caveat is the ordering of the FFT output. The arrays it generates can be ordered in weird ways. It is important that you find out which array index corresponds to which x/y-position in the "analytical" Fourier transform!
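The recipe above can be sketched as follows, here with a Butterworth low-pass (BLPF) as the decreasing function. This is a minimal illustration in Python rather than C#, using a naive O(N^4) 2D DFT that is only practical for tiny images; the names dft2, signed_freq and butterworth_lowpass are made up for this example. The helper signed_freq handles the ordering caveat for the common unshifted layout, where indices past the midpoint stand for negative frequencies; other FFT routines may order their output differently.

```python
import cmath
import math

def dft2(img, inverse=False):
    """Naive 2D DFT; swap in a real FFT for anything but tiny images."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = 2 * math.pi * (u * x / w + v * y / h)
                    acc += img[y][x] * cmath.exp(sign * 1j * angle)
            out[v][u] = acc / (w * h) if inverse else acc
    return out

def signed_freq(index, size):
    """Map an unshifted DFT index to its signed frequency."""
    return index if index <= size // 2 else index - size

def butterworth_lowpass(img, cutoff, order=2):
    """Attenuate each frequency by 1 / (1 + (distance / cutoff)^(2 * order))."""
    h, w = len(img), len(img[0])
    spectrum = dft2(img)
    for v in range(h):
        for u in range(w):
            dist = math.hypot(signed_freq(u, w), signed_freq(v, h))
            spectrum[v][u] *= 1.0 / (1.0 + (dist / cutoff) ** (2 * order))
    restored = dft2(spectrum, inverse=True)
    return [[value.real for value in row] for row in restored]

# A 4x4 checkerboard is pure high frequency, so smoothing flattens it.
checker = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
smoothed = butterworth_lowpass(checker, cutoff=1.0)
```

The ideal low-pass (ILPF) would use a factor of 1 inside the cutoff radius and 0 outside it, and the high-pass variants (IHPF, BHPF) use one minus the corresponding low-pass factor.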
For all image/signal processing I recommend OpenCV.
This has a managed C# wrapper: Emgu.
http://www.emgu.com/wiki/index.php/Main_Page