I have 9 stereo camera rigs that are essentially identical. I am calibrating them all with the same methodology:
Capture 25 images of an 8x11 chessboard (the same one for all rigs) in varying positions and orientations
Detect the corners for all images using FindChessboardCorners and refine them using CornerSubPix
Calibrate each camera intrinsics individually using CalibrateCamera
Calibrate the extrinsics using StereoCalibrate passing the CameraMatrix and DistortionCoeffs from #3 and using the FixIntrinsics flag
Compute the rectification transformations using StereoRectify
Then, with a projector using structured light, I place a sphere (the same one for all rigs) of known radius (16 mm) in front of the rigs and measure the sphere using:
Use image processing to match a large number of features between the two cameras in the distorted images
Use UndistortPoints to get their undistorted image locations
Use TriangulatePoints to get the points in homogeneous coordinates
Use ConvertFromHomogeneous to get the points in world coordinates
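For reference, here is a minimal sketch of one consistent way to wire the measurement steps together with the EmguCV bindings. The variable names (leftPts/rightPts for the matched features, cameraMatrix1/distCoeffs1 etc. from CalibrateCamera, and R1/P1/R2/P2 from StereoRectify) are my assumptions, and exact overloads can differ between EmguCV versions:

// using Emgu.CV; using Emgu.CV.Util; using System.Drawing;
// leftPts / rightPts: matched features in the two distorted images, as PointF[]
using (var left = new VectorOfPointF(leftPts))
using (var right = new VectorOfPointF(rightPts))
using (var leftRect = new VectorOfPointF())
using (var rightRect = new VectorOfPointF())
using (var points4D = new Mat())
using (var points4Dt = new Mat())
using (var points3D = new Mat())
{
    // Undistort and rectify in one go by passing R/P from StereoRectify, so the
    // points end up in the frames that the rectified projection matrices describe.
    CvInvoke.UndistortPoints(left, leftRect, cameraMatrix1, distCoeffs1, R1, P1);
    CvInvoke.UndistortPoints(right, rightRect, cameraMatrix2, distCoeffs2, R2, P2);

    // Triangulate against the 3x4 rectified projection matrices.
    CvInvoke.TriangulatePoints(P1, P2, leftRect, rightRect, points4D);

    // 4xN homogeneous -> one point per row -> Euclidean 3D points.
    CvInvoke.Transpose(points4D, points4Dt);
    CvInvoke.ConvertPointsFromHomogeneous(points4Dt, points3D);
}

The one thing this wiring depends on is that the R/P pair given to UndistortPoints matches the projection matrices given to TriangulatePoints; if the two halves refer to different frames, the reconstruction picks up a systematic, direction-dependent warp.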
On two of the rigs, the sphere measurement comes out highly accurate (RMSE 0.034 mm). However, on the other seven rigs, the measurement comes out with an unacceptable RMSE of 0.15 mm (roughly 5x worse). Also, the inaccuracy of each of those measurements seems to be skewed vertically. It's as if the sphere is measured as "spherical" in the horizontal direction, but slightly skewed vertically, with a peak pointing slightly downward.
I have picked my methodology apart for a few weeks and tried almost every variation I can think of. However, after recalibrating the devices multiple times and recapturing sphere measurements multiple times, the same two devices remain spot-on and the other seven devices keep giving the exact same error. Nothing about the calibration results of the 7 incorrect rigs stands out as "erroneous" in comparison to the results of the 2 good rigs other than the sphere measurement. Also, I cannot find anything about the rigs that are significantly different hardware-wise.
I am pulling my hair out at this point and am turning to this fine community to see if anyone notices anything I'm missing in the calibration procedure described above. I've tried every variation I can think of in each step of the process. However, the process seems valid, since it works for 2 of the 9 devices.
Thank you!
I have just recently started to work with OpenCV and image processing in general, so please bear with me
I have the following image to work with:
The gray outline is the result of the tracking algorithm, which I drew in for debugging, so you can ignore that.
I am tracking glowing spheres, so it is easy to turn down the exposure of my camera and then filter out the surrounding noise that remains. So what I have to work with is always a black image with a white circle. Sometimes a little bit of noise makes it through, but generally that's not a problem.
Note that the spheres are mounted on a flat surface, so when held at a specific angle the bottom of the circle might be "cut off", but the Hough transform seems to handle that well enough.
Currently, I use the Hough Transform for getting position and size. However, it jitters a lot around the actual circle, even with very little motion. When in motion, it sometimes loses track entirely and does not detect any circles.
Also, this is a real-time (30 fps) environment, and I have to run two Hough circle transforms, which take up 30% CPU load on a Ryzen 7 CPU...
I have tried using binary images (removing the "smooth" outline of the circle) and changing the settings of the Hough transform. With a lower dp value it seems to be less jittery, but then it is no longer real-time due to the processing needed.
This is basically my code:
ImageProcessing.ColorFilter(hsvFrame, Frame, tempFrame, ColorProfile, brightness);
ImageProcessing.Erode(Frame, 1);
ImageProcessing.SmoothGaussian(Frame, 7);
/* Args: cannyThreshold, accumulatorThreshold, dp, minDist, minRadius, maxRadius */
var circles = ImageProcessing.HoughCircles(Frame, 80, 1, 3, Frame.Width / 2, 3, 80);
if (circles.Length > 0)
...
The ImageProcessing calls are just wrappers to the OpenCV framework (EmguCV)
Is there a better, less jittery and less performance-hungry way or algorithm to detect these kinds of (as I see it) very simple circles? I did not find an answer on the internet that matches these kinds of circles. Thank you for any help!
Edit:
This is what the image looks like straight from the camera, no processing:
It makes me despair to see how often people spoil good information by jumping straight to edge detection and/or Hough transforms.
In this particular case, you have a lovely blob, which can be detected in a fraction of a millisecond and for which the centroid will yield good accuracy. The radius can be obtained just from the area.
You report that in the case of motion the Hough becomes jittery; this can be because of motion blur or frame interlacing (depending on the camera). The centroid should be more robust to these effects.
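As a sketch of what that looks like with EmguCV (assuming the thresholded frame is already an 8-bit image called mask; member names can vary slightly between EmguCV versions):

// using Emgu.CV; using Emgu.CV.CvEnum; using Emgu.CV.Util; using System; using System.Drawing;
using (var contours = new VectorOfVectorOfPoint())
{
    CvInvoke.FindContours(mask, contours, null, RetrType.External, ChainApproxMethod.ChainApproxSimple);

    // Keep the largest blob; small noise blobs are simply ignored.
    int best = -1;
    double bestArea = 0;
    for (int i = 0; i < contours.Size; i++)
    {
        double area = CvInvoke.ContourArea(contours[i]);
        if (area > bestArea) { bestArea = area; best = i; }
    }

    if (best >= 0)
    {
        var m = CvInvoke.Moments(contours[best]);
        var center = new PointF((float)(m.M10 / m.M00), (float)(m.M01 / m.M00));  // centroid
        double radius = Math.Sqrt(bestArea / Math.PI);                            // radius from area
    }
}

The centroid from the moments is sub-pixel and far steadier frame-to-frame than Hough accumulator peaks, and the whole thing is a single contour pass rather than a voting transform.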
Some objects that I have placed at position (-19026.65, 58.29961, 1157) relative to the origin (0,0,0) are rendering with issues; the problem is referred to as spatial jitter (SJ) (ref). You can check its details here, or see the image below. The objects render with black spots/lines, or maybe it is mesh flickering. (I can't describe the problem precisely; maybe the picture will help you understand it.)
I have also tried changing the camera's near and far clipping planes, but it didn't help. Why am I getting this? Maybe because my objects and camera are far away from the origin.
Remember:
I have a large environment, and some of my game objects (where the problem is) are at position (-19026.65, 58.29961, 1157). And I guess this is the problem: the objects and the camera are very far from the origin (0,0,0). I found numerous related discussions, which are listed below:
GIS Terrain and Unity
Unity Coordinates bound Question at unity
Unity Coordinate and Scale - Post
Infinite Runner and Unity Coordinates
I couldn't find what the minimum or maximum limits are for placing an object in Unity so that it still works correctly.
Since the world origin is a Vector3 (0,0,0), the maximum coordinate at which you could place an object is 3.402823 × 10^38, since positions are single-precision floating point. However, as you are finding, this does not necessarily mean that placing something there will ensure it works properly. Your limitation will be bound by the other performance factors in your game. If you need to have items placed that far out in world space, consider building objects at runtime based on where the camera is. This will allow things to work correctly at different distances from the origin.
Unity suggests not going any further than 100,000 units away from the center; the editor will warn you if you do. If you look at today's games, many of them move the world around the player rather than the player around the world.
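A rough sketch of that "move the world, not the player" idea (often called a floating origin); the component name and threshold here are my own, and a real project would also need to re-sync things like particle systems and cached physics state:

// Attach to the player. When it drifts too far from (0,0,0), shift every root
// object back by the player's offset so everything ends up near the origin again.
using UnityEngine;
using UnityEngine.SceneManagement;

public class FloatingOrigin : MonoBehaviour
{
    public float threshold = 5000f;   // assumed: shift once we are 5,000 units out

    void LateUpdate()
    {
        Vector3 offset = transform.position;
        if (offset.magnitude < threshold) return;

        foreach (GameObject root in SceneManager.GetActiveScene().GetRootGameObjects())
            root.transform.position -= offset;
    }
}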
To quote Dave Newson's site (read here):
Floating Point Accuracy

Unity allows you to place objects anywhere within the limitations of the float-based coordinate system. The limitation for the X, Y and Z Position Transform is 7 significant digits, with a decimal place anywhere within those 7 digits; in effect you could place an object at 12345.67 or 12.34567, for just two examples.

With this system, the further away from the origin (0.000000 - absolute zero) you get, the more floating-point precision you lose. For example, accepting that one unit (1u) equals one meter (1m), an object at 1.234567 has a floating point accuracy to 6 decimal places (a micrometer), while an object at 76543.21 can only have two decimal places (a centimeter), and is thus less accurate.

The degradation of accuracy as you get further away from the origin becomes an obvious problem when you want to work at a small scale. If you wanted to move an object positioned at 765432.1 by 0.01 (one centimeter), you wouldn't be able to, as that level of accuracy doesn't exist that far away from the origin.

This may not seem like a huge problem, but this issue of losing floating point accuracy at greater distances is the reason you start to see things like camera jitter and inaccurate physics when you stray too far from the origin. Most games try to keep things reasonably close to the origin to avoid these problems.
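That loss of precision is easy to verify with a couple of lines of plain C#, using the positions from the quote:

// (needs using System;)
float posFar  = 765432.1f;           // ~765 km from the origin
float posNear = 1.234567f;           // close to the origin

// A 1 cm step is smaller than the float spacing at 765432.1, so it is lost entirely:
Console.WriteLine(posFar + 0.01f == posFar);     // True
// Near the origin the same step is easily representable:
Console.WriteLine(posNear + 0.01f == posNear);   // False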
I'm working on a school project and my goal is to recognize objects. I started with taking pictures, applying various filters and doing boundary tracing. Fourier descriptors are too advanced for me, so I started approximating polygons from my list of points. Now I have to match those polygons, which all have the same number of vertices, and all sides have the same length. More specifically, I have two polygons and need to calculate some measure of similarity. This process has to be translation, rotation and scale invariant.
I tried rotating and scaling one of them in different ways and calculating the distance between each pair of vertices, but this is very slow.
I tried turning the polygon into a set of vectors, calculating each corner angle and comparing them. But this is also a bit slow.
I found an article called Contour Analysis, but I find it a bit difficult. In that article, all the vectors of each set are first interpreted as complex numbers, so we end up with just two vectors with complex components. Then the cosine of the two vectors is calculated. But the cosine is also a complex number, and its norm is always 1 if both vectors are the same. So how does it make sense to interpret a set of vectors as one vector? I don't understand this approach.
Are there any other ways to compare two polygons or sets of vectors? Or can someone explain my third attempt, or how to do it with ordinary vectors?
I hope someone can help me out :-)
If your objects are well separated, you can characterize every contour using Hu's moments.
The description and basic math of image moments are rather simple and would be suitable for a school project.
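As a sketch of how little code that needs with EmguCV: cv::matchShapes compares two contours via their Hu moments and is invariant to translation, rotation and scale. Here contourA/contourB are assumed to be VectorOfPoint contours from FindContours, and the enum name may differ slightly between EmguCV versions:

// using Emgu.CV; using Emgu.CV.CvEnum; using Emgu.CV.Util;
double dissimilarity = CvInvoke.MatchShapes(contourA, contourB, ContoursMatchType.I1, 0);
// 0 means "same shape"; larger values mean less similar. Pick a threshold empirically.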
In the FarseerPhysics engine / XNA, what is ConvertUnits.ToDisplayUnits()?
Farseer (or rather Box2D, which it is derived from) is tuned to work best with objects that range from 0.1 to 10 units in weight, and from 0.1 to 10 units in size (width or height). If you use objects outside this range, your simulation may not be as stable as it otherwise could be.
Most of the time this works well for "regular" sized objects you might find in a game (cars, books, etc), as measured in meters and kilograms. However this is not mandatory and you can, in fact, choose any scale. (For example: games involving marbles, or aeroplanes, might use a scale other than meters/kilograms).
Most games have various spaces. For example: "Model" space, "Projection" space, "View" space, "World" space, "Screen" space, "Client" space. Some are measured in pixels, others in plain units. And in general games use matrices to convert vertices from one space to another. Most obviously when taking a world measured in units, and displaying it on a screen measured in pixels.
XNA's SpriteBatch simplifies this a fair bit, by default, by having the world space be the same as client space. One world unit = one pixel.
Normally you should have your world space defined to be identical to the space your physics world exists in. But this would be a problem when using SpriteBatch's default space, as you could then not have a physics object larger than 10 pixels without going outside the range that Farseer is tuned for.
Farseer's[1] solution is to have two different world spaces - the game space and the physics space. And use the ConvertUnits class everywhere it needs to convert between these two systems.
I personally find this solution pretty damn awful, as it is highly error-prone (as you have to get the conversion correct in multiple places spread throughout your code).
For any modestly serious game development effort, I would recommend using a unified world space, designed around what Farseer requires. And then either use a global transform to SpriteBatch.Begin, or something completely other than SpriteBatch, to render that world to screen.
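For the "global transform to SpriteBatch.Begin" route, a hedged sketch (XNA 4.0; spriteBatch is the usual SpriteBatch instance and the 64-pixels-per-meter scale is an assumption):

// using Microsoft.Xna.Framework; using Microsoft.Xna.Framework.Graphics;
const float PixelsPerMeter = 64f;                       // assumed display scale
Matrix worldToScreen = Matrix.CreateScale(PixelsPerMeter);

spriteBatch.Begin(SpriteSortMode.Deferred, BlendState.AlphaBlend,
                  null, null, null, null, worldToScreen);
// Draw calls in here can use positions and sizes in meters, i.e. the same
// units the Farseer world uses, with no per-call conversion.
spriteBatch.End();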
However, for simple demos, ConvertUnits does the job. And it lets you keep the nice SpriteBatch property that one pixel in an unscaled sprite's texture = one pixel on screen.
[1]: last time I checked, ConvertUnits was part of the Farseer samples, and not part of the physics library itself.
I haven't dealt with that particular chunk of code, but most games that have a virtual space (the game world) will have a function similar to ToDisplayUnits, and its function is to convert the game world's physical units to XNA's display units.
An example would be meters to pixels, or meters to x,y screen coordinates.
Having this is good because it allows you to do all your math in physics units, keeping everything abstract, and then translate things to the game screen separately.
Farseer uses the MKS (metre, kilogram, second) system of units. It provides methods to convert display units to MKS units and vice versa: ToSimUnits() and ToDisplayUnits().
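In spirit, the helper is just a pair of multiplications around one scale constant. A minimal sketch (the real class lives in the Farseer samples and chooses its own ratio; 64 pixels per meter here is only an example):

static class ConvertUnits
{
    private const float DisplayUnitsPerSimUnit = 64f;   // assumed: 64 px == 1 m

    public static float ToDisplayUnits(float simUnits)
    {
        return simUnits * DisplayUnitsPerSimUnit;
    }

    public static float ToSimUnits(float displayUnits)
    {
        return displayUnits / DisplayUnitsPerSimUnit;
    }
}

// e.g. a Farseer body at X = 1.5 m draws at ConvertUnits.ToDisplayUnits(1.5f) == 96 px.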
I've written a program that uses the Depth data from a Kinect, and does blob detection to find a user's hand. However, when using the user's hand to control the mouse, it becomes very jerky, probably because people aren't very good at holding body parts completely still.
I've tried averaging the position based on the last ten positions, but that just resulted in lag without actually preventing jerkiness. The best solution so far that I've used is to not move the cursor if the pixel change is less than 10 in both directions (i.e., a 10-pixel change in either direction results in movement). This is okay, but it's still kinda jerky, and it results in a clunky interface because you don't have fine precision.
How can I compensate for the lack of steadiness in the human form so that the mouse isn't so jerky?
This will in any case be a tradeoff between lag and stability.
Check your data. You may find that the jerking is because of the Kinect's low resolution. If so, the jerking distance will be determined by how far you are from the Kinect cameras. When you are too far away, the camera resolution is too low and the position will keep bouncing between one or two pixels (stereo cameras).
You are thinking in the right direction by calculating an average and having a threshold for movement. You say you have calculated the average of the last 10 positions, which at 30 fps amounts to a 0.33-second delay.
You may want to use only the last 5 positions (experiment with this), and compute the median instead of the average.
Just a thought: movement rarely comes alone, so you could set a threshold above which you decrease the number of samples used for the average/median.
What is your sample rate? 10 positions is likely to be just a hundredth of a second. You may want to average the last 10th or 3rd of a second.
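One cheap middle ground between "average the last N frames" and a hard dead zone is an exponential moving average, which only keeps one previous value and lets you tune the lag/stability tradeoff with a single coefficient. A plain C# sketch (class and parameter names are mine):

class CursorSmoother
{
    private readonly float alpha;   // 1.0 = no smoothing, small values = smooth but laggy
    private float x, y;
    private bool initialized;

    public CursorSmoother(float alpha) { this.alpha = alpha; }

    // Feed the raw hand position every frame; returns the smoothed cursor position.
    public void Update(float rawX, float rawY, out float smoothX, out float smoothY)
    {
        if (!initialized) { x = rawX; y = rawY; initialized = true; }
        else
        {
            x += alpha * (rawX - x);
            y += alpha * (rawY - y);
        }
        smoothX = x;
        smoothY = y;
    }
}

// e.g. var smoother = new CursorSmoother(0.3f);  // roughly 0.2-0.4 is a reasonable range to try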
Did you try to apply a median filter to the depth map before doing your blob detection? I used that in a finger tracking demo and it greatly improved the steadiness.
A kernel size between 3 and 5 gave me the best results (5 costs a bit of fps, but it's really smooth).
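If the depth map is accessible as an OpenCV image (e.g. via EmguCV), the filter itself is a one-liner; a sketch, noting that a median aperture of 3 or 5 works on 8-bit and 16-bit single-channel images:

// "depth" is assumed to be a single-channel Mat holding the Kinect depth frame.
using (var smoothed = new Mat())
{
    CvInvoke.MedianBlur(depth, smoothed, 5);   // aperture must be odd: 3 or 5 here
    // ... run the blob detection on "smoothed" instead of "depth" ...
}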