Thursday, August 27, 2009

How to obtain the 8th degree polynomial from the linear PnP Algorithm


In the previous blog I have introduced an algorithm to solve the PnP problem by Long Quan and Zhongdan Lan, but there is still one problem during the calculation, that is with the equation system

how to get the 8th polynomial:

and here I will derive the whole process.

Still like before with this illustration let's face the problem more intuitively. With this tetrahedron we achieve the all angles and lengths we need, and what should be obtained is the length of three legs, S1, S2 and S3.

And our new equation system looks like this:

Besides we also define another two helpful tools to help us eliminate the invariable.

So combine (5) with (2) to eliminate the invariable S3 and we can get:

The same to combine (4) with (3) will come the following:

And do the same thing with (4) and (5) to take place of (1), the S2 and S3 will be totally removed.

And next we will do a serial of supersessions to really solve the problem, but you guys should be patient!

So with this replacement, (6), (7) and (8) can be rewritten as

And square the equation (11) with the help of (9) and (10) to eliminate the u^2 and v^2, we can acquire the following equation:

So the following job is another replacement, still that word "BE PATIENT!"

And in this way, expand the equation and replace the u^2 and v^2 with above equations we get another relevant equation:

Now we are already close to the 8th degree polynomial, this is the good news! But bad one is we need another replacement,hehe. Here we go: Let's square the equation (12) and enjoy the last replacement, as follows:

Finally the 8th degree polynomial is available as follows:

With this equation man can apply it on lots of algorithm calculation for the perspective n points pose estimation. And the main idea of this solution is from Linnainmaa, Harwood and Davis.

Monday, August 24, 2009

Mobile Augmented Reality

How much actually can the Augmented Reality technique change our life. Since recent decade this amazing thing has been applied in the medical area, industrial manupulation and the scientific researches. Is it still far from us -- normal person with peaceful life? Check the next video you guys could imagine in the close future with our personal electrical devices like PC, mobile phones or some other unbelievable things, how much role the Augmented Reality can play. At least we could be immersed into the game as a real individual and sure we could enjoy more!


Another PC game with AR technique. I wanna one!


Estimation of the correspondence -- RANSAC

In the previous article I have mentioned the problem of the correspondence determination between the real markers and the image points. I have calculated the possible correspondences for N markers and M marker-images. But how to finally determine the correct correspondence is far more difficult than that. Here I will introduce one of the most popular way for the correspondence or the parameter estimation applications in computer vision -- RANSAC, which is an abbreviation for "RANdom SAmple Consensus". It was first published by Fischler and Bolles in 1981. It is advantageously used to estimate parameters of a mathematical model from a number of observed data with a iterative method. And the data is divided into inliers and outliers. The so-called inliers are the suitable candidates which suffice the mathematical model, while the outliers are the traitors.

According to the content of this method, RANSAC is a non-deterministic algorithm, the final result is just reasonalbe with certain probability. Besides elimination of the outliers, RANSAC is also able to establish a parameterized model for a set of the inliers. In
wikipedia there is a example to demonstrate this theory with a 2D line model. But for our camera tracking correspondence it is not so lucky with such a simple model of so limited parameters. Let's look at the abstract idea of this algorithem:

Given:
1) D = {Xi} D is set of data points, |D| = N
2) f(Y): Y--> p Y is a sample from D, and f is the function to compute the model
parameters p with this sample
3) C(X,Y) it's the cost function to evaluate the result for a single point x


To achieve:
p* parameters of the model maximizing the cost function

Algorithm:

n:=o

--initialized cost function C* = 0
repeat until cost function C(x,y) is smaller than threshold

--select randomly set Yi from the whole database D
--based on the requirement calculate the parameters pk
--compute the cost function Ck
--if C* is smaller than Ck, then C*=Ck, p*=pk

end


In our situation,
a)the to be determined parameters are the rotations and translations of the markers and the correspondence result, that is to say which marker-image corresponds to which real marker.
b)the set of data is all possible projection images of the object onto the image plane.
c)the cost function can be the sum of the distances between the projected marker-images and the closest detected images on the sensor

And based on these Hypothesis and algorithm, the correspondence problem could be solved possibly.

Storyboard, interesting stuff

Today I found another alluring augmented reality application from youtube, it is a platform for the normal users to enjoy the Augmented Reality fun. It seems the name of this genial thing is Will be. It is created by some korean guys from Hongik University in Korea. What amazed me most is the friendly and intuitive interface of the software and the strong usability. The design is also simple but vivid. In a word, it's a nice work. Based on this idea, the AR personal application is closer and more realizable. And here is a paper from them about this technique:here

Sunday, August 23, 2009

Perspective n Points problem for Pose Reconstruction


To reconstruct the object position related to the camera coordinate system, there are lots of method available to be applied. But one of the most popular way is linear perspective reconstruction. Of course it can be used to reconstruct the camera position in room CS, too. Up to the requirement.


In my case I have a object consisted of n markers, in the image "n=4", what I want is to reconstruct the position and orientation of the object related to the camera CS. And the linear PnP method is absolutely helpful for my needs. Professionally described, my problem is also called SPACE RESECTION . That is given a set of correspondences between 3D reference points and their image, determining the position and orientation of camera or object with respect to the known reference points. It's said one of the oldest and most important tasks in computer vision.

There are already numerous papers to introduce or describe the algorithms to solve this problem, specially for the one camera or single view situation. Most of them relay on either iterative methods or applying the closed-form solutions to minimal subsets of the redundant data. Definitely they have their own drawbacks and advantages respectively. But so far as I have read, the Linear N-point Camera Pose Determination method by Long Quan and Zhongdan Lan is the most proper and flexible tool to solve the pose reconstruction problem. And the mathematical model could be illustrated in image 2, the 3D object with n markers is projected onto the camera's sensor with m(m<=n) image points. And by their method the correspondence between the 3D markers and 2D points is already known. And the geometry of the object is also available like the distances between every two markers. And the camera can detect the position of those 2D image points with respect in camera CS, and the focal length of the camera we can use, too. wow, so many things, ne? And with next picture this problem can be explained better:


So what we should do is based on |AB|, |BC|, |CD|, |DA| and the parietal angles of every polyhedron's surface (with the help of focal length and position of the image points they are able to be calculated), achieve the length of the polyhedron's legs: a,b,c,d. And with the geometrical relationship of tetrahedron ABCL we can obtain the following equations:

And this system has 8 solutions. But if we eliminate the negative values we just have to face 4 choices. Using the classical Sylvester resultant, a, b or c could be expressed by a eighth degree polynomial(detailed explanation) like:

with x = a^2 ( or b^2, or c^2)
For n=4, an overconstrained system of 6 polynomials is obtained for the four unknowns a, b, c, d. And the straightforward approach is to take subsets of 3 of the four points solve the fourth degree polynomial equation for each subset of 3 points and at last every subset. For n points we have (n-1)(n-2)/2 fourth degree polynomials of type g(x)=0, and in my case n=4, so we have 3 fourth degree polynomials, written in matrix form:

And with SVD we could solve the vector t with parameterized null space as:

and for i+j = k+l, 0<=i,j,k,l<=4 we have this equation:

Substituting t from the equations we can achieve another equation:

And with the logical permutation of i+j = k+l, 0<=i,j,k,l<=4, we can obtain 7 such quadratic equations, written in matrix form:

and solved by SVD of B, we can absolutely get the solution of
so according to the equation above the vector t is able to be solved and the length of a, b, c, d are also obtained with the value t1/t0. So finally with the value of these values, the four markers' position could be uniquely determined. And based on these the location or the orientation of the object can be calculated, too. Likewise
the camera position is also able to be achieved.

Thursday, August 20, 2009

Movable view with your head

Last month I have found a interesting "Toy" which worked on personal PC with virtual reality technique to move the monitor view effect. This amazing product is from the company NaturalPoint and specially for their own PC games. It's really cool and this is a simple introduction video about it, in "Youtube" there are lots of videos about it. But of course it costs a lot, too. :(



But what surprises me is that I have found another copycatting "trackio" -- Freetrack. Nice name, right? Similar functions but free!! Besides we could download the working plattform to enjoy the head tracking fun. Only problem is the hardware and the accuracy. Whatever, we could still enjoy it!


Correspondence -- matching them

Recently I'm doing the correspondence work for the target tracking with one camera tracking system. For the multi-camera tracking systems the location and transformation are always 'friendly' to be found, but by the single camera, things seem always antagonistic. Based on the camera image, on which the detected points are, we should show out the orginal 3D location of the target related to an assigned coordinate system. So far there are already an amount of algorithms cooperating with each other to reconstructe the object.

According to the picture above, what we can obtain from a camera detecting system is just the projected image coordinate in camera coordinate system, like (x1,y1), (x2,y2) ..... while to achieve the 3D Target coordinate as the following object sounds a long way to go!


So with the previous knowledge about the target geometry and the detected image, what we first should tell is the correspondence between the actual marker to be measured and the available projected images. That is to say we should be able to distinguish the different projected points in the image, and match them exactly to proper markers.

At first let's discuss the number of the possible correspondences, supposing there are
N markers of a target and the camera detected M marker-images (but inside there could be noise image which should not be considered), and assume all of the markers ought to be seen by camera, So the number of possible correspondences is

Num = M!/(M-N)!

If we think a little more complicated that not all markers can be detected, so we should think about the subset of such combination, and of course the formula can be more 'frightful'! Supposing the size of subset is S,

Num = N!xM!/[(N-S)!x(M-N)!]

And the value of S is logically not decided, so with all possible S, the possible number of the correspondence is:

Num=SUM{
N!xM!/[(N-S)!x(M-N)!]} for all 3<=S<=N Based on this nice formula let's check a short example, with N = 5, M = 6 tracking situation, the number of possible correspondence is 3720. Seems not bad, right? But with the increase of N the amount of correspondence will increase exponentially. Horrible!!

The problem is truly complex but there are also helpful solutions. And my next job is to find the available algorithms, implement them, compare them and develop them. So far as I have found: The softPOSIT algorithm developed by DeMenthon is able to calculate the correspondence automatically, and the advantage is avoiding going through all possible correspondences; The RANSAC paradigm developed by Fischler which is a method to find the maximal number of samples that fit into the desired model, and it effectively rejects outliers and clutter. The obvious advantage is that only a fraction of the subsets of S, with the minimum number of s points, has to be tried in order to find the maximum set, because it always choose the best one for the next step calculation, it saves a lot. Some of the narrowing search space tools are also quite important, like clustering method to group the samples in order to reduce the search, extrapolation which is used after one correspondence has been found to extrapolate the rest markers, and some of the similar approximations.

And enjoy a funny video about the visualisation of correspondence:


Friday, August 14, 2009

Octave Simulation of Pinhole Camera Model

For the one camera tracking, to test and verify the correspondence result and reconstruction effect is very significant. So constructing a one camera Model always makes sense for the tracking technique. In my thesis I have used octave under Linux to simulate the simplest model of one camera situation -- Pinhole camera Model.

The main idea of this so-called Pinhole camera Model is a Projection from a coordinate of 3D point to a image plane of 2D coordinate. Compare with the normal camera model what is special with pinhole
model is the camera aperture is described as a point and no lenses are used to focus light. With this assumption we can calculate the transformation much more easily. And the advantage of this model is we don't have to consider geometric distortions or blurring of unfocused objects caused by lenses and finite sized apertures. Anyway simplification.

And the image above shows the principle of this projection model. For the pinhole camera model there are several assumptions:
1. The origin of 3D coordinate system is the location of camera aperture. And the axes X1, X2 are the axes of image plane, too.
2. The image plan is located in distance f from the origin along the X3 axis.
3. The pinhole aperture is assumed as infinitely small point.



In the second picture we could find the relationship between 3D and 2D coordinate very easily. With the geometrical relationship we can conclude that:

Based on this fomular we can implement the projection without any problem. But usually for the octave applications we prefer the matrix representation, and the mapping from 3D coordinate to 2D coordinate will be implemented by homogeneous camera matrix, which is represented in this way:
Y ~ Cx
and this C is the core of the model simulation. Actually to imitate the camera model is just to realize this 3x4 camera matrix which is considered as the projective space. (Attention: with this representation x is 4 dimensional and y is 3 dimensional). So based on the analysis of previous words, the matrix representation of this model can be expressed as follows:
so the camera matrix can be written as
But this model is too simple, because we just assume this is a camera centered coordinate system. In fact, however, the 3D points could be in terms of coordinates relative to an arbitrary coordinate system. So we have to take Rotation R and Translation t into account. And the projection relationship will be redefined again as follows:

So finally what we should focus on is just this new camera projection matrix:
And here R denotes to the rotation matrix which requires the rotation angel related to x,y,z axis respectively, and t is the translation vector.

My octave program is able to be run under any linux environment with Octave installation. To manipulate the program,
1)first go to the path where the files are stored.
2)then just input "model" command to start the program
3)with the hint on the screen sequentially input the translation vector and the 3 rotation angles, for the input 3D coordinates you have to choices, one is read the so-called "bodyfile" in which the 3D coordinates of several markers are stored, the other one is input the coordinate by yourself if you have interest and patience.
And finally the projected 2D coordinate related to camera coordinate system will be outputted on the screen.

As said this is nothing with deep theory or difficult implementation, just for fun. Enjoy it!
The source code