Thursday, September 10, 2009

Another practical algorithm for reconstructing a 3D object from a single camera -- POSIT


Last week I used the PnP algorithm of Mr Lan and Mr Quan to reconstruct the 4-marker object from a single camera. Unfortunately the result from my Matlab implementation was not good enough, at least not for applications that need accuracy. So in recent days I have been looking for another tool to solve the problem. In the related papers many experts recommend the POSIT algorithm of Mr DeMenthon, which is regarded as one of the most practical algorithms for solving the perspective reconstruction problem when the object geometry is known. More exciting news is that this algorithm has already been implemented in the OpenCV library for C++ programming. He has posted C and Matlab implementations on his own site, too. So anyone who is interested can download them as a tool for testing single-camera tracking.

Here I would just like to describe the algorithm briefly; the real implementation is of course much more complicated. POSIT stands for Pose from Orthography and Scaling with ITerations. In my opinion the two distinctive features of the algorithm can be read directly from its name. One is the use of orthographic projection and scaling to set up a suitable geometric relationship, from which we can find the rotation and translation that place the object in the given coordinate system. The other is the iteration, which approaches the accurate result step by step. Unlike the previous algorithm, POSIT is not distance-based. Its output is not the set of distances between the camera and the markers, from which the reconstruction then has to be computed, but the transformation matrix itself. As in the usual PnP problem, the geometric configuration of the target is known, so to determine the location of the target all we need is this transformation matrix. In this respect POSIT seems more convenient.
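To make the last point concrete, here is a minimal Python sketch of how a rotation and translation turn the known marker geometry into marker positions in the camera frame. The marker layout, the pose values and the name `transform` are made up for illustration only; they are not taken from my setup or from any library.

```python
def transform(rotation, translation, point):
    # Camera-frame position = R * (object-frame point) + T.
    return [sum(r * p for r, p in zip(row, point)) + t
            for row, t in zip(rotation, translation)]

# Hypothetical 4-marker geometry in the object frame (metres).
markers = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]

# Hypothetical pose: 90 degree rotation about the optical axis,
# reference marker 1.7 m in front of the camera.
R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
T = [0.0, 0.0, 1.7]

camera_points = [transform(R, T, m) for m in markers]
```

Once R and T are known, every marker is located with one matrix-vector product, and no per-marker distance has to be solved for.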

The paper that describes the algorithm in detail is linked below, so I won't put lots of sick formulas and the tedious pseudocode of POSIT here. The 30-page paper can offer you far more. What I'd like to talk about are three interesting points which I think are the keys and the difficult spots of this algorithm, and I hope my understanding of them helps you understand it better:

1. The mathematical model of POSIT is based on vector calculation. The essence of POSIT is to use 3D geometric properties to eliminate the extra variables, e.g. the rotation matrix is an orthogonal matrix, so its third row can be obtained as the cross product of the first and second rows. And while working through the formulas, the cross products of orthogonal vectors bring more pleasant surprises. So please keep in mind that it is all about vector calculation, and don't ignore the i, j, k!!

2. The idea: the rotation matrix R for the object is the matrix whose rows are the coordinates of the unit vectors i, j, k of the camera coordinate system expressed in the object coordinate system. I think this is the key idea of the whole algorithm. Because of it, the task of calculating rotation angles and switching between the two systems is reduced to a direct calculation between the known geometry of the target and the 2D coordinates of the detected image points on the camera sensor. In the paper you can see how wonderful this algorithm is, since the input values need no complicated preprocessing. But it takes a little time to understand. As I understand it, the rotation matrix serves as a bridge by which the body CS is transferred into the camera CS. Conversely, every coordinate in the camera CS must correspond to a coordinate in the body CS, so the same correspondence must hold for the unit vectors that span the camera CS. Understanding this point is essential for the whole algorithm.

3. How to understand the iteration. Another significant component of this algorithm is SOP (Scaled Orthographic Projection). As illustrated in the first image, the feature markers are first projected orthographically onto a plane perpendicular to the optical axis. But SOP requires the exact positions of the feature markers, and those are precisely the values to be calculated. So at the beginning we simply assume that every marker Mi lies at the same depth as the reference marker M0, so that its perspective image and its SOP image coincide (this cannot be exactly true for a non-coplanar target). We then apply the POS algorithm to get an approximate depth for each marker, and with this new information we can build a better SOP and compute a better transformation matrix. SOP and POS are applied iteratively like this until the change falls below the chosen threshold.
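The three points above can be put together in a short pure-Python sketch of the POSIT loop as I understand it from the paper. This is not the OpenCV code and not my C++ implementation. All function names are made up for illustration, image coordinates are assumed to be measured from the image centre in the same units as the focal length, the markers must be non-coplanar, and averaging the two scale estimates is just one simple choice.

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def inv3(m):
    # Inverse of a 3x3 matrix given as rows, built from cross products of the rows.
    r0, r1, r2 = m
    c0, c1, c2 = cross(r1, r2), cross(r2, r0), cross(r0, r1)
    det = dot(r0, c0)
    return [[c0[n] / det, c1[n] / det, c2[n] / det] for n in range(3)]

def pinv(a):
    # Pseudoinverse (A^T A)^-1 A^T of an n x 3 matrix (needs non-coplanar markers).
    at = [[row[c] for row in a] for c in range(3)]
    ata_inv = inv3([[dot(at[r], at[c]) for c in range(3)] for r in range(3)])
    return [[sum(ata_inv[r][c] * at[c][n] for c in range(3))
             for n in range(len(a))] for r in range(3)]

def posit(model, image, focal, max_iter=50, tol=1e-10):
    # model: marker coordinates in the object frame, model[0] is the reference M0.
    # image: matching (x, y) image points; focal: focal length in the same units.
    m0 = model[0]
    a = [[p[c] - m0[c] for c in range(3)] for p in model[1:]]  # vectors M0Mi
    b = pinv(a)
    x0, y0 = image[0]
    eps = [0.0] * len(a)  # first pass: all markers assumed at the depth of M0
    for _ in range(max_iter):
        # POS step: image vectors corrected to the current SOP guess.
        xp = [x * (1.0 + e) - x0 for (x, _y), e in zip(image[1:], eps)]
        yp = [y * (1.0 + e) - y0 for (_x, y), e in zip(image[1:], eps)]
        I = [dot(row, xp) for row in b]
        J = [dot(row, yp) for row in b]
        s1, s2 = math.sqrt(dot(I, I)), math.sqrt(dot(J, J))
        i = [v / s1 for v in I]
        j = [v / s2 for v in J]
        k = cross(i, j)                      # third row from the first two
        z0 = focal / ((s1 + s2) / 2.0)       # depth of M0 from the SOP scale
        new_eps = [dot(v, k) / z0 for v in a]
        converged = max(abs(n - e) for n, e in zip(new_eps, eps)) < tol
        eps = new_eps
        if converged:
            break
    rotation = [i, j, k]
    translation = [x0 * z0 / focal, y0 * z0 / focal, z0]
    return rotation, translation

def project(rotation, translation, point, focal):
    # Perspective image of one model point for a given pose (for testing only).
    x, y, z = [dot(row, point) + t for row, t in zip(rotation, translation)]
    return (focal * x / z, focal * y / z)
```

A simple way to try it is to pick a pose, generate the image points with `project`, and check that `posit` gives back roughly the same rotation and translation. The returned translation is the position of M0 in the camera frame, so it matches the chosen pose only when M0 is the origin of the model.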


The above is what I thought about while reading the paper and implementing POSIT. With the C++ implementation the result looks much better than with the first algorithm. At a distance of about 1.7 m the error ranges from 2 cm to 9 cm, depending on the target position. The second image illustrates the visual result after reconstruction with POSIT, and the third picture is a screenshot of the C++ program running in sync with the real-time camera tracking.

Here are some links to the paper by Mr DeMenthon, which describes POSIT in detail. One drawback of this algorithm is that for a coplanar 4-point target the reconstruction result is not really acceptable. But the advantages are obvious: nothing needs to be initialized, it is quite easy to build and implement in any programming language, and so on. Thanks to Mr DeMenthon for creating this excellent method for solving the PnP problem.
