Localizing with AprilTags

{| align="right"
|[[File:HowLocalize.jpg|center]]
|}
  
Our method of indoor localization utilizes a module with a ceiling-facing camera that recognizes tags on the ceiling. Each tag on the ceiling has a unique ID corresponding to a position in the global map of the area in which the module is localizing. The global-map position of a viewed tag is combined with the local frame 6D pose of the tag, determined when the tag is viewed, to compute the global frame position of POLARIS. In determining the global frame position of POLARIS, we determine the global frame position of whatever is carrying POLARIS (a generic robot, for example). Thus, we provide a method of indoor localization through computer vision of tags.
  
In order to make our localization method possible, we needed a practical marker recognition system. We chose AprilTags because of its robustness in accurately recognizing its tags: the system provides quick scale-invariant and rotation-invariant recognition, which makes it well suited to our indoor localization project. AprilTags was developed at the University of Michigan by Professor Edwin Olson. Check out the AprilTags wiki [http://april.eecs.umich.edu/wiki/index.php/AprilTags here].

== Chosen AprilTags Family ==

AprilTags provides several tag families. We originally tested with the 36h11 tag family, but later also considered the 16h5 tag family; in the end, we decided on 36h11. The naming convention for a tag family, for example "36h11", gives the number of data bits in a member tag of the family (36), followed by the minimum Hamming distance between two tags of the family (11).
  
{| align="center"
|[[File:TagFams.jpg|center]]
|-
|Four member tags from each of the two AprilTags families pictured.
|}
 
  
'''Hamming Distance'''
 
  
It is desirable to have a high Hamming distance between members of the chosen tag family because the Hamming distance is, by definition, the number of positions at which two symbols differ. A high minimum Hamming distance therefore leaves less chance of misrecognizing one tag as another. This is one reason the 36h11 tag family is preferable to the 16h5 tag family.
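
As a quick illustration (a sketch of ours, not code from the localization module), the Hamming distance between two tag payloads is just a count of differing bit positions:

 def hamming_distance(a, b, bits=36):
     """Count the bit positions at which two tag payloads differ."""
     diff = (a ^ b) & ((1 << bits) - 1)   # XOR marks every differing bit
     return bin(diff).count("1")
 # A family with minimum Hamming distance 11 guarantees that any two
 # member tags differ in at least 11 of their 36 data bits.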
 
  
'''Family Size'''
 
  
Another reason we chose the 36h11 tag family over the 16h5 tag family is family size: the 16h5 family has only 30 member tags, while the 36h11 family has 586. We must cover the ceilings of two floors of the engineering building, so we need a large number of tags. Because our strategy uses ordered pairs of tags from a given family, a family with N members can mark N^2 spots. Even with the tag pair strategy, the 16h5 family can cover only 900 spots, while the 36h11 family can cover up to 343,396. This was the deciding factor: the 36h11 family not only provides more accurate tag recognition, it also lets us localize more area than we will ever need.
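
A quick check of those counts under the pair strategy's N^2 rule:

 for family, members in (("16h5", 30), ("36h11", 586)):
     print(family, members ** 2, "spots")   # 900 and 343396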
 
  
'''Complexity'''
 
  
One drawback of choosing the 36h11 tag family over the 16h5 tag family is that 36h11 tags have more data bits and are therefore more complex. Because we make the tags by hand with stencils and spray paint, each stencil for a 36h11 tag must be carved to a higher complexity than one for a 16h5 tag. However, the pros of the 36h11 family still outweigh the cons.
 
  
== Placement of Tags ==
When placing AprilTags on the ceiling for POLARIS to localize with, it is important to note the following:

* The tag's x and y values are measured in meters from the global frame origin to the tag's center.
* The tag must be aligned with the global frame such that its top edge is perpendicular to the global frame's y-axis and has a more positive y-value than its bottom edge. This convention is what allows POLARIS to determine its orientation properly.
[[File:orientTag.jpg]]
  
In the above picture you can see how a tag should look after placement following the conventions in this section. The image shows the view of a tag when looking up at the ceiling from the floor.
== How the Module Performs Localization ==
 
The following steps are repeated in cycles for as long as the program runs:

'''Step 1: Find tags'''

Using AprilTags, the module recognizes every tag that the camera views in its field of view (FOV). It is imperative to the process that at least one tag be in the FOV of POLARIS at all times, or localization data cannot be obtained.
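
A minimal sketch of this step, assuming the pupil_apriltags Python bindings and placeholder calibration values (the module's actual bindings and calibration may differ):

 import cv2
 from pupil_apriltags import Detector
 
 detector = Detector(families="tag36h11")
 CAMERA_PARAMS = (600.0, 600.0, 320.0, 240.0)  # fx, fy, cx, cy (placeholder calibration)
 TAG_SIZE = 0.16                               # tag edge length in meters (assumed)
 
 def find_tags(frame):
     """Step 1: detect every 36h11 tag in the current ceiling image."""
     gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
     return detector.detect(gray, estimate_tag_pose=True,
                            camera_params=CAMERA_PARAMS, tag_size=TAG_SIZE)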

'''Step 2: Store Data'''

The module stores the roll, pitch, and yaw (RPY) of all newly recognized tags in data arrays that hold the data of the 8 most recently recognized tags.
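
A sketch of that bookkeeping, under our reading that the module keeps the 8 most recently seen tags plus a short RPY history per tag; rpy_from_rotation is a hypothetical helper for converting a rotation matrix to angles:

 from collections import deque
 
 recent_tags = deque(maxlen=8)   # IDs of the 8 most recently recognized tags
 rpy_samples = {}                # tag_id -> deque of recent (roll, pitch, yaw)
 
 def store_rpy(detections):
     """Step 2: record RPY for every tag recognized this cycle."""
     for det in detections:
         rpy = rpy_from_rotation(det.pose_R)   # hypothetical helper
         rpy_samples.setdefault(det.tag_id, deque(maxlen=10)).append(rpy)
         if det.tag_id in recent_tags:
             recent_tags.remove(det.tag_id)
         recent_tags.append(det.tag_id)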

'''Step 3: Run Moving Average Filter on RPY'''

The module runs a moving average filter with a window size of 10 on the RPY data of the 8 most recently recognized tags. This alleviates the noise that causes jumps in the RPY data collected through AprilTags. Noise has only been observed as an issue in the tags' RPY data, which is why the moving average filter is applied to RPY data alone: we saw sharp jumps between two different roll and pitch values for a given tag even when the tags were completely stable and immobile. The filter lowers the magnitude of the discrepancy between the two differing angle values. Later in the process, if multiple tags have been recognized, their localization data are averaged to further suppress this noise.
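
A sketch of the filter itself: a plain windowed mean over the step 2 history, where the deque's maximum length of 10 is the window size:

 def filtered_rpy(tag_id):
     """Step 3: moving-average roll, pitch, and yaw for one tag."""
     window = rpy_samples[tag_id]
     n = len(window)
     # Naive per-angle mean; assumes angles stay clear of the +/-pi wrap point.
     return tuple(sum(sample[i] for sample in window) / n for i in range(3))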
  
'''Step 4: Obtain Global Tag Pose Data'''

The module reads the pre-collected Look-Up Table (LUT) to obtain the global x, y, THETA, z values of the tags recognized in this particular cycle of localization.
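
For illustration, the LUT can be as simple as a mapping from tag ID to surveyed global pose. The storage format here is an assumption; the source specifies only the x, y, THETA, z fields:

 # tag_id -> (x meters, y meters, THETA radians, z floor number)
 LUT = {
     0: (1.50, 2.00, 0.0, 2),
     1: (1.50, 4.00, 0.0, 2),
     # ... one entry per tag placed on the ceiling
 }
 
 def global_tag_pose(tag_id):
     """Step 4: look up the surveyed global pose of a recognized tag."""
     return LUT[tag_id]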
  
'''Step 5: Run Transformations'''

The module runs transforms using the global and local tag positions to determine the global position of the POLARIS camera, and thus the global position of the device. For every tag viewed in this cycle, we obtain one localization data point consisting of the x, y, and orientation of POLARIS.

First, the local x,y,z values of the viewed tag obtained through the AprilTags module are transformed using that tag's RPY values to find the tag's position with respect to a local plane parallel to the global frame plane, as shown in the image below. These first three transformations, one per RPY angle, rotate the local frame in which the tag is viewed so that it is parallel to the global frame. Next, the camera's local position with respect to the viewed tag's center as origin is obtained by simply negating the x,y,z values of the tag's local frame position, since that position was originally taken with the camera center as the local frame origin. Finally, the camera's local frame position is shifted by the tag's global frame position values from the pre-collected LUT, yielding the global frame position of the camera. Because the camera is attached to POLARIS, we have thereby found the global frame position of POLARIS and of any device or user carrying POLARIS.

[[File:transforms.jpg]]
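
A sketch of that chain for a single tag. The rotation order and axis conventions are assumptions; the source specifies only the three RPY angle transformations, the negation, and the shift:

 import numpy as np
 
 def rotation_from_rpy(roll, pitch, yaw):
     """Compose the three RPY angle transformations into one rotation."""
     cr, sr = np.cos(roll), np.sin(roll)
     cp, sp = np.cos(pitch), np.sin(pitch)
     cy, sy = np.cos(yaw), np.sin(yaw)
     Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
     Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
     Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
     return Rz @ Ry @ Rx
 
 def camera_global_xy(tag_local_xyz, tag_rpy, tag_global_xy):
     """Step 5 for one tag: local pose plus LUT entry -> global camera x, y."""
     # Rotate so the local frame is parallel to the global frame plane.
     leveled = rotation_from_rpy(*tag_rpy) @ np.asarray(tag_local_xyz, dtype=float)
     # Negate: camera position with the tag's center taken as the origin.
     cam_rel_tag = -leveled
     # Shift by the tag's global position from the pre-collected LUT.
     return np.asarray(tag_global_xy, dtype=float) + cam_rel_tag[:2]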
  
'''Step 6: Weighted Average of Localization Data'''

The localization data from every viewed tag are combined using a weighted average that gives smaller weight to tags that are further away. This yields a single localization data point of x, y, z, and YAW for POLARIS even when multiple tags are viewed. As mentioned in step 3, viewing multiple tags reduces the effect of the occasional noisy angle data in the local frame 6D pose of viewed tags on the final localization value.
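
A sketch, assuming inverse-distance weights; the source says only that farther tags receive smaller weight:

 def fuse_localization(points):
     """Step 6: points is a list of (x, y, z, yaw, distance_to_tag), one per tag."""
     weights = [1.0 / max(p[4], 1e-6) for p in points]
     total = sum(weights)
     x = sum(w * p[0] for w, p in zip(weights, points)) / total
     y = sum(w * p[1] for w, p in zip(weights, points)) / total
     yaw = sum(w * p[3] for w, p in zip(weights, points)) / total  # naive angle mean
     z = points[0][2]   # floor number; identical for every tag in view
     return x, y, z, yaw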
  
'''Step 7: Moving Average of Global Position'''

The global position of POLARIS is run through a moving average filter with a window size of 5 to obtain the final localization data output by the localization module. The purpose of this filter is to keep the localization data continuous, as it should be.
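
The same windowed-mean idea as step 3, now over successive position fixes with a window of 5:

 from collections import deque
 
 position_window = deque(maxlen=5)   # the 5 most recent (x, y) fixes
 
 def smoothed_position(xy):
     """Step 7: the final, continuous localization output."""
     position_window.append(xy)
     n = len(position_window)
     return (sum(p[0] for p in position_window) / n,
             sum(p[1] for p in position_window) / n)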
  
== POLARIS Level II Diagram ==
  
[[File:polarisdiagram.jpg]]