
Mikkel Viager (s072103)

Scene detection for flexible production robot


Master's Thesis, June 2013

Mikkel Viager (s072103)

Scene detection for flexible production robot

Master's Thesis, June 2013

Supervisors:
Jens Christian Andersen, Professor at DTU, Department of Electrical Engineering
Ole Ravn, Head of Group, Department of Electrical Engineering
Anders B. Beck, Project Leader, Danish Technological Institute

DTU - Technical University of Denmark, Kgs. Lyngby - 2013

Scene detection for flexible production robot
Scene detektion for fleksibel produktionsrobot (Danish title)
This report was prepared by: Mikkel Viager (s072103)

Advisors:
Jens Christian Andersen, Professor at DTU, Department of Electrical Engineering
Ole Ravn, Head of Group, Department of Electrical Engineering
Anders B. Beck, Project Leader, Danish Technological Institute

DTU Electrical Engineering
Automation and Control
Technical University of Denmark
Elektrovej, Building 326
2800 Kgs. Lyngby
Denmark
Tel: +45 4525 3576

studieadministration@elektro.dtu.dk

Project period: February 2013 - June 2013
ECTS: 30
Education: MSc
Field: Electrical Engineering
Class: Public
Remarks: This report is submitted as partial fulfilment of the requirements for graduation in the above education at the Technical University of Denmark.

Copyrights:

Mikkel Viager, 2013

Table of Contents

Table of Contents
List of Figures
List of Tables
Nomenclature
Abstract
Preface

1 Introduction
  1.1 Subject description
  1.2 Problem formulation
  1.3 Scope and limitations
  1.4 Content overview

2 Analysis
  2.1 Concept and tasks
  2.2 Introduction of scene and objects
  2.3 Existing solution
  2.4 ROS
    2.4.1 Software versions
  2.5 Sensors
    2.5.1 Structured light
    2.5.2 Microsoft Kinect & ASUS Xtion PRO LIVE
  2.6 Positioning and field of view
    2.6.1 Relative positioning
    2.6.2 Impact of overlapping
    2.6.3 Conclusion
  2.7 Calibration
    2.7.1 Intrinsic camera calibration
    2.7.2 Distance calculation
    2.7.3 External position calibration
    2.7.4 Conclusion
  2.8 3D data precision
  2.9 Analysis conclusion

3 Developed Elements
  3.1 Software structure
  3.2 Perception processing step 1 - Depth recalibration
  3.3 Perception processing step 2 - Transform estimation
  3.4 Perception processing step 3 - Scene camera core
  3.5 Perception processing step 4 - Box finder add-on
    3.5.1 Detection of rotation
  3.6 Conclusion

4 Tests
  4.1 Impact of noise sources
    4.1.1 Ambient light
    4.1.2 Overlapping structured light patterns
  4.2 Precision of sensor position calibration
  4.3 Distinguishability of 3D features
  4.4 Performance and data acquisition rates
  4.5 Reliability of services
    4.5.1 3D obstacle mapping
    4.5.2 Box finder add-on
  4.6 Portability to custom Linux systems at DTU
  4.7 Test conclusion

5 Future Work

6 Conclusion

List of Appendices

References

List of Figures

1.1 Example of manual assembly work at a Danish SME.
1.2 Robot Co-Worker concept [1].
2.1 Overview of the work area with objects from several industrial processes, and a tag for calibration purposes. The square containers in the corners are included to evaluate detectability of metal and transparent materials.
2.2 Assembly demonstration scene. The black zone is limited by the reach of the robot arm and the red zone is limited by capabilities of the tool camera and scene camera.
2.3 Visualisation of an identification and picking task with the tabletop manipulation apps for the PR2 robot. Image is from the package summary wiki page [2].
2.4 A section of the structured IR light projected by a Kinect. Less than 10% of the entire projection is shown.
2.5 Microsoft Kinect. Large and requires an external power supply, but is cheap and comes with an integrated accelerometer.
2.6 ASUS Xtion PRO LIVE. Small and powered via USB only, but almost twice the cost of a Kinect.
2.7 Sensor placement options. The robot arm is mounted at S1, and the chosen positions of the two sensors are on sides S2 and S4.
2.8 Sensor FOV overlapping. The entire table surface is covered by both sensors, but tall objects close to the sides will only be covered by one.
2.9 Images showing the FOV for both sensors. It should be noted that this is the RGB image, which has a slightly bigger FOV than what is covered in images for depth estimation.
2.10 Openni_camera calibration screen. Intrinsic camera parameters are calculated based on several pictures with a checker board, of given size, in varying positions and orientations.
2.11 Sensor poses relative to common reference, visualized in rviz. Knowledge about relative positions allows merging of sensor data.
2.12 An AR tag generated with ar-track-alvar, encoded with the id 19.
2.13 Alignment of point clouds from the two sensors, based on results from ar-track-alvar with both depth and RGB data. Orientation is very precise, but positioning has a significant offset.
2.14 Alignment of point clouds from the two sensors, based on results from ar-track-alvar with only RGB data. The positioning is very precise, but the orientation has a small offset.
2.15 Down sampling is necessary to reduce the density of point clouds, allowing analysis, filtering and segmentation with much lower processing power requirements. From an example with 3D data gathered from the table surface it is seen that the uncertainty in the high resolution data (left image) is relatively high, and that down sampling (right image) does not impact this significantly.
3.1 Overview chart of software structure in the solution. Division into separate nodes, rather than one single combined, is to allow extraction and use of individual components in other ROS solutions, as well as easy maintenance and the option of utilizing distributed processing.
3.2 The distance estimates for Xtion2 plotted in relation to measured distances. From the offset to the cyan colored dashed guideline with correct y = x relation, it is clear that the sensor estimates are not correct. With a polynomial fitting the estimation data it is possible to calculate the correct distance from the estimates. The difference between a first and second order fit is hardly noticeable on the plots, but the second order polynomial does fit the samples best.
3.3 Two point clouds merged by alignment based on the calibrated transforms between sensors and a common reference. In this case, the camworld reference is placed directly in the center of the AR tag.
3.4 Augmented version of the image from figure 3.3 with axis aligned bounding boxes. These indicate segmented objects, which should be considered as obstacles in motion planning.
3.5 A simple example with only one box in the scene. The dimensions of the red bounding box are matched to a box size request. If the size is an acceptable match, the box centroid coordinates are returned.
3.6 Individual visualization of the box from figure 3.3 with normal vectors in key areas. As only one box side is partly detected, it is hard to use normal vector information from only this to determine the orientation of the box.
4.1 Example of how ambient sun light can affect the acquisition of 3D data sets with the IR based structured light sensors. The area of the table with highest concentration of IR light from both sensors and the sun is undetectable under these conditions.
4.2 At the upper right corner of the black square AR tag can be seen a significant spike in depth reading of a small area. This primarily occurs with two sensors pointed at the same reflective surface.
4.3 Calibration data showing the estimate differences between the 2D only and combined 2D and 3D methods for 6D pose estimation. The difference in position is around 5 cm, and the offset in orientation is around 0.1 radians in both roll and pitch estimates.
4.4 Example of a single object point cloud after segmentation. Further analysis of the object could be conducted to determine how many transformers are left in the box, and where these are placed. Either from image analysis of the RGB data (left image), or the elevation level of distance readings (right image).
4.5 Point cloud of metal heat sink with a mounted transformer. Only some parts of major features are visible.
4.6 Screenshot from the video of autonomous motion planning demonstrated by having the robot plan an alternate path from one side of the toolbox to the other. The detected obstacles are shown in the small simulation window.
4.7 The test setup with a cardboard box placed with equal distance to both sensors and no vision obstruction by the arm.
4.8 Graphical overview of 3D data samples required to detect a box. With 15% tolerance, detection in first attempt is 85% and no more than 3 attempts are required at most. This should be expected to vary with the box size, as the noise does not scale with this.
4.9 The Robot Co-Worker main demonstration setup. A video with highlights from one of the live demonstrations is included on the CD.
1 Xtion1 estimates and measurements.
2 Xtion2 estimates and measurements.
3 Kinect1 estimates and measurements.
4 Kinect2 estimates and measurements.

List of Tables

3.1 Estimate results for the Xtion2 sensor with factory calibration show that actual distances are 5-10% more than estimated, in this interval.
4.1 Performance achieved with the three tested computer systems. Significant difference between the three is obvious, with the slowest one barely usable, depending on the task.
1 Estimate results for the Xtion1 sensor with factory calibration.
2 Estimate results for the Xtion2 sensor with factory calibration.
3 Estimate results for the Kinect1 sensor with factory calibration.
4 Estimate results for the Kinect2 sensor with factory calibration.


Nomenclature
Abbreviations and Acronyms

AR - Augmented Reality
CMOS - Complementary Metal-Oxide-Semiconductor
DTI - Danish Technological Institute
DTU - Technical University of Denmark
FOV - Field Of View
FPS - Frames Per Second
IR - Infrared
LED - Light Emitting Diode
PCL - Point Cloud Library
RGBD - Red-Green-Blue-Depth
ROS - Robot Operating System
SME - Small and Medium Enterprises
TCP - Transmission Control Protocol
XML - Extensible Markup Language

Symbols

camworld - Common world reference coordinate system
d_est - Sensor estimated distance to surface
d_real - Actual distance from sensor to surface


Abstract
This report documents the development, integration and verification of a scene camera solution for the Robot Co-Worker prototype at the Danish Technological Institute. An analysis of the requirements for the implementation is conducted, and it is determined that no existing solution can sufficiently fulfil these. Based on two structured light sensors, a complete solution is developed to match a set of requested functionalities. The result is a ROS package capable of delivering detailed RGBD point cloud segmentations for each object in the scene. Furthermore, bounding box geometries are estimated and made available for use in motion planning and in an included service to return the position of boxes from provided dimensions. Calibration of the solution is done by automatic estimation of sensor poses in 6D, allowing alignment of 3D data from the sensors into a single combined point cloud. A method for calibration of distance estimates from structured light sensors has also been created, as this was shown to be necessary. The implementation is verified through tests and through inclusion in demonstrations of industrial assembly cases as an integrated part of the Robot Co-Worker, fulfilling the requested capabilities.


Preface
This project was carried out at the Technical University of Denmark (DTU) in collaboration with the Danish Technological Institute (DTI). The project was completed in the timeframe from February 2013 to June 2013, and covers a workload of 30 ECTS credits. While completing this thesis I have worked with several people whom I would like to thank for their support; my supervisors Jens Christian Andersen and Ole Ravn for great sparring, my external supervisor Anders B. Beck and the entire Robot Co-Worker team at DTI for their helpfulness and interest in my work, and my fellow student Jakob Mahler Hansen for his support and constructive input. The work of this thesis has been partially funded by the European Commission in relation to the FP7 project PRACE, grant no. 285380, with great appreciation.


Chapter 1
Introduction

1.1 Subject description

In both private homes and the industrial sector, robots are becoming an increasingly frequent sight as they help with the completion of a wide variety of tasks. These tasks involve everything from cleaning to carrying, mounting, welding, assembling and so on. Common to such tasks is that a robot usually completes them faster, more precisely, more efficiently and, most importantly, cheaper than a human.

Assembly line work is an obvious area to introduce robots into, which many larger production companies have already done. Higher efficiency means higher throughput and more profit, which will in turn pay back the investment needed to make the processes autonomous. However, while large scale fabrication of thousands of units of the same product model is worth the effort of configuring and programming robots to do the work, it is an entirely different scenario if only a few units of different models are to be produced. Many small and medium sized enterprises (SMEs) rely on orders of small batches of uniquely fabricated products, which cannot bring in enough profit to pay for the continuous reconfiguration and reprogramming needed to sustain automated production of them all.

To overcome this issue, which makes it hard for SMEs to introduce automation into their production, the SMErobotics project [9] was initiated by the European Union. Its goal is to create a new line of robots suited to solve the challenges and meet the needs of SMEs. The Danish Technological Institute (DTI) takes part in this development process, looking into cases where Danish SMEs want to introduce robots into their production. Small customized product batches are common here, and they are most often handled manually, like the example shown in figure 1.1.

Figure 1.1: Example of manual assembly work at a Danish SME.

DTI is currently developing a new concept for industrial robots; the Robot Co-Worker [3]. Tailored to suit the needs of SMEs, this robot platform is being developed with focus on easy reconfiguration and reprogramming, as well as a user friendly interface and the capability of working in a partly dynamic environment alongside human workers. The concept is to free the human workers from the most tedious or troublesome parts of the production process, while relying on the expertise and experience of the workers themselves by letting them instruct and supervise the robot. This is made possible with an intuitive instruction process, allowing factory workers to tell the robots what to do and reducing the cost and time frame of frequent reconfiguration and reprogramming of the robots. An example of possible inclusion of the Robot Co-Worker in an assembly line is illustrated in figure 1.2.

Figure 1.2: Robot Co-Worker concept [1].

In order for the Robot Co-Worker to be of help as part of a team with both humans and robots, it is necessary to have the robot perceive and act on its surroundings. Because of the human element in the production process, the precision in placements of objects on the assembly line is not high enough to have the robot interact with them blindly. To successfully interact with objects in the work space, the Robot Co-Worker must have sensor systems providing detailed 3D information about its environment. Not only can this allow seamless interaction with targeted objects, but it also allows the robot to plan its movements to avoid collision with obstacles in the dynamic work space.

The development prototype of the Robot Co-Worker is equipped with a changeable tool at the end of the robot arm, allowing it to pick up objects of different size and weight. Up until now, any picking and placing has been done at predetermined 3D coordinate positions, demonstrating the grippers, tool changing, intuitive interface, instructive task teaching and situation assessment for error handling. To further add to the capabilities of the prototype it is desired to introduce additional sensors in the form of multiple vision systems.

For precise relative positioning of the gripper tool, while moving to pick up objects in the scene, a single-sensor tool camera is being implemented. This will allow 2D/2.5D registration of the area just in front of the arm, making the robot able to calculate pose estimates precise enough for picking and placing objects. However, as the field of view (FOV) of this camera covers only a fraction of the entire work space, it does not provide a good overview of the rest of the area. By including a separate scene camera setup, it is desired to monitor the entire work space regardless of the position of the tool camera. In terms of precision, the scene camera will not have to meet the same high precision requirements as the tool camera, but the data quality should be sufficient to assess the state of the entire scene. With the scene camera providing 3D position data on all objects in the scene, it is desired to be able to point out an object of interest in a given situation and utilize this information to bring the tool camera in position for further analysis. Inclusion of obstacle avoidance in motion planning, when moving the robot arm, is to be made possible with data from the scene camera as well.


1.2 Problem formulation

The project goal is to develop, integrate and verify a scene camera solution for the Robot Co-Worker, making the system capable of perceiving its work area in 3D. It is imperative that the implemented solution is both accurate and robust enough to function reliably in connection with other modules, making it a useful feature for the Robot Co-Worker, as well as a viable choice for inclusion in other projects with similar needs. The scene camera should be able to detect objects in the scene and provide details on their size and position. Desired functionality includes:

- Simple and fast calibration procedure
- Data precision with sufficient accuracy for initial object position estimation
- Creation and publication of the scene as a dense 3D point cloud of RGBD data
- Segmentation of individual objects in the scene into separate point clouds
- Generation of simple bounding geometries for obstacle avoidance in motion planning
- Functionality to return the position of a box, of specified size, in the scene

The scene camera solution should be able to operate continuously, even during movement of the robot arm inside the work area. As part of the verification process it is desired to have the scene camera feature showcased as a fully functional and essential part of the Robot Co-Worker during a scheduled public demonstration in early May 2013. Furthermore, the project should evaluate the capabilities of the scene camera with respect to options for further development of extended functionality for future tasks, while preparing the software structure to allow such expansions. In summary:

- Development and implementation of a scene camera solution with easy calibration.
- Integration with the Robot Co-Worker and inclusion in showcase demonstration.
- Test and verification of the solution.
- Evaluation of options for future use and expandability.

Finally, the solution should also be usable with the computer systems at Automation and Control, DTU, which is to be verified through testing.


1.3 Scope and limitations


This project is conducted in cooperation with the Danish Technological Institute, and is based on their Robot Co-Worker project, which is part of the European robotics initiative SMErobotics. This imposes some limitations, as the project is to be based on the concept of the existing Robot Co-Worker prototype and previous research. Since all of the existing modules of the Robot Co-Worker have been developed to run in a ROS (Robot Operating System) framework, it is a requirement that the scene camera solution is also developed as a ROS component. Because of how widely used ROS has become in robot solutions all over the world, this is considered a very reasonable choice. Since ROS is Linux based, it will be possible to merge the scene camera solution with the computer systems at DTU.

Based on previous research by M. Viager [12], it has been determined that the use of a structured light based vision system will meet the requirements set for the scene camera. It is thus set as a premise that the scene camera solution should use structured light sensor technology in the form of the Kinect sensor from Microsoft or the Xtion PRO LIVE sensor from ASUS.

With the Danish Technological Institute as industrial partner on the project, it is desired to have the scope of this project, to some extent, cover the specific needs of the Robot Co-Worker prototype. This is to be achieved with participation in the Robot Co-Worker development team, focusing on integration of useful scene camera functionalities. The demonstration tasks, including materials of surfaces and objects, have been predetermined to best showcase the capabilities of the Robot Co-Worker. Substitution of these is therefore not considered an acceptable option when developing the scene camera solution.


1.4 Content overview

This report is structured to explain the development process of the scene camera solution.

Chapter 2 contains the analysis of the problem at hand, concluding which problems can be solved with existing technology and which should be prioritized for development of new functionalities.
Chapter 3 goes through the details of the elements developed for the scene camera solution, including argumentation for the chosen approaches.
Chapter 4 verifies the functionality of the complete solution, testing reliability, precision and robustness.
Chapter 5 elaborates on suggestions for future expansion through development of additional features.
Chapter 6 summarizes the achievements of the project, and evaluates the level of fulfilment of the initial requirements.
Appendix A contains a step-by-step guide on how to install and use the solution on a Linux system.
Appendix B further describes the details on how to receive and use the implemented data streams.
Appendix C shows additional depth calibration measurement data in tables and plots.
Appendix D provides an overview of the contents of the project CD.
Appendix E contains a printed version of all code written for the developed solution.

Chapter 2
Analysis

In order to fully understand the aspects of the problem, an analysis of the involved elements is conducted. This is to give a better overview of the underlying concepts, providing a basis for making choices in the solution to be developed. This chapter explains the scenarios in which the Robot Co-Worker will have to function, along with key obstacles to overcome for the scene camera solution to become useful and reliable. With this knowledge it is possible to consider whether certain approaches and solutions are acceptable in actual cases, providing a basis for evaluation before choosing one over another. The analysis is summarized in section 2.9 and, based on this, development and inclusion of the chosen software elements is explained in chapter 3.

2.1 Concept and tasks


The scene camera solution should provide real time 3D data, formatted to be of direct use to subscribing clients. Some initial cases of intended use, in the form of demonstration procedures, are to be used for verification of the implementation. Inclusion of realistic object sizes, shapes and textures in a dynamic environment only partly isolated from possible noise sources is expected to provide reliable results for evaluation of general use with other similar tasks as well. The two demonstrations chosen by DTI are:

- Obstacle avoidance in sensor based motion planning.
- Operator instructed assembly with sensor based picking and placing.

The simplest of the two is the first demonstration of obstacle avoidance in motion planning. The task is to show the capability of detecting a dynamically placed obstacle, and of planning and following a movement path around it. For the scene camera, the task here is to provide information on restricted areas in the environment, for the motion planning method to use in path estimation.

The goal of the second demonstration is to showcase intuitive instruction of a simple assembly task, followed by a seamless execution by the robot. This case is designed to exemplify the needs in an industrial assembly line process, and includes most of the capabilities of the Robot Co-Worker cell. Here, the scene camera is to be used for detection of objects in the scene, providing coordinates for the robot arm to move into position for precision pose detection by the tool camera. The objects for the scene camera to detect are a large box containing transformers, and a metal heat sink.


2.2 Introduction of scene and objects


The entire work area is considered the scene for the camera to monitor, which in this case is a square table surface measuring 1 m by 1 m. On one side of the table is mounted a robot arm¹ with a lifting capacity of up to 5 kg, and to the left of this is a curtain reducing the amount of ambient light in the cell. Mounted on the robot is a single tool camera for precision positioning when using the gripper to pick up or place objects. The gripper tool attached at the end of the robot is one of several, each created to complete specific tasks.

Figure 2.1: Overview of the work area with objects from several industrial processes, and a tag for calibration purposes. The square containers in the corners are included to evaluate detectability of metal and transparent materials.

The project goal is for the Robot Co-Worker to be able to complete a wide variety of tasks, and thus flexibility in tolerance to different scene components should be considered. This is done by using additional objects from industrial processes other than those demonstrated, when evaluating the general usability of the scene camera. An example of this is shown in figure 2.1 with a scene containing objects of sizes and materials other than those used in the demonstrations.

In the assembly demonstration, shown in figure 2.2, the table is marked with two rectangular zones visualizing the corresponding physical limitations in this configuration. Because of the tool size and shape, the box with transformers needs to be inside the large black rectangle for the robot to be able to reach all of the 16 possible pick locations. The box contains up to 16 transformers in 2 layers, padded by dense foam.

Figure 2.2: Assembly demonstration scene. The black zone is limited by the reach of the robot arm and the red zone is limited by capabilities of the tool camera and scene camera.

The smaller red zone marks the region where the metal heat sink should be placed for the robot to successfully place the transformer in it. Rotations of magnitude up to 45° are tolerable for successful detection with the tool camera. In addition to this, robust scene camera detection of the heat sink has not been achievable, which is the reason that the red rectangle is not larger.

Objects and materials in this scene are generally large and with textures and surfaces well suited for detection, such as cardboard, black plastic and unpolished aluminium.
¹ The robot arm manufacturer is Universal Robots, and the model is a UR5.


Most surfaces and textures in the scene are inside the range of what is detectable with a structured light sensor. However, smaller parts of the blank transformer surfaces and the metal heat sink are not ideal, and will be undetectable at certain angles. The tape used for marking zones on the table has a shiny finish, which will result in a higher magnitude of distance detection noise on its surface.

2.3 Existing solution

Solutions rooted in similar scene camera problems have been considered by others, and suggestions for solving these have been developed. For their PR2 research robot, Willow Garage has developed a series of modules, called the tabletop manipulation apps [2], which in combination solve a set of tasks very similar to what has been requested for the Robot Co-Worker. A screenshot of a visualization from the package is shown in figure 2.3. The PR2 robot is positioned next to a table with objects, and can be instructed to recognise, pick and place specific objects.

Figure 2.3: Visualisation of an identification and picking task with the tabletop manipulation apps for the PR2 robot. Image is from the package summary wiki page [2].

The tabletop object detector package contains the relevant features, but its intended environment and tasks differ from those of the Robot Co-Worker. Developed for the PR2 robot, the package is tailored for this platform specifically, and for use with a Narrow Stereo image from a single viewpoint. Apart from not using a structured light sensor, some of the assumptions about scene and objects are problematic. The object recognition functionality is only compatible with rotationally symmetrical objects, and can only handle 2 degrees of freedom (movement along the x and y axes). This is not sufficient for use with the Robot Co-Worker, so modifications of this functionality would have to be made in order for the package to be useful.

Object segmentation is done based on detection and extraction of the dominant plane in the scene, expected to be the table surface. This approach is obviously meant for cases where the relative position from sensor to table surface is not known, which is the case for the mobile PR2 robot. For a statically mounted sensor it would work in cases where the table is the dominant plane, but with the need for detection of boxes of considerable size there is a risk of wrongfully detecting the side or top of a box as the table surface. Adjustments to this segmentation approach would also have to be made before it could be used with the Robot Co-Worker.

The tabletop object detection package consists of two components for object perception; object segmentation and object recognition. With the assumptions behind the choices for limitations in the capability of these, several fundamental changes would have to be made before either component could be used in an implementation as a scene camera for the Robot Co-Worker. Because of this it is chosen not to use the tabletop object detection package, and instead create a custom scene camera solution. Existing components should still be used where possible, but with a new framework combining these to match the requirements of the Robot Co-Worker. Through the rest of chapter 2, the aspects and requirements of individual solution modules are considered, and it is evaluated which approaches are qualified and necessary.



2.4 ROS

As the project is required to result in a solution based on ROS, this will be the overall software framework used. Even within such a predefined package structure for ROS there are still many possibilities to adapt the software structure and optimize for adaptability, future expansion and easy configuration. This section briefly summarizes the concept of ROS, for readers unfamiliar with it.

The open source Robot Operating System (ROS) is not a standalone operating system as the name suggests, but a complete framework for robot applications which can be installed on top of many popular Linux distributions, such as Ubuntu. ROS provides a framework with libraries and tools for robot developers to use in almost any robot application. Because of the large community behind ROS, it comes with standardized libraries and drivers to cover many common algorithms and supports a large part of all existing hardware. The framework with built-in message-passing, visualizers and package management makes it easy for developers to take advantage of existing functionalities instead of having to make their own. In return for using ROS, users are encouraged to make their finished packages available for others to use, contributing to a continuously growing collection of tools. This ultimately allows all developers to spend more time on new ideas and concepts, as they no longer need to spend as much time on implementing proven existing methods and functionalities. Designed to run in many separate nodes, ROS is created with clustering in mind. Communication is done via TCP (Transmission Control Protocol), making it possible to run nodes on separate computers, which is very useful in many applications within robotics. This allows separation of nodes with high processing power requirements to dedicated computers, or having a stationary system perform data processing for mobile robots with limited carrying capacity.

Functionalities in ROS are broken down into nodes, which are small standalone programs linked together. All communication between nodes goes through topics, which work as relays for information. A node can publish or subscribe to information in one or more topics. As an example, a low level node could work as a driver for a piece of hardware on a robot, relaying the raw sensor values into a published topic. An overlying node could then subscribe to this topic and filter or evaluate the collected data, providing the basis for a higher level status assessment. Nodes can run at predefined frequencies or be set to trigger upon receiving new information from a topic. This is very useful for optimization, as it can limit the internal data transferring to only take place when there is an active subscriber to the gathered information. Nodes can also be in the form of a service, providing one or more services, similar to global method calls. Unfortunately, these are simple method calls leaving the caller with no other information than a response upon completion or failure of the requested service call. An advanced form of services has been developed to provide status updates during execution and even allow the caller to end the method process before it finishes. These are called actions, and require the ROS installation to be configured to use the actionlib library.
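As a minimal illustration of the publish/subscribe pattern described above, the sketch below shows a small roscpp node that subscribes to one topic and republishes a processed value on another. The node and topic names are made up for this example and are not part of the actual scene camera implementation.

#include <ros/ros.h>
#include <std_msgs/Float32.h>

// Hypothetical example node: subscribes to a raw sensor value and
// republishes a converted version of it on another topic.
ros::Publisher g_pub;

void rawValueCallback(const std_msgs::Float32::ConstPtr& msg)
{
  std_msgs::Float32 converted;
  converted.data = msg->data * 0.001f;  // e.g. convert millimetres to metres
  g_pub.publish(converted);             // relay the result to any subscribers
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "example_relay_node");  // register the node with the ROS master
  ros::NodeHandle nh;

  g_pub = nh.advertise<std_msgs::Float32>("distance_m", 10);
  ros::Subscriber sub = nh.subscribe("distance_raw", 10, rawValueCallback);

  ros::spin();  // process callbacks whenever new messages arrive
  return 0;
}

The node does no work on its own; it is triggered entirely by incoming messages, which matches the subscriber-driven optimization mentioned above.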

To allow easy reconfiguration of parameters in nodes, without having to edit and recompile the source code, ROS makes use of launch files. These make it easy to provide many parameters at runtime, in a simple XML structure. The full potential of this feature lies in the ability to include and inherit from parent launch files, allowing any or none of the inherited parameter values to be overwritten.
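A hedged sketch of such a launch file is shown below. The package, node and parameter names (scene_camera, scene_camera_node, voxel_size) are placeholders chosen only for illustration, while the included openni.launch and its camera argument belong to the openni_launch package mentioned in section 2.4.1.

<launch>
  <!-- Include a parent launch file and override one of its arguments -->
  <include file="$(find openni_launch)/launch/openni.launch">
    <arg name="camera" value="xtion1" />
  </include>

  <!-- Start a (hypothetical) scene camera node and set a private parameter -->
  <node pkg="scene_camera" type="scene_camera_node" name="scene_camera_node" output="screen">
    <param name="voxel_size" value="0.01" />
  </node>
</launch>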

2.4.1 Software versions

To match the ROS version for which the existing Robot Co-Worker modules were created, it is chosen to use ROS fuerte. The operating system used is the popular Linux distribution Ubuntu, version 12.04 LTS. A full desktop installation of ROS with the addition of the OpenNI sensor driver packages openni_camera and openni_launch allows direct use of the Kinect sensors. In order to use the Xtion sensors it is required to roll back the Sensor-Bin-Linux driver to version 5.1.0.41; otherwise the driver will not register the sensors as connected. This is done simply by running the driver install script with administrator rights, overwriting the existing driver files. To revert this process, the newest driver version can be used to overwrite the driver files in the same way.

2.5 Sensors

A key aspect of the solution is to use suitable sensors in optimal positions. The sensor type has already been decided, so this choice is outside the scope of the project. Two candidate sensors of this type qualify for use in this case, so a comparison is needed. This section also contains a brief description of what to expect from structured light technology, as well as considerations on sensor positioning.

2.5.1 Structured light

Structured light sensors function by projecting a predefined pattern onto a surface and analysing the deformations in the projection, from the viewpoint of a camera with a known relative position [10]. The precision of the sensor depends on the resolution of the projected pattern, as well as the resolution of the camera. The camera must have a resolution high enough to distinguish the individual parts of the projected pattern from the background, and the resulting 3D precision is dependent on the density of the pattern features. An example of the pattern projected by a Kinect sensor is shown in figure 2.4. Because of the recent mass production of the Microsoft Kinect device, structured light sensor technology is currently very affordable and available to everyone. The research leading to the choice of structured light sensors for this project shows that appropriate precision for the desired level of object detection is achievable [12], even though the technology is not ideal for all surface and texture properties. As with other light based sensors, problems arise when surfaces are either very reflective or absorb the projected light instead of reflecting it. This downside has been considered, but it is determined that a structured light based scene camera will still be of good use in the case at hand.

Figure 2.4: A section of the structured IR light projected by a Kinect. Less than 10% of the entire projection is shown.
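As background, the depth measurement in Kinect-class sensors is essentially a triangulation between the IR projector and the IR camera; the relation below is the standard textbook form and not a derivation taken from this report:

\[ z = \frac{f\,b}{d}, \qquad \Delta z \approx \frac{z^{2}}{f\,b}\,\Delta d \]

where z is the distance to the surface, f is the focal length of the IR camera (in pixels), b is the projector-camera baseline, and d is the disparity of a pattern feature relative to a reference plane. The second expression indicates that the depth uncertainty grows roughly quadratically with distance, which is consistent with the later recommendation of keeping the sensors as close to the work area as possible.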



2.5.2 Microsoft Kinect & ASUS Xtion PRO LIVE

Based on an earlier thorough comparison of these two sensors [12], it is known that the technical specifications of the Kinect and the Xtion are very similar. A brief overview of the main differences is given here, to provide a basis for the selection of which one to use.

Being the first commonly available structured light sensor, the Microsoft Kinect (shown in fig. 2.5) is both bigger and more power consuming than the ASUS Xtion. The foot mount of the Kinect is adjustable with a built-in motor, and the newest revisions have an extra joint allowing manual horizontal pivoting as well. A key feature of the Kinect is the built-in accelerometer, which is a welcome additional feature in many robot applications. This has previously been shown to provide very good estimates of the roll and pitch angles of the sensor, but is unfortunately not supported by the ROS driver. The structured light sensor driver for ROS is the same for both the Kinect and the Xtion, and since the Xtion has no accelerometer, it has not yet been interfaced in the common driver.

Figure 2.5: Microsoft Kinect. Large and requires an external power supply, but is cheap and comes with an integrated accelerometer.

The ASUS Xtion PRO LIVE (shown in fig. 2.6) comes in versions with or without an RGB camera (hence the "LIVE" part), and is considerably smaller than the Kinect. Furthermore, the Xtion can be powered by USB alone, which is not the case for the Kinect, which requires an external power supply. One additional feature is the capability of registering the RGB and depth images to each other internally in the sensor, whereas the Kinect relies on the attached PC to take care of this. The registration process aligns the RGB image with the depth image, and makes it possible to store RGB values inside the 3D point cloud, with a single value for each point in the cloud. This inclusion of hardware accelerated RGBD registration is useful when the sensors are used with less powerful computer systems.

Figure 2.6: ASUS Xtion PRO LIVE. Small and powered via USB only, but almost twice the cost of a Kinect.

Because of its smaller size, and its ability to be powered directly via USB, the Xtion has been chosen for the scene camera solution. However, since the same ROS driver package is used for both sensors, and because they have the same technical specifications, it is also possible to use the implementation directly with the Kinect if desired.
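As a small, self-contained illustration of such a registered cloud (not code from the implemented solution), the snippet below builds a PCL point cloud in which every point carries both 3D coordinates and an RGB colour value:

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>

int main()
{
  // Minimal illustrative example (not part of the thesis code):
  // one cloud type holding geometry and colour together, as delivered by a
  // registered RGBD stream (each point: x, y, z plus a packed 24-bit RGB value).
  pcl::PointCloud<pcl::PointXYZRGB> cloud;

  pcl::PointXYZRGB p;
  p.x = 0.1f; p.y = 0.2f; p.z = 1.0f;   // metres, in the sensor frame
  p.r = 200;  p.g = 150;  p.b = 50;     // colour sampled from the aligned RGB image
  cloud.push_back(p);

  return 0;
}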



2.6 Positioning and field of view

The camera placement in the scene should be such that the entire work area is visible in the images. It is estimated that the field of view (FOV) of the sensor where both RGB and depth data are available is approximately 58° horizontally and 45° vertically [12], making the vertical view angle the smallest. It should also be considered that the depth sensors have a minimum distance for reliable detection, which is approximately 0.6-0.8 meters, depending on the sensor manufacturer and the surrounding environment. In order to fully cover the entire work space, an obvious placement of the scene camera would be in the direct center of the overhanging structure. This would easily cover the entire square area, but it would also cause detection issues for objects hidden behind the moving robot arm. For this camera position to work, it would require the arm to move out of the working area for optimal coverage, or obtaining several data sets with the arm blocking different areas. Both options are far from optimal, so the introduction of a secondary camera, to have the scene camera setup consist of two separate sensors, is considered. With two cameras covering the same area from different angles it is possible to always obtain data from any part of the area, no matter the position of the robot arm, by combining data from both cameras. For this to work it will require the cameras to be positioned in a way where the robot will not obstruct vision of the same area for both cameras at the same time.
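A rough back-of-the-envelope check of the required mounting distance, based only on the numbers quoted above and ignoring sensor tilt and off-center mounting, follows from simple trigonometry: a sensor with vertical FOV \(\alpha\) covers a width w at distance h given by

\[ w = 2\,h\,\tan\!\left(\frac{\alpha}{2}\right) \quad\Longrightarrow\quad h \;\geq\; \frac{w}{2\tan(\alpha/2)} = \frac{1\,\mathrm{m}}{2\tan(22.5^{\circ})} \approx 1.2\,\mathrm{m} \]

so covering the full 1 m table side along the narrow 45° axis requires a sensor distance of roughly 1.2 m, comfortably above the 0.6-0.8 m minimum sensing distance.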

2.6.1 Relative positioning

An illustration of the aluminium frame of the Robot Co-Worker, seen from a top perspective, is shown in figure 2.7. This indicates the obvious possible placements of the two scene cameras on the existing frame, which is preferred in order to avoid any unnecessary increase in the space taken up by the cell. Sides are numbered S1-S4, corners C1-C4, and the orientation is with the robot arm positioned at S1. Mounting a single camera at S3 would provide better vision of the table than a single middle-centered camera, as the robot would be less likely to block its view. However, when the view is blocked it would not be easy to have a secondary camera in a position where complementary vision of the area is guaranteed.

Figure 2.7: Sensor placement options. The robot arm is mounted at S1, and the chosen positions of the two sensors are on sides S2 and S4.

Placing the two cameras at corners C2 and C3 would make certain that the area under the arm is always visible to at least one camera (because of the mounted tools, the arm never moves all the way down to the surface). But this placement would give rise to a problem with the field of view, as the rectangular field of view would no longer be aligned with the sides of the square work area, requiring a larger distance to the table. A larger distance leads to lower detection precision, so it is important to keep the distance from sensor to object as short as possible.

Moving the cameras to S2 and S4 at the sides of the cell allows the field of view of both cameras to be aligned with the work area, as well as providing viewing angles that complement each other well. Depending on the anticipated height of objects in the scene, it may be advantageous to extend the cell and move the cameras further apart. Figure 2.8 shows a conceptual graphic of the FOV seen from the front of the cell, making it clear why the detectable object height is limited. Small objects are entirely covered by both cameras, but as soon as a larger object is placed closer to the sides, it is not guaranteed to be covered in its full height.


This could lead to critical detection errors in rare situations where the robot arm is blocking the camera above a large object and the full height of the object is not within the FOV of the opposite camera. For the cases and demonstrations used in this project, the viewing angles allow sufficient object height, as the detectable objects are to be placed within predefined boundaries with enough distance from the work space edges. These boundaries are decided by the limited reach of the robot with a gripping tool mounted. In other applications it may be advantageous to move the cameras further apart, but this would increase the distance to the workspace and thereby decrease sensor precision. Figure 2.9 shows the FOV of both sensors in the chosen positions, confirming that the entire work space is covered.

Figure 2.8: Sensor FOV overlapping. The entire table surface is covered by both sensors, but tall objects close to the sides will only be covered by one.

(a) 640x480 RGB image from Xtion1.   (b) 640x480 RGB image from Xtion2.

Figure 2.9: Images showing the FOV for both sensors. It should be noted that this is the RGB image, which has a slightly bigger FOV than what is covered in images for depth estimation.



2.6.2 Impact of overlapping

For reliable detection of the entire work space it is key to have both sensors cover the same area, but this comes with a disadvantage as well. Some robustness to ambient light is achieved by using the infrared (IR) light spectrum, but with two sensors projecting individual patterns on the same surface, some interference between these is to be expected. It has been shown that critical impact only occurs if the sensor patterns are nearly perfectly aligned, because the nine brighter dots used for internal calibration then interfere [10]. Less critical interference is expected to have some effect on the precision of acquired depth data, which is tested and explained in section 4.1.2. Furthermore, the use of more than one sensor does not natively support direct merging of 3D data into a single combined point cloud. One point cloud is available from each sensor, with all measured distances in relation to the sensor itself. In order to orient data from both sensors in correct alignment it is necessary to estimate their position and orientation in relation to each other, which is explained in section 2.7.

2.6.3 Conclusion

It has been deemed necessary to develop a new scene camera solution, as the use of existing alternatives would require extensive modification to fulfil the requirements. It has been chosen to use the ASUS Xtion sensor, which can be positioned within the existing frame of the Robot Co-Worker. It is necessary to include more than one sensor to effectively collect data from the entire work space, for all possible positions of the robot. In some cases there will be areas of the scene where detection of tall objects is limited, which should be considered. One way to handle this could be to move the robot arm so it will not block the view of either camera when acquiring data samples. Some reduction in sensor precision is expected from overlapping of the IR patterns projected by both sensors. Initial experiments have shown that the consequences of such interference are not critical to depth detection in general, but further investigations should be conducted to confirm this. In order to merge data from the two sensors it is required to know their poses relative to a common reference point. This is considered to be a calibration issue, and will be covered in the following section.



2.7 Calibration

The precision of 3D data provided by the scene camera setup relies on accurate calibration of the intrinsic camera properties as well as the poses of the sensors in the scene. In order to make the calibration task both fast and reliable, it is desired to automate and streamline the process. This will also make it viable to do re-calibration more often, making the entire solution more resistant to changes in the scene and setup.

2.7.1 Intrinsic camera calibration

As with all vision-based systems it is important to have a good calibration of the intrinsic camera parameters. This calibration can be done with the ROS package openni_camera, by following the official tutorial [8], making it possible to calibrate the intrinsic camera parameters as well as the impact of any lens distortion. Because of the type of lens used in these sensors, there is not much distortion of the image, but the best result is achieved by also calibrating for the slight distortion. A screenshot from the calibration procedure is shown in figure 2.10. The same calibration routine can be used to calibrate the IR sensor used for the depth data, as this is simply another CMOS sensor with an IR filter in front of it.

Figure 2.10: Openni_camera calibration screen. Intrinsic camera parameters are calculated based on several pictures with a checker board, of given size, in varying positions and orientations.
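For reference, the intrinsic parameters estimated by this procedure are those of the standard pinhole camera model (the general textbook formulation, not a derivation specific to this report): a 3D point (X, Y, Z) in the camera frame projects to pixel coordinates (u, v) through

\[ s\begin{bmatrix}u\\ v\\ 1\end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x\\ 0 & f_y & c_y\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix}X\\ Y\\ Z\end{bmatrix} \]

where f_x and f_y are the focal lengths in pixels and (c_x, c_y) is the principal point. The lens distortion mentioned above is modelled by a small set of radial and tangential coefficients applied to the normalized image coordinates before this projection.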

2.7.2 Distance calculation

Both the ASUS Xtion and Microsoft Kinect devices come with preset calibrations for the internal calculation of metric distances from raw data values. It is uncertain whether this calibration has been done individually for each device, but it is clear that the quality of the converted metric values varies a lot between devices. In order to achieve the desired precision, it is necessary to recalibrate the parameters used for conversion between raw sensor values and metric values. As no method is available for doing this, new functionality needs to be developed. This is described in section 3.2.
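As a sketch of the kind of correction section 3.2 arrives at (the coefficient names below are placeholders; the actual fitted values are reported later in the report), the recalibration amounts to fitting a low-order polynomial mapping the sensor's own estimate d_est to the measured distance d_real:

\[ d_{\mathrm{real}} \;\approx\; c_2\, d_{\mathrm{est}}^{\,2} + c_1\, d_{\mathrm{est}} + c_0 \]

with c_0, c_1 and c_2 determined by a least-squares fit to pairs of estimated and measured distances. According to the depth calibration results referenced in figure 3.2, a second order fit matches the samples slightly better than a first order one.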

2.7.3 External position calibration

In order to have the two sensors contribute to a single well-defined collective point cloud, it is necessary to align the two separate point clouds in correct relation to each other.

2.7.3.1 3D Feature registration

A common method of combining several datasets into a single globally consistent model is the technique called registration, which iteratively matches and aligns identified feature points from separate data sets until a given alignment error threshold is reached. However, this method works best for data sets containing several well-defined 3D features for matching, which also requires relatively high sensor precision. Even if these requirements are met by the structured light sensors, the computational load caused by the registration algorithm would take up a lot of the available processing power. For calibration purposes it could be viable to use registration, as the computational requirements would not have a very high impact if only used in a one-time offline calculation. However, a simpler external calibration has proven to be sufficiently effective in this case, as described in the following section.
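For completeness, the sketch below shows what such a registration step could look like using the Iterative Closest Point implementation in PCL. This approach is not used in the final solution (the transform-based calibration of the next section is used instead); the code assumes a reasonably recent PCL version, and the variable names are chosen freely for the example.

#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/registration/icp.h>

// Illustrative only: align the cloud from sensor 2 onto the cloud from sensor 1
// and recover the rigid transform between the two sensor frames.
Eigen::Matrix4f alignClouds(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud_sensor1,
                            const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud_sensor2)
{
  pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
  icp.setInputSource(cloud_sensor2);      // cloud to be moved
  icp.setInputTarget(cloud_sensor1);      // reference cloud
  icp.setMaximumIterations(50);           // stop after a fixed number of iterations
  icp.setMaxCorrespondenceDistance(0.05); // ignore point pairs further apart than 5 cm

  pcl::PointCloud<pcl::PointXYZ> aligned;
  icp.align(aligned);                     // iterative matching of closest points

  // The resulting 4x4 matrix maps sensor 2 coordinates into the sensor 1 frame.
  return icp.getFinalTransformation();
}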

2.7.3.2 Transforms


Instead of analysing and realigning the point clouds after acquisition, a pre-alignment of the relative sensor poses has proven to be sufficiently accurate. By estimating the poses of the two sensors relative to a common world frame, it is possible to have the two resulting point clouds aligned and positioned with enough precision to seamlessly make up a common, larger and more dense total point cloud. This can be done by keeping track of the transforms between the coordinate systems of the two sensors and the chosen common world reference (camworld), as illustrated in figure 2.11.

Figure 2.11: Sensor poses relative to common reference, visualized in rviz. Knowledge about relative positions allows merging of sensor data.

One way to estimate the sensor poses would be to manually measure them in camworld coordinates. Not only would this be a tedious and time consuming calibration method, but it would also be unlikely to meet the precision requirements, as it is difficult to make precise manual measurements of the sensor orientation angles. A vision based method, using tag tracking, has previously been proven to provide very good positional estimates with similar sensors [11]. However, as this solution is required to use ROS, it is not very easy to re-use the mobotware² plug-in from the previous project. Similar functionality can be achieved through use of the ROS package ar-track-alvar, serving as a wrapper for Alvar, which is an open source Augmented Reality (AR) tag tracking library.

2.7.3.3 AR-track-alvar

The ar-track-alvar [5] ROS package provides functionality to detect, track and generate AR tags (see figure 2.12) in a live image stream. Other similar packages have been evaluated, namely ar_pose [4] and camera_pose_calibration [13], but neither was found to be as reliable as ar-track-alvar or to provide the pose data in a similarly directly usable structure. Based on these findings, it is decided that ar-track-alvar is the best currently existing package for relative camera pose estimation. Provided with the intrinsic camera calibration parameters and the physical size of the AR tag, the package returns the 6D pose and the encoded tag number. The pose is then used to determine the transform between sensor and AR tag, resulting in a calibration with sufficient precision, as shown in section 4.2.

Figure 2.12: An AR tag generated with ar-track-alvar, encoded with the id 19.

Mobotware is a plug-in based software platform developed and used at Automation and Control, DTU

The latest official release of the package utilizes depth data, in addition to regular image analysis, to improve the estimates of position and orientation of tags. This approach results in a very good match on orientation, but can have a critical impact on the precision of the position, as illustrated with the example in figure 2.13. Position estimates from this method are not precise enough to be used as a reasonable basis for merging.


Figure 2.13: Alignment of point clouds from the two sensors, based on results from ar-track-alvar with both depth and RGB data. Orientation is very precise, but positioning has a significant offset.

It is possible to disable the use of depth data and receive results which are based only on normal image analysis. Here it is required to provide the physical size of the AR tag, as the distance to it is no longer known. With this approach the positional alignment is very precise, but the estimated orientation is not as good as before. It is close enough to be tolerable, but as shown in figure 2.14 it could be better.

Figure 2.14: Alignment of point clouds from the two sensors, based on results from ar-track-alvar with only RGB data. The positioning is very precise, but the orientation has a small offset.

Even though the precision is not perfect, it is decided that ar-track-alvar can provide pose estimates with enough precision for reasonable calibrations. The amount of time required to develop a new package with similar capability outweighs the small potential improvement in precision, which is why this is not considered an option here. However, to make substitution of the pose estimation package possible at a future time, the implementation should be carried out with this in mind.

2.7.4 Conclusion

The calibration procedures can be separated into two categories: the one-time calibrations and those in need of frequent recalibration. Both the intrinsic camera parameters and the distance equation will only need to be calibrated once, unless the sensor suffers an impact with enough intensity to misplace the internal components or damage the lens. The intrinsic parameters are easily calibrated and saved with the available ROS package openni_camera. No method for calibration of the parameters used for distance calculation is available, so to achieve higher precision it is necessary to develop this. In order to fuse data from the two sensors together, it is required to have a good estimate of both sensor poses. Even though the sensors are placed high up and out of the way, it is likely that they will become misaligned by external factors at some point and require recalibration.


Existing ROS packages can provide pose estimation capability, but it is necessary to wrap these into a simple calibration procedure. The procedure can benefit from using the native transform handling in ROS, and the package ar-track-alvar can be used to obtain pose estimates between both cameras and a common point. Precision of the pose orientation is not perfect, but high enough to provide decent calibrations.

2.8 3D data precision
Examining 3D data gathered from the flat table surface, the precision and the eventual loss by down sampling to a lower resolution is considered. Figure 2.15a shows a close up of RGBD data at full resolution, with approximately 2mm between points at this distance. Even though the table is perfectly flat, it is obvious that there is variation in the data. Notice that the points are clustered in groups, and that all of these are tilted either right or left instead of being horizontal. The issue here is that the resolution in points at this distance (approx. 1.1m) is greater than the resolution of depth measurements by the sensor. All clusters tilted towards the right are obtained with the sensor to the right, and the opposite for the ones tilted left. As the initial distance measurements are to a plane parallel with the sensor itself, the gaps seen between clusters correspond to the depth resolution perpendicular to the sensor. Averaging over the points will provide a reasonable estimate of the table surface, as shown in figure 2.15b.

The clustering of high resolution points suggests that for detection of a flat table surface, full depth image resolution provides a needlessly high density of points. Depending on the object to be detected, the high resolution could prove to be useful in other cases. It is therefore decided not to determine a specific resolution for the scene camera data, but to include an option for easy adjustment based on evaluation of different work environments and tasks. If high resolution is needed, it is necessary to use a more powerful computer for data handling, but optional configurations with down sampling to lower density should be available for compatibility with less powerful computer systems as well.

(a) Maximum voxel grid resolution of 2mm.

(b) Voxel grid down sampled to 10mm resolution.

Figure 2.15: Down sampling is necessary to reduce the density of point clouds, allowing analysis, filtering and segmentation with much lower processing power requirements. From an example with 3D data gathered from the table surface it is seen that the uncertainty in the high resolution data (left image) is relatively high, and that down sampling (right image) does not impact this significantly.

Further evaluation of object detail distinguishability is done based on tests of the implemented solution, and can be found in section 4.3.


2.9 Analysis conclusion
By implementing a combination of existing ROS packages with the addition of supplementary new ones, it is considered possible to develop a scene camera solution with the desired functionalities mentioned in section 1.2. Utilizing the ar-track-alvar package for 6D camera pose estimation, an intuitive and fast calibration procedure can be implemented. Inclusion of an option to adjust data resolution will make it possible to configure this to match the required level of detail in a wider range of scenarios for reliable object position estimation.

Using the ROS framework it is possible to merge data from multiple sensors and publish these as a dense 3D point cloud of RGBD data. With knowledge of the position of the table surface plane it will be possible to segment individual objects on top of this into separate point clouds. Analysis of the segmented objects can be used for generation of simple bounding geometries for obstacle avoidance in motion planning, and for further processing to achieve object recognition capability.

In order to fully cover the work area at all times, regardless of where the robot arm is positioned, it is required to use more than one structured light sensor. With two sensors positioned on the frame sides, across from each other, both sensors have the entire work area inside their field of view. In situations where tall objects are placed close to the table sides, the double coverage does not include the entire object height. This could be critical for detection of obstacles in relation to motion planning, and should be remembered when considering the reliability of object detection.

Several calibrations need to be conducted in order to ensure optimal data quality, and sensor pose calibration can be made straightforward to allow frequent recalibration when needed. With the current version of the chosen camera position estimation module it is not always possible to achieve perfect calibrations. A combination of measurements from both the RGB and RGBD modes of the ar-track-alvar package could provide improved results, but RGBD mode is only supported by 64-bit Linux installations. For now it is chosen to use the package, keeping in mind that modification or replacement is likely to become relevant.

Stock calibration of distance measurements by the structured light sensors is not consistent and requires recalibration. A further investigation of this needs to be conducted, and a solution for calibration should be developed. As this is a general problem, it is preferred to have the correction functionality easily separable for use in other ROS based projects where structured light sensors are used.

Chapter 3
Developed Elements


As concluded in the analysis, a combination of existing software modules alone does not provide all of the required functionality to implement the solution. It has been necessary to develop additions to fill these gaps, as explained in this chapter.

3.1 Software structure
In order to make further development possible in the future, it is imperative that the software structure of the solution is prepared for this. Implementing all features in a single node would be possible, but it would also make it very troublesome to maintain and expand the source code. The implemented types of nodes can be divided into four main categories:

Sensor driver: Core functionality of the sensors, taking care of receiving data and configuration of the camera parameters and settings.

Calibration: Used only when it is decided that a calibration is needed. Stores the calibration data in a file which will then be used until a new calibration is done.

Data providers: Handle data streamed from the driver and read from the calibration file, preparing these for use by correcting depth estimates and keeping the information available.

Scene camera features: Main processing, filtering and segmentation of the acquired and prepared data, in the form of point clouds. Provides most public topics and services, and is expandable through creation of add-ons.

An illustration of the categories, containing their respective nodes, can be seen in figure 3.1. This also provides an overview of the included elements, of which the nodes openni_camera and ar-track-alvar are used directly from existing ROS packages.

There is no strict launch order for the nodes, but no output will be generated if some of the lower level nodes are not running. The openni_camera sensor driver does not depend on input from other nodes, and is the obvious choice as the first node to be started. A separate driver node is needed for each sensor, which is handled with two launch files inheriting most of their contents from the same parent. Each node then publishes topics with RGB images, point clouds and depth images.


Figure 3.1: Overview chart of the software structure of the solution. The division into separate nodes, rather than a single combined node, is to allow extraction and use of individual components in other ROS solutions, as well as easy maintenance and the option of utilizing distributed processing.

It should be noted that it is not possible to subscribe to point clouds from both sensors at the same time, if they are configured to 30 Hz operation, unless they are connected to separate computers or USB buses. Even then, it would also require a computer with more than average processing power to receive and handle the data. This is explained further in section 4.4.

The distance precision of point clouds available directly from the driver varies a lot with each sensor unit, and is not adjustable through calibration in the driver. Because of this uncertainty it is deemed necessary to include a depth_image_corrector node for generation of more precise point clouds. This is to be based on calibration of each individual sensor, as explained in detail in section 3.2. The corrected point clouds are then published for further analysis in the main scene camera node, as well as for calibration purposes.

With corrected depth values, it is now possible to run a calibration of the sensor positions. The cam_tf_generator is started from another launch file, and is configured to launch and call ar-track-alvar as necessary. The requested sensor unit id is provided in the launch file and passed on through a call to ar-track-alvar internally. With every new set of RGBD information a detection of the AR tag is attempted, and once enough data samples have been collected, the average relative pose transform is calculated and saved to a file. This makes it possible to preserve calibrations across system restarts.

Back with the data providers, the tf_publisher node reads the pose calibration file and makes sure that the transformation is continuously published for use in the point cloud alignment. One node is needed for each sensor here as well. The calibration file is occasionally checked for new parameters, to allow recalibration during operation.

With all of the required data, it is now possible to start the actual analysis in the scenecamera node. This contains the core methods for initial data treatment, providing segmented and down sampled point clouds for further analysis in the following nodes. Some data parts may already be usable in external modules at this point, but the main goal is to prepare and organize the data for final analysis in smaller add-ons with more specific tasks.


The concept is that whenever a fellow developer of an external module for the Robot Co-Worker requires a specific piece of information from the scene camera, a separate add-on should be created to meet that need. Few processes will require the same formatting and exact set of details, making the separation into core scene camera functionality and specialized service add-ons a good solution for easier maintenance and a more transparent structure. This also makes the solution more adaptable to use in other scenarios, as it is possible in each implementation to decide which additional features, if any, are desired to install and run alongside the core functionality.

The thoughts behind each developed element are explained in more detail through the following sections.

3.2 Perception processing step 1 - Depth recalibration


The distance estimates of point clouds provided directly from the sensor driver can not be adjusted or calibrated further than the factory calibration. Whether such a calibration is done for each sensor, or a standard calibration is used for all, is unknown. By looking into the precision of the distance measurements of multiple sensor units it is clear that the precision varies a lot between units, and that further calibration will improve the estimates in most cases.

Four sensors have been investigated in this project: two Kinects and two Xtions. Only one of these was sufficiently precise and had no need for recalibration, while one had an estimation error of 10% at 1.6m distance.

Table 3.1: Estimate results for the Xtion2 sensor with factory calibration, showing that actual distances are 5-10% greater than estimated in this interval.

  est. [m]    act. [m]    error [m]    error [%]
  0.619       0.65        0.031        5.01
  0.711       0.75        0.039        5.49
  0.801       0.85        0.049        6.12
  0.892       0.95        0.058        6.50
  0.984       1.05        0.066        6.71
  1.072       1.15        0.078        7.28
  1.161       1.25        0.089        7.67
  1.247       1.35        0.103        8.26
  1.331       1.45        0.119        8.94
  1.414       1.55        0.136        9.62

The least precise unit is included here as a worst case example, with measurement values in table 3.1 and plots in figure 3.2. Similar data from the other sensors is included in appendix C. Ten distance measurements with the sensor positioned perpendicular to a wall were taken at intervals of 10cm, and the distance estimated by the sensor was compared to the manually measured distance. To make such measurements easy, a corresponding feature has been built into the depth_image_corrector node, which is used by launching the node with an additional "mode" parameter of 1. This makes it print the mean distance of the pixels in a 10 by 10 square at the center of the sensor area to the console.
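A minimal sketch of such a measurement probe is shown below: it subscribes to a depth image topic and prints the mean of the central 10 x 10 pixel patch. The topic name, the use of cv_bridge and the millimetre encoding are assumptions, not details taken from the depth_image_corrector implementation.

```python
#!/usr/bin/env python
# Sketch: print the mean depth of the central 10x10 patch of a depth image.
# Topic name and millimetre encoding are assumptions.
import rospy
import numpy as np
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def on_depth(msg):
    depth = bridge.imgmsg_to_cv2(msg)                    # depth image, assumed in mm
    h, w = depth.shape[:2]
    patch = depth[h//2 - 5:h//2 + 5, w//2 - 5:w//2 + 5].astype(np.float64)
    valid = patch[patch > 0]                             # 0 marks missing measurements
    if valid.size:
        rospy.loginfo("mean centre distance: %.1f mm", valid.mean())

rospy.init_node("centre_distance_probe")
rospy.Subscriber("/camera/depth/image_raw", Image, on_depth)
rospy.spin()
```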

Already by looking at the error percentages in the table, it is clear that this sensor is far from optimally calibrated. Furthermore, the error is neither a constant offset nor a fixed percentage. By estimating a polynomial fit describing the true distance as a function of the measured distance, it is possible to create a calibration equation. Even on the zoomed plot in figure 3.2(b) it is hard to see the difference between using a first and a second order polynomial fit, but the second order does describe the relation much better than the first order, as confirmed by experimental tests. The reason behind the imperfect fits could be the human error in manual distance measuring, but it is clear that both fits confirm that the estimates deviate significantly from a 1:1 relation (the dashed line).
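As an illustration of how such a calibration equation can be obtained, the sketch below fits a second order polynomial to the Xtion2 measurements from table 3.1 with NumPy. The fit is done in metres here, whereas the equation quoted later in this section is expressed in millimetres, so the coefficients correspond only after unit conversion.

```python
# Sketch: fit a 2nd order correction polynomial d_real = f(d_est)
# from the Xtion2 measurements in table 3.1 (values in metres).
import numpy as np

d_est = np.array([0.619, 0.711, 0.801, 0.892, 0.984,
                  1.072, 1.161, 1.247, 1.331, 1.414])
d_act = np.array([0.65, 0.75, 0.85, 0.95, 1.05,
                  1.15, 1.25, 1.35, 1.45, 1.55])

# Coefficients are returned highest order first: [a, b, c] for a*x^2 + b*x + c.
coeffs = np.polyfit(d_est, d_act, 2)
fit = np.poly1d(coeffs)

print("correction polynomial coefficients:", coeffs)
print("max residual [mm]:", 1000 * np.abs(fit(d_est) - d_act).max())
```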

(a) Full plot of all measurements

(b) Zoom on the area of interest

Figure 3.2: The distance estimates for Xtion2 plotted in relation to measured distances. From the offset to the cyan colored dashed guideline with the correct y = x relation, it is clear that the sensor estimates are not correct. With a polynomial fitted to the estimation data it is possible to calculate the correct distance from the estimates. The difference between a first and second order fit is hardly noticeable on the plots, but the second order polynomial does fit the samples best.

It may be possible to enhance the existing structured light sensor driver in ROS by including calibration parameters for the depth calculation, but this would also require manual inclusion of the feature in all future versions of the official driver. Instead, an individual node is developed to complete this task as part of the scene camera features. Internally, the node is named depth_image_corrector, as this is what it does. It handles generation of an RGBD point cloud, similar to what is done in the driver, but corrects the depth values according to equation parameters supplied in its launch file. This equation is based on the polynomial fit to a set of experimentally obtained measurements, and only needs to be estimated once for each sensor. The correction equation for the Xtion2 sensor is best estimated as the second order polynomial d_real = 0.00008*d_est^2 + 0.963*d_est + 24.33, with both distances in mm.

In the driver there is a step between the gathered data and the creation of the point cloud, which is a depth image where each pixel is encoded with its corresponding depth value. Already at this point the distances are given in mm, which is the most raw data value available from the sensor. With the raw sensor values unavailable, the only option is to do a recalibration of these provided metric values. This is not ideal, but it will result in valid corrections as long as the initial conversion does not change. Iterating through the depth image to recalculate each value provides a corrected depth image, from which a point cloud is then generated. The point cloud is published and made available alongside the one without recalibration, allowing the user to choose either. This continuous processing of images and parallel creation of two point clouds is in principle not desirable, as it is a waste of processing resources. However, because of the way ROS topics work, the original point cloud is not generated unless a node is an active subscriber to it. The addition of the depth image correction node will therefore not require additional processing power, as long as there are no subscribers to the original point cloud.
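A minimal sketch of the per-pixel correction step is shown below, applying the second order polynomial above to a depth image stored as millimetre values in a NumPy array. The actual depth_image_corrector node works on ROS depth image messages and reads its coefficients from a launch file, so this only illustrates the core computation.

```python
# Sketch: apply the Xtion2 correction polynomial to a depth image in mm.
# The coefficients are the ones estimated for Xtion2; every sensor unit
# needs its own set.
import numpy as np

A, B, C = 0.00008, 0.963, 24.33      # d_real = A*d_est^2 + B*d_est + C  [mm]

def correct_depth_image(depth_mm):
    """Return a corrected copy of a depth image given in millimetres."""
    depth = depth_mm.astype(np.float64)
    corrected = A * depth**2 + B * depth + C
    corrected[depth_mm == 0] = 0     # keep 'no measurement' pixels as 0
    return corrected

# Example: a raw estimate of 1414 mm maps to about 1546 mm (measured: 1550 mm).
print(correct_depth_image(np.array([[1414]])))
```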


This module is an essential part of the scene camera solution, making precision in both transform estimation and object pose estimation reliable with all sensor units, regardless of their factory calibration quality.

3.3 Perception processing step 2 - Transform estimation


In order to merge the point clouds from the two sensors without having to analyse the point cloud data, it is necessary to know the relative positioning of the sensors. The relative positions are measured from a common reference point on the table, which then serves as the common main coordinate system for all poses estimated with the scene camera. This is also used when filtering out the points from the table surface. Transform estimation, and saving of the result in a configuration file, is done by a node named cam_tf_cfg_generator.

During development of the solution it has become clear that the ar-track-alvar package, used for pose detection, was developed for 64-bit operating systems only and therefore can not run on 32-bit systems. This could be a problem for future users, so the functionality is separated into its own node to make it easier to modify or replace with another pose estimation package.

Once started, the node attempts to gather data for creation of a pose calibration file for one sensor. It starts up the ar-track-alvar package, providing the AR tag size parameters and information on which topics to subscribe to for the image streams, and then waits for detection of an AR tag with the chosen id number. Ar-track-alvar publishes the transforms to all observed AR tags directly in the shared ROS collection of transforms, which is monitored by the cam_tf_cfg_generator node. To achieve higher precision in the pose estimate, a total of 100 separate readings are collected and the mean value of these is stored in the configuration file. This is due to the noise included when ar-track-alvar estimates the distance to an AR tag from the depth data of a sensor in addition to regular image analysis. Further testing of this is described in section 4.2. When the 100 readings have been obtained, the calculated pose is stored in the configuration file and the node stops ar-track-alvar before terminating itself.

In order for the transforms to be usable by other nodes in the system, it is required that they are continuously published for inclusion in the common transform topic. This is handled by the small separate node tf_publisher, which takes the path to a configuration file as a launch parameter. At a predefined interval, the node publishes the most recently read pose, and is configured to check the configuration file for changes after a number of updates. This is to allow calibration of the system during operation, without having to restart the tf_publisher nodes to see the effect. An example of the relative transforms between the common camworld coordinate system reference and the sensors is shown in figure 3.3.

Figure 3.3: Two point clouds merged by alignment based on the calibrated transforms between sensors and a common reference. In this case, the camworld reference is placed directly in the center of the AR tag.

With this calibration procedure it is possible to recalibrate the scene camera by placing the AR tag flat on the table surface, visible to both cameras, and running the cam_tf_cfg_generator node. The AR tag can then be removed once the calibration is saved. This procedure is based on a need for easy calibration, but no desire to run automatic recalibration at predefined intervals. Expansion of the solution to allow this would be possible, but it would also require the AR tag to be in the scene at all times. If it is desired to use a reference point in the work space other than the center of the AR tag as the camworld reference, this is easily done by providing the pose of the point with respect to the AR tag in the node launch file. This will make all published object poses relative to the defined point.
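A stripped-down sketch of the two halves of this procedure is shown below: a calibration step that looks up the camera pose relative to the detected AR tag from tf a number of times and averages the translations, and a publisher step that re-broadcasts the stored pose at a fixed rate. The frame names, the YAML file layout and the simplification of keeping only the last rotation sample (the real node averages the full pose) are assumptions for illustration, and it is assumed that camworld coincides with the tag center, as in figure 3.3.

```python
#!/usr/bin/env python
# Sketch of the calibration / publishing split, not the actual nodes.
# Frame names, file layout and the pose averaging are simplified assumptions.
import rospy
import tf
import yaml
import numpy as np

def calibrate(n_samples=100, cfg="cam1_pose.yaml"):
    """Average the tag->camera translation over n_samples tf lookups."""
    listener = tf.TransformListener()
    translations, rotation = [], None
    rate = rospy.Rate(10)
    while len(translations) < n_samples and not rospy.is_shutdown():
        try:
            trans, rot = listener.lookupTransform("ar_marker_19", "camera1_link",
                                                  rospy.Time(0))
            translations.append(trans)
            rotation = rot                    # keep last sample (simplification)
        except (tf.LookupException, tf.ConnectivityException, tf.ExtrapolationException):
            pass
        rate.sleep()
    pose = {"translation": np.mean(translations, axis=0).tolist(),
            "rotation": list(rotation)}
    with open(cfg, "w") as f:
        yaml.safe_dump(pose, f)

def publish(cfg="cam1_pose.yaml"):
    """Continuously broadcast the stored camworld->camera transform."""
    broadcaster = tf.TransformBroadcaster()
    with open(cfg) as f:
        pose = yaml.safe_load(f)
    rate = rospy.Rate(10)
    while not rospy.is_shutdown():
        broadcaster.sendTransform(pose["translation"], pose["rotation"],
                                  rospy.Time.now(), "camera1_link", "camworld")
        rate.sleep()

if __name__ == "__main__":
    rospy.init_node("cam_pose_calibration_sketch")
    calibrate()
    publish()
```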

3.4 Perception processing step 3 - Scene camera core


The scenecamera node now has the required information in data streams, and can start generating the main output of the scene camera module. Most of this output is not targeted at external processes, but is meant to be used by scene camera add-ons which will then further process and format specifically targeted outputs. Subscriptions to the topics with the RGBD point clouds generated from the corrected depth images are the main source of data for the node. It has been chosen to include an option for setting the loop rate of the scenecamera node specifically, to allow a lower rate than the frame rate set in the camera driver. This is for cases where a high camera frame rate is desired, but a similarly high scene analysis rate is not realistic with the available processing power. In this project, camera frame rates of 10Hz and scene analysis rates of 1Hz have mainly been used.

All processing of the point clouds is done by utilizing functionalities of the Point Cloud Library (PCL), which is included with the structured light sensor drivers for ROS. With sensor pose information from the transform publisher, the two point clouds are merged together to form one single point cloud. Further filtering and processing of this very dense point cloud would require a lot of processing power, so an option to down sample the data is provided through a parameter in the launch file. This makes it possible to set a leaf size, which is the resolution of the voxel (volume pixel) grid that the point cloud is reduced to. The reduction is done by taking the mean value of all points within each voxel, which can greatly reduce density while also working as an initial noise filter. An appropriate leaf size for the tasks in this project is about 1 cm³.

To further reduce the number of points and only include those of interest, a filtering based on a region of interest (ROI) is conducted. In the launch file, the size of the work area can be configured, which is used for this filter. All points outside this ROI box will be ignored in the scene analysis. This step also allows filtering out the points on the table surface, having only points from objects inside the scene passing through the filter, by translating the ROI to start a short distance above the table surface. Because of noise in the sensor data it is required to define the ROI high enough above the table surface to account for this. With the sensors placed 1m above the table surface, a margin of 2.5 - 3.0cm has proven to suffice. This also accounts for any small angular alignment errors and slight curving of the data furthest from the center, as explained in more detail in section 4.3.
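The voxel grid reduction and the ROI crop can be expressed compactly with NumPy, as sketched below. The real node uses the corresponding PCL filters, so this is only meant to illustrate the two operations on an N x 3 array of points; the leaf size and ROI bounds are example values.

```python
# Sketch: voxel grid down sampling and ROI filtering of an (N, 3) point array.
# The actual node uses the equivalent PCL filters; the bounds are examples.
import numpy as np

def voxel_downsample(points, leaf=0.01):
    """Average all points falling in the same leaf x leaf x leaf voxel."""
    keys = np.floor(points / leaf).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inverse).astype(np.float64)
    out = np.zeros((inverse.max() + 1, 3))
    for dim in range(3):
        out[:, dim] = np.bincount(inverse, weights=points[:, dim]) / counts
    return out

def roi_filter(points, x=(-0.4, 0.4), y=(-0.3, 0.3), z=(0.03, 0.5)):
    """Keep points inside the work area box, starting ~3 cm above the table."""
    mask = ((points[:, 0] > x[0]) & (points[:, 0] < x[1]) &
            (points[:, 1] > y[0]) & (points[:, 1] < y[1]) &
            (points[:, 2] > z[0]) & (points[:, 2] < z[1]))
    return points[mask]

cloud = np.random.rand(100000, 3)          # stand-in for a merged point cloud
cloud = roi_filter(voxel_downsample(cloud, leaf=0.01))
```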

The remaining points are now only a small fraction of the full point cloud, making it more efficient to apply object segmentation algorithms. Utilizing the Euclidean cluster extraction approach [6] from PCL, all of the remaining points are grouped into clusters and accepted as individual objects if they contain more than a predefined number of points. The minimum number of points can be set in the launch file, and should be adjusted to match the chosen leaf size. This is also the case for the minimum distance between two points for them to become clustered, as different leaf sizes will require correspondingly different distance thresholds. Individual point clouds for each object, as well as a point cloud with all objects, are then published for use in add-ons or external modules. All point data is kept for the separated point clouds, having them retain RGB values and position information in relation to the chosen common reference coordinate system camworld. Furthermore, an analysis of each individual object is conducted to verify that it is placed on the table and not floating. It can be assumed that a floating object is the robot arm or a picked object, which it is chosen to ignore in the scene analysis. This is primarily to avoid having the robot arm see itself as an obstacle to avoid in motion planning. Floating objects are identified by having the z-value of their lowest point greater than a threshold defined in the launch file.
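Euclidean cluster extraction groups points that lie within a distance threshold of each other and discards groups that are too small. The sketch below uses scikit-learn's DBSCAN as a rough stand-in for the PCL kd-tree based extraction (the parameter semantics differ slightly), and adds the floating-object check described above; all parameter values are examples.

```python
# Sketch: cluster the ROI-filtered points into object candidates and drop
# clusters that "float" above the table. DBSCAN stands in for PCL's
# Euclidean cluster extraction; parameter values are examples.
import numpy as np
from sklearn.cluster import DBSCAN

def segment_objects(points, tolerance=0.02, min_points=50, float_z=0.10):
    labels = DBSCAN(eps=tolerance, min_samples=min_points).fit_predict(points)
    objects = []
    for label in set(labels) - {-1}:          # -1 marks unclustered noise
        cluster = points[labels == label]
        if cluster[:, 2].min() > float_z:     # lowest point too high -> floating,
            continue                          # e.g. the robot arm or a held part
        objects.append(cluster)
    return objects
```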

Axis aligned bounding box estimation has also been added to the scenecamera node. Whether this functionality should be in an add-on rather than in the core functionality can be discussed, but for now it is part of the scene camera core. Each segmented object is fitted with an axis aligned bounding box, which will fit some objects better than others. The bounding boxes are collected as an array of markers, which is a ROS object type allowing storage and easy visualization of the data. The size, centroid point and orientation of each box is stored, and the array containing them is published. An example of a scene where the bounding boxes are displayed (in red) alongside the combined and down sampled point cloud is shown in figure 3.4.

Figure 3.4: Augmented version of the image from figure 3.3 with axis aligned bounding boxes. These indicate segmented objects, which should be considered as obstacles in motion planning.

The bounding boxes can be used when planning motion paths for the robot arm, as in the motion planning demonstration described in section 4.5.1.1. Depending on the placement of objects in the scene, the axis-aligned bounding boxes will most likely be unnecessarily large, because their sides are always aligned with camworld, no matter how the object inside is oriented. This may be tolerable for motion planning with few or small objects, but for determining the size of the actual object it is not optimal.
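The axis aligned bounding box of a segmented cluster is simply the extent of its coordinates along each axis; the sketch below turns one cluster into a visualization_msgs/Marker cube of that extent. The frame name and marker namespace are illustrative choices, not taken from the implementation.

```python
# Sketch: build an axis aligned bounding box Marker for one segmented cluster.
import numpy as np
from visualization_msgs.msg import Marker

def bounding_box_marker(cluster, marker_id, frame="camworld"):
    lo, hi = cluster.min(axis=0), cluster.max(axis=0)
    center, size = (lo + hi) / 2.0, hi - lo

    m = Marker()
    m.header.frame_id = frame
    m.ns, m.id = "scene_objects", marker_id
    m.type, m.action = Marker.CUBE, Marker.ADD
    m.pose.position.x, m.pose.position.y, m.pose.position.z = center
    m.pose.orientation.w = 1.0               # axis aligned: no rotation
    m.scale.x, m.scale.y, m.scale.z = size
    m.color.r, m.color.a = 1.0, 0.5          # semi-transparent red, as in figure 3.4
    return m
```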


3.5 Perception processing step 4 - Box finder add-on


Now that the point cloud data has been segmented into separate objects, it is possible to start analysing these. By completing such steps in add-on modules for the scene camera, it is possible to separate the recognition process from the segmentation process. The segmentation task will be the same for most scenarios, but the recognition will most likely have very specific requirements. To fulfil the requirement of finding the pose of a specifically sized box in the scene, an add-on for classification of box sizes has been developed. As this is sufficient to solve the task at hand, no object recognition functionality has been implemented in this project, but it would be an obvious subject for future work.

The box finder add-on is meant to be used for simple object classification based on object size only. It may not be very useful with many objects of similar size in the scene, but it can be used to locate a uniquely sized box and deliver a reasonable estimate of its position. The box finder is implemented as a ROS service, waiting for a specific request before starting a search. Other ROS nodes can call the service with four parameters to initiate a box search: dimensions in height, width and length, as well as a per cent value for the tolerated variation from these dimensions. The requested dimensions are checked against all of the bounding boxes available from the scene camera. If a box matches all three dimensions within the acceptable variation, its pose is returned in an XML string. If ten marker arrays have provided no matches, the service stops the search and returns an error. Figure 3.5 shows an example where only a single box is placed in the scene. If the dimensions of the box match the requested size, within the tolerance, its pose is returned.

Figure 3.5: A simple example with only one box in the scene. The dimensions of the red bounding box are matched to a box size request. If the size is an acceptable match, the box centroid coordinates are returned.

For the service to be more flexible it could be converted into an action instead. As explained in section 2.4, the action implementation would allow continuous status updates and premature termination, which means that it would not block the process waiting for a response. However, in order to make the add-on compatible with existing modules of the Robot Co-Worker, it had to be implemented as a service. It is planned to have a later iteration of the Robot Co-Worker prototype support ROS actions, at which time the add-on could easily be modified to work as an action instead.
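The core of the box finder is the comparison of a requested size against the detected bounding box sizes within a percentage tolerance. A sketch of that check is shown below; sorting the dimensions before comparison, so that the caller does not need to match height, width and length to particular axes, is an assumption about the implementation.

```python
# Sketch: match a requested box size against detected bounding box sizes.
# Tolerance is given in per cent, as in the box finder service request.
def size_matches(requested, detected, tolerance_pct):
    """Return True if every dimension of 'detected' is within tolerance."""
    req = sorted(requested)        # compare smallest-to-largest dimensions
    det = sorted(detected)
    tol = tolerance_pct / 100.0
    return all(abs(d - r) <= tol * r for r, d in zip(req, det))

# Example: the 0.220 x 0.150 x 0.215 m test box from section 4.5.2,
# detected slightly oversized due to noise, accepted at 15 % tolerance.
print(size_matches((0.220, 0.150, 0.215), (0.231, 0.162, 0.222), 15.0))
```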


3.5.1 Detection of rotation

A problem for the box finder add-on, in its current state, is that it relies on the sizes of axis-aligned bounding boxes. In situations where the searched boxes are not aligned with the camworld coordinate system, the bounding boxes will represent a larger size than the actual box within. Small rotations of only a few degrees may be handled by setting the size tolerance a bit higher, but the true box pose is still not found. There is no existing PCL library providing this object-aligned bounding box estimation directly, and development of such a method is on the boundary of the scope of this project. Some considerations on the subject were made, but an implementation was not prioritized, as it could be assumed that the box in the immediate demonstration would be axis aligned. Two approaches to object alignment estimation have been investigated in this project. As summarized in the following, one of these was found to be potentially useful.

Principal component analysis (PCA) was considered as a candidate for determining the rotational alignment of objects, but was discarded after experimental results made it clear that this was not ideal for hollow box shapes. Even if all sides of the box were visible, the estimated major axis of the object would most likely run from one corner to the adjacent one, rather than parallel with the box sides. This led to the conclusion that PCA is not very well suited for this task. If the boxes had been solid, containing points as well, it might have been more plausible.

Normal vector analysis was then considered, with the idea that normal vector histograms could be used to estimate rotation from the angles of normal vectors on the box sides. An example of a box point cloud visualized with its normal vectors can be seen in figure 3.6. The amount of available normal vectors in the shown example makes it clear that the current sensor positioning makes it unlikely to have many of the box sides detected at once, which could prove to be a problem. Without sides to analyse for normal vector alignment, estimation of orientation is not easy. Analysis of the rectangular shape of the visible top layer would probably provide better results in such worst case scenarios. The principle of normal vector analysis is considered viable for use in orientation estimation of boxes, as well as other types of feature recognition.

Figure 3.6: Individual visualization of the box from figure 3.3 with normal vectors in key areas. As only one box side is partly detected, it is hard to use normal vector information from only this to determine the orientation of the box.


3.6 Conclusion
Several elements have had to be developed to make the scene camera solution fulfil the given requirements. Focus has been on the first part of the perception sequence, allowing robust and correct data acquisition and segmentation of objects into separate point clouds, ready for further analysis. The second perception part, with object recognition or classification, has been introduced in the form of a box finder add-on, but actual object recognition is considered to be outside the scope of this project.

Calibration and data provider nodes have been developed to create an easy calibration procedure, partly consisting of methods allowing inclusion of the existing ROS package ar-track-alvar. A method for calibration of the depth estimates of the structured light sensors has been shown to be necessary, making it possible to achieve higher precision. Such functionality has been integrated in the form of a depth_image_corrector node, which can also be used directly in other ROS applications using similar sensors.

A core scenecamera node has been developed to merge, down sample, filter and segment the acquired point clouds, using algorithms from the PCL. The output from this is ready for use in add-ons and is published in several forms, alongside simple bounding box geometries allowing obstacle avoidance through motion planning.

Because of specific needs in a demonstration for estimating the location of a box in the scene, a box finder service add-on has been developed. This can also serve as a near minimal example of scene camera add-on construction, for use as a reference for new developers.

In order to verify the robustness, reliability and precision of the individual elements, as well as the collective solution as a whole, experimental tests are documented in the next chapter.

Chapter 4
Tests
Verification of the implemented scene camera solution is done through methodical testing with data logging and analysis hereof, as well as inclusion in many demonstration test cycles. This is to confirm that the solution lives up to its requirements and to document fulfilment of all points in the problem formulation, as well as to reveal areas with room for improvement. This chapter contains evaluations of results from tests of:

- Impact on data by major noise sources
- Precision of sensor pose estimates in calibration
- Distinguishability of 3D features for recognition
- Reliability of obstacle mapping and box finder features
- Performance and data acquisition rates
- Portability to the DTU Linux platform

The tests have been carried out while pushing the implementation to the limits of its intended use, varying the changeable configurations to identify any problems herein.


4.1 Impact of noise sources


The Robot Co-Worker concept is to be functional in existing industrial environments in production lines, and must be able to function even though conditions are not necessarily optimal. It may be possible to adapt the work environment slightly to shield the scene camera from most external influence, but some interference between the sensors of the cell itself should also be expected.

4.1.1 Ambient light

Light conditions in the scene are always important for vision based systems. Where many cameras can be manually adjusted to have white balance and focus distance changed, the Kinect and Xtion sensors handle such adjustments automatically, with no option for manual adjustment. In most cases this is sufficient, but it makes the implementation dependent on external light sources, with no option to adapt. A critical issue is when saturation occurs for the camera with the IR filter, which is used for 3D data generation. Artificial light sources have not been seen to cause such issues, in contrast to sunlight. Direct sunlight has the most significant impact, but diffused reflections from surfaces or through windows have proven to be an issue as well.

The image in figure 4.1 shows the impact of diffused sunlight in the work area. The entire center of the table surface is not registered, because the saturation of IR light in that area makes it impossible for the sensor to distinguish the projected points. It is important that the entire work space is shielded from strong sunlight, in order to ensure reliable use of the structured light cameras. A curtain had to be added to the side of the Robot Co-Worker demonstration cell which is oriented towards several windows. To further control the lighting conditions, a roof with LED lights has also been added. This is not required by the scene camera solution, but it is necessary for the tool camera to have as close to static light conditions as possible.

Figure 4.1: Example of how ambient sunlight can affect the acquisition of 3D data sets with the IR based structured light sensors. The area of the table with the highest concentration of IR light from both sensors and the sun is undetectable under these conditions.

4.1.2 Overlapping structured light patterns

It is expected to have some impact from using two structured light sensors to monitor the same area, as they both emit IR dot patterns for depth estimation. As long as alignment of all nine calibration points [10] is avoided, only limited conflict between the two is noticeable. On most surfaces in the scene there is no problem for each sensor to recognize its own pattern projection and ignore any extra dots. However, surfaces with reflective properties can sometimes be oriented so that very little returned light is visible to the IR camera, normally causing a small hole of missing data in the resulting point cloud. In these cases it can be critical if one or more of the dots projected by the other sensor is at an angle where the light is better reflected, causing the wrongful assumption that this is the missing dot. Any shift in position of such dots is assumed by the sensor to be caused by an object in the scene, resulting in a spike in the point cloud. An example of this is shown in figure 4.2, where the black area of an AR tag causes a significant spike.

Of all the objects used in the demonstrations through this project, very few have caused such significant problems. The AR tag, which is only in the scene during calibration, as well as the black tape marking the box zone, are the primary ones. In both cases the issue is not critical, as these spikes only consist of a few points, which is not enough to pass through the segmentation process and be considered as objects. The worst case is if a real object is positioned very close to a spike, having the spike included in the segmentation of that object because of the small distance between them. However, this is unlikely, as the object will usually block the projected pattern from one of the two sensors. It should be kept in mind that these spikes can occur in the 3D data stream, but simple averaging with various filtering techniques will make them negligible.

Figure 4.2: At the upper right corner of the black square AR tag a significant spike can be seen in the depth reading of a small area. This primarily occurs with two sensors pointed at the same reflective surface.


4.2 Precision of sensor position calibration


In the analysis it was concluded that the ar-track-alvar package can provide sufficiently precise 6D pose estimates of the two sensors relative to a common reference. It was also seen that the RGB and RGBD modes gave different calibration results, which deviated on either position or orientation. To further investigate this, pose estimation data from a calibration sequence has been plotted for analysis, as shown in figure 4.3. The two available modes for pose estimation were run on the same data sequence of approximately 20 seconds length, which was recorded and played back using the rosbag functionality. An AR tag with square sides of 44.7cm has been used for calibration purposes. A smaller variant of 20cm was also tested, but the best accuracy was achieved with the larger one. For unknown reasons the two modes register the AR tag with a π/2 rotation offset, which has been normalized in the plotted data.

The 100 2D estimations deviate very little because of less sensor noise than with 3D, meaning that fewer than 100 measurements would be sufficient to determine a mean value. Noise on the 3D data readings is much more significant, making the need to average over many measurements clear. The estimates from RGB images in 2D and from RGBD in both 2D and 3D give different results on several parameters, of which the translation along the y-axis and the roll and pitch rotations are the most obvious.

(a) Estimates of x, y & z coordinates

(b) Estimates of roll, pitch & yaw

Figure 4.3: Calibration data showing the estimate differences between the 2D only and the combined 2D and 3D methods for 6D pose estimation. The difference in position is around 5cm, and the offset in orientation is around 0.1 radians in both roll and pitch estimates.

The data plots in figure 4.3 confirm the estimation offsets seen back in figures 2.13 and 2.14. From these visualizations of point cloud alignment with both calibrations it is concluded that the best position estimate is achieved with the 2D method, suggesting that the combined method gives a position error of around 5cm for this sensor. However, the 2D method is lacking precision in its orientation estimate, which is around 0.1 radians off compared to the more accurate result from the combined method. If run on a 64-bit system, supporting the combined 2D and 3D method, it could be advantageous to use RGB for position estimation and RGBD for orientation estimation. On 32-bit systems the RGB method is the only one available, so a small error in orientation should be expected. This can either be accepted as it is, or adjusted manually after the calibration. In any case, it is obvious that a more precise pose estimation method could improve precision, preferably through development of a new package with both 32-bit and 64-bit compatibility.


4.3 Distinguishability of 3D features

It is desirable that the 3D data collected by the scene camera solution is of sufficient resolution to allow object recognition, which could be very useful in many scenarios. Depending on the size of objects and their features, it is possible to adjust the voxel grid resolution to reflect detail requirements. In these examples the highest possible precision of around 2mm resolution is used.

A key object in the main demonstration of the Robot Co-Worker is the cardboard box with transformers placed in foam. Figure 4.4 shows visualizations of the point cloud segmentation of such a box, with both RGB and z-axis gradient coloring. The positioning of the box relative to both cameras is as shown in figure 2.15, which also makes it clear that full vision of the box is not obtained with both cameras. With only a single box side partly visible to one of the sensors, it is hard to classify this as a box from the point cloud data alone. A classification is still achieved through object segmentation, as shown in figure 3.4, but no true classification is done, as all objects are fitted with axis aligned bounding boxes regardless of shape. A box classification algorithm would most likely require more sides of the box to be visible.

(a) RGB colored transformer box point cloud.

(b) Same point cloud with z-axis gradient color.

Figure 4.4: Example of a single object point cloud after segmentation. Further analysis of the object could be conducted to determine how many transformers are left in the box and where these are placed, either from image analysis of the RGB data (left image) or from the elevation level of the distance readings (right image).

Looking at the surface of the box, the data from on top of the transformers and inside the empty slots is very detailed. For each of the eight slots, both analysis of RGB values and analysis of depth look to be viable approaches to determining which are occupied by a transformer. From the 3D data this is possible because of the feature size, as each hole is around 7cm x 6cm x 6cm, which leads to a considerable change in the observed surface when a transformer is placed in it. Mounting brackets are placed on both sides of the transformer, but as these are not the same size, the transformer is not symmetrical. Whereas it would be beneficial to know the orientation of a transformer, this is hardly possible with even the highest resolution of the scene camera solution. It may be possible under optimal conditions, but in a dynamic work environment it would not be expected to achieve a very high success rate.
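As an illustration of the depth based variant of this idea, the sketch below marks each of the eight slots as occupied or empty by comparing the mean height of the points inside the slot footprint against a threshold. The slot layout, coordinates and threshold are hypothetical example values; the text above only argues that such an analysis appears feasible.

```python
# Sketch: decide which foam slots hold a transformer from segmented box points.
# Slot footprints and the height threshold are made-up example values.
import numpy as np

def occupied_slots(box_points, slot_centres, slot_size=(0.07, 0.06), min_height=0.03):
    """Return one boolean per slot: True if the mean surface height inside the
    slot footprint is above min_height (relative to the lowest box point)."""
    result = []
    half_x, half_y = slot_size[0] / 2.0, slot_size[1] / 2.0
    floor = box_points[:, 2].min()
    for cx, cy in slot_centres:
        in_slot = ((np.abs(box_points[:, 0] - cx) < half_x) &
                   (np.abs(box_points[:, 1] - cy) < half_y))
        heights = box_points[in_slot, 2] - floor
        result.append(heights.size > 0 and heights.mean() > min_height)
    return result
```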

Another example is the metal heat sink in which the transformers are to be placed. Even though the metal surface is matte, surfaces at some angles are still not detectable with structured light. This makes it unlikely that points on all object features are always registered and available for recognition. The point cloud result from object segmentation of the heat sink with a transformer is shown in figure 4.5, which also corresponds to the RGB images in figure 2.15. Most of the key features are too small to be registered, and some of the detected features are only partly included. It is hard to say if recognition would be possible, but it would definitely not be reliable. This is as expected from the beginning of the project, confirming the assumption that inclusion of a tool camera for reliable object recognition is necessary. For larger objects it may be possible to successfully analyse major features, but reliable classification or recognition can not be expected with these sensors.

Figure 4.5: Point cloud of the metal heat sink with a mounted transformer. Only some parts of major features are visible.


4.4 Performance and data acquisition rates

For the scene camera solution to publish data in real time, it is necessary to adjust the cycle rate of the node to match the available processing power. During development, the implementation has been tested on three computer systems: a powerful workstation at DTI, a portable but powerful laptop, and a small office desktop PC at DTU. Table 4.1 shows the highest possible processing rates of each system.
Scene camera processing rates

                        Intel Core i7 920   Intel Core i5 460M   Intel Core 2 Duo E7200
                        @3.20GHz            @2.53GHz             @2.53GHz
  Cores                 6                   2                    2
  Threads               12                  4                    2
  Sensor FPS            15                  10                   3
  Complete processing   1.6Hz               1.0Hz                0.4Hz

Table 4.1: Performance achieved with the three tested computer systems. A significant difference between the three is obvious, with the slowest one barely usable, depending on the task.

It is preferred to have a reasonably high image acquisition rate for the sensors, without impacting the rate of the overall processing loop. As the modules are split into separate nodes it is possible to achieve a significant increase in cycle rates with multi-core processors. Image acquisition in the sensor driver is the first bottleneck, where none of the three systems can run both cameras at their full 30 frames per second (FPS). The most powerful one is capable of running both at 15 FPS, and the least powerful only at 3 FPS. These nodes take up a lot of processing power and will easily occupy one processor core for each sensor, if possible. This explains why 15 FPS is achievable with an entire, fast core available for each sensor, while only 3 FPS can be reached with a single, slower core for both sensors to share.

Adjustment of the FPS configuration is applied to both RGB and depth simultaneously, and has to be set in the sensor driver. As no option for pre-launch configuration of this has been found, a workaround with dynamic reconfiguration after launch has proven to be both sufficient and reliable. It is not possible to set a desired frame rate directly, as the FPS configuration parameter is the number of frames to skip between each used one. This value must be in the range 0-10, allowing frame rates of 30.0, 15.0, 10.0, 7.5, 6.0, 5.0, 4.29, 3.75, 3.33, 3.0 and 2.73 FPS. A configuration where 2 of 3 frames are skipped has proven to provide good results at 10 FPS.

The second bottleneck is the main scene camera node handling processing of the point clouds, including down sampling and segmentation. Segmentation in particular is very computationally heavy with high density point clouds. In the processing rate examples, down sampling to a voxel grid with 10mm resolution has been used. Segmentation is done with the PCL implementation of a kd-tree method [7], which takes up most of the processing power used by the scene camera node. Some improvement could be achieved by moving this into a separate node, on systems where additional cores are available, or by using a more efficient segmentation method.

The scene camera solution requires a significant amount of processing power to deliver an update rate sufficient for real time use. Without this, analysis of snapshots for less time dependent purposes can still be done, but it will not reflect the full potential of the implementation.
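The list of achievable frame rates above follows directly from the skip semantics of the driver parameter; the short sketch below reproduces it. Only the 0-10 range and the "frames to skip" interpretation are taken from the text, and the sensor's native 30 FPS is assumed as the base rate.

```python
# Sketch: achievable sensor frame rates when the driver parameter gives the
# number of frames to skip between each used frame (valid range 0-10).
SENSOR_RATE = 30.0

for skip in range(0, 11):
    print("skip %2d -> %.2f FPS" % (skip, SENSOR_RATE / (skip + 1)))

# skip = 2 (use every third frame) gives the 10 FPS configuration used
# in this project.
```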


4.5 Reliability of services
Thorough testing of the scene camera features used by external modules has been done, in order to verify whether these are robust and reliable enough for their intended purposes. This has primarily been done during integration with the external modules, but also in live demonstrations. Knowing the limitations of these services is crucial for other developers to ensure proper behaviour on their end, making the combination of modules more tolerant.

4.5.1 3D obstacle mapping

To verify that the positions of detected objects are sufficiently precise, and to make sure that the coordinates are correctly translated, the robot arm has been instructed to move and point at several estimated object coordinates. A small box is placed in different areas of the scene, followed by the robot arm moving to where the center of the box is estimated to be. This gives an impression of how well all parts of the scene camera solution are calibrated, as well as the precision of the used data treatment methods. All positions of the box are well detected, and in all cases the robot arm points very close to the center of it. Precision is dependent on whether the box is placed in the center or at the sides of the table, which is as expected because of the sensor noise magnitude. The best estimates have less than 5mm offset, and the worst a maximum of up to 2cm. It is concluded that the external user in this case should expect a maximum deviation of 2cm, which lives up to the requirements for its intended use: to position the tool camera with proper vision of an object.

4.5.1.1 DTI Motion Planning Demo

In relation to both of the projects SMErobotics and PRACE, a demonstration of motion planning with obstacle avoidance has been requested. The task was to have the robot do motion planning of the arm, based on any obstacles in the scene. Consultant at DTI, Ph.D. Martin M. Olsen, who was in charge of the motion planning, required simple 3D geometries describing restricted areas in the scene for the robot to base its planning upon. With the bounding boxes provided by the scene camera solution it was possible to achieve the desired functionality, resulting in the addition of autonomous motion planning to the Robot Co-Worker skill set. A video taken as part of the task fulfilment can be found on the CD, where an arbitrarily placed large obstacle is detected and avoided. A screenshot from the video is shown in figure 4.6.

Figure 4.6: Screenshot from the video of autonomous motion planning, demonstrated by having the robot plan an alternate path from one side of the toolbox to the other. The detected obstacles are shown in the small simulation window.

See appendix D for a list of CD contents


4.5.2 Box finder add-on

To verify the reliability of the box finder add-on, a series of tests is carried out. It is known that only noise in the 3D data has a significant impact on the size of a detected box, as the size is determined by the bounding box. A single outlier among the points can cause one or more of the box dimensions to be less accurate, ultimately resulting in a wrong size estimate from the box finder. To handle this uncertainty, an allowed tolerance is passed to the box finder, defining how much the size of a box can vary and still be accepted as a match.

In this example, a medium sized cardboard box measuring 0.220m x 0.150m x 0.215m has been used. Cardboard does not cause much noise, so a shiny tape stripe with print is added to make the test more realistic. The box is placed at the same distance to both cameras and without the robot arm obstructing visibility, as shown in figure 4.7.

Figure 4.7: The test setup with a cardboard box placed with equal distance to both sensors and no vision obstruction by the arm.

Service calls with four different tolerance values are indexed by how many data samples need to be searched before a match to the requested box size is found. A total of 80 service calls were made, and the results are shown in figure 4.8. It is preferred to have the box detected in all service calls, and with iteration through as few data samples as possible. With 20% tolerance the box is detected on the first attempt every time, but such a high tolerance will make it hard to distinguish between multiple boxes of approximately the same size, if both are within the relatively high tolerance. A tolerance of 15% or 12% reduces the allowed variation considerably, and does not impact the fast success rate by much. All service calls successfully detect the box, although the average processing time is somewhat longer because some of the calls go through up to 5 data samples before finding a match. Finally, a tolerance of 10% is tested. With a success rate of only 60% and very few detections on the first attempt, this is not considered viable.

Figure 4.8: Graphical overview of 3D data samples required to detect a box. With 15% tolerance, detection on the first attempt happens in 85% of the calls, and no more than 3 attempts are required at most. This should be expected to vary with the box size, as the noise does not scale with it.

The caller of the box finder service is free to determine whether to prioritize distinguishability of sizes or success rate, depending on the application. A good compromise between the two is a tolerance of 15%.
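As a hedged sketch of the kind of check the tolerance governs, and not the thesis implementation itself, the dimensions of a segmented cluster's axis-aligned bounding box can be compared to a requested size as shown below. Sorting the dimensions is an assumption made here so that the comparison does not depend on how the box lies on the table.

// Sketch: accept a cluster as a match if every bounding box dimension is
// within a relative tolerance of the requested size.
#include <algorithm>
#include <cmath>
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/common/common.h>   // pcl::getMinMax3D

bool matchesRequestedSize(const pcl::PointCloud<pcl::PointXYZRGB>& cluster,
                          double req_x, double req_y, double req_z,
                          double tolerance /* e.g. 0.15 for 15% */)
{
  pcl::PointXYZRGB min_pt, max_pt;
  pcl::getMinMax3D(cluster, min_pt, max_pt);  // axis-aligned bounding box corners

  double dims[3] = { max_pt.x - min_pt.x, max_pt.y - min_pt.y, max_pt.z - min_pt.z };
  double req[3]  = { req_x, req_y, req_z };
  std::sort(dims, dims + 3);   // assumption: compare sorted dimensions so the
  std::sort(req, req + 3);     // orientation of the box on the table is ignored

  for (int i = 0; i < 3; ++i)
  {
    if (std::fabs(dims[i] - req[i]) > tolerance * req[i])
      return false;
  }
  return true;
}

Called with the 0.220m x 0.150m x 0.215m box above and a tolerance of 0.15, this corresponds to the 15% setting recommended as a compromise.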

4.5.2.1 DTI Co-Worker Demo


An example of actual use of the box finder add-on is the demonstration also mentioned in the problem formulation. This was presented on 07/06-2013 at DTI in Odense, in relation to a scheduled conference [1] for future users and integrators of the Robot Co-Worker. With the scene camera functionality as a key component, the live demonstration was completed without technical issues, and thereby with great success. A position estimate of the large box initiates the demonstration by enabling the robot to position the tool camera correctly and pick a transformer from the box. The transformer is then moved to the heat sink, where it is carefully placed with the correct angle of rotation. Repeating this process, the robot empties the top layer of the box and continues with the bottom layer after instructing the operator to remove a middle layer of foam. For this demonstration, the transformer in the heat sink is removed by hand, simulating a conveyor belt or production line. During the entire execution, the audience can follow the simulated robot on a screen. The simulation is configured to subscribe to the RGBD point cloud from the scene camera, making it possible to overlay the point cloud data live on the simulation, as shown on the left in figure 4.9. A video with a segment of the demonstration showing a task cycle is included on the CD².

Figure 4.9: The Robot Co-Worker main demonstration setup. A video with highlights from one of the live demonstrations is included on the CD.
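Overlaying the cloud in an external simulator or visualizer only requires subscribing to the published topic. A minimal sketch is shown below, where the topic name /scenecamera/points is an assumption for illustration and not necessarily the name used in the actual setup.

// Sketch: subscribe to the scene camera's RGBD point cloud from an external node.
#include <ros/ros.h>
#include <sensor_msgs/PointCloud2.h>

void cloudCallback(const sensor_msgs::PointCloud2ConstPtr& cloud)
{
  // A real consumer would convert or render the cloud; here we only report it.
  ROS_INFO("Received scene cloud with %u points in frame %s",
           cloud->width * cloud->height, cloud->header.frame_id.c_str());
}

int main(int argc, char** argv)
{
  ros::init(argc, argv, "scene_cloud_listener_sketch");
  ros::NodeHandle nh;
  ros::Subscriber sub = nh.subscribe("/scenecamera/points", 1, cloudCallback); // assumed topic
  ros::spin();
  return 0;
}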

4.6

Portability to custom Linux systems at DTU


The concept of ROS packages is for them to be portable and easily usable on any system capable of running ROS. To test this, the scene camera package has been installed on a small PC at DTU to monitor the surface of an automated XY-table. Even though the PC runs a custom Ubuntu 12.04 distribution, it has been possible to install ROS fuerte and get the scene camera package up and running quite easily. Step-by-step instructions can be found in appendix A. In addition to a different Linux system, the sensors used in the DTU setup are Kinects rather than Xtions, which should be directly adaptable as well. As expected, the sensor change required almost no configuration changes, since the same driver supports both Xtions and Kinects. To make room for a new robot arm next to the work cell, the scene camera setup is temporarily removed at the time of writing, but successful installation and operation was achieved and can be expected to work again when the cell is put back together. However, it should be noted that the very limited processing power of the provided PC will not be sufficient for real-time pose estimation of objects on a moving XY-table.

² See appendix D for a list of CD contents


4.7 Test conclusion

Several critical aspects of the implementation have been investigated, and evaluations of these have been used to determine the usability and dependability of the scene camera solution.

It has been pointed out that shielding from external IR light sources is crucial for proper data collection with structured light sensors. In particular it is important to avoid sunlight in the work area. Potential issues caused by interference between the two light patterns projected onto the same surfaces have been investigated, but are not deemed to have a critical impact in most detection scenarios.

The 6D pose estimates of the sensors have been shown to differ between the RGB and RGBD approaches. Neither provides a perfect pose estimate, but the RGB approach only has a small deviation in orientation, making it the best suited candidate for calibration use. Ultimately, a combination of both methods, or development of a new package, would provide more precise results.

With large objects it is possible to obtain 3D data of high enough detail to distinguish major features, which could be used in the Robot Co-Worker demo to determine how many transformers are left in a box, as well as their positions. Smaller features, especially on metallic objects, cannot be expected to be distinguishable in the point clouds.

Multi-core processors are best suited to run the solution, as the separation into nodes makes it possible to assign these to different processor cores for improved performance. Running at full resolution, with no down sampling, the solution requires a powerful processor in order to run at a decent rate. With down sampling to a 1cm voxel grid it is possible to achieve a full processing cycle rate of 1Hz or faster, even on a laptop processor (the filter pattern referred to is sketched at the end of this section).

Multiple demonstrations have been carried out to present examples of intended use of the scene camera solution in combination with the Robot Co-Worker. These demonstrations have shown that the project requirements are met, and that the implemented features are useful, reliable, and portable to ROS environments on other systems.
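The down sampling referred to above follows the standard PCL voxel grid pattern; the sketch below shows the general idea with the 1cm leaf size used in the timing observation, and is illustrative rather than the exact thesis code.

// Sketch: down sample a coloured cloud with a 1cm voxel grid filter.
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/filters/voxel_grid.h>

pcl::PointCloud<pcl::PointXYZRGB>::Ptr
downsampleTo1cm(const pcl::PointCloud<pcl::PointXYZRGB>::ConstPtr& input)
{
  pcl::PointCloud<pcl::PointXYZRGB>::Ptr output(new pcl::PointCloud<pcl::PointXYZRGB>);
  pcl::VoxelGrid<pcl::PointXYZRGB> grid;
  grid.setInputCloud(input);
  grid.setLeafSize(0.01f, 0.01f, 0.01f);  // 1cm voxels, as in the timing test
  grid.filter(*output);                   // one representative point per voxel
  return output;
}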

Chapter 5

Future Work


The goal of this project has been to develop a scene camera solution that is generally useful in integration with the DTI Robot Co-Worker prototype. A few specific uses have also been requested for demonstrations, as examples of how the generated point clouds can be used. Targeting both general usability and specific requests, suggestions for future work are divided into two categories to reflect the difference between improvements and expansions.

Improvements concern aspects of the implemented solution which are open to significant quality enhancement. These are the suggestions for future work to improve the existing implementation:

- The calibration procedure currently relies on the ar-track-alvar package, which has been shown to have room for improvement. Refinement of how the package is used, or replacement with another, could make calibrations more precise.
- Object segmentation with the kdtree method has proven to be computationally heavy, suggesting that this approach could be reconsidered, possibly resulting in a more efficient implementation (a sketch of the pattern currently relied upon is included at the end of this chapter).

Expansions are ways of adding new features and functionalities on top of the existing implementation, most likely through development of add-ons. These are suggested areas considered worth looking into, because they would add useful tools based on the existing core implementation:

- A functionality for monitoring and keeping track of all objects in the scene, useful for identifying when changes occur in the scene or a new object is detected.
- An add-on capable of analysing the content of a container in the scene in order to determine whether it is empty, or in which region of the container there are still objects left to be picked.
- Object classification or recognition, useful in scenarios where larger objects with distinguishable features are present. This could also help determine the orientation of recognizable objects.

As expected, and desired, the implemented scene camera solution has presented some useful options for further development. Which of these are the most relevant to prioritize comes down to which demonstration tasks are decided on for the future.
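For reference, the kdtree-based segmentation pattern mentioned in the first list is typically expressed with PCL's Euclidean cluster extraction as sketched below; the parameter values are illustrative assumptions, not the ones used in the implemented solution.

// Sketch: kdtree-backed Euclidean cluster extraction over a scene cloud.
#include <vector>
#include <pcl/point_types.h>
#include <pcl/point_cloud.h>
#include <pcl/search/kdtree.h>
#include <pcl/segmentation/extract_clusters.h>

std::vector<pcl::PointIndices>
clusterObjects(const pcl::PointCloud<pcl::PointXYZRGB>::ConstPtr& cloud)
{
  pcl::search::KdTree<pcl::PointXYZRGB>::Ptr tree(new pcl::search::KdTree<pcl::PointXYZRGB>);
  tree->setInputCloud(cloud);

  std::vector<pcl::PointIndices> clusters;
  pcl::EuclideanClusterExtraction<pcl::PointXYZRGB> ec;
  ec.setClusterTolerance(0.02);   // 2cm point-to-point distance (example value)
  ec.setMinClusterSize(100);      // drop tiny noise clusters (example value)
  ec.setSearchMethod(tree);
  ec.setInputCloud(cloud);
  ec.extract(clusters);           // each entry is one segmented object
  return clusters;
}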


Chapter 6

Conclusion
Through completion of this project a scene camera solution for the DTI Robot Co-Worker prototype has been developed, integrated and verified. From analysis of the scene and investigation of existing modules it was decided to create a completely new implementation, tailored to match the requirements for use with the Robot Co-Worker. The developed solution serves as a fully functional scene camera module, including the necessary core features for calibration and reliable generation of segmented RGBD point clouds, as well as additional functionalities to allow pose estimation of objects. Fulfilment of the initial project goals has been achieved by development of:

- An automatic calibration solution, estimating and saving 6D poses of the structured light sensors by placement of an AR tag in the scene and execution of a script. Calibration during operation is possible, and values are kept even after a restart of the system.
- Functionality to allow easy configuration of point cloud density through adjustment of the voxel grid resolution used for down sampling. This makes it possible to adjust the 3D data resolution to match varying requirements in multiple situations.
- A correction node for adjustment of estimated depth values, which has proven to be crucial in order to obtain reliable distance measurements from the sensors.
- The scene camera core, which processes the obtained 3D data and publishes segmented point clouds of objects in the scene, as well as bounding box geometries for obstacle avoidance in motion planning.
- An add-on providing a ROS service to locate an axis-aligned bounding box of a specific size in the scene and return its pose for use in external modules.

In cooperation with the development team of the Robot Co-Worker, the scene camera solution has been successfully integrated with the current prototype. This has made it possible to perform tests in a wide variety of realistic scenarios, through demonstrations of performance in example cases suggested by industrial partners.


The primary demonstrations including key features from the scene camera were:

- Autonomous motion planning based on obstacle detection, as part of the SMErobotics and PRACE projects.
- Simple instruction of tasks for the Robot Co-Worker, with autonomous object pose estimation in a pick-and-place assembly task.

Both demonstrations were successful and would not have been achievable without the functionality of the scene camera solution.

Developed as a ROS package, the implementation is directly portable to other ROS systems. This has been verified by successful installation and use on a custom Ubuntu system at DTU, simultaneously confirming seamless compatibility with the Microsoft Kinect sensor.

With fulfilment of all elements mentioned in the initial problem formulation, the project has provided a solution capable of its intended purpose, while also providing opportunities for further addition of useful functionalities.

List of Appendices
A  Appendix - Package installation instructions
   A.1  How to do a fresh installation
   A.2  How to launch the nodes
B  Appendix - Wiki content
C  Appendix - Depth estimation offset for sensors
D  Appendix - CD contents
E  Appendix - Code


A. Appendix - Package installation instructions


A.1. How to do a fresh installation
The software packages have been developed with easy installation in mind. To install on a new system, simply follow these steps:

1. If using ASUS Xtions, make sure that the PC has at least two USB buses with different numbers available (e.g. check with the command lsusb).
2. Install and update a Linux distribution compatible with ROS fuerte (e.g. Ubuntu 12.04).
3. Install ROS fuerte (desktop-full version) by following the official guide at http://www.ros.org/wiki/fuerte/Installation/Ubuntu
4. Set up your ROS environment as explained in the guide.
5. Use apt-get install to install the Kinect/Xtion ROS packages (and any dependencies): ros-fuerte-openni-camera and ros-fuerte-openni-launch.
6. (If using ASUS Xtions) Install the Xtion compatible driver Sensor-Bin-Linux-v5.1.0.41 (included on the CD) by running the install.sh script with sudo rights.
7. Copy the scenecamera and scenecamera_plugins folders into your ROS_PACKAGE_PATH directory, and use rosmake to compile them.
8. (For 64-bit) Install the package ros-fuerte-ar-track-alvar. (For 32-bit) Check out the compatible version from github: git clone https://github.com/sniekum/ar_track_alvar.git followed by git checkout f093668

A.2. How to launch the nodes


You should now be able to launch the required nodes. To make this task easier, they are collected in .launch files. Make sure to verify and modify the content of the launch files to match your needs.

1. Launch the drivers: roslaunch scenecamera both_xtions.launch
2. Launch the scenecamera core: roslaunch scenecamera scenecamera.launch
3. Start rviz to see the images and point clouds visualized: rosrun rviz rviz
4. To calibrate the individual sensors and positions, follow the steps in the wiki [appendix B].


B. Appendix - Wiki content


During development of the scene camera, an internal wiki webpage was created as a reference for the members of the Robot Co-Worker team at DTI. The content has not been streamlined for external use, as it is only meant to serve as a basic guideline for use of the scene camera solution. Included in the following four pages is a snapshot of the wiki near the end of this project. It should be noted that the wiki format is primarily meant to be viewed in a browser, and is not optimized for a printed layout.


[ Appendix B is not included in this version of the report ]

C. Appendix - Depth estimation offset for sensors


Xtion 1 correction equation (converted to [mm]): y = 0.00001x² + 0.989x + 15.5
Xtion1 distance estimates

est. [m]   act. [m]   error [m]   error [%]
0.538      0.55       0.012       2.23
0.634      0.65       0.016       2.52
0.733      0.75       0.017       2.32
0.833      0.85       0.017       2.04
0.931      0.95       0.019       2.04
1.029      1.05       0.021       2.04
1.129      1.15       0.021       1.86
1.228      1.25       0.022       1.79
1.323      1.35       0.027       2.04
1.418      1.45       0.032       2.26

Table 1: Estimate results for the Xtion1 sensor with factory calibration.

Figure 1: Xtion1 estimates and measurements.

Xtion 2 correction equation (converted to [mm]): y = 0.00008x² + 0.963x + 24.33


Xtion2 distance estimates

est. [m]   act. [m]   error [m]   error [%]
0.619      0.65       0.031       5.01
0.711      0.75       0.039       5.49
0.801      0.85       0.049       6.12
0.892      0.95       0.058       6.50
0.984      1.05       0.066       6.71
1.072      1.15       0.078       7.28
1.161      1.25       0.089       7.67
1.247      1.35       0.103       8.26
1.331      1.45       0.119       8.94
1.414      1.55       0.136       9.62

Table 2: Estimate results for the Xtion2 sensor with factory calibration.

Figure 2: Xtion2 estimates and measurements.



Kinect 1 correction equation (converted to [mm]): y = 0.000001x² + 0.984x + 7.99
Kinect1 distance estimates

est. [m]   act. [m]   error [m]   error [%]
0.547      0.55       0.003       0.55
0.650      0.65       0.000       0.00
0.750      0.75       0.000       0.00
0.848      0.85       0.002       0.24
0.949      0.95       0.001       0.11
1.050      1.05       0.000       0.00
1.149      1.15       0.001       0.09
1.248      1.25       0.002       0.16
1.346      1.35       0.004       0.30
1.448      1.45       0.002       0.14

Table 3: Estimate results for the Kinect1 sensor with factory calibration.

Figure 3: Kinect1 estimates and measurements.

Kinect 2 correction equation (converted to [mm]): y = 0.000001x² + 1.014x - 43.53


Kinect2 distance estimates

est. [m]   act. [m]   error [m]   error [%]
0.583      0.55       -0.033      -5.66
0.681      0.65       -0.031      -4.55
0.776      0.75       -0.026      -3.35
0.873      0.85       -0.023      -2.63
0.972      0.95       -0.022      -2.26
1.070      1.05       -0.020      -1.87
1.164      1.15       -0.014      -1.20
1.264      1.25       -0.014      -1.11
1.360      1.35       -0.010      -0.74
1.453      1.45       -0.003      -0.21

Table 4: Estimate results for the Kinect2 sensor with factory calibration.

Figure 4: Kinect2 estimates and measurements.
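A minimal sketch of how such a per-sensor second order correction could be applied is shown below, with depths in millimetres. The coefficients used are the Xtion 1 values listed above; the other sensors would use their own fits, and the sketch is illustrative rather than the correction node's actual code.

// Sketch: apply y = a*x^2 + b*x + c, with x = estimated depth [mm] and
// y = corrected depth [mm].
#include <cstdio>

double correctDepthMm(double estimated_mm, double a, double b, double c)
{
  return a * estimated_mm * estimated_mm + b * estimated_mm + c;
}

int main()
{
  const double a = 0.00001, b = 0.989, c = 15.5;  // Xtion 1 fit from this appendix
  double estimate_mm = 931.0;                     // raw estimate from the 0.931m row
  std::printf("corrected: %.1f mm\n", correctDepthMm(estimate_mm, a, b, c));
  // Prints roughly 945 mm, against a measured 950 mm for that row.
  return 0;
}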

D. Appendix - CD contents
Along with this report comes a CD with files related to the project. These are the folders on the CD and their content:

Images - Full resolution versions of images included in the report.
Packages - Source files in the form of ROS packages ready for compilation.
References - PDF files of all resources from the resource list of the report.
Report - PDF version of this report itself.
Videos - Video files of the demonstrations mentioned in the report.


E. Appendix - Code
In addition to the digital source files on the CD, this appendix contains the code written for the scene camera implementation. Each node has its own .cpp file, and is launched through one or more .launch files. The order of the code files is as follows:

1. xtion1.launch (launches driver, image correction and tf publisher for one sensor)
2. depth_image_corrector_publisher.launch
3. corrected_depth_remap.launch
4. publish_xtion_tf.launch
5. depth_image_corrector.cpp
6. xtion_tf_broadcaster.cpp
7. calibrate_xtion1.launch (launches sensor pose calibration)
8. calibrate_primesense_sensor.launch
9. cam_tf_pose_cfg_generator.cpp
10. scenecamera.launch (launches the scene camera core)
11. scenecamera.cpp
12. find_box_pose_server.cpp


[ Appendix E is not included in this version of the report ]

References
[1] DIRA. Robot Co-Worker - Information og demonstration. URL: http://www.dira.dk/nyheder/?id=519 (visited on 30/05/2013).
[2] Willow Garage. PR2 Tabletop Manipulation Apps. URL: http://ros.org/wiki/pr2_tabletop_manipulation_apps (visited on 22/06/2013).
[3] The Danish Technological Institute. DTI Robot Co-Worker for Assembly. URL: http://www.teknologisk.dk/ydelser/dti-robot-co-worker-for-assembly/32940 (visited on 21/06/2013).
[4] Ivan Dryanovski, William Morris, and Gautier Dumonteil et al. Augmented Reality Marker Pose Estimation using ARToolkit. URL: http://www.ros.org/wiki/ar_pose (visited on 20/06/2013).
[5] Scott Niekum. ar-track-alvar Package Summary. URL: http://www.ros.org/wiki/ar_track_alvar (visited on 31/05/2013).
[6] PCL. Euclidean Cluster Extraction - Documentation. URL: http://www.pointclouds.org/documentation/tutorials/cluster_extraction.php (visited on 09/06/2013).
[7] PCL. How to use a KdTree to search. URL: http://pointclouds.org/documentation/tutorials/kdtree_search.php (visited on 24/06/2013).
[8] ROS. Intrinsic calibration of the Kinect cameras. URL: http://www.ros.org/wiki/openni_launch/Tutorials/IntrinsicCalibration (visited on 31/05/2013).
[9] SMErobotics. The SMErobotics Project. URL: http://www.smerobotics.org/project.html (visited on 21/06/2013).
[10] Mikkel Viager. Analysis of Kinect for Mobile Robots. Individual course report. Technical University of Denmark DTU, Mar. 2011.
[11] Mikkel Viager. Flexible Mission Execution for Mobile Robots. Individual course report. Technical University of Denmark DTU, July 2012.
[12] Mikkel Viager. Scene analysis for robotics using 3D camera. Individual course report. Technical University of Denmark DTU, Jan. 2013.
[13] Vijay Pradeep and Wim Meeussen. camera-pose-calibration Package Summary. URL: http://www.ros.org/wiki/camera_pose_calibration?distro=fuerte (visited on 20/06/2013).

