Consolidating Kinematic Models to Promote Coordinated Mobile Manipulations

Ziyuan Jiao*, Zeyu Zhang*, Xin Jiang, David Han, Song-Chun Zhu, Yixin Zhu, Hangxin Liu.
* equal contributors
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
Conference Papers, Robotics

Abstract

We construct a Virtual Kinematic Chain (VKC) that readily consolidates the kinematics of the mobile base, the arm, and the object to be manipulated in mobile manipulation. Accordingly, a mobile manipulation task is represented by altering the state of the constructed VKC, which can be formulated as a motion planning problem and solved by trajectory optimization. This new VKC perspective of mobile manipulation allows a service robot to (i) produce well-coordinated motions, suitable for complex household environments, and (ii) perform intricate multi-step tasks while interacting with multiple objects, without an explicit definition of intermediate goals. In simulated experiments, we validate these advantages by comparing the VKC-based approach with baselines that solely optimize individual components. The results show that VKC-based joint modeling and planning improve task success rates and produce more efficient trajectories.
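
For readers unfamiliar with the VKC construction, the sketch below illustrates the core idea in Python under simplifying assumptions (a planar base modeled as three virtual joints, purely revolute arm and object joints, hypothetical link offsets); it is not the authors' implementation.

```python
# A minimal sketch (not the paper's code) of the VKC idea: treat the mobile
# base as three virtual joints (x, y, yaw), append the arm joints, and append
# the manipulated object's articulation (e.g., a door hinge) so one planner
# reasons over a single serial chain.
import numpy as np

def planar_base_T(x, y, yaw):
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0, x],
                     [s,  c, 0, y],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

def revolute_T(theta, link_offset):
    """Homogeneous transform of a revolute joint followed by a fixed link offset."""
    c, s = np.cos(theta), np.sin(theta)
    Rz = np.array([[c, -s, 0, 0],
                   [s,  c, 0, 0],
                   [0,  0, 1, 0],
                   [0,  0, 0, 1]])
    L = np.eye(4)
    L[:3, 3] = link_offset
    return Rz @ L

def vkc_forward_kinematics(q, link_offsets):
    """q = [base_x, base_y, base_yaw, arm joints..., object joint]."""
    T = planar_base_T(q[0], q[1], q[2])
    for theta, offset in zip(q[3:], link_offsets):
        T = T @ revolute_T(theta, offset)
    return T

# The planner now optimizes one vector q; the task goal is simply a target
# value for the last (object) joint, e.g., "door hinge = 90 degrees".
```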

Efficient Task Planning for Mobile Manipulation: a Virtual Kinematic Chain Perspective

Ziyuan Jiao*, Zeyu Zhang*, Weiqi Wang, David Han, Song-Chun Zhu, Yixin Zhu, Hangxin Liu.
* equal contributors
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021
Conference Papers, Robotics

Abstract

We present a VKC perspective, a simple yet effective method, to improve task planning efficacy for mobile manipulation. By consolidating the kinematics of the mobile base, the arm, and the object being manipulated collectively as a whole, this novel VKC perspective naturally defines abstract actions and eliminates unnecessary predicates in describing intermediate poses. As a result, these advantages simplify the design of the planning domain and significantly reduce the search space and branching factors in solving planning problems. In experiments, we implement a task planner using PDDL with VKC. Compared with conventional domain definition, our VKC-based domain definition is more efficient in both planning time and memory. In addition, abstract actions perform better in producing feasible motion plans and trajectories. We further scale up the VKC-based task planner in complex mobile manipulation tasks. Taken together, these results demonstrate that task planning using VKC for mobile manipulation is not only natural and effective but also introduces new capabilities.

Reconstructing Interactive 3D Scenes by Panoptic Mapping and CAD Model Alignments

Muzhi Han*, Zeyu Zhang*, Ziyuan Jiao, Xu Xie, Yixin Zhu, Song-Chun Zhu, Hangxin Liu.
* equal contributors
IEEE International Conference on Robotics and Automation (ICRA), 2021
Conference Papers, Robotics, Sensing

Abstract

In this paper, we rethink the problem of scene reconstruction from an embodied agent's perspective: while the classic view focuses on reconstruction accuracy, our new perspective emphasizes the underlying functions and constraints such that the reconstructed scenes provide actionable information for simulating interactions with agents. Here, we address this challenging problem by reconstructing an interactive scene from an RGB-D data stream, capturing (i) the semantics and geometry of objects and layouts with a 3D volumetric panoptic mapping module, and (ii) object affordance and contextual relations by reasoning over physical common sense among objects, organized by a graph-based scene representation. Crucially, the reconstructed scene replaces the object meshes in the dense panoptic map with part-based articulated CAD models for finer-grained robot interactions. In the experiments, we demonstrate that (i) our panoptic mapping module outperforms previous state-of-the-art methods, (ii) our physical reasoning procedure matches, aligns, and replaces objects' meshes with best-fitted CAD models, and (iii) the reconstructed scenes are physically plausible and naturally afford actionable interactions; without any manual labeling, they are seamlessly imported into ROS-based simulators and virtual environments for complex robot task executions.
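
The following toy sketch illustrates one plausible form of the graph-based scene representation described above; the node fields, relation names, and the replace_with_cad helper are illustrative assumptions rather than the paper's actual data structures.

```python
# A toy sketch of a graph-based interactive scene: each node keeps the panoptic
# segment's geometry plus the part-based CAD model that replaces it, and edges
# store contextual relations such as "supported-by".
from dataclasses import dataclass, field

@dataclass
class ObjectNode:
    name: str                 # e.g., "cabinet_3"
    category: str             # panoptic class label
    mesh_bbox: tuple          # (x, y, z, dx, dy, dz) from the volumetric map
    cad_model: str = None     # id of the best-fitted articulated CAD model
    affordances: list = field(default_factory=list)   # e.g., ["openable"]

class SceneGraph:
    def __init__(self):
        self.nodes, self.relations = {}, []   # relations: (child, relation, parent)

    def add(self, node):
        self.nodes[node.name] = node

    def relate(self, child, relation, parent):
        self.relations.append((child, relation, parent))

    def replace_with_cad(self, name, cad_id):
        # Swap the dense mesh for an articulated CAD model so a simulator can
        # open drawers and doors instead of colliding with a static mesh.
        self.nodes[name].cad_model = cad_id
```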

Human-Robot Interaction in a Shared Augmented Reality Workspace

Shuwen Qiu*, Hangxin Liu*, Zeyu Zhang, Yixin Zhu, Song-Chun Zhu.
* equal contributors
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020
Conference Papers, Robotics, Augmented Reality

Abstract

We design and develop a new shared Augmented Reality (AR) workspace for Human-Robot Interaction (HRI), which establishes bi-directional communication between human agents and robots. In a prototype system, the shared AR workspace enables shared perception, so that a physical robot not only perceives the virtual elements in its own view but also infers the utility of the human agent—the cost needed to perceive and interact in AR—by sensing the human agent's gaze and pose. Such a new HRI design also affords shared manipulation, wherein the physical robot can control and alter virtual objects in AR as an active agent; crucially, a robot can proactively interact with human agents instead of purely passively executing received commands. In experiments, we design a resource collection game that qualitatively demonstrates how a robot perceives, processes, and manipulates in AR, and quantitatively evaluates the efficacy of HRI using the shared AR workspace. We further discuss how the system can potentially benefit future HRI studies that are otherwise challenging.

WalkingBot: Modular Interactive Legged Robot with Automated Structure Sensing and Motion Planning

Meng Wang, Yao Su, Hangxin Liu, Yingqing Xu.

IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2020
Conference Papers, Robotics

Abstract

This paper presents WalkingBot, a modular robot system that allows non-expert users to build a multi-legged robot in various morphologies. The system provides a set of building blocks with sensors and actuators embedded. Through the integrated hardware and software designs, the morphology of the built robot is interpreted automatically, and its kinematic model is revealed in a customized GUI on a computer screen, allowing users to understand, control, and program the robot easily. A Model Predictive Control scheme is introduced to generate a control policy for various motions (e.g., moving forward, turning left) corresponding to the sensed robot structure, affording rich robot motions right after assembly. Targeting different levels of programming skill, two programming methods---visual block programming and event programming---are also presented to enable users to create their own interactive legged robot.

Joint Inference of States, Robot Knowledge, and Human (False-)Beliefs


IEEE International Conference on Robotics and Automation (ICRA), 2020
Conference Papers, Robotics, Sensing

Abstract

Aiming to understand how human (false-)belief—a core socio-cognitive ability—would affect human interactions with robots, this paper proposes to adopt a graphical model to unify the representation of object states, robot knowledge, and human (false-)beliefs. Specifically, a parse graph (PG) is learned from a single-view spatiotemporal parsing by aggregating various object states over time; such a learned representation is accumulated as the robot’s knowledge. An inference algorithm is derived to fuse the individual PGs from all robots across multiple views into a joint PG, which affords more effective reasoning and inference capability and overcomes errors originating from a single view. In the experiments, through joint inference over PGs, the system correctly recognizes human (false-)beliefs in various settings and achieves better cross-view accuracy on a challenging small-object tracking dataset.
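
As a rough illustration of the multi-view fusion step, the sketch below combines per-view state beliefs by confidence-weighted voting; the voting scheme and the example states are assumptions, not the paper's inference algorithm over parse graphs.

```python
# A simplified sketch of fusing per-view estimates of an object's state into a
# joint estimate: each robot reports a distribution over states, and the joint
# belief keeps the confidence-weighted consensus, which can correct a single
# occluded view.
from collections import defaultdict

def fuse_views(view_beliefs):
    """view_beliefs: list of dicts mapping state -> probability (one per view)."""
    joint = defaultdict(float)
    for belief in view_beliefs:
        for state, p in belief.items():
            joint[state] += p
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}

# Example: one view is occluded and still believes the cup is on the table.
views = [{"on_table": 0.9, "in_cabinet": 0.1},
         {"on_table": 0.2, "in_cabinet": 0.8},
         {"on_table": 0.1, "in_cabinet": 0.9}]
print(fuse_views(views))   # the joint belief favors "in_cabinet"
```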

Congestion-aware Evacuation Routing using Augmented Reality Devices


IEEE International Conference on Robotics and Automation (ICRA), 2020
Conference Papers, Augmented Reality

Abstract

We present a congestion-aware routing solution for indoor evacuation, which produces real-time, individually customized evacuation routes among multiple destinations while keeping track of all evacuees’ locations. A population density map, obtained on the fly by aggregating locations of evacuees from user-end AR devices, is used to model the congestion distribution inside a building. To efficiently search the evacuation route among all destinations, a variant of the A* algorithm is devised to obtain the optimal solution in a single pass. In a series of simulated studies, we show that the proposed algorithm is more computationally efficient than classic path planning algorithms, and it generates a more time-efficient evacuation route for each individual that minimizes the overall congestion. A complete system using AR devices is implemented for a pilot study in real-world environments, demonstrating the efficacy of the proposed approach.
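
To make the routing idea concrete, here is a minimal sketch of congestion-aware A* on a grid with multiple exits; the grid abstraction, the congestion_weight parameter, and the cost model are illustrative assumptions, not the paper's single-pass variant.

```python
# Congestion-aware routing sketch: A* over a grid where the traversal cost of
# a cell grows with the local population density, and the search stops at
# whichever exit is reached first.
import heapq

def plan_route(grid_size, walls, density, start, exits, congestion_weight=5.0):
    W, H = grid_size
    def h(cell):                      # admissible heuristic: distance to nearest exit
        return min(abs(cell[0] - e[0]) + abs(cell[1] - e[1]) for e in exits)

    open_set = [(h(start), 0.0, start, [start])]
    best = {start: 0.0}
    while open_set:
        _, g, cell, path = heapq.heappop(open_set)
        if cell in exits:
            return path
        x, y = cell
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nx < W and 0 <= ny < H) or (nx, ny) in walls:
                continue
            # Crowded cells cost more to traverse, steering routes around congestion.
            step = 1.0 + congestion_weight * density.get((nx, ny), 0.0)
            ng = g + step
            if ng < best.get((nx, ny), float("inf")):
                best[(nx, ny)] = ng
                heapq.heappush(open_set, (ng + h((nx, ny)), ng, (nx, ny), path + [(nx, ny)]))
    return None

route = plan_route((20, 10), walls=set(), density={(5, 5): 0.8},
                   start=(0, 0), exits={(19, 9), (0, 9)})
```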

Dark, Beyond Deep: A Paradigm Shift to Cognitive AI with Human-like Commonsense


Engineering, 2020
Journal Papers

Abstract

Recent progress from deep learning is based on a “big data for small task” paradigm, in which massive amounts of data are poured into the training of a classifier dedicated to a single task. In this paper, we call for a paradigm shift that flips the data-task relation upside down. Specifically, we propose a “small data for big task” paradigm, wherein a single Artificial Intelligence (AI) system is challenged to develop “commonsense” that can solve a wide range of tasks with small training data. We illustrate the power of this paradigm by reviewing models of commonsense from our groups that synthesize recent breakthroughs from both machine and human vision. We identify functionality, physics, intention, causality, and utility (FPICU) as the five core domains of cognitive AI with human-like commonsense. FPICU concern “why” and “how,” which are beyond the dominating “what-and-where” framework of vision. They are invisible in terms of pixels but nevertheless drive the creation, maintenance, and development of visual scenes. Therefore, we coin them the “dark matter” of vision. Just as our universe cannot be understood by studying observable matter alone, vision cannot be understood without studying FPICU as its dark matter. We demonstrate the power of this cognitive AI approach with human-like commonsense by showing how to apply FPICU with little training data to solve a wide range of novel tasks, including tool use, planning, utility inference, and social learning in general. In summary, we argue that the next generation of AI must embrace “dark,” human-like commonsense for solving novel tasks.

A tale of two explanations: Enhancing human trust by explaining robot behavior

Science Robotics, Volume 4, Issue 37, 2019
Journal Papers, Robotics

Abstract

The ability to provide comprehensive explanations of chosen actions is a hallmark of intelligence. Lack of this ability impedes the general acceptance of AI and robot systems in critical tasks. This paper examines what forms of explanations best foster human trust in machines and proposes a framework in which explanations are generated from both functional and mechanistic perspectives. The robot system learns from human demonstrations to open medicine bottles using (i) an embodied haptic prediction model to extract knowledge from sensory feedback, (ii) a stochastic grammar model induced to capture the compositional structure of a multistep task, and (iii) an improved Earley parsing algorithm to jointly leverage both the haptic and grammar models. The robot system not only shows the ability to learn from human demonstrators but also succeeds in opening new, unseen bottles. Using different forms of explanations generated by the robot system, we conducted a psychological experiment to examine what forms of explanations best foster human trust in the robot. We found that comprehensive and real-time visualizations of the robot’s internal decisions were more effective in promoting human trust than explanations based on summary text descriptions. In addition, forms of explanation that are best suited to foster trust do not necessarily correspond to the model components contributing to the best task performance. This divergence shows a need for the robotics community to integrate model components to enhance both task execution and human trust in machines.

VRGym: A Virtual Testbed for Physical and Interactive AI


ACM Turing Celebration Conference - China (ACM TURC), 2019
Conference Papers, Virtual Reality

Abstract

We propose VRGym, a virtual reality (VR) testbed for realistic human-robot interaction. Different from existing toolkits and VR environments, VRGym emphasizes building and training both physical and interactive agents for robotics, machine learning, and cognitive science. VRGym leverages mechanisms that can generate diverse 3D scenes with high realism through physics-based simulation. We demonstrate that VRGym is able to (i) collect human interactions and fine manipulations, (ii) accommodate various robots with a ROS bridge, (iii) support experiments for human-robot interaction, and (iv) provide toolkits for training state-of-the-art machine learning algorithms. We hope VRGym can help advance general-purpose robotics and machine learning agents, as well as assist human studies in the field of cognitive science.

Self-Supervised Incremental Learning for Sound Source Localization in Complex Indoor Environment

Hangxin Liu*, Zeyu Zhang*, Yixin Zhu, Song-Chun Zhu.
* equal contributors
IEEE International Conference on Robotics and Automation (ICRA), 2019
Conference Papers, Robotics, Sensing

Abstract

This paper presents an incremental learning framework for mobile robots to localize a human sound source using a microphone array in a complex indoor environment consisting of multiple rooms. In contrast to conventional approaches that leverage direction-of-arrival (DOA) estimation, the framework allows a robot to accumulate training data and improve the performance of the prediction model over time using an incremental learning scheme. Specifically, we use implicit acoustic features obtained from an auto-encoder together with geometric features from the map for training. A self-supervision process is developed such that the model ranks the priority of rooms to explore and assigns the ground-truth label to the collected data, updating the learned model on the fly. The framework does not require pre-collected data and can be directly applied to real-world scenarios without any human supervision or intervention. In experiments, we demonstrate that the prediction accuracy reaches 67% using about 20 training samples and eventually achieves 90% accuracy within 120 samples, surpassing prior classification-based methods with explicit GCC-PHAT features.
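
The self-supervision loop might look roughly like the sketch below, which uses scikit-learn's SGDClassifier as a stand-in prediction model; the room list, feature layout, and explore_and_check callback are hypothetical, not the released code.

```python
# Schematic self-supervision loop: the model ranks rooms, the robot explores
# them in that order, and whichever room actually contains the speaker becomes
# the ground-truth label used to update the model on the fly.
import numpy as np
from sklearn.linear_model import SGDClassifier

rooms = ["kitchen", "living_room", "bedroom", "bathroom"]
model = SGDClassifier(loss="log_loss")
initialized = False

def incremental_step(acoustic_feature, explore_and_check):
    """acoustic_feature: 1-D vector (e.g., auto-encoder embedding + map geometry).
    explore_and_check(room) -> True if the sound source is found in that room."""
    global initialized
    x = acoustic_feature.reshape(1, -1)
    if initialized:
        order = np.argsort(-model.predict_proba(x)[0])   # most likely room first
        ranked = [model.classes_[i] for i in order]
    else:
        ranked = rooms                                    # no model yet: fixed order
    for room in ranked:                                   # self-supervised labeling
        if explore_and_check(room):
            model.partial_fit(x, [room], classes=rooms)   # update on the fly
            initialized = True
            return room
    return None
```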

High-Fidelity Grasping in Virtual Reality using a Glove-based System

Hangxin Liu*, Zhenliang Zhang*, Xu Xie, Yixin Zhu, Yue Liu, Yongtian Wang, Song-Chun Zhu.
* equal contributors
IEEE International Conference on Robotics and Automation (ICRA), 2019
Conference Papers, Robotics, Sensing, Virtual Reality

Abstract

This paper presents a design that jointly provides hand pose sensing, hand localization, and haptic feedback to facilitate real-time stable grasps in Virtual Reality (VR). The design is based on an easy-to-replicate glove-based system that can reliably perform (i) high-fidelity hand pose sensing in real time through a network of 15 IMUs, and (ii) hand localization using a Vive Tracker. The supported physics-based simulation in VR is capable of detecting collisions and contact points for virtual object manipulation, and each collision event triggers the physical vibration motors on the glove to signal the user, providing better realism inside virtual environments. A caging-based approach using collision geometry is integrated to determine whether a grasp is stable. In the experiments, we showcase successful grasps of virtual objects with large geometry variations. Compared to the popular Leap Motion sensor, the proposed glove-based design yields a higher success rate in various tasks in VR. We hope such a glove-based system can simplify the collection of human manipulation data in VR.

Mirroring without Overimitation: Learning Functionally Equivalent Manipulation Actions


AAAI Conference on Artificial Intelligence (AAAI), 2019
Conference Papers, Robotics

Abstract

This paper presents a mirroring approach, inspired by the neuroscience discovery of mirror neurons, to transfer demonstrated manipulation actions to robots. Designed to address the different embodiments between a human (demonstrator) and a robot, this approach extends classic robot Learning from Demonstration (LfD) in the following aspects: (i) it incorporates fine-grained hand forces collected by a tactile glove during demonstration to learn the robot's fine manipulative actions; (ii) through model-free reinforcement learning and grammar induction, the demonstration is represented by a goal-oriented grammar consisting of goal states and the corresponding forces to reach those states, independent of robot embodiments; (iii) a physics-based simulation engine is applied to emulate various robot actions and mirror the actions that are functionally equivalent to the human's, in the sense of causing the same state changes by exerting similar forces. Through this approach, a robot reasons about which forces to exert and what goals to achieve to generate actions (i.e., mirroring), rather than strictly mimicking the demonstration (i.e., overimitation). Thus, the embodiment difference between a human and a robot is naturally overcome. In the experiment, we demonstrate the proposed approach by teaching a real Baxter robot a complex manipulation task involving haptic feedback---opening medicine bottles.

Interactive Robot Knowledge Patching using Augmented Reality

Hangxin Liu*, Yaofang Zhang*, Wenwen Si, Xu Xie, Yixin Zhu, Song-Chun Zhu.
* equal contributors
IEEE International Conference on Robotics and Automation (ICRA), 2018
Conference Papers, Robotics, Augmented Reality

Abstract

We present a novel Augmented Reality (AR) approach, through Microsoft HoloLens, to address the challenging problems of diagnosing, teaching, and patching the interpretable knowledge of a robot. A Temporal And-Or graph (T-AOG) of opening bottles is learned from human demonstration and programmed into the robot. This representation yields a hierarchical structure that captures the compositional nature of the given task, which is highly interpretable for users. By visualizing the knowledge structure represented by the T-AOG and the decision-making process of parsing the T-AOG, the user can intuitively understand what the robot knows, supervise the robot's action planner, and monitor visually latent robot states (e.g., the force exerted during interactions). Given a new task, through such comprehensive visualizations of the robot's inner functioning, users can quickly identify the reasons for failures, interactively teach the robot a new action, and patch it into the knowledge structure represented by the T-AOG. In this way, the robot is capable of solving similar but new tasks through only minor modifications provided interactively by the users. This process demonstrates the interpretability of our knowledge representation and the effectiveness of the AR interface.

Unsupervised Learning using Hierarchical Models for Hand-Object Interactions

Xu Xie*, Hangxin Liu*, Mark Edmonds, Feng Gao, Siyuan Qi, Yixin Zhu, Brandon Rothrock, Song-Chun Zhu.
* equal contributors
IEEE International Conference on Robotics and Automation (ICRA), 2018
Conference Papers, Robotics

Abstract

Contact forces of the hand are visually unobservable but play a crucial role in understanding hand-object interactions. In this paper, we propose an unsupervised learning approach for manipulation event segmentation and manipulation event parsing. The proposed framework incorporates hand pose kinematics and contact forces using a low-cost, easy-to-replicate tactile glove. We use a temporal grammar model to capture the hierarchical structure of events, integrating extracted force vectors from the raw sensory input of poses and forces. The temporal grammar is represented as a temporal And-Or graph (T-AOG), which can be induced in an unsupervised manner. We obtain the event labeling sequences by measuring the similarity between segments using the Dynamic Time Alignment Kernel (DTAK). Experimental results show that our method achieves high accuracy in manipulation event segmentation, recognition, and parsing by utilizing both pose and force data.
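
As a simplified stand-in for the DTAK-based similarity step, the sketch below labels a new segment by its plain dynamic-time-warping distance to labeled exemplars; DTAK itself is a kernel and differs in its details.

```python
# Illustrative segment labeling: compare a new pose+force segment against
# labeled exemplars with dynamic time warping and assign the closest label.
import numpy as np

def dtw_distance(a, b):
    """a, b: (T, D) arrays of per-frame pose+force features."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def label_segment(segment, exemplars):
    """exemplars: list of (label, (T, D) array); returns the nearest label."""
    return min(exemplars, key=lambda e: dtw_distance(segment, e[1]))[0]
```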

A Glove-based System for Studying Hand-Object Manipulation via Joint Pose and Force Sensing

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017
Conference Papers, Robotics, Sensing

Abstract

We present the design of an easy-to-replicate glove-based system that can reliably perform simultaneous hand pose and force sensing in real time, for the purpose of collecting human hand data during fine manipulative actions. The design consists of a sensory glove that is capable of jointly collecting data of finger poses, hand poses, and forces on the palm and each phalanx. Specifically, the sensory glove employs a network of 15 IMUs to measure the rotations between individual phalanges. Hand pose is then reconstructed using forward kinematics. Contact forces on the palm and each phalanx are measured by 6 customized force sensors made from Velostat, a piezoresistive material whose force-voltage relation is investigated. We further develop an open-source software pipeline, consisting of drivers, processing code, and a system for visualizing hand actions, that is compatible with the popular Raspberry Pi architecture. In our experiment, we conduct a series of evaluations that quantitatively characterize both individual sensors and the overall system, proving the effectiveness of the proposed design.
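
The two sensing paths can be summarized by the sketch below: forward kinematics over per-phalanx IMU rotations for pose, and a voltage-divider reading mapped through a fitted curve for force. All numeric coefficients are hypothetical placeholders, not the paper's calibration.

```python
# Condensed sketch of the glove's two sensing paths.
import numpy as np

def fk_fingertip(rotations, phalanx_lengths):
    """rotations: list of 3x3 rotation matrices between consecutive phalanges.
    Returns the fingertip position relative to the finger base."""
    T = np.eye(4)
    for R, length in zip(rotations, phalanx_lengths):
        step = np.eye(4)
        step[:3, :3] = R
        step[:3, 3] = [length, 0.0, 0.0]   # each link extends along its local x axis
        T = T @ step
    return T[:3, 3]

def force_from_voltage(v_out, v_in=5.0, r_ref=10e3, a=2.5e4, b=-1.1):
    """Velostat resistance from a voltage divider, then force via a power-law
    fit F = a * R**b (a, b are hypothetical calibration coefficients)."""
    r_sensor = r_ref * (v_in - v_out) / max(v_out, 1e-6)
    return a * r_sensor ** b   # force rises as the piezoresistive resistance drops
```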

Feeling the Force: Integrating Force and Pose for Fluent Discovery through Imitation Learning to Open Medicine Bottles

Mark Edmonds*, Feng Gao*, Xu Xie, Hangxin Liu, Siyuan Qi, Yixin Zhu, Brandon Rothrock, Song-Chun Zhu.
* equal contributors
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017
Conference Papers, Robotics

Abstract

Learning complex robot manipulation policies for real-world objects is challenging, often requiring significant tuning within controlled environments. In this paper, we learn a manipulation model to execute tasks with multiple stages and variable structure, which most robot manipulation approaches typically cannot handle. The model is learned from human demonstration using a tactile glove that measures both hand pose and contact forces. The tactile glove enables observation of visually latent changes in the scene, specifically the forces imposed to unlock the child-safety mechanisms of medicine bottles. From these observations, we learn an action planner through both a top-down stochastic grammar model (And-Or graph) to represent the compositional nature of the task sequence and a bottom-up discriminative model from the observed poses and forces. These two terms are combined during planning to select the next optimal action. We present a method for transferring this human-specific knowledge onto a robot platform and demonstrate that the robot can successfully manipulate unseen objects with similar task structure.

Reliable Infrastructural Urban Traffic Monitoring Via Lidar and Camera Fusion

Yi Tian, Hangxin Liu, Tomonari Furukawa.

SAE International Journal of Passenger Cars-Electronic and Electrical Systems, 2017
Journal Papers, Sensing

Abstract

This paper presents a novel infrastructural traffic monitoring approach that estimates traffic information by combining two sensing techniques. The traffic information obtained from the presented approach includes passing vehicle counts, corresponding speed estimation, and vehicle classification based on size. The approach uses measurements from an array of Lidars and video frames from a camera and derives traffic information using two techniques. The first technique detects passing vehicles by using Lidars to constantly measure the distance from the laser transmitter to the target road surface. When a vehicle or another object passes by, the measured distance to the road surface drops at each targeted spot, triggering a detection event. The second technique utilizes video frames from the camera and performs a background subtraction algorithm in each selected Region of Interest (ROI), which also triggers a detection when a vehicle travels through the ROI. Based on the detection events, the vehicle location is estimated by each technique, and the final location estimate is derived by fusing the two estimates in the framework of Recursive Bayesian Estimation (RBE). Vehicle counting, speed estimation, and classification are then performed using the vehicle location estimate at each time step. The approach achieves high reliability by combining the strengths of both sensors. A sensor prototype has been built and multiple field experiments have been completed, demonstrating high reliability with more than 95% accuracy in both vehicle counting and classification.
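
A toy version of the two detection cues and their fusion is sketched below; the thresholds, sensor models, and the simple Bayesian update are illustrative assumptions rather than the paper's RBE formulation.

```python
# Two detection cues and a simple probabilistic fusion: a Lidar spot flags a
# vehicle when the measured range to the road surface drops, a camera ROI flags
# one via background subtraction, and the two cues update a vehicle-presence
# belief with a Bayesian step.
import numpy as np

def lidar_detect(range_m, road_range_m=6.0, drop_thresh_m=0.5):
    return road_range_m - range_m > drop_thresh_m      # something sits above the road

def camera_detect(roi, background, diff_thresh=25, frac=0.2):
    changed = np.abs(roi.astype(int) - background.astype(int)) > diff_thresh
    return changed.mean() > frac                        # enough pixels changed in the ROI

def fuse(prior, lidar_hit, cam_hit, p_lidar=(0.9, 0.05), p_cam=(0.8, 0.1)):
    """p_* = (P(hit | vehicle), P(hit | no vehicle)); returns P(vehicle | evidence)."""
    num, den = prior, 1.0 - prior
    for hit, (tp, fp) in ((lidar_hit, p_lidar), (cam_hit, p_cam)):
        num *= tp if hit else (1 - tp)
        den *= fp if hit else (1 - fp)
    return num / (num + den)
```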

Non-Field-Of-View Sound Source Localization Using Diffraction and Reflection Signals


IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016
Conference Papers, Robotics, Sensing

Abstract

This paper describes a non-field-of-view (NFOV) localization approach for a mobile robot in an unknown environment, based on an acoustic signal combined with geometrical information from an optical sensor. The approach estimates the location of a target through the mobile robot’s sensor observation frame, which consists of a combination of diffraction and reflection acoustic signals and a 3-D geometrical description of the environment. This fusion of audio-visual sensor observation likelihoods allows the robot to estimate the NFOV target. The diffraction and reflection observations from the microphone array generate a joint acoustic observation likelihood. The observed geometry also determines far-field or near-field acoustic conditions to improve the estimation of the sound's direction of arrival. A mobile robot equipped with a microphone array and an RGB-D sensor was tested in a controlled environment, an anechoic chamber, to demonstrate the NFOV localization capabilities. This resulted in an angle estimation error of ±18 degrees and a distance estimation error of less than 0.75 m.

Design of Highly Reliable Infrastructural Traffic Monitoring Using Laser and Vision Sensors

Hangxin Liu, Yi Tian, Tomonari Furukawa.

ASME International Design Engineering Technical Conferences and Computers and Information in Engineering Conference (ASME IDETC), 2016
Conference Papers, Sensing

Abstract

This paper presents a novel design of infrastructural traffic monitoring that performs vehicle counting, speed estimation, and vehicle classification by deploying three different approaches using two types of sensors: infrared (IR) cameras and laser range finders (LRFs). The first approach identifies passing vehicles by using LRFs to measure the time-of-flight to the ground, which changes when vehicles pass. In the second approach, LRFs are used only to project a dotted line onto the ground, and an IR camera identifies passing vehicles by recognizing the change in location of these laser dots in its images. The third approach utilizes an IR camera only and recognizes passing vehicles in each frame using background subtraction and edge detection algorithms. The design achieves high reliability because each approach has different strengths. A prototype system has been built, and field tests on a public road show promising results, achieving 95% accuracy in traffic counting and speed estimation.
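
Once either detector reports crossing times at two sensing lines, speed and size-based classification follow from simple geometry, as in the hypothetical sketch below (the line spacing and length thresholds are made-up values, not the paper's configuration).

```python
# Speed from trigger times at two sensing lines a known distance apart, and a
# rough size-based class from the occupancy time at one line.
def estimate_speed(t_line1, t_line2, line_spacing_m=3.0):
    """t_line1/t_line2: trigger timestamps (s) at the upstream/downstream line."""
    dt = t_line2 - t_line1
    if dt <= 0:
        return None                       # spurious or out-of-order triggers
    return line_spacing_m / dt            # m/s

def classify_by_length(speed_mps, occupancy_s, thresholds_m=(5.5, 9.0)):
    """Vehicle length ~ speed x time the detector stays occupied; bin by length."""
    length = speed_mps * occupancy_s
    if length < thresholds_m[0]:
        return "passenger car"
    return "light truck" if length < thresholds_m[1] else "heavy vehicle"
```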

Recursive Bayesian estimation of NFOV target using diffraction and reflection signals


ISIF International Conference on Information Fusion (FUSION), 2016
Conference Papers, Robotics, Sensing

Abstract

This paper presents an approach to the recursive Bayesian estimation of non-field-of-view (NFOV) sound source tracking based on reflection and diffraction signals, incorporating optical sensors. The approach uses multi-modal sensor fusion on a mobile robot, combining an optical 3D geometrical description of the environment with a microphone array's acoustic signal to estimate the target location. The robot estimates the target location either in the field-of-view (FOV) or in the NFOV by fusing sensor observation likelihoods. For the NFOV case, the microphone array provides reflection and diffraction observations to generate a joint acoustic observation likelihood. By fusing the 3D description with the acoustic observation, the target estimation is performed in an unknown environment. Finally, the sensor observation, combined with a motion model of the target, iteratively performs tracking within a recursive Bayesian estimation framework. The proposed approach was tested with a microphone array and an RGB-D sensor in a controlled anechoic chamber to demonstrate the NFOV tracking capabilities for a moving target.
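
A bare-bones particle-filter reading of the recursive Bayesian estimation loop is sketched below; the random-walk motion model and the generic observation_likelihood placeholder stand in for the paper's diffraction/reflection likelihood models.

```python
# One recursive Bayesian estimation step as a particle filter: predict with a
# motion model, weight by the fused observation likelihood, then resample.
import numpy as np

def pf_step(particles, weights, observation_likelihood, motion_std=0.1):
    """particles: (N, 2) candidate target positions; observation_likelihood(xy) -> float."""
    # Predict: random-walk motion model for the (possibly moving) target.
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # Update: weight each hypothesis by the joint observation likelihood, which
    # in the paper fuses reflection, diffraction, and 3-D geometry cues.
    weights = weights * np.array([observation_likelihood(p) for p in particles])
    weights = weights / weights.sum()
    # Resample to concentrate particles on likely target locations.
    idx = np.random.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

def estimate(particles, weights):
    return np.average(particles, axis=0, weights=weights)
```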