I am a research scientist and team lead of the Robotics Lab at the Beijing Institute for General Artificial Intelligence. In the Robotics Lab, my colleagues and I hypothesize that there exist fundamental representations or cognitive architectures underpinning the intelligent behaviors of humans. By uncovering and reproducing such an architecture, we hope to enable long-term human-robot shared autonomy by bridging:
Perception (how to extract and organize more expressive symbols from dense sensory signals);
Reasoning (how to utilize this abstract information for higher-level skills, which in turn facilitates perception);
Task and Motion Planning (how to effectively act and react).
Before joining VCLA, I graduated with a B.S. in Mechanical Engineering and a B.S. in Computer Science with a Mathematics minor from Virginia Polytechnic Institute and State University (Virginia Tech) in 2016. At Virginia Tech, I was advised by Professor Tomonari Furukawa with generous support from Murata Electronics North America, Inc. and NSF-EAGER-1554961.
I'm interested in robot perception, planning, learning, human-robot interaction, and virtual and augmented reality.
Representative papers are highlighted.
To endow embodied AI agents with a deeper understanding of hand-object interactions, we design a data glove that can be reconfigured to collect grasping data in three modes: (i) forces exerted by the hand, measured with piezoresistive material; (ii) contact points, acquired by grasping stably in VR; and (iii) visual and physical effects during manipulation, reconstructed by integrating physics-based simulation.
We present an optimization framework that redesigns an indoor scene by rearranging the furniture within it, maximizing the free space in which service robots can operate while preserving human preferences for the scene layout.
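At its core this is a trade-off between a free-space term and a layout-preference term. Below is a minimal sketch of optimizing such a trade-off with simulated annealing; the objective terms (robot_clearance, preference_cost) and the weights are illustrative stand-ins, not the paper's actual formulation.

```python
import math
import random

def robot_clearance(layout):
    # Stand-in for the free-space term: reward spreading furniture
    # apart so a robot has room to navigate between pieces.
    pieces = list(layout.values())
    return min(math.hypot(a[0] - b[0], a[1] - b[1])
               for i, a in enumerate(pieces) for b in pieces[i + 1:])

def preference_cost(layout, original):
    # Stand-in for the human-preference term: penalize moving each
    # piece away from its original position.
    return sum(math.hypot(layout[k][0] - original[k][0],
                          layout[k][1] - original[k][1]) for k in layout)

def energy(layout, original, w_free=1.0, w_pref=0.5):
    # Lower energy = more clearance and a smaller layout change.
    return -w_free * robot_clearance(layout) + w_pref * preference_cost(layout, original)

def rearrange(original, iters=5000, temp0=1.0):
    current, best = dict(original), dict(original)
    for t in range(iters):
        temp = temp0 * (1.0 - t / iters) + 1e-6          # linear cooling
        k = random.choice(list(current))                 # nudge one random piece
        x, y = current[k]
        proposal = dict(current)
        proposal[k] = (x + random.uniform(-0.2, 0.2),
                       y + random.uniform(-0.2, 0.2))
        d_e = energy(proposal, original) - energy(current, original)
        if d_e < 0 or random.random() < math.exp(-d_e / temp):
            current = proposal
            if energy(current, original) < energy(best, original):
                best = dict(current)
    return best

layout = {"sofa": (1.0, 1.2), "table": (1.4, 1.5), "shelf": (2.0, 0.8)}
print(rearrange(layout))
```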
We rethink the problem of scene reconstruction from an embodied agent’s perspective. The objects within a reconstructed scene are segmented and replaced by part-based articulated CAD models to provide actionable information and afford finer-grained robot interactions.
We devise a 3D scene graph representation that abstracts scene layouts with succinct geometric information and valid robot-scene interactions, such that a valid task plan can be computed using the graph edit distance between the initial and final scene graphs while satisfying motion-level constraints.
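As a rough illustration of the idea (not the paper's implementation), the snippet below encodes two tiny scene graphs with networkx and measures how far apart they are under graph edit distance; the edit operations realizing that distance hint at the high-level actions a plan must perform.

```python
import networkx as nx

# Toy scene graphs: nodes are objects, directed edges carry a
# spatial relation such as "on". Labels and relations are made up.
initial = nx.DiGraph()
for obj in ("cup", "book", "table", "shelf"):
    initial.add_node(obj, label=obj)
initial.add_edge("cup", "table", relation="on")
initial.add_edge("book", "shelf", relation="on")

# Goal scene: the cup should end up on the shelf.
goal = initial.copy()
goal.remove_edge("cup", "table")
goal.add_edge("cup", "shelf", relation="on")

# Graph edit distance counts the node/edge insertions, deletions,
# and substitutions needed to turn `initial` into `goal`.
dist = nx.graph_edit_distance(
    initial, goal,
    node_match=lambda a, b: a["label"] == b["label"],
    edge_match=lambda a, b: a["relation"] == b["relation"])
print(dist)  # 2.0: delete on(cup, table), insert on(cup, shelf)
```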
Leveraging the input redundancy of over-actuated UAVs, we tackle downwash effects between propellers with a novel control allocation framework that explores the entire allocation space for an optimal solution that reduces counteracting airflows.
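One common way to exploit such redundancy (a generic sketch, not the paper's allocator) is to pick a particular solution that achieves the commanded wrench and then optimize a secondary cost over the null space of the allocation matrix, which leaves the wrench unchanged. The matrix A, the desired wrench, and the airflow_penalty term below are all made-up placeholders.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import minimize

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 8))   # toy allocation matrix: 8 actuators -> 6-DoF wrench
w_des = np.array([0.0, 0.0, 9.81, 0.0, 0.0, 0.0])  # e.g., hover thrust only

u0 = np.linalg.pinv(A) @ w_des    # minimum-norm particular solution
N = null_space(A)                 # redundant directions: A @ N == 0

def airflow_penalty(u):
    # Placeholder for a modeled propeller-propeller interference cost.
    return float(np.sum(np.diff(u) ** 2))

# Search the null space only: every candidate still produces w_des exactly.
res = minimize(lambda z: airflow_penalty(u0 + N @ z), np.zeros(N.shape[1]))
u = u0 + N @ res.x
assert np.allclose(A @ u, w_des)  # commanded wrench is unchanged
```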
Learning key physical properties of tool use from an FEM-based simulation and enacting those properties via an optimal control-based motion planning scheme, producing tool-use strategies drastically different from the observed ones but requiring minimal joint effort.
A cooperative planning framework that generates optimal trajectories for a robot duo tethered by a flexible net to gather objects scattered over a large area. Model Reference Adaptive Control (MRAC) handles the unknown dynamics of the carried payloads.
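For readers unfamiliar with MRAC, here is a minimal scalar sketch of the idea (illustrative only; the paper's controller and payload dynamics are more involved): adaptive gains are tuned online so a plant with unknown parameters tracks a stable reference model.

```python
import numpy as np

# Plant: x' = a*x + b*u with a, b unknown to the controller
# (only the sign of b is assumed known). The adaptive gains drive
# the plant to track the reference model x_m' = a_m*x_m + b_m*r.
a, b = 1.0, 3.0          # true (unknown) dynamics, e.g. an uncertain payload
a_m, b_m = -4.0, 4.0     # stable reference model
gamma = 2.0              # adaptation rate
dt, T = 1e-3, 10.0

x = x_m = 0.0
k_x = k_r = 0.0          # adaptive feedback / feedforward gains
for t in np.arange(0.0, T, dt):
    r = 1.0 if (t % 4.0) < 2.0 else -1.0     # square-wave reference
    u = k_x * x + k_r * r
    e = x - x_m                              # tracking error
    # Lyapunov-rule adaptation (sign(b) assumed positive)
    k_x += -gamma * e * x * dt
    k_r += -gamma * e * r * dt
    x   += (a * x + b * u) * dt              # plant (Euler step)
    x_m += (a_m * x_m + b_m * r) * dt        # reference model
print(f"final tracking error: {abs(x - x_m):.4f}")
```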
Given an interpretable And-Or-Graph knowledge representation, the proposed AR interface allows users to intuitively understand and supervise the robot's behaviors and to interactively teach the robot new actions.
Constructing a Virtual Kinematic Chain (VKC) that readily consolidates the kinematics of the mobile base, the arm, and the object to be manipulated in mobile manipulation tasks.
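The gist, sketched below for a planar toy example (made-up link lengths; not the paper's formulation), is that virtual joints for the base and a fixed attachment for the grasped object turn the whole base-arm-object system into one serial chain, so a single forward-kinematics pass yields the object pose.

```python
import numpy as np

def T(theta, dx, dy):
    """Planar homogeneous transform: translate by (dx, dy), rotate by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, dx], [s, c, dy], [0, 0, 1]])

def vkc_fk(q):
    # q = [base_x, base_y, base_yaw, arm_q1, arm_q2]
    bx, by, byaw, q1, q2 = q
    base = T(byaw, bx, by)       # mobile base as virtual x/y/yaw joints
    link1 = T(q1, 0.5, 0.0)      # arm joints with made-up link lengths
    link2 = T(q2, 0.4, 0.0)
    grasp = T(0.0, 0.1, 0.0)     # grasped object as a fixed link
    return base @ link1 @ link2 @ grasp

pose = vkc_fk([1.0, 2.0, np.pi / 2, 0.3, -0.2])
print(pose[:2, 2])               # object position in the world frame
```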
Reconstructing an interactive scene from an RGB-D data stream via panoptic mapping, and organizing object affordances and contextual relations in a graph-based scene representation.
Physical robots can control and alter virtual objects in AR as active agents and proactively interact with human agents, instead of passively executing received commands.
A modular robot system that allows non-expert users to build multi-legged robots in various morphologies from a set of building blocks with embedded sensors and actuators.
Learning an action planner from observed poses and forces through both a top-down stochastic grammar model (an And-Or graph) and a bottom-up discriminative model.
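To give a flavor of the top-down component (a toy sketch, not the learned grammar from the paper): an And-Or graph expands AND nodes into all of their children and samples one child of each OR node according to its branching probability, yielding an action sequence.

```python
import random

# Toy stochastic And-Or grammar over actions; all names and
# probabilities here are invented for illustration.
grammar = {
    "open_bottle": ("AND", ["grasp", "unscrew_or_pull"]),
    "unscrew_or_pull": ("OR", [("twist", 0.7), ("pull", 0.3)]),
    "grasp": ("TERMINAL", None),
    "twist": ("TERMINAL", None),
    "pull": ("TERMINAL", None),
}

def sample(node):
    kind, children = grammar[node]
    if kind == "TERMINAL":
        return [node]
    if kind == "AND":
        # AND: expand every child, in order.
        return [a for c in children for a in sample(c)]
    # OR: pick one child by its branching probability.
    labels, probs = zip(*children)
    return sample(random.choices(labels, weights=probs)[0])

print(sample("open_bottle"))  # e.g. ['grasp', 'twist']
```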