This paper presents a lightweight, affordable motion capture method that uses only two smartwatches and a head-mounted camera. Unlike existing methods that require six or more expert-level IMU devices, this approach is cost-effective and convenient, enabling 3D full-body motion capture in diverse environments and making wearable motion capture accessible to everyone.

The key idea is to integrate 6D head poses obtained from the head-mounted camera into the motion estimation pipeline to overcome the sparsity and ambiguity of the sensor inputs. To consistently define head poses, an algorithm tracks and updates floor-level changes, and a multi-stage Transformer-based regression module estimates the full-body motion. Novel strategies further leverage visual cues from egocentric images to improve capture quality while reducing ambiguity.

The method is demonstrated in various challenging scenarios, including complex outdoor environments and everyday motions, and supports large-scale, long-term motion capture without location constraints. It also explores multi-user scenarios in which signals are shared among individuals, so that egocentric observations from one person serve as sparse third-person views of others.

The contributions are: (1) the first method to capture high-quality 3D full-body motion from a head-mounted camera and two smartwatches; (2) a novel algorithm to track and update floor levels; and (3) a novel motion optimization module that utilizes visual information captured by a monocular egocentric camera. Compared with existing baselines, the method achieves comparable or better motion estimation performance. The results show that incorporating head pose directly in the estimation improves performance and mitigates root-drift issues without additional localization or correction steps. The method remains robust in challenging locations with non-flat ground and can be used in multi-user scenarios.
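The floor-level tracking idea can be illustrated with a minimal sketch. This is not the paper's algorithm; the class name, thresholds, and update rule are illustrative assumptions. The sketch maintains a running floor-height estimate from a stream of head heights: a brief deviation (crouching, jumping) is ignored, while a sustained shift of the head outside a plausible band above the current floor is interpreted as a floor-level change (e.g. walking up stairs) and absorbed into the estimate.

```python
# Hypothetical sketch of floor-level tracking; names and thresholds are
# illustrative assumptions, not taken from the paper.
from collections import deque


class FloorTracker:
    """Tracks the current floor height from a stream of head heights."""

    def __init__(self, nominal_height=1.7, tolerance=0.4, window=30):
        self.floor = 0.0                    # current floor-height estimate
        self.nominal = nominal_height       # expected head height above floor
        self.tol = tolerance                # allowed deviation (crouch, tiptoe)
        self.buffer = deque(maxlen=window)  # recent out-of-band offsets

    def update(self, head_z):
        """Consume one head height (world frame) and return the floor estimate."""
        offset = head_z - self.floor - self.nominal
        if abs(offset) > self.tol:
            self.buffer.append(offset)
        else:
            self.buffer.clear()             # transient deviation, not a new floor
        if len(self.buffer) == self.buffer.maxlen:
            # Sustained deviation: shift the floor by the median offset.
            self.floor += sorted(self.buffer)[len(self.buffer) // 2]
            self.buffer.clear()
        return self.floor
```

A caller would feed per-frame head heights from the camera's 6D pose, e.g. `tracker.update(head_pose.z)`; the median over a window makes the update robust to a few noisy frames.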
The system is supported by various grants and acknowledges the contributions of several individuals.