The paper presents a novel approach to humanoid loco-manipulation in industrial settings, focusing on complex tasks that require both manipulation and localization. The authors propose a combined vision-based tracking and localization system integrated with a task-space whole-body optimization control framework. The system uses a fast, dense 3D model-based tracking method on wide-angle depth images so that perception for manipulation and perception for localization complement each other. The approach allows humanoid robots to manipulate and assemble large-scale objects while walking, as demonstrated in experiments in which two different humanoid robots (HRP-2KAI and HRP-5P) roll a heavy, wide bobbin and assemble it into an unwinder. The experiments highlight the effectiveness of the proposed method in handling large objects with limited texture and strong distortions in wide-angle depth images. The paper also discusses the challenges of visual SLAM and object tracking and their solutions, emphasizing the importance of combining these techniques to achieve robust loco-manipulation in industrial environments.