May 11-16, 2024 | Xianzhe Fan, Zihan Wu, Chun Yu, Fenggui Rao, Weinan Shi, Teng Tu
ContextCam is a novel human-AI image co-creation system that integrates context awareness with mainstream AI-generated content (AIGC) technologies such as Stable Diffusion. The system extracts relevant contextual data to inspire users and leverages a Large Language Model (LLM)-based multi-agent system to co-create images with the user. A study spanning 16 participants and 136 scenarios showed that ContextCam was well-received, with participants reporting high engagement and enjoyment when using the system. The system's workflow consists of two phases: "framing" and "focusing." In the "framing" phase, ContextCam deduces user intent from contextual data and proposes three themes for the image. In the "focusing" phase, users iterate with the AI to refine the image until they are satisfied. The system provides a canvas where human creativity meets AI innovation, enhancing the depth and personalization of the creation.
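The two-phase workflow above can be sketched in code. This is a hypothetical illustration, not the paper's implementation: the `Context` fields, function names, and templated themes are all assumptions, and a real system would prompt an LLM and hand the final prompt to an image model such as Stable Diffusion.

```python
from dataclasses import dataclass

@dataclass
class Context:
    # Illustrative subset of the contextual signals ContextCam collects.
    location: str
    weather: str
    music: str

def framing(ctx: Context) -> list[str]:
    # Phase 1 ("framing"): deduce intent from contextual data and
    # propose three candidate themes. A real system would query an LLM;
    # here we simply template three candidates from the context.
    return [
        f"A scene at {ctx.location} under {ctx.weather} skies",
        f"An abstract mood inspired by '{ctx.music}'",
        f"{ctx.location} reimagined in a painterly style",
    ]

def focusing(theme: str, user_refinement: str) -> str:
    # Phase 2 ("focusing"): iterate with the user on the chosen theme,
    # accumulating refinements into the final image-generation prompt.
    return f"{theme}, {user_refinement}"

ctx = Context(location="a lakeside park", weather="overcast",
              music="Clair de Lune")
themes = framing(ctx)
prompt = focusing(themes[0], "soft light, watercolor texture")
```

The split mirrors the paper's design: the system carries the burden of ideation in phase 1, while the user steers the result with lightweight input in phase 2.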
ContextCam's design is informed by a formative study with 23 participants, which yielded six design guidelines for context-aware human-AI image co-creation: minimizing user burden, designing a well-functioning intent-understanding and image-generation mechanism, accommodating a variety of input methods, offering a range of suggestions, focusing on the most relevant contextual information, and allowing users to control which contextual information is utilized. The system uses a multi-agent approach, with agents responsible for distinct tasks such as context selection, topic generation, tool management, artist assistance, and personalization. It collects contextual data such as location, screen content, facial expression, weather, and music to generate images that reflect the user's current context.
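The division of labor among agents can be sketched as a dispatch table keyed by role. This is a minimal sketch under assumed names: the agent bodies are placeholders, not the paper's actual prompts or logic, and only two of the five roles are stubbed out.

```python
from typing import Callable

def context_selector(data: dict) -> dict:
    # Keep only the signals judged most relevant, reflecting the
    # guideline to focus on the most relevant contextual information.
    # The relevance set here is hard-coded for illustration.
    relevant = {"location", "weather", "music"}
    return {k: v for k, v in data.items() if k in relevant}

def topic_generator(ctx: dict) -> list[str]:
    # In the real system an LLM would turn context into image topics;
    # here each surviving signal is templated into a candidate topic.
    return [f"Theme inspired by {key}: {value}" for key, value in ctx.items()]

# Hypothetical registry mapping agent roles to their handlers; the
# remaining roles (tool management, artist assistance, personalization)
# would be registered the same way.
AGENTS: dict[str, Callable] = {
    "context_selection": context_selector,
    "topic_generation": topic_generator,
}

raw = {"location": "a cafe", "weather": "rainy", "screen_content": "news"}
ctx = AGENTS["context_selection"](raw)
topics = AGENTS["topic_generation"](ctx)
```

Routing each task through a dedicated agent keeps the roles separable, which matches the paper's description of agents handling distinct responsibilities.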
The user study with 16 participants showed high satisfaction with the images produced by ContextCam. In 92.9% of the scenarios, participants chose one of ContextCam's topic recommendations, and the average user input was only 1.1 words per interaction. Participants also reported high overall enjoyment, engagement, usability, and inspiration. Incorporating contextual information into the image-generation process was found to enhance the creative journey and foster innovation. The research highlights the role of contextual information in shaping image themes, influencing user behaviors, serving as a source of creative inspiration, and enriching collaboration between humans and AI. The study concludes that context-aware human-AI image co-creation can meaningfully strengthen inspiration and user engagement in image creation.