The paper introduces a novel framework for open-vocabulary scene graph generation (SGG) using vision-language models (VLMs). The framework, named Pixels to Scene Graph Generation with Generative VLM (PGSG), leverages image-to-text generation to produce scene graph sequences and then constructs scene graphs from these sequences. This approach allows the model to handle novel visual relation concepts and integrates explicit relational modeling to enhance downstream vision-language tasks. The framework consists of three main components: scene graph prompts, a relationship construction module, and a VLM-based fine-tuning strategy. Experimental results on three SGG benchmarks (Panoptic Scene Graph, OpenImages-V6, and Visual Genome) demonstrate superior performance in the general open-vocabulary setting, and applying the framework to multiple vision-language tasks yields consistent improvements. The contributions of the work are a novel framework for open-vocabulary SGG, efficient model learning through scene graph prompts and relation-aware conversion, and superior performance on SGG benchmarks and downstream tasks.
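To make the sequence-to-graph step concrete, the sketch below parses a generated scene graph sequence into (subject, predicate, object) triplets. The tag format (`<sub>`, `<pred>`, `<obj>`) and the function names are illustrative assumptions, not the paper's actual prompt tokens, and the grounding of entities to image regions performed by the relationship construction module is omitted.

```python
import re

# Assumed sequence format for illustration only; PGSG's actual prompt tokens may differ.
# Each relation is emitted as "<sub> entity </sub> <pred> relation </pred> <obj> entity </obj>".
TRIPLET_PATTERN = re.compile(
    r"<sub>\s*(?P<subject>.+?)\s*</sub>\s*"
    r"<pred>\s*(?P<predicate>.+?)\s*</pred>\s*"
    r"<obj>\s*(?P<object>.+?)\s*</obj>"
)


def parse_scene_graph_sequence(sequence: str) -> list[tuple[str, str, str]]:
    """Convert a generated scene-graph sequence into (subject, predicate, object) triplets."""
    return [
        (m.group("subject"), m.group("predicate"), m.group("object"))
        for m in TRIPLET_PATTERN.finditer(sequence)
    ]


if __name__ == "__main__":
    generated = (
        "<sub> person </sub> <pred> riding </pred> <obj> horse </obj> "
        "<sub> horse </sub> <pred> standing on </pred> <obj> grass </obj>"
    )
    for triplet in parse_scene_graph_sequence(generated):
        print(triplet)  # e.g. ('person', 'riding', 'horse')
```

Because the predicates are read directly from generated free-form text rather than selected from a fixed label set, this conversion step is what lets the framework surface relation concepts outside the training vocabulary.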