This paper introduces Nano-Capsulator, a framework that compresses long prompts into natural-language (NL) formatted Capsule Prompts while preserving their utility and their transferability across different large language models (LLMs). The central challenges are that NL prompts are incompatible with gradient-based back-propagation and offer little flexibility for imposing length constraints. Nano-Capsulator addresses both by optimizing a semantics-preserving loss that interacts with a reward function featuring length constraints, so the compressed prompt stays faithful to the original while respecting a fixed length budget.
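Since this objective drives the whole compression, a minimal sketch may clarify how the two terms interact. The code below is illustrative only: `semantic_sim`, the word-level length count, and the weight `alpha` are assumptions standing in for the paper's actual loss and reward, not its exact formulation.

```python
def capsule_score(original: str, capsule: str, semantic_sim,
                  max_tokens: int, alpha: float = 1.0) -> float:
    """Score a candidate Capsule Prompt (lower is better).

    Combines a semantics-preserving term with a length-constrained
    penalty. `semantic_sim` is any callable returning a similarity
    in [0, 1]; all names here are illustrative stand-ins for the
    paper's objective, not its exact formulation.
    """
    # Semantics-preserving term: higher similarity -> lower loss.
    sem_loss = 1.0 - semantic_sim(original, capsule)

    # Length-constrained reward term: capsules within budget incur no
    # penalty; overlong ones are penalized per excess token.
    excess = max(len(capsule.split()) - max_tokens, 0)
    return sem_loss + alpha * excess


# Toy usage with a trivial Jaccard similarity as a stand-in scorer:
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(len(sa | sb), 1)

print(capsule_score("a very long original prompt with many details",
                    "a short capsule", jaccard, max_tokens=10))
```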
The framework is evaluated on two prompt types, few-shot chain-of-thought (CoT) demonstrations and passage prompts for reading comprehension, using CommonsenseQA, GSM8K, MultiRC, and TriviaQA-Long. Experimental results show that Capsule Prompts reduce the original prompt length by 81.4%, cut inference latency by up to 4.5×, and save 80.1% of budget overheads, while largely preserving the performance of the original prompts. Utility and transferability are demonstrated across Vicuna-13B, PaLM, and Claude2; latency gains are measured on OPT-2.7B and Vicuna-13B; and API cost savings are reported on PaLM. Because the capsules are plain natural language, they can also be applied directly to similar but unseen downstream datasets without any further training, provided those datasets cover tasks from similar domains. A sketch of this reuse pattern follows below.
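To make the transferability claim concrete, the following hedged sketch shows the intended usage pattern: one Capsule Prompt, produced once by the trained compressor, is sent verbatim to any text-in/text-out backend. The helper names (`llm_call`, the prompt layout, the backend callables) are hypothetical.

```python
def answer_with_capsule(llm_call, capsule_prompt: str, question: str) -> str:
    """Query an LLM backend with a pre-compressed Capsule Prompt.

    `llm_call` abstracts an arbitrary text-in/text-out model API; the
    prompt layout below is an illustrative assumption, not the paper's
    template.
    """
    return llm_call(f"{capsule_prompt}\n\nQuestion: {question}\nAnswer:")


# Hypothetical usage: the same capsule string works across backends
# (e.g., Vicuna-13B, PaLM, Claude2) with no re-training or adaptation.
# for backend in (vicuna_call, palm_call, claude_call):
#     print(answer_with_capsule(backend, capsule, "What is 17 * 24?"))
```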