The paper by S. A. Murphy discusses the estimation of optimal dynamic treatment regimes, which are tailored to an individual's changing status over time. The goal is to estimate decision rules that maximize the mean response at the end of a specified time period using experimental or observational data. The proposed methodology makes smooth parametric assumptions only on quantities directly relevant to estimating the optimal rules. The paper uses the potential outcomes model to specify the objective and assumptions, and it illustrates the methodology through a small simulation. The approach involves modeling regret functions, which measure the loss in benefit-to-go incurred by deviating from the optimal decision at each time point.
This method allows for the straightforward use of statistical methods such as hypothesis testing and model selection, and it avoids implicit constraints on the form of the regret functions. The paper also provides an estimator of the mean response to the optimal dynamic regime and discusses the computational aspects and limitations of the method.
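To make the idea of a regret function concrete, the following is a minimal sketch under stated assumptions, not the paper's own implementation. It assumes a hypothetical quadratic regret model in which the regret is zero at the optimal action and grows as the chosen action moves away from it. The functional form, the names `regret` and `optimal_rule`, and the parameter values in `psi` are all illustrative assumptions.

```python
def regret(state, action, psi):
    """Hypothetical quadratic regret: zero at the optimal action, positive elsewhere.

    psi = (scale, intercept, slope) are illustrative parameters,
    not quantities estimated in the paper.
    """
    scale, intercept, slope = psi
    optimal_action = intercept + slope * state
    return scale * (action - optimal_action) ** 2


def optimal_rule(state, psi, candidate_actions):
    """Decision rule: choose the candidate action with the smallest modeled regret."""
    return min(candidate_actions, key=lambda a: regret(state, a, psi))


# Illustrative values only (assumed for this sketch).
psi = (2.0, 0.5, 1.0)
candidate_actions = [i / 10 for i in range(31)]  # actions 0.0, 0.1, ..., 3.0

best_action = optimal_rule(1.0, psi, candidate_actions)
print(best_action, regret(1.0, best_action, psi))
```

In this sketch the decision rule at a given time point depends only on the parameters of the regret model, which is why assumptions placed on the regrets bear directly on the estimated rules. A multi-stage regime would apply one such rule per time point, each with its own regret model.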