Grammar Engineering for CCG using Ant and XSLT*

June 2009 | Scott Martin, Rajakrishnan Rajkumar, and Michael White
The paper by Scott Martin, Rajakrishnan Rajkumar, and Michael White of Ohio State University's Department of Linguistics presents an approach to corpus conversion and grammar extraction built on Ant and XSLT. The authors argue that these tasks are traditionally treated as one-time processes and that they benefit from being made flexible and iterative. They take the CCGbank as input and enrich it with additional linguistic information: Propbank roles, head lexicalization for case-marking prepositions, derivational restructuring for punctuation analysis, named entity annotation, and lemmatization. The system applies successive XSLT transforms, controlled by Apache Ant, to produce an OpenCCG grammar, drawing on XSLT's ability to perform arbitrary transformations of XML trees and on Ant's fine-grained control over processing. This design facilitates state-of-the-art BLEU scores for surface realization on section 23 of the CCGbank. The paper also discusses the benefits of dividing the grammar engineering task into configurable processes implemented as Ant tasks, which simplifies process management and speeds up experimentation. The experimental results show significant improvements in single-rooted logical forms (LFs) and in BLEU scores, underscoring the importance of grammar engineering improvements alongside statistical model enhancements. Future work will focus on increasing the number of single-rooted LFs and on integrating the system with OpenCCG.
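The idea of chaining small, individually switchable tree transforms can be illustrated with a minimal sketch. This is not the authors' Ant and XSLT code: it swaps in Python's standard-library ElementTree for XSLT and a plain list of steps for Ant targets, and every element name, attribute name, step name, and the toy lemma table are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a tiny derivation tree. The element and attribute
# names are illustrative and do not reflect any real CCGbank XML format.
SAMPLE = (
    '<Derivation>'
    '<Leaf word="dogs" pos="NNS"/>'
    '<Leaf word="barked" pos="VBD"/>'
    '<Leaf word="." pos="."/>'
    '</Derivation>'
)


def add_lemmas(root):
    """Toy lemmatization step: attach a 'lemma' attribute to every leaf."""
    toy_lemmas = {"dogs": "dog", "barked": "bark"}  # stand-in for a real lemmatizer
    for leaf in root.iter("Leaf"):
        word = leaf.get("word")
        leaf.set("lemma", toy_lemmas.get(word, word))
    return root


def mark_punctuation(root):
    """Toy punctuation step: flag leaves whose POS tag is a punctuation mark."""
    for leaf in root.iter("Leaf"):
        if leaf.get("pos") in {".", ","}:
            leaf.set("punct", "true")
    return root


# Ordered list of (name, transform) pairs, loosely analogous to a sequence
# of build targets that each apply one transform to the previous output.
PIPELINE = [
    ("lemmatize", add_lemmas),
    ("punctuation", mark_punctuation),
]


def run_pipeline(xml_text, enabled=None):
    """Parse the input and apply each enabled step in order.

    `enabled` is an optional set of step names; None means run every step.
    """
    root = ET.fromstring(xml_text)
    for name, step in PIPELINE:
        if enabled is None or name in enabled:
            root = step(root)
    return root


if __name__ == "__main__":
    print(ET.tostring(run_pipeline(SAMPLE), encoding="unicode"))
```

Calling `run_pipeline(SAMPLE, enabled={"lemmatize"})` runs only the first step, which mirrors in a small way how keeping each enhancement as a separate, configurable process lets one step be rerun or disabled without touching the others.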