[slides] A SICK cure for the evaluation of compositional distributional semantic models

The paper introduces SICK (Sentences Involving Compositional Knowledge), a large-scale English benchmark for evaluating compositional distributional semantic models (CDSMs). SICK consists of about 10,000 sentence pairs designed to test whether CDSMs capture lexical, syntactic, and semantic phenomena. Each pair is annotated for semantic relatedness (on a 5-point scale) and for entailment (with one of three labels: entailment, contradiction, or neutral). The dataset was built by normalizing sentences drawn from existing datasets, expanding each normalized sentence into new variants, and then pairing the results to obtain diverse sentence pairs. Annotations were collected through crowdsourcing, so that each pair received multiple independent judgments. The dataset was used in SemEval-2014 Task 1 and is freely available for research. SICK addresses the lack of suitable benchmarks for CDSMs by focusing on sentence-level semantics while deliberately avoiding phenomena outside the scope of these models, such as idioms and named entities. The pairs cover several types, including pairs with similar meaning, pairs with contrasting meaning, and pairs with lexical overlap. The annotations show that pairs with high relatedness scores are more likely to be labeled entailment or contradiction than neutral. SICK is therefore well suited to evaluating CDSMs, since it targets the core phenomena these models are expected to handle. The paper also reviews related work, including existing datasets such as FraCaS and RTE, and explains why they are of limited use for evaluating CDSMs. In contrast, SICK offers a more focused and comprehensive benchmark for assessing compositional semantics.
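To make the two annotation layers concrete, here is a minimal Python sketch of a sentence-pair record carrying raw crowd judgments. It assumes a hypothetical record layout (`SentencePair`, `gold_relatedness`, `gold_entailment` and `mean_relatedness_by_label` are illustrative names, not part of any SICK release). It also assumes that per-pair gold values come from averaging the 1 to 5 relatedness ratings and taking a majority vote over entailment labels; the summary above does not state the exact aggregation rule. The toy pairs are invented and are not SICK items.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical label strings; the three categories follow the description above.
LABELS = ("entailment", "contradiction", "neutral")


@dataclass
class SentencePair:
    """A sentence pair with its raw crowd judgments (hypothetical layout)."""
    sentence_a: str
    sentence_b: str
    relatedness_votes: list[int]  # each rating on the 1-5 scale
    entailment_votes: list[str]   # each vote is one of LABELS

    def __post_init__(self) -> None:
        # Reject empty or out-of-range judgments early.
        if not self.relatedness_votes or not self.entailment_votes:
            raise ValueError("each pair needs at least one judgment of each kind")
        if any(not 1 <= r <= 5 for r in self.relatedness_votes):
            raise ValueError("relatedness ratings must lie in 1..5")
        if any(v not in LABELS for v in self.entailment_votes):
            raise ValueError("unknown entailment label")

    def gold_relatedness(self) -> float:
        # Assumed aggregation: arithmetic mean of the ratings.
        return sum(self.relatedness_votes) / len(self.relatedness_votes)

    def gold_entailment(self) -> str:
        # Assumed aggregation: majority vote; Counter breaks ties by first occurrence.
        return Counter(self.entailment_votes).most_common(1)[0][0]


def mean_relatedness_by_label(pairs: list[SentencePair]) -> dict[str, float]:
    """Average gold relatedness of the pairs that share each gold entailment label."""
    groups: dict[str, list[float]] = {}
    for pair in pairs:
        groups.setdefault(pair.gold_entailment(), []).append(pair.gold_relatedness())
    return {label: sum(s) / len(s) for label, s in groups.items()}


# Invented toy pairs for illustration only; these are not SICK items or results.
toy_pairs = [
    SentencePair("A man is playing a guitar", "A man is playing an instrument",
                 [5, 4, 5, 4, 5], ["entailment"] * 4 + ["neutral"]),
    SentencePair("A man is playing a guitar", "Nobody is playing a guitar",
                 [4, 4, 3, 4, 5], ["contradiction"] * 5),
    SentencePair("A man is playing a guitar", "A woman is slicing an onion",
                 [1, 1, 2, 1, 1], ["neutral"] * 5),
]

if __name__ == "__main__":
    for label, score in mean_relatedness_by_label(toy_pairs).items():
        print(f"{label}: {score:.2f}")
```

Grouping gold relatedness by gold entailment label is the kind of cross-tabulation behind the observation that highly related pairs tend to be entailment or contradiction pairs; here the numbers reflect only the invented toy data.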


M. Marelli, S. Menini, M. Baroni, L. Bentivogli, R. Bernardi, R. Zamparelli