Chinese idioms are a kind of idiomatic expression, most of which consist of four Chinese characters. Because of their non-compositionality and metaphorical meaning, they are hard for children and non-native speakers to understand. This study proposes a novel task, denoted as Chinese Idiom Paraphrasing (CIP). CIP aims to rephrase sentences that contain idioms into non-idiomatic ones while preserving the original sentence's meaning. Since sentences without idioms are more easily handled by Chinese NLP systems, CIP can be used to pre-process Chinese datasets, thereby facilitating and improving the performance of Chinese NLP tasks, e.g., machine translation, Chinese idiom cloze, and Chinese idiom embeddings. In this study, the CIP task is treated as a special paraphrase generation task. To circumvent difficulties in acquiring annotations, we first establish a large-scale CIP dataset based on human and machine collaboration, which consists of 115,530 sentence pairs. We further deploy three baselines and two novel CIP approaches to deal with CIP problems. The results show that the proposed methods outperform the baselines on the established CIP dataset.