Meaning Variation in Paraphrase


Ellie Pavlick

My research focuses on enabling computers to understand language. As humans, we document most of what we know and learn in the form of speech or text. Without the ability to comprehend what humans say and write, computers can access only a tiny fraction of the world’s knowledge. Automatic natural language understanding can allow computers to organize the vast amount of information available and to communicate with the people who want to access it (Singhal, 2012; Dong et al., 2014). Although humans do it without thinking, understanding natural language is very difficult for computers. Much of this difficulty is due to the fact that language exhibits both ambiguity (we can use the same words to express many different meanings) and redundancy (we have many different ways of expressing the same meaning). Figure 1 illustrates how understanding a single word, head, is a highly complex task. Depending on the context, it may mean brain or leader. Even within the leader sense, we have a vast number of options, or paraphrases, which communicate the same concept. Some of these are specific (CEO) while others are general (boss). Some are fairly formal (manager) while others are very colloquial (top guy). While expressing the same general concept, each of these paraphrases alters the meaning of the text, sometimes very subtly and other times quite drastically. Most existing work on paraphrases adopts an informal definition of paraphrases as utterances with “approximate conceptual equivalence” in “many contexts” (Barzilay, 2003). My thesis work is aimed at adding clarity to this definition: I am interested in determining exactly which aspects of meaning are shared between paraphrases, and in which contexts a paraphrase holds.

Figure 1: Paraphrases, different ways of expressing the same idea, make automatic language understanding difficult. (Examples: Use your head. → pre-frontal cortex ⊏ brain ≣ mind ≣ noggin; The head of the company. → CEO ⊏ manager ⊏ leader ≣ boss ⊏ top guy.)

I have built robust statistical models for determining when two expressions can be treated as paraphrases and when they cannot (§1). Given a pair of paraphrases, I have modeled how they differ in terms of their semantic (§2) and their pragmatic (§3) meaning. During the final years of my PhD (§4), I will focus on how local context can affect the meaning of a paraphrase, both from a language understanding and a language generation perspective.

1 Identifying good paraphrases

Recognizing paraphrases is a huge challenge for computers. The default representation of language as text strings would mean that a computer could never conclude that head == leader, since the strings are simply not the same. However, knowledge of such equivalences is essential for nearly every natural language processing task, e.g. information extraction, question answering, and summarization. A variety of approaches have been explored that attempt to discover when text strings might share meaning. For example, we might assume that phrases which are used in similar contexts are likely to be paraphrases (Mikolov et al., 2013), or that phrases which share a foreign language translation are good paraphrases (Ganitkevitch et al., 2013). While both of these are reasonable assumptions, I have shown in Pavlick et al. (2015a) that each method alone is likely to make significant errors: the former often confuses antonyms for paraphrases, and the latter often mistakes unrelated words for paraphrases. In Pavlick et al. (2015c) we addressed these shortcomings by introducing a discriminative model which combines contextual similarity from monolingual text with bilingual similarities from parallel corpora in order to determine whether two phrases are good paraphrases. Our model produces state-of-the-art rankings of paraphrase quality, achieving a correlation of 0.71 with human judgements, compared to just 0.46 achieved by the best previously published model.

Our ranking model provides an important first step by identifying pairs of utterances that are likely to be good paraphrases in at least some contexts. Differentiating the contexts in which a paraphrase holds from those in which it does not is a separate problem. I have made several contributions to the literature on context-dependent paraphrasing, and will continue to pursue problems in this area in my future work (§4). In Pavlick et al. (2015d), we used crowdsourcing to disambiguate context-dependent paraphrases, expanding a widely-used semantic resource by 300% and releasing a valuable dataset to the research community. In Pavlick et al. (2015b), we used language modeling techniques to adapt paraphrase models to specialized domains. As illustrated by Table 1, paraphrases that are highly probable in general (e.g. hot = sexy) can be extremely improbable when we move into a new domain, like biology. We significantly improve paraphrase recognition by adapting our paraphrase extraction model to the domain in which it will be applied. We further show that by combining our high-precision domain-adapted model with a broad-coverage unadapted model, we are able to substantially improve paraphrase ranking without any loss in coverage.

         General                 Biology
hot      warm, sexy, exciting    heated, warm, thermal
treat    address, handle, buy    cure, fight, kill
head     leader, boss, mind      skull, brain, cranium

Table 1: The goodness of a paraphrase depends on the domain.
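The intuition behind combining the two signals can be sketched in a few lines of Python. This is a toy illustration, not the model from Pavlick et al. (2015c): the vectors, translation probabilities, and weights below are all invented, and the real system is a supervised model trained over many features.

```python
import math

def cosine(u, v):
    """Cosine similarity between two distributional context vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pivot_score(e1, e2, translations):
    """Bilingual pivot signal: probability mass shared through common
    foreign translations, simplified here to sum_f p(f|e1) * p(f|e2)."""
    t1 = translations.get(e1, {})
    t2 = translations.get(e2, {})
    return sum(p * t2.get(f, 0.0) for f, p in t1.items())

def paraphrase_score(e1, e2, vectors, translations, weights=(0.5, 0.5)):
    """Combine the monolingual and bilingual signals linearly. These
    weights are made up; the real model learns them from labeled data."""
    mono = cosine(vectors[e1], vectors[e2])
    bi = pivot_score(e1, e2, translations)
    return weights[0] * mono + weights[1] * bi

# Toy data (values invented): 'hot' and 'cold' occur in similar contexts,
# so the monolingual signal alone would wrongly treat the antonyms as
# paraphrases; the bilingual signal separates them because the two words
# share no foreign translation.
vectors = {"hot": [0.9, 0.8, 0.1], "warm": [0.8, 0.9, 0.2], "cold": [0.9, 0.7, 0.1]}
translations = {
    "hot":  {"caliente": 0.9},
    "warm": {"caliente": 0.7},
    "cold": {"frio": 0.95},
}
print(round(paraphrase_score("hot", "warm", vectors, translations), 2))
print(round(paraphrase_score("hot", "cold", vectors, translations), 2))
```

With these toy numbers, hot/warm scores well above hot/cold, even though the two pairs have nearly identical contextual similarity, which is exactly the failure mode the combined model is designed to avoid.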

2 Adding semantics to paraphrasing

Identifying good paraphrases is only part of the problem. As illustrated in Figure 1, utterances are rarely, if ever, perfectly substitutable. Rather, paraphrases often exhibit subtle, or not-so-subtle, differences in meaning or in style. For full language understanding, it is not enough to know simply whether a paraphrase is good or bad. For example, boss may be a reasonably good paraphrase of CEO, but given the statement she is the boss, we cannot conclude that she is the CEO. These types of directed entailments are critical for many applications, such as question answering, knowledge base population, and inference. It is important to know, if two phrases are “good” paraphrases, whether they are mutually substitutable (boss/supervisor), or whether one has a more specific semantic denotation than the other (boss/CEO). Similarly, if two phrases are “bad” paraphrases, is it because they are unrelated (good CEO/rich CEO) or do they actually contradict one another (good CEO/awful CEO)? Such finer-grained semantics can allow computers to reason robustly and accurately about language, and to infer new facts that have not been explicitly stated (e.g. She is an awful CEO → She is not a good boss).

Equivalent         look at/watch, a person/someone, clean/cleanse, away/out, distant/remote, phone/telephone
Entailment         little girl/girl, kuwait/country, tower/building, the cia/agency, sneaker/footwear, heroin/drug
Exclusion          close/open, minimal/significant, boy/young girl, nobody/someone, blue/green, france/germany
Topically related  swim/water, husband/marry, to oil/oil price, country/patriotic, drive/vehicle, family/home
Unrelated          girl/play, found/party, profit/year, man/talk, car/family, holiday/series

Table 2: Different entailment relations arising in paraphrases extracted by data-driven methods.

Automatic methods for paraphrase extraction and ranking, like those described in Section 1, are likely to conflate the more subtle relations that exist between utterances. Table 2 shows the range of entailment relations that appear in the Paraphrase Database (PPDB) (Ganitkevitch et al., 2013), the largest available resource of automatically-learned paraphrases. Because of this lack of well-defined semantics in automatic paraphrase resources, complex language understanding systems for question answering and inference often rely on hand-built semantic resources (Rosenthal et al., 2014) which are expensive to maintain and offer only limited coverage.

In my recent work, I have aimed to automatically add these fine-grained semantic annotations to data-driven paraphrase resources, enabling nuanced natural language inference at scale. In Pavlick et al. (2015a), we trained a statistical model to automatically annotate each of PPDB’s 176 million paraphrase pairs with an explicit entailment relation. These entailment relations are derived from natural logic (MacCartney, 2009), a lightweight semantic formalism designed to operate over natural language itself, as opposed to heavier representations like first-order logic. Our model was able to reach 80% accuracy against gold standard human labels. More importantly, when incorporated into an end-to-end system for textual inference, our automatically-assigned entailment relations improved performance (F1 score) by 15 points over baseline, even outperforming a state-of-the-art manually-built resource (Figure 2).

Figure 2: Our model (PPDB+) outperforms WordNet, a manually-built semantic resource, in F1 score (Baseline < WordNet < PPDB+).
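The natural logic relation inventory can be made concrete with a toy labeler. The word lists and rules below are invented stand-ins for illustration; the actual model in Pavlick et al. (2015a) is a statistical classifier trained over many distributional and lexical features, not hand-written lookups.

```python
# The basic entailment relations of natural logic (MacCartney, 2009),
# assigned here by toy lexical lookups. The word lists are illustrative
# assumptions, not data from the trained model.
EQUIV, FORWARD, REVERSE, EXCLUSION, INDEPENDENT = "≡", "⊏", "⊐", "¬", "#"

SYNONYMS = {frozenset(p) for p in [("boss", "supervisor"), ("phone", "telephone")]}
HYPERNYMS = {("ceo", "boss"), ("little girl", "girl"), ("sneaker", "footwear")}
ANTONYMS = {frozenset(p) for p in [("good", "awful"), ("open", "close")]}

def relation(x, y):
    """Label the entailment relation between two paraphrase candidates."""
    x, y = x.lower(), y.lower()
    if x == y or frozenset((x, y)) in SYNONYMS:
        return EQUIV          # mutually substitutable
    if (x, y) in HYPERNYMS:
        return FORWARD        # x entails y: every CEO is a boss
    if (y, x) in HYPERNYMS:
        return REVERSE        # y entails x
    if frozenset((x, y)) in ANTONYMS:
        return EXCLUSION      # x and y contradict one another
    return INDEPENDENT        # possibly related, but no entailment

print(relation("CEO", "boss"))         # ⊏  (forward entailment)
print(relation("boss", "supervisor"))  # ≡  (equivalence)
print(relation("good", "awful"))       # ¬  (exclusion)
```

The value of these labels is directional: a forward entailment (⊏) licenses the inference she is the CEO → she is the boss but blocks the reverse, which an undirected "good paraphrase" score cannot express.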

3 Recognizing stylistic shifts in paraphrases

Language often carries more meaning than is captured by its literal semantic content. Even when two paraphrases are truly equivalent in terms of their semantic denotation, there is still substantial room for variation. For example, the word brain has arguably the same meaning as the word noggin in terms of what it denotes, but the words differ substantially in terms of their formality. Such stylistic differences provide insight into the relationship between the speaker and the audience, the level of fact versus opinion expressed, and the use of literal versus figurative language (Hovy, 1987; Endrass et al., 2011). Understanding pragmatic dimensions in language is necessary for search personalization, as well as for tasks like information extraction and question answering, which often assume text consists of factual statements from impartial authors (Trummer et al., 2015). Pragmatic competence is also central in language generation: a machine translation system cannot output noggin instead of brain in the context the cancer has spread to his brain.

agreed → great → sure → yeah
assumes → implies → imagine → guess
currently → today → now → nowadays
most beautiful → very nice → really nice → really pretty
following a → in the aftermath → in the wake → right after
the man who → one who → the one that → the guy that

Table 3: Groups of paraphrases ordered from formal (left) to casual (right).

I have worked on detecting stylistic differences at the word and phrase level, as well as at the sentence level and document level. In Pavlick and Nenkova (2015), we proposed a lightweight, semi-supervised method for placing sets of paraphrases along stylistic dimensions, such as formality and complexity (Table 3). Our automatically induced style dimensions neared human upper bounds for determining the relative formality and complexity of words and phrases. We were further able to apply our method to characterize larger units of text such as sentences and documents, again producing near-human accuracies and providing compelling evidence for the importance of style, not just content, in the characterization of text genres. In Pavlick and Tetreault (Provisionally accepted), we extended this idea further by building a statistical model for characterizing the formality of language at the sentence level. In our work, we presented a rich analysis of the language variants that characterize formality, which included both the choice of stylistically revealing paraphrases as well as differences in grammar and deixis. We applied our model to a deep linguistic analysis of formality in online discussions, highlighting the importance of pragmatic competence for language generation and dialogue systems.
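A seed-based style dimension of this kind can be sketched as follows. The 2-d "embeddings" and seed lists below are invented toy values chosen so that formal words cluster together; the method in Pavlick and Nenkova (2015) works over real distributional vectors, so this is only a schematic illustration of the idea.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy 2-d "embeddings"; values are made up so formal words point one way
# and casual words another.
EMB = {
    "agreed":    [1.0, 0.2],
    "currently": [0.9, 0.3],
    "yeah":      [0.1, 1.0],
    "nowadays":  [0.2, 0.9],
    "sure":      [0.6, 0.6],
}
FORMAL_SEEDS = ["agreed", "currently"]
CASUAL_SEEDS = ["yeah", "nowadays"]

def formality(word):
    """Semi-supervised style score: mean similarity to formal seeds
    minus mean similarity to casual seeds (positive = more formal)."""
    f = sum(cosine(EMB[word], EMB[s]) for s in FORMAL_SEEDS) / len(FORMAL_SEEDS)
    c = sum(cosine(EMB[word], EMB[s]) for s in CASUAL_SEEDS) / len(CASUAL_SEEDS)
    return f - c

for w in ["agreed", "sure", "yeah"]:
    print(w, round(formality(w), 2))
```

On these toy vectors the induced scores recover the Table 3 ordering agreed > sure > yeah, which is all the supervision the method needs: a handful of seed words rather than labeled data for every phrase.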

4 Future work

In the final years of my PhD, I will continue to push the state of the art in paraphrasing and to refine the definition of paraphrases as meaning “approximately” the same thing in “many” contexts. By making the similarities, and more importantly the differences, between paraphrases explicit, we can provide computers with deeper representations of human language, bringing them ever closer to full natural language understanding. I will investigate how a paraphrase’s substitutability and entailment properties depend on linguistic context, building models that are robust and adaptable to the variable and ever-evolving language seen online today.

    Phenomenon              Relation            True                                     False
a)  Word sense              head == leader      The head of the team.                    Use your head.
b)  Multiword expression    grab == seize       He grabbed the suitcase.                 Let’s grab a drink.
c)  Selectional preference  gave == offered     She gave advice.                         She gave a talk.
d)  Informal language       king != queen       I spoke to the king of England.          You’ll live like a king.
e)  Informal language       million != billion  Google acquires Waze for $966 million.   There are a million ways to do it.
f)  Informal language       longest == argmax   The Nile is the world’s longest river.   Ugh LOTR is the longest movie!

Table 4: Context-specific factors affect the validity of paraphrases and semantic entailments.

Context dependent paraphrasing. Context has an enormous effect on when and where a paraphrase can be applied. Word sense ambiguity (Table 4a) poses a great challenge to question answering and information retrieval systems that must determine whether or not a document or a sentence answers a user’s question. Even if a paraphrase is semantically correct in terms of word sense, it might violate linguistic norms or sound unnatural (Table 4b-c), important considerations when generating natural language, e.g. in machine translation and dialogue systems. In my future work, I will develop context-specific paraphrase models which determine whether a paraphrase pair is good in the specific context of interest, rather than simply in “some” context. I will predict whether the paraphrase preserves the semantic meaning of the original phrase, as well as whether the substitution results in a natural, fluent sentence. This context-sensitive paraphrase model will provide invaluable information for both natural language inference and language generation systems.

Entailment in informal language. Most work on natural language inference assumes that humans speak in clear, factual statements that can be understood using a combination of logical reasoning and knowledge bases of facts. In reality, humans are highly flexible in their use of language. Many of the formal taxonomic relations found in resources like Freebase and Google’s Knowledge Graph must be adapted when applied to vernacular language (Feng et al., 2015), and words which are conventionally non-entailing can act as synonyms when used informally or non-literally (Table 4d-f). While humans have no difficulty differentiating formal language, in which strict logical reasoning is applicable, from informal language, in which softer semantic reasoning is necessary, computers are not yet capable of handling such variability. I will extend my context-specific paraphrasing models to handle noisy and informal language use, with a specific focus on how entailment changes based on stylistic context. Such models are essential for understanding and performing inference in the context of rapidly-growing volumes of user-generated text appearing in social media, blogs, and discussion forums.

Language generation with pragmatic constraints. Ultimately, computers should not just process language, they should converse with humans in a personal and natural way. Can machine translation systems recognize the style of their input and preserve this style in their output? Can dialogue systems adapt their language to their users? Can computers automatically rewrite a document to fit the preferences of a reader (e.g. by making it simpler, or more formal)? I plan to explore the use of machine translation models for generating paraphrases which meet such pragmatic requirements. I will explore ways in which computers can not just recognize stylistic variation in human language, but can adapt dynamically in response to it.
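One simple way to operationalize the word-sense problem in Table 4a is to gate each paraphrase rule on context words. This sketch is purely illustrative: the signature sets and the rule format are invented, and the context-specific models I describe above would learn such conditions statistically rather than from hand-written lists.

```python
# Toy context filter for the word-sense problem: only apply head -> leader
# when the surrounding sentence looks like the organizational sense.
# The signature word lists are invented for illustration.
SENSE_SIGNATURES = {
    ("head", "leader"): {"company", "team", "department", "organization"},
    ("grab", "seize"):  {"suitcase", "weapon", "territory"},
}

def substitutable(source, target, sentence):
    """Apply the paraphrase rule (source -> target) only when the
    sentence shares a context word with the rule's sense signature."""
    context = set(sentence.lower().replace(".", "").split())
    signature = SENSE_SIGNATURES.get((source, target), set())
    return source in context and bool(context & signature)

print(substitutable("head", "leader", "The head of the team."))  # True
print(substitutable("head", "leader", "Use your head."))         # False
```

A learned version of this check, covering all of the phenomena in Table 4 rather than word sense alone, is the kind of context-sensitive paraphrase model this section proposes.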

References

Regina Barzilay. 2003. Information fusion for multidocument summarization: paraphrasing and generation. Ph.D. thesis, Columbia University.

Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 601–610. ACM.

Birgit Endrass, Matthias Rehm, and Elisabeth André. 2011. Planning small talk behavior with cultural influences for multiagent systems. Computer Speech & Language, 25(2):158–174.

Song Feng, Sujith Ravi, Ravi Kumar, Polina Kuznetsova, Wei Liu, Alexander C. Berg, Tamara L. Berg, and Yejin Choi. 2015. Refer-to-as relations as semantic knowledge. In AAAI Conference on Artificial Intelligence.

Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. In Proceedings of NAACL-HLT, pages 758–764, Atlanta, Georgia, June. Association for Computational Linguistics.

Eduard Hovy. 1987. Generating natural language under pragmatic constraints. Journal of Pragmatics, 11(6):689–719.

Bill MacCartney. 2009. Natural language inference. Ph.D. thesis, Stanford University.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Ellie Pavlick and Ani Nenkova. 2015. Inducing lexical style properties for paraphrase and genre differentiation. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 218–224, Denver, Colorado, May–June. Association for Computational Linguistics.

Ellie Pavlick and Joel Tetreault. Provisionally accepted. An empirical analysis of formality in online communication. In Transactions of the Association for Computational Linguistics.

Ellie Pavlick, Johan Bos, Malvina Nissim, Charley Beller, Benjamin Van Durme, and Chris Callison-Burch. 2015a. Adding semantics to data-driven paraphrasing. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1512–1522, Beijing, China, July. Association for Computational Linguistics.

Ellie Pavlick, Juri Ganitkevitch, Tsz Ping Chan, Xuchen Yao, Benjamin Van Durme, and Chris Callison-Burch. 2015b. Domain-specific paraphrase extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 57–62, Beijing, China, July. Association for Computational Linguistics.

Ellie Pavlick, Pushpendre Rastogi, Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2015c. PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 425–430, Beijing, China, July. Association for Computational Linguistics.

Ellie Pavlick, Travis Wolfe, Pushpendre Rastogi, Chris Callison-Burch, Mark Dredze, and Benjamin Van Durme. 2015d. FrameNet+: Fast paraphrastic tripling of FrameNet. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 408–413, Beijing, China, July. Association for Computational Linguistics.

Sara Rosenthal, Preslav Nakov, Alan Ritter, and Veselin Stoyanov. 2014. SemEval-2014 Task 9: Sentiment analysis in Twitter. In Proceedings of SemEval, pages 73–80.

Amit Singhal. 2012. Introducing the Knowledge Graph: things, not strings. Official Google Blog, May.

Immanuel Trummer, Alon Halevy, Hongrae Lee, Sunita Sarawagi, and Rahul Gupta. 2015. Mining subjective properties on the web. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1745–1760. ACM.