Applying Semantic Web Techniques to Poem Analysis

Proceedings of the 21st International Conference on Automation & Computing, University of Strathclyde, Glasgow, UK, 11-12 September 2015
Author: Hugo Hunter

Xuan Wang and Hongji Yang
Center for Creative Computing, Bath Spa University, United Kingdom

Abstract: Computing technology has attracted attention in many fields. Creative Computing addresses the challenges of reconciling the objective precision of computer science with the subjective ambiguities of the arts and humanities. Poetry is a creative form of language that is full of imagination and beauty, and its ambiguity makes it harder to interpret and appreciate. Based on Semantic Web techniques, the research intends to comprehensively analyse poetic elements such as syntax, style, metaphor and genre. The creativity and innovation of poetry will also be considered. Based on the analysis results, the poem is then assessed. The method aims to help promote the interdisciplinary study of language and contribute to the analysis and appreciation of verbal art.

Keywords: Poetry Analysis; Semantic Web; Ontology; Creative Computing; E-Assessment

I. INTRODUCTION

With the rapid development of the computer industry, more and more people are paying attention to combining computer science with the arts and humanities. Since the human mind creates beautiful literature, it is a great challenge for a computer to understand and analyse these achievements. As one of the most significant forms of literature, the poem is a research object of exceptional value and cultural meaning. The following sections review previous computational analysis work on poetry, and then introduce our research, which applies Semantic Web techniques to analysing poems.

II. ACADEMIC BACKGROUND

A. Previous Research on Poem Analysis

The computational analysis of poems dates back to the 1940s, when the poet and literary critic Josephine Miles began her extensive work analysing the surface statistics of poetry across time [1, 2]. While Miles's work was influential in establishing a statistical framework for thinking about poetry, it was done largely by hand and was thus limited in scope and size.

More recently, Hayward's connectionist model of poetic meter incorporated more sophisticated and varied features in the analysis [3]. For every feature considered, including prosody, meter, and syntax, Hayward hand-assigned numeric scores to each syllable in ten samples of poetry. Analysing these scores allowed him to identify unique patterns for each poet and to note similarities within each period. However, this analysis required Hayward's personal assessment of the poems as well as his assignment of feature scores. Since it is infeasible to apply this method to a large set of poems, Hayward's model is also limited in size.

One of the most thorough and sophisticated computational analyses of poems to date is Kaplan and Blei's PoetryAnalyzer, which supports a visualised comparison of style in American poetry [4]. Modern statistical and computational tools allowed the authors to integrate more features and to analyse a large set of poems in an automated manner. The authors mapped poems from different poets and eras into a vector space based on three types of stylistic elements (orthographic, syntactic, and phonemic) in order to find stylistic similarities among poems.

Other research on the analysis of poetry has focused on quantifying poetic devices such as rhyme and meter [5, 6], or on classifying poems by poet and style [7, 8]. These studies show that computational methods can reveal interesting statistical properties of poetic language that allow us to better understand and categorise great works of literature. However, there has been very little work on assessing poetry, and a scientific, systematic poetry evaluation system still needs to be established. In this paper, the research aims to use Semantic Web techniques to analyse and assess poems, with a particular focus on creativity.

B. Meaning of Semantic Web

The term “Semantic Web” was presented by the World Wide Web Consortium. It aims at converting the current web, dominated by unstructured and semi-structured documents, into a “web of data” [9]. The emergence of the Semantic Web holds the promise of bringing the web to a new level, with technologies that can import and channel data at the most granular level and so open a new frontier in data linking. The Semantic Web is an evolution and extension of the existing Web that allows computers to manipulate data and information.
There is great appeal in a Web that has the potential to “know” and “understand” data, and therefore to process it better. This adds a more humanistic quality to standard data processing, because the Semantic Web seeks to close the gap between merely providing documents to people and automatically processing data and information.

The new technology has several advantages. Firstly, the Semantic Web helps improve productivity and efficiency in disseminating data and information to people. Web search can become more accurate and precise, removing much of the ambiguity that arises with current search engines. This follows from the idea that semantics allow any existing knowledge-representation system to be exported onto the Web. Many organisations can use this in daily business functions to greatly speed up communication and information sharing. It can likewise provide a better communication platform for analysing poems, which makes collaborative work more effective.

Another advantage of the Semantic Web is automation with minimal human intervention. Machines become capable of processing and “understanding” the data that they merely display at present. Owing to this advantage, we can use Semantic Web techniques to make computers “understand” poems, which can facilitate more precise and comprehensive poem analysis.


The Semantic Web consists primarily of three technical standards:

RDF (Resource Description Framework): The data modelling language for the Semantic Web. All Semantic Web information is stored and represented in RDF.

SPARQL (SPARQL Protocol and RDF Query Language): The query language of the Semantic Web. It is specifically designed to query data across various systems.

OWL (Web Ontology Language): The schema language, or knowledge representation language, of the Semantic Web. OWL enables people to define concepts so that these concepts can be reused as much and as often as possible.

In the Semantic Web, an ontology formally represents knowledge as a set of concepts within a domain and the relationships between pairs of concepts. Some researchers have constructed ontologies about poetry, such as the Tang ontology and the Su Shi ontology, which are based on 300 Tang poems and on poems by Su Shi [10]. Yao designed a Chinese Ancient Poetry Ontology [11], and Weng et al. presented a Poetry Learning Environment using the Learning Context Ontology, Poetry Noun Ontology and Music Emotion Ontology [12]. Current poetry ontologies in the literature are thus limited to Chinese poetry, and they are not comprehensive enough. The research intends to design ontologies of poetic elements such as syntax, style, metaphor and genre. Moreover, it will focus on analysing the creativity aspect.

III. SYSTEM DESIGN

In order to evaluate poems systematically, a poetry analysis system is being developed using Semantic Web techniques and supporting tools. Building on existing poem resources, the system adds ontologies and a processing module, which can analyse poems from different aspects.

A. System Architecture

The system has three layers: the User Interface Layer, the Data Processing Layer and the Data Storage Layer, as shown in Fig. 1.

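[Figure 1: System Architecture. User Interface Layer: Input, Output. Data Processing Layer: Ontologies Module (Poetry, Creativity, Syntax, Style, Metaphor and Genre Ontologies), Rules, Processing Module. Data Storage Layer: Database of Poems (Poems, Poet Information, Social Background, Language Information).]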

1) User Interface Layer

The User Interface Layer is the entrance to the system and communicates with the users. It displays information and provides the interface through which users access the system. Poems are input here and passed to the Data Processing Layer, which returns the analysis results to the output interface after processing.

2) Data Processing Layer

The Data Processing Layer receives requests from the User Interface Layer, processes the data and returns the analysis results to the client. It includes two modules: the Ontologies module and the Processing module.

a) Ontologies Module: There are six ontologies in this module: Poetry Ontology, Creativity Ontology, Syntax Ontology, Style Ontology, Metaphor Ontology and Genre Ontology. The assessment elements are shown in Fig. 2.

Poetry Ontology: an ontology of poetry aspects, such as categories, diction, sound patterns, rhyme, meter and stanza, as shown in Fig. 3.

Syntax Ontology: an ontology of poem syntax, such as line break, rhythm, repetition and rhyme.

Style Ontology: an ontology of poem diction, such as formal, stately, noble or informal.

Metaphor Ontology: an ontology of metaphor. Many abstract and common concepts can be embodied or evoked by surprising metaphors. The analysis will focus on negative and positive emotion.

Genre Ontology: an ontology of poem genres, such as lyric, dramatic, free verse and narrative.

Creativity Ontology: an ontology of the characteristics of creativity, such as new and useful. Creativity is a much-used term and has become a hot topic. However, what is creativity, and how should it be defined? In the Eighth Oxford English Dictionary, “creativity” is defined as the use of imagination or original ideas to create something. In a 2003 summary of scientific research into creativity, Michael Mumford suggested, “Over the course of the last decade, however, we seem to have reached a general agreement that creativity involves the production of novel, useful products” [13]. Accordingly, “New” and “Useful” will be considered as the creative factors of poems.

Based on these ontologies, the system will also analyse three other aspects of poetry:

Timeliness: According to the social background and language information, if the theme or diction was opportune at the time of writing, the poem can be considered timely.

Purpose: A poem must have a purpose. By analysing a number of poems, the research identified four types of poetic purpose: Expressing Emotion, Criticising, Telling Ambition and Appealing. Expressing Emotion concerns the sentiment of a poem, either positive or negative. Criticising concerns what a poem criticises and the reasons behind the criticism. Telling Ambition concerns the kind of ambition a poem expresses and the circumstances that trigger it. Appealing concerns what a poem appeals for and the historical events associated with it.

Easy to deploy: Various forms of art, including poetry, essay, novel, music, dance, drama and film, are not independent but interrelated. Through editing, material in one form can be transformed into another, which is one of the fascinating features of art. For example, the famous tragedy Romeo and Juliet, written by Shakespeare early in his career, has often been adapted into opera. Therefore, the assessment should consider whether a poem is easy to deploy into other forms of art.

b) Processing Module: The Processing module is responsible for processing data and returning results to the User Interface Layer. The ontology data are processed on the basis of the original database of poems.

3) Data Storage Layer

The Data Storage Layer stores the database of poems, which has four modules: Poems, Poet Information, Social Background and Language Information.

Figure 2: Assessment Elements
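To make the ontology idea above more concrete, the following is a minimal sketch of how a fragment of the Genre Ontology could be held as subject-predicate-object triples and queried by pattern, in the spirit of RDF and SPARQL. It uses plain Python rather than Jena, and every identifier in it (such as `poetry:Genre`, `poetry:writtenBy` and the `match` helper) is a hypothetical name chosen for illustration, not a term taken from the system described here.

```python
# Subject-predicate-object triples in the spirit of RDF.
# Every identifier below ("poetry:Genre", "poetry:writtenBy", ...) is a
# hypothetical name chosen for this sketch, not a term from the paper's ontologies.
TRIPLES = {
    ("genre:Lyric", "rdf:type", "poetry:Genre"),
    ("genre:Dramatic", "rdf:type", "poetry:Genre"),
    ("genre:FreeVerse", "rdf:type", "poetry:Genre"),
    ("genre:Narrative", "rdf:type", "poetry:Genre"),
    ("poem:Dreams", "rdf:type", "poetry:Poem"),
    ("poem:Dreams", "poetry:writtenBy", "poet:LangstonHughes"),
}


def match(pattern, triples=TRIPLES):
    """Return the triples that fit a pattern; None is a wildcard, like a SPARQL variable."""
    return sorted(
        t for t in triples
        if all(want is None or want == got for want, got in zip(pattern, t))
    )


# Which genres does this ontology fragment define?
genres = [subject for subject, _, _ in match((None, "rdf:type", "poetry:Genre"))]
print(genres)

# What is recorded about the poem "Dreams"?
print(match(("poem:Dreams", None, None)))
```

A real deployment would presumably store such statements in RDF and query them with SPARQL; the wildcard pattern here only imitates that style of lookup.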

B. System Realisation

The realisation of the whole system is complicated and still under development. The key points are building the ontologies, processing them with Jena and reasoning over them with Racer. Some algorithms are also presented.

Taking the creativity analysis of metaphor as an example, the algorithm comprises three phases. Firstly, the plain text of the poem is read from user input. It is passed to a word processor which distinguishes and marks each word with a colour, e.g. red for adjectives, blue for nouns, green for verbs, etc. Since only ‘adj+n’ phrases are of interest in terms of metaphor, only red words concatenated with a blue word are extracted and stored in a variable. The extraction relies on a localisation function which selects phrases from the text. Then the adjective parts and noun parts of these phrases are grouped into separate arrays, and the number of nouns concerned is obtained with the size function. Lastly, a loop runs n_number times, each iteration checking whether the adjective used to describe the noun is included in the common metaphor database for that noun. If it is not, the use is counted as exceptional and a credit is added to k. The creativity rate of metaphor is k divided by n_number. The pseudo code of this part is shown below:

Program metaphor_analysis
  input_text = pscanf(‘poem’);
  WORD_PROCESSOR(input_text);
  adj_n_array = LOCATE(‘RED+BLUE’ or ‘RED+RED+BLUE’ or ‘RED+RED+RED+BLUE’);
  adj_array = LOCATE(‘RED’ or ‘RED+RED’ or ‘RED+RED+RED’, adj_n_array);
  n_array = LOCATE(‘BLUE’, adj_n_array);
  n_number = SIZE(n_array);
  k = 0;
  For i = 1 to n_number do
  { database = ADJ_COMMON(n_array( i ));
    if adj_array( i ) ∈ database
      (no credit)
    else
      k = k + 1
    end if };
  creativity_rate = k / n_number;

Figure 3: Ontology of Poetry
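The metaphor algorithm above can be sketched in runnable form as follows. This is an illustration under stated assumptions, not the system's implementation: the part-of-speech tagger and the common-metaphor database are replaced by tiny hand-made tables (`ADJECTIVES`, `NOUNS`, `COMMON_ADJECTIVES`), all of which are hypothetical, and a phrase with several adjectives is assumed to earn a credit if any one of them is uncommon for the noun.

```python
import re

# Hypothetical stand-ins: the paper names neither its tagger nor the contents
# of its common-metaphor database, so both are tiny hand-made tables here.
ADJECTIVES = {"broken-winged", "barren"}
NOUNS = {"bird", "field", "dreams", "life", "snow"}
COMMON_ADJECTIVES = {"field": {"barren", "green"}, "bird": {"little", "free"}}

POEM = """Hold fast to dreams For if dreams die
Life is a broken-winged bird That cannot fly.
Hold fast to dreams For when dreams go
Life is a barren field Frozen with snow."""


def adjective_noun_phrases(text):
    """Find runs of one to three adjectives immediately followed by a noun."""
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    phrases, run = [], []
    for word in words:
        if word in ADJECTIVES:
            run.append(word)
            continue
        if word in NOUNS and 1 <= len(run) <= 3:
            phrases.append((tuple(run), word))
        run = []  # any non-adjective ends the current run
    return phrases


def creativity_rate(text):
    """Share of adjective+noun phrases whose adjectives are not common for the noun."""
    phrases = adjective_noun_phrases(text)
    if not phrases:
        return 0.0  # avoid dividing by zero when no phrase is found
    k = 0
    for adjectives, noun in phrases:
        common = COMMON_ADJECTIVES.get(noun, set())
        # Assumption: one uncommon adjective is enough to earn a credit.
        if any(adj not in common for adj in adjectives):
            k += 1
    return k / len(phrases)


print(adjective_noun_phrases(POEM))
print(creativity_rate(POEM))  # 0.5 with the tables above
```

With these tables, ‘barren field’ is treated as a common pairing and ‘broken-winged bird’ as an uncommon one, so one credit out of two phrases gives 0.5. The result depends entirely on what the database lists as common, which is the part a real system would have to build from data.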

Another algorithm is applied for style analysis, in five steps. Firstly, the plain text of the poem is read from user input. It is passed to a word root program which extracts the roots of words, e.g. amen from amenable, break from broken, compete from competition, etc. Then the roots with sentimental colour are selected and stored in style_root_array. The number of roots, n_length, is obtained with the size function. A zero-valued accumulate_style vector is created with one entry per style category (five here). Then each root is evaluated to derive a style_vector whose values are the root's marks in each style category. For example, ‘die’ is marked as (0,0,5,0,1), with integer values from 0 to 5. Each component of the vector stands for one style category of interest, such as formal, informal, sadness, delightful or ironic. A root scores up to 5 if it shows that style strongly, and 0 if it is neutral in terms of that style category. These style vectors are summed into accumulate_style in a loop. Finally, an average style evaluation is derived by dividing the accumulated vector by the number of roots, which reduces the influence of differing poem lengths. The VALUE function is designed to estimate the overall contribution of all significant roots to the stylistic characteristics of the poem. The pseudo code of this part is shown below:

Program style_analysis
  input_text = pscanf(‘poem’);
  root_text = WORD_ROOT(input_text);
  style_root_array = SELECT_STYLE_ROOT(root_text);
  n_length = SIZE(style_root_array);
  accumulate_style = ZEROS(5);
  For i = 1 to n_length do
  { style_vector = ROOT_EVALUATE(style_root_array( i ));
    accumulate_style = accumulate_style + style_vector };
  STYLE = accumulate_style / n_length;
  VALUE(STYLE);

C. Example

As an example, the poem below will be analysed.

Dreams

Hold fast to dreams
For if dreams die
Life is a broken-winged bird
That cannot fly.

Hold fast to dreams
For when dreams go
Life is a barren field
Frozen with snow.

Langston Hughes

The metaphor analyser reads through the text, marking adjectives in red and nouns in blue. The LOCATE function focuses on the adjective-concatenate-noun phrases. In the given poem, ‘broken-winged bird’ and ‘barren field’ are the only two valid phrases, so the total number of core nouns equals 2. After the common adjective matching process, ‘barren’ is judged to be a usual metaphor for ‘field’. However, ‘broken-winged’ is an uncommon, vivid description of ‘bird’, which adds 1 to the creativity count k. Hence, the creativity rate is 1/2 = 0.5.

Meanwhile, the style analyser first rewrites the poem in root words, which reads:

Dream
Hold fast to dream
For if dream die
Life be a break wing bird
That cannot fly.
Hold fast to dream
For when dream go
Life be a barren field
Freeze with snow.

‘Dream’, ‘dream’, ‘die’, ‘break’, ‘dream’, ‘dream’, ‘barren’, ‘freeze’ and ‘snow’ draw our attention in sequence and are extracted to form an array. The style value vector of each root is then determined and accumulated in terms of formality, informality, sadness, delightfulness and irony. The values of the vectors are shown in Table 1. The style-characteristic vector is the accumulated style vector divided by the length of the style root array, and the VALUE function interprets this vector to determine the style of the poem. With nine roots and accumulated values (0, 0, 12, 4, 3), the style value of this poem is (0, 0, 1.33, 0.44, 0.33). Since both formality and informality score zero, the poem cannot be characterised in terms of formality. Sadness scores significantly higher than delightfulness, which suggests that the poem is markedly sad. As for irony, 0.33 is below the threshold value, so there is not enough evidence to regard the poem as ironic.

TABLE 1: VALUES OF ROOT WORDS

Style            dream   die   break   barren   freeze   snow   Total
Formality          0      0      0       0        0       0       0
Informality        0      0      0       0        0       0       0
Sadness            0      5      2       3        1       1      12
Delightfulness     1      0      0       0        0       0       4
Irony              0      1      0       2        0       0       3
Times              4      1      1       1        1       1
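The style computation for this example can be reproduced with a short sketch. The per-root scores are copied from Table 1; the function names, the `interpret` stand-in for VALUE and its threshold of 1.0 are assumptions made for illustration, since the threshold actually used is not stated.

```python
CATEGORIES = ("formality", "informality", "sadness", "delightfulness", "irony")

# Per-occurrence scores (0 to 5) copied from Table 1.
ROOT_SCORES = {
    "dream": (0, 0, 0, 1, 0),
    "die": (0, 0, 5, 0, 1),
    "break": (0, 0, 2, 0, 0),
    "barren": (0, 0, 3, 0, 2),
    "freeze": (0, 0, 1, 0, 0),
    "snow": (0, 0, 1, 0, 0),
}

# The nine style roots extracted from the example poem, in order.
ROOTS = ["dream", "dream", "die", "break", "dream", "dream", "barren", "freeze", "snow"]


def style_vector(roots):
    """Average the category scores over all scored roots."""
    scored = [ROOT_SCORES[r] for r in roots if r in ROOT_SCORES]
    if not scored:
        return [0.0] * len(CATEGORIES)
    totals = [sum(column) for column in zip(*scored)]  # accumulate per category
    return [total / len(scored) for total in totals]


def interpret(style, threshold=1.0):
    """Hypothetical stand-in for VALUE: categories scoring at least the threshold."""
    return [name for name, score in zip(CATEGORIES, style) if score >= threshold]


style = style_vector(ROOTS)
print([round(v, 2) for v in style])  # [0.0, 0.0, 1.33, 0.44, 0.33]
print(interpret(style))              # ['sadness'] under the assumed threshold
```

Dividing by the number of roots (nine here) is what keeps a long poem from scoring higher than a short one simply because it contains more words.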

IV. CONCLUSIONS

The research presented in this paper aims to analyse the creativity of poetry based on the elements of syntax, style, metaphor and genre by using Semantic Web techniques. A three-layer system is proposed, based on ontologies of the analysis elements and the relevant algorithms. The algorithms for the creativity analysis of metaphor and for style analysis have been presented in detail. Moreover, an example poem has been analysed: the creativity rate of its metaphors has been derived and its style has been determined. Besides metaphor and style, the system also investigates other aspects, including syntax, genre and purpose, for which trained databases are developed based on statistical analysis of many kinds of poems.

Given the great potential of pervasive computing in every field, more and more computing-driven innovations are demanded by an ever more sophisticated society. The research aims to use the power of computing to analyse poems. However, far more work remains to be done. Analysing poems will yield insights into the decisive elements and influencing factors in building creative poems. On that basis, the system makes further innovation in generating creative poems through computing possible. Hence, one promising direction for future work is using computing to generate poems, both traditional and inventive. The former is about constructing poems in traditional formats with new content, such as creative word choices. The latter is about composing poems with entirely new formats and content, which demands more creativity from computing.

REFERENCES

[1] J. Miles, “Major Adjectives in English Poetry: From Wyatt to Auden”, University of California Publications in English, vol. 12, no. 3, 1946, pp. 305-426.
[2] J. Miles, Style and Proportion: The Language of Prose and Poetry, Brown and Co, Boston, 1967.
[3] M. Hayward, “Analysis of a Corpus of Poetry by a Connectionist Model of Poetic Meter”, Poetics, vol. 24, 1996, pp. 1-11.
[4] D. Kaplan and D. Blei, “A Computational Approach to Style in American Poetry”, 7th IEEE International Conference on Data Mining, Omaha, USA, 2007, pp. 553-558.
[5] D. Genzel, et al., “Poetic Statistical Machine Translation: Rhyme and Meter”, EMNLP Conference on Empirical Methods in Natural Language Processing, SIGDAT, Massachusetts, USA, 2010, pp. 158-166.
[6] E. Greene, et al., “Automatic Analysis of Rhythmic Poetry with Applications to Generation and Translation”, EMNLP Conference on Empirical Methods in Natural Language Processing, SIGDAT, Massachusetts, USA, 2010, pp. 524-533.
[7] Z. He, et al., “SVM-based Classification Method for Poetry Style”, IEEE International Conference on Machine Learning and Cybernetics, Hong Kong, China, 2007, pp. 2936-2940.
[8] A. C. Fang, et al., “Adapting NLP and Corpus Analysis Techniques to Structured Imagery Analysis in Classical Chinese Poetry”, AdaptLRTtoND Workshop on Adaptation of Language Resources and Technology to New Domains, FLaReNet Project, Stroudsburg, PA, USA, 2009, pp. 27-34.
[9] T. Berners-Lee, et al., “The Semantic Web”, Scientific American Magazine, Macmillan Publisher Ltd, May 2001, pp. 29-37.
[10] C. Huang, et al., “Reconstructing the Ontology of the Tang Dynasty: A Pilot Study of the Shakespearean-garden Approach”, 18th Pacific Asia Conference on Language, Information and Computation, PACLIC Steering Committee, Waseda University, Tokyo, 2004.
[11] R. Yao and J. Zhang, “Design and Implementation of Chinese Ancient Poetry Learning System Based on Domain Ontology”, IEEE International Conference on e-Education, e-Business, e-Management and e-Learning, Sanya, China, 2010, pp. 460-463.
[12] J. Weng, et al., “Constructing an Immersive Poetry Learning Multimedia Environment using Ontology-based Approach”, IEEE International Conference on Ubi-Media Computing, Lanzhou, China, 2008, pp. 308-313.
[13] M. D. Mumford, “Where Have We Been, Where are We Going? Taking Stock in Creativity Research”, Creativity Research Journal, Routledge Company, vol. 15, 2003, pp. 107-120.
[14] D. Rubin, Memory in Oral Traditions: The Cognitive Psychology of Epic, Ballads, and Counting-out Rhymes, New York: Oxford University Press, 1995.
[15] R. Lea, et al., “Sweet Silent Thought: Alliteration and Resonance in Poetry Comprehension”, Psychological Science, SAGE Publications, vol. 19, 2008.