Using the Dempster-Shafer Theory of Evidence to Rank Documents *

TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 03/18 pp241-247 Volume 17, Number 3, June 2012

Jiuling Zhang**, Beixing Deng, Xing Li

Department of Electronic Engineering, Tsinghua University, Beijing 100084, China

Abstract: Multi-source information can be utilized collaboratively to improve the performance of information retrieval. To make full use of the document and collection information, this paper introduces a new information retrieval model that relies on the Dempster-Shafer theory of evidence. Each query-document pair is taken as a piece of evidence for the relevance between a document and a query. The evidence is combined using Dempster's rule of combination, and the belief committed to the relevance is obtained. Retrieved documents are then ranked according to the belief committed to the relevance. Several basic probability assignments are also proposed. Extensive experiments over the Text REtrieval Conference (TREC) test collection ClueWeb09 show that the proposed model provides performance similar to that of the Vector Space Model (VSM). Under certain probability assignments, the proposed model outperforms the VSM by 63% in terms of mean average precision.

Key words: Dempster-Shafer theory of evidence; basic probability assignment; Dempster's rule of combination

Introduction

Ranking the retrieved documents in Information Retrieval (IR) is important if users' information needs are to be adequately met[1]. Existing information retrieval models that allow ranking include the Vector Space Model (VSM)[2], the probabilistic model[3], and a number of their variants[1,4,5]. However, the best document ranking method in information retrieval has not yet been determined. From a multi-source information collaboration point of view, a variety of research concerning the usage of the Dempster-Shafer theory of evidence in ranking has been conducted[6-8]. In Ref. [6], the Dempster-Shafer theory of evidence is utilized to retrieve documents using noun phrases. In Ref. [7], single terms and term sets are taken as indexing terms, and belief functions

Received: 2012-04-11; revised: 2012-05-02

* Supported by the Self-Directed Program of Tsinghua University (No. 2011Z01033)

** To whom correspondence should be addressed. E-mail: [email protected]; Tel: 86-10-62792516

are used to rank documents. In Ref. [8], the authors combine content and structure information using Dempster's rule of combination. Even different sentences in a document are deemed to be multi-source information and are combined to help ranking[9]. In this work, the granularity of the multi-source evidence is reduced to each query-document pair, rather than term sets, sentences, contents, or structures. Each pair {q_i, t_j} is conceived as a piece of evidence that supports the document being relevant or irrelevant to a query, depending on whether q_i = t_j or q_i ≠ t_j. Meanwhile, the collection information is taken into account. The multi-source evidence is combined to derive the belief committed to the relevance, which is then used to rank the documents.

1 Foundation of the Dempster-Shafer Theory of Evidence

The Dempster-Shafer theory of evidence was developed to model uncertainties[10]. It has the advantage of allocating belief to subsets of the universal set, and a combination rule that is able to combine multi-source evidence. This is an exceptional virtue for making decisions when multi-source information is available.

1.1 Dempster-Shafer theory of evidence

Before modeling the retrieval, several basic and related definitions of the Dempster-Shafer theory of evidence[10] must be given. Let Θ be a frame of discernment and P(Θ) its power set. A function m: P(Θ) → [0,1] is defined as a mass function[10] if it satisfies m(∅) = 0 and

\[ \sum_{X \subseteq \Theta} m(X) = 1 \]

A mass function is a Basic Probability Assignment (BPA) to all subsets X of the frame of discernment Θ. The value m(X) represents the exact amount of belief committed to the proposition represented by the subset X of Θ[11]. A function bel: P(Θ) → [0,1] is called a belief function[10] if it satisfies Eq. (1):

\[ \mathrm{bel}(A) = \sum_{X \subseteq A} m(X) \tag{1} \]

A belief function assigns a measure of total belief to each subset of Θ. In comparison, m(A) is the belief assigned only to the set A[6].

1.2 Dempster's rule of combination

Let m_1, m_2, …, m_n be mass functions over the same frame of discernment Θ that satisfy Eq. (2)[11]:

\[ N_n = \sum_{X_1 \cap X_2 \cap \cdots \cap X_n \neq \varnothing} m_1(X_1) m_2(X_2) \cdots m_n(X_n) \neq 0 \tag{2} \]

Then, a function can be defined by m(∅) = 0 and Eq. (3):

\[ m(X) = \frac{1}{N_n} \sum_{X_1 \cap X_2 \cap \cdots \cap X_n = X} m_1(X_1) m_2(X_2) \cdots m_n(X_n) \tag{3} \]

For all subsets X ≠ ∅ of Θ , the function m(∙) is also a mass function[11]. The Dempster-Shafer theory of evidence is then used to quantify the relevance between a document and a query.
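The combination rule above can be sketched in code for the two-element frame that the retrieval model later uses. This is a minimal illustration, not the paper's implementation; the frozenset representation of subsets and all names are assumptions.

```python
# A minimal sketch of Dempster's rule of combination (Eqs. (2)-(3)) for the
# two-element frame Theta = {R, ~R} used later in the paper. Subsets of the
# frame are represented as frozensets; all names are illustrative.
from itertools import product

R, NOT_R = "R", "~R"
THETA = frozenset({R, NOT_R})

def combine(m1, m2):
    """Combine two mass functions (dicts: frozenset -> mass) by Dempster's rule."""
    raw = {}
    for (x1, v1), (x2, v2) in product(m1.items(), m2.items()):
        inter = x1 & x2
        raw[inter] = raw.get(inter, 0.0) + v1 * v2
    conflict = raw.pop(frozenset(), 0.0)   # mass that fell on the empty set
    norm = 1.0 - conflict                  # plays the role of N_n in Eq. (2)
    if norm <= 0:
        raise ValueError("totally conflicting evidence; combination undefined")
    return {x: v / norm for x, v in raw.items()}

# Two simple support functions: one favors R, the other favors ~R.
m1 = {frozenset({R}): 0.6, THETA: 0.4}
m2 = {frozenset({NOT_R}): 0.3, THETA: 0.7}
m = combine(m1, m2)          # normalized masses on {R}, {~R}, and Theta
```

Repeated application of `combine` is associative and commutative, which is what allows the model below to fold many pieces of evidence in any order.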

2 Retrieval Model Based on the Dempster-Shafer Theory of Evidence

In this section, the information retrieval model based on the Dempster-Shafer theory of evidence is introduced. It includes three subsections: document and query expression, evidence formulation, and the relevance function.

2.1 Document expression

Before presenting the retrieval model based on the Dempster-Shafer theory of evidence, the VSM retrieval model needs to be reviewed. In the VSM, a document is represented by a vector whose elements are the weights of the corresponding terms, as given by Eq. (4):

\[ D = (w_1, w_2, \ldots, w_N) \tag{4} \]
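As a concrete illustration of how this weight vector is typically filled, the sketch below implements TF-IDF weighting with the normalization of Eq. (5) below. The function name, the natural-log IDF, and the square-root normalizer are assumptions about the standard scheme, not details from the paper.

```python
# Hedged sketch of VSM term weighting: TF * IDF with cosine-style
# normalization, assuming IDF = log(C / df) where C is the number of
# documents in the collection and df the document frequency.
import math

def tfidf_weights(term_freqs, doc_freqs, num_docs):
    """Return the normalized weight vector (w_1, ..., w_N) as a dict."""
    raw = {t: tf * math.log(num_docs / doc_freqs[t])
           for t, tf in term_freqs.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: (w / norm if norm else 0.0) for t, w in raw.items()}

w = tfidf_weights({"evidential": 2, "sum": 1},
                  {"evidential": 10, "sum": 500}, 1000)
# Under this normalization the weight vector has unit length, so the rarer,
# more frequent term "evidential" dominates the vector.
```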

Generally, the weight is determined by the Term Frequency (TF) and the Inverse Document Frequency (IDF). The weight w_n is given by Eq. (5):

\[ w_n = \frac{\mathrm{TF}_n \times \mathrm{IDF}_n}{\sqrt{\sum_m (\mathrm{TF}_m \times \mathrm{IDF}_m)^2}} \tag{5} \]

A document can also be represented by sequential indexing terms rather than by a set of terms, which is similar but not identical to the VSM expression of documents. A document can be uniquely denoted by its term sequence. To clarify the procedure of generating index terms from a document, a simple example is given. Let a document contain the following sentence:

This chapter focused on the fundamental operation of evidential reasoning, namely, the orthogonal sum of evidential functions.

It is processed into the word sequence {chapter, focus, fundamental, operation, evidential, reason, orthogonal, sum, evidential, function}. Note that in this sequence the term "evidential" occurs twice, faithfully following the document; this is different from the VSM model. Queries are expressed similarly: they are parsed, stop words are removed, and stemming is performed.

2.2 Evidence formulation

Because the purpose is to retrieve documents and rank them according to their relevance to the query, let the frame of discernment be Θ = {R, R̄}, where R means that the document is relevant to the query and R̄ means that the document is irrelevant to the query. The power set of Θ is P(Θ) = {∅, {R}, {R̄}, Θ}. Let the document have N terms and be expressed as D = (t_1, t_2, …, t_N); similarly, the query has M terms and is expressed as Q = (q_1, q_2, …, q_M).


Each query-document pair is taken as a piece of evidence; in other words, {q_1, t_1} is a piece of evidence, {q_1, t_2} is a piece of evidence, and {q_1, t_3}, …, {q_M, t_N} are all pieces of evidence, indexed by i and j. From each piece of evidence, the extent to which the document is relevant to the query can be determined. For example, given the piece of evidence {q_i, t_j} with q_i = t_j, the document and query are probably relevant. Thus, a basic probability value may be assigned to each of them, yielding M × N mass functions, denoted by m_ij(X). So far, no conclusive determination of the value m_ij(R) has been made. Generally, the existence of q_i = t_j indicates that the document is possibly relevant to the query; the existence of q_i ≠ t_j cannot prove the relevance between a document and a query, and may even argue against it. Different assignments for m_ij(R) are proposed in Section 3 and compared experimentally in Section 4.

2.3 Relevance function

In the previous subsection, M × N mass functions were defined. Here, it is determined how relevant a document is to a query given those mass functions. First, the mass functions are combined under Dempster's rule of combination, as given by Eq. (6):

\[ m(R) = \bigoplus_{i,j} m_{ij}(R) \tag{6} \]

Here m(R) denotes the overall evaluation given to R. Therefore, the total belief committed to R is given by Eq. (7):

\[ \mathrm{bel}(R) = \sum_{X \subseteq R} m(X) \tag{7} \]

bel( R) is the belief committed to the fact that the document and the query are relevant, which is then used to rank the documents.
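The ranking computation of Eqs. (6) and (7) can be sketched end-to-end. Specialized to the frame {R, R̄}, each query-document pair contributes a simple support mass and all M × N pairs are folded with Dempster's rule. The per-pair mass value `a` is left as a free parameter here (the paper's Section 3 proposes concrete choices such as 1/(M×N)); all names are illustrative assumptions.

```python
# Hedged sketch of Sections 2.2-2.3: every (q_i, t_j) pair yields a mass
# triple on ({R}, {~R}, Theta); all triples are folded with Dempster's rule,
# and bel(R) = m({R}) is the ranking score.
from functools import reduce

def combine2(m1, m2):
    """Dempster combination specialized to mass triples (r, rbar, theta)."""
    r  = m1[0]*m2[0] + m1[0]*m2[2] + m1[2]*m2[0]
    rb = m1[1]*m2[1] + m1[1]*m2[2] + m1[2]*m2[1]
    th = m1[2]*m2[2]
    k = r + rb + th                       # 1 minus the conflict mass
    return (r/k, rb/k, th/k)

def bel_R(query, doc, a):
    """bel(R) after combining all M*N pair evidences; a = per-pair mass."""
    masses = [((a, 0.0, 1.0 - a) if q == t else (0.0, 0.0, 1.0))
              for q in query for t in doc]
    r, _, _ = reduce(combine2, masses)
    return r   # for the singleton {R}, bel(R) equals m({R})

score = bel_R(["evidential", "sum"],
              ["orthogonal", "sum", "evidential"], a=0.1)
```

Non-matching pairs contribute the vacuous mass (0, 0, 1), so they leave the combination unchanged, matching the intuition that such pairs carry no positive evidence under this particular choice.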

3 Propositions for Basic Probability Assignments

In the previous section, we proposed to assign the value m_ij(R) to each piece of evidence {q_i, t_j}, but the values themselves have not been defined. Several heuristic assignments for m_ij(R) are proposed in this section. The basic principle for choosing an assignment is that the appearance of a query-document pair {q_i, t_j} (q_i = t_j) should support the relevance of the document to the query, while the occurrence of a pair {q_i, t_j} (q_i ≠ t_j) should support the irrelevance of the two, or at least should not give more support to their relevance. If the collection information is taken into account, the degree to which it should be reflected and quantified must also be determined. This section is divided into two parts: in the first part, only the term's existence and quantity information is considered; in the second part, collection information is incorporated.

3.1 Using only intra-document term information

In this part, only the intra-document information, narrowed down to the term's existence and quantity information, is considered. Two basic probability assignments are proposed.

3.1.1 Assignment 1: Using only positive evidence (DST-BPA1)

In this assignment, the frame of discernment is defined as Θ = {R}. Let the mass value assigned to a pair {q_i, t_j} be given by Eqs. (8) and (9):

\[ m_{ij}(R) = \begin{cases} \dfrac{1}{M \times N}, & \text{if } q_i = t_j; \\ 0, & \text{if } q_i \neq t_j \end{cases} \tag{8} \]

\[ m_{ij}(\Theta) = \begin{cases} 1 - \dfrac{1}{M \times N}, & \text{if } q_i = t_j; \\ 1, & \text{if } q_i \neq t_j \end{cases} \tag{9} \]

In this assignment, only the indexing terms that are the same as the query terms are considered and taken as pieces of evidence; terms having no relation to the query terms are omitted. Assume there are n_i indexing terms satisfying q_i = t_j for query term q_i. Then, the combination of these M × N mass functions is easily obtained:

\[ m(R) = \bigoplus_{i,j} m_{ij}(R) = 1 - \prod_i \left(1 - \frac{1}{M \times N}\right)^{n_i} \tag{10} \]

The belief committed to R can then be obtained by Eq. (11):

\[ \mathrm{bel}(R) = \sum_{X \subseteq R} m(X) = 1 - \prod_i \left(1 - \frac{1}{M \times N}\right)^{n_i} \tag{11} \]

3.1.2 Assignment 2: Using both positive and negative information (DST-BPA2)

In this assignment, the frame of discernment is defined as Θ = {R, R̄}. Assume the mass values assigned to a pair {q_i, t_j} are as follows:

\[ m_{ij}(R) = \begin{cases} \dfrac{1}{M \times N}, & \text{if } q_i = t_j; \\ 0, & \text{if } q_i \neq t_j \end{cases} \tag{12} \]

\[ m_{ij}(\bar{R}) = \begin{cases} 0, & \text{if } q_i = t_j; \\ \dfrac{1}{M \times N}, & \text{if } q_i \neq t_j \end{cases} \tag{13} \]

\[ m_{ij}(\Theta) = 1 - \frac{1}{M \times N} \tag{14} \]

In this assignment, the indexing terms that are the same as the query terms are taken as positive evidence, while the indexing terms that differ from the query terms are deemed negative evidence. Assume that for query term q_i there are n_i indexing terms satisfying q_i = t_j. Then, by combining these M × N mass functions, one easily obtains the intermediate formulas, Eqs. (15)-(17), which are further combined by Eq. (18):

\[ m_i(R) = \frac{\left(1 - \frac{1}{M \times N}\right)^{N - n_i} - \left(1 - \frac{1}{M \times N}\right)^{N}}{\left(1 - \frac{1}{M \times N}\right)^{N - n_i} + \left(1 - \frac{1}{M \times N}\right)^{n_i} - \left(1 - \frac{1}{M \times N}\right)^{N}} \tag{15} \]

\[ m_i(\bar{R}) = \frac{\left(1 - \frac{1}{M \times N}\right)^{n_i} - \left(1 - \frac{1}{M \times N}\right)^{N}}{\left(1 - \frac{1}{M \times N}\right)^{N - n_i} + \left(1 - \frac{1}{M \times N}\right)^{n_i} - \left(1 - \frac{1}{M \times N}\right)^{N}} \tag{16} \]

\[ m_i(\Theta) = \frac{\left(1 - \frac{1}{M \times N}\right)^{N}}{\left(1 - \frac{1}{M \times N}\right)^{N - n_i} + \left(1 - \frac{1}{M \times N}\right)^{n_i} - \left(1 - \frac{1}{M \times N}\right)^{N}} \tag{17} \]

\[ m(R) = m_1(R) \oplus m_2(R) \oplus \cdots \oplus m_M(R) \tag{18} \]
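As a sanity check on these closed forms, the sketch below implements Eq. (11) for DST-BPA1 and Eqs. (15)-(17) for DST-BPA2. Writing a = 1 − 1/(M×N), positive-only evidence combines without conflict into a simple product, while mixed positive and negative evidence yields the normalized triple. All names are illustrative assumptions.

```python
# Sketch of the closed forms for DST-BPA1 and DST-BPA2. With only positive
# simple-support masses there is no conflict, so Eq. (11) reduces to a
# product over query terms. For DST-BPA2, n_i matching pairs support R and
# N - n_i pairs support the complement, giving Eqs. (15)-(17).
def bel_bpa1(query, doc):
    """Eq. (11): bel(R) = 1 - prod_i (1 - 1/(M*N))^{n_i}."""
    M, N = len(query), len(doc)
    prod = 1.0
    for q in query:
        n_i = sum(1 for t in doc if t == q)   # matches for query term q_i
        prod *= (1.0 - 1.0 / (M * N)) ** n_i
    return 1.0 - prod

def bpa2_term_masses(M, N, n_i):
    """Eqs. (15)-(17): combined masses for a single query term q_i."""
    a = 1.0 - 1.0 / (M * N)
    denom = a**(N - n_i) + a**n_i - a**N          # shared normalizer
    m_R     = (a**(N - n_i) - a**N) / denom       # Eq. (15)
    m_notR  = (a**n_i - a**N) / denom             # Eq. (16)
    m_Theta = a**N / denom                        # Eq. (17)
    return m_R, m_notR, m_Theta

b = bel_bpa1(["evidential", "sum"], ["orthogonal", "sum", "evidential"])
r, nr, th = bpa2_term_masses(M=2, N=3, n_i=1)
# The three masses of Eqs. (15)-(17) always sum to 1.
```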

3.2 Using both intra- and inter-document information

In this subsection, both the indexing term's existence information and the collection information are considered, and the following assignments are proposed.

3.2.1 Assignment 3: Using only positive indecomposable intra- and inter-document information (DST-BPA3)

In this assignment, the inter-document information is incorporated by merging it with the term's intra-document information; the two are indecomposable, as given by Eqs. (19) and (20):

\[ m_{ij}(R) = \begin{cases} \dfrac{1}{M \times N} \log_C \left(\dfrac{C}{n_i}\right), & \text{if } q_i = t_j; \\ 0, & \text{if } q_i \neq t_j \end{cases} \tag{19} \]

\[ m_{ij}(\Theta) = \begin{cases} 1 - \dfrac{1}{M \times N} \log_C \left(\dfrac{C}{n_i}\right), & \text{if } q_i = t_j; \\ 1, & \text{if } q_i \neq t_j \end{cases} \tag{20} \]

Here, the collection information is reflected by the factor log_C(C/n_i), which is similar to the IDF measure used in the VSM.

3.2.2 Assignment 4: Using both positive and negative indecomposable intra- and inter-document information (DST-BPA4)

In the previous assignment definition, the existence of t_j satisfying q_i = t_j is used only to compute the belief committed to the relevance between the document and the query. Similarly, one can assign values to the terms that are not the same (q_i ≠ t_j). Assignments based on this idea are given as follows:

\[ m_{ij}(R) = \begin{cases} \dfrac{1}{M \times N} \log_C \left(\dfrac{C}{n_i}\right), & \text{if } q_i = t_j; \\ 0, & \text{if } q_i \neq t_j \end{cases} \tag{21} \]

\[ m_{ij}(\bar{R}) = \begin{cases} 0, & \text{if } q_i = t_j; \\ \dfrac{1}{M \times N} \log_C \left(\dfrac{C}{n_i}\right), & \text{if } q_i \neq t_j \end{cases} \tag{22} \]

\[ m_{ij}(\Theta) = 1 - \frac{1}{M \times N} \log_C \left(\frac{C}{n_i}\right) \tag{23} \]

3.2.3 Assignment 5: Using only positive intra- and inter-document information with Dempster's rule of combination (DST-BPA5)

In this assignment, the intra- and inter-document information is not deemed indecomposable. Rather, the term's collection information is taken as another source of evidence, and the evidence from the different sources is combined by Dempster's rule of combination. Let the assignment for the collection evidence of term i be

\[ m_i^C(R) = \frac{1}{N} \log_C \left(\frac{C}{n_i}\right) \tag{24} \]


For the evidence {q_i, t_j}, the assignment definitions in Eqs. (8) and (9) are employed. Finally, the evidence is combined by Dempster's rule of combination:

\[ m(R) = \bigoplus_{i,j} m_{ij}(R) \oplus \bigoplus_i m_i^C(R) \tag{25} \]
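A hedged sketch of how DST-BPA5 scores a document under Eqs. (8), (24), and (25): because all mass falls on {R} or Θ, the combination is conflict-free and reduces to a product. Whether the collection evidence applies to every query term or only to those occurring in the document is not pinned down by the text; the sketch assumes the latter, and `df` (document frequency) and the function name are illustrative assumptions.

```python
# Sketch of DST-BPA5: positive pair evidence (Eq. (8)) combined with
# per-term collection evidence (Eq. (24)) via Dempster's rule (Eq. (25)).
# C is the collection size; df maps a term to its document frequency.
import math

def bel_bpa5(query, doc, C, df):
    M, N = len(query), len(doc)
    prod = 1.0
    for q in query:
        n_i = sum(1 for t in doc if t == q)
        prod *= (1.0 - 1.0 / (M * N)) ** n_i          # pair evidence, Eq. (8)
        if n_i > 0:
            # collection evidence, Eq. (24): mass (1/N) * log_C(C / df_i);
            # applying it only to matching query terms is an assumption.
            m_c = (1.0 / N) * (math.log(C / df[q]) / math.log(C))
            prod *= 1.0 - m_c
    return 1.0 - prod    # no conflict: only {R} and Theta receive mass

s = bel_bpa5(["a"], ["a", "b"], C=100, df={"a": 10})
```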

3.2.4 Assignment 6: Using both positive and negative intra- and inter-document information with Dempster's rule of combination (DST-BPA6)

In this assignment, in addition to the positive evidence from the previous assignment, the negative evidence and the inter-document information are also combined. The basic probability assignments for the intra-document evidence are given by Eqs. (12)-(14), the assignment for the collection evidence of term t_i is defined by Eq. (24), and they are combined by Dempster's rule of combination as in Eq. (25). After the mass functions are obtained, one can further calculate the associated belief function as in Eq. (7), with which the documents can be ranked.

4 Experiments

In this section, the experimental results of our evidential retrieval model are given. The Category B collection, a subset of the Text REtrieval Conference (TREC) test collection ClueWeb09 with approximately 50 million English Web pages, was used as the test collection. The documents and queries are preprocessed first. For a given document and a given query, each pair {q_i, t_j} is taken as a piece of evidence. Using the values assigned in the previous section and Dempster's rule of combination, the belief committed to the relevance of a document and a query was used to rank the documents.

In Assignments 1 and 2, only the intra-document information, in the form of the term's existence and quantity, was utilized. This is in contrast to the VSM, in which term weights are determined only by term frequencies. The precision-recall curves of these two models are shown in Fig. 1.

Fig. 1 Comparison between the proposed model combining only positive intra-document evidence, the proposed model combining both positive and negative intra-document evidence, and the VSM model using only term frequency information

Figure 1 indicates that the proposed model combining only positive intra-document evidence has a slightly lower precision-recall curve than the simple VSM model that uses only term frequency information, while the proposed model combining both positive and negative intra-document evidence has a similar precision-recall curve. The performance of the Dempster-Shafer Theory (DST) based model under Assignments 1 and 2 is thus only slightly lower than that of the VSM.

In Assignments 3 and 4, the inter-document information is deemed indecomposable from the intra-document information. Figure 2 shows the experimental results of these two assignments compared to those of the VSM with TF-IDF term weights.

Fig. 2 Comparison between the proposed model combining only positive indecomposable intra- and inter-document evidence, the proposed model combining both positive and negative indecomposable intra- and inter-document evidence, and the general VSM model

Figure 2 demonstrates that the retrieval performance of the DST-based model combining only positive indecomposable intra- and inter-document evidence is also only slightly worse than that of the general VSM model. If negative evidence is combined, the performance likewise shows no improvement.

The experimental results of combining intra- and inter-document information by Dempster's rule of combination are shown in Fig. 3 and Fig. 4, corresponding to Assignment 5 and Assignment 6, respectively. These two figures show that, by combining intra- and inter-document information, the proposed model greatly outperforms the simple VSM. This is because, after combining more information using Dempster's rule of combination, the relevance between a document and a query is better represented. The MAP values and precision at top N retrieved documents (P@N) for the different scenarios are shown in Table 1.

Table 1 MAP and P@N for different assignments

Metrics       MAP    P@5    P@10   P@20   P@50   P@100
VSM-TF        0.132  0.384  0.344  0.304  0.240  0.202
DST-BPA1      0.132  0.368  0.332  0.284  0.238  0.202
DST-BPA2      0.136  0.384  0.344  0.300  0.236  0.197
VSM-TF-IDF    0.145  0.464  0.384  0.339  0.252  0.210
DST-BPA3      0.144  0.448  0.376  0.333  0.256  0.214
DST-BPA4      0.144  0.440  0.352  0.324  0.252  0.208
DST-BPA5      0.231  0.640  0.580  0.535  0.412  0.340
DST-BPA6      0.236  0.656  0.598  0.541  0.412  0.332

Fig. 3 Comparison between the proposed model combining only positive intra- and inter-document evidence and the general VSM model

Fig. 4 Comparison between the proposed model combining both positive and negative intra- and inter-document evidence and the general VSM model

Table 1 shows that the MAP values of the DST-BPA1 and DST-BPA2 retrieval models are close to that of the VSM model in which term frequency is taken as the term weight. For DST-BPA3 and DST-BPA4, the MAP and P@N values are close to those of the general VSM. The DST-BPA5 and DST-BPA6 models clearly outperform the general VSM; their MAP values increased by 59.3% and 62.8%, respectively, compared to the general VSM.

5 Conclusions

In this paper, a new information retrieval model based on the Dempster-Shafer theory of evidence is proposed. By taking term pairs as pieces of evidence for the relevance between a document and a query and combining all the evidence, the belief committed to the relevance can be obtained and used to rank the documents. Six basic probability assignment definitions for six circumstances are proposed. Experimental results showed that the model provides retrieval effectiveness close to or better than that of the general vector space model. In the future, we expect to further improve the retrieval performance by incorporating more document or collection information. Finding the optimal basic probability assignment will also be part of our future theoretical and experimental work.

References

[1] Ricardo B Y, Berthier R N. Modern Information Retrieval. New York, NY, USA: ACM Press, 1999.

[2] Salton G, Wong A, Yang C S. A vector space model for automatic indexing. Communications of the ACM, 1975, 18(11): 613-620.

[3] Crestani F, Lalmas M, Rijsbergen C J V. A survey of probabilistic retrieval models in information retrieval. ACM Computing Surveys, 1998, 30(4): 528-552.

[4] Wong S K M, Ziarko W, Wong P C N. Generalized vector spaces model in information retrieval. In: Proceedings of the 8th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Montreal, Canada, 1985: 18-25.

[5] Hofmann T. Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Berkeley, USA, 1999: 50-57.

[6] Theophylactou M, Lalmas M. A Dempster-Shafer belief model for document retrieval using noun phrases. In: Proceedings of the BCS Information Retrieval Colloquium. Grenoble, France, 1998.

[7] Shi L, Nie J Y, Cao G. Relating dependent indexes using Dempster-Shafer theory. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management. Napa Valley, California, USA, 2008: 429-438.

[8] Lalmas M, Moutoginni E. A Dempster-Shafer indexing for the focused retrieval of a hierarchically structured document space: Implementation and experiments on a web museum collection. In: Proceedings of RIAO, 6th Conference on Content-Based Multimedia Information Access. Collège de France, France, 2000.

[9] Shi C, Zhang J, Deng B. A new document retrieval model using Dempster-Shafer theory of evidence. In: Proceedings of the IEICE General Conference. Nanjing, China, 2008: 746-749.

[10] Shafer G. A Mathematical Theory of Evidence. Princeton, NJ, USA: Princeton University Press, 1976.

[11] Guan J W, Bell D A. Evidence Theory and Its Applications. New York, NY, USA: Elsevier Science Inc., 1991.
