Unified Contents Retrieval from an Academic Repository

Haruo Yokota†‡, Takashi Kobayashi†, Hiroaki Okamoto‡, Wataru Nakano‡
†Global Scientific Information and Computing Center, ‡Department of Computer Science, Tokyo Institute of Technology
{yokota@cs, tkobaya@gsic, [email protected], [email protected]}.titech.ac.jp

Abstract

Many forms of multimedia materials are currently stored in computer systems, and in many cases these materials are related to each other. It is therefore important to provide a unified presentation view of the related multimedia materials, which can be seen as virtual unified contents. To retrieve from the unified contents the object that best satisfies given conditions, it is also important to prepare a dedicated weighting schema that takes account of the relationships between the multimedia materials. To design a practical weighting schema, it is helpful to assume an actual application area. In this paper, we focus on academic multimedia materials in universities. We present an example of a unified presentation view for academic multimedia materials and propose a weighting schema dedicated to combinations of lecture slides, videos, voices, and laser pointer information. Experimental results, using actual lecture materials, indicate that the proposed weighting schemata are effective in improving the precision of retrieval.

1. Introduction

Many types of multimedia materials, such as texts, pictures, graphs, images, sounds, voices, and videos, are currently stored in computer systems. In many cases, a number of these multimedia materials are related to each other. To use the stored materials effectively, it is important to provide a unified presentation view of the related multimedia materials. By unifying the different types of multimedia materials, some synergy can be expected. For example, a text explaining the relationship between a picture and a piece of music should be presented together with the picture and the music. The unified presentation view can then be seen as virtual unified content. When we have a large number of such unified collections, it is also important to prepare dedicated weighting schemata to retrieve the most appropriate collection from the unified collections satisfying the given conditions. The relationships between the multimedia materials should be an important factor in developing the weighting schemata. To consider practical weighting schemata, it is helpful to assume an application area. In this paper, we focus on academic multimedia materials in universities. It is easy to find many types of academic multimedia material in a university, for example research papers, lecture manuscripts, and lecture videos, and unified content retrieval from the academic repository is a natural requirement. Education and research in the university are made more effective by using the stored academic multimedia materials. We have proposed UPRISE (Unified Presentation Slide Retrieval by Impression Search Engine) [1, 2, 3, 4] as a method for searching for appropriate lecture video scenes to match given keywords. There are also many related activities around academic repositories in many universities, such as OCW and university repositories.

There have been many trials to provide crossover retrieval for multimedia materials. For example, MINOS [5] can present and browse voices, images, and texts in a multimedia database. Informia [6] combines the database and IR approaches to access heterogeneous information sources by taking a mediator-based approach. Metadata plays one of the most important roles in integrating these materials. As described above, to expect synergy from different types of multimedia materials, it is not sufficient to present and browse only the individual materials. There are proposals for unified frameworks and models [7, 8, 9], query languages [10], and indexing methods [11, 12] for different types of multimedia materials. For academic materials, integrated information retrieval is also used in a knowledge worker support system [13]. However, these works did not prepare a weighting schema that considers the combination of different types of multimedia materials.

In this paper, we propose a unified presentation view and a unified weighting schema for processing related multimedia materials in an academic repository. The remainder of this paper is organized as follows. In section 2, we describe an approach to providing a unified presentation view for multimedia materials stored in a repository. In section 3, we consider a number of weighting schemata to retrieve the unified collection. Section 4 evaluates the effects of the laser pointer and vocal information using actual lecture materials. Section 5 reports the related activities in our university. We summarize the paper's main points in the final section.

2. Unified Presentation View

We can readily find relationships between different types of multimedia materials stored in a computer system as a repository, and we can expect some synergy by unifying the related materials that cannot be obtained by considering a single type of material. Naturally, there is a wide variety of combinations of material types. A text explaining the relationship between a picture and a piece of music is one example. To produce the synergy, a unified presentation view of the related materials is required. The unified presentation view can be treated as a virtual unified collection. We could also take the approach of materializing the unified collection; in other words, new physical material could be generated dynamically from the related multimedia materials. However, because the combination of material types varies widely, it is not efficient to generate an object for each combination. Therefore, we take the approach of constructing a unified presentation view instead of creating new unified objects.

To construct the unified presentation view, the time sequence becomes a dominant factor if some streaming media, such as video, sound, and voice, are included in the combination. This means that synchronization between the multimedia materials is required in the presentation view. In the case of the academic repository we assume, the combination of lecture slides, videos, and the titles of the slides is a good example. Figure 1 shows the unified presentation view of these in UPRISE, a lecture video retrieval system we developed [1, 4]. The slide presented is changed by synchronizing with the video. This makes the unified presentation view more effective for students than the simple video materials.

Figure 1: The Unified Presentation View in UPRISE (video frame, slide picture, and slide title text).

3. Unified Weighting Schemata

Weighting schemata are used to rank the retrieved contents. Tf-idf, the combination of term frequency and inverse document frequency, is commonly used as a weighting schema for text-based information retrieval. However, tf-idf is not always sufficient for a combination of multimedia materials. For example, if the same slide appears in multiple scenes, through backtracking or reuse by the lecturer, tf-idf cannot distinguish the appropriate scene from the multiple scenes available. In this section, we consider other weighting schemata to retrieve the unified content. We treat lecture videos and the slides used in the videos as the multimedia materials, stored in an academic repository, to evaluate the weighting schemata. We proposed the impression indicators as the weighting schemata in UPRISE [1]. Here we modify the impression indicators by combining them with related information.

3.1. Combination with Structure Information

First, we consider the structure of a lecture slide. If a given keyword appears in the title of the slide or in lines with little indentation, the value of the position impression is high, whereas if the keyword appears only in deeply indented lines, the value is lower. The expression used to calculate the weighting schema combined with the slide structure is:

Ip(s, k) = Σ_{l=1}^{L(s)} P(s, l) · C(s, k, l),

where s denotes an identifier of the objective scene, k a search keyword, L(s) the total number of lines in the slide for s, P(s, l) a function giving the point assigned to line l in the slide for s, and C(s, k, l) a function counting keyword k in line l of the slide for scene s. A simple example of P(s, l) is P(s, l) = Max_Indent − indent(s, l), where indent(s, l) indicates the indentation level of line l in the slide for s. On the other hand, if P(s, l) = 1 for all lines, Ip is identical to the plain term frequency tf. There are other approaches to integrating contents and structures in text retrieval, such as [14]. However, because the structure of presentation slides is commonly based on a template, we can assume simple structures and use them effectively.

3.2. Combination with Duration Information

Duration information is useful in distinguishing multiple appearances of the same slide caused by backtracking or reuse by the lecturer. To reflect the duration information of scenes in the weighting schema, we propose the duration-impression indicator. The value of the position-impression indicator is modified by the presentation time with a duration parameter:

Id(s, k, θ) = T(s)^θ · Ip(s, k),

where T(s) denotes the time used for scene s, and θ is the duration parameter for changing the influence of the time factor. If θ = 0, the duration-impression indicator is identical to the position-impression indicator, i.e., Id(s, k, 0) = Ip(s, k). As θ becomes larger, the influence of the presentation time becomes greater, and longer scenes are ranked higher by Id. We assume the unit of time duration can be adjusted in T(s) to change the effect of the timing information on this weighting schema.

3.3. Combination with Context Information

We combine the information of the slide appearance sequence to reflect the influence of context on the weighting schemata. The context-impression indicator accumulates values of the duration-impression indicator within a presentation window indicated by a window-size parameter δ:

Ic(s, k, θ, δ, ε1, ε2) = Σ_{r=s−δ}^{s+δ} E(r − s, ε1, ε2) · Id(r, k, θ),

where E(x, ε1, ε2) is a function that specifies the effect of neighboring slides in the context window:

E(x, ε1, ε2) = e^{ε1·x}  (x < 0),
E(x, ε1, ε2) = e^{−ε2·x}  (x ≥ 0).

The effect of distance is decided using the exponential function with distance-effect parameters ε1 and ε2. This means that we can alter the effect of the information for the former context and the latter context separately. If you want to make the former context effective, set ε1 to a small value, and vice versa. When we want to emphasize the start point of the explanation related to the keyword, we set, for example, ε1 = 5 and ε2 = 0.5. When the window-size parameter δ is zero, Ic is identical to Id apart from the parameters ε1 and ε2, i.e., Ic(s, k, θ, 0, ε1, ε2) = Id(s, k, θ).
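As a concrete illustration of the indicators defined in Sections 3.1 to 3.3, the following is a minimal Python sketch. It assumes a scene is represented by the indentation level and text of each slide line plus its presentation time; the class and function names are ours, not part of UPRISE, and the simple point function P(s, l) = Max_Indent − indent(s, l) from Section 3.1 is used.

```python
import math
from dataclasses import dataclass

@dataclass
class Scene:
    lines: list          # [(indent_level, line_text), ...] for the slide shown in this scene
    duration: float      # T(s): presentation time of the scene

MAX_INDENT = 5  # assumed maximum indentation level of the slide template

def count_keyword(text: str, keyword: str) -> int:
    """C(s, k, l): number of occurrences of the keyword in one slide line."""
    return text.lower().count(keyword.lower())

def position_impression(scene: Scene, keyword: str) -> float:
    """Ip(s, k) = sum_l P(s, l) * C(s, k, l), with P(s, l) = MAX_INDENT - indent(s, l)."""
    return sum((MAX_INDENT - indent) * count_keyword(text, keyword)
               for indent, text in scene.lines)

def duration_impression(scene: Scene, keyword: str, theta: float) -> float:
    """Id(s, k, theta) = T(s)^theta * Ip(s, k)."""
    return scene.duration ** theta * position_impression(scene, keyword)

def context_effect(x: int, eps1: float, eps2: float) -> float:
    """E(x, eps1, eps2): exponential decay for former (x < 0) and latter (x >= 0) context."""
    return math.exp(eps1 * x) if x < 0 else math.exp(-eps2 * x)

def context_impression(scenes: list, s: int, keyword: str,
                       theta: float, delta: int, eps1: float, eps2: float) -> float:
    """Ic(s, k, theta, delta, eps1, eps2): accumulate Id over a window of +-delta scenes."""
    total = 0.0
    # Clipping the window at the ends of the scene sequence is an implementation choice.
    for r in range(max(0, s - delta), min(len(scenes), s + delta + 1)):
        total += context_effect(r - s, eps1, eps2) * duration_impression(scenes[r], keyword, theta)
    return total
```

Ranking candidate scenes for a query keyword then amounts to sorting them by context_impression; the parameter values used in Section 4 (θ = 0.5, δ = 4, ε1 = 5, ε2 = 0.5) can be passed directly.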

3.4. Combination with Laser Pointer Information

When a laser pointer is used in a presentation, the information about the points selected by the laser pointer can be used to improve the precision of retrieval. We proposed a method of reflecting the laser pointer information in the weighting schemata in [3]. Two approaches can be taken to reflect this information: the count of hits by the laser pointer and the duration of the hits. First, we extract subscenes from a scene so that each subscene contains a continuous pointer location. Because there is ambiguity regarding the target keyword, caused by shaking or habits of the lecturer, we distribute the possibility of a hit by the pointer in a subscene over the neighboring lines and make their sum 1:

Σ_{l=1}^{L(s)} H(l, q) = 1,

where H(l, q) denotes the hit probability of line l in subscene q of scene s. We make H(l, q) proportional to the distance between line l and the pointed position. We then propose an indicator phc(s, k) to express the probable pointer hit count for keyword k in scene s using H(l, q) and C(s, k, l):

phc(s, k) = Σ_{qi∈s} Σ_{l=1}^{L(s)} H(l, qi) · C(s, k, l).

To reflect the effect of the laser pointer position in the weighting schemata, we modify Id and Ic by adding phc(s, k) to the term representing keyword position, Ip, and denote these as Id[p+phc] and Ic[p+phc], respectively:

Id[p+phc](s, k, θ, φp) = T(s)^θ · (Ip(s, k) + φp · phc(s, k)),

Ic[p+phc](s, k, θ, δ, ε1, ε2, φp) = Σ_{r=s−δ}^{s+δ} E(r − s, ε1, ε2) · Id[p+phc](r, k, θ, φp),

where φp is the pointer-hit-count parameter for changing the effect of the laser pointer hit count.

We then consider the effect of the duration of a hit by the pointer. We propose phd(s, k), obtained by multiplying each term of phc(s, k) by the duration of the corresponding subscene:

phd(s, k) = Σ_{qi∈s} Σ_{l=1}^{L(s)} H(l, qi) · C(s, k, l) · T(qi).

We also modify Id and Ic by adding phd(s, k) to the term for the scene duration:

Id[d+phd](s, k, θ, ωd) = (T(s) + ωd · phd(s, k))^θ · Ip(s, k),

Ic[d+phd](s, k, θ, δ, ε1, ε2, ωd) = Σ_{r=s−δ}^{s+δ} E(r − s, ε1, ε2) · Id[d+phd](r, k, θ, ωd),

where ωd is the pointer-hit-duration parameter that changes the effect of the duration of a hit by the laser pointer.

While both Ic[p+phc] and Ic[d+phd] restrict the laser pointer information to one part of the weighting schemata, we can also directly add the effect of normalized phc and phd to Ic:

Ic[c+phc](s, k, θ, δ, ε1, ε2, φc) = Ic(s, k, θ, δ, ε1, ε2) · (1 + φc · phc(s, k) / Σ_{s'∈S} phc(s', k)),

Ic[c+phd](s, k, θ, δ, ε1, ε2, ωc) = Ic(s, k, θ, δ, ε1, ε2) · (1 + ωc · phd(s, k) / Σ_{s'∈S} phd(s', k)),

where S is the set of all scenes in the presentation, and φc and ωc are parameters for changing the effect of phc and phd on Ic, respectively.

3.5. Combination with Vocal Information

We proposed a method for using the voice information in a video [15, 16]. If a target keyword not only appears in the slide for a scene but is also frequently uttered in the scene, the scene should be ranked highly because the keyword is explained well in that scene. We propose an indicator vac(s, k) that counts the voiced appearances of keyword k in scene s. We modify Id and Ic using vac(s, k) in the same way as for phc:

Id[p+vac](s, k, θ, ψ) = T(s)^θ · (Ip(s, k) + ψ · vac(s, k)),

Ic[p+vac](s, k, θ, δ, ε1, ε2, ψ) = Σ_{r=s−δ}^{s+δ} E(r − s, ε1, ε2) · Id[p+vac](r, k, θ, ψ),

where ψ is the voice-appearance-count parameter to change the effect of voice on the rating.

Whereas both phc and phd, the count and the duration of pointer hits, are useful for reflecting the laser pointer information in the weighting schemata, the duration of voice cannot be used. Moreover, in the case of vocal information generated by voice recognition technology, there is the possibility of a recognition error in which words that were not spoken are interpolated. To eliminate the influence of such interpolation errors, we modify the voice-appearance-count parameter depending on whether the keyword appears in the slide for the scene or in the sequence of slides in the context window. If keyword k does not appear in the slide for scene s, i.e., if Ip = 0, we set ψ = 0:

Ic[p+vac/p](s, k, θ, δ, ε1, ε2, ψ) = Ic[p+vac](s, k, θ, δ, ε1, ε2, ψ)  (Ip ≠ 0),
Ic[p+vac/p](s, k, θ, δ, ε1, ε2, ψ) = Ic[p+vac](s, k, θ, δ, ε1, ε2, 0)  (Ip = 0).

We also propose Ic[p+vac/c] to eliminate the influence of interpolation errors within the context window. We force Ic[p+vac/c] = 0 in the case where keyword k does not appear in any slide in the context window, i.e., where Ic = 0:

Ic[p+vac/c](s, k, θ, δ, ε1, ε2, ψ) = Ic[p+vac](s, k, θ, δ, ε1, ε2, ψ)  (Ic ≠ 0),
Ic[p+vac/c](s, k, θ, δ, ε1, ε2, ψ) = 0  (Ic = 0).

These dynamic parameter changes are not required for the laser pointer information, because interpolation errors need not be considered for the laser pointer.
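Continuing the sketch above (it reuses Scene, count_keyword, and position_impression), the laser pointer and vocal indicators of Sections 3.4 and 3.5 could be computed roughly as follows. The Subscene layout, the helper names, and the idea of passing vac(s, k) in as a precomputed count from a recognizer such as Julius are all assumptions of the sketch.

```python
from dataclasses import dataclass

@dataclass
class Subscene:
    hit_prob: list        # H(l, q): hit probability for each slide line; the entries sum to 1
    duration: float       # T(q): duration of the subscene

def phc(scene: Scene, subscenes: list, keyword: str) -> float:
    """phc(s, k) = sum over subscenes q and lines l of H(l, q) * C(s, k, l)."""
    return sum(q.hit_prob[l] * count_keyword(text, keyword)
               for q in subscenes
               for l, (_, text) in enumerate(scene.lines))

def phd(scene: Scene, subscenes: list, keyword: str) -> float:
    """phd(s, k): as phc, but each term is weighted by the subscene duration T(q)."""
    return sum(q.hit_prob[l] * count_keyword(text, keyword) * q.duration
               for q in subscenes
               for l, (_, text) in enumerate(scene.lines))

def duration_impression_p_phc(scene, subscenes, keyword, theta, phi_p):
    """Id[p+phc]: add phi_p * phc(s, k) to the position term Ip before weighting by T(s)^theta."""
    return scene.duration ** theta * (position_impression(scene, keyword)
                                      + phi_p * phc(scene, subscenes, keyword))

def duration_impression_d_phd(scene, subscenes, keyword, theta, omega_d):
    """Id[d+phd]: add omega_d * phd(s, k) to the duration term T(s)."""
    return ((scene.duration + omega_d * phd(scene, subscenes, keyword)) ** theta
            * position_impression(scene, keyword))

def duration_impression_p_vac(scene, keyword, theta, psi, vac_count):
    """Id[p+vac]: add psi * vac(s, k), the voiced-appearance count, to the position term Ip."""
    return scene.duration ** theta * (position_impression(scene, keyword) + psi * vac_count)

def psi_gate(scene, keyword, psi):
    """For Ic[p+vac/p]: use psi only when the keyword appears in the slide (Ip != 0)."""
    return psi if position_impression(scene, keyword) != 0 else 0.0
```

The corresponding Ic[...] variants are obtained exactly as in the first sketch, by accumulating these Id[...] values over the context window with E(r − s, ε1, ε2).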

3.6. Combination with Rareness Factor

A rareness factor is important for weighting schemata in information retrieval. For tf-idf, the inverse document frequency, idf, is the rareness factor. While the appropriateness factor, tf, can be calculated simply within a document, the range over which the rareness factor is calculated can vary. For example, in scene retrieval for lecture videos in universities, there are several types of target range, such as a class, a course, the courses in a school, and the courses in the whole university. We proposed isfr(k, λ), the inverse frequency of scenes including keyword k in the range specified by λ, as a dedicated rareness factor for scene retrieval [2]:

isfr(k, λ) = log ( (number of scenes in range λ) / (number of scenes having keyword k in their slides) ).

We can apply isfr(k, λ) as the rareness factor to all the weighting schemata discussed above. The weighting schema then becomes Ix · isfr(k, λ), where x can be replaced by p, d, c, c[p+phc], c[d+phd], c[c+phc], c[c+phd], c[p+vac/p], or c[p+vac/c]. If the range specified by λ is all slides, isfr is identical to idf. Therefore, if P(s, l) = 1, Ic(s, k, 0, 0, −, −) · isfr(k, all-slides) = tf(s, k) · idf(k).

We also proposed ivfr(k, λ), the inverse frequency of vocally appearing keyword k, as another rareness factor dedicated to vocal information [16]:

ivfr(k, λ) = log ( (number of scenes in range λ) / (number of scenes uttering keyword k) ).

To apply ivfr to Ic[p+vac/p] or Ic[p+vac/c], we combine it with Ic · isfr using ψ:

Ic(s, k, θ, δ, ε1, ε2) · isfr(k, λ) + ψ · Ic[p+vac/x](s, k, θ, δ, ε1, ε2, ψ) · ivfr(k, λ).

If keyword k does not appear in the slide for Ic[p+vac/p], or in the context window for Ic[p+vac/c], we set ψ = 0 to eliminate the influence of the interpolation recognition error.
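The rareness factors of Section 3.6 are simple corpus statistics over the chosen range λ. A minimal sketch, again reusing the earlier helpers, might look as follows; the zero fallback when no scene matches is our own guard against division by zero, not something stated in the paper.

```python
import math

def isfr(scenes_in_range: list, keyword: str) -> float:
    """isfr(k, lambda) = log(#scenes in range / #scenes whose slide contains k)."""
    having = sum(1 for s in scenes_in_range
                 if any(count_keyword(text, keyword) > 0 for _, text in s.lines))
    return math.log(len(scenes_in_range) / having) if having else 0.0

def ivfr(num_scenes_in_range: int, num_scenes_uttering: int) -> float:
    """ivfr(k, lambda) = log(#scenes in range / #scenes in which k is uttered)."""
    return math.log(num_scenes_in_range / num_scenes_uttering) if num_scenes_uttering else 0.0

def combined_score(ic_value, ic_p_vac_value, keyword_present, psi, isfr_value, ivfr_value):
    """Ic * isfr + psi * Ic[p+vac/x] * ivfr, with psi forced to 0 when the keyword is absent."""
    psi_eff = psi if keyword_present else 0.0
    return ic_value * isfr_value + psi_eff * ic_p_vac_value * ivfr_value
```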

4. Effects of the Laser Pointer and Voice

Figure 2 illustrates the effect of the laser pointer and voice in scene retrieval. Id and Ic use the duration information to distinguish multiple appearances of the same slide in different scenes. For example, if slide "a" containing the keyword "x" appears in both scene i and scene i + 2, scene i will be ranked higher than scene i + 2 by Id or Ic because the duration of scene i is longer. However, if slide "a" consists of two concepts and the target keyword "x" belongs to the second concept, which is described in scene i + 2, then scene i + 2 should be ranked higher. If the keyword is highlighted by the laser pointer, or appears in the recognized voice, scene i + 2 can be ranked higher by the weighting schemata using the laser pointer or vocal information.

We evaluated the effects of the laser pointer and vocal information using a series of videos from a lecture course in our university. The lecture course consists of 12 classes containing 538 scenes and 3822 subscenes. We used the open-source speech recognition software Julius [17] to derive the vocal information from the lecture videos. We extracted 1099 words from the lecture slides and added 176 of them, which had not originally been included, to the dictionary for voice recognition.

Figure 2: Effects of the Laser Pointer and Voice (laser pointer location, duration information, and vocal information for keyword "x" across scenes i to i + 3, where slide "a" covers Concept A and Concept B).

Table 1: Comparisons of the precision for the weighting schemata

    Weighting schemata               Precision
    Ic                               0.601
    Ic[p+phc]                        0.544
    Ic[d+phd]                        0.608
    Ic[c+phc]                        0.614
    Ic[c+phd]                        0.615
    Ic[p+vac/p]                      0.640
    Ic[p+vac/c]                      0.630
    Ic · isfr                        0.629
    Ic[p+vac/p] · isfr               0.637
    Ic[p+vac/p] · (isfr + ivfr)      0.654

To compare the effects of the proposed weighting schemata, we calculated the retrieval precision under the condition that the recall is one, by asking testers to decide which scene corresponds best to the keywords among all the scenes related to the keywords. We therefore used the following formula to calculate the precision:

precision = (1/N) · Σ_{i=1}^{N} 1 / (rank of the best scene in the i-th test),

where N is the number of tests. Table 1 lists the retrieval precision of the ranked scenes for Ic, Ic[p+phc], Ic[d+phd], Ic[c+phc], Ic[c+phd], Ic[p+vac/p], Ic[p+vac/c], Ic · isfr, Ic[p+vac/p] · isfr, and Ic[p+vac/p] · (isfr + ivfr) with the parameter setting θ = 0.5, δ = 4, ε1 = 5, ε2 = 0.5, φp = φc = 10, ωd = ωc = 10, ψ = 5, and λ = course.

The experimental results in Table 1 indicate that the dedicated weighting schemata are mostly effective in improving the retrieval precision of the scene ranking. This means that both the laser pointer and the vocal information are useful for retrieving the unified contents. In particular, Ic[c+phd] is effective for the laser pointer information, and Ic[p+vac/p] for the vocal information. In other words, the duration information of the laser pointer, and the keyword appearance in the target slide for the vocal information, seem influential in the results. The improvement in retrieval precision by Ic[p+vac/p] · (isfr + ivfr) indicates that the rareness factor using the vocal information is also effective. Because a more detailed analysis is important, we plan to evaluate the effect of the weighting schemata in more detail as future work.
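The precision defined above is a mean reciprocal rank over the test queries. A tiny, self-contained illustration (the function name and example ranks are ours, not data from the paper):

```python
def retrieval_precision(best_scene_ranks):
    """Mean of 1 / (rank of the best scene) over the N tests, as in the formula above."""
    return sum(1.0 / rank for rank in best_scene_ranks) / len(best_scene_ranks)

# Example: in three tests the tester-chosen best scene was ranked 1st, 2nd, and 4th.
print(retrieval_precision([1, 2, 4]))  # (1 + 0.5 + 0.25) / 3 = 0.583...
```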

5. Related Activities at Tokyo Tech

In our university, the Tokyo Institute of Technology, we have proposed a concept named Tokyo Tech STAR for activities related to this research. STAR stands for Science and Technology Academic Repository and consists of four main components: the CourseWareHouse, the Research Repository, the Digital Museum, and the Non-Digitized Repository. Figure 3 illustrates the relationship between Tokyo Tech STAR and these components. In STAR, the CourseWareHouse, the Research Repository, and the Digital Museum contain academic multimedia materials.

The CourseWareHouse mainly stores educational multimedia materials, such as lecture manuscripts and lecture videos, and provides advanced retrieval functions for them. UPRISE is used in the CourseWareHouse as a method for searching for lecture video scenes matching given keywords. Some lecture courses in our university already use UPRISE, and its retrieval functions are available to the students taking those courses. Our university is also a member of the Japanese OpenCourseWare (OCW) consortium cooperating with the MIT OCW. The course materials for Tokyo Tech OCW are also treated as a part of the Tokyo Tech CourseWareHouse, and free access to these course materials is provided even to people outside our university. More than a hundred courses are already available in Tokyo Tech OCW (http://www.ocw.titech.ac.jp/index.php?lang=EN).

The Research Repository stores research manuscripts, mainly research papers, and provides dedicated retrieval functions for these papers. We proposed the Research Mining method [18] to derive macro research flows from the citation information of the papers stored in the Research Repository. Other researchers in our university proposed a method for analyzing the properties of citations in a paper and developed a system named PRESRI [19] to retrieve research papers based on this information. We plan to make some research manuscripts in our university available with these functions as the Tokyo Tech Open Research Repository (ORR).

The Digital Museum keeps digitized historical materials, such as scanned ancient writings and pictures relating to the results of research undertaken in our university. We also plan to make part of this available as the Tokyo Tech Open Digital Museum (ODM).

These activities are also closely related to the Tokyo Institute of Technology 21st Century COE Program "Framework for Systematization and Application of Large-Scale Knowledge Resources" (http://www.coe21-lkr.titech.ac.jp/english/index.html) [20]. Many members of the program play important roles in the STAR activities. As a part of this program, we are developing a system named Knowledge Store (KS) to store many types of multimedia materials related to the knowledge resources [21, 22, 4]. The lecture videos and presentation slides are stored in the KS and are provided to an external system for UPRISE using the proposed weighting schemata.

Figure 3: Activities in the Tokyo Tech STAR (the CourseWareHouse, Research Repository, Digital Museum, and Non-Digitized Repository within Tokyo Tech STAR, fed by course materials and research publications and opened as Tokyo Tech OCW, ORR, and ODM).

6. Conclusions

This paper describes the treatment of related multimedia materials in an academic repository. We first consider a unified presentation view for these materials so that they can be treated as a unified collection. Synchronization between the multimedia materials is important for unifying them if some streaming media, such as videos, sounds, and voices, are included in the related materials. We then propose a number of weighting schemata for retrieval that rank the unified contents using information on slide structure, presentation duration, presentation context, laser pointer hit count, laser pointer hit duration, and voice. Our experimental results using actual lecture materials indicate that the dedicated weighting schemata for laser pointer information and vocal information are effective in improving the precision of retrieval, and they indicate a direction for retrieving multimedia materials with the unified presentation view.

We also describe related activities in our university. We have a number of activities related to the academic repository, and UPRISE is a part of them. UPRISE is already used in some lecture courses in our university and is available to the students taking those courses. We will continue the evaluation of unified content retrieval based on the unified weighting schemata. We will also consider approaches to unifying the laser pointer information and the voice information, as well as other possibilities. Unification of other types of multimedia materials is also planned for the future.

7. Acknowledgment

This work is partially supported by the Tokyo Institute of Technology 21st Century COE Program "Framework for Systematization and Application of Large-Scale Knowledge Resources", Grants-in-Aid for Scientific Research of MEXT Japan (#15017233 and #16016232), and CREST of JST (Japan Science and Technology Agency).

8. References

[1] Haruo Yokota, Takashi Kobayashi, Taichi Muraki, and Satoshi Naoi. UPRISE: Unified Presentation Slide Retrieval by Impression Search Engine. IEICE Transactions on Information and Systems, E87-D(2):397–406, Feb 2004.

[2] Hiroaki Okamoto, Takashi Kobayashi, and Haruo Yokota. Presentation Retrieval Method Considering the Scope of Targets and Outputs. In Proc. of the International Workshop on Challenges in Web Information Retrieval and Integration (WIRI2005), in conjunction with ICDE2005, pages 47–52, April 2005.

[3] Wataru Nakano, Yuta Ochi, Takashi Kobayashi, Yutaka Katsuyama, Satoshi Naoi, and Haruo Yokota. Unified Presentation Contents Retrieval Using Laser Pointer Information. In Proc. of the International Special Workshop on Databases For Next Generation Researchers (SWOD), in memoriam of Prof. Kambayashi, at ICDE 2005, pages 170–173, April 2005.

[4] Takashi Kobayashi, Taichi Muraki, Satoshi Naoi, and Haruo Yokota. A Searching System on Unified Presentation Contents (in Japanese). IEICE Transactions on Information and Systems (D-I), J88-D-I(3):715–726, March 2005.

[5] Stavros Christodoulakis, F. Ho, and M. Theodoridou. The multimedia object presentation manager of MINOS: A symmetric approach. In Carlo Zaniolo, editor, Proceedings of the 1986 ACM SIGMOD International Conference on Management of Data, Washington, D.C., May 28-30, 1986, pages 295–310. ACM Press, 1986.

[6] Maria Luisa Barja, Tore Bratvold, Jussi Myllymaki, and Gabriele Sonnenberger. Informia: a mediator for integrated access to heterogeneous information sources. In CIKM'98: Proceedings of the seventh international conference on Information and knowledge management, pages 234–241, New York, NY, USA, 1998. ACM Press.

[7] Y. Rui, T. S. Huang, and S. Mehrotra. Browsing and retrieving video content in a unified framework. In IEEE Second Workshop on Multimedia Signal Processing, pages 9–14, Dec 1998.

[8] P. Piamsa-nga, N. A. Alexandridis, G. Blankenship, G. Papakonstantinou, P. Tsanakas, and S. Tzafestas. A Unified Model for Multimedia Retrieval by Content. In International Conference on Computers and Their Applications (CATA98), 1998.

[9] Xiang Sean Zhou and Thomas S. Huang. Unifying keywords and visual contents in image retrieval. IEEE MultiMedia, 9(2):23–33, 2002.

[10] Young-Il Choi, Yoo-Mi Park, Hun-Soon Lee, and Seong-Il Jin. An Integrated Data Model and a Query Language for Content-Based Retrieval of Video. In 4th International Workshop on Advances in Multimedia Information Systems, page 192, September 1998.

[11] Allen Ginsberg. A Unified Approach to Automatic Indexing and Information Retrieval. IEEE Expert: Intelligent Systems and Their Applications, 8(5):46–56, October 1993.

[12] Kevin Cox. A unified approach to indexing and retrieval of information. In SIGDOC'94: Proceedings of the 12th annual international conference on Systems documentation, pages 176–181, New York, NY, USA, 1994. ACM Press.

[13] G. McAlpine and P. Ingwersen. Integrated information retrieval in a knowledge worker support system. In SIGIR'89: Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval, pages 48–57, New York, NY, USA, 1989. ACM Press.

[14] Ricardo Baeza-Yates and Gonzalo Navarro. Integrating Contents and Structure in Text Retrieval. SIGMOD Rec., 25(1):67–79, 1996.

[15] Hiroaki Okamoto, Takashi Kobayashi, Satoshi Naoi, Haruo Yokota, and Sadaoki Furui. Application of Voice Data for Retrieving Unified Presentation Contents (in Japanese). Technical Report DE2005-107, IEICE, 2005.

[16] Hiroaki Okamoto, Takashi Kobayashi, Satoshi Naoi, Haruo Yokota, Koji Iwano, and Sadaoki Furui. Unified Presentation Contents Retrieval using Voice Information (in Japanese). In Proc. of DEWS2006. IEICE, 2006.

[17] Tatsuya Kawahara and Akinobu Lee. Open-Source Speech Recognition Software Julius (in Japanese). Journal of the Japanese Society for Artificial Intelligence, 20(1):41–49, Jan 2005.

[18] Makoto Yoshida, Takashi Kobayashi, and Haruo Yokota. Comparison of the Research Mining and the Other Methods for Retrieving Macro-information from an Open Research-paper DB (in Japanese). IPSJ Transactions on Databases, 45(SIG 7(TOD 22)):24–32, 2004.

[19] Hidetsugu Nanba, Noriko Kando, and Manabu Okumura. Classification of Research Papers using Citation Links and Citation Types: Towards Automatic Review Article Generation. In Proc. of the 11th SIG Classification Research Workshop, Classification for User Support and Learning, pages 117–134, 2000.

[20] Sadaoki Furui. Overview of the 21st Century COE Program "Framework for Systematization and Application of Large-scale Knowledge Resources". In Proc. of the International Symposium on Large-scale Knowledge Resources (LKR2004), pages 1–8, 2004.

[21] Haruo Yokota. An Information Storage System for Large-Scale Knowledge Resources. In Proc. of the International Symposium on Large-scale Knowledge Resources (LKR2004), pages 87–90, 2004.

[22] Takashi Kobayashi and Haruo Yokota. An Overview of the Infrastructure for Storing Large Scale Knowledge Resources. In Proc. of the International Symposium on Large-scale Knowledge Resources (LKR2005), pages 123–130, 2005.