2010 Exceptional Web Experience TECH-B14

Integrating Search in WebSphere Portal: Best Practices
Andreas Prokoph, Lead Architect – Search in Portal and WCM, Portal Development ([email protected])

IBM Portal Excellence Conference

July 19 – 22, 2010 Chicago, Illinois © 2010 IBM Corporation

2010 Exceptional Web Experience Conference

Agenda
• Overview – Portal and Search
• Searching in Portal sites and WCM content
– What about the other search engine vendors?
• Enhanced end-user search experience
• Extending the reach – enterprise search is the solution
– Overview – OmniFind Enterprise Edition V9.1
• Questions


Search Architectures in the Enterprise

[Figure: three layers. Applications: intranet search, employee portals, employee directories, corporate info and commerce search, embedded search, enterprise search, customer services, sales force info center. Search services: search indexes or collections. Content sources: e-mail, CRM, portal, directory, file, content, web and news servers or systems.]

Search engines and Portals

[Figure: users reach content and information sources through Portal pages and portlets, subject to security and personalization. Sources shown: CRM application, databases, content management, collaboration, eHR, syndicated content, web services, other web content. Search technologies (crawlers, sitemaps, seedlists) feed the search engine, which serves the user interaction.]

General aspects of integrating search services into WebSphere Portal


Public Portal site - Lufthansa.de


The past: standard web crawlers and the ‘Ooops-effect’ ….

This set of pages represents the structure of the Portal site. (Still the case with secured pages.)

Web crawlers: this set of pages the crawler retrieves and assumes to be unique based on the link structure of the site.

Result (in the search indexes):
• a few thousand pages grow into the hundred-thousands, with tons of duplicate pages
• the crawler might have to give up, as it never sees the end of the site
• few or no pages will be indexed


WebSphere Portal – crawlability enablement

Portal Server recognizes the crawler and triggers URLs to be published in normalized form.

[Figure: web crawlers reach the public pages through unauthenticated access and feed the search indexes.]

Normalized URL = all navigational state information is discarded from the URL

Result:
• no more ‘duplicate’ pages
• all linked and public Portal pages are crawled and indexed


The URL “problem” – at least for a standard web crawler, that is ..
•In the past, and back then this was valid, a URL and the referenced resource were assumed to be a web page
•However, per W3C standard, a URL uniquely identifies a resource
•Such a resource could be the same ‘page’, but with partially different content displayed
•What that ‘content’ is, is determined by the referenced server
•The server might determine what content to show based on a user’s interaction with, e.g., a web application in the current session
•WebSphere Portal is such a web application: through portlets it makes information and content available to users, who may interact with backend applications through those portlets
•The portal URL maintains navigational state information as well as portlet specific parameters (and more, of course)


User and search engine friendly-URLs
•Friendly-URLs result in human readable URL prefixes that lead to portal pages
•Each content node might have a friendly name assigned
•The friendly-URL is a hierarchical path constructed from these names based on the content topology (see URL mappings)
•Every URL that is generated by WP APIs will contain the friendly-path automatically
–It is even guaranteed that every URL that leads to a particular page will start with the page’s friendly-path

Content Nodes: root, home, shop, info, shoes

/wps/portal/home
/wps/portal/home/shop
/wps/portal/home/shop/shoes
/wps/portal/home/shop/shoes/!ut/p/04_SB8K8xLLM9MS...
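As a rough illustration of why the friendly-path helps a crawler, the sketch below reduces a Portal URL to its friendly-path so that URLs leading to the same page compare equal. The `/!ut/p/` marker is taken from the example URL above; treating everything from that marker onward as encoded state, and the function name `friendly_path`, are assumptions made for this example only and do not describe Portal's own normalization logic.

```python
from urllib.parse import urlsplit

# Assumption: the encoded navigational state starts at this marker,
# as in the example URL on the slide.
STATE_MARKER = "/!ut/p/"


def friendly_path(url: str) -> str:
    """Return the URL path with any encoded state suffix removed."""
    path = urlsplit(url).path
    head, _, _ = path.partition(STATE_MARKER)
    return head.rstrip("/")
```

With this helper, a crawler-side deduplication step could key pages by `friendly_path(url)` instead of by the full URL.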


Portals differ from regular websites, above all when security is involved .... unfortunately the web crawlers don’t know that ….

ACL/Personalization: this portlet is seen only by a small group of users!

This would be the content we want to be searchable.


Standard web crawler and the ‘Ooops’ effect ….

[Figure: a Welcome page, Page A, Page B and Page C connected by the links URL-A to URL-E; the secured pages require authenticated access.]

Portal encodes in URLs additional information about the navigational state of the user, such as which page the user comes from and in what state it was left (e.g. a specific portlet was maximized).

Information encoded within URLs:
URL-A – Target: Page A, coming from Welcome page
URL-B – Target: Page B, coming from Welcome page
URL-C – Target: Page C, coming from Page A
URL-D – Target: Page A, coming from Page C
URL-E – Target: Page B, coming from Page C

A crawler would want to assume:
• URL-A and URL-D to be identical
• URL-B and URL-E to be identical

Crawling becomes an even more complex task
•Search engines have a hard time keeping up with building crawlers
–New content sources and third-party content systems are proliferating.
–Where the content is stored and how it is stored is no longer straightforward
•Crawlers are tightly coupled with content protocols
–Notes, Domino, DB, File System etc. each have their own access protocol
•ACL mechanisms are as varied as content sources
•Standard crawling is becoming more and more inefficient
–Crawling backend servers, BUT: having to show the results in the appropriate context


More crawling issues to take into consideration …

•Content meta-data is growing rapidly
–Who delivers the metadata to the 'page'? Competing portlets, etc...
–Needs to be indexed in a generic way
–Needs to be harmonized across content types
–Is often customized
•Web 2.0 is creating a chasm between content and views
–It’s not necessarily the case anymore that a piece of content and the page it’s viewed on are the same thing
–Crawling the glass (the view) is giving way to crawling the content store
–The MVC pattern is showing its face once again, but now it’s dynamic


Today’s options of integrating applications with “Search”

[Figure: two panels, labeled “Past and today” and “Today”. In one, applications App1 to App5 hand content to a search server and its search index through a programmatic interface. In the other, Crawler 1 to Crawler 3 fetch content from the applications for a search server and its search index.]

Programmatic interface:
• Proprietary interfaces (search engines)
• Too many to support
• Resulted in: vendor specific solutions
• Verity is one of the few popular search vendors

Crawlers:
• Proprietary interfaces (repositories/servers)
• Too many to support
• Resulted in: vendor specific solutions
• The HTTP protocol is one of the very few standards, but it lacks search specific semantics

Doing it the right way
We need a new search integration paradigm for applications seeking to make their content/information searchable.

Seedlists are used in Portal for:
–Portal pages and portlets
–WCM content, when using the new search admin feature now in WCM
–Quickr document libraries, etc…

Google supports ‘Sitemaps’ for Internet content today
–Sitemap 0.90 protocol (www.sitemaps.org)
–Their search appliance has also introduced an extended sitemap protocol
–Note: Search Engine Utility portlet for Portal available in the Portal catalog

Sitemaps (entry point enumeration) and Seedlists (content enumeration) are different but complementary.
This is a ‘necessity to come’ as the Internet and how we use it is re-shaping itself …
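To make the "entry point enumeration" idea concrete, here is a minimal sketch that writes a Sitemap 0.90 document (`urlset` / `url` / `loc` / `lastmod`, as defined at www.sitemaps.org). The function name `build_sitemap` and the example URL are assumptions for illustration; this is not the output of the Search Engine Utility portlet.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap 0.90 protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap from (url, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")
```

A sitemap only lists where a crawler may start; it carries none of the per-item metadata (crawl URL versus display URL, access control, life-cycle events) that a seedlist adds.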


Benefits of integrating applications through Seedlists

[Figure: App1 to App5 each publish a seedlist; a seedlist crawler reads the seedlists and feeds the search server and its search index.]

• Portal, WCM, Quickr, Connections, Document Libs, … etc. publish their content through seedlists
• Search integration can be done through any search SW that supports ‘Seedlists’.
• Seedlists go beyond HTTP and HTML to provide search specific semantics and information

Seedlist concepts
•A Seedlist is “simply” an enumeration of content items and life-cycle events
–Documents in a document library application
–Posts on a blog
–People in a blue-page-like application
•A Seedlist abstracts content and views – two URLs
–Differentiates between the piece of content itself, and the page(s) it’s accessible from
–The crawler gets the content “essentials” (crawl URL), the user gets to see the content in the right context of the portal (display URL)
•A Seedlist optimizes crawling
–Timestamp and other mechanisms enable crawling only what’s necessary
•A Seedlist is a granular hierarchical construct
–For example, a Seedlist can represent libraries and folders in a document library system
–Seedlist hierarchies are discoverable
•A Seedlist contains (additional) metadata usually not part of the content itself (like security information)
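The concepts above can be sketched as a small data model. Everything named here (`SeedlistEntry`, its fields, the `"update"` / `"delete"` action values and `changed_since`) is hypothetical and chosen for illustration; it does not reproduce the actual seedlist format.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SeedlistEntry:
    crawl_url: str     # where the crawler fetches the content itself
    display_url: str   # where the user sees it in the portal context
    updated: datetime  # time of the last life-cycle event for the item
    action: str = "update"  # hypothetical values: "update" or "delete"
    acl: list = field(default_factory=list)  # security metadata


def changed_since(entries, last_crawl):
    """Timestamp-based incremental crawl: keep only newer entries."""
    return [e for e in entries if e.updated > last_crawl]
```

A crawler built on such a model would fetch `crawl_url`, index the item together with its `acl`, and hand `display_url` to the user in the result list.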


WCM Seedlist V1.0 support
•Available with WebSphere Portal 6.1.0.2, default with V6.1.5
–Latest Seedlist support is turned off by default
•How to enable it:
–WP ConfigService > Custom properties
  •wcm.config.seedlist.version=1.0
  •wcm.config.seedlist.servletpath=/seedlist
•Support of seedlist format 1.0
–Open, ATOM-based format
–Access control information allowing the crawler to filter based on the user
–Two URLs provided:
  •one for the crawler to fetch the content object
  •the ‘displayURL’, which is given to the user in the search result
•Support of custom meta-data in the seedlist
–Use cases: search key words
–How to enable it:
  •WP ConfigService > Custom properties
    –wcm.config.seedlist.metakeys=
  •Add a custom meta-data field to WCM content
    –Add a new Text Component with a name that you've specified in the ConfigService

Required package: http://www-01.ibm.com/software/brandcatalog/portal/portal/details?catalog.label=1WP1001S6


Search improvements with Web Content pages
Wiring of WCM content with the Portal site infrastructure ….

1. Portal page is of type ‘content’ …
2. .. and is associated with a ‘sitearea’ in a WCM library

What is achieved now is that when a content object needs to be rendered in the Portal, a ‘content handler’ checks which sitearea the content belongs to and retrieves the ID of the Portal page associated with it. This makes it possible to generate the correct URL, so that the content is displayed in the correct context of the Portal.

Search improvements with Web Content pages
WCM and Portal search integration improvements

1. User enters search term
2. User then selects a search result link
3. The WCM page mapped to the selected content is then navigated to and the content is shown

Note: web crawlers following links with IBM Web Content portlets

If a web crawler needs to follow such links, a Portal configuration change is required!

• URL normalization out-of-the-box drops render parameters from the URL
• However, those are required to follow links within WCM portlets to get at the referenced content
• Solution: change the “State manager” properties to include “renderparameters” in the normalized URL
• For details see: http://publib.boulder.ibm.com/infocenter/wpdoc/v6r0/topic/com.ibm.wp.zos.doc/wps/srvcfgref.html#srvcfgref__state_manager


Third party search software integration – why it is not easy ….
• For public portal pages – authentication not required
– Portal can be crawled and searched just like ‘any other website’
• For secured Portal pages
– the ‘page-by-page’ and ‘following links’ paradigm will not always work
– reason: different levels of security, e.g. portlet level
• Requires a different crawling approach
– Seedlists!
– … compare to sitemaps
• Autonomy today already provides the “IBM Connector” (seedlist crawler)


IBM Search technologies


Comparing the two IBM search engines ….

                                   Embedded: Portal Search               Enterprise: OmniFind 8.5
Scale and performance              Up to 800K docs in a single index     20 million
‘Reach’ – repositories supported   HTTP accessible content/pages         > 40 content or information stores
Quality                            Excellent                             Excellent, plus add’l ranking algorithms
Stability                          No failover support                   Failover support with 4-node configuration
Security                           For Portal controlled resources       Support provided for many backends, e.g. supports Portal controlled resources

IBM Search technology – base features
•Search and Administration user interfaces
–one or more Search Services
–Federated search capabilities
–new Search Center with Search Scopes
–Advanced search available with Search & Browse portlet or OmniFind Search portlet
•IBM Standard search interfaces (SIAPI)
•Crawlers for Web Sites, Portal site, WCM libraries
–the latter two using a ‘sitemap-like’ approach (Seedlist)
–OmniFind adds a richer set of crawlers (Domino, CMSs, RDBs, etc.)
•Support for more than 250 document formats
•Categorization and taxonomies – rules based
•Summarization (static/dynamic)


Search Center V6.1


Integration of various search services using the Search Center

[Figure: the Search Center combines local search (websites, WCM, Portal site), Remote Search 1 and Remote Search 2 (EJB, SOAP), Quickr, Domino, Intranet and IBM OmniFind Enterprise Edition.]

Improved end user search experience


New features in Portal Search V6.1.x – overview
•The Search Center is more responsive now that it exploits Asynchronous JavaScript™ and XML (AJAX)
•Portal Search can be extended to display results from external search engines using the External Search Results portlet
•Suggested search results can be displayed separately from the regular results
•The default scope can be configured, and the All Sources scope can be removed if required
•Portal content can be searched remotely through a REST service


Search Center V6.1.x

WPTC Portal Intranet


Search Center – search results view


Page Layout of the Portal search page


Configuring External Search Results portlet
Example using the developerWorks search service

1.Locate the Manage Portlets portlet

2.Assign to its configuration parameter ‘searchEngineUrl’ the search service’s query URL

http://www.ibm.com/developerworks/views/rss/customrssatom.jsp?zone_by=Lotus&search_by=${searchTerms}
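The `${searchTerms}` placeholder in the query URL stands for the user's search terms. The sketch below shows how such a template could be expanded; the function name `build_query_url` and the use of form-style encoding are assumptions for illustration, since the slide does not show how the portlet performs the substitution.

```python
from urllib.parse import quote_plus

# Placeholder name as it appears in the 'searchEngineUrl' example.
PLACEHOLDER = "${searchTerms}"


def build_query_url(template: str, search_terms: str) -> str:
    """Insert URL-encoded search terms into a query URL template."""
    return template.replace(PLACEHOLDER, quote_plus(search_terms))
```

For example, expanding the developerWorks template with the terms `portal search` gives a URL ending in `search_by=portal+search`.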


Working with Suggested Links portlet
1.Create a search collection, in this case ‘Suggested Links’ as an example

2.‘Add document’ lets you enter the URL of the web page


Working with Suggested Links portlet – part 2
1.Add keywords and other information in the form on the right (Update content) and click on ‘OK’

2.Last step is to configure the ‘Suggested Links’ portlet


More flexibility with Search Scopes

•Change the order of your Search Scopes
•Plus: you may delete and re-create ‘All sources’, e.g. to cover only a subset of the search collections


Search REST services
•The Search service for executing a query has the following URL:
–://://search?
–Example: http://www.example.com:8080/searchfeed/myportal/search?query=test&scope=&results=10
•Parameters (subset listed here):
–scope - the id of the scope you want to search (required)
  •However, if sent as scope=, the search runs in the All Sources scope
–query - your search terms (required)
–start - the offset of the first result for paging (optional); default is zero
–results - the number of results to retrieve (optional); default is 10
–queryLang - the language of the query (optional)
–output - the output format (optional)
  •Default is Atom
  •Alternative value: text/html

For details see: http://download.boulder.ibm.com/ibmdl/pub/software/dw/lotus/quickrdevguide.pdf
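A query URL with the parameters listed above can be assembled as in the following sketch. The base URL is the example from the slide; the helper name `search_url` and its keyword arguments are assumptions for illustration, not part of the service.

```python
from urllib.parse import urlencode


def search_url(base, query, scope="", start=None, results=None):
    """Build a search REST URL; an empty scope means All Sources."""
    params = {"query": query, "scope": scope}
    # Optional parameters are only sent when given, so the service
    # defaults (start=0, results=10) apply otherwise.
    if start is not None:
        params["start"] = start
    if results is not None:
        params["results"] = results
    return f"{base}?{urlencode(params)}"
```

Calling `search_url("http://www.example.com:8080/searchfeed/myportal/search", "test", results=10)` reproduces the example URL shown above.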


Search REST services – example ATOM output
•Query:

http://www.example.com:8080/searchfeed/myportal/search?query=test&scope=&results=1

•Values returned in the feed (XML markup not reproduced):

Search results for query test on all sources
Enterprise Search API Web Service.
[https://homer.haifa.ibm.com:10035/searchfeed/myportal/search?scope=&query=test&results=1]
2007-08-08T08:43:39.500+00:00
2 0 1
An Introduction to Mixed-Signal IC Test
uid=quikradm,o=default organization
100.0
2007-07-30T14:15:58.000+00:00
[]
/lotus/themes/html/QPG/icons/scope_search_docs.gif


Crawl and index into a single search collection – versus federated search

Single search index (search collection)
● Enterprise search applications (OmniFind)
● Relevancy calculation based on the heuristics of the entire corpus

Multiple search indexes – federated search scenarios

Same search technology
● Federated search with relevancy ranking based on heuristics within the individual search index
● Comparable rankings, thus a merged search result view is viable
● Secure access to certain search indexes for specific user groups

Different search technologies
● Federated search with relevancy ranking based on heuristics within the individual search index AND different ranking algorithms
● Rank scores are not comparable, thus no single merged result view is possible (or advisable)
● Options: use the External Search Results portlet or an external search service Search Scope
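For the "same search technology" case, where rank scores are comparable, a merged result view amounts to interleaving the per-index result lists by score. A minimal sketch, assuming each index returns `(score, title)` tuples already sorted by descending score (the tuple shape and the name `merge_results` are hypothetical):

```python
import heapq
from itertools import islice


def merge_results(*result_lists, limit=10):
    """Merge per-index result lists into one list, best score first.

    Only meaningful when all indexes use the same ranking algorithm,
    so that their scores are comparable.
    """
    merged = heapq.merge(*result_lists, key=lambda r: r[0], reverse=True)
    return list(islice(merged, limit))
```

With different search technologies this merge would silently mix incomparable scores, which is why the slide recommends keeping those results in separate views.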


Search federation – adding Quickr search to the picture …
Configuring a new search service for a Quickr server:
1.Prerequisite is that both Portal and Quickr servers are using the correct SSL certificate:
–Use the IBM Key Management tool (ikeyman) to extract the certificate from the Quickr server and import it into the Portal server
2.Add a new content search service to the local portal server
–under ‘New Search Service’ and Search service implementation, select the ‘Remote Content Server Search Service Type’
3.Configure the new Search Service by setting the following parameters:
–RestServiceHost parameter
–define a secure port for the parameter RestServiceSecurePort (for example, 10035)
–use https for the RestServiceSecureProtocol entry
–provide a symbolic search collection identifier
4.When done, a new (virtual) search collection is automatically created as a placeholder for the Quickr search collection(s)
5.This search collection can be used, e.g., to define a Search Scope ‘Quickr search’

http://publib.boulder.ibm.com/infocenter/wpdoc/v6r1m0/topic/com.ibm.wp.ent.doc/admin/searchfederation.html


Search Seedlist – Lotus Connections
Lotus Connections V2.5.0.1 – seedlists are available through the following URLs:
1.Activities: http:///activities/seedlist/myserver
2.Blogs: http:///blogs/seedlist/myserver
3.Communities: http:///communities/seedlist/myserver
4.Forums: http:///communities/seedlist/forum/myserver
5.Bookmarks: http:///dogear/seedlist/myserver
6.Files: http:///files/seedlist/document/community
7.Profiles: http:///profiles/seedlist/myserver
8.Wikis: http:///wikis/seedlist/myserver

Use the above Seedlist URLs to add the respective crawler(s) to Portal Search or OmniFind.

Note: a post-processing step that filters out of the search results those documents users have lost access to since the last crawl is not currently supported for Lotus Connections. Crawling should be frequent enough to reduce the likelihood of false positive results.
http://www-01.ibm.com/support/docview.wss?uid=swg21422913&myns=swglotus&mynp=OCSSYGQH&mync=R


Portal Search administration


Overview - major tasks and resources for Portal Search

[Figure: the Search Center and the Search Administration work with search scopes and with the search services (Search Service A, Search Service B, Search Service Quickr) and their search collections.]

Each of the search collections can be individually secured.

High Availability for Portal Search Two options available:

Whitepaper available here: http://www.ibm.com/developerworks/websphere/zones/portal/proddoc/dw-w-portalsearch/


Search collection – backup and recovery
•Backup and recovery function:
–if Portal Search detects a corrupted search collection configuration or a corrupted search index, automatic recovery of the search collection from the backup files is initiated
•Configuration settings for all search collections are automatically backed up
•Default location is /collections_config_backup/
•How to activate:
–Go to Search Service administration
–Add the following configuration parameter:
  •RECOVERY_BACKUP_LOCATION, with a valid subdirectory as its value
•Verify that the backup has been created:
–assuming the search collection is ‘MyCollection’, the above subdirectory should contain a file called:
–c_c_scollections_ sMyCollection -2007.08.03.22.50.40.297


Search collection – managing the cleanup process
•The cleanup process is required to physically remove deleted entries, broken links and expired entries from the search collections
–by default, the process is started every day at midnight
•There are two options to define when the cleanup process is invoked:
–Set the search service property ‘CLEAN_UP_TIME_OF_DAY_HOURS’ to the hour of the day (0 to 23)
  Note: this applies only to search collections created thereafter!
–Configure the ‘content source’ to remove broken links after ‘0’ days; this invokes the cleanup process after every crawl



Extending search to reach out to even more repositories


IBM OmniFind Enterprise Edition just released: V9.1


OmniFind and WebSphere Portal

•OmniFind has been specifically engineered to integrate with IBM WebSphere Portal
•Benefits of integration
–Extends your reach into the enterprise
–Scales to millions of documents
–Honors native document level security of your enterprise content
–Adds advanced search features
–Platform for integration of text analytics to enable semantic search


OmniFind Enterprise Edition 9.1
•Enhanced End-User Experience
–High-performance faceted navigation
–Saved searches
–Search profiles
–Document previews
–And more…
•Enhanced Administrative Experience
–Scalability improvements
–Incremental indexing support
–Reduced resource requirements
–More flexible scale-out & HA
–New relevancy tuning options
–And more…
•Lucene & UIMA based platform
–Shared technology base with eDiscovery, Content Analytics and other IBM products

Introducing OmniFind Enterprise Edition 9.1
•Enables Knowledge Driven Search to:
–accelerate time to knowledge
–provide greater accuracy
–deliver business context with enterprise search
•OmniFind delivers on the Five Pillars of Knowledge Driven Search:
–Dynamic – Delivers complete dynamic facet capabilities, type-ahead search, query saving and result exporting, and is reactive to search-led content exploration
–Tailorable – Delivers business adjustable relevancy and UIMA standardisation
–Supportable – Delivers search on 5 platforms, connects to 30+ repositories
–Secure – Delivers enforced security across content repositories
–Scalable – Lucene-based index for enterprise level scalability

New Search UI


Type ahead search:
1. Suggests queries based on index content and past queries
2. Shows estimated results count as part of suggestion
3. Customizable by Search Administrators


Save your search and re-execute saved queries


Search within current results set


Quick select for file type searching


1 – Toggles document properties such as file size on and off
2 – Allows users to set individual results display preferences


Automatic query expansion suggestions and spell check


Thumbnail view for first page of documents in results page


Faceted search provides drill-through capabilities out-of-the-box and is customizable by the business

Numeric and Date Range Facets


Relevancy Tuning
•Static Score Tuning
–Ratio of static score contribution in the ranking algorithm is adjustable
•URL Pattern Matching
–Boost factor based on specific URL patterns
•Boost Word Dictionaries
–Automatic term boosting for words specified in custom dictionary
•Query Term Boosting
–Query term boosting based on query syntax


REST API
•Custom Search and Admin applications can be implemented with the REST API
•Language independent
•Provides all required functions for creating a search UI
–Search navigation
–Facet navigation
–Search functions
  •Faceted search
  •Fetch content, thumbnails and previous document
  •List spell correction, synonym expansions and type-ahead suggestions
  •And more…
•Provides required functions for administering search
–Managing collections
–Controlling and monitoring components
–Adding documents to a collection


Search Customizer
•Administrator can modify major search UI configurations through the customizer GUI
•Customization Points
–Server Configuration
  •Search server’s hostname, port, and timeout…
–Appearance
  •Displayed application name, logo image, show/hide links, data source icons…
–Default value for search UI preference
  •Search page, facets, top results, results, result columns
•No need to restart the search session

Search Customizer – Examples
•Show fields as a result table column
•Change the order of columns in results pages
•Add or remove custom fields

[Screenshots: default and customized results pages]

Extending Platform Support
•Leveraging existing infrastructure to deliver Enterprise Search
•IBM AIX V5.3 & 6.1 (64 bit)
•Microsoft Windows Server 2003 (32 bit), 2008 (64 bit)
•SUSE Linux 10 & 11 (32 & 64 bit)
•Red Hat AS 5.0 (32 & 64 bit)
•Oracle Solaris 10 (SPARC)
•New: z/Linux


Rich Support for IBM Collaboration Products
•Ongoing collaboration within IBM to optimize OmniFind search for WebSphere and Lotus
•Supports the latest Lotus protocols, frameworks and product releases sooner than competing products
•Designed to unlock the full value of Portal and Collaboration investments

Lotus Notes, Lotus Domino, Lotus Quickr, Lotus Connections, Lotus Sametime

Connectors to Enterprise Repositories
•DB2 Enterprise Server Edition 8.1, 9.1, 9.5, 9.7
•DB2 Express Edition 8.1, 9.1, 9.5, 9.7
•DB2 for iSeries 5.4 and 6.1
•DB2 for z/OS 8.1 and 9.1
•DB2 Workgroup Server Edition 8.1, 9.1, 9.5, 9.7
•Domino Document Manager 6.5.1, 7.0
•Domino R7, R8, R8.5
•EMC/Documentum 6.0, 6.5
•FileNet P8 CM 4.0, 4.5, 4.5.1
•Hummingbird DM 5.1.0.5, DM 6.0.4 with SR6
•IBM Content Manager 8.4, 8.4.1, 8.4.2
•IBM IMS 10, 11.01
•Informix Dynamic Server Enterprise Edition 11.10, 11.50
•Lotus Connections 2.0, 2.0.1, 2.5.01
•Lotus Notes R6.5, 6.5.1, R7, R8, R8.5
•Lotus QuickPlace 6.5.1 and 7.0
•Lotus Quickr Services for Domino 8.0, 8.1, 8.2
•Lotus Quickr Services for WebSphere Portal 8.0.0.2, 8.1
•Lotus Web Content Management 6.1, 6.1.5
•Microsoft Exchange 2000, 2003
•Microsoft SQL Server Enterprise 2005, 2008
•MySQL 5.0
•Network News Transfer Protocol (NNTP) newsgroups
•Open Text Livelink Enterprise Server 9.6, 9.7, 9.7.1
•Oracle 9i, 10g, 11g
•QuickPlace 6.5.1, 7.0
•SharePoint Server 2003 SP2, SharePoint Server 2007
•Software AG Adabas 7.1
•Sybase 11.9.2, 12.0, 12.5.x
•UNIX file systems
•VSAM for z/OS 1.4
•Web servers (HTTP or HTTPS)
•WebSphere Portal 6.0, 6.0.1, 6.1, 6.1.5
•Windows 2000, 2003, 2008 Server
•Workplace Web Content Management 2.5, 2.5.1, 6.0, 6.0.1


WebSphere Portal Integration
•OmniFind Enterprise Edition (OEE) V9.1 provides knowledge-driven enterprise search capabilities to WebSphere Portal and related products
  –Provides a new search portlet and ESSearchPortlet (for classic search collections)
  –Provides a service for the search center
  –Provides search bar integration


Query Statistics
•The query statistics UI shows:
  –Changes over time in the number of queries, number of users, average response time (ms), and worst response time (ms)
  –Query popularity
  –History of submitted queries

•Query statistics enables you to:
  –Export history data to a CSV file
  –Change the time range, collection, or user ID
  –Switch the display between charts and a table
  –Refresh data automatically
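To make the metrics above concrete, here is a minimal Python sketch of how they could be derived from a query history. It is not the OmniFind implementation or its export format: the `QueryRecord` fields, the function names, and the CSV column layout are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class QueryRecord:
    """One submitted query (hypothetical record layout)."""
    user_id: str
    query: str
    response_ms: int


def summarize(records):
    """Compute the four headline metrics listed on the slide."""
    if not records:
        return {"queries": 0, "users": 0, "avg_ms": 0.0, "worst_ms": 0}
    times = [r.response_ms for r in records]
    return {
        "queries": len(records),
        "users": len({r.user_id for r in records}),
        "avg_ms": mean(times),
        "worst_ms": max(times),
    }


def popularity(records):
    """Rank queries by how often they were submitted (case-insensitive)."""
    return Counter(r.query.lower() for r in records).most_common()


def to_csv(records):
    """Serialize the query history as CSV text (assumed column names)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["user_id", "query", "response_ms"])
    for r in records:
        writer.writerow([r.user_id, r.query, r.response_ms])
    return buf.getvalue()
```

Filtering by time range, collection, or user ID would amount to selecting a subset of records before calling `summarize`.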


Information Security Is Critical
•Not all users are allowed to search all corporate information
•OmniFind supports both early binding (access tokens indexed with the documents) and late binding (access checked against the data source at search time) to achieve maximum security while maintaining optimum search performance

Pessimistic security ensures information confidentiality


Document Level Security
1. At indexing time, extract and index the ACLs or static tokens
2. At search time, get the user credentials (SSO, LDAP, LTPA)
3. At search time, pre-filter results against the indexed tokens/ACLs
4. Post-filter results against the secure data source; return results

[Diagram: crawler, crawler plug-in, parser, indexer, search index, and plug-in, labeled with steps 1–4]
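The four steps can be sketched in a few lines of Python. This is an illustration of the pre-filter/post-filter idea only, not OmniFind code: `IndexedDoc`, `secure_search`, and the `source_allows` callback are hypothetical names, and step 2 (resolving credentials via SSO, LDAP, or LTPA) is assumed to have already produced `user_id` and `user_tokens`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDoc:
    doc_id: str
    text: str
    acl_tokens: frozenset  # step 1: tokens captured at indexing time


def pre_filter(hits, user_tokens):
    """Step 3: keep hits whose indexed tokens overlap the user's tokens."""
    return [d for d in hits if d.acl_tokens & user_tokens]


def post_filter(hits, user_id, source_allows):
    """Step 4: re-check each remaining hit against the live data source."""
    return [d for d in hits if source_allows(user_id, d.doc_id)]


def secure_search(index, term, user_id, user_tokens, source_allows):
    """Match on text, then apply the cheap filter before the expensive one."""
    hits = [d for d in index if term.lower() in d.text.lower()]
    return post_filter(pre_filter(hits, user_tokens), user_id, source_allows)
```

Pre-filtering is cheap because it uses only indexed data. Post-filtering catches permissions that changed after the last crawl, at the cost of one call to the source per remaining hit.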


Flexible Multi-Server Configuration
•Supports scalable and flexible configurations
  –Multiple document processing nodes
  –Multiple search runtime nodes
  –HA cluster for crawler and for document processing & indexer

[Diagram: crawl, doc processing (parser/UIMA pipeline), index, and search nodes with CDSR and RDS components; search scale-out across multiple search/index nodes; HA cluster]


Distributed Server HA
•In a distributed configuration, both the crawler server and the indexer server can have a backup server
  –There may be multiple remote search servers
  –Master and backup servers should share the same storage, available at the same path
  –Master/backup configuration is separate for the indexer server and the crawler server
    •For example, you may have a backup for the index server but not for the crawler server

[Diagram: master crawler, backup crawler, master indexer, backup indexer, and multiple search servers]
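The rules on this slide can be expressed as a small consistency check. The Python sketch below is purely illustrative and does not reflect how OmniFind stores or evaluates its topology: `Node`, `Role`, `validate`, and `active_node` are hypothetical names, and the health flag stands in for whatever monitoring a real cluster uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    host: str
    data_path: str  # where the shared storage is mounted on this host
    healthy: bool = True


@dataclass(frozen=True)
class Role:
    """A crawler or indexer role; the backup is optional and per role."""
    name: str
    master: Node
    backup: Optional[Node] = None


def validate(role):
    """Return a list of problems; an empty list means the role is consistent."""
    problems = []
    if role.backup is not None and role.backup.data_path != role.master.data_path:
        problems.append(f"{role.name}: master and backup must use the same storage path")
    return problems


def active_node(role):
    """Prefer the master; fall back to a healthy backup if one is configured."""
    if role.master.healthy:
        return role.master
    if role.backup is not None and role.backup.healthy:
        return role.backup
    return None
```

Because each role carries its own optional backup, an indexer with a backup and a crawler without one is a valid combination, matching the last bullet on the slide.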


Apache UIMA – text analytics
•OASIS Standard as of March 2009
•Enables interoperability of different analytics solutions and enterprise applications
•Provides an SDK for building and composing text analytics
•Defines a common interface for integrating text analysis modules
•Enables development of new and reuse of existing components for analysis

Text Analysis Modules – aka “Annotators”
•Identify Language
•Find Words & Roots
•Categorization
•Named-entity extraction
•Identify Relationships

Identify Relevant Entities → Build Structure
•People, places, organizations, relationships
•Parts, problems, conditions
•Topics, products, interests, sentiment
•Times, events, threats, plots, associations

[Diagram: text passes through the annotators; the extracted metadata and facts feed the search index, database, and applications]
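The idea of a common interface for composable annotators can be shown with a short Python sketch. This is not the UIMA SDK (which is Java-based) and none of these class names come from it: `Annotation`, `Document`, `Annotator`, and the two example annotators are assumptions chosen to show how modules that share one `process` method can be chained.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    kind: str   # e.g. "token" or "place"
    begin: int  # character offset where the span starts
    end: int    # character offset just past the span
    value: str


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


class Annotator:
    """Common interface: every module adds annotations to the document."""

    def process(self, doc):
        raise NotImplementedError


class TokenAnnotator(Annotator):
    """Find words (a stand-in for the 'Find Words & Roots' stage)."""

    def process(self, doc):
        for m in re.finditer(r"\w+", doc.text):
            doc.annotations.append(Annotation("token", m.start(), m.end(), m.group()))


class DictionaryEntityAnnotator(Annotator):
    """Mark tokens found in a term list (a toy named-entity extractor)."""

    def __init__(self, kind, terms):
        self.kind = kind
        self.terms = {t.lower() for t in terms}

    def process(self, doc):
        # Builds on token annotations produced earlier in the pipeline.
        tokens = [a for a in doc.annotations if a.kind == "token"]
        for a in tokens:
            if a.value.lower() in self.terms:
                doc.annotations.append(Annotation(self.kind, a.begin, a.end, a.value))


def run_pipeline(text, annotators):
    """Run each annotator in order over a shared document."""
    doc = Document(text)
    for annotator in annotators:
        annotator.process(doc)
    return doc
```

Later stages read what earlier stages wrote, so new analysis modules can be added or swapped without changing the others. The resulting annotations are the kind of extracted metadata that could then be indexed.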


For More Information (1)
WebSphere Portal – IBM Site
  http://www-3.ibm.com/software/genservers/portal/
WebSphere Portal Information Center
  http://www.ibm.com/developerworks/websphere/zones/portal/proddoc.html
WebSphere Portal Business Solutions Catalog (on Lotus Greenhouse)
  https://greenhouse.lotus.com/catalog/home_full.xsp?fProduct=WebSphere%20Portal
WebSphere and Lotus Web Content Management Portal Open Beta
  https://www14.software.ibm.com/iwm/web/cc/earlyprograms/lotus/portalopenbeta/
WebSphere Portal Blog
  https://www.ibm.com/developerworks/mydeveloperworks/blogs/WebSpherePortal/


For More Information (2)
IBM Lotus Connections
  http://www.ibm.com/software/lotus/products/connections
IBM Lotus Forms
  http://www.ibm.com/software/lotus/forms
IBM Lotus Quickr
  http://www.ibm.com/lotus/quickr
IBM Lotus Sametime
  http://www.ibm.com/lotus/sametime
WebSphere Commerce
  http://www.ibm.com/websphere/commerce
WebSphere Process Server and Business Process Automation
  http://www.ibm.com/software/integration/wps


We Value Your Feedback!
Please complete the survey for this session:

TECH-B14
Session Speaker: Andreas Prokoph


Questions?


© IBM Corporation 2010. All Rights Reserved. The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

IBM, the IBM logo, Lotus, Lotus Notes, Notes, Domino, Quickr, Sametime, WebSphere, UC2, PartnerWorld are trademarks of International Business Machines Corporation in the United States, other countries, or both. Unyte is a trademark of WebDialogs, Inc., in the United States, other countries, or both.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

