2010 Exceptional Web Experience TECH-B14
Integrating Search in WebSphere Portal: Best Practices
Andreas Prokoph, Lead architect – Search in Portal and WCM, Portal development
IBM Portal Excellence Conference
July 19 – 22, 2010 Chicago, Illinois © 2010 IBM Corporation
2010 Exceptional Web Experience Conference
Agenda
• Overview – Portal and Search
• Searching in Portal sites and WCM content
  – What about the other search engine vendors?
• Enhanced end-user search experience
• Extending the reach – enterprise search is the solution
  – Overview – OmniFind Enterprise Edition V9.1
• Questions
Search Architectures in the Enterprise
[Diagram with three layers. Applications: Intranet Search, Employee Portals, Employee Directories, Corporate Info & Commerce Search, Embedded Search, Enterprise Search, Customer Services, Sales Force Info Center. Search Services: search indexes or collections. Content Sources: e-mail, CRM, portal, directory, file, content, web and news servers or systems.]
Search engines and Portals
[Diagram: users interact with portal pages and portlets, subject to security and personalization. Search technologies (crawlers, sitemaps, seedlists) connect the search engine with the content and information sources: CRM application, databases, content management, collaboration, eHR, syndicated content, web services and other web content.]
General aspects of integrating search services into WebSphere Portal
Public Portal site - Lufthansa.de
The past: standard web crawlers and the ‘Ooops-effect’ …
This set of pages represents the structure of the Portal site.
[Diagram label: still the case with secured pages]
Web crawlers: this set of pages is what the crawler retrieves and assumes to be unique, based on the link structure of the site.
Search indexes
Result:
• a few thousand pages grow into hundreds of thousands, with tons of duplicate pages
• the crawler might have to give up, with no end of the site in sight
• few or no pages will be indexed
WebSphere Portal – crawlability enablement
Portal Server recognizes the crawler and triggers URLs to be published in normalized form.
Web crawlers: public pages, unauthenticated access
Normalized URL = all navigational state information is discarded from the URL
Search indexes
Result:
• no more ‘duplicate’ pages
• all linked and public Portal pages are crawled and indexed
The URL “problem” – at least for a standard web crawler, that is
•In the past – and back then this was valid – a URL and the referenced resource were assumed to be a web page
•However – per W3C standard – a URL uniquely identifies a resource
•Such a resource could be the same ‘page’, but with partially different content displayed
•What that ‘content’ is, is determined by the referenced server
•The server might determine what content to show based on a user’s interaction with, e.g., a web application in the current session
•WebSphere Portal is such a web application: through portlets it makes information and content available to users, who may interact with backend applications through those portlets
•The portal URL maintains navigational state information as well as portlet specific parameters (and more, of course)
Search engine and user friendly-URLs
•Friendly-URLs result in human readable URL prefixes that lead to portal pages
•Each content node might have a friendly name assigned
•The friendly-URL is a hierarchical path constructed from these names based on the content topology (see URL mappings)
•Every URL that is generated by WP APIs will contain the friendly-path automatically
  –It is even guaranteed that every URL that leads to a particular page will start with the page’s friendly-path

Content nodes: root, home, shop, info, shoes
Example URLs:
/wps/portal/home
/wps/portal/home/shop
/wps/portal/home/shop/shoes
/wps/portal/home/shop/shoes/!ut/p/04_SB8K8xLLM9MS...
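The guarantee that every URL for a page starts with the page's friendly-path can be illustrated with a small sketch. It assumes, based only on the example URLs above, that the encoded navigational state begins at a path segment starting with `!ut`. The function name `friendly_path` is hypothetical, and this is not how Portal itself normalizes URLs.

```python
from urllib.parse import urlsplit

# Assumption: the state segment starts with "!ut", as in the example URL above.
STATE_MARKER = "!ut"


def friendly_path(url: str) -> str:
    """Return the URL path up to (not including) the encoded navigational state."""
    kept = []
    for segment in urlsplit(url).path.split("/"):
        if segment.startswith(STATE_MARKER):
            break  # everything from here on is navigational state
        kept.append(segment)
    return "/".join(kept).rstrip("/")


urls = [
    "/wps/portal/home/shop/shoes",
    "/wps/portal/home/shop/shoes/!ut/p/04_SB8K8xLLM9MS...",
]
# Both URLs collapse to the same page once the state is stripped.
unique_pages = {friendly_path(u) for u in urls}
```

A crawler that compared only the stripped paths would treat both URLs as the same page, which is the effect the friendly-path prefix is meant to enable.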
Portals differ from regular websites – above all when security is involved. Unfortunately, web crawlers don’t know that.
ACL/Personalization: this portlet is seen only by a small group of users!
This is the content we want to be searchable.
Standard web crawler and the ‘Ooops’ effect …
[Diagram: Welcome page, Page A, Page B and Page C, linked by URL-A to URL-E; secured pages, authenticated access]
Portal encodes additional information about the user's navigational state in URLs, such as which page the user comes from and in what state it was left (e.g. a specific portlet was maximized).
Information encoded within URLs:
URL-A – Target: Page A, coming from Welcome page
URL-B – Target: Page B, coming from Welcome page
URL-C – Target: Page C, coming from Page A
URL-D – Target: Page A, coming from Page C
URL-E – Target: Page B, coming from Page C
A crawler would want to assume:
URL-A and URL-D to be identical
URL-B and URL-E to be identical
Crawling becomes an even more complex task
•Search engines have a hard time keeping up with building crawlers
  –New content sources and third-party content systems are proliferating
  –Where the content is stored and how it is stored is no longer straightforward
•Crawlers are tightly coupled with content protocols
  –Notes, Domino, DB, File System etc. each have their own access protocol
•ACL mechanisms are as varied as content sources
•Standard crawling is becoming more and more inefficient
  –Crawling backend servers, BUT: having to show the results in the appropriate context
More crawling issues to take into consideration …
•Content meta-data is growing rapidly
  –Who delivers the metadata to the 'page'? Competing portlets, etc.
  –Needs to be indexed in a generic way
  –Needs to be harmonized across content types
  –Is often customized
•Web 2.0 is creating a chasm between content and views
  –It’s not necessarily the case anymore that a piece of content and the page it’s viewed on are the same thing
  –Crawling the glass (the view) is giving way to crawling the content store
  –The MVC pattern is showing its face once again, but now it’s dynamic
Today’s options of integrating applications with “Search”
[Diagram: two approaches, labelled “Today” and “Past and today”, each connecting App1 to App5 with a search server and search index]
Applications feed the search server through a programmatic interface:
• Proprietary interfaces (search engines)
• Too many to support
• Resulted in: vendor specific solutions
• Verity is one of the few popular search vendors
Crawlers (Crawler 1 to 3) fetch content from the applications:
• Proprietary interfaces (repositories/servers)
• Too many to support
• Resulted in: vendor specific solutions
• HTTP protocol is one of the very few standards, but lacks search specific semantics
Doing it the right way
We need a new search integration paradigm for applications seeking to make their content/information searchable.
Seedlists are used in Portal for:
–Portal pages and portlets
–WCM content – when using the new search admin feature now in WCM
–Quickr document libraries, etc.
Google supports ‘Sitemaps’ for Internet content today:
–Sitemap 0.90 protocol (www.sitemaps.org)
–Their search appliance has also introduced an extended sitemap protocol
–Note: Search Engine Utility portlet for Portal available in the Portal catalog
Sitemaps (entry point enumeration) and Seedlists (content enumeration) are different but complementary.
This is a ‘necessity to come’ as the Internet, and how we use it, is re-shaping itself …
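To make "entry point enumeration" concrete, here is a minimal sketch that writes a Sitemap 0.90 document with Python's standard library. The namespace and the `urlset`, `url`, `loc` and `lastmod` elements come from the public sitemaps.org protocol. The page URLs, dates and the function name `build_sitemap` are made up for illustration and do not come from the Search Engine Utility portlet.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a Sitemap 0.90 <urlset> from (location, last-modified) pairs."""
    ET.register_namespace("", SITEMAP_NS)  # serialize without a namespace prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical entry points of a portal site.
sitemap_xml = build_sitemap([
    ("http://www.example.com/wps/portal/home", "2010-07-19"),
    ("http://www.example.com/wps/portal/home/shop", "2010-07-20"),
])
```

A sitemap only lists where a crawler should start; it says nothing about security or about a separate display URL, which is the gap seedlists are meant to fill.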
Benefits of integrating applications through Seedlists
[Diagram: App1 to App5 (Portal, WCM, Quickr, Connections, Document Libs, etc.) publish their content through seedlists; a seedlist crawler feeds the search server and search index]
Search integration can be done through any search software that supports ‘Seedlists’.
Seedlists go beyond HTTP and HTML to provide search specific semantics and information.
Seedlist concepts
•A Seedlist is “simply” an enumeration of content items and life-cycle events
  –Documents in a document library application
  –Posts on a blog
  –People in a blue-page-like application
•A Seedlist abstracts content and views – two URLs
  –Differentiates between the piece of content itself and the page(s) it’s accessible from
  –The crawler gets the content “essentials” (crawl URL), the user gets to see the content in the right context of the portal (display URL)
•A Seedlist optimizes crawling
  –Timestamp and other mechanisms enable crawling only what’s necessary
•A Seedlist is a granular hierarchical construct
  –For example, a Seedlist can represent libraries and folders in a document library system
  –Seedlist hierarchies are discoverable
•A Seedlist contains (additional) metadata usually not part of the content itself (like security information)
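The concepts above can be sketched as a small data structure. All names here (`SeedlistEntry`, its fields, `entries_to_process`) are hypothetical and only mirror the ideas on the slide; they are not the actual seedlist schema.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SeedlistEntry:
    # Field names are assumptions that mirror the concepts above.
    crawl_url: str          # where the crawler fetches the content "essentials"
    display_url: str        # where the user is sent from the search result
    updated: datetime       # timestamp used for incremental crawling
    action: str = "update"  # life-cycle event, e.g. "update" or "delete"
    acl: frozenset = field(default_factory=frozenset)  # security metadata


def entries_to_process(entries, last_crawl):
    """Keep only the entries that changed since the previous crawl."""
    return [e for e in entries if e.updated > last_crawl]


entries = [
    SeedlistEntry("http://host/raw/1", "http://host/portal/page?doc=1",
                  datetime(2010, 7, 1)),
    SeedlistEntry("http://host/raw/2", "http://host/portal/page?doc=2",
                  datetime(2010, 7, 20), action="delete"),
]
changed = entries_to_process(entries, last_crawl=datetime(2010, 7, 10))
```

Only the second entry is newer than the last crawl, so the crawler would touch a single item instead of re-reading the whole repository.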
WCM Seedlist V1.0 support
•Available with WebSphere Portal 6.1.0.2, default with V6.1.5
  –Latest Seedlist support turned off per default
•How to enable it:
  –WP ConfigService > Custom properties
    •wcm.config.seedlist.version=1.0
    •wcm.config.seedlist.servletpath=/seedlist
•Support of seedlist format 1.0
  –Open, ATOM-based format
  –Access control information allowing the crawler to filter based on the user
  –Two URLs provided:
    •one for the crawler to fetch the content object
    •‘displayURL’, which is given to the user in the search result
•Support of custom meta-data in the seedlist
  –Use cases: search key words
  –How to enable it:
    •WP ConfigService > Custom properties
      –wcm.config.seedlist.metakeys=
    •Add custom meta-data field to WCM content
      –Add a new Text Component with a name that you've specified in the ConfigService

Required package: http://www-01.ibm.com/software/brandcatalog/portal/portal/details?catalog.label=1WP1001S6
Search improvements with Web Content pages
Wiring of WCM content with the Portal site infrastructure:
1. Portal page is of type ‘content’ …
2. … and is associated with a ‘sitearea’ in a WCM library
What is achieved now: when a content object needs to be rendered in the Portal, a ‘content handler’ checks which sitearea the content belongs to and retrieves the ID of the Portal page associated with it. This makes it possible to generate the correct URL, so that the content is displayed in the correct context of the Portal.
Search improvements with Web Content pages
WCM and Portal search integration improvements
1. User enters search term
2. User then selects a search result link
3. The WCM page mapped to the selected content is then navigated to and the content is shown
Note: web crawlers following links with IBM Web Content portlets
If a web crawler needs to follow such links, a Portal configuration change is required!
• URL normalization out-of-the-box drops render parameters from the URL
• However, those are required to follow links within WCM portlets to get at the referenced content
• Solution: change “State manager” properties to include “renderparameters” in the normalized URL
• For details see: http://publib.boulder.ibm.com/infocenter/wpdoc/v6r0/topic/com.ibm.wp.zos.doc/wps/srvcfgref.html#srvcfgref__state_manager
Third party search software integration – why it is not easy …
• For public portal pages – authentication not required
  – Portal can be crawled and searched just like ‘any other website’
• For secured Portal pages
  – ‘page-by-page’ and ‘following links’ paradigm will not always work
  – reason: different levels of security – e.g. portlet level
• Requires a different crawling approach
  – Seedlists!
  – … compare to sitemaps
• Autonomy today already provides the “IBM Connector” (seedlist crawler)
IBM Search technologies
Comparing the two IBM search engines …
Criterion | Portal Search (embedded) | OmniFind 8.5 (Enterprise)
Scale and performance | Up to 800K docs in a single index | 20 million
‘Reach’ – repositories | HTTP accessible content/pages supported | > 40 content or information stores
Quality | Excellent | Excellent, plus additional ranking algorithms
Stability | No failover support | Failover support with 4-node configuration
Security | For Portal controlled resources | Support provided for many backends – e.g. supports Portal controlled resources
IBM Search technology – base features
•Search and Administration user interfaces
  –one or more Search Services
  –Federated search capabilities
  –new Search Center with Search Scopes
  –Advanced search available with Search & Browse portlet or OmniFind Search portlet
•IBM Standard search interfaces (SIAPI)
•Crawlers for Web Sites, Portal site, WCM libraries
  –the latter two using a ‘sitemap-like’ approach (Seedlist)
  –OmniFind adds a richer set of crawlers (Domino, CMSs, RDBs, etc.)
•Support for more than 250 document formats
•Categorization and taxonomies – rules based
•Summarization (static/dynamic)
Search Center V6.1
Integration of various search services using the Search Center
[Diagram: the Search Center combines local search (websites, WCM, Portal site) with Remote Search 1 and Remote Search 2, reached over EJB or SOAP; further sources shown are Quickr, Intranet, Domino and IBM OmniFind Enterprise Edition]
Improved end user search experience
New features in Portal Search V6.1.x – overview
•The Search Center is more responsive now that it exploits Asynchronous JavaScript and XML (AJAX)
•Extend Portal Search by displaying results from external search engines using the External Search Results portlet
•Suggested search results can be displayed separately from the regular results
•The default scope can now be configured, and the All Sources scope removed if required
•Portal content can be searched remotely through a REST service
Search Center V6.1.x
WPTC Portal Intranet
Search Center – search results view
Page Layout of the Portal search page
Configuring External Search Results portlet
Example using the developerWorks search service
1.Locate the Manage Portlets portlet
2.Assign the search service’s query URL to its configuration parameter ‘searchEngineUrl’:
http://www.ibm.com/developerworks/views/rss/customrssatom.jsp?zone_by=Lotus&search_by=${searchTerms}
Working with Suggested Links portlet
1.Create a search collection – in this case ‘Suggested Links’ as an example
2.‘Add document’ lets you enter the URL of the web page
Working with Suggested Links portlet – part 2
1.Add keywords and other information in the form on the right (Update content) and click on ‘OK’
2.Last step is to configure the ‘Suggested Links’ portlet
More flexibility with Search Scopes
•Change the order of your Search Scopes
•Plus: you may delete and re-create ‘All sources’ – e.g. to be a subset of search collections only
Search REST services
•The Search service for executing a query has the following URL:
  –://://search?
  –Example: http://www.example.com:8080/searchfeed/myportal/search?query=test&scope=&results=10
•Parameters: (subset listed here)
  –scope - the id of the scope you want to search (required)
    •However, if sent as scope=, it searches in the All Sources scope
  –query - your search terms (required)
  –start - the offset of the first result for paging (optional) – default is zero
  –results - the number of results to retrieve (optional) – default is 10
  –queryLang - the language of the query (optional)
  –output - the output format (optional)
    •Default is Atom
    •Alternative value: text/html
For details see: http://download.boulder.ibm.com/ibmdl/pub/software/dw/lotus/quickrdevguide.pdf
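The parameters above can be assembled into a query URL with a few lines of Python. This is a minimal sketch: the base URL is the one from the example, the helper name `build_search_url` is hypothetical, and only the `query`, `scope`, `start` and `results` parameters from the list are used.

```python
from urllib.parse import urlencode


def build_search_url(base, query, scope="", start=0, results=10):
    """Compose a search request URL; an empty scope means 'All Sources'."""
    params = {"query": query, "scope": scope, "start": start, "results": results}
    return f"{base}/search?{urlencode(params)}"


# Base URL taken from the example above.
url = build_search_url("http://www.example.com:8080/searchfeed/myportal", "test")
```

Using `urlencode` rather than string concatenation keeps search terms with spaces or special characters correctly escaped.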
Search REST services – example ATOM output
•Query: http://www.example.com:8080/searchfeed/myportal/search?query=test&scope=&results=1
•Values returned in the feed (XML markup not preserved):
  Search results for query test on all sources
  Enterprise Search API Web Service.
  [https://homer.haifa.ibm.com:10035/searchfeed/myportal/search?scope=&query=test&results=1]
  2007-08-08T08:43:39.500+00:00
  2 0 1
  An Introduction to Mixed-Signal IC Test
  uid=quikradm,o=default organization
  100.0
  2007-07-30T14:15:58.000+00:00
  []
  /lotus/themes/html/QPG/icons/scope_search_docs.gif
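Since the default output is Atom, a client can read the results with any XML parser. The sketch below uses only standard Atom elements (`feed`, `entry`, `title`, `updated`). The sample feed is a hand-written assumption that reuses values from the output above; the real feed carries further elements (score, author, icon) whose names are not shown on the slide.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hand-written sample; the element set of the real feed is an assumption.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Search results for query test on all sources</title>
  <updated>2007-08-08T08:43:39.500+00:00</updated>
  <entry>
    <title>An Introduction to Mixed-Signal IC Test</title>
    <updated>2007-07-30T14:15:58.000+00:00</updated>
  </entry>
</feed>"""


def entry_titles(feed_xml):
    """Return the title of every <entry> in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [entry.findtext(f"{ATOM}title") for entry in root.findall(f"{ATOM}entry")]


titles = entry_titles(SAMPLE_FEED)
```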
Crawl and index into a single search collection – versus federated search
● Single search index (search collection)
  ● Enterprise search applications (OmniFind)
  ● Relevancy calculation based on the heuristics of the entire corpus
● Multiple search indexes – federated search scenarios
  ● Same search technology
    ● Federated search with relevancy ranking based on heuristics within each individual search index
    ● Comparable rankings – thus a merged search result view is viable
    ● Secure access to certain search indexes for specific user groups
  ● Different search technologies
    ● Federated search with relevancy ranking based on heuristics within each individual search index AND different ranking algorithms
    ● → rank scores not comparable, thus no single merged result view possible (or advisable)
    ● options: use External results portlet or external search service Search Scope
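The distinction between comparable and non-comparable rank scores can be sketched as follows. This is an illustration of the reasoning only, not Portal's federation logic; the function name, engine names and scores are all hypothetical.

```python
def present_results(result_sets):
    """result_sets: list of (engine, hits); each hit is a (title, score) tuple."""
    grouped = {}
    for engine, hits in result_sets:
        grouped.setdefault(engine, []).extend(hits)
    if len(grouped) == 1:
        # Same search technology: scores are comparable, so one merged list is viable.
        (hits,) = grouped.values()
        return {"merged": sorted(hits, key=lambda hit: hit[1], reverse=True)}
    # Different technologies: scores are not comparable, keep one list per engine.
    return grouped


same = present_results([
    ("portal-search", [("Page A", 0.9), ("Page B", 0.4)]),
    ("portal-search", [("Doc C", 0.7)]),
])
mixed = present_results([
    ("portal-search", [("Page A", 0.9)]),
    ("other-engine", [("Doc X", 87.0)]),
])
```

In the mixed case the two lists would be shown side by side, for example in separate portlets, rather than interleaved by score.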
Search federation – adding Quickr search to the picture …
Configuring a new search service for a Quickr server:
1.Prerequisite: both the Portal and Quickr servers use the correct SSL certificate
  –Use the IBM Key Management tool (ikeyman) to extract the certificate from the Quickr server and import it into the Portal server
2.Add a new content search service to the local portal server
  –Under ‘New Search Service’, for the search service implementation select the ‘Remote Content Server Search Service Type’
3.Configure the new Search Service by setting the following parameters:
  –set the RestServiceHost parameter
  –define a secure port for the parameter RestServiceSecurePort (for example, 10035)
  –use https for the RestServiceSecureProtocol entry
  –provide a symbolic search collection identifier
4.When done, a new (virtual) search collection is automatically created as a placeholder for the Quickr search collection(s)
5.This search collection can be used, e.g., to define a Search Scope ‘Quickr search’
See: http://publib.boulder.ibm.com/infocenter/wpdoc/v6r1m0/topic/com.ibm.wp.ent.doc/admin/searchfederation.html
Search Seedlist – Lotus Connections
Lotus Connections V2.5.0.1 – seedlists available through the following URLs:
1.Activities: http:///activities/seedlist/myserver
2.Blogs: http:///blogs/seedlist/myserver
3.Communities: http:///communities/seedlist/myserver
4.Forums: http:///communities/seedlist/forum/myserver
5.Bookmarks: http:///dogear/seedlist/myserver
6.Files: http:///files/seedlist/document/community
7.Profiles: http:///profiles/seedlist/myserver
8.Wikis: http:///wikis/seedlist/myserver
Use the above Seedlist URLs to add the respective crawler(s) to Portal Search or OmniFind.
Note: a post-processing step on search results can filter out documents that users have lost access to since the last crawl. This is not currently supported for Lotus Connections, so crawling should be frequent enough to reduce the likelihood of false positive results.
http://www-01.ibm.com/support/docview.wss?uid=swg21422913&myns=swglotus&mynp=OCSSYGQH&mync=R
Portal Search administration
Overview - major tasks and resources for Portal Search
[Diagram: Search Center and Search Administration on top of Search Scopes and search services (Search Service A, Search Service B, Search Service Quickr)]
Each of the search collections can be individually secured
High Availability for Portal Search Two options available:
Whitepaper available here: http://www.ibm.com/developerworks/websphere/zones/portal/proddoc/dw-w-portalsearch/
Search collection – backup and recovery
•Backup and recovery function:
  –if Portal Search automatically detects a corrupted search collection configuration or a corrupted search index
  –automatic recovery of the search collection from the backup files is initiated
•Configuration settings for all search collections are automatically backed up
•Default location is /collections_config_backup/
•How to activate:
  –Go to Search Service administration
  –Add the following configuration parameter:
    •RECOVERY_BACKUP_LOCATION and as value a valid subdirectory
•Verify that backup has been created:
  –assuming the search collection is ‘MyCollection’, the above subdirectory should contain a file called:
  –c_c_scollections_ sMyCollection -2007.08.03.22.50.40.297
Search collection – managing the cleanup process
•The cleanup process is required to physically remove deleted entries, broken links and expired entries from the search collections
  –by default, the process is started every day at midnight
•There are two options available to define when the cleanup process is invoked:
  –Configure it through the search service property ‘CLEAN_UP_TIME_OF_DAY_HOURS’; specify the hour of the day (0 to 23)
   Note: this applies to search collections created thereafter!
  –Configure the ‘content source’ to remove broken links after ‘0’ days; this will invoke the cleanup process after every crawl
Search Center V6.1.x WPTC Portal Intranet
Extending search to reach out to even more repositories
IBM OmniFind Enterprise Edition just released: V9.1
OmniFind and WebSphere Portal
•OmniFind has been specifically engineered to integrate with IBM WebSphere Portal
•Benefits of Integration
  –Used to extend your reach into the enterprise
  –Scales to millions of documents
  –Honors native document level security of your enterprise content
  –Adds advanced search features
  –Platform for integration of text analytics to enable semantic search
OmniFind Enterprise Edition 9.1
•Enhanced End-User Experience
  –High-performance faceted navigation
  –Saved searches
  –Search profiles
  –Document previews
  –And more…
•Enhanced Administrative Experience
  –Scalability improvements
  –Incremental indexing support
  –Reduced resource requirements
  –More flexible scale-out & HA
  –New relevancy tuning options
  –And more…
•Lucene & UIMA based platform
  –Shared technology base with eDiscovery, Content Analytics and other IBM products
Introducing OmniFind Enterprise Edition 9.1
•Enables Knowledge Driven Search to:
  –accelerate time to knowledge
  –provide greater accuracy
  –deliver business context with enterprise search
•OmniFind delivers on the Five Pillars of Knowledge Driven Search:
  –Dynamic – Delivers complete dynamic facet capabilities, type-ahead search, query saving and result exporting, and is reactive to search-led content exploration
  –Tailorable – Delivers business adjustable relevancy and UIMA standardisation
  –Supportable – Delivers search on 5 platforms, connects to 30+ repositories
  –Secure – Delivers enforced security across content repositories
  –Scalable – Lucene-based index for enterprise level scalability
New Search UI
New Search UI
Type ahead search:
1. Suggests queries based on index content and past queries
2. Shows estimated results count as part of suggestion
3. Customizable by Search Administrators
New Search UI
Save your search and re-execute saved queries
New Search UI
Search within current results set
New Search UI
Quick select for file type searching
New Search UI
1 – Toggles document properties such as file size on and off
2 – Allows users to set individual results display preferences
New Search UI
Automatic query expansion suggestions and spell check
New Search UI
Thumbnail view for first page of documents in results page
New Search UI
Faceted search provides drill-through capabilities out-of-the-box and is customizable by the business
Numeric and Date Range Facets
Relevancy Tuning
•Static Score Tuning
  –Ratio of static score contribution in the ranking algorithm is adjustable
•URL Pattern Matching
  –Boost factor based on specific URL patterns
•Boost Word Dictionaries
  –Automatic term boosting for words specified in custom dictionary
•Query Term Boosting
  –Query term boosting based on query syntax
REST API
•Custom Search and Admin applications can be implemented with the REST API
•Language independent
•Provides all required functions for creating a search UI
  –Search navigation
  –Facet navigation
  –Search functions
    •Faceted search
    •Fetch content, thumbnails and previous document
    •List spell correction, synonym expansions and type-ahead suggestions
    •And more…
•Provides required functions for administering search
  –Managing collections
  –Controlling and monitoring components
  –Adding documents to a collection
Search Customizer
•Administrators can modify major search UI configurations through the customizer GUI
•Customization Points
  –Server Configuration
    •Search server’s hostname, port, and timeout…
  –Appearance
    •Displayed application name, logo image, show/hide links, data source icons…
  –Default value for search UI preference
    •Search page, facets, top results, results, result columns
•No need to restart the search session
Search Customizer – Examples
•Show fields as a result table column
•Change the order of columns in results pages
•Add or remove custom fields
Default
Customized
Extending Platform Support
•Leveraging existing infrastructure to deliver Enterprise Search
•IBM AIX V5.3 & 6.1 (64 bit)
•Microsoft Windows Server 2003 (32 bit), 2008 (64 bit)
•SUSE Linux 10 & 11 (32 & 64 bit)
•Red Hat AS 5.0 (32 & 64 bit)
•Oracle Solaris 10 (SPARC)
•New: z/Linux
Rich Support for IBM Collaboration Products
•Ongoing collaboration within IBM to optimize OmniFind search for WebSphere and Lotus
•Supports the latest Lotus protocols, frameworks and product releases sooner than competing products
•Designed to unlock the full value of Portal and Collaboration investments
Products shown: Lotus Notes, Lotus Domino, Lotus Quickr, Lotus Connections, Lotus Sametime
Connectors to Enterprise Repositories •Lotus Web Content Management 6.1, 6.1.5 •Microsoft Exchange 2000, 2003 •Microsoft SQL Server Enterprise 2005, 2008 •MySQL 5.0 •Network News Protocol Newsgroup (NNTP) •Open Text Livelink Enterprise Server 9.6, 9.7, 9.7.1 •Oracle 9i, 10g, 11g •QuickPlace 6.5.1, 7.0 •SharePoint Server 2003 SP2, Sharepoint Server 2007 •Software AG Adabas 7.1 •Sybase 11.9.2, 12.0, 12.5.x •UNIX file systems •VSAM for z/OS 1.4 •Web servers (HTTP or HTTPS) •WebSphere Portal 6.0, 6.0.1, 6.1, 6.1.5 •Windows 2000, 2003, 2008 Server •Workplace Web Content Management 2.5, 2.5.1, 6.0, 6.0.1
IBM Portal Excellence Conference
•DB2 Enterprise Server Edition 8.1, 9.1, 9.5, 9.7 •DB2 Express Edition 8.1, 9.1, 9.5, 9.7 •DB2 for iSeries 5.4 and 6.1 •DB2 for z/OS 8.1 and 9.1 •DB2 Workgroup Server Edition 8.1, 9.1, 9.5, 9.7 •Domino Document Manager 6.5.1, 7.0 •Domino R7, R8, R8.5 •EMC/Documentum 6.0, 6.5 •FileNet P8 CM 4.0, 4.5, 4.5.1 •Hummingbird DM 5.1.0.5, DM 6.0.4 with SR6 •IBM Content Manager 8.4, 8.4.1, 8.4.2 •IBM IMS 10, 11.01 •Informix Dynamic Server Enterprise Edition 11.10, 11.50 •Lotus Connections 2.0, 2.0.1, 2.5.01 •Lotus Notes R6.5, 6.5.1, R7, R8, R8.5 •Lotus QuickPlace 6.5.1 and 7.0 •Lotus Quickr Services for Domino 8.0, 8.1, 8.2 •Lotus Quickr Services for WebSphere Portal 8.0.0.2, 8.1
WebSphere Portal Integration
• OmniFind Enterprise Edition (OEE) V9.1 provides knowledge-driven enterprise search capabilities to WebSphere Portal and related products
– Provides a new search portlet and ESSearchPortlet (for classic search collections)
– Provides a service for the search center
– Provides search bar integration
Query Statistics
• The query statistics UI shows:
– Trends over time: number of queries, number of users, average response time (ms), worst response time (ms)
– Query popularity
– History of submitted queries
• Query statistics enables you to:
– Export history data to a CSV file
– Change the time range, collection or user ID
– Switch the display between charts and a table
– Refresh data automatically
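As a rough illustration of the figures listed above, the following sketch aggregates a query history that has been exported to CSV. The column names (`timestamp`, `user_id`, `query`, `response_ms`) and the sample rows are assumptions made for this example; the actual OmniFind export layout may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical export; the real OmniFind CSV layout may differ.
SAMPLE_CSV = """timestamp,user_id,query,response_ms
2010-07-19T09:00:00,alice,portal search,120
2010-07-19T09:01:00,bob,wcm content,340
2010-07-19T09:02:00,alice,portal search,80
"""

def summarize(csv_text):
    """Compute the headline numbers a query statistics view would report."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    times = [int(r["response_ms"]) for r in rows]
    popularity = Counter(r["query"] for r in rows)
    return {
        "queries": len(rows),                             # number of queries
        "users": len({r["user_id"] for r in rows}),       # distinct users
        "avg_response_ms": sum(times) / len(times) if times else 0.0,
        "worst_response_ms": max(times, default=0),
        "popular": popularity.most_common(3),             # query popularity
    }

stats = summarize(SAMPLE_CSV)
print(stats)
```

Restricting the rows by time range, collection or user ID before calling `summarize` would mirror the filters the UI offers.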
Information Security Is Critical
• Not all users are allowed to search all corporate information
• OmniFind supports both early binding and late binding to achieve maximum security while maintaining optimum search performance
Pessimistic security ensures information confidentiality
Document Level Security
1. At indexing, extract and index the ACLs or static tokens
2. At search time, get the user credentials (SSO, LDAP, LTPA)
3. At search time, pre-filter results against indexed tokens/ACLs
4. Post-filter results against the secure data source; return results
[Diagram: the numbered steps mapped onto the components Crawler, Plug-in, Parser, Indexer, Search Index and Search.]
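The four steps can be sketched as follows. This is a minimal illustration of combining early binding (pre-filtering on indexed tokens) with late binding (post-filtering against the live source), not OmniFind's implementation; the `Doc` class, the token strings and the `can_read_now` callback are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: str
    text: str
    # Step 1: ACL tokens are extracted and stored at indexing time.
    acl_tokens: set = field(default_factory=set)

def pre_filter(index, query, user_tokens):
    """Step 3: keep hits whose indexed tokens intersect the user's tokens."""
    return [d for d in index
            if query.lower() in d.text.lower() and d.acl_tokens & user_tokens]

def post_filter(hits, user, can_read_now):
    """Step 4: re-check each hit against the live source before returning."""
    return [d for d in hits if can_read_now(user, d.doc_id)]

index = [
    Doc("d1", "Portal search setup", {"group:staff"}),
    Doc("d2", "Portal salary report", {"group:hr"}),
    Doc("d3", "Portal roadmap", {"group:staff"}),
]

# Step 2: credentials resolved at search time (hypothetical values, e.g. from LDAP).
user, user_tokens = "alice", {"group:staff"}

# Hypothetical live ACL: access to d3 was revoked after the last crawl.
live_acl = {"d1": {"alice"}, "d2": {"carol"}, "d3": set()}

def can_read_now(u, doc_id):
    return u in live_acl.get(doc_id, set())

candidates = pre_filter(index, "portal", user_tokens)
results = post_filter(candidates, user, can_read_now)
print([d.doc_id for d in results])
```

The pre-filter is cheap because it uses only the index, so the costlier check against the data source runs on a much smaller candidate set. The post-filter catches permission changes made since the last crawl.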
Flexible Multi-Server Configuration
• Supports scalable and flexible configuration
– Multiple Document Processing Nodes
– Multiple Search Runtime Nodes
– HA Cluster for Crawler and Document Processing & Indexer
[Diagram: Crawl, Doc Processing (Parser/UIMA Pipeline), Index and Search nodes, with CDSR and RDS labels; search scale-out across multiple Search/Index nodes; HA cluster of Crawl and Doc Processing/Index nodes.]
Distributed Server HA
• A distributed-server configuration can also have a backup server for both the crawler and the indexer
– May have multiple remote search servers
– Master and backup servers should share the same storage, on the same path
– Master/backup configuration is separate for the indexer and crawler servers
• For example, you can have a backup for the index server but not for the crawler server
[Diagram: Master Crawler, Master Indexer, Backup Crawler, Backup Indexer, and three Search servers.]
Apache UIMA – text analytics
• OASIS Standard as of March 2009
• Enables interoperability of different analytics solutions and enterprise applications
• Provides an SDK for building and composing text analytics
• Defines a common interface for integrating text analysis modules
• Enables development of new components and re-use of existing components for analysis
Identify Relevant Entities → Build Structure
– People, places, organizations, relationships
– Parts, problems, conditions
– Topics, products, interests, sentiment
– Times, events, threats, plots, associations
[Diagram: Text passes through Text Analysis Modules (aka “Annotators”): Identify Language, Find Words & Roots, Categorization, Named-entity extraction, Identify Relationships. The Extracted Metadata and Facts feed the Search Index, Database and Applications.]
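The idea of a common interface for composable text analysis modules can be sketched in a few lines. This is a conceptual illustration in Python and does not use the UIMA SDK or its API; the `Annotator` base class, the two example annotators and the dictionary-based document are assumptions made for the example.

```python
import re

class Annotator:
    """Common interface: each module reads the shared document and adds annotations."""
    def process(self, doc):
        raise NotImplementedError

class TokenAnnotator(Annotator):
    """Stands in for the 'find words' stage."""
    def process(self, doc):
        doc["tokens"] = re.findall(r"\w+", doc["text"])

class EntityAnnotator(Annotator):
    """Toy dictionary lookup standing in for named-entity extraction."""
    KNOWN = {"IBM": "organization", "Chicago": "place"}

    def process(self, doc):
        doc["entities"] = [(t, self.KNOWN[t]) for t in doc["tokens"] if t in self.KNOWN]

def run_pipeline(text, annotators):
    """Run the annotators in order; later ones build on earlier annotations."""
    doc = {"text": text}
    for annotator in annotators:
        annotator.process(doc)
    return doc

doc = run_pipeline("IBM hosted the conference in Chicago.",
                   [TokenAnnotator(), EntityAnnotator()])
print(doc["entities"])
```

Because every module exposes the same `process` method, a new annotator can be added to the list, or an existing one swapped out, without changing the pipeline itself.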
For More Information (1)
WebSphere Portal – IBM Site
http://www-3.ibm.com/software/genservers/portal/
WebSphere Portal Information Center
http://www.ibm.com/developerworks/websphere/zones/portal/proddoc.html
WebSphere Portal Business Solutions Catalog (on Lotus Greenhouse)
https://greenhouse.lotus.com/catalog/home_full.xsp?fProduct=WebSphere%20Portal
WebSphere and Lotus Web Content Management Portal Open Beta
https://www14.software.ibm.com/iwm/web/cc/earlyprograms/lotus/portalopenbeta/
WebSphere Portal Blog
https://www.ibm.com/developerworks/mydeveloperworks/blogs/WebSpherePortal/
For More Information (2)
IBM Lotus Connections
http://www.ibm.com/software/lotus/products/connections
IBM Lotus Forms
http://www.ibm.com/software/lotus/forms
IBM Lotus Quickr
http://www.ibm.com/lotus/quickr
IBM Lotus Sametime
http://www.ibm.com/lotus/sametime
WebSphere Commerce
http://www.ibm.com/websphere/commerce
WebSphere Process Server and Business Process Automation
http://www.ibm.com/software/integration/wps
We Value Your Feedback! Please complete the session survey for this session:
TECH-B14 Session Speaker: Andreas Prokoph
Questions?
© IBM Corporation 2010. All Rights Reserved. The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software. References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.
All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
IBM, the IBM logo, Lotus, Lotus Notes, Notes, Domino, Quickr, Sametime, WebSphere, UC2, PartnerWorld are trademarks of International Business Machines Corporation in the United States, other countries, or both. Unyte is a trademark of WebDialogs, Inc., in the United States, other countries, or both.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.
Microsoft and Windows are trademarks of Microsoft Corporation in the United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.