Feature
 Idea
 

Description

Context

Some users want to implement search with an engine other than the three that exist (Solr, Database, Lucene) and plug it into XWiki.

At the moment, they have to modify the following modules in order to plug them into their search implementation:

  • FAQ App
  • File Manager App
  • Document Tree Macro (part of Index App)
  • IRCBot App
  • Repository App
  • Search UI + Wiki UI for the Search suggest
  • Wiki Workspaces Migrator

That's very painful, hence the idea of providing a generic Search API.

Analysis

A generic search API should allow us to execute search requests (queries) on the XWiki model. The following is a list of search parameters that some search engines support:

  • query: the text the user is searching for; most of the time this is free text but it can support a complex query syntax that allows:
    • wildcard matching (foo*)
    • phrase search ("foo bar")
    • negative search (-foo)
    • field matching (title:foo)
    • boolean operators (title:foo AND author:bar)
    • proximity matching ("foo bar"~4)
    • ranges (creationdate:[NOW-1YEAR/DAY TO NOW/DAY+1DAY])
    • weights or boosts (title:foo^5 author:bar^2)
    • etc.

    Supporting only free text is very restrictive, but defining a generic XWiki Search Query Syntax is very complex: each search engine may have its own syntax, so we would have to implement a parser for our syntax and then translators for all supported search engines (including database search).

  • query fields: the fields in which to search; e.g. 'title^5 author^2 ...' means that the free text from the search query is matched against the "title" and "author" fields, with different weights (the title is more important than the author).
    • we can try to define a list of standard field names but we need to take into account that:
      • field names may depend on the search engine, e.g. we have only one 'title' field in database but we have 'title_XX' (where XX is the locale) and 'title_sort' in the Solr index. Also, in Solr we have fields for rendered values, while in database we have only raw values.
      • Solr supports both static and dynamic field names (dynamic schema). Database supports only static field names (static schema). Thus Solr can search directly in 'property.FAQCode.FAQClass.answer' while the database search has to perform some joins (after determining which tables have to be joined)
      • each search engine may have its own set of special characters that are not allowed or that have to be escaped in field names. The XClass name, for instance, may contain special characters that need to be escaped or encoded.
    • Some search engines may not support weights (boosts). Implementing weights on database search is complex.
  • filter query: a list of constraints on the fields (like a 'where' clause in SQL)
    • in Solr we have 3 space-related fields that can be used: 'spaces', 'space_exact', 'space_prefix'. Translating these into database search requires complex queries with joins
    • Solr supports range constraints, including complex date ranges like 'creationdate:[NOW-1YEAR/DAY TO NOW/DAY+1DAY]'. Other search engines may not support them.
  • field list: the list of fields that should be available on the search results, i.e. the information that you want to retrieve (like a 'select' clause in SQL)
  • sort fields: the list of sort fields and their order (e.g. 'score desc')
  • facet fields: the list of fields for which to enable faceting
    • faceting may not be supported by all search engines (e.g. database search)
    • each search engine may have its own faceting configuration parameters
  • highlighting: whether you want the matches to be highlighted in the search results
    • Some search engines may not support highlighting, and each search engine may have its own specific highlighting parameters
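
To illustrate the field-name problem described above, here is a minimal Java sketch of how a generic field name could be resolved per search engine. The names FieldNameResolver, SolrFieldNameResolver and DatabaseFieldNameResolver are hypothetical and not part of any existing XWiki API. The sketch assumes the 'title_XX' pattern applies to every localized Solr field, which may not hold for the real index.

```java
import java.util.Locale;

/** Hypothetical contract: maps a generic field name to an engine-specific one. */
interface FieldNameResolver {
    String resolve(String genericField, Locale locale);
}

/** Solr keeps one field per locale, following the 'title_XX' pattern. */
final class SolrFieldNameResolver implements FieldNameResolver {
    @Override
    public String resolve(String genericField, Locale locale) {
        // Assumption: all localized fields are named '<field>_<locale>'.
        if (locale == null || locale.toString().isEmpty()) {
            return genericField;
        }
        return genericField + "_" + locale;
    }
}

/** The database has a single column per field, so the locale is ignored. */
final class DatabaseFieldNameResolver implements FieldNameResolver {
    @Override
    public String resolve(String genericField, Locale locale) {
        return genericField;
    }
}

class FieldNameDemo {
    public static void main(String[] args) {
        FieldNameResolver solr = new SolrFieldNameResolver();
        FieldNameResolver database = new DatabaseFieldNameResolver();
        System.out.println(solr.resolve("title", Locale.FRENCH));     // title_fr
        System.out.println(database.resolve("title", Locale.FRENCH)); // title
    }
}
```

A real resolver would also have to handle dynamic fields such as 'property.FAQCode.FAQClass.answer' and the escaping of special characters, which this sketch leaves out.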

The steps that need to be taken are:

  • define the query syntax
  • define the list of standard fields
  • write a Java API that allows us to specify the request parameters listed above (should be based on the Query Manager)
  • write translators for various search engines (Solr and Database for a start)
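
The last two steps could be sketched as follows. This is only an illustration under stated assumptions: SearchRequest, SearchRequestTranslator and SolrParameterTranslator are hypothetical names, the sketch is not based on the Query Manager, and it produces a plain map of Solr request parameters (q, qf, fq, fl, sort, facet.field, hl) instead of calling a Solr client.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical engine-neutral request holding the parameters listed in the analysis. */
final class SearchRequest {
    final String query;
    final String queryFields;
    final List<String> filterQueries;
    final String fieldList;
    final String sort;
    final List<String> facetFields;
    final boolean highlight;

    SearchRequest(String query, String queryFields, List<String> filterQueries, String fieldList,
        String sort, List<String> facetFields, boolean highlight) {
        this.query = query;
        this.queryFields = queryFields;
        this.filterQueries = List.copyOf(filterQueries);
        this.fieldList = fieldList;
        this.sort = sort;
        this.facetFields = List.copyOf(facetFields);
        this.highlight = highlight;
    }
}

/** Hypothetical translator contract: one implementation per search engine. */
interface SearchRequestTranslator<T> {
    T translate(SearchRequest request);
}

/** Maps the generic request to Solr request parameter names. */
final class SolrParameterTranslator implements SearchRequestTranslator<Map<String, List<String>>> {
    @Override
    public Map<String, List<String>> translate(SearchRequest request) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        params.put("q", List.of(request.query));
        // Assumption: 'qf' is only honored by the (e)dismax query parsers.
        params.put("defType", List.of("edismax"));
        params.put("qf", List.of(request.queryFields));
        params.put("fq", request.filterQueries);
        params.put("fl", List.of(request.fieldList));
        params.put("sort", List.of(request.sort));
        if (!request.facetFields.isEmpty()) {
            params.put("facet", List.of("true"));
            params.put("facet.field", request.facetFields);
        }
        if (request.highlight) {
            params.put("hl", List.of("true"));
        }
        return params;
    }
}

class SolrTranslationDemo {
    public static void main(String[] args) {
        SearchRequest request = new SearchRequest("title:one -content:two three",
            "title^5 name^2 content", List.of("type:DOCUMENT", "space_prefix:A.B", "locale:fr"),
            "wiki spaces name locale", "score desc", List.of("author", "date"), false);
        new SolrParameterTranslator().translate(request)
            .forEach((name, values) -> System.out.println(name + " = " + values));
    }
}
```

A database translator would implement the same interface but return a query statement, and would have to reject or approximate the parameters it cannot support (boosts, facets, highlighting).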

Example:

#set ($query = $services.query.search('title:one -content:two three'))
#set ($query = $query.setQueryFields('title^5 name^2 content'))
#set ($query = $query.setFilterQuery([
  'type:DOCUMENT',
  'space_prefix:A.B',
  'locale:fr'
]))
#set ($query = $query.setFieldList('wiki spaces name locale'))
#set ($query = $query.setSort('score desc'))
#set ($query = $query.setFacetFields('author date'))
## Only receive 10 results
#set ($discard = $query.setLimit(10).setOffset(0))
#set ($searchResponse = $query.execute()[0])
#foreach ($searchResult in $searchResponse.results)
 #displaySearchResult($searchResult)
#end

 

