WAISE - Wiki AI Search Engine
WAISE is a sub-project providing a way to chat with a wiki using AI. The goal of this project is to provide a way to interact with the XWiki content using natural language. In order to do this without risking exposing data that is confidential or protected by user access rights, we build this by integrating XWiki with a Vector Database and with open source models using an open source implementation of OpenAI provided by LocalAI.
The Planned Features and Architecture Overview document provides a more technical overview of the features and the architecture from a technical/implementation point of view.
WAISE Architecture
WAISE architecture comprises 3 main components which are illustrated in the schema below:
- WAISE Server
- WAISE Client
- XWiki WAISE Extensions
WAISE Server
The WAISE Server comprises three main components:
- An implementation of the OpenAI API.
- An indexing API implemented on top of a Vector Database.
- A crawling API to make it possible to interact in chat mode also with any public web site.
The WAISE server will be deployable as a standalone server based on several Docker images that we will create within the course of the project. WAISE will be deployable in standalone mode or with an integration layer for XWiki. The standalone mode will allow any platform to benefit from WAISE features for interacting in chat mode with a corpus of documents. Making WAISE completely generic will also increase community involvement. Some components will be available both over REST or via Java native APIs so as to improvement performance when getting used directly from XWiki. Typically, since XWiki is already using Solr for full-text faceted search, XWiki will access the Solr Neural Search component via its Java APIs.
When a request is submitted to WAISE, it is processed in two steps:
- First the document fragments matching the request are retrieved from the vector database.
- Then the found documents fragments are sent to an OpenAI implementation (LocalAI or possibly others in the future, and ChatGPT for benchmarking purpose only) for generating an answer using these fragments as context. The size of the fragments, hence of the context window sent to the LLM is an important parameter that will need to be benchmarked since it has a direct impact on the time needed to compute the answer, and to its quality. The benchmark of models and of vector databases will allow to show how existing implementations differ in terms of performance and quality and how to obtain the best tradoff (see section below covering the benchmark aspects).
This approach alleviates the need to "unlearn" knowledge when the wiki pages get modified, while making sure that the most recent knowledge contained in the wiki is used by the AI when computing answers to domain-specific questions.
The Indexing API is a new API that will be designed and implemented during the course of the project. It adds a layer of access control on top of a Vector Database, which is a key requirement for using WAISE in an enterprise context. The Vector Database internally used by WAISE will be configurable. Several of them will be experimented and benchmarked in the project, in particular ChromaDB and Solr Neural Search.
WAISE communicates both with the XWiki backend for indexing the documents and for controlling access rights. Internally the Chat API communicates with LocalAI to get embedding data in the indexing phase and when receiving a request, and to generate a response.
WAISE also includes a crawling service in order to index sites on demand so that the WAISE chatbot can then submit questions to the knowledge automatically extracted from any public site that has been crawled.
WAISE Client
The WAISE client comprises the following main components:
- A generic JavaScript API for interacting with the WAISE server
- A generic UI for interacting visually with the WAISE server in chat mode.
- An XWiki WAISE UI that will extend the generic WAISE UI
- An integration of the JavaScript API in Matrix as a plugin that will ease conversations with WAISE directly from this widely used Matrix chat UI and infrastructure.
WAISE UI
The WAISE UI consists of a chat window allowing to submit requests in natural language, as illustrated below. A first UI has been started at XWiki SAS and a prototype will be released by August 2023. The UI exposes an input field and gives the option to select the language model to be used. It also gives access to a database of XWiki predefined prompts for assisting users in typical text generation tasks or questions. Letting users express their request directly in natural language allows to progressively refine the requests. Typically, users will be able to refine the context to obtain more accurate answers to their questions. Advanced uesrs will have the option to choose among the set of available language models.
Matrix WAISE UI
The integration of WAISE capabilities directly into Matrix will bring productivity gains to users by allowing to interact with WAISE from a chat room to submit questions to a wiki and conversely for asking WAISE to summarize chat discussions and to store them in the wiki. Such usages are relevant only if each user can access in view/edit mode only the documents he's allowed to view or modify indeed, hence the need for integration between WAISE and an Access Control API.
XWiki WAISE Extensions
XWiki WAISE Server Extensions
On the one hand, a new API will be designed an implemented as an XWiki extension for interacting with any OpenAI API implementations. Two implementations will be provided: one over REST, one in native Java for some components such as the Solr Neural Search, since XWiki already embeds Solr Core as a Java extension. On the other hand, some existing XWiki API or modules will be called by several WAISE components for checking user rights over the XWiki Access Control API and for receiving documents to be indexed from the XWiki Indexing module.
XWiki WAISE UI Extension
This extension will build on the generic WAISE UI extension to add chatting features for specific use cases in an XWiki context, in particular for easing the workflow of text from/to the wiki to/from the WAISE outputties (eg insert part of a WAISE answer into an XWiki page). This XWiki UI will also include other AI capabilities developed separately such as text generation and document summarization.
Benchmark
We will benchmark two OpenAI implementations: LocalAI and ChatGPT. LocalAI is "a drop-in replacement REST API that is compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs (and not only) locally or on-prem with typical cloud or consumer hardware, supporting multiple model families that are compatible with the ggml format." It has support for embeddings, is written in Go and is available under the MIT License. We have selected it because it allows to use a wide range of open source LLMs and embeddings via the OpenAI API. The LocalAI server will run on a dedicated machine since it needs its own configurable resources for running highly CPU/GPU intensive tasks using the LLMs, with the possibility to easily scale up.
The benchmark will allow to compare the performance and quality of open source LLMs with ChatGPT. We will create a corpus of documents to be indexed and a set of well-defined prompts to be executed on this corpus. Then we will measure the outputs both qualitatively and quantitatively for the a selection of 5 open source LLMs and for ChatGPT. The results will be gathered and published with the whole dataset and measurement tools as a public document.
We will measure both on CPU and on GPU hardware the following indicators among others (a full list of relevant indicators will be drawn during the course of the project):
- Inference speed
- Memory consumption
- Quality of the results (scale 1 to 10 from a Human)
- Keep each response for future comparison
Detailed Interactions Between WAISE Components
Integrated WAISE Server
In this architecture, the WAISE server is directly integrated in XWiki and uses the Solr Neural Search Vector database.
AI Search using external WAISE Server
Milestones
The following provides a list of milestones. These milestones are based on the modules defined in the Implemented Features and High-level Architecture. The months indicated for the different milestones might get adapted as we progress on the implementation.
- (November) Implement the model API and UI/refactor the existing code and add a basic version of the index, including chunking and vector database and some version of authentication and a UI. Could be a simple version of each of these parts without more advanced features.
- (December) Implement the RAG and expose it as an OpenAI-compatible API. Work a bit on tooling for the benchmark to regularly test the current state.
- (January) Work on the JavaScript API and chat UI that can be embedded in other applications and rewrite the existing XWiki chat UI to be based on that code.
- (February) Implement a crawler/indexer for XWiki (could be local and/or remote) and work on better chunking in particular for XWiki content.
- (March) Work on the benchmark and perform test runs of the benchmark.
- (April) Implement a simple web crawler and maybe also a crawler for another system. Polish the existing parts.
- (May) Delivery of a first version of WAISE as XWiki extensions
- (June) Docker image, matrix bot
- (July) Run benchmark with human evaluation, polishing of existing features
- (August) Delivery of the docker image and matrix bot