WAISE - Wiki AI Search Engine

Last modified by Ludovic Dubost on 2024/11/30 13:51

WAISE is a sub-project providing a way to chat with a wiki using AI. The goal of this project is to provide a way to interact with the XWiki content using natural language. In order to do this without risking exposing data that is confidential or protected by user access rights, we build this by integrating XWiki with a Vector Database and with open source models using an open source implementation of OpenAI provided by LocalAI.

The Planned Features and Architecture Overview document provides a more technical overview of the features and the architecture from a technical/implementation point of view.

WAISE Release

Information

The WAISE project has been released with version 0.7 here: https://extensions.xwiki.org/xwiki/bin/view/Extension/LLM/Index%20for%20the%20LLM%20Application/

Install it on top of XWiki 16.2+.

The code is available at: https://github.com/xwiki-contrib/ai-llm

Matrix bot: https://github.com/xwiki-contrib/ai-llm-matrix-bot

The Evaluation framework is available at: https://github.com/xwiki-contrib/ai-llm-benchmark

Docker of XWiki+LLM capabilities: https://github.com/xwiki-contrib/ai-llm/tree/main/docker

WAISE Architecture

WAISE architecture comprises 3 main components which are illustrated in the schema below:

  • WAISE Server
  • WAISE Client
  • XWiki WAISE Extensions
GPT4All
GPT4AllGPT4All
Indexing API
Indexing API
Chat API
Chat API
Embedding API
Embedding API
LLM API
LLM API
Fine-tuning API
Fine-tuning API
Completion API
Completion API
Authentication API
Authentication API
Vector Database API
Vector Database API
Chroma
Chroma
Solr Neural Search
Solr Neural Search
LocalAI
or other open source OpenAI implementations
LocalAIor other open source OpenAI implementations
ChatGPT
ChatGPT
Vicuna
VicunaVicuna
OpenAI API
OpenAI API
Indexing Module
Indexing Module
OpenAI API
OpenAI API
Access Control API
Access Control API
XWiki Engine
XWiki Engine
WAISE Server
WAISE Server
WAISE Client
WAISE Client
Builds on
Builds on
XWiki WAISE UI
XWiki WAISE UI
MPT-7B
MPT-7B
Bloom
Bloom
Crawling API
Crawling API
Check
access
rights
Check access rights
Send request and 
get response
Send request and get response
...
...
Send request
and get response
Send request and get response
WAISE JavaScript API
WAISE JavaScript API
WAISE UI
WAISE UI
XWiki Core
XWiki Core

WAISE Server

The WAISE Server comprises three main components:

  • An implementation of the OpenAI API.
  • An indexing API implemented on top of a Vector Database.
  • A crawling API to make it possible to interact in chat mode also with any public web site.

The WAISE server will be deployable as a standalone server based on several Docker images that we will create within the course of the project. WAISE will be deployable in standalone mode or with an integration layer for XWiki. The standalone mode will allow any platform to benefit from WAISE features for interacting in chat mode with a corpus of documents.  Making WAISE completely generic will also increase community involvement. Some components will be available both over REST or via Java native APIs so as to improvement performance when getting used directly from XWiki. Typically, since XWiki is already using Solr for full-text faceted search, XWiki will access the Solr Neural Search component via its Java APIs.

When a request is submitted to WAISE, it is processed in two steps:

  • First the document fragments matching the request are retrieved from the vector database.
  • Then the found documents fragments are sent to an OpenAI implementation (LocalAI or possibly others in the future, and ChatGPT for benchmarking purpose only) for generating an answer using these fragments as context. The size of the fragments, hence of the context window sent to the LLM is an important parameter that will need to be benchmarked since it has a direct impact on the time needed to compute the answer, and to its quality. The benchmark of models and of vector databases will allow to show how existing implementations differ in terms of performance and quality and how to obtain the best tradoff (see section below covering the benchmark aspects).

This approach alleviates the need to "unlearn" knowledge when the wiki pages get modified, while making sure that the most recent knowledge contained in the wiki is used by the AI when computing answers to domain-specific questions.

The Indexing API is a new API that will be designed and implemented during the course of the project. It adds a layer of access control on top of a Vector Database, which is a key requirement for using WAISE in an enterprise context. The Vector Database internally used by WAISE will be configurable. Several of them will be experimented and benchmarked in the project, in particular ChromaDB and Solr Neural Search.

WAISE communicates both with the XWiki backend for indexing the documents and for controlling access rights. Internally the Chat API communicates with LocalAI to get embedding data in the indexing phase and when receiving a request, and to generate a response.

WAISE also includes a crawling service in order to index sites on demand so that the WAISE chatbot can then submit questions to the knowledge automatically extracted from any public site that has been crawled.

WAISE Client

The WAISE client comprises the following main components:

  • A generic JavaScript API for interacting with the WAISE server
  • A generic UI for interacting visually with the WAISE server in chat mode.
  • An XWiki WAISE UI that will extend the generic WAISE UI
  • An integration of the JavaScript API in Matrix as a plugin that will ease conversations with WAISE directly from this widely used Matrix chat UI and infrastructure.

WAISE UI

The WAISE UI consists of a chat window allowing to submit requests in natural language, as illustrated below. A first UI has been started at XWiki SAS and a prototype will be released by August 2023. The UI exposes an input field and gives the option to select the language model to be used. It also gives access to a database of XWiki predefined prompts for assisting users in typical text generation tasks or questions.  Letting users express their request directly in natural language allows to progressively refine the requests. Typically, users will be able to refine the context to obtain more accurate answers to their questions. Advanced uesrs will have the option to choose among the set of available language models.

When it comes to solving performance issues with XWiki,
there are several steps you can take. Here are some general guidelines
to help you improve the performance of your XWiki installation:

  1. Analyze the performance issue: Start by identifying the specific areas where you're experiencing performance problems. Is it slow page loading, sluggish search functionality, high memory usage, or something else? Read more on how to identify XWiki performance issues.
  2. Check server resources: Ensure that your server has sufficient resources to handle the XWiki application. Check CPU, memory, and disk usage to see if any of these resources are maxed out. Read more about XWiki server resources optimization.
  3. Database optimization: XWiki relies on a database to store its data. Make sure your database is properly optimized for performance. This includes ensuring indexes are correctly set up, performing regular database maintenance (such as vacuuming or reindexing), and monitoring database performance metrics. Read more about XWiki database optimization.
  4. ...
  5. ...
  6. ...
  7. ...
When it comes to solving performance issues with XWiki, there are several steps you can take. Here are some general guidelines to help you improve the performance of your XWiki installation: Analyze the performance issue: Start by identifying the specific areas where you're experiencing performance problems. Is it slow page loading, sluggish search functionality, high memory usage, or something else? Read more on how to identify XWiki performance issues.Check server resources: Ensure that your server has sufficient resources to handle the XWiki application. Check CPU, memory, and disk usage to see if any of these resources are maxed out. Read more about XWiki server resources optimization.Database optimization: XWiki relies on a database to store its data. Make sure your database is properly optimized for performance. This includes ensuring indexes are correctly set up, performing regular database maintenance (such as vacuuming or reindexing), and monitoring database performance metrics. Read more about XWiki database optimization.............


Hello! How can I assist you today?
Hello! How can I assist you today?Hello! How can I assist you today?
Insert selection
Insert selection
WAISE Assistant (Model: GPT4All-J)
WAISE Assistant (Model: GPT4All-J)
WAISE Assistant (Model: GPT4All-J)
WAISE Assistant (Model: GPT4All-J)
Choose prompt
Choose prompt
Hello WAISE. How can I solve a performance issue with XWiki?
Hello WAISE. How can I solve a performance issue with XWiki?Hello WAISE. How can I solve a performance issue with XWiki?Hello WAISE. How can I solve a performance issue with XWiki?
Send
SendSend
Choose prompt
Choose prompt
Good. Now can you provide more details on how to make sure the database is properly configured and optimized?
Good. Now can you provide more details on how to make sure the database is properly configured and optimized? Good. Now can you provide more details on how to make sure the database is properly configured and optimized? Good. Now can you provide more details on how to make sure the database is properly configured and optimized?
Send
SendSend
Insert selection
Insert selection
GPT4All-J
Vicuna
Llama
MPT-7B
...
GPT4All-JVicunaLlamaMPT-7B...
Choose model
Choose model
Choose model
Choose model

Matrix WAISE UI

The integration of WAISE capabilities directly into Matrix will bring productivity gains to users by allowing to interact with WAISE from a chat room to submit questions to a wiki and conversely for asking WAISE to summarize chat discussions and to store them in the wiki. Such usages are relevant only if each user can access in view/edit mode only the documents he's allowed to view or modify indeed, hence the need for integration between WAISE  and an Access Control API.

XWiki WAISE Extensions

XWiki WAISE Server Extensions

On the one hand, a new API will be designed an implemented as an XWiki extension for interacting with any OpenAI API implementations. Two implementations will be provided: one over REST, one in native Java for some components such as the Solr Neural Search, since XWiki already embeds Solr Core as a Java extension. On the other hand, some existing XWiki API or modules will be called by several WAISE components for checking user rights over the XWiki Access Control API and for receiving documents to be indexed from the XWiki Indexing module.

XWiki WAISE UI Extension

This extension will build on the generic WAISE UI extension to add chatting features for specific use cases in an XWiki context, in particular for easing the workflow of text from/to the wiki to/from the WAISE outputties (eg insert part of a WAISE answer into an XWiki page). This XWiki UI will also include other AI capabilities developed separately such as text generation and document summarization.

Benchmark

We will benchmark two OpenAI implementations: LocalAI and ChatGPT. LocalAI is "a drop-in replacement REST API that is compatible with OpenAI API specifications for local inferencing. It allows you to run LLMs (and not only) locally or on-prem with typical cloud or consumer hardware, supporting multiple model families that are compatible with the ggml format." It has support for embeddings, is written in Go and is available under the MIT License. We have selected it because it allows to use a wide range of open source LLMs and embeddings via the OpenAI API. The LocalAI server will run on a dedicated machine since it needs its own configurable resources for running highly CPU/GPU intensive tasks using the LLMs, with the possibility to easily scale up.

The benchmark will allow to compare the performance and quality of open source LLMs with ChatGPT. We will create a corpus of documents to be indexed and a set of well-defined prompts to be executed on this corpus. Then we will measure the outputs both qualitatively and quantitatively for the a selection of 5 open source LLMs and for ChatGPT. The results will be gathered and published with the whole dataset and measurement tools as a public document.

We will measure both on CPU and on GPU hardware the following indicators among others (a full list of relevant indicators will be drawn during the course of the project):

  • Inference speed
  • Memory consumption
  • Quality of the results (scale 1 to 10 from a Human)
  • Keep each response for future comparison

Detailed Interactions Between WAISE Components

Integrated WAISE Server

In this architecture, the WAISE server is directly integrated  in XWiki and uses the Solr Neural Search Vector database.

XWiki Engine














XWiki Engine
Vector DB
(SOLR with Neural Search / ChromaDB / other)
Vector DB(SOLR with Neural Search / ChromaDB / other)
OpenAI REST
API
OpenAI RESTAPI
Find
Matching
Docs
Find Matching Docs
Answer API
with Rights verification
Answer APIwith Rights verification
Local AI
(Compatible with OpenAI API)
Local AI(Compatible with OpenAI API)
AI Model 1
AI Model 1
AI Model 2
AI Model 2
AI Model 3
AI Model 3

XWiki UI













XWiki UI
Generative AI UI
Generative AI UI
Genertive AI
request
Genertive AI request
Generative AI Request
Generative AI Request
WAISE Request
WAISE Request
WAISE
Search Request

WAISE Search Request
Request to find response
in provided fragments
Request to find response in provided fragments
Indexing API
Indexing API
Indexing documents
in Vector format
Indexing documents in Vector format
Matrix WAISE Bot
Matrix WAISE Bot
Matrix UI
Matrix UI
WAISE Search Request
WAISE Search Request
WAISE Search Request
WAISE Search Request WAISE Search Request
Get embedding data
from LocalAI
Get embedding data from LocalAI
WAISE Chat UI
WAISE Chat UI

AI Search using external WAISE Server

XWiki Engine













XWiki Engine
WAISE Server










WAISE Server
OpenAI REST
API
OpenAI RESTAPI
Rights check API
Rights check API
Local AI
(Compatible
with OpenAI API)
Local AI(Compatible with OpenAI API)
AI Model 1
AI Model 1
AI Model 2
AI Model 2
AI Model 3
AI Model 3

XWiki UI













XWiki UI
Generative AI UI
Generative AI UI
Generative AI
request
Generative AI request
Generative
AI Request
Generative AI Request
WAISE
Search Request

WAISE Search Request
Matrix WAISE Bot
Matrix WAISE Bot
Matrix UI
Matrix UI
Waise Search Request
Waise Search Request
WAISE Search Request
WAISE Search Request WAISE Search Request
Indexing
Module
Indexing Module
Vector Database
Vector Database
Search API (Compatible OpenAI API)
Search API (Compatible OpenAI API)
WAISE
Search Request
WAISESearch Request
Verify rights
Verify rights
Send
Doc to index
Send Doc to index
Generate
Response
Generate Response
Index
Index
Query
Vector DB
QueryVector DB
Indexing API
Indexing API
Get embedding data
Get embedding data
WAISE Chat UI
WAISE Chat UI
Get embedding
data
Get embedding data

Milestones

The following provides a list of milestones. These milestones are based on the modules defined in the Implemented Features and High-level Architecture. The months indicated for the different milestones might get adapted as we progress on the implementation.

  1. (November) Implement the model API and UI/refactor the existing code and add a basic version of the index, including chunking and vector database and some version of authentication and a UI. Could be a simple version of each of these parts without more advanced features.
  2. (December) Implement the RAG and expose it as an OpenAI-compatible API. Work a bit on tooling for the benchmark to regularly test the current state.
  3. (January) Work on the JavaScript API and chat UI that can be embedded in other applications and rewrite the existing XWiki chat UI to be based on that code.
  4. (February) Implement a crawler/indexer for XWiki (could be local and/or remote) and work on better chunking in particular for XWiki content.
  5. (March) Work on the benchmark and perform test runs of the benchmark.
  6. (April) Implement a simple web crawler and maybe also a crawler for another system. Polish the existing parts.
  7. (May) Delivery of a first version of WAISE as XWiki extensions
  8. (June) Docker image, matrix bot
  9. (July) Run benchmark with human evaluation, polishing of existing features
  10. (August) Delivery of the docker image and matrix bot

All the milestones have been achieved:

The WAISE project has been released with version 0.7 here: https://extensions.xwiki.org/xwiki/bin/view/Extension/LLM/Index%20for%20the%20LLM%20Application/

Install it on top of XWiki 16.2+.

The code is available at: https://github.com/xwiki-contrib/ai-llm

Matrix bot: https://github.com/xwiki-contrib/ai-llm-matrix-bot

The Evaluation framework is available at: https://github.com/xwiki-contrib/ai-llm-benchmark

Docker of XWiki+LLM capabilities: https://github.com/xwiki-contrib/ai-llm/tree/main/docker

Get Connected