Instances replication

Last modified by Vincent Massol on 2024/11/19 16:14

 XWiki
 Feature
 Active
 
 

https://extensions.xwiki.org/xwiki/bin/view/Extension/Replication%20Application/

Description

The idea is to make completely different instances of XWiki replicate documents.

Constraints

  • communication between instances should use the HTTP protocol
  • instances should have the same history (if not instantly, then at least through a correction mechanism when there are conflicts)
  • it should be possible to configure the replication of a document, a space or a wiki
  • support sub-network configurations with relays, and not just a single network in which all instances have access to each other

Security

Instances should authorize each other based on unique identifiers:

  • the XWiki instance id used for pings
  • and/or something based on the base URL of the instance, since it's needed anyway

There are two levels for each document:

  • the controller: each replicated document is assigned a controller, which is the reference instance in case of conflict and the one deciding which instances are allowed to get the entity
  • a member: receives updates of the entity from the controller and sends its own modifications to the controller
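
The two levels can be pictured with a small sketch. It is only an illustration in Python, not the extension's actual data model: the class name `ReplicatedDocument` and its fields are hypothetical, and instances are identified by plain strings.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ReplicatedDocument:
    """Hypothetical record of who replicates one document."""
    reference: str
    controller: str  # id of the controller instance
    members: Set[str] = field(default_factory=set)  # ids of the member instances

    def role_of(self, instance_id: str) -> Optional[str]:
        """Return the level of an instance for this document, or None."""
        if instance_id == self.controller:
            return "controller"
        if instance_id in self.members:
            return "member"
        return None

    def recipients(self, sender: str) -> Set[str]:
        """Instances that should receive a change coming from `sender`."""
        if sender == self.controller:
            return set(self.members)  # the controller dispatches to members
        if sender in self.members:
            return {self.controller}  # a member only talks to the controller
        return set()  # unknown instances reach nobody
```

With a controller `instance-a` and members `instance-b` and `instance-c`, a change made on `instance-b` goes to `instance-a` only, which then dispatches it to the members.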

Open question: what level of rights does the current user need to start replicating a document? It might be EDIT by default, with a configuration option to change it to ADMIN (or even VIEW), for example.

Private/public keys

A good way to secure communication between exposed instances is to exchange keys between instances:

  • a target instance is indicated to the source instance
  • the target instance receives a public key along with the unique id of the source instance, and this request needs to be accepted
  • the target instance sends back its own public key and unique id to the source instance
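
The exchange above can be sketched as follows. This is a simplified model, not the real protocol: keys are opaque placeholder strings rather than actual cryptographic keys, the HTTP transport is replaced by direct method calls, and the names `Instance`, `request_link` and `accept_link` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Instance:
    instance_id: str
    public_key: str  # opaque placeholder, not a real key
    # Link requests waiting to be accepted: source id -> source public key
    pending: Dict[str, str] = field(default_factory=dict)
    # Accepted instances: instance id -> public key
    trusted: Dict[str, str] = field(default_factory=dict)

    def request_link(self, target: "Instance") -> None:
        """Source side: send our id and public key to the target."""
        target.pending[self.instance_id] = self.public_key

    def accept_link(self, source: "Instance") -> None:
        """Target side: accept a pending request and answer with our own key."""
        key = self.pending.pop(source.instance_id)
        self.trusted[source.instance_id] = key
        source.trusted[self.instance_id] = self.public_key
```

Until the target accepts the request, neither side trusts the other; after acceptance, each side holds the other's id and public key.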

It should be possible to disable the private/public key system and rely only on the identifiers, in case someone wants more custom control over communication between instances in a local network.

Open question: is each instance going to have its own XWiki id, or will they share the same id for license purposes?

Relays

Since one of the plans is to support configurations in which the controller does not have direct access to all instances, some of them will have to relay the messages.

Since the controller does not see all instances, it cannot fully control which instances are allowed to replicate a document, so this mode won't be supported by the UI-based configuration of replication.

Open question: we need to figure out how an instance knows it needs to relay a message to other instances. Options:

  • all instances are relays and send all the messages to everyone all the time: this needs a way to figure out that you already received a message from another source and stop the chain
  • the controller sends with each message the ids of the instances it has access to; the relay can then compare this list to the instances it knows and on which the entity is replicated
  • some instances are explicitly configured as relays for a given list of source instances
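
As an illustration of the first option only, here is a minimal sketch of how a relay could stop the chain by remembering message ids. It assumes every message carries a unique id, which the text above does not specify; the class `RelayNode` and its fields are hypothetical, and the network is simulated with direct calls.

```python
from typing import List, Optional, Set


class RelayNode:
    """Hypothetical instance that relays every message to all its neighbours."""

    def __init__(self, instance_id: str) -> None:
        self.instance_id = instance_id
        self.neighbours: List["RelayNode"] = []
        self.seen: Set[str] = set()      # ids of messages already handled
        self.delivered: List[str] = []   # payloads applied locally

    def receive(self, message_id: str, payload: str,
                sender: Optional["RelayNode"] = None) -> None:
        if message_id in self.seen:
            return  # already received through another path: stop the chain
        self.seen.add(message_id)
        self.delivered.append(payload)
        for neighbour in self.neighbours:
            if neighbour is not sender:  # do not send it straight back
                neighbour.receive(message_id, payload, sender=self)
```

Even when the instances form a loop, each one applies a given message once. The cost is that every instance has to keep track of the ids it has seen, for at least some time.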

Conflict resolution

When the controller receives conflicting changes, it automatically merges them, resolves the conflict, and then sends history corrections back to the other instances.

To be sure not to lose any data, all the conflicting versions are saved as is in the history, followed by the merged result, so that it's always possible to easily go back to a working version if something went wrong with the merge. If a real hard conflict is found while merging, a notification is also created.
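
The history handling described here can be sketched as below. The merge itself is a toy stand-in, not XWiki's merge: versions are modeled as dictionaries of fields, a "hard conflict" is assumed to mean two versions changing the same field to different values, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

Version = Dict[str, str]


def toy_merge(base: Version, versions: List[Version]) -> Tuple[Version, bool]:
    """Toy field-level merge; returns (merged version, hard conflict found)."""
    merged = dict(base)
    changed = set()
    hard = False
    for version in versions:
        for key, value in version.items():
            if value == base.get(key):
                continue  # this version did not change the field
            if key in changed and merged[key] != value:
                hard = True  # same field changed differently: keep the first
                continue
            merged[key] = value
            changed.add(key)
    return merged, hard


def resolve_conflict(history: List[Version], base: Version,
                     versions: List[Version]) -> List[str]:
    """Save every conflicting version as is, then the merged result."""
    history.extend(versions)
    merged, hard = toy_merge(base, versions)
    history.append(merged)
    # A notification is only created for a hard conflict.
    return ["hard conflict while merging"] if hard else []
```

Because the conflicting versions are stored before the merged one, reverting to any of them stays possible even if the merge result turns out to be wrong.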

UI

Wiki admin UI

Extension point (accepted): configurable administration section

We need an administration UI to expose a few things:

  • link between instances
    • add a new instance
    • remove an instance
  • pending link requests from other instances
  • auto-accept replication (and what to do when the document already exists)
  • replication control of the wiki entity

Space (Page and children) admin UI

Extension point (open question): space-level configurable administration section

  • replication control of the space entity

Document (Final Page) admin UI

Extension point (open question): dedicated document tab

  • replication control of the document entity

Entity replication control UI

  • controller:
    • add new member
    • remove one of the members (or stop replication completely, which also removes the documents on the members)
    • change the controller
    • (open question) choose whether a member is read-only (the controller will send updates for the specific entity to the member but will cancel any update coming from it)
    • excluded entities
  • member:
    • (open question) refuse the replication
    • excluded entities
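
The read-only and exclusion controls listed above could translate into checks like the following. This is a sketch under assumptions: excluded entities are modeled as a plain set of references, and the class `ReplicationControl` with its method names is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ReplicationControl:
    """Hypothetical controller-side settings for one replicated entity."""
    # member instance id -> True when the member is read-only
    readonly_by_member: Dict[str, bool] = field(default_factory=dict)
    # references excluded from the replication
    excluded: Set[str] = field(default_factory=set)

    def sends_to(self, member_id: str, reference: str) -> bool:
        """Does the controller send updates of `reference` to this member?"""
        return member_id in self.readonly_by_member and reference not in self.excluded

    def accepts_from(self, member_id: str, reference: str) -> bool:
        """Does the controller accept an update coming from this member?"""
        return (self.sends_to(member_id, reference)
                and not self.readonly_by_member[member_id])
```

A read-only member still receives updates, but anything it sends back is cancelled by the controller.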

Controller instance

Each replicated document has a controller instance.

  • by default it's the instance which exposed an entity for the first time
  • it is in charge of dispatching changes, history modifications and corrections in case of conflicts
  • it controls which of the instances are allowed to receive the resource
  • it's the only instance allowed to:
    • delete the resource
    • modify the history (open question)

General architecture

Transmission layer

A generic data transmission framework which can be used by anything to send arbitrary resources between instances. It takes a serializable object on one side and calls a listener on the other side with the deserialized object as input.
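
A minimal sketch of that contract, assuming JSON as the serialization format and a message type used to select the listener (both are assumptions, as are the names `Transmitter`, `encode` and `dispatch`). The HTTP transport between the two sides is left out.

```python
import json
from typing import Any, Callable, Dict


class Transmitter:
    """Hypothetical transmission layer: serialize, then call a listener."""

    def __init__(self) -> None:
        self._listeners: Dict[str, Callable[[Any], None]] = {}

    def register(self, message_type: str, listener: Callable[[Any], None]) -> None:
        """Receiving side: declare who handles a given type of message."""
        self._listeners[message_type] = listener

    def encode(self, message_type: str, obj: Any) -> str:
        """Sending side: turn a serializable object into the wire format."""
        return json.dumps({"type": message_type, "data": obj})

    def dispatch(self, wire: str) -> None:
        """Receiving side: deserialize and hand the object to its listener."""
        message = json.loads(wire)
        self._listeners[message["type"]](message["data"])
```

The layer knows nothing about documents: anything that can be serialized can be sent, and the receiving listener decides what to do with it.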

Document replication layer

Relies on the transmission layer to share:

  • document modifications
  • document history modifications

It also exposes an extension point to send document-related unique data which is not stored as part of the document.

Replication configuration store

General configuration

A dedicated configuration page.

This page needs to be automatically excluded from the sharing system.

Entity replication configuration

The controller and the members won't have the same configuration, so this configuration should not be shared.

  • Dedicated database table: probably the simplest and safest option
    • Pro: easy to initialize
    • Pro: does not pollute the document serialization/export
    • Con: need to clean up the table in a listener when an entity is deleted
  • Dedicated Solr core
    • Con: cores are quite painful to initialize in remote mode
    • Pro: does not pollute the document serialization/export
    • Con: we don't really need full-text search, so a Solr core would not bring any value compared to a dedicated table
  • xobject: by definition the sharing configuration is not really part of the document, so it should not be part of its metadata
    • Pro: xobject editor
    • Con: pollutes the document serialization/export
 

