Speed up document history update
Description
When a document is saved, its history is incremented with the current version. The problem is that this history update is currently extremely slow.
The slow part seems to be the creation of the diff.
Improve the diff implementation
The diff is currently done by JRCS and while it's probably possible to improve it a bit, we can only go so far.
Stop creating the diff
The alternative is to simply stop creating diffs. It has various benefits (much simpler, much faster history load/delete). The main problem to deal with is how to optimize the space used by the history.
Just disable the diff
This can already be done with a simple configuration: add xwiki.store.rcs.nodesPerFull=1 to xwiki.cfg. We could decide to make it the default.
Store a compressed version of the document XML
The idea is to store a compressed version of the XML instead of what we currently have, in order to save some space. If we can find a compression which is good enough to take less than 20% of the XML, it will even take less space than the current behavior (in which 1 version out of 5 is complete).
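To make the 20% target concrete, here is a minimal sketch that measures the compressed/raw ratio of an XML string. It assumes GZIP from the JDK as a stand-in for whichever compression ends up being chosen; the class name, the method names and the sample XML are hypothetical and only serve the illustration. Real document XML is likely less repetitive than this sample, so the ratio should be measured on actual history entries.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class CompressionRatioSketch
{
    /** Threshold from the text: one complete version out of five, i.e. 20% of the XML. */
    static final double THRESHOLD = 0.20;

    /** Compress the UTF-8 bytes of the text with GZIP (assumption: any codec could be used instead). */
    static byte[] gzip(String text)
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Compressed size divided by raw size, both in bytes. */
    static double ratio(String xml)
    {
        int rawSize = xml.getBytes(StandardCharsets.UTF_8).length;
        return (double) gzip(xml).length / rawSize;
    }

    public static void main(String[] args)
    {
        // Hypothetical sample, not a real exported document.
        StringBuilder xml = new StringBuilder("<xwikidoc>");
        for (int i = 0; i < 200; i++) {
            xml.append("<property><name>field").append(i)
                .append("</name><value>some value</value></property>");
        }
        xml.append("</xwikidoc>");

        double measured = ratio(xml.toString());
        System.out.printf("compressed/raw = %.3f, below 20%%: %b%n", measured, measured < THRESHOLD);
    }
}
```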
Use native database compression
All officially supported databases have a native solution to compress large textual content:
- PostgreSQL: it's compressed by default, but only for content bigger than 2kB, so a document without any xobject and with a small content may stay uncompressed. The threshold can be changed using `TOAST_TUPLE_THRESHOLD` and `TOAST_COMPRESSION_THRESHOLD`, but the documentation is not very explicit on how to actually set those (it might be more obvious for someone more used to PostgreSQL than me)
- [[MySQL>>https://dev.mysql.com/doc/refman/8.4/en/innodb-row-format.html#innodb-row-format-compressed]] and MariaDB: it can be enabled when creating the table (if we find a way to tell the Hibernate initializer that), or shortly after the creation (on our side) by setting the row format to `COMPRESSED` (the default is `DYNAMIC`, which also has some storage optimization, but not as much as `COMPRESSED`)
- Oracle: similarly to MySQL and MariaDB, you can tell Oracle that you would like the table to be COMPRESSED
- HSQLDB: this one is not too critical as it's not really recommended to use it in production, but it can be enabled in the hibernate.cfg.xml using the property `hsqldb.lob_compressed`
Do the compression on our side
Ideas of compression libraries to evaluate:
Base64
Encode the compressed version in Base64 so that we can reuse the current string storage.
Then either we try to determine from the content if it's base64 encoded, or we introduce a new property to indicate the type of content (clear, base64 compressed).
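Both variants can be sketched as follows. This is only an illustration under assumptions: GZIP is used as the compression, and the `ContentType` enum, the class and the method names are hypothetical. The content-based detection relies on the fact that a GZIP stream always starts with the bytes `1f 8b 08`, which Base64 encodes as `H4sI`, while the document XML starts with `<`; a different compression format would need a different check.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class Base64HistorySketch
{
    /** Hypothetical new property indicating how the stored string must be read. */
    enum ContentType { CLEAR, BASE64_COMPRESSED }

    /** GZIP the XML, then Base64 encode it so it fits in the existing string column. */
    static String encode(String xml)
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    /** Variant 1: an explicit type tells us whether to decompress. */
    static String decode(String stored, ContentType type)
    {
        if (type == ContentType.CLEAR) {
            return stored;
        }
        byte[] compressed = Base64.getDecoder().decode(stored);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Variant 2: guess from the content (valid for GZIP only, see the assumption above). */
    static boolean looksCompressed(String stored)
    {
        return stored.startsWith("H4sI");
    }

    public static void main(String[] args)
    {
        String xml = "<?xml version=\"1.1\" encoding=\"UTF-8\"?><xwikidoc><content>Hello</content></xwikidoc>";
        String stored = encode(xml);
        ContentType type = looksCompressed(stored) ? ContentType.BASE64_COMPRESSED : ContentType.CLEAR;
        System.out.println("round trip ok: " + xml.equals(decode(stored, type)));
    }
}
```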
Binary
Introduce a new binary field to store the compressed version. More explicit and a bit better in terms of performance and storage.
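The storage difference can be estimated with a small sketch: Base64 turns every 3 bytes into 4 characters, so the encoded string is about one third longer than the raw compressed bytes. The code below assumes GZIP again, and the class and method names are hypothetical; how the `byte[]` would be mapped to a database column is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class BinaryHistorySketch
{
    /** Compressed bytes, as they would be written to the hypothetical binary field. */
    static byte[] compress(String xml)
    {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Inverse of compress(): read the binary field back into the XML string. */
    static String decompress(byte[] data)
    {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Length of the padded Base64 string needed for the given number of bytes. */
    static int base64Length(int binaryLength)
    {
        return 4 * ((binaryLength + 2) / 3);
    }

    public static void main(String[] args)
    {
        String xml = "<xwikidoc><content>Some content</content></xwikidoc>";
        byte[] binary = compress(xml);
        System.out.println("binary bytes: " + binary.length
            + ", Base64 characters: " + base64Length(binary.length));
    }
}
```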
Thomas Mortagne
Michael Hamann