Increase the 255 characters limit for references

Last modified by Vincent Massol on 2024/11/19 16:14

 XWiki
 Feature
 Completed
 

Description

Where is this limitation coming from ?

In the XWiki database schema document references are stored in a varchar(255) so if you try to save a document with a longer reference it will fail.

The reason for this 255 is that back when this was defined it was the maximum size an indexed utf8 varchar could have in MySQL.

One difficulty is that it's not just an informational field which store the reference, it's also used a lot in all kind of requests and especially in request involving the xobject where it's used is as a foreign key (even if it's not officially configured as foreign key in the schema).

Here are all the database fields affected by the reference related 255 chars limit (with hibernate naming):

xwiki.hbm.xml:

  • XWikiDocument
    • fullName
    • name
    • author
    • contentAuthor
    • creator
    • web/space
  • XWikiSpace
    • reference
    • name
    • parent
  • XWikiRCSNodeInfo
    • author
  • XWikiDeletedDocument
    • fullName
  • DeletedAttachment
    • filename
    • docName
  • XWikiLock
    • userName
  • XWikiLink
    • link
    • fullName
  • XWikiAttachment
    • filename
    • author
  • BaseObject
    • name
    • className
  • DocumentStats
    • name
  • RefererStats
    • name
  • VisitStats
    • name

eventstream.hbm.xml:

  • LegacyEvent
    • user
    • space
    • page

notification-filter-preferences.hbm.xml:

  • DefaultNotificationFilterPreference
    • page

Often used to store references:

xwiki.hbm.xml:

  • StringProperty
    • value

Other fields limited to 255 characters and that we might want to increase too depending on the chosen solution:

xwiki.hbm.xml:

  • XWikiDocument
    • title (255 characters is often a problem since the title can contains Velocity scripting)
    • customClass (contains a Java class complete name)
  • BaseProperty
    • classType (contains a Java class complete name)

Not really used:

xwiki.hbm.xml:

  • XWiki.XWikiPreferences
    • skin
    • stylesheet
    • stylesheets
    • editor
    • title
  • XWiki.XWikiComments
    • author

Non-breaking changes

Non-breaking changes all imply:

  • keeping all the fields listed above as otherwise you break every single xobject related HQL query (it's a bit less of a problem for XWQL where this link between the document and the xobject is hidden) for example
  • keep fields like Document.fullName indexed and equals to an indexed name field in the baseobject table as otherwise the performance impact can be huge when there is a lot of documents for example

Increase the size of the reference related fields

255 was chosen a long time ago and since then the maximum size of an index increase a lot in MySQL.

By looking at the constraints in the various databases we support it seems we could increase this number to 768 by default.

Pros:

  • don't need to change anything in the code
  • very easy to do, just need a migration

Cons:

  • 768 is a lot better than 255 but still not huge, of course we can make this number depend on the database, but MySQL is still the most used by far

MySQL/MariaDB

The maximum index size is 3072 bytes.

In utf8mb4 characters takes a maximum of 4 bytes, so the maximum size for our varchar is 768.

PostgreSQL

The maximum length for an indexed value is 2712 bytes but contrary to MySQL Postgres let you create an indexed varchar with whatever maximum size you want, and it's only when the text is compressed and inserted that it's going to check if it's under the 2712 bytes limit. It means that you can insert 100.000 characters string which is composed by repetitive characters easily because it will be compressed far below 2712, but you may not be able to insert some string with 4000 characters because the compressed size is greater than 2712 bytes.

In summary the Postgres limit is actually higher that the MySQL one in practice.

Oracle

The maximum size is 6398 bytes by default.

HSQLDB

The documentation claims that there is no limit, need to be tested. At least it's fine creating 768 varchar (which is good since we tend to use the same setup for both MySQL and HSQLDB).

SQL Server

For a nonclustered index key, the maximum is 1700 bytes.

Provide new EntityNameValidation components related to page/reference size

The idea is to provide new page EntityNameValidation components with different ways of reducing the size of a page reference or name.

Pros:

  • the limitation is not really a problem anymore

Cons:

  • that's not a generic solution, every code which want to benefit from it will have to go through EntityNameValidationManager before creating a new page

Cut the name of a page above a configurable short number and suffix it with a unique id

The idea is that if you have a page name with is greater than 30 characters you would take the first 30 characters and find a unique name by incrementing a counter.

Pros:

  • does not change anything for pages names under configured maximum size

Cons:

  • even if it's greatly increased you still have a maximum size which depend on the configured maximum name size

Use a short size unique id instead of a meaningful name for the page name

The idea would be to store a unique id (for example a simple unique counter based on the already existing pages in a given parent space) String instead of a name at each level of the reference.

Pros:

  • the maximum size of a reference become a lot bigger (if you consider spaces with less than 255 pages you can go up to a reference with about 768/3=256 elements)

Cons:

  • manipulate this kind of name is quite a pain in scripts (but it's less of a problem if an advanced user have an easy way to disable this when creating things like classes for example)
  • URLs become blind ids
  • most of the UI expose the title, but it's not yet 100% (especially in extensions)

Breaking changes

Arbitrary inode system

A very efficient way to not have this kind of limitation would be to do like filesystems and introduce an inode system where each document actually have an auto incremented inode and where you have a mapping between a reference and an inode.

Breaking changes:

  • you cannot deduce the document id from the reference anymore
  • every request which manipulate any of the fields listed in the top is broken (they are all replaced by inodes expect maybe for user related ones): you don't have the document reference is the object table anymore, you have to do a join through the document id (which became an inode)

Pro:

  • inode system comes with a very nice new feature: proper page aliases/links. Indeed, once a document is associated to an inode and not a reference anymore you can have several references pointing to the same inode
  • if we change the way to link documents and xobject we can as well allow xojects to reference a specific document translation instead of only the default one

 


Get Connected