Parent-child + Unique ID + Nested Documents

Last modified by Vincent Massol on 2024/11/19 16:14

Product: XWiki

Type: Requirements

Status: Idea

Participants

Description

Main idea
Uniquely identifying a document node in the document tree
- Absolute references (paths)
- Parent-child relationship instead of absolute reference/path
Unique document ID
Advantages of using Unique ID + Nested Documents based on parent-child
Exporting XAR format
Migration to Parent-child + Unique ID + Nested Documents
- From a previous XWiki version
  - Migrating relative links
- From a different wiki system that supports ND and wants to preserve hierarchy URLs
Other changes for backwards compatibility
- Add XWiki.getDocumentByID(String documentID)
Other implementation notes

Main idea

Drop the notion of "space" in XWiki altogether and instead use the existing parent-child relationship (the "parent" field) so that documents only store the relative hierarchy. This helps build a tree structure that is resistant to changes in the hierarchy.
Instead of references (Space.Name), use unique document names that are initially deduced from the user-filled document title and then tweaked by users to their liking (so that they get a nice URL) or when the deduced document ID is already taken (so the user is involved in the ID selection if need be). This allows us to retrieve a document in O(1) using its ID, which is useful in URL resolution and other operations.
Update the rights system to use the parent-child relationship instead of the document reference. This allows the inheritance of rights based on the parent-child relationship.

Uniquely identifying a document node in the document tree

Absolute references (paths)

The problem with storing the document's absolute path in the "space" field in the database (e.g. "Space1.Space2.Space3" for a document located in Space3) is that you need to keep this path up to date at all the levels of your document tree.

A>B>C

B: space = "A"

C: space = "A.B"

If I move document A under some other document, then I need to go through all the children of A (direct and indirect), recursively, so I can update the "space" field (path).

X>A => X>A>B>C

A: space = "X"

B: space = "X.A"

C: space = "X.A.B"

If instead we are to rename the document A, we would have to perform a similar operation, again, recursively.

When deleting document A, we have 2 options:

- delete just the document A and recursively update all its children to "upgrade" them one level in the hierarchy, or
- delete the node A and, recursively, all its children
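
To make the cost concrete, here is a minimal Python sketch of the path-based model. It is only an illustration: the `space` dictionary and the `move` function are hypothetical, not XWiki APIs, and documents are keyed by their bare name for brevity.

```python
# Hypothetical in-memory model: each document stores its full ancestor path.
space = {"A": "", "B": "A", "C": "A.B", "X": ""}


def move(doc, new_parent):
    """Move `doc` under `new_parent`; return the names of all rewritten documents."""
    old_prefix = f"{space[doc]}.{doc}" if space[doc] else doc
    parent_path = f"{space[new_parent]}.{new_parent}" if space[new_parent] else new_parent
    new_prefix = f"{parent_path}.{doc}"
    touched = [doc]
    space[doc] = parent_path
    # Every direct or indirect child carries the old prefix and must be rewritten.
    for name, path in space.items():
        if path == old_prefix or path.startswith(old_prefix + "."):
            space[name] = new_prefix + path[len(old_prefix):]
            touched.append(name)
    return touched


touched = move("A", "X")
```

After the call, `touched` lists A, B and C: the number of rewritten records grows with the size of the moved subtree.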

Parent-child relationship instead of absolute reference/path

If we only store relative parent-child information in each document node, then the tasks above translate to the following.

A>B>C

B: parent = "A"

C: parent = "B"

Move document A:

X>A => X>A>B>C

A: parent = "X"

B: parent = "A" (unmodified)

C: parent = "B" (unmodified)
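
The same move can be sketched for the parent-child model. Again this is a hypothetical in-memory illustration in Python, not XWiki code, and it assumes that each name identifies exactly one document (the restriction motivated in the next paragraphs).

```python
# Hypothetical in-memory model: each document stores only its direct parent's ID.
parent = {"X": None, "A": None, "B": "A", "C": "B"}


def move(doc, new_parent):
    """Re-parent `doc`; only this single record changes."""
    parent[doc] = new_parent


def path(doc):
    """Rebuild the full path on demand by walking up the parent chain."""
    chain = []
    while doc is not None:
        chain.append(doc)
        doc = parent[doc]
    return ">".join(reversed(chain))


move("A", "X")
```

Only the record of A is written; `path("C")` still yields the full, updated hierarchy because it is computed from the parent links rather than stored.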

The problem that we face in this case is how to identify the parent of the document if we only store the immediate parent and not the entire path. In other words, if you only have "C" as parent, how do you know which "C" it is in the tree of documents you have?

When using absolute references you fix this problem because there is only one path "A>B>C" in your tree leading to your node. However, we have established above that storing such paths for each node of the document tree is inefficient and unmaintainable, so we need to add a restriction on what "C" actually means. We need to make "C" unique in the document tree, so that we have all the information we need to uniquely identify a document in the tree.

Unique document ID

Current document ID (reference): Space.Name

Nested spaces proposal document ID: Space1.Space2.SpaceN.Name

Nested documents proposal document ID (based on nested spaces): Space1.Space2.Name.WebHome

UniqueID proposal document ID: Name

How the unique document ID is created, at document creation time:

1. The user enters the document title (e.g. "My Document").
2. The document ID is extracted from the title (e.g. "MyDocument").
3. The user is presented with the resulting URL of the document (e.g. "http://host/xwiki/bin/view/MyDocument"):
  - If the document ID "MyDocument" already exists, the user is asked to provide a unique document URL (ID).
  - If the document ID "MyDocument" is available, the user clicks next and proceeds to editing the newly created document.
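
The creation flow above could look roughly like the following Python sketch. The derivation rule (keep only ASCII letters and digits) is an assumption made for the example; the proposal does not specify how the ID is extracted from the title, and `derive_id` and `propose_id` are hypothetical names.

```python
import re


def derive_id(title):
    """Derive a candidate ID by dropping everything except letters and digits."""
    return re.sub(r"[^A-Za-z0-9]", "", title)


def propose_id(title, existing_ids):
    """Return (candidate_id, available) so the UI can ask the user when taken."""
    candidate = derive_id(title)
    return candidate, candidate not in existing_ids


existing = {"Main", "MyDocument"}
first = propose_id("My Document", existing)          # ID taken: user must pick another
second = propose_id("My Other Document", existing)   # ID free: user can proceed
```

When the second element of the result is false, the user interface would show the candidate URL and ask the user for a different ID instead of silently generating one.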

Rationale:

We already have 2 cases where this creation flow is successfully applied: wiki creation and AWM app creation. Even for new documents, we are already involving the user in picking a unique document ID; we are just hiding this uniqueness constraint by prefixing the document ID with the space (Space.Name), so that the constraint is pushed down to the space level (but only because of the prefix).

This idea is old:

Ever since the beginning, we have identified the need for a unique document ID:

Issue	Title	Creator
http://jira.xwiki.org/browse/XWIKI-117	Switching to GUID for document and objects	(Ludovic 2005)
http://jira.xwiki.org/browse/XWIKI-1021	New ID model	(Catalin 2007)

Until now, we have considered this Unique ID problem from the perspective that the ID needs to be machine generated and not human readable. This proposal suggests that we instead involve the users in the selection of the document ID, presented from the perspective of the resulting URL, since that is the only place where users should care about the document's name (i.e. more than its title).

Scripts

Another big advantage of not using random machine generated unique IDs is that they are also usable in scripts, so we no longer need to worry about how to grab the "XWiki.XWikiUsers" class when it is named "23EA233B" and instead we can still grab it by the ID "XWiki.XWikiUsers", or "MyDocument" or "MyDocument1", etc.

Advantages of using Unique ID + Nested Documents based on parent-child

URLs

Shorter URLs

/bin/view/MyDocument uniquely identifies the document MyDocument inside the wiki, while preserving the document hierarchy.

I.e. /bin/view/MyDocument takes you directly to the "MyDocument" document which is currently inside the hierarchy "Parent1>Parent2>MyDocument>Child1>etc."

More expressive URLs (optional / just an idea)

/bin/view/MyDocument#The+Title+Of+My+Document

Note: Using anchor instead of path elements to allow changes in the document title for the future while preserving existing bookmarks.

URLs become resistant to hierarchy changes

/bin/view/MyDocument will remain the same even if the parent of the document changes since MyDocument uniquely identifies the document, regardless of its position in the document hierarchy.

On the other hand, in the Nested Spaces-based approach, with a URL such as /bin/view/Space1/Space2/Space3/Page1, if you rename/move/remove Space2, all the child document URLs are affected and all the bookmarks pointing to them are invalidated.

The same bookmark invalidation happens with Nested Documents based on Nested Spaces, where you have a URL similar to /bin/view/Page1/Page2/Page3 and you alter Page2.

Support for path in URLs while preserving resistance to hierarchy changes

/bin/view/Parent1/Parent2/MyDocument is supported because we only need the last path element to identify the document MyDocument. The parent information can be exposed in the URL if desired, for aesthetic reasons, but it is not really useful/needed for practical reasons. As an addition, the parent information can be validated such that wrong parent information in the URL /bin/view/FakeParent/Parent2/MyDocument returns a Document Not Found error.

This could be an optional / disabled by default feature, in order to encourage short URLs which are more user-friendly.
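
A minimal Python sketch of how such optional path validation might work. It assumes URLs of the form /bin/view/... with no servlet context prefix, and the `parent` dictionary and `resolve` function are hypothetical stand-ins for the real storage and URL resolver.

```python
parent = {"Parent1": None, "Parent2": "Parent1", "MyDocument": "Parent2"}


def resolve(url_path):
    """Return the document ID for a /bin/view/... path, or None if not found."""
    elements = [e for e in url_path.split("/") if e][2:]  # drop "bin" and "view"
    if not elements or elements[-1] not in parent:
        return None
    # Walk up from the document and compare with the parents given in the URL.
    expected = elements[-1]
    for element in reversed(elements):
        if element != expected:
            return None
        expected = parent[expected]
    return elements[-1]
```

With this sketch, the short URL, the full path and a partial path all resolve to MyDocument, while /bin/view/FakeParent/Parent2/MyDocument resolves to nothing. Only the last path element is used for the lookup; the other elements are merely checked.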

Deleting/Renaming a document requires minimal local tree fixing

When a parent is renamed or deleted, it is only the immediate children that need to be fixed in order to restore consistency in the tree and that is it. No need for a recursive operation, unless it is a delete operation and the user instructed to delete the children together with the parent node.

In the path-based solution, we are always forced to update the path information of all the nodes.
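
The local fix can be sketched as follows, again as a hypothetical in-memory Python model rather than XWiki code. Both operations return the list of documents whose "parent" field had to be rewritten.

```python
parent = {"X": None, "A": "X", "B": "A", "C": "B"}


def delete(doc):
    """Delete only `doc`; its direct children move up one level."""
    grandparent = parent.pop(doc)
    fixed = [d for d, p in parent.items() if p == doc]
    for child in fixed:
        parent[child] = grandparent
    return fixed


def rename(old_id, new_id):
    """Give `old_id` a new ID and repoint its direct children."""
    parent[new_id] = parent.pop(old_id)
    fixed = [d for d, p in parent.items() if p == old_id]
    for child in fixed:
        parent[child] = new_id
    return fixed


renamed = rename("A", "A2")   # only B is rewritten
deleted = delete("A2")        # only B is rewritten, C is left alone
```

In both cases C is never touched, because its parent B keeps the same ID.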

Fixing backlinks on a document rename

Since a wiki link will now be just [[My Document>>MyDocument]] you will basically have removed the notion of relative links and all the problems faced while processing/resolving/fixing them since the unique document ID is also absolute.

Exporting XAR format

One advantage of using a unique document ID that is not purely technical and randomly generated is that it still makes sense outside the XWiki instance it was generated on and can be easily exported and reimported on a different XWiki instance.

However, the format of the exported XAR will need to be changed to support:

- Child pages when exporting a subtree of documents (i.e. P1>P2>P3). The directory structure would be something like:
  /P1/P1.xml
  /P1/P2/P2.xml
  /P1/P2/P3/P3.xml
- The hierarchy of an exported document (i.e. exporting only P3 from P1>P2>P3). The directory structure would be something like:
  /P1/P2/P3/P3.xml
  such that we don't touch the XML of the exported document and the parent field looks just like:
  <parent>P2</parent>
  Importing this XAR would result in two blank documents (P1 with P2 as child) being created to preserve the original hierarchy.
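
As an illustration only, the directory layout above can be derived from the parent links as in this Python sketch. The functions `xar_entry` and `placeholders` are hypothetical and do not correspond to the existing XAR packaging code.

```python
parent = {"P1": None, "P2": "P1", "P3": "P2"}


def xar_entry(doc):
    """Return the archive path for `doc`, e.g. /P1/P2/P3/P3.xml."""
    chain = []
    node = doc
    while node is not None:
        chain.append(node)
        node = parent[node]
    return "/" + "/".join(reversed(chain)) + f"/{doc}.xml"


def placeholders(entry):
    """On import, the ancestors named in the path that may need blank documents."""
    return entry.strip("/").split("/")[:-2]


subtree = [xar_entry(d) for d in ("P1", "P2", "P3")]
```

For the single-document export, `placeholders("/P1/P2/P3/P3.xml")` gives P1 and P2, the two blank documents mentioned above.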

Migration to Parent-child + Unique ID + Nested Documents

From a previous XWiki version

Previous XWiki document references are in the form "Space.Page" (space="Space", name="Page") and are accessible in the URL through "Space/Page".

A migrator can be written to use the previous serialized document reference as unique document ID (since the "<space>.<page>" combination is unique, as described earlier in this proposal). As a result of this migrator, the new URL of a migrated Space1.Page1 document would become:

from /bin/view/Space1/Page1
to /bin/view/Space1.Page1

To preserve the bookmarks that existed before the migration, a dedicated extension can be written and installed on a wiki that was migrated from a previous version. Such an extension would check whether the incoming URL contains more than 1 path element. Example:

Incoming /bin/view/Space1/Page1 contains 2 path elements (Space1 and Page1). The wiki is set to use short URLs which require only 1 path element (the document ID). This means that this is a request for the old URL scheme coming from an existing bookmark.
The URL gets resolved to /bin/view/Space1.Page1 (i.e. the "/" is converted to a "." and the "Space1.Page1" document ID is checked and returned if it exists)

A new XWiki instance that is not migrating from any previous version should not need to install this extension. Also, an XWiki instance that did migrate and has installed this extension can choose to uninstall it whenever its admins wish to stop supporting existing bookmarks on the old URLs that were in use before the migration.
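
The resolution step performed by such an extension might be sketched like this in Python. The `documents` set stands in for the wiki store, the URL is assumed to start with /bin/view/, and `resolve_legacy` is a hypothetical name.

```python
documents = {"Space1.Page1", "MyDocument"}


def resolve_legacy(url_path):
    """Map an old /bin/view/Space/Page URL to the migrated document ID."""
    elements = [e for e in url_path.split("/") if e][2:]  # drop "bin" and "view"
    if len(elements) < 2:
        return None  # already a short URL; not handled by this extension
    candidate = ".".join(elements)
    return candidate if candidate in documents else None
```

Note that this sketch would conflict with the optional "hierarchy URLs" feature, since both give a meaning to URLs with several path elements; how the two would coexist is left open here.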

Migrating relative links

To handle existing relative links that need to be migrated (e.g. "WebHome"), the migrator would first have to make existing links absolute before switching to the unique ID approach so that, after the migration is complete, we end up with a documentID named "Main.WebHome" and all the links that were previously relative ("WebHome") will now be absolute ("Main.WebHome").

From a different wiki system that supports ND and wants to preserve hierarchy URLs

After transferring the content and hierarchy information to the XWiki instance, simply enable the "hierarchy URLs" feature described above.

Other changes for backwards compatibility

Add XWiki.getDocumentByID(String documentID)

In order to support existing scripts and code that gets documents by DocumentReference or (serialized) Document Reference String, we should add an XWiki.getDocumentByID(String documentID) method where "documentID" is the unique document ID. Any existing API methods of the XWiki class such as XWiki.getDocument(...) should serialize the DocumentReference to a String (for DocumentReference versions) and use that String as documentID in this new method.
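
The delegation could be sketched as follows. This is a Python illustration of the idea only: the real API is Java, `DocumentReference` here is a simplified stand-in that ignores the wiki part of a reference, and the `Wiki` class and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentReference:
    """Simplified stand-in for the old (space, name) reference."""
    space: str
    name: str

    def serialize(self):
        return f"{self.space}.{self.name}"


class Wiki:
    def __init__(self, store):
        self._store = store  # maps unique document ID -> document content

    def get_document_by_id(self, document_id):
        """New entry point: look a document up by its unique ID."""
        return self._store.get(document_id)

    def get_document(self, reference):
        """Old entry point: accept a reference or its serialized string."""
        if isinstance(reference, DocumentReference):
            reference = reference.serialize()
        return self.get_document_by_id(reference)


wiki = Wiki({"XWiki.XWikiUsers": "users class", "MyDocument": "hello"})
```

Existing callers keep passing a reference or a "Space.Page" string, while new code can call the ID-based method directly with an ID such as "MyDocument".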

Other implementation notes

Obtaining children and parents information for a node

An extra hierarchy table (so not just the "parent" field or rather, replacing the "parent" field) would most likely be needed in both this approach and in a full-fledged Nested Documents approach (where we only store document parents and not full paths), for the simple reason that we would be storing a tree in an SQL database. Some implementation details on tree structures stored in SQL databases are listed in the last section of this page.

This is needed to be able to efficiently obtain the children information for a node, including direct children (also requires depth information / column to be stored in the hierarchy table), and also the parent information (for breadcrumbs for example or just for loading the document's full path in 1 query, sorted by depth).
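
One possible shape for such a hierarchy table is a closure table, one of the models covered in the slides referenced at the end of this page. The sketch below uses Python with an in-memory SQLite database; the table and column names are assumptions made for the example and are not part of the XWiki schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hierarchy (ancestor TEXT, descendant TEXT, depth INTEGER)")


def add_document(doc, parent=None):
    """Insert `doc` under `parent`, linking it to itself and to every ancestor."""
    conn.execute("INSERT INTO hierarchy VALUES (?, ?, 0)", (doc, doc))
    if parent is not None:
        conn.execute(
            "INSERT INTO hierarchy "
            "SELECT ancestor, ?, depth + 1 FROM hierarchy WHERE descendant = ?",
            (doc, parent),
        )


def direct_children(doc):
    """Direct children only, thanks to the depth column."""
    rows = conn.execute(
        "SELECT descendant FROM hierarchy WHERE ancestor = ? AND depth = 1 "
        "ORDER BY descendant",
        (doc,),
    )
    return [r[0] for r in rows]


def breadcrumb(doc):
    """Full path of `doc` in one query, root first."""
    rows = conn.execute(
        "SELECT ancestor FROM hierarchy WHERE descendant = ? ORDER BY depth DESC",
        (doc,),
    )
    return [r[0] for r in rows]


for doc, parent in [("A", None), ("B", "A"), ("C", "B"), ("D", "A")]:
    add_document(doc, parent)
```

Both the breadcrumb and the list of direct children are obtained with a single non-recursive query, which is the use case described above.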

Alternative without an extra hierarchy table

If we wish to avoid the extra table, we could avoid the breadcrumbs use case by displaying only the direct parent of the current document and the wiki of the current document in the breadcrumbs, with the possibility to expand the gap between them and get a tree view of the hierarchy. UI example and UI example when expanding the gap.

For the top-down navigation of the tree (e.g. in a tree widget), the "parent" field provides enough information to recursively get the list of immediate children of a node.
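
A minimal Python sketch of this top-down navigation, using only the "parent" field. The `parent` dictionary is a hypothetical stand-in for a query on the documents table, and each call to `children` corresponds to one such query.

```python
parent = {"A": None, "B": "A", "C": "B", "D": "A"}


def children(doc):
    """Immediate children: one lookup on the "parent" field."""
    return sorted(d for d, p in parent.items() if p == doc)


def subtree(doc, depth=0):
    """Expand the tree below `doc` as (depth, id) pairs, one level per call."""
    rows = [(depth, doc)]
    for child in children(doc):
        rows.extend(subtree(child, depth + 1))
    return rows
```

A tree widget would typically call `children` lazily, when the user expands a node, rather than loading the whole subtree at once.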

Preserving URLs after a document rename

In order to preserve a URL after a document rename we can still apply the "Document alias" solution, which would leave an "alias" of the document with the old ID in order to preserve existing bookmarks.

http://jira.xwiki.org/browse/XWIKI-3622 Add an automatic redirect when renaming a document

Concerns about document unicity constraint

As mentioned near the beginning of this proposal, a "solution" for reusing a document name is to prefix it with a namespace. Example for reusing "Test":

"My:Test"
"My.Test"
"My_Test"
etc.

It is purely the user's choice what scheme to apply to obtain a unique document ID, whether it is to append to the name ("Test1", "Test2", etc.), to prepend ("ATest", "BTest", etc.) or to define a naming scheme as exemplified above.

More details on storing trees in SQL databases

The following slides compare the available options and present details and examples on Closure Tables:

http://www.slideshare.net/billkarwin/models-for-hierarchical-data

http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back/48
