Annotation Feature

Last modified by Vincent Massol on 2024/02/26 17:55

 XWiki
 Implementation
 Completed
 

Description

Objective

This feature aims to allow users to annotate an XWiki document. The user defines the annotation target by selecting text with a pointing device. An annotation must remain as persistent as possible across subsequent versions of the document. We will use Google Web Toolkit.

Documentation

Design

How do we retrieve the wiki source code that generated the selected HTML?

Solution n°1 : Richer HTML (not used)

The idea is to replace (lazily) the initial HTML with a richer HTML. What do I mean by richer? In order to retrieve the position of a selected word in the XWiki source, we wrap each word in an invisible markup element that contains its position. From the selection we can then get the first and last word-wrapping elements, and extract the position and length of the selection in the XWiki source document.

Example :

Initial XWiki source

*Juliet*
* Yea, noise? then I'll be brief. O happy dagger!
~~Snatching Romeo's dagger~~
* This is thy sheath;
~~Stabs herself~~
* there rust, and let me die.
~~Falls on Romeo's body, and dies Enter Watch, with the Page of Paris~~

Richer HTML : 

<p/>
<strong><span pos="1">Juliet</span></strong>
<ul class="star">
<li><span pos="12">Yea,</span> <span pos="17">noise?</span> <span pos="24">then</span> <span pos="29">I'll</span> <span pos="34">be</span> <span pos="37">brief.</span> <span pos="44">O</span> <span pos="46">happy</span> <span pos="52">dagger!</span></li>

...
<p/>
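The word-wrapping idea above can be sketched in Java as follows. This is a minimal sketch, not the prototype's code: the class and method names are hypothetical, and a "word" is assumed to be any run of characters that are neither whitespace nor the '*' and '~' markup characters of the example. Positions are 0-based offsets in the source; the positions in the example above match this scheme if the source uses \r\n line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RicherHtmlSketch {
    /** A word of the wiki source and the offset where it starts in that source. */
    record Word(int pos, String text) {}

    // Assumption: a word is a run of characters that are neither whitespace
    // nor the wiki markup characters '*' and '~' used in the example.
    private static final Pattern WORD = Pattern.compile("[^\\s*~]+");

    /** Finds every word of the source together with its position. */
    static List<Word> words(String source) {
        List<Word> result = new ArrayList<>();
        Matcher m = WORD.matcher(source);
        while (m.find()) {
            result.add(new Word(m.start(), m.group()));
        }
        return result;
    }

    /** Wraps a word in the invisible markup carrying its source position. */
    static String wrap(Word w) {
        return "<span pos=\"" + w.pos() + "\">" + w.text() + "</span>";
    }

    /** Offset and length, in the source, of a selection going from the first to the last word. */
    static int[] selectionInSource(Word first, Word last) {
        int end = last.pos() + last.text().length();
        return new int[] {first.pos(), end - first.pos()};
    }

    public static void main(String[] args) {
        String source = "*Juliet*\n* Yea, noise? then I'll be brief.";
        List<Word> ws = words(source);
        System.out.println(wrap(ws.get(1))); // <span pos="11">Yea,</span>
        int[] sel = selectionInSource(ws.get(1), ws.get(3));
        System.out.println(source.substring(sel[0], sel[0] + sel[1])); // Yea, noise? then
    }
}
```

Only the first and last selected spans are needed: the selection covers the source from the first word's position to the end of the last word.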

Solution n°2 : HTML2XWiki mapping (not used)

Using an HTML-to-XWiki-syntax converter, we are able to transform HTML back into XWiki source. We could therefore insert "begin of annotation" and "end of annotation" markers in the HTML and convert it to XWiki source. The resulting XWiki source document contains these markers, so determining the position and length of the annotation is trivial.

Example :

*Juliet*
* Yea, noise? then I'll be brief. O happy dagger!
~~Snatching Romeo's dagger~~
* This is thy sheath;
~~Stabs herself~~
* there rust, and let me die.
~~Falls on Romeo's body, and dies Enter Watch, with the Page of Paris~~

After selecting the first four lines, we get this content:

<p/>
<strong>{{annotation}}Juliet</strong>
<ul class="star">
<li>Yea, noise? then I'll be brief. O happy dagger!</li>

</ul><em>Snatching Romeo's dagger</em>
<ul class="star">
<li>This is thy sheath;{{/annotation}}</li>
</ul><em>Stabs herself</em>
<ul class="star">
<li>there rust, and let me die.</li>
</ul><em>Falls on Romeo's body, and dies Enter Watch, with the Page of Paris</em><p/>

After mapping:

*{{annotation}}Juliet*
* Yea, noise? then I'll be brief. O happy dagger!
~~Snatching Romeo's dagger~~
* This is thy sheath;{{/annotation}}
~~Stabs herself~~
* there rust, and let me die.
~~Falls on Romeo's body, and dies Enter Watch, with the Page of Paris~~

Using the start/end selection tags, we are able to get the position and size of the selection and to store them in a dedicated modeling class for annotations.
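A minimal sketch of the tag extraction, assuming the converter leaves the {{annotation}} and {{/annotation}} markers from the example intact and that each appears exactly once; the class and method names are hypothetical.

```java
public class AnnotationTagsSketch {
    // Tags taken from the example; assumed to survive the HTML-to-wiki conversion unchanged.
    static final String START = "{{annotation}}";
    static final String END = "{{/annotation}}";

    /** The wiki source without the tags, and the annotated fragment's offset and length in it. */
    record Located(String source, int offset, int length) {}

    static Located locate(String taggedSource) {
        int start = taggedSource.indexOf(START);
        if (start < 0) {
            throw new IllegalArgumentException("missing " + START);
        }
        int end = taggedSource.indexOf(END, start + START.length());
        if (end < 0) {
            throw new IllegalArgumentException("missing " + END);
        }
        String inner = taggedSource.substring(start + START.length(), end);
        String clean = taggedSource.substring(0, start) + inner + taggedSource.substring(end + END.length());
        // Nothing before the start tag moves when the tags are removed, so its index is the offset.
        return new Located(clean, start, inner.length());
    }
}
```

For the mapped example above, the offset is 1 (just after the leading '*' of *Juliet*) and the length runs up to the end of "This is thy sheath;".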

Solution n°3 : typographic alteration (used)

The selected HTML text and the corresponding wiki syntax generally differ in that the display (formatting) information disappears:

wiki syntax

*this is bold*

html selection

this is bold

Noting that most wiki syntaxes use sequences of non-alphanumeric characters for formatting information (~~, *, ...), it appeared reasonable to suppose that removing non-alphanumeric characters from the wiki syntax would improve our chances of directly matching the selection in the altered wiki content. Fortunately, the current prototype shows that this heuristic works well in most cases.

The prototype character filter only accepts Roman alphabet characters and digits. This is bad because it hurts internationalization; I simply overlooked this aspect. I therefore propose that the DefaultCharacterFilter only refuse characters that have a special meaning in the wiki syntax concerned.

The prototype also uppercases all characters; I can't justify this choice, so it will be removed.
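The matching heuristic with the proposed changes (no uppercasing, only wiki-specific characters refused) could look like the following sketch. The class name and the choice of also ignoring whitespace are assumptions, not part of the prototype: whitespace is dropped here because rendering may turn line breaks into spaces. Each kept character remembers its index in the original source, so a match in the altered text can be mapped back to an offset and length in the wiki source.

```java
import java.util.Set;

public class TypographicAlterationSketch {
    /**
     * Characters ignored when matching: the given wiki-specific characters, plus whitespace
     * (assumption). Letters of any alphabet are kept and case is preserved.
     */
    static boolean refused(char c, Set<Character> wikiSpecial) {
        return wikiSpecial.contains(c) || Character.isWhitespace(c);
    }

    /** Returns {offset, length} of the selection in the wiki source, or null if not found. */
    static int[] locate(String source, String selection, Set<Character> wikiSpecial) {
        // Altered source, plus, for each kept character, its index in the original source.
        StringBuilder altered = new StringBuilder();
        int[] toSource = new int[source.length()];
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (!refused(c, wikiSpecial)) {
                toSource[altered.length()] = i;
                altered.append(c);
            }
        }
        // The selection goes through the same filter before searching.
        StringBuilder wanted = new StringBuilder();
        for (char c : selection.toCharArray()) {
            if (!refused(c, wikiSpecial)) {
                wanted.append(c);
            }
        }
        if (wanted.length() == 0) {
            return null;
        }
        int start = altered.indexOf(wanted.toString());
        if (start < 0) {
            return null;
        }
        int first = toSource[start];
        int last = toSource[start + wanted.length() - 1];
        return new int[] {first, last - first + 1};
    }
}
```

For example, with '*' and '~' refused, the selection "this is bold" is found at offset 6, length 12 in the source "Some *this is bold* text".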

How to keep an annotation relevant when a document is modified?

By default, all XWiki documents are mutable, so an annotation target can be moved, modified or deleted. From version n to n+1, we are able to define a revision object describing all the alterations made in version n+1 with respect to version n. From this object we know that x characters have been inserted/removed at position y, so we can determine the new position and length of the annotation's XWiki source fragment.
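Assuming the revision object can be read as a sequence of insertions and deletions (x characters at position y), updating an annotation's position and length can be sketched as below. Names are hypothetical, and treating an insertion exactly at the annotation's start as falling outside it is a design choice of this sketch.

```java
public class RevisionShiftSketch {
    /** An annotation target: the source characters [offset, offset + length). */
    record Range(int offset, int length) {
        int end() {
            return offset + length;
        }

        /** x characters were inserted at position y. */
        Range afterInsert(int y, int x) {
            if (y <= offset) {
                return new Range(offset + x, length); // before (or at) the start: shift
            }
            if (y >= end()) {
                return this;                          // at or after the end: unchanged
            }
            return new Range(offset, length + x);     // inside: the target grows
        }

        /** x characters were removed from position y, i.e. [y, y + x). */
        Range afterDelete(int y, int x) {
            // Deleted characters located before the target shift it to the left...
            int deletedBefore = Math.max(0, Math.min(offset, y + x) - y);
            // ...while deleted characters inside the target shorten it.
            int deletedInside = Math.max(0, Math.min(end(), y + x) - Math.max(offset, y));
            return new Range(offset - deletedBefore, length - deletedInside);
        }
    }
}
```

A range whose length drops to 0 has had its whole target deleted; such annotations are natural candidates for the relevance check described in the next section.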

How can we detect that an annotation is no longer relevant?

An annotation's selection and/or context can be modified, which may cause the annotation to become out of date. To detect this, we could use the Levenshtein distance as an indicator.
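A sketch of such an indicator, using the standard dynamic-programming Levenshtein distance. Normalizing by the length of the longer string and the threshold value are hypothetical choices that would need tuning; which strings to compare (selection, context, or both) is left open by the design.

```java
public class RelevanceSketch {
    /** Classic dynamic-programming edit distance, keeping only two rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from "" to the first j characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    /** Hypothetical rule: relevant while at most maxRatio of the longer text has changed. */
    static boolean isStillRelevant(String annotated, String current, double maxRatio) {
        int longest = Math.max(annotated.length(), current.length());
        if (longest == 0) {
            return true;
        }
        return (double) levenshtein(annotated, current) / longest <= maxRatio;
    }
}
```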

How to render html with annotation markups

The prototype inserts markers in a copy of the document content, and then renders the resulting string.

The markers are not related to any wiki syntax: they are symbolic and only make sense inside the annotation plugin. They must not be interpreted by any parser, because they are what allows us to retrieve the selection boundaries once rendering has been done. So it is important to find them after rendering in exactly the same state in which we inserted them.

Finally, these symbolic markers are replaced by span elements, which highlight the selection range.
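A minimal sketch of the marker approach. The private-use characters U+E000/U+E001 used as markers and the "annotation" CSS class are assumptions, and the renderer is passed in as a function since the real rendering pipeline is out of scope here.

```java
import java.util.function.UnaryOperator;

public class AnnotationMarkersSketch {
    // Hypothetical markers: Unicode private-use characters, which no wiki parser should interpret.
    static final String START_MARKER = "\uE000";
    static final String END_MARKER = "\uE001";

    /**
     * Inserts the markers around source[offset, offset + length) in a copy of the content,
     * renders it, then turns the markers, found back unchanged, into a span.
     */
    static String render(String source, int offset, int length, UnaryOperator<String> renderer) {
        String marked = source.substring(0, offset) + START_MARKER
                + source.substring(offset, offset + length) + END_MARKER
                + source.substring(offset + length);
        String html = renderer.apply(marked);
        return html.replace(START_MARKER, "<span class=\"annotation\">").replace(END_MARKER, "</span>");
    }
}
```

Because the markers are replaced textually, a selection crossing element boundaries (for instance two list items) would produce unbalanced spans; a complete implementation would have to close and reopen the span around such tags.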

Data model

Annotation

An annotation is an abstract object modeling metadata about a fragment of an XWiki source document; this document has a version. It will be handled using an abstract class because there can be many kinds of annotations. The abstract class will have access to the corresponding selection object, to the corresponding XWiki document object and to the last version of the document in which the annotation was determined to be relevant.

Selection

A selection will be implemented by an object containing its position (offset) and length in the XWiki source document. It will also contain the XWiki source string (+context?) and possibly the resulting HTML string.
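The data model above might translate to the following Java skeleton. All names are hypothetical; the document is referenced by its name rather than by an XWiki document object to keep the sketch self-contained, and CommentAnnotation only illustrates one possible concrete kind of annotation.

```java
public class AnnotationModelSketch {
    /** A selection in the XWiki source of a given document version. */
    record Selection(int offset, int length, String source, String context, String html) {}

    /** Common part of every kind of annotation. */
    abstract static class Annotation {
        private final Selection selection;
        private final String documentName;
        private String lastRelevantVersion;

        protected Annotation(Selection selection, String documentName, String lastRelevantVersion) {
            this.selection = selection;
            this.documentName = documentName;
            this.lastRelevantVersion = lastRelevantVersion;
        }

        Selection getSelection() { return selection; }
        String getDocumentName() { return documentName; }
        String getLastRelevantVersion() { return lastRelevantVersion; }

        /** Called once the annotation has been checked against a newer document version. */
        void markRelevantIn(String version) { this.lastRelevantVersion = version; }

        /** Each concrete kind of annotation identifies itself. */
        abstract String getType();
    }

    /** One possible kind of annotation: a plain text comment. */
    static final class CommentAnnotation extends Annotation {
        private final String comment;

        CommentAnnotation(Selection selection, String documentName, String version, String comment) {
            super(selection, documentName, version);
            this.comment = comment;
        }

        String getComment() { return comment; }

        @Override
        String getType() { return "comment"; }
    }
}
```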

Scripts

If the user selects generated content, the application should inform them (using a warning icon and a tooltip). This way we indicate that the content can change even if the source code does not. As a first approach, we should consider macro and script results as atomic: if part of the generated content is selected, the selection is expanded to cover all the generated content. Of course, the implementation should also be flexible, allowing different pluggable behaviours for specific macros (xhtml for example).

Includes

If the user selects included content, they should be aware that the annotation will target the included document. Selection of included content should also be atomic.
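The atomic behaviour for macros, scripts and includes amounts to growing the selection until it no longer partially covers any generated fragment. This is a sketch assuming the renderer can report the [start, end) ranges of generated or included content in the rendered text (a hypothetical input); the generated flag is what would trigger the warning icon.

```java
import java.util.List;

public class AtomicSelectionSketch {
    /** Expanded selection; generated is true if it touched generated or included content. */
    record Expansion(int start, int end, boolean generated) {}

    /** Expands [start, end) so that every atomic range it intersects is covered entirely. */
    static Expansion expand(int start, int end, List<int[]> atomicRanges) {
        boolean generated = false;
        boolean changed = true;
        while (changed) { // growing the selection may make it reach another atomic range
            changed = false;
            for (int[] r : atomicRanges) {
                boolean intersects = r[0] < end && start < r[1];
                if (intersects) {
                    generated = true;
                    if (r[0] < start || r[1] > end) {
                        start = Math.min(start, r[0]);
                        end = Math.max(end, r[1]);
                        changed = true;
                    }
                }
            }
        }
        return new Expansion(start, end, generated);
    }
}
```

Pluggable per-macro behaviour could then be obtained by letting each macro decide whether its output range is reported as atomic.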

User interface

Adding annotations

Once a selection has been defined, a tooltip appears suggesting that the user add an annotation on the selected text. If the user wants to annotate the selection, a panel is displayed beside the document block. This panel allows choosing the type of annotation, and displays widgets specific to that annotation type. A button saves the annotation, which makes the panel disappear. If the new annotation intersects with a previous annotation of the current user, I think it makes sense to merge them.
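Only the range part of that merge is sketched below; how the annotation contents (e.g. two comments) should be combined is left open, as in the text. Names are hypothetical, and ranges that merely touch at a boundary are assumed not to intersect.

```java
import java.util.Optional;

public class MergeSketch {
    /** Hypothetical: the author of an annotation and its [start, end) range. */
    record UserRange(String author, int start, int end) {}

    /** Returns the union of two annotations of the same user that intersect, if any. */
    static Optional<UserRange> merge(UserRange existing, UserRange added) {
        boolean sameUser = existing.author().equals(added.author());
        boolean intersects = existing.start() < added.end() && added.start() < existing.end();
        if (!sameUser || !intersects) {
            return Optional.empty();
        }
        return Optional.of(new UserRange(existing.author(),
                Math.min(existing.start(), added.start()),
                Math.max(existing.end(), added.end())));
    }
}
```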

Annotations display

Each annotated sentence is highlighted. The annotation itself could be displayed beside the content block, under it, or in another page. To keep the document clear, it could be useful to highlight annotations user by user; by default, the last annotator has focus. A list allows switching between annotators.

Proposed roadmap

  • Annotation implementation.
  • Selection implementation.
  • Mechanism for determining the XWiki source corresponding to a selection.
  • Mechanism for keeping an annotation up to date when the document is modified.
  • Mechanism for detecting an irrelevant annotation.
  • Tests of the mechanism extracting the XWiki source code targeted by an annotation.
  • Tests of annotation resistance to XWiki document modifications.

Current Implementation

The current implementation (sandbox@r24300) can be described as follows:

  • there is a single type of annotation, which can be added using a specific JavaScript client backed by a REST service. Such an annotation contains an annotation text, the annotated content and its position, and this type is tightly coupled to the plugin implementation
  • the backing annotation storage is based on components, currently with an XWiki objects implementation, but a different service can easily be created (there is one implemented for Scribo annotations stored in RDF)
  • for the moment, only XWiki documents and feed entries fetched by the feedreader plugin can be used as targets for annotations (the annotated documents), with the restriction that the content of the document must not be generated using scripting. A component can be implemented for a new type of document, but the current UI (the JavaScript client) is specific to XWiki documents
  • the JavaScript client (UI) is only at the prototype stage: while it proves that the approach works, it is not robust enough and the user experience is poor
  • the annotation creation algorithm seems to perform well in practice, but we should put it to more real-world tests (by releasing it).

The following picture illustrates the flow of adding an annotation:

addAnnotationSeq.png

To show all the annotations in a document, the following flow is executed:

showAnnotationsSeq.png

Where the referred services are implemented as illustrated in the following diagram:

classDiagram.png

Features improvements

  1. Typed annotations: be able to have different types of annotations (one should be able to easily specify the fields of the annotations to add), stored as XWiki objects. This needs to be flexible at all levels, UI and storage backend, preferably using XWiki scripting (no jars on the server side, no Java coding) so that it can be easily customized. Also, the annotation UI should be designed to be easy to customize, so that any forms and actions can be created in a lightweight manner.
     The current problem in implementing this feature is that the JavaScript client is custom-built for the annotation type described by the Annotation class (in the last diagram), and the function signatures, communication interfaces and the Annotation type are not flexible enough to accommodate new types easily.
  2. Annotated object fields: be able to annotate any type of document (object inside such a document), namely all or any number of text fields in such an object. Preferably this should also be doable only using xwiki scripting or configuration.
     For implementing this feature, there are a few issues to overcome:
    • the client side selection detection relies on the way the document is rendered, and needs to identify the "annotatable" fields in the document (for now, "xwikidocument" is hardcoded as the id of the content that can be annotated)
    • the server side selection mapping relies on implementations of the IOTargetService, which need to be provided on the server. Ideally we should be able to configure this, instead of coding it.

Also see http://markmail.org/thread/brrwupwlnql2huxd for a discussion about current state and desired improvements.

Typed annotations

Solution 1: "extra fields" in the Annotation class

The idea is for the Annotation class to admit "extra fields", so that, if the JS client and the IO service are aware of them and handle them correctly, any type of annotation can be used. To get this through, we need to ensure that:

  • the JS client can have an Annotation type configured, as an XWiki class. The JS client can display a form to edit any object of that class. This task can also be implemented in 2 ways:
    • either the annotation form/view is actually built in JS only, using information from the wiki, and then submitted to the REST service
    • or the JS code does nothing but asynchronously fetch the form HTML from a "form builder" (a set of pages/sheets/templates that can build a form for a given type, display an object of a page, etc.), to be submitted to the REST service.
      All the classes used as annotation types will have a set of mandatory fields though, namely the data allowing the annotation position to be identified in the document (selection, context, offset).
  • the XML schema which defines the XML serialization for REST communication is relaxed so that the annotation element is allowed to contain any number of "extra fields" stored as (name, value) pairs. Similarly, the Annotation class is adjusted to support these fields
  • the annotation handling mechanism (like mapping to source, or rendering the annotated HTML) stays the same, ignoring the "extra fields"
  • at the storage level (IOService), each implementation will handle the extra fields accordingly. The XWiki based implementation should get the annotation XWiki class either from a configuration, or should receive it as a parameter, and will serialize the Annotation object in an object of that specific class attached to the current page. Upon retrieval, it will detect all the objects of the configured annotation type and create the corresponding Annotation list from that data. The Scribo service will handle serialization and deserialization to/from the RDF storage.
  • as a general approach, the Scribo project implementation for the annotations will consist of a .xar containing the XWiki class with the specific fields (along with potential customization of the edit form or annotation display) and the implementation of the IOService (for which, since implemented as a component, just dropping the Scribo .jar in the lib folder should be enough to make the annotations engine use the scribo IO)
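A sketch of an Annotation class admitting "extra fields" next to its mandatory positioning data, with a relaxed XML serialization. The element names and the <field name="..."> shape are hypothetical; the actual schema would be defined by the REST module.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class ExtraFieldsSketch {
    /** Mandatory positioning data plus any number of (name, value) extra fields. */
    static final class Annotation {
        private final String selection;
        private final String context;
        private final int offset;
        // Insertion order is kept so that serialization is stable.
        private final Map<String, String> extraFields = new LinkedHashMap<>();

        Annotation(String selection, String context, int offset) {
            this.selection = Objects.requireNonNull(selection, "selection");
            this.context = Objects.requireNonNull(context, "context");
            this.offset = offset;
        }

        String getSelection() { return selection; }
        String getContext() { return context; }
        int getOffset() { return offset; }

        void setField(String name, String value) {
            extraFields.put(Objects.requireNonNull(name), Objects.requireNonNull(value));
        }

        Map<String, String> getExtraFields() {
            return Collections.unmodifiableMap(extraFields);
        }
    }

    /** Hypothetical relaxed XML: the mandatory elements, then one <field> per extra field. */
    static String toXml(Annotation a) {
        StringBuilder xml = new StringBuilder("<annotation>");
        xml.append("<selection>").append(escape(a.getSelection())).append("</selection>");
        xml.append("<context>").append(escape(a.getContext())).append("</context>");
        xml.append("<offset>").append(a.getOffset()).append("</offset>");
        for (Map.Entry<String, String> field : a.getExtraFields().entrySet()) {
            xml.append("<field name=\"").append(escape(field.getKey())).append("\">")
                .append(escape(field.getValue())).append("</field>");
        }
        return xml.append("</annotation>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

The annotation handling mechanism only reads the mandatory data, while each storage implementation decides what to do with the map of extra fields.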

Pros: preserves the current annotations architecture, creates an easy way to plug an annotation storage service

Cons: the Annotation class would be nothing else but a "BaseCollection" the same as a generic XWiki object, and we would be reimplementing view, save, edit, delete of this type (already implemented through the xwiki action) from the web forms level (editing and displaying an annotation), through the controller (REST this time instead of actions in the standard wiki servlet), to the storage level (XWikiIOService will only transform a map (the Annotation class) to another map (the XWikiObject in the xwiki document)).

Solution 2: XWiki objects model based implementation

Given the above solution's "cons", we can think of an exclusive XWiki data model based solution: all annotations are handled as XWiki objects, and the standard xwiki object and documents manipulation is used for the edit, view or save of annotations. The components of the process would be implemented as follows:

  • the js client would only asynchronously fetch and display in a bubble (tooltip, dialog) the standard inline edit form, or the standard view of XWiki objects of a preconfigured type (the annotations xwiki class).
  • an annotation REST service would only be needed for fetching the annotated rendered document content. Annotations save, edit, delete would be done through the standard XWiki actions
  • the storage service (IOService) is not needed anymore, only an IOTargetService would be used for generating the annotated HTML
  • Scribo, in this case, would be implemented as an external "alterer" of the XWiki data: in order to add its annotations from the RDF storage, it would create XWiki objects (using any of the XWiki APIs) from its RDF data and would read the data from the wiki to get the newly created annotations. Basically, it would run from time to time and make sure that the wiki is "synchronized" with the RDF store.
  • however, in this case, server-side annotation mapping cannot be hooked into the annotation saving process anymore (since a default edit form would submit to the default save action). There are 2 solutions to fix this:
    • implement an observer listening to the creation of annotation objects and, when they don't map correctly, delete the "wrong" annotations. This can be an issue if there are many wrong annotations: objects are created only to be deleted afterwards, and document versions are created for no reason
    • trust the js client selection detection, and consider that all positions are correct, give up the validation at save time completely and handle the annotations positions at rendering time (when they are rendered back in the document), either by invalidating them (marking them as "non-safe") or by deleting them. This could work if the js selection position detection performs well because even in the current case, mapping at save time cannot guarantee the correctness of an annotation: the document can be changed immediately after and the annotated selection invalidated.

Pros: uses the xwiki object model 100%, there's no need to rewrite the whole process of saving, editing objects, etc, the forms and views are light customizations of standard XWiki mechanisms

Cons: the annotation mapping to the document source cannot be implemented straightforwardly (changes are needed in the architecture and algorithm), Scribo (or any other storage model different from XWiki) would need to duplicate data, and it would not be 100% real time (since it would be a periodic synchronizer)

Wider annotation targets

Solution 1: store annotations as selection and context, over transformed documents

At the moment, an annotation is defined by an offset and a length computed relative to the source (wiki for documents or html for feed entry articles content) of the annotated document. To make this more flexible and be able to annotate anything that a user sees rendered, the following change is proposed:

  • an annotation is to be defined by the selection and context where the selection is the text selected by the user and context is the surrounding frame of text so that it uniquely identifies the selected text. Although the context can be computed on annotation creation time by the javascript client to ensure its uniqueness, a fixed, sufficiently large size in characters for this frame of context should work well in real-life cases.
  • on display, this annotation is to be mapped (context and selection identified and markers introduced) on the transformed XDOM of the document, after all the scripts, macros, etc. are executed. When this XDOM is rendered with the regular XHTML renderer, the markers will be rendered and then operated by the JS client (enable / disable color, etc.). This alteration should be implemented as an XDOM transformation, with the restriction that it only has to be executed "on demand" (for the moment, a transformation is always executed by the transformation manager).
  • the annotations maintainer is to be changed to update the selection and context of an annotation when a document changes. For the first iteration, this adjustment can be reduced to the offset / length algorithm by computing the offset & length of the annotation in the original document, applying the same algorithm for detecting the updated offset and length and then extracting the updated selection & context.
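The selection-and-context definition can be sketched as below: the context is a fixed radius of characters around the selection, and an annotation is located only if its context occurs exactly once in the (possibly edited) text. The radius, the names and the extra selectionOffset field, which disambiguates a selection occurring several times inside its own context, are assumptions.

```java
public class SelectionContextSketch {
    /** Selected text, its surrounding context, and where the selection starts in that context. */
    record Anchor(String selection, String context, int selectionOffset) {}

    /** Builds the anchor for text[start, end) with a fixed context radius in characters. */
    static Anchor anchor(String text, int start, int end, int radius) {
        int from = Math.max(0, start - radius);
        int to = Math.min(text.length(), end + radius);
        return new Anchor(text.substring(start, end), text.substring(from, to), start - from);
    }

    /** Start of the selection in text, or -1 if the context is missing or ambiguous. */
    static int locate(String text, Anchor a) {
        int at = text.indexOf(a.context());
        if (at < 0 || text.indexOf(a.context(), at + 1) >= 0) {
            return -1;
        }
        int start = at + a.selectionOffset();
        return text.startsWith(a.selection(), start) ? start : -1;
    }
}
```

For example, selecting the second "the" in "the cat sat on the mat" with a radius of 4 gives the context " on the mat", which is still found after the text is edited to "a dog sat on the mat".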

Pros: an annotation definition is more robust (what the user actually sees selected), allowing us to try to identify it at any time in the rendered document, edited or not, etc.; it can handle any rendered content; annotation markers are generated in a tree model (XDOM) rather than by altering strings, which is a more natural and correct solution; the annotation mapping step upon annotation save is not needed anymore, since selection and context can be computed directly by the JS client
Cons: potentially requires heavy changes in the existing implementation (to operate with the new annotation definition), since it's based on XDOM, it would only work for documents in XWiki 2.0 syntax

Solution 2: store annotations as selection and context on HTML content

This is a refinement of solution 1, with a slight change of strategy consisting of (HTML) 'content' as the main unit (and not 'document') and 'annotations set' (instead of 'document's annotations'), as follows:

  • store an annotation as the user selection and context (as presented in previous solution)
  • reduce the problem of annotating to mapping a set of annotations on a piece of XHTML and introducing markers (spans) to mark annotations:
    String getAnnotatedHTML(String someHtml, Set<Annotation> annotationSet);
  • apply an "accept everything, render what's mappable" strategy for storing annotations (no validation is done on add; any user-selected content is considered 'valid'). A good UI should be provided to warn the user about annotations that couldn't be rendered and allow them to manage these annotations (so let the user decide when an annotation is no longer "valid"). Various strategies could be applied for automated cleaning, if needed. This is because there is no way to detect automatically whether an annotation will never be valid: it might be valid when rendered for a different user or in a different context, or the document content may have been edited temporarily, to be rolled back afterwards, etc.
  • at the JS client level, some 'content annotable' elements could be configured so that annotations would be accepted only on those subtrees (for example, by default only the element with the "xwikicontent" id would be annotable). This way we would avoid accepting annotations on the layout elements, which makes no sense since we could never render these annotations in a document.
  • provide APIs for the above proposed function in velocity, for example, so that whenever needed, the
    $doc.getRenderedContent() can be replaced by
    $annotations.getAnnotatedHTML($doc.getRenderedContent(), $annotations.getAnnotations($doc.fullName))
    to have rendered annotations in any .vm template, for example
  • the standard annotation application will provide a default implementation for the default XE, based on this API, but any customized version should obtain the same result easily using the API
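A simplified Java sketch of the proposed getAnnotatedHTML(...) function. It assumes HTML without comments, scripts or character entities, takes a List instead of a Set so that overlapping annotations are resolved deterministically (the last one wins), and uses a hypothetical annotation-<id> CSS class. Annotations whose context cannot be found are skipped, in line with the "render what's mappable" strategy. Spans are closed before each tag and reopened after it, so the output stays well-formed even when a selection crosses element boundaries.

```java
import java.util.List;
import java.util.Objects;

public class AnnotatedHtmlSketch {
    /** Hypothetical annotation model: an id, the selected text and its surrounding context. */
    record Annotation(String id, String selection, String context) {}

    static String getAnnotatedHTML(String html, List<Annotation> annotations) {
        // 1. Plain text of the HTML, remembering the HTML index of every text character.
        StringBuilder plain = new StringBuilder();
        int[] htmlIndexOf = new int[html.length()];
        boolean inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            }
            if (!inTag) {
                htmlIndexOf[plain.length()] = i;
                plain.append(c);
            }
            if (c == '>') {
                inTag = false;
            }
        }
        // 2. Map each annotation on the plain text; owner[i] is the annotation covering HTML char i.
        String[] owner = new String[html.length()];
        for (Annotation a : annotations) {
            int ctx = plain.indexOf(a.context());
            int sel = ctx < 0 ? -1 : a.context().indexOf(a.selection());
            if (sel < 0) {
                continue; // not mappable: left out of the output, not rejected
            }
            for (int k = ctx + sel; k < ctx + sel + a.selection().length(); k++) {
                owner[htmlIndexOf[k]] = a.id();
            }
        }
        // 3. Copy the HTML, closing any open span before a tag and reopening it after.
        StringBuilder out = new StringBuilder();
        String open = null;
        inTag = false;
        for (int i = 0; i < html.length(); i++) {
            char c = html.charAt(i);
            if (c == '<') {
                inTag = true;
            }
            String wanted = inTag ? null : owner[i];
            if (!Objects.equals(open, wanted)) {
                if (open != null) {
                    out.append("</span>");
                }
                if (wanted != null) {
                    out.append("<span class=\"annotation annotation-").append(wanted).append("\">");
                }
                open = wanted;
            }
            out.append(c);
            if (c == '>') {
                inTag = false;
            }
        }
        if (open != null) {
            out.append("</span>");
        }
        return out.toString();
    }
}
```

For instance, annotating "big world" in "<p>Hello <b>big</b> world</p>" yields two spans sharing the same class, one inside the <b> element and one after it.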

Pros:

  • an annotated document would simply be obtained by applying the above getAnnotatedHTML(...) function on the annotation objects stored in that document, for example, and its renderedContent. 
  • the same, a set of annotations can be rendered on a property of an object, or on a concatenation of properties of objects in a document. 
  • it would be very easy to decide to render annotations only on a portion of a document, by simply extracting the HTML fragment and replacing it with the annotated version
  • the original HTML of a document can be anything, for example the one resulted from rendering the document with a customized .vm template, etc
  • annotations could be stored anywhere (in any document, or external service, etc) as long as there is way to obtain them and pass them as the parameter to the function
  • we can print annotations on everything as long as it's representable in HTML (docs in any syntax, XWiki Watch articles stored as original html content, etc)
  • mapping is as precise as possible, since the annotation selection is obtained from a HTML representation (in the browser) and we're mapping this on a HTML representation: dynamic content would all be executed, included content, etc. Also, since annotation markers are inserted in a HTML, precision of marking is maximal (as opposed to XDOM where markers for a HTML fragment were to be introduced in an XDOM which was to be rendered in a HTML afterwards).
  • loose coupling between the annotation definition and the content on which they are added allows flexibility, for example, moving content from one object property to another would not influence the annotations on that content: to be represented, they just need to be passed to the getAnnotatedHTML() function for the new property

Cons:

  • it's all about XHTML
    • However, I don't think this is a real con, because the main annotation problem is their representation. So far, the only representation for which we use annotations and need them marked is XHTML (and I don't see now a good case where we'd need a different representation and still could reuse code). In addition, bullet #7 above says that if it wouldn't be about XHTML chances are it would be imprecise
  • by default annotations won't be represented in the document rendered content
    • Again, I don't think it's a real con. They are not part of a document, so rendering them by default can be uncomfortable, they should be pulled
  • too many non-valid annotations can be added because there is no verification on adding
    • I think this can only happen as a result of users annotating highly dynamic content (which will always change), which I don't consider a very frequent use case, and never cleaning up the annotations list, or of a bad configuration of the 'content annotable' elements. Mapping the selection on HTML should always work thanks to bullet #7 in the pros list

Solution 3: add annotations on targets specified as references to doc.object.field

To take into account the need to be able to replace an annotated text with its comment (the selection with the annotation metadata), the need to identify the source of the annotated content has emerged. In the solution above, there would be no way to go back from an HTML representation to the actual field of an object. For example, in the case of a scripted display of a blog post, one would like to be able to edit the content field of the blog post in this way. With an HTML-'blind' approach, the content cannot be mapped back to a field.

So, a new strategy, based on targeting content instead of documents is described by the following:

  • a way to provide a reference to a field in an object in a document will be devised (for example, expanding the DocumentName concept)
  • Annotation model changes to store such a reference to a field (or to the document itself in which case it means 'document content')
  • annotation API changes as follows:
    void addAnnotation(String reference, Annotation ann)
    String renderAnnotatedContent(String reference, Set<Annotation> annotations, String syntax)
    Set<Annotation> getAnnotations(String reference)

    where the renderAnnotatedContent can be replaced, in case of a HTML only based approach (as in the above solution), by a
    String getAnnotatedHTML(String reference, Set<Annotation> annotations)
    or syntax could default to XHTML, etc.
  • on the client, when using the annotations one would need to mark the source (object field, document content, etc) of a piece of XHTML and the fact that it can be annotated.
    The JS client will look in the document HTML for a 'content annotable' element (marked through the 'class' attribute), which would also carry its reference (in the class, in the name, etc.). It will accept annotations on that element and direct the service call accordingly. It will default to 'xwikicontent', which directs calls to the current document's content, and it will stop at the first such ancestor found (so that an annotation added on a blogpost field which is itself inside 'xwikicontent' is handled only for the blogpost field). However:
    • it will be the responsibility of the application authors to mark their scripted content as annotable (and mark the proper references) -- for example the blog app would need to mark the place where a blogpost content is displayed with the appropriate classnames
    • when scripting the view of such an application, if annotations have to be rendered, the app author will need to replace the call to getRenderedContent(propValue, doc) or doc.display(prop) with a call to the annotations service to render the annotated html
    • while actually annotating the content, the field should be re-renderable without the script context (velocity variables, macros, ?), so that it can be re-rendered by an ajax call by the js client to refresh the annotations
  • for scripted content outside inclusion of the annotable fields (for example, a toc, or a code macro, etc), 2 approaches are possible, depending on how the annotations would be stored:
    • either scripted content is annotated 'blind' (if a user selection touches a macro the whole macro is annotated) -- in case a selection is stored by index in the source, where no other solution is possible
    • or all selections which are found visible are annotated (when the annotation is stored as selection and context)
  • on replacing an annotated text with the annotation metadata, the reference should be used to direct the edit to the right field. Server side would handle edit depending on how annotations are stored
  • this solution no longer depends on how annotations are stored on the server (by selection & context or by index in the source); the case where a rather static document (with large static content stored in objects) is displayed dynamically (by including a sheet, so that it basically has no real source) is well covered by targeting annotations to fields.
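A minimal in-memory sketch of the reference-based API, for illustration only. The TargetReference record (document, class, object number, field, with no class meaning "document content") is a hypothetical stand-in for the extended DocumentName concept, the store replaces the real storage, and renderAnnotatedContent is omitted. Using a structured reference instead of a String keeps equality exact: two references to the same field are the same key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TargetReferenceSketch {
    /**
     * Hypothetical reference to annotated content: the document content when className is
     * null, otherwise the given field of object number objectNumber of class className.
     */
    record TargetReference(String document, String className, int objectNumber, String field) {
        static TargetReference content(String document) {
            return new TargetReference(document, null, -1, null);
        }

        boolean isDocumentContent() {
            return className == null;
        }
    }

    record Annotation(String selection, String context, String comment) {}

    /** In-memory stand-in for the proposed addAnnotation / getAnnotations API. */
    static final class AnnotationStore {
        private final Map<TargetReference, List<Annotation>> byTarget = new HashMap<>();

        void addAnnotation(TargetReference reference, Annotation annotation) {
            byTarget.computeIfAbsent(reference, r -> new ArrayList<>()).add(annotation);
        }

        List<Annotation> getAnnotations(TargetReference reference) {
            return List.copyOf(byTarget.getOrDefault(reference, List.of()));
        }
    }
}
```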

Pros:

  • scripted content is handled with respect to the document it comes from
  • replacing an annotated text with the annotation metadata is made possible for any such field
  • better identification of content as the target of an annotation brings more coherence

Cons:

  • annotations would not be able to transparently work over any displayed content. Though default annotation application will be made to work on default document content in default XE, the devs responsible for content display (for example, sheets to display object fields in a page) will have to use the annotation application conventions (class names, displaying annotated HTML) to benefit from the advantages

Solution 3 appendix: storing annotations as index & offset in the source vs. selected text & context in a representation

Since an annotation is given as a text from the client/browser anyway (that is all that can be computed on the client), two operations are needed: mapping this text on a source, that is, identifying the part of a source which will be rendered as the annotated text, and rendering this annotation in various formats, mainly HTML (we consider that rendering an annotation means rendering the annotation's target with a set of markers inserted that allow one to identify the annotation in the result, e.g. spans in HTML). Mapping will always require understanding the syntax of a source; rendering an annotation requires knowing and following the markers convention for the syntax in which it is rendered. Both operations are needed in both approaches, but at different moments.

| By index in source | By selected text (from an HTML representation) |
| --- | --- |
| requires mapping when adding an annotation | requires nothing when adding, at most a check that the text actually exists |
| rendering requires a convention for markers and a way to insert them at given rendered offsets; it can require understanding the syntax in which it is rendered, but it could also be done on a model of the source markup (e.g. XDOM) that is rendered afterwards | rendering requires mapping the text on the rendered result (e.g. on the rendered XHTML of a field or content); if done without using the source (without prior mapping), which is the ideal case, it requires understanding the syntax in which it is rendered |
| requires knowing the source syntax when adding an annotation | does not need to know the syntax of the content source when adding an annotation |
| requires modeling the source when editing the annotated part (even if the corresponding source is already known, editing is not straightforward since the source markup must stay consistent) | requires mapping on the source when editing the annotated part; only here does the source syntax need to be known |
| is coupled with the annotated content (the field, etc.) and cannot easily be ported to different content | can easily be rendered on a different source, for example, as long as the text appears in it |
| needs maintenance at every change | needs maintenance only if the annotation can no longer be located (for example, if a user copy-pasted the paragraph where the annotation was) |
| precision is as fine as the source allows: e.g. if an annotation touches a macro with scripted content, the whole macro would probably have to be included in the annotation, since the source contains only the script and not its result | precision is as fine as the representation allows: the result of scripted content can be annotated if it is visible |
| is robust to rendering/representation changes, such as encoding changes in rendering: e.g. an annotation added on "hello {{velocity}}$context.user{{/velocity}}" is still rendered even if the user name changes | is sensitive to rendering/representation changes, since it is based on what the user sees: e.g. an annotation on a macro displaying the user name is rendered only for the same user |
|  | needs migration when the representation changes (e.g. encoding changes that potentially affect what the user sees) |
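The maintenance rows of the table can be made concrete with a small Python sketch that contrasts the two storage models. The class names are hypothetical. For brevity, the same string stands for both the source and its plain-text rendering, which only holds for content without markup.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IndexAnnotation:
    """Stored as a position in the source (first column of the table)."""
    start: int
    length: int

    def locate(self, source: str) -> Tuple[int, int]:
        # Trusts the stored offsets: any edit before the annotation silently
        # shifts the range, so the annotation must be maintained on every change.
        return self.start, self.start + self.length


@dataclass
class TextAnnotation:
    """Stored as what the user saw (second column of the table)."""
    selection: str
    context: str

    def locate(self, rendered: str) -> Optional[Tuple[int, int]]:
        # Searched again on every rendering: survives edits elsewhere and only
        # needs attention when the context can no longer be found.
        context_start = rendered.find(self.context)
        offset = self.context.find(self.selection)
        if context_start < 0 or offset < 0:
            return None
        start = context_start + offset
        return start, start + len(self.selection)


before = "This is thy sheath; there rust, and let me die."
after = "O happy dagger! " + before            # text inserted above the annotation

by_index = IndexAnnotation(start=12, length=6)  # "sheath" in `before`
by_text = TextAnnotation(selection="sheath", context="thy sheath;")

s, e = by_index.locate(after)
print(after[s:e])    # no longer "sheath": the stored offsets are stale
s, e = by_text.locate(after)
print(after[s:e])    # sheath
```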

Solution 4: add annotations on specified (but optional) target, rendered by a specialized renderer

This approach is currently being implemented (in progress).

A refinement of solution 3, this solution is based on specialized renderers that render annotations on content. The following list identifies the main issues to be handled and the solutions we propose:

  • model of an annotation
    • an annotation is stored as the selected text and its context ('what the user sees') in a representation of the rendered annotation target.
      This representation is the plain-text output of rendering the target (HTML without markup, if you like, or wiki syntax without markers) after all transformations (macros) have been executed.
    • a reference to the target of the annotation is saved with every annotation, identifying the document, object and field the annotation addresses.
       see solution 3 for details
  • adding an annotation
     will consist only of saving an annotation object with the received selected text and context (no checking and no mapping is done).
     see the pros of solution 3 for the advantages of such a flexible approach
  • rendering an annotation
    will be the job of a specialized renderer, and such a renderer must be built for every new format we need to render annotations in (different format == different renderer). However, parts of the default renderer can be reused for new formats.
    The principle of this renderer is that annotations are modeled as a pair of "bookmarks" in the stream of rendering events, one for the annotation start and one for the end. A bookmark is made of three things. The first is the rendering event in which the annotation event takes place. The second is the offset inside this event; for events that write text to the output, this is the offset in that text where the annotation begins or ends. The third is an annotation event, which specifies whether it is a start or an end annotation event and carries the annotation itself.
    An annotation generator, which is a chaining listener, will identify the positions of the annotations. Whenever queried, it returns the current state, i.e. the annotation events that take place in the current rendering event (block). An annotation renderer reads this state at each event and renders the annotations accordingly. The two are chained by an AnnotationRenderer. The state returned by the generator is independent of the syntax in which the renderer will render it, so the generator (and the mapping algorithm) is fully reusable.
    An important advantage of this approach is that annotations are handled as a separate layer in the rendering process. Inserting them among the XWiki events would require splitting some events to signal annotation events inside them, and not all events can be split.
    A diagram of this approach is available in the annotationsRendererState.png attachment.
     The two will work as follows:
    • annotation positions (the events where annotations start and end) are identified by buffering all events that could be the start of an annotation (or of an annotation context) until its end is reached. In the extreme case, this means buffering all events and identifying the start and end events of all annotations. Matching the annotation text should be based on the plain-text representation of the buffered blocks.
    • the renderer renders each event according to whether it is the start block or end block of an annotation, or whether it is inside an annotation and needs special handling (closing inline tags before closing block tags, etc.)
    • Initially, raw blocks (which can contain pure XHTML with markup) will be ignored, on the assumption that they contain no useful content. Later, special handlers can be added to provide the plain-text rendered content of such blocks and to handle annotation markers (spans in the XHTML version) inside them.
      A similar approach (buffered events) is used to render heading IDs, where the content of the heading has to be parsed first to create the ID. The difference is that this solution also involves locating an annotation in the stream of events.
  • references
     will allow identifying an annotation target, i.e. a document, an object (by class name and index) and a field (see solution 3 for details)
    • they will be stored for each annotation
    • the default is the document content (reference that only contains a doc name refers to its content)
    • annotation API functions will all allow specifying a target (storeAnnotation(..., String target), getRenderedContent(..., String target), getAnnotations(..., String target)) but will always default to the document content
    • programmers (application developers) will be provided with methods to specify the original source of a piece of XHTML (for example, using syntax macros or HTML containers), or to request the annotated rendered version of a specific target (a field of an object). The full API for this functionality will be designed later. Note that a syntax macro could also allow deferring the identification of an annotation's target to server-side processing (in the XDOM).
  • maintenance
     On a document update event,
    • the reference to the modified content in the document (document content, object field, etc.) is identified,
    • all annotations for that reference are found and updated by
    • producing a diff of the plain-text rendered version of the target before and after the update, and updating the context of all annotations 'touched' by the edit (the selected text of the annotations could be updated in the same way)
  • editing the target of an annotation with respect to the selected text of the annotation, for example replacing the annotation selection with the annotation text (metadata) (see solution 3 for details).
     This will be done in the same manner as rendering an annotation, with two differences: a replacement step follows the location of the annotation, and the rendering uses a modified renderer for the target syntax and the non-transformed version of the target source. The output of this renderer can be saved as the new source of the annotation target.
  • performance issues
     Rendering annotations can take a lot of time. Currently the content is re-rendered after each annotation is added, in order to re-display the annotated document, so slow rendering can be a blocker. This can be improved by:
    • improving the string search algorithms and/or data structures used to locate annotations during rendering
    • caching the positions of annotations in their respective targets
    • JavaScript tricks: displaying an annotation purely on the client, without actually re-rendering the content
    • considering a section in a document as annotation target
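The 'model of an annotation', 'adding an annotation' and 'references' items above could be sketched in Python as follows. Every name here (`TargetReference`, `Annotation`, `AnnotationStore`, `store_annotation`, `get_annotations`) is hypothetical and only loosely mirrors the storeAnnotation/getAnnotations functions mentioned above. Storage is an in-memory dict instead of XWiki objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class TargetReference:
    """Hypothetical target reference: a document, optionally narrowed to an
    object (class name + index) and one of its fields. A reference holding
    only the document name refers to the document content."""
    document: str
    class_name: Optional[str] = None
    object_index: Optional[int] = None
    field_name: Optional[str] = None

    def is_document_content(self) -> bool:
        return self.class_name is None


@dataclass
class Annotation:
    selection: str             # the text the user selected
    context: str               # surrounding text from the plain-text rendering
    text: str                  # the annotation itself (the user's note)
    target: TargetReference    # what the annotation is attached to


class AnnotationStore:
    """In-memory stand-in for annotation storage."""

    def __init__(self) -> None:
        self._annotations: Dict[TargetReference, List[Annotation]] = {}

    def store_annotation(self, selection: str, context: str, text: str,
                         document: str,
                         target: Optional[TargetReference] = None) -> Annotation:
        # Adding only saves what was received: no check, no mapping.
        # Without an explicit target, the document content is the default.
        reference = target or TargetReference(document)
        annotation = Annotation(selection, context, text, reference)
        self._annotations.setdefault(reference, []).append(annotation)
        return annotation

    def get_annotations(self, document: str,
                        target: Optional[TargetReference] = None) -> List[Annotation]:
        reference = target or TargetReference(document)
        return list(self._annotations.get(reference, []))
```

Keying the storage by the reference rather than by the document is what would let annotations of different targets in the same document be kept apart.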
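The generator/renderer split described under 'rendering an annotation' can be illustrated with the following Python sketch. It makes several simplifying assumptions:
- Events are reduced to hypothetical `(kind, text)` tuples instead of XWiki rendering listener calls.
- Only paragraphs and text are handled, and raw blocks are ignored.
- All events are buffered, which is the extreme case mentioned above.

The class and function names and the `data-annotation-id` marker convention are illustrative only. This is not the actual XWiki implementation.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, List, Optional, Tuple

# Hypothetical rendering event: (kind, text). Only "text" events produce text;
# "begin_paragraph" / "end_paragraph" stand in for block events.
Event = Tuple[str, str]
OPEN = '<span class="annotation" data-annotation-id="{}">'  # illustrative marker


@dataclass
class Bookmark:
    offset: int          # offset inside the event's text
    is_start: bool       # annotation start event (True) or end event (False)
    annotation_id: str


class AnnotationGenerator:
    """Buffers events and computes, per event, the bookmarks falling inside it."""

    def __init__(self, annotations: Dict[str, Tuple[str, str]]) -> None:
        self.annotations = annotations          # id -> (selection, context)
        self.events: List[Event] = []
        self._state: Dict[int, List[Bookmark]] = {}

    def on_event(self, event: Event) -> None:
        self.events.append(event)               # extreme case: buffer everything

    def end_document(self) -> None:
        plain, starts = "", {}                  # event index -> offset in plain
        for i, (kind, text) in enumerate(self.events):
            if kind == "text":
                starts[i] = len(plain)
                plain += text
            elif kind == "end_paragraph":
                plain += "\n"                    # block separator, owned by no event
        for ann_id, (selection, context) in self.annotations.items():
            c, s = plain.find(context), context.find(selection)
            if c < 0 or s < 0 or not selection:
                continue                         # cannot be located: not rendered
            begin = self._locate(starts, c + s, is_end=False)
            end = self._locate(starts, c + s + len(selection), is_end=True)
            if begin is not None and end is not None:
                self._state.setdefault(begin[0], []).append(Bookmark(begin[1], True, ann_id))
                self._state.setdefault(end[0], []).append(Bookmark(end[1], False, ann_id))

    def _locate(self, starts: Dict[int, int], offset: int,
                is_end: bool) -> Optional[Tuple[int, int]]:
        # (event index, offset in event); an end may sit at the end of an event.
        for i, pos in starts.items():
            length = len(self.events[i][1])
            if (pos < offset <= pos + length) if is_end else (pos <= offset < pos + length):
                return i, offset - pos
        return None

    def state_at(self, index: int) -> List[Bookmark]:
        """Annotation events in this event; at equal offsets, ends come first."""
        return sorted(self._state.get(index, []), key=lambda b: (b.offset, b.is_start))


def render_xhtml(generator: AnnotationGenerator) -> str:
    """XHTML annotation renderer: reads the generator state at each event."""
    out: List[str] = []
    open_ids: List[str] = []                     # annotations currently open
    for i, (kind, text) in enumerate(generator.events):
        if kind == "begin_paragraph":
            out.append("<p>")
            out.extend(OPEN.format(a) for a in open_ids)   # reopen after block start
        elif kind == "end_paragraph":
            out.extend("</span>" for _ in open_ids)        # close inline before block
            out.append("</p>")
        elif kind == "text":
            cursor = 0
            for bm in generator.state_at(i):
                out.append(escape(text[cursor:bm.offset], quote=False))
                cursor = bm.offset
                if bm.is_start:
                    open_ids.append(bm.annotation_id)
                    out.append(OPEN.format(bm.annotation_id))
                else:
                    # Close the spans opened after this one as well, then reopen
                    # them, so overlapping annotations stay properly nested.
                    idx = open_ids.index(bm.annotation_id)
                    inner = open_ids[idx + 1:]
                    out.extend("</span>" for _ in range(len(inner) + 1))
                    del open_ids[idx]
                    out.extend(OPEN.format(a) for a in inner)
            out.append(escape(text[cursor:], quote=False))
    return "".join(out)


def render_annotated(events: List[Event], annotations: Dict[str, Tuple[str, str]]) -> str:
    """Plays the AnnotationRenderer role: chains the generator and the renderer."""
    generator = AnnotationGenerator(annotations)
    for event in events:
        generator.on_event(event)
    generator.end_document()
    return render_xhtml(generator)
```

The generator never produces markup, and the bookmarks it returns say nothing about XHTML. Replacing `render_xhtml` with a renderer for another syntax would therefore leave the mapping untouched.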
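The 'maintenance' item could look roughly like the Python sketch below. Here Python's `difflib.SequenceMatcher` stands in for the diff of the two plain-text renderings; the text above does not specify which diff algorithm is used. The function names are hypothetical. The sketch maps the offsets of each annotation's context and selection in the old text through the diff opcodes into the new text. Annotations whose context is no longer found are returned so they can be handled separately.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple

Opcode = Tuple[str, int, int, int, int]


@dataclass
class Annotation:
    selection: str
    context: str


def map_offset(opcodes: List[Opcode], pos: int, is_end: bool) -> int:
    """Map an offset in the old text to the new text using diff opcodes."""
    for tag, i1, i2, j1, j2 in opcodes:
        inside = (i1 < pos <= i2) if is_end else (i1 <= pos < i2)
        if tag == "equal" and inside:
            return j1 + (pos - i1)          # unchanged text: same relative position
    for tag, i1, i2, j1, j2 in opcodes:
        if tag != "equal" and i1 <= pos <= i2:
            return j2 if is_end else j1     # edited text: snap to the edit's bounds
    return pos                              # only reached when both texts are empty


def update_annotations(old_text: str, new_text: str,
                       annotations: List[Annotation]) -> List[Annotation]:
    """Update context and selection after an edit; return unlocatable ones.

    Annotations not touched by the edit keep the same context and selection.
    """
    opcodes = SequenceMatcher(None, old_text, new_text, autojunk=False).get_opcodes()
    lost = []
    for ann in annotations:
        c = old_text.find(ann.context)
        s = ann.context.find(ann.selection)
        if c < 0 or s < 0:
            lost.append(ann)                # cannot be located: needs attention
            continue
        s += c
        context_span = (map_offset(opcodes, c, False),
                        map_offset(opcodes, c + len(ann.context), True))
        selection_span = (map_offset(opcodes, s, False),
                          map_offset(opcodes, s + len(ann.selection), True))
        ann.context = new_text[context_span[0]:context_span[1]]
        ann.selection = new_text[selection_span[0]:selection_span[1]]
    return lost
```

Start and end offsets are mapped differently on purpose. Otherwise, text inserted right before an annotation would be pulled into its selection instead of into its context.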

Updated versions of the flow diagrams for the two main actions (adding an annotation and showing the annotations) can be seen in addAnnotationSeq-ByText.png and showAnnotationsSeq-withRenderer.png.

 Pros: 

  • annotation rendering is semantic: each renderer handles its own annotation marking instead of having markers pushed by the AST. Annotations have no corresponding semantics in the AST, and the renderer is the only place where the syntax-specific markers are known.
  • annotation storage is independent of the target: since the actual target of each annotation is stored, annotations can be stored anywhere (attached to any document, in a different database, etc.) as long as the content they refer to can be provided
  • editing (not a priority at this point) is enabled in a way that is easy to derive from the core annotation-handling code
  • the solution does not depend on targeting a specific field, but field targeting can easily be enabled and refined later
  • application developers target a specific field of an object declaratively, with no server-side code needed. This is also semantic: specifying that a piece of content comes from a specific source is not a matter of algorithms or logic, just a declaration that a programmer makes.
  • see also solutions 2 and 3, which contain parts of this solution and describe further advantages of this approach

Cons: 

  • a new renderer is needed anytime the annotations need to be rendered in a new format
  • the default XHTML renderer implementation is complex and can cause implementation delays
  • the rendering could be slow

 
