Paragraph Numbering

Last modified by Manuel Leduc on 2022/03/03 11:01

 Design
 Completed
 

https://extensions.xwiki.org/xwiki/bin/view/Extension/Numbered%20Reference%20Macro/
https://design.xwiki.org/xwiki/bin/view/Proposal/OfficeImportImprovements

Description

The content below is, or will be, moved to Content Numbering.

Provide the ability to number the paragraphs of a document, with automatic numbering of paragraphs and sub-paragraphs, in a similar way to what is currently done for headings.

Requirements

Key requirements:

  • Numbering level based on heading level
  • Automatic sequential numbering
  • A user-defined start number, available as an override for automatic numbering (for example, "restart at 1"). A "reset to default" function should also be available
  • Ability to skip numbering when required, i.e. link to previous list but start at a defined number
  • Ability to have a numbered list within the numbered headings that is a normal text type list, i.e. independent of the style and numbering of the numbered paragraphs
  • List level 1 and 2 should be discoverable by the ToC macro (or a separate ToC macro if necessary). We will need some processing of the text when displaying in the macro, such as removal of the last character if it's not a letter or number. We may also want to convert upper case text to lower case. Users can add more levels using a macro parameter.

The above requirements should be possible from the WYSIWYG editor and should be easy and intuitive with no need for the user to know wiki syntax.

Numbers should be visible in View, WYSIWYG Edit, PDF Export and print.

Special Rules:

  • On return, the next paragraph should continue to use the same heading level as the previous paragraph.
  • The heading level should react to an indent change
  • We will also need a keyboard shortcut to change the indent level (tab is preferred, but another keyboard combination is acceptable).
  • The text style should default back to "body text" when hitting return after an empty heading but this should not break the numbering sequence if this took place in the middle of a set of numbered headings

Implementation Analysis

  • Contrib extension since this is not generic enough to be in XS.

    • It can be considered a contrib extension even though the need is pretty specific and we've never heard any user requesting this.
  • Use HTML ordered lists (<ol>) instead of paragraphs (<p>) or headings (<hX>) because it's more semantic than other options.
  • Style list items to look like paragraphs

    • CSS numbering and more
  • Use a rendering macro around the whole paragraphs to be numbered (technically it's around the list)

    {{numberedLists start="5.5"}}
    1. Sheer Strength:
    11. Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem.
    1. Manufacturing:
    11. (((
    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

    This item is skipped from numbering
    )))
    11. Proin feugiat viverra felis, eu auctor nulla sollicitudin ut. Curabitur non ex vitae erat sodales fermentum non in orci. Mauris commodo et nulla tristique euismod. Integer a pulvinar urna. Morbi in ante lobortis elit tincidunt mattis quis quis purus. Maecenas a auctor massa. Duis neque turpis, pharetra tincidunt imperdiet ut, pulvinar at nulla.
    11. In consequat id ipsum eget egestas. Morbi fringilla ornare orci, eget ornare purus tempor quis. Aenean ultrices consectetur pretium. Vestibulum non interdum leo. Aliquam rhoncus sagittis elit eget rutrum. Aliquam efficitur pellentesque odio sed aliquet. Etiam tristique lacinia facilisis.
    {{/numberedLists}}
  • Use a macro parameter for specifying the starting number (e.g., start="5.5")
  • The macro content will accept any wiki syntax, but the users will have to use numbered lists in order to get automatic numbering
  • Use group syntax (DIV) or line-break (BR, Shift+Enter) when we want to skip "paragraphs" from numbering
  • Technical considerations:

    • Two of the features that have to be supported are in "conflict":
      • being able to create a URL that targets a numbered "paragraph" requires the list items to have the id attribute set so that we can use url#listItemId to target them (similar to anchors and headings); the id needs to be set on the server, otherwise the links won't work (if we add the IDs with JavaScript then they won't be available when the page loads, so the browser won't scroll the page to the target numbered paragraph). It makes sense to generate the ID based on the paragraph numbering (e.g. id="P5.5.2") => the numbering needs to be done on the server side
      • being able to edit the numbered paragraphs inline inside the WYSIWYG editor and having the numbering updated on the fly while typing means we have to compute the numbering on the client side using only CSS. Any other solution (JavaScript or server-side rendering) is too costly and will slow down editing
    • The solution for this conflict is for the rendering macro to have two (rendering) modes:
      • a view rendering mode (enabled when you view or export the page): the numbering is computed on the server side and the list item IDs are set accordingly
      • an edit rendering mode (enabled when editing the page with the WYSIWYG editor): the numbering is computed using CSS; the list items don't have IDs set but that's fine because links are not clickable in the WYSIWYG editor.
  • Implement another rendering macro (e.g. numberedListsToc), similar to the TOC Macro but for numbered lists

    • There should be a scope parameter to control whether the ToC is collected for the entire page or just for a subtree (of the page XDOM)
    • Collect only level 1 and 2 "headings" (list items)
      • (option) Add a macro parameter to control the depth (default value being 2)
    • Remove from the "headings" ending any non-alphanumeric character (e.g. ":")
    • (option) Add a toc boolean parameter to the numberedLists macro to show a ToC before the numbered list (paragraphs)
      • numberedLists would call internally the numberedListsToc macro described above, with scope=local so that only the content of the wrapping numberedLists macro is taken into account
  • Be able to use a standard list inside a numbered "paragraph"

    • Add support for extending CKEditor's Styles drop down from an extension (i.e. allow an extension such as Numbered Paragraphs to provide additional styles that make sense only for this extension)
    • Add a 'Skip Numbering' list style in the Styles drop down of CKEditor (using the "extension point" mentioned previously)
    • The user will place the caret inside the numbered list for which they want to remove the automatic numbering and select the 'Skip Numbering' style from the Styles drop down
    • Write the CSS to reset the automatic numbering inside an ol.skip-numbering
    • Limitation: You won't be able to have numbered paragraphs (lists) inside a "Skip Numbering" list (ol.skip-numbering)
  • (option) Add an XWiki Template to create a new document using numbered paragraph
  • Limitation: edit mode relies on advanced CSS (such as var()), so on Microsoft browsers it needs Edge; all other browsers are supported (only IE doesn't support it)
  • Note: exporting as doc is out of scope.
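
The view rendering mode described above (server-side numbering plus ID generation) can be sketched as follows. This is a minimal illustration, not the macro's actual code: the Item class and the number_items function are hypothetical names, and it assumes that start="5.5" gives the number of the first top-level item, so that nested items become 5.5.1, 5.5.2, and so on.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A list item: its text and its nested (sub-paragraph) items."""
    text: str
    children: list["Item"] = field(default_factory=list)


def number_items(items, start="1"):
    """Return (number, html_id, text) tuples in document order.

    Assumption: `start` is the number of the first top-level item, so
    start="5.5" numbers top-level items 5.5, 5.6, ... and their children
    5.5.1, 5.5.2, ...
    """
    *prefix, first = [int(part) for part in start.split(".")]
    result = []

    def walk(nodes, parent, first_index):
        for offset, node in enumerate(nodes):
            number = parent + [first_index + offset]
            label = ".".join(str(n) for n in number)
            # The ID follows the "P" + number convention (e.g. id="P5.5.2").
            result.append((label, "P" + label, node.text))
            # Nested items always restart at 1 under their parent.
            walk(node.children, number, 1)

    walk(items, prefix, first)
    return result


items = [
    Item("Sheer Strength:", [Item("Sed ut perspiciatis...")]),
    Item("Manufacturing:", [Item("Lorem ipsum..."), Item("Proin feugiat...")]),
]
for number, html_id, text in number_items(items, start="5.5"):
    print(number, html_id, text)
```

In edit mode the same numbers would come from CSS counters instead, so a sketch like this only applies to view and export.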

Importing

Requirements

  • The headings and their numbering must be preserved. They must stay "dynamic" after the import (for instance, adding a new heading between two existing headings must impact the numbering of the new header as well as the header after it)
    • Question: Should this also support custom, non-continuous numbers and headings without numbers or is it sufficient to remove numbers and let the Numbered References Macro automatically number all headings? Note that numbered references currently doesn't support custom numbers.
  • The numbered paragraphs and their numbering must be preserved. They must stay "dynamic" after the import (for instance, adding a new paragraph between two existing paragraphs must impact the numbering of the new paragraph as well as the paragraph after it)
    • Question: What about non-continuous numbers? Do we need to preserve them?
  • (Optional) Content with manually numbered headings must be converted into numbered headings automatically
    • Question: Is this required or is it okay if headings that are not designated headings in the source document are imported as regular text?
  • The existence and the type of numbering (i.e., heading vs paragraph) must be identified automatically, or be selectable when importing the Word document.

Importing Results

  • Heading numbers require an explicit check of whether they are consecutive, with additional markup added if they are not. MS Word HTML and LibreOffice XHTML exports contain markup that makes it possible to distinguish heading numbers from regular content. The currently used LibreOffice HTML export contains the numbers as plain text at the beginning of the heading without any spacing, and thus requires heuristic detection of heading numbers. Note that LibreOffice 6, which is still included in the official XWiki Docker containers (see https://jira.xwiki.org/browse/XDOCKER-157) because it is the version shipped with Ubuntu 20.04, produces very different HTML for numbered headings: it wraps numbered headings inside ordered lists. These lists only include a start attribute at the innermost level and do not use any styling, so they do not display the correct numbers. As this is very different from LibreOffice 7 and difficult to support (the numbers would need to be guessed), and as LibreOffice 6 is no longer supported (see LibreOffice Support Strategy), importing numbered headings and numbered paragraphs won't work with LibreOffice 6.
    • Using the LibreOffice API in JODConverter: we can iterate over all paragraphs and insert a special padding string (e.g., a random but fixed UUID) at the beginning of every heading. When LibreOffice inserts the number at the beginning of the heading, this padding string will now separate the number from the content and we can thus clearly separate the number from the content. We could also get the "List Label String" from "Character Direct Formatting" which contains the heading number in case we directly want to encode the heading number in our padding string.
    • Literature on the API: https://fivedots.coe.psu.ac.th/~ad/jlop/chaps/05.%20Text%20API%20Overview.pdf has, on page 8, an example of how to iterate over all paragraphs. https://forum.openoffice.org/en/forum/viewtopic.php?f=44&t=68055 has both an example of how to insert text at a cursor and an example of how to set the property that contains the paragraph style; calling the get- instead of the set-method should do the trick to get the current paragraph style name.
  • For paragraph numbering, the import source basically consists of a heading for every paragraph. Therefore, this is the same as before, except that for the intended representation as ordered lists, we need to transform headings into ordered lists and content between headings must be transformed into text inside the last list item (with a div).
  • If headings are manually numbered, this will require an additional effort to recognize headings. With LibreOffice (or MS Word) HTML output this should be possible, with LibreOffice XHTML output more difficult (we would need to analyze associated CSS code as the XHTML export only uses numbered classes with associated CSS code).
  • Automatic identification of paragraph numbering style should be possible by checking if a large part (say > 50%) of the content is headings. Adding a manual choice is probably more reliable. We could implement a list of filters that can be enabled or disabled, and paragraph numbering conversion (i.e., from headings to paragraph numbering macro) could be one of the filters that can be selected. We can identify the paragraph numbering style reliably using the LibreOffice API in JODConverter. We can, e.g., check if the "Follow Style" property of the "Heading 2" style is "Heading 3" (paragraph numbering) or "Text body" (regular document).
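
The padding-string idea above can be illustrated on the import side. The sketch below only covers splitting the exported heading text once the marker is in place; inserting the marker through the LibreOffice API is not shown. MARKER and split_heading are hypothetical names, and the UUID value is an arbitrary placeholder.

```python
# Hypothetical fixed marker inserted at the start of every heading before export.
MARKER = "6f1d2c3a-9b7e-4c58-a1d0-3e5f7b9c2d41"


def split_heading(exported_text, marker=MARKER):
    """Split exported heading text into (number, title).

    The number is whatever LibreOffice wrote before the marker. It is None
    when the heading has no number or when the marker is missing.
    """
    before, found, after = exported_text.partition(marker)
    if not found:
        # No marker: nothing can be said reliably about a leading number.
        return None, exported_text.strip()
    number = before.strip() or None
    return number, after.strip()


print(split_heading("1.2" + MARKER + "Scope"))
print(split_heading(MARKER + "Introduction"))
```

Because the marker is known in advance, a heading such as "2020 Roadmap" is never mistaken for a numbered heading, which is the main advantage over guessing from the text alone.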

Different HTML Output Formats

  • Existing LibreOffice HTML export
    • Pro: The existing implementation is well tested.
    • Con: Some heuristic guessing is required for heading numbers, as they are just numbers at the beginning of the headings. False positives will be possible.
  • LibreOffice XHTML export
    • Pro: Some more metadata for heading numbers makes detection reliable (there is a span with class "heading_numbering").
    • Pro: The CSS represents actual content styles like font size, which could be used for additional detection, e.g., of the numbered paragraphs style.
    • Pro/Con: Image import probably needs some adjustments, but these should be easy to make as images are contained inline.
    • Con: Will require extensive testing.
    • Con: Many styles are no longer represented as explicit HTML tags like <b> but by non-semantic class names, so the import would have to parse and interpret CSS, which is significant work (currently the CSS is not parsed).
    • Con: Lists no longer use actual list elements but contain spans with numbers/symbols. This will require additional work for importing regular lists and guessing the right list parameters, which the traditional HTML of the LibreOffice HTML export basically provides automatically.
  • MS Word HTML
    • Pro: Contains all Word styles as HTML classes/CSS styles and thus allows creating an explicit mapping between Word styles and wiki markup.
    • Pro: Tabs are represented as spaces for alignment, which means that a basic import of alignment tabs works out of the box as long as the font is the same.
    • Con: Lists are not represented as lists but sometimes as regular paragraphs that would require custom transformations. Different variants of lists also need to be detected and transformed into XWiki lists. The default mapping doesn't seem obvious.
    • Con: Will require extensive testing as the provided HTML is very different.
    • Con: Image import is unclear: images are exported as separate files that the user would probably need to upload, unless we automate the MS Word to HTML conversion and capture the images.
    • Con: Can't be the default import option for XWiki as it requires MS Word, which is non-free, and the HTML is not compatible with LibreOffice.
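
To make the false-positive risk of the existing LibreOffice HTML export concrete, here is a minimal sketch of heuristic heading-number detection. The regular expression and the guess_heading_number function are illustrative assumptions, not the importer's actual code.

```python
import re

# Assumption: a heading number looks like "1", "1.2" or "1.2.3", optionally
# followed by a dot, and is glued to (or followed by spaces before) the title.
NUMBER_PREFIX = re.compile(r"^(\d+(?:\.\d+)*)\.?\s*(\D.*)$")


def guess_heading_number(text):
    """Guess (number, title) from heading text; number is None if not found."""
    stripped = text.strip()
    match = NUMBER_PREFIX.match(stripped)
    if match is None:
        return None, stripped
    return match.group(1), match.group(2).strip()


print(guess_heading_number("1.2Scope"))
print(guess_heading_number("Introduction"))
# False positive: an unnumbered heading that happens to start with digits.
print(guess_heading_number("2020 Roadmap"))
```

The last call shows why this approach cannot be fully reliable: the text alone does not say whether "2020" is a heading number or part of the title.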

Conclusion: For paragraph numbering, sticking with the current export format seems to be the best option: we can obtain additional metadata using JODConverter filters and thus, e.g., identify paragraph numbers reliably. All other options require significant additional work and could cause a lot of breakage.

Architecture

Paragraph Numbering Macro

Macro that displays numbers beside the paragraphs defined in its body.

Parameters:

Parameter name | Type | Default Value | Required | Comments
start | String (\d+(\.\d+)*) | | No | When empty, the numbering continues where the previous paragraph numbering macro stopped. If it is the first call, it starts at 1. When a value is set, the numbering starts at this value.
table of paragraphs | Boolean | No | No | Display a table of paragraphs for the current macro at the top of the macro block
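
The behaviour of the start parameter can be sketched as below. resolve_start is a hypothetical helper, and the sketch assumes that "continues where the previous macro stopped" means the last top-level number of the previous macro plus one.

```python
import re

# Pattern taken from the Type column of the start parameter.
START_PATTERN = re.compile(r"\d+(\.\d+)*")


def resolve_start(start, previous_last=None):
    """Return the first top-level number of a macro as a list of ints.

    start: raw parameter value ("" or None when not set).
    previous_last: last top-level number used by the previous macro on the
    page (e.g. [5, 7]), or None if this is the first call.
    """
    if start:
        if not START_PATTERN.fullmatch(start):
            raise ValueError(f"invalid start value: {start!r}")
        return [int(part) for part in start.split(".")]
    if previous_last is None:
        # First macro on the page and no explicit value: start at 1.
        return [1]
    # Assumption: continue right after the previous macro's last number.
    return previous_last[:-1] + [previous_last[-1] + 1]


print(resolve_start("5.5"))
print(resolve_start("", previous_last=[5, 7]))
print(resolve_start(None))
```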

Table of Numbered Paragraphs Macro

Macro displaying a Table of the Numbered Paragraphs of the document.

Parameters:

Parameter name | Type | Default value | Required | Comments
depth | Integer | 2 | No | Defines the maximum depth of the paragraphs to include in the table.
scope | String | | No | Selects a scope; the table will only list the content found in the selected scope. By default the scope is the current document root.
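
A minimal sketch of how the table entries could be collected, combining the depth parameter with the label cleanup described in the requirements (removal of the last character if it is not a letter or number). clean_label and collect_toc are hypothetical names, the input is assumed to be a flat list of (level, number, text) tuples, and scope handling is left out.

```python
def clean_label(text):
    """Drop one trailing character if it is not a letter or a digit."""
    text = text.strip()
    if text and not text[-1].isalnum():
        text = text[:-1].rstrip()
    return text


def collect_toc(entries, depth=2):
    """Keep entries nested no deeper than `depth` and clean their labels.

    entries: (level, number, text) tuples, where level 1 is a top-level
    numbered paragraph.
    """
    return [
        (number, clean_label(text))
        for level, number, text in entries
        if level <= depth
    ]


entries = [
    (1, "5.5", "Sheer Strength:"),
    (2, "5.5.1", "Sed ut perspiciatis."),
    (3, "5.5.1.1", "Nemo enim ipsam"),
]
for number, label in collect_toc(entries):
    print(number, label)
```

The level is passed explicitly rather than derived from the number, since a start value such as "5.5" gives top-level items a two-part number.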

Results

To be completed.

User Interface

To be completed.

Known limitations

PDF export currently requires manually adding some CSS to "pdf.css". This will no longer be needed once the PDF export is migrated to use the browser's rendering.


 
