Words Based Notifications

Last modified by Vincent Massol on 2024/02/26 17:54

 XWiki
 Feature
 Active
 

Description

The idea of this new feature is to allow a user to define words or sentences for which they are notified whenever one of them is added to a page. We can imagine multiple usages for this, such as moderating some words in a wiki, or being notified whenever a topic of interest is mentioned.

Usecases

List of usecases

  • UC1: As a user I want to receive notifications whenever an exact word or a group of words is mentioned after a change in some places of a page (title, content, tag, comment)
  • UC2: As a user I want to receive notifications whenever a partial word / group of words is mentioned after a change in some places of a page
  • UC3: As a user I want to receive notifications whenever an expression matching a regular expression is mentioned after a change in some places of a page
  • UC4: As a user when I receive a notification about a word or a group of words being mentioned I want to know the number of occurrences for that word/group of words in the page
  • UC5: As an admin I'd like to know what words are used for the notifications, and I want the capability to remove some
  • UC6: As an admin I'd like to know the current status for processing the pages for sending notifications
  • UC7: As an admin I want to be able to decide whether or not hidden pages should be filtered
  • UC8: As an admin I want to be able to decide what spaces should be included in the notifications (whitelist) and/or what spaces should be excluded (blacklist)
  • UC9: As a user I want to be able to define the location where I want to listen for some words
  • UC10: As a developer I want to be able to express that the feature should analyze a specific content
  • UC11: As an admin I never want a user to receive a notification concerning a page they don't have view right on
  • UC12: As a user I want to see in the notifications the context in which a word/group of words has been used
  • UC13: As a user I want to receive a notification when an occurrence of a watched expression is removed from the document
  • UC14: As a user I want to receive a notification when an occurrence of a watched expression is moved from a part of the document to another part

Details on each usecase

Terminology: I will use "query string" in some cases when talking about a word/group of words that a user is following.

UC1

As a user I want to receive notifications whenever an exact word or a group of words is mentioned after a change in some places of a page (title, content, tag, comment)

The idea is to perform only exact matches, at least in a first version: this would be improved with UC2 and UC3. We could also consider using Solr, but I don't know:

  1. whether we want to be able to be notified on some wiki syntax features (e.g. I want to receive notifications only if **foo** is used)
  2. how we index content in Solr
  3. whether we index all the content that we'd need to check (e.g. do we index the content of xobject textareas?)

Note that "exact matching" here only means exact text matching: the case and the formatting are ignored when matching.

So Foo will match foo and FOO and also FoO.
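
As an illustration only, the case-insensitive exact matching described above could be sketched as follows. The function name is hypothetical, the input is assumed to be plain text with the wiki formatting already stripped, and the whole-word rule (so that "foo" does not match inside "food") is an assumption that this page does not settle.

```python
import re


def count_exact(query: str, text: str) -> int:
    """Count case-insensitive occurrences of `query` in plain `text`."""
    # Lookarounds are used instead of \b so that a query starting or ending
    # with punctuation still works. The whole-word rule is an assumption.
    pattern = re.compile(
        r"(?<!\w)" + re.escape(query) + r"(?!\w)", re.IGNORECASE
    )
    return len(pattern.findall(text))
```

The returned count is also what UC4 would need in order to report the number of occurrences.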

The UI allowing users to choose those words should be defined in the user profile. Users should be able to easily see all the groups of words they're following, and to easily edit them.

Regarding the place where a content should be looked for, note that it should concern the following places by default (see UC10):

  • title
  • content
  • comment xobject content
  • tag xobject

UC2

As a user I want to receive notifications whenever a partial word / group of words is mentioned after a change in some places of a page

The idea of this usecase is to improve UC1 by allowing users to use some jokers (wildcards) when defining the words to listen to. We can imagine at least two types of jokers: * for zero or more characters and ? for zero or one character. The UI used to input the group of words should be clear about the meaning of those jokers and should allow testing the expressions. Also, those jokers should be easy to escape.

For example: "Fo* Ba?" would match:

  • Fo Ba
  • Foo Bar
  • Fo Bar
  • Foooooooooo Bar

It wouldn't match:

  • FoBa
  • Foo Barrrrr
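
One possible way to get the behaviour of the examples above is to translate the jokers into a regular expression. This is only a sketch: the function name, the use of a backslash as escape character and the choice of restricting jokers to word characters are assumptions, not decisions made on this page.

```python
import re


def joker_to_regex(query: str) -> re.Pattern:
    """Translate a query using * and ? jokers into a compiled regex."""
    parts = []
    i = 0
    while i < len(query):
        ch = query[i]
        if ch == "\\" and i + 1 < len(query):
            # Assumed escape syntax: a backslash makes the next char literal.
            parts.append(re.escape(query[i + 1]))
            i += 2
            continue
        if ch == "*":
            parts.append(r"\w*")  # zero or more word characters
        elif ch == "?":
            parts.append(r"\w?")  # zero or one word character
        else:
            parts.append(re.escape(ch))
        i += 1
    # The match must not start or end in the middle of a word.
    return re.compile(
        r"(?<!\w)" + "".join(parts) + r"(?!\w)", re.IGNORECASE
    )
```

With this translation, joker_to_regex("Fo* Ba?").search(text) finds the four matching examples listed above and rejects "FoBa" and "Foo Barrrrr".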

UC3

As a user I want to receive notifications whenever an expression matching a regular expression is mentioned after a change in some places of a page

The idea of this usecase is to go even further than UC2 and to allow advanced users to type their own regular expression for the matching. Note that this UC might still make sense even if we use Solr, but the syntax might then differ from Java regular expressions.

Also, since it's a pretty advanced usecase, we probably don't want to make it available to regular users.

UC4

As a user when I receive a notification about a word or a group of words being mentioned I want to know the number of occurrences for that word/group of words in the page

This UC is about what the notification should contain. First, a notification should be triggered per page and per "query string": we don't want to group notifications only per page, since a user could follow many query strings and this could lead to a long description that is hard to grasp. In contrast, one notification per page and per query string leads to more notifications, but they are easier to understand.

Then each notification should contain:

  • the title of the modified page, a link to the page and a link to the version of the change
  • the query string that has been identified
  • the number of occurrences that have been added in this version
  • the author of the changes and the date of the changes
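
The content listed above could be carried by a small record, with one instance per page and per query string. This is a hedged sketch: the class, field and function names are all hypothetical, and the per-query counts of added occurrences are assumed to be computed elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class WordNotification:
    page_title: str
    page_url: str
    version_url: str
    query: str
    added_occurrences: int
    author: str
    date: datetime


def build_notifications(
    page_title: str,
    page_url: str,
    version_url: str,
    author: str,
    date: datetime,
    added_by_query: Dict[str, int],
) -> List[WordNotification]:
    """Create one notification per query string with new occurrences."""
    return [
        WordNotification(
            page_title, page_url, version_url, query, added, author, date
        )
        for query, added in added_by_query.items()
        if added > 0  # no new occurrence, no notification
    ]
```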

UC5

As an admin I'd like to know what words are used for the notifications, and I want the capability to remove some.

The idea of this usecase is mainly to ensure that admins keep control over the user settings if, for some reason, they are causing trouble.

UC6

As an admin I'd like to know the current status for processing the pages for sending notifications.

The goal of this usecase is to have an idea of how much time it takes to send the notifications and to be able to detect problems.

UC7

As an admin I want to be able to decide whether or not hidden pages should be filtered.

By default, hidden pages should not be analyzed to send notifications: this optimizes the process a bit, and in general analyzing them should never be needed since those pages are supposed to be technical. We should still allow admins to decide whether or not those pages are analyzed, so this should be configurable.

UC8

As an admin I want to be able to decide what spaces should be included in the notifications (whitelist) and/or what spaces should be excluded (blacklist)

The idea of this feature is to allow admins to control exactly in which locations this feature would be used: for example, an admin could blacklist the XWiki space to avoid getting the user profiles analyzed at all even if they are public.

Note that the whole feature would be an extension installed on each subwiki: so even on a farm, the admin could only blacklist / whitelist locations of the current wiki.
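
A minimal sketch of how the whitelist and blacklist could be combined is given below. The precedence (exclusion wins, an empty whitelist means "all spaces") and the handling of nested spaces through dot-separated names are assumptions, and the function names are hypothetical.

```python
from typing import Iterable


def _within(space: str, root: str) -> bool:
    """True if `space` is `root` or nested under it (dot-separated)."""
    return space == root or space.startswith(root + ".")


def is_space_analyzed(
    space: str, included: Iterable[str], excluded: Iterable[str]
) -> bool:
    # Assumption: the blacklist always wins over the whitelist.
    if any(_within(space, root) for root in excluded):
        return False
    included = list(included)
    # Assumption: an empty whitelist means that every space is analyzed.
    return not included or any(_within(space, root) for root in included)
```

For example, is_space_analyzed("XWiki.Users", [], ["XWiki"]) is False, which corresponds to the user profile example above.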

UC9

As a user I want to be able to define the location where I want to listen for some words

The idea here is for users to have a fine-grained capability to define where a query string should be checked for: for example, we could have a query string "Foo" looked for in space Sandbox and another query string "Bar" looked for in space Main. In such a scenario, adding "Foo" in space Main wouldn't trigger a notification.

The UI for this usecase would allow selecting a space for each new query string added: by default, if no space is selected, the query string would concern the whole wiki. Note that this UI should probably take UC8 into account to prevent users from selecting spaces that are not analyzed.
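
The Sandbox / Main scenario above could be modelled as follows. This is only an illustration: the WatchedQuery type and the function name are hypothetical, and spaces are assumed to be plain dot-separated names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class WatchedQuery:
    text: str
    space: Optional[str] = None  # None means the whole wiki


def queries_for_space(
    queries: Iterable[WatchedQuery], space: str
) -> List[WatchedQuery]:
    """Keep only the query strings that apply to a page in `space`."""
    return [
        q
        for q in queries
        if q.space is None
        or space == q.space
        or space.startswith(q.space + ".")
    ]
```

With "Foo" watched in Sandbox and "Bar" watched in Main, a change in space Main would only be checked against "Bar" (and against any query string without a selected space).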

UC10

As a developer I want to be able to express that the feature should analyze a specific content

The origin of this UC is twofold. First, in UC1 we want to restrict the content to analyze in order to improve performance, which explains why we don't analyze every textarea like we do for mentions. Second, we have a usecase with Change Request where the data to analyze is not stored in xobjects but in attachments, and we surely don't want to analyze every attachment. So it sounds like a good idea to give developers the capability to decide what should be analyzed or not.

UC11

As an admin I never want a user to receive a notification concerning a page they don't have view right on

This is an almost obvious usecase: users who cannot see a document should not receive notifications about it. Admins shouldn't have to do anything to get this behaviour, except setting rights properly. This UC is mainly written to ensure we check the rights.

UC12

As a user I want to see in the notifications the context in which a word/group of words has been used.

The idea of this usecase is to be able to see in the notifications an excerpt of the sentence / paragraph where the word/group of words has been used. For this usecase, we should probably be able to reuse some of the work done for Mentions. However, there are still open questions on how to display those excerpts in case of multiple occurrences.
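
To make the idea of an excerpt more concrete, here is a rough sketch that cuts a fixed number of characters around each occurrence. It is not what Mentions does: the function name and the radius parameter are hypothetical, and it simply returns one excerpt per occurrence, which leaves the display question open.

```python
import re
from typing import List


def excerpts(query: str, text: str, radius: int = 30) -> List[str]:
    """Return one excerpt of `text` around each occurrence of `query`."""
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    result = []
    for match in pattern.finditer(text):
        start = max(0, match.start() - radius)
        end = min(len(text), match.end() + radius)
        # Mark the excerpt as truncated where it does not reach the
        # beginning or the end of the text.
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        result.append(prefix + text[start:end] + suffix)
    return result
```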

UC13

As a user I want to receive a notification when an occurrence of a watched expression is removed from the document.

The idea of this usecase is to go further than UC1/2/3 and to send notifications for any change in the number of occurrences of the watched expressions: so not only the addition of an occurrence should trigger a notification, but also the removal of an existing occurrence. Since not all users might want to receive notifications for removals, we might make this optional with a dedicated event type so that users can switch it on or off.
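
The choice between the two event types could be as simple as comparing the number of occurrences before and after the change. The event type names below are placeholders invented for this sketch, not existing event types.

```python
from typing import Optional


def event_type_for_change(old_count: int, new_count: int) -> Optional[str]:
    """Pick the event to trigger from the occurrence counts of a query."""
    if new_count > old_count:
        return "word.mentioned"  # hypothetical name, existing behaviour
    if new_count < old_count:
        return "word.removed"  # hypothetical name, optional event of UC13
    return None  # same number of occurrences: nothing to notify
```

Having two distinct event types is what would allow users to enable or disable the removal notifications independently.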

UC14

As a user I want to receive a notification when an occurrence of a watched expression is moved from a part of the document to another part.

The idea of this usecase is to go even further than UC13 and to be able to notify a user if a mention has been moved from a part of the document to another: e.g. if a word was mentioned in the title and has been removed but is now mentioned in the description. Note that this UC is not about being able to notify users when a mention has been moved inside the same area: i.e. if we move a watched word inside another paragraph of the content, this won't produce a notification.

Also, for this usecase the notification will be of the form: "XXX has one less mention in the title (YY mentions) and one more mention in the content (ZZ mentions)". So it will basically only indicate the changes in the number of occurrences: it won't tell the user that there was an actual move.
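
Since the notification only reports count changes per area, the underlying computation could be sketched as a per-location difference. The function name and the location keys ("title", "content", ...) are assumptions made for the example.

```python
from typing import Dict, Tuple


def location_changes(
    old_counts: Dict[str, int], new_counts: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Map each changed location to (delta, new number of occurrences)."""
    changes = {}
    for location in set(old_counts) | set(new_counts):
        old = old_counts.get(location, 0)
        new = new_counts.get(location, 0)
        if new != old:
            changes[location] = (new - old, new)
    return changes
```

A word moved from the title to the content yields one negative entry and one positive entry, while a move inside the content yields an empty result and therefore no notification.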

Proposed architecture

The idea is to reuse the same architecture as Mentions, which is very close to this feature:

  • a listener checking for created / updated documents and adding tasks for word based notifications
  • a component dedicated to retrieving the list of words to analyze for a given space (and capable of caching that information)
  • a task consumer getting the list of words to look for in a given document, getting the various analyzers and using them to look for the words
  • an analyzer component role and different implementations allowing to look into the content of the doc, the content of some xobjects, etc., the idea being to be compliant with UC10
  • a dedicated event, the associated descriptor and the template to display notifications
  • an xclass for the word based notification preferences and a UIX in the user profile to allow defining the words to look for
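
The relation between the task consumer and the analyzers could look like the following sketch. It is written in Python for brevity and is not the actual component API: the Analyzer protocol, the two implementations and the dictionary used as a document are all invented for the illustration.

```python
import re
from typing import Dict, Iterable, Protocol


class Analyzer(Protocol):
    """Hypothetical analyzer role: one implementation per location."""

    location: str

    def extract(self, document: dict) -> str: ...


class TitleAnalyzer:
    location = "title"

    def extract(self, document: dict) -> str:
        return document.get("title", "")


class ContentAnalyzer:
    location = "content"

    def extract(self, document: dict) -> str:
        return document.get("content", "")


def analyze(
    document: dict, queries: Iterable[str], analyzers: Iterable[Analyzer]
) -> Dict[str, Dict[str, int]]:
    """Task consumer step: count each query string in each location."""
    analyzers = list(analyzers)
    results: Dict[str, Dict[str, int]] = {}
    for query in queries:
        pattern = re.compile(re.escape(query), re.IGNORECASE)
        per_location = {}
        for analyzer in analyzers:
            count = len(pattern.findall(analyzer.extract(document)))
            if count:
                per_location[analyzer.location] = count
        if per_location:
            results[query] = per_location
    return results
```

Supporting a new kind of content, as asked in UC10, would then only require adding another analyzer implementation.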

Performance

This new feature might be expensive in two respects:

  1. the analysis of the pages
  2. the large number of notifications to trigger

The task executor implemented for mentions and indexing is already designed to avoid performance issues with expensive tasks, which is the reason why I plan to reuse it: this is part of the answer to problem 1. For problem 2, we will rely on the XWiki Standard notification system, which should now be much more scalable.

Besides those two choices, some improvements will be needed for performance:

  1. query strings should be cached together with the users following them
  2. when performing analysis, if possible a single regular expression should be used for all the query strings: I need to investigate how/if we could use a single big regular expression containing multiple named groups that would refer to the set of users for which to trigger the notifications in case of new matches
  3. the choice to not analyze everything in UC1 might help to optimize the analysis by not looking in every xobject
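
To give an idea of what the single regular expression of point 2 could look like, here is an exploratory sketch in Python (the real implementation would rely on Java regular expressions). The function names and the group naming scheme are assumptions. Note a limitation that would need investigation: an alternation only reports non-overlapping matches, so when one query string is contained in another, only one of the two is reported for a given position.

```python
import re
from typing import Dict, Set, Tuple


def build_combined_pattern(
    queries_to_users: Dict[str, Set[str]],
) -> Tuple[re.Pattern, Dict[str, Set[str]]]:
    """Build one regex with a named group per query string."""
    group_to_users = {}
    alternatives = []
    # Longest queries first, so that a longer query is tried before a
    # shorter one starting at the same position.
    ordered = sorted(queries_to_users, key=len, reverse=True)
    for index, query in enumerate(ordered):
        group = f"q{index}"
        group_to_users[group] = queries_to_users[query]
        alternatives.append(f"(?P<{group}>{re.escape(query)})")
    pattern = re.compile("|".join(alternatives), re.IGNORECASE)
    return pattern, group_to_users


def users_to_notify(
    text: str, pattern: re.Pattern, group_to_users: Dict[str, Set[str]]
) -> Set[str]:
    """Collect the users following any query string found in `text`."""
    users: Set[str] = set()
    for match in pattern.finditer(text):
        # lastgroup is the name of the alternative that matched.
        if match.lastgroup is not None:
            users |= group_to_users[match.lastgroup]
    return users
```

The text is scanned only once, whatever the number of query strings, and the mapping from group names to users plays the role of the cache mentioned in point 1.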

Planning

This section presents how I see the planning for developing this feature. The planning is organized per version, without a clear view of how much time each version will take.

Version 1

Version 1 should include UC1, UC4, UC5, UC10 and UC11.

UC1, UC4 and UC11 are obviously the heart of the feature. UC5 will be done immediately: since we need a UI in the user profile anyway, we'll take care of giving admins access to it. UC10 is less obvious, but it will probably be immediately achievable because of the component nature of the analyzers, so it mainly depends on architectural choices.

Version 2

Version 2 should include UC2, UC6, UC7, UC13.

Version 3

Version 3 should include UC8 and UC9

Version 4

Version 4 should include UC3 and UC12

Open questions

  • How to display the context of the word/group of words in notifications when there are multiple occurrences: check what has been done in Mentions

 

