NotificationsOptimization

Version 1.5 by Guillaume Delhumeau on 2019/03/06 15:06

 
 Feature
 Idea

Description

Currently, we have a big problem with notification performance. In many cases, users are forced to disable notifications just to be able to use their wiki. The most recent example is https://jira.xwiki.org/browse/XWIKI-16207.

Problem 1: Post-filters can cause a very long loop

To understand the problem, we need to consider the current implementation.

  1. The browser needs to know how many unread notifications there are for the current user. If there are more than 20 notifications, it does not need to know the exact number. Instead, it displays "20+".
  2. An AJAX query is sent for this purpose.
  3. An SQL query is generated that takes into account all enabled filters (don't show the user's own events, for example) and watched pages. To avoid using too many database resources, we limit this request to the first 40 results.
  4. For each result, some checks are performed:
    • 4.1. First of all, if the event concerns a document that the user is not allowed to view, then the event is discarded. There is no way to improve the SQL query to take care of the rights so this check cannot be avoided.
    • 4.2. Post-filters are executed. Like the rights check, these filters check what cannot be expressed with an SQL query.
    • 4.3. The event is then compared to all the events that we have already accepted, in order to group similar notifications inside a "folded" one that we call a CompositeEvent. The idea is to avoid having multiple notifications in the UI that concern the same document, with the same kind of event, but with different dates (like when you click "save & continue" on a document multiple times during a work session).
  5. After the results have been checked and grouped into CompositeEvents, we count how many of them we have accepted. If we have fewer than 20 CompositeEvents, we go back to step 3 until we have at least 20 CompositeEvents, or until there are no more events in the database.

As you can see, steps 3 to 5 can be executed many times under unfavorable conditions. The loop is currently implemented as a recursive algorithm, which could theoretically lead to a stack overflow (see: https://jira.xwiki.org/browse/XWIKI-15927).
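To make the cost of steps 3 to 5 concrete, here is a minimal sketch of the loop, written iteratively so that it cannot overflow the stack. It is not the actual XWiki code: `fetch_composites`, `Event`, `CompositeEvent`, and the `query`, `can_view`, and `post_filters` parameters are hypothetical names, and grouping by (document, event type) is a simplification of the real similarity check.

```python
from dataclasses import dataclass, field

BATCH_SIZE = 40  # rows fetched per SQL query (step 3)
TARGET = 20      # composite events needed before we can display "20+" (step 5)


@dataclass(frozen=True)
class Event:
    doc: str
    kind: str
    date: int


@dataclass
class CompositeEvent:
    doc: str
    kind: str
    events: list = field(default_factory=list)


def fetch_composites(query, can_view, post_filters):
    """Iterative version of steps 3 to 5; returns (composites, query_count).

    query(offset, limit) stands in for the generated SQL query (assumption).
    Each post-filter is a predicate returning True when the event is rejected.
    """
    composites = {}  # (doc, kind) -> CompositeEvent, in insertion order
    offset = 0
    query_count = 0
    while len(composites) < TARGET:
        batch = query(offset, BATCH_SIZE)
        query_count += 1
        if not batch:
            break  # no more events in the database
        offset += len(batch)
        for event in batch:
            if not can_view(event):  # step 4.1: rights check
                continue
            if any(rejects(event) for rejects in post_filters):  # step 4.2
                continue
            key = (event.doc, event.kind)  # step 4.3: group similar events
            composite = composites.setdefault(key, CompositeEvent(event.doc, event.kind))
            composite.events.append(event)
    return list(composites.values()), query_count
```

With 1000 events that all concern the same document and the same event type, this loop issues 26 queries (25 full batches plus a final empty one) and ends up with a single CompositeEvent, which is the kind of behavior described above.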

On problematic wikis, I often notice this kind of stack trace, with many repeated frames:
[...]
org.xwiki.notifications.sources.internal.DefaultParametrizedNotificationManager.getEvents(DefaultParametrizedNotificationManager.java:142)
org.xwiki.notifications.sources.internal.DefaultParametrizedNotificationManager.getEvents(DefaultParametrizedNotificationManager.java:142)
org.xwiki.notifications.sources.internal.DefaultParametrizedNotificationManager.getEvents(DefaultParametrizedNotificationManager.java:142)
org.xwiki.notifications.sources.internal.DefaultParametrizedNotificationManager.getEvents(DefaultParametrizedNotificationManager.java:142)
[...]

So this is exactly what is going on: the SQL queries return a lot of events, but almost all of them are rejected by post-filters or are so similar that they are grouped into a few CompositeEvents.

Some scenarios I can see (in descending order of probability):
A. There are a lot of events in documents that the user is not allowed to see. Adding a filter on the restricted space to the user's profile could solve the issue.
B. There is a bug in a post-filter and we need to identify which one and why.
C. There are a lot of "personal messages" (sent using the Message Sender Gadget) that are filtered only by post-filters (I don't remember why it cannot be expressed with SQL, but I had a good reason).
D. The same event is stored multiple times in the database, so it continuously fills the same CompositeEvent.
E. There is a bug in the recursion, so the database always returns the same results (but it would mean we have an infinite loop, so it would crash).
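One way to tell these scenarios apart on a problematic wiki would be to count why each fetched event fails to produce a new notification. The following is only a sketch of that idea: `classify_discards`, the dictionary-based event shape, the filter names, and the use of the event `id` to spot duplicates are all assumptions, not existing XWiki APIs.

```python
from collections import Counter


def classify_discards(events, can_view, post_filters):
    """Count, per reason, what happens to each fetched event.

    post_filters maps a (hypothetical) filter name to a predicate that
    returns True when the event must be rejected.
    """
    reasons = Counter()
    seen_ids = set()
    seen_groups = set()
    for event in events:
        if not can_view(event):
            reasons["rights"] += 1  # points to scenario A
            continue
        rejected_by = next(
            (name for name, rejects in post_filters.items() if rejects(event)),
            None,
        )
        if rejected_by is not None:
            reasons["post-filter:" + rejected_by] += 1  # scenarios B and C
            continue
        if event["id"] in seen_ids:
            reasons["duplicate"] += 1  # scenarios D and E
            continue
        seen_ids.add(event["id"])
        group = (event["doc"], event["type"])
        if group in seen_groups:
            reasons["grouped"] += 1  # folded into an existing composite
        else:
            seen_groups.add(group)
            reasons["new composite"] += 1
    return reasons
```

A high "rights" count would suggest scenario A, a single dominant post-filter would suggest B or C, and a high "duplicate" count would suggest D or E.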

Problem 2: Notifications are computed each time a page is loaded

There is absolutely no cache mechanism. So, even if the query to fetch the notifications is slow, it is re-executed every time a user loads a page.

Solution 2-A: Create a memory cache

For each user, we could have a cache that stores all the notifications that were returned during the last execution. This cache would be cleared:

  • each time a new event is triggered
  • after a certain period of time
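A minimal sketch of such a cache, assuming a hypothetical `compute` function that runs the expensive query for one user. The class name, the `on_event` hook, and the injectable `clock` are illustrative choices, not part of XWiki.

```python
import time


class NotificationCache:
    """Per-user cache cleared on every new event and after a time-to-live."""

    def __init__(self, ttl_seconds, compute, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._compute = compute  # expensive call: user_id -> notifications
        self._clock = clock      # injectable so that expiry can be tested
        self._entries = {}       # user_id -> (expiry_time, notifications)

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and now < entry[0]:
            return entry[1]      # still fresh: no query is executed
        notifications = self._compute(user_id)
        self._entries[user_id] = (now + self._ttl, notifications)
        return notifications

    def on_event(self, event=None):
        # Any new event may concern any user, so every entry is dropped.
        self._entries.clear()
```

The weakness of this approach is visible in `on_event`: on an active wiki, events are triggered so often that the cache may be cleared before it is ever reused.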

Solution 2-B: Create an "evolving" memory cache

Same as 2-A, but when a new event is triggered, the cache would not be cleared. Instead, the new event would be considered for each user, and added to the cache if the event is not filtered out by the user's preferences.
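The difference with 2-A can be sketched as follows. This is an illustration under assumptions: `matches_preferences` is a hypothetical predicate standing in for the user's filters, only users who already have a cache entry are updated, and grouping into CompositeEvents is ignored for brevity.

```python
class EvolvingNotificationCache:
    """Per-user cache that is updated, not cleared, when an event occurs."""

    def __init__(self, compute, matches_preferences, max_size=20):
        self._compute = compute              # expensive call: user_id -> list
        self._matches = matches_preferences  # (user_id, event) -> bool
        self._max_size = max_size
        self._entries = {}                   # user_id -> list, newest first

    def get(self, user_id):
        if user_id not in self._entries:
            # Copy the result so that later updates do not alias the source.
            self._entries[user_id] = list(self._compute(user_id))
        return self._entries[user_id]

    def on_event(self, event):
        # Consider the new event for each cached user instead of clearing.
        for user_id, notifications in self._entries.items():
            if self._matches(user_id, event):
                notifications.insert(0, event)
                del notifications[self._max_size:]  # keep the newest entries
```

Here the expensive `compute` call runs once per user, and each new event costs one preference check per cached user.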

Solution 2-C: Store in a permanent storage the notifications for each user

This title is not exactly what I meant. The idea is not to store each notification in a table (the event is already in the event stream store), but the "read" status of each event for each user, when the event is recorded.

Concretely, this is how it would work:

  1. An event is triggered in XWiki.
  2. The event is recorded in the Event Stream table.
  3. For each user, we compute whether or not the event should be displayed, according to their preferences.
    1. If the event should be displayed for the user, create a row in the "Event Status" table for the user, with the "read" value set to false.
    2. If the event should not be displayed for the user, it is simply dismissed.
  4. When the user fetches the notifications, there is no complex SQL query to perform anymore. We just need to look in the "Event Status" table for the events that are linked to the current user id.
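The steps above can be sketched with an in-memory SQLite database. The table and column names, the `should_display` predicate, and the helper functions are all hypothetical; the real Event Stream schema is different.

```python
import sqlite3


def create_schema(conn):
    # Simplified stand-ins for the Event Stream and "Event Status" tables.
    conn.executescript("""
        CREATE TABLE event (id INTEGER PRIMARY KEY, doc TEXT, type TEXT);
        CREATE TABLE event_status (
            event_id INTEGER NOT NULL REFERENCES event(id),
            user_id  TEXT NOT NULL,
            is_read  INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (event_id, user_id)
        );
    """)


def record_event(conn, doc, event_type, users, should_display):
    """Steps 1 to 3: store the event, then one unread status row per user."""
    cursor = conn.execute(
        "INSERT INTO event (doc, type) VALUES (?, ?)", (doc, event_type)
    )
    event_id = cursor.lastrowid
    rows = [
        (event_id, user)
        for user in users
        if should_display(user, doc, event_type)  # the user's preferences
    ]
    conn.executemany(
        "INSERT INTO event_status (event_id, user_id, is_read) VALUES (?, ?, 0)",
        rows,
    )
    conn.commit()
    return event_id


def unread_events(conn, user_id, limit=20):
    """Step 4: a simple lookup by user id, newest events first."""
    return conn.execute(
        "SELECT e.id, e.doc, e.type FROM event_status s "
        "JOIN event e ON e.id = s.event_id "
        "WHERE s.user_id = ? AND s.is_read = 0 "
        "ORDER BY e.id DESC LIMIT ?",
        (user_id, limit),
    ).fetchall()
```

The filtering cost moves from read time to write time: `record_event` does one preference check per user, while `unread_events` becomes a plain indexed lookup.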

Pros:

  • No need to inject filter clauses into the SQL query, which results in a complex query that can be buggy.
  • Even if the notifications are fetched every time a page is loaded, it is not very costly (the "Event Status" table acts like a cache).

Cons:

  • On a wiki with a very large number of users, it means we would need to write a lot of lines in the "Event Status" table every time an event is triggered.
  • Since the filters need to be computed for each individual user, this process can take a lot of time, and it must be performed asynchronously.
  • "Event Status" rows are generated for users who never visit the wiki, which is a waste of resources.
  • This mechanism cannot be applied to the Notifications Macro (which was designed to replace the Activity Stream), so both mechanisms would have to coexist.
