NotificationsOptimization

Version 1.2 by Guillaume Delhumeau on 2019/03/06 14:54

 
 Feature
 Idea

Description

Currently, we have a serious problem with notification performance. In many cases, users are forced to disable notifications just to be able to use their wiki. The most recent example is https://jira.xwiki.org/browse/XWIKI-16207.

Problem 1: Post-filters can cause a very long loop

To understand the problem, we need to consider the current implementation.

  1. The browser needs to know how many unread notifications there are for the current user. If there are more than 20 notifications, it does not need to know the exact number. Instead, it displays "20+".
  2. An AJAX query is sent for this purpose.
  3. A SQL query is generated that takes into account all enabled filters (don't show the user's own events, for example) and watched pages. To avoid using too many database resources, we limit this query to the first 40 results.
  4. For each result, some checks are performed:
    • 4.1. First of all, if the event concerns a document that the user is not allowed to view, the event is discarded. There is no way to express access rights in the SQL query, so this check cannot be avoided.
    • 4.2. Post-filters are executed. Like the rights check, these filters make it possible to check what cannot be expressed with an SQL query.
    • 4.3. The event is then compared to all the events that we have already accepted, in order to group similar notifications into a single "folded" one that we call a CompositeEvent. The idea is to avoid having multiple notifications in the UI that concern the same document, with the same kind of event, but with different dates (like when you click "save & continue" on a document multiple times during a work session).
  5. After the results have been checked and grouped into CompositeEvents, we count how many of them we have accepted. If we have fewer than 20 CompositeEvents, we go back to step 3 until we have at least 20 CompositeEvents, or until there are no more events in the database.
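
The grouping of step 4.3 can be illustrated with a minimal sketch. The classes below are hypothetical stand-ins, not the actual XWiki classes, and the similarity rule (same document and same event type) is an assumption drawn from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of step 4.3; not the actual XWiki classes. */
class CompositeGroupingSketch {

    /** Minimal stand-in for a stored event. */
    static final class Event {
        final String document;
        final String type;
        final long timestamp;

        Event(String document, String type, long timestamp) {
            this.document = document;
            this.type = type;
            this.timestamp = timestamp;
        }
    }

    /** Stand-in for a CompositeEvent: similar events folded into one notification. */
    static final class Composite {
        final List<Event> events = new ArrayList<>();

        // Assumed similarity rule: same document and same event type, whatever the date.
        boolean isSimilarTo(Event candidate) {
            Event first = events.get(0);
            return first.document.equals(candidate.document) && first.type.equals(candidate.type);
        }
    }

    /** Adds the event to the first matching composite, or opens a new composite for it. */
    static void group(List<Composite> composites, Event event) {
        for (Composite composite : composites) {
            if (composite.isSimilarTo(event)) {
                composite.events.add(event);
                return;
            }
        }
        Composite created = new Composite();
        created.events.add(event);
        composites.add(created);
    }
}
```

With this rule, three "save & continue" clicks on the same document end up in one composite, so they only count once toward the 20 CompositeEvents of step 5.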

As you can see, under bad conditions, steps 3, 4 and 5 can be executed many times. This loop is currently implemented as a recursive algorithm, which could theoretically lead to a stack overflow (see: https://jira.xwiki.org/browse/XWIKI-15927).
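
For illustration, here is a sketch of steps 3 to 5 written as a loop instead of a recursion. It is not the code of DefaultParametrizedNotificationManager: the `EventSource` interface, the use of a string key to represent a group of similar events, and the `maxBatches` safety cap are all assumptions made for this example.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch of steps 3 to 5 written as a loop; not the XWiki implementation. */
class IterativeFetchSketch {

    /** Assumed abstraction over the SQL query of step 3. */
    interface EventSource {
        /** Returns up to limit group keys (for example document + event type), starting at offset. */
        List<String> fetch(int offset, int limit);
    }

    static final int BATCH_SIZE = 40; // SQL limit from step 3
    static final int TARGET = 20;     // number of composite events wanted in step 5

    /**
     * Collects distinct groups until TARGET is reached, the source is exhausted,
     * or maxBatches queries have been issued (an assumed safety cap, not part of the current design).
     */
    static Set<String> collect(EventSource source, Predicate<String> accepted, int maxBatches) {
        Set<String> composites = new LinkedHashSet<>();
        int offset = 0;
        for (int batch = 0; batch < maxBatches && composites.size() < TARGET; batch++) {
            List<String> events = source.fetch(offset, BATCH_SIZE);
            if (events.isEmpty()) {
                break; // no more events in the database
            }
            for (String key : events) {
                // Steps 4.1 and 4.2: rights check and post-filters, folded into one predicate here.
                if (accepted.test(key)) {
                    // Step 4.3: events with the same key fall into the same composite.
                    composites.add(key);
                }
            }
            offset += events.size();
        }
        return composites;
    }
}
```

A loop uses constant stack space however many batches are needed, and the cap bounds the number of SQL queries per request when almost every event is discarded or folded into an existing composite.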

On problematic wikis, I often notice this kind of stack trace, with the same frame repeated many times:
[...]
org.xwiki.notifications.sources.internal.DefaultParametrizedNotificationManager.getEvents(DefaultParametrizedNotificationManager.java:142)
org.xwiki.notifications.sources.internal.DefaultParametrizedNotificationManager.getEvents(DefaultParametrizedNotificationManager.java:142)
org.xwiki.notifications.sources.internal.DefaultParametrizedNotificationManager.getEvents(DefaultParametrizedNotificationManager.java:142)
org.xwiki.notifications.sources.internal.DefaultParametrizedNotificationManager.getEvents(DefaultParametrizedNotificationManager.java:142)
[...]

So this is exactly what is going on. It means the SQL queries return a lot of events, but almost all of them are filtered out by post-filters or are so similar that they are grouped into a few CompositeEvents.

Some scenarios I can see (in descending order of probability):
A. There are a lot of events on documents that the user is not allowed to see. Adding a filter on the restricted space to the user's profile could solve the issue.
B. There is a bug in a post-filter, and we need to identify which one and why.
C. There are a lot of "personal messages" (sent with the Message Sender Gadget) that are filtered only by post-filters (I don't remember why this cannot be expressed with SQL, but I had a good reason).
D. The same event is stored multiple times in the database, so it continuously fills the same CompositeEvent.
E. There is a bug in the recursion, so the database always returns the same results (but that would mean we have an infinite loop, so it would crash).
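
One possible way to tell these scenarios apart would be to count, for each batch, what happened to every event. The sketch below is only a hypothetical diagnostic aid: the class, the enum and its values do not exist in XWiki and are named here for illustration.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical per-batch tally of what happened to each event returned by the SQL query. */
class BatchDiagnosticsSketch {

    /** Assumed outcomes, mapped to the checks of step 4. */
    enum Outcome {
        NOT_VIEWABLE,          // discarded by the rights check (points to scenario A)
        POST_FILTERED,         // discarded by a post-filter (points to scenario B or C)
        MERGED_INTO_EXISTING,  // folded into an existing composite (points to scenario D)
        NEW_COMPOSITE          // accepted as a new composite
    }

    private final Map<Outcome, Integer> counts = new EnumMap<>(Outcome.class);

    /** Records one event outcome. */
    void record(Outcome outcome) {
        counts.merge(outcome, 1, Integer::sum);
    }

    /** Number of events recorded with this outcome. */
    int count(Outcome outcome) {
        return counts.getOrDefault(outcome, 0);
    }

    /** Outcome seen most often (ties resolved by enum order), or null if nothing was recorded. */
    Outcome dominant() {
        Outcome best = null;
        int bestCount = 0;
        for (Outcome outcome : Outcome.values()) {
            int current = count(outcome);
            if (current > bestCount) {
                best = outcome;
                bestCount = current;
            }
        }
        return best;
    }
}
```

Logging the dominant outcome of each batch on a problematic wiki would indicate which check consumes most of the 40 results.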

Problem 2: Notifications are computed each time a page is loaded

There is absolutely no caching mechanism. So, even if the query that fetches the notifications is slow, it will be re-executed the next time a user loads a page.

Solution 2-A: Create a memory cache

For each user, we could have a cache that stores all the notifications that were returned during the last execution. This cache would be cleared:

  • each time a new event is triggered
  • after a certain period of time
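
A minimal sketch of such a cache, with both clearing rules, could look like the following. The class name, the use of strings for user identifiers and notifications, and the choice to clear every user's entry on any new event are assumptions made for this example. A real implementation would more likely rely on the cache facilities already available in XWiki.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical per-user notification cache; a sketch, not an XWiki component. */
class NotificationCacheSketch {

    /** Cached result together with the time at which it was stored. */
    private static final class Entry {
        final List<String> notifications;
        final long storedAt;

        Entry(List<String> notifications, long storedAt) {
            this.notifications = notifications;
            this.storedAt = storedAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injected so that expiry can be tested

    NotificationCacheSketch(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /**
     * Returns the cached notifications of the user, or runs the loader when the entry
     * is missing or older than the TTL. Concurrent callers may both run the loader;
     * this sketch accepts that duplicate work.
     */
    List<String> get(String userId, Supplier<List<String>> loader) {
        long now = clock.getAsLong();
        Entry entry = entries.get(userId);
        if (entry == null || now - entry.storedAt >= ttlMillis) {
            entry = new Entry(loader.get(), now);
            entries.put(userId, entry);
        }
        return entry.notifications;
    }

    /** To be called when a new event is triggered: any user may be affected, so everything is cleared. */
    void onNewEvent() {
        entries.clear();
    }
}
```

On a busy wiki, new events are frequent, so clearing everything on each event may leave the cache empty most of the time. The hit rate would need to be measured before relying on this approach.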

Solution 2-B: Create an evolving memory cache

Same as 2-A, but when a new event is triggered, the cache would not be cleared. Instead, it will be




 

