Feature
 Completed
 

ActiveInstalls

Description

This is the second version of ActiveInstalls. Its aim is to support the following use cases which couldn't all be fulfilled by the first version:

  • Active installs as of now
  • Total installs (i.e. per unique id)
  • Active installs over time (graph)
  • How frequently users upgrade their version of XWiki
  • Most used extensions
  • Average time an instance is used (by range: < 1 day, 1-7 days, 7-30 days, 30-365 days, > 365 days)
  • Drop rate, or rather sticking rate = active installs / total installs * 100
  • Graph over time of number of extensions used
  • To be decided:
    • Ability to graph countries where XWiki instances are located
    • Java version used
    • Database engine used
    • OS used

New Format

Old format:

curl -XPUT "http://localhost:9200/installs/install/<UUID>" -d'
{
  "formatVersion" : "1.0",
  "date" : "<last ping date>",
  "distributionVersion" : "<version>",
  "distributionId" : "<distribution id, eg org.xwiki.enterprise:xwiki-enterprise-web>",
  "extensions" : [
    {
      "id" : "<extension id>",
      "version" : "<extension version>"
    },
    ...
  ]
}'

New format (the document id is generated by ES, so the ping is POSTed without an id in the URL):

curl -XPOST "http://localhost:9200/installs/install" -d'
{
  "formatVersion" : "2.0",
  "instanceId": "<unique instance id>",
  "distributionVersion" : "<version>",
  "distributionId" : "<distribution id, eg org.xwiki.enterprise:xwiki-enterprise-web>",
  "extensions" : [
    {
      "id" : "<extension id>",
      "version" : "<extension version>"
    },
    ...
  ]
}'

This means we would store all pings sent by XWiki client instances and not just the last ping. This will give us the ability to graph the history.

Note that the date would be handled by ES's _timestamp field, which has the advantage of being the date on the ES server rather than the date on the XWiki client instance, which may be set incorrectly.
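
As an illustration of the new format, here is a minimal Python sketch of how a client could assemble a format 2.0 ping document. The function name build_ping and the extension id used in the example are hypothetical and not part of the design; the point is only that the document carries no date field, since the date is left to the ES server.

```python
import json


def build_ping(instance_id, distribution_id, distribution_version, extensions):
    """Build a format 2.0 ping document as a JSON string.

    No "date" field is included: the ping date is left to the ES server
    (its _timestamp field), as described above.
    """
    document = {
        "formatVersion": "2.0",
        "instanceId": instance_id,
        "distributionVersion": distribution_version,
        "distributionId": distribution_id,
        # Each extension is given as an (id, version) pair.
        "extensions": [
            {"id": ext_id, "version": ext_version}
            for ext_id, ext_version in extensions
        ],
    }
    return json.dumps(document)


payload = build_ping(
    "abc",
    "org.xwiki.enterprise:xwiki-enterprise-web",
    "6.0-milestone-1",
    [("org.example:some-extension", "1.0")],  # hypothetical extension id
)
print(payload)
```

The resulting string is what would be passed as the body of the curl command shown above.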

Alternate Format

curl -XPUT "http://localhost:9200/installs/install/<UUID>" -d'
{
  "formatVersion" : "1.0",
  "pings" : [
    {
      "date" : "<ping date>",
      "distributionVersion" : "<version>",
      "distributionId" : "<distribution id, eg org.xwiki.enterprise:xwiki-enterprise-web>",
      "extensions" : [
        {
          "id" : "<extension id>",
          "version" : "<extension version>"
        },
        ...
      ]
    },
    ...
  ]
}'

However, this format seems more complex to query. For example, graphing the active installs over time seems harder than with the new format.

Implementation

  • Active installs as of now
    curl -XGET "http://localhost:9200/installs/install/_search?search_type=count&pretty=1" -d'
    {
        "aggs": {
            "last_day" : {
                "filter" : {
                    "range" : {
                        "_timestamp" : {
                            "gt" : "now-1d"
                        }
                    }
                },
                "aggs" : {
                    "instanceId_count" : {
                        "cardinality" : {
                            "field" : "instanceId"
                        }
                    }
                }
            }
        }
    }'

    Implementation for alternate format:

    curl -XGET "http://localhost:9200/installs2/install/_search?search_type=count&pretty=1" -d'
    {
        "aggs": {
            "last_day" : {
                "filter" : {
                    "range" : {
                        "date" : {
                            "gt" : "now-1d"
                        }
                    }
                }
            }
        }
    }'
  • Total installs (i.e. per unique id)
    curl -XGET "http://localhost:9200/installs/install/_search?search_type=count&pretty=1" -d'
    {
        "aggs": {
            "instanceId_count" : {
                "cardinality" : {
                    "field" : "instanceId"
                }
            }
        }
    }'

    Implementation for alternate format:

    curl -XGET "http://localhost:9200/installs2/install/_count?pretty=1" -d'
    {
        "query": {
            "match_all": {}
        }
    }'
  • Active installs over time (graph)
    curl -XGET "http://localhost:9200/installs/install/_search?search_type=count&pretty=1" -d'
    {
        "aggs": {
            "activeinstalls_over_time" : {
                "date_histogram" : {
                    "field" : "_timestamp",
                    "interval" : "day"
                },
                "aggs" : {
                    "instanceId_count" : {
                        "cardinality" : {
                            "field" : "instanceId"
                        }
                    }
                }
            }
        }
    }'

    Each returned bucket will contain the active installs for the period (1 day in this example). For example:

    ...
     "aggregations" : {
       "activeinstalls_over_time" : {
         "buckets" : [ {
           "key_as_string" : "2014-03-20T00:00:00.000Z",
           "key" : 1395273600000,
           "doc_count" : 1,
           "instanceId_count" : {
             "value" : 1
           }
         }, {
           "key_as_string" : "2014-04-04T00:00:00.000Z",
           "key" : 1396569600000,
           "doc_count" : 2,
           "instanceId_count" : {
             "value" : 1
           }
         } ]
       }
    ...

    Kibana3 doesn't support graphing aggregations; this is planned for Kibana4, around the end of the year. We therefore need to build our own dashboard in XWiki, which isn't a bad idea in any case and shouldn't be hard to achieve.

    Implementation for alternate format:

    curl -XGET "http://localhost:9200/installs2/install/_search?search_type=count&pretty=1" -d'
    {
        "aggs": {
            "activeinstalls_over_time" : {
                "date_histogram" : {
                    "field" : "date",
                    "interval" : "day"
                }
            }
        }
    }'

    Using only facets (i.e. compatible with Kibana3):

    curl -XGET "http://localhost:9200/installs2/install/_search?search_type=count&pretty=1" -d'
    {
       "query" : {
           "match_all" : {}
       },
       "facets" : {
           "histo1" : {
               "histogram" : {
                   "field" : "date",
                   "time_interval" : "1d"
               }
           }
       }
    }'

    Note that this last solution produces too many results: if an instance pings several times during the same day, each ping is counted, which isn't correct.

  • How frequently users upgrade their version of XWiki
  • Most used extensions
  • Average time an instance is used (by range: < 1 day, 1-7 days, 7-30 days, 30-365 days, > 365 days)
  • Drop rate, or rather sticking rate = active installs / total installs * 100
  • Graph over time of number of extensions used
  • (to be decided if we want this one now or not) Ability to graph countries where XWiki instances are located:
    • Use http://ipinfodb.com/ip_location_api_json.php and, to be nice to their server, store the country or the lat/long in our DB in the xwikiid table (InstanceId class).
    • This means registering a global XWiki key with http://ipinfodb.com/, with the risk of it being abused, which is not very nice.
    • If we store the lat/long, we can tell ES that it's a geo_point, which allows querying by location.
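
Several of the use cases above have no query yet. As a rough illustration of what they would compute, here is a Python sketch that derives the sticking rate and the usage-duration ranges from a plain list of (instanceId, ping date) pairs. It rests on assumptions: the function names are hypothetical, the sticking rate is taken as active installs divided by total installs times 100, "active" means a ping within the last day (as in the first query), usage time is the span between an instance's first and last ping, and the range boundaries are made contiguous at 1, 7, 30 and 365 days.

```python
from collections import Counter
from datetime import datetime, timedelta


def first_and_last_seen(pings):
    """Map each instance id to its (first ping, last ping) datetimes."""
    seen = {}
    for instance_id, when in pings:
        first, last = seen.get(instance_id, (when, when))
        seen[instance_id] = (min(first, when), max(last, when))
    return seen


def sticking_rate(pings, now, active_window=timedelta(days=1)):
    """Percentage of all known instances that pinged within active_window."""
    seen = first_and_last_seen(pings)
    if not seen:
        return 0.0
    active = sum(1 for _, last in seen.values() if now - last < active_window)
    return active / len(seen) * 100


def usage_ranges(pings):
    """Count instances per usage-duration range (last ping minus first ping)."""
    # Upper bounds in days; contiguous boundaries are an assumption.
    bounds = [(1, "< 1 day"), (7, "1-7 days"), (30, "7-30 days"), (365, "30-365 days")]
    counts = Counter()
    for first, last in first_and_last_seen(pings).values():
        days = (last - first) / timedelta(days=1)
        label = next((name for limit, name in bounds if days < limit), "> 365 days")
        counts[label] += 1
    return dict(counts)


now = datetime(2014, 4, 14, 12, 0)
pings = [
    ("abc", datetime(2014, 2, 20)),
    ("abc", datetime(2014, 3, 20)),
    ("def", datetime(2014, 4, 14, 0, 0)),
    ("def", datetime(2014, 4, 14, 0, 5)),
]
print(sticking_rate(pings, now))  # 50.0: only "def" pinged in the last day
print(usage_ranges(pings))
```

In a real implementation the first and last ping per instance would more likely come from ES aggregations than from client-side code; the sketch only pins down the arithmetic.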

It's also worth considering using the Apache logs for xwiki.org to handle the following use cases:

  • Listing what extensions are the most asked for
  • Listing the extensions used by a given instance. Note that for this use case, the Apache logs are not perfect since they won't contain the unique XWiki instance id, and thus the same IP address can hide several XWiki instances installed at the same company.
  • Geo-locating XWiki users on a map (this is done by default by tools that analyze Apache logs, such as Piwik or AWStats).

Data

A script to create the index and fill it with sample data in an ES instance:

curl -XDELETE "http://localhost:9200/installs"

curl -XPUT "http://localhost:9200/installs"

curl -XPUT "http://localhost:9200/installs/install/_mapping" -d'
{
  "install" : {
    "_timestamp" : {
      "enabled" : true,
      "store" : true
    },
    "properties" : {
      "formatVersion" : { "type" : "string", "index" : "not_analyzed" },
      "instanceId" : { "type" : "string", "index" : "not_analyzed" },
      "distributionId" : { "type" : "string", "index" : "not_analyzed" },
      "distributionVersion" : { "type" : "string", "index" : "not_analyzed" }
    }
  }
}'


curl -XPOST "http://localhost:9200/installs/install?timestamp=2014-02-20" -d'
{
  "formatVersion" : "2.0",
  "instanceId" : "abc",
  "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
  "distributionVersion" : "6.0-milestone-1"
}'


curl -XPOST "http://localhost:9200/installs/install?timestamp=2014-03-20" -d'
{
  "formatVersion" : "2.0",
  "instanceId" : "abc",
  "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
  "distributionVersion" : "6.0-milestone-2"
}'


curl -XPOST "http://localhost:9200/installs/install" -d'
{
  "formatVersion" : "2.0",
  "instanceId" : "def",
  "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
  "distributionVersion" : "5.4.3"
}'


curl -XPOST "http://localhost:9200/installs/install" -d'
{
  "formatVersion" : "2.0",
  "instanceId" : "def",
  "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
  "distributionVersion" : "5.4.3"
}'


curl -XGET "http://localhost:9200/installs/install/_search?pretty=1&fields=_source,_timestamp" -d'
{
   "query": {
      "match_all": {}
   }
}'
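
The sample data contains two pings from the same instance ("def") indexed at about the same time. The Python sketch below, with hypothetical names, mirrors that data to show why a per-day document count differs from a per-day count of distinct instance ids. The day assigned to the two "def" pings is an assumption, since they are indexed without an explicit timestamp.

```python
from collections import defaultdict
from datetime import date

# (instanceId, ping day) pairs mirroring the sample data; the day used for
# the two "def" pings is an assumption.
pings = [
    ("abc", date(2014, 2, 20)),
    ("abc", date(2014, 3, 20)),
    ("def", date(2014, 4, 14)),
    ("def", date(2014, 4, 14)),
]


def per_day_counts(pings):
    """Return {day: (ping count, distinct instance count)}."""
    instances_by_day = defaultdict(list)
    for instance_id, day in pings:
        instances_by_day[day].append(instance_id)
    return {
        day: (len(ids), len(set(ids)))
        for day, ids in sorted(instances_by_day.items())
    }


for day, (doc_count, distinct) in per_day_counts(pings).items():
    print(day, doc_count, distinct)
```

For the last day the ping count is 2 while the distinct count is 1, which is the difference between doc_count and the cardinality sub-aggregation in the date histogram query. Note that ES's cardinality aggregation is approximate, whereas this sketch counts exactly.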

Data for alternate format

curl -XDELETE "http://localhost:9200/installs2"

curl -XPUT "http://localhost:9200/installs2"

curl -XPUT "http://localhost:9200/installs2/install/_mapping" -d'
{
  "install" : {
    "properties" : {
      "formatVersion" : { "type" : "string", "index" : "not_analyzed" },
      "pings" : {
        "properties" : {
          "date" : { "type" : "date" },
          "distributionId" : { "type" : "string", "index" : "not_analyzed" },
          "distributionVersion" : { "type" : "string", "index" : "not_analyzed" }
        }
      }
    }
  }
}'


curl -XPOST "http://localhost:9200/installs2/install/abc" -d'
{
  "formatVersion" : "2.0",
  "pings" : [
    {
      "date" : "2014-02-20",
      "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
      "distributionVersion" : "6.0"
    },
    {
      "date" : "2014-03-20",
      "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
      "distributionVersion" : "6.1"
    },
    {
      "date" : "2014-04-14",
      "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
      "distributionVersion" : "6.1"
    }
  ]
}'


curl -XPOST "http://localhost:9200/installs2/install/def" -d'
{
  "formatVersion" : "2.0",
  "pings" : [
    {
      "date" : "2014-04-14T00:00:00.000Z",
      "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
      "distributionVersion" : "5.4.3"
    },
    {
      "date" : "2014-04-14T00:05:00.000Z",
      "distributionId" : "org.xwiki.enterprise:xwiki-enterprise-web",
      "distributionVersion" : "5.4.3"
    }
  ]
}'

Backward Compatibility

  • Existing queries will need to ensure that they filter on the formatVersion so that they can handle the format change.
  • Introducing this new format means that the number of active installs for format 1.0 will decrease over time until it reaches 0 (once all instances have migrated to a version using the new format, e.g. 6.0 or 6.1).
  • If we wish to continue showing the full figure, we'll need to sum the active installs figure from "format 1.0" with the figure from "format 2.0". This can be done transparently in the Active Install module's code. However, the graph over time will obviously only work with "format 2.0" data.
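
The summing of the two figures could look roughly like the Python sketch below. It is not the Active Install module's code: the function names are hypothetical, dates are plain datetime objects rather than ES fields, the "timestamp" key stands in for ES's _timestamp, and "active" is taken to mean a ping within the last day. It also assumes an instance reports in only one format during that window.

```python
from datetime import datetime, timedelta


def active_installs_v1(documents, now, window=timedelta(days=1)):
    """Format 1.0: one document per instance, holding its last ping date."""
    return sum(1 for doc in documents if now - doc["date"] < window)


def active_installs_v2(pings, now, window=timedelta(days=1)):
    """Format 2.0: one document per ping; count distinct instance ids."""
    return len({p["instanceId"] for p in pings if now - p["timestamp"] < window})


def total_active_installs(documents_v1, pings_v2, now):
    """Sum of both figures.

    An instance upgraded within the window would be counted twice, since
    nothing here links a format 1.0 document to a format 2.0 instanceId.
    """
    return active_installs_v1(documents_v1, now) + active_installs_v2(pings_v2, now)


now = datetime(2014, 4, 14, 12, 0)
documents_v1 = [
    {"formatVersion": "1.0", "date": datetime(2014, 4, 14, 9, 0)},  # active
    {"formatVersion": "1.0", "date": datetime(2014, 4, 1)},         # stale
]
pings_v2 = [
    {"formatVersion": "2.0", "instanceId": "def", "timestamp": datetime(2014, 4, 14, 0, 0)},
    {"formatVersion": "2.0", "instanceId": "def", "timestamp": datetime(2014, 4, 14, 0, 5)},
    {"formatVersion": "2.0", "instanceId": "abc", "timestamp": datetime(2014, 3, 20)},
]
print(total_active_installs(documents_v1, pings_v2, now))  # 1 + 1
```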

 


Created by Vincent Massol on 2014/04/01 11:38
    
