Extension Manager - Indexer
Description
We need to put the extension descriptors from all repositories into an index so that we can search among them quickly.
Implementation
How should this be implemented?
Maven Indexer (former Nexus Indexer)
http://maven.apache.org/maven-indexer/
This is what M2Eclipse uses to index Maven projects, for pretty much the same need.
Pros:
- designed for that
- we need to use it anyway to download and parse indexes from Maven repositories
- built-in incremental fetcher for remote Maven repository indexes
Cons:
- very Maven-oriented, and it will lack some of the information we want (it contains only minimal information)
- stored in a file somewhere
- does not store dependencies
- works only with indexed repositories; however, indexing a repository is a one-line command that can easily be scheduled to run periodically
A custom Lucene index
Pros:
- full text search
- Maven Indexer uses Lucene too, so that's a good sign I guess
- more control over the information stored compared to Maven Indexer
Cons:
- we have to develop it (though it should not be too hard)
- stored in a file somewhere
- not designed to store dependency relations
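To make the "full text search over descriptors" idea concrete, here is a minimal sketch of an in-memory inverted index written with the plain JDK. It is not Lucene and does not use the Lucene API; it only illustrates the kind of structure a Lucene index maintains for us. The class name DescriptorIndexSketch, the method names, and the choice of indexed fields are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: a toy inverted index, not the Lucene API.
class DescriptorIndexSketch {
    // token -> ids of the extensions whose indexed fields contain that token
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index the given field values (name, description, ...) under an extension id.
    void add(String extensionId, String... fieldValues) {
        for (String value : fieldValues) {
            for (String token : tokenize(value)) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(extensionId);
            }
        }
    }

    // Return the ids of the extensions that contain every token of the query.
    Set<String> search(String query) {
        Set<String> result = null;
        for (String token : tokenize(query)) {
            Set<String> hits = postings.getOrDefault(token, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(hits);
            } else {
                result.retainAll(hits);
            }
        }
        return result == null ? Collections.emptySet() : result;
    }

    // Lowercase and split on anything that is not a letter or digit (no stemming).
    private static Set<String> tokenize(String text) {
        Set<String> tokens = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Matching here is on exact tokens only; scoring, stemming and persistence are exactly the parts we would rely on Lucene for.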
JCR
I don't know it very well.
Pros:
- full text search capabilities
- can store dependency relations
?:
- scoring
XWiki database
Pros:
- no need to store a file somewhere
- easier to store dependency relations
Cons:
- fills the database with data that is not really needed, since this is after all only a cache to speed things up
- Lucene is better for full text search which is the main use case
Other SQL-based database
Other NoSQL-based database
Getting repository indexes
Maven
No index provided
First thing: the simplest possible Maven repository does not provide an index of any kind, which means that for these the only way is to follow the links found through HTTP requests. That probably takes ages (actually probably not, since this kind of repository is generally small, with one project or so), but it's not very hard to do.
- There is really not much point in wasting time supporting non-indexed Maven repos. In a 5-minute search it is hard to find a Maven repository in the wild that is not already indexed, either by some form of repository manager or manually.
- Manually indexing a Maven repo is a one-line command
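For reference, the link-following step for a non-indexed repository could look roughly like the sketch below. It only extracts child links from the HTML of one directory listing and performs no HTTP request itself. It assumes an Apache-style listing where entries are plain relative href attributes; the class name ListingLinksSketch and the filtering rules are assumptions, and a real crawler would want a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: assumes a simple Apache-style directory listing.
class ListingLinksSketch {
    private static final Pattern HREF =
        Pattern.compile("href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Return the relative links of a listing page, in document order.
    static List<String> childLinks(String listingHtml) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HREF.matcher(listingHtml);
        while (matcher.find()) {
            String href = matcher.group(1);
            // Skip the parent directory, absolute links and column-sorting links.
            if (href.startsWith("..") || href.startsWith("/")
                || href.startsWith("?") || href.contains("://")) {
                continue;
            }
            links.add(href);
        }
        return links;
    }
}
```

Links ending with "/" would be followed recursively, and links ending with ".pom" would be downloaded and parsed.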
archetype-catalog.xml
Very easy to parse, but it contains almost nothing: groupId, artifactId and version. Nothing else...
That means we will need to download every pom.xml in that repository to get useful information, so it's pretty slow too.
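To show how little there is to parse, here is a minimal sketch that reads an archetype-catalog.xml with the JDK's DOM parser and returns one "groupId:artifactId:version" string per entry. The class and method names are assumptions, and the sketch assumes the usual layout in which each entry is an archetype element with groupId, artifactId and version children.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical illustration only: names are not taken from any existing code.
class ArchetypeCatalogSketch {
    // Return "groupId:artifactId:version" for every <archetype> element.
    static List<String> parseCoordinates(String catalogXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(catalogXml.getBytes(StandardCharsets.UTF_8)));
            NodeList archetypes = doc.getElementsByTagName("archetype");
            List<String> coordinates = new ArrayList<>();
            for (int i = 0; i < archetypes.getLength(); i++) {
                Element archetype = (Element) archetypes.item(i);
                coordinates.add(childText(archetype, "groupId") + ":"
                    + childText(archetype, "artifactId") + ":"
                    + childText(archetype, "version"));
            }
            return coordinates;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers get a single unchecked error.
            throw new IllegalArgumentException("Invalid archetype catalog", e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

Each returned coordinate would then still have to be resolved to its pom.xml to obtain the name, description and dependencies, which is where the time goes.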
Maven indexes
See http://maven.apache.org/maven-indexer/index.html
Very complete (for example, it even contains the Java classes of a jar artifact, and we could imagine providing the wiki pages of a xar artifact, since Nexus is extensible).
- Actually, the available indexed information is quite minimal, at least concerning our use cases. We can easily extend the indexed information, but we would only be able to do that for the indexes of repositories that we manage; all other repositories in the wild will have default indexes with default minimal information.
- http://maven.apache.org/maven-indexer-archives/maven-indexer-LATEST/indexer-core/index.html
Some helpers:
- https://github.com/apache/maven-indexer/tree/master/indexer-examples
- https://github.com/cstamas/maven-indexer-examples (obsolete, use the one above)