CAPTCHA investigation 7.0
Description
The CAPTCHA problem and state of the art
Many articles cover this topic. The two listed below, both from 2014, describe the problem that CAPTCHAs are meant to solve and the current state of the art.
- Think Your Site Needs CAPTCHA? Try These User-Friendly Alternatives.
- A Better CAPTCHA: Are We There Yet?
Requirements
The second article extracts some requirements:
- It must be accessible.
- It must be non-disruptive and transparent to the end user.
- It cannot detract or distract from the primary purpose of the page.
- It must be automated or require very little moderation on a large scale.
- It cannot be a 3rd-party service.
- It shouldn’t put a huge strain on the server/browser.
- It must have a low percentage of false positives and false negatives.
Note: Point 5 may not be that important in our case, since the only place where CAPTCHAs are needed is public websites. Public websites are, by definition, public, so there should be no problem in using a 3rd-party service for retrieving and solving CAPTCHAs, even for an enterprise-oriented product such as XWiki Enterprise. A remote service also helps with point 6, since the server (the XWiki instance) no longer has to generate random images, which is not a negligible performance penalty.
Conclusion
Of all the solutions available so far (text, images, games, honeypots, checkboxes, drag & drop, etc.), the one that takes care of most of these problems, including the accessibility concerns, and that does not suffer from the general problem of random generation algorithms (i.e. it is not easily reversible) seems to be Google's recently introduced NO CAPTCHA reCAPTCHA.
Proposal: Google's NO CAPTCHA reCAPTCHA
Launched: December 2014
Presentation: http://googleonlinesecurity.blogspot.com/2014/12/are-you-robot-introducing-no-captcha.html
Accessibility aspects handled better than in the previous reCAPTCHA implementation
- "Optimistic" review: http://simplyaccessible.com/article/googles-no-captcha/
- "Not there yet" review: http://blog.adrianroselli.com/2014/12/recaptcha-reboot.html
More advanced than a simple CAPTCHA solution
- JavaScript generated checkbox
- Minimises the number of old-fashioned CAPTCHAs a user has to answer, based on what Google already knows from its history and analysis about whether the user is a robot
- More advanced checks will possibly be implemented along the way, in the background (like analysing the client's interaction with the form through JavaScript, etc.)
Initial implementation problems: http://www.shieldsquare.com/blog/sorry-google-captcha-recaptcha-doesnt-stop-bots/
- Currently, the advertised "high degree of sophistication" amounts to a simple cookie: once a user has been verified, they won't be verified again next time. However, bots can store cookies too and get past this easily
- Introduces a new clickjacking problem (solving for a different site than the current one)
- Once a bot fails the checkbox test, it can still attack the CAPTCHA test the same way it did before
- General problem of all anti-SPAM systems: Vulnerable to cheap human labour attacks.
High profile target
- Many people will write specialized bots to get past it
Proof of concept
This is just a demo of how to directly use the NO CAPTCHA reCAPTCHA service inside XWiki, without any integration work done. You need to fill in the public and secret site keys from your Google reCAPTCHA account.
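{{velocity}}
#if (!$request.submit)
{{html}}
<script src='https://www.google.com/recaptcha/api.js'></script>
<form>
<div class="g-recaptcha" data-sitekey="YourPublicSiteKeyHere"></div>
<input name='submit' type='submit' class='button' value='Go Captain Planet!'>
</form>
{{/html}}
#else
Form submitted.
## Read the token explicitly: whether hyphens are accepted in a property reference depends on the Velocity version.
#set ($captchaResponse = $request.getParameter('g-recaptcha-response'))
## URL-encode the token before placing it in the query string.
$xwiki.getURLContent("https://www.google.com/recaptcha/api/siteverify?secret=YourSecretSiteKeyHere&response=${escapetool.url($captchaResponse)}&remoteip=${request.remoteAddr}")
#end
{{/velocity}}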
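For an actual integration, the same verification step would most likely live in Java rather than in a wiki page. The following is a minimal sketch of that step, assuming Java 10 or later. The class `RecaptchaVerification` and its methods are hypothetical names, not part of XWiki or of any Google library. It only builds the same `siteverify` URL as the demo above and checks the `success` field of the JSON answer; performing the HTTP request is left to the caller.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not an existing XWiki or Google class. */
final class RecaptchaVerification {
    private static final String VERIFY_URL =
        "https://www.google.com/recaptcha/api/siteverify";

    // Looks for "success": true in the JSON body.
    // Assumption: good enough for a sketch; real code should use a JSON parser.
    private static final Pattern SUCCESS =
        Pattern.compile("\"success\"\\s*:\\s*true");

    private RecaptchaVerification() {
    }

    /** Builds the verification URL with every parameter URL-encoded. */
    static String buildVerifyUrl(String secret, String response, String remoteIp) {
        return VERIFY_URL
            + "?secret=" + encode(secret)
            + "&response=" + encode(response)
            + "&remoteip=" + encode(remoteIp);
    }

    /** Returns true only if the service answered with "success": true. */
    static boolean isSuccess(String jsonBody) {
        return jsonBody != null && SUCCESS.matcher(jsonBody).find();
    }

    private static String encode(String value) {
        // URLEncoder.encode(String, Charset) is available since Java 10.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

The caller would fetch the URL returned by `buildVerifyUrl(...)` with whatever HTTP client is available and pass the body to `isSuccess(...)`, accepting the form only when it returns `true`.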
XWiki Integration
We already have a generic CAPTCHA Module that defines a CaptchaVerifier component interface, which can have multiple implementations. However, this is not enough: the task of actually displaying a CAPTCHA is left to the caller, who has to do something different depending on the implementation in use.
We could improve this by adding a CaptchaDisplayer component interface that would have 2 methods:
- display()
  - Displays the CAPTCHA challenge in HTML format.
  - Maybe pass the output syntax as a parameter?
- String getAnswerRequestParameterName()
  - Returns the name of the request parameter where the challenge expects the user to enter the answer. Used on the form handler by the caller to feed the answer to the CaptchaVerifier.
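The two methods above could look roughly like this in Java. This is only a sketch: the return type of `display()` is an assumption (the proposal does not fix it), `RecaptchaDisplayer` is a hypothetical implementation, and a real XWiki component would also carry the usual component annotations, which are left out so the example stays self-contained.

```java
/** Proposed interface; the String return type of display() is an assumption. */
interface CaptchaDisplayer {
    /** Returns the CAPTCHA challenge as HTML. */
    String display();

    /** Returns the name of the request parameter that will hold the answer. */
    String getAnswerRequestParameterName();
}

/** Hypothetical implementation that renders the reCAPTCHA widget. */
final class RecaptchaDisplayer implements CaptchaDisplayer {
    private final String siteKey;

    RecaptchaDisplayer(String siteKey) {
        // Assumption: the site key comes from trusted configuration,
        // so it is inserted into the HTML without escaping.
        this.siteKey = siteKey;
    }

    @Override
    public String display() {
        return "<script src='https://www.google.com/recaptcha/api.js'></script>"
            + "<div class=\"g-recaptcha\" data-sitekey=\"" + siteKey + "\"></div>";
    }

    @Override
    public String getAnswerRequestParameterName() {
        // Name of the form field filled in by the reCAPTCHA widget.
        return "g-recaptcha-response";
    }
}
```

With this in place, a form handler no longer needs to know which implementation is active: it reads the parameter named by `getAnswerRequestParameterName()` and hands the value to the CaptchaVerifier.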
Ultimately, we could also consider refactoring the CAPTCHA Module entirely and keeping a single CaptchaService component interface:
- String getChallenge(String outputSyntax)
- String getAnswerRequestParameterName()
- boolean verify(String answer)
...and a CaptchaServiceManager component interface:
- List<String> listAvailableCaptchaNames()
- CaptchaService get()
- void set(String serviceName)
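To make the shape of this refactoring more concrete, here is a minimal sketch of the two interfaces in plain Java. The method signatures come from the lists above; `InMemoryCaptchaServiceManager`, its `register` method and `FixedAnswerCaptchaService` are hypothetical names invented for the example, and a real implementation would presumably look services up through the XWiki component manager instead of a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

interface CaptchaService {
    String getChallenge(String outputSyntax);
    String getAnswerRequestParameterName();
    boolean verify(String answer);
}

interface CaptchaServiceManager {
    List<String> listAvailableCaptchaNames();
    CaptchaService get();
    void set(String serviceName);
}

/** Hypothetical manager that keeps the services in a map. */
final class InMemoryCaptchaServiceManager implements CaptchaServiceManager {
    private final Map<String, CaptchaService> services = new LinkedHashMap<>();
    private String currentName;

    /** Assumption: the first registered service becomes the default. */
    void register(String name, CaptchaService service) {
        services.put(name, service);
        if (currentName == null) {
            currentName = name;
        }
    }

    @Override
    public List<String> listAvailableCaptchaNames() {
        return new ArrayList<>(services.keySet());
    }

    @Override
    public CaptchaService get() {
        if (currentName == null) {
            throw new IllegalStateException("No CAPTCHA service registered");
        }
        return services.get(currentName);
    }

    @Override
    public void set(String serviceName) {
        if (!services.containsKey(serviceName)) {
            throw new IllegalArgumentException("Unknown CAPTCHA service: " + serviceName);
        }
        currentName = serviceName;
    }
}

/** Toy service with a fixed question, for illustration only. */
final class FixedAnswerCaptchaService implements CaptchaService {
    private final String question;
    private final String expectedAnswer;

    FixedAnswerCaptchaService(String question, String expectedAnswer) {
        this.question = question;
        this.expectedAnswer = expectedAnswer;
    }

    @Override
    public String getChallenge(String outputSyntax) {
        // The output syntax is ignored here; this toy service always emits HTML.
        return question + " <input type=\"text\" name=\""
            + getAnswerRequestParameterName() + "\"/>";
    }

    @Override
    public String getAnswerRequestParameterName() {
        return "captcha_answer";
    }

    @Override
    public boolean verify(String answer) {
        return answer != null && expectedAnswer.equals(answer.trim());
    }
}
```

A client would call `manager.get().getChallenge(...)` when rendering its form and `manager.get().verify(...)` in the form handler, passing the value of the request parameter named by `getAnswerRequestParameterName()`. Switching implementations then comes down to a single `manager.set(...)` call made from the administration side.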
This way, we are able to write an administration UI similar to what we have for Search Administration, where we select the implementation/engine we want to use and we are then presented with the configuration specific to the selected option.
All clients of the CAPTCHA module would just use a script service to display a CAPTCHA challenge in their app's form and then, in the form handler, verify the CAPTCHA answer before accepting the submission. Whenever the implementation is changed, all the clients of the new CAPTCHA module will keep working correctly, without having to change their code.