Georgi Georgiev
2 min read · Feb 27, 2020


Hi Nikolas,

Thank you for such a quick and thorough response. I’ll try to address all your points in turn.

‘I believe variant testing usually is a means to solve a greater optimization problem, e.g. maximizing profit.’ — yes, and this means we need to take into account a lot of things external to the test data. Neither utility nor loss is fully captured in the test data itself. However, if we limit the role of the statistical analysis to an assessment of the error in the data, the profit maximization exercise can be left where it belongs — in the domain of decision-making (separate from, though related to the statistical analysis).

‘I have to give you this one — assuming you meant “exploration phase”. Of course the users experiencing the superior scenario in this phase are already implicitly being “exploited”.’ — no, I meant exploited, in the sense that we are gaining from users being in the variant if the variant is performing better. My pushback is against ‘we are purely paying to explore’: we are not purely paying to explore, since we are already raking in the benefits/losses from the variant arm.
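To put a number on that, here is a back-of-the-envelope sketch (entirely my own, with made-up conversion rates) of how a 50/50 ‘exploration’ split already captures part of the variant’s benefit compared to showing everyone the control:

```python
# Hypothetical numbers purely for illustration
users = 10_000                      # traffic during the "exploration" phase
p_control, p_variant = 0.10, 0.12   # assumed true conversion rates

expected_control_only = users * p_control
expected_50_50_split = users / 2 * p_control + users / 2 * p_variant

print(f"Expected conversions, control only: {expected_control_only:.0f}")
print(f"Expected conversions, 50/50 split:  {expected_50_50_split:.0f}")
print(f"Benefit already 'exploited':        {expected_50_50_split - expected_control_only:+.0f}")
```

The same arithmetic works in reverse if the variant is worse: the split then absorbs part of the loss, which is exactly why it is not ‘purely paying to explore’.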

‘The Split test here has been adjusted for sequential testing by multiplying the p-value for each round by the number of rounds so far (Bonferroni correction).’ — my apologies on this and the next point: I had it in my mind to check the p-values, but I guess I ended up checking only the χ² scores. Bonferroni is a conservative adjustment here, so it is an inefficient solution, but a valid one nonetheless.
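For anyone following along, here is a minimal sketch of the adjustment as described (my own reconstruction with simulated data, not the actual implementation): at each peek the raw chi-square p-value is multiplied by the number of rounds so far.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)
p_control, p_variant = 0.10, 0.12   # assumed true conversion rates
users_per_round = 1_000

conv = np.zeros(2, dtype=int)       # cumulative conversions [control, variant]
n = np.zeros(2, dtype=int)          # cumulative users       [control, variant]

for peek in range(1, 11):           # ten interim looks at the data
    conv += rng.binomial(users_per_round, [p_control, p_variant])
    n += users_per_round

    # 2x2 table: rows = arms, columns = converted / not converted
    table = np.column_stack([conv, n - conv])
    _, p_raw, _, _ = chi2_contingency(table, correction=False)

    # Bonferroni-style penalty for peeking: multiply by the number of looks so far
    p_adj = min(1.0, p_raw * peek)
    print(f"peek {peek:2d}: raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f}")
    if p_adj < 0.05:
        print("stop: adjusted p-value crossed 0.05")
        break
```

Conservative here means the adjusted p-values overshoot, so the procedure keeps its error guarantee at the cost of power.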

On one-tailed tests — I’ve made the argument for them in many places. You can see http://blog.analytics-toolkit.com/2017/one-tailed-two-tailed-tests-significance-ab-testing/ for one version of my argument. I’ve refined it since, in my book and elsewhere, including a dozen articles on different aspects of the issue over at www.onesided.org, if you are up for a deeper dive.
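Purely as a mechanical illustration of what is at stake (the substance of the argument is in the links above), here is a quick sketch with made-up counts showing how the same data yield a one-sided p-value that is half the two-sided one whenever the observed effect is in the hypothesized direction:

```python
from statsmodels.stats.proportion import proportions_ztest

conversions = [130, 100]   # [variant, control], hypothetical counts
users = [1_000, 1_000]

_, p_two_sided = proportions_ztest(conversions, users, alternative='two-sided')
_, p_one_sided = proportions_ztest(conversions, users, alternative='larger')

print(f"two-sided p = {p_two_sided:.4f}")
print(f"one-sided p = {p_one_sided:.4f}  (H0: variant rate <= control rate)")
```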

‘In that regard, what do you mean by a misleading bandit?’ — I mean that in most scenarios you’d be compelled to go with either the variant or the control. Even in ad testing there comes a point where you’d say ‘enough’: we might be losing just 5% of our revenue because the bandit is still exploring, but it is unnecessary and we need to choose one over the other. What would be the error rate of making such a decision based on what the bandit says at that point in time?
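To make the question concrete, here is a rough simulation (entirely my own construction, with hypothetical rates) of exactly that decision: run a Thompson-sampling bandit for a fixed horizon, then ‘call it’ for the arm with the better observed rate, and count how often the call is wrong.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = np.array([0.100, 0.105])   # assumed true rates: variant is 5% better (relative)
horizon = 2_000                         # users served before we force a decision
n_sims = 200

wrong_calls = 0
for _ in range(n_sims):
    successes = np.zeros(2)
    failures = np.zeros(2)
    for _ in range(horizon):
        # Thompson sampling: draw from each arm's Beta posterior, serve the best draw
        draws = rng.beta(successes + 1, failures + 1)
        arm = int(np.argmax(draws))
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    observed = successes / np.maximum(successes + failures, 1)
    if np.argmax(observed) != np.argmax(true_rates):
        wrong_calls += 1

print(f"wrong final choice in {wrong_calls / n_sims:.1%} of simulated tests")
```

The point is not the particular number, but that nothing in the bandit’s state at the stopping moment tells you what that error rate is.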

‘More Options — has any correction been applied to the chi-square calculation, e.g. Dunnett’s? I’m pretty sure it wasn’t, making it, again, an invalid frequentist test.’

> ‘The Bonferroni correction has been applied (see above).’

>> I meant a correction for multiple testing in the sense of testing A/B/n. Am I to understand that you applied Bonferroni twice — once for the peeking and once again for the multiple variants?
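For clarity, this is what I would expect ‘Bonferroni twice’ to look like, in a sketch of my own with simulated data (not a claim about the actual implementation): each variant-vs-control chi-square p-value is multiplied both by the number of peeks so far and by the number of variant comparisons. Dunnett’s correction would be the sharper choice for the many-to-one comparisons, but the double Bonferroni is at least valid.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
true_rates = np.array([0.10, 0.11, 0.12, 0.10])   # control + three variants (made up)
users_per_round = 2_000
n_variants = len(true_rates) - 1

conv = np.zeros(len(true_rates), dtype=int)       # cumulative conversions per arm
n = np.zeros(len(true_rates), dtype=int)          # cumulative users per arm

for peek in range(1, 6):                          # five interim looks
    conv += rng.binomial(users_per_round, true_rates)
    n += users_per_round

    for v in range(1, len(true_rates)):
        # 2x2 table for variant v against the control
        table = np.array([[conv[0], n[0] - conv[0]],
                          [conv[v], n[v] - conv[v]]])
        _, p_raw, _, _ = chi2_contingency(table, correction=False)

        # penalty for peeking (x peek) and for multiple variants (x n_variants)
        p_adj = min(1.0, p_raw * peek * n_variants)
        print(f"peek {peek}, variant {v}: raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f}")
```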

Thanks,
Georgi
