Some good points here, Stas, but I'd say your take on test duration is lacking. Nothing stops you from running all or most of your tests in parallel, so a 30-day duration is not a limiting factor in that sense. It only delays the possible implementation of the specific change, and of any tests/changes that depend directly on it.
Given your interest in the default 5%/20% (type I/type II) error rate trade-off, I'd also encourage you to explore my work on determining sample sizes and significance thresholds that achieve an optimal balance of risk and reward.
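
For anyone following along, here is a rough sketch of what the default 5%/20% setting implies for sample size. It uses the textbook fixed-sample formula for a two-sided z-test on two proportions, not the risk/reward approach I mention above. The function name `sample_size_per_arm` and the 10% vs. 12% conversion rates are made up purely for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_base, p_variant, alpha=0.05, beta=0.20):
    """Approximate users needed per arm for a two-sided two-proportion z-test.

    alpha is the type I error rate, beta the type II error rate
    (so power = 1 - beta). Hypothetical helper, for illustration only.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(1 - beta)        # critical value for power
    # Sum of the Bernoulli variances of the two arms.
    variance = p_base * (1 - p_base) + p_variant * (1 - p_variant)
    effect = p_variant - p_base
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Illustrative rates only: 10% baseline vs. 12% variant conversion.
n_default = sample_size_per_arm(0.10, 0.12)
n_strict = sample_size_per_arm(0.10, 0.12, alpha=0.01, beta=0.10)
print(n_default, n_strict)
```

Tightening either error rate raises the required sample, which is exactly why treating 5%/20% as a fixed default rather than as a choice is worth questioning.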