there are a few terminology issues that are endemic across the whole document. i've caught some of them but you need to check that i've caught them all.

- we are not running on different "data sets". we are running on only one "data set" but we are running different "queries". so we can't ask your RQ2 (so i deleted it).
- we are not doing effort estimation, we are doing quality optimization.

other editorial issues:

- i deleted the threats to validity section; that needs further work.

smaller issues:

- there are a lot of raw "W" and not $W$.