there are a few terminology issues that are endemic across the whole document. i've fixed the ones i spotted, but you need to check the rest of the text for any i missed.

- we are not running on different "data sets". we run on only one "data set", with different "queries". so we can't ask your RQ2, and i deleted it.

- we are not doing effort estimation, we are doing quality optimization.

other editorial issues:

- i deleted the threats to validity section; it needs further work.

smaller issues:

- there are a lot of raw "W" that should be typeset as $W$.