I write to protest, in the strongest possible terms, at the reviews of my recent submission to TKDE. While there are certain comments of merit in the reviewer remarks, on the whole the reviewers missed the point. To some extent, the problem lies in the presentation of the paper. My goal was to produce a short TKDE paper, so perhaps I was too terse in some areas. A rewrite of this paper, perhaps with two extra pages, is absolutely required, and that rewrite would address certain issues raised by the reviewers. Nevertheless, I strongly believe that the reviewers fixated on surface features of the paper and never really considered the paper's main message.

----------------------------------

Reviewer one remarks: "However, this interesting idea has not been studied gracefully enough to justify its publication." In this regard, I just don't understand "gracefully enough". The paper presents algorithms, discusses related work, then analyzes the performance of those algorithms on dozens of data sets (20 from UC Irvine, some KDD cup data, and an F18 flight simulator data set). Limitations of the analysis are then discussed. What more does Reviewer One want?

----------------------------------

Reviewer one also comments: "There exist a large number of discretization methods for naive-Bayes as well as concept drift learning algorithm. However no empirical results are presented that compare SPADE or SAWTOOTH against its alternatives. This leaves readers wonder why they can be claimed to work ``very well``."

This remark makes no sense at all. In the paper we state: "Provost and Kolluri [2, p22] comment that sequential learning strategies like windowing usually performs worse than learning from the total set". In Figure 3 we compare our incremental method (which suffers from the Provost and Kolluri warning) to a widely used non-incremental method (kernel estimation) and we do (nearly) as well as kernel estimation. But kernel estimation requires N passes through the data, so it won't scale to large data sets. Our method requires one pass and has a low memory footprint, so it will scale to very large data sets. If our system does as well as a non-incremental scheme, why do we need to empirically compare it against the LOWER baseline of other incremental schemes? To be fair to reviewer one, the current text does not place enough stress on the above point. This could be fixed in a rewrite.

-| 1 |--------------------------------

Reviewer one also comments: "In the conclusion section, the paper claims that one advantage is that ``In Figiure 3... This discretizer performed nearly as well as other discretization methods without requiring multiple passes through the data``. However, in Figure 3, SPADE is only compared with naive-Bayes with kernel estimation, which does not involve discretization at all. Where is the conclusion drawn from then?"

The following point is not stressed in the paper and could be fixed in a rewrite. The paper references the following, widely cited, publication: James Dougherty, Ron Kohavi, and Mehran Sahami, "Supervised and unsupervised discretization of continuous features", in International Conference on Machine Learning, 1995, pp. 194-202.

XXX andres: do we compare our stuff with n-bins etc?
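(For orientation, by "n-bins" we mean the simple unsupervised equal-width baseline surveyed by Dougherty et al. A minimal sketch of that baseline follows; the function names are ours, for illustration only, and are not from the paper:)

    # Minimal sketch of the unsupervised equal-width "n-bins" baseline surveyed
    # by Dougherty, Kohavi and Sahami (1995). Function names are ours, for
    # illustration only. Note that it needs the whole column up front, which is
    # exactly what a one-pass scheme cannot assume.
    def n_bins_cutpoints(values, n=10):
        lo, hi = min(values), max(values)
        width = (hi - lo) / n or 1.0          # guard against a constant column
        return [lo + i * width for i in range(1, n)]

    def to_bin(x, cutpoints):
        # Index of the first cut point that x falls below (else the last bin).
        for i, cut in enumerate(cutpoints):
            if x < cut:
                return i
        return len(cutpoints)

Such a baseline needs the full range of the data before it can place a single cut point, which is the very assumption our one-pass setting drops.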
-| 2 |---------------------------------

Reviewer one comments: "The understating of (naive) Bayes classifiers is far less than accurate. In the first paragraph on page 6, it is said that ``Bayes classifiers are called naive``. This expression is misleading. Bayes classifiers have a very big family. Naive Bayes is only one member out of it. Nobody calls Bayes classifiers naive except for naive Bayes."

Reviewer one is being very unkind here. Many authors use the term Naive Bayes: Langley, Pazzani and Domingos, and the whole WEKA team. Domingos and Pazzani comment: "The classifier obtained by using this set of discriminant functions, and estimating the relevant probabilities from the training set, is often called the naive Bayesian classifier."

@misc{domingos97optimality,
  author = "P. Domingos and M. Pazzani",
  title  = "On the optimality of the simple Bayesian classifier under zero-one loss",
  text   = "Domingos, P., & Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29, 103--130.",
  year   = "1997",
  url    = "citeseer.ifi.unizh.ch/domingos97optimality.html"
}

We considered adopting the Domingos and Pazzani renaming (call them "Simple" not "Naive") but a review of public domain sources showed that "Naive" was a more common term than "Simple" (e.g. Wikipedia has an entry for "naive Bayes" but not for "simple Bayes"). So we stayed with the common parlance.

-| 3 |---------------------------------

Reviewer one then comments: "The paper then goes on by saying ``since they assume that the frequencies of different attributes are independent``. This statement is wrong. Instead, naive Bayes' `attribute independence assumption` is: ``attributes are independent of each other given the class``."

This is a minor typo and we thank reviewer one for that correction. Our understanding of the class dependencies is clearly shown in fig 2 (the classify function, where the frequency counts from different class hypotheses are added to separate parts of the frequency table).
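To make the per-class counting point concrete, the following is a minimal sketch of the kind of incremental train/classify loop we have in mind. It is an illustration only (the class and method names below are ours), not the exact pseudocode of fig 2:

    # Minimal sketch of an incremental Naive Bayes learner. Frequency counts
    # for each class hypothesis live in their own table, and the independence
    # assumption is applied only GIVEN the class. Illustrative, not fig 2.
    from collections import defaultdict

    class SimpleBayes:
        def __init__(self):
            self.n = 0                                    # total instances seen
            self.class_n = defaultdict(int)               # instances per class hypothesis
            self.freq = defaultdict(lambda: defaultdict(int))  # per-class (attr, value) counts

        def train(self, attrs, klass):
            # attrs: dict mapping attribute name -> (discretized) value
            self.n += 1
            self.class_n[klass] += 1
            for a, v in attrs.items():
                self.freq[klass][(a, v)] += 1

        def classify(self, attrs):
            best, best_like = None, 0.0
            for klass, cn in self.class_n.items():
                like = cn / self.n                        # prior for this class hypothesis
                for a, v in attrs.items():                # independence assumed GIVEN the class
                    like *= (self.freq[klass][(a, v)] + 1) / (cn + 2)  # simple Laplace smoothing
                if like > best_like:
                    best, best_like = klass, like
            return best

The point is simply that nothing here assumes attributes are independent of each other in general; the conditional frequencies are always taken per class.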
-| 4 |---------------------------------

Reviewer one then comments: "SPADE is interesting since it does not need to repeat scanning the data. This will be useful in applications where one can not retain the whole historical data. However, there are two potential pitfalls that the paper fails to address:

>>> first on the merge mechanism. It produces new cut points from the old cut points. For example, the old discretization of age is (, [30, 39], [40, 49], ). Merging the two intervals will still retain the old cut points like 30 and 49. But what if should the appropriate cut points be 35 and 45 instead?

>>> second on lacking a split mechanism. Although the paper has mentioned it is because ``do not know how to best divide up a bin without keeping per-bin data`` and ``experiments suggested that adding SubBins=5 new bins between old ranges and newly arrived out-of-range values was enough to adequately divide the range``, those arguments can not trade-off the need of a split operator. For example, the instances are patients coming into a clinic one after the other. The first one is an infant while the second one is an old lady. In the two first instances, one has seen the two far ends of the age attribute [1, 90]. SPADE will produce 1+5 intervals by now and forever (assume the oldest is 90 years old). The reason behind this sub-optimality is that the attribute values do not necessarily gradually change, they can abruptly shift."

How does Reviewer One reconcile their theoretical concerns with SPADE and our experimental results? Is the reviewer saying that our experimental methods are somehow in error? We would be more than happy to supply more information on those experiments. But we should add that when we first designed SPADE, we shared the above concerns. However, on experimentation (and those experiments are clearly described in the paper), those concerns turned out to be irrelevant. EVERY learning method has a search bias and, once that bias is known, it is possible to create an example that defeats that method (as Reviewer One does in the above paragraph). For example:

** Naive Bayes (which we will call "Simple Bayes" in future drafts) assumes independence between attributes (given the same class) and, with that knowledge, it is possible to devise examples that confuse that classifier. However, in practice, those kinds of examples have yet to be seen in naturally occurring data sets. In fact, that algorithm works astonishingly well:

** witness the good performance of naive Bayes shown in the above Domingos and Pazzani paper;

** see also our own experiments at http://www.cs.pdx.edu/~timm/scant.org/2/xval.html.

So, to assess a learner (or a discretization scheme), it is not enough merely to conduct small experiments on one made-up example. Instead, we need to explore real-world data in all its glorious complexity. And this is what this paper does (see fig 3 of our paper).

-| 5 |------------------------

Reviewer One says: "3. The paper mentions the MaxBins parameters is by default set to be the square root of all the instances seen to date. If the paper wants to justify this setting, it may help by citing a causal paper: Ying Yang and Geoff Webb, Proportional k-interval discretization for naive-Bayes classifiers, ECML 2001."

This remark is incredible. Did Reviewer One give this paper any more than a cursory read? We are well aware of the Yang and Webb work. In the current draft of the paper, we even cite a paper that is NEWER than the one mentioned by reviewer one: Y. Yang and G. Webb, "Weighted proportional k-interval discretization for naive-Bayes classifiers", in Proceedings of the 7th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2003), 2003. Available from http://www.cs.uvm.edu/yyang/wpkid.pdf.
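For readers who want the flavour of the one-pass behaviour under discussion (SubBins=5 seeding of out-of-range values, no per-bin data, merging down towards MaxBins = the square root of instances seen so far), a rough sketch follows. This is our illustration only, under our own guess about the merge policy; it is NOT the SPADE algorithm as published.

    # Rough sketch only, in the spirit of (but not identical to) SPADE:
    # one pass, no per-bin data, SubBins new cut points added when a value
    # arrives outside the known range, and neighbouring cut points merged
    # once the count exceeds MaxBins = sqrt(instances seen so far).
    # The merge policy (drop one of the two closest cut points) is our guess.
    import math

    class OnePassDiscretizer:
        def __init__(self, sub_bins=5):
            self.sub_bins = sub_bins
            self.cuts = []           # sorted cut points defining the bins
            self.seen = 0

        def add(self, x):
            self.seen += 1
            if not self.cuts:
                self.cuts = [float(x)]
            elif x < self.cuts[0] or x > self.cuts[-1]:
                old = self.cuts[0] if x < self.cuts[0] else self.cuts[-1]
                step = (x - old) / self.sub_bins
                self.cuts.extend(old + i * step for i in range(1, self.sub_bins + 1))
                self.cuts.sort()
            max_bins = max(2, int(math.sqrt(self.seen)))
            while len(self.cuts) > max_bins:
                gaps = [(self.cuts[i + 1] - self.cuts[i], i) for i in range(len(self.cuts) - 1)]
                _, i = min(gaps)     # merge: drop one of the two closest cut points
                del self.cuts[i + 1]

        def bin(self, x):
            # Index of the interval that x falls into.
            return sum(1 for c in self.cuts if x >= c)

The sketch is only meant to make the earlier discussion of SubBins and MaxBins easier to follow; the empirical question of whether such a scheme hurts classification accuracy on real data is what Figure 3 of the paper addresses.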
-| 6 |--------------------------

Reviewer Two says: "However, we now know that, roughly speaking, getting 90% of the best possible performance is quite easy, but getting that last 10% can be quite hard. Therefore, the results on the KDDCUP dataset presented in this paper are not surprising. They're close to, but not as good as, the results from the winning system which was much more complicated."

This comment is a misreading of the paper. Firstly, we aren't showing a 90-10 rule using simple methods. We are showing more like 99-1 results using very, very simple methods:

** the average difference between schemes in Figure 3 is -1.1%;

** there is barely any difference between SPADE/SAWTOOTH and the KDD cup winner in fig 4;

** in fig 5, we are within 3% of standard methods on UCI data.

-| 7 |--------------------------

Reviewer Two says: "The observations in section II on finding plateaus, and the method used, do not seem to constitute a novel contribution. As the authors acknowledge, the fact that relatively few instances often suffice has been noticed by others before. Figure 1 confirms this observation yet again."

No, this is not a novel contribution. The paper does not claim that it is. But it does set the stage for the rest of the paper. Assuming plateaus, we don't need to work on mega-induction. Instead, for domains where the data-generating phenomena change SLOWER than time-to-plateau, the induction problem just becomes "learn what you can till plateau, disable learning while on the plateau, then reactivate learning only if you fall off the plateau". And once that is clear, the next thing that follows is that standard methods, with minor modifications, will scale to very large data sets. The above paragraph was the line of reasoning that led to this paper. So we tried the SIMPLEST method we could think of (Naive Bayes) and it worked very well.

-| 8 |-------------------------------------

Reviewer 2 comments: "The use of sliding windows to deal with non-stationarity is not new, though the use of equation 1 to control window growth may be. However, that equation is presented without discussion as to its derivation and appears to be ad hoc. That's not necessarily a bad thing, but some discussion of why equation 1 is expected to be useful is in order."

Equation 1 comes from standard sampling theory: it just compares the means of the SAME phenomenon at different times (whereas a Student t-test compares the means of different phenomena). We can spell this out more in a longer draft.
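To be explicit about what "standard sampling theory" means here, the comparison has the familiar textbook shape below. This is given only as orientation and is not necessarily the exact form of Equation 1 in the paper:

    \left|\,\bar{x}_1 - \bar{x}_2\,\right| \;\le\; z_{\alpha}\,\sigma\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}

where \bar{x}_1 and \bar{x}_2 are the means of the same measure (e.g. classification accuracy) taken at two different times, n_1 and n_2 are the corresponding sample sizes, \sigma is the pooled standard deviation, and z_{\alpha} is the usual normal critical value. Roughly, the window is allowed to grow while the two means remain indistinguishable in this sense.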
-| 9 |-------------------------------------

Reviewer 2 comments: "Section IV is just a review of NB, and section V presents SPADE. Figure 3 suggests that SPADE performs roughly as well as John and Langley's method, which is true of a large number of other discretization methods. There's nothing particularly new or insightful about the approach."

We don't understand this remark at all. Using the simplest method imaginable, we work as well as widely used methods. Better yet, we scale up to large data sets (because we are one-pass). And even better than that, we get a built-in confidence measure on our learners: our learners produce conclusions and (if we track the average likelihood, as in Figure 7) we get a second measure showing how much we can trust those conclusions.

----------------------------------
The TKDE decision letter and the full reviews follow.
----------------------------------

RE: TKDE-0074-0305, "Incremental Discretization and Bayes Classifiers Handles Concept Drift and Scales Very Well"
Manuscript Type: Concise

Dear Dr. Menzies,

We have completed the review process of the above referenced paper that was submitted to the IEEE Transactions on Knowledge and Data Engineering. Enclosed are your reviews. We hope that you will find the editor's and reviewers' comments and suggestions helpful.

I regret to inform you that based on the reviewer feedback, Associate Editor, Dr. Qiang Yang could not recommend publishing your paper to our Editor-in-Chief. Final decisions on acceptance are based on the referees' reviews and such factors as restriction of space, topic, and the overall balance of articles.

We hope that this decision does not deter you from submitting to us again. Thank you for your interest in the IEEE Transactions on Knowledge and Data Engineering.

Sincerely,

Ms. Susan Miller
Transactions Assistant
IEEE Computer Society
10662 Los Vaqueros Circle
Los Alamitos, CA 90720
USA
tkde@computer.org
Phone: +714.821.8380
Fax: +714.821.9975

***********
Editor Comments

Reviewer 2 raised serious concern over the novelty of the work as well as provided many good suggestions (same as reviewer one). On the basis of their reviews, I have to recommend rejection of the paper.

***********************

Reviewer Comments

Please note that some reviewers may have included additional comments in a separate file. If a review contains the note "see the attached file" under Section III A - Public Comments, you will need to log on to Manuscript Central to view the file. After logging in to Manuscript Central, enter the Author Center. Then, click on Submitted Manuscripts and find the correct paper and click on "View Letter". Scroll down to the bottom of the decision letter and click on the file attachment link. This will pop-up the file that the reviewer included for you along with their review.

***********************
Reviewer 1

Section I. Overview

A. Reader Interest

1. Which category describes this manuscript?

( ) Practice / Application / Case Study / Experience Report
(X) Research / Technology
( ) Survey / Tutorial / How-To

2. How relevant is this manuscript to the readers of this periodical? Please explain your rating under IIIA. Public Comments.

( ) Very Relevant
(X) Relevant
( ) Interesting - but not very relevant
( ) Irrelevant

B. Content

1. Please explain how this manuscript advances this field of research and / or contributes something new to the literature. Please explain your answer under IIIA. Public Comments.

2. Is the manuscript technically sound? Please explain your answer under IIIA. Public Comments.

( ) Yes
( ) Appears to be - but didn't check completely
( ) Partially
(X) No

C. Presentation

1. Are the title, abstract, and keywords appropriate? Please explain your answer under IIIA. Public Comments.

( ) Yes
(X) No

2. Does the manuscript contain sufficient and appropriate references? Please explain your answer under IIIA. Public Comments.

( ) References are sufficient and appropriate
(X) Important references are missing; more references are needed
( ) Number of references are excessive

3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under IIIA. Public Comments.

(X) Yes
( ) Could be improved
( ) No

4. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain your answer under IIIA. Public Comments.

( ) Satisfactory
(X) Could be improved
( ) Poor

5. Please rate and comment on the readability of this manuscript. Please explain your answer under IIIA. Public Comments.

( ) Easy to read
(X) Readable - but requires some effort to understand
( ) Difficult to read and understand
( ) Unreadable

Section II. Summary and Recommendation

A. Evaluation

Please rate the manuscript. Please explain your answer under IIIA. Public Comments.

( ) Award Quality
( ) Excellent
( ) Good
(X) Fair
( ) Poor

B. Recommendation

Please make your recommendation. Please explain your answer under IIIA. Public Comments.

( ) Accept with no changes
( ) Author should prepare a minor revision
(X) Author should prepare a major revision for a second review
( ) Reject
Section III. Detailed Comments

A. Public Comments (these will be made available to the author)

Incremental discretization is enchanting when put into the context of concept drift. However, this interesting idea has not been studied gracefully enough to justify its publication. [Author's note: I don't understand "gracefully enough".]

The paper title claims that incremental discretization and Bayes classifiers handle concept drift very well. There exist a large number of discretization methods for naive-Bayes as well as concept drift learning algorithm. However no empirical results are presented that compare SPADE or SAWTOOTH against its alternatives. This leaves readers wonder why they can be claimed to work ``very well``. [Author's note: there are numerous experiments comparing SPADE/SAWTOOTH against its alternatives.]

In the conclusion section, the paper claims that one advantage is that ``In Figiure 3... This discretizer performed nearly as well as other discretization methods without requiring multiple passes through the data``. However, in Figure 3, SPADE is only compared with naive-Bayes with kernel estimation, which does not involve discretization at all. Where is the conclusion drawn from then?

The understating of (naive) Bayes classifiers is far less than accurate. In the first paragraph on page 6, it is said that ``Bayes classifiers are called naive``. This expression is misleading. Bayes classifiers have a very big family. Naive Bayes is only one member out of it. Nobody calls Bayes classifiers naive except for naive Bayes. The paper then goes on by saying ``since they assume that the frequencies of different attributes are independent``. This statement is wrong. Instead, naive Bayes' `attribute independence assumption` is: ``attributes are independent of each other given the class``.

SPADE is interesting since it does not need to repeat scanning the data. This will be useful in applications where one can not retain the whole historical data. However, there are two potential pitfalls that the paper fails to address:

>>> first on the merge mechanism. It produces new cut points from the old cut points. For example, the old discretization of age is (, [30, 39], [40, 49], ). Merging the two intervals will still retain the old cut points like 30 and 49. But what if should the appropriate cut points be 35 and 45 instead?

>>> second on lacking a split mechanism. Although the paper has mentioned it is because ``do not know how to best divide up a bin without keeping per-bin data`` and ``experiments suggested that adding SubBins=5 new bins between old ranges and newly arrived out-of-range values was enough to adequately divide the range``, those arguments can not trade-off the need of a split operator. For example, the instances are patients coming into a clinic one after the other. The first one is an infant while the second one is an old lady. In the two first instances, one has seen the two far ends of the age attribute [1, 90]. SPADE will produce 1+5 intervals by now and forever (assume the oldest is 90 years old). The reason behind this sub-optimality is that the attribute values do not necessarily gradually change, they can abruptly shift.
In the second to last paragraph of Section V, the paper claims that SPADE is good because it outperforms dealing with numeric attributes by normal or kernel probability estimation. However, the observation that discretization is better than probability estimation has long been established. Mentioning it here only again proves that discretization is better, but not that SPADE itself is good discretization. A much convincing way is to compare SPADE with peer discretization methods.

At the end of section C in experiments, it is said that ``SAWTOOTH can retain knowledge of old contexts and reuse that knowledge when contexts re-occur``. But the paper does not mention before any mechanism to retain old concepts or identify re-appearing concepts at all. How did this achievement happen then?

Other minor comments:

1. Is WASTOOTH a method newly proposed in this paper or it is only reused by this paper? It does not hurt to clarify this point. If it is new, should emphasize more; if not, should give a reference.

2. At the end of this paper, in the conclusion section, the term ``V & V`` agent is mentioned for the first time. What does it mean?

3. The paper mentions the MaxBins parameters is by default set to be the square root of all the instances seen to date. If the paper wants to justify this setting, it may help by citing a causal paper: Ying Yang and Geoff Webb, Proportional k-interval discretization for naive-Bayes classifiers, ECML 2001.

***********************
Reviewer 2

Section I. Overview

A. Reader Interest

1. Which category describes this manuscript?

(X) Practice / Application / Case Study / Experience Report
( ) Research / Technology
( ) Survey / Tutorial / How-To

2. How relevant is this manuscript to the readers of this periodical? Please explain your rating under IIIA. Public Comments.

( ) Very Relevant
(X) Relevant
( ) Interesting - but not very relevant
( ) Irrelevant

B. Content

1. Please explain how this manuscript advances this field of research and / or contributes something new to the literature. Please explain your answer under IIIA. Public Comments.

2. Is the manuscript technically sound? Please explain your answer under IIIA. Public Comments.

(X) Yes
( ) Appears to be - but didn't check completely
( ) Partially
( ) No

C. Presentation

1. Are the title, abstract, and keywords appropriate? Please explain your answer under IIIA. Public Comments.

(X) Yes
( ) No

2. Does the manuscript contain sufficient and appropriate references? Please explain your answer under IIIA. Public Comments.

(X) References are sufficient and appropriate
( ) Important references are missing; more references are needed
( ) Number of references are excessive

3. Does the introduction state the objectives of the manuscript in terms that encourage the reader to read on? Please explain your answer under IIIA. Public Comments.

(X) Yes
( ) Could be improved
( ) No

4. How would you rate the organization of the manuscript? Is it focused? Is the length appropriate for the topic? Please explain your answer under IIIA. Public Comments.
(X) Satisfactory
( ) Could be improved
( ) Poor

5. Please rate and comment on the readability of this manuscript. Please explain your answer under IIIA. Public Comments.

(X) Easy to read
( ) Readable - but requires some effort to understand
( ) Difficult to read and understand
( ) Unreadable

Section II. Summary and Recommendation

A. Evaluation

Please rate the manuscript. Please explain your answer under IIIA. Public Comments.

( ) Award Quality
( ) Excellent
( ) Good
(X) Fair
( ) Poor

B. Recommendation

Please make your recommendation. Please explain your answer under IIIA. Public Comments.

( ) Accept with no changes
( ) Author should prepare a minor revision
( ) Author should prepare a major revision for a second review
(X) Reject

Section III. Detailed Comments

A. Public Comments (these will be made available to the author)

This paper describes SAWTOOTH and SPADE - the former is an implementation of a Naive Bayes (NB) classifier that does windowing on the input data, and the latter is a one-pass discretization algorithm. It is a bit difficult to ascertain the contribution of the paper. It could be, and the introduction leads one to believe that the authors consider it to be at least in part, the observation that simple systems can perform well on large datasets (such as the 1999 KDDCUP dataset). When Rob Holte made this observation over a decade ago, it was surprising to many. However, we now know that, roughly speaking, getting 90% of the best possible performance is quite easy, but getting that last 10% can be quite hard. Therefore, the results on the KDDCUP dataset presented in this paper are not surprising. They're close to, but not as good as, the results from the winning system which was much more complicated.

The observations in section II on finding plateaus, and the method used, do not seem to constitute a novel contribution. As the authors acknowledge, the fact that relatively few instances often suffice has been noticed by others before. Figure 1 confirms this observation yet again. Also, there's a paper from KDD by Provost, Jensen, and Oates on progressive sampling in which issues related to determining when learning curves have plateaued that's relevant. The problem is fairly difficult.

The use of sliding windows to deal with non-stationarity is not new, though the use of equation 1 to control window growth may be. However, that equation is presented without discussion as to its derivation and appears to be ad hoc. That's not necessarily a bad thing, but some discussion of why equation 1 is expected to be useful is in order.

Section IV is just a review of NB, and section V presents SPADE. Figure 3 suggests that SPADE performs roughly as well as John and Langley's method, which is true of a large number of other discretization methods. There's nothing particularly new or insightful about the approach.

Finally, the experiments are mostly done well, though there is a complete lack of information about variance in the paper. Are any of the results statistically significant?
I suspect in the end that the answer may not be relevant - some results will be, and some won't, and SAWTOOTH/SPADE will enter the pack of other algorithms/approaches that exhibit similar behavior, though on different datasets. There is no free lunch in machine learning.

Section VI-C describes an experiment in which the ability of SAWTOOTH to deal with concept drift is explored. However, very little information about the simulator is provided and, more significantly, the paper never says precisely how SAWTOOTH "retain[s] knowledge of old contexts".

In summary, there's nothing really new in this paper.