We thank the referee for the helpful comments. We have edited the manuscript in response, and we address the comments point by point below.

Section II A: {This was done by taking nine weeklong data samples from each detector spaced roughly 55 days apart, giving nine evenly spaced weeks throughout the duration of S6.} Why have the authors chosen those nine weeklong data samples? Why 55 days? In other words, are there any other 9.2 days when the detector was more sensitive than those chosen in this paper?

We expect detector sensitivity to improve throughout a science run because commissioning work is ongoing on a weekly basis. The nine weeklong data samples were used to provide a "big picture" view of the detector's evolving sensitivity, which pointed to the end of S6 as the most sensitive time. From there we zeroed in on the most sensitive nine days, which were in fact the final nine days of S6. Note that the data sample arrived at via this method (Oct. 11-20, 2010) is not actually one of the nine weeklong data samples listed. We have edited the section to make this clearer; the final two sentences of that paragraph now read: "At all four frequencies, detector sensitivity improved as the run progressed. Using this figure of merit, it was found that the final nine days of S6 yielded the most sensitive data stretch for all four frequencies: October 11-20, 2010 (GPS 970840605 -- 971621841)."

Section II D: Is the mismatch the same for the initial candidate selection, the ``extended-look'' or ``time shift'' tests, and the follow-up searches?

Yes, it is. We have modified the sentence to read: "This search used a mismatch parameter $m = 0.2$ at all stages."

Section II G: {A survey of the loudest joint-detector $2\mathcal{F}$ value reported for background subbands known to be free of injected signals for the band between 200 Hz and 240 Hz (used in a pilot run) gave a mean loudest joint-detector $2\mathcal{F} \simeq 55$.} How can one be sure that there were no TRUE signals in that frequency band?

The "pilot run" in the band between 200 Hz and 240 Hz was a pilot mock data challenge (MDC); the text has been modified to make this clear. This pilot MDC consisted of 24 fake signals injected into the 200-240 Hz band at known frequencies. Our search algorithm returns a loudest $2\mathcal{F}$ value for each 0.1-Hz subband, and we searched in 1-Hz bands centered on each fake signal, so each search returned ten loudest $2\mathcal{F}$ values per signal. To be conservative, we excluded the central four 0.1-Hz subbands as possibly affected by the fake signal and used the outer six subbands for our background estimate. The mean of the loudest $2\mathcal{F}$ over these $24 \times 6 = 144$ subbands known to be free of injected signals is the $2\mathcal{F} \simeq 55$ reported in the paper. We have also added to this section a clarification that this threshold was confirmed to be appropriate in other bands. While we cannot be sure there were zero true signals in these subbands, the odds that signals were present in high enough number and strength to significantly alter that mean are very low: no true CW signal has ever been reported in the S6 data, and the fake injections were performed at randomized sky locations.

Table III: How can the follow-up $2F_{J}$ values be almost always smaller than the original values? Or, how are those ``Followup $2F_{J}$'' entries computed?

The "Followup $2F_{J}$" values are returned by the same search algorithm running over a follow-up data set, either time-shifted or extended. The signals all proved to be either instrumental lines, hardware injections, or Gaussian noise.
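For reference, the behavior expected of a true signal, which underlies the reasoning below, can be made explicit with the standard $\mathcal{F}$-statistic distributions (a textbook result, stated here for Gaussian noise and a fully coherent search, not a new claim of the manuscript):
\[
E[2\mathcal{F}] = 4 + \rho^2, \qquad \rho^2 \propto \frac{h_0^2\, T_{\mathrm{obs}}}{S_h},
\]
where $\rho$ is the signal-to-noise ratio, $h_0$ the intrinsic signal amplitude, $T_{\mathrm{obs}}$ the coherent observation time, and $S_h$ the noise power spectral density. For a loud true signal ($\rho^2 \gg 4$), a doubling of the coherence time therefore roughly doubles $2\mathcal{F}$, whereas a noise outlier has no reason to grow.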
In all cases we would expect lower $2\mathcal{F}$ values in follow-up. For Gaussian noise, what was flagged was an outlier, a particularly high instance of random noise, so in either a time shift or an extended look (longer observation time) we would expect to see a lower result. The same holds true for instrumental lines and hardware injections. Since the search, by design, uses the most sensitive data, any time shift is to a less sensitive time period and will return lower $2\mathcal{F}$ values. Instrumental lines will also degrade when we search over longer periods in an extended look, because their "sky location" (co-located with the detector) does not actually match the sky location we are searching, and this disagreement becomes more pronounced over time. The outliers labeled "hardware injections" were in fact caused by hardware injections at a different sky location, so the sky-location mismatch degraded them as well. We have added a sentence to the table caption: "For outliers due to random noise and for many instrumental artifacts, we expect the follow-up $2\mathcal{F}$ to be smaller than the originally obtained $2\mathcal{F}$, in contrast to true signals, for which $2\mathcal{F}$ should increase with observation time."

I probably do not understand the design of the manual follow-up steps. From the paragraph starting with ``Outliers detected with joint 2F greater than the threshold established by the software injections were labeled candidates and received manual followup.'', it looks to me that the ``extended-look'' and ``time shifts'' tests are what the authors meant by the ``manual follow-up''. But the ``extended look'' was expected to give higher values of the F-statistic, at least for hardware injections, as the authors noted: ``the same assumption of signal continuity would predict, roughly, a doubling of the 2F value for a doubling of coherence time''.

The "candidates" were the outliers which survived the time shifts and extended looks: the seven listed in Table III. To make this clearer, we have changed that opening paragraph to read: "Outliers detected in time shifts and extended looks with joint $2\mathcal{F}$ greater than the threshold established by the software injections were labeled candidates. The time shift and extended look tests were not cumulative; ..." As mentioned above, the outliers due to hardware injections were hardware injection signals loud enough to bleed over into the wrong sky location (there were no hardware injections at the sky location we searched over). The mismatch in sky location caused them, like the other instrumental lines, to degrade over longer timescales. We have changed their description to "...arose from hardware injections located at other points in the sky..." to clarify this.

The next paragraph starts with the sentence ``These candidates were subject to manual followup.'', mentioning another ``follow-up''. Here the authors mention known line noise artifacts only; they do not mention how, or even whether, they computed F-statistic values again at all.

We have changed that opening sentence to read "These seven candidates" to make it clear we are referring to the seven candidates in Table III which survived the time shifts and extended looks.

Then the paragraph after next mentions further follow-up, for only two candidates (outliers 79 and 131).

These two candidates are the two which were not associated with instrumental lines. The further follow-up in this paragraph and the next rules them out as Gaussian noise.
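To illustrate the comparison against the loudest expected $2\mathcal{F}$ used in that dismissal, here is a minimal sketch, not the pipeline code, of how one can estimate the loudest $2\mathcal{F}$ expected from Gaussian noise alone; it assumes $2\mathcal{F}$ follows a $\chi^2$ distribution with four degrees of freedom, and the template count in the example is a hypothetical stand-in:

```python
from scipy import stats

def loudest_expected_2F(n_templates):
    """Approximate loudest 2F expected from Gaussian noise alone.

    In Gaussian noise, 2F follows a chi-squared distribution with
    four degrees of freedom; over n_templates (approximately
    independent) templates, the expected loudest value is roughly
    where the survival function equals 1/n_templates.
    """
    return stats.chi2.isf(1.0 / n_templates, df=4)

# Hypothetical example: ~1e10 effectively independent templates.
print(loudest_expected_2F(10**10))  # ~ 52.7
```

A candidate whose follow-up $2\mathcal{F}$ stays at or below this level is consistent with a noise fluctuation.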
We have changed the opening sentence of the paragraph to read: "The final remaining two candidates, which were not associated with known instrumental lines, were given a subsequent round of follow-up: a time shift and extended look performed in data from June 2010..." The final sentence of the paragraph about the loudest expected $2\mathcal{F}$ value has been changed to begin "The final two candidates' failure..." to clarify that it was only these final two candidates that underwent this extra step and were dismissed as noise fluctuations.

It may be helpful if a flow chart and the number of surviving candidates at each step are provided.

We concur, but believe a table format is more natural, and we have added such a table at the end of this section.

Table IV: If one knows the frequencies of the ``known lines'' (sharp spectral features) before the analysis, why did the authors include those 1 Hz bands listed in the table in their analysis? The S6 run ended six years ago.

Our preference was to include as much data as possible, so we removed data only if instrumental artifacts truly made it impossible for our methods to set an upper limit. There were many other "known lines" which did not degrade the data enough to prevent an upper limit, and we feel our analysis is more valuable for having included them. Thanks to this approach, only 1.2 Hz of our 583 Hz band (roughly 0.2%) had to be excluded.
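As a purely illustrative sketch of this bookkeeping (the function, band edges, and line frequencies below are hypothetical, and the actual criterion was whether an upper limit could be set, not the mere presence of a line), one could flag which 1-Hz upper-limit bands contain a known line as follows:

```python
def contaminated_bands(line_freqs_hz, band_start_hz, band_end_hz, width_hz=1.0):
    """Return the start frequencies of the width_hz-wide bands
    containing at least one known instrumental line."""
    flagged = set()
    for f in line_freqs_hz:
        if band_start_hz <= f < band_end_hz:
            # Snap the line frequency down to the start of its band.
            flagged.add(band_start_hz + width_hz * int((f - band_start_hz) // width_hz))
    return sorted(flagged)

# Hypothetical usage: three lines, two falling in the same 1-Hz band.
print(contaminated_bands([120.0, 120.4, 173.7],
                         band_start_hz=100.0, band_end_hz=200.0))
# [120.0, 173.0]
```

Bands not flagged in this way, or flagged only by lines too weak to spoil the upper limit, remain in the analysis.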