Dear Dr. Hildebrandt,

We would like to thank the referees for their helpful comments, suggestions, and criticisms. Enclosed is a revised manuscript which addresses their various points. Below, we have reproduced the comments from the three referees verbatim. After each, we detail the changes to the manuscript that have been made in response to those comments. In almost all cases, we have made the changes that were suggested or recommended or implied. There are however a couple of places where we do not entirely agree. In those cases, we have explained and justified our reasoning and choices in more detail.

We also enclose a plot and short calculation intended for the first referee, showing three different values of the chirp mass obtained directly from the time/frequency data. These are described but not detailed in the manuscript (where they have been left as an exercise for the reader) but we thought it would be helpful and save time for the referee.

Referee #1 is clearly an expert in the field, and we are extremely grateful for the mistakes and errors that they spotted. However we would like to point out that the target audience for this paper is not the specialist or expert, but precisely the opposite. It is for this reason that we have not always detailed every imaginable caveat and caution. Doing this would make the paper less accessible, and complicate what are essentially simple paper-and-pencil arguments. We believe that the revised manuscript reflects a reasonable compromise between these two opposing tensions.

Sincerely,
Bruce Allen, Ofek Birnholz, Alex Nielsen, on behalf of the LIGO and VIRGO Collaborations.

----------------------------------------------------------------

Reviewer #1:

* Finding zero crossings

Footnote 3 says "we averaged the positions of the (odd number of) adjacent zero crossings." This is not clear, as the sentence without the parenthetical statement implies something different from what is implied by the parenthetical statement.
Does it mean that to find the position of zero crossing i, the average was taken of the positions of zero crossings i-1, i, and i+1? If so, I would suggest saying something like "we averaged the positions of the zero crossing in question and its two neighboring zero crossings." For which zero crossings was this averaging necessary?

We clarified this in footnote 3 and the first paragraph of section 2: we averaged 5 zero-crossing positions for the data point at t~0.35s. Averaging was not required for any other data points.

* Nonlinear effects

"The merger of two black holes could have included highly nonlinear effects, making any Newtonian analysis wholly unreliable for the late evolution. However, solutions of Einstein's equations using numerical relativity (NR) [10–12] have shown that this does not occur."

This is sloppy. The cited papers show that PN disagrees to some significant extent with the numerical results (even when using higher orders than 0PN), and it is clear that the merger does include "highly" nonlinear effects. From a PN perspective, the "late evolution" is actually singular, leading to infinite metric perturbations. Numerical simulations show that nonlinearities take over before the singularity is reached, meaning that PN cannot model the end very well. The authors have overextended here. I suggest rephrasing this very carefully, to allow that nonlinear effects are important, but to explain that PN still gives a good qualitative representation of the data until quite late in the inspiral.

We rephrased as suggested, and rearranged paragraphs 4 & 5 of the Introduction for clarity.

* Reason for frequency doubling

The authors say that "the quadrupole moment [of gravitational radiation] is invariant under reflection about the center of mass," which they say "implies that the gravitational wave must be radiated at a frequency that is twice the orbital frequency." There are a few problems with this.
1) I cannot find a similar statement in the cited reference. Perhaps the authors only meant that reference to support the claim that the radiation is quadrupolar at leading order. If so, the citation should be moved closer to that statement. It is also possible that Landau and Lifshitz do actually make a similar statement, but I cannot find it because the citation is just to the book as a whole. Generally one should provide page numbers when citing a book for a particular fact.

2) Reasonable interpretations of the statement are false. It is not clear precisely what is meant by "the quadrupole moment", but if one takes the 2,2 mode, one may note that while it is maximum along the orbital axis, it is actually zero in the opposite direction. Obviously, we would need to also use the 2,-2 mode, but even then it is not clear how to evaluate the field at the reflected point when the field depends on the orientation of the detector. If one instead uses a quadrupolar tensor, how are the components found? How does one compare tensors at two different points on a manifold? One could speak of the *amplitude* of the field, but this is not what is measured by a detector. The point is moot, of course, for the following reason.

3) Most importantly, the conclusion would not actually follow logically from any knowledge of the field's behavior under reflection about the center of mass. An observer at a fixed sky location relative to the binary does not soon measure the field at a point reflected about the center of mass unless the observer is on the equatorial axis, which is presumably not relevant to the case of GW150914. Instead, the authors presumably intend to point out that the quadrupolar components are invariant under rotation by \pi about the orbital axis, since the waveform modes vary as e^{-i m \phi} and the quadrupolar components are m=+/-2. (The authors will need to find a suitable and specific reference for this fact.)
An observer at a fixed sky location will measure the field at one instant, the binary will have rotated by \pi a moment later, and the observer will measure (roughly) the same field at that instant. This is why the dominant GW frequency is roughly twice the orbital frequency. Of course, it should also be noted that this is only roughly true, because the frequencies (and amplitudes) are time dependent, and the "amplitude" itself is actually a complex quantity. The authors should simply say that the dominant time dependence is due to rotation about the orbital axis.

We rephrased the entire paragraph (now paragraph four of page 3) to explain the origin of the quadrupolar nature of the radiation, relating to rotation by \pi about the orbital axis as the relevant symmetry. Note that for a fixed observer, the rotation by \pi symmetry implies an identical h_{ij} by Eq. (4). We also added a reference to Appendix A (which demonstrates where "2\omega" comes from) and to the exact pages in Landau & Lifshitz. We did not take up the spin-two form of the argument as suggested by the referee because this paper assumes minimal (undergraduate) knowledge.

* Determining the mass scale

In Sec. 2, the authors say that "a value for the chirp mass can be determined directly from the observational data, by plotting the frequency and frequency derivative of the gravitational waves as a function of time," and say that "the implied chirp mass value ... remains constant to within 25%." This is an important claim because, as the authors write, "The fact that the chirp mass remains approximately constant ... is strong support for the orbital interpretation." Unfortunately, I see no evidence for this claim. The authors do plot the frequency (^-8/3) in Fig. 2, but these values are very noisy. How do the authors estimate the frequency derivative, df/dt? Is it estimated directly from the data? If so, how do they do it, and how reliable are the results? If not, are they using Eq.
(7) in some sort of circular reasoning? Are they using the slopes of the lines shown in Fig. 2 somehow? If so, they are using a constant value for df/dt, which is obviously wrong, and would seem to argue that the orbital interpretation is wrong. It seems like the most convincing evidence for this important claim would be to find the values of f and df/dt at various times (and explain how those values are found), and then plot the right-hand side of Eq. (7). This would be an interesting plot. However, I suspect that the data is simply too noisy, meaning that this paragraph would have to be removed. Of course, the following argument is sufficient, and uses the analytical formula to avoid calculating df/dt from the data. The conclusion to be drawn, however, is slightly weaker: the data are consistent with Eq. (7).

The manuscript describes two methods for estimating the chirp mass: the first uses point-wise values for f and fdot (Eq. 7), the second uses an integral form over several cycles (Eq. 8). The second method uses only t and f=1/(2\Delta t) (estimated from zero crossings, as described in the manuscript). The first method additionally needs fdot values for the fitting. In the paragraph following Eq. (7) we have added a clarification describing how fdot was obtained for the first method: we estimated tangents to the time-frequency curve (Fig. 2) to obtain the slopes (d ln(f)/dt = fdot/f). While we don't give further details in the manuscript, we have provided a plot for the referee to show how this works. In this plot we used 3 points (f = 45, 64, 128). The chirp mass Mc estimates agreed to within ~35% (32, 41, 30), rather than the previously quoted 25%. We have not added this explicit plot with the tangents to the manuscript, so this is left as an exercise for the reader.

* "Proving" compactness

The title of Sec. 3 begins with "proving compactness". Proof is a delicate topic, and this section surely does not meet the criteria.
I suggest using "evidence for" in place of "proving".

We changed "proving" to "evidence" in the title of Sec 3.

Similarly, "the possibility of non-compactness is thus refuted" is too strong, because "to refute" means to disprove. This should be replaced by something like "non-compactness is inconsistent with our model of the data."

We changed "refuted" to "inconsistent" in footnote 4.

* Defining compactness

The rest of the paper uses the compactness parameter frequently, but Sec. 3 never establishes the scale of interesting compactness. What would the compactness be for some types of star, white dwarfs, and neutron stars just as they touch or overflow their Roche lobes? What if one component is a black hole, but the other is not? At what compactness can we conclude that we have two black holes? This is one of the most interesting questions about GW150914, but the reader is left with no better idea about the evidence for two black holes after reading this paper.

We've added several examples illustrating the compactness of binary systems to the last two paragraphs of Sec 3 (Mercury, HM Cancri, Sgr A*, Cyg X-1). All are orders of magnitude less compact than GW150914. A discussion of why such highly compact objects should be black holes appears in Appendices B and C.

* Orbital eccentricity

The authors say in Sec. 4.1 that the correction due to orbital eccentricity can be neglected because orbits circularize faster than they merge. That's not a very strong argument, because the orbit may have started out very highly eccentric or may have been disturbed in the recent past (an event which might have initiated the merger, for example). It would be better to say that we *don't expect* the orbit to have such high eccentricity, and so we neglect the correction. There is no positive evidence presented in this paper we *can* neglect the correction; it's just what we do, and we find that we can tell a consistent story.
If the authors have actual evidence that the system is not eccentric, it would be great if they could include it in this paper. Otherwise, they need to make clear that the assumption of low eccentricity is just that, however well motivated it may be.

We've expanded this subsection (4.1). In the second paragraph of 4.2 we describe the modulation that would appear in a signal from a highly eccentric system. There is no such modulation observed in the data. In the third paragraph we also explain that this is to be expected because the GW emission quickly circularizes the orbit.

* Effect of spin

In Sec. 4.3, the authors discuss the effect of spin solely in terms of the change in the horizon radius, while still assuming Keplerian dependence of separation on frequency. But surely extremal Kerr black holes would alter that relationship. How does spin affect Kepler's equation?

Sec 4.3 (the sentence after Eq. 12) specifically says it defers the effects spins have on the orbital dynamics to Sec 4.4. Sec 4.4 then explains that the spin effects on the dynamics are suppressed by the post-Newtonian parameter, and so only significant in the already-compactly-close regime.

* PN order

The authors say that "Strictly speaking, x = 0 corresponds to the 0PN approximation." That's not precise enough to be true. Similar statements may be true. For example: the 0PN approximation is only precisely correct at x=0. Or: if x=0 then all PN approximations agree. Alternatively: x^0 corresponds to 0PN. The authors should reword this statement to be clear and true.

We reworded the first paragraph of Sec 4.4 as suggested.

* Validity of PN

"As Newtonian dynamics holds when x is small, the Newtonian approximation is valid down to compactness R of order of a few." How does this follow? If x~2/small. Why should 2/small be a few? The physicist's standard value of "small" is 0.1 or less, in which case the approximation would only be "valid" for R>~20. Surely this is not "a few".
Even if I plug in the frequency from Eq. (3), I get x=0.28, which implies R~7. Does 7 count as "a few"? Perhaps the discrepancy suggests that the approximation is no longer "valid". And what does "valid" mean, anyway?

Thank you for spotting this! There was indeed a typo in the second paragraph of Sec 4.4 in the x to R relation, which we have corrected to x~1/(2R). For x~0.28 this gives R~1.8 (we used x~0.3, and thus quoted R~1.7), which certainly qualifies as "a few"; taking 0.1 as the definition of "small" gives R~5, which we think also justifies the language "down to a few".

* Proof by contradiction

"Reductio ad absurdum then shows that the orbit must be compact: if one assumes that the orbit is non-compact, then the Newtonian approximation is fully valid and leads to the conclusion that the orbit is compact."

If one bandies about Latin terms for logical arguments, one must be prepared to play by the meticulous rules of logic. In particular, one must state the premises and apply the transformation rules of propositional calculus correctly. Unfortunately, the authors failed to state their premise completely, then failed to correctly negate that premise, and thus fell victim to the fallacy of false dichotomy. Here is a non-exhaustive list of premises required to conclude via the Newtonian approximation that the system is compact:

1) The features identified in the data by the authors are real, and not caused or substantially altered by noise.
2) Those features *prove* that the source is a binary system, as opposed to a single or triple or some exotic source.
3) Those features *prove* that the binary is not highly eccentric.
4) Those features *prove* that the spin is not important.
5) The binary is non-compact during the observation interval.
6) General relativity (and by implication, the Newtonian model for non-compact systems) is a good model for physics.
A proof by contradiction simply proves the negation of the conjunction of the premises, or equivalently proves the disjunction of the negation of each premise. For example, this might show that the features identified in the data are not real, or are caused by noise; or that the source is not a binary system; or that the binary is highly eccentric; or that spin is important; or that the binary is compact; or that we really don't understand how such a binary should evolve. No one of these statements can be selected as the single item that is proven by the contradiction without *proof* of all the other premises.

Now, the authors might reply, for example, that in Sec. 4.3 they argued that spin is not important, and thus the stipulation of small spin can be removed. First of all, there's some circularity to this argument: assuming the Newtonian approximation is valid (which is true only if the system is non-compact), we "prove" that the spin is not important, which allows us to "prove" that the system is compact. Second, the model space is multi-dimensional, and the authors have only investigated a single dimension at a time; perhaps a combination of factors would lead to different conclusions. In any case, they have not *proven* any of this, and physics is a messy affair of multiple lines of (sometimes contradictory) evidence. If they wish to approach the level of rigor in which a proof by contradiction is possible, they must not be so quick to dismiss possibilities.

The cure for these problems is fairly simple: the authors should avoid pretensions to proof. They should simply say that the contradiction implies that either the system is indeed compact, or some other element of their model is substantially wrong---or both. But if one is persuaded that the model is reliable, one should believe that the system is compact.

The contentious paragraph (2nd paragraph of 4.4) is important to the logic of the paper and we do not want to water it down too much.
However, we have softened the language a bit and reworded to make it clear that the validity of Newtonian mechanics and the validity of our data analysis are assumptions. We do not use the term "prove" but rather "leads to the conclusion", and removed the Latin. We do maintain that in any argument, assumptions must be made; for example a student learning Newtonian mechanics "proves" that the external gravitational field of a spherically symmetric object is unchanged if all the mass is concentrated at the center. This result is not stated in the form, "If Newton's laws of gravity are correct, then...". Similarly here, we are discussing experimental observational data, and are obviously assuming that our data set is valid, that Einstein's theory of general relativity is correct, and so on. So it's clear from context that our "proof" is contingent; a complete list of assumptions is not required or appropriate here.

* [I should note that the end of Sec. 4.4 contains one use of the word "refute" to which I will not object. The authors say that an argument does not refute their conclusions, which is true because there is no refutation in science.]

Thank you. There are now no other uses of "refute" in the text, nor of "proof"/"prove".

* Plunge trajectory

What is a plunge, when might it begin, and what drives the changes in frequency when it does?

In paragraph 1 of Sec 4.5 we have added a short discussion of the innermost stable circular orbit, contrasting the energy-loss-driven inspiral for orbits outside it with the plunge trajectories inside it. We also reference the pages in the Misner, Thorne & Wheeler textbook (MTW).

* Strain at 100km

"the strain can be at most h ~ 1, at a radius of the order of the Schwarzschild radius of the system R ~ 100 km." Why should that be the case and/or relevant? How well-defined is strain at such a dynamic place? "the amplitude decreases as h ~ R/d_L" Why should this be the case?
Surely the 1/d_L behavior can only reliably be said to start in the wave zone, R ~ c/f ~ 2000 km, which would lead to a distance bound of d_L <~ 65 Gpc, which is not very interesting given the size of the universe. I don't believe this entire paragraph; I suggest leaving it out.

This argument (given in the first paragraph containing equation 18) is correct, and entirely in the spirit of the order-of-magnitude arguments given in this paper. Yes, we could of course formulate the argument in the wave zone of small linearized perturbations, say starting at a distance R~2000km, with h a small perturbation h~0.1 (the suggested acceptable "small"). Or at a distance of 2,000,000km, with h~0.0001. But the conclusion is the same, because the amplitude decreases in proportion to the inverse of the luminosity distance, 1/d_L (which certainly holds in the wave zone)! The resulting upper bound would not change. So we are giving the argument in its simplest and most direct form. In the following paragraphs, we obtain a much more accurate distance estimate. This more accurate estimate shows that the luminosity distance d_L is about an order of magnitude larger than given by the first cruder estimate. This demonstrates that h is only about ~0.1 in the system zone. So bounding h<~1 in the system zone is conservative and justified in this case. To summarize, we do not give a wave-zone calculation, because (a) it does not lead to a different conclusion, and (b) we immediately give a more accurate estimate.

* Universal peak luminosity

The peak luminosity scales roughly with the square of the mass ratio, but the mass ratio varies from 1/4 to 0. Yet the authors say that the given luminosity is universal. Would it be better to say that the luminosity does not change in order of magnitude over some relevant range of mass ratios?

Yes. We reworded this to make it clear that we are discussing cases where the masses are equal or near-equal (3rd paragraph of Sec. 5).
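The scaling in the strain-at-100-km response above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the calculation in the manuscript: it assumes h falls off as 1/d_L from a reference radius, and it assumes an observed peak strain of about 1e-21 (the name `H_OBS` and the helper `distance_bound_gpc` are illustrative and do not appear in the paper).

```python
# Order-of-magnitude distance bound from h ~ h_ref * R_ref / d_L.
# ASSUMPTION: observed peak strain ~1e-21; not taken from this letter.
H_OBS = 1e-21
KM_PER_GPC = 3.0857e22  # 1 Gpc = 3.0857e25 m = 3.0857e22 km


def distance_bound_gpc(r_ref_km, h_ref, h_obs=H_OBS):
    """Upper bound on d_L (in Gpc) if the strain is h_ref at radius r_ref_km."""
    return r_ref_km * h_ref / h_obs / KM_PER_GPC


# The three reference points mentioned in the response.
bounds = [
    distance_bound_gpc(100.0, 1.0),   # h ~ 1 at R ~ 100 km
    distance_bound_gpc(2000.0, 0.1),  # h ~ 0.1 at R ~ 2000 km
    distance_bound_gpc(2.0e6, 1e-4),  # h ~ 0.0001 at R ~ 2,000,000 km
]

# The referee's variant: h ~ 1 placed at the wave-zone radius R ~ 2000 km.
referee_bound = distance_bound_gpc(2000.0, 1.0)

for b in bounds:
    print(f"d_L <~ {b:.1f} Gpc")
print(f"referee's variant: d_L <~ {referee_bound:.0f} Gpc")
```

Under these assumptions the three reference points give bounds that agree to within a factor of about two, because the product h times R is nearly the same in each case; only the choice h ~ 1 at 2000 km reproduces the referee's figure of roughly 65 Gpc.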
* Emitted energy and power

"During the peak of its emission, GW150914 emitted about 23 orders of magnitude more power than this, in the form of gravitational waves." Is this meant to be calculable from the previous paragraph and a half? If so, there's been some sleight of hand, involving translation from energy to power. The conclusion of Eq. (21) estimates the total energy output by the GW150914 system during its lifetime. How is this converted to power? By multiplying the ~300x greater energy from GW150914 by the 3e17 sec in ten billion years, and dividing by ~0.001 sec? Where could that last number come from? The authors might argue that most of the energy is emitted in the final instants of the merger due to the dot and square on h appearing in Eq. (19), but I would only believe this comprises ~0.01 sec, judging by Fig. 1. The numbers need to be checked. And in any case, this should be explained more clearly, because it would be an interesting fact.

The text in the final paragraph of Sec. 5 was amended to clearly distinguish between total energy (~300 times the solar output) and power. The power was corrected to 22 orders of magnitude greater than that of the Sun, calculated from the peak luminosity 0.2*10^-3 L_Planck, which corresponds to one cycle at peak amplitude (about 6ms).

* Generating gravitational waves

Appendix A starts into some detail about the relationship between emitted waves and their source. It surprises me that the authors don't bother going into any detail about where these equations come from. It's relatively simple (see MTW, for example) to point out that Einstein's equations can be linearized, and a simplifying gauge choice can be made, so that you just get a wave equation for h_{ab}, for which the equations are basically Q_{ij}, up to time-derivatives and constant factors.

The point of this paper is to discuss the source system (and its observational properties) rather than Einstein's equation or linearizations thereof.
But we added a footnote (number 5, immediately after equation 5 in Section 2) on this derivation and the exact MTW location, for the interested reader. This is in line with keeping the Appendix as a "worked out exercise" that is accessible even to first year students.

* Minor points

There seem to be some latex issues (missing backslashes?) in the units for G and c on page 1.

Thank you, we have fixed this!

The shaded region in Fig. 5 seems a little misleading, as it suggests (without careful reading of the caption) that it actually bounds the region of possible. It would be better to place another line at e=0.8 and remove the shading. Or maybe this could be turned into a contour plot, with y axis being the eccentricity and the "z" axis being the compactness ratio.

We changed Figure 5 to a colour-graded contour plot with the "z axis" marking the compactness ratio, as suggested.

"Unto" -> "into" at the bottom of page 8.

Fixed.

"It's" -> "its" in the paragraph before Sec. 6.

Fixed.

Reviewer #2:

Major points:

* Not Cosmologically distant

If I am not mistaken, Eqs. (4) and (5) hold relative to an asymptotically flat background metric. If this is correct, then these equations already implicitly assume that the source is cosmologically nearby, and this should be mentioned right there.

We added this statement in the paragraph after equations 4 & 5.

* Kepler's law holds even in GR (for test masses)

Following Eq. (6), the authors state that Newton's laws of motion and of gravity were used to derive Eq. (7). I am not quite sure about this statement since Kepler's third law continues to hold for circular orbits in Schwarzschild geometry. As long as the two black holes can be considered sufficiently detached, not even Newton's laws need to be assumed. One might go one step further and define a modification of the reduced mass such that Kepler's third law continues to hold further beyond the Newtonian regime. Perhaps this could be further clarified.
It is true that Kepler's third law also holds for a test particle orbiting a non-spinning BH in general relativity. This indeed gives the same mass-separation-(orbital)frequency relation for (adiabatically) stationary orbits. But unfortunately we cannot give up Newtonian mechanics for this elementary presentation: concepts such as energy and its time derivative (power) are needed to describe the change in those orbits due to energy loss.

* Adiabatic orbital changes

Also, in Appendix A, it seems to me that the assumption needs to be made that the loss of energy per orbit is small compared to the orbital energy because Kepler's law is used for evaluating the energy loss. This does not harm the argument made, but should also be added for clarity.

We added a statement of this assumption at the end of the paragraph containing equation (27).

* Can near-equal masses be concluded in this paper?

The only assumption for which no strong (empirical) motivation is given is that of equal masses. It is estimated later that even the lower mass must exceed approximately 11 Solar masses, but this still allows a large range of mass ratios. It would be helpful for the reader to learn what property of the data forces the masses to be similar even if the argument could not be given quantitatively.

The mass ratio has a considerable effect on the waveform at the 1PN order, but that is unfortunately beyond the scope of this paper, which is 0PN. But we have added a comment and discussion with references about this point to the second paragraph of the conclusion.

* Minor points:

The statement "main-sequence stars have radii measured in millions of kilometers" should be rephrased in view of the fact that the Solar radius is about 695000 km.

We fixed this by making a more precise statement after Equation 9.

Given that an upper mass limit of 4.76 Solar masses is conservatively adopted later, the statement that 5 Solar masses were "significantly above the neutron star mass limit" (p.
7) seems exaggerated.

We reworded the second paragraph after Eq. 9 so it is now slightly milder, with a reference to App. B (using a limit of ~3 solar masses for neutron stars).

* Tiny points:

It seems that the reduced mass µ is used for the first time in Sect. 5 (p. 8) without definition. This should be defined.

We moved the definition of reduced mass to before equation 6.

Following Eq. (1), "rm m^3/kg s^2" and "; m/s" should be replaced by "{\rm m^3/kg s^2}" and "\; m/s".

Thank you, this has been fixed!

"During it's ten-billion-year lifetime" (p. 9, left column) should read "During its ten-billion-year lifetime".

Fixed.

"does the end" (p. 11, left column) should read "Does the end".

Fixed.

Reviewer #3:

There were no specific comments to address.
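Note on the point-wise chirp-mass estimate (response to Reviewer #1, "Determining the mass scale"): the sketch below illustrates how a chirp mass follows from a frequency f and its derivative fdot, where fdot is f times the tangent slope d ln(f)/dt. It assumes that Eq. (7) has the standard leading-order (0PN) form, and it uses synthetic fdot values generated from an assumed 30 solar-mass chirp mass, not the values read off Fig. 2. The function names and rounded constants are illustrative.

```python
import math

# Physical constants in SI units (rounded).
G = 6.674e-11     # m^3 kg^-1 s^-2
C = 2.998e8       # m/s
M_SUN = 1.989e30  # kg


def chirp_mass(f, fdot):
    """Chirp mass (kg) from GW frequency f (Hz) and its derivative fdot (Hz/s).

    ASSUMPTION: standard leading-order relation
    M_c = (c^3/G) * [(5/96) * pi^(-8/3) * f^(-11/3) * fdot]^(3/5).
    """
    bracket = (5.0 / 96.0) * math.pi ** (-8.0 / 3.0) * f ** (-11.0 / 3.0) * fdot
    return (C ** 3 / G) * bracket ** (3.0 / 5.0)


def fdot_from_chirp_mass(mc, f):
    """Inverse of chirp_mass: fdot (Hz/s) for chirp mass mc (kg) at frequency f (Hz)."""
    return (96.0 / 5.0) * math.pi ** (8.0 / 3.0) * (G * mc / C ** 3) ** (5.0 / 3.0) * f ** (11.0 / 3.0)


# Synthetic check at the three frequencies quoted in the response.
# The fdot values here are generated from an ASSUMED chirp mass, not measured.
assumed_mc = 30.0 * M_SUN
recovered = []
for f in (45.0, 64.0, 128.0):
    fdot = fdot_from_chirp_mass(assumed_mc, f)
    recovered.append(chirp_mass(f, fdot) / M_SUN)

print(recovered)
```

With real data, `fdot` would instead be obtained as f multiplied by the slope of a tangent drawn on the time-frequency curve; the scatter among the resulting chirp-mass values then reflects how well a single constant describes the data.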