If you run tar -xzf full_data_release_tar.tgz, it will create a new directory
called data with two sub-directories.
The files in O2_population_data/reruns/ must be read with the deepdish module. You can then cast the posterior to a pandas DataFrame:
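import deepdish
from pandas import DataFrame
# Load the result file into a nested dictionary
data = deepdish.io.load('mass_c_iid_mag_two_comp_ind_tilt_result.h5')
# Convert the hyper-parameter posterior samples to a DataFrame
post = DataFrame.from_dict(data['posterior'])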
The data are separated into sub-directories corresponding to the waveform model used to obtain the individual-event posterior samples. The parameter names map to the LaTeX symbols in Table I of https://dcc.ligo.org/LIGO-P1800324 as follows:
'alpha' - \alpha
'beta' - \beta_q
'delta_m' - \delta m
'lam' - \lambda_m
'log_likelihood'
'log_prior'
'mmax' - m_{\max}
'mmin' - m_{\min}
'mpp' - \mu_{m}
'mu_chi' - E[a]
'rate' (units of Gpc^-3 yr^-1)
'sigma_1' - \sigma_1
'sigma_2' - \sigma_2
'sigma_chi' - Var[a]
'sigpp' - \sigma_{m}
'xi' - \zeta
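For plotting, it can help to relabel the DataFrame columns with the Table I symbols listed above. The following is a minimal sketch; the dictionary DEEPDISH_TO_LATEX and the helper latex_label are hypothetical names, and columns without a listed symbol ('log_likelihood', 'log_prior', 'rate') simply keep their raw names.

```python
# Hypothetical lookup table: column names in the deepdish result files
# mapped to the LaTeX symbols of Table I (LIGO-P1800324).
# 'log_likelihood', 'log_prior' and 'rate' have no Table I symbol listed,
# so they are omitted and keep their raw names.
DEEPDISH_TO_LATEX = {
    "alpha": r"$\alpha$",
    "beta": r"$\beta_q$",
    "delta_m": r"$\delta m$",
    "lam": r"$\lambda_m$",
    "mmax": r"$m_{\max}$",
    "mmin": r"$m_{\min}$",
    "mpp": r"$\mu_{m}$",
    "mu_chi": r"$E[a]$",
    "sigma_1": r"$\sigma_1$",
    "sigma_2": r"$\sigma_2$",
    "sigma_chi": r"$\mathrm{Var}[a]$",
    "sigpp": r"$\sigma_{m}$",
    "xi": r"$\zeta$",
}


def latex_label(column):
    """Return the Table I symbol for a column, or the raw name if none is listed."""
    return DEEPDISH_TO_LATEX.get(column, column)
```

With the DataFrame from the snippet above, post.rename(columns=latex_label) returns a copy with relabeled columns, since pandas accepts a function for the columns argument.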
The files in o2_pop_data_rit can be read using h5py. This directory is split into ppd_samples and source_data, and sub-directories with the same name in each of these contain the same hyper-parameter samples. For example, modelA_nonsingularspins_alignedspinVT_quadratic_IMRP_all uses mass model A from the paper, includes nonsingular spin configurations, calculates the sensitive spacetime volume including the effects of aligned spins, and uses the IMRPhenomPv2 posterior samples for all individual events. The meaning of the other directory names can be inferred analogously; more details are provided in the paper.
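Because matching sub-directory names in ppd_samples and source_data hold the same hyper-parameter samples, it can be handy to list the runs available in both. A minimal sketch, assuming ppd_samples and source_data sit directly under o2_pop_data_rit and contain one sub-directory per run; the helper name paired_runs is hypothetical.

```python
from pathlib import Path


def paired_runs(root):
    """List run names present as sub-directories of both ppd_samples and source_data.

    Assumes the layout root/ppd_samples/<run> and root/source_data/<run>,
    where matching <run> names share the same hyper-parameter samples.
    """
    root = Path(root)
    ppd = {p.name for p in (root / "ppd_samples").iterdir() if p.is_dir()}
    src = {p.name for p in (root / "source_data").iterdir() if p.is_dir()}
    # Only runs found on both sides can be cross-checked against each other
    return sorted(ppd & src)
```

For example, paired_runs("data/o2_pop_data_rit") would list the shared run names, assuming the archive was extracted in the current directory.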
The ppd_samples.hdf5 files store both the hyper-parameter posteriors and samples from the PPDs. We recommend using the hyper-parameter posteriors from these files rather than from the posteriors_cleaned.hdf5 files, as they are easier to access. The posteriors_cleaned.hdf5 files in the source_data directory contain the hyper-parameter posterior samples as well as additional information on the setup of the runs. The "constants" key indicates which parameters were held fixed during sampling, and can be accessed as:
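import h5py
import numpy as np
data = h5py.File('posteriors_cleaned.hdf5','r')  # open read-only
# Attributes stored under 'constants': parameters held fixed during sampling
constants = dict(data['constants'].attrs)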
The variable names, in the order in which they appear in np.array(data['pos']), can be obtained with data.attrs["variable_names"].decode("utf-8").split(",")
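Putting these pieces together, a posteriors_cleaned.hdf5 file can be turned into named sample arrays. This is a sketch under assumptions not stated in this README: that 'pos' is a 2-D array with one variable per column (rows are tried if the columns do not match the names), and that the variable_names attribute may come back as either bytes or str depending on how it was written and on the h5py version. The helper names decode_names, split_columns and load_cleaned are hypothetical.

```python
import numpy as np


def decode_names(raw):
    """Split the comma-separated 'variable_names' attribute into a list.

    The attribute may be returned as bytes (needing .decode) or already as str,
    so both cases are handled. numpy.bytes_ is a bytes subclass and is covered too.
    """
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return [name.strip() for name in str(raw).split(",")]


def split_columns(pos, names):
    """Map each variable name to its samples.

    Assumes 'pos' is 2-D with one variable per column; if instead it has one
    variable per row, it is transposed first.
    """
    pos = np.asarray(pos)
    if pos.ndim != 2:
        raise ValueError("expected a 2-D array of samples")
    if pos.shape[1] != len(names) and pos.shape[0] == len(names):
        pos = pos.T
    if pos.shape[1] != len(names):
        raise ValueError("number of names does not match the samples array")
    return {name: pos[:, i] for i, name in enumerate(names)}


def load_cleaned(path):
    """Read the constants and the named posterior samples from posteriors_cleaned.hdf5."""
    import h5py  # imported here so the helpers above can be used without h5py

    with h5py.File(path, "r") as data:
        constants = dict(data["constants"].attrs)
        names = decode_names(data.attrs["variable_names"])
        samples = split_columns(data["pos"][()], names)
    return constants, samples
```

The returned samples dictionary can be passed directly to pandas.DataFrame if a table is preferred; using a with block closes the file once the arrays have been read into memory.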
Both the ppd_samples.hdf5 and the posteriors_cleaned.hdf5 files use the following parameter names, listed with the corresponding symbols from Table I of the paper:
'E_chi1', 'E_chi2' - E[a_1] and E[a_2], assuming a_1 and a_2 are drawn from independent Beta distributions
'Var_chi1', 'Var_chi2' - Var[a_1] and Var[a_2], assuming a_1 and a_2 are drawn from independent Beta distributions
'alpha_m' - \alpha
'beta_q' - \beta_q
'log10_rate' - log_{10} of the rate in units of Gpc^-3 yr^-1
'm_max' - m_{\max}
'm_min' - m_{\min}
'mu_cos1' - mean of the cos_tilt_1 distribution, fixed to 1 (check "constants")
'mu_cos2' - mean of the cos_tilt_2 distribution, fixed to 1 (check "constants")
'sigma_cos1' - \sigma_1
'sigma_cos2' - \sigma_2
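Several parameters in this list share a Table I symbol with the deepdish parameters listed earlier, which makes it possible to put these samples under the deepdish column names for side-by-side comparison. A minimal sketch; the mapping RIT_TO_DEEPDISH and the helper to_deepdish_names are hypothetical and are derived only from the shared symbols. Sharing a symbol does not by itself guarantee identical model assumptions or priors, so compare with care.

```python
# Hypothetical cross-reference built only from parameters that map to the
# same Table I symbol in both lists. The spin-magnitude parameters are left
# out on purpose: E_chi1/E_chi2 and Var_chi1/Var_chi2 describe independent
# per-component Beta distributions, whereas mu_chi/sigma_chi describe a single
# shared one, so they are not interchangeable.
RIT_TO_DEEPDISH = {
    "alpha_m": "alpha",
    "beta_q": "beta",
    "m_max": "mmax",
    "m_min": "mmin",
    "sigma_cos1": "sigma_1",
    "sigma_cos2": "sigma_2",
}


def to_deepdish_names(sample):
    """Rename one hyper-parameter sample (a dict) to the deepdish column names.

    'log10_rate' is converted to a linear 'rate' in Gpc^-3 yr^-1; parameters
    without a counterpart (e.g. 'E_chi1', 'mu_cos1') are dropped.
    """
    out = {RIT_TO_DEEPDISH[k]: v for k, v in sample.items() if k in RIT_TO_DEEPDISH}
    if "log10_rate" in sample:
        out["rate"] = 10.0 ** sample["log10_rate"]
    return out
```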
We note that the spin magnitude distribution used in these files is not described in detail in the paper. Instead of assuming that the two component spin magnitudes are drawn from the same Beta distribution, these fits model them with independent Beta distributions characterized by (E[a_1], Var[a_1]) and (E[a_2], Var[a_2]).
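For readers who want the shape parameters of these Beta distributions, the standard moment relations can be inverted. For Beta(alpha, beta), E = alpha / (alpha + beta) and Var = E(1 - E) / (alpha + beta + 1), so alpha + beta = E(1 - E) / Var - 1. The sketch below applies this to one (mean, variance) pair; the function name beta_params_from_moments is hypothetical, and it would be applied separately to (E_chi1, Var_chi1) and (E_chi2, Var_chi2) for each sample.

```python
def beta_params_from_moments(mean, var):
    """Convert (E[a], Var[a]) of a Beta distribution into its shape parameters.

    Uses E = alpha / (alpha + beta) and Var = E * (1 - E) / (alpha + beta + 1),
    which give alpha + beta = E * (1 - E) / Var - 1.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie in (0, 1)")
    # A valid Beta distribution requires Var < E * (1 - E)
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must lie in (0, mean * (1 - mean))")
    total = mean * (1.0 - mean) / var - 1.0
    return mean * total, (1.0 - mean) * total
```

As a check, Beta(2, 3) has mean 0.4 and variance 0.04, and beta_params_from_moments(0.4, 0.04) recovers approximately (2, 3).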