Since structural modelling is sensitive to low levels of noise, haploid G1 cells were processed using stringent contact filtering to remove contacts that are more likely to be technical artefacts. We first used HiCUP (ref. 27), applying a di-tag size selection of 50 bp to 850 bp, to map di-tags and filter out common Hi-C artefacts. Putative PCR duplicates were not removed by HiCUP; instead, the filtered data was passed to a new tool (SiCUP) for further single-cell Hi-C-specific filtering. We removed reads mapping to the Y chromosome, to short restriction fragments (less than 21 bp) and to regions defined as problematic by ENCODE. We also filtered reads mapping to fragment ends that formed multiple interactions in one percent or more of the datasets. To avoid potential artefacts we removed singleton di-tags. In haploid G1 cells there is only one copy of the genome; hence, after removal of PCR duplicates, each observed fragment end should be in contact with at most one other fragment end. Consequently, multiple contacts from the same fragment were removed entirely. An exception was made when a fragment end (A) interacted with two other fragment ends (B and C) that were close together (defined here as B and C lying within 20 MboI fragments of each other). In such instances the strand orientations of the reads mapping to B and C were typically the same, to a degree not expected by chance (as assessed by a chi-squared test across the whole dataset). We reasoned that these apparently distinct interactions were in fact derived from one initial Hi-C interaction. Consequently, when this was observed, not all the di-tags were discarded. Instead, if the Hi-C interaction was in trans, a random di-tag was discarded. Alternatively, when the Hi-C interaction was in cis, the di-tag representing the shortest Hi-C interaction was retained.
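The multiple-contact rule and its exception, as described above, can be illustrated with a minimal Python sketch. This is not the SiCUP implementation: the names `FragEnd`, `resolve` and `filter_multi_contacts` are hypothetical, fragment ends are assumed to be identified by chromosome, MboI fragment index and fragment end, and the input is assumed to be already de-duplicated. Two simplifications are also assumptions: the length of a cis interaction is approximated by the difference in fragment index, and in the trans case a single randomly chosen di-tag is kept (equivalent to discarding one at random when there are exactly two partners).

```python
import random
from collections import defaultdict
from typing import NamedTuple


class FragEnd(NamedTuple):
    chrom: str
    frag: int  # index of the MboI fragment along the chromosome (assumed)
    end: int   # 0 or 1: which end of the fragment the read maps to (assumed)


def resolve(anchor, partners, max_gap=20, rng=random):
    """Return the set of partner fragment ends retained for one anchor end."""
    if len(partners) <= 1:
        return set(partners)  # at most one contact: nothing to resolve
    chroms = {p.chrom for p in partners}
    frags = [p.frag for p in partners]
    # Partners that are not close together: remove every contact.
    if len(chroms) > 1 or max(frags) - min(frags) > max_gap:
        return set()
    if chroms == {anchor.chrom}:
        # cis: retain the shortest interaction (distance in fragment units)
        return {min(partners, key=lambda p: abs(p.frag - anchor.frag))}
    # trans: retain one randomly chosen partner (assumption, see lead-in)
    return {rng.choice(sorted(partners))}


def filter_multi_contacts(contacts, max_gap=20, seed=0):
    """Keep a contact only if both of its fragment ends retain it."""
    rng = random.Random(seed)
    partners = defaultdict(set)
    for a, b in contacts:
        partners[a].add(b)
        partners[b].add(a)
    kept = {
        end: resolve(end, sorted(ps), max_gap, rng)
        for end, ps in partners.items()
    }
    return [(a, b) for a, b in contacts if b in kept[a] and a in kept[b]]
```

A contact survives only if neither of its two fragment ends rejects it, so a promiscuous fragment end removes all of its contacts unless its partners are clustered within `max_gap` fragments.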
We also filtered out unsupported contacts. For each cell, using the filtered contacts, we first derived a connectivity graph of the genome. Nodes of the graph represented 1 Mb segments of the genome, and each edge represented a single contact mapped at 1 Mb resolution, so any two nodes of the graph might be connected by more than one edge. We defined a contact as unsupported if, upon deletion of that contact, the shortest path connecting its two end nodes would be longer than 3 edges. These unsupported contacts (median 1.06% of contacts, Extended Data Fig. 10a) were removed from the sc-HiC libraries before 3D modelling.
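The unsupported-contact criterion can be illustrated with a short Python sketch using a depth-limited breadth-first search. It is not the original implementation: the function names, the `(chromosome, position)` input format and the choice to test every contact against the full graph (rather than removing contacts iteratively) are assumptions made for illustration. Contacts whose two ends fall in the same 1 Mb segment are treated here as supported, which is also an assumption.

```python
from collections import Counter, defaultdict, deque

BIN_SIZE = 1_000_000  # 1 Mb segments, as in the text


def to_bin(locus):
    """Map a (chromosome, position) locus to its 1 Mb graph node."""
    chrom, pos = locus
    return (chrom, pos // BIN_SIZE)


def is_supported(adj, multiplicity, u, v, max_len=3):
    """True if u and v stay within max_len edges after deleting one u-v edge."""
    if u == v:
        return True  # both ends in the same segment (assumed supported)
    if multiplicity[frozenset((u, v))] > 1:
        return True  # a parallel edge still connects u and v directly
    seen = {u}
    queue = deque([(u, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_len:
            continue  # do not extend paths beyond max_len edges
        for nxt in adj[node]:
            if {node, nxt} == {u, v}:
                continue  # this is the deleted contact itself
            if nxt == v:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return False


def filter_unsupported(contacts, max_len=3):
    """Split contacts into (supported, unsupported) lists, preserving order."""
    adj = defaultdict(set)
    multiplicity = Counter()
    for a, b in contacts:
        u, v = to_bin(a), to_bin(b)
        adj[u].add(v)
        adj[v].add(u)
        multiplicity[frozenset((u, v))] += 1
    kept, removed = [], []
    for contact in contacts:
        u, v = to_bin(contact[0]), to_bin(contact[1])
        target = kept if is_supported(adj, multiplicity, u, v, max_len) else removed
        target.append(contact)
    return kept, removed
```

Under this sketch, every contact in a closed loop of four segments is supported (the alternative path has 3 edges), whereas every contact in a loop of five segments is unsupported (the alternative path has 4 edges).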