Publications

Mielczarek O, Rogers CH, Zhan Y, Matheson LS, Stubbington MJT, Schoenfelder S, Bolland DJ, Javierre BM, Wingett SW, Várnai C, Segonds-Pichon A, Conn SJ, Krueger F, Andrews S, Fraser P, Giorgetti L, Corcoran AE Immunology, Bioinformatics

To produce a diverse antibody repertoire, immunoglobulin heavy-chain (Igh) loci undergo large-scale alterations in structure to facilitate juxtaposition and recombination of spatially separated variable (V), diversity (D), and joining (J) genes. These chromosomal alterations are poorly understood. Uncovering their patterns shows how chromosome dynamics underpins antibody diversity. Using tiled Capture Hi-C, we produce a comprehensive map of chromatin interactions throughout the 2.8-Mb Igh locus in progenitor B cells. We find that the Igh locus folds into semi-rigid subdomains and undergoes flexible looping of the V genes to its 3' end, reconciling two views of locus organization. Deconvolution of single Igh locus conformations using polymer simulations identifies thousands of different structures. This heterogeneity may underpin the diversity of V(D)J recombination events. All three immunoglobulin loci also participate in a highly specific, developmentally regulated network of interchromosomal interactions with genes encoding B cell-lineage factors. This suggests a model of interchromosomal coordination of B cell development.

Cell reports, PMID: 37676766 06 Sep 2023

Richer S, Tian Y, Schoenfelder S, Hurst L, Murrell A, Pisignano G Epigenetics

There is widespread interest in the three-dimensional chromatin conformation of the genome and its impact on gene expression. However, these studies frequently do not consider parent-of-origin differences, such as genomic imprinting, which result in monoallelic expression. In addition, genome-wide allele-specific chromatin conformation associations have not been extensively explored. There are few accessible bioinformatic workflows for investigating allelic conformation differences and these require pre-phased haplotypes which are not widely available.

Genome biology, PMID: 36869353 03 Mar 2023

Orozco G, Schoenfelder S, Walker N, Eyre S, Fraser P Epigenetics

Genome sequencing has revealed over 300 million genetic variations in human populations. Over 90% of variants are single nucleotide polymorphisms (SNPs); the remainder include short deletions or insertions, and small numbers of structural variants. Hundreds of thousands of these variants have been associated with specific phenotypic traits and diseases through genome-wide association studies, which link significant differences in variant frequencies with specific phenotypes among large groups of individuals. Only 5% of disease-associated SNPs are located in gene coding sequences, with the potential to disrupt gene expression or alter the function of encoded proteins. The remaining 95% of disease-associated SNPs are located in non-coding DNA sequences, which make up 98% of the genome. The role of non-coding, disease-associated SNPs, many of which are located at considerable distances from any gene, was at first a mystery until the discovery that gene promoters regularly interact with distal regulatory elements to control gene expression. Disease-associated SNPs are enriched at the millions of gene regulatory elements that are dispersed throughout the non-coding sequences of the genome, suggesting they function as gene regulation variants. Assigning specific regulatory elements to the genes they control is not straightforward since they can be millions of base pairs apart. In this review we describe how understanding 3D genome organization can identify specific interactions between gene promoters and distal regulatory elements and how 3D genomics can link disease-associated SNPs to their target genes. Understanding which gene or genes contribute to a specific disease is the first step in designing rational therapeutic interventions.

Frontiers in cell and developmental biology, PMID: 36340032 2022

Ridnik M, Schoenfelder S, Gonen N Epigenetics

Sex determination is the process by which an initial bipotential gonad adopts either a testicular or ovarian cell fate. The inability to properly complete this process leads to a group of developmental disorders classified as disorders of sex development (DSD). To date, dozens of genes have been shown to play roles in mammalian sex determination, and mutations in these genes can cause DSD in humans or gonadal sex reversal/dysfunction in mice. However, exome sequencing currently provides a genetic diagnosis for fewer than half of DSD patients. This points towards a major role for the non-coding genome during sex determination. In this review, we highlight recent advances in our understanding of non-coding, cis-acting gene regulatory elements and discuss how they may control transcriptional programmes that underpin sex determination in the context of the 3-dimensional folding of chromatin. As a paradigm, we focus on the Sox9 gene, a prominent pro-male factor and one of the most extensively studied genes in gonadal cell fate determination.

Sexual development, PMID: 34710870 28 Oct 2021

Groves IJ, Drane ELA, Michalski M, Monahan JM, Scarpini CG, Smith SP, Bussotti G, Várnai C, Schoenfelder S, Fraser P, Enright AJ, Coleman N Epigenetics

Development of cervical cancer is directly associated with integration of human papillomavirus (HPV) genomes into host chromosomes and subsequent modulation of HPV oncogene expression, which correlates with multi-layered epigenetic changes at the integrated HPV genomes. However, the process of integration itself and dysregulation of host gene expression at sites of integration in our model of HPV16 integrant clone natural selection has remained enigmatic. We now show, using a state-of-the-art 'HPV integrated site capture' (HISC) technique, that integration likely occurs through microhomology-mediated repair (MHMR) mechanisms via either a direct process, resulting in host sequence deletion (in our case, partially homozygously) or via a 'looping' mechanism by which flanking host regions become amplified. Furthermore, using our 'HPV16-specific Region Capture Hi-C' technique, we have determined that chromatin interactions between the integrated virus genome and host chromosomes, both at short- (<500 kbp) and long-range (>500 kbp), appear to drive local host gene dysregulation through the disruption of host:host interactions within (but not exceeding) host structures known as topologically associating domains (TADs). This mechanism of HPV-induced host gene expression modulation indicates that integration of virus genomes near to or within a 'cancer-causing gene' is not essential to influence their expression and that these modifications to genome interactions could have a major role in selection of HPV integrants at the early stage of cervical neoplastic progression.

PLoS pathogens, PMID: 34432858 25 Aug 2021

Chovanec P, Collier AJ, Krueger C, Várnai C, Semprich CI, Schoenfelder S, Corcoran AE, Rugg-Gunn PJ Epigenetics

The transition from naive to primed pluripotency is accompanied by an extensive reorganisation of transcriptional and epigenetic programmes. However, the role of transcriptional enhancers and three-dimensional chromatin organisation in coordinating these developmental programmes remains incompletely understood. Here, we generate a high-resolution atlas of gene regulatory interactions, chromatin profiles and transcription factor occupancy in naive and primed human pluripotent stem cells, and develop a network-graph approach to examine the atlas at multiple spatial scales. We uncover highly connected promoter hubs that change substantially in interaction frequency and in transcriptional co-regulation between pluripotent states. Small hubs frequently merge to form larger networks in primed cells, often linked by newly-formed Polycomb-associated interactions. We identify widespread state-specific differences in enhancer activity and interactivity that correspond with an extensive reconfiguration of OCT4, SOX2 and NANOG binding and target gene expression. These findings provide multilayered insights into the chromatin-based gene regulatory control of human pluripotent states.

Nature communications, PMID: 33828098 07 Apr 2021
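The idea of promoter hubs merging into larger networks can be illustrated with a small graph sketch. This is not the authors' network-graph approach; it only assumes that interactions are given as pairs of element names and treats each connected component as one hub. The names `connected_hubs`, `P1` to `P4` and the example edges are hypothetical.

```python
from collections import defaultdict


def connected_hubs(interactions):
    """Group interacting elements into hubs (connected components).

    interactions: iterable of (element_a, element_b) pairs.
    Returns a list of sorted member lists, largest hub first.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in interactions:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two hubs

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))


# Hypothetical example: one extra contact joins two small hubs.
naive_state = [("P1", "P2"), ("P3", "P4")]
primed_state = naive_state + [("P2", "P3")]
print(connected_hubs(naive_state))
print(connected_hubs(primed_state))
```

In this toy input, the single additional contact between P2 and P3 is enough to fuse two two-member hubs into one four-member network.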

Olan I, Parry AJ, Schoenfelder S, Narita M, Ito Y, Chan ASL, Slater GSC, Bihary D, Bando M, Shirahige K, Kimura H, Samarajiwa SA, Fraser P, Narita M Epigenetics

Senescence is a state of stable proliferative arrest, generally accompanied by the senescence-associated secretory phenotype, which modulates tissue homeostasis. Enhancer-promoter interactions, facilitated by chromatin loops, play a key role in gene regulation but their relevance in senescence remains elusive. Here, we use Hi-C to show that oncogenic RAS-induced senescence in human diploid fibroblasts is accompanied by extensive enhancer-promoter rewiring, which is closely connected with dynamic cohesin binding to the genome. We find de novo cohesin peaks often at the 3' end of a subset of active genes. RAS-induced de novo cohesin peaks are transcription-dependent and enriched for senescence-associated genes, exemplified by IL1B, where de novo cohesin binding is involved in new loop formation. Similar IL1B induction with de novo cohesin appearance and new loop formation are observed in terminally differentiated macrophages, but not TNFα-treated cells. These results suggest that RAS-induced senescence represents a cell fate determination-like process characterised by a unique gene expression profile and 3D genome folding signature, mediated in part through cohesin redistribution on chromatin.

Nature communications, PMID: 33247104 27 Nov 2020

Bevan S, Schoenfelder S, Young RJ, Zhang L, Andrews S, Fraser P, O'Callaghan PM Epigenetics, Bioinformatics

Chinese hamster ovary (CHO) cell lines are the pillars of a multi-billion dollar biopharmaceutical industry producing recombinant therapeutic proteins. The effects of local chromatin organisation and epigenetic repression within these cell lines result in unpredictable and unstable transgene expression following random integration. Limited knowledge of the CHO genome and its higher-order chromatin organisation has thus far impeded functional genomics approaches required to tackle these issues. Here, we present an integrative three-dimensional (3D) map of genome organisation within the CHOK1SV® 10E9 cell line in conjunction with an improved, less fragmented CHOK1SV® 10E9 genome assembly. Using our high-resolution chromatin conformation datasets, we have assigned ≈ 90% of sequence to a chromosome-scale genome assembly. Our genome-wide 3D map identifies higher-order chromatin structures such as topologically associated domains, incorporates our chromatin accessibility data to enhance the identification of active cis-regulatory elements and importantly links these cis-regulatory elements to target promoters in a 3D promoter interactome. We demonstrate the power of our improved functional annotation by evaluating the 3D landscape of a transgene integration site and two phenotypically different cell lines. Our work opens up further novel genome engineering targets, has the potential to inform vital improvements for industrial biotherapeutic production, and represents a significant advancement for CHO cell line development.

Biotechnology and bioengineering, PMID: 33095445 23 Oct 2020

Thiecke MJ, Wutz G, Muhar M, Tang W, Bevan S, Malysheva V, Stocsits R, Neumann T, Zuber J, Fraser P, Schoenfelder S, Peters JM, Spivakov M Epigenetics

It is currently assumed that 3D chromosomal organization plays a central role in transcriptional control. However, depletion of cohesin and CTCF affects the steady-state levels of only a minority of transcripts. Here, we use high-resolution Capture Hi-C to interrogate the dynamics of chromosomal contacts of all annotated human gene promoters upon degradation of cohesin and CTCF. We show that a majority of promoter-anchored contacts are lost in these conditions, but many contacts with distinct properties are maintained, and some new ones are gained. The rewiring of contacts between promoters and active enhancers upon cohesin degradation associates with rapid changes in target gene transcription as detected by SLAM sequencing (SLAM-seq). These results provide a mechanistic explanation for the limited, but consistent, effects of cohesin and CTCF depletion on steady-state transcription and suggest the existence of both cohesin-dependent and -independent mechanisms of enhancer-promoter pairing.

Cell reports, PMID: 32698000 21 Jul 2020

Wutz G, Ladurner R, St Hilaire BG, Stocsits RR, Nagasaka K, Pignard B, Sanborn A, Tang W, Várnai C, Ivanov MP, Schoenfelder S, van der Lelij P, Huang X, Dürnberger G, Roitinger E, Mechtler K, Davidson IF, Fraser PJ, Lieberman-Aiden E, Peters JM Epigenetics

Eukaryotic genomes are folded into loops. It is thought that these are formed by cohesin complexes via extrusion, either until loop expansion is arrested by CTCF or until cohesin is removed from DNA by WAPL. Although WAPL limits cohesin's chromatin residence time to minutes, it has been reported that some loops exist for hours. How these loops can persist is unknown. We show that during G1-phase, mammalian cells contain acetylated cohesin-STAG1 which binds chromatin for hours, whereas cohesin-STAG2 binds chromatin for minutes. Our results indicate that CTCF and the acetyltransferase ESCO1 protect a subset of cohesin-STAG1 complexes from WAPL, thereby enabling formation of long and presumably long-lived loops, and that ESCO1, like CTCF, contributes to boundary formation in chromatin looping. Our data are consistent with a model of nested loop extrusion, in which acetylated cohesin-STAG1 forms stable loops between CTCF sites, demarcating the boundaries of more transient cohesin-STAG2 extrusion activity.

eLife, PMID: 32065581 17 Feb 2020

Schoenfelder S, Fraser P Epigenetics

Spatiotemporal gene expression programmes are orchestrated by transcriptional enhancers, which are key regulatory DNA elements that engage in physical contacts with their target-gene promoters, often bridging considerable genomic distances. Recent progress in genomics, genome editing and microscopy methodologies has enabled the genome-wide mapping of enhancer-promoter contacts and their functional dissection. In this Review, we discuss novel concepts on how enhancer-promoter interactions are established and maintained, how the 3D architecture of mammalian genomes both facilitates and constrains enhancer-promoter contacts, and the role they play in gene expression control during normal development and disease.

Nature reviews. Genetics, PMID: 31086298 2019

Schoenfelder S, Mifsud B, Senner CE, Todd CD, Chrysanthou S, Darbo E, Hemberger M, Branco MR Epigenetics

The establishment of the embryonic and trophoblast lineages is a developmental decision underpinned by dramatic differences in the epigenetic landscape of the two compartments. However, it remains unknown how epigenetic information and transcription factor networks map to the 3D arrangement of the genome, which in turn may mediate transcriptional divergence between the two cell lineages. Here, we perform promoter capture Hi-C experiments in mouse trophoblast (TSC) and embryonic (ESC) stem cells to understand how chromatin conformation relates to cell-specific transcriptional programmes. We find that key TSC genes that are kept repressed in ESCs exhibit interactions between H3K27me3-marked regions in ESCs that depend on Polycomb repressive complex 1. Interactions that are prominent in TSCs are enriched for enhancer-gene contacts involving key TSC transcription factors, as well as TET1, which helps to maintain the expression of TSC-relevant genes. Our work shows that the first developmental cell fate decision results in distinct chromatin conformation patterns establishing lineage-specific contexts involving both repressive and active interactions.

Nature communications, PMID: 30305613 2018

Koohy H, Bolland DJ, Matheson LS, Schoenfelder S, Stellato C, Dimond A, Várnai C, Chovanec P, Chessa T, Denizot J, Manzano Garcia R, Wingett SW, Freire-Pritchett P, Nagano T, Hawkins P, Stephens L, Elderkin S, Spivakov M, Fraser P, Corcoran AE, Varga-Weisz PD Signalling, Bioinformatics

Aging is characterized by loss of function of the adaptive immune system, but the underlying causes are poorly understood. To assess the molecular effects of aging on B cell development, we profiled gene expression and chromatin features genome-wide, including histone modifications and chromosome conformation, in bone marrow pro-B and pre-B cells from young and aged mice.

Genome biology, PMID: 30180872 2018

Schoenfelder S, Javierre BM, Furlan-Magaril M, Wingett SW, Fraser P Epigenetics, Bioinformatics

The three-dimensional organization of the genome is linked to its function. For example, regulatory elements such as transcriptional enhancers control the spatio-temporal expression of their target genes through physical contact, often bridging considerable (in some cases hundreds of kilobases) genomic distances and bypassing nearby genes. The human genome harbors an estimated one million enhancers, the vast majority of which have unknown gene targets. Assigning distal regulatory regions to their target genes is thus crucial to understand gene expression control. We developed Promoter Capture Hi-C (PCHi-C) to enable the genome-wide detection of distal promoter-interacting regions (PIRs), for all promoters in a single experiment. In PCHi-C, highly complex Hi-C libraries are specifically enriched for promoter sequences through in-solution hybrid selection with thousands of biotinylated RNA baits complementary to the ends of all promoter-containing restriction fragments. The aim is to then pull-down promoter sequences and their frequent interaction partners such as enhancers and other potential regulatory elements. After high-throughput paired-end sequencing, a statistical test is applied to each promoter-ligated restriction fragment to identify significant PIRs at the restriction fragment level. We have used PCHi-C to generate an atlas of long-range promoter interactions in dozens of human and mouse cell types. These promoter interactome maps have contributed to a greater understanding of mammalian gene expression control by assigning putative regulatory regions to their target genes and revealing preferential spatial promoter-promoter interaction networks. This information also has high relevance to understanding human genetic disease and the identification of potential disease genes, by linking non-coding disease-associated sequence variants in or near control sequences to their target genes.

Journal of visualized experiments : JoVE, PMID: 30010637 2018

Novo CL, Javierre BM, Cairns J, Segonds-Pichon A, Wingett SW, Freire-Pritchett P, Furlan-Magaril M, Schoenfelder S, Fraser P, Rugg-Gunn PJ Epigenetics, Bioinformatics

Transcriptional enhancers, including super-enhancers (SEs), form physical interactions with promoters to regulate cell-type-specific gene expression. SEs are characterized by high transcription factor occupancy and large domains of active chromatin, and they are commonly assigned to target promoters using computational predictions. How promoter-SE interactions change upon cell state transitions, and whether transcription factors maintain SE interactions, have not been reported. Here, we used promoter-capture Hi-C to identify promoters that interact with SEs in mouse embryonic stem cells (ESCs). We found that SEs form complex, spatial networks in which individual SEs contact multiple promoters, and a rewiring of promoter-SE interactions occurs between pluripotent states. We also show that long-range promoter-SE interactions are more prevalent in ESCs than in epiblast stem cells (EpiSCs) or Nanog-deficient ESCs. We conclude that SEs form cell-type-specific interaction networks that are partly dependent on core transcription factors, thereby providing insights into the gene regulatory organization of pluripotent cells.

Cell reports, PMID: 29514091 2018

Comoglio F, Park HJ, Schoenfelder S, Barozzi I, Bode D, Fraser P, Green AR

Thrombopoietin (TPO) is a critical cytokine regulating hematopoietic stem cell maintenance and differentiation into the megakaryocytic lineage. However, the transcriptional and chromatin dynamics elicited by TPO signaling are poorly understood. Here, we study the immediate early transcriptional and cis-regulatory responses to TPO in hematopoietic stem/progenitor cells (HSPCs) and use this paradigm of cytokine signaling to chromatin to dissect the relation between cis-regulatory activity and chromatin architecture. We show that TPO profoundly alters the transcriptome of HSPCs, with key hematopoietic regulators being transcriptionally repressed within 30 minutes of TPO. By examining cis-regulatory dynamics and chromatin architectures, we demonstrate that these changes are accompanied by rapid and extensive epigenome remodeling of cis-regulatory landscapes that is spatially coordinated within topologically associating domains (TADs). Moreover, TPO-responsive enhancers are spatially clustered and engage in preferential homotypic intra- and inter-TAD interactions that are largely refractory to TPO signaling. By further examining the link between cis-regulatory dynamics and chromatin looping, we show that rapid modulation of cis-regulatory activity is largely independent of chromatin looping dynamics. Finally, we show that, although activated and repressed cis-regulatory elements share remarkably similar DNA sequence compositions, transcription factor binding patterns accurately predict rapid cis-regulatory responses to TPO.

Genome research, PMID: 29429976 2018

Wutz G, Várnai C, Nagasaka K, Cisneros DA, Stocsits RR, Tang W, Schoenfelder S, Jessberger G, Muhar M, Hossain MJ, Walther N, Koch B, Kueblbeck M, Ellenberg J, Zuber J, Fraser P, Peters JM

Mammalian genomes are spatially organized into compartments, topologically associating domains (TADs), and loops to facilitate gene regulation and other chromosomal functions. How compartments, TADs, and loops are generated is unknown. It has been proposed that cohesin forms TADs and loops by extruding chromatin loops until it encounters CTCF, but direct evidence for this hypothesis is missing. Here, we show that cohesin suppresses compartments but is required for TADs and loops, that CTCF defines their boundaries, and that the cohesin unloading factor WAPL and its PDS5 binding partners control the length of loops. In the absence of WAPL and PDS5 proteins, cohesin forms extended loops, presumably by passing CTCF sites, accumulates in axial chromosomal positions (vermicelli), and condenses chromosomes. Unexpectedly, PDS5 proteins are also required for boundary function. These results show that cohesin has an essential genome-wide function in mediating long-range chromatin interactions and support the hypothesis that cohesin creates these by loop extrusion, until it is delayed by CTCF in a manner dependent on PDS5 proteins, or until it is released from DNA by WAPL.

The EMBO journal, PMID: 29217591 2017

Harewood L, Kishore K, Eldridge MD, Wingett S, Pearson D, Schoenfelder S, Collins VP, Fraser P Bioinformatics

Chromosomal rearrangements occur constitutionally in the general population and somatically in the majority of cancers. Detection of balanced rearrangements, such as reciprocal translocations and inversions, is troublesome, which is particularly detrimental in oncology where rearrangements play diagnostic and prognostic roles. Here we describe the use of Hi-C as a tool for detection of both balanced and unbalanced chromosomal rearrangements in primary human tumour samples, with the potential to define chromosome breakpoints to bp resolution. In addition, we show copy number profiles can also be obtained from the same data, all at a significantly lower cost than standard sequencing approaches.

Genome biology, PMID: 28655341 2017

Mifsud B, Martincorena I, Darbo E, Sugar R, Schoenfelder S, Fraser P, Luscombe NM Epigenetics

Hi-C is one of the main methods for investigating spatial co-localisation of DNA in the nucleus. However, the raw sequencing data obtained from Hi-C experiments suffer from large biases and spurious contacts, making it difficult to identify true interactions. Existing methods use complex models to account for biases and do not provide a significance threshold for detecting interactions. Here we introduce a simple binomial probabilistic model that resolves complex biases and distinguishes between true and false interactions. The model corrects biases of known and unknown origin and yields a p-value for each interaction, providing a reliable threshold based on significance. We demonstrate this experimentally by testing the method against a random ligation dataset. Our method outperforms previous methods and provides a statistical framework for further data analysis, such as comparisons of Hi-C interactions between different conditions. GOTHiC is available as a BioConductor package (http://www.bioconductor.org/packages/release/bioc/html/GOTHiC.html).

PloS one, PMID: 28379994 2017
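A minimal sketch of how a binomial model can assign a p-value to each interaction is given below. It is not the GOTHiC implementation. It assumes that, under random ligation, the chance of a read pair joining two distinct fragments is twice the product of their relative coverages, and it tests the observed pair count against that expectation. The function names and fragment labels are hypothetical.

```python
import math
from collections import Counter


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)


def interaction_p_values(read_pairs):
    """read_pairs: list of (fragment_a, fragment_b), one entry per read pair.

    Returns {(fragment_a, fragment_b): p_value} with sorted fragment keys.
    Self-ligations (a == b) are dropped before counting.
    """
    pairs = [tuple(sorted(p)) for p in read_pairs if p[0] != p[1]]
    n_pairs = len(pairs)
    counts = Counter(pairs)
    coverage = Counter()
    for a, b in pairs:
        coverage[a] += 1
        coverage[b] += 1
    total_ends = 2 * n_pairs

    p_values = {}
    for (a, b), observed in counts.items():
        # Assumed null model: random ligation in proportion to coverage,
        # counting both orientations of the pair.
        p_random = 2 * (coverage[a] / total_ends) * (coverage[b] / total_ends)
        p_values[(a, b)] = binomial_upper_tail(observed, n_pairs, p_random)
    return p_values


# Hypothetical toy data: A-B is seen far more often than coverage predicts.
toy = [("A", "B")] * 6 + [("A", "C")] + [("B", "D")] + [("C", "D")] * 2
for pair, p in sorted(interaction_p_values(toy).items()):
    print(pair, round(p, 4))
```

In practice the per-interaction p-values would still need multiple-testing correction before a significance threshold is applied.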

Freire-Pritchett P, Schoenfelder S, Várnai C, Wingett SW, Cairns J, Collier AJ, García-Vílchez R, Furlan-Magaril M, Osborne CS, Fraser PJ, Rugg-Gunn PJ, Spivakov M Epigenetics

Long-range cis-regulatory elements such as enhancers coordinate cell-specific transcriptional programmes by engaging in DNA looping interactions with target promoters. Deciphering the interplay between the promoter connectivity and activity of cis-regulatory elements during lineage commitment is crucial for understanding developmental transcriptional control. Here, we use Promoter Capture Hi-C to generate a high-resolution atlas of chromosomal interactions involving ~22,000 gene promoters in human pluripotent and lineage-committed cells, identifying putative target genes for known and predicted enhancer elements. We reveal extensive dynamics of cis-regulatory contacts upon lineage commitment, including the acquisition and loss of promoter interactions. This spatial rewiring occurs preferentially with predicted changes in the activity of cis-regulatory elements, and is associated with changes in target gene expression. Our results provide a global and integrated view of promoter interactome dynamics during lineage commitment of human pluripotent cells.

eLife, PMID: 28332981 2017

Martin P, McGovern A, Massey J, Schoenfelder S, Duffus K, Yarwood A, Barton A, Worthington J, Fraser P, Eyre S, Orozco G

The chromosomal region 6q23 has been found to be associated with multiple sclerosis (MS) predisposition through genome wide association studies (GWAS). There are four independent single nucleotide polymorphisms (SNPs) associated with MS in this region, which spans around 2.5 Mb. Most GWAS variants associated with complex traits, including these four MS associated SNPs, are non-coding and their function is currently unknown. However, GWAS variants have been found to be enriched in enhancers and there is evidence that they may be involved in transcriptional regulation of their distant target genes through long range chromatin looping.

PloS one, PMID: 27861577 2016

McGovern A, Schoenfelder S, Martin P, Massey J, Duffus K, Plant D, Yarwood A, Pratt AG, Anderson AE, Isaacs JD, Diboll J, Thalayasingam N, Ospelt C, Barton A, Worthington J, Fraser P, Eyre S, Orozco G

The identification of causal genes from genome-wide association studies (GWAS) is the next important step for the translation of genetic findings into biologically meaningful mechanisms of disease and potential therapeutic targets. Using novel chromatin interaction detection techniques and allele specific assays in T and B cell lines, we provide compelling evidence that redefines causal genes at the 6q23 locus, one of the most important loci that confers autoimmunity risk.

Genome biology, PMID: 27799070 2016

Cairns J, Freire-Pritchett P, Wingett SW, Várnai C, Dimond A, Plagnol V, Zerbino D, Schoenfelder S, Javierre BM, Osborne C, Fraser P, Spivakov M Bioinformatics

Capture Hi-C (CHi-C) is a method for profiling chromosomal interactions involving targeted regions of interest, such as gene promoters, globally and at high resolution. Signal detection in CHi-C data involves a number of statistical challenges that are not observed when using other Hi-C-like techniques. We present a background model and algorithms for normalisation and multiple testing that are specifically adapted to CHi-C experiments. We implement these procedures in CHiCAGO ( http://regulatorygenomicsgroup.org/chicago ), an open-source package for robust interaction detection in CHi-C. We validate CHiCAGO by showing that promoter-interacting regions detected with this method are enriched for regulatory features and disease-associated SNPs.

Genome biology, PMID: 27306882 2016

Wingett S, Ewels P, Furlan-Magaril M, Nagano T, Schoenfelder S, Fraser P, Andrews S Bioinformatics

HiCUP is a pipeline for processing sequence data generated by Hi-C and Capture Hi-C (CHi-C) experiments, which are techniques used to investigate three-dimensional genomic organisation. The pipeline maps data to a specified reference genome and removes artefacts that would otherwise hinder subsequent analysis. HiCUP also produces an easy-to-interpret yet detailed quality control (QC) report that assists in refining experimental protocols for future studies. The software is freely available and has already been used for processing Hi-C and CHi-C data in several recently published peer-reviewed studies.

F1000Research, PMID: 26835000 2015

Wilson NK, Schoenfelder S, Hannah R, Sánchez Castillo M, Schütte J, Ladopoulos V, Mitchelmore J, Goode DK, Calero-Nieto FJ, Moignard V, Wilkinson AC, Jimenez-Madrid I, Kinston S, Spivakov M, Fraser P, Göttgens B

Comprehensive study of transcriptional control processes will be required to enhance our understanding of both normal and malignant haematopoiesis. Modern sequencing technologies have revolutionized our ability to generate genome-scale expression and histone modification profiles, transcription factor binding maps and also comprehensive chromatin looping information. Many of these technologies, however, require large numbers of cells, and therefore cannot be applied to rare haematopoietic stem/progenitor cell (HSPC) populations. The stem cell factor (SCF) dependent multipotent progenitor cell line HPC-7 represents a well recognised cell line model for HSPCs. Here we report genome-wide maps for 17 transcription factors (TFs), 3 histone modifications, DNase I hypersensitive sites and high-resolution promoter-enhancer interactomes in HPC-7 cells. Integrated analysis of these complementary datasets revealed transcription factor occupancy patterns of genomic regions involved in promoter-anchored loops. Moreover, preferential associations between pairs of transcription factors bound at either end of chromatin loops led to the identification of four previously unrecognised protein-protein interactions between key blood stem cell regulators. All HPC-7 genome-scale datasets are freely available both through standard repositories and a user-friendly web interface. Together with previously generated genome-scale datasets, this study integrates HPC-7 data into a genomic resource on a par with ENCODE tier 1 cell lines, and importantly the only current model with comprehensive genome-scale data that is relevant to HSPC biology.

Blood, PMID: 26809507 2016

Martin P, McGovern A, Orozco G, Duffus K, Yarwood A, Schoenfelder S, Cooper NJ, Barton A, Wallace C, Fraser P, Worthington J, Eyre S

Genome-wide association studies have been tremendously successful in identifying genetic variants associated with complex diseases. The majority of association signals are intergenic and evidence is accumulating that a high proportion of signals lie in enhancer regions. We use Capture Hi-C to investigate, for the first time, the interactions between associated variants for four autoimmune diseases and their functional targets in B- and T-cell lines. Here we report numerous looping interactions and provide evidence that only a minority of interactions are common to both B- and T-cell lines, suggesting interactions may be highly cell-type specific; some disease-associated SNPs do not interact with the nearest gene but with more compelling candidate genes (for example, FOXO1, AZI2) often situated several megabases away; and finally, regions associated with different autoimmune diseases interact with each other and the same promoter suggesting common autoimmune gene targets (for example, PTPRC, DEXI and ZFP36L1).

Nature communications, PMID: 26616563 2015

Schoenfelder S, Sugar R, Dimond A, Javierre BM, Armstrong H, Mifsud B, Dimitrova E, Matheson L, Tavares-Cadete F, Furlan-Magaril M, Segonds-Pichon A, Jurkowski W, Wingett SW, Tabbada K, Andrews S, Herman B, LeProust E, Osborne CS, Koseki H, Fraser P, Luscombe NM, Elderkin S Genomics

The Polycomb repressive complexes PRC1 and PRC2 maintain embryonic stem cell (ESC) pluripotency by silencing lineage-specifying developmental regulator genes. Emerging evidence suggests that Polycomb complexes act through controlling spatial genome organization. We show that PRC1 functions as a master regulator of mouse ESC genome architecture by organizing genes in three-dimensional interaction networks. The strongest spatial network is composed of the four Hox gene clusters and early developmental transcription factor genes, the majority of which contact poised enhancers. Removal of Polycomb repression leads to disruption of promoter-promoter contacts in the Hox gene network. In contrast, promoter-enhancer contacts are maintained in the absence of Polycomb repression, with accompanying widespread acquisition of active chromatin signatures at network enhancers and pronounced transcriptional upregulation of network genes. Thus, PRC1 physically constrains developmental transcription factor genes and their enhancers in a silenced but poised spatial network. We propose that the selective release of genes from this spatial network underlies cell fate specification during early embryonic development.

Nature genetics, PMID: 26323060 2015

Nagano T, Várnai C, Schoenfelder S, Javierre BM, Wingett SW, Fraser P Signalling

Chromosome conformation capture and various derivative methods such as 4C, 5C and Hi-C have emerged as standard tools to analyze the three-dimensional organization of the genome in the nucleus. These methods employ ligation of diluted cross-linked chromatin complexes, intended to favor proximity-dependent, intra-complex ligation. During development of single-cell Hi-C, we devised an alternative Hi-C protocol with ligation in preserved nuclei rather than in solution. Here we directly compare Hi-C methods employing in-nucleus ligation with the standard in-solution ligation.

Genome biology, PMID: 26306623 2015

Mifsud B, Tavares-Cadete F, Young AN, Sugar R, Schoenfelder S, Ferreira L, Wingett SW, Andrews S, Grey W, Ewels PA, Herman B, Happe S, Higgs A, LeProust E, Follows GA, Fraser P, Luscombe NM, Osborne CS Bioinformatics

Transcriptional control in large genomes often requires looping interactions between distal DNA elements, such as enhancers and target promoters. Current chromosome conformation capture techniques do not offer sufficiently high resolution to interrogate these regulatory interactions on a genomic scale. Here we use Capture Hi-C (CHi-C), an adapted genome conformation assay, to examine the long-range interactions of almost 22,000 promoters in 2 human blood cell types. We identify over 1.6 million shared and cell type-restricted interactions spanning hundreds of kilobases between promoters and distal loci. Transcriptionally active genes contact enhancer-like elements, whereas transcriptionally inactive genes interact with previously uncharacterized elements marked by repressive features that may act as long-range silencers. Finally, we show that interacting loci are enriched for disease-associated SNPs, suggesting how distal mutations may disrupt the regulation of relevant genes. This study provides new insights and accessible tools to dissect the regulatory interactions that underlie normal and aberrant gene regulation.

Nature genetics, PMID: 25938943 2015

Schoenfelder S, Furlan-Magaril M, Mifsud B, Tavares-Cadete F, Sugar R, Javierre BM, Nagano T, Katsman Y, Sakthidevi M, Wingett SW, Dimitrova E, Dimond A, Edelman LB, Elderkin S, Tabbada K, Darbo E, Andrews S, Herman B, Higgs A, LeProust E, Osborne CS, Mitchell JA, Luscombe NM, Fraser P Bioinformatics

The mammalian genome harbors up to one million regulatory elements often located at great distances from their target genes. Long-range elements control genes through physical contact with promoters and can be recognized by the presence of specific histone modifications and transcription factor binding. Linking regulatory elements to specific promoters genome-wide is currently impeded by the limited resolution of high-throughput chromatin interaction assays. Here we apply a sequence capture approach to enrich Hi-C libraries for >22,000 annotated mouse promoters to identify statistically significant, long-range interactions at restriction fragment resolution, assigning long-range interacting elements to their target genes genome-wide in embryonic stem cells and fetal liver cells. The distal sites contacting active genes are enriched in active histone modifications and transcription factor occupancy, whereas inactive genes contact distal sites with repressive histone marks, demonstrating the regulatory potential of the distal elements identified. Furthermore, we find that coregulated genes cluster nonrandomly in spatial interaction networks correlated with their biological function and expression level. Interestingly, we find the strongest gene clustering in ES cells between transcription factor genes that control key developmental processes in embryogenesis. The results provide the first genome-wide catalog linking gene promoters to their long-range interacting elements and highlight the complex spatial regulatory circuitry controlling mammalian gene expression.

Genome research, PMID: 25752748 2015

Jäger R, Migliorini G, Henrion M, Kandaswamy R, Speedy HE, Heindl A, Whiffin N, Carnicer MJ, Broome L, Dryden N, Nagano T, Schoenfelder S, Enge M, Yuan Y, Taipale J, Fraser P, Fletcher O, Houlston RS

Multiple regulatory elements distant from their targets on the linear genome can influence the expression of a single gene through chromatin looping. Chromosome conformation capture implemented in Hi-C allows for genome-wide agnostic characterization of chromatin contacts. However, detection of functional enhancer-promoter interactions is precluded by its effective resolution that is determined by both restriction fragmentation and sensitivity of the experiment. Here we develop a capture Hi-C (cHi-C) approach to allow an agnostic characterization of these physical interactions on a genome-wide scale. Single-nucleotide polymorphisms associated with complex diseases often reside within regulatory elements and exert effects through long-range regulation of gene expression. Applying this cHi-C approach to 14 colorectal cancer risk loci allows us to identify key long-range chromatin interactions in cis and trans involving these loci.

Nature communications, PMID: 25695508 2015

Chandra T, Ewels PA, Schoenfelder S, Furlan-Magaril M, Wingett SW, Kirschner K, Thuret JY, Andrews S, Fraser P, Reik W Epigenetics, Bioinformatics

Cellular senescence has been implicated in tumor suppression, development, and aging and is accompanied by large-scale chromatin rearrangements, forming senescence-associated heterochromatic foci (SAHF). However, how the chromatin is reorganized during SAHF formation is poorly understood. Furthermore, heterochromatin formation in senescence appears to contrast with loss of heterochromatin in Hutchinson-Gilford progeria. We mapped architectural changes in genome organization in cellular senescence using Hi-C. Unexpectedly, we find a dramatic sequence- and lamin-dependent loss of local interactions in heterochromatin. This change in local connectivity resolves the paradox of opposing chromatin changes in senescence and progeria. In addition, we observe a senescence-specific spatial clustering of heterochromatic regions, suggesting a unique second step required for SAHF formation. Comparison of embryonic stem cells (ESCs), somatic cells, and senescent cells shows a unidirectional loss in local chromatin connectivity, suggesting that senescence is an endpoint of the continuous nuclear remodelling process during differentiation.

Cell reports, PMID: 25640177 2015

Dryden NH, Broome LR, Dudbridge F, Johnson N, Orr N, Schoenfelder S, Nagano T, Andrews S, Wingett S, Kozarewa I, Assiotis I, Fenwick K, Maguire SL, Campbell J, Natrajan R, Lambros M, Perrakis E, Ashworth A, Fraser P, Fletcher O Bioinformatics

Genome-wide association studies have identified more than 70 common variants that are associated with breast cancer risk. Most of these variants map to non-protein-coding regions and several map to gene deserts, regions of several hundred kilobases lacking protein-coding genes. We hypothesized that gene deserts harbor long-range regulatory elements that can physically interact with target genes to influence their expression. To test this, we developed Capture Hi-C (CHi-C), which, by incorporating a sequence capture step into a Hi-C protocol, allows high-resolution analysis of targeted regions of the genome. We used CHi-C to investigate long-range interactions at three breast cancer gene deserts mapping to 2q35, 8q24.21, and 9q31.2. We identified interaction peaks between putative regulatory elements ("bait fragments") within the captured regions and "targets" that included both protein-coding genes and long noncoding (lnc) RNAs over distances of 6.6 kb to 2.6 Mb. Target protein-coding genes were IGFBP5, KLF4, NSMCE2, and MYC; and target lncRNAs included DIRC3, PVT1, and CCDC26. For one gene desert, we were able to define two SNPs (rs12613955 and rs4442975) that were highly correlated with the published risk variant and that mapped within the bait end of an interaction peak. In vivo ChIP-qPCR data show that one of these, rs4442975, affects the binding of FOXA1 and implicate this SNP as a putative functional variant.

Genome research, PMID: 25122612 2014

T Nagano, Y Lubling, TJ Stevens, S Schoenfelder, E Yaffe, W Dean, ED Laue, A Tanay, P Fraser

Large-scale chromosome structure and spatial nuclear arrangement have been linked to control of gene expression and DNA replication and repair. Genomic techniques based on chromosome conformation capture (3C) assess contacts for millions of loci simultaneously, but do so by averaging chromosome conformations from millions of nuclei. Here we introduce single-cell Hi-C, combined with genome-wide statistical analysis and structural modelling of single-copy X chromosomes, to show that individual chromosomes maintain domain organization at the megabase scale, but show variable cell-to-cell chromosome structures at larger scales. Despite this structural stochasticity, localization of active gene domains to boundaries of chromosome territories is a hallmark of chromosomal conformation. Single-cell Hi-C data bridge current gaps between genomics and microscopy studies of chromosomes, demonstrating how modular organization underlies dynamic chromosome structure, and how this structure is probabilistically linked with genome activity patterns.

Nature, PMID: 24067610 2013

Mitchell JA, Clay I, Umlauf D, Chen CY, Moir CA, Eskiw CH, Schoenfelder S, Chakalova L, Nagano T, Fraser P Genomics

In addition to protein coding genes a substantial proportion of mammalian genomes are transcribed. However, most transcriptome studies investigate steady-state mRNA levels, ignoring a considerable fraction of the transcribed genome. In addition, steady-state mRNA levels are influenced by both transcriptional and posttranscriptional mechanisms, and thus do not provide a clear picture of transcriptional output. Here, using deep sequencing of nuclear RNAs (nucRNA-Seq) in parallel with chromatin immunoprecipitation sequencing (ChIP-Seq) of active RNA polymerase II, we compared the nuclear transcriptome of mouse anemic spleen erythroid cells with polymerase occupancy on a genome-wide scale. We demonstrate that unspliced transcripts quantified by nucRNA-seq correlate with primary transcript frequencies measured by RNA FISH, but differ from steady-state mRNA levels measured by poly(A)-enriched RNA-seq. Highly expressed protein coding genes showed good correlation between RNAPII occupancy and transcriptional output; however, genome-wide we observed a poor correlation between transcriptional output and RNAPII association. This poor correlation is due to intergenic regions associated with RNAPII which correspond with transcription factor bound regulatory regions and a group of stable, nuclear-retained long non-coding transcripts. In conclusion, sequencing the nuclear transcriptome provides an opportunity to investigate the transcriptional landscape in a given cell type through quantification of unspliced primary transcripts and the identification of nuclear-retained long non-coding RNAs.

PloS one, PMID: 23209567 2012

Eskiw CH, Cope NF, Clay I, Schoenfelder S, Nagano T, Fraser P

The dynamic compartmental organization of the transcriptional machinery in mammalian nuclei places particular constraints on the spatial organization of the genome. The clustering of active RNA polymerase I transcription units from several chromosomes at nucleoli is probably the best-characterized and universally accepted example. RNA polymerase II localization in mammalian nuclei occurs in distinct concentrated foci that are several-fold fewer in number compared to the number of active genes and transcription units. Individual transcribed genes cluster at these shared transcription factories in a nonrandom manner, preferentially associating with heterologous, coregulated genes. We suggest that the three-dimensional (3D) conformation and relative arrangement of chromosomes in the nucleus has a major role in delivering tissue-specific gene-expression programs.

Cold Spring Harbor symposia on quantitative biology, PMID: 21467135 2010

S Schoenfelder, I Clay, P Fraser

Transcription in the eukaryotic nucleus has long been thought of as conforming to a model in which RNA polymerase complexes are recruited to and track along isolated templates. However, a more dynamic role for chromatin in transcriptional regulation is materializing: enhancer elements interact with promoters, forming loops that often bridge considerable distances, and genomic loci, even those located on different chromosomes, undergo chromosomal associations. These associations amass to form an extensive 'transcriptional interactome', enacted at functional subnuclear compartments, to which genes dynamically relocate. The emerging view is that long-range chromosomal associations between genomic regions, and their repositioning in the three-dimensional space of the nucleus, are key contributors to the regulation of gene expression.

Current opinion in genetics & development, PMID: 20211559 2010

S Schoenfelder, T Sexton, L Chakalova, NF Cope, A Horton, S Andrews, S Kurukuti, JA Mitchell, D Umlauf, DS Dimitrova, CH Eskiw, Y Luo, CL Wei, Y Ruan, JJ Bieker, P Fraser Bioinformatics

The discovery of interchromosomal interactions in higher eukaryotes points to a functional interplay between genome architecture and gene expression, challenging the view of transcription as a one-dimensional process. However, the extent of interchromosomal interactions and the underlying mechanisms are unknown. Here we present the first genome-wide analysis of transcriptional interactions using the mouse globin genes in erythroid tissues. Our results show that the active globin genes associate with hundreds of other transcribed genes, revealing extensive and preferential intra- and interchromosomal transcription interactomes. We show that the transcription factor Klf1 mediates preferential co-associations of Klf1-regulated genes at a limited number of specialized transcription factories. Our results establish a new gene expression paradigm, implying that active co-regulated genes and their regulatory factors cooperate to create specialized nuclear hot spots optimized for efficient and coordinated transcriptional control.

Nature genetics, PMID: 20010836 2010

S Schoenfelder, P Fraser

Long-distance chromosomal interactions are emerging as a potential mechanism of gene expression control. In this issue, Apostolou and Thanos (2008) describe how viral infection elicits interchromosomal associations between the interferon-beta (IFN-beta) gene enhancer and DNA binding sites of the transcription factor NF-kappaB, resulting in the initiation of transcription and an antiviral response.

Cell, PMID: 18614003 2008

S Schoenfelder, G Smits, P Fraser, W Reik, R Paro Epigenetics

The imprinting control region (ICR) upstream of H19 is the key regulatory element conferring monoallelic expression on H19 and Igf2 (insulin-like growth factor 2). Epigenetic marks in the ICR regulate its interaction with the chromatin protein CCCTC-binding factor and with other control factors to coordinate gene silencing in the imprinting cluster. Here, we show that the H19 ICR is biallelically transcribed, producing both sense and antisense RNAs. We analyse the function of the non-coding transcripts in a Drosophila transgenic system in which the H19 upstream region silences the expression of a reporter gene. We show that knockdown of H19 ICR non-coding RNA (ncRNA) by RNA interference leads to the loss of reporter gene silencing. Our results are, to the best of our knowledge, the first to show that ncRNAs in the H19 ICR are functionally significant, and also indicate that they have a role in regulating gene expression and perhaps epigenetic marks at the H19/Igf2 locus.

EMBO reports, PMID: 17948025 2007