98% of the DNA in our body is non-coding, i.e. it does not carry the information needed to build proteins. In the past, non-coding DNA was sometimes equated with ‘non-functional’ or called ‘junk’; today we know that this is far from the truth. Scattered throughout non-coding DNA is a plethora of so-called regulatory elements, including enhancers, silencers and insulators. These regulatory elements function like molecular switches to control which genes are active (and thus produce proteins) in which cells. This process of gene expression control is vital to allow cells – which all contain the same genes – to specialise to carry out different tasks, and to help them respond to changes.
Enhancers are a type of regulatory element that controls gene expression over long distances. They contact their target genes via chromosomal interactions, often bridging large distances in the genome, with the intervening DNA ‘looping out’. To understand how enhancers work, we study them in the context of the three-dimensional organisation of the genome. Our aim is to find regulatory elements and to understand which genes they control. We also aim to uncover the molecular mechanisms by which regulatory elements find their target genes in the three-dimensional space of the cell nucleus, and to understand how altering the function of regulatory elements can lead to developmental malformations and disease. We study these questions in pluripotent stem cells – cells that have the potential to create all cell types in the adult body. We use a combination of molecular, genetic, biochemical and imaging approaches to study pluripotent stem cells in their ‘ground state’, and when they start to form new cell types – a process called cell lineage specification.
Through high-resolution mapping and experimental perturbation of the spatial genome architecture, we aim to reveal gene regulatory principles that underpin cell states and cell fate transitions. This may ultimately pave the way for us to experimentally engineer 3D genome folding to achieve predictable effects on gene expression and cell fate choice, with potential implications for gene therapy and regenerative medicine.
To produce a diverse antibody repertoire, immunoglobulin heavy-chain (Igh) loci undergo large-scale alterations in structure to facilitate juxtaposition and recombination of spatially separated variable (V), diversity (D), and joining (J) genes. These chromosomal alterations are poorly understood. Uncovering their patterns shows how chromosome dynamics underpins antibody diversity. Using tiled Capture Hi-C, we produce a comprehensive map of chromatin interactions throughout the 2.8-Mb Igh locus in progenitor B cells. We find that the Igh locus folds into semi-rigid subdomains and undergoes flexible looping of the V genes to its 3' end, reconciling two views of locus organization. Deconvolution of single Igh locus conformations using polymer simulations identifies thousands of different structures. This heterogeneity may underpin the diversity of V(D)J recombination events. All three immunoglobulin loci also participate in a highly specific, developmentally regulated network of interchromosomal interactions with genes encoding B cell-lineage factors. This suggests a model of interchromosomal coordination of B cell development.
There is widespread interest in the three-dimensional chromatin conformation of the genome and its impact on gene expression. However, these studies frequently do not consider parent-of-origin differences, such as genomic imprinting, which result in monoallelic expression. In addition, genome-wide allele-specific chromatin conformation associations have not been extensively explored. There are few accessible bioinformatic workflows for investigating allelic conformation differences, and these require pre-phased haplotypes, which are not widely available.
Genome sequencing has revealed over 300 million genetic variations in human populations. Over 90% of variants are single nucleotide polymorphisms (SNPs); the remainder include short deletions or insertions and a small number of structural variants. Hundreds of thousands of these variants have been associated with specific phenotypic traits and diseases through genome-wide association studies, which link significant differences in variant frequencies with specific phenotypes among large groups of individuals. Only 5% of disease-associated SNPs are located in gene coding sequences, where they have the potential to disrupt gene expression or alter the function of encoded proteins. The remaining 95% of disease-associated SNPs are located in the non-coding DNA sequences that make up 98% of the genome. The role of non-coding, disease-associated SNPs, many of which are located at considerable distances from any gene, was at first a mystery, until the discovery that gene promoters regularly interact with distal regulatory elements to control gene expression. Disease-associated SNPs are enriched at the millions of gene regulatory elements that are dispersed throughout the non-coding sequences of the genome, suggesting they function as gene regulation variants. Assigning specific regulatory elements to the genes they control is not straightforward, since they can be millions of base pairs apart. In this review, we describe how understanding 3D genome organization can identify specific interactions between gene promoters and distal regulatory elements, and how 3D genomics can link disease-associated SNPs to their target genes. Understanding which gene or genes contribute to a specific disease is the first step in designing rational therapeutic interventions.
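As a rough illustration of the linking step described above, the following Python sketch assigns SNPs to candidate target genes by checking whether each SNP falls inside a distal fragment that contacts a gene promoter. All coordinates, gene names and SNP identifiers are invented for the example, and the `Interaction` record and the `link_snps_to_genes` function are hypothetical names, not part of any published tool. A real analysis would start from experimentally called promoter interactions and would need statistical filtering that is not shown here.

```python
from typing import NamedTuple


class Interaction(NamedTuple):
    """A promoter-to-distal-fragment contact (hypothetical record layout)."""
    gene: str   # gene whose promoter makes the contact
    chrom: str  # chromosome of the distal fragment
    start: int  # distal fragment start, inclusive
    end: int    # distal fragment end, exclusive


def link_snps_to_genes(snps, interactions):
    """Map each SNP id to a sorted list of genes whose promoter contacts
    a distal fragment containing that SNP.

    snps: iterable of (snp_id, chrom, position) tuples.
    interactions: iterable of Interaction records.
    """
    # Group fragments by chromosome so each SNP is only compared
    # against fragments on its own chromosome.
    by_chrom = {}
    for it in interactions:
        by_chrom.setdefault(it.chrom, []).append(it)

    links = {}
    for snp_id, chrom, pos in snps:
        genes = {
            it.gene
            for it in by_chrom.get(chrom, [])
            if it.start <= pos < it.end
        }
        links[snp_id] = sorted(genes)
    return links


# Toy data, invented for illustration only.
interactions = [
    Interaction("GENE_A", "chr1", 1_000_000, 1_005_000),
    Interaction("GENE_B", "chr1", 1_004_000, 1_008_000),
    Interaction("GENE_C", "chr2", 500_000, 502_000),
]
snps = [
    ("snp1", "chr1", 1_004_500),  # inside fragments contacting GENE_A and GENE_B
    ("snp2", "chr2", 501_000),    # inside the fragment contacting GENE_C
    ("snp3", "chr1", 2_000_000),  # not inside any contacting fragment
]

links = link_snps_to_genes(snps, interactions)
for snp_id, genes in links.items():
    print(snp_id, genes)
```

The linear scan over fragments keeps the sketch short. For genome-scale data, an interval tree or a search over sorted intervals would be the more usual choice. A SNP may map to several genes or to none, which is why each SNP is linked to a list rather than a single gene.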