INTRODUCTION

Biological receptors present intrinsic dynamism and plasticity to respond to physiological changes.[1] The binding of ligands in the orthosteric site located at the extracellular region can establish dynamic and chemical communication with the intracellular region resulting in key structural rearrangements that trigger a particular response.[2] This communication between distant protein sites is called allostery and its understanding presents a current challenge for many protein complexes.[1, 3-5] Allosteric modulators are capable to bind regions other than orthosteric sites and propagate communication networks to other functional regions of the receptor.[6] A complete information for the structure-based design of allosteric modulators involves not only the exploration of the receptor activation pathway and the identification of binding sites, [7] but also the characterization of the allosteric pathways, which is a complicated task. However, allosteric drugs offer the advantage of higher selectivity respect to conventional drugs due to the greater sequence variability of allosteric sites.[8, 9]

In G-protein coupled receptors (GPCRs), the binding of endogenous agonist in the orthosteric site reshapes the conformational ensemble of a Transmembrane (TM) helix to prime the binding and activation of G-proteins.[10-12] The conformational landscapes of GPCRs associated with the receptor activation are complex, with some general principles in common and also receptor-specific differences.[13, 14] For many GPCRs it has been observed that (1) inactive and pre-active states are in dynamic equilibrium in the apo form, (2) the action of the agonist induce subtle population shifts involving the stabilization of intermediate and pre-active states and/or decreasing their rates of interconversion and (3) the combined action of both, agonist and G-protein binding stabilizes a more rigid fully-active state. [11, 13, 15-17] The complexity of GPCRs activation landscapes difficult its experimental and computational characterization. However, the study of its system-specific properties opens the door for de design of allosteric drugs.

The adenosine A1 receptor (A1R) belongs to the class A G protein-coupled receptor (GPCR) family and has been targeted for the pain management through allosteric modulation without success in clinical trials. [18, 19] Fortunately, X-ray crystallography and Cryo-electron microscopy (Cryo-EM) captured A1R in an inactive and active states revealing a notorious inward-to-outward conformational transition of TM6 (Figure 1A and S1).[18, 20, 21] The most recently released active state structure is crystalized with adenosine, trimeric G-protein and also with a positive allosteric modulator (PAM) located in an extrahelical region.[18] Computational and mutagenesis studies have reported interesting insights about the A1R activation and allosteric modulation identifying important residues for the signaling efficacy of agonists and cooperativity of PAMs.[18, 22, 23] Despite all these achievements, a detailed characterization of the allosteric networks that drive receptor activation and G-proteins binding is still missing.

Free energy landscape (FEL) of A1R activation in presence of adenosine (ADO).

(A) TM6 inward-to-outward conformational transition observed in the inactive (PDB 5N2S) and active (PDB 6D9H) X-Ray and Cryo-EM structures. (B) Features (TM6 torsion and TM3-TM6 center of mass distance of the intracellular ends) used to follow the TM6 inward-to-outward transition (receptor activation) in this work. The X-ray and Cryo-EM values are also shown. (C) Population analysis obtained from conventional molecular dynamics (cMD) simulations of A1R activation starting from the inactive X-Ray and active Cryo-EM structures, the inactive X-ray and active Cryo-EM coordinates are projected as blue and magenta stars, respectively. (D) Reconstruction of the FEL associated with the A1R activation obtained from metadynamics simulations. The most relevant conformational states are labeled from 1 to 5. Note that the lowest energy states (1,3 and 4) are labeled in gray while the other (2 and 5) in orange. The X-ray and Cryo-EM coordinates are also projected. (E) Representative structures of the inactive (1), Intermediate (2-3) and Pre-active (4-5) conformational states sampled along A1R-ADO activation.

In this work, we develop a computational workflow tailored to decipher the interplay between the receptor activation pathway, the allosteric communication networks and transient pockets. First, we used conventional molecular dynamics (MD) simulations coupled with enhanced sampling techniques to reconstruct the receptor activation conformational landscape of the inward-to-outward TM6 transition revealing the inactive, intermediate, pre-active and fully-active conformational states. Second, we study the dynamic communication energy networks throughout the conformational states sampled successfully capturing the extra and intracellular communicating centers and the pathways that interconnect them. We observe that the allosteric communication is enhanced along the receptor activation and fine-tuned in presence of the trimeric G-protein. Third, we used a geometric-based approach to search for the formation transient pockets. By studying the connection between these three elements we give a complete dynamic picture of the A1R activation that is essential for the design of specific allosteric modulators. This in silico approach can be also applied to uncover the activation landscape, allosteric networks and transient pockets of related GPCRs, which is of interest for allosteric drug design.

RESULTS

The free energy landscape of A1R activation reveals intermediate and pre-active states

To uncover the activation pathway of A1R in presence of its endogenous agonist adenosine (A1R-ADO), we reconstructed the free energy landscape (FEL) of A1R-ADO associated with the TM6 inward-to-outward transition observed by X-ray and Cryo-EM data (Figure 1 A). Specifically, we focus on two features: the TM6 torsion and the center of mass (COM) distances between the TM3 and TM6 intracellular ends (Figure 1B). Initially, we performed conventional molecular dynamics (cMD) simulations starting from both, the inactive (PDB 5N2S) and the active (PDB 6D9H) structures in order to increase the sampling of the activation conformational space. For each starting point we computed 3 replicas of 500ns. Either starting from the inactive or active structures the complete TM6 Inward-to-Outward transition is not sampled (Figure 1C). However, when starting from the active structure, an intermediate state centered ca. at TM6 torsion (−140°) and TM3-TM6 distance (15Å) coordinates is substantially populated. This intermediate structure still presents a rather high degree of TM6 torsion while the TM3-TM6 distances are more shortened. In contrast, when starting from the inactive structure, the receptor mostly samples inactive conformations exhibiting a low probability to progress towards the intermediate state, thus suggesting a higher inactive-to-active transition time scale.

To estimate a reliable energy landscape of the A1R-ADO activation and check the importance of the intermediate state, we relied on metadynamics simulations, a powerful method that has been successfully used to study complex conformational transitions in proteins,[24-26] including GPCRs. [12, 15] In particular, we performed well-tempered metadynamics (WT-MetaD) simulations using the walkers approach (see Materials and methods). We selected 10 representative structures along the activation pathway sampled in the cMD as starting points for the walker replicas (see Figure S2). After 250 ns of accumulated time the FEL was considered to be converged (see Figure S3). The FEL in presence of adenosine confirms that A1R presents three major states in dynamic equilibrium (inactive, intermediate and pre-active) while the fully-active Cryo-EM like state is not significantly populated. The relative stabilities of the three states shows that the most stable state is the inactive, followed by the pre-active and intermediate states, which are slightly higher in energy. The inactive state resembles the inactive X-ray coordinates showing short TM3-TM6 distances and low TM6 torsion values (state 1 in Figure 1 D-E). The inactive conformations are stabilized by a tight energy coupling between TM3 and TM6 (see below). As the activation progresses, TM6 torsion evolves further than the TM3-TM6 distances. At this point, the remaining energy coupling between TM3 and TM6 ends hampers the progression of the TM6 outward transition. This tug of war between forces provokes the adoption of a torsion in the TM6 end that is signature of the intermediate conformations (state 2 in Figure 1 D-E). This notorious torsion evolves to a more stable intermediate conformation (state 3 in Figure 1 D-E). Finally, a pre-active local energy minimum is reached completing the TM6 transition, which involves the complete break between TM6 and TM3 energy coupling (see below). The pre-active state presents similar TM3-TM6 distances to the fully-active Cryo-EM structure. However, it exhibits higher TM6 torsion values (state 4 in Figure 1 D-E). In addition, an extra opening of TM6 is accessible (state 5 in Figure 1 D-E). All together suggest that the coupling of the G-proteins is required to stabilize the adoption of the fully-active Cryo-EM like structure (see next sections). Once deciphered the A1R-ADO FEL of activation, our study follows on investigating the receptor-specific allosteric properties that harbor these inactive, intermediate and pre-active states.

Energy Networks captures the dynamic allosteric pathways along A1R activation

A complete understanding of allosteric modulation involves the decoding of the communication pathways that dynamically couples distinct protein sites. Despite the difficulties, network theory has been successfully applied to uncover the allosteric communication pathways in protein complexes [24, 27], including GPCRs [28, 29]. In order to trace down the allosteric pathways in A1R-ADO, we relied on the protein energy networks (PEN) approach. [30] First, we use the get Residue Interaction eNergies and Networks (gRINN)[30] tool to calculate the pairwise residue interaction energies along the A1R-ADO conformational ensemble and obtain the mean interaction energy matrix (IEM). Second, we processed the mean IEM into the shortest-path map (SPN) [27, 31, 32] tool to construct and visualize the PEN graph.

The analysis of the PEN associated with the A1R-ADO ensemble shows that the extracellular region can communicate with the intracellular region through multiple energy pathways (see Figure 2 A). In the extracellular region, ECL2 and ECL3 presents a center of communication connecting with TM4, TM5, TM1 and TM6. Energy pathways involving TM5, TM6 and TM4 propagates from the extracellular ECL2/ECL3 regions and link with TM3 in the intracellular region. In addition, TM3 establish connections with TM2 and TM7 that communicates back to the ECL2 by crosslinking with TM1 (Figure 2 A and B). TM6 is then energetically coupled with the extracellular region through ECL3/ECL2 and with the intracellular region through TM3. In fact, the TM3 end plays a central role for the allosteric communication in the intracellular region. Specifically, TM3 (R108) is a communication hub forming and hydrogen networking with TM6 (D229), TM3 (D104) and D326 (C-Ter). Regarding the energy communication at the orthosteric site, adenosine samples a wide range of poses in its large binding site [18, 33] performing transient interactions with many residues of the PEN, thus establishing multiple transient energy communication that propagates towards the TM6 intracellular end (see Figure S4).

Protein energy networks (PEN) of A1R-ADO along activation.

(A) PEN analysis of the A1R conformational ensemble. The PEN identifies extra and intracellular communication centers together with the allosteric pathways that interconnect them. The PEN residues (nodes) are represented by colored spheres as a function of the receptor region (e.g. TM6 nodes in teal) while the allosteric pathways (edges) as yellow-orange sticks. The size of each edge and node correspond to their importance for the allosteric communication. The experimentally identified allosteric residues captured in the PEN nodes are labeled with a red asterisk. (B) Relevant interactions of the PEN in the upper region of the receptor. (C) PEN analysis of the A1R ensemble split by conformational states (inactive, intermediate and pre-active). The allosteric communication is enhanced along the receptor activation. (D) Relevant interactions found in the PEN of the lower region of the receptor along the inactive, intermediate and pre-active conformational states.

In order to provide further insights in the allosteric communication along the activation pathway we decided to split the analysis into the inactive, intermediate and pre-active states (Figure 2C and S5). We started the analysis with the inactive PEN, which shows that the extracellular ECL2 center is only connected to the intracellular region through TM2 and TM1. In the intracellular region, TM3 (R108) establish a tight communication with TM3 (D104), TM6 (D229) and C-Ter (D326), see Figure 2 C and D. This hydrogen network is signature of the inactive in-ward conformation of TM6. In fact, TM6 (D229)-TM3 (R108) interaction is pivotal to attain the inactive conformation of the receptor. Interestingly C-Ter (D326) plays an important role in the energy communication. The flexible C-Ter region can partially occupy the G-proteins binding site and perform stable communication with TM3 (R108). We follow the analysis with the PEN of the intermediate ensemble. In this case, the extracellular communication center is reduced and ECL2/ECL3 region is connected to the intracellular region through TM7 and TM1. Interestingly, the communication through TM6 start to take place partially in the intracellular region. In contrast with the inactive ensemble, TM6 (D229) now communicates with TM3 (R105) instead of TM3 (R108) in the intracellular region, see Figure 2 C-D. This transient interaction is key to prevent TM6 opening after the tension generated in the last segment of TM6 and provokes a torsion in the TM6 intracellular end that captures the receptor in an intermediate state (described above). In this scenario, TM3 (R108) communicates with TM2 (D42) and TM3 (D104). Note that the C-Ter does not communicate with TM3, which explains the transience of this interaction due to the C-Ter flexibility. Finally, in the pre-active ensemble, the extracellular ECL2 networks are recovered and connect to the intrahelical region through multiple pathways, including TM2, TM4, TM5, TM6 and TM7. The partial communication through TM6 observed in the intermediate state progresses further and it is completed linking with TM5 (Figure 2 B). This enhanced communication between the intra-and extracellular region is driven by the complete opening of TM6. As expected, the TM6 opening disconnect TM6 (D229) from TM3 (R108 and R105). Indeed, the TM3 (R108) node loses prominence in the intracellular region. However, TM3 (R108) still can communicate with C-Ter (D326), which may compensate the break of TM3-TM6 ends (Figure 2 C and D). Complementary insights are gained by computing relevant interactions of the intracellular communication center along activation (Figure S6).

At this point, we wonder if the positions captured in the PEN coincide with allosteric residues previously identified in mutagenesis studies that affect the allosteric responses of the receptor. For the ECL2/ECL3 extracellular region 9 allosteric residues have been identified that affect the efficacy of orthosteric agonist [22, 34]. Among all of them, up to 7 are captured in the ECL2/ECL3 PEN communication center (W146, N148, K173, K265, E170, S150 and T270) and the other 3 are located in the ECL2 alpha helix that connects with the PEN through S150. In addition, other 4 allosteric residues located in a TM1-7 extrahelical region have been recently reported to be important for PAM cooperativity. [18] The PEN directly captures S246 and L242 and the other two residues (L245 and G279) are located adjacent to the PEN residues H278 and S246, respectively. Note that the experimentally identified allosteric residues captured in the PEN are labeled with a red asterisk in Figure 2 A-C. The high predictive power of the PEN to identify allosteric residues highlights the reliability of the characterized allosteric pathways. It is worth mentioning that among the allosteric residues captured in the PEN all of them are identified in the pre-active ensemble while only some of them are identified in the inactive and intermediate states. This is due to the enhanced allosteric communication observed upon activation. In summary, the PEN analysis along the conformational states captures many important positions for the allosteric mechanisms of the receptor and provides a dynamic view of the protein energy communication along the activation of the receptor. The TM6 in-ward to outward transition reveals that the TM6 allosteric pathway along the receptor is missing in the inactive state, partially formed in the intermediate state and completely established upon the receptor activation. Indeed, the pre-active state is characterized by an increased allosteric communication though multiple pathways.

G-Proteins binding stabilizes the fully-active state and fine-tune the allosteric communication

The next unknow we were intrigued to investigate is how the binding of G-proteins affects the A1R conformational ensemble and the allosteric networks. To that end, we performed 3 cMD replicas of 500 ns of A1R in presence of both, adenosine and heterotrimeric Gi2 protein (referred as ADO-A1R-Gi2 complex). As previously, the MD data was plotted as a function of the TM6 torsion and TM3-TM6 ends distances (Figure 3A). The population analysis of the receptor activation shows that the presence of the slim Gαi2-α5 helix in the A1R intracellular cavity prevents TM6 to sample intermediate and inactive conformations. Indeed, A1R-Gi2 displays restrictive conformational dynamics by only sampling one major conformational state. This conformational state overlaps with both, the pre-active state and the fully-active state corresponding to the active Cryo-EM structure (Figure 3B-C). Thus, the binding of the G-proteins induces a population shift in A1R towards fully-active conformations, as described previously in other GPCR receptors. [11-14].

Effect of G-proteins binding in the A1R-ADO conformational ensemble.

(A) Population analysis of A1R activation in the A1R-ADO-Gi2 complex obtained from conventional molecular dynamics (cMD) simulations, the inactive and active X-ray and Cryo-EM coordinates are projected as blue and magenta stars, respectively. (B) Projection of the A1R-ADO-Gi2 energy minima obtained from cMD over the FEL associated with the A1R activation obtained from metadynamics simulations. The A1R-ADO-Gi2 energy minima (depicted in red) is centered on the coordinates of the active Cryo-EM structure. (C) Overlay of the active Cryo-EM structure (PDB 6D9H) and a representative snapshot from the A1R-ADO-Gi2 energy minima.

To study the allosteric pathways of the A1R-ADO-Gi2 complex, we computed the PEN considering the intra (A1R-A1R) and inter (A1R-Gαi2) interactions. The PEN analysis shows some similarities respect to the PEN of the pre-active ensemble (Figure 4). In both cases, TM6 allosteric communication is completed and TM7 presents an important communication pathway connecting the intra-and extracellular regions. However, the presence of G-Proteins finely-tunes the PEN in some aspects: (1) The ECL2 extracellular communication center propagates toward the ECL2 helix capturing the W156 allosteric residue that was not identified in previous PEN analysis. (2) The communication pathways between extra and intracellular regions are refined only passing through TM2, TM1, TM6 and TM7 and not using TM4 and TM5. Certainly, TM7 becomes the major communication pathway and another allosteric residue G279 is captured as an important communication node. (3) As expected, the C-Ter does not communicate with TM3 because the end of the α5 helix of Gαi2 (Gαi2-α5 helix) replaces the C-Ter communication position. As a consequence, TM3 (R108) communicates with TM2 (D42) and most importantly with Gαi2-α5 helix (D667). The latest becomes a communication hub in the intracellular region by also connecting with H8 (K294). The inclusion of Gαi2 also causes that TM6 (E229) communicates with TM5 (R208), Figure 4. Thus, the receptor signaling attains a specific profile of protein energy communication networks triggered by the Gαi2 protein coupling. Identification of transient pockets as potential allosteric sites. The identification of transient pockets formed along specific receptor activation profiles is useful to guide the design of allosteric drugs. We use a geometry-based algorithm (MDpocket)[35] in order to search for transient pockets in the inactive, intermediate, pre-active and fully-active ensembles of A1R. The MDpocket output provides a normalized frequency map that allows the visualization of the frequency of a pocket formation along each conformational ensemble studied.

Effect of G-proteins binding in A1R-ADO Protein Energy Networks (PEN).

The PEN residues (nodes) are represented by colored spheres as a function of the receptor region (e.g. TM6 nodes in teal) while the allosteric pathways (edges) as yellow-orange sticks. The size of each edge and node correspond to their importance for the allosteric communication. The experimentally identified allosteric residues captured in the PEN nodes are labeled with a red asterisk. Relevant interactions of the PEN in the upper and lower regions of the receptor are also shown.

We identified multiple pockets in the upper and lower regions of the receptor. Interestingly, we observe that pockets can change its shape and frequency of formation along the different conformational states (Figure 5 A). Focusing in the upper region pockets (PA-D), PA is present the ECL2 vestibule with low frequency. This region has been proposed to bind A1R PAMs on the basis of computational and mutagenesis studies.[22, 33, 34] PB is located at the adenosine binding site and extends to the adjacent secondary pocked observed by X-ray data in the inactive A1R structure [21]. Due to the PB large size, it overlaps with many allosteric modulators observed in class A (PDBs 4MQT[36], 5NDD[37], 4MBS[38]) and class C (PDBs 5CGC[39], 5CGD[39], 6FFH[40] and 6FFI[40]) GPCR structures. PE corresponds to the protein internal channel and overlaps with the NAM sodium Ion site (PDB 4N6H)[41], which is narrowed along activation (Figure 5A). PD corresponds to the MIPS521 PAM shallow pocket exposed to the lipid-detergent interface (PDB 7LD3)[18]. Up to date, this is the only allosteric modulator crystallized for A1R. Experimental assays and accelerated MDs determined that MIPS521 PAM increases the binding affinity of ADO in the orthosteric site[18]. Thus, PB and PD must be allosterically coupled. Among MIPS521 PAM pocket residues, only L242, L245, S246 and G279 were found to affect the PAM cooperativity. Interestingly, we capture TM1-TM6-TM7 PEN along activation including TM6 (L242 and L245) in the intermediate, L242 and S246 in the pre-active and L242 and G279 allosteric residues in the fully-active ensemble (Figure 5 B). Indeed, G279 becomes a key node in the PEN of the fully-active ensemble (Figure 4). This evidences that the energy coupling between PD and PB is enhanced along the receptor activation. We also identify a pocket (PC) located in an extrahelical TM1 and TM7 region. It matches with the polar region of the Monooleoylglycerol molecule captured in a GPCR structure (PDB 4MBS) 35. PC is energetically coupled with TM1 and TM7 PEN residues. For a complete description all pockets location and containing PEN residues, see Table S1 and S2.

Energy coupling between the transient pockets formed along activation.

(A) Iso-surface representation of the normalized frequency map (set at Φi =0.2 iso-value) obtained from MDPocked in the inactive (blue), intermediate (teal), pre-active (violet) and fully-active (red) ensembles. The transient pockets are labeled from A to N. The allosteric modulators that overlap with the pockets found are shown as dark gray sticks (PDB 7DL3, 5TZY and 5LWE) and spheres (PDB 4N6H). (B) Overlap of the transient pockets and the protein energy networks (PEN). The PEN residues (nodes) are represented by colored spheres as a function of the receptor region (e.g. TM6 nodes in teal) while the allosteric pathways (edges) as yellow-orange sticks.

Regarding the lower part of the receptor, both the outer surface (extrahelical) and the inner surface (intrahelical) presents many cavities. Some of them are open along the activation pathway (PF, PL) and others are more predominant (PJ, PH) or only formed (PM, PK, PN) in some states. PF is formed in all states and matches with the AP8 allosteric modulator site found in the free fatty acid receptor 1 (PDB 5TZY)[42] (Figure 5A) and also with cholesterol and other lipid molecules found in other GPCR structures (PDB 5LWE[43], 4PHU[44], 5TZR[42] and 4XNV[45]). Among GPCR allosteric modulators co-crystalized in the intracellular surface, only Vercircon (5LWE) [43] overlaps partially with PL, Figure 5A. This suggest that the shape and location of the A1R pockets located in the lower part is highly system-specific. In fact, the shape of the lower region changes substantially along activation. PG and PF pockets are formed from the intermediate to the fully-active state and are coupled to TM5-TM6 and TM2-TM3-TM4 PEN residues, respectively. PM supposes an interesting pocket because it is only predominant in the intermediate state. In addition, it contains many PEN residues of the extrahelical TM1-TM2-TM4 region. PK is only formed in the intermediate and pre-active states and contains PEN residues of the extrahelical TM1-H8 region, while PI is only formed in the pre-active and fully-active states and is energetically coupled with TM6 and TM7. PH and PN are located in the extrahelical TM5-TM6 region. PH is more predominant in the pre-active and fully-active state containing many PEN residues while PN is only formed in the pre-active and fully active states but it is less energetically connected (Figure 5 B). The overlap between pockets and PEN provides a view of how the distinct pockets are allosterically coupled between them and with other functional regions of the receptor (Figure 5 B and Table S1-2).

DISCUSSION

A comprehensive knowledge of the allosteric properties of receptors is essential for the design of allosteric drugs. It involves the deciphering of activation conformational landscapes, the decoding of allosteric networks and the characterization of transient pockets that are allosterically coupled with functional regions of the receptor. Here, we focus on the A1R due to the failure in the development of efficient and safe allosteric modulators reported up to date. [18, 22, 46]

The A1R activation FEL associated with the TM6 inward-to-outward transition reveals that the receptor is in dynamic equilibrium with inactive, intermediate and pre-active states, where the inactive is the most stable and the intermediate and pre-active states are separated by a lower energy barrier. This fast-conformational transition observed in presence of adenosine may favor the binding and activation of G-proteins. [12, 13] Regarding the activation pathway, the inactive state of TM6 that resembles the inactive X-ray structure progresses towards an intermediate state that is characterized by a rather high TM6 torsion and a modest TM6 opening due to the tight TM3-TM6 energy coupling in the intracellular ends. This results in a torsion in the end of TM6, which is signature of the intermediate conformations. This torsion evolves to a more stable intermediate state that exhibit higher TM3-TM6 distances. Finally, TM6 progresses towards a complete opening reaching the pre-active state, in which the fully-active Cryo-EM structure is not highly populated. Interestingly a larger outward movement of the TM6 is accessible, as captured in other class A GPCR receptors.[47, 48]. Upon G-proteins coupling, the slim Gαi2-α5 helix induces a population shift towards TM6 fully-active conformations that resembles the active Cryo-EM structure, and the intermediate and inactive states are not accessible. This data is in line with the combined activation mechanisms described in many GPCRs by means of NMR and computational studies, which states that the combined action of both, agonist and G-proteins is essential to stabilize the adoption of a less dynamic fully-active state. [11, 13, 15, 16]

The conformational progression obtained from the inactive to the adoption of fully-active conformations are relevant to target A1R drug specificity, specially the hidden intermediate and pre-active states. These conformational ensembles are also valuable to compute the allosteric networks of the receptor along the activation pathway. Such analysis provides a dynamic view of how the allosteric communication evolves along activation. The analysis of the interaction protein energy networks (PEN) captured the extra and intracellular communication centers together with the allosteric pathways that interconnect them. Among the PEN residues, we successfully captured most of the allosteric residues previously identified by mutagenesis studies, which highlights the reliability of the allosteric networks computed.

Focusing on the intracellular region, a tight allosteric commutation is observed in the inactive ensemble between TM3 (D104), TM3 (R108), TM6 (D229) and C-Ter (D326), which is key to attain the TM6 in the inactive state. The strong TM6-TM3 hydrogen network weakens in the intermediate state and TM6 (D229) now communicates with TM3 (R105). This transient communication prevents the TM6 opening and captures the receptor in intermediate conformations, that are characterized by a structural torsion in the TM6 end. In the pre-active ensemble TM6 opening provokes the complete loss of TM3-TM6 communication. Interestingly, the flexible C-ter (D326) performs transient communications along the receptor activation with TM3 (R108) and TM4 (K106). These interactions could compensate the TM6-TM3 break upon activation. However, the C-ter (D326) communication takes place in the region where the G-proteins bind. Thus, the binding of Gαi2-α5 helix must also compensate for the release of the C-ter from the G-proteins binding site. In fact, Gαi2-α5 helix (D340) replaces the C-terminal (D326) communication with TM3(R108) becoming a hub node in the intracellular allosteric center. Regarding the allosteric pathways, we observe an enhanced communication between the intra-and extracellular centers upon activation. In contrast with the inactive and intermediate states, in the pre-active state multiple energy pathways arises including TM2, TM4, TM5, TM6 and TM7. This is evidenced by the lack of TM6 communication in the inactive state, the partial TM6 communication in the intermediate state and the complete TM6 communication in the pre-active state. Interestingly, the presence of the G-proteins fine-tunes the allosteric communication profile only passing through TM2, TM1, TM6 and TM7. These insights may be interesting for future biased agonism studies. We hypothesize that the binding of drugs that potentially favor the establishment of specific PEN profiles may result in the discrimination between different intracellular partners by preferentially communicating with one of them. Thus, avoiding side effects by activating desired pathways.

Finally, we studied the connection between transient pockets and PEN along the receptor activation in order to unravel the allosteric coupling between pockets and distinct functional regions of the receptor. The dynamic shape of the cavities observed in the different conformational states highlights the importance of ensemble docking in drug design procedures.[49] We believe that drugs capable to perform interactions with PEN residues contained in pockets may potentially alter the receptor activity through allosteric effects. In addition, pockets transiently formed in the intermediate and pre-active states may provide further drug specificity. Following this reasoning PG, PJ, PK, PM and PI are interesting pockets to target allosteric modulation.

CONCLUSIONS

With the combination of free energy landscape construction, the decoding of allosteric networks and the calculation of transient pockets we successfully capture hidden conformational states, the allosteric communication centers/pathways and the transient pockets formed along the A1R activation. Most importantly, by an in-depth study of the connection between these three elements we provide a complete dynamic view of the A1R activation. Specifically, we observe that the allosteric communication is progressively enhanced between the extra and intra allosteric centers throughout the inactive, intermediate and pre-active states to become finely-tuned upon the binding of G proteins in the fully-active state. In fact, not only the allosteric networks are dynamic but also the shape and frequency of formation of the transient pockets change along the different conformational states. Overlap of the energy networks and transient pockets uncover how the allosteric coupling between pockets and distinct regions of the receptor is altered along the receptor activation pathway. All this system-specific structural dynamic information is essential to ease the design A1R allosteric modulators on the basis of structure-based drug design. This computational approach can be also applied to other GPCRs and related receptors, which is of interest for the design of novel allosteric drugs.

MATERIALS AND METHODS

Conventional Molecular Dynamics (cMD) simulations

System preparation

In total, three systems were prepared: the inactive conformation of A1R in complex with adenosine (A1R-ADO, inactive), the active conformation in complex with Adenosine (A1R-ADO, active) and the active conformation in complex with adenosine and heterotrimeric Gi2 proteins (ADO-A1R-Gi2, active). For the A1R-ADO inactive system, we used the inactive structure of A1R in complex with PSB36 (PDB 5N2S)[50] as starting structure. PSB36 was manually removed and adenosine was placed in the orthosteric site by structural alignment with the active Cryo-EM structure of A1R (PDB 6D9H)[20]. The A1R-ADO active system was obtained by manually removing the G-proteins from PDB 6D9H. Finally, ADO-A1R-Gi2 active system was obtained from PDB 6D9H. The crystal structure (PDB 2ODE)[51] was used to reconstruct the missing Gi2 protein regions and also to place Guanosine-5’-Diphosphate (GDP) and the magnesium ion (Mg+2) by structural alignment. The Modeller software[52] was used to reconstruct the missing X-Ray and Cryo-EM regions. MD parameters for the ligands (ADO and GDP) were generated with the parmchk module of AMBER20[53] using the general amber force-field (GAFF). The atomic charges (RESP model) were obtained using the antechamber module of AMBER20, with partial charges set to fit the electrostatic potential generated at HF/6-31G* level of theory using Gaussian 09[54]. Internal water molecules highly conserved among GPCRs were incorporated into the A1R internal channel of the inactive and active systems using the HomolWat web server tool.[55] All systems were filled into a simulation cell composed by a phosphatidylcholine (POPC) membrane solvated at NaCl 0.15 nM using the packmol memgen tool implemented in AMBER20. The force-fields selected to describe the different molecule types for the MD simulations were ff14SB (protein), GAFF2 (ligands), Lipid17 (membrane) and TIP3P (waters). In addition, two disulfide bonds were created between C80-C169 and C260-C263 residues in A1R.

Molecular dynamics protocol

The systems were minimized in a two-stage geometry optimization approach. In the first stage, a short minimization of the water molecules positions, with positional restrains on the protein, ligand and P31 atoms of the membrane was performed with a force constant of 10 kcal mol-1 Å-2 at constant volume periodic boundaries conditions. In the second stage, an unrestrained minimization including all atoms in the simulation cell was carried out. The minimized systems were gently heated in two phases. In the first phase, the temperature was increased from 0K to 100K in a 40 ps step. Harmonic restrains of 10 kcal mol-1 Å-2 were applied to the protein, ligand and membrane. In the second phase, the temperature was slowly increased from 100K to the production temperature (303.15K) in a 120 ps step. In this case, harmonic restrains of 10 kcal mol-1 Å-2 were applied to the protein, ligand and P31 atoms of the membrane. The Langevin thermostat was used to control and equalize the temperature. During the heating process the initial velocities were randomized. For the heating and following steps, bonds involving hydrogen were constrained with the SHAKE algorithm and the time step was set at 2 fs, allowing potential inhomogeneities to self-adjust. The equilibration step was performed in three stages. In the first stage, an MD simulation of 5 ns under NVT ensemble and periodic boundary conditions was performed to relax the simulation temperature. In a second stage, an MD simulation of 5 ns under NPT ensemble at a simulation pressure of 1.0 bar was performed to relax the density of the system. The semi-isotropic pressure scaling using the Monte Carlo barostat was selected to control the simulation pressure. In a third stage, an additional MD simulation of 10 ns was performed to further relax the system. After the systems were equilibrated in the NPT ensemble, we performed 3 independent MD production runs of 500 ns each (i.e. 1.5 µs accumulated time for each system). A 11 Å cutoff value was applied to Lennard-Jones and electrostatic interactions.

Metadynamics simulations

Collective Variables

Metadynamics is a powerful method to construct complex free energy landscapes of proteins as function of a few low-dimensional descriptors, also referred as collective variables (CVs).[56, 57] In this work, we selected the TM3-TM6 intracellular ends distance as the first collective variable (CV1). Specifically, we computed the center of mass (COM) distances between the backbone atoms of TM3 (105-109) and TM6 (226-230). For the second collective variable (CV2), we relied on the TM6 torsion, which was described by the dihedral angle formed by the alpha carbon atoms of TM7(T270 and T277) and TM6 (W247 and L236). These two CV were used to describe and monitor the TM6 inward-to-outward transition (A1R activation). Well-Tempered and multiple-walkers metadynamics simulation protocol: The PLUMED2.7[58] software package together with AMBER20[53] were used to carry out the metadynamics simulations. During a metadynamics simulation, external energy quantities (Gaussian potentials) are added at a regular number of MD steps to a selected CVs (see above). This bias potential encourages the system to escape from local energy minima and overcome energy barriers, thus allowing for an enhanced sampling of the CV conformational states.[59] After sufficient simulation time, the bias potential converges and the FEL can be reconstructed by summing the Gaussian potentials added to the CV values along the simulation time. Here, Gaussian potentials of height 0.5 kcal mol-1 and width of 0.25 (CV1) and 0.015 (CV2) were deposited every 2 ps of MD simulation at 303 K. For a smooth convergence of the bias potential we used the well-tempered (WT)[60] version of metadynamics algorithm, in which the height of the gaussian potentials were gradually decreased over time proportional to the potential deposited in the currently visited point of the CV space. A bias factor parameter of 10 was selected to control how quick the Gaussian height is decreased. In addition, the multiple-walkers approach[61] was used to improve the conformational sampling and to speed up the metadynamics simulations. It is based on running in parallel interacting replicas (walkers) where each walker biases the identical CVs and reads the Gaussian potentials deposited by the others during the simulation, thus reconstructing the same metadynamics bias simultaneously. In particular, we run ten walker replicas. The ten walker structures (W1-10) used as starting points for the walker metadynamics simulations were carefully selected from the initial conformational sampling of the cMD simulations. Specifically, we selected 5 walker structures (W1-5) from the cMD starting from the inactive X-Ray and 5 walker structures (W6-10) from the cMD starting from the active Cryo-EM in order to provide a path of conformations that encompasses the conformational states sampled along the A1R activation pathway (Figure S3). Each walker replica was run for 25 ns, giving a total of 250 ns. Finally, the FEL of the TM6 inward-to-outward transition was completely reconstructed by summing the Gaussian potentials deposited by all walker replicas as a function of the CVs.

Convergence

An indicator of convergence consists in observing that the free energy surface does not change significantly over time. We estimate the convergence of the recovered FEL monitoring the free energy difference (ΔΔG) between the local energy minima of the activation conformational surface along the simulation time. In particular, we calculated the inactive-active, the inactive-intermediate and the active-intermediate local energy minima differences (ΔΔG), see Figure S4. The metadynamics simulations were considered to be converged once we observed that with increasing simulation time, the energy differences between the energy minima tend to flatten. In other words, once the free energy surface does not change significantly during a relatively long period of time in the last part of the simulation. This criterion resulted in an accumulated time of 250 ns.

Protein Energy Networks (PEN) analysis

Mean Interaction Energy Matrix (IEM)

The get Residue Interaction eNergies and Networks (gRINN) [30] tool was used to calculate the pairwise residue interaction energies along the different conformational ensembles in order to obtain their respective mean interaction energy matrixes (IEM). The analysis was performed considering all residue pairs of A1R, which resulted in 326×326 matrixes. For the complete A1R conformational ensemble, we used one out of two conformations sampled in the metadynamics simulations resulting in 6,250 structures. For the analysis by conformational states, we split the whole conformational ensemble obtained from metadynamics simulations (12,500 structures) into the inactive, intermediate and pre-active ensembles, resulting in 3,648 structures for the inactive ensemble, 3,896 structures for the intermediate ensemble and 4,404 structures for the pre-active ensemble (Figure S5). The analysis of the ADO-A1R-Gi2 complex was performed using 5,000 representative structures from the cMD. For this case, we consider both the intra (A1R-A1R) and inter (A1R-Gαi2) residue pairs of interactions. In all matrixes, the positive energies (repulsive interactions) were set to 0 and the negative energies (attractive interactions) were converted to absolute values. In a second step, the matrixes were normalized. Hence, the attractive interactions were weighted containing values that range between 0 and 1.

Shortest Path Map

To study the allosteric communication by means of protein energy networks (PEN) we processed the normalized mean IEM into to Shortest Path Map (SPM) tool. [27, 31, 32] The SPM algorithm first constructs a network graph in which only the pair of residues with a normalized mean interaction energy (IE) higher than 0.1 are considered as nodes. The length of the edge connecting pair of residues (nodes) is drawn according to their normalized mean IE value (dij=-log |IEij|). Thus, higher normalized mean IE values (closer to 1) will have shorter edge distances, whereas lower normalized mean IE values (closer to 0.1) will have edges with long distances. At this point, we apply the Dijkstra algorithm to identify the shortest path lengths and generate the SPM graph. The Dijkstra algorithm operates through all nodes of the initial network graph and determine the shortest path to go from one node origin to all other nodes. The exploration is over when all nodes have been targeted as origin. In the SPM graph, the width of each edge and the size of each node are proportional to the number of shortest paths passing through that edge or node during the exploration. The method therefore offers the visualization of which nodes and edges are more frequently used for going through all residues of the protein, i.e. they are more central and significant for the communication pathway.

Transient pocket analysis

The MDpocket[35] tool was used to detect transient pockets in the inactive, intermediate, pre-active and fully-active ensembles. MDpocket is a fast geometry-based algorithm that relies on the concept of alpha spheres and makes extensive use of Voronoi tessellation during the cavity detection. The output provides a normalized frequency map allowing for an iso-surface representation. In the frequency map, the iso-values (Φi) ranges from 0 to 1, allowing for visualization of both permanent (Φi =1) and transient pockets (0<Φi < 1). Since we were interested in the detection of transient pockets along the receptor activation we used aΦi =0.2 iso-value for the display of pockets in all conformational ensembles studied.

Supporting information

Supplementary Information

Acknowledgements

This work was supported by the Mid-career Researcher Program (NRF-2020R1A2C2101636), the Bio & Medical Technology Development Program (NRF-2022M3E5F3080873), and the Brain Pool Program (NRF-2021H1D3A2A02038434) funded by the Ministry of Science and ICT (MSIT) through the National Research Foundation of Korea (NRF). It was also supported by the Ewha Womans University Research Grant of 2021. We are grateful to the Korea Institute of Science and Technology Information (KISTI) Supercomputing Center for providing computing resources and technical assistance (KSC-2020-CRE-0359). We thank Prof. Ferran Feixas, Dr. Raudah Lazim and Dr. Maninder for the helpful discussions and comments on the manuscript.