Introduction

G-Protein Coupled Receptors (GPCRs) act as molecular telephones and transmit signals across the cellular membrane by associating with G-proteins1, 2 or β-arrestins.3 The process of signal transduction generally involves GPCRs binding to effectors (agonists) which aid the shift in conformational equilibrium, facilitating the receptors to transition to an ’active’ state. Activation allows the receptor to associate with intracellular binding partners, terminating the process of signal transduction.4 GPCR activation is an area of active research - with studies establishing conserved structural motifs like the E/DRY, NPxxY48 in Class A and PxxG, HETx9 in Class B receptors acting as molecular switches that stabilize the inactive state. Unlike Class A and B GPCRs, activation of Class F receptors (SMO, FZD110) is still poorly understood. A primary reason for this elusiveness is that these receptors share none of the structural motifs seen in Class A/B, and have less than 10% sequence similarity to Class A receptors. Since GPCRs are involved in mediating virtually every physiological response - they are crucial drug targets, as 34% of all FDA approved drugs target one of these proteins.10

Smoothened (SMO) is a transmembrane protein from the Class F family of GPCRs. Class F consists of proteins that are involved in maintaining tissue homeostasis and regenerative responses in adults, and are crucial in embryonic development, as they regulate cellular diff tiation by binding to sterol and WnT ligands.1114 SMO is expressed in tissues throughout the body, particularly in cerebellar and pituitary tissue,15 and is a member of the Hedgehog (Hh) signaling pathway. When the endogenous inhibitor of SMO, a membrane protein Patched (PTCH), is inhibited by Sonic Hedgehog (ShH) binding, SMO translocates to the ciliary membrane, and undergoes conformational transitions (activation) to bind to its intracellular signaling partner Gi.16, 17 How PTCH inhibits SMO is still unclear. However, multiple studies have described PTCH’s inhibition on SMO as acting through reducing SMO’s accessibility to membrane cholesterol.18 A recent study described the effect of PTCH on the cholesterol accessibility of the upper leafl suggesting that PTCH inhibits SMO by either transporting cholesterol to the inner leafl , or to an extracellular acceptor.19 Hh signaling is critical to embryonic development, and any changes in signaling can lead to severe birth defects.20 Cyclopamine, a naturally occuring alkaloid in corn lily, has been identified as a teratogen (agents responsible for birth defects in infants),21 and was responsible for birth defects in lambs in Idaho in the 1950s.22 It was identified later that cyclopamine’s mechanism of action involved inhibiting HH signalling by binding to SMO.2325 On the other hand, overstimulation of HH signaling via SMO has been linked to the pathogenesis of pediatric medulloblastoma and basal cell carcinoma. 26, 27 Vismodegib28 and Sonidegib29 are two FDA approved drugs that target SMO, but are prone to chemoresistance.30 Therefore, understanding activation mechanisms of Class F GPCRs is hence critical to design novel therapeutics.

Structures of SMO bound to agonists and antagonists outline the effects of allosteric and orthosteric modulators binding on SMO activity. These structures show the existence of two primary binding sites in SMO - in the CRD (Cysteine Rich Domain), which binds agonists cholesterol,31 cyclopamine 32 and in the TMD, which binds both antagonists LY2940680,33 SANT1 and AntaXV,34 cyclopamine, 35 TC114,36 Vismodegib31 and agonists SAG1.5,34 SAG,37 SAG21k,38 24,25 epoxycholesterol,39 and cholesterol.37 Mutagenesis studies have outlined the presence of an intracellular W7.55-R6.32 cation-π lock40 in Class F that is broken on activation (Fig. 1A), with mutations that disrupt this lock lead to increased agonist potency and pathway selection (Superscripts refer to the Ballesteros-Weinstein numbering system used to denote GPCR TM residues41). On the extracellular end, for SAG-bound SMO, the D-R-E network is broken in active SMO34 (Fig. 1A). The intracellular end of active SMO shows rearrangements in TM6 (outward, Fig. 1B), TM3 (outward, Fig. 1C) and TM5 (inward, Fig. 1D). These studies paint a static picture of how SMO activity can be attributed to structural rearrangements; however, a dynamic understanding of the process of SMO activation remains to be understood. Hence to provide a dynamic overview of activation, we simulated 250 µs Apo-SMO (no ligand bound) to understand SMO’s activation process in atomistic detail. Moreover, it has been shown that PTCH modulates SMO activity by controlling its access to membrane cholesterol18, 42 which then travels through a hydrophobic tunnel inside SMO to access the primary ligand binding site in CRD, showing an expanded tunnel in active SMO (Fig. 1A). Hence, we simulated agonist bound (SAG-SMO) and antagonist bound (SANT1-SMO) to explore the effects of bound modulators on SMO activity, and the mechanisms of action for these molecules. Using a highly parallel adaptive sampling based approach and constructing a Markov state-model (MSM),43, 44 we probe submillisecond dynamics of SMO, and show that SMO activation involves a intra-cellular structural motif that is conserved across Class F receptors. MSMs have been used to model membrane protein behavior at varied timescales, to probe activity of membrane transporters,4547 as well as to study conformational dynamics of signaling proteins.8, 4855 In particular, Markov State Models have been employed to investigate conformational dynamics of GPCRs, such as β2-AR,8, 52, 56 Dopamine D3 receptor,48 µ-opioid receptor,54, 55 Chemokine receptor CCR2,49 and Cannabinoid Receptors 1,2.50, 51 Using MSMs, we outline the involvement of multiple CRD-TMD salt-bridges that are rearranged during SMO activation, establishing a role for the CRD in SMO activation. We show that the hydrophobic tunnel inside SMO expands in the presence of an agonist, and is occluded by the antagonist. These observations are amenable to experimental observations that bolster the cholesterol transport-like activity of SMO. We then use a mutual-information based approach to outline the allosteric mechanisms through which the agonist SAG operates, i.e. by changing the allosteric pathways in SMO to more active-like SMO. These observations provide a detailed and atomistic in-depth view of SMO activation, and may aid in design of antagonists for cancer therapy.

Major structural changes during SMO Activation. (A) Comparison of the broken D-R-E network and the W-R ionic lock, and the expanded tunnel, in inactive (green) vs active (red) SMO (B)-(D) Comparison of inactive and active SMO, indicating the outward movement of the TM6 and TM3 and inward movement of TM5 in active SMO.

Results and Discussion

SMO activation involves a conserved molecular switch

To probe the transitions SMO undergoes during activation, SMO was simulated in a ligand-free form (Apo-SMO) from two starting points - inactive and active crystal structures. Simulations were performed using a parallel approach - by clustering the existing data based on selected features (feature selection explained in Methods) and seeding the next round of simulations by randomly selecting starting points from the least populated clusters - a technique known as Adaptive Sampling.57 (Fig. S1, Table S1). The high dimensionality of the data was reduced by transforming it using time-Independent Component Analysis (tICA).58, 59 tICA uses a linear combination of the supplied features to identify the slowest collective degress of freedom in the data by computing the time-lagged autocorrelation. The fi two tICA components account for the two slowest processes associated with activation(Fig. S2). The active and inactive structures were separated majorly in the fi tICA component (tIC 1), indicating that activation was the slowest process observed in simulations. Hence, features that were highly correlated with tIC 1 (Fig. S3) were considered pivotal to activation. The tICA-transformed data was clustered - dividing the data into kinetically distinct microstates. A Markov-State Model (MSM) was constructed on the clustered data to compute the transition rates between microstates, and to reweigh the data, eliminating the bias introduced by adaptive sampling.

At the intracellular end, we observe that W3393.50 shows a very dramatic reorientation on receptor activation, moving outwards from the center of the TM bundle, to accommodate the bound Gi. W3393.50 is conserved across all Class F receptors (Fig. S4). Upon further analysis, we observed that this rearrangement extends to include M4496.30 and G4536.34 (outward movement) , G4225.61 (translation) (Fig. 2A), as well as W5357.55 (inward rotation) - residues that are all conserved across the entire Class F family (Fig. S4).

M4496.30’s outward movement is a proxy for the outward movement of TM6 a process associated with canonical GPCR activation.;37, 39 However, instead of kinking outwards as observed in Class A/B receptors, TM6 in SMO undergoes translation, to accommodate Gi. This can be attributed to the absence of P6.43, a residue conserved across FZDs. P6.43 is replaced by F4626.43 in SMO - thereby increasing its rigidity and resistance to developing kinks.60 Recently published structure of active FZD761 shows this kink at P6.43. TM5 on the other hand, shows slight inward translation. To capture these outlined movements, we projected the entire Apo-SMO data onto W3393.50 – M4496.30 (TM3-TM6 distance) v/s W3393.50 – G4225.39 (TM3-TM5 distance) and computed the free energy associated with each state(Fig. 2B, Fig. S5). The free energy plot shows that this TM3-5-6 rearrangement follows a stage-wise process - with TM6 moving outwards fi by 4 Å , (State 1 in Fig. 2B) followed by the rest of the TM3 outward movement after a slight outward rearrangement in TM5 (State 2 in Fig. 2B). The overall free energy barrier for this rearrangement is 2.5 kcal/mol. The outward movement of TM6 is analogous to class A receptor activation (Fig. S6 A,B). A conserved molecular switch mediating SMO’s activation on the intracellular end is similar to the breakage of molecular switch E/DRY in Class A GPCRs- with W3393.50 being the residue analogous to R3.50 (Fig. S6 C,D). Hence, we posit that this conserved molecular motif (W-G-M) is integral to Class F receptor activation, and provides a basis for activation across the entire Class F receptors, while also showing the uniqueness of activation of Class F receptors.

Molecular metrics integral to SMO Activation. (A) Rearrangement of the WGM motif, a conserved molecular switch across class F GPCRs, undergoes rearrangement on SMO activation. (B) Relative free energies from MSM-weighted simulation data plotted on the TM3-TM6 distance vs TM3-TM5 distance measured at residues W3393.50, M4496.30 and G4225.61. (C) Breaking of the D-R-E network on the extracellular end of the TMD. (D) SImilar to (B), but for TM3-TM6 distance vs the D-E distance. (E) The ionic lock breaks by the sidechain rotation of W5357.55. (F) Same as (B) but for TM3-TM6 distance vs χ2 dihedral measured at W5357.55.

The crystal structure of SMO bound to the synthetic agonist SAG1.5 gives clues about the activation-specific residue-level rearrangements that occur on the extracellular end of SMO. D4736.54 has been established as a residue critical to SMO activity, as it forms a part of SMO’s core TMD ligand binding cavity, and is shown to interact with agonists SAG1.5, SAG, oxysterols and antagonists GDC-0449, AntaXV.34, 37, 39, 62, 63 Specifically, a network of salt bridges formed by the residues D4736.54, E5187.38 and R4005.39 is broken in SAG1.5-bound SMO (Fig. 2C).34 Hence, we also projected the data on the D4736.54 – E5187.38 distance v/s intracellular TM3-6 movement (Fig. 2D, Fig. S5). We observe that the TM6-TM3 outward movement (2 in Fig. 2D) is preceded by the breakage of the hydrogen bond between D473-E518 (1 in Fig. 2D).

To outline the role of the cation-π lock W5357.55-R4516.32 in activation, we projected this ionic lock contact v/s the TM3-6 outward movement (Fig. 2F, Fig. S5). Projecting the Apo-SMO data along the sidechain dihedral angle χ2 of W5357.55, clearly showed the distinct inactive and active states. This shows that the mechanism of cation-pi lock breaking involves the sidechain rotation of W5357.55. Additionally, we observe that the cation-pi lock breaks around the same TM3-TM6 distance as the outward movement of TM3. Thus, the WGM motif and the ionic lock at the intracellular end, and the D-R-E network at extracellular end are critical residue networks involved in SMO activation. These residues form a network of allosterically coupled residues, proving crucial for signal transduction across the membrane.

Residues at the CRD-TMD interface involve salt-bridge rearrangements in SMO activation

SMO, in addition to a heptahelical TM domain, possesses an extracellular domain called the Cysteine Rich Domain (CRD). The CRD consists of residues that are highly polar; in comparison to the TMD, which is mostly hydrophobic (Fig. S7). This domain is critical for SMO activation, as SMOΔCRD mutants show a higher constitutive activity - suggesting that the CRD represses SMO’s basal activity.64 The CRD also includes the primary sterol binding site in SMO31 - and it has been posited that PTCH inhibits SMO by reducing cholesterol access to this site.17 Structures of active xenopus laevis SMO (xSMO) show a dramatic reorientation of the CRD on xSMO activation-suggesting that the CRD has a very dynamic range of motion.63 However, this reorientation is not observed in human SMO (hSMO).31, 36 Thus to establish a role of the CRD in activation of hSMO, we sought residue pairs in SMO CRD-TMD interface that showed the highest variance along tIC1, the slowest process that captured SMO activation.

Fig. 3(A-F) show the residue pairs that have the highest change in contact frequency during activation - starting with the R4856.66 – D209CRD , salt-bridge, which breaks during activation (Fig. 3A) due to the outward movement of TM6. This indicates that the R485-D209 salt-bridge is involved in stabilizing the inward conformation of TM6 in the inactive state. This loss of the R485-D209 salt-bridge is however compensated by the formation of the nearby R161–D486 salt bridge, which is predominantly seen in the active conformation (Fig. 3E). Furthermore, the inactive state shows a salt-bridge E208CRD -K395ECL2 which breaks on activation, compensated by the formation of the nearby D201CRD –R296ECL1 (Fig. 3B, E). Additionally, activation strongly favors the formation of R159CRD –D209CRD (Fig. 3C) and D382ECL2–K204CRD (Fig. 3G) salt bridges.

Overall activation of SMO involves residues at CRD-TMD junction. (A)-(F) Snapshots and probability density plots outlining the salt-bridge rearrangements at the CRD-TMD interface during SMO activation.

The path along tIC1 from the inactive state to active state involves 3 intermediate states(I13) (Fig. 4A), characterized by free energy barriers of atleast 1 kcal mol among them. Using Transition Path Theory on the constructed MSM, we calculated the fl of transitions between these states, to establish timescales for activation of SMO. (Fig. 4B). The simulations show that the entire process of activation from inactive to active has a MFPT (mean fi passage time) of 72µs (Fig. 4B), while the reverse process is 3X faster, with MFPT 24 µs.

(A)Relative free energies from MSM-weighted simulation data of Apo-SMO plotted along tIC1 and tIC2, the 2 slowest components, with the intermediate states I13 as shown. (B) Overall transition pathway of SMO activation process.

We observe that residue pair rearrangements that are associated with activation at the CRD-TMD junctions are salt-bridges, mostly between residues with one residue in CRD and the other one in TMD. Almost none of these polar residues are conserved (Fig. S4, S8), indicating that these residues contribute to a unique activation process for SMO at the CRD-TMD interface. Additionally, we observe that the entire CRD motion can be accounted for by a slight outward rotational motion of the CRD (Fig. S9), thereby causing TM6 to move outwards and triggering activation on the intracellular end. Since the CRD has a cholesterol binding site, it is possible that cholesterol binding to the CRD triggers this outward rotation, inducing the signal that causes TM6 to move out. This potentially outlines a mechanism for the activation of SMO by cholesterol, its endogenous agonist.

SMO’s Activation is linked to opening of a hydrophobic tunnel

Endogenously, on PTCH’s inhibition by ShH, SMO is activated. SMO’s activation is mediated endogenously by cholesterol, suggesting that PTCH’s inhibition facilitates SMO’s activation by cholesterol. This suggests that cholesterol from the membrane travels to the extracellular sterol binding site. How this transfer of cholesterol occurs to the SMO CRD is still unknown; however, SMO does indeed present itself with a unique topology - the presence of a tunnel inside the protein. This tunnel has been hypothesized3739, 63 to facilitate the transport of cholesterol from the membrane to the binding site in the CRD31 ; making this tunnel a prime target for inhibitors. As noted by Qi et al., SMO antagonists (SANT1, AntaXV, LY2940680) bind deeper into a tunnel inside SMO, whereas SMO agonists (SAG) bind outside this tunnel. Adding a 4-aminomethyl moiety to the tail-end of SAG converts it to an antagonist, suggesting that this added moiety can hinder the tunnel.65 Mutations that introduced a bulky residue into the tunnel (V329F,V333F,V408F,I412F,T470Q), blocked SMO activity,31, 38 suggesting that the tunnel conformation was linked to how small molecule and mutations modulated SMO activity.37 This suggests that SMO antagonists like SANT1 act as steric antagonists by blocking the sterol tunnel inside SMO, while agonists like SAG allosterically activate SMO by breaking the D-R-E network, setting off receptor activation on the intracellular end. The mechanism and dynamics of the modulators acting on SMO’s activation is still unclear; hence we simulated SMO bound to antagonist SANT1 (SANT1-SMO) and agonist SAG (SAG-SMO) to probe the effect of bound agonist and antagonist on SMO’s activation.

SMO’s tunnel is characterized by markedly hydrophobic residues (Fig. S10), pointing further towards the idea that a hydrophobic molecule may be transported through it. This tunnel runs through the core of the receptor, spans the entire TM domain, starting at the conserved residues W3393.50, spans seven helical turns, and ends at the extracellular network of residues E5187.38, D4736.54 and R4005.39. These three residues form the base of the space between the CRD and TMD. Moving outwards along the path defi along the tunnel directly leads to the binding site, with TM6, ECL2 and ECL1 forming the bridge between these sites (Fig. S11).

In SANT1-SMO simulations, the tunnel remains almost completely blocked (Fig. 5A, B), indicating that the mechanism by which SANT1 modulates SMO activity is by binding deeply into the SMO tunnel core, precluding the potential transport of cholesterol. SANT1’s piperazine moiety directly interacts with H4706.51 and sidechain of M5257.45 - forming hydrogen bond interactions. The pyrrolic head of the ligand remains buried deep inside, with minimal lateral movement across the tunnel (Fig. S12). However, in Apo-SMO simulations, the tunnel remains relatively open(Fig. 5C, D). In SAG-SMO simulations however, the tunnel radius has a sudden kink outward (z -20 Å), suggesting that there is a relative expansion of the tunnel induced by SAG (Fig. 5 E, F). Since this expansion occurs at z -20 Å, it suggests the opening is in the upper leafl (Fig. S13A). Recent studies suggest that active PTCH precludes SMO’s accessibility to cholesterol in the upper leafl 19 To further probe into the exact position of this tunnel opening, we observed that a cluster of openings occured at x 16 Å and y 22 Å; corresponding to the space between TM2 and TM3. (Fig. S13B). This is in agreement with a recent study that used coarse-grained simulations to observe a cholesterol binding site at the TM2-TM3 interface in the upper leafl 67 Thus, SAG acts as an agonist by allosterically expanding the tunnel at the cholesterol interaction site - giving further evidence for the cholesterol-transport like activity of SMO. Thus we conclude that SANT1 functions as a steric antagonist by blocking the tunnel, whereas SAG functions by allosterically expanding the tunnel, thereby establishing design rules for SMO agonists and antagonists.

Tunnel radius plots for SMO. (A) Free energy plot of the tunnel diameter along the z-coordinate for SANT1-bound SMO. (C) same as (A), but for Apo-SMO. (E) same as (A), but for SAG-bound SMO. SAG-bound SMO clearly shows the expansion of the tunnel as compared to Apo-SMO and SANT1-SMO. (B), (D), (F) - representative fi for SANT-1 SMO, Apo-SMO and SAG-SMO. Tunnel radii were calculated using the HOLE program66 transduction. Allosteric pathways contain a series of conformationally-coupled residues that link dynamically active and spatially distant residues. In Class A GPCRs, allosteric pathways are responsible for communicating conformational changes from the extracellular end to the intracellular end, completing the process of signal transduction.6870 Since SMO’s activation process involves allosteric communication between the extracellular ligand binding site (D-R-E network) and the G-protein coupling site (WGM motif), we sought to analyze the allosteric pathways that connect the two sites. We computed the dynamic pairwise mutual information of Inactive-Apo-SMO, Active-Apo-SMO, SANT1-SMO and SAG-SMO on a residue-level basis, and construct a graphical network of residues that are allosterically linked. Based on this network, we present the allosteric pathway between the intra- and extracellular ends of TMD.

Allosteric pathways between E5187.38 and W3393.50. (A) - Pathway in Apo-Inactive-SMO. Since the tunnel radius is decreased, TM6 outward movement is restricted, and therefore the entire allosteric communications occurs via TM6. (B) In SANT1-SMO, due to slight outward movement of TM6, the pathways switches from TM7 to TM6 to TM3. (C,D) SAG-SMO and Apo-Active SMO show the same allosteric pathway, which spans TM7-TM6-TM5-TM3.

SAG alters the allosteric pathways in SMO during the process of SMO activation

To further investigate the mechanism by which SAG allosterically modulates SMO’s activity resulting in the expansion of the tunnel, we computed the allosteric pathways that connected the intra- and extracellular ends of SMO, responsible for transmembrane signal

In our simulations, we observe that the allosteric pathway between the intra and extra-cellular ends in Apo-Inactive SMO completely passes through TM6, encompassing residues T4666.47, F4606.41 and G4566.37 (Fig. 5A). This establishes an integral role for TM6 in mediating the signals across the transmembrane domain in inactive-SMO. SANT1-SMO on the other hand, unexpectedly shows a distinct pathway, fi going down intra-helically to A5247.24, crossing over to TM6 via A4596.40 and fi to L3353.46. This however can be explained by the observation that the SANT1 causes a slight outward movement of TM6, to accommodate itself in the deep core TMD ligand binding cavity (Fig. S14). This outward movement of TM6 drifts T4664.67 away from E5187.55; causing the network to rearrange itself, moving over to TM6 further downstream. On the other hand, SAG and Apo-Active SMO show the exact same networks, further indicating that SAG alters the allosteric networks in SMO to resemble Apo-SMO. These networks involve C4696.50, the most conserved residue in TM6, down L4646.45, and a fl over to TM5 as we move intracellularly, due to the outward intracellular movement of TM6, via L4125.51 and F4185.57. Thus, we can establish a basis for the mechanisms through which SAG and SANT1 effectively modulate SMO activity, and establish an integral role for TMs 7,6,5,3 in signal transduction.

Conclusions

Our study reveals the activation mechanism for SMO, a Class F GPCR, in atomistic detail via molecular dynamics simulations. We characterized the residue level transitions that SMO undergoes during activation. We simulated SMO in Apo, SAG and SANT1 bound states to probe the activation mechanism of SMO, and computed the free energy landscape of the process. Our MSM weighted free energy landscapes show a barrier of max free energy barrier of 3 kcal mol1 while transitioning from an inactive to active state, involving three intermediate states.

Class A and Class B receptors have been the subject of major interest involving GPCR activation.4, 9 Receptor activation studies on Class F majorly focused on the start and end states of the receptor, without giving an overview of the dynamics of the process. Using computational methods, we show that SMO activation involves the rearrangement of a intracellular structural motif - the W-G-M motif, conserved across the entire Class F family. This lays the basis for a common activation mechanism for all Class F receptors on the intracellular end. Additionally, this motif involves W3.50, which is the residue equivalent to R3.50 in class A receptors, establishing the integral role of TM3 in GPCR activation. On the extracellular end of TMD, we see that the D-R-E network of residues is pivotal to activation, as it engages the agonist and sets off the activation process at the intracellular end. We also show evidence of allosteric coupling between these two sites, showing that the rearrangement of the D-R-E network is necessary to ensure intracellular rearrangement of the WGM motif.

We also establish a role for the CRD in SMO activation, establishing and breaking salt- bridges while transitioning to an active state, contacts that have not been discussed previously. This gives novelty to the methodology established, inferring that MD simulations can be used to discover contacts crucial to activation, previously unknown. We show that the agonist SAG expands an intra-TMD tunnel inside SMO, further supporting the hypothesis that SMO transports a cholesterol molecule through its hydrophobic tunnel to activate SMO.18, 19, 37, 39, 63 We also show that SAG acts as an allosteric modulator, by modifying SMO’s allosteric pathways to be similar to Apo-SMO. On the other hand, SANT1 acts as a steric antagonist, by occluding the hydrophobic tunnel inside SMO, hence lowering the radius. Therefore, we establish the mechanisms of action of antagonists and agonists in modulating SMO activity. How cholesterol, the endogenous agonist of SMO, modulates SMO activity in the presence of agonists, still needs to be explored. However, we propose that the overall mechanistic fi from this study can be used to design novel SMO antagonists, for chemotherapy.

Methods

Molecular Dynamics Simulations

Simulation setup

SMO crystal structures in the bound inactive conformation (inactive-SMO) (PDB ID: 5L7D31) and active conformation (active-SMO) (PDB ID: 6XBL37) were used as starting points for the SMO-Apo simulations. For apo systems, the bound ligand and the stabilizing antibodies were removed. The missing residues in the proteins were modeled using MODELLER71 (Table S2). The inactivating mutation V329F in the inac-SMO was corrected back to wild-type. For SAG-SMO, the bound SAG was retained in the SMO-SAG complex.37 For SANT1-SMO, owing to the lack of the CRD in the SANT1-SMO complex (PDB ID: 4N4W34); this crystal structure was instead aligned to inac-SMO 5L7D (to maintain the same binding pose for SANT1), and the 5L7D-SANT1 starting point was used for simulations. The terminal residues were capped using neutral terminal caps Acetyl (ACE) for N-terminus and MethylAmide (NME) for the C-terminus. The proteins were embedded in a membrane bilayer using CHARMM-GUI.72, 73 The atomic interactions were characterized using the CHARMM36 force fi 74, 75 The force fi parameters for synthetic ligands SAG and SANT1 were generated using ParamChem,76 an automated version of CGenFF.77, 78 Input fi were generated using the web-based input generator CHARMM-GUI.79 The composition of the membrane bilayer was based on lipid composition of the mice brain cerebellum80 - (75% POPC, 21% Cholesterol, 4% Sphingomyelin), to mimic physiological cerebellar membrane composition. The system was solvated using TIP3P water81 and 150mM NaCl, to mimic physiological conditions. Overall the system sizes for inac-SMO, act-SMO, SAG-SMO and SANT1-SMO were 106,415, 105,971, 105,100 and 105,582 atoms with box sizes 86x86x153 Å3, 86x86x152 Å3 86x86x152 Å3 and 85x85x153 Å3 respectively. The mass of non-protein hydrogens was repartitioned to 3.024 Da,82 to enable simulations with a long timestep (4fs). Parmed, a part of the AmberTools19 package, was used for this purpose.83

Pre-Production MD

The systems were minimized for 1000 steps, using the steepest descent method, followed by minimization using the SHAKE algorithm84 for 14000 steps. Systems were then heated from 0-310 K using NVT conditions for 5ns, constraining the backbone using a force constant of 10 kcal mol1 Å2. Systems were then equilibrated using the NPT conditions for 5ns, at 310 K and 1 bar, using similar backbone restraints. This was followed by an equilibration of 40 ns, without constraints. Apo-SMO and SANT-SMO simulations were performed using the AMBER1883, 8588 biomolecular simulation package. SAG-SMO simulations were performed using NAMD 2.14.89, 90

Production MD

Post equilibration, the GPU-accelerated pmemd.cuda package from AMBER1883, 88 was used for production MD. Integrator timestep was set to 4fs. Periodic boundary conditions were used, and the temperature was maintained using the Langevin Thermostat. 91 Particle Mesh Ewald92 (PME) method was used for computing long-range electrostatic interactions. SHAKE84 algorithm was used to restrain the Hydrogen bonds. Cutoff for non-bonded interactions was set to 10 Å. Frames were saved every 25000 steps, giving a frame rate of 100 ps between each frame. Simulations were performed using the Blue Waters supercomputer(NVIDIA Tesla K20X GPUs) or our in-house computing cluster(NVIDIA GeForce GTX 980 GPUs).

Adaptive Sampling, Feature Selection and Clustering

Simulating biological systems using traditional long-MD simulations to observe sub-millisecond dynamics is unfeasible, hence we sorted to using a parallel approach to accelerate conformational sampling, called Adaptive Sampling. The simulation data after every round of simulations was clustered (feature selection explained below), and the least populated clusters were used to seed simulations for the next round. Overall, for Apo-SMO, 7 rounds of simualtions were performed, collecting 30-50µs per round. For SAG/SANT1-SMO, the data was collected in a similar fashion, for 3 rounds each, around 10-20 µs per round. The bias introduced in the system due to selectively starting simulations from least populated clusters was eliminated by constructing a Markov-state model, that estimated the reverse transition probabilities from each microstate.

The progress of the transition from inactive to active was monitored by calculating features, each of which was selected based on maximum magnitude of Δ RRCS (RRCS - Residue Residue Contact Score). RRCS is a order-parameter identifying technique that uses a fl linear-flat scoring scheme to assign a score to contact between every residue-pair in the system.6 Contacts that had |ΔRRCS| < 3.5 (58 such distances total) (Table S3) were used. K-means clustering was used to cluster the simulation data, based on these calculated features. Clustering was performed using the pyEMMA python library.93

Markov State Model construction

The high dimensionality of the data was fi st reduced using tICA. The tICA lagtime was optimized by observing the plateauing of the implied timescales (-2/lnλ, λ being the largest eigenvalue of the fi tICA eigenvector) vs the lag time, and was set to 30 ns for the 3 systems (Apo-SMO, SAG-SMO, SANT1-SMO). The tICA reduced-dimension data was then clustered using k-means clustering. The optimum number of clusters and no. of tICA components to be used was optimized by maximizing the VAMP2 score (sum of the squares of the highest eigenvalues of the transition matrix) for a particular the number of clusters, and the convergence of the implied timescales vs the MSM lag time. (Fig. S16,S17,S18). Accordingly, the number of clusters was set to 200 (Apo-SMO) and 100 (SANT1-SMO and SAG-SMO). The MSM lag time was set to 30 ns for the three systems (Apo-SMO, SAG-SMO, SANT1-SMO). The Chapman-Kolmogorov test, which tests the validity of the MSM on 5 macrostates, was performed using pyEMMA (Fig. S19,S20,S21).

Trajectory Analysis and Visualization

cpptraj94 was used for trajectory processing. VMD95, 96 and open-source PyMOL97 were used to visualize and render images. MDTraj98 was used for computing all order parameters. All plots were made using matplotlib99 and seaborn100 python libraries. Numpy101 was used for numerical computations. The salt-bridge based contacts were discovered by extracting probabilty-weighted 10000 frames from clusters the in Inactive, I13 and Active states, using cluster probabilities from the MSM. They were analyzed for unique contacts using GetContacts.102 Tunnel radii for for analysis of effect of SAG and SANT1 were calculated using HOLE.66

Mutual Information Calculations

Mutual Information for describing the allosteric pathways was computed using mdentropy,103 using the DihedralMutualInformation function. Analysis was performed on 10000 frames each extracted from Apo-SMO, SANT1-SMO and SAG-SMO data. The frames were chosen based on the predicted MSM probabilities, to represent the entire ensemble. A graph was constructed from the computed Mutual Information, and residues with C-α distances < 10 Å were considered to be connected by an edge. The weight of each edge was assigned as MI = MImax - MIab, with the MImax as the maximum mutual information computed among two residues in a protein, and MIab was the mutual information computed between residue pair ab. Edges with MI < MIavg were not considered. Allosteric pathways were computed by calculating the shortest paths between 2 nodes, (in our case E518 and W339) using Dijkstra’s algorithm.104 NetworkX,105 a python library was used for graph-construction, visualization and computing shortest paths.

Data Availability

Stripped trajectories and corresponding parameter fi have been uploaded to Box. Scripts used for MSM construction and trajectory analysis have been uploaded to github.

Supporting information

Supplementary Methods, Images, Tables and Results

Acknowledgements

The authors thank The Blue Waters Petascale Computing Facility and National Center for Supercomputing Applications, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993) and the state of Illinois. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Super-computing Applications. DS acknowledges support from the NSF Early CAREER award MCB-1845606 and seed grant from Cancer Center at Illinois for their support. PDB thanks Matthew Chan, Austin Weigle and Jiming Chen of the Shukla Group at University of Illinois for the valuable insights throughout the course of this study.