
GENOME WIDE SURVEY OF CERTAIN MAMMALIAN GPCRS AND OLFACTORY RECEPTORS
A THESIS
Submitted by
NAGARATHNAM B
in partial fulfillment for the award of the degree of
DOCTOR OF PHILOSOPHY
FACULTY OF SCIENCE AND HUMANITIES
ANNA UNIVERSITY
CHENNAI 600 025

JUNE 2012
ABSTRACT
In the recent era of G-protein coupled receptor (GPCR) research, computational approaches to sequence analysis play a vital role in identifying related sequences (homologues), conserved features (domains, motifs) and evolutionary relationships (orthologs) for protein families of interest at intra- and inter-genomic levels. Candidate GPCRs and olfactory receptors (ORs; class A type GPCRs) are important for their diverse cellular activities and have been considered for a genome-wide survey in selected eukaryotic genomes, which further helps to establish structure-function resemblance. Generally, GPCRs are predicted to have an extracellular N-terminus (N-out topology), an intracellular C-terminus and seven transmembrane helices (TMHs) connected by three intracellular and three extracellular loops, and are therefore termed serpentine-like receptors.

Previous cross-genome studies on human-Drosophila GPCRs motivated a cross-genome clustering of human-C. elegans GPCRs (Chapter 2). A profile-based clustering (RPS-BLAST) was employed to associate more than 1000 C. elegans GPCRs with already grouped human GPCR clusters of eight major types of receptors. The resulting 32 human-C. elegans GPCR clusters were analyzed for five different types of cluster association, with proposed terminologies such as human GPCR clade [HC], coclusters [CC], neighbor clades [NC], neighbor members [NM] and species-specific members [SS] observed in the tree topology, which help to connect functional relevance at intra- and inter-genomic levels. Interestingly, the CC associations were significant and exhibited evolutionary integrity at the inter-genomic level. Also, the 27 identified orthologs illustrate the effectiveness of cross-genome clustering techniques in connecting related GPCRs even at
remote homology. Overall, RPS-BLAST successfully associated 84% of the GPCR sequences across genomes at significant E-value thresholds (ranging from 0.001 to 1) (work published).

The cross-genome clustering of human and C. elegans GPCRs motivated a phylogenetic analysis focused exclusively on serpentine receptors (SRs) (Chapter 3). Since nearly 20 protein families of SRs from C. elegans are related to chemosensation, a phylogenetic analysis of 683 serpentine receptors was carried out to identify related sequences and clusters that represent family-specific or receptor-specific sequence features and, ultimately, to connect them at the superfamily level. Interestingly, odr-10, the only receptor annotated for olfaction in C. elegans (it senses diacetyl compounds), was observed along with 43 SRs in the phylogeny. All the homologues associated with odr-10 are from the Str superfamily, and str-112 in particular was found to be the sequence homologue most closely related to odr-10 in the phylogenetic analysis. As a case study, odr-10 was modelled to understand its secondary structural details. A str family-specific QLF motif was identified in ICL3, TM6 of odr-10, and 92 other SR family-specific motifs were also identified using the TM-MOTIF package. The identified sequence features can be used further to train SVM models and to predict putative receptors from other nematode species.

A user-friendly alignment viewer, TM-MOTIF (work published), was designed to detect and display conserved motifs on the predicted membrane topology of a set of aligned transmembrane proteins (Chapter 4). The tool identifies not only conserved motifs (default conservation threshold 60%) but also amino acid substitutions (AAS), with their respective physico-chemical properties, at each position of the alignment, using an in-house program named MotifS. TM-MOTIF lets users submit sequences of interest (multiple FASTA or MSA) and visualize the seven predicted helices of TM proteins in the VIBGYOR colouring scheme. Users can also align a sequence of interest with any one of the given reference sequences (of known structure) to obtain a pairwise alignment; this display is highly helpful as a prerequisite for homology modelling. Users can also perform a BLAST search to identify the nearest homologue from the incorporated cross-genome GPCR and OR cluster datasets of selected organisms. In short, TM-MOTIF is well suited to comparative genomics and to identifying cluster-specific, receptor-specific and common motifs observed at various percentages of conservation within and across genomes. The package is integrated into DOR (Database of Olfactory Receptors).

Conserved motifs and AAS play a crucial role in protein function. The previously established 32 clusters of eight major types of receptors from the cross-genome GPCR cluster datasets (human-Drosophila GPCR clusters, human-C. elegans GPCR clusters and the human-only GPCR cluster dataset) were studied primarily for conserved motifs (MotifS program), and the TM-MOTIF package was used to map the observed motifs onto their respective membrane topology (Chapter 5). Interestingly, a total of 33 conserved motifs were identified from the human-Drosophila GPCR clusters, and 76% of them were observed in TM helices, predominantly in TM2 and TM7. Besides classical motifs such as E/DRY and NPXXY, motifs observed in a single receptor type (cluster-specific or receptor-specific motifs), in two receptor types and in multiple receptor types were also documented for the cross-genome GPCR clusters (work published).
An olfactory receptor data repository was generated for selected eukaryotic organisms (yeast, worm, fly, mouse and human), and these sequences were aligned to produce intra- and inter-genomic phylogenies. Interestingly, 371 functional ORs from the human genome were distributed in 10 distinct clusters, and class I (sensing water-borne odors) and class II (sensing air-borne odors) type receptors could be discriminated when a few selected fish and amphibian ORs were introduced into the human OR phylogeny. In another study, fly ORs showed no significant coclustering with human ORs in the phylogeny, indicating that insect ORs are evolutionarily distinct from mammalian ORs. This could be due to independent evolution, life style or the reverse topology of fly ORs. Selected nematode ORs also showed no coclustering with human ORs, owing to the long lineage and nematode life style. The study of human-mouse OR clusters showed significant coclustering, and further studies were carried out with ORs of canines, rodents and non-human primates to analyze cluster association with human ORs. The results of the sequence studies were organized in a publicly available database named DOR. It provides sequences, predicted TM boundaries, intra- and inter-genomic alignments and phylogenies of selected genomes. It also includes the motif identification tool (TM-MOTIF) and is associated with other features, such as predicted secondary structure and dimer prediction, from collaborators (work in press).

In essence, the genome-wide survey suggests representative sequences, cluster associations, cluster-specific motifs, orthologs and coclusters at intra- and inter-genomic levels, which ultimately help to connect the functional properties of known genes/proteins to unknown ones and to understand structure-function relationships.
ACKNOWLEDGEMENT
I express my deep sense of gratitude to Dr. V. Balakrishnan, Department of Biotechnology, KSR College of Technology, Tiruchengode, for his valuable guidance during my Ph.D. study. I am also extremely thankful to my co-supervisor and mentor Prof. Dr. R. Sowdhamini, Lab-25, National Center for Biological Sciences, Bangalore, who has been a source of inspiration, help, guidance and advice to me throughout the course of this research work. Further, I sincerely express my earnest gratitude to my doctoral committee member Dr. S. SenthilKumar, PSG College of Technology, Coimbatore. I express my heartfelt thanks to Prof. Dr. K. Karunakaran, Vice Chancellor, and Dr. P. Renuka Devi, Director-Research, Anna University of Technology Coimbatore, for graciously permitting me to do this research. I submit my gratitude to Prof. Dr. K. Vijayaragavan, RSF, Director, NCBS, Bangalore, Prof. Dr. Obaid Siddiqi, RSF, Prof. Dr. Apurva Sarin, and Prof. N. Srinivasan from IISc., Bangalore, for extending care and moral support to pursue the research work. I also submit my deepest gratitude to Mr. Ashok Rao, Mr. Shaju, the teaching and non-teaching staff, my lab mates and all@ncbs for their kind-hearted support in encouraging my thirst for research. Thanks to my family members and my beloved APPA.

B. NAGARATHNAM
TABLE OF CONTENTS

CHAPTER NO.    TITLE

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

1   INTRODUCTION
    1.1   PRIOR ART ON GENOME-WIDE SURVEY
    1.2   BREAKTHROUGHS IN GPCR CRYSTALLOGRAPHY STUDIES
    1.3   GPCRS: POPULAR DRUG TARGETS
    1.4   STRUCTURE AND CELLULAR ACTIVITIES OF MEMBRANE PROTEINS
    1.5   MEMBRANE PROTEIN: TOPOLOGY
    1.6   GPCR MECHANISM
    1.7   GPCR CLASSIFICATION
          1.7.1   Olfactory Receptors (ORs)
          1.7.2   Classical Knowledge on Olfactory Receptors
          1.7.3   Olfactory Signaling Pathway in Human ORs
          1.7.4   ORs, GRs and IRs in Drosophila
          1.7.5   Insect olfaction (Drosophila ORs)
          1.7.6   Nematode Olfaction
          1.7.7   Mouse Olfaction
    1.8   DATA REPOSITORIES FOR MEMBRANE PROTEINS
    1.9   COLLECTION OF GPCR-HOMOLOGUES
          1.9.1   BLAST (Basic Local Alignment Search Tool)
          1.9.2   PSI-BLAST (Profile Vs Sequence comparison method)
          1.9.3   Reverse PSI-BLAST (Sequence Vs Profile comparison method)
    1.10  MULTIPLE SEQUENCE ALIGNMENT TECHNIQUES
          1.10.1  CLUSTAL W
          1.10.2  PRALINE TM
          1.10.3  MAFFT
    1.11  DERIVING PHYLOGENY OF GPCRs/ORs
          1.11.1  PHYLIP
          1.11.2  TREE-PUZZLE
          1.11.3  MEGA (Molecular Evolutionary Genetics Analysis)
    1.12  CLUSTER ASSOCIATIONS
    1.13  SEQUENCE CONSERVATION AND DIVERSITY
    1.14  HOMOLOGY MODELLING OF GPCRs/ORs

2   CROSS-GENOME CLUSTERING OF HUMAN AND C. ELEGANS G-PROTEIN COUPLED RECEPTORS
    2.1   INTRODUCTION
    2.2   C. elegans - AN ATTRACTIVE ANIMAL MODEL
          2.2.1   Features Related to C. elegans and Human GPCRs
    2.3   OBJECTIVES
    2.4   PRIOR ART
          2.4.1   Superfamilies of Serpentine Receptors
    2.5   METHODOLOGY
          2.5.1   Selection Criteria for C. elegans GPCRs
          2.5.2   Generation of Representative Profiles
          2.5.3   Performing RPS-Blast
          2.5.4   Cross-Genome Alignment of Human-C. elegans GPCRs
          2.5.5   Cross-Genome Phylogeny of Human-C. elegans GPCRs
          2.5.6   Terminologies used to Describe Phylogeny
                  2.5.6.1   Human GPCR clade [HC]
                  2.5.6.2   Coclusters [CC]
                  2.5.6.3   Neighbor Clades [NC]
                  2.5.6.4   Neighbor Members [NM]
                  2.5.6.5   Species-specific Members [SS]
                  2.5.6.6   Superfamilies of Serpentine receptors (SR)
    2.6   RESULTS AND DISCUSSION
          2.6.1   Result Summary for Peptide Receptors
          2.6.2   Result Summary for Chemokine Receptors
          2.6.3   Result Summary for Nucleotide and Lipid receptors
          2.6.4   Result Summary for Biogenic Amine Receptors
          2.6.5   Result Summary for Class B (Secretin) Receptors
          2.6.6   Result Summary for Cell Adhesion Receptors
          2.6.7   Result Summary for Class C (Glutamate) Receptors
          2.6.8   Result Summary for Frizzled/Smoothened Receptors
    2.7   CONCLUSION

3   PHYLOGENETIC ANALYSIS OF SERPENTINE RECEPTORS OF C. ELEGANS AND IDENTIFICATION OF CONSERVED MOTIFS IN SERPENTINE RECEPTOR SUPERFAMILIES
    3.1   INTRODUCTION
    3.2   HOMOLOGUES OF C. elegans GPCRs
    3.3   OBJECTIVES
    3.4   CHEMOSENSORY RECEPTORS IN C. elegans
    3.5   CHEMOSENSORY NEURONS AND OLFACTORY APPARATUS IN C. elegans
    3.6   FAMILIES AND SUPERFAMILIES OF SERPENTINE RECEPTORS IN C. elegans
    3.7   FEATURES AND IMPORTANCE OF SRs
    3.8   SRs: FUNCTIONAL RELEVANCE WITH OTHER EUKARYOTIC GPCRs
    3.9   METHODOLOGY
          3.9.1   Data Collection
          3.9.2   Prediction of TM-helices by HMMTOP
          3.9.3   Alignment Procedure by MAFFT
          3.9.4   Phylogeny of Selected Serpentine Receptors
          3.9.5   Identification of Motifs in SRs
    3.10  RESULTS
          3.10.1  Identified Motifs in SR Families: A Pilot Study
          3.10.2  Homology Modelling of odr-10
                  3.10.2.1  Pairwise alignment of odr-10 with bovine rhodopsin sequence
                  3.10.2.2  Alignment by MAFFT
                  3.10.2.3  Structure validation for odr-10 model
                  3.10.2.4  Preliminary phylogenetic analysis
                  3.10.2.5  Odr-10 an outgroup to HOR
    3.11  CONCLUSION

4   TM-MOTIF: A PACKAGE AND AN ALIGNMENT VIEWER TO IDENTIFY CONSERVED MOTIFS AND AMINO ACID SUBSTITUTIONS IN ALIGNED SET OF SEVEN TRANSMEMBRANE HELIX PROTEINS
    4.1   INTRODUCTION
          4.1.1   Functional Importance of Conserved Motifs in TM-Proteins
          4.1.2   Motif Related to Structural Integrity and Stability
          4.1.3   Impacts of Motifs in Evolutionary Bioinformatics
    4.2   OBJECTIVES OF TM-MOTIF
    4.3   KEY FEATURES OF TM-MOTIF
    4.4   METHODOLOGY
          4.4.1   In-Built Dataset of Cross-Genome GPCR and OR Cluster Dataset
                  4.4.1.1   Human-Drosophila cross-genome GPCR clusters
                  4.4.1.2   Human-C. elegans cross-genome GPCR clusters
                  4.4.1.3   Human-mouse cross-genome OR clusters
          4.4.2   Alignment Procedures for Cross-Genome GPCR/OR Clusters
          4.4.3   Prediction of Membrane Topology for TM Helices and Loops
          4.4.4   Detection of Motifs and Amino Acid Substitution (AAS) in the Cross-Genome Alignment
          4.4.5   Mapping of Identified Motifs on TM-helices and Loops in MSA
          4.4.6   Identification of Homologous Sequences for User Submitted Queries by Performing BLAST
          4.4.7   Pairwise Alignment in TM-MOTIF
    4.5   RESULTS
          4.5.1   Software Input and Output Options
          4.5.2   Input Options
          4.5.3   Output Options
                  4.5.3.1   Display of predicted 7 TM-helices in VIBGYOR colouring scheme (by using Run TM option)
                  4.5.3.2   Display of Identified Motifs and AAS in MSA (by using Run Motif option)
                  4.5.3.3   Display of Detected Motifs on TM-helices (by using Run TM-Motif option)
                  4.5.3.4   Alignment with Reference Sequence
                  4.5.3.5   Identifying closest homologues of user sequence in selected organisms
                  4.5.3.6   Display of Over predicted helices
    4.6   DEFAULT PARAMETERS
          4.6.1   TM-MOTIF Output Files
    4.7   CAVEAT AND FUTURE DEVELOPMENT
    4.8   AVAILABILITY
    4.9   CONCLUSIONS

5   ANALYSIS ON CONSERVED MOTIFS AND PERMITTED AMINO ACID EXCHANGES IN CROSS-GENOME GPCR CLUSTERS
    5.1   INTRODUCTION
    5.2   OBJECTIVES
    5.3   RESIDUE CONSERVATION IN CROSS-GENOME SEQUENCES
    5.4   IMPACT OF AMINO ACID CONSERVATION AND TYPES OF SUBSTITUTIONS
    5.5   METHODS
          5.5.1   Cross-genome GPCR cluster dataset
          5.5.2   Alignment Procedure
          5.5.3   Prediction of membrane topology
          5.5.4   Program to Detect Motifs and AAS
    5.6   RESULTS
    5.7   OCCURRENCE OF MOTIFS FOR SINGLE RECEPTOR TYPE
    5.8   MOTIFS OBSERVED IN HUMAN-DROSOPHILA CROSS-GENOME CLUSTERS
          5.8.1   Motifs Observed in Transmembrane Helices
          5.8.2   Motifs Observed in Loop Regions
    5.9   MOTIFS OBSERVED IN HUMAN-C. elegans GPCR CROSS-GENOME CLUSTERS
    5.10  CHARACTERISTIC MOTIFS FROM CROSS-GENOME GPCR CLUSTERS
          5.10.1  Conserved D/ERY and NPXXY motifs in GPCR Clusters
          5.10.2  Identified KLK/R and RLAR/K motif in Secretin Receptor
          5.10.3  Conserved PMNYM / PMSYM motif in BGA Receptor
    5.11  SUMMARY

6   GENOME WIDE SURVEY OF OLFACTORY RECEPTORS (ORS) IN SELECTED EUKARYOTIC GENOMES
    6.1   PHYLOGENETIC STUDY ON SELECTED HUMAN ORS
          6.1.1   Introduction
          6.1.2   Objectives and Scopes
          6.1.3   Olfactory Receptors
          6.1.4   OR: Membrane Topology
          6.1.5   Prior Studies on ORs
          6.1.6   Methodology
                  6.1.6.1   Retrieval of OR sequences
                  6.1.6.2   Prediction of membrane topology: Human ORs
                  6.1.6.3   Alignment procedure
                  6.1.6.4   Phylogeny on selected human olfactory receptors
                  6.1.6.5   Analysis of phylogeny
          6.1.7   Results
                  6.1.7.1   Class I and II type receptors in human OR phylogeny
                  6.1.7.2   Sequence features of 10 human OR-subclusters
                  6.1.7.3   Representative OR sequences
                  6.1.7.4   Motif analysis on human olfactory receptors
                  6.1.7.5   SVM Analysis
    6.2   CROSS-GENOME PHYLOGENY ON SELECTED ORS FROM HUMAN AND FISH GENOMES
          6.2.1   Objective
          6.2.2   Review of Literatures
          6.2.3   Fish ORs
          6.2.4   Results
          6.2.5   Sequence conservation: across fish and human ORs
    6.3   CROSS-GENOME PHYLOGENY ON SELECTED ORS FROM HUMAN AND AMPHIBIAN GENOME
          6.3.1   Objective
          6.3.2   Literature survey on class I and II type ORs
          6.3.3   Amphibian ORs
          6.3.4   Results
                  6.3.5.1   Cocluster HXC1 - class I type receptors
                  6.3.5.2   Cocluster HXC2 - class II type receptors
                  6.3.5.3   Cocluster HXC3 - class II type receptors
    6.4   PHYLOGENETIC ANALYSIS ON DROSOPHILA OLFACTORY RECEPTORS
          6.4.1   Background
          6.4.2   Drosophila ORs
          6.4.3   Results on Drosophila OR Phylogeny Analysis
                  6.4.3.1   Cluster association: 10 subclusters
          6.4.4   Summary
    6.5   CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED ORS FROM DROSOPHILA, YEAST AND HOMO SAPIENS
          6.5.1   Background
          6.5.2   Insect ORs and mammalian ORs (Evolutionarily unrelated)
          6.5.3   Membrane proteins in Yeast
          6.5.4   Results
          6.5.5   Summary
    6.6   CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED OLFACTORY RECEPTORS FROM HUMAN AND C. elegans GENOMES
          6.6.1   Odr-10 and homologues
          6.6.2   Results and Discussion
          6.6.3   Summary
    6.7   CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED ORS FROM HUMAN AND MOUSE GENOMES
          6.7.1   Introduction
          6.7.2   Objectives
          6.7.3   Human Mouse OR Orthology
          6.7.4   Complex Picture on Human-Mouse OR Orthology
          6.7.5   Methodology
          6.7.6   Results
                  6.7.6.1   Cross-genome OR cluster association
                  6.7.6.2   Cross-genome phylogeny with Class-I type receptor homologues
          6.7.7   Common motifs in the Cross-genome phylogeny
          6.7.8   Summary
    6.8   PHYLOGENETIC ANALYSIS ON OLFACTORY RECEPTORS FROM SELECTED HUMAN AND NON-HUMAN PRIMATES
          6.8.1   Objectives
          6.8.2   Background
          6.8.3   Methodology
          6.8.4   Results
          6.8.4   Summary
    6.9   DATABASE OF OLFACTORY RECEPTORS (DOR)
          6.9.1   Objectives
          6.9.2   Features on OR sequences in DOR
                  6.9.2.1   OR sequences of target genomes
                  6.9.2.2   Predicted TM boundaries
                  6.9.2.3   Single/cross-genome OR alignments
                  6.9.2.4   Cluster association and Phylogeny
                  6.9.2.5   Softwares and Tools (TM-MOTIF) in DOR
          6.9.3   Structural features (Application of sequence searches)
          6.9.4   Summary

7   CONCLUSION
    7.1   COMPENDIUM
    7.2   CROSS-GENOME GPCR CLUSTERING
    7.3   PHYLOGENETIC ANALYSIS ON SERPENTINE RECEPTORS
    7.4   TM-MOTIF PACKAGE
    7.5   STUDY ON CONSERVED MOTIFS AND AAS IN CROSS-GENOME GPCR CLUSTERS
    7.6   PHYLOGENETIC ANALYSIS ON ORS IN SELECTED EUKARYOTIC GENOMES
    7.7   SUMMARY

APPENDIX 1   THE LIST OF IDENTIFIED FAMILY-SPECIFIC MOTIFS IN SR
REFERENCES
LIST OF PUBLICATIONS
CURRICULUM VITAE

LIST OF TABLES

TABLE NO.    TITLE

2.1    Distribution of Human and C. elegans GPCRs in 32 Clusters
2.2    List of Identified Orthologs
3.1    List of identified motifs in serpentine receptor superfamilies
5.1    Motifs observed in the transmembrane helices and loop regions of human and Drosophila GPCR clusters
6.1    Analysis on sequence features of 10 human OR subclusters
6.2    List of conserved motifs in 10 human OR subclusters (60% level of conservation)
6.3    Sequence identity of neighboring fish ORs and human class I type receptors observed in cross-genome OR phylogeny
6.4    Sequence identity of neighboring frog ORs and human class I type receptors observed in cross-genome OR phylogeny
6.5    Sequence identity of neighboring frog ORs and human class II type receptors observed in cross-genome OR phylogeny (referred as HXC2)
6.6    Sequence identity of neighboring frog ORs and human class II type receptors observed in cross-genome OR phylogeny (referred as HXC3)
6.7    Significant cluster association for str type receptors in CeC3; sequence pairs with high/low identity are given
6.8    Sequence identity and similarity between odr-10 and associated SR
6.9    Percentage identity for selected human and mouse ORs for significant association from cross-genome OR phylogeny
6.10   Percentage identity between selected human ORs and non-human ORs

LIST OF FIGURES

FIGURE NO.    TITLE

1.1         Central dogma of genome-wide survey on sequences
1.2         Crystal structure of bovine rhodopsin (Li et al 2004)
1.3         Membrane topology of olfactory receptor (odr-10) in C. elegans
1.4         GPCR signaling pathway
1.5         ORs and organization of the olfactory system in mammals and OR signaling pathway (Meyer et al 2000)
1.6         Overview on the techniques involved in genome-wide survey
2.1         Flow-chart to depict the step-wise procedure for cross-genome clustering of GPCRs
2.2(a-c)    Pictorial representation for various types of cluster association
2.3(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.4(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.5(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.6(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.7(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.8(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.9(a-b)    Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.10(a-b)   Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.11(a-b)   Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.12(a-b)   Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.13(a-b)   Cross-genome phylogeny of peptide receptors (Rectangular Display & Radial Display)
2.14(a-b)   Cross-genome phylogeny of chemokine receptors (Rectangular Display & Radial Display)
2.15(a-b)   Cross-genome phylogeny of chemokine receptors (Rectangular Display & Radial Display)
2.16(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.17(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.18(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.19(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.20(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.21(a-b)   Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
2.22(a-b)   Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.23(a-b)   Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.24(a-b)   Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.25(a-b)   Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.26(a-b)   Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.27(a-b)   Cross-genome phylogeny of secretin type receptors (Rectangular Display & Radial Display)
2.28(a-b)   Cross-genome phylogeny of secretin type receptors (Rectangular Display & Radial Display)
2.29(a-b)   Cross-genome phylogeny of cell adhesion type receptors (Rectangular Display & Radial Display)
2.30(a-b)   Cross-genome phylogeny of glutamate receptors (Rectangular Display & Radial Display)
2.31(a-b)   Cross-genome phylogeny of glutamate receptors (Rectangular Display & Radial Display)
2.32(a-b)   Cross-genome phylogeny of glutamate receptors (Rectangular Display & Radial Display)
2.33(a-b)   Cross-genome phylogeny of glutamate receptors (Rectangular Display & Radial Display)
2.34(a-b)   Cross-genome phylogeny of FRZ/SMT type receptors (Rectangular Display & Radial Display)
2.35(a-b)   Distribution of C. elegans GPCRs at various E-value thresholds
3.1         Pie-diagram to show the distribution of serpentine receptors (SR) in the dataset
3.2         Phylogeny on selected serpentine receptors (circular view tree)
3.3         The subcluster showing odr-10 and its homologues
3.4         Pairwise alignment of odr-10 with bovine rhodopsin sequence
3.5         Three-dimensional model of olfactory receptor odr-10 and structure validation
3.6         Phylogeny on selected human olfactory receptors with an olfactory receptor (odr-10) from C. elegans
4.1         Flow-chart
4.2         Tool guide of TM-MOTIF: an overview
4.3         Snapshot for the available main menu of the front window of TM-MOTIF with user interactive features
4.4         Options given for the submission of input sequences in TM-MOTIF package
4.5         Sample output for the option RUN TM
4.6         Sample output for the option RUN MOTIF
4.7         Sample output for the option RUN TM-Motif
4.8         Snapshot for the display of pairwise alignment of user's input sequence with selected reference sequence
4.9         Snapshot depicting the display of over-predicted TM-helices
5.1         Pictorial representation to denote the occurrence of highly conserved DRY motif in TM3, ICL2
5.2         Flow-chart describing the steps involved in the study
5.3         Percentage residue conservation in TM helices and loops in GPCR clusters
5.4(a-c)    Illustration of characteristic motifs (observed at 60% conservation)
6.1         Flow-chart for the sequence analysis on olfactory receptors
6.2(a-b)    Phylogenetic display of selected human olfactory receptors
6.3         Phylogeny of selected olfactory receptors in Homo sapiens and fish genomes
6.4         Snapshot of alignment window for the motif KAFSTC in human ORs and in few fish ORs at cross-genome alignment
6.5         Snapshot depicts the co-clustering of fish ORs with class I type receptors of human ORs in HSC1 (given in A), also exhibiting the coclusters like HXC1, HXC2 and HXC3 to indicate the class I and II type receptors from frog ORs with human ORs (given in B)
6.6         Snapshot depicts the co-clustering of fish ORs with class I type receptors of human ORs in HSC1 (given in A), also exhibiting the coclusters like HXC1, HXC2 and HXC3 to indicate the class I and II type receptors from frog ORs with human ORs (given in B)
6.7         Phylogeny of Drosophila olfactory receptors
6.8         Observed 10 subclusters of Drosophila olfactory receptors
6.9         Cross-genome phylogeny on selected ORs from human, Drosophila and yeast
6.10        Observed cluster association in the cross-genome phylogeny of selected ORs from human and C. elegans genomes
6.11        Cross-genome phylogeny of selected olfactory receptors (ORs) from human and mouse genomes
6.12        Phylogeny on selected human and mouse olfactory receptors with special emphasis on mouse class I type receptors
6.13        Cross-genome phylogeny on selected human ORs with ORs from non-human primates and aves
6.14        Available main menu in the front page of DOR
6.15        A snapshot of the give option sequence and its application in DOR
6.16        Display of predicted membrane boundaries in DOR
6.17        Display of Alignment option in DOR
6.18        Display of cross-genome OR phylogeny in DOR
6.19        Overview on pictorial representation of available features in DOR for sequence analysis
6.20        Overview on DOR features for sequence and structural information for olfactory receptors in DOR
6.21        Display of 3D Structure and related features in DOR

LIST OF ABBREVIATIONS

AAS             Amino acid substitutions
BGA receptors   Biogenic amine receptors
BLAST           Basic Local Alignment Search Tool
BS              Bootstrap
CAR             Cell adhesion receptors
CC              Co-clusters
CMK             Chemokine receptors
FRZ/SMT         Frizzled/smoothened receptors
GLR             Class C (glutamate) receptors
GPCRs           G-protein coupled receptors
HC              Human GPCR clade
MAFFT           Multiple Alignment using Fast Fourier Transform
MEGA            Molecular Evolutionary Genetics Analysis
N&L             Nucleotide and lipid receptors
NC              Neighbor clades
NJ              Neighbor joining
NM              Neighbor members
ORs             Olfactory receptors
PR              Peptide receptors
RMSD            Root-mean-square deviation
RPS-BLAST       Reverse PSI-BLAST
SEC             Class B (secretin) receptors
SR              Serpentine receptors
SS              Species-specific members
SVM             Support vector machine
TM proteins     Trans-membrane proteins

CHAPTER 1 INTRODUCTION
The frequent, large-scale updating of sequence databases to build repositories for various genomes, and the accurate prediction of structural information for these sequences, are two critical steps in computational genomics. The available knowledge and approaches for genomics (Lipman et al 2011) and structural genomics (Redfern et al 2008) are drastically different, but can be interconnected effectively for the purpose of identifying functional annotations (Alfarano et al 2005). The huge accumulation of sequence information on the one hand, and the limited resources on structural detail on the other, is the crucial scenario in bioinformatics. This imbalance makes it a challenge to identify the function(s) of gene(s) of interest immediately. However, such large data repositories can be handled effectively only through bioinformatics techniques such as the genome-wide survey, which is a more sophisticated approach than the traditional gene-by-gene approach and provides clues to connect sequences from various genomes that share a common function. Methods such as data clustering, principal component analysis, artificial neural networks and support vector machines are useful for gene/protein prediction, classification, association and annotation of novel proteins, and further support the analysis of functional genomics data.

My current objective is to apply effective bioinformatics approaches, such as genome-wide survey and cross-genome phylogenetic analysis, to certain GPCRs and ORs in order to propose representative sequences, cluster associations, cluster-specific motifs, orthologs, species-specific behavior and co-clusters at intra- and inter-genomic levels, and ultimately to connect the functional properties of known genes/proteins to unknown ones (Figure 1.1). In principle, sequence comparison studies, along with reference to structural similarities, provide clues to connect functional resemblance (Redfern et al 2008; Ye et al 2006). This unidirectional hypothesis of associating sequences, predicting structural details and relating biochemical functions to phenotypes forms the baseline of computational biology. Sequence studies across various genomes provide an opportunity to identify groups of associated proteins based on phylogeny, which can be exploited for functional relevance. This conceptual framework helps to compare sequences from various genomes and provides clues to connect sequences of known function to those of unknown function. This rationale for a genome-wide survey of gene/protein sequences of interest provides a platform to integrate knowledge on the sequence-structure-function paradigm for public access (Kerrien et al 2011). Thus, sequence studies act as a primary step to connect structural and functional studies.

1.1 PRIOR ART ON GENOME-WIDE SURVEY

A genome-wide survey of selected protein families of interest (Tripathi and Sowdhamini 2008; Metpally and Sowdhamini 2005) is an appropriate way to accumulate related proteins (associated gene clusters), to identify putative orthologs and to observe conserved motifs across various genomes. Cross-genome sequence analysis provides knowledge on sequence conservation across taxa (preserved at cellular, biochemical and molecular levels) and on species-specific tendencies, and exhibits evolutionary integrity at the cross-genome level (Figure 1.1). In particular, cross-genome sequence studies with selected model organisms are useful for a wide range of practical applications. For instance, a cross-genome phylogenetic analysis of selected GPCRs from the human and Drosophila genomes (Metpally and Sowdhamini 2005), organized into eight major groups of GPCRs, generated 32 cross-genome GPCR clusters. Such an approach proved valuable for identifying the natural ligands of Drosophila and human orphan receptors.
Figure 1.1
Central dogma of genome-wide survey on sequences

Note: Pictorial representation of the procedures involved in genome-wide sequence analysis. Label 1 refers to the selection of the genomes of interest. Label 2 refers to the collection of non-redundant sequences from the selected genomes. Label 3 refers to the cross-genome alignment procedure. Label 4 refers to the cross-genome phylogeny of the sequences. Label 5 refers to cross-genome cluster association and its analysis for species-specificity, co-cluster arrangements, identification of orthologs and conserved motifs, and functional clues for hypothetical proteins in the phylogeny.
Other case studies, such as the genome-wide survey identifying putative serine/threonine protein kinases (STKs) in cyanobacteria (Zhang et al 2007), experimentally grounded insights into the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti (Schluter et al 2010), phylogenetic classification of transporters and membrane proteins from lower organisms (De Hertogh et al 2002) to higher-order organisms (Chang et al 2004), phylogenetic analysis of olfactory receptor subfamilies (class I and class II type) in fish (Freitag et al 1999) and amphibians (Freitag et al 1995), phylogenetic analysis discriminating gustatory and olfactory receptors in Drosophila (Robertson et al 2003), phylogenetic grouping of serpentine receptor superfamilies in C. elegans (Robertson and Thomas 2006), identification of olfactory receptor subfamilies in mouse
(Sullivan et al 1996) and human (Glusman et al 2001), and the influence of phylogenetic analysis in ethno-medicinal studies (Saslis-Lagoudakis et al 2011), are highly commendable. These case studies illustrate important applications of genome-wide surveys and the use of phylogeny in identifying similar or related sequences for a protein of interest across genomes.

1.2 BREAKTHROUGHS IN GPCR CRYSTALLOGRAPHY STUDIES

Cell surface (membrane) proteins account for about 30% of the proteins encoded in the human genome and are well known for their therapeutic importance and applications. Among the more than 82,160 structures available in the PDB, crystal structures are available for only a few membrane proteins. For crystallization, membrane proteins embedded in the lipid bilayer have to be extracted and must form a protein-detergent complex (PDC) (Koszelak-Rosenblum et al 2009). The surrounding lipids of the cell membrane also interfere with both crystallography and nuclear magnetic resonance (NMR) spectroscopy when three-dimensional structures of membrane proteins are solved. Because purification and crystallization are critical and difficult steps in membrane protein crystallography (Dilanian et al 2011), only a limited number of membrane protein structures have been reported so far.
Figure 1.2
Crystal structure of bovine rhodopsin (Li et al 2004)

a) Crystal structure of bovine rhodopsin displayed in ribbon representation (Li et al 2004). The observed seven TM-helices and one peripheral helix are colored in rainbow order: TM-helix 1 in dark blue (residues 34-64); TM-helix 2 in light blue (71-100); TM-helix 3 in blue-green (106-140); TM-helix 4 in yellow-green (150-173); TM-helix 5 in yellow (200-230); TM-helix 6 in orange (241-276); TM-helix 7 in red (286-309); helix 8 in magenta (311-321).
b) Space-filling representation of rhodopsin, a photoreceptor protein.
Rhodopsin was the first GPCR whose crystal structure was solved (Palczewski et al 2000) (Figure 1.2 a and b). The β1 adrenergic receptor (Warne et al 2008), β2 adrenergic receptor (Rasmussen et al 2007), adenosine receptor (Jaakola et al 2008), dopamine D3 receptor, CXCR4 chemokine receptor (Wu et al 2010), histamine receptor and, most recently, the lipid GPCR sphingosine 1-phosphate receptor (S1P1) are a few other important crystal structures. These structural studies make it possible to compare reference structures with disease-implicated genes through modelling and so to interpret dysfunction. Most of the solved structures are used as templates for molecular modelling.
1.3 GPCRS: POPULAR DRUG TARGETS

As GPCRs are involved in a wide variety of physiological processes, such as regulation of immune system activity and inflammation, cell density sensing, the sense of smell, vision, autonomic nervous system transmission, and behavioral and mood regulation, they are effectively targeted in medicinal chemistry. Several reviews highlight the clinical importance of GPCRs (Insel et al 2007), and a few examples illustrate the importance of GPCR biology in medicine. For instance, a number of monogenic mutations in rhodopsin have been identified as causing retinitis pigmentosa, and GPCRs are implicated in a number of endocrine disorders and in serious illnesses such as schizophrenia (Seeman 1987), Alzheimer's disease and Parkinson's disease (Lee et al 1978). Genetic disorders of the calcium-sensing receptor (CaSR), Graves' disease, cancer, diabetes, heart diseases, neurodegenerative diseases, asthma, autoimmune diseases and AIDS are further examples that emphasize the multi-functional role of GPCRs and their clinical implications. The diversity of GPCRs and of their ligand-binding properties makes these receptors interesting targets for structure-based drug design (Schlyer and Horuk 2006) and even opens scope for personalized medicine. Notably, the AT1 angiotensin, adrenergic, dopamine and serotonin (5-hydroxytryptamine, 5-HT) receptor subtypes are among the most exploited drug targets because of their clinical importance.
1.4 STRUCTURE AND CELLULAR ACTIVITIES OF MEMBRANE PROTEINS

Membrane proteins are embedded within the lipid bilayer and are designated transmembrane proteins because they loop in and out across the cell boundary (Figure 1.2). One class of cell-surface receptors shares a common architecture: an extracellular N-terminus, an intracellular C-terminus and seven transmembrane helices (TMHs) connected by three intracellular and three extracellular loops. This snake-like arrangement gives rise to names such as 7TM receptors, heptahelical receptors or serpentine-like receptors (Probst et al 1992). Since the downstream targets of these membrane receptors are guanine nucleotide-binding proteins, they are also referred to as guanine nucleotide-binding protein-coupled receptors or G-protein coupled receptors (GPCRs), and they are well known for their versatile functional importance. GPCRs are ubiquitous, as they play a major part in signal transduction and recognize various types of ligands (Bockaert and Pin 1999). Substantial evidence on GPCR oligomerization (Prinster et al 2005), participation in signaling pathways (Greenwald 2005), clinical importance (Kuwabara and N 2001) and the availability of repositories for multiple organisms (Fredriksson and Schioth 2005) provide significant impetus for the study of GPCR sequences and their ligand-binding properties. Ligands can be endogenous compounds such as amines, peptides, Wnt proteins or cell surface adhesion molecules, photons, or exogenous compounds such as odorants.

1.5 MEMBRANE PROTEIN: TOPOLOGY

Several methods are available online to predict the topology of membrane proteins. The prediction methods are mainly based on
the hydrophobicity profile of the helices. Notably, canonical GPCR members, including the olfactory receptors of higher-order organisms, exhibit N-out and C-in topology (Figure 1.3). Interestingly, Drosophila ORs and GRs instead retain N-in and C-out topology (Bargmann 2006, Benton et al 2006, Lundin et al 2007), which is also referred to as inverted or reverse topology. Methods such as HMMTOP (Tusnady and Simon 2001), SOSUI (Hirokawa et al 1998), TMHMM (Krogh et al 2001), TMAP, MEMSAT, TMpred, TSEG, TM-finder, Pred-TMP, SPLIT, DAS, TopPred II, PREDTMR2, MPEx, Phobius and TOPCONS are popularly used to predict the transmembrane topology of membrane proteins. Methods are also available to discriminate signal peptides (Lao et al 2002) in proteins.
Figure 1.3
Membrane topology of olfactory receptor (odr-10) in C. elegans

The seven transmembrane helices of odr-10 predicted by HMMTOP are shown in TOPO2 display: residues 12-31 for TM1, 44-63 for TM2, 94-113 for TM3, 126-145 for TM4, 202-225 for TM5, 256-275 for TM6 and 286-305 for TM7. The conserved YRY motif in TM3/ICL2 and the Str superfamily-specific QLF motif in ICL3 are highlighted in red.
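The hydrophobicity-based idea behind many of the topology predictors listed in Section 1.5 can be illustrated with a minimal sliding-window sketch. It uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6, which are commonly quoted settings for locating membrane-spanning segments. It is not the algorithm of HMMTOP or TMHMM, which rely on hidden Markov models; the function names and the toy sequence are assumptions made only for illustration.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every window of `window` residues."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_tm_segments(sequence, window=19, cutoff=1.6):
    """Merge overlapping windows whose mean hydropathy reaches `cutoff`.

    Returns 1-based, inclusive (start, end) residue ranges.
    """
    segments = []
    for i, value in enumerate(hydropathy_profile(sequence, window)):
        if value < cutoff:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            # Window overlaps the previous segment: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


if __name__ == "__main__":
    # Hypothetical toy sequence: charged flanks around a leucine stretch.
    toy = "D" * 10 + "L" * 20 + "K" * 10
    print(candidate_tm_segments(toy))
```

Because neighbouring windows overlap, the reported ranges are wider than the true hydrophobic core; dedicated predictors refine the helix boundaries and also assign the in/out orientation, which this sketch does not attempt.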
1.6 GPCR MECHANISM

Membrane proteins are actively involved in signal transduction (Figure 1.4), where GPCRs are activated by various external stimuli (Rodbell et al 1971). Under the influence of such stimuli, the receptor undergoes a conformational change (a minimal rearrangement occurs in the TM3 and TM6 helices, although this area remains unclear) that activates a guanine nucleotide-binding protein (G-protein). GPCRs are dedicated to recognizing intercellular messenger molecules (such as hormones, neurotransmitters, lipids, biogenic amines, and growth and developmental factors) and several sensory messages (such as light, odors and gustatory molecules). The downstream response depends primarily on the type of G-protein. For instance, the Golf subunit is mainly associated with chemosensory signals and participates in olfactory signaling pathways (Figure 1.4). The Gs type of G-protein regulates the enzyme adenylate cyclase (AC). AC is activated when it binds the α subunit of the activated G-protein, and it subsequently triggers the cAMP pathway for further transduction, resulting in various biological responses. Activation of AC stops when the G-protein returns to the GDP-bound state (Figure 1.4). GPCRs also regulate other effectors such as ion channels, adenylyl cyclases and phospholipases.
Figure 1.4
GPCR signaling pathway

The image represents GPCR signal transduction: binding of ligands/stimuli, activation of the G-protein subunit, subsequent activation of the cAMP pathway and internalization leading to biological responses. (Image adapted from the home page of DB-DRD4, a database of the dopamine D4 receptor; SOURCE: TRENDS in Pharmacological Sciences; URL: http://www.ibibiobase.com/projects/db-drd4/G_protein.htm)
1.7 GPCR CLASSIFICATION

GPCRs comprise the most prolific family of cell membrane proteins. Knowledge of GPCR classification is necessary because these receptors take part in various signaling pathways, recognize a diverse set of ligands and are related to various biological functions. Candidate GPCRs with the characteristic seven TM-helices have been classified with the aid of several prediction methods and classifiers. Although all candidate GPCRs from the various families retain seven TM-helices connected by ICLs and ECLs, their sequences differ and they exhibit subtle structural diversity (Gether 2000). The GPCR superfamily is classified mainly into class A (rhodopsin-like), class B (secretin-like), class C (metabotropic glutamate), class D (fungal pheromone P- and α-factor receptors), class E (fungal pheromone A- and M-factor receptors) and class F (slime mold cAMP receptors) (Kristiansen 2004). In particular, class A is the largest, accounting for 80% of the distribution, and contains diverse receptors such as rhodopsin, olfactory, biogenic amine, bioactive lipid, nucleotide, and
peptide receptors. Receptors such as secretin, calcitonin, glucagon, parathyroid hormone and vasoactive intestinal peptide receptors belong to class B. Class C includes metabotropic glutamate receptors (mGluRs), the Ca2+-sensing receptor, γ-aminobutyric acid type B receptors (GABA-B) and vomeronasal receptors type 2. Class D contains the fungal pheromone P- and α-factor receptors (STE2/MAM2), whereas the fungal pheromone A- and M-factor receptors (STE3/MAP3) belong to class E. Class F contains the slime mold cyclic adenosine monophosphate (cAMP) receptors. More recently, a few other GPCR families have been added to the existing ones: frizzled type receptors/FRZ (Vinson and Adler 1987, Bhanot et al 1996), smoothened type receptors/SMT (Alcedo et al 1996, Nehme et al 2010), vomeronasal receptors type 1/VNS (Dulac and Axel 1995), ocular albinism proteins (Schiaffino et al 1996, Schiaffino et al 1999) and plant receptors (Grill and Christmann 2007) such as the Arabidopsis thaliana receptor GCR1 (Josefsson and Rask 1997, Perfus-Barbeoch et al 2004). Classes A, B and C cover nearly 600 GPCRs in the human genome, excluding putative candidate GPCRs. Notably, olfactory receptors (ORs) are members of class A and are dealt with exclusively in Chapter 6, under the title of genome-wide survey on olfactory receptors in selected eukaryotes.

1.7.1 Olfactory Receptors (ORs)

The sense of smell, the process of olfaction, is beyond simple scientific understanding. In general, the chemical senses are broadly divided into olfaction (the sense of smell) and gustation (the sense of taste). Understanding and analyzing olfaction is necessary not only from a biological or chemical perspective but also because smell is a powerful sociocultural phenomenon (Low 2005). Olfactory receptors participate in sensing diverse chemical stimuli or odors (Firestein 2001).
ORs are fascinating for their functional significance in detecting food, assessing its quality, enhancing its flavor, indicating the presence of potential toxins and pathogens, and recognizing reproductive status, gender, genetic identity, conspecifics, mates and threats. ORs activate chemosensory cells, leading to neural recognition, and influence behaviour, hormonal state and mood (Munger et al 2009). Because of their diverse roles, ORs are part of everyday life and need to be explored in more detail for practical applications in the pharmaceutical industry (aroma therapy), the cosmetic industry (scent/perfume manufacturing) and the food industry, and for the study of olfacto-sexual function, olfacto-neural communication, olfactory disorders and so on. Thus, a genome-wide survey of the ORs of selected eukaryotic organisms will add to scientific understanding and ultimately serve human benefit.

1.7.2 Classical Knowledge on Olfactory Receptors

The landmark paper published in 1991 by the Nobel Laureates Buck and Axel explained the role of olfactory receptors and the organization of the olfactory system (Buck and Axel 1991). Around three percent of our genes code for the different odorant receptors on the membrane of the olfactory receptor cells. Further research, including phylogenetic approaches discriminating the class I and class II receptors that sense water-borne and air-borne odors in higher eukaryotes such as human and mouse (Zozulya et al 2001, Niimura and Nei 2005), studies of insect olfaction (Robertson et al 2003) and nematode olfaction (Robertson and Thomas 2006), work on olfactory signaling and on the availability of ORs in various genomes, and the common peptides observed in OR subfamilies of selected eukaryotic genomes (Gottlieb et al 2009), provides a remarkable background. It facilitates the genome-wide survey of ORs and, further, the identification of OR subclusters, cluster-specific motifs, species-specific tendencies and co-clusters in tree topology (see Chapter 6 for more details).
1.7.3. Olfactory Signaling Pathway in Human ORs

The process of olfaction starts with the binding of an odorant to a specific receptor on a sensory neuron, where chemical energy is transformed into electrical signals that convey the smell. This binding activates Golf, a G protein. The alpha subunit of Golf activates the enzyme adenylyl cyclase, generating the major second messenger 3',5'-cyclic adenosine monophosphate (cAMP), which directly opens the cyclic nucleotide-gated channel. This allows Na+ and Ca2+ to flow in and depolarize the cell. Depolarization causes action potentials (nerve impulses), which are sent to the olfactory bulb. Signaling can also proceed by a pathway involving the guanylyl cyclase GC-D (Meyer et al 2000). The human nose expresses different types of receptors, which enable the main olfactory system to use a common pathway to encode thousands of odorants (Figure 1.5 a and b).
Figure 1.5 ORs and organization of the olfactory system in mammals and OR signaling pathway (Meyer et al 2000)
a) Pictorial representation of ORs and the organization of the olfactory system in mammals. b) OR signaling pathway, depicting the two proposed hypotheses of OR signal transduction (Meyer et al 2000). The upper panel describes the entry of various odors, their recognition by ORs and the initiation of the cAMP-signaling pathway, which involves a G protein (Golf), an adenylyl cyclase (ACIII), a cyclic nucleotide-gated (CNG) channel (α3α4β1b) and a chloride channel (ClC). After the response, cAMP is degraded by a CaM-dependent phosphodiesterase (PDE1C2). The other hypothesis (lower panel) explains the components of the cGMP-signaling pathway and the putative targets of cGMP, which involve the receptor guanylyl cyclase GC-D, the cGMP-regulated PDE2, an unknown cGMP-regulated ion channel and the known CNG channel of the cAMP-signaling pathway.
1.7.4. ORs, GRs and IRs in Drosophila

Olfactory neurons play a central role in sensing the volatile cues that allow an organism to detect food, predators and mates, whereas gustatory neurons sense soluble chemical cues that elicit feeding behaviours. In insects, taste neurons also initiate innate sexual and reproductive responses. It is believed that nearly 60 olfactory receptors (Berkeley Drosophila Genome Project database) play a major role in identifying and discriminating the diverse odors that matter for insect survival, and the Drosophila olfactory receptor (DOR) gene family was identified as encoding G-protein coupled receptors (Clyne et al 1997, Gao and Chess 1999, Vosshall and Stocker 2007). These proteins are expressed in distinct subsets of olfactory neurons, and certain family members are restricted to distinct portions of the olfactory system. Nearly the same number of gustatory receptors (GRs) serve gustatory functions (Clyne et al 1997). Notably, insect GRs have the same transmembrane topology as insect ORs. The ionotropic receptors (IRs) of Drosophila, which are related to the ionotropic glutamate receptors that mediate chemical communication between neurons at synapses, are regarded as a new family of odorant receptors. These proteins accumulate in sensory dendrites rather than at synapses, and they are expressed in a combinatorial fashion in sensory neurons that respond to many distinct odors but express neither insect odorant receptors (ORs) nor gustatory receptors (GRs).

1.7.5. Insect olfaction (Drosophila ORs)

Several fundamental studies have investigated the molecular mechanism of Drosophila olfaction (Siddiqi 1990, Clyne et al 1999). Electrophysiological studies explained the differentiation in the
morphology of the olfactory sensilla and their distribution patterns (Venkatesh and Singh 1984, Stocker 1994). Studies suggest that there are 30 different classes of ORNs in the antenna (in adult ~40), based on the odor response profiles of individual neurons, and that a few exhibit odor specificity. Notably, 24 antennal receptors (Or2a, Or47b, Or33b, Or49b, Or65a, Or23a, Or85f, Or88a, Or67c, Or43a, Or7a, Or43b, Or59b, Or9a, Or85a, Or47a, Or22a, Or19a, Or67a, Or35a, Or98a, Or85b, Or82a and Or10a) were tested experimentally with 110 odorant molecules using the empty neuron system (Dobritsa et al 2003), and the responses of the receptors varied across chemical classes. Generally, a functional insect OR consists of a variable odorant-binding OR and a constant receptor called OR83b, which together form a heteromeric complex that participates in the signaling pathway. Because of this functional importance, OR83b is also called a co-receptor (Vosshall and Stocker 2007). It is also reported in the literature (Larsson et al 2004) that heteromeric insect ORs comprise a new class of ligand-activated non-selective cation channels (Sato et al 2008). Notably, insect ORs lack homology to the G-protein coupled chemosensory receptors of vertebrates and use a drastically different mechanism of olfaction. Recent studies describe insect ORs as heteromeric ligand-gated ion channels (more details in Chapter 6).

1.7.6. Nematode Olfaction

Chemosensory receptors in nematodes are highly diverse and large in number. Since worms lack both auditory and visual senses, chemosensation plays a central role in nematode survival. In C. elegans, chemosensory receptors are G-protein coupled receptors with seven transmembrane helices. Around 1330 genes and 400 pseudogenes have been
identified as chemoreceptors (Robertson and Thomas 2006) in C. elegans. Many of these receptors are known as serpentine receptors, and around 19 large gene families have been reported so far. Among this large number of proteins, only one, odr-10 (Figure 1.3), has been reported as an olfactory receptor in C. elegans (Sengupta et al 1996).

1.7.7. Mouse Olfaction

As in human, mouse ORs fall into two broad classes with excellent bootstrap support (Glusman et al 2001). Class I ORs resemble those found in fish and frog and had been considered an evolutionary relic in mammals (Ngai et al 1993), whereas class II receptors are found in amphibians and terrestrial vertebrates (Freitag et al 1995). There are 147 class I OR genes in the mouse OR subgenome, of which 120 are potentially functional. In mouse, all class I ORs are located in a single large cluster on chromosome 7.

1.8 DATA REPOSITORIES FOR MEMBRANE PROTEINS

A large number of data repositories and topology prediction servers are available exclusively for membrane proteins. Notable GPCR-related repositories (Elefsinioti et al 2004) include gpDB (Theodoropoulou et al 2008), GPCRDB and integrated web resources such as the G Protein Coupled Receptor - Oligomerization Knowledge Base Project and the GPCR Natural Variants database (NaVa). The SEVENS database (Ono et al 2005) provides useful sequence information, chromosomal locations and intra-genomic phylogenetic clusters for membrane proteins from more than 50 eukaryotic organisms. IUPHAR (Committee on Receptor Nomenclature and Drug Classification) incorporates detailed pharmacological, functional and
patho-physiological information on GPCRs, voltage-gated ion channels, ligand-gated ion channels and nuclear hormone receptors. Related structural resources such as PDBTM and TOPDB (Tusnady et al 2008) provide collections of domains and sequence motifs. TMpad (Trans Membrane Protein Helix-Packing Database) and MPDB (Membrane Protein Data Bank) provide structural information on integral, peripheral and anchored membrane proteins as well as peptides (Raman et al 2006). Data repositories for olfactory receptors are also publicly accessible. ORDB (Skoufos et al 2000), HORDE (The Human Olfactory Data Explorer) and the integrated web resources of SenseLab, with associated links such as odorDB and odorMapDB, are particularly useful for retrieving olfactory receptor (OR) sequences from multiple genomes.

1.9 COLLECTION OF GPCR HOMOLOGUES

Sequence similarity searches are robust techniques for identifying the nearest homologues of a query sequence in a database of interest. Pairwise comparison of proteins is the fundamental step in such searches. The similarity scores depend on sequence features such as the amino acids and the permitted amino acid substitutions (AAS). Generally, when a query and a subject align with a high similarity score, they can be considered related in sequence and are inferred to be homologues. In other words, two proteins with significantly similar sequences are taken to be homologues. Homologues are further classified into orthologs and paralogs. Orthologous proteins evolved from a common ancestral gene through speciation and belong to different genomes, whereas paralogs were generated by gene duplication and belong to the same genome. Thus, homologues share
significant sequence similarity and can be further connected for their functional relevance. An appropriate similarity search technique must be chosen when dealing with evolutionarily distant sequences, and particularly with membrane proteins. Each method has its own scoring scheme with respect to amino acid substitutions and gap penalties. Functionally and evolutionarily important protein similarities can be recognized by comparing three-dimensional structures, but when structures are not available, patterns of conservation such as motifs, profiles, position-specific scoring matrices and Hidden Markov Models can be used to identify related sequences in a protein sequence database. Several methods, including the sequence-based searches BLAST (Altschul et al 1997) and FASTA (Lipman and Pearson 1985), the profile-based search IMPALA (Schaffer et al 1999) and other approaches such as PSI-BLAST and RPS-BLAST, are used effectively to find homologues and, further, to identify common functional relevance.

1.9.1 BLAST (Basic Local Alignment Search Tool)

Sequences are compared by producing quality alignments that maximize the correspondence between similar residues and minimize gaps (Altschul et al 1997). The objective here is to align or match a sequence of unknown function with
characterized/annotated proteins from model organisms, so that structure and function can be extrapolated to the new sequence. Dynamic programming yields optimal alignments, either local (Smith-Waterman) or global (Needleman-Wunsch), whereas BLAST and FASTA (Lipman and Pearson 1985) are fast and robust heuristic methods that approximate local alignment. Conceptually, the heuristic approach (BLAST) can deal with sequences that differ considerably in length and identifies islands of
short matches. It is a heuristic approximation of the Smith-Waterman algorithm (Smith and Waterman 1981); unlike Smith-Waterman, it is not guaranteed to find the optimal local alignment, but it efficiently locates maximal scoring segment pairs (MSPs) with respect to the scoring system. The scoring system mainly comprises the substitution matrix and the gap-scoring scheme used to align the sequences based on possible similarities. BLAST, a robust sequence comparison tool, offers five main search programs (blastp, blastn, blastx, tblastn and tblastx) for different combinations of nucleotide and protein inputs. BLAST reports statistically significant alignments in its output, and raw scores, bit scores and E-values are used to quantify alignment significance. Among them, E-values are used most often. Generally, the lowest E-values indicate the most significant alignments. An E-value is the number of alignments one expects to find by chance, with a score greater than or equal to the observed alignment score, in a search against a random database of the same size. The PAM (point accepted mutations per 100 residues) amino acid scoring matrix, which is based on an explicit evolutionary model (Dayhoff et al 1978), is provided in the BLAST software distribution. It includes PAM40, PAM120 and PAM250, whereas the BLOSUM matrices are based on an implicit model of evolution and include BLOSUM 45, 62 and 85 (Henikoff and Henikoff 1992). Generally, these matrices are well suited to globular proteins, whereas PAM and JTT-200 (Jones et al 1992) can be used for membrane proteins.

1.9.2 PSI-BLAST (Profile Vs Sequence comparison method)

Among the BLAST programs, the work described in this thesis relies mostly on the basic protein BLAST technique, which includes blastp (protein-protein BLAST), PSI-BLAST (Position Specific Iterated BLAST), PHI-BLAST (Pattern Hit Initiated BLAST) and DELTA-BLAST (Domain
Enhanced Lookup Time Accelerated BLAST). As the name suggests, blastp compares a protein query with a protein database. PSI-BLAST builds a PSSM (position-specific scoring matrix) from the results of the first blastp run and then iteratively uses this profile as the query against the protein sequence database (Altschul et al 1997). The profile generated at each iteration is searched against the database until convergence (that is, until no new sequences are found). This method is therefore effective in associating even distantly related sequences with remote homology. The application can be further refined through jump-start PSI-BLAST (Altschul et al 1997), the jack-knife approach, HOE (homologous over-extension)-reduced profile search (Gonzalez and Pearson 2010) and improved PSI-BLAST search techniques such as cascade PSI-BLAST (Bhadra et al 2006), as per user requirement.

1.9.3 Reverse PSI-BLAST (Sequence Vs Profile comparison method)

The reverse PSI-BLAST technique (RPS-BLAST) is highly effective for associating remotely related sequences. It differs from other sequence searches in that the query sequences are searched against a database of PSSM profiles. PSSMs give the amino acid propensities at each sequence position based on a multiple alignment. PSSM generation also uses the sequence weights of the multiple alignment, the expected number of amino acids and the frequencies of unobserved amino acids (pseudocounts). Representative sequences from protein families (for example, 3PFDB, Shameer et al 2009), related domains and cluster types can be used to generate profiles that represent sequence properties as a consensus block of amino acids. Hence, the sequence search space is broadened and the opportunity is extended to connect sequences at remote homology (Figure 1.6).
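The PSSM concept described above can be illustrated with a minimal sketch that builds a log-odds matrix from a small gap-free alignment and slides it along a sequence. This is a simplification and not the PSI-BLAST or RPS-BLAST implementation: it assumes a uniform background frequency, a fixed pseudocount and unweighted sequences, whereas the real programs use sequence weighting and data-dependent pseudocounts. The function names and the toy DRY-like alignment are hypothetical and serve only as illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0, background=None):
    """Build a log-odds PSSM (in bits) from equal-length, gap-free sequences.

    Assumption: uniform background frequencies unless `background` is given.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in alignment)
        column = {}
        for aa in AMINO_ACIDS:
            # Pseudocounts keep unobserved residues from scoring -infinity.
            freq = (counts.get(aa, 0) + pseudocount) / total
            column[aa] = math.log2(freq / background[aa])
        pssm.append(column)
    return pssm


def score_window(pssm, segment):
    """Sum the position-specific scores of `segment` against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))


def best_hit(pssm, sequence):
    """Slide the PSSM along `sequence`; return (best score, 0-based start)."""
    width = len(pssm)
    return max(
        (score_window(pssm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    # Hypothetical toy alignment of a three-residue motif.
    profile = build_pssm(["DRY", "ERY", "DRY", "DRF"])
    print(best_hit(profile, "AAADRYAAA"))
```

In a sequence-versus-profile search of the RPS-BLAST kind, each query would be scored against a whole database of such matrices, one per family or cluster, instead of against a single profile.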
Methods that compare a protein sequence against a database of protein sequences have some limitations. If stringent sequence criteria are applied in sequence-against-sequence searches, there is a chance of missing very distantly related sequences. RPS-BLAST, in contrast, helps to associate even distantly related sequences with their related profiles. Its practical applications therefore include generating cross-genome phylogenies, finding new members, associating evolutionarily distant sequences, classification and assigning functional annotation to new sequences based on known data. This effective method requires care in designing profiles, setting significant E-value thresholds and interpreting sequence searches against related profiles. Separately, Hidden Markov Models (HMMs) can also be used for pattern recognition, as they provide a mathematical representation of a protein sequence family (Eddy 1998, Karplus et al 1998). HMMs have been used for gene prediction, recognition of transmembrane helices (Sonnhammer et al 1998), phylogenetic analysis (Felsenstein and Churchill 1996) and distant homology detection (Krogh et al 1994b). Machine learning approaches are appropriate for pattern recognition problems and for recognizing remote homology. Methods such as support vector machines (SVMs) (Pugalenthi et al 2010) are used effectively in classification problems, where a model trained on a dataset with known features (positive and negative sets) is used to classify unknown gene/protein sequences and to propose putative members; the predictions rely upon the training dataset.
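The E-value thresholds referred to in Section 1.9 can be related to alignment scores through the Karlin-Altschul statistics, E = K m n e^(-λS), with the bit score S' = (λS - ln K) / ln 2, so that E = m n 2^(-S'). The sketch below only evaluates these formulas. The values of λ and K are illustrative assumptions of the order reported for gapped BLOSUM62 searches, the query and database lengths are hypothetical, and real BLAST additionally applies effective-length corrections that are omitted here.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score S to a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least `raw_score`."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def e_value_from_bits(bits, query_len, db_len):
    """Same E-value, computed from the normalized bit score."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Assumed statistical parameters and search-space sizes (illustrative).
    lam, k = 0.267, 0.041
    query_len, db_len = 350, 5_000_000
    for raw in (60, 100, 140):
        print(raw, bit_score(raw, lam, k),
              e_value(raw, query_len, db_len, lam, k))
```

Because the E-value scales with the product of query and database lengths, the same raw score becomes less significant as the searched database grows, which is one reason thresholds need to be chosen with the database size in mind.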
Figure 1.6 Overview of the techniques involved in genome-wide survey

The diagram depicts the use of available data repositories related to membrane proteins (GPCRDB, SEVENS DB, ORDB, HORDE and so on), followed by the collection of sequences, prediction of the membrane topology and application of a redundancy filter as the primary steps for the cross-genome studies. The methodology starts with sequence search programs (such as BLAST, PHI-BLAST, PSI-BLAST and RPS-BLAST) to identify homologous sequences and to perform cross-genome analysis.
1.10 MULTIPLE SEQUENCE ALIGNMENT TECHNIQUES
Alignment procedures play a crucial role (Figure 1.1 and Figure 1.6) in analyzing the relationships among diverse sequences. Two or more sequences can be arranged by aligning them on common properties or sites. Weights can be assigned to the aligned elements so as to determine the degree of relatedness or to detect the existing homology between the multiple sequences. A pairwise alignment is between two sequences, whereas a multiple sequence alignment (MSA) involves many sequences; both facilitate sequence comparison studies, and the sequences can be aligned by various alignment methods. MSA can be regarded as a generalization of pairwise sequence alignment. Here, instead of aligning two sequences, n sequences are aligned simultaneously, where
n is always >2, hence the name multiple sequence alignment; aligning multiple sequences is made possible by introducing gaps (_) into the sequences. Membrane proteins differ considerably from globular proteins in sequence composition. The regions that insert into the cell membrane possess different hydrophobicity patterns when compared to soluble proteins. Multiple sequence alignment techniques designed for globular proteins are therefore not optimal for aligning transmembrane proteins, and the recommended alignment procedures (Pirovano 2008) should be employed with care. When sequences from different genomes are aligned together, the alignment is referred to as a cross-genome sequence alignment and the resulting phylogeny as a cross-genome phylogeny (Figure 1.6).
1.10.1 CLUSTAL W
CLUSTAL W (Thompson et al 1994) is a popular MSA tool. The technique consists of three main stages: 1) all pairs of sequences are aligned separately in order to calculate a distance matrix giving the divergence of each pair of sequences; 2) a guide tree is generated from the distance matrix; 3) the sequences are progressively aligned according to the branching order in the guide tree. Initially, the CLUSTAL W program applies a fast approximate (heuristic) method based on the number of K-tuple matches (the K-tuple is the size of the exactly matching fragment that is used) for generating pairwise distances (Wilbur and Lipman, 1983). A dynamic programming algorithm is then used to enhance accuracy, scoring with gap opening penalties (GOP) and gap extension penalties (GEP). The method improves the quality of alignment by implementing amino acid weight matrices such as BLOSUM with the series 80, 62, 45 and 30, PAM with the series 20, 60, 120 and 350, and the GONNET
matrix (can be used for larger datasets) with the series 80, 120, 160, 250 and 350. Though CLUSTAL W is handy for aligning a large number of sequences with reliable accuracy, a few alignment tools are specifically recommended for transmembrane proteins; these are conceptually different in that they align TM helices and loops using different matrices (for example, PRALINE TM and MAFFT).
1.10.2 PRALINE TM
Servers for aligning TM proteins (like PRALINE TM) are more specific, in that the transmembrane regions are first predicted (Pirovano 2008). Reliable topology prediction methods define the boundaries of TM domains and loops as an initial requirement. PRALINE TM uses HMMTOP v2.1 (Tusnady and Simon, 2001), TMHMM v2.0 (Krogh et al 2001) and Phobius (Käll et al 2007) for membrane predictions. Then, the profile scoring scheme applies TM-specific substitution scores from matrices like PHAT to reliably compare TM positions. Finally, an alternative iterative scheme is applied to enhance the alignment quality. A recent study suggests that the PHAT matrix (Ng et al 2000) outperforms the JTT matrix (Jones et al 1992), especially in database searching (Ng et al 2000). Earlier methods like STAM (Shafrir and Guy, 2004) are also useful; STAM is the first multiple sequence alignment program targeted at aligning transmembrane proteins.
1.10.3 MAFFT
MAFFT (Multiple Alignment using Fast Fourier Transform) can be used for aligning large datasets of transmembrane proteins. The method is more advanced than other alignment programs in increasing the accuracy of alignments, even for sequences having large insertions or extensions as well as distantly related sequences of similar length. The MAFFT alignment program (Katoh et al 2002) is effective with two different heuristics, namely the progressive method (FFT-NS-2) and the iterative refinement method
(FFT-NS-i). Another important feature of the program is that the number of input sequences can be very large, and it offers a range of multiple alignment methods such as L-INS-i (accurate; for alignment of <~200 sequences), FFT-NS-2 (fast; for alignment of <~10,000 sequences) and so on. Yet another attractive feature of the alignment program is that it provides a range of matrices, especially the JTT 200 matrix (Jones et al 1992), which is usually meant to deal with membrane proteins. A large number of sequences aligned by appropriate alignment tools can be viewed and, if required, edited with the help of alignment viewers/editors. Alignment viewers and editors such as Seaview, Jalview, CLC, Genedoc, SeqPop, BioEdit and MEGA are highly useful in visualizing and improving the alignment quality.
1.11 DERIVING PHYLOGENY OF GPCRs/ORs
Multiple sequence alignments (MSA) supply the sequence properties at equivalent regions, which can be used to drive phylogenetic analysis. A phylogeny is a hypothesis, wherein the tree topology depicts the inferred evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical and/or genetic characteristics. The tree-based representation of the observed relationships among the species/sequences (protein or nucleotide) can be used to infer past evolutionary trends within and across genomes. Various computer-aided programs are available to construct phylogeny by the maximum likelihood (ML) method (Strimmer and von Haeseler 1997), the unweighted pair group method with arithmetic mean (UPGMA) algorithm (Sokal and Michener 1958) or the neighbor joining (NJ) method (Saitou and Nei 1987). The ML method is a probabilistic approach and evaluates every possible tree topology given a starting set of sequences. Here,
by assigning probabilities to every possible evolutionary change at informative sites, and by maximizing the total probability of the tree, the optimal tree can be found. The NJ method eliminates possible errors that can occur with the UPGMA method. The NJ algorithm not only evaluates pairwise distances (using distance matrices), but also selects neighbors that minimize the total length of the tree. The NJ method is recommended for sequences whose evolutionary distances are short. Multiple packages are available for both standalone and on-line access. Suites like PHYLIP, TREE-PUZZLE and MEGA are user-friendly and are appropriate tools to perform phylogenetic analysis by both ML and NJ methods.
1.11.1 PHYLIP
PHYLIP (Phylogeny Inference Package) (Felsenstein, 1981) is a free computational phylogenetic package consisting of 35 portable programs. It supports parsimony, distance matrix and likelihood methods, including bootstrapping and consensus trees.
1.11.2 TREE-PUZZLE
TREE-PUZZLE is a popular computer program to reconstruct phylogenetic trees from molecular sequence data, such as nucleotide or protein sequences, based on the maximum likelihood (ML) method (Schmidt et al 2002). It implements the quartet puzzling algorithm. The average distance between all pairs of sequences (maximum likelihood distances) is computed. These distances can be viewed as a rough measure of the overall sequence divergence. Tree reconstruction is performed in three steps. In the ML step, quartets are formed from the n supplied sequences (the number of sequences in the alignment). All quartets are evaluated using the ML method, and the three quartet topologies ab|cd, ac|bd and ad|bc are weighted by their posterior probabilities. In the puzzling step, the quartet trees are combined into an intermediate tree by adding sequences one by one. As this step
is highly dependent on the order of sequences, many intermediate trees are constructed from different input orders. In the consensus step, a majority rule consensus tree is built from the generated intermediate trees. These two steps are time-consuming, and the result files (.dist, .puzzle and .outtree) are useful for interpreting tree topologies. Evolutionary models such as the DAYHOFF, JTT and mtREV24 (Adachi and Hasegawa, 1996; for use with proteins encoded on mtDNA) matrices are provided. Others, like BLOSUM 62 and the WAG model (Whelan and Goldman, 2004), are for more distantly related amino acid sequences. VT is also for use with proteins of distant relationships (Muller and Vingron 2000).
1.11.3 MEGA (Molecular Evolutionary Genetics Analysis)
MEGA is a user-friendly software package for phylogenetic studies, which also integrates sequence alignment approaches like CLUSTAL W and MUSCLE. MEGA 5 can be employed for phylogenetic reconstruction and phylogeny visualization, and for testing an array of evolutionary hypotheses using maximum likelihood (ML), maximum composite likelihood (MCL), neighbor-joining (NJ), minimum evolution (ME) and maximum parsimony (MP) to produce bootstrap consensus trees for the required number of replications. MEGA is handy for displaying tree topologies legibly in rectangular, radial and circular displays (Kumar et al 2008).
1.12 CLUSTER ASSOCIATIONS
Cluster associations can be inferred from the generated tree topologies. Understanding the distribution of clusters with significant bootstrap (BS) values helps to classify/group the related sequences. For example, in the phylogenetic analysis of mouse olfactory receptors (Zhang and Firestein 2002), nearly 1000 OR genes were classified into several OR families by using a consensus tree. For the classification, they identified reliable clusters as those having >50%
bootstrap support and more than 40% protein identity. By this definition, mouse ORs were classified into 228 families. This kind of segregation of gene/protein sequences creates cluster associations for the protein families of interest. Cluster associations provide information about the conserved species-specific behaviors and evolutionary integrity obtained at intra- and inter-genomic levels (Figure 1.6).
1.13 SEQUENCE CONSERVATION AND DIVERSITY
The intra- and inter-genomic phylogenetic studies guide the sequence association for species-specific tendency as well as coclustering arrangements. Evolutionarily conserved sequence properties such as motifs (Scott Gleim 2009) are highly important for making further connections to structural and functional relevance. Several computational techniques and software tools are available to locate and display conserved amino acid residues in an aligned set of homologous sequences. Available tools and databases such as TOPDOM, MeMotif, PROSITE, IMOTdb, SmoS and WEBLOGO, along with the in-house MotifS program (by Sowdhamini, yet to be published), can be used to visualize a set of aligned TM proteins together with the observed motifs and AAS. Such annotation tools can be applied in comparative genomics of GPCRs or ORs to identify cluster-specific/family-specific motifs, along with the knowledge of predicted topology (Figure 1.6).
1.14 HOMOLOGY MODELLING OF GPCRs/ORs
The sequence searches and clustering provide representative sequences for generating three-dimensional structures, which further helps to map hotspots and to associate functional properties. Comparative
modelling/homology modelling is an appropriate procedure for generating 3D models for the proteins of interest and can be achieved by the following steps:
i) Primarily, homologous sequences of the query can be collected by using effective sequence search methods. The nearest homologous sequence whose structure is known can be used as a template.
ii) Pairwise alignment of the template and target sequences can be made by using appropriate alignment methods. Procedures such as PRALINE TM and MAFFT can be used for membrane proteins. Alignments can be manually edited to improve the alignment quality (using MEGA).
iii) Co-ordinates of the three-dimensional model can be built from the generated alignment by using software like MODELLER (Sali and Blundell, 1993) and web servers like SWISS-MODEL (Arnold et al 2006).
iv) The potential accuracy of the generated models is assessed, and models with the least energy constraints can be selected. If unfavorable conformations and short contacts are observed, the model can be minimized by using the SYBYL software package (Tripos Associates Inc).
v) Structure validation can be done by checking for disallowed conformations or structural environments, guided by Ramachandran plot values from the PROCHECK server (Laskowski et al 1993) and by VERIFY 3D (Bowie et al 1991).
In essence, the compiled writings in this introductory chapter provide the necessary background to the work chapters 2-6 that follow.
CHAPTER 2 CROSS-GENOME CLUSTERING OF HUMAN AND C. ELEGANS G-PROTEIN COUPLED RECEPTORS
2.1 INTRODUCTION
Membrane proteins are ubiquitous (Perez 2005), constitute nearly 20% of whole genomes and are the most attractive drug targets, since they are implicated in various diseases. Membrane proteins are embedded within the lipid bilayer and are designated as transmembrane proteins, since they loop inside and outside of the cell boundaries. A class of cell-surface receptors retains characteristic structural features: an extracellular N-terminal and an intracellular C-terminal, with seven transmembrane helices (TMHs) connected by three intracellular and three extracellular loops. This gives a snake-like structural appearance, hence names such as 7TM receptors, heptahelical receptors or serpentine-like receptors. If the downstream targets of such membrane receptors are guanine nucleotide binding proteins, they are also referred to as guanine nucleotide-binding protein-coupled receptors, G-protein coupled receptors (GPCRs) or serpentine receptors.
2.2 C. ELEGANS - AN ATTRACTIVE ANIMAL MODEL
C. elegans is an attractive experimental animal model and is a hermaphrodite with a protandrous reproductive system. For over 50 years, C. elegans has been used as a viable/feasible model organism, because of i) a clear understanding of the complete cell lineage from fertilization to maturity (Brenner 1974) in the worm, ii) detailed study of its entire nervous system (White et al 1986), iii) knowledge of RNA interference in manipulating the expression of genes (Fire et al 1998), iv) convenient storage and maintenance protocols and v) availability of genome information from the C. elegans Sequencing Consortium (Hillier et al 2005). These classical reasons made this simple nematode a useful model organism. Further, the award of the Nobel Prize in Physiology or Medicine in 2002 to Sydney Brenner, of the MRC Laboratory of Molecular Biology at Cambridge, added more interest and confidence in employing C. elegans as a model organism. Phylogenetic analysis of C. elegans with other organisms of interest leads to comparative genomics and helps to explore the functional relevance observed between genomes at the cross-genome level. So, for the current study, a cross-genome phylogenetic analysis with selected human GPCRs and C. elegans GPCRs has been pursued.
2.2.1 Features Related to C. elegans and Human GPCRs
The current objective of studying cross-genome GPCR clustering could be very effective in addressing the commonality/relevance between C. elegans and human GPCRs at the sequence level, and the resulting sequence information can be further studied for structural and functional relevance. The promising features of C. elegans that support comparison with higher order organisms are as follows: It has been observed that at least 12 signal transduction pathways in C. elegans show either partial or high degree of conservation with human biology, and this relevance has been brought into the limelight (Scientific Frontiers in Developmental Toxicology and Risk Assessment 2000). One of the best studied pathways in C. elegans is the insulin/insulin-like growth factor (IGF-1) signaling pathway, which is related to the mechanism controlling the lifespan of the worm. The same pathway is conserved across taxa such as Drosophila and human.
High conservation of the TGF-β pathway and partial conservation of the Toll-like receptor pathway and the JAK/STAT signaling pathway are noted. The Wnt pathway via β-catenin, receptor serine/threonine kinase (TGF-β receptor) pathway, hedgehog pathway (patched receptor protein), receptor tyrosine kinase pathway, notch-delta pathway, receptor-linked cytoplasmic tyrosine kinase (cytokine) pathway, IL-1/Toll receptor NF-kappaB pathway, nuclear hormone receptor pathway, apoptosis pathway, receptor protein tyrosine phosphatase (RPTPs) pathway, receptor guanylate cyclase pathway, nitric oxide receptor pathway, G-protein coupled receptor (large G-protein) pathway, integrin pathway, cadherin pathway, gap junction pathway and ligand-gated cation channel pathway are among the pathways mentioned in early literature for their conservation across fly, nematode and vertebrates. Neurodegenerative diseases, such as Parkinson's and Huntington's diseases, have been explored by using transgenic C. elegans. Notably, genes related to Alzheimer's disease and colon cancer in humans have counterparts in C. elegans (Kuwabara and O'Neil 2001). The Machado-Joseph disease gene (SCA3/MJD) also has an identified homologue in C. elegans (Fortini et al 2000). A recent report on the identification of two C. elegans proteins, NP_491453 and NP_506566, with more than 40% sequence similarity to human gonadotropin releasing hormone receptors I and II (GnRHR1 and GnRHR2) provides a platform for relating the extent of similarities in their reproductive endocrinology (Vadakkadath Meethal et al 2006). Functional expression studies were performed on human somatostatin receptor 2 (Sstr 2) and chemokine receptor 5 (CCR5) in the gustatory neurons of C. elegans (Teng et al 2006). Further, various bioinformatic approaches and resources (C. elegans Sequencing Consortium, 1998; Sonnhammer et al 1997) also report that C. elegans homologues have been identified for 60-80% of human genes (Lai et al 2000).
The selected model organism C. elegans carries an extensive repertoire of Pfam domain matches, conserved signalling pathways and homologues of proteins found in other organisms, such as human and Drosophila. Also, the increasing evidence for genetic and physiological similarity (e.g., stress response and basic physiological processes) with higher order organisms (humans) makes it worthwhile to study C. elegans GPCRs, both to explore the nematode genome for its genetic influence on evolutionary trends (Remm and Sonnhammer 2000) and to assess the effectiveness of employing C. elegans as a model organism for understanding fundamental pathways in higher order organisms.
2.3 OBJECTIVES
The significant genetic tractability in the whole genome of D. melanogaster (1%) and C. elegans (<5%) (C. elegans Sequencing Consortium, 2012) and the occurrence of more than 1000 candidate GPCRs in humans are further reasons to investigate these cell surface receptors across genomes (Marinissen and Gutkind, 2001).
2.4 PRIOR ART
A previous lab publication (Metpally and Sowdhamini 2005) proposed a novel approach to establish phylogenetic cluster association and dealt with eight major groups of human GPCRs, namely peptide receptors (PR), chemokine receptors (CMK), nucleotide & lipid receptors (N&L), biogenic amine receptors (BGA), secretin (SEC), cell adhesion (CAR), glutamate receptors (GLR), and frizzled and smoothened (FRZ/SMT), together with selected GPCRs from the Drosophila genome. The current chapter focuses on comparing C. elegans GPCRs (Brenner 1974) with the established human GPCR clusters. The study aims to associate more than 1000 GPCRs of C. elegans with the already grouped known human GPCRs. For this, the previously
established 32 human-Drosophila GPCR cluster dataset was used: the candidate receptors from the Drosophila genome were removed, and the representative GPCRs from each cluster were used to generate a PSSM profile representing the cluster property. These profiles were then searched with RPS-Blast (Khader Shameer et al 2009, Marchler-Bauer et al 2009) to associate them with C. elegans GPCRs. In principle, the cross-genome phylogenetic approach is intended to understand the evolutionary plasticity, to discover functional similarities and to identify functionally related genes, orthologous relationships and conserved motif patterns across these two organisms. The current study, in turn, will make it possible to observe the details of cluster association in retaining nematode-specific gene clusters and also the established evolutionary integrity with human GPCRs in the cross-genome phylogeny.
2.4.1 Superfamilies of Serpentine Receptors
C. elegans chemoreceptor genes are strikingly abundant and diverse, comprising around 400 apparent pseudogenes and about 1300 predicted genes that encode putative chemosensory receptors (Robertson and Thomas 2006, Melkman and Sengupta 2004), usually referred to as serpentine receptors (SR) or chemosensory receptors (CR). These genes are classified under the serpentine receptor (SR) superfamily, and the occurrence of serpentine receptors at about 7% of the whole genome indicates an extreme dependency on chemosensory abilities, owing to the lack of visual and auditory systems in C. elegans (Chen et al 2005). Through genetic screening, Sengupta et al identified odr-10 in C. elegans as participating in olfactory response, which in turn reveals the relationship between odr-10 and other serpentine receptors in C. elegans GPCR families (Sengupta et al 1996). Recent reports state that
serpentine receptors like srg-36 and srg-37 are pheromone-like receptors that participate in sensing ascaroside pheromones and are observed on the sex chromosomes (McGrath et al 2011). In another instance, the candidate serpentine receptors srbc-64 and srbc-66 are also implicated in pheromone activity in C. elegans (Kim et al 2009). These case studies are helpful in understanding the divergent chemoperception properties in C. elegans with reference to GPCRs. A detailed compilation on SR superfamilies and the relative families of odr-10 by Robertson and co-workers (Robertson and Thomas 2006) provides information on the phylogenetic distribution, function and expression patterns of C. elegans chemosensory receptors. They classify C. elegans serpentine receptors into nearly 20 recognizable families on the basis of sequence similarity and shared intron locations. Nineteen of these families are well-established and grouped under superfamilies such as the Sra superfamily (sra, srab, srb and sre), the Srg superfamily (srg, srt, sru, srv, srx and srxa) and the Str superfamily (srd, srh, sri, srj and str); the others, or Solo type, include srbc, srsx, srw and srz. Notably, the large Str family, along with the related sri and srj families, is observed to be related to odr-10 (olfactory receptor) in C. elegans.
2.5 METHODOLOGY
2.5.1 Selection Criteria for C. elegans GPCRs
i. Retrieval of C. elegans GPCR sequences
Around 1204 GPCR sequences of C. elegans were collected from the SEVENS database (Ono et al 2005) and subjected to prediction of the membrane topology (Figure 2.1).
ii. Prediction of membrane topology for C. elegans GPCRs
The membrane topology of each GPCR sequence was predicted by using the SOSUI (Hirokawa et al 1998) and HMMTOP (Tusnady and Simon 2001) prediction methods (refer step 1.2 in Figure 2.1). The observed consensus from both methods was used to define the eligible candidate GPCRs.
iii. Elimination of over/under predicted TM helices
The C. elegans GPCR sequences predicted to have 7 (±2) TM helices were retained, whereas GPCRs predicted with fewer or more helices than this cut-off were removed from the dataset. Thus, a total of 1160 GPCR sequences of C. elegans were retained after this screening procedure as the C. elegans GPCR dataset (Figure 2.1).
iv. Alignment of Human (31)/Drosophila (1) GPCR clusters
The human-Drosophila GPCR cluster dataset for 32 clusters of eight major groups was obtained from our previous lab publication (Metpally and Sowdhamini 2005) (herein referred to as the known cluster association). Since we were interested in performing cross-genome phylogeny with C. elegans GPCRs, the associated Drosophila GPCR sequences were eliminated from the
previously established human-Drosophila GPCR clusters, so as to obtain human GPCR-only clusters from 31 clusters and a Drosophila GPCR-only cluster from one cluster (Cluster No 26), since it was associated only with Drosophila GPCRs (Metpally and Sowdhamini 2005); these were used for our current study. Owing to the removal of Drosophila GPCR sequences from the human-Drosophila GPCR cluster dataset of the known cluster association (except for the 26th cluster), extra indels were observed in the previous
alignments. The extra indels observed at each alignment position of the MSA were carefully edited manually by using MEGA 4.0 (Tamura et al 2007). The resulting improved alignments for the 32 clusters comprised a total of 353 human GPCR sequences from 31 clusters and 14 Drosophila GPCRs from one cluster (Cluster 26) (Figure 2.1); this pre-aligned set of GPCR associations is referred to as the GPCR cluster dataset.
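The helix-count screen described in step iii can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tolerance is taken as 7 ± 2 helices, "consensus" is interpreted as both predictors falling inside that window, and the sequence identifiers and function names are hypothetical. The helix counts are assumed to have been parsed already from the SOSUI and HMMTOP outputs.

```python
def passes_tm_screen(sosui_count, hmmtop_count, expected=7, tolerance=2):
    """Return True when both predicted helix counts lie in expected ± tolerance.

    Assumption: 'consensus' means agreement of both predictors on the window,
    not necessarily on the exact helix count.
    """
    low, high = expected - tolerance, expected + tolerance
    return low <= sosui_count <= high and low <= hmmtop_count <= high


def screen_dataset(predictions, expected=7, tolerance=2):
    """Filter {sequence_id: (sosui_count, hmmtop_count)} to retained ids."""
    return sorted(
        seq_id
        for seq_id, (sosui, hmmtop) in predictions.items()
        if passes_tm_screen(sosui, hmmtop, expected, tolerance)
    )


if __name__ == "__main__":
    # Hypothetical helix counts for four made-up sequence identifiers.
    toy_predictions = {
        "seq_a": (7, 7),   # retained
        "seq_b": (4, 7),   # under-predicted by one method
        "seq_c": (9, 5),   # both inside the 5..9 window
        "seq_d": (10, 7),  # over-predicted by one method
    }
    print(screen_dataset(toy_predictions))
```

A stricter reading of consensus (identical counts from both methods) would only require replacing the window test with an equality check.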
Figure 2.1 Flow-chart to depict the step-wise procedure for cross-genome clustering of GPCRs

Note: Step 1.1 indicates the collection of C. elegans GPCRs from the SEVENS database, followed by the removal of redundancy by the CD-hit server. Step 1.2 refers to the prediction of membrane topology by the HMMTOP server and SOSUI. Step 1.3 refers to the removal of over/under predicted TM helices. Step 1.4 refers to the construction of the human GPCR cluster dataset from the already established human-Drosophila GPCR cluster dataset and preparation of the respective human GPCR cluster alignments. Step 2 refers to the construction of PSSM profiles from the respective human (31) and Drosophila (1) GPCR cluster alignments. Step 3 refers to the use of RPS-Blast for the association of C. elegans GPCRs with the human GPCR profiles and the generation of cross-genome GPCR clusters. Steps 4 and 5 refer to the cross-genome alignment and phylogeny, respectively.
2.5.2 Generation of Representative Profiles
The previously aligned GPCR cluster dataset (by CLUSTAL W) for 32 clusters of known receptor types was used to generate position-specific scoring matrices (PSSMs), or profiles, to represent cluster/receptor-specific sequence properties. In the current attempt, 32 profiles were created for the 32 clusters by supplying their respective MSA of a representative sequence to the PSI-Blast procedure (Position-Specific Iterative Basic Local Alignment Search Tool) (Altschul et al 1997), ultimately producing 32 cluster-specific PSSM profiles. These profiles are termed representative profiles, since they represent the sequence properties of the respective alignments from the 32 clusters (refer step 2 in Figure 2.1).
2.5.3 Performing RPS-Blast
i. Trial study with known associations
Separately, E-value thresholds were standardized by performing a trial study in which 102 sample Drosophila GPCR sequences were chosen as queries; the correct cluster association could be obtained using RPS-BLAST at an E-value cut-off of <0.001 nearly 90% of the time. Cross-associations, which happened for 11 queries, were also meaningful (for example, the Q8MKU0 receptor identifies cluster 8, a peptide receptor cluster, but belongs to cluster 11, which also contains peptide receptors). With this initial standardization and a confidence level of 90% for correct association, each query from the C. elegans GPCR dataset (unknown association) was allowed to select its most closely related GPCR profile from the generated human (31)/Drosophila (1) GPCR cluster dataset by employing RPS-Blast (Reverse PSI-Blast).
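The trial standardization above amounts to assigning each query to its best-scoring profile under an E-value cut-off and counting how often that assignment matches the known cluster. The Python sketch below illustrates this bookkeeping only; it does not call RPS-BLAST. It assumes the hits have already been parsed into `(profile_id, evalue, bit_score)` tuples, breaks E-value ties by bit score (an assumption) and uses hypothetical query and cluster identifiers.

```python
def best_profile(hits, evalue_cutoff=0.001):
    """Return the profile id of the most significant hit, or None.

    hits: iterable of (profile_id, evalue, bit_score) tuples.
    Only hits with evalue < evalue_cutoff are considered.
    """
    significant = [hit for hit in hits if hit[1] < evalue_cutoff]
    if not significant:
        return None
    # Lowest E-value wins; ties are broken by the higher bit score.
    return min(significant, key=lambda hit: (hit[1], -hit[2]))[0]


def association_accuracy(trial_hits, known_cluster, evalue_cutoff=0.001):
    """Fraction of trial queries whose best profile equals the known cluster."""
    if not trial_hits:
        raise ValueError("trial set is empty")
    correct = sum(
        1
        for query, hits in trial_hits.items()
        if best_profile(hits, evalue_cutoff) == known_cluster[query]
    )
    return correct / len(trial_hits)


if __name__ == "__main__":
    # Hypothetical hit lists for three trial queries.
    trial_hits = {
        "query_1": [("cluster_8", 1e-20, 90.0), ("cluster_11", 1e-5, 40.0)],
        "query_2": [("cluster_3", 0.5, 20.0)],
        "query_3": [("cluster_11", 1e-10, 60.0)],
    }
    known_cluster = {
        "query_1": "cluster_11",  # cross-association with cluster_8
        "query_2": "cluster_3",   # no hit passes the cut-off
        "query_3": "cluster_11",
    }
    print(association_accuracy(trial_hits, known_cluster))
```

If the 11 cross-associated queries were the only misses among the 102 trial queries, the same calculation would give 91/102, or about 0.89, in line with the reported figure of nearly 90%.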
ii. Setting E-value thresholds for unknown association
In preliminary analysis, correct associations between the 100 selected Drosophila GPCRs and their respective human GPCR profiles were observed at an E-value range of < 0.001. E-value thresholds of 0.001, 0.01, 0.5, 1.0, < 5.0 and > 5.0 (up to 14) were tried with C. elegans GPCRs, so that each could select its closely related profile from the dataset with little chance of encountering false connections (11% of predicted false association for the known cluster association). The need to relax the E-value thresholds over different ranges arose from the higher evolutionary divergence between humans and C. elegans in the current study. The identification of the closest representative profile by each C. elegans GPCR sequence at varying E-values guides the respective C. elegans GPCRs to the respective human GPCR cluster (as in the previous dataset). The cross-genome cluster association was decided on the basis of the respective profile and the significant bit score, percentage identity and E-value from the hit list obtained from RPS-Blast. The E-value thresholds were mainly considered for finalizing the association. The previously associated human GPCR sequences of the respective known human GPCR profile were aligned with the newly associated C. elegans GPCR sequences (Figure 2.1).
2.5.4 Cross Genome Alignment of Human-C. elegans GPCRs
The pre-aligned set of GPCR sequences from the dataset and the newly associated C. elegans GPCR sequences were aligned by an appropriate multiple alignment tool called PRALINE TM (Pirovano et al 2008) to generate cross-genome sequence alignments of 32 GPCR clusters. In principle, the alignment is based on the TM topology and profile scoring schemes such as the PHAT matrix with a gap penalty of 15 (1 extension) for predicted TM regions and the Blosum 62 matrix for non-TM regions with a gap penalty of
16.5 (1 extension), followed by an iterative scheme to enhance the alignment quality. Where necessary, alignments were further optimized by manual editing through MEGA 4.0 software (Tamura et al 2007) (Figure 2.1).
2.5.5 Cross Genome Phylogeny of Human-C. elegans GPCRs
The generated cross-genome GPCR cluster alignments of human-C. elegans GPCRs were used to generate cross-genome phylogeny by using Tree-Puzzle (Schmidt et al 2002). This package was employed to perform quartet-based maximum-likelihood phylogenetic analysis with 10,000 puzzling steps, and the resolved trees were generated as phylogenetic tree files (outtree files). The resultant tree files were viewed in MEGA 4.0. Critical decisions about the constructed phylogenetic trees were made on the basis of branching patterns and cluster association (Figure 2.2A-C). A few terminologies were introduced to describe the types of cluster association, as discussed below.
2.5.6 Terminologies used to Describe Phylogeny
2.5.6.1 Human GPCR clade [HC]
Refers to the pure distribution (homogenous occurrence) of human GPCRs in the established branching pattern at intra-genomic level; these are referred to as HC in the tree topology (Figure 2.2A-C).
2.5.6.2 Coclusters [CC]
Refers to the coclustering, heterogenous distribution or clear intermixing of C. elegans GPCRs with human GPCRs, representing a strong cross-genomic clustering in the established branching pattern/clade at inter-genomic level; these are denoted as CC in the phylogenetic tree (Figure 2.2A and C).
41
2.5.6.3
Neighbor Clades [NC] Refers to homogenous occurrence of C. elegans GPCRs adjacent or
neighboring to human GPCRs clusters [HC] in the established branching patterns at intra-genomic level and is referred as NC in phylogenetic tree (Figure 2.2B). 2.5.6.4 Neighbor Members [NM] Refers to pure (homogenous) or intermixed (heterogenous) distribution of GPCRs in the branching pattern with limited nodes, mostly originating from root and is denoted as NM (Figure 2.2C). However, the observed associations may not be viewed as closely related at inter-genomic level, as mentioned in CC. 2.5.6.5 Species- specific members [SS] Refers to pure distribution (homogenous) of C. elegans GPCRs and remains as separate clade in a tree topology. The clades are denoted as SS in the phylogenetic tree. In this current study, although we refer species-specific members, it is only in the context of cross-genome human-C. elegans GPCRs and this term does not imply complete set of unique genes observed only in one species in the entire taxonomy and evolutionary tree of life (Figure 2.2C). 2.5.6.6 Superfamilies of serpentine receptors (SR) The distribution of chemosensory receptors of C. elegans are discussed according to superfamilies like Sra, Str, Srg and Other, broken into 24 types, as suggested by Robertson and coworkers. This classification has been followed throughout the 32 clusters to appreciate the influence of human GPCRs and species-specific preservation of such superfamily members.
Also, the bootstrap support values (herein referred to as Bs) and the cluster association of serpentine receptor types (superfamily level) were examined using both rectangular (dendrogram) and radial (radial display) views for the analysis, final graphical display and interpretation of results (Figure 2.1).
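The profile-association step described in Methods (choosing, for each C. elegans query, the closest human GPCR profile from the RPS-BLAST hit list) can be sketched as follows. This is a minimal illustration and not the pipeline used in this study. It assumes that the hits are available in the standard 12-column BLAST tabular format, that the lowest E-value decides the association and that ties are broken by bit score; the function name, the query names and the profile names are hypothetical placeholders.

```python
import csv
import io

# Column order of the standard 12-column BLAST tabular output (assumed input format).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

MAX_EVALUE = 14.0  # most relaxed E-value threshold mentioned in Methods


def best_profile_per_query(tabular_text, max_evalue=MAX_EVALUE):
    """Return {query: (profile, evalue, bitscore, pident)} for the best hit of each query."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        if evalue > max_evalue:
            continue  # beyond the most relaxed threshold: no association
        candidate = (hit["sseqid"], evalue, bitscore, float(hit["pident"]))
        current = best.get(hit["qseqid"])
        # Assumption: lower E-value wins; ties are broken by higher bit score.
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[hit["qseqid"]] = candidate
    return best


# Hypothetical hit list: identifiers and numbers are placeholders, not thesis data.
SAMPLE = "\n".join([
    "celegans_seq1\thuman_cluster_01\t31.2\t290\t180\t6\t12\t301\t5\t288\t2e-47\t165",
    "celegans_seq1\thuman_cluster_08\t24.0\t250\t170\t9\t20\t269\t10\t255\t3e-05\t44.1",
    "celegans_seq2\thuman_cluster_12\t19.5\t200\t150\t8\t30\t229\t15\t210\t0.86\t28.0",
    "celegans_seq2\thuman_cluster_13\t18.0\t180\t140\t7\t40\t219\t20\t195\t22\t21.5",
])

assignments = best_profile_per_query(SAMPLE)
for query, (profile, evalue, bitscore, pident) in sorted(assignments.items()):
    print(f"{query} -> {profile} (E={evalue:g}, bits={bitscore:g}, id={pident:g}%)")
```

In this sketch each query is guided to a single cluster, and a hit weaker than the most relaxed threshold is simply ignored; percentage identity is carried along for inspection rather than used as a criterion.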
Figure 2.2(A-C) Pictorial representation for various types of cluster association

Notes: 2.2.A. represents the cluster associations HC (human GPCR clade) and CC (coclusters), wherein HC represents the association of GPCRs from the human genome and CC represents the association of GPCRs from the human and C. elegans genomes. 2.2.B. refers to the NC association, where C. elegans GPCRs are observed adjacent or neighboring to HC. 2.2.C. refers to the species-specific (SS), cocluster (CC) and neighbor member (NM) associations that occur in the tree topology.
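The cluster-association terms can be illustrated with a short sketch that labels the child clades of a tree node by the species composition of their leaves. This is only an illustration under simplifying assumptions and not the procedure used in this study, where the trees were interpreted by inspection: the tree is given as nested tuples, leaf names carry hypothetical species prefixes ("Hs" for human, "Ce" for C. elegans), a C. elegans-only clade is called NC when at least one sibling clade is purely human and SS otherwise, and NM (which depends on the number of nodes) is not modelled.

```python
HUMAN, WORM = "Hs", "Ce"  # hypothetical species prefixes on leaf names


def leaves(clade):
    """Yield the leaf names of a clade given as nested tuples of strings."""
    if isinstance(clade, str):
        yield clade
    else:
        for child in clade:
            yield from leaves(child)


def composition(clade):
    """Return the set of species prefixes found among the leaves of a clade."""
    return {name.split(":", 1)[0] for name in leaves(clade)}


def label_children(node):
    """Label each child clade of `node` as HC, CC, NC or SS.

    Simplifying assumption: a C. elegans-only clade counts as NC when at least
    one sibling clade is purely human (HC), and as SS otherwise.
    """
    comps = [composition(child) for child in node]
    has_hc_sibling = any(comp == {HUMAN} for comp in comps)
    labels = []
    for comp in comps:
        if comp == {HUMAN}:
            labels.append("HC")
        elif comp == {HUMAN, WORM}:
            labels.append("CC")
        elif has_hc_sibling:
            labels.append("NC")
        else:
            labels.append("SS")
    return labels


# Placeholder tree: one human-only clade, one mixed clade, one C. elegans-only clade.
tree = (
    ("Hs:rec1", "Hs:rec2"),
    ("Hs:rec3", "Ce:rec4"),
    ("Ce:rec5", "Ce:rec6"),
)
print(label_children(tree))  # ['HC', 'CC', 'NC']
```

Composition alone separates HC, CC and C. elegans-only clades; telling NC from SS needs the position of the clade relative to HC, which is why the sibling check is included.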
2.6 RESULTS AND DISCUSSION
In the current Chapter (as mentioned in Methods), 1106 C. elegans GPCR sequences were associated by querying against the database of human GPCR profiles from the known cluster association using the sensitive RPS-BLAST technique. The newly associated C. elegans GPCR sequences were tabulated cluster-wise (Table 2.1), with significant E-values, using the procedure for cross-genome association detailed in Methods. The association observed between the two genomes identifies the best-connecting sequences for each known receptor type in a higher-order organism and an effective model organism, making it possible to compare sequence properties and then to connect the structural, functional and evolutionary relatedness among them. The observed cross-genome cluster associations are discussed in detail below, with a cluster-wise summary according to the observed cross-genome GPCR topology/phylogeny.
2.6.1 Result summary for Peptide Receptors
Clusters 1-11 are related to peptide receptors (PR), and around 442 GPCRs from C. elegans have been associated with nearly 101 human peptide receptors in the dataset (Table 2.1). PR occur predominantly in the dataset. Generally, the size of the peptide ligands varies from two amino acid residues to as many as 50. Broadly, GPCRs occur in the A1-A19 subfamilies. Many of the peptide receptors have potential clinical applications and are related to various diseases such as chronic inflammatory diseases, degenerative diseases, autoimmune diseases, cancer and cardiovascular diseases; hence, these receptors are also interesting drug targets. A few important receptors are given along with their groups to emphasize the distribution of various PR in Clusters 1-11. Small peptide receptors such as angiotensin (8 amino acids), bradykinin (9 amino acids) (refer Cluster 3) and apelin (~36 amino acids), and the orphan receptor GPRF, which acts as a co-receptor for the human immunodeficiency virus (HIV) (belonging to subfamilies A2 and A3), are observed in the cluster dataset (Joost and Methner
2002). Bombesin receptor, gastrin-releasing peptide receptor (GRPR), endothelin receptor (Cluster 7), thyrotropin-releasing hormone receptor (TRHR, TRFR) and motilin receptor, which are related to subfamily A7, are observed in the cluster dataset. Anaphylatoxin receptors, formyl peptide receptor, MAS1 oncogene, GPR1 (Cluster 6), GPR32 (Cluster 9), GPR44 and GPR77 (Cluster 9) are from subfamily A8; neurokinin receptor (Cluster 11), neuropeptide Y receptor (Cluster 11), prolactin-releasing peptide receptor, prokineticin receptors 1 and 2 (Cluster 11), GPR19, GPR50, GPR75 and GPR83 from subfamily A9 are found in the cluster association. Glycoprotein hormone receptor, leucine-rich repeat-containing G protein-coupled receptor 4, LGR5 and LGR6 from subfamily A10 (Cluster 7) are present in the cross-genome GPCR cluster associations, allowing their functional relevance to nematode GPCRs to be explored.
Cluster 1
Human GPCRs such as galanin receptors 1-3, melanin-concentrating hormone receptor, KiSS1-derived peptide receptor (GPR54) (de Roux et al 2003) and urotensin-II receptor, related to subfamily A5 of class A type receptors, are observed in Cluster 1. Cluster 1 is associated with eight human peptide receptors and 32 C. elegans GPCRs at different E-value cutoffs (Table 2.1). Human peptide receptors are distributed into one clade having only human GPCRs (HC1) and another clade having human GPCRs with a C. elegans GPCR (CC1). Interestingly, in CC1, coclustering of the neuropeptide receptor npr-9 from C. elegans with the human KiSS1-derived peptide receptor (GPR54; Swissprot code Q969F8) was observed (Figure 2.3.A and B). The human GPCR (Q969F8) observed in CC1 is implicated in breast carcinomas and is also majorly involved in endocrine regulation and the onset of puberty (Seminara et al 2003). Mutations in this gene have been associated
with hypogonadotropic hypogonadism and central precocious puberty in humans (de Roux et al 2003). Such a functionally important human peptide receptor shows an orthologous relationship (Table 2.2) with the neuropeptide receptor npr-9 of C. elegans, which proves to be an interesting target to be studied in the C. elegans model organism. Neighboring clusters (NC1-NC5) exhibit topology only with C. elegans GPCRs; NC1-NC3 and NC5, as well as the neighbor members in NM2, include a majority of hypothetical proteins. This suggests that these unknown or unannotated nematode GPCRs are probably related to human peptide receptors through cross-genome phylogeny and probably belong to class A (rhodopsin-like) GPCRs. NC4 associates with a pure set of srd members. Receptors from the Str superfamily (str, srh type) are observed as neighbor members in NM1. SS1 and SS2 retain candidate receptors of the srw type in C. elegans. The srw type is related to families of FMRFamide and other peptide receptors, which are expected to have relatives in vertebrates and insects and show strong clustering at the chromosomal level (Troemel et al 1995). Their association with the Cluster 1 human peptide receptors suggests that they serve as closely related environmental peptide receptors. Among the 32 nematode GPCRs distributed across the various E-value thresholds in Cluster 1, npr-9 of C. elegans shows a favourable association at a significant E-value (2.00E-47) with the human GPCR Q969F8. Interestingly, this receptor exhibits an orthologous relation with a human GPCR (Table 2.2).
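The relaxed E-value thresholds listed in Methods (0.001, 0.01, 0.5, 1.0, 5.0 and up to 14) can be read as successive tiers, and a short sketch shows how a reported E-value falls into one of them. This is an illustrative sketch only: the function name is hypothetical, treating each threshold as an inclusive upper bound is an assumption, and the four E-values are those reported in this chapter for npr-9, R106.2, str-177 and the srt member NP_507069.1.

```python
from bisect import bisect_left

# Upper bounds of the E-value tiers listed in Methods; hits above 14 are not associated.
TIER_BOUNDS = [0.001, 0.01, 0.5, 1.0, 5.0, 14.0]


def evalue_tier(evalue):
    """Return the smallest tier bound that admits `evalue`, or None if above all tiers."""
    index = bisect_left(TIER_BOUNDS, evalue)
    return TIER_BOUNDS[index] if index < len(TIER_BOUNDS) else None


# E-values reported in this chapter for four C. elegans GPCRs.
reported = {
    "npr-9": 2.00e-47,
    "R106.2": 3.00e-41,
    "str-177": 0.024,
    "NP_507069.1": 0.86,
}

for name, evalue in reported.items():
    print(f"{name}: E={evalue:g} -> tier <= {evalue_tier(evalue)}")
```

Under these assumptions the orthologous pairs fall in the strictest tier, while associations such as str-177 and NP_507069.1 are only recovered once the thresholds are relaxed.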
Figure 2.3 (A and B): Cross-genome phylogeny of peptide receptors: (Rectangular Display & Radial Display)
Cross-genome phylogeny of peptide receptors (Cluster 1): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet puzzling steps under the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are denoted in green; serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are indicated in both the rectangular (left-side) and radial (right-side) displays.
Cluster 2
The receptors related to subfamily A4, such as opioid and somatostatin receptors, are found in Cluster 2. Eleven candidate human peptide receptors (HC1) and 41 serpentine receptors of C. elegans are distributed in Cluster 2 (Figure 2.4.A-B). HC1 remains as a clade with a majority of somatostatin receptor types (Matsumoto et al 2000), and the neighboring clade NC1 has members from the Str superfamily (sri, str). NC2, NC4 and NC5 retain receptors of the srsx, srv, srsx and srw families, suggesting clear conservation at the superfamily level. NC6 branches with srm and srx (Others superfamily) and a hypothetical receptor, and NC7 includes six hypothetical proteins, including an unidentified vitellogenin-linked transcript family member (uvt-6). In particular, uvt-6 branches with a hypothetical protein (NP_510833.3) that belongs to the rhodopsin family and is most similar to the mammalian somatostatin receptors (refer WBGene00006864) (Wicher et al 2009). This helps to explain the association of C. elegans GPCR members with human peptide receptors by RPS-BLAST where somatostatin receptor types are present in the profile; this association is also reported in the SEVENS database. Diverse members, such as the srd, srv and sre types, are also observed as neighboring members in NM1. The pure dispersion of srh type receptors of C. elegans at SS1 indicates strong nematode-specific tendencies for these receptor types. uvt-6 exhibits significant association at an E-value of 2.00E-37. Notably, a hypothetical protein, R106.2 (NP_510833.3), associates with somatostatin receptor type SSR5 at a significant E-value of 3.00E-41 and is found to be an ortholog (Table 2.2), demonstrating the effectiveness of RPS-BLAST in associating related homologues across genomes.
Figure 2.4 (A and B) : Cross-genome phylogeny of peptide receptors: (Rectangular Display & Radial Display)
Cross-genome phylogeny of peptide receptors (Cluster 2): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet puzzling steps under the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are denoted in green; serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are indicated in both the rectangular (left-side) and radial (right-side) displays.
Cluster 3
In Cluster 3, HC1 retains eight entries of human peptide receptors, and 34 candidate GPCRs from C. elegans (C. elegans Sequencing Consortium 1998) are distributed into neighboring clusters NC1-NC5, neighboring members NM1 with 16 mixed receptor types, and the species-specific clade SS1 (Figure 2.5.A-B and Table 2.1). HC1 retains closely related peptide receptors such as the angiotensin II receptor, bradykinin receptor I and GPR15 types, which belong to the same subfamily A3 in the GPCR classification (Joost and Methner 2002). NC1 retains candidates of the srd type from the Str superfamily. The srd type chemoreceptors of C. elegans retain one potential disulfide bond in extracellular domain 2, share with the srh and sri families a highly conserved PYR sequence at or near the inner end of transmembrane (TM) helix 7, and belong to the Pfam profile PF10317 (Thomas and Robertson 2008). NC2 carries srxa and str type C. elegans GPCR members, and the NC3-NC5 clades carry pure sets of sri serpentine receptors, hypothetical proteins and srw members, respectively. Abundant srw members, with comparable representation from the Str superfamily (srh type) and Srg superfamily (srt, srv types) of C. elegans, are dispersed as neighboring members in NM1. SS1 covers six candidate GPCRs from the Str superfamily and a single representative from the Srg superfamily. Among the serpentine receptors distributed in PR Cluster 3, two hypothetical proteins, T15B7.11 (NP_504729.1) and C35A11.1 (NP_504431.2), in NC4 are associated at significant E-values of 3.00E-08 and 8.00E-08, respectively.
Cluster 4
In Cluster 4, 32 candidate GPCRs from C. elegans and 8 human GPCRs were observed, and the subgrouping of human GPCRs into HC1-HC3 helps to understand evolutionary integrity at the intra-genomic level (Figure 2.6 A-B). HC1 includes the human gastrin-releasing peptide receptor (GRPR), neuromedin-B receptor (NMB-R; neuromedin-B-preferring bombesin receptor, related to lung carcinoma) (Thomas et al 2009) and BRS3 (bombesin receptor subtype-3, related to obesity and associated diseases) (Ohki-Hamazaki et al 1997), and HC2 retains the functionally related endothelin receptor, non-selective type (ETBR) and endothelin-1 receptor precursor, ET-A (ET1R). G-protein-coupled receptor 37 and endothelin receptor type B-like (Villeneuve et al 2000), noticed in HC3, carry the same functional property, such as participating in glucocorticoid actions and blood pressure control. CC1 refers to the coclustering of a human GPCR (Q8TDV0) and a C. elegans receptor (NP_502893.2 (srv-14), refer WBGene00005725), which illustrates the inter-genomic association in the cross-genome phylogeny (Figure 2.6.A), suggesting similar function. The neighboring clusters NC1, NC3 and NC4 include the srsx, srw and srh types, respectively, each covering two members of the same receptor type. NC2 retains an sri type receptor with a hypothetical protein. The NC5 clade includes two members of the srj type of the Str superfamily with a hypothetical protein. NC6 includes hypothetical proteins (rhodopsin-like), while NC7, NC8 and NC11 retain exclusively srh type members. NC10 includes candidates of the srh type and an unannotated transmembrane protein. Two srh type receptors cluster with a hypothetical protein in NM1, further suggesting functional relevance. Overall, this cluster is an illustrative model of the rich distribution of Str type SR with hypothetical proteins, which helps to connect functional relevance from known receptor types. SS1 includes four srd members of the Str superfamily.
Apart from srv-14 observed in CC1, two hypothetical proteins in NC6, C54A12.2 (NP_494987.2) and C35A5.7 (NP_505697.2), exhibit significant association at E-values of 2.00E-13 and 1.00E-10.
Cluster 5
Cluster 5 carries 54 C. elegans GPCRs and 8 human GPCRs (Figure 2.7.A-B). The dispersion of human GPCRs in CC1 is notable for cross-genome clustering: the human thyrotropin-releasing hormone receptor (TRFR_HUMAN) coclusters with a GPCR from C. elegans (NP_491990.1, rhodopsin-like/hypothetical protein). Indeed, such coclustering is expected in this case, since there is significant evolutionary similarity between the two proteins; the pair is termed orthologous (Table 2.2) and also shares similar functional cues in the calcium signaling pathway (KEGG PATH: Ko04020). Further, NP_505077.1 (str-138), Neuromedin U receptor 2 of C. elegans, is observed to retain an orthologous relationship with the human GPCR Q96AM5, although their functional equivalence is yet to be established (Table 2.2). NC1 to NC10 are reported as neighboring clusters: NC1 includes candidates of the Sri and Srh types from the Str superfamily, pure sets of Srx and Srw candidates are observed in NC2 and NC6, and the rest of the neighboring clusters (NC3-NC5, NC7-NC10) retain mostly hypothetical proteins/receptors. The NC8 clade branches with a GNAT (GCN5-related N-acetyltransferase) family protein and a hypothetical protein member at a bs of 59, and NC9 branches with an Spr-2 Sex Peptide Receptor (Drosophila) Related family member (sprr-2) (NP_510455.2). Notably, all the neighbor members in NM1 are also hypothetical proteins. Overall, this cluster consists mainly of rhodopsin-like members (hypothetical proteins) and provides broader scope for connecting functional relevance with the 19 available subfamilies (A1-A19) of GPCRs of higher organisms (http://en.wikipedia.org/wiki/Rhodopsin-like_receptors). SS1 and SS2 include purely Str superfamily members (srj, str and srd), and notably SS3 retains the species-specific olfactory receptor Odr-10 (Sengupta et al 1996), which branches with three other str type receptors. This superfamily helps to correlate functional clues within the candidate representation for olfaction in C. elegans and supports the view that the Str/Stl family carries a large group of genes with a special functional relationship to Odr-10. Since Odr-10 is known for its olfactory role, the association of this sequence with class A members by RPS-BLAST, albeit at poor E-values and as a species-specific clade, is encouraging and suggests that such cross-family connections can be recognized using the significance of E-values and the mode of clustering. Apart from the orthologs, hypothetical proteins such as C48C5.1 (NP_509515.1) and K10B4.4 (NP_493666.2) are associated at significant E-values, are notably observed in CC1 and can be explored for functional relevance with the respective receptor profile.
Figure 2.7 (A and B) Cross-genome phylogeny of peptide receptors: (Rectangular Display & Radial Display)
Cluster 6
Cholecystokinin receptor, neuropeptide FF receptor, orexin receptor, vasopressin receptor and gonadotrophin-releasing hormone receptor (GNRHR, GRHR), which belong to subfamily A6, are seen in Cluster 6. Cluster 6 shows a peculiar pattern, associating eight human hormone receptors with 42 C. elegans GPCRs. Limited branching is the critical feature of this phylogeny, suggesting polytomy in this cluster (Figure 2.8.A-B). HC1 carries the closely related vasopressin receptors V2R, V1BR and V2BR (Thomas et al 2009), and five other human GPCRs are dispersed as neighbor members in NM1 along with diverse types of C. elegans GPCRs. NC1-NC5 are denoted as neighboring clusters because HC1 occurs between them. The NC1-NC3 and NC5 clades retain only srw, srh, srsx and srh type members, respectively. NC4 associates a hypothetical protein with a gnrr (NP_491453.1), and 27 entries are distributed as neighbor members in NM1, including five human peptide receptors. The C. elegans neighbor members mainly include candidates of the srw type, five candidate GPCRs from the Str superfamily, six entries of the Gnrr type, two hypothetical proteins and an sre type receptor. SS1 retains 2 srw receptors as a distinct species-specific clade. This cluster shows a peculiar intermixing of nematode gnrr (gonadotropin-releasing hormone receptor) (Vadakkadath Meethal et al 2006) with the human gonadotropin-releasing hormone receptor (GRHR), which helps to correlate biological significance in reproductive endocrinology in model organisms.
Figure 2.8 (A and B ) : Cross-genome phylogeny of peptide receptors: (Rectangular Display & Radial Display)
Cross-genome phylogeny of peptide receptors (Cluster 6): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet puzzling steps under the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are denoted in green; serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are indicated in both the rectangular (left-side) and radial (right-side) displays.
Cluster 7
Neuromedin U receptor (Cluster 5), neurotensin receptor (Cluster 5), thyrotropin-releasing hormone receptor (TRHR, TRFR), GHSR and GPR39 (Cluster 5), which belong to subfamily A7, are observed in Cluster 7. In Cluster 7, eight human and 34 C. elegans GPCRs are associated. The human GPCRs branch into HC1 and HC2, suggesting distinct members even within the human GPCR clusters, whereas CC1 illustrates the coclustering of the human leucine-rich repeat-containing GPCR7 (LGR7) and insulin-like peptide 3 receptor (LGR8) (Daniel et al 2006) with a homologue from C. elegans (fshr-1) (Figure 2.9.A-B). This strong coclustering can be explained not only by the orthologous relationship of fshr-1 to the mammalian follicle-stimulating hormone receptor, but also by its functional importance in germline differentiation and survival in the nematode taxon (Cho et al 2007). HC1 retains closely related human glycoprotein hormone receptors, and HC2 branches with related human leucine-rich hormone receptors, denoting the functional integrity observed in higher-order organisms for these types of receptors. This cluster covers species-specific members in SS1, including sri and srh type members of the Str superfamily. The neighboring clusters NC2 and NC3 have candidate representation from srx, sre and srt type receptors. NC3-NC9 and the following neighboring members are abundant and purely distributed with members from the largest superfamily, Str, and the highly duplicated srt members of C. elegans. Among the 34 C. elegans GPCRs, fshr-1, observed in the CC1 association, shows a highly significant E-value (2.00E-59) with LGR7 and LGR8.
Cluster 8
Cluster 8 carries eight human GPCRs and 34 C. elegans GPCRs. HC1 retains neuropeptide receptors, and HC2 also includes neuropeptide receptors, which are related to wakefulness, food consumption and locomotion in humans. Deletion of the orexin gene in mice produces a condition similar to canine and human narcolepsy in vivo (Sikder and Kodadek 2007). CC1 includes human cholecystokinin (CCK) receptors, important for gall bladder contraction and pancreatic enzyme secretion (Ulrich et al 1993), which branch significantly with a nematode cholecystokinin receptor type, Ckr-2 (NP_001022842.1), at a bs of 55. The observed association helps to analyse CCK receptors in the two taxa for functional similarities and is illustrative of cross-genome association. Eight neighboring members of C. elegans GPCRs, such as srw, sprr and srxa type members from the Other type superfamily and two hypothetical protein members, occur with a neuropeptide receptor (NK2R) and two other human neuropeptide receptors, suggesting a common functional relevance despite a heterogenous dispersion. Neighboring clusters are observed as NC1-NC3. NC1 is associated purely with srd type receptors. NC2 includes a str candidate with a probable GPCR of C. elegans, NP_509368.1 (C02B8.5). NC3 carries the neuropeptide receptors npr-1 and npr-2 of C. elegans. Overall, Cluster 8 includes a large number of srh type receptors from the Str superfamily, and this largest superfamily is retained in the nematode-specific clades SS1-SS7 (Figure 2.10.A-B), suggesting a significant over-representation or amplification of species-specific members. Apart from the ortholog npr-2, a hypothetical protein, Y54E2A.1 (NP_497057.2), is notably associated at a significant E-value of 2.00E-45.
Figure 2.10 (A and B ): Cross-genome phylogeny of peptide receptors: (Rectangular Display & Radial Display)
Cluster 9
Cluster 9 has 54 C. elegans GPCRs and 14 human GPCRs; C. elegans Str superfamily receptors dominate, along with the srbc receptor population and equally intermixed human peptide receptors (Figure 2.11 A-B). Human GPCRs within this cluster are sub-distributed into five clades (HC1-HC5), together with C. elegans str type members. Four other human peptide receptors, observed as neighbor members (NM1) in the tree topology, represent clear inter-genomic clustering. Interestingly, the formyl peptide receptors/chemoattractant receptors of human origin (FML1, FML2, FMLR and GP44), belonging to subfamily A8 (http://en.wikipedia.org/wiki/Rhodopsin-like_receptors and Rognan 2006), remain as neighbor members, suggesting that C. elegans GPCR counterparts could be identified for such receptors. NC1-NC12 are observed as neighboring clusters. Among the 12 neighboring clades (Figure 2.11 A-B), NC2, NC3, NC4, NC5, NC8, NC9, NC10 and NC12 in particular are predominantly populated by Str superfamily members, whereas NC1 has receptors from the Srg superfamily. Candidates of the srbc type occur exclusively in NC7 and NC11, and in NC6 a hypothetical protein has a counterpart in a typical GPCR (dct-12). NC11 includes receptors from the srbc and srw families. As an example from this cluster association, the srbc-64 and srbc-66 candidates (Figure 2.11 A-B) are responsible for pheromone activity in C. elegans; this illustrates the involvement of serpentine receptors in more than chemoperception and also indicates that serpentine receptors are not necessarily non-GPCRs (Murphy and Tiffany 1991); few are particular and most are related to potential GPCRs. SS1 carries distinct srj type receptors from the Str superfamily with a hypothetical protein. In particular, CAB03313.3 shows significant association at an E-value of 6.00E-06.
Figure 2.11 (A and B ) :Cross-genome phylogeny of peptide receptors: (Rectangular Display & Radial Display)
Cluster 10
Cluster 10 (Figure 2.12.A-B) includes eight entries of human GPCRs and 26 entries of C. members, while NC4 is with pure set of str members. Receptors of various types, such as sri, srt, srg, srh and srd, are also associated in the NM1 clade, allowing the diverse properties of serpentine receptors in C. elegans to be analysed. Notably, srd-2 (NP_496196.1) in NM1 exhibits significant association at an E-value of 5.00E-05.
Figure 2.12 (A and B): Cross-genome phylogeny of peptide receptors: (Rectangular Display and Radial Display)

Cluster 11
Cluster 11 retains 10 human chemokine peptide receptors and 62 C. elegans GPCRs (Figure 2.13.A-B) and exhibits considerable polytomy. Though all the entries unite at the root, the association provides NM1 with many neighboring members, such as hypothetical proteins, sre, sra, srh, sri and srw members and typical GPCR members like GAL4 and Tag-49, along with 10 related human peptide receptors. Notably, two human peptide receptors (NY1R and NY4R) retain a strong association and cluster together within the clade HC1, thereby providing NC2-NC17 as neighboring clusters (Figure 2.13.A-B). Two orthologous pairs are also present in this cluster (refer Table 2.2). Interestingly, each neighboring cluster retains its distinct identity, with a branching pattern carrying the same type of receptors belonging to the appropriate superfamily (Troemel et al 1995). Thus, NC1, NC2, NC5, NC6 and NC15-17 include a majority of hypothetical proteins, and NC3, NC14, NC8, NC10, NC11, NC13, NC14 include members from the largest superfamily, Str. Notably, a considerable number of hypothetical proteins related to human peptide receptors are observed at significant E-value thresholds in this cluster.
Figure 2.13 (A and B ) :Cross-genome phylogeny of peptide receptors: (Rectangular Display & Radial Display)
2.6.2 Result summary for Chemokine Receptors
Cluster 12 and Cluster 13
Clusters 12 and 13 include candidates from chemokine receptors. Cluster 12 is associated with 16 C. elegans GPCRs and 10 human GPCRs (Figure 2.14.A-B). HC1 accommodates purely the 10 human GPCR entries, denoting the evolutionary specificity of human chemokine receptors (Joost and Methner 2002). Likewise, the SS1 clade retains purely nematode chemokine receptors of the srh type from the Str superfamily. Neighboring clusters NC1 and NC2 also retain predominantly candidates from the Str superfamily (srd, str), wherein an srt member (NP_507069.1) is associated at the significant E-value of 0.86. NC3 is dispersed with an Sra type member and a hypothetical protein at a bs of 53 and is followed by diverse neighbor members such as sri, srbc and Col-40 in NM1. Notably, among the other serpentine type receptors, str-177 (NP_505383.2), observed in NC2, shows favourable association with human CMK receptors at the significant E-value of 0.024 (Figure 2.14.A-B).
In Cluster 13, 15 human GPCRs are associated with 13 C. elegans GPCRs. Human GPCRs are distributed in HC1, HC2 and NM1. In particular, the IL-8A and IL-8B receptors, which are functionally related to calcium storage in human cells (Teng et al 2006), are clustered together at the highest bs of 96 in the HC1 clade (Figure 2.15.A-B). HC2 comprises the receptor for adrenomedullin (ADMR) and Q8NE10. HC3 associates CCR5 and CCR3 at a bs value of 50, and both of these receptors are implicated in AIDS virology (Stefano Costanzi and Gershengorn 2006). NM1 includes 9 human chemokine receptors together with a hypothetical protein from C. elegans (NP_504623.1), associated at the significant E-value of 0.003, as neighbor members. Notably, among all human chemokine
receptors, DUFF (the Duffy antigen) (Joost and Methner 2002) is a distantly related receptor, but it is observed closer to a hypothetical protein of C. elegans. NC1-NC3 illustrate the distribution of C. elegans GPCR entries in neighboring clusters. NC1 associates with srh members of the Str superfamily, and NC2 covers a hypothetical protein. NC3 includes five members of the Str type and one member of the srv type. SS1 indicates species-specificity with an srxa member and a hypothetical protein. Notably, three hypothetical proteins, Y54G11B.1 (NP_497004.1), T15B7.12 (NP_504730.1) and F13H6.5 (NP_504623.1), showed favourable association with human CMK receptors at the significant E-values of 2.00E-06, 2.00E-06 and 0.003, respectively.
2.6.3 Result summary for nucleotide and lipid receptors
Clusters 14-19 associate receptors of the nucleotide and lipid types (Joost and Methner 2002), which are mainly activated by negatively charged ligands (Montero et al 2005). These receptors also retain basic residues at the ligand-binding sites and show high sequence diversity owing to the binding of different ligands. Notably, all the species-specific clades observed in Clusters 14 to 19 belong to the Str superfamily: SS1-SS8 and NC6 in Cluster 14; SS1-SS3, NC2 and NC12 in Cluster 15; SS1, SS2 and the majority of neighbor members in Cluster 16; SS1 in Cluster 17; SS1 in Cluster 18; and Srd members in SS1 of Cluster 19 illustrate the unique association of candidate GPCRs mainly from the Str superfamily. Such C. elegans-only GPCR clusters denote the abundance of this particular superfamily (Str) in the nematode genome.
Figure 2.14 (A and B ): Cross-genome phylogeny of chemokine receptors: (Rectangular Display & Radial Display)
Cross-genome phylogeny of chemokine receptors (Cluster 12): Phylogenetic trees were generated using TREE-PUZZLE 5.1 with 10,000 quartet puzzling steps under the maximum likelihood method. The out-group is not shown in the figure. The generated Newick tree files were colored using MEGA 4.0. Human GPCRs are denoted in green; serpentine receptors of C. elegans from the Sra superfamily (aqua), Str superfamily (fuchsia/pink), Srg superfamily (blue) and Others/Solo type receptors (maroon), as well as typical membrane proteins (purple) and hypothetical transmembrane proteins (red), are also shown. The respective cluster types and cluster numbers are indicated in both the rectangular (left-side) and radial (right-side) displays.
(a)
(b)
Figure 2.15 (A and B): Cross-genome phylogeny of chemokine receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of chemokine receptors (Cluster 13): tree construction, coloring and labeling are as described in the legend of Figure 2.14.
Cluster 14
Cluster-14 retains seven human GPCRs and 64 C. elegans GPCRs. Human GPCR members are distributed in the HC1 clade and are also present among the neighboring members in NM1. HC1 retains evolutionarily related human opsin members, and four other opsin candidates are dispersed along with diverse members of C. elegans (such as srsx, srbc, srx, srt and srg) (Figure 2.16.A-B), suggesting evolutionary conservation at the functional level across the two taxa for these types of receptors. The neighboring clades NC2-NC5 show a clear composition of srx, srw, srd and srx, respectively, whereas NC1 has members from both the sre and srg families. NC6 clusters a majority of Str-type members (srj and str families) and NM1 is observed with srt members. SS1-SS5 retain C. elegans GPCR members of srh type and SS6-SS8 retain sri type, both belonging to the large Str superfamily, demonstrating species-specificity. Notably, an sro-1 type receptor is associated at a favourable E-value of 6.00E-16 and is observed in NM1.
(a)
(b)
Figure 2.16 (A and B): Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of nucleotide and lipid receptors (Cluster 14): tree construction, coloring and labeling are as described in the legend of Figure 2.14.
Cluster 15
Cluster-15 is associated with 18 human GPCRs and 58 C. elegans GPCRs. Human GPCR members are distributed into HC1-HC4 along with C. elegans candidate GPCRs, and seven other human GPCR entries are dispersed in NM1 (Figure 2.17.A-B). SS1-SS3 carry a clear composition from the Str superfamily (srj, sri and str), denoting species specificity (Robertson and Thomas 2006), as opposed to NC1-NC13, which are observed as neighboring clades. NC1 branches at a significant bs of 94 and includes srv-type members of the Srg superfamily. NC2-NC4 include members of the Str superfamily (srh, sri, srm), whereas NC5 includes an srj-type receptor and a hypothetical protein. NC6, at a bs of 70, associates an sra-type member with a hypothetical protein. NC7 has an str-type receptor with hypothetical proteins at a bs value of 62. NC8 includes two members of sri type from the Str superfamily and a candidate of srt type from the Srg superfamily. NC9 and NC10 associate mostly with srw members. NC11 includes hypothetical proteins at a bs of 54. Notably, NC12 and NC13 are pure associations of candidates of sri type and srw type, respectively. NC13 is followed by seven human nucleotide and lipid receptors, with mostly srw members of C. elegans as neighboring members. The human GPCRs of HC1-HC4 do not cocluster with C. elegans GPCR members, but those observed along with neighbor members show intergenomic clustering. Notably, three hypothetical proteins, F53B7.2 (NP_505802.1), ZC374.1 (NP_509685.1) and K10C8.2 (NP_506168.1), exhibit favourable association with human N&L type receptors at significant E-values of 4.00E-15, 9.00E-14 and 1.00E-14, respectively.
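E-values are quoted throughout this section in several notations (for example 4.00E-15, 1.0e-71 and 0.003). As a minimal sketch, assuming the hits are available as identifier and E-value string pairs, they can be parsed and ranked as below. The cutoff of 1e-3 is a hypothetical value chosen for illustration and is not a threshold stated in this work.

```python
# Hits quoted in the text for cluster 15 (identifier, E-value as printed).
hits = [
    ("F53B7.2", "4.00E-15"),
    ("ZC374.1", "9.00E-14"),
    ("K10C8.2", "1.00E-14"),
]

E_VALUE_CUTOFF = 1e-3  # hypothetical threshold, not taken from the thesis


def rank_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Parse E-value strings, keep hits at or below the cutoff, best first."""
    parsed = [(name, float(text)) for name, text in hits]
    kept = [hit for hit in parsed if hit[1] <= cutoff]
    return sorted(kept, key=lambda hit: hit[1])


ranked = rank_hits(hits)
for name, e_value in ranked:
    print(f"{name}\t{e_value:.2e}")
```

Sorting on the parsed floating-point value rather than on the printed string avoids ordering errors between notations such as 9.00E-14 and 1.00E-14.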
(a)
(b)
Figure 2.17 (A and B): Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
Cluster 16
In cluster-16, all the human GPCR members are associated in the HC1 clade at a bs of 76 (http://en.wikipedia.org/wiki/Rhodopsin-like_receptors), whereas the neighboring clades NC1 and NC2 retain a pure set of srd GPCRs and a mixture of sra and srxa types, respectively. More C. elegans GPCRs, such as sri, srh, srx and hypothetical proteins, are distributed within cluster-16 as neighbor members (NM1) (Robertson 1998). SS1 and SS2, however, represent the species-specific clades containing a majority of candidate receptors from the Str superfamily (Figure 2.18.A-B). Notably, srx-86 (NP_001023896.1) is associated at a significant E-value of 2.00E-04, showing favourable association with human N&L type receptors and with a hypothetical protein T21622.
(a)
(b)
Figure 2.18 (A and B): Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
Cluster 17
Cluster-17 carries 18 human GPCRs and 30 C. elegans GPCRs (Figure 2.19.A-B). The human GPCR members are dispersed into HC1-HC5. HC1 comprises the cannabinoid receptors (CB1R, CB2R) at the best bs of 93, but the genomes of protostomian invertebrates like C. elegans contain neither CB1R nor FAAH orthologs (Table 2.2). This indicates that CB1-like cannabinoid receptors may have evolved after the divergence of deuterostomes (Elphick and Egertova 2001, McCarroll et al 2005). Lysophospholipid receptors cluster in the HC2 clade; GP12, GPR3 and GPR6 are associated in HC3; melanocortin receptors are observed in the HC4 clade; and HC5 carries sphingosine 1-phosphate receptors. SS1 retains receptors of Srj type and a hypothetical protein, which stay distant from the root. The neighboring clades were annotated as NC1-NC8. Srv-type members associate in NC1, the NC2 clade retains srh type [23, 66], NC3 retains pure Srj-type members (Robertson 1998) at an average E-value of 1.11 (Table 2.1) and NC4 comprises two hypothetical proteins. The NC5 clade is of str type with a bs of 70. NC6 is of sri type (20) with a bs of 65 and an E-value of 0.498, and NC7 is of Srh type. NC8 includes a hypothetical protein with an Srsx (Others type) receptor. Many more neighbors with diverse receptor types are observed in NM1. Overall, cluster-17 covers a larger number (24 entries) of serpentine types (srh, sri, srj, srg, str) that fall under the Str superfamily, and the rest of the receptors are from the Srg superfamily. Interestingly, pure clustering of members belonging to the same family is observed in NC1-NC7 of this cluster, which helps to explain the strong intra-genomic retention. Interestingly, two hypothetical proteins, ZK418.6 (NP_498545.1) and F10D7.1 (NP_510813.2), and srsx-22 (NP_505153.2) show favourable associations with the human N&L receptor type at significant E-values of 8.00E-08, 5.00E-10 and 6.00E-08. F10D7.1 and srsx-22 are observed in NC8, whereas ZK418.6 is observed in NM1.
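Whether a clade is a pure set of one family, and which superfamilies it draws on, can be checked from the member names. The sketch below is illustrative only. The family-to-superfamily grouping is an assumption based on the commonly cited grouping of Robertson and Thomas (2006) and may differ in places from the assignments used in this chapter, and the srj member names in the example are hypothetical.

```python
from collections import Counter

# Assumed family-to-superfamily grouping (after Robertson and Thomas 2006).
SUPERFAMILY = {
    **dict.fromkeys(["srh", "str", "sri", "srd", "srj", "srm", "srn"], "Str"),
    **dict.fromkeys(["sra", "srab", "srb", "sre"], "Sra"),
    **dict.fromkeys(["srx", "srt", "srg", "sru", "srv", "srxa"], "Srg"),
    **dict.fromkeys(["srw", "srz", "srbc", "srsx", "srr"], "Solo"),
}


def family_of(gene_name):
    """'srh-275' -> 'srh'; names without a known family prefix -> None."""
    prefix = gene_name.split("-")[0].lower()
    return prefix if prefix in SUPERFAMILY else None


def clade_composition(members):
    """Count clade members per superfamily; unknown names are 'Unclassified'."""
    counts = Counter()
    for name in members:
        family = family_of(name)
        counts[SUPERFAMILY[family] if family else "Unclassified"] += 1
    return counts


def is_pure(members):
    """True when every member belongs to one and the same known family."""
    families = {family_of(name) for name in members}
    return len(families) == 1 and None not in families


# NC8 of cluster 17 as described in the text, and a hypothetical srj-only clade.
nc8 = ["F10D7.1", "srsx-22"]
srj_only = ["srj-1", "srj-2", "srj-3"]
print(clade_composition(nc8), is_pure(nc8))
print(clade_composition(srj_only), is_pure(srj_only))
```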
(a)
(b)
Figure 2.19 (A and B): Cross-genome phylogeny of nucleotide and lipid receptors (Rectangular Display & Radial Display)
Cluster 18
Cluster-18 consists of 8 human and 28 C. elegans GPCR entries (Figure 2.20.A-B), which are related to the receptors binding prostaglandins, prostacyclins and thromboxanes. These receptors bind ligands that are derivatives of arachidonic acid (AA), which serves as the precursor via the cyclooxygenase (COX) pathway. Prostanoids function close to the site of synthesis, and they are deactivated before they are exported into the circulation as inactive metabolites. Prostanoids have essential homeostatic functions in the cytoprotection of gastric mucosa, renal physiology, gestation and parturition, but they are also implicated in a number of pathological conditions, such as inflammation, cardiovascular disease and cancer. The prostaglandin, prostacyclin and thromboxane receptors cluster together with a bs of 74 and are observed in HC1. The C. elegans GPCRs are grouped into NC1-NC5, where NC1 branches at the best bs of 93 with an Srt-type receptor and a hypothetical protein. NC2 retains a pure set of receptors from the Sra superfamily. NC3 and NC4 also associate with pure sets of receptors, such as srbc and srw, from the Others superfamily. Many str-type receptors, with a hypothetical protein and an srt-type receptor, are associated at NC5. An srxa receptor is observed in NM1. In SS1, receptors from the largest superfamily, Str, branch together with a hypothetical protein at an average E-value of 1.178. In particular, a hypothetical protein, AAB38097.2 (T07F8.2), associated at an E-value of 1.00E-15, shows significant association with the human N&L type receptors and is observed in NC.
(a)
(b)
Cluster 19
In this cluster, 18 human GPCRs and 93 C. elegans GPCRs are associated (Table 2.1 and Figure 2.21A-B), in which the human GPCRs are distributed into HC1-HC3. However, around 12 human GPCRs intermix with diverse types of receptors from C. elegans (such as str, srw, srh, srv, srxa and hypothetical proteins), suggesting intergenomic association at NM1. The human GPCR clades contain nucleotide and lipid receptors of the protease-activated receptor (PAR), psychosine receptor, lysophosphatidylcholine and sphingosylphosphorylcholine types. This cluster is highly populated with GPCRs from the Str superfamily and Others type receptors. NC1-NC24 are denoted as neighboring clades. Predominantly, receptors such as srh, str, srj, sri and srd, which belong to the Str superfamily, are observed at NC1-NC11, NC13-NC14, NC18-NC19 and NC21-NC22, along with hypothetical proteins and a few receptors of Others type. Notably, all the other neighboring clades (NC12, NC15-NC17, NC20, NC23 and NC24) associate with a pure set of srw and srbc type receptors. Overall, this cluster aggregates mostly Str superfamily and Others type receptors (Robertson and Thomas 2006), which provides information for connecting these receptor types with biogenic amine receptors of humans.
2.6.4 Result summary for biogenic amine receptors
Biogenic amine receptors are distributed into five clusters (cluster-20 to cluster-24), mainly consisting of trace amine receptors; melatonin receptors; serotonin receptors; muscarinic acetylcholine, adenosine and histamine receptors; and dopamine, octopamine and adrenaline receptors. Intermixing of human and C. elegans receptors was observed (Figures 2.22 A-B to 2.26 A-B). This suggests that biogenic amine receptors have an ancient evolutionary origin, as they are observed from invertebrates to higher vertebrates.
(a)
(b)
Cluster 20
Cluster 20 is represented mainly by trace amine (TA) receptors. Trace amines and their receptors may be useful in treating various neurological and psychiatric disorders (Berry 2007, Davenport 2003) and are potentially druggable targets (Foord et al 2002). They form a subfamily of GPCRs related to norepinephrine (NE), serotonin (5-HT) and dopamine (DA) receptors. Q9P1P5_Hum (GPR58) and Q9P1P4_Hum (GPR57) are closely related to Q96RJ0_Hum (TA1). Similarly, O14804_Hum, a putative neurotransmitter receptor (PNR), is closely related to the trace amine receptors Q969N4_Hum, Q96RI8_Hum and Q96RI9_Hum. The eight human GPCR entries branch with a very good bs value of 90 at HC1 (Figure 2.22.A-B), and eight C. elegans GPCR entries are distributed as NC1, SS1 and NM1. NC1 carries a pure set of srh-type receptors of the Str superfamily, and an srv-type receptor of the Srg superfamily is observed in NM1. Receptors of the Str superfamily (str, srh, srj), at an average E-value of 4.23, stay quite distant from the root and have been annotated as the species-specific clade SS1. Notably, apart from srv-33, associated at 7.00E-04, a hypothetical protein in the dataset, C24A8.6, is associated at a significant E-value of 1.00E-14, which suggests analysing its functional relevance with human N&L type receptors.
(a)
(b)
Figure 2.22 (A and B): Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
Cluster 21
Cluster-21 consists of four human and 49 C. elegans GPCRs (Figure 2.23 A-B). Human GPCRs are distributed into HC1 and CC1. HC1 retains GPCRs belonging to the family of melatonin receptors, and a candidate GPCR of C. elegans, NP_493667 (srd 3 type), is observed associating with a human GPCR (Q9NQS5_Hum) at a significant E-value of 0.01 in CC1, representing intergenomic association. The other C. elegans GPCRs associate into multiple clades (NC1-NC12). Interestingly, the NC1, NC4-NC6 and NC8-NC10 clades are predominantly associated with receptors of the Srg superfamily, whereas the NC2, NC7, NC11 and NC12 clades carry a pure distribution of receptors from the largest superfamily, Str. Abundant srx-type receptors of the Srg superfamily are observed with srbc-type receptors and a hypothetical protein at NM1. Overall, despite their association with human biogenic amine receptors, there is a clear branching into various members that belong to the Srg and Str superfamilies. Notably, candidate receptors from the Srg superfamily are predominantly associated in this cluster. The receptors srx-50 and srx-60 exhibit significant association at E-values of 2.00E-10 and 1.00E-09 in NM1 and NC10, respectively, and merit further study to connect their functional relevance with human N&L type receptors.
(a)
(b)
Figure 2.23 (A and B): Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of biogenic amine receptors (Cluster 21): tree construction, coloring and labeling are as described in the legend of Figure 2.14.
Cluster 22
Cluster-22 consists of 9 human and 29 C. elegans GPCR entries (Figure 2.24.A-B). The human 5-HT1 receptor class comprises five different receptors, which share 41 to 66% overall sequence identity among themselves and are observed in HC1 and HC2. Human GPCRs are distributed into HC1, HC2, CC1, CC2 and NM1, which represents the intermixing of receptors between the two taxa. In NM1, human 5HT1A has an orthologous relationship with NP_497452.1 (ser 4) of C. elegans at an E-value of 1.0e-71. The human dopamine 4 receptor and tyra-2 of C. elegans also cluster together as an ortholog pair. The C. elegans entry NP_506052.1 (srj type) associates with Q8TDV2 in CC1 with the best percentage identity among all the other members (15.0) at an E-value of 0.15, whereas NP_001024728.1 (ser 1) coclusters with Q16538 in CC2 and retains a percentage identity of 11.4 at a significant E-value of 1.0e-50. The neighboring clades NC1-NC7 mostly retain pure sets of receptors at the intra-genomic level, with receptors from the Srg (NC1), Str (NC2, NC3), Others (NC4) and Str (NC5, NC6, NC7) superfamilies. Interestingly, the receptor tyra-2 in NM1 exhibits significant association at a favorable E-value of 5.00E-69 and merits further investigation of functional commonality.
(a)
(b)
Figure 2.24 (A and B): Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of biogenic amine receptors (Cluster 22): tree construction, coloring and labeling are as described in the legend of Figure 2.14.
Cluster 23
This cluster contains 22 human GPCRs and 77 C. elegans GPCR entries (Figure 2.25 A-B). G-protein-linked acetylcholine receptor family members, such as muscarinic acetylcholine, adenosine, histamine and many orphan receptors of human GPCRs, are clustered in different clades (HC1-HC7) in cluster 23. In particular, the C. elegans G-protein-linked acetylcholine receptor family members gar-1 and gar-2 are observed together as a neighboring clade (NC11), and gar-3 is associated as a neighboring member (NM1) with the human GPCR clades. NP_001024236.1 (gar-3) has 37.9% identity with the human GPCR ACM3 and retains an orthologous relationship; a hypothetical protein, NP_001040810.1, which retains 7.8% identity with a human GPCR (AA3R), is another notable ortholog, although their functional equivalence is yet to be established. NC1 is of Sre type belonging to the Srg superfamily, while NC2 associates only hypothetical proteins. Pure sets of Srw receptors (Solo type) are observed in NC3 and NC9, whereas NC4 has receptors of sra and srt types. Notably, pure sets of srsx-type (Solo type) receptors are associated in NC5 and NC7. NC6 also retains only Srbc type (Solo type). NC8 includes a majority of srj type and a hypothetical protein. Srg receptors are at NC11 and two hypothetical receptors are at NC12 and NC13. NC15 associates with srj and str members of the Str superfamily. Hypothetical proteins, a lung seven transmembrane receptor and Srx- and Srv-type receptors are observed at NM2. The species-specific clade (SS1) retains purely srh-type receptors of the Str superfamily. Overall, cluster-23 has members from the Others (Solo) superfamily as well. Despite the high sequence divergence within human biogenic amine receptors (with seven clades), the fact that many of the C. elegans GPCRs associated with this cluster do not cocluster with the human GPCRs suggests species-specific requirements and lineages of these receptors. Interestingly, this study brings together several uncharacterized and hypothetical proteins in this cluster of biogenic amine receptors. Notably, gar-3 (NP_001024236.1), at the lowest E-value of 4.00E-51, shows the highest functional significance in the association.
(a)
(b)
Figure 2.25 (A and B): Cross-genome phylogeny of biogenic amine receptor (Rectangular Display & Radial Display)
Cluster 24
Receptors of biogenic amines (dopamine, histamine, octopamine and adrenaline), a few serotonergic receptors and many orphan receptors are associated in cluster 24. This cluster contains 24 GPCRs from human and 68 GPCRs from C. elegans (Figure 2.26.A-B). Human biogenic amine (BGA) receptors are distributed in the HC1-HC3 clades, and C. elegans GPCRs are noted within the CC1, NC1-NC13 and NM1 clades. Many biogenic amine receptors of C. elegans, such as tyra-3, serotonin 5, serotonin 7, serotonin 2, dopamine 1 and dopamine 2, branch near the human biogenic amine receptors, with percentage identities ranging from 20 to 37%. Two ortholog pairs are observed in NM1: NP_001024579.1 (dop-1) has 36.1% identity with Q8NGU3 and is associated at 1.0e-69, and NP_001024047.1 (dop-2) has 33.2% identity with human D3DR and is associated at 2.00e-49. Notably, the neighboring clades NC1-NC3, NC8, NC11 and NC13 retain pure sets of GPCR members from the Str superfamily, whereas the NC4, NC5, NC7, NC9 and NC10 clades are predominantly associated with hypothetical proteins. Many neighbor members were observed with a mixed distribution (receptors of srd and srv types from the Str superfamily and sre and sra of the Sra superfamily). Overall, typical candidate GPCRs from C. elegans, such as Dop-1, Tag24 and Ser-2, are observed at significant E-values and are proposed for investigation of functional similarities.
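Percentage identities such as those quoted for the dop-1 and dop-2 pairs depend on how the aligned positions are counted. The sketch below shows one common convention, in which only columns without a gap in either sequence are compared. This convention is an assumption made here for illustration and is not necessarily the one used to obtain the values above, and the aligned fragments are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns that carry no gap.

    Assumption: columns with '-' in either sequence are excluded from
    both the numerator and the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments: four ungapped columns, three identical.
pid = percent_identity("ACD-E", "ACDWQ")
print(f"{pid:.1f}% identity")
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives lower values for the same alignment.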
(a)
(b)
Figure 2.26 (A and B): Cross-genome phylogeny of biogenic amine receptors (Rectangular Display & Radial Display)
2.6.5 Result summary for Class B (secretin) receptors
Cluster 25
Class B receptors are represented by two clusters (25 and 26) consisting of classical hormone receptors from human and Drosophila methuselah (MTH)-like proteins. Cluster 25 consists of 17 human and 31 C. elegans GPCR entries. The human GPCRs recognise structurally related ligands, namely polypeptide hormones of 27-141 amino-acid residues (pituitary adenylate cyclase-activating polypeptide (PACAP), secretin, calcitonin, corticotropin-releasing factor (CRF), urocortins, growth-hormone-releasing hormone (GHRH), vasoactive intestinal peptide (VIP), glucagon, glucagon-like peptides (GLP-1, GLP-2) and glucose-dependent insulinotropic polypeptide (GIP)), and are related to calcitonin (CALR_Hum) and calcitonin gene-related peptide type 1 receptors (CGRR_Hum). Small accessory proteins, the receptor activity-modifying proteins (RAMPs), interact with these calcitonin receptors and can generate pharmacologically distinct receptors. The human orphan receptor Q8NHB4_Hum is very closely related to the PTRR_Hum receptor, which binds parathyroid hormone and parathyroid hormone-related protein (PTHrP). The human GPCRs within this cluster diversify, and in CC1 GRFR_Hum coclusters with a secretin-type receptor (NP_498978.1) from C. elegans. The C. elegans GPCRs associated with this cluster are distributed as NC1-NC7, NM1 and SS1 clades (Figure 2.27 A-B). The NC1 clade has sre and srw receptor types at a good bs value of 100, which are associated at an average E-value of 0.002. The secretin receptor family of C. elegans is associated with this cluster. NP_510496.1 (secretin receptor family) and NP_001021172.1 (pdfr-1) were found as orthologs of CALR_HUMAN (28.5% identity) and CRF2_HUMAN (30.4% identity) at very significant E-values. In NC1, NP_498978.1 (secretin receptor family) retains a good percentage identity (28.1) with the human GPCR VIPR_HUMAN. These associations provide examples where coclustering of orthologs may not happen, suggesting functional equivalence even amid large sequence variations in this cluster. NC3-NC8 are associated with candidate GPCRs of srh, srd and str types of the Str superfamily, which are distantly related to the human GPCR clades of this cluster, and many more neighbor members from the same Str superfamily are observed in NM1. SS1 retains srh- and str-type receptors of the Str superfamily. In general, Str superfamily members are more abundant in this cluster. C13B9.4 (NP_001021172.1), C18B12.2 (NP_510496.1) and ZK643.3a (NP_498978.2) are associated with the human secretin-type GPCR members at significant E-values of 5.00E-40, 5.00E-43 and 5.00E-30, and the associations can be further explored for functional relevance with the respective GPCR profiles.
(a)
(b)
Figure 2.27 (A and B): Cross-genome phylogeny of secretin type receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of secretin type receptors (Cluster 25): tree construction, coloring and labeling are as described in the legend of Figure 2.14.
Cluster 26
This cluster retains only Drosophila GPCR members (Methods), and through the RPS-BLAST runs, 20 candidate GPCR members from C. elegans have been associated with it. Methuselah receptors of Drosophila and their paralogs solely represent cluster 26. The Drosophila mutant methuselah (MTH) was identified in a screen for single-gene mutations that extended the average lifespan of the organism and also increased resistance to several forms of stress, including starvation, heat and oxidative damage. Drosophila GPCRs branch into two clades denoted DC1 and DC2, whereas C. elegans GPCRs are distributed in NC1, NC2, NM1, NM2 and SS1 (Figure 2.28 A-B). Interestingly, all the clades retain receptors from the Str superfamily. NC2 consists entirely of srh-type receptors from the Str superfamily at a bs value of 69, and many neighboring members are also from the Str superfamily, together with two hypothetical proteins. SS1 also retains candidate receptors from the Str superfamily. As mentioned earlier, the predominant occurrence of the Str superfamily denotes the abundance of str-type candidate receptors in the nematode genome and reflects species-specific retention, with limited coclustering in the cross-genome phylogeny. Notably, two candidate receptors, sri-20 (NP_505665.3) and srh-250 (NP_494681.1), are associated with the Drosophila GPCR members at E-values of 0.011 and 0.38, respectively, and the association can be further explored for functional relevance with the related fly GPCRs of this cluster.
(a)
(b)
Figure 2.28 (A and B): Cross-genome phylogeny of secretin type receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of secretin type receptors (Cluster 26): tree construction, coloring and labeling are as described in the legend of Figure 2.14.
2.6.6 Result summary for cell adhesion receptors
Cluster 27
This cluster contains 29 human GPCRs and 17 C. elegans GPCRs (Figure 2.29.A-B). A large number of GPCRs belonging to the cell adhesion receptors, characterised by a long extracellular N-terminus and a GPCR proteolytic site (GPS) domain, are represented in cluster 27. Several of these human receptors have functional domains such as epidermal growth factor (EGF), leucine-rich repeat (LRR), hormone-binding domain (HBD) and immunoglobulin (Ig) domains. Hence, they branch into several distantly related clades together with C. elegans members, denoted as HC1, HC2, CC1, NM1 and SS1. Most of the human GPCRs from this cluster are orphans with no known ligands (Taniura et al 2006). Q9HAR2 (LEC3, Lectomedin-3) in NM1 has an orthologous relationship with the C. elegans gene product NP_001040724.1 (lat-2; E-value of 7e-71), albeit observed in SS1. The lat-2 gene has significant sequence similarity with a paralog, NP_495894.1 (lat-1); lat-1 associates with human Q9HAR2 at an E-value of 5e-68. The rest of the C. elegans GPCRs branch distantly from the root and are observed in NC1, with mostly srh, str and sri from the Str superfamily. However, one of the human GPCRs (Q8WXG9) coclusters with the NC1 clade. This could be due to the large size of Q8WXG9 (6307 amino acids), which can even be viewed as an outlier in this cluster. The clear underrepresentation of C. elegans GPCRs among the cell adhesion receptors of cluster 27 is noteworthy. Also, the receptors lat-1 (NP_495894.1) and lat-2 (NP_001040724.1), associated at E-values of 5.00E-68 and 7.00E-71, respectively, provide clues to explore further the functional relevance with the respective human GPCR profile from the cluster.
(a)
(b)
Figure 2.29 (A and B): Cross-genome phylogeny of cell adhesion type receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of cell adhesion type receptors (Cluster 27): tree construction, coloring and labeling are as described in the legend of Figure 2.14.
2.6.7 Result summary for class C (glutamate) receptors
Receptors of Class C are divided mainly into four clusters: clusters 28 to 31. Metabotropic glutamate receptors (MGR), γ-aminobutyric acid (GABA) receptors, calcium-sensing receptors (CASR) and retinoic acid-inducible G-protein-coupled receptors (RAIG) are available as in our previous work.
Cluster 28
Cluster 28 associates eight human and 20 C. elegans GPCRs and is made up mostly of human metabotropic glutamate receptors (MGRs). The primary structure and pharmacology of mGluRs are evolutionarily well conserved in Drosophila, C. elegans and higher mammals (Adams 2000). The human metabotropic glutamate receptors mGluR1, mGluR2, mGluR3, mGluR4, mGluR5, mGluR6, mGluR8 and Q8NFS4 (metabotropic glutamate receptor 7 variant 3) form a clade, which is inferred as CC1 owing to the presence of the metabotropic glutamate receptors of C. elegans (mgl-1, mgl-2). Notably, mgl-1 (CAM33507.1) at an E-value of 8.00e-87, mgl-2 (NP_492720.2) at an E-value of 5.00e-70 and mgl-3 (NP_741400.1) at an E-value of 3.00e-85 are associated in this cluster (Figure 2.30 A-B). Interestingly, mgl-1 of C. elegans was observed as an ortholog of Q8NFS4 and has 51.9% sequence identity with mGluR2, mgl-2 (NP_492720.2) has 46.3% sequence identity with mGluR5 and mgl-3 (NP_741400.1) has 48.4% sequence identity with mGluR3, suggesting that similar ortholog pairs exist. Specific sequence patterns such as SGREL(S/C)Y, TKT, (G/S)RE, MYTTCIIWLAF and NETKFIGFT are well conserved amongst these members (alignment data not shown). NC1-NC6 include Srd, Srh and Str type receptors from the Str superfamily. This cluster illustrates that C. elegans GPCRs related to the glutamate receptors of humans, especially from the Str superfamily, do exist and could be identified by our sequence analysis. Also, the hypothetical receptor CAM33507.1, mgl-3 (NP_741400.1) and mgl-2 (NP_492720.2), at significant E-values of 8.00E-87, 3.00E-85 and 5.00E-70, respectively, provide clues to connect functional relevance with human CARs and are notably observed at CC1.
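The conserved sequence patterns listed for cluster 28 can be searched for in any protein sequence once the alternatives written as (S/C) are turned into regular-expression character classes. The following is a minimal sketch of such a scan; it is not the procedure used to derive the patterns, and the test sequence is a made-up fragment rather than a real receptor.

```python
import re

# Patterns as written in the text; '(S/C)' means S or C at that position.
MOTIFS = ["SGREL(S/C)Y", "TKT", "(G/S)RE", "MYTTCIIWLAF", "NETKFIGFT"]


def motif_to_regex(motif):
    """Turn '(S/C)' style alternatives into a character class such as '[SC]'."""
    pattern = re.sub(
        r"\(([A-Z](?:/[A-Z])+)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    return re.compile(pattern)


def find_motifs(sequence):
    """Map each motif to the 1-based start positions of its matches."""
    return {
        motif: [hit.start() + 1 for hit in motif_to_regex(motif).finditer(sequence)]
        for motif in MOTIFS
    }


# Hypothetical fragment used only to exercise the functions.
toy_sequence = "AASGRELCYPPTKTGG"
found = find_motifs(toy_sequence)
for motif, positions in found.items():
    print(motif, positions)
```

Short patterns such as TKT and (G/S)RE are expected to match by chance in many unrelated proteins, so hits are informative mainly when the longer patterns are found as well.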
(a)
(b)
Figure 2.30 (A and B): Cross-genome phylogeny of glutamate receptors (Rectangular Display & Radial Display)
Cross-genome phylogeny of glutamate receptors (Cluster 28): tree construction, coloring and labeling are as described in the legend of Figure 2.14.
Cluster 29
The human calcium-sensing receptor (CASR_Hum; extracellular calcium-sensing receptor precursor/parathyroid cell calcium-sensing receptor) forms Cluster 29 along with a set of five orphan receptors and 14 C. elegans GPCRs (Figure 2.31 A-B). The human calcium-sensing receptor CASR and the orphan receptors (Q8NHZ9, Q8NGV9, 8NGW9 and Q8NGZ7) form a clade with a C. elegans GPCR (NP_501400.1). A sweet-taste receptor of the 3GCPR / PBP1_GPCR_family_C_like type is associated at an E-value of 2.00E-26 and branches with Q8NGV9 at 23.5% identity. Thus, the current clustering approach suggests that NP_501400.1 may be a putative ortholog of this extracellular calcium-sensing receptor precursor/parathyroid cell calcium-sensing receptor. The NC1 and NC2 clades retain only candidate GPCRs from the Str superfamily, in contrast to the Sre and Srxa type GPCRs in NC3 and two other Srh type receptors observed in NM1. Interestingly, a hypothetical protein, F35H10.10 (NP_501400.1), associated at a significant E-value (2.00E-26), and srh-275 (NP_504876.1), associated at a favourable E-value (0.075) in SS1, merit further analysis for possible functional relevance with human glutamate type receptors.
Figure 2.31 (A and B ): Cross-genome phylogeny of glutamate receptor (Rectangular Display & Radial Display)
Cross-genome phylogeny of glutamate receptor (Cluster 29): Phylogenetic trees were generated using TREE-PUZZLE 5.1, quartet puzzling steps of 10,000 puzzling steps were done for maximum likelihood method. Out-group is not shown in the figure. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color, serpentine receptors of C. elegans like Sra super family (aqua), Str super family (fuchsia/pink), Srg super family (blue) ,Others/Solo type receptors (maroon), typical membrane proteins (purple), hypothetical trans membrane proteins (red) are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays.
Cluster 30
Cluster 30 comprises four human GPCRs and 23 C. elegans GPCRs (Figure 2.32 A-B). The human GPCR clade (HC1) contains the human retinoic acid-induced GPCRs and orphan GPCR members. The C. elegans GPCRs form different clades, namely SS1, NC1, NC2, NM1, NM2 and SS1. Overall, the C. elegans GPCRs associated with Cluster 30 are receptors from the Str superfamily. The NC1, NC2 and SS1 clades include C. elegans receptors from the Str superfamily belonging to the srj, srh and str types, respectively; the NM1 and NM2 clades also retain receptors of the Str superfamily. str-123 (NP_510135.1) in NC3 is observed at the lowest E-value (0.003).
Figure 2.32 (A and B): Cross-genome phylogeny of glutamate receptor (Rectangular Display & Radial Display)
Cluster 31
Cluster 31 has four human and eight C. elegans GPCRs (Figure 2.33 A-B). The GABAB receptors are present in this cluster. GABAA receptors, in contrast, are members of the ionotropic receptor superfamily, which includes nicotinic acetylcholine and glycine receptors. The four human GABAB receptors branch with a good bootstrap (bs) value of 80 and form the HC1 clade. NP_741740.1 (gbb-1) was picked as an ortholog of GBR1_human by the reverse BLAST best-hit procedure at an E-value of 1.00E-48, and NP_493575.2 is also associated with GBR1_human at a significant E-value of 4.00E-09. Both of these C. elegans GPCRs branch within the human clade, HC1. Additionally, the NC1 and NM1 clades include receptors from the Str and Sra superfamilies, respectively, and SS1 is associated with the Str superfamily. Hypothetical proteins such as T32170 (C31B8.10), ZK180.1 (NP_500579.2) and Y41G9A.4b (NP_741740.1) are associated at significant E-values and can be used further to relate functional commonality with human glutamate type receptors.
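The reverse BLAST best-hit procedure mentioned for gbb-1 can be illustrated with a minimal sketch. It assumes that the forward (worm against human) and reverse (human against worm) searches have already been parsed into (query, subject, E-value) tuples. The identifiers and E-values below are invented placeholders, not results from this study.

```python
def best_hits(hits):
    """Keep, for each query, the subject with the lowest E-value."""
    best = {}
    for query, subject, evalue in hits:
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best

def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return (worm, human, forward E-value) pairs that are mutual best hits."""
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    pairs = []
    for worm, (human, evalue) in forward.items():
        # The pair is accepted only if the human protein points back to the worm protein.
        if human in reverse and reverse[human][0] == worm:
            pairs.append((worm, human, evalue))
    return sorted(pairs)

# Hypothetical parsed search results (placeholder names and values).
forward_hits = [
    ("worm_A", "human_X", 1e-48),
    ("worm_A", "human_Y", 1e-20),
    ("worm_B", "human_X", 4e-09),
]
reverse_hits = [
    ("human_X", "worm_A", 1e-45),
    ("human_X", "worm_B", 1e-08),
    ("human_Y", "worm_A", 1e-18),
]
print(reciprocal_best_hits(forward_hits, reverse_hits))
```

In this toy case only worm_A and human_X are retained, because worm_B's best human hit does not return worm_B as its own best hit.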
Figure 2.33 (A and B): Cross-genome phylogeny of glutamate receptor (Rectangular Display & Radial Display)
2.6.8 Result summary for frizzled/smoothened receptors
Cluster 32
Cluster 32 comprises receptors with similar domain architectures: a 200-residue-long N-terminal domain (which contains the predicted orthosteric ligand binding site) and the cysteine-rich domain (CRD; which is likely to participate in Wnt ligand binding), apart from the GPCR domain. This cluster contains 11 human and 42 C. elegans GPCRs (Figure 2.34 A-B). The human GPCRs (FZD1-10) are associated in CC1.
NP_492635.1 (mom-5) of C. elegans is an ortholog of FZD1 (human GPCR); it is associated with this cluster at an E-value of 1.0E-75 and retains 34.5% identity with the human GPCR sequence. Apart from these entries, NP_491028.2, which shares 34.5% identity with FZD4 (human GPCR), and NP_503964.2 (cfz-2), which shares 40.0% identity with FZD5 (human GPCR), branch within CC1 itself, indicating that there may be high functional similarity, characteristic of close homology, between these receptors. The neighboring clades were annotated as NC1 to NC8. The NC3 clade is of srbc type and forms a clade (with a bs value of 67 and an E-value of 0.061) with candidate receptors from the Srg superfamily. The neighboring clades NC2-NC8 share most of their members from the str, srj and srh types of the Str superfamily (with an average E-value <3.0 and good bs values). Interestingly, two hypothetical proteins and receptors of the Str superfamily are observed in NM1. In the CC1 cluster, two srh type receptors of C. elegans are observed in the SS1 clade. Interestingly, mom-5 (NP_492635.1), which is also an ortholog, lin-17 (NP_491028.2) and the Frizzled homologue cfz-2 (NP_503964.2) are associated with human Frizzled/Smoothened type receptors.
Figure 2.34 (A and B ): Cross-genome phylogeny of FRZ/SMT type receptor (Rectangular Display & Radial Display)
Cross-genome phylogeny of FRZ/SMT type receptor (Cluster 32): Phylogenetic trees were generated using TREE-PUZZLE 5.1, quartet puzzling steps of 10,000 puzzling steps were done for maximum likelihood method. Out-group is not shown in the figure. Generated newick tree files were colored by using MEGA 4.0. Human GPCRs are denoted in green color, serpentine receptors of C. elegans like Sra super family (aqua), Str super family (fuchsia/pink), Srg super family (blue) ,Others/Solo type receptors (maroon), typical membrane proteins (purple), hypothetical trans membrane proteins (red) are also shown. Respective cluster types and cluster numbers are also mentioned for both rectangular (left-side) and radial (right-side) displays.
2.7 CONCLUSION
The associations reported in the current study for selected human and C. elegans GPCRs provide a preliminary understanding of cross-genome clustering and of possibly related sequences across taxa. Although RPS-BLAST is a sensitive approach for associating queries with a profile (independent of sequence identity), and although not all available or possible receptor types were included in the profiles, the current approach is effective in connecting remote homologues to the given representative profiles. In future, finer-grained analyses such as the identification of conserved domains and motifs should give a clearer understanding of the established associations. The selected human (353) and C. elegans (around 1159) GPCRs were associated by the RPS-BLAST technique to produce 32 cross-genome clusters of biologically important GPCRs. 84% of the cross-genome associations were obtained at significant E-value thresholds (ranging from 0.001 to 1), an additional 14% were observed at E-value thresholds from >1 to <5, and a very small percentage (2%) were obtained at E-value thresholds above 5 (Figure 2.35 A and B). Notably, serpentine receptors and hypothetical receptors are associated in the significant E-value range of <1, and orthologs are likewise reported at significant E-value thresholds (<1) in the overall distribution of human-nematode GPCRs in the cross-genome associations. Interestingly, only a few candidate receptors from the largest superfamily, Str, showed high E-values (Figure 2.35 B). This may be due to the long evolutionary divergence between human GPCRs and the nematode serpentine receptors, particularly str-type receptors. In associating cross-genome GPCR sequences, the E-value limit plays a critical role, and the E-values obtained with this profile-based clustering technique depend entirely on the selected representative human GPCRs, the respective PSSM profiles and the type of GPCRs
in the dataset. Also, each nematode GPCR was first given the chance to associate with a related human GPCR profile (a sequence feature/property) at a statistically stringent E-value cut-off. When required, the stringency was relaxed to facilitate cross-genome GPCR association. The approach could therefore be tried further, or improved, with alternative representative human GPCRs to assess the coverage of E-values for cross-genome clustering. Notably, among the 27 orthologs identified in the current study (Table 2.2), 13 candidate GPCRs, the predominant group, come from the cross-genome clusters of peptide receptors. These associations were obtained over E-values ranging from 5.00E-73 to 0.01. However, the closely associated GPCRs were distributed in the tree topology not only as CC, but also as NM, NC and SS. In particular, the observed orthologs were associated at E-values between 8.00E-87 and 0.53 in the dataset. The ortholog pairs (mom-5, FZD1_HUMAN) of the FRZ type receptors (Cluster 32) and (dop-1, DADR_HUMAN) of the BGA type receptors (Cluster 24) are the two examples of cross-genome GPCR association found at the lowest (8.00E-87) and highest (0.53) E-values among the identified orthologs of the dataset (Table 2.2). Apart from orthologs, candidate GPCRs such as the frizzled homolog cfz-2 (FRZ/SMT receptor type, Cluster 32), the serotonin/octopamine receptor ser-2 (BGA receptor type, Cluster 24), the serpentine receptor sro-1 (N&L receptor type, Cluster 14) and NP_505583.1 (F57A8.4) (CMK receptor type, Cluster 13) establish favourable associations with human GPCRs at the relatively significant E-values of 1E-80, 3E-74, 6E-16 and 2E-07, respectively.
Meanwhile, GPCRs such as NP_507020.1 from the BGA receptor type of Cluster 20, the serpentine receptor srh-78 from the PR type of Cluster 4 and the serpentine receptor str-262 from the secretin type receptors of Cluster 25 are found to be very distantly related to human GPCRs and were observed only at highly relaxed E-value cut-offs. In this manner, E-value thresholds are used effectively in the profile-based clustering technique to discriminate the associated GPCRs as closely or distantly related sequences across taxa.
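The three E-value classes used in this summary (<1, >1 to <5, and >5) can be tallied with a short sketch such as the one below. The treatment of values falling exactly on 1.0 or 5.0 is an assumption, since the text does not specify it, and the list of E-values is a toy example rather than the real dataset.

```python
from collections import Counter

BIN_LABELS = ("<1", "1-5", ">5")

def evalue_bin(evalue):
    """Assign an E-value to one of the three threshold classes.

    Assumption: exactly 1.0 and exactly 5.0 are placed in the middle class.
    """
    if evalue < 1.0:
        return "<1"
    if evalue <= 5.0:
        return "1-5"
    return ">5"

def bin_percentages(evalues):
    """Return the percentage of associations falling in each class."""
    if not evalues:
        raise ValueError("no E-values supplied")
    counts = Counter(evalue_bin(e) for e in evalues)
    return {label: 100.0 * counts[label] / len(evalues) for label in BIN_LABELS}

# Toy E-values for illustration only.
toy_evalues = [8e-87, 0.01, 0.53, 0.075, 2.0, 7.5, 3e-26, 0.003, 4.2, 1e-48]
print(bin_percentages(toy_evalues))
```

Applied to the full list of best E-values per C. elegans GPCR, a tally of this kind would give the proportions summarised in Figure 2.35 A.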
Figure 2.35 (A and B): Distribution of C. elegans GPCRs at various E-value thresholds
(A) Pie-chart representation of the association of C. elegans GPCRs with 32 human GPCR profiles at different E-value thresholds: <1.0 for 84% (green), >1 to <5 for 14% (orange) and >5 for 2% (red) of the dataset. (B) Bar diagram illustrating the distribution of receptors at the superfamily level, hypothetical proteins and orthologs at the different E-value thresholds: <1.0 (green), >1 to <5 (orange) and >5 (red). Kindly note: alignment files and supporting tables for all 32 clusters are available for download at the following URL: http://caps.ncbs.res.in/download/crossgenome GPCR/supplementary files.zip.
The proposed protocol for associating C. elegans GPCRs with known profiles of human GPCR clusters suggests that C. elegans GPCRs are highly represented, except in the chemokine receptor clusters (Clusters 12 and 13) and the cell adhesion receptor cluster (Cluster 27). The distribution of serpentine receptors at the superfamily level, the associated hypothetical proteins, the coclustering of orthologs (Table 2.2) and a trial study with known associations clearly support the RPS-BLAST clustering technique. The observed trend in cross-genome GPCR phylogeny also supports the use of RPS-BLAST for associating unknown GPCRs with known/previously annotated GPCR profiles and for establishing associations even at remote homology.
Since significant E-values play a major role and act as a preliminary reference for sequence comparison, the current method helps to connect the reported hypothetical proteins with the associated known receptor types at the sequence level, so that functional relationships can be compared further. The resultant 32 enriched clusters and their phylogenies were analysed and discussed in detail with respect to their distribution in cross-genome studies (Result summaries). Terms such as human GPCR clade [HC], coclusters [CC], neighbor clades [NC], neighbor members [NM] and species-specific members [SS] were introduced to follow the branching patterns in the dendrograms of the different clusters. Further, the association of GPCRs in the two genomes by our sequence analysis suggests that we can capture remote homology at 12 to 20% average cluster identity and can include highly related (coclusters, orthologs), related (neighbor clusters) and distantly related (neighbor members) sequences. A broad spectrum of sequence relationships between human and C. elegans GPCRs could be seen: for example, there is inter-mixing in biogenic amine receptors (Clusters 20 to 24), sufficient polyploidy amongst members in a cluster (as in Clusters 6 and 11), insufficient inter-mixing (as observed in Clusters 10 and 26) and strong species-specific tendencies (as noticed for nucleotide and lipid receptors, Clusters 14 to 19). The identification of putative orthologs (for example in Clusters 1, 5, 8 and 21 in Table 2.2) among GPCRs from the two genomes helps to correlate the evolutionary integrity between the two genomes. Interestingly, in many instances, we could observe orthologs coclustered within the same clade (for example, in Cluster 1, npr-9 is an ortholog of the GALR peptide receptor), validating our associations and clustering techniques.
In this study, unannotated/hypothetical proteins are associated with GPCR clusters at statistically significant E-values, which calls for their function to be interrogated by experiments (for example, in Clusters 3-5, 8, 11, 16-17, 23 and 32). Clades with no inter-mixing of sequence groups across the individual genomes (species-specific (SS) clades and human GPCR (HC) clades) have also been noticed in some instances (for example, Clusters 10 and 26). Nematode-specific serpentine receptors, with their Pfam domain knowledge and GO annotations, are helpful for understanding the species-specific repertoire of GPCRs of the Sra, Str, Srg and others/Solo type superfamilies. The motifs belonging to these receptors and the distribution of these receptors among the eight biologically important subtypes of human GPCRs (kindly refer to Chapter 3) further help to address nematode specificity and provide guidelines for understanding species-specific sequence properties. A recent publication on identifying conserved motifs in the aligned set of cross-genome GPCR clusters, emerging from the work described in Chapters 4 and 5 of this thesis, is biologically useful for connecting conservation at the sequence level to structure and then to function. Overall, a cross-genome study such as the one reported in this chapter, using a sequence search and clustering strategy, uncovers information on putative orthologs, function annotation of novel genes and functionally important/related sequences in two genomes for further practical applications.
Table 2.1 Distribution of Human and C. elegans GPCRs in 32 Clusters
S.No   Receptor type   No. of human GPCRs   No. of C. elegans GPCRs
1      PR              8                    32
2      PR              11                   40
3      PR              8                    34
4      PR              8                    32
5      PR              8                    54
6      PR              8                    42
7      PR              8                    34
8      PR              8                    34
9      PR              14                   54
10     PR              8                    26
11     PR              12                   60
12     CMK             10                   17
13     CMK             15                   13
14     N&L             7                    64
15     N&L             18                   53
16     N&L             7                    21
17     N&L             18                   30
18     N&L             8                    28
19     N&L             18                   89
20     BGA             8                    8
21     BGA             4                    49
22     BGA             8                    30
23     BGA             22                   77
24     BGA             23                   68
25     SEC             16                   29
26     SEC             Droso (14)           20
27     CAR             29                   16
28     GLR             8                    20
29     GLR             5                    14
30     GLR             4                    23
31     GLR             3                    8
32     FRZ/SMT         11                   40
Note: Cluster-wise distribution (for 32 clusters) of C. elegans GPCRs according to the eight subtypes of human GPCRs. Cluster 26 retains around 14 GPCRs from the Drosophila genome.
Table 2.2 List of Identified Orthologs

S.No 1 2 3 4 5 6 Human GPCR GBR2_HUMAN FZD1_HUMAN CALR_HUMAN Q9NZD1 GALS_HUMAN TRFR_HUMAN Type GLR FRZ SEC PR PR PR C. elegans GPCR NP_741740.1 NP_492635.1 NP_001021172.1 NP_505077.1 NP_509896.1 NP_491990.1 Describtion GABA B receptor subunit (gbb-1) mom-5 Calcitonin receptor activity Str-138 npr-9 hypothetical protein transmembrane receptor (Secretin family) Neuropeptide Y receptor activity hypothetical protein G-protein-linked Acetylcholine Receptor family member (gar-3) rhodopsin family tag-24 FSHR (mammalian follicle stimulating hormone receptor) homolog Dopamine receptor family member (dop-2) Srw-102 Gonadotropin-Releasing hormone rececptor (GnRHR) related (srh-2) npr-2 Metabotropic Glutamate receptor family hypothetical protein lat-2 hypothetical protein ser-4 hypothetical protein hypothetical protein Cholecystokinin Receptor homolog family member (ckr-2) Dopamine receptor family member (dop-1) Cluster no 31 32 25 5 1 5 E-value 0 8.00E-87 1.00E-75 5.00E-73 1.00E-71 7.00E-71
7 8 9
Q8NG71 GP10_HUMAN AA2R_HUMAN
SEC PR BGAR
NP_510496.1 NP_510101.2 NP_001040810.1
25 11 23 23 2 24
1.00E-69 2.00E-59 2.00E-54
10 11 12
ACM3_HUMAN SSR5_HUMAN 5H1B_HUMAN
BGAR PR BGAR
NP_001024236.1 NP_510833.3 NP_001024569.1
6.00E-52 4.00E-51 1.00E-50
13
TSHR_HUMAN
PR
NP_505548.1
5.00E-49
14 15
Q9NZR3 MTH8_DROME
BGAR SEC
NP_001024047.1 NP_001024316.1
24 26
1.00E-48 2.00E-48
16 17 18
GRHR_HUMAN MTH9_DROME OX2R_HUMAN
PR SEC PR
NP_491453.1 NP_001021365.1 NP_501701.2
6 17 8
2.00E-47 4.00E-47 2.00E-45
19 20 21 22 23 24 25
Q8NFS4 V2R_HUMAN Q9HAR2 NK1R_HUMAN 5H1A_HUMAN Q96AM5 NY4R_HUMAN
GLR PR CAR PR BGAR PR PR
CAM33507.1 NP_493193.1 NP_001040724.1 NP_500930.1 NP_497452.1 NP_509515.1 NP_508234.1
28 6 27 11 22 5 11
5.00E-43 3.00E-41 5.00E-40 3.00E-38 4.00E-38 1.00E-30 3.00E-30
26
CCKR_HUMAN
PR
NP_001022842.1
0.01
27
DADR_HUMAN
BGAR
NP_001024579.1
24
0.53
CHAPTER 3 PHYLOGENETIC ANALYSIS OF SERPENTINE RECEPTORS OF C. ELEGANS AND IDENTIFICATION OF CONSERVED MOTIFS IN SERPENTINE RECEPTOR SUPERFAMILIES
3.1 INTRODUCTION
"A man may fish with the worm that hath eat of a king, and eat of the fish that hath fed of that worm." (Shakespeare, Hamlet)
This old quote is an appropriate starting point for recalling the importance not only of evolutionary trends, but also of nematode serpentine receptors in comparative genomics. As Sydney Brenner suggested, the conserved features of C. elegans help to understand macromolecular evolution and also guide the search for functional resemblance with other metazoans and higher order organisms. C. elegans is an effective model organism for studies in genome biology, animal development, behavior and evolution. Since the cross-genome phylogenetic analysis of selected human and C. elegans GPCRs has been described previously (Chapter 2), the current objective is to perform a phylogenetic analysis exclusively on the serpentine receptors (SR) of C. elegans. One annotated C. elegans olfactory receptor (odr-10) has been selected for three-dimensional structure modelling.
3.2 HOMOLOGUES OF C. elegans GPCRs
Genes spanning distant taxa are worthwhile targets for comparative genomics. In this respect, C. elegans is an important genetic resource for cross-genome studies. In a comparative proteomics study, 83% of the worm proteome was found to have human homologous genes (Lai et al 2000, Stein et al 2003). This metazoan is interesting because it retains a significant amount of homology with vertebrates and non-vertebrates. For instance, FGF (fibroblast growth factors) in C. elegans, which plays a major role in a number of developmental processes (Coulier et al 1997), has conserved kinase subdomains II-VII (Hanks et al 1988) when aligned with nearly 40 homologous sequences, and shows extended conservation with mammalian, avian, fish, amphibian and invertebrate paralogs (Coulier et al 1997). In another instance, lin-12, a membrane protein in C. elegans, and Notch in Drosophila share homology and, notably, these two receptors also tend to be functional homologues (Yoo and Greenwald 2005). At least 12-17 signaling pathways of the nematode are observed to be relevant to higher order organisms and to involve GPCRs, as reported in Scientific Frontiers in Developmental Toxicology and Risk Assessment, Washington DC, 2000. Such evidence emphasizes the need to understand the occurrence of SR gene clusters in C. elegans and further motivates connecting their functional relevance with GPCRs of other nematodes and of eukaryotic/higher order organisms for the vast structural and functional implications.
3.3 OBJECTIVES
The main objective of the current study is to perform a phylogenetic study on the serpentine receptors of C. elegans for a better understanding of family-specific features from a sequence perspective. Since only one receptor has been annotated as an olfactory receptor (odr-10) in C. elegans (Mori 1999, Sengupta et al 1996), phylogenetic cluster association could guide the collection of homologous sequences of odr-10, which further helps to collect odr-10-like sequences from other nematode species and higher eukaryotes. Using homology modelling, odr-10 has been modelled and could be explored for ligand binding sites and for mapping hot-spot residues. The identified sequence-specific properties of SR families can be used in machine learning approaches (SVM) to train on and detect putative chemosensory receptors in other nematode species, eukaryotes and higher order organisms.
3.4 CHEMOSENSORY RECEPTORS IN C. elegans
Chemoperception is a central sense in C. elegans (Robertson and Thomas 2006, Thomas and Robertson 2008) and is mediated by members of the seven-transmembrane G-protein-coupled receptor class; genetic analysis of sensory neuron-specific G-proteins indicates their functional participation in olfaction, nociception and pheromone responses. External/chemical signals are mediated predominantly by GPCRs in C. elegans (Hilliard et al 2002, Roayaie et al 1998, Woollard 2005).
3.5 CHEMOSENSORY NEURONS AND OLFACTORY APPARATUS IN C. elegans
In the adult worm, around 959 somatic cells are present and, among them, nearly one third (302) are neurons; hence the study of nematode neuroanatomy is focused at the synaptic level. The synaptic connectivity is well understood from electron microscopic studies (Chen et al 2006, White et al 1986). Notably, 32 neurons, i.e., nearly 10% of the nervous system, participate in nematode olfaction. Four types of chemosensory organs, namely the amphid, phasmid, inner labial and outer labial organs, are present in C. elegans. Each organ possesses two supporting cells, namely sheath and socket cells, which form a pore. Through this pore, sensory neuron endings are exposed to sense external stimuli. Neurons are generally referred to by three or four alphabetic letters. At the anterior end of C. elegans, two amphid pores are present, and each pore contains 11 chemosensory neurons and one thermosensory neuron called AFD. Among the 11 chemosensory neurons, eight neurons (namely ADL, ADF, ASE, ASG, ASH, ASI, ASJ and ASK) are exposed to the cilia and the other three chemosensory neurons (AWA, AWB and AWC) are embedded in the sheath cell. The latter three sensory neurons are also called wing cells. The phasmid pore is comparatively smaller than the amphid pore and contains the PHA and PHB neurons (Hilliard et al 2002). Generally, neurons present at the anterior end participate in chemoattraction and the posterior end is meant for chemosensory avoidance. Interestingly, odr-10 is found to be sensitive in recognizing a volatile odorant, namely di-acetyl (2,3-butanedione) (Sengupta et al 1996).
3.6 FAMILIES AND SUPERFAMILIES OF SERPENTINE RECEPTORS IN C. elegans
Nearly 1280 genes and 420 pseudogenes were identified as related to chemoreceptors in C. elegans and are classified under 20 families (Thomas and Robertson 2008). Owing to the nematode's habitat and its lack of auditory and visual senses, serpentine receptors occur abundantly and cover almost 7-8% of the genome (see also Chapter 1). Interestingly, C. elegans possesses almost 70% more chemosensory genes (718) than the other subspecies, as defined by the individual Pfam (Version 9.0) chemosensory gene families (Bateman et al 2002). It is worthwhile to study the sequence properties of serpentine receptors to connect structural and functional relevance with other higher eukaryotic organisms. Chemoreceptor-related genes are more numerous in C. elegans than in C. briggsae. This may be due to the greater importance and necessity of chemoperception in C. elegans than in C. briggsae. Thus, the frequency distribution of chemoreceptors is species-specific. Serpentine receptors (SR) are broadly classified into the Sra, Str, Srg and others/Solo type superfamilies. The Sra superfamily retains families such as sra, srab, srb and sre; the Srg superfamily includes families such as srg, srt, sru, srv, srx and srxa; the Str superfamily covers the srd, srh, sri, srj and str families; and the others/Solo type includes the srbc, srsx, srw and srz families. Overall, Str is the largest superfamily and, notably, srh is the largest family among the nematode chemoreceptors. Interestingly, the str family, along with the related sri and srj families, is observed to be related to odr-10 (olfactory receptor) in C. elegans (Sengupta et al 1996). The occurrence of gene duplication, redundancy, movement and diversification in C. elegans (Robertson 1998) suggests the need for careful assignment of gene descriptions, particularly in phylogenetic clustering. There is substantial evidence for gene loss and duplication events in the serpentine receptors of C. elegans (due to birth-death evolution). In one instance, the srbc, srw and srz families of C. elegans have very few clear orthologs in other species such as C. briggsae and C. remanei. These are dominated by species-specific expansions, which presumably arose by a process of ongoing gene duplication and loss on each lineage. In another case, more than half of the genes in families such as sre and srxa have single orthologs in both C. briggsae and C. remanei. These are relatively stable families, yet they have substantial numbers of apparent gene duplications and losses. Such observations clearly show the events of gene duplication and loss and emphasize the need for careful assignments in gene cluster arrangements and in phylogenetic tree observations.
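The family-to-superfamily classification described in Section 3.6 can be written down as a simple lookup table. The sketch below assumes that the family is given by the gene-name prefix before the hyphen (for example, "srh" in srh-275). The function names are hypothetical, and genes such as odr-10, whose names do not carry a family prefix, are deliberately left unassigned.

```python
from collections import Counter

# Families per superfamily, as listed in the text.
SUPERFAMILY_FAMILIES = {
    "Sra": ["sra", "srab", "srb", "sre"],
    "Srg": ["srg", "srt", "sru", "srv", "srx", "srxa"],
    "Str": ["srd", "srh", "sri", "srj", "str"],
    "Solo/others": ["srbc", "srsx", "srw", "srz"],
}

FAMILY_TO_SUPERFAMILY = {
    family: superfamily
    for superfamily, families in SUPERFAMILY_FAMILIES.items()
    for family in families
}

def superfamily_of(gene_name):
    """Return the superfamily for a gene name such as 'srh-275', or None."""
    family = gene_name.strip().lower().split("-")[0]
    return FAMILY_TO_SUPERFAMILY.get(family)

def superfamily_percentages(gene_names):
    """Percentage of genes per superfamily; unknown prefixes are 'unassigned'."""
    if not gene_names:
        raise ValueError("no gene names supplied")
    counts = Counter(superfamily_of(name) or "unassigned" for name in gene_names)
    return {label: 100.0 * n / len(gene_names) for label, n in counts.items()}

# Gene names mentioned in this thesis, used here only as a small example.
example_genes = ["srh-275", "str-123", "str-112", "srw-102", "srh-78", "odr-10"]
print(superfamily_percentages(example_genes))
```

A table of this kind makes it straightforward to recompute superfamily-level distributions whenever the dataset changes.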
3.7 FEATURES AND IMPORTANCE OF SRs
There are reports in the literature that C. elegans senses and responds to a variety of chemical attractants and repellents. The nematode responds to water-soluble compounds such as salts, some amino acids (like lysine and histidine), some nucleotides and some vitamins; to various volatile compounds such as alcohols, ketones, esters, pyrazines, thiazoles and aromatic compounds; and to biotin, anions, cations (K+, Na+), chloride ion concentrations and basic pH changes. C. elegans shows a remarkable olfactory ability to discriminate various external cues, mediated by the olfactory pathway. It responds to various chemical stimuli with diverse behavioral responses (Pace et al 1985, Sklar et al 1986).
3.8 SRs: FUNCTIONAL RELEVANCE WITH OTHER EUKARYOTIC GPCRs
Interestingly, certain serpentine receptors show remarkable functional relevance with human and mouse receptors (Thomas and Robertson 2008). For example, the following are observed to be significantly related at the sequence level: candidate receptors from the sra family to the melanin-concentrating hormone receptor (mouse MCHR-1), srab receptors to the human thyroid stimulating hormone receptor, srb receptors to the melanin-concentrating hormone receptor of mouse, srbc type receptors to angiotensin II receptor type 1 (human MAS1) and candidate sre type receptors to sphingolipid G-protein coupled receptor 1 (human EDG1). Also, srsx type receptors are found to be related to the somatostatin receptor (mouse Sstr2), srx to melatonin receptor 1A (human MTNR1A) and srxa to the melanin-concentrating hormone receptor (mouse MCHR-1). Notably, diverse receptors such as srg, srt, sru and srv are related to opsin type receptors from many other species. srw type receptors are observed to be related to the neuropeptide receptors, and various SR families such as srh, sri, str, srj and srd are highly related to each other at the intra-genomic level.
3.9 METHODOLOGY
3.9.1 Data collection
The SEVENS database (Ono et al 2005) provides more than 1000 GPCRs for the C. elegans genome exclusively, as mentioned in the literature. For the current study, 682 receptors have been collected along with odr-10. Candidate receptors of the sra, srb and sre types from the Sra superfamily, the srg, srt, srv, srx and srxa types from the Srg superfamily, the srd, srh, sri, srj and str types from the Str superfamily, and the srbc, srsx, srw and srm types from the Solo/others superfamily were collected; these superfamilies account for 4%, 7%, 69% and 20% of the serpentine receptors in the dataset, respectively (Figure 3.1).
Figure 3.1 Pie-diagram to show the distribution of serpentine receptors (SR) in the dataset
Note: The pie-diagram illustrates the distribution of serpentine receptors in the dataset. Candidate receptor types such as sra, srb and sre from the Sra superfamily (aqua), srg, srt, srv, srx and srxa from the Srg superfamily (blue), srd, srh, sri, srj and str from the Str superfamily (pink), and srbc, srsx, srw and srm from the Solo/others superfamily (brown) account for 4%, 7%, 69% and 20%, respectively.
3.9.2 Prediction of TM-helices by HMMTOP
The collected serpentine receptors were analysed to predict their transmembrane helices and membrane topology. As mentioned in the literature, the N-out topology seen in canonical GPCRs is observed predominantly in worm, mouse and human, but Drosophila ORs exhibit the reverse topology (Benton et al 2006). The HMMTOP prediction server was used to predict the membrane topology of the SRs, and 80% of the sequences were predicted to have N-out topology.
3.9.3 Alignment Procedure by MAFFT
Since the number of sequences and the evolutionary distance between protein sequences play a crucial role in the alignment procedure, an appropriate program, MAFFT, was employed to align more than 650 SR sequences. A gap penalty of 1.53 and the JTT 200 matrix were used to generate the alignment. Serpentine receptor sequences show a high degree of diversity and varying sequence composition, giving rise to considerable indels in the alignment.
3.9.4 Phylogeny of Selected Serpentine Receptors
The aligned SR sequences were used to generate the phylogeny by the NJ method (Saitou and Nei 1987) with 1000 bootstrap (BS) replicates in MEGA 5.0. The generated tree topologies were analyzed for cluster association with reference to superfamily. Cluster associations at reliable bootstrap values (>=50) were grouped and inferred at the superfamily level (Figure 3.2).
3.9.5 Identification of Motifs in SRs
Phylogeny-guided cluster association was used to build multiple sequence alignments. A few representative sequences at the family level were selected and aligned with the online MAFFT alignment program. Receptors from about 15 families were selected and studied for conserved motifs using the TM-MOTIF package (Chapter 4). As the number of serpentine receptors varies from family to family, only a few representative serpentine receptors were considered for the current study.
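For readers who wish to reproduce the alignment step of Section 3.9.3 locally, the sketch below assembles a MAFFT command line. It assumes that the quoted gap penalty corresponds to MAFFT's gap opening option (--op) and that "JTT 200" corresponds to the --jtt option. This mapping, the file names and the helper functions are assumptions and do not record the exact invocation used in this work.

```python
import shutil
import subprocess

def build_mafft_command(input_fasta, gap_open=1.53, jtt_pam=200):
    """Return the MAFFT argument list for the settings quoted in the text."""
    return [
        "mafft",
        "--op", str(gap_open),   # gap opening penalty
        "--jtt", str(jtt_pam),   # JTT PAM matrix number
        input_fasta,
    ]

def run_mafft(input_fasta, output_fasta):
    """Run MAFFT if it is installed; MAFFT writes the alignment to stdout."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    with open(output_fasta, "w") as handle:
        subprocess.run(build_mafft_command(input_fasta), stdout=handle, check=True)

# Hypothetical input file name, shown only to display the assembled command.
print(" ".join(build_mafft_command("serpentine_receptors.fasta")))
```

Keeping the command construction separate from its execution allows the settings to be inspected or logged before a long alignment run is started.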
The generated MSAs of the 15 SR families, along with the corresponding multi-FASTA sequence files, were used as input to the TM-MOTIF package. Conserved motifs, their respective topology and the possible substituting amino acids (AAS) were examined.
3.10 RESULTS
The phylogeny generated for the selected SRs reveals four distinguishable clusters corresponding to the four receptor superfamilies Str, Srg, Sra and others/Solo (Figure 3.2). The phylogeny shows a clear species-specific relationship within the four superfamilies. Among them, candidate receptors from the Str superfamily are predominant and occur in large numbers in the phylogeny. The only annotated olfactory receptor in C. elegans (odr-10), which is reported to belong to the Str superfamily, indeed co-clusters with candidate receptors from the Str superfamily. The phylogenetic analysis enables the collection of sequences neighbouring odr-10; in this way, nearly 43 homologous sequences were identified (Figure 3.3).
Figure 3.2 Phylogeny on selected serpentine receptors (circular view tree)

Note: The NJ phylogeny shows the species-specific cluster arrangement for the serpentine receptor superfamilies Str (in pink), Srg (in blue), Sra (in aqua) and others/Solo (in brown). Notably, the largest superfamily, Str (in pink), shows the highest number of members, and the annotated olfactory receptor odr-10 (in red) clusters with the str family members of the Str superfamily (in pink). The tree topology is shown in the circular display of MEGA 5.0.
In the interest of collecting homologues of odr-10, only the subcluster related to odr-10 was studied in detail. As a result, six subclusters were identified and named Str_C1 to Str_C6 (Figure 3.3). Among the 43 collected sequences, the Str_C1 clade retains five related str-type receptors with significant BS values. Notably, the serpentine receptor str-112 shows a highly significant BS value of 100 and is the closest homologue of odr-10. Earlier literature also strongly suggests that this receptor and str-89 are structural homologues of odr-10 (WormBase IDs WBStructure010755 and WBStructure010818; F10D2.4 (str-112); Chen et al 2004). The pairwise alignment between odr-10 and str-112 is of good quality, with a query coverage (alignment length versus query sequence length) of more than 95% (Figure 3.4) and a sequence identity of nearly 80%. This association illustrates the effectiveness of phylogenetic clustering in identifying and reliably associating related homologues. str-112 and the related str-type members str-115, 114 and 113 can be studied further in behavioural assays for the detection of diacetyl and related compounds. Similarly, the str-type receptors str-96, 101, 109, 111, 108, 106, 97, 99 and 103 from Str_C2; str-89, 90, 264, 92 and 93 from Str_C3; str-151, 149, 148, 146, 145, 143, 144, 141, 138, 139 and 140 from Str_C4; str-131, 134, 135, 136 and 118 from Str_C5; and str-126, 130, 124, 123 and 125 from Str_C6 can be explored further for ligand-binding properties and secondary structure prediction (Figure 3.3), since they are found to be related to odr-10.
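The two quantities quoted for the odr-10/str-112 comparison, query coverage and sequence identity, can be computed from a gapped pairwise alignment. The following is a minimal sketch under stated assumptions: identity is taken over columns where both sequences have a residue, and coverage is the number of aligned query residues divided by the query length. The text does not specify the exact formulas used, and the example sequences are hypothetical.

```python
def identity_and_coverage(aligned_query, aligned_subject):
    """Return (percent identity, percent query coverage) for a gapped pair."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    query_length = sum(1 for q in aligned_query if q != "-")
    aligned_pairs = 0   # columns where both sequences carry a residue
    identical = 0       # aligned columns with the same residue
    for q, s in zip(aligned_query, aligned_subject):
        if q != "-" and s != "-":
            aligned_pairs += 1
            if q.upper() == s.upper():
                identical += 1
    if query_length == 0 or aligned_pairs == 0:
        return 0.0, 0.0
    identity = 100.0 * identical / aligned_pairs
    coverage = 100.0 * aligned_pairs / query_length
    return identity, coverage


# Hypothetical toy alignment, not real odr-10 or str-112 sequence data.
identity, coverage = identity_and_coverage("MSGE-LWL", "MSG-QLWI")
print(f"identity = {identity:.1f}%, coverage = {coverage:.1f}%")
```

Other conventions (for example, identity over the full alignment length) give different values, so the denominator should be stated whenever such percentages are reported.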
Figure 3.3 Subcluster showing odr-10 and its homologues

Note: The circular view of NJ tree shows odr-10 and its related homologs from Str_C1 to Str_C6 and notably the closest homologue str-112 is observed along with odr-10 in Str_C1.
3.10.1 Identified Motifs in SR Families: A Pilot Study
From the phylogeny, representative sequences from each clade were selected and examined for conserved amino acids. As discussed in the methods, the number of sample sequences varies between serpentine receptor families (refer 3.8.1). The sequences were aligned with the MAFFT alignment program, and the resulting alignments were used to identify conserved motifs, along with substituting amino acid residues (AAS), at the family level using an in-house program (MotifS program by R. Sowdhamini, unpublished results); the results were recorded using the TM-MOTIF package (Chapter 4). The aligned sets of serpentine receptors were given as input to TM-MOTIF, and the motifs obtained, together with their occurrence in the predicted membrane topology, were tabulated (Table 3.1 and 3A.1) at the 60% level of conservation. In this pilot study, nearly 92 conserved amino acid patterns were identified and tabulated by serpentine receptor family with predicted membrane topology. Prominently, the YRY motif is observed in TM3, ICL2 in many of the serpentine receptor families. As the sre family is the most diverse when compared to the sra, srab and srb members of the Sra superfamily, the identified motifs MIF and PIY (N-terminal), WTDD (ECL1), FEN (ECL3), RFQAKEN (ICL3) and ETD (C-terminal) could help to define sequence features particular to the sre family. Notably, the IYL motif is conserved in both the sri and srj families of the Str superfamily. The QLF motif is observed in ICL3-TM6 of the str family; odr-10 belongs to this family, and this family-specific motif is observed in odr-10 and mapped on its modelled structure (Figure 3.5). Since the sample size varies across families in this pilot study, there is a considerable chance of missing conserved AA patterns; the study can be improved by using an unbiased sample size and by applying consensus prediction methods at various percentage levels of conservation. Nevertheless, this pilot study uncovers unique sequence features of each serpentine receptor family and provides clues for developing a procedure to observe conservation together with the number and physicochemical properties of the substituting amino acids (AAS).
3.10.2 Homology Modelling of odr-10
odr-10 was selected for structure prediction, and homology modelling was carried out by the following procedure.
3.10.2.1 Pairwise alignment of odr-10 with bovine rhodopsin sequence
The membrane topology of odr-10 was predicted by HMMTOP, and the N-out topology was observed, as in canonical GPCRs.
Figure 3.4 Pairwise alignment of odr-10 with bovine rhodopsin sequence

Note: The pairwise alignment of bovine rhodopsin and odr-10 is shown in the violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R) colouring scheme for the respective TM helices. The alignment was done using MAFFT. The conserved YRY motif in TM3, ICL2 and the Str superfamily-specific QLF motif in ICL3 are highlighted in red.
The crystal structure of bovine rhodopsin was selected as the template, and a pairwise alignment was generated with the alignment program MAFFT (Figure 3.4). The percentage identity between the template and odr-10 is 19.5%. A structure-guided alignment was provided as input to MODELLER (Sali and Blundell 1993) to generate a three-dimensional model of odr-10 by homology modelling. Among the 20 models generated, the model with the lowest energy was selected (Figure 3.5) and further energy minimized using the SYBYL software package (Tripos associate Inc.).
3.10.2.2 Alignment by MAFFT
The pairwise alignment of bovine rhodopsin and odr-10 is given in the VIBGYOR colouring scheme (Violet, Indigo, Blue, Green, Yellow, Orange and Red denote TM1 to TM7, respectively) for the respective TM helices. The alignment was done using MAFFT. The conserved YRY motif in TM3, ICL2 and the Str superfamily-specific QLF motif in ICL3 are highlighted in red.
3.10.2.3 Structure validation for odr-10 model
Structure validation was performed with reference to pre-existing experimental data using the PROCHECK server (Figure 3.5). The selected three-dimensional model shows a final energy of -1020.23 kcal/mol after energy minimization by SYBYL (Tripos associate Inc.). PROCHECK (Laskowski et al 1993) results for the energy-minimized model show that 82% of the residues are within strictly allowed regions and 14% are within partially allowed regions of the Ramachandran plot (Figure 3.5). After excluding the intra- and extracellular loop regions, the minimized models with only TM helices were subjected to structure validation. The structure reports showed that 93.3% of the residues were within allowed regions and 5.2% within partially allowed regions (Figure 3.5), indicating a high-quality model. The RMSD between the template structure (bovine rhodopsin) and the odr-10 model was found to be 3.882 Å.
Figure 3.5 Three-dimensional model of olfactory receptor odr-10 and structure validation

Note: The 3D model generated by MODELLER is displayed in ribbon representation. The seven transmembrane helices are coloured in the VIBGYOR colouring scheme. The str family-specific motif QLF, identified by the TM-MOTIF package, is shown as spheres in TM6, and the details of structure validation are given in the chart.
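The RMSD reported between the bovine rhodopsin template and the odr-10 model is a root-mean-square deviation over paired atoms. The sketch below shows only the final arithmetic and rests on assumptions not stated in the text: the two structures are already superposed, and the coordinate lists contain equivalent atoms (for example, C-alpha atoms of aligned residues) in the same order.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in angstroms, for illustration only.
template_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(round(rmsd(template_atoms, model_atoms), 3))
```

In practice the superposition step (for example, a least-squares fit) is performed first by the modelling or visualisation software; this function only evaluates the deviation afterwards.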
3.10.2.4 Preliminary phylogenetic analysis
A preliminary phylogenetic analysis was performed on selected human ORs together with odr-10. The human OR sequences were annotated with their respective cluster numbers and protein annotations. They were aligned with the nematode olfactory receptor odr-10, which is the only functional OR reported from the C. elegans genome. Sequences were aligned with the MAFFT server using the JTT-200 matrix, and the resulting OR alignments were used to generate a phylogeny in MEGA 5.05 by the NJ method with 1000 BS replicates.
3.10.2.5 Odr-10 an outgroup to HOR
The nematode OR (odr-10) remains an outgroup (Figure 3.6) in a phylogeny derived from an alignment consisting predominantly of human OR sequences. Although its predicted topology (N-out) is the same as that predicted in higher-order organisms, odr-10 retains a species-specific tendency and appears as an outgroup to human ORs. The nematode lifestyle, with recognition of a limited set of simple odours such as diacetyl, could be another reason for odr-10 remaining an outgroup. The long evolutionary distance between these two taxa can also be a strong reason for the absence of a significant co-clustering tendency in the phylogeny. This additionally suggests that agreement in topology does not necessarily cause olfactory receptors to cluster together.
Distinct Class I type receptors HsC1
Notably, the cluster HsC1 contains fish-like ORs, which are known to recognize only water-borne odours. It was observed as a distinct clade and retains intra-genomic, cluster-specific properties in the human OR phylogeny. It is also useful for understanding evolutionary trends in higher-order organisms, starting from lower chordates such as fishes (Chapter 6 for more details).
Interestingly, despite the evolutionary conservation observed in the HsC1 cluster, which includes ORs from fish and humans, worm ORs (for example, odr-10) remain distinct members in the phylogeny, and the fish-like ORs of humans do not co-cluster with nematode ORs. The observed phylogeny also depicts an evolutionary hierarchy at various taxonomic levels in the cross-genome phylogenetic study: worm ORs, fish-like ORs and other human ORs are present in separate and distinct clades, reflecting the taxonomic hierarchy and the species-specific requirement of ORs.
Figure 3.6 Phylogeny on selected human olfactory receptors with an olfactory receptor (odr-10) from C. elegans
Note: The 10 observed human OR subclusters (Chapter 6 for more details) are denoted in aqua (HsC1), purple (HsC2), teal (HsC3), blue (HsC4), green (HsC5), yellow (HsC6), orange (HsC7), red (HsC8), fuchsia (HsC9) and lime (HsC10), respectively, in the tree topology. Each human OR is designated with its cluster number and HS as prefix. Notably, intra-genomic clustering of human ORs was observed, and the nematode OR [odr-10] remains an outgroup.
3.11 CONCLUSION
A total of 683 serpentine receptors were collected from the SEVENS database and examined for their predicted membrane topology by HMMTOP (Chen et al 2002). The N-out topology was predicted for 80% of the sequences. The collected SR sequences were aligned and clustered using the neighbour-joining method. Most of the serpentine receptors clustered at the family level and depict sequence-specific features at the superfamily level. The subcluster containing odr-10, the only annotated olfactory receptor, was studied in detail to collect odr-10-like sequences. 43 homologues of odr-10 were collected through phylogenetic clustering and are denoted in the clusters Str_C1 to Str_C6. Interestingly, str-112 appears to be the closest homologue of odr-10. The three-dimensional structure of odr-10 was modelled and studied for secondary structural details. This is a case study to demonstrate that a reliable 3D model can help to recognise functionally important residues, such as ligand-binding sites, and the general mechanism of odorant binding proteins. Although only remote sequence identity (19.5%) exists between the template and odr-10 in the pairwise alignment, an attempt was made to model odr-10 using bovine rhodopsin as template. This can guide further comparison with alternative templates (for which recently solved high-resolution crystal structures can be used), and a multi-template approach to modelling can also be applied. A consensus method of predicting transmembrane topology can be useful for obtaining the best coverage of predicted topology for both helix and loop regions. Previous publications from our laboratory on modelling a GPCR (Kanagarajadurai et al 2009), an ion channel (Shah and Sowdhamini 2001) and OR 83b, a ligand-gated ion channel of Drosophila (Harini and Sowdhamini 2012), also provide an established protocol for modelling and validating membrane proteins of interest. Primarily, the predicted N-out topology of odr-10, which is the same as that of canonical GPCRs, motivated the modelling of this olfactory receptor. Among the 20 models generated by MODELLER (Sali and Blundell 1993), the lowest-energy model was further energy minimized using the SYBYL software package (Tripos force field, Powell's gradient (100 iterations), distance-dependent dielectric constant of 1 and non-bonded interaction cut-off of 8 Å). The resulting model shows a bond stretching energy of 10.249, angle bending energy of 336.245, torsional energy of 255.229, van der Waals energy of -1568.429 and a total energy of -1020.234 kcal/mol. The RMSD between bovine rhodopsin and the odr-10 model was found to be 3.882 Å. PROCHECK (Laskowski et al 1993) results for the energy-minimized model report that 82% of the residues are within strictly allowed regions and 14% are within partially allowed regions of the Ramachandran plot (Figure 3.5).
Since the primary objective of the current study is to perform phylogenetic analysis on selected serpentine receptors and to identify the SRs associated with the only annotated olfactory receptor (odr-10), the observed subclusters Str_C1 to Str_C6 (Figure 3.3) propose the closest homologues of odr-10. This further helps to propose representative candidates for secondary structure modelling and dimer-interface predictions (Nemoto and Toh 2005, Nemoto and Toh 2009). The essence of the study has been used extensively in developing an integrated database called DOR (Database of Olfactory Receptors) (refer 6.9.3 of Chapter 6) to connect the sequence-structure-function relationship of ORs in five dedicated genomes. In future, selected olfactory receptor sequences (those with data on odour binding) can be taken up for docking and molecular dynamics analysis, which would provide insight into the functional characterization of these receptors. In sequence analysis, 92 serpentine receptor family-specific motifs were identified using TM-MOTIF (Table A1.1 in Appendix). Such sequence features could be useful as constraints in sequence searches and for training SVM models to predict putative receptors from other nematode species. These hot-spot residues can be related to the functional sites of the receptor(s), extending the possibilities in ligand-receptor binding studies, virtual screening with various ligands/receptors and identification of ligand-binding sites (through docking), which could enable a better understanding of the mechanism of olfaction. The next chapter deals exclusively with TM-MOTIF, which identifies motifs in TM proteins for numerous practical applications.
Table 3.1 List of identified motifs in serpentine receptor superfamilies
SRA superfamily
Family   Motif(s)                  Topology
Sra      SLN                       N-terminal
Sra      KISQ, LTF                 TM1
Sra      NLF, ANL                  TM2
Sra      SGM, YGQTGLL              TM4
Sra      ISI, STG                  TM5
Sra      FNL, ICFLT, FMF, YSFG     TM6
Sra      VVW, PFIAL                TM7
Sra      STKILL                    ICL1, TM2
Sra      CATF                      ECL2
Sra      WDDPL, YNK                ICL3
Sre      MIF, PIY                  N-terminal
Sre      WTDD                      ECL1
Sre      RFQAKEN                   ICL3
Sre      FEN                       ECL3
Sre      LNPL                      TM7
Sre      ETD                       C-terminal

STR superfamily
Family   Motif(s)                  Topology
Srd      GPC                       TM2, ECL1
Srd      YFV                       TM7
Srd      PYR                       TM7 and C-terminal
Sri      IYL                       TM1
Sri      KHQ                       ICL2
Sri      VLI                       TM7
Srj      IYL                       TM1
Srj      RALIVQT, IPI              TM6
Srj      AIIL                      TM7
Srj      FGNYR                     ICL1
Srj      RSW                       ECL2, TM5
Srj      PIFGI                     ECL3
CHAPTER 4 TM-MOTIF: A PACKAGE AND AN ALIGNMENT VIEWER TO IDENTIFY CONSERVED MOTIFS AND AMINO ACID SUBSTITUTIONS IN ALIGNED SET OF SEVEN TRANSMEMBRANE HELIX PROTEINS
4.1 INTRODUCTION
Transmembrane proteins belong to the largest protein families and
are popular drug targets. They are involved in important biological functions and participate in signal transduction pathways. They recognize and mediate various stimuli such as hormones (for example, via melanin-concentrating hormone receptor 1, gonadotrophin-releasing hormone receptor, thyrotropin-releasing hormone receptor, growth hormone secretagogue receptor, follicle-stimulating hormone receptor and luteinizing hormone/choriogonadotropin receptor), neurotransmitters (via muscarinic acetylcholine receptors, catecholamine receptors, serotonin receptors (5-HT1, 2, 4, 6), GABAB receptors, metabotropic glutamate receptors, purine receptors (P2Y; adenosine, AMP, ADP, ATP), peptide hormone receptors etc.), growth factors, light and odours (Marinissen and Gutkind 2001). Membrane proteins are abundant and ubiquitous. They are implicated in more than 30 different human diseases such as cancer, diabetes, hyperthyroidism, ovarian hyperstimulation syndrome, congenital stationary night blindness and obesity (Schoneberg et al 2004, Schlyer and Horuk 2006). They are also associated with oligomerization and
understanding the structural details of oligomeric interfaces helps in identifying the active state of membrane proteins and in locating ligand-binding sites. The conserved amino acid patterns present in the helices and loop regions of membrane proteins play an important role in retaining conserved evolutionary trends, features and functional expression. Apart from the conserved motifs, substituting AAs also play an important role in determining functional expression. Two cases highlight the need to identify transmembrane motifs and conservative amino acid substitutions: a single amino acid mutation in the rhodopsin motif [R194/K195]xE causes retinitis pigmentosa (RP), a neurodegenerative disorder (Gleim et al 2009), and a single hydrophobic-to-hydrophobic substitution in the transmembrane domain impairs aspartate receptor function (Jeffery and Koshland 1994). The application of membrane proteins in fields such as pharmacy and drug design requires knowledge of sequence analysis, structural details, interacting interfaces, ligand-binding sites, hot-spot residues and so on.
4.1.1. Functional Importance of Conserved Motifs in TM-Proteins
Various earlier studies emphasized the role and importance of conserved motifs in membrane proteins. The characteristic E/DRY motif is essential for regulating GPCR conformational states, wherein glutamic acid/aspartic acid maintains the receptor in its ground state (Rovati et al 2007). The NPxxY motif observed in TM7 and the C-terminal is crucial for structural constraints in rhodopsin (Fritze et al 2003), and the NPxxY motif of the V2R is involved in clathrin-mediated endocytosis (Bouley et al 2003). Studies explaining the role of LWYIK
motif in HIV type-1 transmembrane gp41 protein for viral infection (Chen et al 2009); a sequence motif (TXP) in the second transmembrane helix of CCR5, a chemokine receptor that acts as a coreceptor for macrophage-tropic HIV-1 strains (Govaerts et al 2001); permeabilizing motifs related to host membrane destabilization in alphaviruses (Jos Nieva et al 2004); the immunoreceptor tyrosine-based inhibitory motif (ITIM) in cell proliferation (Duchene et al 2002); motifs in protein-protein interfaces (Neha Vyas et al 2008); and characteristic motifs in GPCRs (Kim et al 2008) and olfactory receptors (ORs) (Malnic et al 2004) are a few examples that illustrate the functional importance of conserved motifs in various cellular activities.
4.1.2. Motif Related to Structural Integrity and Stability
Conservation of seven amino acids in TM-proteins
(LIxxGVxxGVxxT) is related to dimerization and usually occurs at the helix-helix interface. These motifs are thought to assist structural organization, apart from oligomerization (Lemmon et al 1994). A membrane-spanning heptad repeat motif, also referred to as a leucine zipper protein-protein interaction motif, is also found to mediate interactions between transmembrane segments (Weiming Ruan et al 2004). The GxxxG motif (also known as the packing motif) is essential for helix-helix interactions in both membrane and water-soluble proteins (Russ and Engelman 2000). The AxxxA motif is responsible for the thermostability of protein structures in thermophiles. Many computational approaches and related data repositories are emerging to analyse sequence properties, in particular motifs in membrane proteins (Tusnády et al 2008, Marsico et al 2010). Aside from the convenience of automation for large numbers of sequences, previous
attempts by other methods do not consider mapping predicted secondary structures and the evolutionary perspective of motif retention in one algorithm or within a single repository.
4.1.3. Impacts of Motifs in Evolutionary Bioinformatics
Fine-grained sequence analysis of conserved domains and motifs provides a path to understanding the evolutionary conservation observed within and across genomes. Comparing the occurrence of motifs from the perspective of topology provides a clue to connect them with structural and functional aspects. Knowledge of cluster-specific or family-specific motifs helps to classify proteins (Attwood and Findlay 1993) and can further be associated with their structural and functional relevance. The motif-based construction of functional maps for olfactory receptors is an appropriate example of the use of motifs in practical applications (Liu et al 2003). Thus, a tool to identify motifs in TM-proteins is highly useful to the field and is the current objective.
4.2. OBJECTIVES OF TM-MOTIF
The current study aims to design a computational tool, TM-MOTIF, to serve a dual objective, i.e., to identify conserved motifs (by default at the 60% level of conservation) and to map the discovered motifs on the predicted membrane topology of a set of aligned transmembrane protein sequences (TM proteins). It also serves as an effective alignment viewer that displays the predicted seven transmembrane boundaries in seven colours, namely violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R), and maps the identified motifs on the colour-annotated
membrane helices. Apart from the conserved residues identified at 60%, the physico-chemical properties of the substituting amino acids are also documented.
4.3. KEY FEATURES OF TM-MOTIF
An in-built dataset was incorporated in the TM-MOTIF package for the observation of conserved motifs and AAS at various percentage levels of conservation. It comprises previously established phylogenetic clusters of selected human and Drosophila GPCRs (Metpally and Sowdhamini 2005), 32 clusters of eight major receptor types from a profile-based clustering of selected human and C. elegans GPCRs (Chapter 2), and 10 clearly distinguishable human-mouse OR clusters (Chapter 6) from cross-genome clustering studies. TM-MOTIF allows users to submit their sequence of interest (membrane proteins) in FASTA format, along with its MSA, in the given text box to obtain the various TM-MOTIF display options, wherein the predicted seven transmembrane helices are displayed in seven colours (VIBGYOR colouring scheme) and the identified motifs are mapped on the respective topology. The mouse-over option displays not only the identified motifs but also the amino acid substitutions, with their physico-chemical properties, at each position of the alignment. TM-MOTIF allows users to perform a BLAST search (using the option Run-BLAST) to collect the nearest homologue of their sequence of interest from the in-house dataset further to
observe motifs and AAS. TM-MOTIF also lets users select any one of the reference sequences (whose structure is solved) to align with their sequence of interest; the resulting pairwise alignment in VIBGYOR display serves as the starting point for homology modelling.
4.4. METHODOLOGY
A flow chart (Figure 4.1) provides a pictorial representation of the methods and steps involved in developing TM-MOTIF, and an overview of the tool guide is given pictorially (Figure 4.2).
Figure 4.1 Flow-chart

Note: The flow chart depicts the stepwise procedure involved in developing TM-MOTIF. The in-built dataset (Step 1) was first used to predict the membrane topology by the HMMTOP prediction method (Step 2). The respective cross-genome GPCR/OR cluster alignments were used to detect conserved motifs using the in-house program (Step 3). The identified motifs were mapped on the membrane topology for TM-MOTIF display (Step 4). Separately, the user is given the option (Step 5) either to run a BLAST search to identify homologues from the in-built dataset (shown in the maroon box) or to align the sequence of interest with any one of the given reference sequences (shown in green).
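Step 3 of the flow chart can be illustrated with a simplified stand-in for the motif detection. The in-house MotifS program is unpublished and, as described in Section 4.4.4, scores conservation with a normalized amino acid exchange matrix; the sketch below instead assumes a plain most-common-residue frequency per column and reports runs of at least three consecutive columns at or above the 60% default. The function names and the toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Return (most common residue, fraction) for each alignment column."""
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            result.append(("-", 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        # The fraction is taken over all sequences, so gaps lower conservation.
        result.append((residue, count / len(column)))
    return result


def find_motifs(alignment, threshold=0.60, min_length=3):
    """Find runs of at least min_length consecutive conserved columns."""
    motifs = []
    run_start, run = None, []
    for index, (residue, fraction) in enumerate(column_conservation(alignment)):
        if fraction >= threshold:
            if run_start is None:
                run_start = index
            run.append(residue)
        else:
            if len(run) >= min_length:
                motifs.append((run_start, "".join(run)))
            run_start, run = None, []
    if len(run) >= min_length:
        motifs.append((run_start, "".join(run)))
    return motifs


# Toy alignment (hypothetical sequences), not taken from the cluster dataset.
toy_alignment = ["AKYRYLT", "GQYRYMS", "CEYRYIT", "-DYKYVA"]
print(find_motifs(toy_alignment))  # [(2, 'YRY')]
```

Each reported motif is a pair of the zero-based start column and the consensus string; mapping the start column onto predicted helix or loop boundaries would correspond to Step 4.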
4.4.1. In-Built Dataset of Cross-Genome GPCR and OR Clusters
4.4.1.1 Human-Drosophila cross-genome GPCR clusters
From our previous publication (Metpally and Sowdhamini 2005), 32 phylogenetically established human-Drosophila GPCR clusters of eight major receptor types, namely peptide receptors (PR), chemokine receptors (CMK), nucleotide and lipid receptors (N&L), biogenic amine receptors (BGAR), secretin receptors (SEC), cell adhesion receptors (CAR), glutamate receptors (GLU) and frizzled/smoothened (FRZ), were incorporated as the in-built GPCR cluster dataset in the package (Figure 4.2 and 4.3).
4.4.1.2 Human-C. elegans cross-genome GPCR clusters
Through a profile-based clustering approach (RPS-BLAST), human and C. elegans GPCR clusters were established at varying E-value thresholds. The resulting 32 cross-genome GPCR clusters were also incorporated in the TM-MOTIF package (see also Chapter 2).
4.4.1.3 Human-mouse cross-genome OR clusters
Olfactory receptors (ORs) belong to the Class A type of GPCRs. Using the neighbour-joining method, 10 human-mouse cross-genome OR clusters were established and are also included as OR subclusters in the package.
4.4.2. Alignment Procedures for Cross-Genome GPCR/OR Clusters
An appropriate alignment tool is important for generating high-quality alignments. In the current study, CLUSTALW (Thompson 1994), PRALINETM (Pirovano et al 2008) and MAFFT (Katoh et al 2002) were used to align the human-Drosophila GPCR clusters, human-C. elegans GPCR clusters and human-mouse OR clusters, respectively. The alignment procedure was
different for each cluster dataset, owing to varying parameters such as the number of sequences, sequence length, genomic combination and average cluster percentage identity. User-submitted queries, in contrast, are aligned by the standalone CLUSTALW alignment method incorporated in the package.
4.4.3. Prediction of Membrane Topology for TM Helices and Loops
The membrane topology of each candidate receptor (GPCR or OR) in the cluster dataset was predicted using the standalone version of HMMTOP incorporated in the package. Care was taken to retain only sequences with 7 (±2) TM helices in the in-built GPCR/OR cluster dataset; that is, only sequences predicted to have between five and nine TM helices were considered for the current analysis. User-submitted queries are subjected to the same cut-off and prediction procedure.
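The helix-count cut-off described above amounts to a simple filter on the number of predicted TM helices. The sketch below assumes that the helix counts have already been extracted from the HMMTOP predictions (the parsing step is not shown); the function names and sequence identifiers are hypothetical.

```python
def within_tm_cutoff(helix_count, expected=7, tolerance=2):
    """True if the predicted helix count lies within expected +/- tolerance."""
    return abs(helix_count - expected) <= tolerance


def filter_by_tm_count(predicted_counts, expected=7, tolerance=2):
    """Keep sequences whose predicted helix count passes the cut-off.

    predicted_counts maps a sequence identifier to its number of predicted
    TM helices (assumed to come from a topology predictor such as HMMTOP).
    """
    return {
        seq_id: count
        for seq_id, count in predicted_counts.items()
        if within_tm_cutoff(count, expected, tolerance)
    }


# Hypothetical identifiers and helix counts, for illustration only.
example = {"seq_a": 7, "seq_b": 4, "seq_c": 9, "seq_d": 10, "seq_e": 5}
print(sorted(filter_by_tm_count(example)))
```

With the default values, sequences predicted to have five to nine helices are retained and all others are dropped.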
Figure 4.2 Tool guide of TM-MOTIF : an overview

Figure 4.2: Pictorial representation of the TM-MOTIF tool guide, depicting the available options. The user can start by selecting the in-built cluster dataset/organisms (pipelines on the left side) and viewing the MSA with the available TM-MOTIF display options: Run TM, Run Motif and Run TM-Motif. The user can also start by submitting a sequence of interest or an alignment (pipelines on the right side) and performing Run-BLAST (Methodology); when BLAST is run, the user's query is searched against the in-built cluster dataset to find homologous sequences of the respective cluster. By selecting Alignment with reference sequence (refer Methodology), the user can obtain any one of the display options (Run TM, Run Motif, Run TM-Motif). The four important output files are described in section 4.6.1.
4.4.4. Detection of Motifs and Amino Acid Substitution (AAS) in the Cross-Genome Alignment
The in-built cross-genome GPCR/OR cluster alignments were used as input (test set) to our in-house program (MotifS program by Sowdhamini, unpublished results) to identify residue conservation and substitutions at each position of the alignment. Here, conservation refers to an average over all possible pairwise sequence comparisons, with the score taken from a normalized AA exchange matrix. A motif is defined as at least three consecutive conserved AAs with high amino acid conservation (the default is 60% conservation). The conservation of each residue in the set of aligned sequences was noted as consensus and documented if the percentage conservation at a position was between 60 and 100%. Once the conserved amino acid patterns were recorded, the substituting AA residues were also identified, and the properties of the AAS, namely hydrophobic (@), aromatic (*), polar positive (+), polar negative (-) and polar uncharged ($), are denoted by the given symbols. The significant observation of preserved motifs and AAS in the helices and loop regions of the cross-genome GPCR cluster dataset was published recently (refer Chapter 5). The same principle was implemented in the package, applied to detect motifs in the OR cluster dataset and is applicable to user-submitted queries; the consensus obtained from this program is displayed along with the MSA.
4.4.5. Mapping of Identified Motifs on TM-helices and Loops in MSA
The motifs discovered by the in-house program (MotifS, written by Sowdhamini) were mapped on the predicted membrane topology (Methodology) of the multiple sequence alignment (MSA), both for the in-built cluster alignments and for user-submitted queries. The predicted seven TM helices are displayed in seven colours, namely violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R), i.e. the VIBGYOR colouring scheme. The
identified motifs, if within a predicted TM helix, are shown in self-highlighted darker shades of the VIBGYOR colouring scheme in the TM-MOTIF package. Motifs observed in the loop regions are highlighted in grey. Only candidate receptors predicted to have 7±2 TM helices were considered for the study. Inevitably, a considerable number of false predictions occur owing to false merging and false splitting of TM boundaries, causing over- and under-prediction of TM helices. In such cases, a consensus approach across prediction methods could be useful for improving the accuracy of membrane topology prediction. Sequences with over- or under-predicted TM helices are shown in a pale cream colour for the whole sequence, since the full length of such sequences is aligned with other candidate receptors with considerable indels in the MSA and would otherwise disrupt the display of the VIBGYOR colouring scheme.
4.4.6. Identification of Homologous Sequences for User-Submitted Queries by Performing BLAST
Using Run-BLAST in the TM-MOTIF package, the user can collect homologous sequences from the in-built dataset; these are co-aligned with the related GPCR/OR cluster dataset and made available for TM-MOTIF display. Here, a BLAST search identifies the nearest homologous sequences, and the user-submitted query is aligned with them and their pre-aligned cluster sequences using the profile alignment option of CLUSTALW. The results are displayed in a new window according to the parameters set by the user (refer parameters in TM-MOTIF package for more detail). The default threshold for recognizing hits is a sequence identity of 60%. BLAST version 2.1.18 was incorporated in the package (option Run-BLAST; Figure 4.2 and 4.3).
4.4.7. Pairwise Alignment in TM-MOTIF
TM-MOTIF provides an option to select any one of the listed reference sequences (whose structure is solved experimentally) and to align
the user's sequence of interest with it, using CLUSTALW, to obtain a pairwise alignment displayed in the VIBGYOR colouring scheme (option Align with reference sequence; see also Figures 4.2 and 4.3). Seven reference sequences are included in TM-MOTIF (bovine rhodopsin, Japanese flying squid rhodopsin, common turkey β1-adrenergic receptor, human β2-adrenergic receptor, human adenosine A2A receptor, human dopamine D3 receptor and human CXCR4 chemokine receptor); their crystal structures were solved experimentally, and the user can select any of them as a relevant reference sequence or template.

4.5. RESULTS

4.5.1. Software Input and Output Options

The main menu of the front window of the TM-MOTIF package, with the available input and output (display) options, parameter settings and choice of organisms, is shown as a snapshot (Figure 4.3).
Figure 4.3 Snapshot of the main menu of the front window of TM-MOTIF with user-interactive features
Note: Label 1 refers to the cluster number and receptor type of the in-built dataset; Label 2 selects organism combinations; Label 3 denotes the parameter settings for the consensus threshold and % identity in BLAST; Label 4 provides the input window to submit a FASTA sequence or MSA; Label 5 provides the option Run BLAST; Label 6 refers to the option to compare with a reference sequence; Label 7 refers to the display options Run TM, Run Motif and Run TM-Motif.
4.5.2. Input Options

The user can submit sequences in FASTA format or an MSA using the text boxes labelled Submit your FASTA sequence(s) or Submit your Input MSA (Figure 4.4). The user can also choose the organism of interest and the GPCR or OR cluster dataset. The cluster number, receptor type and organisms are listed in the main menu of the package.
Figure 4.4 Options given for the submission of input sequences in the TM-MOTIF package

Note: A snapshot showing the text boxes for the submission of multiple sequences in FASTA and .aln format. The related option Display Alignment with TM-motif is also selected.
4.5.3. Output Options

4.5.3.1 Display of predicted 7 TM-helices in VIBGYOR colouring scheme (using the Run TM option)

The predicted TM helices (1-7) are highlighted in seven different colours (VIBGYOR colouring scheme), as mentioned in the methodology (Figure 4.5). This helps the user to view transmembrane proteins in large sequence alignments and to keep track of the current location in the membrane topology, with residue conservation shown as consensus.
Figure 4.5 Sample output for the option RUN TM (labels in the snapshot: Consensus, Mouse-over option, VIBGYOR colouring scheme, Average Seq. Identity)

Note: Snapshot of the TM-MOTIF RUN TM display showing the seven predicted TM helices in the VIBGYOR colouring scheme. The average sequence identity of this particular cluster (human OR Cluster 6) is 44%. The mouse-over at position 140 shows the conservation of arginine (R) in the MAYDRYVAIC motif in the TM3/ICL2 region of human OR Cluster 6; the observed substitution is by another polar positive residue.
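The property symbols reported for substituting residues (Section 4.4.4) can be illustrated with a small lookup. This is a minimal sketch, not the MotifS implementation: the text names the five classes and their symbols but not which amino acids belong to each, so the grouping below is an assumption, and `property_symbol` and `substitution_properties` are hypothetical helper names.

```python
# Symbols follow the text: @ hydrophobic, * aromatic, + polar positive,
# - polar negative, $ polar uncharged.
# ASSUMPTION: the class membership below is a common textbook grouping,
# not taken from the MotifS program.
PROPERTY_CLASSES = {
    "@": "AVLIMCGP",
    "*": "FWY",
    "+": "KRH",
    "-": "DE",
    "$": "STNQ",
}

SYMBOL_OF = {aa: sym for sym, members in PROPERTY_CLASSES.items() for aa in members}


def property_symbol(residue):
    """Return the property symbol for a one-letter amino acid code."""
    return SYMBOL_OF[residue.upper()]


def substitution_properties(column, consensus):
    """Map each substituting residue in an alignment column to its symbol.

    Alignment gaps ('-') and the consensus residue itself are skipped.
    """
    subs = sorted({aa for aa in column.upper() if aa not in ("-", consensus)})
    return {aa: property_symbol(aa) for aa in subs}


if __name__ == "__main__":
    # Hypothetical column: arginine conserved, lysine as the only substitution.
    print(substitution_properties("RRRKRR-R", "R"))
```

With this grouping, a lysine replacing the conserved arginine is reported with the same symbol (+) as the consensus residue, which corresponds to the "another polar positive residue" case shown in Figure 4.5.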
4.5.3.2 Display of Identified Motifs and AAS in MSA (using the Run Motif option)

If the user is interested only in the identified motifs in the selected cluster or user-provided MSA, the Run Motif option displays only the identified motifs and AAS in the MSA. Here, observed motifs are highlighted in grey on the MSA (Figure 4.6). The conserved amino acid residues are displayed as text below the alignment and referred to as the consensus. In addition, a mouse-over option at each position of the MSA lets the user record the alignment position, the percentage residue conservation, the substitutions (AAS) and the properties of the substituting amino acids.
Figure 4.6 Sample output for the option RUN MOTIF

Note: Snapshot of the TM-MOTIF RUN MOTIF display showing the identified motifs in the MSA. The conserved DRY motif is shown for the cross-genome human-Drosophila GPCR cluster at alignment position 204.
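The motif definition used by the package (at least three consecutive alignment positions, each conserved at or above the threshold, 60% by default) can be sketched as follows. This is a simplified illustration only: it measures conservation as the frequency of the most common residue in a column, whereas MotifS scores pairwise exchanges with a normalized amino acid exchange matrix. The function names and the toy alignment are hypothetical.

```python
from collections import Counter


def column_conservation(column):
    """Return (most common residue, percent of sequences carrying it).

    Gaps count towards the denominator but can never be the consensus.
    """
    counts = Counter(aa for aa in column if aa != "-")
    if not counts:
        return "-", 0.0
    residue, n = counts.most_common(1)[0]
    return residue, 100.0 * n / len(column)


def find_motifs(alignment, threshold=60.0, min_length=3):
    """Find runs of at least min_length consecutive conserved columns.

    Returns a list of (start, end, motif) with 1-based inclusive positions.
    """
    length = len(alignment[0])
    assert all(len(seq) == length for seq in alignment), "sequences must be aligned"
    motifs, run = [], []
    for pos in range(length + 1):  # one extra step flushes the last run
        conserved = False
        if pos < length:
            residue, pct = column_conservation([seq[pos] for seq in alignment])
            conserved = pct >= threshold
        if conserved:
            run.append(residue)
        else:
            if len(run) >= min_length:
                motifs.append((pos - len(run) + 1, pos, "".join(run)))
            run = []
    return motifs


if __name__ == "__main__":
    # Hypothetical five-sequence alignment containing a DRY-like pattern.
    toy = ["AADRYLG", "TSDRYMA", "GCDRYVS", "PLERYIT", "MK-RFAC"]
    print(find_motifs(toy))
```

Raising the threshold shortens or removes runs; in the toy alignment the D column is conserved at exactly 60%, so a 70% threshold leaves only two consecutive conserved columns and no motif is reported.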
4.5.3.3 Display of Detected Motifs on TM-helices (using the Run TM-Motif option)

It is also possible to display identified motifs and predicted membrane topology simultaneously on a user-selected cluster or user-submitted MSA. This is probably the most important output of the package, since such an annotated alignment is biologically meaningful. As with the Run TM option, the predicted membrane topology is displayed in the VIBGYOR colouring scheme, and motifs embedded in the helices are shown in darker shades of the corresponding VIBGYOR colours (Figure 4.7 A and B). The consensus display and the mouse-over option at each position of the alignment are also available, so that motifs (level of conservation), AAS and the respective topology can be inspected visually in one go; corresponding output files for documentation are generated at each step (see output files for more details). All three display options show detected motifs not only in the TM helices but also in the intracellular and extracellular loops, which is useful for understanding sequence properties at predicted loop
regions for comparative sequence analysis and for generating loop libraries. The display of over/under-predicted TM helices (Figure 4.9) in the alignment also cautions the user about false predictions, which could be rectified by a consensus of prediction methods. This situation could also be addressed by editing or re-aligning the respective sequences, or by eliminating possible outliers, to improve alignment quality and conservation.
Figure 4.7 Sample output for the option RUN TM-Motif (label in the snapshot: Scroll bar in the alignment viewer)

Note: Snapshot of the TM-MOTIF RUN TM-MOTIF display, where identified motifs are mapped on predicted TM helices in the MSA. Notably, the conserved DRY motif is observed in the TM3/ICL2 region of the topology, and the average sequence identity of human GPCR cluster 1 is 30%. Panel 4.7.A shows the first four helices and panel 4.7.B the remaining three; large alignments can be viewed using the scroll bars on the right-hand side of the display window.
4.5.3.4 Alignment with Reference Sequence

A user-submitted sequence can be aligned with any one of the reference sequences using the option Select reference Sequence (Figure 4.3). This option helps the user prepare a pairwise alignment with their sequence of interest, which can support the generation of a good-quality 3D model (Figure 4.8). A case study on Odr-10 (a characterized olfactory receptor from C. elegans) used this option to produce a pairwise alignment with bovine rhodopsin as an appropriate template. This illustration can be viewed as a practical application of the TM-MOTIF package in guiding homology modelling.
Figure 4.8 Snapshot of the pairwise alignment of the user's input sequence with the selected reference sequence
Note: The mouse-over at position 135 shows the conservation of arginine (R), a polar positive residue, from the classical E/DRY motif. The predicted topology is highlighted in the VIBGYOR colouring scheme, with motifs highlighted and shown as consensus, and with mouse-over options displaying the position, conservation and AAS of the alignment. The user sequence, Q965V1_HUM, is aligned with the bovine rhodopsin sequence (1F88) to facilitate homology modelling.
4.5.3.5 Identifying closest homologues of user sequence in selected organisms

When the user wants to search for homologues of their sequence of interest in the available GPCR and OR cluster datasets, the Run BLAST option is useful. The cluster yielding the maximum number of hits in the BLAST search of the query against the in-built GPCR and OR clusters is chosen for the result alignment and for inferring receptor type and functional relevance (Figure 4.3).

4.5.3.6 Display of Over predicted helices

Apart from displaying the proteins predicted to have seven helices, care has been taken to present the sequences that are over- or under-predicted for seven helices; these are displayed in pale cream colour so that the number of helices predicted for each sequence in the GPCR cluster alignments is clear. This facilitates correlating sequence properties (such as the presence of motifs or sequence identities) and structural properties (like secondary structural topology) with potential functional roles (Figure 4.8). This display in TM-MOTIF indicates the quality of the alignment and the number of predicted TM helices in the cluster dataset, along with the average sequence identity.
Figure 4.9 Snapshot depicting the display of over-predicted TM helices

Note: The snapshot shows two GPCR sequences that are under-predicted for seven helices in the selected OR cluster alignment and hence shown in pale cream colour. The sequence identity of the cluster is given, and the NOTE carries a message saying that a consensus approach for membrane prediction is advisable, because more than 10% of the sequences in the cluster show over/under-prediction.
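The display rule and warning described above can be sketched as follows, assuming the number of predicted TM helices per sequence is already available (for example, parsed from the topology predictor's output, which is not shown here). The function name, the input dictionary and the sequence identifiers are hypothetical; the seven-helix expectation and the 10% cut-off come from the text.

```python
EXPECTED_HELICES = 7
WARNING_FRACTION = 0.10  # from the text: warn when more than 10% deviate


def flag_topology(helix_counts):
    """Classify sequences by predicted helix count and decide on a warning.

    helix_counts maps sequence id -> number of predicted TM helices.
    Returns (flags, warn), where flags maps each id to 'ok', 'over' or 'under'.
    """
    flags = {}
    for seq_id, n in helix_counts.items():
        if n > EXPECTED_HELICES:
            flags[seq_id] = "over"
        elif n < EXPECTED_HELICES:
            flags[seq_id] = "under"
        else:
            flags[seq_id] = "ok"
    deviating = sum(1 for f in flags.values() if f != "ok")
    warn = bool(flags) and deviating / len(flags) > WARNING_FRACTION
    return flags, warn


if __name__ == "__main__":
    # Hypothetical cluster of five sequences.
    counts = {"seq1": 7, "seq2": 7, "seq3": 6, "seq4": 8, "seq5": 7}
    flags, warn = flag_topology(counts)
    for seq_id, flag in flags.items():
        colour = "VIBGYOR" if flag == "ok" else "pale cream"
        print(seq_id, flag, colour)
    if warn:
        print("Consensus approach for membrane prediction is advisable")
```

Keeping the classification separate from the colouring decision means the same flags could also be used to exclude outliers before re-aligning a cluster.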
4.6. DEFAULT PARAMETERS

The user can set the threshold from 30% to 100% for recognizing the consensus and for % identity in BLAST (Methodology); the default threshold is 60% (Figures 4.3 and 4.4). Using Select Organism Combination, the user can select any one or two organisms from the available dataset, which includes H. sapiens, M. musculus, D. melanogaster and C. elegans, to display cluster-specific or cross-genome cluster alignments. It is preferable to select two organisms that are not too distantly related when performing cross-genome alignments. The options Select GPCR Cluster and OR-subclusters (also refer Methodology) let the user select the cluster type of interest for intra- and inter-genomic cluster alignments. Clusters are listed with their cluster number and receptor type in the main menu of the front window. The user can select any one of the reference sequences under the option Select reference Sequence, or use Run BLAST, for pairwise alignment or MSA. In principle, the identification of conserved residues in a set of aligned sequences depends mainly on the number of sequences, the length of each candidate sequence and the overall average sequence identity of the particular cluster, which varies from cluster to cluster in the dataset.

4.6.1 TM-MOTIF Output Files

Along with the three display options discussed above, the following output files are also generated:

Zconsensus.txt: This output file lists all consensus residues and alignment positions that satisfy the user-specified threshold values, including the percentage of conservation and the percentage of substitutions according to amino acid type.
Zpattern.txt: This file lists all conserved amino acid positions and the substitutions observed at those sites.
Zmotif.txt: This output file lists the motifs, with substitutions, discovered in the alignment along with their start and end positions.
Zuser.aln: This is the result file for the user-submitted sequence/MSA, aligned by CLUSTALW and given in .aln format. This file is the primary source for the alignment and display options for the user-submitted query.
Zuser.pir: This is the result file for the user-submitted sequence/MSA, aligned by CLUSTALW and given in .pir format. This file is the primary input for the detection of motifs and AAS by the MotifS program.
Zblast_sorted.txt: For the RUN-BLAST option, the number of hits observed in each in-built cluster can be obtained from this file.

4.7. CAVEAT AND FUTURE DEVELOPMENT

This tool has been coded in Perl (using the Tk module for GUIs). The package runs on Linux and requires the following back-end programs to be installed prior to use: Perl/Tk, BioPerl, a FORTRAN compiler and standalone versions of CLUSTAL W and BLAST 2. The TM-MOTIF package could be extended in future to other genomes and to membrane-bound helical proteins such as ion channels and transporters. TM-MOTIF could evolve into a specialized alignment viewer for transmembrane helix-rich proteins with added features such as a graphical display to
provide a 2D cartoon representation of the helix topology embedded with identified motifs. Separately, an aid could be included to edit sequences only within the TM helices or loop regions, so that the user could generate TM-domain and loop libraries, which in turn would be useful for studying amino acid composition in TM domains and loops and for applications in homology modelling. The TM-MOTIF package is available to users for academic purposes, is integrated with DOR (Database of Olfactory Receptors) and can be downloaded (Figure 4.10 and Chapter 6).

4.8. AVAILABILITY

The TM-MOTIF package is integrated with DOR (Database of Olfactory Receptors) and can be downloaded from the URL
http://caps.ncbs.res.in/DOR.

4.9. CONCLUSIONS

TM-MOTIF, a software package and alignment viewer, helps to map discovered motifs onto predicted TM helices and loop regions in an MSA. The VIBGYOR colouring scheme for TM helices helps the user keep track of the current location in the membrane topology and appreciate the relative positions of motifs in large sequence alignments. Selecting a combination of organisms helps the user understand sequence properties at the cross-genome level, and the package is well suited to comparative genomics studies.
The mouse-over option provides the position, amino acid conservation (motifs) and amino acid substitutions (AAS) in the MSA, for a better understanding of sequence properties. The package is suited to analysing sequence properties at intra- and inter-genome levels: to identify receptor-specific (or cluster-specific) motifs, common motifs occurring in more than one genome (at the cross-genome level), and evolutionarily conserved motifs across genomes of various data sizes. In essence, the package is well suited to cross-genome sequence analysis. The analysis of residue conservation can, however, depend on parameters such as average sequence length, percentage identity, genome of interest, clustering technique, phylogenetic analysis and evolutionary lineage. The aim in developing this computational package is therefore to support critical analysis of sequence properties, focused mainly on conservation and substitution at the sequence level. Such sequence analysis has wide applications in comparative genomics. Effective alignment visualization with additional information, such as membrane topology, observed motifs, strength of conservation and possible substitutions in cross-genome sequence alignments, adds to the knowledge of what is evolutionarily consistent or divergent at the sequence level. The resulting alignment displays enable the user to choose the best sequence or template quickly and to perform homology modelling and cross-genome sequence comparisons. Overall, the package makes it convenient to carry the results forward to map hot-spot residues on the structure, which in turn can help to connect them with functions at intra- and inter-genomic levels.
CHAPTER 5 ANALYSIS ON CONSERVED MOTIFS AND PERMITTED AMINO ACID EXCHANGES IN CROSS-GENOME GPCR CLUSTERS
5.1 INTRODUCTION

Membrane proteins are biologically highly significant: they participate in various cellular activities such as signal transduction and oligomerization, and are implicated in disease. These cell-surface proteins are popular drug targets in the pharmaceutical industry. As described in Chapter 1, GPCRs are diverse and play a vital role in several physiological functions, such as perception of sensory information, modulation of synaptic transmission, hormone release and action, regulation of cell contraction and migration, and cell growth and differentiation; their abnormal function causes disease (Wettschureck and Offermanns 2005).

5.2 OBJECTIVES

The characteristic structural features of GPCRs are seven helices with three intracellular and three extracellular loops and flanking N and C termini. These key features of membrane proteins are helpful in comparing the predicted helix boundaries (TM domain), loop lengths, and sequence features such as conserved motifs and substituting amino acids and their physicochemical properties in a set of aligned homologous sequences or
phylogenetically associated sequences (clusters) at intra- and inter-genomic levels. The conserved amino acid patterns, i.e., motifs present in the helices and in the loop regions, help in understanding the trends preserved in evolution, the reasons for species-specific features and functional importance. The current study therefore aims to identify conserved motifs, along with amino acid substitutions (AAS), in sets of aligned homologous sequences, particularly clusters of GPCRs belonging to two genomes.

5.3 RESIDUE CONSERVATION IN CROSS-GENOME SEQUENCES

A number of earlier studies (Leonov and Arkin 2005) emphasized the role of conserved residues in the predicted helices and loop regions of transmembrane proteins for structural integrity and function. When GPCR sequences from more than one genome are aligned together, the alignment is referred to as a cross-genome GPCR alignment. Association of homologous GPCRs by a phylogenetic procedure produces GPCR clusters; when the GPCRs come from more than one genome, they are referred to in this chapter as cross-genome GPCR clusters. These cross-genome GPCR cluster alignments are of particular interest for studying the imprints of residue conservation (motifs) preserved over evolutionary lineages. Identifying conserved motifs along with AAS provides knowledge of the extent of residue conservation and diversity at the cross-genome sequence level (Bjarnadottir et al 2006). It also provides a handle on the physico-chemical properties of the substituting amino acids in intra- and inter-genomic GPCR clusters, which tell us
about the evolutionary conservation or deviations that exist within or across genomes.

5.4 IMPACT OF AMINO ACID CONSERVATION AND TYPES OF SUBSTITUTIONS

The conserved motifs observed in cross-genome GPCR/OR clusters play a crucial role in retaining the critical and characteristic functions of various receptor types. For example, the evolutionarily conserved DRY motif present in TM3-ICL2 (Figure 5.1) is important for GPCR function in rhodopsin-like receptors (Class A GPCRs), and mutation of this residue pattern leads to various functional consequences (Rompler et al 2006). Similarly, the role of 6-8 conserved residues in SNARE protein assembly and function (Laage et al 2000), of the FF motif in mediating transport (Nufer et al 2002), and of serine and threonine residues in conserved patterns such as SxxSSxxT and SxxxSSxxT for oligomerization clearly emphasizes the role of residue conservation in various GPCR functions (Dawson et al 2002).
Figure 5.1 Pictorial representation of the occurrence of the highly conserved DRY motif in TM3/ICL2 of the membrane topology
Note: TM1 to TM7 are shown in violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R) colours, respectively.
The current study is equally focused on the AAS occurring in the conserved motifs, to explain the impact of amino acid substitutions on functional diversity, abnormalities and mutagenesis. For instance, a single amino acid mutation in the rhodopsin motif [R194/K195]xE causes a neurodegenerative disorder called retinitis pigmentosa (RP) (Scott Gleim et al 2009). Similarly, a single hydrophobic-to-hydrophobic substitution in the transmembrane domain impairs aspartate receptor function (Jeffery and Koshland 1994). Such examples demonstrate the need to identify motifs in transmembrane proteins and the amino acid substitutions observed within them.

5.5 METHODS

The flowchart (Figure 5.2) summarizes the stepwise procedure for identifying conserved amino acids (motifs) and substituting residues at each position of the MSA.
Figure 5.2 Flowchart describing the steps involved in the study
Note: Step 1 denotes the available cross-genome GPCR cluster dataset (H. sapiens, D. melanogaster); Step 2, the alignment procedure (by PRALINE TM); Step 3, the prediction of membrane topology by HMMTOP (left-hand side) and the discovery of motifs and properties of the replacing amino acids using the MotifS program (right-hand side); Step 4, the analysis of motifs and substituting amino acids in the respective membrane topology across the selected genomes.
5.5.1 Cross-genome GPCR cluster dataset

For the current study, a cross-genome GPCR cluster dataset (human-Drosophila GPCR clusters) of 32 clusters, derived from a previous publication from the lab (Metpally and Sowdhamini 2005), was used for analyzing conserved motifs and documenting the proportion and properties of substituting amino acids in all 32 clusters (step 1 in Figure 5.2). The 32 clusters fall into eight major receptor types: peptide receptors, chemokine receptors, nucleotide and lipid receptors, biogenic amine receptors, secretin receptors, cell adhesion receptors, glutamate receptors and frizzled/smoothened receptors. This classification was useful for analysing conserved key motifs and amino acid substitutions (AAS) along with their physico-chemical properties, and for reporting cluster-specific or receptor-specific motifs at the cross-genome level. As discussed earlier (Chapter 2), Cluster 26 contains only Drosophila GPCRs (Metpally and Sowdhamini 2005). In addition, the intra-genome clusters (human GPCR clusters) and cross-genome GPCR clusters (human-Drosophila and human-C. elegans GPCR clusters), associated by RPS-BLAST clustering (Chapter 2), were used for the current study.

5.5.2 Alignment Procedure

The phylogenetically established GPCR cluster association enabled the assembly of sets of homologous sequences from the human and Drosophila genomes. Alignment tools play a crucial role in understanding sequence features even at remote homology; selecting an appropriate alignment tool therefore helps to improve alignment quality and to analyze sequence properties critically at each position in the alignment. In the current study, CLUSTAL W (Thompson 1994) was used for the human and human-Drosophila GPCR cluster alignments, and MAFFT (Katoh et al 2002) was used to align the human-C. elegans GPCR clusters (step 2 in Figure 5.2).
5.5.3 Prediction of membrane topology

Each sequence of the cross-genome alignment was examined for its predicted membrane topology using HMMTOP version 2.1 (Tusnady and Simon 2001) (step 3 in Figure 5.2).

5.5.4 Program to Detect Motifs and AAS

The aligned sequences of cross-genome GPCRs, organized as 32 clusters, were provided as input to an in-house program (MotifS, written by Sowdhamini) to identify motifs (at least three consecutively conserved amino acids with a minimum of 60% conservation). The conserved residue at each position in the aligned sequences was noted as the consensus and documented where conservation was 60-100%. Once motifs were identified, the substituting amino acids in the identified pattern were recorded and classified based on their physico-chemical properties, denoted by symbols: @, *, +, - and $ represent hydrophobic, aromatic, polar positive, polar negative and polar uncharged residues, respectively. The symbolic representation of AAS at each position in the MSA helps to understand the composition of amino acid conservation and replacement. Combining the predicted membrane topology with the identified motifs and AAS for each sequence in the MSA simplifies analysis at the cross-genome level (steps 3 and 4 in Figure 5.2). The MotifS program uses the Birkbeck matrix scoring scheme (which employs structure-based sequence alignments of several homologous protein families) for recording permitted AAS after normalization for the inherent frequency of occurrence of different amino acids. One of the result files, file.summary, documents the identified motifs along with the AAS. The MotifS program is incorporated into the TM-MOTIF package (Chapter
4) and is used in the current study to document the location of identified motifs with respect to membrane topology.

5.6 RESULTS

Conserved residues were identified together with AAS patterns for each of the 32 human-Drosophila GPCR clusters, and the properties at alignment positions were recorded to understand the most frequent AAS and the key property of the respective amino acid in the conserved pattern. The results report, for the human-Drosophila GPCR cluster dataset, the membrane topology associated with each motif, the identified motif, the motif with AAS and the respective symbolic representation. Motifs identified for a particular receptor type were also documented to denote cluster-specific/receptor-specific sequence properties.

Table 5.1 Motifs@ observed in the transmembrane helices and loop regions of human and Drosophila GPCR clusters+
No   Motif           Types   Receptor type

Motifs in single receptor type
1    VGL(TM1)        1       PR
2    GNL(TM1)        1       BGA
3    VMP(TM2)        1       BGA
4    YLLNLA(TM2)     1       CMK
5    TASI(TM3)       1       BGA
6    LGF(TM5)        1       PR
7    PFF(TM6)        1       BGA
8    NSC(TM7)        1       PR
9    WLGY(TM7)       1       BGA
10   HCC(TM7)        1       CMK
11   NPI(TM7)        1       PR

Motifs in two different receptor types
12   SLA(TM2)        2       PR,BGA
13   IYL(TM2)        2       CMK,N&L
14   LFL(TM2)        2       PR,CMK
15   TLP(TM2)        2       PR,CMK
16   LPF(TM2)        2       PR,CMK
17   AIA(TM3)        2       PR,CMK
18   CIS(TM3)        2       CMK,N&L
19   LPL(TM5)        2       PR,CMK
20   LYA(TM7)        2       PR,CMK

Motifs in multi-receptor types
21   NLA(TM2)        3       PR,CMK,BGA
22   ADLL(TM2)       3       CMK,N&L,BGA
23   CWLP(TM6)       3       PR,CMK,BGA
24   DLL(TM2)        4       PR,CMK,N&L,BGA

Motifs in loop regions*
27   MRTVTN(ICL1)    1       PR
28   KLRN(ICL1)      2       BGA,SEC
29   LDR(ICL1)       1       PR
30   DRYLA(ICL2)             PR,CMK
31   RYL(ICL2)       3       PR,CMK,N&L
32   WPFG(ECL1)      1       PR
33   LCK(ECL1)       1       PR

@ The observed motifs are tabulated along with their distribution across the receptor types of human and Drosophila GPCR clusters.
+ Topologies of the observed motifs are given in brackets, and the number of receptor types in which each motif occurs is given in the Types column.
* Motifs corresponding to the classic DRY motif are shown in italics.
5.7 OCCURRENCE OF MOTIFS FOR SINGLE RECEPTOR TYPE

Multiple sequence alignments from the 32-cluster GPCR datasets were analyzed for the presence of motifs in the human-Drosophila and human-C. elegans GPCR cluster datasets (alignment files are available at
http://caps.ncbs.res.in/download/crossgenomeGPCRs/align.zip). A total of 33 motifs were identified in the human-Drosophila GPCR cluster dataset, and 76% of them are within TM helices, predominantly in TM2 and TM7 (Table 5.1). Interestingly, peptide receptors retain 20 motifs, nearly 64% of the identified motifs, whereas other receptor types such as chemokine receptors,
nucleotide and lipid receptors and biogenic amine receptors contain 52%, 18% and 36% of the motifs in the cross-genome cluster dataset, respectively. This could be due to the direct involvement of TM helices in ligand binding in the case of peptide receptors. The current study does not include the N and C termini of the sequences and focuses only on a selected set of sequences for the eight receptor types. Overall residue conservation was examined in the helices and loop regions of the human-only, human-Drosophila and human-C. elegans GPCR clusters. Significant conservation occurs at the TM3 region in the human-only and human-Drosophila GPCR clusters. The ranking of residue conservation in helices and loop regions for the human-only, human-Drosophila and human-C. elegans GPCR clusters is given in Figure 5.3 (a-i).

5.8 MOTIFS OBSERVED IN HUMAN-DROSOPHILA CROSS-GENOME CLUSTERS

5.8.1 Motifs Observed in Transmembrane Helices

Notably, the VGL motif in transmembrane helix 1 (TM1), the LGF motif in TM5 and the NSC motif in TM7 are observed exclusively in peptide receptors (Table 5.1). Similarly, the YLLNLA motif in TM2 and the HCC motif in TM7 are observed exclusively in chemokine receptors. The GNL motif in TM1, VMP motif in TM2, TASI motif in TM3, PFF motif in TM6 and WLGY motif in TM7 are identified solely in biogenic amine receptors. The conservation of these motifs can be correlated with cluster- or receptor-type-specific properties at the sequence level. Nine motifs are observed in two different types of receptors in the current study. The SLA motif in TM2 is observed both in peptide and
biogenic amine receptors. Interestingly, peptide and chemokine receptors share several conserved motifs, such as the LFL, TLP and LPF motifs in TM2, the AIA motif in TM3, the LPL motif in TM5 and the LYA motif in TM7; this reflects sequence conservation across two different receptor types and provides clues to common sequence properties between them (Table 5.1). Similarly, the IYL motif in TM2 and the CIS motif in TM3 are observed not only in chemokine receptors but also in nucleotide and lipid receptors. This emphasizes the utility of cross-genome clustering techniques and knowledge of receptor types for inferring the conservation of motifs across different receptor types at the cross-genome level. Motifs occurring in multiple receptor types are also tabulated (Table 5.1). The NLA motif in TM2 occurs in three different receptor types: peptide, chemokine, and nucleotide and lipid receptors. This motif has the maximum occurrence in our cluster dataset. The DLL motif is also observed in TM2 in a few clusters of peptide, chemokine, nucleotide and lipid, and biogenic amine receptors. The same motif is also observed as ADL in TM2 in a few clusters of all four of these receptor types (Table 5.1), and as ADLL in TM2 in the three receptor types other than peptide receptors. The CWLP motif in TM6 is identified in peptide, chemokine and biogenic amine receptors, but not in nucleotide and lipid receptors. In a broader sense, the significant conservation of motifs in TM2 shows that motifs are conserved not only in their amino acid residues but also in their topology.

5.8.2 Motifs Observed in Loop Regions

Eight different motifs were noted in the loop regions (Table 5.1). The well-known E/DRY motif in ICL2 has the
conservation as DRYLA in peptide (Cluster 3) and chemokine receptors (Cluster 12) and as RYL in nucleotide and lipid receptors (Cluster 15). The ASG motif in ICL1 is conserved exclusively in glutamate receptors, whereas the MRTVTN motif in ICL1 and the LDR motif in ICL2 are conserved exclusively in peptide receptors. Notably, the WPFG and LCK motifs were found exclusively in ECL2 of peptide receptors. Interestingly, the KLRN motif in ICL1 is observed in biogenic amine receptors (Cluster 21) and in secretin receptors (Cluster 26). Cluster 26 contains homologous sequences from Drosophila only, whereas Cluster 21 has GPCR sequences from both the human and Drosophila genomes, and common motifs such as GNL and ADLL are observed across the two taxa. This cluster is a good illustration of the need for cross-genome phylogenetic analysis at the sequence level, even at distant relationships and under strong evolutionary drift. Since the conservation of amino acids in ECL2 is crucial for ligand binding, the current study reports the presence of eight function-specific motifs in ECL2, distributed across PR, N&L, BGAR, GLU and FRZ/SMT receptors. However, several motifs were identified in only one of the 32 clusters (cluster/receptor-specific motifs). For example, the CLP motif from PR (Cluster 7) has AAS in the pattern [C/P][L/F][P/C/S]. In the current study, 133 cluster-specific motifs were observed in transmembrane helices and 62 in the loop regions. The average length of each TM helix and loop was calculated from the set of sequences based on the HMMTOP boundary predictions. The average percentage of residue conservation in each TM helix and loop region was examined for the eight receptor types.
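The per-region averages described above can be sketched as follows, assuming that per-column percentage conservation values and region boundaries in alignment coordinates are already available. This is an illustration under stated assumptions: the average is taken here as the mean of per-column conservation within a region, which is one possible reading of the text; the function name, the boundary dictionary and the toy numbers are hypothetical, and in the study the boundaries come from HMMTOP predictions.

```python
def region_conservation(conservation, regions):
    """Average per-column conservation (in %) within each topology region.

    conservation: list of percent conservation values, one per alignment column.
    regions: dict mapping region name -> (start, end), 1-based inclusive.
    """
    averages = {}
    for name, (start, end) in regions.items():
        if not 1 <= start <= end <= len(conservation):
            raise ValueError(f"region {name} is outside the alignment")
        values = conservation[start - 1:end]
        averages[name] = sum(values) / len(values)
    return averages


if __name__ == "__main__":
    # Toy values for a 10-column alignment; real input would be per-column
    # conservation from the MSA and boundaries from topology prediction.
    conservation = [20, 40, 60, 80, 100, 100, 50, 30, 10, 10]
    regions = {"TM1": (1, 4), "ICL1": (5, 6), "TM2": (7, 10)}
    for name, avg in region_conservation(conservation, regions).items():
        print(f"{name}: {avg:.1f}%")
```

Repeating this for every cluster of a receptor type and averaging the results gives one value per region and receptor type, which is the kind of summary plotted as a bar diagram.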
Interestingly, the maximum amino acid conservation occurs as 42% and 46% in TM2 and TM3, respectively. Significant conservation of 55%, 80% and 61% occurs in TM1, TM2 and TM3 within CMK receptors. Although the occurrence of motifs is high in PR, it retains only 30-50% conservation at TM2, TM6 and TM7. Generally, AA conservation is high at TM2 for BGAR, SEC, GLUR and FRZ type receptors. In most of the clusters, as expected, the percentage residue conservation in ICL2 is higher than in the other loop regions.

5.9 MOTIFS OBSERVED IN HUMAN-C. elegans GPCR CROSS-GENOME CLUSTERS
Since the selected human-C. elegans GPCRs possess remote homology, the motifs are limited and are documented at 30% conservation (owing to their evolutionary lineage). In total, 295 motifs were observed in the human and C. elegans GPCR clusters. Since the number of human and C. elegans GPCRs varies in each cluster, the pilot study can be studied with care for C. elegans-only GPCRs of the eight major receptor types. This study can also be elaborated further by adding closely related candidate GPCRs (from other nematode species) to the respective clusters, to improve the data size and to observe residue conservation at various percentages, so as to define sequence features for machine-learning methods such as support vector machines (SVMs).
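The documentation of motifs at a fixed conservation threshold (60% for the human-Drosophila clusters, 30% for the human-C. elegans clusters) can be illustrated with a minimal sketch. It is not the TM-MOTIF package itself: it counts only identical residues and ignores the permitted amino acid substitutions (AAS), and the function names `column_conservation` and `conserved_motifs`, as well as the toy alignment in the usage note, are hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Return (most frequent residue, fraction of sequences) for each column.

    Assumption: gaps ('-') count toward the number of sequences but are
    never reported as the conserved residue.
    """
    n_seqs = len(alignment)
    result = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        if counts:
            residue, count = counts.most_common(1)[0]
            result.append((residue, count / n_seqs))
        else:
            result.append(("-", 0.0))
    return result


def conserved_motifs(alignment, threshold=0.6, min_length=3):
    """Collect runs of at least `min_length` consecutive conserved columns.

    Each motif is returned as (start column, consensus string).
    """
    motifs = []
    run = []
    start = 0
    for index, (residue, fraction) in enumerate(column_conservation(alignment)):
        if fraction >= threshold:
            if not run:
                start = index
            run.append(residue)
        else:
            if len(run) >= min_length:
                motifs.append((start, "".join(run)))
            run = []
    if len(run) >= min_length:
        motifs.append((start, "".join(run)))
    return motifs
```

For a toy alignment such as `["MDRYLA", "GDRYVA", "ADRFLS", "TDRYLA", "A-RYLA"]`, calling `conserved_motifs(alignment)` reports one run starting at the second column. Lowering `threshold` to 0.3 mimics the relaxed setting used for remote homologues, at the cost of reporting weaker patterns.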
Figure 5.3
Percentage residue conservation in TM helices and loops in GPCR Clusters

Figure 5.3 (a-i) Bar diagram showing the percentage residue conservation in TM region, intracellular loop, extracellular loop of human GPCR clusters (shown in panels a, b, c); human-Drosophila GPCR clusters (shown in panels d, e, f); human-C. elegans GPCR clusters (shown in panels g, h, i ) respectively.
The reported motifs at intra- and inter-genomic levels (at 60% conservation) provide information about the optimal residue conservation and preliminary knowledge about the level of sequence conservation. To highlight the classical motifs observed at the cross-genome level, I provide a few examples from the cross-genome GPCR alignments (Figure 5.4.a-c).
5.10 CHARACTERISTIC MOTIFS FROM CROSS-GENOME GPCR CLUSTERS
5.10.1 Conserved D/ERY and NPXXY motifs in GPCR Clusters
As cited in the literature (Rovati et al 2007), the highly conserved characteristic E/DRY motif, located at the boundary between transmembrane domain (TM) III and intracellular loop (ICL) 2 of Family A GPCRs, plays a pivotal role in regulating GPCR conformational states. The importance of the DRY motif in connection with active MG4R in humans is well known (Yamano et al 2004). Notably, in the cross-genome GPCR alignments, the preservation of the characteristic DRY motif was observed in the current study (Figure 5.4.a), where predominantly tryptophan (W) is conserved in most of the clusters. In human GPCRs in particular, there is a high degree of conservation and the substituting AA mostly belong to the aromatic group (for example, in chemokine receptors in Clusters 12 and 13), but in Drosophila a weak conservation of tyrosine is observed when compared to aspartate and arginine (Figure 5.4.a) in peptide receptors. Arginine is conserved comparatively well, and the substitutions are polar uncharged ($) or positively charged (+) residues of the same kind (for example, in biogenic amine receptors in Cluster 24). The characteristic NPXXY motif in the C-terminal region of the GPCR sequences in the MSA could be recorded, for example, as the NPIIY and NPLIYA motifs in the peptide and chemokine type receptors. However, because of the 60% conservation threshold, some of the weaker conservations in other cluster types are not recorded.

5.10.2 Identified KLK/R and RLAR/K motif in Secretin Receptor
Another highly conserved motif, the KLR / RLAR motif, is seen within the third endoloop of the family B human secretin receptor (Figure 5.4.b).
Block deletion of KLRT and mutation of Lys323 (K323I) are known to reduce cAMP accumulation, and these mutations do not affect ligand interaction. Thus, the KLRT region at the N-terminus of the third intracellular loop, particularly Lys323, is important for G-protein coupling. For the RLAR motif, substitutions from Arg (R330) to Ala (342A), Glu (342E) or Ile (342I), as well as block deletion of the RLAR motif, were all found to be defective in both secretin binding and cAMP production. The KLK/R and RLAR/K pattern is conserved in two proteins, GLR and GLP1, which belong to the secretin family noted in Cluster 25 of our GPCR cluster dataset. Although some motifs are not recorded because of the strict 60% conservation threshold, this occurrence is highlighted for its biological significance and is given in Figure 5.4.b.

5.10.3 Conserved PMNYM / PMSYM motif in BGA Receptor
The PMNYM / PMSYM pattern is conserved in TM5 of GPCRs. TM5 has been suggested to be implicated in self-association and may be involved in the dimerization of the receptor A2aR (human adenosine receptor). In the adenosine A2b receptor, the asparagine (N) residue is replaced by serine (S), generating the motif PMSYM and thus differentiating the two receptor isoforms functionally. It is suggested that the PMNYM motif of A2aR and the PMSYM motif of A2bR may be involved in the TM assembly of the respective isoforms. This information may provide insight into the molecular mechanism of receptor-ligand interaction, leading to the design of tailored compounds. Notably, the consensus was not achieved at the 60% threshold, and the PMNYM pattern is not documented in the result file. However, careful observation of the alignment helps us to identify the important PMNYM/PMSYM pattern in GPCR cluster 23 (Figure 5.4.c).
Figure 5.4(a-c) Illustration of characteristic motifs (observed at 60% conservation)
Alignments showing conserved E/DRY, KLR/RLAR and PMNYM/PMSYM motifs in GPCR clusters (panels a, b and c, respectively).

5.11 SUMMARY
The current approach for identifying conserved motifs and substituting AA residues is effective in recognizing functionally important residues in the GPCR cluster dataset. Along with the well-known characteristic motifs (Figure 5.4.a-c), other preserved motif patterns in the MSA were also identified at 60-100% conservation. The reports display the residue conservation/identity, the permitted AAS (based on their respective physicochemical properties) at each position and the cluster-specific motifs. This approach can be applied to other membrane-bound receptors (such as olfactory receptors) and protein families to detect conserved motifs. It will be interesting to map the identified motifs onto the predicted topology in the MSA, which may be helpful for evolutionary studies at the cross-genome level. Owing to remote homology, key motifs may be missed in the generated MSA in some cases, especially in cross-genome GPCR alignments. The current study (based on the recognition of motifs derived from average AAS scores) is helpful in recognizing both classical and newer motifs, the latter of which have not hitherto been attributed any functional significance. The approach of analyzing sequence properties in a set of aligned sequences can also be applied to comparison with a reference sequence (of known 3D structure), to understand sequence similarity in the predicted topology and preserved motifs with AAS at each position. This method can be used as a guiding principle for 3-D modelling of GPCR sequences. Homology modelling, together with such motif analysis, could uncover additional spatial clusters or spatial motifs, which may be critical for function. This study can be further extended in future to comparative genome sequence analysis involving GPCRs from other genomes. The related supporting tables can be downloaded from the URL http://www.ncbi.nlm.nih.gov/pmc/articles/pmc3163927.
CHAPTER 6 GENOME WIDE SURVEY OF OLFACTORY RECEPTORS (ORs) IN SELECTED EUKARYOTIC GENOMES
6.1. PHYLOGENETIC STUDY ON SELECTED HUMAN ORS
6.1.1. Introduction
A number of earlier studies have emphasised the importance of ORs (Chess et al 1994), and tremendous efforts have been made to update the knowledge of ORs at multiple levels, such as creating data repositories (Crasto et al 2002) and understanding receptor specificity, the olfactory neural circuit, wiring specificity and the olfactory map at different developmental stages (Chou et al 2010). Several recent studies on odor recognition for intelligent systems such as the e-nose, machine olfaction and mobile robots (e.g. pippi), and their applications in the food industry and medical diagnosis, are highly appreciable. All such sophisticated analyses depend primarily on the initial sequence analysis of these receptors. Thus, a genome-wide survey of OR sequences from selected eukaryotic genomes will help to identify conserved evolutionary trends at intra- and inter-genomic levels and to explore their structural and functional significance.

6.1.2. Objectives and Scopes
The objective of my current study is to perform a genome-wide survey of ORs and a phylogenetic analysis on selected eukaryotic organisms such as yeast (S. cerevisiae), fly (D. melanogaster), worm (C. elegans), mouse (M. musculus), rat (R. norvegicus), dog (C. familiaris), human (H. sapiens) and a few non-human primates. The aims include retrieving OR sequences, predicting membrane topology, identifying conserved motifs and orthologs, creating non-redundant data repositories and analyzing phylogenetic clusters at intra- and inter-genomic levels (applicable to certain genomic combinations). The study helps to analyze sequence association as clusters in the phylogeny (Metpally and Sowdhamini 2005) and to identify conserved sequence features as cluster- or species-specific motifs by using the TM-MOTIF package (Chapter 4). The preliminary knowledge of sequence information obtained through the genome-wide survey, along with additional features such as generated 3D models and predicted dimer interfaces (in collaboration with another research group), has been integrated to construct a non-redundant data repository called DOR (Database of Olfactory Receptors) (refer Section 6.10).

6.1.3. Olfactory Receptors
The process of olfaction is mediated by olfactory receptors, which are G-protein coupled, seven-transmembrane-domain proteins located on the surface of the dendritic cilia of olfactory neurons. In this section (6.1), I discuss the availability, predicted membrane topology and phylogeny of selected human olfactory receptors in detail; ORs in other eukaryotes are discussed in the subsequent Sections 6.2-6.9.
6.1.4. OR: Membrane Topology
As mentioned in the Introduction chapter, several prediction methods are available online to predict the secondary structure of membrane proteins. These methods are mainly based on the hydrophobicity profile of the TM helices. For the current study, I am using the HMMTOP prediction server (Tusnady and Simon 2001) to predict the membrane topology. However, consensus analysis with more than one prediction method helps to improve the accuracy. Generally, ORs are predicted to have the N-out (N-terminal out) topology of canonical GPCRs in higher eukaryotes such as mouse, rat, human and C. elegans (Bargmann 2006, Sengupta et al 1996), whereas the reverse topology (i.e., N-in and C-out) has been observed in the Drosophila ORs (and in most insect ORs) (Bargmann 2006, Benton et al 2006) and is also referred to as reverse/inverted topology (please see Section 6.4 for details).

6.1.5. Prior Studies on ORs
Olfactory receptor genes are generally expressed in bipolar neurons, and the dendritic membrane terminates in filamentous processes that increase the surface area to capture diverse stimuli from the environment. In general, the morphology of the olfactory receptor cells is common to different taxa (vertebrates, insects and nematodes) (Ache and Young 2005); although the overall morphology is conserved, the cells tend to be adaptive. This adaptation happens in a habitat-dependent, not a species-dependent, manner (Stensmyr et al 2005), and this phenomenon is helpful in interpreting the trend of evolution in olfaction and the cluster associations of diverse taxa.
All organisms recognize a vast array of odorants by using ORs (members of the class A type GPCRs), which activate G-protein based cascades when various ligands bind to the receptors. Buck and Axel reported a diverse family of GPCRs in the rat epithelium and their participation in olfaction in 1991 (Buck and Axel 1991). Since then, studies identifying functional ORs in diverse genomes have emerged, and various molecular and bioinformatics approaches have identified a great number of ORs in vertebrates such as mammals, birds, fish and amphibians (Hayden et al 2010). Invertebrate species, however, have independently expanded chemosensory GPCRs to perform olfaction (Bargmann 2006, Robertson and Thomas 2006). Pioneering studies played a major role in identifying and documenting human ORs (Sharon et al 1998, Mombaerts et al 1996, Glusman et al 2001). In mammals (mouse, rat), around 1000 genes were identified and estimated to contribute to the olfactory receptor family (Crasto et al 2002, Mombaerts 1999), and it constituted only 3% of the whole human genome. Rouquier and coworkers (Rouquier et al 2000) identified around 72% of human olfactory receptors, and early research indicated the loss of genes during the evolution of the human olfactory gene family. This clearly shows the loss of receptor function through the transformation of functional genes into pseudogenes. This phenomenon was observed to be most common in human and prosimian primates, less common in lower primates and very rare in mouse or zebrafish. Two extreme examples, the absence of functional ORs in dolphin and the deterioration of vision in moles, can be used to understand how the requirements of a species shape its sensory acuity. It is generally observed that humans have reduced olfactory acuity when compared to rodents and non-human primates.
Earlier studies also reported nearly 350-370 full-length functional olfactory genes (Zozulya et al 2001, Glusman et al 2001) and more than 900 pseudogenes in the human genome. Human ORs are distributed predominantly on chromosome 11, which shows the central role of chromosome 11 in olfaction (Rouquier et al 2000, Crasto et al 2001). Besides this, ORs are also distributed on other chromosomes such as 1, 9, 6 and 14. Interestingly, chromosomes 10, 22 and X carry only one OR gene each in humans. Studies also report the occurrence of class I type (fish-like ORs) and class II type (mammalian-like ORs) receptors in the human OR family (Zozulya et al 2001). Human ORs form subclusters in the phylogenetic tree owing to events of evolutionary divergence and duplication. These events cause diversity at the inter-genomic level. Another interesting fact is that only one allele of an olfactory receptor gene is expressed in any given olfactory receptor neuron, and the mechanism underlying the exclusion of the other allele is still unclear (Chess et al 1994).

6.1.6. Methodology
This sub-section describes a step-wise procedure that includes data collection, prediction of membrane topology, alignment procedures and phylogenetic tree construction for uni-genome or cross-genome phylogeny (Figure 6.1). The resultant phylogeny can be further analyzed for cluster association, average percentage identity and cluster-specific motifs. This procedure is followed for the exercises in this chapter; additional specifications are mentioned where required.

6.1.6.1. Retrieval of OR sequences
Data repositories such as ORDB (Crasto et al 2002) and the NCBI protein resource (http://www.ncbi.nlm.nih.gov/protein) are the major data resources for retrieving OR sequences from selected eukaryotes. The genetic description (for example, Homo sapiens in this case) of each sequence was verified at this step, and genes referring to putative, hypothetical, partial and incomplete OR sequences were not considered. The collected sequences were first submitted to the CD-HIT server (Huang et al 2010), and sequences with >95% identity were removed to avoid redundancy. Since diverse nomenclature has been used for OR sequences, the collected OR sequences are referred to by their protein ID (NCBI record identifier) and are denoted by the gene symbol with the prefix HS to denote the organism Homo sapiens, and likewise XLOR for Xenopus laevis, FOR for fish ORs, CeOR for C. elegans ORs, MMOR for mouse ORs and so on. This labeling is convenient and helpful for legible phylogenetic displays. In addition, the chromosomal location of each receptor in the current dataset is added to its gene symbol; in particular, sequences from chromosomes 11 and 1 are marked with the suffixes _chr11 and _chr1, respectively.

6.1.6.2. Prediction of membrane topology: Human ORs
The collected non-redundant OR sequences were submitted to the HMMTOP server (Tusnady and Simon 2001) to predict membrane topology. A considerable number of mispredictions do occur, owing to false merging and false splitting of TM boundaries, which cause underprediction and overprediction of TM helices. However, a consensus approach to prediction could be useful for improving the accuracy of membrane topology predictions. The sequences predicted only for 72 TM helices were considered for the current study (applicable to other genomes also). By using HMMTOP, 87% of human OR sequences were predicted to have the N-out topology (i.e., the N-terminal region of the sequence lies outside the cell), and the selected 371 OR sequences were prepared with short descriptions and made ready for the alignment procedure.
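The curation and labeling rules of Section 6.1.6.1 can be sketched as follows. This is only an illustration under stated assumptions: the redundancy removal itself was done on the CD-HIT server and is not reproduced here, the exact way the prefix, gene symbol and chromosome suffix are joined is assumed, and the names `keep_record` and `make_label` are hypothetical.

```python
# Terms that exclude a record, taken from the curation rule in the text.
EXCLUDED_TERMS = ("putative", "hypothetical", "partial", "incomplete")


def keep_record(description):
    """Return True when a sequence description carries no excluded term."""
    lowered = description.lower()
    return not any(term in lowered for term in EXCLUDED_TERMS)


def make_label(prefix, gene_symbol, chromosome=None):
    """Build a display label such as HS<symbol>_chr11.

    Assumption: only chromosomes 1 and 11 receive a suffix, and the parts
    are concatenated without a separator.
    """
    label = f"{prefix}{gene_symbol}"
    if chromosome in ("1", "11"):
        label += f"_chr{chromosome}"
    return label
```

A record described as "putative olfactory receptor, partial" would be dropped by `keep_record`, while `make_label("HS", "OR51E2", "11")` produces a label carrying the `_chr11` suffix used in the phylogenetic displays.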
6.1.6.3. Alignment procedure
The MAFFT online alignment server (Katoh et al 2002) was used to align the OR sequences with the JTT 200 scoring matrix (Jones et al 1992) and a gap opening penalty of 1.53. The initial alignment was exported to the MEGA 5.05 software (Tamura et al 2011), and at this stage the alignment was carefully edited to refine its quality. This step is crucial while aligning ORs from different genomes, where remote homology can be a problem (see Section 6.4 on Drosophila ORs and nematode ORs). The final alignment session was saved in MEGA (file.mas) to construct the phylogeny.
Figure 6.1 Flow-chart for the sequence analysis on olfactory receptors

Note: (Figure 6.1) The flow-chart depicts the stepwise procedure involved in the phylogenetic analysis of selected olfactory receptors. It denotes data collection and curation, prediction of membrane topology, the alignment procedure and the construction of the phylogeny, using the tools CD-HIT, HMMTOP, MAFFT and MEGA 5.05, respectively.
6.1.6.4. Phylogeny on selected human olfactory receptors
The multiple sequence alignment (MSA) of selected human ORs was used to construct a phylogeny by the neighbor joining (NJ) method with 1000 bootstrap replicates. The generated tree session files were saved with the extension .mts, and the radial and rectangular displays were used for analyzing the tree topology.

6.1.6.5. Analysis of phylogeny
The constructed tree topology was analyzed for cluster association. Phylogenetically grouped OR sequences, where association is based on clades, were designated as clusters. Sequences with a significant bootstrap value (Bs) (i.e., more than 50) in the phylogeny were considered reliably associated, as in the case study of mouse OR classification (Zhang and Firestein 2002).
Figure 6.2 (A and B) Phylogenetic display of selected human olfactory receptors

Figure 6.2. A) Rectangular display of the human OR phylogeny showing the distribution of ORs from chromosome 1 (blue), chromosome 11 (pink) and other chromosomes (green). B) The observed 10 subclusters are denoted in aqua (HSC1), violet (HSC2), indigo (HSC3), blue (HSC4), green (HSC5), yellow (HSC6), orange (HSC7), red (HSC8), fuchsia (HSC9) and lime (HSC10) colors, respectively, in the tree topology along with the average percentage identity (kindly read anti-clockwise). Notably, HSC1 stays distinct in the tree topology, and all the ORs related to this subcluster (HSC1) are located on chromosome 11 (noted in black circles).
6.1.7. Results
The intra-genomic phylogeny of the selected 371 human ORs exhibits 10 different subclusters. Among them, the HSC1 cluster stays distinct and, interestingly, all sequences associated with this cluster originate from chromosome 11 (Figure 6.2A and B). The cluster associations were labeled HSC1 to HSC10 (the organism name followed by the cluster number).

6.1.7.1. Class I and II type receptors in human OR phylogeny
Based on prior literature (Zozulya et al 2001), the distinct HSC1 cluster was considered to be related to class I type receptors (also known as fish-like ORs, which sense water-borne odors) in Homo sapiens, and ORs dispersed in the other subclusters could be referred to as class II type receptors (also known as mammalian-like ORs, which sense air-borne odors). Attempts were made to confirm HSC1 as fish-like ORs by performing a cross-genome phylogenetic analysis of the established human ORs with a few selected OR sequences from various fish genomes (Section 6.2), and the results were as expected. The OR sequences (54 in number) observed in HSC1 showed an average sequence identity of 44% (Figure 6.2B). Among the 10 human OR subclusters, HSC2 showed the highest average sequence identity (54%). The HSC9 and HSC10 clusters also exhibit a reasonable sequence identity of 52%.

6.1.7.2. Sequence features of 10 human OR-subclusters
Among the collected 371 human olfactory receptors, 87% of sequences were predicted to have the N-out topology (predicted by the HMMTOP server).
However, for the current study, the sequences predicted for N-out (356) and N-in (45) topology were both taken into account for generating the phylogeny, but only sequences predicted for N-out topology with 72 predicted TM helices were considered for analyzing the average number of predicted residues in the helices and in the loop regions. The results showed that the number of residues predicted for the TM helices ranges from 19 to 23 amino acids; the average numbers of residues were 23, 21, 21, 23, 22, 22 and 19 for the TM1-TM7 helices; 12, 21 and 15 for the intracellular loops (ICL1-ICL3); and 17, 34 and 11 for the extracellular loops (ECL1-ECL3), respectively. Notably, TM7 has the lowest average number of residues, and among the predicted loop regions ICL2 and ECL2 are longer than the other loops. The long ECL2 could be due to its ligand binding properties, and the long ICL2 could be due to the occurrence of the conserved motif MAYDRYVAIC and its functional importance in structural stability. In general, the average sequence identity, average length and number of sequences observed in each cluster (HSC1-HSC10) vary and influence the cluster-specific properties (such as motifs); these are tabulated in Table 6.1.

6.1.7.3. Representative OR sequences
Among the 10 human OR subclusters, around 50 OR sequences were selected to represent each cluster, and at least three representative sequences with significant Bs values were selected from each clade. These representative sequences would be appropriate candidate ORs for modeling and for predicting secondary structural features. This may further help to connect the structural and functional properties at the sequence level. The average sequence identity of the selected representative sequences with their associated OR sequences ranged from 40 to 53%, which provides a significant level of confidence.

Table 6.1 Analysis on sequence features of 10 human OR subclusters

Cluster No   No. of sequences   Average alignment length   Average sequence identity
HSC1         54                 320                        44%
HSC2         40                 317                        54%
HSC3         61                 305                        50%
HSC4         43                 313                        53%
HSC5         35                 315                        49%
HSC6         9                  315                        44%
HSC7         33                 314                        46%
HSC8         24                 307                        49%
HSC9         34                 317                        52%
HSC10        38                 316                        52%

Note: Table of the observed number of sequences, average alignment length and average sequence identity of the 10 human OR subclusters.
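The average sequence identity reported for each subcluster in Table 6.1 can be illustrated with a short sketch that averages pairwise identities over an aligned cluster. The thesis does not state which identity definition was used, so the choice of denominator here (columns in which both sequences carry a residue) is an assumption, as are the function names `pairwise_identity` and `average_identity`.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b):
    """Percent identity over columns where both aligned sequences have a residue."""
    matches = 0
    compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are left out of the comparison
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_identity(aligned_cluster):
    """Mean pairwise percent identity over all sequence pairs in one cluster."""
    pairs = list(combinations(aligned_cluster, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
```

Applied to the aligned sequences of one subcluster (for example, the rows of the MAFFT alignment belonging to HSC1), `average_identity` returns a single percentage comparable in spirit to the last column of Table 6.1. A different denominator, such as the full alignment length, would give lower values.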
6.1.7.4. Motif analysis on human olfactory receptors
To identify the conserved motifs and substituting amino acids (AAS) in the observed 10 human OR subclusters, the respective MSA (aligned by MAFFT) of each cluster was used as an inbuilt dataset of the TM-MOTIF package (Chapter 4). The identified motifs (at the 60% level of conservation), along with the respective membrane topology, were documented by using the TM-MOTIF package. Overall, 162 motifs were identified from HSC1-HSC10. The residue conservation is documented not only for three consecutive conserved AA residues but also for additional residues conserved at the 60% level. Motifs observed in one particular cluster (cluster-specific motifs) and in more than one cluster, with their respective topology, are also reported (Table 6.2). Apart from the conserved characteristic motifs such as the MAYDRYVAIC motif between TM3 and ICL2 and the NPXXY motif in TM7, the PMY motif (Table 6.2) is observed in the TM1 and ICL1 topology and is the best example of sequence conservation retained in all the clusters. This PMY motif is evolutionarily important: because it occurs in all clusters from HSC1 to HSC10, it provides knowledge about the evolutionary trend from the aquatic (class I type) to the terrestrial habitat (class II type) (Freitag et al 1995).

Table 6.2 List of conserved motifs in 10 human OR subclusters (60% level of conservation)
6.1.7.5. SVM Analysis
For a preliminary analysis to predict putative olfactory receptors by using support vector machine (SVM) techniques, the collected 371 human olfactory receptors were kept as the positive dataset and GPCRs as the negative dataset. The 371 human OR sequences were used to train the SVM along with OR sequences from other genomes such as mouse (338 ORs), frog (15 ORs), fly (64 ORs), worm (odr-10 and homologues) and yeast (5 ORs). GPCRs from human (351 GPCRs), mouse (331 GPCRs), rat (283 GPCRs), worm (735 GPCRs) and fly (100 GPCRs) were used as the non-OR dataset. Here, sequence properties such as the predicted helices and loop regions and the physico-chemical properties of residues were highly helpful in defining features for the SVM. A pilot study was carried out with this dataset, and the features were set to identify putative olfactory receptors in humans (Kandaswamy et al 2010). The trained SVM is useful for surveying the human genome. A human proteome database containing 89822 protein sequences was downloaded from the IPI database (http://www.ebi.ac.uk/IPI/). The SVM predicted 592 proteins as having the sequence properties of ORs. Out of these 592 gene products, 449 proteins carried annotation details identifying them as ORs (data unpublished). The accuracy was 87.79% for the training set and 86.63% for the testing set, with a sensitivity of 85.00%, a specificity of 87.55% and an MCC of 0.7154. As a result, 58 sequences were predicted as putative ORs, with sequence identity to the true positives in the range of 60-90%. Among them, 45 sequences were verified against UniProt IDs, of which 33 have reviewed status and 12 have unreviewed status in the UniProt database.

6.2. CROSS-GENOME PHYLOGENY ON SELECTED ORS FROM HUMAN AND FISH GENOMES
6.2.1. Objective
As the phylogenetic analysis of selected human olfactory receptors showed HSC1 to be a distinct cluster, assumed to be related to class I type receptors, the next aim was to align the human olfactory receptors with selected olfactory receptors from various fish genomes and to observe the influence of fish ORs on the previously established human OR phylogeny. Thus, a cross-genome phylogenetic analysis with selected fish ORs and the 371 human ORs will help to identify fish-like ORs in the already established human OR phylogeny (Section 6.1).
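The performance measures quoted for the SVM in Section 6.1.7.5 (accuracy, sensitivity, specificity and the Matthews correlation coefficient, MCC) follow from the confusion matrix of a binary classifier. The sketch below uses the standard definitions; the confusion-matrix counts behind the reported values are not given in the text, so the numbers in the usage note are purely hypothetical, and the function name `classification_metrics` is an assumption.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    Assumes at least one true positive-class and one true negative-class
    example, so that sensitivity and specificity are defined.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)  # fraction of ORs recovered
    specificity = tn / (tn + fp)  # fraction of non-OR GPCRs rejected
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }
```

With hypothetical counts of 85 true positives, 15 false negatives, 90 true negatives and 10 false positives, `classification_metrics(85, 90, 10, 15)` gives an accuracy of 0.875, a sensitivity of 0.85 and a specificity of 0.90. MCC is useful alongside accuracy because it stays informative when the OR and non-OR datasets differ in size.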
6.2.2. Review of Literatures
The class I type receptors are generally associated with sensing water-borne odors (Ngai et al 1993, Freitag et al 1998). Fishes recognize water-soluble odorants, in keeping with their aquatic habitat (Friedrich and Korsching 1997, Kang and Caprio 1991), whereas terrestrial vertebrates possess class II type receptors to detect volatile compounds, apart from retaining a few of the class I type receptors (Firestein and Werblin 1989, Kashiwayanagi and Kurihara 1995, Tareilus et al 1995, Duchamp-Viret and Duchamp 1997, Bozza and Kauer 1998). This suggests that the sense of olfaction evolved from the lower chordates to the higher chordates in response to environmental requirements. Accordingly, class I type receptors may be specialized for the detection of water-soluble odorants, whereas class II type receptors recognize volatile compounds (Freitag et al 1998). The availability of repositories for GPCRs and ORs in genomes such as pufferfish (Tetraodon nigroviridis), zebrafish (Danio rerio) (Barth et al 1996), Lampetra fluviatilis (Freitag et al 1999) and frog (Ji et al 2009) also facilitates sequence comparison studies across genomes. In particular, it will be interesting to discriminate the class I and class II type receptors in the human OR phylogeny for further study.

6.2.3. Fish ORs
Twenty-five OR sequences were collected from the human olfactory receptor dataset and submitted to the online PSI-BLAST (http://blast.ncbi.nlm.nih.gov/) with default parameters against the fish genomes. The homologous sequences from diverse fish genomes such as Tetraodon nigroviridis, Danio rerio, Misgurnus anguillicaudatus, Ictalurus punctatus, Takifugu rubripes, Oncorhynchus tshawytscha, Osmerus mordax, Carassius auratus and Oncorhynchus nerka were collected from the first hit and organized in FASTA format. Among the collected 32 ORs, 31 OR sequences were predicted to have the N-out topology. In total, 403 OR sequences (371 from human and 32 from fishes) were aligned using the MAFFT alignment server with default parameters and used to construct a neighbor joining (NJ) tree with 1000 bootstrap replicates.

6.2.4. Results
The cross-genome phylogeny of human ORs and selected fish ORs showed coclusters (Figure 6.3A and B). Notably, the cocluster arrangements occurred only with the HSC1 cluster and not with the other human OR subclusters, which clearly suggests that the HSC1 cluster from the human OR phylogeny belongs to the class I type receptors (necessary to sense water-borne odors). This association indicates that olfaction in higher-order organisms originated in aquatic organisms sensing water-borne odors (class I type) and then evolved further to sense air-borne odors (class II type) in adapting to the terrestrial habitat (Figure 6.3A and B). Notably, among the 32 fish ORs, four (gi 83752816, gi 83752926, gi 83752750 and gi 13177509) are neighbor members in the HSC1 cluster of class I type receptors. The sequence identity between these fish ORs and human ORs, computed using the needleall program (Needleman and Wunsch 1970), varies from 15% to 35%, and the sequence similarity ranges from 26% to 52% (Table 6.3).
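The pairwise identities between fish and human ORs were obtained with the needleall program, which performs Needleman-Wunsch global alignment. The following is a minimal sketch of that technique, not a reimplementation of needleall: it uses a simple match/mismatch score with a linear gap penalty instead of a substitution matrix with affine gaps, and it takes identity over the full alignment length. The function names and scoring values are assumptions.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences with a linear gap penalty."""
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align the two residues
                score[i - 1][j] + gap,       # gap in seq_b
                score[i][j - 1] + gap,       # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    aligned_a, aligned_b = [], []
    i, j = len(seq_a), len(seq_b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                aligned_a.append(seq_a[i - 1])
                aligned_b.append(seq_b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_a.append(seq_a[i - 1])
            aligned_b.append("-")
            i -= 1
        else:
            aligned_a.append("-")
            aligned_b.append(seq_b[j - 1])
            j -= 1
    return "".join(reversed(aligned_a)), "".join(reversed(aligned_b))


def percent_identity(aligned_a, aligned_b):
    """Identical positions as a percentage of the alignment length."""
    if not aligned_a:
        return 0.0
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)
```

Aligning two full-length OR sequences with `needleman_wunsch` and passing the result to `percent_identity` gives a figure of the same kind as those summarized for the neighboring fish ORs, although the absolute values depend on the scoring scheme.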
Figure 6.3
Phylogeny of selected olfactory receptors in Homo sapiens and fish genomes

Note: The phylogenetic display of human olfactory receptors (panel A) shows the HSC1 clade (in aqua blue) as distinct. The cross-genome phylogeny (panel B) of selected human olfactory receptors with fish ORs shows significant coclustering. This clearly indicates the characteristic feature of HSC1 as class I type receptors that sense water-borne odors. Notably, fish ORs did not cocluster with any cluster other than HSC1.
6.2.5. Sequence conservation across fish and human ORs
So far, no convincing evidence for class-specific sequence motifs for fish-like (class I type) and mammalian-like (class II type) receptors has been obtained. However, a few characteristic motifs from the human ORs were examined for their conservation in the cross-genome alignment, particularly in the HSC1 cluster. Earlier studies report that the characteristic motif MAYDRYVAIC is present at TM3 and ICL2 and is common to human, mouse and zebrafish ORs (Zhang and Firestein 2002, Alioto and Ngai 2005). Among these residues, methionine (M), tyrosine (Y) and cysteine (C) are reported to be related to OR-specific functions.
Figure 6.4
Snapshot of Alignment window for the motif KAFSTC in human ORs and in few fish ORs at cross-genome alignment
Note: The conserved KAFSTC motif is observed in ICL3 of the cross-genome alignment of selected human and fish ORs. The fish ORs are denoted with the prefix FOR_ and human ORs with HS.
The KAFSTC motif is conserved in ICL3, extends into TM6 of human ORs, and is also observed in fishes, especially in zebrafish (Figure 6.4). However, the phenylalanine (F) and serine (S) residues are less common in zebrafish ORs. The lysine (K), alanine (A) and threonine (T) residues play a major role in OR function, and the downstream histidine (H) has been recommended for site-directed mutagenesis studies in earlier literature (Figure 6.4). To discriminate class I and class II type receptors among human ORs, a study has to be conducted with adequate OR sequences from an amphibian genome, which is expected to have both class I and class II type receptors (Freitag et al 1998). Performing cross-genome phylogeny of selected amphibian ORs with the already established human OR phylogeny will help to assign human OR clusters to class I and class II type receptors (see Section 6.3 for more details).
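A motif such as KAFSTC, which is conserved in some sequences and only partly conserved in others, can be located with a simple sliding-window scan that tolerates a few substitutions. The sketch below is a minimal illustration of that idea; the function name, the mismatch threshold and the two toy sequences are hypothetical and are not taken from the OR dataset.

```python
def find_motif(sequence, motif, max_mismatches=0):
    """Scan a sequence for a motif, allowing a limited number of substitutions.

    Returns a list of (start, mismatches) tuples, where start is the 0-based
    position of each window that differs from the motif in at most
    max_mismatches residues.
    """
    hits = []
    width = len(motif)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(1 for x, y in zip(window, motif) if x != y)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Invented fragments: one with the exact motif, one with F and S substituted.
exact_fragment = "LLGKAFSTCASHL"
variant_fragment = "LLGKALATCSSHL"

print(find_motif(exact_fragment, "KAFSTC"))                      # exact match only
print(find_motif(variant_fragment, "KAFSTC", max_mismatches=2))  # tolerant match
```

A scan of this kind works on ungapped sequences; for positions defined in a multiple alignment, the same comparison would be applied to the alignment columns instead.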
Table 6.3 Sequence identity of neighboring fish ORs and human class I type receptors observed in cross-genome OR phylogeny
S. No.  Fish ORs       Human ORs      Sequence identity  Sequence similarity
1       FOR_13177509   HS56A1_Chr11   15%                26%
2       FOR_83752926   HS56A1_Chr11   16%                26%
3       FOR_83752816   HS56A1_Chr11   17%                25%
4       FOR_83752750   HS56A1_Chr11   17%                27%
5       FOR_13177509   HS56A5_C1      30%                53%
6       FOR_13177509   HS52I2_C1      30%                47%
7       FOR_83752816   HS56A3_Chr11   31%                47%
8       FOR_13177509   HS56A3_Chr11   31%                54%
9       FOR_83752926   HS56A3_Chr11   31%                49%
10      FOR_83752816   HS56A5_C1      31%                50%
11      FOR_83752926   HS56A5_C1      31%                52%
12      FOR_83752750   HS56A5_C1      32%                52%
13      FOR_83752750   HS52I2_C1      32%                51%
14      FOR_83752816   HS52I2_C1      32%                49%
15      FOR_13177509   HS52I1_C1      32%                50%
16      FOR_83752926   HS52I2_C1      32%                49%
17      FOR_83752750   HS56A3_Chr11   33%                51%
18      FOR_83752750   HS56A3_Chr11   33%                51%
19      FOR_83752750   HS52I1_C1      34%                53%
20      FOR_83752816   HS52I1_C1      35%                53%
21      FOR_83752926   HS52I1_C1      35%                52%
6.3 CROSS-GENOME PHYLOGENY ON SELECTED ORS FROM HUMAN AND AMPHIBIAN GENOME
6.3.1 Objective
The cross-genome phylogenetic analysis of selected human olfactory receptors with selected fish ORs showed that HSC1 in the human OR phylogeny corresponds to class I type receptors (fish-like ORs that sense water-borne odors, Section 6.2). The current study therefore introduces a few ORs from the frog genome into the already established human OR phylogeny. Since amphibians have both receptor classes (class I and II), this study helps to discriminate class I type receptors (sensing water-borne odors) from class II type receptors (sensing air-borne odors) in the previously established human OR phylogeny.
6.3.2 Literature survey on class I and II type ORs
Xenopus laevis possesses gene repertoires for two distinct classes of olfactory receptors: class I is related to receptors of fish and the other class is similar to receptors of mammals (Freitag et al 1995). Amphibians provide "an unique opportunity to compare olfactory receptors of both classes in one animal species" (Mezler et al 2001). In frogs, fish-like receptor genes (class I type receptors) are exclusively expressed in the lateral diverticulum of the nasal cavity (Breer 2003), whereas mammalian-like receptors are expressed in the sensory neurons of the main diverticulum to sense air-borne (volatile) odors. Studies comparing the structural features of both receptor classes from various species revealed that they differ mainly in extracellular loop 3, which may contribute to ligand specificity (Freitag et al 1998). Earlier studies reported that OR sequences such as XB107, 239, 238 and 242 are class I receptors and XB178, 180, 177, 350, 352 and 154 are class II receptors in the frog genome (Mezler et al 2001).
6.3.3 Amphibian ORs
Twenty representative OR sequences from the human OR phylogeny were collected and submitted to online PSI-BLAST (http://blast.ncbi.nlm.nih.gov/) searches with default parameters to collect homologues from Xenopus laevis. The initial search yielded 28 homologues, which were reduced to 14 sequences after redundancy filtering (by description and membrane topology). Among them, three sequences were designated as class I type receptors and six as class II type receptors. Anomalous topology and a smaller number of TM helices were striking features of the frog OR dataset. Cross-genome phylogeny was performed with MEGA 5.0 by constructing a bootstrapped neighbor-joining (Nj) tree with 1000 replicates.
6.3.4 Results
Cross-genome phylogeny of human ORs with frog ORs showed remarkable coclustering in selected human OR subclusters (Figure 6.5). Frog ORs were distributed in three human OR subclusters in particular, namely HSC1, HSC2 and HSC4. In the cross-genome phylogeny, the three observed coclusters of human and frog ORs are labeled HXC1, HXC2 and HXC3 (Figure 6.5). Here, HX refers to Homo sapiens and Xenopus laevis; HXC1 is found to be related to class I type receptors, and HXC2 and HXC3 to class II type receptors, of both genomes.
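The bootstrap procedure behind such a tree resamples the columns of the alignment with replacement and rebuilds a tree from each resampled alignment; the support value of a clade is the share of replicates in which it reappears. The following Python sketch shows only the resampling step, under stated assumptions: the function names and the toy alignment are hypothetical, the tree-building step is omitted, and this is not how MEGA implements the procedure internally.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate).

    alignment maps a sequence name to its aligned string; all strings must
    have the same length. The same column indices are used for every row, so
    each resampled column is a whole column of the original alignment.
    """
    names = list(alignment)
    n_cols = len(alignment[names[0]])
    if any(len(alignment[name]) != n_cols for name in names):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(alignment[name][c] for c in columns) for name in names}


def bootstrap_replicates(alignment, n_replicates=1000, seed=0):
    """Generate n_replicates resampled alignments from a seeded generator."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment with invented names and residues.
toy = {"seqA": "MAYDR", "seqB": "MSFDR", "seqC": "M-YDK"}
replicates = bootstrap_replicates(toy, n_replicates=1000, seed=42)
print(len(replicates), replicates[0])
```

Each replicate would then be passed to a distance calculation and a neighbor-joining routine, and the resulting trees compared with the tree from the original alignment.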
Figure 6.5 Snapshot depicts the coclustering of fish ORs and frog ORs in human OR phylogeny (panel A: HSC1, class I type; panel B: HXC1, class I type, and HXC2, class II type)
Note: The phylogenetic display of human olfactory receptors (panel A) shows the HSC1 clade as distinct and coclustered with ORs from fishes (marked with an arrow, fish ORs in brown). The cross-genome phylogeny (panel B) of selected human olfactory receptors with frog ORs (in brown) exhibits coclustering at three human OR clusters. This indicates that HSC1 has the characteristics of class I type receptors, which sense water-borne odors (HXC1), while the other clusters (HXC2, HXC3) belong to class II type receptors, which sense air-borne odors.
Figure 6.6 Snapshot depicts the coclustering of fish ORs with class I type receptors of human ORs in HSC1 (given in A), also exhibiting the coclusters HXC1, HXC2 and HXC3 to indicate the class I and II type receptors from frog ORs with human ORs (given in B). Frog ORs for class I type receptors are marked at HXC1.
Note: The phylogenetic display of human olfactory receptors (panel A) shows the HSC1 clade as distinct and coclustered with ORs from fishes (marked with an arrow, fish ORs in brown). The cross-genome phylogeny (panel B) of selected human olfactory receptors with frog ORs (in brown) exhibits coclustering at three human OR clusters. This indicates that HSC1 has the characteristics of class I type receptors, which sense water-borne odors (HXC1), while the other clusters (HXC2, HXC3) belong to class II type receptors, which sense air-borne odors (Table 6.4 and 6.6).
6.3.5.1 Cocluster HXC1 - class I type receptors
Notably, a few sequences (gi 9650878, 7530156, 9650880, 1617229, 1617249, 1617227 and 1617231) cocluster with HSC1 (noted as HXC1 in the cross-genome phylogeny) (Figure 6.6). As shown in the previous analysis, HSC1 was identified as containing fish-like ORs in the human OR phylogeny (see Section 6.1). Interestingly, the frog ORs that coclustered with this cluster also belong to the Family A G protein-coupled receptor-like fold and are designated as olfactory receptor class I (Xenopus laevis) in the SCOP definition (URL: http://supfam2.cs.bris.ac.uk/SUPERFAMILY/cgi-in/genome.cgi?model=0037432;cgi_xl=yes;sf=81321). This further suggests that HSC1 retains class I type receptors to sense water-borne odors in human. The snapshot shows the cocluster of human and frog ORs, i.e., HXC1 (Figure 6.6). As mentioned previously, the seven frog ORs annotated as class I type receptors cocluster with the human class I type ORs HS52l1, 52l2, 56A3, 56A1, 56A5, 56B4, 56B1, 51A2, 51A4, 51A7, 51S1, 51G2, 51G1, 51L1, 51M1 and 51V1 (referred to as HSC1 in Section 6.2, sensing water-borne odors). The pairwise sequence identity between selected frog ORs and human ORs in HXC1 ranges from 18% to 35% and the sequence similarity from 32% to 57% (Table 6.3).
6.3.5.2 Cocluster HXC2 - class II type receptors
In the other cocluster, HXC2 (Figure 6.6), the established sequence association refers to class II type receptors in both the human and frog genomes. Sequences belonging to the human subcluster HSC2, including human OR sequences such as HS2D2 and HS10AD1, cocluster with frog ORs such as gi 9650890, 9650886, 9650888, 9650884 and 9650892, which are annotated as class II type receptors. Notably, the observed sequence identity between frog ORs and associated human ORs in HXC2 ranges from 33% to 43% and the similarity from 33% to 60% (Table 6.5).
6.3.5.3 Cocluster HXC3 - class II type receptors
The OR sequences labeled as olfactory receptor and class II type receptor (gi 1617247 and gi 9650882) from the frog genome coclustered with a human OR subcluster; this cocluster is named HXC3 (Figure 6.6). The distribution of frog ORs in the human OR phylogeny denotes the distribution of class II type receptors in the clusters HSC2-HSC9, but not in HSC1. Thus, ORs in HSC1 are referred to as class I type receptors and stay distinct from the other subclusters. Generally, human ORs are abundantly located on chromosome 11. With the introduction of frog ORs, considerable cluster rearrangements were observed in the human OR subclusters. Cross-genome phylogenetic studies were helpful in identifying such cluster-specific features at the cross-genome level, especially in discriminating class I and class II type receptors in the human OR phylogeny. Together, the cross-genome phylogenies of human-fish ORs and human-frog ORs help to discriminate class I and II type receptors in the human OR phylogeny (Figure 6.6).
Table 6.4 Sequence identity of neighboring frog ORs and human class I type receptors observed in cross-genome OR phylogeny
S. No: 1 to 49
Frog ORs (class I): XLOR_1617231 XLOR_9650878 XLOR_7530156 XLOR_9650880 XLOR_1617229 XLOR_1617249 XLOR_1617227 XLOR_1617231 XLOR_9650878 XLOR_7530156 XLOR_9650880 XLOR_1617229 XLOR_1617249 XLOR_1617227 XLOR_1617231 XLOR_9650878 XLOR_7530156 XLOR_9650880 XLOR_1617229 XLOR_1617249 XLOR_1617227 XLOR_1617231 XLOR_9650878 XLOR_7530156 XLOR_9650880 XLOR_1617229 XLOR_1617249 XLOR_1617227 XLOR_1617231 XLOR_9650878 XLOR_7530156 XLOR_9650880 XLOR_1617229 XLOR_1617249 XLOR_1617227 XLOR_1617231 XLOR_9650878 XLOR_7530156 XLOR_9650880 XLOR_1617229 XLOR_1617249 XLOR_1617227 XLOR_1617231 XLOR_9650878 XLOR_7530156 XLOR_9650880 XLOR_1617229 XLOR_1617249 XLOR_1617227
Human ORs (class I): HS52L1_Chr11 HS52L1_Chr11 HS52L1_Chr11 HS52L1_Chr11 HS52L1_Chr11 HS52L1_Chr11 HS52L1_Chr11 HS52I2_Chr11 HS52I2_Chr11 HS52I2_Chr11 HS52I2_Chr11 HS52I2_Chr11 HS52I2_Chr11 HS52I2_Chr11 HS56A3_Chr11 HS56A3_Chr11 HS56A3_Chr11 HS56A3_Chr11 HS56A3_Chr11 HS56A3_Chr11 HS56A3_Chr11 HS56A1_Chr11 HS56A1_Chr11 HS56A1_Chr11 HS56A1_Chr11 HS56A1_Chr11 HS56A1_Chr11 HS56A1_Chr11 HS56A5_Chr11 HS56A5_Chr11 HS56A5_Chr11 HS56A5_Chr11 HS56A5_Chr11 HS56A5_Chr11 HS56A5_Chr11 HS56B4_Chr11 HS56B4_Chr11 HS56B4_Chr11 HS56B4_Chr11 HS56B4_Chr11 HS56B4_Chr11 HS56B4_Chr11 HS56B1_Chr11 HS56B1_Chr11 HS56B1_Chr11 HS56B1_Chr11 HS56B1_Chr11 HS56B1_Chr11 HS56B1_Chr11
Sequence identity: 18% 18% 18% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 19% 20% 20% 20% 20% 20% 20% 20% 20% 22% 23% 23% 24% 24% 26% 26% 26% 29% 29% 30% 30% 30% 31% 32% 33% 34% 34% 34% 34% 34% 34% 34% 34% 35% 35%
Sequence similarity: 32% 49% 49% 51% 35% 33% 33% 32% 50% 50% 50% 35% 32% 34% 31% 52% 52% 52% 36% 35% 32% 32% 54% 53% 54% 37% 33% 32% 36% 55% 55% 57% 38% 36% 36% 32% 50% 50% 53% 36% 30% 30% 33% 52% 52% 54% 36% 33% 34%
Note: ORs from Xenopus laevis are labeled with XLOR as prefix instead of gi, and human ORs are given by common name with HS as prefix.
Table 6.5 Sequence identity of neighboring frog ORs and human class II type receptors observed in cross-genome OR phylogeny (referred as HXC2)
S. No  Human ORs (class II)  Frog ORs (class II)  Sequence identity  Sequence similarity
1      HS10AD1               XLOR_9650890         33%                33%
2      HS10AD1               XLOR_9650890         33%                50%
3      HS2D2                 XLOR_9650890         37%                54%
4      HS2D2                 XLOR_9650890         37%                54%
5      HS10AD1               XLOR_9650884         37%                37%
6      HS10AD1               XLOR_9650886         38%                55%
7      HS10AD1               XLOR_9650888         38%                55%
8      HS2D2                 XLOR_9650888         39%                57%
9      HS2D2                 XLOR_9650884         41%                60%
10     HS2D2                 XLOR_9650886         43%                60%

Table 6.6 Sequence identity of neighboring frog ORs and human class II type receptors observed in cross-genome OR phylogeny (referred as HXC3)
S. No.  Frog ORs (class II)  Human ORs (class II)  Sequence identity  Sequence similarity
1       XLOR_1617247         HS11L1_Chr1           26%                40%
2       XLOR_1617247         HS6Q1_Chr11           28%                42%
3       XLOR_9650882         HS6Q1_Chr11           36%                55%
4       XLOR_9650882         HS11L1_Chr1           41%                58%
6.4 PHYLOGENETIC ANALYSIS ON DROSOPHILA OLFACTORY RECEPTORS
6.4.1 Background
As mentioned in Chapter 1, insect olfaction is one of the most fascinating areas, particularly that of Drosophila chemosensory receptors. Various earlier studies showed the importance of understanding insect olfaction (Siddiqi 1990, Clyne et al 1999). Drosophila is a favorite model organism, and the availability of its complete genome and of databases for Drosophila olfactory receptors (Crosby et al 2007) motivated a comparison of Drosophila ORs with olfactory receptors from various other eukaryotic genomes, to understand the evolution of olfaction and to examine the conserved features.
6.4.2 Drosophila ORs
The procedure described in 6.1.1 (Methodology) was followed for the current study with Drosophila ORs. Since many of the ORs are referred to by several gene synonyms, care was taken in designating OR sequences. For example, several gene synonyms such as 22A.1, AN11, CG12193, Dmel Or22a, Dmel22a, DmelCG12193, DOR22a, DOR22A.1, OR22a and Or22A.1 are available to represent the candidate OR 22a, which is referred to as DOR 22a in the current study. In the same way, the other OR sequences were also labeled with DOR as prefix. The sixty collected sequences were assessed for membrane topology and, notably, 90% of the OR sequences were predicted to have the N-in topology with 7 ± 2 TM helices.
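The relabeling of gene synonyms to a single DOR label can be sketched as a lookup table keyed on a normalized form of each synonym. The Python sketch below is a hypothetical illustration of that bookkeeping, not the procedure actually used in the study: the helper names and the normalization rule (lower-casing and removing whitespace) are assumptions, and only the OR 22a synonyms quoted above are included.

```python
def normalize(name):
    """Lower-case a name and remove whitespace so spelling variants compare equal."""
    return "".join(name.split()).lower()


def build_lookup(label_to_synonyms):
    """Map every normalized synonym (and the label itself) to its study label.

    Raises ValueError if one synonym would point to two different labels,
    which would make the relabeling ambiguous.
    """
    lookup = {}
    for label, synonyms in label_to_synonyms.items():
        for synonym in list(synonyms) + [label]:
            key = normalize(synonym)
            if lookup.get(key, label) != label:
                raise ValueError(f"synonym {synonym!r} maps to two labels")
            lookup[key] = label
    return lookup


def resolve(name, lookup):
    """Return the study label for a name, or None if the name is unknown."""
    return lookup.get(normalize(name))


# Synonyms for OR 22a as listed in the text.
SYNONYMS = {
    "DOR 22a": ["22A.1", "AN11", "CG12193", "Dmel Or22a", "Dmel22a",
                "DmelCG12193", "DOR22a", "DOR22A.1", "OR22a", "Or22A.1"],
}

lookup = build_lookup(SYNONYMS)
print(resolve("CG12193", lookup), resolve("dmel or22a", lookup))
```

Returning None for unknown names, rather than guessing, keeps unmatched identifiers visible so that they can be curated by hand.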
Sequences were submitted to the MAFFT alignment procedure with the default parameters (JTT 200 scoring matrix and gap opening penalty of 1.53). Inevitably, some inserts were retained in the alignment due to the presence of sequences such as OR 83b, DOR 104, 67b and 45a.
6.4.3 Results on Drosophila OR Phylogeny Analysis
6.4.3.1 Cluster association: 10 subclusters
The phylogeny generated for 60 diverse Drosophila olfactory receptors showed 10 subclusters (Figure 6.7 and 6.8), and the observed cluster association indicates specific sequence properties among the different clusters. The observed tree topology resembles that of earlier studies (Warr et al 2001), but the current study differs in the alignment procedure and omits the gustatory receptors (GRs) from the phylogeny. In previous work (Robertson et al 2003), the long extracellular loop 2 between TM4 and TM5 of 83a, 83b and 85e was edited; in the current analysis, these long loops were not excised. Thus, the obtained tree topology shows 10 different subclusters. Notably, the 24 known antennal receptors (Dobritsa et al 2003) were distributed predominantly in eight subclusters, all except DmC4 and DmC8 (Figure 6.7 and 6.8). The pheromone-like receptors DOR47b and 65a were observed in DmC5, and both are antennal receptors. However, considerable distance was observed from the other pheromone-like receptor, DOR 88a, which was placed in the neighboring DmC6.
Figure 6.7 Phylogeny of Drosophila olfactory receptors (circular tree with bootstrap support values; scale bar 0.2)
Note: The observed tree topology of Drosophila ORs is denoted in blue and purple colour for the alternate clusters to differentiate cluster association (kindly read the tree topology in clockwise direction). As mentioned in the classical publication (Robertson et al 2003), the closely related receptors like OR 22a,b and 59b-c in DmC1, OR 33a-c in DmC2, OR 65a-c and OR 94a-b in DmC5, and OR 85b-d in DmC6 are observed close together in the tree topology, nearly the same as reported by the previous group. These patterns illustrate the highly conserved sequence association at family level in spite of general sequence diversity (Figure 6.8). The current study clearly reports the diversity of Drosophila ORs, and the associations were grouped as 10 DOR subclusters.
The current approach differs from earlier studies in employing the FFT-based alignment procedure (MAFFT), the JTT matrix and the Nj method, and in not providing any outgroup(s) to structure the tree. The resulting phylogeny may differ from previous results in fine detail, but shows 10 subclusters, labeled Dm (for Drosophila melanogaster) followed by the cluster number (DmC1-DmC10). Notably, the antennal receptors Or22a and 35a from DmC1, Or85b from DmC6 and Or35a from DmC9 are all pentyl acetate-sensitive receptors (Hallem et al 2004). This cluster association illustrates the diversity of Drosophila ORs: although these receptors show the same functional properties and cellular localization, they are distributed in different clusters in the tree topology. This may reflect receptor specificity for ligands that differ in shape and size but share similar chemistry, re-emphasizing that olfactory function has evolved separately several times within the superfamily of proteins (Robertson et al 2003). In this study, OR83b could be observed together with adequate OR sequences; it forms a cluster association with other ORs such as 83a, 104, 63a and 67b, which can be examined later for common motifs and ion-channel properties. This particular association lacks high bootstrap support values, and the alignment procedure also plays a major role in placing OR83b in the tree topology. Earlier studies (Dunipace et al 2001) suggest that OR83b is closer to GRs in phylogeny and not so closely related to ORs. OR83b is observed in DmC8, and notably the functional antennal receptors were not present in this cluster. In general, the subclusters observed in the current study show the diverse features at inter-genomic level needed to discriminate diverse odors as single or mixed odors.
A trial phylogenetic study was performed with 60 selected GRs and ORs, and the resulting tree showed clear and distinct clusters of ORs and GRs. No coclustering was observed between these two types of chemosensory receptors, which in turn suggests their independent evolution.
Figure 6.8 Observed 10 subclusters of Drosophila olfactory receptors (clusters labeled DmC1 to DmC10)

Note: The observed 10 subclusters of Drosophila olfactory receptors are labeled DmC1 to DmC10 in clockwise direction; cluster association is indicated by green filled circles, and the antennal receptors in particular are given in fuchsia.
The observed average sequence identity (computed with the Alistat program, Eddy S, 2005) among the 60 olfactory receptors was only 18%. The most related pairs, such as DOR 64 and 23a, showed 100% identity, and isoforms such as 22a and 22b showed the next highest identity of 77%.
6.4.4 SUMMARY
The Nj phylogeny generated for 60 selected Drosophila ORs exhibited 10 OR subclusters, referred to as DmC1 to DmC10. Notably, the 24 known antennal receptors (Dobritsa et al 2003) were distributed in eight subclusters, all except DmC4 and DmC8. The cluster containing OR83b and the associated OR sequences 83a, 104, 63a and 67b (DmC8) can be further examined for common motifs, predicted secondary structure and ion-channel properties. The pheromone-like receptors DOR47b and 65a were observed in DmC5 and both are antennal receptors. This illustrates the relevance of localization to functional expression. The observed average sequence identity among the 60 olfactory receptors was only 18%.
6.5 CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED ORS FROM DROSOPHILA, YEAST AND HOMO SAPIENS
The objective of the current study is to perform a cross-genome phylogeny on selected ORs from Drosophila, S. cerevisiae and Homo sapiens.
6.5.1 Background
The olfactory system of Drosophila is simple, whereas mammalian olfactory systems are considerably more complex in sensing diverse air-borne odors. As reported in earlier studies, insect olfactory sensory neurons (OSNs) and mammalian OSNs are anatomically similar, but insect OSNs differ in being housed in sensilla on the antenna and maxillary palp (Stocker 1994).
6.5.2 Insect ORs and mammalian ORs (evolutionarily unrelated)
Insect ORs are seven-transmembrane proteins and are evolutionarily distinct from mammalian ORs. Drosophila ORs retain a reverse topology (Benton et al 2006; Wistrand et al 2006). They show reasonable sequence similarity and orthology with ORs of other insect species such as Anopheles gambiae, Heliothis virescens and other endopterygota (Carey et al 2010). However, a single member of the insect OR family, called OR83b, is strongly conserved across insect genomes (Krieger et al 2003; Pitts et al 2004; Jones et al 2005). OR83b does not interact directly with odors but functions as a chaperoning co-receptor that forms heteromeric complexes with ligand-binding ORs (Larsson et al 2004; Nakagawa et al 2005; Neuhaus et al 2005; Benton et al 2006). Although mammalian and insect OSNs are anatomically similar, the number of olfactory receptors in insects is smaller than in mammalian genomes. Moreover, insect ORs are evolutionarily unrelated to vertebrate ORs.
6.5.3 Membrane proteins in Yeast
Six OR-like membrane proteins (NP_012743.1, NP_014078.1, NP_014081.1, NP_014094.1, NP_014105.1, NP_116627.1) were collected and used for the current study.
6.5.4 Results
The collected 60 Drosophila ORs, 371 human ORs and 6 candidate receptors from yeast were aligned and examined for cluster associations at the cross-genome level. No coclustering was observed between the insect ORs and the mammalian ORs. Notably, Drosophila ORs stay very distinct and distant from the human ORs. However, the Drosophila ORs form a considerable cocluster arrangement with the candidate receptors from yeast. This may be because insect ORs are evolutionarily distant from human ORs but relatively closer to the fungal sequences. Other possible reasons include the independent evolution of fly ORs and the lifestyle of fruit flies in sensing specific odors, whereas mammals established a complex olfactory system to sense both air-borne and water-borne odors. The reverse topology observed in the fly genome could be another strong reason for the lack of coclustering with human ORs.
6.5.5 Summary
There is no significant coclustering observed between selected ORs of the human and Drosophila genomes.
Figure 6.9
Cross-genome phylogeny on selected ORs from human, Drosophila and yeast

Note: The selected human (pink) and Drosophila ORs (blue) do not show any significant coclustering, but the Drosophila ORs (blue) show considerable coclustering with the yeast ORs.
6.6 CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED OLFACTORY RECEPTORS FROM HUMAN AND C. elegans GENOMES
C. elegans OR physiology: a special occurrence (Cori Bargmann)
Cori Bargmann and associates have proposed a genetic approach to investigate odor response in C. elegans, a nematode which possesses 14 types of chemosensory neurons for sensing various odors. They reported that, among important olfactory candidate genes, more than 40 highly divergent receptors have been found. These do not show sequence homology, but exhibit structural homology, with vertebrate OR proteins. Eleven of them are expressed in small subsets of chemosensory neurons, and a single neuron can express up to 4 different OR genes. A receptor gene called odr-10 is expressed in one of the sensory neurons and encodes a potential odorant receptor. These genetic studies provide, for the first time, an in vivo model for the specific interaction between a receptor of the seven-transmembrane protein family and an odor ligand. In the current study, odr-10 is the only olfactory receptor sequence reported in C. elegans among the collected ORs. A cross-genome phylogenetic analysis of the collected homologues of odr-10 together with selected representative human OR sequences (Section 6.1) might help to provide further annotation. The intention of the study is therefore to find out whether any coclustering is observed in the cross-genome phylogeny of selected human ORs with homologues of the olfactory receptor of C. elegans.
6.6.1 Odr-10 and homologs
As discussed, odr-10 is the only sequence in the nematode genome annotated as an olfactory receptor. To collect homologous sequences, odr-10 was used as the query in a BLAST search with default parameters against the database of 1016 previously collected C. elegans membrane proteins (GPCRs) from the SEVENS database. Hits with a significant E-value were considered for the current study: 82 homologues of odr-10 were collected, among them seven hypothetical proteins. Odr-10 was predicted to retain seven transmembrane helices and the N-out topology (Colbert HA and Bargmann 1997). Among the collected homologues, 78 sequences were predicted to have the N-out topology. Separately, 10 representative OR sequences (from HSC1-HSC10) were selected from the previously established human OR phylogeny (Section 6.1), and these were used together with the collected homologues of odr-10 to generate the cross-genome alignment.
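Selecting hits by E-value can be sketched as a small filter over BLAST tabular output (the standard 12-column format with columns qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The example below is a minimal sketch under stated assumptions: the function name, the cutoff of 1e-5 and the subject identifiers in the usage example are hypothetical, since the text does not state the threshold that was applied.

```python
def significant_hits(blast_tabular_text, max_evalue=1e-5):
    """Return {subject_id: best E-value} for hits at or below max_evalue.

    Expects tab-separated lines in the standard 12-column tabular layout,
    where column 11 is the E-value. Comment lines, short lines and the
    query's hit to itself are skipped.
    """
    hits = {}
    for line in blast_tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 12:
            continue
        query_id, subject_id = fields[0], fields[1]
        evalue = float(fields[10])
        if subject_id == query_id:
            continue  # ignore the self-hit of the query
        if evalue <= max_evalue:
            best = hits.get(subject_id)
            if best is None or evalue < best:
                hits[subject_id] = evalue
    return hits


# Invented rows: hitA passes (twice), hitB fails the cutoff, and the self-hit is dropped.
rows = [
    ["odr-10", "hitA", "33.5", "340", "0", "0", "1", "340", "1", "339", "2e-30", "120"],
    ["odr-10", "hitB", "20.0", "300", "0", "0", "1", "300", "1", "300", "0.5", "30"],
    ["odr-10", "odr-10", "100.0", "339", "0", "0", "1", "339", "1", "339", "0.0", "700"],
    ["odr-10", "hitA", "30.0", "100", "0", "0", "1", "100", "1", "100", "1e-10", "60"],
]
example_text = "\n".join("\t".join(row) for row in rows)
print(significant_hits(example_text))
```

Keeping the best E-value per subject avoids counting the same homologue twice when BLAST reports several local alignments against it.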
Figure 6.10 Observed cluster association in the cross-genome phylogeny of selected ORs from human and C. elegans genomes
Note: The Nj phylogeny shows the cluster arrangement of serpentine receptors in C. elegans in the clusters CeC1 to CeC6. The representative human OR sequences do not cocluster; they stay distinct and are noted as Hum_C7 (read anti-clockwise). The nematode olfactory receptor odr-10 is highlighted with a star symbol at CeC3.
6.6.2 Results and Discussion
The obtained cross-genome phylogeny of selected human ORs, the C. elegans OR and its related homologues exhibits seven distinct clusters in the tree topology (Figure 6.10). The earlier analysis (Chapter 2) showed that human and C. elegans GPCRs are separated by a long evolutionary lineage, so that no significant coclustering was observed in the cross-genome phylogeny; in the same way, no coclustering is observed between these taxa for olfactory receptors. Notably, all the representative human OR sequences clustered together as a separate clade (denoted as HumC1), showing a strong species-specific trend (Figure 6.10). Interestingly, among the collected homologues, 58 ORs are from the Str superfamily. The annotated olfactory receptor odr-10 belongs to this largest Str superfamily and, notably, in the CeC3 cluster arrangement odr-10 is associated with candidate receptors purely from the Str superfamily. Odr-10 is most closely associated with a particular str-type receptor, NP_505861.3 (Str-115), with 33.5% sequence identity. CeC3 retains 23 Str-type receptors that share about 24% sequence identity. The most related pairs based on sequence identity were identified, and in particular ten str candidate receptors associated with odr-10 are reported with their sequence identities and similarities (Table 6.7 and 6.8). The remaining 35 candidate receptors from the Str family form a separate cluster, denoted CeC1. These associations reflect the sequence diversity existing even at the family level, although the sequences belong to the same superfamily. CeC2 contains six candidate receptors: five belong to the same family, i.e., the Sru family of the SRG superfamily, and the sixth is an unannotated GPCR (NP_496399.1), which could be further explored for its functional relevance. Two candidate receptors from the srsx family, a hypothetical protein (NP_494099.1), a candidate GPCR (fol-3) and a srt-type receptor cocluster in CeC4, representing the diverse sequence properties of this cluster (Figure 6.10). Five srab candidate receptors from the SRA superfamily are associated in CeC5; the average identity for this cluster is 45%, which denotes its sequence specificity. Notably, CeC6 is associated with hypothetical proteins and the typical GPCR gar-3.
Table 6.7 Significant cluster association for str type receptors in CeC3; sequence pairs with high and low identity are given
S. No 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 High Length identity to 379 346 349 337 340 351 340 339 334 341 336 337 351 333 359 353 340 339 686 346 334 325 20.6 28 29.8 29.9 32.1 32.1 32.9 33.5 33.5 33.6 33.6 36.6 39.9 39.9 46.7 46.7 56.3 56.3 57.8 57.8 86.2 86.2 Low identity to NP_505321.1(str-85) NP_507067.2(str-254) NP_503223.1(str-256) NP_505861.3(str-115) NP_503316.1(str-20) NP_507162.2(str-15) NP_503666.1(str-119) NP_505861.3(str-115) NP_509157.1(odr-10) NP_503493.1(str-160) NP_001023592.1(strNP_503493.1(str-160) NP_503666.1(str-119) NP_509720.1(str-74) NP_506742.2(str-45) NP_507067.2(str-254) NP_503223.1(str-256) NP_505321.1(str-85) NP_506821.1(str-88) NP_505322.1(str-87) NP_506177.2(str-181) NP_507193.1(str-151) NP_507192.3(str-149) 169) NP_507192.3(str-149) NP_506742.2(str-45) NP_509720.1(str-74) NP_503223.1(str-256) NP_507067.2(str-254) NP_506821.1(str-88) NP_505321.1(str-85) NP_506177.2(str-181) NP_505322.1(str-87) NP_507192.3(str-149) NP_507193.1(str-151) 17.6 14.7 16.2 14.7 16.7 16.6 17.4 17.7 18.5 17.1 16.2 16.9 NP_507048.2(srj-29) NP_506742.2(str-45) NP_507193.1(str-151) NP_503666.1(str-119) NP_507048.2(srj-29) NP_503666.1(str-119) NP_506742.2(str-45) NP_509720.1(str-74) NP_507048.2(srj-29) NP_507048.2(srj-29) NP_509720.1(str-74) NP_506742.2(str-45) 15.3 15.3 17.7 17.8 16.5 17.9 15.9 16.5 17.4 18.8 NP_507018.1(str-233) NP_507048.2(srj-29) NP_507193.1(str-151) NP_507048.2(srj-29) NP_507048.2(srj-29) NP_507048.2(srj-29) NP_507048.2(srj-29) NP_507048.2(srj-29) NP_507048.2(srj-29) NP_507048.2(srj-29)
Protein identifier NP_507048.2(srj-29) NP_507018.1(str-233) NP_506518.2(str-230) NP_507068.2(str-97) NP_507162.2(str-15) NP_503316.1(str-20) NP_500472.1(str-122) NP_509157.1(odr-10) NP_505861.3(str-115) NP_001023592.1 (str-169)
Protein identifier
Protein identifier
Table 6.8 Sequence identity and similarity between odr-10 and associated SR

S.No  Odr-10               Associated OR          Sequence identity  Sequence similarity
1     NP_509157.1(odr-10)  NP_507048.2(srj-29)    20%                37%
2     NP_509157.1(odr-10)  NP_507162.2(str-15)    25%                43%
3     NP_509157.1(odr-10)  NP_507068.2(str-97)    27%                46%
4     NP_509157.1(odr-10)  NP_505321.1(str-85)    27%                44%
5     NP_509157.1(odr-10)  NP_507193.1(str-151)   27%                45%
6     NP_509157.1(odr-10)  NP_500472.1(str-122)   27%                48%
7     NP_509157.1(odr-10)  NP_505322.1(str-87)    28%                45%
8     NP_509157.1(odr-10)  NP_507192.3(str-149)   28%                46%
9     NP_509157.1(odr-10)  NP_506821.1(str-88)    28%                46%
10    NP_509157.1(odr-10)  NP_503666.1(str-119)   31%                49%
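The identity values reported in these tables are pairwise percentages. As a minimal sketch of how such a value can be computed from two already aligned sequences, the function below counts identical residues over the columns where neither sequence has a gap. The choice of denominator (ungapped columns only) and the function name are assumptions made for illustration; the thesis does not state which convention or program was used, and sequence similarity would additionally require a substitution matrix, which is not covered here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two aligned sequences.

    Assumption: only columns where both sequences carry a residue
    (no gap) are compared; other conventions divide by the full
    alignment length or by the shorter sequence length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns without a gap in either sequence
    identical = 0  # compared columns with the same residue
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    return 100.0 * identical / compared if compared else 0.0


# Toy aligned pair (hypothetical sequences, not taken from the thesis).
value = percent_identity("ACDE-FG", "ACDQKFG")
```

With this convention, a gap-rich alignment of short fragments can score higher than a full-length one, so the denominator should always be reported alongside the percentage.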
6.6.3 Summary
In the cross-genome phylogeny of selected representative OR sequences from human together with odr-10 and 84 related homologues from C. elegans, no coclustering was observed. This may be due to the long evolutionary divergence between human and nematode membrane proteins and also to their widely different olfactory behavior. The observed CeC1 and CeC3 clusters retain sequences from the largest superfamily, Str; notably, the CeC3 cluster retains the characteristic olfactory receptor of C. elegans (odr-10) along with 22 candidate receptors exclusively from the Str superfamily. This cluster can be illustrative of nematode-specific ORs observed at the cross-genome level.
As odr-10 belongs to the Str superfamily, the candidates from the Str superfamily establish the association in CeC3. Notably, Str-115 shows 33.5% sequence identity with odr-10, and this cluster can be further analyzed for sequence properties such as motifs and orthologs. Distantly related homologues are hard to identify; however, typical sequence search procedures could be accompanied by cross-talks. Such coclustering of sequences (like odr-10 and Str-115) could establish distant relationships between them. Among the seven hypothetical proteins, two are associated in the CeC2 and CeC4 clusters, which, in turn, can lead to interference with the functional properties of the associated serpentine receptors.

6.7 CROSS-GENOME PHYLOGENETIC ANALYSIS ON SELECTED ORS FROM HUMAN AND MOUSE GENOMES

6.7.1 Introduction
In mouse, the olfactory epithelium is divided along the dorso-ventral axis into four zones, based on OR expression (Ressler et al 1993, Villeneuve et al 2000). The dorsal region, also referred to as zone I, expresses about 50% of all OR genes, including class I type as well as class II type receptors. The ventral region, which consists of endoturbinates II, III and IV, expresses only class II type receptors (Zhang et al 2004, Tsuboi et al 2006). Earlier studies also suggest that receptors for polar, hydrophilic and weakly volatile odorants are present in the dorsal region of the olfactory epithelium, while receptors for non-polar, more volatile odorants are distributed in the ventral region (Abaffy and Defazio 2011), exhibiting different odor codings.
Expression data are also available for some of the mouse and rat class I type ORs. Both classes of ORs (class I and class II) are expressed in the dorsal zone of the olfactory epithelium (Bulger et al 1999, Conzelmann et al 2000). Class II type receptors have been found in all four zones. In a previous phylogenetic analysis of mouse olfactory receptors using a consensus tree (Zhang and Firestein 2002, Zhang et al 2007), nearly 1000 OR genes were classified into several OR families. The classification rule was that family members must comprise a strong phylogenetic cluster, that is, a reliable clade generally possessing >50% bootstrap value, and must have more than 40% protein identity. By this definition, mouse ORs were classified into 228 families.

6.7.2 Objectives
Since OR sequence clusters are abundant in the mouse genome, the current study aims to perform a cross-genome phylogenetic analysis with a non-redundant set of 338 mouse olfactory receptors and the selected representative human OR sequences (around 50 in number). The study will help to identify reliable phylogenetic clades at the cross-genome level and the motifs conserved across the two genomes.

6.7.3 Human Mouse OR Orthology
Many earlier studies report a significant evolutionary relationship between human and mouse ORs. Orthology has been observed at 60%, 70-80%, 80% and >80% sequence identity across these genomes. Mouse ORs from chromosome 11 show a synteny relationship with human chromosome 17p13.3. Indeed, OR clusters from these genomes share the highest percentage sequence identity, and the closest pairs retain 74-88% identity at the protein level (Sullivan et al 1996). Mouse ORs are reported for orthology with human ORs even in
the subfamily level. Mouse OR subfamilies such as 3A, 1A, 1D, 1E and 1P are all present in human OR clusters. Apart from their human counterparts, mouse ORs also retain orthology with other vertebrate genomes. For example, mOR11-2c shows 81.48% sequence identity to the olfactory receptor-like protein DTMT in canine (Parmentier et al 1992), and mOR11-2e shows 89.81% identity to the rat OR sequence RATOLFPROQ (also known as M64391) (Buck and Axel 1991). Earlier studies have reported synteny relationships for specific cluster pairs, derived from the Mouse Genome Database linkage maps (Lapidot et al 2001).

6.7.4 Complex Picture on Human-Mouse OR Orthology
Further inspection of human and mouse orthology shows a complex picture: in a few cases simple pairwise orthology is seen, but in other cases multiple potentially orthologous mouse ORs are found for a single human OR sequence, and vice versa. Yet in some cases (Makalowski et al 1996), there is not much significant orthology between human and mouse ORs of the same subfamily. True orthologous OR genes are expected to share a function and therefore to display higher conservation at the residues related to the odorant binding site. To identify the conserved residues in orthologous and paralogous OR sequences, Pilpel and Lancet (1999) conducted a variability diagnostic plane analysis. They used six human-mouse orthologous genes to identify inter-orthologue variability and 197 OR genes to evaluate inter-paralogue variability, and studied the correlation between them in detail. The results reported 17 CRS (complementarity-determining region, CDR) in the lower right quadrant (Pilpel and Lancet 1999), which represents residues that have high variability among paralogous genes, but
relatively low variability among orthologous genes, and thus reflect functional diversity such as odorant recognition.

6.7.5 Methodology
Among the collected 338 mouse olfactory receptors, 90% were predicted to retain seven TM helices with N-out topology. Along with the 50 selected representative ORs from the previously established human OR phylogeny, the 338 mouse ORs were aligned by the MAFFT alignment program with the JTT 200 scoring matrix and a gap opening penalty of 1.53. The DUFF gene, a human chemokine receptor, was used as the outgroup for the current study. The resulting cross-genome OR alignment was used to generate a tree by the NJ method with bootstrap, and the tree topology was displayed in circular form. Around 10 mouse OR subclusters were differentiated by the presence of the 50 representative human ORs.

6.7.6 Results

6.7.6.1 Cross-genome OR cluster association
The 50 selected representative ORs from the human OR phylogeny helped to associate the 338 mouse ORs into 10 mouse OR subclusters. On reading the tree topology, excluding the outgroup (DUFF_gene), 10 mouse OR subclusters were observed along with human ORs and are named MMC1, MMC2 to MMC10 (Figures 6.11 and 6.12). Apart from the 6 human ORs from HSC1 (fish-like ORs), the remaining 44 human ORs were distributed along with the mouse ORs. Clusters in which human and mouse ORs occur together are referred to as coclusters; they show higher BS values and contain mouse ORs closely related to human ORs. From the observed coclusters, 25 human-mouse OR sequence pairs were selected; their sequence identity ranges from 41% to 84% in the cross-genome phylogeny (Table 6.9).
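The alignment settings described in the methodology can be written down as a MAFFT command line. The sketch below only assembles the argument list and does not run the program. `--jtt` (JTT PAM matrix) and `--op` (gap opening penalty) are standard MAFFT options, whereas the input file name is hypothetical and any further options the author may have used are not recorded in the text.

```python
import shlex


def build_mafft_command(fasta_path: str,
                        jtt_pam: int = 200,
                        gap_open: float = 1.53) -> list:
    """Assemble a MAFFT invocation with a JTT PAM matrix and a gap opening penalty.

    MAFFT writes the alignment to standard output, so the caller is
    expected to redirect it to a file.
    """
    return ["mafft", "--jtt", str(jtt_pam), "--op", str(gap_open), fasta_path]


# Hypothetical input: 338 mouse ORs, 50 human ORs and the outgroup in one FASTA file.
cmd = build_mafft_command("human_mouse_ors.fasta")
print(shlex.join(cmd))
```

Keeping the command as a list makes it straightforward to pass to `subprocess.run` with the output redirected to an alignment file, and records the matrix and gap penalty alongside the analysis.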
Figure 6.11 Cross-genome phylogeny of selected olfactory receptors (ORs) from human and mouse genomes
Note: Phylogeny of the 50 selected representative OR sequences from 10 human OR subclusters (fuchsia) and around 338 mouse ORs (green), with a chemokine receptor (duff_hum) as the outgroup (red). ORs of H. sapiens are noted with the prefix HS and ORs of M. musculus with the prefix MOR.
Figure 6.12 Phylogeny of selected human and mouse olfactory receptors with special emphasis on mouse class I type receptors
As in the human OR phylogeny, where HSC1 stays distinct, the selected human ORs from HSC1 also form a distinct clade in the cross-genome phylogeny, but cocluster with around 74 mouse OR sequences. This particular association is referred to as HMC1 in the phylogeny, and the mouse homologues are designated with the prefix MOR* followed by the gene id (Figure 6.11). The chemokine receptor DUFF_HUMAN stays as the outgroup in the study.

6.7.6.2 Cross-genome phylogeny with Class-I type receptor homologues
74 mouse class I type receptors were collected and a cross-genome phylogenetic analysis was done together with the representative human ORs, wherein the added 74 mouse class I type receptors coclustered only with the given representative human class I type ORs, as expected. This further emphasizes the clear discrimination of class I and class II type receptors in higher eukaryotes such as human and mouse. The chemokine receptor was selected as the outgroup for the cross-genome OR phylogeny. In order to ascertain the mouse ORs belonging to class I type, attempts were specifically made to collect mouse homologues of class I type receptors (from HSC1). The 74 mouse OR sequences were aligned with the collected 338 mouse olfactory receptors along with the 50 human representative ORs. The resulting cross-genome phylogeny clearly exhibited coclustering arrangements with human ORs (class I type), particularly representing class I type receptor properties. The alignment was performed as mentioned earlier (Section 6.1). Notably, all the representative OR sequences from HSC1 clustered only with the 74 mouse homologues; this cluster is noted as HMC1 (Figure 6.12). This exercise illustrates the use of representative sequences in the cross-genome phylogeny to collect homologues. Though there are 52
human ORs present in HSC1 of the human OR phylogeny (Section 6.1), only six representative ORs were selected for the current study, and these representative sequences were sufficient to represent the properties of the HSC1 cluster. These six human OR representatives coclustered with the mouse homologues (Figure 6.11), which could be class I type receptors in the mouse genome.

6.7.7 Common motifs in the Cross-genome phylogeny
Using the TM-MOTIF tool, the 10 mouse OR subclusters were examined for the conservation of amino acids at the cross-genome level. Notably, the LHPMY motif in TM1/ICL1, the MAYDRYVAIC motif in TM3/ICL2, the SY motif in TM5, the FSTCSSH motif in TM6 and the PMLNPF motif in TM7 are conserved between human and mouse ORs.

6.7.8 Summary
In the cross-genome OR phylogeny of the 338 selected mouse ORs and 50 human OR representative sequences, significant coclustering arrangements were observed, reflecting the high sequence identity between certain human and mouse ORs. The 25 selected representative human-mouse OR sequence pairs show significant sequence identity, varying from 41% to 84%. Though ample coclustering was observed, the HSC1 cluster of human ORs, which holds the class I type receptors, stays distinct. This strongly supports the presence of fish-like ORs in human and mouse OR clusters. The collected mouse homologues for class I type receptors exhibited clear coclustering with class I type receptors in the mouse genome, suggesting that class I type mouse OR homologues show significant sequence identity with human ORs.
The ORs from the different clusters of human ORs (class II) tend to spread along with mouse ORs, forming the coclusters. The conserved motifs show the evolutionary relationships between human and mouse ORs and the preservation of sequence and structural properties for functional relevance.

Table 6.9 Percentage identity for selected human and mouse ORs for significant association from cross-genome OR phylogeny

S.No  Human OR Cluster No  Human OR  Mouse OR  % identity  Mouse OR Cluster No
1     HSC2                 HS1D2     18480460  52          MMC2
2     HSC2                 HSorl16   18480814  76          MMC2
3     HSC2                 HS1A1     18480066  84          MMC2
4     HSC2                 HS1E1     18479630  83          MMC2
5     HSC3                 HS10S1    18480630  54          MMC3
6     HSC3                 HS12D2    18479814  69          MMC3
7     HSC3                 HS4A16    18479942  54          MMC2
8     HSC3                 HS4K5     18480928  82          MMC2
9     HSC4                 HS8K5     18479442  76          MMC8
10    HSC4                 HS8D1     18480336  50          MMC8
11    HSC4                 HS9Q1     18480006  74          MMC8
12    HSC4                 HS5W2     18479812  79          MMC10
13    HSC4                 HS5T1     18480754  74          MMC10
14    HSC5                 HS5AC2    18479484  69          MMC9
15    HSC5                 HS8H1     18479794  72          MMC8
16    HSC5                 HS6B1     18480732  48          MMC5
17    HSC6                 HS14A16   18480640  52          MMC4
18    HSC7                 HS11H1    18480158  63          MMC5
19    HSC8                 HS0J3     18480958  61          MMC5
20    HSC8                 HS10A7    18480168  54          MMC6
21    HSC9                 HS2Y1     18480552  81          MMC6
22    HSC9                 HS2K2     18480320  47          MMC6
23    HSC9                 HS10AD1   18479756  41          MMC6
24    HSC10                HS2AK2    18480490  76          MMC4
25    HSC10                HS2M5     18480592  73          MMC4
6.8 PHYLOGENETIC ANALYSIS ON OLFACTORY RECEPTORS FROM SELECTED HUMAN AND NONHUMAN PRIMATES

6.8.1 Objectives
A cross-genome OR phylogeny was performed with OR sequences from non-human primates such as Ailuropoda melanoleuca (bear), Pongo abelii (Sumatran orangutan), Bos taurus (bovine), Callithrix jacchus (common marmoset), Rattus norvegicus (rat), Pan troglodytes (common chimpanzee), Canis lupus familiaris (domestic dog), and from Gallus gallus (class Aves), to identify the coclusters among various taxa.

6.8.2 Background
Cross-genome OR phylogeny with multiple organisms is highly significant for identifying cocluster associations among ORs of diverse taxa. Compared to mammalian ORs, avian olfaction is poorly understood (Steiger et al 2009), so preliminary attempts were made in this section to identify the coclusters of ORs from various taxa for the common function, olfaction.

6.8.3 Methodology
Since the established human OR phylogeny proposed 10 distinct human OR subclusters, homologues of the 50 human OR sequences were collected for the seven non-human primates of interest, Ailuropoda melanoleuca (bear), Pongo abelii (Sumatran orangutan), Bos taurus (bovine), Callithrix jacchus (common marmoset), Rattus norvegicus (rat), Pan troglodytes (common chimpanzee), Canis lupus familiaris (domestic dog), and for an organism from class Aves, Gallus gallus. For the pilot study, sequences from non-human primates and aves were considered for the limited
representative sequences (ranging from 15 to 20 for each organism). Care was taken to select sequences with 7 ± 2 predicted TM-helices. In total, 505 sequences were coaligned by the MAFFT alignment program; among them, 122 are from non-human primates, 12 are ORs from Gallus gallus and 371 are human ORs. A cross-genome phylogeny was constructed with 1000 BS replicates.

6.8.4 Results
The generated cross-genome phylogeny exhibits significant coclustering with human ORs; in particular, HSC1 (fish-like ORs) coclusters with Rat NP 00100126, Gallus NP 001008754.1, Bos XP 875301.2 and Ailuro EFB18423 (Figure 6.13). Notably, ORs from Gallus gallus (aves) are observed in human OR clusters such as HSC5, HSC6 and HSC7. Canine ORs were observed in human OR clusters such as HSC2, HSC4, HSC6 and HSC7. The significant coclusters for 25 sequence pairs were identified and reported (Table 6.10) in the interest of showing inter-genomic OR cluster association for olfactory acuity. The clusters with ORs from multiple organisms provide a platform to study conserved motifs and to train an SVM model to identify putative ORs across genomes.

Table 6.10 Percentage Identity between selected human ORs and nonhuman ORs

CLUSTER  Hum ORs        Non-human ORs          Percentage Identity
C1       HS51M1_Chr11   Bos_XP_875301.2        82.5
C1       HS51M1_Chr11   Ailuro_EFB18423.1      81.3
C1       HS51Q1_Chr11   Gallus_NP_001008754.1  47.8
C3       HS10S1_Chr11   Bos_XP_591375.2        39.2
C7       HS6N1_Chr1     Pan_XP_524919.2        98.1
C7       HS6N1_Chr1     canis_XP_545735.2      85.5
C9       HS13D1         XP_002822168.1         26.9
C9       HS13D1         Calli_XP_002743272.1   82.9
Figure 6.13 Cross-genome phylogeny on selected human ORs with ORs from non-human primates and aves
Note: Aqua (HSC1), violet (HSC2), indigo (HSC3), blue (HSC4), green (HSC5), yellow (HSC6), orange (HSC7), red (HSC8), olive (HSC9) and teal (HSC10) denote the 10 human OR subclusters distributed in the cross-genome OR phylogeny; all non-human ORs, including those from Gallus gallus, are shown in maroon.
6.8.5 Summary
The pilot study with selected ORs from non-human primates, aves and human shows clear coclustering and evolutionary trends across genomes, and provides a platform to observe conserved motifs and orthologs.

6.9 DATABASE OF OLFACTORY RECEPTORS (DOR)

6.9.1 Objectives
The availability of genome sequences for genomes of interest such as yeast, fly, worm, mouse and human facilitates the creation of a non-redundant data repository on olfactory receptors. The selected eukaryotic genomes are useful model organisms and hence in vivo application can be
suggested using the curated data repositories and related structural information in the near future. DOR is an integrated database that provides sequence and structural information on olfactory receptors (ORs) for selected eukaryotic organisms such as S. cerevisiae, D. melanogaster, C. elegans, M. musculus and H. sapiens. The versatile functions of ORs motivated the creation of a non-redundant data repository that can be used for various practical applications in the pharmaceutical industry (aroma therapy), olfacto-sexual function, olfacto-neural communication, the cosmetic industry (perfume manufacturing), the food industry, agricultural pest management and so on. The ORs of each genome are specific to its sense of olfaction. For instance, amphibians retain both class I and class II type olfactory receptors, whereas teleost fish, including the goldfish Carassius auratus, carry only class I type receptors (Freitag et al 1998, Speca et al 1999). This further emphasizes the role of class I type receptors in detecting waterborne odors and of class II type receptors in sensing airborne odors. Since the amphibian lifestyle accommodates both terrestrial and aquatic habitats, the class I and II type receptors were acquired for this dual lifestyle (Freitag et al 1995). Higher order organisms also retain both class I type (fish-like) ORs and class II type ORs (mammalian ORs) (Glusman et al 2001, Niimura and Nei 2005). The two types of receptors observed in human (terrestrial vertebrates) particularly reveal the phylogenetic distance between fish and mammals, and the occurrence of class I and II types could be the result of an adaptive process during evolution. This permits fishes to sense water-soluble odors and mammals to recognize a large variety of hydrophobic and volatile compounds (Freitag et al 1998). Separately,
since no class-specific motifs were identified for these classes of ORs, structural differences are helpful in discriminating the two receptor types to some extent. Notably, the length of ECL3 in class I type receptors ranges from 10-15 amino acids, whereas ECL3 in class II type receptors in vertebrates ranges from 13-14 amino acid residues (Freitag et al 1998). This is a good example of the need for integrated knowledge of sequence and structure to understand the properties of ORs in more detail. Thus, in the current study, attempts were made to incorporate sequence analysis information, documenting the non-redundant OR sequences, predicted membrane topology, possible cross-genome OR alignments and phylogeny, together with structure analysis providing predicted secondary structural details, conserved motifs and dimer interfaces for the representative sequences selected from the OR phylogeny (all structural analyses were carried out by collaborators from NCBS and AIST).

6.9.2 Features on OR sequences in DOR
DOR provides a user-friendly platform to access features related to OR sequence and structure (Figures 6.14 and 6.15). The main menu provides 5 key features, Sequence, Genomic combination, Phylogeny, Structure and TM-MOTIF, and the database can be accessed from http://caps.ncbs.res.in/DOR
Figure 6.14 Available main menu in the front page of DOR

Notes: Snapshot depicting the main menu of the Database Of Olfactory Receptors (DOR) with user-interactive features. Label 1 refers to the retrieval of OR sequences for the genomes of interest in FASTA format by using the option Sequence. Label 2 indicates the available intra- and inter-genomic OR cluster alignments, available in both .aln and .mas formats, by using the option Genomic combinations. Label 3 guides the user to view and download the selected intra- and inter-genomic OR phylogeny (available in .meg and .mts formats). Label 4 provides structural details such as 3D structure, pairwise alignment with template, CONSURF and predicted dimer interface for the OR sequences of interest. Label 5 allows the user to download the TM-MOTIF package to visualize an MSA in VIBGYOR colouring scheme and to identify conserved motifs with AAS. All these options have a related drop-down menu, ORGANISM, which lists the available organisms for the user to select. Label 6 refers to the DOR home page, to return to after navigation. Label 7 refers to the help page for DOR.
DOR (Database of Olfactory Receptors) provides the following sequence-related information on olfactory receptors:

6.9.2.1 OR sequences of target genomes
In this option, users can select their organism of interest and collect the respective OR sequences (in FASTA format) by using the download hyperlink given for every genome. The related drop-down menu called SOURCE provides the list of organisms: S. cerevisiae, D. melanogaster, C. elegans, M. musculus and H. sapiens.
6.9.2.2 Predicted TM boundaries
Using this option, the user can collect information about the predicted transmembrane domain boundaries from TM1 to TM7 for the seven predicted helices (Figures 6.15 and 6.16). The predicted helix boundaries of the OR sequences are coloured violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R) with respect to the seven predicted TM-domains; when a sequence is overpredicted with more than seven TM-domains, a pale cream colour is used. Sequences predicted with fewer than seven TM-domains can also be recognized by their incomplete VIBGYOR colouring. This provides knowledge of the membrane topology at first sight. The given hyperlink helps the user to download the corresponding OR sequence in FASTA format. The OR sequences recommended for 3-D modeling are marked with a * symbol, and the provided hyperlink takes the user to the webpage with the related structural information.
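The colouring rule described above can be illustrated with a short sketch. It assigns the VIBGYOR colours to predicted helices in order and uses pale cream for any helix beyond the seventh. Applying the cream colour only to the extra helices, rather than to the whole sequence, is an assumption, since the text does not specify this; the function and constant names are hypothetical and are not taken from the DOR code.

```python
VIBGYOR = ["violet", "indigo", "blue", "green", "yellow", "orange", "red"]
OVERPREDICTED = "pale cream"  # assumed colour for helices beyond TM7


def colour_helices(boundaries):
    """Pair each predicted helix (start, end) with a display colour.

    Helices 1-7 receive the VIBGYOR colours in order; any additional
    (over-predicted) helix receives the pale cream colour. Sequences
    with fewer than seven helices simply use a prefix of the scheme.
    """
    coloured = []
    for index, (start, end) in enumerate(boundaries):
        colour = VIBGYOR[index] if index < len(VIBGYOR) else OVERPREDICTED
        coloured.append((start, end, colour))
    return coloured


# Hypothetical helix boundaries for a sequence over-predicted with eight helices.
example = colour_helices([(26, 48), (59, 80), (98, 120), (141, 162),
                          (198, 221), (238, 260), (271, 290), (300, 318)])
```

An under-predicted sequence, for example one with five helices, would end at yellow, which corresponds to the incomplete VIBGYOR display mentioned in the text.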
Figure 6.15 A snapshot of the given option Sequence and its application in DOR
Note: The snapshot depicts the display for the menu Sequence (A), the respective drop-down option for SOURCE (B) and SUPPORTS (C). The DOR display for a selected sequence with predicted membrane topology in VIBGYOR colouring scheme (D) and the respective display to retrieve the FASTA sequence (E) are also shown.
Figure 6.16 Display of predicted membrane boundaries in DOR

Note: Display for the option Sequence, wherein each olfactory receptor is given with protein ID, sequence length and NCBI protein identifier, followed by the membrane topology predicted by HMMTOP: the N-terminal location, the number of predicted helices and the predicted TM boundaries for the seven helices with start and stop positions. Notably, the seven helix boundaries are denoted in VIBGYOR colouring scheme, and over- or under-predicted helices are also shown.
6.9.2.3 Single/cross-genome OR alignments
Apart from the uni-genomic phylogeny, a few cross-genome phylogenetic analyses were performed, and the user can select any of the following combinations to view the phylogeny: S. cerevisiae-D. melanogaster-H. sapiens, C. elegans-H. sapiens, and H. sapiens-M. musculus. The MAFFT alignment tool was used to generate the alignments for the selected intra- and inter-genomic organism(s). Here, the user can benefit from the cross-genome alignments for further studies in comparative genomics. The MSA of the genome of interest can be downloaded both in CLUSTALW alignment format (.aln format) and in MEGA alignment session format (.mas format) (Figure 6.17).
Figure 6.17 Display of Alignment option in DOR

Note: Snapshot showing the display of Alignment option for the selected genomic combinations (H. sapiens- M. musculus) and the respective cross-genome OR alignment has been given in MEGA (.mas) and CLUSTAL W (.aln) formats (given in A and B).
6.9.2.4 Cluster association and Phylogeny
The phylogenetic analysis at single-genome and cross-genome level provides knowledge of the cluster distribution. By observing the tree topology with significant BS values, related sequences were grouped into clusters, and the cluster-wise distribution of sequences is given in the MSA. For example, around 371 ORs of H. sapiens were grouped into 10 OR subclusters, and these 10 OR clusters were added to the TM-MOTIF tool (Chapter 4) to observe the conserved amino acids, along with AAS, at each position of the alignment. The generated phylogenies for all selected genomes, at intra-genomic and inter-genomic level, are made available with a legible display; cladograms are available to users in a downloadable image format, and the MEGA tree session file (.mts) is also available for download (Figure 6.18).
Figure 6.18 Display of cross-genome OR phylogeny in DOR

Note: A snapshot of the option PHYLOGENY showing the cross-genome OR phylogeny (H. sapiens-M. musculus); the respective MEGA tree session file (.mts) can be downloaded.
6.9.2.5 Softwares and Tools (TM-MOTIF) in DOR
TM-MOTIF is a downloadable software tool (Chapter 4) and an effective alignment viewer that maps discovered motifs onto the predicted membrane topology of a set of aligned OR sequences in VIBGYOR colouring scheme. TM-MOTIF mainly helps in mapping the discovered motifs onto the intra- and inter-genomic clusters of the user's interest. GPCR cluster datasets of human, D. melanogaster and C. elegans and the 10 human OR subclusters are available as inbuilt cluster datasets. The user can view the detected motifs in the intra- and inter-genomic cluster alignments of the inbuilt datasets, or can submit sequences of interest as an MSA (.aln format) along with multiple FASTA sequences to run the display options Run-TM, Run-Motif and Run-TM-motif. Using TM-MOTIF, the user can identify motifs conserved at the 60% level along with amino acid substitutions (AAS) and their physicochemical properties, such as hydrophobic, aromatic, polar positive, polar negative and polar uncharged, at each position in the MSA. Separately, the user can submit a sequence of interest to align with any of the selected reference sequences (whose structure is known) to obtain the pairwise alignment with TM-MOTIF display. Such an annotated alignment can be used effectively for modeling the sequences and can also guide template selection. The user can choose the run-blast option to search for the nearest homologues of a sequence of interest in the inbuilt cluster associations. The tool can be downloaded and used as a standalone package (also refer to Figures 6.19 and 6.20).
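As an illustration of what conservation "at the 60% level" means for an alignment column, the sketch below reports the columns in which a single residue occurs in at least 60% of the aligned sequences. It is not the TM-MOTIF implementation: the function name, the treatment of gaps (counted against conservation) and the toy alignment are assumptions made for this example.

```python
from collections import Counter


def conserved_columns(alignment, threshold=0.6, gap="-"):
    """Return (column index, residue, fraction) for conserved columns.

    A column is reported when its most frequent residue occurs in at
    least `threshold` of all sequences. Assumption: gaps are ignored
    when counting residues but still count in the denominator.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have equal length")

    hits = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != gap)
        if not counts:
            continue  # column made up of gaps only
        residue, count = counts.most_common(1)[0]
        fraction = count / len(alignment)
        if fraction >= threshold:
            hits.append((col, residue, fraction))
    return hits


# Toy alignment (hypothetical sequences, not taken from the OR clusters).
toy = ["MAYDRY", "MAYDRF", "MSYDKW", "MGYEKH", "LTYDQY"]
hits = conserved_columns(toy)
```

Runs of adjacent conserved columns can then be read off as candidate motifs, and the residues that fall below the threshold in those columns correspond to the amino acid substitutions (AAS) discussed above.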
Figure 6.19 Overview on pictorial representation of available features in DOR for sequence analysis
Note: Label 1 depicts the option Sequences for the retrieval of OR sequences in FASTA format. Label 2 refers to the available alignments for single and cross-genome display in CLUSTAL W format (.aln) and in MEGA format (.mas). Label 3 indicates the display of the seven predicted TM-helices with their respective boundaries in VIBGYOR colouring scheme. Label 4 shows the display of the generated phylogenetic tree for the uni-genome. Label 5 refers to the display of the cross-genome phylogeny. Label 6 indicates the available DOR help page, and Label 7 shows the TM-MOTIF display of the OR subclusters in VIBGYOR colouring scheme with the identified motifs.
Figure 6.20 Overview of DOR features for sequence and structural information on olfactory receptors
Note : The five available SOURCE options are given in pink arrows and are numbered from 1-5. The available ORGANISM and SUPPORT drop-down menu options are given in inverted triangle in blue.
Label 1 The Sequence in the SOURCE option provides the list of ORGANISM such as C. elegans, D. melanogaster, H. sapiens, M. musculus, S. cerevisiae. The respective SUPPORT option provides Sequence and Alignment. Sequence facilitates the retrieval of OR sequences in FASTA format. Alignment provides the alignments for single and cross-genome display in CLUSTALW format (.aln) and in MEGA format (.mas). The result files are downloadable.
Label 2 The GENOMIC COMBINATION in the SOURCE option provides the list of ORGANISM such as H. sapiens/C. elegans, H. sapiens/M. musculus, H. sapiens/D. mel/S. cer, D. mel/C. elegans for the display of cross-genome alignments in CLUSTAL W format (.aln) and in MEGA format (.mas). The result files are downloadable.
Label 3 The Phylogeny in the SOURCE option provides the list of ORGANISM such as D. melanogaster, H. sapiens, M. musculus, S. cerevisiae, H. sapiens/C. elegans, H. sapiens/M. musculus, H. sapiens/D. mel/S. cer, D. mel/C. elegans for the display of uni-genome and cross-genome OR phylogeny. The tree session files are downloadable.
Label 4 The Structure in the SOURCE option provides the list of ORGANISM such as C. elegans, D. melanogaster, H. sapiens, M. musculus, S. cerevisiae and representative 3D models with features such as (a) alignment between OR sequence and bovine rhodopsin, (b) Pymol session file with the seven TM domains coloured in VIBGYOR colours, (c) residue conservation mapped on the OR sequence using Consurf, (d) residue conservation mapped on the OR homology model using Consurf, (e) validation chart for every homology model and (f) dimer-interface prediction for the OR model. The result files are downloadable.
Label 5 MOTIF ANALYSIS TOOL provides the option for TM-MOTIF, an alignment viewer that displays the seven predicted TM-helices of ORs in VIBGYOR colouring scheme with the identified motifs mapped on the alignments along with AAS. The package is available for download.
6.9.3 Structural features (Application of sequence searches)
The sequence searches performed were highly useful in proposing representative sequences and intra- and inter-genomic OR clusters, which were then used for homology modeling (by K. Harini, NCBS, Bangalore) and dimer interface predictions (by Dr. Nemato, AIST, Japan). Structural features such as homology models of selected OR sequences, the related alignments, conserved residues, details of structure validation and predicted dimer interface residues have also been incorporated into DOR, to provide complete knowledge of the sequence-structure-function paradigm.
Figure 6.21 Display of 3D Structure and related features in DOR

Note : A snapshot for the option STRUCTURE showing related options about generated model and predicted dimer-interfaces for certain representative OR sequences selected from phylogeny.
6.9.4 Summary
DOR (Database of Olfactory Receptors) is a user-friendly database where users can retrieve and download information on both OR sequence and structure for the five eukaryotic genomes. (The supporting tables, alignments and phylogeny are available at the URL: http://caps.ncbs.res.in/DOR) The Sequence option provides non-redundant OR sequences for the targeted eukaryotic genomes. The TM-boundaries option provides the predicted TM-helices for each OR sequence, with the start and end position of each predicted helix; the predicted boundaries of the seven helices are shown in seven different colours (VIBGYOR colouring scheme) for easy observation. The Alignment option provides not only the MSA for a single genome but also the cross-genome alignments. These alignments can be further used to detect conserved motifs, and cross-genome alignments in particular are very useful from an evolutionary perspective. The phylogenetic trees for uni-genome and cross-genome datasets help to study the cluster associations and to select representative sequences for further analysis. The generated phylogenetic trees (single and cross-genome) further help to understand sequence properties at intra- and inter-genomic levels. These sequence studies could effectively be used to detect cluster-specific motifs from the MSA, and species-specific cluster associations and co-cluster associations in the cross-genome phylogeny. The best representative sequences selected from the generated clusters can be used for homology modelling and dimer-interface prediction, to discover functionally important residues and ligand-binding pockets. The list of non-redundant OR sequences can be further used to train an SVM to identify potential OR sequences, which can also be applied to identify orthologs across genomes. As an initiative in applying this sequence knowledge, TM-MOTIF, a tool to detect motifs in a set of aligned OR sequences, was incorporated into the database. An inbuilt dataset of 10 human OR subclusters is available in the TM-MOTIF package and is downloadable. Users can also use
their sequence of interest to view the alignment in the VIBGYOR colouring scheme, with the identified conserved motifs and AAS at each position of the alignment. The olfactory receptor structures give users the opportunity to analyse the interactions between helices and the conservation of residues within helices, and to generate electrostatic contour maps. This would further help in understanding the mechanism of function of olfactory receptors. The dimer-interface prediction for every structure guides us further to study the oligomerization process of these receptors and the functional significance of such higher-order entities.
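The TM-boundaries display described above can be illustrated with a minimal Python sketch, assuming the predicted helix boundaries are available as seven (start, end) pairs with 1-based, inclusive positions. The function name colour_residues and the label "none" for loop residues are hypothetical and are not taken from DOR or TM-MOTIF.

```python
# Minimal sketch: assign a VIBGYOR colour to every residue of one sequence
# from predicted TM-helix boundaries. Names and labels are illustrative only.
VIBGYOR = ["violet", "indigo", "blue", "green", "yellow", "orange", "red"]


def colour_residues(seq_length, helices):
    """Return one colour label per residue.

    helices: list of seven (start, end) tuples, 1-based and inclusive,
    ordered from TM1 to TM7. Residues outside helices get "none".
    """
    if len(helices) != 7:
        raise ValueError("expected exactly seven predicted TM helices")
    colours = ["none"] * seq_length
    for colour, (start, end) in zip(VIBGYOR, helices):
        if not 1 <= start <= end <= seq_length:
            raise ValueError("helix boundaries outside the sequence")
        for pos in range(start, end + 1):
            colours[pos - 1] = colour  # convert 1-based position to index
    return colours


# Toy boundaries for a 30-residue sequence (not a real prediction).
toy_helices = [(1, 2), (4, 5), (7, 8), (10, 11), (13, 14), (16, 17), (19, 20)]
toy_colours = colour_residues(30, toy_helices)
```

In this sketch TM1 is violet and TM7 is red, following the order in which the colours are listed in the text.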
CHAPTER 7 CONCLUSION
7.1 COMPENDIUM
My Ph.D. objective, entitled Genome wide survey of certain mammalian GPCRs and olfactory receptors, has been carried out using effective bioinformatics approaches. It has resulted in insights into related GPCR/OR sequences at the cross-genome level and conserved sequence features, and in the design of a computational package (TM-MOTIF) and a database (DOR, Database of Olfactory Receptors). In this chapter, I compile the highlights of the results discussed in Chapters 2-6, including critical results, scope, applications and future directions in brief. The abundant availability of non-olfactory G-protein-coupled receptors (GPCRs) (GRAFS system of classification) and olfactory GPCRs (OR repositories) of various model organisms facilitates intra- and inter-genomic phylogenetic clustering studies. The main purpose of the current study is to collect the biologically most significant GPCRs and ORs from selected eukaryotic genome(s) and to perform cross-genome GPCR/OR clustering, to address conserved evolutionary trends (motifs and orthologs) and co-clusters. The other mandate was to create related tools and databases for this study on membrane proteins and to make them publicly accessible.
Analysis of the phylogenetic clustering of GPCRs/ORs helps to recommend the best representative sequences, the cluster-specific sequence motifs and structure-function studies for various practical applications.
7.2 CROSS-GENOME GPCR CLUSTERING
Chapter 2 focuses on cross-genome clustering of selected GPCRs from the human and C. elegans genomes, to provide results such as cluster-specific associations and motifs at intra- and inter-genomic levels. A profile database of 32 well-known GPCR clusters and the RPS-BLAST technique were used to associate more than 1000 C. elegans GPCRs with the known groups of human GPCRs and so to suggest functional relevance. These clusters cover the previously established and biologically significant eight major types of human GPCRs: peptide receptors (PR), chemokine receptors (CMK), nucleotide and lipid receptors (N&L), biogenic amine receptors (BGA), class B (secretin) receptors (SEC), cell adhesion receptors (CAR), class C (glutamate) receptors (GLR) and frizzled/smoothened receptors (FRZ/SMT). Serpentine receptors of nearly 20 recognizable families, grouped as the sra superfamily (sra, srab, srb and sre), the srg superfamily (srg, srt, sru, srv, srx and srxa), the str superfamily (srd, srh, sri, srj and str) and others of solo type (srbc, srsx, srw and srz), have also been associated with the dataset of 32 known human GPCR clusters. Cross-genome GPCR alignments were prepared using an efficient alignment procedure, the PRALINETM server
(Pirovano et al 2008), and the cross-genome GPCR phylogeny was generated by the quartet-based maximum-likelihood method with 10,000 BS replications using TREE-PUZZLE (Schmidt et al 2002). The resulting 32 cross-genome cluster associations of human and C. elegans GPCRs were analyzed for the type of cluster association using the alignment viewer in MEGA 4.0 (Tamura et al 2007). Terminologies such
as human GPCR clade [HC], co-clusters [CC], neighbor clades [NC], neighbor members [NM] and species-specific members [SS] have been used to describe the branching types in the dendrogram (Chapter 2). They refer, respectively, to the pure distribution (homogeneous occurrence) of human GPCRs, the inter-mixed distribution (heterogeneous occurrence) of GPCRs that denotes highly related (co-clusters and neighbor clades) and distantly related (neighbor members) nematode and human GPCRs, and the homogeneous distribution of nematode GPCRs in the tree topology. The designed protocol is quite effective in associating remote homologues. For instance, the cross-genome GPCR cluster associations show an average cluster identity ranging from 12% to 20% in many clusters, which reflects the efficiency of RPS-BLAST in associating nematode GPCRs with the given human GPCR profiles even at low sequence identities. The profile-based clustering associated 84% of the nematode GPCRs with the functionally known human GPCRs at significant E-value thresholds (ranging from 0.001 to 1). An additional 14% were associated at E-value thresholds between 1 and 5, and a very small percentage (2%) at E-value thresholds of more than 5. The cross-genome GPCR association identifies 27 nematode GPCRs as orthologs of certain human GPCRs. Notably, the observed orthologs occur predominantly in the co-clusters (results in Chapter 2), indicating a close relationship. For instance, two dopamine receptors from C. elegans, namely dop-1 and dop-2, were associated with the human biogenic amine type receptors at significant E-values in Cluster 24. In another instance, the GABA B receptor subunit (gbb-1) from C. elegans is identified as an ortholog of human
(GABA) B receptor 1 (GBR2_HUMAN) at the most significant E-value thresholds (Remm and Sonnhammer, 2000). This ortholog pair retains 37% sequence identity and 51% sequence similarity. Interestingly, the identified putative ortholog pairs Q96AM5/NP_509515.1, TRFR_HUMAN/NP_491990.1, V2R_HUMAN/NP_493193.1, NK1R_HUMAN/NP_500930.1 and NY4R_HUMAN/NP_508234.1, from Clusters 5, 6 and 11 of the peptide receptor type, can be further explored for functional relevance to the human GPCR types, since the counterpart GPCRs from C. elegans are annotated as hypothetical proteins. Thus, the identified ortholog pairs emphasize the role of RPS-BLAST in associating closely related sequences across taxa. In total, 176 C. elegans GPCRs annotated as hypothetical proteins (unannotated proteins) have been associated by RPS-BLAST with known human GPCR types, which provides a platform to investigate their functional relevance to the associated human GPCR type(s) (examples from Clusters 3-5, 8, 11, 16-17, 23 and 32). Besides evolutionarily related GPCR sequences, certain candidate GPCRs showed a species-specific tendency (referred to as SS and HC) in the cluster association. Notably, a few candidate receptors from the largest superfamily, str, were associated only at relaxed E-value thresholds, indicating distant evolutionary relationships between nematode and human GPCRs, particularly for the str and srh type receptors. A trial study conducted with known associations (cross-genome human-Drosophila GPCR clusters; Metpally and Sowdhamini 2005) showed 90% correct association at significant E-value thresholds (Table A2.1 in Appendix). The cross-check against known associations (trial study) and the identified orthologs clearly support the RPS-BLAST
clustering technique in associating sequences (remote homologues) with related PSSM profiles. In essence, the cross-genome GPCR association between diverse serpentine receptors from C. elegans and the eight major types of human GPCRs provides an opportunity to explore secondary structural details and conserved motifs, and to confirm functional relevance in vivo across these genomes for practical applications.
7.3 PHYLOGENETIC ANALYSIS ON SERPENTINE RECEPTORS
Chapter 3 describes the phylogenetic analysis of selected serpentine receptors of C. elegans. 683 serpentine receptors were collected from the SEVENS database (Ono et al 2005), and 97% of the sequences were found to retain the N-out topology in the predicted membrane topology. Since a broad spectrum of serpentine receptor (SR) superfamily members, i.e., nearly 20 SR families, has been reported for the C. elegans chemoreceptors (Robertson 1998, Robertson and Thomas 2006), phylogenetic analysis is helpful to identify related serpentine receptors and conserved sequence features following cluster association at the superfamily level. The generated phylogenetic tree exhibited cluster association in a family-specific and, in turn, superfamily-specific manner. As odr-10 is the only annotated olfactory receptor in C. elegans (Sengupta et al 1996), sensing compounds like di-acetyl, the subclusters related to odr-10 have been studied in detail. Interestingly, str-112 was identified from the associated tree topology as the closest homologue of odr-10. Through phylogenetic analysis, 43 SR sequences have been identified as homologues of odr-10; they are distributed in the subclusters Str_C1 to Str_C6 in the tree topology. Interestingly, all the
sequences associated with odr-10 belong to the str family of the Str superfamily, which represents a species-specific tendency at the family and superfamily levels and offers a basis to study ligand binding for the odr-10 homologues. This cluster association is a good example of the effectiveness of the phylogenetic approach in associating closely related sequences at the intra-genomic level, and it also guides the connection of structure-function relevance. As a pilot test, a case study on odr-10 has been performed for the secondary structural details. A three-dimensional model was generated by homology modelling with MODELLER (Sali and Blundell 1993), using bovine rhodopsin (known structure) as the template. The model shows a final energy of -1020.23 kcal/mol after energy minimization, with 82% of the residues within strictly allowed regions and 14% within partially allowed regions of the Ramachandran plot. The model generated with only the TM-helices shows 93.3% of residues in allowed regions and 5.2% in additionally allowed regions. Such a model can be further studied in silico for ligand-binding sites and active-site/hot-spot residues in the three-dimensional structure embedded in the lipid environment. The identified homologues of odr-10 can be further explored for secondary structural details, oligomerisation, ligand-binding sites and the sensing of di-acetyl compounds. This case study is an example of using sequence studies to extend structure prediction towards function. In order to analyze the conserved sequence features, a few representative SR sequences were collected and aligned with the MAFFT alignment procedure (Katoh et al 2002). These were used to detect amino acid conservation with the TM-MOTIF package (Chapter 4). 92 family-specific motifs have been identified from the selected serpentine receptors, and the observed sequence features can be used to train SVMs to detect SR-like sequences from other nematode species and other organism(s). Since odr-10 is also reported to have the N-out topology, like human olfactory receptors, a phylogenetic study was conducted with selected human ORs (371 ORs) and odr-10. However, the generated phylogeny does not show any significant co-clusters, and odr-10 remains an outgroup. This may be due to the long evolutionary lineage, the nematode life style and the ability to recognize only limited and simple odors. The lack of co-clustering also suggests that agreement in topology does not necessarily cause olfactory receptors to cluster together. In this way, cross-genome clustering and phylogeny provide preliminary guidelines on whether sequence associations are related or distant within and across genomes.
7.4 TM-MOTIF PACKAGE
The characteristic feature of these TM proteins, seven helices connected by three intracellular and three extracellular loops, provides an opportunity to compare conserved sequence features (motifs) within and across genome(s) in a set of aligned homologous sequences. The main objective of the TM-MOTIF package is to identify and display the conserved motifs and amino acid substitutions (AAS) in a set of aligned transmembrane proteins. The key feature of the TM-MOTIF package (Figure 4.3) is to help the user visualize identified motifs on the predicted seven transmembrane helices and loop regions of the MSA (Tusnady and Simon 2001), where the predicted seven TM-helices are displayed in violet (V), indigo (I), blue (B), green (G), yellow (Y), orange (O) and red (R) colors
(VIBGYOR colouring scheme) and the conserved residues, along with the substituting amino acids (AAS) (at a default of 60% conservation), are documented at each position in the multiple sequence alignment (Figure 4.5). A mouse-over option provides details of the type (physico-chemical property) of AAS at each position. An in-house program for the identification of motifs (MotifS program, written by R. Sowdhamini) was used to identify residue conservation and substitutions at each position of the alignment. Amino acid substitutions are denoted by symbols according to their physico-chemical properties: hydrophobic (@), aromatic (*), polar positive (+), polar negative (-) and polar uncharged ($). The user-friendly TM-MOTIF package lets users submit their sequence of interest (which should be a membrane protein) in FASTA format, along with its MSA. The user can select any one of the display options Run-TM, Run-MOTIF and Run-TMMOTIF (Figures 4.5-4.7 in Chapter 4). Inevitably, a considerable number of mis-predictions occur due to false merging and false splitting of TM-boundaries, causing under-prediction and over-prediction of TM helices. In such cases, the full length of the sequence is displayed in pale cream colour. For all display options, the conserved residue at each position of the alignment is displayed as a consensus along with the MSA. Three inbuilt datasets from cross-genome clustering studies were incorporated in the TM-MOTIF package: the previously established phylogenetic clusters of selected human-Drosophila GPCRs (Metpally and Sowdhamini, 2005), the profile-based clusters of selected human-C. elegans GPCRs (32 clusters of eight major groups; Chapter 2) and 10 clearly distinguishable human-mouse OR clusters (Chapter 6).
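The per-position consensus and AAS annotation described above can be sketched in Python as follows. This is not the MotifS program: the residue-to-class assignment, the choice to count gaps in the denominator and all function names are assumptions made for illustration; only the five class symbols and the 60% default come from the text.

```python
from collections import Counter

# Assumed residue classes. The five symbols follow the text, but which
# residue belongs to which class is an illustrative assumption.
CLASSES = {
    "@": set("AVLIMCGP"),  # hydrophobic
    "*": set("FWY"),       # aromatic
    "+": set("KRH"),       # polar positive
    "-": set("DE"),        # polar negative
    "$": set("STNQ"),      # polar uncharged
}


def residue_class(residue):
    """Return the class symbol of a residue, or '?' if it is unclassified."""
    for symbol, members in CLASSES.items():
        if residue in members:
            return symbol
    return "?"


def column_summary(column, threshold=0.6):
    """Summarise one alignment column.

    Returns (consensus, substitutions). consensus is the most frequent
    residue if it occurs in at least `threshold` of all sequences (gaps
    count in the denominator, an assumption), otherwise '.'.
    substitutions lists the other residues seen, with their class symbols.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return ".", []
    top, count = Counter(residues).most_common(1)[0]
    if count / len(column) < threshold:
        return ".", []
    others = sorted({r for r in residues if r != top})
    return top, [(r, residue_class(r)) for r in others]


def summarise_alignment(aligned, threshold=0.6):
    """Apply column_summary to every column of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    return [column_summary([seq[i] for seq in aligned], threshold)
            for i in range(length)]


# Toy alignment (not real GPCR data).
toy_alignment = ["DRYLA", "DRYLA", "ERYLA", "DRFLA", "DRY-A"]
toy_summary = summarise_alignment(toy_alignment)
```

In the toy alignment, the first column keeps D as consensus with E recorded as a polar-negative substitution, and the third column keeps Y with F recorded as an aromatic substitution.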
TM-MOTIF is a user-interactive tool, in which the Run-BLAST option collects the nearest homologue of the user's sequence of interest from the inbuilt dataset. The user can also select any one of the reference sequences whose structure is solved (bovine rhodopsin, Japanese flying squid rhodopsin, common turkey β1 AR, human β2 adrenergic receptor, human adenosine receptor A2A, human dopamine D3 receptor and human CXCR4 chemokine receptor) to obtain a pairwise alignment (by CLUSTAL W) in the preferred TM-MOTIF display option, which can be further used for homology modelling. TM-MOTIF also provides useful output files: Zconsensus.txt, Zpattern.txt and Zmotif.txt (output for the three display options), Zuser.aln and Zuser.pir (output for the alignment option, Compare with reference sequence) and Zblast_sorted.txt (output for the Run-BLAST option). In future, the TM-MOTIF package could be enriched with inbuilt cluster datasets from other genomes and extended to membrane-bound helical proteins such as ion channels and transporters. TM-MOTIF alignment displays could also be supported with graphical representations (such as structures in 2D cartoons). The TM-MOTIF package has been used effectively on the cross-genome GPCR/OR cluster datasets and is highly suitable for comparative genomics, to identify cluster/receptor-specific and common motifs observed at various percentages of conservation within and across the genome(s) of interest. TM-MOTIF is suited to the Linux OS. It requires PerlTk, BioPerl, a FORTRAN compiler and standalone versions of CLUSTAL W and BLAST2 to be installed on the user's machine. The package is integrated with DOR (Database of Olfactory Receptors) and is downloadable from the URL http://caps.ncbs.res.in/DOR (see also Chapter 6).
7.5 STUDY ON CONSERVED MOTIFS AND AAS IN CROSS-GENOME GPCR CLUSTERS
Conserved motifs and AAS play a crucial role in functional aspects. In membrane proteins, conserved amino acids play an important role in the GPCR mechanism and in structural stability, and their mutation can cause diseases and abnormalities. The current study (Chapter 5) therefore aims to identify the conserved motifs, along with the substituting amino acids (AAS), in sets of aligned homologous sequences, particularly cross-genome GPCR cluster datasets. As mentioned in Chapter 4, the previously established 32 clusters of eight major receptor types, from the human-Drosophila GPCR, human-C. elegans GPCR and human-only GPCR cluster datasets, were considered to identify conserved motifs. The TM-MOTIF package has been used to recognize the membrane topology of the observed motifs. A total of 33 conserved motifs have been identified from the cross-genome (human-Drosophila) GPCR cluster dataset, and 76% of them were observed in TM helices (predominantly in TM2 and TM7). Motifs observed in a single receptor type (also known as cluster/receptor-specific motifs), in two receptor types and in multiple receptor types were also studied. Interestingly, the VGL motif in TM1, the LGF motif in TM5 and the NSC motif in TM7 are observed exclusively in peptide receptors. The YLLNLA motif in TM2 and the HCC motif in TM7 are observed in chemokine type receptors. The GNL motif in TM1, VMP motif in TM2, TASI motif in TM3, PFF motif in TM6 and WLGY motif in TM7 are observed exclusively in BGA type receptors. These motifs can be referred to as receptor-specific motifs and are of particular interest since they are observed at the cross-genome level.
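Assigning motifs to membrane topology, as summarised above, can be illustrated with a small Python sketch. It assumes a consensus string in which non-conserved positions are marked '.', treats a motif as a run of at least three consecutive conserved positions, and assumes helix boundaries given as 1-based inclusive alignment positions. The function names and the toy data are hypothetical and do not reproduce TM-MOTIF.

```python
def find_motifs(consensus, min_len=3):
    """Return (start, motif) for every run of conserved positions.

    consensus: string with '.' at non-conserved alignment positions.
    start is 1-based. Runs shorter than min_len are ignored.
    """
    motifs = []
    run_start = None
    # A trailing '.' acts as a sentinel so that a run ending at the last
    # position is still reported.
    for i, ch in enumerate(consensus + "."):
        if ch != ".":
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_len:
                motifs.append((run_start + 1, consensus[run_start:i]))
            run_start = None
    return motifs


def locate(start, length, helices):
    """Return 'TMn' if the motif lies entirely inside helix n.

    helices: list of (start, end) tuples, 1-based and inclusive.
    Anything else is reported as 'loop/junction'.
    """
    end = start + length - 1
    for n, (h_start, h_end) in enumerate(helices, start=1):
        if h_start <= start and end <= h_end:
            return f"TM{n}"
    return "loop/junction"


# Toy consensus and toy helix boundaries (not real predictions).
toy_consensus = "..VGL....DRYLA.N."
toy_helices = [(1, 6), (20, 30)]
toy_motifs = find_motifs(toy_consensus)
toy_locations = [(motif, locate(start, len(motif), toy_helices))
                 for start, motif in toy_motifs]
```

Motifs that straddle a helix boundary are grouped with loops here for simplicity; a fuller version could report helix-loop junctions separately.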
The SLA motif in TM2 is identified in two receptor types, namely peptide and biogenic amine receptors. The LFL, TLP and LPF motifs in TM2, the AIA motif in TM3, the LPL motif in TM5 and the LYA motif in TM7 are observed in both peptide and chemokine type receptors. Also, the IYL motif in TM2 and the CIS motif in TM3 are observed not only in chemokine type receptors but also in nucleotide and lipid type receptors. The motif pattern DLL (also ADL, ADLL) in TM2 is observed in multiple receptor types. In addition, several motifs were identified exclusively in TM-helices, and 133 such motifs have been documented along with AAS. 59 cluster-specific motifs observed in the loop regions were also documented. For example, the CLP motif from PR (Cluster 7) has AAS in the pattern [C/P][L/F][P/C/S]. Interestingly, the maximum amino acid conservation is 42% and 46% in TM2 and TM3, respectively. Significant conservation of 55%, 80% and 61% occurs in TM1, TM2 and TM3, respectively, within CMK receptors. Although the occurrence of motifs (three consecutively conserved residues) is high in PR, it retains only 30-50% conservation at TM2, TM6 and TM7. Generally, amino acid conservation is high at TM2 for the BGA, SEC, GLR and FRZ type receptors. Motifs preserved in the loop regions were also identified, given their functional importance in structural stability, ligand binding (extracellular loops) and signaling (intracellular loops). Eight different motifs were observed in loop regions, and the well-known E/DRY motif in ICL2 is also found as DRYLA, RYL and LDR at the 60% level of conservation. The MRTVTN and ASG motifs were observed in both glutamate and peptide type receptors,
whereas the WPFG and LCK motifs were found exclusively in ECL2 of peptide type receptors. The list of identified motifs from this study illustrates conserved sequence properties (motifs) across two or more receptor types and provides clues to connect common sequence properties observed at the cross-genome level. In most of the clusters, as expected, the percentage of residue conservation in ICL2 is higher than in the other loop regions. A preliminary analysis to identify conserved motifs at the 30% level of conservation (chosen because of the evolutionary distance) has been performed for the human-C. elegans GPCR cluster dataset, and the 295 identified motifs for the cross-genome human-C. elegans GPCR clusters have been documented. The identification of conserved motifs and AAS in cross-genome GPCR clusters depends on the number of sequences, the membership/participation of sequences from a particular genome, sequence length, alignment procedure, sequence identity and evolutionary relationship. In essence, the identified motifs emphasize the importance of conserved residues for functional relevance across receptor types, and the study is all the more useful since it was pursued at the cross-genome level.
7.6 PHYLOGENETIC ANALYSIS ON ORS IN SELECTED EUKARYOTIC GENOMES
Olfactory receptors (ORs) belong to the largest group of class A type GPCRs (Gaillard et al 2004) and are fascinating for their vast practical applications. The current study (Chapter 6) aims to perform phylogenetic analysis on certain olfactory receptors in selected eukaryotic genomes. Primarily, 371 OR sequences were collected from various data resources and
an unrooted NJ phylogenetic analysis was conducted with 1000 BS replicates. Interestingly, the selected human OR sequences were distributed in 10 subclusters (namely HSC1-HSC10) and showed remarkable differentiation in the tree topology. Among the 10 OR subclusters, HSC1 remains distinct and retains the class I type receptors, which are reported to be responsible for sensing water-borne odours and could be fish-like ORs (Freitag et al 1998). The other subclusters, HSC2-HSC10, correspond to the class II type receptors (mammalian-like ORs). Notably, almost all receptors in HSC1 are from chromosome 11 and are related to class I type receptors. The human OR subclusters exhibit percentage identities ranging from 44% to 54%, showing the sequence diversity at the intra-genomic level needed to recognize complex and diverse odors (Hayden et al 2010). A total of 163 motifs exhibiting 60% conservation were identified from the 10 OR subclusters. These include both common and cluster-specific motifs for the 10 human OR subclusters, located in various topological regions such as TM-helices, loops, helix-loop junctions, loop-helix junctions, N- and C-termini, the N'-TM1 junction and the TM7-C' junction (Table section 6.1 in Chapter 6). From the generated human OR phylogeny, 50 representative sequences have been recommended for three-dimensional modelling. Separately, a cross-genome OR phylogeny with human and selected fish ORs suggests that HSC1 in the human OR phylogeny is related to fish-like ORs (class I type) (Section 6.2 in Chapter 6), and a cross-genome phylogeny of human ORs with frog ORs (frogs having a dual lifestyle and sensing both air-borne and water-borne odors) helped to discriminate the class I (water-borne) and class II (air-borne) type receptors in the human OR phylogeny. Interestingly, the KAFSTC motif, related to class I type receptors, is conserved both in HSC1 and in the ORs of fishes such as zebrafish. This illustration further confirms the effectiveness of phylogenetic clustering in
associating related sequences across genomes. In parallel, it emphasizes the necessity of identifying motifs to understand sequence features at the cross-genome level. Drosophila olfaction is an interesting field of study; notably, fly ORs exhibit reverse topology (Benton et al 2006, Wistrand et al 2006). The phylogeny of 60 selected Drosophila ORs established the cluster association as 10 subclusters, namely DmC1-DmC10. The 24 known antennal receptors (Hallem et al 2004, Dobritsa et al 2003) were found to be distributed in eight subclusters, i.e., all except DmC4 and DmC8. Interestingly, OR83b and the associated sequences 83a, 104, 63a and 67b (DmC8) can be further examined for common motifs (at various levels of conservation, such as 30-60%), predicted for secondary structure and examined for ion-channel properties. Overall, the candidate OR sequences in the phylogeny share only 18% average sequence identity, reflecting the diverse requirements of fly olfaction. A cross-genome phylogenetic analysis was attempted on selected Drosophila ORs, human ORs and OR-like sequences from yeast. The resulting phylogeny clearly depicts a distant cluster of Drosophila ORs, with no significant co-clustering between the selected ORs of human and Drosophila. This indicates that insect ORs are evolutionarily distinct from mammalian ORs. This could be due to the independent evolution of fly ORs and the life style of fruit flies in sensing specific odours, whereas higher-order organisms have established a complex olfactory system. The reverse topology observed in fly ORs could be another reason for the lack of co-clustering with human ORs. Chemosensory receptors in nematodes are known to be highly diverse and abundant. Nearly 20 families of serpentine receptors participate in
chemosensation (Robertson 1998, Robertson and Thomas 2006), and odr-10 in particular is the only annotated olfactory receptor in C. elegans; it is capable of sensing di-acetyl compounds. A cross-genome phylogeny of selected human ORs with odr-10 and 82 homologues of odr-10 (Chapter 3) showed a lack of co-clustering. Interestingly, the obtained subcluster CeC3 retains odr-10 and 22 serpentine receptors, all belonging to the Str superfamily. In particular, Str-115 is found to be the nearest homologue of odr-10 through phylogenetic cluster associations. Like human ORs, mouse ORs fall into two broad classes, and mouse ORs are comparatively more abundant and more diverse than human ORs (Zhang and Firestein 2002). Human ORs are predominantly distributed on chromosomes 11 and 1, whereas mouse ORs are scattered over all chromosomes except chromosome 12 and Y. For the cross-genome phylogeny, a preliminary study was conducted with 50 representative human OR sequences and 410 mouse ORs, which were co-aligned. The resulting NJ phylogeny exhibited significant co-clustering: notably, 72 mouse ORs co-clustered with the class I type human ORs, and the remaining 45 human OR sequences were distributed along with the other mouse ORs, representing class II type receptors. This further helps to discriminate class I and class II type ORs in the mouse. It is also an appropriate example of the effective use of representative sequences in cross-genome phylogeny. A chemokine receptor, namely Duff_human (a recently evolved GPCR), was also included along with the human and mouse ORs, and it remains an outgroup. Since only a limited number of human ORs were considered for the cross-genome survey, the scope for critical analysis of mouse OR subclusters is limited, and the phylogeny can be further improved by adding more human ORs.
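Cluster diversity in this section is summarised as average percentage identity (for example, 44% to 54% for the human OR subclusters and 18% for the fly ORs). A minimal Python sketch of such a summary from an aligned cluster is given below. The thesis does not state how identity was normalised, so counting only columns where both sequences carry a residue is an assumption, as are the function names.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percentage identity between two aligned sequences.

    Only columns in which both sequences have a residue (no gap '-')
    are compared; this normalisation is an assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def average_identity(aligned):
    """Mean pairwise percentage identity over all sequence pairs in a cluster."""
    scores = [percent_identity(a, b) for a, b in combinations(aligned, 2)]
    return sum(scores) / len(scores) if scores else 0.0


# Toy aligned cluster (not real OR sequences).
toy_cluster = ["AAAA", "AAAT", "AATT"]
toy_average = average_identity(toy_cluster)
```

The same pairwise values, converted to distances, are the kind of input a distance-based method such as NJ works from, although tree-building programs normally apply their own distance corrections.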
Also, a pilot study was carried out on ORs from human and non-human primates, which were predicted for membrane topology and analyzed for cluster arrangements at intra- and inter-genomic levels. In essence, the ORs of each genome are peculiar to its sense of olfaction, and the cross-genome phylogenetic analysis addresses conserved evolutionary trends, clustering and orthologs at intra- and inter-genomic levels for olfactory receptors (ORs) in selected eukaryotic genomes. To create a non-redundant data repository for the ORs in selected eukaryotic genomes (Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus and Homo sapiens), the related sequence information, 3D models of representative ORs (K. Harini, NCBS, Bangalore) and dimer-interface predictions (Dr. Nemato, AIST, Japan) have been compiled and deposited to construct a Database of Olfactory Receptors, namely DOR. DOR provides sequence and structural information on olfactory receptors (ORs) of selected organisms. In particular, OR sequences, predicted membrane topology, cross-genome OR alignments and phylogeny, and a tool for motif identification (TM-MOTIF) are available. The Sequence option provides non-redundant OR sequences for the targeted eukaryotic genomes. The TM-boundaries option provides the predicted TM-helices for each OR sequence, with the start and end position of each predicted helix; the predicted boundaries of the seven helices are shown in seven different colours (VIBGYOR colouring scheme) for easy observation. The Alignment option provides not only the MSA for a single genome but also the cross-genome alignments. These alignments can be further used to detect conserved motifs
and cross-genome alignments in particular are very useful from an evolutionary perspective. The phylogenetic trees for uni-genome and cross-genome datasets help to study the cluster associations and to select representative sequences for further analysis. The generated phylogeny (single and cross-genome) further helps to understand sequence properties at intra- and inter-genomic levels. These sequence studies could effectively be used to detect cluster-specific motifs from the MSA, and species-specific cluster associations and co-cluster associations in the cross-genome phylogeny. The best representative sequences selected from the clusters can be used for homology modelling and dimer-interface prediction, to discover functionally important residues and ligand-binding pockets. The list of non-redundant OR sequences can be further used to train an SVM to identify potential OR sequences, which can also be applied to identify orthologs across genomes. As an initiative in applying this sequence knowledge, TM-MOTIF, a tool to detect motifs in a set of aligned OR sequences, was incorporated in the database. An inbuilt dataset of 10 human OR subclusters is available in the TM-MOTIF package and is downloadable. Users can also use their sequence of interest to view the alignment in the VIBGYOR colouring scheme, with the identified conserved motifs and AAS at each position of the alignment. The olfactory receptor models give users the opportunity to analyse the interactions between helices and the conservation of residues within helices, and to generate electrostatic contour maps. This would further help in understanding the mechanism of function of olfactory receptors. The dimer-interface prediction for every structure guides us further to study
the oligomerization process of these receptors and the functional significance of such higher-order entities. In short, DOR (Database of Olfactory Receptors) is a user-friendly, composite resource with sequence and structural information on several ORs. Users can retrieve and download both sequence and structure information on ORs for five eukaryotic genomes. The list of non-redundant OR sequences can further be used to train machine learning algorithms to identify potential OR sequences, and can also be applied to identify orthologs across genomes. The database can be accessed from http://caps.ncbs.res.in/DOR.

7.7 SUMMARY

To conclude, my research on the genome-wide survey of certain mammalian GPCRs and ORs provides useful insights for the scientific community, particularly for scholars interested in olfaction and membrane proteins, and can be applied to the fields of molecular modelling and drug design. Membrane proteins are vital for cellular activities, are of pharmaceutical importance and are relevant to human healthcare; sequence analysis of these proteins across genomes therefore provides an excellent opportunity, and a responsibility, to convey knowledge on sequence properties and, further, to connect structure and function. The study on cross-genome clustering of biologically significant GPCRs of human and C. elegans is useful in introducing a profile-based clustering technique such as RPS-BLAST. The recognition of related sequences across genomes paves the way for comparative genomics, the usage of
viable model organisms, and is indeed a valid starting point for experimental studies of functional implications. Separately, phylogeny-guided sequence analyses across genomes can be explored for conserved sequence features such as domain architecture, motifs, amino acid substitutions and orthology. Receptor-specific sequence properties, in turn, can be used with the support vector machine (SVM), a machine learning approach, to predict putative candidate receptors across genomes. Sequence information on known proteins guides the prediction of the structural and functional relevance of an unknown sequence when the two are associated by a clustering technique or phylogeny. The confidence with which functional relevance can be transferred from the reference sequence to the unknown sequence is high when the degree of sequence identity, the BS value or the E-value is favourable. Besides, evolutionary pressure also plays a major role in relating sequence features across genomes. The exclusive study on serpentine receptors (Chapter 3) could inspire the use of the illustrated sequence properties (such as motifs) of odr-10 and its homologues to train an SVM and to identify putative chemoreceptors in other nematode species.

The designed TM-MOTIF package has completed alpha testing and is a user-friendly tool to visualize motifs in TM proteins in the VIBGYOR colouring scheme in a large alignment window. It can be used as an academic tool-kit to identify sequence motifs in membrane proteins. It is helpful for generating a pairwise alignment between a query and a reference sequence (whose structure is known), a prerequisite for homology modelling. The package can be used effectively to identify not only the conserved motifs but also the substituting amino acids. Such studies inspire the application of bioinformatic approaches to handle biological data effectively.
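The idea of reading conserved motifs and their substituting amino acids off an alignment can be illustrated with a short sketch. It is not the TM-MOTIF implementation, whose internals are not described here. It assumes that a column counts as conserved when its most frequent residue reaches a threshold (60 % is borrowed from the caption of Table A1.1) and that a motif is a run of at least three consecutive conserved columns. The function names `conserved_columns` and `group_motifs`, the `min_length` parameter and the toy alignment are all hypothetical.

```python
from collections import Counter


def conserved_columns(alignment, threshold=0.6):
    """Return (position, consensus residue, substitutions) per conserved column.

    Positions are 1-based. Gaps ('-') are ignored when counting residues but
    still count towards the number of sequences (an assumption of this sketch).
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    hits = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        if not counts:
            continue  # column made only of gaps
        consensus, n = counts.most_common(1)[0]
        if n / len(alignment) >= threshold:
            substitutions = sorted(r for r in counts if r != consensus)
            hits.append((col + 1, consensus, substitutions))
    return hits


def group_motifs(hits, min_length=3):
    """Join runs of consecutive conserved columns into (start, end, motif)."""
    runs, run = [], []
    for hit in hits:
        if run and hit[0] != run[-1][0] + 1:
            if len(run) >= min_length:
                runs.append(run)
            run = []
        run.append(hit)
    if len(run) >= min_length:
        runs.append(run)
    return [(r[0][0], r[-1][0], "".join(h[1] for h in r)) for r in runs]


# Toy alignment, invented for illustration only.
toy_alignment = ["MKTPA", "MKTPG", "LKSPS", "MKTPC", "M-TDA"]
toy_hits = conserved_columns(toy_alignment)
print(toy_hits)
print(group_motifs(toy_hits))
```

On the toy alignment the first four columns pass the threshold and are reported as one motif, each with its list of observed substitutions, while the fifth column is too variable to qualify.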
Insights from the analysis of conserved motifs and permitted amino acid exchanges in the human, fly and worm GPCR clusters provide knowledge on sequence features conserved across taxa, which helps to establish structure-function relationships and can be carried further into practical applications. Information on ORs, organized as a database of olfactory receptors (DOR), should assist the study of structural details, ligand-binding properties, associated mechanisms and proteins (OBP) and signaling, with practical applications in pest control, the pharmaceutical industry (aroma therapy), the cosmetic industry (scent/perfume manufacturing), the food industry, olfacto-sexual function, the study of olfacto-neural communication and olfactory disorders, and forensics and defense studies. Thus, performing genome-wide surveys of GPCRs and ORs from selected eukaryotic organisms will improve scientific credibility and ultimately serve human benefit.
APPENDIX 1 THE LIST OF IDENTIFIED FAMILY-SPECIFIC MOTIFS IN SR
Table A1.1 List of observed motifs in Serpentine receptor families (60 % level of conservation)
S.No | Alignment Position | Motif | Location | Super family | AAS | Symbols for AAS
1 | 148-150 | KTP | TM1,ICL1 | Srh | [K/E/H/I/L/N/Q/R/S/T][T/C/K/S][P/D/E/K/L/S/T] | [+/-/+/@/@/$/$/+/$/$][$/$/+/$][$/-/-/+/@/$/$]
2 | 246-249 | FENR | TM3,ICL2 | Srh | [F/I/L/M/T/V/W/Y][E/C/D/F/I/K/L/M/Q/T/V/Y][N/C/D/E/F/G/H/I/Q/R/S/T/Y][R/Q/S] | [*/@/@/@/$/@/*/*][-/$/-/*/@/+/@/@/$/$/@/*][$/$/-/-/*/$/+/@/$/+/$/$/*][+/$/$]
3 | 530-532 | PYR | C' | Srh | [P/A/C/D/G/H/L/N/S/T/V][Y/F/H/I/L/V][R/H/K/L/Q/T/W/Y] | [$/@/$/-/$/+/@/$/$/$/@][*/*/+/@/@/@][+/+/+/@/$/$/*/*]
1 | 35-37 | IYL | TM1 | Sri | [I/F/L/M/S/V][Y/F/L/M][L/F/I/M/V] | [@/*/@/@/$/@][*/*/@/@][@/*/@/@/@]
2 | 143-145 | KHQ | ICL2 | Sri | [K/R][H/N/Q/Y][Q/E/H/K/N/R] | [+/+][+/$/$/*][$/-/+/+/$/+]
3 | 357-359 | VLI | TM7 | Sri | [V/C/I/S][L/M/V][I/F/L/M/V] | [@/$/@/$][@/@/@][@/*/@/@/@]
1 | 22-24 | VNP | TM1 | Srj | [V/F/I/L]N[P/Q] | [@/*/@/@]$[$/$]
2 | 26-31 | FIYLIF | TM1 | Srj | [F/T][I/V][Y/C/F][L/F/I][I/A/V][F/H/L/V/W] | [*/$][@/@][*/$/*][@/*/@][@/@/@][*/+/@/@/*]
3 | 38-42 | FGNYR | ICL1 | Srj | [F/L/M/V]G[N/S]Y[R/K] | [*/@/@/@]$[$/$]*[+/+]
4 | 44-46 | LLL | TM2 | Srj | [L/I][L/M/S][L/I/V/Y] | [@/@][@/@/$][@/@/@/*]
5 | 51-53 | FNL | TM2 | Srj | [F/Y][N/D][L/F/I/M] | [*/*][$/-][@/*/@/@]
6 | 69-71 | YRY | ECL1 | Srj | [Y/F/H/N][R/G/K/S][Y/H] | [*/*/+/$][+/$/+/$][*/+]
7 | 99-101 | RCS | ECL1,TM3 | Srj | R[C/A/I/T][S/A/G] | +[$/@/@/$][$/@/$]
8 | 116-119 | YRYL | TM3,ICL2 | Srj | [Y/F]R[Y/F][L/F/I/M] | [*/*]+[*/*][@/*/@/@]
9 | 200-202 | RSW | ECL2,TM5 | Srj | [R/K/L][S/A/T][W/I/R] | [+/+/@][$/@/$][*/@/+]
10 | 252-258 | RALIVQT | TM6 | Srj | [R/K/L/V][A/T]L[I/T/V][V/I]Q[T/A/S] | [+/+/@/@][@/$]@[@/$/@][@/@]$[$/@/$]
11 | 260-262 | IPI | TM6 | Srj | [I/T/V]P[I/S/T] | [@/$/@]$[@/$/$]
12 | 276-280 | PIFGI | ECL3 | Srj | [P/A/L][I/A/M/V][F/I/L/S][G/D/N][I/F/L/V] | [$/@/@][@/@/@/@][*/@/@/$][$/-/$][@/*/@/@]
13 | 304-307 | AIIL | TM7 | Srj | A[I/L/V][I/V][L/F/I/V/Y] | @[@/@/@][@/@][@/*/@/@/*]
1 | 268-270 | YRY | TM3,ICL2 | Str | [Y/C/F][R/L][Y/A/C/F/H/L/S/T/V/W] | [*/$/*][+/@][*/@/$/*/+/@/$/$/@/*]
2 | 493-495 | QLF | ICL3,TM6 | Str | [Q/D/E/N][L/F/I/Y][F/H/L/M/T/V/Y] | [$/-/-/$][@/*/@/*][*/+/@/@/$/@/*]
1 | 124-126 | GPC | TM2,ECL1 | Srd | G[P/F/L/S/V/Y][C/G/I/L] | $[$/*/@/$/@/*][$/$/@/@]
2 | 409-411 | YFV | TM7 | Srd | [Y/F/I/L/T/V][F/H/S/T/Y][V/F/I/L] | [*/*/@/@/$/@][*/+/$/$/*][@/*/@/@]
3 | 413-415 | PYR | TM7,C' | Srd | [P/H/I][Y/F][R/F/K/Q] | [$/+/@][*/*][+/*/+/$]
1 | 25-27 | YLL | TM1 | Srbc | [Y/M/N/S][L/I][L/I/V] | [*/@/$/$][@/@][@/@/@]
2 | 29-31 | SIF | TM1,ICL1 | Srbc | [S/I/K/T][I/T][F/L/V] | [$/@/+/$][@/$][*/@/@]
3 | 93-95 | KNL | ECL1,TM3 | Srbc | [K/R][N/S][L/F/I/V] | [+/+][$/$][@/*/@/@]
4 | 127-129 | FPI | ICL2 | Srbc | [F/S/V][P/S][I/L/Q/T/V] | [*/$/@][$/$][@/@/$/$/@]
5 | 161-163 | LFG | TM4 | Srbc | [L/I/M][F/Y][G/C/E/V] | [@/@/@][*/*][$/$/-/@]
6 | 237-241 | IALLD | TM6 | Srbc | [I/F/L/V]A[L/M][L/F/I/V]D | [@/*/@/@]@[@/@][@/*/@/@]-
1 | 172-176 | IDRLI | TM3,ICL2 | Srsx | [I/L/M/V][D/I][R/I/K/L/V][L/F/V/Y][I/L/R/V/Y] | [@/@/@/@][-/@][+/@/+/@/@][@/*/@/*][@/@/+/@/*]
1 | 150-153 | LTRK | TM1,ICL1 | Srw | [L/C/I][T/C/F/G/I/L/N/S][R/H/K/N/Q][K/E/P/R/S/T] | [@/$/@][$/$/*/$/@/@/$/$][+/+/+/$/$][+/-/$/+/$/$]
2 | 660-664 | SSQYR | TM7,C' | Srw | [S/A/C/D/E/H/N/P][S/A/C/D/H/I/L/R/T/V][Q/A/E/H/K/L/M/N/R/V][Y/C][R/C/E/K/N/Q/S] | [$/@/$/-/-/+/$/$][$/@/$/-/+/@/@/+/$/@][$/@/-/+/+/@/@/$/+/@][*/$][+/$/-/+/$/$/$]
1 | 28-30 | SLN | N' | Sra | S[L/F/I][N/W] | $[@/*/@][$/*]
2 | 32-35 | KISQ | TM1 | Sra | [K/I/Q][I/A/L][S/A/N][Q/F] | [+/@/$][@/@/@][$/@/$][$/*]
3 | 44-46 | LTF | TM1 | Sra | [L/F/T][T/A][F/L/Y] | [@/*/$][$/@][*/@/*]
4 | 65-70 | STKILL | ICL1,TM2 | Sra | [S/G][T/S][K/Q]ILL | [$/$][$/$][+/$]@@@
5 | 73-75 | NLF | TM2 | Sra | [N/T][L/I][F/L/V] | [$/$][@/@][*/@/@]
6 | 77-79 | ANL | TM2 | Sra | [A/T]N[L/I] | [@/$]$[@/@]
7 | 124-126 | SGM | TM4 | Sra | S[G/V]M | $[$/@]@
8 | 128-134 | YGQTGLL | TM4 | Sra | [Y/F][G/C][Q/S]TGLL | [*/*][$/$][$/$]$$@@
9 | 143-146 | CATF | ECL2 | Sra | [C/F]A[T/L][F/Y] | [$/*]@[$/@][*/*]
10 | 165-167 | ISI | TM5 | Sra | [I/T/V][S/A][I/L] | [@/$/@][$/@][@/@]
11 | 177-179 | STG | TM5 | Sra | [S/I/T][T/S/Y][G/A] | [$/@/$][$/$/*][$/@]
12 | 184-188 | WDDPL | ICL3 | Sra | WDD[P/S][L/I/R] | *--[$/$][@/@/+]
13 | 219-221 | FNL | TM6 | Sra | [F/A/I]N[L/C/F] | [*/@/@]$[@/$/*]
14 | 233-235 | YNK | ICL3 | Sra | [Y/H][N/K][K/D/E] | [*/+][$/+][+/-/-]
15 | 261-265 | ICFLT | TM6 | Sra | IC[F/S][L/V][T/A/N] | @$[*/$][@/@][$/@/$]
16 | 271-273 | FMF | TM6 | Sra | [F/A/W][M/L/V][F/S] | [*/@/*][@/@/@][*/$]
17 | 275-278 | YSFG | TM6 | Sra | [Y/N/S][S/T][F/A/S][G/A] | [*/$/$][$/$][*/@/$][$/@]
18 | 298-300 | VVW | TM7 | Sra | [V/I][V/A/Q][W/Y] | [@/@][@/@/$][*/*]
19 | 305-309 | PFIAL | TM7 | Sra | [P/V][F/I/Y][I/G/V][A/N/V][L/A] | [$/@][*/@/*][@/$/@][@/$/@][@/@]
20 | 337-341 | KQTQD | C' | Sra | [K/T][Q/G]T[Q/V][D/E] | [+/$][$/$]$[$/@][-/-]
21 | 343-346 | HIKQ | C' | Sra | H[I/M][K/N/S][Q/H/S] | +[@/@][+/$/$][$/+/$]
1 | 1-3 | MIF | N' | Sre | MI[F/I] | @@[*/@]
2 | 18-20 | PIY | N' | Sre | P[I/T/V][Y/F/T] | $[@/$/@][*/*/$]
3 | 114-117 | WTDD | ECL1 | Sre | WT[D/K/S][D/I] | *$[-/+/$][-/@]
4 | 198-200 | FFN | TM4,ECL2 | Sre | [F/L/T][F/Y][N/H/Q] | [*/@/$][*/*][$/+/$]
5 | 255-261 | RFQAKEN | ICL3 | Sre | [R/Q][F/Y]Q[A/V][K/M/R]EN | [+/$][*/*]$[@/@][+/@/+]-$
6 | 305-307 | FEN | ECL3 | Sre | [F/V][E/D/Q][N/A/S] | [*/@][-/-/$][$/@/$]
7 | 311-314 | LNPL | TM7 | Sre | [L/V][N/G]P[L/S/V] | [@/@][$/$]$[@/$/@]
8 | 376-378 | ETD | C' | Sre | ETD | -$-
1 | 260-262 | LRK | TM5,ICL3 | Srv | [L/I][R/H/K][K/E] | [@/@][+/+/+][+/-]
2 | 348-350 | INP | TM7 | Srv | INP | @$$
1 | 182-184 | NRF | TM3,ICL2 | Srx | [N/E/I/K/V]R[F/T/V/W/Y] | [$/-/@/+/@]+[*/$/@/*/*]
1 | 20-22 | YGS | N' | Srt | [Y/C/F][G/M/V][S/F/I/L] | [*/$/*][$/@/@][$/*/@/@]
2 | 26-28 | IPL | N' | Srt | [I/F/T][P/H/Q/S/T][L/F/M] | [@/*/$][$/+/$/$/$][@/*/@]
3 | 33-36 | YNCS | N' | Srt | [Y/A][N/D/G/K/S]C[S/P] | [*/@][$/-/$/+/$]$[$/$]
4 | 97-99 | RPI | N',TM1 | Srt | [R/H/Q/Y][P/Q/T/Y][I/F/L/P/V] | [+/+/$/*][$/$/$/*][@/*/@/$/@]
5 | 181-184 | LYIP | TM1 | Srt | [L/F/I/V][Y/T][I/F/L/T/V][P/I/L] | [@/*/@/@][*/$][@/*/@/$/@][$/@/@]
6 | 202-204 | KIM | ICL1,TM2 | Srt | [K/E/Q/R][I/L/M/T/V]M | [+/-/$/+][@/@/@/$/@]@
7 | 218-220 | NSI | TM2 | Srt | [N/H/S][S/C][I/F/L/V] | [$/+/$][$/$][@/*/@/@]
8 | 228-233 | QGAVFC | TM2,ECL1 | Srt | [Q/E/F/I/K/M/Y]G[A/I][V/A/S][F/Y]C | [$/-/*/@/+/@/*]$[@/@][@/@/$][*/*]$
9 | 238-241 | LIYI | ECL1,TM3 | Srt | [L/F/P][I/F][Y/F][I/C/F/L/V] | [@/*/$][@/*][*/*][@/$/*/@/@]
10 | 313-317 | WFFDP | ECL2 | Srt | [W/Y][F/L][F/Y][D/N]P | [*/*][*/@][*/*][-/$]$
11 | 393-395 | IYV | TM6 | Srt | [I/A/L/V][Y/S][V/E/I/T] | [@/@/@/@][*/$][@/-/@/$]
12 | 397-399 | MNF | TM6,ECL3 | Srt | [M/E/I/S][N/E/F/M/Q/S][F/L/Y] | [@/-/@/$][$/-/*/@/$/$][*/@/*]
13 | 424-426 | IYL | TM7 | Srt | [I/A/S/V]Y[L/F/I] | [@/@/$/@]*[@/*/@]
14 | 431-434 | TIRN | C' | Srt | [T/A/L/M/Q][I/M]R[N/K/Q/S] | [$/@/@/@/$][@/@]+[$/+/$/$]
1 | 32-34 | AYL | TM1 | Srg | AYL | @*@
2 | 43-47 | RILYV | TM1 | Srg | RILYV | +@@*@
3 | 90-93 | PQLC | ICL1 | Srg | PQLC | $$@$
4 | 131-134 | NRMS | TM3,ICL2 | Srg | NRMS | $+@$
5 | 160-162 | APF | TM4 | Srg | APF | @$*
6 | 165-167 | IWN | TM4 | Srg | IWN | @*$
7 | 180-182 | GGF | ECL2 | Srg | GGF | $$*
8 | 192-194 | WAS | ECL2 | Srg | WAS | *@$
9 | 212-214 | VTT | TM6 | Srg | VTT | @$$
10 | 284-289 | VGSPLV | TM7 | Srg | VGSPLV | @$$$@@
 | 72-74 | ILL | ICL1 | Sra | [I/C/S/T/V]L[L/I/S] | [@/$/$/$/@]@[@/@/$]
 | 282-284 | RFQ | ICL3 | Sra | [R/Q][F/Y][Q/H/N/R] | [+/$][*/*][$/+/$/+]
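The "Symbols for AAS" column can be derived mechanically from the AAS column once each residue is assigned to a class. The sketch below uses a mapping inferred by comparing the two columns row by row; it is an assumption, not a definition taken from this appendix. The class labels in the comments (basic, acidic and so on) are likewise interpretive, and the names `SYMBOL_CLASSES` and `aas_to_symbols` are hypothetical.

```python
# Residue classes inferred from the rows of Table A1.1 (an assumption of this
# sketch, not a definition stated in the appendix).
SYMBOL_CLASSES = {
    "+": "KRH",      # basic
    "-": "DE",       # acidic
    "@": "AILMV",    # aliphatic / hydrophobic
    "$": "STNQCGP",  # polar or small
    "*": "FWY",      # aromatic
}

# Invert the class table into a residue -> symbol lookup.
RESIDUE_TO_SYMBOL = {
    residue: symbol
    for symbol, residues in SYMBOL_CLASSES.items()
    for residue in residues
}


def aas_to_symbols(pattern: str) -> str:
    """Replace each residue letter by its class symbol.

    Brackets and slashes are copied through unchanged, so the grouping of the
    AAS pattern is preserved.
    """
    return "".join(RESIDUE_TO_SYMBOL.get(ch, ch) for ch in pattern)


# Two patterns taken from the table (Sri motif KHQ and Sra motif WDDPL).
print(aas_to_symbols("[K/R][H/N/Q/Y][Q/E/H/K/N/R]"))
print(aas_to_symbols("WDD[P/S][L/I/R]"))
```

For the two patterns shown, the output matches the symbol strings listed in the table for those rows, which is the basis for the inferred mapping.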
260
REFERENCE
1.
Abaffy, T. and DeFazio, A.R. "The location of olfactory receptors within olfactory epithelium is independent of odorant volatility and solubility" BMC Res Notes, PubMed PMID: 21548958; PubMed Central PMCID: PMC3118157, Vol. 6, No. 37, pp. 23, 2011. Ache, B.W. and Young, J.M. "Olfaction: diverse species, conserved principles" Neuron, Vol. 48, pp. 417-30, 2005. Adachi, I. and Hasegawa, M. "Model of amino acid substitution in proteins encoded by mitochondrial DNA", J Mol Evol, Vol. 42, pp. 459-68, 1996. Adams, M.D. "The genome sequence of Drosophila melanogaster", Science, Vol. 287, pp. 2185-95, 2000. Alcedo, J., Ayzenzon, M., Von Ohlen, T., Noll, M. and Hooper, J.E. "The Drosophila smoothened gene encodes a seven-pass membrane protein, a putative receptor for the hedgehog signal" Cell, Vol. 86, pp. 221-32, 1996. Alfarano, C., Andrade, C.E., Anthony, K., Bahroos, N., Bajec, M., Bantoft, K., Betel, D., Bobechko, B., Boutilier, K., Burgess, E., Buzadzija, K., Cavero, R., D'Abreo, C., Donaldson, I., Dorairajoo, D., Dumontier, M.J., Dumontier, M.R. and Earles, V., "The Biomolecular Interaction Network Database and related tools 2005 update", Nucleic Acids Res, Vol. 33, pp. D418-24, 2005. Alioto, T.S. and Ngai, J. "The odorant receptor repertoire of teleost fish", BMC Genomics, Vol. 6, pp. 173, 2005. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W. and Lipman, D.J. "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res, Vol. 25, pp. 3389-402, 1997. Arnold, K., Bordoli, L., Kopp, J. and Schwede, T. "The SWISSMODEL workspace: a web-based environment for protein structure homology modelling", Bioinformatics, Vol. 22, pp. 195-201, 2006.
2. 3.
4. 5.
6.
7. 8.
9.
261
10. 11.
Bargmann, C.I. "Comparative chemosensation from receptors to ecology", Nature, Vol. 444, pp. 295-301, 2006. Barth, A.L., Justice, N.J. and Ngai, J. "Asynchronous onset of odorant receptor expression in the developing zebrafish olfactory system", Neuron, Vol. 16, pp. 23-34, 1996. Bateman, A., Birney, E., Cerruti, L., Durbin, R., Etwiller, L., Eddy, S.R., Griffiths-Jones, S., Howe, K.L., Marshall, M. and Sonnhammer, E.L. "The Pfam protein families database" Nucleic Acids Res, Vol. 30, pp. 276-80, 2002. Benton, R., Sachse, S., Michnick, S.W. and Vosshall, L.B. "Atypical membrane topology and heteromeric function of Drosophila odorant receptors in vivo", PLoS Biol, Vol. 4, pp. e20, 2006. Berry, M.D. "The potential of trace amines and their receptors for treating neurological and psychiatric diseases" Rev Recent Clin Trials, Vol. 2, pp. 2007. Bhadra, R., Sandhya, S., Abhinandan, K.R., Chakrabarti, S., Sowdhamini, R. and Srinivasan, N. "Cascade PSI-BLAST web server: a remote homology search tool for relating protein domains", Nucleic Acids Res, Vol. 34, pp. W143-6, 2006. Bhanot, P., Brink, M., Samos, C.H., Hsieh, J.C., Wang, Y., Macke, J.P., Andrew, D., Nathans, J. and Nusse, R. "A new member of the frizzled family from Drosophila functions as a Wingless receptor", Nature, Vol. 382, pp. 225-30, 1996. Bjarnadottir, T.K., Gloriam, D.E., Hellstrand, S.H., Kristiansson, H., Fredriksson, R. and Schioth, H.B. "Comprehensive repertoire and phylogenetic analysis of the G protein-coupled receptors in human and mouse" Genomics, Vol. 88, pp. 263-73, 2006. Bockaert, J. and Pin, J.P. "Molecular tinkering of G protein-coupled receptors: an evolutionary success", Embo J, Vol. 18, pp. 1723-9, 1999. Bowie, J.U., Luthy, R. and Eisenberg, D. "A method to identify protein sequences that fold into a known three-dimensional structure", Science, Vol. 253, pp. 164-70, 1991. Bozza, T.C. and Kauer, J.S. "Odorant response properties of convergent olfactory receptor neurons" J Neurosci, Vol. 
18, pp. 4560-9, 1998.
12.
13.
14.
15.
16.
17.
18.
19.
20.
262
21.
Breer, H. "Olfactory receptors: molecular basis for recognition and discrimination of odors", Anal Bioanal Chem, Vol. 377, pp. 427-33, 2003. Brenner, S. "The genetics of Caenorhabditis elegans", Genetics, Vol. 77, pp. 71-94, 1974. Buck, L. and Axel, R. "A novel multigene family may encode odorant receptors: a molecular basis for odor recognition", Cell, PubMed PMID:1840504, Bulger, M., van Doorninck, J.H., Saitoh, N., Telling, A., Farrell, C., Bender, M.A., Felsenfeld, G., Axel, R. and Groudine, M. "Conservation of sequence and structure flanking the mouse and human beta-globin loci: the beta-globin genes are embedded within an array of odorant receptor genes", Proc Natl Acad Sci U S A, Vol. 96, pp. 5129-34, 1999. Chang, A.B., Lin, R., Keith Studley, W., Tran, C.V. and Saier, M.H. "Phylogeny as a guide to structure and function of membrane transport proteins", Mol Membr Biol., Vol. May-Jun, Vol. 21, pp. 171-81, 2004 Chen, B.L., Hall, D.H. and Chklovskii, D.B. "Wiring optimization can relate neuronal structure and function", Proc Natl Acad Sci U S A, Epub 2006 Mar 14. PubMed PMID: 16537428; PubMed Central PMCID: PMC1550972, Vol. 21, No. 12, pp. 4723-8, 2006. Chen, C.P., Kernytsky, A. and Rost, B. "Transmembrane helix predictions revisited", Protein Sci, Vol.12,pp.2774-91, 2002. Chen, N., Lawson, D., Bradnam, K., Harris, T.W. and Stein, L.D. "WormBase as an integrated platform for the C. elegans ORFeome", Genome Res, Vol. 14, No. 10B, pp. 2155-61, 2004. Chen, N., Pai, S., Zhao, Z., Mah, A., Newbury, A., Johnsen, R.C., Altun, Z., Moerman, D.G., Baillie, D.L. and Stein, L.D. "Identification of a nematode chemosensory gene family", Proc Natl Acad Sci U S A, Vol. 102, pp. 146-51, 2005. Chess, A., Simon, I., Cedar, H. and Axel, R. "Allelic inactivation regulates olfactory receptor gene expression", Cell, Vol. 78, pp. 823-34, 1994. Cho, S., Rogers, K.W. and Fay, D.S. "The C. 
elegans glycopeptide hormone receptor ortholog, FSHR-1, regulates germline differentiation and survival", Curr Biol, Vol. 17, pp. 203-12, 2007.
22. 23.
24.
25.
26.
27. 28.
29.
30.
31.
263
32.
Chou, Y.H., Spletter, M.L., Yaksi, E., Leong, J.C., Wilson, R.I. and Luo, L. "Diversity and wiring variability of olfactory local interneurons in the Drosophila antennal lobe", Nat Neurosci, Vol. 13, pp. 439-49, 2010. Clyne, P., Grant, A., O'Connell, R. and Carlson, J.R. "Odorant response of individual sensilla on the Drosophila antenna", Invert Neurosci, Vol. 3, pp. 127-35, 1997. Clyne, P.J., Warr, C.G., Freeman, M.R., Lessing, D., Kim, J. and Carlson, J.R. "A novel family of divergent seven-transmembrane proteins: candidate odorant receptors in Drosophila", Neuron, Vol. 22, pp. 327-38, 1999. Conzelmann, S., Levai, O., Bode, B., Eisel, U., Raming, K., Breer, H. and Strotmann, J. "novel brain receptor is expressed in a distinct population of olfactory sensory neurons", Eur J Neurosci., Vol. 12, No. 11, pp. 3926-34, 2000. Coulier, F., Pontarotti, P., Roubin, R., Hartung, H., Goldfarb, M. and Birnbaum, D. "Of worms and men: an evolutionary perspective on the fibroblast growth factor (FGF) and FGF receptor families", J Mol Evol, Vol. 44, pp. 43-56, 1997. Crasto, C., Marenco, L., Miller, P. and Shepherd, G. "Olfactory Receptor Database: a metadata-driven automated population from sources of gene and protein sequences", Nucleic Acids Res, Vol. 30, pp. 354-60, 2002. Crasto, C., Singer, M.S. and Shepherd, G.M. "The olfactory receptor family album", Genome Biol, Vol. 2, pp. 1027, 2001. Crosby, M.A., Goodman, J.L., Strelets, V.B., Zhang, P. and Gelbart, W.M. "FlyBase: genomes by the dozen", Nucleic Acids Res, Vol. 35, pp. D486-91, 2007. Daniel, J., Scott and Sharon Layfield "Characterization of novel splice variants of LGR7 and LGR8 reveals that receptor signaling is mediated by their unique low density lipoprotein class a modules", Journal of Biological Chemistry, Vol. 281, pp. 3494254, 2006. Davenport, P. "Peptide and trace amine orphan receptors: prospects for new therapeutic targets", Curr Opin Pharmacol, Vol. 3, pp 127-34, 2003.
33.
34.
35.
36.
37.
38. 39.
40.
41.
264
42.
Dawson, J.P., Weinger, J.S. and Engelman. D.M. "Motifs of serine and threonine can drive association of transmembrane helices", J Mol Biol., PubMed PMID: 11866532., Vol. 22, No. 316, pp. 799-805, 2002. Dayhoff, M.O., Schwartz, R.M. and Orcutt, B.C. "A model of evolutionary change in proteins", Atlas of Protein Sequence and Structure, Vol. 5, No. 3, pp. 345352, 1978. De Hertogh, B., Carvajal, E., Talla, E., Dujon, B., Baret, P. and Goffeau, A. "Phylogenetic classification of transporters and other membrane proteins from Saccharomyces cerevisiae", Funct Integr Genomics, Vol. 2, pp. 154-70, 2002. De Roux, N., Genin, E., Carel, J.C., Matsuda, F., Chaussain, J.L. and Milgrom, E. "Hypogonadotropic hypogonadism due to loss of function of the KiSS1-derived peptide receptor GPR54", Proc Natl Acad Sci U S A, Vol. 100, No. 19, pp. 10972-6, 2003. Dilanian, R.S., Darmanin, C., Varghese, J.N., Wilkins, S.W., Oka, T., Yagi, N., Quiney, H.M. and Nugent, K.A. "A new approach for structure analysis of two-dimensional membrane protein crystals using X-ray powder diffraction data", Protein Sci., Vol. 20, pp. 457-64, 2011. Dobritsa, A.A., van der Goes van Naters, V., Warr, C.G., Steinbrecht, R.A. and Carlson, J.R. "Integrating the molecular and cellular basis of odor coding in the Drosophila antenna", Neuron, Vol. 37, pp. 827-41, 2003. Duchamp-Viret, P. and Duchamp, P. "Odor processing in the frog olfactory system" Prog Neurobiol, Vol. 53, pp. 561-602, 1997. Dulac, C. and Axel, R. "A novel family of genes encoding putative pheromone receptors in mammals", Cell, Vol. 83, pp. 195-206, 1995. Dunipace, L., Meister, S., McNealy, C. and Amrein, H. "Spatially restricted expression of candidate taste receptors in the Drosophila gustatory system" Curr Biol, Vol. 11, pp. 822-35, 2001. Eddy, S.R. "Profile hidden Markov models" Bioinformatics, Vol. 14, pp. 755-63, 1998. Elefsinioti, A.L., Bagos, P.G., Spyropoulos, I.C. and Hamodrakas, H.J. 
"A database for G proteins and their interaction with GPCRs", BMC Bioinformatics, Vol. 5, pp. 208, 2004.
43.
44.
45.
46.
47.
48. 49. 50.
51. 52.
265
53.
Elphick, M.R. and Egertova, M. "The neurobiology and evolution of cannabinoid signalling" Philos Trans R Soc Lond B Biol Sci, Vol. 356, pp. 381-408, 2001. Felsenstein, J. "Evolutionary trees from DNA sequences: a maximum likelihood approach", J Mol Evol, Vol. 17, pp. 368-76, 1981. Felsenstein, J. and Churchill, G.A. "A Hidden Markov Model approach to variation among sites in rate of evolution", Mol Biol Evol, Vol. 13, pp 93-104, 1996. Fire, S., Xu, M.K., Montgomery, S.A., Kostas, S.E., Driver and Mello, C.C. "Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans", Nature, Vol. 391, pp. 806-11, 1998. Firestein "How the olfactory system makes sense of scents", Nature, Vol. 413(6852):Review. PubMed PMID: 11557990, pp. 211-8, 2001. Firestein, S. and Werblin, F. "Odor-induced membrane currents in vertebrate-olfactory receptor neurons" Science, Vol. 244, pp. 79-82, 1989. Foord, S.M., Jupe, S. and Holbrook, J. "Bioinformatics and type II Gprotein-coupled receptors", Biochem., Soc Trans, Vol. 30, pp. 473-9, 2002. Fortini, M.E., Skupski, M.P., Boguski, M.S. and I. K. Hariharan "A survey of human disease gene counterparts in the Drosophila genome" J Cell Biol, Vol. 150, pp. F23-30, 2000. Fredriksson, R. and Schioth, H.B. "The repertoire of G-protein-coupled receptors in fully sequenced genomes", Mol Pharmacol, Vol. 67, pp. 1414-25, 2005. Freitag, J., Beck, A., Ludwig, G., von Buchholtz, L. and Breer, H. "On the origin of the olfactory receptor family: receptor genes of the jawless fish (Lampetra fluviatilis)" Gene, Vol. 226, pp. 165-74, 1999. Freitag, J., Krieger, J., Strotmann, J. and Breer, H. "Two classes of olfactory receptors in Xenopus laevis" Neuron, Vol. 15, pp. 1383-92, 1995. Freitag, J., Ludwig, G., Andreini, I., Rossler, P. and Breer, H. "Olfactory receptors in aquatic and terrestrial vertebrates" J Comp Physiol A, Vol. 183, pp. 635-50, 1998.
54. 55.
56.
57. 58.
59.
60.
61.
62.
63.
64.
266
65.
Friedrich, R.W. and Korsching, S.I. "Combinatorial and chemotopic odorant coding in the zebrafish olfactory bulb visualized by optical imaging", Neuron, Vol. 18, pp. 737-52, 1997. Gaillard, I., Rouquier, S. and Giorgi, D. "Olfactory receptors" Cell Mol Life Sci, Vol. 61, pp. 456-69, 2004. Gao, Q. and Chess, A. "Identification of candidate Drosophila olfactory receptors from genomic DNA sequence", Genomics, Vol. 60, pp. 31-9, 1999. Gether, U. "Uncovering molecular mechanisms involved in activation of G protein-coupled receptors" Endocr Rev, Vol. 21, pp. 90-113, 2000. Glusman, G., Yanai, I., Rubin, I. and Lancet, D. "The complete human olfactory subgenome", Genome Res, Vol. 11, pp. 685-702, 2001. Gonzalez, M.W. and Pearson, W.R. "Homologous over-extension: a challenge for iterative similarity searches", Nucleic Acids Res, Vol. 38, pp. 2177-89, 2010. Gottlieb, A., Olender, T., Lancet, D. and Horn, D. "Common peptides shed light on evolution of Olfactory Receptors", BMC Evol Biol, Vol. 9, p. 91, 2009. Greenwald, I. "LIN-12/Notch signaling in C. elegans " WormBook, Vol. 12, pp. 1-16, 2005. Grill, E. and Christmann, A. "Botany. A plant receptor with a big family" Science, Vol. 315, pp. 1676-7, 2007. Hallem, E.A., Ho, M.G. and Carlson, J.R. "The molecular basis of odor coding in the Drosophila antenna" Cell, Vol. 117, pp. 965-79, 2004. Hanks, S.K., Quinn, A.M. and Hunter, T. "The protein kinase family: conserved features and deduced phylogeny of the catalytic domains", Science, Vol. 241, pp. 42-52, 1988. Harini, K. and Sowdhamini, R. "Molecular modelling of oligomeric states of DmOR83b, an olfactory receptor in D. melanogaster", Bioinformatics and Biology Insights ,Vol. 6,pp. 3347, 2012.
66. 67.
68.
69. 70.
71.
72. 73. 74. 75.
76.
267
77.
Hayden, S., Bekaert, M., Crider, T.A., Mariani, S., Murphy, W.J. and Teeling, E.C. "Ecological adaptation determines functional mammalian olfactory subgenomes", Genome Res, Vol. 20, pp. 1-9, 2010. Henikoff, S. and Henikoff, J.G. "Amino acid substitution matrices from protein blocks", Proc Natl Acad Sci U S A, Vol. 89, pp. 10915-9, 1992. Hilliard, M.A., Bargmann, C.I. and Bazzicalupo, P. "C. elegans responds to chemical repellents by integrating sensory inputs from the head and the tail", Curr Biol, Vol. 12, pp. 730-4, 2002. Hillier, L.W., Coulson, A., Murray, J.I., Bao, Z., Sulston, J.E. and Waterston, R.H. "Genomics in C. elegans: So many genes, such a little worm", Genome Research, Vol. 15, pp. 1651-60, 2005. Hirokawa, T., Boon-Chieng, S. and Mitaku, S. "SOSUI: classification and secondary structure prediction system for membrane proteins", Bioinformatics, Vol. 14, pp. 378-9, 1998. Huang, Y., Niu, B., Gao, Y., Fu, L. and Li, W. "CD-HIT Suite: a web server for clustering and comparing biological sequences", Bioinformatics, Vol. 26, pp. 680-2, 2010. Insel, P.A., Tang, C.M., Hahntow, I. and Michel, M.C. "Impact of GPCRs in clinical medicine: monogenic diseases, genetic variants and drug targets", Biochim Biophys Acta, Vol. 1768, pp. 994-1005, 2007. Jaakola, V.P., Griffith, M.T., Hanson, M.A., Cherezov, V.A., Chien, E.Y., Lane, J.R., Ijzerman, A.P. and Stevens, R.C. "The 2.6 angstrom crystal structure of a human A2A adenosine receptor bound to an antagonist", Science, Vol. 322, pp. 1211-7, 2008. Jeffery, C.J. and Koshland, D.E. "A single hydrophobic to hydrophobic substitution in the transmembrane domain impairs aspartate receptor function", Biochemistry, Vol. 33, pp. 3457-63, 1994. Ji, Y., Zhang, Z. and Hu, Y. "The repertoire of G-protein-coupled receptors in Xenopus tropicalis", BMC Genomics, Vol. 10, pp. 263, 2009. Jones, D.T., Taylor, W.R. and Thornton, J.M. "The rapid generation of mutation data matrices from protein sequences", Comput Appl Biosci, Vol. 8, pp. 
275-82, 1992.
78.
79.
80.
81.
82.
83.
84.
85.
86.
87.
268
88.
Joost, P. and Methner, A. "Phylogenetic analysis of 277 human Gprotein-coupled receptors as a tool for the prediction of orphan receptor ligands", Genome Biol., Epub 2002 Oct 17. PubMed PMID: 12429062; PubMed Central PMCID: PMC133447. Vol. 17, No. 3, p. 63. 2002. Josefsson, L.G. and Rask, L. "Cloning of a putative G-protein-coupled receptor from Arabidopsis thaliana", Eur J Biochem, Vol. 249, pp. 415-20, 1997. Kll, L., Krogh, A. and S. EL. "Advantages of combined transmembrane topology and signal peptide prediction the Phobius web server", Nucleic Acids Res., (Web Server issue), Vol. 35, pp. 429-32, 2007. Kandaswamy, K.K., Pugalenthi, G., Hartmann, E., Kalies, K.U., Moller, S., Suganthan, P.M. and Martinetz, T. "SPRED: A machine learning approach for the identification of classical and non-classical secretory proteins in mammalian genomes" Biochem Biophys Res Commun, Vol.391, pp.1306-11, 2010. Kang, J. and Caprio, J. "Electro-olfactogram and multisubunit olfactory receptor responses to complex mixtures of amino acids in the channel catfish, Ictalurus punctatus" J. Gen. Physiol, Vol. 98, pp. 699721, 1991. Karplus, K., Barrett, C. and Hughey, R. "Hidden Markov models for detecting remote protein homologies" Bioinformatics, Vol. 14, pp. 846-56, 1998. Karuppiah Kanagarajadurai, Manoharan Malini, Aditi Bhattacharya,Mitradas M. Panicker and Ramanathan Sowdhamini "Molecular modeling and docking studies of human 5hydroxytryptamine A (5-HT2 A) receptor for the identification of otspots for ligand binding", Molecular Bio Systems, Vol.5,pp.187788, 2009. Kashiwayanagi, M. and Kurihara, K. "Odor responses after complete desensitization of the cAMP-dependent pathway in turtle olfactory cells" Neurosci Lett, Vol. 193, pp. 61-4, 1995. Katoh, Misawa, Kuma and Miyata "MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform" Nucleic Acids Res, Vol. 30, pp. 3059-3066, 2002.
89.
90.
91.
92.
93.
94.
95.
96.
269
97.
Kerrien, S., Aranda, B., Breuza, L., Bridge, A., Broackes-Carter, F., Chen, C., Duesbury, M., Dumousseau, M., Feuermann, M., Hinz, U., Jandrasits, C., Jimenez, R.C., Khadake, J., Mahadevan, U., Masson, P., Pedruzzi, I., Pfeiffenberger, E., Porras, P., Raghunath, A., Roechert, B., Orchard, S. and Hermjakob, H. "The IntAct molecular interaction database in 2012" Nucleic Acids Res, Vol. 40, pp. D841-6, 2011. Kim, K., Sato, K., Shibuya, M., Zeiger, D.M., Butcher, R.A., Ragains, J.R., Clardy, J., Touhara, K. and Sengupta, P. "Two chemoreceptors mediate developmental effects of dauer pheromone in C. elegans", Science, Vol. 326, pp. 994-8, 2009. Koszelak-Rosenblum, M., Krol, A., Mozumdar, N., Wunsch, K., Ferin, A., Cook, E., Veatch, C.K., Nagel, R., Luft, J.R., Detitta, G.T. and Malkowski, M.G. "Determination and application of empirically derived detergent phase boundaries to effectively crystallize membrane proteins", Protein Sci., Vol. 18, pp. 1828-39, 2009. Kristiansen, K. "Molecular mechanisms of ligand binding, signaling, and regulation within the superfamily of G-protein-coupled receptors: molecular modeling and mutagenesis approaches to receptor structure and function", Pharmacol Ther, Vol. 103, pp. 21-80, 2004. Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L. "Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes", J. Mol. Biol., Vol. 305, pp. 567-80, 2001. Krogh, A., Mian, I.S. and Haussler, D. "A hidden markov model that finds genes in E.coli dna" Nucleic. Acids Res, Vol. 22, No. 22, pp. 47684778, 1994b. Kumar, S., Nei, M., Dudley, J. and Tamura, K. "MEGA: a biologistcentric software for evolutionary analysis of DNA and protein sequences", Brief Bioinform, Vol. 9, pp. 299-306, 2008. Kuwabara, P.E. and O'Neil, N. "The use of functional genomics in C. elegans for studying human development and disease" J Inherit Metab Dis., Vol. 2, pp. 127-38, 2001. Laage, R., Rohde, J., Brosig, B. and Langosch, D. 
"A conserved membrane-spanning amino acid motif drives homomeric and supports heteromeric assembly of presynaptic SNARE proteins", PubMed PMID: 10764817., Vol. 9, No. 275, pp. 17481-7, 2000.
98.
99.
100.
101.
102.
103.
104.
105.
270
106. Lai, C.H., Chou, C.Y., Ch'ang, L.Y., Liu, C.S. and Lin, W. "Identification of novel human genes evolutionarily conserved in Caenorhabditis elegans by comparative proteomics", Genome Res, Vol. 10, pp. 703-13, 2000.
107. Lao, D.M., Okuno, T. and Shimizu, T. "Evaluating transmembrane topology prediction methods for the effect of signal peptide in topology prediction", In Silico Biol, Vol. 2, pp. 485-94, 2002.
108. Lapidot, M., Pilpel, Y., Gilad, Y., Falcovitz, A., Sharon, D., Haaf, T. and Lancet, D. "Mouse-human orthology relationships in an olfactory receptor gene cluster", Genomics, Vol. 71, pp. 296-306, 2001.
109. Larsson, M.C., Domingos, A.I., Jones, W.D., Chiappe, M.E., Amrein, H. and Vosshall, L.B. "Or83b encodes a broadly expressed odorant receptor essential for Drosophila olfaction", Neuron, Vol. 43, pp. 703-14, 2004.
110. Laskowski, R.A., MacArthur, M.W., Moss, D.S. and Thornton, J.M. "PROCHECK - a program to check the stereochemical quality of protein structures", J Appl Crystallogr, Vol. 64, pp. 897934, 1993.
111. Lee, T., Seeman, P., Rajput, A., Farley, I.J. and Hornykiewicz, O. "Receptor basis for dopaminergic supersensitivity in Parkinson's disease", Nature, Vol. 273, pp. 59-61, 1978.
112. Leonov, H. and Arkin, I.T. "A periodicity analysis of transmembrane helices", Bioinformatics, Vol. 21, pp. 2604-10, 2005.
113. Li, J., Edwards, P.C., Burghammer, M., Villa, C. and Schertler, G.F. "Structure of bovine rhodopsin in a trigonal crystal form", J Mol Biol, Vol. 343, pp. 1409-38, 2004.
114. Lipman, D., Flicek, P., Salzberg, S., Gerstein, M. and Knight, R. "Closure of the NCBI SRA and implications for the long-term future of genomics data storage", Genome Biol, Vol. 12, pp. 402, 2011.
115. Lipman, D.J. and Pearson, W.R. "Rapid and sensitive protein similarity searches", Science, Vol. 227, pp. 1435-41, 1985.
116. Low, K.E.Y. "Ruminations on Smell as a Sociocultural Phenomenon", Current Sociology, Vol. 53, pp. 397-417, 2005.
117. Lundin, C., Kall, L., Kreher, S.A., Kapp, K., Sonnhammer, E.L., Carlson, J.R., von Heijne, G. and Nilsson, I. "Membrane topology of the Drosophila OR83b odorant receptor", FEBS Lett, Vol. 581, pp. 5601-4, 2007.
118. Makalowski, W., Zhang, J. and Boguski, M.S. "Comparative analysis of 1196 orthologous mouse and human full-length mRNA and protein sequences", Genome Res, Vol. 6, pp. 846-57, 1996.
119. Marchler-Bauer, A., Anderson, J.B., Chitsaz, F., Derbyshire, M.K., DeWeese-Scott, C., Fong, J.H., Geer, L.Y., Geer, R.C., Gonzales, N.R., Gwadz, M., He, S., Hurwitz, D.I., Jackson, J.D., Ke, Z., Lanczycki, C.J., Liebert, C.A., Liu, C., Lu, F., Lu, S., Marchler, G.H., Mullokandov, M., Song, J.S., Tasneem, A., Thanki, N., Yamashita, R.A., Zhang, D., Zhang, N. and Bryant, S.H. "CDD: specific functional annotation with the Conserved Domain Database", Nucleic Acids Res, Vol. 37, pp. D205-10, 2009.
120. Marinissen, M.J. and Gutkind, J.S. "G-protein-coupled receptors and signaling networks: emerging paradigms", Trends Pharmacol Sci, Vol. 22, pp. 368-76, 2001.
121. Matsumoto, M., Kamohara, M., Sugimoto, T., Hidaka, K., Takasaki, J., Saito, T., Okada, M., Yamaguchi, T. and Furuichi, K. "The novel G-protein coupled receptor SALPR shares sequence similarity with somatostatin and angiotensin receptors", Gene, Vol. 248, pp. 183-9, 2000.
122. McCarroll, S.A., Li, H. and Bargmann, C.I. "Identification of transcriptional regulatory elements in chemosensory receptor genes by probabilistic segmentation", Curr Biol, Vol. 15, pp. 347-52, 2005.
123. McGrath, P.T., Xu, Y., Ailion, M., Garrison, J.L., Butcher, R.A. and Bargmann, C.I. "Parallel evolution of domesticated Caenorhabditis species targets pheromone receptor genes", Nature, Vol. 477, pp. 321-5, 2011.
124. Melkman, T. and Sengupta, P. "The worm's sense of smell. Development of functional diversity in the chemosensory system of Caenorhabditis elegans", Dev Biol, Vol. 265, pp. 302-19, 2004.
125. Metpally, R.P.R. and Sowdhamini, R. "Cross genome phylogenetic analysis of human and Drosophila G protein-coupled receptors: application to functional annotation of orphan receptors", BMC Genomics, Vol. 6:106, pp. 1-20, 2005.
126. Meyer, M.R., Angele, A., Kremmer, E., Kaupp, U.B. and Muller, F. "A cGMP-signaling pathway in a subset of olfactory sensory neurons", Proc Natl Acad Sci U S A, Vol. 97, pp. 10595-600, 2000.
127. Mezler, M., Fleischer, J. and Breer, H. "Characteristic features and ligand specificity of the two olfactory receptor classes from Xenopus laevis", J Exp Biol, Vol. 204, pp. 2987-97, 2001.
128. Mombaerts, P. "Molecular biology of odorant receptors in vertebrates", Annu Rev Neurosci, Vol. 22, pp. 487-509, 1999.
129. Mombaerts, P., Wang, F., Dulac, C., Chao, S.K., Nemes, A., Mendelsohn, M., Edmondson, J. and Axel, R. "Visualizing an olfactory sensory map", Cell, Vol. 87, pp. 675-86, 1996.
130. Montero, C., Campillo, N.E., Goya, P. and Paez, J.A. "Homology models of the cannabinoid CB1 and CB2 receptors. A docking analysis study", Eur J Med Chem, Vol. 40, pp. 75-83, 2005.
131. Mori, I. "Genetics of chemotaxis and thermotaxis in the nematode Caenorhabditis elegans", Annu Rev Genet, Vol. 33, pp. 399-422, 1999.
132. Muller, T. and Vingron, M. "Modeling amino acid replacement", J Comput Biol, Vol. 7, pp. 761-76, 2000.
133. Munger, S.D., Leinders-Zufall, T. and Zufall, F. "Subsystem organization of the mammalian sense of smell", Annu Rev Physiol, Vol. 71, pp. 115-40, 2009.
134. Murphy, P.M. and Tiffany, H.L. "Cloning of complementary DNA encoding a functional human interleukin-8 receptor", Science, Vol. 13, No. 253, pp. 1280-3, 1991.
135. Needleman, S.B. and Wunsch, C.D. "A general method applicable to the search for similarities in the amino acid sequence of two proteins", J Mol Biol, Vol. 48, pp. 443-53, 1970.
136. Nehme, R., Joubert, O., Bidet, M., Lacombe, B., Polidori, A., Pucci, B. and Mus-Veteau, I. "Stability study of the human G-protein coupled receptor, Smoothened", Biochim Biophys Acta, Vol. 1798, pp. 1100-10, 2010.
137. Nemoto, W. and Toh, H. "GRIP: A server for predicting interfaces for GPCR oligomerization", J Recept Signal Transduct Res, pp. 312-7, 2009.
138. Nemoto, W. and Toh, H. "Prediction of interfaces for oligomerizations of G-protein coupled receptors", PROTEINS: Structure, Function, and Bioinformatics, Vol. 58, pp. 644-60, 2005.
139. Ng, P.C., Henikoff, J.G. and Henikoff, S. "PHAT: a transmembrane-specific substitution matrix. Predicted hydrophobic and transmembrane", Bioinformatics, Vol. 16, pp. 760-6, 2000.
140. Ngai, J., Chess, A., Dowling, M.M., Necles, N., Macagno, E.R. and Axel, R. "Coding of olfactory information: topography of odorant receptor expression in the catfish olfactory epithelium", Cell, Vol. 72, pp. 667-80, 1993.
141. Niimura, Y. and Nei, M. "Comparative evolutionary analysis of olfactory receptor gene clusters between humans and mice", Gene, Vol. 346, pp. 13-21, 2005.
142. Niimura, Y. and Nei, M. "Evolutionary changes of the number of olfactory receptor genes in the human and mouse lineages", Gene, Vol. 346, pp. 23-8, 2005.
143. Nufer, O., Guldbrandsen, S., Degen, M., Kappeler, F., Paccaud, J.P., Tani, K. and Hauri, H.P. "Role of cytoplasmic C-terminal amino acids of membrane proteins in ER export", J Cell Sci, Vol. 115, pp. 619-28, 2002.
144. Ohki-Hamazaki, H., Watase, K., Yamamoto, K., Ogura, H., Yamano, M., Yamada, K., Maeno, H., Imaki, J., Kikuyama, S., Wada, E. and Wada, K. "Mice lacking bombesin receptor subtype-3 develop metabolic defects and obesity", Nature, Vol. 390, pp. 165-9, 1997.
145. Ono, Y., Fujibuchi, W. and Suwa, M. "Automatic gene collection system for genome-scale overview of G-protein coupled receptors in eukaryotes", Gene, Vol. 364, pp. 63-73, 2005.
146. Pace, U., Hanski, E., Salomon, Y. and Lancet, D. "Odorant-sensitive adenylate cyclase may mediate olfactory reception", Nature, Vol. 18-24, No. 16, pp. 255-8, 1985.
147. Palczewski, K., Kumasaka, T., Hori, T., Behnke, C.A., Motoshima, H., Fox, B.A., Le Trong, I., Teller, D.C., Okada, T., Stenkamp, R.E., Yamamoto, M. and Miyano, M. "Crystal structure of rhodopsin: A G protein-coupled receptor", Science, Vol. 289, pp. 739-45, 2000.
148. Parmentier, M., Libert, F., Schurmans, S., Schiffmann, S., Lefort, A., Eggerickx, D., Ledent, C., Mollereau, C., Gerard, C. and Perret, J. "Expression of members of the putative olfactory receptor gene family in mammalian germ cells", Nature, Vol. 355, pp. 453-5, 1992.
149. Perez, D.M. "From plants to man: the GPCR 'tree of life'", Mol Pharmacol, Vol. 67, pp. 1383-4, 2005.
150. Perfus-Barbeoch, L., Jones, A.M. and Assmann, S.M. "Plant heterotrimeric G protein function: insights from Arabidopsis and rice mutants", Curr Opin Plant Biol, Vol. 7, pp. 719-31, 2004.
151. Pirovano, W., Feenstra, K.A. and Heringa, J. "PRALINETM: a strategy for improved multiple alignment of transmembrane proteins", Bioinformatics, Vol. 24, No. 2, pp. 492-497, 2008.
152. Prinster, S.C., Hague, C. and Hall, R.A. "Heterodimerization of G protein-coupled receptors: specificity and functional significance", Pharmacol Rev, Vol. 57, pp. 289-98, 2005.
153. Probst, W.C., Snyder, L.A., Schuster, D.I., Brosius, J. and Sealfon, S.C. "Sequence alignment of the G-protein coupled receptor superfamily", DNA Cell Biol, Vol. 11, pp. 1-20, 1992.
154. Pugalenthi, G., Kandaswamy, K.K., Suganthan, P.N., Archunan, G. and Sowdhamini, R. "Identification of functionally diverse lipocalin proteins from sequence information using support vector machine", Amino Acids, Vol. 39, No. 3, pp. 777-83, 2010.
155. Raman, P., Cherezov, V. and Caffrey, M. "The Membrane Protein Data Bank", Cell Mol Life Sci, Vol. 63, pp. 36-51, 2006.
156. Rasmussen, S.G., Choi, H.J., Rosenbaum, D.M., Kobilka, T.S., Thian, F.S., Edwards, P.C., Burghammer, M., Ratnala, V.R., Sanishvili, R., Fischetti, R.F., Schertler, G.F., Weis, W.I. and Kobilka, B.K. "Crystal structure of the human β2 adrenergic G-protein-coupled receptor", Nature, Vol. 450, pp. 383-387, 2007.
157. Redfern, O.C., Dessailly, B. and Orengo, C.A. "Exploring the structure and function paradigm", Curr Opin Struct Biol, Vol. 18, pp. 394-402, 2008.
158. Remm, M. and Sonnhammer, E. "Classification of transmembrane protein families in the Caenorhabditis elegans genome and identification of human orthologs", Genome Res, Vol. 10, pp. 1679-89, 2000.
159. Ressler, K.J., Sullivan, S.L. and Buck, L.B. "A zonal organization of odorant receptor gene expression in the olfactory epithelium", Cell, Vol. 73, pp. 597-609, 1993.
160. Roayaie, K., Crump, J.G., Sagasti, A. and Bargmann, C.I. "The G alpha protein ODR-3 mediates olfactory and nociceptive function and controls cilium morphogenesis in C. elegans olfactory neurons", Neuron, Vol. 20, pp. 55-67, 1998.
161. Robertson, H.M. "Two large families of chemoreceptor genes in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae reveal extensive gene duplication, diversification, movement, and intron loss", Genome Res, Vol. 8, pp. 449-63, 1998.
162. Robertson, H.M. and Thomas, J.H. "The putative chemoreceptor families of C. elegans", WormBook, pp. 1-12, 2006.
163. Robertson, H.M., Warr, C.G. and Carlson, J.R. "Molecular evolution of the insect chemoreceptor gene superfamily in Drosophila melanogaster", Proc Natl Acad Sci U S A, Vol. 100 Suppl 2, pp. 14537-42, 2003.
164. Rodbell, M., Birnbaumer, L., Pohl, S.L. and Sundby, F. "The reaction of glucagon with its receptor: evidence for discrete regions of activity and binding in the glucagon molecule", Proc Natl Acad Sci U S A, Vol. 68, pp. 909-13, 1971.
165. Rognan, D. "Development and virtual screening of target libraries", Journal of Physiology-Paris, Vol. 99, No. 2-3, pp. 232-44, 2006.
166. Rompler, H., Yu, H.T., Arnold, A., Orth, A. and Schoneberg, T. "Functional consequences of naturally occurring DRY motif variants in the mammalian chemoattractant receptor GPR33", Genomics, Vol. 87, pp. 724-32, 2006.
167. Rouquier, S., Blancher, A. and Giorgi, D. "The olfactory receptor gene repertoire in primates and mouse: evidence for reduction of the functional fraction in primates", Proc Natl Acad Sci U S A, Vol. 97, pp. 2870-4, 2000.
168. Rovati, G.E., Capra, V. and Neubig, R.R. "The highly conserved DRY motif of class A G protein-coupled receptors: beyond the ground state", Mol Pharmacol, Vol. 71, pp. 959-64, 2007. Epub 2006 Dec 27. PMID: 17192495.
169. Saitou, N. and Nei, M. "The neighbor-joining method: a new method for reconstructing phylogenetic trees", Mol Biol Evol, Vol. 4, pp. 406-25, 1987.
170. Sali, A. and Blundell, T.L. "Comparative protein modelling by satisfaction of spatial restraints", J Mol Biol, Vol. 234, pp. 779-815, 1993.
171. Saslis-Lagoudakis, C.H., Klitgaard, B.B., Forest, F., Francis, L., Savolainen, V., Williamson, E.M. and Hawkins, J.A. "The use of phylogeny to interpret cross-cultural patterns in plant use and guide medicinal plant discovery: an example from Pterocarpus (Leguminosae)", PLoS One, Vol. 6, No. 7, pp. e22275, 2011.
172. Sato, K., Pellegrino, M., Nakagawa, T., Nakagawa, T., Vosshall, L.B. and Touhara, K. "Insect olfactory receptors are heteromeric ligand-gated ion channels", Nature, Vol. 452, pp. 1002-6, 2008.
173. Schaffer, A.A., Wolf, Y.I., Ponting, C.P., Koonin, E.V., Aravind, L. and Altschul, S.F. "IMPALA: matching a protein sequence against a collection of PSI-BLAST-constructed position-specific score matrices", Bioinformatics, Vol. 15, pp. 1000-11, 1999.
174. Schiaffino, M.V., Baschirotto, B., Pellegrini, G., Montalti, S., Tacchetti, C., De Luca, M. and Ballabio, A. "The ocular albinism type 1 gene product is a membrane glycoprotein localized to melanosomes", Proc Natl Acad Sci U S A, Vol. 93, pp. 9055-60, 1996.
175. Schiaffino, M.V., d'Addio, M., Alloni, A., Baschirotto, C., Valetti, C., Cortese, K., Puri, C., Bassi, M.T., Colla, C., De Luca, M., Tacchetti, C. and Ballabio, A. "Ocular albinism: evidence for a defect in an intracellular signal transduction system", Nat Genet, Vol. 23, pp. 108-12, 1999.
176. Schluter, J.P., Reinkensmeier, J., Daschkey, S., Evguenieva-Hackenberg, E., Janssen, S., Janicke, S., Becker, J.D., Giegerich, R. and Becker, R. "A genome-wide survey of sRNAs in the symbiotic nitrogen-fixing alpha-proteobacterium Sinorhizobium meliloti", BMC Genomics, Vol. 11, pp. 245, 2010.
177. Schlyer, S. and Horuk, R. "I want a new drug: G-protein-coupled receptors in drug development", Drug Discov Today, Vol. 11, pp. 481-93, 2006.
178. Schmidt, H.A., Strimmer, K., Vingron, M. and von Haeseler, A. "TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing", Bioinformatics, Vol. 18, No. 3, pp. 502-4, 2002. PMID: 11934758.
179. Gleim, S., Stojanovic, A., Arehart, E. and Byington, D. "Conserved Rhodopsin Intradiscal Structural Motifs Mediate Stabilization: Effects of Zinc", Biochemistry, Vol. 48, No. 8, pp. 1793-1800, 2009.
180. Seeman, P. "Dopamine receptors and the dopamine hypothesis of schizophrenia", Synapse, Vol. 1, pp. 133-52, 1987.
181. Seminara, S.B., Messager, S., Chatzidaki, E.E., Thresher, R.R., Acierno, J.S., Shagoury, J.K., Bo-Abbas, Y., Kuohung, W., Schwinof, K.M., Hendrick, A.G., Zahn, D., Dixon, J., Kaiser, U.B., Slaugenhaupt, S.A., Gusella, J.F., O'Rahilly, S., Carlton, M.B., Crowley, W.F., Aparicio, S.A. and Colledge, W.H. "The GPR54 gene as a regulator of puberty", N Engl J Med, Vol. 349, pp. 1614-27, 2003.
182. Sengupta, P., Chou, J.H. and Bargmann, C.I. "odr-10 encodes a seven transmembrane domain olfactory receptor required for responses to the odorant diacetyl", Cell, Vol. 84, No. 6, pp. 899-909, 1996.
183. Shafrir, Y. and Guy, H.R. "STAM: simple transmembrane alignment method", Bioinformatics, Vol. 20, pp. 758-69, 2004.
184. Shah, P.K. and Sowdhamini, R. "Structural understanding of the transmembrane domains of inositol triphosphate receptors and ryanodine receptors towards calcium channeling", Prot Engng, Vol. 14, pp. 867-74, 2001.
185. Shameer, K., Nagarajan, P., Gaurav, K. and Sowdhamini, R. "3PFDB - A database of Best Representative PSSM Profiles (BRPs) of Protein Families generated using a novel data mining approach", BioData Min, Vol. 2, pp. 8, 2009.
186. Sharon, D., Glusman, G., Pilpel, Y., Horn-Saban, S. and Lancet, D. "Genome dynamics, evolution, and protein modeling in the olfactory receptor gene superfamily", Ann N Y Acad Sci, Vol. 855, pp. 182-93, 1998.
187. Siddiqi, O. "Olfaction in Drosophila", in Wysocki et al. (Eds.), Chemical Senses, Vol. 3, Marcel Dekker, NY, pp. 79-96, 1990.
188. Sikder, D. and Kodadek, T. "The neurohormone orexin stimulates hypoxia-inducible factor-1 activity", Genes Dev, Vol. 21, pp. 2995-3005, 2007.
189. Sklar, P.B., Anholt, R.R. and Snyder, S.H. "The odorant-sensitive adenylate cyclase of olfactory receptor cells. Differential stimulation by distinct classes of odorants", J Biol Chem, Vol. 261, pp. 15538-43, 1986.
190. Skoufos, E., Marenco, L., Nadkarni, P.M., Miller, P.L. and Shepherd, G.M. "Olfactory receptor database: a sensory chemoreceptor resource", Nucleic Acids Res, Vol. 28, pp. 341-3, 2000.
191. Smith, T.F. and Waterman, M.S. "Identification of common molecular subsequences", J Mol Biol, Vol. 147, pp. 195-7, 1981.
192. Sokal, R. and Michener, C. "A statistical method for evaluating systematic relationships", University of Kansas Science Bulletin, Vol. 38, pp. 1409-1438, 1958.
193. Sonnhammer, E.L., Eddy, S.R. and Durbin, R. "Pfam: a comprehensive database of protein domain families based on seed alignments", Proteins, Vol. 28, pp. 405-20, 1997.
194. Sonnhammer, E.L., von Heijne, G. and Krogh, A. "A hidden Markov model for predicting transmembrane helices in protein sequences", Proc Int Conf Intell Syst Mol Biol, Vol. 6, pp. 175-82, 1998.
195. Speca, D.J., Lin, D.M., Sorensen, P.W., Isacoff, E.Y., Ngai, J. and Dittman, A.H. "Functional identification of a goldfish odorant receptor", Neuron, Vol. 23, pp. 487-98, 1999.
196. Costanzi, S., Neumann, S. and Gershengorn, M.C. "Seven transmembrane-spanning receptors for free fatty acids as therapeutic targets for diabetes mellitus: pharmacological, phylogenetic, and drug discovery aspects", The Journal of Biological Chemistry, Vol. 283, pp. 16269-73, 2006.
197. Steiger, S.S., Fidler, A.E. and Kempenaers, B. "Evidence for increased olfactory receptor gene repertoire size in two nocturnal bird species with well-developed olfactory ability", BMC Evol Biol, Vol. 9, pp. 117, 2009.
198. Stein, L.D., Bao, Z., Blasiar, D., Blumenthal, T., Brent, M.R., Chen, N., Chinwalla, A., Clarke, L., Clee, C., Coghlan, A., Coulson, A., D'Eustachio, P., Fitch, D.H., Fulton, L.A., Fulton, R.E., Griffiths-Jones, S., Harris, T.W., Hillier, L.W., Kamath, R., Kuwabara, P.E., Mardis, E.R., Marra, M.A., Miner, T.L., Minx, P., Mullikin, J.C., Plumb, R.W., Rogers, J., Schein, J.E., Sohrmann, M., Spieth, J., Stajich, J.E., Wei, C., Willey, D., Wilson, R.K., Durbin, R. and Waterston, R.H. "The genome sequence of Caenorhabditis briggsae: a platform for comparative genomics", PLoS Biol, Vol. 1, pp. E45, 2003.
199. Stensmyr, M.C., Erland, S., Hallberg, E., Wallen, R., Greenaway, P. and Hansson, B.S. "Insect-like olfactory adaptations in the terrestrial giant robber crab", Curr Biol, Vol. 15, pp. 116-21, 2005.
200. Stocker, R.F. "The organization of the chemosensory system in Drosophila melanogaster: a review", Cell Tissue Res, Vol. 275, pp. 3-26, 1994.
201. Strimmer, K. and von Haeseler, A. "Likelihood-mapping: a simple method to visualize phylogenetic content of a sequence alignment", Proc Natl Acad Sci U S A, Vol. 94, pp. 6815-9, 1997.
202. Sullivan, S.L., Adamson, M.C., Ressler, K.J., Kozak, C.A. and Buck, L.B. "The chromosomal distribution of mouse odorant receptor genes", Proc Natl Acad Sci U S A, Vol. 93, pp. 884-8, 1996.
203. Tamura, K., Dudley, J., Nei, M. and Kumar, S. "MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0", Mol Biol Evol, Vol. 24, pp. 1596-9, 2007.
204. Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M. and Kumar, S. "MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods", Mol Biol Evol, Vol. 28, pp. 2731-9, 2011.
205. Taniura, H., Sanada, N., Kuramoto, N. and Yoneda, Y. "A metabotropic glutamate receptor family gene in Dictyostelium discoideum", JBC Papers in Press, 2006.
206. Tareilus, E., Noe, J. and Breer, H. "Calcium signals in olfactory neurons", Biochim Biophys Acta, Vol. 1269, pp. 129-38, 1995.
207. Teng, M.S., Dekkers, M.P., Ng, B.L., Rademakers, S., Jansen, G., Fraser, G.A. and McCafferty, J. "Expression of mammalian GPCRs in C. elegans generates novel behavioural responses to human ligands", BMC Biol, Vol. 4, pp. 22, 2006.
208. Theodoropoulou, M.C., Bagos, P.G., Spyropoulos, I.C. and Hamodrakas, S.J. "gpDB: a database of GPCRs, G-proteins, effectors and their interactions", Bioinformatics, Vol. 15, No. 24(12), pp. 1471-2, 2008.
209. Thomas, J.H. and Robertson, H.M. "The Caenorhabditis chemoreceptor gene families", BMC Biol, Vol. 6, pp. 42, 2008.
210. Thomas, R., Chen, J., Roudier, M.M., Vessella, R.L., Lantry, L.E. and Nunn, A.D. "In vitro binding evaluation of 177 Lu-AMBA, a novel 177 Lu-labeled GRP-R agonist for systemic radiotherapy in human tissues", Clin Exp Metastasis, Vol. 26, No. 2, pp. 1059, 2009. Epub Oct 31, 2008. PMID: 18975117.
211. Thompson, J.D., Higgins, D.G. and Gibson, T.J. "CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice", Nucleic Acids Res, Vol. 11, No. 22, pp. 4673-80, 1994. PMID: 7984417; PMCID: PMC308517.
212. Tripathi, L.P. and Sowdhamini, R. "Genome-wide survey of prokaryotic serine proteases: analysis of distribution and domain architectures of five serine protease families in prokaryotes", BMC Genomics, Vol. 9, pp. 549, 2008.
213. Troemel, E.R., Chou, J.H., Dwyer, N.D., Colbert, H.A. and Bargmann, C.I. "Divergent seven transmembrane receptors are candidate chemosensory receptors in C. elegans", Cell, Vol. 20, No. 83, pp. 207-18, 1995. PMID: 7585938.
214. Tsuboi, A., Miyazaki, T., Imai, T. and Sakano, H. "Olfactory sensory neurons expressing class I odorant receptors converge their axons on an antero-dorsal domain of the olfactory bulb in the mouse", Eur J Neurosci, Vol. 23, No. 6, pp. 1436-44, 2006.
215. Tusnady, G.E. and Simon, I. "The HMMTOP transmembrane topology prediction server", Bioinformatics, Vol. 17, pp. 849-50, 2001.
216. Tusnady, G.E., Kalmar, L. and Simon, I. "TOPDB: topology data bank of transmembrane proteins", Nucleic Acids Res, Vol. 36, pp. D234-9, 2008.
217. Ulrich, C.D., Ferber, I., Holicky, E., Hadac, E., Buell, G. and Miller, L.J. "Molecular cloning and functional expression of the human gallbladder cholecystokinin A receptor", Biochem Biophys Res Commun, Vol. 193, pp. 204-11, 1993.
218. Vadakkadath Meethal, S., Gallego, M.J., Haasl, R.J., Petras, S.J., Sgro, J.Y. and Atwood, C.S. "Identification of a gonadotropin-releasing hormone receptor orthologue in Caenorhabditis elegans", BMC Evol Biol, Vol. 6, pp. 103, 2006.
219. Venkatesh, S. and Singh, R. "Sensilla on the third antennal segment of Drosophila melanogaster Meigen", Int J Insect Morphol Embryol, Vol. 13, pp. 51-63, 1984.
220. Villeneuve, A., Gignac, S. and Provencher, P.H. "Glucocorticoids decrease endothelin-A- and B-receptor expression in the kidney", J Cardiovasc Pharmacol, Vol. 36 (5 Suppl 1), 2000.
221. Vinson, C.R. and Adler, P.N. "Directional non-cell autonomy and the transmission of polarity information by the frizzled gene of Drosophila", Nature, Vol. 329, pp. 549-51, 1987.
222. Vosshall, L.B. and Stocker, R.F. "Molecular architecture of smell and taste in Drosophila", Annu Rev Neurosci, Vol. 30, pp. 505-33, 2007.
223. Warne, T., Serrano-Vega, M.J., Baker, G.J., Moukhametzianov, R., Edwards, P.C., Henderson, R., Leslie, A.G., Tate, C.G. and Schertler, G.F. "Structure of a beta1-adrenergic G-protein-coupled receptor", Nature, Vol. 454, pp. 486-91, 2008.
224. Warr, C., Clyne, P., de Bruyne, M., Kim, J. and Carlson, J.R. "Olfaction in Drosophila: coding, genetics and e-genetics", Chem Senses, Vol. 26, pp. 201-6, 2001.
225. Wettschureck, N. and Offermanns, S. "Mammalian G proteins and their cell type specific functions", Physiol Rev, Vol. 85, pp. 1159-204, 2005.
226. Whelan, S. and Goldman, N. "Estimating the frequency of events that cause multiple-nucleotide changes", Genetics, Vol. 167, pp. 2027-43, 2004.
227. White, J.G., Southgate, E., Thomson, J.N. and Brenner, S. "The structure of the nervous system of the nematode Caenorhabditis elegans", Philos Trans R Soc Lond B Biol Sci, Vol. 314, pp. 1-340, 1986.
228. Wicher, D., Schafer, R., Bauernfeind, R., Stensmyr, M.C., Heller, R., Heinemann, S.H. and Hansson, B.S. "dOr83b--receptor or ion channel?", Ann N Y Acad Sci, Vol. 1170, pp. 164-7, 2009.
229. Wilbur, W.J. and Lipman, D.J. "Rapid similarity searches of nucleic acid and protein data banks", Proc Natl Acad Sci U S A, Vol. 80, pp. 726-30, 1983.
230. Wistrand, M., Kall, L. and Sonnhammer, E.L. "A general model of G protein-coupled receptor sequences and its application to detect remote homologs", Protein Sci, Vol. 15, pp. 509-21, 2006.
231. Woollard, A. "Gene duplications and genetic redundancy in C. elegans", WormBook, Vol. 41, pp. 1-6, 2005.
232. Wu, B., Chien, E.Y., Mol, C.D., Fenalti, G., Liu, W., Katritch, V., Abagyan, R., Brooun, A., Wells, P., Bi, F.C., Hamel, D.J., Kuhn, P., Handel, T.M., Cherezov, V. and Stevens, R.C. "Structures of the CXCR4 chemokine GPCR with small-molecule and cyclic peptide antagonists", Science, Vol. 330, pp. 1066-71, 2010.
233. Yamano, Y., Kamon, R., Yoshimizu, T., Toda, Y., Oshida, Y., Chaki, S., Yoshioka, M. and Morishima, I. "The role of the DRY motif of human MC4R for receptor activation", Biosci Biotechnol Biochem, Vol. 68, pp. 1369-71, 2004.
234. Ye, J., Fang, L., Zheng, H., Zhang, Y., Chen, J., Zhang, Z., Wang, J., Li, S., Li, R., Bolund, L. and Wang, J. "WEGO: a web tool for plotting GO annotations", Nucleic Acids Res, Vol. 34, pp. W293-7, 2006.
235. Yoo, A.S. and Greenwald, I. "LIN-12/Notch activation leads to microRNA-mediated down-regulation of Vav in C. elegans", Science, Vol. 310, pp. 1330-3, 2005.
236. Zhang, X. and Firestein, S. "The olfactory receptor gene superfamily of the mouse", Nat Neurosci, Vol. 5, pp. 124-33, 2002.
237. Zhang, X. and Firestein, S. "The olfactory receptor gene superfamily of the mouse", Nature Neuroscience, Vol. 5, pp. 124-133, 2002.
238. Zhang, X., Rogers, M., Tian, H., Zhang, X., Zou, D.J., Liu, J., Ma, M., Shepherd, G.M. and Firestein, S.J. "High-throughput microarray detection of olfactory receptor gene expression in the mouse", Proc Natl Acad Sci U S A, Vol. 101, pp. 14168-73, 2004.
239. Zhang, X., Zhang, X. and Firestein, S. "Comparative genomics of odorant and pheromone receptor genes in rodents", Genomics, Vol. 89, pp. 441-50, 2007.
240. Zhang, X., Zhao, F., Guan, X., Yang, Y., Liang, C. and Qin, S. "Genome-wide survey of putative serine/threonine protein kinases in cyanobacteria", BMC Genomics, Vol. 8, pp. 395, 2007.
241. Zozulya, S., Echeverri, F. and Nguyen, T. "The human olfactory receptor repertoire", Genome Biol, Vol. 2, p. 18, 2001.
LIST OF PUBLICATIONS
1. Balasubramanian Nagarathnam, Kannan Sankar, Varadhan Dharnidharka, Veluchamy Balakrishnan, Govindaraju Archunan and Ramanathan Sowdhamini "TM-MOTIF: an alignment viewer to annotate predicted transmembrane helices and conserved motifs in aligned set of sequences", Bioinformation, Vol. 7, No. 5, pp. 214-221, 2011. Published online 2011 October 31. PMCID: PMC3218415.
2. Balasubramanian Nagarathnam, Kannan Sankar, Varadhan Dharnidharka, Veluchamy Balakrishnan, Govindaraju Archunan and Ramanathan Sowdhamini "Insights from the analysis of conserved motifs and permitted amino acid exchanges in the human, the fly and the worm GPCR clusters", Bioinformation, Vol. 7, No. 1, pp. 15-20, 2011. Published online 2011 August 20. PMCID: PMC3163927.
3. Balasubramanian Nagarathnam, Singaravelu Kalaimathy, Veluchamy Balakrishnan and Ramanathan Sowdhamini "Cross-genome clustering of human and C. elegans G-protein coupled receptors", Evolutionary Bioinformatics, Vol. 8, pp. 229-259, 2012.
CURRICULUM VITAE
NAGARATHNAM B. worked towards her PhD under the guidance of Dr. V. Balakrishnan, Assistant Professor, KSR College of Technology, Tiruchengode, India, and Dr. R. Sowdhamini, Professor, National Centre for Biological Sciences, Bangalore, India. Her project on the genome-wide survey of certain mammalian GPCRs and olfactory receptors was funded under the India-Japan Collaborative Research Project by the National Institute of Advanced Industrial Science and Technology (AIST), Japan, and the Department of Biotechnology (DBT), India. She carried out her entire research work as a full-time Research Scholar at Lab-25, C/o Prof. R. Sowdhamini, National Centre for Biological Sciences, Bangalore, India. Nagarathnam started her academic career with a distinction in BSc Zoology at Bharathiar University and earned her Masters in Applied Biology from Gandhigram Rural Institute, where she was the gold medalist. She also holds an MPhil in Biotechnology from Periyar University. She has lectured in biological sciences at several colleges in India, including the Department of Zoology, Vellalar College of Arts and Science, and the Department of Microbiology, NS College of Arts and Science. She has also served as the Head, Department of Bioinformatics, KSR College of Arts and Science, Tamil Nadu. At the science-industry interface, she has been a Senior Research Associate at Bio Informatics Research, GIC online (Pvt), Chennai. She has attended several international conferences, such as the 3rd Japan-India Bilateral Workshop on Bioinformatics at AIST, Tsukuba, Japan. She served as a student volunteer and also presented a poster at the 8th Asia-Pacific Bioinformatics Conference, Bangalore. She is particularly interested in bioinformatic methods for the sequence analysis of membrane proteins, for identifying motifs and for disentangling their phylogeny.
She is also interested in developing bioinformatic tools that the wider scientific community can use to address these questions.
