BioGuide - Project - User Requirements 

Interviews were conducted with 30 individuals (a questionnaire is available). Their research interests fell into three main domains: the study of diseases (in particular, cancer), functional genomics and structural genomics. Both biologists and physicians were interviewed. All of them were accustomed to performing bioinformatics analyses and to querying biological data with Entrez (NCBI) or SRS (EBI). More information about the affiliations of the interviewees is available on the Acknowledgements page.

We asked interviewees to describe the biological questions they typically pose and to make explicit the main biological entities of interest in those questions.

Complete examples of BioGuide use are available here, while additional examples of how questions can be posed using BioGuide are available here.

Collected Questions


  • What is known about the genetic disease narcolepsy (use at least OMIM and SwissProt)?

  • What proteins and genes are associated with the disease narcolepsy?

  • Mutation of which gene(s) results in dysprothrombinemia haemophilia (disease)?

  • What proteins are inactive in patients having dysprothrombinemia haemophilia (disease)?

  • Do our candidate proteins belong to any known cellular or metabolic pathways found in KEGG?

  • What is the functional domain in Pfam corresponding to this exon?

  • Where is this gene located on the genome (use MapView, FISH clones)?

  • What are the domains of the proteins in this set that are involved in the same pathway according to Gene Ontology?

  • Is there a therapeutic target (e.g. phosphatase domains) to be exploited in this group of genes?

  • What is the functional role of this set of genes differentially expressed in my array?

  • Which parts of my sequence found in GenBank are very similar among this group of genes?

  • From this gene name, give me its sequence from EMBL, the corresponding transcripts and domains (transcripts from SwissProt if possible or TrEMBL otherwise).

  • Return all sequences of genes associated with cholesterol.

  • Return the list of genes of EMBL coding for nuclear cofactors (function).

  • Return the list of genes involved in nuclear cofactor (function).

Microarray experiment analysis

  • In ArrayExpress, list all experiments performed by a single user.

  • Retrieve all experiments entered into GEO since October 31, 2004.

  • Retrieve normalized data for two arrays in an experiment and graph the luminosity values on a log-log scatter plot.

  • List all experiments from a particular lab, or operator.

  • List all experiments using a particular protocol.

  • List all experiments performed on an extract from a particular tissue type.

  • Which genes are expressed in response to pathogen A in GEO?

  • Gene A is overexpressed under conditions {Ci}; is gene A also overexpressed under conditions {Cj} or in another cell type?

  • Genes A, B and C share the same signal in their promoter regions; are they all over- or under-expressed under similar conditions?

  • Which genes are expressed only in a given tissue?

  • From a given gene sequence, return all available functional information (at least GO terms).

  • What is the set of genes that have seen their expression modified in a given condition? Within this set, is there a subset of genes that are co-regulated?

  • What are the elements that may explain the modulation of the expression of certain genes in a given situation?

  • Which genes are differentially expressed between the susceptible vs. non-susceptible mice in MGD?

  • What does the pattern of expression in the different embryonic tissues tell us about how the defect (which comes from a variety of tissues) is caused?

  • What is the time course of these changes in gene expression and how does it relate to the time course of the expression of the corresponding proteins in the two groups (use HugeIndex for the expression data)?

  • In which tissues is this set of genes expressed (consult ArrayExpress)?

Multi-parametric analyses

  • What are the elements that may explain a parallel or opposite modulation (expression) of certain genes:

    • Membership in a functional class?

    • Homologies occurring in their peptide sequences (proteins)?

    • Or in their nucleotide sequences (genes), particularly in the promoter region?

  • Among the deregulated genes, which ones are leaders (conductors, i.e. genes regulating other genes or pathways; consult KEGG and other sources)? Which ones play a target role, acting as terminal effectors that lead directly to the disease state?

  • Is there any correlation between gene expression levels and a certain pathological phenotype?

  • What is the set of genes whose deregulation characterizes a pathological sample by indicating a severity level, a prognostic factor, or a level of sensitivity or resistance to a given treatment?

  • What genes are involved in a multi-genic neurological disorder?

  • Return all the transcription factors upregulated (Expression) in acute myeloid leukemia (Disease) with sequence similarity to common promoter motifs.

  • Is my cDNA similar to any mouse genes in MGD that are predicted to encode transcription factors (transcription factor) and have been localized to mouse chromosome 5?

  • List all genes whose proteins are predicted to contain a signal peptide and for which there is evidence that they are expressed in Plasmodium falciparum's late schizont stage (Disease).

  • Which genes on chromosome 2 are expressed in pancreas and are involved in signal transduction (function)?

  • Have keratin (function) genes ever been found to be expressed in brain tissue?

  • Which microarray platform has the cytokines (proteins) I am interested in?

  • This SNP from dbSNP is in a coding region. Does the native protein have a known three-dimensional structure (in PDB or elsewhere)?

  • Does the amino acid occur in the active site of the protein?

  • Which ligands are known to bind in the active site of this protein?

  • Select proteins that have been annotated as isomerases (function).

  • Select motifs for antigenic human proteins that participate in apoptosis (disease)

Annotation of new genomes

  • Identify homologues of a sequence (Contig from ensEMBL); from these, pull out either the n closest or the specified sequences.

  • Return the BAC clones contained in a minimum tiling path of chromosome 1 of the rice genome.

  • Return groups of genes similar to my sequence and cluster these groups by functional category.

  • What proteins are homologous (use BlastP) to this sequence (of unknown protein)?

  • What proteins of a given organism (bacterium) are homologous to this sequence?

  • Which homologies (between proteins) have changed between two releases of the same databank (i.e. SwissProt + TrEMBL)?

  • What are the differences and similarities between the metabolic pathways of L. bulgaricus and L. johnsonii (use MegaBlast or other tools)?

  • Which pairs of genes in L. bulgaricus have fused to form a single gene in another organism's genome?

  • Which proteins are homologous to Lb. gasseri and/or Lb. plantarum and/or Lactobacillus but have no homologues in any other organism?

  • Which proteins are homologous to both Lactobacillus and another organism, such that the similarity score between the protein and its homologue in Lactobacillus is higher than the score with its homologue in the other organism? In other words, which are the **backbone regions** of these bacteria?

  • For each protein of a given organism, return the list of its homologues having the highest similarity scores and specify the corresponding organisms (use UniProt).

  • What is the exon/intron structure of this gene?

  • What is the set of transcripts (ESTs and messenger RNAs) associated with this gene?

  • What is the set of genes paralogous to this gene (use HomoloGene)?

  • What is the set of ESTs associated with this set of paralogous genes?

  • Which groups of genes have the same intron/exon decomposition?


  • What are the evolutionary relationships among organisms that are capable of nitrogen fixation (function)?

    • By what pathways is nitrogen fixed?

    • Are there any FAME signatures (motif) common to these groups of organisms?

  • Return all of the tubulin (function) proteins using GO terms.

  • Which motif corresponds to the tubulin domain?

  • What are the predicted positions of a given list of introns (using a given list of ESTs)?

  • What are all the predicted ORFs in yeasts?

  • Build the corresponding phylogenetic tree (using the MEGA tool) of this set of proteins.

Genomic location

  • What probe sets of the DNA chip are localized on chromosome 17 (location according to MapView, FISH clones)?

  • Find information on the known DNA sequences on human chromosome 22 (see MapView and UCSCGenome).

  • What are all the genes located around ERBB2?

  • What genes are tyrosine kinase receptors (function) according to SwissProt?

  • Which parts of the gene's sequence correspond to intra or extra cellular parts of the protein (cellular localization)?

  • What functional domains does this gene (NF2) contain (use at least ProDom and InterPro)?

  • Which genes contain a particular functional domain (helicase)?

  • Where are the BACs of my CGH array located on the genome sequence?

  • Which are the known genes in the BAC … (use ensEMBL and UCSCGenome)?

  • Show me the locations of all of the genes that contain inteins (use MapView).

  • Show me the locations of all genes that have names like "dehydrogenase"

  • Show me the locations of all of the genes in this genome whose protein translations contain at least two predicted transmembrane regions; consult Pfam for the domains (domain).

  • Show me the locations of all genes that are longer than 500 bp and are annotated in GO as being involved in amino acid biosynthesis (pathway).

    • Show me only those genes that are expressed in this genome under the conditions described in experiment X.

    • Is the expression of gene BCR1 significantly correlated with the up- or down-regulation of expression of another gene in the genome?

  • Return all sequences (Contig) that are putative members of the olfactory receptor family (function) and that map 'close' to marker M on human chromosome 19.

  • Return all genomic sequences (Contig) for which alu elements are located internal to a gene domain (domain).

  • Return the map location, where known, of all alu elements having homology greater than "h" with the alu sequence (domain) "S".

  • Return all human gene sequences, with annotation information, for which a putative function (function) has been identified.

  • Return all mammalian gene sequences for proteins identified as being involved in intra-cellular signal transduction (Pathways)

  • Return the genes for zinc-finger proteins on chromosome 19 that have been sequenced.

  • Return the number and a list of the distinct human genes that have been sequenced.

  • Return all the human contigs greater than 150 kb.

  • Return all sequences (BACs) for which at least two sequence variants are known.

  • Return all G1/S serine/threonine kinase genes and their translated proteins that are known (experimentally) or are thought (by similarity) also to exhibit tyrosine phosphorylation activity (pathway).

  • How many genes are in the latest human genome release (see MapView and ensEMBL)?

  • Retrieve the coordinates of all inter-ORF regions on chromosome 4.

  • Return all the proteins involved in the oxidative stress response (function) and close to (i.e. similar to, using BlastP) YAP1.

  • Look up the conservation percentage of the most highly expressed genes in yeasts (use at least SGD).

  • Return the average length of the promoter regions of essential genes.

Protein 3D structure, protein-protein docking

  • What is the structure of this protein?

  • What is (are) the function(s) known to be associated with this structure?

  • Show me the 3D structure of a given known protein (from PIR).

  • What are all the structural homologues of this protein (3D structure)?

  • Which proteins contain a catalytic site (structural domain)?

  • Which proteins contain a particular structural domain (use PDB)?

  • Which hydrolases (protein) have two chains (3D structure)?

  • What are all the DNA-binding sites of this protein?

  • From this set of structural motifs, what are the corresponding proteins and the corresponding ORFs having the same 3D structure (use FSSP)?

Laboratoire de Recherche en Informatique, University of Pennsylvania
