The Central Core DNA Sequences Information System (CCSIS) has been set up to support the Molecular Biology and Genomics Unit in the collection and exploitation of GMO-related DNA sequence data. The GMO DNA sequences have been manually annotated following international standards and made available via the web through a sequence retrieval system integrated with over 200 bioinformatics applications. These applications run on a dedicated Apple Workgroup Cluster for Bioinformatics, used for high-performance computing.
The MBG Bioinformatics group is also responsible for the set-up of the IHCP specialised EMBnet node.
Central Core DNA Sequence Information System (CCSIS)
The Central Core DNA Sequences Information System (CCSIS) is the molecular database that stores the GMO sequence data submitted by applicants to the European Union Reference Laboratory for GM Food and Feed (EURL-GMFF). The stored sequences are used to run homology searches that assess the specificity of the proposed GMO detection method, as required by Commission Regulation (EC) No 641/2004.
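The idea of a specificity check can be illustrated with a minimal sketch. It is not the CCSIS procedure: the system relies on homology searches, whereas this example swaps in plain exact string matching to stay self-contained. The function names (`find_amplicons`, `is_specific`), the record names and the short sequences are all invented for illustration.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_amplicons(template: str, fwd_primer: str, rev_primer: str) -> list:
    """Return 0-based, half-open (start, end) spans that the primer pair
    would amplify on the given strand, using exact matches only."""
    template = template.upper()
    fwd = fwd_primer.upper()
    # The reverse primer anneals to the opposite strand, so on this strand
    # we look for its reverse complement downstream of the forward primer.
    rev_site = reverse_complement(rev_primer)
    hits = []
    start = template.find(fwd)
    while start != -1:
        rev_pos = template.find(rev_site, start + len(fwd))
        if rev_pos != -1:
            hits.append((start, rev_pos + len(rev_site)))
        start = template.find(fwd, start + 1)
    return hits


def is_specific(target: str, database: dict, fwd_primer: str, rev_primer: str) -> bool:
    """True if the primer pair yields an amplicon on the target record only."""
    amplified = [name for name, seq in database.items()
                 if find_amplicons(seq, fwd_primer, rev_primer)]
    return amplified == [target]


# Toy data, invented for this example (not real GMO sequences).
toy_database = {
    "event_A": "TTTTACGTACGGGGCCCCTTGACAAAAA",
    "event_B": "TTTTACGTACGGGGCCCCAAAA",  # has the forward site only
}
fwd, rev = "ACGTAC", "TGTCAA"

amplicons = find_amplicons(toy_database["event_A"], fwd, rev)
specific = is_specific("event_A", toy_database, fwd, rev)
```

A real assessment would tolerate mismatches and search both strands against much larger databases, which is what homology search tools are designed for.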
The sequences and their biological metadata have been manually encoded and annotated in international standard sequence formats, according to the rules of the International Nucleotide Sequence Database Collaboration (INSDC: DDBJ/EMBL/GenBank) Feature Table. The annotation was carried out with Sequin, NCBI’s freely available standalone annotation and submission program.
As described in the "Guideline for the submission of DNA sequences to the EURL-GMFF" published on the EURL-GMFF website, the following information (when available) is added to the sequence during the annotation process:
- “DEFINITION” (Title describing the sequence record),
- “SOURCE” and “ORGANISM” (according to the NCBI Taxonomy database),
- “SIZE” (in base pairs (bp)),
- “MOLECULE TYPE” (DNA),
- “TOPOLOGY” (linear / circular),
- “REFERENCE” (References with Authors, Title, Journals, etc.),
- “source” (regions / sources of GMO insert and host organism),
- “STS” (PCR amplicon of the Detection Method),
- “primer_bind” (with primer name and sequence for Fwd, Rev Primer and Probe),
- all genetic elements (“gene”, “promoter”, “terminator”, etc.) and “CDS”,
- the full sequence of the insert(s), together with (at least) the base pairs of the host flanking sequences needed to establish an event-specific detection method,
- the full sequence of the species-specific target (reference gene) and/or its GenBank accession number, if available.
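The list above can be read as a checklist for each record. The sketch below shows one way such a checklist might be expressed in Python. The class names, field names and the `missing_annotations` helper are assumptions made for illustration; they do not reproduce Sequin or the CCSIS software, and the example record is invented.

```python
from dataclasses import dataclass, field

# Feature keys taken from the list above. "CDS" stands in here for the
# broader group of genetic elements (gene, promoter, terminator, ...).
EXPECTED_FEATURE_KEYS = ("source", "STS", "primer_bind", "CDS")


@dataclass
class Feature:
    key: str        # feature key, e.g. "primer_bind"
    start: int      # 1-based, inclusive
    end: int        # 1-based, inclusive
    note: str = ""  # free text, e.g. primer name


@dataclass
class SequenceRecord:
    definition: str
    organism: str
    molecule_type: str
    topology: str
    sequence: str
    features: list = field(default_factory=list)

    @property
    def size(self) -> int:
        """Sequence length in base pairs."""
        return len(self.sequence)


def missing_annotations(record: SequenceRecord) -> list:
    """Return the names of expected items that are absent or invalid."""
    missing = []
    if not record.definition.strip():
        missing.append("DEFINITION")
    if not record.organism.strip():
        missing.append("ORGANISM")
    if record.molecule_type != "DNA":
        missing.append("MOLECULE TYPE")
    if record.topology not in ("linear", "circular"):
        missing.append("TOPOLOGY")
    present = {feature.key for feature in record.features}
    missing.extend(key for key in EXPECTED_FEATURE_KEYS if key not in present)
    return missing


# Invented example record, incomplete on purpose.
example = SequenceRecord(
    definition="Hypothetical insert with flanking region",
    organism="Zea mays",
    molecule_type="DNA",
    topology="linear",
    sequence="ACGT" * 10,
    features=[
        Feature("source", 1, 40),
        Feature("primer_bind", 1, 10, note="hypothetical Fwd primer"),
    ],
)
gaps = missing_annotations(example)
```

Since the guideline asks for this information only "when available", a check of this kind is better suited to flagging gaps for a curator than to rejecting records.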
The sequence records are uploaded to the GMO sequence database and made accessible via the web, with restricted user access. The database is integrated with common bioinformatics applications (BLAST, ClustalW, the EMBOSS package, etc.) for immediate analysis.
As of July 2007, CCSIS contained 120 nucleotide sequence records (662,770 bp in total) of GMOs and the related specific PCR amplicons, with primers and probes for GMO detection methods. The records were derived from 60 GMO dossiers and manually encoded and annotated in international standard sequence format.
Bioinformatics Tools and Computing Resources
To provide a better bioinformatics service to the Molecular Biology and Genomics Unit, a dedicated high-performance computing cluster hosting the "Central Core DNA Sequence Information System" has been installed and made available via the web. The cluster is built from four dual-processor Apple Xserve G5 nodes running the Mac OS X Server operating system. Each node has two G5 CPUs running at 2.3 GHz and 2 GB of RAM.
Over 180 bioinformatics tools (such as NCBI BLAST, ClustalW and EMBOSS) are made available through the BioTeam iNquiry package. The cluster nodes are set up in a portal architecture and use Sun Grid Engine for distributed computing and resource management. Application integration is controlled via an extensible XML-based framework based on Pise.
The GMO sequence data of the CCSIS and many publicly available sequence databases, such as GenBank and SwissProt, are installed locally and integrated with the local MRS sequence retrieval system and the bioinformatics tools.
Since 2005 the Molecular Biology and Genomics Unit has managed the IHCP-BGMOs Unit Specialist EMBnet node.