I'm glad you've decided to synchronize your Chado data with CMap. This document will help you get started.
Postgres 8.0+ with Perl support is required to use the Perl triggers. You will need to upgrade your install of Postgres if your version does not meet the requirements.
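As a rough aid for deciding whether an upgrade is needed, the version requirement above can be checked with a small shell helper. This is only a sketch: pg_version_ok is a hypothetical name, it only compares the major version number, and it does not verify that Perl support was compiled in.

```shell
# Hypothetical helper (not part of CMap): succeed when the given
# Postgres version string is 8.0 or later.
pg_version_ok() {
    # Keep only the leading major number, e.g. "8.0.3" -> "8".
    major=${1%%.*}
    case $major in
        ''|*[!0-9]*) return 2 ;;   # not a plain version number
    esac
    [ "$major" -ge 8 ]
}

# Possible usage, assuming the version is the third field on the first
# line of `psql --version` (this reports the client version, not the server):
# pg_version_ok "$(psql --version | awk 'NR==1 {print $3}')" || echo "upgrade needed"
```

The helper takes the version as an argument so it can be tried without a running Postgres server.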
When upgrading, first see the upgrade section of the Postgres INSTALL document.
Here is what I had to do:
$ pg_dumpall > outputfile
Use the --with-perl flag during configuration:
$ ./configure --with-perl
$ make
$ make install
$ su - postgres
$ /usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
$ /usr/local/pgsql/bin/pg_ctl -D /usr/local/pgsql/data -l /usr/local/pgsql/logfile start
$ /usr/local/pgsql/bin/psql -d postgres -f outputfile
$ createlang plperlu template1
$ createlang plperlu dbname
These tables introduce the concept of a feature set (featureset) and link it to feature (feature_featureset) and dbxref (featureset_dbxref). They also add a linker table from featureloc to dbxref (featureloc_dbxref).
$ psql test2 < chado_integration/chado_synchronize/chado_sync_tables.sql
For any of this to work, the featureset table needs to be populated. Someday a script might help with this, but for now you will have to do it by hand.
The concept of a featureset is the same as a map set in CMap. Basically, maps (sequence assemblies, chromosomes or whatever you want to make a map) are grouped into sets. For example, sequence assemblies from the same assembly run would all be in the same set to differentiate them from other assembly runs.
All maps in a set must be of the same type.
All maps in a set must be from the same organism.
Each map must belong to only one featureset; otherwise the synchronized data may become inconsistent.
insert into featureset (name, uniquename, feature_type_id, organism_id) (
    select distinct
        o.abbreviation||'_'||cvt.name,
        o.abbreviation||'_'||cvt.name,
        f.type_id,
        f.organism_id
    from feature f, organism o, cvterm cvt
    where f.type_id = cvt.cvterm_id
      and f.organism_id = o.organism_id
      and cvt.name = 'chromosome_arm'
);
In chado, the ``maps'' are stored in the feature table. Each ``map'' should be connected to its featureset by feature_featureset.
This example is not very complex. For instance, it does not take into account various versions of data, but you can use it as a starting point.
insert into feature_featureset
    select fs.featureset_id, f.feature_id
    from featureset fs, feature f
    where f.type_id = fs.feature_type_id;
You will need to insert new map types and feature types into the CMap config file. The accessions for these types should be the cvterm ``name'' but with spaces replaced by ``_''.
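The naming rule above (cvterm ``name'' with spaces replaced by ``_'') can be sketched in shell as follows. The function name cvterm_to_accession is hypothetical and only illustrates the rule; it is not part of CMap or chado.

```shell
# Hypothetical helper: turn a cvterm name into the accession to use
# in the CMap config file by replacing each space with an underscore.
cvterm_to_accession() {
    printf '%s\n' "$1" | tr ' ' '_'
}

cvterm_to_accession "chromosome arm"   # prints chromosome_arm
```

Names that already contain no spaces, such as chromosome_arm, pass through unchanged.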
This script will look at the featuresets that you've inserted and insert the data into CMap.
You will have several options. The most important is which feature types to look at. If you select too many, the script may take a long time.
$ chado_integration/chado_synchronize/cmap_synchronize_chado.pl --chado_datasource datasource -u sql_username [-p sql_password] --cmap_datasource cmap_datasource
You will now need to use cmap_admin.pm to create correspondences between the newly created features.
Postgres triggers have been written to keep the data in the CMap database synchronized with chado.
Included in this directory is a file called trigger.PL. It will help create the triggers file.
Run trigger.PL using the cmap data source (the example uses CMAP_DEMO as the cmap data source). This will create a file called ``triggers.DATASOURCE.sql'' (in the example it will create ``triggers.CMAP_DEMO.sql'').
$ perl trigger.PL -d CMAP_DEMO
$ psql chado-fly < triggers.CMAP_DEMO.sql
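If you script these two steps, the name of the generated file can be derived from the data source name, following the ``triggers.DATASOURCE.sql'' pattern described above. A minimal sketch, in which triggers_file_for is a hypothetical helper and the commented commands simply repeat the example above:

```shell
# Hypothetical helper: build the trigger file name that trigger.PL
# is described as creating for a given CMap data source.
triggers_file_for() {
    printf 'triggers.%s.sql\n' "$1"
}

datasource=CMAP_DEMO
sql_file=$(triggers_file_for "$datasource")
echo "$sql_file"   # prints triggers.CMAP_DEMO.sql

# The file name could then be reused for both steps, for example:
# perl trigger.PL -d "$datasource"
# psql chado-fly < "$sql_file"
```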
If you want to remove the links to CMap in the chado database, run remove_sync_from_chado.pl (in this directory). This will remove all of the dbxref that point to the CMap database.
$ chado_integration/chado_synchronize/remove_sync_from_chado.pl --chado_datasource datasource -u sql_username [-p sql_password] [--db_base_name db_base_name]
The options are similar to those of cmap_synchronize_chado.pl, except for db_base_name. This is the base name used to name the entries in the ``db'' table. It is set to ``cmap'' by default (which is fine if the value in cmap_synchronize_chado.pl wasn't changed). This value only needs to be changed if multiple CMap databases need to be connected.
From the CMap side, you can specify cross-references (xrefs) for each feature or map. This is not automatic, however.
Then to create the link in the image, you can modify the area_code option in the config directory for each feature type.
The following is an example of how to get an xref from the database.
area_code <<EOF
    my $dbxrefs = $self->sql()->get_xrefs(
        cmap_object => $self,
        object_id   => $feature->{'feature_id'},
        object_type => 'feature',
        xref_name   => 'Chado',
    );
    my $new_url = '';
    if ( @{ $dbxrefs || [] } ) {
        my $t = $self->template;
        $t->process(
            \$dbxrefs->[0]{'xref_url'},
            { object => $feature },
            \$new_url
        );
    }
    $url  = $new_url;
    $code = sprintf("onMouseOver=\"window.status='%s';return true\"",$feature->{'feature_type_acc'});
EOF
For more information about how the area_code works, see the ``Map, Feature and Evidence Type Information'' section of ADMINISTRATION.html.
For more information about how cross-references work, see attributes-and-xrefs.html.
Maybe include the trigger installation in the cmap_synchronize_chado.pl script, or at least replace the text in the trigger.template file to make it easier for the user.
Ben Faga <faga@cshl.org>