[BBF Standards] Registry exchange format
Raik Gruenberg
raik.gruenberg at crg.es
Fri Apr 11 14:05:54 EDT 2008
Hi Tim,
great to hear from your new institute!
Timothy Ham wrote:
> Hi,
> My name is Tim, and I am working on a part registry for the recently
> formed Joint Bioenergy Institute (JBEI) here in Berkeley. I have read
> many posts on this list with interest.
>
> I was wondering what the progress is on the data exchange format for
> iGEM 2008. I couldn't find anything on openwetware nor the igem web
> site, and I have not heard anything back from Randy. And reading what
> Raik has written on exchange formats leaves me wondering if it's easier
> to just agree on some really minimal exchange fields.
Absolutely! Having a minimal set of properties would be the logical first step.
You have probably already found it; if not, please have a look at the discussion
wiki, where we have loosely collected some points:
http://www.openwetware.org/wiki/The_BioBricks_Foundation:Standards/Technical/Exchange#minimal_Biobrick_information
>
> As a biologist, what's really important to me is the "desired piece of
> DNA", regardless of packaging format. The packaging is really
> helpful, but sometimes not.
As you have read, we had quite a bit of discussion about "what constitutes a Biobrick"
(see the wiki above). If your registry does not actually do any sample tracking but
only collects the theoretical information about Biobricks, you may indeed not
care much about the wrapping sequences. However, as soon as you want to track
and exchange Biobrick samples, the format is a crucial piece of information that
you need to keep alongside the actual Biobrick sequence. I am already running
into situations where I transfer Biobricks from one format into another, and
the result is, for all experimental purposes, two different Biobricks: one
can be used in my assembly, the other cannot. If your approach is mostly based on
direct synthesis rather than assembly, you can define a 'blunt' Biobrick format
with zero prefix and suffix. Mind you, the format also determines what the scar
looks like in assembled Biobricks.
>
> So here is how I'm doing it at JBEI:
>
> -A Part is a "unique piece of DNA" containing a "desired sequence of
> DNA", the Sequence.
> -The packaging format has a defined prefix and suffix nucleotide sequence.
agreed so far.
> -The Sequence may overlap with the packaging format, if necessary, in
> which case the overlap must be specified. The Sequence remains the
> same.
So far we don't have any case where the sequence overlaps the Biobrick
flanks. The 1.0 coding parts form a special format with a shortened prefix.
I think it's better to leave it like that, so that you can always simply
concatenate vector + prefix + sequence + suffix to get your physical DNA.
On the software side, you may need to implement constraints (e.g. this format
doesn't tolerate that restriction site in the BB sequence), but since there are
not that many formats, we can simply communicate them human-to-human for the time
being.
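The format record and the constraint check described above can be sketched in Python (the language BrickIt's data model already uses). This is a minimal sketch under stated assumptions: FormatRecord, check_constraints and physical_dna are hypothetical names, the 'demo' flanks are placeholders rather than official BBF sequences, and plain string concatenation ignores that the vector is circular. The forbidden sites are the EcoRI, XbaI, SpeI and PstI recognition sites used in standard BioBrick assembly; all four are palindromic, so searching one strand is enough.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FormatRecord:
    """Hypothetical format record: flanking sequences plus sequence constraints."""
    name: str
    prefix: str
    suffix: str
    # Restriction sites the Biobrick sequence itself must not contain.
    # Only the given strand is searched, which suffices for palindromic sites.
    forbidden_sites: Tuple[str, ...] = ()


def check_constraints(fmt: FormatRecord, sequence: str) -> List[str]:
    """Return the forbidden sites found in `sequence` (empty list if none)."""
    seq = sequence.upper()
    return [site for site in fmt.forbidden_sites if site in seq]


def physical_dna(vector: str, fmt: FormatRecord, sequence: str) -> str:
    """Build vector + prefix + sequence + suffix after checking the format's constraints."""
    violations = check_constraints(fmt, sequence)
    if violations:
        raise ValueError(f"format {fmt.name!r} does not tolerate {violations}")
    return vector + fmt.prefix + sequence.upper() + fmt.suffix


# Illustrative formats only: the flanks are placeholders, not official BBF sequences.
SITES = ("GAATTC", "TCTAGA", "ACTAGT", "CTGCAG")  # EcoRI, XbaI, SpeI, PstI
demo = FormatRecord("demo", prefix="GAATTCTCTAGA", suffix="ACTAGTCTGCAG",
                    forbidden_sites=SITES)
blunt = FormatRecord("blunt", prefix="", suffix="")  # zero prefix and suffix

print(physical_dna("", blunt, "atgaaa"))        # ATGAAA
print(check_constraints(demo, "ATGGAATTCTAA"))  # ['GAATTC']
```

Keeping the sequence free of flank overlap is what makes physical_dna a pure concatenation; the format-specific knowledge lives entirely in the record.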
> -The Sequence is sha-1 hashed for identification. Allows quick check
> of duplicates or alterations.
neat trick.
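A minimal sketch of the SHA-1 identification idea, using Python's standard hashlib module. The normalization (uppercase, all whitespace stripped) is an assumption; registries would have to agree on one such convention, otherwise the same sequence would hash differently in different places. The index and the "part-1" record are hypothetical.

```python
import hashlib


def normalize(sequence: str) -> str:
    """Assumed convention: uppercase, with all whitespace removed."""
    return "".join(sequence.split()).upper()


def sequence_hash(sequence: str) -> str:
    """SHA-1 hex digest of the normalized sequence (40 hex characters)."""
    return hashlib.sha1(normalize(sequence).encode("ascii")).hexdigest()


stored = {sequence_hash("ATGAAATAA"): "part-1"}  # hypothetical index: hash -> record

incoming = "atg aaa taa"
if sequence_hash(incoming) in stored:
    print("duplicate of", stored[sequence_hash(incoming)])  # duplicate of part-1
```

A changed digest for a record that should be unchanged flags an alteration, which is the second use Tim mentions.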
> -A "Part Number" is assigned to a Part. So a Sequence, and the same
> Sequence in two packaging formats may get three total "part numbers".
Right. So my format discussion above was not needed.
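To make the counting concrete: if part numbers are keyed on the pair (sequence hash, format), one Sequence in two packaging formats yields three distinct part numbers. PartNumberRegistry and the JBEI_000001 numbering are hypothetical illustrations, not JBEI's actual scheme.

```python
import hashlib
from typing import Dict, Optional, Tuple


class PartNumberRegistry:
    """Hypothetical registry: one part number per (sequence, format) pair."""

    def __init__(self, code: str):
        self.code = code
        self._numbers: Dict[Tuple[str, Optional[str]], str] = {}

    def part_number(self, sequence: str, fmt: Optional[str] = None) -> str:
        # The sequence is identified by its SHA-1 hash; fmt=None means "bare sequence".
        digest = hashlib.sha1(sequence.upper().encode("ascii")).hexdigest()
        key = (digest, fmt)
        if key not in self._numbers:
            self._numbers[key] = f"{self.code}_{len(self._numbers) + 1:06d}"
        return self._numbers[key]


reg = PartNumberRegistry("JBEI")
bare = reg.part_number("ATGAAA")
in_a = reg.part_number("ATGAAA", fmt="format-A")
in_b = reg.part_number("ATGAAA", fmt="format-B")
print(bare, in_a, in_b)  # JBEI_000001 JBEI_000002 JBEI_000003
```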
> -Composite Part is nothing but a new Part with a new Sequence with
> regions annotated as being from other Parts.
We (or at least I; I didn't get much feedback on that) favored not copying
the actual sequence (stay DRY: Don't Repeat Yourself). So the composite part
would be defined as a unique sequence of basic Biobricks plus all intervening scar
sequences. This is also the BrickIt approach.
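A minimal sketch of the DRY composite: only basic part IDs and the scars between them are stored, and the full sequence is assembled on demand. The class name, the part IDs and the "TT" scar are hypothetical placeholders (real scars depend on the Biobrick format), and this is not BrickIt's actual model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CompositePart:
    """Hypothetical composite defined by reference (DRY): basic part IDs plus scars.

    `scars[i]` sits between `parts[i]` and `parts[i + 1]`, so a composite of
    n basic parts carries n - 1 scars and no copied sequence.
    """
    parts: List[str]
    scars: List[str]

    def sequence(self, library: Dict[str, str]) -> str:
        """Expand the composite on demand from the current basic-part sequences."""
        if not self.parts:
            return ""
        if len(self.scars) != len(self.parts) - 1:
            raise ValueError("need exactly one scar between each pair of parts")
        pieces = [library[self.parts[0]]]
        for scar, part_id in zip(self.scars, self.parts[1:]):
            pieces.extend([scar, library[part_id]])
        return "".join(pieces)


library = {"jbei:A": "AAAA", "jbei:B": "CCCC", "jbei:C": "GGGG"}
composite = CompositePart(["jbei:A", "jbei:B", "jbei:C"], ["TT", "TT"])
print(composite.sequence(library))  # AAAATTCCCCTTGGGG
```

Because nothing is copied, a corrected basic part shows up in every composite the next time it is expanded; the price is that all referenced parts must be reachable to reconstruct the composite.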
>
> -Exchange is done via xml.
There indeed needs to be an XML serialization. I think RDF/XML would make us
more future-proof.
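To illustrate what an RDF/XML serialization could look like, here is a sketch using only Python's standard xml.etree.ElementTree. The rdf namespace is the standard W3C one; the bb vocabulary namespace, the property names and the part URI are hypothetical, since no shared Biobrick vocabulary is assumed here. A dedicated RDF library would likely be the more robust choice in practice.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BB = "http://example.org/biobrick-vocab#"  # hypothetical vocabulary namespace

ET.register_namespace("rdf", RDF)
ET.register_namespace("bb", BB)


def part_to_rdfxml(uri: str, properties: dict) -> str:
    """Serialize one part as an rdf:Description whose subject is the part's URI."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    for name, value in properties.items():
        ET.SubElement(desc, f"{{{BB}}}{name}").text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = part_to_rdfxml(
    "http://example.org/jbei/BBb_012121",  # hypothetical part URI
    {"sequence": "ATGAAA", "format": "blunt"},
)
print(xml_text)
```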
> -The db accepts, stores, and retrieves foreign tags. This allows
> accepting customizations from other registries. If registry A adds a
> new field for storing model information, semantics, ontology or
> whatever, and sends this xml file to registry B that's not ready to
> handle it, registry B still holds and exports that information
> unmolested. Proper namespace is required for foreign tags.
Mhm, interesting idea. That would need to be tested in practice. My idea was rather
that each registry keeps only the properties it knows about, and other locations can
project further information onto the Biobrick using its unique location.
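Tim's foreign-tag proposal can be sketched with the standard library as well: fields in the registry's own namespace are parsed, any other namespaced element is kept as an opaque element and written back on export, and un-namespaced foreign tags are rejected, following the requirement that foreign tags carry a proper namespace. Both namespaces and all element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

KNOWN_NS = "http://example.org/registry-b#"  # hypothetical: fields registry B understands


def split_part(xml_text: str) -> Tuple[str, Dict[str, Optional[str]], List[ET.Element]]:
    """Separate known fields from foreign (namespaced) elements, which are kept as-is."""
    root = ET.fromstring(xml_text)
    known, foreign = {}, []
    for child in root:
        if child.tag.startswith(f"{{{KNOWN_NS}}}"):
            known[child.tag.split("}", 1)[1]] = child.text
        elif child.tag.startswith("{"):
            foreign.append(child)  # stored untouched, including nested content
        else:
            raise ValueError(f"foreign tag {child.tag!r} has no namespace")
    return root.tag, known, foreign


def rebuild_part(tag: str, known: Dict[str, Optional[str]],
                 foreign: List[ET.Element]) -> ET.Element:
    """Re-export the part: known fields first, then the foreign elements unmolested."""
    root = ET.Element(tag)
    for name, value in known.items():
        ET.SubElement(root, f"{{{KNOWN_NS}}}{name}").text = value
    root.extend(foreign)
    return root


doc = ('<b:part xmlns:b="http://example.org/registry-b#" '
       'xmlns:a="http://example.org/registry-a#">'
       '<b:sequence>ATGAAA</b:sequence><a:model>ode-v1</a:model></b:part>')
tag, known, foreign = split_part(doc)
print(ET.tostring(rebuild_part(tag, known, foreign), encoding="unicode"))
```

Registry B never interprets a:model; it only has to store and re-emit it, which is what keeps the scheme cheap for registries that do not support a given extension.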
> -uuids get attached to "part numbers" for global identification.
>
I much prefer the URI concept, which allows data to be cross-linked. So each
Biobrick should have a unique location. Still, it would be useful to have unique
IDs that do not include a full address, but these could simply consist of a
code for each registry followed by whatever you deem unique within that
registry. Well, the typical N3 translation of a URI ends up looking like this
anyway:
@prefix jbei: <http://....> .
...
jbei:BBb_012121
Only minimal, one-time communication with the BBF is required to ensure that
nobody else uses 'jbei' as their registry code. This looks much more human-readable
to me than a long hex number or something, and it remains RDF-compatible.
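A sketch of the registry-code idea: a small table maps each code to a base URI, so that short IDs like jbei:BBb_012121 expand to full URIs and back. The base URI is a placeholder, and register() only mimics locally the uniqueness check that would really be a one-time agreement with the BBF; all function names are hypothetical.

```python
from typing import Dict

# Hypothetical prefix table: registry code -> base URI. The base URIs are
# placeholders; each code would be cleared once with the BBF to keep it unique.
PREFIXES: Dict[str, str] = {}


def register(code: str, base: str, prefixes: Dict[str, str] = PREFIXES) -> None:
    """Claim a registry code, refusing codes already bound to another base URI."""
    if prefixes.get(code, base) != base:
        raise ValueError(f"registry code {code!r} is already taken")
    prefixes[code] = base


def expand(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Turn 'jbei:BBb_012121' into a full URI."""
    code, sep, local = curie.partition(":")
    if not sep or not local or code not in prefixes:
        raise KeyError(f"cannot expand {curie!r}")
    return prefixes[code] + local


def compact(uri: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Turn a full URI back into the short 'code:local_id' form."""
    for code, base in prefixes.items():
        if uri.startswith(base):
            return f"{code}:{uri[len(base):]}"
    raise KeyError(f"no registry code for {uri!r}")


register("jbei", "http://example.org/jbei/parts/")
uri = expand("jbei:BBb_012121")
print(uri)           # http://example.org/jbei/parts/BBb_012121
print(compact(uri))  # jbei:BBb_012121
```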
> Is this minimal enough to agree on?
I think the most important issues are:
- the ID (URI rather than uuid)
- include a format record
- whether you would consider supporting RDF / XML.
>
> Some other nice fields we have:
> Creator, Aliases, References, keywords, summary notes, attached files.
> A draft jbei xml format is mostly complete, and I will post it here in
> a few days.
Otherwise, I pretty much agree field-wise. It may be helpful for you to have a
look at the BrickIt data model. It's a simple file defining some Python classes,
so you can study it without even installing BrickIt:
http://brickit.svn.sourceforge.net/viewvc/brickit/branches/devel/djbrickit/repository/models.py?view=markup
(Note that I am pointing you to the devel branch here.)
BTW, did you consider building on the BrickIt base? I am quite eager to see
more distributed development here...
In any case, if your platform choice is still open, you may want to have a close
look at Django. Hint: the Python/Django combination is the first web solution
supported by the upcoming Google app platform. That tells us something, I would
say ;-)
Greetings and have a nice weekend!
Raik
>
> Exchange of parts information should be as easy as sending people e-mail.
> As such, I would like to coordinate with others as much as possible.
>
> Tim
>
> _______________________________________________
> Standards mailing list
> Standards at biobricks.org
> http://biobricks.org/mailman/listinfo/standards_biobricks.org
>
>
--
________________________________
Dr. Raik Gruenberg
http://www.raiks.de/contact.html
________________________________