[BBF Standards] Registry exchange format

Raik Gruenberg raik.gruenberg at crg.es
Fri Apr 11 14:05:54 EDT 2008


Hi Tim,

great to get a signal from your new institute!

Timothy Ham wrote:
> Hi,
> My name is Tim, and I am working on a part registry for the recently
> formed Joint Bioenergy Institute (JBEI) here in Berkeley. I have read
> many posts on this list with interest.
> 
> I was wondering what the progress is on the data exchange format for
> iGEM 2008. I couldn't find anything on openwetware nor the igem web
> site, and I have not heard anything back from Randy. And reading what
> Raik has written on exchange formats leaves me wondering if it's easier
> to just agree on some really minimal exchange fields.

Absolutely! Having a minimal set of properties would be the logical first step. 
You probably have already found it, if not, please have a look at the discussion 
wiki where we have already loosely collected some points:
http://www.openwetware.org/wiki/The_BioBricks_Foundation:Standards/Technical/Exchange#minimal_Biobrick_information

> 
> As a biologist, what's really important to me is the "desired piece of
> DNA", regardless of packaging format. The packaging is really
> helpful, but sometimes not.

As you read, we had quite some discussion about "what constitutes a Biobrick" 
(see the wiki above). If your registry does not do any sample tracking but 
only collects the theoretical information about Biobricks, you may indeed not 
care much about the wrapping sequences. However, as soon as you want to track 
and exchange Biobrick samples, the format is a crucial bit of information which 
you need to have alongside the actual Biobrick sequence. I am already running 
into the situation that I transfer Biobricks from one format into another 
and the result is, for all experimental purposes, two different Biobricks: one 
can be used in my assembly, the other cannot. If your approach is mostly based 
on direct synthesis rather than assembly, you can define a 'blunt' Biobrick 
format with an empty prefix and suffix. Mind you, the format also determines 
what the scar looks like in assembled Biobricks.

> 
> So here is how I'm doing it at JBEI:
> 
> -A Part is a "unique piece of DNA" containing a "desired sequence of
> DNA", the Sequence.
> -The packaging format has a defined prefix and suffix nucleotide sequence.

agreed so far.

> -The Sequence may overlap with the packaging format, if necessary, in
> which case the overlap must be specified. The Sequence remains the
> same.

So far we don't have any case where the sequence overlaps with the Biobrick 
flanks. The 1.0 coding parts form a special format with a shortened prefix. I 
think it's better to leave it like that so that you can always simply add 
vector + prefix + sequence + suffix to get your physical DNA.
On the software side, you may need to implement constraints (e.g. this format 
doesn't tolerate that restriction site in the BB sequence), but since there are 
not that many formats we can simply communicate them human-2-human for the time 
being.
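
A minimal Python sketch of the "vector + prefix + sequence + suffix" rule and of 
a per-format constraint check. The class name, field names and the placeholder 
prefix/suffix strings are made up for illustration and are not taken from any 
existing registry; only the EcoRI recognition site GAATTC is a real sequence. 
The vector is treated as a linear string, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrickFormat:
    """Hypothetical description of a Biobrick packaging format."""
    name: str
    prefix: str
    suffix: str
    forbidden_sites: tuple = ()  # sites the format does not tolerate


def check_constraints(sequence, fmt):
    """Return the forbidden sites that occur in the Biobrick sequence.

    Only the given strand is searched; this is enough for palindromic
    sites such as GAATTC, other sites would also need a reverse-complement
    search.
    """
    seq = sequence.upper()
    return [site for site in fmt.forbidden_sites if site.upper() in seq]


def physical_dna(vector, fmt, sequence):
    """Concatenate vector + prefix + sequence + suffix (linear simplification)."""
    violations = check_constraints(sequence, fmt)
    if violations:
        raise ValueError(
            "format %s does not tolerate: %s" % (fmt.name, ", ".join(violations))
        )
    return vector + fmt.prefix + sequence + fmt.suffix


# Placeholder flanks, NOT real prefix/suffix sequences.
demo = BrickFormat("demo", prefix="AAAA", suffix="TTTT", forbidden_sites=("GAATTC",))
# A 'blunt' format simply has empty flanks.
blunt = BrickFormat("blunt", prefix="", suffix="")
```

With the 'blunt' format the physical DNA is just vector + sequence, so the same 
function covers the direct-synthesis case.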

> -The Sequence is sha-1 hashed for identification. Allows quick check
> of duplicates or alterations.

neat trick.
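
For illustration, the hashing idea could look like the sketch below. The 
normalization step (strip whitespace, upper-case) is an assumption; the 
original proposal does not say how the Sequence is prepared before hashing, and 
two registries would have to agree on that for the hashes to be comparable.

```python
import hashlib


def sequence_id(sequence):
    """Return the SHA-1 hex digest of a DNA sequence.

    Assumed normalization: all whitespace removed, letters upper-cased.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha1(normalized.encode("ascii")).hexdigest()


def is_duplicate(sequence, known_ids):
    """Quick duplicate check against a set of already stored digests."""
    return sequence_id(sequence) in known_ids
```

Any alteration of the sequence, even a single base, gives a different digest, 
which is what makes the check for duplicates or alterations cheap.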

> -A "Part Number" is assigned to a Part. So a Sequence, and the same
> Sequence in two packaging formats may get three total "part numbers".

Right. So my format discussion above was not needed.

> -Composite Part is nothing but a new Part with a new Sequence with
> regions annotated as being from other Parts.

We (or at least I; I didn't get much feedback on that) were in favor of not 
copying the actual sequence (stay DRY: Don't Repeat Yourself). So the composite 
part would be defined as a unique sequence of basic Biobricks plus all 
intervening scar sequences. This is also the Brickit approach.
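
A rough sketch of such a DRY composite, where the full sequence is derived on 
demand instead of being stored. The class and method names are invented for 
this example and are not the Brickit data model; it also assumes a single scar 
sequence per format, passed in as a placeholder string.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class BasicPart:
    part_id: str
    sequence: str

    def full_sequence(self, scar):
        # A basic part simply is its own sequence; the scar is not used.
        return self.sequence


@dataclass(frozen=True)
class CompositePart:
    part_id: str
    # Ordered references to other parts; no sequence is copied here.
    components: Tuple[Union[BasicPart, "CompositePart"], ...]

    def full_sequence(self, scar):
        # Join the component sequences with the scar left by the format.
        return scar.join(c.full_sequence(scar) for c in self.components)


# Placeholder sequences and scar, for illustration only.
a = BasicPart("A", "AAA")
b = BasicPart("B", "CCC")
ab = CompositePart("AB", (a, b))
print(ab.full_sequence("TT"))  # AAATTCCC
```

Because the scar is an argument, the same composite definition yields a 
different physical sequence for each assembly format.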

> 
> -Exchange is done via xml.

There indeed needs to be an XML serialization. I think RDF/XML would make us 
more future-proof.

> -The db accepts, stores, and retrieves foreign tags. This allows
> accepting customizations from other registries. If registry A adds a
> new field for storing model information, semantics, ontology or
> whatever, and sends this xml file to registry B that's not ready to
> handle it, registry B can still hold and export that information
> unmolested. Proper namespace is required for foreign tags.

Mhm, interesting idea. That would need to be seen in practice. My goal was more 
that each registry keeps only the properties it knows about, and other locations 
can project further information onto the Biobrick using its unique location.
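
To see what the pass-through of foreign tags proposed above might involve, here 
is a small sketch using Python's standard xml.etree.ElementTree. The element 
names ("part", "name", "sequence") and the set of known tags are assumptions, 
since no exchange schema has been posted yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical fields this registry understands natively.
KNOWN_TAGS = {"name", "sequence"}


def import_part(xml_text):
    """Split a <part> record into known fields and foreign elements."""
    root = ET.fromstring(xml_text)
    known, foreign = {}, []
    for child in root:
        if child.tag in KNOWN_TAGS:
            known[child.tag] = child.text or ""
        elif child.tag.startswith("{"):
            # ElementTree writes namespaced tags as "{namespace-uri}local".
            foreign.append(child)
        else:
            raise ValueError("unknown tag without namespace: %s" % child.tag)
    return known, foreign


def export_part(known, foreign):
    """Write known fields and re-attach the foreign elements untouched."""
    root = ET.Element("part")
    for tag, text in known.items():
        ET.SubElement(root, tag).text = text
    root.extend(foreign)
    return ET.tostring(root, encoding="unicode")
```

The foreign elements keep their namespace URI and content on export, although 
ElementTree may rename the namespace prefix (for example to ns0).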

> -uuids get attached to "part numbers" for global identification.
>
I much prefer the URI concept, which allows data to be cross-linked. So each 
Biobrick should have a unique location. It would still be useful to have unique 
ids that don't include a full address, but these could simply consist of a 
code for each new registry followed by whatever you deem unique within this 
registry. Well, the typical N3 notation for a URI ends up looking like this
anyway:

@prefix jbei: <http://....> .
...
jbei:BBb_012121

Some minimal one-time communication with the BBF is required to ensure that 
nobody else is using 'jbei' as their registry code... This looks much more human 
to me than a long hex number or something. Plus it remains RDF compatible.
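
As a sketch of how such registry codes could be handled in software: a table 
maps each code to a base URI, and a short id expands to the full location. The 
base URI below is a made-up placeholder (the real address is left open above), 
and the function names are hypothetical.

```python
def register_code(prefixes, code, base_uri):
    """Claim a registry code; refuse if somebody else already uses it."""
    if code in prefixes:
        raise ValueError("registry code already taken: %s" % code)
    prefixes[code] = base_uri


def expand(short_id, prefixes):
    """Turn 'code:local_id' into a full URI using the prefix table."""
    code, sep, local_id = short_id.partition(":")
    if not sep or code not in prefixes:
        raise ValueError("unknown or missing registry code in %r" % short_id)
    return prefixes[code] + local_id


prefixes = {}
# Placeholder base URI, an assumption for this example only.
register_code(prefixes, "jbei", "http://registry.example.org/jbei/")
print(expand("jbei:BBb_012121", prefixes))
# http://registry.example.org/jbei/BBb_012121
```

The uniqueness check in register_code stands in for the one-time agreement 
with the BBF on who owns which code.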

> Is this minimal enough to agree on?

I think the most important issues are:
- the ID (URI rather than uuid)
- include a format record
- whether you would consider supporting RDF / XML.

> 
> Some other nice fields we have:
> Creator, Aliases, References, keywords, summary notes, attached files.
> A draft jbei xml format is mostly complete, and I will post it here in
> a few days.

Field-wise I pretty much agree otherwise. It may be helpful for you to have a 
look at the BrickIt data model. It's a simple file defining some Python classes 
so you can study it without even installing Brickit:
http://brickit.svn.sourceforge.net/viewvc/brickit/branches/devel/djbrickit/repository/models.py?view=markup
(Note that I am pointing you to the devel branch here.)

BTW, did you consider building upon the BrickIt base? I am quite eager to have a 
more distributed development here...
In any case, if your platform choice is still open, you may want to have a close 
look at Django. Hint: The Python / Django combination is the first web solution 
supported by the upcoming Google app platform. That tells us something I would 
say  ;-)

Greetings and have a nice weekend!
Raik

> 
> Exchange of parts information should be as easy as sending people e-mail.
> As such, I would like to coordinate with others as much as possible.
> 
> Tim
> 

-- 
________________________________

Dr. Raik Gruenberg
http://www.raiks.de/contact.html
________________________________


