[BBF Standards] Biobrick data exchange format

Raik Gruenberg raik.gruenberg at crg.es
Fri Feb 1 08:16:26 EST 2008


> But I would rephrase #1 to say
> 
> 1.) What is the data model needed to describe a biobrick?

Good point! And we are in the middle of this discussion.

Clearly, there are lots of data we would like to capture about Biobricks --
in order to aid (automatic?) design, to set up simulations, etc.  Furthermore, we
can't even know yet what kind of data might arrive from future experiments, or
what may be crucial for simulations two years from now.

The risk I see is that we now lose ourselves in the quest for the perfect
all-encompassing data model, from which we will emerge after two years of debate 
and specialized workshops with a big fat XML or UML layout (in need of revision 
immediately), plus a database of 9 or 10 Biobricks fulfilling the standard...

On the other hand, Biobricks are being planned, ordered and cloned right now -- 
just yesterday, I added 6 new users from 4 different labs to our little
internal registry here at the CRG (and my own list of Biobricks is growing
too). We need some way to exchange the information that is essential to the
experimentalist. And we need it rather quickly to encourage exchange, expand the
community beyond iGEM, avoid fragmentation, and, eventually, get more Biobricks
to play with!

> Once the data model is firmly in place, the format should follow as the one
> that best implements that data model. 

I would suggest first focusing on a *minimal* biobrick description that contains
the essential information for (human) planning of new devices and assembly. 
Preferably, the data model should not contain much more (rather less, where 
possible) than what is now available at parts.mit.edu.

Admittedly, this strategy is also connected to my (now very outdated) experience 
with the RDF/N3 format. Unlike the classic approach, RDF/XML/N3 would allow us
to get up and running quickly with a core biobrick definition and refine the
data model and add connections later, without breaking the existing format. It
also looks like a lot of the data are going to be relations/links, e.g. BBa_03
"is made from" BBa_01 and BBa_02 or "works together with" BBa_10, "works in"
<chassis>, etc -- for which RDF is well suited.
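
To make the relation idea concrete, here is a minimal sketch in Python (plain
tuples, no RDF library) of how such links could be held as
subject/predicate/object triples. The predicate names (made_from, works_with,
works_in), the chassis name and the example.org namespace are made up for
illustration; they are not an agreed vocabulary.

```python
# Hypothetical namespace; not an agreed Biobrick vocabulary.
BB = "http://example.org/biobrick/"

# Each statement is one (subject, predicate, object) triple.
triples = {
    ("BBa_03", "made_from", "BBa_01"),
    ("BBa_03", "made_from", "BBa_02"),
    ("BBa_03", "works_with", "BBa_10"),
}


def objects(graph, subject, predicate):
    """Return all objects linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def to_ntriples(graph):
    """Serialize the triples as N-Triples-style lines with full URIs."""
    return "\n".join(
        f"<{BB}{s}> <{BB}{p}> <{BB}{o}> ." for s, p, o in sorted(graph)
    )


# A relation type added later does not invalidate the earlier statements.
# "some_chassis" is a placeholder name, not a real chassis identifier.
triples.add(("BBa_03", "works_in", "some_chassis"))

print(objects(triples, "BBa_03", "made_from"))
print(to_ntriples(triples))
```

The point of the sketch is only that a new kind of link is one more triple,
not a change to a fixed record layout.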

> So things to think about in a model are what type of relationships do we want to convey?
> 

essential IMO:

 > * Inheritance (where was a particular part derived from, and by who)
 > * Plays well with others (what other parts can this one interact with - with
 >   possible data associated with this interaction = link + data)

+ source gene / organism (if any)
+ contributing user(s) / lab / organization
+ target chassis / organism
+ assembly format (prefix/suffix/scar)
   [But: does the same part in different format constitute a different biobrick?]

+ references (web / wiki)
+ references (literature)
+ references (experiences)

+ type or tags or some classification
+ status or progress
+ Sequencing results or 'verified' flag

Plus simple facts:

+ ID
+ Sequence
   [But: does the same protein part with different codon optimization
   constitute a different biobrick?]
+ short description
+ long description
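
As a rough illustration of how small such a core record could be, the list
above might map onto something like the following Python sketch. All field
names and the example sequence are my own assumptions for illustration; none
of this is an existing registry schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Biobrick:
    """Minimal part record; field names are hypothetical."""

    # Simple facts
    part_id: str
    sequence: str
    short_description: str = ""
    long_description: str = ""

    # Relations and provenance, all optional so a record can start small
    derived_from: List[str] = field(default_factory=list)
    interacts_with: List[str] = field(default_factory=list)
    source_organism: Optional[str] = None
    contributors: List[str] = field(default_factory=list)
    target_chassis: Optional[str] = None
    assembly_format: Optional[str] = None
    references: List[str] = field(default_factory=list)

    # Classification and status
    tags: List[str] = field(default_factory=list)
    status: Optional[str] = None
    sequence_verified: bool = False


# Made-up example record; the sequence is a placeholder, not real part data.
part = Biobrick(
    part_id="BBa_03",
    sequence="atgaaataa",
    derived_from=["BBa_01", "BBa_02"],
    interacts_with=["BBa_10"],
)
print(part.part_id, part.derived_from, part.sequence_verified)
```

Only the ID and the sequence are required here; everything else can be filled
in as it becomes available.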


> 2. I’m not entirely sure about this, but I think we should consider
> modeling part interactions, especially when empirical data is involved, as
> separate entities than the parts themselves. ... When an interaction is defined,
> are there conditions (i.e. pH range etc.) that should be attached to that
> interaction definition?
> 

Experimental data could be defined separately (think modular :-) -- the beauty
of RDF is that you are free to create a document/resource (say, in Paris)
that links additional data to an existing Biobrick (defined, say, at MIT).
Once the external data and structure have settled, the Biobrick definition (MIT)
can also link back.
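
A small Python sketch of that idea, again with plain tuples rather than an
RDF library: two independently published sets of statements are joined only
by the shared part URI. The URIs, the predicate names and the pH value are
invented for illustration and are not real data.

```python
# Hypothetical URIs for a registry and for an external lab site.
REGISTRY = "http://registry.example.org/part/"
LAB = "http://lab.example.org/measurement/"

# Core definition, as the registry might publish it.
core = {
    (REGISTRY + "BBa_03", "short_description", "example composite part"),
}

# A separate document, published elsewhere, pointing at the same part URI.
# The pH value is a made-up placeholder.
external = {
    (LAB + "m1", "measures", REGISTRY + "BBa_03"),
    (LAB + "m1", "condition_pH", "7.0"),
}


def merge(*graphs):
    """Union of triple sets; shared URIs are what join the documents."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def about(graph, uri):
    """All triples that mention `uri` as subject or object, sorted."""
    return sorted(t for t in graph if t[0] == uri or t[2] == uri)


combined = merge(core, external)

# Later, the registry can add a back-link without touching existing triples.
combined.add((REGISTRY + "BBa_03", "see_also", LAB + "m1"))

for triple in about(combined, REGISTRY + "BBa_03"):
    print(triple)
```

Neither side has to change its own statements for the other to extend them.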

>> It sounds like there needs to be an annotation system in place that  
>> would allow compatibility evidence to be labeled as experimentally or
>>   computationally generated (or both).  We might even consider the
>> 
> I certainly see the need to be able to reference other sub-ontologies 
> related to understanding the experimental methods for testing the 
> brick, or the specifications for the computational evidence etc.
> 

Again, that's exactly how RDF/N3 works to start with.

> The BioBrick parts themselves -- in their XML file format -- would 
> specify what sort of measurement utilities should be used, in some 
> cases maybe PCR on a LOA chip or in other cases maybe visual processing 

IMO, we need to learn more before it's clear what we need there.
See above: you could set up a specialized site for that and link these data in.

 > ...
> I can't help but think that it would be far more easy if we had a much 
> larger knowledge base (kb) to work off of, so that we may summarize and 
> make generalizations about important trends in incompatabilities. But 
> maybe somebody can provide a proof that we can start off without this 
> KB and still be able to make in silico specifications that turn out to 
> be mostly correct?  

I think the key is to start with a set of essential core data and be flexible
enough to add more as the need and consensus arise.

> After all, whatever data format we choose will 
> determine what we can later record in that format, so if we miss some 
> important characteristics or are unable to graph various chemical 
> reactions, then we're screwed and would have to rewrite software many 
> times.

That's exactly the trap we can avoid with the RDF/XML/N3 format. Admittedly, the
software infrastructure for the semantic formats does not seem as mature. This
shouldn't bother us too much for now though -- for example, there are hundreds
of RSS readers out there, and I bet even most of their developers don't realize 
that they are actually working with semantic web documents...

OK, enough RDF propaganda. Sorry about the lengthy mail!
Greetings,
Raik

-- 
________________________________

Dr. Raik Gruenberg
http://www.raiks.de/contact.html
________________________________


