[BBF Standards] Registry exchange format

Douglas Densmore densmore at eecs.berkeley.edu
Fri Apr 11 15:37:20 EDT 2008


Hello,

I will use this topic as a chance to introduce myself to the list. My 
name is Doug Densmore and I am a post doc in EECS here at UC Berkeley. 
My background is in the modeling of embedded electronics at the system 
level and performing formal refinement verification of programmable 
architecture models. I am going to be running the tools side of the iGEM 
team here at UC Berkeley this summer and the topic brought up by Tim is 
of interest to me as well.

One of the key components of the tool that I am creating here is support 
for remote repositories of parts, whether they are exchanged via XML or 
through relational databases. The tool will be agnostic to this: it will 
provide a variety of standard interfaces and will also support 
user-defined interfaces.

Right now, my thought is that the tool will support a number of 
functionalities. Each functionality will require certain information. 
What will be required is a "binding" between what information the tool 
requires and what information is provided by the data source. For 
example, the database entry may provide info A, B, C, D. The function 
that the tool wants to perform may require info A, B, Y, Z. A and B are 
fine since the function and the data are consistent here (part of why 
standardizing this exchange is useful). The tool will now prompt the 
user and say, "I am missing Y and Z". The user will now have to see if C 
or D can be bound to this information. Perhaps this could be as trivial 
as a naming convention or perhaps a small transformation has to take 
place. If so, the user can specify the binding, for example C->Y. This 
information can be stored LOCALLY and associated with this data source 
so that it does not have to be redone in the future. In this case the 
naming convention can be stored locally, so the tool will know to apply 
the transformation each time without modifying the information stored on 
the data side. In the case where Z cannot be bound, the tool will "push" 
information to the user indicating that the functionality cannot be 
performed on the given data. If we see a lot of common cases here, they 
would be potential candidates to add to the exchange.
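
To make the binding idea concrete, here is a rough Python sketch. It is 
only an illustration: the names Binding and resolve are hypothetical and 
not part of any existing tool, the fields A, B, C, D, Y, Z are the ones 
from the example above, and upper-casing C to obtain Y is an arbitrary 
stand-in for "a small transformation".

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


def identity(value: Any) -> Any:
    """Default transformation: a pure renaming, the value is unchanged."""
    return value


@dataclass
class Binding:
    """Locally stored rule: obtain a required field from a source field."""
    source_field: str
    transform: Callable[[Any], Any] = identity


def resolve(
    required: List[str],
    record: Dict[str, Any],
    bindings: Dict[str, Binding],
) -> Tuple[Dict[str, Any], List[str]]:
    """Return (resolved fields, fields that could not be bound).

    The source record is only read, never modified.
    """
    resolved: Dict[str, Any] = {}
    missing: List[str] = []
    for name in required:
        if name in record:
            # Tool and data source already agree on this field.
            resolved[name] = record[name]
        elif name in bindings and bindings[name].source_field in record:
            # Apply the locally stored binding (renaming or transformation).
            binding = bindings[name]
            resolved[name] = binding.transform(record[binding.source_field])
        else:
            # Nothing to bind: the tool has to report this to the user.
            missing.append(name)
    return resolved, missing


# Data source provides A, B, C, D; the function needs A, B, Y, Z.
record = {"A": 1, "B": 2, "C": "abc", "D": 4}
bindings = {"Y": Binding("C", str.upper)}  # hypothetical user choice: C->Y
resolved, missing = resolve(["A", "B", "Y", "Z"], record, bindings)
print("resolved:", resolved)
print("missing:", missing)
```

In this sketch the bindings dictionary is what would be saved locally per 
data source, and a non-empty "missing" list is what the tool would push 
back to the user.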

That scenario is for importing and using the data. The other scenario is 
when we have to export data from the tool and the exchange format either 
cannot store all the information that we produce or requires other 
information. Again, a reverse binding will have to occur. A key issue is 
that we do not want information "lost" in either process, since that 
makes the part less useful or meaningful.
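
One possible way to avoid losing information on export, sketched in 
Python under assumptions of my own: split_for_export and the field names 
are hypothetical, and the reverse binding here is a plain renaming (a 
real one might also need to undo a transformation). Whatever the exchange 
format cannot hold is kept locally instead of being dropped.

```python
from typing import Any, Dict, Set, Tuple


def split_for_export(
    tool_record: Dict[str, Any],
    exchange_fields: Set[str],
    reverse_bindings: Dict[str, str],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a tool-side record into (exported, kept_locally).

    reverse_bindings maps a tool field name to an exchange field name.
    Fields with no place in the exchange format are kept locally.
    """
    exported: Dict[str, Any] = {}
    kept_locally: Dict[str, Any] = {}
    for name, value in tool_record.items():
        target = reverse_bindings.get(name, name)
        if target in exchange_fields:
            exported[target] = value
        else:
            kept_locally[name] = value
    return exported, kept_locally


# Hypothetical example: the exchange format only knows A, B, C, D.
tool_record = {"A": 1, "Y": "ABC", "Q": 9}
exported, kept_locally = split_for_export(
    tool_record,
    exchange_fields={"A", "B", "C", "D"},
    reverse_bindings={"Y": "C"},  # reverse of the C->Y import binding
)
print("exported:", exported)
print("kept locally:", kept_locally)
```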

I agree with Tim that it is useful for a repository to accept foreign 
tags, but I worry that if N users use N different naming conventions 
then the part quickly gets very "noisy". I also imagine that some 
information can be derived from the "minimum required entries", so 
storing it would be redundant.

In summary
- I think there should be "required" exchange fields
- I think a function in a tool should be described abstractly as to what 
data (fields) it needs
- I think tools should be able to locally bind data fields to the 
operations they support; this binding can be user specified; this 
binding can be syntactic or computational
- I think we need to be careful about adding any random data field to 
the source repository
- I think local information gained during the manipulation of a part can 
be stored locally and associated with that part, so that it is available 
if the part is again brought into the design environment

I can't comment on Tim's suggestions regarding the specific fields 
until I begin to actually see how these parts are used (i.e. learn more 
biology). I do like the hashing suggestion for not duplicating 
sequences. Is there work on sequence compression (just curious)?
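
For reference, the hashing check could look roughly like the Python 
sketch below. It uses hashlib.sha1 from the standard library; the 
function names and the normalization step (dropping whitespace and 
upper-casing before hashing) are my own assumptions, not something Tim 
specified.

```python
import hashlib
from typing import Dict, List, Tuple


def sequence_id(sequence: str) -> str:
    """SHA-1 hex digest of a DNA sequence after simple normalization."""
    # Assumption: whitespace and letter case are not significant.
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha1(normalized.encode("ascii")).hexdigest()


def find_duplicates(parts: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (first_seen_name, duplicate_name) pairs with equal hashes."""
    seen: Dict[str, str] = {}
    duplicates: List[Tuple[str, str]] = []
    for name, sequence in parts.items():
        digest = sequence_id(sequence)
        if digest in seen:
            duplicates.append((seen[digest], name))
        else:
            seen[digest] = name
    return duplicates


# Hypothetical part names and sequences, for illustration only.
parts = {"p1": "ATGC", "p2": "atg c", "p3": "ATGG"}
print(find_duplicates(parts))
```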

Doug

>Message: 1
>Date: Thu, 10 Apr 2008 12:40:30 -0700
>From: "Timothy Ham" <tsham at lbl.gov>
>Subject: [BBF Standards] Registry exchange format
>To: standards at biobricks.org
>Message-ID:
>	<632cdbf70804101240u1c47ff55ga82e52a58be1875 at mail.gmail.com>
>Content-Type: text/plain; charset=ISO-8859-1
>
>Hi,
>My name is Tim, and I am working on a part registry for the recently
>formed Joint Bioenergy Institute (JBEI) here in Berkeley. I have read
>many posts on this list with interest.
>
>I was wondering what the progress is on the data exchange format for
>iGEM 2008. I couldn't find anything on openwetware nor the igem web
>site, and I have not heard anything back from Randy. And reading what
>Raik has written on exchange formats leaves me wondering if it's easier
>to just agree on some really minimal exchange fields.
>
>As a biologist, what's really important to me is the "desired piece of
>DNA", regardless of packaging format. The packaging is really
>helpful, but sometimes not.
>
>So here is how I'm doing it at JBEI:
>
>-A Part is a "unique piece of DNA" containing a "desired sequence of
>DNA", the Sequence.
>-The packaging format has a defined prefix and suffix nucleotide sequence.
>-The Sequence may overlap with the packaging format, if necessary, in
>which case the overlap must be specified. The Sequence remains the
>same.
>-The Sequence is sha-1 hashed for identification. Allows quick check
>of duplicates or alterations.
>-A "Part Number" is assigned to a Part. So a Sequence, and the same
>Sequence in two packaging formats may get three total "part numbers".
>-Composite Part is nothing but a new Part with a new Sequence with
>regions annotated as being from other Parts.
>
>-Exchange is done via xml.
>-The db accepts, stores, and retrieves foreign tags. This allows
>accepting customizations from other registries. If registry A adds a
>new field for storing model information, semantics, ontology or
>whatever, and sends this xml file to registry B that's not ready to
>handle it, registry B can still hold and export that information
>unmolested. Proper namespace is required for foreign tags.
>-uuids get attached to "part numbers" for global identification.
>
>Is this minimal enough to agree on?
>
>Some other nice fields we have:
>Creator, Aliases, References, keywords, summary notes, attached files.
>A draft jbei xml format is mostly complete, and I will post it here in
>a few days.
>
>Exchange of parts information should be as easy as sending people e-mail.
>As such, I would like to coordinate with others as much as possible.
>
>Tim
>
>
>