[BBF Standards] Registry exchange format
Douglas Densmore
densmore at eecs.berkeley.edu
Fri Apr 11 15:37:20 EDT 2008
Hello,
I will use this topic as a chance to introduce myself to the list. My
name is Doug Densmore and I am a post doc in EECS here at UC Berkeley.
My background is in the modeling of embedded electronics at the system
level and performing formal refinement verification of programmable
architecture models. I am going to be running the tools side of the iGEM
team here at UC Berkeley this summer and the topic brought up by Tim is
of interest to me as well.
One of the key components of the tool that I am creating here is support
for remote repositories of parts whether they be exchanged via XML or
through relational databases. The tool will be agnostic to the storage
mechanism: it will provide a variety of standard interfaces and also
support user-defined interfaces.
Right now, my thought is that the tool will support a number of
functionalities. Each functionality will require certain information.
What will be required is a "binding" between what information the tool
requires and what information is provided by the data source. For
example, the database entry may provide info A, B, C, D. The function
that the tool wants to perform may require info A, B, Y, Z. A and B are
fine since the function and the data are consistent here (part of why
standardizing this exchange is useful). The tool will now prompt the
user and say, "I am missing Y and Z". The user will now have to see if C
or D can be bound to this information. Perhaps this could be as trivial
as a naming convention or perhaps a small transformation has to take
place. If so, the user can specify, for example, that C maps to Y (C->Y). This
information can be stored LOCALLY and associated with this data source
so that it does not have to be done in the future. In this case the
naming convention can be stored locally so the tool will know to apply
the transformation each time and will not actually modify the
information stored on the data side. In the case where Z cannot be
bound, the tool will "push" information to the user indicating that the
functionality cannot be performed on the given data. If the same gaps
show up often, they are good candidates to add to the exchange
format.
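The import-side binding described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual design: the `Binding` type, the `resolve` function and the idea of a binding as a (source field, transform) pair are assumptions made for the example, and the field names A, B, C, D, Y, Z are the placeholders from the paragraph above.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical binding type: tool field -> (data-source field, transform).
# An identity transform is a purely syntactic (renaming) binding; any other
# function is a small computational binding.
Binding = Dict[str, Tuple[str, Callable[[str], str]]]


def identity(value: str) -> str:
    return value


def resolve(required: List[str], record: Dict[str, str],
            binding: Binding) -> Tuple[Dict[str, str], List[str]]:
    """Collect the values a tool function needs and report what is missing."""
    resolved: Dict[str, str] = {}
    missing: List[str] = []
    for field in required:
        if field in record:
            # Tool and data source already agree on this field (A, B).
            resolved[field] = record[field]
        elif field in binding and binding[field][0] in record:
            # Apply the locally stored binding, e.g. C -> Y.
            source_field, transform = binding[field]
            resolved[field] = transform(record[source_field])
        else:
            # Cannot be bound (Z): report it back to the user.
            missing.append(field)
    return resolved, missing


# The binding is kept on the tool side; the record itself is never modified.
local_binding: Binding = {"Y": ("C", identity)}
record = {"A": "a", "B": "b", "C": "c", "D": "d"}
values, missing = resolve(["A", "B", "Y", "Z"], record, local_binding)
print(values, missing)
```

With the C->Y binding in place, only Z is reported as missing, which is the case where the tool tells the user the function cannot run on this data.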
That scenario covers importing and using data. The other scenario is
exporting data from the tool when the exchange format cannot store all
of the information we produce, or requires information we do not have.
Again, a reverse binding will have to occur. A key issue is that we do
not want information "lost" in either direction, since that makes the
part less useful or meaningful.
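A rough sketch of the export direction, assuming the exchange format is just a set of allowed and required field names. The function `export_record` and all field names (`seq`, `sim_result`, `creator`, ...) are hypothetical. The point it illustrates is that a field with no slot in the exchange format is set aside on the tool side rather than silently dropped, and that required fields the tool cannot supply are reported.

```python
from typing import Dict, List, Set, Tuple


def export_record(local: Dict[str, str],
                  reverse_binding: Dict[str, str],
                  exchange_fields: Set[str],
                  required_fields: Set[str]
                  ) -> Tuple[Dict[str, str], Dict[str, str], List[str]]:
    """Map tool-side fields onto the exchange format without losing any.

    Returns (exported, kept_locally, missing_required).
    """
    exported: Dict[str, str] = {}
    kept_locally: Dict[str, str] = {}
    for name, value in local.items():
        # Rename through the reverse binding if one exists.
        target = reverse_binding.get(name, name)
        if target in exchange_fields:
            exported[target] = value
        else:
            # No slot in the exchange format: keep it on the tool side.
            kept_locally[name] = value
    # Required exchange fields the tool could not supply.
    missing_required = sorted(required_fields - set(exported))
    return exported, kept_locally, missing_required


local = {"seq": "ATGC", "name": "demo", "sim_result": "0.42"}
exported, kept, missing = export_record(
    local,
    reverse_binding={"seq": "sequence"},
    exchange_fields={"sequence", "name", "creator"},
    required_fields={"sequence", "name", "creator"},
)
print(exported, kept, missing)
```

Here `sim_result` stays local and `creator` is flagged as missing, so nothing disappears without the user knowing.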
I agree with Tim that it is useful for a repository to accept
additional information, but I worry that if N users use N different
naming conventions, then the part quickly gets very "noisy". I also
imagine that some information can be derived from the "minimum required
entries", so storing it is redundant.
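Tim's proposal carries such extra information as namespace-qualified foreign tags in the XML. A minimal sketch of how a receiving tool might keep those tags intact but separate from the core fields, and count them per namespace so the "noise" is at least visible, using Python's standard `xml.etree.ElementTree`. The core tag set, the sample document and the namespace URI are made up for illustration and are not the JBEI format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical core fields; anything else must carry a namespace.
CORE_TAGS = {"name", "sequence"}


def split_part(xml_text: str) -> Tuple[Dict[str, str], List[ET.Element]]:
    """Separate core fields from namespaced foreign tags, keeping the latter intact."""
    root = ET.fromstring(xml_text)
    core: Dict[str, str] = {}
    foreign: List[ET.Element] = []
    for child in root:
        if child.tag in CORE_TAGS:
            core[child.tag] = (child.text or "").strip()
        elif child.tag.startswith("{"):
            # ElementTree spells namespaced tags "{uri}local"; the element
            # (attributes, text, children) is kept as-is for re-export.
            foreign.append(child)
        else:
            raise ValueError(f"unknown tag without a namespace: {child.tag}")
    return core, foreign


def foreign_counts(foreign: List[ET.Element]) -> Dict[str, int]:
    """Count foreign tags per namespace URI."""
    counts: Dict[str, int] = {}
    for element in foreign:
        uri = element.tag[1:].split("}", 1)[0]
        counts[uri] = counts.get(uri, 0) + 1
    return counts


sample = """<part xmlns:regA="http://example.org/regA">
  <name>demo</name>
  <sequence>ATGC</sequence>
  <regA:model>ode</regA:model>
</part>"""

core, foreign = split_part(sample)
print(core, foreign_counts(foreign))
```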
In summary:
- I think there should be "required" exchange fields
- I think a function in a tool should be described abstractly as to what
data (fields) it needs
- I think tools should be able to locally bind data fields to the
operations they support; this binding can be user specified; this
binding can be syntactic or computational
- I think we need to be careful about adding any random data field to
the source repository
- I think local information gained during the manipulation of a part can
be stored locally and associated with that part, so that it is available
again if the part is brought back into the design environment
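The second and third points (tool functions described only by the fields they need, plus bindings stored locally per data source) could look roughly like this. Everything here is hypothetical: the function names `simulate` and `assemble`, their field requirements, and the source name `regA` are placeholders chosen for the example.

```python
from typing import Dict, Set

# Hypothetical tool functions, each described only by the fields it needs.
FUNCTION_REQUIREMENTS: Dict[str, Set[str]] = {
    "simulate": {"sequence", "promoter_strength"},
    "assemble": {"sequence", "prefix", "suffix"},
}

# Bindings kept on the tool side, keyed by data source:
# needed field -> field the source actually provides.
LOCAL_BINDINGS: Dict[str, Dict[str, str]] = {}


def runnable_functions(source: str, provided: Set[str]) -> Dict[str, Set[str]]:
    """Map each tool function to the fields still missing (empty set = runnable)."""
    bound = LOCAL_BINDINGS.get(source, {})
    available = provided | {need for need, have in bound.items() if have in provided}
    return {name: req - available for name, req in FUNCTION_REQUIREMENTS.items()}


LOCAL_BINDINGS["regA"] = {"promoter_strength": "rel_strength"}
print(runnable_functions("regA", {"sequence", "rel_strength"}))
```

Because the binding is stored under `regA`, the same fields coming from another source are not automatically treated as bound; the user would be asked again for that source.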
I can't comment on Tim's suggestions regarding the specific fields
until I begin to actually see how these parts are used (i.e. learn more
biology). I do like the hashing suggestion for not duplicating
sequences. Is there work on sequence compression (just curious)?
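A small sketch of the hashing idea from Tim's message: SHA-1 of the Sequence as an identifier, so several part numbers carrying the same Sequence can be spotted quickly. The normalization step (ignoring whitespace and letter case) and the `SequenceIndex` class are assumptions for illustration; Tim's message does not say how sequences are normalized before hashing.

```python
import hashlib
from typing import Dict, List


def sequence_hash(sequence: str) -> str:
    # Assumed normalization: drop whitespace and ignore letter case.
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha1(normalized.encode("ascii")).hexdigest()


class SequenceIndex:
    """Hypothetical index from sequence hash to the part numbers that share it."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, List[str]] = {}

    def add(self, part_number: str, sequence: str) -> List[str]:
        """Register a part; return earlier part numbers with the same Sequence."""
        digest = sequence_hash(sequence)
        earlier = list(self._by_hash.get(digest, []))
        self._by_hash.setdefault(digest, []).append(part_number)
        return earlier


index = SequenceIndex()
index.add("P1", "atg ccc")
print(index.add("P2", "ATGCCC"))
```

Any single-base change gives a different digest, which is what makes the hash useful for detecting alterations as well as duplicates.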
Doug
>Today's Topics:
>
> 1. Registry exchange format (Timothy Ham)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Thu, 10 Apr 2008 12:40:30 -0700
>From: "Timothy Ham" <tsham at lbl.gov>
>Subject: [BBF Standards] Registry exchange format
>To: standards at biobricks.org
>Message-ID:
> <632cdbf70804101240u1c47ff55ga82e52a58be1875 at mail.gmail.com>
>Content-Type: text/plain; charset=ISO-8859-1
>
>Hi,
>My name is Tim, and I am working on a part registry for the recently
>formed Joint Bioenergy Institute (JBEI) here in Berkeley. I have read
>many posts on this list with interest.
>
>I was wondering what the progress is on the data exchange format for
>iGEM 2008. I couldn't find anything on OpenWetWare or the iGEM web
>site, and I have not heard anything back from Randy. Reading what
>Raik has written on exchange formats leaves me wondering if it's easier
>to just agree on some really minimal exchange fields.
>
>As a biologist, what's really important to me is the "desired piece of
>DNA", regardless of packaging format. The packaging is really
>helpful, but sometimes not.
>
>So here is how I'm doing it at JBEI:
>
>-A Part is a "unique piece of DNA" containing a "desired sequence of
>DNA", the Sequence.
>-The packaging format has a defined prefix and suffix nucleotide sequence.
>-The Sequence may overlap with the packaging format, if necessary, in
>which case the overlap must be specified. The Sequence remains the
>same.
>-The Sequence is SHA-1 hashed for identification. This allows a quick
>check for duplicates or alterations.
>-A "Part Number" is assigned to a Part. So a Sequence on its own, and the
>same Sequence in two packaging formats, may get three "part numbers" in total.
>-A Composite Part is simply a new Part with a new Sequence, with
>regions annotated as being from other Parts.
>
>-Exchange is done via xml.
>-The db accepts, stores, and retrieves foreign tags. This allows
>accepting customizations from other registries. If registry A adds a
>new field for storing model information, semantics, ontology or
>whatever, and sends this xml file to registry B that's not ready to
>handle it, registry B still holds and exports that information
>unmolested. Proper namespace is required for foreign tags.
>-uuids get attached to "part numbers" for global identification.
>
>Is this minimal enough to agree on?
>
>Some other nice fields we have:
>Creator, Aliases, References, keywords, summary notes, attached files.
>A draft jbei xml format is mostly complete, and I will post it here in
>a few days.
>
>Exchange of parts information should be as easy as sending e-mail.
>As such, I would like to coordinate with others as much as possible.
>
>Tim
>
>
>
>------------------------------
>
>_______________________________________________
>Standards mailing list
>Standards at biobricks.org
>http://biobricks.org/mailman/listinfo/standards_biobricks.org
>
>
>End of Standards Digest, Vol 4, Issue 2
>***************************************