[BBF Standards] data exchange -- say no to web services

Raik Gruenberg raik.gruenberg at crg.es
Wed Feb 13 17:51:36 EST 2008


Hi Jason,

would you mind (re)launching the "What is the best format / technology for 
exchange?" section in the wiki? Your mail looks like a very good starting point.

Jason Morrison wrote:
> If you don't care about REST, SPARQL libraries, or application 
> structuring, you can skip my email and know that I vote +1 to "say no to 
> [heavy] web services" :)
> 
> I'd like to put a few ideas out there, but not propose anything 
> concrete.  Raik is certainly on the right track when he suggests we 
> should determine the BB model before the format, and it's during 
> discussion of the latter that we can begin to discuss the interface for 
> querying and updating the data store.  But, since it's on my mind, and I 
> can't help myself:

No need to restrain ourselves. The technology choices also matter for the 
data model discussion -- depending on the technology, we have very different 
possibilities for relations and connections. But before we, for instance, commit 
ourselves to building a semantic web of biobricks, we need some overview of how 
feasible this actually is.

> 
> For reference, I'm considering a piece of web-accessible software, like 
> the MIT Registry or BrickIt, that has BB data in some sort of 
> persistence layer (be it a relational DB, an object DB,  an XML store, a 
> hash store like CouchDB/SimpleDB, or a triple store), offers a 
> human-facing UI, and a programmatic interface for 3rd party software 
> integration that allows *read/write access* with authentication and 
> authorization rules.
> 
> *== XML/DB backend, REST API ==*
> If we end up storing BB descriptor documents in XML of a custom schema 
> (like CellML/SBML), or a relational database (like BrickIt does), and 
> want the tools that store such data to expose a programmatic API, I 
> believe that a RESTful architecture might have some advantages.  In 
> particular, REST is a simpler approach to data access than SOAP; REST is 
> easy to work with since it's simply HTTP, and software support is 
> plentiful. 

Yes, REST may be the way to go. BTW, a web server serving and consuming N3 or 
RDF/XML documents would be the ultimate REST solution, as far as I understand it 
-- in contrast, I think, to the SPARQL API you describe below. So picture an 
automatic data exchange between our local BrickIt server and the MIT registry:

1. update notification
- parts.mit subscribes to brickit.crg RSS-feed
- parts.mit receives RSS digest that there is a new biobrick record BBb_F0101

2. Read access
- parts.mit loads brickit.crg/parts.n3#BBb_F0101
-> brickit.crg serves the internal record as N3 document (technically no problem 
whatsoever, as I discussed somewhere before)

3. Write or rather "inverse read"
- parts.mit parses the document (rdflib, redland...)
- parts.mit verifies the ontology/content
- parts.mit inserts a new record (ignores any properties that are not defined in 
its own ontology -- the crg people may be experimenting with additional data)
- parts.mit adds a property "owl:sameAs <brickit.crg/parts.n3#BBb_F0101>;" to 
the new record
...or it may not copy it at all (DRY, Don't Repeat Yourself), but just link it 
into the appropriate biobrick families and cache it for faster queries ...

Note that there is no write access in this scenario. That means there is no 
authentication needed either. It's up to the receiver to decide whether or not 
to ignore the RSS and what to do about the new record.

The question is whether this model is feasible with the available tools.
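
To make the question a bit more concrete, here is a minimal sketch of steps 1 and 3 above using only the Python standard library. The feed layout, the `bb:` and `crg:` property names, and the helper names are all assumptions for illustration. A real receiver would parse the N3 document with a library such as rdflib instead of being handed a ready-made dict.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS digest, as brickit.crg might publish it.
RSS_DIGEST = """<rss version="2.0">
  <channel>
    <title>brickit.crg new parts</title>
    <item>
      <title>BBb_F0101</title>
      <link>brickit.crg/parts.n3#BBb_F0101</link>
    </item>
  </channel>
</rss>"""

# Properties the receiving registry defines in its own ontology (made up).
KNOWN_PROPERTIES = {"bb:name", "bb:sequence", "bb:family"}


def new_record_links(rss_text, already_seen):
    """Step 1: return links announced in the feed that we have not seen yet."""
    root = ET.fromstring(rss_text)
    links = [item.findtext("link") for item in root.iter("item")]
    return [link for link in links if link and link not in already_seen]


def import_record(source_uri, properties):
    """Step 3: keep only known properties and link back to the source."""
    record = {k: v for k, v in properties.items() if k in KNOWN_PROPERTIES}
    record["owl:sameAs"] = source_uri
    return record


if __name__ == "__main__":
    links = new_record_links(RSS_DIGEST, already_seen=set())
    # Stand-in for the parsed N3 document fetched in step 2.
    fetched = {
        "bb:name": "BBb_F0101",
        "bb:sequence": "ATGC",
        "crg:experimental": "some local annotation",
    }
    print(import_record(links[0], fetched))
```

The receiver never writes to the sender: it only reads the feed and the record, which is why no authentication shows up anywhere in the sketch.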

> 
> * Wikipedia article: http://en.wikipedia.org/wiki/REST
> * Introduction: http://www.xfront.com/REST-Web-Services.html
> * Ch 05 of Fielding's thesis (theory behind REST):  
> http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
> 
> Note that this approach involves a layer of abstraction over the 
> persistence layer.  The disadvantage, compared to offering a straight-up 
> SQL/etc. interface, is the additional step necessary to write the 
> layer.  However, you'll have to design a layer of abstraction anyhow for 
> the UI (such as a web application serving HTML) and frameworks such as 
> Django and Rails can make it easy to expose alternative content types 
> (XML, JSON) in parallel with your human-consumable HTML data views.
> 
> * Rails resource_controller plugin: 
> http://jamesgolick.com/resource_controller
> * Django rest interface: http://code.google.com/p/django-rest-interface/

If the django-rest app works as advertised, it would solve the data exchange, at 
least between BrickIt servers, in 3 or 4 lines of code...

It could at least be a good short-term solution.
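
As a rough illustration of what "alternative content types in parallel with HTML" means, here is a framework-free sketch that renders one record as JSON, XML, or HTML depending on the request's Accept header. It does not use the Django or Rails APIs mentioned above. The function name, the field names, and the simplistic substring match on the Accept header (no q-values) are assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET
from html import escape


def render_part(part, accept):
    """Return (content_type, body) for one record, chosen by the Accept header."""
    if "application/json" in accept:
        return "application/json", json.dumps(part, sort_keys=True)
    if "application/xml" in accept:
        # Assumes every key is a valid XML element name.
        root = ET.Element("part")
        for key, value in sorted(part.items()):
            ET.SubElement(root, key).text = str(value)
        return "application/xml", ET.tostring(root, encoding="unicode")
    # Fall back to the human-facing HTML view.
    rows = "".join(
        f"<tr><th>{escape(k)}</th><td>{escape(str(v))}</td></tr>"
        for k, v in sorted(part.items())
    )
    return "text/html", f"<table>{rows}</table>"


if __name__ == "__main__":
    part = {"name": "BBb_F0101", "sequence": "ATGC"}  # hypothetical record
    for accept in ("application/json", "application/xml", "text/html"):
        print(render_part(part, accept))
```

All three views are produced from the same record, so the internal storage can change without touching what clients see.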

> 
> The advantage is that you get to decouple the internal representation 
> from the public API.  This allows you to modify your underlying data 
> store (database, schema, etc.) and not break the interface that your 
> clients are using.  It also allows your application to perform data 
> validation, and allows you to write that in the higher-level language of 
> your application rather than in SQL triggers/keys.  Also, you do not 
> have to repeat this validation logic across both your application and in 
> the database.  It also affords you more power in the 
> authentication/authorization department than simple database logins.  
> This approach (doing validation/auth in the application layer) is that 
> of an Application Database and essentially precludes you from offering a 
> raw SQL interface.
> 
> * Application Database: 
> http://martinfowler.com/bliki/ApplicationDatabase.html
> 
> *== Triple backend, SPARQL/SPARUL API ==*
> If, on the other hand, we elect a triple-based storage format, query 
> languages such as SPARQL and SPARQL/Update (aka SPARUL) offer great power.

... perhaps more on the application side rather than for actual data exchange.
A SPARQL API could for instance allow the EBI to list all related Biobricks next 
to a Blast hit and ultimately use the sophisticated Biobrick description to 
annotate the natural protein ;-)

> 
> * http://jena.hpl.hp.com/~afs/SPARQL-Update.html

This would indeed allow writing client programs that can directly push a record 
into a remote registry. Yet another possibility.
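
A minimal sketch of what such a client might send, assuming the registry accepts an `INSERT DATA` request and that all property values are plain string literals. The URIs and the function name are hypothetical, and the sketch only builds the request text. How it is posted, and how the registry authenticates the sender, is left open.

```python
def insert_data_request(subject_uri, properties):
    """Build an INSERT DATA request that adds one record's properties."""
    lines = []
    for predicate_uri, value in sorted(properties.items()):
        # Escape the characters that would break a double-quoted literal.
        literal = (
            value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        )
        lines.append(f'  <{subject_uri}> <{predicate_uri}> "{literal}" .')
    return "INSERT DATA {\n" + "\n".join(lines) + "\n}"


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    print(
        insert_data_request(
            "http://example.org/parts#BBb_F0101",
            {"http://example.org/bb#name": "BBb_F0101"},
        )
    )
```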

Greetings,
Raik

> 
> Note that, with this approach, the tool could expose the underlying RDF 
> as a SPARQL/SPARUL endpoint, and both the application's web interface 
> and the API interface could work against that.  The point here is that 
> triples are likely flexible enough to withstand a "schema change", and 
> providing a SPARQL-adhering endpoint is a layer of abstraction that 
> allows you to swap out the underlying triple store if necessary.  I am 
> not sure how authentication/authorization and data validation happen in 
> this scenario, as I am less familiar with it.
> 
> For rolling up your sleeves and hacking around, see object/RDF modeling 
> libraries such as:
> * http://www.activerdf.org/ (Ruby)
> * http://oort.to/ (Python).
> * http://arc.semsol.org/ (PHP)
> 
> The following articles contain a good deal of discussion on the topic of 
> building web applications for the semantic web:
> * http://thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_survey.html
> * http://thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_vision.html
> 
> That's all for now :)
> 
> Jason
> 
> On Feb 12, 2008 4:11 PM, Raik Gruenberg <raik.gruenberg at crg.es 
> <mailto:raik.gruenberg at crg.es>> wrote:
> 
>      > Interesting. There are two websites out there that I would like to
>      > mention (as well as Freebase, though I do not know about its
>      > relevance). In particular, check out these groups:
> 
>     Freebase is interesting because it shows the power of triple-based
>     knowledge. They basically re-invent RDF and put an excellent user
>     interface on top of it. The kind of questions you can ask there and the
>     freedom to link everything with everything far surpass relational
>     database technology. The problem I see is that:
>     * they don't follow the existing standards
>     * it's closed source
>     * most important, all information rests on their server -- defying the
>     semantic web idea.
> 
>      >
>      > http://dataportability.org/ (more for social networks)
>      > http://theinfo.org/ -- for processing large datasets.
>      >
>      > A good first step might be to make the bioinformatics databases
>      > expose MySQL/PostgreSQL interfaces to the public for automated
>      > querying. Would this be useful?
> 
>     I am not an expert here, but I believe some do. This still leaves the
>     data isolated, though, and doesn't tell you anything about its meaning.
>     Web services are then often used to offer APIs to access the data, but
>     this requires an explicit programming effort for every link and has the
>     described problems. That's why there are efforts like Moby -- to
>     centralize and unify web services. Some databases are starting to
>     provide direct RDF, though (e.g. uniprot).
> 
>     Greetings,
>     Raik
> 
>     Bryan Bishop wrote:
>      > On Monday 11 February 2008, Raik Gruenberg wrote:
>      >>> The number of web service providers in the field of bioinformatics
>      >>> is increasing every year. In theory, these services are
>      >>> interoperable and independent of specific computer languages.
>      >>> However, each service uses its own definition of data types and
>      >>> method naming conventions. Moreover, these services are often not
>      >>> usable by specific languages (partly, due to the lack of compliance
>      >>> of the SOAP/WSDL specification in the language's library).
>      >>
>      >> ... and then they need to fix up the mess with an extra layer of
>      >> ontologies of web services while the actual data seem to remain
>      >> undescribed and disconnected.
>      >
>      > Interesting. There are two websites out there that I would like to
>      > mention (as well as Freebase, though I do not know about its
>      > relevance). In particular, check out these groups:
>      >
>      > http://dataportability.org/ (more for social networks)
>      > http://theinfo.org/ -- for processing large datasets.
>      >
>      > A good first step might be to make the bioinformatics databases
>      > expose MySQL/PostgreSQL interfaces to the public for automated
>      > querying. Would this be useful?
>      >
>      > - Bryan
>      > ________________________________________
>      > Bryan Bishop
>      > http://heybryan.org/
>      >
>      > _______________________________________________
>      > Standards mailing list
>      > Standards at biobricks.org <mailto:Standards at biobricks.org>
>      > http://biobricks.org/mailman/listinfo/standards_biobricks.org
>      >
>      >
> 
>     --
>     ________________________________
> 
>     Dr. Raik Gruenberg
>     http://www.raiks.de/contact.html
>     ________________________________
> 
>     _______________________________________________
>     Standards mailing list
>     Standards at biobricks.org <mailto:Standards at biobricks.org>
>     http://biobricks.org/mailman/listinfo/standards_biobricks.org
> 
> 
> 
> 
> -- 
> Jason Morrison
> jason.p.morrison at gmail.com <mailto:jason.p.morrison at gmail.com>
> http://jayunit.net
> (585) 216-5657

-- 
________________________________

Dr. Raik Gruenberg
http://www.raiks.de/contact.html
________________________________
