[BBF Standards] data exchange -- say no to web services
Jason Morrison
jason.p.morrison at gmail.com
Tue Feb 12 17:57:07 EST 2008
If you don't care about REST, SPARQL libraries, or application structuring,
you can skip my email and know that I vote +1 to "say no to [heavy] web
services" :)
I'd like to put a few ideas out there, but not propose anything concrete.
Raik is certainly on the right track when he suggests we should determine
the BB model before the format, and it's during discussion of the latter
that we can begin to discuss the interface for querying and updating the
data store. But, since it's on my mind, and I can't help myself:
For reference, I'm considering a piece of web-accessible software, like the
MIT Registry or BrickIt, that has BB data in some sort of persistence layer
(be it a relational DB, an object DB, an XML store, a hash store like
CouchDB/SimpleDB, or a triple store), offers a human-facing UI, and a
programmatic interface for 3rd party software integration that allows
*read/write access* with authentication and authorization rules.
*== XML/DB backend, REST API ==*
If we end up storing BB descriptor documents in XML of a custom schema (like
CellML/SBML), or a relational database (like BrickIt does), and want the
tools that store such data to expose a programmatic API, I believe that a
RESTful architecture might have some advantages. In particular, REST is a
simpler approach to data access than SOAP; REST is easy to work with since
it's simply HTTP, and software support is plentiful.
* Wikipedia article: http://en.wikipedia.org/wiki/REST
* Introduction: http://www.xfront.com/REST-Web-Services.html
* Ch 05 of Fielding's thesis (theory behind REST):
http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
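
To make the "it's simply HTTP" point a bit more concrete, here is a minimal
sketch in Python of how the HTTP verbs could map onto operations on a part
resource. Everything in it is an assumption for illustration: the /parts/<id>
URL layout, the handle() function, and the in-memory PARTS dict standing in
for whatever persistence layer is actually used. It is not how the MIT
Registry or BrickIt work.

```python
import json

# Hypothetical in-memory store standing in for the real persistence layer.
PARTS = {}


def handle(method, path, body=None):
    """Dispatch one HTTP-style request on /parts/<id>.

    Returns a (status_code, payload) tuple. `body` is a JSON string and is
    only read for PUT.
    """
    prefix = "/parts/"
    if not path.startswith(prefix) or len(path) == len(prefix):
        return 404, {"error": "unknown resource"}
    part_id = path[len(prefix):]

    if method == "GET":
        # Read the current representation of the resource.
        if part_id not in PARTS:
            return 404, {"error": "no such part"}
        return 200, PARTS[part_id]

    if method == "PUT":
        # Create the resource, or replace it if it already exists.
        created = part_id not in PARTS
        PARTS[part_id] = json.loads(body)
        return (201 if created else 200), PARTS[part_id]

    if method == "DELETE":
        if part_id not in PARTS:
            return 404, {"error": "no such part"}
        del PARTS[part_id]
        return 204, None

    return 405, {"error": "method not allowed"}
```

A client would then PUT a JSON document to /parts/some_id to store it, GET
the same URL to read it back, and DELETE it to remove it; no envelope or
WSDL is involved, only the URL, the verb, and the status code.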
Note that this approach involves a layer of abstraction over the persistence
layer. The disadvantage, compared to offering a straight-up SQL/etc
interface, is the additional step necessary to write that layer. However,
you'll have to design a layer of abstraction anyhow for the UI (such as a
web application serving HTML) and frameworks such as Django and Rails can
make it easy to expose alternative content types (XML, JSON) in parallel
with your human-consumable HTML data views.
* Rails resource_controller plugin:
http://jamesgolick.com/resource_controller
* Django rest interface: http://code.google.com/p/django-rest-interface/
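
As a rough illustration of serving alternative content types in parallel,
the Python sketch below renders one and the same part record as HTML, JSON,
or XML depending on the requested media type. The render_part() function,
the field names, and the <part> element are all made up for this example,
and the exact-match check on the Accept value is a simplification (real
content negotiation also handles wildcards and q-values, which frameworks
like Django and Rails take care of for you).

```python
import json
from html import escape
from xml.etree import ElementTree as ET


def render_part(part, accept="text/html"):
    """Render a part (a flat dict) in the representation named by `accept`."""
    if accept == "application/json":
        return json.dumps(part, sort_keys=True)

    if accept == "application/xml":
        # One child element per field; keys must be valid XML element names.
        root = ET.Element("part")
        for key in sorted(part):
            ET.SubElement(root, key).text = str(part[key])
        return ET.tostring(root, encoding="unicode")

    # Fall back to the human-facing HTML view.
    rows = "".join(
        f"<dt>{escape(key)}</dt><dd>{escape(str(part[key]))}</dd>"
        for key in sorted(part)
    )
    return f"<dl>{rows}</dl>"
```

The point is only that the three views share one source record, so the
machine-readable formats cost little extra once the HTML view exists.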
The advantage is that you get to decouple the internal representation from
the public API. This allows you to modify your underlying data store
(database, schema, etc.) and not break the interface that your clients are
using. It also allows your application to perform data validation, and
allows you to write that in the higher-level language of your application
rather than in SQL triggers/keys. Also, you do not have to repeat this
validation logic in both your application and the database. It also
affords you more power in the authentication/authorization department than
simple database logins. This approach (doing validation/auth in the
application layer) is that of an Application Database and essentially
precludes you from offering a raw SQL interface.
* Application Database:
http://martinfowler.com/bliki/ApplicationDatabase.html
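
A small Python sketch of what that decoupling might look like: the public
API speaks in terms of name/type/sequence, the storage layer uses its own
column names, and validation lives in application code rather than in SQL.
The field names, column names, the VALID_TYPES set, and the rules themselves
are all hypothetical; the real BB model would dictate them.

```python
# Hypothetical set of part types accepted by the public API.
VALID_TYPES = {"promoter", "rbs", "cds", "terminator"}


def validate(public):
    """Return a list of error messages for a public part document (empty if OK)."""
    errors = []
    if not public.get("name"):
        errors.append("name is required")
    if public.get("type") not in VALID_TYPES:
        errors.append("type must be one of: " + ", ".join(sorted(VALID_TYPES)))
    sequence = public.get("sequence") or ""
    if not sequence or set(sequence.lower()) - set("acgt"):
        errors.append("sequence must be non-empty and contain only a, c, g, t")
    return errors


def to_row(public):
    """Map a validated public document onto the (hypothetical) internal columns."""
    return {
        "part_name": public["name"],
        "part_type": public["type"],
        "seq": public["sequence"].lower(),
    }


def to_public(row):
    """Map an internal row back onto the public representation."""
    return {
        "name": row["part_name"],
        "type": row["part_type"],
        "sequence": row["seq"],
    }
```

If the internal schema changes (say, seq moves to another table), only
to_row() and to_public() need to change; clients keep seeing the same
name/type/sequence document.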
*== Triple backend, SPARQL/SPARUL API ==*
If, on the other hand, we elect a triple-based storage format, query
languages such as SPARQL and SPARQL/Update (aka SPARUL) offer great power.
* http://jena.hpl.hp.com/~afs/SPARQL-Update.html
Note that, with this approach, the tool could expose the underlying RDF as a
SPARQL/SPARUL endpoint, and both the application's web interface and the API
interface could work against that. The point here is that triples are
likely flexible enough to withstand a "schema change" and providing a
SPARQL-adhering endpoint is a layer of abstraction that allows you to swap
out the underlying triple store if necessary. I am not sure how
authentication/authorization and data validation happen in this scenario, as
I am less familiar with it.
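
For a feel of what a client of such an endpoint would do, here is a minimal
Python sketch that builds (but does not send) a SPARQL query request, using
the plain HTTP GET binding with a "query" parameter. The endpoint URL, the
bb: vocabulary, and the build_query_request() helper are invented for this
example; no such BioBrick vocabulary has been agreed on.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint; substitute whatever the tool actually exposes.
ENDPOINT = "http://example.org/sparql"

# Hypothetical vocabulary: list up to ten parts together with their names.
QUERY = """
PREFIX bb: <http://example.org/biobrick#>
SELECT ?part ?name
WHERE {
  ?part a bb:Part ;
        bb:name ?name .
}
LIMIT 10
"""


def build_query_request(endpoint, query):
    """Build a GET request carrying the query text in the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the JSON serialization of the result set.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_query_request(ENDPOINT, QUERY)
```

Passing the request to urllib.request.urlopen() would run the query against
a live endpoint. Because the client depends only on the endpoint URL and the
query language, the triple store behind it can be swapped without touching
this code.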
For rolling up your sleeves and hacking around, see object/RDF modeling
libraries such as:
* http://www.activerdf.org/ (Ruby)
* http://oort.to/ (Python)
* http://arc.semsol.org/ (PHP)
The following articles contain a good deal of discussion on the topic of
building web applications for the semantic web:
* http://thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_survey.html
* http://thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_vision.html
That's all for now :)
Jason
On Feb 12, 2008 4:11 PM, Raik Gruenberg <raik.gruenberg at crg.es> wrote:
> > Interesting. There are two websites out there that I would like to
> > mention (as well as Freebase, though I do not know about its
> > relevance). In particular, check out these groups:
>
> Freebase is interesting because it shows the power of triple-based
> knowledge. They basically re-invent RDF and put an excellent user interface
> on top of it. The kind of questions you can ask there and the freedom to
> link everything with everything is far surpassing relational database
> technology. The problem I see is that:
> * they don't follow the existing standards
> * it's closed source
> * most important, all information rests on their server -- defying the
>   semantic web idea.
>
> >
> > http://dataportability.org/ (more for social networks)
> > http://theinfo.org/ -- for processing large datasets.
> >
> > A good first step might be to make the bioinformatics databases expose
> > MySQL/PostgreSQL interfaces to the public for automated querying. Would
> > this be useful?
>
> I am not an expert here, but I believe some do. This still leaves the data
> isolated though and doesn't tell you anything about their meaning. Web
> services are then often used to offer APIs to access the data but this
> requires an explicit programming effort for every link and has the
> described problems. That's why efforts like Moby -- to centralize and unify
> web services. Some databases start providing direct RDF, though (e.g.
> uniprot).
>
> Greetings,
> Raik
>
> Bryan Bishop wrote:
> > On Monday 11 February 2008, Raik Gruenberg wrote:
> >> > The number of web service providers in the field of bioinformatics is
> >> > increasing every year. In theory, these services are interoperable and
> >> > independent of specific computer languages. However, each service uses
> >> > its own definition of data types and method naming conventions.
> >> > Moreover, these services are often not usable by specific languages
> >> > (partly, due to the lack of compliance of the SOAP/WSDL specification
> >> > in the language's library).
> >>
> >> ... and then they need to fix up the mess with an extra layer of
> >> ontologies of web services while the actual data seem to remain
> >> undescribed and disconnected.
> >
> > Interesting. There are two websites out there that I would like to
> > mention (as well as Freebase, though I do not know about its
> > relevance). In particular, check out these groups:
> >
> > http://dataportability.org/ (more for social networks)
> > http://theinfo.org/ -- for processing large datasets.
> >
> > A good first step might be to make the bioinformatics databases expose
> > MySQL/PostgreSQL interfaces to the public for automated querying. Would
> > this be useful?
> >
> > - Bryan
> > ________________________________________
> > Bryan Bishop
> > http://heybryan.org/
> >
> > _______________________________________________
> > Standards mailing list
> > Standards at biobricks.org
> > http://biobricks.org/mailman/listinfo/standards_biobricks.org
> >
> >
>
> --
> ________________________________
>
> Dr. Raik Gruenberg
> http://www.raiks.de/contact.html
> ________________________________
>
--
Jason Morrison
jason.p.morrison at gmail.com
http://jayunit.net
(585) 216-5657