Tuesday, 22 June 2010

Web Processing Service Round Up

I have a fun bit of work lined up: updating the Web Processing Service (WPS) client code in uDig.

It is no secret that I am a huge fan of the idea of a Web Processing Service. I am excited about the possibilities of using a WPS as a front end to a grid of computers (a strategy 52North seems to be pursuing) and about the ability to bundle up processes written in a number of languages (something ZooWPS is really going after).

The part I am really keen on does not seem to have been tackled yet: chaining processes together using standard process descriptions such as BPEL. This would be a really nice olive branch between GIS and the business analysts who would love to know what the department is doing. There is some confusion in this area, as the resulting workflow diagrams end up looking similar to those produced by BI tools (since GIS is used for decision making) or by ETL tools (since chains of processing are required).

Today I am making contact with the different Web Processing Service implementations, warning them about what I am up to, and generally finding out where they live and what a good contact point for communication is.

Thus far:
  • 52North - responded to email within 30 minutes; the project seems very active and was able to point me to an example WPS service right out of the gate. This is the established open source WPS solution, and I am looking forward to seeing how it handles feature collections and raster processing.
  • ZooWPS - no response to email yet, but the IRC channel was well populated (it turns out half the members were my LISAsoft co-workers from different offices around Australia). This is the new kid on the block in the WPS space.
  • GeoServer WPS Community Module - no email needed, since I was already following that mailing list. The GeoServer WPS community module has been very quiet in its development but has recently made progress in the two areas I am interested in testing.
  • deegree 3 - the team is working on a second-generation WPS implementation that is under active development; I may end up building from source in order to have something to test. It is great to see continued support for WPS here (deegree 2 worked against an earlier version of the specification).

The two areas I am targeting, feature collections and raster processing, each carry their own special risks.

Features should be the bread and butter of GIS processing, yet we are held back in this area by generally haphazard support for GML. When shuttling data between services, I can see nailing everything to the wall using GML and XML Schema; since GML is a data interchange format, this is really what should be done. GML allows us to communicate the range and limits of the data and to negotiate differences between data models. I could see this approach being used in an ETL context or for scientific work.
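As a rough illustration of "communicating the range and limits of the data", here is a minimal sketch in which an XML Schema restriction declares the allowed range of an attribute and a client reads those facets before accepting a feature. The `Sounding` feature type, the `DepthType` range and the `http://example.com/app` namespace are all hypothetical, and only Python's standard `xml.etree.ElementTree` is used; a real client would lean on a proper schema-aware parser rather than hand-reading two facets.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"
APP = "{http://example.com/app}"  # hypothetical application namespace

# Hypothetical XML Schema fragment: the depth of a "Sounding" feature
# is declared as a double restricted to the range 0..11000.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="DepthType">
    <xs:restriction base="xs:double">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="11000"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

# A GML 3 style feature instance in the same hypothetical namespace.
FEATURE = """<app:Sounding xmlns:app="http://example.com/app"
    xmlns:gml="http://www.opengis.net/gml">
  <app:depth>42.5</app:depth>
  <app:location>
    <gml:Point><gml:pos>150.1 -33.9</gml:pos></gml:Point>
  </app:location>
</app:Sounding>"""


def read_range(schema_text, type_name):
    """Return (min, max) from a simpleType's inclusive facets (None if missing)."""
    root = ET.fromstring(schema_text)
    for simple_type in root.iter(XS + "simpleType"):
        if simple_type.get("name") != type_name:
            continue
        restriction = simple_type.find(XS + "restriction")
        facets = []
        for facet in ("minInclusive", "maxInclusive"):
            node = restriction.find(XS + facet) if restriction is not None else None
            facets.append(float(node.get("value")) if node is not None else None)
        return tuple(facets)
    raise KeyError(f"simpleType {type_name!r} not found")


def depth_in_range(feature_text, value_range):
    """Check a feature's depth against the limits the schema advertised."""
    depth = float(ET.fromstring(feature_text).find(APP + "depth").text)
    low, high = value_range
    return (low is None or depth >= low) and (high is None or depth <= high)


limits = read_range(SCHEMA, "DepthType")
print(limits, depth_in_range(FEATURE, limits))  # prints (0.0, 11000.0) True
```

The point of the design is that the limits live in the schema, not in the client, so two services can agree on (or detect disagreement about) a data model before any processing happens.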

The current crop of implementations leans in a slightly different direction: focus on the geometry (hey, it is spatial!) and carry the attributes along for the ride. The ZooWPS implementation also supports GeoJSON, which is very good for this style of ad-hoc collaboration. Even in this ad-hoc style we will need to indicate which geometry in a feature a process should act on, so it will be fun seeing what the different implementations provide.
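Even with ad-hoc GeoJSON the "which geometry?" question is real: the format gives each Feature a single `geometry` member, so any additional geometries have to travel as ordinary properties. The sketch below assumes a hypothetical convention in which a process is told the property name of the geometry to use (defaulting to the standard member), and computes a bounding box as a stand-in for a real spatial operation.

```python
import json

# A GeoJSON Feature has one standard "geometry" member. Here a second
# geometry ("footprint") rides along in properties: a hypothetical
# convention, not part of the GeoJSON format itself.
FEATURE = json.loads("""{
  "type": "Feature",
  "geometry": {"type": "Point", "coordinates": [150.0, -34.0]},
  "properties": {
    "name": "gauge 7",
    "footprint": {"type": "Polygon", "coordinates": [
      [[149, -35], [151, -35], [151, -33], [149, -33], [149, -35]]
    ]}
  }
}""")


def select_geometry(feature, name=None):
    """Return the geometry a process should act on.

    name=None selects the standard "geometry" member; otherwise the named
    property must hold a GeoJSON geometry object (hypothetical convention).
    """
    if name is None:
        return feature["geometry"]
    candidate = feature.get("properties", {}).get(name)
    if not (isinstance(candidate, dict) and "coordinates" in candidate):
        raise KeyError(f"feature has no geometry named {name!r}")
    return candidate


def positions(coords):
    """Yield every position from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from positions(part)


def bbox(geometry):
    """Bounding box (minx, miny, maxx, maxy); GeometryCollection is not handled."""
    xs = [p[0] for p in positions(geometry["coordinates"])]
    ys = [p[1] for p in positions(geometry["coordinates"])]
    return (min(xs), min(ys), max(xs), max(ys))


print(bbox(select_geometry(FEATURE)))               # prints (150.0, -34.0, 150.0, -34.0)
print(bbox(select_geometry(FEATURE, "footprint")))  # prints (149, -35, 151, -33)
```

Whatever the convention, the geometry selector has to be an explicit process input; otherwise each server will quietly pick one for you.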

Raster data is both interesting and scary. There is an answer in place for the obvious question of data size: the WPS specification accounts for this by allowing long-running processes, with results staged on FTP sites. The other question is the same one encountered by Web Coverage Service: what does the data mean? Which bands mean what, how is your DEM height measured, and so on. I am really not sure WPS is up to capturing this information. Will the file format headers capture it in enough detail, or will each process need to be supplied with hints to sort out how to interpret the data?
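The long-running case maps onto the asynchronous mode of WPS 1.0.0: the client asks the server to store the execute response, polls the document at `statusLocation` until the status reports `ProcessSucceeded` or `ProcessFailed`, and then collects outputs returned as references (which could be an FTP URL). Here is a minimal standard-library sketch. The `fetch` callable, the canned responses and the example.com URLs are hypothetical, and the code accepts either a plain `href` or an `xlink:href` on `wps:Reference` in case servers differ.

```python
import time
import xml.etree.ElementTree as ET

WPS = "{http://www.opengis.net/wps/1.0.0}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
STATES = {"ProcessAccepted", "ProcessStarted", "ProcessPaused",
          "ProcessSucceeded", "ProcessFailed"}


def parse_status(xml_text):
    """Return (state, statusLocation, output hrefs) from an ExecuteResponse."""
    root = ET.fromstring(xml_text)
    state = None
    status = root.find(WPS + "Status")
    if status is not None:
        for child in status:
            local_name = child.tag.split("}")[-1]
            if local_name in STATES:
                state = local_name
                break
    hrefs = []
    for ref in root.iter(WPS + "Reference"):
        href = ref.get("href") or ref.get(XLINK_HREF)
        if href:
            hrefs.append(href)
    return state, root.get("statusLocation"), hrefs


def wait_for_result(fetch, status_url, interval=5.0, max_polls=100):
    """Poll statusLocation until the process finishes; return output hrefs.

    fetch is any callable mapping a URL to response text (hypothetical; a
    real client might wrap urllib.request.urlopen).
    """
    for attempt in range(max_polls):
        state, _, hrefs = parse_status(fetch(status_url))
        if state == "ProcessSucceeded":
            return hrefs
        if state == "ProcessFailed":
            raise RuntimeError("WPS process reported ProcessFailed")
        time.sleep(interval)
    raise TimeoutError(f"no result after {max_polls} polls of {status_url}")


# Canned responses standing in for a live server (hypothetical URLs).
ACCEPTED = """<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0"
    statusLocation="http://example.com/wps/status/42.xml">
  <wps:Status creationTime="2010-06-22T10:00:00Z">
    <wps:ProcessAccepted>Queued</wps:ProcessAccepted>
  </wps:Status>
</wps:ExecuteResponse>"""

SUCCEEDED = """<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0"
    xmlns:ows="http://www.opengis.net/ows/1.1"
    statusLocation="http://example.com/wps/status/42.xml">
  <wps:Status creationTime="2010-06-22T10:05:00Z">
    <wps:ProcessSucceeded>Done</wps:ProcessSucceeded>
  </wps:Status>
  <wps:ProcessOutputs>
    <wps:Output>
      <ows:Identifier>slope</ows:Identifier>
      <wps:Reference href="ftp://example.com/results/slope.tif" mimeType="image/tiff"/>
    </wps:Output>
  </wps:ProcessOutputs>
</wps:ExecuteResponse>"""

responses = iter([ACCEPTED, SUCCEEDED])
print(wait_for_result(lambda url: next(responses),
                      "http://example.com/wps/status/42.xml", interval=0))
# prints ['ftp://example.com/results/slope.tif']
```

Note that this only solves the size problem. The staged file still arrives with whatever band and unit metadata its format happens to carry, which is exactly the open question above.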

2 comments:

Tom Kralidis said...

Also check out PyWPS for another implementation example.

Unknown said...

Jody,

Don't forget WCPS. Rasdaman has an implementation.

With regard to your comments on what the raster data means: the same comments also apply to vector data.

It will be interesting to see how much easier it becomes to create future services once we model our data with data models and exchange it via Community Schema.

I suspect that it will make it much easier to chain services together when we can automate data processing with valid assumptions based on authoritative definitions of the data being consumed (because the data adheres to a well-thought-out and well-defined data, and data exchange, model).

Bruce