.. currentmodule:: pywps
.. _process:
Processes
#########
.. versionadded:: 4.0.0
.. todo::
* Input validation
* IOHandler
PyWPS works with processes and services. A process is a Python `Class`
containing an `handler` method and a list of inputs and outputs. A PyWPS
service instance is then a collection of selected processes.
PyWPS does not ship with any processes predefined - it's on you, as user of
PyWPS to set up the processes of your choice. PyWPS is here to help you
publishing your awesome geospatial operation on the web - it takes care of
communication and security, you then have to add the content.
.. note:: There are some example processes in the `PyWPS-Flask`_ project.
Writing a Process
=================
.. note:: At this place, you should prepare your environment for final
:ref:`deployment`. At least, you should create a single directory with
your processes, which is typically named `processes`::
$ mkdir processes
In this directory, we will create single python scripts containing
processes.
Processes can be located *anywhere in the system* as long as their
location is identified in the :envvar:`PYTHONPATH` environment
variable, and can be imported in the final server instance.
A processes is coded as a class inheriting from :class:`Process`.
In the `PyWPS-Flask`_ server they are
kept inside the *processes* folder, usually in separated files.
The instance of a *Process* needs following attributes to be configured:
:identifier:
unique identifier of the process
:title:
corresponding title
:inputs:
list of process inputs
:outputs:
list of process outputs
:handler:
method which recieves :class:`pywps.app.WPSRequest` and :class:`pywps.response.WPSResponse` as inputs.
Example vector buffer process
=============================
As an example, we will create a *buffer* process - which will take a vector
file as the input, create specified the buffer around the data (using `Shapely
`_), and return back the result.
Therefore, the process will have two inputs:
* `ComplexData` input - the vector file
* `LiteralData` input - the buffer size
And it will have one output:
* `ComplexData` output - the final buffer
The process can be called `demobuffer` and we can now start coding it::
$ cd processes
$ $EDITOR demobuffer.py
At the beginning, we have to import the required classes and modules
Here is a very basic example:
.. literalinclude:: demobuffer.py
:language: python
:lines: 28-31
:linenos:
:lineno-start: 28
As the next step, we define a list of inputs. The first input is
:class:`pywps.ComplexInput` with the identifier `vector`, title `Vector map`
and there is only one allowed format: GML.
The next input is :class:`pywps.LiteralInput`, with the identifier `size` and
the data type set to `float`:
.. literalinclude:: demobuffer.py
:language: python
:lines: 33-40
:linenos:
:lineno-start: 33
Next we define the output `output` as :class:`pywps.ComplexOutput`. This
output supports GML format only.
.. literalinclude:: demobuffer.py
:language: python
:lines: 42-46
:linenos:
:lineno-start: 42
Next we create a new list variables for inputs and outputs.
.. literalinclude:: demobuffer.py
:language: python
:lines: 48-49
:linenos:
:lineno-start: 48
Next we define the *handler* method. In it, *geospatial analysis
may happen*. The method gets a :class:`pywps.app.WPSRequest` and a
:class:`pywps.response.WPSResponse` object as parameters. In our case, we
calculate the buffer around each vector feature using
`GDAL/OGR library `_. We will not got much into the details,
what you should note is how to get input data from the
:class:`pywps.app.WPSRequest` object and how to set data as outputs in the
:class:`pywps.response.WPSResponse` object.
.. literalinclude:: demobuffer.py
:language: python
:pyobject: _handler
:emphasize-lines: 8-12, 50-54
:linenos:
:lineno-start: 68
At the end, we put everything together and create new a `DemoBuffer` class with
handler, inputs and outputs. It's based on :class:`pywps.Process`:
.. literalinclude:: demobuffer.py
:pyobject: DemoBuffer
:language: python
:linenos:
:lineno-start: 51
Declaring inputs and outputs
============================
Clients need to know which inputs the processes expects. They can be declared
as :class:`pywps.Input` objects in the :class:`Process` class declaration:
.. code-block:: python
from pywps import Process, LiteralInput, LiteralOutput
class FooProcess(Process):
def __init__(self):
inputs = [
LiteralInput('foo', data_type='string'),
ComplexInput('bar', [Format('text/xml')])
]
outputs = [
LiteralOutput('foo_output', data_type='string'),
ComplexOutput('bar_output', [Format('JSON')])
]
super(FooProcess, self).__init__(
...
inputs=inputs,
outputs=outputs
)
...
.. note:: A more generic description can be found in :ref:`wps` chapter.
LiteralData
-----------
* :class:`LiteralInput`
* :class:`LiteralOutput`
A simple value embedded in the request. The first argument is a
name. The second argument is the type, one of `string`, `float`,
`integer` or `boolean`.
ComplexData
-----------
* :class:`ComplexInput`
* :class:`ComplexOutput`
A large data object, for example a layer. ComplexData do have a `format`
attribute as one of their key properties. It's either a list of supported
formats or a single (already selected) format. It shall be an instance of
the :class:`pywps.inout.formats.Format` class.
ComplexData :class:`Format` and input validation
------------------------------------------------
The ComplexData needs as one of its parameters a list of supported data
formats. They are derived from the :class:`Format` class. A :class:`Format`
instance needs, among others, a `mime_type` parameter, a `validate`
method -- which is used for input data validation -- and also a `mode`
parameter -- defining how strict the validation should be (see
:class:`pywps.validator.mode.MODE`).
The `Validate` method is up to you, the user, to code. It requires two input
paramers - `data_input` (a :class:`ComplexInput` object), and `mode`. This
methid must return a `boolean` value indicating whether the input data are
considered valid or not for given `mode`. You can draw inspiration from the
:py:func:`pywps.validator.complexvalidator.validategml` method.
The good news is: there are already predefined validation methods for the ESRI
Shapefile, GML and GeoJSON formats, using GDAL/OGR. There is also an XML Schema
validaton and a JSON schema validator - you just have to pick the propper
supported formats from the :class:`pywps.inout.formats.FORMATS` list and set
the validation mode to your :class:`ComplexInput` object.
Even better news is: you can define custom validation functions and validate
input data according to your needs.
BoundingBoxData
---------------
* :class:`BoundingBoxInput`
* :class:`BoundingBoxOutput`
BoundingBoxData contain information about the bounding box of the desired area
and coordinate reference system. Interesting attributes of the BoundingBoxData
are:
`crs`
current coordinate reference system
`dimensions`
number of dimensions
`ll`
pair of coordinates (or triplet) of the lower-left corner
`ur`
pair of coordinates (or triplet) of the upper-right corner
Accessing the inputs and outputs in the `handler` method
========================================================
Handlers receive as input argument a :class:`WPSRequest` object. Input
values are found in the `inputs` dictionary::
@staticmethod
def _handler(request, response):
name = request.inputs['name'][0].data
response.outputs['output'].data = 'Hello world %s!' % name
return response
`inputs` is a plain Python dictionary.
Most of the inputs and outputs are derived from the :class:`IOHandler` class.
This enables the user to access the data in four different ways:
`input.file`
Returns a file name - you can access the data using the name of the file
stored on the hard drive.
`input.url`
Return a link to the resource using either the ``file://`` or ``http://`` scheme. The target of the url is not downloaded to the PyWPS server until its content is explicitly accessed through either one of the ``file``, ``data`` or ``stream`` attributes.
`input.data`
Is the direct link to the data themselves. No need to create a file object
on the hard drive or opening the file and closing it - PyWPS will do
everything for you.
`input.stream`
Provides the IOStream of the data. No need for opening the file, you just
have to `read()` the data.
Because there could be multiple input values with the same identifier, the
inputs are accessed with an index. For example::
request.inputs['file_input'][0].file
request.inputs['data_input'][0].data
request.inputs['stream_input'][0].stream
url_input = request.inputs['url_input'][0]
As mentioned, if an input is a link to a remote file (an ``http`` address), accessing the ``url`` attribute simply returns the url's string, but accessing any other attribute triggers the file's download::
url_input.url # returns the link as a string (no download)
url_input.file # downloads target and returns the local path
url_input.data # returns the content of the local copy
PyWPS will persistently transform the input (and output) data to the desired
form. You can also set the data for your `Output` object like `output.data = 1`
or `output.file = "myfile.json"` - it works the same way. However, once the source
type is set, it cannot be changed. That is, a `ComplexOutput` whose ``data``
attribute has been set once has read-only access to the three other attributes
(``file``, ``stream`` and ``url``), while the ``data`` attribute can be freely
modified.
Progress and status report
==========================
OGC WPS standard enables asynchronous process execution call, that is in
particular useful, when the process execution takes longer time - process
instance is set to background and WPS Execute Response document with `ProcessAccepted`
messag is returned immediately to the client. The client has to check
`statusLocation` URL, where the current status report is deployed, say every
n-seconds or n-minutes (depends on calculation time). Content of the response is
usually `percentDone` information about the progress along with `statusMessage`
text information, what is currently happening.
You can set process status any time in the `handler` using the
:py:func:`WPSResponse.update_status` function.
Returning large data
====================
WPS allows for a clever method of returning a large data file: instead
of embedding the data in the response, it can be saved separately, and
a URL is returned from where the data can be downloaded. In the current
implementation, PyWPS saves the file in a folder specified
in the configuration passed by the service (or in a default location).
The URL returned is embedded in the XML response.
This behaviour can be requested either by using a GET::
...ResponseDocument=output=@asReference=true...
Or a POST request::
...
output
Some Output
...
**output** is the identifier of the output the user wishes to have stored
and accessible from a URL. The user may request as many outputs by reference
as needed, but only *one* may be requested in RAW format.
Process deployment
==================
In order for clients to invoke processes, a PyWPS
:class:`Service` class must be present with the ability to listen for requests.
An instance of this class must created, receiving instances of
all the desired processes classes.
In the *flask* example service the :class:`Service` class instance is created in the
:class:`Server` class. :class:`Server` is a development server that relies
on `Flask`_. The publication of processes is encapsulated in *demo.py*, where
a main method passes a list of processes instances to the
:class:`Server` class::
from pywps import Service
from processes.helloworld import HelloWorld
from processes.demobuffer import DemoBuffer
...
processes = [ DemoBuffer(), ... ]
server = Server(processes=processes)
...
Running the dev server
======================
The :ref:`flask` server is a `WSGI application`_ that accepts incoming `Execute`
requests and calls the appropriate process to handle them. It also
answers `GetCapabilities` and `DescribeProcess` requests based on the
process identifier and their inputs and outputs.
.. _WSGI application: http://werkzeug.pocoo.org/docs/terms/#wsgi
A host, a port, a config file and the processes can be passed as arguments to the
:class:`Server` constructor.
**host** and **port** will be **prioritised** if passed to the constructor,
otherwise the contents of the config file (`pywps.cfg`) are used.
Use the `run` method to start the server::
...
s = Server(host='localhost', processes=processes, config_file=config_file)
s.run()
...
To make the server visible from another computer, replace ``localhost`` with ``0.0.0.0``.
Automated process documentation
===============================
A :class:`Process` can be automatically documented with `Sphinx`_ using the
`autoprocess` directive. The :class:`Process` object is instantiated and its
content examined to create, behind the scenes, a docstring in the Numpy format. This
lets developers embed the documentation directly in the code instead of having to
describe each process manually. For example::
.. autoprocess:: pywps.tests.DocExampleProcess
:docstring:
:skiplines: 1
would yield
.. autoprocess:: pywps.tests.DocExampleProcess
:docstring:
:skiplines: 1
The :option:`docstring` option fetches the :class:`Process` docstring and appends it after the
Reference section. The first lines of this docstring can be skipped using the
:option:`skiplines` option.
To use the `autoprocess` directive, first add `'sphinx.ext.napoleon'` and
`'pywps.ext_autodoc'` to the list of extensions in the Sphinx configuration file
:file:`conf.py`. Then, insert `autoprocess` directives in your documentation
source files, just as you would use an `autoclass` directive, and build the
documentation.
Note that for input and output parameters, the `title` is displayed only if no `abstract`
is defined. In other words, if both `title` and `abstract` are given, only the `abstract`
will be included in the documentation to avoid redundancy.
.. _Flask: http://flask.pocoo.org
.. _PyWPS-Flask: http://github.com/geopython/pywps-flask
.. _Sphinx: http://sphinx-doc.org