.. currentmodule:: pywps

.. _process:

Processes
#########

.. versionadded:: 4.0.0

.. todo::
    * Input validation
    * IOHandler

PyWPS works with processes and services. A process is a Python `Class`
containing an `handler` method and a list of inputs and outputs. A PyWPS
service instance is then a collection of selected processes.

PyWPS does not ship with any processes predefined - it's on you, as user of
PyWPS to set up the processes of your choice. PyWPS is here to help you
publishing your awesome geospatial operation on the web - it takes care of
communication and security, you then have to add the content.

.. note:: There are some example processes in the `PyWPS-Flask`_ project.

Writing a Process
=================

.. note:: At this place, you should prepare your environment for final
        :ref:`deployment`. At least, you should create a single directory with
        your processes, which is typically named `processes`::

        $ mkdir processes

        In this directory, we will create single python scripts containing
        processes.

        Processes can be located *anywhere in the system* as long as their
        location is identified in the :envvar:`PYTHONPATH` environment
        variable, and can be imported in the final server instance.

A processes is coded as a class inheriting from :class:`Process`.
In the `PyWPS-Flask`_ server they are
kept inside the *processes* folder, usually in separated files.

The instance of a *Process* needs following attributes to be configured:

:identifier:
    unique identifier of the process
:title:
    corresponding title
:inputs:
    list of process inputs
:outputs:
    list of process outputs
:handler:
    method which recieves :class:`pywps.app.WPSRequest` and :class:`pywps.response.WPSResponse` as inputs.

Example vector buffer process
=============================

As an example, we will create a *buffer* process - which will take a vector
file as the input, create specified the buffer around the data (using `Shapely
<http://toblerity.org/shapely/>`_),  and return back the result.

Therefore, the process will have two inputs:

* `ComplexData` input - the vector file
* `LiteralData` input - the buffer size

And it will have one output:

* `ComplexData` output - the final buffer

The process can be called `demobuffer` and we can now start coding it::

    $ cd processes
    $ $EDITOR demobuffer.py

At the beginning, we have to import the required classes and modules

Here is a very basic example:

.. literalinclude:: demobuffer.py
   :language: python
   :lines: 28-31
   :linenos:
   :lineno-start: 28

As the next step, we define a list of inputs. The first input is
:class:`pywps.ComplexInput` with the identifier `vector`, title `Vector map`
and there is only one allowed format: GML.

The next input is :class:`pywps.LiteralInput`, with the identifier `size` and
the data type set to `float`:

.. literalinclude:: demobuffer.py
   :language: python
   :lines: 33-40
   :linenos:
   :lineno-start: 33

Next we define the output `output` as :class:`pywps.ComplexOutput`. This
output supports GML format only.

.. literalinclude:: demobuffer.py
   :language: python
   :lines: 42-46
   :linenos:
   :lineno-start: 42

Next we create a new list variables for inputs and outputs.

.. literalinclude:: demobuffer.py
   :language: python
   :lines: 48-49
   :linenos:
   :lineno-start: 48

Next we define the *handler* method. In it, *geospatial analysis
may happen*. The method gets a :class:`pywps.app.WPSRequest` and a
:class:`pywps.response.WPSResponse` object as parameters. In our case, we
calculate the buffer around each vector feature using
`GDAL/OGR library <http://gdal.org>`_. We will not got much into the details,
what you should note is how to get input data from the
:class:`pywps.app.WPSRequest` object and how to set data as outputs in the
:class:`pywps.response.WPSResponse` object.

.. literalinclude:: demobuffer.py
   :language: python
   :pyobject: _handler
   :emphasize-lines: 8-12, 50-54
   :linenos:
   :lineno-start: 68

At the end, we put everything together and create new a `DemoBuffer` class with
handler, inputs and outputs. It's based on :class:`pywps.Process`:

.. literalinclude:: demobuffer.py
   :pyobject: DemoBuffer
   :language: python
   :linenos:
   :lineno-start: 51


Declaring inputs and outputs
============================

Clients need to know which inputs the processes expects. They can be declared
as :class:`pywps.Input` objects in the :class:`Process` class declaration:

.. code-block:: python

    from pywps import Process, LiteralInput, LiteralOutput

    class FooProcess(Process):
        def __init__(self):
            inputs = [
                LiteralInput('foo', data_type='string'),
                ComplexInput('bar', [Format('text/xml')])
            ]
            outputs = [
                LiteralOutput('foo_output', data_type='string'),
                ComplexOutput('bar_output', [Format('JSON')])
            ]

            super(FooProcess, self).__init__(
                ...
                inputs=inputs,
                outputs=outputs
            )
            ...

.. note:: A more generic description can be found in :ref:`wps` chapter.

LiteralData
-----------

* :class:`LiteralInput`
* :class:`LiteralOutput`

A simple value embedded in the request. The first argument is a
name. The second argument is the type, one of `string`, `float`,
`integer` or `boolean`.

ComplexData
-----------

* :class:`ComplexInput`
* :class:`ComplexOutput`

A large data object, for example a layer. ComplexData do have a `format`
attribute as one of their key properties. It's either a list of supported
formats or a single (already selected) format. It shall be an instance of
the :class:`pywps.inout.formats.Format` class.

ComplexData :class:`Format` and input validation
------------------------------------------------
The ComplexData needs as one of its parameters a list of supported data
formats. They are derived from the :class:`Format` class. A :class:`Format`
instance needs, among others, a `mime_type` parameter, a `validate`
method -- which is used for input data validation -- and also a `mode`
parameter -- defining how strict the validation should be (see
:class:`pywps.validator.mode.MODE`).

The `Validate` method is up to you, the user, to code. It requires two input
paramers - `data_input` (a :class:`ComplexInput` object), and `mode`. This
methid must return a `boolean` value indicating whether the input data are
considered valid or not for given `mode`. You can draw inspiration from the
:py:func:`pywps.validator.complexvalidator.validategml` method.

The good news is: there are already predefined validation methods for the ESRI
Shapefile, GML and GeoJSON formats, using GDAL/OGR. There is also an XML Schema
validaton and a JSON schema validator - you just have to pick the propper
supported formats from the :class:`pywps.inout.formats.FORMATS` list and set
the validation mode to your :class:`ComplexInput` object.

Even better news is: you can define custom validation functions and validate
input data according to your needs.

BoundingBoxData
---------------

* :class:`BoundingBoxInput`
* :class:`BoundingBoxOutput`

BoundingBoxData contain information about the bounding box of the desired area
and coordinate reference system. Interesting attributes of the BoundingBoxData
are:

`crs`
    current coordinate reference system
`dimensions`
    number of dimensions
`ll`
    pair of coordinates (or triplet) of the lower-left corner
`ur`
    pair of coordinates (or triplet) of the upper-right corner


Accessing the inputs and outputs in the `handler` method
========================================================

Handlers receive as input argument a :class:`WPSRequest` object. Input
values are found in the `inputs` dictionary::

    @staticmethod
    def _handler(request, response):
        name = request.inputs['name'][0].data
        response.outputs['output'].data = 'Hello world %s!' % name
        return response

`inputs` is a plain Python dictionary.
Most of the inputs and outputs are derived from the :class:`IOHandler` class.
This enables the user to access the data in four different ways:

`input.file`
    Returns a file name - you can access the data using the name of the file
    stored on the hard drive.
`input.url`
    Return a link to the resource using either the ``file://`` or ``http://`` scheme. The target of the url is not downloaded to the PyWPS server until its content is explicitly accessed through either one of the ``file``, ``data`` or ``stream`` attributes.
`input.data`
    Is the direct link to the data themselves. No need to create a file object
    on the hard drive or opening the file and closing it - PyWPS will do
    everything for you.
`input.stream`
    Provides the IOStream of the data. No need for opening the file, you just
    have to `read()` the data.

Because there could be multiple input values with the same identifier, the
inputs are accessed with an index. For example::

    request.inputs['file_input'][0].file
    request.inputs['data_input'][0].data
    request.inputs['stream_input'][0].stream
    url_input = request.inputs['url_input'][0]

As mentioned, if an input is a link to a remote file (an ``http`` address), accessing the ``url`` attribute simply returns the url's string, but accessing any other attribute triggers the file's download::

    url_input.url # returns the link as a string (no download)
    url_input.file # downloads target and returns the local path
    url_input.data # returns the content of the local copy


PyWPS will persistently transform the input (and output) data to the desired
form. You can also set the data for your `Output` object like `output.data = 1`
or `output.file = "myfile.json"` - it works the same way. However, once the source
type is set, it cannot be changed. That is, a `ComplexOutput` whose ``data``
attribute has been set once has read-only access to the three other attributes
(``file``, ``stream`` and ``url``), while the ``data`` attribute can be freely
modified.


Progress and status report
==========================

OGC WPS standard enables asynchronous process execution call, that is in
particular useful, when the process execution takes longer time - process
instance is set to background and WPS Execute Response document with `ProcessAccepted`
messag is returned immediately to the client. The client has to check
`statusLocation` URL, where the current status report is deployed, say every
n-seconds or n-minutes (depends on calculation time). Content of the response is
usually `percentDone` information about the progress along with `statusMessage`
text information, what is currently happening.

You can set process status any time in the `handler` using the
:py:func:`WPSResponse.update_status` function.


Returning large data
====================
WPS allows for a clever method of returning a large data file: instead
of embedding the data in the response, it can be saved separately, and
a URL is returned from where the data can be downloaded. In the current
implementation, PyWPS saves the file in a folder specified
in the configuration passed by the service (or in a default location).
The URL returned is embedded in the XML response.

This behaviour can be requested either by using a GET::

    ...ResponseDocument=output=@asReference=true...

Or a POST request::

    ...
    <wps:ResponseForm>
        <wps:ResponseDocument>
            <wps:Output asReference="true">
                <ows:Identifier>output</ows:Identifier>
                <ows:Title>Some Output</ows:Title>
            </wps:Output>
        </wps:ResponseDocument>
    </wps:ResponseForm>
    ...

**output** is the identifier of the output the user wishes to have stored
and accessible from a URL. The user may request as many outputs by reference
as needed, but only *one* may be requested in RAW format.


Process deployment
==================
In order for clients to invoke processes, a PyWPS
:class:`Service` class must be present with the ability to listen for requests.
An instance of this class must created, receiving instances of
all the desired processes classes.

In the *flask* example service the :class:`Service` class instance is created in the
:class:`Server` class. :class:`Server` is a development server that relies
on `Flask`_. The publication of processes is encapsulated in *demo.py*, where
a main method passes a list of processes instances to the
:class:`Server` class::

    from pywps import Service
    from processes.helloworld import HelloWorld
    from processes.demobuffer import DemoBuffer

    ...
    processes = [ DemoBuffer(), ... ]

    server = Server(processes=processes)

    ...

Running the dev server
======================

The :ref:`flask` server is a `WSGI application`_ that accepts incoming `Execute`
requests and calls the appropriate process to handle them. It also
answers `GetCapabilities` and `DescribeProcess` requests based on the
process identifier and their inputs and outputs.

.. _WSGI application: http://werkzeug.pocoo.org/docs/terms/#wsgi


A host, a port, a config file and the processes can be passed as arguments to the
:class:`Server` constructor.
**host** and **port** will be **prioritised** if passed to the constructor,
otherwise the contents of the config file (`pywps.cfg`) are used.


Use the `run` method to start the server::

    ...
    s = Server(host='localhost', processes=processes, config_file=config_file)
    s.run()
    ...

To make the server visible from another computer, replace ``localhost`` with ``0.0.0.0``.

Automated process documentation
===============================

A :class:`Process` can be automatically documented with `Sphinx`_ using the
`autoprocess` directive. The :class:`Process` object is instantiated and its
content examined to create, behind the scenes, a docstring in the Numpy format. This
lets developers embed the documentation directly in the code instead of having to
describe each process manually. For example::

  .. autoprocess:: pywps.tests.DocExampleProcess
     :docstring:
     :skiplines: 1

would yield

.. autoprocess:: pywps.tests.DocExampleProcess
   :docstring:
   :skiplines: 1

The :option:`docstring` option fetches the :class:`Process` docstring and appends it after the
Reference section. The first lines of this docstring can be skipped using the
:option:`skiplines` option.

To use the `autoprocess` directive, first add `'sphinx.ext.napoleon'` and
`'pywps.ext_autodoc'` to the list of extensions in the Sphinx configuration file
:file:`conf.py`. Then, insert `autoprocess` directives in your documentation
source files, just as you would use an `autoclass` directive, and build the
documentation.

Note that for input and output parameters, the `title` is displayed only if no `abstract`
is defined. In other words, if both `title` and `abstract` are given, only the `abstract`
will be included in the documentation to avoid redundancy.

.. _Flask: http://flask.pocoo.org
.. _PyWPS-Flask: http://github.com/geopython/pywps-flask
.. _Sphinx: http://sphinx-doc.org