Processes

New in version 4.0.0.

Todo

  • Input validation
  • IOHandler

PyWPS works with processes and services. A process is a Python class containing a handler method and a list of inputs and outputs. A PyWPS service instance is then a collection of selected processes.

PyWPS does not ship with any predefined processes: it is up to you, as a user of PyWPS, to set up the processes of your choice. PyWPS helps you publish your geospatial operations on the web. It takes care of communication and security; you add the content.

Note

There are some example processes in the PyWPS-Flask project.

Writing a Process

Note

At this point, you should prepare your environment for final Deployment to a production server. At the very least, create a single directory for your processes, typically named processes:

$ mkdir processes

In this directory, we will create individual Python scripts containing the processes.

Processes can be located anywhere in the system as long as their location is identified in the PYTHONPATH environment variable, and can be imported in the final server instance.
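
The import requirement above can be sketched with the standard library alone. The snippet creates a throwaway directory holding one module, puts that directory on sys.path (the in-process equivalent of extending PYTHONPATH), and imports the module. The directory layout and the module name demo_process are invented for illustration and are not part of PyWPS.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a temporary directory standing in for your "processes" folder.
processes_dir = Path(tempfile.mkdtemp()) / "processes"
processes_dir.mkdir()

# Write a tiny module into it (hypothetical name, for illustration only).
(processes_dir / "demo_process.py").write_text("IDENTIFIER = 'demo'\n")

# Make the directory importable; setting PYTHONPATH before starting
# Python has the same effect.
sys.path.insert(0, str(processes_dir))
importlib.invalidate_caches()

demo_process = importlib.import_module("demo_process")
print(demo_process.IDENTIFIER)
```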

A process is coded as a class inheriting from Process. In the PyWPS-Flask server, processes are kept inside the processes folder, usually in separate files.

An instance of a Process needs the following attributes to be configured:

identifier: unique identifier of the process
title: corresponding title
inputs: list of process inputs
outputs: list of process outputs
handler: method which receives pywps.app.WPSRequest and pywps.app.WPSResponse as inputs.

Example vector buffer process

As an example, we will create a buffer process, which takes a vector file as input, creates a buffer of the specified size around the data (using GDAL/OGR), and returns the result.

Therefore, the process will have two inputs:

  • ComplexData input - the vector file
  • LiteralData input - the buffer size

And it will have one output:

  • ComplexData output - the final buffer

The process can be called demobuffer and we can now start coding it:

$ cd processes
$ $EDITOR demobuffer.py

First, we import the required classes and modules:

from pywps import Process, LiteralInput, ComplexOutput, ComplexInput, Format
from pywps.app.Common import Metadata
from pywps.validator.mode import MODE
from pywps.inout.formats import FORMATS

Next, we define the inputs. The first input is a pywps.ComplexInput with the identifier vector, the title Vector map, and a single allowed format: GML.

The second input is a pywps.LiteralInput with the identifier size and the data type set to float:

inpt_vector = ComplexInput(
    'vector',
    'Vector map',
    supported_formats=[Format('application/gml+xml')],
    mode=MODE.STRICT
)

inpt_size = LiteralInput('size', 'Buffer size', data_type='float')

Next, we define the output, with the identifier output, as a pywps.ComplexOutput. This output supports the GML format only.

out_output = ComplexOutput(
    'output',
    'HelloWorld Output',
    supported_formats=[Format('application/gml+xml')]
)

Next, we create list variables for the inputs and outputs.

inputs = [inpt_vector, inpt_size]
outputs = [out_output]

Next, we define the handler method, where the geospatial analysis happens. The method gets a pywps.app.WPSRequest and a pywps.app.WPSResponse object as parameters. In our case, we calculate the buffer around each vector feature using the GDAL/OGR library. We will not go into much detail; what you should note is how to get input data from the pywps.app.WPSRequest object and how to set data as outputs in the pywps.app.WPSResponse object.

def _handler(request, response):
    """Handler method - this method obtains request object and response
    object and creates the buffer
    """

    from osgeo import ogr

    # obtaining input with identifier 'vector' as file name
    input_file = request.inputs['vector'][0].file

    # obtaining input with identifier 'size' as data directly
    size = request.inputs['size'][0].data

    # open file the "gdal way"
    input_source = ogr.Open(input_file)
    input_layer = input_source.GetLayer()
    layer_name = input_layer.GetName()

    # create output file
    driver = ogr.GetDriverByName('GML')
    output_source = driver.CreateDataSource(layer_name,
        ["XSISCHEMAURI=http://schemas.opengis.net/gml/2.1.2/feature.xsd"])
    output_layer = output_source.CreateLayer(layer_name, None, ogr.wkbUnknown)

    # get feature count
    count = input_layer.GetFeatureCount()
    index = 0

    # make buffer for each feature
    while index < count:

        # report progress as a percentage (0-100)
        response.update_status('Buffering feature %s' % index, 100.0 * index / count)

        # get the geometry
        input_feature = input_layer.GetNextFeature()
        input_geometry = input_feature.GetGeometryRef()

        # make the buffer
        buffer_geometry = input_geometry.Buffer(
                float(size)
        )

        # create output feature to the file
        output_feature = ogr.Feature(feature_def=output_layer.GetLayerDefn())
        output_feature.SetGeometryDirectly(buffer_geometry)
        output_layer.CreateFeature(output_feature)
        output_feature.Destroy()
        index += 1

    # set output format
    response.outputs['output'].output_format = FORMATS.GML

    # set output data as file name
    response.outputs['output'].file = layer_name

    return response

Finally, we put everything together and create a new DemoBuffer class with the handler, inputs and outputs. It is based on pywps.Process:

class DemoBuffer(Process):
    def __init__(self):

        super(DemoBuffer, self).__init__(
            _handler,
            identifier='demobuffer',
            version='1.0.0',
            title='Buffer',
            abstract='This process demonstrates, how to create any process in PyWPS environment',
            metadata=[
                Metadata('process metadata 1', 'http://example.org/1'),
                Metadata('process metadata 2', 'http://example.org/2')
            ],
            inputs=inputs,
            outputs=outputs,
            store_supported=True,
            status_supported=True
        )

Declaring inputs and outputs

Clients need to know which inputs a process expects. They can be declared as pywps.Input objects in the Process class declaration:

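from pywps import (Process, LiteralInput, LiteralOutput,
                   ComplexInput, ComplexOutput, Format)

class FooProcess(Process):
    def __init__(self):
        # each input and output takes an identifier followed by a title
        inputs = [
            LiteralInput('foo', 'Foo input', data_type='string'),
            ComplexInput('bar', 'Bar input',
                         supported_formats=[Format('text/xml')])
        ]
        outputs = [
            LiteralOutput('foo_output', 'Foo output', data_type='string'),
            ComplexOutput('bar_output', 'Bar output',
                          supported_formats=[Format('application/json')])
        ]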

        super(FooProcess, self).__init__(
            ...
            inputs=inputs,
            outputs=outputs
        )
        ...

Note

A more generic description can be found in the OGC Web Processing Service (OGC WPS) chapter.

LiteralData

A simple value embedded in the request. The first argument is the identifier and the second is the title. The data_type argument sets the type: one of string, float, integer or boolean.

ComplexData

A large data object, for example a layer. ComplexData has a format attribute as one of its key properties. It is either a list of supported formats or a single (already selected) format, and each format is an instance of the pywps.inout.formats.Format class.

ComplexData Format and input validation

ComplexData needs, as one of its parameters, a list of supported data formats. They are derived from the Format class. A Format instance needs, among others, a mime_type parameter, a validate method (used for input data validation) and a mode parameter defining how strict the validation should be (see pywps.validator.mode.MODE).

The validate method is up to you, the user, to code. It requires two input parameters: data_input (a ComplexInput object) and mode. This method must return a boolean value indicating whether the input data are considered valid for the given mode. You can draw inspiration from the pywps.validator.complexvalidator.validategml() method.

The good news is: there are already predefined validation methods for the ESRI Shapefile, GML and GeoJSON formats, using GDAL/OGR. There is also an XML Schema validator and a JSON schema validator. You just have to pick the proper supported formats from the pywps.inout.formats.FORMATS list and set the validation mode on your ComplexInput object.

Even better news is: you can define custom validation functions and validate input data according to your needs.
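
As a minimal sketch of such a custom validation function, assuming only the (data_input, mode) signature and boolean return value described above: the MODE_* constants are local stand-ins for the levels in pywps.validator.mode.MODE, and the fake input object with mime_type and data attributes is a simplification made up for this example so that it runs without PyWPS installed.

```python
import xml.etree.ElementTree as ET
from types import SimpleNamespace

# Local stand-ins for the validation levels; the numeric values are
# assumptions chosen only so that this sketch runs without PyWPS.
MODE_NONE, MODE_SIMPLE, MODE_STRICT = 0, 1, 2


def validate_xml(data_input, mode):
    """Return True if data_input is acceptable for the given mode."""
    if mode == MODE_NONE:
        # No validation requested: accept everything.
        return True

    # SIMPLE: compare the declared MIME type only.
    passed = data_input.mime_type in ("text/xml", "application/xml")

    if passed and mode >= MODE_STRICT:
        # STRICT: additionally require that the payload is well-formed XML.
        try:
            ET.fromstring(data_input.data)
        except ET.ParseError:
            passed = False
    return passed


# Fake inputs carrying only the attributes this sketch needs.
good = SimpleNamespace(mime_type="text/xml", data="<a><b/></a>")
bad = SimpleNamespace(mime_type="text/xml", data="<a><b></a>")

print(validate_xml(good, MODE_STRICT))
print(validate_xml(bad, MODE_STRICT))
print(validate_xml(bad, MODE_SIMPLE))
```

The stricter the mode, the more work the function does: the malformed document passes the MIME type check but fails once the payload is actually parsed.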

BoundingBoxData

BoundingBoxData contain information about the bounding box of the desired area and coordinate reference system. Interesting attributes of the BoundingBoxData are:

crs: current coordinate reference system
dimensions: number of dimensions
ll: pair of coordinates (or triplet) of the lower-left corner
ur: pair of coordinates (or triplet) of the upper-right corner
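
To make the four attributes concrete, here is a small stand-in written as a dataclass. It is not the PyWPS class, only an illustration of how crs, dimensions, ll and ur relate to each other; the extent helper and the sample coordinates are made up for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BBox:
    """Stand-in with the four attributes listed above (not the PyWPS class)."""
    crs: str
    dimensions: int
    ll: Tuple[float, ...]  # lower-left corner
    ur: Tuple[float, ...]  # upper-right corner

    def extent(self):
        """Size of the box along each axis."""
        return tuple(upper - lower for lower, upper in zip(self.ll, self.ur))


bbox = BBox(crs="EPSG:4326", dimensions=2, ll=(14.0, 49.0), ur=(15.5, 50.0))
print(bbox.extent())
```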

Accessing the inputs and outputs in the handler method

Handlers receive a WPSRequest object as an input argument. Input values are found in its inputs dictionary:

@staticmethod
def _handler(request, response):
    name = request.inputs['name'][0].data
    response.outputs['output'].data = 'Hello world %s!' % name
    return response

inputs is a plain Python dictionary. Most of the inputs and outputs are derived from the IOHandler class. This enables the user to access the data in three different ways:

input.file: returns a file name. You can access the data using the name of the file stored on the hard drive.
input.data: a direct link to the data themselves. There is no need to create a file on the hard drive, or to open and close it; PyWPS does everything for you.
input.stream: provides the IOStream of the data. There is no need to open the file; you just read() the data.

PyWPS will transparently transform the input (and output) data to the desired form. You can also set the data for your Output object, for example output.data = 1 or output.file = "myfile.json"; it works the same way.

Example:

request.inputs['file_input'][0].file
request.inputs['data_input'][0].data
request.inputs['stream_input'][0].stream
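
The idea of one payload exposed through three views can be illustrated with a toy class. This is a deliberately simplified sketch and not the PyWPS IOHandler: the class name TinyIOHandler and its behaviour (text payloads only, a fresh temporary file on each access) are assumptions made for the example.

```python
import io
import os
import tempfile


class TinyIOHandler:
    """Toy illustration of file/data/stream views over one payload."""

    def __init__(self):
        self._data = None

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, value):
        self._data = value

    @property
    def file(self):
        # Write the payload to a temporary file and return its name.
        fd, path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w") as handle:
            handle.write(self._data)
        return path

    @file.setter
    def file(self, path):
        # Load the payload from an existing file.
        with open(path) as handle:
            self._data = handle.read()

    @property
    def stream(self):
        # Expose the payload as a readable in-memory stream.
        return io.StringIO(self._data)


handler = TinyIOHandler()
handler.data = "hello"

path = handler.file
with open(path) as handle:
    from_file = handle.read()
from_stream = handler.stream.read()
os.remove(path)

print(from_file, from_stream)
```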

Because there could be multiple input values with the same identifier, the inputs are accessed with an index. For a LiteralInput, data holds the literal value. For a ComplexInput, stream gives an open file-like object, and the MIME type is available from the input's data_format:

@staticmethod
def _handler(request, response):
    layer = request.inputs['layer'][0]
    # the selected format carries the MIME type of the input
    mime_type = layer.data_format.mime_type
    # stream is file-like, so the content can be read directly
    content = layer.stream.read()
    msg = ("You gave me a file of type %s and size %d"
           % (mime_type, len(content)))
    response.outputs['output'].data = msg
    return response

Progress and status report

The OGC WPS standard allows asynchronous process execution, which is particularly useful when a process takes a long time to run. The process instance is sent to the background and a WPS Execute Response document with a ProcessAccepted message is returned immediately to the client. The client then has to check the statusLocation URL, where the current status report is published, every n seconds or n minutes (depending on the calculation time). The response usually contains percentDone information about the progress, along with a statusMessage text describing what is currently happening.
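
On the client side, each poll of the status location amounts to parsing a small XML document. The sketch below parses a hand-written, heavily simplified WPS 1.0.0 style status document, in which the progress appears as the percentCompleted attribute of ProcessStarted. The sample content and the example.org URL are invented, and a real response carries more attributes and elements.

```python
import xml.etree.ElementTree as ET

WPS_NS = {"wps": "http://www.opengis.net/wps/1.0.0"}

# Hand-written, simplified sample of a status document.
SAMPLE = """<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0"
    statusLocation="http://example.org/outputs/status.xml">
  <wps:Status creationTime="2016-01-01T12:00:00Z">
    <wps:ProcessStarted percentCompleted="40">Buffering feature 4</wps:ProcessStarted>
  </wps:Status>
</wps:ExecuteResponse>"""


def read_status(xml_text):
    """Return (state, percent, message) from a status document."""
    root = ET.fromstring(xml_text)
    status = root.find("wps:Status", WPS_NS)
    # The first child names the state, e.g. ProcessAccepted or ProcessStarted.
    state_element = status[0]
    state = state_element.tag.split("}")[1]
    percent = state_element.get("percentCompleted")
    return (state,
            int(percent) if percent is not None else None,
            state_element.text)


print(read_status(SAMPLE))
```

A polling client would fetch the statusLocation URL at an interval and call a function like this until the state is no longer ProcessAccepted or ProcessStarted.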

You can set the process status at any time in the handler using the WPSResponse.update_status() function.
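
A rough sketch of progress reporting inside a loop follows. FakeResponse is a stand-in that merely records calls so the example runs without PyWPS; it assumes update_status takes a message and a percentage on a 0-100 scale. The process_items function and its inputs are hypothetical.

```python
class FakeResponse:
    """Stand-in that records calls; the real object is pywps.app.WPSResponse."""

    def __init__(self):
        self.updates = []

    def update_status(self, message, status_percentage=None):
        self.updates.append((message, status_percentage))


def process_items(items, response):
    """Do some work per item and report progress as a percentage (0-100)."""
    total = len(items)
    for index, item in enumerate(items):
        response.update_status(
            "Processing item %d of %d" % (index + 1, total),
            100.0 * index / total,
        )
        # ... the real work on `item` would happen here ...
    response.update_status("Done", 100)
    return response


response = process_items(["a", "b", "c", "d"], FakeResponse())
print(response.updates[1])
```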

Returning large data

WPS allows for a clever method of returning a large data file: instead of embedding the data in the response, it can be saved separately, and a URL is returned from where the data can be downloaded. In the current implementation, PyWPS saves the file in a folder specified in the configuration passed by the service (or in a default location). The URL returned is embedded in the XML response.

This behaviour can be requested either with a GET request:

...ResponseDocument=output=@asReference=true...

Or a POST request:

...
<wps:ResponseForm>
    <wps:ResponseDocument>
        <wps:Output asReference="true">
            <ows:Identifier>output</ows:Identifier>
            <ows:Title>Some Output</ows:Title>
        </wps:Output>
    </wps:ResponseDocument>
</wps:ResponseForm>
...

output is the identifier of the output the user wishes to have stored and accessible from a URL. The user may request as many outputs by reference as needed, but only one may be requested in RAW format.
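
For illustration, a complete GET URL asking for the output by reference could be assembled as below. The service endpoint, the input data URL and the input values are hypothetical; only the ResponseDocument fragment comes from the text above. The WPS punctuation is left unencoded for readability, which works for this simple input URL but would need proper percent-encoding for anything containing characters such as & or ?.

```python
from urllib.parse import urlencode

# Hypothetical service endpoint, used only for illustration.
SERVICE_URL = "http://localhost:5000/wps"

params = {
    "service": "WPS",
    "version": "1.0.0",
    "request": "Execute",
    "identifier": "demobuffer",
    # Hypothetical inputs: a referenced vector file and a buffer size.
    "DataInputs": "vector=@xlink:href=http://example.org/data.gml;size=10",
    # Ask for the output named "output" to be returned as a URL reference.
    "ResponseDocument": "output=@asReference=true",
}

# Keep the WPS key-value punctuation readable instead of percent-encoding it.
url = SERVICE_URL + "?" + urlencode(params, safe="=@;:/")
print(url)
```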

Process deployment

In order for clients to invoke processes, a PyWPS Service class must be present with the ability to listen for requests. An instance of this class must be created, receiving instances of all the desired process classes.

In the Flask example service, the Service class instance is created in the Server class. Server is a development server that relies on Flask. The publication of processes is encapsulated in demo.py, where a main method passes a list of process instances to the Server class:

from pywps import Service
from processes.helloworld import HelloWorld
from processes.demobuffer import DemoBuffer

...
processes = [ DemoBuffer(), ... ]

server = Server(processes=processes)

...

Running the dev server

The Flask server is a WSGI application that accepts incoming Execute requests and calls the appropriate process to handle them. It also answers GetCapabilities and DescribeProcess requests based on the process identifiers and their inputs and outputs.
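
The dispatching idea can be sketched as a toy WSGI application built on the standard library. This is not how PyWPS or the Flask server is implemented: the PROCESSES registry, the plain-text responses and the call helper are assumptions for the example, real WPS responses are XML documents, and DescribeProcess is left out for brevity.

```python
from urllib.parse import parse_qs

# Hypothetical registry mapping process identifiers to plain callables.
PROCESSES = {
    "hello": lambda inputs: "Hello world %s!" % inputs.get("name", ""),
}


def application(environ, start_response):
    """Toy WSGI app that routes on the WPS 'request' parameter."""
    raw = parse_qs(environ.get("QUERY_STRING", ""))
    # Normalise the parameter names so that lookups ignore case.
    query = {key.lower(): values[0] for key, values in raw.items()}
    operation = query.get("request", "").lower()

    if operation == "getcapabilities":
        body = "Processes: " + ", ".join(sorted(PROCESSES))
    elif operation == "execute" and query.get("identifier") in PROCESSES:
        # DataInputs arrive as "key=value;key=value".
        items = query.get("datainputs", "").split(";")
        inputs = dict(item.split("=", 1) for item in items if "=" in item)
        body = PROCESSES[query["identifier"]](inputs)
    else:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"Unsupported request"]

    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body.encode("utf-8")]


def call(query_string):
    """Invoke the app directly, without starting a server."""
    statuses = []
    chunks = application({"QUERY_STRING": query_string},
                         lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks).decode("utf-8")


print(call("service=WPS&request=Execute&identifier=hello&DataInputs=name=Ann"))
print(call("service=WPS&request=GetCapabilities"))
```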

A host, a port, a config file and the processes can be passed as arguments to the Server constructor. host and port will be prioritised if passed to the constructor, otherwise the contents of the config file (pywps.cfg) are used.
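
The precedence rule described above (constructor arguments first, configuration file second) can be sketched as follows. The resolve_address function, the [server] section and the host and port option names are assumptions for this illustration and may not match the actual layout of pywps.cfg.

```python
import configparser

# Hypothetical config contents; section and option names are assumptions.
CONFIG_TEXT = """
[server]
host = 127.0.0.1
port = 5000
"""


def resolve_address(host=None, port=None, config_text=CONFIG_TEXT):
    """Prefer explicit arguments, fall back to the configuration."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    resolved_host = host if host is not None else config.get("server", "host")
    resolved_port = port if port is not None else config.getint("server", "port")
    return resolved_host, resolved_port


print(resolve_address(host="0.0.0.0"))
print(resolve_address())
```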

Use the run method to start the server:

...
s = Server(host='localhost', processes=processes, config_file=config_file)
s.run()
...

To make the server visible from another computer, replace localhost with 0.0.0.0.