doc/applications/arclink

SeedLink was designed for real-time data transfer. A SeedLink client can only access data that is in a relatively short real-time ring buffer. Moreover, SeedLink can neither query the station database nor handle instrument responses, and thus does not support full SEED. ArcLink complements SeedLink by providing this functionality.

The ArcLink protocol is similar to SeedLink: it is based on TCP and uses simple commands in ASCII coding. One conceptual difference is that the client does not “subscribe” to real-time streams, but requests data based on time windows. Unlike SeedLink, the data are not sent immediately, but possibly minutes or even hours later, when the request has been processed. An ArcLink request is associated with a request ID that the client can use to get the status of the request, to download the data and to delete the request.

The ArcLink server does not access the data archive directly, but delegates this job to a “request handler”. Thus, it is possible to use ArcLink for accessing different data archives by using different request handlers. This is comparable to SeedLink, which can get real-time data from different sources. The request handler is analogous to a SeedLink plug-in; however, while SeedLink starts exactly one instance of each defined plug-in at startup, ArcLink uses a single request handler and starts one instance of it per request.

In addition to waveforms and metadata, it is also possible to request routing information from an ArcLink server. The routing information tells which ArcLink server provides the data of a given station. The routing database itself is supposed to be synchronized between all ArcLink servers. In this way a client can connect to any public ArcLink server, request routing information and split the request accordingly.

XML file formats involved with the ArcLink protocol

Status XML:

The status XML document is returned by the status command as described below. Some description of this XML document can be found here: http://www.seiscomp3.org/wiki/doc/applications/arclink-status-xml

Inventory XML:

The ArcLink mapping of the SeisComP3 inventory schema.

Routing XML:

The ArcLink mapping of the SeisComP3 routing schema.

Request format

The generic request format is the following:

REQUEST <request_type> <optional_attributes>
<start_time> <end_time> <net> <station> <stream> <loc_id> <optional_constraints>
[more request lines...]
END

Allowed request types are currently WAVEFORM, RESPONSE, INVENTORY, ROUTING and QC. Data format of WAVEFORM and RESPONSE requests is SEED (Mini-SEED, dataless SEED, full SEED). Data format of INVENTORY, ROUTING and QC requests is XML. Data can be optionally compressed by bzip2.
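
The request layout can be illustrated with a small Python sketch that assembles the lines of a request. The helper names (format_time, request_line, build_request) are hypothetical and not part of ArcLink. The sketch assumes that fields are separated by single spaces and that time components may be written without zero padding, as in the examples on this page.

```python
from datetime import datetime

# Request types named in the text above.
REQUEST_TYPES = {"WAVEFORM", "RESPONSE", "INVENTORY", "ROUTING", "QC"}


def format_time(t):
    """Render a datetime as comma-separated year,month,day,hour,minute,second."""
    return f"{t.year},{t.month},{t.day},{t.hour},{t.minute},{t.second}"


def request_line(start, end, net, *fields, **constraints):
    """Build one request line.

    `fields` holds station, stream and loc_id exactly as they should appear
    (including any "." or "*"); `constraints` become trailing key=value pairs.
    """
    parts = [format_time(start), format_time(end), net, *fields]
    parts += [f"{key}={value}" for key, value in constraints.items()]
    return " ".join(parts)


def build_request(request_type, lines, **attributes):
    """Return the request as a list of lines, from REQUEST to END."""
    if request_type not in REQUEST_TYPES:
        raise ValueError(f"unknown request type: {request_type}")
    header = ["REQUEST", request_type]
    header += [f"{key}={value}" for key, value in attributes.items()]
    return [" ".join(header), *lines, "END"]


waveform = build_request(
    "WAVEFORM",
    [request_line(datetime(2005, 9, 1, 0, 5), datetime(2005, 9, 1, 0, 10),
                  "IA", "PPI", "BHZ", ".")],
    format="MSEED",
)
print("\n".join(waveform))
```

The helpers do not interpret “.” or “*”; the caller passes them as they should appear, so the rules of the individual request types remain the caller's responsibility.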

WAVEFORM request

If request_type==WAVEFORM, attributes “format” and “compression” are defined. The value of “format” can be “MSEED” for Mini-SEED or “FSEED” (default) for full SEED; “compression” can be “bzip2” or “none” (default). Wildcards are allowed only in stream and loc_id. Constraints are not allowed. loc_id is optional. If loc_id is missing or “.”, only streams with empty location ID are requested. Sample waveform request:

REQUEST WAVEFORM format=MSEED
2005,09,01,00,05,00 2005,09,01,00,10,00 IA PPI BHZ .
END

RESPONSE request

If request_type==RESPONSE, attribute “compression” is defined, which can be “bzip2” or “none” (default). Constraints are not allowed. Wildcard “*” is allowed in station, stream and loc_id, so it is possible to request a dataless volume of a whole network. If loc_id is missing or “.”, only streams with empty location ID are included in the dataless volume.

INVENTORY request

If request_type==INVENTORY, attributes “instruments”, “compression” and “modified_after” are defined:

instruments [false]
whether instrument data is added to the XML (“true” or “false”).
compression [none]
compress XML data (“bzip2” or “none”).
modified_after
if set, must contain an ISO time string; only entries modified after the given time are returned. Can be used for DB synchronization.

Wildcard “*” is allowed in all fields except start time and end time. station, stream and loc_id are optional. If station or stream is not specified, the respective elements are not added to the XML tree; if loc_id is missing or “.”, only streams with empty location ID are included. For example, to request just a list of GEOFON stations (but not stream information), one would use:

REQUEST INVENTORY
1990,1,1,0,0,0 2030,12,31,0,0,0 GE *
END

The following constraints are defined:

sensortype
limit streams to those using specific sensor types: “VBB”, “BB”, “SM”, “OBS”, etc. Can be also a combination like “VBB+BB+SM”.
latmin
minimum latitude
latmax
maximum latitude
lonmin
minimum longitude
lonmax
maximum longitude
permanent
true or false, requesting only permanent or temporary networks respectively
restricted
true or false, requesting only networks/stations/streams that have restricted or open data respectively.

If any of station, stream or loc_id is missing, one or more dots should be used before constraints. For example, to request the list of networks with open data, one would use:

REQUEST INVENTORY
1990,1,1,0,0,0 2030,12,31,0,0,0 * . restricted=false
END

ROUTING request

If request_type==ROUTING, attributes “compression” and “modified_after” are defined:

compression [none]
compress XML data (“bzip2” or “none”).
modified_after
if set, must contain an ISO time string; only entries modified after the given time are returned. Can be used for DB synchronization.

Wildcard “*” is allowed in all fields except start time and end time. Constraints are not allowed. All fields except start time, end time and net are optional; a missing station stands for the “default route” of a given network. stream and loc_id are ignored.
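
To illustrate how a client might use routing information to split a request, here is a minimal Python sketch. The routing table is a plain dictionary, which is a hypothetical in-memory form and not the routing XML itself; the server addresses are placeholders. An entry whose station is None plays the role of the default route of a network.

```python
from collections import defaultdict


def split_by_route(request_lines, routes):
    """Group request lines by the server that provides the data.

    `request_lines` is a list of (net, station, line) tuples.
    `routes` maps (net, station) to a server address; the key (net, None)
    is the default route of that network.
    """
    grouped = defaultdict(list)
    for net, station, line in request_lines:
        # Prefer a station-specific route, fall back to the network default.
        server = routes.get((net, station), routes.get((net, None)))
        if server is None:
            raise LookupError(f"no route for {net} {station}")
        grouped[server].append(line)
    return dict(grouped)


# Placeholder addresses, for illustration only.
routes = {
    ("GE", None): "arclink-a.example.org:18001",
    ("GE", "WLF"): "arclink-b.example.org:18001",
}
lines = [
    ("GE", "APE", "2008,2,21,2,50,0 2008,2,21,3,10,0 GE APE BHZ ."),
    ("GE", "WLF", "2008,2,21,2,50,0 2008,2,21,3,10,0 GE WLF BHZ ."),
]
for server, batch in split_by_route(lines, routes).items():
    print(server, batch)
```

Each resulting batch would then be submitted as a separate request to the corresponding server.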

QC request

If request_type==QC, attributes “compression”, “outages”, “logs” and “parameters” are defined:

compression [none]
compress XML data (“bzip2” or “none”).
outages
include list of outages (“true” or “false”).
logs
include log messages (“true” or “false”).
parameters
comma-separated list of QC parameters.

Wildcard “*” is allowed in all fields except start time and end time. All fields must be present. Constraints are not allowed. The following QC parameters are implemented in the present version: availability, delay, gaps count, gaps interval, gaps length, latency, offset, overlaps count, overlaps interval, overlaps length, rms, spikes amplitude, spikes count, spikes interval, timing quality. These parameters are documented at doc/applications/scqc.

Client protocol

ArcLink commands consist of an ASCII string followed by zero or more arguments separated by spaces and terminated with carriage return (<cr>, ASCII code 13) followed by an optional line feed (<lf>, ASCII code 10). Except for STATUS, the response to a command consists of one or several lines terminated with <cr><lf>. Unless noted otherwise, the response is OK<cr><lf> or ERROR<cr><lf>, depending on whether the command was successful. After getting the ERROR response, it is possible to retrieve the error message with SHOWERR. The following ArcLink commands are defined:

HELLO
returns two <cr><lf>-terminated lines with the software version and the data centre name.
BYE
close the connection (useful for testing the server with telnet, otherwise it is enough to close the client-side socket).
USER <username> <password>
authenticates the user; required before any of the following commands.
INSTITUTION <any string>
optionally specifies institution name.
LABEL <label>
optional label of request.
SHOWERR
returns one <cr><lf>-terminated line, containing the error message (to be used after getting the ERROR response).
REQUEST <request_type> <optional_attributes>
start of a request
END
end of a request; if successful, returns request ID, otherwise ERROR<cr><lf>
STATUS <req_id>
send the status of the request with the ID “req_id”. If “req_id”==ALL, the status of all requests of the user is sent. The response is either ERROR<cr><lf> or an XML document, followed by END<cr><lf>.
DOWNLOAD <req_id[.vol_id] [pos]>
download the result of the request. The response is ERROR<cr><lf> or the size, followed by the data and END<cr><lf>. The optional argument “pos” makes it possible to resume a broken download.
BDOWNLOAD <req_id[.vol_id] [pos]>
like DOWNLOAD, but will block the download until the data are complete.
PURGE <req_id>
delete the result of a request from the server.
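
The command and response framing described above can be sketched in Python as follows. This is an illustration under assumptions, not the reference client: the class and method names are hypothetical, the session works on any pair of binary streams, and it assumes that the size line of DOWNLOAD is a decimal byte count followed directly by the data. Submitting a request is deliberately left out.

```python
import io


class ArcLinkError(Exception):
    """Raised when the server answers ERROR or the stream ends early."""


class ArcLinkSession:
    def __init__(self, reader, writer):
        self.reader = reader  # binary stream delivering server responses
        self.writer = writer  # binary stream receiving client commands

    def send(self, line):
        # Commands end with <cr>; the optional <lf> is sent as well.
        self.writer.write(line.encode("ascii") + b"\r\n")
        self.writer.flush()

    def read_line(self):
        line = self.reader.readline()
        if not line:
            raise ArcLinkError("connection closed")
        return line.rstrip(b"\r\n").decode("ascii")

    def _error(self):
        # After ERROR, SHOWERR returns one line with the error message.
        self.send("SHOWERR")
        return ArcLinkError(self.read_line())

    def hello(self):
        """Return (software version, data centre name)."""
        self.send("HELLO")
        return self.read_line(), self.read_line()

    def command(self, line):
        """Send a command whose response is a single OK or ERROR line."""
        self.send(line)
        reply = self.read_line()
        if reply == "ERROR":
            raise self._error()
        return reply

    def download(self, req_id):
        """Return the data of a finished request as bytes."""
        self.send(f"DOWNLOAD {req_id}")
        reply = self.read_line()
        if reply == "ERROR":
            raise self._error()
        size = int(reply)
        data = self.reader.read(size)
        if len(data) != size:
            raise ArcLinkError("incomplete download")
        if self.read_line() != "END":
            raise ArcLinkError("missing END after data")
        return data


# Canned server responses stand in for a real connection.
canned = io.BytesIO(b"ArcLink 1.0\r\nTest DC\r\nOK\r\n5\r\nhelloEND\r\n")
sent = io.BytesIO()
session = ArcLinkSession(canned, sent)
print(session.hello())
print(session.command("USER somebody secret"))
print(session.download("123"))
```

With a real server, both streams could come from one socket, for example the object returned by makefile("rwb") on a connected socket.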

Request handler protocol

The ArcLink server sends a request to a request handler in the following format:

USER <username> <password>
[INSTITUTION <any string>]
[LABEL <label>]
REQUEST <request_type> <req_id> <optional_attributes>
[one or more request lines...]
END

After receiving the request, the request handler can send responses to the server. The following responses are defined:

RESTRICTED
the request handler indicates that the final volume will contain restricted data.
STATUS LINE <n> PROCESSING <vol_id>
add request line number n (0-based) to volume vol_id. The volume is created if it does not already exist.
STATUS <ref> <status>
set line or volume status, where ref is “LINE n” or “VOLUME vol_id” and status is one of the following:

Status                Description
OK                    request successfully processed, data available
NODATA                no processing errors, but data not available
WARN                  processing errors, some downloadable data available
ERROR                 processing errors, no downloadable data available
RETRY                 temporarily no data available
DENIED                access to data denied for the user
CANCEL                processing cancelled (e.g., by operator)
MESSAGE <any_string>  error message in case of WARN or ERROR, but can be used regardless of the status (the last message is shown in the STATUS response)
SIZE <n>              data size; in case of a volume, it must be the exact size of the downloadable product

MESSAGE <any_string>
send a general processing (error) message. The last message is shown in the STATUS response.
ERROR
the request handler could not process the request due to an error (e.g., got an unhandled Python exception). This terminates the request and normally the request handler quits. If the request handler does not quit, it should be ready to handle the next request. Note that if the request handler quits (crashes) without sending ERROR, then the request will be repeated (sent to another request handler instance) by the server. This behavior might be changed in future server versions to avoid loops, e.g., by implying ERROR if the request handler quits.
END
request processing finished normally. The request handler is ready for the next request.
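
As an illustration of the response grammar above, the following Python sketch parses a single request handler response into a tuple. The function name and the tuple layout are hypothetical choices made for this example; the server's own parsing may differ.

```python
# Plain status codes that take no argument.
STATUS_CODES = {"OK", "NODATA", "WARN", "ERROR", "RETRY", "DENIED", "CANCEL"}


def parse_response(line):
    """Parse one request handler response line.

    Returns ("RESTRICTED",), ("ERROR",), ("END",), ("MESSAGE", text) or
    ("STATUS", (kind, ref), keyword, argument), where kind is "LINE" or
    "VOLUME". Raises ValueError for anything else.
    """
    words = line.split()
    if not words:
        raise ValueError("empty response")
    if len(words) == 1 and words[0] in ("RESTRICTED", "ERROR", "END"):
        return (words[0],)
    if words[0] == "MESSAGE":
        return ("MESSAGE", line.split(None, 1)[1] if len(words) > 1 else "")
    if words[0] == "STATUS" and len(words) >= 4 and words[1] in ("LINE", "VOLUME"):
        # Line references are 0-based numbers, volume references are IDs.
        ref = (words[1], int(words[2]) if words[1] == "LINE" else words[2])
        keyword = words[3]
        if keyword in STATUS_CODES and len(words) == 4:
            return ("STATUS", ref, keyword, None)
        if keyword == "PROCESSING" and words[1] == "LINE" and len(words) == 5:
            return ("STATUS", ref, "PROCESSING", words[4])
        if keyword == "SIZE" and len(words) == 5:
            return ("STATUS", ref, "SIZE", int(words[4]))
        if keyword == "MESSAGE":
            text = line.split(None, 4)[4] if len(words) > 4 else ""
            return ("STATUS", ref, "MESSAGE", text)
    raise ValueError(f"invalid response: {line!r}")


print(parse_response("STATUS LINE 0 PROCESSING GFZ"))
print(parse_response("STATUS VOLUME GFZ SIZE 73728"))
```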

The ArcLink server uses file descriptor 62 to send requests to a request handler and reads status responses from file descriptor 63. It is possible to test a request handler interactively by running a command similar to the following:

python reqhandler -vvv 62>&0 63<&1 >reqhandler.log 2>&1

Now the program “reqhandler” waits for input from terminal and writes output to terminal as well. Additionally a logfile is written.

Let's type the following request:

USER somebody@gfz-potsdam.de
REQUEST WAVEFORM 123 format=MSEED
2008,2,21,2,50,0 2008,2,21,3,10,0 EE MTSE BHZ .
2008,2,21,2,50,0 2008,2,21,3,10,0 GE WLF BHZ .
END

The log might look similar to the following:

01 [None] > USER somebody@gfz-potsdam.de
02 [None] > REQUEST WAVEFORM 123 format=MSEED
03 [123] new WAVEFORM request from somebody@gfz-potsdam.de, None
04 [123] > 2008,2,21,2,50,0 2008,2,21,3,10,0 EE MTSE BHZ .
05 [123] < STATUS LINE 0 PROCESSING GFZ
06 [123] > 2008,2,21,2,50,0 2008,2,21,3,10,0 GE WLF BHZ .
07 [123] < STATUS LINE 0 SIZE 43008
08 [123] > END
09 [123] < STATUS LINE 1 PROCESSING GFZ
10 [123] < STATUS LINE 0 OK
11 [123] < STATUS LINE 1 MESSAGE size not known
12 [123] < STATUS LINE 1 OK
13 [123] < STATUS VOLUME GFZ SIZE 73728
14 [123] < STATUS VOLUME GFZ OK
15 [123] < END

Note that the status responses are asynchronous.

02
request ID “123” is chosen by the server (not user!)
05
the first request line is associated with volume ID “GFZ” (chosen by the request handler)
07
request handler tells the size of data that is related to the first request line (optional)
09
the second request line is associated with volume ID “GFZ”
10
processing of the first line completed without error
11
send optional message regarding the second line
12
processing of the second line completed without error
13
request handler tells the size of volume (mandatory)
14
volume completed without error
15
request processing finished

Minimal status response when data is available:

STATUS LINE 0 PROCESSING GFZ
STATUS LINE 1 PROCESSING GFZ
STATUS LINE 0 OK
STATUS LINE 1 OK
STATUS VOLUME GFZ SIZE 73728
STATUS VOLUME GFZ OK
END

Minimal status response (and an optional error message) when data is not available:

STATUS LINE 0 PROCESSING GFZ
STATUS LINE 1 PROCESSING GFZ
STATUS LINE 0 NODATA
STATUS LINE 1 NODATA
STATUS VOLUME GFZ NODATA
MESSAGE optional error message
END
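
The exchange above can be sketched as a toy request handler in Python. This is only an illustration of the message order: it has no archive behind it and therefore answers NODATA for every request line. The function name, the default volume ID and the message text are hypothetical, and the sketch assumes that responses are newline-terminated.

```python
import io


def handle_request(infile, outfile, volume="local"):
    """Read one request and report NODATA for each of its lines.

    Returns the request ID that the server put on the REQUEST line.
    """
    req_id = None
    n_lines = 0
    for raw in infile:
        line = raw.strip()
        keyword = line.split(" ", 1)[0]
        if not line or keyword in ("USER", "INSTITUTION", "LABEL"):
            continue  # user details are not needed by this toy handler
        if keyword == "REQUEST":
            req_id = line.split()[2]  # REQUEST <request_type> <req_id> ...
        elif line == "END":
            break
        else:
            # Assign every request line (0-based) to the single volume.
            outfile.write(f"STATUS LINE {n_lines} PROCESSING {volume}\n")
            n_lines += 1
    for n in range(n_lines):
        outfile.write(f"STATUS LINE {n} NODATA\n")
    outfile.write(f"STATUS VOLUME {volume} NODATA\n")
    outfile.write("MESSAGE no archive attached\n")
    outfile.write("END\n")
    outfile.flush()
    return req_id


request = io.StringIO(
    "USER somebody@gfz-potsdam.de\n"
    "REQUEST WAVEFORM 123 format=MSEED\n"
    "2008,2,21,2,50,0 2008,2,21,3,10,0 EE MTSE BHZ .\n"
    "2008,2,21,2,50,0 2008,2,21,3,10,0 GE WLF BHZ .\n"
    "END\n"
)
responses = io.StringIO()
print(handle_request(request, responses, volume="GFZ"))
print(responses.getvalue(), end="")
```

When started by the server, the two streams would be file descriptors 62 and 63 rather than in-memory buffers, for example wrapped with os.fdopen.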

Configuration

arclink.ini has the same syntax as seedlink.ini. It may contain several sections, but only the section whose name matches the executable being used is read. A section in arclink.ini has the following structure (default values are shown in square brackets, but relying on them is not recommended):

parameter “organization”
organization ID, same as in SeedLink. (Arbitrary string.)

parameter “request_dir”
path to the directory where the (temporary) request files are stored.
parameter “connections” [0]
maximum number of parallel TCP connections (0—no limit).
parameter “connections_per_ip” [0]
maximum number of parallel TCP connections per IP (0—no limit).
parameter “request_queue” [0]
maximum number of requests waiting to be processed. When the request queue is full, no more requests are accepted (0—no limit).
parameter “request_size” [100]
maximum request size in lines.
parameter “handlers_soft” [10]
number of request handler instances to keep running even if they are idle.
parameter “handlers_hard” [100]
maximum number of request handler instances, i.e., the maximum number of requests that are processed in parallel.
parameter “handler_cmd”
command to start request handler subprocess, for example “python /some/path/reqhandler.py”
parameter “handler_timeout” [600]
if a request handler blocks the input for more than the given time period in seconds, then the ArcLink server shuts down the request handler (0—no timeout check).
parameter “handler_start_retry” [60]
restart terminated request handlers after this time period in seconds (0—never re-start terminated request handlers). A request handler may terminate itself because of some internal error or it can be shut down by ArcLink if timeout occurs or an invalid response was received.
parameter “handler_shutdown_wait” [10]
wait this time period in seconds for a request handler to terminate the connection itself before sending the TERM signal (0: wait forever). If the request handler still does not terminate within this time period, the KILL signal is sent.
parameter “port” [18001]
TCP port used by the server.
parameter “lockfile”
path to the lock file; used by the seiscomp utility to check if ArcLink is running.
parameter “statefile”
the state of requests is dumped into this file when ArcLink exits. If this parameter is defined, but the file does not exist (e.g., because ArcLink crashed), then ArcLink reads the *.desc files in the request directory to restore state. If “statefile” is not defined, then ArcLink does not restore the state after restart.
parameter “admin_password”
password of user “admin” (special user that can view requests of all users).
parameter “handlers_*”
maximum number of simultaneous request handler instances per request type.
parameter “swapout_time” [0]
delete finished requests from RAM when they have not been used (by STATUS, DOWNLOAD or BDOWNLOAD commands) for the given time span in seconds (0: never delete requests).
parameter “purge_time” [0]
delete finished requests and their data products from the request directory as well when they have not been used (by STATUS, DOWNLOAD or BDOWNLOAD commands) for the given time span in seconds (0: never delete requests).