Crawler (0.0.3)

Overview

The Crawler sends events or messages to a crawler data processor over gRPC for further processing. It requests the events or messages for specific time intervals from rpt-data-provider. The intervals are processed periodically, and new ones are written to Cradle when necessary.

The crawler data processor must implement the crawler data processor gRPC service.

Configuration parameters

from: 2021-06-16T12:00:00.00Z - the lower boundary of the time interval to process. The Crawler processes the data starting from this point in time. Required parameter.

to: 2021-06-17T14:00:00.00Z - the upper boundary of the time interval to process. The Crawler does not process data after this point in time. If it is not set, the Crawler works until it is stopped.

type: EVENTS - the type of data the Crawler processes. Allowed values are EVENTS, MESSAGES. The default value is EVENTS.

name: CrawlerName - the Crawler's name, which allows the data processor to identify it. Required parameter.

defaultLength: PT10M - the step that the Crawler uses to create intervals. It uses the Java Duration format (ISO-8601 based, for example PT10M or PT1H). The default value is PT1H.

lastUpdateOffset: 10 - the timeout to check previously processed intervals. Works only if the upper boundary (the to parameter) is set. The default value is 1.

lastUpdateOffsetUnit: HOURS - the time unit for the lastUpdateOffset parameter. Allowed values are the Java time unit enum constants, such as MINUTES and HOURS. The default value is HOURS.

delay: 10 - the delay in seconds between the moment the Crawler finishes processing the current interval and the moment it starts processing the next one. The default value is 10.

batchSize: 500 - the size of the data chunks that the Crawler requests from the data provider and feeds to the data processor. The default value is 300.

toLag: 5 - the offset from the current time. When an interval's upper bound is later than (current time - toLag), the Crawler waits until the interval's end is earlier than (current time - toLag). The default value is 1.

toLagOffsetUnit: MINUTES - the time unit for the toLag parameter. Allowed values are the Java time unit enum constants, such as MINUTES and HOURS. The default value is HOURS.
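
The way from, to, defaultLength and toLag interact can be illustrated with a small Python sketch. This is not the Crawler's actual code: the function names are hypothetical, the duration parser handles only the PTnHnMnS subset of the format, clipping the last interval at the upper boundary is an assumption, and the exact comparison used at the toLag boundary is an assumption as well.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple

# Handles only the PTnHnMnS subset of the ISO-8601 duration format
# (an assumption for brevity; Java's Duration accepts more forms).
_DURATION = re.compile(r"PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?")


def parse_duration(text: str) -> timedelta:
    """Convert a value such as 'PT10M' or 'PT1H' into a timedelta."""
    match = _DURATION.fullmatch(text)
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def split_into_intervals(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (lower, upper) intervals of length `step`.

    The last interval is clipped at `end` (assumed behaviour).
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lower = start
    while lower < end:
        upper = min(lower + step, end)
        yield lower, upper
        lower = upper


def is_ready(interval_end: datetime, now: datetime, to_lag: timedelta) -> bool:
    """Return True once the interval's end is no later than (now - toLag).

    Whether the real check is strict or inclusive at the boundary is an assumption.
    """
    return interval_end <= now - to_lag
```

With from set to 12:00, to set to 12:25 and defaultLength set to PT10M, this sketch produces three intervals, the last one being five minutes shorter than the others. With toLag set to 5 MINUTES, the first interval (ending at 12:10) is treated as ready only once the current time reaches 12:15.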

Example of infra-schema

Schema component description example (crawler.yml):

apiVersion: th2.exactpro.com/v1
kind: Th2Box
metadata:
    name: crawler
spec:
    image-name: ghcr.io/th2-net/th2-crawler
    image-version: <version>
    type: th2-conn
    custom-config:
        from: 2021-06-16T12:00:00.00Z
        to: 2021-06-16T20:00:00.00Z
        name: test-crawler
        type: EVENTS
        defaultLength: PT1H
        lastUpdateOffset: 2
        lastUpdateOffsetUnit: HOURS
        delay: 10
        batchSize: 300
        toLag: 5
        toLagOffsetUnit: MINUTES
    pins:
      - name: to_data_provider
        connection-type: grpc
      - name: to_data_processor
        connection-type: grpc
    extended-settings:
      service:
        enabled: true
    resources:
      limits:
        memory: 200Mi
        cpu: 200m
      requests:
        memory: 100Mi
        cpu: 50m
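
Because most parameters have defaults, the custom-config block above could be much shorter. The following Python sketch shows how the documented defaults would combine with a partial configuration. It is an illustration only: the with_defaults helper and the constant names are hypothetical, and the real component performs its own configuration loading.

```python
from typing import Any, Dict

# Default values as listed in the "Configuration parameters" section.
DEFAULTS: Dict[str, Any] = {
    "type": "EVENTS",
    "defaultLength": "PT1H",
    "lastUpdateOffset": 1,
    "lastUpdateOffsetUnit": "HOURS",
    "delay": 10,
    "batchSize": 300,
    "toLag": 1,
    "toLagOffsetUnit": "HOURS",
}
REQUIRED = ("from", "name")          # parameters documented as required
ALLOWED_TYPES = ("EVENTS", "MESSAGES")


def with_defaults(custom_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return the effective configuration for a custom-config mapping."""
    missing = [key for key in REQUIRED if key not in custom_config]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    # Explicit values override the documented defaults.
    merged = {**DEFAULTS, **custom_config}
    if merged["type"] not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {ALLOWED_TYPES}")
    return merged
```

Passing only from and name yields an EVENTS crawler with one-hour intervals and a batch size of 300. The to parameter stays absent, which corresponds to a Crawler that works until it is stopped.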

Links

The Crawler requires the following links:

  • gRPC link to the data provider working in the gRPC mode
  • gRPC link to the crawler data processor

Links example:

apiVersion: th2.exactpro.com/v1
kind: Th2Link
metadata:
  name: crawler-links
spec:
  boxes-relation:
    router-grpc:
    - name: crawler-to-data-provider
      from:
        strategy: filter
        box: crawler
        pin: to_data_provider
      to:
        service-class: com.exactpro.th2.dataprovider.grpc.DataProviderService
        strategy: robin
        box: data-provider
        pin: server
    - name: crawler-to-data-service
      from:
        strategy: filter
        box: crawler
        pin: to_data_processor
      to:
        service-class: com.exactpro.th2.crawler.dataprocessor.grpc.DataProcessorService
        strategy: robin
        box: data-service
        pin: server

Important notes

For each interval, the Crawler takes the events/messages whose startTimestamp is >= the interval's "from" and < the interval's "to".
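
The boundary rule can be written as a half-open check. The Python sketch below is an illustration with a hypothetical function name, not the Crawler's code. It shows the practical consequence: an item whose start timestamp falls exactly on a shared boundary belongs to the later of two adjacent intervals and to no other.

```python
from datetime import datetime, timezone


def belongs_to(
    start_timestamp: datetime, interval_from: datetime, interval_to: datetime
) -> bool:
    """Half-open check: "from" is inclusive, "to" is exclusive."""
    return interval_from <= start_timestamp < interval_to


# Two adjacent one-hour intervals sharing the 13:00 boundary.
first = (
    datetime(2021, 6, 16, 12, 0, tzinfo=timezone.utc),
    datetime(2021, 6, 16, 13, 0, tzinfo=timezone.utc),
)
second = (
    datetime(2021, 6, 16, 13, 0, tzinfo=timezone.utc),
    datetime(2021, 6, 16, 14, 0, tzinfo=timezone.utc),
)
on_boundary = datetime(2021, 6, 16, 13, 0, tzinfo=timezone.utc)

in_first = belongs_to(on_boundary, *first)    # excluded: equals the "to" bound
in_second = belongs_to(on_boundary, *second)  # included: equals the "from" bound
```

Under this rule no event or message is handed to the data processor twice just because its timestamp coincides with an interval boundary.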