OpenEM and Tator Pipelines
Tator is a web-based media management and curation project. Part of media management is executing algorithms or workflows on a set of media. OpenEM can run within a Tator workflow. Currently, RetinaNet-based detection is supported for inference within a workflow.
Using the Reference Detection Workflow
To use the reference workflow, modify scripts/tator/detection_workflow.yaml so that its parameters match those of the given project.
Generating a data image
At run time, the reference workflow pulls a Docker image containing network coefficients and weights. To generate this image, use scripts/make_pipeline_image.py in a manner similar to the following:

    python3 make_pipeline_image.py --graph-pb <trained.pb> --train-ini <path_to_train.ini> --publish <docker_hub_user>/<image_name>

Note the values of <docker_hub_user> and <image_name> for use in the next section.
The referenced train.ini can be a subset of a full train.ini; a minimal configuration such as the following satisfies the requirements of uploadToTator.py:

    [Data]
    # Names of species, separated by commas.
    Species=Fish

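As a rough illustration of why such a small file suffices, the species list can be recovered from the `[Data]` section with Python's standard `configparser` module. This is only a sketch: the helper name `read_species` is hypothetical and is not taken from uploadToTator.py.

```python
import configparser


def read_species(ini_text):
    """Return the species names listed under [Data] in a train.ini string."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("Data", "Species")
    # Species are comma-separated; strip whitespace and drop empty entries.
    return [name.strip() for name in raw.split(",") if name.strip()]


MINIMAL_INI = """
[Data]
# Names of species, separated by commas.
Species=Fish
"""

print(read_species(MINIMAL_INI))
```

A file on disk could be handled the same way by reading its text first and passing it to the helper.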
Using the reference workflow definition
The repository contains a reference workflow YAML file that can be modified to reflect project-specific requirements. The parameters img_max_side, img_min_side, batch_size, and keep_threshold map directly to the arguments of infer.py.
This workflow executes RetinaNet-based detection on a video dataset using TensorRT-enabled hardware.
Nominally, the only part that needs to change is the strategy definition.

    apiVersion: argoproj.io/v1alpha1
    kind: Workflow
    metadata:
      generateName: openem-workflow-
    spec:
      entrypoint: pipeline
      ttlSecondsAfterFinished: 3600
      volumes:
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
      - name: workdir
        emptyDir:
          medium: Memory
      templates:
      - name: pipeline
        steps:
        - - name: worker
            template: worker
      - name: worker
        inputs:
          artifacts:
          - name: strategy
            path: /data/strategy.yaml
            raw:
              data: |
                img-size: [<max>,<min>]
                keep-threshold: <keep>
                batch-size: <batch>
                date_image: <docker_image>
                version_id: <version_id>
                box_type_id: <localization_id>
                sentinel_name: <name for sentinel string attr>
        container:
          image: cvisionai/openem_lite:latest
          # Both mounts share one volumeMounts list; a repeated key would
          # override the first definition.
          volumeMounts:
          - name: dockersock
            mountPath: /var/run/docker.sock
          - name: workdir
            mountPath: /work
          resources:
            limits:
              nvidia.com/gpu: 1
          env:
          - name: TATOR_MEDIA_IDS
            value: "{{workflow.parameters.media_ids}}"
          - name: TATOR_API_SERVICE
            value: "{{workflow.parameters.rest_url}}"
          - name: TATOR_AUTH_TOKEN
            value: "{{workflow.parameters.rest_token}}"
          - name: TATOR_PROJECT_ID
            value: "{{workflow.parameters.project_id}}"
          - name: TATOR_WORK_DIR
            value: "/work"
          command: [python3]
          args: ["/scripts/tator/detection_entry.py"]

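The environment variables in the workflow definition are how the container learns which media to process and how to reach Tator. A minimal sketch of reading them might look like the following. The `WorkflowEnv` class and `load_env` function are hypothetical names, and treating `TATOR_MEDIA_IDS` as a comma-separated list of integers is an assumption rather than something confirmed by detection_entry.py.

```python
import os
from dataclasses import dataclass


@dataclass
class WorkflowEnv:
    media_ids: list
    api_service: str
    auth_token: str
    project_id: int
    work_dir: str


def load_env(environ=os.environ):
    """Collect the TATOR_* variables set by the workflow definition."""
    # Assumption: media IDs arrive as a comma-separated string, e.g. "1,2,3".
    media_ids = [
        int(item) for item in environ["TATOR_MEDIA_IDS"].split(",") if item.strip()
    ]
    return WorkflowEnv(
        media_ids=media_ids,
        api_service=environ["TATOR_API_SERVICE"],
        auth_token=environ["TATOR_AUTH_TOKEN"],
        project_id=int(environ["TATOR_PROJECT_ID"]),
        # Fall back to the work directory used in the reference workflow.
        work_dir=environ.get("TATOR_WORK_DIR", "/work"),
    )


# Example values only; a real run receives these from the workflow.
example = load_env({
    "TATOR_MEDIA_IDS": "1, 2,3",
    "TATOR_API_SERVICE": "https://example.invalid/rest",
    "TATOR_AUTH_TOKEN": "not-a-real-token",
    "TATOR_PROJECT_ID": "7",
})
print(example.media_ids, example.work_dir)
```

Passing the mapping as a parameter keeps the function easy to exercise outside a cluster.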
Detailed Mechanics
This section walks through the mechanics of the reference workflow so that users can build more elaborate workflows on OpenEM technology.
A Tator workflow is specified no differently than a regular Argo workflow, except that it is expected to use the Tator REST API to access media files and supply results to a project.
A canonical Tator workflow has three parts: setup, execution, and teardown. More advanced workflows can replace the execution stage with multiple stages using the directed acyclic graph capabilities of Argo.
Project setup
A project using this workflow has a media type (either a video type or an image type) represented by a <media_type_id>. The project also has a localization box type represented by a <box_type_id>. The <media_type_id> requires the following attribute:
- Object Detector Processed: A string attribute that is set to the date and time when the object detector finishes processing the media file.
The <box_type_id> requires the following attributes:
- Species: A string representing the name of an object class. If ‘Species’ is not an appropriate name for the class, this can be customized via the species_attr_name key in the pipeline argument object passed to the teardown stage. It defaults to ‘Species’ if not specified.
- Confidence: A float attribute representing the score of the detection. If ‘Confidence’ is not a desired name, it can be customized via the confidence_attr_name key in the pipeline argument object passed to the teardown stage. It defaults to ‘Confidence’ if not specified.
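The default-and-override behavior of the two attribute names can be sketched in a few lines. The function `build_box_attributes` is a hypothetical helper written for illustration; only the keys species_attr_name and confidence_attr_name and their defaults come from the description above.

```python
def build_box_attributes(species, confidence, pipeline_args=None):
    """Map one detection onto the attribute names used by the box type."""
    pipeline_args = pipeline_args or {}
    # Both keys are optional and fall back to the documented defaults.
    species_key = pipeline_args.get("species_attr_name", "Species")
    confidence_key = pipeline_args.get("confidence_attr_name", "Confidence")
    return {species_key: species, confidence_key: float(confidence)}


# Defaults apply when no pipeline arguments are supplied.
print(build_box_attributes("Fish", 0.87))
# A project with a differently named class attribute overrides one key.
print(build_box_attributes("Fish", 0.87, {"species_attr_name": "Label"}))
```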
Acquiring media
The example setup.py provides a canonical way to download media for a given workflow.
Executing Work
The heart of the reference workflow is infer.py from the openem_lite Docker image. However, it is useful to have a layer of scripting above that CLI utility to translate workflow definitions into arguments for the underlying utility.
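One way such a translation layer could work is to turn the strategy keys into an argument list. The sketch below is illustrative only: the function name `strategy_to_infer_args` is hypothetical, and the flag spellings are assumptions derived from the argument names img_max_side, img_min_side, keep_threshold, and batch_size. Check infer.py itself for the exact spellings and for any other required arguments.

```python
def strategy_to_infer_args(strategy):
    """Translate strategy keys into a command-line argument list."""
    # The strategy stores both image sides in one list: [<max>, <min>].
    img_max_side, img_min_side = strategy["img-size"]
    # Assumed flag spellings; not confirmed against infer.py.
    return [
        "--img-max-side", str(img_max_side),
        "--img-min-side", str(img_min_side),
        "--keep-threshold", str(strategy["keep-threshold"]),
        "--batch-size", str(strategy["batch-size"]),
    ]


# Example values only; real values come from the workflow's strategy.
example_strategy = {"img-size": [1280, 720], "keep-threshold": 0.4, "batch-size": 4}
print(strategy_to_infer_args(example_strategy))
```

Returning a list rather than a single string lets the caller hand the arguments to `subprocess.run` without shell quoting concerns.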
Submitting results
infer.py generates a CSV file with inference results, so another utility must interpret these results and submit them to the underlying Tator web service. A script called uploadToTator.py is located in scripts; as with infer.py, inserting a layer of scripting above the raw script can help manage the environment.
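To make the interpretation step concrete, the sketch below parses CSV rows into plain dictionaries that a later step could submit. The column names (`frame`, `x`, `y`, `w`, `h`, `species`, `confidence`) are hypothetical and chosen for illustration; they are not the confirmed output format of infer.py, and `rows_to_detections` is not part of uploadToTator.py.

```python
import csv
import io


def rows_to_detections(csv_text):
    """Parse inference rows into dictionaries, one per detection."""
    detections = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        detections.append({
            "frame": int(row["frame"]),
            # Box geometry is kept as floats exactly as written in the file.
            "box": [float(row[key]) for key in ("x", "y", "w", "h")],
            "species": row["species"],
            "confidence": float(row["confidence"]),
        })
    return detections


# Hypothetical sample data in the assumed column layout.
SAMPLE_CSV = (
    "frame,x,y,w,h,species,confidence\n"
    "0,0.1,0.2,0.3,0.4,Fish,0.9\n"
    "5,0.5,0.5,0.2,0.1,Fish,0.75\n"
)

for detection in rows_to_detections(SAMPLE_CSV):
    print(detection)
```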