Prodata Documentation

prodata

Simplified proapi access, response formatting and preprocessing of machine data for data analytics.

Dependencies

Python: >= 3.10

In addition to the dependencies listed in requirements.txt, the package proapi (provided by Proemion) is required.

What is it?

This package has two focus areas:

ProQuery

The ProQuery class offers helpers that read API query inputs from, and return API responses in, formats such as pandas.DataFrame, while handling API pagination, date formatting and similar details.
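Pagination handling can be pictured roughly as follows. This is a minimal sketch under assumptions, not ProQuery's implementation: `fetch_page`, `iter_records` and the fake endpoint are hypothetical, and the stop condition (a page shorter than the requested size ends the loop) is just one common convention. The real API may instead report total counts or next-page links.

```python
from typing import Callable, Iterator

# Hypothetical page fetcher: (page, size) -> list of record dicts.
# This is NOT the proapi interface; it stands in for any paginated endpoint.
PageFetcher = Callable[[int, int], list[dict]]


def iter_records(fetch_page: PageFetcher, size: int = 100) -> Iterator[dict]:
    """Yield records page by page until a short (or empty) page is returned."""
    page = 0
    while True:
        records = fetch_page(page, size)
        yield from records
        if len(records) < size:
            # A page shorter than the requested size is treated as the last one.
            break
        page += 1


# Fake endpoint holding 250 machines, used only to demonstrate the loop.
MACHINES = [{"id": i, "name": f"machine-{i}"} for i in range(250)]


def fake_fetch(page: int, size: int) -> list[dict]:
    start = page * size
    return MACHINES[start:start + size]


rows = list(iter_records(fake_fetch, size=100))
print(len(rows))  # 250
```

A flat list of dicts like `rows` can be turned into a table with `pandas.DataFrame(rows)`, which is the kind of output ProQuery's helpers return.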

Preprocessing

The preprocessing module offers advanced data processing pipelines built on the scikit-learn transformer framework.
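scikit-learn transformers follow a `fit`/`transform` protocol and are chained so that each step is fitted on the output of the previous one. The plain-Python sketch below only mimics that protocol to show the idea. The classes `DropMissing`, `MinMaxScale` and `SimplePipeline` are hypothetical and do not reflect prodata's preprocessing API. With scikit-learn installed, the same chaining is done with `sklearn.pipeline.Pipeline` and custom steps derived from `BaseEstimator` and `TransformerMixin`.

```python
class DropMissing:
    """Hypothetical step: drop rows that contain any None value."""

    def fit(self, rows):
        return self  # stateless: nothing to learn

    def transform(self, rows):
        return [r for r in rows if all(v is not None for v in r)]


class MinMaxScale:
    """Hypothetical step: scale each column to [0, 1] using ranges learned in fit."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.mins_ = [min(c) for c in columns]
        self.maxs_ = [max(c) for c in columns]
        return self

    def transform(self, rows):
        return [
            [(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(r, self.mins_, self.maxs_)]
            for r in rows
        ]


class SimplePipeline:
    """Apply steps in order; each step is fitted on the previous step's output."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, rows):
        for step in self.steps:
            rows = step.fit(rows).transform(rows)
        return rows


data = [[1000.0, 20.0], [2000.0, None], [3000.0, 40.0]]
result = SimplePipeline([DropMissing(), MinMaxScale()]).fit_transform(data)
print(result)  # [[0.0, 0.0], [1.0, 1.0]]
```

Ordering matters here: `MinMaxScale` would fail on `None` values, so the cleaning step runs first, just as in a scikit-learn pipeline.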

How to use

ProQuery

Example: Get all machines with a specific ECU software version within a given operating-hours range

# imports
from prodata.proquery import ProQuery
from collections import namedtuple

# Instantiate ProQuery and API and perform authentication.
pq = ProQuery(client_id="YOUR_CLIENT_ID", client_secret="YOUR_CLIENT_SECRET")
API = pq.api

# Set signal keys and query machines with the query string 'q'.
signal = namedtuple("Signal", "type key")
signal_hours = signal(type='numeric',
                      key='value.common.machine.hours.operation.total')
signal_ecu = signal(type='string',
                    key='value.custom.ecu.drive.software.identification')
machines = pq.get_df(
    API.machines_api.machines_get_machines,
    q=f'measurements.{signal_hours.type}.{signal_hours.key}=gt=1000 and '
      f'measurements.{signal_hours.type}.{signal_hours.key}=lt=3000 and '
      f'measurements.{signal_ecu.type}.{signal_ecu.key}==1.09'
)
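The query string above combines RSQL comparison operators (`=gt=`, `=lt=`, `==`) with `and`. If the same pattern recurs, a small helper can assemble it. `measurement_field` and `range_query` below are hypothetical names for illustration. The helper produces exactly the string the example builds inline.

```python
from collections import namedtuple

Signal = namedtuple("Signal", "type key")


def measurement_field(signal: Signal) -> str:
    """Return the RSQL selector for a signal, e.g. 'measurements.numeric.<key>'."""
    return f"measurements.{signal.type}.{signal.key}"


def range_query(signal: Signal, lower: float, upper: float) -> str:
    """Exclusive range filter on a measurement: lower < value < upper."""
    field = measurement_field(signal)
    return f"{field}=gt={lower} and {field}=lt={upper}"


signal_hours = Signal("numeric", "value.common.machine.hours.operation.total")
signal_ecu = Signal("string", "value.custom.ecu.drive.software.identification")

# Same query string as in the example: hours range AND ECU software version.
q = (range_query(signal_hours, 1000, 3000)
     + f" and {measurement_field(signal_ecu)}==1.09")
print(q)
```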

Alright, that was easy. Now let's check which of these machines have a specific DTC (diagnostic trouble code) active:

# additional imports
import datetime

# Define a timerange.
_from, to = pq.convert_to_posix((2024, 10, 1), datetime.datetime.now())

# For each machine, query active DTCs matching the target source, SPN and FMI.
target_spn = '520568'
target_fmi = '22'
target_source = '0'
machines_w_errors = pq.input_df(
    API.j1939_api.j1939_get_machines_id_dtcs,
    data=machines,
    # Use the DataFrame column 'id' as the endpoint's 'id' parameter.
    params_col_names={'id': 'id'},
    q=f'source == {target_source} '
      f'and spn=={target_spn} '
      f'and fmi=={target_fmi} '
      'and active == true',
    _from=_from, to=to)

Great. Now let's check whether the machines showing the error have overdue maintenance tasks that have not been addressed yet:

# Keep only machines for which a matching DTC was returned (non-null 'status').
machines_w_errors = machines_w_errors[machines_w_errors['status'].notna()]
# Get overdue maintenance tasks that were neither skipped nor completed.
# NOTE: This shows how DataFrame columns are passed into an RSQL query:
# the '{}' placeholder in q is filled from the column(s) listed in rsql_cols.
machines_w_errors = pq.input_df(
    API.maintenance_tasks_api.maintenance_tasks_get_maintenance_tasks,
    rsql_cols=['input_id'],
    q='machine.id=={} '
      'and deadline == "overdue" '
      'and progress =out= ("skipped", "completed")',
    data=machines_w_errors)
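In the call above, `q` contains a `{}` placeholder and `rsql_cols` names the DataFrame column whose values fill it, so one query is built per row. The plain-Python sketch below shows that expansion under the assumption that placeholders are filled positionally with `str.format`. This is an assumption about prodata's internals. `expand_queries` is a hypothetical name, and the rows hold illustrative IDs only.

```python
def expand_queries(template: str, rows: list[dict], cols: list[str]) -> list[str]:
    """Fill the '{}' placeholders in template with each row's values for cols."""
    return [template.format(*(row[c] for c in cols)) for row in rows]


template = ('machine.id=={} '
            'and deadline == "overdue" '
            'and progress =out= ("skipped", "completed")')
rows = [{"input_id": 101}, {"input_id": 205}]  # illustrative IDs only
queries = expand_queries(template, rows, ["input_id"])
for query in queries:
    print(query)
```

If filling works this way, any literal braces in a query would have to be doubled (`{{`, `}}`), because `str.format` treats single braces as placeholders.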
