Databricks: Train PySpark Models: RF and GBT

This article describes how to train Random Forest (RF) and Gradient Boosted Tree (GBT) models using the PySpark API in a Databricks notebook.

The data comes from the Kaggle Credit Card Fraud data set.

With over 280K rows, it is large enough to give a fair evaluation of the models.
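As a rough illustration of the workflow, here is a minimal sketch of fitting both model types with the PySpark ML API. It is not the code from the full article. It assumes the public Kaggle file layout (feature columns V1 to V28 plus Amount, and a binary label column named Class); the function name train_rf_and_gbt, the hyperparameters, and the choice of area under the precision-recall curve as the metric are illustrative assumptions.

```python
# Assumed column layout of the Kaggle credit card fraud CSV.
FEATURE_COLS = [f"V{i}" for i in range(1, 29)] + ["Amount"]
LABEL_COL = "Class"


def train_rf_and_gbt(train_df, test_df, feature_cols=FEATURE_COLS, label_col=LABEL_COL):
    """Fit an RF and a GBT pipeline and return {model name: area under PR curve}.

    train_df and test_df are Spark DataFrames. The function name and the
    hyperparameters below are hypothetical choices for this sketch.
    """
    # Imported inside the function so the module can be loaded
    # on a machine without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import GBTClassifier, RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator
    from pyspark.ml.feature import VectorAssembler

    # Spark ML estimators expect all features packed into one vector column.
    assembler = VectorAssembler(inputCols=list(feature_cols), outputCol="features")

    classifiers = {
        "rf": RandomForestClassifier(
            labelCol=label_col, featuresCol="features", numTrees=100, seed=42
        ),
        "gbt": GBTClassifier(
            labelCol=label_col, featuresCol="features", maxIter=50, seed=42
        ),
    }

    # Fraud data is heavily imbalanced, so area under the PR curve is used
    # here instead of plain accuracy (an assumption of this sketch).
    evaluator = BinaryClassificationEvaluator(
        labelCol=label_col, metricName="areaUnderPR"
    )

    scores = {}
    for name, classifier in classifiers.items():
        model = Pipeline(stages=[assembler, classifier]).fit(train_df)
        scores[name] = evaluator.evaluate(model.transform(test_df))
    return scores
```

In a Databricks notebook, where a spark session is already defined, the inputs could be prepared with spark.read.csv(path, header=True, inferSchema=True) followed by df.randomSplit([0.8, 0.2], seed=42), with path pointing to wherever the CSV was uploaded.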

Predix: Asset Store Model Viewer

Summary

This article explains how to use the GEL extension and its parameters (LEVEL and TYPE) to visualize the asset model, so that a user can estimate the model’s accuracy and validate the consistency of the model graph.

Graph Expression Language (GEL) extension

GEL extension query

The asset store browser returns the connected graph(s) for the nodes selected by a GEL query. Two additional parameters, level and type, limit the node connections requested in the viewer, and Prolog-style predicates add further content to the query.
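To make the effect of the two parameters concrete, here is a small, self-contained sketch of a depth-limited and type-filtered graph expansion. It is not the Predix API or the viewer's actual implementation. It assumes that level means the maximum number of hops from the selected nodes and that type restricts which neighbouring nodes are followed; the function name connected_subgraph and the sample data are hypothetical.

```python
from collections import deque


def connected_subgraph(edges, node_types, start_nodes, level=None, types=None):
    """Expand start_nodes over an undirected graph.

    level: maximum number of hops to follow (None means unlimited).
    types: set of node types that may be followed (None means any type).
    Returns (visited nodes, kept edges), each edge as a sorted tuple.
    """
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set(start_nodes)
    queue = deque((node, 0) for node in start_nodes)
    kept = set()

    # Breadth-first search, so depth equals the hop count from a start node.
    while queue:
        node, depth = queue.popleft()
        if level is not None and depth >= level:
            continue  # do not expand beyond the requested level
        for neighbour in sorted(adjacency.get(node, ())):
            if types is not None and node_types.get(neighbour) not in types:
                continue  # skip neighbours of an unwanted type
            kept.add(tuple(sorted((node, neighbour))))
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen, kept


# Hypothetical asset model: one site with a pump, a valve, and a pump sensor.
edges = [("site", "pump"), ("pump", "sensor"), ("site", "valve")]
node_types = {"site": "location", "pump": "asset", "valve": "asset", "sensor": "sensor"}

one_hop_nodes, one_hop_edges = connected_subgraph(edges, node_types, ["site"], level=1)
asset_nodes, asset_edges = connected_subgraph(
    edges, node_types, ["site"], level=2, types={"asset"}
)
```

With level=1 the sensor is left out because it is two hops from the site; with types={"asset"} it is left out because of its type, even though level=2 would otherwise reach it.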

Velociraptor

“Discovery is always rape of the natural world. Always.” ― Michael Crichton, Jurassic Park

Time Series Device Modeling Tool

Velociraptor was created to model a scalable number of devices that generate dynamic data, using different interpolation methods to create sensors in a device dynamically. It allows you to create, start, and stop all devices or individual devices using the REST API. During modeling, you can change the profile a device uses to generate data. Velociraptor can be used in modeling, machine learning, predictive analytics, and many other industrial tasks that need generated time series data.
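The idea of generating sensor data from a profile by interpolation can be sketched as follows. This is a minimal illustration, not Velociraptor's implementation: it assumes a profile is a time-ordered list of (time, value) control points and uses linear interpolation as one example method. The names linear_interpolate and generate_series are hypothetical.

```python
from bisect import bisect_right


def linear_interpolate(profile, t):
    """Return the value at time t for a profile of (time, value) points.

    The profile must be sorted by time. Times outside the profile range
    are clamped to the first or last value.
    """
    times = [point[0] for point in profile]
    if t <= times[0]:
        return profile[0][1]
    if t >= times[-1]:
        return profile[-1][1]
    # Index of the first control point strictly after t.
    i = bisect_right(times, t)
    (t0, v0), (t1, v1) = profile[i - 1], profile[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def generate_series(profile, start, stop, step):
    """Sample the profile every `step` time units from start to stop inclusive."""
    count = int((stop - start) // step) + 1
    return [
        (start + k * step, linear_interpolate(profile, start + k * step))
        for k in range(count)
    ]


# Hypothetical temperature profile: rises from 20 to 30, then falls to 25.
temperature_profile = [(0, 20.0), (10, 30.0), (20, 25.0)]
series = generate_series(temperature_profile, start=0, stop=20, step=5)
```

Swapping linear_interpolate for another function with the same signature (step-wise, spline, and so on) would change how values between control points are produced without changing the sampling loop.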

Data input

“They don’t have intelligence. They have what I call ‘thintelligence.’ They see the immediate situation.” ― Michael Crichton, Jurassic Park
