From the course: Advanced Data Processing: Batch, Real-Time, and Cloud Architectures for AI

Batch model training

- [Instructor] Let's now continue our discussion of batch AI architectures with batch model training. What does the batch training pipeline architecture look like? The pipeline starts where the feature engineering pipeline ends: the feature store has the data ready for machine learning. First, this data is sent through a job that splits the dataset. Typically, the dataset is split into a training dataset and a test dataset. A third dataset, called the validation dataset, may also be added if final validation is required. The resulting datasets are stored in their respective data stores. The model training job then kicks in. It uses the training dataset to train the model. Depending on the type of model, this may take a few seconds or a few days. The model training job delivers a model. This could be a pickle file, or it could be a large language model with millions of parameters. A model testing job then runs on this model. It uses the test dataset to test the model and measure its accuracy. If the…
