Learning from the Master: Using ChatGPT for Reinforcement Learning – part 3

"In the final instalment of this series, we will be taking the Q-learning solution generated by ChatGPT and optimizing it for maximum performance. In the previous posts, we demonstrated how ChatGPT can be used as a tool for data science workflows, including generating boilerplate solutions and addressing bugs and errors. Now, with a working solution in hand, we will explore various optimization techniques to fine-tune and improve the performance of the Q-learning algorithm. From adjusting hyperparameters to implementing advanced techniques, this post will show you how to take your ChatGPT-generated solution to the next level.

In the previous instalments of this blog series, I demonstrated how to use ChatGPT as tool in your data science workflow and how much it helps, timewise, compared to doing everything from scratch. In this third part, I will ask ChatGPT to optimize code in a working solution and comment on how good of a job it does. As previously source code is available here: solita/chatGPT_RL_blog3: All resources referenced in Solita chatGPR for RL blog part3 (github.com)

And as a treat, my assignment solution for the university is also available for you in the repository. Feel free to compare and evaluate both.

Optimizing the agent

I’ll start with the actual neural network architecture. Let’s ask ChatGPT to suggest a better Deep Q-learning architecture.

Input: “Suggest a better deep q learning architecture than this one:

def build_model(self):

        “””Build the neural network model for the DQN”””

        model = Sequential()

        model.add(Dense(24, input_dim=self.state_size, activation=’relu’))

        model.add(Dense(24, activation=’relu’))

        model.add(Dense(self.action_size, activation=’linear’))

        model.compile(loss=’mse’, optimizer=Adam(learning_rate=self.learning_rate))


        return model

explain why would this work better”

Output: One improvement to this architecture would be to use a convolutional neural network (CNN) instead of a fully connected network for the initial layers. This is because CNNs are well-suited for processing images and other grid-like data, which is often the type of input that a DQN receives.

Additionally, you can use a technique called experience replay which stores past experiences and randomly samples from them to improve the stability and efficiency of learning.

You can also add a target network that is used to compute the target Q-values during training. This can help to stabilize the training process and improve the overall performance of the DQN.

Another improvement would be to use a technique called Double DQN which addresses the problem of overestimation of Q values by having two different networks to estimate the Q values, one for action selection and the other for action evaluation.

Incorporating these techniques can lead to a better DQN architecture and improve the performance of the agent.

Input: Produce a model with the suggested improvements, keep in mind that the input data is 1-dimensional if you use convolutional layers


A solid proposition for creating a better architecture for the neural network. Convolutional layers are used as a feature extraction layer and then passed to feed-forward neural network layers for estimating q-values. You can see from the notebook provided that this change had quite a drastic effect on the reward that the agent gets.

I’ll test another function from the agent class before moving to the environment class.

Input: Do you have propositions on how to make this function faster?

def get_action(self, state, possible_actions_index):

        “””get action from model using epsilon-greedy policy”””


        I transferred the epsilon decay method to the notebook.

        The chatGPT generated function is only choosing a random action or the action with the highest predicted Q-value.

        It should also be considering the possible actions that are available in the current state. Additionally, the function is only decreasing epsilon after each episode, while it should be decreasing epsilon after each sample.

        I don’t want to pass the environment class as a parameter to access the env.requests() function. We’ll just pass the possible action indices and actions an rewrite this function.


        if np.random.rand() <= self.epsilon:

            # explore: choose a random action from possible actions

            return random.choice(possible_actions_index)


            # exploit: choose the action with the highest predicted Q-value

            state = np.array(self.convert_state_to_vector(state)

                             ).reshape(1, self.state_size)

            q_vals = self.model.predict(state, verbose=0)

            return possible_actions_index[np.argmax(q_vals[0][possible_actions_index])] 


In all honesty, some of these don’t make much sense and this is due to the fact that ChatGPT didn’t get any context in the input, just the code.

  1. would be valid if the possible_actions_index variable would be numpy array and not a list.
  2. With refactoring it would be possible to use vectorized data, a good suggestion
  3. preprocessing the requests could be done in this case since all requests could be pre-computed from the time matrix.
  4. kind of makes this whole code pointless since the agents job is to estimate the q-values.
  5. Epsilon decay is done after each step.
  6. Prioritized replay buffer is a valid suggestion and wouldn’t be too complex to implement, but since it involves some extra computation and look backs during each step (compared to the vanilla replay buffer I implemented) it would make the solution slower albeit the Q-learning algorithm would, probably, converge faster.
  7. This is the architecture that I originally submitted myself, it makes training a little bit slower,, but the model converges faster due to increased stability
  8. A valid option also

Based on this I would say that ChatGPT is actually making some nice propositions on a function level on how to optimize the agent class but not on a solution level since it’s lacking the context.

Here’s the code for the optimized DQN Agent

Let’s move on to the environment class, in the last part of this blog series I pointed out that there is a repeated code problem in the reward and next state functions. How will ChatGPT react to it?

Time to see how would ChatGPT optimize the environment class.

Input: Give a suggestion on how to optimize this class:


  1. Suggestion is already in the code, 
  2. the second and third suggestions are good suggestions but with the state_space tuple you must be careful, since you need to add that state for an offline action. Once the tuple is created you cannot modify it.
  3. Suggestions 4. To 6. Resonate well with the previous suggestion of not calling the state vectorization function. If the data would be in a NumPy format, we wouldn’t need to do the vectorization calls. This would require extensive refactoring and take some time to achieve.
  4. The last suggestion is maybe too much for our use case.

Let’s check how ChatGPT optimizes the reward_func and next_state_func where I pointed out that we’re making the same 4D array slicing in both functions.

Input: Explain how to make this code snippet computationally less expensive and easier to read. *the reward_func and next_state_func as input*


Good suggestions you can see how I implemented these in the repository: optimized environment class


As a conclusion to this blog series, I have demonstrated how Data Scientists can use ChatGPT as a tool to streamline their working process and get suggestions for problems or code quality enhancements. 

ChatGPT is quite proficient at suggesting good and mostly relevant changes to solutions that already work but it isn’t yet quite able to produce a working solution just by itself.

It will be quite exciting to start working on solving business problems with large-language-models and personally, I can’t wait to test the coding optimized version of GPT3.5 that Microsoft is now offering Azure OpenAI Service – Advanced Language Models | Microsoft Azure

If you would be interested in learning more about opportunities with generative models, and optimization or are looking for a partner to help you with your use cases don’t hesitate to contact us.

Learning from the Master: Using ChatGPT for Reinforcement Learning – part 2

In the first part of this series, we explored the capabilities of ChatGPT, a state-of-the-art language model developed by OpenAI, in assisting data scientists with tasks such as data cleaning, preprocessing, and code generation. In this second part, we will delve deeper into what ChatGPT generated and why it didn't work. We will discuss the specific challenges that come with using AI-generated code, and how to effectively address these issues to ensure the reliability and accuracy of the final product. Whether you're a data scientist or a developer, this post will provide valuable insights into how to use ChatGPT to improve your workflow and streamline your development process.

In the first instalment of this blog series, we explored the capabilities of ChatGPT in generating boilerplate code from well-defined problem statements. We also discussed the benefits of providing extensive detail on the desired code functionality and the performance of ChatGPT when given a skeleton code to fill.

While the results were impressive, and a good starting point for a data science project, it was an effort to make the code function.

In this part of the blog, I will walk you through the bugs and mistakes ChatGPT made. As for why ChatGPT made the mistakes, there are multiple reasons and I have explained some of the problems in the first chapter of the series.


I was having trouble figuring out a way to present this part of the blog.  There are plenty of bugs and mistakes ChatGPT made and to make it easier to understand, I’ll provide an explanation of the smaller pieces (attributes, functions, variables) in more detail. This post is written for developers and assumes that the reader has a basic understanding of Python programming.

I have chosen to explain the major changes I made on the function level. To see all of the changes, you will need to refer to the code in the repository provided.

This post will go through each function, explain what it is supposed to do, and what ChatGPT did wrong in my opinion. The actual fixes can be found by looking at the code in the repository.

Boilerplate ChatGPT code can be found at: chatGPT_RL_blog1/Boilerplates/chatGPT at main · solita/chatGPT_RL_blog1 (github.com)

The complete finished solution can be found at: solita/chatGPT_RL_blog2: All the material discussed in the second part of the chatGPT for reinforcement learning blog series (github.com)

Fixing the environment boilerplate

link to the full script: chatGPT_RL_blog2/environment.py at main · solita/chatGPT_RL_blog2 (github.com)

The init script

Docstub: “Initialise your state, define your action space, and state space”

My understanding of the errors that ChatGPT made:

The init for action_space that chatGPT generated was wrong since it generates a list of all possible pairs of integers between 0 and m-1 where m=5, including pairs where the integers are the same, and an additional pair [0, 0]. 

The agent can’t travel in a loop from state 1 to 1, eg. the pickup location and the drop-off location can’t be the same as per the problem description. The only exception is the offline action [0,0] when the agent chooses to go offline.

I fixed this so that the action_space init generates a list of all possible pairs of integers between 0 and m-1, where the integers in each pair are distinct and at least one of them is 0.

The requests function

Docstub: Determining the number of requests basis, and the location. Use the table specified in the MDP and complete it for the rest of the locations.

My understanding of the errors that ChatGPT made:

ChatGPT only handled pickup location 0 when m=5.

I added handling for the rest. Using a dictionary instead of the if-else structure suggested by ChatGPT. ChatGPT does not add the index [0] to indicate no ride action, the method just returned an empty list.

The reward function

Docstub: Takes in state, action and Time-matrix and returns the reward

My understanding of the errors that chatGPT made:

  • No-ride action is not handled correctly. No-ride action should move the time component +1h as described in the problem definition. The function was returning the reward=-C which does not correspond to the reward calculation formula: (time * R) – (time * C). time = total transit time from current location through pickup to dropoff (transitioning from current state to next state).
  • chatGPT is calculating the travel time from the current state to the next state and updates the location. chatGPT is doing a mistake, hour and day in a state tuple are integers.

ChatGPTs way of calculating the time it takes to transition (for the taxi to drive) from the current state to the next state results in returning arrays for h and d. 

This is due to the fact that ChatGPT is slicing the 4D Time Matrix in the wrong manner. ChatGPT is using two sets of indices, pickup and drop-off, to slice the array.

  • indices are needed to slice the array in the correct way. I broke the total transition time calculation into multiple steps for clarity

The next state function

Docstub: Takes state and action as input and returns next state

My understanding of the errors that chatGPT made:

  • chatGPT is calculating the travel time from the current state to the next state and updates the location. chatGPT is doing a mistake, hour and day in a state tuple are integers.

ChatGPTs way of calculating the time it takes to transition (for the taxi to drive) from the current state to the next state results in returning arrays for h and d. 

This is due to the fact that ChatGPT is slicing the 4D Time Matrix in the wrong manner. ChatGPT is using two sets of indices, pickup and drop-off, to slice the array.

  • indices are needed to slice the array in the correct way. I broke the total transition time calculation into multiple steps for clarity

Just to point out there is a repeated code problem in these functions since they make the same lookup, it should be refactored as a function, but I’ll leave that to the next part of the blog.

What was missing? A function to do a step:

In reinforcement learning (RL), a “step” refers to the process of transitioning from the current state to the next state and selecting an action based on the agent’s predictions. The process of taking a step typically includes the following steps:

  1. The agent observes the current state of the environment
  2. The agent selects an action based on its policy and the current state
  3. The agent takes the selected action, which causes the environment to transition to a new state
  4. The agent receives a reward signal, which indicates how well the selected action led to the new state
  5. The agent updates its policy based on the received reward and the new state

At each step, the agent uses its current policy, which is a function that takes the current state as input and produces an action as output, to select the next action. The agent also uses the rewards obtained from the environment to update its policy so as to maximize the cumulative rewards.

Looking at the code that was generated by ChatGPT all of the pieces were there. The step function I implemented is just a wrapper that uses the reward and gets the next state functions. Look at the solution in the repository for details.

Fixing the Q-Learning agent boilerplate

Link to the full script: chatGPT_RL_blog2/agent_chatGPT.py at main · solita/chatGPT_RL_blog2 (github.com)

The init script

Docstub: “Initialise the DQNAgent class and parameters.”

My understanding of the errors that chatGPT made:

“Everything was initialized properly, variable for the initial exploration rate epsilon was missing so I added that. “

The build_model method

Docstub: “Build the neural network model for the DQN.”

ChatGPT didn’t do any mistakes here, it builds a simple feed-forward neural network, and the input and output sizes are defined correctly.

The get_action method

Docstub: “get action from the model using epsilon-greedy policy”

My understanding of the errors that ChatGPT made:

“Transferred the epsilon decay method to the notebook. The ChatGPT-generated function is only choosing a random action or the action with the highest predicted Q-value. It should also be considering the possible actions that are available in the current state. Additionally, the function is only decreasing epsilon after each episode, while it should be decreasing epsilon after each sample. I don’t want to pass the environment class as a parameter to access the env.requests() function. We’ll just pass the possible action indices and actions and rewrite this function.”

The train_model method

Docstub: “Function to train the model on each step run. Picks the random memory events according to batch size and runs it through the network to train it.”

My understanding of the errors that ChatGPT made:

“This boilerplate from ChatGPT won’t quite do. It is updating the Q values for one sample at a time, and not using a batch sampled from the memory. Using a batch will speed up training and stabilize the model. “


Overall going through the boilerplate code that ChatGPT generated and fixing them took around 10 hours. As a comparison when I originally solved this coding problem and generated a solution with the help of Google, it took around 30 hours. The boilerplate provided as input had a clear impact on the solution, both with ChatGPT and me.

In the next part of the blog series, I’ll ask ChatGPT to propose optimizations to the fixed code and see if it makes or breaks it. My original solution will be available for comparison.

Learning from the Master: Using ChatGPT for Reinforcement Learning – part 1

ChatGPT, a state-of-the-art language model developed by OpenAI, has the ability to assist data scientists in a variety of tasks. Its advanced natural language processing capabilities make it well-suited for tasks such as data cleaning and preprocessing, text summarization, and even the generation of code. In this blog post, we will explore one of the ways in which ChatGPT can be utilized in data science workflows, and discuss its potential to streamline and automate various aspects of the data science process.

This blog will reference source code presented here: chatGPT_RL_Blog1

Grammarly and GitHub Copilot are tools that help professionals improve their writing and coding by identifying errors and suggesting corrections. Grammarly is a writing tool that checks grammar, spelling, and punctuation, while GitHub Copilot is a coding tool that suggests code completions and helps with refactoring. These tools are designed to help professionals who already know the solution to the problem but want to speed up their work by automating error checking and providing suggestions to improve their writing or coding.

ChatGPT, on the other hand, is a language model that can generate text similar to human language. It can be used to generate code based on input, but it’s not specifically designed or trained to optimize code. However, ChatGPT can understand natural language instructions and generate code that follows those instructions, which makes it even better for people who are not experts in coding to write code based on their needs and it can perform a wide range of tasks. Additionally, ChatGPT has the ability to understand natural language inputs and generate human-like responses, which is not the case for Grammarly and GitHub Copilot which are specialized for specific tasks.

I have come across several blog posts (f.ex. https://www.solita.fi/en/blogs/gpt-will-you-do-the-fandango/) discussing the use of ChatGPT for data science and have personally conducted my own testing and troubleshooting with it. As a result, I decided to utilize ChatGPT for a hobby project in data science. It revolves around optimizing the revenue of taxi drivers with reinforcement learning. The following details my process, the challenges encountered, and the outcomes achieved.

For reference, this optimization problem was presented as a part of a Reinforcement Learning course when I was studying at university. Multiple different solutions to this problem are available on GitHub and Kaggle.

Here is the link to the source material presented: chatGPT_RL_Blog1

Primer – What is reinforcement learning?

Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with its environment. The agent receives rewards or penalties for its actions and uses this feedback to improve its decision-making over time. A common example of this is a computer program that learns to play a game by repeatedly playing it and adjusting its strategy based on its performance. The goal of the agent is to maximise its overall reward over time. Reinforcement learning has been used to train agents to play complex games like Go and chess, as well as control robots and drones.

Reader discretion is advised – Why can the outputs differ when using the same inputs in chatGPT?

“ChatGPT generates its output based on the input it receives and the patterns it has learned from the dataset it was trained on. If the input it receives is slightly different, the model will generate a different output based on the patterns it has learned. Additionally, ChatGPT is a probabilistic model, so it samples from the distribution of possibilities it has learned from the data. Therefore, even if the input is the same, ChatGPT may generate a different output each time it is run. The variation can also come from the fact that the model is not deterministic.” – chatGPT

If you want to deep dive into the subject, “why does the data make the model”, I recommend having a look at Andrew Ng’s work Data-centric AI Resource Hub (datacentricai.org)

Generating boilerplate code with chatGPT

Let’s first see how chatGPT builds the environment for the reinforcement learning problem.

The problem declaration was given as a README.md file. The environment for the cab driver is defined and constrained very clearly so that students can start working with it immediately.

Let’s see how chatGPT performs, there won’t be pictures of the complete outputs since that would make this blog way too long. The reader is encouraged to try these input prompts while you’re reading and taking a look at the boilerplate was the result of this experiment.

Input: Based on this input explain the problem and summary of a possible solution.

Output: “The problem at hand is to develop an RL-based algorithm that can assist cab drivers in maximizing their profits by helping them make better decisions on the field. The recent hikes in electricity prices have caused many drivers to complain that their profits are almost flat despite gradually increasing revenues. The algorithm’s goal is to help drivers choose the ‘right’ rides, i.e. those that are likely to maximize the total profit earned by the driver that day. The objective of the problem is to maximize the profit earned over the long-term. The decisions are made at an hourly interval, and the algorithm needs to take into consideration the current state of the driver (location, hour-of-the-day, and day-of-the-week), the ride requests received, and the rules governing the ride-allocation system. A possible solution in python would be to develop a Q-learning algorithm that takes into account the state, action, and reward, and updates the Q-values over time to converge to an optimal policy for the cab driver to follow.”

If you read the problem definition you can see that chatGPT wrote a really nice summary of the problem described. 

Input:  Based on the previous input generate, a environment.py file that can be used to train an RL agent to solve the problem. The reinforcement learning agent is a Tensorflow Keras API based deep Q learning model.

Output: “I’m sorry but creating an environment.py file that can be used to train an RL agent to solve the problem would require me to have access to the relevant data and the complete structure of the deep Q learning model. Additionally, it would also require a significant amount of code development and testing to make sure the environment is working correctly.

However, I can give you some guidelines on how to create such an environment.py file:

  1. Define the state space and the action space for the agent
  2. Create a class for the environment that initializes the state and implements the step function, which takes in the action chosen by the agent and returns the next state, reward, and a boolean indicating if the episode is over
  3. Implement a method for resetting the environment to the initial state
  4. Implement a method for rendering the current state of the environment
  5. Create a function for getting the state from the environment, which will be used by the agent to update its Q-values
  6. Add any additional functionality needed for the specific problem and the chosen RL algorithm

Please keep in mind that the implementation details of the environment will depend on the data and the specific problem you are trying to solve.“

Let’s try a different type of input, a summary in natural language about the code that we want ChatGPT to generate.


Output an example in python based on this definition:

The CabDriver class is used to simulate the cab-driving environment for the reinforcement learning agent.

  1. The __init__ function initializes the class and sets the following attributes:
    • self.action_space: a list of all possible actions the agent can take, represented as a pair of integers for the pickup and drop-off locations.
    • self.state_space: a list of all possible states the agent can be in, represented as a tuple of integers for the current location, hour, and day.
    • self.state_init: a randomly generated initial state for the agent to start in.
  2. The convert_state_to_vector function takes in the current state and converts it into a vector format, which can be fed into a neural network. The vector is of size m + t + d, where m is the number of cities, t is the number of hours, and d is the number of days.
  3. The replay function takes in the current state and returns the number of requests the agent can receive at that location, time and day. The function uses a Poisson distribution with an average rate specified for each location. The possible actions are obtained by randomly sampling from the action space and appending a [0,0] action which means no pickup or drop-off
  4. The get_reward function takes in the current state, action and the time matrix and returns the reward for the given state and action. The reward is calculated as the difference between the revenue from the passenger and the cost of fuel and other expenses.
  5. The get_next_state function takes in the current state, action and the time matrix and returns the next state of the agent after taking the given action from the current state. The next state will be the destination location, the time of arrival and the day of the week after completing the travel time. If the action is [0,0], meaning no pickup or drop off, the next state will be the same as the current state.
  6. The reset() function returns the action_space, state_space and the initial state of the agent at the beginning of the episode.


Output: code output from natural language input – environment

Now the result looks promising, let’s iterate once more. Code stubs for the environment and agent were provided as a part of the assignment from the university. Let’s ask chatGPT to fill in the code stub based on the previous output.

Let’s try a third kind of input, a code skeleton.

Input: Fill in this code skeleton. Add docstrings for all functions.

Output: Boilerplate filled from the skeleton by chatGPT

The output looks like something we can start working with.

Q-learning algorithm

Let’s start working on the RL agent class.

Input: Create a DQNAgent python script that uses the CabDriver class for training


This looks really good; the function naming is now clearer than with the environment and ChatGPT has added the necessary parameters. Let’s try using the natural language input that defines the DQNAgent class.

Input: Build a python script based on this description and the previous output:

  1. The DQNAgent class is a class for creating a deep Q-network (DQN) agent.
  2. The __init__ method is used to initialize the class, and sets the size of the state and action space for the agent, as well as initializes the following hyperparameters for the DQN:
  • discount_factor: the discount factor used to weigh future rewards in the Q-learning algorithm
  • learning_rate: the learning rate used to update the model’s weights
  • epsilon_max: the maximum value of the epsilon exploration parameter
  • epsilon_decay: the rate at which epsilon decays towards epsilon_min
  • epsilon_min: the minimum value of the epsilon exploration parameter
  • batch_size: the number of samples used in each training update
  • memory: a deque object used to store previous experiences for training
  • model: the DQN model created by the build_model method
  1. The build_model method is used to build the neural network model for the DQN. This method creates a new Sequential model, adds layers to the model, and compiles it with the mean squared error loss function and the Adam optimizer with the learning rate specified in the class initialization.
  2. The get_action method is used to select the next action to take in the environment. It uses an epsilon-greedy policy, where with probability epsilon, a random action is chosen, and with probability 1-epsilon, the action with the highest predicted Q-value is chosen.
  3. The append_sample method is used to save a sample of the form (state, action, reward, next_state) to the replay memory. This sample can later be used for training the DQN.
  4. The train_model method is used to train the DQN model using samples from the replay memory. It first checks if there are enough samples in the memory, and if so, it randomly samples a batch of samples from the memory. It then creates empty arrays update_output and update_input that will be used to update the model’s weights.
  5. It then loops through the samples in the mini-batch and for each sample, it predicts the target Q-values using the current model, computes the target Q-values using the next state, and updates the update_output and update_input arrays with the state and target Q-values respectively.
  6. After updating the arrays, it then fits the model with the update_input and update_output arrays, with the batch size specified in the class initialization and for only one epoch.
  7. The save method is used to save the current DQN model to a file with the specified name.

Output: code output from natural language input – agent

Interestingly the output didn’t change that much, a train_model function was added, and the function naming was replaced by the inputs. And finally let’s see the input when we use a code skeleton and chatGPT fills in the blanks.

Input: Fill in this code skeleton based on the previous inputs and outputs. Add docstrings to all functions.

Output: Boilerplate Agent code filled from the skeleton by chatGPT 

This will be the boilerplate that I’ll start working on in the next part of this blog. As it can be seen it is by no means a working solution.

Interestingly as I requested chatGPT to regenerate the output a few times, this started happening:

Parting words

In summary, this blog post has shown you how to use chatGPT as a starting point for your machine-learning projects. You can find all the code and examples used in this blog post from GitHub: chatGPT_RL_Blog1

In the next part of the blog, I will go through the changes I made to the boilerplate code to make it work. In the last chapter of this series, I will use chatGPT suggestions to optimize functions in the code and add some optimizations of my own to see how good of a solution we’ll end up with.

FYI: All the text in this blog has been written with the help of Grammarly and the images have been generated by: DALL·E (openai.com)

AWS SageMaker Pipelines – Making MLOps easier for the Data Scientist

SageMaker Pipelines is a machine learning pipeline creation SDK designed to make deploying machine learning models to production fast and easy. I recently got to use the service in an edge ML project and here are my thoughts about its pros and cons. (For more about the said project refer to Solita data blog series about IIoT and connected factories https://data.solita.fi/factory-floor-and-edge-computing/)

Example pipeline

Why do we need MLOps?

First, there were statistics then came the emperor’s new clothes – machine learning, a rebranding of old methods accompanied with new ones emerged. Fast forward to today and we’re all the time talking about this thing called “AI”, the hype is real, it’s palpable because of products like Siri and Amazon Alexa.

But from a Data Scientist point of view, what does it take to develop such a model? Or even a simpler model, say a binary classifier? The amount of work is quite large, and this is only the tip of the iceberg. How much more work is needed to put that model into the continuous development and delivery cycle?

For a Data Scientist, it can be hard to visualize what kind of systems you need to automate everything your model needs to perform its task. Data ETL, feature engineering, model training, inference, hyperparameter optimization, performance monitoring etc. Sounds like a lot to automate?

(Hidden technical debt in machine learning https://proceedings.neurips.cc/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf)


This is where MLOps comes to the picture, bridging DevOps CI/CD practices to the data science world and bringing in some new aspects as well. You can see more information about MLOps from previous Solita content such as https://www.solita.fi/en/events/webinar-what-is-mlops-and-how-to-benefit-from-it/ 

Building an MLOps infrastructure is one thing but learning to use it fluently is also a task of its own. For a Data Scientist at the beginning of his/her career, it could seem too much to learn how to use cloud infrastructure as well as learn how to develop Python code that is “production” ready. A Jupyter notebook outputting predictions to a CSV file simply isn’t enough at this stage of the machine learning revolution.

(The “first” standard on MLOps, Uber Michelangelo Platform https://eng.uber.com/michelangelo-machine-learning-platform/)


A Jupyter notebook outputting predictions to a CSV file simply isn’t enough at this stage of the machine learning revolution.

Usually, companies that have a long track record of Data Science projects have a few DevOps, Data Engineer/Machine Learning Engineer roles working closely with their Data Scientists teams to distribute the different tasks of production machine learning deployment. Maybe they even have built the tooling and the infrastructure needed to deploy models into production more easily. But there are still quite a few Data Science teams and data-driven companies figuring out how to do this MLOps thing.

Why should you try SageMaker Pipelines?

AWS is the biggest cloud provider ATM so it has all the tooling imaginable that you’d need to build a system like this. They are also heavily invested in Data Science with their SageMaker product and new features are popping up constantly. The problem so far has been that there are perhaps too many different ways of building a system like this.

AWS tries to tackle some of the problems with the technical debt involving production machine learning with their SageMaker Pipelines product. I’ve recently been involved in project building and deploying an MLOps pipeline for edge devices using SageMaker Pipelines and I’ll try to provide some insight on why it is good and what is lacking compared to a completely custom-built MLOps pipeline.

The SageMaker Pipelines approach is an ambitious one. What if, Data Scientists, instead of having to learn to use this complex cloud infrastructure, you could deploy to production just by learning how to use a single Python SDK (https://github.com/aws/sagemaker-python-sdk)? You don’t even need the AWS cloud to get started, it also runs locally (to a point).

SageMaker Pipelines aims at making MLOps easy for Data Scientists. You can define your whole MLOps pipeline in f.ex. A Jupyter Notebook and automate the whole process. There are a lot of prebuilt containers for data engineering, model training and model monitoring that have been custom-built for AWS. If these are not enough you can use your containers enabling you to do anything that is not supported out of the box. There are also a couple of very niche features like out-of-network training where your model will be trained in an instance that has no access to the internet mitigating the risk of somebody from the outside trying to influence your model training with f.ex. Altered training data.

You can version your models via the model registry. If you have multiple different use cases for the same model architectures with differences being in the datasets used for training it’s easy to select the suitable version from SageMaker UI or the python SDK and refactor the pipeline to suit your needs.  With this approach, the aim is that each MLOps pipeline has a lot of components that are reusable in the next project. This enables faster development cycles and the time to production is reduced. 

SageMaker Pipelines logs every step of the workflow from training instance sizes to model hyperparameters automatically. You can seamlessly deploy your model to the SageMaker Endpoint (a separate service) and after deployment, you can also automatically monitor your model for concept drifts in the data or f.ex. latencies in your API. You can even deploy multiple versions of your models and do A/B testing to select which one is proving to be the best.

And if you want to deploy your model to the edge, be it a fleet of RaspberryPi4s or something else, SageMaker provides tooling for that also and it seamlessly integrates with Pipelines.

You can recompile your models for a specific device type using SageMaker Neo Compilation jobs (basically if you’re deploying to an ARM etc. device you need to do certain conversions for everything to work as it should) and deploy to your fleet using SageMaker fleet management.

Considerations before choosing SageMaker Pipelines

By combining all of these features to a single service usable through SDK and UI, Amazon has managed to automate a lot of the CI/CD work needed for deploying machine learning models into production at scale with agile project development methodologies. You can also leverage all of the other SageMaker products f.ex. Feature Store or Forekaster if you happen to need them. If you’re already invested in using AWS you should give this a try.

Be it a great product to get started with machine learning pipelines it isn’t without its flaws. It is quite capable for batch learning settings but there is no support as of yet for streaming/online learning tasks. 

And for the so-called Citizen Data Scientist, this is not the right product since you need to be somewhat fluent in Python. Citizen Data Scientists are better off with BI products like Tableau or Qlik (which use SageMaker Autopilot as their backend for ML) or perhaps with products like DataRobot. 

And in a time where software products are high availability and high usage the SageMaker EndPoints model API deployment scenario where you have to pre-decide the number of machines serving your model isn’t quite enough.

 In e-commerce applications, you could run into situations where your API is receiving so much traffic that it can’t handle all the requests because you didn’t select a big enough cluster to serve the model with. The only way to increase the cluster size in SageMaker Pipelines is to redeploy a new revision within a bigger cluster. It is pretty much a no brainer to use a Kubernetes cluster with horizontal scaling if you want to be able to serve your model as the traffic to the API keeps increasing.

Overall it is a very nicely packaged product with a lot of good features. The problem with MLOps in AWS has been that there are too many ways of doing the same thing and SageMaker Pipelines is an effort for trying to streamline and package all those different methodologies together for machine learning pipeline creation.

It’s a great fit if you work with batch learning models and want to create machine learning pipelines really fast. If you’re working with online learning or reinforcement models you’ll need a custom solution. And if you are adamant that you need autoscaling then you need to do the API deployments yourself, SageMaker endpoints aren’t quite there yet. For references to a “complete” architecture refer to the AWS blog https://aws.amazon.com/blogs/machine-learning/automate-model-retraining-with-amazon-sagemaker-pipelines-when-drift-is-detected/