TMCnet Feature
September 22, 2022

Software Deployment for Computer Vision Models: Step By Step

Deploying computer vision models, like other machine learning models, is often a challenge. Only a small fraction of all models developed go into production.

The software deployment lifecycle for a computer vision model begins with quality data collection and preparation, followed by model training, evaluation, deployment, monitoring, and retraining. There are several ways to deploy and serve a computer vision model.

Typically, machine learning models are deployed and accessed via API endpoints. API styles such as REST and RPC define a common protocol and data model for how different systems interact.

Another approach is to deploy an ML model at the edge. Data consumption occurs on the device of origin via computer vision applications. There are also hybrid deployments that combine API endpoints with edge devices.

The interface used depends on how end-users consume the computer vision model. Some users can consume the model through a simple bash command-line interface, while others can consume the model through a web-based or application-based interactive UI. Usually, an API serves the model while downstream applications consume the API endpoint, submitting inputs and receiving predictions.
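
As a rough illustration of an API serving a model, the sketch below uses only the Python standard library. The `predict` function is a hypothetical stand-in for a trained model, and the `/predict` path, the `pixels` field, and port 8000 are assumptions made for this example, not part of any particular product.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(pixels):
    """Hypothetical stand-in model: label an image by its mean brightness."""
    mean = sum(pixels) / len(pixels)
    label = "bright" if mean >= 128 else "dark"
    return {"label": label, "mean_brightness": round(mean, 2)}


def handle_request_body(body):
    """Validate a JSON request body and return (HTTP status, response dict)."""
    try:
        pixels = json.loads(body)["pixels"]
        if not pixels or not all(isinstance(p, (int, float)) for p in pixels):
            raise ValueError("pixels must be a non-empty list of numbers")
    except (ValueError, KeyError, TypeError) as exc:
        return 400, {"error": str(exc)}
    return 200, predict(pixels)


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the assumed /predict path is served.
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, payload = handle_request_body(self.rfile.read(length))
        data = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def run_server(port=8000):
    """Serve the model until interrupted; not called automatically here."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()
```

A downstream application would call `run_server()` in one process and POST a JSON body such as `{"pixels": [200, 220, 240]}` to the endpoint. Keeping validation in `handle_request_body` lets the same logic be exercised without starting a server.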

There are many UI and platform options available to deploy production computer vision models, including laptops (where developers often write the code), remote machines, remote virtual machines (VMs), remote servers hosting Jupyter notebooks, or containers deployed in a cloud environment.

Deploying computer vision models has four main steps:

  1. Identifying, collecting, and preparing data
  2. Running inference
  3. Monitoring production models
  4. Fine tuning model data and parameters

1. Identifying, Collecting, and Preparing Data

Like other machine learning models, computer vision models learn from training data and apply the knowledge to new, never-before-seen data, making predictions and achieving specified business goals.

If the organization building an ML model doesn’t have enough training data, it won’t be able to build a successful model. Beyond quantity, the training data must be clean and of high quality to be useful. Obtaining high-quality data is typically the responsibility of a machine learning engineering team.

Determining the data requirements is essential for assessing the suitability of collected data for a computer vision project. The focus should be on identifying data, collecting initial data sets, specifying requirements, determining quality, extracting insights, and noting other factors that merit further research. There is growing awareness in the industry that obtaining and continuously improving high-quality datasets can be more important than sophisticated ML models and parameter tuning. This approach is known as data-centric AI.

Once the team correctly identifies the training data, it needs to format it for model training. The team should focus on the activities required to build data sets for modeling tasks. Data preparation includes collecting, cleaning, scaling, normalizing, cropping, and annotating training images.
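
A few of these preparation steps can be sketched in plain Python. The example assumes a grayscale image stored as a list of rows of 8-bit values, and the default mean and standard deviation of 0.5 are placeholder assumptions; real pipelines usually compute these statistics from the training set and use an array library instead of nested lists.

```python
def center_crop(image, size):
    """Crop a size x size window from the middle of a 2-D grayscale image."""
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds image dimensions")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def scale_to_unit(image):
    """Map 8-bit pixel values (0-255) to floats in [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def standardize(image, mean, std):
    """Subtract a dataset-level mean and divide by a dataset-level std."""
    return [[(pixel - mean) / std for pixel in row] for row in image]


def prepare(image, crop_size, mean=0.5, std=0.5):
    """Crop, scale, then standardize one image."""
    return standardize(scale_to_unit(center_crop(image, crop_size)), mean, std)
```

Applying the same `prepare` function at training time and at inference time helps avoid a mismatch between how training and production images are processed.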

2. Running Inference

In machine learning and computer vision, inference refers to feeding new, often real-time, data to a trained computer vision model to compute outputs, such as numeric scores or class labels. Making the model available for inference in this way is known as operationalizing the model or putting it in production.

Computer vision inference requires deploying software applications in a production environment—a computer vision model includes code that executes mathematical algorithms to perform image recognition, classification, or other tasks. These algorithms perform calculations based on features extracted from an image.

When deploying a computer vision model to a production environment, it is important to consider how the model will make predictions in real-world applications. As with any AI model, there are two key inference processes:

  • Real-time inference: allows the model to perform predictions and trigger immediate responses at any time (also known as interactive inference). It is useful for analyzing data from interactive applications and streamed video content.
  • Batch inference: runs asynchronously and makes predictions on batches of observations. It stores the model’s predictions in a file or database for business apps and end-users.
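
The difference between the two processes can be shown with a small sketch. Here `predict` is a hypothetical stand-in for a trained model, and the CSV column names are assumptions chosen for the example.

```python
import csv
import io


def predict(pixels):
    """Hypothetical stand-in model: score an image by mean brightness in [0, 1]."""
    return sum(pixels) / (255.0 * len(pixels))


def predict_realtime(pixels):
    """Real-time path: score one input and return the result to the caller."""
    return predict(pixels)


def predict_batch(named_images, output):
    """Batch path: score many inputs and write the results to a CSV stream."""
    writer = csv.writer(output)
    writer.writerow(["image_id", "score"])
    for image_id, pixels in named_images:
        writer.writerow([image_id, f"{predict(pixels):.4f}"])
```

The real-time function hands its result straight back to the caller, while the batch function persists results (to any file-like object, such as an open file or an `io.StringIO` buffer) for later use by other applications.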

After the initial training, evaluating your model and comparing these two inference processes is important. Computer vision project teams should choose the approach that best fits the specific model, considering:

  • The frequency of predictions generated.
  • The required speed of the predictions.
  • Whether the model needs to generate predictions individually or in small or large batches.
  • The latency expectations.
  • The computing power required to run the model.
  • Additional operational and maintenance costs and implications.

3. Monitoring Production Models

Monitoring a computer vision model involves various techniques to track the model’s performance in production, ensuring its accuracy and stability. Computer vision models are machine learning algorithms that train by processing examples of images or videos in a given category. The image data set should provide enough examples to allow the model to generalize and apply the learned insights to new inputs.

During training, it is important to monitor how well the model performs its task and minimizes error. After training on static data sets, the production computer vision model performs inference on new data from evolving real-world conditions and events. The difference between the static data from the training stage and the dynamic data at the production stage can gradually degrade the model’s performance.

The monitoring system should be able to identify changes in input data. Failure to check for these changes in advance can result in the model failing silently. A silent failure can have a serious adverse impact on a company’s performance and reputation. There are many reasons a model might perform poorly in production.

Monitoring the computer vision model in production helps the team maintain and enhance its performance, verifying that it behaves as expected. The deployed model will continue interacting with different real-world scenarios, meaning that the data it processes will constantly change.

Performance monitoring helps teams identify model degradation quickly, allowing them to fix and improve the algorithm. Performance metrics vary by model and training job. For example, an image classification task can use a confusion matrix to count true positives, true negatives, false positives, and false negatives. The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies, and the area under it (AUC) summarizes how well the model separates the classes across all thresholds.
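
For a binary classifier, both metrics can be computed in a few lines. This is a minimal sketch that assumes labels are encoded as 1 (positive) and 0 (negative). The AUC here uses the pairwise-ranking definition, which is simple but quadratic in the number of examples, so a library implementation is preferable for large evaluation sets.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))
```

The confusion matrix depends on the threshold used to turn scores into predicted labels, whereas the AUC is computed from the raw scores and does not.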

In addition to observing performance metrics under real-world conditions, data science teams can examine the visual inputs to gain deeper insight into the cause of performance degradation.

Measuring model drift is also an important part of monitoring. It involves tracking the changes in the distribution of the model’s inputs and outputs to assess their divergence over time. It is important to check the model for data drift to determine if it is outdated or if there is a problem with the data quality. Using computer vision monitoring to detect drift can help the team better understand how to address these issues.
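
One way to quantify a change in input distribution is the population stability index (PSI), sketched below. PSI is a technique chosen for this illustration rather than one the article prescribes. The sketch assumes each image has been reduced to a single feature in [0, 1], such as mean brightness, and the bin count and smoothing constant are arbitrary assumptions.

```python
import math


def histogram(values, bins, low=0.0, high=1.0):
    """Return the fraction of values falling in each of `bins` equal-width bins."""
    counts = [0] * bins
    for value in values:
        # Clamp out-of-range values into the first or last bin.
        index = min(int((value - low) / (high - low) * bins), bins - 1)
        counts[max(index, 0)] += 1
    return [count / len(values) for count in counts]


def population_stability_index(reference, production, bins=10, eps=1e-6):
    """PSI between two samples of one feature; larger values mean more drift."""
    ref = histogram(reference, bins)
    prod = histogram(production, bins)
    # eps avoids taking the log of zero when a bin is empty.
    return sum((p - r) * math.log((p + eps) / (r + eps))
               for r, p in zip(ref, prod))
```

A monitoring job could compute this value on a schedule, comparing recent production inputs against a reference sample from the training set, and raise an alert when it exceeds a threshold the team has tuned for its own data.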

4. Fine Tuning Model Data and Parameters

Even when the model is already running in production and is regularly monitored, it is important to keep improving it in ongoing iterations. The data collected from the real world can also change in unexpected ways. These factors can inform new requirements when deploying a computer vision model to another system or endpoint.

The completion of the training process is just the beginning of the next iteration. This stage is where the team should decide the following:

  • New requirements for future use cases.
  • How to train the model further to include additional features.
  • How to improve the model’s performance, accuracy, and functionality.
  • Operational requirements for other deployments.
  • How to address model drift and data drift.

This stage involves reflecting on what works in the model or needs improvement. The surest way to successfully build a computer vision model is to identify elements to improve and address changing business needs on an ongoing basis.

Common Pitfalls When Building a Computer Vision Model

Watch out for these mistakes to ensure an effective computer vision model.

Failure to Detect Open or Closed Contours

Computer vision models should be able to identify open and closed contours in an image. While humans can naturally recognize these shapes, machines don’t have this innate ability, so developers must train the deep learning model to identify if a shape is closed or open under various conditions.

Training a computer vision model to understand contours requires effort and expertise to build training data sets and fix complex image processing issues. An alternative is to use an existing trained neural network that easily recognizes curved and straight lines (e.g., Microsoft’s computer vision model).

Biased Training Data

Ideally, the dataset trains the machine learning model and enables it to perform well after deployment. The right data ensures the model can make accurate predictions regardless of changing conditions.

However, datasets are often unbalanced, containing over- or under-represented examples. In such cases, the resulting computer vision model might lack the realistic context required to perform properly in production. The predictions will be less accurate.
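
A quick check of class balance can surface this problem before training. The sketch below reports each class's share of the dataset and derives inverse-frequency class weights, one common way to compensate for imbalance. The weighting formula is an assumption chosen for illustration, and the label names in the usage note are hypothetical.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    return {label: len(labels) / (len(counts) * count)
            for label, count in counts.items()}
```

For a dataset with eight "cat" images and two "dog" images, the weights come out to 0.625 and 2.5, so errors on the under-represented class count four times as much during training. Collecting more examples of the rare class is often a better fix when it is feasible.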

There are several potential sources of dataset bias when building a computer vision project:

  • Selection bias - arises when the computer vision model trains on a skewed representation of possible scenarios, not covering the likely conditions in production.
  • Confirmation bias - arises when the team collecting the data reinforces its members’ assumptions about real-world distributions, causing the model to perform poorly.
  • Measurement bias - arises when the methods used to collect training data differ from how the team or application collects production data for prediction.

Detecting bias and identifying its source are crucial steps in mitigating biased datasets. One useful technique is data slicing evaluation, which helps the team understand how the model performs on different subsets of the dataset.

Slicing or subsetting a dataset allows the team to study the model’s behavior on each slice. It makes it easy to identify groups whose metrics differ from those of the overall dataset.
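
A sliced evaluation can be as simple as grouping evaluation records by a metadata field. In this sketch, each record is assumed to be a dictionary with `label` and `prediction` keys plus whatever metadata the team records; the `lighting` field in the usage note is hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(records, slice_key):
    """Return overall accuracy and accuracy for each value of `slice_key`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for record in records:
        group = record[slice_key]
        totals[group] += 1
        hits[group] += int(record["label"] == record["prediction"])
    overall = sum(hits.values()) / sum(totals.values())
    per_slice = {group: hits[group] / totals[group] for group in totals}
    return overall, per_slice
```

Calling `accuracy_by_slice(records, "lighting")` might show, for example, that a model with good overall accuracy performs noticeably worse on night-time images, which points to a gap in the training data.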

Unsuitable Transfer Learning Techniques

Another often overlooked aspect of computer vision model development is the choice of transfer learning technique. The two main techniques for adapting a pre-trained model to predict new tasks with minimal data are:

  • Transfer learning with feature extraction
  • Transfer learning with fine-tuning

Practitioners often prefer feature extraction over fine-tuning. However, the transfer learning method should depend on the problem the computer vision project aims to solve. In some cases, fine-tuning the model’s final layers will produce a better-performing model.

When unsure which technique is best suited to the specific task, computer vision practitioners should err on the side of fine-tuning, especially when training the model on small image datasets.
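
The practical difference between the two techniques is which layers are updated during training. The framework-agnostic sketch below makes that explicit. The layer names, the `head` entry for the new task-specific layer, and the `unfreeze_last` parameter are all hypothetical; in a real deep learning framework the resulting plan would be applied by marking each layer's parameters as trainable or frozen.

```python
def configure_transfer(layer_names, mode, unfreeze_last=1):
    """Return {layer_name: trainable} for a pre-trained backbone plus a new head.

    mode="feature_extraction": freeze every pre-trained layer.
    mode="fine_tuning": also train the last `unfreeze_last` pre-trained layers.
    The new task-specific head ("head") is trainable in both modes.
    """
    if mode not in ("feature_extraction", "fine_tuning"):
        raise ValueError(f"unknown mode: {mode}")
    if mode == "feature_extraction":
        cutoff = len(layer_names)
    else:
        cutoff = max(len(layer_names) - unfreeze_last, 0)
    # Layers at or beyond the cutoff index are left trainable.
    plan = {name: index >= cutoff for index, name in enumerate(layer_names)}
    plan["head"] = True
    return plan
```

With layers `["conv1", "conv2", "conv3"]`, feature extraction trains only the head, while fine-tuning with `unfreeze_last=1` also trains `conv3`.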


In this article, I explained the four basic steps of deploying computer vision models to production:

  • Identifying, collecting, and preparing data—involves building a high quality data set that will allow the model to effectively train.
  • Running inference—setting up the model to receive inputs and generate predictions, either in real-time or batch mode.
  • Monitoring production models—defining operating metrics and reviewing them to identify model failure and drift.
  • Fine tuning model data and parameters—based on the model’s real-world performance, iteratively improving it by updating datasets, model parameters, and model architecture.

I hope this will be useful as you move your models from the lab to a production environment.
