Highlights

We worked with a healthcare startup on a mission to give healthcare professionals an improved ability to remotely monitor at-home patients. Specifically, the target was to prevent patient falls, a billion-dollar cost to the healthcare system. Reducing patient falls would reduce injuries and the associated costs, and would improve the overall patient experience in acute, home care, and long-term care environments.

To deliver on this mission, the customer deployed state-of-the-art IoT devices and cameras to track patient vitals. Bitstrapped was contracted to develop a more scalable architecture on GCP and migrate the customer to it. An event-driven architecture, with best practices for automation and DevOps and the use of serverless, highly scalable compute, enabled the customer to improve the performance of their camera ML models by shipping faster updates, ingesting more data, and automating more backend processes in real time.

To make this possible, Bitstrapped built ML pipelines on Google Kubernetes Engine, designing preemptible GPU node pools to power ML training and automating pipeline orchestration for data intake and processing. Bitstrapped brought traditional infrastructure strategies to the hard parts of operating machine learning at scale, automating away tedious manual workflows and enabling the client to work smarter on their ML technologies.

Success Metrics

  • Reduce % of backend processes that are manually orchestrated
  • Improve scalability of data ingest, experimentation, training, and new deployments
  • Overcome inability to automatically deploy new models to many devices
  • Improve team collaboration efficiency and scalability of labelling process
  • Reduce training time and improve compute performance
  • Architect with enterprise-grade best practices

Industry
Healthcare

Headquarters
Los Angeles, United States

Challenge

  • Pipelines in the existing public cloud were not production-ready, lacking the automation and robustness needed to rapidly improve the camera runtimes
  • Lack of system automation for adding new data, running new experiments, and retraining machine learning models
  • Ad hoc commands issued to run models, with no device registration or automation
  • No effective method to label new camera images at high volume, leading to poor camera accuracy and an inability to collaborate as a team
  • Lack of infrastructure portability and DevOps automation limited the ability to migrate or adopt a multi-cloud strategy

Solution

Bitstrapped re-architected the system as an event-driven design on GCP that enabled a greater degree of automation and a high volume of data ingestion and processing events. The chosen architecture for pre-processing and data ingestion was serverless, powered by IoT Core, and enabled automated deployment of models to the edge. As images were collected, the architecture facilitated data augmentation and preprocessing of the images to speed up downstream training pipelines.
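
For illustration, a minimal sketch of one event-driven ingest step is shown below: a serverless function reacts to each newly uploaded camera image and emits a Pub/Sub event for downstream preprocessing workers. The project, bucket, and topic names are hypothetical stand-ins, not the client's actual configuration.

```python
# Minimal sketch of an event-driven ingest step (hypothetical names throughout).
# Deployed as a background Cloud Function triggered by object-finalize events
# on the raw-images bucket; it records the upload and emits a Pub/Sub event
# that downstream preprocessing/augmentation workers subscribe to.
import json
from google.cloud import pubsub_v1

PROJECT_ID = "patient-monitoring-demo"   # hypothetical project
TOPIC_ID = "images-to-preprocess"        # hypothetical topic

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)

def on_image_uploaded(event, context):
    """Triggered by a change to a Cloud Storage bucket."""
    payload = {
        "bucket": event["bucket"],
        "object": event["name"],
        "event_id": context.event_id,
    }
    # Publish an event instead of processing inline, so augmentation and
    # preprocessing can scale out independently of the upload rate.
    publisher.publish(topic_path, json.dumps(payload).encode("utf-8"))
```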

For the data science team, the new infrastructure on Google Kubernetes Engine, together with a production Kubeflow deployment, enabled automation of the ML pipelines that pre-process, train, and deploy the models. This included flexible compute for experimentation and GPUs as needed to accelerate training.
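
Below is a minimal sketch of how such a pre-process, train, and deploy flow can be expressed with the Kubeflow Pipelines SDK (kfp v2). The component bodies, names, and storage paths are hypothetical placeholders rather than the client's actual pipeline.

```python
# Sketch of a pre-process -> train -> deploy flow with the Kubeflow Pipelines
# SDK (pip install kfp). All names and URIs below are hypothetical.
from kfp import dsl, compiler

@dsl.component(base_image="python:3.10")
def preprocess(raw_uri: str) -> str:
    # Hypothetical: augment/resize images staged at raw_uri.
    processed_uri = raw_uri.rstrip("/") + "/processed"
    print(f"Preprocessing {raw_uri} -> {processed_uri}")
    return processed_uri

@dsl.component(base_image="python:3.10")
def train(dataset_uri: str) -> str:
    # Hypothetical training step; in practice this runs on a GPU node pool.
    print(f"Training on {dataset_uri}")
    return "gs://models/fall-detector/v1"  # hypothetical model artifact path

@dsl.component(base_image="python:3.10")
def deploy(model_uri: str):
    # Hypothetical: publish the model artifact for rollout to edge devices.
    print(f"Deploying {model_uri}")

@dsl.pipeline(name="camera-ml-pipeline")
def camera_ml_pipeline(raw_uri: str = "gs://ingest/raw"):
    processed = preprocess(raw_uri=raw_uri)
    model = train(dataset_uri=processed.output)
    deploy(model_uri=model.output)

if __name__ == "__main__":
    compiler.Compiler().compile(camera_ml_pipeline, "camera_ml_pipeline.yaml")
```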

Once the event-driven pipeline automation was in place, focus shifted to the performance of the camera solution. A semi-automated labelling solution automatically detected failures and brought humans into the loop as needed. This way, the team could label images efficiently, while also improving their ability to do meta-learning and use previously trained models to label new datasets. Label automation was achieved by integrating a custom instance of Label Studio into the architecture and MLOps process.
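
As a sketch of what that integration can look like, the snippet below uses the Label Studio SDK to import a task together with an existing model's prediction as a pre-annotation, so annotators only need to correct the model. The URL, API key, labels, and paths are hypothetical.

```python
# Sketch of model-assisted labelling with the Label Studio SDK
# (pip install label-studio-sdk). URL, key, and config are hypothetical.
from label_studio_sdk import Client

ls = Client(url="http://label-studio.internal:8080", api_key="REDACTED")

project = ls.start_project(
    title="fall-detection-frames",
    label_config="""
    <View>
      <Image name="image" value="$image"/>
      <Choices name="label" toName="image">
        <Choice value="fall"/>
        <Choice value="no_fall"/>
      </Choices>
    </View>
    """,
)

# Import a task together with the current model's prediction, so annotators
# only correct the cases the model got wrong (human in the loop).
project.import_tasks([
    {
        "data": {"image": "gs://frames/cam-17/000123.jpg"},  # hypothetical path
        "predictions": [{
            "model_version": "fall-detector-v1",
            "result": [{
                "from_name": "label",
                "to_name": "image",
                "type": "choices",
                "value": {"choices": ["no_fall"]},
            }],
        }],
    },
])
```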

Results

Event driven automation enabled:

  • Limitless data ingestion scale
  • Real-time automation of raw data ingest, data pre-processing, data lake ingest, and ETL processes
  • Semi-automated labelling pipelines reduced labelling turnaround from 1 week to under 24 hours

DevOps and Orchestration enabled:

  • Reduction of model training time by 75%

Kubernetes Engine adoption enabled:

  • Increased performance through preemptible GPU node pools (sketched below)
  • Clusters with greater horizontal scalability
  • Terraform configurable Kubernetes CronJobs
  • Kubeflow adoption for ETL pipelines
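
While the node pools were managed through Terraform, the sketch below shows an equivalent preemptible GPU pool definition via the google-cloud-container client to illustrate the key settings; the project, cluster, machine, and accelerator values are hypothetical.

```python
# Sketch: a preemptible GPU node pool for training workloads
# (pip install google-cloud-container). All names are hypothetical.
from google.cloud import container_v1

client = container_v1.ClusterManagerClient()

node_pool = container_v1.NodePool(
    name="gpu-preemptible-pool",
    initial_node_count=0,  # scale from zero; pay only while training runs
    config=container_v1.NodeConfig(
        machine_type="n1-standard-8",
        preemptible=True,  # preemptible nodes cut GPU training cost
        accelerators=[container_v1.AcceleratorConfig(
            accelerator_count=1,
            accelerator_type="nvidia-tesla-t4",
        )],
    ),
    autoscaling=container_v1.NodePoolAutoscaling(
        enabled=True, min_node_count=0, max_node_count=8,
    ),
)

client.create_node_pool(
    parent="projects/patient-monitoring-demo/locations/us-central1-a/clusters/ml-cluster",
    node_pool=node_pool,
)
```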

Improved Security with:

  • Data uploads using short-lived signed URLs (sketched below)
  • Cloud IoT Core
  • Adoption of service accounts
  • Least privilege IAM and bucket access
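
The sketch below shows the short-lived signed URL pattern with the Cloud Storage client library: the backend mints a time-boxed upload URL so an edge device never holds long-lived credentials. Bucket and object names are hypothetical.

```python
# Sketch: issuing a short-lived V4 signed URL so an edge device can upload
# one image directly to Cloud Storage without long-lived credentials.
# Bucket and object names are hypothetical.
from datetime import timedelta
from google.cloud import storage

client = storage.Client()
blob = client.bucket("raw-camera-frames").blob("cam-17/000124.jpg")

upload_url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(minutes=15),  # short-lived: limits blast radius
    method="PUT",
    content_type="image/jpeg",
)
# The device then PUTs the image bytes to upload_url within 15 minutes.
```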

Best Practices adopted:

  • Data partitioning
  • Terraform infrastructure as code (IaC) for disaster recovery and portability
  • Highly reliable event processing with Pub/Sub (subscriber sketch below)
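
A minimal subscriber sketch follows, illustrating the reliability pattern: a message is acknowledged only after successful processing, so failed messages are redelivered. The subscription name and processing hook are hypothetical.

```python
# Sketch of a reliable Pub/Sub consumer: messages are acknowledged only after
# successful processing, so failures are redelivered. Names are hypothetical.
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(
    "patient-monitoring-demo", "images-to-preprocess-sub"
)

def process(data: bytes) -> None:
    # Placeholder for the real preprocessing/augmentation step.
    print(f"processing {len(data)} bytes")

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    try:
        process(message.data)
        message.ack()   # ack only after success
    except Exception:
        message.nack()  # negative-ack so Pub/Sub redelivers

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
with subscriber:
    try:
        streaming_pull.result(timeout=60)
    except TimeoutError:
        streaming_pull.cancel()
```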

IoT Core enabled:

  • Registration of limitless devices
  • Auto-deployment of model updates to the whole device ecosystem (sketch below)
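
As an illustration of the device-ecosystem rollout, the sketch below iterates over a device registry and pushes a new model reference to every device's cloud-to-device config using the Cloud IoT Core client library; the registry, region, and payload are hypothetical.

```python
# Sketch: pushing a new model version to every registered device via the
# Cloud IoT Core device-config API (pip install google-cloud-iot).
# Registry, region, and payload below are hypothetical.
from google.cloud import iot_v1

client = iot_v1.DeviceManagerClient()
registry_path = client.registry_path(
    "patient-monitoring-demo", "us-central1", "cameras"
)

# Hypothetical config payload telling each camera which model build to pull.
new_config = b'{"model_uri": "gs://models/fall-detector/v2"}'

for device in client.list_devices(request={"parent": registry_path}):
    device_path = client.device_path(
        "patient-monitoring-demo", "us-central1", "cameras", device.id
    )
    client.modify_cloud_to_device_config(
        request={"name": device_path, "binary_data": new_config}
    )
```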

The customer was equipped with a production-grade, almost fully automated process that freed up the time previously spent manually executing steps to trigger events, label data, and upload data. As a result, the same teams could focus on model accuracy and new model development within a self-sufficient cloud environment.
