2025-06-01

IVR System Optimization Project

Intelligent optimization of a healthcare insurance IVR system using BERT-based intent classification and topic modeling

Tech Stack

Topic Modeling, BERT Encoder Training, NLP, IVR Funnel Analysis, Python

Pharmaceutical Company IVR System Optimization Project

📋 Project Overview

A pharmaceutical company's IVR (Interactive Voice Response) system handles approximately 20,000 calls per day, but its self-contained rate (the proportion of user issues resolved within the IVR, commonly called the containment rate) needed improvement: many callers were still being transferred to human agents. We applied unsupervised machine learning and deep learning to mine hidden user intent patterns from the call records, improve intent classification accuracy, and build an automated system for analyzing why calls fail. The project combined UMAP dimensionality reduction, HDBSCAN clustering, a BERT-based classifier, and large language models to give the team a data-driven basis for improving the IVR.

🚀 Key Features

Core Implementation

Unsupervised New Intent Discovery: Combined Sentence Transformers, UMAP, and HDBSCAN to discover previously unknown user intent patterns
BERT Deep Learning Classifier: Built a user intent classification model on a BERT encoder, with transfer learning support
Large Language Model Analysis: Integrated DeepSeek and Llama3.1 to automate failure reason tagging and analysis
Real-time Anomaly Detection: Set up automated failure rate monitoring and alerting to support business operations
Multi-dimensional Data Science Analysis: Analyzed customer service data with clustering, classification, topic modeling, and other ML techniques

Technical Highlights

Dimensionality Reduction of High-dimensional Data: Used UMAP to reduce the semantic embeddings while preserving the data's intrinsic structure
Hierarchical Density Clustering: HDBSCAN adapts to varying density in the data and surfaced business-meaningful user intent clusters
Deep Transfer Learning: Trained BERT models on labeled data from other projects to transfer knowledge across domains
Hybrid Large Model Architecture: DeepSeek for batch analysis and Llama3.1 for real-time processing, balancing processing efficiency and cost

💻 Project Detail

Our IVR optimization solution achieves systematic performance improvement through three specialized analysis modules, supported by data engineering and model deployment infrastructure:

Unsupervised Learning New Intent Discovery Module:
- Encoded user call records with a Sentence Transformers model, producing 768-dimensional semantic vectors
- Applied UMAP (Uniform Manifold Approximation and Projection) to reduce the high-dimensional vectors to 2-3 dimensions
- Ran HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) to identify user intent patterns
- Extracted keywords and themes from the clusters with topic modeling, identifying 15+ new user requirement scenarios for the IVR system
- Used scikit-learn and FAISS for large-scale vector similarity computation and retrieval
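
The keyword extraction step can be illustrated with a minimal sketch. The project does not say which topic modeling method it used, so this example assumes a simple class-based TF-IDF over the clustered utterances. The function name, sample utterances, and labels are hypothetical; only the -1 noise label follows HDBSCAN's convention.

```python
import math
import re
from collections import Counter, defaultdict


def top_keywords_per_cluster(utterances, labels, top_n=3):
    """Rank words per cluster with a simple class-based TF-IDF.

    All utterances in a cluster are merged into one "document"; a word scores
    high when it is frequent in that cluster and rare in the others.
    Label -1 (HDBSCAN's noise label) is skipped.
    """
    cluster_counts = defaultdict(Counter)
    for text, label in zip(utterances, labels):
        if label == -1:
            continue
        cluster_counts[label].update(re.findall(r"[a-z']+", text.lower()))

    n_clusters = len(cluster_counts)
    # Number of clusters in which each word appears at least once
    cluster_freq = Counter(w for counts in cluster_counts.values() for w in counts)

    keywords = {}
    for label, counts in cluster_counts.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(1 + n_clusters / cluster_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output
        keywords[label] = sorted(scores, key=lambda w: (-scores[w], w))[:top_n]
    return keywords


# Hypothetical utterances and cluster labels, for illustration only
sample_utterances = [
    "refill my prescription",
    "need a prescription refill",
    "check claim status",
    "status of my claim",
    "uh hello",
]
sample_labels = [0, 0, 1, 1, -1]
print(top_keywords_per_cluster(sample_utterances, sample_labels, top_n=2))
# {0: ['prescription', 'refill'], 1: ['claim', 'status']}
```

Words shared across clusters, such as "my", are down-weighted, so the top terms tend to describe what distinguishes each candidate intent.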

Deep Learning Intent Classifier Optimization:
- Built a transformer-based text classification model on a BERT encoder
- Implemented the end-to-end training workflow in PyTorch
- Applied a transfer learning strategy, pre-training on labeled data from other customer service projects
- Trained the model to predict the utterance-to-intent mapping with high precision
- Implemented model version management and an A/B testing framework to support continuous performance optimization
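
One way an A/B comparison between two model versions can be evaluated is a two-proportion z-test on their accuracy. This is a sketch under assumptions: the project does not describe its A/B methodology, and the function name and the counts below are hypothetical.

```python
import math


def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Return (z, two-sided p-value) for H0: both models have equal accuracy."""
    p_a = correct_a / total_a
    p_b = correct_b / total_b
    pooled = (correct_a + correct_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # Both models are always right or always wrong: no measurable difference
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, via the error function
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: model A classified 780 of 1000 calls correctly, model B 860 of 1000
z, p = two_proportion_z_test(780, 1000, 860, 1000)
print(f"z = {z:.2f}, p = {p:.6f}")
```

A small p-value suggests the accuracy gap is unlikely to be sampling noise. In practice the two arms should receive comparable traffic over the same period.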

Large Language Model Failure Reason Analysis:
- Integrated the DeepSeek large model for batch sample analysis, automatically summarizing the main transfer reasons and patterns
- Deployed a local Llama3.1 model for real-time failure reason identification and classification
- Built statistical anomaly detection to monitor abnormal fluctuations in the self-contained rate
- Built an automated alerting mechanism that promptly notifies product and operations teams
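
The statistical monitoring idea can be sketched as a trailing-window z-score over the daily self-contained rate. The project does not specify its detection algorithm, so the window size, threshold, function name, and sample rates here are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def flag_rate_anomalies(daily_rates, window=7, threshold=3.0):
    """Return indices of days whose rate deviates from the trailing window.

    Each day is compared with the mean and sample standard deviation of the
    previous `window` days; the day itself is excluded from its own baseline.
    """
    anomalies = []
    for i in range(window, len(daily_rates)):
        baseline = daily_rates[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            continue  # flat baseline: z-score undefined, skip this day
        z = (daily_rates[i] - mu) / sigma
        if abs(z) > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily self-contained rates with a sharp drop on day index 8
rates = [0.62, 0.63, 0.61, 0.62, 0.64, 0.63, 0.62, 0.63, 0.48, 0.62]
print(flag_rate_anomalies(rates))  # [8]
```

An alerting job could call a function like this once per day and notify the product and operations teams whenever the returned list includes the latest day.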

Data Engineering and Storage:
- Stored structured customer service call data and analysis results in PostgreSQL
- Built ETL processes for the call records, supporting both real-time and batch processing
- Used pandas and NumPy for data preprocessing and feature engineering

Model Deployment and Monitoring:
- Developed and tested model prototypes in Jupyter Notebooks
- Set up MLOps processes covering model version control and automated deployment
- Implemented model performance monitoring and data drift detection
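
Data drift detection can take many forms, and the project does not say which it used. As one hedged illustration, the sketch below computes the Population Stability Index (PSI) between the predicted-intent distribution in a reference period and in a recent period. The function name and intent labels are hypothetical.

```python
import math
from collections import Counter


def population_stability_index(reference_labels, current_labels, eps=1e-6):
    """PSI between two categorical distributions (e.g. predicted intents)."""
    categories = set(reference_labels) | set(current_labels)
    ref_counts = Counter(reference_labels)
    cur_counts = Counter(current_labels)
    psi = 0.0
    for cat in categories:
        # eps keeps the log finite when a category is absent from one sample
        ref_share = max(ref_counts[cat] / len(reference_labels), eps)
        cur_share = max(cur_counts[cat] / len(current_labels), eps)
        psi += (cur_share - ref_share) * math.log(cur_share / ref_share)
    return psi


# Hypothetical predicted intents for a reference week and the current week
reference = ["billing"] * 50 + ["refill"] * 50
current = ["billing"] * 20 + ["refill"] * 80
print(round(population_stability_index(reference, current), 3))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as a substantial shift, though the right thresholds depend on the traffic and should be tuned on historical data.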

📊 Project Impact

Data Insights and Discovery Value:

Discovered 15+ new user intent categories through unsupervised clustering, giving the product team data to guide IVR flow optimization
Identified high-frequency user requirement scenarios that the IVR did not yet cover, directly informing product improvement priorities
Topic-modeling-based theme discovery provided an evidence base for customer service training and knowledge base construction

Machine Learning Model Performance Enhancement:

BERT intent classifier accuracy improved by approximately 12% over the original system, reducing user experience issues caused by misclassification
The deployed deep learning model handled complex user expressions and semantic variation better than the original system
The transfer learning strategy made effective use of cross-project labeled data, improving model generalization

Operational Efficiency and Automation Improvement:

Automated failure reason tagging, replacing a time-consuming manual analysis process
Established a real-time anomaly monitoring system that responds quickly to abnormal fluctuations in the self-contained rate
Provided customer service teams with data-driven training directions and system improvement suggestions

Business Value Creation:

Reduced human customer service workload by raising the automated resolution rate
Improved user experience, particularly in pharmaceutical consultation scenarios that demand high accuracy
Provided a replicable technical framework for intelligent customer service upgrades in the pharmaceutical industry

🛠️ Technology Stack

Machine Learning & Deep Learning:
  - BERT Encoder (Deep Language Understanding)
  - PyTorch (Deep Learning Framework)
  - Sentence Transformers (Semantic Vector Encoding)
  - Transformer Architecture (Attention Mechanism Model)

Unsupervised Learning:
  - UMAP (High-dimensional Data Dimensionality Reduction)
  - HDBSCAN (Hierarchical Density Clustering)
  - Topic Modeling (Theme Discovery)
  - Hierarchical Clustering (Cluster Hierarchy Analysis)

Large Language Models:
  - DeepSeek (Batch Intelligent Analysis)
  - Llama3.1 (Real-time Failure Analysis)
  - Prompt Engineering

Data Science & Analytics:
  - scikit-learn (Machine Learning Toolkit)
  - FAISS (Large-scale Vector Retrieval)
  - PostgreSQL (Structured Data Storage)
  - pandas (Data Processing & Analysis)
  - NumPy (Numerical Computing)

Development & Deployment:
  - Jupyter Notebooks (Model Development Environment)
  - Python (Core Development Language)
  - MLOps (Machine Learning Operations)
  - A/B Testing (Model Effect Validation)

Statistical Analysis:
  - Anomaly Detection
  - Time Series Analysis
  - Statistical Modeling
  - Performance Metrics Analysis

This project demonstrates the combined application of unsupervised machine learning, deep learning, and large language models to customer service system optimization, offering a practical reference for intelligent customer service upgrades in the pharmaceutical industry.