2025-01-17

Clinical Data Visualization POC

Clinical trial data visualization proof of concept system using Streamlit and Plotly

Tech Stack
Streamlit LLM code generation Plotly Data Visualization Python Interactive Dashboard
Clinical Data Visualization POC

Clinical Trial Data Visualization POC System

📋 Project Overview

In clinical trials within the pharmaceutical industry, researchers need to process large amounts of trial data and generate standardized report tables and visualization charts. Traditional data processing workflows typically require manual data organization, table creation, and then using professional tools to create charts - a time-consuming and error-prone process. We developed a POC system based on AWS Claude 3 multimodal AI that achieves end-to-end automation from clinical trial data to HTML tables to matplotlib charts. This system combines text understanding and image understanding capabilities, automatically creating corresponding visualization charts based on users' natural language descriptions or reference images, compressing data processing work that originally took hours into minutes of automated workflow.

🚀 Key Features

Core Implementation

  • Multimodal AI-Driven Table Generation: Used AWS Claude 3 Sonnet model to automatically parse clinical trial data and generate HTML tables conforming to standard formats
  • Intelligent Chart Code Generation: Based on natural language descriptions or reference images, automatically generates complete matplotlib Python code
  • Dynamic Code Execution Engine: System automatically executes generated Python code, ensuring reliability of chart output
  • Multimodal Input Support: Supports both text descriptions and image references as input methods, meeting different user needs
  • Clinical Data Standardization: Automatically processes complex data formats in clinical trials, including percentages in parentheses, statistical data, etc.

Technical Highlights

  • AWS Claude 3 Multimodal Capabilities: Combined text understanding and computer vision to achieve intelligent chart style imitation and data filling
  • BeautifulSoup Data Processing: Precisely parsed generated HTML tables, converting them to pandas DataFrame for subsequent analysis
  • Dynamic Python Code Execution: Realized real-time execution and result verification of generated code through exec() function
  • Streamlit Interactive Interface: Provided intuitive Web interface supporting data upload, process display, and result download

💻 Project Detail

Our multimodal clinical data visualization system addresses core pain points in data processing and visualization in pharmaceutical research. The specific implementation process is as follows:

  1. Clinical Data Input Processing:

  2. Users input raw clinical trial data text through Streamlit interface

  3. System automatically identifies and parses data structure, including subject information, symptom records, statistical indicators, etc.
  4. Supports multiple clinical trial data formats with automatic standardization processing

  5. Intelligent HTML Table Generation:

  6. AWS Claude 3 Sonnet model deeply analyzes input data, identifying key clinical indicators and values

  7. Automatically fills data according to predefined HTML templates, generating standardized tables
  8. Supports multiple clinical trial common table types such as Subject Details and Summary of Signs
  9. Automatically processes complex formats like percentages in parentheses, confidence intervals, and other statistical data

  10. Table Data Structured Conversion:

  11. Used BeautifulSoup to precisely parse generated HTML table structure

  12. Extracted headers, row data, and statistical information, converting to pandas DataFrame format
  13. Ensured correct data type conversion, providing clean data sources for subsequent chart generation

  14. Multimodal Chart Generation:

  15. Text-Driven Mode: Users describe chart requirements in natural language (type, style, color scheme, etc.), Claude 3 generates complete matplotlib code

  16. Image Reference Mode: Upload reference images, AI analyzes image style features and replicates similar chart layouts and visual effects with new data
  17. Automatically generates complete Python code including data processing, chart drawing, and style settings

  18. Code Execution and Result Verification:

  19. System uses Python exec() function to dynamically execute generated matplotlib code

  20. Real-time checking of code execution status, automatically handling possible syntax errors or data mismatches
  21. Automatically saves generated chart files, supporting multiple output formats

  22. Intelligent Chart Interpretation:

  23. Used multimodal AI to analyze generated charts, providing concise descriptions and data insights within 3 sentences
  24. Automatically identifies data trends, outliers, and key statistical features
  25. Provides professional data interpretation recommendations for clinical researchers

The entire system ensures AI can accurately understand professional terminology and data format requirements in clinical trials through carefully designed prompt engineering.

📊 Project Impact

Pharmaceutical Research Efficiency Enhancement:

  • Successfully validated application potential of multimodal AI in pharmaceutical data processing, laying technical foundation for subsequent production-level system development
  • Accurately processed complex clinical data formats in testing, generating tables and charts conforming to industry standards
  • Significantly improved data processing efficiency, providing innovative ideas for digital transformation in pharmaceutical industry

Technical Innovation Validation:

  • Demonstrated practical application value of Claude 3 multimodal capabilities in professional domains
  • Validated feasibility and accuracy of natural language to code generation
  • Provided proof of concept for AI-driven scientific data visualization

System Design Value:

  • Modular architecture design supports subsequent functionality expansion and customization
  • Multimodal input methods significantly lowered user barriers
  • Automated code execution mechanism ensured system reliability and practicality

🛠️ Technology Stack

AI & Machine Learning:
  - AWS Claude 3 Sonnet (Multimodal AI Understanding & Generation)
  - AWS Bedrock (Enterprise-grade AI Service Platform)
  - Natural Language Processing (Clinical Data Semantic Understanding)
  - Computer Vision (Reference Image Style Analysis)
  - Multimodal AI (Text + Image Understanding)

Data Processing & Visualization:
  - pandas (Data Processing & Analysis)
  - matplotlib (Professional Chart Generation & Visualization)
  - BeautifulSoup (HTML Table Parsing & Processing)
  - Python exec() (Dynamic Code Execution Engine)

Web Framework & Interface:
  - FastAPI (High-performance RESTful API)
  - Streamlit (Interactive Frontend Interface)
  - Pydantic (Data Validation & Serialization)

Document & Template Processing:
  - HTML Templates (Standardized Table Templates)
  - Regular Expressions (Code Extraction & Parsing)
  - File System Storage (Chart & Data Storage)

Clinical Data Handling:
  - Clinical Trial Data Processing (Clinical Trial Data Processing)
  - Statistical Data Formatting (Statistical Data Formatting)
  - Medical Terminology Recognition (Medical Terminology Recognition)
  - Subject Details Analysis (Subject Details Analysis)

Infrastructure & Deployment:
  - Docker (Containerized Deployment)
  - AWS Infrastructure (Cloud AI Services)
  - Environment Variables (Configuration Management)
  - Error Handling (Exception Handling Mechanism)

This project demonstrates the application potential of multimodal generative AI in pharmaceutical data processing, providing innovative technical validation for automated processing and visualization of clinical trial data.

Harvey

Full Stack Developer

A full-stack developer passionate about solving real-world business challenges, with expertise in data science and artificial intelligence.

Contact Me