Use of Azure for OCR and Parsing of Scanned Medical Content

Historically, the healthcare industry has relied heavily on paper records - a method fraught with challenges like physical storage, accessibility and sharing, and data retrieval inefficiencies. The shift towards digital solutions through Centers for Medicare & Medicaid Services (CMS) interoperability initiatives like MyHealthEData not only addresses these issues but also paves the way for advanced data analysis and improved patient outcomes.

The digital transformation in healthcare is accelerating with the integration of technologies like Azure Document Intelligence and Text Analytics for health. This powerful new technology offers an innovative solution for healthcare businesses transitioning from paper-based records to full digital documentation. By focusing on the extraction of critical information like diagnoses, medications, and protected health information (PHI), medical institutions and ancillary partners will enhance workflow efficiency and ensure better patient care.

Azure Document Intelligence is an advanced tool for converting images of text into digital text. When combined with Azure Text Analytics for health, which is specifically tailored to understand and extract medical data, analysts can extract and collate information from unstructured text, providing valuable insights and structured data.

How integration works:

  1. Digitizing documents with Azure Document Intelligence: The first step involves converting physical documents into digital text. Azure Document Intelligence accurately scans medical records, prescriptions, and other digitized physical documents.

  2. Extracting medical insights with Text Analytics for health: Once the text is digitized, Azure Text Analytics for health comes into play. This service specializes in extracting medical information such as diagnoses, symptoms, medications, and dosages from unstructured text.

Benefits for Healthcare Businesses:

  • Enhanced data accessibility and management: Digitizing records makes them easily accessible, searchable, and manageable.

  • Improved accuracy and efficiency: Automated text extraction reduces human error and speeds up the process of data entry and analysis.

  • Better patient care: Quick access to accurate patient data allows healthcare providers to make more informed decisions, improving patient outcomes such as leveraging a more complete snapshot of a patient’s health history.

  • Data compliance and cost-effective storage: Digitizing records helps in maintaining compliance with health data regulations such as the HIPAA Privacy Rule. Digital storage is more cost-effective by shrinking the need for on- or off-site record management.

Final Thoughts:

The integration of Azure Document Intelligence for text optical character recognition (OCR) and Text Analytics for health is more than just a technical advancement; it's a transformative shift for the entire healthcare sector. By harnessing the power of these AI-driven tools, healthcare providers and organizations are not only streamlining their data management processes but are also unlocking new potentials in patient care and medical research.

The integration of Azure Document Intelligence and Text Analytics for health represents a significant leap forward in the digital healthcare landscape. It's a pivotal development that not only addresses current operational challenges but also sets the stage for future innovations in patient care and medical research. As healthcare continues to evolve, embracing such technologies will be crucial for healthcare providers aiming to deliver superior patient care and stay at the forefront of medical innovation. Read on below for a sample workflow and code sample.


Quick Start Guide:

  1. Create an Azure account by visiting https://azure.microsoft.com/en-us/free with your web browser.

2. Create a Document Intelligence resource in Azure by searching for “Document Intelligence” in the Azure Marketplace.  This resource will allow for OCR of scanned documents.

3. Once the Document Intelligence resource has been created, browse to the created resource and click on the “Keys and Endpoint” tab to copy “Key 1” and the “Endpoint” values.  Copy these entries to a safe place so that you may add them to the project later.

4. Create a Language Service resource in Azure by searching for “Language service”.  This resource will allow for retrieval of healthcare related entities from our converted documents.

5. Once the Language Service resource has been created, browse to the created resource and click on the “Keys and Endpoint” tab to copy “Key 1” and the “Endpoint” values.  Copy these entries to a safe place so that you may add them to the project later

6. Start your preferred IDE and create a new console application

7. Install the Azure.AI.FormRecognizer and Azure.AI.TextAnalytics packages from NuGet

8. In Program.cs, delete all existing content and replace it with the following code sample:

  
using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis; // DocumentAnalysisClient used for OCR
using Azure.AI.TextAnalytics; // TextAnalyticsClient used for PHI

// Azure Document Intelligence key and endpoint
string ocrKey = "YOUR_DOCUMENT_INTELLIGENCE_KEY";
string ocrEndpoint = "YOUR_DOCUMENT_INTELLIGENCE_ENDPOINT";

// OCR credentials and client
AzureKeyCredential ocrCredential = new AzureKeyCredential(ocrKey);
DocumentAnalysisClient ocrClient = new DocumentAnalysisClient(new Uri(ocrEndpoint), ocrCredential);

// uri to sample document
Uri fileUri = new Uri("https://samplesite.com/sample.png");

// Analyze the document and return the result
AnalyzeDocumentOperation operation = await ocrClient.AnalyzeDocumentFromUriAsync(WaitUntil.Completed, "prebuilt-read", fileUri);
AnalyzeResult result = operation.Value;

// Azure Language key and endpoint
string languageKey = "YOUR_LANGUAGE_KEY";
string languageEndpoint = "YOUR_LANGUAGE_ENDPOINT";

// Language credentials and client
AzureKeyCredential languageCredential = new AzureKeyCredential(languageKey);
var languageClient = new TextAnalyticsClient(new Uri(languageEndpoint), languageCredential);

// Analyze the document and return the health entities
AnalyzeHealthcareEntitiesOperation healthOperation = await languageClient.StartAnalyzeHealthcareEntitiesAsync(new List< string >(){result.Content });
await healthOperation.WaitForCompletionAsync();

await foreach (AnalyzeHealthcareEntitiesResultCollection documentsInPage in healthOperation.Value)
{
    // Extracting specific information
    foreach (var doc in documentsInPage)
    {
        if (!doc.HasError)
        {
            foreach (var entity in doc.Entities)
            {
                if (entity.Category == "Diagnosis" || entity.Category == "Condition")
                {
                    Console.WriteLine($"Diagnosis: {entity.Text}");
                }
                else if (entity.Category == "MedicationName")
                {
                    Console.WriteLine($"Medication: {entity.Text}");
                }
                else if (entity.Category == "Age" || entity.Category == "Gender" || entity.Category == "PhoneNumber")
                {
                    Console.WriteLine($"PHI: {entity.Text} (Category: {entity.Category})");
                }
            }
        }
    }
}

9. Replace the values of “ocrKey” and “ocrEndpoint” with your saved values from your Azure Document Intelligence resource

10. Replace the values of “languageKey” and “languageEndpoint” with your saved values from your Azure Language resource

11. Change the value of “fileUri” to a location of the document you wish to scan.

12. A sample document can be found below:

13. Run the console application.

14. The results should be as follows with the sample document:

PHI: 22 (Category: Age)
PHI: Female (Category: Gender)
Diagnosis: Displaced Fracture
Medication: Ibuprofen
Mike Sipes

Mike Sipes is a seasoned full-stack software engineer at Invene, boasting over 27 years of experience in developing enterprise-class software. His extensive career has seen him creating innovative solutions in diverse industries, including the medical sector, library automation, and real estate, catering to the needs of both corporate entities and governmental organizations globally.

Renowned for his practical approach and unwavering commitment to quality, Mike consistently delivers software experiences that surpass client expectations. His expertise extends beyond coding; he excels in leading teams, fostering a culture of collaboration and innovation to overcome complex challenges.

Mike's passion for technology and problem-solving shines through in his work, establishing him as a reliable and invaluable contributor to any project. His adaptability and wide-ranging experience across various markets underscore his status as a versatile and proficient engineer in the dynamic world of software development.

Previous
Previous

Introduction to Basic Statistical Modeling with Healthcare Data

Next
Next

Invene Welcomes New VP of Engineering