Scalable Analytics Platform for Digital Healthcare

Emerline developed and implemented a scalable analytics platform for a healthcare company, enabling it to provide transparent reporting and real-time analytics to its clients.

Client and Challenge

Our client is a Norwegian innovator in the digital healthcare industry. The company offers technology for collecting data from various advertising sources (Facebook Ads, Google Ads, social media) and from user-activity events on its website, and builds dashboards that compare user activity against advertising campaigns. The end users of the developed solution are the company's own clients.

When the company approached Emerline, it faced a number of critical challenges that threatened its ability to scale and effectively serve its clients:

  • Extreme scalability requirements

The primary challenge was the need to create a solution capable of handling the current user base and seamlessly scaling 10x by year-end and 100x within a few years, without requiring significant additional DevOps or development resources.

  • High data ingest and analytics costs

The client needed to collect data from multiple advertising sources and user-activity streams and build dashboards for comparing this data, all under strict limits on total project and ongoing support costs. This forced careful, cost-driven tool selection, notably self-hosted Metabase as the BI tool and AWS Athena as the database backing it.

  • Complexity of data security and GDPR compliance

In the healthcare and digital marketing industries, working with user data required strict implementation of Row-Level Security (RLS) and full compliance with the General Data Protection Regulation (GDPR). This posed a significant challenge within the chosen architecture.

  • Disparate data sources and lack of unified analytics

Information came from numerous disparate advertising platforms and user behavior analytics systems (PostHog), creating difficulties in data unification and obtaining a comprehensive picture of campaign effectiveness.

Methodology

Emerline applied a highly adaptive and cost-effective Time and Materials approach. This enabled us to efficiently solve complex scaling and data security challenges under strict budget constraints.

Phase 1: Requirements definition and cost analysis (cost-optimization focus)

This phase involved in-depth research of the client's business goals: collecting data from advertising sources and user activity, and creating dashboards for comparing campaigns. The most complex aspect was the overall project cost optimization. Critical decisions were made regarding tool selection, meticulously evaluating Metabase and AWS Athena for their cost and effectiveness.

Phase 2: Architecture designed for scalable growth

Based on projected data volume growth — 10× by year-end and up to 100× in the following years — the architecture was intentionally designed for scalability, elasticity, and operational efficiency. Databricks was selected as the core platform due to its ability to handle large-scale batch and streaming workloads with minimal DevOps overhead. Its native support for Delta Lake, job orchestration, and collaborative development reduced the need for external tooling and engineering effort. The final architecture seamlessly integrated data from sources like PostHog and multiple advertising networks, while remaining flexible enough to scale linearly without major refactoring or infrastructure rework.

Phase 3: ETL and data transformation development

Fundamental data engineering work, including the implementation of ETL (Extract, Transform, Load) processes using Databricks, proceeded very smoothly and without significant issues. This allowed the team to efficiently transition to subsequent project phases.

Phase 4: Security and compliance implementation

At this stage, the main task was the implementation of robust Row-Level Security (RLS) and ensuring full GDPR compliance within the chosen architecture.

Phase 5: Deployment and monitoring

The solution was successfully deployed and launched. After the launch, continuous monitoring was established to track performance, stability, and compliance with key metrics.

Justification for Technology Stack Selection

The choice of this technology stack is driven by the aim to ensure high performance, scalability, flexibility, and cost-effectiveness of the solution, as well as to accommodate the specifics of working with large volumes of distributed data:

  • Amazon Athena and Amazon S3 (Lakehouse Architecture)

    This combination creates a Lakehouse architecture: vast amounts of raw data are stored in S3 (cost-effective, scalable object storage) and queried directly with standard SQL via Athena, without moving the data into a traditional data warehouse. This provides flexibility, reduces ETL costs, and supports a variety of data formats, such as JSON.

  • Airbyte

    Airbyte was chosen for efficient, seamless data loading from a variety of sources, including internal ones. It offers an extensive library of connectors, simplifying integration and reducing data-pipeline development time.
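
The schema-on-read idea behind the Athena-over-S3 choice above can be sketched in a few lines: raw JSON stays in object storage and a structure is imposed only at query time, with no load step into a warehouse first. The event fields below are illustrative assumptions, not the client's schema:

```python
# Conceptual schema-on-read: the raw file is "queried" directly, which is
# roughly what Athena does over objects in S3. Fields are assumptions.
import json

raw_s3_object = "\n".join([
    '{"event": "signup", "country": "NO"}',
    '{"event": "click", "country": "SE"}',
])

# Apply structure at read time and filter -- no upfront warehouse load.
records = [json.loads(line) for line in raw_s3_object.splitlines()]
signups = [r for r in records if r["event"] == "signup"]
```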

Collectively, this technology stack allowed for the creation of a flexible, scalable, and cost-effective solution capable of processing large volumes of data, ensuring high transparency, and supporting complex analytics for geographically dispersed teams, fully meeting the client's stated business objectives.

Solution

We developed a comprehensive analytics platform that automates the entire client reporting process — from data ingestion and processing to visualization. This enables our client's customers to gain on-demand access to service performance information, demonstrating value, ensuring transparency, and facilitating data-driven decision-making.

Key functional modules and technological features of the solution:

Data integration

The solution ingests data from disparate sources: product analytics data from PostHog and marketing performance data from social-network advertising platforms.

Data processing and transformation

A multi-layered Medallion architecture (Bronze, Silver, Gold) is implemented using Databricks Delta Tables, enabling efficient, reliable, and ACID-compliant data processing at scale. Thanks to native support for incremental processing, workflow scheduling, and notebook-driven development within Databricks, the solution operates without traditional orchestration tools, simplifying pipeline management and reducing operational overhead. This architecture ensures high data reliability, integrity, and readiness for downstream analytics.
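
The Bronze-to-Silver-to-Gold progression can be illustrated with plain Python lists; on the real platform each layer is a Databricks Delta table and the transforms run as PySpark jobs, so the data below is purely a sketch:

```python
# Minimal illustration of the Medallion flow. Row contents are invented.
from collections import defaultdict

bronze = [  # raw events exactly as ingested (may contain duplicates/nulls)
    {"client": "a", "campaign": "spring", "clicks": 10},
    {"client": "a", "campaign": "spring", "clicks": 10},  # duplicate
    {"client": "b", "campaign": "launch", "clicks": None},  # invalid row
]

# Silver: deduplicated, validated records
seen, silver = set(), []
for row in bronze:
    key = (row["client"], row["campaign"], row["clicks"])
    if row["clicks"] is not None and key not in seen:
        seen.add(key)
        silver.append(row)

# Gold: business-level aggregate ready for dashboards
gold = defaultdict(int)
for row in silver:
    gold[(row["client"], row["campaign"])] += row["clicks"]
```

Each layer only ever reads from the one before it, which is what makes the pipeline incremental and safely re-runnable.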

Embedded analytics and visualization

Processed data is presented to end users (the client's customers) via secure, embedded dashboards, accessible directly within their user environment, providing a seamless analytical experience.

High scalability

The solution's architecture, built on Databricks on AWS, is designed to handle a 100x increase in users without requiring rework or significant additional resources.

Cost-efficient BI access via multi-catalog architecture

Curated Databricks Delta Lake tables (Gold layer), governed via Unity Catalog, are exposed to Amazon Athena through the AWS Glue Data Catalog, enabling interactive SQL access without triggering Databricks clusters. This setup forms a multi-catalog architecture, allowing data to remain in a single storage layer (S3) while being accessible from multiple compute engines — Databricks for heavy processing and Athena for lightweight BI workloads. As a result, the platform minimizes compute costs by leveraging Athena’s pay-per-scan pricing model instead of Databricks' per-cluster runtime, while eliminating data duplication and preserving a unified view across the stack.
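
The cost argument behind this split can be made concrete with a back-of-envelope comparison of the two pricing models: Athena's published pay-per-scan rate (roughly $5 per TB scanned, region-dependent) versus keeping a small Databricks cluster running for interactive BI. All figures below are illustrative assumptions, not the client's actual numbers:

```python
# Rough monthly cost comparison. Every number here is an assumption
# chosen for illustration, not taken from the project.
ATHENA_USD_PER_TB = 5.0       # approximate published pay-per-scan rate
gb_scanned_per_day = 50       # assumed BI query volume
athena_monthly = gb_scanned_per_day / 1024 * ATHENA_USD_PER_TB * 30

cluster_usd_per_hour = 2.0    # assumed small always-on cluster cost
cluster_monthly = cluster_usd_per_hour * 24 * 30

print(f"Athena:  ${athena_monthly:.2f}/month")
print(f"Cluster: ${cluster_monthly:.2f}/month")
```

Under these assumptions the pay-per-scan model is cheaper by two orders of magnitude for light, intermittent BI traffic, which is exactly the workload profile of embedded client dashboards.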

Row-level security (RLS) and GDPR compliance

The primary security measure is the implementation of RLS, which ensures that each client can view only the data for which they have explicit permissions. The entire solution adheres to GDPR principles, including privacy by design and data minimization.
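
Conceptually, RLS means every query result is filtered by the requesting client's identity before it leaves the platform. On the real stack this is enforced declaratively (for example via filtered views or Metabase's permission features) rather than in application code; the names below are assumptions for the sketch:

```python
# Conceptual sketch of row-level security. Column and id names are invented.
rows = [
    {"client_id": "c1", "campaign": "spring", "spend": 120.0},
    {"client_id": "c2", "campaign": "launch", "spend": 300.0},
]

def query_with_rls(rows, requesting_client_id):
    """Return only the rows the requesting client is permitted to see."""
    return [r for r in rows if r["client_id"] == requesting_client_id]

visible = query_with_rls(rows, "c1")
```

Enforcing the filter at the data layer, rather than trusting each dashboard to filter correctly, is what makes the guarantee hold for every embedded client view.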

Technology Stack

The solution is built on the AWS cloud platform using Databricks, ensuring high performance, scalability, and security for big data analytics:

Cloud platform

Amazon Web Services (AWS)

AWS IAM (for access management)

AWS S3 (for data storage)

AWS Athena

Data platform

Databricks

Databricks Lakehouse Platform (version 13.3)

Delta Lake (as the storage format)

BI and visualization

Metabase

Data loading (Ingestion)

Airbyte Cloud (for external data sources)

PostHog Data Pipelines (for internal user events)

Databases

Databricks Lakehouse Platform

AWS Athena (as a database for the BI tool)

Programming languages

PySpark

SQL

Operating system

Managed by cloud providers

Results

Upon successful completion, the project demonstrated cost-effectiveness, flexibility, and scalability, fully meeting the client's key requirements.

Key achievements at the current stage:

  • Real-time metrics access

    Ensured fast, on-demand access to analytical metrics, which significantly accelerated the process of gaining insights and making decisions.

  • Team unification

    A Single Source of Truth was established, which unified team workflows and ensured data consistency.

  • Data management centralization

    Data management was centralized for multiple business units, ensuring data consistency and accuracy across the entire organization.

  • Scalability

    The solution's architecture, built on Databricks on AWS, can handle a 100x increase in the user base without requiring significant additional resources, enabling business growth without technological limitations.

  • Cost-effectiveness and flexibility

    The solution demonstrated high cost-effectiveness and flexibility, fully meeting the client's key requirements and staying within budget constraints.

  • Reliable data architecture

    Data engineering and the implementation of ETL processes on Databricks were successfully completed, providing a reliable and stable foundation for the platform's further development.

  • Support and maintenance

    Comprehensive documentation was provided and training was conducted for the seamless implementation and ongoing operation of the platform.
