Building a Full-scale Predictive Maintenance Platform for a Large Iron Mining Company

We reimagined the technical maintenance procedures of a large iron mining company by building a full-scale predictive maintenance solution and integrating it into their workflows.

Background

The client, one of their country's largest iron mining companies, sought ways to minimize production losses. The main issues leading to these losses were equipment failures and malfunctions, so they needed a solution that would allow them to detect early signs of equipment failures, prevent them, and, thus, achieve the highest level of equipment reliability and performance.

They came to us with the idea of a full-scale preventive maintenance platform. The solution was supposed to streamline the client’s mission-critical production workflows as follows:

  • Reduce production downtime and financial losses associated with equipment failures and breakdowns;
  • Reduce production downtime due to planned maintenance procedures;
  • Minimize costs of technical maintenance procedures;
  • Improve workplace health and safety and reduce adverse environmental effects.

Challenge

During the discovery phase, we realized that the new solution could not be built on top of the existing one because the two were incompatible. The best option was to migrate data from the legacy enterprise resource planning (ERP) platform to its modern successor, which we decided to extend with preventive maintenance functionality. The task was feasible, but it posed the following challenges:

Data migration and synchronization between the two systems

When moving from the legacy, out-of-the-box solution to a custom-built one, we had to preserve all of the client’s critical data: five years of records had to be transferred to the new database structure while new functionality was being added to the platform.
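A migration of this size is usually verified by reconciling source and target records. The sketch below shows one way to do that in Java, the project’s backend language, assuming every record can be reduced to a business key and a flat map of field values; `MigrationCheck` and `reconcile` are hypothetical names, not part of the client’s system.

```java
import java.util.*;

/** Hypothetical reconciliation helper: compares legacy and migrated records by business key. */
final class MigrationCheck {

    /** Keys missing after migration, keys that appeared only in the target, and keys whose values changed. */
    record Report(Set<String> missing, Set<String> unexpected, Set<String> changed) {
        boolean clean() { return missing.isEmpty() && unexpected.isEmpty() && changed.isEmpty(); }
    }

    /**
     * @param legacy   records exported from the legacy ERP, keyed by business key (assumed)
     * @param migrated records read back from the new database, keyed the same way
     */
    static Report reconcile(Map<String, Map<String, String>> legacy,
                            Map<String, Map<String, String>> migrated) {
        Set<String> missing = new TreeSet<>(legacy.keySet());
        missing.removeAll(migrated.keySet());          // present in legacy, absent after migration

        Set<String> unexpected = new TreeSet<>(migrated.keySet());
        unexpected.removeAll(legacy.keySet());         // present only in the target

        Set<String> changed = new TreeSet<>();
        for (Map.Entry<String, Map<String, String>> e : legacy.entrySet()) {
            Map<String, String> target = migrated.get(e.getKey());
            if (target != null && !target.equals(e.getValue())) {
                changed.add(e.getKey());               // same key, different field values
            }
        }
        return new Report(missing, unexpected, changed);
    }
}
```

Run against real exports, the three sets show which records were lost, which appeared only in the new database, and which changed in transit.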

Slow performance

Because of the legacy system’s limited performance, it was not possible to deploy preventive maintenance functionality capable of handling large volumes of condition data collected from multiple machines simultaneously.

Methodology & Approach

Flexible project management via the Agile methodology

For this project, we decided to take an agile development approach for the following reasons:

  • Changing requirements: We worked in a highly dynamic environment and stayed in sync with the client throughout development. Migration is always difficult and has pitfalls, some of which cannot be foreseen at the outset. We therefore needed to provide status updates and adjust plans flexibly, keeping them aligned with the client’s expectations as new challenges emerged during development.
  • Lockdown and remote work: Our team worked on the project during the global pandemic, which required remote collaboration between our team and the client. We therefore needed to maintain continuous communication among all team members and give the client regular, detailed, and transparent feedback on project results to ensure we stayed on track and met their expectations.

Based on the Agile approach, we divided the development process into two-week sprints, providing progress reports at the end of each sprint. Additionally, we synchronized on project status once a week. Thanks to this transparent approach to project communication, we could proactively address any unexpected issues during the process and dynamically adjust client requirements to the new data architecture.

As a result, we streamlined the development process and delivered a production-ready preventive maintenance solution within a tight deadline.

Reliability-Centered Maintenance (RCM)

The client approached us with the idea of a Reliability-Centered Maintenance (RCM) solution. RCM is a maintenance approach designed to help organizations increase equipment uptime and reduce the need for asset replacements.

RCM is built on several core principles:

  • Ensure asset functionality: Focus on keeping assets operational and capable of performing their intended tasks efficiently.
  • Identify failure scenarios: Determine potential ways equipment may fail to meet performance expectations, including full or partial malfunctions.
  • Conduct Failure Modes and Effects Analysis (FMEA): Analyze potential failure modes, evaluate their impact, and rank them according to severity and overall consequences.
  • Risk-based maintenance prioritization: Base maintenance priorities on the risks associated with equipment failures, considering factors like safety, environmental concerns, and operational impacts.
  • Select optimal maintenance strategies: Opt for the most suitable maintenance strategy — whether preventive, predictive, condition-based, or corrective — based on identified failure risks.
  • Ongoing improvement: Continuously reassess and refine maintenance strategies as new information and data become available.
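The FMEA principle above is often implemented with a risk priority number (RPN): the product of severity, occurrence, and detection ratings, each conventionally scored from 1 to 10. The source does not say which scoring scheme the client used, so the following is a minimal sketch of the general technique; the class name and the tie-breaking rule are assumptions.

```java
import java.util.*;

/** Illustrative FMEA ranking by risk priority number (RPN = severity x occurrence x detection). */
final class Fmea {

    /** Ratings are assumed to use the conventional 1..10 scale. */
    record FailureMode(String name, int severity, int occurrence, int detection) {
        FailureMode {
            for (int r : new int[] {severity, occurrence, detection}) {
                if (r < 1 || r > 10) throw new IllegalArgumentException("rating out of 1..10: " + r);
            }
        }

        int rpn() { return severity * occurrence * detection; }
    }

    /** Returns the modes sorted by descending RPN; ties go to the higher severity. */
    static List<FailureMode> rank(List<FailureMode> modes) {
        List<FailureMode> sorted = new ArrayList<>(modes);
        sorted.sort(Comparator.comparingInt(FailureMode::rpn)
                .thenComparingInt(FailureMode::severity)
                .reversed());
        return sorted;
    }
}
```

Breaking ties on severity is one common refinement, since two modes with the same RPN are rarely equally acceptable if one of them is a safety hazard.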

We planned to develop software that would enable the client to create equipment maintenance plans based on the RCM approach, as follows:

  1. Compose an equipment catalog and break it down into distinct units. We assigned a criticality rank to each unit, which was calculated based on predetermined criteria.
  2. Develop a maintenance strategy for each unit. A strategy is a sequence of actions needed to keep the equipment in the required condition. Each action aimed to reduce the likelihood of a failure or the severity of its consequences, should the failure occur.
  3. Collect the strategies in a process chart. A process chart is a set of instructions for servicing the unit; by following it, we minimized the risk of a negative event.
  4. Draw up a maintenance plan. We pulled actions and necessary resources from the process chart. We then transformed the plan into shift tasks for the repair personnel.
  5. Perform work on the units according to the selected strategy and monitor the quality of that work. If any shortcomings or malfunctions were detected in the process, the performer recorded them for further analysis.
  6. Collect statistics on malfunctions and investigate their root causes. If required, we made changes to maintenance strategies.

Risk matrix

We created a risk matrix in which all risks were grouped by their criticality criteria. Next, we performed a risk priority analysis and developed maintenance strategies for various conditions and operating modes. Finally, we tested various scenarios to define the optimal strategy for each risk level.
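A risk matrix of this kind can be expressed as a lookup from likelihood and severity to a risk level, and from the level to a default strategy type. The sketch below assumes a 5×5 matrix with hypothetical thresholds and a hypothetical level-to-strategy mapping drawn from the strategy types listed under the RCM principles; the client’s actual matrix and thresholds are not given in the source.

```java
/** Illustrative 5x5 risk matrix: likelihood x severity -> risk level -> default strategy. */
final class RiskMatrix {

    enum Level { LOW, MEDIUM, HIGH, CRITICAL }

    enum Strategy { PREDICTIVE, CONDITION_BASED, PREVENTIVE, CORRECTIVE }

    /** Both inputs on an assumed 1..5 scale; the thresholds below are assumptions. */
    static Level classify(int likelihood, int severity) {
        if (likelihood < 1 || likelihood > 5 || severity < 1 || severity > 5) {
            throw new IllegalArgumentException("scores must be in 1..5");
        }
        int score = likelihood * severity;   // ranges from 1 to 25
        if (score >= 15) return Level.CRITICAL;
        if (score >= 10) return Level.HIGH;
        if (score >= 5)  return Level.MEDIUM;
        return Level.LOW;
    }

    /** Hypothetical default mapping from risk level to strategy type. */
    static Strategy defaultStrategy(Level level) {
        return switch (level) {
            case CRITICAL -> Strategy.PREDICTIVE;
            case HIGH -> Strategy.CONDITION_BASED;
            case MEDIUM -> Strategy.PREVENTIVE;
            case LOW -> Strategy.CORRECTIVE;
        };
    }
}
```

Testing scenarios then amounts to varying the inputs and thresholds and comparing the resulting strategy assignments.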

Minimum Viable Product (MVP)

At the MVP stage, we created basic spreadsheets in Microsoft Excel and drafted repair and inspection plans in them. Even this simple form of task management produced encouraging results: we identified and removed redundant operations that did not reduce the risk of failure. We also improved the quality of maintenance strategies, which reduced both the number of accidents and maintenance costs.

Solution

In due course, we delivered a full-scale preventive maintenance platform comprising the following solutions.

Full-cycle predictive maintenance functionality

Based on the RCM approach, we developed the following solutions as part of the new platform:

1. Smart risk assessment tools

On the new platform, the client could conveniently monitor all risks, grouped according to the predetermined risk criteria.

During internal brainstorming sessions, we prioritized the most severe risks and important equipment assets. At this ideation stage, we heavily relied on data about past failures (in the form of failure alerts, notifications, and messages) that we collected from the client’s legacy system.

We evaluated each asset against four criteria (workplace safety, environmental impact, production quality, and financial risk), which formed the four categories of our risk matrix.

Take a wagon dumper as an example. If one of its support rollers jams, the unit’s performance drops, but no accident occurs. However, the destruction of a rotor bearing leads to an emergency long-term stop and high replacement costs.
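To make the wagon dumper example concrete, the sketch below scores a failure event in each of the four risk matrix categories and takes the worst score as the event’s criticality. The 1 to 5 scale, the max rule, and all names are assumptions for illustration; the source does not state how the client aggregated the categories.

```java
import java.util.*;

/** Hypothetical per-category consequence scoring for a failure event. */
final class ConsequenceAssessment {

    /** The four categories of the risk matrix. */
    enum Category { SAFETY, ENVIRONMENT, PRODUCTION, FINANCIAL }

    /**
     * Criticality is taken as the worst score across categories (a conservative rule);
     * returns 0 when no scores are given.
     */
    static int criticality(Map<Category, Integer> scores) {
        return scores.values().stream().mapToInt(Integer::intValue).max().orElse(0);
    }

    /** The category that drives the criticality, if any scores were given. */
    static Optional<Category> dominant(Map<Category, Integer> scores) {
        return scores.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }
}
```

With, say, a production score of 3 for a jammed support roller and a financial score of 5 for a destroyed rotor bearing (illustrative values), the bearing failure ranks as the more critical event.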

Next, we provided recommendations to avoid or mitigate the identified risks.

2. Real-time performance monitoring dashboards

We created performance monitoring dashboards that enabled operators to estimate the financial losses that would result if a risk materialized, as well as the funds required to avoid it.
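The figures on such a dashboard come down to a simple expected-loss calculation. The sketch below assumes each risk is described by an annual probability, an expected downtime, a cost per downtime hour, a repair cost, and the annual cost of its preventive strategy, and it simplifies by treating the strategy as removing the risk entirely; all names and fields are hypothetical.

```java
/** Hypothetical cost model behind a risk dashboard tile. All inputs are illustrative. */
final class RiskCost {

    record Risk(String name,
                double annualProbability,    // assumed chance the event occurs within a year, 0..1
                double downtimeHours,        // expected stop duration if it occurs
                double costPerDowntimeHour,  // lost production per hour of downtime
                double repairCost,           // materials and labor for the repair
                double annualMitigationCost  // yearly cost of the preventive strategy
    ) {}

    /** Loss if the event happens once: downtime cost plus repair cost. */
    static double lossIfOccurs(Risk r) {
        return r.downtimeHours() * r.costPerDowntimeHour() + r.repairCost();
    }

    /** Probability-weighted yearly loss. */
    static double expectedAnnualLoss(Risk r) {
        return r.annualProbability() * lossIfOccurs(r);
    }

    /**
     * Positive when prevention is expected to pay for itself; simplifying assumption:
     * the strategy removes the risk completely.
     */
    static double netBenefitOfMitigation(Risk r) {
        return expectedAnnualLoss(r) - r.annualMitigationCost();
    }
}
```

For example, a risk with a 25% annual probability, 48 hours of downtime at 10,000 per hour, and a 100,000 repair cost has an expected annual loss of 145,000; a preventive strategy costing 30,000 a year would then have a net benefit of 115,000 (all figures illustrative).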

Our team developed preventive maintenance strategies for each risk based on these recommendations. For example, destruction of the bearing unit due to mechanical wear is a negative event. To prevent it, the operator must lubricate the bearing regularly. To lubricate the bearings of one roller, the operator must stop the unit and assign two engineers, who use 0.5 kg of lubricant; the job may take two hours to complete.

Such tasks were often periodic: the operator might need to lubricate the bearing every month or inspect the unit every year.
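The lubrication example translates directly into resource totals that a planner can aggregate. The sketch below uses the per-roller figures from the text (two engineers, two hours, 0.5 kg of lubricant, unit stopped) and assumes a monthly interval; the `Action` record and its methods are hypothetical.

```java
/** Hypothetical resource model for one maintenance action, based on the lubrication example. */
final class ActionResources {

    record Action(String description,
                  int engineers,
                  double durationHours,
                  double lubricantKg,
                  int occurrencesPerYear,
                  boolean requiresUnitStop) {

        /** Engineer-hours for a single execution. */
        double laborHoursPerOccurrence() { return engineers * durationHours; }

        double laborHoursPerYear() { return laborHoursPerOccurrence() * occurrencesPerYear; }

        double lubricantKgPerYear() { return lubricantKg * occurrencesPerYear; }

        /** Hours the unit stands still because of this action each year. */
        double unitStopHoursPerYear() {
            return requiresUnitStop ? durationHours * occurrencesPerYear : 0.0;
        }
    }
}
```

For one roller lubricated monthly, this gives 4 labor hours per occurrence and, per year, 48 labor hours, 6 kg of lubricant, and 24 hours of unit stoppage.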

The strategies were versioned, meaning they could be enhanced or otherwise changed if the operator found more optimal, cost-effective ways to deal with the associated risks, or if the operating conditions of particular equipment assets changed.
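Versioning can be implemented as an append-only history, so that a revised strategy never overwrites the one it replaces. The sketch below assumes each version carries an effective date and a reason for the change; `VersionedStrategy` and its methods are hypothetical.

```java
import java.time.LocalDate;
import java.util.*;

/** Hypothetical append-only version history for a maintenance strategy. */
final class VersionedStrategy {

    record Version(int number, String content, LocalDate effectiveFrom, String reason) {}

    private final String strategyId;
    private final List<Version> versions = new ArrayList<>();

    VersionedStrategy(String strategyId, String initialContent, LocalDate from) {
        this.strategyId = strategyId;
        versions.add(new Version(1, initialContent, from, "initial"));
    }

    /** Adds a new version instead of editing in place, so earlier versions stay auditable. */
    Version revise(String newContent, LocalDate from, String reason) {
        Version last = current();
        if (from.isBefore(last.effectiveFrom())) {
            throw new IllegalArgumentException("new version cannot start before the current one");
        }
        Version next = new Version(last.number() + 1, newContent, from, reason);
        versions.add(next);
        return next;
    }

    Version current() { return versions.get(versions.size() - 1); }

    /** The version in force on a given date, if the strategy existed then. */
    Optional<Version> asOf(LocalDate date) {
        for (int i = versions.size() - 1; i >= 0; i--) {
            if (!versions.get(i).effectiveFrom().isAfter(date)) return Optional.of(versions.get(i));
        }
        return Optional.empty();
    }

    List<Version> history() { return List.copyOf(versions); }

    String id() { return strategyId; }
}
```

Keeping old versions also makes it possible to tell which strategy was in force when a given failure occurred, which feeds the root cause analysis.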

3. Strategy implementation

We created orders (tasks) in the platform’s planner (calendar) to execute the developed maintenance strategies. For example, if the operator needed to replace lubricants in the machinery each month, they assigned a recurring task to the responsible team or department.

For each task, we specified all the required information, including the scope of work (e.g., checking and lubricating bearings), materials (e.g., purchasing 0.5 kg of lubricant), and tools (e.g., wrenches) required to perform it.
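A recurring order of this kind can be expanded into dated planner tasks. The sketch below assumes the recurrence is stored as a `java.time.Period` (one month for lubrication, one year for inspections) and computes each due date from the first one, so that month-end dates do not drift; the record names are hypothetical.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.*;

/** Hypothetical expansion of a recurring maintenance order into dated planner tasks. */
final class RecurringOrders {

    record Order(String scope, List<String> materials, List<String> tools,
                 String assignee, LocalDate firstDue, Period interval) {}

    record Task(String scope, String assignee, LocalDate due) {}

    /** Generates every occurrence due on or before {@code horizon} (inclusive). */
    static List<Task> expand(Order order, LocalDate horizon) {
        if (order.interval().isZero() || order.interval().isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<Task> tasks = new ArrayList<>();
        // Each due date is offset from firstDue (not from the previous date) to avoid month-end drift
        for (int n = 0; ; n++) {
            LocalDate due = order.firstDue().plus(order.interval().multipliedBy(n));
            if (due.isAfter(horizon)) break;
            tasks.add(new Task(order.scope(), order.assignee(), due));
        }
        return tasks;
    }
}
```

An order first due on 31 January 2021 with a one-month interval, expanded up to 30 April 2021, yields tasks on 31 January, 28 February, 31 March, and 30 April.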

4. Root cause analysis

We investigated failures by determining and recording their causes.

For example, when an engine broke down, we could identify the root cause through an advanced visualization of causal connections, implemented as Miro-like mind maps. As in Miro, users could work on the mind maps collaboratively, which made brainstorming more efficient and sped up the development of preventive measures.
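Behind a cause-and-effect mind map is a directed graph. The sketch below, assuming each link on the map is stored as a cause-to-effect edge, walks back from a failure and reports the causes that have no recorded cause of their own; `CauseGraph` and its methods are hypothetical names.

```java
import java.util.*;

/** Hypothetical cause-and-effect graph behind a root cause analysis map. */
final class CauseGraph {

    // effect -> the direct causes recorded for it
    private final Map<String, Set<String>> causesOf = new HashMap<>();

    /** Records that {@code cause} contributed to {@code effect}. */
    void link(String cause, String effect) {
        causesOf.computeIfAbsent(effect, k -> new LinkedHashSet<>()).add(cause);
    }

    /** Walks back from a failure and returns the causes that have no recorded cause themselves. */
    Set<String> rootCauses(String failure) {
        Set<String> roots = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(failure);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (!visited.add(node)) continue;   // guards against cycles drawn on the map
            Set<String> causes = causesOf.getOrDefault(node, Set.of());
            if (causes.isEmpty() && !node.equals(failure)) {
                roots.add(node);
            }
            causes.forEach(stack::push);
        }
        return roots;
    }
}
```

For a made-up chain in which an engine failure was caused by overheating and overload, overheating by a coolant leak, the leak by a worn seal, and the seal by a missed inspection, `rootCauses("engine failure")` returns the missed inspection and the overload.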

Highly customizable identity access management

To secure the critical data on the platform, we implemented Microsoft Azure’s Identity Access Management Tool and customized it to fit the client’s specific requirements and procedures.

We implemented flexibly configurable, role-based data access logic with distinct roles and permissions. Initially, we defined the following roles, based on the positions of the employees who needed access to the system at the time:

  • Reliability Engineer;
  • Chief Reliability Specialist, Reliability Specialist, Recommendation Specialist;
  • KIV Coordinator, KIV Observer, KIV Recommendation Specialist;
  • Department Head, Planning Manager, Working Group Members.

The platform’s administrator could also create custom roles: for example, a role allowed to approve recommendations, or a role defining which departments, and the equipment failures associated with them, a particular user could see.

In this way, roles could be mapped onto an entire organizational hierarchy, for example:

  • Company
  • Manufacturing site
  • Department
  • Working zone
  • Machinery
  • Equipment parts

When a user with a particular role accessed the system, the data was automatically filtered according to the user’s permissions. For example, a user could see everything within their manufacturing site but approve recommendations only for particular equipment assets within their working zone.
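This filtering can be modeled by giving every asset a path through the company, manufacturing site, department, working zone, machinery, and equipment parts hierarchy, and letting each permission cover a node and everything beneath it. The sketch below is a minimal illustration under that assumption; the names and the two-action model (view and approve) are hypothetical.

```java
import java.util.*;

/** Hypothetical hierarchy-based data filtering: a scope covers every node beneath it. */
final class ScopeFilter {

    enum Action { VIEW, APPROVE }

    /** Grants an action on a node of the company / site / department / ... hierarchy. */
    record Grant(Action action, List<String> scopePath) {}

    /** True if {@code assetPath} equals or lies beneath {@code scopePath}. */
    static boolean covers(List<String> scopePath, List<String> assetPath) {
        return assetPath.size() >= scopePath.size()
                && assetPath.subList(0, scopePath.size()).equals(scopePath);
    }

    static boolean allowed(Collection<Grant> grants, Action action, List<String> assetPath) {
        return grants.stream()
                .anyMatch(g -> g.action() == action && covers(g.scopePath(), assetPath));
    }

    /** Keeps only the assets the user may see, mirroring the automatic filtering step. */
    static List<List<String>> visible(Collection<Grant> grants, List<List<String>> assets) {
        return assets.stream().filter(a -> allowed(grants, Action.VIEW, a)).toList();
    }
}
```

A user granted VIEW on a manufacturing site and APPROVE on one working zone would then see all of the site’s assets but could approve recommendations only for assets in that zone.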

Additionally, users could group and filter the data by custom criteria, such as specific machinery or time of failure.

Each role represented a collection of permissions, and a single user could be assigned multiple roles. Assignments could also be time-limited: for example, when an operator needed to stand in for a worker who was on sick leave or vacation, the additional permissions were valid only for a limited period.
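Time-limited assignments like this can be modeled by giving each role assignment a validity window and computing a user’s effective permissions for a given day. The sketch below assumes open-ended windows for permanent roles; the permission strings and class names are hypothetical.

```java
import java.time.LocalDate;
import java.util.*;

/** Hypothetical user-role assignments, each optionally limited to a validity window. */
final class RoleAssignments {

    /** validTo == null means the assignment is open-ended (a permanent role). */
    record Assignment(String role, Set<String> permissions, LocalDate validFrom, LocalDate validTo) {
        boolean activeOn(LocalDate day) {
            return !day.isBefore(validFrom) && (validTo == null || !day.isAfter(validTo));
        }
    }

    private final Map<String, List<Assignment>> byUser = new HashMap<>();

    void assign(String user, Assignment a) {
        byUser.computeIfAbsent(user, k -> new ArrayList<>()).add(a);
    }

    /** Union of the permissions from every role that is active on the given day. */
    Set<String> effectivePermissions(String user, LocalDate day) {
        Set<String> result = new TreeSet<>();
        for (Assignment a : byUser.getOrDefault(user, List.of())) {
            if (a.activeOn(day)) result.addAll(a.permissions());
        }
        return result;
    }
}
```

Because the substitute role simply stops being active after its end date, no one has to remember to revoke it when the absent colleague returns.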

Finally, we implemented role-based recommendation management. Based on a user’s role, permissions, and assigned manufacturing site(s) and/or equipment asset(s), the system automatically determined the approver(s) for each document.
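One plausible rule for choosing approvers is to pick those whose approval scope covers the asset most specifically, so that a working-zone approver takes precedence over a site-wide one. The source does not describe the actual rule; the sketch below illustrates this assumed rule with hypothetical names, treating a scope as covering every node beneath it in the hierarchy.

```java
import java.util.*;

/** Hypothetical approver routing: the most specific approval scope covering an asset wins. */
final class ApproverRouting {

    /** A user who may approve recommendations for everything under {@code scopePath}. */
    record Approver(String user, List<String> scopePath) {}

    static boolean covers(List<String> scope, List<String> asset) {
        return asset.size() >= scope.size() && asset.subList(0, scope.size()).equals(scope);
    }

    /** Returns all approvers whose covering scope is the deepest one found (empty if none cover it). */
    static List<String> approversFor(List<String> assetPath, List<Approver> approvers) {
        int best = -1;
        List<String> result = new ArrayList<>();
        for (Approver a : approvers) {
            if (!covers(a.scopePath(), assetPath)) continue;
            int depth = a.scopePath().size();
            if (depth > best) { best = depth; result.clear(); }
            if (depth == best) result.add(a.user());
        }
        return result;
    }
}
```

An asset with no covering approver yields an empty list, which a real workflow would need to escalate rather than silently skip.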

Migration to SAP S/4HANA

At the same time, we migrated the client’s data from the previous system to SAP S/4HANA and integrated the new functionality with it. In particular, we transferred the equipment catalog, including all integrations, connections, and dependencies. We also launched the migration projects for SAP MRS and Work Manager. Thus, all the maintenance activities were centralized on a single information system.

Why did we choose SAP S/4HANA as the migration target? It allowed us to automate all the required activities within the client’s enterprise, reduce dependence on the human factor, speed up production, and, ultimately, raise customer satisfaction and profits.

However, other, more specific factors affected our choice of the migration platform.

The deprecated version of SAP ERP: At the time of the migration, the client’s company was using SAP ECC 6.0. By the time the client contacted us, development of this system had stopped, and SAP’s mainstream maintenance for it was scheduled to end in 2027. There was still time, and the client could have stayed on the legacy platform for a while, but we needed that time to deploy and optimize the new system, work that continues today.

Functional incompatibilities: All the improvements, customizations, and adjustments we planned could potentially break the client’s legacy system or, at the very least, slow it down.

Heavy reliance on custom code: By moving to SAP S/4HANA, we aimed to reduce dependence on custom, non-standard code and thereby shorten time to market.

Lack of scalability potential: The client’s legacy system could not accommodate growing production capacities. With S/4HANA, this became possible because we migrated the client’s data to cloud infrastructure, which gave them broad scalability opportunities.

New data architecture

When developing the new data architecture, our goal was to ensure a high level of usability and simplify deployment. We arranged the following workflow: users worked in a browser, authenticated via Active Directory, and received email notifications about important events.

We stored the information system’s data in a relational database management system (DBMS), integrated it with the ERP to obtain reference data and transfer maintenance strategies for execution, and generated reports in SAP Business Warehouse (SAP BW).

Additionally, we implemented a two-way integration with SAP ERP, so data was transferred in real time in both directions: users could pull data from SAP for analytics and send their own input to it at the same time.

Technology Stack

The backend was written in Java and runs on the Java Virtual Machine (JVM); the frontend was written in JavaScript with ReactJS. All components were packaged in Docker containers orchestrated by Kubernetes, which automated deployment and let us avoid dependence on a specific platform while still adding the required customizations.

  • Enterprise management: SAP S/4HANA
  • Data management: OracleDB
  • Frontend: JavaScript, ReactJS
  • Backend: Java
  • Domain management: Microsoft Active Directory
  • Authentication management: Kerberos SSO
  • Development & deployment: Docker, Kubernetes

Project Results

With the predictive maintenance functionality integrated into the client’s new platform, we achieved the following significant results:

  • Introduced an entirely new method of proactive equipment maintenance. This approach involved continuous monitoring of equipment performance and condition diagnostics, making it possible to build trends and predict potential failures from them. With this method, we reduced repeat maintenance costs and planned spare parts supply more efficiently.
  • Established more flexible and deeper integration with SAP ERP. By migrating to SAP S/4HANA, we managed to significantly enhance the platform’s functionality. For example, when performing the criticality analysis, we evaluated not only the cost of an hour of equipment downtime but also the cost of materials for repairs and the cost of employee salaries. When creating a list of potential failures, we automatically took into account historical data.
  • Reduced repair costs by 40% and increased the equipment reliability rate by 3%. We ensured full risk control and equipment readiness for production tasks with an unbiased, data-driven picture of repair costs.
  • Built end-to-end risk prioritization processes while keeping decision-making fully transparent. Data-driven annual planning of the client’s maintenance program and repair processes set new records for financial efficiency.
  • Achieved high platform performance: data reads complete in under a second, data retrieval for saving and editing takes up to 3 seconds, and large data volumes are processed quickly (for example, when a maintenance strategy is executed for an equipment asset, the main action completes within 3 seconds while the remaining actions run asynchronously).
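Returning after the main action and deferring the rest is a common pattern for keeping response times bounded. The sketch below, assuming follow-up work such as notifications or report refreshes can run independently, executes the core step on the caller’s thread and hands the follow-ups to a background pool via `CompletableFuture.runAsync`; the class and method names are hypothetical, and bounding the core step with a timeout is left out.

```java
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Supplier;

/** Hypothetical split between a synchronous core step and asynchronous follow-up work. */
final class StrategyExecution {

    private final ExecutorService background = Executors.newFixedThreadPool(4);

    /**
     * Runs {@code core} on the caller's thread and returns its result right away;
     * {@code followUps} are queued on the background pool, and their failures are only logged.
     */
    <T> T execute(Supplier<T> core, List<Runnable> followUps) {
        T result = core.get();
        for (Runnable r : followUps) {
            CompletableFuture.runAsync(r, background)
                    .exceptionally(ex -> { System.err.println("follow-up failed: " + ex); return null; });
        }
        return result;
    }

    /** Stops accepting work and waits for queued follow-ups; returns false if they did not finish in time. */
    boolean shutdown(long timeoutSeconds) {
        background.shutdown();
        try {
            return background.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The user-facing latency then depends only on the core step, while a failing follow-up cannot roll back or delay the main action.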

To date, more than 1,000 users across five of the client’s companies use the solution; 1.5 million entries have been added to the equipment catalog, 20 thousand maintenance strategies developed, 140 thousand process maps created, and 400 thousand actions performed.
