Predictive Maintenance with Machine Learning

Project Overview

This project aimed to develop a Predictive Maintenance System using Machine Learning to reduce unplanned downtime and maintenance costs in industrial manufacturing. By analyzing sensor data from machines, the model predicts potential failures before they occur, allowing businesses to take preventive actions.

Key Objectives:

- Preprocess and analyze a dataset containing machine sensor readings.
- Identify key failure indicators using exploratory data analysis (EDA).
- Train a machine learning model to predict failures with high accuracy.
- Evaluate the model’s performance to ensure it meets deployment standards.

This system is particularly valuable for OEMs, manufacturers, and industrial automation companies looking to optimize maintenance schedules, reduce unexpected breakdowns, and improve overall equipment efficiency.

Execution Process

1. Dataset Preparation

Dataset Overview:
- The dataset contained 10,000 machine records with 10 features, including:
  - Air temperature, process temperature, torque, rotational speed, tool wear, and failure types.
  - A binary "Target" column indicating machine failures (1 = Failure, 0 = No Failure).
Challenges in Raw Data:
- Categorical variables (e.g., machine type) needed encoding.
- Imbalanced dataset (failure cases were underrepresented).
- Sensor readings were noisy and measured on different scales, so they required standardization.
Preprocessing Steps:
- Handled missing values and removed irrelevant columns (e.g., unique IDs).
- Encoded categorical features using Label Encoding.
- Normalized numerical features (scaling sensor values for better model performance).
- Balanced the dataset using SMOTE (Synthetic Minority Over-sampling Technique) to reduce model bias toward the majority "No Failure" class.
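
The encoding and scaling steps above can be sketched roughly as follows. This is a minimal illustration on a made-up four-row table, not the project's actual pipeline: the identifier column names (`UDI`, `Product ID`), the letter codes in `Type`, and all values are assumptions. In a full pipeline the scaler would normally be fitted on the training split only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Tiny stand-in for the raw data; values and ID column names are illustrative.
raw = pd.DataFrame({
    "UDI": [1, 2, 3, 4],
    "Product ID": ["M14860", "L47181", "L47182", "H29424"],
    "Type": ["M", "L", "L", "H"],
    "Air temperature [K]": [298.1, 298.2, 298.1, 298.6],
    "Torque [Nm]": [42.8, 46.3, 49.4, 65.7],
    "Tool wear [min]": [0.0, 3.0, 5.0, 210.0],
    "Target": [0, 0, 0, 1],
})

# 1. Drop identifier columns that carry no predictive signal.
df = raw.drop(columns=["UDI", "Product ID"])

# 2. Label-encode the categorical machine type (classes are sorted: H=0, L=1, M=2).
type_encoder = LabelEncoder()
df["Type"] = type_encoder.fit_transform(df["Type"])

# 3. Standardize numeric sensor columns to zero mean and unit variance.
numeric_cols = ["Air temperature [K]", "Torque [Nm]", "Tool wear [min]"]
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])

# Separate features from the binary failure label.
X = df.drop(columns=["Target"])
y = df["Target"]
print(X)
```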

2. Exploratory Data Analysis (EDA)

We conducted EDA to identify key factors influencing machine failures and gain insights from the data.

Correlation Heatmap:
- Showed strong correlations between Torque, Tool Wear, and Failures.
- Air temperature and Process temperature were related but weak predictors of failures.
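
As a rough illustration of how such a correlation matrix is computed, the sketch below builds a small synthetic table and calls `DataFrame.corr()`. The data, the failure rule, and the constants are invented for the example and are not the project dataset; only the column names follow the ones used in this write-up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in data (not the project dataset): torque falls as speed rises.
speed = rng.normal(1500, 150, n)
torque = 60000 / speed + rng.normal(0, 2, n)
tool_wear = rng.uniform(0, 250, n)

# Illustrative failure rule: more likely at high torque and high tool wear.
risk = 0.05 * (torque - 40) + 0.01 * (tool_wear - 125)
target = (risk + rng.normal(0, 0.3, n) > 0.5).astype(int)

df = pd.DataFrame({
    "Rotational speed [rpm]": speed,
    "Torque [Nm]": torque,
    "Tool wear [min]": tool_wear,
    "Target": target,
})

# Pearson correlation matrix; this is the table a heatmap visualizes.
corr = df.corr()
print(corr.round(2))
```

The resulting matrix can be passed to a plotting function such as seaborn's `heatmap` to produce the kind of figure described here.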
Failure Distribution Analysis (Boxplots & Pairplots):
- Certain failure types (2, 3, and 5) had a higher occurrence of breakdowns.
- Failure Type 1 (Mostly No Failures)
  - This represents the "No Failure" class, where no specific failure occurred. This aligns with typical datasets where most equipment runs without issues.
- Failure Types 2, 3, and 5 (Mostly Failures)
  - These failure types show exclusively or predominantly machine failures (orange bars).
  - These categories represent specific failure modes where, if the condition exists, it almost always leads to a machine failure. They could represent critical failure types like:
    - Overstrain
    - Heat Damage
    - Tool Wear Issues
- Failure Types 0 and 4
  - Both show moderate counts of failures, with very few or no non-failure cases.
  - These might represent rare failure modes that still cause breakdowns when present but are not as common.
- Higher torque and longer tool wear times increased failure probability.
- Rotational speed was inversely related to torque, consistent with mechanical expectations.
Diagonal (Feature Distributions):
- Torque [Nm] & Tool Wear [min]:
  - Orange curves (failures) skew towards higher values compared to blue (non-failures).
  - Insight: Higher torque and increased tool wear are linked to more frequent machine failures.
- Air & Process Temperature: Both distributions overlap significantly, meaning temperature alone isn't a strong failure indicator.
Strong Negative Correlation:
- Torque [Nm] vs. Rotational Speed [rpm]:
  - A clear inverse relationship: as torque increases, rotational speed decreases.
  - This aligns with mechanical principles: at roughly constant power, higher torque corresponds to lower rotational speed.
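
The inverse relationship follows from the standard relation between mechanical power, torque, and angular velocity. The symbols below are introduced for illustration, and the constant-power assumption is an idealization rather than a property confirmed for this dataset. With power $P$ in watts, torque $\tau$ in Nm, and rotational speed $n$ in rpm:

```latex
P = \tau \, \omega, \qquad \omega = \frac{2 \pi n}{60}
\quad\Longrightarrow\quad
\tau = \frac{60 \, P}{2 \pi \, n}
```

So for a fixed $P$, torque is proportional to $1/n$: a machine delivering the same power at a lower speed must do so at a higher torque.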
Clusters Indicating Patterns:
- Torque vs. Tool Wear:
  - Failures (orange) are denser at high torque and high tool wear combinations.
  - This cluster suggests that when machines operate under high stress for prolonged periods, failure likelihood increases.
Overlapping Regions:
- Features like Air Temperature and Process Temperature show heavy overlap between failure and non-failure cases.
- Insight: These features might contribute less to predictive power on their own but could be useful in combination with others.

📌 Key Takeaway: Torque, Tool Wear, and Specific Failure Types were the most influential indicators of failures.

3. Model Selection and Training

Chosen Model: Random Forest Classifier

✅ Handles both numerical and encoded categorical features.

✅ Resistant to noise and overfitting.

✅ Provides feature importance insights.

Model Training Configuration:

- Training/Test Split: 80% training, 20% testing.
- Hyperparameters:
  - n_estimators=100 (number of trees)
  - max_depth=5 (limits tree depth to reduce overfitting)
- Balanced class weights to ensure fair learning between failure and non-failure cases.
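
A minimal sketch of this configuration with scikit-learn is shown below. It runs on synthetic data from `make_classification` rather than the project dataset, and the stratified split, the fixed `random_state`, and the use of `class_weight="balanced"` to express "balanced class weights" are assumptions, not details confirmed by the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 5% positives); not the project dataset.
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=4,
    weights=[0.95, 0.05], random_state=42,
)

# 80/20 split; stratify keeps the failure rate similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

rf_model = RandomForestClassifier(
    n_estimators=100,         # number of trees
    max_depth=5,              # cap tree depth to limit overfitting
    class_weight="balanced",  # reweight classes inversely to their frequency
    random_state=42,          # assumption: fixed seed for reproducibility
)
rf_model.fit(X_train, y_train)

print("Test accuracy:", rf_model.score(X_test, y_test))
```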

4. Model Evaluation

Performance Metrics:

| Metric | Result |
| --- | --- |
| Accuracy | 99.5% |
| Precision | 0.99 |
| Recall | 0.99 |
| F1-Score | 1.00 |
| ROC-AUC Score | 0.99 |

Confusion Matrix Analysis:
- 15 False Negatives (missed failures) – a potential risk for real-world deployment.
- 3 False Positives – minor but could lead to unnecessary maintenance.
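
For reference, the sketch below shows how accuracy, precision, recall, and F1 are derived from confusion-matrix counts. The counts are toy values chosen for easy arithmetic and are not the project's results; the function name is hypothetical. Recall is the metric that drops when failures are missed, which is why false negatives matter most here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Toy counts for illustration only; these are not the project's results.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=930)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```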

📌 Key Takeaway: The high precision and recall indicate strong predictive capabilities, but false negatives need to be minimized for real-world deployment.

Results and Business Impact

Detection Performance

The model predicts machine failures on unseen inputs with reasonably high confidence (for example, a 79% failure probability in a real-time sample prediction).
Feature importance analysis validated key business insights:
- High torque and tool wear lead to failures.
- Temperature is less critical than mechanical factors.
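
A feature importance check of this kind can be sketched with the `feature_importances_` attribute of a fitted Random Forest. The example below uses synthetic data in which failures are constructed to depend on torque and tool wear only, so it illustrates the mechanics rather than reproducing the project's findings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in features (already on a standardized scale); not project data.
X = pd.DataFrame({
    "Torque [Nm]": rng.normal(size=n),
    "Tool wear [min]": rng.normal(size=n),
    "Air temperature [K]": rng.normal(size=n),
})
# Illustrative label: failures depend on torque and tool wear, not temperature.
y = ((X["Torque [Nm]"] + X["Tool wear [min]"]) > 1.0).astype(int)

rf_model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=0)
rf_model.fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importances = pd.Series(rf_model.feature_importances_, index=X.columns)
print(importances.sort_values(ascending=False))
```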

Business Benefits

✔ Reduced Downtime – Proactive maintenance prevents costly breakdowns.

✔ Optimized Maintenance Schedules – Machines are serviced only when needed.

✔ Cost Savings – Avoids unnecessary part replacements and labor expenses.

✔ Longer Equipment Lifespan – Early failure detection helps maintain machine health.

📌 Key Takeaway: Predictive maintenance models like this one have the potential to deliver substantial cost savings for large-scale manufacturers.

Challenges and Solutions

Challenge 1: Data Imbalance

Problem: Failures were rare, causing the model to favor "no failure" predictions.
Solution: Applied SMOTE (Synthetic Minority Over-sampling Technique) to ensure balanced learning.
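
To make the idea behind SMOTE concrete, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbors. It is a simplified illustration with a hypothetical function name, not the implementation used in the project; in practice a library such as imbalanced-learn (`SMOTE(...).fit_resample(X, y)`) would typically be used, applied to the training split only so that synthetic samples do not leak into the test set.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority rows by interpolating between neighbors."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    neighbors = np.argsort(dists, axis=1)[:, :k]  # k nearest per sample

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))   # random minority sample
        nb = rng.choice(neighbors[base])  # one of its k nearest neighbors
        gap = rng.random()                # position along the segment, in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic

# Toy minority-class rows (e.g. scaled torque and tool wear); illustrative only.
X_fail = np.array([[1.0, 1.2], [1.4, 0.9], [1.1, 1.6], [1.8, 1.5]])
X_new = smote_like_oversample(X_fail, n_new=6)
print(X_new.shape)  # (6, 2)
```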

Challenge 2: False Negatives in Predictions

Problem: Missing failures could lead to catastrophic breakdowns.
Solution: Adjusted classification threshold to reduce false negatives.
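
The effect of lowering the decision threshold can be sketched as follows. The labels, probabilities, and the 0.3 threshold are toy values for illustration (the project's chosen threshold is not stated here), and the helper names are hypothetical. In a real run, the probabilities would come from something like `rf_model.predict_proba(X_test)[:, 1]`.

```python
def predict_with_threshold(failure_probs, threshold=0.5):
    """Label a reading as failure (1) when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in failure_probs]

def count_errors(y_true, y_pred):
    """Return (false negatives, false positives) for binary labels."""
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return fn, fp

# Toy labels and failure probabilities; illustrative only.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.90, 0.45, 0.35, 0.40, 0.10, 0.05]

results = {}
for threshold in (0.5, 0.3):
    y_pred = predict_with_threshold(probs, threshold)
    results[threshold] = count_errors(y_true, y_pred)
    fn, fp = results[threshold]
    print(f"threshold={threshold}: false negatives={fn}, false positives={fp}")
```

In this toy case, lowering the threshold from 0.5 to 0.3 removes both missed failures at the cost of one false alarm, which is the trade-off to tune against maintenance costs.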

Challenge 3: Real-World Generalization

Problem: The model was trained on synthetic data, so real-world variability may impact accuracy.
Solution: Next steps involve testing on real sensor data from industrial machines.

Instructions for Testing the Model

1. Run a Sample Prediction on Unseen Data

Use this Python script to test the model with new machine sensor readings:

```python
import pandas as pd

# Assumes `rf_model` (the trained classifier) and `X_train` (the training
# features) are still in scope from the training step.

# Sample unseen machine data (simulated). Values must be preprocessed the same
# way as the training data (encoded categories, scaled sensor readings).
sample_data = pd.DataFrame({
    'Type': [1],
    'Air temperature [K]': [0.5],
    'Process temperature [K]': [1.0],
    'Rotational speed [rpm]': [0.8],
    'Torque [Nm]': [1.5],
    'Tool wear [min]': [1.2],
    'Failure Type': [2]
})

# Ensure column order matches training data
sample_data = sample_data[X_train.columns]

# Predict the class and the per-class probabilities
prediction = rf_model.predict(sample_data)
prediction_proba = rf_model.predict_proba(sample_data)

# Display the prediction
print("Predicted Machine Failure (0 = No, 1 = Yes):", prediction[0])
print("Probability of No Failure:", prediction_proba[0][0])
print("Probability of Failure:", prediction_proba[0][1])
```

Future Work and Deployment Considerations

Next Steps for Real-World Deployment

- Validate with Real Sensor Data – Test on actual IoT machine readings.
- Implement a Live Monitoring System – Deploy as a REST API using Flask/FastAPI.
- Adjust Classification Thresholds – Fine-tune probability thresholds to reduce false negatives.
- Enable Real-Time Alerts – Send failure warnings to maintenance teams.
- Deploy on Edge Devices – Implement on IoT hardware for on-site predictions.
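
As one possible shape for the real-time alerting step, the sketch below raises an alert only after several consecutive high-risk predictions, which is a common way to avoid alerting on a single noisy reading. The class name, the 0.6 threshold, and the three-reading window are all assumptions for illustration; none of them come from the project.

```python
from collections import deque

class FailureAlertMonitor:
    """Raise an alert after several consecutive high-risk predictions."""

    def __init__(self, threshold=0.5, consecutive=3):
        self.threshold = threshold
        # Keeps only the most recent `consecutive` high-risk flags.
        self.recent = deque(maxlen=consecutive)

    def update(self, failure_prob):
        """Record one prediction; return True when an alert should be sent."""
        self.recent.append(failure_prob >= self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)

# Hypothetical stream of failure probabilities from the model.
monitor = FailureAlertMonitor(threshold=0.6, consecutive=3)
stream = [0.2, 0.7, 0.8, 0.65, 0.3]
alerts = [monitor.update(p) for p in stream]
print(alerts)  # [False, False, False, True, False]
```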

Conclusion

This project successfully demonstrated how Machine Learning can be leveraged for Predictive Maintenance, enabling manufacturers to proactively prevent failures, optimize maintenance, and reduce costs.

- High Accuracy (~99%) on the held-out test split.
- Key Business Insights from sensor data analysis.
- Scalable Approach for industrial IoT applications.

📌 Final Takeaway: This model shows strong potential for real-world deployment, but requires further validation on live data streams before full integration into production environments.