A Predictive Analytics Model for Diabetes Patients Using Exploratory Data Analysis and the Random Forest Algorithm

• Hamid Muhammad Jumasa, Muhammadiyah University of Purworejo
Keywords: Exploratory Data Analysis, EDA, Random Forest, Predictive Analytics


Diabetes is a chronic (long-term) disease characterized by blood sugar (glucose) levels that exceed the normal threshold. This occurs when the function of the hormone insulin in the body is disrupted.[1] In 2021, the International Diabetes Federation (IDF) recorded 537 million adults aged 20 to 79 years living with diabetes (Reza Pahlevi, 2021), and diabetes caused 6.7 million deaths. Risk factors for diabetes include being overweight, high cholesterol levels, lifestyle (including lack of exercise), and age. No medicine has yet been found that cures the disease completely, so early detection is needed to control its dangers.[1]
This study builds a predictive analytics model to predict whether a person will develop diabetes. Exploratory Data Analysis (EDA) was used as the data analysis technique, and Random Forest as the machine learning model. The data were obtained from the Kaggle website and contain records of 769 people. The dataset consists of 9 columns, grouped as 7 and 2.
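The Random Forest step can be illustrated with a minimal from-scratch sketch in Python (standard library only). It is not the author's implementation, which is not given here. The two features (`glucose`, `bmi`), the synthetic toy records, the Gini split criterion, and the hyperparameters (`n_trees`, `max_depth`, `max_features`) are all assumptions chosen for illustration. Each tree is grown on a bootstrap sample of the records, considers a random subset of features at every split, and the forest predicts by majority vote.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in feature_ids:
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth, max_depth, max_features, rng):
    """Internal nodes are (feature, threshold, left, right); leaves are labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    # Only a random subset of the features is considered at this split.
    features = rng.sample(range(len(rows[0])), max_features)
    split = best_split(rows, labels, features)
    if split is None or split[0] >= gini(labels):
        return majority  # no split on these features reduces impurity
    _, f, t = split

    def grow(idx):
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                          depth + 1, max_depth, max_features, rng)

    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t, grow(left), grow(right))


def predict_tree(node, row):
    """Follow the thresholds down to a leaf label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


class RandomForest:
    """Bagged decision trees with random feature subsets and majority voting."""

    def __init__(self, n_trees=25, max_depth=3, max_features=1, seed=0):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.max_features = max_features  # assumed; often about sqrt(#features)
        self.seed = seed
        self.trees = []

    def fit(self, rows, labels):
        rng = random.Random(self.seed)
        n = len(rows)
        self.trees = []
        for _ in range(self.n_trees):
            # Bootstrap sample: n records drawn with replacement.
            idx = [rng.randrange(n) for _ in range(n)]
            self.trees.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                         0, self.max_depth, self.max_features, rng))
        return self

    def predict(self, row):
        votes = Counter(predict_tree(tree, row) for tree in self.trees)
        return votes.most_common(1)[0][0]


# Hypothetical toy records [glucose, bmi] -> 1 (diabetes) or 0 (no diabetes).
data_rng = random.Random(42)
rows, labels = [], []
for _ in range(20):
    rows.append([data_rng.randint(70, 130), data_rng.uniform(18, 28)])
    labels.append(0)
    rows.append([data_rng.randint(150, 199), data_rng.uniform(32, 45)])
    labels.append(1)

forest = RandomForest(n_trees=25, max_depth=3, max_features=1, seed=0).fit(rows, labels)
print(forest.predict([195, 44.0]))  # 1
print(forest.predict([75, 19.0]))   # 0
```

An odd number of trees avoids tied votes in a two-class problem. In practice a library implementation would normally be used instead of hand-written trees.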
After analyzing the sample data, the model achieved an accuracy of 0.998207 on the training data, with a Mean Squared Error (MSE) of 0.00179, and an accuracy of 0.74603 on the testing data, with an MSE of 0.25396. Of 20 sample records used to test predictions, the model predicted 18 correctly and 2 incorrectly.
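Assuming the outcome is encoded as 0 (no diabetes) or 1 (diabetes), each squared error is either 0 for a correct prediction or 1 for an incorrect one, so MSE equals 1 minus accuracy. The reported pairs (0.998207 and 0.00179, 0.74603 and 0.25396) agree with this up to rounding. The Python sketch below uses hypothetical labels for a 20-record sample with 18 correct and 2 incorrect predictions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted labels."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels for 20 records (not the study's data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
y_pred = list(y_true)
y_pred[3] = 0  # simulate two wrong predictions
y_pred[7] = 1

acc = accuracy(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"accuracy = {acc:.2f}, MSE = {mse:.2f}")  # accuracy = 0.90, MSE = 0.10
```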


