Predicting Readmission within 30 days for diabetic patients with TensorFlow

According to the World Health Organization, the number of people living with diabetes worldwide rose from 108 million in 1980 to 422 million in 2014. It’s estimated that 700 million adults will have diabetes by 2045. Diabetes is a chronic illness in which a person’s blood sugar is too high. In the long term, it can cause serious complications like heart disease, kidney disease, blindness or the need for an amputation.

Readmission is a metric that correlates with increased mortality risk: the more often a patient is readmitted to the hospital, the higher that patient’s risk of death. Readmission is not only a problem for the patient’s health; it also costs hospitals money, an estimated $25 billion annually in the U.S. alone. Using deep learning, we can try to predict whether a patient is going to return to the hospital.

Photo by Mykenzie Johnson on Unsplash

Basics of deep learning

To understand how deep learning works, let’s go through how we as humans learn. For example, let’s say that we wanted to cook something. We’d get the ingredients, cook, and taste the result to see how good our recipe is. After doing this over and over, making subtle changes on every try that improve the taste, we finally perfect the recipe. Deep learning works in the same way. You give the model data, the model makes connections within the data by applying weights and biases, and then the model gives a predicted output. We measure how incorrect the model was and let the model try again, using what it learned from the previous attempts and tweaking itself until it reaches the lowest error rate that it can find.
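To make the "try, measure the error, tweak, repeat" loop concrete, here is a minimal sketch with a single weight and a single bias. The toy data, the learning rate and the number of steps are all made up for illustration; this is not the model we build later, just the same idea at the smallest possible scale.

```python
import numpy as np

# Toy data for illustration only: the target follows y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

weight, bias = 0.0, 0.0   # the model starts out knowing nothing
learning_rate = 0.05      # hypothetical step size

for step in range(2000):
    prediction = weight * x + bias
    error = prediction - y
    # Gradients of the mean squared error with respect to weight and bias.
    grad_weight = 2.0 * np.mean(error * x)
    grad_bias = 2.0 * np.mean(error)
    # Nudge both parameters in the direction that lowers the error.
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias

final_error = np.mean((weight * x + bias - y) ** 2)
print(weight, bias, final_error)
```

After enough attempts the weight and bias settle close to 2 and 1, the values that generated the toy data.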

Source: here

A model is made up of layers. The input layer is where your data gets fed to the model. You can have as many hidden layers as you want, with as many nodes in each layer as you want. The last layer is called the output layer; this is where the model makes its prediction for a given data point.

Hidden layers and activation functions

More hidden layers in a network allow the model to capture more complex relationships in the data. This reduces the need for manual feature extraction. All the layers in the network are interconnected, as you can see in the diagram above. At each node in a hidden layer, you apply what’s called an activation function. These are just mathematical functions applied to the values in the nodes of a hidden layer, and they are the reason hidden layers can model complex non-linear relationships. For a deeper understanding check this out.


The image above shows common activation functions. In this article, we’re going to use the ReLU activation function for our input and hidden layers and the Sigmoid activation function for our output layer. ReLU takes the value of the node: if it’s above 0 the value stays the same, but if it’s below 0 it gets set to 0. Sigmoid squashes any value into the range between 0 and 1, which suits classifying into 0 or 1 (aka binary classification).
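Both functions are short enough to write by hand. The sketch below uses plain NumPy rather than the Keras implementations, and the input values are arbitrary examples.

```python
import numpy as np

def relu(z):
    # Negative values become 0, everything else passes through unchanged.
    return np.maximum(z, 0.0)

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

values = np.array([-2.0, 0.0, 3.0])
relu_out = relu(values)        # array([0., 0., 3.])
sigmoid_out = sigmoid(values)  # the middle entry is exactly 0.5
print(relu_out, sigmoid_out)
```

A Sigmoid output can be read as a probability, so a common convention is to predict class 1 when it is above 0.5.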

Making our model

Now that you have a basic understanding of deep learning, let’s get started on building out the model.

Data preparation

We’re going to be using NumPy and pandas, so let’s go ahead and import them.

import pandas as pd
import numpy as np

Now let’s load in the data and inspect some features of the dataset.

df = pd.read_csv("diabetic_data.csv")

The ‘?’s in the dataset mean the same thing as an empty cell, so we replace them with NaN. We also drop the columns that carry no signal, like the ID columns, and the columns with a lot of missing data.

# treat "?" as a missing value
df = df.replace("?", np.nan)
# drop ID columns and columns that are mostly missing
df = df.drop(["encounter_id", "weight", "medical_specialty", "patient_nbr"], axis=1)
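If you want to see which columns are mostly missing before deciding what to drop, a sketch like the one below can help. The toy DataFrame and the 0.5 cutoff are assumptions for illustration, not values taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows that imitate the "?" placeholder used in the dataset.
toy = pd.DataFrame({
    "race": ["Caucasian", "?", "Asian"],
    "weight": ["?", "?", "[75-100)"],
})

toy = toy.replace("?", np.nan)

# Fraction of missing cells per column.
missing_share = toy.isna().mean()

# Hypothetical rule: drop any column that is more than half empty.
mostly_missing = missing_share[missing_share > 0.5].index.tolist()
toy = toy.drop(columns=mostly_missing)
print(missing_share, list(toy.columns))
```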

Next, we need to turn the column we’re predicting into 1’s and 0’s: 1 for readmitted within 30 days and 0 for not readmitted within 30 days. Since we’re only focusing on 30 days, patients who were readmitted after more than 30 days get counted as a 0.

def binary_readmitted(elem):
    # 1 only for readmission within 30 days; everything else is 0
    if elem == "<30":
        return 1
    return 0

df["readmitted"] = df["readmitted"].apply(binary_readmitted)
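The same labelling can also be written as a single vectorized comparison. This is an alternative sketch, not the approach used above, and the toy Series only imitates the kind of values the readmitted column holds.

```python
import pandas as pd

# Made-up labels in the same style as the readmitted column.
labels = pd.Series(["<30", ">30", "NO", "<30"])

# True where the patient came back within 30 days, then cast to 1/0.
binary = (labels == "<30").astype(int)
print(binary.tolist())
```

Both versions give the same result; the comparison just avoids calling a Python function once per row.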

Categorical columns need to be turned into numbers so the model can understand them. We’re going to use a method called one-hot encoding, where you make a new column for each category in a given column, and we apply it to multiple columns. Pandas has a built-in function for this called get_dummies. You can learn more about dummies here.

string_columns = [
df_dummies = pd.get_dummies(df[string_columns],drop_first=True)
df = df.drop(string_columns,axis=1)
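To see what get_dummies produces, here is a small sketch on a toy DataFrame. The column names and values are invented for illustration and are not the column list used above.

```python
import pandas as pd

# Made-up categorical data.
toy = pd.DataFrame({
    "gender": ["Female", "Male", "Female"],
    "age_band": ["[50-60)", "[60-70)", "[70-80)"],
})

# drop_first=True drops the first category of each column, since it is
# implied whenever all the remaining dummy columns are 0.
dummies = pd.get_dummies(toy, drop_first=True)
print(dummies)
```

Each remaining category becomes its own column, named after the original column and the category value.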

Inspect the dummies


Add the dummies to the data set

df = df.join(df_dummies)

Before we split the data into training and testing sets, we balance the classes. If we make the number of rows for patients readmitted within 30 days the same as the number for those who weren’t, the accuracy of the model increases along with the true positive rate.

readmit_patients = df[df["readmitted"] == 1][:10500]
not_readmit_patients = df[df["readmitted"] == 0][:10500]
# DataFrame.append was removed in pandas 2.0, so use pd.concat
balanced_df = pd.concat([readmit_patients, not_readmit_patients])
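Slicing takes the first rows of each class in file order. A variation you could try is to sample the rows at random and shuffle the result. The sketch below shows that on a toy DataFrame; the data, the seed and the helper names are assumptions for illustration.

```python
import pandas as pd

# Made-up data with 3 positive and 7 negative rows.
toy = pd.DataFrame({
    "feature": range(10),
    "readmitted": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
})

# Size of the smaller class decides how many rows we keep per class.
n_per_class = toy["readmitted"].value_counts().min()

balanced = pd.concat([
    toy[toy["readmitted"] == label].sample(n=n_per_class, random_state=42)
    for label in (0, 1)
])

# Shuffle so the two classes are not stored in two solid blocks.
balanced = balanced.sample(frac=1, random_state=42).reset_index(drop=True)
print(balanced["readmitted"].value_counts())
```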

Last but not least, let’s split the data into the training and testing sets.

from sklearn.model_selection import train_test_split
y = balanced_df["readmitted"].values
X = balanced_df.drop("readmitted", axis=1).values
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.3)
# one input per feature column
input_shape = (X.shape[1],)

Building and training our model

For our model, we’re going to use TensorFlow’s Keras API. Our model is going to be sequential, meaning that one layer comes after another. We’re going to have an input layer with 120 nodes and a ReLU activation, 1 hidden layer with 50 nodes and a ReLU activation, and finally our output layer with 1 node for the single output and a Sigmoid activation because we’re doing binary classification. If you’re interested in going deeper into configuring the model architecture check this out.

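from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
model = Sequential()
model.add(Dense(120, activation='relu', input_shape=input_shape))
model.add(Dense(50, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# stop training once validation loss has not improved for 2 epochs
early_stopping_monitor = EarlyStopping(patience=2)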

Now let’s compile the model using the binary_crossentropy loss, the adam optimizer and the accuracy metric.

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
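To get a feel for what the binary_crossentropy loss measures, here is a hand-written NumPy sketch of the formula. It is not the Keras implementation; the function name, the clipping value and the example probabilities are assumptions for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip so that log(0) can never happen.
    y_prob = np.clip(y_prob, eps, 1 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1, 0, 1, 0])

# Confident and correct predictions give a small loss...
confident_and_right = binary_crossentropy(y_true, np.array([0.9, 0.1, 0.9, 0.1]))
# ...while confident and wrong predictions are punished heavily.
confident_and_wrong = binary_crossentropy(y_true, np.array([0.1, 0.9, 0.1, 0.9]))
print(confident_and_right, confident_and_wrong)
```

The first value is about 0.105 and the second about 2.303, which is why the optimizer is pushed hard away from confident mistakes.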

Finally, we get to train the model for up to 13 epochs with a batch size of 10, holding out 0.3 of the training data to validate the model on unseen data while it’s training. The early stopping monitor stops training early to keep the model from overfitting.

history = model.fit(X_train, y_train, epochs=13, validation_split=0.3, batch_size=10, callbacks=[early_stopping_monitor])
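The idea behind a patience of 2 can be sketched in plain Python. This is a simplified imitation of the early stopping logic, not Keras code, and the validation losses are invented numbers.

```python
def epochs_before_stop(val_losses, patience=2):
    """Return the epoch (1-based) at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss, so reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)

# Hypothetical validation losses, one per epoch.
made_up_losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.40]
stopped_at = epochs_before_stop(made_up_losses)
print(stopped_at)
```

With these numbers training stops after epoch 5, because epochs 4 and 5 both failed to beat the 0.45 from epoch 3. The better loss at epoch 6 is never reached, which is the trade-off of a small patience value.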

Evaluating our model

Doing this we get a validation accuracy of 0.78 and a validation loss of 0.41. Let’s look at the true positives. A true positive is when the model predicts that the patient will be readmitted within 30 days and the prediction is correct. We summarize this with the AUC, the area under the ROC curve, which plots the true positive rate against the false positive rate. The closer to 1 the better, and the closer to 0.5 the more the model is predicting randomly.

from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
y_pred_test = model.predict(X_test)[:,0]
y_pred_train = model.predict(X_train)[:,0]
fpr_test, tpr_test, thresholds_test = roc_curve(y_test, y_pred_test)
fpr_train, tpr_train, thresholds_train = roc_curve(y_train, y_pred_train)
auc_test = auc(fpr_test, tpr_test)
auc_train = auc(fpr_train, tpr_train)
plt.plot([0, 1], [0, 1], 'k--')
plt.plot(fpr_train, tpr_train, label='Train (area = {:.3f})'.format(auc_train))
plt.plot(fpr_test, tpr_test, label='Test (area = {:.3f})'.format(auc_test))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()
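Another way to read the AUC: it is the chance that a randomly picked readmitted patient gets a higher score than a randomly picked non-readmitted patient. The sketch below computes it that way on four made-up predictions; auc_by_pairs is a hypothetical helper, not part of scikit-learn.

```python
import numpy as np

def auc_by_pairs(y_true, y_score):
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive with every negative; ties count as half a win.
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

# Made-up labels and predicted probabilities.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc_by_pairs(y_true, y_score)
print(toy_auc)
```

Three of the four positive/negative pairs are ranked correctly here, so the result is 0.75. Comparing all pairs gets slow on large datasets, so for real data roc_curve and auc as used above are the better choice.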

The AUC score is 0.87 on the test data, which is good. Now let’s look at the loss on the training and validation data over the course of the epochs.

plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.legend(['train', 'val'], loc='upper left')

The training and validation lines usually run roughly parallel, but in this graph they don’t. When the training loss keeps going lower while the validation loss does not follow, the model is overfitting to the training data. Another factor that plays into this, and into the fact that the validation accuracy isn’t the best, is that the data set itself has a lot of missing or heavily skewed data. If we look at the data on Kaggle, it shows that a lot of the columns are 100% No and 0% Steady. The data is imbalanced like this for the majority of the other columns, which keeps the model from making good predictions.
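You can check for this kind of skew yourself by looking at how much of each column is taken up by its most common value. The sketch below does that on a toy DataFrame; the column names, the values and the 0.95 cutoff are assumptions for illustration.

```python
import pandas as pd

# Made-up columns: one almost constant, one evenly split.
toy = pd.DataFrame({
    "drug_a": ["No"] * 99 + ["Steady"],
    "drug_b": ["No"] * 50 + ["Steady"] * 50,
})

# Share of rows taken by the most frequent value in each column.
dominant_share = toy.apply(lambda col: col.value_counts(normalize=True).iloc[0])

# Hypothetical rule: flag columns where one value covers over 95% of rows.
near_constant = dominant_share[dominant_share > 0.95].index.tolist()
print(dominant_share, near_constant)
```

Columns flagged this way carry very little information, so they are candidates for dropping before training.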

Thanks for reading! If you liked this article, follow me and check out my other articles!
