I'm using Keras to build an LSTM recurrent neural network. My code works well, but it could do with serious refactoring. I am forecasting time-series values, and depending on the window size I want to predict, I end up writing code that is far too specific to that window size, i.e. it is hard to cater for lots of different sizes.
I split my dataset into train and test sets:
train_size = int(len(dataset) * 0.67)   # 67% of the data for training
test_size = len(dataset) - train_size   # the remaining 33% for testing
train, test = dataset[0:train_size, :], dataset[train_size:len(dataset), :]
If I want to predict 5 timesteps ahead, then my predictions, stored in the variable output, take the form [[t+6, t+7, t+8, t+9, t+10], [t+7, t+8, t+9, t+10, t+11], ...], i.e.
prediction 1 [t+6, t+7, t+8, t+9, t+10]
prediction 2 [t+7, t+8, t+9, t+10, t+11]
prediction 3 [t+8, t+9, t+10, t+11, t+12]
prediction 4 [t+9, t+10, t+11, t+12, t+13]
prediction 5 [t+10, t+11, t+12, t+13, t+14]
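The key structure here is that entries whose row and column indices sum to the same value are all forecasts of the same timestep. A quick toy check (the array below is illustrative stand-in data, not real model output):
import numpy as np

# Toy stand-in for output: prediction i, step j forecasts timestep t + 6 + i + j
output = np.array([[6 + i + j for j in range(5)] for i in range(5)])

# Every anti-diagonal (constant i + j) repeats the forecast of one timestep
print(output[0][2], output[1][1], output[2][0])   # all 8, i.e. t+8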
Now, to get these values back in a single logical sequence, i.e. t+6, t+7, t+8, ..., t+14, I average the overlapping forecasts of each timestep using this code:
output_plot = np.array([])
output_plot = np.append(output_plot, output[0][0])
output_plot = np.append(output_plot, np.mean([output[0][1], output[1][0]]))
output_plot = np.append(output_plot, np.mean([output[0][2], output[1][1], output[2][0]]))
output_plot = np.append(output_plot, np.mean([output[0][3], output[1][2], output[2][1], output[3][0]]))
for i in range(len(output) - predict_steps + 1):
    tmp = np.mean([output[i][4], output[i+1][3], output[i+2][2], output[i+3][1], output[i+4][0]])
    output_plot = np.append(output_plot, tmp)
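One property worth noting (assuming output has at least predict_steps rows): this yields exactly one averaged value per prediction, so the final predict_steps - 1 timesteps, which are only partially covered, are dropped.
# Sanity check: one averaged value per prediction window; the trailing
# predict_steps - 1 partially-covered timesteps are not included
assert len(output_plot) == len(output)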
My problem arises when I want to extend the prediction window out to, say, 10 timesteps. I then manually extend the preceding code as follows:
output_plot = np.array([])
output_plot = np.append(output_plot, output[0][0])
output_plot = np.append(output_plot, np.mean([output[0][1], output[1][0]]))
output_plot = np.append(output_plot, np.mean([output[0][2], output[1][1], output[2][0]]))
output_plot = np.append(output_plot, np.mean([output[0][3], output[1][2], output[2][1], output[3][0]]))
output_plot = np.append(output_plot, np.mean([output[0][4], output[1][3], output[2][2], output[3][1], output[4][0]]))
output_plot = np.append(output_plot, np.mean([output[0][5], output[1][4], output[2][3], output[3][2], output[4][1], output[5][0]]))
output_plot = np.append(output_plot, np.mean([output[0][6], output[1][5], output[2][4], output[3][3], output[4][2], output[5][1], output[6][0]]))
output_plot = np.append(output_plot, np.mean([output[0][7], output[1][6], output[2][5], output[3][4], output[4][3], output[5][2], output[6][1], output[7][0]]))
output_plot = np.append(output_plot, np.mean([output[0][8], output[1][7], output[2][6], output[3][5], output[4][4], output[5][3], output[6][2], output[7][1], output[8][0]]))
for i in range(len(output) - predict_steps + 1):
    tmp = np.mean([output[i][9], output[i+1][8], output[i+2][7], output[i+3][6], output[i+4][5], output[i+5][4], output[i+6][3], output[i+7][2], output[i+8][1], output[i+9][0]])
    output_plot = np.append(output_plot, tmp)
While this works, it is horrendously inefficient to write, and it clearly does not scale. How can I best refactor these steps to make the code amenable to a wider range of prediction windows? Also, my question title could do with some improvement, so please edit away!
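For concreteness, here is a rough sketch of the kind of generalization I have in mind, averaging each anti-diagonal in a loop (average_diagonals is a name I have made up, and it assumes output is indexed exactly as described above); I am not sure whether this is the idiomatic way to do it:
import numpy as np

def average_diagonals(output, predict_steps):
    """Average the overlapping forecasts of each timestep.

    output[i][j] forecasts the same timestep as output[i + 1][j - 1],
    so all entries with equal i + j are averaged together.
    """
    n = len(output)
    return np.array([
        np.mean([output[i][d - i]
                 for i in range(max(0, d - predict_steps + 1), min(d + 1, n))])
        for d in range(n)   # one value per prediction, as in the code above
    ])
For predict_steps = 5 (or 10) this reproduces the hand-written versions above; e.g. with the toy output from earlier, average_diagonals(output, 5) gives [6., 7., 8., 9., 10.].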