Copyright @ Michael Guo

In this notebook, I concatenate all meaningful data into one dataframe.

These data include: voltage, capacity, efficiency, charging OCV, discharging OCV, charging IR, discharging IR.

Later, each cell's dQ/dV peak position will be added as well.
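As a rough illustration of what a dQ/dV peak position is, the sketch below differentiates a capacity curve with respect to voltage and picks the voltage where the derivative is largest. The curve is synthetic (a tanh-shaped step centred at 3.7 V, chosen only for the example) and the variable names are assumptions, not the notebook's own.

```python
import numpy as np

# Synthetic capacity-vs-voltage curve (made-up data): a smooth step centred at 3.7 V.
voltage = np.linspace(3.0, 4.2, 241)                      # V, 5 mV spacing
capacity = 25.0 * (1 + np.tanh((voltage - 3.7) / 0.05))   # mAh

# Numerical derivative dQ/dV using central differences.
dqdv = np.gradient(capacity, voltage)

# Peak position: the voltage at which dQ/dV is largest.
peak_voltage = voltage[np.argmax(dqdv)]
print(peak_voltage)
```

Measured curves are noisy, so some smoothing before differentiation is usually needed; that step is left out here.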

exp_path = 'ProcessedData/'
ocv_charging = read_file(exp_path + 'OCV-charge-final.csv')

ocv_charging.head(5)

c = ocv_charging.columns
c

col =['Cycle ID', 'Step Name', 'Voltage(V)', 'Current(mA)', 'Capacity(mAh)', 'Time(h:min:s.ms)', 'Realtime', 'Time', 'RTime']

Open the file in read mode and read its contents. The with block closes the file automatically.

with open("cells.txt", "r") as my_file:
    data = my_file.read()

Split the text at each end of line ('\n'), so that every line of the file becomes one entry in the list.

cells_into_list = data.split("\n")

Define functions to read files with selected columns, convert data from a column data to row data.
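A minimal sketch of what such helpers could look like, assuming the processed files are CSVs readable with pandas. The function names `read_selected` and `column_to_row`, the `CellID` column, and the sample values are hypothetical and only illustrate the idea.

```python
import io
import pandas as pd

def read_selected(source, columns):
    """Read a CSV and keep only the requested columns (hypothetical helper)."""
    return pd.read_csv(source, usecols=columns)

def column_to_row(df, value_col, prefix, cell_id):
    """Turn one per-cycle column into a single row with one column per cycle."""
    values = df[value_col].to_numpy()
    names = [f"{prefix}{i + 1}" for i in range(len(values))]
    row = pd.DataFrame([values], columns=names)
    row.insert(0, "CellID", cell_id)
    return row

# Tiny in-memory CSV standing in for a processed data file (made-up values).
csv_text = (
    "Cycle ID,Voltage(V),Capacity(mAh)\n"
    "1,4.20,50.0\n"
    "2,4.19,49.5\n"
    "3,4.18,49.1\n"
)
df = read_selected(io.StringIO(csv_text), ["Cycle ID", "Voltage(V)"])
row = column_to_row(df, "Voltage(V)", "OCV_charge", cell_id="cell_A")
print(row)
```

Each cell then contributes one row, with column names such as `OCV_charge1`, `OCV_charge2`, and so on.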

Define the selected column names, and change the names according to the cell name

Experiment with regression on single-cell data to predict capacity retention.

Use linear regression first
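A minimal sketch of a linear regression baseline with scikit-learn. The data are made up (retention fading linearly with cycle number), so the feature and target here are assumptions rather than the notebook's actual columns.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up example: capacity retention that fades linearly with cycle number.
cycles = np.arange(1, 11).reshape(-1, 1)    # feature matrix, shape (10, 1)
retention = 1.0 - 0.002 * cycles.ravel()    # target, shape (10,)

model = LinearRegression()
model.fit(cycles, retention)

# Extrapolate the fitted line to a later cycle.
predicted = model.predict(np.array([[20]]))
print(model.coef_[0], model.intercept_, predicted[0])
```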

Convert data for all cells, export data to one dataframe for clustering

Need to predefine the 'cells.txt' file

Convert the data for one cell into one dataframe, with OCV and capacity as the selected features for regression.

Since all cells have similar properties, we concatenate the data for all cells using the same column names.
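The concatenation can be sketched with `pandas.concat`, assuming every per-cell dataframe shares the same column names. The cell IDs, column names, and values below are invented for the example.

```python
import pandas as pd

# Two per-cell dataframes with identical column names (made-up values).
cell_a = pd.DataFrame({"CellID": ["A"], "OCV_charge1": [4.20], "Capacity1": [50.0]})
cell_b = pd.DataFrame({"CellID": ["B"], "OCV_charge1": [4.19], "Capacity1": [49.7]})

# Stack the rows; ignore_index renumbers the combined index from 0.
df_all = pd.concat([cell_a, cell_b], ignore_index=True)
print(df_all)
```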

Use scikit-learn for training

Only use OCV for retention prediction

Use the data of one cell (df_final_one) for the same training

Explore TPOT to search for the best regressor

Use a random forest regressor for the training
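A minimal sketch of training a random forest with scikit-learn's `RandomForestRegressor`. The features and target are synthetic, and the hyperparameters (`n_estimators=100`, `random_state=0`) are assumptions, not values taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: retention fades with cycle number, plus a little noise.
rng = np.random.default_rng(0)
cycles = np.arange(1, 201).reshape(-1, 1)
retention = 1.0 - 0.001 * cycles.ravel() + 0.002 * rng.normal(size=200)

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(cycles, retention)

pred = forest.predict(cycles)
train_r2 = forest.score(cycles, retention)   # R^2 on the training data
print(train_r2)
```

The training score is optimistic; a held-out split (for example with `train_test_split`) gives a fairer estimate.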

Explore the correlation of the data, remove highly correlated features

Dimensionality reduction with correlation: the number of features is reduced from 207 to 8
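One common way to do this kind of reduction is to drop every feature whose absolute correlation with an earlier feature exceeds a threshold. The sketch below assumes a threshold of 0.95 and a hypothetical helper `drop_correlated`; the notebook's actual threshold and procedure are not stated.

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.95):
    """Drop each column that correlates above `threshold` with an earlier column."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is considered once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return df.drop(columns=to_drop)

rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = rng.normal(size=100)
# f2 is an exact multiple of f1, so the pair is perfectly correlated.
features = pd.DataFrame({"f1": a, "f2": 2 * a, "f3": b})

reduced = drop_correlated(features)
print(list(reduced.columns))
```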

Explore clustering for the data
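A minimal sketch of KMeans clustering with three clusters, storing the result in a `Labels` column. The dataframe `t` and the column names mirror those used in the plotting code, but the data points here are synthetic and `n_init=10`, `random_state=0` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic data: three well-separated groups of 20 cells each (made-up values).
rng = np.random.default_rng(0)
centers = np.array([[3.0, 4.10], [3.2, 4.15], [3.4, 4.20]])
points = np.vstack([c + 0.005 * rng.normal(size=(20, 2)) for c in centers])
t = pd.DataFrame(points, columns=["Voltage_min1", "OCV_charge1"])

# Fit KMeans on the two features and keep the cluster index of each row.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
t["Labels"] = kmeans.fit_predict(t[["Voltage_min1", "OCV_charge1"]])
print(t["Labels"].value_counts())
```

With features on very different scales, standardizing them first (for example with `StandardScaler`) usually gives more meaningful clusters.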

import matplotlib.pyplot as plt
import seaborn as sns

# Scatter plots of the KMeans labels for four feature pairs.
# x and y are passed as keywords because recent seaborn versions reject positional x/y.
palette = sns.color_palette('hls', 3)
plt.figure(figsize=(12, 8))

plt.subplot(2, 2, 1)
sns.scatterplot(x=t['Voltage_min1'], y=t['OCV_charge1'], hue=t['Labels'], palette=palette)
plt.title('KMeans with 3 Clusters')

plt.subplot(2, 2, 2)
sns.scatterplot(x=t['Voltage_max1'], y=t['OCV_charge1'], hue=t['Labels'], palette=palette)
plt.title('KMeans with 3 Clusters')

plt.subplot(2, 2, 3)
sns.scatterplot(x=t['Voltage_max1'], y=t['Voltage_min1'], hue=t['Labels'], palette=palette)
plt.title('KMeans with 3 Clusters')

plt.subplot(2, 2, 4)
sns.scatterplot(x=t['CellID'], y=t['Voltage_min1'], hue=t['Labels'], palette=palette)
plt.title('KMeans with 3 Clusters')

plt.show()