The weekly sales data is from Kaggle: https://www.kaggle.com/balaganeshm/clustering
It contains the weekly sales volume of 811 products over a whole year. The dataframe holds both normalized and un-normalized versions of the data, which makes it convenient for analysis: we can try to cluster the products into different classes and see what information we can extract.
import numpy as np # linear algebra
import pandas as pd # data processing
from matplotlib import style
df = pd.read_csv('../input/clustering/clustering/data/sales_transactions_dataset_weekly.csv')
df.head() #show first 5 rows
df.info() #show overall stats of the dataframe
print(list(df.columns)) #show the different columns
df.isnull().any().sum() #checking for null values
There are no NULL values. The dataframe has 107 columns: product_code (which serves as the index), the 52 raw weekly sales columns, MIN and MAX (the minimum and maximum number of units sold over the 52 weeks), and 52 columns that repeat the weekly sales normalized with a min-max strategy.
There are no categorical columns apart from product_code, which is just an identifier and can be dropped. The rest are all numerical, and since the data is already normalized we can work with it directly.
raw = df.iloc[:,1:53] #raw un-normalized columns
raw.describe()
normal = df.iloc[:,55:]
normal.describe() #normalized columns
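As a quick sanity check, the sketch below verifies that the normalized columns can be reconstructed from the raw ones, under the assumption that MIN and MAX sit at iloc positions 53 and 54 and that the normalization is row-wise min-max:
# sanity-check sketch: reconstruct the normalized columns from the raw ones
# (assumes row-wise min-max scaling with the MIN/MAX columns at iloc positions 53 and 54; adjust if the layout differs)
mins = df.iloc[:, 53].values.reshape(-1, 1)
maxs = df.iloc[:, 54].values.reshape(-1, 1)
reconstructed = (raw.values - mins) / (maxs - mins + 1e-9)  # small epsilon guards against MAX == MIN
print(np.abs(reconstructed - normal.values).max())          # should be close to 0 if the assumption holds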
It is not possible to visualize high-dimensional data directly, so we will apply PCA to the normalized data to project it into a lower-dimensional space and look at the explained variance. Both 2- and 3-component projections are visualized.
What is PCA?
Principal component analysis (PCA) is a technique for reducing the dimensionality of high dimensional datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance.
In our case the data has shape (811, 52). Although it is feasible to train clustering models on n-dimensional data, it is not feasible to visualize the resulting clusters; the best we can do is 2D or 3D. So we will compute PCA projections with 2 and 3 components and visualize them using plotly.
'''
compute PCA on the normalized data and store the lower-dimensional projections in new dataframes
'''
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
%matplotlib inline
pca_2 = PCA(n_components=2)
pca_3 = PCA(n_components=3)
principle_components_2 = pca_2.fit_transform(normal)
principle_components_3 = pca_3.fit_transform(normal)
pca_data_2 = pd.DataFrame(data=principle_components_2, columns = ['principle component 1', 'principle component 2'])
pca_data_3 = pd.DataFrame(data=principle_components_3, columns = ['principle component 1', 'principle component 2', 'principle component 3'])
print(f'variance of 2 dimensional PCA: {pca_2.explained_variance_ratio_}')
print(f'variance of 3 dimensional PCA: {pca_3.explained_variance_ratio_}')
plt.scatter(x = pca_data_2['principle component 1'], y = pca_data_2['principle component 2'])
plt.xlabel('principle component 1')
plt.ylabel('principle component 2')
plt.title('PCA for normalized weekly sales data')
plt.show()
import plotly.express as px
# Creating plot
px.scatter_3d(x = pca_data_3['principle component 1'], y = pca_data_3['principle component 2'],
z = pca_data_3['principle component 3'])
Okay, but selecting 2 or 3 components is arbitrary; we only do that because it is easier for us to visualize, and it might not capture most of the variance in the dataset. Sometimes it takes more than 3 components. So let's visualize the cumulative variance captured as the number of components increases.
'''
plot the cumulative explained variance against the number of components
'''
pca = PCA()
components = pca.fit_transform(normal)
exp_var_cumul = np.cumsum(pca.explained_variance_ratio_)
px.area(
x=range(1, exp_var_cumul.shape[0] + 1),
y=exp_var_cumul,
labels={"x": "# Components", "y": "Explained Variance"})
A commonly accepted threshold is to keep enough components to explain 90% of the variance of the dataset. In our case about 40 components capture 90% of the variance. But since the dataset is not that large and the remaining 12 components barely affect the training complexity, we will keep all 52 features for training anyway.
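For reference, the small sketch below reads the same number programmatically from the exp_var_cumul array computed above:
# sketch: number of components needed to reach 90% cumulative explained variance
n_components_90 = int(np.argmax(exp_var_cumul >= 0.90)) + 1
print(f'{n_components_90} components explain {exp_var_cumul[n_components_90 - 1]:.1%} of the variance')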
t-SNE is another, more modern approach to dimensionality reduction. Seriously, PCA dates back to 1933!
t-SNE is inherently different from PCA. PCA constructs new components along the directions of maximum variance and we keep only the components that clear a variance threshold, whereas t-SNE preserves the closeness of data points, balancing local neighbourhood structure against global structure.
There are two important hyperparameters for t-SNE: perplexity, which balances attention between local and global aspects of the data and typically ranges from 5 to 50 depending on the size of the dataset, and the number of iterations, which is usually tuned to keep the computation tractable.
from sklearn.manifold import TSNE
tsne_2 = TSNE(n_components=2,perplexity=5).fit_transform(normal)
tsne_3 = TSNE(n_components=3,perplexity=5).fit_transform(normal)
tsne_data_2 = pd.DataFrame(data=tsne_2, columns = ['Embedding 1', 'Embedding 2'])
tsne_data_3 = pd.DataFrame(data=tsne_3, columns = ['Embedding 1', 'Embedding 2', 'Embedding 3'])
fig, ax = plt.subplots(1,2,figsize=(15,8))
ax[0].scatter(x = pca_data_2['principle component 1'], y = pca_data_2['principle component 2'])
ax[0].set_xlabel('principle component 1')
ax[0].set_ylabel('principle component 2')
ax[0].set_title('principle component analysis')
ax[1].scatter(x = tsne_data_2['Embedding 1'], y = tsne_data_2['Embedding 2'])
ax[1].set_xlabel('Embedding 1')
ax[1].set_ylabel('Embedding 2')
ax[1].set_title('t-SNE analysis')
plt.show()
from plotly.subplots import make_subplots
px.scatter_3d(x = tsne_data_3['Embedding 1'], y = tsne_data_3['Embedding 2'],
z = tsne_data_3['Embedding 3'])
The number of clusters should be small enough to be meaningful (otherwise there is no real clustering) but large enough that the inertia becomes reasonably small. Inertia is the sum of squared distances between each data point and the centre of the cluster it is assigned to.
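To make that definition concrete, here is a minimal sketch on toy data (not the sales dataset) showing that scikit-learn's inertia_ attribute is exactly this sum of squared distances:
# sketch: inertia is the sum of squared distances from each point to its assigned cluster centre
import numpy as np
from sklearn.cluster import KMeans
X_toy = np.random.rand(100, 5)                      # toy data for illustration only
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_toy)
assigned_centres = km.cluster_centers_[km.labels_]  # centre assigned to each point
manual_inertia = np.sum((X_toy - assigned_centres) ** 2)
print(manual_inertia, km.inertia_)                  # the two values should match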
But we have to do some analysis before settling on the optimal cluster count (k). First we will fit K-means for every cluster count from 1 to 50, expecting the optimum to lie somewhere between 5 and 10.
from sklearn.cluster import KMeans
from sklearn.metrics.cluster import contingency_matrix
'''
cluster function: takes the number of clusters and a dataframe as input,
fits a KMeans model and returns the fitted model together with the predicted cluster labels.
'''
def cluster(nclusters, data):
    kmeans = KMeans(n_clusters=nclusters)  # initialise KMeans with the requested number of clusters
    kmeans.fit(data)                       # fit on the normalized sales data
    Z = kmeans.predict(data)               # cluster label for every product
    return kmeans, Z
max_cluster_size = 50
inertias = np.zeros(max_cluster_size)  # array that will be filled with the computed inertia for each k
for i in range(1, max_cluster_size):
    kmeans, Z = cluster(i, normal)
    inertias[i] = kmeans.inertia_
import plotly.graph_objects as go
'''
plot for the elbow method to find the optimal k
'''
fig = go.Figure(
    data=[go.Scatter(x=list(range(1, max_cluster_size)),
                     y=list(inertias[1:]),
                     mode='lines+markers',
                     name='inertia')],
    layout=go.Layout(title='Elbow for KMeans clustering',
                     xaxis={'title': 'Number of clusters'},
                     yaxis={'title': 'Inertia'}))
fig.show()
In cluster analysis, the elbow method is a heuristic used in determining the number of clusters in a data set. The method consists of plotting the explained variation as a function of the number of clusters, and picking the elbow of the curve as the number of clusters to use.
If we look at the elbow graph for our data, we can see that the number of clusters could be anywhere between 3 and 6, and we cannot be sure which of those is optimal. The elbow method works well when there is a distinct change between two points: for example, if the drop between 3 and 5 were steep, we could say 4 is the optimal k. When it is not, there is a workaround: instead of the vanilla inertia we can compute a scaled inertia. Scaling here means normalizing by the inertia at k = 1 and adding a regularization term alpha times k. Alpha should be a small number; the larger it is, the more heavily additional clusters are penalized (with a very large alpha the chosen k collapses to 1). To pick the cluster size, we simply take the argmin of the scaled inertia over all k.
$$ \text{Scaled Inertia}_k = \frac{\text{Inertia}_k}{\text{Inertia}_{k=1}} + \alpha \, k $$
def AutoKmeans(data, k, alpha_k=0.02):
    inertia_o = np.square(data.values - data.values.mean(axis=0)).sum()  # inertia for k = 1 (total sum of squares)
    kmeans = KMeans(n_clusters=k, random_state=0).fit(data)
    scaled_inertia = kmeans.inertia_ / inertia_o + alpha_k * k           # normalized inertia plus a penalty of alpha per extra cluster
    return scaled_inertia
def chooseBestKforKMeans(data, k_range):
ans = []
for k in k_range:
scaled_inertia = AutoKmeans(data, k)
ans.append((k, scaled_inertia))
results = pd.DataFrame(ans, columns = ['k','Scaled Inertia']).set_index('k')
return results
res_df = chooseBestKforKMeans(normal,range(1,50))
print(f'Best k for the data: {res_df["Scaled Inertia"].idxmin()}')
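Optionally, the scaled inertia curve itself can be plotted to see where the minimum lies, using the res_df computed above:
fig = px.line(x=res_df.index, y=res_df['Scaled Inertia'], title='Scaled inertia vs number of clusters')
fig.update_xaxes(title_text='Number of clusters (k)')
fig.update_yaxes(title_text='Scaled Inertia')
fig.show()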
Okay, so we found that 4 is the optimal cluster count using the scaled inertia method. Now we can train models with this cluster count and get their predictions.
n_clusters = 4
model, Z = cluster(n_clusters, normal)
model_pca, Z_pca = cluster(n_clusters, pca_data_2)
model_tsne, Z_tsne = cluster(n_clusters, tsne_data_2)
model_pca_3, Z_pca_3 = cluster(n_clusters, pca_data_3)
model_tsne_3, Z_tsne_3 = cluster(n_clusters, tsne_data_3)
Concatenate the predicted cluster labels with the PCA and t-SNE dataframes so we can visualize the clusters in low dimensions.
pca_data_2['class_normal'] = Z
pca_data_2['class_pca'] = Z_pca
pca_data_3['class_normal'] = Z
pca_data_3['class_pca'] = Z_pca_3
tsne_data_2['class_normal'] = Z
tsne_data_2['class_tsne'] = Z_tsne
tsne_data_3['class_normal'] = Z
tsne_data_3['class_tsne'] = Z_tsne_3
classes = ['1','2','3','4']
fig, ax = plt.subplots(1,2,figsize=(15,8))
ax[0].scatter(x = pca_data_2['principle component 1'], y = pca_data_2['principle component 2'],
c = pca_data_2['class_normal'], label=pca_data_2['class_normal'])
ax[0].set_xlabel('principle component 1')
ax[0].set_ylabel('principle component 2')
ax[0].set_title('Kmeans before PCA')
ax[1].scatter(x = pca_data_2['principle component 1'], y = pca_data_2['principle component 2'],
c = pca_data_2['class_pca'], label=pca_data_2['class_pca'])
ax[1].set_xlabel('principle component 1')
ax[1].set_ylabel('principle component 2')
ax[1].set_title('Kmeans with PCA')
plt.show()
px.scatter_3d(x = pca_data_3['principle component 1'], y = pca_data_3['principle component 2'],
z = pca_data_3['principle component 3'], color=pca_data_3['class_pca'])
Applying K-means to the PCA projection and to the raw normalized data gives very similar clusters. As we can see from the 2D and 3D plots, there are some outliers in both cases, but overall the results look good.
classes = ['1','2','3','4']
fig, ax = plt.subplots(1,2,figsize=(15,8)) #subplot for clusters found on the raw normalized data
ax[0].scatter(x = tsne_data_2['Embedding 1'], y = tsne_data_2['Embedding 2'],
c = tsne_data_2['class_normal'], label=tsne_data_2['class_normal'])
ax[0].set_xlabel('Embedding 1')
ax[0].set_ylabel('Embedding 2')
ax[0].set_title('Kmeans before t-SNE')
ax[1].scatter(x = tsne_data_2['Embedding 1'], y = tsne_data_2['Embedding 2'], #subplot for clusters found on the t-SNE embedding
c = tsne_data_2['class_tsne'], label=tsne_data_2['class_tsne'])
ax[1].set_xlabel('Embedding 1')
ax[1].set_ylabel('Embedding 2')
ax[1].set_title('Kmeans with t-SNE')
plt.show()
px.scatter_3d(x = tsne_data_3['Embedding 1'], y = tsne_data_3['Embedding 2'],
z = tsne_data_3['Embedding 3'], color=tsne_data_3['class_tsne'])
Both dimensionality reduction techniques seem to work very well. But does that mean they are correct? In one of the Stack Overflow answers here, the writer points out that t-SNE clusters must be interpreted carefully before drawing conclusions, as they can be misleading:
While clustering after t-SNE will sometimes (often?) work, you will never know whether the "clusters" you find are real, or just artifacts of t-SNE. You may just be seeing shapes in clouds.
Even though the visuals give some idea of the models' clustering capability, we need to quantify it to make it more trustworthy.
Quantifying results with metrics is easier in supervised learning; given the lack of labels in unsupervised learning, the options are limited. One of the most common metrics for clustering problems is the Silhouette Coefficient. It is given by,
$$ S = \frac{b - a}{\max(a, b)} $$
a = the mean distance between a data point and the other points in its own cluster (intra-cluster distance).
b = the mean distance between a data point and the points in the nearest neighbouring cluster (inter-cluster distance).
The Silhouette Coefficient ranges from -1 to 1: values near -1 mean points have likely been assigned to the wrong clusters, values near 0 mean the clusters overlap and are barely separated, and values near 1 mean the clusters are well separated.
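To connect the formula with the implementation, here is a small sketch on toy data (not the sales dataset) that computes a and b by hand for one point and compares the result with sklearn's silhouette_samples:
# sketch: silhouette coefficient of a single point computed by hand vs sklearn
import numpy as np
from sklearn.metrics import silhouette_samples, pairwise_distances
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # cluster 0
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])  # cluster 1
labels = np.array([0, 0, 0, 1, 1, 1])
D = pairwise_distances(X_toy)
i = 0                                    # point of interest
same = labels == labels[i]
a = D[i, same].sum() / (same.sum() - 1)  # mean distance to the other points in its own cluster
b = D[i, ~same].mean()                   # mean distance to the (only) other cluster
print((b - a) / max(a, b), silhouette_samples(X_toy, labels)[i])  # the two values should match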
from sklearn.metrics import silhouette_score, silhouette_samples
# class label columns were appended to the PCA/t-SNE dataframes above, so score only the embedding columns
normal_score = silhouette_score(normal, model.labels_, metric='euclidean')
pca_score_2 = silhouette_score(pca_data_2[['principle component 1', 'principle component 2']], model_pca.labels_, metric='euclidean')
tsne_score_2 = silhouette_score(tsne_data_2[['Embedding 1', 'Embedding 2']], model_tsne.labels_, metric='euclidean')
pca_score_3 = silhouette_score(pca_data_3[['principle component 1', 'principle component 2', 'principle component 3']], model_pca_3.labels_, metric='euclidean')
tsne_score_3 = silhouette_score(tsne_data_3[['Embedding 1', 'Embedding 2', 'Embedding 3']], model_tsne_3.labels_, metric='euclidean')
nl = '\n'
print(f'KMeans Non Engineered Silhouette Score: {nl} {normal_score}')
print('\n')
print(f'KMeans PCA Scaled Silhouette Score; {nl} 2 components: {pca_score_2}; {nl} 3 components: {pca_score_3}')
print('\n')
print(f'KMeans t-SNE Scaled Silhouette Score;{nl} 2 embeddings: {tsne_score_2};{nl} 3 embeddings: {tsne_score_3}')
This somewhat contradicts the impression given by the t-SNE visualizations. Interestingly, the silhouette score of the model trained on the full normalized dataframe does not match what the low-dimensional visuals suggest, which is exactly why quantification is needed. So for our data, the combination that works best is K-means + PCA.
Why do all this anyway? We can extract some useful information from the clustered data. First let's concatenate the predicted labels back onto the raw data and see if there are any patterns.
inf = raw.copy()
inf['class'] = Z
sales_group = inf.groupby('class').sum().astype(int).reset_index() #group by the predicted cluster and calculate the total sales
weekly_sales = sales_group.drop('class',axis=1)
Say we want to see the weekly sales pattern of a product. We do not have product names, only arbitrary codes like P1 and P2, but let's assume P1 is ice cream and P2 is pencils. They would obviously have different sales patterns and would likely be clustered into different groups. We want to see in which week of the year they sell the most so we can manage stock.
A typical plot of the raw data looks like this: very messy and illegible.
fig = px.line(raw.T, title='Messy data representation of weekly sales')
fig.update_xaxes(title_text='Week')
fig.update_yaxes(title_text='Sales Count')
fig.show()
Now let's take a look at our clustered and grouped data. So much better. When a new product with similar characteristics is added to the data, we can easily predict which cluster it belongs to and therefore at which time of the year it is likely to sell the most (a sketch of this follows the plot below). Of course the data is not elaborate enough to fully support this kind of pattern analysis, since it only contains weekly sales counts from a single year; it could be made more robust with additional features such as product categories, seasonal data, or data from more years.
Two of the product clusters seem to fall behind in sales as the year progresses (I wonder what those could be?), while the other two stay fairly static at low volumes.
fig = px.line(weekly_sales.T, title='After applying clustering to raw data')
fig.update_xaxes(title_text='Week')
fig.update_yaxes(title_text='Sales Count')
fig.show()
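As hinted above, once the K-means model is trained we can assign a brand-new product to one of the existing clusters from its min-max normalized weekly sales. A minimal sketch, where new_product is a made-up array of 52 weekly sales counts used purely for illustration:
# sketch: assign a hypothetical new product to an existing cluster
new_product = np.random.randint(0, 60, size=52).astype(float)  # made-up weekly sales counts
new_product_norm = (new_product - new_product.min()) / (new_product.max() - new_product.min())
predicted_cluster = model.predict(new_product_norm.reshape(1, -1))[0]
print(f'The new product falls into cluster {predicted_cluster}')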
'''
Sum the weekly data in groups of 4 columns, then group by the predicted cluster and plot the monthly sales
'''
monthly = inf.iloc[:,:52].groupby((np.arange(len(inf.iloc[:,:52].columns)) // 4) + 1, axis=1).sum()
monthly['class'] = inf['class']
monthly_group = monthly.groupby('class').sum().astype(int).reset_index()
monthly_sales = monthly_group.drop('class',axis=1)
fig = px.line(monthly_sales.T, title='Monthly sales of different clusters')
fig.update_xaxes(title_text='Month')
fig.update_yaxes(title_text='Sales Count')
fig.show()
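Finally, to tie this back to the stock-management question, here is a quick sketch that reads off the peak sales week and peak four-week period for each cluster from the grouped dataframes:
# sketch: peak sales week and peak 4-week period per cluster
peak_week = weekly_sales.idxmax(axis=1)    # column name of the best-selling week per cluster
peak_month = monthly_sales.idxmax(axis=1)  # best-selling 4-week period per cluster
for c in weekly_sales.index:
    print(f'Cluster {c}: peak week = {peak_week[c]}, peak period = {peak_month[c]}')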
Okay, so we went through two different dimensionality reduction techniques and one clustering technique in the form of K-means. There are several other clustering algorithms such as DBSCAN and agglomerative clustering, but this notebook is already quite long and the results are decent for the combination of K-means and PCA. Thank you for your attention and have a good day!