self-learn MI retrospective

Difficulty: Medium

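Introduction: this is a self-study blog series. Its main purpose is to help me learn and understand coursework and code deployment on machine learning, decision-making and forecasting, nonparametric statistics, and time series. If you spot a problem with the explanations or the code in this post, contact me on Discord and I will correct it promptly. Discord: lingmj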

Time series

Building a model with time series decomposition. Reference material: https://wiki.mbalib.com/wiki/%E6%97%B6%E9%97%B4%E5%BA%8F%E5%88%97%E5%88%86%E8%A7%A3%E6%B3%95, manual: https://www.statsmodels.org/stable/generated/statsmodels.tsa.seasonal.seasonal_decompose.html

The data:

Year (年份)	Quarter (季度)	Time code (时间代码)	Sales (销售量)
2015	1	1	25
	2	2	32
	3	3	37
	4	4	26
2016	1	5	30
	2	6	38
	3	7	42
	4	8	30
2017	1	9	29
	2	10	39
	3	11	50
	4	12	35
2018	1	13	30
	2	14	39
	3	15	51
	4	16	37
2019	1	17	29
	2	18	42
	3	19	55
	4	20	38
2020	1	21	31
	2	22	43
	3	23	54
	4	24	41
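
For readers who do not have the spreadsheet, the table can be rebuilt in plain Python. This is only a sketch: the variable names `sales` and `rows` are mine, and the year is written on every row here, whereas the spreadsheet shows it only on the first quarter of each year.

```python
# Quarterly sales from the table above, in time-code order (2015 Q1 .. 2020 Q4)
sales = [25, 32, 37, 26, 30, 38, 42, 30, 29, 39, 50, 35,
         30, 39, 51, 37, 29, 42, 55, 38, 31, 43, 54, 41]

# One record per quarter, keyed by the spreadsheet's column names
rows = [
    {"年份": 2015 + i // 4, "季度": i % 4 + 1, "时间代码": i + 1, "销售量": s}
    for i, s in enumerate(sales)
]

print(rows[0])
print(rows[-1])
```

Passing `rows` to `pd.DataFrame` gives a frame with the same column names that the code in this post reads from the Excel file.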

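Python code, implementing the formula Y = T × S × C

Prerequisites: install numpy, pandas, matplotlib (imported as plt), and openpyxl. The imports below also need scikit-learn and statsmodels.

File-reading code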

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.seasonal import seasonal_decompose

# Path to the spreadsheet that holds the table above
file_path = "/home/lingmj/MI/data.xlsx"

# openpyxl is the engine pandas uses to read .xlsx files
df = pd.read_excel(file_path, engine="openpyxl")

print("Data:")
print(df)

Preprocessing

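# The year is only filled in on the first quarter of each year, so fill it downward
df['年份'] = df['年份'].ffill()
df['时间代码'] = df['时间代码'].astype(int)
# Quarter q starts in month 3 * (q - 1) + 1 (1, 4, 7, 10); using q itself as the month would give Jan..Apr
start_month = (df['季度'].astype(int) - 1) * 3 + 1
df['日期'] = pd.to_datetime(df['年份'].astype(int).astype(str) + '-' + start_month.astype(str) + '-1', format='%Y-%m-%d')
df.set_index('日期', inplace=True)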
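
The preprocessing rests on two small ideas: filling the year downward, and turning a (year, quarter) pair into a date. Here is a minimal plain-Python sketch of both. The helper names `forward_fill` and `quarter_start` are hypothetical, and I assume a quarter should map to the first day of its first month (a quarter number is not a month number).

```python
from datetime import date

def forward_fill(values):
    """Replace each None with the most recent non-None value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def quarter_start(year, quarter):
    """First day of a quarter: Q1 -> Jan 1, Q2 -> Apr 1, Q3 -> Jul 1, Q4 -> Oct 1."""
    return date(year, 3 * (quarter - 1) + 1, 1)

# The year column as it looks in the spreadsheet: blank after the first quarter
years = forward_fill([2015, None, None, None, 2016, None])
quarters = [1, 2, 3, 4, 1, 2]
dates = [quarter_start(y, q) for y, q in zip(years, quarters)]
print(dates)
```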

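Time series decomposition, handled period by period. Decomposition comes in two forms, additive and multiplicative; here I start with the additive one. Extract the seasonal part, the trend part, and the residual (random term), then drop the missing values.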

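# The quarterly sales series to decompose (indexed by date after preprocessing)
sales_series = df['销售量']
decomposition = seasonal_decompose(sales_series, model='additive', period=4)
# The centered moving average leaves NaN at both ends of trend and resid
trend = decomposition.trend.dropna()
seasonal = decomposition.seasonal.dropna()
residual = decomposition.resid.dropna()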

Using the multiplicative model

decomposition = seasonal_decompose(sales_series, model='multiplicative', period=4)
trend = decomposition.trend.dropna()
seasonal = decomposition.seasonal.dropna()
residual = decomposition.resid.dropna()

However, our formula is Y = T × S × C, so we need the multiplicative model.
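
Besides matching the formula, a common rule of thumb is that a multiplicative model suits data whose seasonal swing grows with the overall level. The sketch below is a rough check on the sales figures from the table, not a formal test. The names `swings` and `levels` are mine.

```python
sales = [25, 32, 37, 26, 30, 38, 42, 30, 29, 39, 50, 35,
         30, 39, 51, 37, 29, 42, 55, 38, 31, 43, 54, 41]

# Split the series into years of four quarters
years = [sales[i:i + 4] for i in range(0, len(sales), 4)]

# Seasonal swing (max - min) and mean level within each year
swings = [max(y) - min(y) for y in years]
levels = [sum(y) / 4 for y in years]

for year, level, swing in zip(range(2015, 2021), levels, swings):
    print(year, level, swing)
```

The swing rises from 12 in 2015 to the low or mid 20s by 2019 and 2020, while the yearly mean rises from 30 to about 42. That is consistent with choosing the multiplicative form.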

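# Fit a straight line to the decomposed trend, using positions 0..len(trend)-1 as x
X = np.arange(len(trend)).reshape(-1, 1)
y = trend.values
model = LinearRegression()
model.fit(X, y)

# Extrapolate the trend to four further positions.
# Note: trend.dropna() starts at time code 3, so position p is time code p + 3;
# positions 21..24 therefore correspond to time codes 24..27, not 25..28.
future_X = np.array([len(trend) + i for i in range(1, 5)]).reshape(-1, 1)
future_trend = model.predict(future_X)

# The seasonal indices repeat every 4 quarters; the first four are Q1..Q4
seasonal_values = seasonal.values[:4]

# Use the mean residual as the C factor
residual_mean = residual.mean()

predictions = future_trend * seasonal_values * residual_mean

results = []
for i in range(4):
    T = future_trend[i] + 1.58  # 1.58 is an offset added by hand
    S = seasonal_values[i]
    C = residual_mean
    results.append([T, S, C, T * S * C])

# Columns: trend, seasonal index, C factor, forecast value ('预测值')
results_df = pd.DataFrame(results, columns=['T', 'S', 'C', '预测值'], index=[1, 2, 3, 4])
# Prints the heading "Forecast for each quarter of 2021:"
print("2021年各季度预测结果:")
print(results_df)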

Result:

root@LingMj:/home/lingmj/MI# python3 MI.py 
2021年各季度预测结果:
           T         S        C        预测值
1  45.665902  0.792230  0.99627  36.042933
2  46.219568  1.042365  0.99627  47.997937
3  46.773233  1.275205  0.99627  59.422995
4  47.326898  0.890201  0.99627  41.973293
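
As a quick sanity check of Y = T × S × C, the printed components can be multiplied back together. The numbers below are copied from the output above. Small differences in the last digits are expected, because the printed values are rounded.

```python
# (T, S, C, printed forecast) for 2021 Q1..Q4, copied from the output above
rows = [
    (45.665902, 0.792230, 0.99627, 36.042933),
    (46.219568, 1.042365, 0.99627, 47.997937),
    (46.773233, 1.275205, 0.99627, 59.422995),
    (47.326898, 0.890201, 0.99627, 41.973293),
]

for quarter, (T, S, C, printed) in enumerate(rows, start=1):
    product = T * S * C
    print(f"Q{quarter}: T*S*C = {product:.4f}, printed = {printed:.4f}")
```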

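Because the result contains some error and part of the data was added by hand, I studied each component separately and organized the pieces below.

The T value comes from a simple (one-variable) linear regression. The library can compute the coefficient and the intercept and then produce the forecast.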

# Regress sales on the time code: T = b0 + b1 * time code
X = df[['时间代码']]
y = df['销售量']

model = LinearRegression()

model.fit(X, y)

b0 = model.intercept_  # intercept
b1 = model.coef_[0]    # slope

# The four quarters of 2021 continue the time code after 24
time_codes_2021 = [25, 26, 27, 28]

predictions_2021 = [b0 + b1 * tc for tc in time_codes_2021]

for i, value in enumerate(predictions_2021, start=1):
    print(f"2021 Q{i} trend forecast (T value): {value:.4f}")
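
The same intercept and slope can be reproduced without scikit-learn from the closed-form least-squares formulas. This is a sketch using the sales figures from the table. The names `s_xy` and `s_xx` are mine.

```python
sales = [25, 32, 37, 26, 30, 38, 42, 30, 29, 39, 50, 35,
         30, 39, 51, 37, 29, 42, 55, 38, 31, 43, 54, 41]
time_codes = list(range(1, len(sales) + 1))

n = len(sales)
t_mean = sum(time_codes) / n
y_mean = sum(sales) / n

# Slope = sum of cross-deviations / sum of squared deviations of the time code
s_xy = sum((t - t_mean) * (y - y_mean) for t, y in zip(time_codes, sales))
s_xx = sum((t - t_mean) ** 2 for t in time_codes)
b1 = s_xy / s_xx
# The fitted line passes through the point of means
b0 = y_mean - b1 * t_mean

for tc in [25, 26, 27, 28]:
    print(tc, round(b0 + b1 * tc, 4))
```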

Next comes the S value. S is derived from moving averages, so it can be obtained directly from a formula.

df['年份'] = df['年份'].ffill()
df['时间代码'] = df['时间代码'].astype(int)
# Quarter q starts in month 3 * (q - 1) + 1 (1, 4, 7, 10)
start_month = (df['季度'].astype(int) - 1) * 3 + 1
df['日期'] = pd.to_datetime(df['年份'].astype(int).astype(str) + '-' + start_month.astype(str) + '-1', format='%Y-%m-%d')
df.set_index('日期', inplace=True)

sales_series = df['销售量']

decomposition = seasonal_decompose(sales_series, model='multiplicative', period=4)
seasonal = decomposition.seasonal.dropna()

# The seasonal indices repeat every 4 quarters; the first four are Q1..Q4
seasonal_values = seasonal.values[:4]

for i, value in enumerate(seasonal_values, start=1):
    print(f"2021 Q{i} seasonal index (S value): {value:.4f}")
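
To see where the S values come from, here is a plain-Python sketch of the ratio-to-moving-average method. As far as I understand, this is essentially what `seasonal_decompose` does for `model='multiplicative'`. The variable names are mine.

```python
sales = [25, 32, 37, 26, 30, 38, 42, 30, 29, 39, 50, 35,
         30, 39, 51, 37, 29, 42, 55, 38, 31, 43, 54, 41]
period = 4
n = len(sales)

# Centered moving average: weights 0.5, 1, 1, 1, 0.5 over five quarters, divided by 4
cma = [None] * n
for t in range(2, n - 2):
    window = sales[t - 2:t + 3]
    cma[t] = (0.5 * window[0] + sum(window[1:4]) + 0.5 * window[4]) / period

# Ratio of each observation to its centered average, grouped by quarter
ratios = [[] for _ in range(period)]
for t in range(n):
    if cma[t] is not None:
        ratios[t % period].append(sales[t] / cma[t])

# Average the ratios per quarter, then rescale so the four indices average to 1
raw = [sum(r) / len(r) for r in ratios]
scale = sum(raw) / period
seasonal_index = [r / scale for r in raw]

print([round(s, 4) for s in seasonal_index])
```

For this data the four indices agree, to four decimals, with the S column printed in the result above.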

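The last piece is the C value. It requires the four-term moving average and the centered average; dividing the centered average by the T value gives C.

I split this method into steps, starting with the four-term moving average.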

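fore_averages = []

# Average each run of four consecutive quarters and store it on the last of the four rows.
# .iloc is positional; plain [] on a date-indexed Series is meant for labels.
for i in range(3, len(df)):
    average = (df['销售量'].iloc[i-3] + df['销售量'].iloc[i-2] + df['销售量'].iloc[i-1] + df['销售量'].iloc[i])/4
    fore_averages.append(average)

df['四项居中平均'] = [None] * 3 + fore_averages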

Next is the centered average

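averages = []

# Average two neighbouring four-term means (stored on rows i and i+1) to center the window
for i in range(3, len(df) - 1):
    averag = (df['四项居中平均'].iloc[i] + df['四项居中平均'].iloc[i+1])/2
    averages.append(averag)

# Those two windows together are centered on row i-1, so the first value belongs to row 2
# and the last to row len(df)-3: pad two rows at each end
df['居中平均'] = [None] * 2 + averages + [None] * 2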
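
The reason for averaging twice is alignment. A four-term mean sits between two quarters, and averaging two neighbouring means moves it onto a whole quarter. Here is a small sketch on the first six quarters from the table, where positions are 0-based and the names are mine.

```python
sales = [25, 32, 37, 26, 30, 38]  # first six quarters from the table

# Four-term means: window k covers positions k..k+3, whose midpoint is k + 1.5
four_term = [sum(sales[k:k + 4]) / 4 for k in range(len(sales) - 3)]

# Averaging windows k and k+1 moves the midpoint to the whole position k + 2
centered = {k + 2: (four_term[k] + four_term[k + 1]) / 2
            for k in range(len(four_term) - 1)}

print(four_term)
print(centered)
```

The first centered value therefore belongs to position 2 (2015 Q3), and no centered value exists for the first two and last two quarters of the series.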

Finally, compute the C value

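predictions_C = []

# Assumption: df['T'] is not defined in the snippets above. Here it is taken to be the
# fitted trend line, using b0 and b1 from the T-value regression.
df['T'] = b0 + b1 * df['时间代码']

# C is the centered average divided by the trend value on the same row
for i in range(2, len(df['T'])):
    prediction_C = (df['居中平均'].iloc[i]/df['T'].iloc[i])
    predictions_C.append(round(prediction_C, 4))

df['C'] = [None] * 2 + predictions_C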

Use the computed C values to forecast, by averaging them per season (quarter)

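# One mean C value per quarter
residual_mean = []

for i in range(4):
    C_means = 0
    # Rows i, i+4, i+8, ... all belong to quarter i+1
    for j in range(i,len(df),4):

        # Skip the rows where C is missing (the ends of the series)
        if pd.isna(df['C'].iloc[j]):
            C_means += 0
        else:
            C_means += df['C'].iloc[j]
    
    # Each quarter has 5 non-missing C values in this data set
    residual_mean.append(round(C_means/5, 4)) 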
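
Dividing by a fixed 5 works because every quarter has exactly five usable C values here. A sketch of a variant that counts the usable values instead is shown below. The helper name `quarterly_means` is hypothetical, and the sample numbers are made up purely for illustration.

```python
import math

def quarterly_means(values, period=4):
    """Mean of the non-missing values at each position within the period."""
    means = []
    for q in range(period):
        # Every period-th value starting at q belongs to the same quarter
        present = [v for v in values[q::period]
                   if v is not None and not math.isnan(v)]
        means.append(sum(present) / len(present) if present else float("nan"))
    return means

# Made-up C values with gaps at both ends, as a moving average would leave them
c_values = [None, None, 1.02, 0.98, 1.01, 0.99, 1.00, 1.04, None, None]
print(quarterly_means(c_values))
```

This gives the same result as dividing by 5 on the data in this post, but it keeps working if the series gets longer or has more gaps.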

Sales forecast

results = []
for i in range(4):
    T = future_trend[i]
    S = round(seasonal_values[i], 3)
    C = round(residual_mean[i], 3)
    results.append([T, S, C, T * S * C])

Complete code example

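import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.seasonal import seasonal_decompose

file_path = "/home/lingmj/MI/data.xlsx"

df = pd.read_excel(file_path, engine="openpyxl")

# T: linear trend of sales against the time code
X = df[['时间代码']]
y = df['销售量']

model = LinearRegression()
model.fit(X, y)

b0 = model.intercept_  # intercept
b1 = model.coef_[0]    # slope

time_codes_2021 = [25, 26, 27, 28]
predictions_2021 = [b0 + b1 * tc for tc in time_codes_2021]
future_trend = [round(value, 3) for value in predictions_2021]

# Preprocessing: fill the year downward and build a quarterly date index
df['年份'] = df['年份'].ffill()
df['时间代码'] = df['时间代码'].astype(int)
start_month = (df['季度'].astype(int) - 1) * 3 + 1  # Q1..Q4 -> month 1, 4, 7, 10
df['日期'] = pd.to_datetime(df['年份'].astype(int).astype(str) + '-' + start_month.astype(str) + '-1', format='%Y-%m-%d')
df.set_index('日期', inplace=True)

# S: seasonal indices from the multiplicative decomposition
sales_series = df['销售量']
decomposition = seasonal_decompose(sales_series, model='multiplicative', period=4)
seasonal = decomposition.seasonal.dropna()
seasonal_values = seasonal.values[:4]  # Q1..Q4

# C: the listing needs df['C'], so the steps from the snippets above are repeated here.
# Four-term moving average, stored on the last row of each window
fore_averages = []
for i in range(3, len(df)):
    fore_averages.append(df['销售量'].iloc[i-3:i+1].mean())
df['四项居中平均'] = [None] * 3 + fore_averages

# Centered average: two neighbouring windows are centered on row i-1
averages = []
for i in range(3, len(df) - 1):
    averages.append((df['四项居中平均'].iloc[i] + df['四项居中平均'].iloc[i+1]) / 2)
df['居中平均'] = [None] * 2 + averages + [None] * 2

# Assumption: T on each row is the fitted trend line from the regression above
df['T'] = b0 + b1 * df['时间代码']
df['C'] = (df['居中平均'] / df['T']).round(4)

# Mean C per quarter, skipping the rows where C is missing
residual_mean = []
for i in range(4):
    C_means = 0
    for j in range(i, len(df), 4):
        if not pd.isna(df['C'].iloc[j]):
            C_means += df['C'].iloc[j]
    # Each quarter has 5 non-missing C values in this data set
    residual_mean.append(round(C_means/5, 4))

# Forecast: Y = T * S * C for each quarter of 2021
results = []
for i in range(4):
    T = future_trend[i]
    S = round(seasonal_values[i], 3)
    C = round(residual_mean[i], 3)
    results.append([T, S, C, T * S * C])

# Columns: trend, seasonal index, C factor, forecast value ('预测值')
results_df = pd.DataFrame(results, columns=['T', 'S', 'C', '预测值'], index=[1, 2, 3, 4])
# Prints the heading "Forecast for each quarter of 2021:"
print("2021年各季度预测结果:")
print(results_df)

# Plot the observed sales followed by the four forecasts
x_values = list(df['时间代码']) + time_codes_2021
y_values = list(df['销售量']) + list(results_df['预测值'])

plt.figure(figsize=(10, 6))
plt.plot(x_values, y_values, marker='o', linestyle='-', color='b', label='Sales and Predictions')
plt.title('Sales and Predictions Over Time')
plt.xlabel('Time Code')
plt.ylabel('Sales and Predictions')
plt.legend()
plt.grid(True)

plt.savefig('/home/lingmj/MI/line_chart.png')

plt.show()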

If you have suggestions for improvement, feel free to contact me on Discord.

This post is licensed under CC BY 4.0 by the author.