New AI: Single Epoch Learning (SEL)

 We provide technology for training neural networks, including deep learning models, accurately and in a short time. Traditional training methods, such as Back Propagation (BP), adjust the neural network parameters over and over again. Our patented SEL achieves sufficient training with just one adjustment.
 In conventional training, the error is reduced gradually by iterative calculation based on the gradient of the network's error. This patented technology, in contrast, reaches a near-optimal solution with a single algebraic calculation, so the error is reduced accurately and without wasted computation.
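
 The patented SEL algorithm itself is not reproduced here, but the contrast between the two approaches can be illustrated on a toy problem. The C++ sketch below (our illustration, not the SEL method) fits a line y = w*x + b to sample data twice: first with BP-style iterative gradient steps governed by an update rate, then with the classical closed-form least-squares formula, obtained in a single algebraic calculation. The data values are made up for the example.

```cpp
#include <cstdio>
#include <vector>

// Toy illustration (not the patented SEL algorithm): fit y = w*x + b.
int main() {
    std::vector<double> x = {0, 1, 2, 3, 4};
    std::vector<double> y = {1.1, 2.9, 5.2, 6.8, 9.1};
    const int n = static_cast<int>(x.size());

    // Iterative approach (BP-style): many gradient steps with an update rate.
    double w = 0.0, b = 0.0;
    const double ur = 0.01;                      // update rate
    for (int epoch = 0; epoch < 1000; ++epoch) {
        double gw = 0.0, gb = 0.0;               // gradient of the mean squared error
        for (int i = 0; i < n; ++i) {
            const double e = w * x[i] + b - y[i];
            gw += 2.0 * e * x[i] / n;
            gb += 2.0 * e / n;
        }
        w -= ur * gw;
        b -= ur * gb;
    }
    std::printf("iterative:   w=%.3f b=%.3f\n", w, b);

    // Algebraic approach: one closed-form least-squares solve,
    // no epochs and no update rate.
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (int i = 0; i < n; ++i) {
        sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
    }
    const double w1 = (n * sxy - sx * sy) / (n * sxx - sx * sx);
    const double b1 = (sy - w1 * sx) / n;
    std::printf("closed-form: w=%.3f b=%.3f\n", w1, b1);
}
```

 Both routes arrive at essentially the same line, but the closed-form route needs neither an update rate nor repeated epochs, which is the flavor of advantage claimed for SEL.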

Error Comparison

Fig.1 Error Comparison between SEL and BP (Yacht Hydrodynamics Data Set)

Performance Comparison

 We evaluated the algorithms using datasets from the UCI Machine Learning Repository that are frequently used for machine learning performance evaluation (Table 1).

Table 1: Data Sets used in the Evaluation
Data Set | Input Dimension | Output Dimension | Output Information | Number of Training Data | Number of Test Data
1. Airfoil Self-Noise Data Set | 5 | 1 | Scaled sound pressure level | 752 | 753
2. Computer Hardware Data Set | 8 | 1 | ERP | 104 | 105
3. Concrete Compressive Strength Data Set | 8 | 1 | Concrete compressive strength | 515 | 515
4. Online News Popularity Data Set | 58 | 1 | Shares | 19822 | 19822
5. Wine Quality Data Set (red) | 11 | 1 | Quality | 800 | 799
6. Wine Quality Data Set (white) | 11 | 1 | Quality | 2449 | 2449
7. Yacht Hydrodynamics Data Set | 6 | 1 | Residuary resistance | 154 | 154

 Table 2 shows the absolute error and calculation time when single epoch learning was performed with one thread on an Intel Core i9-8950HK, where xx-yy indicates that Intermediate Layer 1 had xx nodes and Intermediate Layer 2 had yy nodes. For example, in the Airfoil Self-Noise Data Set 20-20 configuration, the Input Layer had 5 nodes, Intermediate Layer 1 had 20 nodes, Intermediate Layer 2 had 20 nodes, and the Output Layer had 1 node. Every layer was fully connected. The activation function was linear for the Output Layer and sigmoid for the other layers. The values in parentheses in the tables are the learning times [ms], measured with the C++ standard library's std::chrono.
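
 As a rough, self-contained sketch of the measurement setup (our reconstruction with hypothetical helper names; the actual SEL trainer is not shown), the following C++ fragment builds the fully connected 5-20-20-1 forward pass described above, with sigmoid intermediate layers and a linear output layer, and measures elapsed time with std::chrono as in the tables.

```cpp
#include <chrono>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// One fully connected layer: out = act(W * in + bias).
struct Layer {
    std::vector<std::vector<double>> W;   // weights, indexed [out][in]
    std::vector<double> bias;
    bool linear;                          // true: linear activation, false: sigmoid

    std::vector<double> forward(const std::vector<double>& in) const {
        std::vector<double> out(bias);
        for (size_t o = 0; o < W.size(); ++o) {
            for (size_t i = 0; i < in.size(); ++i) out[o] += W[o][i] * in[i];
            if (!linear) out[o] = 1.0 / (1.0 + std::exp(-out[o]));
        }
        return out;
    }
};

// Hypothetical helper: random fully connected layer of the given shape.
Layer makeLayer(int nIn, int nOut, bool linear, std::mt19937& rng) {
    std::uniform_real_distribution<double> d(-0.5, 0.5);
    Layer l;
    l.linear = linear;
    l.bias.assign(nOut, 0.0);
    l.W.assign(nOut, std::vector<double>(nIn));
    for (auto& row : l.W) for (auto& w : row) w = d(rng);
    return l;
}

int main() {
    std::mt19937 rng(42);
    // 5-20-20-1 network, as in the Airfoil Self-Noise Data Set 20-20 case.
    std::vector<Layer> net = {makeLayer(5, 20, false, rng),
                              makeLayer(20, 20, false, rng),
                              makeLayer(20, 1, true, rng)};

    std::vector<double> x = {0.1, 0.2, 0.3, 0.4, 0.5};  // dummy input sample

    auto t0 = std::chrono::steady_clock::now();
    std::vector<double> y = x;
    for (const auto& l : net) y = l.forward(y);  // here one would call the trainer instead
    auto t1 = std::chrono::steady_clock::now();

    auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();
    std::printf("output=%.4f, elapsed=%lld ms\n", y[0], static_cast<long long>(ms));
}
```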

Table 2: Absolute Error and Learning Time [ms] (in parentheses) of SEL
Data Set \ Int. Layer | None | 2 | 5 | 10 | 20 | 2-2 | 5-5 | 10-10 | 20-20
1. | 3.82 (3) | 3.83 (6) | 3.82 (10) | 3.68 (12) | 3.08 (22) | 3.75 (4) | 3.98 (12) | 3.72 (26) | 3.04 (72)
2. | 23.4 (8) | 21.1 (15) | 17.7 (30) | 12.9 (14) | 80.2 (25) | 44.1 (23) | 24.1 (33) | 58.6 (19) | 56.5 (38)
3. | 7.97 (11) | 8.18 (20) | 7.93 (20) | 7.50 (24) | 8.34 (35) | 12.3 (31) | 7.05 (26) | 7.54 (30) | 7.79 (64)
4. | 5266 (231) | 3243 (614) | 3768 (1226) | 3992 (2243) | 5223 (4179) | 3217 (660) | 3225 (1323) | 3314 (2616) | 8.27E3 (5649)
5. | 0.516 (9) | 0.525 (13) | 0.516 (23) | 0.521 (37) | 0.518 (59) | 0.717 (21) | 0.516 (29) | 0.527 (52) | 0.521 (102)
6. | 0.579 (5) | 0.582 (24) | 0.579 (49) | 0.580 (68) | 0.579 (107) | 0.659 (33) | 0.579 (64) | 0.579 (107) | 0.635 (260)
7. | 7.51 (2) | 7.50 (14) | 1.55 (42) | 5.91 (17) | 2.54 (17) | 7.31 (13) | 8.73 (20) | 2.18 (62) | 11.5 (27)

 The results of BP training are shown in Tables 3 and 4, which give the state after 1000 learning epochs. Compared with Table 2, SEL reduced the error both faster and more accurately. SEL also requires no tuning of the update rate (UR): BP is prone to falling into local optima and, depending on its update rate, sometimes exhibits error oscillations, whereas SEL suffers from neither problem.
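
 The nan entries in Tables 3 and 4 are consistent with this update-rate sensitivity. As a minimal sketch (ours, not the evaluated BP implementation), gradient descent on the one-dimensional quadratic error E(w) = 0.5*a*w^2 converges only when UR < 2/a; the constants below are chosen to show both the convergent and the divergent regime.

```cpp
#include <cstdio>

// Toy illustration of update-rate (UR) sensitivity, not the evaluated BP code:
// minimize E(w) = 0.5 * a * w^2 by gradient descent, where dE/dw = a * w.
// Each step multiplies w by (1 - ur * a), so it converges only if ur < 2 / a.
int main() {
    const double a = 100.0;                     // curvature of the error surface
    const double urs[] = {0.001, 0.03};         // 0.03 > 2/a = 0.02 -> divergence
    for (double ur : urs) {
        double w = 1.0;
        for (int epoch = 0; epoch < 50; ++epoch) {
            w -= ur * (a * w);                  // same update rule BP applies per weight
        }
        std::printf("ur=%.3f -> w after 50 epochs = %g\n", ur, w);
    }
}
```

 With the small rate, w decays toward the optimum at 0; with the larger rate, every step overshoots, the error oscillates with growing amplitude, and in longer runs the values overflow, which is how nan results like those in Tables 3 and 4 typically arise.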

Table 3: Absolute Error and Learning Time [ms] (in parentheses) of BP with 0.00001 UR
Data Set \ Int. Layer | None | 2 | 5 | 10 | 20 | 2-2 | 5-5 | 10-10 | 20-20
1. | nan (82) | 5.70 (235) | 4.98 (395) | 5.77 (778) | 5.66 (1455) | 5.70 (442) | 5.70 (867) | 5.70 (2567) | 5.70 (4362)
2. | nan (19) | 91.8 (46) | 93.0 (72) | 93.1 (122) | 93.1 (223) | 91.7 (68) | 93.0 (135) | 93.1 (281) | 93.3 (624)
3. | nan (64) | 12.8 (185) | 12.8 (326) | 10.8 (580) | 12.8 (1086) | 12.8 (365) | 12.8 (531) | 12.8 (1301) | 12.8 (3817)
4. | nan (1.3E4) | 3207 (3.5E4) | 3207 (5.8E4) | 2.73E11 (9.8E4) | nan (1.6E5) | 3207 (3.8E4) | 3202 (6.9E4) | 2.90E30 (1.2E5) | nan (2.3E5)
5. | nan (128) | 0.747 (373) | 0.686 (668) | 0.562 (986) | 0.515 (1790) | 0.518 (420) | 1.34 (879) | 0.528 (1874) | 0.680 (4477)
6. | nan (383) | 0.678 (1073) | 0.664 (2014) | 0.649 (3610) | 0.642 (6283) | 0.665 (1276) | 0.685 (2678) | 0.629 (5665) | 0.644 (1.3E4)
7. | 11.4 (21) | 11.5 (48) | 11.5 (79) | 11.5 (136) | 11.3 (250) | 11.5 (66) | 11.4 (141) | 9.95 (300) | 13.7 (770)

Table 4: Absolute Error and Learning Time [ms] (in parentheses) of BP with 0.0001 UR
Data Set \ Int. Layer | None | 2 | 5 | 10 | 20 | 2-2 | 5-5 | 10-10 | 20-20
1. | nan (72) | 5.70 (229) | 4.91 (421) | 4.93 (795) | 5.70 (1478) | 5.70 (393) | 5.70 (857) | 5.70 (1848) | 5.70 (5910)
2. | nan (19) | 93.1 (44) | 93.1 (72) | 93.1 (129) | 93.1 (241) | 93.1 (68) | 94.1 (133) | 94.6 (283) | 93.9 (688)
3. | nan (67) | 12.6 (214) | 10.4 (326) | 12.4 (594) | 11.7 (1168) | 12.8 (447) | 12.8 (541) | 12.8 (1288) | 12.8 (3955)
4. | nan (1.3E4) | nan (3.5E4) | nan (5.8E4) | nan (9.5E4) | nan (1.6E5) | nan (3.6E4) | nan (6.1E4) | nan (1.1E5) | nan (2.2E5)
5. | nan (126) | 0.728 (399) | 0.739 (622) | 0.707 (1090) | 0.674 (1970) | 0.741 (432) | 1.34 (891) | 0.528 (1867) | 0.680 (4409)
6. | nan (370) | 0.641 (1166) | 0.641 (1897) | 3.53E34 (3466) | nan (5534) | 0.666 (1270) | 0.667 (2923) | 1.47E38 (7443) | nan (1.2E4)
7. | 10.4 (22) | 11.5 (47) | 11.5 (80) | 7.52 (135) | 7.50 (247) | 11.5 (67) | 11.5 (139) | 11.7 (300) | 11.5 (763)