21 References
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the
ideas immanent in nervous activity. The Bulletin of Mathematical
Biophysics, 5(4), 115–133. https://doi.org/10.1007/BF02478259
Shannon, C. E. (1948). A mathematical theory of communication. The
Bell System Technical Journal, 27(3), 379–423. https://doi.org/10.1002/j.1538-7305.1948.tb01338.x
Turing, A. M. (1948). Intelligent machinery. National Physical
Laboratory.
Hebb, D. O. (1949). The organization of behavior: A
neuropsychological theory. Wiley.
Kullback, S., & Leibler, R. A. (1951). On information and
sufficiency. The Annals of Mathematical Statistics,
22(1), 79–86. https://doi.org/10.1214/aoms/1177729694
Robbins, H., & Monro, S. (1951). A stochastic approximation method.
The Annals of Mathematical Statistics, 22(3), 400–407.
https://doi.org/10.1214/aoms/1177729586
Farley, B. G., & Clark, W. A. (1954). Simulation of self-organizing
systems by digital computer. Transactions of the IRE Professional
Group on Information Theory, 4(4), 76–84.
Minsky, M. L. (1954). Theory of neural-analog reinforcement systems
and its application to the brain-model problem [PhD thesis].
Princeton University.
Rochester, N., Holland, J. H., Haibt, L. H., & Duda, W. L. (1956).
Tests on a cell assembly theory of the action of the brain, using a
large digital computer. IRE Transactions on Information Theory,
2(3), 80–93.
Rosenblatt, F. (1958). The perceptron: A probabilistic model for
information storage and organization in the brain. Psychological
Review, 65(6), 386–408. https://doi.org/10.1037/h0042519
Novikoff, A. B. J. (1962). On convergence proofs on perceptrons.
Proceedings of the Symposium on the Mathematical Theory of
Automata, 12, 615–622.
Polyak, B. T. (1964). Some methods of speeding up the convergence of
iteration methods. USSR Computational Mathematics and Mathematical
Physics, 4(5), 1–17. https://doi.org/10.1016/0041-5553(64)90137-5
Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to
computational geometry. MIT Press.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning
representations by back-propagating errors. Nature,
323(6088), 533–536. https://doi.org/10.1038/323533a0
Hochreiter, S. (1991). Untersuchungen zu dynamischen neuronalen
Netzen [Diploma thesis]. Technische Universität
München.
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term
dependencies with gradient descent is difficult. IEEE Transactions
on Neural Networks, 5(2), 157–166. https://doi.org/10.1109/72.279181
Cortes, C., & Vapnik, V. (1995). Support-vector networks.
Machine Learning, 20(3), 273–297. https://doi.org/10.1007/BF00994018
LeCun, Y., Bottou, L., Orr, G. B., & Müller, K.-R. (1998). Efficient
BackProp. In Neural networks: Tricks of the trade (pp. 9–50).
Springer. https://doi.org/10.1007/3-540-49430-8_2
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of
training deep feedforward neural networks. Proceedings of the 13th
International Conference on Artificial Intelligence and Statistics
(AISTATS), 249–256.
Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of
training recurrent neural networks. Proceedings of the 30th
International Conference on Machine Learning (ICML), 1310–1318.