1. “…everything became much clearer when I started writing code.”
2. The derivative can be thought of as a force on each input when we pull on the output to make it higher.
3. The derivative with respect to some input can be computed by tweaking that input by a small amount and observing the change in the output value.
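The tweaking idea in point 3 can be sketched as a finite-difference estimate. A minimal illustration, assuming the toy function f(x, y) = x * y and the input values below (both are hypothetical, not from the original text):

```python
# Numerical derivative: tweak an input by a small h, observe the output change.
def forward(x, y):
    return x * y  # toy circuit: a single multiply gate

x, y = -2.0, 3.0
h = 1e-4  # the small tweak

out = forward(x, y)
dx = (forward(x + h, y) - out) / h  # derivative w.r.t. x
dy = (forward(x, y + h) - out) / h  # derivative w.r.t. y
```

Because f is linear in each input, the estimates here come out as y and x exactly; for general functions the estimate improves as h shrinks.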
4. The analytic derivative requires no tweaking of the inputs. It can be derived using mathematics (calculus).
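For the same toy function, the analytic route of point 4 skips the tweak entirely. A sketch, again assuming f(x, y) = x * y with illustrative inputs: calculus gives ∂f/∂x = y and ∂f/∂y = x, exact and computed in one step.

```python
# Analytic derivative of f(x, y) = x * y: derived with calculus, no tweaking.
x, y = -2.0, 3.0

dx_analytic = y  # ∂f/∂x = y, exact
dy_analytic = x  # ∂f/∂y = x, exact
```

In practice the numerical estimate is used only as a gradient check against the analytic expression.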
5. A single extra multiplication will turn a single (useless) gate into a cog in the complex machine that is an entire neural network.
6. A nice picture to have in mind is that as we pull on the circuit’s output value at the end, this induces pulls downward through the entire circuit, all the way down to the inputs.
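The picture in points 5 and 6 can be made concrete with a two-gate circuit. A minimal sketch, assuming the hypothetical circuit f(x, y, z) = (x + y) * z: pulling the output up with force 1 propagates a pull backward through each gate, and the "single extra multiplication" is the chain rule applied at each one.

```python
# Backprop through f(x, y, z) = (x + y) * z.
x, y, z = -2.0, 5.0, -4.0

# forward pass
q = x + y   # add gate
f = q * z   # multiply gate

# backward pass: pull on the output with force +1
df = 1.0
dq = z * df      # multiply gate: local gradient z, chained with df
dz = q * df      # multiply gate: local gradient q, chained with df
dx = 1.0 * dq    # add gate routes the pull unchanged to both inputs
dy = 1.0 * dq
```

The pull on x and y is negative here (the output wants q to shrink because z is negative), which is exactly the "induced pulls all the way down to the inputs" intuition.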
7. “Maybe this is not immediately obvious, but this machinery is a powerful hammer for Machine Learning.”
8. A cost function is an expression that measures how bad your classifier is. When the training set is perfectly classified, the cost (ignoring the regularization) will be zero.
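Point 8 can be illustrated with a hinge-style cost, as one common choice (the loss form and the toy scores/labels below are illustrative assumptions, not taken from the original notes):

```python
# A hinge-style cost: each example pays for failing to reach margin 1.
def cost(scores, labels):
    # labels are in {-1, +1}; a correctly classified example with
    # margin >= 1 contributes exactly zero
    return sum(max(0.0, 1.0 - y * s) for s, y in zip(scores, labels))

perfect = cost([2.0, -3.0], [1, -1])   # every margin >= 1 -> cost 0
bad = cost([-0.5, -3.0], [1, -1])      # one misclassified -> positive cost
```

When the training set is perfectly classified with sufficient margin, the data term is zero, matching the claim (regularization, if present, is a separate term).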
9. The majority of cost functions in Machine Learning consist of two parts: 1. a part that measures how well a model fits the data, and 2. regularization, which measures some notion of how complex or likely a model is.
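The two-part decomposition in point 9 can be sketched directly. A minimal illustration with an L2 penalty as the regularizer (the function name, weights, and regularization strength are hypothetical):

```python
# total cost = data-fit term + regularization term
def total_cost(data_loss, weights, reg_strength=0.1):
    # L2 regularization: penalizes large weights, i.e. "complex" models
    reg = reg_strength * sum(w * w for w in weights)
    return data_loss + reg

zero_fit = total_cost(0.0, [0.0, 0.0])    # perfect fit, no weights -> 0
penalized = total_cost(0.0, [1.0, 2.0])   # perfect fit still pays for complexity
```

This makes point 8's caveat explicit: a perfectly classified training set zeroes only the data term, while nonzero weights still incur the regularization cost.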