# NeuroEvolution of Augmenting Topologies

## Definitions

- Hidden node
- a node that is neither an input nor an output node.
- Bias unit
- An input that is always set to 1. Can connect to any node
*other*than an input.

## Evolution

Composed of:

- Selection
- Crossover
- Mutation

## Augmenting

We start with the smallest possible set of nodes containing inputs and outputs. This is called *minimal initialization*, and is a distinct advantage of NEAT.

The exact structure of your initial population is not set in stone, For simple networks, this could mean having a connection between each input node and output node.

The *bias* can be thought of as a y-intercept to our network: it allows us to shift our output up or down.

## Mutations

### Mutate Add Connection

## Speciation

The compatibility distance σ between two genomes is given by:

\begin{equation} \label{eq:1} \sigma=\frac{c_{1}E}{N}+\frac{c_{2}D}{N}+c_{3}\cdot \bar{W} \end{equation}

where:
\(N\) is the number of genes in the larger genome.
\(E\) is the number of excess genes.
\(D\) is the number of disjoint genes.
\(\bar{W}\) is the *average* difference in weight between all matching connections, including disabled ones.
\(c_1, c_2, c_3\) are hard-coded importance coefficients.

TODO It is unclear what “Number of genes” means, so I am doing number of nodes + number of connections.

## Adjusted Fitness

The paper defines the adjusted fitness of a particular genome as:

\begin{equation} \label{eq:2} f'=\frac{f_i}{\sum_{j=1}^n sh(\sigma(i,j))} \end{equation}

Where \(sh\) evaluates to 1 if the two genomes are similar enough to be classified as the same specifies. Thus, this equation simplifies to:

\begin{equation} \label{eq:3} f'=\frac{f_i}{N}, \end{equation}

where \(N\) is the number of genomes in the species.

## Breeding

For each species in every generation, only the single highest-performing genome survives. The rest of the new population is randomly bred from the other highest performing genomes, but almost all of those parents still die.

The number of offspring allocated to each species \(k\) is proportional to its relative performance in the generation.:

\begin{equation} \label{eq:4} n_k_{}_{}=\frac{\bar{F}_k_{}}{\bar{F}_{tot}}\cdot|P| \end{equation}

where: \(\bar{F}_k\) is the average adjusted fitness of species \(k\). \(\bar{F}_{tot}\) is the sum of the average adjusted fitnesses for each species. \(|P|\) is the constant population size.

## Activation

The output of any given node is given by:

\begin{align} \label{eq:5} y_j=\sigma \left(\sum_i w_{ij}x_i \right) \end{align}

Where \(w_{ij}\) is the connection weight from node \(i\) to node \(j\), \(x_i\) is the output of node \(i\), and \(\sigma\) is the activation function, which is typically the sigmoid function \(\frac{1}{1+e^{-x}}\).

However, https://neat-python.readthedocs.io/en/latest/glossary.html and other implementations use a different formula, given by:

*output = activation(bias+(response∗aggregation(inputs)))*

where *response* is a constant that may be subject to evolution.

## Normalizing Inputs

If the input variables are combined linearly, as in an MLP [multilayer perceptron], then it is rarely strictly necessary to standardize the inputs, at least in theory. The reason is that any rescaling of an input vector can be effectively undone by changing the corresponding weights and biases, leaving you with the exact same outputs as you had before. However, there are a variety of practical reasons why standardizing the inputs can make training faster and reduce the chances of getting stuck in local optima. Also, weight decay and Bayesian estimation can be done more conveniently with standardized inputs.